Advanced setups

This page covers hybrid setups that are not recommended for most use cases, but may be helpful for some custom Data Commons instances:

Run the data management container locally and the services container in the cloud

This process is similar to running both data management and services containers locally, with a few exceptions:

  • Your input directory will be the local file system, while the output directory will be a Google Cloud Storage bucket and folder.
  • You must start the container with credentials that can be passed to Google Cloud, so that it can access the Cloud SQL instance.

Before you proceed, ensure you have set up all necessary GCP services.

Step 1: Set environment variables

To run a local instance of the data management container, you need to set all of the environment variables in the custom_dc/env.list file, including all the GCP ones.

  1. Obtain the values output by Terraform scripts: Go to https://console.cloud.google.com/run/jobs for your project, select the relevant job from the list, and click View and edit job configuration.
  2. Expand Edit container, and select the Variables and secrets tab.
  3. Copy the values of all the variables, except FORCE_RESTART and INPUT_DIR, to your env.list file.
  4. Set the value of INPUT_DIR to the full local path where your CSV and JSON files are located. (A sketch of a sample env.list follows this list.)
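
For reference, an env.list for this setup might look something like the following sketch. The variable names and values shown (other than INPUT_DIR) are illustrative assumptions; copy the exact set from your Cloud Run job configuration and change only INPUT_DIR to a local path.

# Illustrative values only -- use the variables and values from your own job configuration.
INPUT_DIR=/Users/jane/my-datacommons/input
OUTPUT_DIR=gs://my-dc-bucket/output
GOOGLE_CLOUD_PROJECT=my-gcp-project
USE_CLOUDSQL=true
CLOUDSQL_INSTANCE=my-gcp-project:us-central1:my-dc-sql-instance
DB_USER=datacommons
DB_PASS=example-password
DC_API_KEY=example-api-key
MAPS_API_KEY=example-maps-key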

Step 2: Generate credentials for Google Cloud authentication

For the services to connect to the Cloud SQL instance, you need to generate credentials that can be used in the local Docker container for authentication. You should refresh the credentials every time you rerun the Docker container.

Open a terminal window and run the following command:

gcloud auth application-default login

This opens a browser window that prompts you to sign in and allow the Google Auth Library to access your account. Accept the prompts. When the flow completes, a credentials JSON file is created at
$HOME/.config/gcloud/application_default_credentials.json. The Docker commands below mount this file into the container to authenticate to Google Cloud.
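
To confirm that the credentials are valid before starting the container, you can request a token. If the following command prints an access token, the application default credentials are working:

gcloud auth application-default print-access-token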

Step 3: Run the data management Docker container

From your project root directory, run:

docker run \
--env-file $PWD/custom_dc/env.list \
-v INPUT_DIRECTORY:INPUT_DIRECTORY \
-v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
-v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
gcr.io/datcom-ci/datacommons-data:VERSION

In this command, INPUT_DIRECTORY is the full local path to your input directory, OUTPUT_DIRECTORY is the Cloud Storage bucket and folder you configured as the output directory, and VERSION is latest or stable.

To verify that the data is correctly created in your Cloud SQL database, use the procedure in Inspect the Cloud SQL database.

(Optional) Run the data management Docker container in schema update mode

If you have tried to start a container and received a SQL check failed error, a database schema update is needed. Restart the data management container, optionally setting the DATA_RUN_MODE flag to minimize startup time.

To do so, add the following line to the above command:

-e DATA_RUN_MODE=schemaupdate \
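
Putting it together, the full command in schema update mode is the same as above with the extra flag, using the same placeholders:

docker run \
--env-file $PWD/custom_dc/env.list \
-v INPUT_DIRECTORY:INPUT_DIRECTORY \
-v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
-e DATA_RUN_MODE=schemaupdate \
-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
-v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
gcr.io/datcom-ci/datacommons-data:VERSION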

Step 4: Restart the services container in Google Cloud

Follow any of the procedures provided in Start/restart the services container.

Access Cloud data from a local services container

For testing purposes, you may want to run the services Docker container locally while accessing the data in Google Cloud. This process is similar to running both the data management and services containers in the cloud, with an additional step to start a local Docker services container.

Before you proceed, ensure you have set up all necessary GCP services.

Step 1: Set environment variables

To run a local instance of the services container, you need to set all of the environment variables in the custom_dc/env.list file, including all the GCP ones.

  1. Obtain the values output by Terraform scripts: Go to https://console.cloud.google.com/run/services for your project, select the relevant service from the list, and click the Revisions tab.
  2. In the right-hand window, scroll to Environment variables.
  3. Copy the values of all the variables, except FORCE_RESTART, to your env.list file.
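
Alternatively, you can dump the same configuration, including its environment variables, from the command line. The service name and region below are placeholders for your own values:

gcloud run services describe SERVICE_NAME --region REGION --format yaml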

Step 2: Generate credentials for Google Cloud authentication

See the section above for procedures.

Step 3: Run the services Docker container

From the root directory of your repo, run the following command, assuming you are using a locally built image:

docker run -it \
--env-file $PWD/custom_dc/env.list \
-p 8080:8080 \
-e DEBUG=true \
-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
-v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
-v INPUT_DIRECTORY:INPUT_DIRECTORY \
-v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
[-v $PWD/server/templates/custom_dc/custom:/workspace/server/templates/custom_dc/custom \]
[-v $PWD/static/custom_dc/custom:/workspace/static/custom_dc/custom \]
IMAGE_NAME:IMAGE_TAG

In this command, INPUT_DIRECTORY and OUTPUT_DIRECTORY are Google Cloud Storage paths, and IMAGE_NAME and IMAGE_TAG are the values you set when you created the package.

Once the services are up and running, visit your local instance by pointing your browser to http://localhost:8080.
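
To check that the server is responding without opening a browser, you can also send a request from another terminal window; a 200 status code indicates the services are up:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080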

If you encounter any issues, look at the detailed output log on the console, and visit the Troubleshooting Guide for detailed solutions to common problems.
