Advanced setups
This page covers hybrid setups that are not recommended for most use cases, but may be helpful for some custom Data Commons instances:
- Running the data management container locally, and the service container in Google Cloud. In this scenario, you store your input data locally and write the output to Cloud Storage and Cloud SQL. This might be useful for users with very large data sets who would like to cut down on output generation times and avoid the cost of storing input data in addition to output data.
- Running the service container locally, and the data management container in Google Cloud. If you have already set up a data processing pipeline to send your input data to Google Cloud, but are still iterating on the website code, this might be a useful option.
Run the data management container locally and the service container in the cloud
This process is similar to running both data management and services containers locally, with a few exceptions:
- Your input directory will be the local file system, while the output directory will be a Google Cloud Storage bucket and folder.
- You must start the job with credentials that are passed to Google Cloud, so that the job can access the Cloud SQL instance.
Before you proceed, ensure you have set up all necessary GCP services.
Set environment variables
To run a local instance of the data management container, you need to set all of the environment variables in the `custom_dc/env.list` file, including all the GCP ones. A sketch of a sample file follows the steps below.
- Obtain the values output by Terraform scripts: Go to https://console.cloud.google.com/run/jobs for your project, select the relevant job from the list, and click View and edit job configuration.
- Expand Edit container, and select the Variables and secrets tab.
- Copy the values of all the variables, with the exception of `FORCE_RESTART` and `INPUT_DIR`, to your `env.list` file.
- Set the value of `INPUT_DIR` to the full local path where your CSV and JSON files are located.
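For reference, here is a sketch of what the resulting `env.list` might look like. All values below are placeholders, and the exact variable set is an assumption based on a typical Cloud SQL setup; the values from your own Cloud Run job configuration are authoritative.

```shell
# Hypothetical env.list sketch; every value is a placeholder.
# Use full-line comments only: docker --env-file does not strip inline comments.
INPUT_DIR=/full/local/path/to/input
OUTPUT_DIR=gs://your-bucket/your-folder
GOOGLE_CLOUD_PROJECT=your-project-id
USE_CLOUDSQL=true
CLOUDSQL_INSTANCE=your-project-id:us-central1:your-instance
DB_USER=your-db-user
DB_PASS=your-db-password
DC_API_KEY=your-data-commons-api-key
```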
Run the data management Docker container
**Bash script**

```shell
./run_cdc_dev_docker.sh --container data [--release latest]
```

If you don't specify the `--release` option, it will use the `stable` version by default.

**Docker commands**

- Generate credentials for Cloud application authentication:

  ```shell
  gcloud auth application-default login
  ```

- Run the container:

  ```shell
  docker run \
    --env-file $PWD/custom_dc/env.list \
    -v INPUT_DIRECTORY:INPUT_DIRECTORY \
    -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
    -v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
    gcr.io/datcom-ci/datacommons-data:VERSION
  ```
- The input directory is the local path. You don't specify the output directory, as you aren't mounting a local output volume.
- The version is `latest` or `stable`.
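For example, with hypothetical values filled in (an input directory of `/home/me/dc-data` and the `latest` release), the command might look like this:

```shell
# Example invocation with placeholder values; the input path is hypothetical
# and must match the INPUT_DIR value in env.list. The same path is mounted
# at the same location inside the container.
docker run \
  --env-file $PWD/custom_dc/env.list \
  -v /home/me/dc-data:/home/me/dc-data \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
  -v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
  gcr.io/datcom-ci/datacommons-data:latest
```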
To verify that the data is correctly created in your Cloud SQL database, use the procedure in Inspect the Cloud SQL database.
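If you just want a quick spot check from the command line, a minimal sketch using the gcloud CLI is shown below; the instance, user, database, and table names are assumptions, so treat the linked procedure as authoritative.

```shell
# Connect to the Cloud SQL instance (you are prompted for the DB password).
# Instance and user names here are placeholders.
gcloud sql connect your-instance --user=your-db-user

# Then, at the SQL prompt, spot-check the imported data, for example:
#   USE datacommons;
#   SELECT COUNT(*) FROM observations;
```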
(Optional) Run the data management Docker container in schema update mode
If you have tried to start a container and have received a `SQL check failed` error, this indicates that a database schema update is needed. You need to restart the data management container, and you can specify an additional, optional flag, `DATA_RUN_MODE`, to minimize the startup time.
**Bash script**

```shell
./run_cdc_dev_docker.sh --container data --schema_update [--release latest]
```

**Docker commands**

```shell
docker run \
  --env-file $PWD/custom_dc/env.list \
  -v INPUT_DIRECTORY:INPUT_DIRECTORY \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
  -v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
  -e DATA_RUN_MODE=schemaupdate \
  gcr.io/datcom-ci/datacommons-data:VERSION
```
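Since `DATA_RUN_MODE` is an ordinary environment variable, another workable approach (an assumption, not the documented procedure) is to set it in your `env.list` file instead of passing it with `-e`:

```shell
# Append the variable to env.list before the run, then remove the line
# afterwards so that later runs process data normally again.
echo "DATA_RUN_MODE=schemaupdate" >> custom_dc/env.list
```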
Restart the services container in Google Cloud
Follow any of the procedures provided in Start/restart the services container.
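For example, if your services container runs on Cloud Run, one way to force a restart from the command line is to redeploy the current image, which creates a new revision. This is a sketch with placeholder names; the service name, region, and image tag are assumptions:

```shell
# Redeploying the same image creates a new revision and restarts the service.
# Service name, region, and image tag are hypothetical placeholders.
gcloud run deploy your-dc-service \
  --image gcr.io/datcom-ci/datacommons-services:stable \
  --region us-central1
```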
Access Cloud data from a local services container
For testing purposes, you may wish to run the services Docker container locally but access the data in Google Cloud. This process is similar to running both data management and services containers in the cloud, but with an additional step to start a local Docker services container.
Before you proceed, ensure you have set up all necessary GCP services.
Set environment variables
To run a local instance of the services container, you need to set all of the environment variables in the `custom_dc/env.list` file, including all the GCP ones.
- Obtain the values output by Terraform scripts: Go to https://console.cloud.google.com/run/services for your project, select the relevant service from the list, and click the Revisions tab.
- In the right-hand window, scroll to Environment variables.
- Copy the values of all the variables, with the exception of `FORCE_RESTART`, to your `env.list` file.
Run the services Docker container
**Bash script**

To build and run a custom image:

```shell
./run_cdc_dev_docker.sh --actions build_run --container service --image IMAGE_NAME:IMAGE_TAG
```

To run a previously built custom image:

```shell
./run_cdc_dev_docker.sh --container service --image IMAGE_NAME:IMAGE_TAG
```

To run a Data Commons standard release:

```shell
./run_cdc_dev_docker.sh --container service [--release latest]
```

If you don't specify the `--release` option, it will use the `stable` version by default.
**Docker commands**

- Generate credentials for Cloud application authentication:

  ```shell
  gcloud auth application-default login
  ```

- Run the container.

  To run a custom image:

  ```shell
  docker run -it \
    --env-file $PWD/custom_dc/env.list \
    -p 8080:8080 \
    -e DEBUG=true \
    -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
    -v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
    -v $PWD/server/templates/custom_dc/custom:/workspace/server/templates/custom_dc/custom \
    -v $PWD/static/custom_dc/custom:/workspace/static/custom_dc/custom \
    IMAGE_NAME:IMAGE_TAG
  ```

  - The image name and image tag are the values you set when you created the package.
  - You don't specify any data directories, as you aren't mounting any local data volumes.

  To run a Data Commons standard release:

  ```shell
  docker run -it \
    --env-file $PWD/custom_dc/env.list \
    -p 8080:8080 \
    -e DEBUG=true \
    -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
    -v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
    gcr.io/datcom-ci/datacommons-services:VERSION
  ```

  - The version is `latest` or `stable`.
  - You don't specify any data directories, as you aren't mounting any local data volumes.
Once the services are up and running, visit your local instance by pointing your browser to http://localhost:8080.
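If you prefer to check from a terminal first, a simple probe against the port mapping above should return an HTTP 200 once startup is complete:

```shell
# Expect "HTTP/1.1 200 OK" (or similar) once the services are ready.
curl -I http://localhost:8080/
```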
If you encounter any issues, look at the detailed output log on the console, and visit the Troubleshooting Guide for detailed solutions to common problems.