Data Commons Python API V2

Note: The V2 version of the Python client libraries is in Beta. Documentation and tutorials have not yet been updated to V2.

Overview

The Data Commons Python API is a Python client library that enables developers to programmatically access nodes in the Data Commons knowledge graph. This package allows you to explore the structure of the graph, integrate statistics from the graph into data analysis workflows and much more.

Before proceeding, make sure you have followed the setup instructions below.

What’s new in V2

The latest version of Python client libraries implements the REST V2 APIs and adds many convenience methods. The package name is datacommons_client.

Here are just some of the changes from the previous version of the libraries:

  • You can use this new version to query custom Data Commons instances in addition to base datacommons.org.
  • The Data Commons [Pandas]((https://pandas.pydata.org/){: target=”_blank”} module is included as an option in the install package; there is no need to install each library separately. Pandas APIs have also been migrated to use the REST V2 Observation API.
  • Requests to base datacommons.org require an API key.
  • The primary interface is a set of classes representing the REST V2 API endpoints.
  • Each class provides a fetch method that takes an API relation expression as an argument as well as several convenience methods for commonly used operations.
  • There is no SPARQL endpoint.

Install the Python Data Commons V2 API

This procedure uses a Python virtual environment as recommended by Google Cloud Setting up a Python development environment.

  1. If not done already, install python3 and pip3. See Installing Python for procedures.
  2. Go to your project directory and create a virtual environment using venv, as described in Using venv to isolate dependencies.
  3. Install the the datacommons-client package:

    $ pip install datacommons-client
    

To get additional functionality with Pandas DataFrames, run:

$ pip install "datacommons-client[Pandas]"

Run Python interactively

The pages in this site demonstrate running Python methods interactively from the Bash shell. To use this facility, be sure to import the datacommons_client package:

From your virtual environment, run:

python3
>>> import datacommons_client

Create a client

You access all Data Commons Python endpoints and methods through the DataCommonsClient class.

To create a client and connect to the base Data Commons, namely datacommons.org:

from datacommons_client.client import DataCommonsClient
client = DataCommonsClient(api_key="YOUR_API_KEY")

See below about API keys.

To create a client and connect to a custom Data Commons by a publicly resolvable DNS hostname:

from datacommons_client.client import DataCommonsClient
client = DataCommonsClient(dc_instance="DNS_HOSTNAME")

For example:

client = DataCommonsClient(dc_instance="datacommons.one.org")

To create a client and connect to a custom Data Commons by a private/non-resolvable address, specify the full API path, including the protocol and API version:

from datacommons_client.client import DataCommonsClient
client = DataCommonsClient(url="http://YOUR_ADDRESS/core/api/v2/")

For example, to connect to a locally running DataCommons instance:

from datacommons_client.client import DataCommonsClient
client = DataCommonsClient(url="http://localhost:8080/core/api/v2/")

Authentication

All access to the base Data Commons (datacommons.org) the V2 APIs must be authenticated and authorized with an API key. The DataCommonsClient object manages propagating the API key to all requests, so you don’t need to specify it as part of data requests.

We provide a trial API key for general public use. This key will let you try the APIs and make single requests.

The trial key is capped with a limited quota for requests. If you are planning on using the APIs more rigorously (e.g. for personal or school projects, developing applications, etc.) please request an official key without any quota limits; see Obtain an API key for information.

For custom DC instances, you do not need to provide any API key.

Request endpoints and responses

The Python client library sends HTTP POST requests to the Data Commons REST API endpoints and receives JSON responses. Each endpoint has a corresponding response type. The classes are below:

API Endpoint Description Response type
Observation observation Fetches statistical observations (time series) ObservationResponse
Observations Pandas DataFrame observations_dataframe Same as above, except the functionality is provided by a method of the DataCommonsClient class directly, instead of an intermediate endpoint pd.DataFrame
Node node Fetches information about edges and neighboring nodes NodeResponse
Resolve entities resolve Returns a Data Commons ID (DCID) for entities in the graph ResolveResponse

To send a request, you use one of the endpoints available as methods of the client object. For example:

Request:

client.resolve.fetch_dcids_by_name(names="Georgia")

Response:

ResolveResponse(entities=[Entity(node='Georgia', candidates=[Candidate(dcid='geoId/13', dominantType=None), Candidate(dcid='country/GEO', dominantType=None), Candidate(dcid='geoId/5027700', dominantType=None)])])

See the linked pages for descriptions of the methods available for each endpoint, its methods and responses.

Find available entities, variables, and their DCIDs

Many requests require the DCID of the entity or variable you wish to query. For tips on how to find relevant DCIDs, entities and variables, please see the Key concepts document, specifically the following sections:

Relation expressions

Each endpoint has a fetch() method that takes a relation expression. For complete information on the syntax and usage of relation expressions, please see the REST V2 API relation expressions documentation.

For common requests, each endpoint also provides convenience methods that build the expressions for you. See the endpoint pages for details.

Response formatting

By default, responses are returned as Python dataclass objects with the full structure. For example:

response = client.resolve.fetch_dcids_by_name(names="Georgia")
print(response)
ResolveResponse(entities=[Entity(node='Georgia', candidates=[Candidate(dcid='geoId/13', dominantType=None), Candidate(dcid='country/GEO', dominantType=None), Candidate(dcid='geoId/5027700', dominantType=None)])])

Each response class provides some property methods that are useful for formatting the output.

Method Description
to_dict Converts the dataclass to a Python dictionary.
to_json Serializes the dataclass to a JSON string (using json.dumps()).

Both methods take the following input parameter:

Parameter Description
exclude_none
Optional
Compact response with nulls and empty lists removed. Defaults to True. To preserve the original structure and return all properties including null values and empty lists, set this to False.

Examples

Example 1: Return dictionary in compact format

This example removes all properties that have null values or empty lists.

Request:

client.resolve.fetch_dcids_by_name(names="Georgia").to_dict()

Response:

{'entities': [{'node': 'Georgia', 'candidates': [{'dcid': 'geoId/13'}, {'dcid': 'country/GEO'}, {'dcid': 'geoId/5027700'}]}]}

Example 2: Return dictionary with original structure

This example sets exclude_none to False to preserve all properties from the original response, including all nulls and empty lists.

Request:

client.resolve.fetch_dcids_by_name(names="Georgia").to_dict(exclude_none=False)

Response:

{'entities': [{'node': 'Georgia', 'candidates': [{'dcid': 'geoId/13', 'dominantType': None}, {'dcid': 'country/GEO', 'dominantType': None}, {'dcid': 'geoId/5027700', 'dominantType': None}]}]}

Example 3: Return compact JSON string

This example converts the response to a formatted JSON string, in compact form, and prints the response for better readability.

Request:

client.resolve.fetch_dcids_by_name(names="Georgia").to_json()

Response:

{
  "entities": [
    {
      "node": "Georgia",
      "candidates": [
        {
          "dcid": "geoId/13"
        },
        {
          "dcid": "country/GEO"
        },
        {
          "dcid": "geoId/5027700"
        }
      ]
    }
  ]
}

Note: On the endpoint reference pages we will show all responses using this format, but will leave out the response methods for succinctness.

Page last updated: March 27, 2025 • Send feedback about this page