Get Multivariate DataFrame

datacommons_pandas.build_multivariate_dataframe(places, stat_vars)

Returns a pandas.DataFrame with places as the index and stat_vars as the columns, where each cell is the latest observed statistic for its Place and StatisticalVariable.

See the full list of StatisticalVariables.

Arguments

  • places (Iterable of str): A list of dcids of the Places to query for.

  • stat_vars (Iterable of str): A list of dcids of the StatisticalVariables to query for.

Returns

A pandas.DataFrame with places (str) as the index and stat_vars (str) as the columns, where each cell is the latest observed statistic (float) for its Place and StatisticalVariable.

Raises

  • ValueError - If no statistical values are found for the given parameters.

Be sure to initialize the library. See the datacommons_pandas library setup guide for more details.

You can find a list of StatisticalVariables with human-readable names here.

Examples

We would like to get a DataFrame of Count_Person, Median_Age_Person, and UnemploymentRate_Person for the United States, California, and Santa Clara County.

>>> import datacommons_pandas as dcpd
>>> dcpd.build_multivariate_dataframe(["country/USA", "geoId/06", "geoId/06085"],
                  ["Count_Person", "Median_Age_Person", "UnemploymentRate_Person"])
             Count_Person  Median_Age_Person  UnemploymentRate_Person
place                                                                
country/USA     328239523               37.9                      NaN
geoId/06         39512223               36.3                     15.1
geoId/06085       1927852               37.0                     10.7
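A place with no observation for a variable gets NaN in that cell, as with UnemploymentRate_Person for country/USA above. The following is a minimal sketch of checking for such gaps using standard pandas calls. To run without calling the API, it rebuilds the frame by hand from the values printed above. The variable names `places_with_gaps`, `complete`, and `ca_median_age` are illustrative only.

```python
import numpy as np
import pandas as pd

# Values copied from the example output above. In real use, this frame would
# come from dcpd.build_multivariate_dataframe(...).
df = pd.DataFrame(
    {
        "Count_Person": [328239523, 39512223, 1927852],
        "Median_Age_Person": [37.9, 36.3, 37.0],
        "UnemploymentRate_Person": [np.nan, 15.1, 10.7],
    },
    index=pd.Index(["country/USA", "geoId/06", "geoId/06085"], name="place"),
)

# Places that are missing at least one of the requested variables.
places_with_gaps = df[df.isna().any(axis=1)].index.tolist()

# Keep only the places that have a value for every variable.
complete = df.dropna()

# Look up a single cell by place dcid and variable dcid.
ca_median_age = df.loc["geoId/06", "Median_Age_Person"]
```

Whether to drop incomplete rows or keep the NaN cells depends on the analysis. `dropna()` is shown here only as one option.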

In the next example, there is no data for either RetailDrugDistribution_DrugDistribution_14Hydroxycodeinone or RetailDrugDistribution_DrugDistribution_Amphetamine for places outside the USA, so the API raises a ValueError:

>>> import datacommons_pandas as dcpd
>>> dcpd.build_multivariate_dataframe(
      ["country/MEX", "nuts/AT32"],
      ["RetailDrugDistribution_DrugDistribution_14Hydroxycodeinone",
      "RetailDrugDistribution_DrugDistribution_Amphetamine"
      ]
    )
ValueError    Traceback (most recent call last)
...
-->    raise ValueError('No data for any of specified Places and StatisticalVariables.')

ValueError: No data for any of specified places and stat_vars.
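Since the call raises ValueError rather than returning an empty DataFrame, callers that query arbitrary combinations may want to catch it. The following is a rough sketch of one way to do that. `build_or_none` is a hypothetical helper, not part of datacommons_pandas, and `fake_builder` is a stand-in that mimics the failure above so the sketch runs offline.

```python
def build_or_none(builder, places, stat_vars):
    """Call builder(places, stat_vars); return None if it raises ValueError.

    Hypothetical helper for illustration; `builder` is expected to behave
    like dcpd.build_multivariate_dataframe.
    """
    try:
        return builder(places, stat_vars)
    except ValueError as err:
        print(f"No DataFrame built: {err}")
        return None


def fake_builder(places, stat_vars):
    # Stand-in for dcpd.build_multivariate_dataframe; always fails the way
    # the example above does.
    raise ValueError("No data for any of specified places and stat_vars.")


result = build_or_none(
    fake_builder,
    ["country/MEX", "nuts/AT32"],
    ["RetailDrugDistribution_DrugDistribution_Amphetamine"],
)
```

In real use, you would pass `dcpd.build_multivariate_dataframe` as `builder` and check whether the result is None before working with it.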