14 Sep 2022 – Jennifer Chen
In the last year, we have added several interesting datasets and exciting new features to Data Commons. One such feature is the new Data Download tool that allows you to easily download statistical variable data for a large number of places with just a few button clicks.
The Data Commons knowledge graph is huge – there are over 240B data points for over 120K statistical variables. Sometimes, you may want to export just some of this data and use it in a custom tool. We now make that easy to do with the new Data Download tool. The tool gives you the data in a CSV file, does not require any coding experience to use, and allows you to select the statistical variables, places, and dates that you are interested in.
Maybe you want to explore the population of all the countries in the world (get the data here). Or you want to analyze poverty levels during COVID-19 (get the data here). Or you’re interested in projected temperature differences (relative to 2006) and activities that can be affected by temperature rise (get the data here). The Data Download tool gives you the power to use the data in our knowledge graph to explore all of this and much more in your tool of choice.
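If your tool of choice is a script, a downloaded CSV file can be read with a few lines of Python. The sketch below is only an illustration: the column names (`place`, `date`, `variable`, `value`) and the sample rows are placeholders made up for this example, not the actual layout or contents of a Data Download export, so adjust them to match the header of the file you receive.

```python
import csv
import io

# Hypothetical stand-in for a downloaded file. The column names and values
# are placeholders for illustration, not real Data Commons data.
SAMPLE_CSV = """place,date,variable,value
Place A,2020,Count_Person,100
Place B,2020,Count_Person,250
Place A,2021,Count_Person,110
"""


def latest_value_per_place(csv_file, variable):
    """Return {place: (date, value)} holding the most recent row per place."""
    latest = {}
    for row in csv.DictReader(csv_file):
        if row["variable"] != variable:
            continue
        place, date = row["place"], row["date"]
        # Dates are compared as strings, which works for ISO-style dates
        # such as "2021" or "2021-06".
        if place not in latest or date > latest[place][0]:
            latest[place] = (date, float(row["value"]))
    return latest


result = latest_value_per_place(io.StringIO(SAMPLE_CSV), "Count_Person")
print(result)
```

To run this against a real export, replace the `io.StringIO(SAMPLE_CSV)` argument with a file object opened via `open(path, newline="")`.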
As always, we would love to hear from you! Please share your feedback with our team.
Jennifer on behalf of the Data Commons team
22 Apr 2022 – Jennifer Chen
Data Commons now includes 100+ sources of Sustainability data, covering topics from climate predictions (CMIP 5 and CMIP 6) from NASA, emissions from EPA, energy from EIA, NREL and UN, disasters from USGS and USFS, health from CDC and more. You can learn more about the launch of Sustainability Data Commons on the Google Keyword Blog.
As always, we are eager to hear your feedback.
Jennifer on behalf of the Data Commons team
10 Oct 2021 – Natalie Diaz
Over the past few months, we’ve continued to incorporate new data into our knowledge graph and develop new tools. Here are some of the highlights:
New Statistical Variable Explorer
As Data Commons has grown, the number of Statistical Variables has increased. With over 300k variables to choose from (and counting!), we wanted to make it easier for you to find the right variables for your analysis. To address this, we added a new tool for exploring Statistical Variables. The tool provides metadata about the observations, places, and provenances we have for each variable.
Lately, we’ve been focused on building up our inventory of sustainability-related data. Some of our recent imports include:
- Several of the IPCC RCP scenarios (e.g. Max Daily Temperature Based on RCP 8.5 in the US)
- WHO’s Global Health Observatory (e.g. Prevalence (%) of females in the US with BMI of 30 or greater, Percent of rural population in South Africa with at least basic drinking water services, and Percent of urban population in the US with household expenditures on health greater than 10% of total household expenditure or income)
- UN’s Energy Statistics Database (e.g. Annual Generation of Coal in the US)
- EPA’s Greenhouse Gas Reporting Program (e.g. Greenhouse Gas emissions from large facilities in Santa Clara County and California, as well as EPA reporting facilities such as Anheuser Busch Baldwinsville Brewery and Glen Burnie Landfill)
- Stanford’s DeepSolar (e.g. Count of Solar Installation per capita in California)
We’re also in the process of importing a large number of US Census American Community Survey Subject Tables, which contain detailed demographic data about a variety of topics. For example:
- Count of With Food Stamps in The Past 12 Months, Below Poverty Level in The Past 12 Months per capita
- Count of Single Mother Family Household, Some College or Associate’s Degree
New Import Tool
We’ve made it easier for contributors to add datasets to Data Commons with our new open source command-line tool. This tool provides linting and detailed stats validation, streamlining our data ingestion process and making it more accessible.
Check out our GitHub repo here.
As always, please feel free to share any feedback.
Natalie on behalf of the Data Commons team
01 Jun 2021 – Carolyn Au
We’ve been hard at work since we surfaced Data Commons in Google Search last October. Some of the exciting features we’ve added include:
Place Explorer is now available in 8 languages in addition to English: German, Spanish, French, Hindi, Italian, Japanese, Korean and Russian. Additionally, support for these languages is carried forward from Google Search. Here’s an example.
New Graph Browser
The Graph Browser was rewritten from the ground up to be faster and more responsive. It includes search support for the growing number of Statistical Variables available for each node, and it was redesigned to improve information density. Try it out for some nodes such as India, Unemployment Rate in Boston and Renal Cell Carcinoma.
New Scatter Plot Explorer
The new Scatter Plot Explorer enables quick visual exploration of any two statistical variables for a set of places. Try it out for Bachelor Degree Attainment vs Females per capita in California Counties or Covid-19 cases vs African Americans per capita among US States.
API Documentation Refresh
We participated in the 2020 Season of Docs, working with Anne Ulrich (@KilimAnnejaro) to completely refresh and improve our API documentation. Every API page was rewritten, and new Google Sheets API tutorials were added. We had a wonderful time collaborating with Anne on this project and hope the improved documentation enables more developers to harness the power of our APIs.
New Stats API
We have also released a new set of APIs centered around statistics retrieval. There are different REST endpoints to retrieve a single statistical value, a statistical time series or the entire collection of statistical data for a set of places. We have used these APIs to build the new Scatter Plot Explorer and hope this enables other applications too.
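As a rough illustration of how the single-value and time-series endpoints might be addressed, the sketch below only builds request URLs and makes no network calls. The base URL, the `/stat/value` and `/stat/series` paths, and the `place`, `stat_var` and `date` parameter names are assumptions that this post does not state, so check them against the API documentation before relying on them. The endpoint for the full collection of data across a set of places is left out.

```python
from urllib.parse import urlencode

# Assumed base URL and paths; they are not given in this post, so verify
# them against the API documentation.
BASE_URL = "https://api.datacommons.org"


def stat_value_url(place, stat_var, date=None):
    """Build a URL for a single statistical value (assumed /stat/value path)."""
    params = {"place": place, "stat_var": stat_var}
    if date is not None:
        params["date"] = date  # assumed optional parameter
    return f"{BASE_URL}/stat/value?{urlencode(params)}"


def stat_series_url(place, stat_var):
    """Build a URL for a statistical time series (assumed /stat/series path)."""
    params = {"place": place, "stat_var": stat_var}
    return f"{BASE_URL}/stat/series?{urlencode(params)}"


# Example identifiers: a place ID and a statistical variable name.
value_url = stat_value_url("geoId/06", "Count_Person")
series_url = stat_series_url("geoId/06", "Count_Person")
print(value_url)
print(series_url)
```

`urlencode` percent-encodes the slash in the place identifier, so the URLs are safe to pass to any HTTP client.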
As always, we continue to add more data to the Data Commons Graph. Some recent additions include:
- Indian Census (e.g. houseless and rural literacy populations)
- Reserve Bank of India’s Poverty data (e.g. rural population below poverty in Andhra Pradesh)
- FDA and additional drug information (e.g. drug data from FDA, ChEMBL, PharmGKB, etc.)
- Improved Covid-19 statistics (e.g. vaccination stats from ourworldindata.org)
- US Energy Information Administration (e.g. coal and natural gas consumption for electricity)
- Expanded international data from World Bank (e.g. crime, health stats)
- Updated data from existing sources, including:
As always, we are eager to hear from you! Please share your feedback with our team.
Carolyn on behalf of the Data Commons team
15 Oct 2020 – R.V.Guha
Today, we are excited to share that Data Commons is accessible via natural language queries in Google Search. At a time when data informs our understanding of so many issues–from public health and education to the evolving workforce and more–access to data has never been more important. Data Commons in Google Search is a step in this direction, enabling users to explore data without the need for expertise or programming skills.
Three years ago, the Data Commons journey started at Google with a simple observation: our ability to use data to understand our world is frequently hampered by the difficulties in working with data. The difficulties of finding, cleaning and joining datasets effectively limit who gets to work with data.
Data Commons addresses this challenge head on, performing the tedious tasks of curating, joining and cleaning data sets at scale so that data users don’t have to. The result? Large-scale, cloud-accessible APIs to clean, normalized data originating from some of the most widely used datasets, including those from the US Census, World Bank, CDC and more. Available as a layer on top of the Knowledge Graph, Data Commons is now accessible to a much wider audience.
Data Commons is Open. Open Data, Open Source. We hope that like its elder sister Schema.org, it becomes one of the foundational layers of the Web. We know this can only happen if it is built in an open and collaborative fashion. We are actively looking for partnerships on every aspect of this project, and we look forward to hearing from you!
R.V.Guha & the Data Commons team
26 Jul 2020 – Jennifer Chen
Over the last month and a half, we have worked hard to add some exciting new features:
New Map Explorer
The new Map Explorer offers an easy way to visualize how a statistical variable can vary across geographic regions. Try it out for Attainment of Bachelor Degree or Higher across Washington Counties or Median Income across US States.
New Statistical Variable Menu
The Statistical Variable Menu used for the Scatter Plot Explorer, Timelines Explorer, and Map Explorer was revamped to serve a much more comprehensive list of over 287,000 statistical variables in an easy-to-consume way. This new menu comes with useful features such as search support and information on the places that each statistical variable has data for.
We’ve continued to add new data to the Data Commons graph. Some of these new additions include:
- Air quality data from US Environmental Protection Agency (e.g. Overall Air Quality Index)
- India wages data from Indian Periodic Labour Force Survey (e.g. mean daily wages for urban workers and rural workers)
- India unemployment rate data from Reserve Bank of India (e.g. unemployment rate amongst urban residents and rural residents)
We would love to hear any feedback you may have!
Jennifer on behalf of the Data Commons team