Blog

  • Data Commons' New Natural Language Interface

    Data Commons is now harnessing the power of AI, specifically large language models (LLMs), to create a natural language interface. LLMs are used to understand the query and the results come straight from Data Commons, including a link to the original data source.

    Learn more in our Keyword blog post.

  • New Courseware - Data Literacy with Data Commons

    tl;dr

    Today, we are announcing the open and public availability of “Data Literacy with Data Commons” which comprises curriculum/course materials for instructors, students and other practitioners working on or helping others become data literate. This includes detailed modules with pedagogical narratives, explanations of key concepts, examples, and suggestions for exercises/projects focused on advancing the consumption, understanding and interpretation of data in the contemporary world. In our quest to expand the reach and utility of this material, we assume no background in computer science or programming, thereby removing a key obstacle to many such endeavors.

    This material can be accessed on our courseware page and it is open for anyone to take advantage of. If you use any of this material, we would love to hear from you! If you end up finding any of this material useful and would like to be notified of updates, do drop us a line.

    What is it?

    A set of modules covering several key concepts in data modeling, analysis, visualization and the (ab)use of data to tell (false) narratives. Each module lists its objectives and builds a pedagogical narrative around the explanation of key concepts, e.g. the difference between correlation and causation. We use the Data Commons platform extensively to point to real-world examples without needing to write a single line of code!

    Who is this for?

    Anyone and everyone. Instructors, students, aspiring data scientists and anyone interested in advancing their data comprehension and analysis skills without needing to code. For instructors, the curriculum page details the curriculum organization and how to find key concepts/ideas to use.

    What’s Different?

    There are several excellent courses which range from basic data analysis to advanced data science. We make no claim about “Data Literacy with Data Commons” being a replacement for them. Instead, we hope for this curriculum to become a useful starting point for those who want to whet their appetite for becoming data literate. This material uses a hands-on approach, replete with real-world examples but without requiring any programming. It also assumes only a high-school level of comfort with math and statistics. Data Commons is a natural companion platform to enable easy access to data and core visualizations. We hope that anyone working through the suggested examples will rapidly be able to explore more and even generate new examples and case studies on their own! If you end up finding and exploring new examples and case studies, please share them with us through this form.

    What is Data Literacy?

    What does it mean to be “data literate”? Unsurprisingly, the answer depends on who one asks: from those who believe it implies being a casual consumer of data visualizations (in the media, for example) to those who believe that such a person ought to be able to run linear regressions on large volumes of data in a spreadsheet. Given that most (or all) of us are prolific consumers of data, we take an opinionated approach to defining “data literacy”: someone who is data literate ought to be comfortable with consuming data across a wide range of modalities and be able to interpret it to make informed decisions. And we believe that data literacy ought not to be exclusionary and should be accessible to anyone and everyone.

    There is no shortage of data all around us. While some of it will always be beyond the comprehension of most of us, e.g. advanced clinical trials data about new drugs under development or data reporting the inner workings of complex systems like satellites, much of the data we consume is not as complex and should not need advanced degrees to consume and decipher. Examples include the promise of hundreds of dollars in savings when switching insurance providers, the claim that nine out of ten dentists recommend a particular brand of toothpaste, or the observation that different segments of society (men, women, youth, veterans, etc.) tend to vote a certain way on specific issues. We consume this data regularly and being able to interpret it to draw sound conclusions ought not to require advanced statistics.

    Unfortunately, data literacy has been an elusive goal for many because it has been gated on relative comfort with programming or programming-like skills, e.g. spreadsheets. We believe data literacy should be more inclusive and require fewer prerequisites. There is no avoiding a basic familiarity with statistics, e.g. knowing how to take a sample average. After all, interpreting data is a statistical exercise. However, for a large majority of us, consuming and interpreting data and making decisions based on it does not need a working knowledge of computer science (programming).

    As a summary, our view on “Data Literacy” can be described as follows:

    • Ability to consume, understand, create, and communicate with data.
    • Ability to make decisions based on data.
    • And to do so confidently, i.e. reduce “data anxiety”.
    • A skill for everyone, not just “data scientists”.

    With these goals in mind, we hope that this introductory curriculum can help the target audiences towards achieving data literacy and inspire many to dive deeper and farther to become data analysts and scientists.

    Crystal, Jehangir, and Julia, on behalf of the Data Commons team

  • New Data Download Tool

    In the last year, we have added several interesting datasets and exciting new features to Data Commons. One such feature is the new Data Download tool that allows you to easily download statistical variable data for a large number of places with just a few button clicks.

    The new data download tool

    The Data Commons knowledge graph is huge: there are over 240B data points for over 120K statistical variables. Sometimes, you may want to export just some of this data and use it in a custom tool. We now make that easy to do with the new data download tool. The new tool gives you the data in a CSV file, does not require any coding experience to use, and allows you to select the statistical variables, places, and dates that you are interested in.

    Maybe you want to explore the population of all the countries in the world (get the data here). Or you want to analyze poverty levels during COVID-19 (get the data here). Or you’re interested in projected temperature differences (relative to 2006) and activities that can be affected by temperature rise (get the data here). The Data Download tool gives you the power to use the data in our knowledge graph to explore all of this and much more in your tool of choice.
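
    For readers who do want to work with a downloaded file in code, the following is a minimal sketch of one way to load and rank the rows in Python. The column names (`placeName`, `date`, `value`) and the sample rows are illustrative placeholders, not the actual layout or contents of a Data Commons export, so adjust them to match the header of the file you download.

```python
import csv
import io

# Placeholder rows standing in for a downloaded file. The column names and
# values are illustrative assumptions, not real Data Commons output.
SAMPLE_CSV = """placeName,date,value
Place A,2020,100
Place B,2020,250
Place C,2020,175
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, converting `value` to float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        row["value"] = float(row["value"])
        rows.append(row)
    return rows


def top_places(rows, n):
    """Return the n place names with the largest values, largest first."""
    ranked = sorted(rows, key=lambda r: r["value"], reverse=True)
    return [r["placeName"] for r in ranked[:n]]


rows = load_rows(SAMPLE_CSV)
print(top_places(rows, 2))
```

    To read a real file instead of the inline sample, the text passed to `load_rows` could come from `open(path, newline="").read()`, where `path` is wherever you saved the download.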

    As always, we would love to hear from you! Please share your feedback with our team.

    Jennifer on behalf of the Data Commons team

  • Sustainability Data Commons

    Data Commons now includes 100+ sources of Sustainability data, covering topics from climate predictions (CMIP 5 and CMIP 6) from NASA, emissions from EPA, energy from EIA, NREL and UN, disasters from USGS and USFS, health from CDC and more. You can learn more about the launch of Sustainability Data Commons on the Google Keyword Blog.

    As always, we are eager to hear your feedback.

    Jennifer on behalf of the Data Commons team

  • Data Commons Updates

    Over the past few months, we’ve continued to incorporate new data into our knowledge graph and develop new tools. Here are some of the highlights:

    New Statistical Variable Explorer

    As Data Commons has grown, the number of Statistical Variables has increased. With over 300k variables to choose from (and counting!), we wanted to make it easier for you to find the right variables for your analysis. To address this, we added a new tool for exploring Statistical Variables. The tool provides metadata about the observations, places, and provenances we have for each variable.

    New Data

    Lately, we’ve been focused on building up our inventory of sustainability-related data. Some of our recent imports include:

    We’re also in the process of importing a large number of US Census American Community Survey Subject Tables, which contain detailed demographic data about a variety of topics. For example:

    New Import Tool

    We’ve made it easier for contributors to add datasets to Data Commons with our new open source command-line tool. This tool provides linting and detailed stats validation, streamlining our data ingestion process and making it more accessible.

    Check out our GitHub repo here.

    As always, please feel free to share any feedback.

    Thanks!

    Natalie on behalf of the Data Commons team

  • Data Commons Updates

    We’ve been hard at work since we surfaced Data Commons in Google Search last October. Some of the exciting features we’ve added include:

    Internationalization Support

    Place Explorer is now available in 8 languages in addition to English: German, Spanish, French, Hindi, Italian, Japanese, Korean and Russian. Additionally, support for these languages is carried forward from Google Search; here’s an example.

    New Graph Browser

    The Graph Browser was rewritten from the ground up to be faster and more responsive. It includes search support for the growing number of Statistical Variables available for each node, and it was redesigned to improve information density. Try it out for some nodes such as India, Unemployment Rate in Boston and Renal Cell Carcinoma.

    New Scatter Plot Explorer

    The new Scatter Plot Explorer enables quick visual exploration of any two statistical variables for a set of places. Try it out for Bachelor Degree Attainment vs Females per capita in California Counties or Covid-19 cases vs African Americans per capita among US States.

    API Documentation Refresh

    We participated in the 2020 Season of Docs, working with Anne Ulrich (@KilimAnnejaro) to completely refresh and improve our API documentation. Every API page was rewritten, in addition to new Google Sheets API tutorials. We had a wonderful time collaborating with Anne on this project and hope the improved documentation enables more developers to harness the power of our APIs.

    New Stats API

    We have also released a new set of APIs centered around statistics retrieval. There are different REST endpoints to retrieve a single statistical value, a statistical time series or the entire collection of statistical data for a set of places. We have used these APIs to build the new Scatter Plot Explorer and hope this enables other applications too.
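
    As a rough illustration of the single-value and time-series cases, the sketch below builds request URLs with Python's standard library. The base URL, the endpoint paths (`/stat/value`, `/stat/series`) and the parameter names (`place`, `stat_var`, `date`) are assumptions made for this example and should be checked against the API documentation. The bulk endpoint for a set of places is left out because its request shape is not described here.

```python
from urllib.parse import urlencode

# Assumed base URL and endpoint paths, used only for illustration.
# Verify them against the current API documentation before relying on them.
BASE_URL = "https://api.datacommons.org"
ENDPOINTS = {
    "value": "/stat/value",    # a single statistical value
    "series": "/stat/series",  # a statistical time series
}


def build_stat_url(kind, place, stat_var, date=None):
    """Build a GET URL for one place and one statistical variable.

    `kind` selects the endpoint; `date` is an optional filter.
    All parameter names are assumptions for this sketch.
    """
    if kind not in ENDPOINTS:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    params = {"place": place, "stat_var": stat_var}
    if date is not None:
        params["date"] = date
    return f"{BASE_URL}{ENDPOINTS[kind]}?{urlencode(params)}"


print(build_stat_url("value", "geoId/06", "Count_Person"))
print(build_stat_url("series", "geoId/06", "Count_Person"))
```

    The sketch only constructs URLs and does not send any requests, so it runs offline; fetching a URL and parsing the response is left to whichever HTTP client you prefer.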

    New Data

    As always, we continue to add more data to the Data Commons Graph. Some recent additions include:

    As always, we are eager to hear from you! Please share your feedback with our team.

    Carolyn on behalf of the Data Commons team

  • Data Commons, now accessible on Google Search

    Today, we are excited to share that Data Commons is accessible via natural language queries in Google Search. At a time when data informs our understanding of so many issues, from public health and education to the evolving workforce and more, access to data has never been more important. Data Commons in Google Search is a step in this direction, enabling users to explore data without the need for expertise or programming skills.

    Three years ago, the Data Commons journey started at Google with a simple observation: our ability to use data to understand our world is frequently hampered by the difficulties in working with data. The difficulties of finding, cleaning and joining datasets effectively limit who gets to work with data.

    Data Commons addresses this challenge head on, performing the tedious tasks of curating, joining and cleaning data sets at scale so that data users don’t have to. The result? Large scale and cloud accessible APIs to clean and normalize data originating from some of the most widely used datasets, including those from the US Census, World Bank, CDC and more. Available as a layer on top of the Knowledge Graph, Data Commons is now accessible to a much wider audience.

    Data Commons is Open. Open Data, Open Source. We hope that like its elder sister Schema.org, it becomes one of the foundational layers of the Web. We know this can only happen if it is built in an open and collaborative fashion. We are actively looking for partnerships on every aspect of this project, and we look forward to hearing from you!

    R.V.Guha & the Data Commons team

  • Data Commons Updates

    Over the last month and a half, we have worked hard to add some exciting new features:

    New Map Explorer

    The new Map Explorer offers an easy way to visualize how a statistical variable can vary across geographic regions. Try it out for Attainment of Bachelor Degree or Higher across Washington Counties or Median Income across US States.

    New Statistical Variable Menu

    The Statistical Variable Menu used for the Scatter Plot Explorer, Timelines Explorer, and Map Explorer was revamped to serve a much more comprehensive list of over 287,000 statistical variables in an easy-to-consume way. This new menu comes with useful features such as search support and information on the places that each statistical variable has data for.

    New Data

    We’ve continued to add new data to the Data Commons graph. Some of these new additions include:

    We would love to hear any feedback you may have!

    Jennifer on behalf of the Data Commons team