Join with data outside Data Commons
This page illustrates how you can join external datasets with Data Commons by relying on unique IDs and geo locations. For the examples below, we use other public datasets in BigQuery Analytics Hub, but they can be any other private/public dataset.
Use FIPS codes
We use the Fatal Accidents dataset from National Highway Traffic Safety Administration to compute counties with highest fatal accidents per capita. We map to DC counties using the FIPS or geoId, and use total population statistics. Loving County, TX (the least populated county in main US) and Kenedy County, TX are at the top.
WITH FatalAccidents AS (
SELECT
CONCAT(LPAD(CAST(NHTSA.state_number AS STRING), 2, '0'), LPAD(CAST(NHTSA.county AS STRING), 3, '0')) AS FIPS,
COUNT(consecutive_number) AS Count
FROM `nhtsa_traffic_fatalities.accident_2016` AS NHTSA
GROUP BY FIPS)
SELECT P.id AS CountyId,
P.name AS CountyName,
IF(Obs.value IS NOT NULL, FA.Count / CAST(Obs.value AS FLOAT64), NULL) AS FatalAccidentsPerCapita
FROM FatalAccidents AS FA
JOIN `data_commons.Observation` AS Obs ON TRUE
JOIN `data_commons.Place` AS P ON TRUE
WHERE
P.id = Obs.observation_about AND
P.geo_id = FA.FIPS AND
Obs.variable_measured = 'Count_Person' AND
Obs.is_preferred_obs_across_facets
ORDER BY FatalAccidentsPerCapita DESC
Use zip codes
We use Google’s Project SunRoof dataset to compute solar potential for low-income Zip code areas in the US. From 500 Zip code areas with the lowest median income, we compute those Zip codes that have the highest solar potential (among those that were sufficiently qualified). Of the 500, we find that 133 of them had > 50% potential.
WITH LowestEarnerZips AS (
SELECT Obs.observation_about AS PlaceId,
Obs.Value AS Income
FROM `data_commons.Observation` AS Obs
WHERE
Obs.is_preferred_obs_across_facets AND
Obs.variable_measured = 'Median_Income_Person' AND
Obs.observation_about LIKE 'zip/%'
ORDER BY CAST(Value AS FLOAT64)
LIMIT 500)
SELECT SunRoof.region_name AS Zip, SunRoof.state_name AS State, SunRoof.percent_qualified AS PercentSunRoof
FROM LowestEarnerZips AS DC
JOIN `project_sunroof.solar_potential_by_postal_code` AS SunRoof ON TRUE
WHERE CONCAT('zip/', SunRoof.region_name) = DC.PlaceId AND
SunRoof.percent_qualified > 0 AND
SunRoof.percent_covered > 80
ORDER BY PercentSunRoof DESC
Use latitude/longitude
From the OpenStreetMap Public Dataset, we compute the US counties with most fire-hydrants per unit area. To do this, we use the geo boundaries in DC to map latitude/longitude to US counties and get their corresponding land area values. Alexandria County, Virginia is at the top of the list.
WITH CountyFireHydrantCount AS(
WITH FireHydrantLocations AS (
SELECT geometry AS Geo
FROM `openstreetmap_public_dataset.planet_nodes` AS node
JOIN UNNEST(all_tags) AS tags
WHERE (tags.key = 'emergency' AND tags.value = 'fire_hydrant'))
SELECT PB.id AS CountyId, COUNT(*) AS NumHydrants
FROM `data_commons.PlaceBoundary` AS PB
JOIN `data_commons.Place` AS P ON TRUE
JOIN FireHydrantLocations ON TRUE
WHERE
P.id = PB.id AND
EXISTS(SELECT * FROM UNNEST(P.all_types) AS T WHERE T = 'County') AND
ST_WITHIN(FireHydrantLocations.Geo,
ST_GEOGFROMGEOJSON(PB.geo_json_coordinates, make_valid => TRUE))
GROUP BY CountyId)
SELECT P.id AS CountyId,
P.name AS CountyName,
CountyFireHydrantCount.NumHydrants /
(SUBSTR(P.land_Area, 12) AS INT64) AS HydrantsPerSqMeter
FROM `data_commons.Place` AS P
JOIN CountyFireHydrantCount ON TRUE
WHERE
P.id = CountyFireHydrantCount.CountyId AND
STARTS_WITH(P.land_area, 'SquareMeter')
ORDER BY HydrantsPerSqMeter DESC
Page last updated: November 21, 2024 • Send feedback about this page