# Modules

All of the available course content can be found on this page, which is composed of two sections:

- Overviews summarizes the available modules and lists the modules in progress
- Tables of Contents provides the table of contents for each available module

If you have trouble accessing this content, or would like to contribute by helping us add more content, please email us at courses@datacommons.org.

## Overviews

**Module 1: Data Overview**

This module is meant to serve as the introductory class or classes in a data literacy course. It explains why data literacy matters, introduces key concepts such as measurements, variables, and models, and defines some basic descriptive statistics and data visualization techniques.

**Module 2: Deep Dive into Data “Set”**

Introduces students to the idea of a data “set” as well as some basic terms and statistical measures (e.g., mean and standard deviation), while highlighting important caveats and takeaways when drawing conclusions from these measures.

**Module 3: Relating Data “Sets”**

Introduces students to the idea of related data sets and describes how to merge data sets into one large data collection. Real data sets are analyzed in depth to demonstrate the benefits of building a nuanced, complete picture of the world.

**Module 4: Plotting/Graphing**

Introduces students to some common plots and graphs and discusses when to use each depending on the data and the context. It also defines and explores various distributions with analysis of their properties and examples from the real world.

**Module 5: Correlations**

Introduces students to the idea of correlation between two variables and describes how to observe, quantify, and label correlations, with a focus on linear correlations. Strategies for distinguishing real correlations from noise are provided, and common pitfalls, such as confusing correlation with causation, are discussed in depth.

**Module 6: Mapping**

Introduces students to the idea of geographic data and discusses how to visualize and aggregate geographic data.

**Module 7: Time Series**

Coming soon!

## Tables of Contents

**Module 1: Data Overview**

- Lies, Damn Lies, and Statistics
- Why is Data Literacy Important?
- Manipulating Visuals
- Misleading Statistics
- Cognitive Biases
- Survivorship Bias
- False Causality Fallacy
- Other Biases

- Truth, Damn Truths, and Statistics
- Data Is Powerful
- So, What is Data, Anyway?
- Variables
- Measurements
- Direct Observation
- Estimation
- Interpolation and Extrapolation

- Population
- Sampling
- Random Sampling
- Systematic Sampling
- Convenience Sampling
- Cluster Sampling

- Models

- The Big Picture
- Communicating Data
- Descriptive Statistics
- Mode
- Median
- Mean
- Standard Deviation
- Data Visualization
- Bar Chart
- Line Chart
- Scatter Plot
- Histogram
- Map Plot

**Module 2: Deep Dive into Data “Set”**

- Where Do Data Sets Come From?
- Examples of Datasets
- Exploring Datasets
- Variables & Dimensions
- Categorical vs Numeric Variables

- Number of Observations
- Exercise: Dice Rolls
- Missing Observations

- Summary Statistics
- Measures of Central Tendency
- One Measure to Rule Them All?
- Mean
- Median
- Mode

- One and Done?
- Three and Complete?
- Measures of Variability
- Range and Standard Deviation

- Assignments
- Explore, Analyze and Compare Two Datasets

**Module 3: Relating Data “Sets”**

- What Are Related Data Sets?
- Examples of Related Data Sets
- What About “Unrelated” Data Sets?
- Exercise: Finding Relationships

- How Do We Combine Related Data Sets?
- Visualizing Data Sets
- Examples of Tabular Data
- Basic Data Combinations
- Adding Rows
- Adding Columns
- More Complicated Combinations
- Exercise: Find the Combination

- Analyzing Combined Data
- Why Do We Combine Related Data Sets?
- Case Studies
- Blood Pressure vs. Age
- Simpson’s Paradox
- Global Covid Data
- Which Country is the “Best”?

- Exercise: Exploring New Places
- Exercise: Where to Live?

**Module 4: Plotting/Graphing**

- Plot vs. Graph
- Why Do We Plot?
- Common Plots
- Scatter
- Line
- Map
- Bar
- Histogram

- 1-D vs. 2-D vs. N-D
- Exercise: Exploring Plots and Graphs

- Outliers
- Where Do Outliers Come From?
- How Do We Handle Outliers?

- Advanced Topic: Introduction to Distributions
- Normal Distribution
- Definition
- Properties
- When is Data Normal?
- Exercise: Is It Normal?
- Exceptions to the Rule

- Exponential
- Other Distributions (Optional)
- Bernoulli
- Binomial

**Module 5: Correlations**

- What is Correlation?
- Linear, Quadratic, and Exponential Correlations
- Direction of Correlation: Positive and Negative
- Degree of Correlation

- Identifying Correlations
- Visual: Scatter Plot
- Line of Best Fit
- Numerical: Correlation Coefficient

- Examples of Correlated/Uncorrelated Variables
- Exercise: Spotting Correlations
- Exercise: Correlations in the Wild
- Common Pitfalls
- Lack of Data
- Simpson’s Paradox

- Important Outliers
- Weighted Correlation Coefficient

- Misleading Correlations
- Overgeneralization
- Correlation and Causation
- What is Causation?
- Correlation vs Causation
- What’s the Difference?
- Can We Have…
- Confounding Variable
- Case Studies

**Module 6: Mapping**

- Examples of Geographic Data
- Levels of Granularity
- Latitude/Longitude Coordinates
- Mini-Exercise: Getting Used to Lat/Lon
- Numerical vs. Categorical Granularity

- Two Places, One Name? One Place, Two Names?
- Change Over Time
- Change Across Borders

- Visualizing Geographic Data: Map Plots
- Interpreting Geographic Data
- Analyzing Geographic Data
- Converting between Levels of Granularity
- Mapping Step
- Aggregation Step
- Aggregation by Summing
- Aggregation by Averaging
- Aggregation by Something Else?
- Exercise: Aggregation by You

- Converting between Types of Granularity

**Module 7: Time Series**

Coming soon!