Key themes
- Data and Modeling Overview (Data and Modeling Overview)
- What it means to ‘model’ a phenomenon
- What is a variable
- What is a measurement
- Observation (direct) vs estimation (indirect)
- Interpolation as an instance of estimation (indirect)
- Sampling vs complete population-level observations
- Types of sampling, e.g., random, systematic, cluster
- Basic Data Analysis
- Summary Statistics (Data and Modeling Overview, Deep Dive into Data “Set”)
- Measures of central tendency and their benefits/drawbacks
- E.g., Mean, median, mode
- Measures of variability and their benefits/drawbacks
- E.g., Variance, range, standard deviation
- Discuss the limitations of summary statistics with examples
- E.g., Anscombe Quartet
- Aggregations
- When and why aggregations make sense or are necessary (Relating Data “Sets”, Mapping)
- How to aggregate two data sets into one (Relating Data “Sets”)
- How to aggregate data sets between levels of granularity (Mapping)
- Outliers (Plotting/Graphing)
- Why we care about outliers
- How to spot and handle outliers
- Techniques for Data Visualization (Plotting/Graphing)
- Plotting vs graphing
- Why we visualize data
- Types of plots
- Scatter, line, map, bar, histogram, etc.
- How to visualize 1-D vs 2-D vs N-D data
- Techniques for Statistical Modeling
- Distributions (Plotting/Graphing)
- Theoretical vs empirical (i.e., probability distributions vs frequency distributions)
- Normal/Gaussian distribution
- Properties of the normal distribution
- When it is appropriate to model using the normal distribution
- Exponential distribution
- Other distributions
- Bernoulli
- Binomial
- Correlations (Correlations)
- Linear vs quadratic vs exponential
- Degree and direction of correlation
- How to identify correlations
- Visually (e.g., scatter plot)
- Numerically (e.g., correlation coefficient)
- Common pitfalls, e.g., overgeneralization and Simpson’s paradox
- Correlation vs. causation
- Data Science in the Real World (Data and Modeling Overview)
- Lies, damned lies, and statistics
- Manipulative visuals
- Manipulative statistics
- Cognitive biases
- Survivorship bias
- False causality fallacy
- Other biases
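Interpolation as indirect estimation (Data and Modeling Overview) can be illustrated with a minimal Python sketch. It assumes straight-line (linear) interpolation between two observations. `linear_interpolate` is a hypothetical helper and the temperature readings are made-up values. Other interpolation methods make different assumptions about what happens between observations.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Estimate y at x from two observed points (x0, y0) and (x1, y1).

    Hypothetical helper: it assumes the quantity changes linearly between
    the two observations, which is itself a modeling assumption.
    """
    if x1 == x0:
        raise ValueError("x0 and x1 must differ")
    t = (x - x0) / (x1 - x0)  # fraction of the way from x0 to x1
    return y0 + t * (y1 - y0)


# Made-up readings: 14.0 degrees observed at hour 9, 20.0 at hour 12.
# The value at hour 10 was never measured; it is estimated, not observed.
estimate = linear_interpolate(9, 14.0, 12, 20.0, 10)
print(estimate)
```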
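The sampling types listed under Data and Modeling Overview can be sketched with Python's standard `random` module. This is a minimal illustration assuming a hypothetical population of 100 numbered units. Here the clusters are simply consecutive blocks of IDs; real clusters are usually natural groupings such as schools or city blocks.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 unit IDs
rng = random.Random(42)  # fixed seed so the example is reproducible

# Simple random sampling: every unit has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: pick a random start, then take every k-th unit.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Cluster sampling: split units into clusters, then keep whole clusters.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen = rng.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen for unit in cluster]

print(simple, systematic, cluster_sample, sep="\n")
```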
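The measures of central tendency and variability listed under Summary Statistics map directly onto functions in Python's standard `statistics` module. The sketch below uses a small made-up data set with one unusually large value to show a trade-off between the measures: the mean is pulled toward the extreme value while the median and mode are not.

```python
import statistics

# Made-up data set (e.g., household sizes) with one unusually large value.
data = [2, 3, 3, 4, 2, 3, 5, 3, 20]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
value_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)      # sample standard deviation

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range} variance={variance:.2f} stdev={std_dev:.2f}")
```

Here the mean is 5 while the median and mode are both 3, and the range (18) is driven entirely by the single value of 20. `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator; `statistics.pvariance` and `statistics.pstdev` are the population versions.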
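Anscombe's quartet, cited under Summary Statistics as a limitation of those statistics, can be checked numerically. The sketch below uses the quartet's published values and a hand-written `pearson` helper written for illustration (Python 3.10+ also offers `statistics.correlation`). All four data sets share nearly identical means, variances, and correlation coefficients.

```python
import statistics

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by data sets I-III
quartet = {
    "I": (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ssx = sum((a - mx) ** 2 for a in xs)
    ssy = sum((b - my) ** 2 for b in ys)
    return cov / (ssx * ssy) ** 0.5


# For each set: mean x, mean y, sample variance x, sample variance y, r.
summary = {}
for name, (xs, ys) in quartet.items():
    summary[name] = (
        statistics.fmean(xs), statistics.fmean(ys),
        statistics.variance(xs), statistics.variance(ys),
        pearson(xs, ys),
    )
    print(name, " ".join(f"{v:.3f}" for v in summary[name]))
```

Plotting the four sets tells a different story: a noisy linear trend, a smooth curve, a straight line with one outlier, and a vertical stack of points plus one extreme x value. The summary numbers alone do not reveal these shapes.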
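Aggregating between levels of granularity (the Aggregations topic) often means grouping fine-grained records by a coarser key and applying a summary function. The following minimal standard-library sketch assumes made-up daily rainfall readings that are aggregated to months:

```python
from collections import defaultdict
from datetime import date

# Made-up daily rainfall readings (mm) at a single station.
daily = [
    (date(2024, 1, 30), 4.0),
    (date(2024, 1, 31), 0.0),
    (date(2024, 2, 1), 12.5),
    (date(2024, 2, 2), 3.5),
]

monthly_total = defaultdict(float)
monthly_count = defaultdict(int)
for day, mm in daily:
    key = (day.year, day.month)  # the coarser level of granularity
    monthly_total[key] += mm
    monthly_count[key] += 1

monthly_mean = {key: monthly_total[key] / monthly_count[key] for key in monthly_total}
print(dict(monthly_total))
print(monthly_mean)
```

The choice of summary function is part of the modeling decision. A total suits rainfall, whereas an average would suit temperature. Libraries such as pandas provide the same pattern through `groupby`.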
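One common way to spot outliers numerically is the 1.5 × IQR rule (Tukey's fences), which flags values far outside the middle half of the data. The sketch below shows one convention among several and uses a small made-up data set. `iqr_outliers` is a hypothetical helper.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [2, 3, 3, 4, 2, 3, 5, 3, 20]  # made-up values
print(iqr_outliers(data))
```

Flagged values deserve investigation rather than automatic deletion, because an outlier may be a data-entry error or a genuine, informative observation. `statistics.quantiles` requires Python 3.8+, and its default `method="exclusive"` is only one of several quartile conventions, so fences can differ slightly between tools.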
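The difference between a theoretical (probability) distribution and an empirical (frequency) distribution, along with the properties of the normal distribution, can be illustrated with `statistics.NormalDist` from Python's standard library (3.8+). The sketch computes the theoretical 68-95-99.7 rule and compares it with a simulated sample. The seed and sample size are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean, stdev

# Theoretical: standard normal N(0, 1); probability within k standard deviations.
standard = NormalDist(mu=0, sigma=1)
rule = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, p in rule.items():
    print(f"P(|Z| < {k}) = {p:.4f}")

# Empirical: frequencies observed in a finite random sample.
rng = random.Random(0)  # fixed seed for reproducibility
sample = [rng.gauss(0, 1) for _ in range(10_000)]
share_within_1 = sum(abs(z) < 1 for z in sample) / len(sample)
print(f"sample mean={fmean(sample):.3f}, sd={stdev(sample):.3f}, "
      f"share within 1 sd={share_within_1:.3f}")

# Fitting a normal model to data estimates mu and sigma from the sample.
fitted = NormalDist.from_samples(sample)
print(f"fitted: mu={fitted.mean:.3f}, sigma={fitted.stdev:.3f}")
```

A normal model is usually a reasonable choice when data are roughly symmetric and single-peaked, for example sums or averages of many small independent effects. Heavily skewed or bounded quantities usually call for a different distribution.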
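The exponential distribution is commonly used to model waiting times between independent events that occur at a constant average rate. The minimal sketch below assumes a hypothetical rate of 0.5 events per minute. It compares the theoretical mean (1 / rate) and the CDF with a simulated sample drawn by `random.expovariate`:

```python
import math
import random
from statistics import fmean

rate = 0.5  # hypothetical: on average 0.5 events per minute
rng = random.Random(1)  # fixed seed for reproducibility
waits = [rng.expovariate(rate) for _ in range(10_000)]

# Theoretical mean waiting time is 1 / rate.
print(f"theoretical mean={1 / rate:.2f}, sample mean={fmean(waits):.2f}")

# Theoretical CDF: P(T <= t) = 1 - exp(-rate * t); compare with the sample.
t = 3.0
theoretical_cdf = 1 - math.exp(-rate * t)
empirical_cdf = sum(w <= t for w in waits) / len(waits)
print(f"P(T <= {t}) theoretical={theoretical_cdf:.3f}, empirical={empirical_cdf:.3f}")
```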
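The Bernoulli and binomial distributions listed under Other distributions can be written directly from their probability mass functions. Below is a minimal sketch using `math.comb` (Python 3.8+). The coin-flip scenario is illustrative, and `bernoulli_pmf` and `binomial_pmf` are hypothetical helper names.

```python
from math import comb


def bernoulli_pmf(k, p):
    """P(X = k) for one trial that succeeds (k = 1) with probability p."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    return p if k == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative scenario: 10 flips of a fair coin.
probs = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print(f"P(5 heads) = {probs[5]:.4f}")  # comb(10, 5) / 2**10 = 252 / 1024
print(f"sum over k = {sum(probs):.4f}")
```

A Bernoulli distribution is the special case of a binomial with n = 1, so `binomial_pmf(1, 1, p)` equals `bernoulli_pmf(1, p)`.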
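The numerical side of identifying correlations (degree and direction) can be sketched with a hand-written Pearson correlation coefficient. Pearson's r is one of several coefficients; Spearman's rank correlation is another. The made-up data below show a perfect positive relationship, a perfect negative relationship, and a perfect but nonlinear (quadratic) relationship:

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient r, which lies in [-1, 1]."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ssx = sum((a - mx) ** 2 for a in xs)
    ssy = sum((b - my) ** 2 for b in ys)
    return cov / (ssx * ssy) ** 0.5


xs = [-3, -2, -1, 0, 1, 2, 3]
r_pos = pearson(xs, [2 * x + 1 for x in xs])  # positive linear: r = 1.0
r_neg = pearson(xs, [-x for x in xs])         # negative linear: r = -1.0
r_quad = pearson(xs, [x * x for x in xs])     # quadratic, symmetric: r = 0.0
print(r_pos, r_neg, r_quad)
```

The quadratic case returns r = 0 even though y is fully determined by x, because Pearson's r measures only linear association. This is one reason to look at a scatter plot alongside the coefficient.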
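Simpson's paradox, listed among the correlation pitfalls, occurs when a trend that holds within every group reverses once the groups are combined. The counts below are made up for illustration. Treatment A has the higher recovery rate in both the mild and severe groups, yet the lower rate overall, because A was given mostly to severe cases.

```python
# Made-up counts for illustration: (recovered, treated) per severity group.
counts = {
    "A": {"mild": (19, 20), "severe": (40, 80)},
    "B": {"mild": (72, 80), "severe": (9, 20)},
}

group_rates = {}
overall_rates = {}
for treatment, groups in counts.items():
    # Recovery rate within each severity group.
    group_rates[treatment] = {g: rec / n for g, (rec, n) in groups.items()}
    # Pooled rate that ignores severity (the confounding variable).
    total_recovered = sum(rec for rec, _ in groups.values())
    total_treated = sum(n for _, n in groups.values())
    overall_rates[treatment] = total_recovered / total_treated
    print(treatment, group_rates[treatment], f"overall={overall_rates[treatment]:.2f}")
```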
Page last updated: December 17, 2024