Key Themes
- Data and Modeling Overview [Data and Modeling Overview]
- What it means to ‘model’ a phenomenon
- What is a variable
- What is a measurement
- Observation (direct) vs estimation (indirect)
- Interpolation as an instance of estimation (indirect)
- Sampling vs complete (population-level) observation
- Types of sampling, e.g., random, systematic, cluster
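The sampling types listed above can be sketched with the Python standard library. This is a minimal illustration, not a prescribed method: the population of IDs 1 to 100, the ten equal-sized clusters, and all function names are hypothetical choices made for this example.

```python
import random


def simple_random_sample(population, k, seed=0):
    """Draw k items uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(population, k)


def systematic_sample(population, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return population[start::step]


def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


# Hypothetical population: 100 IDs, split into 10 clusters of 10.
population = list(range(1, 101))
clusters = {f"group_{g}": population[g * 10:(g + 1) * 10] for g in range(10)}

random_ids = simple_random_sample(population, 10)
systematic_ids = systematic_sample(population, step=10)
cluster_ids = cluster_sample(clusters, n_clusters=2)

print("random:    ", sorted(random_ids))
print("systematic:", systematic_ids)
print("cluster:   ", cluster_ids)
```

Each call returns a subset of the population; observing the full `population` list instead would correspond to a complete, population-level observation.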
- Basic Data Analysis
- Summary Statistics [Data and Modeling Overview, Deep Dive into Data “Set”]
- Measures of central tendency and their benefits/drawbacks
- Measures of variability and their benefits/drawbacks
- E.g., variance, range, standard deviation
- Discuss the limitations of summary statistics with examples
- Aggregations
- Outliers [Plotting/Graphing]
- Why we care about outliers
- How to spot and handle outliers
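One way to illustrate the summary statistics and outlier topics above is to compare the mean and median of a small data set containing a single extreme value, then flag that value numerically. The data are made up for this sketch, and the 1.5 × IQR fence (Tukey's rule) is just one common convention for spotting outliers, assumed here rather than taken from the outline.

```python
import statistics

# Hypothetical measurements; 95 is a deliberately planted outlier.
values = [12, 13, 13, 14, 15, 15, 16, 95]

mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.stdev(values)          # sample standard deviation
value_range = max(values) - min(values)

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

# Recompute the mean without the flagged points to see their influence.
kept = [v for v in values if v not in outliers]
mean_without_outliers = statistics.mean(kept)

print(f"mean={mean}, median={median}, range={value_range}, stdev={stdev:.2f}")
print(f"outliers={outliers}, mean without outliers={mean_without_outliers}")
```

The single extreme value pulls the mean well above the median, which is one concrete limitation of relying on the mean alone. Whether to drop, cap, or keep a flagged point depends on why it is extreme.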
- Techniques for Data Visualization [Plotting/Graphing]
- Plotting vs graphing
- Why we visualize data
- Types of plots
- Scatter, line, map, bar, histogram, etc.
- How to visualize 1-D vs 2-D vs N-D Data
- Techniques for Statistical Modeling
- Distributions [Plotting/Graphing (Advanced Topic)]
- Theoretical vs empirical (i.e., probability distributions vs frequency distributions)
- Normal/Gaussian distribution
- Properties of the normal distribution
- When it is appropriate to model using the normal distribution
- Exponential distribution
- Other distributions
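The theoretical vs empirical contrast above can be sketched for the normal distribution. The snippet below computes the theoretical share of values within 1, 2, and 3 standard deviations of the mean using `statistics.NormalDist`, then compares the 1-standard-deviation share with the frequency observed in simulated draws. The sample size, the seed, and the helper name `share_within` are assumptions made for this example.

```python
import random
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Theoretical probability of falling within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"theoretical share within {k} sd: {share_within(k):.4f}")

# Empirical counterpart: frequency observed in a simulated sample.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
empirical_share = sum(abs(x) <= 1 for x in sample) / len(sample)
print(f"empirical share within 1 sd:   {empirical_share:.4f}")
```

The theoretical shares are roughly 0.68, 0.95, and 0.997, and the empirical frequency should land close to the first figure without matching it exactly.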
- Correlations [Correlations]
- Linear vs quadratic vs exponential
- Degree and direction of correlation
- How to identify correlations
- Visually (e.g., scatter plot)
- Numerically (e.g., correlation coefficient)
- Common pitfalls such as overgeneralization and Simpson’s paradox
- Correlation vs causation
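To make the numerical side of the correlation topics concrete, here is a minimal sketch that computes Pearson's correlation coefficient by hand and uses it to show Simpson's paradox. The two groups and their values are invented for illustration, and `pearson_r` is a hypothetical helper written for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: within each group, y falls as x rises.
group_a_x, group_a_y = [1, 2, 3], [5, 4, 3]
group_b_x, group_b_y = [6, 7, 8], [10, 9, 8]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)

# Pooling the groups hides the grouping variable and flips the sign.
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"group A r = {r_a:.2f}, group B r = {r_b:.2f}, pooled r = {r_pooled:.2f}")
```

Both groups show a perfectly negative relationship, yet the pooled data show a clearly positive one. The coefficient measures only the degree and direction of a linear relationship, so a scatter plot colored by group is a useful companion check.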
- Data Science in the Real World [Data and Modeling Overview]
- Lies, damned lies, and statistics
- Manipulative visuals
- Manipulative statistics
- Cognitive biases
- Survivorship bias
- False causality fallacy
- Other biases