# Key Themes

• Data and Modeling Overview [Data and Modeling Overview]
• What it means to ‘model’ a phenomenon
• What is a variable
• What is a measurement
• Observation (direct) vs estimation (indirect)
• Interpolation as an instance of estimation (indirect)
• Sampling vs complete (population-level) observation
• Types of sampling (e.g., random, systematic, cluster)
• Basic Data Analysis
• Summary Statistics [Data and Modeling Overview, Deep Dive into Data “Set”]
• Measures of central tendency and their benefits/drawbacks
• E.g., Mean, median, mode
• Measures of variability and their benefits/drawbacks
• E.g., Variance, range, standard deviation
• Discuss the limitations of summary statistics with examples
• E.g., Anscombe's quartet
• Aggregations
• Outliers [Plotting/Graphing]
• Why we care about outliers
• How to spot and handle outliers
• Techniques for Data Visualization [Plotting/Graphing]
• Plotting vs graphing
• Why we visualize data
• Types of plots
• E.g., Scatter, line, map, bar, histogram
• How to visualize 1-D vs 2-D vs N-D Data
• Techniques for Statistical Modeling
• Theoretical vs empirical (i.e., probability distributions vs frequency distributions)
• Normal/Gaussian distribution
• Properties of the normal distribution
• When it is appropriate to model using the normal distribution
• Exponential distribution
• Other distributions
• Bernoulli
• Binomial
• Correlations [Correlations]
• Linear vs quadratic vs exponential relationships
• Degree and direction of correlation
• How to identify correlations
• Visually (e.g., scatter plot)
• Numerically (e.g., correlation coefficient)
• Common pitfalls such as overgeneralization, Simpson’s paradox, etc.
• Correlation vs. causation
• Data Science in the Real World [Data and Modeling Overview]
• Lies, damned lies, and statistics
• Manipulative visuals
• Manipulative statistics
• Cognitive biases
• Survivorship bias
• False cause fallacy
• Other biases
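
As a rough illustration of the summary-statistics, outlier, and correlation-coefficient themes listed above, the following minimal sketch uses only the Python standard library. The function names (`summarize`, `pearson_r`) and the sample values are hypothetical and chosen purely for illustration; they are not part of the outline itself.

```python
import statistics


def summarize(values):
    """Return common measures of central tendency and variability."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariation / (spread_x * spread_y)


# Made-up illustrative measurements, not real data.
clean = [10, 12, 11, 13, 12]
with_outlier = clean + [100]  # one extreme value appended

print("clean:       ", summarize(clean))
print("with outlier:", summarize(with_outlier))

# A perfectly linear, increasing relationship (assumed example data).
print("r =", pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

In this sketch the single extreme value shifts the mean and standard deviation substantially while leaving the median unchanged, which is one reason the choice of summary statistic matters when outliers are present.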