Statistics — Distribution continuous, Outliers interquartile (extended) Cheat Sheet
The core ideas of Statistics — Distribution continuous, Outliers interquartile (extended) distilled into a single, scannable reference — perfect for review or quick lookup.
Quick Reference
Mean, Median, and Mode
The three primary measures of central tendency. The mean is the arithmetic average, the median is the middle value when data are ordered, and the mode is the most frequently occurring value. Each measure captures a different aspect of a dataset's center.
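As a small illustration of how the three measures can differ on the same data, the sketch below uses Python's standard `statistics` module. The `scores` list is a made-up sample, not data from this sheet.

```python
import statistics

# Hypothetical sample: seven exam scores, one of them unusually high.
scores = [70, 72, 72, 75, 78, 80, 99]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)
```

Here the mean is 78, the median is 75, and the mode is 72: the single high score pulls the mean upward while the median and mode stay put.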
Standard Deviation
A measure of the spread or dispersion of a dataset relative to its mean. It is calculated as the square root of the variance, which is the average of squared deviations from the mean (dividing by $n$ for a full population, or by $n - 1$ when estimating from a sample). A low standard deviation indicates data points cluster near the mean, while a high value indicates greater spread.
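The calculation can be sketched step by step and checked against the standard library. The `data` values are illustrative only. The sketch shows both the population form (divide by $n$) and the sample form (divide by $n - 1$).

```python
import math
import statistics

# Hypothetical data set chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: average of the squared deviations.
population_variance = sum(squared_deviations) / n
population_sd = math.sqrt(population_variance)

# Sample variance: divide by n - 1 instead of n.
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))

# The standard library gives the same results.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```

For this data the mean is 5, the population variance is 4, and the population standard deviation is 2. The sample standard deviation is slightly larger.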
Normal Distribution
A symmetric, bell-shaped probability distribution defined by its mean $\mu$ and standard deviation $\sigma$. It is fundamental to statistics because of the Central Limit Theorem, which states that the distribution of sample means approaches a normal distribution as the sample size grows, regardless of the population's shape, provided the observations are independent and the population variance is finite. For normally distributed data, approximately 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
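The 68%, 95%, and 99.7% figures can be reproduced from the normal cumulative distribution function. This is a minimal sketch using `statistics.NormalDist` (Python 3.8 or later). The helper name `within_k_sd` is made up for this example.

```python
from statistics import NormalDist

def within_k_sd(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
```

The result does not depend on the particular $\mu$ and $\sigma$ passed in, only on $k$.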
Hypothesis Testing
A formal procedure for using sample data to evaluate claims about a population. The process involves stating a null hypothesis (no effect or no difference) and an alternative hypothesis, calculating a test statistic, and determining whether the evidence is strong enough to reject the null hypothesis at a chosen significance level.
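One concrete instance of this procedure is a two-sided one-sample z-test, sketched below under the assumption that the population standard deviation is known. The function name, the `weights` sample, and the values `mu0=50` and `sigma=5` are all hypothetical. Other tests (t-tests, chi-square tests, and so on) follow the same outline with a different test statistic.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0, with known sigma (an assumption)."""
    n = len(sample)
    # Test statistic: how many standard errors the sample mean is from mu0.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Reject H0 when the p-value falls below the significance level.
    return z, p_value, p_value < alpha

# Hypothetical measurements; H0 says the population mean is 50.
weights = [52, 55, 49, 58, 54, 53, 57, 56, 51, 55]
z, p_value, reject = one_sample_z_test(weights, mu0=50, sigma=5)
print(round(z, 2), round(p_value, 4), reject)
```

With these numbers the sample mean is 54, the z statistic is about 2.53, and the null hypothesis is rejected at the 0.05 level.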
P-Value
The probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true. A small p-value suggests that the observed data are unlikely under the null hypothesis, providing evidence against it. It does not measure the probability that the null hypothesis is true.
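For a discrete case, a p-value can be computed exactly by summing probabilities under the null hypothesis. The sketch below assumes a hypothetical coin-flip experiment (9 heads in 10 flips) with a fair-coin null. The function name is invented for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p0=0.5):
    """One-sided p-value: P(X >= k) when X ~ Binomial(n, p0) under the null."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical experiment: 9 heads observed in 10 flips; H0 says the coin is fair.
p_one_sided = binomial_upper_tail(9, 10)

# With p0 = 0.5 the distribution is symmetric, so the two-sided
# p-value is twice the one-sided tail.
p_two_sided = 2 * p_one_sided
print(p_one_sided, p_two_sided)
```

The one-sided value is $11/1024 \approx 0.0107$: the chance of 9 or 10 heads from a fair coin. It says nothing about the probability that the coin is fair.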
Confidence Intervals
A range of values, computed from sample data, that estimates a population parameter at a specified confidence level. A 95% confidence interval means that if the same sampling procedure were repeated many times, approximately 95% of the constructed intervals would contain the true parameter. It does not mean there is a 95% probability that the parameter lies inside any single computed interval.
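The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under stated assumptions: normally distributed data, a known population standard deviation (so a z-interval applies), and made-up values for `true_mu`, `sigma`, and `n`.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    center = mean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return center - half_width, center + half_width

def coverage(true_mu=10.0, sigma=2.0, n=30, trials=5000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mu <= high
    return hits / trials

print(coverage())
```

The printed fraction should land close to 0.95. Each individual interval either contains `true_mu` or it does not; the 95% describes the long-run behavior of the procedure.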
Regression Analysis
A set of statistical methods for estimating the relationship between a dependent variable and one or more independent variables. Simple linear regression fits a straight line to the data, while multiple regression and nonlinear regression handle several predictors or curved relationships. It is widely used for prediction and for exploring which factors are associated with an outcome; a fitted regression does not by itself establish causation.
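For the simplest case, one predictor and a straight line, the ordinary least-squares fit has a closed form. The sketch below implements it directly. The function name and the `x`, `y` values are illustrative assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying close to the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
prediction_at_6 = intercept + slope * 6
print(slope, intercept, prediction_at_6)
```

For this data the slope is approximately 1.99 and the intercept approximately 0.05, so the predicted value at $x = 6$ is about 11.99.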
Correlation
A statistical measure that quantifies the strength and direction of the linear relationship between two variables. The Pearson correlation coefficient ranges from $-1$ (perfect negative correlation) to $+1$ (perfect positive correlation), with 0 indicating no linear relationship. Correlation does not imply causation.
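The point that a coefficient of 0 rules out only a linear relationship is easy to demonstrate. In this sketch, `pearson_r` is a hand-written helper and the data are made up: one series is an exact linear function of `x`, the other an exact quadratic.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
linear = [2 * xi + 1 for xi in x]   # exact straight line
quadratic = [xi ** 2 for xi in x]   # exact curve, symmetric about x = 0

r_linear = pearson_r(x, linear)
r_quadratic = pearson_r(x, quadratic)
print(r_linear, r_quadratic)
```

The linear series gives a coefficient of 1, while the quadratic series gives 0 even though it is completely determined by `x`.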
Sampling Methods
Techniques for selecting a subset of individuals from a population to estimate characteristics of the whole group. Common methods include simple random sampling, stratified sampling, cluster sampling, and systematic sampling. Proper sampling is essential for making valid inferences and avoiding bias.
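The four methods named above can be sketched with Python's `random` module. Everything here is illustrative: the `population` of 100 records, the field names `stratum` and `classroom`, and the helper functions are assumptions made for the example, and real survey designs involve more care (weights, unequal cluster sizes, and so on).

```python
import random

# Hypothetical population: 100 students in 10 classrooms and 3 strata.
population = [
    {"id": i,
     "stratum": "A" if i < 60 else "B" if i < 90 else "C",
     "classroom": i // 10}
    for i in range(100)
]

def group_by(units, field):
    """Group records into lists keyed by the value of one field."""
    groups = {}
    for unit in units:
        groups.setdefault(unit[field], []).append(unit)
    return groups

def simple_random_sample(units, n, rng):
    # Every subset of size n is equally likely.
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    # Random starting point, then every step-th unit.
    step = len(units) // n
    start = rng.randrange(step)
    return units[start::step][:n]

def stratified_sample(units, field, fraction, rng):
    # Sample the same fraction independently inside each stratum.
    sample = []
    for members in group_by(units, field).values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(units, field, n_clusters, rng):
    # Pick whole clusters at random and keep every unit in them.
    clusters = group_by(units, field)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for label in chosen for unit in clusters[label]]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, "stratum", 0.10, rng)
clustered = cluster_sample(population, "classroom", 2, rng)
```

With a 10% fraction, the stratified sample takes 6, 3, and 1 units from strata of size 60, 30, and 10, so each stratum is represented in proportion to its size. The cluster sample returns all 20 students from two randomly chosen classrooms.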
Bayesian Statistics
An approach to statistics that incorporates prior knowledge or beliefs along with observed data to update the probability of a hypothesis. Using Bayes' theorem, the posterior probability is calculated by combining the prior probability with the likelihood of the observed data. This framework is especially useful when prior information is available or sample sizes are small.
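A single application of Bayes' theorem for a yes/no hypothesis can be sketched as follows. The scenario (a diagnostic test) and all three input numbers are hypothetical, chosen only to show how a prior is updated by evidence.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(H | data) from P(H), P(data | H), and P(data | not H) via Bayes' theorem."""
    # Total probability of the data under both possibilities.
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

# Hypothetical diagnostic test:
#   1% of people have the condition (prior),
#   the test is positive for 95% of those who have it,
#   and positive for 5% of those who do not.
p_condition_given_positive = posterior(0.01, 0.95, 0.05)
print(round(p_condition_given_positive, 3))

# A posterior can serve as the prior for the next piece of evidence,
# assuming the two test results are independent given the true status.
p_after_second_positive = posterior(p_condition_given_positive, 0.95, 0.05)
```

Even with a fairly accurate test, the posterior after one positive result is only about 0.161, because the low prior dominates. A second independent positive result raises it substantially.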
Key Terms at a Glance