Biostatistics Cheat Sheet

The core ideas of Biostatistics distilled into a single, scannable reference — perfect for review or quick lookup.

PiqCue — piqcue.com/biostatistics/cheatsheet

Quick Reference

Hypothesis Testing

A formal statistical procedure for deciding whether observed data provide sufficient evidence to reject a null hypothesis. It involves setting a significance level (alpha), computing a test statistic, and comparing the resulting p-value to alpha to draw conclusions.

P-Value

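The probability, computed under the assumption that the null hypothesis is true, of observing results as extreme as, or more extreme than, those actually observed. A small p-value means such data would be unlikely if the null hypothesis held; it is not the probability that the null hypothesis is true, nor the probability that the result arose by chance.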
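
As a rough illustration of computing a p-value and comparing it to alpha, the sketch below runs a two-sided one-sample z-test. It assumes the population standard deviation is known, and the numbers (sample mean 103, null mean 100, sigma 15, n = 100) are hypothetical; the function name `z_test_p_value` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean, null_mean, sigma, n):
    """Two-sided p-value for a one-sample z-test (sigma assumed known)."""
    # Test statistic: distance from the null mean in standard-error units.
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least this extreme in either tail.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical numbers, for illustration only.
z, p = z_test_p_value(sample_mean=103.0, null_mean=100.0, sigma=15.0, n=100)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}, reject H0 at alpha={alpha}: {p < alpha}")
```

When sigma is estimated from the sample, a t-test is usually the more appropriate choice; the z-test is used here only because it needs nothing beyond the standard library.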

Confidence Interval

A range of values, derived from sample data, produced by a procedure that captures the true population parameter in a specified proportion of repeated samples (the confidence level, commonly 95%). It communicates both the estimate and the uncertainty around it.
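
A minimal sketch of an interval for a mean, assuming a normal approximation (for small samples a t-based interval would typically be wider and more appropriate). The data values and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical systolic blood pressure readings (mmHg).
values = [118, 125, 130, 121, 135, 128, 122, 131, 127, 124]
low, high = mean_confidence_interval(values)
print(f"mean = {mean(values):.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```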

Randomized Controlled Trial (RCT)

An experimental study design in which participants are randomly assigned to treatment or control groups to minimize bias and confounding. RCTs are considered the gold standard for establishing causal relationships in clinical research.
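
Random assignment can be sketched as a shuffle followed by a split. This is simple 1:1 randomization only; real trials commonly use block or stratified schemes with concealed allocation. The participant IDs, the seed, and the function name `randomize` are hypothetical.

```python
import random

def randomize(participants, seed=None):
    """Simple 1:1 randomization: shuffle the list, then split it in half."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs P01..P20.
ids = [f"P{i:02d}" for i in range(1, 21)]
groups = randomize(ids, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```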

Survival Analysis

A set of statistical methods for analyzing time-to-event data, where the outcome of interest is the time until an event such as death, disease recurrence, or equipment failure occurs. It handles censored observations where the event has not yet occurred.

Logistic Regression

A regression method used when the outcome variable is binary (e.g., disease present or absent). It models the log-odds of the outcome as a linear function of predictor variables and produces odds ratios as measures of association.
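
To make the log-odds model concrete, the sketch below converts a linear predictor into a probability and a coefficient into an odds ratio. It does not fit a model; the intercept (-2.0) and coefficient (0.7) are hypothetical values standing in for fitted estimates.

```python
from math import exp

def predicted_probability(intercept, coefs, x):
    """Probability of the outcome given predictors x under a logistic model."""
    # Linear predictor on the log-odds scale.
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    # Inverse logit maps log-odds back to a probability in (0, 1).
    return 1 / (1 + exp(-log_odds))

def odds_ratio(coef):
    """Odds ratio for a one-unit increase in the predictor."""
    return exp(coef)

# Hypothetical fitted values: intercept -2.0, coefficient 0.7 for a binary exposure.
p_unexposed = predicted_probability(-2.0, [0.7], [0])
p_exposed = predicted_probability(-2.0, [0.7], [1])
print(f"P(unexposed) = {p_unexposed:.3f}, P(exposed) = {p_exposed:.3f}")
print(f"odds ratio = {odds_ratio(0.7):.2f}")
```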

Multiple Testing Correction

Statistical adjustments made when performing many simultaneous hypothesis tests to control the overall probability of false positives. Common methods include the Bonferroni correction and the Benjamini-Hochberg procedure for controlling the false discovery rate.
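
The two methods named above can be sketched in a few lines. Bonferroni compares each p-value to alpha / m; Benjamini-Hochberg finds the largest rank k whose sorted p-value is at most (k / m) * alpha and rejects every hypothesis up to that rank. The p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= (k / m) * alpha.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            rejected[i] = True
    return rejected

# Hypothetical p-values from eight simultaneous tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Bonferroni rejections:", sum(bonferroni(p_values)))
print("BH rejections:", sum(benjamini_hochberg(p_values)))
```

On these numbers Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which reflects the usual trade-off: FDR control is less conservative than family-wise control.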

Power Analysis

A method used to determine the sample size needed for a study to detect a meaningful effect with a specified probability (statistical power, typically 80% or 90%). It depends on the expected effect size, significance level, and variability in the data.
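
For a two-group comparison of means, a common normal-approximation formula is n per group = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2. The sketch below applies it, assuming equal group sizes, a shared standard deviation, and a two-sided test; the effect size (5) and standard deviation (10) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference delta (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Hypothetical planning values: detect a 5-unit difference when sigma is 10.
n80 = sample_size_per_group(delta=5, sigma=10, power=0.80)
n90 = sample_size_per_group(delta=5, sigma=10, power=0.90)
print(f"n per group: {n80} (80% power), {n90} (90% power)")
```

Exact t-based calculations usually give a slightly larger n, so this is best read as a planning approximation.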

Confounding Variable

A variable that is associated with both the exposure and the outcome, potentially distorting the observed relationship between them. Failure to account for confounders can lead to biased estimates of effect.
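
A small numerical sketch of how a confounder can distort a crude estimate: in the hypothetical counts below, the exposure has no effect within either stratum (risk ratio 1), yet the pooled data suggest a risk ratio above 2 because exposed people are concentrated in the high-risk stratum.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# Hypothetical counts: (exposed cases, exposed total, unexposed cases, unexposed total).
low_risk_stratum = (5, 100, 20, 400)
high_risk_stratum = (80, 400, 20, 100)

# Crude table: add the two strata cell by cell, ignoring the confounder.
crude = tuple(a + b for a, b in zip(low_risk_stratum, high_risk_stratum))

print(f"low-risk stratum RR  = {risk_ratio(*low_risk_stratum):.2f}")
print(f"high-risk stratum RR = {risk_ratio(*high_risk_stratum):.2f}")
print(f"crude RR             = {risk_ratio(*crude):.2f}")
```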

Kaplan-Meier Estimator

A nonparametric statistic used to estimate the survival function from time-to-event data. It accounts for censored observations and produces a step-function survival curve showing the probability of surviving past each observed event time.
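
The product-limit calculation can be sketched as follows: at each event time, multiply the running survival probability by (1 - deaths / number at risk). The follow-up times and event indicators are hypothetical, and this sketch follows the usual convention that subjects censored at a given time still count as at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        n_at_risk -= removed
    return curve

# Hypothetical follow-up times (months); 0 marks a censored observation.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t}: S(t) = {s:.3f}")
```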

Key Terms at a Glance

Alpha (Significance Level): The pre-specified probability threshold (commonly 0.05) for rejecting the null hypothesis. It represents the maximum acceptable probability of committing a Type I error.

Analysis of Variance (ANOVA): A statistical method for comparing means across three or more groups by partitioning the total variability in data into between-group and within-group components.
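
The variance partition behind one-way ANOVA can be sketched directly: the F statistic is the between-group mean square divided by the within-group mean square. The three groups are hypothetical, and turning F into a p-value needs the F distribution, which is outside the standard library and omitted here.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, SS_between, SS_within) for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variability: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, ss_between, ss_within

# Hypothetical measurements from three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, ss_between, ss_within = one_way_anova(groups)
print(f"F = {f_stat:.2f}, SS_between = {ss_between:.1f}, SS_within = {ss_within:.1f}")
```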

Bayesian Inference: A statistical framework that updates the probability of a hypothesis as new evidence is obtained, using Bayes' theorem to combine prior beliefs with observed data to produce posterior probabilities.
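
A familiar worked case of Bayes' theorem is updating disease probability after a positive diagnostic test. The sketch below assumes a prior (prevalence) of 1%, sensitivity of 90%, and specificity of 95%; all three numbers are hypothetical.

```python
def posterior_probability(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_positive = sensitivity * prior                # P(positive and disease)
    false_positive = (1 - specificity) * (1 - prior)   # P(positive and no disease)
    return true_positive / (true_positive + false_positive)

# Hypothetical test characteristics and prevalence.
posterior = posterior_probability(prior=0.01, sensitivity=0.90, specificity=0.95)
print(f"P(disease | positive) = {posterior:.3f}")
```

Even with a fairly accurate test, the posterior stays modest here because the prior is low, which is the prior-plus-data combination the definition describes.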

Bias: A systematic error that leads to an incorrect estimate of the association between exposure and outcome. Common types include selection bias, information bias, and confounding.

Blinding: A procedure in clinical trials where participants, investigators, or both are kept unaware of treatment assignments to prevent bias in treatment delivery and outcome assessment.

Censoring: A situation in survival analysis where the time to the event of interest is incompletely observed, often because the study ended or the participant was lost to follow-up.

Clinical Trial Phases: The sequential stages of testing a new therapy: Phase I (safety and dosage), Phase II (efficacy and side effects), Phase III (large-scale comparison with standard treatment), and Phase IV (post-marketing surveillance).

Confounding: Distortion of the estimated association between an exposure and an outcome caused by a third variable that is related to both.

Cox Proportional Hazards Model: A semi-parametric regression model used in survival analysis to estimate the effect of covariates on the hazard rate while assuming proportional hazards over time.

Cross-Sectional Study: An observational study that collects data on exposure and outcome at a single point in time. It measures prevalence but cannot establish temporal relationships or causation.

Effect Size: A quantitative measure of the magnitude of a phenomenon or treatment effect, independent of sample size. Common measures include Cohen's d, odds ratios, and relative risks.
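
Cohen's d, one of the measures named above, is the difference in group means divided by the pooled standard deviation. A minimal sketch with hypothetical data:

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Hypothetical outcome scores for two groups.
treated = [2, 4, 6]
control = [1, 3, 5]
d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f}")
```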

Epidemiology: The study of the distribution, determinants, and frequency of disease in human populations. Biostatistics provides the analytical tools used in epidemiological research.

False Discovery Rate (FDR): The expected proportion of false positives among all rejected null hypotheses. Controlling FDR is an alternative to controlling the family-wise error rate in multiple testing scenarios.

Incidence Rate: The number of new cases of a disease occurring per unit of person-time at risk in a defined population over a specified period.
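
As a quick arithmetic sketch, the rate is new cases divided by total person-time at risk, usually scaled to a convenient denominator. The counts (12 cases over 4,800 person-years) are hypothetical.

```python
def incidence_rate(new_cases, person_time, per=1000):
    """New cases per `per` units of person-time at risk."""
    return new_cases * per / person_time

# Hypothetical cohort: 12 new cases observed over 4,800 person-years.
rate = incidence_rate(new_cases=12, person_time=4800)
print(f"{rate:.1f} cases per 1,000 person-years")
```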

Intention-to-Treat (ITT): An analysis approach in RCTs where all participants are analyzed in the group to which they were randomized, regardless of adherence, to preserve the integrity of randomization.
