
Computational Statistics Glossary

25 essential terms — because precise language is the foundation of clear thinking in Computational Statistics.


Approximate Bayesian Computation (ABC): A likelihood-free inference method that accepts parameter values whose simulated data closely match the observed data under chosen summary statistics.
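
A minimal rejection-ABC sketch in pure Python. The Gaussian toy model, flat prior, sample-mean summary statistic, and tolerance are all illustrative assumptions, not part of the glossary entry:

```python
import random
import statistics

random.seed(0)

# Toy "observed" data: 50 draws from Normal(mu=2, sigma=1); mu is treated as unknown.
observed = [random.gauss(2.0, 1.0) for _ in range(50)]
obs_mean = statistics.fmean(observed)          # chosen summary statistic

accepted = []
epsilon = 0.2                                  # tolerance on the summary distance
while len(accepted) < 100:
    theta = random.uniform(-5.0, 5.0)          # candidate from a flat prior
    sim = [random.gauss(theta, 1.0) for _ in range(50)]   # simulate under theta
    if abs(statistics.fmean(sim) - obs_mean) < epsilon:   # likelihood-free accept
        accepted.append(theta)

abc_mean = statistics.fmean(accepted)          # approximate posterior mean, near 2
```

No likelihood is ever evaluated; closeness of simulated and observed summaries stands in for it.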

Bandwidth: A smoothing parameter in kernel density estimation that controls the width of each kernel and the trade-off between bias and variance.
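
As one concrete illustration, Silverman's rule of thumb gives a default bandwidth for a Gaussian kernel; the data below are made up:

```python
import random
import statistics

random.seed(17)
data = [random.gauss(0.0, 1.0) for _ in range(500)]
n = len(data)

# Silverman's rule of thumb for a Gaussian kernel: h = 1.06 * sigma_hat * n^(-1/5).
# Too small an h gives a spiky, high-variance estimate; too large oversmooths (bias).
h = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
```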

Bootstrap: A resampling method that draws repeated samples with replacement from observed data to estimate the sampling distribution of a statistic.
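
A short pure-Python sketch estimating the standard error of the mean; the data and the number of resamples are illustrative:

```python
import random
import statistics

random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(50)]

# Draw B bootstrap resamples (with replacement) and recompute the mean each time.
boot_means = []
for _ in range(2000):
    resample = random.choices(data, k=len(data))
    boot_means.append(statistics.fmean(resample))

# The spread of the bootstrap means estimates the standard error of the mean,
# which for the mean can be checked against the formula s / sqrt(n).
se_boot = statistics.stdev(boot_means)
se_theory = statistics.stdev(data) / len(data) ** 0.5
```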

Burn-in: The initial phase of an MCMC run whose samples are discarded because the chain has not yet converged to the target distribution.

Convergence diagnostics: Statistical tools used to assess whether an MCMC sampler has reached its stationary distribution, including trace plots, R-hat, and effective sample size.
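
A sketch of the basic (non-split) R-hat computation. For illustration the four "chains" are i.i.d. draws from the same distribution, so R-hat should land near 1; a value well above 1 would signal non-convergence:

```python
import random
import statistics

random.seed(14)
# Four "chains" of length 1000 drawn from the same distribution.
chains = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
n = 1000

W = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain variance
means = [statistics.fmean(c) for c in chains]
B = n * statistics.variance(means)                            # between-chain variance
var_plus = (n - 1) / n * W + B / n                            # pooled variance estimate
r_hat = (var_plus / W) ** 0.5                                 # close to 1 at convergence
```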

Cross-validation: A model evaluation strategy that repeatedly splits data into training and validation sets to estimate out-of-sample prediction performance.
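
A hand-rolled 5-fold example. The "model" is deliberately trivial, predicting the training mean, so the cross-validated MSE should track the data's variance; everything here is illustrative:

```python
import random
import statistics

random.seed(2)
ys = [random.gauss(5.0, 1.0) for _ in range(100)]

k = 5
fold = len(ys) // k
random.shuffle(ys)

mses = []
for i in range(k):
    valid = ys[i * fold:(i + 1) * fold]          # held-out validation fold
    train = ys[:i * fold] + ys[(i + 1) * fold:]  # remaining folds for training
    pred = statistics.fmean(train)               # "fit": predict the training mean
    mses.append(statistics.fmean((v - pred) ** 2 for v in valid))

cv_mse = statistics.fmean(mses)   # out-of-sample MSE estimate, near the variance (1.0)
```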

Expectation-Maximization (EM) algorithm: An iterative procedure for maximum likelihood estimation in the presence of latent variables, alternating between expectation and maximization steps.
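
A stripped-down EM sketch for a two-component Gaussian mixture, assuming (for brevity) known unit variances and equal weights so that only the two means are estimated; data and initial guesses are made up:

```python
import math
import random

random.seed(3)
# Data: mixture of Normal(-2, 1) and Normal(3, 1) with equal weights.
data = [random.gauss(-2, 1) if random.random() < 0.5 else random.gauss(3, 1)
        for _ in range(500)]

def normpdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

mu1, mu2 = -1.0, 1.0   # crude initial guesses for the two component means
for _ in range(50):
    # E-step: responsibility of component 1 for each point (expected latent label).
    r = [normpdf(x, mu1) / (normpdf(x, mu1) + normpdf(x, mu2)) for x in data]
    # M-step: update each mean as a responsibility-weighted average.
    mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / sum(1 - ri for ri in r)
```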

Ergodicity: A property of a Markov chain ensuring that time averages converge to ensemble averages, required for valid MCMC inference.

Gibbs sampling: An MCMC algorithm that samples each variable from its full conditional distribution given the current values of all other variables.
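
A classic two-variable sketch: a Gibbs sampler for a standard bivariate normal, where both full conditionals are known Gaussians. The correlation value and chain lengths are illustrative:

```python
import random
import statistics

random.seed(4)
rho = 0.8                        # target: standard bivariate normal, correlation 0.8
sd = (1 - rho ** 2) ** 0.5       # conditional standard deviation
x = y = 0.0
xs, ys = [], []
for i in range(20_000):
    x = random.gauss(rho * y, sd)   # x | y  ~  N(rho * y, 1 - rho^2)
    y = random.gauss(rho * x, sd)   # y | x  ~  N(rho * x, 1 - rho^2)
    if i >= 1_000:                  # discard burn-in
        xs.append(x)
        ys.append(y)

# The sample correlation should approach rho.
mx, my = statistics.fmean(xs), statistics.fmean(ys)
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
corr = cov / (statistics.pstdev(xs) * statistics.pstdev(ys))
```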

Hamiltonian Monte Carlo (HMC): An MCMC method that uses the gradient of the target density to propose moves along Hamiltonian trajectories, reducing random-walk behavior.
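
A bare-bones HMC sketch for a standard normal target, using leapfrog integration and a Metropolis correction; the step size, trajectory length, and burn-in are illustrative choices:

```python
import math
import random

random.seed(16)

def grad_U(x):            # potential U(x) = x^2 / 2 for a standard normal target
    return x

x = 0.0
samples = []
eps, L = 0.2, 10          # leapfrog step size and number of steps
for it in range(20_000):
    p = random.gauss(0, 1)                    # resample momentum
    x_new, p_new = x, p
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new -= 0.5 * eps * grad_U(x_new)
    for _ in range(L - 1):
        x_new += eps * p_new
        p_new -= eps * grad_U(x_new)
    x_new += eps * p_new
    p_new -= 0.5 * eps * grad_U(x_new)
    # Metropolis correction for the discretisation error of the integrator.
    H_old = 0.5 * x * x + 0.5 * p * p
    H_new = 0.5 * x_new * x_new + 0.5 * p_new * p_new
    if math.log(random.random()) < H_old - H_new:
        x = x_new
    if it >= 500:                             # discard burn-in
        samples.append(x)

m = sum(samples) / len(samples)
v = sum((s - m) ** 2 for s in samples) / len(samples)   # should be near 1
```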

Importance sampling: A variance-reduction technique that draws from a proposal distribution and reweights samples to estimate expectations under a different target distribution.
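
A sketch estimating the rare-event probability P(X > 3) for X ~ N(0, 1). Naive sampling would almost never land in the tail, so the proposal N(3, 1) is shifted into the region of interest and each hit is reweighted; the proposal choice and sample size are illustrative:

```python
import math
import random

random.seed(5)

def phi(x, mu=0.0):        # normal density with unit variance, centred at mu
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

n = 100_000
est = 0.0
for _ in range(n):
    x = random.gauss(3.0, 1.0)           # draw from the proposal N(3, 1)
    if x > 3:
        est += phi(x) / phi(x, 3.0)      # importance weight: target / proposal
est /= n
# The true value is about 0.00135.
```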

Jackknife: A resampling technique that estimates bias and standard error by systematically omitting one observation at a time from the dataset.
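
A leave-one-out sketch for the standard error of the mean; for this particular statistic the jackknife reproduces the textbook formula s / sqrt(n) exactly, which makes a convenient check:

```python
import random
import statistics

random.seed(6)
data = [random.gauss(0.0, 1.0) for _ in range(30)]
n = len(data)

# Leave-one-out estimates of the mean.
loo = [statistics.fmean(data[:i] + data[i + 1:]) for i in range(n)]
loo_bar = statistics.fmean(loo)

# Jackknife standard error: sqrt((n-1)/n * sum((loo_i - loo_bar)^2)).
se_jack = ((n - 1) / n * sum((v - loo_bar) ** 2 for v in loo)) ** 0.5
se_exact = statistics.stdev(data) / n ** 0.5   # agrees exactly for the mean
```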

Kernel density estimation (KDE): A non-parametric technique for estimating the probability density function of a random variable by summing kernel functions placed at each observation.
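
A from-scratch Gaussian-kernel sketch; the data, bandwidth, and evaluation point are illustrative:

```python
import math
import random

random.seed(7)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
h = 0.4   # bandwidth: the width of each Gaussian kernel

def kde(x):
    # Average of Gaussian kernels centred at each observation.
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) \
           / (len(data) * h * math.sqrt(2 * math.pi))

# The estimate near 0 should be close to the true N(0, 1) density there (~0.399),
# pulled down slightly by the smoothing bias of the kernel.
d0 = kde(0.0)
```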

Kullback-Leibler (KL) divergence: A measure of how one probability distribution differs from a reference distribution, used in variational inference to quantify approximation quality.
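
For discrete distributions the definition is a short sum; the example below uses two made-up distributions and also shows that the divergence is asymmetric:

```python
import math

# Two discrete distributions over the same three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# KL(P || Q) = sum_i p_i * log(p_i / q_i); it is zero iff P == Q, and asymmetric.
kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
kl_qp = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))
```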

Markov chain: A stochastic process in which the future state depends only on the current state and not on the sequence of preceding states.
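
A toy two-state weather chain makes the memoryless property concrete: tomorrow's distribution depends only on today's state. The transition probabilities are made up; the long-run fraction of sunny days converges to the stationary value 5/6:

```python
import random

random.seed(13)
# Transition probabilities: the next state depends only on the current one.
P = {"sun":  {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.5, "rain": 0.5}}

state = "sun"
counts = {"sun": 0, "rain": 0}
for _ in range(100_000):
    state = "sun" if random.random() < P[state]["sun"] else "rain"
    counts[state] += 1

# Long-run (time-average) fraction of sunny days; the stationary value is 5/6.
frac_sun = counts["sun"] / 100_000
```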

Metropolis-Hastings algorithm: A general-purpose MCMC algorithm that generates proposals from an arbitrary distribution and accepts or rejects them using a calculated acceptance probability.
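
A random-walk Metropolis sketch targeting a standard normal (a symmetric proposal, so the Hastings correction term cancels); the proposal scale, chain length, and burn-in are illustrative:

```python
import math
import random

random.seed(8)

def log_target(x):        # unnormalised log-density of the target: standard normal
    return -0.5 * x * x

x = 0.0
samples = []
for i in range(50_000):
    prop = x + random.gauss(0, 1.0)        # symmetric random-walk proposal
    # Accept with probability min(1, target(prop) / target(x)).
    if math.log(random.random()) < log_target(prop) - log_target(x):
        x = prop
    if i >= 1_000:                         # discard burn-in
        samples.append(x)

m = sum(samples) / len(samples)                          # should be near 0
v = sum((s - m) ** 2 for s in samples) / len(samples)    # should be near 1
```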

Monte Carlo method: Any computational technique that uses repeated random sampling to approximate numerical results for problems that may be deterministic in principle.
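
The classic example, estimating the deterministic constant pi by random sampling; the sample size is illustrative:

```python
import random

random.seed(9)
n = 200_000
# Throw random points into the unit square; count those inside the quarter disc.
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 < 1.0)
# The fraction inside approximates pi / 4.
pi_hat = 4 * inside / n
```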

Numerical optimization: Iterative algorithms for finding parameter values that extremize an objective function, including gradient descent, Newton's method, and quasi-Newton methods.
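
The simplest of these, gradient descent, in a few lines; the objective, learning rate, and iteration count are illustrative:

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
w = 0.0
lr = 0.1                  # learning rate (step size)
for _ in range(100):
    grad = 2 * (w - 3)    # f'(w)
    w -= lr * grad        # step downhill along the negative gradient
```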

Permutation test: A hypothesis test that determines statistical significance by computing the test statistic over all (or many random) rearrangements of the data labels.
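
A one-sided two-sample sketch using the difference in means as the test statistic; the group sizes, the effect size of 1.5, and the number of permutations are illustrative:

```python
import random
import statistics

random.seed(10)
a = [random.gauss(0.0, 1.0) for _ in range(30)]
b = [random.gauss(1.5, 1.0) for _ in range(30)]   # group b is shifted by 1.5

observed = statistics.fmean(b) - statistics.fmean(a)

pooled = a + b
count = 0
n_perm = 2000
for _ in range(n_perm):
    random.shuffle(pooled)     # rearrange the group labels under the null
    perm = statistics.fmean(pooled[30:]) - statistics.fmean(pooled[:30])
    if perm >= observed:       # one-sided: permuted statistic at least as extreme
        count += 1
p_value = (count + 1) / (n_perm + 1)   # add-one correction avoids p = 0
```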

Posterior distribution: In Bayesian statistics, the probability distribution of parameters after updating the prior with observed data via Bayes' theorem.
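
A conjugate Beta-Binomial example where the update is available in closed form; the prior and the coin-flip data are made up:

```python
# Prior Beta(2, 2) on a coin's heads probability (mildly favouring fairness).
a_prior, b_prior = 2, 2

# Observe 7 heads in 10 flips; Bayes' theorem gives the posterior Beta(2+7, 2+3).
heads, tails = 7, 3
a_post, b_post = a_prior + heads, b_prior + tails

# The posterior mean moves from the prior mean (0.5) toward the data (0.7).
post_mean = a_post / (a_post + b_post)   # = 9/14
```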

Random seed: An initial value used to start a pseudorandom number generator, enabling reproducible computational experiments.
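
Reproducibility in two lines: re-seeding with the same value replays the identical "random" stream (the seed value 42 is arbitrary):

```python
import random

random.seed(42)
run1 = [random.random() for _ in range(3)]

random.seed(42)                 # re-seed with the same value...
run2 = [random.random() for _ in range(3)]

same = (run1 == run2)           # ...and the identical sequence comes back
```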

Rejection sampling: A Monte Carlo method that generates proposals from a known distribution and accepts them with a probability that ensures the accepted samples follow the target distribution.
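
A sketch sampling from a Beta(2, 2) target using a uniform proposal under an envelope constant M; the target and M are illustrative:

```python
import random
import statistics

random.seed(11)

def target(x):            # Beta(2, 2) density: f(x) = 6x(1 - x), maximum value 1.5
    return 6 * x * (1 - x)

M = 1.5                   # envelope constant: target(x) <= M * 1 on [0, 1]
samples = []
while len(samples) < 20_000:
    x = random.random()                    # propose from Uniform(0, 1)
    if random.random() < target(x) / M:    # accept with probability target(x)/M
        samples.append(x)

m = statistics.fmean(samples)   # Beta(2, 2) has mean 0.5
```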

Resampling methods: A family of methods (bootstrap, jackknife, permutation) that draw repeated samples from an observed dataset to make statistical inferences.

Stochastic gradient descent (SGD): An optimization algorithm that approximates the gradient using a random subset of data at each iteration, enabling efficient training on large datasets.
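
A minibatch SGD sketch fitting a one-dimensional linear model y = wx + b; the synthetic data, batch size, learning rate, and step count are illustrative:

```python
import random

random.seed(12)
# Synthetic data from y = 3x + 1 plus a little noise.
xs = [random.uniform(-1, 1) for _ in range(1000)]
ys = [3 * x + 1 + random.gauss(0, 0.1) for x in xs]

w, b = 0.0, 0.0
lr = 0.1
for step in range(200):
    i = random.randrange(0, 1000 - 32)
    bx, by = xs[i:i + 32], ys[i:i + 32]   # random minibatch of 32 points
    # Gradient of the mean squared error computed on the minibatch only.
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(bx, by)) / 32
    gb = sum(2 * (w * x + b - y) for x, y in zip(bx, by)) / 32
    w -= lr * gw
    b -= lr * gb
```

Each step sees only 32 of the 1000 points, yet w and b still converge close to the true values 3 and 1.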

Variational inference: An optimization-based approach to Bayesian inference that approximates the posterior by finding the closest member of a tractable distribution family.
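
A deliberately crude sketch of the idea: the tractable family is {N(m, 1)}, the "posterior" is itself N(2, 1), and the optimization is a grid search over m using a Monte Carlo estimate of KL(q || p). Real variational inference uses gradient-based optimizers, but the objective is the same:

```python
import random

random.seed(15)

def log_p(x):                   # unnormalised target log-density: N(2, 1)
    return -0.5 * (x - 2.0) ** 2

def log_q(x, m):                # candidate family member: N(m, 1)
    return -0.5 * (x - m) ** 2

# Grid-search the family member minimising KL(q || p), estimated with draws from q.
# (Both densities share the same normaliser here, so the constants cancel.)
best_m, best_kl = None, float("inf")
for m in [i / 10 for i in range(0, 41)]:          # candidate means 0.0 .. 4.0
    draws = [random.gauss(m, 1.0) for _ in range(2000)]
    kl = sum(log_q(x, m) - log_p(x) for x in draws) / len(draws)
    if kl < best_kl:
        best_m, best_kl = m, kl
```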
