Machine Learning Glossary
25 essential terms — because precise language is the foundation of clear thinking in Machine Learning.
Activation Function: A mathematical function applied to a neuron's output to introduce non-linearity, enabling neural networks to learn complex patterns. Common examples include ReLU, sigmoid, and tanh.
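A minimal sketch of the three activations named above, written as plain scalar functions (the function names are illustrative, not a library API):

```python
import math

def relu(x):
    # ReLU: passes positive values through unchanged, zeroes out negatives
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Tanh: squashes input into (-1, 1) and is zero-centred
    return math.tanh(x)
```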
Backpropagation: An algorithm for computing gradients in neural networks by applying the chain rule of calculus to propagate error signals backward from the output layer through all hidden layers.
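The chain rule at the heart of backpropagation can be shown on a toy two-layer network with scalar weights (a hand-worked sketch, not a general implementation):

```python
def backprop_scalar(x, t, w1, w2):
    # Forward pass through a tiny two-layer network: h = relu(w1*x), y = w2*h
    z = w1 * x
    h = max(0.0, z)          # hidden activation (ReLU)
    y = w2 * h               # network output
    loss = (y - t) ** 2      # squared error against target t

    # Backward pass: apply the chain rule from the output toward the input
    dloss_dy = 2.0 * (y - t)
    dloss_dw2 = dloss_dy * h                # gradient for the output weight
    dloss_dh = dloss_dy * w2                # error signal entering the hidden layer
    dh_dz = 1.0 if z > 0 else 0.0           # ReLU derivative
    dloss_dw1 = dloss_dh * dh_dz * x        # gradient for the hidden weight
    return loss, dloss_dw1, dloss_dw2
```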
Batch Size: The number of training samples processed together before the model's parameters are updated during one iteration of gradient descent.
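Splitting a dataset into mini-batches is a one-liner; the last batch may be smaller than the rest (the helper name is illustrative):

```python
def make_batches(samples, batch_size):
    # Split a dataset into consecutive mini-batches of at most batch_size items
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
```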
Bias: Error introduced by approximating a complex real-world problem with a simplified model. High bias leads to underfitting, where the model misses relevant patterns in the data.
Classification: A supervised learning task where the model predicts a discrete categorical label (e.g., spam or not spam, image category) for each input.
Clustering: An unsupervised learning technique that groups similar data points together based on feature similarity without predefined labels. Examples include k-means and DBSCAN.
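The two alternating steps of k-means can be sketched in a few lines (a simplified illustration that assumes every cluster keeps at least one member, which real implementations must handle):

```python
def kmeans_assign(points, centroids):
    # Assignment step: each point joins its nearest centroid (squared distance)
    def dist2(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]

def kmeans_update(points, labels, k):
    # Update step: move each centroid to the mean of its assigned points
    # (assumes no cluster is empty)
    centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return centroids
```

Running the two steps in a loop until the assignments stop changing is the full k-means algorithm.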
Convolutional Neural Network (CNN): A deep learning architecture using convolutional filters to automatically learn spatial feature hierarchies from grid-structured data like images.
Cross-Validation: A model evaluation technique that partitions data into multiple folds to train and test the model on different splits, providing a robust estimate of generalization performance.
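The fold bookkeeping behind k-fold cross-validation can be sketched as an index generator (a minimal version without shuffling; the function name is illustrative):

```python
def kfold_indices(n, k):
    # Partition indices 0..n-1 into k contiguous folds;
    # each fold serves once as the test set, the rest as training data
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits
```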
Decision Boundary: The surface or line in feature space that separates different classes in a classification model. Its shape depends on the algorithm and model complexity.
Deep Learning: A subset of machine learning that uses neural networks with many hidden layers to learn hierarchical representations of data, excelling at tasks like image recognition and language understanding.
Dropout: A regularization technique where randomly selected neurons are ignored during training to prevent co-adaptation and reduce overfitting.
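A sketch of the common "inverted dropout" variant: surviving activations are scaled up by 1/(1-p) so the expected output is unchanged at inference time (the function is illustrative, not a framework API):

```python
import random

def dropout(activations, p, rng=None):
    # Inverted dropout: zero each unit with probability p during training,
    # scaling survivors by 1/(1-p) so the expected activation is unchanged
    rng = rng or random.Random(0)   # fixed seed here for reproducibility
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```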
Ensemble Learning: A technique that combines predictions from multiple models to produce a more accurate and robust result than any single model. Includes bagging, boosting, and stacking.
Epoch: One complete pass through the entire training dataset during the model training process.
Feature: An individual measurable property or input variable used by a machine learning model to make predictions. Also called an attribute or predictor.
Gradient Descent: An iterative optimization algorithm that adjusts model parameters by moving in the direction of steepest decrease of the loss function to find minimum error.
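The update rule is just "step opposite the gradient"; a minimal one-dimensional sketch (the learning rate and step count here are arbitrary choices for illustration):

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    # Repeatedly step opposite the gradient to descend toward a minimum
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Minimise f(x) = (x - 3)^2, whose gradient is 2*(x - 3); minimum at x = 3
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```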
Hyperparameter: A configuration value set before training begins that controls the learning process, such as learning rate, number of layers, or regularization strength. Hyperparameters are typically tuned via grid search, random search, or Bayesian optimization.
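Grid search, the simplest tuning strategy, just evaluates every combination of candidate values (a sketch; `score_fn` stands in for training and evaluating a model with the given settings):

```python
from itertools import product

def grid_search(param_grid, score_fn):
    # Exhaustively evaluate every hyperparameter combination; keep the best
    keys = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)   # e.g. validation accuracy of a trained model
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```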
Loss Function: A mathematical function that quantifies the difference between a model's predictions and the actual target values. The training process aims to minimize this function.
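Mean squared error, a standard loss for regression, makes the idea concrete:

```python
def mse(predictions, targets):
    # Mean squared error: average squared gap between predictions and targets
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```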
Overfitting: A condition where a model learns training data noise rather than the true underlying pattern, resulting in excellent training performance but poor generalization to new data.
Principal Component Analysis (PCA): An unsupervised dimensionality reduction technique that transforms data into a new coordinate system where the axes (principal components) are ordered by the amount of variance they capture.
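The transform can be sketched via eigendecomposition of the covariance matrix, assuming NumPy is available (the function and its return values are illustrative, not a library API):

```python
import numpy as np

def pca(X, n_components):
    # Centre the data, then eigendecompose its covariance matrix;
    # principal components are eigenvectors sorted by descending eigenvalue
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort by variance, descending
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, eigvals[order]   # projected data, sorted variances
```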
Recurrent Neural Network (RNN): A neural network with cyclic connections that maintain a hidden state, enabling it to process sequential data like text, speech, and time series.
Regression: A supervised learning task where the model predicts a continuous numerical value, such as price, temperature, or probability.
Regularization: Techniques that add constraints or penalties to a model to prevent overfitting. Common forms include $L_1$ (Lasso), $L_2$ (Ridge), dropout, and early stopping.
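The $L_1$ and $L_2$ penalties mentioned above are simply extra terms added to the loss, weighted by a strength $\lambda$ (the helper names and `lam` parameter are illustrative):

```python
def l2_penalized_loss(base_loss, weights, lam):
    # Ridge (L2): add lam * sum of squared weights, shrinking weights smoothly
    return base_loss + lam * sum(w ** 2 for w in weights)

def l1_penalized_loss(base_loss, weights, lam):
    # Lasso (L1): add lam * sum of absolute weights, encouraging sparsity
    return base_loss + lam * sum(abs(w) for w in weights)
```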
Reinforcement Learning: A learning paradigm where an agent learns optimal behavior through trial-and-error interaction with an environment, guided by reward and penalty signals.
Transfer Learning: A technique that reuses a model pre-trained on one task as the foundation for a different but related task, reducing data and compute requirements.
Variance: Error caused by a model's sensitivity to small fluctuations in the training data. High variance leads to overfitting, where the model captures noise rather than the true signal.