Machine Learning Cheat Sheet
The core ideas of Machine Learning distilled into a single, scannable reference — perfect for review or quick lookup.
Quick Reference
Supervised Learning
A machine learning paradigm in which models are trained on labeled datasets containing input-output pairs. The algorithm learns a mapping function from inputs to outputs, enabling it to predict correct labels for previously unseen data. Common tasks include classification and regression.
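As a minimal sketch of learning from labeled input-output pairs, here is a 1-nearest-neighbour classifier on a hypothetical toy dataset (all points and labels are made up for illustration):

```python
import math

# Supervised learning in miniature: labeled training pairs (input -> label).
train_X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.8)]
train_y = ["a", "a", "b", "b"]

def predict(x):
    """Predict the label of the closest training point (1-NN)."""
    dists = [math.dist(x, p) for p in train_X]
    return train_y[dists.index(min(dists))]

print(predict((0.2, 0.1)))  # near the "a" points -> "a"
print(predict((4.9, 5.1)))  # near the "b" points -> "b"
```

The mapping from inputs to labels is learned (here, memorized) from the labeled examples, then applied to unseen points.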
Unsupervised Learning
A machine learning approach where algorithms learn patterns from unlabeled data without predefined output categories. The system discovers inherent structure, groupings, or relationships within the data on its own. Key techniques include clustering, dimensionality reduction, and anomaly detection.
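Clustering, the most common unsupervised technique, can be sketched with a tiny k-means loop on hypothetical 1-D data (no labels are ever provided; the groupings emerge from the data):

```python
# Minimal k-means sketch (k = 2) on unlabeled 1-D data.
data = [1.0, 1.2, 0.8, 8.0, 8.2, 7.9]
centroids = [data[0], data[3]]  # naive initialisation

for _ in range(10):
    # Assignment step: each point joins its nearest centroid.
    clusters = [[], []]
    for x in data:
        idx = min((0, 1), key=lambda i: abs(x - centroids[i]))
        clusters[idx].append(x)
    # Update step: move each centroid to its cluster's mean.
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)  # roughly [1.0, 8.03]
```

The algorithm discovers the two groupings on its own, which is the defining trait of unsupervised learning.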
Neural Networks
Computational models inspired by the biological neural networks of the human brain, consisting of interconnected layers of artificial neurons (nodes). Each connection has a weight that is adjusted during training, and neurons apply activation functions to produce outputs. Deep neural networks with many hidden layers can learn complex, hierarchical representations of data.
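A single forward pass through a tiny network makes the weighted-sum-plus-activation idea concrete. The weights and biases below are arbitrary placeholders, not trained values:

```python
import math

def sigmoid(z):
    """A classic activation function squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of inputs plus bias, then activation.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Hypothetical 2-input -> 2-hidden -> 1-output network.
hidden = layer([0.5, -1.0], weights=[[0.1, 0.8], [-0.4, 0.2]], biases=[0.0, 0.1])
output = layer(hidden, weights=[[1.5, -0.7]], biases=[0.2])
print(output)
```

Training would adjust the weights and biases; stacking more such layers is what makes a network "deep".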
Gradient Descent
An iterative optimization algorithm that minimizes a model's loss function by repeatedly updating parameters in the direction of the negative gradient (steepest descent). The learning rate controls the step size, and variants like stochastic gradient descent (SGD) and Adam improve efficiency by using subsets of data or adaptive learning rates.
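The update rule is easiest to see on a one-variable toy loss. For f(x) = (x − 3)², the gradient is 2(x − 3), and repeated steps against the gradient converge to the minimum at x = 3:

```python
# Gradient descent on the toy loss f(x) = (x - 3)^2.
def grad(x):
    return 2 * (x - 3)  # derivative of the loss

x = 0.0    # initial parameter value
lr = 0.1   # learning rate: controls the step size
for _ in range(100):
    x -= lr * grad(x)  # step in the direction of steepest decrease

print(round(x, 4))  # → 3.0 (converged to the minimum)
```

Too large a learning rate would overshoot and diverge; too small a rate would converge slowly, which is why adaptive variants like Adam are popular.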
Overfitting
A modeling error that occurs when a machine learning model learns the training data too well, capturing noise and random fluctuations rather than the underlying pattern. An overfit model performs excellently on training data but poorly on unseen test data. Techniques like regularization, dropout, cross-validation, and early stopping help prevent overfitting.
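Early stopping, one of the prevention techniques listed above, can be sketched with hypothetical loss curves: training loss keeps falling, but validation loss turns upward once the model starts memorizing noise.

```python
# Early-stopping sketch. The loss values are hypothetical: training loss
# keeps improving while validation loss bottoms out, then rises (overfitting).
train_loss = [1.0, 0.7, 0.5, 0.35, 0.25, 0.18, 0.12, 0.08]
val_loss   = [1.1, 0.8, 0.6, 0.50, 0.48, 0.52, 0.60, 0.70]

best, patience, wait, stop_epoch = float("inf"), 2, 0, None
for epoch, v in enumerate(val_loss):
    if v < best:
        best, wait = v, 0       # validation improved: keep training
    else:
        wait += 1               # no improvement this epoch
        if wait >= patience:
            stop_epoch = epoch  # halt before overfitting worsens
            break

print(stop_epoch, best)
```

Training halts shortly after the validation minimum (best = 0.48 at epoch 4), even though training loss was still falling.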
Bias-Variance Tradeoff
A fundamental concept describing the tension between two sources of error in machine learning models. Bias is error from overly simplistic assumptions causing the model to miss relevant patterns (underfitting), while variance is error from excessive sensitivity to training data fluctuations (overfitting). The optimal model balances both to minimize total error.
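For squared-error loss, this tradeoff has an exact decomposition of the expected prediction error at a point x:

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Here f is the true function, f̂ the learned model (a random quantity over training sets), and σ² the noise floor that no model can remove. Simple models shrink the variance term at the cost of bias; flexible models do the reverse.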
Feature Engineering
The process of using domain knowledge to create, transform, or select input variables (features) that improve a machine learning model's predictive performance. Good feature engineering can dramatically boost model accuracy and is often more impactful than choosing a more complex algorithm. It includes tasks like normalization, encoding categorical variables, and creating interaction terms.
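Two of the steps named above, normalization and categorical encoding, can be sketched in plain Python (the feature values are hypothetical):

```python
# Min-max normalisation: rescale a numeric feature into [0, 1].
ages = [20, 35, 50]
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]
print(scaled)  # → [0.0, 0.5, 1.0]

# One-hot encoding: turn a categorical feature into binary indicator columns.
colours = ["red", "blue", "red"]
categories = sorted(set(colours))  # ['blue', 'red']
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colours]
print(one_hot)  # → [[0, 1], [1, 0], [0, 1]]
```

Libraries such as scikit-learn provide production-grade versions of these transforms, but the underlying operations are this simple.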
Decision Trees
A supervised learning algorithm that makes predictions by recursively splitting data based on feature values, forming a tree-like structure of decisions. Each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf node holds a prediction. They are intuitive and interpretable but prone to overfitting without pruning or ensemble methods.
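The smallest possible tree, a one-level "stump", shows the node/branch/leaf structure directly. The feature, threshold, and labels here are a hypothetical toy rule:

```python
# A decision stump: one internal node testing a feature against a threshold,
# with each branch ending in a leaf prediction.
def stump(temperature):
    if temperature <= 15.0:    # internal node: test on the feature
        return "wear a coat"   # leaf prediction, left branch
    return "no coat needed"    # leaf prediction, right branch

print(stump(8.0))   # → "wear a coat"
print(stump(22.0))  # → "no coat needed"
```

A full tree learner picks each split (feature and threshold) to best separate the labels, then recurses on the two branches until a stopping rule, or pruning, halts growth.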
Ensemble Methods
Techniques that combine multiple individual models to produce a single, stronger predictive model. By aggregating the predictions of several base learners, ensembles reduce variance, bias, or both, and generally outperform any single constituent model. Major approaches include bagging (e.g., Random Forests), boosting (e.g., XGBoost, AdaBoost), and stacking.
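Majority voting, the aggregation used in bagging-style ensembles, can be sketched with three hypothetical base classifiers (their decision thresholds are made up for illustration):

```python
from collections import Counter

# Three hypothetical base classifiers with different decision thresholds.
def model_a(x): return "spam" if x > 0.5 else "ham"
def model_b(x): return "spam" if x > 0.3 else "ham"
def model_c(x): return "spam" if x > 0.7 else "ham"

def ensemble(x):
    """The majority label across the base learners is the final prediction."""
    votes = [m(x) for m in (model_a, model_b, model_c)]
    return Counter(votes).most_common(1)[0][0]

print(ensemble(0.6))  # two of three vote "spam" → "spam"
print(ensemble(0.2))  # all three vote "ham" → "ham"
```

Because the models err in different places, the vote is more reliable than any single member; boosting and stacking combine learners in more elaborate ways but share this core idea.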
Transfer Learning
A technique where a model trained on one task is repurposed as the starting point for a different but related task. Instead of training from scratch, the pre-trained model's learned representations are fine-tuned on a smaller, task-specific dataset. This approach significantly reduces the data and computation required and is especially powerful in deep learning for computer vision and natural language processing.
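A stripped-down sketch of the idea: a "pre-trained" feature extractor is frozen, and only a small head is fit on the new task. Everything here (the extractor, the data, the training loop) is a hypothetical miniature; real setups fine-tune deep networks the same way.

```python
# Transfer-learning sketch: freeze the learned representation, train the head.
def pretrained_features(x):
    # Frozen representation "learned" on a previous task (never updated here).
    return [x, x * x]

# New-task data, consistent with y = 2*x + 1*x^2 in feature space.
data = [(1.0, 3.0), (2.0, 8.0), (3.0, 15.0)]

w = [0.0, 0.0]  # trainable head weights (the only parameters we update)
lr = 0.01
for _ in range(2000):
    for x, y in data:
        f = pretrained_features(x)
        err = sum(wi * fi for wi, fi in zip(w, f)) - y
        w = [wi - lr * err * fi for wi, fi in zip(w, f)]  # update head only

print([round(wi, 2) for wi in w])  # ≈ [2.0, 1.0]
```

Because the representation is reused rather than relearned, only two parameters needed training, which is why transfer learning needs far less task-specific data and compute.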