Bioinformatics Cheat Sheet
The core ideas of bioinformatics distilled into a single, scannable reference for review or quick lookup.
Quick Reference
Sequence Alignment
The process of arranging DNA, RNA, or protein sequences to identify regions of similarity that may indicate functional, structural, or evolutionary relationships. Smith-Waterman computes an optimal local alignment between two sequences by dynamic programming, while BLAST uses a faster heuristic suited to searching a query against large databases.
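As a rough illustration of local alignment by dynamic programming, here is a minimal Smith-Waterman sketch. The scoring values (match +2, mismatch -1, linear gap -2) and the function name `smith_waterman` are assumptions chosen for the example; real tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor of 0 lets an alignment restart anywhere (local, not global)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTTACGTTT", "GGACGGG")
print(score, top, bottom)  # 6 ACG ACG
```

The table is quadratic in the sequence lengths, which is why heuristic methods are preferred for database-scale searches.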
Genome Assembly
The computational process of reconstructing a complete genome sequence from millions of short overlapping DNA fragments (reads) produced by sequencing machines. Assembly algorithms use overlap graphs or de Bruijn graphs to piece the reads together into contiguous sequences (contigs).
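To make the de Bruijn graph idea concrete, the sketch below builds one from a couple of toy reads: every k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The reads, the value of k, and the function name `de_bruijn` are made up for illustration, and the example ignores sequencing errors and reverse complements, which a real assembler has to handle.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge: prefix of the k-mer -> suffix of the k-mer
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


reads = ["ACGTC", "GTCA"]  # toy reads that overlap on "GTC"
graph = de_bruijn(reads, k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

Repeated edges (here GT -> TC appears twice) are kept on purpose, since edge multiplicity reflects how often a k-mer was observed. Walking the edges from AC spells out ACGTCA, the sequence both reads came from.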
Phylogenetics
The study of evolutionary relationships among organisms or genes by constructing tree-like diagrams (phylogenetic trees) based on molecular sequence data. Methods include maximum likelihood, Bayesian inference, and neighbor-joining approaches.
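Distance-based methods such as neighbor-joining start from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the p-distance (fraction of differing sites). The sequences and function names are hypothetical, the input is assumed to be already aligned and gap-free, and no substitution-model correction is applied.

```python
def p_distance(s1, s2):
    """Fraction of aligned positions at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def distance_matrix(seqs):
    """Nested dict of pairwise p-distances keyed by sequence name."""
    return {a: {b: p_distance(seqs[a], seqs[b]) for b in seqs} for a in seqs}


# Toy aligned sequences, invented for this example
aligned = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTACGA",
    "taxon3": "ACGAACTA",
}
matrix = distance_matrix(aligned)
print(matrix["taxon1"]["taxon2"])  # 0.125
print(matrix["taxon1"]["taxon3"])  # 0.375
```

A tree-building method would then join the closest pair first (here taxon1 and taxon2) and continue from there.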
Gene Expression Analysis
The quantitative measurement and comparison of mRNA or protein levels across different conditions, tissues, or time points using technologies like RNA-seq or microarrays. Differential expression analysis identifies genes that are up- or down-regulated in response to stimuli.
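A minimal sketch of the quantity at the heart of differential expression, the log2 fold change between two conditions, is shown below. It assumes one sample per condition, normalizes raw counts to counts per million (CPM), and adds a pseudocount to avoid division by zero. The gene names and counts are invented, and a real analysis would also use replicates and a statistical test rather than fold change alone.

```python
import math


def cpm(counts):
    """Scale raw read counts to counts per million for one sample."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) per gene, computed on CPM-normalized values."""
    c, t = cpm(control), cpm(treated)
    return {
        gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
        for gene in control
    }


# Hypothetical raw counts for three genes
control = {"geneA": 100, "geneB": 100, "geneC": 800}
treated = {"geneA": 400, "geneB": 100, "geneC": 500}

for gene, lfc in log2_fold_change(control, treated).items():
    print(f"{gene}: {lfc:+.2f}")
```

A positive value means the gene is higher in the treated sample (up-regulated), a negative value means it is lower (down-regulated).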
Multiple Sequence Alignment (MSA)
The alignment of three or more biological sequences simultaneously to reveal conserved regions across a set of related sequences. MSA is used to identify conserved motifs, build phylogenetic trees, and predict protein structure and function.
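Once an MSA exists, finding conserved regions amounts to scanning it column by column. The sketch below reports fully conserved columns in a small hand-made alignment; the sequences and the function name `conserved_columns` are assumptions for illustration, and the alignment itself is taken as given rather than computed.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every row has the same residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return [
        i
        for i, column in enumerate(zip(*alignment))
        # A column of identical gap characters is not counted as conserved
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Toy protein alignment; "-" marks a gap
alignment = [
    "MKT-LV",
    "MKTALV",
    "MRT-LV",
]
print(conserved_columns(alignment))  # [0, 2, 4, 5]
```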
Hidden Markov Models (HMMs)
A probabilistic model widely used in bioinformatics for tasks like gene finding, protein domain identification, and sequence classification. An HMM treats a sequence as the output of a chain of hidden states: the model moves between states according to transition probabilities, and each state emits observable symbols according to its own emission probabilities.
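To show how hidden states are recovered from observed symbols, here is a small Viterbi decoding sketch for a two-state HMM that labels DNA as AT-rich or GC-rich. All probabilities, state names, and the function name `viterbi` are invented for the example, and the observation sequence is assumed to be non-empty.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for obs (log-space Viterbi)."""
    # Best log-probability of any path ending in each state at position 0
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best previous state from which to enter s
            prev = max(states, key=lambda p: V[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (V[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        V.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final state
    path = [max(states, key=lambda s: V[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy parameters (assumed, not taken from any real model)
states = ("AT", "GC")
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}
path = viterbi("AATTGCGC", states, start_p, trans_p, emit_p)
print(path)  # ['AT', 'AT', 'AT', 'AT', 'GC', 'GC', 'GC', 'GC']
```

Working in log space avoids numerical underflow when sequences are long.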
Protein Structure Prediction
Computational methods to predict the three-dimensional shape of a protein from its amino acid sequence. Approaches include homology modeling, ab initio prediction, and deep learning methods such as AlphaFold, which predicts many structures with near-experimental accuracy.
Variant Calling
The process of identifying differences (variants) between a sequenced genome and a reference genome, including single nucleotide polymorphisms (SNPs), insertions, deletions, and structural variants. This is essential for clinical genomics and population genetics.
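A deliberately naive sketch of SNP calling from a pileup follows. It assumes reads are already aligned to the reference without gaps and given as (0-based start, sequence) pairs, and it calls a SNP wherever a non-reference base dominates a sufficiently covered position. The thresholds, data, and function name `call_snps` are hypothetical; production callers also model base quality, mapping quality, and heterozygous genotypes.

```python
from collections import Counter


def call_snps(reference, reads, min_depth=3, min_fraction=0.8):
    """Return (position, ref_base, alt_base) for confidently differing sites."""
    # Count the bases observed at each reference position
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            calls.append((pos, reference[pos], base))
    return calls


reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT")]  # toy gap-free alignments
print(call_snps(reference, reads))  # [(3, 'T', 'A')]
```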
Gene Ontology (GO) Enrichment
A statistical method to determine whether a set of genes is enriched for particular biological processes, molecular functions, or cellular components compared to what would be expected by chance. GO enrichment provides functional interpretation of gene lists from high-throughput experiments.
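One common way to quantify "expected by chance" is a one-sided hypergeometric test, which is assumed here as the example method. The sketch computes the probability of seeing at least the observed number of annotated genes in a gene list drawn at random from the background. The gene counts and the function name `hypergeom_pvalue` are made up, and multiple-testing correction across GO terms is left out.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Hypothetical numbers: 20,000 background genes, 200 annotated with the term,
# a list of 100 genes of interest, 10 of which carry the annotation.
p = hypergeom_pvalue(population=20_000, annotated=200, sample=100, hits=10)
print(f"p = {p:.2e}")
```

With these numbers only about one annotated gene would be expected in the list by chance, so ten hits yield a very small p-value.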
Metagenomics
The study of genetic material recovered directly from environmental or clinical samples without culturing individual organisms. Shotgun sequencing characterizes both the taxonomic composition and the functional potential of entire microbial communities, while 16S rRNA amplicon sequencing profiles the taxonomic composition of their bacterial and archaeal members.
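Community composition is often summarized as relative abundances plus a diversity index. The sketch below computes both from per-taxon read counts, using the Shannon index as one possible diversity measure. The taxon names and counts are invented, and the step that assigns reads to taxa is assumed to have been done already.

```python
import math


def relative_abundance(counts):
    """Convert per-taxon read counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero abundance."""
    return -sum(
        p * math.log(p) for p in relative_abundance(counts).values() if p > 0
    )


# Hypothetical read counts per taxon in one sample
sample = {"Bacteroides": 500, "Prevotella": 300, "Escherichia": 150, "Other": 50}
print(relative_abundance(sample))
print(round(shannon_index(sample), 3))
```

The index is zero when a single taxon accounts for all reads and reaches ln(n) when n taxa are equally abundant.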