Information Retrieval Glossary
25 essential terms — because precise language is the foundation of clear thinking in Information Retrieval.
BM25: Best Matching 25, a probabilistic ranking function incorporating term frequency saturation and document length normalization.
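As a rough illustration of term frequency saturation and length normalization, the sketch below scores tokenized documents with one common BM25 variant. The function name `bm25_score`, the toy corpus, and the defaults `k1=1.5` and `b=0.75` are assumptions made for this example; the exact IDF formula varies between systems.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query (illustrative BM25 variant)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        # Smoothed IDF that stays non-negative (one commonly used form)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer-than-average documents are penalized
        norm = k1 * (1 - b + b * len(doc_tokens) / avg_len)
        # Saturation: each extra occurrence of the term adds less than the last
        score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
    return score

# Hypothetical toy corpus of pre-tokenized documents
corpus = [
    ["inverted", "index", "search"],
    ["search", "engine", "ranking", "search"],
    ["cooking", "recipes"],
]
scores = [bm25_score(["search"], doc, corpus) for doc in corpus]
```

In this toy corpus the second document mentions the query term twice and outranks the first, but by less than a factor of two, which is the saturation effect at work.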
Boolean Retrieval: A retrieval model where queries are expressed as Boolean combinations of terms (AND, OR, NOT) and documents either match or do not.
Cosine Similarity: A similarity measure between two vectors computed as the cosine of the angle between them, commonly used to compare document and query vectors.
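A minimal sketch of the cosine computation for two equal-length vectors given as plain Python lists. The function name and the choice to return 0.0 for a zero vector are assumptions of this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention assumed here: a zero vector matches nothing
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
```

Because the measure depends only on direction, a document vector and a scaled copy of it are treated as identical, which is why it is insensitive to raw document length.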
Cranfield Paradigm: The standard experimental methodology for IR evaluation, built on a test collection of documents, a set of queries, and relevance judgments.
Dense Retrieval: An approach to retrieval using learned dense vector embeddings rather than sparse term-based representations for semantic matching.
Information Retrieval: The science of searching for and obtaining relevant information from large data collections, encompassing the algorithms and systems behind search engines and digital libraries.
Inverted Index: A data structure mapping terms to the documents and positions where they occur, enabling efficient full-text search.
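The mapping from terms to documents and positions can be sketched as follows. Whitespace tokenization, lowercasing, and the names `build_inverted_index` and `docs` are simplifying assumptions for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

# Hypothetical three-document collection
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog tricks"}
index = build_inverted_index(docs)

# A Boolean AND query intersects the document sets of its terms
both = set(index["quick"]) & set(index["dog"])
```

Looking up a term costs one dictionary access rather than a scan of every document, which is the source of the efficiency the definition mentions.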
Language Model: A probabilistic model estimating the likelihood of a sequence of words, used in IR to rank documents by the probability of generating the query.
Lemmatization: Reducing words to their dictionary base form (lemma) using linguistic analysis; typically more accurate than stemming, at higher computational cost.
Mean Average Precision (MAP): An evaluation metric that, for each query, averages the precision at the rank of each relevant document, then takes the mean of those averages across a set of queries.
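The two levels of averaging in mean average precision are easy to conflate, so here is a small sketch. The function names and the toy rankings are assumptions for illustration; relevant documents that are never retrieved contribute zero, as in the usual definition.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at the rank of each relevant document."""
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant_ids)

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical results for two queries
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```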
NDCG: Normalized Discounted Cumulative Gain, an evaluation metric for ranked retrieval that supports graded relevance judgments.
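A minimal sketch of the computation, using the linear-gain form with a log2 rank discount. As a simplification assumed here, the ideal ranking is built from the returned list alone rather than from every judged document for the query; the function names are illustrative.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k=None):
    """gains: graded relevance of the returned documents, in ranked order."""
    k = len(gains) if k is None else k
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1])   # already in ideal order
reversed_ = ndcg([1, 2, 3]) # best document ranked last
```

A ranking in ideal order scores 1.0, and placing highly relevant documents lower in the list pulls the score below 1.0.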
PageRank: A link analysis algorithm that assigns importance scores to web pages based on the quantity and quality of incoming hyperlinks.
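One way to compute such scores is power iteration over the link graph, sketched below. The damping factor of 0.85, the fixed iteration count, the handling of pages without outlinks, and the toy graph are assumptions of this example; every link target is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (random jump)
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its outlinks
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web graph
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

Page `c` has the most incoming links and ends up with the highest score, while `d`, which nothing links to, keeps only the baseline share.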
Postings List: The list of documents (and optionally positions) associated with a particular term in an inverted index.
Precision: The proportion of retrieved documents that are relevant to the user's query.
Query Expansion: The process of adding related terms to a query to improve recall by bridging vocabulary mismatches.
Recall: The proportion of all relevant documents in the collection that are successfully retrieved.
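Precision and recall share a numerator (relevant documents that were retrieved) and differ only in the denominator, as this small sketch shows. The function name, the document IDs, and the choice to return 0.0 for an empty set are assumptions of the example.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents retrieved, three relevant in the collection, two in common
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
```

Here precision is 2/4 and recall is 2/3: retrieving more documents can only keep recall the same or raise it, but it often lowers precision.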
Relevance: The degree to which a retrieved document satisfies the user's information need; it can be judged as binary or graded.
Relevance Feedback: Using explicit or implicit user judgments on retrieved documents to iteratively refine the query and improve results.
Rocchio Algorithm: A classic relevance feedback method that adjusts the query vector toward relevant documents and away from non-relevant ones in the vector space model.
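The query adjustment can be sketched as a weighted sum of the original query, the centroid of relevant documents, and the centroid of non-relevant ones. The weights `alpha=1.0`, `beta=0.75`, `gamma=0.15` and the clipping of negative weights to zero are conventional choices assumed here, not requirements of the method.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector; all vectors are equal-length lists."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    updated = [alpha * q + beta * r - gamma * n
               for q, r, n in zip(query, rel_c, non_c)]
    # Negative term weights are clipped to zero (assumed convention)
    return [max(0.0, w) for w in updated]

# Hypothetical three-term vocabulary
new_query = rocchio(
    query=[1.0, 0.0, 0.0],
    relevant=[[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    non_relevant=[[0.0, 0.0, 1.0]],
)
```

The second term, absent from the original query but present in both relevant documents, gains weight, while the term seen only in the non-relevant document stays at zero.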
Stemming: The process of reducing words to a common stem, usually by heuristically stripping affixes, to improve term matching across inflectional variants.
Stop Words: Highly frequent function words (e.g., 'the', 'and', 'is') often removed during indexing to reduce noise and index size.
TF-IDF: A term weighting scheme combining term frequency (how often a term appears in a document) and inverse document frequency (how rare the term is across the collection).
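A minimal sketch of the weighting, using raw term counts and an unsmoothed `log(N / df)` for the inverse document frequency. Many variants exist (log-scaled term frequency, smoothed IDF), so the exact formula and the names below are assumptions of this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: count * math.log(n_docs / df[t])
                        for t, count in tf.items()})
    return weights

# Hypothetical pre-tokenized collection
docs = [["search", "engine", "search"], ["search", "index"], ["index", "compression"]]
w = tf_idf(docs)
```

In the first document, "engine" occurs once but appears nowhere else, so it outweighs "search", which occurs twice but is spread across two documents.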
Tokenization: The process of breaking text into individual units (tokens), typically words or subwords, as a first step in text processing.
TREC: Text REtrieval Conference, an annual NIST-organized evaluation campaign benchmarking IR systems on shared tasks.
Vector Space Model: A model representing documents and queries as vectors in term space, typically using cosine similarity to rank documents by relevance.