Information retrieval (IR) is the science and practice of finding and obtaining relevant information in large collections of data, including text documents, multimedia, and structured databases. The field addresses the fundamental challenge of connecting users with the information they need, encompassing the theories, algorithms, and systems that power modern search engines, digital libraries, and recommendation systems. At its core, IR deals with the representation, storage, and organization of information items and with access to them, drawing on principles from computer science, linguistics, cognitive science, and library science.
The theoretical foundations of information retrieval were laid largely in the 1960s and 1970s, with seminal contributions from researchers such as Gerard Salton, who developed the vector space model and the SMART Information Retrieval System, and Stephen Robertson, who advanced probabilistic retrieval models. The field introduced key evaluation metrics, notably precision (the fraction of retrieved items that are relevant) and recall (the fraction of relevant items that are retrieved), and formalized relevance as something that can be judged and measured. The inverted index, which maps each term to the documents that contain it, became the core data structure for efficient full-text search over massive document collections, paving the way for the web search revolution of the late 1990s and early 2000s.
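The inverted index and the precision and recall metrics mentioned above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not how any particular system such as SMART was implemented: the whitespace tokenizer, the Boolean AND matching, the three sample documents, and the relevance judgments are all hypothetical choices made for illustration, as are the function names.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Boolean AND retrieval: ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # .get avoids creating empty entries for terms that are not indexed.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|; recall = hits / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical three-document collection.
docs = {
    1: "inverted index for full text search",
    2: "vector space model for text retrieval",
    3: "probabilistic retrieval model",
}
index = build_inverted_index(docs)

retrieved = search(index, "text retrieval")
relevant = {2, 3}  # assumed relevance judgments for this query
precision, recall = precision_recall(retrieved, relevant)
print(retrieved, precision, recall)
```

In this toy collection the query matches only document 2, so precision is 1.0 while recall is 0.5: everything returned is relevant, but half of the relevant documents are missed. Strict Boolean matching tends to produce this trade-off, which is one motivation for ranked models such as the vector space and probabilistic models.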
Today, information retrieval encompasses a broad range of topics including web search, question answering, text classification, clustering, filtering, and recommendation. Modern IR systems leverage machine learning, natural language processing, and deep learning techniques such as transformer-based neural ranking models to improve search quality. Evaluation campaigns such as TREC (Text REtrieval Conference) continue to drive innovation by giving researchers shared test collections on which to compare systems. As the volume of digital information continues to grow rapidly, effective retrieval remains a critical capability for individuals, businesses, and society at large.