Data Analytics Glossary
25 essential terms — because precise language is the foundation of clear thinking in Data Analytics.
A/B Testing: A controlled experiment that compares two variants by randomly assigning subjects to each group, used to determine which version performs better on a defined metric.
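One common way to score an A/B test on a conversion metric is a two-proportion z-test. A minimal sketch using only the Python standard library (the function name and the conversion counts are illustrative):

```python
from statistics import NormalDist

def ab_test_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test comparing conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided p-value

# Hypothetical data: variant B converts 120/1000 vs. A's 90/1000
p = ab_test_pvalue(90, 1000, 120, 1000)
print(f"p-value: {p:.4f}")
```

A small p-value here would suggest the difference between variants is unlikely to be due to random assignment alone.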
Business Intelligence (BI): The strategies, technologies, and tools used to collect, integrate, analyze, and present business data to support better decision-making, typically through dashboards and reports.
Cohort Analysis: An analytical technique that groups subjects by a shared characteristic or time period and tracks their behavior over time to identify trends and lifecycle patterns.
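A minimal sketch of grouping users into signup-month cohorts and counting how many remain active in each later month (the event records are invented):

```python
from collections import defaultdict

# (user_id, signup_month, activity_month) events -- illustrative data
events = [
    (1, "2024-01", "2024-01"), (1, "2024-01", "2024-02"),
    (2, "2024-01", "2024-01"),
    (3, "2024-02", "2024-02"), (3, "2024-02", "2024-03"),
]

# Group users by signup-month cohort, then by the month they were active
cohorts = defaultdict(lambda: defaultdict(set))
for user, signup, active in events:
    cohorts[signup][active].add(user)

for signup in sorted(cohorts):
    counts = {month: len(users) for month, users in sorted(cohorts[signup].items())}
    print(signup, counts)
```

Reading each cohort's counts across months yields the familiar retention triangle: the January cohort starts with 2 active users and retains 1 the next month.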
Data Governance: The framework of policies, roles, processes, and standards that ensures organizational data is managed consistently, securely, and in compliance with applicable regulations.
Data Imputation: The process of replacing missing data values with substituted values using methods such as mean replacement, interpolation, k-nearest neighbors, or model-based approaches.
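The simplest of these methods, mean replacement, can be sketched in a few lines (function name and sample values are illustrative):

```python
def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

print(mean_impute([10, None, 14, None, 12]))  # -> [10, 12.0, 14, 12.0, 12]
```

Mean replacement preserves the column average but shrinks its variance, which is why interpolation or model-based methods are often preferred for downstream modeling.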
Data Lake: A storage system that holds vast amounts of raw data in its native format until needed, supporting schema-on-read and accommodating structured, semi-structured, and unstructured data.
Data Lineage: The documentation of data's origins, movements, and transformations throughout its lifecycle within an organization's systems and processes.
Data Visualization: The graphical representation of data through charts, graphs, maps, and dashboards to make complex information accessible and to reveal patterns, trends, and outliers.
Data Warehouse: A centralized repository that stores structured, processed data from multiple sources, optimized for analytical querying and reporting using a schema-on-write approach.
Descriptive Analytics: The tier of analytics that summarizes historical data using aggregation, visualization, and reporting to answer the question 'What happened?'
Dimensionality Reduction: Techniques that reduce the number of variables in a dataset while preserving as much information as possible, such as PCA, to simplify models and mitigate the curse of dimensionality.
ETL (Extract, Transform, Load): A data integration process that extracts data from sources, transforms it into a clean and consistent format, and loads it into a target data store.
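The three stages can be sketched end-to-end with the standard library. This is a toy pipeline, assuming an in-memory CSV stands in for the source system and SQLite for the target store:

```python
import csv
import io
import sqlite3

# Extract: read raw rows (an in-memory CSV standing in for a source file)
raw = io.StringIO("name,amount\n alice ,100\nBOB,200\n")
rows = list(csv.DictReader(raw))

# Transform: trim whitespace, normalize case, cast types
clean = [{"name": r["name"].strip().title(), "amount": int(r["amount"])}
         for r in rows]

# Load: write the cleaned rows into a target table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (:name, :amount)", clean)
total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

Real pipelines add error handling, incremental loads, and logging, but the extract/transform/load separation is the same.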
Feature Engineering: The process of using domain knowledge to create, select, or transform input variables from raw data to improve the performance of analytical or machine learning models.
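For example, raw order records might be turned into model inputs like revenue and a weekend flag. A small sketch (the field names and records are invented):

```python
from datetime import date

# Illustrative raw order records
orders = [
    {"order_date": date(2024, 1, 6), "price": 40.0, "quantity": 3},
    {"order_date": date(2024, 1, 9), "price": 25.0, "quantity": 2},
]

def engineer(order):
    """Derive model inputs from raw fields using simple domain knowledge."""
    return {
        "revenue": order["price"] * order["quantity"],     # interaction feature
        "is_weekend": order["order_date"].weekday() >= 5,  # calendar feature
    }

features = [engineer(o) for o in orders]
print(features)
```

Neither derived column exists in the raw data, yet both often carry more predictive signal than the original fields.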
KPI (Key Performance Indicator): A quantifiable metric used to evaluate how effectively an organization or process is achieving its strategic and operational objectives.
Null Hypothesis: The default assumption in hypothesis testing that there is no effect or no difference between groups. Statistical tests attempt to gather evidence against the null hypothesis.
OLAP (Online Analytical Processing): A category of systems designed for complex, multi-dimensional queries over large historical datasets, supporting operations like slicing, dicing, and drill-down.
Outlier: A data point that differs significantly from other observations in a dataset. Outliers may indicate measurement errors, data entry mistakes, or genuinely extreme values that require investigation.
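One standard detection rule is Tukey's fences, which flags points more than 1.5 interquartile ranges outside the middle half of the data. A stdlib-only sketch (function name and sample values are illustrative):

```python
from statistics import quantiles

def iqr_outliers(data):
    """Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```

Whether a flagged point is an error or a legitimate extreme still requires domain judgment; the rule only nominates candidates.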
P-value: The probability of observing results at least as extreme as the data, assuming the null hypothesis is true. Lower p-values provide stronger evidence against the null hypothesis.
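This definition can be made concrete with a permutation test: shuffle the group labels many times and count how often the shuffled mean difference is at least as extreme as the observed one. A sketch with invented data (the function name and sample values are illustrative):

```python
import random
from statistics import mean

def permutation_pvalue(a, b, n_perm=5000, seed=0):
    """Estimate the p-value of the observed mean difference by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:  # "at least as extreme" under the null
            hits += 1
    return hits / n_perm

p = permutation_pvalue([2.1, 2.4, 2.3, 2.2], [3.1, 3.0, 3.3, 3.2])
print(f"p is approximately {p:.3f}")
```

Because the two groups barely overlap, very few label shufflings reproduce a gap that large, so the estimated p-value is small.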
Predictive Analytics: The use of statistical models and machine learning on historical data to forecast future outcomes and probabilities.
Prescriptive Analytics: The most advanced analytics tier, which recommends specific actions by combining predictive models with optimization and simulation techniques.
Regression Analysis: A statistical method for estimating relationships between a dependent variable and one or more independent variables, used for prediction and understanding variable influence.
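The simplest case, ordinary least squares with one independent variable, fits a line y = slope*x + intercept. A minimal sketch (function name and data points are illustrative):

```python
def least_squares(xs, ys):
    """Fit y = slope*x + intercept by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # -> 2.0 1.0
```

Here the data lie exactly on y = 2x + 1, so the fit recovers the true coefficients; with noisy data the same formulas return the best-fitting line in the least-squares sense.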
SQL (Structured Query Language): The standard programming language for managing and querying relational databases, widely used for data retrieval, manipulation, and reporting in analytics.
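A typical analytical SQL query groups and aggregates rows. The following sketch runs against an in-memory SQLite database via Python's sqlite3 module (table and column names are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (region TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("east", 100.0), ("west", 50.0), ("east", 75.0)])

# Aggregate revenue per region, largest first
rows = db.execute("""
    SELECT region, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
""").fetchall()
print(rows)
```

The SELECT/GROUP BY/ORDER BY pattern shown here is the workhorse of reporting queries across all relational databases.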
Star Schema: A data warehouse modeling approach featuring a central fact table linked to surrounding dimension tables, resembling a star and optimized for analytical queries.
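A toy star schema in SQLite shows the shape: a fact table of measures joined to dimension tables that describe the who, what, and when (all table names and values are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Dimension tables hold descriptive attributes; the fact table holds measures.
db.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date    VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales  VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 40.0);
""")

# A typical analytical query: join the fact to its dimensions and aggregate
rows = db.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d    ON f.date_id = d.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rows)
```

Every analytical question becomes the same pattern: join the fact table out to whichever dimensions the question slices by, then aggregate.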
Statistical Significance: A determination that an observed result is unlikely to have occurred by chance, typically assessed using a p-value threshold of 0.05.
Survivorship Bias: A logical error that occurs when analysis focuses only on entities that passed a selection process, ignoring those that did not, leading to overly optimistic or skewed conclusions.
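A classic illustration is fund performance: averaging returns only over funds that still exist overstates how well the typical fund did. A sketch with invented numbers:

```python
from statistics import mean

# Returns (%) for all funds launched in a year -- illustrative data
funds = [("A", 8.0), ("B", -12.0), ("C", 10.0), ("D", -15.0), ("E", 9.0)]
closed = {"B", "D"}  # funds that shut down and vanish from today's listings

survivors_only = mean(r for name, r in funds if name not in closed)
all_funds = mean(r for name, r in funds)
print(survivors_only, all_funds)  # survivors look far better than the full cohort
```

Because the losers were removed from the sample, the survivors-only average (9.0%) is dramatically rosier than the true cohort average (0.0%).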