History of Statistics

Big Data

One of the most recent developments in statistics is the use of big data. This includes modeling, data science, and data analytics, all made possible by the advent and improvement of computers. Big data is characterized by very large datasets, a short time available to prepare data for analysis, and a wide variety of data types. Models built on such data are often cost effective because they can sort through people or other populations quickly, for example to find those for whom a job interview or an advertisement might be most effective. Before creating a model, a researcher must clean the data, eliminating irrelevant and unreliable variables. Because big data is too large to analyze with traditional methods, it requires advanced analytic methods, which use algorithms to create a model. Models that integrate artificial intelligence through machine learning are becoming more common.
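The cleaning step described above can be illustrated with a small sketch. It is only one possible reading of "irrelevant and unreliable variables": it assumes irrelevant variables are named by the researcher and that a variable counts as unreliable when it is missing in more than half of the records. The function name `clean_columns`, the 0.5 threshold, and the sample records are all hypothetical and chosen for illustration.

```python
def clean_columns(rows, irrelevant, max_missing_fraction=0.5):
    """Drop columns listed as irrelevant and columns that are missing too often.

    rows: list of dicts, one per record, all sharing the same keys.
    irrelevant: set of column names the researcher judges unrelated to the question.
    max_missing_fraction: hypothetical cutoff above which a column is treated as unreliable.
    """
    if not rows:
        return []
    n = len(rows)
    keep = []
    for col in rows[0].keys():
        if col in irrelevant:
            continue  # variable judged irrelevant by the researcher
        missing = sum(1 for row in rows if row.get(col) is None)
        if missing / n > max_missing_fraction:
            continue  # variable treated as unreliable: too many missing values
        keep.append(col)
    # Rebuild each record with only the retained columns.
    return [{col: row.get(col) for col in keep} for row in rows]


# Made-up records, for illustration only.
records = [
    {"id": 1, "income": 42000, "favorite_color": "blue", "sensor": None},
    {"id": 2, "income": 58000, "favorite_color": "red", "sensor": None},
    {"id": 3, "income": None, "favorite_color": "green", "sensor": 0.7},
]

cleaned = clean_columns(records, irrelevant={"favorite_color"})
```

Here `favorite_color` is dropped because it was named as irrelevant, and `sensor` is dropped because it is missing in two of the three records. In practice, deciding which variables are irrelevant or unreliable is a judgment call that depends on the question being asked.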

However, modeling with big data comes with costs. O'Neil (2017) contended that many predictions made by statistical models can hurt individuals, especially those already disadvantaged in society, such as the poor or minorities, because biases become encoded in predictive algorithms. She called such models Weapons of Math Destruction, or WMDs. Big data models inform credit score calculations, which are sometimes incorporated in models for loan approval or job interviews. Someone with a poor credit score often gets a higher interest rate on loans or a lower-paying job, which makes repayment harder and thus reinforces the model's prediction that they are less likely to be able to pay back a loan in the future. WMDs used to determine criminal sentence length predict recidivism based on characteristics that are often correlated with race and socioeconomic status. Many other WMDs involve so many variables and such complicated algorithms that their accuracy is difficult to evaluate, because the exact algorithm cannot be traced. She argued that big data models can be valuable tools if they are used appropriately: they show their input data, they show their targeted results, and they are easily audited.

The advent of big data suggests that statistics will continue to develop with no end in sight.