History of Linear Regression

Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables. It has been widely adopted in various fields, including economics, medicine, finance, and social sciences.
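To make the definition concrete, here is a minimal sketch of fitting a straight line with one independent variable by ordinary least squares. The function name fit_line and the sample data are hypothetical and chosen only for illustration; the sketch assumes NumPy is available and is not meant as a reference implementation.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the predictor.
    X = np.column_stack([np.ones_like(x), x])
    # Find the coefficients that minimize the sum of squared residuals.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercept, slope = coef
    return float(intercept), float(slope)


# Made-up data that lie exactly on the line y = 1 + 2x.
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(x_demo, y_demo)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With several independent variables, the same approach applies with one extra column in the design matrix per variable.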

Early Beginnings

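The method of least squares, on which linear regression rests, was first published by Adrien-Marie Legendre in 1805 and by Carl Friedrich Gauss in 1809. The term “regression” was introduced by Francis Galton in the 1880s in his studies of heredity, where he observed that the children of unusually tall or short parents tended to be closer to average height. William Sealy Gosset, an English statistician working at the Guinness brewery and writing under the pseudonym “Student,” later contributed the theory of small-sample errors in his 1908 paper “The Probable Error of a Mean.”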

Development Up to World War I

In the two decades before World War I, Karl Pearson and George Udny Yule gave regression its mathematical form. In 1896 Pearson formalized the product-moment correlation coefficient, and Yule’s 1897 paper “On the Theory of Correlation” connected regression to the method of least squares and showed how correlation coefficients could be used to model the relationship between variables.

The Birth of Modern Linear Regression

It was Ronald A. Fisher, an English statistician and geneticist, who put linear regression on its modern footing in the 1920s. In his 1922 paper “The Goodness of Fit of Regression Formulae, and the Distribution of Regression Coefficients,” Fisher derived the sampling distributions of regression coefficients, which made it possible to test whether a fitted relationship between variables is statistically significant.

Fisher’s work built upon earlier research on correlation and least squares and provided a more rigorous mathematical foundation for linear regression. His framework for testing regression coefficients is still widely used today in many fields.

The Advent of Computer Software

The spread of electronic computers from the 1950s onward transformed data analysis and paved the way for modern linear regression. One of the earliest widely used statistical packages was SAS (Statistical Analysis System), developed at North Carolina State University beginning in the late 1960s and commercialized by SAS Institute, which was founded in 1976.

SAS provided a range of statistical procedures, including linear regression. SPSS, first released in 1968, developed in parallel, and R followed in the 1990s. All three have since become standard tools for data analysis.

The Era of Open-Source Software

The open-source movement of the 1980s and 1990s led to freely available tools for linear regression, most notably R and the scientific libraries of Python. MATLAB, which spread in the same period, is a proprietary product rather than open-source software. The open-source packages gave users access to the source code and to community-contributed extensions, which commercial software did not.

Open-source software has democratized access to statistical analysis and has enabled researchers and practitioners from diverse backgrounds to contribute to the field.

The Golden Age of Linear Regression

The 1990s and 2000s can be regarded as a “golden age” of linear regression. During this period, advances in computing power and data storage made it routine to fit regression models to much larger datasets.

New versions of packages such as R, SPSS, and SAS were released regularly, featuring improved user interfaces, new statistical functions, and enhanced capabilities.

The Modern Era

Today, linear regression is a ubiquitous tool used in various fields, including economics, medicine, finance, and social sciences. Advances in computing power, data storage, and software have made it possible to analyze large datasets and perform complex analyses.

Linear regression has also been extended and complemented by other methods. Generalized linear models (GLMs), introduced by John Nelder and Robert Wedderburn in 1972, extend it to response variables that are not normally distributed, while machine learning algorithms such as random forests and support vector machines offer nonlinear alternatives.

Conclusion

Linear regression is a fundamental statistical technique that has been widely adopted in various fields. From its nineteenth-century origins to its modern implementations in computer software, linear regression has evolved significantly over the years.

The advent of open-source software, advances in computing power, and improvements in data storage have democratized access to statistical analysis and enabled researchers and practitioners from diverse backgrounds to contribute to the field.

As a result, linear regression continues to be an essential tool for analyzing data and making informed decisions in various fields.
