# Statistics, Origins and People

Science is perhaps not as pure as it is touted to be.

The origins of statistics feel rather uneasy. In 1877, Francis Galton presented at the ritualistic Friday Evening Discourse at the Royal Institution of Great Britain. His talk was titled “Typical Laws of Heredity”. That’s right, statistics started with eugenics. He brought along a physical apparatus called a quincunx, now known as the Galton board.

The Galton board was a physical realization of the extraordinary result of the central limit theorem. Balls would fall through a Pascal’s triangle of pins into vertical columns and form a bell-shaped distribution at the bottom. This was the period in which the concepts of “regression to the mean” and “co-relation” (modern correlation) were first developed. While each ball’s path seems chaotic, the population of balls shows order. This was beautiful.
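To make the board concrete, here is a minimal simulation sketch. It assumes an idealized board where every pin deflects a ball left or right with equal probability; the function name `galton_board`, the row count, and the ball count are all my own choices for illustration, not anything from Galton's apparatus.

```python
import random
from collections import Counter


def galton_board(n_balls: int, n_rows: int, seed: int = 0) -> Counter:
    """Drop n_balls through n_rows of pins and count balls per bin.

    Assumption: each pin sends the ball right with probability 0.5,
    independently of every other pin (an idealized board).
    """
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(n_balls):
        # The final bin index is simply the number of rightward bounces.
        position = sum(rng.randint(0, 1) for _ in range(n_rows))
        bins[position] += 1
    return bins


n_rows = 12
counts = galton_board(n_balls=10_000, n_rows=n_rows)

# Crude text histogram: one '#' per 50 balls in each bin.
for k in range(n_rows + 1):
    print(f"{k:2d} {'#' * (counts[k] // 50)}")
```

Each individual ball takes an unpredictable path, yet the bins fill up in a roughly bell-shaped pattern centered on the middle column, which is the order-from-chaos effect described above.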

Nonetheless, it pains me to see that science isn’t as pure as it is touted to be and cannot be separated from the people who do it. In Judea Pearl’s words,

> It is an irony of history that Galton started out in search of causation and ended up discovering correlation, a relationship that is oblivious of causation.

Galton had an everlasting influence on his disciple, Karl Pearson. Pearson subsequently became an extremely powerful scientific figure by pretty much creating the field of statistics. He went on to found the well-known journal Biometrika. He would be described by his biographer as a “zealot”. The scientific discourse wasn’t as clean as one might have wanted it to be. Dissenters were largely sidelined, and competing ideas were very hard to publish in Biometrika, which Pearson edited personally during its formative years.

The philosophical predispositions of Pearson (and Galton) blinded them to the conspicuous failings of correlation and led them to relegate causation to a special case of it. Today, we know that the opposite is true. Pearson himself contributed a vast number of examples of “spurious correlations”. One of them, as early as 1899, was based on Simpson’s paradox: mixing different populations can reverse the conclusions drawn from each population on its own. However, he dismissed such cases as statistical artifacts of inappropriate mixing. This was a very big missed opportunity.
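A tiny numerical sketch shows how such a reversal can arise. The counts below are hypothetical, invented purely for illustration (they are not Pearson's 1899 data), and the labels `A`, `B`, `group 1`, and `group 2` are placeholders of my own.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts, invented purely for illustration.
data = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes: int, trials: int) -> Fraction:
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def pooled_rate(option: str) -> Fraction:
    """Success rate for one option after mixing both groups together."""
    successes = sum(data[g][option][0] for g in data)
    trials = sum(data[g][option][1] for g in data)
    return Fraction(successes, trials)


# Within each group, A has the higher success rate.
for group, options in data.items():
    a, b = rate(*options["A"]), rate(*options["B"])
    print(f"{group}: A={float(a):.2f}  B={float(b):.2f}")

# After pooling the groups, the ordering flips and B looks better.
print(f"pooled:  A={float(pooled_rate('A')):.2f}  B={float(pooled_rate('B')):.2f}")
```

The flip happens because A is mostly tried in the group where success is rare and B mostly in the group where it is common, so the pooled rates reflect group membership more than the options themselves. Whether the mixed or the separated view is the right one is a causal question, which is exactly what a purely correlational outlook cannot answer.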

Something similar happened to probabilistic methods in the decades before the 1980s, when rule-based logic systems dominated discourse in AI research. Later, neural networks suffered a similar fate until the turn of the 21st century, when impressive results finally compelled the community to take note.

Science is a social endeavor. The sociological side effects are inseparable from it. People will always make mistakes and bad judgements, often with questionable ulterior motives. I wish scientific discourse were as pure as I imagined it to be as a child. Maybe there is a silver lining: science corrects itself given enough time. But is it worth wasting years fighting the wrong battles?