William Sealy Gosset (1876-1937)

William Gosset graduated from Oxford at the age of 23 with a degree in chemistry and mathematics. At a time when companies had only recently begun hiring mathematicians, he was immediately hired by the Guinness Brewery, where his job was to ensure consistency in brewing. The yeast used for fermentation was cultured in jars, and the density of yeast organisms varied from jar to jar. Accurate dosing mattered: too little yeast left fermentation incomplete, while too much gave the beer a bitter taste. Because yeast is a living organism, not only did the number of cells in each sample vary by chance, but the number of cells in the jar also changed over time. Through his investigation, he found that the number of cells could be modeled using the Poisson distribution. This was a remarkable accomplishment because few naturally occurring data sets following the Poisson distribution had been identified since the distribution itself was discovered.

The prevailing idea at the time was that a researcher needed a very large sample to get a good estimate of a population parameter. However, Gosset realized that in many cases a large sample was not practical. He derived analytically the distribution of the sample mean, standardized by the sample standard deviation, for very small samples. He painstakingly checked his work by drawing 750 random samples of size four from data on the heights and left middle finger lengths of 3,000 criminals. He then used a chi-square goodness-of-fit test to compare the distribution he had derived with the distribution of the statistics computed from his samples. The two agreed closely enough to conclude that the sample values followed the distribution he had identified, which later became known as the t-distribution. His calculations assumed that the data came from a normally distributed population, but later research showed that the distribution holds approximately for samples from many non-normal populations as well, meaning that significance tests based on the t-distribution are robust.
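To see why small samples needed their own distribution, the following Python sketch repeats the spirit of Gosset's check under simplifying assumptions. It is not his procedure: instead of the criminal data it draws 750 samples of size four from a simulated standard normal population, and it uses the modern form of the statistic, (sample mean − μ) / (s / √n), with the n − 1 sample standard deviation. The function names `t_statistic` and `simulate` and the seed are hypothetical choices for illustration. The cutoffs 1.960 and 3.182 are the standard two-sided 5% critical values of the normal distribution and of the t-distribution with 3 degrees of freedom.

```python
import math
import random
import statistics


def t_statistic(sample, mu):
    """Modern Student's t: (sample mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    return (statistics.mean(sample) - mu) / (s / math.sqrt(n))


def simulate(num_samples=750, n=4, mu=0.0, sigma=1.0, seed=1908):
    """Draw num_samples samples of size n from a normal population (an assumption)
    and return the t statistic of each sample."""
    rng = random.Random(seed)
    return [
        t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
        for _ in range(num_samples)
    ]


t_values = simulate()

# Two-sided 5% critical values: normal approximation vs. t with n - 1 = 3 degrees of freedom.
Z_CRIT = 1.960
T_CRIT_DF3 = 3.182

frac_beyond_z = sum(abs(t) > Z_CRIT for t in t_values) / len(t_values)
frac_beyond_t = sum(abs(t) > T_CRIT_DF3 for t in t_values) / len(t_values)

print(f"share with |t| > {Z_CRIT}: {frac_beyond_z:.3f}")      # should be well above 0.05
print(f"share with |t| > {T_CRIT_DF3}: {frac_beyond_t:.3f}")  # should be close to 0.05
```

With samples of size four, theory predicts that the normal cutoff flags roughly 14 to 15 percent of the statistics instead of the nominal 5 percent, while the t cutoff flags close to 5 percent. That gap is the practical problem Gosset's distribution addressed for small samples.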

Gosset wanted to publish his findings, but Guinness had a strict policy against employees publishing, adopted after an earlier employee had revealed trade secrets. Karl Pearson was eager to publish Gosset's findings in his journal, Biometrika, so they agreed to publish them under the pseudonym "Student." The paper that introduced his t-distribution (often called Student's t-distribution) was titled "The Probable Error of a Mean" and appeared in 1908. He later published more work under the same name. Guinness eventually learned of his publications, though not for many years, and possibly not until his sudden death. The company could hardly complain that he had wasted paid time, since so much of his work benefited Guinness and was done outside working hours.

Gosset became an intermediary between R. A. Fisher and Pearson, who had a history of not getting along. His friendship with Fisher began while Fisher was studying at Cambridge. Fisher, probably working independently of Gosset, had written a paper reaching the same findings as Gosset's 1908 work, so Fisher's tutor introduced the two. Gosset found a small error in Fisher's paper, and Fisher responded with two pages of complicated mathematics, including a proof using multidimensional geometry. Gosset often complained that he did not understand what Fisher wrote, but the two remained lifelong friends.