Statistics: Math or Myth?

BY - Galit Shmuéli - Kent Johnson
ILLUSTRATION BY - Goñi Montes

Answering a Question with the Wrong Statistics

by Kent Johnson

Simon and Cathay are chefs at two restaurants in town. They recently discovered the recipe for a delicate pastry that is made just before serving. If the pastry is not made correctly, it falls flat and cannot be served. Simon and Cathay decided to have a friendly competition to see who is more skilled at making these pastries. Last Tuesday and Wednesday, they both offered a special that contained the pastry; whoever had the larger percentage of pastries turn out well would win a bottle of single-malt scotch.
On Tuesday night, Simon and Cathay met after work to compare their evening’s records; that night, 20% of Simon’s pastries were successful, but only 10% of Cathay’s were. They both thought they could do better, so they practiced during the night. On Wednesday night, 95% of Simon’s pastries turned out well, but only 80% of Cathay’s did. Upon seeing this, Simon victoriously grabbed the scotch and poured a drink. But before he could toast his victory, Cathay quietly showed him something she’d written on some paper. They studied the figures. Eventually, Simon let out a big laugh, graciously handed Cathay the scotch, and declared her to be the contest’s true winner!
Simon had the better success rate on Tuesday night, and again on Wednesday night. Yet Cathay had the better success rate for Tuesday and Wednesday nights taken together. How could this happen? If this strikes you as bizarre, you are not alone. However, cases such as this one, known as “Simpson’s Paradox”, can occur quite easily. To see what happened, we must look at the absolute numbers that produced the percentages, not the percentages themselves. The reversal is possible when the chefs make very different numbers of pastries each night: a chef whose attempts fall mostly on the weaker night is pulled down overall, however good the other night was. Since the contest concerned the best overall average, Cathay tallied up all of her trials and all of her successes over both days. Those two totals, not the nightly percentages, determined the overall scores in the contest.
When we consider a statistic, we are viewing a summary that will hopefully allow us to draw relevant and appropriate conclusions. But summaries always involve some loss of information. That is, a statistic provides an accurate report of one aspect of what may be a very complex situation. Asking a statistic to say something about some other aspects of the situation can be a risky business.
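The pastry contest above can be reproduced with a few lines of arithmetic. The article reports only the nightly percentages, so the counts in this sketch are hypothetical, chosen to match the reported 20%, 10%, 95%, and 80%; the names `records`, `daily_rates`, and `overall_rate` are illustrative as well.

```python
from fractions import Fraction

# Hypothetical (successes, attempts) per night. The article gives only the
# percentages; these counts are assumptions picked to reproduce them.
records = {
    "Simon": {"Tuesday": (20, 100), "Wednesday": (19, 20)},
    "Cathay": {"Tuesday": (2, 20), "Wednesday": (80, 100)},
}


def daily_rates(chef):
    """Success rate for each night, computed from that night's counts."""
    return {day: Fraction(s, n) for day, (s, n) in records[chef].items()}


def overall_rate(chef):
    """Success rate over both nights: total successes / total attempts."""
    successes = sum(s for s, _ in records[chef].values())
    attempts = sum(n for _, n in records[chef].values())
    return Fraction(successes, attempts)


for chef in records:
    per_day = ", ".join(
        f"{day}: {float(r):.0%}" for day, r in daily_rates(chef).items()
    )
    print(f"{chef}: {per_day}, overall: {float(overall_rate(chef)):.1%}")
```

With these assumed counts, Simon wins each night taken separately, yet most of his attempts fall on the weak Tuesday, so his combined rate ends up well below Cathay’s. Other counts that match the same percentages would give different overall figures.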

Workplace Statistics: Who’s Asking?

by Galit Shmuéli

Managers prefer “hard data” over gut feelings for decision making. Computing the right statistic from a set of data and interpreting its meaning requires figuring out the exact question that we are trying to answer. A critical aspect of the question is: who is asking? Is it the employer, the employee, or perhaps a government agency? In each case, a different number might be needed.
Let me illustrate this point by considering employer-initiated drug screening. Drug tests used by companies to screen potential and current employees are controversial for various ethical and legal reasons. One such issue is the false-positive rate of the tests. A false-positive result means that the tested person incorrectly tests positive. In drug screening, this would mean that a non-user tests positive for drug use. Various factors affect the accuracy of urine drug tests. According to the European Workplace Drug Testing Society, “an analytical positive result may be due to medication or to dietary causes.” False-positive results can clearly carry a hefty price. Therefore, a key piece of information that both employers and employees need from the testing agency is the false-positive rate of the test.
Public information on false-positive rates of standard urinalysis drug testing is not readily available. While some sources report a rate of 1%-2.5%, others claim “at least 10 percent, and possibly as much as 30 percent”. But what exactly is a false-positive rate? How is it computed? There appears to be confusion on this point. An employee taking the test wants to know: “Given that I am not using drugs, what is the chance that I will test positive?” In contrast, an employer, who must evaluate the costs and other resources related to follow-up testing, rehabilitation plans and compensation, might want to know: “Given all the positive test results in my company, what percentage is incorrect?”
While the employee’s and the employer’s questions seem similar, they in fact differ significantly. The employee knows whether he or she uses drugs, but the test result is still unknown when the question is asked. In the employer’s case, the test results are known, but the employees’ drug usage is unknown.
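The gap between the two questions can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: the workforce size, the share of drug users, and the test’s detection rate are hypothetical, and the 2% false-positive rate is simply one value within the 1%-2.5% range quoted above. The function name `screening_counts` is illustrative.

```python
from fractions import Fraction


def screening_counts(n_employees, prevalence, sensitivity, false_positive_rate):
    """Expected counts of true and false positives in a screened workforce.

    All inputs are assumptions for illustration, not measured values.
    """
    users = n_employees * prevalence
    non_users = n_employees - users
    true_positives = users * sensitivity
    false_positives = non_users * false_positive_rate
    return users, non_users, true_positives, false_positives


# Hypothetical scenario: 1,000 employees, 5% of whom use drugs; the test
# detects 90% of users and wrongly flags 2% of non-users.
users, non_users, true_pos, false_pos = screening_counts(
    n_employees=1000,
    prevalence=Fraction(5, 100),
    sensitivity=Fraction(9, 10),
    false_positive_rate=Fraction(2, 100),
)

# Employee's question: given that I am a non-user, how likely is a positive?
employee_answer = false_pos / non_users

# Employer's question: given a positive result, how likely is it incorrect?
employer_answer = false_pos / (false_pos + true_pos)

print(f"Employee's question: {float(employee_answer):.1%}")
print(f"Employer's question: {float(employer_answer):.1%}")
```

Under these assumptions the employee faces a 2% chance of a wrong positive, while nearly 30% of the employer’s positive results are wrong, because non-users far outnumber users. The same test therefore yields two very different “false-positive rates”, depending on who is asking.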

[W   socsci.uci.edu/~johnsonk     galitshmueli.com]

20.10.2010
The world celebrates the first World Statistics Day (WSD) to promote the many achievements of official statistics premised on the core values of service, professionalism and integrity. The WSD serves as a tool to further support the work of statisticians across different settings, cultures, and domains. On World Statistics Day, activities at the national level will highlight the role of official statistics and the many achievements of the national statistical system. International, regional and sub-regional organizations will complement national activities with additional events. United Nations Secretary-General Ban Ki-moon explains, “Today marks the first observance of WSD, proclaimed to recognize the importance of statistics in shaping our Societies. However, as in so many other areas, developing countries often find themselves at a disadvantage. On this first WSD I encourage the international community to work with the United Nations to enable all countries to meet their statistical needs. Let us all acknowledge the crucial role of statistics in fulfilling our global mission of development and peace.”

Published in the hard-copy of Work Style Magazine, Fall 2010