Thursday, October 28, 2021

Without Bayes, You Are Easily Tricked

Suppose 1% of the population has COVID and the PCR test is 99% accurate.  You are going to travel internationally, and a negative PCR test is required.  Your test is positive.  What is the probability that you have COVID?  Most people will answer 99%.  After all, the test is 99% accurate.  The correct answer is 50%.

This is how it works.  Suppose 1,000,000 people selected at random are tested.  1% have COVID so, in this thought experiment, there are 10,000 with it.  Since the test is 99% accurate, 9,900 of them will test positive.  The other 990,000 do not have COVID, but because the test is 99% accurate, 1% of them, or 9,900, will test positive anyway.  So, in total, 19,800 will test positive and half of those do not have the disease.
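The thought experiment above can be reproduced in a few lines of Python.  This is only a sketch: it assumes that '99% accurate' means the test catches 99% of infected people and clears 99% of healthy people, and the function and variable names are made up for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test), from Bayes' theorem."""
    true_pos = prevalence * sensitivity                 # infected and flagged
    false_pos = (1.0 - prevalence) * (1.0 - specificity)  # healthy but flagged
    return true_pos / (true_pos + false_pos)


# The same calculation as head counts, using whole numbers.
population = 1_000_000
infected = population * 1 // 100         # 1% prevalence
healthy = population - infected
true_positives = infected * 99 // 100    # 99% of the infected test positive
false_positives = healthy * 1 // 100     # 1% of the healthy test positive
total_positives = true_positives + false_positives

p = posterior_given_positive(0.01, 0.99, 0.99)
print(f"positives: {total_positives:,}, of which false: {false_positives:,}")
print(f"P(COVID | positive) = {p:.2%}")
```

Changing the first argument shows how strongly the answer depends on how common the disease is among the people being tested.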

This is Bayesian probability, and it is properly used when there are relevant prior probabilities.  Without training, people are very bad at assessing these situations.  Because of this, it can be, and is, used to trick people without lies.  You will be told that 10,000 people tested positive for COVID and that the test is 99% accurate.  They did not lie.  But you will likely walk away with the impression that the 10,000 estimate is very close to reality.  You now know that it may be very wrong.

This was highlighted when the 'Monty Hall Problem' was widely published.  It goes like this.  A game show host shows you three doors and tells you that there is a donkey behind two of them and a brand new car behind the third.  He tells you that you may have what is behind the door you select.  Nearly everyone understands that they have a ⅓ chance of getting the car.

However, rather than showing you what you won, the game show host opens one of the doors that you didn't choose to expose a donkey.  He then tells you that you can stay with the door you selected or switch.  The question is, "Should you stay or switch?"

The vast majority of people say it doesn't matter.  When I first heard this problem, I immediately said, "You switch, of course", which is the correct answer.  I knew the correct answer because I know Bayesian probability.  What is interesting is that most people didn't believe the answer even when it was explained to them.

The best way to explain the answer is like this.  There was a ⅓ chance you selected the car and a ⅔ chance that you didn't.  The host knows where the car is and always opens a door hiding a donkey, so his reveal does not change the ⅓ chance attached to your door.  The ⅔ chance that you didn't choose the car now resides in the one remaining unopened door.  You should switch to it.
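For anyone who still doesn't believe it, the game is easy to simulate.  The sketch below assumes the host always knows where the car is and always opens a donkey door that you didn't pick.  The function names, trial count and seed are arbitrary choices for illustration.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither the pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won under a fixed stay-or-switch strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


print(f"stay:   {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

Over many rounds the stay strategy should win about a third of the time and the switch strategy about two thirds.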

Returning to the issue of COVID, the person conversant in Bayesian probability will understand that a reported case count cannot be judged without two other values: the positivity rate and the test accuracy.  Without them, the reported number may or may not be reliable.

Let's take a look at a recent day of U.S. reported numbers.  The number of tests reported was 1,494,000 and the number of reported cases was 93,000.  This is a positivity rate of about 6.2%.  So, if the accuracy of the tests were only about 93.8%, false positives alone would be enough to account for every reported case.
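The arithmetic can be checked directly from the two reported counts.  This is a sketch, not a model of real testing.  It asks how accurate the tests would need to be on healthy people for false positives alone to produce the reported case count, assuming nobody tested was actually infected.  The variable names are illustrative.

```python
tests = 1_494_000   # tests reported that day
cases = 93_000      # positive results reported that day

positivity = cases / tests

# If nobody tested were infected, every positive would be a false positive.
# The false-positive rate would then equal the positivity rate, and the
# accuracy on healthy people would be whatever is left over.
break_even_accuracy = 1.0 - positivity

print(f"positivity rate:     {positivity:.2%}")
print(f"break-even accuracy: {break_even_accuracy:.2%}")
```

Any test mix that is more accurate than the break-even figure cannot explain all of the reported cases with false positives, which is why the accuracy of the tests in use matters so much.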

Several studies have been undertaken, and they found a wide range of accuracy across tests and testing protocols, so the mix of tests used is very important.  Not surprisingly, people who had symptoms and tested positive were much more likely to have the disease.  This tells us that the trend toward testing asymptomatic people with rapid tests (as happens with travel or entrance requirement testing) means that most of the reported cases may be false positives.

Another outgrowth of Bayes is how you react to a positive test result.  If the reported positivity rate in your locale is low, you understand that, in spite of the 99% accuracy, your positive result is likely to be false.  So, you will know to take the test again.  The probability of a negative second result is high.
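How likely a negative retest is depends on how rare the disease is among the people being tested.  The sketch below makes two loud assumptions: that the prior is the true prevalence in your locale (not the reported positivity rate), and that the errors of the two tests are independent, which may not hold for repeat tests on the same person.  The function names are illustrative.

```python
def update(prior, sensitivity, specificity, positive):
    """Posterior P(disease) after one test result, from Bayes' theorem."""
    if positive:
        like_disease, like_healthy = sensitivity, 1.0 - specificity
    else:
        like_disease, like_healthy = 1.0 - sensitivity, specificity
    numerator = prior * like_disease
    return numerator / (numerator + (1.0 - prior) * like_healthy)


def prob_second_negative(prevalence, sensitivity=0.99, specificity=0.99):
    """P(second test negative | first test positive), independent errors."""
    after_first = update(prevalence, sensitivity, specificity, positive=True)
    return after_first * (1.0 - sensitivity) + (1.0 - after_first) * specificity


for prevalence in (0.01, 0.001):
    p = prob_second_negative(prevalence)
    print(f"prevalence {prevalence:.1%}: P(second test negative) = {p:.1%}")
```

At the 1% prevalence used earlier, the retest is an even bet.  The rarer the disease, the more likely the second result is negative.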

The common belief is that statistics lie.  They don't.  However, your ignorance of statistics can lead you to misinterpret the statistics that are presented to you.  Politicians, 'the news' and other charlatans can take advantage of this to mislead you.  I argue that a mandatory course on 'Everyday Statistics' should be included in the K-12 curriculum.  That, of course, would require the very people who are using statistics to mislead to agree.  Of course, they won't.