Friday, November 30, 2012

The False Positive Paradox

I've posted before about my affection for Randall Munroe's webcomic xkcd and his related work, humor, and insight, and tonight (via Neatorama) I learned that it's been two years since his fiancée was diagnosed with stage III breast cancer.  She's beating it so far.  He's noted the stats and has a webcomic on this, but it also reminded me of something separate: the false positive paradox. (Well, that sounds like a downer, but it's not.)

The false positive paradox arises when a fairly reliable test yields fewer correct positive results than incorrect ones because the incidence of whatever is being tested for is so low in the population. For example, imagine 1% of a population has a disease and a test for the disease is 95% accurate (that is, it correctly classifies 95% of those who have the disease and 95% of those who do not). If one gets a positive result there is, in fact, only about a 16% chance that one actually has the disease (specifically, the chance of a correct positive result is 1% x 95% = .95%; the chance of a false positive result is 99% x 5% = 4.95%; the chance of having the disease given a positive result is then .95 / (.95 + 4.95) ≈ 16%).
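Here is a minimal sketch of that arithmetic, assuming "95% accurate" means the test has both a 95% true positive rate (sensitivity) and a 95% true negative rate (specificity); the function and variable names are mine, purely for illustration:

```python
def prob_condition_given_positive(prevalence, sensitivity, specificity):
    """P(condition | positive result) via Bayes' rule."""
    true_positive = prevalence * sensitivity                # has it, and the test catches it
    false_positive = (1 - prevalence) * (1 - specificity)   # doesn't have it, but the test flags it anyway
    return true_positive / (true_positive + false_positive)

# 1% of the population has the disease; the test is "95% accurate".
print(prob_condition_given_positive(0.01, 0.95, 0.95))  # ~0.161, i.e. about 16%
```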

Several surveys of physicians have shown that many do not understand the paradox. That, of course, can lead to poor treatment and prevention plans, particularly when the patient does not understand it either. Even physicians who get the math -- and it is simple math -- may be misled when a test is applied across populations with different real rates of incidence, since a change in incidence can strengthen or weaken the effect of the paradox.

That leads to another common misconception: the false positive paradox does not mean that, whenever a test has some rate of false positives, the likelihood one has the condition given a positive result must be less than the test's accuracy. If the incidence of a condition is high and the test is accurate, then a member of that high-incidence population who gets a positive result has the condition with a probability greater than the test's stated accuracy.

For example, if 95% of a population has a condition and there's a test for it that is 95% accurate, then a member of the population who tests positive has the condition with a likelihood of greater than 99% (specifically, the chance of a correct positive result is 95% x 95% = 90.25%; the chance of a false positive result is 5% x 5% = .25%; the chance of actually having the condition given a positive result is then 90.25 / (90.25 + .25) > 99%).  In fact, if one does not get a positive result, there's still just as much chance that one has the condition as not: the chance that one has the condition yet got a non-positive result is 95% x 5% = 4.75%, and the chance that one does not have the condition and got a non-positive result is 5% x 95% = 4.75%.
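The same arithmetic for this high-incidence case, again as a sketch that assumes "95% accurate" means 95% sensitivity and 95% specificity:

```python
prevalence, sensitivity, specificity = 0.95, 0.95, 0.95

true_positive  = prevalence * sensitivity                 # 0.9025: has it, test catches it
false_positive = (1 - prevalence) * (1 - specificity)     # 0.0025: doesn't have it, test flags it
print(true_positive / (true_positive + false_positive))   # ~0.997: a positive result means >99%

false_negative = prevalence * (1 - sensitivity)           # 0.0475: has it, test missed it
true_negative  = (1 - prevalence) * specificity           # 0.0475: doesn't have it, test agrees
print(false_negative / (false_negative + true_negative))  # 0.5: a non-positive result is a coin flip
```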

Throughout the above I've ignored "false negatives" and what they may mean -- indeed, I've not been entirely clear about what I mean by "accurate" and "accuracy," either. A "false positive" occurs when one does not have the condition yet a positive result occurs. A "false negative" occurs when one has the condition yet a negative (or non-positive) result occurs. I've assumed in my examples above that the test is bivalent: it's either positive or negative. If only the real world were that simple.

In reality, a test may return a given result only a certain percentage of the time, and, when it does, we may have a certain degree of confidence that the result is correct. For example, a test may come back positive for 30% of the population with a confidence of 99%. Of the remaining 70% of the population, some may have the condition even though the test does not detect it (because it is latent, has different markers, etc.). Indeed, if 42% of that remaining population has the condition, then just about as many people have the condition without a positive result as with one (i.e. .30 x .99 ≈ .70 x .42). Thus a very "accurate" test, when results are returned, may still have a very high false negative rate. The test is underinclusive.
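A quick sketch of that example, taking the figures straight from the paragraph above (30% positive at 99% confidence, and an assumed 42% of the remainder having the condition anyway):

```python
positive_rate   = 0.30   # share of the population that gets a positive result
confidence      = 0.99   # P(condition | positive result)
rest_prevalence = 0.42   # share of the non-positive 70% who still have the condition

detected = positive_rate * confidence             # ~0.297 of the population: found by the test
missed   = (1 - positive_rate) * rest_prevalence  # ~0.294 of the population: missed by the test
print(detected, missed)  # nearly equal: the test misses about as many cases as it finds
```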

All this may seem so confusing, and testing so uncertain, that we wonder why we conduct tests at all. The simple answer is that testing can be very reliable if one conducts multiple different tests. (Different tests are necessary, of course, to exclude systemic errors; people who should know better -- e.g. some medical doctors -- too often seem unaware of this.) In the case of the underinclusive test above, even those who did not get a positive result need more testing.
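As a rough sketch of why multiple tests help, here is the first example's 1%-prevalence population run through two positive results in a row, assuming the second test is independent of the first and has the same 95% characteristics (an assumption for illustration, not a clinical claim):

```python
def update(prior, sensitivity, specificity):
    """Posterior probability of the condition after one more positive result."""
    tp = prior * sensitivity
    fp = (1 - prior) * (1 - specificity)
    return tp / (tp + fp)

p = 0.01                   # prior: 1% prevalence
p = update(p, 0.95, 0.95)  # ~0.16 after one positive result
p = update(p, 0.95, 0.95)  # ~0.78 after a second, independent positive result
print(p)
```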

I've written at length on this, and one may wonder why. The reason is that the problem with the false positive paradox is not limited to medical testing. It applies anytime we make a factual assessment of whether something is true, and that factual assessment may be inaccurate. In short, it is a situation we all face hundreds of times a day.

Consider, further, that what we know of the world is based on observation. Those observations require careful confirmation from multiple angles. Science, of course, is a method for doing that: that is, indeed, what science is. Knowledge, such as it is, that is not scientific is sketchy at best. The idea that anybody would base a theory on something other than scientific knowledge is ludicrous. I could wax poetic here, but that is all that need be said.
