DIAGNOSIS
Sensitivity and specificity are the statistics most widely used to describe a diagnostic test.
As clinicians, though, we don’t generally know whether or not the patient has disease; that’s why we’re ordering the test in the first place! Thus, sensitivity and specificity do not give us the information we need to interpret the test results. What do we want to know?
Ideally, we’d like to know what the probability of disease is, given a
positive or negative test.
Clearly, this is not an equation you can carry around in your head! It is also not a simple transformation, which explains why it is hard to "guesstimate" the post-test probability from the sensitivity and specificity. Take a look at the "2 x 2" table below:
We’ll refer to similar
tables in other discussions, so it’s a good idea to get familiar with
how they work.
Sensitivity and specificity by themselves are only useful when either is very high (typically 95% or higher).
The sensitivity of dyspnea on exertion for the diagnosis of CHF is 100% (41/(41+0)), and the specificity is 16% (35/(183+35)). If the test is negative (the patient does not complain of dyspnea on exertion), it is very unlikely that they have CHF (0 out of 41 patients with CHF did not have this symptom). In other words, a very sensitive test, when negative, rules out disease. Conversely, a very specific test, when positive, rules in disease.
The sensitivity of gallop for CHF is only 24% (10/41), but the specificity is 99% (215/218).

"Given a positive (or negative) test result, what is the new probability of disease?" Let’s fill some numbers in.

In his article on the clinical diagnosis of strep, Frank Dobbs took consecutive patients, prospectively did the same history and physical exam manoeuvres on all of them, and then did a throat culture.

One of the things our nurses ask patients who call with a sore throat is how long they’ve had it. If it’s only been a couple of days, they are more likely to ask patients to try symptomatic remedies. If the duration is longer, they are more likely to ask them to come in. The presence of fever is another important factor in advising the patients, with febrile patients more likely to be asked to come in for evaluation.

Is there any evidence to support these strategies? Essentially we are asking, "If a patient has fever, what is the likelihood of strep pharyngitis?" and "If a patient has symptoms for 3 or more days, what is the likelihood of strep pharyngitis?"
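The CHF counts quoted earlier are enough to recompute both statistics from the four cells of a 2 x 2 table. A minimal Python sketch follows; the function and variable names are illustrative only, and the false-negative and false-positive counts for gallop (31 and 3) are inferred from the fractions 10/41 and 215/218 given in the text.

```python
def sensitivity(tp, fn):
    """Proportion of patients WITH the disease who have a positive test."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of patients WITHOUT the disease who have a negative test."""
    return tn / (tn + fp)


# Dyspnea on exertion for CHF: 41 true positives, 0 false negatives,
# 183 false positives, 35 true negatives (counts taken from the text).
dyspnea_sens = sensitivity(tp=41, fn=0)
dyspnea_spec = specificity(tn=35, fp=183)

# Gallop for CHF: 10 of 41 CHF patients had a gallop (so 31 did not);
# 215 of 218 patients without CHF had no gallop (so 3 did).
gallop_sens = sensitivity(tp=10, fn=31)
gallop_spec = specificity(tn=215, fp=3)

print(f"dyspnea: sensitivity {dyspnea_sens:.0%}, specificity {dyspnea_spec:.0%}")
print(f"gallop:  sensitivity {gallop_sens:.0%}, specificity {gallop_spec:.0%}")
```

Both functions read down a column of the table: sensitivity uses only patients with the disease, specificity only those without.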
Using the equations for sensitivity and specificity, we find that for fever:
Note that the sensitivity can be written as "0.81" or "81%". One is no better than the other - just be consistent!

Similarly, for duration of symptoms of 3 or more days:
At this point, you should be getting a little uneasy about your triage policy. While most patients with strep had fever, only 22% of them had had symptoms for 3 or more days!

We still haven’t answered our question, though. To do that, we have to calculate predictive values. They are defined as:
The probability of disease given a positive test can therefore be called the "post-test probability of disease given a positive test", the "positive predictive value", or the "posterior probability of disease given a positive test". These names are interchangeable.

Similarly, the probability of disease given a negative test is called the "post-test probability of disease given a negative test" or the "posterior probability of disease given a negative test"; this is equal to one minus the negative predictive value. Note this last point: the negative predictive value does not equal the post-test probability of disease given a negative test. The two are complements of one another (they add up to 1).

What about our old friend, the 2 x 2 table? Here is the standard 2 x 2 table:
We can now define positive and negative predictive value as follows:
Notice that we are now using the rows of the table, rather than the columns we used for sensitivity and specificity.
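As a rough illustration of reading the table by rows, here is a short Python sketch. It assumes the conventional layout (test result in rows, disease status in columns) and reuses the dyspnea-on-exertion counts for CHF from earlier in the text; the function names are made up for this example.

```python
def positive_predictive_value(tp, fp):
    """Of all patients with a POSITIVE test, the proportion who have the disease."""
    return tp / (tp + fp)


def negative_predictive_value(tn, fn):
    """Of all patients with a NEGATIVE test, the proportion who are disease-free."""
    return tn / (tn + fn)


# Dyspnea on exertion for CHF (counts from the text):
# positive-test row: 41 with CHF, 183 without; negative-test row: 0 with CHF, 35 without.
ppv = positive_predictive_value(tp=41, fp=183)
npv = negative_predictive_value(tn=35, fn=0)

# Probability of disease despite a negative test is the complement of the NPV.
prob_disease_given_negative = 1 - npv

print(f"PPV {ppv:.0%}, NPV {npv:.0%}, "
      f"P(disease | negative test) {prob_disease_given_negative:.0%}")
```

Each function uses a single row of the table, which is why predictive values answer the clinician's question directly.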
We can quickly calculate that for fever:
And for duration of symptoms of 3 or more days:
So… If a patient has fever, there is a 42% chance of strep, and if they have symptoms for 3 or more days, only a 20% chance. It appears that it may be appropriate to revise our triage policy!

Note that these are the same values we got earlier. This is an important point, and one of the strengths of sensitivity and specificity:
In other words, they are not affected by how common or rare the disease is! On the other hand,
Likelihood ratios tell us how much we should shift our suspicion for a particular test result. Because tests can be positive or negative, there are at least two likelihood ratios for each test. The "positive likelihood ratio" (LR+) tells us how much to increase the probability of disease if the test is positive, while the "negative likelihood ratio" (LR-) tells us how much to decrease it if the test is negative.

The formula for calculating the likelihood ratio is:

LR = (probability of an individual with the condition having the test result) / (probability of an individual without the condition having the test result)

Thus, the positive likelihood ratio is:

LR+ = (probability of an individual with the condition having a positive test) / (probability of an individual without the condition having a positive test)

Similarly, the negative likelihood ratio is:

LR- = (probability of an individual with the condition having a negative test) / (probability of an individual without the condition having a negative test)

You can also define the LR+ and LR- in terms of sensitivity and specificity:

LR+ = sensitivity / (1 - specificity)
LR- = (1 - sensitivity) / specificity
* * *

The first thing to realize about LRs is that an LR > 1 indicates an increased probability that the target disorder is present, and an LR < 1 indicates a decreased probability that the target disorder is present. Correspondingly, an LR = 1 means that the test result does not change the probability of disease at all!
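For readers who prefer code to fractions, the usual definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity can be sketched in a few lines of Python. The function names are illustrative, and the example reuses the gallop counts for CHF from earlier in the text.

```python
def positive_lr(sensitivity, specificity):
    """LR+ = P(positive test | disease) / P(positive test | no disease)."""
    # Undefined (division by zero) for a test with 100% specificity.
    return sensitivity / (1 - specificity)


def negative_lr(sensitivity, specificity):
    """LR- = P(negative test | disease) / P(negative test | no disease)."""
    return (1 - sensitivity) / specificity


# Gallop for CHF: sensitivity 10/41, specificity 215/218 (from the text).
gallop_lr_pos = positive_lr(10 / 41, 215 / 218)
gallop_lr_neg = negative_lr(10 / 41, 215 / 218)

print(f"gallop: LR+ = {gallop_lr_pos:.1f}, LR- = {gallop_lr_neg:.2f}")
```

A gallop therefore has an LR+ well above 1 and an LR- only slightly below 1: finding one shifts suspicion a lot, while not finding one shifts it very little.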
The terms "odds of disease" and "probability of disease" get thrown around a lot as if they were the same thing, but they are not. Let’s consider a group of 10 patients, 3 of whom have strep and 7 of whom don’t. If we randomly choose a patient, the probability that they will have strep is 3/10 or 0.3 or 30%. On the other hand, the odds of having strep in this group are 3 : 7. Here is a table which relates the odds to the probability:
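Stated as a mathematical formula (yuck!) this relationship is:

probability = odds / (1 + odds)
odds = probability / (1 - probability)

Equivalently, odds of a : b correspond to a probability of a / (a + b).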
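The conversion between odds and probability can also be sketched in Python. This version uses exact fractions so that odds of 3 : 7 appear as 3/7; the function names are illustrative, and the numbers are the ones used in the surrounding text.

```python
from fractions import Fraction


def odds_from_probability(probability):
    """odds = probability / (1 - probability)"""
    return probability / (1 - probability)


def probability_from_odds(odds):
    """probability = odds / (1 + odds)"""
    return odds / (1 + odds)


# 3 of 10 patients have strep: probability 3/10, odds 3 : 7.
strep_odds = odds_from_probability(Fraction(3, 10))

# Odds of 4 : 9 correspond to a probability of 4/13.
prob_from_4_to_9 = probability_from_odds(Fraction(4, 9))

# A probability of 15% corresponds to odds of 15 : 85 (3 : 17 in lowest terms).
odds_from_15_percent = odds_from_probability(Fraction(15, 100))

print(strep_odds, prob_from_4_to_9, odds_from_15_percent)
```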
Thus, if the odds are 4:9, the probability is 4 / (4+9) = 4/13 = 0.31 (or 31%). Similarly, if the probability is 15%, then the odds are 15 : (100-15) = 15 : 85. With a little practice, you can easily convert from probability to odds and back again in your head.

Why should you possibly care about doing this? Well, the likelihood ratio has a very interesting property:

post-test odds = pre-test odds x likelihood ratio

So, for positive and negative tests:

post-test odds (positive test) = pre-test odds x LR+
post-test odds (negative test) = pre-test odds x LR-
Now, with a little practice, we can actually estimate the probability of disease given a positive or negative test in our heads! Let’s go through a couple of examples:

You estimate, based on your knowledge of the community, the patient’s age of 10 years, and his symptoms (sore throat, fever, exudate, and adenopathy) that the pre-test probability of strep is approximately 40%. The rapid antigen test for strep is positive; looking at the package insert, you see that it has a sensitivity of 90% and specificity of 90%. The LR+ and LR- are therefore 9 and approximately 0.1. Before proceeding, make sure you understand how we calculated those LRs using the formulas described above.

First, notice that knowing the sensitivity and specificity doesn’t help you much when it comes to calculating the likelihood of disease in your patient. However, in 3 simple steps, we’ll use the LRs to do just that:

1. Convert the pre-test probability to odds: 40% = 40 : 60 = 2 : 3 *
2. Multiply the pre-test odds by the LR+: (2 x 9) : 3 = 18 : 3 = 6 : 1
3. Convert the post-test odds back to a probability: 6 / (6+1) = 6/7 = 0.86 (or 86%)

* It simplifies calculations somewhat to reduce the odds to lowest terms. Thus, 40:60 is the same as 4:6, and is also the same as 2:3. Similarly, 30 : 70 is the same as 3:7.

What if the test is negative? Let’s go through that, using the LR- of 0.1 this time in our calculations:

1. Convert the pre-test probability to odds: 40% = 40 : 60 = 2 : 3
2. Multiply the pre-test odds by the LR-: (2 x 0.1) : 3 = 0.2 : 3 = 1 : 15
3. Convert the post-test odds back to a probability: 1 / (1+15) = 1/16 = 0.06 (or 6%)
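The three steps (probability to odds, multiply by the LR, odds back to probability) fit in one small function. The Python sketch below uses an illustrative function name and the figures from the strep example: a pre-test probability of 40%, an LR+ of 9, and the rounded LR- of 0.1.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    # Step 1: convert the pre-test probability to odds.
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    # Step 2: multiply the pre-test odds by the likelihood ratio.
    post_test_odds = pre_test_odds * likelihood_ratio
    # Step 3: convert the post-test odds back to a probability.
    return post_test_odds / (1 + post_test_odds)


after_positive = post_test_probability(0.40, 9)    # positive rapid strep test
after_negative = post_test_probability(0.40, 0.1)  # negative rapid strep test

print(f"positive test: {after_positive:.0%}, negative test: {after_negative:.0%}")
```

The same function works for any pre-test probability strictly between 0 and 1, which is what makes the LR approach easy to individualize.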
Now, instead of just knowing that a positive strep test makes disease more likely, and a negative one makes it less likely (or worse yet, thinking that a positive test means the patient has disease and a negative test means they don’t), you can estimate the specific likelihood of disease for your patient. This is truly "patient-centred" medicine, since your interpretation of the laboratory test is specific to your patient’s pre-test probability of disease, which is in turn based on his or her age, symptoms, and signs.

In the above example, a positive test provides pretty convincing evidence of strep (86% probability). On the other hand, many physicians would be uncomfortable not treating a child who had a negative strep test, and therefore still had a 6% chance of having strep. After going through this calculation once, you might decide that in similar patients, you will empirically treat them, since a negative test does not rule out disease. Or, you might decide to get a throat culture in patients with a negative strep screen, while giving antibiotics to those with a positive strep screen.

Let’s consider another example: an older patient with much less typical symptoms of strep (age 20, sore throat, cough, no adenopathy, and no exudate) and a pre-test probability of disease of 5% by your estimate. If the test is positive (remember, LR+ = 9):

1. Convert the pre-test probability to odds: 5% = 5 : 95 = 1 : 19
2. Multiply the pre-test odds by the LR+: (1 x 9) : 19 = 9 : 19
3. Convert the post-test odds back to a probability: 9 / (9+19) = 9/28 = 0.32 (or 32%)

If the test is negative (LR- = 0.1):

1. Convert the pre-test probability to odds: 5% = 5 : 95 = 1 : 19
2. Multiply the pre-test odds by the LR-: (1 x 0.1) : 19 = 0.1 : 19 = 1 : 190
3. Convert the post-test odds back to a probability: 1 / (1+190) = 1/191 = 0.005 (or 0.5%)
In this case, a negative test does rule out disease, and a positive test gives a high enough likelihood of disease that you would probably treat the patient, but remain open to other causes for his or her symptoms. Individualizing treatment in this way is much more powerful than simply doing the same thing for every patient.

Getting the most information from a test

When we order a test, we’re accustomed to thinking in terms of the results being positive or negative. However, the actual information in the result is often much richer. Consider the diagnosis of iron deficiency anemia (IDA) from the serum ferritin level. Labs generally report a single cut-off for abnormal around 65 mmol/l, with low values suggesting a diagnosis of iron deficiency anemia. Using that value as a "positive" test, the LR+ is 6 and the LR- is 0.12. But there is more information hidden in these results. You can also calculate a likelihood ratio for each range of ferritin, as shown below:
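As a sketch of the arithmetic only: the LR for a range of results is the share of diseased patients whose result falls in that range divided by the share of non-diseased patients whose result falls in it. The Python below uses HYPOTHETICAL counts for three made-up ranges; they are not the ferritin study's data, and the function name is illustrative.

```python
def interval_likelihood_ratios(with_disease, without_disease):
    """Return one LR per result range.

    Both arguments map a range label to a patient count.
    """
    total_with = sum(with_disease.values())
    total_without = sum(without_disease.values())
    ratios = {}
    for label in with_disease:
        share_with = with_disease[label] / total_with          # % of diseased in range
        share_without = without_disease[label] / total_without  # % of non-diseased in range
        ratios[label] = share_with / share_without
    return ratios


# HYPOTHETICAL counts, chosen only to make the arithmetic easy to follow.
with_disease = {"low": 60, "intermediate": 30, "high": 10}
without_disease = {"low": 5, "intermediate": 25, "high": 70}

for label, lr in interval_likelihood_ratios(with_disease, without_disease).items():
    print(f"{label}: LR = {lr:.2f}")
```

A range with no non-diseased patients would cause a division by zero here; real tables usually merge such sparse ranges with a neighbour.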
Doing these calculations is easy. Set up your table as above, with a column showing the percentage of patients with the disease that have a test value in that range, and a second column showing the percentage of patients without disease that have a test value in that range. Then, divide the first column by the second column to calculate the LR for that range. In the table above, for example, 59% / 1.1% = 52. Once again, likelihood ratios help us provide individualized care, and get the most possible information from a test result.

4. ROC Curves
If we just want to calculate sensitivity and specificity for this test, we have to choose a "cutpoint" which separates 'normal' from 'abnormal'. If we choose <= 34 as an abnormal ferritin, we can "collapse" some rows and get the following table:
Doing the math, we now have a familiar 2 x 2 table:
Finally, we can calculate sensitivity and specificity for this cutpoint of 34:
Remember, though, that the sensitivity and specificity depend on where we make the cutpoint. I have done the math, and calculated the sensitivity and specificity for each of 4 different cutpoints in the table below:
This confirms that as the sensitivity increases, the specificity drops, and vice versa.
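The "collapsing" of rows at each cutpoint can be sketched in Python as a running total. The counts below are HYPOTHETICAL (five made-up result ranges, ordered from lowest to highest), not the ferritin data, and a result at or below the cutpoint is treated as abnormal, as in the text.

```python
def sens_spec_by_cutpoint(labels, with_disease, without_disease):
    """Sensitivity and specificity when each range's upper end is the cutpoint.

    Ranges are ordered from lowest to highest result; results at or below
    the cutpoint count as a positive (abnormal) test.
    """
    total_with = sum(with_disease)
    total_without = sum(without_disease)
    rows = []
    true_pos = false_pos = 0
    # The highest range is never at or below a cutpoint, so stop before it.
    for i in range(len(labels) - 1):
        true_pos += with_disease[i]      # diseased patients now called positive
        false_pos += without_disease[i]  # non-diseased patients now called positive
        sens = true_pos / total_with
        spec = (total_without - false_pos) / total_without
        rows.append((labels[i], sens, spec))
    return rows


# HYPOTHETICAL counts per range, lowest results first.
labels = ["range 1", "range 2", "range 3", "range 4", "range 5"]
with_disease = [50, 25, 15, 7, 3]
without_disease = [2, 8, 20, 30, 40]

table = sens_spec_by_cutpoint(labels, with_disease, without_disease)
for label, sens, spec in table:
    print(f"cutpoint at top of {label}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Moving the cutpoint up adds patients from both groups to the "positive" side, so sensitivity can only rise and specificity can only fall.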
This diagram graphs the creatine kinase (CK) values for two groups of patients, those with MI and those without MI. As we know from our clinical experience, there is an overlap in the CK values between the two groups, shown in the middle of the diagram.
What about ROC curves? We're getting there, but the above concepts are important. Make sure you understand how you can derive multiple pairs of sensitivity and specificity for a diagnostic test, and why sensitivity and specificity are inversely related.

An ROC curve is simply a graph of sensitivity vs. (1-specificity). Why not sensitivity vs. specificity? Well, you could do that, but because the area under the curve for sensitivity vs. (1-specificity) has special meaning, whereas it does not for sensitivity vs. specificity, we choose the former. You'll see.

Below, we've graphed the values from the table of sensitivities and specificities for the diagnosis of iron deficiency anemia using serum ferritin:
The area under the ROC curve (AUROCC) is a reflection of how good the
test is at distinguishing (or "discriminating") between patients with and
without IDA. The greater the area, the better the test.
A worthless test, which does not discriminate between IDA and non-IDA patients, would have a curve shown by the diagonal red line, with an area under the curve of 0.5. The best possible test (100% sensitive and 100% specific) would have an area under the curve of 1.0.
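One common way to estimate the area under an ROC curve from a handful of cutpoints is the trapezoidal rule; the text does not say how its own area was computed, so treat this Python sketch as an assumption. The function name is illustrative, and the example points are HYPOTHETICAL (1 - specificity, sensitivity) pairs, not the ferritin results.

```python
def roc_auc(points):
    """Trapezoidal area under an ROC curve.

    points: iterable of (1 - specificity, sensitivity) pairs.
    The corners (0, 0) and (1, 1) are added automatically.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Width of the strip times the average of its two heights.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


perfect_test = roc_auc([(0.0, 1.0)])    # 100% sensitive and 100% specific
worthless_test = roc_auc([(0.5, 0.5)])  # a point on the diagonal

# HYPOTHETICAL cutpoints for an imperfect but useful test.
hypothetical_test = roc_auc([(0.02, 0.50), (0.10, 0.75), (0.30, 0.90), (0.60, 0.97)])

print(perfect_test, worthless_test, round(hypothetical_test, 3))
```

With only a few cutpoints the straight-line segments slightly underestimate the area under a smooth curve, so more cutpoints give a better estimate.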