New test methods for medical use in humans require review by one of several FDA programs. Depending on the analyte and the technology used, this can lead to waiver, approval, or even licensure. All of these programs require studies that compare results from the new (candidate) method to at least one already-approved method for the same analyte. The simplest approach, a comparison of two test methods, one of which (the comparator method) has already successfully completed FDA review, is by far the most widely used.
It’s also important to understand your intended use for the test method. I’ll cover that briefly near the end.
Here I’ll discuss a frequently used and widely accepted approach to comparing two qualitative tests (positive and negative results only). The method comparison experiment is more fully described in CLSI document EP12-A2, User Protocol for Evaluation of Qualitative Test Performance. I’ve also used some data from a method comparison of two COVID-19 antibody tests to demonstrate what you can learn about a new test (your own or someone else’s) from just a few data points and to describe some of the limitations of the evaluation method.
THE METHOD COMPARISON EXPERIMENT
To perform the method comparison experiment, a set of samples (both positive and negative) that have results from a comparative test method is assembled. The more positive and negative samples you have to work with, and the more confident you are in the accuracy of the comparative method, the stronger your confidence in the results will be.
The sample set is tested with the candidate method, and its results are compared with those from the comparative method, most often in a 2×2 contingency table (see Table 1). This is where things begin to get interesting. The lowercase letters used in this version of the table imply that we know little about the accuracy of the comparative method or how the disease prevalence in our target population relates to the “prevalence” represented by the positives and negatives in our sample set. Had we more confidence, we would apply capital letters that signify true and false positives (TP, FP) and true or false negatives (TN, FN) to those same numbers in the table. In the low confidence situation, the results of the analysis are labeled positive percent agreement (PPA) and negative percent agreement (NPA). If you’re able to have higher confidence in your comparative method and the prevalence similarity of your method comparison sample set to the actual prevalence in your target population, those same results can appropriately be labeled “estimates of sensitivity” (%Sens) and “specificity” (%Spec), and positive and negative predictive value (PPV, NPV) can be calculated. This is illustrated by including the “high confidence” names in parentheses for each of the lowercase letters.
TABLE 1.
2×2 Contingency Table When Using a Comparative Method
| | Comparative Method: Positive | Comparative Method: Negative | Total |
| Candidate Method: Positive | a | b | a + b |
| Candidate Method: Negative | c | d | c + d |
| Total | a + c | b + d | n |
KEY
- a = number of samples positive by both methods (TP)
- b = samples positive by candidate, negative by comparative method (FP)
- c = samples negative by candidate, positive by comparative method (FN)
- d = samples negative by both methods (TN)
- n = total number of samples in the study (N)
- a + b = samples positive by candidate method (TP + FP)
- a + c = samples positive by comparative method (TP + FN)
- b + d = samples negative by comparative method (FP + TN)
- c + d = samples negative by candidate method (TN + FN)
The calculation for estimated percent sensitivity (%Sens) and for positive percent agreement (PPA) is the same calculation, using different designations for the same numbers from the study, depending on the level of confidence in the accuracy of the comparator method:
%Sens or PPA = 100 x [TP or a/(TP or a + FN or c)]
Likewise the calculation for estimated percent specificity (%Spec) and for negative percent agreement (NPA) is:
%Spec or NPA = 100 x [TN or d/(TN or d + FP or b)]
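As a minimal sketch of these two calculations, the Python below takes the four cell counts from Table 1 and returns PPA and NPA, using PPA = 100 x a/(a + c) and NPA = 100 x d/(b + d). The class name `TwoByTwo` and the counts are hypothetical, chosen only for illustration; they are not the data behind the COVID-19 example in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts from Table 1 (candidate in rows, comparative in columns)."""
    a: int  # positive by both methods
    b: int  # positive by candidate, negative by comparative
    c: int  # negative by candidate, positive by comparative
    d: int  # negative by both methods

    def ppa(self) -> float:
        """Positive percent agreement: 100 * a / (a + c)."""
        positives = self.a + self.c
        if positives == 0:
            raise ValueError("no comparative-positive samples")
        return 100.0 * self.a / positives

    def npa(self) -> float:
        """Negative percent agreement: 100 * d / (b + d)."""
        negatives = self.b + self.d
        if negatives == 0:
            raise ValueError("no comparative-negative samples")
        return 100.0 * self.d / negatives


# Hypothetical counts for illustration only (not the article's data set).
table = TwoByTwo(a=24, b=0, c=6, d=80)
print(f"PPA = {table.ppa():.1f}%")  # PPA = 80.0%
print(f"NPA = {table.npa():.1f}%")  # NPA = 100.0%
```

The same code gives estimated %Sens and %Spec when the comparator is trusted enough to treat a, b, c, and d as TP, FP, FN, and TN; only the labels change.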
The low-confidence situation avoids consideration of positive or negative predictive values (PPV and NPV) in an acknowledgement that the number of positive and negative samples in the study bears no relation to the disease prevalence in the population, but only reflects the numbers of samples available for the study. When the disease prevalence in the population under study is reflected in the number of positive and negative samples used for the study, the calculations are:
%PPV = 100 x [TP/(TP + FP)] and %NPV = 100 x [TN/(TN + FN)]
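A short sketch of the predictive value calculations follows, using the standard definitions PPV = TP/(TP + FP) and NPV = TN/(TN + FN). The function name `predictive_values` and the counts are hypothetical. The result is only meaningful when the mix of positives and negatives in the sample set reflects the prevalence in the target population.

```python
def predictive_values(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (%PPV, %NPV) from high-confidence 2x2 counts."""
    candidate_pos = tp + fp  # all samples the candidate called positive
    candidate_neg = tn + fn  # all samples the candidate called negative
    if candidate_pos == 0 or candidate_neg == 0:
        raise ValueError("candidate method needs both positive and negative results")
    ppv = 100.0 * tp / candidate_pos
    npv = 100.0 * tn / candidate_neg
    return ppv, npv


# Hypothetical counts for illustration only.
ppv, npv = predictive_values(tp=24, fp=0, fn=6, tn=80)
print(f"PPV = {ppv:.1f}%, NPV = {npv:.1f}%")
```

Note that the denominators are the candidate method's row totals (a + b and c + d in Table 1), whereas sensitivity and specificity use the comparative method's column totals.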
Finally, the results for our example qualitative COVID-19 antibody test are:
PPA: 80.0% (95% CI: 56.6 – 88.5%)
NPA: 100.0% (95% CI: 95.2 – 100%)
WHAT CAN WE LEARN FROM THIS SMALL DATA SET?
The surrogate sensitivity estimate (PPA) for our candidate test is 80%, meaning literally that eight out of every 10 samples that were positive by the comparator test were also positive by the candidate test. The other two of every 10 were negative. The candidate test therefore appears less sensitive than the comparator method. The 95% confidence interval is broad: the real PPA could be anywhere from 56.6% (almost a coin toss) to 88.5% (pretty good). If a larger positive sample set had been available for study, the 95% confidence interval would have been tighter.
On the other hand, the candidate test identified every single negative that the comparator method found, out of an apparently large number of negative samples, since the 95% confidence interval is tight and its low end is still above 95%. There were no “false positives.”
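To illustrate how sample size drives the width of the confidence interval, here is a minimal sketch of the Wilson score interval for a proportion. The Wilson method is an assumption made for this example: the article does not state which interval method produced the figures quoted above, and the counts below are hypothetical, so the output is not expected to reproduce those figures.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95%."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: the same 80% agreement on a small and a larger sample set.
for agree, total in [(8, 10), (80, 100)]:
    low, high = wilson_interval(agree, total)
    print(f"{agree}/{total}: {100 * low:.1f}% to {100 * high:.1f}%")

# Hypothetical counts: perfect agreement on 80 negatives.
low, high = wilson_interval(80, 80)
print(f"80/80: {100 * low:.1f}% to {100 * high:.1f}%")
```

Running the loop shows a narrower interval for 80/100 than for 8/10, even though the point estimate is identical. With perfect agreement, the upper bound is 100% and the lower bound depends only on how many samples were tested.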
A NOTE ABOUT OTHER METHODS
A “gold standard” comparison study compares results of the candidate test to a clinical diagnosis. Such studies are expensive, complicated, and difficult to organize. Slightly lower on the “gold standard” scale is a comparative test that’s a “reference method” for the same analyte the candidate method assesses. (Reference methods have themselves been rigorously evaluated and are considered gold standards.) These methods are also hard to come by and often difficult to use.
An important takeaway here is that whether you’re using gold standard methods to calculate percent sensitivity (%Sens) and percent specificity (%Spec) or more readily available methods to calculate positive percent agreement (PPA) and negative percent agreement (NPA), the data analysis comes down to the very same 2×2 contingency table (Table 1) in which the positives and negatives from the candidate method are compared to those of the comparator method. Understand that, and any confusion melts away.
SO, IS THIS A GOOD TEST?
Is this candidate method a good test? That depends on whether you value specificity over sensitivity or vice versa for your application. Sensitivity at the simplest level is just “How low can you go?”—what’s the lowest analyte concentration the method can detect? Specificity tells us to what extent the test is detecting substances that aren’t (in this case) COVID-19 antibodies.
If the intended use for this COVID-19 antibody assay is to survey populations for previous exposure to COVID, it may be more important that it’s very specific (i.e., it doesn’t return a positive result for substances that aren’t COVID-19 antibodies) and less important that it is less sensitive and doesn’t pick up very low levels of COVID-19 antibodies. If, however, you want to use the test to study the half-life of COVID antibodies in people with mild, moderate, and severe COVID-19, it may be more important to pick up low antibody concentrations with the most sensitive test you can find.
Thus, we’ve gathered a fair amount of information from the method comparison experiment and only two calculations (PPA and NPA) and their confidence intervals. The caveats are that we don’t know much about the accuracy of the comparator test and the sensitivity was apparently evaluated on a very small set of positive samples.
DCN Dx is an international leader in the contract development and commercialization of rapid diagnostic tests at its ISO 9001:2015 and EN 13485:2016 certified facility in Carlsbad, Calif. The company’s team of in-house scientists and engineers develop and integrate all aspects of assay systems, including cassettes, sample handling devices, and reader systems. Since its founding in 2006, DCN Dx has been committed to furthering the rapid diagnostic test market through the continued evolution of technologies and applications related to lateral flow assays.
For more information, visit dcndx.com.






