USEARCH manual

UTAX and RDP classifier performance on ITS

The figure below shows sensitivity (x axis) vs. error rate (y axis) for RDP and UTAX on full-length ITS1. ITS2 is not shown; it looks similar. The method used to measure sensitivity and error is described here:

/usearch/manual/taxonomy_validation.html

To claim that UTAX is definitively better than RDP, we would need to see that the UTAX curve (black) is always below the RDP curve (red). This would show that for any chosen RDP bootstrap cutoff, there is a UTAX P-value cutoff that gives higher sensitivity and a lower error rate. For genus this is true, and for family it is mostly true at acceptable error levels (say, <5%), but the story gets messier at higher taxonomic levels. This is the main reason I haven't published anything yet -- the result is not a clean win, and I don't yet understand the behavior.

Why is there such a dramatic difference between these two quite similar algorithms at class and phylum level? UTAX is very poor compared to RDP at class level, while UTAX does a decent job at phylum level, where RDP is useless (its first point has a >25% error rate, so there is no visible RDP curve). Note that "class level" means that the closest sequence in the reference database is a different order or higher; it does not mean that class-level predictions overall are this bad. Obviously, if you get the genus or family right, all the higher levels are also correct.
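The "always below" criterion can be checked numerically once each curve is available as a list of points. The following is a minimal sketch, not part of USEARCH: the function name dominated_fraction and the numbers are made up for illustration, and "better" is relaxed to "at least as good" (sensitivity >=, error rate <=). It reports the fraction of points on one curve that are matched or beaten by some point on the other curve.

```python
from typing import List, Tuple

# One point per cutoff: (sensitivity, error_rate), both as fractions in [0, 1].
Point = Tuple[float, float]


def dominated_fraction(curve_a: List[Point], curve_b: List[Point]) -> float:
    """Fraction of points on curve_b matched or beaten by some point on curve_a.

    A point on curve_b counts as beaten if curve_a has a point with
    sensitivity at least as high AND error rate at least as low.
    """
    if not curve_b:
        raise ValueError("curve_b must contain at least one point")
    beaten = 0
    for sens_b, err_b in curve_b:
        if any(sens_a >= sens_b and err_a <= err_b for sens_a, err_a in curve_a):
            beaten += 1
    return beaten / len(curve_b)


# Made-up numbers for illustration only; these are NOT read from the figure.
utax = [(0.50, 0.01), (0.70, 0.03), (0.85, 0.08)]
rdp = [(0.45, 0.02), (0.65, 0.05), (0.80, 0.12)]

print(dominated_fraction(utax, rdp))  # 1.0 means every RDP point is beaten
print(dominated_fraction(rdp, utax))  # 0.0 means no UTAX point is beaten
```

A value of 1.0 in one direction corresponds to the "definitively better" case; values between 0 and 1 correspond to the messier situation where the curves cross.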

Looking at these curves, I think it is fair to say that UTAX is the better classifier at genus and family level. With ITS, higher levels are going to be very hard for any classifier because sequence similarity is so low.

There is another very important consideration -- setting cutoff values. Here, UTAX has a decisive advantage over RDP because the P-value estimates were obtained empirically from exactly these curves, so we know that (1) they accurately predict error rates, at least on this training data, and (2) the cutoffs are comparable at all levels, so if you set, say, a 95% P-value cutoff, this will give the same error rate at each level. With the RDP classifier, the bootstrap value does not give you the error rate, and the values are not comparable at different taxonomic levels, so a 50% bootstrap cutoff might, for example, give you a 15% error rate at genus level but 30% at family level. I would prefer to have a classifier with a somewhat higher error rate if it provides informative P-values, because this gives me a good estimate of the error rates for downstream analysis. With the RDP classifier, you don't get an informative estimate of error rate, so the cutoff you choose might give you higher error rates than UTAX even if lower rates are possible.
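To make the point about non-comparable cutoffs concrete, here is a small sketch that measures the error rate produced by one fixed score cutoff on two labeled validation sets. It rests on assumptions: the error rate is taken to be the fraction of retained predictions that are wrong, which may differ from the definition on the validation page, and the function name and all numbers are hypothetical.

```python
from typing import List, Optional, Tuple

# One validation prediction: (confidence score in [0, 1], was it correct?)
Prediction = Tuple[float, bool]


def error_rate_at_cutoff(preds: List[Prediction], cutoff: float) -> Optional[float]:
    """Fraction of predictions with score >= cutoff that are wrong.

    Assumed definition for illustration; returns None if nothing passes the cutoff.
    """
    kept = [correct for score, correct in preds if score >= cutoff]
    if not kept:
        return None
    wrong = sum(1 for correct in kept if not correct)
    return wrong / len(kept)


# Made-up validation data, NOT real classifier output.
genus = [(0.9, True), (0.8, True), (0.7, True), (0.6, False), (0.4, False)]
family = [(0.9, True), (0.6, False), (0.55, False), (0.5, True), (0.3, True)]

# The same cutoff yields different error rates at the two levels.
print(error_rate_at_cutoff(genus, 0.5))   # 0.25
print(error_rate_at_cutoff(family, 0.5))  # 0.5
```

If a score is calibrated so that a given cutoff corresponds to the same measured error rate at every level, this kind of per-level check is not needed when choosing the cutoff.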