See also
UTAX reference data downloads
How do I create my own taxconfs file?
Validating taxonomy classifier algorithms
UTAX and the RDP classifier compared on fungal ITS
UTAX is an algorithm for taxonomy assignment which is implemented in the utax command.
The main advantages of UTAX are very high speed and predictive P-values.
The algorithm is currently not published. See Validating Taxonomy Classifiers for the method I used to validate its accuracy against other algorithms, in particular the RDP Classifier (RDPC). See also the results on ITS.
At a high level, UTAX is a word-counting method
similar to the RDP Naive Bayesian Classifier. It exploits the "U-sorting"
strategy of the USEARCH algorithm to perform an
alignment-free search of the reference database (because my testing found no
significant improvement using alignments). The fractional word counts are used
to calculate a score and P-value for each taxonomic level. The P-values are
obtained by curve-fitting to empirical results on training data and give a
realistic estimate of the error rates at all taxonomic levels. This is in
contrast to the bootstrap values reported by the RDP Classifier, which do not
predict true error rates.
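
To illustrate what "fractional word count" means in general, here is a minimal Python sketch. It is not the UTAX implementation, which is unpublished: the word length `k`, the function names, and the brute-force scan over every reference are all assumptions made for illustration, and the sketch does not reproduce U-sorting or the P-value fit.

```python
def words(seq, k=8):
    """Return the set of distinct k-letter words in seq (k=8 is an assumed default)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fractional_word_score(query, reference, k=8):
    """Fraction of the query's distinct words that also occur in the reference."""
    query_words = words(query, k)
    if not query_words:
        # Query shorter than k: no words to count.
        return 0.0
    return len(query_words & words(reference, k)) / len(query_words)


def top_hit(query, references, k=8):
    """Return (score, label) for the best-scoring reference.

    `references` maps a hypothetical taxon label to a sequence. This scans
    every reference, which is far slower than an indexed search.
    """
    scored = [(fractional_word_score(query, seq, k), label)
              for label, seq in references.items()]
    return max(scored)


if __name__ == "__main__":
    refs = {"genusA": "ACGTACGTTTGACCA", "genusB": "TTTTTTTTTTTTTTT"}
    print(top_hit("ACGTACGTTTGA", refs, k=4))
```

No alignment is computed at any point: the score depends only on which words the two sequences share. Turning such a score into a P-value would additionally need training data with known taxonomy, as described above.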