ITS amplicons have large variations in length due to the biology of the region -- some of the sequence evolves neutrally, and long indels are common.
This is the strategy I currently recommend for global trimming of ITS reads.
1. Pick a fixed length that is as long as possible without losing a large
fraction of the reads to the quality filter, i.e. because they have expected
errors > 1 (or your chosen e.e. threshold). The fastq_eestats command is useful
for figuring out a good compromise. Call this length L_trim.
2. If a match to the reverse primer is present, then delete the matching letters
and any letters after that.
3. If the read is shorter than a reasonable length given your primer pair, then
discard the read.
4. If the read is longer than L_trim, truncate to L_trim.
5. If the read is shorter than L_trim, pad with Ns so that it is L_trim letters.
Step 5 is needed because cluster_otus considers terminal
gaps to be real differences. After this step, all your reads should now have
length L_trim.
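To make the trade-off in step 1 concrete, here is a minimal Python sketch of how expected errors are computed from quality scores and how the fraction of reads passing a threshold changes with the candidate length. It is an illustration only, not a replacement for fastq_eestats. The function names are hypothetical, and Phred+33 quality encoding is assumed.

```python
def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities for one quality string.

    Assumes Phred encoding with the given ASCII offset (33 is an assumption).
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def fraction_passing(quals, length, max_ee=1.0):
    """Fraction of reads whose first `length` bases have e.e. <= max_ee.

    Reads shorter than `length` are scored over their full length, since
    they would be padded rather than truncated.
    """
    if not quals:
        return 0.0
    passing = sum(1 for q in quals if expected_errors(q[:length]) <= max_ee)
    return passing / len(quals)
```

Evaluating fraction_passing over a range of candidate lengths shows where the pass rate starts to drop sharply, which is a reasonable place to set L_trim.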
Steps 2 - 5 should be done before quality filtering by max e.e. You will need to
write your own script to do this as usearch currently doesn't have commands with
the necessary features. You can use the search_oligodb command to find the
reverse primer matches.
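A script for steps 2 - 5 could look roughly like the Python sketch below. This is one possible approach under stated assumptions, not a usearch feature: the function names are hypothetical, the primer is located with a simple mismatch count instead of search_oligodb, the primer must be given in the orientation in which it appears in the read, degenerate primer letters are not handled, and padded positions get a high quality character (an assumption, so that padding does not add to expected errors in the later quality filter).

```python
def find_primer(seq, primer, max_mismatches=2):
    """Return the start of the first window matching primer, or -1 if none.

    A window matches if it differs from the primer at no more than
    max_mismatches positions (no indels, no degenerate letters).
    """
    n = len(primer)
    for start in range(len(seq) - n + 1):
        window = seq[start:start + n]
        mismatches = sum(1 for a, b in zip(window, primer) if a != b)
        if mismatches <= max_mismatches:
            return start
    return -1


def trim_its_read(seq, qual, rev_primer, l_trim, min_len,
                  max_mismatches=2, pad_qual="I"):
    """Apply steps 2-5 to one read; return (seq, qual), or None if discarded."""
    # Step 2: delete the reverse primer match and everything after it.
    pos = find_primer(seq, rev_primer, max_mismatches)
    if pos >= 0:
        seq, qual = seq[:pos], qual[:pos]
    # Step 3: discard reads that are too short for this primer pair.
    if len(seq) < min_len:
        return None
    # Step 4: truncate to L_trim (no effect on shorter reads).
    seq, qual = seq[:l_trim], qual[:l_trim]
    # Step 5: pad with Ns up to L_trim; pad_qual is an assumed placeholder.
    pad = l_trim - len(seq)
    return seq + "N" * pad, qual + pad_qual * pad
```

Every read that is not discarded comes back with exactly l_trim letters, so the output can go straight into the max e.e. filter.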
Once you've pre-processed the reads to get them to the fixed length, proceed as
usual to make UPARSE OTUs: quality filter, dereplicate, discard singletons, and
run cluster_otus.
Finally, you'll need to strip the trailing Ns (added in
step 5) from the OTU sequences.
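Stripping the padding is simple to script. The Python sketch below assumes the OTUs are in a FASTA file whose contents have been read into a string; the function name is hypothetical. Wrapped sequence lines are joined first so that only Ns at the very end of each sequence are removed.

```python
def strip_trailing_ns(fasta_text):
    """Return FASTA text with trailing Ns removed from every sequence."""
    records = []
    label, chunks = None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # New record: save the previous one, if any.
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line, []
        elif line.strip():
            chunks.append(line.strip())
    if label is not None:
        records.append((label, "".join(chunks)))
    # Write one sequence line per record, without the trailing Ns.
    return "".join(f"{lab}\n{seq.rstrip('N')}\n" for lab, seq in records)
```

Ns inside a sequence are left untouched; only the run at the end is removed.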