2013-12-24

BLAST+ should keep its BL_ORD_ID identifiers to itself

This is in a sense a continuation of my previous BLAST blog post, My IDs not good enough for NCBI BLAST+. My core complaint is that makeblastdb currently ignores the user's own identifiers and automatically assigns its own identifiers (gnl|BL_ORD_ID|0, gnl|BL_ORD_ID|1, gnl|BL_ORD_ID|2, etc), and that the BLAST+ suite as a whole is inconsistent about hiding these in its output.

Note that one side-effect of BLAST+ ignoring the user identifiers and creating its own is that it can tolerate databases made from FASTA files with accidentally duplicated identifiers, but this only causes great confusion and ambiguity in the downstream analysis. One of the ways I've seen FASTA files be created with accidentally duplicated identifiers is pooling of assemblies where generic names like contig1 (or even the more complex Trinity naming scheme) naturally cause clashes. In situations like this, I think makeblastdb should give an error when attempting to build a BLAST database.