BLAST+ Christmas Wish List

Dear Santa,

Please could you ask the Elves at the NCBI to deliver the following BLAST+ feature requests for Christmas 2014?

Thank you,


P.S. Do they think I have been naughty or nice with my BLAST blog posts?


Column headers in BLAST+ tabular and CSV output

In the last couple of years, my preferred BLAST output format has switched from BLAST XML to plain tabular output. The main reason is that tabular output is easier to parse, and it now gives easy access to more fields: BLAST+ 2.2.28 added hit descriptions and taxonomy fields to the tabular and CSV output. The cumulative effect is that BLAST XML has been lagging behind.

However, there is a simple change the NCBI could make that would greatly improve the usability of the tabular and CSV output: label the columns with a header line! This is vital metadata; no-one should be forced to play guess-the-columns when presented with a data file.
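Until BLAST+ does this itself, a wrapper script can add the header, because the -outfmt string you pass in already names the columns. Here is a minimal Python sketch of that idea. The function names are hypothetical, and it assumes only -outfmt 6 (tabular) or 10 (CSV), with either an explicit field list or the default twelve "std" columns.

```python
# The twelve columns BLAST+ uses for -outfmt 6 or 10 when no field list is given;
# the keyword "std" inside a field list expands to the same set.
STD_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Column separator for each supported output format.
DELIMITERS = {"6": "\t", "10": ","}


def columns_from_outfmt(outfmt):
    """Return (delimiter, column names) for an -outfmt string like '6 qseqid sseqid score'."""
    fmt, *fields = outfmt.split()
    if fmt not in DELIMITERS:
        raise ValueError(f"only -outfmt 6 or 10 is supported, got {fmt!r}")
    names = []
    for field in fields or ["std"]:
        names.extend(STD_FIELDS if field == "std" else [field])
    return DELIMITERS[fmt], names


def with_header(outfmt, body):
    """Prepend a header line naming the columns to BLAST tabular or CSV text."""
    delim, names = columns_from_outfmt(outfmt)
    return delim.join(names) + "\n" + body
```

For example, `with_header("6 qseqid sseqid score", text)` would put the line `qseqid`, `sseqid`, `score` (tab separated) above three-column output like the rpsblast results shown further down.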


BLAST! No frequency ratios needed for composition-based statistics

While updating the NCBI BLAST+ wrapper for Galaxy to handle the changes in the new BLAST+ 2.2.30 release, I hit a cryptic error message from deltablast:

$ deltablast -query rhodopsin_proteins.fasta -subject four_human_proteins.fasta -evalue 1e-08 -outfmt "6 qseqid sseqid score" -rpsdb /data/blastdb/cdd_delta
BLAST engine error: /data/blastdb/cdd_delta contains no frequency ratios needed for composition-based statistics.
Please disable composition-based statistics when searching against /data/blastdb/ncbi/cdd/cdd_delta.

To cut a long story short, the fix is to download and unpack a newer cdd_delta.tar.gz, which now includes an additional file, cdd_delta.freq, containing the frequency ratio information that the newer deltablast requires.
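If you manage several database copies, it can help to check for that file before running a search, or before unpacking a freshly downloaded archive. The following is a minimal Python sketch; the helper names are hypothetical, and it assumes the frequency ratios always live in a file named after the database prefix plus ".freq", as with cdd_delta.freq.

```python
import os
import tarfile


def has_freq_ratios(db_prefix):
    """True if the unpacked database (e.g. /data/blastdb/cdd_delta) has a .freq file."""
    return os.path.isfile(db_prefix + ".freq")


def archive_has_freq(tar_path, db_name="cdd_delta"):
    """True if a downloaded tarball (e.g. cdd_delta.tar.gz) contains <db_name>.freq."""
    # Mode "r" lets tarfile detect gzip (or other) compression automatically.
    with tarfile.open(tar_path, "r") as tar:
        return any(os.path.basename(name) == db_name + ".freq"
                   for name in tar.getnames())
```

An old archive without the .freq member would then be flagged before anyone hits the deltablast error.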

The same applies to the rpsblast tool, although here you just get a warning rather than an error:

$ rpsblast -query four_human_proteins.fasta -db /data/blastdb/cdd_delta -evalue 1e-08 -outfmt "6 qseqid sseqid score"
Warning: /data/blastdb/cdd_delta contain(s) no freq ratios needed for composition-based statistics.
RPSBLAST will be run without composition-based statistics.
sp|Q9BS26|ERP44_HUMAN    gnl|CDD|222416    401
sp|P06213|INSR_HUMAN    gnl|CDD|238021    137
sp|P08100|OPSD_HUMAN    gnl|CDD|215646    411
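Since rpsblast carries on without composition-based statistics, a pipeline such as a Galaxy wrapper may silently produce different results. One option is to treat the warning as fatal. Below is a minimal Python sketch, assuming the message text stays as shown in the deltablast error and rpsblast warning above; the function names are hypothetical, and the second one assumes rpsblast is on the PATH.

```python
import subprocess

# Fragments copied from the deltablast error and the rpsblast warning above.
MISSING_FREQ_MESSAGES = (
    "contains no frequency ratios needed for composition-based statistics",
    "contain(s) no freq ratios needed for composition-based statistics",
)


def missing_freq_ratios(stderr_text):
    """True if BLAST+ stderr reports a CDD database without frequency ratios."""
    return any(msg in stderr_text for msg in MISSING_FREQ_MESSAGES)


def run_rpsblast_strict(args):
    """Run rpsblast with the given arguments, failing on the missing-.freq warning."""
    result = subprocess.run(["rpsblast", *args], capture_output=True, text=True)
    if result.returncode != 0 or missing_freq_ratios(result.stderr):
        raise RuntimeError(result.stderr)
    return result.stdout
```

Matching on message text is fragile if the wording changes between releases, so this is a stop-gap rather than a robust check.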


BLAST+ 2.2.29 upset by [key=value] entries in queries

I recently got a weird, repeated error/warning message in my BLAST+ stderr output:

Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.

This turned out to be caused by [key=value] tags in my query FASTA file, and appears to be a new bug introduced in BLAST+ 2.2.29 (BLAST+ 2.2.26 through 2.2.28 inclusive are not affected).
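If you are stuck on 2.2.29, one possible workaround (my assumption, not an NCBI recommendation) is to strip the tags from the FASTA header lines before running BLAST, at the cost of losing that metadata. A minimal Python sketch, where the helper name and regular expression are hypothetical:

```python
import re

# Matches a [key=value] modifier such as "[organism=Homo sapiens]",
# together with any whitespace immediately before it.
MODIFIER = re.compile(r"\s*\[[^\[\]=]+=[^\[\]]*\]")


def strip_fasta_modifiers(lines):
    """Yield FASTA lines with [key=value] tags removed from '>' header lines only."""
    for line in lines:
        if line.startswith(">"):
            line = MODIFIER.sub("", line)
        yield line
```

For example, the header `>seq1 [organism=Homo sapiens] rhodopsin` becomes `>seq1 rhodopsin`, while sequence lines pass through unchanged.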

Update (31 October 2014): This was fixed in BLAST+ 2.2.30 (released yesterday).


BLAST XML output needs more love from NCBI

For some time I had thought that the best option for computer parsing of BLAST+ output was BLAST XML. It had all the key bits of information, and XML is designed for automated parsing. However, with the extra fields added to the tabular and comma-separated output in BLAST+ 2.2.28, such as the long-overdue hit descriptions and taxonomy fields, I think those formats are now preferable. BLAST XML is now lagging behind!
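To illustrate the difference in parsing effort, here is a minimal Python sketch extracting the same query ID, hit ID and raw score from both formats. The XML is a hand-written, heavily trimmed fragment in the shape of BLAST XML (not real output), using the BlastOutput, Iteration, Hit and Hsp element names; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed fragment in the shape of BLAST XML (not real output).
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>sp|P08100|OPSD_HUMAN</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>gnl|CDD|215646</Hit_id>
          <Hit_hsps>
            <Hsp><Hsp_score>411</Hsp_score></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

# The equivalent line from -outfmt "6 qseqid sseqid score".
SAMPLE_TAB = "sp|P08100|OPSD_HUMAN\tgnl|CDD|215646\t411\n"


def hits_from_xml(xml_text):
    """Return (query, hit id, raw score) tuples by walking the nested elements."""
    rows = []
    for iteration in ET.fromstring(xml_text).iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            for hsp in hit.iter("Hsp"):
                rows.append((query, hit.findtext("Hit_id"),
                             int(hsp.findtext("Hsp_score"))))
    return rows


def hits_from_tabular(tab_text):
    """Return the same tuples from tabular output: one split per line."""
    rows = []
    for line in tab_text.splitlines():
        qseqid, sseqid, score = line.split("\t")
        rows.append((qseqid, sseqid, int(score)))
    return rows
```

Both functions return the same list here, but only the tabular version could also pick up the taxonomy columns, since those fields were added to tabular and CSV output rather than to the XML.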