[Biopython] Converting from NCBIXML to SearchIO

Peter Cock p.j.a.cock at googlemail.com
Sat Feb 15 12:25:45 UTC 2014


On Fri, Feb 14, 2014 at 10:57 PM, Martin Mokrejs
<mmokrejs at fold.natur.cuni.cz> wrote:
>
>   Another issue I see now is that I used to step through two iterators in
> a while loop, checking that each of the iterators returned a result object
> (evaluating as True).

With some of the BLAST output formats (e.g. tabular), if a query
had no hits it will not appear in the output at all - and so if you
iterate over the results, there will be fewer of them than if you iterated
over the query FASTA file.  Similarly, if you had several BLAST files for
the same query file (e.g. against different databases) they might be
missing results for different queries.

In this kind of situation, a single loop using zip(...) isn't going to
work. However, I think it would be a nice match for SearchIO.index(...),
e.g. something like this (untested):

from Bio import SeqIO
from Bio import SearchIO
# blast_file, blast_format and query_file are assumed to be defined already
blast_index = SearchIO.index(blast_file, blast_format)
for query_seq_record in SeqIO.parse(query_file, "fasta"):
    query_id = query_seq_record.id
    if query_id not in blast_index:
        # BLAST format where empty results are missing,
        # e.g. BLAST tabular
        continue
    query_result = blast_index[query_id]
    if not query_result.hits:
        # BLAST result with no hits, e.g. BLAST text
        continue
    print("Have hits for %s" % query_id)
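
To see why pairing by position goes wrong while lookup by ID does not, here is a toy sketch that needs no Biopython at all. The query IDs, hit names and the list/dict stand-ins are all made up for illustration; a real SearchIO index behaves like the dictionary here only in the sense that it is looked up by query ID.

```python
# Toy stand-ins: made-up query IDs, plain Python objects rather than
# Biopython records (these names are hypothetical, for illustration only).
query_ids = ["q1", "q2", "q3", "q4"]

# Results as a tabular-style file might give them: q2 had no hits,
# so it is simply absent from the output.
tabular_results = [
    ("q1", ["hitA"]),
    ("q3", ["hitB", "hitC"]),
    ("q4", ["hitD"]),
]

# Pairing by position drifts out of step after the first missing query.
mismatched = [
    (qid, rid)
    for qid, (rid, _hits) in zip(query_ids, tabular_results)
    if qid != rid
]

# Pairing by ID (dictionary-style lookup) stays aligned.
results_by_id = dict(tabular_results)
with_hits = [qid for qid in query_ids if results_by_id.get(qid)]
without_hits = [qid for qid in query_ids if not results_by_id.get(qid)]

print("Mismatched zip pairs:", mismatched)
print("Queries with hits:", with_hits)
print("Queries without hits:", without_hits)
```

Note that zip(...) also stops quietly at the shorter input, so the last query is never even looked at in the positional version.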

Peter


