[Biopython] Filter Blast results
Wibowo Arindrarto
w.arindrarto at gmail.com
Tue Feb 26 17:27:21 UTC 2013
Hi Justin,
For your purpose, you can try using the SearchIO module
(http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc101), from
the latest Biopython (1.61).
Here's my attempt to have a similar working function:
from Bio import SearchIO, SeqIO
fasta_ids = set([x.id for x in SeqIO.parse('fasta', 'fasta')]) # get
all fasta IDs in a set
with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
hits = set([x.id for x in qresult]) # get all the ID in a set
present = fasta_ids.intersection(hits) # output all IDs
present in both sets
if present: # set is not empty
hit.write(qresult.id)
else:
no_hit.write(qresult.id)
On another note, if you are always checking against the same Fasta
file, you can try to create your own BLAST database consisting of only
those files and search against them, so any BLAST results you have
will always at least
contain one of the sequences in your FASTA file.
This makes the functions slightly simpler:
from Bio import SearchIO
with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
# empty queries evaluate to False
if qresult:
hit.write(qresult.id)
else:
no_hit.write(qresult.id)
Both functions still require you to store all the FASTA IDs in memory,
but should be more reasonable than storing whole SeqRecord objects.
Hope that helps,
Bow
On Tue, Feb 26, 2013 at 5:57 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Feb 26, 2013 at 4:45 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
>> I know that there is already a script in the Cookbook for filtering out
>> blast queries with no hits, but it involves holding all of the sequence
>> objects in memory, which isn't good if you have to work with a lot of
>> sequences.
>
> Hi Justin,
>
> Which example are you referring too? It doesn't sound very efficient.
>
> There are some wiki pages with user contributed cookbook recipes:
> http://biopython.org/wiki/Category:Cookbook
>
> There is also the "Biopython Tutorial and Cookbook", online here:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
> Thanks,
>
> Peter
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
More information about the Biopython
mailing list