[BioPython] how to convert file full of BLAST runs into a FASTA file of sequences?

Peter biopython at maubp.freeserve.co.uk
Thu Apr 9 08:49:32 UTC 2009


On Thu, Apr 9, 2009 at 1:20 AM,  <jchen at alumni.caltech.edu> wrote:
> Hello,
>
> How do I convert a file full of BLAST runs into a FASTA file of sequences
> for each hit?

Do you just want the FASTA file to contain the matched region of the
sequences in the database?  That information should be in the BLAST
output - you'll need to remove any gap characters.
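
For the matched-region case, something along these lines might do as a
starting point. It is only a sketch: it assumes the BLAST output is XML
and that your Biopython has Bio.Blast.NCBIXML. The function names and
the "[start-end]" suffix on the FASTA title are made up for this
example.

```python
def hsp_to_fasta(hit_title, hsp_sbjct, start, end):
    """Format one matched region as a FASTA record, dropping gap characters."""
    ungapped = hsp_sbjct.replace("-", "")
    return ">%s [%d-%d]\n%s\n" % (hit_title, start, end, ungapped)


def blast_xml_to_fasta(xml_path, fasta_path):
    """Write the matched region of every HSP of every query to a FASTA file.

    Returns the number of records written.
    """
    # Imported here so the pure-Python helper above works without Biopython.
    from Bio.Blast import NCBIXML

    count = 0
    with open(xml_path) as xml_handle, open(fasta_path, "w") as out_handle:
        # NCBIXML.parse yields one record per query in the file.
        for record in NCBIXML.parse(xml_handle):
            for alignment in record.alignments:
                for hsp in alignment.hsps:
                    out_handle.write(
                        hsp_to_fasta(
                            alignment.title,
                            hsp.sbjct,        # aligned database sequence, with gaps
                            hsp.sbjct_start,
                            hsp.sbjct_end,
                        )
                    )
                    count += 1
    return count
```

You would call it as blast_xml_to_fasta("my_blast.xml", "hits.fasta"),
where both filenames are placeholders.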

If you want the full sequence of each matched target, that isn't in
the BLAST output.  You'd have to take the reference number and look it
up in the database.  If you made the database yourself from a FASTA
file, that should be easy.  If it was from NR/NT or another large
database then maybe fetching the sequences from the NCBI would be
easiest (try Bio.Entrez).
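
Both lookups could be sketched roughly as below. The assumptions are
that the hit identifiers match the record IDs in your FASTA file (not
always true, depending on how the database was built), that your
Biopython has SeqIO.index, and that "nucleotide" is the right Entrez
database for your hits. The function names are invented for this
example.

```python
def unique_in_order(ids):
    """Drop repeated identifiers, keeping the first occurrence of each."""
    seen = set()
    result = []
    for identifier in ids:
        if identifier not in seen:
            seen.add(identifier)
            result.append(identifier)
    return result


def full_sequences_from_fasta(hit_ids, db_fasta_path, out_path):
    """Copy the full records for hit_ids from a local FASTA file.

    Assumes the hit IDs are the same as the FASTA record IDs.
    Returns the number of records written.
    """
    from Bio import SeqIO

    db = SeqIO.index(db_fasta_path, "fasta")  # dictionary-like, keyed by record ID
    wanted = [db[i] for i in unique_in_order(hit_ids) if i in db]
    return SeqIO.write(wanted, out_path, "fasta")


def full_sequences_from_ncbi(accessions, email, db="nucleotide"):
    """Fetch full sequences from the NCBI as one FASTA-formatted string."""
    from Bio import Entrez

    Entrez.email = email  # the NCBI asks you to identify yourself
    handle = Entrez.efetch(
        db=db,
        id=",".join(unique_in_order(accessions)),
        rettype="fasta",
        retmode="text",
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

For a long list of hits it would be kinder to the NCBI to fetch in
batches rather than in one request.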

> I have tried parsing a file full of BLAST runs per the instructions from
> the Biopython tutorial and cookbook
> (http://biopython.org/DIST/docs/tutorial/Tutorial.html), but I continue to
> get a ValueError. I have tried the hints on throwing certain exceptions,
> without much help. The only thing I have gotten working is parsing a BLAST
> output consisting of a single hit from a single query.
>
> I used BLAST v.2.2.18 to generate my BLAST output.

Are you sure you are using the XML output?

With the plain text output and BLAST v.2.2.18, Biopython can only cope
with single query output.  The NCBI regularly change their plain text
output, and we have more-or-less given up on our plain text parser.
The NCBI themselves do not recommend parsing it - that is what the XML
format was introduced for.
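
As a quick sanity check, something like this would tell you whether a
file looks like BLAST XML before you try to parse it, and then count
the hits for each query. It is a rough sketch, not a validator: the
test on the first 512 characters is a heuristic, and the function
names are made up for this example.

```python
def looks_like_blast_xml(text):
    """Heuristic: does the start of the file look like BLAST XML output?"""
    return text.lstrip().startswith("<?xml") and "BlastOutput" in text


def count_hits_per_query(path):
    """Return a list of (query description, number of hits), one per query."""
    # Imported here so the check above works without Biopython.
    from Bio.Blast import NCBIXML

    with open(path) as handle:
        head = handle.read(512)
        if not looks_like_blast_xml(head):
            raise ValueError("%s does not look like BLAST XML output" % path)
        handle.seek(0)  # rewind so the parser sees the whole file
        # NCBIXML.parse copes with many queries; NCBIXML.read expects just one.
        return [(record.query, len(record.alignments))
                for record in NCBIXML.parse(handle)]
```

With the old blastall program the XML output is requested with the
"-m 7" option.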

I can't offer any more advice without the error message, your OS (e.g.
Windows XP), version of Python, version of Biopython and ideally a
snippet of your code which is failing.

Peter


