[BioPython] how to convert file full of BLAST runs into a FASTA file of sequences?
jchen at alumni.caltech.edu
Thu Apr 9 22:14:02 UTC 2009
Hi Peter,
> Do you just want the FASTA file to contain the matched region of the
> sequences in the database? That information should be in the BLAST
> output - you'll need to remove any gap characters.
>
> If you want the full sequence of each matched target, that isn't in
> the BLAST output. You'd have to take the reference number and look it up.
> If you made the database yourself from a FASTA file, that should be
> easy. If it was from NR/NT or another large database then maybe
> fetching the sequences from the NCBI would be easiest (try
> Bio.Entrez).
Yeah, I actually do want the full-length FASTA sequences. I hadn't
considered that the BLAST output only contains the (partial) matched
regions. I have a FASTA file of the entire proteome for the organism we
are studying.
> Are you sure you are using the XML output?
>
> With the plain text output and BLAST v.2.2.18, Biopython can only cope
> with single query output. The NCBI regularly change their plain text
> output, and we have more-or-less given up on our plain text
> parser. The NCBI themselves do not recommend parsing it - that is
> what the XML format was introduced for.
>
It's unfortunate that there's no standard BLAST format. Yeah, I am trying
to parse the plain text BLAST output. I'm not familiar with the XML output,
and I don't know how to get BLAST to write its output in XML format.
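
For reference, the legacy blastall program writes XML when run with the
option "-m 7", and the newer BLAST+ programs use "-outfmt 5". Below is a
minimal sketch of pulling the significant hits for each query out of such
a file using only the Python standard library; Biopython's
Bio.Blast.NCBIXML parser is the usual route and is not shown here. The
function name significant_hits, the 1e-5 cutoff, and the choice to take
the first word of Hit_def as the sequence identifier are illustrative
assumptions, not anything from this thread.

```python
import xml.etree.ElementTree as ET


def significant_hits(xml_source, max_evalue=1e-5):
    """Map each query description to the identifiers of its significant hits.

    xml_source is a filename or a binary file object holding BLAST XML.
    A hit counts as significant if its best HSP e-value is <= max_evalue
    (the default cutoff is an arbitrary, illustrative choice).
    """
    hits_by_query = {}
    # iterparse streams the file, so a few hundred queries never need
    # to be held in memory at once.
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def")
        ids = []
        for hit in elem.iter("Hit"):
            evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
            if evalues and min(evalues) <= max_evalue:
                # Assumption: the first word of the hit's description line
                # is the identifier used in the original FASTA database.
                hit_def = hit.findtext("Hit_def") or ""
                ids.append(hit_def.split()[0] if hit_def.strip() else "")
        hits_by_query[query] = ids
        elem.clear()  # free the finished <Iteration> subtree
    return hits_by_query
```

Each <Iteration> element in the XML corresponds to one query, so a file
with a few hundred queries yields a dictionary with a few hundred keys.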
My file contains a few hundred queries. I ended up writing a little script
that extracted the name of each query and each of its significant hits. I
will probably end up writing my own scripts for getting the FASTA
sequences for each of these hits from a FASTA proteome file.
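
A minimal sketch of that last step, assuming the hit names match the first
word of each header line in the proteome FASTA file. The function names
read_fasta and write_hits and the 60-column line width are hypothetical
choices for illustration; Biopython's Bio.SeqIO.parse could replace the
hand-written reader.

```python
def read_fasta(handle):
    """Yield (identifier, header, sequence) for each record in a FASTA handle.

    The identifier is taken to be the first word of the header line
    (an assumption about how the proteome file is labelled).
    """
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                ident = header.split()[0] if header.strip() else ""
                yield ident, header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        ident = header.split()[0] if header.strip() else ""
        yield ident, header, "".join(chunks)


def write_hits(proteome_handle, wanted_ids, out_handle, width=60):
    """Write the full-length records named in wanted_ids as FASTA.

    Returns the set of identifiers that were not found in the proteome.
    """
    wanted = set(wanted_ids)
    found = set()
    for ident, header, seq in read_fasta(proteome_handle):
        if ident in wanted and ident not in found:
            out_handle.write(">%s\n" % header)
            # Wrap the sequence at a fixed width for readability.
            for i in range(0, len(seq), width):
                out_handle.write(seq[i:i + width] + "\n")
            found.add(ident)
    return wanted - found
```

Checking the returned set is a cheap way to notice hit names that do not
match any header in the proteome file.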
> I can't offer any more advice without the error message, your OS (e.g.
> Windows XP), version of Python, version of Biopython and ideally a
> snippet of your code which is failing.
That's alright. It will be easier for me to write my own little scripts to
parse my BLAST output file. I was just hoping there was an easy, fast way
to do it with Biopython.
Thanks for your help!
-Jerry