[Biopython] Tutorial Question 7.4 alignment.title
Peter
biopython at maubp.freeserve.co.uk
Fri Oct 8 09:30:58 UTC 2010
On Fri, Oct 8, 2010 at 4:06 AM, Ara Kooser <akooser at unm.edu> wrote:
> Hello all,
>
> I am a new user to Biopython. I've been working my way through the
> tutorial. I have a question about how the alignment.title works in the
> example given in section 7.4 of the tutorial. I wrote the following code:
>
> from Bio.Blast import NCBIXML
>
> E_VALUE_THRESH = 1e-30
>
> result_handle = open("test.xml")
> blast_records = NCBIXML.parse(result_handle)
> blast_record = blast_records.next()
>
> for alignment in blast_record.alignments:
>     for hsp in alignment.hsps:
>         if hsp.expect < E_VALUE_THRESH:
>             print '****Alignment****'
>             print 'sequence:', alignment.title
>             print 'e value:', hsp.expect
>             print 'length:', alignment.length
>             print 'start:', hsp.query_start
>             print 'end:', hsp.query_end
>
> This is to look at an .xml file that was produced by BLAST. I was wondering
> if there is a way to break up the string of information produced by:
>
> print 'sequence:', alignment.title
>
> Basically I would like the organism's name first, followed by the locus
> number. I wasn't sure how to split up the print command.
>
> I looked at the docs over at http://biopython.org/DIST/docs/api/ to see if
> there was a tag specifically for the locus number and organism name.
>
> Thank you for your time and help.
>
> Regards,
> Ara

Hi Ara,

An example of the output you are getting and what you want
would help, but I think this isn't possible in general.

As I recall, the locus number and organism name information is
just part of the original identifier and/or description in the FASTA
file used to build the BLAST database. The NCBI tend to include
the species in the description within square brackets - but this is
just their convention, it is not a nicely tagged part of the BLAST
output which the parser could spot.

Basically I think you will have to parse the string yourself.

Peter

P.S. Alternatively if you want the organism name and have the
GI number (or similar) this can be mapped to the organism via
the NCBI taxonomy database (either online via Entrez or
by parsing a downloaded copy of the mapping).
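
As a rough illustration of "parse the string yourself", here is a minimal
sketch. It assumes the title follows the NCBI-style convention mentioned
above: an identifier, then a description, then the organism in square
brackets at the end. The function name split_title and the sample title are
made up for illustration and are not part of Biopython. It also assumes that,
when several descriptions have been merged into one title separated by " >",
only the first one is of interest.

```python
import re


def split_title(title):
    """Split a BLAST hit title into (organism, identifier, description).

    Assumes the 'identifier description [Organism]' convention. This is
    only a convention, so organism is None when no trailing bracketed
    name is found.
    """
    # Assumption: merged descriptions are separated by " >"; keep the first.
    first = title.split(" >", 1)[0].strip()
    # Treat the first whitespace-delimited token as the identifier.
    identifier, _, rest = first.partition(" ")
    # Treat a trailing "[...]" as the organism name.
    match = re.search(r"\[([^\[\]]+)\]\s*$", rest)
    if match:
        organism = match.group(1)
        description = rest[:match.start()].strip()
    else:
        organism = None
        description = rest.strip()
    return organism, identifier, description


# Made-up title in the NCBI style, not taken from real BLAST output.
example = "gi|12345|ref|XP_000001.1| hypothetical protein [Example organism]"
organism, identifier, description = split_title(example)
# Organism first, then the identifier, as asked for in the question.
print("%s %s" % (organism, identifier))
```

In the loop from the original question, alignment.title could be passed to
split_title in place of the sample string. Titles from a custom database may
not follow this convention at all, in which case the organism comes back as
None and a different rule is needed.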