[BioPython] target sequence length in blast parsing

Ann Loraine aloraine at gmail.com
Sun Jan 7 18:58:33 UTC 2007


Dear Michiel,

Thank you for your fast reply!

It seems relatively straightforward to get the query length from
the "rec" object, e.g.:

>>> rec.query_length
240
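This thread uses two names for the query length: rec.query_length above, and record.query_letters in Michiel's reply quoted below, where he also notes the name may change in a future release. A small defensive accessor can hide that difference. This is a minimal sketch: query_len is a hypothetical helper, not a Biopython function, and it assumes only that the record exposes one of those two attributes.

```python
from types import SimpleNamespace


def query_len(record):
    """Return the query length from a parsed BLAST record.

    Hypothetical helper: tries ``query_length`` first, then
    ``query_letters``, since both names appear for this value.
    Attributes that exist but are None are skipped.
    """
    for name in ("query_length", "query_letters"):
        value = getattr(record, name, None)
        if value is not None:
            return value
    raise AttributeError("record has neither query_length nor query_letters")


# Stand-in records for illustration only (not real Biopython objects).
print(query_len(SimpleNamespace(query_length=240)))   # 240
print(query_len(SimpleNamespace(query_letters=240)))  # 240
```

Skipping None values matters because record classes often initialise unused attributes to None rather than leaving them undefined.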

The target/subject length is harder to find. Would the XML parser be
able to retrieve this information? I'm not sure which of the various
blast parser output objects contains a slot for this data. Ideally,
the alignment object would have an attribute called subject_length,
or something similar, that captures this information.
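On the XML question: NCBI's BLAST XML output stores the subject length explicitly, in a Hit_len element inside each Hit, so the data is there for an XML parser to pick up. Below is a minimal standard-library sketch of reading it, not Biopython's own parser. The XML fragment is an abridged, hand-written illustration built from the hit quoted further down, and subject_lengths is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abridged, hand-written BLAST XML fragment for illustration only; a real
# report contains many more elements (HSPs, statistics, parameters, ...).
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|34908012|ref|NP_915353.1|</Hit_id>
          <Hit_def>putative carboxypeptidase D [Oryza sativa (japonica cultivar-group)]</Hit_def>
          <Hit_len>669</Hit_len>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def subject_lengths(xml_text):
    """Yield (hit id, subject length) for every <Hit> in a BLAST XML report."""
    root = ET.fromstring(xml_text)
    for hit in root.iter("Hit"):
        yield hit.findtext("Hit_id"), int(hit.findtext("Hit_len"))


print(list(subject_lengths(SAMPLE)))
# [('gi|34908012|ref|NP_915353.1|', 669)]
```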

For example, an alignment object has these attributes:

>>> dir(a)
['__doc__', '__init__', '__module__', '__str__', 'hsps', 'length', 'title']
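The 'length' attribute in that listing is the likely candidate: in Biopython's Bio.Blast.Record classes, Alignment.length is documented as the length of the database (subject) sequence, i.e. the "Length = 669" value in the report below. A sketch assuming that, with hit_lengths as a hypothetical helper and stand-in objects so it runs without Biopython installed:

```python
from types import SimpleNamespace


def hit_lengths(record):
    """Return (title, subject length) for each alignment in a parsed record.

    Assumes Biopython-style records: ``record.alignments`` is a list of
    objects whose ``length`` attribute is the subject sequence length.
    """
    return [(a.title, a.length) for a in record.alignments]


# With Biopython installed, records would come from something like:
#     from Bio.Blast import NCBIXML
#     with open("myblastoutput.xml") as handle:
#         for record in NCBIXML.parse(handle):
#             print(hit_lengths(record))
# Here a stand-in object mimics one record holding one alignment.
fake = SimpleNamespace(alignments=[
    SimpleNamespace(title="gi|34908012|ref|NP_915353.1| putative carboxypeptidase D",
                    length=669),
])
print(hit_lengths(fake))
```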

I will download the new code and take a look!

Thank you again,

Ann

On 1/7/07, Michiel de Hoon <mdehoon at c2b2.columbia.edu> wrote:
> There are two things you can do:
>
> First, try dir(record) on the Blast record to see if the information you
> are looking for is hiding in one of those variables.
>
> If you can't find it there, the following should work, assuming that you
> parse Blast XML output instead of Blast plain-text output (the latter
> may or may not work):
>
>  >>> from Bio.Blast import NCBIXML
>  >>> inputfile = open("myblastoutput.xml")
>  >>> records = NCBIXML.parse(inputfile)
>  >>> for record in records:
> ...      print(record.query_letters)
> ...
>  >>> inputfile.close()
>
> Two caveats:
> 1) This uses the latest Blast parsing code in CVS; it is not in
> Biopython release 1.42. You can download the new files in Bio/Blast/*.py
> from CVS and just copy them over the corresponding files of release 1.42
> to make this work.
> 2) Jacob Joseph makes the (I believe correct) argument that there are
> some inconsistencies between variable names in the Biopython Blast
> parsers. So record.query_letters may be renamed in a future
> Biopython release. See Bug #2176 on Bugzilla for more information.
>
> --Michiel.
>
> Ann Loraine wrote:
> > Dear all,
> >
> > I have a question about blast parsing in biopython - any tips would be
> > much appreciated.
> >
> > How can I access the length of the target sequence (e.g., 669 in the
> > following text) from alignment (or other?) objects retrieved from a
> > blast report parse?
> >
> > ** example **
> >
> >> gi|34908012|ref|NP_915353.1| putative carboxypeptidase D [Oryza sativa (japonica
> >            cultivar-group)]
> >           Length = 669
> >
> >  Score =  247 bits (625), Expect(2) = 3e-71
> >  Identities = 108/167 (64%), Positives = 132/167 (78%)
> >  Frame = +2
> >
> > Yours,
> >
> > Ann
> >
>
>
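Michiel notes that parsing plain-text output may or may not work. If it doesn't, the number is at least easy to locate in a plain-text report: it is the "Length = 669" line under each ">" hit description, as in the example above. A regex sketch of that idea, assuming the layout shown there; plain_text_lengths and HIT_RE are hypothetical names, not part of Biopython, and scraping like this is fragile against format changes, which is the usual argument for the XML route.

```python
import re

# The hit header from the example above, as BLAST prints it.
REPORT = """>gi|34908012|ref|NP_915353.1| putative carboxypeptidase D [Oryza sativa (japonica
            cultivar-group)]
          Length = 669

 Score =  247 bits (625), Expect(2) = 3e-71
"""

# A hit header starts with '>' at the beginning of a line and is followed,
# possibly after wrapped description lines, by an indented 'Length = N' line.
HIT_RE = re.compile(r"^>(\S+).*?^\s+Length\s*=\s*(\d+)", re.MULTILINE | re.DOTALL)


def plain_text_lengths(text):
    """Return (hit id, subject length) pairs scraped from a plain-text report."""
    return [(m.group(1), int(m.group(2))) for m in HIT_RE.finditer(text)]


print(plain_text_lengths(REPORT))
# [('gi|34908012|ref|NP_915353.1|', 669)]
```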


-- 
Ann Loraine
Assistant Professor
Departments of Genetics, Biostatistics, and
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org


