[BioPython] target sequence length in blast parsing
Ann Loraine
aloraine at gmail.com
Sun Jan 7 18:58:33 UTC 2007
Dear Michiel,
Thank you for your fast reply!
It seems to be relatively straightforward to get the query length from
the "rec" object, e.g.,
>>> rec.query_length
240
The target/subject length is harder to find. Would the XML parser be
able to retrieve this information? I'm not sure which of the various
blast parse output objects contain a slot for this data. Ideally,
there could be a variable under the alignment object called
subject_length or something similar which would capture this
information.
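
As a rough sketch of where this value should live in the XML output:
BLAST XML reports carry a <Hit_len> element for each hit, so the subject
length can be read with the standard library alone, independent of
Biopython. The function name subject_lengths and the tiny embedded
report below are made up for illustration; a real report has many more
elements.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed, hypothetical BLAST XML fragment (illustration only);
# real BLAST XML output contains many more elements per hit.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>putative carboxypeptidase D [Oryza sativa (japonica cultivar-group)]</Hit_def>
          <Hit_len>669</Hit_len>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def subject_lengths(xml_text):
    """Return (hit definition, subject length) pairs for every <Hit> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for hit in root.iter("Hit"):
        # <Hit_len> holds the full length of the subject sequence.
        pairs.append((hit.findtext("Hit_def"), int(hit.findtext("Hit_len"))))
    return pairs


for definition, length in subject_lengths(SAMPLE_XML):
    print(definition, length)
```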
For example, an alignment object has these data:
>>> dir(a)
['__doc__', '__init__', '__module__', '__str__', 'hsps', 'length', 'title']
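
One thing worth checking (an assumption, not something confirmed in this
thread): the 'length' attribute listed above may already hold the
subject length, i.e. the value from the "Length = 669" line of the
report. A minimal sketch of collecting it per alignment follows. The
helper name subject_lengths_from_record and the stand-in record are
hypothetical; with Biopython the record would come from the Blast
parser instead.

```python
from types import SimpleNamespace


def subject_lengths_from_record(record):
    """Return (title, length) for each alignment of a parsed Blast record."""
    return [(alignment.title, alignment.length) for alignment in record.alignments]


# Stand-in for a parsed record so the sketch runs without a Blast report.
# ASSUMPTION: alignment.length is the subject sequence length.
fake_record = SimpleNamespace(
    alignments=[
        SimpleNamespace(
            title="gi|34908012|ref|NP_915353.1| putative carboxypeptidase D",
            length=669,
            hsps=[],
        )
    ]
)

for title, length in subject_lengths_from_record(fake_record):
    print(title, length)
```

If the printed value matches the "Length =" line for a known hit, no
separate subject_length variable would be needed.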
I will download the new code and take a look!
Thank you again,
Ann
On 1/7/07, Michiel de Hoon <mdehoon at c2b2.columbia.edu> wrote:
> There are two things you can do:
>
> First, try dir(record) on the Blast record to see if the information you
> are looking for is hiding in one of those variables.
>
> If you can't find it there, the following should work, assuming that you
> parse Blast XML output instead of Blast plain-text output (the latter
> may or may not work):
>
> >>> from Bio.Blast import NCBIXML
> >>> inputfile = open("myblastoutput.xml")
> >>> records = NCBIXML.parse(inputfile)
> >>> for record in records:
> ...     print(record.query_letters)
> ...
> >>> inputfile.close()
>
> Two caveats:
> 1) This uses the latest Blast parsing code in CVS; it is not in
> Biopython release 1.42. You can download the new files in Bio/Blast/*.py
> from CVS and just copy them over the corresponding files of release 1.42
> to make this work.
> 2) Jacob Joseph makes the (I believe correct) argument that there are
> some inconsistencies between variable names in the Biopython blast
> parsers. So record.query_letters may have a different name in a future
> Biopython release. See Bug #2176 on Bugzilla for more information.
>
> --Michiel.
>
> Ann Loraine wrote:
> > Dear all,
> >
> > I have a question about blast parsing in biopython - any tips would be
> > much appreciated.
> >
> > How can I access the length of the target sequence (e.g., 669 in the
> > following text) from alignment (or other?) objects retrieved from a
> > blast report parse?
> >
> > ** example **
> >
> >> gi|34908012|ref|NP_915353.1| putative carboxypeptidase D [Oryza sativa (japonica cultivar-group)]
> > Length = 669
> >
> > Score = 247 bits (625), Expect(2) = 3e-71
> > Identities = 108/167 (64%), Positives = 132/167 (78%)
> > Frame = +2
> >
> > Yours,
> >
> > Ann
> >
>
>
--
Ann Loraine
Assistant Professor
Departments of Genetics, Biostatistics, and
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org