[BioPython] Is query_length really the length of query?

Peter biopython at maubp.freeserve.co.uk
Wed Apr 1 10:59:24 UTC 2009


On Wed, Apr 1, 2009 at 11:34 AM,  <Yvan.Strahm at bccs.uib.no> wrote:
>
> Hello List
>
> I am trying to get the length of the query from the BLAST result itself,
>
> like this:
> result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastn",
>                                                      my_blast_db, my_blast_file)
>
> from Bio.Blast import NCBIXML
> blast_records = NCBIXML.parse(result_handle)
> for blast_record in blast_records:
>
> but
> blast_record.query_length returns None
> and
> blast_record.query_letters returns the actual size
>
> Should I check the length of the query before running BLAST? Or did I
> misinterpret the meaning of query_length and query_letters?
>
> Thanks for your time
>
> Is query_length really the length of query?

You can use query_letters (although it wouldn't hurt to double check
this if you have the query sequence available). With the current BLAST
XML parser, query_length is always None (but I think we should fix this
so that both are populated).
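
As a rough sketch of the workaround (the helper name get_query_length is
made up for illustration and is not part of Biopython), you could prefer
query_letters, fall back to query_length if it is ever populated, and
optionally double check against the query sequence. It only assumes the
record has the two attributes discussed above; the SimpleNamespace object
is a stand-in for a real parsed record:

```python
from types import SimpleNamespace


def get_query_length(blast_record, query_sequence=None):
    """Return the query length from a BLAST record-like object.

    Hypothetical helper, not part of Biopython. Prefers query_letters,
    falls back to query_length, and optionally cross-checks the value
    against the actual query sequence.
    """
    length = getattr(blast_record, "query_letters", None)
    if length is None:
        # Fall back in case only query_length has been populated.
        length = getattr(blast_record, "query_length", None)
    if length is None:
        raise ValueError("record has neither query_letters nor query_length set")
    length = int(length)
    if query_sequence is not None and len(query_sequence) != length:
        raise ValueError(
            "reported query length %d does not match sequence length %d"
            % (length, len(query_sequence))
        )
    return length


# Stand-in for a record as returned by the XML parser (query_length is None).
record = SimpleNamespace(query_letters=24, query_length=None)
print(get_query_length(record, "ACGT" * 6))
```

In real use you would pass each blast_record from NCBIXML.parse(...) in
place of the stand-in object.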

It's an unfortunate historical accident dating back to the plain text
BLAST parser.  The plain text output printed the query length in two
places, with different captions, which was reflected in the names
given in the BLAST record (the values should be the same, assuming the
BLAST output is sane).  The XML output doesn't have this redundancy,
but our XML parser tries to use the same object to hold the results.
See: http://bugzilla.open-bio.org/show_bug.cgi?id=2176#c12

Have a look at the discussion on Bug 2176 for more about this
(including the far more complicated situation for the database length,
which has multiple meanings).

This seems like a timely reminder that we could perhaps tidy up a
little of this ready for Biopython 1.50 ...

Peter



