[BioPython] Is query_length really the length of query?

Yvan Strahm yvan.strahm at bccs.uib.no
Tue Apr 14 14:00:17 UTC 2009



Peter wrote:
> On Wed, Apr 1, 2009 at 11:34 AM,  <Yvan.Strahm at bccs.uib.no> wrote:
>> Hello List
>>
>> I am trying to get the length of the query from the BLAST result itself
>>
>> like that:
>> result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastn",
>>                                                      my_blast_db, my_blast_file)
>>
>> from Bio.Blast import NCBIXML
>> blast_records = NCBIXML.parse(result_handle)
>> for blast_record in blast_records:
>>
>> but
>> blast_record.query_length returns None
>> and
>> blast_record.query_letters returns the actual size
>>
>> Should I test the length of the query before the blast result? Or did I
>> misinterpret the meaning of query_length and query_letters?
>>
>> Thanks for your time
>>
>> Is query_length really the length of query?
> 
> You can use query_letters (although it wouldn't hurt to double check
> this if you have the query sequence available). With the current BLAST
> XML parser query_length is always None (but I think we should fix this so
> they are both populated).
> 
> It's an unfortunate historical accident dating back to the plain text
> BLAST parser.  The plain text output printed the query length in two
> places, with different captions, which was reflected in the names
> given in the BLAST record (the values should be the same, assuming the
> BLAST output is sane).  The XML output doesn't have this redundancy,
> but our XML parser tries to use the same object to hold the results.
> See: http://bugzilla.open-bio.org/show_bug.cgi?id=2176#c12
> 
> Have a look at the discussion on Bug 2176 for more about this
> (including the far more complicated situation for the database length
> which has multiple meanings).
> 
> This seems like a timely reminder that we could perhaps tidy up a
> little of this ready for Biopython 1.50 ...
> 
> Peter

Hello,

I tried to check the length of each query before sending it to BLAST.
My problem is that all the query sequences are in one file, so I used SeqIO to read/parse them:

for record in SeqIO.parse(fh, "fasta"):
    l_query = len(record.seq)
    result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                          my_blast_db, record.seq)

This doesn't work, as NCBIStandalone.blastall takes a file as infile, not a sequence.

Should I write a temporary file with the record.id and record.seq and pass it to
NCBIStandalone.blastall?

Or is there an easier way?
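
To make the temporary file idea concrete, here is a minimal sketch of what I have in mind. The helper name write_temp_fasta is my own (hypothetical), and the commented-out loop assumes the same my_blast_exe, my_blast_db and fh variables as above; I have not confirmed this is the recommended way.

```python
import os
import tempfile


def write_temp_fasta(record_id, sequence):
    """Write one sequence to a temporary FASTA file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file.
    """
    fd, path = tempfile.mkstemp(suffix=".fasta")
    with os.fdopen(fd, "w") as handle:
        handle.write(">%s\n%s\n" % (record_id, sequence))
    return path


# Assumed usage inside the SeqIO loop (not run here, needs Biopython and BLAST):
# for record in SeqIO.parse(fh, "fasta"):
#     l_query = len(record.seq)
#     query_file = write_temp_fasta(record.id, str(record.seq))
#     try:
#         result_handle, error_handle = NCBIStandalone.blastall(
#             my_blast_exe, "blastn", my_blast_db, query_file)
#         # parse result_handle here, before the file is removed
#     finally:
#         os.remove(query_file)
```

This way l_query is known before each BLAST run, at the cost of one BLAST call and one temporary file per query.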

For now I am just using the blast_record.query_letters variable.
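
Roughly, that amounts to the following sketch. The function name get_query_length is hypothetical; it assumes, as Peter describes, that the XML parser fills query_letters and leaves query_length as None, and it falls back to query_length in case only that one is set.

```python
def get_query_length(blast_record):
    """Return the query length from whichever attribute is populated.

    Prefers query_letters (set by the XML parser, per the discussion above)
    and falls back to query_length; returns None if neither is set.
    """
    length = getattr(blast_record, "query_letters", None)
    if length is None:
        length = getattr(blast_record, "query_length", None)
    return length
```

If the query sequence is also available, the result can be compared against len(record.seq) as a sanity check.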

I am using Biopython 1.49 and Python 2.6.1

cheers,
yvan


