[BioPython] Is query_length really the length of query?
Yvan Strahm
yvan.strahm at bccs.uib.no
Tue Apr 14 14:00:17 UTC 2009
Peter wrote:
> On Wed, Apr 1, 2009 at 11:34 AM, <Yvan.Strahm at bccs.uib.no> wrote:
>> Hello List
>>
>> I am trying to get the length of the query from the blast result itself
>>
>> like that:
>> result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe,
>>                                                       "blastn",
>>                                                       my_blast_db,
>>                                                       my_blast_file)
>>
>> from Bio.Blast import NCBIXML
>> blast_records = NCBIXML.parse(result_handle)
>> for blast_record in blast_records:
>>
>> but
>> blast_record.query_length returns None
>> and
>> blast_record.query_letters returns the actual size
>>
>> Should I check the length of the query before running BLAST? Or did I
>> misinterpret the meaning of query_length and query_letters?
>>
>> Thanks for your time
>>
>> Is query_length really the length of query?
>
> You can use query_letters (although it wouldn't hurt to double check
> this if you have the query sequence available). With the current BLAST
> XML parser query_length is always None (but I think we should fix this so
> they are both populated).
>
> It's an unfortunate historical accident dating back to the plain text
> BLAST parser. The plain text output printed the query length in two
> places, with different captions, which was reflected in the names
> given in the BLAST record (the values should be the same, assuming the
> BLAST output is sane). The XML output doesn't have this redundancy,
> but our XML parser tries to use the same object to hold the results.
> See: http://bugzilla.open-bio.org/show_bug.cgi?id=2176#c12
>
> Have a look at the discussion on Bug 2176 for more about this
> (including the far more complicated situation for the database length
> which has multiple meanings).
>
> This seems like a timely reminder that we could perhaps tidy up a
> little of this ready for Biopython 1.50 ...
>
> Peter
Hello,
I tried to check the length before sending it to BLAST.
My problem is that all the query sequences are in one file, so I used SeqIO to read/parse them:

    for record in SeqIO.parse(fh, "fasta"):
        l_query = len(record.seq)
        result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                              my_blast_db, record.seq)

but this doesn't work, because NCBIStandalone.blastall takes a file as infile.
Should I write a temporary file with the record.id and record.seq and pass it to
NCBIStandalone.blastall,
or is there an easier way?
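
The temporary-file idea could look roughly like the sketch below. It uses only the standard library so it runs without BLAST installed. The helper name write_temp_fasta and the sample sequences are made up for illustration, and the blastall call is left as a comment because it needs a local BLAST executable and database.

```python
import os
import tempfile


def write_temp_fasta(record_id, sequence):
    """Write one sequence to a temporary FASTA file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so an external program can open it by name.
    handle = tempfile.NamedTemporaryFile(
        mode="w", suffix=".fasta", delete=False
    )
    try:
        handle.write(">%s\n%s\n" % (record_id, sequence))
    finally:
        handle.close()
    return handle.name


# Plain tuples stand in for the records SeqIO.parse would yield;
# with Biopython one would pass record.id and str(record.seq) instead.
queries = [("query1", "ACGTACGTAC"), ("query2", "GGGCCC")]
query_lengths = {}
for query_id, query_seq in queries:
    # The length is known here, before BLAST is ever run.
    query_lengths[query_id] = len(query_seq)
    fasta_path = write_temp_fasta(query_id, query_seq)
    try:
        # The path would be passed to BLAST in place of my_blast_file, e.g.
        # NCBIStandalone.blastall(my_blast_exe, "blastn", my_blast_db, fasta_path)
        pass
    finally:
        os.remove(fasta_path)  # clean up once BLAST has finished
```

Recording the length in the same loop means it can later be compared against blast_record.query_letters as a sanity check.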
For now I am just using the blast_record.query_letters variable.
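
A small helper could make this robust if a later release populates both attributes. This is only a sketch: get_query_length is a hypothetical name, and a SimpleNamespace stands in for a real record from NCBIXML.parse so the example runs without Biopython.

```python
from types import SimpleNamespace


def get_query_length(blast_record):
    """Return the query length from whichever attribute is populated."""
    # Prefer query_letters, which the XML parser fills in; fall back to
    # query_length, which is None with the current XML parser.
    for name in ("query_letters", "query_length"):
        value = getattr(blast_record, name, None)
        if value is not None:
            return value
    return None


# Stand-in for an XML-parsed record: query_length is None.
xml_style_record = SimpleNamespace(query_letters=120, query_length=None)
print(get_query_length(xml_style_record))  # prints 120
```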
I am using Biopython 1.49 and Python 2.6.1
cheers,
yvan