[BioPython] blastall questions (output, full length subject)

Christof Winter winter at biotec.tu-dresden.de
Mon Jan 21 08:18:15 EST 2008


Stefanie Lück wrote:
> Hi!
> 
> I need again some advice for a local blast with blastall.
> 
> First of all, everything works fine, I just have some questions on how to
> continue:
> 
> 1) How can I see the full length of the subject? I always can see only this
> part, which is matching with the query.

Hi Stefanie,

you have run into the slightly confusing naming in BioPython's NCBIXML module.
Here is an explanation:

alignment.length = total length of the full (unaligned) hit sequence, i.e. the subject
record.query_letters = length of the query sequence
len(hsp.query) = len(hsp.match) = len(hsp.sbjct) = length of the alignment, gaps included

with

from Bio.Blast import NCBIXML

parser = NCBIXML.BlastParser()
records = parser.parse(open(blast_results_file))

for record in records:
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            pass  # do something with each hsp here
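
To make the three lengths concrete, here is a minimal sketch. It does not parse a real BLAST result: the record is a hand-made stand-in built from SimpleNamespace, and the sequences and numbers are invented for illustration. Only the attribute names (query_letters, alignments, length, hsps, query, match, sbjct) follow the ones listed above, and describe_lengths is a hypothetical helper.

```python
from types import SimpleNamespace

# Hypothetical stand-in for one parsed record; all values are made up.
hsp = SimpleNamespace(
    query="ACGT-ACGT",   # aligned query, one gap
    match="|||| ||||",
    sbjct="ACGTTACGT",   # aligned part of the subject only
)
alignment = SimpleNamespace(length=1200, hsps=[hsp])
record = SimpleNamespace(query_letters=8, alignments=[alignment])


def describe_lengths(record):
    """Return (query length, full subject length, alignment length) per HSP."""
    rows = []
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            rows.append((record.query_letters, alignment.length, len(hsp.query)))
    return rows


for query_len, subject_len, aligned_len in describe_lengths(record):
    print("query: %d, full subject: %d, aligned region: %d"
          % (query_len, subject_len, aligned_len))
```

Note that hsp.sbjct only holds the aligned stretch of the subject, so alignment.length is the place to look for the full subject length.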

> 2) How are your suggestions to continue with the xml output? I want to sort
> the Hits by % of matching and my idea was it to put everything in a
> dictionary (%match as key and all the rest information's as values).

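If you mean the sequence identity percentage, you can use
sequenceIdentity = 100.0 * int(hsp.identities) / len(hsp.query)
(the 100.0 avoids integer division; the percentage is relative to the
alignment length, gaps included)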

To use the sequence identity as key in a dictionary, you would have to keep a 
list (or set) of records as value, since different records (hits) can have the 
same sequence identity.
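
A minimal sketch of such a dictionary, assuming the identity percentage has already been computed for each hit. The hit titles and percentages are invented, and hits_by_identity is just an illustrative name; collections.defaultdict is used so that every key starts with an empty list.

```python
from collections import defaultdict

# Hypothetical (hit title, percent identity) pairs; the values are invented.
hits = [("hit_A", 98), ("hit_B", 75), ("hit_C", 98)]

# Each identity value maps to a list, so hits with equal identity do not
# overwrite each other.
hits_by_identity = defaultdict(list)
for title, identity in hits:
    hits_by_identity[identity].append(title)

# Walk through the keys from best to worst identity.
for identity in sorted(hits_by_identity, reverse=True):
    print(identity, hits_by_identity[identity])
```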

I would recommend just keeping a set (or list) of records and using the key or
cmp parameter of Python's sort function to sort by one field of the record:
http://wiki.python.org/moin/HowTo/Sorting
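
As a rough sketch of the key approach: the HSPs below are hand-made stand-ins (SimpleNamespace) with invented values, not parsed BLAST output, and percent_identity is a hypothetical helper that computes the identity relative to the alignment length.

```python
from types import SimpleNamespace


def percent_identity(hsp):
    """Identity as a percentage of the alignment length (gaps included)."""
    return 100.0 * hsp.identities / len(hsp.query)


# Hypothetical HSPs; only the attribute names mirror NCBIXML.
hsps = [
    SimpleNamespace(identities=8, query="ACGT-ACGT"),    # 8 of 9 columns
    SimpleNamespace(identities=10, query="ACGTACGTAC"),  # 10 of 10 columns
    SimpleNamespace(identities=6, query="ACGTACGT"),     # 6 of 8 columns
]

# sorted() calls percent_identity once per element and orders by the result.
ranked = sorted(hsps, key=percent_identity, reverse=True)

for hsp in ranked:
    print("%.1f%%  %s" % (percent_identity(hsp), hsp.query))
```

The key function keeps the sorting criterion in one place, so switching to another field only means passing a different function.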

If you only need some of the information in the record, it might be even easier
to store this information in a tuple, and keep a list (or set) of these tuples.
Tuples rather than lists, because lists are not hashable and cannot go into a set.
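
A small sketch of that lighter-weight variant, with invented values. Tuples are used here because they are hashable and can therefore be members of a set; putting the identity first is an assumption made so that a plain sort already orders by identity.

```python
# Hypothetical summaries: (percent identity, hit title, full subject length).
summaries = {
    (98.0, "hit_A", 1200),
    (75.0, "hit_B", 450),
    (98.0, "hit_C", 900),
}

# Tuples compare element by element, so the identity decides first and the
# title breaks ties.
ranked = sorted(summaries, reverse=True)

for identity, title, subject_length in ranked:
    print(identity, title, subject_length)
```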

HTH,
Christof

PS: Maybe we could enrich NCBIXML.py with some more meaningful variable names?

> 
> Is this the right way?
> 
> 
> 
> Greetings
> 
> Stefanie
> 
> 
> 

