[BioPython] blastall questions (output, full length subject)
Christof Winter
winter at biotec.tu-dresden.de
Mon Jan 21 08:18:15 EST 2008
Stefanie Lück wrote:
> Hi!
>
> I need again some advice for a local blast with blastall.
>
> First of all, everything works fine, I just have some questions on how to
> continue:
>
> 1) How can I see the full length of the subject? I always can see only this
> part, which is matching with the query.
Hi Stefanie,
you have run into the slightly confusing naming in the BioPython NCBIXML module.
Here is an explanation:
alignment.length = full length of the hit (subject) sequence, not just the aligned part
record.query_letters = length of the query sequence
len(hsp.query) = len(hsp.match) = len(hsp.sbjct) = length of the alignment (gaps included)
with
    parser = NCBIXML.BlastParser()
    records = parser.parse(open(blast_results_file))
    for record in records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                # do something with hsp
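To make the three lengths concrete, here is a small sketch that does not need BioPython or a BLAST run. The namedtuples are hypothetical stand-ins that carry only the attributes named above, and the sequences and numbers are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for the parsed objects (not the real BioPython classes);
# they only carry the attributes discussed above.
Hsp = namedtuple("Hsp", "query match sbjct")
Alignment = namedtuple("Alignment", "length hsps")
Record = namedtuple("Record", "query_letters alignments")

# Made-up example: an 8-letter query hitting an 11-letter subject,
# with one gap in the aligned part of the subject.
hsp = Hsp(query="ACGTACGT", match="||| ||||", sbjct="ACG-ACGT")
alignment = Alignment(length=11, hsps=[hsp])
record = Record(query_letters=8, alignments=[alignment])

for alignment in record.alignments:
    for hsp in alignment.hsps:
        # Number of alignment columns, gaps included
        aligned_columns = len(hsp.query)
        # Subject residues covered by this HSP, gaps excluded
        subject_residues = len(hsp.sbjct.replace("-", ""))
        print("query length:", record.query_letters)
        print("full subject length:", alignment.length)
        print("alignment columns:", aligned_columns)
        print("subject residues in HSP:", subject_residues)
```

So the full subject length comes from alignment.length, while the hsp strings only ever show the matching part.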
> 2) How are your suggestions to continue with the xml output? I want to sort
> the Hits by % of matching and my idea was it to put everything in a
> dictionary (%match as key and all the rest information's as values).
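If you refer to the sequence identity percentage, you can use
    sequenceIdentity = 100.0 * int(hsp.identities) / len(hsp.query)
(the float 100.0 keeps Python from truncating the result to a whole number;
len(hsp.query) is the alignment length, gaps included).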
To use the sequence identity as key in a dictionary, you would have to keep a
list (or set) of records as value, since different records (hits) can have the
same sequence identity.
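A minimal sketch of the dictionary approach, assuming you have already pulled the hit name, the identity count and the aligned query string out of each hsp. The helper name percent_identity and the sample hits are made up for illustration.

```python
from collections import defaultdict


def percent_identity(identities, aligned_query):
    """Percent identity over the alignment length (gaps included)."""
    return 100.0 * int(identities) / len(aligned_query)


# Hypothetical (hit name, identities, aligned query string) tuples
hits = [
    ("hit_A", 8, "ACGTACGT"),
    ("hit_B", 6, "ACGTACGT"),
    ("hit_C", 8, "ACGTACGT"),
]

# Each key maps to a list, so hits with the same identity do not overwrite each other
by_identity = defaultdict(list)
for name, identities, aligned_query in hits:
    by_identity[percent_identity(identities, aligned_query)].append(name)

# Walk the keys from best to worst identity
for identity in sorted(by_identity, reverse=True):
    print(identity, by_identity[identity])
```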
I would recommend just keeping a list of records and using the key (or cmp)
parameter of Python's sort function to sort by one field of the record:
http://wiki.python.org/moin/HowTo/Sorting
If you only need some information from the record, it might be even easier to
store this information in a list, and keep a list of these lists (use tuples
instead if you want to keep them in a set, since lists are not hashable).
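As a sketch of the sorting approach, assuming each hit has been reduced to a small list of fields (the rows and their values below are invented for illustration):

```python
from operator import itemgetter

# Hypothetical rows: [hit name, percent identity, alignment length]
rows = [
    ["hit_A", 92.5, 120],
    ["hit_B", 99.0, 80],
    ["hit_C", 92.5, 300],
]

# Highest identity first; rows with equal identity keep their original order
by_identity = sorted(rows, key=itemgetter(1), reverse=True)

# Highest identity first, then longest alignment first among ties
by_identity_then_length = sorted(rows, key=lambda row: (-row[1], -row[2]))

for name, identity, length in by_identity_then_length:
    print(name, identity, length)
```

Sorting with key leaves the rows themselves untouched, so the same list can be re-sorted by another field later.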
HTH,
Christof
PS: Maybe we could add some more meaningfully named variables to NCBIXML.py?
>
> Is this the right way?
>
>
>
> Greetings
>
> Stefanie