[BioPython] Parser
Peter
biopython at maubp.freeserve.co.uk
Mon Feb 25 17:16:51 UTC 2008
Hi Sebastian,
Did you mean to send this email to me only?
On Sat, Feb 23, 2008 at 6:59 AM, smriti Sebastian
<smriti.sebastuan at gmail.com> wrote:
> hi,
> One more help plz.
> I need to retrieve the hits which are coming under
> "Sequences not found previously or not previously below threshold:" from
> PSI-Blast output file..
> or else i need to avoid those id's while parsing the psi-blast output using
> PsiBlastParser.
> Is there any way to do that?
> I tried new_seqs attribute of rounds.But it didn't help me.
> I have attached a sample output from psi-blast.Plz help
> Thanks in advance.
Each round object has "alignments", which includes all the hits. It also
has two lists of one-line descriptions (each with a title, score and
e-value): "reused_seqs" holds only those above the "Sequences not found
previously or not previously below threshold:" line, while "new_seqs"
holds only those below the line.
Perhaps something like this will be helpful...
Peter
#!/usr/bin/python
import Bio.Blast.NCBIStandalone

# Parse the PSI-BLAST plain text output into a record with one entry per round
b_parser = Bio.Blast.NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(open('trial_psi_blast.txt', 'r'))

for rnd in b_record.rounds:
    old = len(rnd.reused_seqs)
    new = len(rnd.new_seqs)
    # Every alignment should be listed as either reused or new
    assert old + new == len(rnd.alignments)
    print("Round number %i, with %i old and %i new"
          % (rnd.number, old, new))
    for i, aln in enumerate(rnd.alignments):
        # The identifier is the first word (split on white space)
        identifier = aln.title.split()[0]
        # Remove the leading > if present as it isn't used
        # on the reused_seqs results.
        if identifier[0] == ">":
            identifier = identifier[1:]
        if i < old:
            reused = rnd.reused_seqs[i]
            assert reused.title.split()[0] == identifier
            print("%i - %s reused, score %i, exp %f"
                  % (i, identifier, reused.score, reused.e))
        else:
            novel = rnd.new_seqs[i - old]
            assert novel.title.split()[0] == identifier
            print("%i - %s novel, score %i, exp %f"
                  % (i, identifier, novel.score, novel.e))
print("Done")
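
If the goal is just to skip (or keep) the hits listed under "Sequences not
found previously or not previously below threshold:", another option is to
match by identifier rather than by position. The following is only a rough
sketch: Hit and Round are hypothetical stand-ins for the parser's objects so
that it runs without Biopython or an output file, and hit_id and
split_alignments are made-up helper names, not part of Biopython. It assumes
only that each object has a "title" and that each round has "alignments",
"reused_seqs" and "new_seqs".

```python
from collections import namedtuple

# Hypothetical stand-ins for the parser's objects (not Biopython classes),
# so this sketch can be run on its own.
Hit = namedtuple("Hit", ["title"])
Round = namedtuple("Round", ["alignments", "reused_seqs", "new_seqs"])


def hit_id(title):
    """Return the first word of a title, ignoring any leading '>'."""
    return title.lstrip(">").split()[0]


def split_alignments(rnd):
    """Split rnd.alignments into (reused, novel) lists by identifier."""
    novel_ids = set(hit_id(desc.title) for desc in rnd.new_seqs)
    reused, novel = [], []
    for aln in rnd.alignments:
        if hit_id(aln.title) in novel_ids:
            novel.append(aln)
        else:
            reused.append(aln)
    return reused, novel


# Made-up example data, shaped like one PSI-BLAST round.
demo = Round(
    alignments=[Hit(">sp|P1| alpha"), Hit(">sp|P2| beta"), Hit(">sp|P3| gamma")],
    reused_seqs=[Hit("sp|P1| alpha"), Hit("sp|P2| beta")],
    new_seqs=[Hit("sp|P3| gamma")],
)
reused, novel = split_alignments(demo)
print([hit_id(a.title) for a in reused])  # ['sp|P1|', 'sp|P2|']
print([hit_id(a.title) for a in novel])   # ['sp|P3|']
```

With a real record you would call split_alignments(rnd) for each rnd in
b_record.rounds and then work with whichever list you want. Because it looks
up identifiers in a set, it does not rely on the alignments and descriptions
being in the same order.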