[BioPython] Parser

Mon Feb 25 17:16:51 UTC 2008

Hi Sebastian,

Did you mean to send this email to me only?

On Sat, Feb 23, 2008 at 6:59 AM, smriti Sebastian
<smriti.sebastuan at gmail.com> wrote:
> hi,
> One more help plz.
> I need to retrieve the hits which are coming under
> "Sequences not found previously or not previously below threshold:" from
> PSI-Blast output file..
> or else i need to avoid those id's while parsing the psi-blast output using
> PsiBlastParser.
>  Is there any way to do that?
> I tried new_seqs attribute of rounds.But it didn't help me.
> I have attached a sample output from psi-blast.Plz help
> Thanks in advance.

Each round object has an "alignments" list which includes all the hits,
a "reused_seqs" list which holds only those above the "Sequences not
found previously or not previously below threshold:" line, and a
"new_seqs" list which holds only those below that line.

Perhaps something like this will be helpful...

Peter

#!/usr/bin/python
import Bio.Blast.NCBIStandalone
b_parser=Bio.Blast.NCBIStandalone.PSIBlastParser()
b_record=b_parser.parse(open('trial_psi_blast.txt','r'))

for rnd in b_record.rounds:
    old = len(rnd.reused_seqs)
    new = len(rnd.new_seqs)
    assert old+new == len(rnd.alignments)
    print "Round number %i, with %i old and %i new" \
          % (rnd.number, old, new)

    for i,aln in enumerate(rnd.alignments) :
        #The identifier is the first word (split on white space)
        identifier = aln.title.split()[0]
        #Remove the leading > if present as it isn't used
        #on the reused_seqs results.
        if identifier[0] == ">" : identifier = identifier[1:]

        if i <  old:
            reused = rnd.reused_seqs[i]
            assert reused.title.split()[0] == identifier
            print "%i - %s reused, score %i, exp %f" \
            % (i, identifier, reused.score, reused.e)
        else :
            novel = rnd.new_seqs[i-old]
            assert novel.title.split()[0] == identifier
            print "%i - %s novel, score %i, exp %f" \
            % (i, identifier, novel.score, novel.e)

print "Done"
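
If you only want the hits from below the line, a rough sketch along the
following lines might do. It matches identifiers instead of relying on
list positions. The Hit class is just a stand-in for the parser's
objects (only the title attribute is used), the helper names hit_id and
novel_alignments are made up for this example, and the data is invented
rather than taken from your file. It assumes the first word of each
title is the identifier, as in the script above.

```python
from collections import namedtuple

# Stand-in for the parser's objects; only .title is used here.
Hit = namedtuple("Hit", "title")

def hit_id(title):
    """Return the first word of a title, without any leading '>'."""
    words = title.split()
    return words[0].lstrip(">") if words else ""

def novel_alignments(alignments, new_seqs):
    """Keep only the alignments whose identifier appears in new_seqs."""
    wanted = set(hit_id(d.title) for d in new_seqs)
    return [a for a in alignments if hit_id(a.title) in wanted]

# Invented example data, not real PSI-BLAST output.
alignments = [Hit(">gi|1| old hit"), Hit(">gi|2| new hit")]
new_seqs = [Hit("gi|2| new hit")]

for aln in novel_alignments(alignments, new_seqs):
    print(hit_id(aln.title))
```

With a parsed record you would presumably call
novel_alignments(rnd.alignments, rnd.new_seqs) for each round. To skip
those hits instead, test "not in wanted".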