[Bioperl-l] Retrieving hits in order in SeqIO

Adam Woolfe awoolfe at rfcgr.mrc.ac.uk
Thu Mar 31 08:37:55 EST 2005


Hi
I'm trying to retrieve the top hit from a set of BLAST results in a
single file using Bio::SearchIO.

I assumed that SearchIO processed the hits in the same order as in the input
file (i.e. from the hit with the lowest E-value onwards), but I've been getting
some strange results back where the first hit returned is actually the last
one in the list:

For example, the description lines of the hits in the input file are as follows:

                                                 Score      E
Sequences producing significant alignments:       (bits)  Value


EM:16 chromosome:NCBI35:16:49081023:50081022:1     277    1e-72
EM:20 chromosome:NCBI35:20:49832988:50832987:1      54    3e-05
EM:4 chromosome:NCBI35:4:103445867:104383982:1      40     0.52
EM:17 chromosome:NCBI35:17:32359244:33357400:1      40     0.52
EM:10 chromosome:NCBI35:10:106096982:107096981:1    40     0.52
EM:10 chromosome:NCBI35:10:77173138:78173137:1      40     0.52
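For comparison, the file order of a summary table like the one above can be captured directly and used as a reference. The following is a minimal plain-Perl sketch, not part of Bioperl: `parse_summary_lines` is a hypothetical helper, and it assumes each hit line ends with the bit score and E-value as its last two whitespace-separated fields.

```perl
use strict;
use warnings;

# Hypothetical helper: turn BLAST summary lines into hit records,
# preserving the order in which they appear in the report.
# Assumes "<name> <description...> <bits> <evalue>" per hit line.
sub parse_summary_lines {
    my @hits;
    for my $line (@_) {
        my @f = split ' ', $line;
        next if @f < 3;                              # blank or "Score E" header
        my $evalue = pop @f;
        my $bits   = pop @f;
        next unless $bits =~ /^\d+(?:\.\d+)?$/;      # skip the "(bits) Value" header
        $evalue = "1$evalue" if $evalue =~ /^e/i;    # e.g. "e-100" -> "1e-100"
        my $name = shift @f;
        push @hits, { name => $name, description => join(' ', @f),
                      bits => $bits, evalue => $evalue };
    }
    return @hits;
}

my $summary = <<'END';
                                                 Score      E
Sequences producing significant alignments:       (bits)  Value

EM:16 chromosome:NCBI35:16:49081023:50081022:1     277    1e-72
EM:20 chromosome:NCBI35:20:49832988:50832987:1      54    3e-05
EM:4 chromosome:NCBI35:4:103445867:104383982:1      40     0.52
EM:17 chromosome:NCBI35:17:32359244:33357400:1      40     0.52
EM:10 chromosome:NCBI35:10:106096982:107096981:1    40     0.52
EM:10 chromosome:NCBI35:10:77173138:78173137:1      40     0.52
END

# Records come back in file order: EM:16 first, EM:10 (77173138) last.
my @in_file_order = parse_summary_lines(split /\n/, $summary);
printf "%-6s %5s %8s\n", @{$_}{qw(name bits evalue)} for @in_file_order;
```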


As a test, here is a highly stripped-down version of the Perl script:
-------------------------------------------------------------
use strict;
use warnings;
use Bio::SearchIO;

my $file = "/path/to/infile.blast";

my $in2 = Bio::SearchIO->new( -format => 'blast',
                              -file   => $file );

while ( my $result = $in2->next_result ) {
    while ( my $hit = $result->next_hit ) {
        # Print once per hit; printing inside the next_hsp loop
        # would repeat a hit that has more than one HSP.
        print "hit:" . $hit->name . " " . $hit->description . "\n";
    }
}

------------------------------------------------------------
The output of this is:

hit:EM:10 chromosome:NCBI35:10:77173138:78173137:1
hit:EM:16 chromosome:NCBI35:16:49081023:50081022:1
hit:EM:20 chromosome:NCBI35:20:49832988:50832987:1
hit:EM:4 chromosome:NCBI35:4:103445867:104383982:1
hit:EM:17 chromosome:NCBI35:17:32359244:33357400:1
hit:EM:10 chromosome:NCBI35:10:106096982:107096981:1
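Whatever order the parser hands hits back in, the top hit can be chosen explicitly by ranking on E-value (lowest first) and then bit score (highest first), instead of trusting the iteration order. This is a minimal plain-Perl sketch; `rank_hits` and the hash layout are hypothetical. With Bioperl one might fill such hashes from `$result->hits` using `$hit->name`, `$hit->significance` and `$hit->bits`, but that is an assumption to check against the Bio::Search::Hit::HitI documentation for your version.

```perl
use strict;
use warnings;

# Hypothetical helper: order hits best-first by lowest E-value,
# breaking ties by highest bit score.
sub rank_hits {
    my @hits = @_;
    return sort {
        $a->{evalue} <=> $b->{evalue}
            or $b->{bits} <=> $a->{bits}
    } @hits;
}

# Hits in the scrambled order the script printed them, with scores and
# E-values copied from the report's summary table.
my @reported = (
    { name => 'EM:10', desc => 'chromosome:NCBI35:10:77173138:78173137:1',   bits => 40,  evalue => 0.52  },
    { name => 'EM:16', desc => 'chromosome:NCBI35:16:49081023:50081022:1',   bits => 277, evalue => 1e-72 },
    { name => 'EM:20', desc => 'chromosome:NCBI35:20:49832988:50832987:1',   bits => 54,  evalue => 3e-05 },
    { name => 'EM:4',  desc => 'chromosome:NCBI35:4:103445867:104383982:1',  bits => 40,  evalue => 0.52  },
    { name => 'EM:17', desc => 'chromosome:NCBI35:17:32359244:33357400:1',   bits => 40,  evalue => 0.52  },
    { name => 'EM:10', desc => 'chromosome:NCBI35:10:106096982:107096981:1', bits => 40,  evalue => 0.52  },
);

my ($top) = rank_hits(@reported);
# prints "top hit: EM:16 chromosome:NCBI35:16:49081023:50081022:1"
print "top hit: $top->{name} $top->{desc}\n";
```

Hits tied on both E-value and score (the four 0.52 hits here) are not distinguished by this ordering, so their relative order should not be relied on.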


Why is it not giving me the results in the correct order? In other
examples I've looked at, the first hit returned is not always the last one
in the file (as it is here), so the ordering seems almost random.

Could anyone shed any light on this? I'd really appreciate it.

Many thanks,

Adam

P.S. I'm using Bioperl 1.4 on Solaris 9.


