[Bioperl-l] RemoteBlast problem

Scott Markel smarkel at scitegic.com
Fri Oct 28 16:53:03 EDT 2005


Tom,

I haven't seen your exact problem, but I recently saw something
similar.  In my case (blastn, NM_021267, Chromosome DB, default
parameters), one of the hits has a comment about the sequence
features.

===============================================
 >gi|23618947|ref|NC_004331.1| Download subject sequence spanning the HSP Plasmodium falciparum 3D7 chromosome 13, *** SEQUENCING IN PROGRESS
             ***, 33 ordered pieces
           Length=2732359

  Features in this part of subject sequence:
    hypothetical protein

  Score = 46.1 bits (23),  Expect = 0.27
  Identities = 23/23 (100%), Gaps = 0/23 (0%)
  Strand=Plus/Minus

Query  830      ACATCCCCTTCTACTTCTTCTTC  852
                 |||||||||||||||||||||||
Sbjct  1369466  ACATCCCCTTCTACTTCTTCTTC  1369444
===============================================

BioPerl's parser is choking on the extra lines.

Scott

Thomas J Keller wrote:

> Greetings,
> I'm using perl 5.8.6 and the fink installation of bioperl 1.4.5 on an  
> Apple G5 running OS X 10.4.2
> running my script with a simple fasta dna sequence file:
> $ bp_remote_blast2.pl -p blastx -d nr -i test.fa
> 
> Here's the error:
> 
> ------------- EXCEPTION  -------------
> MSG: no data for midline Query  78    
> HRRPSFSACRCVLSASSVFPSRLGNNYITAAGAQVLAEGLRGNTSLQFLG  227
> STACK Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/ 
> SearchIO/blast.pm:1151
> STACK toplevel /Users/kellert/Sandbox/Perlscripts/bp_remote_blast2.pl:90
> 
> 
> It looks like Bio::SearchIO::blast
> is choking on the result from Bio::Tools::Run::RemoteBlast
> 
> I lifted this from Jason's exampleL bp_remote_blast but modified it  to 
> use SearchIO method instead of BPLite:
> 
> use warnings;
> use strict;
> use vars qw($USAGE);
> 
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SeqIO;
> use Bio::SearchIO;
> use Getopt::Long;
> 
> $USAGE = "remote_blast.pl [-h] [-p prog] [-d db] [-e expect]  [-f  
> seqformat] -i seqfile\n";
> 
> my ($prog, $db, $expect );
> 
> my ($sequencefile,$sequenceformat,$help) = (undef, 'fasta',undef);
> 
> &GetOptions('prog|p=s'               => \$prog,
>         'db|d=s'                 => \$db,
>         'expect|e=s'             => \$expect,
>         'input|i=s'              => \$sequencefile,
>         'format|f=s'             => \$sequenceformat,
>         'help|h'                 => \$help,
>         );
> 
> if( $help ) {
>     exec('perldoc', $0);
>     die;
> }
> 
> if( !defined $prog ) {
>     die($USAGE . "\n\tMust specify a valid program name ([t]blast 
> [pxn])\n");
> }
> if( !defined $db ) {
>     die($USAGE . "\n\tMust specify a db (e.g. \'nr\') to search\n");
> }
> if( !defined $sequencefile ) {
>     die($USAGE . "\n\tMust specify a path to a sequence file.\n");
> }
> 
> my $factory = new Bio::Tools::Run::RemoteBlast ('-prog'      => $prog,
>                              '-data'      => $db,
>                              '-expect'    => $expect,
>                              'readmethod' => 'SearchIO',        #use  
> SearchIO to parse
>                              );
> 
> # submit_blast can only currenly handle fasta format files so I'll
> # preprocess outside of the module but I'd rather be sure here
> 
> my $input;
> if( $sequenceformat !~ /fasta/ ) {
>     my @seqs;
>     my $seqio = new Bio::SeqIO('-format' => $sequenceformat,
>                    '-file'   => $sequencefile );
>     while( my $seq = $seqio->next_seq() ) {
>     push @seqs, $seq;
>     }
>     $input = \@seqs;
> } else {
>     $input = $sequencefile;
> }
> my $r = $factory->submit_blast($input);
> 
> my $v = 1;
> ## set to 0 to turn off Sanity check messages
> print STDERR  "retrieving blasts for $input ...\n" if( $v > 0);
> 
> while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>         my $rc = $factory->retrieve_blast($rid);
>         if( !ref($rc) ) {
>             if( $rc < 0 ) {
>                 $factory->remove_rid($rid);
>             }
>             print STDERR "." if ( $v > 0 );
>             sleep 5;
>             } else {
>                 my $result = $rc->next_result();
>                 next unless ($result);
>                  print "\nQuery Name: ", $result->query_name(), "\n";
>                 #save the output
>                 my $filename = $result->query_name()."\.out";
>                 print STDERR  "Saving result to $filename.\n" if( $v  > 0);
>                 $factory->save_output($filename);
>                 $factory->remove_rid($rid);
>                  print "\nQuery Name: ", $result->query_name(), "\n";
>                 while ( my $hit = $result->next_hit ) {
>                     next unless ( $v > 0);
>                             print "\thit name is ", $hit->name, "\n";
>                             while( my $hsp = $hit->next_hsp ) {
>                                   print "\t\tscore is ", $hsp- >score, 
> "\n";
>                               }
>                         }
>                   }
>             }
>           }
> 
> I'm guessing either NCBI has changed it blast format and bioperl 1.4  no 
> longer works, or I'm missing something that should be obvious.
> 
> Help much apprecieated.
> 
> Tom K
> 
> 
> Thomas J. Keller, Ph.D.
> Director, MMI Core Facility
> Oregon Health & Science University
> 3181 SW Sam Jackson Park Rd.
> Portland, OR, USA,   97239
> 
> http://www.ohsu.edu/research/core
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com



More information about the Bioperl-l mailing list