[Bioperl-l] Suggested patches take 2

Murad Nayal murad@godel.bioc.columbia.edu
Wed, 14 Mar 2001 16:05:56 +0100


This is a multi-part message in MIME format.
--------------90AEBD19D1ECB7DED5073BC2
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit



Dear all,

A few days ago I submitted two patches for SeqIO/embl and SeqIO/swiss to
accommodate the sequence files trembl.dat and trembl_new.dat and the
variable slicing versions of swissprot and trembl. Unfortunately, I
didn't make time then to rerun the test suite after the modifications.
Sure enough, one of my changes resulted in a truncated accession code. I
fixed the regular expression in question, and the patches now pass all
tests on a fresh (today's) checkout of bioperl-live. The fixed patches are
attached. Very sorry for being careless earlier. The ability to read the
aforementioned files with bioperl is important to me, and I do hope
you'll find it appropriate to add the patches to bioperl.
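
As a rough illustration of what the EMBL patch changes, here is a minimal
standalone sketch of its two regular-expression pieces: the ID-line match
that falls back to a two-field form when no division is present, and the
accession capture that stops at the first semicolon. The helper names
parse_id_line and parse_accession are hypothetical and not part of bioperl,
and the sample lines are made up for the sketch rather than copied from a
real data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in bioperl): extract name, molecule and
# division from an ID line. The second pattern covers ID lines that
# carry only two ';'-terminated fields, so the division stays undefined.
sub parse_id_line {
    my ($line) = @_;
    my ($name, $mol, $div);
    if ($line =~ /^ID\s+(\S+)\s+\S+;\s+(\S+);\s+(\S+);/) {
        ($name, $mol, $div) = ($1, $2, $3);
    }
    elsif ($line =~ /^ID\s+(\S+)\s+\S+;\s+(\S+);/) {
        ($name, $mol) = ($1, $2);
    }
    # Same fallback as in the patched module when nothing matched.
    $name = "unknown id" unless $name;
    return ($name, $mol, $div);
}

# Hypothetical helper: capture the first accession up to, but not
# including, whitespace or ';', so no cleanup substitution is needed.
sub parse_accession {
    my ($line) = @_;
    return $line =~ /^AC\s+([^\s;]+);?/ ? $1 : undef;
}

# Made-up sample lines, assumed to resemble the two ID layouts.
my @samples = (
    "ID   X56734     standard; RNA; PLN; 1859 BP.",
    "ID   Q12345      PRELIMINARY;      PRT;   100 AA.",
);

for my $line (@samples) {
    my ($name, $mol, $div) = parse_id_line($line);
    printf "name=%s mol=%s div=%s\n",
        $name,
        defined $mol ? $mol : "(none)",
        defined $div ? $div : "(none)";
}

print "acc=", parse_accession("AC   X56734; S46826;"), "\n";
```

The first sample matches the three-field pattern and the second only the
two-field one, which is the case the elsif branch in the patch is meant to
catch.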

All the best,
Murad
--------------90AEBD19D1ECB7DED5073BC2
Content-Type: text/plain; charset=us-ascii;
 name="embl.pm.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="embl.pm.patch"

*** embl.pm.orig	Wed Mar 14 15:16:12 2001
--- embl.pm	Wed Mar 14 15:47:18 2001
***************
*** 151,160 ****
         return undef; # end of file
     }
     $line =~ /^ID\s+\S+/ || $self->throw("EMBL stream with no ID. Not embl in my book");
!    $line =~ /^ID\s+(\S+)\s+\S+\;\s+(\S+)\;\s+(\S+)\;/;
!    $name = $1;
!    $mol = $2;
!    $div = $3;
     if(! $name) {
         $name = "unknown id";
     }
--- 151,166 ----
         return undef; # end of file
     }
     $line =~ /^ID\s+\S+/ || $self->throw("EMBL stream with no ID. Not embl in my book");
! 
!    if   ($line =~ /^ID\s+(\S+)\s+\S+\;\s+(\S+)\;\s+(\S+)\;/) {
!      $name = $1;
!      $mol  = $2;
!      $div  = $3;
!    } elsif($line =~ /^ID\s+(\S+)\s+\S+\;\s+(\S+)\;/        ) {
!      $name = $1;
!      $mol  = $2;
!    }
! 
     if(! $name) {
         $name = "unknown id";
     }
***************
*** 176,181 ****
--- 182,193 ----
     until( !defined $buffer ) {
         $_ = $buffer;
  
+        # Exit if you found FT or SQ before encountering FH
+        if(/^FT   \w/ or /^SQ /) {
+          $self->_pushback($buffer);
+          last;
+        }
+ 
         # Exit at start of Feature table
         last if /^FH/;
  
***************
*** 185,201 ****
         }
  
         #accession number
!        if( /^AC\s+(\S+);?/ ) {
! 	   $acc = $1;
! 	   $acc =~ s/\;//;
! 	   $seq->accession_number($acc);
         }
         
         #version number
!        if( /^SV\s+(\S+);?/ ) {
! 	   my $sv = $1;
! 	   $sv =~ s/\;//;
! 	   $seq->seq_version($sv);
         }
  
         #date (NOTE: takes last date line)
--- 197,209 ----
         }
  
         #accession number
!        if( /^AC\s+([^\s;]+);?/ ) {
! 	   $seq->accession_number($1);
         }
         
         #version number
!        if( /^SV\s+([^\s;]+);?/ ) {
! 	   $seq->seq_version($1);
         }
  
         #date (NOTE: takes last date line)

--------------90AEBD19D1ECB7DED5073BC2
Content-Type: text/plain; charset=us-ascii;
 name="swiss.pm.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="swiss.pm.patch"

*** swiss.pm.org	Mon Mar 12 02:22:37 2001
--- swiss.pm	Mon Mar 12 02:21:24 2001
***************
*** 150,161 ****
         return undef; # end of file
     }
  
!    $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/ 
!      || $self->throw("swissprot stream with no ID. Not swissprot in my book");
!    $name = $1."_".$2;
!    $seq->primary_id($1);
!    $seq->division($2);
!    $seq->molecule($4);
      # this is important to have the id for display in e.g. FTHelper, otherwise
      # you won't know which entry caused an error
     $seq->display_id($name);
--- 150,168 ----
         return undef; # end of file
     }
  
!    if     ($line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/) {
!      $name = $1."_".$2;
!      $seq->primary_id($1);
!      $seq->division($2);
!      $seq->molecule($4);
!    } elsif($line =~ /^ID\s+(\S+)\s+([^\s;]+);\s+([^\s;]+);/              ) {
!      $name = $1;
!      $seq->primary_id($1);
!      $seq->molecule($3);
!    } else                                                                  {
!      $self->throw("swissprot stream with no ID. Not swissprot in my book");
!    }
! 
      # this is important to have the id for display in e.g. FTHelper, otherwise
      # you won't know which entry caused an error
     $seq->display_id($name);

--------------90AEBD19D1ECB7DED5073BC2--