[Bioperl-l] important change for Bio::SeqIO::GenBank

Heikki Lehvaslaiho heikki@ebi.ac.uk
Wed, 31 Oct 2001 16:49:53 +0000


"Wang, Kai" wrote:
> 
> This is the first time I have visited this mailing list. I do not know whether
> somebody has already pointed this out, but it is a serious problem.

Which version of BioPerl are you using?

I fixed the problem with circular molecules two weeks ago in bioperl-live ("from
CVS"). At the same time I noticed the announcement of the LOCUS line change,
but did not do anything about it. See:
http://bioperl.org/pipermail/bioperl-l/2001-October/006397.html
for the thread.

Could you check out the latest code and check your changes against it?
(See http://cvs.bioperl.org/ for instructions on getting a full copy, or
browse the changes in WebCVS: 'View CVS' on the BioPerl home page.)

Thanks,

	-Heikki

> When I browsed the source code of Bio::SeqIO in the BioPerl project, I found a
> bug: the code cannot process the molecular shape, "circular" or "linear",
> which is annotated in a few GenBank files.
> 
> An even more serious problem is that a new GenBank format will be published soon.
> The "LOCUS" line will be changed a little. According to the release notes of
> GenBank, "the introduction of the new format will occur with Release 127.0
> in December 2001". Obviously the current Bio::SeqIO code cannot process the
> new format.
> 
> For these two reasons, I rewrote a small part of the
> Bio::SeqIO::GenBank code, and hopefully it will support the new file format
> as well as the old one. A new hash key "molshape" is introduced to represent
> "circular" or "linear". If you think my code is acceptable, please include
> it in the next release of BioPerl. GenBank records in the new format will be
> released in December 2001!
> 
> Below is my code:
> 
> # CODE FROM Bio::SeqIO::GenBank
> sub next_seq{
>         my ($self,@args) = @_;
>         my ($pseq,$c,$name,@desc,$seqc,$mol,$div,$date);
>         my $line;
>         my @acc = ();
>         my $seq = Bio::Seq::RichSeq->new('-verbose' =>$self->verbose());
> 
>         while(defined($line = $self->_readline())) {
>                 $line =~ /^LOCUS\s+\S+/ && last;
>         }
>         return undef if( !defined $line ); # end of file
> # BEGIN MY CODE
>         $line =~
>             /^LOCUS\s+(\S+)\s+\S+\s+(bp|aa)\s+(\S+)?\s+(linear|circular)?\s+(\w{3})\s+(\S+)/ || do {
>                 $line =~ /^LOCUS\s+(\S+)/ ||
>                         $self->throw("GenBank stream with no LOCUS. " .
>                                      "Not GenBank in my book. Got $line");
>                 # keep the ID here: captures set in this block are gone once it exits
>                 $name = $1;
>                 $self->warn("GenBank record with LOCUS line in unexpected format. " .
>                             "Attributes from this line other than ID will be missing.");
>         };
>         $name = $1 unless defined $name;
>         $seq->display_id($name);
>         if ($2 eq 'bp') {
>                 $seq->moltype ('dna');
>                 $seq->molecule ($3);  # examples of $3: 'ss-tRNA', 'ds-DNA', 'DN', 'sno-RNA'
>                 $seq->molshape ($4);  # either 'circular' or 'linear'; not handled by your original program
> 
>         } elsif ($2 eq 'aa') {
>                 $seq->moltype ('protein');
>                 $seq->molecule ('PRT');  # for proteins, $3 and $4 are not defined
>         }
>         $seq->division ($5);
>         defined ($6) and $seq->add_date ($6);
> # END MY CODE
> # BEGIN YOUR CODE
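
The LOCUS handling described in the quoted patch can be sketched as a standalone function. This is only a minimal sketch under stated assumptions, not the code in bioperl-live: parse_locus_line is a hypothetical helper, the sample LOCUS lines are made up for illustration, and the pattern assumes whitespace-separated fields rather than fixed columns. It captures into lexical variables in list context, so nothing depends on $1..$6 surviving a failed match.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; it is not part of BioPerl.
# Returns a hash reference of LOCUS fields, or nothing if the line does not match.
sub parse_locus_line {
    my ($line) = @_;

    # List-context match: each capture goes straight into a lexical, and an
    # optional group that did not take part in the match comes back as undef.
    my ($name, $unit, $molecule, $shape, $division, $date) = $line =~ m{
        ^LOCUS \s+ (\S+)                 # display ID
        \s+ \d+ \s+ (bp|aa) \s+          # length and unit
        (?: (\S+) \s+ )?                 # molecule type such as mRNA, absent for proteins
        (?: (linear|circular) \s+ )?     # molecule shape, when annotated
        ([A-Z]{3}) \s+                   # three-letter division code
        (\S+)                            # date
    }x or return;

    return {
        name     => $name,
        moltype  => $unit eq 'bp' ? 'dna' : 'protein',
        molecule => $unit eq 'bp' ? $molecule : 'PRT',
        molshape => $shape,              # undef when no shape is annotated
        division => $division,
        date     => $date,
    };
}

# Made-up sample lines: one with a shape annotation, one without.
for my $line (
    'LOCUS       EXAMPLE1     1510 bp    mRNA    linear   VRT 12-APR-2001',
    'LOCUS       EXAMPLE2      368 bp    mRNA             PRI 06-FEB-1999',
) {
    my $f = parse_locus_line($line) or next;
    printf "%s: molecule=%s molshape=%s division=%s\n",
        $f->{name}, $f->{molecule}, $f->{molshape} // 'unknown', $f->{division};
}
```

Making the molecule and shape groups optional together with their trailing whitespace lets one pattern accept lines with and without the shape field, which is the goal the patch states for supporting both the old and the new format.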

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________