[Bioperl-l] bug in genbank.pm

Andreas Matern andreas.matern@lbri.lionbioscience.com
Wed, 20 Feb 2002 17:18:23 -0500


Has this been fixed? Just wondering...

"Wang, Kai" wrote:
> 
> I pointed out this problem about two months ago, but nobody has fixed it. The
> new GenBank file format adds a "molecular shape" field (linear or circular) to
> the LOCUS line, so the current genbank.pm cannot parse it.
> 
> In the file:
> 
> # $Id: genbank.pm,v 1.46 2002/02/14 16:41:22 jason Exp $
>     if (($2 eq 'bp') || defined($5)) {
>         if ($4 eq 'circular') {
>             $seq->molecule($3);
>             $seq->is_circular($4);
>             $seq->division($5);
>             ($date) = $line =~ /.*(\d\d-\w\w\w-\d\d\d\d)/;
>         } else {
>             $seq->molecule($3);
>             $seq->division($4);
>             $date = $5;
>         }
>     } else {
>         $seq->molecule('PRT') if($2 eq 'aa');
>         $seq->division($3);
>         $date = $4;
>     }
> 
> The code above wrongly assumes that NCBI will never add a 'linear' tag to a
> record.
> One example is accession number 'NM_003748'. The first line is:
> 
> LOCUS       NM_003748               3134 bp    mRNA    linear   PRI 01-NOV-2000
> 
> The current genbank.pm cannot pick up the date (01-NOV-2000) from this line.
> 
> I think the best way is to use:
>
>     $line =~ /^LOCUS\s+(\S+)\s+\S+\s+(bp|aa)\s+(\S+)?\s+(\S+)?\s+(\w\w\w)?\s+(\d\d-\w\w\w-\d\d\d\d)?/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
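
A minimal sketch of how the regex proposed above could be used, assuming it is applied as-is. Note that it puts the division code in $4 for lines without a topology token but in $5 for lines that carry 'linear' or 'circular', so the sketch checks $4 against those two words before assigning fields. The helper parse_locus and the test line TEST0001 are hypothetical, made up for illustration; they are not part of genbank.pm. Only bp lines like the ones shown here have been considered; protein (aa) LOCUS lines may need separate handling.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of genbank.pm: match a LOCUS line with the
# regex proposed above and sort the optional captures into named fields.
sub parse_locus {
    my ($line) = @_;
    my ($name, $unit, $mol, $f4, $f5, $date) = $line =~
        m{^LOCUS\s+(\S+)\s+\S+\s+(bp|aa)\s+(\S+)?\s+(\S+)?\s+(\w\w\w)?\s+(\d\d-\w\w\w-\d\d\d\d)?}
        or return;    # not a LOCUS line

    my %locus = (name => $name, unit => $unit, molecule => $mol, date => $date);
    if (defined $f4 && $f4 =~ /^(?:linear|circular)$/) {
        # Topology token present: it comes first, then the division code.
        @locus{qw(topology division)} = ($f4, $f5);
    }
    else {
        # No topology token: the division code lands in $4 and $5 stays undef.
        $locus{division} = $f4;
    }
    # In genbank.pm these values would feed $seq->molecule, $seq->is_circular
    # and $seq->division, as in the snippet quoted above.
    return \%locus;
}

my $new = 'LOCUS       NM_003748               3134 bp    mRNA    linear   PRI 01-NOV-2000';
my $old = 'LOCUS       TEST0001     4371 bp    mRNA            PRI       05-FEB-1999';

for my $line ($new, $old) {
    my $l = parse_locus($line) or next;
    printf "%s: %s %s division=%s date=%s\n",
        $l->{name}, $l->{molecule}, $l->{topology} // 'n/a',
        $l->{division}, $l->{date};
}
# NM_003748: mRNA linear division=PRI date=01-NOV-2000
# TEST0001: mRNA n/a division=PRI date=05-FEB-1999
```

Checking the token against the literal words 'linear' and 'circular' keeps the division code from being mistaken for a topology. An alternative would be to make the topology group explicit in the regex itself, e.g. (linear|circular)?, so that each capture always means the same thing.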

-- 


------------------
Andreas Matern
Bioinformatician
LION Bioscience Research, Inc.
141 Portland Street, 10th Floor
Cambridge, MA 02139

andreas.matern@lbri.lionbioscience.com
phone: (617) 245-5483
fax:   (617) 245-5499