[Bioperl-l] can't parse GenBank correctly (SeqIO or included modules)

t-nakazato at muj.biglobe.ne.jp t-nakazato at muj.biglobe.ne.jp
Fri Sep 30 07:53:18 EDT 2005


Hi,

I'd like to report a case where BioPerl fails to parse a GenBank
file correctly.
BioPerl (SeqIO or one of the modules it uses) confuses the REMARK
and PUBMED lines of a REFERENCE block in a GenBank file.
I'm running version 1.5 on Red Hat 9.


I wrote the following script to retrieve PMIDs from a GenBank file.

-----
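#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;

# Path to the GenBank file, taken from the command line
my $file_in = shift;

my $in_obj  = Bio::SeqIO->new( -file   => $file_in,
                               -format => "GenBank" );

# Print the PMID of every reference attached to each record
while ( my $each_obj = $in_obj->next_seq ) {
    my @ref_objarray = $each_obj->annotation->get_Annotations("reference");

    foreach my $ref_obj (@ref_objarray) {
        my $pmid = $ref_obj->pubmed();
        print "$pmid\n" if defined $pmid;
    }
}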
-----

Most GenBank files are parsed correctly, but I can't get the PMID
from AC091629 or AC002397 (GenBank accession numbers); nothing is
printed.


The relevant part of the original GenBank file is as follows
(in the case of AC091629).

-----
...
REFERENCE   1  (bases 1 to 161334)
  AUTHORS   Poorkaj,P., Kas,A., D'Souza,I., Zhou,Y., Pham,Q., Stone,M.,
            Olson,M.V. and Schellenberg,G.D.
  TITLE     A genomic sequence analysis of the mouse and human
            microtubule-associated protein tau
  JOURNAL   Mamm. Genome 12 (9), 700-712 (2001)
  REMARK    Contact: Gerald D. Schellenberg (zachdad at u.washington.edu)
   PUBMED   11641718
...
-----
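
Note that in this excerpt the PUBMED line starts with three spaces, while AUTHORS, TITLE, JOURNAL and REMARK start with two. Whether that is what trips up the parser is only a guess. As a cross-check that does not go through BioPerl at all, the PMID can be pulled straight from the flat-file text. The following is a minimal sketch; the helper name pmids_from_flatfile is made up for illustration, and the excerpt is shortened.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect PMIDs from raw
# GenBank flat-file text by matching PUBMED lines directly.
sub pmids_from_flatfile {
    my ($text) = @_;
    my @pmids;
    for my $line ( split /\n/, $text ) {
        # Accept any amount of leading whitespace before the keyword
        push @pmids, $1 if $line =~ /^\s+PUBMED\s+(\d+)\s*$/;
    }
    return @pmids;
}

# Shortened version of the AC091629 reference block
my $excerpt = <<'END';
REFERENCE   1  (bases 1 to 161334)
  JOURNAL   Mamm. Genome 12 (9), 700-712 (2001)
  REMARK    Contact: Gerald D. Schellenberg
   PUBMED   11641718
END

print join( "\n", pmids_from_flatfile($excerpt) ), "\n";
```

This only looks at PUBMED lines and ignores everything else in the record, so it is a sanity check on the input file, not a replacement for the parser.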


So I tried "print $ref_obj->comment();" instead, which prints:

-----
Contact: Gerald D. Schellenberg (zachdad at u.washington.edu) PUBMED   11641718
-----

Using Data::Dumper on $each_obj (in my script, the sequence object
returned by next_seq), I confirmed that BioPerl confuses "comment"
and "pubmed": the PUBMED line ends up in the comment.
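
Until the parser is fixed, one possible workaround is to recover the PMID from the merged comment string. This is a minimal sketch, assuming the comment always ends with "PUBMED" followed by the digits, as in the output above; the helper name split_comment_pubmed is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a comment that has
# swallowed the PUBMED line into (remark, pmid). Returns an undefined
# pmid when no trailing "PUBMED <digits>" is found.
sub split_comment_pubmed {
    my ($comment) = @_;
    return ( $comment, undef ) unless defined $comment;
    if ( $comment =~ /^(.*?)\s*\bPUBMED\s+(\d+)\s*$/s ) {
        return ( $1, $2 );
    }
    return ( $comment, undef );
}

# The merged comment string printed for AC091629
my ( $remark, $pmid ) = split_comment_pubmed(
    "Contact: Gerald D. Schellenberg (zachdad at u.washington.edu) PUBMED   11641718"
);
print "remark: $remark\n";
print "pmid:   $pmid\n";
```

In the script above, this could be called on $ref_obj->comment() whenever $ref_obj->pubmed() returns nothing.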


Version 1.4 confused the JOURNAL and PUBMED lines.
That problem was fixed in 1.5, but the same kind of confusion seems
to remain for the REMARK line.


Best regards,
Takeru



