[Bioperl-l] Error parsing Genbank file

Ryan Golhar golharam at umdnj.edu
Wed Jan 5 15:41:33 EST 2005


Hi all,

I have a GenBank file that Bio::SeqIO::genbank is choking on.  The
entry is just a WGS entry referencing a bunch of other entries.  It fails
on line 492 with the error "Unexpected error in feature table for
Skipping feature, attempting to recover".

I'm using the following code:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;

my $usage = "$0 <genbank file> <fasta file>\n";
my $file = shift or die $usage;
my $outfilename = shift or die $usage;

# Input stream: read the GenBank file
my $infile = Bio::SeqIO->new('-file'   => "<$file",
                             '-format' => "genbank");

# Output stream: write FASTA
my $outfile = Bio::SeqIO->new('-file'   => ">$outfilename",
                              '-format' => "fasta");

# Convert each record in turn
while (my $seq = $infile->next_seq) {
#	print STDERR $seq->accession_number,"\n";

	$outfile->write_seq($seq);
}

Here are the contents of the GenBank entry:

LOCUS       CAAB01000000           12381 rc    DNA     linear   VRT 22-AUG-2002
DEFINITION  Takifugu rubripes whole genome shotgun sequencing project.
ACCESSION   CAAB00000000
VERSION     CAAB00000000.1  GI:22418063
KEYWORDS    WGS.
SOURCE      Takifugu rubripes (Fugu rubripes)
  ORGANISM  Takifugu rubripes
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
            Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
            Tetradontoidea; Tetraodontidae; Takifugu.
REFERENCE   1  (bases 1 to 12381)
  AUTHORS   The Fugu Genome Sequencing Consortium.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-JUL-2002) The Fugu Genome Sequencing Consortium,
            http://www.fugubase.org/ http://www.jgi.doe.gov/fugu
COMMENT     The Takifugu rubripes whole genome shotgun (WGS) project has the
            project accession CAAB00000000.  This version of the project (01)
            has the accession number CAAB01000000, and consists of sequences
            CAAB01000001-CAAB01012381.
FEATURES             Location/Qualifiers
     source          1..12381
                     /organism="Takifugu rubripes"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:31033"
WGS         CAAB01000001-CAAB01012381
//
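
The entry above has a WGS line and no ORIGIN (sequence) section. To check
whether a larger file contains records of this shape before handing it to a
parser, a minimal sketch in plain Perl follows. It makes no BioPerl calls,
the sub name scan_genbank_records is hypothetical, and it assumes records
are terminated by a line starting with "//". It only reports what each
record contains; it does not show why the parser complains.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split GenBank flat-file text
# on the "//" record terminator and report, for each record, whether it
# has a WGS line and whether it has an ORIGIN (sequence) section.
sub scan_genbank_records {
    my ($text) = @_;
    my @report;
    for my $record (split m{^//[ \t]*\r?\n?}m, $text) {
        # Copy the capture into a variable before running further matches.
        my ($locus) = $record =~ /^LOCUS\s+(\S+)/m;
        next unless defined $locus;
        my $has_wgs    = ($record =~ /^WGS\s/m)    ? 1 : 0;
        my $has_origin = ($record =~ /^ORIGIN\b/m) ? 1 : 0;
        push @report, {
            locus      => $locus,
            has_wgs    => $has_wgs,
            has_origin => $has_origin,
        };
    }
    return @report;
}

# Command-line use: perl scan_wgs.pl <genbank file>
if (@ARGV) {
    local $/;    # slurp the whole file
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = <$fh>;
    close $fh;
    for my $r (scan_genbank_records($text)) {
        printf "%s\tWGS=%d\tORIGIN=%d\n",
            $r->{locus}, $r->{has_wgs}, $r->{has_origin};
    }
}
```

Records flagged with WGS=1 and ORIGIN=0 are the ones shaped like the entry
above.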



-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam at umdnj.edu


