[Bioperl-l] Error parsing Genbank file

Jason Stajich jason.stajich at duke.edu
Thu Jan 6 17:14:04 EST 2005


Fixed in CVS.  You can grab the changes from http://cvs.open-bio.org/


Index: Bio/SeqIO/genbank.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/genbank.pm,v
retrieving revision 1.116
diff -r1.116 genbank.pm
71a72
>  wgs             - Should contain a Bio::Annotation::SimpleValue object
465,466c466
<             last if(($buffer =~ /^BASE/o) || ($buffer =~ /^ORIGIN/o) ||
<                     ($buffer =~ /^CONTIG/o) );
---
>             last if( $buffer =~ /^BASE|ORIGIN|CONTIG|WGS/o);
517a518,522
>         } elsif( s/^WGS\s+// ) {
>             chomp;
>             $annotation->add_Annotation(
>                 'wgs',
>                 Bio::Annotation::SimpleValue->new(-value => $_));
522c527,528
<         }
---
>
>                                      } else { warn($_); }
775a782,788
>       # deal with WGS
>       foreach my $wgs ( $seq->annotation->get_Annotations('wgs') ) {
>           $self->_print(sprintf ("%-11s %s\n",'WGS',
>                                  $wgs->value));
>           $self->_show_dna(0);
>       }
>
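The new `last if` line in the patch folds the three separate tests into one pattern. A quick standalone check (plain Perl, no BioPerl needed; the helper names and sample lines are hypothetical) shows how alternation precedence affects it: in `/^BASE|ORIGIN|CONTIG|WGS/` the `^` anchors only `BASE`, so the other keywords match anywhere on a line, whereas `/^(?:BASE|ORIGIN|CONTIG|WGS)/` requires each keyword at the start of the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern as written in the patch. Alternation has the lowest precedence,
# so this means: (^BASE) or (ORIGIN) or (CONTIG) or (WGS) anywhere.
sub stops_as_patched {
    my ($line) = @_;
    return $line =~ /^BASE|ORIGIN|CONTIG|WGS/ ? 1 : 0;
}

# Grouped alternative: every keyword must start the line.
sub stops_anchored {
    my ($line) = @_;
    return $line =~ /^(?:BASE|ORIGIN|CONTIG|WGS)/ ? 1 : 0;
}

# Hypothetical sample lines: a real section keyword, and a feature
# qualifier that merely mentions WGS.
my @lines = (
    "WGS         CAAB01000001-CAAB01012381\n",
    qq{                     /note="WGS contig"\n},
);

for my $line (@lines) {
    chomp(my $shown = $line);
    printf "%-45s patched=%d anchored=%d\n",
        $shown, stops_as_patched($line), stops_anchored($line);
}
```

Run as-is, both patterns stop on the `WGS` section line, but only the patched pattern also stops on the qualifier line. The `/o` modifier is left out here since the pattern interpolates no variables.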
On Jan 6, 2005, at 4:21 PM, Ryan Golhar wrote:

> What is the fix for CONTIG entries....
>
> BTW- I'm new to bioperl...
>
> Ryan
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Wednesday, January 05, 2005 4:37 PM
> To: golharam at umdnj.edu
> Cc: 'Bioperl List'
> Subject: Re: [Bioperl-l] Error parsing Genbank file
>
>
> We can't parse WGS files.  The fix it needs is very similar to how we
> handle CONTIG entries if you want to have a go at fixing it.
>
> On Jan 5, 2005, at 3:41 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I have a Genbank file that Bio::SeqIO::genbank is choking on.  The
>> entry is just a WGS entry referencing a bunch of other entries.  It
>> dies on line 492 with the error "Unexpected error in feature table for
>> Skipping feature, attempting to recover".
>>
>> I'm using the following code:
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use Bio::SeqIO;
>>
>> my $usage = "$0 <genbank file> <fasta file>\n";
>> my $file = shift or die $usage;
>> my $outfilename = shift or die $usage;
>>
>> my $infile = Bio::SeqIO->new('-file'   => "<$file",
>>                              '-format' => "genbank");
>>
>> my $outfile = Bio::SeqIO->new('-file'   => ">$outfilename",
>>                               '-format' => "fasta");
>>
>> while (my $seq = $infile->next_seq) {
>>     # print STDERR $seq->accession_number, "\n";
>>     $outfile->write_seq($seq);
>> }
>>
>> Here is the contents of the genbank entry:
>>
>> LOCUS       CAAB01000000           12381 rc    DNA     linear   VRT 22-AUG-2002
>> DEFINITION  Takifugu rubripes whole genome shotgun sequencing project.
>> ACCESSION   CAAB00000000
>> VERSION     CAAB00000000.1  GI:22418063
>> KEYWORDS    WGS.
>> SOURCE      Takifugu rubripes (Fugu rubripes)
>>   ORGANISM  Takifugu rubripes
>>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>             Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
>>             Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
>>             Tetradontoidea; Tetraodontidae; Takifugu.
>> REFERENCE   1  (bases 1 to 12381)
>>   AUTHORS   The Fugu Genome Sequencing Consortium.
>>   TITLE     Direct Submission
>>   JOURNAL   Submitted (01-JUL-2002) The Fugu Genome Sequencing Consortium,
>>             http://www.fugubase.org/ http://www.jgi.doe.gov/fugu
>> COMMENT     The Takifugu rubripes whole genome shotgun (WGS) project has the
>>             project accession CAAB00000000.  This version of the project (01)
>>             has the accession number CAAB01000000, and consists of sequences
>>             CAAB01000001-CAAB01012381.
>> FEATURES             Location/Qualifiers
>>      source          1..12381
>>                      /organism="Takifugu rubripes"
>>                      /mol_type="genomic DNA"
>>                      /db_xref="taxon:31033"
>> WGS         CAAB01000001-CAAB01012381
>> //
>>
>>
>>
>> -----
>> Ryan Golhar
>> Computational Biologist
>> The Informatics Institute at
>> The University of Medicine & Dentistry of NJ
>>
>> Phone: 973-972-5034
>> Fax: 973-972-7412
>> Email: golharam at umdnj.edu
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
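The WGS line in the record Ryan posted names a range of component accessions rather than carrying sequence data, and the patch stores that range verbatim as a `wgs` annotation. If the individual accessions are needed, a small helper can expand the range. This is a plain-Perl sketch: `expand_wgs_range` is a hypothetical name, not a BioPerl method, and it assumes accessions made of four letters, a two-digit version and a zero-padded serial number (as in `CAAB01000001`), with both ends sharing the same prefix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): expand a WGS range such as
# "CAAB01000001-CAAB01012381" into the component accessions it covers.
# Assumption: each accession is 4 letters + 2-digit version + zero-padded
# serial number, and both ends of the range share the same prefix.
sub expand_wgs_range {
    my ($range) = @_;
    my ($first, $last) = split /-/, $range, 2;
    die "not a range: $range\n" unless defined $last;

    my ($prefix, $num_a) = $first =~ /^([A-Z]{4}\d{2})(\d+)$/
        or die "unrecognised accession: $first\n";
    my ($prefix_b, $num_b) = $last =~ /^([A-Z]{4}\d{2})(\d+)$/
        or die "unrecognised accession: $last\n";
    die "prefix mismatch in $range\n" unless $prefix eq $prefix_b;

    # Keep the original zero-padding width; add 0 so the range operator
    # counts numerically instead of using string auto-increment.
    my $width = length $num_a;
    return map { sprintf '%s%0*d', $prefix, $width, $_ }
        ($num_a + 0) .. ($num_b + 0);
}

# Range taken from the WGS line of Ryan's record.
my @ids = expand_wgs_range('CAAB01000001-CAAB01012381');
printf "%d component records: %s .. %s\n", scalar(@ids), $ids[0], $ids[-1];
```

For the Takifugu master record this yields 12381 accessions, which agrees with the `12381 rc` count on the LOCUS line and the range given in the COMMENT.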


