[Bioperl-l] get_SeqFeatures doesn't like genbank CON files

Chris Fields cjfields at uiuc.edu
Thu Mar 29 20:00:09 UTC 2007


Nick, you may want to check your bioperl version: the SeqIO::genbank  
line number in the error doesn't match the code in CVS (nor, I'm  
guessing, the last release).  If you aren't running a recent bioperl  
version I would suggest upgrading to 1.5.2; CONTIG parsing is  
something I added sometime last year, after 1.5.1.  The data must be  
preceded by the GenBank-compliant CONTIG tagname to be parsed  
correctly (using the EMBL-like 'CON' doesn't work).
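
A quick way to check which bioperl is actually being loaded is sketched below. This is only a minimal example: bioperl_version is a made-up helper name, and it assumes the install ships Bio::Root::Version (it falls back to "not found" otherwise). Printing the path from %INC helps when more than one copy sits in site_perl.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed BioPerl version string,
# or undef if Bio::Root::Version cannot be loaded from @INC.
sub bioperl_version {
    return eval { require Bio::Root::Version; Bio::Root::Version->VERSION };
}

my $version = bioperl_version();
if (defined $version) {
    # %INC records the file each loaded module came from
    print "BioPerl version: $version\n";
    print "Loaded from:     $INC{'Bio/Root/Version.pm'}\n";
}
else {
    print "BioPerl not found in \@INC\n";
}
```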

The CONTIG line data is not supposed to be treated like a location;  
it's normally just stuffed into Annotation::SimpleValue objects to be  
spit back out in write_seq() if needed.  As the error states there  
are no Bio::Location classes that handle gap data.  Since it's trying  
to process this as a location it indicates something is definitely  
wrong; the only place this would occur is while parsing features as  
that's where FTLocationFactory comes into play (via FTHelper).

If your seq records look like this (from CM000126) and you still have  
problems with the latest bioperl release you'll have to file a bug  
with an example file so we can look at it.

...
FEATURES             Location/Qualifiers
      source          1..47244934
                      /organism="Oryza sativa (indica cultivar-group)"
                      /mol_type="genomic DNA"
                      /cultivar="93-11"
                      /db_xref="taxon:39946"
                      /chromosome="1"
CONTIG      join(CH398081.1:1..22419,gap(unk100),CH398082.1:1..12525385,
             gap(unk100),CH398083.1:1..13518,gap(unk100),CH398084.1:1..2551194,
             gap(unk100),CH398085.1:1..3493222,gap(unk100),
             CH398086.1:1..5091462,gap(unk100),CH398087.1:1..26622,gap(unk100),
             CH398088.1:1..4860221,gap(unk100),CH398089.1:1..18660091)
//
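
To illustrate why a CONTIG line like the one above isn't a feature location, here is a rough sketch in plain Perl that splits it into its pieces. It is not BioPerl code and not how SeqIO::genbank handles the line; parse_contig and the hash keys are invented for the example, and it assumes only the element forms seen in this thread (ACC:start..end, complement(...), gap(N), gap(unkN)).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split the text of a CONTIG
# line into an ordered list of segment and gap records.
sub parse_contig {
    my ($text) = @_;
    (my $s = $text) =~ s/\s+//g;                 # undo line wrapping
    $s =~ s/^join\((.*)\)$/$1/                   # strip the outer join(...)
        or die "not a join() expression\n";

    my @parts;
    for my $item (split /,/, $s) {
        if ($item =~ /^gap\((unk)?(\d*)\)$/) {
            # gap(N), gap(unkN) or gap(); "unk" marks an unknown length
            push @parts, {
                type    => 'gap',
                length  => length($2) ? $2 + 0 : undef,
                unknown => defined $1 ? 1 : 0,
            };
        }
        elsif ($item =~ /^(complement\()?([\w.]+):(\d+)\.\.(\d+)\)?$/) {
            # a stretch of another record, optionally reverse-complemented
            push @parts, {
                type   => 'segment',
                acc    => $2,
                start  => $3 + 0,
                end    => $4 + 0,
                strand => $1 ? -1 : 1,
            };
        }
        else {
            die "unrecognized CONTIG element: $item\n";
        }
    }
    return @parts;
}

my @parts = parse_contig(
    'join(CH398081.1:1..22419,gap(unk100),CH398082.1:1..12525385)'
);
my $gaps = grep { $_->{type} eq 'gap' } @parts;
printf "elements=%d gaps=%d\n", scalar @parts, $gaps;
```

The gap elements carry a length but no coordinates on any sequence, which is why they don't map onto a location object.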

chris

On Mar 29, 2007, at 2:06 PM, Staffa, Nick (NIH/NIEHS) wrote:

> If I use the following code on the genbank flat files gbconN.seq
> (N=1..4), I bomb memory.  So I wrote a flat Perl script and made
> oodles of files, one for each genbank CON entry for D.pseudoobscura.
> These entries have complete feature tables, but do not have real
> sequence, just join statements referencing the WGS files
> AADExxxxxxxxxxx.  When I run this code on them, the BioPerl modules
> don't seem to like the join statements being where they are, and for
> some reason object to "gap".
> I AM glad that BioPerl allowed the program to process all files.
>
> The code:
>    $seqio_object = Bio::SeqIO->new(-file => "$filename" );
>    $seq_object = $seqio_object->next_seq;
>    $sequence_length = $seq_object->length();
>    my @features = $seq_object->get_SeqFeatures(); # just top level
>
>
> The log:
> -------------------- WARNING ---------------------
> MSG: exception while parsing location line
> [join(AADE01003924.1:1..5157,gap(128),
> complement(AADE01002963.1:1..8959),gap(50),
> complement(AADE01002322.1:801..13635),AADE01008784.1:1..995,
> complement(AADE01002422.1:1..12770),gap(105),
> complement(AADE01006425.1:1..1791),gap(940),
> complement(AADE01002137.1:1..15323),gap(962),
> AADE01003112.1:1..8150,gap(194),AADE01000989.1:1..38476,
> AADE01012537.1:1..1696,gap(243),AADE01012620.1:1..612,
> complement(AADE01002972.1:1..8912),gap(1646),
> complement(AADE01009428.1:602..2135),AADE01000086.1:1..143541,
> complement(AA
> ...
> ...
> ...
> 01003505.1:1..6496,gap(1445),AADE01004655.1:1..3580,gap(328),
> AADE01002622.1:1..11193,gap(90),
> complement(AADE01006718.1:1..1606),gap(423),
> complement(AADE01004351.1:1..4128))] in reading
> EMBL/GenBank/SwissProt, ignoring feature CONTIG (seqid=CH379058):
> ------------- EXCEPTION  -------------
> MSG: operator "gap" unrecognized by parser
> STACK Bio::Factory::FTLocationFactory::from_string
> /usr/lib/perl5/site_perl/5.8.5/Bio/Factory/FTLocationFactory.pm:179
> STACK Bio::Factory::FTLocationFactory::from_string
> /usr/lib/perl5/site_perl/5.8.5/Bio/Factory/FTLocationFactory.pm:175
> STACK (eval) /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/FTHelper.pm:127
> STACK Bio::SeqIO::FTHelper::_generic_seqfeature
> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/FTHelper.pm:126
> STACK Bio::SeqIO::genbank::next_seq
> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/genbank.pm:514
> STACK toplevel find_orthos.pl:24
>
> This even occurs with the addition of -format => "genbank"
>
>
>
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov)
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






