[Bioperl-l] GenBank files CONTIG line messes up sequences
jockoblom at gmail.com
jockoblom at gmail.com
Wed May 14 09:41:23 EDT 2014
Hi!
I'm using BioPerl to work with GenBAnk files, mostly public files from the
NCBI. For GenBank files with a CONTIG line like thw following
"CONTIG join(BA000030.3:1..9025608)"
the Genbank parser does not work properly.
As a simple example:
*#!/usr/bin/env perluse strict;use warnings;use Bio::SeqIO;my $file =
shift;# create reader for the inputmy $seq_in = Bio::SeqIO->new( -file =>
$file, -format =>
'genbank' );my @entries;while (my $contig =
$seq_in->next_seq()){ ## do stuff push @entries, $contig;}my $outfile
= Bio::SeqIO->new( -file => ">out.gbk",
-format => 'genbank' );foreach my $contig
(@entries){ $outfile->write_seq($contig);}*
This script creates a Genbank file, but the genome sequence is missing.
This is independent from what I do in the "do stuff" part, even this empty
version does not export a sequence.
The problem seems to be that the sequence is not accesible, e.g., the
following lines will create empty sequences in $feature_seq.
*foreach my $feat ($contig->get_SeqFeatures()){ if($feat->primary_tag()
eq 'CDS'){ $feature_seq = $feat->seq();...*Even the most simple (and
most useless) script example does fail to produce a genbank file with
sequence:
*my $input = $ARGV[0];my $output = $ARGV[1];my $in = Bio::SeqIO->new(-file
=> "$input" , -format => 'genbank');my $out =
Bio::SeqIO->new(-file => ">$output" , -format =>
'genbank'); while ( my $seq = $in->next_seq() ) {
$out->write_seq($seq);}*
Same script with Genbank to EMBL works just fine. EMBL to Genbank fails as
well if the EMBL file has the line corresponding to the CONTIG line:
"CO join(BA000030.3:1..9025608)"
I found this post
https://groups.google.com/forum/#!msg/bioperl-l/ur9ZQIXyoj0/xT8izNWb-8kJ
telling that the problem is fixed in BioPerl 1.6.922, but I still encounter
the problems with sequence retieval.
Using Perl 5.18.2
Bio::Root::Version::VERSION: 1.006923
Anyone with the same problems or any helpful ideas?
Best regards,
Jochen
More information about the Bioperl-l
mailing list