[Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records

Mon Jan 11 15:42:22 UTC 2010

On 1/11/10 3:16 PM, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

These entries belong to the CON (constructed) data class, see:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
CON entries don't contain any sequence data themselves.

If you take the 'expanded' entries from
ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
your script will work.
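
To check which kind of entries a given file holds before converting it, something along these lines may help. It is only a rough sketch in plain Perl (no BioPerl), and it rests on a few assumptions of mine rather than anything checked against the format spec: that each entry ends with a line containing only "//", that entries with sequence have an "SQ" line, and that unexpanded CON entries carry "CO" lines instead. The names summarise_entry and summarise_file are made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Summarise one EMBL flat-file entry passed in as a string.
# Returns a hashref with the entry ID and two flags, or undef
# if the text contains no ID line.
sub summarise_entry {
    my ($entry) = @_;
    my ($id) = $entry =~ /^ID\s+([^;\s]+)/m;
    return unless defined $id;
    return {
        id     => $id,
        # Assumption: an SQ line introduces the sequence data.
        has_sq => ( $entry =~ /^SQ\s/m ) ? 1 : 0,
        # Assumption: CO lines describe how a CON entry is assembled.
        has_co => ( $entry =~ /^CO\s/m ) ? 1 : 0,
    };
}

# Read a file entry by entry and return one summary per entry.
sub summarise_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/ = "\n//\n";    # assumed entry terminator: a line with only "//"
    my @rows;
    while ( my $entry = <$fh> ) {
        my $row = summarise_entry($entry) or next;
        push @rows, $row;
    }
    close $fh;
    return @rows;
}

# Report on every file named on the command line.
for my $path (@ARGV) {
    for my $row ( summarise_file($path) ) {
        my $kind = $row->{has_sq} ? 'has sequence (SQ line)'
                 : $row->{has_co} ? 'CO lines only, no sequence'
                 :                  'neither SQ nor CO lines';
        printf "%s\t%s\n", $row->{id}, $kind;
    }
}
```

Saved under a name of your choice and run with the EMBL file as its argument, it prints one line per entry. If every entry comes back as "CO lines only", the file is the unexpanded CON form.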

Hans

> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter