[Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records

Mon Jan 11 10:04:00 EST 2010

Hi Peter,
I found the issue: the data contain no SQ lines, and the parser relies
on an SQ line as a key stop condition (line 438 of embl.pm).
We evidently need to be more liberal in what we accept, even as we
are strict in what we emit. Could you file a bug report?
Thanks for the heads-up,
MAJ
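
One way to be more liberal can be sketched as follows: treat the '//'
terminator line as the end of a record whether or not an SQ line was seen.
This is only an illustration under stated assumptions, not the code in
embl.pm. The helper name split_embl_records and the two sample entries are
made up for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split EMBL flat-file text into
# records using only the '//' terminator line, so entries without an SQ
# block (e.g. contig entries that carry CO lines) are still separated.
sub split_embl_records {
    my ($fh) = @_;
    my @records;
    my $current = '';
    while (my $line = <$fh>) {
        if ($line =~ m{^//\s*$}) {
            # End of entry: keep it only if it contains an ID line.
            push @records, $current if $current =~ /^ID\s/m;
            $current = '';
            next;
        }
        $current .= $line;
    }
    # Tolerate a final entry that lacks the trailing '//'.
    push @records, $current if $current =~ /^ID\s/m;
    return @records;
}

# Two made-up minimal entries, neither of which has an SQ line.
my $data = <<'EMBL';
ID   EXAMPLE1; SV 1; linear; genomic DNA; CON; HUM; 100 BP.
CO   join(AB000001.1:1..100)
//
ID   EXAMPLE2; SV 1; linear; genomic DNA; CON; HUM; 200 BP.
CO   join(AB000002.1:1..200)
//
EMBL

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @records = split_embl_records($fh);
close $fh;
printf "%d records\n", scalar @records;
```

A real fix would live inside the parser's own read loop; the point here is
only that the record boundary does not have to depend on the SQ line.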
----- Original Message ----- 
From: "Peter" <biopython at maubp.freeserve.co.uk>
To: "bioperl-l list" <bioperl-l at lists.open-bio.org>
Sent: Monday, January 11, 2010 9:16 AM
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records


> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multi-record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
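> #!/usr/bin/env perl
> use strict;
> use warnings;
> use Bio::SeqIO;
> # Read EMBL records from stdin and write them to stdout as GenBank.
> my $in  = Bio::SeqIO->new(-fh => \*STDIN,  -format => 'embl');
> my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) }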
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz
> 
> With either of these examples as input, BioPerl gives just a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter