[Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records

Chris Fields cjfields at illinois.edu
Tue Jan 12 16:02:02 UTC 2010


On Jan 11, 2010, at 9:55 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>> 
>> These entries form the CON data class, see:
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>> and they don't contain any sequence information.
> 
> I know - GenBank files have a similar system with CONTIG
> lines instead of sequences. I was expecting BioPerl to be
> able to convert these EMBL files with CO lines into GenBank
> files with CONTIG lines.

IIRC, the contig information from GenBank records is stored in the sequence's annotation. We can try to ensure the equivalent data is carried over for EMBL properly.
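
In the meantime, a rough way to spot CON-style entries before attempting a conversion is to check each record for CO lines and a missing SQ line. The sketch below is plain Python, not BioPerl, and is only a heuristic. The function names split_embl_records and is_con_record are made up for illustration, and it assumes the usual EMBL flat-file layout: two-letter line codes, with '//' terminating each record.

```python
def split_embl_records(text):
    """Yield each EMBL record as a list of lines, split on the '//' terminator."""
    record = []
    for line in text.splitlines():
        if line.startswith("//"):
            if record:
                yield record
            record = []
        else:
            record.append(line)
    # Keep a trailing record that lacks a terminator, unless it is only blank lines.
    if any(l.strip() for l in record):
        yield record


def is_con_record(lines):
    """Heuristic (an assumption, not a format guarantee):
    the record has CO lines but no SQ line, i.e. no sequence data."""
    has_co = any(l.startswith("CO ") for l in lines)
    has_sq = any(l.startswith("SQ ") for l in lines)
    return has_co and not has_sq
```

To use it, read the flat file into a string and count the records for which is_con_record returns true. Entries from the 'expanded' files should come back false, since they carry an SQ block.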

>> If you take the 'expanded' entries from
>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>> your script will work.
> 
> That's a useful tip - thanks.
> 
> Peter

NCBI's E-utilities option 'gbwithparts' is similar: it always retrieves the sequence.
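
For reference, a minimal sketch of building such a request in Python (standard library only). The helper name gbwithparts_url is hypothetical, and the choice of db=nuccore with retmode=text is an assumption about what you want back; the accession is a placeholder to replace with a real identifier.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def gbwithparts_url(accession, db="nuccore"):
    """Build an EFetch URL asking for a GenBank record with sequence included."""
    params = {
        "db": db,
        "id": accession,
        "rettype": "gbwithparts",  # GenBank flat file with the sequence, not just CONTIG
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# Placeholder accession, for illustration only.
url = gbwithparts_url("ACCESSION_HERE")
```

The resulting URL can be fetched with any HTTP client. This only constructs the request and does not contact NCBI.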

chris


