[Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
Chris Fields
cjfields at illinois.edu
Tue Jan 12 16:02:02 UTC 2010
On Jan 11, 2010, at 9:55 AM, Peter wrote:
> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>>
>> These entries form the CON data class, see:
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>> and they don't contain any sequence information.
>
> I know - GenBank files have a similar system with CONTIG
> lines instead of sequences. I was expecting BioPerl to be
> able to convert these EMBL files with CO lines into GenBank
> files with CONTIG lines.
IIRC, the contig information for GenBank is stored in the annotation. We can try to make sure that data is carried over to EMBL properly.
>> If you take the 'expanded' entries from
>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>> your script will work.
>
> That's a useful tip - thanks.
>
> Peter
NCBI's efetch return type 'gbwithparts' is similar (it always retrieves the full sequence rather than just the CONTIG line).
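
For anyone who wants to try that, here is a rough sketch of how such an efetch request could be put together. It only builds the URL and does not contact NCBI. The helper name gbwithparts_url and the accession used in the usage line are made up for illustration; the 'nuccore' database and retmode=text are my assumptions about what you would want for a nucleotide record.

```python
from urllib.parse import urlencode

# Base URL of NCBI's efetch E-utility.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def gbwithparts_url(accession, db="nuccore"):
    """Build an efetch URL asking for a GenBank record with its sequence.

    gbwithparts_url is a hypothetical helper name; 'accession' is whatever
    identifier the caller supplies. Nothing is fetched here.
    """
    params = {
        "db": db,                    # assumed: nucleotide database
        "id": accession,
        "rettype": "gbwithparts",    # GenBank flat file including sequence
        "retmode": "text",
    }
    return EFETCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # EXAMPLE_ACC is a placeholder, not a real accession.
    print(gbwithparts_url("EXAMPLE_ACC"))
```

Fetching the resulting URL (with urllib.request, curl, or similar) and feeding the text to a GenBank parser should then behave like the 'expanded' EMBL entries mentioned above, though I have not checked that against a CON record.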
chris