[Biopython] missing fields in SeqIO EMBL parser?

Wim De Smet Wim.DeSmet at UGent.be
Fri May 7 10:36:09 EDT 2010


On 07-05-10 15:23, Peter wrote:
> On Fri, May 7, 2010 at 2:04 PM, Wim De Smet <Wim.DeSmet at ugent.be> wrote:
>> Hi,
>>
>> I'm trying to parse an EMBL file using Bio.SeqIO, but some metadata
>> fields are missing from the parsed object. For one, I can't find any
>> reference to the DT (date) fields or any of the database cross references.
>> I'm using Biopython 1.53.
>>
>> Is this simply not implemented yet or are there options to include this data
>> in the SeqRecord object returned?
>
> The DT lines are currently ignored; please file an enhancement bug.
> This is complicated by the fact that GenBank files have only one date,
> and the EMBL parser shares a lot of code with the GenBank parser.

Okay, thanks for your help. I'll file a bug for it then.

> Could you be a bit more precise about missing database cross references?
> i.e. What line type are you looking for?

Sure, take this record:
http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+EntryPage+-id+7BIdF1bEbRt+-e+[EMBL:FJ904258]+-vn+2

I'm looking for the data from the database cross reference lines (DR), i.e.:
DR   RFAM; RF00177; SSU_rRNA_5.
DR   SILVA-SSU; FJ904258.

I assumed this would be in the record.dbxrefs field, but it's empty when
I parse this file. It's more of a nice-to-have than anything else at
this point, but I'll have to figure out another way to get hold of
these elements then.
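
As a stopgap, something along these lines might do: a minimal sketch that
scans the flat file directly for DR lines and collects them per record.
It does not use the Biopython parser at all. The function name
extract_dr_xrefs and the "DATABASE:ID" output format are just choices for
illustration, and it assumes every record ends with a "//" line and that
DR lines start with the usual two-letter code followed by three spaces.

```python
import io


def extract_dr_xrefs(handle):
    """Return one list of 'DATABASE:ID' strings per EMBL record.

    Hypothetical helper, not part of Biopython: it only looks at DR lines
    and at the '//' record terminator, and ignores everything else.
    """
    all_records = []
    current = []
    for line in handle:
        if line.startswith("DR   "):
            # Drop the line code, the trailing newline and the final '.'
            body = line[5:].rstrip().rstrip(".")
            fields = [field.strip() for field in body.split(";")]
            if len(fields) >= 2:
                # Keep database name and primary identifier only
                current.append("%s:%s" % (fields[0], fields[1]))
        elif line.startswith("//"):
            # End of record: store what was collected and start afresh
            all_records.append(current)
            current = []
    return all_records


# Tiny made-up record containing only the two DR lines quoted above
sample = (
    "ID   FJ904258; SV 1;\n"
    "DR   RFAM; RF00177; SSU_rRNA_5.\n"
    "DR   SILVA-SSU; FJ904258.\n"
    "//\n"
)
xrefs = extract_dr_xrefs(io.StringIO(sample))
print(xrefs)
```

The per-record lists come back in file order, so they could be paired up
with the records from SeqIO.parse(), provided both read the same file.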

cheers,
Wim

-- 
Wim De Smet
http://www.straininfo.net/

