[Biopython] missing fields in SeqIO EMBL parser?

Peter biopython at maubp.freeserve.co.uk
Fri May 7 14:50:20 UTC 2010


On Fri, May 7, 2010 at 3:36 PM, Wim De Smet <Wim.DeSmet at ugent.be> wrote:
>
> Sure, take this record:
> http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+EntryPage+-id+7BIdF1bEbRt+-e+[EMBL:FJ904258]+-vn+2
>
> I'm looking for the data from the database cross reference lines (DR), i.e.:
> DR   RFAM; RF00177; SSU_rRNA_5.
> DR   SILVA-SSU; FJ904258.
>
> I assumed this would be in the record.dbxrefs field, but it's empty when I
> parse this file. It's more of a nice to have than anything else at this
> point, but I'll have to figure out another way to get a hold of these
> elements then.

That was also left as a TODO - the dbxrefs list is normally used for single
identifiers - here it would be "RFAM:RF00177" and "SILVA-SSU:FJ904258"
for consistency with the other parsers. At the time I was undecided on how
to handle any secondary identifier. Would you need/want this too? Maybe
as "RFAM:RF00177:SSU_rRNA_5"?
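
In the meantime, a rough sketch of pulling these out of the raw EMBL text
yourself might look like the following. This is not part of Biopython: the
function name parse_dr_lines and the keep_secondary flag are made up for
illustration, and it assumes each DR line holds ';'-separated fields ending
in a '.', as in the two lines quoted above.

```python
def parse_dr_lines(text, keep_secondary=False):
    """Collect EMBL DR lines as 'DATABASE:IDENTIFIER' strings.

    Hypothetical helper, not a Biopython API. With keep_secondary=True
    any further fields are appended too, e.g. 'RFAM:RF00177:SSU_rRNA_5'.
    """
    xrefs = []
    for line in text.splitlines():
        if not line.startswith("DR "):
            continue
        # Drop the two-letter line code, strip the trailing '.', then
        # split the remaining text into its ';'-separated fields.
        fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
        fields = [f for f in fields if f]
        if len(fields) < 2:
            continue  # need at least a database name and an identifier
        parts = fields if keep_secondary else fields[:2]
        xrefs.append(":".join(parts))
    return xrefs


example = """DR   RFAM; RF00177; SSU_rRNA_5.
DR   SILVA-SSU; FJ904258.
"""

print(parse_dr_lines(example))
print(parse_dr_lines(example, keep_secondary=True))
```

The resulting strings could then be appended to record.dbxrefs by hand
after parsing, which keeps them in the same "database:identifier" style
the other parsers use.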

Peter



