[Bioperl-l] added -type for Bio::Annotation::DBLink
Dave Messina
David.Messina at sbc.su.se
Thu Jul 8 18:27:13 UTC 2010
Hi everybody,
In working on representing sequence metadata*, I've found it useful to track
the type of information that is involved in a database cross-reference. I'm
adding an optional -type property to Bio::Annotation::DBLink to support this
cleanly in BioPerl.
Here follows my rationale. I'm not tied to doing this in B:A::DBLink if
there's a better way; it just seems the best route to me at the moment.
--
So, what do I mean by tracking the type of information in a DB
cross-reference?
Right now, a standard DBLink contains
database => RefSeq
ID => NM_12345
along with a few other optional properties. See the docs for details:
http://doc.bioperl.org/bioperl-live/Bio/Annotation/DBLink.html
I want to be able to say
database => RefSeq
ID => NM_12345
type => RNA
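
As a rough illustration of the data shape, here is a minimal sketch in Python using plain data rather than BioPerl. The class and field names are hypothetical stand-ins, not the actual Bio::Annotation::DBLink API. The point is only that the type field is optional, so a link without it still works:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DBLink:
    """Hypothetical stand-in for a database cross-reference."""
    database: str
    primary_id: str
    type: Optional[str] = None  # optional, so untyped links keep working


# A link as it can be described today: database and ID only.
untyped = DBLink(database="RefSeq", primary_id="NM_12345")

# The same link with the proposed type information attached.
typed = DBLink(database="RefSeq", primary_id="NM_12345", type="RNA")

print(untyped.type)  # None
print(typed.type)    # RNA
```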
Why?
Two reasons:
1. A single database can store more than one type of information.
RefSeq, for example, stores RNA and protein records. RefSeq's IDs are named
intelligently to indicate their type (NM_xxx for transcript, NP_xxx for
protein), but this is not true for all databases, and not everybody knows the
ID codes.
For example, here are three database-ID pairs:
Genbank: AK291692.1
Genbank: CH471055.1
Genbank: AAH14616.1
Those are three different record types (mRNA, genomic DNA, protein) from the
same database.
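
To make the point concrete, here is a small Python sketch using plain data, not BioPerl; the table and function names are made up for illustration. A prefix rule built from the NM_/NP_ convention mentioned above can recover the type for RefSeq, but this sketch has no such rule for Genbank, so the types of those three records have to be stored explicitly:

```python
# Prefix conventions for RefSeq (only the two mentioned above).
REFSEQ_PREFIX_TYPES = {"NM_": "RNA", "NP_": "protein"}


def guess_type(database, primary_id):
    """Return a record type inferred from the ID, or None if no rule applies."""
    if database == "RefSeq":
        for prefix, record_type in REFSEQ_PREFIX_TYPES.items():
            if primary_id.startswith(prefix):
                return record_type
    return None


# Types of the Genbank records as listed above, in the same order.
genbank_types = {
    "AK291692.1": "mRNA",
    "CH471055.1": "genomic DNA",
    "AAH14616.1": "protein",
}

print(guess_type("RefSeq", "NM_12345"))  # RNA
for primary_id, record_type in genbank_types.items():
    # No Genbank rule exists in this sketch, so the guess is None
    # and the explicitly stored type is the only source of information.
    print(primary_id, guess_type("Genbank", primary_id), record_type)
```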
2. A sequence can have cross-references to several types of information.
Multiple source databases can provide the same type of cross-reference, and
the set as a whole can mix several types.
Take this example:
Genbank: AAA81779.1
EMBL: AK291692
Ensembl: ENST00000308775
HPA: CAB001960
Two of these are mRNA records, one is a protein record, and one is something
else entirely. If I wanted to take one mRNA xref and one protein xref from
this set, I couldn't do it using solely the information provided above.
If I had type information, though, it'd be easy.
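
A minimal Python sketch of that selection, using plain tuples rather than BioPerl objects. The type labels attached to the four IDs are an assumption made for illustration, since the list above does not say which record is which, and the function name is hypothetical:

```python
# (database, ID, type) triples; the type labels are assumed for illustration.
xrefs = [
    ("Genbank", "AAA81779.1", "protein"),
    ("EMBL", "AK291692", "mRNA"),
    ("Ensembl", "ENST00000308775", "mRNA"),
    ("HPA", "CAB001960", "other"),
]


def first_of_type(links, wanted_type):
    """Return the first link whose type matches, or None if there is none."""
    for link in links:
        if link[2] == wanted_type:
            return link
    return None


mrna = first_of_type(xrefs, "mRNA")
protein = first_of_type(xrefs, "protein")
print(mrna)     # ('EMBL', 'AK291692', 'mRNA')
print(protein)  # ('Genbank', 'AAA81779.1', 'protein')
```

Without the third field, the only way to make this choice would be to know each database's ID conventions.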
And since -type is an optional parameter, it is fully backwards-compatible.
Any thoughts or comments?
Dave
* specifically, in SeqXML. See http://seqxml.org and
http://doc.bioperl.org/bioperl-live/Bio/SeqIO/seqxml.html