[Bioperl-l] Sequence IDs and Comment()s

Charles Tilford charles.tilford@bms.com
Tue, 23 Oct 2001 09:41:17 -0400


Jason,

This has been a source of confusion for me - I had assumed that
display_id() was "human readable" text like "Beta Hemoglobin", while
primary_id() should be used for unique database keys. I got this
impression from the documentation for Bio::Seq. I'm particularly
confused about the implementation of primary_id, since "For sequences
with no natural id, this method should return a stringified memory
location" (pointing to what, and to what end?).

I've been working in the context of migrating Seq objects to (and
from) BSML for display in the LabBook Viewer. The problem I've faced
is storing a short name for a sequence, for use as a title. The
contents of desc are often too long for a simple on-screen label
(sometimes a full sentence or more), and the sort of database primary
keys I've been putting in primary_id are typically things like
"245331", which are not of great utility to the viewer. Accession
number is also not immediately informative, and is not always
available.

So where should I put "title" strings? In a Comment? s/ /_/g and put
them as display_id?
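
As a rough illustration of the s/ /_/g option, here is a minimal sketch in core Perl. The helper name title_to_display_id is hypothetical and not part of BioPerl; it only shows one way to turn a free-text title into a space-free token before handing it to display_id().

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): turn a free-text title
# into a token without whitespace.
sub title_to_display_id {
    my ($title) = @_;
    $title =~ s/^\s+|\s+$//g;   # trim leading and trailing whitespace
    $title =~ s/\s+/_/g;        # replace each whitespace run with one underscore
    return $title;
}

print title_to_display_id("Beta Hemoglobin"), "\n";
```

With a real Seq object the result would be passed along as $seq->display_id(title_to_display_id($title)). The conversion is lossy: once underscores are substituted, a title that originally contained underscores cannot be told apart from one that contained spaces.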

General observation: When performing Bio::Seq <-> BSML, I end up with
a fistful of objects from one implementation that have no clear
counterpart in the other. In BSML, there are two generic name/value type
containers (e.g. <Attribute name="user" content="Bob"> or <Qualifier
value-type="gene" value="actin">) - I've used these liberally to store
information that otherwise has no clear home. In the reverse
direction, I've been mapping orphaned data into Comment()s of the form
"name: Bob". However, I fret about the lack of predictability in
delimiter choice (": " vs ":" vs "\t" vs "," etc.).
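
To make the delimiter concern concrete, here is a small sketch in core Perl of a round trip between a name/value pair and a "name: value" comment string. The function names and the choice of ": " as the delimiter are assumptions made for illustration; neither BioPerl nor BSML prescribes them.

```perl
use strict;
use warnings;

# Assumed convention for this sketch only.
my $DELIM = ': ';

# Pack a name/value pair into a single comment string.
sub pair_to_comment {
    my ($name, $value) = @_;
    die "name must not contain the delimiter\n" if index($name, $DELIM) >= 0;
    return $name . $DELIM . $value;
}

# Unpack a comment string. Returns (name, value), or (undef, text) when
# the text does not contain the delimiter at all.
sub comment_to_pair {
    my ($text) = @_;
    # Limit of 2: split only at the first delimiter, so the value itself
    # may contain ': '.
    my ($name, $value) = split /\Q$DELIM\E/, $text, 2;
    return defined $value ? ($name, $value) : (undef, $text);
}

my ($name, $value) = comment_to_pair(pair_to_comment('user', 'Bob'));
print "$name => $value\n";
```

The round trip only works if writer and reader agree on the delimiter. A comment written as "name:Bob" or "name\tBob" comes back from comment_to_pair as untyped free text, and an ordinary comment that happens to contain ": " is misread as a pair.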

So... What would people think of adding a type() (or class(),
category(), meta(), etc.) method to Comment to optionally qualify the
contents?
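
A minimal sketch of what such a qualifier might look like, written as a stand-alone class in core Perl. TypedComment and its -type argument are hypothetical; this is not the actual Bio::Annotation::Comment interface, only an illustration of keeping the name in its own slot rather than inside the text.

```perl
use strict;
use warnings;

package TypedComment;   # hypothetical stand-in, not Bio::Annotation::Comment

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->text($args{-text}) if defined $args{-text};
    $self->type($args{-type}) if defined $args{-type};
    return $self;
}

# Get/set the free-text body of the comment.
sub text {
    my ($self, $value) = @_;
    $self->{text} = $value if defined $value;
    return $self->{text};
}

# Get/set the optional qualifier; undef means an ordinary comment.
sub type {
    my ($self, $value) = @_;
    $self->{type} = $value if defined $value;
    return $self->{type};
}

package main;

my $comment = TypedComment->new(-type => 'user', -text => 'Bob');
printf "%s => %s\n", $comment->type, $comment->text;
```

Because the qualifier is stored separately, no delimiter has to be chosen or escaped, and comments without a type remain ordinary free text.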

-Charles

Jason Eric Stajich wrote:

...snip...

> If you wanted these names printed out with the seqio system you should
> make sure and set the display id:
> $seq->display_id("myaccessionnumber");
> 
> display id should really not have spaces in it since it is the
> intended unique id for the sequence in a db.

...snip...

-- 
Charles Tilford, Bioinformatics-Applied Genomics
Bristol-Myers Squibb PRI, Hopewell 3A039
P.O. Box 5400, Princeton, NJ 08543-5400, (609) 818-3213
charles.tilford@bms.com