[Bioperl-l] DBSOURCE parsing

Chris Fields cjfields at uiuc.edu
Mon Nov 27 21:47:12 UTC 2006


Jason,

I am working on Stockholm and GenPept format parsing, both of which have
DBLink objects.  I have a couple of questions.  First (not a huge issue,
more a curiosity): is it possible to pass a callback to Annotation
objects for the overloaded operators?  I'm just thinking of situations
where the data is displayed differently in other formats (like
Stockholm).
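
To make the question concrete, here is a minimal sketch of a per-object
formatting callback.  It is written in Python purely for illustration and
is not BioPerl code; the class `Annotation`, its `formatter` argument, and
the "DB; ID;" layout used by `stockholm_style` are all assumptions.

```python
from typing import Callable, Optional


class Annotation:
    """Hypothetical stand-in for an annotation object (not the BioPerl API)."""

    def __init__(self, database: str, primary_id: str,
                 formatter: Optional[Callable[["Annotation"], str]] = None):
        self.database = database
        self.primary_id = primary_id
        # Optional callback consulted by the overloaded string operator.
        self.formatter = formatter

    def __str__(self) -> str:
        # Use the callback when one was supplied, else a default layout.
        if self.formatter is not None:
            return self.formatter(self)
        return f"{self.database}:{self.primary_id}"


def stockholm_style(ann: Annotation) -> str:
    # Assumed, illustrative layout for a Stockholm-like cross-reference.
    return f"{ann.database}; {ann.primary_id};"


default_link = Annotation("PDB", "1T2U")
stockholm_link = Annotation("PDB", "1T2U", formatter=stockholm_style)
print(default_link)
print(stockholm_link)
```

The same object data is rendered differently depending on which callback,
if any, the writer for a given format attaches.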

Also, would it be feasible to have DBLink objects also contain
annotations (comments, other DBLink objects, etc.) for more complex
data?  In particular, this concerns GenPept DBSOURCE blocks like the
following examples:

DBSOURCE    swissprot: locus BRCA1_HUMAN, accession P38398;
             class: standard.
             created: Oct 1, 1994.
             sequence updated: Feb 1, 1995.
             annotation updated: Nov 14, 2006.
             xrefs: U14680.1, AAA73985.1, L78833.1, AAC37594.1, AY273801.1,
             AAP12647.1, A58881, 1JM7A, 1JNXX, 1N5OX, 1OQAA, 1T15A, 1T29A,
             1T2UA, 1T2VA, 1T2VB, 1T2VC, 1T2VD, 1T2VE, 1Y98A
             xrefs (non-sequence databases): UniGene:Hs.194143, IntAct:P38398,
             TRANSFAC:T04074, Ensembl:ENSG00000012048, KEGG:hsa:672, HGNC:1100,
             MIM:113705, MIM:114480, Reactome:P38398, ArrayExpress:P38398,
             GO:0031436, GO:0008274, GO:0005634, GO:0000151, GO:0050681,
             GO:0003677, GO:0019899, GO:0003713, GO:0015631, GO:0008270,
             GO:0030521, GO:0007059, GO:0006978, GO:0008630, GO:0042759,
             GO:0046600, GO:0016481, GO:0045739, GO:0031398, GO:0045893,
             GO:0016567, GO:0042981, GO:0042127, GO:0006357, GO:0006359,
             InterPro:IPR011364, InterPro:IPR001357, InterPro:IPR002378,
             InterPro:IPR001841, PANTHER:PTHR13763, Pfam:PF00533, Pfam:PF00097,
             PIRSF:PIRSF001734, PRINTS:PR00493, SMART:SM00292, SMART:SM00184,
             PROSITE:PS50172, PROSITE:PS00518, PROSITE:PS50089
...
DBSOURCE    pdb: molecule 1T2U, chain 65, release Apr 22, 2004;
             deposition: Apr 22, 2004;
             class: Antitumor Protein;
             source: Mol_id: 1; Organism_scientific: Homo Sapiens;
             Organism_common: Human; Gene: Brca1; Expression_system: Escherichia
             Coli; Expression_system_common: Bacteria;
             Exp. method: X-Ray Diffraction.

...
DBSOURCE    pir: locus I49350;

             summary: #length 1812 #molecular-weight 198788 #checksum 8813
             ;
             genetic: #gene Brca1
             ;
             superfamily: transcriptional regulator, BRCA1 type; RING finger
             homology
             ;
             PIR dates: 02-Jul-1996 #sequence_revision 02-Jul-1996 #text_change
             09-May-2004
             .
...
DBSOURCE    prf: locus 2202221A;

             state: hepatoma/colonic tumor;
             taxonomy: Mammalia.

My thought is, the first line would be the main DBLink object data,  
with all subsequent lines as annotation objects (comments, DBLinks,  
etc) in an annotation collection contained within the main DBLink  
object.  I don't think there would be any danger of circular  
references if handled correctly.
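
A rough sketch of that layout, in Python for illustration only (this is
not the BioPerl API).  The `DBLink` and `Comment` classes, the
`parse_dbsource` and `pick_primary_id` helpers, and the "key: " heuristic
for spotting a new field are all assumptions, and the sample is an
abridged copy of the swissprot block above.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    """Free-text sub-annotation such as 'class: standard'."""
    tag: str
    text: str


@dataclass
class DBLink:
    """Hypothetical DBLink that owns its own list of annotations."""
    database: str
    primary_id: str
    annotations: List[object] = field(default_factory=list)


# A new field starts with "key: " (colon plus space).  Tokens such as
# "GO:0031436" have no space after the colon, so they count as continuation.
FIELD_START = re.compile(r"^([A-Za-z][^:]*): (.*)$")


def pick_primary_id(detail: str) -> str:
    # Prefer an accession; fall back to locus or molecule, then the raw text.
    for key in ("accession", "locus", "molecule"):
        m = re.search(rf"\b{key} ([^,;\s]+)", detail)
        if m:
            return m.group(1)
    return detail


def parse_dbsource(block: str) -> DBLink:
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    head = re.sub(r"^DBSOURCE\s+", "", lines[0]).rstrip(";")
    database, _, detail = head.partition(": ")
    link = DBLink(database, pick_primary_id(detail))

    # Merge wrapped continuation lines into [tag, value] pairs.
    fields = []
    for line in lines[1:]:
        m = FIELD_START.match(line)
        if m:
            fields.append([m.group(1), m.group(2)])
        elif fields:
            fields[-1][1] += " " + line

    for tag, value in fields:
        value = value.rstrip(".;")
        if tag.startswith("xrefs"):
            # Each cross-reference becomes a nested DBLink.
            for item in (v.strip() for v in value.split(",")):
                if not item:
                    continue
                db, sep, acc = item.partition(":")
                # Bare accessions do not name their database, so leave it empty.
                nested = DBLink(db, acc) if sep else DBLink("", item)
                link.annotations.append(nested)
        else:
            link.annotations.append(Comment(tag, value))
    return link


SAMPLE = """\
DBSOURCE    swissprot: locus BRCA1_HUMAN, accession P38398;
             class: standard.
             created: Oct 1, 1994.
             xrefs: U14680.1, AAA73985.1, L78833.1, AAC37594.1, AY273801.1,
             AAP12647.1, A58881
             xrefs (non-sequence databases): UniGene:Hs.194143, IntAct:P38398,
             MIM:113705, GO:0031436
"""

link = parse_dbsource(SAMPLE)
print(link.database, link.primary_id, len(link.annotations))
```

Each nested link is held only by its parent and never points back to it,
so the result is a plain tree.  The "key: " heuristic would need more work
for the pdb block, where continuation lines also contain "key: " pairs.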

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





