[Bioperl-l] OMIM parser

Heikki Lehvaslaiho heikki@ebi.ac.uk
Fri, 12 Jul 2002 11:57:54 +0100


Chris,

From the sequence point of view, OMIM is annotation. On its own, it contains
the best description we have of human phenotypes.


Last year Jason and I were brainstorming during ISMB. The results are in
the bioperl file models/maps_and_markers.dia. It is already a bit outdated.
Maps and markers all ended up in Bio::Map, but we also outlined a
Bio::Organism namespace where Phenotype is one component. So, I'd suggest
Bio::Phenotype or, even better, Bio::Organism::Phenotype::OMIM.

OMIM phenotypes are quite generic, but in practice they are associated with
sequences and individuals. We'll need Bio::Organism::Individual, which could
have more than one subphenotype; together these form that person's phenotype.

The important thing to remember about OMIM is that it is not a database in
the rigorous sense. It is a collection of free text and various other
structures, much more loosely structured than the usual semistructured
biological databases:

- ID
- Name(s)
- Keywords
- Summary
- Main text
- Mutations (Bio::Variation)
   -- ID
   -- keywords including mutation description
   -- free text
- Crossreferences  (Bio::Annotation::DBLink)
- References     (Bio::Biblio or Bio::Annotation::Reference)
- Contributors & History
- it implies Species (Bio::Species)
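
As a rough illustration only, the structure listed above could be held in
something like the following. This is a sketch in Python rather than Perl, and
every class and field name in it (OmimEntry, Mutation, main_text and so on) is
hypothetical; none of them exists in Bioperl. The comments only note which
Bioperl classes the list above associates with each part.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mutation:
    """One mutation sub-record (the list above maps this to Bio::Variation)."""
    id: str
    keywords: List[str] = field(default_factory=list)  # includes the mutation description
    text: str = ""                                      # free text


@dataclass
class OmimEntry:
    """Hypothetical container for one OMIM record; not a Bioperl class."""
    id: str
    names: List[str]
    keywords: List[str] = field(default_factory=list)
    summary: str = ""
    main_text: str = ""
    mutations: List[Mutation] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)  # cf. Bio::Annotation::DBLink
    references: List[str] = field(default_factory=list)        # cf. Bio::Biblio or Bio::Annotation::Reference
    contributors: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)
    species: str = "Homo sapiens"  # implied by OMIM rather than stored per entry


# An entry with no mutations and no cross-references is still a complete
# phenotype record, so nothing has to be dropped for lack of a sequence.
entry = OmimEntry(id="100070", names=["ABDOMINAL AORTIC ANEURYSM"])
```

Keeping every part optional except the ID and name means a parser can fill in
only what it understands now and add the rest later.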


I am not saying you have to parse and write out everything, but at least try
to keep the bigger picture in mind and future options open.

Good luck,

	-Heikki



Chris Zmasek wrote:
> Hi!
> 
> I am in the process of writing a parser for the OMIM database (to be submitted to Bioperl).
> 
> Not all entries in OMIM are linked to a gene/locus; some of them are just diseases without an associated gene, for example the entry for "ABDOMINAL AORTIC ANEURYSM" (100070).
> 
> Therefore I am not clear what the best output for such a parser might be:
> Sequence objects (without an actual "sequence string") or annotation objects?
> If the output consists of sequence objects, entries without an associated gene would have to be ignored.
> 
> What do you think?
> 
> Thanks,
> 
> Christian [czmasek@gnf.org]
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
       _/      _/                      http://www.ebi.ac.uk/mutations/
      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________