[Bioperl-l] OMIMparser/OMIMentry fail to retrieve gene symbols

Enrico Ferrero enricoferrero86 at gmail.com
Fri Jul 12 16:33:40 UTC 2013


Hi,

I'm using Bio::Phenotype::OMIM::OMIMparser to parse the OMIM database [1].
In the simplest scenario, I just want to retrieve all gene symbols
associated with each disease in OMIM and write the disease and gene
information to a flat file.

I have slightly modified the example code on the CPAN page for
OMIMparser [2] to add my custom query, but my script fails to retrieve
the gene symbols associated with the OMIM entries (better formatted code
is also available here: http://pastebin.com/8SFx4mUW):

***CODE START***
use strict;
use warnings;
use Bio::Phenotype::OMIM::OMIMparser;

my $omim_parser = Bio::Phenotype::OMIM::OMIMparser->new( -genemap  => "genemap",
                                                         -omimtext => "omim.txt" );

my $parsedOMIM = "parsed.OMIM.txt";
open( my $fh, ">", $parsedOMIM ) or die "Can't open $parsedOMIM: $!";
print $fh "ID\tDisease\tGenes\n";

while ( my $omim_entry = $omim_parser->next_phenotype() ) {
    # This prints everything.
    #~ print( $omim_entry->to_string() );
    #~ print "\n\n";

    # This gets individual data (some of them object-arrays)
    # (and illustrates the relevant methods of OMIMentry).
    my $numb  = $omim_entry->MIM_number();                     # *FIELD* NO
    my $title = $omim_entry->title();                          # *FIELD* TI - first line
    my $alt   = $omim_entry->alternative_titles_and_symbols(); # *FIELD* TI - additional lines
    my $mtt   = $omim_entry->more_than_two_genes();            # "#" before title
    my $sep   = $omim_entry->is_separate();                    # "*" before title
    my $desc  = $omim_entry->description();                    # *FIELD* TX
    my $mm    = $omim_entry->mapping_method();                 # from genemap
    my $gs    = $omim_entry->gene_status();                    # from genemap
    my $cr    = $omim_entry->created();                        # *FIELD* CD
    my $cont  = $omim_entry->contributors();                   # *FIELD* CN
    my $ed    = $omim_entry->edited();                         # *FIELD* ED
    my $sa    = $omim_entry->additional_references();          # *FIELD* SA
    my $cs    = $omim_entry->clinical_symptoms_raw();          # *FIELD* CS
    my $comm  = $omim_entry->comment();                        # from genemap

    my $mini_mim = $omim_entry->miniMIM();                     # *FIELD* MN
    # A Bio::Phenotype::OMIM::MiniMIMentry object.
    # class Bio::Phenotype::OMIM::MiniMIMentry
    # provides the following:
    # - description()
    # - created()
    # - contributors()
    # - edited()
    #
    # Prints the contents of the MINI MIM entry (most OMIM entries do
    # not have MINI MIM entries, though).
    #~ print $mini_mim->description()."\n";
    #~ print $mini_mim->created()."\n";
    #~ print $mini_mim->contributors()."\n";
    #~ print $mini_mim->edited()."\n";

    my @corrs = $omim_entry->each_Correlate();                 # from genemap
    # Array of Bio::Phenotype::Correlate objects.
    # class Bio::Phenotype::Correlate
    # provides the following:
    # - name()
    # - description() (not used)
    # - species() (always mouse)
    # - type() ("OMIM mouse correlate")
    # - comment()

    my @refs = $omim_entry->each_Reference();                  # *FIELD* RF
    # Array of Bio::Annotation::Reference objects.

    my @avs = $omim_entry->each_AllelicVariant();              # *FIELD* AV
    # Array of Bio::Phenotype::OMIM::OMIMentryAllelicVariant objects.
    # class Bio::Phenotype::OMIM::OMIMentryAllelicVariant
    # provides the following:
    # - number (e.g. ".0001" )
    # - title (e.g. "ALCOHOL INTOLERANCE" )
    # - symbol (e.g. "ALDH2*2" )
    # - description (e.g. "The ALDH2*2-encoded protein has a change ..." )
    # - aa_ori (used if information in the form "LYS123ARG" is found)
    # - aa_mut (used if information in the form "LYS123ARG" is found)
    # - position (used if information in the form "LYS123ARG" is found)
    # - additional_mutations (used for e.g. "1-BP DEL, 911T")

    my @cps = $omim_entry->each_CytoPosition();                # from genemap
    # Array of Bio::Map::CytoPosition objects.

    my @gss = $omim_entry->each_gene_symbol();                 # from genemap
    # Array of strings.

    ### A handy string to store gene symbols
    my $geneSymbols = join( ",", @gss );

    ### My query (this is just an example, I actually need to perform more
    ### complex queries)
    if ( $title =~ /^#/ ) {
        print $fh $numb . "\t" . $title . "\t" . $geneSymbols . "\n";
    }
}
close $fh;
***CODE END***

As far as I can tell, '$omim_entry->each_gene_symbol()' returns no gene
symbols for almost every entry and works only in a handful of cases,
which makes the issue all the more mysterious.
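To put a number on "a handful of cases", a small diagnostic along these lines could count how many entries come back with at least one gene symbol, split by entry type. It is only a sketch: it calls just the OMIMentry methods already used in the script (next_phenotype(), each_gene_symbol(), more_than_two_genes(), is_separate()), it assumes, as the script's comments say, that a "#" before the title sets more_than_two_genes() and a "*" sets is_separate(), and the function name tally_gene_symbols and the command-line usage are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical diagnostic: count OMIM entries with at least one gene
# symbol, split by entry type. Only OMIMentry methods used in the
# original script are called.
sub tally_gene_symbols {
    my ($parser) = @_;
    my %stats = ( total => 0, with_genes => 0, by_class => {} );

    while ( my $entry = $parser->next_phenotype() ) {
        # Assumption (from the script's comments): "#" before the title
        # sets more_than_two_genes(), "*" sets is_separate().
        my $class = $entry->more_than_two_genes() ? '#'
                  : $entry->is_separate()         ? '*'
                  :                                 'other';
        my @symbols = $entry->each_gene_symbol();

        $stats{by_class}{$class} //= { total => 0, with_genes => 0 };
        $stats{total}++;
        $stats{by_class}{$class}{total}++;
        if (@symbols) {
            $stats{with_genes}++;
            $stats{by_class}{$class}{with_genes}++;
        }
    }
    return \%stats;
}

# Usage: perl tally_genes.pl genemap omim.txt
if ( @ARGV == 2 ) {
    require Bio::Phenotype::OMIM::OMIMparser;
    my $parser = Bio::Phenotype::OMIM::OMIMparser->new(
        -genemap  => $ARGV[0],
        -omimtext => $ARGV[1],
    );
    my $stats = tally_gene_symbols($parser);
    printf "%d of %d entries have gene symbols\n",
        $stats->{with_genes}, $stats->{total};
    for my $class ( sort keys %{ $stats->{by_class} } ) {
        my $c = $stats->{by_class}{$class};
        printf "%-6s %7d entries, %7d with gene symbols\n",
            $class, $c->{total}, $c->{with_genes};
    }
}
```

If nearly all "#" entries come back empty while "*" entries do not, that would point at the way the two files are linked rather than at the query itself.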

I'm still a beginner, so it's entirely possible I'm doing something
stupid or wrong.
Alternatively, there might be something wrong with how
OMIMparser/OMIMentry parse and link the 'omim.txt' and 'genemap'
files.
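One unconfirmed hypothesis worth checking: OMIMparser may attach genemap data to an omim.txt entry through the gene's own MIM number. In that case, phenotype entries ("#"), whose numbers appear only in the genemap disorder columns, would get no gene symbols. The sketch below builds that phenotype-to-gene link directly from genemap so its output can be compared with what each_gene_symbol() returns. The field positions in %LAYOUT are an assumption about the pipe-delimited genemap format and should be checked against genemap.key. index_phenotypes is a hypothetical helper, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMED layout of the pipe-delimited genemap file (0-based field
# indices). Verify against the genemap.key file shipped with your OMIM
# download before trusting the output.
my %LAYOUT = (
    symbols   => 5,               # gene symbol(s), comma-separated
    disorders => [ 13, 14, 15 ],  # disorder text, may contain phenotype MIM numbers
);

# Hypothetical helper: every six-digit number found in the disorder
# fields is treated as a phenotype MIM number and mapped to the gene
# symbols on the same genemap line.
sub index_phenotypes {
    my ($fh) = @_;
    my %genes_for;    # phenotype MIM number => { symbol => 1, ... }

    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        my @f = split /\|/, $line, -1;    # -1 keeps trailing empty fields

        my @symbols;
        for my $sym ( split /,/, ( $f[ $LAYOUT{symbols} ] // '' ) ) {
            $sym =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
            push @symbols, $sym if length $sym;
        }

        my $disorders = join ' ',
            map { defined $_ ? $_ : '' } @f[ @{ $LAYOUT{disorders} } ];
        for my $mim ( $disorders =~ /\b(\d{6})\b/g ) {
            $genes_for{$mim}{$_} = 1 for @symbols;
        }
    }

    # Flatten the inner hashes into sorted symbol lists.
    my %index;
    for my $mim ( keys %genes_for ) {
        $index{$mim} = [ sort keys %{ $genes_for{$mim} } ];
    }
    return \%index;
}

# Usage: perl index_genemap.pl genemap > phenotype_genes.tsv
if ( @ARGV == 1 ) {
    open( my $in, '<', $ARGV[0] ) or die "Can't open $ARGV[0]: $!";
    my $index = index_phenotypes($in);
    close $in;
    for my $mim ( sort keys %$index ) {
        print join( "\t", $mim, join( ',', @{ $index->{$mim} } ) ), "\n";
    }
}
```

Matching on any six-digit number is deliberately crude. It keeps the sketch independent of the exact disorder-string syntax, at the cost of possibly picking up stray numbers.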

Any help on how to get this to work is greatly appreciated.
Thank you.

Best,

[1] http://europe.omim.org/
[2] http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Phenotype/OMIM/OMIMparser.pm

--
Enrico Ferrero
