[Bioperl-l] Problem with Bio::GeneMapper

Sami Kilpinen sami.kilpinen at medisapiens.com
Thu Feb 23 07:20:10 UTC 2012


Hello,

I have been trying to solve following issues with GeneMapper.
1) ->to_string(); throws an error
2) Mapper doesn't seem to do much of anything


Bioperl 1.6.1 in use and relevant parts of the code are as follows:
---------------------------------------------------
use Bio::Coordinate::GeneMapper;
use Bio::Location::Split;
use Bio::Location::Simple;

# get a Bio::Location::Split or an array of Bio::LocationI objects
# holding the start, end and strand of all the exons in chromosomal
# (or entry) coordinates
my $splits = Bio::Location::Split->new();

# $exons come from mongodb, not relevant here as data is printed to screen for debug purposes

while ( my $exon = $exons->next ) {
	my $e_start=$exon->{'start'};
	my $e_end=$exon->{'end'};
	my $e_strand=$exon->{'strand'};
	print "Exon_start,Exon-end,Exon-strand\n";
	print $e_start,",",$e_end,",",$e_strand,"\n";
	$splits->add_sub_Location(Bio::Location::Simple->new(-start=>$e_start,-end=>$e_end,-strand=>$e_strand));
}

print  "Location::Split object string dump\n";
print $splits->to_FTstring(),"\n";

# get a Bio::RangeI representing the start, end and strand of the CDS
# in chromosomal (or entry) coordinates
my $cds = Bio::Location::Simple->new(-start => 752890, -end => 753369, -strand => 1 );

# create a gene mapper and set it to map from chromosomal to cds coordinates
my $gene = Bio::Coordinate::GeneMapper->new(-in   =>'chr',
				      -out  =>'cds',
				      -cds  =>$cds,
				      -exons=>$splits
				     );

# get a a Bio::Location or sequence feature in input (chr) coordinates
my $loc = Bio::Location::Simple->new(-start => 753584, -end => 753584, -strand => 1 );

# Trying to set strict boundaries
$gene->strict('cds');

# map the location into output coordinates and get a new location object
my $newloc = $gene->map($loc);

print "new location in cds coordinates\n";
print  $newloc->start(),"\n";

print "Mapper internal data in human readable format\n";
my $tmp=$gene->to_string();

------------------------------

And the result of running this are:
==========================
Exon_start,Exon-end,Exon-strand
752751,753582,1
Exon_start,Exon-end,Exon-strand
754103,755214,1
Location::Split object string dump
join(752751..753582,754103..755214)
new location in cds coordinates
695
Mapper internal data in human readable format
----------------------------------------

     chr-gene     (1-2)
                gene offset: 752889 (753368)
                gene strand: 1

     gene-intron  (2-5)
Can't call method "each_mapper" on an undefined value at /usr/local/share/perl/5.10.1/Bio/Coordinate/GeneMapper.pm line 1013.
==========================

Thus here we have two exons and cds situating completely in the first exon. GeneMapper calculates (correctly) that chr coordinate 752892 is nucleotide 2 from the start of cds (752890) but then throws an error about each_mapper. Then, if I ask for cds coordinate for chr coordinate (753769) which is between the exons and beyond the end of the cds (753369) it returns me value 880 and same error (why it even claims that such an cds coordinate exists? cds is 479 long...). I have tried to test some more complex genes where cds contains >1 exons and if I ask chr=>cds coordinate mapping when the chr coordinate situates in exon 2 it seems that it does not take the intron into account.. it just returns the "asked chr coordinate"-"cds start coordinate". This simple calculation is all that this module does for me... 

What I would like to do in this more complex cases is that it calculates the length of all exon nucleotides between cds start and the asked chr coordinate. In other words ("asked chr coordinate"-"cds start in chr coordinates")-length of all introns between those. Isn't this what GeneMapper is supposed to do?

I would highly appreciate any help with this issue.

Regards,
Sami Kilpinen






More information about the Bioperl-l mailing list