[Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF

Marco Blanchette mblanche at berkeley.edu
Tue May 2 22:31:44 UTC 2006


Brian--

I checked out last week version from the CVS.

Silly question: How do I get the version of BioPerl I am using... Never had
to check a module/bundle version number before...

Marco


On 5/2/06 14:49, "Brian Osborne" <osborne1 at optonline.net> wrote:

> Marco,
> 
> Odd, because the intersection() code is quite simple and it's clear how it
> should behave. What version of Bioperl are you using? I'm looking at the
> latest, in bioperl-live...
> 
> Brian O.
> 
> 
> On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
> 
>> Brian--
>> 
>> Even when both elements of intersection() are from the negative strand, the
>> return object is from the positive strand and $overlap is actually the
>> revervese complement of the intersection between the 2 exons. Here is part
>> of the output from the script below:
>> 
>> ===
>> ex1     Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
>> ex2     Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
>> CAAATCG
>> overlap Strand: 1
>> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
>> TGCCGACTGCCATGTTCAACTAATAAACCGG
>> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
>> ...
>> 
>> If both are from the positive strand, the return object is positive as in:
>> 
>> ===
>> ex1     Strand: 1
>> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
>> TTTGTGCCTGTTTCAGTATAAATTAATTATG
>> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
>> AAATATACATATATGCAACATATATAACTTC
>> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
>> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
>> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
>> ex2     Strand: 1
>> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
>> overlap Strand: 1
>> CAACGCAGACGTG
>> 
>> Is there something I am missing? Here is the script generating the output
>> 
>> Many thanks all...
>> 
>> Marco
>> 
>> 
>> use strict;
>> use warnings;
>> use Bio::DB::GFF;
>> 
>> MAIN:{
>> 
>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>                                 -dsn =>
>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>                                 -user => 'guest');
>>     my $test_db = $db->segment('4');
>>     
>>     # Load up the exons into $exons_p
>>     for my $gene ($test_db->features(-types => 'gene')){
>> 
>>         my $exons_p = extractExons($gene);
>> 
>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>> 
>>     }
>> }
>> 
>> sub extractExons {
>>     my $gene = shift;
>>     my %ex_list;
>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>                                     -attributes =>{Gene => $gene->group});
>>                 
>>     for my $tc (@tcs){
>>         my @exons = $tc->features (-type => 'exon',
>>                                      -attributes => {Parent => $tc->group}
>> );        
>>     
>>         for (@exons){
>>             my $ex_id    = $_->id;
>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>> 
>>         }
>>     
>>     }
>>     my @values = values %ex_list;
>>     return(\@values);
>> }
>> 
>> sub cluster {
>>     my $exons_p = shift;
>>     
>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>             my $exon1 = $exons_p->[$s];
>>             my $exon2 = $exons_p->[$t];
>>             
>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>             
>>                 my $overlap = $exon1->intersection($exon2);
>>                 
>>                 print "===\n";;
>>                 print     "ex1\tStrand: ", $exon1->strand, "\n",
>>                         $exon1->seq, "\n";
>>                 print     "ex2\tStrand: ", $exon2->strand, "\n",
>>                         $exon2->seq, "\n";
>>                 print "overlap\tStrand: ", $overlap->strand, "\n",
>>                         $overlap->seq, "\n";
>>             }
>>         }
>>     }
>> }
>> 
>> On 5/2/06 13:17, "Brian Osborne" <osborne1 at optonline.net> wrote:
>> 
>>> Marco,
>>> 
>>> Yes, this is how intersection() is supposed to work. If both of the Range
>>> objects have the same strand then the strand information is returned as part
>>> of the result but if they aren't on the same strand then no strand
>>> information is returned.
>>> 
>>> Brian O.
>>> 
>>> 
>>> On 5/2/06 3:30 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
>>> 
>>>> Dear all--
>>>> 
>>>> I have been trying to use the intersection function to extract overlapping
>>>> region from alternatively spliced exons as in the following script. The
>>>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is
>>>> actually loosing the strand of $exon1 if $exon1 is from the negative
>>>> strand.
>>>> Is this behavior expected? Should I check the strand of $exon1 before
>>>> working on the object return by any Bio::RangeI function?
>>>> 
>>>> Many thanks 
>>>> 
>>>> #!/usr/bin/perl
>>>> use strict;
>>>> use warnings;
>>>> use Bio::DB::GFF;
>>>> 
>>>> MAIN:{
>>>> 
>>>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>>                                 -dsn =>
>>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>>>                                 -user => 'guest');
>>>>     my $test_db = $db->segment('4');
>>>>     
>>>>     # Load up the exons into $exons_p
>>>>     for my $gene ($test_db->features(-types => 'gene')){
>>>> 
>>>>         my $exons_p = extractExons($gene);
>>>> 
>>>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>>>> 
>>>>     }
>>>> }
>>>> 
>>>> sub extractExons {
>>>>     my $gene = shift;
>>>>     my %ex_list;
>>>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>>>                                     -attributes =>{Gene => $gene->group});
>>>>               
>>>>     for my $tc (@tcs){
>>>>         my @exons = $tc->features (-type => 'exon',
>>>>                                      -attributes => {Parent => $tc->group}
>>>> );        
>>>>     
>>>>         for (@exons){
>>>>             my $ex_id    = $_->id;
>>>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>>> 
>>>>         }
>>>>     
>>>>     }
>>>>     my @values = values %ex_list;
>>>>     return(\@values);
>>>> }
>>>> 
>>>> sub cluster {
>>>>     my $exons_p = shift;
>>>>     
>>>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>>>             my $exon1 = $exons_p->[$s];
>>>>             my $exon2 = $exons_p->[$t];
>>>>             
>>>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>>>             
>>>>                 my $overlap = $exon1->intersection($exon2);
>>>>               
>>>>                 print "===\n";;
>>>>                 print "ex1\n", $exon1->seq, "\n";
>>>>                 print "ex2\n", $exon2->seq, "\n";
>>>>                 print "overlap\n", $overlap->seq, "\n";
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>> ______________________________
>>>> Marco Blanchette, Ph.D.
>>>> 
>>>> mblanche at uclink.berkeley.edu
>>>> 
>>>> Donald C. Rio's lab
>>>> Department of Molecular and Cell Biology
>>>> 16 Barker Hall
>>>> University of California
>>>> Berkeley, CA 94720-3204
>>>> 
>>>> Tel: (510) 642-1084
>>>> Cell: (510) 847-0996
>>>> Fax: (510) 642-6062
>>> 
>>> 
>> 
>> ______________________________
>> Marco Blanchette, Ph.D.
>> 
>> mblanche at uclink.berkeley.edu
>> 
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>> 
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
> 
> 

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 






More information about the Bioperl-l mailing list