[Bioperl-l] check for the continuous segments to extract the sequences

Fri Apr 27 13:52:10 UTC 2007

Gopu/Jason,

Another option is Set::IntSpan, available on CPAN at
http://search.cpan.org/~swmcd/Set-IntSpan-1.11/IntSpan.pm

Here's a Perl one-liner that shows how easy it is:

perl -MSet::IntSpan -e 'my @array = (1, 1000, 1001, 2000, 4001, 5000,
  5001, 6000, 6001, 7000, 7001, 8000, 12001, 13000);
  my $is = Set::IntSpan->new;
  while (@array) { $is->U(shift(@array) . "-" . shift(@array)) }
  print $is;'
1-2000,4001-8000,12001-13000
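
If installing a CPAN module is not an option, the same merge can be
sketched in core Perl. This is only a rough illustration, not code from
Set::IntSpan or BioPerl: the helper name merge_ranges is made up, and it
assumes a flat list of integer start/end pairs with start <= end.

```perl
use strict;
use warnings;

# Flat list of start/end pairs, as in the original question.
my @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
             6001, 7000, 7001, 8000, 12001, 13000);

# merge_ranges is a hypothetical helper, not part of any module.
# It takes a flat (start, end, start, end, ...) list and returns a list
# of [start, end] pairs with overlapping or adjacent ranges joined.
sub merge_ranges {
    my @flat = @_;
    die "odd number of coordinates\n" if @flat % 2;

    # Pair up the coordinates, then sort the pairs by start position.
    my @pairs;
    while (@flat) {
        my ($start, $end) = splice(@flat, 0, 2);
        push @pairs, [$start, $end];
    }
    @pairs = sort { $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] } @pairs;

    my @merged;
    for my $p (@pairs) {
        if (@merged and $p->[0] <= $merged[-1][1] + 1) {
            # Touches or overlaps the previous range: extend it.
            $merged[-1][1] = $p->[1] if $p->[1] > $merged[-1][1];
        }
        else {
            # Gap before this range: start a new merged range.
            push @merged, [@$p];
        }
    }
    return @merged;
}

my $run_list = join ",", map { "$_->[0]-$_->[1]" } merge_ranges(@array);
print "$run_list\n";    # prints 1-2000,4001-8000,12001-13000
```

Sorting first means the input pairs need not already be in order. The
"+ 1" in the comparison is what joins 1-1000 with 1001-2000; without
it, only ranges that truly overlap would be merged.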

I use it all the time to great effect and have utility functions that
convert between BioPerl split locations and IntSpans.
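
A rough sketch of what such a conversion could look like follows. It is
not the utility code mentioned above: the helper names split_to_intspan
and intspan_to_split are hypothetical, strand is ignored, the split
location is assumed to be flat (no nested splits), and the set is
assumed to be finite with positive coordinates.

```perl
use strict;
use warnings;
use Set::IntSpan;
use Bio::Location::Simple;
use Bio::Location::Split;

# Hypothetical helper: Bio::Location::Split -> Set::IntSpan.
# Strand information is dropped.
sub split_to_intspan {
    my ($split) = @_;
    my $set = Set::IntSpan->new;
    for my $loc ($split->sub_Location) {
        # Same in-place union call as in the one-liner above.
        $set->U($loc->start . "-" . $loc->end);
    }
    return $set;
}

# Hypothetical helper: Set::IntSpan -> Bio::Location::Split.
# Parses the run list string, so it assumes positive integers only.
sub intspan_to_split {
    my ($set) = @_;
    my $split = Bio::Location::Split->new;
    return $split if $set->empty;
    for my $run (split /,/, $set->run_list) {
        my ($start, $end) = split /-/, $run;
        $end = $start unless defined $end;    # single-position run
        $split->add_sub_Location(
            Bio::Location::Simple->new(-start => $start, -end => $end));
    }
    return $split;
}

# Round trip: adjacent sub-locations collapse once they are in the set.
my $in = Bio::Location::Split->new;
$in->add_sub_Location(Bio::Location::Simple->new(-start => $_->[0],
                                                 -end   => $_->[1]))
    for ([1, 1000], [1001, 2000], [4001, 5000]);

my $set = split_to_intspan($in);
my @out = intspan_to_split($set)->sub_Location;
print $set->run_list, "\n";
print scalar(@out), " sub-locations after merging\n";
```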

There is another module which extends it nicely, Set::IntSpan::Island,
worth a gander.

Cheers,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Thursday, April 26, 2007 8:55 PM
> To: gopu_36
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] check for the continuous segments to
> extract the sequences
> 
> You want a connectivity algorithm.  One can be found on
> perlmonks.org, and another is the collapse_nums() method in
> Bio::Search::SearchUtils.  You'll have to modify aspects of it to
> deal with ranges.
> 
> Good luck.
> -jason
> On Apr 26, 2007, at 6:29 PM, gopu_36 wrote:
> 
> >
> > As a newbie to programming, thanks for the support from this group.
> > Please ignore this message if it is not relevant to the group, as my
> > problem may be a typical computer science recursive one (I am not
> > sure).
> >
> > I have an array like
> > @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
> >           6001, 7000, 7001, 8000, 12001, 13000);
> > The array gives the positions of sequences: the first element, 1, is
> > a start position, the second element, 1000, is the end of that
> > sequence, and so on. Elements at even indices (0, 2, 4, ...) are the
> > start positions, and elements at odd indices (values such as 1000,
> > 2000, 5000) are the END positions of the sequences to be retrieved.
> > Basically, I have to check whether any continuous segments lie in the
> > list and join them together to form one whole chunk. For example,
> > 1-1000 and 1001-2000 can be joined to extract the sequence from
> > 1-2000. In the same way, 4001-8000 should be extracted, then
> > 12001-13000, and so on. As I said earlier, after checking the
> > positions, I will be able to extract that part of the sequence from a
> > whole genome. Thanks for taking your time. Any tip or help would be
> > greatly appreciated.
> >
> > Regards
> > Gopu
> > -- 
> > View this message in context: http://www.nabble.com/check-for-the-continous-segments-to-extract-the-sequences-tf3655281.html#a10211951
> > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/