Bioperl: repetitive DNA

Lincoln Stein lstein@cshl.org
Sun, 7 Nov 1999 16:42:56 -0500 (EST)


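No module needed.  Here's a simple one-line regular expression that
masks exact tandem repeats, which covers the simple cases that dust
handles.  It catches all repeats of unit length 1 or greater in which
the unit occurs at least 5 times in a row (the \2{4,} asks for four
or more copies after the first one).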

 $sequence =~ s/((.+)\2{4,})/'N' x length $1/eg;
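
As a quick check of what the substitution does, here is a minimal
sketch that wraps it in a subroutine and runs it on three short test
strings.  The subroutine name mask_tandem_repeats and the test
sequences are made up for illustration; only the s///eg line comes
from the message itself.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): replaces each
# run of exact tandem repeats with the same number of N's.
sub mask_tandem_repeats {
    my ($sequence) = @_;
    # (.+) captures the repeat unit as \2; \2{4,} then requires four or
    # more further copies, so the unit must occur 5+ times in a row.
    $sequence =~ s/((.+)\2{4,})/'N' x length $1/eg;
    return $sequence;
}

my $poly_a     = 'ggc' . ('a'  x 7) . 'tcg';   # unit "a",  7 copies
my $five_at    = 'ccg' . ('at' x 5) . 'cc';    # unit "at", 5 copies
my $four_at    = 'ccg' . ('at' x 4) . 'cc';    # unit "at", only 4 copies

print mask_tandem_repeats($poly_a),  "\n";   # ggc, then 7 N's, then tcg
print mask_tandem_repeats($five_at), "\n";   # ccg, then 10 N's, then cc
print mask_tandem_repeats($four_at), "\n";   # unchanged: too few copies
```

The third case is left alone because four copies of "at" fall one
short of what \2{4,} demands.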

This one occurred to me while writing problems for the CSHL genome
informatics course.

Lincoln 

Alessandro Guffanti writes:
 > Hi. I think a good solution could also be to use NCBI's DUST
 > filter with a suitable cut-off, then retrieve the coordinates
 > of masked sequences through a Perl wrapper, and that's it.
 > You can retrieve DUST from the WU FTP server:
 > 
 > ftp://blast.wustl.edu/pub/dust
 > 
 > >test
 > acgatgacgatgatatatatatatatacataatatatatcacagggga
 > atatatatatcccacataatata
 > 
 > dust test
 > >test
 > acgatgacgatgNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNcc
 > cacataatata
 > 
 > dust test 45
 > >test
 > acgatgacgatgatatatatatatatacataatatatatcacaggggaatatatatatcc
 > cacataatata
 > 
 > 
 > Best Wishes,
 > 
 > Alessandro.
 > 
 > BTW, I think that this could be a good starting point for a
 > "filtering" module. Do you think this would be of interest? It could
 > be a method in a sequence object or a separate module per se. The
 > outcome could be a list of coordinates in the sequence which
 > correspond to masked areas. I would be happy to produce a rough
 > version of this.
 > 
 > 
 > -- 
 > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 >        Alessandro Guffanti - Informatics      
 > The Sanger Centre, Wellcome Trust Genome Campus
 >   Hinxton, Cambridge CB10 1SA, United Kingdom        
 >     phone: +1223-834244 * fax: +1223-494919
 >       http://www.sanger.ac.uk/Users/ag3
 > =========== Bioperl Project Mailing List Message Footer =======
 > Project URL: http://bio.perl.org/
 > For info about how to (un)subscribe, where messages are archived, etc:
 > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
 > ====================================================================
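
The suggestion above of returning a list of coordinates for the
masked areas could be sketched roughly as follows.  This is only an
illustration, not a proposed Bioperl interface: the subroutine name
masked_regions is invented, coordinates are assumed to be 1-based and
inclusive, and the input is assumed to be a single unwrapped sequence
string in which masked bases appear as upper-case N (so wrapped FASTA
lines, such as dust prints, would need joining first, and any N's
already present in the input would be reported as well).

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): returns a list
# of [start, end] pairs, 1-based and inclusive, one per run of N's.
sub masked_regions {
    my ($masked) = @_;
    my @regions;
    while ($masked =~ /N+/g) {
        # $-[0] is the 0-based offset where the match starts and
        # $+[0] is the offset just past its end.
        push @regions, [ $-[0] + 1, $+[0] ];
    }
    return @regions;
}

# Mask a made-up test sequence with the one-liner, then list the regions.
my $sequence = 'ccg' . ('at' x 5) . 'cc';
(my $masked = $sequence) =~ s/((.+)\2{4,})/'N' x length $1/eg;

printf "%d..%d\n", @$_ for masked_regions($masked);   # prints 4..13
```

Because it only looks at the N's, the same subroutine should work on
sequence masked by the regular expression or by dust.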

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================