Bioperl: repetitive DNA

Alessandro Guffanti ag3@sanger.ac.uk
Thu, 09 Sep 1999 09:26:39 +0100


Hi. I think another good solution would be to run NCBI's DUST
filter with a suitable cut-off, then retrieve the coordinates
of the masked regions through a Perl wrapper, and you're done.
You can get DUST from the WU FTP server:

ftp://blast.wustl.edu/pub/dust

For example, given a FASTA file named test:

>test
acgatgacgatgatatatatatatatacataatatatatcacagggga
atatatatatcccacataatata

Running DUST without an explicit cut-off masks the AT-rich repeat:

dust test
>test
acgatgacgatgNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNcc
cacataatata

With a cut-off of 45, nothing is masked:

dust test 45
>test
acgatgacgatgatatatatatatatacataatatatatcacaggggaatatatatatcc
cacataatata
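
The coordinate-retrieval step could look roughly like the sketch below. It is written in Python only to keep it short; a Perl wrapper would follow the same logic. The function names read_fasta and masked_regions are hypothetical and are not part of DUST or Bioperl. It assumes the DUST output has already been captured as text and that masked bases are written as N, as in the example above.

```python
import re


def read_fasta(text):
    """Return {name: sequence} from FASTA-formatted text (hypothetical helper)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def masked_regions(seq, mask_char="N"):
    """Return 1-based inclusive (start, end) pairs for each run of mask_char."""
    pattern = re.escape(mask_char) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    # Made-up masked output, not real DUST results.
    dust_output = """>example
acgtacgtNNNNNNNNac
gtNNNNacgt
"""
    for name, seq in read_fasta(dust_output).items():
        print(name, masked_regions(seq))
```

Note that this cannot tell an N introduced by DUST from an N that was already in the input; comparing against the unmasked sequence would be needed for that.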


Best Wishes,

Alessandro.

BTW, I think this could be a good starting point for a "filtering"
module. Do you think this would be of interest? It could be a
method in a sequence object or a separate module per se. The output
could be a list of coordinates in the sequence which correspond to
the masked regions. I would be happy to produce a rough version of this.


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       Alessandro Guffanti - Informatics      
The Sanger Centre, Wellcome Trust Genome Campus
  Hinxton, Cambridge CB10 1SA, United Kingdom        
    phone: +1223-834244 * fax: +1223-494919
      http://www.sanger.ac.uk/Users/ag3
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================