[Bioperl-l] [BioPerl]RepeatMasker issues

Jason Stajich jason.stajich at gmail.com
Wed Apr 27 15:31:26 UTC 2011


I guess it fails to produce a .out file not an empty .out file if there are no repeats, not sure if this is intended or a bug in RM.

BTW  - your approach to use this module will likely be pretty inefficient as you will initiate 100,000 runs while you could just run repeatmasker on a single fasta file of your 100bp seqs as a single instance (or some smaller chunks that you farm out to the cluster). Then you just need to parse one repeatmasker output file (or concatenate all the ones from the chunks you sent off to the cluster). This seems much more sane than this one at a time approach.

However, If you want to still do it this way, you need to fix the module so that it doesn't throw on empty file - at around line 260 you can change that throw to a warn and return from the function.  

e.g. in RepeatMasker.pm you could change this:
unless (open (RM, $outfile)) {
      $self->throw("Cannot open RepeatMasker outfile for parsing");
  }

to something like this
unless (open (RM, $outfile)) {
      $self->warn("Cannot open RepeatMasker outfile for parsing");
      return;
  }


You should also be checking to see that the run succeeded  in your code
my @rpt_features = $factory->run($seq) 

if( @rpt_features ) { # test that there are > 0 elements, then we have repeats to report
 print $factory->masked_seq->seq, "\n";
}
-- or just run RepeatMasker on the cmdline 
On Apr 27, 2011, at 3:55 AM, Graham Hamilton wrote:

> I am writing a script to run RepeatMasker on several hundred thousand sequences of ~100 bp. I have installed RepeatMasker and have a script that will read in a single fasta sequence file and print out the masked sequence.
> 
> use Bio::Tools::Run::RepeatMasker;
> use Bio::SeqIO;
> 
> my $factory = Bio::Tools::Run::RepeatMasker->new(-species => 'human');
> my $in  = Bio::SeqIO->new(-file => "test.fa",
>                      -format => 'fasta');
> my $seq = $in->next_seq();
> $factory->run($seq);
> my $masked_seq = $factory->masked_seq;
> print  $masked_seq->seq;
> 
> The script works for sequences that contain repeats. Unfortunately, when I use a sequence without repeats I get the following error.
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Cannot open RepeatMasker outfile for parsing
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.12.3/Bio/Root/Root.pm:472
> STACK: Bio::Tools::Run::RepeatMasker::_run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:308
> STACK: Bio::Tools::Run::RepeatMasker::run /usr/local/lib/perl5/site_perl/5.12.3/Bio/Tools/Run/RepeatMasker.pm:260
> STACK: RepeatMaskerTest.pl:22
> -----------------------------------------------------------
> 
> As I want to screen a file of many short sequences, most will contain repeats but not all. I want to keep the sequences that do not contain repeats for further investigation. This is a problem for me as the Exception exits the script. 
> 
> I assume that I am doing something wrong, can anyone give me some hints as to how I can get this to work?
> 
> Regards
> 
> Graham
> 
> 
> Dr Graham Hamilton
> The Sir Henry Wellcome Functional Genomics Facility
> Room B3-28
> Joseph Black Building
> Research Institute of Molecular, Cell & Systems Biology
> College of Medical, Veterinary & Life Sciences
> University of Glasgow
> Glasgow
> Scotland
> G12 8QQ
> 
> T: +44 141 330 6212
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list