[Bioperl-l] Suggestion for a new script: bp_repeat_mask_sequence.pl
Dan Bolser
dan.bolser at gmail.com
Tue Jul 6 12:37:05 UTC 2010
Hello,
I'd like to submit a script, 'bp_repeat_mask_sequence.pl'(?), to the
set of scripts in BioPerl. Below is what I have so far. The script
works by reading in a GFF of 'repeat_region's and a fasta file of
sequences. It outputs a fasta sequence file with the repeats replaced
by Xs.
The script clearly needs to be more configurable, but I thought I'd
send it along now to see if I'm working along the right lines, or if I
should be using a different approach.
Comments?
Cheers,
Dan.
#!/usr/bin/perl -w
use strict;
use Getopt::Long;
use Bio::SeqIO;
use Bio::FeatureIO;
## Set options
my $verbose = 0;
my $seq_file;
my $gff_file;
GetOptions
(
"verbose" => \$verbose,
"seq=s" => \$seq_file,
"gff=s" => \$gff_file,
)
or die "failed to parse command line options\n";
die "fail $gff_file : $!\n"
unless -s $gff_file;
## Set up the BioPerl objects
my $seq_reader =
Bio::SeqIO->new( -file => $seq_file,
-format => 'fasta'
);
my $seq_writer =
Bio::SeqIO->new( -fh => \*STDOUT,
-format => 'fasta',
-width => 80
);
my $gff_reader =
Bio::FeatureIO->new( -file => $gff_file,
-format => 'GFF',
);
#warn $seq_reader->width, "\n"; exit;
## Run
my (%repeats, $c);
while ( my $feature = $gff_reader->next_feature() ) {
if($verbose>1){
print
join("\t", #$feature,
$feature->seq_id,
$feature->type->name,
$feature->start,
$feature->end,
), "\n";
}
if($feature->type->name eq 'repeat_region'){
$c++;
push @{$repeats{ $feature->seq_id }},
[$feature->start,
$feature->end];
}
# Debugging
#last if $c > 100;
}
warn "read $c repeat_region features for ",
scalar keys(%repeats), " sequences\n";
##
while(my $seq = $seq_reader->next_seq){
my $id = $seq->id;
my $sequence = $seq->seq;
print $id, "\n"
if $verbose > 0;
print length($sequence), "\n"
if $verbose > 0;
for my $region (@{$repeats{ $id }}){
my ($start, $end) = @$region;
print "$start\t$end\n"
if $verbose > 1;
substr($sequence, $start, $end - $start, 'X' x ($end - $start));
}
print length($sequence), "\n"
if $verbose > 0;
$seq->seq($sequence);
$seq_writer->write_seq($seq);
# Debugging;
#last;
}
warn "OK\n";
More information about the Bioperl-l
mailing list