[Bioperl-l] Suggestion for a new script: bp_repeat_mask_sequence.pl
Brian Osborne
bosborne11 at verizon.net
Tue Jul 6 14:50:25 UTC 2010
Dan,
There are 2 different directories for scripts, examples/ and scripts/. The examples/ directory accepts any sort of script. The scripts/ directory scripts can be installed when Bioperl is installed, if the user wishes. The guidelines are that scripts/ directory scripts should accept command-line arguments (which yours does), should be named with the suffix 'PLS', and should have POD documentation. So, all you need is some POD. Here's some example POD:
=head1 NAME
bioflat_index.pl - index sequence files using Bio::DB::Flat
=head1 DESCRIPTION
Create or update a biological sequence database indexed with the
Bio::DB::Flat indexing scheme. The arguments are a list of flat files
containing the sequence information to be indexed.
=head1 USAGE
bioflat_index.pl <options> file1 file2 file3...
Options:
--create Create or reinitialize the index. If not specified,
the index must already exist.
--format <format> The format of the sequence files. Must be one
of "genbank", "swissprot", "embl" or "fasta".
--location <path> Path to the directory in which the index files
are stored.
--dbname <name> The symbolic name of the database to be created.
--indextype <type> Type of index to create. Either "bdb" or "flat".
"binarysearch" is the same as "flat".
Options can be abbreviated. For example, use -i for --indextype.
The following environment variables will be used as defaults if the
corresponding options are not provided:
OBDA_FORMAT format of sequence file
OBDA_LOCATION path to directory in which index files are stored
OBDA_DBNAME name of database
OBDA_INDEX type of index to create
=cut
On Jul 6, 2010, at 2:37 PM, Dan Bolser wrote:
> Hello,
>
> I'd like to submit a script, 'bp_repeat_mask_sequence.pl'(?), to the
> set of scripts in BioPerl. Below is what I have so far. The script
> works by reading in a GFF of 'repeat_region's and a fasta file of
> sequences. It outputs a fasta sequence file with the repeats replaced
> by Xs.
>
> The script clearly needs to be more configurable, but I thought I'd
> send it along now to see if I'm working along the right lines, or if I
> should be using a different approach.
>
> Comments?
>
>
> Cheers,
> Dan.
>
>
>
> #!/usr/bin/perl -w
>
> use strict;
> use Getopt::Long;
>
> use Bio::SeqIO;
> use Bio::FeatureIO;
>
>
>
> ## Set options
>
> my $verbose = 0;
> my $seq_file;
> my $gff_file;
>
> GetOptions
> (
> "verbose" => \$verbose,
> "seq=s" => \$seq_file,
> "gff=s" => \$gff_file,
> )
> or die "failed to parse command line options\n";
>
> die "fail $gff_file : $!\n"
> unless -s $gff_file;
>
>
>
> ## Set up the BioPerl objects
>
> my $seq_reader =
> Bio::SeqIO->new( -file => $seq_file,
> -format => 'fasta'
> );
>
> my $seq_writer =
> Bio::SeqIO->new( -fh => \*STDOUT,
> -format => 'fasta',
> -width => 80
> );
>
> my $gff_reader =
> Bio::FeatureIO->new( -file => $gff_file,
> -format => 'GFF',
> );
>
> #warn $seq_reader->width, "\n"; exit;
>
>
>
> ## Run
>
> my (%repeats, $c);
>
> while ( my $feature = $gff_reader->next_feature() ) {
> if($verbose>1){
> print
> join("\t", #$feature,
> $feature->seq_id,
> $feature->type->name,
> $feature->start,
> $feature->end,
> ), "\n";
> }
>
> if($feature->type->name eq 'repeat_region'){
> $c++;
> push @{$repeats{ $feature->seq_id }},
> [$feature->start,
> $feature->end];
> }
>
> # Debugging
> #last if $c > 100;
> }
>
> warn "read $c repeat_region features for ",
> scalar keys(%repeats), " sequences\n";
>
>
>
> ##
>
> while(my $seq = $seq_reader->next_seq){
> my $id = $seq->id;
> my $sequence = $seq->seq;
>
> print $id, "\n"
> if $verbose > 0;
>
> print length($sequence), "\n"
> if $verbose > 0;
>
> for my $region (@{$repeats{ $id }}){
> my ($start, $end) = @$region;
> print "$start\t$end\n"
> if $verbose > 1;
>
> substr($sequence, $start, $end - $start, 'X' x ($end - $start));
> }
>
> print length($sequence), "\n"
> if $verbose > 0;
>
> $seq->seq($sequence);
>
> $seq_writer->write_seq($seq);
>
> # Debugging;
> #last;
> }
>
> warn "OK\n";
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list