[Bioperl-l] Working with SeqWithQuality

Roy Chaudhuri roy.chaudhuri at gmail.com
Thu Jun 19 15:44:37 UTC 2008


Hi Mick,

I think you want Bio::Tools::Alignment::Trim. There are lots of caveats
in the POD but as I recall it works fine.

Here is a small script I wrote to return contigs over a certain length
and a certain minimum quality (not exactly what you asked for but I'm
sure it could be adapted).

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
use Bio::Tools::Alignment::Trim;
use Getopt::Long;
die "Usage: trimqual fasta_file qual_file -qual 20 -window 10 -min
500\n" unless $ARGV[1];
my $phred=20;
my $window=10;
my $min=500;
GetOptions ('phred|qual=i'=>\$phred,
              'window=i'=>\$window,
              'minimum=i'=>\$min) or die "Unrecognised option\n";
my $fasta=Bio::SeqIO->newFh(-file=>$ARGV[0], -format=>'fasta');
my $quality=Bio::SeqIO->newFh(-file=>$ARGV[1], -format=>'qual');
my $out=Bio::SeqIO->newFh(-format=>'fasta');
my $count=0;
while (<$fasta>) {
      my $seq=$_;
      my $qual=<$quality>;
      my $trim=Bio::Tools::Alignment::Trim->new(-phreds=>$phred,
                                                -windowsize=>$window);
      my ($start,$end) = @{$trim->trim_singlet($seq->seq,
                                               join(' ',@{$qual->qual}),
                                               $seq->display_id,'singlet')};
      next if $end==0;
      my $trimmed_sequence=substr($seq->seq, $start, $end+1-$start);
      my $length=length $trimmed_sequence;
      print STDERR $seq->display_id, ": $start-$end [$length bp]";
      if ($length<$min) {
          print STDERR " - skipped\n";
          next;
      } else {
          print STDERR "\n";
      }
      my $trimseq=Bio::Seq->new(-seq=>$trimmed_sequence,
                                -id=>$seq->display_id);
      print $out $trimseq;
      $count++;
}
print STDERR "\nFinished. $count contigs larger than $min bp. were
found.\n";


Cheers,
Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.


michael watson (IAH-C) wrote:
> Hi
> 
> I have fasta files and separate quality fasta files.  I understand all
> about constructing a SeqWithQuality object, that's the easy bit, but are
> there no functions for actually manipulating the sequence based on the
> quality values?  For example, trimming the ends where a moving window
> average quality does not go above a certain value, or masking areas with
> low quality?  
> 
> Thanks
> Mick
> 
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee. If you have
> received this message in error please delete it & notify the originator
> immediately.
> Unauthorised use, disclosure, copying or alteration of this message is
> forbidden & may be unlawful. 
> The contents of this e-mail are the views of the sender and do not
> necessarily represent the views of the Institute. 
> This email and associated attachments has been checked locally for
> viruses but we can accept no responsibility once it has left our
> systems.
> Communications on Institute computers are monitored to secure the
> effective operation of the systems and for other lawful purposes. 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list