[Bioperl-l] Working with SeqWithQuality

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Thu Jun 19 15:28:24 UTC 2008


Hi Roy

That's exactly what I want, thanks, it just wasn't where I was expecting
it

Mick 

-----Original Message-----
From: Roy Chaudhuri [mailto:rrc22 at cam.ac.uk] 
Sent: 19 June 2008 16:27
To: michael watson (IAH-C)
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Working with SeqWithQuality

Hi Mick,

I think you want Bio::Tools::Alignment::Trim. There are lots of caveats 
in the POD but as I recall it works fine.

Here is a small script I wrote to return contigs over a certain length 
and a certain minimum quality (not exactly what you asked for but I'm 
sure it could be adapted).

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
use Bio::Tools::Alignment::Trim;
use Getopt::Long;
die "Usage: trimqual fasta_file qual_file -qual 20 -window 10 -min 
500\n" unless $ARGV[1];
my $phred=20;
my $window=10;
my $min=500;
GetOptions ('phred|qual=i'=>\$phred,
             'window=i'=>\$window,
             'minimum=i'=>\$min) or die "Unrecognised option\n";
my $fasta=Bio::SeqIO->newFh(-file=>$ARGV[0], -format=>'fasta');
my $quality=Bio::SeqIO->newFh(-file=>$ARGV[1], -format=>'qual');
my $out=Bio::SeqIO->newFh(-format=>'fasta');
my $count=0;
while (<$fasta>) {
     my $seq=$_;
     my $qual=<$quality>;
     my $trim=Bio::Tools::Alignment::Trim->new(-phreds=>$phred,
                                               -windowsize=>$window);
     my ($start,$end) = @{$trim->trim_singlet($seq->seq,
                                              join(' ',@{$qual->qual}),
 
$seq->display_id,'singlet')};
     next if $end==0;
     my $trimmed_sequence=substr($seq->seq, $start, $end+1-$start);
     my $length=length $trimmed_sequence;
     print STDERR $seq->display_id, ": $start-$end [$length bp]";
     if ($length<$min) {
         print STDERR " - skipped\n";
         next;
     } else {
         print STDERR "\n";
     }
     my $trimseq=Bio::Seq->new(-seq=>$trimmed_sequence,
                               -id=>$seq->display_id);
     print $out $trimseq;
     $count++;
}
print STDERR "\nFinished. $count contigs larger than $min bp. were 
found.\n";


Cheers,
Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.


michael watson (IAH-C) wrote:
> Hi
> 
> I have fasta files and separate quality fasta files.  I understand all
> about constructing a SeqWithQuality object, that's the easy bit, but
are
> there no functions for actually manipulating the sequence based on the
> quality values?  For example, trimming the ends where a moving window
> average quality does not go above a certain value, or masking areas
with
> low quality?  
> 
> Thanks
> Mick
> 
> The information contained in this message may be confidential or
legally
> privileged and is intended solely for the addressee. If you have
> received this message in error please delete it & notify the
originator
> immediately.
> Unauthorised use, disclosure, copying or alteration of this message is
> forbidden & may be unlawful. 
> The contents of this e-mail are the views of the sender and do not
> necessarily represent the views of the Institute. 
> This email and associated attachments has been checked locally for
> viruses but we can accept no responsibility once it has left our
> systems.
> Communications on Institute computers are monitored to secure the
> effective operation of the systems and for other lawful purposes. 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list