[bioperl-l] changed Bio::SeqIO::scf

ecky@e-lehmann.de ecky@e-lehmann.de
Thu, 11 Apr 2002 16:51:18 +0200


Hi,

I made some changes to the Bio::SeqIO::scf module for myself because I
need to write base variants to scf files.

When I now pass a SeqWithQuality object to the
Bio::SeqIO::scf::write_seq() method that contains a sequence string with
IUPAC coded letters, the .scf file is written so that the traces overlap
at every point where a variant (a M,R,W,S,T,Y,K,V,H,D or B) appears in
the sequence string. The scf module of bioperl 1.0 wasn't able to write
.scf files this way before.
If I run phredPhrap and consed on these .scf files, consed will
recognise heterocygote variants at the special mentioned positions.

For the changes I took bioperl 1.0, not an cvs version, and it is the
very first time I changed resp. extended the bioperl source code.
So my question: is it worth to take this little change over to the
official bioperl cvs version? If not, don't regard this mail...

The code for the changed scf module can be found at
http://e-lehmann.de/perl/scf.pm
At the end of this mail I explain shortly what I have done to the
module.


Eckhard



Here is what I changed:

sub _set_binary_tracesbases
There is a section in this method that contains 5 times almost the same
if statement, which begins with:
if ($current_base eq "A"){ ... }
It is for setting up the trace-peaks for every base. I took all those if
statements out and created a extra sub routine:
----
sub _set_base_traces {
  my ($self, $base, $place_at, $peak_qual, $is_variant) = @_;
  $base =~ tr/A-Z/a-z/;

  my $half_ramp = int($self->{'text'}->{'ramp_width'}/2);
  my $ramp_pos = $place_at - $half_ramp;
  for (my $i = 0; $i < $self->{'text'}->{'ramp_width'};  $i++) {
    $self->{'text'}->{"samples_$base"}[$ramp_pos+$i] = $peak_qual *
$self->{'text'}->{'ramp'}[$i];
  }

  if ($base eq "a") {push
@{$self->{'text'}->{'v2_bases'}},($place_at+1,$peak_qual,0,0,0,$base,0,0,0);}
  elsif ($base eq "c") {push
@{$self->{'text'}->{'v2_bases'}},($place_at+1,0,$peak_qual,0,0,$base,0,0,0);}
  elsif ($base eq "g") {push
@{$self->{'text'}->{'v2_bases'}},($place_at+1,0,0,0,$peak_qual,$base,0,0,0);}
  elsif ($base eq "t") {push
@{$self->{'text'}->{'v2_bases'}},($place_at+1,$peak_qual,0,0,0,$base,0,0,0);}

  push @{$self->{'text'}->{"v3_base_accuracy_$base"}},$peak_qual unless
$is_variant;

  if (!$is_variant) {
    my $ba = "acgt";
    $ba =~ s/$base//;
    my @baarray = split(//,$ba);

    my $b;
    foreach $b(@baarray) {
      push @{$self->{'text'}->{"v3_base_accuracy_$b"}},0;
    }
  }
}
---------

furthermore I call the routine in _set_binary_tracesbases() with if
statements for every (M,R,W,S,T,Y,K,V,H,D,B) like this:
----------
if ($current_base eq "M") {
      #modify the peak_quality for the ambiguous base at this point
      #this time it is 50% subtracted from the peak_quality for the
major allele base
      #the line is very similar in all the following if statements...
      my $mod_peak_qual = int ($peak_quality * (1 - 0.5));

      $self->_set_base_traces("A", $place_base_at, $peak_quality);
      $self->_set_base_traces("C", $place_base_at, $mod_peak_qual, 1);
    }
---------
and at least
---------
#no heterocygote variant at this position in the sequence
    else {$self->_set_base_traces($current_base, $place_base_at,
$peak_quality);}
---------