[Bioperl-l] Bio::SeqIO::scf traces scrambled?
Cook, Malcolm
MEC at stowers.org
Thu Jun 18 15:42:48 UTC 2009
Charles,
Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABIs kb-basecaller turned on, is to use Bio::Trace::ABIF
http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
It works great and includes implementation of ABIs algorithm allowing to (re)compute trace clear ranges using kc-basecallers quality scores and any windowing/quality parameters.
Its not in the bioperl project but it is an easy install from CPAN.
I am familiar with staden::read installation woes.
Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....
Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
#!/usr/bin/env perl
# PURPOSE: extract from AB1 files into fasta format the sequence in
# the 'clear range' defined by 3 parameters. If there is no clear
# range, emit warning and skip the sequence. The fasta 'defline'
# identifier is taken as the sample name. Other useful attributes are
# also embedded into the defline using attribute=value syntax.
# USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1
# NOTE: 20 4 20 is ABI default settings
# EXAMPLE:
# ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta
# AUTHOR: malcolm_cook at stowers-institute.org
use strict;
use warnings;
use Bio::Trace::ABIF;
use Text::Wrap qw(wrap);
$Text::Wrap::columns = 72; # wrap the sequence
use File::Basename;
my ($window_width,
$bad_bases_threshold,
$quality_threshold,
@ARGV) = @ARGV;
my $abif = Bio::Trace::ABIF->new();
sub main {} {
foreach (@ARGV) {
$abif->open_abif($_) or die "error opening $_ as ABIF";
my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
$bad_bases_threshold,
$quality_threshold
);
my $sample_score = $abif->sample_score(
$window_width,
$bad_bases_threshold,
$quality_threshold
);
# my $contiguous_read_length = $abif->contiguous_read_length($window_width,
# $quality_threshold,
# 0, # ==> trim_ends
# );
# my $length_of_read = $abif->length_of_read(
# $window_width,
# $quality_threshold,
# # $method
# );
my $defline =
join "\t",
$abif->sample_name,
#basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
#$abif->container_identifier . ':' . $abif->well_id, # or this, for container:well_id formatted defline identifiers
(map {my $method = $_;
"$method=". ($abif->$method() || '')}
qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
# sample_tracking_id - don't use this - it is internal to ABI software
"clear_range_start=$clear_range_start",
"clear_range_stop=$clear_range_stop",
"sample_score=$sample_score",
#"contiguous_read_length=$contiguous_read_length",
#"length_of_read=$length_of_read",
;
if ($clear_range_start == -1) {
warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
next;
}
my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
print ">$defline\n$seq\n";
$abif->close_abif();
}
}
main ();
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Charles Tilford
> Sent: Thursday, June 18, 2009 8:39 AM
> To: BioPerl List
> Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
>
> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace
> channels.
> Can anyone confirm?
>
> Hi all,
>
> I'm using the SCF Bio::SeqIO module to parse trace data out
> of chromatograms. The SCF files are being produced by phred
> using the "-cd"
> parameter. The traces come out great, and the corresponding
> base calls from the .phd files align with the peaks
> wonderfully when I visualize them on a rendered trace.
> However, only the A bases align to the appropriate trace
> channel, the rest are mixed up. I find that if I do the
> following re-mapping, the phred base calls match the
>
> SeqIO : Remapped
> A : A
> C : G
> G : T
> T : C
>
> The relevant part of Bio::SeqIO::scf is here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B
> io/SeqIO/scf.html#CODE9
>
> ... which indicates that it expects the pack()ed trace data
> to be in order ATGC. The base call parsing code is here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B
> io/SeqIO/scf.html#CODE8
>
> ... which is unpacking in order ACGT. As far as I can tell,
> the relevant official SCF documentation is here:
>
> http://staden.sourceforge.net/manual/formats_unix_4.html
>
> ... which indicates that both trace and base order should be
> ACGT (matching the SeqIO unpack() for bases, but not traces).
> My empirical channel unscrambling mapping implies order ACTG,
> which is different from either of the two orders above. The
> sequence from the SCF file (should be that from original AB1
> file, I think) is not perfectly identical to that called by
> phred, but is very similar (to be expected); that is, I don't
> need to remap C, G and T to get it to align with the phred data.
>
> So it looks like the SeqIO module is not mapping the sections
> of the packed trace data to the appropriate bases. The unpack
> order is different than the staden documentation ... but so
> is the order I impose to correct the problem. I am still
> unclear as to the differences between
> V2 and V3 of the format. The major difference appears to be
> coding the trace absolutely (V2) or relatively to prior
> values (V3); I'd expect if I was using one format and SeqIO
> was trying to parse the other that I would get garbage out.
> Running in verbose reports "scf.pm is working with a version 2 scf."
>
> Thoughts on this would be appreciated - can anyone confirm a
> problem with trace extraction from SCF?
>
> I'm hoping that once I convince our admin to (properly)
> install staden::read that I can work directly with the ab1
> files, but I need to stopgap on SCF for the time being....
>
> -CAT
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list