[Bioperl-l] Bio::SeqIO::scf traces scrambled?
Chris Fields
cjfields at illinois.edu
Thu Jun 18 15:31:08 UTC 2009
Charles,
The best way to make sure this is addressed is to file a ticket (bug
report) on it so we can properly track it. I have a local
installation of io_lib and I believe we also have Geneious installed
locally (both of which read SCF), so I can work on confirming that.
If it stays on the list it may not get answered and a possible bug
report will be lost (to possibly bite someone else later).
AFAIK this module doesn't use staden::read but is pure perl. You are
more than welcome to try out Bio::SeqIO::staden::read, but I have to
warn you that most of us are looking at replacing it's functionality
at some point with BioLib bindings to io_lib (more stable) and so we
don't intend on following up with bug fixes.
Note: there is also Bio::SCF (non-bp):
http://search.cpan.org/~lds/Bio-SCF-1.01/
chris
On Jun 18, 2009, at 8:38 AM, Charles Tilford wrote:
> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace
> channels. Can anyone confirm?
>
> Hi all,
>
> I'm using the SCF Bio::SeqIO module to parse trace data out of
> chromatograms. The SCF files are being produced by phred using the "-
> cd" parameter. The traces come out great, and the corresponding base
> calls from the .phd files align with the peaks wonderfully when I
> visualize them on a rendered trace. However, only the A bases align
> to the appropriate trace channel, the rest are mixed up. I find that
> if I do the following re-mapping, the phred base calls match the
>
> SeqIO : Remapped
> A : A
> C : G
> G : T
> T : C
>
> The relevant part of Bio::SeqIO::scf is here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9
>
> ... which indicates that it expects the pack()ed trace data to be in
> order ATGC. The base call parsing code is here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8
>
> ... which is unpacking in order ACGT. As far as I can tell, the
> relevant official SCF documentation is here:
>
> http://staden.sourceforge.net/manual/formats_unix_4.html
>
> ... which indicates that both trace and base order should be ACGT
> (matching the SeqIO unpack() for bases, but not traces). My
> empirical channel unscrambling mapping implies order ACTG, which is
> different from either of the two orders above. The sequence from the
> SCF file (should be that from original AB1 file, I think) is not
> perfectly identical to that called by phred, but is very similar (to
> be expected); that is, I don't need to remap C, G and T to get it to
> align with the phred data.
>
> So it looks like the SeqIO module is not mapping the sections of the
> packed trace data to the appropriate bases. The unpack order is
> different than the staden documentation ... but so is the order I
> impose to correct the problem. I am still unclear as to the
> differences between V2 and V3 of the format. The major difference
> appears to be coding the trace absolutely (V2) or relatively to
> prior values (V3); I'd expect if I was using one format and SeqIO
> was trying to parse the other that I would get garbage out. Running
> in verbose reports "scf.pm is working with a version 2 scf."
>
> Thoughts on this would be appreciated - can anyone confirm a problem
> with trace extraction from SCF?
>
> I'm hoping that once I convince our admin to (properly) install
> staden::read that I can work directly with the ab1 files, but I need
> to stopgap on SCF for the time being....
>
> -CAT
More information about the Bioperl-l
mailing list