[Bioperl-l] Bio::SeqIO::scf traces scrambled?

Chris Fields cjfields at illinois.edu
Thu Jun 18 15:31:08 UTC 2009


Charles,

The best way to make sure this is addressed is to file a ticket (bug  
report) on it so we can properly track it.  I have a local  
installation of io_lib and I believe we also have Geneious installed  
locally (both of which read SCF), so I can work on confirming that.   
If it stays on the list it may not get answered and a possible bug  
report will be lost (to possibly bite someone else later).

AFAIK this module doesn't use staden::read but is pure Perl.  You are  
more than welcome to try out Bio::SeqIO::staden::read, but I have to  
warn you that most of us are looking at replacing its functionality  
at some point with BioLib bindings to io_lib (more stable), so we  
don't intend to follow up with bug fixes.

Note: there is also Bio::SCF (non-bp):

http://search.cpan.org/~lds/Bio-SCF-1.01/

chris

On Jun 18, 2009, at 8:38 AM, Charles Tilford wrote:

> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace  
> channels. Can anyone confirm?
>
> Hi all,
>
> I'm using the SCF Bio::SeqIO module to parse trace data out of  
> chromatograms. The SCF files are being produced by phred using the  
> "-cd" parameter. The traces come out great, and the corresponding base  
> calls from the .phd files align with the peaks wonderfully when I  
> visualize them on a rendered trace. However, only the A bases align  
> to the appropriate trace channel, the rest are mixed up. I find that  
> if I do the following re-mapping, the phred base calls match the  
> trace channels:
>
> SeqIO : Remapped
> A : A
> C : G
> G : T
> T : C
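
A minimal sketch of that relabelling, written in Python rather than Perl so
that it is not mistaken for anything inside Bio::SeqIO::scf. It assumes the
table reads "label reported by SeqIO : base the channel really belongs to";
REMAP, remap_channels and the toy sample values are hypothetical.

```python
# Relabelling taken from the table in the quoted message.
# Assumption: key = label reported by the parser, value = corrected label.
REMAP = {"A": "A", "C": "G", "G": "T", "T": "C"}


def remap_channels(traces):
    """Return a new dict with the same sample lists under corrected labels."""
    # Guard against a mapping that would drop or duplicate a channel.
    if sorted(REMAP.values()) != sorted(REMAP.keys()):
        raise ValueError("REMAP must be a permutation of the base labels")
    return {REMAP[label]: samples for label, samples in traces.items()}


if __name__ == "__main__":
    # Toy data only; real traces hold one intensity per trace point.
    parsed = {"A": [5, 9, 2], "C": [0, 1, 7], "G": [3, 3, 3], "T": [8, 0, 1]}
    for base, samples in sorted(remap_channels(parsed).items()):
        print(base, samples)
```

This only relabels the four channels for plotting; it says nothing about
where in the parser the orders diverge.
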
>
> The relevant part of Bio::SeqIO::scf is here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9
>
> ... which indicates that it expects the pack()ed trace data to be in  
> order ATGC. The base call parsing code is here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8
>
> ... which is unpacking in order ACGT. As far as I can tell, the  
> relevant official SCF documentation is here:
>
> http://staden.sourceforge.net/manual/formats_unix_4.html
>
> ... which indicates that both trace and base order should be ACGT  
> (matching the SeqIO unpack() for bases, but not traces). My  
> empirical channel unscrambling mapping implies order ACTG, which is  
> different from either of the two orders above. The sequence from the  
> SCF file (should be that from original AB1 file, I think) is not  
> perfectly identical to that called by phred, but is very similar (to  
> be expected); that is, I don't need to remap C, G and T to get it to  
> align with the phred data.
>
> So it looks like the SeqIO module is not mapping the sections of the  
> packed trace data to the appropriate bases. The unpack order  
> differs from the staden documentation ... but so does the order I  
> impose to correct the problem. I am still unclear as to the  
> differences between V2 and V3 of the format. The major difference  
> appears to be coding the trace absolutely (V2) or relative to  
> prior values (V3); I'd expect that if I were using one format and  
> SeqIO were trying to parse the other, I would get garbage out. Running  
> in verbose mode reports "scf.pm is working with a version 2 scf."
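
To make the V2/V3 distinction concrete, here is a rough Python sketch of the
two sample layouts as I read the Staden format description: V2 stores
absolute values interleaved A, C, G, T for each trace point, while V3 stores
each channel on its own as second-order differences. The function names are
hypothetical, 16-bit samples are assumed, and this is not the BioPerl code.

```python
from itertools import accumulate

BASES = "ACGT"  # channel order assumed from the Staden SCF description


def split_v2(values):
    """V2 layout: absolute samples interleaved A, C, G, T per trace point."""
    return {base: list(values[i::4]) for i, base in enumerate(BASES)}


def undelta(values, bits=16):
    """One running-sum pass, wrapping like an unsigned integer of `bits` bits."""
    mask = (1 << bits) - 1
    return [total & mask for total in accumulate(values)]


def decode_v3(values, bits=16):
    """V3 layout (assumed): one channel stored as differences of differences."""
    return undelta(undelta(values, bits), bits)


def encode_v3(samples, bits=16):
    """Inverse of decode_v3, included only so the round trip can be checked."""
    mask = (1 << bits) - 1

    def delta(xs):
        # Difference between each value and its predecessor (0 before the first).
        return [(x - prev) & mask for prev, x in zip([0] + xs[:-1], xs)]

    return delta(delta(list(samples)))


if __name__ == "__main__":
    channel = [0, 10, 30, 60, 50]
    stored = encode_v3(channel)
    print(stored)
    print(decode_v3(stored) == channel)
```

Feeding V3-style differences to a V2-style reader (or the reverse) would give
values that look nothing like peaks, which fits the expectation in the quoted
paragraph that a version mix-up would produce garbage rather than cleanly
swapped channels.
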
>
> Thoughts on this would be appreciated - can anyone confirm a problem  
> with trace extraction from SCF?
>
> I'm hoping that once I convince our admin to (properly) install  
> staden::read that I can work directly with the ab1 files, but I need  
> to stopgap on SCF for the time being....
>
> -CAT
