[Bioperl-l] Bio::SeqIO::scf traces scrambled?
Charles Tilford
charles.tilford at bms.com
Thu Jun 18 19:59:01 UTC 2009
Chris Fields wrote:
> Charles,
>
> The best way to make sure this is addressed is to file a ticket (bug
> report) on it so we can properly track it.
Ok, I'll put that in.
>
> AFAIK this module doesn't use staden::read but is pure perl.
Yes, that's my understanding too. I'm using the SeqIO module because of
ongoing hiccups with the staden installation.
> Note: there is also Bio::SCF (non-bp):
>
> http://search.cpan.org/~lds/Bio-SCF-1.01/
>
I have that installed, but have not tried it out yet.
Thanks!
-CAT
> chris
>
> On Jun 18, 2009, at 8:38 AM, Charles Tilford wrote:
>
>
>> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace
>> channels. Can anyone confirm?
>>
>> Hi all,
>>
>> I'm using the SCF Bio::SeqIO module to parse trace data out of
>> chromatograms. The SCF files are being produced by phred using the "-
>> cd" parameter. The traces come out great, and the corresponding base
>> calls from the .phd files align with the peaks wonderfully when I
>> visualize them on a rendered trace. However, only the A bases align
>> to the appropriate trace channel, the rest are mixed up. I find that
>> if I do the following re-mapping, the phred base calls match the
>>
>> SeqIO : Remapped
>> A : A
>> C : G
>> G : T
>> T : C
>>
>> The relevant part of Bio::SeqIO::scf is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9
>>
>> ... which indicates that it expects the pack()ed trace data to be in
>> order ATGC. The base call parsing code is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8
>>
>> ... which is unpacking in order ACGT. As far as I can tell, the
>> relevant official SCF documentation is here:
>>
>> http://staden.sourceforge.net/manual/formats_unix_4.html
>>
>> ... which indicates that both trace and base order should be ACGT
>> (matching the SeqIO unpack() for bases, but not traces). My
>> empirical channel unscrambling mapping implies order ACTG, which is
>> different from either of the two orders above. The sequence from the
>> SCF file (should be that from original AB1 file, I think) is not
>> perfectly identical to that called by phred, but is very similar (to
>> be expected); that is, I don't need to remap C, G and T to get it to
>> align with the phred data.
>>
>> So it looks like the SeqIO module is not mapping the sections of the
>> packed trace data to the appropriate bases. The unpack order is
>> different than the staden documentation ... but so is the order I
>> impose to correct the problem. I am still unclear as to the
>> differences between V2 and V3 of the format. The major difference
>> appears to be coding the trace absolutely (V2) or relatively to
>> prior values (V3); I'd expect if I was using one format and SeqIO
>> was trying to parse the other that I would get garbage out. Running
>> in verbose reports "scf.pm is working with a version 2 scf."
>>
>> Thoughts on this would be appreciated - can anyone confirm a problem
>> with trace extraction from SCF?
>>
>> I'm hoping that once I convince our admin to (properly) install
>> staden::read that I can work directly with the ab1 files, but I need
>> to stopgap on SCF for the time being....
>>
>> -CAT
>>
>
>
>
>
More information about the Bioperl-l
mailing list