[Bioperl-l] Next Gen Formats

Peter biopython at maubp.freeserve.co.uk
Fri Mar 12 13:26:39 UTC 2010


On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar <golharam at umdnj.edu> wrote:
>
> Here is an example of a color-space sequence:
>
> In one file (something.csfasta):
>
> >1_30_226_F3
> T210320010.200.03.0110320320220212200122200.2220200
> >1_30_252_F3
> T322220212.133.00.2202322132022202221002011.0011020
>
> The '.' means the color could not be called
>
> In another file (something.qual):
>
> >1_30_226_F3
> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20
> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4
> >1_30_252_F3
> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5
> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12
>
> The -1 represents those colors that could not be called.

Now that is funny (using -1). True PHRED scores are defined as
Q = -10 * log10(P), where P is the probability of error, so they
can't be negative. A score of zero is normally used in this situation
since that maps to a probability of error of 1 (i.e. the base call is
100% wrong, or 0% true).
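
To make the arithmetic concrete, here is a minimal Python sketch of the
PHRED mapping and of one way the -1 values could be handled. This is not
Biopython or BioPerl code: the function names are made up for
illustration, and treating -1 as "uncalled, so use 0" is my assumption
about what these files intend.

```python
import math


def phred_to_error_prob(q):
    """Convert a PHRED score Q to an error probability P = 10^(-Q/10)."""
    if q < 0:
        # A negative Q would imply P > 1, which is not a probability.
        raise ValueError("PHRED scores cannot be negative: %r" % (q,))
    return 10 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("Probability must be in (0, 1]: %r" % (p,))
    return -10 * math.log10(p)


def clamp_uncalled(scores, sentinel=-1):
    """Replace the 'could not be called' sentinel with a PHRED score of 0.

    Assumption: the sentinel (-1 in the example files) marks an uncalled
    color, which is equivalent to an error probability of 1.
    """
    return [0 if s == sentinel else s for s in scores]


# Start of the first quality line from the example above.
raw = [int(x) for x in "4 4 27 17 31 7 24 26 13 -1 10".split()]
scores = clamp_uncalled(raw)
probs = [phred_to_error_prob(q) for q in scores]
```

With this, the uncalled position gets a score of 0 and an error
probability of 1.0, while a score of 10 corresponds to a 1 in 10 chance
of error.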

Where did these files come from? Directly from a sequencing
machine, or via some third-party script?

Peter


