[Biopython] Paired end SFF data

Wed Aug 19 11:52:52 EDT 2009

On Wed, Aug 19, 2009 at 2:26 PM, Peter<biopython at maubp.freeserve.co.uk> wrote:
>
> Basically after sample preparation, you have DNA fragments containing
> the 3' end of your sequence, a known linker, and then the 5' end of
> your sequence. The sequencing machine doesn't need to know what the
> magic linker sequence is, and (I infer) after sample preparation,
> everything proceeds as normal for single end 454 sequencing. The
> upshot is the SFF file for a paired end read is exactly like any other
> SFF file (apparently even for the XML meta data Roche include), just
> most of the reads should have a "magic" linker sequence somewhere in
> them.
>
> ... FLX linker:
> GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC.
>
> Note that the linker sequence depends on how the sample was prepared,
> and differs for different Roche protocols. e.g. According to the
> wgs-assembler documentation the known Roche 454 Titanium paired end
> linkers are instead:
> TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG and
> CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
> See http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Formatting_Inputs

According to the MIRA documentation, local sequencing centres may
also use their own linker sequences (and have been known to modify
the adaptor sequences), which would complicate any attempt to
recognise paired end reads from a fixed list of known linkers.
http://www.chevreux.org/uploads/media/mira3_faq.html#section_10
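
To make the idea concrete, here is a minimal sketch in plain Python of
splitting a read at the linker. The function name split_at_linker is
hypothetical (it is not part of Biopython), and it assumes an exact,
error-free copy of the linker in the read, which real 454 data will
often not satisfy. The linker sequences are the ones quoted above;
a centre using its own linker would pass that in instead.

```python
# Linker sequences as quoted earlier in this thread.
FLX_LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"
TITANIUM_LINKERS = (
    "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG",
    "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA",
)


def split_at_linker(read, linkers):
    """Split a read at the first exact match to any of the given linkers.

    Hypothetical helper for illustration only. Returns a tuple
    (before_linker, after_linker), or None if no linker is found.
    Assumption: exact string matching, so a linker containing
    sequencing errors will be missed.
    """
    read = read.upper()
    for linker in linkers:
        pos = read.find(linker)
        if pos != -1:
            return read[:pos], read[pos + len(linker):]
    return None


# A made-up read: 8 bases, the FLX linker, then 7 more bases.
example = "ACGTACGT" + FLX_LINKER + "TTGACCA"
print(split_at_linker(example, [FLX_LINKER]))
# ('ACGTACGT', 'TTGACCA')
```

Anything useful on real data would presumably need approximate
matching to tolerate sequencing errors in the linker, and would have
to decide what to do with reads where no linker is found.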

Peter