[Biopython] Paired-End Read Splitting & Joining

Thu Nov 17 09:21:21 UTC 2011

On Thu, Nov 17, 2011 at 3:20 AM, 曹亚强 <caoyaqiang0410 at gmail.com> wrote:
> Dear mailing list,
>        Hi, this is my first time asking a question on a mailing list, so
> please excuse any mistakes.
>        I'm new to Python and Biopython, with almost no practical
> programming experience in bioinformatics. Recently my work has come to
> involve transcriptome analysis with TopHat
> (http://tophat.cbcb.umd.edu/manual.html), which needs paired-end
> sequences in two FASTQ files. Can Biopython do this in a convenient
> way? The paired-end file is too big to be handled conveniently in
> *Galaxy*.
>        Please give me some guidance. Thanks.
>
> Best wishes,
> Yaqiang Cao

Probably, yes.

So you have one large FASTQ file containing both parts of
each pair (say part one and part two, or they might be
labelled as the forward and reverse reads), and you want
to split this into two FASTQ files?

How are your reads named? The hard part is inferring which
part of a pair each read is. One common scheme uses /1 and
/2 suffixes on the read names, but Illumina changed this in
their latest pipeline, and the part number is now in the
description instead.
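
To make the two naming schemes concrete, here is a minimal
sketch of how the part number might be inferred from a read's
title line (the text after the leading "@"). The function name
split_title and both regular expressions are my own
illustration, not part of Biopython, and the pattern for the
newer style assumes a description of the form
"1:N:0:ATCACG" (part, filter flag, control number, index),
which may not match every pipeline version.

```python
import re

# Older style: the part is a /1 or /2 suffix on the identifier,
# e.g. "HWUSI-EAS100R:6:73:941:1973#0/1"
_SUFFIX = re.compile(r"^(?P<name>\S+)/(?P<part>[12])$")

# Newer style (assumed layout): the part is the first field of the
# description, e.g. "1:N:0:ATCACG"
_DESCRIPTION = re.compile(r"^(?P<part>[12]):[YN]:\d+:\S*$")


def split_title(title):
    """Return (pair_name, part) for a FASTQ title line without the '@'.

    part is 1 or 2, or None if neither naming scheme is recognised.
    """
    fields = title.split(None, 1)
    identifier = fields[0] if fields else ""
    description = fields[1].strip() if len(fields) > 1 else ""

    match = _SUFFIX.match(identifier)
    if match:
        return match.group("name"), int(match.group("part"))

    match = _DESCRIPTION.match(description)
    if match:
        return identifier, int(match.group("part"))

    return identifier, None
```

Reads for which this returns None as the part would need a
closer look, which is why seeing a few real records from the
file matters.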

Could you show us the first 6 reads (or so) from the big
FASTQ file?

Also, are there any single reads in your file, either never
paired or orphaned because one read of a pair failed QC?
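
Depending on the answers, the split itself could look roughly
like the sketch below. It assumes the simplest case: the two
reads of a pair are adjacent in the file and named with /1 and
/2 suffixes, and anything else is treated as an orphan. The
function name split_interleaved and the file names in the
trailing comment are hypothetical. The reads argument is any
iterable of (title, sequence, quality) string tuples, which is
what Biopython's FastqGeneralIterator yields, so the file is
streamed rather than loaded into memory.

```python
def split_interleaved(reads, out1, out2, out_orphans):
    """Split interleaved reads into part-one, part-two and orphan handles.

    Assumes mates are adjacent and named with /1 and /2 suffixes.
    Returns (number_of_pairs, number_of_orphans).
    """

    def identifier(read):
        # First whitespace-delimited word of the title line
        return (read[0].split(None, 1) or [""])[0]

    def write(handle, read):
        title, seq, qual = read
        handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))

    pairs = 0
    orphans = 0
    pending = None  # a /1 read still waiting for its /2 mate

    for read in reads:
        name = identifier(read)
        if pending is not None:
            if name.endswith("/2") and name[:-2] == identifier(pending)[:-2]:
                # Found the mate of the pending /1 read
                write(out1, pending)
                write(out2, read)
                pairs += 1
                pending = None
                continue
            # The pending /1 read has no mate next to it
            write(out_orphans, pending)
            orphans += 1
            pending = None
        if name.endswith("/1"):
            pending = read
        else:
            write(out_orphans, read)
            orphans += 1

    if pending is not None:
        write(out_orphans, pending)
        orphans += 1

    return pairs, orphans


# Possible usage with Biopython (file names are placeholders):
#
#   from Bio.SeqIO.QualityIO import FastqGeneralIterator
#   with open("interleaved.fastq") as in_handle, \
#           open("part_1.fastq", "w") as out1, \
#           open("part_2.fastq", "w") as out2, \
#           open("orphans.fastq", "w") as out_orphans:
#       split_interleaved(FastqGeneralIterator(in_handle),
#                         out1, out2, out_orphans)
```

If the mates are not adjacent, or the part number is in the
description rather than the name, this would need adjusting.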

Peter