[Biopython] Paired-End Read Splitting & Joining

Peter Cock p.j.a.cock at googlemail.com
Thu Nov 17 12:31:30 UTC 2011


On Thu, Nov 17, 2011 at 11:53 AM, Yaqiang Cao <caoyaqiang0410 at gmail.com> wrote:
>
> Thanks for replying.
>
> Yes, I have a .fastq file converted from .sra using fastq-dump, one of the
> NCBI sratools. The file is over 1 GB. I want to split it into two FASTQ
> files because TopHat requires two files for paired-end sequences.
> A screenshot of the first 20 lines of the .fastq file is in the attached
> picture file:

Looking at the names, that file seems not to have both parts of each pair.
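
To check this over more than the first 20 lines, something like the rough sketch below could be used. It is not a Biopython function. It assumes the FASTQ file is not line wrapped (exactly four lines per record) and that mates would be marked with the common /1 and /2 name suffixes. That convention is an assumption about this file, and the function name mate_suffix_counts is made up for illustration.

```python
from collections import Counter


def mate_suffix_counts(handle, limit=None):
    """Count FASTQ read names ending in /1, /2, or neither.

    Assumes unwrapped FASTQ (four lines per record) and the /1, /2
    naming convention for mates; both are assumptions, not guarantees.
    """
    counts = Counter()
    for index, line in enumerate(handle):
        if index % 4:
            continue  # sequence, '+' and quality lines carry no name
        if limit is not None and index // 4 >= limit:
            break  # stop once `limit` records have been examined
        if not line.strip():
            continue  # tolerate a blank line at the end of the file
        if not line.startswith("@"):
            raise ValueError("Expected '@' at line %d" % (index + 1))
        # The read name is the first whitespace-separated token after '@'.
        fields = line[1:].split()
        name = fields[0] if fields else ""
        if name.endswith("/1"):
            counts["/1"] += 1
        elif name.endswith("/2"):
            counts["/2"] += 1
        else:
            counts["other"] += 1
    return counts
```

Calling this on an open file handle, with limit set to a few thousand to keep it quick on a file over 1 GB, gives a count for each kind of name. If "/1" and "/2" are both zero, or only one of them appears, the names do not mark both mates of each pair, at least not under this naming convention.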

I looked on the NCBI SRA page, and the library is described as paired:
http://www.ncbi.nlm.nih.gov/sra?term=srr100235

There only seems to be one SRA file for this accession,
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX042/SRX042254/SRR100235/

i.e. this file:
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX042/SRX042254/SRR100235/SRR100235.sra

I'd look more but the SRA website tells me "Our database is
temporarily unavailable. Please come back later."

Peter


