[Biopython] Public example FASTQ files (for Tutorial examples)?

Peter Cock p.j.a.cock at googlemail.com
Fri Apr 1 09:59:29 UTC 2011


On Fri, Mar 25, 2011 at 7:37 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> One of the volunteers proof reading the Biopython tutorial
> noticed our links to specific example FASTQ files at the NCBI
> SRA don't work any more. They have withdrawn them from
> the FTP site, although you can still download the files in
> the compressed *.sra format and in theory convert them
> to FASTQ locally with the NCBI's toolkit (which is cross
> platform).
>
> Another option is to download the FASTQ files via the
> NCBI's web interface. Unless there is an obvious way to
> do this with a URL that I missed initially, we have a
> complicated situation to describe where the user can
> choose all the reads for an experiment or just the filtered
> set, and also choose to have them pre-trimmed or not.
> Plus, for me at least, the HTTP download wasn't as
> robust as the FTP one was.

Brad pointed out we should be able to get the same reads
from the EBI's sequence read archive, the ENA.

I'm looking at that. The first example from the NCBI SRA
was a single 23MB FASTQ file, which I had thought was
single-ended Roche 454 data:

ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX003/SRX003639/SRR014849.fastq.gz
[dead link]

I can find the same accession on the ENA, but it seems to
be paired end data - and looks to have longer reads than
the file from the NCBI (probably not quality trimmed?).

http://www.ebi.ac.uk/ena/data/view/SRR014849
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR014/SRR014849/SRR014849_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR014/SRR014849/SRR014849_2.fastq.gz
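
To compare read lengths between downloaded files, something
like the following minimal sketch could be used. It relies only
on the Python standard library and assumes the simple
four-lines-per-record FASTQ layout with no line wrapping. The
function names read_lengths and summarise are made up for
illustration; they are not part of Biopython.

```python
import gzip


def read_lengths(handle):
    """Yield the sequence length of each record in a FASTQ text handle.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    while True:
        title = handle.readline()
        if not title:
            break  # end of file
        if not title.startswith("@"):
            raise ValueError("Expected '@' at start of record: %r" % title)
        seq = handle.readline().strip()
        handle.readline()  # the '+' separator line, ignored
        handle.readline()  # the quality string, ignored here
        yield len(seq)


def summarise(filename):
    """Return (read count, mean length, longest read) for a gzipped FASTQ."""
    with gzip.open(filename, "rt") as handle:
        lengths = list(read_lengths(handle))
    if not lengths:
        return 0, 0.0, 0
    return len(lengths), sum(lengths) / len(lengths), max(lengths)
```

For example, summarise("SRR014849_1.fastq.gz") on a local copy
of the first ENA file would give the number of reads, their mean
length and the longest read.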

Interestingly, going back to the NCBI SRA, that also says it
is paired end data, and looking at the data it does make
sense. I'm pretty sure the original FASTQ file I got from
the NCBI SRA a while ago would need parsing to spot
and split on the Roche 454 linker sequences, in this case
the 454flx linker:

GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
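
As a rough illustration of what "spot and split" would involve,
here is a minimal sketch in plain Python. It assumes an exact,
error-free match to the linker, which real 454 reads will not
always give, and the name split_on_linker is hypothetical
rather than an existing Biopython function.

```python
# The 454flx linker quoted above. It is its own reverse complement,
# so only one orientation needs to be searched for.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def split_on_linker(seq, qual, linker=LINKER):
    """Split one read into two mates around the first exact linker match.

    seq and qual are the sequence and quality strings of one FASTQ
    record (same length). Returns ((left_seq, left_qual),
    (right_seq, right_qual)), or None if the linker is not found.
    """
    start = seq.find(linker)
    if start == -1:
        return None  # no linker: treat as an unpaired read
    end = start + len(linker)
    left = (seq[:start], qual[:start])
    right = (seq[end:], qual[end:])
    return left, right
```

Reads where this returns None would have to be kept as
single reads or discarded, and a real implementation would
probably want to tolerate mismatches in the linker.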

Curious - but it won't be a quick job to just swap the URL.
I'll need to find another small example on the ENA instead.

Peter


