[Bioperl-l] randomizing fastq sequences

Tue Feb 8 11:48:44 EST 2011

Hi All,
    Thanks for all the suggestions.
@Simon Andrew and Roy:
   Your method worked perfectly, but now memory is the issue.
Now I have to randomly select 50K fastq sequences from an Illumina data set
(around 70 million reads), so is there a module that can select random
sequences from a fastq file?
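
One approach that avoids holding all 70 million reads in memory is reservoir
sampling: read the file once and keep only k records at any time. Below is a
minimal sketch in Python, not a BioPerl module. It assumes every record is
exactly four lines (no wrapped sequence lines), and the names read_fastq,
reservoir_sample and sample_fastq_file are made up for illustration.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def reservoir_sample(records, k, rng=random):
    """Return a uniform random sample of up to k records in a single pass."""
    sample = []
    for n, record in enumerate(records, start=1):
        if n <= k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep record n with probability k/n, replacing a random slot.
            j = rng.randrange(n)
            if j < k:
                sample[j] = record
    return sample


def sample_fastq_file(in_path, out_path, k, seed=None):
    """Write k randomly chosen records from in_path to out_path."""
    rng = random.Random(seed)
    with open(in_path) as fin, open(out_path, "w") as fout:
        for record in reservoir_sample(read_fastq(fin), k, rng):
            fout.writelines(record)
```

A call such as sample_fastq_file("reads.fastq", "sample.fastq", 50000) would
keep only 50,000 records in memory. The sample is uniform, but its order is not
random, so it can still be passed through a shuffling step afterwards.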

I can still use the same methods on the 50K sequences, but getting 50K from
such a huge data set is a problem.
Also, at some point I need to shuffle the fastq reads themselves (the order of
nucleotides within each read).
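
For shuffling the nucleotides within a read, one possible sketch in Python is
below. It assumes the quality string should be permuted together with the
sequence so that each base keeps its own quality score, and that records are
4-tuples of lines. The names shuffle_read and shuffle_record are hypothetical,
not part of BioPerl.

```python
import random


def shuffle_read(seq, qual, rng=random):
    """Permute the positions of seq and qual with the same random order."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    order = list(range(len(seq)))
    rng.shuffle(order)
    shuffled_seq = "".join(seq[i] for i in order)
    shuffled_qual = "".join(qual[i] for i in order)
    return shuffled_seq, shuffled_qual


def shuffle_record(record, rng=random):
    """Shuffle the bases of one 4-line FASTQ record (header, seq, +, qual)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in record)
    seq, qual = shuffle_read(seq, qual, rng)
    return (header + "\n", seq + "\n", plus + "\n", qual + "\n")
```

Passing random.Random(some_seed) as rng makes the shuffle reproducible, which
may help when testing another program against the shuffled reads.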

I am really sorry for asking so many things; I know I am really bad at
handling fastq sequences.
I would really appreciate your suggestions.

Thanks
Shalu

On Tue, Feb 8, 2011 at 10:53 AM, Chris Fields <cjfields at illinois.edu> wrote:

> Just to note, I have been thinking about wrapping this for fast indexing
> and retrieval of FASTQ for bioperl (this came up in a prior thread, with the
> same suggestion from Malcolm IIRC).
>
> chris
>
> On Feb 8, 2011, at 9:12 AM, Cook, Malcolm wrote:
>
> > Gotta chime in....
> >
> > If
> >       you're working with fastq files, and
> >       you're working in unix and have the `shuf` command available,
> >
> > I recommend installing cdbyank
> > (http://sourceforge.net/projects/cdbfasta/), which indexes
> > fasta and fastq files and provides random access to them.
> >
> > Index the fastq, then extract the IDs with cdbyank, pipe them through
> > `shuf`, and then through cdbyank again to pull out the sequences.
> >
> > Like this example, which uses a test fastq from my local install of
> > bioperl:
> >
> >> cd ~/local/src/bioperl-live/t/data/fastq/
> >> cdbfasta -Q example.fastq
> > 3 entries from file example.fastq were indexed in file example.fastq.cidx
> >> cdbyank -l example.fastq.cidx | shuf | cdbyank example.fastq.cidx > shuf_example.fastq
> >
> > There would be issues if your IDs are not unique.
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research -  Bioinformatics
> > Kansas City, Missouri  USA
> >
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> >> shalu sharma
> >> Sent: Monday, February 07, 2011 4:08 PM
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: [Bioperl-l] randomizing fastq sequences
> >>
> >> Hi,
> >>   i am trying to test one program for which i need to change
> >> order of sequences in a fastq file.
> >> My fastq file contains about 50,000 sequences.
> >> Is there any script that can do this task?
> >>
> >> Thanks
> >> Shalu
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>