[Bioperl-l] randomizing fastq sequences

Cook, Malcolm MEC at stowers.org
Tue Feb 8 15:12:47 UTC 2011


Gotta chime in....

If you're working with fastq files on unix and have the `shuf` command available, I recommend installing cdbfasta (http://sourceforge.net/projects/cdbfasta/), which provides the cdbfasta and cdbyank commands for indexing fasta and fastq files and for random access to them.

Index the fastq, then extract the IDs with cdbyank, pipe them through `shuf`, and then through cdbyank again to pull out the sequences in the shuffled order.

Like this example, which uses a test fastq from my local install of bioperl:

> cd ~/local/src/bioperl-live/t/data/fastq/
> cdbfasta -Q example.fastq
3 entries from file example.fastq were indexed in file example.fastq.cidx
> cdbyank -l example.fastq.cidx | shuf | cdbyank example.fastq.cidx > shuf_example.fastq

There would be issues if your IDs are not unique.
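
One way to check for that up front is to list any ID that occurs more than once before you build the index. This is only a rough sketch: `fastq_dup_ids` is a name I made up for illustration, and it assumes plain 4-line fastq records (no wrapped sequence or quality lines), taking the ID to be the first whitespace-delimited word after the `@`.

```shell
# fastq_dup_ids FILE
# Hypothetical helper: print every read ID that appears more than once.
# Assumes 4 lines per record, so header lines are lines 1, 5, 9, ...
fastq_dup_ids() {
    awk 'NR % 4 == 1 { sub(/^@/, ""); print $1 }' "$1" | sort | uniq -d
}

# Usage: no output means every ID is unique.
#   fastq_dup_ids example.fastq
```

If this prints anything, those IDs would need to be made unique before indexing and shuffling as above.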

Malcolm Cook
Stowers Institute for Medical Research -  Bioinformatics
Kansas City, Missouri  USA
 
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> shalu sharma
> Sent: Monday, February 07, 2011 4:08 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] randomizing fastq sequences
> 
> Hi,
>    i am trying to test one program for which i need to change 
> order of sequences in a fastq file.
> My fastq file contains about 50,000 sequences.
> Is there any script that can do this task?
> 
> Thanks
> Shalu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


