[Bioperl-l] randomizing fastq sequences
Cook, Malcolm
MEC at stowers.org
Tue Feb 8 15:12:47 UTC 2011
Gotta chime in....
If
 - you're working with fastq files, and
 - you're working in unix and have the `shuf` command available,
then I recommend installing cdbfasta/cdbyank (http://sourceforge.net/projects/cdbfasta/), which indexes fasta and fastq files and provides random access to them.
Index the fastq with cdbfasta, then list the IDs with cdbyank, pipe them through `shuf`, and then through cdbyank again to pull out the records in the shuffled order.
For example, using a test fastq from my local install of bioperl:
> cd ~/local/src/bioperl-live/t/data/fastq/
> cdbfasta -Q example.fastq
3 entries from file example.fastq were indexed in file example.fastq.cidx
> cdbyank -l example.fastq.cidx | shuf | cdbyank example.fastq.cidx > shuf_example.fastq
Note that this approach runs into trouble if your read IDs are not unique, since the IDs are the keys used to pull the records back out.
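
If you are unsure whether your IDs are unique, a quick check before indexing might look like the sketch below. It is only a sketch: `dup_fastq_ids` is a made-up helper name, and it assumes strict 4-line fastq records (header, sequence, `+`, quality) with the ID being the first whitespace-delimited token of the header line.

```shell
# Hypothetical helper (not part of cdbfasta or bioperl): print every read ID
# that occurs more than once in a fastq file.
# Assumption: strict 4-line records, so header lines are lines 1, 5, 9, ...
dup_fastq_ids() {
    # Take the first token of each header line, drop the leading '@',
    # then report the IDs that appear more than once.
    awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$1" | sort | uniq -d
}
```

Running `dup_fastq_ids example.fastq` should print nothing when every ID is unique; any ID it does print is one that occurs more than once.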
Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> shalu sharma
> Sent: Monday, February 07, 2011 4:08 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] randomizing fastq sequences
>
> Hi,
> i am trying to test one program for which i need to change
> order of sequences in a fastq file.
> My fastq file contains about 50,000 sequences.
> Is there any script that can do this task?
>
> Thanks
> Shalu