[Bioperl-l] randomizing fastq sequences

Chris Fields cjfields at illinois.edu
Tue Feb 8 15:53:27 UTC 2011


Just to note, I have been thinking about wrapping cdbfasta/cdbyank for fast indexing and retrieval of FASTQ in BioPerl (this came up in a prior thread, with the same suggestion from Malcolm, IIRC).

chris

On Feb 8, 2011, at 9:12 AM, Cook, Malcolm wrote:

> Gotta chime in....
> 
> If you're working with fastq files on unix and have the `shuf` command available, I recommend installing cdbfasta/cdbyank (http://sourceforge.net/projects/cdbfasta/), which indexes fasta and fastq files and provides random access to them.
> 
> Index the fastq, then extract the IDs with cdbyank, pipe them through `shuf`, and then through cdbyank again to pull out the sequences.
> 
> For example, using a test fastq from my local install of bioperl:
> 
>> cd ~/local/src/bioperl-live/t/data/fastq/
>> cdbfasta -Q example.fastq
> 3 entries from file example.fastq were indexed in file example.fastq.cidx
>> cdbyank -l example.fastq.cidx | shuf | cdbyank example.fastq.cidx > shuf_example.fastq
> 
> There would be issues if your IDs are not unique, since the records are looked up by ID.
> 
> Malcolm Cook
> Stowers Institute for Medical Research -  Bioinformatics
> Kansas City, Missouri  USA
> 
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>> shalu sharma
>> Sent: Monday, February 07, 2011 4:08 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] randomizing fastq sequences
>> 
>> Hi,
>>   I am trying to test a program for which I need to change the
>> order of sequences in a fastq file.
>> My fastq file contains about 50,000 sequences.
>> Is there any script that can do this task?
>> 
>> Thanks
>> Shalu
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
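
Where cdbfasta is not installed, or where IDs may repeat, the same shuffle can be sketched with standard tools alone. This is only a sketch under stated assumptions: strict four-line FASTQ records (no wrapped sequence or quality lines), no tab characters in the header lines, and GNU `shuf` on the PATH. The file names demo.fastq and shuf_demo.fastq are made up for illustration. It first lists any duplicated IDs, then shuffles whole records.

```shell
# Hypothetical demo input: three 4-line FASTQ records.
printf '@r1 first\nACGT\n+\nIIII\n@r2 second\nGGCC\n+\nHHHH\n@r3 third\nTTAA\n+\nJJJJ\n' > demo.fastq

# Print any ID (first token of each header line) that occurs more than once.
# No output means the IDs are unique, so an ID-keyed lookup would be safe.
awk 'NR % 4 == 1 { print $1 }' demo.fastq | sort | uniq -d

# Join each 4-line record into one tab-separated line, shuffle the lines,
# then split each line back into its four original lines.
paste - - - - < demo.fastq | shuf | tr '\t' '\n' > shuf_demo.fastq
```

Because whole records are shuffled rather than looked up by ID, duplicate IDs do not matter here. The trade-off is that `shuf` holds its entire input in memory, which should be fine for about 50,000 reads but may not suit very large files.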
