[Biojava-l] Reading and writing Fastq files

Thu Apr 1 03:56:42 UTC 2010

xyz wrote:

> Thank you it works, but after I extended the code with
> RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
> fastq.getDescription());
> in order to get also a trimmed fasta file I got the following error:
>
> Fastq2Fasta.java:51: cannot find symbol
> symbol  : method writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
> location: class org.biojavax.bio.seq.RichSequence.IOTools
> RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
> 1 error

The fastq package has not yet been integrated with biojava core or the
biojavax packages.  If you would like to use RichSequence.IOTools, you
would need to create a RichSequence from each Fastq object before writing.

Something like

import static ...RichSequence.Tools.*;
import static ...RichSequence.IOTools.*;

Fastq fastq = ...;
Namespace namespace = ...;
RichSequence richSequence = createRichSequence(
  namespace,
  fastq.getDescription(),
  fastq.getSequence(),
  DNATools.getDNA());

writeFasta(outputStream, richSequence, namespace);

may work.
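
If going through RichSequence is more than you need, a FASTA record is
simple enough to format by hand. The following is only a sketch in plain
Java and is not part of BioJava: the class name FastaSketch, the method
formatFastaRecord, and the 60-column line width are all hypothetical and
chosen for illustration.

```java
// Hypothetical helper, not a BioJava class: formats one FASTA record by hand.
class FastaSketch {

    /**
     * Returns a FASTA record: a '>' header line followed by the sequence,
     * wrapped so that no sequence line is longer than lineWidth characters.
     */
    static String formatFastaRecord(String description, String sequence, int lineWidth) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(description).append('\n');
        for (int i = 0; i < sequence.length(); i += lineWidth) {
            int end = Math.min(i + lineWidth, sequence.length());
            sb.append(sequence, i, end).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Description and sequence taken from the trimmed read quoted in this thread.
        System.out.print(formatFastaRecord(
            "HWI-EAS406:5:1:0:1390#0/1",
            "GGGTGATGGCCGCTGCCGATGGCGTCAAAA",
            60));
    }
}
```

With a Fastq object in hand, the two string arguments would presumably be
fastq.getDescription() and fastq.getSequence(), as in the snippet above.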

> Suggestions:
> 1)
> After I trimmed the fastq files the header information for quality
> is empty
>
> @HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAAA
> +
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
>
> this reduced the size of the files but is it compatible with
> SOAP and TopHat?

Sorry, not sure what you are asking here.

> 2)
> I was using fastq files of up to 6 GB, and I have not run any benchmarks
> with different buffer/stream combinations on large text files, so
> I am not sure whether it is enough to use just FileInputStream or
> FileOutputStream. BioJavaX uses BufferedReader br = new
> BufferedReader(new FileReader()); is there any speed difference?

AbstractFastqReader.read(InputStream) uses a BufferedReader, and all the
other read methods pass through that one.
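
To illustrate the general pattern (this is not the AbstractFastqReader
source), wrapping a plain InputStream in a BufferedReader looks roughly
like the sketch below. The class BufferedFastqCount and the method
countRecords are hypothetical, and the sketch assumes four-line records,
that is, sequence and quality lines that are not wrapped.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of BioJava.
class BufferedFastqCount {

    /**
     * Counts FASTQ records by reading the stream line by line through a
     * BufferedReader. Assumes each record is exactly four non-empty lines.
     * The caller remains responsible for closing the underlying stream.
     */
    static long countRecords(InputStream in) {
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(in, StandardCharsets.US_ASCII));
        long lines = reader.lines()
            .filter(line -> !line.isEmpty())
            .count();
        return lines / 4;
    }

    public static void main(String[] args) {
        // The trimmed record quoted earlier in this thread, held in memory.
        String data = "@HWI-EAS406:5:1:0:1390#0/1\n"
            + "GGGTGATGGCCGCTGCCGATGGCGTCAAAA\n"
            + "+\n"
            + "OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO\n";
        InputStream in = new ByteArrayInputStream(
            data.getBytes(StandardCharsets.US_ASCII));
        System.out.println(countRecords(in));
    }
}
```

The same wrapping works for a FileInputStream, so passing one to a method
that buffers internally should not require an extra buffer on your side.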

   michael