[Biojava-dev] sff files

Scooter Willis HWillis at scripps.edu
Mon Nov 8 09:10:13 EST 2010


Charles

If you could take a look at the Biojava3 FASTA parser, it would be a good template for integrating your code into Biojava3. We support proxy sequences, which let you parse very large sequence files, pick up each accession id, and build an index of offsets into the file. The same mechanism can also hide the details of sequences stored in a database or retrieved via a web service call. You can parse the file, organize the DNASequences, and lazy-load the actual sequence data. There is a code example using the FileProxyProteinSequenceCreator at http://biojava.org/wiki/BioJava:CookBook:Core:Overview
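
To make the offset-index and lazy-loading idea concrete, here is a minimal plain-Java sketch. It does not use the BioJava API: the class LazyFastaIndex and its methods are hypothetical names chosen for illustration, and it assumes a simple FASTA file where the accession id is the first whitespace-delimited token of each header line. A real implementation would use buffered I/O and the existing proxy classes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of an offset index with lazy loading; not BioJava code. */
public class LazyFastaIndex {

    // accession id -> byte offset of the first sequence line of that record
    private final Map<String, Long> offsets = new LinkedHashMap<>();
    private final String path;

    /** Scans the file once and records only ids and offsets, not sequence data. */
    public LazyFastaIndex(String path) throws IOException {
        this.path = path;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    String header = line.substring(1).trim();
                    // Assumption: the id is the first token of the header line.
                    String id = header.split("\\s+")[0];
                    // The file pointer now sits just after the header line.
                    offsets.put(id, raf.getFilePointer());
                }
            }
        }
    }

    /** Ids in file order. */
    public Set<String> ids() {
        return offsets.keySet();
    }

    /** Seeks to the stored offset and reads the sequence only when asked for it. */
    public String loadSequence(String id) throws IOException {
        Long offset = offsets.get(id);
        if (offset == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        StringBuilder sequence = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            String line;
            // Stop at the next header line or at end of file.
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                sequence.append(line.trim());
            }
        }
        return sequence.toString();
    }
}
```

Constructing the index touches every line once but keeps only ids and offsets in memory; the sequence data for a given id is read from disk only when loadSequence is called for it.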

Thanks

Scooter

On Nov 8, 2010, at 8:24 AM, Charles Imbusch wrote:

Hi all,

for a project I implemented rudimentary support for sff files coming
from 454 sequencing machines. I packed and uploaded the code to:

http://imbusch.net/tmp/sffParser.tar

It is capable of extracting read information if the read id is known.
An iterator over the reads is certainly still needed, as is taking advantage
of the mft index structure (thanks to Peter for the information).
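
As a rough illustration of where the index location and the read count come from, here is a sketch that reads the fixed-size part of the SFF common header. The field order follows the published SFF layout as I understand it (big-endian: magic number, version, index offset, index length, number of reads, header length, key length, flows per read) and should be checked against the format description. SffHeaderSketch is a hypothetical name and is not part of the sffParser code linked above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: reads the fixed part of an SFF common header. */
public class SffHeaderSketch {

    static final int SFF_MAGIC = 0x2E736666; // the bytes ".sff"

    final long indexOffset;   // where the index block starts (0 if absent)
    final long indexLength;   // length of the index block in bytes
    final long numberOfReads; // unsigned 32-bit value in the file
    final int headerLength;   // total common header length, including padding
    final int keyLength;      // length of the key sequence
    final int flowsPerRead;   // number of flows for every read

    private SffHeaderSketch(long indexOffset, long indexLength, long numberOfReads,
                            int headerLength, int keyLength, int flowsPerRead) {
        this.indexOffset = indexOffset;
        this.indexLength = indexLength;
        this.numberOfReads = numberOfReads;
        this.headerLength = headerLength;
        this.keyLength = keyLength;
        this.flowsPerRead = flowsPerRead;
    }

    /** Reads the leading fields; DataInputStream is big-endian, as assumed for SFF. */
    static SffHeaderSketch read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != SFF_MAGIC) {
            throw new IOException("not an SFF file: bad magic number");
        }
        data.readInt(); // 4 version bytes, not interpreted in this sketch
        long indexOffset = data.readLong();
        long indexLength = data.readInt() & 0xFFFFFFFFL;
        long numberOfReads = data.readInt() & 0xFFFFFFFFL;
        int headerLength = data.readUnsignedShort();
        int keyLength = data.readUnsignedShort();
        int flowsPerRead = data.readUnsignedShort();
        return new SffHeaderSketch(indexOffset, indexLength, numberOfReads,
                                   headerLength, keyLength, flowsPerRead);
    }
}
```

An iterator over the reads could start at headerLength and step through the read records numberOfReads times, skipping the index block if it lies between reads; that part is not shown here.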

Example code to extract a sequence:

String sfffile = "/home/charlie/sff/Harmigera/EU97XD416.sff";
sffParser sffparser = new sffParser(sfffile);
System.out.println("number of reads: " + sffparser.get_number_of_reads());
Read read = sffparser.get_Read("EU97XD416JXTCU");
System.out.println("sequence for read EU97XD416JXTCU");
System.out.println(read.get_bases());

I would like to extend the code and integrate it into BioJava, but I'm a bit
unsure how to proceed. The Read class in particular was a quick solution
for me. Maybe something already exists to manage reads and
their quality scores?

Any feedback is welcome!

Cheers,
 Charles


_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev



