[Biojava-dev] Code Update

Scooter Willis HWillis at scripps.edu
Tue Jan 26 18:09:36 UTC 2010


I checked in updates with test cases for FASTA file parsing, where the main focus is on the FASTA header. The test cases are based on the Wikipedia examples, so results will vary with actual files. It is now very easy to write a custom header parser, so we have lots of flexibility.

I also started the code for the file pointer sequence proxy. The key usage is creating a sequence with just the header, and storing a reference to the file and the offset in the file where the sequence starts. When a method that needs the sequence or a subsequence is called, the init() method loads the sequence data via RandomAccessFile with a seek to that offset.

It turns out that none of the buffered java.io reader classes will return the offset of the bytes actually read. This gets more complicated with the readLine() methods, where the CR and/or LF are stripped off before the string is returned, so you can't keep track of the offset externally. I copied the BufferedReader.java class to BufferedReaderBytesRead.java and keep track of the file pointer internally. This code still needs to be tested.

This should be a great way to load large data sets with minimal memory. To complete this approach I will probably write a proxy-aware collection that can go through and free up storage by returning a sequence to its proxy state.
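As a rough illustration of the proxy idea described above, here is a minimal sketch. It is not the code that was checked in: the class names LazyFastaSequence and FastaOffsetIndexer and all method names other than init() are made up for this example, and instead of a modified BufferedReader it simply counts every byte read from a BufferedInputStream. It assumes single-byte (ASCII) text and LF or CRLF line endings.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical proxy: keeps the header and a file offset; residues load on demand. */
class LazyFastaSequence {
    private final String header;
    private final Path file;
    private final long offset;  // byte offset of the first residue line
    private String residues;    // null while the object is in its proxy state

    LazyFastaSequence(String header, Path file, long offset) {
        this.header = header;
        this.file = file;
        this.offset = offset;
    }

    /** Loads the residues by seeking to the stored offset; does nothing if already loaded. */
    private void init() throws IOException {
        if (residues != null) return;
        StringBuilder sb = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            String line;
            // Read until the next header line or the end of the file.
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                sb.append(line.trim());
            }
        }
        residues = sb.toString();
    }

    String getHeader() { return header; }

    String getSequence() throws IOException { init(); return residues; }

    /** Positions are 1-based and inclusive. */
    String getSubSequence(int start, int end) throws IOException {
        init();
        return residues.substring(start - 1, end);
    }

    boolean isLoaded() { return residues != null; }

    /** Returns the object to its proxy state so the residues can be garbage collected. */
    void release() { residues = null; }
}

class FastaOffsetIndexer {
    /** Scans the file once, counting every byte (CR and LF included) to find sequence offsets. */
    static List<LazyFastaSequence> index(Path file) throws IOException {
        List<LazyFastaSequence> result = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;              // number of bytes consumed so far
            boolean lineStart = true;  // true when the next byte begins a line
            boolean inHeader = false;
            StringBuilder header = new StringBuilder();
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (lineStart && b == '>') {
                    inHeader = true;
                    header.setLength(0);
                } else if (inHeader) {
                    if (b == '\n') {
                        // pos now points at the first byte after the header line.
                        result.add(new LazyFastaSequence(header.toString().trim(), file, pos));
                        inHeader = false;
                    } else {
                        header.append((char) b);
                    }
                }
                lineStart = (b == '\n');
            }
        }
        return result;
    }
}

class FastaProxyDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".fasta");
        Files.write(tmp, ">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n".getBytes(StandardCharsets.US_ASCII));
        for (LazyFastaSequence s : FastaOffsetIndexer.index(tmp)) {
            System.out.println(s.getHeader() + " loaded=" + s.isLoaded() + " seq=" + s.getSequence());
            s.release();  // drop the residues again; only header and offset stay in memory
        }
        Files.delete(tmp);
    }
}
```

Only the header strings and offsets are held after indexing, which is where the memory saving would come from. A proxy-aware collection could call something like release() on entries that have not been used recently.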

This week I will work on creating some wiki pages with examples of using the header parsing interface and proxy sequences. How do we want to organize wiki pages related to biojava3 work?

Thanks

Scooter


