[Biojava-dev] Code Update

Scooter Willis HWillis at scripps.edu
Tue Jan 26 20:17:47 UTC 2010


Andy

Let me know when you have that in a healthy state and I will work on the GTF/GFF3 parser -> create gene -> transcript -> (exon) -> protein code.

Scooter

On Jan 26, 2010, at 2:58 PM, Andy Yates wrote:

> Talking about code updates, I've got DNA -> RNA -> Peptide working  
> quite well. It's about a day or two of tinkering away from being in a  
> sensible state. There are also some utilities I've created;  
> they've gone into org.biojava3.core.util ... anyone got any better  
> suggestions as to where they should live?
> 
> Andy
> 
> On 26 Jan 2010, at 18:45, Andreas Prlic wrote:
> 
>> The cookbook approach seems to work quite well. You could start a new
>> "Chapter" in the book and make it clear that it will only be
>> available once BioJava 3 has been released (or via SVN checkout).
>> 
>> Andreas
>> 
>> On Tue, Jan 26, 2010 at 10:09 AM, Scooter Willis  
>> <HWillis at scripps.edu> wrote:
>>> 
>>> I checked in updates with test cases for FASTA file parsing, where  
>>> the main focus is on the FASTA header. The test cases are based on  
>>> the Wikipedia examples, so results will vary with actual files. It  
>>> is now very easy to write a custom header parser, so we have lots  
>>> of flexibility.
>>> 
>>> I also started the code for the file pointer sequence proxy. The  
>>> key usage is creating a sequence with the header and storing a  
>>> reference to the file and the offset in the file of the start of  
>>> the sequence. When a method related to getting a  
>>> sequence/subsequence is called, the init() method is called to  
>>> load the sequence data via RandomAccessFile with a seek to the  
>>> offset. It turns out that none of the Java IO classes will  
>>> actually return an offset index of the actual bytes read. This  
>>> also gets complicated with the readLine() methods, where the CR  
>>> and/or LF is stripped off when the string is returned, so you  
>>> can't keep track of it externally. I copied the  
>>> BufferedReader.java class to BufferedReaderBytesRead.java and keep  
>>> track of the file pointer internally. This code still needs to be  
>>> tested. This should be a great way to load large data sets with  
>>> minimal memory. To complete this approach I will probably write a  
>>> collection that is proxy aware and can go through and free up  
>>> storage by returning a sequence to its proxy state.
>>> 
>>> I will work this week on getting some wiki pages created to give  
>>> examples on using the header parsing interface and proxy sequences.  
>>> How do we want to organize wiki pages related to biojava3 work?
>>> 
>>> Thanks
>>> 
>>> Scooter
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>> 
>> 
> 
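
The file pointer proxy idea described in the quoted message could be sketched roughly as below. This is only an illustration, not the code that was checked in: the class names FileProxySequence and FastaIndexer and the release() method are hypothetical. It also swaps in RandomAccessFile.readLine() with getFilePointer() to find the offsets, rather than the copied BufferedReader mentioned in the message.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lazy sequence: the header stays in memory, residues load on demand. */
class FileProxySequence {
    final String header;
    private final Path file;
    private final long offset;   // byte offset of the first sequence line
    private String residues;     // null while the object is in its proxy state

    FileProxySequence(String header, Path file, long offset) {
        this.header = header;
        this.file = file;
        this.offset = offset;
    }

    /** Seeks to the stored offset and reads lines until the next header or EOF. */
    private void init() throws IOException {
        StringBuilder sb = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                sb.append(line.trim());
            }
        }
        residues = sb.toString();
    }

    String getSequence() throws IOException {
        if (residues == null) {
            init();
        }
        return residues;
    }

    boolean isLoaded() {
        return residues != null;
    }

    /** Frees the residues and returns the object to its proxy state. */
    void release() {
        residues = null;
    }
}

class FastaIndexer {
    /** Scans the file once and records the byte offset that follows each header line. */
    static List<FileProxySequence> index(Path file) throws IOException {
        List<FileProxySequence> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    // The file pointer now sits just past the header's line terminator.
                    String header = line.substring(1).trim();
                    result.add(new FileProxySequence(header, file, raf.getFilePointer()));
                }
            }
        }
        return result;
    }
}
```

RandomAccessFile.readLine() is unbuffered, so on large files a buffered reader that counts the bytes it consumes, as the message describes, would likely be considerably faster. The sketch is only meant to show the header-plus-offset bookkeeping and the lazy init() step.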




