[Biojava-dev] biojava 3 progress
Scooter Willis
HWillis at scripps.edu
Wed Mar 17 16:09:29 UTC 2010
Andy
Let me know if you have made, or could check in, any major changes to the core sequence-handling code. So far I haven't needed to touch any of the core sequence code, but I want to avoid merging code if you have made significant changes.
I should have code to check in today. If we can't come up with a better name, I will ask Andreas to create a biojava3-genes module so I can check that code in for your review. The current problem is that ExonSequence extends DNASequence, when an exon could also be described as a feature. One way to look at this is that a TranscriptSequence is also a feature of a DNA sequence, and you only return a TranscriptSequence when you want a stand-alone class with internal links back to the parent sequence. The TranscriptFeature would have ExonFeature and IntronFeature objects as children, and you could ask for an ExonSequence based on an ExonFeature.
Once you have a ProteinSequence, you should be able to reverse the process and get back the TranscriptSequence, the corresponding ExonFeatures, and some sort of mapping from each protein sequence position back to the three DNA sequence positions that coded for it. This mapping would need to handle the case where the end of one exon and the start of the next exon together code for a single amino acid position.
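A minimal sketch of the feature hierarchy and the protein-to-DNA mapping described above might look like the following. The class names `SimpleFeature` and `CodonMapper` are hypothetical and are not part of the BioJava API; the sketch assumes 1-based inclusive coordinates, forward strand only, exons sorted by start, and a coding sequence that begins at the first base of the first exon. A codon that straddles an exon boundary simply yields non-contiguous DNA positions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical feature on a parent DNA sequence; children model exons and introns. */
class SimpleFeature {
    final String type;  // e.g. "transcript", "exon", "intron"
    final int start;    // 1-based inclusive start on the parent DNA sequence
    final int end;      // 1-based inclusive end on the parent DNA sequence
    final List<SimpleFeature> children = new ArrayList<>();

    SimpleFeature(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    int length() {
        return end - start + 1;
    }

    /** Returns the child features of the given type, in insertion order. */
    List<SimpleFeature> childrenOfType(String wanted) {
        List<SimpleFeature> out = new ArrayList<>();
        for (SimpleFeature f : children) {
            if (f.type.equals(wanted)) {
                out.add(f);
            }
        }
        return out;
    }
}

/** Hypothetical helper mapping amino acid positions back to DNA positions. */
class CodonMapper {
    /**
     * Returns the three DNA positions of the codon for a 1-based amino acid
     * position. Forward strand only; reverse-strand transcripts are not handled.
     */
    static int[] codonPositions(SimpleFeature transcript, int aaPosition) {
        List<SimpleFeature> exons = transcript.childrenOfType("exon");
        int[] result = new int[3];
        for (int i = 0; i < 3; i++) {
            // 0-based offset of this codon base within the spliced exons
            int offset = (aaPosition - 1) * 3 + i;
            result[i] = toDnaPosition(exons, offset);
        }
        return result;
    }

    /** Walks the exons in order until the spliced offset falls inside one. */
    private static int toDnaPosition(List<SimpleFeature> exons, int offset) {
        int remaining = offset;
        for (SimpleFeature exon : exons) {
            if (remaining < exon.length()) {
                return exon.start + remaining;
            }
            remaining -= exon.length();
        }
        throw new IllegalArgumentException("offset beyond end of spliced exons: " + offset);
    }
}
```

For a transcript with exons at 1..10 and 21..30 (intron at 11..20), amino acid 4 maps to DNA positions 10, 21 and 22: the codon starts at the last base of the first exon and finishes in the second.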
We also need to add the ability to group features into tracks. That way you could export the features of a particular track as a GFF/GFF3 file for import into various genome browsers. For example, suppose you are working on one genome with genes added from three different gene prediction algorithms, each organized as its own track. You should then be able to determine where the predicted genes overlap, check which of them are validated via BLAST against UniProt, and create another summary track of validated and non-validated genes. If the feature classes we put together make this easy, then I think we will have a solid design.
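As a rough illustration of the track idea, the sketch below groups features into a named track, writes a track out as GFF3 (nine tab-separated columns, with the track name used as the source column), and builds a summary track of predicted genes that do or do not overlap BLAST evidence. `Track` and `TrackFeature` are hypothetical names, not BioJava classes; the sketch assumes 1-based inclusive coordinates, writes every feature on the `+` strand, and treats any overlap with an evidence feature as validation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical feature stored in a track; 1-based inclusive coordinates. */
class TrackFeature {
    final String id;
    final String type;  // e.g. "gene" or "match"
    final int start;
    final int end;

    TrackFeature(String id, String type, int start, int end) {
        this.id = id;
        this.type = type;
        this.start = start;
        this.end = end;
    }

    /** Two closed intervals overlap if each starts before the other ends. */
    boolean overlaps(TrackFeature other) {
        return start <= other.end && other.start <= end;
    }
}

/** Hypothetical named group of features, e.g. one gene predictor's output. */
class Track {
    final String name;
    final List<TrackFeature> features = new ArrayList<>();

    Track(String name) {
        this.name = name;
    }

    /** Renders the track as GFF3; strand is assumed "+" and score/phase are ".". */
    String toGff3(String seqId) {
        StringBuilder sb = new StringBuilder("##gff-version 3\n");
        for (TrackFeature f : features) {
            sb.append(String.join("\t", seqId, name, f.type,
                    Integer.toString(f.start), Integer.toString(f.end),
                    ".", "+", ".", "ID=" + f.id)).append('\n');
        }
        return sb.toString();
    }

    /**
     * Builds a summary track holding the features that overlap at least one
     * evidence feature (keepValidated = true) or none of them (false).
     */
    Track filterByEvidence(Track evidence, String summaryName, boolean keepValidated) {
        Track summary = new Track(summaryName);
        for (TrackFeature f : features) {
            boolean hit = false;
            for (TrackFeature e : evidence.features) {
                if (f.overlaps(e)) {
                    hit = true;
                    break;
                }
            }
            if (hit == keepValidated) {
                summary.features.add(f);
            }
        }
        return summary;
    }
}
```

The nested loop is quadratic in the number of features, which is fine for a sketch; a real implementation over a whole genome would probably want sorted features or an interval tree.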
Scooter