[Biojava-dev] SearchIO module for biojava

Jose Manuel Duarte jose.duarte at psi.ch
Sun Aug 30 11:51:21 UTC 2015


Hi Paolo

Sorry for the late reply. I have only now been able to take a quick look 
at your new SearchIO module, and I think it looks fantastic. In my 
opinion this is a great and much-needed feature. I would definitely want 
to use it in my own projects as soon as possible. Some comments inline below.

On 22.08.2015 19:26, Paolo Pavan wrote:
>
> Also note that this is a required part of another module I have 
> written that could be of interest to the community: a biojava-run 
> module, to name it after something already familiar. The latter aims 
> to be a generic module for running an analysis performed by an 
> external program. In my case I needed an NCBI BLAST search. So the 
> API was written to declare a database of biojava Sequence objects, 
> pass a collection of query sequences and retrieve as output Result 
> objects of the SearchIO module.
> I know from a previous attempt discussed on the mailing list that the 
> project's direction was to reimplement the BLAST algorithm in pure 
> Java, and I agree that it would be a great idea. But as far as I know 
> that effort is behind schedule, so I solved the platform portability 
> issue by including binaries for all the platforms (well, the major 
> ones) and packaging them together in one jar file, relying on this 
> great Java facility.
> Anyway, all this came later.
The biojava-run module sounds interesting too. Do you have it anywhere 
on GitHub so that we can have a look at it?
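
For anyone unfamiliar with the "several binaries in one jar" approach 
mentioned above, here is a rough sketch of the platform-selection step. 
It is only an illustration under my own assumptions and not Paolo's 
actual code: the class name, the "/native/<os>-<arch>/" resource layout 
and the "blastp" tool name are all hypothetical.

```java
import java.util.Locale;

public class PlatformBinarySketch {

    /** Normalises os.name / os.arch values into a key such as "linux-x86_64". */
    public static String platformKey(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String osPart;
        // Check macOS first: "darwin" contains "win" and would otherwise match Windows.
        if (os.contains("mac") || os.contains("darwin")) {
            osPart = "macos";
        } else if (os.contains("win")) {
            osPart = "windows";
        } else if (os.contains("linux")) {
            osPart = "linux";
        } else {
            throw new IllegalArgumentException("Unsupported OS: " + osName);
        }

        String archPart;
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            archPart = "x86_64";
        } else if (arch.equals("aarch64") || arch.equals("arm64")) {
            archPart = "aarch64";
        } else {
            archPart = arch; // pass unknown architectures through unchanged
        }
        return osPart + "-" + archPart;
    }

    /** Builds the (assumed, hypothetical) classpath location of a bundled executable. */
    public static String resourcePath(String tool, String osName, String osArch) {
        String key = platformKey(osName, osArch);
        String suffix = key.startsWith("windows") ? ".exe" : "";
        return "/native/" + key + "/" + tool + suffix;
    }

    public static void main(String[] args) {
        String path = resourcePath("blastp",
                System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println("Would look up bundled binary at: " + path);
    }
}
```

In a real module the resource at that path would then typically be 
opened with Class.getResourceAsStream, copied to a temporary file with 
Files.copy and marked executable with File.setExecutable before being 
launched.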

>
> Just a few technical comments on the SearchIO module:
> - it is included in the core module since it defines a new base data 
> structure
> - it includes a dependency on biojava-alignment. This is not 
> compulsory; it is there because the alignment data structure is 
> included in that package. In my opinion, moving this important data 
> structure to core will solve this and avoid similar problems in the 
> future. This is also the reason why I chose to add the newly 
> implemented Hits/Hsp etc. directly in core; after all, search is one 
> of the most important tasks in bioinformatics.

I agree it only makes sense to have it in core right now. However, the 
dependency on biojava-alignment is not ideal. In principle the best 
solution would be to move the alignment data structure to core, I agree 
with you. For the time being, another possibility would be to put all the 
new SearchIO stuff in biojava-alignment, but that doesn't really improve 
the structure of things. I'd vote for moving the alignment data 
structure to core as the sounder long-term solution.

This would surely require a new minor version, so that all the new stuff 
would be released as biojava 4.2.


> - the BlastXML parser is implemented in the BlastXMLQuery class. Maybe 
> this name is not very meaningful; it comes from the original class, 
> which is still there in biojava even if it does not seem to be used 
> much, and which I initially started to improve while trying to stay 
> close to the original project. Hence also the use of the XMLHelper 
> class and some deprecated tags I added. From the old thread I 
> understood that biojava had no "elective choice" for XML parsing, but 
> the job was already done with the XMLHelper module anyway, and so this 
> class came to new life.
> - it was designed to be easy to extend: to add support for a new file 
> format, a developer just has to write a single class that implements 
> the ResultFactory interface (I have also implemented a BLAST tabular 
> parser to show it). The API for the biojava user does not change; it 
> is just:
> SearchIO reader = new SearchIO(new File("BlastReport.blastxml"), 
> blastResultFactory);
>
> - it is possible to auto-recognise file formats based on the standard 
> file extension. Just try a different constructor:
> SearchIO reader = new SearchIO(new File("BlastReport.blastxml"));
>
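
To make the extension-based dispatch described above a bit more 
concrete, here is a minimal sketch of how a reader could pick a parser 
from the file name. This is my own illustration, not the actual SearchIO 
code: FormatFactory is a hypothetical stand-in for the ResultFactory 
interface, and its two methods, the Registry class and the extensions 
registered for the tabular format are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ExtensionDispatchSketch {

    /** Hypothetical stand-in for a per-format parser; NOT the real biojava ResultFactory. */
    public interface FormatFactory {
        String formatName();
        List<String> fileExtensions();
    }

    /** Convenience method building a factory that only carries its metadata. */
    public static FormatFactory factory(final String name, final String... extensions) {
        return new FormatFactory() {
            @Override public String formatName() { return name; }
            @Override public List<String> fileExtensions() { return Arrays.asList(extensions); }
        };
    }

    /** Maps lower-cased file extensions to the factory registered for them. */
    public static final class Registry {
        private final Map<String, FormatFactory> byExtension = new HashMap<>();

        public void register(FormatFactory f) {
            for (String ext : f.fileExtensions()) {
                byExtension.put(ext.toLowerCase(Locale.ROOT), f);
            }
        }

        /** Picks a factory from the text after the last dot of the file name. */
        public FormatFactory guess(String fileName) {
            int dot = fileName.lastIndexOf('.');
            if (dot < 0 || dot == fileName.length() - 1) {
                throw new IllegalArgumentException("No file extension in: " + fileName);
            }
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            FormatFactory f = byExtension.get(ext);
            if (f == null) {
                throw new IllegalArgumentException("No parser registered for extension: " + ext);
            }
            return f;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(factory("blast-xml", "blastxml"));
        // The extensions below are assumed purely for illustration.
        registry.register(factory("blast-tabular", "blasttabular", "blasttxt"));
        System.out.println(registry.guess("BlastReport.blastxml").formatName());
    }
}
```

With a lookup like this, the one-argument constructor would only need to 
resolve the factory from the file name and delegate to the two-argument 
one, so adding a format remains a matter of registering one more class.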

What about the different BLAST XML formats? Would it work with all the 
latest ones? BLAST+ 2.2.31 introduced some modifications to the 
format (see http://www.ncbi.nlm.nih.gov/books/NBK131777/).

> If you agree that this feature would be interesting for the project, I 
> can send a pull request for the SearchIO part and then also push the 
> run module to my GitHub.
>

I think it is a very nice addition. In my opinion you should go ahead 
with the pull request, which will make it easier for everyone to check it 
out and review it.

Jose
