[Biojava-l] error: Program ncbi-blastn Version 2.2.17 is not supported
Andy Yates
ayates at ebi.ac.uk
Tue Nov 27 15:16:12 UTC 2007
I was always under the impression that blast's XML output was nearly as
hard to parse as the flat file format, but I do agree that using XML
wherever we can would make writing parsers a lot easier (especially if
there are SAX-based XPath libraries available). Actually, this brings up
a good question about how we develop this type of parser. The majority
of XPath-supporting libraries are DOM-based, which means large memory
usage in some situations but an easier coding experience overall (and
hopefully fewer chances for us to create bugs). Or should we code for
the edge case of someone trying to parse a 1GB XML file? Personally I'd
favour the DOM-based approach.
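To make the DOM-versus-streaming trade-off concrete, here is a minimal sketch in Java that uses only the JDK's javax.xml APIs. It pulls the Hit_id values out of a small, hand-written fragment shaped like NCBI BLAST XML twice. The first pass uses DOM + XPath. The second uses StAX, a pull-based streaming reader swapped in here in place of SAX. The class name, method names and sample data are hypothetical, and the element names are assumed from NCBI's BLAST XML layout. This is not BioJava code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; element names assumed from NCBI BLAST XML.
public class BlastXmlSketch {

    // Illustrative, hand-written data only (not real BLAST output).
    static final String SAMPLE =
          "<BlastOutput>"
        + "<BlastOutput_program>blastn</BlastOutput_program>"
        + "<BlastOutput_iterations><Iteration><Iteration_hits>"
        + "<Hit><Hit_id>hit1</Hit_id><Hit_hsps><Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps></Hit>"
        + "<Hit><Hit_id>hit2</Hit_id><Hit_hsps><Hsp><Hsp_evalue>0.003</Hsp_evalue></Hsp></Hit_hsps></Hit>"
        + "</Iteration_hits></Iteration></BlastOutput_iterations>"
        + "</BlastOutput>";

    // DOM + XPath: builds the whole document tree in memory, but the query is one line.
    static List<String> hitIdsWithXPath(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//Hit/Hit_id", doc, XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent());
        }
        return ids;
    }

    // StAX streaming: never builds a tree, so memory does not grow with the
    // document itself (only with the IDs collected), but we track position by hand.
    static List<String> hitIdsStreaming(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> ids = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("Hit_id")) {
                // Reads the element's text and leaves the cursor on its end tag.
                ids.add(reader.getElementText());
            }
        }
        reader.close();
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hitIdsWithXPath(SAMPLE));
        System.out.println(hitIdsStreaming(SAMPLE));
    }
}
```

Both methods return the same IDs. The XPath version is shorter and easier to extend with new queries. The streaming version is the one that would survive a 1GB file. Real BLAST XML files also carry a DOCTYPE pointing at NCBI's DTD, which a default DocumentBuilder may try to fetch. That is worth checking before pointing this at real output.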
Going back to the original topic, there are going to be situations where
people want the flat file parsers/writers, & I think it's a valid point
to say this is where BioJava is meant to come in & help a developer.
After all, XML is a computer science problem, whereas parsing an EMBL
flat file or blast output is a bioinformatics problem.
Andy
Mark Schreiber wrote:
> For a long time now my feeling has been that we should *only* support
> the XML version of blast output. The other formats are too brittle to
> parse reliably. I feel similarly about Genbank, EMBL, etc. That may be
> an extreme view, but the power of generic XML parsers and tools like
> XPath really makes these formats look very attractive.
>
> - Mark
>
>
> On Nov 27, 2007 7:47 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>> I think Groovy have adopted a similar system recently & have guidelines
>> for how each module should behave (dependencies, build system, etc.).
>> This enforces the idea that a module, whilst not part of the core
>> project, must behave in the same manner the core does. I do like the
>> idea that we can have a core BioJava with things added around it, & it
>> might encourage other users to start developing their own modules for
>> whatever formats/purposes they want.
>>
>> Richard Holland wrote:
>>>
>>>> What format options are there from blast? Just thinking: if it
>>>> supports CIGAR or something like that, are we better off providing a
>>>> parser for that format & saying that we do not support the traditional
>>>> blast output? That said, it doesn't help us when that format changes,
>>>> so maybe what is needed is a way to push out parser changes without
>>>> requiring a full biojava release (v3 discussion) ...
>>> Exactly! So the modular idea would work nicely here - we could have a
>>> blast module and only update that single module (which would be its own
>>> JAR) whenever the format changes. In a way, BioJava releases as such
>>> would no longer happen, except maybe for some kind of core BioJava
>>> module. Everything would be done in terms of individual module+JAR
>>> releases instead - one for Genbank, one for BioSQL, one for NEXUS, one
>>> for Phylogenetic tools, one for translation/transcription, etc. etc.
>>>
>>> cheers,
>>> Richard
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>