[Biojava-dev] Parser backwards compatibility

Scooter Willis HWillis at scripps.edu
Sat Apr 14 14:48:12 UTC 2012


The reality is still that each data set typically comes in one data
format. Since you are going after a specific return type for the actual
data, I don't think you gain very much by abstracting out file parsing. I
think it is more important to handle large files the way we already
support indexing of FASTA files: a first pass finds all the index
positions in the file and returns the appropriate sequence objects. If at
some point in the future you need the sequence, the underlying storage
mechanism knows how to retrieve the data from disk quickly. The same
concept applies to databases or remote web services.
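
A minimal sketch of the indexing idea described above, assuming a plain
uncompressed FASTA file. The class name FastaIndexSketch and its methods
are hypothetical and are not BioJava API; the sketch only illustrates
"index on the first pass, read from disk on demand".

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not BioJava API): index record offsets, fetch lazily. */
public class FastaIndexSketch {
    private final Path file;
    // Record id -> byte offset of the first sequence line of that record.
    private final Map<String, Long> offsets = new LinkedHashMap<>();

    /** First pass: remember where each record's sequence data starts. */
    public FastaIndexSketch(Path file) throws IOException {
        this.file = file;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            String line;
            // RandomAccessFile.readLine is unbuffered and slow; it is used
            // here only because it keeps the file pointer exact.
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    // Assumption: the id is the first token of the header.
                    String id = line.substring(1).trim().split("\\s+")[0];
                    offsets.put(id, raf.getFilePointer());
                }
            }
        }
    }

    /** Ids in file order; no sequence data has been loaded at this point. */
    public Set<String> ids() {
        return offsets.keySet();
    }

    /** Seeks to the stored offset and reads residues up to the next header. */
    public String fetch(String id) throws IOException {
        Long start = offsets.get(id);
        if (start == null) {
            throw new IllegalArgumentException("Unknown sequence id: " + id);
        }
        StringBuilder residues = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            String line;
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                residues.append(line.trim());
            }
        }
        return residues.toString();
    }
}
```

Only the id-to-offset map stays in memory, so a large file costs one
sequential scan up front and one seek per sequence that is actually
requested. A database or web-service backend could sit behind the same
fetch-by-id shape.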

On 4/14/12 10:35 AM, "P. Troshin" <to.petr at gmail.com> wrote:

>> So what you're looking for is something like this?
>> FastaParser fasta = ParserFactory.fasta("example.fasta");
>> FastqParser fastq = ParserFactory.fastq("example.fastq");
>
>Yes, only that I expect to construct a parser from an InputStream as well.
>I agree with Hannes that having a factory you could guess the input
>format and instantiate an appropriate parser. However, I do not see
>this as a particularly important feature because in the real life you
>usually know which format you work with.
>
>Regards,
>Peter
>
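
For the format-guessing idea mentioned above, here is a minimal sketch
that works on a stream rather than a file name. It assumes that FASTA
input starts with '>' and FASTQ input with '@'. FormatSniffer is a
hypothetical name, and SeqFormat only echoes the enum name used in
David's proposal; neither is an existing BioJava class.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

/** Hypothetical sketch (not BioJava API): guess a format from a stream. */
public class FormatSniffer {
    public enum SeqFormat { FASTA, FASTQ, UNKNOWN }

    // Maximum number of leading bytes inspected before giving up.
    private static final int LIMIT = 1024;

    /**
     * Looks at the first non-whitespace byte and then rewinds the stream,
     * so the parser chosen afterwards still sees the complete input.
     */
    public static SeqFormat guess(BufferedInputStream in) throws IOException {
        in.mark(LIMIT);
        try {
            for (int i = 0; i < LIMIT; i++) {
                int b = in.read();
                if (b == -1) {
                    return SeqFormat.UNKNOWN;   // empty or all whitespace
                }
                if (Character.isWhitespace(b)) {
                    continue;                   // skip leading blank lines
                }
                if (b == '>') {
                    return SeqFormat.FASTA;
                }
                if (b == '@') {
                    return SeqFormat.FASTQ;
                }
                return SeqFormat.UNKNOWN;
            }
            return SeqFormat.UNKNOWN;
        } finally {
            in.reset();                         // undo the peek
        }
    }
}
```

A factory could call guess() only when the caller does not name a format,
which keeps the common case (the format is known in advance) free of any
guessing.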
>
>On 14 April 2012 14:50, David Felty <davfelty at gmail.com> wrote:
>> Michael Heuer wrote:
>>> Open source projects should provide room for
>>> both evolutionary and revolutionary changes
>>
>> Thanks for all the info, very useful!
>>
>>
>> P. Troshin wrote:
>>> I think you just need to make a common entry point for them.
>>> E.g a factory class which would contain functions to
>>> instantiate various parsers.
>>
>> So what you're looking for is something like this?
>> FastaParser fasta = ParserFactory.fasta("example.fasta");
>> FastqParser fastq = ParserFactory.fastq("example.fastq");
>>
>> Scooter Willis wrote:
>>> Can you give some examples of what you are trying to do for
>>> the common set of interfaces?
>>
>> I gave this example in my proposal at
>> 
>> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dfelt/2001
>>
>> for (BasicSequence seq : SeqIO.parse(inStream, SeqFormat.FASTA)) {
>>     System.out.println(seq.getSequenceAsString());
>> }
>>
>> But I think Troshin's idea would be easier to implement, given
>> the current BioJava parsers.
>>
>> On Apr 13, 2012 1:31 PM, "P. Troshin" <to.petr at gmail.com> wrote:
>>>
>>> Hi David,
>>>
>>> > In order to fit BioJava's parsers into a shared API, I would like to
>>> > wrap them under a common set of interfaces.
>>>
>>> I think you just need to make a common entry point for them. E.g. a
>>> factory class which would contain functions to instantiate various
>>> parsers.
>>> You only need a common interface for the same parsers, e.g. Fasta
>>> parsers. However, I'd be inclined to converge all Fasta parsers in
>>> BioJava to one parser. So I am not sure you'd need a common interface
>>> in the end.
>>>
>>> >However, I foresee that
>>> > some of the parsers will resist being wrapped, and will need to either
>>> > be modified or rewritten.
>>>
>>> You'll need to choose the best parser and implement the features that
>>> are lacking from it. The other parsers can then be retired.
>>>
>>>
>>> >However, this would mean that two different
>>> > copies of the same parsers would exist in BioJava, which I think is
>>> > kind of ugly.
>>>
>>> Yes, that would be scary for languages like Perl or Python, but
>>> Java is a compiled language, so you'll see most of the problems as
>>> compilation errors. You will also need to write unit tests for the
>>> existing parsers and then for your new parser to make sure that the
>>> rewrite was successful.
>>>
>>> >However, this would mean that two different
>>> > copies of the same parsers would exist in BioJava, which I think is
>>> > kind of ugly.
>>>
>>> The whole idea of this project is to get rid of this ugliness and
>>> provide a streamlined API for the users as well as powerful
>>> parsers.
>>>
>>> Hope that helps.
>>> Regards,
>>> Peter
>>>
>>>
>>> On 13 April 2012 14:47, David Felty <davfelty at gmail.com> wrote:
>>> > In order to fit BioJava's parsers into a shared API, I would like to
>>> > wrap them under a common set of interfaces. However, I foresee that
>>> > some of the parsers will resist being wrapped, and will need to either
>>> > be modified or rewritten. So my question is, should I keep the
>>> > original versions of these problematic parsers around for backwards
>>> > compatibility, or can I freely modify them to fit into the new API?
>>> > I'm afraid that the latter would break existing code, so I'm more
>>> > inclined to do the former. However, this would mean that two different
>>> > copies of the same parsers would exist in BioJava, which I think is
>>> > kind of ugly. Any thoughts?
>>> >
>>> > Thanks,
>>> > David
>>> > _______________________________________________
>>> > biojava-dev mailing list
>>> > biojava-dev at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>