[Biojava-dev] [Biojava-l] file i/o with ArrayList
Paolo Pavan
paolo.pavan at gmail.com
Thu Feb 5 11:23:14 UTC 2015
Hi Stephan,
I have recently worked with AbstractSequence and its subclasses, so I have
things fresh in my mind.
Not every sequence has a proxy loader, and not every proxy loader is a lazy
loader; the design is much more complicated, and we have already discussed
it. I also proposed a refactoring, but that is the long way and I'm not
really sure it is worthwhile.
Nevertheless, if you have some time to spend on it, I think the
quickest solution for you would be to upgrade the GenbankWriter class,
taking advantage of the new full GenBank reading capability of BioJava 4.
Would you?
Paolo
2015-02-05 0:42 GMT+01:00 stefan harjes <stefanharjes at yahoo.de>:
> Hi Andreas,
>
> yes, I took a look at FastaWriterHelper as well as GenbankWriter, and they
> only seem to write the name and sequence as FASTA. They also do not allow
> reading/writing a mixed array of protein and DNA sequences. I asked myself:
> what is the point of constructing a complicated sequence with annotations,
> features and links if I can only write FASTA?
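A minimal sketch of how a mixed list of DNA and protein records could be written and read back as plain text, assuming a hypothetical convention of putting `type=DNA` or `type=PROTEIN` in the FASTA header. `TaggedFasta`, `Rec` and `MolType` are illustrative names, not BioJava API, and the header tag is not part of any FASTA standard; the point is only that a round trip works when writer and reader agree on how the molecule type is recorded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: FASTA with a "type=" header token. Not a BioJava API. */
public class TaggedFasta {

    /** Hypothetical molecule types. */
    enum MolType { DNA, PROTEIN }

    /** Minimal record: identifier, molecule type and residues. */
    static final class Rec {
        final String id;
        final MolType type;
        final String residues;

        Rec(String id, MolType type, String residues) {
            this.id = id;
            this.type = type;
            this.residues = residues;
        }
    }

    /** Writes each record as ">id type=DNA" followed by its residues on one line. */
    static void write(Writer out, List<Rec> recs) throws IOException {
        for (Rec r : recs) {
            out.write(">" + r.id + " type=" + r.type + "\n");
            out.write(r.residues + "\n");
        }
        out.flush();
    }

    /** Reads records back; wrapped sequence lines are concatenated until the next header. */
    static List<Rec> read(Reader in) throws IOException {
        List<Rec> recs = new ArrayList<>();
        BufferedReader br = new BufferedReader(in);
        String id = null;
        MolType type = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            if (line.startsWith(">")) {
                if (id != null) recs.add(new Rec(id, type, seq.toString()));
                String[] parts = line.substring(1).trim().split("\\s+");
                id = parts[0];
                type = MolType.DNA; // assumption: untagged records default to DNA
                for (String p : parts) {
                    // valueOf throws IllegalArgumentException for unknown tags
                    if (p.startsWith("type=")) type = MolType.valueOf(p.substring(5));
                }
                seq.setLength(0);
            } else {
                seq.append(line.trim());
            }
        }
        if (id != null) recs.add(new Rec(id, type, seq.toString()));
        return recs;
    }

    public static void main(String[] args) throws IOException {
        List<Rec> recs = new ArrayList<>();
        recs.add(new Rec("seq1", MolType.DNA, "AGCT"));
        recs.add(new Rec("seq2", MolType.PROTEIN, "MKV"));
        StringWriter sw = new StringWriter();
        write(sw, recs);
        for (Rec r : read(new StringReader(sw.toString()))) {
            System.out.println(r.id + " " + r.type + " " + r.residues);
        }
    }
}
```

Annotations, features and links would still be lost; carrying them needs a richer format such as full GenBank.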
>
> This led me to check why one of the most basic classes of BioJava, the
> sequence (i.e. AbstractSequence), is not serializable.
> (Isn't it like String for Java?)
>
> The first thing I noticed is that, for some reason, every sequence has a
> proxy loader. As far as I understand, the proxy is there so that the entire
> sequence does not have to be loaded when it is very big. Sure, that way you
> can load sequences that are gigabases long. But in my 25 years of
> biochemistry I have never actually worked with a single sequence longer
> than one gigabase. While some plant chromosomes might fit this description,
> I would argue that the vast majority of biological sequences are much
> smaller and thus do not need a proxy for a single sequence. I would
> therefore conclude that only a small subset of ChromosomeSequence instances
> might need a proxy reader implementation.
> Shouldn't it then be implemented there, rather than in the most basic class?
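A sketch of the class hierarchy argued for above, using hypothetical classes rather than BioJava's own: `PlainSequence` simply keeps its residues in a `String`, and only the `LazySequence` subclass defers loading to a `java.util.function.Supplier`, which stands in for a proxy loader. The loader runs once, on first access, and the result is cached.

```java
import java.util.function.Supplier;

/** Hypothetical design sketch; not BioJava's AbstractSequence hierarchy. */
public class LazySequenceSketch {

    /** Plain in-memory sequence: all that most small sequences need. */
    static class PlainSequence {
        private String residues;

        PlainSequence(String residues) {
            this.residues = residues;
        }

        /** For subclasses that fill in the residues later. */
        protected PlainSequence() {
        }

        public String getSequenceAsString() {
            return residues;
        }

        public int getLength() {
            return getSequenceAsString().length();
        }

        protected void setResidues(String residues) {
            this.residues = residues;
        }
    }

    /** Only very large sequences (e.g. some chromosomes) would use lazy loading. */
    static class LazySequence extends PlainSequence {
        private final Supplier<String> loader; // stand-in for a proxy loader
        private boolean loaded = false;
        int loadCount = 0; // how often the loader ran, for illustration only

        LazySequence(Supplier<String> loader) {
            this.loader = loader;
        }

        @Override
        public String getSequenceAsString() {
            if (!loaded) {
                setResidues(loader.get()); // load on first access only
                loaded = true;
                loadCount++;
            }
            return super.getSequenceAsString();
        }
    }

    public static void main(String[] args) {
        PlainSequence small = new PlainSequence("AGCT");
        LazySequence big = new LazySequence(() -> "ACGTACGT"); // pretend this reads from disk
        System.out.println(small.getLength()); // 4
        System.out.println(big.getLength());   // 8, loader runs here
        System.out.println(big.getLength());   // 8, cached
        System.out.println(big.loadCount);     // 1
    }
}
```

Because `PlainSequence` holds nothing but a `String`, only the subclass that actually uses a loader would need special handling when serializing.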
>
> The first class that prevents serialization is, as you mentioned,
> NucleotideCompound. I lack the BioJava experience to say what is essential
> in NucleotideCompound and why it has no empty constructor. But I saw, for
> example in BioJava 3.1, that compounds are allowed to have names of
> flexible length, which I have never seen in actual sequence data, where the
> name is always one or three characters. Isn't it a better strategy to keep
> basic classes such as Sequence and Compound more basic in order to allow
> serialization? Implementation of more complex features could then be moved
> to classes that extend the basic classes?
>
> In my humble opinion, one could instantiate a compound without a 'base'
> name, and once this compound is added to the compound set, check that it
> actually has a base name?
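A sketch of this proposal with hypothetical `SimpleCompound` and `SimpleCompoundSet` classes (not BioJava's `NucleotideCompound` or its compound sets): the compound gets a zero-args constructor, which is what Kryo uses by default to instantiate classes, and the check for a base name moves from the constructor to `SimpleCompoundSet.add`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: validate compounds when they join a set, not when constructed. */
public class CompoundSketch {

    static class SimpleCompound {
        private String base; // e.g. "A"; may be null right after construction

        SimpleCompound() { // zero-args constructor, usable by serializers
        }

        SimpleCompound(String base) {
            this.base = base;
        }

        String getBase() { return base; }

        void setBase(String base) { this.base = base; }
    }

    static class SimpleCompoundSet {
        private final Map<String, SimpleCompound> byBase = new LinkedHashMap<>();

        /** Validation happens here rather than in the compound's constructor. */
        void add(SimpleCompound c) {
            if (c.getBase() == null || c.getBase().isEmpty()) {
                throw new IllegalArgumentException("compound has no base name");
            }
            byBase.put(c.getBase(), c);
        }

        SimpleCompound get(String base) { return byBase.get(base); }

        int size() { return byBase.size(); }
    }

    public static void main(String[] args) {
        SimpleCompoundSet set = new SimpleCompoundSet();
        SimpleCompound a = new SimpleCompound(); // allowed: empty compound
        a.setBase("A");
        set.add(a);                              // accepted: base name present
        try {
            set.add(new SimpleCompound());       // rejected: still no base name
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(set.size()); // 1
    }
}
```

The trade-off is that an invalid compound can exist for a while; the invariant is only guaranteed for compounds that are members of a set.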
>
> I do not want to sound like a know-it-all, and I am not trying to reinvent
> BioJava. However, to be honest, the (unsuccessful) effort of trying to
> serialize an ArrayList<Sequence<?>>, whether to send it over TCP/IP, to
> JSON or to disk, has been so frustrating and time-consuming that I am
> actually considering switching to Jython/Biopython, BioJavaX, or writing
> my own implementation.
>
> Cheers
> Stefan
>
> On Thursday, 5 February 2015, 4:32, Andreas Prlic <andreas at sdsc.edu>
> wrote:
>
> Hi Stefan,
>
> just another quick follow-up. You took a look at FastaWriterHelper and it
> was not useful, right? Do you need to serialize some header information as
> well, or what was the problem with it?
>
>
> http://www.biojava.org/docs/api/org/biojava/nbio/core/sequence/io/FastaWriterHelper.html
>
> Thanks,
>
> Andreas
>
>
> On Wed, Feb 4, 2015 at 7:13 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>
> Thanks for pointing this out, Stefan. The problem is that the
> NucleotideCompound class does not have a zero-args constructor. That means
> you need to tweak Kryo a bit. Kryo can be configured to use an
> InstantiatorStrategy to handle creating instances of a class.
> https://github.com/EsotericSoftware/kryo/blob/master/README.md
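Kryo itself lives outside the JDK, so this sketch swaps in Java's built-in serialization to illustrate the same principle: a `Serializable` class does not need a zero-args constructor, because deserialization only calls the zero-args constructor of the first non-serializable superclass (here `Object`), never the class's own constructor. `Codon` is a hypothetical stand-in for a compound class. For Kryo 3.x, the README shows a configuration along the lines of `kryo.setInstantiatorStrategy(new Kryo.DefaultInstantiatorStrategy(new StdInstantiatorStrategy()))`, with `StdInstantiatorStrategy` coming from Objenesis; verify the exact class names against the Kryo version you use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** JDK-serialization illustration (not Kryo): restoring a class without a zero-args constructor. */
public class NoArgDemo {

    static int constructorCalls = 0;

    /** Hypothetical compound-like class whose only constructor takes an argument. */
    static final class Codon implements Serializable {
        private static final long serialVersionUID = 1L;
        final String bases;

        Codon(String bases) {
            if (bases == null || bases.length() != 3) {
                throw new IllegalArgumentException("need exactly 3 bases");
            }
            this.bases = bases;
            constructorCalls++; // counts real constructor invocations
        }
    }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject(); // does not invoke Codon's constructor
        }
    }

    public static void main(String[] args) throws Exception {
        Codon original = new Codon("ATG");
        Codon copy = (Codon) fromBytes(toBytes(original));
        System.out.println(copy.bases);       // ATG
        System.out.println(constructorCalls); // 1: the constructor ran only for 'original'
    }
}
```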
>
> Having said that, we need to improve the API and make something like this
> easier.
>
> Andreas
>
>
>
> On Wed, Feb 4, 2015 at 2:54 AM, stefan harjes <stefanharjes at yahoo.de>
> wrote:
>
> I finally had some time to try the serialization/deserialization library
> (Kryo) you mentioned, but I cannot seem to get it to work. I cannot even
> save a DNASequence:
>
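> import java.io.FileInputStream;
> import java.io.FileNotFoundException;
> import java.io.FileOutputStream;
>
> import org.biojava.nbio.core.exceptions.CompoundNotFoundException;
> import org.biojava.nbio.core.sequence.DNASequence;
>
> import com.esotericsoftware.kryo.Kryo;
> import com.esotericsoftware.kryo.io.Input;
> import com.esotericsoftware.kryo.io.Output;
>
> void test() {
>     Kryo kryo = new Kryo();
>     DNASequence dna = null;
>     try {
>         dna = new DNASequence("AGCT");
>     } catch (CompoundNotFoundException e1) {
>         // "AGCT" contains only valid nucleotides, so this is not expected
>         e1.printStackTrace();
>     }
>     // write the sequence to a binary file
>     try {
>         Output output = new Output(new FileOutputStream("test.ser"));
>         kryo.writeObject(output, dna);
>         output.close();
>     } catch (FileNotFoundException e) {
>         e.printStackTrace();
>     }
>     // read it back
>     try {
>         Input input = new Input(new FileInputStream("test.ser"));
>         dna = kryo.readObject(input, DNASequence.class);
>         input.close();
>     } catch (FileNotFoundException e) {
>         System.out.println("file not found");
>         e.printStackTrace();
>     }
> }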
> I tried several Kryo calls and also class registration, but I cannot get it
> to work... Any ideas?
>
>
> Cheers
> Stefan
>
>
> On Saturday, 31 January 2015, 3:47, Andreas Prlic <andreas at sdsc.edu>
> wrote:
>
>
> Hi Stefan,
>
> for your use case (save and load at server start/stop) I'd recommend the
> Kryo library. It stores your data in a binary format. It should take only
> two lines of code each to persist and load the data.
> https://github.com/EsotericSoftware/kryo
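The Kryo calls themselves are not shown here; instead, this sketch uses JDK serialization to show the same save-on-stop, load-on-start pattern in a few lines. `SeqEntry` is a hypothetical serializable snapshot (id, molecule type, residues) that you would build from each BioJava sequence before saving, since `Sequence` itself is not `Serializable`.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch: persist a list of sequence snapshots with JDK serialization (not Kryo). */
public class SequenceStore {

    /** Hypothetical snapshot of a sequence; not a BioJava class. */
    static final class SeqEntry implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String molType; // e.g. "DNA" or "PROTEIN"
        final String residues;

        SeqEntry(String id, String molType, String residues) {
            this.id = id;
            this.molType = molType;
            this.residues = residues;
        }
    }

    /** Server stop: write the whole list (ArrayList is Serializable). */
    static void save(Path file, ArrayList<SeqEntry> entries) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(entries);
        }
    }

    /** Server start: read the list back. */
    @SuppressWarnings("unchecked")
    static ArrayList<SeqEntry> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (ArrayList<SeqEntry>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<SeqEntry> entries = new ArrayList<>();
        entries.add(new SeqEntry("seq1", "DNA", "AGCT"));
        entries.add(new SeqEntry("seq2", "PROTEIN", "MKV"));
        Path file = Files.createTempFile("sequences", ".ser");
        save(file, entries);
        List<SeqEntry> restored = load(file);
        System.out.println(restored.size() + " entries, first = " + restored.get(0).id);
        Files.delete(file);
    }
}
```

Only what is copied into the snapshot survives; annotations or features would have to be added as further fields.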
>
> You are right, writing is not very well developed, but there are many
> utility libraries in Java that can be used for efficient
> serialization/deserialization in many ways once you have an object in
> memory.
>
> Andreas
>
>
>
> On Fri, Jan 30, 2015 at 3:01 AM, stefan harjes <stefanharjes at yahoo.de>
> wrote:
>
> Hi biojava-l
>
> I have a huge number of small sequences in an ArrayList<Sequence<?>>,
> which I would like to store on disk at server stop and reload at server
> start. Unfortunately, Sequence is not serializable, so I searched and found
> that GenbankWriterHelper.writeSequences(OutputStream os,
> Collection<Sequence<?>> seqs) should be able to do the job.
> However, GenbankReaderHelper has no method corresponding to that writer
> method. Am I completely on the wrong track?
>
> Looking at the writer/reader helpers, I think I remember reading that
> they are rudimentary and save only the sequence (FASTA)? I would expect
> such an advanced version of BioJava (4.0 is being prepared?) to have a
> standard way to serialize rich sequences, or arrays of them, in order to
> send them around on streams, as JSON, etc.
>
> Any help would be appreciated
>
> Cheers
> Stefan
>
>
> --
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> RCSB PDB Protein Data Bank
> University of California, San Diego
>
> Editor Software Section
> PLOS Computational Biology
>
> BioJava Project Lead
> -----------------------------------------------------------------------
>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biojava-dev
>