[Biojava-dev] Biojava.util package?

Hannes Brandstätter-Müller biojava at hannes.oib.com
Thu Mar 29 14:05:29 UTC 2012


SeqIO definitely needs to be easily extensible, and to support as many
of the input file formats listed on the SeqIO pages for biopython and
bioperl as possible.
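
One way extensibility could look, as a minimal sketch only: each format plugs in behind a small interface and is looked up by name. Everything here (SequenceParser, ParserRegistry, FastaIdParser) is a hypothetical name, not existing BioJava API, and the parser only collects record identifiers to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical plug-in point: one implementation per file format. */
interface SequenceParser {
    /** Returns record identifiers; a real parser would return sequence objects. */
    List<String> parseIds(Reader in);
}

/** Hypothetical registry mapping a format name to its parser. */
final class ParserRegistry {
    private final Map<String, SequenceParser> parsers = new HashMap<>();

    /** Adds support for a new format without touching existing code. */
    void register(String format, SequenceParser parser) {
        parsers.put(format.toLowerCase(Locale.ROOT), parser);
    }

    SequenceParser forFormat(String format) {
        SequenceParser parser = parsers.get(format.toLowerCase(Locale.ROOT));
        if (parser == null) {
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        return parser;
    }
}

/** Minimal FASTA header reader, for illustration only. */
final class FastaIdParser implements SequenceParser {
    @Override
    public List<String> parseIds(Reader in) {
        List<String> ids = new ArrayList<>();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    // The identifier is the first whitespace-delimited token after '>'.
                    ids.add(line.substring(1).trim().split("\\s+")[0]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }
}
```

A new format would then be one register(...) call plus one class, which is roughly the property I mean by "easily extensible".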

I would also prefer the biopython way of being explicit in specifying
what is contained in the files instead of trying to guess from the
filename/extension.
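
To illustrate why guessing is fragile, here is a small sketch of an extension-based guesser. The names SeqFormat and FormatGuesser are made up for this example, and the extension list is an assumption, not a complete mapping.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical format constants; the names are illustrative only. */
enum SeqFormat { FASTA, FASTQ }

final class FormatGuesser {
    /** Guesses from the extension; empty when the name gives no reliable hint. */
    static Optional<SeqFormat> fromFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".fasta") || lower.endsWith(".fa")) {
            return Optional.of(SeqFormat.FASTA);
        }
        if (lower.endsWith(".fastq") || lower.endsWith(".fq")) {
            return Optional.of(SeqFormat.FASTQ);
        }
        // Names such as "reads.txt" carry no usable hint at all.
        return Optional.empty();
    }
}
```

The guess has nothing to go on for a file called reads.txt, and an InputStream has no name in the first place, so an explicit format argument covers cases the guesser cannot.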

I think making a wrapper like you propose is a good idea, but the
design specifics need to be discussed in more detail (static methods
vs. SeqIO instance with the settings defined via the constructor…)
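
As a starting point for that discussion, a rough sketch of the two shapes side by side. All names (IoFormat, StaticSeqIO, InstanceSeqIO) are hypothetical, only FASTA headers are read, and the records are plain identifier strings rather than real sequence objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical format constants for this sketch. */
enum IoFormat { FASTA, FASTQ }

/** Style 1: stateless static entry point; all settings are passed per call. */
final class StaticSeqIO {
    static List<String> parseIds(Reader in, IoFormat format) {
        List<String> ids = new ArrayList<>();
        InstanceSeqIO reader = new InstanceSeqIO(in, format);
        while (reader.hasNext()) {
            ids.add(reader.next());
        }
        return ids;
    }
}

/** Style 2: settings fixed in the constructor; records are pulled lazily. */
final class InstanceSeqIO implements Iterator<String> {
    private final BufferedReader in;
    private String nextId;

    InstanceSeqIO(Reader in, IoFormat format) {
        if (format != IoFormat.FASTA) {
            throw new IllegalArgumentException("Only FASTA is sketched here: " + format);
        }
        this.in = new BufferedReader(in);
        advance();
    }

    /** Reads ahead to the next header line, or sets nextId to null at end of input. */
    private void advance() {
        nextId = null;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    nextId = line.substring(1).trim();
                    return;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextId != null;
    }

    @Override
    public String next() {
        if (nextId == null) {
            throw new NoSuchElementException();
        }
        String id = nextId;
        advance();
        return id;
    }
}
```

The static form is closer to Biopython's SeqIO.parse, while the instance form leaves room for more settings later without growing the parameter list; in this sketch the static method simply delegates to the instance.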

Hannes

On Thu, Mar 29, 2012 at 15:55, David Felty <davfelty at gmail.com> wrote:
> I've actually been working on something like this for my GSoC proposal.
> Here's what I came up with:
>
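> import java.io.InputStream;
> import org.biojava3.core.sequence.DNASequence;
> import org.biojava3.core.sequence.ProteinSequence;
> import org.biojava3.core.sequence.io.FastaReaderHelper;
> import org.biojava3.core.sequence.template.Sequence;
>
> public class SeqIO {
>     public static final int FASTA = 0;
>     public static final int FASTQ = 1;
>     public static final Class<DNASequence> DNA = DNASequence.class;
>     public static final Class<ProteinSequence> PROTEIN = ProteinSequence.class;
>
>     @SuppressWarnings("unchecked")
>     public static <S extends Sequence> Iterable<S> parse(InputStream is,
>             int fileFormat, Class<S> seqType) throws Exception {
>         switch (fileFormat) {
>             case FASTA:
>                 if (seqType == DNA) {
>                     // readFastaDNASequence returns a map keyed by header,
>                     // so iterate over its values rather than casting the map.
>                     return (Iterable<S>) FastaReaderHelper.readFastaDNASequence(is).values();
>                 } else if (seqType == PROTEIN) {
>                     // etc...
>                 }
>                 break;
>             case FASTQ:
>                 // etc...
>                 break;
>         }
>         // Every path must return or throw, otherwise the method does not compile.
>         throw new IllegalArgumentException("Unsupported format/type combination");
>     }
> }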
>
> It would be used like so:
>
> InputStream is = ...
> Iterable<DNASequence> seqs = SeqIO.parse(is, SeqIO.FASTA, SeqIO.DNA);
> for (DNASequence s : seqs) {
>    // do something
> }
>
> Obviously it's not the prettiest and a lot could be changed, but that's my
> initial design. I tried to base it off BioPython's SeqIO, but static typing
> and the variety of Sequence types forced me to put in some nasty generics.
> Any tips would be appreciated!
>
> David
>
> On Thu, Mar 29, 2012 at 4:27 AM, Hannes Brandstätter-Müller
> <biojava at hannes.oib.com> wrote:
>>
>> Yes, something like a simplifying and unifying wrapper would be what I
>> am thinking of.
>>
>> Hannes
>>
>> On Thu, Mar 29, 2012 at 05:55, Andreas Prlic <andreas at sdsc.edu> wrote:
>> > Hi Hannes,
>> >
>> > I guess this is pretty similar to:
>> >
>> > http://biojava.org/wiki/BioJava:CookBook:Core:FastaReadWrite
>> >
>> > We have also been using "proxy" objects to fetch sequence data on the fly:
>> >
>> > http://biojava.org/wiki/BioJava:CookBook:Core:Sequences
>> >
>> > As such I think we should discuss this a bit more. If we can find a
>> > common api that is simple and works with both local files as well as
>> > remote proxy objects, that would be nice. There should be no need to
>> > change much of the existing code, but perhaps there could be a
>> > simplified wrapper for what is already there.
>> >
>> >  Andreas
>> >
>> > On Wed, Mar 28, 2012 at 12:04 PM, Hannes Brandstätter-Müller
>> > <biojava at hannes.oib.com> wrote:
>> >> Hi,
>> >>
>> >> I browsed around in the sister projects Biopython and Bioperl a bit,
>> >> and noticed that much of the user interaction with the code goes
>> >> through classes like SeqIO, SearchIO, AlignIO...
>> >>
>> >> So that got me thinking: how about we create similar "Interface"
>> >> classes in Biojava?
>> >>
>> >> PROS:
>> >>
>> >>  - easy change for programmers who switch languages
>> >>  - easy base interface that can be used in cookbook examples
>> >>  - makes code more readable if designed properly
>> >>  - easy access to features that are spread over the whole codebase but
>> >> are connected anyway, like all file parsers
>> >>
>> >> CONS:
>> >>
>> >>  - another thing to maintain
>> >>  - creates possible cross-dependencies (but if you don't want that,
>> >> just use the existing classes directly)
>> >>
>> >>
>> >> What are your thoughts?
>> >>
>> >> Python, from http://biopython.org/wiki/SeqIO:
>> >>
>> >> from Bio import SeqIO
>> >> with open("example.fasta") as handle:
>> >>     for record in SeqIO.parse(handle, "fasta"):
>> >>         print(record.id)
>> >>
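>> >> Possible equivalent in BioJava (with support for a streaming API,
>> >> Iterators, etc.?):
>> >>
>> >> import java.io.File;
>> >> import org.biojava3.util.SeqIO;
>> >>
>> >> File file = new File("example.fasta");
>> >> SeqIO seqIO = new SeqIO(file, SeqIO.FASTA);
>> >> while (seqIO.hasNext()) {
>> >>    System.out.println(seqIO.next());
>> >> }
>> >> // java.io.File has no close(); this assumes the proposed SeqIO would
>> >> // own the underlying stream and expose close() itself.
>> >> seqIO.close();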
>> >>
>> >> Hannes
>> >> _______________________________________________
>> >> biojava-dev mailing list
>> >> biojava-dev at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> >
>> >
>> >
>> > --
>> > -----------------------------------------------------------------------
>> > Dr. Andreas Prlic
>> > Senior Scientist, RCSB PDB Protein Data Bank
>> > University of California, San Diego
>> > (+1) 858.246.0526
>> > -----------------------------------------------------------------------
>>
>
>



