[Biojava-dev] Biojava.util package?

Andreas Prlic andreas at sdsc.edu
Thu Mar 29 14:39:39 UTC 2012


Hi David,

So far it still feels like a wrapper for what is already there. Try to
take it to the next level. Why does the user still need to provide the
file type? Can't it be auto-detected? What is the behaviour for
non-FASTA files? What can be supported, where are the limits, etc.?
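
As a rough illustration of auto-detection, here is a minimal sketch that peeks at the first non-whitespace byte of a stream: FASTA records start with '>' and FASTQ records with '@'. The class name FormatSniffer, its Format enum, and the 8192-byte look-ahead limit are assumptions made up for this example, not existing BioJava API.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

// Hypothetical helper, not part of BioJava.
final class FormatSniffer {
    enum Format { FASTA, FASTQ, UNKNOWN }

    // Arbitrary look-ahead limit chosen for this sketch.
    private static final int LOOKAHEAD = 8192;

    /**
     * Guesses the format from the first non-whitespace byte, then rewinds
     * the stream so a real parser can still read it from the start.
     */
    static Format detect(BufferedInputStream in) throws IOException {
        in.mark(LOOKAHEAD);
        try {
            for (int i = 0; i < LOOKAHEAD; i++) {
                int b = in.read();
                if (b == -1) {
                    return Format.UNKNOWN;      // empty or whitespace-only input
                }
                if (Character.isWhitespace(b)) {
                    continue;                   // skip leading blank lines
                }
                if (b == '>') {
                    return Format.FASTA;
                }
                if (b == '@') {
                    return Format.FASTQ;
                }
                return Format.UNKNOWN;
            }
            return Format.UNKNOWN;
        } finally {
            in.reset();                         // undo the peek
        }
    }
}
```

A single byte is a weak signal, so a real implementation would probably want to validate a full record before committing to a parser. The same peek-and-rewind pattern would extend to that.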

Andreas

On Thu, Mar 29, 2012 at 6:55 AM, David Felty <davfelty at gmail.com> wrote:
> I've actually been working on something like this for my GSoC proposal,
> here's what I came up with:
>
> public class SeqIO {
>    public static final int FASTA = 0;
>    public static final int FASTQ = 1;
>    public static final Class<DNASequence> DNA = DNASequence.class;
>    public static final Class<ProteinSequence> PROTEIN = ProteinSequence.class;
>
>    @SuppressWarnings("unchecked")
>    public static <S extends Sequence> Iterable<S> parse(InputStream is,
>            int fileFormat, Class<S> seqType) throws Exception {
>        switch (fileFormat) {
>            case FASTA:
>                if (seqType == DNA) {
>                    // readFastaDNASequence returns a map keyed by header;
>                    // its values() view is the Iterable we hand back.
>                    return (Iterable<S>)
>                            FastaReaderHelper.readFastaDNASequence(is).values();
>                } else if (seqType == PROTEIN) {
>                    // etc...
>                }
>                break;
>            case FASTQ:
>                // etc...
>        }
>        // Every path must return or throw, otherwise this does not compile.
>        throw new IllegalArgumentException("Unsupported format or sequence type");
>    }
> }
>
> It would be used like so:
>
> InputStream is = ...
> Iterable<DNASequence> seqs = SeqIO.parse(is, SeqIO.FASTA, SeqIO.DNA);
> for (DNASequence s : seqs) {
>   // do something
> }
>
> Obviously it's not the prettiest and a lot could be changed, but that's my
> initial design. I tried to base it off BioPython's SeqIO, but static typing
> and the variety of Sequence types forced me to put in some nasty generics.
> Any tips would be appreciated!
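
One possible way to avoid the unchecked cast is to replace the Class<S> argument with a small typed token that carries its own factory, so the compiler can infer the element type. This is only a sketch: Dna and Protein are hypothetical stand-ins for DNASequence and ProteinSequence, and SeqType and TypedSeqIO are names invented for this example, not BioJava classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for BioJava's DNASequence and ProteinSequence.
class Dna {
    final String residues;
    Dna(String residues) { this.residues = residues; }
}

class Protein {
    final String residues;
    Protein(String residues) { this.residues = residues; }
}

/** Typed token: knows how to build an S, so callers never cast. */
final class SeqType<S> {
    static final SeqType<Dna> DNA = new SeqType<>(Dna::new);
    static final SeqType<Protein> PROTEIN = new SeqType<>(Protein::new);

    private final Function<String, S> factory;

    private SeqType(Function<String, S> factory) {
        this.factory = factory;
    }

    S create(String residues) {
        return factory.apply(residues);
    }
}

final class TypedSeqIO {
    /** Builds one sequence per raw string; the result type follows the token. */
    static <S> List<S> parse(List<String> rawSequences, SeqType<S> type) {
        List<S> out = new ArrayList<>();
        for (String raw : rawSequences) {
            out.add(type.create(raw));
        }
        return out;
    }
}
```

With this shape, TypedSeqIO.parse(lines, SeqType.DNA) is typed as List<Dna> with no cast and no if/else on the sequence type. Supporting a new sequence type means adding one constant.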
>
> David
>
> On Thu, Mar 29, 2012 at 4:27 AM, Hannes Brandstätter-Müller <biojava at hannes.oib.com> wrote:
>
>> Yes, something like a simplifying and unifying wrapper would be what I
>> am thinking of.
>>
>> Hannes
>>
>> On Thu, Mar 29, 2012 at 05:55, Andreas Prlic <andreas at sdsc.edu> wrote:
>> > Hi Hannes,
>> >
>> > I guess this is pretty similar to:
>> >
>> > http://biojava.org/wiki/BioJava:CookBook:Core:FastaReadWrite
>> >
>> > we have also been using "proxy" objects to fetch sequence data on the fly
>> >
>> > http://biojava.org/wiki/BioJava:CookBook:Core:Sequences
>> >
>> > As such I think we should discuss this a bit more. If we can find a
>> > common api that is simple and works with both local files as well as
>> > remote proxy objects, that would be nice. There should be no need to
>> > change much of the existing code, but perhaps there could be a
>> > simplified wrapper for what is already there.
>> >
>> >  Andreas
>> >
>> > On Wed, Mar 28, 2012 at 12:04 PM, Hannes Brandstätter-Müller
>> > <biojava at hannes.oib.com> wrote:
>> >> Hi,
>> >>
>> >> I browsed around in the sister projects Biopython and Bioperl a bit,
>> >> and noticed that much of the user interaction with the code goes
>> >> through classes like SeqIO, SearchIO, AlignIO...
>> >>
>> >> So that got me thinking: how about we create similar "Interface"
>> >> classes in Biojava?
>> >>
>> >> PROS:
>> >>
>> >>  - easy change for programmers who switch languages
>> >>  - easy base interface that can be used in cookbook examples
>> >>  - makes code more readable if designed properly
>> >>  - easy access to features that are spread over the whole codebase but
>> >> are connected anyway, like all file parsers
>> >>
>> >> CONS:
>> >>
>> >>  - another thing to maintain
>> >>  - creates possible cross-dependencies (but if you don't want that,
>> >> just use the existing classes directly)
>> >>
>> >>
>> >> What are your thoughts?
>> >>
>> >> python from http://biopython.org/wiki/SeqIO:
>> >>
>> >> from Bio import SeqIO
>> >> with open("example.fasta") as handle:
>> >>     for record in SeqIO.parse(handle, "fasta"):
>> >>         print(record.id)
>> >>
>> >> possible equivalent in biojava (support for streaming API, Iterators, etc?):
>> >>
>> >> import org.biojava3.util.SeqIO;
>> >>
>> >> File file = new File("example.fasta");
>> >> SeqIO seqIO = new SeqIO(file, SeqIO.FASTA);
>> >> while (seqIO.hasNext()) {
>> >>    System.out.println(seqIO.next());
>> >> }
>> >> seqIO.close(); // close the reader; java.io.File itself has no close()
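
To make the streaming idea concrete, here is a minimal sketch of how a hasNext()/next() style reader could pull FASTA records one at a time instead of loading the whole file. FastaRecordIterator is a hypothetical name, and returning each record as a "header<TAB>residues" string is a simplification for this example. A real SeqIO would return sequence objects.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical streaming reader: one "header\tresidues" string per record. */
final class FastaRecordIterator implements Iterator<String>, Closeable {
    private final BufferedReader reader;
    private String pendingHeader;   // header of the next record, or null at end

    FastaRecordIterator(Reader source) {
        this.reader = new BufferedReader(source);
        this.pendingHeader = skipToHeader();
    }

    // Skips any lines before the first '>' header; returns null if none exists.
    private String skipToHeader() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line.substring(1).trim();
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    @Override
    public String next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String header = pendingHeader;
        pendingHeader = null;
        StringBuilder residues = new StringBuilder();
        try {
            String line;
            // Read residue lines until the next header or end of input.
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line.substring(1).trim();
                    break;
                }
                residues.append(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return header + "\t" + residues;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Because the class implements Closeable, it can be used in a try-with-resources block, which also settles the question of what should be closed at the end of the loop.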
>> >>
>> >> Hannes
>> >> _______________________________________________
>> >> biojava-dev mailing list
>> >> biojava-dev at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> >
>> >
>> >
>> > --
>> > -----------------------------------------------------------------------
>> > Dr. Andreas Prlic
>> > Senior Scientist, RCSB PDB Protein Data Bank
>> > University of California, San Diego
>> > (+1) 858.246.0526
>> > -----------------------------------------------------------------------