[Biojava-dev] Biojava.util package?

Michael Heuer heuermh at gmail.com
Thu Mar 29 16:39:02 UTC 2012


David Felty <davfelty at gmail.com> wrote:

> I've actually been working on something like this for my GSoC proposal.
> Here's what I came up with:
>
> public class SeqIO {
>    public static final int FASTA = 0;
>    public static final int FASTQ = 1;
>    public static final Class<DNASequence> DNA = DNASequence.class;
>    public static final Class<ProteinSequence> PROTEIN = ProteinSequence.class;
>
>    public static <S extends Sequence> Iterable<S> parse(InputStream is,
>            int fileFormat, Class<S> seqType) throws Exception {
>        switch (fileFormat) {
>            case FASTA:
>                if (seqType == DNA) {
>                    // readFastaDNASequence returns a map keyed by header,
>                    // so iterate over its values
>                    return (Iterable<S>)
>                        FastaReaderHelper.readFastaDNASequence(is).values();
>                } else if (seqType == PROTEIN) {
>                    // etc...
>                }
>                break;
>            case FASTQ:
>                // etc...
>                break;
>        }
>        // every path must return or throw, otherwise this does not compile
>        throw new IllegalArgumentException(
>            "Unsupported file format or sequence type");
>    }
> }
>
> It would be used like so:
>
> InputStream is = ...
> Iterable<DNASequence> seqs = SeqIO.parse(is, SeqIO.FASTA, SeqIO.DNA);
> for (DNASequence s : seqs) {
>   // do something
> }
>
> Obviously it's not the prettiest and a lot could be changed, but that's my
> initial design. I tried to base it on Biopython's SeqIO, but static typing
> and the variety of Sequence types forced me to put in some nasty generics.
> Any tips would be appreciated!

Hello David,

You might also want to look at the cookbook examples for FASTQ parsing:

http://biojava.org/wiki/BioJava:Cookbook:SeqIO:FASTQ

http://biojava.org/wiki/BioJava:CookBook3:FASTQ

The FASTQ APIs differ from both the biojava-legacy FASTA parsing APIs
and the biojava3 FASTA parsing APIs, for various historical reasons.
Something good might come from pulling together the best of all of
these.
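
As a rough illustration of what a single entry point might look like,
here is a minimal, self-contained sketch. It is not BioJava code:
SeqIOSketch, Record, and Format are hypothetical names, the parsers
are deliberately naive, and FASTQ quality scores are simply dropped.
It only shows each format carrying its own parser as an enum constant,
rather than being selected by an int constant and a switch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in BioJava.
final class SeqIOSketch {

    /** Minimal stand-in for a parsed record (not a BioJava type). */
    static final class Record {
        final String id;
        final String sequence;

        Record(String id, String sequence) {
            this.id = id;
            this.sequence = sequence;
        }
    }

    /** Each format owns its parser, so no switch on int constants is needed. */
    enum Format {
        FASTA {
            @Override
            List<Record> read(BufferedReader in) throws IOException {
                List<Record> records = new ArrayList<>();
                String id = null;
                StringBuilder seq = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    line = line.trim();
                    if (line.isEmpty()) {
                        continue;
                    }
                    if (line.startsWith(">")) {
                        // A new header closes the previous record, if any.
                        if (id != null) {
                            records.add(new Record(id, seq.toString()));
                        }
                        id = line.substring(1).trim();
                        seq.setLength(0);
                    } else if (id != null) {
                        seq.append(line);
                    }
                }
                if (id != null) {
                    records.add(new Record(id, seq.toString()));
                }
                return records;
            }
        },
        FASTQ {
            @Override
            List<Record> read(BufferedReader in) throws IOException {
                // Assumes the simple four-line layout: @id, sequence, +, quality.
                List<Record> records = new ArrayList<>();
                String header;
                while ((header = in.readLine()) != null) {
                    if (header.trim().isEmpty()) {
                        continue;
                    }
                    String seq = in.readLine();
                    String plus = in.readLine();
                    String quality = in.readLine();
                    if (!header.startsWith("@") || seq == null || plus == null
                            || !plus.startsWith("+") || quality == null) {
                        throw new IOException("Malformed FASTQ record near: " + header);
                    }
                    records.add(new Record(header.substring(1).trim(), seq.trim()));
                }
                return records;
            }
        };

        abstract List<Record> read(BufferedReader in) throws IOException;
    }

    /** Single entry point: the caller picks a format, the enum does the parsing. */
    static Iterable<Record> parse(Reader reader, Format format) throws IOException {
        return format.read(new BufferedReader(reader));
    }
}
```

A caller would then write something like
SeqIOSketch.parse(reader, SeqIOSketch.Format.FASTA) and loop over the
result. Supporting distinct DNA and protein sequence types would still
need a type parameter or a factory argument, which this sketch leaves
out.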

   michael
