[BioSQL-l] Storing "per letter" annotation?

Mark Schreiber markjschreiber at gmail.com
Mon May 26 02:30:31 UTC 2008


Hi -

BioJava has interfaces for tokenizers, which can be String-based or
character-based. Most cross-product alphabets don't have official
character tokenizers, but there is no reason why they couldn't. I think
the only reason they don't yet is that there is no official IUB-style
nomenclature for them. However, he who codes it wins!
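
To illustrate what a character tokenizer for a cross-product alphabet
might look like, here is a minimal Python sketch. It is not BioJava's
API. The two-level quality alphabet, the function names, and the
convention "lowercase = low quality, uppercase = high quality" are all
assumptions made up for this example, precisely because no agreed
nomenclature exists.

```python
# Hypothetical cross-product alphabet: DNA x {low, high} quality.
DNA = ("A", "C", "G", "T")
QUALITY_BINS = ("low", "high")


def pair_to_char(base, quality):
    """Map one (base, quality) symbol to a single character."""
    if base not in DNA or quality not in QUALITY_BINS:
        raise ValueError(f"not in the cross-product alphabet: {(base, quality)!r}")
    # Assumed convention: lowercase means low quality, uppercase means high.
    return base.lower() if quality == "low" else base


def char_to_pair(char):
    """Map a single character back to its (base, quality) symbol."""
    base = char.upper()
    if base not in DNA:
        raise ValueError(f"unknown token: {char!r}")
    return (base, "high" if char.isupper() else "low")


def tokenize(symbols):
    """Serialize a list of (base, quality) symbols into one flat string."""
    return "".join(pair_to_char(base, quality) for base, quality in symbols)


def parse(text):
    """Deserialize a flat string back into a list of (base, quality) symbols."""
    return [char_to_pair(char) for char in text]
```

With this mapping, `tokenize([("A", "high"), ("C", "low")])` gives
"Ac", and `parse("Ac")` recovers the original pairs. A one-character
scheme like this only works while the cross product is small enough to
fit the available characters.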

Because this is an issue that mostly affects BioSQL serialization, it
would be nice if, as people develop these tokenizers, they could become
official BioSQL standards so that all the Bio* projects speak the same
language.
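
As a rough sketch of why a shared standard matters, the Python below
uses a String-style tokenizer for DNA x integer quality scores and
stores the result in a toy SQLite table. The table `toy_biosequence` is
a simplified stand-in, not the real BioSQL schema, and the alphabet
label, the "base:score" token format, and the function names are
assumptions invented for this example.

```python
import sqlite3

# Hypothetical label; no such value is standardized for BioSQL.
ALPHABET_LABEL = "dna x int"


def serialize(symbols):
    """Join (base, score) pairs into one flat string, e.g. 'A:30 C:12'."""
    return " ".join(f"{base}:{score}" for base, score in symbols)


def deserialize(text):
    """Split a flat string back into a list of (base, score) pairs."""
    pairs = []
    for token in text.split():
        base, score = token.split(":")
        pairs.append((base, int(score)))
    return pairs


def round_trip(symbols):
    """Store the serialized sequence with its label, then read it back."""
    conn = sqlite3.connect(":memory:")
    # Toy table with only the two columns needed for the illustration.
    conn.execute("CREATE TABLE toy_biosequence (alphabet TEXT, seq TEXT)")
    conn.execute(
        "INSERT INTO toy_biosequence (alphabet, seq) VALUES (?, ?)",
        (ALPHABET_LABEL, serialize(symbols)),
    )
    alphabet, seq = conn.execute(
        "SELECT alphabet, seq FROM toy_biosequence"
    ).fetchone()
    conn.close()
    return alphabet, deserialize(seq)
```

A reader can only decode the stored string if it recognizes the
alphabet label and knows the token format, which is the part that would
need to be agreed across projects.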

- Mark

On 5/25/08, Hilmar Lapp <hlapp at gmx.net> wrote:
> Indeed, when building a cross-product you can mix alphabets too of course
> (as compared to codons, which is DNA x DNA x DNA).
>
> That's a nice concept - so given proper de/serialization from/into one flat
> string the present BioSQL model could hold the cross-product sequence
> already.
>
> Do you guys have a standard notation for the alphabet in this case? More
> concretely, if you store a cross-product sequence in BioSQL, what do you put
> into the biosequence.alphabet column?
>
> Peter - I had included integer or floating point sequences in my response;
> there is no restriction that individual symbols need to be representable by
> at most 7 or 8 bits.
>
>        -hilmar
>
>
> On May 25, 2008, at 7:55 AM, Richard Holland wrote:
> > For what it's worth, BioJava allows you to define sequences as lists
> > of symbols, and each symbol can contain as much info as you want. e.g.
> > if you consider DNA to be an alphabet of ATCG etc., and you consider
> > quality scores as an alphabet consisting of the integer numbers, then
> > to construct a quality-scored sequence you use BioJava to make a
> > cross-product alphabet of the two, where each symbol in the sequence
> > actually consists of a pair of symbols, one from each alphabet. This
> > means you can combine any number of alphabets to define complex and
> > informative objects to represent each symbol in your sequence.
> >
> > cheers,
> > Richard.
> >
> > 2008/5/25 Peter <biopython at maubp.freeserve.co.uk>:
> >
> > > Hilmar Wrote:
> > >
> > > >
> > > > > It sounds like in essence you want to store alternative sequences in
> > > > > other alphabets for a sequence?
> > > > >
> > > >
> > >
> > > Peter wrote:
> > >
> > > > I hadn't thought of it like that, but for many of the examples it
> > > > would just be one character per letter of sequence, so could be held
> > > > as an alternative sequence.  This doesn't really extend to cover
> > > > things like a list of integers or a list of floats, but would
> > > > certainly cover a number of use-cases.
> > > >
> > >
> > > Now that I know which bits of BioPerl to search for, I see there has
> > > been some similar BioSQL discussion in the past, e.g.
> > >
> > > http://bioperl.org/pipermail/bioperl-l/2005-July/019280.html
> > >
> > > Hilmar Wrote:
> > >
> > > >
> > > > > In BioPerl we have Bio::Seq::SeqWithQuality and the more generic
> > > > > Bio::Seq::MetaI.
> > > > >
> > > >
> > >
> > > I had wondered what metals had to do with sequences; in a different
> > > font, MetaI is of course short for MetaInformation!
> > >
> > > Peter
> > >
> > > P.S. I'll be away next week, so I probably won't follow up on this
> > > topic immediately.
> > > _______________________________________________
> > > BioSQL-l mailing list
> > > BioSQL-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/biosql-l
> > >
> > >
> >
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>


