[BioSQL-l] Storing "per letter" annotation?

Richard Holland dicknetherlands at gmail.com
Sun May 25 19:21:36 UTC 2008


The notation is to enclose each symbol in square brackets, with
comma-separated sub-symbols, each of which can itself be square-bracketed
if it consists of further sub-divisions. The alphabet name is then just
the name of the alphabet object in BioJava - you'd have to decode this
externally to the db, unless you introduce some kind of fixed-format
description for the alphabet to store in the name field. To my knowledge
we've never tried storing this kind of sequence. Something to try
in future!
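
As a rough illustration of the bracket notation described above, here is a
minimal Python sketch (not BioJava code) that flattens a cross-product
sequence into one string and reads it back. The exact strings BioJava
produces are not shown in this thread, so this is only one reading of the
description. The names encode_sequence and decode_sequence are hypothetical,
and the sketch assumes tokens never contain commas or square brackets.

```python
from typing import List, Tuple, Union

# A symbol is either an atomic token (str) or a tuple of sub-symbols.
Symbol = Union[str, Tuple["Symbol", ...]]


def encode_symbol(symbol: Symbol) -> str:
    """Render one symbol; compound symbols are bracketed and comma-separated."""
    if isinstance(symbol, tuple):
        return "[" + ",".join(encode_symbol(s) for s in symbol) + "]"
    return symbol


def encode_sequence(symbols: List[Symbol]) -> str:
    """Concatenate the symbols, each enclosed in square brackets."""
    parts = []
    for sym in symbols:
        text = encode_symbol(sym)
        # Atomic top-level symbols still get their own brackets.
        parts.append(text if isinstance(sym, tuple) else "[" + text + "]")
    return "".join(parts)


def _parse(text: str, pos: int) -> Tuple[Symbol, int]:
    """Parse one bracketed group starting at text[pos]; return (group, next pos)."""
    if text[pos] != "[":
        raise ValueError(f"expected '[' at offset {pos}")
    pos += 1
    items: List[Symbol] = []
    token: List[str] = []
    while pos < len(text):
        ch = text[pos]
        if ch == "[":
            # Nested group: recurse, then resume after its closing bracket.
            sub, pos = _parse(text, pos)
            items.append(sub)
            continue
        if ch in ",]":
            if token:
                items.append("".join(token))
                token = []
            if ch == "]":
                return tuple(items), pos + 1
        else:
            token.append(ch)
        pos += 1
    raise ValueError("unbalanced brackets")


def decode_sequence(text: str) -> List[Symbol]:
    """Split a flat string back into symbols (all tokens come back as str)."""
    symbols: List[Symbol] = []
    pos = 0
    while pos < len(text):
        group, pos = _parse(text, pos)
        # Assumption: a one-item group is read back as an atomic symbol.
        symbols.append(group[0] if len(group) == 1 else group)
    return symbols


# DNA x quality-score example: each symbol is a (base, score) pair.
scored = [("a", "30"), ("c", "25"), ("g", "40")]
flat = encode_sequence(scored)  # "[a,30][c,25][g,40]"
assert decode_sequence(flat) == scored
```

Every token comes back as a plain string, so turning "30" back into an
integer (or a base into a DNA symbol) still depends on knowing the alphabet,
which is the "decode this externally to the db" step mentioned above.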

Richard.


2008/5/25 Hilmar Lapp <hlapp at gmx.net>:
> Indeed, when building a cross-product you can of course mix alphabets too
> (as compared to codons, which are DNA x DNA x DNA).
>
> That's a nice concept - so given proper de/serialization from/into one flat
> string the present BioSQL model could hold the cross-product sequence
> already.
>
> Do you guys have a standard notation for the alphabet in this case? More
> concretely, if you store a cross-product sequence in BioSQL, what do you put
> into the biosequence.alphabet column?
>
> Peter - I had included integer or floating point sequences in my response;
> there is no restriction that individual symbols need to be representable by
> at most 7 or 8 bits.
>
>        -hilmar
>
> On May 25, 2008, at 7:55 AM, Richard Holland wrote:
>>
>> For what it's worth, BioJava allows you to define sequences as lists
>> of symbols, and each symbol can contain as much info as you want. e.g.
>> if you consider DNA to be an alphabet of ATCG etc., and you consider
>> quality scores as an alphabet consisting of the integer numbers, then
>> to construct a quality-scored sequence you use BioJava to make a
>> cross-product alphabet of the two, where each symbol in the sequence
>> actually consists of a pair of symbols, one from each alphabet. This
>> means you can combine any number of alphabets to define complex and
>> informative objects to represent each symbol in your sequence.
>>
>> cheers,
>> Richard.
>>
>> 2008/5/25 Peter <biopython at maubp.freeserve.co.uk>:
>>>
>>> Hilmar wrote:
>>>>>
>>>>> It sounds like in essence you want to store alternative sequences in
>>>>> other alphabets for a sequence?
>>>
>>> Peter wrote:
>>>>
>>>> I hadn't thought of it like that, but for many of the examples it
>>>> would just be one character per letter of sequence, so could be held
>>>> as an alternative sequence.  This doesn't really extend to cover
>>>> things like a list of integers or a list of floats, but would
>>>> certainly cover a number of use-cases.
>>>
>>> Now that I know which bits of BioPerl to search for, I see there has
>>> been some similar BioSQL discussion in the past, e.g.
>>> http://bioperl.org/pipermail/bioperl-l/2005-July/019280.html
>>>
>>> Hilmar wrote:
>>>>>
>>>>> In BioPerl we have Bio::Seq::SeqWithQuality and the more generic
>>>>> Bio::Seq::MetaI.
>>>
>>> I had wondered what metals had to do with sequences - in a different
>>> font, MetaI is of course short for MetaInformation!
>>>
>>> Peter
>>>
>>> P.S. I'll be away next week, so I probably won't follow up on this
>>> topic immediately.
>>>
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>


