[BioSQL-l] Storing "per letter" annotation?

Hilmar Lapp hlapp at gmx.net
Sun May 25 15:43:26 UTC 2008


Indeed, when building a cross-product you can of course mix alphabets
too (unlike codons, which are DNA x DNA x DNA).

That's a nice concept. So, given proper serialization into and
deserialization from one flat string, the present BioSQL model could
already hold the cross-product sequence.
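
To make the flat-string idea concrete, here is a minimal sketch in
Python of round-tripping a DNA x integer-quality cross-product
sequence through a single string. The "letter:score" token format,
the separators, and the function names are assumptions made up for
illustration; they are not a BioJava or BioSQL convention.

```python
from typing import List, Tuple

# One cross-product symbol: (DNA letter, integer quality score).
Symbol = Tuple[str, int]


def serialize(symbols: List[Symbol], sep: str = " ", pair_sep: str = ":") -> str:
    """Flatten cross-product symbols into one string (hypothetical format)."""
    return sep.join(f"{base}{pair_sep}{qual}" for base, qual in symbols)


def deserialize(flat: str, sep: str = " ", pair_sep: str = ":") -> List[Symbol]:
    """Rebuild the list of (letter, score) pairs from the flat string."""
    if not flat:
        return []
    symbols = []
    for token in flat.split(sep):
        base, qual = token.split(pair_sep)
        symbols.append((base, int(qual)))
    return symbols


seq = [("A", 30), ("C", 25), ("G", 40), ("T", 12)]
flat = serialize(seq)
restored = deserialize(flat)
```

A string like this could in principle go into the sequence column as
is, but its character length no longer equals the number of symbols,
so anything that relies on the stored length would need care.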

Do you guys have a standard notation for the alphabet in this case?  
More concretely, if you store a cross-product sequence in BioSQL,  
what do you put into the biosequence.alphabet column?

Peter - I had included integer or floating point sequences in my  
response; there is no restriction that individual symbols need to be  
representable by at most 7 or 8 bits.

	-hilmar

On May 25, 2008, at 7:55 AM, Richard Holland wrote:
> For what it's worth, BioJava allows you to define sequences as lists
> of symbols, and each symbol can contain as much info as you want. e.g.
> if you consider DNA to be an alphabet of ATCG etc., and you consider
> quality scores as an alphabet consisting of the integer numbers, then
> to construct a quality-scored sequence you use BioJava to make a
> cross-product alphabet of the two, where each symbol in the sequence
> actually consists of a pair of symbols, one from each alphabet. This
> means you can combine any number of alphabets to define complex and
> informative objects to represent each symbol in your sequence.
>
> cheers,
> Richard.
>
> 2008/5/25 Peter <biopython at maubp.freeserve.co.uk>:
>> Hilmar wrote:
>>>> It sounds like in essence you want to store alternative  
>>>> sequences in other
>>>> alphabets for a sequence?
>>
>> Peter wrote:
>>> I hadn't thought of it like that, but for many of the examples it
>>> would just be one character per letter of sequence, so could be held
>>> as an alternative sequence.  This doesn't really extend to cover
>>> things like a list of integers or a list of floats, but would
>>> certainly cover a number of use-cases.
>>
>> Now that I know which bits of BioPerl to search for, I see there has
>> been some similar BioSQL discussion in the past, e.g.
>> http://bioperl.org/pipermail/bioperl-l/2005-July/019280.html
>>
>> Hilmar wrote:
>>>> In BioPerl we have Bio::Seq::SeqWithQuality and the more generic
>>>> Bio::Seq::MetaI.
>>
>> I had wondered what metals had to do with sequences; in a different
>> font, MetaI is of course short for MetaInformation!
>>
>> Peter
>>
>> P.S. I'll be away next week, so I probably won't follow up on this
>> topic immediately.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================





