[BioSQL-l] FW: SeqWithQuality and biosql
Marc Logghe
Marc.Logghe at devgen.com
Tue Jul 5 03:39:28 EDT 2005
Thanks for the feedback.
Good to know I am not alone in this ;-)
I totally agree with Mark that there should be some kind of consensus on
how to store this across the Bio* projects.
Yesterday I mistakenly posted my original mail to the bioperl list.
Heikki responded to that; it might be a good starting point but I am not
familiar with it:
http://portal.open-bio.org/pipermail/bioperl-l/2005-July/019271.html
So much for the long-term solution.
In the short term, to have at least something that works, I'll experiment a
little with storing separate objects. I remember one of Hilmar's
presentations, where he gave the example of making an adaptor and storing
two sequence objects that interact with each other according to the result
of a yeast two-hybrid experiment.
Cheers,
Marc
>
> I'd think storing it in BioSQL as 2-byte pairs would be good.
> First byte is the base (an ASCII character), second byte is
> the quality (an 8-bit integer). Sure it wastes a few bits but
> so does normal DNA...
>
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
>
>
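A minimal Python sketch of the two-byte-pair layout Richard suggests, assuming each base is a single ASCII letter and each quality fits in one unsigned byte (0 to 255). The names `pack_pairs` and `unpack_pairs` are hypothetical. Because many quality values are non-printable (including 0), storing raw pairs like this would likely need a binary column rather than BioSQL's text sequence column.

```python
from __future__ import annotations


def pack_pairs(bases: str, quals: list[int]) -> bytes:
    """Interleave each base (one ASCII byte) with its quality (one unsigned byte)."""
    if len(bases) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    out = bytearray()
    for base, q in zip(bases, quals):
        if not 0 <= q <= 255:
            raise ValueError(f"quality {q} does not fit in one byte")
        out += base.encode("ascii")  # first byte of the pair: the base letter
        out.append(q)                # second byte of the pair: the quality
    return bytes(out)


def unpack_pairs(blob: bytes) -> tuple[str, list[int]]:
    """Split a blob of 2-byte pairs back into bases and qualities."""
    if len(blob) % 2:
        raise ValueError("odd length: not a sequence of 2-byte pairs")
    bases = blob[0::2].decode("ascii")  # even offsets hold the bases
    quals = list(blob[1::2])            # odd offsets hold the qualities
    return bases, quals


packed = pack_pairs("ACGT", [20, 35, 40, 12])
assert unpack_pairs(packed) == ("ACGT", [20, 35, 40, 12])
```

Decoding only needs to take every other byte, which keeps the layout trivial to parse.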
> > -----Original Message-----
> > From: biosql-l-bounces at portal.open-bio.org
> > [mailto:biosql-l-bounces at portal.open-bio.org] On Behalf Of
> > mark.schreiber at novartis.com
> > Sent: Tuesday, July 05, 2005 1:44 PM
> > To: Marc Logghe
> > Cc: biosql-l-bounces at portal.open-bio.org; biosql-l at open-bio.org
> > Subject: Re: [BioSQL-l] FW: SeqWithQuality and biosql
> >
> >
> > Hello -
> >
> > I was wondering about similar issues with BioJava. As you may (or may
> > not) know, BioJava can make sequences from symbols in any alphabet; two
> > examples are DNA and the integer alphabet (a collection of Symbols that
> > are integers). BioJava can also make compound alphabets; one such
> > example is the Phred alphabet, which is the cross product of DNA x
> > Integer (technically a subset of Integer from 0 to 99).
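The compound Phred alphabet described above can be pictured as the Cartesian product of the four DNA bases with the integers 0 to 99. This plain-Python illustration shows only that idea; it is not BioJava's API, and `PHRED_ALPHABET` and `symbol_index` are hypothetical names.

```python
from itertools import product

DNA = ("A", "C", "G", "T")
QUALITIES = range(100)  # the 0..99 subset of the integer alphabet

# Each compound symbol is a (base, quality) pair: 4 x 100 = 400 symbols.
PHRED_ALPHABET = list(product(DNA, QUALITIES))


def symbol_index(base: str, quality: int) -> int:
    """Position of (base, quality) in PHRED_ALPHABET, computed arithmetically."""
    return DNA.index(base) * len(QUALITIES) + quality


print(len(PHRED_ALPHABET))                    # 400
print(PHRED_ALPHABET[symbol_index("G", 40)])  # ('G', 40)
```

The arithmetic index means each of the 400 compound symbols can be mapped to a distinct position, which is what any one-character-per-symbol encoding would rely on.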
> >
> > Because sequence in BioSQL is stored in a CLOB, if you can encode your
> > SeqWithQuality as a string of characters you can store it. With the
> > case above (which is probably similar to yours) you would need 400
> > distinct characters (4 bases x 100 quality values) to store it, which
> > is too many for ASCII but could be done in Unicode. The downside is that
> > your persistence layer needs to know how to encode and decode your
> > SeqWithQuality. I'm not familiar with how BioPerl would do this. BioJava
> > would need to implement a SymbolTokenizer for the alphabet, and then
> > persistence would happen automatically (assuming your DB is OK with
> > Unicode). An alternative would be to make a tokenizer that uses more
> > than single-character tokens for encoding (e.g. A23 G40 T34 C22 etc.).
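Both encodings discussed above can be sketched in Python, assuming bases in ACGT and qualities 0 to 99. The Unicode offset `BASE_CODE_POINT = 0x100` is an arbitrary choice for illustration (it keeps all 400 characters outside ASCII and well clear of the surrogate range); this is not how BioJava's tokenizers actually work, and all function names are hypothetical.

```python
DNA = "ACGT"
N_QUAL = 100             # qualities 0..99, as in the Phred alphabet
BASE_CODE_POINT = 0x100  # arbitrary start of a 400-character block (assumption)


def _check_lengths(bases, quals):
    if len(bases) != len(quals):
        raise ValueError("sequence and quality lengths differ")


def to_unicode(bases, quals):
    """Encode each (base, quality) symbol as one character outside ASCII."""
    _check_lengths(bases, quals)
    chars = []
    for b, q in zip(bases, quals):
        if b not in DNA or not 0 <= q < N_QUAL:
            raise ValueError(f"symbol ({b!r}, {q}) is not in the alphabet")
        chars.append(chr(BASE_CODE_POINT + DNA.index(b) * N_QUAL + q))
    return "".join(chars)


def from_unicode(text):
    """Invert to_unicode: character offset -> (base index, quality)."""
    bases, quals = [], []
    for ch in text:
        offset = ord(ch) - BASE_CODE_POINT
        if not 0 <= offset < len(DNA) * N_QUAL:
            raise ValueError(f"character {ch!r} is outside the encoding block")
        base_idx, q = divmod(offset, N_QUAL)
        bases.append(DNA[base_idx])
        quals.append(q)
    return "".join(bases), quals


def to_tokens(bases, quals):
    """Multi-character tokens such as 'A23 G40 T34 C22'."""
    _check_lengths(bases, quals)
    return " ".join(f"{b}{q}" for b, q in zip(bases, quals))


def from_tokens(text):
    """Parse whitespace-separated tokens: one base letter followed by digits."""
    bases, quals = [], []
    for tok in text.split():
        bases.append(tok[0])
        quals.append(int(tok[1:]))
    return "".join(bases), quals


seq, qual = "AGTC", [23, 40, 34, 22]
print(to_tokens(seq, qual))  # A23 G40 T34 C22
assert from_tokens(to_tokens(seq, qual)) == (seq, qual)
assert from_unicode(to_unicode(seq, qual)) == (seq, qual)
```

The single-character form keeps string length equal to sequence length but needs a Unicode-capable column; the token form stays in ASCII at the cost of roughly three to four characters per position.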
> >
> > The alternative you suggest of storing two sequences with a
> > relationship is also nice (because you can retrieve each part
> > separately), but it also requires your persistence layer to know about
> > it. However, it has big disadvantages because the two are not strongly
> > tied to each other. If you manipulate one you might invalidate the
> > other. Also, if you delete one, the other will probably not be deleted
> > in a cascade.
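To make the cascade concern concrete, here is a sketch using Python's built-in `sqlite3` with a deliberately simplified, hypothetical pair of tables loosely modelled on BioSQL's `bioentry` and `bioentry_relationship`. The columns and constraints are assumptions, not the real BioSQL DDL, and `has_quality` is an invented term name. Deleting the sequence entry removes the link row via `ON DELETE CASCADE`, but the quality entry it pointed to survives as an orphan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL,
    namespace   TEXT NOT NULL
);
CREATE TABLE bioentry_relationship (
    object_bioentry_id  INTEGER NOT NULL
        REFERENCES bioentry(bioentry_id) ON DELETE CASCADE,
    subject_bioentry_id INTEGER NOT NULL
        REFERENCES bioentry(bioentry_id) ON DELETE CASCADE,
    term TEXT NOT NULL
);
""")

# One read stored as two entries in different namespaces, plus a link.
con.execute("INSERT INTO bioentry VALUES (1, 'READ001', 'seq')")
con.execute("INSERT INTO bioentry VALUES (2, 'READ001', 'qual')")
con.execute("INSERT INTO bioentry_relationship VALUES (2, 1, 'has_quality')")

# Delete only the sequence part.
con.execute("DELETE FROM bioentry WHERE bioentry_id = 1")

remaining = con.execute("SELECT bioentry_id, namespace FROM bioentry").fetchall()
links = con.execute("SELECT COUNT(*) FROM bioentry_relationship").fetchone()[0]
print(remaining)  # [(2, 'qual')]  the quality entry is now an orphan
print(links)      # 0  the link row was cascaded away
```

Closing this gap would take extra work, for example application code or a trigger that also deletes the partner entry; which is appropriate depends on the schema and is left open here.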
> >
> > Not sure if any of this helps, but a consensus on how to store this
> > kind of information would be good so that the Bio* projects do it the
> > same way. Consensus in this case will probably mean whatever the first
> > implementation is.
> >
> > - Mark
> >
> >
> >
> >
> >
> > "Marc Logghe" <Marc.Logghe at devgen.com>
> > Sent by: biosql-l-bounces at portal.open-bio.org
> > 07/04/2005 05:56 PM
> >
> >
> > To: <biosql-l at open-bio.org>
> > cc: (bcc: Mark Schreiber/GP/Novartis)
> > Subject: [BioSQL-l] FW: SeqWithQuality and biosql
> >
> >
> > Apologies for cross-posting; I had picked the wrong mail address :-(
> >
> > -----Original Message-----
> > From: Marc Logghe
> > Sent: Monday, July 04, 2005 11:43 AM
> > To: bioperl-l at portal.open-bio.org
> > Subject: SeqWithQuality and biosql
> >
> > Hi all,
> > I am currently exploring the possibility of storing a
> > Bio::Seq::SeqWithQuality object in biosql.
> > Has anyone ever tried this?
> > One possibility would be to
> > 1) split up the Bio::Seq::SeqWithQuality object into a plain
> > Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
> > 2) store them separately in biosql, in different namespaces
> > 3) link them with a relation term.
> > 4) make a custom adaptor to fetch the persistent objects from biosql
> > and reconstruct the Bio::Seq::SeqWithQuality
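Steps 1 to 4 amount to a split, store, link, and rebuild round trip. The plain-Python sketch below shows only that logic: `SeqWithQuality`, `Store`, `store_pair`, `fetch`, and the term name `has_quality` are hypothetical, not the BioPerl classes or a bioperl-db adaptor, and an in-memory dict stands in for the database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SeqWithQuality:
    """Hypothetical stand-in for a sequence with per-base quality values."""
    accession: str
    seq: str
    qual: list[int]


@dataclass
class Store:
    """Toy store: entries keyed by (namespace, accession), plus link triples."""
    entries: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (subject, object, term)


def store_pair(store: Store, swq: SeqWithQuality, term: str = "has_quality"):
    # Steps 1-2: split into two records kept under different namespaces.
    seq_key = ("seq", swq.accession)
    qual_key = ("qual", swq.accession)
    store.entries[seq_key] = {"accession": swq.accession, "seq": swq.seq}
    store.entries[qual_key] = {"accession": swq.accession, "qual": list(swq.qual)}
    # Step 3: link them with a relationship term.
    store.relationships.append((seq_key, qual_key, term))
    return seq_key


def fetch(store: Store, seq_key, term: str = "has_quality") -> SeqWithQuality:
    # Step 4: follow the link and rebuild the combined object.
    seq_rec = store.entries[seq_key]
    qual_key = next(obj for subj, obj, t in store.relationships
                    if subj == seq_key and t == term)
    qual_rec = store.entries[qual_key]
    if len(qual_rec["qual"]) != len(seq_rec["seq"]):
        raise ValueError("stored sequence and quality no longer match")
    return SeqWithQuality(seq_rec["accession"], seq_rec["seq"], list(qual_rec["qual"]))


store = Store()
original = SeqWithQuality("READ001", "ACGT", [20, 35, 40, 12])
key = store_pair(store, original)
assert fetch(store, key) == original
```

The length check in `fetch` is there because, as Mark points out, nothing ties the two stored parts together, so an edit to one can silently invalidate the other.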
> >
> > Does that make sense? Any other suggestions/possibilities?
> > As a test I tried to load a Bio::Seq::PrimaryQual into biosql using
> > load_seqdatabase.pl, but it fails because Bio::Seq::PrimaryQual does
> > not have a namespace method.
> > I hope I'm wrong, but I have the impression there is a long way to go
> > ;-)
> >
> > Marc
> >
> >
> >
> >
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at open-bio.org
> > http://open-bio.org/mailman/listinfo/biosql-l
> >
> >
> >
> >
>