<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Peter wrote:
<blockquote
cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"
type="cite">
<pre wrap="">Hi Andrea (and everyone else),
This is a continuation of a discussion started on Bug 2883. Andrea had
a problem with unpickling SeqRecord objects which were pickled using
an older version of Biopython. She was using pickle to store complicated
annotated SeqRecord objects on disk.
See <a class="moz-txt-link-freetext" href="http://bugzilla.open-bio.org/show_bug.cgi?id=2883">http://bugzilla.open-bio.org/show_bug.cgi?id=2883</a> for details.
<a class="moz-txt-link-freetext" href="http://bugzilla.open-bio.org/show_bug.cgi?id=2883#c6">http://bugzilla.open-bio.org/show_bug.cgi?id=2883#c6</a>
On Bug 2883 comment 6, Peter wrote:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">If your SeqRecord objects are all simply loaded from sequence files in
the first place (and not modified), I would just keep the original file and
re-parse it.
If you have generated your own SeqRecords (or modified those from
reading a file), then it makes sense to save them somehow. The choice
of file format depends on the nature of annotation. The latest Biopython
will now record the features in a GenBank file, making that a reasonable
choice - but this does not cover per-letter-annotations. BioSQL has the
same limitation.
</pre>
</blockquote>
</blockquote>
<pre wrap=""><!---->
<a class="moz-txt-link-freetext" href="http://bugzilla.open-bio.org/show_bug.cgi?id=2883#c7">http://bugzilla.open-bio.org/show_bug.cgi?id=2883#c7</a>
On Bug 2883 comment 7, Andrea wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Yes, I'm testing some predictors. I run the predictions and compare the
"newly predicted SeqRecords" with the "previously correct predicted
pickled SeqRecords".
</pre>
</blockquote>
<pre wrap=""><!---->
Sorry - when you said "test code" on the Bug discussion, I thought you
meant you were testing the code - not that this was real work doing
biological tests.
</pre>
</blockquote>
<tt>To be precise, I really am testing code: my own code. My predictors are
implemented in Python,<br>
and to be sure that bug fixes and other modifications over time do not
alter the prediction <br>
results, I built some unit tests that compare the results of the modified
code with the results<br>
of the old code.</tt><br>
<blockquote
cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"
type="cite">
<pre wrap="">
If you have SeqFeatures and SeqRecords with simple string based
annotation, then BioSQL should be fine.
</pre>
</blockquote>
<tt>In my opinion, for unit-testing purposes, using BioSQL to store
data is quite expensive in terms of code <br>
(or so it seems...), even though BioSQL would certainly be
fine for storing my annotations and <br>
features.<br>
</tt>
<blockquote
cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"
type="cite">
<pre wrap="">
If you have SeqFeatures, then using GenBank output might be enough.
There are no general fields in the GenBank format for arbitrary
annotation though.
</pre>
</blockquote>
<tt>Yes, I think that GenBank won't store my "personal annotations" (though
I still have to check).</tt><br>
<blockquote
cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"
type="cite">
<pre wrap=""></pre>
<blockquote type="cite">
<pre wrap="">Actually I don't use per-letter-annotation, although it seems
interesting. But I didn't find any example in the documentation (showing
how the dictionary is populated...) so I really don't know
how to use it.... even though I have, during prediction, a "per position
annotation".
</pre>
</blockquote>
<pre wrap=""><!---->
You are right that the SeqRecord chapter in the Tutorial doesn't
explicitly cover populating the per-letter-annotation. I can fix that...
However, the built in documentation covers this (e.g. the section
on slicing a SeqRecord to get a sub-record):
</pre>
</blockquote>
<blockquote
cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"
type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">from Bio.SeqRecord import SeqRecord
help(SeqRecord)
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre wrap="">...
You can read this online:
<a class="moz-txt-link-freetext" href="http://www.biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html">http://www.biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html</a>
</pre>
</blockquote>
<tt>Very interesting and easy to use. I could use it for:<br>
- storing a per-position string representing the "per position label"
of the prediction<br>
- storing a list of per-position reliabilities (reliability of the
prediction)<br>
- storing sequence variants<br>
- storing possible aligned sequences <br>
But it's a pity that this is not yet supported in BioSQL ....</tt><br>
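<tt>The first two uses listed above could be sketched roughly as follows. This
assumes Biopython is installed; the sequence, the record id and the<br>
dictionary keys "predicted_label" and "reliability" are made up for
illustration and are not names defined by Biopython.</tt><br>
<pre wrap="">
```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical 10-residue protein, used only for illustration.
record = SeqRecord(Seq("MKTAYIAKQR"), id="example")

# Each value must be a string, list or tuple as long as the sequence.
# The keys below are arbitrary names chosen for this sketch.
record.letter_annotations["predicted_label"] = "HHHHEEECCC"
record.letter_annotations["reliability"] = [9, 8, 8, 7, 5, 6, 4, 3, 2, 1]

# Slicing the record slices every per-letter annotation along with it.
sub = record[2:5]
print(sub.seq)                                    # TAY
print(sub.letter_annotations["predicted_label"])  # HHE
print(sub.letter_annotations["reliability"])      # [8, 7, 5]
```
</pre>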
<blockquote
cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"
type="cite">
<pre wrap=""></pre>
<blockquote type="cite">
<pre wrap="">Also, if the "per letter annotation" is not handled in the GenBank
format or in the BioSQL format (which I use a lot), I have to wait!!
</pre>
</blockquote>
<pre wrap=""><!---->
Currently the BioSQL schema doesn't have any explicit support
for "per letter annotation", but we could encode it as a string
(e.g. using XML or JSON) perhaps. This will require coordination
with BioSQL, BioPerl etc - and thus far no one has expressed a
strong need for this.
</pre>
</blockquote>
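<tt>The JSON idea quoted above can be sketched with the standard library
alone. This is only an illustration of the encoding step, not an existing<br>
Biopython or BioSQL feature: the two helper functions and the dictionary
keys are hypothetical, and where such a string would be stored is left open.</tt><br>
<pre wrap="">
```python
import json


def encode_letter_annotations(annotations):
    """Serialise a per-letter annotation dictionary to a JSON string."""
    # sort_keys gives a stable string, convenient when comparing stored values.
    return json.dumps(annotations, sort_keys=True)


def decode_letter_annotations(text, seq_length):
    """Parse the JSON string and check every value against the sequence length."""
    annotations = json.loads(text)
    for key, values in annotations.items():
        if len(values) != seq_length:
            raise ValueError(f"annotation {key!r} does not match the sequence length")
    return annotations


# Hypothetical annotations for a 10-letter sequence.
annotations = {
    "predicted_label": "HHHHEEECCC",
    "reliability": [9, 8, 8, 7, 5, 6, 4, 3, 2, 1],
}
encoded = encode_letter_annotations(annotations)
decoded = decode_letter_annotations(encoded, seq_length=10)
print(decoded == annotations)  # True
```
</pre>
<tt>Note that JSON has no tuple type, so a tuple value would come back as a
list after decoding.</tt><br>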
<tt>I can say that I will use it if it works in BioSQL... but
until <br>
there is a way to store this information (BioSQL,
GenBank...) <br>
I think the "per letter annotation" will lose part of its charm....<br>
<br>
</tt>
<blockquote
cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"
type="cite">
<pre wrap="">
The GenBank file format simply doesn't have a concept of "per
letter annotation". The PFAM/Stockholm alignment format does
(for the special case of a single character per letter of the
sequence), and in sequencing the base quality is also held in
some file formats.
</pre>
<blockquote type="cite">
<pre wrap="">I was also thinking of storing the PSSM information somewhere in the
SeqRecord.... but this would be a very big change... (as would
managing to store it in BioSQL.... )... but it's better to stop
the discussion here or to move it... :-)
</pre>
</blockquote>
<pre wrap=""><!---->
You can record any object in the SeqRecord's annotation dictionary.
However, saving the result to a file will be tricky - and it wouldn't
work in BioSQL either.
</pre>
</blockquote>
<tt>If you store the PSSM in the annotations in a particular way,
you could also populate <br>
the BioSQL database... but filling the bioentry_qualifier_value table
with this information<br>
is not the right approach.<br>
I'm considering letter_annotations as a better place for PSSM data.
Imagine having <br>
a dictionary where:<br>
- each key is one of the amino acids of the alphabet you choose<br>
- and each value is a list of per-position frequencies of that
amino acid along the sequence.<br>
Then you have stored the PSSM.<br>
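A rough sketch of that layout, using only the standard library: the helper
name, the toy alignment and the three-letter alphabet are hypothetical,<br>
and the values are plain column frequencies rather than log-odds scores.<br>
</tt>
<pre wrap="">
```python
from collections import Counter


def pssm_as_letter_annotations(aligned_sequences, alphabet):
    """Return {letter: [frequency at each position]} for an ungapped alignment."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    # One Counter per alignment column; letters that never occur count as 0.
    columns = [Counter(column) for column in zip(*aligned_sequences)]
    total = len(aligned_sequences)
    return {
        letter: [column[letter] / total for column in columns]
        for letter in alphabet
    }


# Toy alignment of four sequences over a three-letter alphabet.
alignment = ["ACA", "ACG", "AAG", "ACG"]
pssm = pssm_as_letter_annotations(alignment, "ACG")
print(pssm["A"])  # [1.0, 0.25, 0.25]
print(pssm["C"])  # [0.0, 0.75, 0.0]
```
</pre>
<tt>
Every list has one entry per sequence position, which is the shape that
the letter_annotations dictionary of a SeqRecord requires of its values.<br>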
If there were a place for the letter_annotations dictionary in
BioSQL, I think it<br>
would be useful for this and many other purposes... and I would use it,
for sure ;-)<br>
<br>
<br>
Thanks<br>
Andrea<br>
<br>
<br>
</tt><br>
</body>
</html>