<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Peter wrote:

<blockquote

 cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"

 type="cite">

  <pre wrap="">Hi Andrea (and everyone else),

This is a continuation of a discussion started on Bug 2883. Andrea had

a problem with unpickling SeqRecord objects which were pickled using

an older version of Biopython. She was using pickle to store complicated

annotated SeqRecord objects on disk.

See <a class="moz-txt-link-freetext" href="http://bugzilla.open-bio.org/show_bug.cgi?id=2883">http://bugzilla.open-bio.org/show_bug.cgi?id=2883</a> for details.

<a class="moz-txt-link-freetext" href="http://bugzilla.open-bio.org/show_bug.cgi?id=2883#c6">http://bugzilla.open-bio.org/show_bug.cgi?id=2883#c6</a>

On Bug 2883 comment 6, Peter wrote:

  </pre>

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">If your SeqRecord objects are all simply loaded from sequence files in

the first place (and not modified), I would just keep the original file and

re-parse it.

If you have generated your own SeqRecords (or modified those from

reading a file), then it makes sense to save them somehow. The choice

of file format depends on the nature of annotation. The latest Biopython

will now record the features in a GenBank file, making that a reasonable

choice - but this does not cover per-letter-annotations. BioSQL has the

same limitation.

      </pre>

    </blockquote>

  </blockquote>

  <pre wrap=""><!---->

<a class="moz-txt-link-freetext" href="http://bugzilla.open-bio.org/show_bug.cgi?id=2883#c7">http://bugzilla.open-bio.org/show_bug.cgi?id=2883#c7</a>

On Bug 2883 comment 7, Andrea wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">Yes, I'm testing some predictors. I run the predictions and compare the

"newly predicted SeqRecords" with the "previously correct predicted

pickled SeqRecords".

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Sorry - when you said "test code" on the Bug discussion, I thought you

meant you were testing the code - not that this was real work doing

biological tests.

  </pre>

</blockquote>

<tt>To be precise, I really am testing code: my own code. My predictors are

implemented in Python,<br>

and to be sure that over time bug fixes and modifications do not

alter the prediction <br>

results, I built some unit tests that compare the results of the modified

code with the results<br>

of the old code.</tt><br>
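<tt>A minimal sketch of this kind of regression test, assuming the
predictions can be reduced to plain strings and kept as JSON rather than as
pickled objects. The predictor predict_labels and the helper names are
hypothetical stand-ins, not part of Biopython:</tt><br>
<pre wrap="">
```python
import json
import os
import tempfile
import unittest


def predict_labels(sequence):
    """Hypothetical stand-in predictor: H for hydrophobic residues, P otherwise."""
    hydrophobic = set("AILMFVWC")
    return "".join("H" if residue in hydrophobic else "P" for residue in sequence)


def save_reference(path, results):
    """Store known-good predictions as plain JSON, independent of any class layout."""
    with open(path, "w") as handle:
        json.dump(results, handle, sort_keys=True, indent=2)


def load_reference(path):
    """Load the stored predictions back as a dictionary."""
    with open(path) as handle:
        return json.load(handle)


class PredictorRegressionTest(unittest.TestCase):
    def setUp(self):
        self.sequences = {"seq1": "MKVLAT", "seq2": "GGSW"}
        fd, self.path = tempfile.mkstemp(suffix=".json")
        os.close(fd)
        # In real use the reference file is written once, from a trusted run.
        save_reference(self.path, {"seq1": "HPHHHP", "seq2": "PPPH"})

    def tearDown(self):
        os.remove(self.path)

    def test_predictions_unchanged(self):
        expected = load_reference(self.path)
        for name, sequence in self.sequences.items():
            self.assertEqual(predict_labels(sequence), expected[name])
```
</pre>
<tt>Because the reference file holds only strings, it does not depend on
how a particular Biopython version lays out its objects.</tt><br>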

<blockquote

 cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"

 type="cite">

  <pre wrap="">

If you have SeqFeatures and SeqRecords with simple string based

annotation, then BioSQL should be fine.

  </pre>

</blockquote>

<tt>In my opinion, for unit-testing purposes, using BioSQL to store

data is quite expensive&nbsp;in terms of code <br>

(or so it seems...), even though BioSQL is certainly

fine for storing&nbsp;my annotations and <br>

features.<br>

</tt>

<blockquote

 cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"

 type="cite">

  <pre wrap="">

If you have SeqFeatures, then using GenBank output might be enough.

There are no general fields in the GenBank format for arbitrary

annotation though.

  </pre>

</blockquote>

<tt>Yes, I think that GenBank won't store my "personal annotations" (but

I have to check).</tt><br>

<blockquote

 cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"

 type="cite">

  <pre wrap=""></pre>

  <blockquote type="cite">

    <pre wrap="">Actually I don't use per-letter-annotation, although it seems

interesting. I didn't find any example in the documentation (showing

how the dictionary is populated...), so I really don't know

how to use it... even though I do have, during prediction, a "per position

annotation".

    </pre>

  </blockquote>

  <pre wrap=""><!---->

You are right that the SeqRecord chapter in the Tutorial doesn't

explicitly cover populating the per-letter-annotation. I can fix that...

However, the built in documentation covers this (e.g. the section

on slicing a SeqRecord to get a sub-record):

  </pre>

</blockquote>

<blockquote

 cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"

 type="cite">

  <blockquote type="cite">

    <blockquote type="cite">

      <blockquote type="cite">

        <pre wrap="">from Bio.SeqRecord import SeqRecord

help(SeqRecord)

        </pre>

      </blockquote>

    </blockquote>

  </blockquote>

  <pre wrap="">...

You can read this online:

<a class="moz-txt-link-freetext" href="http://www.biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html">http://www.biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html</a>

  </pre>

</blockquote>

<tt>Very interesting and easy to use. I could use it for:<br>

&nbsp; - storing a per-position string representing the "per-position label"

of the prediction<br>

&nbsp; - storing a list of per-position reliabilities (reliability of the

prediction)<br>

&nbsp; - storing sequence variants<br>

&nbsp; - storing possible aligned sequences <br>

But it's a pity that this is not yet handled in BioSQL...</tt><br>

<blockquote

 cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"

 type="cite">

  <pre wrap=""></pre>

  <blockquote type="cite">

    <pre wrap="">Also, if the "per letter annotation" is not handled in the GenBank

format or in BioSQL (which I use a lot), I have to wait!!

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Currently the BioSQL schema doesn't have any explicit support

for "per letter annotation", but we could encode it as a string

(e.g. using XML or JSON) perhaps. This will require coordination

with BioSQL, BioPerl etc - and thus far no one has expressed a

strong need for this.

  </pre>

</blockquote>

<tt>I can say that I would use it if it worked in BioSQL... but

until <br>

there is a way to store this information (BioSQL,

GenBank...), <br>

I think the "per letter annotation" will lose part of its charm...<br>

<br>

</tt>
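<tt>Peter's suggestion of encoding the per-letter annotation as a string
could look roughly like this with JSON. This is only a sketch: the function
names are made up here, and nothing in BioSQL or Biopython defines this
encoding.</tt><br>
<pre wrap="">
```python
import json


def encode_letter_annotations(sequence, letter_annotations):
    """Serialise per-letter annotations to a JSON string, checking lengths first."""
    for key, values in letter_annotations.items():
        if len(values) != len(sequence):
            raise ValueError("annotation %r does not match the sequence length" % key)
    # Convert every value to a list so strings, tuples and lists are stored alike.
    as_lists = {key: list(values) for key, values in letter_annotations.items()}
    return json.dumps(as_lists, sort_keys=True)


def decode_letter_annotations(text, sequence_length):
    """Rebuild the dictionary and confirm every entry still fits the sequence."""
    annotations = json.loads(text)
    for key, values in annotations.items():
        if len(values) != sequence_length:
            raise ValueError("annotation %r does not match the sequence length" % key)
    return annotations


sequence = "MKVL"
annotations = {"label": "HPHH", "reliability": [0.9, 0.4, 0.8, 0.7]}
text = encode_letter_annotations(sequence, annotations)
restored = decode_letter_annotations(text, len(sequence))
```
</pre>
<tt>Note that a string annotation comes back as a list of single
characters, so the round trip keeps the content but not the original
type.</tt><br>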

<blockquote

 cite="mid:320fb6e00907230320r49809329p620f3d1d4a39fb36@mail.gmail.com"

 type="cite">

  <pre wrap="">

The GenBank file format simply doesn't have a concept of "per

letter annotation". The PFAM/Stockholm alignment format does

(for the special case of a single character per letter of the

sequence), and in sequencing the base quality is also held in

some file formats.

  </pre>

  <blockquote type="cite">

    <pre wrap="">I was also thinking of storing the PSSM information somewhere in the

SeqRecord... but this would be a very big change (as would

managing to store it in BioSQL)... but it's better to stop

the discussion here or to move it... :-)

    </pre>

  </blockquote>

  <pre wrap=""><!---->

You can record any object in the SeqRecord's annotation dictionary.

However, saving the result to a file will be tricky - and it wouldn't

work in BioSQL either.

  </pre>

</blockquote>

<tt>If you store the PSSM in the annotations in a particular way,

you could also populate <br>

the BioSQL database... but filling the bioentry_qualifier_value table

with this information<br>

is not the right approach.<br>

I'm considering letter_annotations as a better place for PSSM data.

Imagine <br>

a dictionary where:<br>

&nbsp; - each key is one of the amino acids of the alphabet you choose<br>

&nbsp; - and each value is a list of per-position frequencies of that

amino acid along the sequence<br>

With that, you have stored the PSSM.<br>

If there were a place for the letter_annotations dictionary in

BioSQL, I think it<br>

would be useful for this and many other purposes... and I would use it,

for sure ;-)<br>

<br>

<br>

Thanks<br>

Andrea<br>

<br>

<br>

</tt><br>
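<tt>The dictionary described above can be sketched in plain Python. The
function name and the toy alignment are assumptions made for illustration.
Each value has one entry per sequence position, which is the shape that
letter_annotations expects:</tt><br>
<pre wrap="">
```python
def pssm_as_letter_annotations(aligned_sequences, alphabet):
    """Return {residue: [frequency at position 0, 1, ...]} for equal-length sequences."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    total = float(len(aligned_sequences))
    pssm = {}
    for residue in alphabet:
        # One frequency per column: the fraction of sequences with this residue there.
        pssm[residue] = [
            sum(1 for seq in aligned_sequences if seq[position] == residue) / total
            for position in range(length)
        ]
    return pssm


# Toy alignment of four three-residue sequences (made-up data).
aligned = ["MKV", "MRV", "LKV", "MKI"]
pssm = pssm_as_letter_annotations(aligned, "ACDEFGHIKLMNPQRSTVWY")
```
</pre>
<tt>Strictly speaking these are raw column frequencies rather than
log-odds scores, which matches the "per-position frequencies" wording
above.</tt><br>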

</body>

</html>