[Biojava-l] small "bug" correction in package BioSql

Richard Holland holland at ebi.ac.uk
Fri Nov 9 07:42:38 EST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I did a bit of poking around in our code and internally BioJava
represents all the default alphabet names (Protein, DNA, etc.) in upper
case. It also allows for mixed case alphabet names.

It's not quite as easy as I thought to change these to lower case as
they are often referenced by text name, meaning other people's code
might break if I change them.

Also, as it allows for mixed-case alphabet names, I can't do a
toUpper/toLower fudge on persistence to BioSQL, as I wouldn't
necessarily get out what I put in!
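
To make the round-trip problem concrete, here is a minimal Python sketch.
The helper names (to_biosql, from_biosql, round_trips) and the mixed-case
name "MyCustomAlphabet" are made up for illustration and are not part of
BioJava or BioSQL. It assumes the fudge would lower-case on write and
upper-case on read.

```python
# Hypothetical helpers, not BioJava or BioSQL API: they model a
# "lower-case on write, upper-case on read" fudge.

def to_biosql(name: str) -> str:
    """Lower-case an alphabet name before storing it."""
    return name.lower()


def from_biosql(stored: str) -> str:
    """Upper-case a stored alphabet name on retrieval."""
    return stored.upper()


def round_trips(name: str) -> bool:
    """Return True if the name survives a write followed by a read."""
    return from_biosql(to_biosql(name)) == name


# "MyCustomAlphabet" is an invented mixed-case name for illustration.
for name in ("DNA", "PROTEIN", "MyCustomAlphabet"):
    print(name, round_trips(name))
```

The all-upper-case default names survive, but any mixed-case name comes
back changed, which is why a blanket case conversion is not safe.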

So, I think I'll add this as a point on the recently announced BioJava 3
proposal, that BioSQL interaction must be compliant with standards laid
down by the BioSQL project, and that our code will be able to cope with
this internally.
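
One way the code might cope internally is an explicit mapping for the
default alphabets only, leaving every other name untouched. This is a
sketch under assumptions, not the BioJava implementation: the function
names are hypothetical, and the lower-case strings are simply what this
thread says BioPerl and Biopython use, not a value agreed by the BioSQL
project.

```python
# Assumed mapping between upper-case internal names and the lower-case
# values discussed in this thread. The exact strings are illustrative.
INTERNAL_TO_BIOSQL = {"DNA": "dna", "RNA": "rna", "PROTEIN": "protein"}
BIOSQL_TO_INTERNAL = {v: k for k, v in INTERNAL_TO_BIOSQL.items()}


def to_biosql(name: str) -> str:
    """Map a default alphabet name to its stored form; pass others through."""
    return INTERNAL_TO_BIOSQL.get(name, name)


def from_biosql(stored: str) -> str:
    """Map a stored value back to the internal name; pass others through.

    Rows already written in upper case ("DNA") equal the internal name,
    so they come back unchanged without any special handling.
    """
    return BIOSQL_TO_INTERNAL.get(stored, stored)


for name in ("DNA", "PROTEIN", "MyCustomAlphabet"):
    stored = to_biosql(name)
    print(name, "->", stored, "->", from_biosql(stored))
```

The one case this does not handle is a custom alphabet whose name is
exactly one of the lower-case stored values; it would be read back as the
corresponding default alphabet.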

That brings us back to BioSQL standards - the idea of a mini-hackathon
to solve this once and for all is a very good one. Our previous attempts
between BioPerl and BioJava in Singapore were good, but still there are
niggles as seen in this thread of discussion. It seems that a schema on
its own just isn't enough to make the various projects play nicely, and
instructions are needed on exactly how to use that schema if they are
truly all going to be able to use it without caring who or what wrote
the data that is being read.

cheers,
Richard


Hilmar Lapp wrote:
> It seems BioPerl and Biopython both want (and have traditionally used)
> lowercase - do you mind going with that for Biojava as well, or
> alternatively, simply map upon insert/update and retrieve?
> 
>     -hilmar
> 
> On Nov 8, 2007, at 11:18 AM, Richard Holland wrote:
> 
> we do need a consensus here.
> 
> I'm happy to go with whatever value is chosen, as the BioJava code can
> easily be modified to suit.
> 
> cheers,
> Richard
> 
> Hilmar Lapp wrote:
>>>> Indeed Biojava uses uppercase for alphabet. In Bioperl-db, we
>>>> explicitly lowercase the value found for alphabet, and the comment
>>>> says why:
>>>>
>>>>          # Note: Biojava uses upper-case terms for alphabet, so we
>>>>          # need to change to all-lower in case the sequence was
>>>>          # manipulated by Biojava.
>>>>          $obj->alphabet(lc($rows->[3])) if $rows->[3];
>>>>
>>>> However, when inserting sequences, we leave the value as is in
>>>> BioPerl (which is lowercase), leading to a potential problem for
>>>> Biojava upon retrieval. Do the Biojava folks deal with that? Should
>>>> this be harmonized across the board?
>>>>
>>>>     -hilmar
>>>>
>>>> On Nov 8, 2007, at 6:49 AM, Eric Gibert wrote:
>>>>
>>>>> Dear Peter,
>>>>>
>>>>> All the alphabet values are "DNA" (upper case) in my database. The
>>>>> sequences are taken from NCBI by a BioJava application.
>>>>> Thus it must be BioJava that inserts the records with "DNA", so
>>>>> there is no potential "hidden bug" in BioPython.
>>>>>
>>>>> Maybe a point to share with the Open-Bio committee.
>>>>>
>>>>> Eric
>>>>>
>>>>> ----- Original Message ----
>>>>> From: Peter <biopython at maubp.freeserve.co.uk>
>>>>> To: Eric Gibert <ericgibert at yahoo.fr>
>>>>> Cc: biopython at lists.open-bio.org
>>>>> Sent: Thursday, 8 November 2007, 19:40:00
>>>>> Subject: Re: [BioPython] small "bug" correction in package BioSql
>>>>>
>>>>> Eric Gibert wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> In BioSQL/BioSeq.py, in the definition of the class DBSeq, we have
>>>>>> the function:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> Please note my correction: force moltype to lower case, as my
>>>>>> database has upper-case values, which raise the "Unknown moltype"
>>>>>> error.
>>>>> Hi Eric, I've made your suggested change in CVS,
>>>>> biopython/BioSQL/BioSeq.py revision 1.13, thank you.
>>>>>
>>>>> I would encourage you to investigate why some of the "alphabet" fields
>>>>> in the biosequence table are in upper case.  There could be a bug
>>>>> elsewhere which is writing these entries with the wrong alphabet.  Is
>>>>> this affecting all entries, or just some?
>>>>>
>>>>> Peter
>>>>>
>>>>> _______________________________________________
>>>>> BioPython mailing list  -  BioPython at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biopython
>>>>

> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================






-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHNFW84C5LeMEKA/QRApBiAJ41WqCDKOJhee5NxIsquYaR/ImBRgCfb7zM
LX75HHvCUC/v4n3okmUQ+ME=
=d6QO
-----END PGP SIGNATURE-----

