[BioPython] bioperl idl

Mon, 20 Sep 1999 09:27:01 +0100 (BST)

On Sun, 19 Sep 1999, Andrew Dalke wrote:

> [Ewan's IDL]
> 
> Let me start by asking a dumb question.  What does strand type (as in
> + or -) mean?  Or do you have a URL for me (a protein person by
> training) to read?

Locations on a DNA sequence can be on either strand or on 'both' strands
(hence '+', '-' or '.', or, in my preferred notation, 1, -1, 0).
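
The two notations map onto each other one-to-one. A minimal Python
sketch of that mapping follows; the helper names are hypothetical and
are not part of the IDL under discussion:

```python
# Hypothetical helpers; the names are illustrative, not from the IDL.
# '+' = forward strand, '-' = reverse strand, '.' = both strands.
SYMBOL_TO_INT = {"+": 1, "-": -1, ".": 0}
INT_TO_SYMBOL = {value: symbol for symbol, value in SYMBOL_TO_INT.items()}


def strand_to_int(symbol):
    """Convert '+', '-' or '.' to 1, -1 or 0."""
    return SYMBOL_TO_INT[symbol]


def strand_to_symbol(value):
    """Convert 1, -1 or 0 back to '+', '-' or '.'."""
    return INT_TO_SYMBOL[value]
```

Any value outside the three listed raises KeyError, which seems a
reasonable default for a sketch.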

Other question - I prefer the Knuth/C-style (zero-based) numbering
system, but the rest of bioinformatics has standardised on "biological"
numbering schemes (i.e. 1..3 is a string of length 3, starting at the
first position). Annoying or what.

If I were you I would stick to the biological scheme. Anything else just
confuses people coming over from biology.
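
For code that still wants zero-based indexing internally, the
conversion is small. This is only a sketch, assuming 1-based inclusive
coordinates on the outside and Python slices on the inside; the
function names are made up for illustration:

```python
def bio_to_slice(start, end):
    """Convert 1-based inclusive (start, end) to a 0-based half-open slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return slice(start - 1, end)


def bio_length(start, end):
    """Length of a 1-based inclusive range, e.g. 1..3 has length 3."""
    return end - start + 1


seq = "ACGTAC"
first_three = seq[bio_to_slice(1, 3)]  # the range written 1..3
```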

> 
> 
> You have a type() method to get the sequence type.  What about
> picking another name?  Python defines a "type()" function, which
> is not a reserved word and will never have a conflict with the
> method name.

Actually, in bioperl I am using moltype (molecular type).

> You have a "Comment" string data type which is
> 
> >  // just a list of strings.
> >  interface Comment : ReleaseableObject {
> >    sequence <string> comments;
> >    bool is_html;
> >  }
> 
> This doesn't feel right, for a couple of reasons.  Why is it a list of
> strings, rather than a single string separated by newlines?  Is it the
> cross-platform newline issue?  I would rather have things as a single
> string (with a defined "\n", "\r" or "\r\n") than lots of strings.
> 
> Also, the is_html isn't very nice.  What about support for other
> formats?  This is exactly the problem that MIME solves, so you could
> have
> 
>   interface Comment : ReleaseableObject {
>     string mime_type;
>     string comment;
>   }

I think this is pretty good. I was thinking of something similar. It is
just that Comments in most bioinformatics databases are best represented
as a list of strings, each string a line. This is the natural way to
process them. Is 

sequence <string> comments
string mime_type

so bad?
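
The two representations convert into each other cheaply. A rough
Python sketch, assuming a hypothetical Comment class (not generated
from the IDL) with a "text/plain" default that is my own assumption:

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    """Hypothetical stand-in for the proposed Comment interface."""
    comments: list = field(default_factory=list)  # one string per line
    mime_type: str = "text/plain"                 # assumed default

    @classmethod
    def from_text(cls, text, mime_type="text/plain"):
        # splitlines() accepts "\n", "\r" and "\r\n" line endings alike.
        return cls(text.splitlines(), mime_type)

    def as_text(self, newline="\n"):
        # Join the lines back into a single string with a defined newline.
        return newline.join(self.comments)


c = Comment.from_text("first line\r\nsecond line\nthird line")
```

Here c.comments holds three separate lines, and c.as_text() returns
them as one string joined with "\n".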

> 
> URL stands for Uniform Resource Locator (or "Universal" if you've
> been around long enough :).  URN stands for Uniform Resource Name
> (RFC 2141 at http://www.cis.ohio-state.edu/htbin/rfc/rfc2141.html).
> 

Right. I know them. I am just representing what is common usage in
bioinformatics. The dbxref is like a URN, except that the first and
second parts are officially split apart. But one could merge them.

I'm just trying not to be super clever. Perhaps one should have

DBxREF {
  string URN;
  string database;
  string primary_key;
};
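
To illustrate how the split and merged forms could coexist, here is a
small Python sketch. The class name, the "urn:<database>:<primary_key>"
layout and the example key are all assumptions for illustration; real
URN namespaces have to be registered, and this does not claim any
particular one exists:

```python
from dataclasses import dataclass


@dataclass
class DBxRef:
    """Hypothetical cross-reference: database name plus primary key."""
    database: str
    primary_key: str

    @property
    def urn(self):
        # Assumed layout: "urn:<database>:<primary_key>".
        return "urn:{}:{}".format(self.database, self.primary_key)

    @classmethod
    def from_urn(cls, urn):
        # Split into at most three parts so the key may itself contain ':'.
        scheme, database, primary_key = urn.split(":", 2)
        if scheme.lower() != "urn":
            raise ValueError("not a URN: " + urn)
        return cls(database, primary_key)


ref = DBxRef("medline", "12345")  # made-up key, for illustration only
```

With this layout ref.urn gives the merged string, and
DBxRef.from_urn() splits it back into the two official parts.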

> 
> You can't limit yourself to medline, since some places have references
> not indexed by medline, for example, internal documentation.  All it
> needs to be is the name which can be resolved to get the right sort of
> information.

Well, this could be done in the dbxref. The reference subclasses dbxref,
and the database does not have to be medline.

-----------------------------------------------------------------
Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230
<birney@sanger.ac.uk>
http://www.sanger.ac.uk/Users/birney/
-----------------------------------------------------------------