[BioPython] BOSC 2001 [Bioinformatics Open Source Conference]

Andrew Dalke dalke@acm.org
Fri, 2 Feb 2001 18:54:56 -0700


I'm hijacking Ewan's thread name, but this is only sent
to the biopython mailing list.

I sent mail to bosc@bubbles saying that I was nominated
as the biopython representative.

I also sent some ideas on what I would like in a BOSC,
but I sent them as my own ideas, without input on what
others here would like.

Here are my comments:
 =================
I would like a smaller and more technical conference.
Last year's was more of a "Hello, we're X and we do Y"
affair, and as someone who follows most of the mailing lists
I must admit it wasn't that new or thrilling.

What I don't do is follow the lists deeply enough to
know the details of the major topics.  For example, I
know there's a bunch of things going on in bioperl on
fuzzy sequence locations.  Since I don't work with them,
I don't know the scenarios for which they are needed.
(The wiki and NCBI docs don't describe use cases for
them.)

So I would like to see a talk that explains what they
are, how they are used, and the model used to handle
those cases or why it was decided to ignore certain
needs.

Other topics I would like to hear about are:
  - type-safe alphabets for sequences.  Bioperl uses only
     three types while biojava and biopython allow
     more type safety.
  - underlying type models: e.g., describe how the biopython
     sequence object works
  - URN/URIs: use for naming schemes for the different
     database records.
  - software techniques: how useful is XP, UML, etc. for
     existing projects
  - just why are XML/XSLT/XLINK/X* hot?  What do they do?
  - parsing (you know I had to mention that!): could describe
      biojava's and biopython's event driven parsers.
  - component-ware: anything being done to work with the
      GNOME CORBA interface?  What about Mozilla's XPCOM?
  - PostgreSQL: it, like data cartridges and data blades,
      allows integration of new data types in the server.
      Does anyone use this feature for bioinformatics?
  - Results of user testing for open source projects:
      ease of installation; understanding APIs, docs, GUIs;
      what can be changed to improve matters
  - developing use case scenarios to guide new development
  - blue sky: what can the bio* projects do to be revolutionary
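
For the parsing item above, "event driven" means roughly the
following.  This is only a minimal sketch in Python, not the
actual biojava or biopython API; the handler class, its method
names and the parse_fasta function are all made up for
illustration.

```python
class RecordCollector:
    """Hypothetical handler: receives events, builds (title, sequence) pairs."""

    def __init__(self):
        self.records = []
        self._title = None
        self._chunks = []

    def start_record(self, title):
        self._title = title
        self._chunks = []

    def sequence_line(self, text):
        self._chunks.append(text)

    def end_record(self):
        self.records.append((self._title, "".join(self._chunks)))


def parse_fasta(lines, handler):
    """Scan FASTA-like lines and fire events; the parser stores nothing itself."""
    in_record = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                handler.end_record()
            handler.start_record(line[1:])
            in_record = True
        elif in_record:
            handler.sequence_line(line)
    if in_record:
        handler.end_record()


collector = RecordCollector()
parse_fasta([">seq1 demo", "ACGT", "TTGA", ">seq2", "GGCC"], collector)
print(collector.records)
```

The point of the split is that the scanner knows the file format
and the handler decides what to keep, so a different handler
could count records or index titles without touching the parser.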

As you can see, quite technical.

I also have a personal problem with the size of last year's
BOSC.  With that many people I become inhibited because
I know I delve into technical matters that most people there
wouldn't want to hear.

For example, I wanted to ask why DAS chose a new and hybrid
protocol using HTTP GET parameters and XML return values
when things like XML-RPC (and its newer descendant SOAP)
exist, allow easily extensible parameters with typed
values, and use XML for both send and return (so no problems
with "Request-URI Too Long" errors, which RFC 2616 warns
about for URIs longer than 255 bytes, or the empirical
problems at around 10K when the CGI environment variable
space fills up).
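
To make the contrast concrete, here is a small Python sketch
using only the standard library.  The method name
"das.features" and the parameter names are invented for
illustration; they are not taken from the DAS specification.

```python
from urllib.parse import urlencode
import xmlrpc.client

# Hypothetical request parameters (names are assumptions).
params = {"segment": "chr1", "start": 1000, "stop": 2000}

# GET style: every value is flattened to text and packed into
# the URL, so the request is bounded by URL length limits.
query = urlencode(params)

# XML-RPC style: values keep their types (<int>, <string>) and
# travel in the POST body rather than in the URL.
body = xmlrpc.client.dumps((params,), methodname="das.features")

print(query)
print(body)
```

In the query string the start and stop values are just text,
while the XML-RPC body marks them as integers and can grow
without touching the request URI.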

Again, quite technical and not something I wanted to
ask in a crowd of >100 people.  But asking one-on-one
also isn't that useful since there aren't enough other
people to provide feedback and insight.

I would also have loved access to some of the technical
papers or source code while there.  For example, I'm
currently looking at the DAS code again and I can see
there are problems with denial-of-service attacks against
the machine by using regexps with exponentially bad
run-time behaviour.  I wonder if they know about it.
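
As an illustration of the kind of behaviour meant here (this is
a textbook pathological pattern, not one taken from the DAS
code), a backtracking regexp engine can take time that roughly
doubles with each extra input character when a nested
quantifier fails to match.  The function name below is
made up for the sketch.

```python
import re
import time

# Nested quantifier: many ways to split a run of 'a's between
# the inner and outer '+', all tried before the match fails.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Time a failing match against n 'a's followed by one 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' prevents any match
    return elapsed


for n in (8, 12, 16, 20):
    print(n, f"{time_failed_match(n):.4f}s")
```

If user input reaches such a pattern, a short crafted string is
enough to tie up the server, which is what makes it a
denial-of-service concern.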

As you may know, I've been testing different parsers.
I worked with Matthew Pocock so he could provide me some
Java parsers.  That interaction was nice, but as he's
about 8 time zones away it was cumbersome and harder
to ask questions.  Having face time between coders would
be excellent.

 =================

If anyone here has additional or different comments,
please let me know.

(Oh, and I just read through the DAS lists.  They are
considering using SOAP.)

                    Andrew
                    dalke@acm.org