[Biopython-dev] BOSC abstract for Biopython

Tue May 6 13:22:08 EDT 2003

Hey all;
Since many Biopython developers are facing the monetary difficulties 
of making it to Australia for this year's Bioinformatics Open Source 
Conference (BOSC), I've been honored again this year with the 
privilege of giving a Biopython talk. As always, I am but a humble 
servant of the Biopython community and want to represent you fine 
folks as best I can (so I'll try not to show up reeking like cheap 
gin (although, no promises :-)).

I've gotten together an abstract for my talk which I've pasted
below. Basically, I'm planning to take the style of stepping through
a problem and showing solutions to it using Biopython, which seems
to have been a decent technique in talks I've given on Biopython in
the past (something useful for people plus the basics of Biopython
thrown in throughout).

I'd appreciate any feedback, good or bad, on it before I submit it
to the fine BOSC organizers. I'm already behind on this (my writtens
are coming up in two weeks -- man, does my life suck right now) so
I'm hoping to get it in as soon as I can. Thanks in advance for any
comments!

Brad

Using Biopython for Laboratory Analysis Pipelines

The Biopython project is a distributed collaborative effort to develop Python
libraries to address the needs of researchers doing bioinformatics work.
Python is an interpreted object-oriented programming language that we feel is
well suited for both beginning and advanced computational researchers. Biopython 
has been around since 1999, and has a number of active contributors and 
users who continue its regular development.

One major problem in bioinformatics work is developing analysis pipelines
which combine data from a number of different sources. Advanced
scientific questions will require information from many disparate sources
such as web pages, flat text files and relational databases. Additionally,
these sources of information will often be found in different, incompatible
formats. The challenge for many researchers and software developers is to
organize this information so that it can be readily queried and examined.
This problem is made even more difficult by the varied and rapidly changing
interests of scientists who want to ask questions with the data.

Rather than trying to build specific applications to address these data
manipulation problems, Biopython has focused on developing library
functionality to manipulate various data sources. This frees a researcher from
having to deal with low-level details of parsing and data acquisition, helping
to abstract the process of data conversion. Additionally, since the lower level
data manipulation code is shared amongst multiple researchers, data format
changes or problems with the code are more readily identified and fixed.

This talk will focus on using the Biopython libraries in developing analysis
pipelines for scientific research. In addition to demonstrating the uses of
Biopython, this will highlight some areas where Biopython offers unique
solutions to data manipulation problems. We will identify some of the common
challenges the libraries have to deal with, such as attempts to standardize
output from multiple programs that perform similar functions, and describe our
attempts to deal with these difficulties. This will provide a foundation for both
understanding the Biopython libraries and the development process underlying
them.