[Bioperl-l] BOSC 2001 bioperl report

Jason Stajich jason@chg.mc.duke.edu
Mon, 23 Jul 2001 08:15:21 -0400 (EDT)


Here are the minutes and a report of the stated goals for bioperl with
regard to the bioperl 1.0 release.

Report from Bioperl Developer meeting at BOSC 2001

  * The 0.9.x developer release series will begin in about 3 weeks.
    These will be non-stable releases in which all tests pass, but
    known bugs may be included.  The APIs in the 0.9.x series are
    not considered stable, and users must accept this risk when
    developing applications with this series.  The advantage of these
    releases over pure CVS checkouts is an FTP tarball and a guarantee
    that all tests pass.

  * Preparations for the 1.0 release are underway, with an expected
    release date in the latter part of Q4.

    Below is the checklist for 1.0.  Bioperl members currently
    responsible for an item are listed in parentheses; if no one is
    listed, we need someone to volunteer.

    o Alignment objects - Abstraction of interface for Alignments
      based on SimpleAlign, and removal of UnivAln object.
      (Heikki Lehvaslaiho)
   
    o AnnotationI - An interface for describing general purpose annotations
      (Ewan Birney)

      Perhaps building Bibliography Reference objects based on work by
      Martin Senger and interface with his BQS ideas.
   
    o ApplicationFactoryI - Interface for running applications from
      within bioperl.  We need an abstract definition for running
      applications (based on Novella, openBSA).  The intention is to
      define this in such a way that applications can be described by
      metadata, without requiring the creation of one class per
      application a la StandAloneBlast, TCoffee, Clustal.

      The first effort will be to try to interface with the EMBOSS
      package.  Additional efforts will interface with Phrap, phred,
      and consed, and rewrap the blast, netblast, clustal, and tcoffee
      interfaces.
      Heikki has already gotten started on the EMBOSS interface and it
      looks very promising.

      (David Block to propose interface)

    o Assembly tools - handling Consed and phred parsing and phrap
      interfaces, working with ideas and code proposed by Chad Masala
      and connecting to (new) BioCORBA Comparison objects.
      (Chad Masala to help start)

    o Sequence Parsers (SeqIO) - We do not plan to make many changes
      to the parsing, to ensure a stable release.  However, one idea
      proposed by Hilmar Lapp is to eliminate the hardcoded nature of
      Sequence and Feature creation in the SeqIO system.  Instead of
      hardcoding the object name in the FTHelper code to create
      Bio::SeqFeature::Generic objects, a SeqFeatureFactory object
      could be passed to the SeqIO parser on initialization to
      determine how seqfeatures are created.  Similarly, a SeqFactory
      should be defined to create the empty sequence objects that the
      parser code initializes, replacing the hardcoded creation of
      Bio::Seq objects in all the SeqIO parser modules.
      (?)
 
    o Semantic Feature interpreter factory which will take a tree of
      features and output a tree of features, interpreting them and
      creating groupings and new SeqFeatures where appropriate based
      on the feature tags.  The best example of this is to interpret
      the primary tags in a tree of SeqFeatures and build gene objects
      if one finds CDS, exon, mRNA tags. 
      (Team PBI Saskatoon - David Block and Mark Wilkinson)   
 
    o Expression Data - some bioperl members have mentioned they have
      objects for expression data which will make their way into
      bioperl core.

    o Evaluate whether the SeqFeature::GeneStructure object is complete.
      (Hilmar Lapp and Mark Wilkinson)

    o FASTA analysis parser - building similarity pairs and complying
      with the SeqAnalysisParserI interface.
      (Dyfed)
 
    o Maps & Markers - handling marker maps, connecting markers with
      Variation package, and representing these in a database.
      (Heikki Lehvaslaiho, Jason Stajich, Lincoln Stein)

    o Pedigree data - managing and manipulating pedigree data and
      interfacing with genotype and haplotype data.  Connecting with a
      database for storing these objects. 
      (Heikki Lehvaslaiho and Jason Stajich)
    
    o Implement BioCORBA 0.03 proposed spec as described by the
      "Copenhagen Core" at BOSC2001.

    o Teaching tools - building teaching tutorials and basic
      problem sets for introducing bioperl to new users.
      (Peter Schattner and Jason Stajich)
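
    As a rough illustration of the SeqFactory / SeqFeatureFactory
    idea from the Sequence Parsers item above, here is a minimal
    sketch.  It is written in Python purely to show the shape of the
    pattern; ToyParser, SimpleSeq, GenericFeature, ExonFeature and the
    ID/FT/SQ record format are all made up for this example and are
    not bioperl APIs or a proposed design.

```python
from dataclasses import dataclass, field


@dataclass
class GenericFeature:
    """Default feature type (stands in for Bio::SeqFeature::Generic)."""
    primary_tag: str
    start: int
    end: int


@dataclass
class SimpleSeq:
    """Default sequence type (stands in for Bio::Seq)."""
    display_id: str = ""
    seq: str = ""
    features: list = field(default_factory=list)


class ToyParser:
    """Hypothetical parser: the factories are passed in, not hardcoded."""

    def __init__(self, seq_factory=SimpleSeq, feature_factory=GenericFeature):
        self.seq_factory = seq_factory
        self.feature_factory = feature_factory

    def parse(self, text):
        # Toy record format, invented for this sketch:
        #   ID <name>
        #   FT <tag> <start> <end>
        #   SQ <residues>
        seq = self.seq_factory()  # empty object, filled in by the parser
        for line in text.splitlines():
            fields = line.split()
            if not fields:
                continue
            if fields[0] == "ID":
                seq.display_id = fields[1]
            elif fields[0] == "FT":
                tag, start, end = fields[1], int(fields[2]), int(fields[3])
                seq.features.append(self.feature_factory(tag, start, end))
            elif fields[0] == "SQ":
                seq.seq += fields[1]
        return seq


@dataclass
class ExonFeature(GenericFeature):
    """A caller-supplied feature class, used without editing the parser."""
    strand: int = 1


record = "ID demo1\nFT exon 1 4\nSQ ACGTACGT\n"
default_seq = ToyParser().parse(record)
custom_seq = ToyParser(feature_factory=ExonFeature).parse(record)
```

    The point of the sketch is that the parser never names a concrete
    class: swapping in ExonFeature changes what gets built without
    changing any parsing code.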

  * Additionally we discussed the creation of new CVS modules.  I have
    been liberal, creating modules based on project domains; however,
    the bioperl core discussed this and proposed that we instead
    create modules specific to external dependencies.  So bioperl-live
    would hold pure perl code, bioperl-db would hold database specific
    code (SQL dependency), bioperl-ext would hold C extensions, and
    bioperl-gui would hold code with graphical dependencies (Perl/Tk).

    If anyone has input on this, please respond to the list as this is
    just a proposal.   
    
  * Open call for scripts.  We will open a public directory for
    submitting perl scripts and a short description of them.  Everyone
    is encouraged to donate perl scripts, which may or may not use
    bioperl, that solve biological problems in your work.  These
    scripts will be re-written by a bioperl developer and distributed
    as part of either the scripts or examples directory of bioperl
    (depending on whether they are useful as general purpose tools or
    just as examples).
  
    This script writing has 2 intended purposes.  We wish to include
    more people in developing for bioperl, and these will be small,
    self-contained projects giving new developers a chance to work on
    a problem without feeling swallowed by the whole bioperl object
    model.  This will also help provide a large number of utilities
    for bioinformatics and address the needs of users who download
    bioperl and see only a library of code with no applications.

  * Post-1.0 ideas

    o Sequence Parsing - Event based parsing and utilizing grammars.  
    o Other ideas should be submitted to the list and we'll help keep
      track.
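
    To make the event based parsing idea a little more concrete, here
    is a minimal sketch, again in Python purely for illustration.  The
    parser only recognizes structure and fires events; a separate
    handler decides what to build.  The names parse_fasta and
    RecordBuilder and the three event names are assumptions made up
    for this example, not an existing or planned bioperl interface.

```python
class RecordBuilder:
    """Hypothetical event handler that collects (id, sequence) pairs."""

    def __init__(self):
        self.records = []
        self._id = None
        self._chunks = []

    def start_sequence(self, seq_id):
        self._id = seq_id
        self._chunks = []

    def residues(self, chunk):
        self._chunks.append(chunk)

    def end_sequence(self):
        self.records.append((self._id, "".join(self._chunks)))


def parse_fasta(lines, handler):
    """Walk FASTA-formatted lines and fire events on the handler."""
    in_record = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                handler.end_sequence()
            # The ID is taken to be the first word after '>'
            header = line[1:].split()
            handler.start_sequence(header[0] if header else "")
            in_record = True
        elif in_record:
            handler.residues(line)
    if in_record:
        handler.end_sequence()


builder = RecordBuilder()
parse_fasta([">seq1 test", "ACGT", "TTGA", ">seq2", "GGCC"], builder)
```

    A different handler (say, one that only counts residues) could be
    plugged into the same parse_fasta loop without building any
    sequence objects at all.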

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/