[Bioperl-l] Bio::FPC

Ewan Birney birney@ebi.ac.uk
Fri, 15 Nov 2002 17:41:48 +0000 (GMT)


On Fri, 15 Nov 2002, Jamie Hatfield wrote:

> Hello all, I need some advice.
>
> I work at the Arizona Genomics Institute under Dr. Cari Soderlund (if
> you don't know her, she used to work at the Sanger Centre, where she
> developed FPC - FingerPrinted Contigs - probably the most used software
> for physical map construction.  She's here in Tucson, AZ after a short
> hiatus in Clemson, SC).  Anyway, I've re-introduced our group to Bioperl
> and we are starting to take advantage of it wherever possible.  Cari
> had seen Bioperl before, but that was in the pre-1.0 days, when things
> weren't stable enough (in her opinion) for a production environment,
> and she never got around to looking into it again.
>

Cool. Say hi to Cari from me - I'm glad she's letting you look into this
after an initial oops experience...

> I noticed in a document from a presentation given by one of the
> Bioperl bigwigs (might have been LStein) that an FPC parser was a common
> request.  If that's true, we know FPC probably as well as anybody else,
> so it would make sense for us to develop/maintain it.
>
> So now we would like to make a contribution.  Don't get too excited
> yet... It's not programmed yet.  But we have found that in many, many
> different areas we need to read a .fpc file (and corresponding .cor
> file) and Do Something(c) with it.  At the same time, I want to get more
> familiar with Bioperl.  I've done fairly simple things, like reading in
> fasta/genbank/swissprot format files and working with alignments (as you
> all saw in my EST Alignment questions).
>
> The advice I want is as follows:
> 1) Where are the standards/guidelines for writing Bioperl modules?

Try reading biodesign.pod; it sets out some of the standards.

> 2) Any ideas on what features/functionality Bio::FPC should have?

I would have thought the following hierarchy is good:


   Bio::FPC::FPCSet  (has a set of)
           ::Contig  (has a set of)
           ::BAC

Bio::FPC::BAC may well inherit from Bio::SeqFeatureI, using the FPC band
coordinates for start/end.
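Here is a minimal sketch of that containment hierarchy. It is written in Python purely for illustration: it is not Bioperl code, and the class and method names are hypothetical stand-ins for the proposed Bio::FPC modules. Each BAC carries start/end in FPC band coordinates, which is the role Bio::SeqFeatureI's start/end would play. Length follows Bio::RangeI's inclusive end - start + 1 convention.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BAC:
    """Hypothetical stand-in for Bio::FPC::BAC: a clone placed on its
    contig, with start/end in FPC band coordinates."""
    name: str
    start: int  # left end, in bands
    end: int    # right end, in bands

    def length(self) -> int:
        # Inclusive span, like Bio::RangeI's end - start + 1.
        return self.end - self.start + 1


@dataclass
class Contig:
    """Hypothetical stand-in for Bio::FPC::Contig: has a set of BACs."""
    name: str
    bacs: List[BAC] = field(default_factory=list)

    def span(self) -> Tuple[int, int]:
        # Leftmost start and rightmost end over all member clones.
        return (min(b.start for b in self.bacs),
                max(b.end for b in self.bacs))


@dataclass
class FPCSet:
    """Hypothetical stand-in for Bio::FPC::FPCSet: has a set of contigs."""
    contigs: List[Contig] = field(default_factory=list)

    def all_bacs(self) -> List[BAC]:
        # Flatten the two-level hierarchy into one list of clones.
        return [b for c in self.contigs for b in c.bacs]


# Illustrative data only: names and coordinates are made up.
ctg = Contig("ctg1", [BAC("c1", 0, 35), BAC("c2", 20, 61)])
fpc = FPCSet([ctg])
```

Each level only "has a set of" the level below. Because of that, FPCSet never needs to know about band coordinates.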

Then have

  Bio::FPC::IO.pm - IO format - steal from Bio::AlignIO etc.
     ::FPC::IO::fpc - problem here - the "program" is also called fpc. Any
other name for the format?

IO.pm would define methods ->next_fpcset and ->write_fpcset
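The reader/writer shape described here works as follows: a factory picks a format-specific driver, and callers loop on next_fpcset until it returns nothing. The sketch below shows only that shape, in Python. FPCIO, its register/new methods, the "toy" driver and that driver's format are all invented for illustration. The toy format has one "contig clone" pair per line and a "//" line ending each set; it is not the real .fpc syntax.

```python
import io
from typing import Dict, List, Optional, Type

FPCSetData = Dict[str, List[str]]  # contig name -> clone names (toy model)


class FPCIO:
    """Hypothetical factory/base class: FPCIO.new(fmt, handle) returns a
    driver for that format, loosely like Bioperl's SeqIO/AlignIO."""
    _drivers: Dict[str, Type["FPCIO"]] = {}

    def __init__(self, handle):
        self.handle = handle

    @classmethod
    def register(cls, name: str):
        # Class decorator that records a driver under a format name.
        def deco(driver):
            cls._drivers[name] = driver
            return driver
        return deco

    @classmethod
    def new(cls, fmt: str, handle) -> "FPCIO":
        if fmt not in cls._drivers:
            raise ValueError(f"unknown FPC format: {fmt!r}")
        return cls._drivers[fmt](handle)

    def next_fpcset(self) -> Optional[FPCSetData]:
        raise NotImplementedError

    def write_fpcset(self, fpcset: FPCSetData) -> None:
        raise NotImplementedError


@FPCIO.register("toy")
class ToyFPCIO(FPCIO):
    """Toy driver for an invented format (NOT .fpc): one 'contig clone'
    pair per line, with a line containing only '//' ending each set."""

    def next_fpcset(self) -> Optional[FPCSetData]:
        fpcset: FPCSetData = {}
        saw_line = False
        for raw in self.handle:
            line = raw.strip()
            if line == "//":
                return fpcset
            if line:
                saw_line = True
                contig, clone = line.split()
                fpcset.setdefault(contig, []).append(clone)
        # End of input: return an unterminated trailing set, else None.
        return fpcset if saw_line else None

    def write_fpcset(self, fpcset: FPCSetData) -> None:
        for contig, clones in fpcset.items():
            for clone in clones:
                self.handle.write(f"{contig} {clone}\n")
        self.handle.write("//\n")


# Usage: read every set, then write the first one back out.
reader = FPCIO.new("toy", io.StringIO("ctg1 c1\nctg1 c2\n//\nctg2 c3\n//\n"))
sets = []
while (s := reader.next_fpcset()) is not None:
    sets.append(s)
out = io.StringIO()
FPCIO.new("toy", out).write_fpcset(sets[0])
```

Returning None at end of input lets callers use a plain while loop. This mirrors how next_seq returns undef at the end of a Bio::SeqIO stream.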

> 3) Any ideas on what (if any besides Bio::Root) I should inherit from?
> 4) Should this be an interface and separate implementation or just an
> implementation?
>    (i.e., are there other file formats/programs for physical maps?)

If you think there will be more than one storage system (e.g. files and a
database), it is probably good to split everything into interfaces and
implementations now. I should think you would want to do this, so I would
go for it.
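The interface/implementation split corresponds to Bioperl's pattern of an *I module plus concrete classes. The Python sketch below is for illustration only, and ContigI, MemoryContig, RowsContig and summarize are hypothetical names. The interface declares the methods, and client code written against it works unchanged with a file-derived or a database-style implementation.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class ContigI(ABC):
    """Hypothetical interface in the spirit of Bioperl's *I modules:
    it says what a contig offers, not where the data lives."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def clone_names(self) -> List[str]: ...

    def clone_count(self) -> int:
        # Built only on the abstract methods, so every implementation
        # (file-backed or database-backed) inherits it unchanged.
        return len(self.clone_names())


class MemoryContig(ContigI):
    """Implementation holding data already parsed from a file."""

    def __init__(self, name: str, clones: List[str]):
        self._name = name
        self._clones = list(clones)

    def name(self) -> str:
        return self._name

    def clone_names(self) -> List[str]:
        return list(self._clones)


class RowsContig(ContigI):
    """Stand-in for a database-backed implementation: filters
    (contig, clone) rows the way a SQL query might."""

    def __init__(self, name: str, rows: List[Tuple[str, str]]):
        self._name = name
        self._rows = rows

    def name(self) -> str:
        return self._name

    def clone_names(self) -> List[str]:
        return [clone for contig, clone in self._rows if contig == self._name]


def summarize(contig: ContigI) -> str:
    # Client code depends only on the interface, not the storage.
    return f"{contig.name()}: {contig.clone_count()} clones"


a = MemoryContig("ctg1", ["c1", "c2"])
b = RowsContig("ctg1", [("ctg1", "c1"), ("ctg2", "c3"), ("ctg1", "c2")])
```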

> 5) What Bioperl objects should I use in construction?
>
> These are the ideas I have so far (after all of a day of thinking about
> it, so feel free to laugh/scorn/suggest better implementations)
> (all these classes should be prefixed with Bio::FPC)
>

Aha. I like your names better than mine.


Project, Clone, Contig, Marker. Great stuff.

> 1) ::Project
>   This would be the main class.  It would contain the information parsed
> from the top 8 or so lines of the .fpc file.  It would also contain the
> rest of these objects.
>
> 2) ::Clone
>   Obviously, this is the clone (or more properly - fingerprinted clone)
> from the fpc file.  The attributes would include type (Clone, BAC, PAC)
> name, bands[], sizes[] (if available), a few dates (creation,
> modification), remarks (normal and fpc remarks), contig (and range),
> matching clones (parents and children; approximate, exact, and pseudo),
> markers, etc.  Basically anything you might find as the /^(\w+)/ of the
> line in a .fpc file.
>
>   In typing that out, it seems that maybe the contig and range that a
> clone hits would best be implemented as a type of RangeI class, which is
> more apparent now that I typed that sentence.  Moving on then...
>
> 3) ::Contig
>   Contig number, datetime, status (Ok, NoCB, Avoid, NoAce, Dead), #Q's,
> description.
>
> 4) ::Marker
>   Type (STS, eMRK, whatever), date (create,mod), Global position (if
> anchored to framework)
>
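Item 2 proposes keying each clone attribute by the leading word of its line. That idea can be sketched in Python as follows, for illustration only. The sketch assumes, hypothetically, that clones are separated by blank lines and that every line starts with an attribute keyword. The sample keywords (Clone, Remark, Marker) are placeholders, not verified .fpc keywords. Values are kept as lists because attributes such as remarks and markers can repeat within one clone.

```python
import re
from typing import Dict, List

# A line's attribute is its leading word (cf. the /^(\w+)/ idea above);
# whatever follows the whitespace is kept as the raw value.
KEY_RE = re.compile(r"^(\w+)\s*(.*)$")


def split_blocks(text: str) -> List[List[str]]:
    """Split text into blocks separated by blank lines (an assumed
    layout: one clone per block)."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def parse_clone_block(lines: List[str]) -> Dict[str, List[str]]:
    """Map each leading keyword to the list of its values; lists are
    used because some attributes (e.g. remarks) may repeat."""
    attrs: Dict[str, List[str]] = {}
    for line in lines:
        m = KEY_RE.match(line.strip())
        if m:
            attrs.setdefault(m.group(1), []).append(m.group(2))
    return attrs


# Placeholder keywords, not verified .fpc syntax.
sample = "Clone c1\nRemark first\nRemark second\nMarker m1\n\nClone c2\n"
clones = [parse_clone_block(b) for b in split_blocks(sample)]
```

A generic keyword-to-values map like this could back the Clone object's accessors. Unknown keywords would then survive a round trip instead of being silently dropped.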
> That's basically it for the objects, although the contig range might
> need to be an object inheriting from RangeI.
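Treating a clone's contig placement as a range fits Bio::RangeI, whose interface includes start, end, length, overlaps and contains. Below is a Python sketch of those semantics, for illustration only. ContigRange is a hypothetical name, coordinates are inclusive band positions, and strand is ignored. Two ranges are compared only when they lie on the same contig; that rule is an assumption, since RangeI itself has no notion of contigs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContigRange:
    """Hypothetical RangeI-style placement of a clone: a contig name plus
    inclusive start/end band coordinates."""
    contig: str
    start: int
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")

    def length(self) -> int:
        # Inclusive, as in Bio::RangeI: end - start + 1.
        return self.end - self.start + 1

    def overlaps(self, other: "ContigRange") -> bool:
        # Ranges on different contigs never overlap.
        return (self.contig == other.contig
                and self.start <= other.end
                and other.start <= self.end)

    def contains(self, other: "ContigRange") -> bool:
        # True when other lies entirely within self on the same contig.
        return (self.contig == other.contig
                and self.start <= other.start
                and other.end <= self.end)


a = ContigRange("ctg1", 0, 35)
b = ContigRange("ctg1", 20, 61)
c = ContigRange("ctg2", 0, 10)
```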
>
> So now I need some input, and we'll see if I can't get started coding
> this.
>
> Thanks!
>
> ----------------------------------------------------------------------
> Jamie Hatfield                              Room 541H, Marley Building
> Systems Programmer                          University of Arizona
> Arizona Genomics Computational              Tucson, AZ  85721
>   Laboratory (AGCoL)                        (520) 626-9598
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>