[Bioperl-l] Bio::FPC

Jamie Hatfield jamie@genome.arizona.edu
Mon, 18 Nov 2002 15:10:54 -0700


Thank you all for the replies.  I've been reading through bioperl modules
since Friday, trying to get my bearings and evaluate the possibility of
using the suggested modules.  I need some clarification from interested
parties...  Please excuse my bundling this all into one email.

Martin:
Great suggestion about splitting up the fingerprint and the map.  I will
probably do the map first, then come back and make a fingerprint an
attribute of the clones in the map, and add a way to load fingerprints
directly from a .cor file or the other file formats you mentioned.

Heikki:

You suggested using the Bio::Map modules.  I really like this idea.  It
seems like exactly what I should be looking at, since I am trying to
model a physical map.  The MappableI interface seems like it will fit
nicely with clones and markers that are mapped into a Contig (which isa
MapI).

But then you mentioned Bio::Assembly, and now I am confused.  What
exactly is the difference between the two?  Is Assembly meant more for
sequence based things?  If so, then I don't think I want FPC to go
there.  On the other hand, ScaffoldI seems to be a GREAT interface for
the main FPC class to implement, and contigs are already in the
Bio::Assembly idea.  But, again, these contigs seem to be a bit too tied
to sequence data.  In particular, 
   "A contig is as a set of sequences, locally aligned 
    to each other, so that every sequence has overlapping
    regions with at least one sequence in the contig..."
does not necessarily hold true for FPC contigs.  Also, there is no idea
of sequence data in the FPC physical map (well, not in what I'm looking
at) so it doesn't seem fitting for FPC to be a type of Assembly.  

Unless I'm misunderstanding, I think I will go ahead with the Bio::Map
modules.  Or could I somehow use ScaffoldI *and* the Bio::Map modules?

So if I use the Bio::Map modules, I will name the main FPC class
Bio::Map::Physical and have a Bio::MapIO::fpc that parses a .fpc file
into a Bio::Map::Physical object.  There will also be a Bio::Map::Contig
(inheriting from Bio::Map::MapI) and Bio::Map::Clone (inheriting from
Bio::Map::MappableI).  I expect to be able to use Bio::Map::Marker, but
I'm not sure.
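
As a rough picture of how those classes would hang together, here is a
minimal plain-Perl sketch.  The Sketch:: packages, their methods, and the
names 'ctg1' and 'cloneA' are stand-ins invented for illustration; they do
not implement or reproduce the real MapI/MappableI interfaces.

```perl
use strict;
use warnings;

# Stand-in for a MappableI-style object: something with a position on a map.
package Sketch::Clone;
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name}, start => $args{start}, end => $args{end} };
    return bless $self, $class;
}
sub name  { $_[0]{name} }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Stand-in for a MapI-style object: a contig that holds mapped clones.
package Sketch::Contig;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, clones => [] }, $class;
}
sub name { $_[0]{name} }
sub add_clone {
    my ($self, $clone) = @_;
    push @{ $self->{clones} }, $clone;
    return $self;
}
sub clones { @{ $_[0]{clones} } }

# Stand-in for the top-level physical map: a collection of contigs by name.
package Sketch::Physical;
sub new { my ($class) = @_; return bless { contigs => {} }, $class }
sub add_contig {
    my ($self, $contig) = @_;
    $self->{contigs}{ $contig->name } = $contig;
    return $self;
}
sub contig { $_[0]{contigs}{ $_[1] } }

package main;

# Build a one-contig, one-clone map (hypothetical names and coordinates).
my $map    = Sketch::Physical->new;
my $contig = Sketch::Contig->new(name => 'ctg1');
$contig->add_clone(Sketch::Clone->new(name => 'cloneA', start => 0, end => 7));
$map->add_contig($contig);
```

In the real thing, an I/O class would fill these objects from the .fpc
file instead of the hand-built calls at the bottom.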

Also, you mentioned the Bio::Coordinate modules.  I don't really
understand these enough from the documentation, so I will ask you
directly.  :-)  

I have Contig5.  Contig5 has clones c1, c2, c3, and c4.  It looks like
this:

    Contig5
0123456789012345
================
---c1---  --c4--
  ----c2----
    -c3--
         
Each unit on the map above is a cb (consensus band unit), which is
(roughly) equivalent to a fingerprint band.  So c1 is 8 cb's long, and
starts at 0.  c2 is 10 cb's long and starts at 2.  c3 is 5 cb's long and
starts at 4.  c4 is 6 cb's long and starts at 10.
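
To make the arithmetic explicit, here is a minimal plain-Perl sketch of
the same layout.  It does not use any BioPerl classes; the hash layout and
the helper names clone_end and clones_overlap are assumptions made up for
illustration.

```perl
use strict;
use warnings;

# Clone layout for Contig5, in cb units: start position and length.
my %clones = (
    c1 => { start => 0,  length => 8  },
    c2 => { start => 2,  length => 10 },
    c3 => { start => 4,  length => 5  },
    c4 => { start => 10, length => 6  },
);

# Inclusive end coordinate of a clone on the contig.
sub clone_end {
    my ($clone) = @_;
    return $clone->{start} + $clone->{length} - 1;
}

# Two clones overlap if each one starts at or before the other's end.
sub clones_overlap {
    my ($x, $y) = @_;
    return $x->{start} <= clone_end($y) && $y->{start} <= clone_end($x);
}

# Print each clone's inclusive range on the contig.
for my $name (sort keys %clones) {
    my $c = $clones{$name};
    printf "%s: %d..%d\n", $name, $c->{start}, clone_end($c);
}
```

With these numbers, c1 and c2 overlap, while c1 and c4 do not.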

How would this be represented in the Bio::Coordinate modules?  This is
what I understand so far.
We have a Bio::Coordinate::Collection (BCC).  The BCC is loaded with
Bio::Coordinate::Pair objects (BCP).  So it goes something like this:

use strict;
use Bio::Location::Simple;

# One location per clone, in cb units on Contig5 (0-based, inclusive ends).
my $c1 = Bio::Location::Simple->new(-seq_id => 'c1',
                -start => 0, -end => 7);
my $c2 = Bio::Location::Simple->new(-seq_id => 'c2',
                -start => 2, -end => 11);
my $c3 = Bio::Location::Simple->new(-seq_id => 'c3',
                -start => 4, -end => 8);
my $c4 = Bio::Location::Simple->new(-seq_id => 'c4',
                -start => 10, -end => 15);

That takes care of the clones.  Now, here, I get a little confused.  How
do I map these things into the contig?  Do I need a new
Bio::Location::Simple object for each clone, each time that I try to map
it into the contig?
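
As a point of reference, the mapping in question amounts to an offset
translation between clone-local and contig positions.  A minimal
plain-Perl sketch of that translation follows.  It deliberately does not
use Bio::Coordinate, and the helper names clone_to_contig and
contig_to_clones are made up for illustration.

```perl
use strict;
use warnings;

# Start and length of each clone on Contig5, in cb units.
my %clone_start  = (c1 => 0, c2 => 2,  c3 => 4, c4 => 10);
my %clone_length = (c1 => 8, c2 => 10, c3 => 5, c4 => 6);

# Map a 0-based position within a clone to a contig position.
# Returns undef for an unknown clone or a position outside the clone.
sub clone_to_contig {
    my ($name, $pos) = @_;
    return undef unless exists $clone_start{$name};
    return undef if $pos < 0 || $pos >= $clone_length{$name};
    return $clone_start{$name} + $pos;
}

# Map a contig position back to every clone that covers it.
# Returns a hash reference of clone name => position within that clone.
sub contig_to_clones {
    my ($pos) = @_;
    my %hits;
    for my $name (keys %clone_start) {
        my $local = $pos - $clone_start{$name};
        $hits{$name} = $local
            if $local >= 0 && $local < $clone_length{$name};
    }
    return \%hits;
}
```

For example, position 2 within c3 lands on contig position 6, and contig
position 5 is covered by c1, c2, and c3 but not c4.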

Ewan:

SeqFeatureI makes sense, but I think it would be better to make it a
MappableI, especially in light of the above discussion.  One drawback,
though... Would a MappableI be displayed through the Graphics package as
easily as a SeqFeatureI (in reference to the ongoing discussion)?


Thanks again, everyone.


----------------------------------------------------------------------
Jamie Hatfield                              Room 541H, Marley Building
Systems Programmer                          University of Arizona
Arizona Genomics Computational              Tucson, AZ  85721
  Laboratory (AGCoL)                        (520) 626-9598

> -----Original Message-----
> From: Martin Krzywinski [mailto:martink@bcgsc.ca] 
> Sent: Friday, November 15, 2002 1:33 PM
> To: Jamie Hatfield; bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Bio::FPC
> 
> 
> 
> An FPC parser, especially for .cor files, would be very useful, where the
> actual clone fragment information is stored. Where do I sign up ;)
> 
> We have unpublished code which models intersection through the Sulston
> score (using simple and Needleman-Wunsch alignment), differences, and
> unions of fingerprints and provides graphical output of fingerprints. I
> also have some unit conversion code (standard mobility <-> fragment size)
> and a clone name parser. It would be great to structure things together
> to include this where appropriate.
> 
> The only issue I see with the Bio::FPC namespace is that it refers to a
> specific implementation and data storage (FPC) rather than an abstract
> data model. Perhaps something like Bio::Fingerprint or Bio::FP, which
> would then include ::FPC as an I/O layer. You should be able to create a
> Fingerprint object, for example, from a .cor file, or a fasta file
> (through in-silico), or even through a .sizes/.bands file produced by
> Image. Here we convert all our FPC maps into mysql to use with iCE
> (Internet Contig Explorer).
> 
> > from the fpc file.  The attributes would include type (Clone, BAC, PAC),
> > name, bands[], sizes[] (if available), a few dates (creation,
> > modification), remarks (normal and fpc remarks), contig (and range),
> 
> The clone itself doesn't necessarily need to be associated with any
> restriction fragments. This is why I think it would be good for the FPC
> module set to interact with a middle fingerprint/clone layer.
> 
> Best regards,
> 
> Martin
> 
> 
> Martin Krzywinski
> Genome Sciences Centre
> 600 W 10th Avenue
> Vancouver BC V5Z 4E6
> Canada
> tel 604.877.6086
> fax 604.877.6085
> http://www.bcgsc.ca
> 
> ----- Original Message -----
> From: "Jamie Hatfield" <jamie@genome.arizona.edu>
> To: <bioperl-l@bioperl.org>
> Sent: Friday, November 15, 2002 8:15 AM
> Subject: [Bioperl-l] Bio::FPC
> 
> 
> > Hello all, I need some advice.
> >
> > I work at the Arizona Genomics Institute under Dr. Cari Soderlund (if
> > you don't know her, she used to work at the Sanger Centre, where she
> > developed FPC - FingerPrinted Contigs - probably the most used software
> > for physical map construction.  She's here in Tucson, AZ after a short
> > hiatus in Clemson, SC).  Anyway, I've re-introduced our group to Bioperl
> > and we are starting to take advantage of it wherever possible.  Cari
> > had seen Bioperl before, but that was pre 1.0 days, when things weren't
> > stable enough (in her opinion) for a production environment, after which
> > point, she never got around to looking into it again.
> >
> > I noticed in some document from a presentation given by one of the
> > Bioperl bigwigs (might have been LStein), that an FPC parser was a common
> > request.  If that's true, we know fpc probably as well as anybody else
> > so it would make sense for us to develop/maintain it.
> >
> > So now we would like to make a contribution.  Don't get too excited
> > yet... It's not programmed yet.  But we have found that in many, many
> > different areas we need to read a .fpc file (and corresponding .cor
> > file) and Do Something(c) with it.  At the same time, I want to get more
> > familiar with Bioperl.  I've done fairly simple things, like reading in
> > fasta/genbank/swissprot format files and working with alignments (as you
> > all saw in my EST Alignment questions).
> >
> > The advice I want is as follows:
> > 1) Where are the standards/guidelines for writing Bioperl modules?
> > 2) Any ideas on what features/functionality Bio::FPC should have?
> > 3) Any ideas on what (if any besides Bio::Root) I should inherit from?
> > 4) Should this be an interface and separate implementation or just an
> >    implementation?
> >    (i.e., are there other file formats/programs for physical maps?)
> > 5) What Bioperl objects should I use in construction?
> >
> > These are the ideas I have so far (after all of a day of thinking about
> > it, so feel free to laugh/scorn/suggest better implementations)
> > (all these classes should be prefixed with Bio::FPC)
> >
> > 1) ::Project
> >   This would be the main class.  It would contain the information parsed
> > from the top 8 or so lines of the .fpc file.  It would also contain the
> > rest of these objects.
> >
> > 2) ::Clone
> >   Obviously, this is the clone (or more properly - fingerprinted clone)
> > from the fpc file.  The attributes would include type (Clone, BAC, PAC),
> > name, bands[], sizes[] (if available), a few dates (creation,
> > modification), remarks (normal and fpc remarks), contig (and range),
> > matching clones (parents and children; approximate, exact, and pseudo),
> > markers, etc.  Basically anything you might find as the /^(\w+)/ of the
> > line in a .fpc file.
> >
> >   In typing that out, it seems that maybe the contig and range that a
> > clone hits would best be implemented as a type of RangeI class, which is
> > more apparent now that I typed that sentence.  Moving on then...
> >
> > 3) ::Contig
> >   Contig number, datetime, status (Ok, NoCB, Avoid, NoAce, Dead), #Q's,
> > description.
> >
> > 4) ::Marker
> >   Type (STS, eMRK, whatever), date (create,mod), Global position (if
> > anchored to framework)
> >
> > That's basically it for the objects.  Although the contigrange might
> > need to be an object inherited from RangeI.
> >
> > So now I need some input, and we'll see if I can't get started coding
> > this.
> >
> > Thanks!
> >
> > ----------------------------------------------------------------------
> > Jamie Hatfield                              Room 541H, Marley Building
> > Systems Programmer                          University of Arizona
> > Arizona Genomics Computational              Tucson, AZ  85721
> >   Laboratory (AGCoL)                        (520) 626-9598
> >