[Bioperl-l] Aggressive aggregation?

Lincoln Stein lstein at cshl.edu
Thu Mar 10 16:49:05 EST 2005


The problem is tied up with the need for better handling of GFF3 by 
Bio::DB::GFF.  In GFF3 you can separate the Name of a thing and its 
parentage:

	
	ID=match0001;Target=cdna0123 12 462
	ID=match0001;Target=cdna0123 463 963
	ID=match0001;Target=cdna0123 964 2964
	ID=match0002;Target=cdna0123 1 129
	ID=match0002;Target=cdna0123 463 960

This is what the alignment GFF emitter should produce.  Unfortunately, 
when you load this into Bio::DB::GFF, the distinction between the ID 
and the Target is lost and all the lines get aggregated together 
again on the target name cdna0123.
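
To make the difference concrete, here is a minimal sketch in plain Python. It is not Bio::DB::GFF code and does not reproduce its loader; the helper names parse_attributes and group_by are made up for illustration. It groups the five example attribute strings once by target name and once by ID:

```python
from collections import defaultdict

# Column-9 attribute strings copied from the example above.
ATTRIBUTES = [
    "ID=match0001;Target=cdna0123 12 462",
    "ID=match0001;Target=cdna0123 463 963",
    "ID=match0001;Target=cdna0123 964 2964",
    "ID=match0002;Target=cdna0123 1 129",
    "ID=match0002;Target=cdna0123 463 960",
]


def parse_attributes(text):
    """Split 'key=value;key=value' into a dict (hypothetical helper, no unescaping)."""
    return dict(pair.split("=", 1) for pair in text.split(";") if pair)


def group_by(records, key_func):
    """Collect records into lists keyed by key_func(record) (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        groups[key_func(record)].append(record)
    return dict(groups)


records = [parse_attributes(a) for a in ATTRIBUTES]

# Grouping on the target name alone: every line lands in one group.
by_target = group_by(records, lambda r: r["Target"].split()[0])

# Grouping on the ID: the two alignments stay separate.
by_id = group_by(records, lambda r: r["ID"])

for name, members in by_target.items():
    print("by target:", name, len(members))
for name, members in by_id.items():
    print("by ID:", name, len(members))
```

Keyed on the target name, all five segments collapse into a single cdna0123 group; keyed on the ID, match0001 and match0002 remain distinct.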

I've got lots of notes on a better Bio::DB::GFF, along with a sample 
schema and queries.  If someone wants to work on this, I'll hand them 
over.  Alternatively, perhaps this can be fixed by a much less 
invasive change to the Bio::DB::GFF module.  Perhaps the Target 
should simply be converted into an alias so that it can still be 
identified.
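
Roughly what I mean by the alias idea, again as a plain Python sketch and not the real module: the names groups, aliases and find_by_alias are hypothetical, and how an alias would actually be stored in the Bio::DB::GFF schema is left open. Segments are grouped on the ID, and the target name is kept only as a lookup key pointing at those groups:

```python
from collections import defaultdict

# (group ID, target name, target start, target end), taken from the example lines.
SEGMENTS = [
    ("match0001", "cdna0123", 12, 462),
    ("match0001", "cdna0123", 463, 963),
    ("match0001", "cdna0123", 964, 2964),
    ("match0002", "cdna0123", 1, 129),
    ("match0002", "cdna0123", 463, 960),
]

groups = defaultdict(list)   # ID -> list of (start, end) segments
aliases = defaultdict(set)   # target name -> set of IDs that align to it

for group_id, target, start, end in SEGMENTS:
    groups[group_id].append((start, end))   # aggregate on the ID
    aliases[target].add(group_id)           # remember the target as an alias


def find_by_alias(name):
    """Return every ID-keyed group reachable through the alias (hypothetical)."""
    return {gid: groups[gid] for gid in sorted(aliases.get(name, ()))}


for gid, segments in find_by_alias("cdna0123").items():
    print(gid, segments)
```

Searching for cdna0123 then returns both alignments, each with its own segments, instead of one merged feature.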

Lincoln

On Thursday 10 March 2005 12:21 pm, Chad Matsalla wrote:
> On Wed, 9 Mar 2005, Aaron J. Mackey wrote:
> > > chr1 aafcest     HSP   200   275   .     -     .     Target "Sequence:chad1" 200 275
> > > chr1 aafcest     HSP   300   450   .     -     .     Target "Sequence:chad1" 300 450
> > > chr1 aafcest     match 200   450   .     -     .     Target "Sequence:chad1" 200 450
> >
> > These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2"
> > or some such.  This also means that if you're saving the ESTs in
> > the database (for sequence alignment display), you'll have to
> > save them redundantly under chad1-1, chad1-2, etc.
>
> This is horrible. I want to fix this.
>
> > Now, you could write a custom aggregator that de-aggregated
> > multiple chad1 "match" features, assigning the contained HSPs to
> > each, but there is no such "default" behavior.  Let me know if
> > there's general interest for this ...
>
> I think there is, and I volunteer to write it. I'm new to the
> Bio::DB subsystem but I'm eager to dive in. Can you help me by
> providing a general flowchart on what you'd do to create this? What
> should the Aggregator be called? Hmm.
> Bio::DB::GFF::Aggregator::manymatch ?
>
> Chad Matsalla

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse at cshl.edu> on
all emails regarding scheduling and other time-critical topics.