[Bioperl-l] sequence trace data

Robson Francisco de Souza rfsouza@citri.iq.usp.br
Thu, 19 Dec 2002 21:24:19 -0200 (BRST)


	Hi Chad,

On Thu, 19 Dec 2002, Chad Matsalla wrote:
> On Thu, 19 Dec 2002, Robson Francisco de Souza wrote:
> >       Just a thought: Bio::Seq::SeqWithQuality objects aggregate a
> > Bio::PrimarySeq and a Bio::Seq::PrimaryQual object. The last one holds
> > the quality array and is-a Bio::Seq::QualI. Should there also be a
> > Bio::Seq::TraceI and a Bio::Seq::PrimaryTrace object and should
> > Bio::Seq::Read aggregate a Bio::PrimarySeq and a
> > Bio::Seq::PrimaryTrace?
> 
> And then a SequenceWithTraceData ? Returned from SeqIO::scf.pm?
> package Bio::Seq::SequenceTrace;
> @ISA = qw(Bio::Seq::SeqWithQuality);
> 
> That way it can be returned from SeqIO::scf::read_seq and polymorphism
> will prevent it from breaking stuff that is already out there.
> 
> The new methods in SequenceTrace will deal with things like:
> 
>         # get and set the trace data for this object
> $reference_to_array = $st->trace($base,\@trace_points);  AAAH! Overloading...
>         # get a subtrace
> $reference_to_array = $st->subtrace($base,$start,$end);

Won't there be a method for accessing the trace by trace data point
coordinates? Or should such a method come from Bio::Seq::PrimaryTrace
through aggregation:

my $ref_to_array = $st->primary_trace("a")->subtrace($start,$end);

Note: a trace object for each base type, right?
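
A minimal sketch of the aggregation idea above, assuming one trace object
per base type. My::PrimaryTrace and My::SequenceTrace are hypothetical
stand-ins for the classes under discussion, not existing BioPerl modules,
and the sample values are made up:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed Bio::Seq::PrimaryTrace:
# holds the sample points of one channel (one base type).
package My::PrimaryTrace;

sub new {
    my ($class, %args) = @_;
    my $self = { base => $args{-base}, points => [ @{ $args{-points} } ] };
    return bless $self, $class;
}

# Return a reference to the points from $start to $end
# (1-based, inclusive, in trace point coordinates).
sub subtrace {
    my ($self, $start, $end) = @_;
    my $n = scalar @{ $self->{points} };
    die "bad range $start..$end\n"
        if $start < 1 || $end > $n || $start > $end;
    return [ @{ $self->{points} }[ $start - 1 .. $end - 1 ] ];
}

# Hypothetical stand-in for the sequence-with-trace object:
# aggregates one My::PrimaryTrace per base type.
package My::SequenceTrace;

sub new {
    my ($class, %channels) = @_;
    my %trace;
    for my $base (keys %channels) {
        $trace{ lc $base } = My::PrimaryTrace->new(
            -base   => lc $base,
            -points => $channels{$base},
        );
    }
    return bless { trace => \%trace }, $class;
}

# Return the trace object for one base type ("a", "c", "g" or "t").
sub primary_trace {
    my ($self, $base) = @_;
    my $trace = $self->{trace}{ lc $base }
        or die "no trace for base '$base'\n";
    return $trace;
}

package main;

# Made-up sample values for two channels.
my $st = My::SequenceTrace->new(
    a => [ 0, 5, 40, 90, 35, 4 ],
    c => [ 0, 0, 2, 3, 60, 95 ],
);

my $ref_to_array = $st->primary_trace("a")->subtrace(2, 4);
print "@$ref_to_array\n";    # prints "5 40 90"
```

With this layout the coordinate logic lives in the per-base trace object,
and the aggregating object only dispatches on the base type.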

> 	# synthesize a false trace
> $reference_to_array = $st->false_trace("a",Bio::Seq, $accuracy);

What is the "Bio::Seq" argument for?

> Any objections or comments?

	Just some thoughts below, but maybe they should be discussed
in a separate thread.

> I think that a trace is a special case because it has a lot of data that
> is not necessarily associated with a base or a base-call- it is the
> stuff _in between_.

	I'm still worried about the idea of generalizing an
interface for sequence-associated arrays of data. The main reason for my
concern is the output of the polyphred program: a matrix of data
associated with each single base from a read. I believe that might be
useful some day :).
	Anyway, I agree, trace data is different. It's more general than
a base <-> other_data mapping. A lot of trace data is associated with a
single base and a lot more may not be associated with any base. That
means we need a mapping (one base) <-> (many trace points). Maybe that
could be implemented by aggregating sequence and other feature-like
objects into an object which holds Bio::Coordinate objects and a table
with columns [array1 element, array2 element, start, end].
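
A minimal sketch of such a (one base) <-> (many trace points) table,
assuming plain Perl arrays rather than Bio::Coordinate objects. The row
layout is simplified to [base index, start, end], and all names and values
here are made up for illustration:

```perl
use strict;
use warnings;

# Illustrative data: four base calls and twelve trace points of one channel.
my @bases  = qw(a c g t);
my @points = (1, 8, 30, 9, 2, 1, 7, 25, 6, 1, 12, 40);

# One row per base: [base index, first trace point, last trace point],
# all 0-based and inclusive. Points 4, 5 and 9 belong to no base.
my @base_to_trace = (
    [ 0, 0,  3 ],
    [ 1, 6,  8 ],
    [ 2, 10, 10 ],
    [ 3, 11, 11 ],
);

# Trace points covered by one base, as an array reference.
sub points_for_base {
    my ($base_index) = @_;
    for my $row (@base_to_trace) {
        my ($base, $start, $end) = @$row;
        return [ @points[ $start .. $end ] ] if $base == $base_index;
    }
    return [];
}

# Index of the base owning one trace point, or undef for a point
# that lies between bases.
sub base_for_point {
    my ($point_index) = @_;
    for my $row (@base_to_trace) {
        my ($base, $start, $end) = @$row;
        return $base if $point_index >= $start && $point_index <= $end;
    }
    return;
}

print "base 1 ($bases[1]): @{ points_for_base(1) }\n";
my $owner = base_for_point(4);
print defined $owner ? "point 4 -> base $owner\n" : "point 4 -> no base\n";
```

The linear scans keep the sketch short; for real reads, a sorted table
with a binary search would be the obvious refinement.
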
	What I want is just a way to map two big arrays of data, maybe
of different sizes. This would be useful for representing sequences and
traces, sequences and qualities, sequences and polymorphisms, an ordered
sequence of genes and their GC content, etc. Well, I know this is going
too far, sorry :). Anyway, features keep coming back to my mind as a bad
option, because of their memory usage and the lack of methods to select
subarrays, but I may be wrong... (Bio::SeqFeature::Collection?).
	Ok, I went too far :).
	Cheers,
			Robson

> > > Bio::Seq::SeqWithQualityI should probably inherit from
> > > Bio::PrimarySeqI
> 
> I was not planning to have an interface for this- I view it as an
> implementor of interfaces. Do you think that SequenceWithTrace data
> should inherit from it though? Because really SequenceWithTrace is an
> extension of SequenceWithQuality (in the real world, whatever _that_
> is).
> 
> Chad Matsalla