[Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?

Chris Fields cjfields at uiuc.edu
Tue Aug 22 12:40:53 UTC 2006


Many Bio::DB* modules access the database to get the raw data, which
is then handed to a Bio::*IO stream class in some way (in most
cases).  There are a few that get around this; for instance,
Bio::DB::Taxonomy* uses no specialized SeqIO-like class.

As you mentioned, you could extend Bio::Matrix::PSM::IO::transfac
specifically to encompass the 'instance' sequences (the other PSM::IO
modules wouldn't have the same methods available to them) and return
them as a SimpleAlign or SeqFeature::SimilarityPair (I agree the former
is probably better).  Or have the Bio::DB module set up to grab either
your 'instance' sequences by ID (where you could possibly implement
RandomAccessI) or a Transfac PSM (implementing a new Matrix-based
interface).  TMTOWTDI.

Does the TFBS package have any overlap here?  I haven't used it
(it requires PDL, which is a pain to install on WinXP), but it is
supposed to be fully integrated with Bioperl.

http://forkhead.cgb.ki.se/TFBS/

Chris

On Aug 22, 2006, at 3:23 AM, Sendu Bala wrote:

> I'm looking to extract data from some Transcription Factor Binding Site
> (TFBS) databases. For example, matrix, sequence and known position
> information out of Transfac flatfiles.
>
> Currently there is Bio::Matrix::PSM::IO::transfac, but it only gives you
> the PSM matrices, not the 'instance' sequences. Bio::Matrix::PSM also
> has this to say:
>
>> =head1 DESCRIPTION
>>
>> To handle a combination of site matrices and/or their corresponding
>> sequence matches (instances). This object inherits from
>> Bio::Matrix::PSM::SiteMatrix, so you can use the respective
>> methods. It may hold also an array of Bio::Matrix::PSM::InstanceSite
>> object, but you will have to retrieve these through
>> Bio::Matrix::PSM::Psm-E<gt>instances method (see below). To some extent
>> this is an expanded SiteMatrix object, holding data from analysis that
>> also deal with sequence matches of a particular matrix.
>>
>>
>> =head2 DESIGN ISSUES
>>
>> This does not make too much sense to me. I am mixing PSM with PSM
>> sequence matches. Though they are very closely related, I am not
>> satisfied by the way this is implemented here.  Heikki suggested
>> different objects when one has something like meme. But does this mean
>> we have to write different objects for mast, meme, transfac,
>> theiresias, etc.?  To me the best way is to return a SiteMatrix object +
>> array of InstanceSite objects, and then mast will return undef for
>> SiteMatrix and transfac will return undef for InstanceSite.  Probably I
>> cannot see some other design issues that might arise from such an
>> approach, but it seems more straightforward.  Hilmar does not like
>> this because it is an exception from the general BioPerl rules. Should
>> I leave this as an option?  Also the header rightfully belongs to the
>> driver object, and could be retrieved as hashes.  I do not think it
>> can be done any other way, unless we want to create even one more
>> object with very unclear content.
>
> I actually want to get even more kinds of data out, so rather than
> extend Bio::Matrix::PSM::IO::transfac and related modules in some way,
> would it be more appropriate to have something like
> Bio::DB::TFBS::transfac which had a number of methods that gave specific
> kinds of objects? We could have get_psm() which gives a normal 'pure'
> Bio::Matrix::PSM with no InstanceSite objects, get_aln() which returns a
> Bio::SimpleAlign for the 'instance' sequences that were used to generate
> a given PSM, and get_map() which returns a new special kind of Bio::Map
> with binding site position information.
>
> Another way it makes a little more sense for this to be a 'DB' module
> and not an IO one is that there are multiple huge Transfac data files in
> the database, with related and cross-referenced information. To extract
> the complete information you would want to parse them all and create
> indexes for fast lookups later, not something you really expect of an IO
> module.
>
>
> Thoughts anyone?
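
The "parse once, index, look up later" idea in the quoted message can be sketched as a byte-offset index over a flat file. This is only an illustration in Python, not an existing module: build_offset_index and fetch_record are hypothetical names, the sample records are made up, and the layout it assumes (an "AC" accession line per record, "//" as the record terminator) is a simplification of the Transfac flatfile format.

```python
import io


def build_offset_index(handle):
    """Scan the file once; map accession -> byte offset of its record."""
    index = {}
    record_start = handle.tell()
    while True:
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"AC  "):
            # Assumed layout: two-letter line code, two spaces, then the value.
            index[line[4:].strip().decode("ascii")] = record_start
        elif line.startswith(b"//"):
            # The next record begins right after the terminator line.
            record_start = handle.tell()
    return index


def fetch_record(handle, index, accession):
    """Seek straight to one record and return its lines, without a rescan."""
    handle.seek(index[accession])
    lines = []
    for line in iter(handle.readline, b""):
        if line.startswith(b"//"):
            break
        lines.append(line.decode("ascii").rstrip("\n"))
    return lines


# Made-up two-record file held in memory; a real file would be opened "rb".
SAMPLE = (
    b"AC  M_HYPO_1\nID  example_one\n//\n"
    b"AC  M_HYPO_2\nID  example_two\n//\n"
)
handle = io.BytesIO(SAMPLE)
index = build_offset_index(handle)
record = fetch_record(handle, index, "M_HYPO_2")
```

One such index per data file, keyed by accession, would also give a place to resolve the cross-references between files without holding every record in memory.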

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





