[Bioperl-l] Question on SeqFeature_RelationShip

lajus florian.lajus at inria.fr
Thu Jan 26 12:47:07 UTC 2012


Hi all,

I have a question about how a seqfeature is added to an existing seq.
To do this, you first have to store the immediate parent of the
seqfeature, which in your current version is the seq. Then you store
the new seqfeature and, if a rank is set for this feature, you set the
proper rank for every other feature of this seq.
In my version, I want to handle seqfeature relationships, so I have a
ranking problem:

Two solutions for ranking:
1     pseudogene
2     mRNA 'A'
3         exon 'a'
4         exon 'b'
5     mRNA 'B'
6         exon 'e'
7         exon 'f'
8         exon 'g'
...
or

1     pseudogene
2     mRNA 'A'
      1         exon 'a'
      2         exon 'b'
3     mRNA 'B'
      1         exon 'e'
      2         exon 'f'
      3         exon 'g'
...
The problem with the first solution is that if you add an exon to the
first mRNA, you have to renumber all the features that follow it.
Another problem is that if you add an exon to a retrieved seqfeature
(an mRNA here), you have to call the store method not on the immediate
parent (the mRNA) but on the seq itself, or the ranking will be wrong.
For example, if you add 2 exons to the first mRNA and call the store
method on the mRNA:
1 Seq 'CR0001'
2     mRNA 'A'
3         exon 'a'
4         exon 'b'
5         exon 'c'
6         exon 'd'
5     mRNA 'B'
6         exon 'e'
7         exon 'f'
8         exon 'g'
And the exon 'd' won't be added because of the BioSQL schema rules (see
the problem with the second solution).
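
To make this concrete, here is a minimal sketch of that collision. It is
written in Python rather than Perl purely as an illustration, and it
assumes that a store on the mRNA renumbers only that mRNA's subtree.
rank_subtree and collisions are hypothetical helper names, not bioperl-db
functions:

```python
from collections import Counter


def rank_subtree(parent, children, start):
    """Rank a parent and its children consecutively, starting at `start`."""
    rows = [(start, parent[0], parent[1])]
    rows += [(start + i, ftype, name) for i, (ftype, name) in enumerate(children, 1)]
    return rows


def collisions(rows):
    """Return the (type, rank) pairs that occur more than once."""
    counts = Counter((ftype, rank) for rank, ftype, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# mRNA 'A' is re-stored with two new exons ('c' and 'd'), starting at its old rank 2.
mrna_a = rank_subtree(("mRNA", "A"), [("exon", n) for n in "abcd"], start=2)
# mRNA 'B' is untouched and keeps its old ranks 5..8.
mrna_b = rank_subtree(("mRNA", "B"), [("exon", n) for n in "efg"], start=5)

print(collisions(mrna_a + mrna_b))  # [('exon', 6)]: exon 'd' clashes with exon 'e'
```

Rank 5 is shared by exon 'c' and mRNA 'B', but they differ in type, so
only the two exons at rank 6 clash.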

The problem with the second solution is that seqfeatures (exons, for
example) have the same rank, the same source and the same type from the
seqfeature table's point of view. So you can't insert them, because the
BioSQL schema rules don't take the name attribute into account.
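
A minimal sketch of this rule, using Python's sqlite3 module and a
simplified seqfeature table. It assumes the BioSQL unique key on
(bioentry_id, type_term_id, source_term_id, rank); here the term ids are
replaced by plain text columns, the 'EMBL' source value is made up, and
insert_feature is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the BioSQL seqfeature table (assumption: text
# columns instead of foreign keys to the term table).
conn.execute("""
    CREATE TABLE seqfeature (
        seqfeature_id INTEGER PRIMARY KEY,
        bioentry_id   INTEGER NOT NULL,
        type_term     TEXT NOT NULL,
        source_term   TEXT NOT NULL,
        display_name  TEXT,
        rank          INTEGER NOT NULL,
        UNIQUE (bioentry_id, type_term, source_term, rank)
    )
""")


def insert_feature(name, rank):
    """Try to insert an exon; return True on success, False on a key collision."""
    try:
        conn.execute(
            "INSERT INTO seqfeature"
            " (bioentry_id, type_term, source_term, display_name, rank)"
            " VALUES (1, 'exon', 'EMBL', ?, ?)",
            (name, rank),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = insert_feature("a", 1)   # exon 'a', rank 1 under mRNA 'A'
second = insert_feature("e", 1)  # exon 'e', rank 1 under mRNA 'B'
print(first, second)
```

The second insert is rejected even though the display name differs,
because the name is not part of the unique key.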

My question is: why don't we take it into account?

I hope I am clear...

Florian
On 20/01/2012 19:06, Hilmar Lapp wrote:
> Florian -
>
> I'll add that aside from being better aligned with our procedures for integrating code contributions, this would also make it easier for us and more recognizable for everyone to attribute these changes to you, because the git commit logs will do this rather than it being buried in a commit message string. So there's an actual benefit for you as the contributor.
>
> 	-hilmar
>
> On Jan 20, 2012, at 11:50 AM, Fields, Christopher J wrote:
>
>> Florian,
>>
>> Re: patches, we can accept these, but I would like to point out the code is publicly available on github:
>>
>>     https://github.com/bioperl/bioperl-db
>>
>> The fastest way to contribute is to create a github account, fork the code from that repository, check out a local copy using git, then push the changes back to your fork so they are not lost.  You can then submit a pull request that should appear on the bioperl developers mailing list, where one of us can simply (via the github interface) merge your changes in.
>>
>> chris
>>
>> On Jan 20, 2012, at 10:45 AM, lajus wrote:
>>
>>> Hi all,
>>>
>>> As I said, I've worked on SeqFeatureAdaptor so that it also persists and retrieves sub-features.
>>> My code is attached:
>>>
>>>    I have modified the last stable version of:
>>>            - Bio/DB/BioSQL/SeqFeatureAdaptor.pm
>>>            - Bio/DB/BioSQL/SeqAdaptor.pm
>>>    I have created:
>>>            - Bio/SeqFeature/SeqFeatureRelationship.pm
>>>            - Bio/DB/BioSQL/SeqFeatureRelationshipAdaptor.pm
>>>            some tests:
>>>            - SeqFeatureRelationship.t (a very simple test for the equally simple SeqFeatureRelationship class)
>>>            - SeqFeatureRelationshipAdaptor.t (tests persistence of sub-feature objects in the database; it needs a BioSQL database to connect to, even though nothing is committed) =>  I have only tested with a PostgreSQL database ...
>>>
>>> If you have advice or questions about my implementation or my tests, don't hesitate to tell me.
>>>
>>> Is there a way to include my modifications in a future release of BioPerl?
>>>
>>> Florian
>>>
>>> On 14/01/2012 17:12, Hilmar Lapp wrote:
>>>> Hi Florian,
>>>>
>>>> You could do that (and it might have advantages in terms of code separation), but you don't have to. In general, adaptor classes get instantiated by the Bioperl-DB framework when a Bioperl class that is mapped to it needs to get serialized or populated. Since there is no class in Bioperl that would correspond to a seqfeature relationship, those situations won't occur.
>>>>
>>>> So you could just keep it simple and expand store_children() and correspondingly their retrieval in the adaptor class for seqfeatures. But as hinted above, you may still prefer a separate adaptor class just to keep the nitty-gritty of storing/loading the relationships out of the main adaptor class. Really up to you how you feel more comfortable.
>>>>
>>>> -hilmar
>>>>
>>>> Sent with a tap.
>>>>
>>>> On Jan 13, 2012, at 4:27 AM, lajus <florian.lajus at inria.fr> wrote:
>>>>
>>>>> I should have written
>>>>> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB/BioSQL
>>>>> of course
>>>>>
>>>>> On 13/01/2012 10:25, lajus wrote:
>>>>>> Hi Hilmar,
>>>>>>
>>>>>> Thanks for your hint, but I'm quite lost in the BioPerl architecture (and quite new to Perl programming). I'd like to use the handling of term-to-term relationships as a template, but I can't find which files are related to this.
>>>>>>
>>>>>> As far as I understand, I should:
>>>>>>    - create a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB
>>>>>>    - create a new object SeqFeatureRelationship (and its interface) in Bio/SeqFeature
>>>>>>    - modify SeqFeatureAdaptor to store children (with a call to subSeqFeature in the store_children sub, creating new relationships through my SeqFeatureRelationshipAdaptor)
>>>>>>    - modify SeqFeatureAdaptor to retrieve children (again creating the relationships through my SeqFeatureRelationshipAdaptor)
>>>>>>
>>>>>> Is this the right way?
>>>>>>
>>>>>> Florian
>>>>>>
>>>>>> On 12/01/2012 18:49, Hilmar Lapp wrote:
>>>>>>> Hi Florian,
>>>>>>>
>>>>>>> Thanks for digging this up - this is what I had in memory, but I ran out of time last night in ascertaining that it is indeed still true.
>>>>>>>
>>>>>>> It'd be awesome if you can add the code to SeqFeatureAdaptor to also persist and retrieve sub-features. I think the object-relational mappings are all there already (in BaseDriver.pm). You could use the handling of bioentry-to-bioentry relationships (or term-to-term relationships) as a template for how to implement this.
>>>>>>>
>>>>>>>    -hilmar
>>>>>>>
>>>>>>> On Jan 12, 2012, at 4:23 AM, lajus wrote:
>>>>>>>
>>>>>>>> OK, I have looked at the BioPerl code and it appears that subSeqFeatures are not handled yet.
>>>>>>>> The comment in SeqFeatureAdaptor.pm for the store children function (and attach children too) says:
>>>>>>>> "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet."
>>>>>>>>
>>>>>>>> So it's completely normal that it doesn't work.
>>>>>>>> Have you started to implement this, or should I write another SeqFeatureAdaptor that handles it?
>>>>>>>>
>>>>>>>> Florian
>>>>>>>>
>>>>>>>> On 11/01/2012 16:44, Fields, Christopher J wrote:
>>>>>>>>> Seems like a possible bug with bioperl-db, I believe hierarchical seqfeatures are stored, but it's worth looking into.  Do you have some example data (genbank file you are using, for instance)?
>>>>>>>>>
>>>>>>>>> chris
>>>>>>>>>
>>>>>>>>> On Jan 11, 2012, at 7:09 AM, lajus wrote:
>>>>>>>>>
>>>>>>>>>> Also, if I look in verbose mode, I can see many of these in the stack:
>>>>>>>>>>
>>>>>>>>>> no adaptor found for class Bio::Annotation::TypeManager
>>>>>>>>>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory
>>>>>>>>>>
>>>>>>>>>> Just warnings, no errors, but...
>>>>>>>>>> Any clues?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> Florian
>>>>>>>>>>
>>>>>>>>>> On 11/01/2012 13:43, lajus wrote:
>>>>>>>>>>> I have looked at the Unflattener and the magic works quite well.
>>>>>>>>>>> The $seq that is modified (by side effect) by
>>>>>>>>>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1);
>>>>>>>>>>> then has a good hierarchy for us.
>>>>>>>>>>> So my question is: why can't I store this Bio::Seq in my database? Now there are explicit parent/child links between the gene and the CDS.
>>>>>>>>>>> But when I create a persistent object for $seq and store it:
>>>>>>>>>>>    my $pseq = $adaptor->create_persistent($seq);
>>>>>>>>>>>    $pseq->create();
>>>>>>>>>>> the bioentry and sub-seqfeatures are written to my database, but there is still no relation in the seqfeature_relationship table.
>>>>>>>>>>>
>>>>>>>>>>> Do you have an explanation?
>>>>>>>>>>>
>>>>>>>>>>> Florian
>>>>>>>>>>>
>>>>>>>>>>> On 10/01/2012 19:45, Fields, Christopher J wrote:
>>>>>>>>>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus <florian.lajus at inria.fr> wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> I am currently working on a refactoring of the Genolevures project
>>>>>>>>>>>>>> (http://www.genolevures.org/)
>>>>>>>>>>>>>> We are trying to make better use of BioPerl and the BioSQL schema on a PostgreSQL
>>>>>>>>>>>>>> database.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in
>>>>>>>>>>>>>> my database, my bioentry has been added, along with its associated seqfeatures.
>>>>>>>>>>>>>> But it seems that my seqfeature_relationship table is empty.
>>>>>>>>>>>>>> I find this strange, since there is a relationship between a gene and its
>>>>>>>>>>>>>> CDS, right?
>>>>>>>>>>>>> No, not explicitly. Unlike GFF3 where there can be (and should be)
>>>>>>>>>>>>> explicit parent/child links between the gene and CDS, in GenBank
>>>>>>>>>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl
>>>>>>>>>>>>> attempts to infer this kind of relationship, and if it did, if that would
>>>>>>>>>>>>> get recorded in the BioSQL tables.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter
>>>>>>>>>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in:
>>>>>>>>>>>>
>>>>>>>>>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener
>>>>>>>>>>>>
>>>>>>>>>>>> chris
>>>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> <SeqFeatureRelationship.pm><SeqFeatureRelationshipAdaptor.pm><SeqFeatureAdaptor.pm><SeqAdaptor.pm><seqFeatureRelationshipAdaptor.t><SeqFeatureRelationShip.t.t>