[Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning

Roy Chaudhuri roy.chaudhuri at gmail.com
Wed Jan 11 18:38:34 UTC 2012


Hi Frank,

Looks great, I like the use of between locations, didn't think of that.

It was suggested that I avoid using Clone for cat, trunc_with_features 
etc. to avoid adding a dependency (which may no longer be an issue) and 
because it would cause problems for Bio::Seq implementations that use a 
database as the back-end. Maybe you could add it as an option, but keep 
the default as is?
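Below is a minimal sketch of the option Roy suggests. Copying defaults to calling new(), as now, and deep cloning happens only when the caller asks for it. It is plain Perl so it runs without BioPerl. My::Seq, My::Seq::Foo and copy_seq are hypothetical stand-ins, not BioPerl classes. The sketch also uses Storable's dclone, a core module and so no new dependency, in place of the Clone::Fast that Frank mentions.

It does not solve Roy's database concern. Deep-copying an object that holds a live handle is exactly where this can fail; for instance, Storable refuses to serialise CODE references by default.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical stand-ins for Bio::Seq and a custom subclass (like Frank's
# Bio::Seq::Foo) that carries an extra attribute new() knows nothing about.
package My::Seq;
sub new { my ($class, %args) = @_; return bless { seq => $args{-seq} }, $class }
sub seq { return $_[0]{seq} }

package My::Seq::Foo;
our @ISA = ('My::Seq');
sub vector_name { return $_[0]{vector_name} }

package main;

# Hypothetical helper: by default rebuild the copy through new(), as the
# current code does; deep-clone only when explicitly requested.
sub copy_seq {
    my ($seq, %opt) = @_;
    return dclone($seq) if $opt{clone};
    return ref($seq)->new(-seq => $seq->seq);
}

my $foo   = bless { seq => 'ACGT', vector_name => 'pVec1' }, 'My::Seq::Foo';
my $fresh = copy_seq($foo);
my $clone = copy_seq($foo, clone => 1);
print ref($fresh), ': ', $fresh->vector_name() // 'attribute lost', "\n";
print ref($clone), ': ', $clone->vector_name() // 'attribute lost', "\n";
```

Both copies are blessed into My::Seq::Foo. Only the clone keeps vector_name, because new() never sees the extra attribute.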

Cheers,
Roy.

On 11/01/2012 18:16, Frank Schwach wrote:
> Hi Roy and Chris,
>
> I have made the changes to the code now. As you suggested, feature ends
> no longer change type and I insert a note instead to inform about the
> deletion (or insertion), showing the length and position.
> I have also added a feature to annotate deletion sites themselves (with
> IN-BETWEEN locations).
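The IN-BETWEEN site for a deletion can be derived directly from the deleted range. The sketch below assumes an internal deletion [start, end] in 1-based, inclusive coordinates. deletion_site is a hypothetical helper, not the method in Frank's branch. For Roy's test case it reproduces the 3^4 site and the note shown below.

```perl
use strict;
use warnings;

# Hypothetical helper: the IN-BETWEEN location marking where the segment
# [$del_start, $del_end] used to be, in the coordinates of the sequence
# after the deletion, plus a note giving the deleted length.
sub deletion_site {
    my ($del_start, $del_end) = @_;
    die "expects an internal deletion (start > 1)\n" if $del_start < 2;
    my $len = $del_end - $del_start + 1;
    return sprintf('%d^%d', $del_start - 1, $del_start), "deletion of ${len}bp";
}

my ($loc, $note) = deletion_site(4, 6);
print "misc_feature    $loc\n                /note=\"$note\"\n";
```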
>
> Roy's test script now prints:
>
> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> ACCESSION   unknown
> FEATURES             Location/Qualifiers
>        CDS             join(2..3,4..6)
>                        /note="3bp internal deletion between pos 3 and 4"
>        CDS             2..3
>                        /note="2bp deleted from feature end"
>        misc_feature    3^4
>                        /note="deletion of 3bp"
> ORIGIN
>           1 aaaaaaa
> //
>
>
> or, if you add strand information (-1 in this case) to the second feature:
>
> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> ACCESSION   unknown
> FEATURES             Location/Qualifiers
>        CDS             join(2..3,4..6)
>                        /note="3bp internal deletion between pos 3 and 4"
>        CDS             complement(2..3)
>                        /note="2bp deleted from feature 5' end"
>        misc_feature    3^4
>                        /note="deletion of 3bp"
> ORIGIN
>           1 aaaaaaa
> //
>
> I have committed this along with some bugfixes to my master branch on GitHub
> https://github.com/fschwach/bioperl-live
> so it's now also in my existing pull request.
>
> I'm still wondering if cloning the sequence objects rather than calling
> 'new' on their respective classes would be an option inside 'delete' and
> 'insert'?
> I'm experimenting with this for my own purposes because I have to work
> with custom sub-classes of Bio::Seq which have additional attributes and
> therefore set 'can_call_new' to false.
> Without cloning the objects, I first have to convert the custom
> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid.
> Is there any reason why something like Clone::Fast should not be used in
> this case? It seems to work for me but there may be situations where
> this is going to blow up which I am not aware of.
> Cloning rather than calling new could be made an option in
> Bio::SeqUtils. I have most of the code for that already.
>
> Frank
>
> On 10/01/12 17:31, Roy Chaudhuri wrote:
>> Or without the typo:
>>
>> CDS             join(2..3,4..6)
>>                  /note="3 bp internal deletion"
>> CDS             2..3
>>                  /note="2 bp deleted from 3' end"
>>
>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>>> I think it's me that didn't explain very well - I was talking about
>>> overlapping (rather than spanning) a deletion, although I think the same
>>> principle applies to the spanning example you gave. Here's some test
>>> code:
>>>
>>> #!/usr/bin/perl
>>> use warnings FATAL=>qw(all);
>>> use strict;
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>> use Bio::SeqUtils;
>>> use Bio::SeqFeature::Generic;
>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>                                                       -start=>2,
>>>                                                       -end=>9));
>>>
>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>                                                       -start=>2,
>>>                                                       -end=>5));
>>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>>> print $out $trunc;
>>>
>>>
>>> This currently outputs:
>>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>>> ACCESSION   unknown
>>> FEATURES             Location/Qualifiers
>>>         CDS             join(2..>3,<4..6)
>>>         CDS             2..>3
>>> ORIGIN
>>>            1 aaaaaaa
>>> //
>>>
>>> However, I was suggesting that the feature table should be something
>>> like:
>>> CDS             join(2..3,4..6)
>>>                    /note="3 bp internal deletion"
>>> CDS             join(2..3)
>>>                    /note="2 bp deleted from 3' end"
>>>
>>> Fuzzy locations are intended to represent features which have boundaries
>>> spanning outside of the sequence. For a defined deletion that's not the
>>> case, the boundaries of the feature aren't unknown, they have been
>>> specifically altered.
>>>
>>> Hope this is clearer.
>>> Cheers,
>>> Roy.
>>>
>>> On 10/01/2012 16:47, Frank Schwach wrote:
>>>> Hi Roy,
>>>>
>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>>>> of the feature that become fuzzy but the "inner" ones of the split
>>>> locations:
>>>>
>>>>     --------------------           a feature's location
>>>> ==========xxxx================= sequence
>>>>
>>>>
>>>>     ---------                     sublocation 1
>>>>              --------             sublocation 2
>>>> ===============================
>>>>
>>>> x= sequence to delete
>>>> The feature's location has changed from Simple to Split.
>>>>
>>>> Sublocation 1:
>>>> start is still EXACT and has not changed
>>>> end is now AFTER because this is not a true end of the feature
>>>>
>>>> Sublocation 2:
>>>> start is BEFORE
>>>> end is EXACT (but shifted)
>>>>
>>>> I hope this makes more sense(?)
>>>>
>>>> Cheers,
>>>>
>>>> Frank
>>>>
>>>>
>>>>
>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>>> Hi Frank,
>>>>>
>>>>> Looks good to me. One thing I'm not sure about - why do features
>>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>>> trunc_with_features because it's intended to represent taking a
>>>>> subregion of a larger sequence, but if you're representing an internal
>>>>> deletion then the boundaries of the overlapping feature aren't
>>>>> unknown,
>>>>> they have been specifically altered. Maybe you could give absolute
>>>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>>>> truncated by however many bases.
>>>>>
>>>>> Cheers,
>>>>> Roy.
>>>>>
>>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> I have made the changes in a Git fork and made the pull request now.
>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>>>> HOWTO for the BioPerl wiki.
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>
>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>>> Sounds very promising!  The easiest way to contribute is via a
>>>>>>> fork of the code on Github with a pull request (as you already
>>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>>
>>>>>>> chris
>>>>>>>
>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and sequence
>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a vector
>>>>>>>> and insert a fragment into it while preserving all the annotations and
>>>>>>>> moving the features accordingly.
>>>>>>>> My main aim was to split features that span deletion/insertion sites in
>>>>>>>> a meaningful way, which cannot be done with the currently available
>>>>>>>> methods.
>>>>>>>> I have modified Bio::SeqUtils to add the following new methods:
>>>>>>>>
>>>>>>>> delete
>>>>>>>> ======
>>>>>>>> removes a segment from a sequence object and adjusts positions and types
>>>>>>>> of locations of sequence features:
>>>>>>>> - locations of features that span the deletion sites are turned into
>>>>>>>>   Splits.
>>>>>>>> - locations that extend into the deleted region are turned to Fuzzy to
>>>>>>>>   indicate that their true start/end was lost.
>>>>>>>> - locations contained inside the deleted region are lost.
>>>>>>>> - other features are shifted according to the length of the deletion.
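Here is a sketch of the coordinate arithmetic behind the four cases above, in plain Perl. It follows the revised behaviour from Frank's 11 January message, which gives exact coordinates plus a note rather than Fuzzy ends. It handles only simple, unstranded locations. adjust_for_deletion is a hypothetical helper rather than the code in the pull request. The note strings imitate the output shown at the top of the thread; the "feature start" wording for the downstream case is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not the code in the pull request: adjust one simple
# feature location [$start, $end] (1-based, inclusive) for the deletion of
# [$ds, $de]. Returns nothing if the feature is lost, otherwise a hashref
# holding the new sub-locations and, where relevant, a note.
sub adjust_for_deletion {
    my ($start, $end, $ds, $de) = @_;
    my $len = $de - $ds + 1;

    # Entirely upstream of the deletion: unchanged.
    return +{ locs => [[$start, $end]] } if $end < $ds;

    # Entirely downstream: shifted left by the deletion length.
    return +{ locs => [[$start - $len, $end - $len]] } if $start > $de;

    # Contained inside the deleted region: lost.
    return if $start >= $ds && $end <= $de;

    # Spans the deletion: split into two exact sub-locations.
    if ($start < $ds && $end > $de) {
        return +{
            locs => [[$start, $ds - 1], [$ds, $end - $len]],
            note => "${len}bp internal deletion between pos "
                  . ($ds - 1) . " and $ds",
        };
    }

    # Extends into the deletion from upstream: its end is trimmed.
    if ($start < $ds) {
        my $lost = $end - $ds + 1;
        return +{ locs => [[$start, $ds - 1]],
                  note => "${lost}bp deleted from feature end" };
    }

    # Extends into the deletion from downstream: its start is trimmed.
    my $lost = $de - $start + 1;
    return +{ locs => [[$ds, $end - $len]],
              note => "${lost}bp deleted from feature start" };
}

# Roy's test case: a 10 bp sequence with features 2..9 and 2..5; delete 4..6.
for my $f ([2, 9], [2, 5]) {
    my $r    = adjust_for_deletion(@$f, 4, 6);
    my @locs = map { sprintf('%d..%d', @$_) } @{ $r->{locs} };
    my $loc  = @locs > 1 ? 'join(' . join(',', @locs) . ')' : $locs[0];
    print "CDS  $loc  /note=\"$r->{note}\"\n";
}
```

For Roy's test case this prints two lines. The first is join(2..3,4..6) with the internal-deletion note, and the second is 2..3 with "2bp deleted from feature end". Both match the feature table Frank posted on 11 January.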
>>>>>>>>
>>>>>>>> insert
>>>>>>>> ======
>>>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>>>> - locations of features that span the insertion site are split, but
>>>>>>>>   position types are not turned to Fuzzy because no part of the
>>>>>>>>   original feature is lost.
>>>>>>>> - other features are shifted according to the length of the insertion.
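The insertion case needs only a shift and a split, since no bases are lost. The sketch below assumes the fragment is inserted after recipient position $after. adjust_for_insertion is hypothetical and ignores strand and already-split locations.

```perl
use strict;
use warnings;

# Hypothetical helper: adjust a simple feature [$start, $end] (1-based,
# inclusive) on the recipient when $ins_len bases are inserted after
# position $after. No bases are lost, so all ends stay exact.
sub adjust_for_insertion {
    my ($start, $end, $after, $ins_len) = @_;

    # Ends at or before the insertion site: unchanged.
    return [[$start, $end]] if $end <= $after;

    # Starts after the insertion site: shifted right by the insert length.
    return [[$start + $ins_len, $end + $ins_len]] if $start > $after;

    # Spans the insertion site: split around the inserted fragment.
    return [[$start, $after], [$after + $ins_len + 1, $end + $ins_len]];
}

# A feature at 2..9; a 5 bp fragment inserted after position 3.
my $locs = adjust_for_insertion(2, 9, 3, 5);
print join(',', map { sprintf('%d..%d', @$_) } @$locs), "\n";   # 2..3,9..14
```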
>>>>>>>>
>>>>>>>> ligate
>>>>>>>> ======
>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>>>> Simply calls delete [, reverse_complement_with_features] and insert in
>>>>>>>> turn.
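This sequence-only sketch shows the order of operations ligate describes: delete between the cut sites, optionally reverse-complement the fragment, then insert. ligate_strings and revcomp are hypothetical plain-string helpers. The real method works on Bio::Seq objects and moves their features, which this ignores. The cut-site convention is an assumption: cut after position $left and, if given, after $right.

```perl
use strict;
use warnings;

# Reverse complement of a plain DNA string.
sub revcomp {
    my $s = reverse shift;
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Hypothetical sequence-only analogue of ligate: cut the recipient after
# $left and, if a second site is given, after $right, discarding the bases
# in between (the "delete" step); optionally flip the fragment; then splice
# it in (the "insert" step). Features are not handled here.
sub ligate_strings {
    my ($recipient, $fragment, $left, $right, $flip) = @_;
    $right //= $left;    # one site: nothing is removed
    die "second site precedes first site\n" if $right < $left;

    my $upstream   = substr($recipient, 0, $left);   # positions 1..$left
    my $downstream = substr($recipient, $right);     # positions $right+1..end
    $fragment = revcomp($fragment) if $flip;
    return $upstream . $fragment . $downstream;
}

print ligate_strings('AAAACCCCGGGG', 'TTG', 4, 8, 1), "\n";   # AAAACAAGGGG
print ligate_strings('AAAAGGGG', 'TT', 4), "\n";              # AAAATTGGGG
```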
>>>>>>>>
>>>>>>>>
>>>>>>>> One situation I haven't handled yet is a deletion that spans the origin
>>>>>>>> of a circular molecule, but that should be a rare thing to do anyway.
>>>>>>>> The code currently throws an error if this is attempted.
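Here is one possible way to handle this unsupported case, on a plain string. A deletion that wraps through the origin removes both the tail and the head of the sequence. What survives becomes the new linear sequence, with its origin reset. This is an assumption about desirable behaviour, not what the posted code does (it throws an error). delete_across_origin is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: delete [$start, $end] (1-based, inclusive) from a
# circular sequence where $start > $end, i.e. the deletion runs through the
# origin. The surviving arc $end+1 .. $start-1 becomes the new sequence, so
# a surviving position p would map to p - $end in the result.
sub delete_across_origin {
    my ($seq, $start, $end) = @_;
    die "deletion does not span the origin\n" unless $start > $end;
    die "coordinates out of range\n" if $start > length($seq) || $end < 1;
    return substr($seq, $end, $start - $end - 1);
}

print delete_across_origin('ABCDEFGHIJ', 9, 2), "\n";   # CDEFGH (9, 10, 1, 2 removed)
```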
>>>>>>>>
>>>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>>
>>>>>>>> Frank