[Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning

Roy Chaudhuri roy.chaudhuri at gmail.com
Tue Jan 10 17:27:05 UTC 2012


I think it's me that didn't explain very well - I was talking about 
overlapping (rather than spanning) a deletion, although I think the same 
principle applies to the spanning example you gave. Here's some test code:

#!/usr/bin/perl
use warnings FATAL=>qw(all);
use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqUtils;
use Bio::SeqFeature::Generic;
# 10 bp test sequence carrying two CDS features
my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
# CDS 2..9 spans the deleted region (4..6)
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
                                                    -start=>2,
                                                    -end=>9));

# CDS 2..5 overlaps the start of the deleted region
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
                                                    -start=>2,
                                                    -end=>5));
# Delete positions 4..6 and print the result in GenBank format
my $out=Bio::SeqIO->newFh(-format=>'genbank');
my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
print $out $trunc;


This currently outputs:
LOCUS       seq-accession_number            7 bp    dna     linear   UNK
ACCESSION   unknown
FEATURES             Location/Qualifiers
      CDS             join(2..>3,<4..6)
      CDS             2..>3
ORIGIN
         1 aaaaaaa
//

However, I was suggesting that the feature table should be something like:
CDS             join(2..3,4..6)
                 /note="3 bp internal deletion"
CDS             2..3
                 /note="2 bp deleted from 3' end"

Fuzzy locations are intended to represent features whose boundaries 
extend beyond the sequence shown. For a defined deletion that's not the 
case: the boundaries of the feature aren't unknown; they have been 
specifically altered.
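
As a rough sketch of the arithmetic behind that suggestion, the exact
coordinates and the note text could be derived as below. This assumes
forward-strand features with simple start/end locations, and
adjust_for_deletion is a hypothetical helper written for illustration only;
it is not part of Bio::SeqUtils and does not touch Bio::Location objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqUtils). Takes a feature's
# start/end and the deleted range, all 1-based and inclusive, and assumes
# the feature is on the forward strand. Returns an exact location string
# plus a note describing what was lost (undef where nothing applies).
sub adjust_for_deletion {
    my ($fstart, $fend, $dstart, $dend) = @_;
    my $dlen = $dend - $dstart + 1;

    # Feature entirely before the deletion: coordinates unchanged.
    return ("$fstart..$fend", undef) if $fend < $dstart;

    # Feature entirely after the deletion: shift left by the deleted length.
    return (($fstart - $dlen) . '..' . ($fend - $dlen), undef)
        if $fstart > $dend;

    # Feature contained inside the deletion: it is lost.
    return (undef, undef) if $fstart >= $dstart && $fend <= $dend;

    # Feature spans the deletion: split into two exact sublocations.
    if ($fstart < $dstart && $fend > $dend) {
        my $loc = sprintf 'join(%d..%d,%d..%d)',
            $fstart, $dstart - 1, $dstart, $fend - $dlen;
        return ($loc, "$dlen bp internal deletion");
    }

    # Feature's 3' end falls inside the deletion.
    if ($fstart < $dstart) {
        my $lost = $fend - $dstart + 1;
        return ($fstart . '..' . ($dstart - 1),
                "$lost bp deleted from 3' end");
    }

    # Otherwise the feature's 5' end falls inside the deletion.
    my $lost = $dend - $fstart + 1;
    return ($dstart . '..' . ($fend - $dlen),
            "$lost bp deleted from 5' end");
}

# The two CDS features from the test code, with positions 4..6 deleted.
for my $feat ([2, 9], [2, 5]) {
    my ($loc, $note) = adjust_for_deletion(@$feat, 4, 6);
    next unless defined $loc;
    print "CDS             $loc\n";
    print "                 /note=\"$note\"\n" if defined $note;
}
```

For the two features in the test code this should give the locations and
notes suggested above. Reverse-strand features would need the 5' and 3'
labels swapped, and features that already have split locations are not
handled.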

Hope this is clearer.
Cheers,
Roy.

On 10/01/2012 16:47, Frank Schwach wrote:
> Hi Roy,
>
> Sorry, I hadn't explained that very well: it's not the outer boundaries
> of the feature that become fuzzy but the "inner" ones of the split
> locations:
>
>   --------------------           a feature's location
> ==========xxxx================= sequence
>
>
>   ---------                     sublocation 1
>            --------             sublocation 2
> ===============================
>
> x= sequence to delete
> The feature's location has changed from Simple to Split.
>
> Sublocation 1:
> start is still EXACT and has not changed
> end is now AFTER because this is not a true end of the feature
>
> Sublocation 2:
> start is BEFORE
> end is EXACT (but shifted)
>
> I hope this makes more sense(?)
>
> Cheers,
>
> Frank
>
>
>
> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>> Hi Frank,
>>
>> Looks good to me. One thing I'm not sure about - why do features
>> overlapping a deletion become fuzzy? That behaviour is in
>> trunc_with_features because it's intended to represent taking a
>> subregion of a larger sequence, but if you're representing an internal
>> deletion then the boundaries of the overlapping feature aren't unknown,
>> they have been specifically altered. Maybe you could give absolute
>> coordinates, but add a note indicating that the 5' or 3' end has been
>> truncated by however many bases.
>>
>> Cheers,
>> Roy.
>>
>> On 10/01/2012 13:10, Frank Schwach wrote:
>>> Hi Chris,
>>>
>>> I have made the changes in a Git fork and made the pull request now.
>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>> HOWTO for the BioPerl wiki.
>>>
>>> Frank
>>>
>>>
>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>> Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).
>>>>
>>>> chris
>>>>
>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I needed to manipulate Bio::Seq objects with annotations and sequence
>>>>> features to simulate molecular cloning techniques, e.g. to cut a vector
>>>>> and insert a fragment into it while preserving all the annotations and
>>>>> moving the features accordingly.
>>>>> My main aim was to split features that span deletion/insertion sites in
>>>>> a meaningful way, which cannot be done with the currently available
>>>>> methods.
>>>>> I have modified Bio::SeqUtils so that I have the following new methods:
>>>>>
>>>>> delete
>>>>> ======
>>>>> removes a segment from a sequence object and adjusts positions and types
>>>>> of locations of sequence features:
>>>>> - locations of features that span the deletion sites are turned into
>>>>> Splits.
>>>>> - locations that extend into the deleted region are turned to Fuzzy to
>>>>> indicate that their true start/end was lost.
>>>>> - locations contained inside the deleted regions are lost.
>>>>> - other features are shifted according to the length of the deletion.
>>>>>
>>>>> insert
>>>>> ======
>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>> sites. This also affects the features on the recipient sequence:
>>>>> - locations of features that span the insertion site are split but
>>>>> position types are not turned to Fuzzy because no part of the original
>>>>> feature is lost.
>>>>> - other features are shifted according to the length of the insertion.
>>>>>
>>>>> ligate
>>>>> ======
>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>> Simply calls delete [, reverse_complement_with_features] and insert in
>>>>> turn.
>>>>>
>>>>>
>>>>> One situation I haven't handled yet is a deletion that spans the origin
>>>>> of a circular molecule but that should be a rare thing to do anyway. The
>>>>> code currently throws an error if this is attempted.
>>>>>
>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>> Comments on the handling of feature locations highly welcome!
>>>>>
>>>>> Frank
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>



