[Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning

Fields, Christopher J cjfields at illinois.edu
Wed Jan 11 03:13:45 UTC 2012


Have to agree with Roy: in cases where the bounds of the deletion are defined, the truncated features wouldn't be fuzzy (the start and end would be known; the feature would just be truncated).  The same goes for other mutations.

chris
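
To make the exact-coordinate approach concrete, here is a minimal sketch in plain Perl that does not call BioPerl at all. The helper name remap_after_deletion and its return structure are hypothetical and are not part of Bio::SeqUtils. It assumes forward-strand features with 1-based inclusive coordinates, and it borrows the note wording Roy suggests in the quoted message below.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bio::SeqUtils method): remap a forward-strand
# feature at $start..$end after bases $del_left..$del_right are deleted.
# Returns nothing if the feature is lost, otherwise a hashref holding an
# exact (non-fuzzy) location string and, where relevant, a note.
sub remap_after_deletion {
    my ($start, $end, $del_left, $del_right) = @_;
    my $len = $del_right - $del_left + 1;

    # Entirely before the deletion: unchanged.
    return { location => "$start..$end" } if $end < $del_left;

    # Entirely after the deletion: shift left by the deleted length.
    if ($start > $del_right) {
        return { location => join('..', $start - $len, $end - $len) };
    }

    # Contained in the deleted region: the feature is lost.
    return if $start >= $del_left && $end <= $del_right;

    # Spans the whole deletion: split into two exact sublocations.
    if ($start < $del_left && $end > $del_right) {
        my $loc = sprintf 'join(%d..%d,%d..%d)',
            $start, $del_left - 1, $del_left, $end - $len;
        return { location => $loc, note => "$len bp internal deletion" };
    }

    # Only the 3' end falls inside the deletion.
    if ($start < $del_left) {
        my $lost = $end - $del_left + 1;
        return { location => join('..', $start, $del_left - 1),
                 note     => "$lost bp deleted from 3' end" };
    }

    # Only the 5' end falls inside the deletion.
    my $lost = $del_right - $start + 1;
    return { location => join('..', $del_left, $end - $len),
             note     => "$lost bp deleted from 5' end" };
}

# The two CDS features from Roy's test, with bases 4..6 deleted.
for my $f ([2, 9], [2, 5]) {
    my $r = remap_after_deletion(@$f, 4, 6);
    print "CDS $f->[0]..$f->[1] -> $r->{location} ($r->{note})\n";
}
# prints:
# CDS 2..9 -> join(2..3,4..6) (3 bp internal deletion)
# CDS 2..5 -> 2..3 (2 bp deleted from 3' end)
```

Reverse-strand features would need the 5' and 3' labels swapped; that is left out of this sketch.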

On Jan 10, 2012, at 4:35 PM, Frank Schwach wrote:

> Hi Roy,
> 
> I see what you mean and I had the same thought but somehow I liked the fuzzy locations more because it suggests to me that the feature is not complete (anymore). But I do take your point that this is not the intended use of this location type. I can add notes as you suggest but I guess I should also add a misc_feature "deletion", in your example between bases 3 and 4, to make it clearer that something has happened to the feature.
> 
> Frank
> 
> 
> 
> On 10/01/12 17:27, Roy Chaudhuri wrote:
>> I think it's me that didn't explain very well - I was talking about overlapping (rather than spanning) a deletion, although I think the same principle applies to the spanning example you gave. Here's some test code:
>> 
>> #!/usr/bin/perl
>> use warnings FATAL=>qw(all);
>> use strict;
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::SeqUtils;
>> use Bio::SeqFeature::Generic;
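>> # A 10 bp test sequence carrying two CDS features.
>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>> # Feature 1 (2..9) spans the whole region that will be deleted.
>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>                                                   -start=>2,
>>                                                   -end=>9));
>> 
>> # Feature 2 (2..5): its 3' end extends into the deleted region.
>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>                                                   -start=>2,
>>                                                   -end=>5));
>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>> # Delete bases 4..6 and print the result in GenBank format.
>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>> print $out $trunc;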
>> 
>> 
>> This currently outputs:
>> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
>> ACCESSION   unknown
>> FEATURES             Location/Qualifiers
>>     CDS             join(2..>3,<4..6)
>>     CDS             2..>3
>> ORIGIN
>>        1 aaaaaaa
>> //
>> 
>> However, I was suggesting that the feature table should be something like:
>> CDS             join(2..3,4..6)
>>                /note="3 bp internal deletion"
>> CDS             2..3
>>                /note="2 bp deleted from 3' end"
>> 
>> Fuzzy locations are intended to represent features whose boundaries extend outside the sequence. For a defined deletion that's not the case: the boundaries of the feature aren't unknown, they have been specifically altered.
>> 
>> Hope this is clearer.
>> Cheers,
>> Roy.
>> 
>> On 10/01/2012 16:47, Frank Schwach wrote:
>>> Hi Roy,
>>> 
>>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>>> of the feature that become fuzzy but the "inner" ones of the split
>>> locations:
>>> 
>>>  --------------------           a feature's location
>>> ==========xxxx================= sequence
>>> 
>>> 
>>>  ---------                     sublocation 1
>>>           --------             sublocation 2
>>> ===============================
>>> 
>>> x= sequence to delete
>>> The feature's location has changed from Simple to Split.
>>> 
>>> Sublocation 1:
>>> start is still EXACT and has not changed
>>> end is now AFTER because this is not a true end of the feature
>>> 
>>> Sublocation 2:
>>> start is BEFORE
>>> end is EXACT (but shifted)
>>> 
>>> I hope this makes more sense(?)
>>> 
>>> Cheers,
>>> 
>>> Frank
>>> 
>>> 
>>> 
>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>> Hi Frank,
>>>> 
>>>> Looks good to me. One thing I'm not sure about - why do features
>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>> trunc_with_features because it's intended to represent taking a
>>>> subregion of a larger sequence, but if you're representing an internal
>>>> deletion then the boundaries of the overlapping feature aren't unknown,
>>>> they have been specifically altered. Maybe you could give absolute
>>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>>> truncated by however many bases.
>>>> 
>>>> Cheers,
>>>> Roy.
>>>> 
>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>> Hi Chris,
>>>>> 
>>>>> I have made the changes in a Git fork and made the pull request now.
>>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>>> HOWTO for the BioPerl wiki.
>>>>> 
>>>>> Frank
>>>>> 
>>>>> 
>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>> Sounds very promising!  The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules).
>>>>>> 
>>>>>> chris
>>>>>> 
>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I needed to manipulate Bio::Seq objects with annotations and sequence
>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a vector
>>>>>>> and insert a fragment into it while preserving all the annotations and
>>>>>>> moving the features accordingly.
>>>>>>> My main aim was to split features that span deletion/insertion sites in
>>>>>>> a meaningful way, which cannot be done with the currently available
>>>>>>> methods.
>>>>>>> I have modified Bio::SeqUtils so that I have the following new methods:
>>>>>>> 
>>>>>>> delete
>>>>>>> ======
>>>>>>> removes a segment from a sequence object and adjusts positions and types
>>>>>>> of locations of sequence features:
>>>>>>> - locations of features that span the deletion sites are turned into
>>>>>>> Splits.
>>>>>>> - locations that extend into the deleted region are turned to Fuzzy to
>>>>>>> indicate that their true start/end was lost.
>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>> - other features are shifted according to the length of the deletion.
>>>>>>> 
>>>>>>> insert
>>>>>>> ======
>>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>>> - locations of features that span the insertion site are split but
>>>>>>> position types are not turned to Fuzzy because no part of the original
>>>>>>> feature is lost.
>>>>>>> - other features are shifted according to the length of the insertion.
>>>>>>> 
>>>>>>> ligate
>>>>>>> ======
>>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>>> Simply calls delete [, reverse_complement_with_features] and insert in
>>>>>>> turn.
>>>>>>> 
>>>>>>> 
>>>>>>> One situation I haven't handled yet is a deletion that spans the origin
>>>>>>> of a circular molecule but that should be a rare thing to do anyway. The
>>>>>>> code currently throws an error if this is attempted.
>>>>>>> 
>>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>> 
>>>>>>> Frank
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 

Chris Fields
Senior Research Scientist
National Center for Supercomputing Applications
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
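
For the insert behaviour described in Frank's original proposal quoted above (features spanning the insertion site are split but stay exact, other features are shifted), a similar plain-Perl sketch follows. The helper name remap_after_insertion is hypothetical and is not the Bio::SeqUtils implementation. It assumes 1-based inclusive coordinates and a fragment inserted immediately after a given base of the recipient.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bio::SeqUtils method): remap a feature at
# $start..$end after a fragment of $ins_len bases is inserted immediately
# after base $insert_after. Returns a location string.
sub remap_after_insertion {
    my ($start, $end, $insert_after, $ins_len) = @_;

    # Feature ends at or before the insertion site: unchanged.
    return "$start..$end" if $end <= $insert_after;

    # Feature starts after the insertion site: shift right.
    if ($start > $insert_after) {
        return join '..', $start + $ins_len, $end + $ins_len;
    }

    # Feature spans the site: split into two exact sublocations,
    # since no part of the original feature is lost.
    return sprintf 'join(%d..%d,%d..%d)',
        $start, $insert_after,
        $insert_after + $ins_len + 1, $end + $ins_len;
}

# Insert a 4 bp fragment after base 5 of a 10 bp recipient.
for my $f ([1, 3], [2, 9], [7, 9]) {
    print "$f->[0]..$f->[1] -> ", remap_after_insertion(@$f, 5, 4), "\n";
}
# prints:
# 1..3 -> 1..3
# 2..9 -> join(2..5,10..13)
# 7..9 -> 11..13
```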




