[Bioperl-l] Re: [Bioperl-guts-l] Bio::Restriction modifications

Rob Edwards redwards at utmem.edu
Wed Dec 3 08:16:13 EST 2003


Peter,

It sounds like we used the same approach to fix the same problems - 
great minds :)

However, we both fixed extra things and so we need to merge.

Here are my comments, interleaved with your change list:
> Bio::Restriction::Analysis
>
>     Changed the site finding algorithm in '_new_cuts'. The new 
> algorithm
> uses pos to find the recognition site regex and finds the actual cut 
> point
> using $enz->cut. The cut positions are put in an array and substr is 
> used
> to get digestion fragments.

This is the best way to do it; it lets you do a lot more with the 
sequences. I also used this approach for ambiguous enzymes. For 
non-ambiguous enzymes it is a lot more efficient to use an index-based 
string search rather than a regexp, and I added an internal method to 
deal with this.

>    Fixed the origin bug for circular target sequences. If the sequence 
> is
> circular the first 40 bases are appended to the end of the sequence. 
> Sites
> that span the origin are now detected. First and last fragments are
> reattached as before.

I added a method to EnzymeCollection to return the longest cut site in 
the collection and then appended that length of sequence. This will 
correctly deal with any site > 40 bp. Using positions instead of 
fragments sure makes this a lot easier, huh?
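A rough sketch of that circular trick (Python, names are mine; 
`longest_site` would come from the new EnzymeCollection method):

```python
def circular_cut_positions(seq, site, cut_offset, longest_site):
    """Find cut points on a circular sequence. Appending the first
    `longest_site` bases to the end exposes sites that span the origin;
    cut points are then mapped back onto the circle with a modulus."""
    n = len(seq)
    extended = seq + seq[:longest_site]
    cuts, i = [], extended.find(site)
    while i != -1 and i < n:          # each site start counted once
        cuts.append((i + cut_offset) % n)
        i = extended.find(site, i + 1)
    return sorted(cuts)

# EcoRI-like example: G^AATTC spanning the origin of a 9 bp circle.
# "TTCAAAGAA" read circularly is ...GAA|TTC..., i.e. GAATTC across
# the origin, which the extended copy exposes at index 6.
print(circular_cut_positions("TTCAAAGAA", "GAATTC", 1, 6))  # [7]
```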

>    Fixed the non-palindromes bug.  Non-palindromic sites are detected 
> in
> the reverse orientation by searching the reverse complement of the 
> target
> sequence, using $enz->complementary_cut and then subtracting from the
> sequence length.

I added a new method to Enzyme.pm to return the complementary cut site. 
I don't particularly like the way I did this. I'd like Analysis to 
correctly use the Enzyme->cut and Enzyme->complementary_cut which it 
doesn't at the moment. I think this is a lot better than using the "^" 
in the site routine.
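In case it helps the merge, here is the reverse-strand coordinate 
bookkeeping as I understand it (Python sketch; the exact mapping from 
complementary_cut to top-strand coordinates is my assumption, so 
double-check it against Enzyme.pm):

```python
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMP)[::-1]

def find_all(seq, site):
    """All start positions of `site` in `seq` (overlaps included)."""
    out, i = [], seq.find(site)
    while i != -1:
        out.append(i)
        i = seq.find(site, i + 1)
    return out

def cuts_both_strands(seq, site, cut, comp_cut):
    """Scan the forward strand with the `cut` offset, scan the reverse
    complement with `comp_cut`, and map reverse-strand hits back to
    top-strand coordinates by subtracting from the sequence length."""
    n = len(seq)
    fwd = [i + cut for i in find_all(seq, site)]
    rev = [n - (i + comp_cut) for i in find_all(revcomp(seq), site)]
    return sorted(set(fwd + rev))
```

For a palindromic site both scans find the same positions, so the 
set() keeps each cut once; only non-palindromic sites gain extra hits 
from the reverse-strand pass.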

>    Fixed a bug where a site would be detected even if the 
> complementary cut
> was off the end of the target sequence.

I am not sure I deal with this correctly yet. I think it should be 
handled by the circularization code?


>    Added a method 'fragment_maps($enzyme_name)'. This method returns an
> array of hashes where each hash corresponds to a restriction fragment. 
> The
> hash keys are 'start', 'end' and 'seq'. 'seq' holds the sequence of the
> fragment, while 'start' and 'end' hold the positions of the first and 
> last
> bases of the fragment in the target sequence. This will allow 
> restriction
> fragments to be turned into features.

This is cool.
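Agreed — for anyone following along, the interface could look roughly 
like this (Python sketch of the hash-of-hashes idea, with 1-based 
coordinates as elsewhere in BioPerl):

```python
def fragment_maps(seq, cuts):
    """Turn cut positions (0-based, the cut falls *before* that index)
    into fragment records: 1-based 'start'/'end' plus the 'seq' itself,
    ready to be turned into features."""
    bounds = [0] + sorted(cuts) + [len(seq)]
    maps = []
    for a, b in zip(bounds, bounds[1:]):
        if a < b:  # skip zero-length fragments at duplicate cut points
            maps.append({"start": a + 1, "end": b, "seq": seq[a:b]})
    return maps

# G^AATTCGG cut at position 1 gives two fragments
print(fragment_maps("GAATTCGG", [1]))
```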

>       Added internal methods '_find_cut', '_multicut' and '_digest'.
> '_find_cut' figures out the cut point given the position of the 
> recognition
> site and checks whether the cut point is off the end of the molecule. 
> This
> has to be done twice for non-palindromic enzymes, so it makes sense to 
> make
> it a separate method.
>    '_digest' gets the restriction fragments from the list of cut 
> points. I
> split this off of '_new_cuts' for the sake of readability, particularly
> after we start dealing with overlapping sites.
>    '_multicut' checks whether the "other" cut of a dual cut enzyme is 
> off
> the end of the sequence. Could probably be folded back into 
> '_find_cut'.

I did something similar, but probably used about 5x as many methods 
(making it much less clear I am sure).
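Just so we're comparing like with like when we merge, here's how I read 
the split (Python sketch; the names mirror the methods you describe but 
the bodies are my guess at the intent):

```python
def find_cut(site_start, cut_offset, seq_len):
    """Cut point from a recognition-site position; None when the cut
    would fall off either end of a linear molecule."""
    cut = site_start + cut_offset
    return cut if 0 < cut < seq_len else None

def digest(seq, cuts):
    """substr-style split of a linear sequence at the given cut points."""
    bounds = [0] + sorted(set(cuts)) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

# two EcoRI-style cuts at 1 and 7 give three fragments
print(digest("GAATTCGAATTC", [1, 7]))  # ['G', 'AATTCG', 'AATTC']
```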

> Bio::Restriction::Enzyme
>
>    Changed the enzyme name correction code so only a single trailing 
> '1'
> gets changed to an 'I'. This should prevent name mangling of enzymes 
> like
> Alw21I.
>
>    Changed '$cut && $self->cut($cut);' to 'defined $cut &&
> $self->cut($cut);' (same for complementary cut) in the constructor. 
> This
> allows $self->cut to be zero.
>
>    Some minor changes in methods 'cut' and 'site'.

These should be committed.
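The trailing-'1' fix is the kind of thing that's easy to get subtly 
wrong, so for the record, the behavior you describe amounts to 
something like this (Python equivalent; the regex is mine, not the one 
in Enzyme.pm):

```python
import re

def fix_enzyme_name(name):
    """Correct a single trailing '1' (a common typo for Roman 'I') to
    'I', but only when it is not part of a longer number such as the
    '21' in 'Alw21I'."""
    return re.sub(r"(?<!\d)1$", "I", name)

print(fix_enzyme_name("EcoR1"))   # EcoRI
print(fix_enzyme_name("Alw21I"))  # Alw21I (untouched)
```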

> Bio::Restriction::IO::base
>
>    Changed method '_make_multicuts' so that the second enzyme is not 
> added
> to the collection and the first enzyme is not added to the second 
> enzyme's
> "others" array, only the other way around. The idea is to prevent dual 
> cut
> sites from being reported twice. This will break scripts that used this
> behavior to work around the dual cut bug, but you shouldn't have to do 
> that
> any more.
>
>    Similar changes made to '_make_multisites'.

These should be committed too.

I'll try to merge this code with what I wrote and add it to cvs in a 
day or so. Once I have a near-working version, I'll let you know; 
perhaps you can help debug the changes.

Take a look at the latest versions in cvs and let me know what you 
think.

Rob


