[Bioperl-l] concatenate two embl sequence files
Roy Chaudhuri
roy at colibase.bham.ac.uk
Wed Jan 25 17:05:29 UTC 2006
Hi all.
I also had need of a function to concatenate two Bio::Seq objects, so had a go
at this. My naive attempt (intended to go in Bio::SeqUtils) is pasted below. I'm
not too sure about the concept of sub-SeqFeatures (I've never seen any sequence
that had more than one level of feature)- I worked on the assumption that little
sub-SeqFeatures can have littler sub-SeqFeatures and so ad infinitum, but as I
don't have an example file I haven't been able to test if this works. Likewise,
although I think the code should cope with Fuzzy and Split locations, I haven't
tested this with any particularly unusual examples.
Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.
http://xbase.bham.ac.uk
=head2 cat
Title : cat
Usage : my $catseq = Bio::SeqUtils->cat(@seqs)
Function: Concatenates an array of Bio::Seq objects, using the first sequence
as a template for species etc. Adjusts the coordinates of features
from any additional objects.
Returns : A sequence object of the same class as the first argument.
Args : array of sequence objects
=cut
sub cat {
my ($self, @seqs) = @_;
my $seq=shift @seqs;
$self->throw('Object [$seq] '. 'of class ['. ref($seq).
'] should be a Bio::PrimarySeqI ')
unless $seq->isa('Bio::PrimarySeqI');
for (@seqs) {
$self->throw('Object [$seq] '. 'of class ['. ref($seq).
'] should be a Bio::PrimarySeqI ')
unless $seq->isa('Bio::PrimarySeqI');
my $length=$seq->length;
$seq->seq($seq->seq.$_->seq);
for my $feat ($_->get_SeqFeatures) {
$seq->add_SeqFeature($self->_coordAdjust($feat, $length));
}
}
return $seq;
}
=head2 _coordAdjust
Title : _coordAdjust
Usage : my $newfeat=Bio::SeqUtils->_coordAdjust($feature, 100);
Function: Recursive subroutine to adjust the coordinates of a feature
and all its subfeatures.
Returns : A Bio::SeqFeatureI compliant object.
Args : A Bio::SeqFeatureI compliant object,
the number of bases to add to the coordinates
=cut
sub _coordAdjust {
my ($self, $feat, $add)=@_;
$self->throw('Object [$feat] '. 'of class ['. ref($feat).
'] should be a Bio::SeqFeatureI ')
unless $feat->isa('Bio::SeqFeatureI');
my @adjsubfeat;
for my $subfeat ($feat->remove_SeqFeatures) {
push @adjsubfeat, Bio::SeqUtils->_coordAdjust($add, $subfeat);
}
my @loc=$feat->location->each_Location;
map {
my @coords=($_->start, $_->end);
map s/(\d+)/$add+$1/ge, @coords;
$_->start(shift @coords);
$_->end(shift @coords);
} @loc;
if (@loc==1) {
$feat->location($loc[0])
} else {
my $loc=Bio::Location::Split->new;
$loc->add_sub_Location(@loc);
$feat->location($loc);
}
$feat->add_SeqFeature($_) for @adjsubfeat;
return $feat;
}
>
>
> Jan,
>
> It would be easy if someone had written a function to do it. Even writing the
> function is not hard. I do not think there is no other way than go through
> all features, though.
>
> In my opinion this would be an excellent addition to Bio::Seq::Utilities.
>
> E.g. cat($arrayrefofsequences, optional_seq_class_to_create)
> return a new seq, species and other info based on the first seq in array
>
> Could you write it and post to bugzilla?
>
> -Heikki
>
>
> On Tuesday 17 January 2006 11:54, jan aerts (RI) wrote:
>> Hi all,
>>
>> Does anyone know of an easy way to concatenate two sequences, including
>> recalculation of features positions of the second one? E.g.
>> seq 1 = 100 bp
>> feature A: 5..15
>> seq 2 = 200 bp
>> feature B: 20..30
>> => concatenated sequence 3 = 300 bp
>> feature A: 5..15
>> feature B: 120..130 <<<<<<<<<<<
>>
>> Annotations (features without range) should be transferred as well.
>>
>> Of course, it must be possible to create a blank sequence and work my
>> way through all features, adding them to a new collection of features
>> and stuff. But I was wondering if a simpler technique is possible.
>>
>> Many thanks,
>> Jan Aerts
>> Bioinformatics Department
>> Roslin Institute
>> Roslin, Scotland, UK
>>
>> ---------The obligatory disclaimer--------
>> The information contained in this e-mail (including any attachments) is
>> confidential and is intended for the use of the addressee only. The
>> opinions expressed within this e-mail (including any attachments) are
>> the opinions of the sender and do not necessarily constitute those of
>> Roslin Institute (Edinburgh) ("the Institute") unless specifically
>> stated by a sender who is duly authorised to do so on behalf of the
>> Institute.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/
> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
> _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
> _/ _/ _/ SANBI, South African National Bioinformatics Institute
> _/ _/ _/ University of Western Cape, South Africa
> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
>
More information about the Bioperl-l
mailing list