[Bioperl-l] concatenate two embl sequence files

Roy Chaudhuri roy at colibase.bham.ac.uk
Wed Jan 25 12:05:29 EST 2006


Hi all.

I also had need of a function to concatenate two Bio::Seq objects, so had a go
at this. My naive attempt (intended to go in Bio::SeqUtils) is pasted below. I'm
not too sure about the concept of sub-SeqFeatures (I've never seen any sequence
that had more than one level of feature)- I worked on the assumption that little
sub-SeqFeatures can have littler sub-SeqFeatures and so ad infinitum, but as I
don't have an example file I haven't been able to test if this works. Likewise,
although I think the code should cope with Fuzzy and Split locations, I haven't
tested this with any particularly unusual examples.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk



=head2 cat

  Title   : cat
  Usage   : my $catseq = Bio::SeqUtils->cat(@seqs)
  Function: Concatenates an array of Bio::Seq objects, using the first sequence
            as a template for species etc. Adjusts the coordinates of features
            from any additional objects.
  Returns : A sequence object of the same class as the first argument.
  Args    : array of sequence objects


=cut

sub cat {
     my ($self, @seqs) = @_;
     my $seq=shift @seqs;
     $self->throw('Object [$seq] '. 'of class ['. ref($seq).
                  '] should be a Bio::PrimarySeqI ')
     unless $seq->isa('Bio::PrimarySeqI');
     for (@seqs) {
     	$self->throw('Object [$seq] '. 'of class ['. ref($seq).
                  '] should be a Bio::PrimarySeqI ')
	unless $seq->isa('Bio::PrimarySeqI');
	my $length=$seq->length;
	$seq->seq($seq->seq.$_->seq);
	for my $feat ($_->get_SeqFeatures) {
	    $seq->add_SeqFeature($self->_coordAdjust($feat, $length));
	}
     }
     return $seq;
}

=head2 _coordAdjust

  Title   : _coordAdjust
  Usage   : my $newfeat=Bio::SeqUtils->_coordAdjust($feature, 100);
  Function: Recursive subroutine to adjust the coordinates of a feature
            and all its subfeatures.
  Returns : A Bio::SeqFeatureI compliant object.
  Args    : A Bio::SeqFeatureI compliant object,
            the number of bases to add to the coordinates


=cut

sub _coordAdjust {
     my ($self, $feat, $add)=@_;
     $self->throw('Object [$feat] '. 'of class ['. ref($feat).
                  '] should be a Bio::SeqFeatureI ')
	unless $feat->isa('Bio::SeqFeatureI');
     my @adjsubfeat;
     for my $subfeat ($feat->remove_SeqFeatures) {
	push @adjsubfeat, Bio::SeqUtils->_coordAdjust($add, $subfeat);
     }
     my @loc=$feat->location->each_Location;
     map {
	my @coords=($_->start, $_->end);
	map s/(\d+)/$add+$1/ge, @coords;
	$_->start(shift @coords);
	$_->end(shift @coords);
     } @loc;
     if (@loc==1) {
	$feat->location($loc[0])
     } else {
	my $loc=Bio::Location::Split->new;
	$loc->add_sub_Location(@loc);
	$feat->location($loc);
     }
     $feat->add_SeqFeature($_) for @adjsubfeat;
     return $feat;
}

> 
> 
> Jan, 
> 
> It would be easy if someone had written a function to do it. Even writing the 
> function is not hard.  I do not think there is no other way than go through 
> all features, though.
> 
> In my opinion this would be an excellent addition to Bio::Seq::Utilities.
> 
> E.g. cat($arrayrefofsequences, optional_seq_class_to_create)
>      return a new seq, species and other info based on the first seq in array 
> 
> Could you  write it and post to bugzilla?
> 
> 	-Heikki
> 
> 
> On Tuesday 17 January 2006 11:54, jan aerts (RI) wrote:
>> Hi all,
>>
>> Does anyone know of an easy way to concatenate two sequences, including
>> recalculation of features positions of the second one? E.g.
>>   seq 1 = 100 bp
>>     feature A: 5..15
>>   seq 2 = 200 bp
>>     feature B: 20..30
>>   => concatenated sequence 3 = 300 bp
>>        feature A: 5..15
>>        feature B: 120..130  <<<<<<<<<<<
>>
>> Annotations (features without range) should be transferred as well.
>>
>> Of course, it must be possible to create a blank sequence and work my
>> way through all features, adding them to a new collection of features
>> and stuff. But I was wondering if a simpler technique is possible.
>>
>> Many thanks,
>> Jan Aerts
>> Bioinformatics Department
>> Roslin Institute
>> Roslin, Scotland, UK
>>
>> ---------The obligatory disclaimer--------
>> The information contained in this e-mail (including any attachments) is
>> confidential and is intended for the use of the addressee only.   The
>> opinions expressed within this e-mail (including any attachments) are
>> the opinions of the sender and do not necessarily constitute those of
>> Roslin Institute (Edinburgh) ("the Institute") unless specifically
>> stated by a sender who is duly authorised to do so on behalf of the
>> Institute.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> 



More information about the Bioperl-l mailing list