[Bioperl-l] concatenate two embl sequence files

Heikki Lehvaslaiho heikki at sanbi.ac.za
Wed Jan 25 16:11:45 EST 2006


Thanks Roy!

I'll check to code in tomorrow when I am less sleepy and can go through the 
code in detail. In principle the code looks good. It definitely needs tests. 
If you have written any please do post them.

A few more checks to make sure seq_>alphabet is the same in all sequences 
might be a good idea.

   -Heikki



On Wednesday 25 January 2006 19:05, Roy Chaudhuri wrote:
> Hi all.
>
> I also had need of a function to concatenate two Bio::Seq objects, so had a
> go at this. My naive attempt (intended to go in Bio::SeqUtils) is pasted
> below. I'm not too sure about the concept of sub-SeqFeatures (I've never
> seen any sequence that had more than one level of feature)- I worked on the
> assumption that little sub-SeqFeatures can have littler sub-SeqFeatures and
> so ad infinitum, but as I don't have an example file I haven't been able to
> test if this works. Likewise, although I think the code should cope with
> Fuzzy and Split locations, I haven't tested this with any particularly
> unusual examples.
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
>
> http://xbase.bham.ac.uk
>
>
>
> =head2 cat
>
>   Title   : cat
>   Usage   : my $catseq = Bio::SeqUtils->cat(@seqs)
>   Function: Concatenates an array of Bio::Seq objects, using the first
> sequence as a template for species etc. Adjusts the coordinates of features
> from any additional objects.
>   Returns : A sequence object of the same class as the first argument.
>   Args    : array of sequence objects
>
>
> =cut
>
> sub cat {
>      my ($self, @seqs) = @_;
>      my $seq=shift @seqs;
>      $self->throw('Object [$seq] '. 'of class ['. ref($seq).
>                   '] should be a Bio::PrimarySeqI ')
>      unless $seq->isa('Bio::PrimarySeqI');
>      for (@seqs) {
>      	$self->throw('Object [$seq] '. 'of class ['. ref($seq).
>                   '] should be a Bio::PrimarySeqI ')
> 	unless $seq->isa('Bio::PrimarySeqI');
> 	my $length=$seq->length;
> 	$seq->seq($seq->seq.$_->seq);
> 	for my $feat ($_->get_SeqFeatures) {
> 	    $seq->add_SeqFeature($self->_coordAdjust($feat, $length));
> 	}
>      }
>      return $seq;
> }
>
> =head2 _coordAdjust
>
>   Title   : _coordAdjust
>   Usage   : my $newfeat=Bio::SeqUtils->_coordAdjust($feature, 100);
>   Function: Recursive subroutine to adjust the coordinates of a feature
>             and all its subfeatures.
>   Returns : A Bio::SeqFeatureI compliant object.
>   Args    : A Bio::SeqFeatureI compliant object,
>             the number of bases to add to the coordinates
>
>
> =cut
>
> sub _coordAdjust {
>      my ($self, $feat, $add)=@_;
>      $self->throw('Object [$feat] '. 'of class ['. ref($feat).
>                   '] should be a Bio::SeqFeatureI ')
> 	unless $feat->isa('Bio::SeqFeatureI');
>      my @adjsubfeat;
>      for my $subfeat ($feat->remove_SeqFeatures) {
> 	push @adjsubfeat, Bio::SeqUtils->_coordAdjust($add, $subfeat);
>      }
>      my @loc=$feat->location->each_Location;
>      map {
> 	my @coords=($_->start, $_->end);
> 	map s/(\d+)/$add+$1/ge, @coords;
> 	$_->start(shift @coords);
> 	$_->end(shift @coords);
>      } @loc;
>      if (@loc==1) {
> 	$feat->location($loc[0])
>      } else {
> 	my $loc=Bio::Location::Split->new;
> 	$loc->add_sub_Location(@loc);
> 	$feat->location($loc);
>      }
>      $feat->add_SeqFeature($_) for @adjsubfeat;
>      return $feat;
> }
>
> > Jan,
> >
> > It would be easy if someone had written a function to do it. Even writing
> > the function is not hard.  I do not think there is no other way than go
> > through all features, though.
> >
> > In my opinion this would be an excellent addition to Bio::Seq::Utilities.
> >
> > E.g. cat($arrayrefofsequences, optional_seq_class_to_create)
> >      return a new seq, species and other info based on the first seq in
> > array
> >
> > Could you  write it and post to bugzilla?
> >
> > 	-Heikki
> >
> > On Tuesday 17 January 2006 11:54, jan aerts (RI) wrote:
> >> Hi all,
> >>
> >> Does anyone know of an easy way to concatenate two sequences, including
> >> recalculation of features positions of the second one? E.g.
> >>   seq 1 = 100 bp
> >>     feature A: 5..15
> >>   seq 2 = 200 bp
> >>     feature B: 20..30
> >>   => concatenated sequence 3 = 300 bp
> >>        feature A: 5..15
> >>        feature B: 120..130  <<<<<<<<<<<
> >>
> >> Annotations (features without range) should be transferred as well.
> >>
> >> Of course, it must be possible to create a blank sequence and work my
> >> way through all features, adding them to a new collection of features
> >> and stuff. But I was wondering if a simpler technique is possible.
> >>
> >> Many thanks,
> >> Jan Aerts
> >> Bioinformatics Department
> >> Roslin Institute
> >> Roslin, Scotland, UK
> >>
> >> ---------The obligatory disclaimer--------
> >> The information contained in this e-mail (including any attachments) is
> >> confidential and is intended for the use of the addressee only.   The
> >> opinions expressed within this e-mail (including any attachments) are
> >> the opinions of the sender and do not necessarily constitute those of
> >> Roslin Institute (Edinburgh) ("the Institute") unless specifically
> >> stated by a sender who is duly authorised to do so on behalf of the
> >> Institute.
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at portal.open-bio.org
> >> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >   _/  _/  _/  University of Western Cape, South Africa
> >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


More information about the Bioperl-l mailing list