[Bioperl-l] concatenate two embl sequence files
Heikki Lehvaslaiho
heikki at sanbi.ac.za
Wed Jan 25 16:11:45 EST 2006
Thanks Roy!
I'll check to code in tomorrow when I am less sleepy and can go through the
code in detail. In principle the code looks good. It definitely needs tests.
If you have written any please do post them.
A few more checks to make sure seq_>alphabet is the same in all sequences
might be a good idea.
-Heikki
On Wednesday 25 January 2006 19:05, Roy Chaudhuri wrote:
> Hi all.
>
> I also had need of a function to concatenate two Bio::Seq objects, so had a
> go at this. My naive attempt (intended to go in Bio::SeqUtils) is pasted
> below. I'm not too sure about the concept of sub-SeqFeatures (I've never
> seen any sequence that had more than one level of feature)- I worked on the
> assumption that little sub-SeqFeatures can have littler sub-SeqFeatures and
> so ad infinitum, but as I don't have an example file I haven't been able to
> test if this works. Likewise, although I think the code should cope with
> Fuzzy and Split locations, I haven't tested this with any particularly
> unusual examples.
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
>
> http://xbase.bham.ac.uk
>
>
>
> =head2 cat
>
> Title : cat
> Usage : my $catseq = Bio::SeqUtils->cat(@seqs)
> Function: Concatenates an array of Bio::Seq objects, using the first
> sequence as a template for species etc. Adjusts the coordinates of features
> from any additional objects.
> Returns : A sequence object of the same class as the first argument.
> Args : array of sequence objects
>
>
> =cut
>
> sub cat {
> my ($self, @seqs) = @_;
> my $seq=shift @seqs;
> $self->throw('Object [$seq] '. 'of class ['. ref($seq).
> '] should be a Bio::PrimarySeqI ')
> unless $seq->isa('Bio::PrimarySeqI');
> for (@seqs) {
> $self->throw('Object [$seq] '. 'of class ['. ref($seq).
> '] should be a Bio::PrimarySeqI ')
> unless $seq->isa('Bio::PrimarySeqI');
> my $length=$seq->length;
> $seq->seq($seq->seq.$_->seq);
> for my $feat ($_->get_SeqFeatures) {
> $seq->add_SeqFeature($self->_coordAdjust($feat, $length));
> }
> }
> return $seq;
> }
>
> =head2 _coordAdjust
>
> Title : _coordAdjust
> Usage : my $newfeat=Bio::SeqUtils->_coordAdjust($feature, 100);
> Function: Recursive subroutine to adjust the coordinates of a feature
> and all its subfeatures.
> Returns : A Bio::SeqFeatureI compliant object.
> Args : A Bio::SeqFeatureI compliant object,
> the number of bases to add to the coordinates
>
>
> =cut
>
> sub _coordAdjust {
> my ($self, $feat, $add)=@_;
> $self->throw('Object [$feat] '. 'of class ['. ref($feat).
> '] should be a Bio::SeqFeatureI ')
> unless $feat->isa('Bio::SeqFeatureI');
> my @adjsubfeat;
> for my $subfeat ($feat->remove_SeqFeatures) {
> push @adjsubfeat, Bio::SeqUtils->_coordAdjust($add, $subfeat);
> }
> my @loc=$feat->location->each_Location;
> map {
> my @coords=($_->start, $_->end);
> map s/(\d+)/$add+$1/ge, @coords;
> $_->start(shift @coords);
> $_->end(shift @coords);
> } @loc;
> if (@loc==1) {
> $feat->location($loc[0])
> } else {
> my $loc=Bio::Location::Split->new;
> $loc->add_sub_Location(@loc);
> $feat->location($loc);
> }
> $feat->add_SeqFeature($_) for @adjsubfeat;
> return $feat;
> }
>
> > Jan,
> >
> > It would be easy if someone had written a function to do it. Even writing
> > the function is not hard. I do not think there is no other way than go
> > through all features, though.
> >
> > In my opinion this would be an excellent addition to Bio::Seq::Utilities.
> >
> > E.g. cat($arrayrefofsequences, optional_seq_class_to_create)
> > return a new seq, species and other info based on the first seq in
> > array
> >
> > Could you write it and post to bugzilla?
> >
> > -Heikki
> >
> > On Tuesday 17 January 2006 11:54, jan aerts (RI) wrote:
> >> Hi all,
> >>
> >> Does anyone know of an easy way to concatenate two sequences, including
> >> recalculation of features positions of the second one? E.g.
> >> seq 1 = 100 bp
> >> feature A: 5..15
> >> seq 2 = 200 bp
> >> feature B: 20..30
> >> => concatenated sequence 3 = 300 bp
> >> feature A: 5..15
> >> feature B: 120..130 <<<<<<<<<<<
> >>
> >> Annotations (features without range) should be transferred as well.
> >>
> >> Of course, it must be possible to create a blank sequence and work my
> >> way through all features, adding them to a new collection of features
> >> and stuff. But I was wondering if a simpler technique is possible.
> >>
> >> Many thanks,
> >> Jan Aerts
> >> Bioinformatics Department
> >> Roslin Institute
> >> Roslin, Scotland, UK
> >>
> >> ---------The obligatory disclaimer--------
> >> The information contained in this e-mail (including any attachments) is
> >> confidential and is intended for the use of the addressee only. The
> >> opinions expressed within this e-mail (including any attachments) are
> >> the opinions of the sender and do not necessarily constitute those of
> >> Roslin Institute (Edinburgh) ("the Institute") unless specifically
> >> stated by a sender who is duly authorised to do so on behalf of the
> >> Institute.
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at portal.open-bio.org
> >> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ______ _/ _/_____________________________________________________
> > _/ _/
> > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
> > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
> > _/ _/ _/ SANBI, South African National Bioinformatics Institute
> > _/ _/ _/ University of Western Cape, South Africa
> > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________
More information about the Bioperl-l
mailing list