[BioRuby] concatenate two EMBL sequence files
jan aerts (RI)
jan.aerts at bbsrc.ac.uk
Wed Jan 18 03:30:02 EST 2006
Just for future reference (including myself): the code to shift features in a GenBank sequence. It might be improved on (a lot?), but it works for me...
jan.
#!/usr/bin/ruby
require 'bio'
module Bio
class Location
def shift!(slide_length)
@from += slide_length.to_i
@to += slide_length.to_i
end
def to_s
output = ''
if @lt
from_str = '<' + @from.to_s
else
from_str = @from.to_s
end
if @gt
to_str = '>' + @to.to_s
else
to_str = @to.to_s
end
output = from_str + '..' + to_str
if @strand == -1
output = 'complement(' + output + ')'
end
return output
end
end
class Locations
def shift!(slide_length)
@locations.each do |x|
x.shift!(slide_length)
end
end
def to_s
output = ''
if @locations.length == 1
output = @locations[0].to_s
else
parts = Array.new()
@locations.each do |loc|
parts.push(loc.to_s)
end
output = 'join(' + parts.join(',') + ')'
end
return output
end
end
class Feature
def shift!(slide_length)
locations = Locations.new(@position)
locations.shift!(slide_length)
@position = locations.to_s
end
end
end
ff = Bio::FlatFile.open(Bio::GenBank, "M33388.gb")
ff.each_entry do |gb|
gb.features.each do |feature|
puts "BEFORE: " + feature.position
feature.shift!(5)
puts "AFTER: " + feature.position
end
end
ff.close
-----Original Message-----
From: Toshiaki Katayama [mailto:ktym at hgc.jp]
Sent: Tue 1/17/2006 2:16 PM
To: jan aerts (RI)
Subject: Re: [BioRuby] concatenate two EMBL sequence files
Hello,
On 2006/01/17, at 21:57, jan aerts (RI) wrote:
> I've tried BioPerl, but it gets pretty difficult when you want to shift
> features that have positions like
> "join(<5..10,complement(55..70),13..14)"...
I have never tried but I believed that BioPerl is good for these situation
as it seems to have various sequence and location models...
Note that, the location including '<' notation is not well supported in BioRuby.
Please keep in mind that '<5..10' will be treated as '5..10'.
In your case, I suppose you understand what you are doing. :)
> It was _really_ straightforward to add such a Bio::Locations#shift
> method to my bioruby installation as you suggested, but now I'm trying
> to figure out where to add the code for exporting to EMBL format. There
> is a to_fasta method in bio/sequence.rb, but a newly added to_embl
> method in that file can not be found by my script. Can anyone tell me
> where I should add such a method?
>
> My code to read a genbank file, shift all features by 5 and print in
> EMBL format:
> $ ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
> $ ff.each_entry do |gb|
> $ gb.features.each do |feature|
> $ locs = Bio::Locations.new(feature.position)
> $ locs.shift(5) ####Uses new method to shift all locs by 5
I realized, after I hit send button, that the 'shift' method should be
added to Bio::Features class to shift all the features of the sequence.
gb2 = gb.features.shift(5)
gb2.features.each ...
or
gb.features.each do |feature|
feature.shift!(5)
end
puts gb.to_embl
However, ideally, having a rich sequence model like BioPerl which converts
sequence with annotations from any sequence entry (GenBank, EMBL etc.),
would give more sophisticated solution.
If you have two genbank entries in variables gb1 and gb2, and if there were
a 'to_richseq' method which converts Bio::GenBank data model to Bio::RichSeq
or something,
seq = gb1.to_richseq + gb2.to_richseq.shift(5)
will give you a new sequence which has a merged sequence with shifted annotations.
> $ end
> $ puts gb.to_embl ####Doesn't work yet!! Where do I
> put the to_embl method?
> $ end
>
> Two problems still exist:
> (1) in what file from the bioruby libs should I put the to_embl code?
In Ruby, classes are open, so you can put
module Bio
class GenBank
def to_embl
...
end
end
end
before you use the 'to_embl' method against your Bio::GenBank object
(say, on top of your script).
Again, I think it would be better to have this method in the new
not-yet-implemented Bio::RichSeq class.
The Bio::Sequence#to_fasta method should also be moved to this class.
I don't like the name 'RichSeq', though.
> (2) I suppose that, in the code above, the features themselves don't get
> shifted (the locs object has been detached from the gb object). How can
> I do that?
Yes, as I mentioned above, my previous suggestion was not good.
The new shift/shift! methods should be in Bio::Features or Bio::Feature
(or in Bio::RichSeq would be better).
> Many thanks,
> Jan Aerts
Good luck!
Thanks,
Toshiaki Katayama
>
>> -----Original Message-----
>> From: bioruby-bounces at portal.open-bio.org
>> [mailto:bioruby-bounces at portal.open-bio.org] On Behalf Of
>> Toshiaki Katayama
>> Sent: 16 January 2006 15:29
>> To: BioRuby ML Discussion List Project
>> Subject: Re: [BioRuby] concatenate two EMBL sequence files
>>
>> Hello,
>>
>> For your purpose, using BioPerl would be straight forward.
>>
>> BioRuby doesn't have feature re-location mechanism and no one
>> is working on writing out in EMBL format for now.
>>
>> I might consider to do the similar job by converting both
>> sequence to GFF.
>> By the way, having Bio::Locations#shift(slide_length) method
>> would be helpful.
>>
>> Regards,
>> Toshiaki Katayama
>>
>> On 2006/01/16, at 22:46, jan aerts (RI) wrote:
>>
>>> Hello all,
>>>
>>> I'm trying to concatenate two EMBL files (_with_ features)
>> into one larger sequence. Of course, positions of features of
>> the second sequence have to recalculated in the new sequence.
>>> e.g.:
>>> sequence 1 = 100 bp
>>> feature A: 5..10
>>> sequence 2 = 200 bp
>>> feature B: 15..20
>>> => concatenated sequence 3 = 300 bp
>>> feature A: 5..10
>>> feature B: 115..120
>>>
>>> I'd like to write out the new sequence as an EMBL file.
>>>
>>> Can anyone point me in the right direction? Many thanks.
>>>
>>> A possible perl-convert,
>>> Jan Aerts, PhD
>>> Bioinformatics Group
>>> Roslin Institute
>>> Roslin
>>> Scotland, UK
>>>
>>> _______________________________________________
>>> BioRuby mailing list
>>> BioRuby at open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioruby
>>
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioruby
>>
More information about the BioRuby
mailing list