[BioRuby] Calculation of Conserved residues.
Toshiaki Katayama
ktym at hgc.jp
Mon Apr 9 07:19:32 UTC 2007
Hi,
On 2007/04/08, at 8:31, Yonatan Gross wrote:
> seqs = []
> seqs << Bio::Sequence::AA.new(arabidopsis)
> seqs << Bio::Sequence::AA.new(tobacco)
:
The problem is that Bio::Sequence::AA.new cannot accept a FASTA-formatted string.
(Should I change this behavior?)
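The following is a minimal plain-Ruby sketch of what has to happen before a sequence can be given to Bio::Sequence::AA.new. It removes the FASTA definition line and joins the sequence lines. The helper name parse_fasta is hypothetical and is not part of BioRuby. In practice, the Bio::FlatFile approach described below does this parsing for you.

```ruby
# Hypothetical helper (not part of BioRuby): split a FASTA-formatted
# string into [definition_line, sequence] pairs, so that only the bare
# sequence would be handed to a constructor such as Bio::Sequence::AA.new.
def parse_fasta(text)
  entries = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      # A new entry starts: keep the definition line without the '>'.
      entries << [line[1..-1], ''.dup]
    elsif !entries.empty?
      # Sequence lines belong to the latest entry; drop any whitespace.
      entries.last[1] << line.gsub(/\s+/, '')
    end
  end
  entries
end

fasta = ">seq1 example\nMASSSLSP\nATQLG\n>seq2 example\nMASSTLSP\nVTQLC\n"
parse_fasta(fasta).each do |definition, seq|
  puts "#{definition}: #{seq}"
end
# prints:
# seq1 example: MASSSLSPATQLG
# seq2 example: MASSTLSPVTQLC
```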
> factory = Bio::ClustalW.new
> report = factory.query_align(seqs)
Also, factory.query(seqs) eventually passes seqs to Bio::OriginalAlignment.new(seqs),
and this method also appears to expect an array of Bio::Sequence::AA objects,
or an array of objects that respond to one of the 'seq', 'naseq' or 'aaseq' methods.
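The duck typing described above can be illustrated with a small plain-Ruby sketch. This is not BioRuby's actual code. The helper sequence_of, the Entry struct, and the order in which aaseq, naseq and seq are tried are all assumptions made for illustration.

```ruby
# Illustration only (not BioRuby's code): accept either a bare sequence
# string or any object that responds to aaseq, naseq or seq.
# The order in which the methods are tried is an assumption.
SEQUENCE_METHODS = [:aaseq, :naseq, :seq].freeze

def sequence_of(obj)
  SEQUENCE_METHODS.each do |name|
    return obj.public_send(name).to_s if obj.respond_to?(name)
  end
  obj.to_s # fall back to treating obj itself as the sequence
end

# A stand-in for a FASTA entry object that exposes #seq.
Entry = Struct.new(:definition, :seq)

inputs = ['MASSSLSP', Entry.new('tobacco', 'MASSTLSP')]
p inputs.map { |x| sequence_of(x) }
# prints ["MASSSLSP", "MASSTLSP"]
```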
In your case, I recommend putting your sequences in a separate file
and reading them through the Bio::FlatFile interface (which recognizes
FASTA-formatted files and iterates over each entry, creating a
Bio::FastaFormat object for it), then running clustalw through BioRuby.
% ruby run_clustalw.rb sequences.txt
*** :** .: ** *: .: * :: * *:* **** **** **: :*******::*** **:**.:** *.* * . .* *** ****:. ****** *. ***************** * ****:**************:* *****:********* **:************.*.*:*: ***:****.*******: ***:
The revised version of your script is as follows:
run_clustalw.rb:
------------------------------------------------------------
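#!/usr/bin/env ruby
require 'bio'

# Collect the sequences from the FASTA file(s) named on the command line
# (ARGF falls back to standard input when no file is given).
seqs = []
Bio::FlatFile.auto(ARGF).each do |fasta|
  seqs << fasta.seq
end

# Run clustalw and print the conservation line of the resulting alignment.
clustalw = Bio::ClustalW.new
report = clustalw.query(seqs)
puts report.alignment.match_line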
------------------------------------------------------------
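The subject of this thread is counting conserved residues, so the following simplified plain-Ruby sketch shows what the '*' marks in a match line encode. It is not BioRuby's match_line implementation. The helper conserved_line and the toy aligned sequences are hypothetical. The sketch marks only fully identical, gap-free columns. Clustal-style output like the line printed above also uses ':' and '.' for columns whose residues fall into similar amino acid groups, and those groups are not modelled here.

```ruby
# Simplified sketch (not BioRuby's match_line): put '*' under each column
# where every aligned sequence has the same residue, and ' ' otherwise.
def conserved_line(aligned)
  lengths = aligned.map(&:length).uniq
  raise ArgumentError, 'aligned sequences must have equal length' unless lengths.size == 1

  (0...lengths.first).map do |i|
    column = aligned.map { |s| s[i] }
    # A gap ('-') in any sequence means the column is not conserved.
    column.uniq.size == 1 && column.first != '-' ? '*' : ' '
  end.join
end

aligned = [
  'MASS-LSPAT',
  'MASSTLSPVT',
  'MASSTLSHVT'
]
line = conserved_line(aligned)
puts aligned, line
puts "#{line.count('*')} of #{line.length} columns fully conserved"
# prints (after the three sequences):
# **** **  *
# 7 of 10 columns fully conserved
```

Assuming match_line returns a plain String, which the puts in the script suggests, the same count('*') call could be applied to report.alignment.match_line to count the fully conserved positions in the real alignment.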
sequences.txt:
------------------------------------------------------------
>gi|9843639|emb|CAC03598.1| Rieske FeS protein [Arabidopsis thaliana] (arabidopsis)
MASSSLSPATQLGSSRSALMAMSSGLFVKPTKMNHQMVRKEKIGLRIACQASSIPADRVPDMEKRKTLNL
LLLGALSLPTGYMLVPYATFFVPPGTGGGGGGTPAKDALGNDVVAAEWLKTHGPGDRTLTQGLKGDPTYL
VVENDKTLATYGINAVCTHLGCVVPWNKAENKFLCPCHGSQYNAQGRVVRGPAPLSLALAHADIDEAGKV
LFVPWVETDFRTGDAPWWS
>gi|19995|emb|CAA46808.1| Rieske FeS [Nicotiana tabacum] (tobacco)
MASSTLSPVTQLCSSKSGLSSVSQCLLVKPMKINSHGLGKDKRMKVKCMATSIPADDRVPDMEKRNLMNL
LLLGALSLPTAGMLVPYGTFFVPPGSGGGSGGTPAKDALGNDVIASEWLKTHPPGNRTLTQGLKGDPTYL
VVENDGTLATYGINAVCTHLGCVVPFNAAENKFICPCHGSQYNNQGRVVRGPAPLSLALAHADIDDGKVV
FVPWVETDFRTGEDPWWA
>gi|226151|prf||1412276A rieske FeS precursor protein [spinach] (spinach)
MIISIFNQLHLTENSSLMASFTLSSATPSQLCSSKNGMFAPSLALAKAGRVNVLISKERIRGMKLTCQAT
SIPADNVPDMQKRETLNLLLLGALSLPTGYMLLPYASFFVPPGGGAGTGGTIAKDALGNDVIAAEWLKTH
APGDRTLTQGLKGDPTYLVVESDKTLATFGINAVCTHLGCVVPFNAAENKFICPCHGSQYNNQGRVVRGP
APLSLALAHCDVDDGKVVFVPWTETDFRTGEAPWWSA
>gi|115472727|ref|NP_001059962.1| Os07g0556200 [Oryza sativa (japonica cultivar-group)] (rice)
MASTALSTASNPTQLCRSRASLGKPVKGLGFGRERVPRTATTITCQAASSIPADRVPDMGKRQLMNLLLL
GAISLPTVGMLVPYGAFFIPAGSGNAGGGQVAKDKLGNDVLAEEWLKTHGPNDRTLTQGLKGDPTYLVVE
ADKTLATYGINAVCTHLGCVVPWNAAENKFICPCHGSQYNNQGRVVRGPAPLSLALVHADVDDGKVLFVP
WVETDFRTGDNPWWA
>gi|37222949|gb|AAQ90151.1| putative Rieske Fe-S protein precursor [Solanum tuberosum] (potato)
MASSTLSHVTPSQLCSSKSGVSSVSQALLVKPMKINGHGMGKDKRMKAKCMAASIPADDRVPDMEKRNLM
NLLLLGALALPTGGMLVPYATFFAPPGSGGGSSGTIAKDANGNDVVVTEWLKTHSPGTRTLTQGLKGDPT
YLVVENDGTLATYGINAVCTHLGCVVPWNTAENKFICPCHGSQYNNQGKVVRGPAPLSLALAHADIDDGK
VVFVPWVETDFRTGDSPWWA
------------------------------------------------------------
Hope this helps.
Toshiaki