[BioRuby] csfasta parser
Tomoaki NISHIYAMA
tomoakin at kenroku.kanazawa-u.ac.jp
Mon Aug 16 14:08:09 UTC 2010
Hi,
I modified fasta.rb to parse the csfasta format, a modified version of
FASTA that handles color-space sequences produced by SOLiD sequencers
from Life Technologies (formerly Applied Biosystems).
The most important difference is that the sequence is a single
nucleotide followed by colors encoded as digits [0-3]. When the
sequencer fails to assign a color, that position may be represented by
a dot ".".
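
To illustrate the sequence layout described above, here is a minimal
sketch in Ruby. It is not taken from csfasta.rb: the helper name
split_color_read, the CS_READ pattern, and the example read are
assumptions made for illustration only.

```ruby
# Hypothetical helper, not part of BioRuby or csfasta.rb.
# A color-space read is assumed to be one leading base followed by
# color calls 0-3, with "." where no color could be assigned.
CS_READ = /\A([ACGTacgt])([0-3.]*)\z/

def split_color_read(read)
  m = CS_READ.match(read.strip)
  raise ArgumentError, "not a color-space read: #{read.inspect}" unless m

  base = m[1].upcase
  # Keep unassigned positions as nil so they are not mistaken for a color.
  colors = m[2].each_char.map { |c| c == "." ? nil : c.to_i }
  [base, colors]
end

base, colors = split_color_read("T0120.3")
# base is "T"; colors is [0, 1, 2, 0, nil, 3]
```

An ordinary nucleotide string such as "ACGT" is rejected by this
pattern, which is one way to tell the two formats apart.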
The other difference is that a mapping location may be appended to the
definition line, separated from the ID by a comma "," with no space.
Thus the entry_id extraction should be based on the comma rather than
on a space.
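
A rough sketch of comma-based extraction follows. The helper name
split_definition and the example definition line (including the form of
the mapping location) are made up for illustration; the real code in
csfasta.rb may do this differently.

```ruby
# Hypothetical helper, not part of BioRuby or csfasta.rb.
# Splits a definition line at the first comma only, so that any further
# commas stay inside the mapping part.
def split_definition(line)
  definition = line.sub(/\A>/, "").chomp
  entry_id, mapping = definition.split(",", 2)
  [entry_id, mapping]
end

# The mapping string below is an invented example value.
entry_id, mapping = split_definition(">1_88_1830_F3,chr1_-2345.1")
# entry_id is "1_88_1830_F3"; mapping is "chr1_-2345.1"
```

When the definition line carries no mapping location, mapping is nil
and entry_id is the whole definition.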
In some cases the interest is in the mapping location or the entry ID
itself, and the sequence data is not touched at all. So I made it store
the entry and definition at initialization, while the data is not
extracted then but left for lazy evaluation.
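
The lazy-evaluation idea can be sketched roughly as below. The class
name LazyColorEntry and its internals are assumptions for illustration
and are not the implementation in csfasta.rb; only the method names
entry_id, definition and data are meant to mirror the description above.

```ruby
# Hypothetical class, not the code in csfasta.rb.
class LazyColorEntry
  attr_reader :definition, :entry_id

  def initialize(text)
    # Only the header is parsed here; the sequence lines are kept raw.
    header, @raw = text.split("\n", 2)
    @definition = header.to_s.sub(/\A>/, "")
    @entry_id = @definition.split(",", 2).first
    @data = nil
  end

  # The sequence is extracted on first access and then cached.
  def data
    @data ||= @raw.to_s.gsub(/\s+/, "")
  end
end

entry = LazyColorEntry.new(">1_88_1830_F3,chr1_-2345.1\nT0120.3\n")
entry.entry_id  # header fields are available immediately
entry.data      # sequence is joined only when asked for
```

With this layout, scanning a large file for IDs or mapping locations
never pays the cost of cleaning up the sequence lines.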
The code can be found at
http://github.com/tomoakin/bioruby/blob/master/lib/bio/db/csfasta.rb
Note that naseq etc. are not tested.
--
Tomoaki NISHIYAMA
Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan