[BioRuby] Problem with Bio::GFF::GFF2
George Githinji
georgkam at gmail.com
Tue Jun 9 14:24:38 UTC 2009
Thank you so much Naohisa for the excellent explanation!!
However,
bep_gff.records.each do |record|
  p record.seqname
end
returns
"seq1 bepipred-1.0b epitope 1 1 0.173 . . ."
which is not what was intended, and
record.score, record.start, etc. all return nil.
:(
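This symptom is consistent with a file whose columns are separated by spaces rather than the tab characters that GFF expects, although that is an assumption about the real file. A minimal plain-Ruby sketch (it does not use BioRuby; the sample line, field_names and the other variable names are only for illustration) shows the difference:

```ruby
# Hypothetical sample line: the nine columns are separated by single spaces,
# as they appear in the output above (an assumption about the real file).
line = "seq1 bepipred-1.0b epitope 1 1 0.173 . . ."

# Column names used only for this illustration.
field_names = %w[seqname source feature start end score strand frame attributes]

# Splitting on tabs, as a tab-delimited format requires, finds a single field,
# so the whole line ends up in the first column.
tab_fields = line.split("\t")

# Splitting on runs of whitespace recovers the individual columns.
ws_fields = line.split(/\s+/)

# Pair each column name with its value.
record = field_names.zip(ws_fields).to_h

# Re-join with tabs to get a line in the delimiter GFF expects.
tabbed_line = ws_fields.join("\t")

puts tab_fields.length
puts ws_fields.length
puts record["score"]
```

Splitting on whitespace is only safe here because no column contains a space; a real "attributes" field with spaces in it would need more careful handling.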
On Tue, Jun 9, 2009 at 4:44 PM, Naohisa GOTO <ngoto at gen-info.osaka-u.ac.jp> wrote:
> Hi George,
>
> On Tue, 9 Jun 2009 15:26:45 +0300
> George Githinji <georgkam at gmail.com> wrote:
>
> > Hi all,
> > I am trying to parse a GFF file. The file looks like this:
> >
> > ##gff-version 2
> > ##source-version bepipred-1.0b
> > ##date 2009-06-09
> > ##Type Protein seq1
> > # seqname source feature start end score N/A ?
> > # ---------------------------------------------------------------------------
> > seq1 bepipred-1.0b epitope 1 1 0.173 . . .
> > seq1 bepipred-1.0b epitope 2 2 -0.043 . . .
> > seq1 bepipred-1.0b epitope 3 3 -0.014 . . .
> > seq1 bepipred-1.0b epitope 4 4 0.144 . . .
> > seq1 bepipred-1.0b epitope 5 5 0.250 . . .
> > seq1 bepipred-1.0b epitope 6 6 0.218 . . .
> >
> > ....truncated
>
> The above GFF records do not contain any "attributes".
> The field definition of each GFF line is:
> <seqname> <source> <feature> <start> <end> <score> <strand> <frame>
> [attributes] [comments]
>
> When talking about GFF, the word "attributes" refers to the
> "attributes" field in each GFF line.
>
> See the GFF2 specifications document for details.
> http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
>
> > I have written the following lines with the aim of extracting the start,
> > end and score attributes. But before that, I wanted to know whether the full
> > attributes are available, so I did the following.
> >
> > require 'rubygems'
> > require 'bio'
> > bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff'))
> >
> > bep_gff.records.each do |record|
> >   puts record.attributes_to_hash.inspect
> > end
> >
> > However, I get empty hashes.
> > Any ideas?
>
> Because the Bio::GFF::GFF2::Record#attributes_to_hash method returns
> the "attributes" field as a hash, and the "attributes" fields of all the
> above GFF2 records are empty, getting empty hashes is the expected result.
>
> If you really want a hash, adding each field into a hash would
> be the easiest way. For example,
>
> bep_gff.records.each do |record|
>   h = {}
>   h['seqname'] = record.seqname
>   h['source'] = record.source
>   h['feature'] = record.feature
>   h['start'] = record.start
>   h['end'] = record.end
>   h['score'] = record.score
>   h['strand'] = record.strand
>   h['frame'] = record.frame
>   h['attributes'] = record.attributes_to_hash
>   p h
> end
>
> Bio::GFF::GFF2::Record has seqname, source, feature, start, end,
> score, strand and frame attributes (in the Ruby sense of the word),
> which are inherited from the Bio::GFF::Record class.
> Normally, it is more natural to use these Ruby attributes
> directly, without creating a hash.
>
> Note that using attributes_to_hash may lose some data when
> there are two or more values with the same tag name in an
> "attributes" field.
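
The data loss with repeated tag names can be illustrated with a plain-Ruby sketch. It does not call BioRuby; the tag/value pairs and variable names are hypothetical, and which value BioRuby itself keeps for a repeated tag is not shown here.

```ruby
# Hypothetical tag/value pairs from one "attributes" field,
# where the tag "Note" occurs twice.
pairs = [["ID", "ep1"], ["Note", "first"], ["Note", "second"]]

# Assigning into a plain Hash keeps one value per key:
# each later assignment overwrites the earlier one.
flat = {}
pairs.each { |tag, value| flat[tag] = value }

# Collecting values into an Array per tag keeps every value.
grouped = Hash.new { |hash, key| hash[key] = [] }
pairs.each { |tag, value| grouped[tag] << value }

puts flat["Note"]
puts grouped["Note"].join(", ")
```

Grouping values per tag is one way to keep every value when a tag repeats.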
>
> When creating new data that uses "attributes" extensively,
> GFF3 is recommended, because the design of GFF2 attributes is
> somewhat broken.
>
> > Thank you
> >
> >
> > --
> > ---------------
> > Sincerely
> > George
> >
> > Skype: george_g2
> > Blog: http://biorelated.wordpress.com/
>
> Your blog is nice!
>
> --
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
>
--
---------------
Sincerely
George
Skype: george_g2
Blog: http://biorelated.wordpress.com/