[BioRuby] Problem with Bio::GFF::GFF2
Naohisa GOTO
ngoto at gen-info.osaka-u.ac.jp
Wed Jun 10 06:14:30 UTC 2009
On Tue, 9 Jun 2009 17:24:38 +0300
George Githinji <georgkam at gmail.com> wrote:
> Thank you so much Naohisa for the excellent explanation!!
> however
>
> bep_gff.records.each do |record|
>   p record.seqname
> end
>
> returns
> "seq1 bepipred-1.0b epitope 1 1 0.173 . . ."
>
>
> which is not what is intended and
> record.score, record.start etc all return nil.
It seems this is NOT valid GFF2 format.
In GFF formats, the field delimiter must be a TAB ("\t" in Ruby).
In the data above, however, the characters between "seq1" and
"bepipred-1.0b" appear to be spaces (" " in Ruby) rather than a TAB,
so the whole line is read as a single seqname field and the
remaining fields (start, score, etc.) are nil.
Copying and pasting from a terminal or web browser, or the
autocomplete function of a text editor or word processor,
can often produce this kind of corrupted data.
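
If the file cannot be regenerated with proper TABs, one possible
workaround is to re-delimit the lines before parsing. The following is
only a minimal sketch: retab_gff2_line is a hypothetical helper, not
part of BioRuby, and it assumes that none of the first eight fields
contains a space.

```ruby
# Hypothetical helper (not part of BioRuby): rewrite a space-delimited
# GFF2 line as a TAB-delimited one.
# Assumption: none of the first eight fields contains whitespace.
def retab_gff2_line(line)
  stripped = line.chomp
  # Leave blank lines and comment/directive lines ("#", "##") untouched.
  return stripped if stripped.empty? || stripped.start_with?('#')
  # Lines that already contain a TAB are assumed to be correctly delimited.
  return stripped if stripped.include?("\t")
  # Split on runs of whitespace into at most 9 columns, so that spaces
  # inside the 9th (attributes) column are preserved.
  stripped.split(/\s+/, 9).join("\t")
end

broken = "seq1 bepipred-1.0b epitope 1 1 0.173 . . ."
fixed  = retab_gff2_line(broken)
p fixed.split("\t")
```

The repaired lines, joined with "\n", can then be given to
Bio::GFF::GFF2.new as a string instead of the original file.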
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
>
> :(
>
>
>
>
>
> On Tue, Jun 9, 2009 at 4:44 PM, Naohisa GOTO
> <ngoto at gen-info.osaka-u.ac.jp>wrote:
>
> > Hi George,
> >
> > On Tue, 9 Jun 2009 15:26:45 +0300
> > George Githinji <georgkam at gmail.com> wrote:
> >
> > > Hi all,
> > > I am trying to parse a GFF file. The file looks like this:
> > >
> > > ##gff-version 2
> > > ##source-version bepipred-1.0b
> > > ##date 2009-06-09
> > > ##Type Protein seq1
> > > # seqname source feature start end score N/A ?
> > > # ---------------------------------------------------------------------------
> > > seq1 bepipred-1.0b epitope 1 1 0.173 . . .
> > > seq1 bepipred-1.0b epitope 2 2 -0.043 . . .
> > > seq1 bepipred-1.0b epitope 3 3 -0.014 . . .
> > > seq1 bepipred-1.0b epitope 4 4 0.144 . . .
> > > seq1 bepipred-1.0b epitope 5 5 0.250 . . .
> > > seq1 bepipred-1.0b epitope 6 6 0.218 . . .
> > >
> > > ....truncated
> >
> > The above GFF records do not contain any "attributes".
> > The field definition of each GFF line is:
> > <seqname> <source> <feature> <start> <end> <score> <strand> <frame>
> > [attributes] [comments]
> >
> > When talking about GFF, the word "attributes" refers to the
> > "attributes" field in each GFF line.
> >
> > See the GFF2 specifications document for details.
> > http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
> >
> > > and I have written the following lines with the aim of extracting
> > > the start, end and score attributes. But before that I wanted to
> > > know whether the full attributes are available, so I did the
> > > following.
> > >
> > > require 'rubygems'
> > > require 'bio'
> > > bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff'))
> > >
> > > bep_gff.records.each do |record|
> > > puts record.attributes_to_hash.inspect
> > > end
> > >
> > > However, i get empty hashes.
> > > Any ideas?
> >
> > Because the Bio::GFF::GFF2::Record#attributes_to_hash method returns
> > the "attributes" field as a hash, and the "attributes" field is empty
> > in all of the GFF2 records above, getting empty hashes is the
> > expected result.
> >
> > If you really want a hash, adding each field into a hash would
> > be the easiest way. For example,
> >
> > bep_gff.records.each do |record|
> >   h = {}
> >   h['seqname'] = record.seqname
> >   h['source'] = record.source
> >   h['feature'] = record.feature
> >   h['start'] = record.start
> >   h['end'] = record.end
> >   h['score'] = record.score
> >   h['strand'] = record.strand
> >   h['frame'] = record.frame
> >   h['attributes'] = record.attributes_to_hash
> >   p h
> > end
> >
> > Bio::GFF::GFF2::Record has seqname, source, feature, start, end,
> > score, strand and frame attributes (in the Ruby sense of the word),
> > which are inherited from the Bio::GFF::Record class.
> > Normally, it is more natural to use these Ruby attributes
> > directly, without creating a hash.
> >
> > Note that using attributes_to_hash may lose some data when
> > there are two or more values with the same tag name in an
> > "attributes" field.
> >
> > When creating new data that uses "attributes" extensively,
> > GFF3 is recommended, because the design of GFF2 attributes is
> > somewhat broken.
> >
> > > Thank you
> > >
> > >
> > > --
> > > ---------------
> > > Sincerely
> > > George
> > >
> > > Skype: george_g2
> > > Blog: http://biorelated.wordpress.com/
> >
> > Your blog is nice!
> >
> > --
> > Naohisa Goto
> > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> >
>
>
>
> --
> ---------------
> Sincerely
> George
>
> Skype: george_g2
> Blog: http://biorelated.wordpress.com/
>