[Bioperl-l] Added sequence parsing code to Bio::Tools::GFF

Chris Mungall cjm at fruitfly.org
Mon Jul 12 23:18:59 EDT 2004


I have added sequence parsing code to the GFF parser; note that sequence
data is only available in GFF3.

It should now be possible to create a Bio::SeqIO::gff3 class, which would
be a short wrapper to Bio::Tools::GFF. Most people would still want to use
the Tools parser to parse on a per-feature basis, but the option of
treating gff3 in a similar fashion to genbank/embl/chadoxml/etc via SeqIO
would be there.

According to the GFF3 spec the sequence data can come after or before the
relevant features; this means that the parser has the potential to be a
memory hog (but then the existing SeqIO classes already are with genbank
whole-chromosome entries).
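
To illustrate the memory point, here is a minimal sketch in plain Perl. It
is not the Bio::Tools::GFF code; read_gff3 is a hypothetical helper, and it
assumes tab-separated feature lines followed by a ##FASTA section. Every
feature has to be held in memory until the matching sequence has been read:

  use strict;
  use warnings;

  # Hypothetical minimal reader (not Bio::Tools::GFF): features are
  # cached per seq_id until the ##FASTA section supplies the sequence.
  sub read_gff3 {
      my ($fh) = @_;
      my (%features, %seq, $in_fasta, $id);
      while (my $line = <$fh>) {
          chomp $line;
          if ($line =~ /^##FASTA/) { $in_fasta = 1; next }
          if ($in_fasta) {
              if    ($line =~ /^>(\S+)/) { $id = $1; $seq{$id} = '' }
              elsif (defined $id)        { $seq{$id} .= $line }
              next;
          }
          next if $line =~ /^#/ || $line !~ /\S/;   # directives, blanks
          my @col = split /\t/, $line;
          push @{ $features{ $col[0] } },
              { type => $col[2], start => $col[3], end => $col[4] };
      }
      return (\%features, \%seq);
  }

  # Made-up example data: one feature, then its sequence.
  my $gff = join "\n",
      '##gff-version 3',
      join("\t", 'chr1', 'demo', 'gene', 3, 8, '.', '+', '.', 'ID=g1'),
      '##FASTA',
      '>chr1',
      'ACGTACGTAC',
      '';
  open my $fh, '<', \$gff or die $!;
  my ($features, $seqs) = read_gff3($fh);
  for my $f (@{ $features->{chr1} }) {
      my $len = $f->{end} - $f->{start} + 1;
      my $sub = substr $seqs->{chr1}, $f->{start} - 1, $len;
      print "$f->{type} $f->{start}..$f->{end} $sub\n";
  }

Both hashes grow with the size of the file, which is the overhead I mean.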

I've included the new docs from the gff parser below; if people agree with
this general means of handling sequence data then I'll go ahead and add a
Bio::SeqIO::gff3 as well.

=head1 GFF3 AND SEQUENCE DATA

[added by cjm 2004/07/09]

GFF3 supports sequence data; see
http://song.sourceforge.net/gff3-jan04.shtml

There are a number of ways to deal with this:

If you call

  $gffio->ignore_sequence_data_toggle(1)

prior to parsing, the sequence data is ignored; this is useful if you
just want the features. It avoids the memory overhead of building and
caching sequences.

Alternatively, you can call either

  $gffio->get_all_seqs()

or

  $gffio->seq_id_by_h()

at the B<end> of parsing to get either a list or a hashref of Bio::Seq
objects, respectively (see the documentation for each of these methods).

Note that these objects will not have the features attached - you have
to do this yourself, OR call

  $gffio->features_attached_to_seqs_toggle(1)

PRIOR to parsing; this will ensure that the Seqs have the features
attached, i.e. you will then be able to call

  $seq->get_SeqFeatures();

and use Bio::SeqIO methods.

Note that auto-attaching the features to seqs will incur a higher
memory overhead, as the features must be cached until the sequence data
is found.
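
Putting the calls above together, a minimal sketch might look like the
following. The file name is made up, and the toggle and accessor names are
taken as written in this section; check them against the POD of the
installed Bio::Tools::GFF before relying on them.

  use strict;
  use warnings;
  use Bio::Tools::GFF;

  # 'example.gff3' is a hypothetical input file.
  my $gffio = Bio::Tools::GFF->new(-file        => 'example.gff3',
                                   -gff_version => 3);

  # Must be set PRIOR to parsing (method name as given above).
  $gffio->features_attached_to_seqs_toggle(1);

  # Read every feature so that the parser reaches the sequence data.
  while (my $feature = $gffio->next_feature()) {
      # per-feature processing could go here
  }

  # Only valid at the end of parsing (method name as given above).
  for my $seq ($gffio->get_all_seqs()) {
      for my $feat ($seq->get_SeqFeatures()) {
          printf "%s\t%s\t%d\t%d\n",
              $seq->display_id, $feat->primary_tag,
              $feat->start, $feat->end;
      }
  }
  $gffio->close();

If only the features are needed, replacing the toggle call with
$gffio->ignore_sequence_data_toggle(1) and dropping the final loop avoids
building the sequence objects at all.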

=cut


