[Biopython-dev] GFF3 files in Bio.SeqIO
Peter
biopython-dev at maubp.freeserve.co.uk
Tue Feb 27 01:34:04 UTC 2007
Peter wrote:
> Leighton, you also mentioned parsing the NCBI's GFF files, which seem to
> be a tab separated variable dump of the information found in a GenBank
> file's features table (link to documentation welcome).
>
> An entire GFF file could be turned into a single SeqRecord with no
> sequence, but with many sub features as SeqFeatures (akin to the results
> of the existing "genbank" parser). The location information would be
> simplified for GFF.
>
> Also, it looks like parsing just the CDS entries from a GFF file into
> "sequence free" SeqRecords would also be sensible... (akin to the
> existing "genbank-cds" parser).
I went through my old emails, and actually you did point me in this
direction:
http://song.sourceforge.net/gff3.shtml
http://www.sequenceontology.org/gff3.shtml
The file format does look much more complicated than I had first
thought. Interestingly, the file format does allow FASTA records to
be appended to it; however, the NCBI at least does not do this.
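To illustrate the appended FASTA section, here is a minimal sketch of
splitting a GFF3 file at the "##FASTA" directive, which the GFF3
specification uses to mark where the sequences begin. The function name
split_gff3 and the example data are my own inventions for illustration,
not part of Biopython.

```python
import io


def split_gff3(handle):
    """Split GFF3 lines into (annotation_lines, fasta_lines).

    Hypothetical helper: everything before the ##FASTA directive is
    treated as annotation, everything after it as FASTA text.
    """
    annotation, fasta = [], []
    in_fasta = False
    for line in handle:
        line = line.rstrip("\n")
        if not in_fasta and line.startswith("##FASTA"):
            # The directive itself belongs to neither section.
            in_fasta = True
            continue
        (fasta if in_fasta else annotation).append(line)
    return annotation, fasta


# Made-up example data: one feature line followed by one FASTA record.
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\texample\tgene\t1\t9\t.\t+\t.\tID=gene1\n"
    "##FASTA\n"
    ">chr1\n"
    "ATGAAATAA\n"
)
annotation, fasta = split_gff3(example)
```

A file without the directive (as with the NCBI files) simply gives an
empty FASTA list, so the same code would cope with both cases.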
Perhaps a more general GFF3 parser would be more useful than a sequence
orientated one for Bio.SeqIO?
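As a rough idea of what "general" might mean here, below is a sketch
that parses one GFF3 feature line into a plain record without involving
SeqRecord at all. It relies only on the nine tab-separated columns and
the tag=value attribute syntax from the GFF3 specification. The names
GFF3Feature and parse_gff3_line are hypothetical, and I am assuming the
start and end columns are always integers.

```python
from collections import namedtuple
from urllib.parse import unquote

# Hypothetical record type: one field per GFF3 column.
GFF3Feature = namedtuple(
    "GFF3Feature",
    "seqid source type start end score strand phase attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 line; return None for blank lines, comments and directives."""
    if not line.strip() or line.startswith("#"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(columns))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = columns

    # Attributes are semicolon-separated tag=value pairs; a value may hold
    # several comma-separated entries, and reserved characters are URL-escaped.
    attributes = {}
    if attr_text != ".":
        for pair in attr_text.split(";"):
            if not pair:
                continue
            tag, _, value = pair.partition("=")
            attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]

    return GFF3Feature(
        seqid=unquote(seqid),
        source=source,
        type=ftype,
        start=int(start),  # assumption: start and end are always numeric
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attributes,
    )


# Made-up example line with a multi-valued Parent and an escaped space.
feature = parse_gff3_line(
    "chr1\texample\tCDS\t10\t90\t.\t+\t0\tID=cds1;Parent=mrna1,mrna2;Note=hello%20world\n"
)
```

Records like this could later be mapped onto SeqFeature objects for
Bio.SeqIO, or used directly by code that never needs the sequence.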
Peter