[Biopython-dev] Bio.GFF and Brad's code

Fri Apr 17 16:44:34 UTC 2009

--- On Fri, 4/17/09, Brad Chapman <chapmanb at 50mail.com> wrote:
> The GFF parser right now is really generating SeqFeature
> objects for each GFF line; the top level SeqRecords are a
> collection that holds the individual features. The SeqFeature
> object is pretty similar to GFF and the generic object you are
> proposing. For instance, here is a GFF line and the relevant
> attributes from SeqFeature for the line:
> 
> I	Orfeome	PCR_product	12759747	12764936	.	-	.	PCR_product "mv_B0019.1" ; Amplified 1 ; Amplified 1
> 
> type: PCR_product
> location: [12759746:12764936]
> strand: -1
> qualifiers: 
>         Key: amplified, Value: ['1']
>         Key: pcr_product, Value: ['mv_B0019.1']
>         Key: source, Value: ['Orfeome']
> 

Just to make sure I understand how this works, looking at your previous code example:

>>> from BCBio.GFF.GFFParser import GFFAddingIterator
>>> gff_iterator = GFFAddingIterator()
>>> rec_dict = gff_iterator.get_all_features(gff_file)

> The returned dictionary is like a dictionary from SeqIO.to_dict;
> keys are ids and values are SeqRecords.

What will be the key in rec_dict for the example GFF line above? Is it the "I" in the first column, as in

rec_dict["I"] = a SeqRecord with the SeqFeature you described above?
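
To make the question concrete, here is a minimal sketch in plain Python of what I assume is happening. It does not use BCBio.GFF or Biopython at all: parse_gff_line and group_by_seqid are hypothetical names of mine, plain dicts stand in for SeqFeature and SeqRecord, and both the de-duplication of repeated attribute values and the use of the first column as the dictionary key are guesses based on the output you showed.

```python
from collections import defaultdict


def parse_gff_line(line):
    """Split one tab-delimited GFF line into (seqid, feature dict).

    Hypothetical helper for illustration only, not part of BCBio.GFF.
    """
    cols = line.rstrip("\n").split("\t")
    seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols[:9]

    qualifiers = defaultdict(list)
    qualifiers["source"].append(source)
    # GFF2-style attributes: 'Key value ; Key value', values optionally quoted.
    for part in attrs.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        key = key.lower()
        value = value.strip().strip('"')
        # Assumption: repeated key/value pairs are collapsed to one entry.
        if value not in qualifiers[key]:
            qualifiers[key].append(value)

    feature = {
        "type": ftype,
        # GFF coordinates are 1-based inclusive; Python slices are 0-based,
        # end-exclusive, so only the start shifts by one.
        "location": (int(start) - 1, int(end)),
        # Assumption: anything other than '+' or '-' becomes None.
        "strand": {"+": 1, "-": -1}.get(strand),
        "qualifiers": dict(qualifiers),
    }
    return seqid, feature


def group_by_seqid(lines):
    """Collect features under the first-column identifier (my assumed key)."""
    records = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        seqid, feature = parse_gff_line(line)
        records[seqid].append(feature)
    return dict(records)


example_line = (
    "I\tOrfeome\tPCR_product\t12759747\t12764936\t.\t-\t.\t"
    'PCR_product "mv_B0019.1" ; Amplified 1 ; Amplified 1'
)
rec_dict = group_by_seqid([example_line])
```

If my reading is right, rec_dict here has the single key "I", and the one feature stored under it carries the type, location, strand and qualifiers listed in your message.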

Best,

--Michiel