[Biopython] zero-length feature

Peter biopython at maubp.freeserve.co.uk
Mon Mar 22 11:41:52 UTC 2010


On Mon, Mar 22, 2010 at 11:37 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Does your genome have a single N (or n) character at this point?
>
> If so, it does make sense to use 422950..422950 to mean that
> single letter - it really is a feature of length one. That should be
> possible with the existing (unmodified) Biopython EMBL/GenBank
> output. Note that in python notation this would be the region
> [422949:422950], where start != end but instead start+1 == end.
>
> If however the gap isn't explicitly in the genome string, I think you
> should be using something like 422950^422951 to indicate the
> gap is between bases 422950 and 422951. This is a zero length
> feature.
>
> Perhaps I have misunderstood your aim?
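
The coordinate arithmetic in the quoted message can be sketched in plain Python. This is only an illustration: the helper names embl_range_to_slice and embl_between_to_slice are hypothetical and not part of Biopython. It assumes EMBL/GenBank locations are 1-based and inclusive, while Python slices are 0-based and end-exclusive.

```python
def embl_range_to_slice(start, end):
    """Convert a 1-based inclusive EMBL range 'start..end' to a Python (start, end) slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts down by one; the inclusive end becomes the exclusive end.
    return start - 1, end


def embl_between_to_slice(left, right):
    """Convert an EMBL between-bases location 'left^right' to a zero-length Python slice."""
    if left < 1 or right != left + 1:
        raise ValueError("expected adjacent bases, e.g. 422950^422951")
    # The site lies after 1-based base `left`, i.e. at 0-based offset `left`.
    return left, left


single_base = embl_range_to_slice(422950, 422950)    # feature covering one letter
between = embl_between_to_slice(422950, 422951)      # feature covering no letters

print(single_base, single_base[1] - single_base[0])
print(between, between[1] - between[0])
```

The first case gives a slice of length one (start+1 == end), the second a slice of length zero (start == end), which is the distinction drawn above.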

I should perhaps include a quote from the EMBL documentation
to explain my question a little further:
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

Feature Key           gap

Definition            gap in the sequence
Mandatory qualifiers  /estimated_length=unknown or <integer>
Optional qualifiers   /experiment="text"
                      /inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
                      /map="text"
                      /note="text"
Comment               the location span of the gap feature for an unknown
                      gap is 100 bp, with the 100 bp indicated as 100 "n"'s in
                      the sequence.  Where estimated length is indicated by
                      an integer, this is indicated by the same number of
                      "n"'s in the sequence.
                      No upper or lower limit is set on the size of the gap.


In other words, I think EMBL would want you to insert a run of n
characters into the genome wherever you have a gap, so that the gap
feature then describes that run of n characters.
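
As a rough sketch of that idea in plain Python (not Biopython code; the function name insert_gap and its arguments are made up for illustration), one could insert the n characters and work out the matching 1-based location like this. It assumes the EMBL convention quoted above: 100 n's for a gap of unknown length, otherwise as many n's as the estimated length.

```python
def insert_gap(sequence, after_base, estimated_length=None):
    """Insert a run of 'n' after 1-based base `after_base`.

    Returns (new_sequence, embl_location, estimated_length_qualifier).
    estimated_length=None means the gap length is unknown.
    """
    if not 0 <= after_base <= len(sequence):
        raise ValueError("after_base must lie within the sequence")
    # Assumption from the quoted EMBL comment: unknown gaps are shown as 100 n's.
    n_count = 100 if estimated_length is None else estimated_length
    if n_count < 1:
        raise ValueError("gap must contain at least one n")
    new_sequence = sequence[:after_base] + "n" * n_count + sequence[after_base:]
    # 1-based inclusive coordinates of the inserted run.
    start = after_base + 1
    end = after_base + n_count
    qualifier = "unknown" if estimated_length is None else str(estimated_length)
    return new_sequence, "%i..%i" % (start, end), qualifier


seq, location, est = insert_gap("acgtacgt", 4, estimated_length=3)
print(seq)       # acgtnnnacgt
print(location)  # 5..7
print(est)       # 3
```

The resulting gap feature then has a real, non-zero span in the sequence, so it can be written with an ordinary start..end location.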

Peter
