[Biopython] zero-length feature
Peter
biopython at maubp.freeserve.co.uk
Mon Mar 22 11:41:52 UTC 2010
On Mon, Mar 22, 2010 at 11:37 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Does your genome have a single N (or n) character at this point?
>
> If so, it does make sense to use 422950..422950 to mean that
> single letter - it really is a feature of length one. That should be
> possible with the existing (unmodified) Biopython EMBL/GenBank
> output. Note that in Python notation this would be the region
> [422949:422950], where start != end but instead start+1 == end.
>
> If however the gap isn't explicitly in the genome string, I think you
> should be using something like 422950^422951 to indicate the
> gap is between bases 422950 and 422951. This is a zero-length
> feature.
>
> Perhaps I have misunderstood your aim?
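
To make the two coordinate conventions above concrete, here is a minimal
plain-Python sketch. The helper names embl_span_to_slice and
embl_between_to_slice are hypothetical (they are not part of Biopython), and
the toy sequence is made up for illustration. It assumes a linear sequence,
so the wrap-around between-position allowed on circular genomes is not
handled.

```python
def embl_span_to_slice(start, end):
    """Convert a 1-based inclusive EMBL span (start..end) to a Python slice."""
    return slice(start - 1, end)


def embl_between_to_slice(left, right):
    """Convert an EMBL between-position (left^right) to a zero-length slice.

    Assumes a linear sequence, so the two bases must be adjacent.
    """
    if right != left + 1:
        raise ValueError("between-positions must name adjacent bases")
    return slice(left, left)


# Toy sequence (illustrative only) with a single n at 1-based position 5.
genome = "acgtnacgt"

single_base = embl_span_to_slice(5, 5)   # EMBL 5..5, one letter long
between = embl_between_to_slice(5, 6)    # EMBL 5^6, zero letters long

print(repr(genome[single_base]))  # the single n character
print(repr(genome[between]))      # an empty string
```

With the numbers from this thread, embl_span_to_slice(422950, 422950) gives
the slice [422949:422950], while embl_between_to_slice(422950, 422951) gives
the empty slice [422950:422950].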
I should perhaps include a quote from the EMBL documentation
to explain my question a little further:

http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

Feature Key           gap

Definition            gap in the sequence

Mandatory qualifiers  /estimated_length=unknown or <integer>

Optional qualifiers   /experiment="text"
                      /inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
                      /map="text"
                      /note="text"

Comment               the location span of the gap feature for an unknown
                      gap is 100 bp, with the 100 bp indicated as 100 "n"'s in
                      the sequence. Where estimated length is indicated by
                      an integer, this is indicated by the same number of
                      "n"'s in the sequence.
                      No upper or lower limit is set on the size of the gap.

That is, I think EMBL would want you to insert a run of n characters
into the genome wherever you have a gap, so that the gap feature
then describes that run of n characters.
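
As a rough illustration of that approach, here is a minimal plain-Python
sketch. The function names insert_gap and gap_annotation are hypothetical
(not Biopython API), and the example sequence is made up. Following the EMBL
comment quoted above, it assumes a gap of unknown length is written as 100
n characters and a gap of estimated length as that many n characters.

```python
def insert_gap(sequence, after_base, estimated_length=None):
    """Insert a run of n's after 1-based position `after_base`.

    Returns the new sequence and the 1-based inclusive (start, end)
    of the inserted run. An unknown length is represented by 100 n's.
    """
    run_length = 100 if estimated_length is None else estimated_length
    new_sequence = (
        sequence[:after_base] + "n" * run_length + sequence[after_base:]
    )
    return new_sequence, (after_base + 1, after_base + run_length)


def gap_annotation(start, end, estimated_length=None):
    """Return the location string and mandatory qualifier for a gap feature."""
    length_text = "unknown" if estimated_length is None else str(estimated_length)
    return f"{start}..{end}", f"/estimated_length={length_text}"


# Toy example: a gap of estimated length 3 between bases 4 and 5.
padded, (gap_start, gap_end) = insert_gap("acgtacgt", 4, estimated_length=3)
location, qualifier = gap_annotation(gap_start, gap_end, estimated_length=3)

print(padded)     # sequence with the n's inserted
print(location)   # 1-based inclusive span covering the n's
print(qualifier)  # mandatory /estimated_length qualifier
```

Note that inserting the n characters shifts every downstream coordinate, so
any existing features after the gap would need their locations adjusted by
the length of the inserted run.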
Peter