[Biopython] zero-length feature
Anne Pajon
ap12 at sanger.ac.uk
Mon Mar 22 11:44:00 UTC 2010
My genome has a single N character at this point.
Here is the code I use to insert these gaps:
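# Add FT gap features, one per run of N characters
import logging
from Bio.SeqFeature import SeqFeature, FeatureLocation

log = logging.getLogger(__name__)

seq = record.seq  # `record` is the SeqRecord being annotated (defined earlier)
in_N = False
gap_features = []
for i in range(len(seq)):
    if seq[i] == 'N' and not in_N:
        start_N = i
        in_N = True
    # Close the run at its last N; the length check avoids reading past the end
    if in_N and (i + 1 == len(seq) or seq[i + 1] != 'N'):
        end_N = i
        if start_N == end_N:
            log.warning("gap of size 1 %s..%s", start_N, end_N)
        length = (end_N - start_N) + 1
        # Half-open Python coordinates [start_N, end_N + 1); newer Biopython
        # releases take the strand on the location rather than the SeqFeature
        gap_feature = SeqFeature(FeatureLocation(start_N, end_N + 1, strand=1),
                                 type="gap")
        gap_feature.qualifiers['estimated_length'] = [length]
        gap_features.append(gap_feature)
        in_N = False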
What should I do to make it work with the (unmodified) Biopython EMBL
output? Thanks in advance for your help.
Regards,
Anne.
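As an aside to the loop above, the same runs of N can be found with a regular expression. This is a minimal stdlib-only sketch, not code from the thread: `find_n_runs` is a hypothetical helper name, and it assumes the sequence is available as a plain string (e.g. str(record.seq)). It matches both N and n, since the reply below mentions lowercase n as well.

```python
import re
from typing import List, Tuple


def find_n_runs(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 0-based and half-open, one per run of N/n.

    Hypothetical helper; `seq` is assumed to be a plain Python string.
    """
    # Each regex match covers one maximal run of N or n characters
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


for start, end in find_n_runs("ACGTNACGTNNNNACGT"):
    # end - start is the run length, i.e. the estimated_length value
    print(start, end, end - start)
```

Each pair is already in the half-open form that FeatureLocation(start, end) expects, so it corresponds to FeatureLocation(start_N, end_N + 1) in the loop, with estimated_length equal to end - start.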
On 22 Mar 2010, at 11:37, Peter wrote:
> On Mon, Mar 22, 2010 at 11:24 AM, Anne Pajon <ap12 at sanger.ac.uk>
> wrote:
>> Hi Peter,
>>
>> Here is the feature location string I would like to achieve in the
>> EMBL output:
>>
>> FT   gap             422950..422950
>> FT                   /estimated_length=1
>>
>>
>> Regards,
>> Anne.
>
> Does your genome have a single N (or n) character at this point?
>
> If so, it does make sense to use 422950..422950 to mean that
> single letter - it really is a feature of length one. That should be
> possible with the existing (unmodified) Biopython EMBL/GenBank
> output. Note that in python notation this would be the region
> [422949:422950], where start != end but instead start+1 == end.
>
> If however the gap isn't explicitly in the genome string, I think you
> should be using something like 422950^422951 to indicate the
> gap is between bases 422950 and 422951. This is a zero length
> feature.
>
> Perhaps I have misunderstood your aim?
>
> Peter
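The coordinate mapping Peter describes can be written out as a small function. This is a standalone sketch for illustration only, not Biopython's own EMBL writer (which may format a single-base location differently); `embl_location` is a hypothetical name, and it assumes 0-based, half-open Python coordinates as input.

```python
def embl_location(start: int, end: int) -> str:
    """Format a 0-based half-open range [start:end) as an EMBL-style location.

    Hypothetical helper for illustration; not part of Biopython.
    """
    if end < start:
        raise ValueError("end must not be before start")
    if start == end:
        # Zero-length feature: the site lies between 1-based bases start and start + 1
        return f"{start}^{start + 1}"
    # One or more bases: 1-based inclusive range; a single base is written n..n
    return f"{start + 1}..{end}"


print(embl_location(422949, 422950))  # single N, Python slice [422949:422950]
print(embl_location(422950, 422950))  # gap between bases 422950 and 422951
```

The first call gives 422950..422950, the single-base form Anne asks for, and the second gives 422950^422951, the zero-length form for a gap that is not present in the sequence string.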
--
Dr Anne Pajon - Pathogen Genomics, Team 81
Sanger Institute, Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SA, United Kingdom
+44 (0)1223 494 798 (office) | +44 (0)7958 511 353 (mobile)