[Biopython] zero-length feature

Anne Pajon ap12 at sanger.ac.uk
Mon Mar 22 11:44:00 UTC 2010


My genome has a single N character at this point.

Here is the code I use to insert these gaps:

     # Add FT gap
     # (needs: from Bio.SeqFeature import SeqFeature, FeatureLocation)
     seq = record.seq
     in_N = False
     gap_features = []
     for i in range(len(seq)):
         if seq[i] == 'N' and not in_N:
             start_N = i
             in_N = True
         # A run ends at the last base or when the next base is not N
         if in_N and (i + 1 == len(seq) or seq[i+1] != 'N'):
             end_N = i
             if start_N == end_N:
                 log.warning("gap of size 1 %s..%s" % (start_N, end_N))
             length = (end_N - start_N) + 1
             # Python-style location: start is 0-based, end is exclusive
             gap_feature = SeqFeature(FeatureLocation(start_N, end_N + 1),
                                      strand=1, type="gap")
             gap_feature.qualifiers['estimated_length'] = [length]
             gap_features.append(gap_feature)
             in_N = False
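
The same scan can also be written with a regular expression. This is only a minimal sketch, not the code used in this thread: find_n_runs is a hypothetical helper name, and matching lower-case "n" as well is an assumption (the quoted reply mentions "N (or n)"). It returns 0-based, end-exclusive (start, end) pairs, which is the form that FeatureLocation(start, end) takes.

```python
import re


def find_n_runs(sequence):
    """Return (start, end) pairs for each run of N, 0-based and end-exclusive.

    Hypothetical helper for illustration; not part of Biopython.
    """
    # str() lets this accept a plain string or a Biopython Seq object
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", str(sequence))]


# Print start, end and length of every run of N
for start, end in find_n_runs("ACGTNNNACGTNAC"):
    print(start, end, end - start)
# 4 7 3
# 11 12 1
```

A run of a single N gives end == start + 1, so the resulting location always has length one or more and is never zero-length.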

What should I do to make it work with (unmodified) Biopython EMBL
output? Thanks in advance for your help.

Regards,
Anne.

On 22 Mar 2010, at 11:37, Peter wrote:

> On Mon, Mar 22, 2010 at 11:24 AM, Anne Pajon <ap12 at sanger.ac.uk> wrote:
>> Hi Peter,
>>
>> Here is the feature location string I would like to achieve in the EMBL
>> output:
>>
>> FT   gap             422950..422950
>> FT                   /estimated_length=1
>>
>>
>> Regards,
>> Anne.
>
> Does your genome have a single N (or n) character at this point?
>
> If so, it does make sense to use 422950..422950 to mean that
> single letter - it really is a feature of length one. That should be
> possible with the existing (unmodified) Biopython EMBL/GenBank
> output. Note that in python notation this would be the region
> [422949:422950], where start != end but instead start+1 == end.
>
> If however the gap isn't explicitly in the genome string, I think you
> should be using something like 422950^422951 to indicate the
> gap is between bases 422950 and 422951. This is a zero length
> feature.
>
> Perhaps I have misunderstood your aim?
>
> Peter
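
The coordinate mapping described in the quoted reply can be sketched in plain Python. The function embl_location below is a hypothetical illustration and not a Biopython function; it only shows how a 0-based, end-exclusive Python slice corresponds to a 1-based, inclusive EMBL location string, and how a zero-length position turns into a between-bases location.

```python
def embl_location(start, end):
    """Map a Python-style slice [start:end] to an EMBL-style location string.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    if start == end:
        if start == 0:
            raise ValueError("no base on the left of position 0")
        # Zero length: the site sits between 1-based bases start and start+1
        return "%i^%i" % (start, start + 1)
    # One or more bases: 1-based start, inclusive end
    return "%i..%i" % (start + 1, end)


print(embl_location(422949, 422950))  # 422950..422950
print(embl_location(422950, 422950))  # 422950^422951
```

The first call is the single-N case (start + 1 == end), the second is the zero-length case (start == end).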

--
Dr Anne Pajon - Pathogen Genomics, Team 81
Sanger Institute, Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SA, United Kingdom
+44 (0)1223 494 798 (office) | +44 (0)7958 511 353 (mobile)





