[BioPython] Iterating through an alignment to calculate the number of gaps and their lengths

Ruchira Datta ruchira.datta at gmail.com
Wed Feb 6 21:39:02 UTC 2008


Hi Matthew, try this:

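import re

seq = 'ILV--F---AAS'  # example aligned sequence; substitute your own string
contiguous_gap = re.compile(r'-+')  # one or more consecutive gap characters
gappy_regions = contiguous_gap.findall(seq)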

Now gappy_regions contains a list of the contiguous runs of gaps. For
example, if seq = 'ILV--F---AAS', then gappy_regions will be ['--', '---'].
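Matthew also asked for the number of gaps. With this approach that is simply the length of gappy_regions. Here is a minimal sketch on the same example sequence (number_of_gaps is only an illustrative name):

```python
import re

# '-' is a literal character outside a [...] class, so it needs no escaping;
# '+' is greedy, so each match is one maximal run of consecutive gaps.
contiguous_gap = re.compile(r'-+')

seq = 'ILV--F---AAS'
gappy_regions = contiguous_gap.findall(seq)

# The number of gaps is just the number of runs found.
number_of_gaps = len(gappy_regions)
print(number_of_gaps, gappy_regions)  # 2 ['--', '---']
```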

Then, to find the lengths of the gappy regions, you can just say

[len(region) for region in gappy_regions]

which, in the above example, gives

[2, 3]
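To cover the original question of looping over a whole alignment, one option is to apply the same length calculation to each row. This sketch assumes the alignment is already held as a plain dict mapping sequence names to aligned strings (a hypothetical representation chosen so the example runs without Biopython). With Biopython, iterating over an alignment loaded with Bio.AlignIO.read yields SeqRecord objects whose .id and str(record.seq) could be fed in the same way. The helper names gap_lengths and gap_lengths_groupby are illustrative. The second helper swaps in itertools.groupby, which groups equal adjacent characters, as a regex-free alternative.

```python
import re
from itertools import groupby

# Greedy pattern: each match is one maximal run of '-' characters.
CONTIGUOUS_GAP = re.compile(r'-+')


def gap_lengths(seq):
    """Return the length of each contiguous gap run in seq (regex approach)."""
    return [len(region) for region in CONTIGUOUS_GAP.findall(seq)]


def gap_lengths_groupby(seq):
    """Same result via itertools.groupby, which groups equal adjacent characters."""
    return [len(list(run)) for char, run in groupby(seq) if char == '-']


# Hypothetical in-memory alignment: sequence name -> aligned sequence string.
alignment = {
    'Seq1': 'ATT-AGC-C',
    'Seq2': 'AT--AGCTC',
}

for name, seq in alignment.items():
    lengths = gap_lengths(seq)
    print(f'{name}: {len(lengths)} gap(s), lengths {lengths}')
```

Run as-is, this prints "Seq1: 2 gap(s), lengths [1, 1]" and "Seq2: 1 gap(s), lengths [2]". Both helpers return the same lists. The groupby version avoids the re module at the cost of building a small list per run.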

Hope this helps,

--Ruchira

Ruchira S. Datta <http://www.ruchiradatta.com/>, Ph.D
Postdoctoral Researcher <http://phylogenomics.berkeley.edu/members/ruchira/>
Berkeley Phylogenomics Group <http://phylogenomics.berkeley.edu/>
324D Stanley Hall <http://www.qb3.org/stanleyfactsheet.htm>
Department of Bioengineering <http://bioeng.berkeley.edu/>
California Institute for Quantitative Biosciences (QB3)
<http://www.qb3.org/>
University of California <http://www.ucop.edu/>
Berkeley <http://www.ci.berkeley.ca.us/>, CA <http://gocalif.ca.gov/> 94720
Phone: (510) 642-6642
Email: ruchira at berkeley.edu


On Feb 6, 2008 1:21 PM, Matthew Abravanel <vmatthewa at gmail.com> wrote:

> Hi Everyone,
>
> I was wondering if anyone could help. I am trying to write a little Python
> script to iterate through an alignment, determine the number of gaps the
> alignment has and their lengths, and output that information as a list,
> for example for this made-up alignment:
>
>  Seq1 ATT-AGC-C
>  Seq2 AT--AGCTC
>
> The program would then output something like "2 gaps of length 1", as a
> list such as [1,1] or something similar. I am still learning about
> Python strings and iterators and am not sure how to approach this.
> I would appreciate any help you could give. Thanks.
>
> Sincerely,
>
>                     Matthew
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


