[BioPython] Iterating through an alignment to calculate the number of gaps and their lengths
Ruchira Datta
ruchira.datta at gmail.com
Wed Feb 6 21:39:02 UTC 2008
Hi Matthew, try this:
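import re

# A gap is a run of one or more consecutive '-' characters
contiguous_gap = re.compile('-+')
# seq is one aligned sequence as a plain string
gappy_regions = contiguous_gap.findall(seq)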
Now gappy_regions is a list of the gappy regions; e.g., if seq =
'ILV--F---AAS', then gappy_regions will be ['--', '---'].
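If you also need to know where each gap starts, a small variation (a sketch, not part of the original suggestion) uses re.finditer, which yields match objects instead of strings, so each run's start position and length can be read off directly. The helper name gap_runs is hypothetical, chosen only for illustration.

```python
import re

# A gap is a run of one or more consecutive '-' characters
contiguous_gap = re.compile('-+')


def gap_runs(seq):
    """Return a (start, length) pair for each run of '-' in seq.

    start is a 0-based index. gap_runs is a hypothetical helper name.
    """
    return [(m.start(), m.end() - m.start()) for m in contiguous_gap.finditer(seq)]


print(gap_runs('ILV--F---AAS'))  # [(3, 2), (6, 3)]
```

The lengths alone are then just [length for start, length in gap_runs(seq)].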
To get the length of each gappy region, you can just say
[len(region) for region in gappy_regions]
which in the above example gives
[2, 3]
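To cover a whole alignment, you can apply the same regex to each row. The sketch below assumes the alignment is already held as a dict mapping sequence names to aligned strings (loading it from a file is left out), and the function name gap_report is hypothetical. The number of gaps in a row is simply the length of its list.

```python
import re

# A gap is a run of one or more consecutive '-' characters
contiguous_gap = re.compile('-+')


def gap_report(alignment):
    """Map each sequence name to the list of its gap lengths.

    alignment: dict of name -> aligned sequence string (assumed input format).
    """
    return {
        name: [len(region) for region in contiguous_gap.findall(seq)]
        for name, seq in alignment.items()
    }


# The made-up alignment from Matthew's message
alignment = {
    'Seq1': 'ATT-AGC-C',
    'Seq2': 'AT--AGCTC',
}

for name, lengths in gap_report(alignment).items():
    # Number of gaps = number of entries in the list of lengths
    print(f"{name}: {len(lengths)} gap(s), lengths {lengths}")
```

For this alignment, Seq1 has two gaps of length 1 ([1, 1]) and Seq2 has one gap of length 2 ([2]).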
Hope this helps,
--Ruchira
Ruchira S. Datta <http://www.ruchiradatta.com/>, Ph.D
Postdoctoral Researcher <http://phylogenomics.berkeley.edu/members/ruchira/>
Berkeley Phylogenomics Group <http://phylogenomics.berkeley.edu/>
324D Stanley Hall <http://www.qb3.org/stanleyfactsheet.htm>
Department of Bioengineering <http://bioeng.berkeley.edu/>
California Institute for Quantitative Biosciences (QB3)
<http://www.qb3.org/>
University of California <http://www.ucop.edu/>
Berkeley, CA 94720
Phone: (510) 642-6642
Email: ruchira at berkeley.edu
On Feb 6, 2008 1:21 PM, Matthew Abravanel <vmatthewa at gmail.com> wrote:
> Hi Everyone,
>
> I was wondering if anyone could help. I am trying to write a little Python
> script to iterate through an alignment, determine the number of gaps the
> alignment has and their lengths, and output that information as a list,
> such as for this made-up alignment:
>
> Seq1 ATT-AGC-C
> Seq2 AT--AGCTC
>
> The program would run and report, e.g., 2 gaps of length 1, output as a
> list like [1,1] or something like that. I am still learning about
> Python strings and iterators and am not sure how to approach this.
> I'd appreciate any help you could give. Thanks.
>
> Sincerely,
>
> Matthew
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>