[BioPython] writing clustalw alignment to file
Scott T. Kelley
kelleys@ucsu.colorado.edu
Fri, 16 Nov 2001 11:27:20 -0800
I have a couple of questions about Biopython that I hope someone can help me
answer. (I am using the Windows version, by the way.)
(1) I am trying to convert a Clustalw alignment to a FASTA alignment (pp.
38-39 of the Cookbook). I can do the conversion just as in the book, but I
then want to write the FASTA alignment to a file and I don't know how. I
would also like to search the FASTA or Clustalw file for particular sequence
names (titles) so that I can retrieve the matching sequence, but I do not
know how this is done either.
For instance, I would like to find the following sequence in a file:
>APE2174
------------------------------------------------------------
--------------------------MVKGSQVKPSTTELLLKAVSAKAPSGD-------
---PIHQK---IGDLPFEKIVEIAIEKKPD--LLAKTLKAAVKTILGSARSIGVTVDGKD
PKEVTRQVDEGVYDAVLAKYEEKWEEAEG
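To make the two parts of question (1) concrete, here is a rough plain-Python sketch of writing FASTA records to a file and then looking one up by its title. It deliberately avoids Biopython calls, and it assumes the alignment is already available as (title, sequence) pairs. The helper names `write_fasta`, `read_fasta` and `find_by_title`, the file name, and the second record are made up for illustration.

```python
import os
import tempfile


def write_fasta(records, path, width=60):
    """Write (title, sequence) pairs to `path` in FASTA format."""
    with open(path, "w") as handle:
        for title, sequence in records:
            handle.write(">%s\n" % title)
            # wrap the sequence (gap characters included) at `width` columns
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")


def read_fasta(path):
    """Yield (title, sequence) pairs from a FASTA file."""
    title, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # a new title line closes the previous record, if any
                if title is not None:
                    yield title, "".join(chunks)
                title, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def find_by_title(path, wanted):
    """Return the sequence whose title starts with the word `wanted`, or None."""
    for title, sequence in read_fasta(path):
        words = title.split()
        if words and words[0] == wanted:
            return sequence
    return None


# example data: the first record is shortened from the alignment above,
# the second record is hypothetical
records = [
    ("APE2174", "-----MVKGSQVKPSTTELLLKAVSAKAPSGD-------"),
    ("OTHER01", "MKT--AYIAKQRQISFVKSHFSRQLEERLGLIEVQ----"),
]
fasta_path = os.path.join(tempfile.mkdtemp(), "alignment.fasta")
write_fasta(records, fasta_path)
found = find_by_title(fasta_path, "APE2174")
print(found)
```

Matching on the first word of the title is only one possible choice; a substring or regular-expression match would work the same way inside `find_by_title`.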
(2) The other question concerns parsing GenBank files. I can use the
GenBank parser to retrieve exon and intron sequences from the "mRNA" feature
just fine; a (partial) example of a function I wrote to do this is given
below (it is very similar to the example in the Cookbook). However, when I
try to get the positions from the "gene" feature I run into problems. For
example:
     gene            2..1021
                     /gene="HI0001"
All I want are the start and end positions of the gene and the gene name,
but I keep running into problems. Any feedback you could give me on these
questions would be appreciated. Thanks! -Scott
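To illustrate what is being asked for with the "gene" feature, here is a rough plain-Python sketch that pulls the positions and the /gene name straight from the feature-table text. It does not use the Biopython GenBank parser. The function name `gene_positions` is made up for this example, and the second gene in the sample data is hypothetical. It assumes simple `start..end` locations (optionally wrapped in `complement(...)`) and a /gene qualifier that directly follows the feature line, which will not hold for every GenBank file.

```python
import re

# matches a feature line such as "     gene            2..1021"
GENE_LINE = re.compile(r"^\s*gene\s+(complement\()?<?(\d+)\.\.>?(\d+)\)?\s*$")
# matches a qualifier line such as '/gene="HI0001"'
GENE_NAME = re.compile(r'^\s*/gene="([^"]*)"')


def gene_positions(lines):
    """Return (name, start, end) for each simple gene feature in `lines`.

    Positions are reported exactly as written in the file (1-based,
    inclusive); name is None when no /gene qualifier was found.
    """
    genes = []
    current = None  # [name, start, end] of the gene feature being read
    for line in lines:
        match = GENE_LINE.match(line)
        if match:
            current = [None, int(match.group(2)), int(match.group(3))]
            genes.append(current)
            continue
        name = GENE_NAME.match(line)
        if name and current is not None and current[0] is None:
            current[0] = name.group(1)
        elif not line.lstrip().startswith("/"):
            # simplification: any line that is not a qualifier ends the
            # current gene feature, so later /gene lines are not attached
            current = None
    return [tuple(gene) for gene in genes]


# sample feature-table text; the HI0002 entry is hypothetical
sample = """\
     gene            2..1021
                     /gene="HI0001"
     CDS             2..1021
                     /gene="HI0001"
     gene            complement(1190..1921)
                     /gene="HI0002"
""".splitlines()

for gene_name, start, end in gene_positions(sample):
    print("%s %d %d" % (gene_name, start, end))
```

The regular expressions tolerate any amount of leading whitespace, so the same sketch also reads feature lines whose column layout has been lost.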
from Bio import GenBank

def get_introns(self, genbank_file_name):
    """Parse a GenBank file and find all exon sequences"""
    # --- load a parser and iterator for our GenBank file
    gb_handle = open(genbank_file_name, "r")
    # -- a parser that will give you back records holding SeqFeature objects
    feature_parser = GenBank.FeatureParser()
    iterator = GenBank.Iterator(gb_handle, feature_parser)
    # begin iterating through the file and getting GenBank records
    while 1:
        # get the parsed entry for the next GenBank record. When we run
        # out of records in the file, cur_entry will be None
        cur_entry = iterator.next()
        if cur_entry is None:
            break
        print "Writing intron entries for %s to file" % cur_entry.id
        # loop through all of the features for the entry
        for feature in cur_entry.features:
            # when we've got mRNA features, parse the info out of them
            if feature.type == "mRNA":
                # loop through all of the subfeatures and get the
                # individual exons
                all_exons = []
                for sub_feature in feature.sub_features:
                    # add the info about the exon, ignoring fuzzy locations
                    all_exons.append((sub_feature.location.nofuzzy_start,
                                      sub_feature.location.nofuzzy_end))
-------------------
Scott T. Kelley, Ph.D.
Campus Box 347
MCD Biology
University of Colorado
Boulder, CO 80309-0347
Phone: (303) 735-1808
Fax: (303) 492-7744
E-mail: Scott.Kelley@Colorado.edu