[BioPython] writing clustalw alignment to file

Scott T. Kelley kelleys@ucsu.colorado.edu
Fri, 16 Nov 2001 11:27:20 -0800


I have a couple of questions about Biopython that I hope someone can help me
answer. (I am using the Windows version, BTW.)

(1) I am trying to convert a ClustalW alignment to a FASTA alignment (pp.
38-39 of the Cookbook). I can do the conversion just as the book shows, but I
don't know how to write the resulting FASTA alignment to a file. I would also
like to search through the FASTA or ClustalW file for particular sequence
names (titles) so that I can retrieve those sequences, but I do not know how
to do this.
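For the conversion and file-writing part, later Biopython releases can do the whole round trip in one call, `Bio.AlignIO.convert("in.aln", "clustal", "out.fasta", "fasta")`, though that module postdates the Cookbook pages cited above. As a library-independent illustration of what the conversion involves, here is a minimal stdlib-only sketch. `read_clustal` and `write_fasta` are hypothetical helpers, and the parser assumes a plain Clustal layout: a `CLUSTAL` header, blocks of `name residues` lines, and conservation lines that begin with whitespace.

```python
import io


def read_clustal(handle):
    """Parse a simple Clustal alignment into an ordered {name: sequence} dict.

    Hypothetical helper. Assumes a 'CLUSTAL' header line, then blocks of
    'name  residues [optional count]' lines separated by blank lines;
    conservation lines (starting with whitespace) are skipped.
    """
    seqs = {}
    for line in handle:
        if line.startswith("CLUSTAL") or not line.strip():
            continue  # header line or blank block separator
        if line[0].isspace():
            continue  # conservation line made of '*', ':', '.'
        fields = line.split()
        name, residues = fields[0], fields[1]
        # append this block's residues to whatever we already have for `name`
        seqs[name] = seqs.get(name, "") + residues
    return seqs


def write_fasta(seqs, handle, width=60):
    """Write {name: sequence} as FASTA, wrapping sequence lines at `width`."""
    for name, seq in seqs.items():
        handle.write(">%s\n" % name)
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Made-up two-block alignment, used only to illustrate the format.
clustal_text = """CLUSTAL W (1.81) multiple sequence alignment


seqA   MVKG--SQ
seqB   MVKGAASQ
       ****  **

seqA   VKP
seqB   VKP
       ***
"""

seqs = read_clustal(io.StringIO(clustal_text))
out = io.StringIO()
write_fasta(seqs, out)
print(out.getvalue())
# To write a real file instead:
#     with open("alignment.fasta", "w") as fh:
#         write_fasta(seqs, fh)
```

Keeping the sequences in a dict keyed by name makes the multi-block Clustal layout easy to reassemble, since each block simply appends to the existing entry.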

For instance, I would like to find the following sequence in a file:

>APE2174
------------------------------------------------------------
--------------------------MVKGSQVKPSTTELLLKAVSAKAPSGD-------
---PIHQK---IGDLPFEKIVEIAIEKKPD--LLAKTLKAAVKTILGSARSIGVTVDGKD
PKEVTRQVDEGVYDAVLAKYEEKWEEAEG
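To pull one record out by its title, later Biopython releases provide `Bio.SeqIO.index(filename, "fasta")`, which gives dictionary-style access by record ID. The sketch below does the same lookup by hand with the standard library. `find_fasta_record` is a hypothetical helper that assumes the title is the first word after `>`, as with `APE2174` above, and keeps gap characters such as `-` unchanged.

```python
import io


def find_fasta_record(handle, wanted):
    """Return the sequence whose title (first word after '>') equals `wanted`.

    Hypothetical helper. Sequence lines are joined without newlines and
    gap characters are kept; returns None if no record matches.
    """
    found = False
    chunks = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if found:
                break  # we have reached the record after the one we wanted
            parts = line[1:].split()
            found = bool(parts) and parts[0] == wanted
        elif found:
            chunks.append(line.strip())
    return "".join(chunks) if found else None


# Made-up FASTA text for illustration.
fasta_text = """>APE2174
----MVKGSQ
VKPSTT--EL
>OTHER1
MKKLLA
"""

print(find_fasta_record(io.StringIO(fasta_text), "APE2174"))
# To search a real file instead:
#     with open("alignment.fasta") as fh:
#         seq = find_fasta_record(fh, "APE2174")
```

Stopping at the next `>` line avoids reading the rest of a large file once the record has been found.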


(2) The other question has to do with parsing GenBank files. I can use the
GenBank parser to retrieve exon and intron sequences from the "mRNA" feature
just fine; below is a (partial) function I wrote to do this, very similar to
the example in the Cookbook. However, when I try to get the positions from
the "gene" feature, I run into problems. For example:

     gene            2..1021
                     /gene="HI0001"

All I want are the positions of the gene and the gene name, but I keep
running into problems. Any feedback you could give me on these questions
would be appreciated. Thanks! -Scott
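In later Biopython releases, `SeqIO.parse(handle, "genbank")` yields SeqRecord objects whose `features` each carry a `.type`, a `.location` (with a 0-based `start`) and a `.qualifiers` dict, so filtering on `feature.type == "gene"` and reading `feature.qualifiers.get("gene")` is the usual route. To show what that involves, here is a stdlib-only sketch that reads the raw feature-table lines directly. `gene_positions` and the regular expressions are hypothetical; they assume the standard column layout (feature keys indented 5 spaces, qualifiers 21 spaces) and handle only simple `start..end` and `complement(start..end)` locations, skipping `join(...)` and multi-line qualifier values.

```python
import re

# Feature line: 5 spaces, feature key, then the location (assumed on one line).
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)")
# Qualifier line: 21 spaces, then /name="value" (quotes optional).
QUALIFIER_RE = re.compile(r'^ {21}/(\w+)="?([^"]*)"?')
# Simple span, optionally wrapped in complement(), with optional < / > fuzziness.
SIMPLE_SPAN_RE = re.compile(r"^(complement\()?<?(\d+)\.\.>?(\d+)\)?$")


def gene_positions(feature_lines):
    """Return a list of dicts (name, start, end, strand) for 'gene' features.

    Hypothetical helper; coordinates are 1-based and inclusive, exactly as
    written in the GenBank file. Compound locations are skipped.
    """
    genes = []
    current = None  # the 'gene' feature whose qualifiers we are reading
    for line in feature_lines:
        feat = FEATURE_RE.match(line)
        if feat:
            current = None  # any new feature ends the previous one
            key, location = feat.groups()
            span = SIMPLE_SPAN_RE.match(location)
            if key == "gene" and span:
                current = {"start": int(span.group(2)),
                           "end": int(span.group(3)),
                           "strand": -1 if span.group(1) else 1,
                           "name": None}
                genes.append(current)
            continue
        qual = QUALIFIER_RE.match(line)
        if current is not None and qual and qual.group(1) == "gene":
            current["name"] = qual.group(2)
    return genes


# Made-up feature-table excerpt; "geneX" is an invented name.
feature_table = [
    "     source          1..5000",
    "     gene            2..1021",
    '                     /gene="HI0001"',
    "     CDS             2..1021",
    '                     /gene="HI0001"',
    "     gene            complement(1100..1500)",
    '                     /gene="geneX"',
]

for gene in gene_positions(feature_table):
    print(gene["name"], gene["start"], gene["end"], gene["strand"])
```

Resetting `current` on every feature line is what keeps the CDS's own `/gene` qualifier from being attributed to the preceding gene feature.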

    def get_introns(self, genbank_file_name):
        """Parse a GenBank file and collect the exon coordinates of each mRNA."""
        # requires "from Bio import GenBank" at module level (old Biopython API)
        # --- load a parser and iterator for our GenBank file
        gb_handle = open(genbank_file_name, "r")

        # -- a parser that gives back SeqRecord objects with SeqFeature features
        feature_parser = GenBank.FeatureParser()
        iterator = GenBank.Iterator(gb_handle, feature_parser)

        # begin iterating through the file and getting GenBank records
        while True:
            # get the next GenBank record; when we run out of records
            # in the file, cur_entry will be None
            cur_entry = iterator.next()
            if cur_entry is None:
                break

            print("Writing intron entries for %s to file" % cur_entry.id)
            # loop through all of the features for the entry
            for feature in cur_entry.features:
                # when we've got mRNA features, parse the info out of them
                if feature.type == "mRNA":
                    # loop through all of the sub-features and get the
                    # individual exons
                    all_exons = []
                    for sub_feature in feature.sub_features:
                        # add the info about the exon, ignoring fuzzy locations
                        all_exons.append((sub_feature.location.nofuzzy_start,
                                          sub_feature.location.nofuzzy_end))
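The function above stops after collecting exon coordinates, so here is a hedged sketch of the remaining step: deriving introns as the gaps between consecutive exons. `introns_from_exons` is a hypothetical helper. It assumes the `(start, end)` pairs use Python slice coordinates (0-based start, exclusive end), which is how Biopython reports feature locations, so each intron can then be sliced from the record's sequence as `seq[start:end]`. For genes on the minus strand the sliced sequence would still need to be reverse-complemented.

```python
def introns_from_exons(exons):
    """Return intron (start, end) pairs lying between consecutive exons.

    Hypothetical helper. `exons` is a list of (start, end) tuples in
    Python slice coordinates, so the intron between exons (s1, e1) and
    (s2, e2) is simply (e1, s2).
    """
    ordered = sorted(exons)  # exons may not be listed in genomic order
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
            if right_start > left_end]  # skip abutting or overlapping exons


# Made-up exon coordinates, deliberately out of order.
exons = [(1500, 1800), (100, 400), (700, 950)]
print(introns_from_exons(exons))
```

Sorting first means the helper does not depend on the order in which the sub-features appear in the file.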


-------------------
Scott T. Kelley, Ph.D.
Campus Box 347
MCD Biology
University of Colorado
Boulder, CO 80309-0347
Phone: (303) 735-1808
Fax: (303) 492-7744
E-mail: Scott.Kelley@Colorado.edu