[Biopython-dev] "Features" of Bio.Clustalw

Brad Chapman chapmanb at arches.uga.edu
Wed Aug 1 05:28:30 EDT 2001

Hi Davide;

[Clustalw bugs]

> Here I send the patches I was able to cook up, these are only minor
> changes, anyway I hope it will help.

Great! I applied these to CVS. Thanks much for the contribution!

> I think that having a class like MultipleAlignCL is superior to passing
> the alignment arguments to a function, as is done for blastpgp or blastall.

I'm glad you like it :-). This is just an idea I came up with because
clustalw had so many options. It seemed less confusing than trying to
pass in all of those options through a function.
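Here is a minimal sketch of the "command line in a class" idea that makes the contrast concrete. The class name `AlignmentCommandLine` and its methods are hypothetical and invented for illustration. They are not the API of `MultipleAlignCL`. Options are set one at a time, and the command line is built only when it is needed, so nothing has to be threaded through one long function signature.

```python
class AlignmentCommandLine:
    """Hypothetical options holder: collect settings, then render a command."""

    def __init__(self, program, sequence_file):
        self.program = program
        self.sequence_file = sequence_file
        # Options are stored by name; dicts keep insertion order (Python 3.7+).
        self.options = {}

    def set_option(self, name, value):
        """Record a single option; a later call with the same name overwrites it."""
        self.options[name] = value

    def __str__(self):
        # clustalw-style "-NAME=value" options, appended after the input file.
        parts = [self.program, self.sequence_file]
        parts += ["-%s=%s" % (name, value) for name, value in self.options.items()]
        return " ".join(parts)


cline = AlignmentCommandLine("clustalw", "seqs.fasta")
cline.set_option("OUTPUT", "GCG")
cline.set_option("GAPOPEN", 10)
print(cline)  # clustalw seqs.fasta -OUTPUT=GCG -GAPOPEN=10
```

With this approach, adding a new option never changes a function signature. Callers that do not care about an option simply never set it.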

blastall and blastpgp are Jeff Chang's functions, so maybe he could
comment on your idea to have classes to encompass their options. I'm
not positive if he even likes the "command line in a class" idea :-).

> Finally it is a general mechanism and could be used to give a uniform
> interface to functions invoking external programs.
> Do you think you would be interested in a patch implementing such
> behaviour? I think one could also retain compatibility with the current
> interface.
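Keeping compatibility could look roughly like the following. The existing function keeps its keyword-argument signature but delegates internally to an options class. Everything here is a hypothetical stand-in, not the real `blastall()` signature: `ToolCommandLine`, `run_tool`, the `mytool` program, and the `-d`/`-i` flag names. The function returns the argument list instead of executing anything, which makes the delegation easy to see.

```python
class ToolCommandLine:
    """Hypothetical class-based interface: options live on an object."""

    def __init__(self, executable):
        self.executable = executable
        self.options = []  # (flag, value) pairs, in the order they were set

    def set_option(self, flag, value):
        self.options.append((flag, str(value)))

    def as_argv(self):
        """Return the command as an argument list suitable for subprocess."""
        argv = [self.executable]
        for flag, value in self.options:
            argv.extend([flag, value])
        return argv


def run_tool(executable, database, infile, **keywds):
    """Old-style function interface kept for compatibility.

    Keyword arguments become "-name value" flags and are forwarded to the
    class, so both interfaces share one implementation.
    """
    cline = ToolCommandLine(executable)
    cline.set_option("-d", database)
    cline.set_option("-i", infile)
    for name in sorted(keywds):  # sorted so the flag order is deterministic
        cline.set_option("-" + name, keywds[name])
    return cline.as_argv()


print(run_tool("mytool", "nr", "query.fa", e=0.001))
# ['mytool', '-d', 'nr', '-i', 'query.fa', '-e', '0.001']
```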

As I mentioned above, it is really Jeff's call whether or not he'd
like to see something like this in blastall() and friends, but I do
think having a general interface would be nice. There was a lot of
talk at the BOSC/ISMB conference this year about other programs that
it would be nice for biopython to interface with (EMBOSS in
particular), so there is definitely interest, and a lot of work could
be done along these lines if you are interested.
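A rough illustration of why a general interface would pay off: the sketch below separates the program-independent part (a list of named parameters) from the program-specific part (how a name/value pair is spelled on the command line). The `Parameter` tuple, the two formatter functions, and `build_argv` are hypothetical. The clustalw `-NAME=value` and EMBOSS `-qualifier value` spellings follow those programs' usual conventions, and the EMBOSS `water` qualifiers are shown only as an example.

```python
from collections import namedtuple

# Hypothetical, program-independent description of one setting.
Parameter = namedtuple("Parameter", ["name", "value"])


def clustalw_style(param):
    # clustalw joins name and value with "=": -OUTPUT=GCG
    return ["-%s=%s" % (param.name.upper(), param.value)]


def emboss_style(param):
    # EMBOSS puts the value in a separate token: -gapopen 10
    return ["-%s" % param.name, str(param.value)]


def build_argv(program, params, formatter):
    """Turn a program name plus parameters into an argument list."""
    argv = [program]
    for param in params:
        argv.extend(formatter(param))
    return argv


water = build_argv(
    "water",
    [Parameter("asequence", "a.fa"), Parameter("bsequence", "b.fa"),
     Parameter("gapopen", 10), Parameter("gapextend", 0.5)],
    emboss_style,
)
print(water)
# ['water', '-asequence', 'a.fa', '-bsequence', 'b.fa', '-gapopen', '10', '-gapextend', '0.5']
```

Supporting another family of programs then means writing one small formatter, not a new wrapper class.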

Also, during one of the talks at the ISMB conference I got inspired
and had an idea for a generic class for running Applications. Based on
what I scrawled on a piece of notebook paper during the talk, I wrote
up something that kind of sketches out the ideas I had and attached it
to this mail. This isn't working code or anything -- just enough to
show the ideas. I'm not really sure if this is good, but I thought you
might be interested in looking at it if you want to work further on
this. Feel free to use it or not use it.
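The attached sketch leaves the actual execution step empty. In present-day Python, one plausible way to fill it in is the standard library's `subprocess.run`, passing the command as a list so that no shell quoting is involved. The `run_commandline` helper and the `ApplicationError` exception are hypothetical names. This is only a sketch of the idea, not how Biopython itself runs clustalw.

```python
import subprocess
import sys


class ApplicationError(RuntimeError):
    """Hypothetical error raised when an external program fails."""


def run_commandline(argv):
    """Run argv (a list of strings) and return its standard output.

    Raises ApplicationError, including the program's stderr, if the
    program exits with a non-zero status.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise ApplicationError(
            "%s exited with status %d: %s"
            % (argv[0], result.returncode, result.stderr.strip())
        )
    return result.stdout


# Use the running Python interpreter as a stand-in for clustalw.
print(run_commandline([sys.executable, "-c", "print('aligned')"]))
```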

Thanks again for the patches and interest!


-------------- next part --------------
"""Rough ideas for a general way to access applications in biopython.
import os

# --- the general classes

class AbstractApplication:
    """Generic interface for running applications from biopython.

    This class shouldn't be called directly; it should be subclassed to
    provide an implementation for a specific application.
    """
    def __init__(self):
        self.program_name = ""
        self.parameters = []
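    def run(self):
        """Construct the commandline and run the program.

        Subclasses implement this for a specific application.
        """
        raise NotImplementedError("Subclasses must implement run().")
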
    def construct_commandline(self):
        """Make the commandline with the currently set options.
        commandline = "%s " % self.program_name
        for parameter in self.parameters:
            if parameter.is_required and not(parameter.is_set):
                raise ValueError("Parameter %s is not set." % parameter.names)
            if parameter.is_set:
                commandline += str(parameter)

        return commandline

    def set_parameter(self, name, value=None):
        """Set a commandline option for a program."""
        set_option = False
        for parameter in self.parameters:
            if name in parameter.names:
                if value is not None:
                    # checker functions raise ValueError on invalid input
                    if parameter.checker_function is not None:
                        parameter.checker_function(value)
                    parameter.value = value
                parameter.is_set = True
                set_option = True

        if not set_option:
            raise ValueError("Option name %s was not found." % name)

class _AbstractParameter:
    """A class to hold information about a parameter for a commandline.

    Do not use this directly; instead, use one of the subclasses.

    o names -- a list of string names by which the parameter can be
    referenced (i.e. ["-a", "--append", "append"]). The first name in
    the list is considered to be the one that goes on the commandline,
    for those parameters that print the option.

    o checker_function -- a reference to a function that will determine
    if a given value is valid for this parameter.

    o description -- a description of the option.

    o is_required -- a flag to indicate if the parameter must be set for
    the program to be run.

    o is_set -- if the parameter has been set

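    o value -- the value of a parameter
    """
    def __init__(self, names=None, checker_function=None, is_required=0,
                 description=""):
        # use a fresh list per instance instead of a shared mutable default
        self.names = names if names is not None else []
        self.checker_function = checker_function
        self.description = description
        # required parameters must be set before the commandline is built
        self.is_required = is_required
        self.is_set = 0
        self.value = None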

class _Option(_AbstractParameter):
    """Represent an option that can be set for a program.

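    This holds UNIXish options like --append=yes and -a yes, as well as
    options written with a trailing "=" such as clustalw's -USETREE=filename.
    """
    def __str__(self):
        """Return the value of this option for the commandline."""
        name = self.names[0]
        # options that carry their own "=": -USETREE=filename
        if name.endswith("="):
            if self.value is None:
                raise ValueError("Option %s requires a value." % name)
            output = "%s%s " % (name, self.value)
        # long options: --append=yes
        elif name.startswith("--"):
            output = name
            if self.value is not None:
                output += "=%s" % self.value
            output += " "
        # short options: -a yes
        elif name.startswith("-"):
            output = "%s " % name
            if self.value is not None:
                output += "%s " % self.value
        else:
            raise ValueError("Unrecognized option type: %s" % name)

        return output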

class _Argument(_AbstractParameter):
    """Represent an argument on a commandline.
    def __str__(self):
        if self.value is not None:
            return "%s " % self.value
            return " "
# --- Example program for Clustalw

class ClustalwApplication(AbstractApplication):
    """Accessing Clustalw through the Application interface.

    XXX This is not done at all -- just meant as an example of how the
    AbstractApplication stuff might work.
    This class could also have the same 'helper functions'
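    as the current MultipleAlignCL class.
    """
    def __init__(self):
        AbstractApplication.__init__(self)
        self.program_name = "clustalw"

        # clustalw takes the output format via -OUTPUT= (GCG, GDE, PHYLIP,
        # PIR, NEXUS); -TYPE= selects PROTEIN or DNA instead.
        self.parameters = [
            _Argument(["sequence_file"], self._file_exists, 1),
            _Option(["-USETREE=", "guide_tree"], self._file_exists, 0),
            _Option(["-OUTPUT=", "output_type"], self._valid_output_type, 0),
        ]
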
    def run(self):
        commandline = self.construct_commandline()
        # just put in the stuff from Bio/Clustalw/__init__.py.do_alignment()

    # --- functions to check for valid parameters
    def _file_exists(self, filename):
        """Make sure that a passed filename exists.
        if not(os.path.exists(filename)):
            raise ValueError("File %s does not exist." % filename)

    def _valid_output_type(self, type):
        OUTPUT_TYPES = ['GCG', 'GDE', 'PHYLIP', 'PIR', 'NEXUS']
        if type not in OUTPUT_TYPES:
            raise ValueError("Output type %s not valid. Options are %s" %
                             (type, OUTPUT_TYPES))
