From tarjei at genome.wi.mit.edu  Wed Aug  1 02:02:41 2001
From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen)
Date: Sat Mar  5 14:43:01 2005
Subject: [Biopython-dev] Pathway Module
In-Reply-To: <002101c1194c$d997d5e0$010a0a0a@cadence.com>
Message-ID: <NEBBJGOKPAACGBJLMGCDKEKFCBAA.tarjei@genome.wi.mit.edu>

Hi,

thanks to Cayte for taking the initiative on getting a Pathway module
discussion going. Below are my ramblings on what I think such a module
should be like. This is all off the top of my head, so any feedback
would be greatly appreciated.

First of all, I think it is a useful exercise to consider what kinds of
tasks would benefit from the availability of reaction/pathway classes.
I can think of the following:

 * Elementary mode analysis and MCA.
   - Involves converting a set of reactions to a stoichiometry matrix
 * Mapping genes clustered by location or expression to pathways
 * Route queries (how can we transform A to B given a set of enzymes?)
 * Neighborhood queries (which enzymes are k-separated from enzyme Y?)
   - All three of these focus on the graph structure of the pathways.
 * Dynamic simulations
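The route query above amounts to a shortest-path search over the species graph. Here is a minimal sketch. It assumes reactions arrive as `(name, substrates, products)` tuples of plain strings that have already been filtered down to the given enzyme set. `find_route` is a hypothetical helper, not existing Biopython API. It treats every substrate-to-product pair as a usable step, so it ignores whether co-substrates are actually available:

```python
from collections import deque


def find_route(reactions, start, goal):
    """Return a shortest list of reaction names turning start into goal, or None."""
    # Adjacency: substrate -> list of (product, reaction name)
    graph = {}
    for name, substrates, products in reactions:
        for s in substrates:
            for p in products:
                graph.setdefault(s, []).append((p, name))

    previous = {start: None}  # species -> (parent species, reaction name)
    queue = deque([start])
    while queue:
        species = queue.popleft()
        if species == goal:
            # Walk back through the parents to recover the reaction sequence.
            route = []
            while previous[species] is not None:
                parent, name = previous[species]
                route.append(name)
                species = parent
            return route[::-1]
        for product, name in graph.get(species, []):
            if product not in previous:
                previous[product] = (species, name)
                queue.append(product)
    return None


reactions = [("R1", ["A", "B"], ["C", "D"]), ("R2", ["C"], ["F"])]
print(find_route(reactions, "A", "F"))  # ['R1', 'R2']
```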

The last task is beyond the scope of anything we could do on this project,
not only because of the technical challenges, but also because of the lack
of information about kinetics. There is a fair amount of kinetic information
in databases like EMP and BRENDA, but these numbers are extremely
context-specific and irregular. I therefore think that information like
reaction temperature, free energies, experimentally determined kinetics, and
even which organism a reaction has been observed in is best left in the
Record objects of the individual database modules.

I think the core of a biopython pathway module should be a relatively
lightweight abstraction for pathway connectivity, and not much more.
Below is a quick description of what I imagine it could look like.
Note that this is a description of an *abstraction*, not a python
*implementation*.

CLASSES:

Species:
 - A very light class for representing any biochemical species that
   are present in the system we're interested in. It could be a small
   molecule, an enzyme, whatever.

 a unique name or id              - identifies what this species is (EC
                                    number, CAS number, something like that)
 a user-defined reference         - ref to object containing further
                                    information, probably an appropriate Record

Reaction:
 - Represents any biochemical transformation that can take place in the
   system, such as an enzymatic reaction, or a spontaneous transformation.

 a set S of Species objects       - the substrates
 a set P of Species objects       - the products
 a set E of Species objects       - the enzymes
 a set F of Species objects       - the factors (cofactors, effectors,
                                    inhibitors?)
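A minimal Python sketch of the two classes makes a few assumptions. The four sets are stored as frozensets. Species compare equal by name, since the name is meant to be a unique id. There is a hypothetical `reverse()` method like the one used in the usage examples. None of this is existing Biopython API:

```python
class Species:
    """A lightweight handle for any biochemical species (metabolite, enzyme, ...)."""

    def __init__(self, name, ref=None):
        self.name = name  # unique id, e.g. an EC or CAS number
        self.ref = ref    # user-defined reference, e.g. a database Record

    def __eq__(self, other):
        # Identity is the name only; two handles with different refs are equal.
        return isinstance(other, Species) and self.name == other.name

    def __hash__(self):
        return hash(self.name)

    def __repr__(self):
        return "Species(%r)" % self.name


class Reaction:
    """A biochemical transformation: substrates -> products, with enzymes/factors."""

    def __init__(self, name, substrates=(), products=(), enzymes=(), factors=()):
        self.name = name
        self.substrates = frozenset(substrates)
        self.products = frozenset(products)
        self.enzymes = frozenset(enzymes)
        self.factors = frozenset(factors)

    def reverse(self):
        """Return a new Reaction with substrates and products swapped.

        The "_rev" name suffix is an arbitrary choice for this sketch.
        """
        return Reaction(self.name + "_rev", self.products, self.substrates,
                        self.enzymes, self.factors)


A, B, C, D, E = (Species(n) for n in "ABCDE")
R1 = Reaction("smelly", substrates=[A, B], products=[C, D], enzymes=[E])
R3 = R1.reverse()
print(sorted(R3.substrates, key=lambda s: s.name))  # [Species('C'), Species('D')]
```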

System:
 - Represents the biochemical system we're interested in. It is essentially
   a directed multi-graph where the vertices are Species and the edges are
   labeled with references to the reaction that links the parent vertex
   to the child vertex.

 a set V of Species objects       - these are all biochemical species in this
                                    system, including metabolites, enzymes and
                                    whatnot

 a set E of tuples (from, to, reaction)
   where from and to refer to elements in V and reaction is a
   (not necessarily unique) Reaction object in which from is
   a substrate and to is a product.
                                  - these are the 'edges' that collectively
                                    define a multi-graph representing the
                                    network connectivity


So for example, in a system with Species A,B,C,D,E and one Reaction
R1: A + B -E-> C + D, the System object would be

S1:
 V = {A,B,C,D,E}
 E = {(A,C,R1), (A,D,R1), (B,C,R1), (B,D,R1)}
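The edge set can be derived mechanically from the definition of E: each (substrate, product) pair of a reaction becomes one edge labelled with that reaction. This sketch uses plain strings for species and a hypothetical `(name, substrates, products, enzymes)` tuple for each reaction. Edges from different reactions stay distinct because their labels differ, which is what makes the result a multi-graph:

```python
def build_system(reactions):
    """Return (V, E) for a list of (name, substrates, products, enzymes) tuples."""
    vertices = set()
    edges = set()
    for name, substrates, products, enzymes in reactions:
        # Every species mentioned by the reaction, enzymes included, is a vertex.
        vertices.update(substrates, products, enzymes)
        # One edge per substrate/product pair, labelled with the reaction name.
        for s in substrates:
            for p in products:
                edges.add((s, p, name))
    return vertices, edges


V, E = build_system([("R1", ["A", "B"], ["C", "D"], ["E"])])
print(sorted(E))  # [('A', 'C', 'R1'), ('A', 'D', 'R1'), ('B', 'C', 'R1'), ('B', 'D', 'R1')]
```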


USAGE:

 This is a collection of imagined user interactions with the pathway module:

 First we create a bunch of Species objects which refer to descriptions of
 them, such as KEGG or WIT records. This step will usually happen inside a
 database parser:

 A = Species('A',ref1)
 B = Species('B',ref2)
 C = Species('C',ref3)
 ...

 Then we create any Reaction objects. This will also usually happen inside a
 parser module:

 R1 = Reaction(name='smelly',substrates=[A,B],enzymes=[E],products=[C,D])
 R2 = Reaction(name='decay',substrates=[C])
 R3 = R1.reverse()

 It should be easy to create a System object from a collection of
 Reactions. Connectivity should be inferred automatically when several
 reactions are combined:

 >>> S = System()
 >>> S.add_reaction(R1)
 >>> S.add_reaction(R2)
 >>> S.species()
 [Species('A'), Species('B'), ..., Species('E')]

 We might be interested in only some of the species:

 >>> S.enzymes()
 [Species('E')]
 >>> S.metabolites()
 [Species('A'), Species('B'), Species('C'), Species('D')]

 Other useful information:

 >>> S.stoichiometry()
 [[-1 -1 1 1], [0 0 -1 0]]
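The matrix above has one row per reaction (R1, R2) and one column per metabolite (A, B, C, D). A consumed species gets -1 and a produced one gets +1. The sketch below reproduces it. It assumes unit coefficients, because the proposed Reaction class holds sets and has nowhere to store a coefficient such as the 2 in 2 ATP. Species are plain strings. Many elementary-mode and MCA tools use the transpose, metabolites by reactions:

```python
def stoichiometry(reactions, metabolites):
    """Return a reactions x metabolites matrix using unit coefficients."""
    matrix = []
    for substrates, products in reactions:
        # +1 if the metabolite is produced, -1 if consumed, 0 otherwise
        row = [int(m in products) - int(m in substrates) for m in metabolites]
        matrix.append(row)
    return matrix


reactions = [(["A", "B"], ["C", "D"]),  # R1: A + B -E-> C + D (enzyme E is not a metabolite)
             (["C"], [])]              # R2: decay of C
print(stoichiometry(reactions, ["A", "B", "C", "D"]))  # [[-1, -1, 1, 1], [0, 0, -1, 0]]
```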

 Putting the information to use:

 flux analysis:

 >>> from Bio.Pathway import Metatool
 >>> Metatool.find_elementary_modes(S, externals=[A,D])
 ...Metatool output...

 neighborhood queries:

 >>> from Bio.Pathway import Graph
 >>> Graph.find_neighbours(S, E1, separation=3)
 [[E2, E3], [E4], []]

 ..and so on. You get the picture.
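The neighbourhood query can be answered by a breadth-first search that collects one layer per separation step. That gives exactly the `[[E2, E3], [E4], []]` shape imagined above. The sketch assumes the enzyme-to-enzyme adjacency has already been extracted from the System into a plain dict. This `find_neighbours` is a hypothetical stand-in for the imagined `Graph.find_neighbours`:

```python
def find_neighbours(adjacency, start, separation):
    """Return `separation` lists: nodes exactly 1, 2, ... steps away from start."""
    seen = {start}
    frontier = [start]
    layers = []
    for _ in range(separation):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, []):
                if neighbour not in seen:  # keep only the shortest separation
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        layers.append(next_frontier)
        frontier = next_frontier
    return layers


adjacency = {"E1": ["E2", "E3"], "E2": ["E4"], "E3": ["E1"]}
print(find_neighbours(adjacency, "E1", 3))  # [['E2', 'E3'], ['E4'], []]
```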


Appendix :) - reply to Cayte:


> Step is separate from reaction, because a reaction could occur in
> more than one pathway.

I'm not sure I see the rationale for this. It is true that a reaction
can occur in several pathways, but unless there is information about a
reaction that only applies to a specific pathway there is no need to
keep a separate Step object - you can just let two different pathway
objects reference the same reaction object.

> There may be other information associated with a reaction, like
> temperature, but I haven't come across it yet in the WIT or
> EMP databases.

As I said above, I don't think we should represent kinetics and
other "volatile" information in the core pathway objects.


 - Tarjei


From m.1.robinson at herts.ac.uk  Wed Aug  1 03:41:44 2001
From: m.1.robinson at herts.ac.uk (Mark Robinson)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
References: <3B66BE5E.7020204@herts.ac.uk> <p05101004b78c9282b250@[171.65.33.250]>
Message-ID: <3B67B2B8.1020901@herts.ac.uk>

Thanks Jeff,

I'll get stuck in using it then ;). Hope that's been some help. I have to 
say, so far I am really impressed by what I am seeing.
Great work!!

blobby

Jeffrey Chang wrote:

> Hey Mark,
> 
> Thanks for letting us know about these.  I'm moving this thread onto 
> the "biopython-dev" list, as it's probably more appropriate there.
> 
>> Failure: test_SubsMat
>> 
>> AssertionError:
>> output: 'M0.00 0.40 0.70 0.80 1.00\n'
>> Expected: 'M -0.00 0.40 0.70 0.80 1.00\n'
> 
> 
> It looks like this is from a difference in how Windows and Iddo's OS 
> handle 0's.  It's probably not serious, but should be fixed.  Iddo, 
> can you write some code that will check for this?
> 
> 
>> Error: test_gobase
>> 
>> from Bio import Sequence
>> ImportError: cannot import name Sequence
>> 
>> Error: test_rebase
>> 
>> from Bio import Sequence
>> ImportError: cannot import name Sequence
> 
> 
> These seem to be from some legacy code that hasn't been cleaned up. 
> It's now fixed in the CVS and will be incorporated into the next release.
> 
> 
> 
>> Failure: test_prodoc
>> 
>> AssertionError:
>> Output: 'J. \n'
>> Expected: 'J. \n'
> 
> 
> Brad, this looks pretty odd.  Is it a newline problem?
> 
> Jeff
> 
> 
> 


From chapmanb at arches.uga.edu  Wed Aug  1 05:28:30 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] "Features" of Bio.Clustalw
In-Reply-To: <Pine.LNX.4.31.0107310101210.7704-100000@mercurio.localdomain>
References: <15203.3182.97701.271322@taxus.athen1.ga.home.com>
	<Pine.LNX.4.31.0107310101210.7704-100000@mercurio.localdomain>
Message-ID: <15207.52158.379530.574917@taxus.athen1.ga.home.com>

Hi Davide;

[Clustalw bugs]

> Here I send the patches I was able to cook up, these are only minor
> changes, anyway I hope it will help.

Great! I applied these to CVS. Thanks much for the contribution!
 
> I think that having a class like MultipleAlignCL is superior to passing
> the alignment arguments to a function as is for blastpgp or blastall.

I'm glad you like it :-). This is just an idea I came up with because
clustalw had so many options. It seemed less confusing than trying to
pass in all of those options through a function.

blastall and blastpgp are Jeff Chang's functions, so maybe he could
comment on your idea to have classes to encompass their options. I'm
not positive if he even likes the "command line in a class" idea :-).

> Finally it is a general mechanism and could be used to give a uniform
> interface to functions invoking external programs.
> 
> Do you think you would be interested in a patch implementing such
> behaviour? I think one could also retain compatibility with the current
> interface.

As I mentioned above, it is really Jeff's call whether or not
he'd like to see something like this in blastall() and friends, but I
do think having a general interface would be nice. There was a lot of
talk at the BOSC/ISMB conference this year about other programs that it
would be nice for biopython to interface to (EMBOSS in particular), so
there is definitely interest and a lot of work that could be done
along these lines, if you are interested.

Also, during one of the talks at the ISMB conference I got inspired
and had an idea for a generic class for running Applications. Based on
what I scrawled on a piece of notebook paper during the talk, I wrote
up something that kind of sketches out the ideas I had and attached it
to this mail. This isn't working code or anything -- just enough to
show the ideas. I'm not really sure if this is good, but I thought you
might be interested in looking at it if you want to work further on
this. Feel free to use it or not use it.

Thanks again for the patches and interest!

Brad

-------------- next part --------------
"""Rough ideas for a general way to access applications in biopython.
"""
import os

# --- the general classes

class AbstractApplication:
    """Generic interface for running applications from biopython.

    This class shouldn't be called directly; it should be subclassed to
    provide an implementation for a specific application.
    """
    def __init__(self):
        self.program_name = ""
        self.parameters = []
    
    def run(self):
        """Construct the commandline and run the program.
        """
        pass

    def construct_commandline(self):
        """Make the commandline with the currently set options.
        """
        commandline = "%s " % self.program_name
        for parameter in self.parameters:
            if parameter.is_required and not(parameter.is_set):
                raise ValueError("Parameter %s is not set." % parameter.names)
            if parameter.is_set:
                commandline += str(parameter)

        return commandline

    def set_parameter(self, name, value = None):
        """Set a commandline option for a program.
        """
        set_option = 0
        for parameter in self.parameters:
            if name in parameter.names:
                if value is not None:
                    if parameter.checker_function is not None:
                        parameter.checker_function(value)

                    parameter.value = value
                parameter.is_set = 1
                set_option = 1

        if set_option == 0:
            raise ValueError("Option name %s was not found." % name)
                    
class _AbstractParameter:
    """A class to hold information about a parameter for a commandline.

    Do not use this directly, instead use one of the subclasses.

    Attributes:

    o names -- a list of string names by which the parameter can be
    referenced (ie. ["-a", "--append", "append"]). The first name in
    the list is considered to be the one that goes on the commandline,
    for those parameters that print the option.

    o checker_function -- a reference to a function that will determine
    if a given value is valid for this parameter.

    o description -- a description of the option.

    o is_required -- a flag to indicate if the parameter must be set for
    the program to be run.

    o is_set -- if the parameter has been set

    o value -- the value of a parameter
    """
    def __init__(self, names=None, checker_function=None, is_required=0,
                 description=""):
        # Copy into a fresh list so instances never share a mutable default.
        self.names = list(names or [])
        self.checker_function = checker_function
        self.description = description
        self.is_required = is_required

        self.is_set = 0
        self.value = None

class _Option(_AbstractParameter):
    """Represent an option that can be set for a program.

    This holds UNIXish options like --append=yes and -a yes
    """
    def __str__(self):
        """Return the value of this option for the commandline.
        """
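        # options spelled like "-USETREE=" take their value with no space
        if self.names[0].endswith("="):
            output = "%s" % self.names[0]
            if self.value is not None:
                output += "%s" % self.value
            output += " "
        # then long options (--append=yes)
        elif self.names[0].startswith("--"):
            output = "%s" % self.names[0]
            if self.value is not None:
                output += "=%s" % self.value
            output += " "
        # now short options (-a yes)
        elif self.names[0].startswith("-"):
            output = "%s " % self.names[0]
            if self.value is not None:
                output += "%s " % self.value
        else:
            raise ValueError("Unrecognized option type: %s" % self.names[0])

        return output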

class _Argument(_AbstractParameter):
    """Represent an argument on a commandline.
    """
    def __str__(self):
        if self.value is not None:
            return "%s " % self.value
        else:
            return " "
    
# --- Example program for Clustalw

class ClustalwApplication(AbstractApplication):
    """Accessing Clustalw through the Application interface.

    XXX This is not done at all -- just meant as an example of how the
    AbstractApplication stuff might work.
    This class could also have the same 'helper functions'
    as the current MultipleAlignCL class.
    """
    def __init__(self):
        AbstractApplication.__init__(self)

        self.program_name = "clustalw"

        self.parameters = \
          [_Argument(["sequence_file"], self._file_exists, 1),
           _Option(["-USETREE=", "guide_tree"], self._file_exists, 0),
           _Option(["-OUTPUT=", "output_type"], self._valid_output_type, 0)
          ] 

    def run(self):
        commandline = self.construct_commandline()
        # just put in the stuff from Bio/Clustalw/__init__.py.do_alignment()

    # --- functions to check for valid parameters
    def _file_exists(self, filename):
        """Make sure that a passed filename exists.
        """
        if not(os.path.exists(filename)):
            raise ValueError("File %s does not exist." % filename)

    def _valid_output_type(self, output_type):
        OUTPUT_TYPES = ['GCG', 'GDE', 'PHYLIP', 'PIR', 'NEXUS']
        if output_type not in OUTPUT_TYPES:
            raise ValueError("Output type %s not valid. Options are %s" %
                             (output_type, OUTPUT_TYPES))
From chapmanb at arches.uga.edu  Wed Aug  1 05:42:58 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <p05101004b78c9282b250@[171.65.33.250]>
References: <3B66BE5E.7020204@herts.ac.uk>
	<p05101004b78c9282b250@[171.65.33.250]>
Message-ID: <15207.53026.138744.787675@taxus.athen1.ga.home.com>

Jeff:
> Thanks for letting us know about these.  I'm moving this thread onto 
> the "biopython-dev" list, as it's probably more appropriate there.

I'd like to second the thanks -- it's all around nice to have people
using biopython regularly on non-UNIX platforms. 

> >Failure: test_SubsMat
> >
> >AssertionError:
> >output: 'M0.00 0.40 0.70 0.80 1.00\n'
> >Expected: 'M -0.00 0.40 0.70 0.80 1.00\n'
> 
> It looks like this is from a difference in how Windows and Iddo's OS 
> handle 0's.  It's probably not serious, but should be fixed.  Iddo, 
> can you write some code that will check for this?

I think this actually might be a python version difference and not an
OS difference. I'm also seeing it right now on my machine:

$ uname -a
NetBSD taxus.athen1.ga.home.com 1.5.1 NetBSD 1.5.1 (TAXUS) #1: Tue Jun 12 09:13:48 EDT 2001     chapmanb@taxus:/usr/src/sys/arch/macppc/compile/TAXUS macppc

$ python
Python 2.1 (#6, Jul  8 2001, 17:18:01) 
[GCC egcs-2.91.66 19990314 (egcs-1.1.2 release)] on netbsd1

FAIL: test_SubsMat
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 153, in runTest
    expected_handle)
  File "run_tests.py", line 247, in compare_output
    assert expected_line == output_line, \
AssertionError: 
Output  : 'M 0.00 0.40 0.70 0.80 1.00\n'
Expected: 'M -0.00 0.40 0.70 0.80 1.00\n'

This is just one I've thrown my hands up in the air about. It's not
really a bug in SubsMat (hey, 0.00 and -0.00 are still the same, right
:-), but I'm not sure how to make the regression checker recognize this.
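One way to make the regression checker accept both spellings is to normalise negative zeros on both sides before comparing. The sketch assumes comparison stays line-by-line as in `compare_output`. `normalize_line` and `lines_match` are hypothetical helpers, not part of run_tests.py. The pattern only rewrites values that are zero in every printed digit:

```python
import re

# "-0.00", "-0.0", ... not followed by further digits (so "-0.001" is left alone).
_NEGATIVE_ZERO = re.compile(r"-(0\.0+)(?!\d)")


def normalize_line(line):
    """Rewrite every printed negative zero such as '-0.00' as '0.00'."""
    return _NEGATIVE_ZERO.sub(r"\1", line)


def lines_match(expected_line, output_line):
    """Line comparison that treats '-0.00' and '0.00' as equal."""
    return normalize_line(expected_line) == normalize_line(output_line)


print(lines_match('M -0.00 0.40 0.70 0.80 1.00\n',
                  'M 0.00 0.40 0.70 0.80 1.00\n'))  # True
```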

> >Failure: test_prodoc
> >
> >AssertionError:
> >Output: 'J. \n'
> >Expected: 'J. \n'
> 
> Brad, this looks pretty odd.  Is it a newline problem?

This is another one I've seen on Windows and also on Yair's Mac stuff,
but have to throw my hands up in the air about. What Mark reported
here is different from what I've seen -- my error looks like:

Output: 'J. \n'
Expected: 'J.\n'

So, for some unknown reason, an extra space is generated at the
end of the line that we don't see on UNIX platforms. I'm not sure
what is going on here, or how we can make the regression tester stop
choking on it (other than reintroducing my "end of the line whitespace
isn't important" stuff :-). 
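The "end of the line whitespace isn't important" approach can be sketched as a comparison on right-stripped lines. This is a guess at how it might look, not Brad's earlier code. It also hides CRLF vs LF differences. The trade-off is that genuine trailing-whitespace regressions go unnoticed:

```python
def lines_match_ignoring_trailing_space(expected_line, output_line):
    """Compare two lines, ignoring trailing spaces, tabs and CRLF vs LF endings."""
    return expected_line.rstrip() == output_line.rstrip()


print(lines_match_ignoring_trailing_space('J.\n', 'J. \n'))   # True
print(lines_match_ignoring_trailing_space('J.\n', 'J.\r\n'))  # True
print(lines_match_ignoring_trailing_space('J.\n', 'K.\n'))    # False
```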

Any ideas, anyone? I'd definitely like to clear up these two
problems if we could.

Brad


From idoerg at cc.huji.ac.il  Wed Aug  1 08:31:57 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <15207.53026.138744.787675@taxus.athen1.ga.home.com>
Message-ID: <Pine.GSO.4.33_heb2.09.0108011524010.17117-100000@new-shum>

Hi,

OK, I'm on to the test_SubsMat problem. I'll see what I can do to
accommodate this. It seems like a format-string handling problem, which may
arise from different OS versions. It doesn't seem to come from different Python
versions, as I'm also using 2.1, and the test passed under both 2.1 and
2.0. Brad is using 2.1 on a NetBSD machine, and is getting different
output than me.
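SubsMat may print its entries with a `%`-style float format. If so, a tiny negative value, or an IEEE negative zero, may come out as `-0.00` under one C library and `0.00` under another. One way to make the output platform-independent is to round first and then add `0.0`, which turns `-0.0` into `0.0`. This assumes the entries are Python floats. `format_entry` is a hypothetical helper, not the SubsMat code:

```python
def format_entry(value, digits=2):
    """Format a float so anything that rounds to zero prints as '0.00', never '-0.00'."""
    rounded = round(value, digits)
    # Adding 0.0 turns an IEEE negative zero (-0.0) into positive zero (0.0).
    return "%.*f" % (digits, rounded + 0.0)


print(format_entry(-0.001))  # 0.00
print(format_entry(0.4))     # 0.40
print(format_entry(-0.25))   # -0.25
```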


On another matter: got a problem with test_unigene:

idoerg@arrakis:biopython/Tests> python run_tests.py  test_unigene.py
test_unigene ... FAIL

======================================================================
FAIL: test_unigene
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 153, in runTest
    expected_handle)
  File "run_tests.py", line 247, in compare_output
    assert expected_line == output_line, \
AssertionError:
Output  : '        key is AI616857\n'
Expected: '        key is AA495266\n'
----------------------------------------------------------------------
Ran 1 tests in 0.732s

FAILED (failures=1)

My machine:

idoerg@arrakis:biopython/Tests> uname -a
Linux arrakis.md.huji.ac.il 2.2.16-22enterprise #1 SMP Tue Aug 22 16:29:32
EDT 2000 i686 unknown

My Python:

idoerg@arrakis:biopython/Tests> python
Python 2.1 (#1, Jul 11 2001, 11:27:29)
[GCC 2.96 20000731 (Red Hat Linux 7.1 2.96-85)] on linux2


Platform-independence-means-that-some-platforms-are-more-independent-than-others'ly

yours,

Iddo



--

Iddo Friedberg                                  | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120                      |
Israel                                          |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/


From jchang at SMI.Stanford.EDU  Wed Aug  1 11:01:35 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] "Features" of Bio.Clustalw
In-Reply-To: <15207.52158.379530.574917@taxus.athen1.ga.home.com>
References: <15203.3182.97701.271322@taxus.athen1.ga.home.com>
 <Pine.LNX.4.31.0107310101210.7704-100000@mercurio.localdomain>
 <15207.52158.379530.574917@taxus.athen1.ga.home.com>
Message-ID: <p05101000b78dc887fcf0@[192.168.0.4]>

>  [Davide Marchignoli]
>  > I think that having a class like MultipleAlignCL is superior to passing
>  > the alignment arguments to a function as is for blastpgp or blastall.

[Brad Chapman]
[...]
>blastall and blastpgp are Jeff Chang's functions, so maybe he could
>comment on your idea to have classes to encompass their options. I'm
>not positive if he even likes the "command line in a class" idea :-).
>
>>  Finally it is a general mechanism and could be used to give a uniform
>>  interface to functions invoking external programs.
>>
>>  Do you think you would be interested in a patch implementing such
>>  behaviour? I think one could also retain compatibility with the current
>>  interface.

Yes, I think that's a good idea, and one that I've used in other 
modules I've written.  However, I do still want a low-level interface 
mapped closely to the program where you pass in variables as 
parameters to the function.  If you have that, it's always possible 
to build other interfaces on top of it, as you suggest.  However, 
it's harder to go the other way around.
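This layering can be sketched as a low-level function whose keyword arguments map one-to-one onto command-line flags, with a class built on top of it. The names `clustalw_args` and `ClustalwCommandline` are hypothetical. The flag spellings (`-USETREE=`, `-OUTPUT=`) are assumed from clustalw's usual conventions rather than checked against a particular release:

```python
def clustalw_args(sequence_file, usetree=None, output_type=None):
    """Low-level layer: keyword arguments map directly onto clustalw flags."""
    args = ["clustalw", sequence_file]
    if usetree is not None:
        args.append("-USETREE=%s" % usetree)
    if output_type is not None:
        args.append("-OUTPUT=%s" % output_type)
    return args


class ClustalwCommandline:
    """Higher-level layer built on clustalw_args: collect options, then render."""

    def __init__(self, sequence_file):
        self.sequence_file = sequence_file
        self.options = {}

    def set_option(self, name, value):
        self.options[name] = value

    def args(self):
        # Delegating keeps the class from drifting out of sync with the function.
        return clustalw_args(self.sequence_file, **self.options)


cl = ClustalwCommandline("seqs.fasta")
cl.set_option("usetree", "guide.dnd")
print(cl.args())  # ['clustalw', 'seqs.fasta', '-USETREE=guide.dnd']
```

Because the class only forwards to the function, an unknown option name fails with a `TypeError` from the function call. The low-level function also stays usable on its own.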

>class AbstractApplication:
>     """Generic interface for running applications from biopython.
>
>     This class shouldn't be called directly; it should be subclassed to
>     provide an implementation for a specific application.
>     """

Looks pretty cool.  The only thing that might be missing is some way 
of dealing with the output.  That way, you can pass around 
applications that you can call, and it will return usable objects. 
But maybe that should be done in a decorator class.
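A decorator in the design-pattern sense (not Python's `@` syntax) could look like the following sketch. It assumes the wrapped object has a `run()` method returning the program's raw text output, which the `AbstractApplication` sketch does not yet do. `ParsedApplication`, `EchoApplication` and `parse_pairs` are hypothetical:

```python
class ParsedApplication:
    """Wrap an application so that calling the wrapper returns parsed objects."""

    def __init__(self, application, parser):
        self.application = application  # assumed to offer run() -> raw text
        self.parser = parser            # any callable: raw text -> objects

    def __call__(self):
        return self.parser(self.application.run())

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself, so
        # set_parameter() and friends are forwarded to the wrapped application.
        return getattr(self.application, name)


class EchoApplication:
    """Stub standing in for a real program; run() returns canned output."""

    def __init__(self):
        self.params = {}

    def set_parameter(self, name, value=None):
        self.params[name] = value

    def run(self):
        return "a 1\nb 2\n"


def parse_pairs(text):
    """Toy parser: one 'key value' pair per line."""
    return dict(line.split() for line in text.splitlines())


app = ParsedApplication(EchoApplication(), parse_pairs)
app.set_parameter("x", 1)
print(app())  # {'a': '1', 'b': '2'}
```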

Jeff

From idoerg at cc.huji.ac.il  Wed Aug  1 12:23:20 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <3B66BE5E.7020204@herts.ac.uk>
Message-ID: <Pine.GSO.4.33_heb2.09.0108011918320.1809-100000@new-shum>

Hi,

I just read Mark's post a bit more carefully:


On Tue, 31 Jul 2001, Mark Robinson wrote:

: Hi guys,
:

[Description of a couple of bugs in the regression tests]

:
: ===
:
: The two AssertionErrors don't occur if I run the individual test script
: only if I run it from the graphical interface, and I guess it looks like
: the newline error you flag in the tutorial.

Can anybody say why the AssertionErrors do not occur when the individual
scripts are run, but only when the graphical interface is used? This
sounds a bit weird...

I should add my thanks here to all those who are involved in porting &
checking Biopython on various platforms. This is very important.


Iddo




From chapmanb at arches.uga.edu  Wed Aug  1 12:43:41 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <Pine.GSO.4.33_heb2.09.0108011918320.1809-100000@new-shum>
References: <3B66BE5E.7020204@herts.ac.uk>
	<Pine.GSO.4.33_heb2.09.0108011918320.1809-100000@new-shum>
Message-ID: <15208.12733.310043.228233@taxus.athen1.ga.home.com>

Mark:
> : The two AssertionErrors don't occur if I run the individual test script
> : only if I run it from the graphical interface, and I guess it looks like
> : the newline error you flag in the tutorial.

Iddo:
> Can anybody say why the AssertionErrors do not occur when the individual
> scripts are run, but only when the graphical interface is used? This
> sounds a bit weird...

Sure, if you just run the test script:

python test_SubsMat.py

the test itself runs fine. But if you add on the comparison of the
generated output to the old output, then that's where you'll get the
error (an AssertionError in this case, since the regression testing
framework just asserts that the lines are the same).

If you want to just run the regression testing stuff on a single test,
you can do:

python run_tests.py test_SubsMat

to just do SubsMat.

Back-to-work-ly yr's,
Brad


From idoerg at cc.huji.ac.il  Wed Aug  1 12:55:10 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <15208.12733.310043.228233@taxus.athen1.ga.home.com>
Message-ID: <Pine.GSO.4.33_heb2.09.0108011951470.1809-100000@new-shum>

On Wed, 1 Aug 2001, Brad Chapman wrote:

: Mark:
: > : The two AssertionErrors don't occur if I run the individual test script
: > : only if I run it from the graphical interface, and I guess it looks like
: > : the newline error you flag in the tutorial.
:
: Iddo:
: > Can anybody say why the AssertionErrors do not occur when the individual
: > scripts are run, but only when the graphical interface is used? This
: > sounds a bit weird...
:
: Sure, if you just run the test script:
:
: python test_SubsMat.py
:
: the test itself runs fine. But if you add on the comparison of the
: generated output to the old output, then that's where you'll get the
: error (an AssertionError in this case, since the regression testing
: framework just asserts that the lines are the same).
:

Oh, I thought that Mark meant:

python run_tests.py test_SubsMat.py

(As you show later, which compares with the old output).

That explains stuff.

Iddo



From tarjei at genome.wi.mit.edu  Wed Aug  1 23:49:33 2001
From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Pathway Module
In-Reply-To: <003c01c11b17$477f3000$010a0a0a@cadence.com>
Message-ID: <NEBBJGOKPAACGBJLMGCDMEOPCBAA.tarjei@genome.wi.mit.edu>

> > > Step is separate from reaction, because a reaction could occur in
> > > more than one pathway.
> >
> > I'm not sure I see the rationale for this. It is true that a reaction
> > can occur in several pathways, but unless there is information about a
> > reaction that only applies to a specific pathway there is no need to
> > keep a separate Step object - you can just let two different pathway
> > objects reference the same reaction object.
> >
>
>    The information that applies to just one pathway is the branching and
> sequence: the in links and out links to other steps. Maybe you can tease
> this information out of the products and substrates for each reaction,
> but I thought of using explicit links from one step to the next step(s).

 So there are two issues here: 1) When is there a link between two
reactions in a pathway? and 2) How do we represent those links?

 For 1) my understanding is that a pathway is uniquely defined by the
substrates and products of its constituent reactions. That is,
there is always a link from A->B to B->A and from C + D -> E to E -> F,
and there is never a link from A->B to C->D, or from E -> C + D to A -> B.
Because of this I think it is important that a Pathway/System class
infer links between reactions automatically. That is, if a user
combines two reactions A -> B and B -> C into a pathway, s/he should
not have to explicitly define the link between them.
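The linking rule described above is easy to compute: there is a link from reaction X to reaction Y whenever some product of X is a substrate of Y. Here is a sketch over hypothetical `(name, substrates, products)` tuples. A real System class would probably index reactions by species instead of comparing every pair:

```python
def infer_links(reactions):
    """Return (from_name, to_name) pairs where a product of the first
    reaction is a substrate of the second."""
    links = set()
    for name1, _, products1 in reactions:
        for name2, substrates2, _ in reactions:
            if set(products1) & set(substrates2):
                links.add((name1, name2))
    return links


print(infer_links([("R1", ["A"], ["B"]), ("R2", ["B"], ["C"])]))  # {('R1', 'R2')}
```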

 For 2) there are several equivalent options. My proposed classes would
keep a (kind of) adjacency list/matrix in the System class that explicitly
defines these links. If I understand your proposal, your idea is to keep
an array of Step objects that reference their neighbors internally.
 These two representations are essentially equivalent and shouldn't make any
difference to the end user, so I don't have a particularly strong opinion
about which is better.

  Keep the ideas flowing :)

 Tarjei


From jchang at SMI.Stanford.EDU  Wed Aug  1 19:57:22 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] add dynamic programming alignment modules
Message-ID: <p05101006b78e43ebd170@[171.65.33.250]>

On the flight home from ISMB, I coded up some modules to do pairwise 
alignments.  I went ahead and put them into the Bio.Align package 
because they seem most appropriate there -- I hope nobody objects!

There are two main modules: pairwise.py and fastpairwise.py.  The 
first one implements a slower, more general alignment algorithm.  The 
second is faster, but requires an affine gap penalty.  Right now, 
they're both implemented in Python.  However, I structured the code so
that it won't be hard to swap out a piece of it with fast C code.  I
didn't have time to do this on the flight, but might get to it at a
later date.
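The usual reason an affine gap penalty buys speed is the three-matrix recurrence (Gotoh's algorithm). It keeps the fill at O(nm), whereas an arbitrary gap function typically requires looking back over every possible gap length at each cell. Below is a minimal score-only sketch in Python. It assumes simple match/mismatch scores and charges `gap_open` for the first position of a gap plus `gap_extend` for each further position. It is not the code in pairwise.py or fastpairwise.py, and the function name is hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Best global alignment score of a and b with affine gap costs."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:  # open a new gap or extend an existing one
                X[i][j] = max(M[i - 1][j] - gap_open,
                              X[i - 1][j] - gap_extend,
                              Y[i - 1][j] - gap_open)
            if j > 0:
                Y[i][j] = max(M[i][j - 1] - gap_open,
                              Y[i][j - 1] - gap_extend,
                              X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, aligning `AAAA` with `AA` scores 2 for the matches minus 3 for one gap of length two, i.e. -1, which beats two separate one-position gaps (-2).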

Enjoy!

Jeff

From jchang at SMI.Stanford.EDU  Wed Aug  1 13:40:16 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <Pine.GSO.4.33_heb2.09.0108011524010.17117-100000@new-shum>
References: <Pine.GSO.4.33_heb2.09.0108011524010.17117-100000@new-shum>
Message-ID: <p05101005b78def09ec9b@[171.65.33.250]>

>On another matter: got a problem with test_unigene:
>
>idoerg@arrakis:biopython/Tests> python run_tests.py  test_unigene.py
>test_unigene ... FAIL


>My machine:
>
>idoerg@arrakis:biopython/Tests> uname -a
>Linux arrakis.md.huji.ac.il 2.2.16-22enterprise #1 SMP Tue Aug 22 16:29:32
>EDT 2000 i686 unknown
>
>My Python:
>
>idoerg@arrakis:biopython/Tests> python
>Python 2.1 (#1, Jul 11 2001, 11:27:29)
>[GCC 2.96 20000731 (Red Hat Linux 7.1 2.96-85)] on linux2


Uh, oh.  That's bad.  Are you sure you have a current CVS?  Mine 
works.  I'm on:

SunOS helio 5.6 Generic_105181-25 sun4u sparc SUNW,Ultra-Enterprise

Python 2.1 (#7, Apr 17 2001, 18:53:25)
[GCC 2.8.1] on sunos5


Cayte, could you look into this?

Jeff

From jchang at SMI.Stanford.EDU  Wed Aug  1 13:30:46 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <15207.53026.138744.787675@taxus.athen1.ga.home.com>
References: <3B66BE5E.7020204@herts.ac.uk>
 <p05101004b78c9282b250@[171.65.33.250]>
 <15207.53026.138744.787675@taxus.athen1.ga.home.com>
Message-ID: <p05101004b78dea07bfff@[171.65.33.250]>

I think this may be a problem in test_prodoc.py rather than the 
regression testing framework.  This output is generated in a function 
called print_references:

def print_references( list ):
     for item in list:
         text = item.number + ' ' + item.authors + ' ' + item.citation
         while text:
             print text[ :80 ]
             text = text[ 80: ]

It prints some text out 80 characters at a time.  Perhaps this 
boundary is falling on different characters depending on the OS' line 
breaking convention.  To make things more difficult, the text file 
itself has different types of line breaks.
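A small sketch of the hypothesis, using made-up text rather than the actual prodoc test data. The same logical text with a DOS-style `\r\n` line break has one extra character, so every later 80-character boundary moves by one. Normalizing line endings before chunking removes the platform dependence.

```python
def chunk(text, width=80):
    """Split text into fixed-width pieces, like the loop in print_references."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def normalized_chunks(text, width=80):
    """Convert DOS and old-Mac line endings to '\n' before chunking."""
    return chunk(text.replace("\r\n", "\n").replace("\r", "\n"), width)


unix_text = "A" * 78 + "\nJ. Smith"
dos_text = unix_text.replace("\n", "\r\n")  # same text, DOS-style line break

# chunk(unix_text)[1] is ". Smith" but chunk(dos_text)[1] is "J. Smith"
```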

I've gone through and changed this code to:

def print_references( list ):
     for item in list:
         print item.number
         print item.authors
         print item.citation

and submitted the changes and the new output to CVS.

I can't reproduce this here, so could someone with a Windows
machine take a look at it?

Thanks,
Jeff


>>  >Failure: test_prodoc
>>  >
>>  >AssertionError:
>>  >Output: 'J. \n'
>>  >Expected: 'J. \n'
>>
>>  Brad, this looks pretty odd.  Is it a newline problem?
>
>This is another one I've seen on Windows and also on Yair's Mac stuff,
>but have to throw my hands up in the air about. What Mark reported
>here is different from what I've seen -- my error looks like:
>
>Output: 'J. \n'
>Expected: 'J.\n'
>
>So, there is, for some unknown reason, an extra space generated at the
>end of the line, that we don't see on UNIX platforms. I'm not sure
>what is going on here, or how we can make the regression tester stop
>choking on it (other than reintroducing my "end of the line whitespace
>isn't important stuff" :-).
>
>Any ideas, anyone? I'd definitely like to clear up these two
>problems if we could.
>
>Brad
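Brad's earlier "end of the line whitespace isn't important" code is not shown in the thread. The following is a hypothetical sketch of what such a comparison could look like: compare output and expected text line by line after stripping trailing whitespace (and, via `splitlines`, regardless of `\n` vs `\r\n`).

```python
def same_ignoring_trailing_whitespace(output, expected):
    """True if the texts match line by line once end-of-line whitespace is removed."""
    out_lines = [line.rstrip() for line in output.splitlines()]
    exp_lines = [line.rstrip() for line in expected.splitlines()]
    return out_lines == exp_lines


# same_ignoring_trailing_whitespace('J. \n', 'J.\n') is True
```

The trade-off is that a comparison like this would also hide genuine trailing-space bugs, which may be why fixing the test output itself is preferable.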

From katel at worldpath.net  Thu Aug  2 01:52:10 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Pathway Module
References: <NEBBJGOKPAACGBJLMGCDKEKFCBAA.tarjei@genome.wi.mit.edu>
Message-ID: <003c01c11b17$477f3000$010a0a0a@cadence.com>

----- Original Message -----
S1:
 V = {A,B,C,D,E}
 E = {(A,C,E), (A,D,E), (B,C,E), (B,D,E)}


> > Step is separate from reaction, because a reaction could occur in
> > more than one pathway.
>
> I'm not sure I see the rationale for this. It is true that a reaction
> can occur in several pathways, but unless there is information about a
> reaction that only applies to a specific pathway there is no need to
> keep a separate Step object - you can just let two different pathway
> objects reference the same reaction object.
>

   The information that applies to just one pathway is the branching and
sequence, the in links and out links to other steps.  Maybe you can tease
this information out of the products and substrates for each reaction, but I
thought of using explicit links from one step to the next step(s).

                                                 Cayte


From davide at biodec.com  Thu Aug  2 04:29:11 2001
From: davide at biodec.com (Davide Marchignoli)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Abstract application wrapper
In-Reply-To: <15207.52158.379530.574917@taxus.athen1.ga.home.com>
Message-ID: <Pine.LNX.4.31.0108012148250.903-100000@mercurio.localdomain>

Hi,

On Wed, 1 Aug 2001, Brad Chapman wrote:

> Hi Davide;
>
> As I mentioned above, it is really Jeff's call about whether or not
> he'd like to see something like this in blastall() and friends; but I
> do think having a general interface would be nice. There was a lot of
> talk at the BOSC/ISMB conference this year about other programs that it
> would be nice for biopython to interface to (EMBOSS in particular) so
> there is definitely interest and a lot of work that could be done
> along these lines, if you are interested.
>
> Also, during one of the talks at the ISMB conference I got inspired
> and had an idea for a generic class for running Applications. Based on
> what I scrawled on a piece of notebook paper during the talk, I wrote
> up something that kind of sketches out the ideas I had and attached it
> to this mail. This isn't working code or anything -- just enough to
> show the ideas. I'm not really sure if this is good, but I thought you
> might be interested in looking at it if you want to work further on
> this. Feel free to use it or not use it.
>
> Thanks again for the patches and interest!
>
> Brad
>

I think it is really very nice!

In my opinion it is general enough to encapsulate most (if not all)
external programs used within biopython.

If there is an agreement on the interface I think it should not be a
problem to fix the implementation details.

However I slightly prefer the lighter version in which you have a class

AbstractApplicationCommandLine (yes to be shortened) instead of
AbstractApplication

where the only differences are that there is no run method and that
there is a __str__ method behaving like construct_commandline (or,
perhaps better, something returning a list of strings?).

In my opinion the advantage of such architecture is that you do not have a
wrapper around the function running the application, but rather your class
works side by side with the function running the application.

You retain the lowest-level interface given by the function, to which you
can pass the os.system string, and you also gain a higher-level interface
in which you pass an instance of some class derived from
AbstractApplicationCommandLine.
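A minimal sketch of the split Davide describes, written in modern Python with `subprocess` (the `popen2` module no longer exists). The class only represents the command line, with `__str__` giving the string form and `to_args` the list-of-strings form. Running it is a separate function that accepts either the class or a plain argument list, so the low-level interface stays available. `AbstractCommandLine`, `set_parameter`, `to_args` and `run_commandline` are hypothetical names, not an existing Biopython API.

```python
import shlex
import subprocess
import sys


class AbstractCommandLine:
    """Represents a command line only; running it is left to a separate function."""

    def __init__(self, program):
        self.program = program
        self.parameters = []  # ordered (flag, value) pairs

    def set_parameter(self, flag, value=None):
        """Add a flag; value=None means a bare switch with no argument."""
        self.parameters.append((flag, value))
        return self  # allow chaining

    def to_args(self):
        """The command line as a list of strings."""
        args = [self.program]
        for flag, value in self.parameters:
            args.append(flag)
            if value is not None:
                args.append(str(value))
        return args

    def __str__(self):
        # shlex.join (Python 3.8+) quotes arguments containing spaces or shell characters
        return shlex.join(self.to_args())


def run_commandline(cmdline):
    """Low-level runner: accepts an AbstractCommandLine or a ready-made argument list."""
    if isinstance(cmdline, AbstractCommandLine):
        args = cmdline.to_args()
    else:
        args = list(cmdline)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # sys.executable stands in for a real program such as blastpgp
    cmd = AbstractCommandLine(sys.executable).set_parameter("-c", "print('hello')")
    print(cmd)
    print(run_commandline(cmd), end="")
```

Passing a list to `subprocess.run` rather than a single string avoids having to re-split `str(commandline)`, which would break on arguments containing spaces.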

In my opinion, the interface currently provided by blastpgp is not
completely low level. For instance, you cannot pass blastpgp a parameter
that is not listed in att2param, so the blastpgp function already does some
kind of parsing. With this approach you would not repeat work (parameter
parsing would be done only at the level of the CommandLine class), you
would retain an interface at a lower level than the one you have now, and
finally you would have a high-level interface provided by the
AbstractApplicationCommandLine class.

One of the nice things it would allow would be the following:

# NON working code, for example purpose only
def blastpgp(commandline):
  args = str(commandline).split()

  ...

  r, w, e = popen2.popen3(args)
  if isinstance(commandline, BlastpgpCommandline) and commandline.streaminput:
    commandline.write_input(w)
  w.close()  # always close the child's stdin so it sees end-of-input

where BlastpgpCommandline implements:

def set_seqinput(self, seq_record):
  self.input_stream = None          # only one input source at a time
  self.input_seq_record = seq_record
  self.streaminput = 1

def set_streaminput(self, stream):
  self.input_seq_record = None
  self.input_stream = stream
  self.streaminput = 1

def write_input(self, outstream):
  if self.input_stream:
    outstream.write(self.input_stream.read())
  elif self.input_seq_record:
    SeqIO.Fasta.FastaWriter(outstream).write(self.input_seq_record)
  else:
    raise ValueError("no input has been set")

so the user could write something like

args = BlastpgpCommandline(...)
args.set_seqinput(seqrecord) # passing a SeqRecord as input
align = blastpgp(args)
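The stdin-streaming idea above can be sketched in modern Python, where `subprocess.run(input=...)` writes the data and closes the child's stdin in one step. It uses `communicate()` internally, which also avoids the deadlock that can occur when writing to `w` while the child fills its output pipe. `StreamingCommandLine`, `to_fasta` and `run_with_input` are hypothetical names, and a tiny Python one-liner that counts FASTA headers stands in for blastpgp.

```python
import subprocess
import sys


def to_fasta(title, sequence, width=60):
    """Format one sequence as FASTA text (a stand-in for a real FASTA writer)."""
    lines = [">" + title]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


class StreamingCommandLine:
    """A command line that may also carry data to feed to the program's stdin."""

    def __init__(self, args):
        self.args = list(args)
        self.input_text = None

    def set_seqinput(self, title, sequence):
        self.input_text = to_fasta(title, sequence)

    def set_streaminput(self, stream):
        self.input_text = stream.read()  # reads the whole stream into memory

    @property
    def streaminput(self):
        return self.input_text is not None


def run_with_input(commandline):
    """Run the program, feeding any input to its stdin; otherwise stdin is empty."""
    if commandline.streaminput:
        extra = {"input": commandline.input_text}
    else:
        extra = {"stdin": subprocess.DEVNULL}  # like closing w straight away
    result = subprocess.run(commandline.args, capture_output=True, text=True,
                            check=True, **extra)
    return result.stdout


# Stand-in program that counts FASTA headers on stdin
counter = [sys.executable, "-c", "import sys; print(sys.stdin.read().count('>'))"]
```

With `cmd = StreamingCommandLine(counter)`, calling `cmd.set_seqinput("query", "MKVLA")` before `run_with_input(cmd)` sends one FASTA record to the program.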


Let me know what you think about it.

Bye,
				Davide Marchignoli


From idoerg at cc.huji.ac.il  Thu Aug  2 02:51:22 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <p05101005b78def09ec9b@[171.65.33.250]>
Message-ID: <Pine.GSO.4.33_heb2.09.0108020948080.11496-100000@new-shum>

Hi,

[Iddo]

: >On another matter: got a problem with test_unigene:
: >
: >idoerg@arrakis:biopython/Tests> python run_tests.py test_unigene.py
: >test_unigene ... FAIL

False alarm.  Sorry.

Iddo


On Wed, 1 Aug 2001, Jeffrey Chang wrote:

:
:
: >My machine:
: >
: >idoerg@arrakis:biopython/Tests> uname -a
: >Linux arrakis.md.huji.ac.il 2.2.16-22enterprise #1 SMP Tue Aug 22 16:29:32
: >EDT 2000 i686 unknown
: >
: >My Python:
: >
: >idoerg@arrakis:biopython/Tests> python
: >Python 2.1 (#1, Jul 11 2001, 11:27:29)
: >[GCC 2.96 20000731 (Red Hat Linux 7.1 2.96-85)] on linux2
:
:
: Uh, oh.  That's bad.  Are you sure you have a current CVS?  Mine
: works.  I'm on:
:
: SunOS helio 5.6 Generic_105181-25 sun4u sparc SUNW,Ultra-Enterprise
:
: Python 2.1 (#7, Apr 17 2001, 18:53:25)
: [GCC 2.8.1] on sunos5
:
:
: Cayte, could you look into this?
:
: Jeff
:

--

Iddo Friedberg                                  | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120                      |
Israel                                          |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/


From jefftc at Stanford.EDU  Thu Aug  2 20:43:30 2001
From: jefftc at Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] test (apologies)
Message-ID: <Pine.GSO.4.31.0108021743050.24096-100000@saga1.Stanford.EDU>

My previous emails to biopython did not go through, so I'm sending this to
check if there's a problem with my mail.  Sorry about the spam!

Jeff


From katel at worldpath.net  Fri Aug  3 00:00:17 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
References: <Pine.GSO.4.33_heb2.09.0108011524010.17117-100000@new-shum> <p05101005b78def09ec9b@[171.65.33.250]>
Message-ID: <001c01c11bd0$d0c05e20$010a0a0a@cadence.com>

> Cayte, could you look into this?
>
  It's failing on a dictionary item.  I think I need to sort the items before
printing.  Andrew fixed this in another file, and I was going to put the fix
in unigene and kabat when I became bogged down in a Martel issue.
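Dictionary iteration order was arbitrary in Python of this era and could differ between builds, so printing a dict's items directly makes expected-output comparisons fragile. A minimal sketch of the sorting fix, assuming a record is a plain dict (the `format_record` name and sample keys are hypothetical):

```python
def format_record(record):
    """Render a dict one 'key: value' per line, keys sorted so output is stable."""
    return "\n".join("%s: %s" % (key, record[key]) for key in sorted(record))
```

Sorting still helps in modern Python, where dicts keep insertion order, because the printed output then no longer depends on the order in which a parser happened to fill the record.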

                     Cayte


From katel at worldpath.net  Fri Aug  3 03:40:26 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
References: <Pine.GSO.4.33_heb2.09.0108011524010.17117-100000@new-shum> <p05101005b78def09ec9b@[171.65.33.250]> <001c01c11bd0$d0c05e20$010a0a0a@cadence.com>
Message-ID: <002301c11bef$935c06a0$010a0a0a@cadence.com>

----- Original Message -----
From: "Cayte" <katel@worldpath.net>
To: "Iddo Friedberg" <idoerg@cc.huji.ac.il>; <biopython-dev@biopython.org>;
"Jeffrey Chang" <jchang@SMI.Stanford.EDU>
Sent: Thursday, August 02, 2001 9:00 PM
Subject: Re: [Biopython-dev] Re: [BioPython] tests failing


> > Cayte, could you look into this?
> >
>   It's failing on a dictionary item.  I think I need to sort the items before
> printing.  Andrew fixed this on another file and I was going to put the fix
> in unigene and kabat when I became bogged down on a Martel issue.
>
  I'm confused.  Someone (Andrew?) must have fixed the code.  The latest
CVS code (8/2) doesn't have the problem; I just downloaded it and tried
it.  Andrew said he made a fix, but that was a while ago.  The latest unigene
code is dated 8/2.!?!?!?

The heat must be getting to me. :)!


                                        Cayte


From pewilkinson at informaxinc.com  Fri Aug  3 14:34:04 2001
From: pewilkinson at informaxinc.com (Peter Wilkinson)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Pathway module
In-Reply-To: <200108031601.f73G1qq22184@pw600a.bioperl.org>
Message-ID: <001c01c11c4a$e04f8530$7d0210ac@l001696w00>

ok guys,

if you are not aware, look up BIND on the net. This is written by
Christopher Hogue's group (yes, the same person who wrote a chapter in the
Baxevanis text Bioinformatics).

his web site is at the Samuel Lunenfeld Research Institute in Toronto, Ontario,
and there is a paper on the site explaining how it works. It is
easy to find with www.google.com


the BIND data model works very well, and it involves interactions that can
be between protein:protein, protein:molecule, protein:photon, etc.

And so the pathway can be then built from all these interactions.

I strongly suggest that you have a good look at the BIND model, since, like
GenBank, it will become the standard public archive for molecular interactions
and pathway data.

Peter


From katel at worldpath.net  Sat Aug  4 00:04:40 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Pathway module
References: <001c01c11c4a$e04f8530$7d0210ac@l001696w00>
Message-ID: <000501c11c9a$97c07b80$010a0a0a@cadence.com>

>
> if you are not aware, look up BIND on the net. This is written by
> Christopher Hogue's group (yes, the same person who wrote a chapter in the
> Baxevanis text Bioinformatics).
>
>
    I found the text and my first impression is that it's too ambitious for
what we are doing.  We are not supporting kinetics or simulation.  Our
mandate, as I understand it, is to provide tools that make it easier to work
with databases like BIND, to complement, not duplicate, their functionality.
BIND supports detail at the atomic level (this enzyme tweaks three electrons
on the 4th carbon).

   We could use a subset, maybe Interaction, Pathway and Action objects.
But we would either carry a lot of extra baggage or have lots of empty
fields.  Even if the fields are empty, we would need code to fish through
them and pull out the data of interest.

  On the positive side, it would be extensible and compatible with a
standard format.  But I'd be concerned that the format is so rich, you'd
lose the forest for the trees.

  What do others think?

                                                                    Cayte


>


From tarjei at genome.wi.mit.edu  Fri Aug  3 21:59:38 2001
From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Pathway module
In-Reply-To: <000501c11c9a$97c07b80$010a0a0a@cadence.com>
Message-ID: <NEBBJGOKPAACGBJLMGCDAEPHCBAA.tarjei@genome.wi.mit.edu>

> > if you are not aware, look up BIND on the net. 
> >
> I found the text and my first impression is that it's too ambitious for
> what we are doing.  We are not supporting kinetics or simulation.  Our
> mandate, as I understand it, is to provide tools that make it
> easier to work with databases like BIND, to complement not duplicate
> their functionality.

I totally agree on this point. What we need is something that is both
much more lightweight and more "processed" than the BIND data model.
Our "Pathway" should be a data structure that makes it simple to operate
on data selectively extracted from BIND or other databases.

Basically it's the difference between the Bio.GenBank.Record class and
the Bio.Seq class.

On the other hand, BIND appears to have matured and gained some 
momentum since last time I heard of it, and there is no doubt that 
a module for parsing and selecting BIND data would be very useful.
It would be something to look at after our WIT/EMP and KEGG 
modules. (After all, with 6 pathways stored the BIND database is 
currently much less useful than these two).

On a related note, The EcoCyc ontology paper 
(Karp P. (2000) Bioinformatics, v16, n3, p269-285) is also worth a
look for anyone interested in this topic.

 Tarjei

From chapmanb at arches.uga.edu  Sat Aug  4 10:41:11 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <p05101004b78dea07bfff@[171.65.33.250]>
References: <3B66BE5E.7020204@herts.ac.uk>
	<p05101004b78c9282b250@[171.65.33.250]>
	<15207.53026.138744.787675@taxus.athen1.ga.home.com>
	<p05101004b78dea07bfff@[171.65.33.250]>
Message-ID: <15212.2439.368738.975068@taxus.athen1.ga.home.com>

Jeff:
> I think this may be a problem in test_prodoc.py rather than the 
> regression testing framework.  This output is generated in a function 
> called print_references:
[...]
> It prints some text out 80 characters at a time.  Perhaps this 
> boundary is falling on different characters depending on the OS' line 
> breaking convention.  To make things more difficult, the text file 
> itself has different types of line breaks.
[...]
> I don't have a reproducible here, so could someone with a Windows 
> machine take a look at it?

I just tested it out (after much swearing while attempting to get CVS
working on Windows. Grrrrrr...) and it turns out, as I've always
suspected, that you are a genius. No more failing for test_prodoc, 
yippeee! Excellent deduction, Jeff.

My current test status is:

==> Windows 98 w/ Python 2.1

test_MultiProc is failing with a complaint about os.fork not existing
on Windows. I guess there is not much we can do about this.

test_GenBank is failing with a parse error. I'll investigate this
further once I manage to get CVS working properly.

test_SubsMat is failing with the -0.00 and 0.00 thing.

==> UNIX (my NetBSD machine) w/ Python 2.1

test_SubsMat is failing with the -0.00 and 0.00 thing.

test_interpro was failing with:

IOError: [Errno 2] No such file or directory: 'InterPro/ipr001064.htm'

which seems to be due to the fact that the test files are named
IPR001064.htm (i.e., capital IPR). This probably didn't fail on Windows,
but does on my machine. I just checked in a fix for this to
test_interpro, so it's taken care of.
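The "-0.00 and 0.00 thing" is not explained further in the thread. Assuming it is the usual formatting quirk, a tiny negative value (for example, a floating-point result a hair below zero on one platform) prints as `-0.00` where another platform prints `0.00`. A hypothetical formatting helper can normalize negative zero after rounding:

```python
def format_score(value, places=2):
    """Format a number so tiny negative values print as 0.00 rather than -0.00."""
    rounded = round(value, places)
    if rounded == 0:
        rounded = 0.0  # -0.0 == 0.0, so this swaps negative zero for positive zero
    return "%.*f" % (places, rounded)


# "%.2f" % -0.001 gives "-0.00", while format_score(-0.001) gives "0.00"
```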


So that's where I'm at. Thanks again Jeff for the prodoc fix!
Brad


From chapmanb at arches.uga.edu  Sat Aug  4 10:51:13 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Re: Abstract application wrapper
In-Reply-To: <Pine.LNX.4.31.0108012148250.903-100000@mercurio.localdomain>
References: <15207.52158.379530.574917@taxus.athen1.ga.home.com>
	<Pine.LNX.4.31.0108012148250.903-100000@mercurio.localdomain>
Message-ID: <15212.3041.23351.55042@taxus.athen1.ga.home.com>

Hi Davide, Jeff;

[AbstractApplication ideas]

> I think it is really very nice!

Thanks, I'm glad that I used my time during that conference talk
productively :-)
 
> However I slightly prefer the lighter version in which you have a class
> 
> AbstractApplicationCommandLine (yes to be shortened) instead of
> AbstractApplication
> 
> where the only difference is that you do not have a run method and have
> also a __str__ method behaving as construct_commandline. (or maybe better
> something returning a list of strings?)

I've been thinking about your points while at work (I've got lots of
time to think while grinding up cactus), and I totally agree with
you. I like the idea of the class representing a command-line, and so
__str__ returns the string representation of that class (I do prefer
the actual command line being returned over a list of strings). So I
also think it would be better to just have an AbstractCommandLine
class that only represents the command line, and then have the functions to
run the programs separate from the class.

[....snip... Lots of good justifications]
 
> Let me know what you think about it.

You've convinced me :-). I'd be very happy if you'd like to work on
this and get something together. Having a common way to deal with
command-lines would be very nice, and might convince us to get support
together for more programs :-)

Brad


From chapmanb at arches.uga.edu  Sat Aug  4 11:02:56 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] add dynamic programming alignment modules
In-Reply-To: <p05101006b78e43ebd170@[171.65.33.250]>
References: <p05101006b78e43ebd170@[171.65.33.250]>
Message-ID: <15212.3744.940607.190511@taxus.athen1.ga.home.com>

Hey Jeff;

> On the flight home from ISMB, I coded up some modules to do pairwise 
> alignments.  I went ahead and put them into the Bio.Align package 
> because they seem most appropriate there -- I hope nobody objects!

Sweet! You are the man. And to think, I spent my whole time on the
flight nursing a bad headache caused by staying up the entire night
before (whoops, forgot to book a hotel room for that last night in
Denmark!), and reading Hunter S. Thompson books.

Seriously, I'm very happy to have this. I also have some dynamic
programming stuff in my HMM module (which I am getting ready for
potential submission right now -- working myself through the fun of
writing up docs); once I get that ready we can see if there is
anything there we can generalize and merge together.

Brad


From ybenita at mac.com  Sat Aug  4 18:57:52 2001
From: ybenita at mac.com (Yair Benita)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Mac stuff
Message-ID: <B7924A90.6E9%ybenita@mac.com>

Hi Guys,
I am glad to be joining this list. I will do my best to contribute to this
great project and especially keep you aware of some Mac issues.
Almost everything works on the Mac. I have some problems with all WWW
modules but I can't put my finger on it yet.
Local BLAST is not working on the Mac because os.py does not have a
pipe() function. Before I dive into the C code of pipe() and try to make a
similar Mac function, do any of you have a nice and easy alternative?
Thanks,
Yair
-- 
Yair Benita
Pharmaceutical Proteomics
Utrecht University
Netherlands


From katel at worldpath.net  Mon Aug  6 02:56:58 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Pathway Module
References: <NEBBJGOKPAACGBJLMGCDMEOPCBAA.tarjei@genome.wi.mit.edu>
Message-ID: <000f01c11e44$fe30b960$010a0a0a@cadence.com>

  I found another paper that may be interesting, at least to keep the ideas
flowing.

http://www.ebi.ac.uk/research/pfmp/publications/biol_chem_2000/Biol_Chem-MS-revised.html

The approach is based on an entity relationship model.  What I liked about
this approach is that it represents interactions at any level of granularity,
without mixing levels.  You can zoom into the level of molecular reactions
or zoom out to the level of pathways.  This is done with two basic elements,
entities and interactions, that can be subclassed, nested, chained and
combined to build representations in a flexible way.
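A hypothetical sketch of the core idea, not the paper's actual schema. If an interaction is itself a kind of entity, interactions can be nested inside other interactions, and "zooming" is just choosing how many levels to descend. The class names and the `zoom` method are made up for illustration.

```python
class Entity:
    """Anything that can take part in an interaction (molecule, complex, ...)."""

    def __init__(self, name):
        self.name = name


class Interaction(Entity):
    """An interaction is itself an entity, so interactions can be nested."""

    def __init__(self, name, participants):
        super().__init__(name)
        self.participants = list(participants)

    def zoom(self, depth):
        """Names visible `depth` levels below this interaction (0 = itself)."""
        if depth <= 0:
            return [self.name]
        names = []
        for p in self.participants:
            if isinstance(p, Interaction):
                names.extend(p.zoom(depth - 1))
            else:
                names.append(p.name)  # plain entities are leaves at any depth
        return names


r1 = Interaction("R1", [Entity("A"), Entity("B")])
r2 = Interaction("R2", [Entity("B"), Entity("C")])
pathway = Interaction("P", [r1, r2])
# pathway.zoom(1) lists the reactions; pathway.zoom(2) lists the molecules
```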

  A subclass of entity  also provides evidence objects so you can see if
different techniques converge or assess the certainty of the conclusions
offered.

 IMHO it's worth an hour or so to read.

                                    Cayte


From tarjei at genome.wi.mit.edu  Mon Aug  6 02:06:34 2001
From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Pathway Module
In-Reply-To: <000f01c11e44$fe30b960$010a0a0a@cadence.com>
Message-ID: <NEBBJGOKPAACGBJLMGCDGEPKCBAA.tarjei@genome.wi.mit.edu>

>   I found another paper that may be interesting, at least, to 
> keep the ideas flowing.
> 

Thanks for the pointer. It looks like interesting work. I'll take a look
as soon as I get a chance.

I played around with some quick and dirty code this weekend to test
whether my initial ideas sucked (quick answer: yup ;) )

This might be obvious, but it occurred to me that there are two 
different "pathway" concepts that are useful in different circumstances:

The first, a System, is a set of reactions that are
implicitly connected through their products and substrates. This is
essentially equivalent to a stoichiometric matrix, which is useful for
things like flux/mode analysis.

The second, a Pathway, is a set of species/metabolites that are 
explicitly linked through reactions. This is equivalent to a graph,
which is useful for things like route searches, neighbor analysis
and so on.

You can convert from a System to a Pathway by specifying which of
the products and substrates from the System reactions are to be 
used as nodes in the Pathway graph. The reverse conversion is trivial.
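A minimal sketch of both concepts, assuming a System is just a dict of reactions with stoichiometric coefficients (all function names are hypothetical). The matrix has one row per species and one column per reaction, with negative entries for consumption and positive ones for production. Choosing a node set, here leaving out ATP and ADP, turns the same reactions into a species graph.

```python
def stoichiometric_matrix(system):
    """Return (species, reaction_names, N), where N[row][col] is a coefficient.

    `system` maps a reaction name to (substrates, products), each a dict of
    species -> coefficient. Consumption is negative, production positive.
    """
    species = set()
    for substrates, products in system.values():
        species.update(substrates)
        species.update(products)
    species = sorted(species)
    names = list(system)
    row = {s: i for i, s in enumerate(species)}
    N = [[0] * len(names) for _ in species]
    for j, name in enumerate(names):
        substrates, products = system[name]
        for s, c in substrates.items():
            N[row[s]][j] -= c
        for s, c in products.items():
            N[row[s]][j] += c
    return species, names, N


def to_pathway(system, nodes):
    """Species graph: an edge (substrate, product, reaction) for each chosen pair."""
    edges = set()
    for name, (substrates, products) in system.items():
        for s in substrates:
            for p in products:
                if s in nodes and p in nodes:
                    edges.add((s, p, name))
    return edges


# Hexokinase and phosphoglucose isomerase (pgi); G6P/F6P = glucose/fructose 6-phosphate
system = {
    "hexokinase": ({"glucose": 1, "ATP": 1}, {"G6P": 1, "ADP": 1}),
    "pgi": ({"G6P": 1}, {"F6P": 1}),
}
species, names, N = stoichiometric_matrix(system)
pathway = to_pathway(system, nodes={"glucose", "G6P", "F6P"})  # ATP/ADP not nodes
```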

I think that in our module it might be useful to make a distinction 
between these two concepts. The reason being that they are each useful
for different kinds of analyses, and that databases like KEGG, WIT
and BIND seem to contain many more individual reactions - which can 
be grouped into a System - than are used in their "curated" pathways.

Does this make sense?

 Tarjei


From katel at worldpath.net  Mon Aug  6 19:32:24 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Pathway Module
References: <NEBBJGOKPAACGBJLMGCDGEPKCBAA.tarjei@genome.wi.mit.edu>
Message-ID: <002601c11ed0$0df25260$010a0a0a@cadence.com>

> This might be obvious, but it occurred to me that there are two
> different "pathway" concepts that are useful in different circumstances:
>
> The first, a System, is a set of reactions that are
> implicitly connected through their products and substrates. This is
> essentially equivalent to a stoichiometric matrix, which is useful for
> things like flux/mode analysis.
>
> The second, a Pathway, is a set of species/metabolites that are
> explicitly linked through reactions. This is equivalent to a graph,
> which is useful for things like route searches, neighbor analysis
> and so on.
>
> You can convert from a System to a Pathway by specifying which of
> the products and substrates from the System reactions are to be
> used as nodes in the Pathway graph. The reverse conversion is trivial.
>
> I think that in our module it might be useful to make a distinction
> between these two concepts. The reason being that they are each useful
> for different kinds of analyses, and that databases like KEGG, WIT
> and BIND seem to contain many more individual reactions - which can
> be grouped into a System - than are used in their "curated" pathways.
>
> Does this make sense?
>
   The user is boss.  If the separation of modules is the most efficient
way to support the typical user scenarios, I think we should go with it.

                                                   Cayte
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
>
>


From jchang at SMI.Stanford.EDU  Mon Aug  6 17:22:30 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Mac stuff
In-Reply-To: <B7924A90.6E9%ybenita@mac.com>
References: <B7924A90.6E9%ybenita@mac.com>
Message-ID: <p05101000b794bab6eb50@[171.65.33.250]>

>Local BLAST is not working on the Mac because OS.py does not have an
>attribute pipe(). Before I dive into the C code of pipe() and try to make a
>similar Mac attribute, does any of you have a nice and easy alternative?

blast gets launched with a call to the popen2 module, which seems to 
be supported only on Unix and Windows.  How do you exec a process on 
a Mac?  Does Python have a module to do stuff like this?

Jeff

From jchang at SMI.Stanford.EDU  Mon Aug  6 18:30:09 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] add dynamic programming alignment modules
In-Reply-To: <15212.3744.940607.190511@taxus.athen1.ga.home.com>
References: <p05101006b78e43ebd170@[171.65.33.250]>
 <15212.3744.940607.190511@taxus.athen1.ga.home.com>
Message-ID: <p05101006b794c3f61625@[171.65.33.250]>

At 11:02 AM -0400 8/4/01, Brad Chapman wrote:
>Hey Jeff;
>
>>  On the flight home from ISMB, I coded up some modules to do pairwise
>>  alignments.  I went ahead and put them into the Bio.Align package
>>  because they seem most appropriate there -- I hope nobody objects!
>
>Sweet! You are the man. And to think, I spent my whole time on the
>flight nursing a bad headache caused by staying up the entire night
>before (whoops, forgot to book a hotel room for that last night in
>Denmark!), and reading Hunter S. Thompson books.


Yeah, I don't know why you did that.  There were plenty of places you 
could have stayed!


>Seriously, I'm very happy to have this. I also have some dynamic
>programming stuff in my HMM module (which I am getting ready for
>potential submission right now -- working myself through the fun of
>writing up docs); once I get that ready we can see if there is
>anything there we can generalize and merge together.

Sounds good.  Although the boundary conditions are different, I 
believe the recurrences are the same, so we can share that part.  We 
only have to write it in C once!
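Jeff's remark is about sharing a recurrence between the alignment and HMM code. The sketch below illustrates the same idea with two alignment variants instead: global alignment and end-gap-free (overlap) alignment share one fill recurrence and differ only in the first row/column and in where the answer is read. It assumes a linear gap penalty, and all names are hypothetical.

```python
def fill(a, b, score, gap, boundary):
    """Needleman-Wunsch style fill with a linear gap penalty.

    The recurrence is fixed; `boundary(k)` supplies the first row and column,
    which is where the variants differ.
    """
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = boundary(i)
    for j in range(m + 1):
        F[0][j] = boundary(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] - gap,
                          F[i][j - 1] - gap)
    return F


def simple_score(x, y):
    return 1 if x == y else -1


def global_score(a, b, gap=1):
    F = fill(a, b, simple_score, gap, lambda k: -gap * k)  # leading gaps cost
    return F[-1][-1]


def end_gap_free_score(a, b, gap=1):
    F = fill(a, b, simple_score, gap, lambda k: 0)  # leading gaps are free
    # Trailing gaps are free too, so take the best value in the last row or column
    return max(max(F[-1]), max(row[-1] for row in F))
```

Aligning `ACGT` with `CG` scores 0 globally (two matches, two end gaps) but 2 when end gaps are free, even though both calls run the same inner loop.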

Jeff

From julio at hpcf.upr.edu  Tue Aug  7 16:23:16 2001
From: julio at hpcf.upr.edu (julio@hpcf.upr.edu)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Fixed Bug
Message-ID: <200108072023.QAA13527@astraeus.hpcf.upr.edu>

   I am including the file FASTA.py with some changes that correct some errors.

      The first one is:

           the method write_records(records) is missing self;

               corrected to write_records(self, records)


      The second one is:

               write(self, record) does not support mutable objects; with
               the following change it supports mutable objects:

                data = self.tostring

               This line also fixes an additional error.  Before the fix,
               the line was:

                       data = self.seq

               which is not correct because there is no seq attribute in
               Seq.py.
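A hypothetical sketch of the two fixes in modern Python, not the actual FASTA.py of that era. `write_records` takes `self`, and `write` gets the sequence text through `str()`. This assumes that both immutable and mutable sequence objects define `__str__`. `ToyMutableSeq` is a made-up stand-in for a mutable sequence class.

```python
import io
from collections import namedtuple

Record = namedtuple("Record", ["id", "seq"])


class ToyMutableSeq:
    """Made-up mutable sequence: stores a list of letters, prints as a string."""

    def __init__(self, text):
        self.data = list(text)

    def __str__(self):
        return "".join(self.data)


class FastaWriter:
    def __init__(self, handle, width=60):
        self.handle = handle
        self.width = width

    def write(self, record):
        data = str(record.seq)  # works for plain strings and for ToyMutableSeq
        self.handle.write(">%s\n" % record.id)
        for i in range(0, len(data), self.width):
            self.handle.write(data[i:i + self.width] + "\n")

    def write_records(self, records):  # 'self' was the missing first parameter
        for record in records:
            self.write(record)


out = io.StringIO()
seq = ToyMutableSeq("GGCC")
seq.data[0] = "A"  # mutate in place before writing
FastaWriter(out).write_records([Record("s1", "ACGT"), Record("s2", seq)])
```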
    

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 2 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20010807/efdc9d5d/attachment.obj
From tarjei at genome.wi.mit.edu  Wed Aug  8 14:27:41 2001
From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] Microarray  jamboree
Message-ID: <NEBBJGOKPAACGBJLMGCDIEAJCCAA.tarjei@genome.wi.mit.edu>

 Hi people,

 If there is anyone interested in microarray data and all that good stuff
you might want to check this out:

 There is a jamboree planned in Toronto on September 14-19 where people 
from academia (UC Berkeley, EBI) and industry (Affymetrix, Rosetta, etc.) 
will gather to implement open source tools to work with the new MAGE-ML
(an XML format for microarray data that is set to be the successor of
various existing standards, I forget their names) that is being released 
by the Object Management Group some time soon.

 The plan is to develop an API for the MAGE object model in several 
different languages. Currently there are people signed up to work on
C/C++, Perl and CORBA implementations - *but no Python*. 

 If any of you biopythoneers are interested in doing a Python 
implementation there you should sign up on the microarray-format 
mailing list at www.mged.org, and then notify the 
organizer (Paul Spellman) ASAP.

 thanks,

 Tarjei

From katel at worldpath.net  Wed Aug  8 20:19:53 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] WIT and KEGG
References: <NEBBJGOKPAACGBJLMGCDIEAJCCAA.tarjei@genome.wi.mit.edu>
Message-ID: <002001c12069$05af1620$010a0a0a@cadence.com>

Yesterday, I downloaded your new code for enzymes.  I started code, but WIT
uses the KEGG format for enzymes.  So we may be able to get by with one
piece of code for both.

In your test files, what is the difference between the irregular and the
sample files?  Did you manually strip out the HTML stuff?  I'll try to
upload more test cases, because sometimes I've seen bugs on the tenth case.


I hope it's cooler where you live than in my area (88 F).

                                                          Cayte


From katel at worldpath.net  Wed Aug  8 20:35:26 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:02 2005
Subject: [Biopython-dev] WIT and KEGG
References: <NEBBJGOKPAACGBJLMGCDIEAJCCAA.tarjei@genome.wi.mit.edu> <002001c12069$05af1620$010a0a0a@cadence.com>
Message-ID: <002601c1206b$303bcee0$010a0a0a@cadence.com>

>
> In your test files, what is the difference between the irregular and the
> sample files?  Did you manually strip out the HTML stuff?  I'll try to
> upload more test cases, because sometimes I've seen bugs on the tenth case.
>
   The only difference I can see between the WIT text and KEGG is an HTML
tag embedded in the ENTRY line in WIT.  The format needs to strip out the
angle-bracketed markup between ENTRY and the EC number.

                                                             Cayte


From tarjei at genome.wi.mit.edu  Wed Aug  8 17:32:03 2001
From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] WIT and KEGG
In-Reply-To: <002001c12069$05af1620$010a0a0a@cadence.com>
Message-ID: <NEBBJGOKPAACGBJLMGCDEEALCCAA.tarjei@genome.wi.mit.edu>

> Yesterday, I downloaded your new code for enzymes.  I started 
> code, but WIT uses the KEGG format for enzymes.  So we may be 
> able to get by with one piece of code for both.

Yeah, I noticed that when I played around with WIT the other day.
I suspect that they're not only using the same format, but that
the enzyme records there are in fact the same as those in KEGG
(or maybe it's the other way around, I don't know). I haven't
verified this, though.

It makes sense to not duplicate the code, so we can either move
the shared parts into a module by itself, or you can just import
my KEGG code in your modules. 

> In your test files, what is the difference between the irregular and the
> sample files?

The KEGG distribution comes with a text file describing the
record format. The .irregular files contain records distributed
by KEGG that do not conform to their description <sigh>.

> Did you manually strip out the HTML stuff?  

There was no HTML. You can download all the enzyme records in KEGG
in one big flatfile with no markup. If you want to pull down records
directly from a web page you can just strip the tags off in a simple
preprocessing step. There might even be a standard library call for
that.
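Python's standard library does include an HTML parser that can do this preprocessing. The sketch below uses the modern html.parser.HTMLParser to keep only the text between tags; strip_tags and _TextCollector are hypothetical helper names, and the sample ENTRY line is made up for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(text):
    """Return text with HTML tags removed (entities such as &amp; are decoded)."""
    parser = _TextCollector()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(strip_tags('ENTRY       <a href="#">EC 2.7.1.11</a>'))
```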

> I'll try to
> upload more test cases because sometimes I've seen bugs 
> on the tenth case.

That would be great.

> I hope it's cooler where you live than in my area (88 F).

A little; it's about 80 at Logan now. They've warned us that we
might get up into the 90s before the weekend, though.
 
We'll just have to make sure the good old heat-shock proteins are
working :)


 Tarjei

From katel at worldpath.net  Thu Aug  9 00:26:17 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Rebase
Message-ID: <003201c1208b$7055d1e0$010a0a0a@cadence.com>

 I just changed the print routines to sort keys before printing, and renamed
the top-level routine to __str__ to make it consistent with Python style.
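A minimal sketch of the idea, assuming a record behaves like a dictionary of field names to values (Record and the field names below are hypothetical, not the Rebase module's actual attributes): defining __str__ lets both str(record) and print(record) work, and sorting the keys keeps the output stable from run to run.

```python
class Record(dict):
    """Hypothetical dictionary-backed record with deterministic printing."""

    def __str__(self):
        # Sort the keys so the output order does not depend on how the
        # dictionary happens to store them.
        return "\n".join("%s: %s" % (key, self[key]) for key in sorted(self))


record = Record(site="GAATTC", name="EcoRI", enzyme_type="II")
print(record)
```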

                                                      Cayte


From katel at worldpath.net  Thu Aug  9 01:50:18 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Rebase
References: <003201c1208b$7055d1e0$010a0a0a@cadence.com>
Message-ID: <006901c12097$2cd9b420$010a0a0a@cadence.com>

  I just uploaded some test files to biopython/tests/WIT, both the text and
htm versions.  They should work for KEGG except for the embedded html tag on
the ENTRY line.

                                              Cayte


From katel at worldpath.net  Fri Aug 10 01:22:30 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] RecordFile
Message-ID: <002001c1215c$7565d3c0$010a0a0a@cadence.com>

  I updated RecordFile with a fix that checks for an end-of-file condition I
had previously missed.  I spliced it into my local version of test_KEGG and it
passed.  The next step is to see if it can strip out gibberish between
records, unless the gibberish contains a start tag.  I don't know how to
make it absolutely bulletproof, but hopefully I can make it useful.
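The behaviour described here can be sketched as a generator that only starts collecting lines at a start tag, emits a record at its terminator, and does not lose a record cut short by end of file. It assumes KEGG-style records that begin with ENTRY and end with a /// line; iter_records is a hypothetical name, not the RecordFile API. As with the real filter, a junk line that begins with the start tag is still mistaken for the start of a record.

```python
import io


def iter_records(handle, start_tag="ENTRY", end_tag="///"):
    """Yield each record as one string, skipping junk between records."""
    record = None  # None means we are between records
    for line in handle:
        if record is None:
            if line.startswith(start_tag):
                record = [line]
            # Any other line between records (blank lines, stray text) is dropped.
            continue
        record.append(line)
        if line.rstrip("\r\n") == end_tag:
            yield "".join(record)
            record = None
    # End of file reached inside a record: yield the partial record
    # rather than silently losing it.
    if record:
        yield "".join(record)


sample = io.StringIO("junk\n\nENTRY a\n///\n\nnoise\nENTRY b\nNAME y\n")
for rec in iter_records(sample):
    print(repr(rec))
```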


Cayte


From katel at worldpath.net  Fri Aug 10 20:58:48 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] WIT and KEGG
References: <NEBBJGOKPAACGBJLMGCDEEALCCAA.tarjei@genome.wi.mit.edu>
Message-ID: <003b01c12200$c8ff3f40$010a0a0a@cadence.com>

  I made these changes to a copy of KEGG/enzyme_format.py:

  html_tag = (Expression.Literal( '<' ) + Rep( AnyBut( '>\n\r' ) ) +
              Expression.Literal( '>' ))

entry = Group("entry",
              Str1("EC ") +
              Rep( Str( " " ) ) + Opt( html_tag ) +
              Rep(Rep1(Integer()) + point) +
              Rep1(Integer()) +
              Rep( Str( " " ) ) + Opt( html_tag ) )

  The format failed halfway through the file.  I think the problem is the
order of entries.  The format specifies GENES before MOTIF, but this order is
reversed in the test file.  Maybe the format should be less sensitive to
order where the order doesn't convey information.
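For readers who don't use Martel, roughly the same pattern as the html_tag and entry expressions above can be written as a Python regular expression: "EC", spaces, an optional tag, a dotted run of integers, spaces, and an optional closing tag. This is an illustrative translation with hypothetical names (TAG, ENTRY_EC, ec_number), not code from the KEGG module.

```python
import re

# A tag is '<', then anything except '>' or a line break, then '>'
# (the same character class as the Martel html_tag expression).
TAG = r"<[^>\n\r]*>"

# "EC", spaces, optional tag, a dotted EC number, spaces, optional tag.
ENTRY_EC = re.compile(r"EC +(?:%s)?((?:\d+\.)*\d+) *(?:%s)?" % (TAG, TAG))


def ec_number(line):
    """Return the EC number on an ENTRY line, or None if there is none."""
    match = ENTRY_EC.search(line)
    return match.group(1) if match else None


print(ec_number('ENTRY       EC <a href="#">2.7.1.11</a>'))
```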

                                                 Cayte


From tarjei at genome.wi.mit.edu  Sat Aug 11 00:35:04 2001
From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] WIT and KEGG
In-Reply-To: <003b01c12200$c8ff3f40$010a0a0a@cadence.com>
Message-ID: <NEBBJGOKPAACGBJLMGCDCEBECCAA.tarjei@genome.wi.mit.edu>

>   I made these changes to a copy of KEGG/enzyme_format.py:
> 
>   html_tag = (Expression.Literal( '<' ) + Rep( AnyBut( '>\n\r' ) ) +
>               Expression.Literal( '>' ))
> 
> entry = Group("entry",
>               Str1("EC ") +
>               Rep( Str( " " ) ) + Opt( html_tag ) +
>               Rep(Rep1(Integer()) + point) +
>               Rep1(Integer()) +
>               Rep( Str( " " ) ) + Opt( html_tag ) )

 I'm not too fond of adding this to the format file. HTML markup isn't
part of the KEGG format description, so this seems a bit ad hoc.

 Instead I suggest that you either run the input through 
File.SGMLHandle or File.SGMLStripper before you pass the
WIT record to KEGG.Enzyme.Parser OR write a separate Parser
class in your WIT module that wraps a ParserSupport.SGMLStrippingConsumer
around KEGG.Enzyme._Consumer.
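The wrapping option amounts to a decorator around the consumer: every callback passes through a filter before it reaches the real consumer, so the format definition never has to know about markup. The sketch below shows the pattern with hypothetical classes (TagStrippingConsumer, RecordingConsumer); it is not the actual ParserSupport.SGMLStrippingConsumer implementation.

```python
import re

_TAG = re.compile(r"<[^>]*>")


class TagStrippingConsumer:
    """Hypothetical wrapper that removes markup before the real consumer sees it."""

    def __init__(self, consumer):
        self._consumer = consumer

    def __getattr__(self, name):
        # Only called for attributes not defined on the wrapper itself,
        # i.e. the wrapped consumer's callbacks such as entry().
        attr = getattr(self._consumer, name)
        if not callable(attr):
            return attr

        def strip_then_call(*args, **kwargs):
            cleaned = [_TAG.sub("", a) if isinstance(a, str) else a for a in args]
            return attr(*cleaned, **kwargs)

        return strip_then_call


class RecordingConsumer:
    """Hypothetical consumer that just remembers what it was given."""

    def __init__(self):
        self.entries = []

    def entry(self, line):
        self.entries.append(line)


consumer = RecordingConsumer()
TagStrippingConsumer(consumer).entry('EC <a href="#">2.7.1.11</a>')
print(consumer.entries)
```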
 
>   The format failed halfway through the file.  I think the problem is the
> order of entries.  The format specifies GENES before MOTIF, but this order is
> reversed in the test file.  Maybe the format should be less sensitive to
> order where the order doesn't convey information.

 Yeah, the entries are supposed to come in a specified order, but even
the KEGG people don't follow that rule. I've committed a change to 
KEGG.Enzyme.enzyme_format.py that assumes very little about entry
ordering. If that's the error, it should work for you now.
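One way to make a parser insensitive to field order is to collect each field into a dictionary keyed by its name instead of expecting a fixed sequence. The sketch assumes the KEGG flat-file layout of a keyword in the first 12 columns with continuation lines indented by blanks; parse_fields is a hypothetical helper, not the committed enzyme_format.py change, and the field values are placeholders.

```python
def parse_fields(record_text):
    """Map each field keyword to its list of value lines, in any field order."""
    fields = {}
    current = None
    for line in record_text.splitlines():
        if not line.strip() or line.startswith("///"):
            continue  # skip blank lines and the record terminator
        if not line[0].isspace():
            # A new field: keyword in columns 1-12, value after that.
            current = line[:12].strip()
            fields.setdefault(current, []).append(line[12:].strip())
        elif current is not None:
            # Continuation line: belongs to the most recent keyword.
            fields[current].append(line[12:].strip())
    return fields


text = "\n".join([
    "ENTRY".ljust(12) + "EC 2.7.1.11",
    "MOTIF".ljust(12) + "PS: motif1",   # MOTIF before GENES is accepted
    "GENES".ljust(12) + "ECO: geneA",
    " " * 12 + "HIN: geneB",
    "///",
])
print(parse_fields(text))
```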

 Tarjei

From katel at worldpath.net  Sun Aug 12 01:52:22 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] WIT and KEGG
References: <NEBBJGOKPAACGBJLMGCDCEBECCAA.tarjei@genome.wi.mit.edu>
Message-ID: <001001c122f2$f602e760$010a0a0a@cadence.com>

----- Original Message -----
From: "Tarjei S Mikkelsen" <tarjei@genome.wi.mit.edu>
>  I'm not too fond of adding this to the format file. HTML markup isn't
> part of the KEGG format description, so this seems a bit ad hoc.
>
>  Instead I suggest that you either run the input through
> File.SGMLHandle or File.SGMLStripper before you pass the
> WIT record to KEGG.Enzyme.Parser OR write a separate Parser
> class in your WIT module that wraps a ParserSupport.SGMLStrippingConsumer
> around KEGG.Enzyme._Consumer.
>
  The problem is that I'm experimenting with a filter to strip out junk (not
necessarily HTML) between records.
The motivation is that I've had Martel fail on just an extraneous line feed.
Somehow the idea of chaining two filters together sets off a "watch for bugs"
alarm in my mind.

> >   The format failed halfway through the file.  I think the problem is the
> > order of entries.  The format specifies GENES before MOTIF but this order is
> > reversed in the test file.  Maybe the format should be less sensitive to
> > order ,where it doesn't convey information.
>
>  Yeah, the entries are supposed to come in a specified order, but even
> the KEGG people don't follow that rule. I've committed a change to
> KEGG.Enzyme.enzyme_format.py that assumes very little about entry
> ordering. If that's the error, it should work for you now.
>

Now it's stopping on files with DB links like this example:

            PIR: B49338  B49935  E64239  KIECAA

These are quibbles but the computer doesn't understand quibbles:).
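A line like this is easy to handle once the database name is split off at the first colon and the rest is split on any run of whitespace. parse_dblink is a hypothetical helper shown only to illustrate the idea.

```python
def parse_dblink(line):
    """Split 'PIR: B49338  B49935' into ('PIR', ['B49338', 'B49935'])."""
    database, sep, rest = line.strip().partition(":")
    if not sep:
        raise ValueError("no 'DB:' prefix in %r" % line)
    # str.split() with no argument treats any run of blanks as one separator.
    return database.strip(), rest.split()


print(parse_dblink("            PIR: B49338  B49935  E64239  KIECAA"))
```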

                                                                 Cayte
>  Tarjei
>
>


From biopython-bugs at bioperl.org  Mon Aug 13 21:57:24 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/39
Message-ID: <200108140157.f7E1vOq28569@pw600a.bioperl.org>

JitterBug notification

new message incoming/39

Message summary for PR#39
	From: cirano@chollian.net
	Subject: Parsing Problem of GenBank format
	Date: Mon, 13 Aug 2001 21:57:23 -0400
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From cirano@chollian.net Mon Aug 13 21:57:24 2001
Received: from localhost (localhost [127.0.0.1])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f7E1vIq28563
	for <biopython-bugs@pw600a.bioperl.org>; Mon, 13 Aug 2001 21:57:23 -0400
Date: Mon, 13 Aug 2001 21:57:23 -0400
Message-Id: <200108140157.f7E1vIq28563@pw600a.bioperl.org>
From: cirano@chollian.net
To: biopython-bugs@bioperl.org
Subject: Parsing Problem of GenBank format

Full_Name: Chang Gyeom, Kim
Module: Bio/File.py/saveline module
Version: Biopython1.00a2
OS: Redhat7.1
Submission from: (NULL) (203.248.117.3)


My Source code:  

	from Bio import GenBank

	search_term = "Lupine leghemoglobin"

	gi_list = GenBank.search_for(search_term)

	ncbi_dict = GenBank.NCBIDictionary()
	gb_seqrecord = ncbi_dict[ gi_list[0] ]
	print gb_seqrecord

When I run this code, I lose the first 5 lines of the GenBank record.
I think this problem is caused by the "saveline" function
in the Bio/File.py module.

So I revised the code like this:

    def saveline(self, line):
        if line:
            handle_contents = self.read()
            self._saved = line + handle_contents
            self._handle = StringIO.StringIO(self._saved)

Although I fixed my problem, I'm not sure this is the right way.
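The underlying idea is a handle wrapper that lets a parser push lines back and then hands them out again before any new input, including from read(). The sketch below is a simplified stand-in that uses the same method names (readline, saveline, read); it is not the actual Bio.File.UndoHandle code. Its read() returns saved lines first and then the rest of the underlying handle.

```python
import io


class UndoHandle:
    """Simplified file wrapper that lets a parser push lines back."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # pushed-back lines, returned before new input

    def readline(self):
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()

    def saveline(self, line):
        if line:
            self._saved.insert(0, line)

    def read(self, size=-1):
        if size == -1:
            # Hand back every saved line, then the rest of the handle.
            saved = "".join(self._saved)
            self._saved[:] = []
        else:
            saved = ""
            while size > 0 and self._saved:
                if len(self._saved[0]) <= size:
                    size -= len(self._saved[0])
                    saved += self._saved.pop(0)
                else:
                    saved += self._saved[0][:size]
                    self._saved[0] = self._saved[0][size:]
                    size = 0
        return saved + self._handle.read(size)


h = UndoHandle(io.StringIO("LOCUS       X\nDEFINITION  Y\n"))
first = h.readline()  # a parser peeks at the first line...
h.saveline(first)     # ...and pushes it back
print(h.read())       # the LOCUS line is not lost
```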
 

From biopython-bugs at bioperl.org  Tue Aug 14 02:07:19 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/32
Message-ID: <200108140607.f7E67Jq29518@pw600a.bioperl.org>

JitterBug notification

jchang changed notes

Message summary for PR#32
	From: Jeffrey Chang <jchang@SMI.Stanford.EDU>
	Subject: Re: [Biopython-dev] Notification: incoming/31
	Date: Wed, 16 May 2001 11:58:00 -0700
	0 replies 	0 followups
	Notes: duplicate of Bug #31.  How did this get split?


====> ORIGINAL MESSAGE FOLLOWS <====

>From jchang@SMI.Stanford.EDU Wed May 16 13:53:23 2001
Received: from crg-gw.Stanford.EDU (root@crg-gw.Stanford.EDU [171.65.32.201])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f4GHrJb11642
	for <biopython-bugs@bioperl.org>; Wed, 16 May 2001 13:53:23 -0400
Received: from [171.65.33.127] (chang-smi.Stanford.EDU [171.65.33.127])
	by crg-gw.Stanford.EDU (8.9.1a/8.9.1) with ESMTP id LAA23878;
	Wed, 16 May 2001 11:58:23 -0700 (PDT)
User-Agent: Microsoft-Outlook-Express-Macintosh-Edition/5.02.2022
Date: Wed, 16 May 2001 11:58:00 -0700
Subject: Re: [Biopython-dev] Notification: incoming/31
From: Jeffrey Chang <jchang@SMI.Stanford.EDU>
To: <hy263book@263.net>
CC: <biopython-bugs@bioperl.org>
Message-ID: <B7281BC8.805%jchang@smi.stanford.edu>
In-Reply-To: <200105160814.f4G8EZb32193@pw600a.bioperl.org>
Mime-version: 1.0
Content-type: text/plain; charset="US-ASCII"
Content-transfer-encoding: 7bit
Content-Transfer-Encoding: 7bit

Hi Huang,

Could you send the file that's generating the output?  We have regression
tests that check for behavior for "No hits found", and it does not generate
any error message, as designed.

helio:~/remotecvs/biopython/Tests/Blast> python
Python 2.1 (#7, Apr 17 2001, 18:53:25)
[GCC 2.8.1] on sunos5
Type "copyright", "credits" or "license" for more information.
>>> from Bio.Blast import NCBIStandalone
>>> rec = NCBIStandalone.BlastParser().parse_file('bt002')
>>> print rec.alignments
[]
>>> 

Thanks,
Jeff


> From: biopython-bugs@bioperl.org
> Date: Wed, 16 May 2001 04:14:35 -0400
> To: biopython-dev@biopython.org
> Subject: [Biopython-dev] Notification: incoming/31
> 
> JitterBug notification
> 
> new message incoming/31
> 
> Message summary for PR#31
> From: hy263book@263.net
> Subject: When I encounter "No hits found"
> Date: Wed, 16 May 2001 04:14:35 -0400
> 0 replies     0 followups
> 
> ====> ORIGINAL MESSAGE FOLLOWS <====
> 
>> From hy263book@263.net Wed May 16 04:14:35 2001
> Received: from localhost (localhost [127.0.0.1])
> by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f4G8EYb32187
> for <biopython-bugs@pw600a.bioperl.org>; Wed, 16 May 2001 04:14:35 -0400
> Date: Wed, 16 May 2001 04:14:35 -0400
> Message-Id: <200105160814.f4G8EYb32187@pw600a.bioperl.org>
> From: hy263book@263.net
> To: biopython-bugs@bioperl.org
> Subject: When I encounter "No hits found"
> 
> Full_Name: Huang Ying
> Module: Bio.Blast.NCBIStandalone
> Version: 
> OS: Win2k
> Submission from: (NULL) (166.111.30.26)
> 
> 
> I use Bio.Blast.NCBIStandalone.BlastParser to analyze BLAST reports. When the
> BLAST result is "No hits found", Python sends the wrong message.
> 
> 
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
> 


From biopython-bugs at bioperl.org  Tue Aug 14 02:07:20 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/32
Message-ID: <200108140607.f7E67Kq29522@pw600a.bioperl.org>

JitterBug notification

jchang moved PR#32 from incoming to fixed-bugs
Message summary for PR#32
	From: Jeffrey Chang <jchang@SMI.Stanford.EDU>
	Subject: Re: [Biopython-dev] Notification: incoming/31
	Date: Wed, 16 May 2001 11:58:00 -0700
	0 replies 	0 followups
	Notes: duplicate of Bug #31.  How did this get split?




From biopython-bugs at bioperl.org  Tue Aug 14 02:10:10 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/39
Message-ID: <200108140610.f7E6AAq29623@pw600a.bioperl.org>

JitterBug notification

jchang changed notes

Message summary for PR#39
	From: cirano@chollian.net
	Subject: Parsing Problem of GenBank format
	Date: Mon, 13 Aug 2001 21:57:23 -0400
	0 replies 	0 followups
	Notes: Thanks for the bug report.  Andrew Dalke noted this earlier and submitted the
following fix for UndoHandle.read:
   def read(self, size=-1):
        if size == -1:
            saved = string.join(self._saved, "")
            self._saved[:] = []
        else:

It's checked into CVS and will go out with the next release.  Actually, enough
people are getting tripped up on it that that should happen sooner rather than later.


 

From biopython-bugs at bioperl.org  Tue Aug 14 02:10:10 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/39
Message-ID: <200108140610.f7E6AAq29627@pw600a.bioperl.org>

JitterBug notification

jchang moved PR#39 from incoming to fixed-bugs
Message summary for PR#39
	From: cirano@chollian.net
	Subject: Parsing Problem of GenBank format
	Date: Mon, 13 Aug 2001 21:57:23 -0400
	0 replies 	0 followups
	Notes: Thanks for the bug report.  Andrew Dalke noted this earlier and submitted the
following fix for UndoHandle.read:
   def read(self, size=-1):
        if size == -1:
            saved = string.join(self._saved, "")
            self._saved[:] = []
        else:

It's checked into CVS and will go out with the next release.  Actually, enough
people are getting tripped up on it that that should happen sooner rather than later.


 

From biopython-bugs at bioperl.org  Tue Aug 14 02:10:10 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: PR#39
Message-ID: <200108140610.f7E6AAq29632@pw600a.bioperl.org>

JitterBug notification

jchang moved PR#39 from incoming to fixed-bugs
Message summary for PR#39
	From: cirano@chollian.net
	Subject: Parsing Problem of GenBank format
	Date: Mon, 13 Aug 2001 21:57:23 -0400
	0 replies 	0 followups
	Notes: Thanks for the bug report.  Andrew Dalke noted this earlier and submitted the
following fix for UndoHandle.read:
   def read(self, size=-1):
        if size == -1:
            saved = string.join(self._saved, "")
            self._saved[:] = []
        else:

It's checked into CVS and will go out with the next release.  Actually, enough
people are getting tripped up on it that that should happen sooner rather than later.


 

From biopython-bugs at bioperl.org  Tue Aug 14 02:11:32 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/35
Message-ID: <200108140611.f7E6BWq29758@pw600a.bioperl.org>

JitterBug notification

jchang changed notes

Message summary for PR#35
	From: tarjei@mit.edu
	Subject: NCBIStandalone.BlastParser bug
	Date: Tue, 19 Jun 2001 10:57:42 -0400
	0 replies 	0 followups
	Notes: format change, got fixed and released in biopython 1.0a2

-Jeff


====> ORIGINAL MESSAGE FOLLOWS <====

>From tarjei@mit.edu Tue Jun 19 10:57:42 2001
Received: from localhost (localhost [127.0.0.1])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f5JEvg826272
	for <biopython-bugs@pw600a.bioperl.org>; Tue, 19 Jun 2001 10:57:42 -0400
Date: Tue, 19 Jun 2001 10:57:42 -0400
Message-Id: <200106191457.f5JEvg826272@pw600a.bioperl.org>
From: tarjei@mit.edu
To: biopython-bugs@bioperl.org
Subject: NCBIStandalone.BlastParser bug

Full_Name: Tarjei Mikkelsen
Module: Bio.Blast.NCBIStandalone.BlastParser
Version: 1.00a
OS: Dec/Alpha OSF1
Submission from: incognito.mit.edu (18.246.0.239)


The standalone BLAST record parser (Bio.Blast.NCBIStandalone.BlastParser) fails
with a SyntaxError when the (path)name of the database spans more than one
line.

The following code stub and BLAST output will reproduce the bug. (Although this
example is from BLAST 2.0.5, the same thing happens in newer versions.)

<<<<<CUT: blast_parser_bug.py>>>>>
from Bio.Blast import NCBIStandalone

blast_out = open("blast_parser_bug.out", "r")
blast_parser = NCBIStandalone.BlastParser()
blast_record = blast_parser.parse(blast_out)
<<<<<CUT>>>>>

<<<<<CUT: blast_parser_bug.out>>>>>
BLASTP 2.0.5 [May-5-1998]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= eco:b1416
         (83 letters)

Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.11
.fa
           39 sequences; 18,779 total letters

Searching......................................done

                                                                   Score     E
Sequences producing significant alignments:                        (bits) 
Value

spy:SPy1283                                                           20  0.64
lla:L0002                                                             20  0.84

>spy:SPy1283
           Length = 337
           
 Score = 20.4 bits (41), Expect = 0.64
 Identities = 10/26 (38%), Positives = 17/26 (64%), Gaps = 1/26 (3%)

Query: 21  GYTDEEIVSSDIIG-SHFGSVFDATQ 45
           G  +EE+V S I+G +  G++F  T+
Sbjct: 287 GIHNEELVESPILGTAEEGALFSLTE 312


>lla:L0002
           Length = 340
           
 Score = 20.0 bits (40), Expect = 0.84
 Identities = 10/25 (40%), Positives = 16/25 (64%), Gaps = 1/25 (4%)

Query: 21  GYTDEEIVSSDIIG-SHFGSVFDAT 44
           G  +EE+V S I+G +  G++F  T
Sbjct: 286 GIRNEELVESPILGTAEEGALFSLT 310


 Score = 18.8 bits (37), Expect = 1.9
 Identities = 9/29 (31%), Positives = 17/29 (58%), Gaps = 1/29 (3%)

Query: 28  VSSDIIGSHFGSVFD-ATQTEITAVGDLQ 55
           + +DI+G+ F   FD A  T + A+  ++
Sbjct: 126 IDNDIVGTDFTIGFDTAVSTVVDALDKIR 154


  Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.
  11.fa
    Posted date:  Jun 18, 2001  1:19 PM
  Number of letters in database: 18,779
  Number of sequences in database:  39
  
Lambda     K      H
   0.313    0.129    0.352 

Gapped
Lambda     K      H
   0.270   0.0470    0.230 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 2788
Number of Sequences: 39
Number of extensions: 119
Number of successful extensions: 3
Number of sequences better than 10: 2
Number of HSP's better than 10.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 3
length of query: 83
length of database: 18779
effective HSP length: 33
effective length of query: 50
effective length of database: 17492
effective search space:   874600
T: 11
A: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.8 bits)
X3: 64 (24.9 bits)
S1: 34 (18.3 bits)
S2: 31 (16.5 bits)
<<<<<CUT>>>>>
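One way a parser can tolerate the wrapped name is to treat every line between "Database:" and the "sequences; ... total letters" line as part of the name. This sketch assumes the header layout shown in the report above and that BLAST wraps the name without inserting extra characters; parse_database_header is hypothetical and not the NCBIStandalone code.

```python
import re

_COUNTS = re.compile(r"^\s*([\d,]+) sequences; ([\d,]+) total letters")


def parse_database_header(lines):
    """Return (database_name, n_sequences, n_letters) from a BLAST report header."""
    name_parts = None
    for line in lines:
        if name_parts is None:
            if line.startswith("Database:"):
                name_parts = [line[len("Database:"):].strip()]
            continue
        match = _COUNTS.match(line)
        if match:
            n_seqs = int(match.group(1).replace(",", ""))
            n_letters = int(match.group(2).replace(",", ""))
            return "".join(name_parts), n_seqs, n_letters
        # Anything before the counts line is a wrapped piece of the name.
        name_parts.append(line.strip())
    raise ValueError("no complete Database: header found")


header = [
    "Query= eco:b1416",
    "         (83 letters)",
    "",
    "Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.11",
    ".fa",
    "           39 sequences; 18,779 total letters",
]
print(parse_database_header(header))
```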


From biopython-bugs at bioperl.org  Tue Aug 14 02:11:32 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/35
Message-ID: <200108140611.f7E6BWq29762@pw600a.bioperl.org>

JitterBug notification

jchang moved PR#35 from incoming to fixed-bugs
Message summary for PR#35
	From: tarjei@mit.edu
	Subject: NCBIStandalone.BlastParser bug
	Date: Tue, 19 Jun 2001 10:57:42 -0400
	0 replies 	0 followups
	Notes: format change, got fixed and released in biopython 1.0a2

-Jeff


====> ORIGINAL MESSAGE FOLLOWS <====

>From tarjei@mit.edu Tue Jun 19 10:57:42 2001
Received: from localhost (localhost [127.0.0.1])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f5JEvg826272
	for <biopython-bugs@pw600a.bioperl.org>; Tue, 19 Jun 2001 10:57:42 -0400
Date: Tue, 19 Jun 2001 10:57:42 -0400
Message-Id: <200106191457.f5JEvg826272@pw600a.bioperl.org>
From: tarjei@mit.edu
To: biopython-bugs@bioperl.org
Subject: NCBIStandalone.BlastParser bug

Full_Name: Tarjei Mikkelsen
Module: Bio.Blast.NCBIStandalone.BlastParser
Version: 1.00a
OS: Dec/Alpha OSF1
Submission from: incognito.mit.edu (18.246.0.239)


The standalone BLAST record parser (Bio.Blast.NCBIStandalone.BlastParser) fails
with a SyntaxError when the (path)name of the database spans more than one
line.

The following code stub and BLAST output will reproduce the bug. (Although this
example is from BLAST 2.0.5, the same thing happens in newer versions.)

<<<<<CUT: blast_parser_bug.py>>>>>
from Bio.Blast import NCBIStandalone

blast_out = open("blast_parser_bug.out", "r")
blast_parser = NCBIStandalone.BlastParser()
blast_record = blast_parser.parse(blast_out)
<<<<<CUT>>>>>

<<<<<CUT: blast_parser_bug.out>>>>>
BLASTP 2.0.5 [May-5-1998]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= eco:b1416
         (83 letters)

Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.11
.fa
           39 sequences; 18,779 total letters

Searching......................................done

                                                                   Score     E
Sequences producing significant alignments:                        (bits) 
Value

spy:SPy1283                                                           20  0.64
lla:L0002                                                             20  0.84

>spy:SPy1283
           Length = 337
           
 Score = 20.4 bits (41), Expect = 0.64
 Identities = 10/26 (38%), Positives = 17/26 (64%), Gaps = 1/26 (3%)

Query: 21  GYTDEEIVSSDIIG-SHFGSVFDATQ 45
           G  +EE+V S I+G +  G++F  T+
Sbjct: 287 GIHNEELVESPILGTAEEGALFSLTE 312


>lla:L0002
           Length = 340
           
 Score = 20.0 bits (40), Expect = 0.84
 Identities = 10/25 (40%), Positives = 16/25 (64%), Gaps = 1/25 (4%)

Query: 21  GYTDEEIVSSDIIG-SHFGSVFDAT 44
           G  +EE+V S I+G +  G++F  T
Sbjct: 286 GIRNEELVESPILGTAEEGALFSLT 310


 Score = 18.8 bits (37), Expect = 1.9
 Identities = 9/29 (31%), Positives = 17/29 (58%), Gaps = 1/29 (3%)

Query: 28  VSSDIIGSHFGSVFD-ATQTEITAVGDLQ 55
           + +DI+G+ F   FD A  T + A+  ++
Sbjct: 126 IDNDIVGTDFTIGFDTAVSTVVDALDKIR 154


  Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.
  11.fa
    Posted date:  Jun 18, 2001  1:19 PM
  Number of letters in database: 18,779
  Number of sequences in database:  39
  
Lambda     K      H
   0.313    0.129    0.352 

Gapped
Lambda     K      H
   0.270   0.0470    0.230 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 2788
Number of Sequences: 39
Number of extensions: 119
Number of successful extensions: 3
Number of sequences better than 10: 2
Number of HSP's better than 10.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 3
length of query: 83
length of database: 18779
effective HSP length: 33
effective length of query: 50
effective length of database: 17492
effective search space:   874600
T: 11
A: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.8 bits)
X3: 64 (24.9 bits)
S1: 34 (18.3 bits)
S2: 31 (16.5 bits)
<<<<<CUT>>>>>
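A minimal sketch of a preprocessing step that would let a line-oriented parser cope with the wrapped database name in the output above. It assumes, hypothetically, that continuation pieces of a `Database:` entry are non-blank, contain no colon, and are not the `N sequences; M total letters` line; `join_database_name` is an illustrative helper, not part of Bio.Blast.

```python
import re

# Matches the line that follows the header's database name, e.g.
# "           39 sequences; 18,779 total letters"
_COUNT_LINE = re.compile(r"^\s*[\d,]+ sequences; [\d,]+ total letters")


def join_database_name(lines):
    """Rejoin 'Database:' entries whose path wraps onto later lines.

    Hypothetical helper: assumes continuation lines are non-blank,
    contain no ':' and are not the sequence/letter count line.
    """
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.lstrip().startswith("Database:"):
            out.append(line)
            continue
        name = line.rstrip()
        # Glue wrapped pieces of the path back onto the Database: line.
        while i < len(lines):
            nxt = lines[i]
            if not nxt.strip() or ":" in nxt or _COUNT_LINE.match(nxt):
                break
            name += nxt.strip()
            i += 1
        out.append(name + "\n")
    return out
```

The same rule handles both the header entry (ended by the count line) and the footer entry (ended by the `Posted date:` line), and leaves every other line untouched.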


From biopython-bugs at bioperl.org  Tue Aug 14 16:44:35 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/40
Message-ID: <200108142044.f7EKiZq02776@pw600a.bioperl.org>

JitterBug notification

new message incoming/40

Message summary for PR#40
	From: joungjh@AptusGenomics.com
	Subject: retrieving GenBank records from NCBI
	Date: Tue, 14 Aug 2001 16:44:34 -0400
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From joungjh@AptusGenomics.com Tue Aug 14 16:44:35 2001
Received: from localhost (localhost [127.0.0.1])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f7EKiYq02770
	for <biopython-bugs@pw600a.bioperl.org>; Tue, 14 Aug 2001 16:44:34 -0400
Date: Tue, 14 Aug 2001 16:44:34 -0400
Message-Id: <200108142044.f7EKiYq02770@pw600a.bioperl.org>
From: joungjh@AptusGenomics.com
To: biopython-bugs@bioperl.org
Subject: retrieving GenBank records from NCBI

Full_Name: J. Joung
Module: GenBank
Version: biopython-1.00a2
OS: UNIX
Submission from: gw-aptusgen1.cust.fast.net (209.92.248.166)


I'm using GenBank NCBIDictionary to retrieve a GenBank record. The retrieved
record is missing the following information: LOCUS, DEFINITION, ACCESSION,
VERSION, and KEYWORDS.

Is there a way of obtaining the GenBank id from a known locuslink id in
biopython?


From katel at worldpath.net  Tue Aug 14 21:43:32 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] WIT and KEGG
References: <NEBBJGOKPAACGBJLMGCDCEBECCAA.tarjei@genome.wi.mit.edu> <001001c122f2$f602e760$010a0a0a@cadence.com>
Message-ID: <005801c1252b$b3274040$010a0a0a@cadence.com>

----- Original Message -----
From: "Cayte" <katel@worldpath.net>
To: <tarjei@MIT.EDU>
Cc: <biopython-dev@biopython.org>
Sent: Saturday, August 11, 2001 10:52 PM
Subject: Re: [Biopython-dev] WIT and KEGG


>
> ----- Original Message -----
> From: "Tarjei S Mikkelsen" <tarjei@genome.wi.mit.edu>
> >  I'm not too fond of adding this to the format file. HTML markup isn't
> > part of the KEGG format description, so this seems a bit ad hoc.
> >
> >  Instead I suggest that you either run the input through
> > File.SGMLHandle or File.SGMLStripper before you pass the
> > WIT record to KEGG.Enzyme.Parser OR write a separate Parser
> > class in your WIT module that wraps a ParserSupport.SGMLStrippingConsumer
> > around KEGG.Enzyme._Consumer.
> >
>   The problem is I'm experimenting with a filter to strip out junk ( not
> necessarily html ) between records.
> The motivation is that I've had Martel fail on just an extraneous line
> feed.
> Somehow the idea of chaining two filters together trips a watch for bugs
> alarm in my mind.
>
> > >   The format failed halfway through the file.  I think the problem is the
> > > order of entries.  The format specifies GENES before MOTIF but this order
> > > is reversed in the test file.  Maybe the format should be less sensitive
> > > to order, where it doesn't convey information.
> >
> >  Yeah, the entries are supposed to come in a specified order, but even
> > the KEGG people don't follow that rule. I've committed a change to
> > KEGG.Enzyme.enzyme_format.py that assumes very little about entry
> > ordering. If that's the error, it should work for you now.
> >
>
> Now it's stopping on files with db links like this example:
>
>             PIR: B49338  B49935  E64239  KIECAA
>
> These are quibbles but the computer doesn't understand quibbles:).
>
>                                                                  Cayte
> >  Tarjei
> >
> >
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
>
>
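One way to make a flat-file reader insensitive to entry order, sketched here in plain Python rather than as a Martel format: collect each tagged entry into a dictionary keyed by tag, so GENES and MOTIF can appear in either order. The 12-column tag field and the `split_entries` name are assumptions for illustration; this is not the actual change to enzyme_format.py.

```python
def split_entries(record_text, tag_width=12):
    """Group a KEGG-style flat record into {tag: [value lines]}.

    Assumes (for illustration) that a tag sits in the first `tag_width`
    columns and continuation lines leave those columns blank, so the
    order in which tags appear does not matter.
    """
    fields = {}
    tag = None
    for line in record_text.splitlines():
        if not line.strip() or line.startswith("///"):
            continue  # skip blank lines and the record terminator
        head = line[:tag_width].strip()
        value = line[tag_width:].strip()
        if head:
            tag = head
        elif tag is None:
            raise ValueError("continuation line before any tag: %r" % line)
        fields.setdefault(tag, []).append(value)
    return fields
```

Because lookup is by tag rather than by position, a record with MOTIF before GENES yields the same dictionary as one with GENES first.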


From tarjei at genome.wi.mit.edu  Tue Aug 14 19:15:16 2001
From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] WIT and KEGG
In-Reply-To: <005801c1252b$b3274040$010a0a0a@cadence.com>
Message-ID: <NEBBJGOKPAACGBJLMGCDCECICCAA.tarjei@genome.wi.mit.edu>

> > >  Instead I suggest that you either run the input through
> > > File.SGMLHandle or File.SGMLStripper before you pass the
> > > WIT record to KEGG.Enzyme.Parser OR write a separate Parser
> > > class in your WIT module that wraps a ParserSupport.SGMLStrippingConsumer
> > > around KEGG.Enzyme._Consumer.
> > >
> >   The problem is I'm experimenting with a filter to strip out junk ( not
> > necessarily html ) between records.
> > The motivation is that I've had Martel fail on just an extraneous line
> > feed.
> > Somehow the idea of chaining two filters together trips a watch for bugs
> > alarm in my mind.

Sure, for experimentation that's fine, but I'd prefer to keep it the way it
is in the distribution version. Especially because the HTML versions of
these records are full of other markup _in_ the record that has to be
cleaned out anyway - and adding regexps for all of those would be a mess.

> > > >   The format failed halfway through the file.  I think the problem is
> > > > the order of entries.  The format specifies GENES before MOTIF but
> > > > this order is reversed in the test file.  Maybe the format should be
> > > > less sensitive to order, where it doesn't convey information.
> > >
> > >  Yeah, the entries are supposed to come in a specified order, but even
> > > the KEGG people don't follow that rule. I've committed a change to
> > > KEGG.Enzyme.enzyme_format.py that assumes very little about entry
> > > ordering. If that's the error, it should work for you now.
> > >
> >
> > Now it's stopping on files with db links like this example:
> >
> >             PIR: B49338  B49935  E64239  KIECAA
> >
> > These are quibbles but the computer doesn't understand quibbles:).

 Yeah, I missed this case because it doesn't appear in KEGG. I've committed
another change which appears to deal well with it.
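For reference, splitting a database-link line that carries several identifiers, like the PIR example quoted above, only needs a split on the first colon and then on whitespace. `parse_dblink` is a hypothetical helper for illustration, not the change committed to enzyme_format.py.

```python
def parse_dblink(line):
    """Split a DBLINKS-style line such as 'PIR: B49338  B49935'.

    Hypothetical helper: assumes 'DB: id id ...' with whitespace-separated
    identifiers after the first colon.
    """
    db, sep, ids = line.strip().partition(":")
    if not sep:
        raise ValueError("no database prefix in %r" % line)
    return db.strip(), ids.split()
```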

 Btw, I'm going away for a couple of weeks, so I won't be very responsive
during that time. But I'm planning to bring my laptop to do some more
experiments with reaction/pathway classes.

 take care,

 Tarjei


From katel at worldpath.net  Wed Aug 15 02:14:04 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] METATOOL
Message-ID: <001b01c12551$7d784fe0$010a0a0a@cadence.com>

  The WIT files work fine with the KEGG parser now.

   In the next couple of weeks, I plan to look into METATOOL, maybe start a
Martel parser for the output.  Pathway researchers use it a lot, like
genomic researchers use blast.  The output of METATOOL is flat - no html
tags.

                                                       Cayte


From biopython-bugs at bioperl.org  Wed Aug 15 01:45:12 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/41
Message-ID: <200108150545.f7F5jCq05973@pw600a.bioperl.org>

JitterBug notification

new message incoming/41

Message summary for PR#41
	From: Jeffrey Chang <jchang@SMI.Stanford.EDU>
	Subject: Re: [Biopython-dev] Notification: incoming/40
	Date: Tue, 14 Aug 2001 22:46:45 -0700
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From jchang@SMI.Stanford.EDU Wed Aug 15 01:45:11 2001
Received: from crg-gw.Stanford.EDU (root@crg-gw.Stanford.EDU [171.65.32.201])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f7F5jAq05966
	for <biopython-bugs@bioperl.org>; Wed, 15 Aug 2001 01:45:11 -0400
Received: from [192.168.0.4] (c1128134-a.stcla1.sfba.home.com [24.176.209.55])
	by crg-gw.Stanford.EDU (8.11.5/8.11.5) with ESMTP id f7F5jDU24945;
	Tue, 14 Aug 2001 22:45:13 -0700 (PDT)
Mime-Version: 1.0
X-Sender: jchang@smi.stanford.edu (Unverified)
Message-Id: <p05101000b79fbcb4bcbf@[192.168.0.4]>
In-Reply-To: <200108142044.f7EKiZq02776@pw600a.bioperl.org>
References: <200108142044.f7EKiZq02776@pw600a.bioperl.org>
Date: Tue, 14 Aug 2001 22:46:45 -0700
To: biopython-bugs@bioperl.org, biopython-dev@biopython.org,
       joungjh@aptusgenomics.com
From: Jeffrey Chang <jchang@SMI.Stanford.EDU>
Subject: Re: [Biopython-dev] Notification: incoming/40
Content-Type: text/plain; charset="us-ascii" ; format="flowed"

At 4:44 PM -0400 8/14/01, biopython-bugs@bioperl.org wrote:
>Full_Name: J. Joung
>I'm using GenBank NCBIDictionary to retrieve a GenBank record. The retrieved
>record is missing the following information: LOCUS, DEFINITION, ACCESSION,
>VERSION, and KEYWORDS.

Is this information that's in the Genbank record?  It should be 
returning whatever NCBI returns, or raising an exception.  Dropping 
information would be odd.  Do you have a reproducible example?  What is the
accession you're using?


>Is there a way of obtaining the GenBank id from a known locuslink id in
>biopython?

No, we don't have any locuslink functionality at the moment.

Jeff


From jchang at SMI.Stanford.EDU  Wed Aug 15 01:46:45 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/40
In-Reply-To: <200108142044.f7EKiZq02776@pw600a.bioperl.org>
References: <200108142044.f7EKiZq02776@pw600a.bioperl.org>
Message-ID: <p05101000b79fbcb4bcbf@[192.168.0.4]>

At 4:44 PM -0400 8/14/01, biopython-bugs@bioperl.org wrote:
>Full_Name: J. Joung
>I'm using GenBank NCBIDictionary to retrieve a GenBank record. The retrieved
>record is missing the following information: LOCUS, DEFINITION, ACCESSION,
>VERSION, and KEYWORDS.

Is this information that's in the Genbank record?  It should be 
returning whatever NCBI returns, or raising an exception.  Dropping 
information would be odd.  Do you have a reproducible example?  What is the
accession you're using?


>Is there a way of obtaining the GenBank id from a known locuslink id in
>biopython?

No, we don't have any locuslink functionality at the moment.

Jeff

From jchang at SMI.Stanford.EDU  Wed Aug 15 01:50:20 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] WIT and KEGG
In-Reply-To: <001001c122f2$f602e760$010a0a0a@cadence.com>
References: <NEBBJGOKPAACGBJLMGCDCEBECCAA.tarjei@genome.wi.mit.edu>
 <001001c122f2$f602e760$010a0a0a@cadence.com>
Message-ID: <p05101001b79fbd64e601@[192.168.0.4]>

At 10:52 PM -0700 8/11/01, Cayte wrote:
>From: "Tarjei S Mikkelsen" <tarjei@genome.wi.mit.edu>
>  >  Instead I suggest that you either run the input through
>>  File.SGMLHandle or File.SGMLStripper before you pass the
>>  WIT record to KEGG.Enzyme.Parser OR write a separate Parser
>>  class in your WIT module that wraps a ParserSupport.SGMLStrippingConsumer
>>  around KEGG.Enzyme._Consumer.
>>
>   The problem is I'm experimenting with a filter to strip out junk ( not
>necessarily html ) between records.
>The motivation is that I've had Martel fail on just an extraneous line feed.
>Somehow the idea of chaining two filters together trips a watch for bugs
>alarm in my mind.

I agree with Tarjei that these should be separated out, if possible. 
Yes, there's a possibility of bugs when chaining filters together, 
but having two entities developed and debugged separately should have 
fewer bugs (and easier maintenance) than a system where all the 
functionality is munged together.

Jeff

From biopython-bugs at bioperl.org  Wed Aug 15 08:22:26 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/42
Message-ID: <200108151222.f7FCMQq08880@pw600a.bioperl.org>

JitterBug notification

new message incoming/42

Message summary for PR#42
	From: joungjh@email.com
	Subject: Re: [Biopython-dev] Notification: incoming/40
	Date: Wed, 15 Aug 2001 08:22:26 -0400
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From joungjh@email.com Wed Aug 15 08:22:26 2001
Received: from localhost (localhost [127.0.0.1])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f7FCMPq08874
	for <biopython-bugs@pw600a.bioperl.org>; Wed, 15 Aug 2001 08:22:26 -0400
Date: Wed, 15 Aug 2001 08:22:26 -0400
Message-Id: <200108151222.f7FCMPq08874@pw600a.bioperl.org>
From: joungjh@email.com
To: biopython-bugs@bioperl.org
Subject: Re: [Biopython-dev] Notification: incoming/40

Full_Name: 
Module: 
Version: 
OS: 
Submission from: gw-aptusgen1.cust.fast.net (209.92.248.166)


>>I'm using GenBank NCBIDictionary to retrieve a GenBank record. The retrieved
>>record is missing the following information: LOCUS, DEFINITION, ACCESSION,
>>VERSION, and KEYWORDS.

>Is this information that's in the Genbank record?  It should be 
>returning whatever NCBI returns, or raising an exception.  Dropping 
>information would be odd.  Do you have a reproducible example?  What is the
>accession you're using?

Yes, the LOCUS, DEFINITION, ACCESSION, VERSION, and KEYWORDS information is in
the GenBank record. Any GenBank id drops this information on UNIX; you can try
GenBank id '15145772'. I have installed the biopython-1.00a1 Windows version on
my PC, and it seems to return all information correctly. Thank you for your
quick response.


From chapmanb at arches.uga.edu  Wed Aug 15 08:55:37 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/42
In-Reply-To: <200108151222.f7FCMQq08880@pw600a.bioperl.org>
Message-ID: <Pine.A41.4.10.10108150848350.110510-100000@archa15.cc.uga.edu>

Hey all;

Bug report from J. Joung:
> >>I'm using GenBank NCBIDictionary to retrieve a GenBank record. The retrieved
> >>record is missing the following information: LOCUS, DEFINITION, ACCESSION,
> >>VERSION, and KEYWORDS.

Jeff:
> >Is this information that's in the Genbank record?  It should be 
> >returning whatever NCBI returns, or raising an exception.  Dropping 
> >information would be odd.  Do you have a reproducible example?  What is the
> >accession you're using?

I think this is the infamous "lose the first 5 lines of the file" bug that
popped up in biopython-1.00a2 (which would also explain why 1.00a1 works
just fine). This has been fixed in the current CVS, so the next release
should be bug free (well, at least in regards to this bug :-).

The solution for now is to fix Bio/File.py. I'm not exactly sure how this
would be done with diffs on windows, but attached is the change which
fixes the problem. I hope I've picked up on your problem correctly -- if
this change doesn't help please let us know!

Thanks for the bug report, and sorry about the problem! 
Hope this helps.
Brad

$ more File.diff
Index: File.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/File.py,v
retrieving revision 1.12
retrieving revision 1.13
diff -c -r1.12 -r1.13
*** File.py     2001/06/04 04:44:09     1.12
--- File.py     2001/07/14 23:48:51     1.13
***************
*** 46,60 ****
          return line
  
      def read(self, size=-1):
!         saved = ''
!         while size > 0 and self._saved:
!             if len(self._saved[0]) <= size:
!                 size = size - len(self._saved[0])
!                 saved = saved + self._saved.pop(0)
!             else:
!                 saved = saved + self._saved[0][:size]
!                 self._saved[0] = self._saved[0][size:]
!                 size = 0
          return saved + self._handle.read(size)
  
      def saveline(self, line):
--- 46,64 ----
          return line
  
      def read(self, size=-1):
!         if size == -1:
!             saved = string.join(self._saved, "")
!             self._saved[:] = []
!         else:
!             saved = ''
!             while size > 0 and self._saved:
!                 if len(self._saved[0]) <= size:
!                     size = size - len(self._saved[0])
!                     saved = saved + self._saved.pop(0)
!                 else:
!                     saved = saved + self._saved[0][:size]
!                     self._saved[0] = self._saved[0][size:]
!                     size = 0
          return saved + self._handle.read(size)
  
      def saveline(self, line):
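To see why revision 1.12 loses lines, here is a stripped-down, self-contained sketch of a handle that can push lines back, using the corrected read() from the diff. It is modern Python and only an illustration, not the actual Bio/File.py class: with size=-1 the old `while size > 0` loop never ran, so any lines pushed back with saveline() were silently skipped.

```python
import io


class UndoHandle:
    """Illustrative file wrapper that can push lines back (not Bio.File)."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines pushed back with saveline()

    def readline(self):
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()

    def saveline(self, line):
        if line:
            self._saved.insert(0, line)

    def read(self, size=-1):
        if size == -1:
            # Read everything: pushed-back lines must come out first.
            # Revision 1.12 had no such branch, so they were dropped.
            saved = "".join(self._saved)
            self._saved[:] = []
        else:
            saved = ""
            while size > 0 and self._saved:
                if len(self._saved[0]) <= size:
                    size -= len(self._saved[0])
                    saved += self._saved.pop(0)
                else:
                    saved += self._saved[0][:size]
                    self._saved[0] = self._saved[0][size:]
                    size = 0
        return saved + self._handle.read(size)


if __name__ == "__main__":
    h = UndoHandle(io.StringIO("ID   X\nAC   Y;\n"))
    h.saveline(h.readline())  # peek at the first line, then push it back
    print(repr(h.read()))     # the ID line is still there
```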


From katel at worldpath.net  Wed Aug 15 19:12:15 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] WIT and KEGG
References: <NEBBJGOKPAACGBJLMGCDCEBECCAA.tarjei@genome.wi.mit.edu> <001001c122f2$f602e760$010a0a0a@cadence.com> <p05101001b79fbd64e601@[192.168.0.4]>
Message-ID: <002f01c125df$bc14d0a0$010a0a0a@cadence.com>

----- Original Message -----
From: "Jeffrey Chang" <jchang@SMI.Stanford.EDU>
> I agree with Tarjei that these should be separated out, if possible.
> Yes, there's a possibility of bugs when chaining filters together,
> but having two entities developed and debugged separately should have
> fewer bugs (and easier maintenance) than a system where all the
> functionality is munged together.
>
  I'm not sure what two entities you are referring to.

   Two filters?  I can see the case for not cluttering the KEGG format with
html filters.

   Two modules?  There may be no need for a separate WIT module, because the
10 (filtered) WIT files are accepted by the KEGG parser, and the WIT
documentation claims to use the KEGG format. Of course I need to take a
close, byte-by-byte look to see if any problem lurks in the details. So
WIT may just need a preprocessor consisting of chained filters.
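A minimal sketch of a preprocessor built from chained filters, with each filter small enough to develop and test on its own, in the spirit of Jeff's point. The tag-stripping regex and the rule that blank lines are the only junk between records are assumptions for illustration; the returned handle is what would then be passed to a parser such as KEGG.Enzyme.Parser.

```python
import io
import re

# Naive tag pattern: does not handle '>' inside attributes or tags
# that span several lines.
_TAG = re.compile(r"<[^>]*>")


def strip_markup(lines):
    """Filter 1: remove SGML/HTML tags from each line."""
    for line in lines:
        yield _TAG.sub("", line)


def drop_blank_lines(lines):
    """Filter 2: drop whitespace-only lines (assumed to be the only junk)."""
    for line in lines:
        if line.strip():
            yield line


def preprocess(handle):
    """Chain the filters and return a new handle for the parser."""
    return io.StringIO("".join(drop_blank_lines(strip_markup(handle))))
```

Since the filters are generators, each one can be unit-tested separately and reordered or extended without touching the KEGG format itself.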


                                      Cayte
>
>


From katel at worldpath.net  Wed Aug 15 19:18:47 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] MetaTool and Martel
Message-ID: <003701c125e0$a3bc1940$010a0a0a@cadence.com>

   Does Martel handle embedded size fields?  The MetaTool output contains
lots of matrices preceded by row and column counts.  It would be hard:
Martel would have to capture and store data on the fly.

   It's not strictly necessary, but without it Martel would accept matrices
that were not consistent with the size fields.
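For comparison, checking a count-prefixed matrix outside Martel is straightforward in plain Python. The layout below (a `rows cols` header line followed by whitespace-separated integer rows) is a hypothetical stand-in; the actual METATOOL output format may differ.

```python
def read_counted_matrix(lines):
    """Read a matrix preceded by a 'rows cols' header line.

    The layout is hypothetical. Returns (matrix, remaining_lines) and
    raises ValueError when the data disagree with the size fields.
    """
    rows, cols = (int(x) for x in lines[0].split())
    body = lines[1:1 + rows]
    if len(body) != rows:
        raise ValueError("expected %d rows, found %d" % (rows, len(body)))
    matrix = []
    for i, line in enumerate(body):
        values = [int(x) for x in line.split()]
        if len(values) != cols:
            raise ValueError("row %d has %d columns, expected %d"
                             % (i, len(values), cols))
        matrix.append(values)
    return matrix, lines[1 + rows:]
```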

                                                                    Cayte


From adalke at mindspring.com  Wed Aug 15 10:17:37 2001
From: adalke at mindspring.com (Andrew Dalke)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] MetaTool and Martel
Message-ID: <017301c12595$098a54e0$0201a8c0@josiah.dalkescientific.com>

>   Does Martel handle embedded size fields?

Yes!  I needed it for support for the MDL file format.

Suppose you have something like

2 1 this and that
3 1 but not the other
1 3 this is a test

which should be turned into

record 1 == ("this and", "that")
record 2 == ("but not the", "other")
record 3 == ("this", "is a test")

Then you can use something like

>>> from Martel import Integer, Str, RepN, Group, AnyEol, Re, Rep
>>> word = Group("word", Re("[^ \R]+"))
>>>
>>> record = Integer("n1") + Str(" ") + Integer("n2") + \
...     Group("group1", RepN(Str(" ") + word, "n1")) + \
...     Group("group2", RepN(Str(" ") + word, "n2")) + \
...     AnyEol()
>>>
>>> from xml.sax import saxutils
>>> format = Rep(record)
>>> parser = format.make_parser()
>>> parser.setContentHandler(saxutils.XMLGenerator())
>>> parser.parseString("""\
... 2 1 this and that
... 3 1 but not the other
... 1 3 this is a test
... """)
<?xml version="1.0" encoding="iso-8859-1"?>
<n1>2</n1> <n2>1</n2><group1> <word>this</word>
<word>and</word></group1><group2> <word>that</word></group2>
<n1>3</n1> <n2>1</n2><group1> <word>but</word> <word>not</word>
<word>the</word></group1><group2> <word>other</word></group2>
<n1>1</n1> <n2>3</n2><group1> <word>this</word></group1><group2>
<word>is</word> <word>a</word> <word>test</word></group2>
>>>

A couple more details are at:
 http://www.dalkescientific.com/Martel/ebi-talk/img35.htm

This is only usable if the number and the repeat count are the
same.  E.g., if a count value of N means N-1 repeats, then it
isn't possible to support it.  (N+1 is doable as a repeat of N
followed by a repeat of 1.)

But I've not come across that case.  Yet.

>   It's not strictly necessary but without it Martel would accept matrices
>that were not consistent with the size fields.

There are other formats (MDL mol format) where the counts are required
else things get out of synch.
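The same record layout as the Martel example can also be handled in plain Python, which makes the consistency check explicit: if the two counts do not add up to the number of words on the line, the record is rejected. This is a sketch for comparison only; `parse_counted` is a hypothetical name, and it builds tuples rather than SAX events.

```python
def parse_counted(text):
    """Split each 'n1 n2 word...' line into groups of n1 and n2 words.

    Plain-Python sketch of the counted-repeat idea (not Martel).
    Raises ValueError when the counts disagree with the word count.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        n1, n2 = int(fields[0]), int(fields[1])
        words = fields[2:]
        if len(words) != n1 + n2:
            raise ValueError("counts %d+%d do not match %d words in %r"
                             % (n1, n2, len(words), line))
        records.append((" ".join(words[:n1]), " ".join(words[n1:])))
    return records
```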

                    Andrew


From jchang at SMI.Stanford.EDU  Thu Aug 16 01:32:16 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Notification: incoming/42
In-Reply-To: 
 <Pine.A41.4.10.10108150848350.110510-100000@archa15.cc.uga.edu>
References: <Pine.A41.4.10.10108150848350.110510-100000@archa15.cc.uga.edu>
Message-ID: <p05101006b7a10b1f45c5@[192.168.0.4]>

At 8:55 AM -0400 8/15/01, Brad Chapman wrote:
>Hey all;
>
>Bug report from J. Joung:
>>  >>I'm using GenBank NCBIDictionary to retrieve a GenBank record. The retrieved
>>  >>record is missing the following information: LOCUS, DEFINITION, ACCESSION,
>>  >>VERSION, and KEYWORDS.
>
>Jeff:
>>  >Is this information that's in the Genbank record?  It should be
>>  >returning whatever NCBI returns, or raising an exception.  Dropping
>>  >information would be odd.  Do you have a reproducible example?  What is the
>>  >accession you're using?
>
>I think this is the infamous "lose the first 5 lines of the file" bug that
>popped up in biopython-1.00a2 (which would also explain why 1.00a1 works
>just fine). This has been fixed in the current CVS, so the next release
>should be bug free (well, at least in regards to this bug :-).

Hey, good call!  I completely forgot about that.  It looks like we 
really should release a fix soon...

Jeff

From dagdigian at blackstonecomputing.com  Wed Aug 22 13:40:45 2001
From: dagdigian at blackstonecomputing.com (Chris Dagdigian)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Need help debugging our python-based viewcvs.cgi
Message-ID: <3B83EE9D.4060802@blackstonecomputing.com>

Hey folks,

Our python-based web CVS front end breaks as soon as you traverse into a 
CVSROOT and then try to click on one of the links meant to aid in 
traversing the directory tree.

The central problem is that the URLs that are constructed are wrong
after you get to a certain depth in the CVS tree.

As an example check out:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Martel/formats/?cvsroot=biopython

That link will put you into biopython/Martel/formats...

Now- on that same page try clicking on one of the links next to the 
Current Directory: navigation line. The URL link back to 
biopython/biopython is just plain wrong and it causes the CGI to bomb 
out with an error. It seems to be appending extra path info to the 
arguments that get passed back to the CGI.

At this point I'm not sure if this is a Python bug in the code or
perhaps an artifact of how our virtual website and cgi-bin
directories are configured.

Does anyone have the spare cycles to fool around with this app and try
to debug it? I don't know enough Python to feel comfortable diving
around in the URL construction codebase.

I'll set up account access and permissions (if necessary) if anyone wants
to help out in debugging this app.

Regards,
Chris


From michal at orfeus.bioinfo.pl  Wed Aug 22 13:59:35 2001
From: michal at orfeus.bioinfo.pl (Michal Kurowski)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] swissprot not working ?
Message-ID: <20010822195935.A1749@orfeus>


Hi,
I've installed biopython-1.00a2 recently and I'm having some
unexpected problems:
1) The swissprot module has some serious problems. Running "swissprot.py"
   from the "examples" directory gives the traceback I am attaching.
2) Installation won't go smoothly. (I'm sure I've got TextTools
   installed ;-). The log is in an attachment.

My python is:

Python 2.0 (#1, Dec 20 2000, 15:28:16) 
[GCC 2.96 20000731 (Red Hat Linux 7.0)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> 

from the redhat rpm package.

Cheers,

-- 
Michal Kurowski
<mkur@bio.iimcb.gov.pl>
-------------- next part --------------
Script started on Wed Aug 22 19:53:20 2001
]0;michal@a7: /home/michal[michal@a7 michal]$ python2 /home/seals/michal/bin/swiss_kinase.py
Traceback (most recent call last):
  File "/home/seals/michal/bin/swiss_kinase.py", line 23, in ?
    cur_record = s_iterator.next()
  File "/usr/lib/python2.0/site-packages/Bio/SwissProt/SProt.py", line 168, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.0/site-packages/Bio/SwissProt/SProt.py", line 289, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.0/site-packages/Bio/SwissProt/SProt.py", line 332, in feed
    self._scan_record(uhandle, consumer)
  File "/usr/lib/python2.0/site-packages/Bio/SwissProt/SProt.py", line 337, in _scan_record
    fn(self, uhandle, consumer)
  File "/usr/lib/python2.0/site-packages/Bio/SwissProt/SProt.py", line 369, in _scan_id
    self._scan_line('ID', uhandle, consumer.identification, exactly_one=1)
  File "/usr/lib/python2.0/site-packages/Bio/SwissProt/SProt.py", line 359, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "/usr/lib/python2.0/site-packages/Bio/ParserSupport.py", line 326, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'ID':
AC   P54646;

]0;michal@a7: /home/michal[michal@a7 michal]$ exit
exit

Script done on Wed Aug 22 19:53:25 2001
-------------- next part --------------
Script started on Wed Aug 22 19:50:29 2001
]0;michal@a7: /usr/local/src/biopython-1.00a2[root@a7 biopython-1.00a2]# python2 setup.py test
running test
test_Enzyme ... ok
test_FSSP ... ok
test_Fasta ... ok
test_Fasta2 ... ok
test_File ... ok
test_GenBank ... ok
test_GenBankFormat ... ok
test_KeyWList ... ok
test_Location ... ok
test_LocationParser ... ok
test_NCBIStandalone ... ok
test_NCBIWWW ... ok
test_ParserSupport ... ok
test_SProt ... ok
test_SubsMat ... ok
test_align ... ok
test_gobase ... ERROR
test_kabat ... ok
test_prodoc ... ok
test_property_manager ... ok
test_prosite ... ok
test_prosite2 ... ok
test_rebase ... ERROR
test_seq ... ok
test_translate ... ok
test_unigene ... FAIL

======================================================================
ERROR: test_gobase
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 136, in runTest
    __import__(self.test_name)
  File "test_gobase.py", line 12, in ?
    from Bio import Gobase
  File "/usr/lib/python2.0/site-packages/Bio/Gobase/__init__.py", line 33, in ?
    from Bio import Sequence
ImportError: cannot import name Sequence
======================================================================
ERROR: test_rebase
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 136, in runTest
    __import__(self.test_name)
  File "test_rebase.py", line 12, in ?
    from Bio.Rebase import Rebase
  File "/usr/lib/python2.0/site-packages/Bio/Rebase/__init__.py", line 32, in ?
    from Bio import Sequence
ImportError: cannot import name Sequence
======================================================================
FAIL: test_unigene
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 153, in runTest
    expected_handle)
  File "run_tests.py", line 247, in compare_output
    assert expected_line == output_line, \
AssertionError: 
Output  : '        key is D61454\012'
Expected: '        key is F10922\012'
----------------------------------------------------------------------
Ran 26 tests in 49.226s

FAILED (failures=1, errors=2)
]0;michal@a7: /usr/local/src/biopython-1.00a2[root@a7 biopython-1.00a2]# exit
exit

Script done on Wed Aug 22 19:51:36 2001
From chapmanb at arches.uga.edu  Wed Aug 22 14:40:05 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] swissprot not working ?
In-Reply-To: <20010822195935.A1749@orfeus>
References: <20010822195935.A1749@orfeus>
Message-ID: <15235.64645.89742.341180@taxus.athen1.ga.home.com>

Hi Michal;
Thanks for writing. Since you caught me right at the end of writing
e-mails, you get an extra fast response :-)

> I've installed biopython-1.00a2 recently and I'm having some
> unexpected problems:

In short, the problems you are having look like bugs that we have
noticed and squashed since the 1.00a2 release. I'll get into more
detail below, but if you want to "just fix the problems," getting the
latest CVS version should work for you. The biopython source is
available via anonymous CVS, with instructions at:

http://cvs.biopython.org/

We also hope to make a new release relatively soon. 

Anyways, the problems you are seeing are due to us, and not you :-)

> 1) swissprot module has some serious problems. Running "swissprot.py"
>    from the "examples" directory gives traceback i am attaching.

There is a (now infamous) bug that snuck into 1.00a2 in which the
first 5 lines of a file will be eaten (under some conditions). The
traceback you are seeing is caused during retrieval of the swissprot
records in the swissprot.py example. The record is retrieved, but is
short the first 5 lines, so the swissprot parser thinks it is malformed.
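A simplified, hypothetical stand-in for the line check in the traceback shows why a record missing its first lines fails: the scanner expects the ID line first and finds the AC line instead. The real ParserSupport.read_and_call works on a handle and takes more options; `expect_line` here only mimics its error message.

```python
def expect_line(lines, start):
    """Pop the next line and insist it begins with `start`.

    Hypothetical simplification of ParserSupport.read_and_call: it works
    on a list of lines instead of a handle and returns the line.
    """
    line = lines.pop(0)
    if not line.startswith(start):
        raise SyntaxError("Line does not start with %r:\n%s" % (start, line))
    return line
```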

> 2) installation won't go smoothly. ( I'm sure I've got TextTools
>    installed ;-). The log is in a attachment.

The installation looks good (hey, a majority of the tests passed :-),
but there are also a few bugs in the tests:

> ImportError: cannot import name Sequence

This is caused by an old module Bio.Sequence (which has been replaced
by Bio.Seq), which was referenced in a few places we didn't
expect. This has been fixed.

> Output  : '        key is D61454\012'
> Expected: '        key is F10922\012'

This is caused by different dictionary key orderings under different
versions of Python. The module itself works fine, but when the output
generated by your version of Python is compared to the "golden output"
produced by a different version, the key orderings differ so the
comparison fails. I believe this problem has also been fixed (by
sorting the dictionary keys so they always come out in the same order).
But, at any rate, this is a regression test bug, and shouldn't affect
your use of the module.
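A minimal Python sketch of the fix described above, assuming the test just prints one "key is ..." line per dictionary key. The helper name format_keys is hypothetical and is not the actual regression-test code. Iterating over sorted(...) instead of over the dictionary itself makes the printed order independent of how the interpreter orders keys, so the output can be compared against a fixed golden file.

```python
def format_keys(index):
    """Build report lines in sorted key order.

    Hypothetical helper, not Biopython's regression-test code: sorting the
    keys means the output no longer depends on dictionary iteration order.
    """
    return ["        key is %s" % key for key in sorted(index)]


# Same contents, different insertion orders.
first = {"F10922": "record A", "D61454": "record B"}
second = {"D61454": "record B", "F10922": "record A"}

for line in format_keys(first):
    print(line)
```

Both dictionaries produce identical output, with D61454 listed before F10922.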

Thanks for reporting these problems. We definitely like to get
feedback about this sort of thing. I hope this clears things up
and that you enjoy using Biopython! 

Brad

From michal at orfeus.bioinfo.pl  Wed Aug 22 14:52:44 2001
From: michal at orfeus.bioinfo.pl (Michal Kurowski)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] Re: swissprot not working ?
In-Reply-To: <15235.64645.89742.341180@taxus.athen1.ga.home.com>; from chapmanb@arches.uga.edu on Wed, Aug 22, 2001 at 02:40:05PM -0400
References: <20010822195935.A1749@orfeus> <15235.64645.89742.341180@taxus.athen1.ga.home.com>
Message-ID: <20010822205244.A6217@orfeus>

Brad Chapman [chapmanb@arches.uga.edu] wrote:
> Hi Michal;
> Thanks for writing. Since you caught me right at the end of writing
> e-mails, you get an extra fast response :-)

Seems I'm really lucky ;-).

> In short, the problems you are having look like bugs that we have
> noticed and squashed since the 1.00a2 release. I'll get into more
> detail below, but if you want to "just fix the problems," getting the
> latest CVS version should work for you. The biopython source is
> available via anonymous CVS, with instructions at:
> 
> http://cvs.biopython.org/
> 

I'm going there right away.

> There is a (now infamous) bug that snuck into 1.00a2 in which the
> first 5 lines of a file will be eaten (under some conditions). The
> traceback you are seeing is caused during retrieval of the swissprot
> records in the swissprot.py example. The record is retrieved, but is
> short the first 5 lines, so the swissprot parser thinks it is malformed.

I was having the same type of errors when trying my own scripts. After
a small investigation I found that the SProt.py module is the culprit. At
least it seems to be ;-).

> 
> > 2) installation won't go smoothly. (I'm sure I've got TextTools
> >    installed ;-). The log is in an attachment.
> 
> The installation looks good (hey, a majority of the tests passed :-),
> but there are also a few bugs in the tests:

I was using the last "alpha" release previously and I don't remember
anything like that (but now I've got a different TextTools version).

> Thanks for reporting these problems. We definitely like to get
> feedback about this sort of thing. I hope this clears things up
> and that you enjoy using Biopython! 

I surely do.
Thanks a lot,

-- 
Michal Kurowski
<mkur@bio.iimcb.gov.pl>

From katel at worldpath.net  Sun Aug 26 02:10:58 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] MetaTool and Martel
References: <017301c12595$098a54e0$0201a8c0@josiah.dalkescientific.com>
Message-ID: <002301c12df5$e16c67a0$010a0a0a@cadence.com>

> >>> from Martel import Integer, Str, RepN, Group, AnyEol, Re, Rep
> >>> word = Group("word", Re("[^ \R]+"))
> >>>
> >>> record = Integer("n1") + Str(" ") + Integer("n2") + \
> ...     Group("group1", RepN(Str(" ") + word, "n1")) + \
> ...     Group("group2", RepN(Str(" ") + word, "n2")) + \
> ...     AnyEol()

  Can the variable be reassigned within a single record?  MetaTool outputs a
lot of matrices.  It would be simpler to reassign row_count and column_count
for each matrix than to invent a new variable name for each matrix and clutter
up the code with repetitive, nearly identical matrix definitions.

                                                     Cayte


From katel at worldpath.net  Sun Aug 26 22:07:39 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] MetaTool
Message-ID: <000f01c12e9d$0dad7820$010a0a0a@cadence.com>

   The MetaTool parser will need to represent matrices.  Before writing my
own class, I found an extension called Numeric Python that provides
powerful support for matrix representation and manipulation.  The only
drawback I can see is that it requires bundling yet another tool with the
distribution.  But MetaTool is new, and having powerful matrix features will
allow users to experiment in unanticipated ways.

  Of course I'd need to investigate more to see how reliable the extension
is.

  Is this the way to go?

                                                                        Cayte


From adalke at mindspring.com  Sun Aug 26 11:53:02 2001
From: adalke at mindspring.com (Andrew Dalke)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] MetaTool and Martel
Message-ID: <049801c12e47$30e8bf80$0201a8c0@josiah.dalkescientific.com>

Cayte:
>  Can the variable be reassigned within a single record?

Yes.  It uses the most recently matched value, including
when that value came from a partial match on a path that
required backtracking.

> It would be simpler to reassign row_count and column_count
> for each matrix than to invent a new variable name for each
> matrix and clutter up the code with repetitive, nearly
> identical matrix definitions.

No problem.  Go ahead.
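The count-driven pattern discussed here can be sketched in plain Python. This is not Martel code, and it does not reproduce MetaTool's real output format. It assumes a hypothetical layout in which each matrix starts with a header line "rows cols" followed by that many rows of integers. As in Andrew's description of Martel, row_count and column_count are simply reassigned from each new header, so one definition serves every matrix.

```python
def parse_matrices(lines):
    """Parse consecutive integer matrices, each preceded by a "rows cols" header.

    The input layout is hypothetical and used only for illustration; it is
    not MetaTool's actual output format.
    """
    matrices = []
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank separator lines between matrices
        # Reassign the counts from each header; the latest values always win.
        row_count, column_count = (int(field) for field in header.split())
        matrix = []
        for _ in range(row_count):
            row_line = next(it, None)
            if row_line is None:
                raise ValueError("input ended inside a %d-row matrix" % row_count)
            row = [int(field) for field in row_line.split()]
            if len(row) != column_count:
                raise ValueError("expected %d columns, got %d"
                                 % (column_count, len(row)))
            matrix.append(row)
        matrices.append(matrix)
    return matrices


text = """2 3
1 0 -1
0 1 1

1 2
5 6"""
print(parse_matrices(text.splitlines()))
```

The same loop handles a 2x3 matrix followed by a 1x2 matrix without a separate definition for each one.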

                    Andrew


From adalke at mindspring.com  Sun Aug 26 11:56:26 2001
From: adalke at mindspring.com (Andrew Dalke)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] MetaTool
Message-ID: <04a101c12e47$a9e075e0$0201a8c0@josiah.dalkescientific.com>

>I found an extension called Numeric Python that provides
>powerful support for matrix representation and manipulation.

Numeric Python is pretty widely used, and rather easy
to install.

>  Of course I'd need to investigate more to see how reliable
> the extension is.

One of my clients uses it all the time.  Years ago there
were a lot of things (almost all non-standard uses)
that would cause it to fail, but those were cleaned up
long ago.

I think Numeric was one of the first common non-Guido
extensions to Python.

>  Is this the way to go?

Yes.  If you're doing non-trivial matrix numerics it's
best to use Numeric, even given the extra dependency.

                    Andrew
                    dalke@dalkescientific.com


From jchang at SMI.Stanford.EDU  Mon Aug 27 01:15:36 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] MetaTool
In-Reply-To: <000f01c12e9d$0dad7820$010a0a0a@cadence.com>
Message-ID: <Pine.GSO.4.31.0108262214540.6369-100000@taiyang>

>    The MetaTool parser will need to represent matrices.  Before
> writing my own class, I found an extension called Numeric Python that
> provides powerful support for matrix representation and manipulation.
> The only drawback I can see is that it requires bundling yet another
> tool with the distribution.  But MetaTool is new, and having powerful
> matrix features will allow users to experiment in unanticipated ways.
>
>   Of course I'd need to investigate more to see how reliable the
> extension is.
>
>   Is this the way to go?

Yes.  It's already a dependency for Biopython, for some of the more
algorithmic code.

Jeff


From jchang at SMI.Stanford.EDU  Tue Aug 28 20:18:49 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] next release
Message-ID: <p05101001b7b1e334d90c@[169.254.197.41]>

Hello everyone,

I'd like to roll a new biopython release.  This will also be the 
final release before I move around the directory structure as 
discussed at BOSC.

This release will not contain a lot of new functionality, but will
mostly fix bugs, including the now infamous UndoHandle bug.  The code 
for the release should be working correctly, so all the core 
developers should let me know if your stuff is ready to be released, 
and if not, when it will be.

The regression tests seem to all pass...

Thanks,
Jeff

From jchang at SMI.Stanford.EDU  Fri Aug 31 18:15:22 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:03 2005
Subject: [Biopython-dev] next release imminent
Message-ID: <p05101000b7b5bcb25eb1@[171.65.33.250]>

If nobody has any objections, I'm going to put together the next 
release this weekend.  Please let me know if I should hold off...

Thanks,
Jeff