From edschofield at gmail.com Fri Mar 2 12:52:44 2007 From: edschofield at gmail.com (Ed Schofield) Date: Fri, 2 Mar 2007 17:52:44 +0000 Subject: [Biopython-dev] Offer to convert BioPython to NumPy Message-ID: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> Hi everyone, Last month there was some interest expressed on this list (and the general discussion list) about conversion of the codebase from Numeric to NumPy. I'd like to volunteer to lead this effort. I'm a (minor) NumPy developer and a (less minor) SciPy developer. My main contributions to SciPy have been the maximum entropy module and parts of the sparse matrix module. I've recently moved into the field of computational biology, and I'm happy to see that BioPython exists; I am sure it will save me time. But I don't want to go back to using Numeric, since NumPy is so much better (and is now the only array package supported by SciPy). I think trying to retain compatibility with Numeric would be unrealistic. But I would hope that a transition to a NumPy-only codebase would be quick (a week or so). If there are any technical problems we are sure to get a quick response on the numpy-discussion list. Bruce, if you're still interested in helping with the porting, we could split up the work. I suggest that we make our changes in a new CVS branch. That way our changes would be unintrusive until the patch-set is ready and tested. 
-- Ed From mdehoon at c2b2.columbia.edu Fri Mar 2 13:10:50 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 02 Mar 2007 13:10:50 -0500 Subject: [Biopython-dev] Offer to convert BioPython to NumPy In-Reply-To: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> References: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> Message-ID: <45E868AA.60409@c2b2.columbia.edu> Ed Schofield wrote: > Last month there was some interest expressed on this list (and the general > discussion list) about conversion of the codebase from Numeric to NumPy. I'd > like to volunteer to lead this effort. Thanks! I'd be happy to see Biopython to get to work with numpy. > > I think trying to retain compatibility with Numeric would be unrealistic. Why is that so? For example, matplotlib happily supports Numeric, NumPy, and numarray. Given that the latest version of NumPy does not compile out of the box on Cygwin, I'd be very hesitant to drop Numeric support for Biopython. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From bsouthey at gmail.com Fri Mar 2 15:36:05 2007 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 2 Mar 2007 14:36:05 -0600 Subject: [Biopython-dev] Offer to convert BioPython to NumPy In-Reply-To: <45E868AA.60409@c2b2.columbia.edu> References: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> <45E868AA.60409@c2b2.columbia.edu> Message-ID: Hi, That would be great because I have not had any time to look into this! Also, I realized that I need to understand more about BioPython first. However, most files appear to only need very small changes but virtually all the uses (except Bio/KDTree/KDTree.py) use the 'from Numeric import *'. 
Bruce Bio/LogisticRegression.py Bio/MarkovModel.py Bio/MaxEntropy.py Bio/NaiveBayes.py Bio/distance.py Bio/kNN.py Bio/Affy/CelFile.py Bio/Cluster/__init__.py Bio/KDTree/KDTree.py Bio/PDB/Atom.py Bio/PDB/Entity.py Bio/PDB/FragmentMapper.py Bio/PDB/MMCIFParser.py Bio/PDB/Model.py Bio/PDB/NeighborSearch.py Bio/PDB/PDBParser.py Bio/PDB/Polypeptide.py Bio/PDB/ResidueDepth.py Bio/PDB/Superimposer.py Bio/PDB/Superimposer.py Bio/PDB/Vector.py Bio/PDB/Vector.py Bio/SVDSuperimposer/SVDSuperimposer.py Bio/SVDSuperimposer/SVDSuperimposer.py Bio/Statistics/lowess.py On 3/2/07, Michiel Jan Laurens de Hoon wrote: > Ed Schofield wrote: > > Last month there was some interest expressed on this list (and the general > > discussion list) about conversion of the codebase from Numeric to NumPy. I'd > > like to volunteer to lead this effort. > Thanks! I'd be happy to see Biopython to get to work with numpy. > > > > I think trying to retain compatibility with Numeric would be unrealistic. > Why is that so? For example, matplotlib happily supports Numeric, NumPy, > and numarray. > > Given that the latest version of NumPy does not compile out of the box > on Cygwin, I'd be very hesitant to drop Numeric support for Biopython. > > --Michiel. 
> > > -- > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1130 St Nicholas Avenue > New York, NY 10032 > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From edschofield at gmail.com Mon Mar 5 12:42:56 2007 From: edschofield at gmail.com (Ed Schofield) Date: Mon, 5 Mar 2007 17:42:56 +0000 Subject: [Biopython-dev] Offer to convert BioPython to NumPy In-Reply-To: <45E89B41.1090508@c2b2.columbia.edu> References: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> <45E868AA.60409@c2b2.columbia.edu> <1b5a37350703021124n6e7b759cg6d0ee69e3cd3a68e@mail.gmail.com> <45E89B41.1090508@c2b2.columbia.edu> Message-ID: <1b5a37350703050942x27283412u3067c1284db0b0e0@mail.gmail.com> On 3/2/07, Michiel Jan Laurens de Hoon wrote: > > > If this is really necessary, the easiest way to proceed may be to use > > NumPy's "oldnumeric" interface, rather than porting properly. > > For the reasons above, this is really necessary. But will oldnumeric be > around in the future? Or is it a temporary measure to making porting > easier? I think Travis wants to keep the oldnumeric interface around for at least a few years -- long enough, I imagine, for most actively developed projects to have been ported to NumPy. I've started work on a simple wrapper layer for Biopython to use either Numeric or numpy.oldnumeric. I'll post more details soon. 
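A wrapper layer of the kind described above could start from a "first importable module wins" helper. The following is only a sketch and not the code Ed was working on: the function name import_first_available is hypothetical, and the module names in the usage note are simply the two packages named in this thread.

```python
import importlib


def import_first_available(candidates):
    """Return (name, module) for the first importable name in candidates.

    Hypothetical helper for illustration; not part of Biopython.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # This backend is not installed; try the next candidate.
            continue
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(candidates))
```

In the setting of this thread the call would be import_first_available(["Numeric", "numpy.oldnumeric"]), with the rest of the code using the returned module instead of importing either package directly.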
From mdehoon at c2b2.columbia.edu Mon Mar 5 13:17:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Mon, 05 Mar 2007 13:17:28 -0500
Subject: [Biopython-dev] Offer to convert BioPython to NumPy
In-Reply-To: <1b5a37350703050942x27283412u3067c1284db0b0e0@mail.gmail.com>
References: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> <45E868AA.60409@c2b2.columbia.edu> <1b5a37350703021124n6e7b759cg6d0ee69e3cd3a68e@mail.gmail.com> <45E89B41.1090508@c2b2.columbia.edu> <1b5a37350703050942x27283412u3067c1284db0b0e0@mail.gmail.com>
Message-ID: <45EC5EB8.2050708@c2b2.columbia.edu>

Ed Schofield wrote:
> I've started work on a simple wrapper layer for Biopython to use either
> Numeric or numpy.oldnumeric. I'll post more details soon.

Thanks Ed!

I looked at the definitions in oldnumeric.h. It turned out that only two of them are actually used in Biopython:

    #define CONTIGUOUS NPY_CONTIGUOUS

and

    #undef import_array
    #define import_array() { if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); } }

So it appears that the compatibility problem of Biopython and numpy may not be as big as it seemed at first, at least as far as the C-code is concerned.

About the import_array definition: Do you know why it appears in oldnumeric.h? As the exact same definition appears in numpy/core/code_generators/generate_array_api.py, I would think that there is no need for it in oldnumeric.h.

--Michiel.
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Tue Mar 6 06:18:09 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 06 Mar 2007 11:18:09 +0000 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45E237A6.4040801@c2b2.columbia.edu> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> Message-ID: <45ED4DF1.90609@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> SequenceIterator(handle, format) >> SequencesToDict(sequences, key_function=None) >> SequencesToAlignment(sequences, ...) >> WriteSequences(sequences, handle, format) >> >> Does anyone want to suggest different names for these functions? Only Michiel has replied, so I assume there are no other strong views on the dev mailing list. Michiel de Hoon wrote: > I would prefer >>>> from Bio import SeqIO >>>> SeqIO.read(handle, format) >>>> SeqIO.write(sequences, handle, format) You have persuaded me to rename SequenceIterator and WriteSequences. BioPython uses the term "parser" all over the place. Doing a quick search of the code, I found ten files with "def read(" and fifty eight with "def parse(" - so I would rather have "parse" than "read". That would give us the core functions: Bio.SeqIO.parse(handle, format) Bio.SeqIO.write(sequences, handle, format) or, Bio.SeqIO.read(handle, format) Bio.SeqIO.write(sequences, handle, format) I'll let you [Michiel] make the call. Say the word and I'll update the code and the wiki today. Are you happy with Bio.SeqIO.SequencesToDict(...) name? I think we should keep Bio.SeqIO.SequencesToAlignment(...) for the time being, until we do some work on the Bio.Align class. I don't think we should tackle this before the next release. I'm happy to document this particular function as "experimental/beta" and liable to be removed or replaced in future. 
After the renaming, I would say the Bio.SeqIO code is OK for release.

After BioPython 1.43 is out, I would like to mark the old code in Bio/SeqIO/FASTA.py and Bio/SeqIO/generic.py as deprecated.

Peter

From lpritc at scri.ac.uk Tue Mar 6 06:45:01 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 06 Mar 2007 11:45:01 +0000
Subject: [Biopython-dev] Bio.SeqIO
In-Reply-To: <45ED4DF1.90609@maubp.freeserve.co.uk>
References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk>
Message-ID: <1173181501.19889.79.camel@lplinuxdev.scri.sari.ac.uk>

On Tue, 2007-03-06 at 11:18 +0000, Peter wrote:
> That would give us the core functions:
>
> Bio.SeqIO.parse(handle, format)
> Bio.SeqIO.write(sequences, handle, format)
>
> or,
>
> Bio.SeqIO.read(handle, format)
> Bio.SeqIO.write(sequences, handle, format)
>
> I'll let you [Michiel] make the call. Say the word and I'll update the
> code and the wiki today.

+1 for Bio.SeqIO.parse(handle, format) - I too think it's more consistent with the existing parser behaviours.

L.

--
Dr Leighton Pritchard AMRSC
D131, Plant Pathology, Scottish Crop Research Institute
W: http://bioinf.scri.ac.uk/lp
E: lpritc at scri.ac.uk
GPG: 0xE58BA41B

From mdehoon at c2b2.columbia.edu Tue Mar 6 13:40:30 2007
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Tue, 06 Mar 2007 13:40:30 -0500
Subject: [Biopython-dev] Bio.SeqIO
In-Reply-To: <45ED4DF1.90609@maubp.freeserve.co.uk>
References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk>
Message-ID: <45EDB59E.2000407@c2b2.columbia.edu>

Peter wrote:
> That would give us the core functions:
>
> Bio.SeqIO.parse(handle, format)
> Bio.SeqIO.write(sequences, handle, format)

That sounds good to me.

> I'll let you [Michiel] make the call. Say the word and I'll update the
> code and the wiki today.

Just to avoid any misunderstanding: While I have been in charge of building the Biopython releases, unfortunately that doesn't come with any official decision-making power :-(.

> Are you happy with Bio.SeqIO.SequencesToDict(...) name?

Well I think that this function is not so essential as Bio.SeqIO.parse and Bio.SeqIO.write. So I'll let you decide.

> I think we should keep Bio.SeqIO.SequencesToAlignment(...) for the time
> being, until we do some work on the Bio.Align class. I don't think we
> should tackle this before the next release. I'm happy to document this
> particular function as "experimental/beta" and liable to be removed or
> replaced in future.

OK.

> After the renaming, I would say the Bio.SeqIO code is OK for release.
OK then I'll try for the Bronx-release (1.43) for sometime during next week. If we find some issues with the new code after this release, we can do another release (code-named Queens) shortly after. I'll get started on updating the documentation for the new Bio.Blast parsers.

> After BioPython 1.43 is out, I would like to mark the old code in
> Bio/SeqIO/FASTA.py and Bio/SeqIO/generic.py as deprecated.

As far as I'm concerned, you can also deprecate them before this release. This will encourage people to start using Bio.SeqIO, and improve our chances of finding any remaining problems.

Thanks for all your work on Bio.SeqIO, and for involving the Biopython community in its development. I think Bio.SeqIO is a major improvement for Biopython.

--Michiel.

--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From biopython-dev at maubp.freeserve.co.uk Tue Mar 6 17:31:44 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 06 Mar 2007 22:31:44 +0000
Subject: [Biopython-dev] Bio.SeqIO
In-Reply-To: <45EDB59E.2000407@c2b2.columbia.edu>
References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu>
Message-ID: <45EDEBD0.1060000@maubp.freeserve.co.uk>

Michiel Jan Laurens de Hoon wrote:
> Peter wrote:
>> That would give us the core functions:
>>
>> Bio.SeqIO.parse(handle, format)
>> Bio.SeqIO.write(sequences, handle, format)
>
> That sounds good to me.

Done. I have also updated the wiki: http://www.biopython.org/wiki/SeqIO

>> Are you happy with Bio.SeqIO.SequencesToDict(...) name?
>
> Well I think that this function is not so essential as Bio.SeqIO.parse
> and Bio.SeqIO.write. So I'll let you decide.
>
>> I think we should keep Bio.SeqIO.SequencesToAlignment(...) for the time
>> being, until we do some work on the Bio.Align class.
>> I don't think we should tackle this before the next release. I'm happy
>> to document this particular function as "experimental/beta" and liable
>> to be removed or replaced in future.
>
> OK.

I was thinking tonight, after updating CVS, that perhaps we should try and find some shorter (lower case) names for "SequencesToDict" and "SequencesToAlignment"... something like "toDict" and "toAlignment", or "as_dict" and "as_alignment" might look nicer. e.g.

    from Bio import SeqIO
    my_dict = SeqIO.toDict(SeqIO.parse(handle, format))

rather than this, which looks clumsy and inconsistent:

    from Bio import SeqIO
    my_dict = SeqIO.SequencesToDict(SeqIO.parse(handle, format))

>> After the renaming, I would say the Bio.SeqIO code is OK for release.
>
> OK then I'll try for the Bronx-release (1.43) for sometime during next
> week. If we find some issues with the new code after this release, we
> can do another release (code-named Queens) shortly after.

I have started looking over the other existing sequence parsers in BioPython with a view to adding some of them into the SeqIO framework (after the Bronx 1.43 release):

http://www.biopython.org/wiki/SeqIO_dev

Note to self (or anyone bored), I should actually write something about the SeqRecord class:

http://www.biopython.org/wiki/SeqRecord

>> After BioPython 1.43 is out, I would like to mark the old code in
>> Bio/SeqIO/FASTA.py and Bio/SeqIO/generic.py as deprecated.
>
> As far as I'm concerned, you can also deprecate them before this
> release. This will encourage people to start using Bio.SeqIO, and
> improve our chances of finding any remaining problems.

True - but I will be away for a bit (end of March, early April) so I wouldn't like to encourage too many people, and then not be here to help them.

Maybe I should try and draft something for the release notes, along the lines of "Beta software - please try it and give us feedback".
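For readers following the naming discussion, here is a minimal sketch of what a dictionary helper along these lines might do. It is an illustration under stated assumptions rather than the code in CVS: Record is a stand-in for SeqRecord, the name to_dict is just one of the candidates discussed on the list, the key_function argument follows the SequencesToDict(sequences, key_function=None) signature quoted earlier in the thread, and raising ValueError on a duplicate key is an assumption.

```python
from collections import namedtuple

# Stand-in for SeqRecord (assumption): only the fields this sketch needs.
Record = namedtuple("Record", ["id", "seq"])


def to_dict(sequences, key_function=None):
    """Build {key: record} from an iterable of records.

    Keys default to record.id. A repeated key raises ValueError
    (an assumption made for this sketch).
    """
    result = {}
    for record in sequences:
        key = record.id if key_function is None else key_function(record)
        if key in result:
            raise ValueError("Duplicate key %r" % (key,))
        result[key] = record
    return result


records = [Record("alpha", "ACGT"), Record("beta", "GGCC")]
by_id = to_dict(records)
by_seq = to_dict(records, key_function=lambda record: record.seq)
```

Because the helper only iterates over its argument, it should work equally well on a list or on an iterator such as the one returned by a parser.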
Peter

From mdehoon at c2b2.columbia.edu Tue Mar 6 22:04:52 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Tue, 06 Mar 2007 22:04:52 -0500
Subject: [Biopython-dev] Bio.SeqIO
In-Reply-To: <45EDEBD0.1060000@maubp.freeserve.co.uk>
References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk>
Message-ID: <45EE2BD4.7090400@c2b2.columbia.edu>

Peter wrote:
> I was thinking tonight, after updating CVS, that perhaps we should try
> and find some shorter (lower case) names for "SequencesToDict" and
> "SequencesToAlignment"... something like "toDict" and "toAlignment", or
> "as_dict" and "as_alignment" might look nicer. e.g.
>
> from Bio import SeqIO
> my_dict = SeqIO.toDict(SeqIO.parse(handle, format))
...

There may be a simple solution to this. Note that a dictionary can be created by specifying a list of [key, value] pairs:

>>> dict([['a','A'],['b','B'],['c','C']])
{'a': 'A', 'c': 'C', 'b': 'B'}

This also works with an iterator:

>>> def f(text):
        for character in text:
            yield [character, character.upper()]

>>> dict(f("abcd"))
{'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'}

Now, if we let SeqRecord inherit from list, we can make it behave as a [record.id, record] list. Normally, this would not be visible to the user, in the sense that a user who doesn't know that SeqRecord inherits from list wouldn't notice that it does.

The upshot is that we can now create a dictionary like this:

>>> d = dict(SeqIO.parse(handle, format))

without any changes to Bio.SeqIO.

Two things get lost here:
1) We can't have a key_function to change how to choose the key.
2) We're no longer checking if all keys are different. This can be fixed by saving the keys in the parser function and raising an exception if two identical keys are found.
This implies though that the same exception is raised in all use cases of SeqIO.parse, which may not be what we want. --Michiel From chris.lasher at gmail.com Tue Mar 6 23:00:01 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Tue, 6 Mar 2007 23:00:01 -0500 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45ED4DF1.90609@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> Message-ID: <128a885f0703062000m20370027pc63e7fd858ae12e2@mail.gmail.com> On 3/6/07, Peter wrote: > Only Michiel has replied, so I assume there are no other strong views on > the dev mailing list. If you're still soliciting opinions, here's mine: I am really fond of the SeqIO.parse function, rather than SeqIO.read, since read is a builtin function. Different namespaces, but parse is unambiguous. For remaining function/method names, I would really prefer to stick to the PEP 8 style guide, which specifies: > Function Names > > Function names should be lowercase, with words separated by underscores > as necessary to improve readability. > > mixedCase is allowed only in contexts where that's already the > prevailing style (e.g. threading.py), to retain backwards compatibility. e.g., to_dict() instead of toDict(). Thanks again for the work on SeqIO! 
Chris

From biopython-dev at maubp.freeserve.co.uk Wed Mar 7 05:35:05 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 07 Mar 2007 10:35:05 +0000
Subject: [Biopython-dev] Bio.SeqIO
In-Reply-To: <128a885f0703062000m20370027pc63e7fd858ae12e2@mail.gmail.com>
References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <128a885f0703062000m20370027pc63e7fd858ae12e2@mail.gmail.com>
Message-ID: <45EE9559.2020500@maubp.freeserve.co.uk>

Chris Lasher wrote:
> On 3/6/07, Peter wrote:
>> Only Michiel has replied, so I assume there are no other strong views on
>> the dev mailing list.
>
> If you're still soliciting opinions, here's mine: I am really fond of
> the SeqIO.parse function, rather than SeqIO.read, since read is a
> builtin function. Different namespaces, but parse is unambiguous.

We have gone for Bio.SeqIO.parse and Bio.SeqIO.write.

Is read really a built-in function? It's not on this list:
http://docs.python.org/lib/built-in-funcs.html

> For remaining function/method names, I would really prefer to stick to
> the PEP 8 style guide, which specifies:
>
>> Function Names
>>
>> Function names should be lowercase, with words separated by underscores
>> as necessary to improve readability.
>>
>> mixedCase is allowed only in contexts where that's already the
>> prevailing style (e.g. threading.py), to retain backwards compatibility.
>
> e.g., to_dict() instead of toDict().

Any views on to_dict versus as_dict, to_alignment versus as_alignment?

As an aside, we should really have a "coding styles" page on the wiki somewhere, and by default I would also reference PEP 8:
http://www.python.org/dev/peps/pep-0008/

(And I should probably go through the new SeqIO code and make sure it complies!)
Peter From biopython-dev at maubp.freeserve.co.uk Wed Mar 7 05:43:36 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 07 Mar 2007 10:43:36 +0000 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EE2BD4.7090400@c2b2.columbia.edu> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> <45EE2BD4.7090400@c2b2.columbia.edu> Message-ID: <45EE9758.1050902@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Note that a dictionary can be created by specifying a list of [key, > value] pairs: > > >>> dict([['a','A'],['b','B'],['c','C']]) > {'a': 'A', 'c': 'C', 'b': 'B'} > > This also works with an iterator: > >>> def f(text): > for character in text: > yield [character, character.upper()] > >>> dict(f("abcd")) > {'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'} > > Now, if we let SeqRecord inherit from list, we can make it behave as a > [record.id, record] list. Normally, this would not be visible to the > user, in the sense that a user who doesn't know that SeqRecord inherits > from list wouldn't notice that it does. > > The upshot is that we can now create a dictionary like this: > >>> d = dict(SeqIO.parse(handle, format)) > without any changes to Bio.SeqIO. That is clever... > Two things get lost here: > 1) We can't have a key_function to change how to choose the key. > 2) We're no longer checking if all keys are different. This can be fixed > by saving the keys in the parser function and raising an exception if > two identical keys are found. This implies though that the same > exception is raised in all use cases of SeqIO.parse, which may not be > what we want. Sadly not ideal. Also, wouldn't this prevent us making a SeqRecord inherit from Seq (another interesting idea you proposed in the past)? And for Seq objects, they could behave a little more like a string, or a list of letters. 
It might be nice to be able to splice a SeqRecord and get a new SeqRecord with the appropriate sub-sequence... I have been thinking about a "RichSeqRecord" subclass of SeqRecord which would support sequence level annotation (e.g. secondary structure). In this situation, when requesting a sub record, the appropriate sub set of the secondary structure information should also be extracted. e.g. The pfam/stockholm alignment format can hold strings the same length as the sequences which contain "per sequence per character" information like secondary structure. We could also load a PDB file in this way, and provide a list of residue objects (including the atom coordinates) in parallel with the sequence. Peter From mdehoon at c2b2.columbia.edu Wed Mar 7 15:50:59 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 07 Mar 2007 15:50:59 -0500 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EE9758.1050902@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> <45EE2BD4.7090400@c2b2.columbia.edu> <45EE9758.1050902@maubp.freeserve.co.uk> Message-ID: <45EF25B3.1050206@c2b2.columbia.edu> Peter wrote: >> The upshot is that we can now create a dictionary like this: >> >>> d = dict(SeqIO.parse(handle, format)) >> without any changes to Bio.SeqIO. > > That is clever... > >> Two things get lost here: >> 1) We can't have a key_function to change how to choose the key. >> 2) We're no longer checking if all keys are different. This can be >> fixed by saving the keys in the parser function and raising an >> exception if two identical keys are found. This implies though that >> the same exception is raised in all use cases of SeqIO.parse, which >> may not be what we want. > > Sadly not ideal. 
About 2): It may be a good idea to add a keyword allow_identical_keys (probably a better name is needed here), False by default, in SeqIO.parse to specify if SeqIO.parse should raise an exception if two records with an identical record.id are found. Whereas this is more of a problem when creating a dictionary, I think that this is also relevant in general. Note though that if SeqIO.parse checks for identical keys automatically, there is not much left to do for SeqIO.to_dict. Btw, a to_dict function may fit in better with Bio.SeqRecord, as it is not specifically related to sequence file IO. > Also, wouldn't this prevent us making a SeqRecord > inherit from Seq (another interesting idea you proposed in the past)? Not necessarily; there are two ways to avoid this: A) SeqRecord could inherit both from list and from Seq; B) Instead of letting SeqRecord inherit from list, we could add a next() and __iter__ method to the SeqRecord class (returning record.id and record, and then StopIteration); this will also let us create a dictionary with dict(SeqIO.parse(handle, format)). --Michiel. 
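Option B can be tried out without touching Biopython. The sketch below is only an illustration: Record is a hypothetical stand-in for SeqRecord, and a generator-based __iter__ is used in place of a hand-written next() method. It relies on the fact that dict() accepts any iterable whose items are themselves iterables of exactly two objects.

```python
class Record(object):
    """Hypothetical stand-in for SeqRecord, used only for this sketch."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        # Yield exactly two items, so that dict() reads each record
        # as a (key, value) pair: the record id, then the record itself.
        yield self.id
        yield self


records = [Record("alpha", "ACGT"), Record("beta", "GGCC")]
by_id = dict(records)
```

Note that iterating over a single record now produces the id and the record itself, so __iter__ is no longer available for any other purpose.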
--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From biopython-dev at maubp.freeserve.co.uk Wed Mar 7 18:16:48 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 07 Mar 2007 23:16:48 +0000
Subject: [Biopython-dev] Bio.SeqIO
In-Reply-To: <45EF25B3.1050206@c2b2.columbia.edu>
References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> <45EE2BD4.7090400@c2b2.columbia.edu> <45EE9758.1050902@maubp.freeserve.co.uk> <45EF25B3.1050206@c2b2.columbia.edu>
Message-ID: <45EF47E0.9070009@maubp.freeserve.co.uk>

I have renamed SequencesToDict and SequencesToAlignment as to_dict and to_alignment, which as Chris Lasher pointed out follows the PEP 8 Python style guide.

While there may be better places for these two functions to live, leaving them in SeqIO seems reasonable to me. Still - if we do want to move them (or remove them) in the near future it would be better to do this before releasing BioPython 1.43.

Other than that, I think Bio.SeqIO is "ready" for its first release.

Michiel Jan Laurens de Hoon wrote:
> It may be a good idea to add a keyword allow_identical_keys (probably a
> better name is needed here), False by default, in SeqIO.parse to specify
> if SeqIO.parse should raise an exception if two records with an
> identical record.id are found. Whereas this is more of a problem when
> creating a dictionary, I think that this is also relevant in general.

I'm not very keen on this "allow_identical_keys" option for SeqIO.parse().

However, I think we could do that in the SeqIO.parse function itself (rather than repeating the code many times for each underlying parser). One catch is that the exception would get raised once a duplicate is found - possibly after the user has already processed the first half of the file.
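The catch mentioned above can be seen in a small sketch of a duplicate check applied once, around whatever iterator the underlying parser returns. Everything here is hypothetical: check_unique_ids is not a Biopython function, Record stands in for SeqRecord, and ValueError is an assumed choice of exception.

```python
from collections import namedtuple

# Stand-in for SeqRecord (assumption): only the fields this sketch needs.
Record = namedtuple("Record", ["id", "seq"])


def check_unique_ids(records):
    """Yield records unchanged, raising ValueError on a repeated id."""
    seen = set()
    for record in records:
        if record.id in seen:
            raise ValueError("Duplicate record id %r" % (record.id,))
        seen.add(record.id)
        yield record


records = [Record("a", "ACGT"), Record("b", "GGCC"), Record("a", "TTTT")]
processed = []
try:
    for record in check_unique_ids(records):
        processed.append(record.id)
except ValueError as error:
    message = str(error)
```

The first two records are handed to the caller before the third triggers the exception, which is exactly the "first half of the file" situation: the check is lazy, so earlier work has already been done by the time the duplicate is seen.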
>> Also, wouldn't this prevent us making a SeqRecord >> inherit from Seq (another interesting idea you proposed in the past)? > > Not necessarily; there are two ways to avoid this: > A) SeqRecord could inherit both from list and from Seq; > B) Instead of letting SeqRecord inherit from list, we could add a next() > and __iter__ method to the SeqRecord class (returning record.id and > record, and then StopIteration); this will also let us create a > dictionary with dict(SeqIO.parse(handle, format)). I think I didn't make myself clear. I wanted to reserve the __iter__ method to the SeqRecord class for use like this: for residue in record : #assuming residue this is also a SeqRecord object print residue.seq.tostring() and similarly for __iter__ of a Seq class: for residue in seq : #assuming residue is also a Seq object, print residue.tostring() To me this syntax seems very natural, but does seem to block your clever dict() plan. Peter From mdehoon at c2b2.columbia.edu Thu Mar 8 11:17:58 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 08 Mar 2007 11:17:58 -0500 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EF47E0.9070009@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> <45EE2BD4.7090400@c2b2.columbia.edu> <45EE9758.1050902@maubp.freeserve.co.uk> <45EF25B3.1050206@c2b2.columbia.edu> <45EF47E0.9070009@maubp.freeserve.co.uk> Message-ID: <45F03736.7080300@c2b2.columbia.edu> Peter wrote: > I have renamed SequenceToDict and SequencesToAlignment as to_dict and > to_alignment, which as Chris Lasher pointed out follows the PEP8 python > style guide. OK, fair enough. --Michiel. 
--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From mdehoon at c2b2.columbia.edu Thu Mar 8 11:47:32 2007
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Thu, 08 Mar 2007 11:47:32 -0500
Subject: [Biopython-dev] Biopython release coming up
Message-ID: <45F03E24.4080407@c2b2.columbia.edu>

Hi everybody,

The Biopython release (1.43, code-named "Bronx") is coming up. This release will include the new Bio.SeqIO code as well as the new Blast parser. I'm planning to finish the release during the weekend of March 17/18, so about ten days from now.

Some files have been added or removed from Biopython recently, so it may be useful to check out a fresh copy of Biopython from CVS.

The Biopython tests in CVS all pass, so things are looking good. However, Bugzilla currently lists 17 bugs, so please have a look to see if there's something we can do about them.

If you have some code sitting around, now would be a good time to commit it to CVS. However, if you are not sure if your code is ready for prime time, please hold off until after this release.

Thanks everybody for your contributions to Biopython.

--Michiel.

--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From bugzilla-daemon at portal.open-bio.org Thu Mar 8 13:29:25 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 Mar 2007 13:29:25 -0500
Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database
In-Reply-To:
Message-ID: <200703081829.l28ITP45001122@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1816

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-08 13:29 EST -------
[I have tried to reassign this to the mailing list...]

Could someone familiar with BioSQL take a look at this please?
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 8 13:45:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Mar 2007 13:45:47 -0500 Subject: [Biopython-dev] [Bug 2225] New: Do something with the PROJECT line in GenBank files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2225 Summary: Do something with the PROJECT line in GenBank files Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk See also bug 1946 where the introduction of this line broke the parser. At the moment the project line is currently ignored. Quoting: ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt ------------------------------------------------- 1.4 Upcoming Changes 1.4.1 Multiple identifiers for the PROJECT line The recently-introduced PROJECT linetype (see Section 3.4.7.2) provides a way to link GenBank sequences that are part of a sequencing project to the Entrez Genome Project database, where further details about the project can be found. As of June 2007, multiple identifiers will be valid for the PROJECT line. Here is a mocked-up example of the expected usage: LOCUS AANA01000001 2 rc DNA linear BCT 09-FEB-2007 DEFINITION Polaribacter dokdonensis MED152 whole genome shotgun sequencing project. ACCESSION AANA01000001 VERSION AANA01000001.1 GI:85822094 PROJECT GenomeProject:13543 GenomeProject:99999 There are several situations in which a record could be considered part of two different Genome Projects. 
For example, consider an environmental-sampling metagenomic WGS project for which the individual sequence-overlap contigs are not attributed to a specific organism. A Genome Project could exist that provides further details about the sequencing effort, the centers involved, etc. If, in subsequent assembly and annotation phases, scaffold/super-contig/ chromosomal records are created which **are** attributed to a specific organism, then those CON-division records could have two Genome Project IDs: one for the WGS sequencing project as a whole; and a second for organism- specific Genome Projects. Additional examples illustrating the use of multiple Genome Project IDs will be provided in future release notes, and via the GenBank listserv. ------------------------------------------------- End quote For the RecordParser, storing this line as a string should be fine (?) However, for the FeatureParser, which turns the data into a SeqRecord, perhaps this data should be held in the annotation as a list of strings: ['GenomeProject:13543', 'GenomeProject:99999'] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
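A minimal sketch of the "list of strings" idea from Bug 2225 above, written in plain Python with no Biopython imports. The function name parse_project_line is hypothetical and is not part of Bio.GenBank; it only assumes the PROJECT line layout shown in the quoted gbrel.txt example (keyword first, then whitespace-separated identifiers).

```python
def parse_project_line(line):
    """Split a GenBank PROJECT line into a list of identifier strings.

    Hypothetical helper for illustration only; not an existing
    Bio.GenBank function.
    """
    # Separate the line-type keyword from the rest of the line.
    keyword, _, rest = line.strip().partition(" ")
    if keyword != "PROJECT":
        raise ValueError("Not a PROJECT line: %r" % line)
    # Identifiers are separated by runs of whitespace.
    return rest.split()


example = "PROJECT     GenomeProject:13543  GenomeProject:99999"
print(parse_project_line(example))
```

With one identifier on the line this still returns a list (of length one), so code reading the annotation would not need to special-case records from before June 2007.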
From bugzilla-daemon at portal.open-bio.org Thu Mar 8 13:46:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Mar 2007 13:46:30 -0500 Subject: [Biopython-dev] [Bug 1946] Parsing GenBank Files - unknown line type PROJECT In-Reply-To: Message-ID: <200703081846.l28IkUYF002586@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1946 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-08 13:46 EST ------- You can download the example here, which the reporter saved as 'bug.gb': http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi??db=nucleotide&val=NC_001416 Using CVS, the original sample script now runs fine: from Bio import GenBank feature_parser = GenBank.FeatureParser() gb_record = feature_parser.parse(open('bug.gb','r')) As does something similar using Bio.SeqIO, e.g. from Bio import SeqIO records = list(SeqIO.parse(open('bug.gb','rU'),"genbank")) assert len(records) == 1 gb_record = records[0] In both cases, the new GenBankScanner class in Bio/GenBank/Scanner.py will silently ignore the "PROJECT" line, unless run in debug mode. I have filed Bug 2225 to cover doing something useful with the project data. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Mar 8 13:51:10 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Mar 2007 13:51:10 -0500 Subject: [Biopython-dev] [Bug 1999] new frame translation method In-Reply-To: Message-ID: <200703081851.l28IpApE003050@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1999 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-08 13:51 EST ------- Created an attachment (id=583) --> (http://bugzilla.open-bio.org/attachment.cgi?id=583&action=view) Marc's frameTranslations.py rescued from the mailing list 15 May 2006 Quote: > Man this bugzilla doesn't have an option to up load files. > The file can be found on the dev-list. Actually it does, but oddly you can only attach files once the bug has been created. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Thu Mar 8 14:13:20 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 08 Mar 2007 19:13:20 +0000 Subject: [Biopython-dev] Administration of Bugzilla Message-ID: <45F06050.6050404@maubp.freeserve.co.uk> Do any of our regular readers have administrator access to BugZilla? The "version" field should really be updated to include at least 1.42 (the current release) and 1.43 (due soon). At the moment we have the somewhat dated choices of 1.00a4, 1.10, 1.24 and "Not applicable". While we are at it, I would also like to extend the "Components" list. Adding entries for the PDB, Nexus, and SeqIO (or maybe "Sequence Files" in general) might be nice. 
Peter From edschofield at gmail.com Fri Mar 9 13:35:51 2007 From: edschofield at gmail.com (Ed Schofield) Date: Fri, 9 Mar 2007 18:35:51 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython Message-ID: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> On 3/5/07, Ed Schofield wrote: > > > I've started work on a simple wrapper layer for Biopython to use either > Numeric or numpy.oldnumeric. I'll post more details soon. > I've now finished the first version of a patch to add support for NumPy in addition to Numeric. I'll try to attach it to this message; you can also get it from http://edschofield.com/biopython-numpy-support.patch. It applies cleanly to the current CVS version using: patch -p0 < biopython-numpy-support.patch in the root biopython/ directory. The main difficulty I had to overcome was with C extensions, particularly the Cluster module. This is because NumPy defines array dimensions and strides as intp types, whereas Numeric defines them as int, which differs on 64-bit platforms. Some tests fail; these failures all result from overly fragile expectations on the formatting of the output. The tests should be updated, but I haven't done this with this patch. All other tests pass with both NumPy and Numeric on my machine. MarkovModel.py had a bug in its setting of p_initial, p_transition, and p_emission; it made an incorrect assumption about the behaviour of the Python logical "or" operation when applied to Numeric arrays, which is somewhat broken. I've tried to fix it, but someone familiar with MarkovModel.py should look over the relevant lines (176-184) to be sure I haven't changed the intended behaviour. I'd like to continue contributing to BioPython. Whom should I contact about CVS write access? -- Ed -------------- next part -------------- A non-text attachment was scrubbed... 
Name: biopython-numpy-support.patch Type: text/x-patch Size: 48870 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biopython-dev/attachments/20070309/50658af8/attachment-0001.bin From mdehoon at c2b2.columbia.edu Fri Mar 9 13:51:55 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 09 Mar 2007 13:51:55 -0500 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> Message-ID: <45F1ACCB.1030704@c2b2.columbia.edu> Ed Schofield wrote: > I've now finished the first version of a patch to add support for NumPy in > addition to Numeric. I'll try to attach it to this message; you can also > get > it from http://edschofield.com/biopython-numpy-support.patch. ... Thanks, Ed. Quick question: The patch #includes numpy/oldnumeric.h for Python <--> C glue code that uses Numeric. Why is this needed? --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From edschofield at gmail.com Fri Mar 9 14:22:13 2007 From: edschofield at gmail.com (Ed Schofield) Date: Fri, 9 Mar 2007 19:22:13 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> Message-ID: <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> On 3/9/07, Michiel Jan Laurens de Hoon wrote: > > Ed Schofield wrote: > > I've now finished the first version of a patch to add support for NumPy > in > > addition to Numeric. I'll try to attach it to this message; you can also > > get > > it from http://edschofield.com/biopython-numpy-support.patch. ... > > Thanks, Ed. 
> > Quick question: > The patch #includes numpy/oldnumeric.h for Python <--> C glue code that > uses Numeric. Why is this needed? For the CONTIGUOUS and import_array() definitions right now. As you pointed out earlier in the thread, these are only used a couple of times. But including the header is simpler than pasting these two definitions into the C source files and should maximize compatibility should they be extended further in the future. I haven't yet got around to answering your question about import_array() ... Do you know why it appears in oldnumeric.h? As the exact same definition > appears in numpy/core/code_generators/generate_array_api.py, I would > think that there is no need for it in oldnumeric.h . > ... because I don't know how the code generation works within NumPy. But I don't think extension modules will ever use NumPy's internal code generators; they just need the headers. -- Ed From mdehoon at c2b2.columbia.edu Fri Mar 9 15:22:48 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 09 Mar 2007 15:22:48 -0500 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> Message-ID: <45F1C218.6000601@c2b2.columbia.edu> Ed Schofield wrote: >> Quick question: >> The patch #includes numpy/oldnumeric.h for Python <--> C glue code that >> uses Numeric. Why is this needed? > > For the CONTIGUOUS and import_array() definitions right now. For the CONTIGUOUS definition, there's a simpler solution. CONTIGUOUS is currently only used in Bio/Cluster/clustermodule.h, and it's only used as follows: PyArrayObject* array; if (array->flags & CONTIGUOUS)... 
Now, there is a macro in arrayobject.h, both in Numeric and in NumPy, that deals exactly with this situation: In Numeric: #define PyArray_ISCONTIGUOUS(m) ((m)->flags & CONTIGUOUS) In numpy (imported via ndarrayobject.h): #define PyArray_ISCONTIGUOUS(m) PyArray_CHKFLAGS(m, NPY_CONTIGUOUS) So, if we use this macro instead of CONTIGUOUS directly, we can avoid using oldnumeric.h. Or am I missing something? > I haven't yet got around to answering your question about import_array() ... > ... because I don't know how the code generation works within NumPy. Yeah I know, I don't think that code generation in NumPy was a good idea. It makes it too hard to figure out what is going on. > But I don't think extension modules will ever use NumPy's > internal code generators; they just need the headers. I think so too. NumPy itself actually calls import_array without #including oldnumeric.h. For example, see numpy/random/mtrand/mtrand.c. So we too should be fine without oldnumeric.h. But it might be good to check this with the NumPy folks. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Fri Mar 9 16:57:15 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 09 Mar 2007 16:57:15 -0500 Subject: [Biopython-dev] Documentation for new Blast parser Message-ID: <45F1D83B.20208@c2b2.columbia.edu> Hi everybody, For the upcoming Biopython release, I rewrote the chapter on Blast in the tutorial to describe our new Blast parser. For those of you who want to have a preview, I put a copy here: http://biopython.org/DIST/docs/tutorial/Tutorial-new.html Please let me know if you have any comments, or if you find any errors or omissions. --Michiel. 
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From edschofield at gmail.com Fri Mar 9 20:25:37 2007 From: edschofield at gmail.com (Ed Schofield) Date: Sat, 10 Mar 2007 01:25:37 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <45F1C218.6000601@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> Message-ID: <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> On 3/9/07, Michiel Jan Laurens de Hoon wrote: > > Ed Schofield wrote: > >> Quick question: > >> The patch #includes numpy/oldnumeric.h for Python <--> C glue code that > >> uses Numeric. Why is this needed? > > > > For the CONTIGUOUS and import_array() definitions right now. > > For the CONTIGUOUS definition, there's a simpler solution. > > [snip] > > So, if we use this macro instead of CONTIGUOUS directly, we can avoid > using oldnumeric.h. Or am I missing something? Yeah, sure, but why would we want to avoid using oldnumeric.h? Perhaps you're assuming this is the 'oldnumeric' compatibility layer that I mentioned earlier. The C header is actually only a very small part of it; the meat of it is in the numpy.oldnumeric module imported by the Python code, which we're inextricably bound to using as long as we preserve Numeric support. > I haven't yet got around to answering your question about import_array() > ... > > ... because I don't know how the code generation works within NumPy. > > Yeah I know, I don't think that code generation in NumPy was a good > idea. It makes it too hard to figure out what is going on. Well, that might be too harsh a judgment. 
Remember, the code generation is only for the internals -- I don't think it's something extension writers should need to know or worry about... -- Ed From mdehoon at c2b2.columbia.edu Fri Mar 9 22:25:50 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 09 Mar 2007 22:25:50 -0500 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> Message-ID: <45F2253E.2030909@c2b2.columbia.edu> Ed Schofield wrote: > On 3/9/07, Michiel Jan Laurens de Hoon wrote: >> So, if we use this macro instead of CONTIGUOUS directly, we can avoid >> using oldnumeric.h. Or am I missing something? > > Yeah, sure, but why would we want to avoid using oldnumeric.h? Why #include oldnumeric.h if we don't need it? The fewer changes we need to make to Biopython and the cleaner we can keep the code, the better. I see no justification for #including an unnecessary header file. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From chris.lasher at gmail.com Fri Mar 9 23:06:39 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 9 Mar 2007 23:06:39 -0500 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> Message-ID: <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> On 10/9/06, Chris Lasher wrote: > Anybody know if BioPython (I suppose all Open Bio projects) will > switch over to Subversion, and if so, when? 
I think the merits and > advantages of Subversion over CVS speak for themselves. It's certainly > become my revision control system of preference. Anybody else's? I'm raising this issue again. I did some digging and found an Open-Bio mailing list thread from 2006 that states: > We have a new machine just for everyone on this list. It is called > "dev.open-bio.org" and it will be the new home for developers using > CVS as well as the people who want to switch over to using Subversion. The full thread is available from . Considering that Subversion addresses the weaknesses of CVS, I'm surprised that every Open-Bio project still runs from CVS instead of Subversion. Someone's got to lead the way, why not have it be BioPython? We've had a lot of active development as of late, with SeqIO, and now with NumPy transitioning. Either of those cases would have been better tracked through Subversion, which keeps tallies of revisions on a repository-wide basis, rather than a file-by-file basis. I've also found that researchers new to revision control have had great success in picking up and using Subversion in our local Software Carpentry group. I have also created a screencast on using Subversion which demonstrates all the basic commands and activities. This screencast is available in AVI (MPEG4) and OGG formats at I hope the BioPython developers will consider a move to Subversion seriously. If there is support from the devs, but no interest on anyone's part to make it happen, given the proper people to contact, I will be happy to get this moving as a way of contributing back to the BioPython community. 
Best, Chris From chris.lasher at gmail.com Fri Mar 9 23:09:58 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 9 Mar 2007 23:09:58 -0500 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> Message-ID: <128a885f0703092009g4cb20a5ey668d5763613db35e@mail.gmail.com> On 3/9/07, Chris Lasher wrote: > I have also created a screencast on using Subversion > which demonstrates all the basic commands and activities. This > screencast is available in AVI (MPEG4) and OGG formats at > > I need to double check my links! The CORRECT links to the *SUBVERSION* screencasts are as follows: My apologies, Chris From mdehoon at c2b2.columbia.edu Fri Mar 9 23:36:07 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Fri, 09 Mar 2007 23:36:07 -0500 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> Message-ID: <45F235B7.6000409@c2b2.columbia.edu> Chris Lasher wrote: > I hope the BioPython developers will consider a move to Subversion > seriously. If there is support from the devs, but no interest on > anyone's part to make it happen, given the proper people to contact, I > will be happy to get this moving as a way of contributing back to the > BioPython community. I know very little about CVS and Subversion (which is why I didn't respond to your original post). But I did notice that a lot of software projects are using Subversion instead of CVS nowadays, including Python itself. So I don't have any objections to Biopython moving to Subversion as well (unfortunately, I cannot be of much help here either).
If we are moving to Subversion though, I'd like to ask you not to make any changes until the next Biopython release comes out, which will be in about one week from now. --Michiel. From edschofield at gmail.com Sat Mar 10 05:40:55 2007 From: edschofield at gmail.com (Ed Schofield) Date: Sat, 10 Mar 2007 10:40:55 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <45F2253E.2030909@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> Message-ID: <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> On 3/10/07, Michiel Jan Laurens de Hoon wrote: > > Ed Schofield wrote: > > On 3/9/07, Michiel Jan Laurens de Hoon > wrote: > >> So, if we use this macro instead of CONTIGUOUS directly, we can avoid > >> using oldnumeric.h. Or am I missing something? > > > > Yeah, sure, but why would we want to avoid using oldnumeric.h? > > Why #include oldnumeric.h if we don't need it? The fewer changes we need > to make to Biopython and the cleaner we can keep the code, the better. I > see no justification for #including an unnecessary header file. It's a minor issue, but I can see several reasons to use the header file NumPy provides for the purpose, rather than pasting its definitions into our own source files: - because, by isolating the NumPy definitions from the BioPython source files, it leads to code that's shorter overall and (IMHO) simpler - because we may have C extensions in the future that need other parts of the compatibility interface - because any future bugfixes or changes to NumPy's oldnumeric.h would then be picked up automatically Would we write helloworld.c like this? 
int printf(const char * __restrict, ...) __DARWIN_LDBL_COMPAT(printf); int main() { printf("Hello, world!\n"); } ;) I can understand if you're wanting to make an early start on the porting process to remove the dependence on the oldnumeric compatibility layer entirely. But in this case I don't think it's worth it; a full port to NumPy's native interfaces would break Numeric compatibility, which you're committed to keeping for some time yet. The oldnumeric interface won't be a hindrance for BioPython's users anyway -- with my patch they can either use Numeric or uninstall it entirely and instead pass native NumPy arrays between BioPython and other packages like SciPy, Matplotlib and PyTables. -- Ed From biopython-dev at maubp.freeserve.co.uk Sat Mar 10 07:24:33 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Mar 2007 12:24:33 +0000 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F1D83B.20208@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> Message-ID: <45F2A381.2080303@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Hi everybody, > > For the upcoming Biopython release, I rewrote the chapter on Blast in > the tutorial to describe our new Blast parser. For those of you who want > to have a preview, I put a copy here: > > http://biopython.org/DIST/docs/tutorial/Tutorial-new.html > > Please let me know if you have any comments, or if you find any errors > or omissions. Good work - I've made a few notes. -------------------------------------------------------------------- In section "3.1 Running BLAST locally" I would also stress the fact that this is your only choice if you are using "private" data, for example unpublished data from a company. e.g. add something like this after the "two advantages" of running BLAST locally: Another reason to run blast locally is if you are dealing with proprietary or unpublished sequence data. 
You may not be allowed to redistribute the sequences, so submitting them to the NCBI as a blast query would not be an option. -------------------------------------------------------------------- In section "3.1 Running BLAST locally" the wording about the location of the database files could be a little clearer. You wrote the following (which I have reformatted to use shorter lines): >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis" # I used formatdb to create a BLAST database named bsubtilis in the # directory /home/mdehoon/Data/Genomes/Databases. # The BLAST database consists of the files bsubtilis.nhr, bsubtilis.nin, # and bsubtilis.nsq in this directory. You talk about four files, but only name three of them. I also found the path to be a little unclear... I think you meant this: >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis" # I used formatdb to create a BLAST database named bsubtilis # (for Bacillus subtilis) consisting of the following four files: # /home/mdehoon/Data/Genomes/Databases/bsubtilis.nhr # /home/mdehoon/Data/Genomes/Databases/bsubtilis.nin # /home/mdehoon/Data/Genomes/Databases/bsubtilis.nsq # /home/mdehoon/Data/Genomes/Databases/bsubtilis.??? rather than the file being inside a subdirectory, bsubtilis, like this: >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis" # I used formatdb to create a BLAST database named bsubtilis # (for Bacillus subtilis) consisting of the following four files: # /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nhr # /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nin # /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nsq # /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.??? -------------------------------------------------------------------- I think you should include an explicit example of running standalone blast and getting XML files back, i.e. 
include this at the end of section 3.1 (rather than just mentioning it): >>> from Bio.Blast import NCBIStandalone >>> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, \ 'blastn', my_blast_db, my_blast_file, align_view=7) I am wondering if now is a good time to switch the default output format to XML in NCBIStandalone.blastall, NCBIStandalone.rpsblast etc given NCBIWWW.qblast already defaults to XML. ---------------------------------------------------------------------- There is an extra "the" at the end of the first paragraph of section "3.4 Parsing BLAST output": "..., it is also much easier to parse automatically, making the Biopython a whole lot more stable." Should read: "..., it is also much easier to parse automatically, making Biopython a whole lot more stable." Also should it be "Biopython" or "BioPython"? The website uses a mixture... ----------------------------------------------------------------------- This email is getting a bit long - I'll read the rest of the document later. Peter From biopython-dev at maubp.freeserve.co.uk Sat Mar 10 09:03:10 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Mar 2007 14:03:10 +0000 Subject: [Biopython-dev] SProt and new lines issue In-Reply-To: <45F1F7CC.1010404@maubp.freeserve.co.uk> References: <45F03E24.4080407@c2b2.columbia.edu> <45F1F7CC.1010404@maubp.freeserve.co.uk> Message-ID: <45F2BA9E.2020206@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Some files have been added or removed from Biopython recently, so it > may be useful to checkout a fresh copy of Biopython from CVS. The > Biopython tests in CVS all pass, so things are looking good. I just got a fresh copy of CVS on my Windows machine, and discovered that test_SeqIO fails on a SwissProt file (however test_SProt is OK), specifically: biopython/Tests/SwissProt/sp007 It turns out that test_SeqIO opens file in mode "rU" (reading, universal newline mode) while test_SProt opens files in normal read mode. 
For some reason the sp007 file causes trouble in universal newline mode. Making test_SProt also use "rU" shows the same error as seen in test_SeqIO. I can "fix" the (Windows only?) error by either opening the files in normal read mode, or by running unix2dos on the input file. Very odd. This also suggests the example file sp007 is not stored in CVS as a text file, but as a binary file. Peter P.S. I did some work on test_SProt to compare the results of its RecordParser() and SequenceParser() and to do a basic test of the Iterator() - we should add a multi-record example test case too. From bugzilla-daemon at portal.open-bio.org Sat Mar 10 15:32:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Mar 2007 15:32:15 -0500 Subject: [Biopython-dev] [Bug 2227] New: Writing Nexus files with Bio.SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2227 Summary: Writing Nexus files with Bio.SeqIO Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk I would like to be able to write multiple sequence alignments in Nexus format, using Bio.SeqIO (and possibly also Bio.Nexus). I have tried to do this by creating a nexus object in code (from an existing alignment) using the add_sequence() method, with the intention of then calling its write_nexus_data() method. However, it seems that add_sequence() is intended for use when the alignment matrix has already been created - not for building one from scratch. I will attach code for Bio.SeqIO to write a Nexus alignment WITHOUT using Bio.Nexus, which seemed easier. I would prefer to use Bio.Nexus to do this however...
[This issue can wait till after we release BioPython 1.43] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Mar 10 15:38:07 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Mar 2007 15:38:07 -0500 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200703102038.l2AKc7Lf014970@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-10 15:38 EST ------- I'm having trouble with bugzilla not accepting the attachment, which is a replacement for Bio/SeqIO/NexusIO.py This seems to work in my limited testing. However, the only "validation" I have done to date is checking that Bio.Nexus can read the alignments I create. Also, if the input records all have simple "generic" alphabets, the code cannot decide if they are protein, dna or rna - and raises a ValueError. It's not foolproof, but it might be better to look at the letters in the actual sequence at that point and "guess". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
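A rough sketch of the "look at the letters and guess" fallback mentioned in Comment #1 above, in plain Python. This is not the code in the attached NexusIO.py; the function name guess_datatype, the gap characters, and the letter sets are all assumptions made for illustration, and the heuristic is deliberately crude (a short protein made only of A, C, G and T would be called DNA).

```python
def guess_datatype(sequences):
    """Guess 'dna', 'rna' or 'protein' from the letters actually used.

    Hypothetical helper; a heuristic only, not an existing Bio.SeqIO
    or Bio.Nexus function.
    """
    letters = set()
    for seq in sequences:
        letters.update(seq.upper())
    # Ignore gap and missing-data characters (assumed set).
    letters -= set("-?.")
    if not letters:
        raise ValueError("No sequence letters to guess from")
    # Sequences using only A, C, G, N are reported as DNA (arbitrary choice).
    if letters <= set("ACGTN"):
        return "dna"
    if letters <= set("ACGUN"):
        return "rna"
    # Anything else, including a mix of T and U, is treated as protein.
    return "protein"


print(guess_datatype(["ACGT-A", "AC-TTA"]))
print(guess_datatype(["ACGUUA"]))
print(guess_datatype(["MKV-LA"]))
```

Raising ValueError only when there are no letters at all keeps the existing failure mode for empty input, while the generic-alphabet case gets a best-effort answer instead of an exception.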
From bugzilla-daemon at portal.open-bio.org Sat Mar 10 15:41:09 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Mar 2007 15:41:09 -0500 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200703102041.l2AKf9UE015188@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-10 15:41 EST ------- Created an attachment (id=584) --> (http://bugzilla.open-bio.org/attachment.cgi?id=584&action=view) replacement for Bio/SeqIO/NexusIO.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Sat Mar 10 19:42:01 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Sat, 10 Mar 2007 19:42:01 -0500 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F2A381.2080303@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> Message-ID: <45F35059.6000102@c2b2.columbia.edu> Thanks, Peter! These are all good points. I've updated the tutorial following your suggestions. Peter wrote: >> For the upcoming Biopython release, I rewrote the chapter on Blast in >> the tutorial to describe our new Blast parser. For those of you who >> want to have a preview, I put a copy here: >> >> http://biopython.org/DIST/docs/tutorial/Tutorial-new.html > > I am wondering if now is a good time to switch the default output > format to XML in NCBIStandalone.blastall, NCBIStandalone.rpsblast etc > given NCBIWWW.qblast already defaults to XML. I agree that these functions should return XML by default. Objections, anybody? If not, I'll make this change and update the tutorial accordingly. 
> -------------------------------------------------------------------- > > In section "3.1 Running BLAST locally" the wording about the location > of the database files could be a little clearer. > > You wrote the following (which I have reformatted to use shorter lines): > > >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis" > # I used formatdb to create a BLAST database named bsubtilis in the > # directory /home/mdehoon/Data/Genomes/Databases. > # The BLAST database consists of the files bsubtilis.nhr, bsubtilis.nin, > # and bsubtilis.nsq in this directory. > > You talk about four files, but only name three of them. There are only three files, so maybe my description is confusing and suggests that there should be four files. I've updated the Tutorial to show the full path of the three files. > Also should it be "Biopython" or "BioPython"? The website uses a > mixture... A long time ago it was decided that "Biopython" is the official name. Though BioPython and biopython also appear all over the place. Thanks again, Peter. --Michiel -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Sat Mar 10 20:08:56 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 11 Mar 2007 01:08:56 +0000 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F35059.6000102@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> Message-ID: <45F356A8.6050706@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Thanks, Peter! These are all good points. I've updated the tutorial > following your suggestions. I see you have started to mention Bio.SeqIO in the Blast documentation - is this a hint to get me to update section 2.4 "Parsing biological file formats"? 
I see you are editing the CVS file biopython/Doc/Tutorial.tex - I am happy working with LaTex so that shouldn't be a problem. Link to ViewCVS if anyone is interested: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython I presume you run some flavour of latex to html on it, and then upload the file to website somehow... ---------------------------------------------------------------------- Back to the BLAST tutorial, quoting Section 3.1 Running BLAST locally: > Running BLAST locally (as opposed to over the internet, see Section > 3.2) has two advantages: > > * Local BLAST may be faster than BLAST over the internet; > > * Local BLAST allows you to make your own database to search for > sequences against. > > * Dealing with proprietary or unpublished sequence data can be > another reason to run BLAST locally. You may not be allowed to > redistribute the sequences, so submitting them to the NCBI as a BLAST > query would not be an option. Minor style point: Using the bullet points makes it look like three advantages (when the introduction says two). I wouldn't use a bullet point for the proprietary/unpublished data paragraph. ---------------------------------------------------------------------- > Peter wrote: >> I am wondering if now is a good time to switch the default output >> format to XML in NCBIStandalone.blastall, NCBIStandalone.rpsblast >> etc given NCBIWWW.qblast already defaults to XML. Michiel wrote: > I agree that these functions should return XML by default. > Objections, anybody? If not, I'll make this change and update the > tutorial accordingly. It will catch a few people out, but it seems best to do it now at the same time as the related Blast XML changes. Do the HTML and Text parsers spot when they are fed XML input, and give a helpful error message? Should we also mention this change in the DEPRECATED file? 
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/DEPRECATED?cvsroot=biopython Peter From mdehoon at c2b2.columbia.edu Sat Mar 10 21:40:26 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Sat, 10 Mar 2007 21:40:26 -0500 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F356A8.6050706@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> Message-ID: <45F36C1A.6080005@c2b2.columbia.edu> Peter wrote: > I see you have started to mention Bio.SeqIO in the Blast documentation - > is this a hint to get me to update section 2.4 "Parsing biological file > formats"? If you could do that, then that would be great. If you don't find the time for it, I can also help you by grabbing what you have on your wiki page. Feel free to change section 2.4 as you want -- I don't think what's there is still very relevant to Biopython, and it probably scares people off. > I see you are editing the CVS file biopython/Doc/Tutorial.tex - I am > happy working with LaTex so that shouldn't be a problem. Note that it is not exactly LaTeX, but LaTeX with hevea added. Let me know if you run into problems with hevea. > > I presume you run some flavour of latex to html on it, and then upload > the file to website somehow... Hevea takes care of this; see the Makefile in biopython/Doc. > Back to the BLAST tutorial, quoting Section 3.1 Running BLAST locally: ... > Minor style point: Using the bullet points makes it look like three > advantages (when the introduction says two). I wouldn't use a bullet > point for the proprietary/unpublished data paragraph. Ha, you're right. I should work on my counting skills. It's fixed now. Thanks again, --Michiel. 
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Sun Mar 11 11:39:07 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 11 Mar 2007 15:39:07 +0000 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F36C1A.6080005@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> Message-ID: <45F4229B.9080401@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Peter wrote: >> I see you have started to mention Bio.SeqIO in the Blast documentation - >> is this a hint to get me to update section 2.4 "Parsing biological file >> formats"? > > If you could do that, then that would be great. If you don't find the > time for it, I can also help you by grabbing what you have on your wiki > page. Feel free to change section 2.4 as you want -- I don't think > what's there is still very relevant to Biopython, and it probably scares > people off. I have basically replaced that whole section in Tutorial.tex, and checked it looks fine using pdflatex and hevea on my linux machine. I have not updated the website - so getting feedback might be a little tricky. Can you tell me how to do that please Michiel?
Thanks Peter From bugzilla-daemon at portal.open-bio.org Sun Mar 11 13:44:58 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 13:44:58 -0400 Subject: [Biopython-dev] [Bug 2228] New: genbank parser should not print warnings to stdout Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2228 Summary: genbank parser should not print warnings to stdout Product: Biopython Version: 1.24 Platform: PC OS/Version: FreeBSD Status: NEW Severity: critical Priority: P1 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: markd at cse.ucsc.edu In general, it's not a good idea to print warnings from a library, as this is an undocumented and unexpected output from the library API. However printing warnings to stdout is a serious bug, meaning programs using the library can't be used in a pipeline. There are four warning prints to stdout in Bio/GenBank/__init__.py. All of these should go to stderr, and probably be under the control of the debug option. Also, the warning: "WARNING - Unquoted multiline '%s' entry for %s feature with location %s" should be removed. This is not a WARNING, it's a failure of this module to correctly handle the poorly documented Genbank flat-file format. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
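What the report asks for (warnings on stderr, and only when the debug option is set) might look something like the sketch below. It is written in modern Python purely as an illustration; the function name debug_warning and its parameters are hypothetical and are not taken from Bio/GenBank.

```python
import sys


def debug_warning(message, debug_level=0, stream=None):
    """Write a warning to stderr (never stdout), and only when debugging is on.

    Hypothetical helper for illustration; not part of Bio.GenBank.
    Returns True if something was written, False otherwise.
    """
    if debug_level <= 0:
        return False
    if stream is None:
        # Looked up at call time so a redirected sys.stderr is respected
        stream = sys.stderr
    stream.write("WARNING - %s\n" % message)
    return True
```

With this shape a script that pipes its stdout into another tool only ever sees the records it printed itself. Python's standard warnings module is another option, since it also writes to stderr by default and lets the caller filter messages.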
From bugzilla-daemon at portal.open-bio.org Sun Mar 11 14:31:39 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 14:31:39 -0400 Subject: [Biopython-dev] [Bug 2228] genbank parser should not print warnings to stdout In-Reply-To: Message-ID: <200703111831.l2BIVdkE010082@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2228 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|critical |major Status|NEW |ASSIGNED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 14:31 EST ------- I don't think you are using BioPython 1.24, but probably 1.42 (I know, the bugzilla choices need updating). Since the release of BioPython 1.42, this area of code has been revised significantly (to also deal with EMBL files) and most of those print statements have either vanished or are debug only. Have a look at the new file Bio/GenBank/Scanner.py There is currently just one "evil print statement", triggered when faced with a "minimal LOCUS line". I will make this print to standard error instead... I would very much like to have some examples of this as test cases BTW. > Also, the warning: > "WARNING - Unquoted multiline '%s' entry for %s feature with location %s" > should be removed. It has already been removed. > This is not a WARNING, it's a failure of this module > to correctly handle the poorly documented Genbank flat-file format Unquoted multiline feature entries do exist "in the wild" but are in breach of my reading of the NCBI documentation. The parser handles them fine (but grumbled). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From mdehoon at c2b2.columbia.edu Sun Mar 11 14:34:56 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 11 Mar 2007 14:34:56 -0400 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F4229B.9080401@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> Message-ID: <45F44BD0.6010000@c2b2.columbia.edu> Peter wrote: > Michiel Jan Laurens de Hoon wrote: >> Peter wrote: >>> I see you have started to mention Bio.SeqIO in the Blast >>> documentation - is this a hint to get me to update section 2.4 >>> "Parsing biological file formats"? ... > I have basically replaced that whole section in Tutorial.tex, and > checked it looks fine using pdflatex and hevea on my linux machine. For those of you who want to preview the new tutorial, it's available here: http://biopython.org/DIST/docs/tutorial/Tutorial-new.html This contains a description of the new Bio.SeqIO in section 2.4. --Michiel From bugzilla-daemon at portal.open-bio.org Sun Mar 11 14:52:18 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 14:52:18 -0400 Subject: [Biopython-dev] [Bug 2228] genbank parser should not print warnings to stdout In-Reply-To: Message-ID: <200703111852.l2BIqISQ010925@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2228 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 14:52 EST ------- > There is currently just one "evil print statement", triggered > when faced with a "minimal LOCUS line". I will make this print > to standard error instead... 
Done, marking as fixed. You can update to CVS or wait for BioPython 1.43 which should be due this month. P.S. I would like to see some real life examples of GenBank files with "minimal LOCUS lines" to check Biopython does something sensible with them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Mar 11 15:01:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 15:01:50 -0400 Subject: [Biopython-dev] [Bug 2228] genbank parser should not print warnings to stdout In-Reply-To: Message-ID: <200703111901.l2BJ1oGw011375@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2228 markd at cse.ucsc.edu changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |markd at cse.ucsc.edu ------- Comment #3 from markd at cse.ucsc.edu 2007-03-11 15:01 EST ------- Thanks for the amazingly quick response!! What documentation page are you using for the genbank flat-file format? The only thing I have ever found is the sample record (http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html), which is not a satisfactory definition of the format. I can report the lack of documentation on using parentheses to quote multi-line fields to NCBI. I have never seen an example of the `minimal LOCUS lines'; will submit one if I ever encounter it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Mar 11 15:26:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 15:26:30 -0400 Subject: [Biopython-dev] [Bug 2228] genbank parser should not print warnings to stdout In-Reply-To: Message-ID: <200703111926.l2BJQUeJ012797@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2228 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 15:26 EST ------- The "interesting" part of a GenBank file is the feature table, which is also present in EMBL files in almost the same form. One good page for that is here: http://www.insdc.org/files/feature_table.html I'm sure I've seen other pages on the GenBank (in addition to the one you linked to) as well... If memory serves, I have seen "unquoted multiline features" and other "abuses" like blank lines in the feature table in actual NCBI files on occasion. They usually correct this sort of thing in later releases. Regarding "minimal LOCUS lines", I recall them being mentioned on our mailing lists as being produced by certain third party tools. Again I don't have the link to hand, and haven't had reason to chase this up. P.S. Have you joined the mailing list? This bug is starting to go off topic -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mdehoon at c2b2.columbia.edu Sun Mar 11 15:32:32 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 11 Mar 2007 15:32:32 -0400 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> Message-ID: <45F45950.3090104@c2b2.columbia.edu> Ed Schofield wrote: > It's a minor issue, but I can see several reasons to use the header file > NumPy provides for the purpose, rather than pasting its definitions into > our own source files: I am not suggesting to paste the definitions in oldnumeric.h into our own source files. The point is that we don't need any of the definitions in oldnumeric.h. So we don't need to #include them, and we also don't need to paste them into Biopython. For example, your patch adds these lines (among others) to Bio/KDTree/KDTree.i: #if (NDARRAY_VERSION >= 0x00090908) #include "numpy/oldnumeric.h" #endif AFAICT, these three lines can simply be removed. --Michiel. 
From bugzilla-daemon at portal.open-bio.org Sun Mar 11 16:07:10 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 16:07:10 -0400 Subject: [Biopython-dev] [Bug 1741] Bug in fasta consumer in Doc/tutorial.tex and Doc/examples/ In-Reply-To: Message-ID: <200703112007.l2BK7A1O014745@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1741 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Summary|Bug in fasta consumer in |Bug in fasta consumer in |Doc/tutorial.tex and |Doc/tutorial.tex and |Doc/examples/ |Doc/examples/ ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 16:07 EST ------- I've started to update that section of the tutorial to use Bio.SeqIO instead of Bio.Fasta, so the problem in Doc/Tutorial.tex doesn't apply anymore. The example scripts in Doc/examples/*.py will need updating or replacing... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Mar 11 16:42:26 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 16:42:26 -0400 Subject: [Biopython-dev] [Bug 1741] Bug in fasta consumer in Doc/tutorial.tex and Doc/examples/ In-Reply-To: Message-ID: <200703112042.l2BKgQwp016519@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1741 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 16:42 EST ------- fasta_consumer.py - deleted, relevant section in the tutorial also replaced fasta_dictionary.py - checked, then extended to also cover Bio.SeqIO fasta_iterator.py - checked, then extended to also cover Bio.SeqIO -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Mar 11 17:02:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 17:02:32 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200703112102.l2BL2Wl4017557@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|biopython- |biopython-dev at biopython.org |bugzilla at maubp.freeserve.co.| |uk | Status|ASSIGNED |NEW ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 17:02 EST ------- Have you had a chance to look at this patch yet Michiel?
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Sun Mar 11 18:01:58 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 11 Mar 2007 22:01:58 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F44BD0.6010000@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> Message-ID: <45F47C56.7070300@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> I have basically replaced that whole section in Tutorial.tex, and >> checked it looks fine using pdflatex and hevea on my linux machine. > > For those of you who want to preview the new tutorial, it's available here: > > http://biopython.org/DIST/docs/tutorial/Tutorial-new.html > > This contains a description of the new Bio.SeqIO in section 2.4. Cheers Michiel. There will be a few more changes still to come. For example, I just noticed that the "Orchid Photos" links which Brad Chapman liked so much are dead - looks like www.millicentorchids.com belongs to a domain parking company now. How about these flickr and google image searches instead? http://www.flickr.com/search/?q=lady+slipper+orchid&s=int&z=t http://images.google.com/images?q=lady%20slipper%20orchid Peter From mdehoon at c2b2.columbia.edu Sun Mar 11 19:53:07 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 11 Mar 2007 19:53:07 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO Message-ID: <45F49663.5000509@c2b2.columbia.edu> I looked at the tutorial documentation for the new Bio.SeqIO. 
Generally it looks good to me, so I just have a few small issues: Section 2.4 is titled "Parsing biological file formats". This used to be appropriate, as the original version of this section described Biopython's parsers in more general terms. Since now it's about Bio.SeqIO, it may be better to rename it to make explicit that it is about parsing sequence files. Last sentence of section 2.4.2: "record.id" should be "seq_record.id". In section 2.4.4: "built python function list" should be "built-in python function list" Last sentence of section 2.4.5: "Not to complicated" should be "Not too complicated". And finally, "lets" in various places should be "let's". That's all, it looks very fine otherwise. Thanks! --Michiel. From edschofield at gmail.com Mon Mar 12 06:49:26 2007 From: edschofield at gmail.com (Ed Schofield) Date: Mon, 12 Mar 2007 10:49:26 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <45F45950.3090104@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> Message-ID: <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> On 3/11/07, Michiel de Hoon wrote: > > Ed Schofield wrote: > > It's a minor issue, but I can see several reasons to use the header file > > NumPy provides for the purpose, rather than pasting its definitions into > > our own source files: > > I am not suggesting to paste the definitions in oldnumeric.h into our > own source files. The point is that we don't need any of the definitions > in oldnumeric.h. 
So we don't need to #include them, and we also don't > need to paste them into Biopython. > > For example, your patch adds these lines (among others) to > Bio/KDTree/KDTree.i: > > #if (NDARRAY_VERSION >= 0x00090908) > #include "numpy/oldnumeric.h" > #endif > > AFAICT, these three lines can simply be removed. Ah, yes, I understand. And I've now re-read your previous post and now understand your point there about clustermodule.c too. So I've changed all (flags & CONTIGUOUS) instances to use PyArray_ISCONTIGUOUS() in my patch and updated it at http://edschofield.com/biopython-numpy-support.patch And you're right ... with these changes, including oldnumeric.h is no longer necessary anywhere. -- Ed From bugzilla-daemon at portal.open-bio.org Mon Mar 12 11:36:33 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 11:36:33 -0400 Subject: [Biopython-dev] [Bug 2229] New: GenBank Scanner fails to scan over headers Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2229 Summary: GenBank Scanner fails to scan over headers Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mcolosimo at mitre.org The new GenBank code fails to read in NCBI-GenBank flat file releases, such as gbvrl1.seq (release 158.0)

    from Bio import GenBank
    import sys
    fh = open(sys.argv[1], 'r')
    gb_iter = GenBank.Iterator(fh, GenBank.FeatureParser())
    for rec in gb_iter:
        print rec.id

This fails because the first item is not 'LOCUS'. The following works.
Index: Scanner.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/GenBank/Scanner.py,v retrieving revision 1.7 diff -r1.7 Scanner.py 62c62,63 < break --- > self.line = line > return line 69,72c70,72 < raise SyntaxError("Expected line starting '%s', found '%s'" \ < % (self.RECORD_START, line.rstrip())) < self.line = line < return line --- > if self.debug > 1: print "Skipping line" > > return None -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 11:41:24 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 11:41:24 -0400 Subject: [Biopython-dev] [Bug 2230] New: GenBank __init__.py: _Scanner import Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2230 Summary: GenBank __init__.py: _Scanner import Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mcolosimo at mitre.org This is bad coding IMHO and took me several minutes to figure out where the class was. from Scanner import GenBankScanner as _Scanner since this is used only two times and referenced once! Index: __init__.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/GenBank/__init__.py,v retrieving revision 1.67 diff -r1.67 __init__.py 25c25 < _Scanner Set up a GenBank parser to parse a record. --- > GenBankScanner Set up a GenBank parser to parse a record. 
47c47 < from Scanner import GenBankScanner as _Scanner --- > from Scanner import GenBankScanner 178c178 < self._scanner = _Scanner(debug_level) --- > self._scanner = GenBankScanner(debug_level) 202c202 < self._scanner = _Scanner(debug_level) --- > self._scanner = GenBankScanner(debug_level) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 12:05:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 12:05:19 -0400 Subject: [Biopython-dev] [Bug 2229] GenBank Scanner fails to scan over headers In-Reply-To: Message-ID: <200703121605.l2CG5Jk1012292@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2229 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-12 12:05 EST ------- Link to download the file, about 28 MB ftp://ftp.ncbi.nih.gov/genbank/gbvrl1.seq.gz This starts: ------------------------------------------------ GBVRL1.SEQ Genetic Sequence Data Bank February 15 2007 NCBI-GenBank Flat File Release 158.0 Viral Sequences (Part 1) 72061 loci, 66147687 bases, from 72061 reported sequences LOCUS AB000048 2007 bp DNA linear VRL 05-FEB-1999 DEFINITION Feline panleukopenia virus DNA for nonstructural protein 1, complete cds. ACCESSION AB000048 ... 
------------------------------------------------ Much smaller test case, 81 KB compressed: ftp://ftp.ncbi.nih.gov/genbank/gbuna.seq.gz File starts: ------------------------------------------------ GBUNA.SEQ Genetic Sequence Data Bank February 15 2007 NCBI-GenBank Flat File Release 158.0 Unannotated Sequences 211 loci, 114018 bases, from 211 reported sequences LOCUS AB086827 901 bp mRNA linea ... ------------------------------------------------ In both cases, and I assume all these archives, there is a fairly uniform header present, followed by the GenBank records. I suppose we could/should spot these and skip them... does anyone know offhand if EMBL does anything similar? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 12:11:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 12:11:35 -0400 Subject: [Biopython-dev] [Bug 2230] GenBank __init__.py: _Scanner import In-Reply-To: Message-ID: <200703121611.l2CGBZtw012643@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2230 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |LATER ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-12 12:11 EST ------- That was done on purpose so that any legacy code directly using the _Scanner object would still work. Note that the original _Scanner class was defined in Bio/GenBank/__init__.py I agree that it's not very elegant, but it seemed like the easiest way to do it at the time. I don't want to touch this with the next release due imminently, but certainly some housekeeping may be in order after that.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Mon Mar 12 12:16:51 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 12 Mar 2007 16:16:51 +0000 Subject: [Biopython-dev] GenBank file format documentation In-Reply-To: <200703111926.l2BJQUeJ012797@portal.open-bio.org> References: <200703111926.l2BJQUeJ012797@portal.open-bio.org> Message-ID: <45F57CF3.4060402@maubp.freeserve.co.uk> On bug 2228, http://bugzilla.open-bio.org/show_bug.cgi?id=2228#c4 I wrote: > The "interesting" part of a GenBank file is the feature table, which is also > present in EMBL files in almost the same form. One good page for that is here: > > http://www.insdc.org/files/feature_table.html > > I'm sure I've seen other pages on the GenBank (in addition to the one you > linked to) as well... This link looks like the same document, and may have been what I was thinking of: http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html Peter From bugzilla-daemon at portal.open-bio.org Mon Mar 12 12:31:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 12:31:05 -0400 Subject: [Biopython-dev] [Bug 2229] GenBank Scanner fails to scan over headers In-Reply-To: Message-ID: <200703121631.l2CGV5CL013600@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2229 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-12 12:31 EST ------- You were spot on Marc - this was indeed a regression from when I added the combined GenBank / EMBL scanner. 
I've checked in a fix to Bio/GenBank/Scanner.py I plan to add another example to the unit tests as they didn't catch this. Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 13:20:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 13:20:55 -0400 Subject: [Biopython-dev] [Bug 2231] New: NCBI added new sequence type - cRNA Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2231 Summary: NCBI added new sequence type - cRNA Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mcolosimo at mitre.org Breaks GenBank.Scanner, add cRNA into assert array (and description above) assert line[47:54].strip() in ['','DNA','RNA','tRNA','mRNA','uRNA','snRNA', 'cRNA'], \ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
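The check described in this report can be sketched outside the scanner. This is a minimal illustration, assuming the new-style LOCUS layout in which the molecule type occupies line[47:54]; the function names, the record name EXAMPLE1 and its length are made up for the example and are not part of Bio/GenBank/Scanner.py.

```python
# Molecule types taken from the assert quoted in the report;
# 'cRNA' is the newly added entry.
ALLOWED_MOLECULE_TYPES = frozenset(
    ["", "DNA", "RNA", "tRNA", "mRNA", "uRNA", "snRNA", "cRNA"]
)


def locus_molecule_type(line):
    """Return the molecule type field of a LOCUS line, or raise ValueError."""
    mol_type = line[47:54].strip()
    if mol_type not in ALLOWED_MOLECULE_TYPES:
        raise ValueError("unexpected molecule type %r in LOCUS line" % mol_type)
    return mol_type


def build_locus_line(name, length, mol_type):
    """Hypothetical helper: pad the fields so the type starts at index 47."""
    prefix = "LOCUS       %-16s %11d bp    " % (name, length)  # 47 characters
    return prefix + "%-6s  linear   VRL 05-FEB-1999" % mol_type


# EXAMPLE1 and 1000 are placeholder values, not a real GenBank record.
print(locus_molecule_type(build_locus_line("EXAMPLE1", 1000, "cRNA")))
```

Raising ValueError instead of using assert keeps the check active when Python runs with -O; whether the scanner should do the same is a separate design question.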
From bugzilla-daemon at portal.open-bio.org Mon Mar 12 13:26:18 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 13:26:18 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703121726.l2CHQIT7016249@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 mcolosimo at mitre.org changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mcolosimo at mitre.org ------- Comment #2 from mcolosimo at mitre.org 2007-03-12 13:26 EST ------- (In reply to comment #1) > [I have tried to reassign this to the mailing list...] > > Could someone familar with BioSQL take a look at this please? > I'll have a look at this. I already have a few changes to submit for MySQL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 13:55:04 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 13:55:04 -0400 Subject: [Biopython-dev] [Bug 2231] NCBI added new sequence type - cRNA In-Reply-To: Message-ID: <200703121755.l2CHt4xH017634@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2231 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-12 13:55 EST ------- Suggested change made in CVS. Thanks Marc. Could you point me at a specific example GenBank file where this is used? 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 15:56:58 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 15:56:58 -0400 Subject: [Biopython-dev] [Bug 2231] NCBI added new sequence type - cRNA In-Reply-To: Message-ID: <200703121956.l2CJuw5V023462@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2231 ------- Comment #2 from mcolosimo at mitre.org 2007-03-12 15:56 EST ------- (In reply to comment #1) > Suggested change made in CVS. Thanks Marc. > > Could you point me at a specific example GenBank file where this is used? > Yep, AF039525 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 14 09:42:08 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 09:42:08 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141342.l2EDg80p017603@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #3 from mcolosimo at mitre.org 2007-03-14 09:42 EST ------- Created an attachment (id=593) --> (http://bugzilla.open-bio.org/attachment.cgi?id=593&action=view) Fixed last_id method of Mysql_dbutils class -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Mar 14 10:22:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 10:22:45 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141422.l2EEMjth019835@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #4 from mcolosimo at mitre.org 2007-03-14 10:22 EST ------- Created an attachment (id=594) --> (http://bugzilla.open-bio.org/attachment.cgi?id=594&action=view) Various fixes and possible improvements -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 14 10:27:07 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 10:27:07 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141427.l2EER79G020271@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 mcolosimo at mitre.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #5 from mcolosimo at mitre.org 2007-03-14 10:27 EST ------- I've attached two diffs that fix a bunch of "bugs". An earlier attempt to fix a bug introduced a new one that broke a key feature (storing of NCBI taxon ids). I fixed both the code that looked up the NCBI taxon ids and the SQL statement that stored them. Hint: using None is better than "0" or removing it. Also, using my code and the latest BioSQL schema, I was able to load in AY243312 without any problem.
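The hint about None can be illustrated with a small, self-contained sketch. It assumes an in-memory SQLite database as a stand-in for MySQL, and the table, column and function names are hypothetical rather than taken from the BioSQL schema or from the attached patches. The point is only that a missing taxon id passed as None is stored as NULL, which stays distinguishable from a real id and can be selected with IS NULL.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server discussed in this bug.
# The table and column names are hypothetical, not the BioSQL schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (name TEXT, taxon_id INTEGER)")


def store_entry(connection, name, taxon_id=None):
    """Insert one row; an unknown taxon id is passed as None and stored as NULL."""
    connection.execute(
        "INSERT INTO entry (name, taxon_id) VALUES (?, ?)", (name, taxon_id)
    )
    connection.commit()  # make the insert permanent


store_entry(conn, "with_taxon", 9606)  # 9606 is the NCBI taxon id for human
store_entry(conn, "no_taxon")          # taxon unknown, stored as NULL

unknown = [
    row[0]
    for row in conn.execute("SELECT name FROM entry WHERE taxon_id IS NULL")
]
print(unknown)
```

Storing "0" instead would put a value in the column that looks like an id but matches no taxon, so later lookups could not tell "unknown" apart from "wrong".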
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 14 11:16:56 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 11:16:56 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141516.l2EFGurY022990@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-14 11:16 EST ------- Minor comments on patch 594 (which I have only read): Did you mean to change the number of leading spaces in this string: " FROM reference JOIN dbxref USING (dbxref_id)" versus: " FROM reference JOIN dbxref USING (dbxref_id)" You've used a mixture of "MEC" and "mec" tags on some comments - your initials? Personally I would avoid this... I initially assumed it was a three-letter acronym for something biological. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 14 11:22:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 11:22:12 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141522.l2EFMCYj023249@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-14 11:22 EST ------- Marc, you wrote in comment 5: > I've attached two diffs that fixed a bunch of "bugs".
A bug was > introduced that broke a key feature (storing of ncbi taxon ids) > when trying to fix a bug. I fixed both the code that looked up > the ncbi taxon ids and the sql statement that stored them. > > Hint: using None is better than "0" or removing it. I'm guessing that you are refering to the change two years ago in biopython/BioSQL/Loader.py (revision 1.16) http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/BioSQL/Loader.py?cvsroot=biopython See also bug 1921 which I think is a different issue (?) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Wed Mar 14 11:51:31 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 14 Mar 2007 11:51:31 -0400 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> Message-ID: <45F81A03.4080504@c2b2.columbia.edu> Ed Schofield wrote: > Ah, yes, I understand. And I've now re-read your previous post and now > understand your point there about clustermodule.c too. So I've changed > all (flags & CONTIGUOUS) instances to use PyArray_ISCONTIGUOUS() in my > patch and updated it at > > http://edschofield.com/biopython-numpy-support.patch OK, thanks. 
I have fixed clustermodule.c in Biopython's CVS to use PyArray_ISCONTIGUOUS instead of (flags & CONTIGUOUS). I'll look at this patch in more detail after the upcoming release is out. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Wed Mar 14 11:51:06 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 11:51:06 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200703141551.l2EFp6k9024891@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 ------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2007-03-14 11:51 EST ------- > Have you had a chance to look at this patch yet Michiel? No, I haven't looked at this. Basically I have given up on parsing plain-text output from Blast. I don't think it can be done reliably. On the other hand, if the patch solves some of the issues with plain-text parsing, I'm all for it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Mar 14 12:06:44 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 12:06:44 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141606.l2EG6iOl025748@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #8 from lpritc at scri.sari.ac.uk 2007-03-14 12:06 EST ------- > I'm guessing that you are refering to the change two years ago in > biopython/BioSQL/Loader.py (revision 1.16) I think Marc's improving on my workaround to Loader.py (rev 1.17) and taking an alternative approach to my fix to DBUtils.py. The test_BioSQL.py script in CVS still fails though, due to an unrelated issue of the return string format for locations, and I still need to issue server.adaptor.commit() in MySQL5. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From edschofield at gmail.com Wed Mar 14 14:26:53 2007 From: edschofield at gmail.com (Ed Schofield) Date: Wed, 14 Mar 2007 18:26:53 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F47C56.7070300@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> Message-ID: <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> On 3/11/07, Peter wrote: > Michiel de Hoon wrote: > > Peter wrote: > >> I have basically replaced that whole section in Tutorial.tex, and > >> checked it looks fine using pdflatex and hevea on my linux machine. 
> > > > For those of you who want to preview the new tutorial, it's available here: > > > > http://biopython.org/DIST/docs/tutorial/Tutorial-new.html > > > > This contains a description of the new Bio.SeqIO in section 2.4. > > Cheers Michiel. > > There will be a few more changes still to come. > > For example, I just noticed that the "Orchid Photos" links which Brad > Chapman liked so much are dead - looks like www.millicentorchids.com > belongs to a domain parking company now. Two more points: - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. - The code example in Section 2.4.1 uses record and seq_record inconsistently. Otherwise, looks good to me! -- Ed From bugzilla-daemon at portal.open-bio.org Wed Mar 14 14:34:27 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 14:34:27 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141834.l2EIYRNL001256@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #9 from mcolosimo at mitre.org 2007-03-14 14:34 EST ------- I meant the change in leading spaces, which is only cosmetic. I put my initals in the code for my usage so that I knew what I touched. So, they can be removed. In the change log, I think it should be keept (or add the bug # that these fix). (In reply to comment #6) > Minor commentss on patch 594 (which I have only read)): > > Did you mean to change the number of leading spaces in this string: > " FROM reference JOIN dbxref USING (dbxref_id)" > versus: > " FROM reference JOIN dbxref USING (dbxref_id)" > > You've used a mixture of "MEC" and "mec" tags on some comments - your initials? > Personally I would avoid this... I initially assumed it was a three letter > acronyms for something biological. 
> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 14 14:41:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 14:41:47 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141841.l2EIflj1001592@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #10 from mcolosimo at mitre.org 2007-03-14 14:41 EST ------- (In reply to comment #8) > > I'm guessing that you are refering to the change two years ago in > > biopython/BioSQL/Loader.py (revision 1.16) > > I think Marc's improving on my workaround to Loader.py (rev 1.17) and taking an > alternative approach to my fix to DBUtils.py. The test_BioSQL.py script in CVS > still fails though, due to an unrelated issue of the return string format for > locations, and I still need to issue server.adaptor.commit() in MySQL5. > This does improve on your work around for bug #1921 and maybe the patch files should be attached to that bug #. I couldn't repeat the error for this bug. This is an older bug and might have been fixed in between. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mdehoon at c2b2.columbia.edu Wed Mar 14 15:08:58 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 14 Mar 2007 15:08:58 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> Message-ID: <45F8484A.9010803@c2b2.columbia.edu> Ed Schofield wrote: > > - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. I've added these files on the server. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Wed Mar 14 15:47:07 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 14 Mar 2007 19:47:07 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> Message-ID: <45F8513B.90802@maubp.freeserve.co.uk> Peter wrote: >> ..., I just noticed that the "Orchid Photos" links which Brad >> Chapman liked so much are dead - looks like www.millicentorchids.com >> belongs to a domain parking company now. 
Unless anyone has a better suggestion, I've gone with the Google Image and Flickr searches instead... Ed Schofield wrote: > Two more points: > > - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. There was a "funny" in the LaTeX code that worked in the PDF output from pdflatex, but not in the HTML output from Hevea. I think I have fixed that in CVS now. Also I see Michiel has put the example files on the website now, so I have also updated the link to point there (rather than downloading them via a ViewCVS checkout). Ed Schofield wrote: > - The code example in Section 2.4.1 uses record and seq_record inconsistently. I think Michiel pointed that out too, and again, I think I have corrected that in CVS. Ed Schofield wrote: > Otherwise, looks good to me! Lovely. I'll try and recompile the HTML edition tonight, and fingers crossed we can get it online for another proof read before the next release. Peter From mdehoon at c2b2.columbia.edu Wed Mar 14 16:18:17 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 14 Mar 2007 16:18:17 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F8513B.90802@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> Message-ID: <45F85889.20104@c2b2.columbia.edu> Peter wrote: > I'll try and recompile the HTML edition tonight, and fingers crossed we > can get it online for another proof read before the next release.
> If it's easier for you, you can also submit Turorial.tex to CVS, then I'll get it from there, run hevea on it and put it up for a proof read (since hevea will be used for the final version). --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Wed Mar 14 18:42:01 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 14 Mar 2007 22:42:01 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F85889.20104@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> Message-ID: <45F87A39.4060809@maubp.freeserve.co.uk> Thanks for the feedback Michiel and Ed. When I get back from my holiday (by which time I hope that Biopython 1.43 will have been released without any big issues), I plan to update Section 4.4 "Dealing with alignments". I've made a few changes to the Tutorial in CVS today, but I think I've finished now. Michiel Jan Laurens de Hoon wrote: > If it's easier for you, you can also submit Turorial.tex to CVS, then > I'll get it from there, run hevea on it and put it up for a proof read > (since hevea will be used for the final version). That would be easier for me. You might notice that I tweaked the all URLs (and updated the addresses of a few that had changed). This got rid of the recursive anchor tag warnings I was getting from hevea, and hasn't had any side effects as far as I can tell. 
I have just checked that the updated Tutorial.tex works on Linux with both pdflatex (for Tutorial.pdf) and hevea (for the HTML version). I have not installed hevea on Windows, but after adding the hevea.sty file to my Windows LaTeX installation (I use MiKTeX) I was able to build Tutorial.pdf on Windows too. >Ed Schofield wrote: >> >> - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. >> > I've added these files on the server. Great. I've updated the URLs in the tutorial to point at those files. It might be nice to upload the rest of the files in Bio/Doc/examples/ as well... I had previously made some of the other examples point to ViewCVS to download the relevant file - this works but is a bit clumsy. Certainly when I was first reading the Tutorial, my immediate reaction was to think where can I get these example sequence files - and having a URL right there in the HTML and PDF editions is very nice. Peter From mdehoon at c2b2.columbia.edu Wed Mar 14 20:04:33 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 14 Mar 2007 20:04:33 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F87A39.4060809@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> <45F87A39.4060809@maubp.freeserve.co.uk> Message-ID: <45F88D91.5080609@c2b2.columbia.edu> I put the new Tutorial preview at http://biopython.org/DIST/docs/tutorial/Tutorial-new.html Peter wrote: >>> - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. >>> >> I've added these files on the server. > > Great.
I've updated the URLs in the tutorial to point at those files. > It might be nice to upload the rest of the files used Bio/Doc/examples/ > as well... > I've changed the links to the example files to relative paths, and added these files on the server. With the relative paths, links should work both on the web page and from the local documentation as contained in the Biopython distribution. --Michiel -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Wed Mar 14 20:37:05 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Mar 2007 00:37:05 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F88D91.5080609@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> <45F87A39.4060809@maubp.freeserve.co.uk> <45F88D91.5080609@c2b2.columbia.edu> Message-ID: <45F89531.9070600@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > I've changed the links to the example files to relative paths, and added > these files on the server. With the relative paths, links should work > both on the web page and from the local documentation as contained in > the Biopython distribution. > > ... > > I put the new Tutorial preview at > http://biopython.org/DIST/docs/tutorial/Tutorial-new.html I am the bearer of bad news I'm afraid. 
I assume the local file structure looks something like this: .../Doc/Tutorial.pdf .../Doc/Tutorial.html .../Doc/examples/ls_orchid.fasta .../Doc/examples/ls_orchid.gbk etc With the online copies going here: http://biopython.org/DIST/docs/tutorial/Tutorial.pdf http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.fasta http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk So yes, relative links like "examples/ls_orchid.fasta" and "examples/ls_orchid.gbk" should in theory work for both the HTML and PDF files, both locally and online. Sadly it looks like hevea has mangled your relative links. They looked fine in LaTeX, but the HTML file you put online contains things like <A HREF="ls_orchid.fasta"> which of course fails. Also, and this may depend on the version of LaTeX used, but on the Tutorial.pdf I just built on Windows, using Adobe Reader 6.0, the links display with "file:examples/ls_orchid.fasta" as the tool tip, but seem to get passed to Internet Explorer as "http://examples/ls_orchid.fasta" which fails. I think it would be simpler and safer to just use an absolute URL to the webpage copy, and mention in the text that the files are included in the source code. By the way - am I right in thinking that the Windows installer does not come with the documentation directory? I would assume that some Windows users would just download the PDF tutorial and put it anywhere on their hard disk (it's what I would do!), in which case there is no way the relative links idea could work.
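How the three kinds of link resolve against the online tutorial page can be checked with the standard library. This is a minimal sketch, assuming the page URL given above and standard URL resolution as implemented by urllib.parse.urljoin; the resolve helper is a name made up for the example, and the "mangled" case is a link that has lost its examples/ prefix.

```python
from urllib.parse import urljoin

# Page URL as given in the message above.
PAGE = "http://biopython.org/DIST/docs/tutorial/Tutorial.html"


def resolve(link, base=PAGE):
    """Resolve a link the way a browser would for a page served from base."""
    return urljoin(base, link)


# Relative link with the directory prefix: lands in .../tutorial/examples/
intended = resolve("examples/ls_orchid.fasta")

# Relative link without the prefix: lands next to Tutorial.html instead
mangled = resolve("ls_orchid.fasta")

# Absolute URL: independent of where the page itself lives
absolute = resolve(
    "http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.fasta"
)

for url in (intended, mangled, absolute):
    print(url)
```

An absolute URL gives the same result whatever the base is, which is why it also works for a PDF saved to an arbitrary folder on disk.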
Peter From biopython-dev at maubp.freeserve.co.uk Wed Mar 14 20:41:50 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Mar 2007 00:41:50 +0000 Subject: [Biopython-dev] [BioPython] Biopython hackathon In-Reply-To: <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> References: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> Message-ID: <45F8964E.1030701@maubp.freeserve.co.uk> Ann Loraine wrote: > I hope you will consider the following two requests as possible > hackathon activities: > > (1) If it does not already do this, it would be nice if the blast > "plain text" (non-XML) parser would report the length of the target > ("hit") sequence as well as the query. If I recall correctly, the last > time I used the plain text blast parser, I had to measure the length > of the targets by opening up the fasta copy of the blastable database > and reading the lengths one-by-one. My database wasn't very big, so it > wasn't a hassle to do this, but I can foresee situations where this > kludge would fail. I'm not answering your question, but... The "plain text" (non-XML) parser is currently falling behind the changes the NCBI seem to make each release - see bug 2090: http://bugzilla.open-bio.org/show_bug.cgi?id=2090 We really want to encourage people to move over to the XML Blast parser instead. Do you have a strong reason for preferring to parse the plain text output? 
Peter From bugzilla-daemon at portal.open-bio.org Wed Mar 14 20:50:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 20:50:40 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200703150050.l2F0oeV7016990@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython- | |bugzilla at maubp.freeserve.co. | |uk ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-14 20:50 EST ------- > No, I haven't looked at this. Basically I have given up on > parsing plain-text output from Blast. I don't think it can > be done reliably. On the other hand, if the patch solves some > of the issues with plain-text parsing, I'm all for it. I agree with you that maintaining the plain-text parsing is a pain - especially given the recent change for the output for multiple queries where the header is no longer repeated. However, the patch did help for parsing the output of single queries, which is better than nothing. Note that I haven't touched this code since December, but the only change in CVS in the meantime was the trivial "oldengine" switch, so it should be easy to merge this in. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
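For the hit-length request raised earlier in this thread, the XML output already carries the value, so nothing has to be measured from the FASTA database. The sketch below is only an illustration: it uses the standard library rather than Biopython's own XML parser, the XML fragment is made up, and the element names (Hit_id, Hit_len, BlastOutput_query-len) are given as I understand the NCBI BLAST XML output, so check them against a real result file.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed fragment in the shape of BLAST XML output.
SAMPLE = """<BlastOutput>
  <BlastOutput_query-len>120</BlastOutput_query-len>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>example|1</Hit_id>
          <Hit_len>450</Hit_len>
        </Hit>
        <Hit>
          <Hit_id>example|2</Hit_id>
          <Hit_len>98</Hit_len>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hit_lengths(xml_text):
    """Map each hit identifier to the length of the hit sequence."""
    root = ET.fromstring(xml_text)
    lengths = {}
    for hit in root.iter("Hit"):
        lengths[hit.findtext("Hit_id")] = int(hit.findtext("Hit_len"))
    return lengths


def query_length(xml_text):
    """Return the query length recorded in the output header."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("BlastOutput_query-len"))


print(hit_lengths(SAMPLE))
print(query_length(SAMPLE))
```

Because the lengths are explicit elements rather than text laid out for a human reader, this approach does not depend on the report layout that changes between blastall releases.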
From biopython-dev at maubp.freeserve.co.uk Wed Mar 14 21:38:09 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Mar 2007 01:38:09 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F88D91.5080609@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> <45F87A39.4060809@maubp.freeserve.co.uk> <45F88D91.5080609@c2b2.columbia.edu> Message-ID: <45F8A381.3020900@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > I've changed the links to the example files to relative paths, and added > these files on the server. ... So far I can see just "ls_orchid.fasta", "ls_orchid.gbk" and "opuntia.fasta" online: http://biopython.org/DIST/docs/tutorial/examples/ To be fair, these are the only files I had explicitly named or linked to. I meant it would be nice to also upload and then link to: (a) protein.aln, used (but not currently hyper-referenced) in the section "Creating your own substitution matrix from an alignment". (b) m_cold.fasta, used (but not currently hyper-referenced) as an example input file in the Blast section, and also in the "What the heck is a handle?" appendix. I think that covers all the non-Python files in Bio/Doc/examples/ except for Bio/Doc/examples/nmr/noed.xpk which does not seem to be used in the tutorial at the moment.
Peter From mdehoon at c2b2.columbia.edu Thu Mar 15 11:57:05 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 15 Mar 2007 11:57:05 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F89531.9070600@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> <45F87A39.4060809@maubp.freeserve.co.uk> <45F88D91.5080609@c2b2.columbia.edu> <45F89531.9070600@maubp.freeserve.co.uk> Message-ID: <45F96CD1.7010002@c2b2.columbia.edu> Peter wrote: > Michiel Jan Laurens de Hoon wrote: >> I've changed the links to the example files to relative paths, and >> added these files on the server. With the relative paths, links should >> work both on the web page and from the local documentation as >> contained in the Biopython distribution. > > Sadly it looks like hevea has mangled your relative links. They looked > fine in LaTeX, but the HTML file you put online contains things like HREF="ls_orchid.fasta"> which of course fails. Are you sure you're looking at the latest version of the HTML? Maybe your browser is showing you an older page from cache. On my browser, I'm seeing . > I think it would be simpler and safer to just use an absolute URL to the > webpage copy, and mention in the text that the files are included in the > source code. > By the way - am I right in thinking that the Windows installer does not > come with the documentation directory? 
I would assume that some Windows > users would just download the PDF tutorial and put it anywhere on their > hard disk (it's what I would do!), in which case there is no way the > relative links idea could work. That's right: The Windows installer comes without the documentation. As a solution, we can include both links -- a local one for off-line use, and an absolute URL for safety. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Sat Mar 17 19:26:50 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Sat, 17 Mar 2007 19:26:50 -0400 Subject: [Biopython-dev] Biopython release 1.43 Message-ID: <45FC793A.2010106@c2b2.columbia.edu> Dear Biopythoneers, We are pleased to announce the release of Biopython 1.43. This release includes a brand-new set of parsers in Bio.SeqIO by Peter Cock for reading biological sequence files in various formats, an updated Blast XML parser in Bio.Blast.NCBIXML, a new UniGene flat-file parser by Sean Davis, and numerous improvements and bug fixes in Bio.PDB, Bio.SwissProt, Bio.Nexus, BioSQL, and others. Believe it or not, even the documentation was updated. Source distributions and Windows installers are available from the Biopython website at http://biopython.org. My thanks to all code contributors who made this new release possible.
--Michiel on behalf of the Biopython developers -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From chris.lasher at gmail.com Sun Mar 18 12:14:25 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 18 Mar 2007 12:14:25 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <45F235B7.6000409@c2b2.columbia.edu> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> Message-ID: <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> On 3/10/07, Michiel de Hoon wrote: > Chris Lasher wrote: > > I hope the BioPython developers will consider a move to Subversion > > seriously. If there is support from the devs, but no interest on > > anyone's part to make it happen, given the proper people to contact, I > > will be happy to get this moving as a way of contributing back to the > > BioPython community. > > I know very little about CVS and Subversion (which is why I didn't > respond to your original post). But I did notice that a lot of software > projects are using Subversion instead of CVS nowadays, including Python > itself. So I don't have any objections against Biopython moving to > Subversion as well (unfortunately, I cannot be of much help here either). > > If we are moving to Subversion though, I'd like to ask you not to make > any changes until the next Biopython release comes out, which will be in > about one week from now. Since no one else has volunteered, I'm taking up responsibility for the transition. I got the ball moving by contacting "support at open-bio.org" to alert them to our interest and get any contacts we'll need to make this happen. Also, if anybody on the list has any information that would be helpful in this (e.g., who administers the CVS repo) please feel free to send it along.
Likewise, feel free to raise any questions, concerns, and comments on the list.

Once the Subversion repository is in place and in sync with the CVS repository (done through cvs2svn), we have two options with regard to the CVS repository:

1) Drop support for CVS.
What this means: The CVS repository will either be shut down or be left but not supported with updates to the code.
Advantages:
* Transitioning users to Subversion should be trivial due to Subversion's inheritance from CVS.
  - Transition documentation already exists from svnbook.red-bean.com, specific examples can be written on Biopython wiki, etc.
* Clean, less challenging solution.
Disadvantages:
* We will need to publicize in a big way that we've transitioned.
* Automated scripts on remote machines that depend on the CVS repository would break or work with Biopython code that becomes increasingly out of date.
  - Trivial to remedy (e.g., s/cvs up -dP/svn up/g)
* Obstinate users will complain.
  - We can't please everybody.

2) Allowing legacy support for CVS via the method found at:
What this means: Briefly, the commits to Subversion repository are mirrored in the CVS repository. CVS access becomes read-only, commits are not permitted.
Advantages:
* Allows legacy users of CVS repository to receive updates.
Disadvantages:
* We may not have enough administrative access to do this.
* This will require much more time to implement, test, and triple-check.
* Has anybody on the list ever done this? It could lead to a lot of "learning experiences".

Questions, concerns, and comments welcome.
Chris From edschofield at gmail.com Sun Mar 18 12:53:37 2007 From: edschofield at gmail.com (Ed Schofield) Date: Sun, 18 Mar 2007 16:53:37 +0000 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> Message-ID: <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> On 3/18/07, Chris Lasher wrote: > > Since no one else has volunteered, I'm taking up responsibility for > the transition. I got the ball moving by contacting "support at > open-bio.org" to alert them to our interest and get any contacts > we'll need to make this happen. Also, if anybody on the list has any > information that would be helpful in this (e.g., who administers the > CVS repo) please feel free to send it along. Likewise, feel free to > raise any questions, concerns, and comments on the list. > > Once the Subversion repository is in place and in sync with the CVS > repository (done through cvs2svn), we have two options with regard to > the CVS repository: > ... Nice summary. How many "legacy users" will there be for whom moving to SVN would be a non-trivial task? This list is one channel for soliciting their feedback; are there any others?
-- Ed From chris.lasher at gmail.com Sun Mar 18 13:02:03 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 18 Mar 2007 13:02:03 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> Message-ID: <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> On 3/18/07, Ed Schofield wrote: > Nice summary. How many "legacy users" will there be for whom moving to > SVN would be a non-trivial task? This list is one channel for > soliciting their feedback; are there any others? I don't know, actually, how many "legacy users" we would have. The other channel for soliciting feedback, and a very important one, is the Biopython user list. That would probably be the best way of assessing how many would still need to rely on CVS. Should I send an email out to that list, or would one of the more senior developers like to do it? 
Chris From sbassi at gmail.com Tue Mar 20 15:41:08 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 20 Mar 2007 16:41:08 -0300 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <45F81A03.4080504@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> <45F81A03.4080504@c2b2.columbia.edu> Message-ID: On 3/14/07, Michiel Jan Laurens de Hoon wrote: > OK, thanks. I have fixed clustermodule.c in Biopython's CVS to use > PyArray_ISCONTIGUOUS instead of (flags & CONTIGUOUS). > I'll look at this patch in more detail after the upcoming release is out. Did the numpy instead of numeric made it into 1.43? Best, SB. 
From mdehoon at c2b2.columbia.edu Tue Mar 20 16:46:52 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Tue, 20 Mar 2007 16:46:52 -0400 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> <45F81A03.4080504@c2b2.columbia.edu> Message-ID: <4600483C.7010507@c2b2.columbia.edu> Sebastian Bassi wrote: > On 3/14/07, Michiel Jan Laurens de Hoon wrote: >> OK, thanks. I have fixed clustermodule.c in Biopython's CVS to use >> PyArray_ISCONTIGUOUS instead of (flags & CONTIGUOUS). >> I'll look at this patch in more detail after the upcoming release is out. > > Did the numpy instead of numeric made it into 1.43? > Yes and no. Yes): The patch above did make it in, which means that it is now fairly easy to compile Biopython with Numpy instead of Numeric. All you would have to do is to change the #include statements in the C-code to #include . (Why the Numpy folks put arrayobject.h in a different location, I don't know; if they hadn't, a transition to Numpy would have been a lot easier). But, if you're not on a 32-bits platform, Bio.Cluster will not work correctly. Also, some import statements will fail (e.g. from Numeric import * should become from numpy import *). So, in summary, you can compile the code but you'll probably have to tinker with the Python code. On the bright side, most of Biopython does not need Numeric or Numpy, so 90% of Biopython will work. 
No): Adding numpy support is a major undertaking; it's not something I'd want to add one week before a release is coming out. Some tests fail with Numpy. While these don't seem to be major issues, it's something that needs to be fixed first. An additional problem is that we cannot just drop Numeric support, so we'll have to support both for now. Numeric is still needed because (a) Numpy does not compile cleanly on all major platforms (for example on Cygwin); (b) Other Python software relevant to computational biology uses Numeric; (c) Numpy's documentation costs $40; Numeric's free. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From chris.lasher at gmail.com Tue Mar 20 22:39:11 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Tue, 20 Mar 2007 22:39:11 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> Message-ID: <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> On 3/18/07, Chris Lasher wrote: > I don't know, actually, how many "legacy users" we would have. The > other channel for soliciting feedback, and a very important one, is > the Biopython user list. That would probably be the best way of > assessing how many would still need to rely on CVS. Should I send an > email out to that list, or would one of the more senior developers > like to do it? Silence means assent. =-) I'll post to the Biopython users list if nobody else wants to and I don't hear any objections in the next day. 
Chris From edschofield at gmail.com Wed Mar 21 07:36:13 2007 From: edschofield at gmail.com (Ed Schofield) Date: Wed, 21 Mar 2007 11:36:13 +0000 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> Message-ID: <1b5a37350703210436l6aeef938s13f06ebabc2d8a09@mail.gmail.com> On 3/21/07, Chris Lasher wrote: > On 3/18/07, Chris Lasher wrote: > > I don't know, actually, how many "legacy users" we would have. The > > other channel for soliciting feedback, and a very important one, is > > the Biopython user list. That would probably be the best way of > > assessing how many would still need to rely on CVS. Should I send an > > email out to that list, or would one of the more senior developers > > like to do it? > > Silence means assent. =-) Exactly! 
;) -- Ed From mcolosimo at mitre.org Wed Mar 21 08:39:57 2007 From: mcolosimo at mitre.org (Marc Colosimo) Date: Wed, 21 Mar 2007 08:39:57 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> Message-ID: On Mar 20, 2007, at 10:39 PM, Chris Lasher wrote: > On 3/18/07, Chris Lasher wrote: >> I don't know, actually, how many "legacy users" we would have. The >> other channel for soliciting feedback, and a very important one, is >> the Biopython user list. That would probably be the best way of >> assessing how many would still need to rely on CVS. Should I send an >> email out to that list, or would one of the more senior developers >> like to do it? > > Silence means assent. =-) I'll post to the Biopython users list if > nobody else wants to and I don't hear any objections in the next day. > I've found that svn is more useable than cvs. I especially like the move command (svn mv), which moves the file(s) and all the associated information. Thus, you can undo a move and see all the change history for the file(s). In addition, it handles binary files automagically. The one thing that I don't like at times is that you need to explicitly set keyword tags (like $Id$). Also, some places block webDAV commands through proxies. So, keeping a read only cvs would be good. 
Marc From edschofield at gmail.com Wed Mar 21 08:55:34 2007 From: edschofield at gmail.com (Ed Schofield) Date: Wed, 21 Mar 2007 12:55:34 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <4600483C.7010507@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> <45F81A03.4080504@c2b2.columbia.edu> <4600483C.7010507@c2b2.columbia.edu> Message-ID: <1b5a37350703210555n720d4e9bla03207a3f5758833@mail.gmail.com> On 3/20/07, Michiel Jan Laurens de Hoon wrote: > Sebastian Bassi wrote: > > On 3/14/07, Michiel Jan Laurens de Hoon wrote: > >> OK, thanks. I have fixed clustermodule.c in Biopython's CVS to use > >> PyArray_ISCONTIGUOUS instead of (flags & CONTIGUOUS). > >> I'll look at this patch in more detail after the upcoming release is out. > > > > Did the numpy instead of numeric made it into 1.43? > > > Yes and no. > > Yes): > > The patch above did make it in, which means that it is now fairly easy > to compile Biopython with Numpy instead of Numeric. All you would have > to do is to change the #include statements in > the C-code to #include A more honest answer would have been "no". To compile and _actually use_ NumPy one needs somewhat more than to "probably ... tinker with the Python code". At a minimum one must change the Python import statements, change references to obsoleted function names, fix the broken array boolean operators in MarkovModel.py, and, for 64-bit platforms, fix the width of the dimension data types in cluster.c. These changes are not optional. To make it buildable one also needs to change the distutils setup.py file to get the new header locations. 
And if one ever needs to install Numeric after installing BioPython, one wants a mechanism to avoid segfaults when importing incompatible compiled C modules. In short, one still needs my patch. -- Ed From chris.lasher at gmail.com Wed Mar 21 10:23:58 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 21 Mar 2007 10:23:58 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> Message-ID: <128a885f0703210723q3bc95e41kc587c9077915763d@mail.gmail.com> On 3/21/07, Marc Colosimo wrote: > I've found that svn is more useable than cvs. I especially like the > move command (svn mv), which moves the file(s) and all the associated > information. Thus, you can undo a move and see all the change history > for the file(s). In addition, it handles binary files automagically. Good points. Another +1 for Subversion: it tracks changes concomitantly and maintains a revision across the entire repository, not on a file-by-file basis. The Subversion Book explains this concept better than I can. See and . > The one thing that I don't like at times is that you need to > explicitly set keyword tags (like $Id$). That's true. One thing you can do, however, is configure Subversion to always set keyword substitution on all files of a filetype (e.g., always set keyword substitution for $Revision$ for all ".py" files). > Also, some places block webDAV commands through proxies. > So, keeping a read only cvs would be good. Hmm, I hadn't thought about this. How common is this practice? It's certainly a good argument in favor of maintaining CVS.
(I'm assuming CVS does not do WebDAV). Chris From mhampton at d.umn.edu Wed Mar 21 10:29:20 2007 From: mhampton at d.umn.edu (Marshall Hampton) Date: Wed, 21 Mar 2007 09:29:20 -0500 (CDT) Subject: [Biopython-dev] SAGE project support In-Reply-To: References: Message-ID: Hi, I thought I would alert readers of this list to the fact that I recently asked the project leader for SAGE (Software for Algebra and Geometry Experimentation), William Stein, to add biopython as an optional package for SAGE. In his usual speedy fashion, he did so in a couple of hours! SAGE is a very exciting platform for uniting many open-source projects in mathematics. By leveraging lots of existing code it has progressed extremely rapidly. Currently the main server at the University of Washington is down for maintenance, but there is a mirror (maybe to become the main site) at: www.sagemath.org The screenshots link gives a pretty good idea of what SAGE is currently capable of. By the way, I am currently using biopython heavily in a bioinformatics course and I hope to contribute more to the project in the future. 
Cheers, Marshall Hampton University of Minnesota, Duluth Department of Mathematics and Statistics From mdehoon at c2b2.columbia.edu Thu Mar 22 00:36:12 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Thu, 22 Mar 2007 00:36:12 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> Message-ID: <460207BC.9000907@c2b2.columbia.edu> Maybe this is a silly question but will there be some downtime during the cvs->svn conversion? And can we still make commits to cvs or is it better to wait until the conversion is complete? --Michiel. Chris Lasher wrote: > On 3/18/07, Chris Lasher wrote: >> I don't know, actually, how many "legacy users" we would have. The >> other channel for soliciting feedback, and a very important one, is >> the Biopython user list. That would probably be the best way of >> assessing how many would still need to rely on CVS. Should I send an >> email out to that list, or would one of the more senior developers >> like to do it? > > Silence means assent. =-) I'll post to the Biopython users list if > nobody else wants to and I don't hear any objections in the next day. > > Chris > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mcolosimo at mitre.org Thu Mar 22 06:33:52 2007 From: mcolosimo at mitre.org (Colosimo, Marc E.) 
Date: Thu, 22 Mar 2007 06:33:52 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <460207BC.9000907@c2b2.columbia.edu> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> <460207BC.9000907@c2b2.columbia.edu> Message-ID: I haven't done this, but I would assume that you need to turn off commits to cvs and I think it would be rather quick (depending on the machine), like overnight. I'll ask some people here who might know better. Marc
From chris.lasher at gmail.com Sat Mar 24 12:36:22 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sat, 24 Mar 2007 12:36:22 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <460207BC.9000907@c2b2.columbia.edu> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> <460207BC.9000907@c2b2.columbia.edu> Message-ID: <128a885f0703240936l4cebc2c1j5e8294560b3e7322@mail.gmail.com> On 3/22/07, Michiel de Hoon wrote: > Maybe this is a silly question but will there be some downtime during > the cvs->svn conversion? And can we still make commits to cvs or is it > better to wait until the conversion is complete? Ah, very good question, Michiel. Yes, there will be downtime for the CVS repository. As Marc mentioned, it would probably be one evening. Also, to clarify, once the conversion is complete, CVS commits won't be permitted. If we decide to maintain CVS at all, it will only be possible to check out and update from CVS. (In other words CVS becomes "read-only".) All commits will go through Subversion, and there are two main reasons for this: 1) There is no easy means by which we can keep the two in sync going both ways. The Subversion to CVS synchronization is quite a bit of hackery on its own. 2) Subversion effectively deprecates CVS.
Taken straight from Subversion's site: > Subversion is meant to be a better CVS, so it has most > of CVS's features. Generally, Subversion's interface to > a particular feature is similar to CVS's, except where > there's a compelling reason to do otherwise. So, aside from the natural human propensity for fear of change, why one would want to continue to work via CVS escapes me. I am not keen on supporting CVS with updates after the transition, but if users must have access to it, I will put in the work it takes to make that happen. In any event, we will strongly encourage Biopython users to make the transition to Subversion. I need to draft a migration strategy which includes the following: * documentation for developers on switching over to Subversion, in general (available via the Subversion Book), and in the specific context of Biopython, particularly * documentation for users on how to checkout and update Biopython via Subversion * the nitty gritty technical details of how we will proceed with the conversion--useful for the other Open-Bio projects which will want to follow in Biopython's wake I will begin writing this documentation on the Biopython wiki this weekend. I would like to set a target date for the CVS to Subversion transition for May 20th, which gives about two months' worth of anticipation for developers (and users, once I get an email out to that list), and plenty of time prior to BOSC 2007 for the growing pains caused by the transition to be worked out. How does this sound--any red flags or glaring omissions? 
Chris From mdehoon at c2b2.columbia.edu Mon Mar 26 20:03:49 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Mon, 26 Mar 2007 20:03:49 -0400 Subject: [Biopython-dev] PATCH: NumPy support for BioPython References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com><45F1C218.6000601@c2b2.columbia.edu><1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com><45F2253E.2030909@c2b2.columbia.edu><1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com><45F45950.3090104@c2b2.columbia.edu><1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com><45F81A03.4080504@c2b2.columbia.edu><4600483C.7010507@c2b2.columbia.edu> <1b5a37350703210555n720d4e9bla03207a3f5758833@mail.gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5DA@mail2.exch.c2b2.columbia.edu> Ed wrote: > In short, one still needs my patch. Sorry for my late reply. Could you make your patch available via BugZilla? So that further discussion of this patch can be kept in one place. Preferably a version that is consistent with the recently released version of Biopython, so people can try it out. Thanks, --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Tue Mar 27 09:08:37 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 27 Mar 2007 09:08:37 -0400 Subject: [Biopython-dev] [Bug 2251] New: [PATCH] NumPy support for BioPython Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2251 Summary: [PATCH] NumPy support for BioPython Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: edschofield at gmail.com The following patch adds support for NumPy in addition to Numeric. 
It does so with a thin wrapper layer at the C level and Python level, similar in purpose to (but less ambitious than) the numerix wrapper layer used by matplotlib. The patch is designed to prevent imports of incompatible compiled C modules in the case one installs Numeric after installing BioPython with NumPy support. It also changes the following: - C #include statements - Python import statements - references to obsoleted function names - the width of the dimension data types in cluster.c from int to intp (for 64-bit architectures) - the distutils setup.py file to supply the correct NumPy header locations. - the documentation (updating references to NumPy) It also fixes array boolean operators in MarkovModel.py, which were silently broken before. It applies to BioPython 1.43, as follows: $ tar xzvf biopython-1.43.tar.gz $ cd biopython-1.43 $ patch -p1 < /path/to/biopython-1.43-numpy-support-v5.patch -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Mar 27 09:09:58 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 27 Mar 2007 09:09:58 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703271309.l2RD9wte011941@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #1 from edschofield at gmail.com 2007-03-27 09:09 EST ------- Created an attachment (id=610) --> (http://bugzilla.open-bio.org/attachment.cgi?id=610&action=view) Patch for NumPy support through the oldnumeric interface -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
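The int-to-intp item in the patch description above can be illustrated from Python with the standard ctypes module. This is only a sketch under stated assumptions: `ctypes.c_ssize_t` stands in for a pointer-sized index type such as intp, `ctypes.c_int32` stands in for a 4-byte C int, and the value of `nrows` is made up for the example. None of it is taken from the patch.

```python
import ctypes

# Width of a C int and of a pointer-sized signed index on this platform.
# c_ssize_t is used here as a stand-in for an intp-like type (an assumption).
int_bits = 8 * ctypes.sizeof(ctypes.c_int)
index_bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)

# A hypothetical dimension that needs more than 32 bits.
nrows = 2**32 + 5

# ctypes integer types do no overflow checking, so storing the value in a
# 32-bit int keeps only the low 32 bits, much as passing it to a C function
# that expects a 4-byte int would.
truncated = ctypes.c_int32(nrows).value

print("int: %d bits, index type: %d bits" % (int_bits, index_bits))
print("nrows = %d stored in a 32-bit int becomes %d" % (nrows, truncated))
```

Where the index type is wider than int, dimensions held in the wider type have to stay in that type all the way down the call chain, or be range-checked before they are narrowed.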
From bugzilla-daemon at portal.open-bio.org Wed Mar 28 12:09:57 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Mar 2007 12:09:57 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703281609.l2SG9vFE011216@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-03-28 12:09 EST ------- It looks like there's an error in this patch: @@ -166,7 +184,7 @@ const int colstride = (*array)->strides[1]; for (i=0; i < nrows; i++) { const char* p = p0; - mask[i] = malloc(ncolumns*sizeof(int)); + mask[i] = malloc(ncolumns*sizeof(int*)); for (j=0; j < ncolumns; j++, p+=colstride) mask[i][j] = *((int*)p); p0 += rowstride; } mask is int**, mask[i] is int*, and we're allocating ncolumns integers. Or am I missing something? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 28 13:05:57 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Mar 2007 13:05:57 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703281705.l2SH5vhN014252@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #3 from edschofield at gmail.com 2007-03-28 13:05 EST ------- Oops, my mistake. Revised patch attached below. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
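On the allocation size questioned in comment #2, the difference between sizeof(int) and sizeof(int*) can be checked from Python with the standard ctypes module. This is an illustrative sketch only: `ncolumns` is a made-up value, and the two byte counts simply mirror the two variants of the malloc call quoted in that comment.

```python
import ctypes

# Size in bytes of a C int and of a pointer to int on the current platform.
int_size = ctypes.sizeof(ctypes.c_int)
ptr_size = ctypes.sizeof(ctypes.POINTER(ctypes.c_int))

ncolumns = 10  # hypothetical row length, for illustration only

# Bytes requested for one row of the mask by each variant of the call.
bytes_needed = ncolumns * int_size      # malloc(ncolumns*sizeof(int))
bytes_with_ptr = ncolumns * ptr_size    # malloc(ncolumns*sizeof(int*))

print("sizeof(int) = %d, sizeof(int*) = %d" % (int_size, ptr_size))
print("row of %d ints: %d bytes vs %d bytes"
      % (ncolumns, bytes_needed, bytes_with_ptr))
```

Since each mask[i] holds ncolumns ints, the sizeof(int) form requests exactly what is needed. Where a pointer is wider than an int, the sizeof(int*) form requests more memory than the row uses; where the two sizes coincide the calls are equivalent.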
From bugzilla-daemon at portal.open-bio.org Wed Mar 28 13:06:59 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Mar 2007 13:06:59 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703281706.l2SH6xna014329@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 edschofield at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #610 is|0 |1 obsolete| | ------- Comment #4 from edschofield at gmail.com 2007-03-28 13:06 EST ------- Created an attachment (id=611) --> (http://bugzilla.open-bio.org/attachment.cgi?id=611&action=view) Patch for NumPy support through the oldnumeric interface -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From charles.vejnar at isb-sib.ch Wed Mar 28 12:14:33 2007 From: charles.vejnar at isb-sib.ch (Charles Vejnar) Date: Wed, 28 Mar 2007 18:14:33 +0200 Subject: [Biopython-dev] Dynamic class return in Seq class Message-ID: <200703281814.33891.charles.vejnar@isb-sib.ch> Hi, I would like to build a class which would inherit from Seq class. But in some methods of the Seq class, return is like this : return Seq(s, self.alphabet) which makes my sub-class unusable (I always get a Seq instance instead of my sub-class Seq instance). I would like to avoid a delegation schema. So, is it possible to change the returns from Seq(...) to self.__class__(...) (as it's already done in the __getslice__ method) or are there any reasons I am missing which justify these returns ? 
Best regards, Charles From gbastian at pasteur.fr Thu Mar 29 10:46:59 2007 From: gbastian at pasteur.fr (Giacomo Bastianelli) Date: Thu, 29 Mar 2007 16:46:59 +0200 Subject: [Biopython-dev] ResidueDepth Message-ID: <460BD163.9050509@pasteur.fr> Dear Biopython developers, I am trying to use the ResidueDepth class. I have installed the MSMS module and I get this error: Traceback (most recent call last): File "test.py", line 8, in ? rd = ResidueDepth(model, '1SBC.pdb') File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 132, in __init__ surface=get_surface(pdb_file) File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 83, in get_surface surface=_read_vertex_array(surface_file) File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 51, in _read_vertex_array fp=open(filename, "r") IOError: [Errno 2] No such file or directory: '/tmp/tmpEoynGC.vert' I checked the single programs (msms, pdb_to_xyzr) and they seem to work fine. Thanks for your suggestions! Giacomo From mdehoon at c2b2.columbia.edu Thu Mar 29 19:03:22 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 29 Mar 2007 19:03:22 -0400 Subject: [Biopython-dev] Dynamic class return in Seq class In-Reply-To: <200703281814.33891.charles.vejnar@isb-sib.ch> References: <200703281814.33891.charles.vejnar@isb-sib.ch> Message-ID: <460C45BA.1050801@c2b2.columbia.edu> In principle, I think that using self.__class__ instead of Seq is a good idea. But some Biopython tests fail with this substitution. It looks like these failures are trivial, but we do need to find some solution for them. --Michiel. Charles Vejnar wrote: > Hi, > > I would like to build a class which would inherit from Seq class. But in some > methods of the Seq class, return is like this : > return Seq(s, self.alphabet) > which makes my sub-class unusable (I always get a Seq instance instead of my > sub-class Seq instance). > > I would like to avoid a delegation schema. 
So, is it possible to change the > returns from Seq(...) to self.__class__(...) (as it's already done in the > __getslice__ method) or are there any reasons I am missing which justify > these returns ? > > > Best regards, > Charles > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From gbastian at pasteur.fr Fri Mar 30 04:22:58 2007 From: gbastian at pasteur.fr (Giacomo Bastianelli) Date: Fri, 30 Mar 2007 10:22:58 +0200 Subject: [Biopython-dev] Residue Depth class Message-ID: <1175242979.7228.4.camel@localhost> Dear Biopython developers, I am trying to use the ResidueDepth class. I have installed the MSMS module and I get this error: Traceback (most recent call last): File "test.py", line 8, in ? rd = ResidueDepth(model, '1SBC.pdb') File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 132, in __init__ surface=get_surface(pdb_file) File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 83, in get_surface surface=_read_vertex_array(surface_file) File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 51, in _read_vertex_array fp=open(filename, "r") IOError: [Errno 2] No such file or directory: '/tmp/tmpEoynGC.vert' I have python2.4 with biopython 1.43 in a linux ubuntu OS. I checked the single programs (msms, pdb_to_xyzr) and they seem to work fine. this is the code that I use: ---------------------- from string import * from Bio.PDB import * parser = PDBParser() structure = parser.get_structure('1SBC.pdb', '1SBC.pdb') model = structure[0] rd = ResidueDepth(model, '1SBC.pdb') -------------------------- Thanks for your suggestions! 
Giacomo From bugzilla-daemon at portal.open-bio.org Sat Mar 31 18:52:46 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 31 Mar 2007 18:52:46 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703312252.l2VMqkwR007408@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2007-03-31 18:52 EST ------- In the proposed change to py_kcluster in this patch, using npy_generic_dimension_t for nrows, ncolumns will cause a bug. Depending on the platform, npy_generic_dimension_t may not be an int. But the function kcluster, which is called from py_kcluster, expects nrows, ncolumns to be ints. If, for example, npy_generic_dimension_t is an 8-byte long, nrows will be truncated to a 4-byte int in the call to kcluster. So kcluster may get an incorrect number for nrows, ncolumns. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From edschofield at gmail.com Fri Mar 2 17:52:44 2007 From: edschofield at gmail.com (Ed Schofield) Date: Fri, 2 Mar 2007 17:52:44 +0000 Subject: [Biopython-dev] Offer to convert BioPython to NumPy Message-ID: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> Hi everyone, Last month there was some interest expressed on this list (and the general discussion list) about conversion of the codebase from Numeric to NumPy. I'd like to volunteer to lead this effort. I'm a (minor) NumPy developer and a (less minor) SciPy developer. My main contributions to SciPy have been the maximum entropy module and parts of the sparse matrix module. I've recently moved into the field of computational biology, and I'm happy to see that BioPython exists; I am sure it will save me time.
But I don't want to go back to using Numeric, since NumPy is so much better (and is now the only array package supported by SciPy). I think trying to retain compatibility with Numeric would be unrealistic. But I would hope that a transition to a NumPy-only codebase would be quick (a week or so). If there are any technical problems we are sure to get a quick response on the numpy-discussion list. Bruce, if you're still interested in helping with the porting, we could split up the work. I suggest that we make our changes in a new CVS branch. That way our changes would be unintrusive until the patch-set is ready and tested. -- Ed From mdehoon at c2b2.columbia.edu Fri Mar 2 18:10:50 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 02 Mar 2007 13:10:50 -0500 Subject: [Biopython-dev] Offer to convert BioPython to NumPy In-Reply-To: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> References: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> Message-ID: <45E868AA.60409@c2b2.columbia.edu> Ed Schofield wrote: > Last month there was some interest expressed on this list (and the general > discussion list) about conversion of the codebase from Numeric to NumPy. I'd > like to volunteer to lead this effort. Thanks! I'd be happy to see Biopython to get to work with numpy. > > I think trying to retain compatibility with Numeric would be unrealistic. Why is that so? For example, matplotlib happily supports Numeric, NumPy, and numarray. Given that the latest version of NumPy does not compile out of the box on Cygwin, I'd be very hesitant to drop Numeric support for Biopython. --Michiel. 
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From bsouthey at gmail.com Fri Mar 2 20:36:05 2007 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 2 Mar 2007 14:36:05 -0600 Subject: [Biopython-dev] Offer to convert BioPython to NumPy In-Reply-To: <45E868AA.60409@c2b2.columbia.edu> References: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> <45E868AA.60409@c2b2.columbia.edu> Message-ID: Hi, That would be great because I have not had any time to look into this! Also, I realized that I need to understand more about BioPython first. However, most files appear to only need very small changes but virtually all the uses (except Bio/KDTree/KDTree.py) use the 'from Numeric import *'. Bruce Bio/LogisticRegression.py Bio/MarkovModel.py Bio/MaxEntropy.py Bio/NaiveBayes.py Bio/distance.py Bio/kNN.py Bio/Affy/CelFile.py Bio/Cluster/__init__.py Bio/KDTree/KDTree.py Bio/PDB/Atom.py Bio/PDB/Entity.py Bio/PDB/FragmentMapper.py Bio/PDB/MMCIFParser.py Bio/PDB/Model.py Bio/PDB/NeighborSearch.py Bio/PDB/PDBParser.py Bio/PDB/Polypeptide.py Bio/PDB/ResidueDepth.py Bio/PDB/Superimposer.py Bio/PDB/Superimposer.py Bio/PDB/Vector.py Bio/PDB/Vector.py Bio/SVDSuperimposer/SVDSuperimposer.py Bio/SVDSuperimposer/SVDSuperimposer.py Bio/Statistics/lowess.py On 3/2/07, Michiel Jan Laurens de Hoon wrote: > Ed Schofield wrote: > > Last month there was some interest expressed on this list (and the general > > discussion list) about conversion of the codebase from Numeric to NumPy. I'd > > like to volunteer to lead this effort. > Thanks! I'd be happy to see Biopython to get to work with numpy. > > > > I think trying to retain compatibility with Numeric would be unrealistic. > Why is that so? For example, matplotlib happily supports Numeric, NumPy, > and numarray. 
> > Given that the latest version of NumPy does not compile out of the box > on Cygwin, I'd be very hesitant to drop Numeric support for Biopython. > > --Michiel. > > > -- > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1130 St Nicholas Avenue > New York, NY 10032 > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From edschofield at gmail.com Mon Mar 5 17:42:56 2007 From: edschofield at gmail.com (Ed Schofield) Date: Mon, 5 Mar 2007 17:42:56 +0000 Subject: [Biopython-dev] Offer to convert BioPython to NumPy In-Reply-To: <45E89B41.1090508@c2b2.columbia.edu> References: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> <45E868AA.60409@c2b2.columbia.edu> <1b5a37350703021124n6e7b759cg6d0ee69e3cd3a68e@mail.gmail.com> <45E89B41.1090508@c2b2.columbia.edu> Message-ID: <1b5a37350703050942x27283412u3067c1284db0b0e0@mail.gmail.com> On 3/2/07, Michiel Jan Laurens de Hoon wrote: > > > If this is really necessary, the easiest way to proceed may be to use > > NumPy's "oldnumeric" interface, rather than porting properly. > > For the reasons above, this is really necessary. But will oldnumeric be > around in the future? Or is it a temporary measure to making porting > easier? I think Travis wants to keep the oldnumeric interface around for at least a few years -- long enough, I imagine, for most actively developed projects to have been ported to NumPy. I've started work on a simple wrapper layer for Biopython to use either Numeric or numpy.oldnumeric. I'll post more details soon. 
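The wrapper layer described above could look roughly like the following at the Python level. This is a minimal sketch, not the code from the patch: the helper name `import_first_available` and the list name `ARRAY_BACKENDS` are hypothetical, and the candidate order (Numeric first, then numpy.oldnumeric) is an assumption that simply mirrors the two backends named in this thread.

```python
import importlib


def import_first_available(candidates):
    """Return (name, module) for the first importable module in candidates.

    Hypothetical helper for illustration; raises ImportError if none of the
    named modules can be imported.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError("none of %s could be imported" % ", ".join(candidates))


# Backends named in this thread, in an assumed order of preference.
ARRAY_BACKENDS = ["Numeric", "numpy.oldnumeric"]

try:
    backend_name, backend = import_first_available(ARRAY_BACKENDS)
except ImportError:
    backend_name, backend = None, None  # neither array package is installed

print("array backend:", backend_name)
```

The rest of the code base would then import its array functions from this one module rather than from Numeric directly, so the choice of backend is made in a single place.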
From mdehoon at c2b2.columbia.edu Mon Mar 5 18:17:28 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Mon, 05 Mar 2007 13:17:28 -0500 Subject: [Biopython-dev] Offer to convert BioPython to NumPy In-Reply-To: <1b5a37350703050942x27283412u3067c1284db0b0e0@mail.gmail.com> References: <1b5a37350703020952j66a6a249ga53e059524782b7b@mail.gmail.com> <45E868AA.60409@c2b2.columbia.edu> <1b5a37350703021124n6e7b759cg6d0ee69e3cd3a68e@mail.gmail.com> <45E89B41.1090508@c2b2.columbia.edu> <1b5a37350703050942x27283412u3067c1284db0b0e0@mail.gmail.com> Message-ID: <45EC5EB8.2050708@c2b2.columbia.edu> Ed Schofield wrote: > I've started work on a simple wrapper layer for Biopython to use either > Numeric or numpy.oldnumeric. I'll post more details soon. Thanks Ed! I looked at the definitions in oldnumeric.h. It turned out that only two of them are actually used in Biopython: #define CONTIGUOUS NPY_CONTIGUOUS and #undef import_array #define import_array() { if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); } } So it appears that the compatibility problem of Biopython and numpy may not be as big as it seemed at first, at least as far as the C-code is concerned. About the import_array definition: Do you know why it appears in oldnumeric.h? As the exact same definition appears in numpy/core/code_generators/generate_array_api.py, I would think that there is no need for it in oldnumeric.h. --Michiel. 
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Tue Mar 6 11:18:09 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 06 Mar 2007 11:18:09 +0000 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45E237A6.4040801@c2b2.columbia.edu> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> Message-ID: <45ED4DF1.90609@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> SequenceIterator(handle, format) >> SequencesToDict(sequences, key_function=None) >> SequencesToAlignment(sequences, ...) >> WriteSequences(sequences, handle, format) >> >> Does anyone want to suggest different names for these functions? Only Michiel has replied, so I assume there are no other strong views on the dev mailing list. Michiel de Hoon wrote: > I would prefer >>>> from Bio import SeqIO >>>> SeqIO.read(handle, format) >>>> SeqIO.write(sequences, handle, format) You have persuaded me to rename SequenceIterator and WriteSequences. BioPython uses the term "parser" all over the place. Doing a quick search of the code, I found ten files with "def read(" and fifty eight with "def parse(" - so I would rather have "parse" than "read". That would give us the core functions: Bio.SeqIO.parse(handle, format) Bio.SeqIO.write(sequences, handle, format) or, Bio.SeqIO.read(handle, format) Bio.SeqIO.write(sequences, handle, format) I'll let you [Michiel] make the call. Say the word and I'll update the code and the wiki today. Are you happy with Bio.SeqIO.SequencesToDict(...) name? I think we should keep Bio.SeqIO.SequencesToAlignment(...) for the time being, until we do some work on the Bio.Align class. I don't think we should tackle this before the next release. I'm happy to document this particular function as "experimental/beta" and liable to be removed or replaced in future. 
After the renaming, I would say the Bio.SeqIO code is OK for release. After BioPython 1.43 is out, I would like to mark the old code in Bio/SeqIO/FASTA.py and Bio/SeqIO/generic.py as depreciated. Peter From lpritc at scri.ac.uk Tue Mar 6 11:45:01 2007 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Tue, 06 Mar 2007 11:45:01 +0000 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45ED4DF1.90609@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> Message-ID: <1173181501.19889.79.camel@lplinuxdev.scri.sari.ac.uk> On Tue, 2007-03-06 at 11:18 +0000, Peter wrote: > That would give us the core functions: > > Bio.SeqIO.parse(handle, format) > Bio.SeqIO.write(sequences, handle, format) > > or, > > Bio.SeqIO.read(handle, format) > Bio.SeqIO.write(sequences, handle, format) > > I'll let you [Michiel] make the call. Say the word and I'll update the > code and the wiki today. +1 for Bio.SeqIO.parse(handle, format) - I too think it's more consistent with the existing parser behaviours. L. -- Dr Leighton Pritchard AMRSC D131, Plant Pathology, Scottish Crop Research Institute W: http://bioinf.scri.ac.uk/lp E: lpritc at scri.ac.uk GPG: 0xE58BA41B
From mdehoon at c2b2.columbia.edu Tue Mar 6 18:40:30 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Tue, 06 Mar 2007 13:40:30 -0500 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45ED4DF1.90609@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> Message-ID: <45EDB59E.2000407@c2b2.columbia.edu> Peter wrote: > That would give us the core functions: > > Bio.SeqIO.parse(handle, format) > Bio.SeqIO.write(sequences, handle, format) That sounds good to me. > I'll let you [Michiel] make the call. Say the word and I'll update the > code and the wiki today. Just to avoid any misunderstanding: While I have been in charge of building the Biopython releases, unfortunately that doesn't come with any official decision-making power :-(. > Are you happy with Bio.SeqIO.SequencesToDict(...) name? Well I think that this function is not so essential as Bio.SeqIO.parse and Bio.SeqIO.write. So I'll let you decide. > I think we should keep Bio.SeqIO.SequencesToAlignment(...) for the time > being, until we do some work on the Bio.Align class. I don't think we > should tackle this before the next release. I'm happy to document this > particular function as "experimental/beta" and liable to be removed or > replaced in future. OK. > After the renaming, I would say the Bio.SeqIO code is OK for release.
OK then I'll try for the Bronx-release (1.43) for sometime during next week. If we find some issues with the new code after this release, we can do another release (code-named Queens) shortly after. I'll get started on updating the documentation for the new Bio.Blast parsers. > After BioPython 1.43 is out, I would like to mark the old code in > Bio/SeqIO/FASTA.py and Bio/SeqIO/generic.py as depreciated. As far as I'm concerned, you can also deprecate them before this release. This will encourage people to start using Bio.SeqIO, and improve our chances of finding any remaining problems. Thanks for all your work on Bio.SeqIO, and for involving the Biopython community in its development. I think Bio.SeqIO is a major improvement for Biopython. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Tue Mar 6 22:31:44 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 06 Mar 2007 22:31:44 +0000 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EDB59E.2000407@c2b2.columbia.edu> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> Message-ID: <45EDEBD0.1060000@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Peter wrote: >> That would give us the core functions: >> >> Bio.SeqIO.parse(handle, format) >> Bio.SeqIO.write(sequences, handle, format) > > That sounds good to me. Done. I have also updated the wiki: http://www.biopython.org/wiki/SeqIO >> Are you happy with Bio.SeqIO.SequencesToDict(...) name? > > Well I think that this function is not so essential as Bio.SeqIO.parse > and Bio.SeqIO.write. So I'll let you decide. > >> I think we should keep Bio.SeqIO.SequencesToAlignment(...) for the time >> being, until we do some work on the Bio.Align class.
I don't think we >> should tackle this before the next release. I'm happy to document this >> particular function as "experimental/beta" and liable to be removed or >> replaced in future. > > OK. I was thinking tonight, after updating CVS, that perhaps we should try and find some shorter (lower case) names for "SequencesToDict" and "SequencesToAlignment"... something like "toDict" and "toAlignment", or "as_dict" and "as_alignment" might look nicer. e.g. from Bio import SeqIO my_dict = SeqIO.toDict(SeqIO.parse(handle, format)) rather than this, which looks clumsy and inconsistent: from Bio import SeqIO my_dict = SeqIO.SequencesToDict(SeqIO.parse(handle, format)) >> After the renaming, I would say the Bio.SeqIO code is OK for release. > > OK then I'll try for the Bronx-release (1.43) for sometime during next > week. If we find some issues with the new code after this release, we > can do another release (code-named Queens) shortly after. I have started looking over the other existing sequence parsers in BioPython with a view to adding some of them into the SeqIO framework (after the Bronx 1.43 release): http://www.biopython.org/wiki/SeqIO_dev Note to self (or anyone bored), I should actually write something about the SeqRecord class: http://www.biopython.org/wiki/SeqRecord >> After BioPython 1.43 is out, I would like to mark the old code in >> Bio/SeqIO/FASTA.py and Bio/SeqIO/generic.py as depreciated. > > As far as I'm concerned, you can also deprecate them before this > release. This will encourage people to start using Bio.SeqIO, and > improve our chances of finding any remaining problems. True - but I will be away for a bit (end of March, early April) so I wouldn't like to encourage too many people, and then not be here to help them. Maybe I should try and draft something for the release notes, along the lines of "Beta software - please try it and give us feedback".
Peter From mdehoon at c2b2.columbia.edu Wed Mar 7 03:04:52 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Tue, 06 Mar 2007 22:04:52 -0500 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EDEBD0.1060000@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> Message-ID: <45EE2BD4.7090400@c2b2.columbia.edu> Peter wrote: > I was thinking tonight, after updating CVS, that perhaps we should try > and find some shorter (lower case) names for "SequencesToDict" and > "SequencesToAlignment"... something like "toDict" and "toAlignment", or > "as_dict" and "as_alignment" might looks nicer. e.g. > > from Bio import SeqIO > my_dict = SeqIO.toDict(SeqIO.parse(handle, format)) ... There may be a simple solution to this. Note that a dictionary can be created by specifying a list of [key, value] pairs: >>> dict([['a','A'],['b','B'],['c','C']]) {'a': 'A', 'c': 'C', 'b': 'B'} This also works with an iterator: >>> def f(text): for character in text: yield [character, character.upper()] >>> dict(f("abcd")) {'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'} Now, if we let SeqRecord inherit from list, we can make it behave as a [record.id, record] list. Normally, this would not be visible to the user, in the sense that a user who doesn't know that SeqRecord inherits from list wouldn't notice that it does. The upshot is that we can now create a dictionary like this: >>> d = dict(SeqIO.parse(handle, format)) without any changes to Bio.SeqIO. Two things get lost here: 1) We can't have a key_function to change how to choose the key. 2) We're no longer checking if all keys are different. This can be fixed by saving the keys in the parser function and raising an exception if two identical keys are found. 
This implies though that the same exception is raised in all use cases of SeqIO.parse, which may not be what we want. --Michiel From chris.lasher at gmail.com Wed Mar 7 04:00:01 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Tue, 6 Mar 2007 23:00:01 -0500 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45ED4DF1.90609@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> Message-ID: <128a885f0703062000m20370027pc63e7fd858ae12e2@mail.gmail.com> On 3/6/07, Peter wrote: > Only Michiel has replied, so I assume there are no other strong views on > the dev mailing list. If you're still soliciting opinions, here's mine: I am really fond of the SeqIO.parse function, rather than SeqIO.read, since read is a builtin function. Different namespaces, but parse is unambiguous. For remaining function/method names, I would really prefer to stick to the PEP 8 style guide, which specifies: > Function Names > > Function names should be lowercase, with words separated by underscores > as necessary to improve readability. > > mixedCase is allowed only in contexts where that's already the > prevailing style (e.g. threading.py), to retain backwards compatibility. e.g., to_dict() instead of toDict(). Thanks again for the work on SeqIO! 
Chris From biopython-dev at maubp.freeserve.co.uk Wed Mar 7 10:35:05 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 07 Mar 2007 10:35:05 +0000 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <128a885f0703062000m20370027pc63e7fd858ae12e2@mail.gmail.com> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <128a885f0703062000m20370027pc63e7fd858ae12e2@mail.gmail.com> Message-ID: <45EE9559.2020500@maubp.freeserve.co.uk> Chris Lasher wrote: > On 3/6/07, Peter wrote: >> Only Michiel has replied, so I assume there are no other strong views on >> the dev mailing list. > > If you're still soliciting opinions, here's mine: I am really fond of > the SeqIO.parse function, rather than SeqIO.read, since read is a > builtin function. Different namespaces, but parse is unambiguous. We have gone for Bio.SeqIO.parse and Bio.SeqIO.write. Is read really a built-in function? It's not on this list: http://docs.python.org/lib/built-in-funcs.html > For remaining function/method names, I would really prefer to stick to > the PEP 8 style guide, which specifies: > >> Function Names >> >> Function names should be lowercase, with words separated by underscores >> as necessary to improve readability. >> >> mixedCase is allowed only in contexts where that's already the >> prevailing style (e.g. threading.py), to retain backwards compatibility. > > e.g., to_dict() instead of toDict(). Any views on to_dict versus as_dict, to_alignment versus as_alignment? As an aside, we should really have a "coding styles" page on the wiki somewhere, and by default I would also reference PEP 8: http://www.python.org/dev/peps/pep-0008/ (And I should probably go through the new SeqIO code and make sure it complies!)
Peter From biopython-dev at maubp.freeserve.co.uk Wed Mar 7 10:43:36 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 07 Mar 2007 10:43:36 +0000 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EE2BD4.7090400@c2b2.columbia.edu> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> <45EE2BD4.7090400@c2b2.columbia.edu> Message-ID: <45EE9758.1050902@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Note that a dictionary can be created by specifying a list of [key, > value] pairs: > > >>> dict([['a','A'],['b','B'],['c','C']]) > {'a': 'A', 'c': 'C', 'b': 'B'} > > This also works with an iterator: > >>> def f(text): > for character in text: > yield [character, character.upper()] > >>> dict(f("abcd")) > {'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'} > > Now, if we let SeqRecord inherit from list, we can make it behave as a > [record.id, record] list. Normally, this would not be visible to the > user, in the sense that a user who doesn't know that SeqRecord inherits > from list wouldn't notice that it does. > > The upshot is that we can now create a dictionary like this: > >>> d = dict(SeqIO.parse(handle, format)) > without any changes to Bio.SeqIO. That is clever... > Two things get lost here: > 1) We can't have a key_function to change how to choose the key. > 2) We're no longer checking if all keys are different. This can be fixed > by saving the keys in the parser function and raising an exception if > two identical keys are found. This implies though that the same > exception is raised in all use cases of SeqIO.parse, which may not be > what we want. Sadly not ideal. Also, wouldn't this prevent us making a SeqRecord inherit from Seq (another interesting idea you proposed in the past)? And for Seq objects, they could behave a little more like a string, or a list of letters. 
It might be nice to be able to splice a SeqRecord and get a new SeqRecord with the appropriate sub-sequence... I have been thinking about a "RichSeqRecord" subclass of SeqRecord which would support sequence level annotation (e.g. secondary structure). In this situation, when requesting a sub record, the appropriate sub set of the secondary structure information should also be extracted. e.g. The pfam/stockholm alignment format can hold strings the same length as the sequences which contain "per sequence per character" information like secondary structure. We could also load a PDB file in this way, and provide a list of residue objects (including the atom coordinates) in parallel with the sequence. Peter From mdehoon at c2b2.columbia.edu Wed Mar 7 20:50:59 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 07 Mar 2007 15:50:59 -0500 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EE9758.1050902@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> <45EE2BD4.7090400@c2b2.columbia.edu> <45EE9758.1050902@maubp.freeserve.co.uk> Message-ID: <45EF25B3.1050206@c2b2.columbia.edu> Peter wrote: >> The upshot is that we can now create a dictionary like this: >> >>> d = dict(SeqIO.parse(handle, format)) >> without any changes to Bio.SeqIO. > > That is clever... > >> Two things get lost here: >> 1) We can't have a key_function to change how to choose the key. >> 2) We're no longer checking if all keys are different. This can be >> fixed by saving the keys in the parser function and raising an >> exception if two identical keys are found. This implies though that >> the same exception is raised in all use cases of SeqIO.parse, which >> may not be what we want. > > Sadly not ideal. 
About 2): It may be a good idea to add a keyword allow_identical_keys (probably a better name is needed here), False by default, in SeqIO.parse to specify if SeqIO.parse should raise an exception if two records with an identical record.id are found. Whereas this is more of a problem when creating a dictionary, I think that this is also relevant in general. Note though that if SeqIO.parse checks for identical keys automatically, there is not much left to do for SeqIO.to_dict. Btw, a to_dict function may fit in better with Bio.SeqRecord, as it is not specifically related to sequence file IO. > Also, wouldn't this prevent us making a SeqRecord > inherit from Seq (another interesting idea you proposed in the past)? Not necessarily; there are two ways to avoid this: A) SeqRecord could inherit both from list and from Seq; B) Instead of letting SeqRecord inherit from list, we could add a next() and __iter__ method to the SeqRecord class (returning record.id and record, and then StopIteration); this will also let us create a dictionary with dict(SeqIO.parse(handle, format)). --Michiel. 
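Option B above can be sketched in a few lines. This is only an illustration under stated assumptions: the Record class and the parse() generator below are hypothetical stand-ins, not Bio.SeqRecord or Bio.SeqIO.parse, and the ids and sequences are made up. It relies on the fact that dict() accepts any iterable whose items each yield exactly two values.

```python
class Record:
    """Hypothetical stand-in for SeqRecord; not the Biopython class."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        # Iterating a record yields exactly two items, so dict() can
        # treat each record as a (key, value) pair.
        yield self.id
        yield self


def parse(pairs):
    """Stand-in for SeqIO.parse: yields one Record per (id, seq) pair."""
    for identifier, seq in pairs:
        yield Record(identifier, seq)


# dict() consumes the generator and unpacks each record into key and value.
records = dict(parse([("alpha", "ACGT"), ("beta", "GGCC")]))
print(sorted(records))
print(records["alpha"].seq)
```

Two consequences are visible in this sketch: a repeated id silently overwrites the earlier record, and __iter__ on the record is used up by the (id, record) protocol, so it cannot also iterate over residues.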
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Wed Mar 7 23:16:48 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 07 Mar 2007 23:16:48 +0000 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EF25B3.1050206@c2b2.columbia.edu> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> <45EE2BD4.7090400@c2b2.columbia.edu> <45EE9758.1050902@maubp.freeserve.co.uk> <45EF25B3.1050206@c2b2.columbia.edu> Message-ID: <45EF47E0.9070009@maubp.freeserve.co.uk> I have renamed SequenceToDict and SequencesToAlignment as to_dict and to_alignment, which as Chris Lasher pointed out follows the PEP 8 Python style guide. While there may be better places for these two functions to live, leaving them in SeqIO seems reasonable to me. Still - if we do want to move them (or remove them) in the near future it would be better to do this before releasing BioPython 1.43. Other than that, I think Bio.SeqIO is "ready" for its first release. Michiel Jan Laurens de Hoon wrote: > It may be a good idea to add a keyword allow_identical_keys (probably a > better name is needed here), False by default, in SeqIO.parse to specify > if SeqIO.parse should raise an exception if two records with an > identical record.id are found. Whereas this is more of a problem when > creating a dictionary, I think that this is also relevant in general. I'm not very keen on this "allow_identical_keys" option for SeqIO.parse(). However, I think we could do that in the SeqIO.parse function itself (rather than repeating the code many times for each underlying parser). One catch is that the exception would get raised once a duplicate is found - possibly after the user has already processed the first half of the file.
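The catch described above is easy to see in a small sketch. This is not the Bio.SeqIO implementation: check_unique_ids() and the Record namedtuple are hypothetical names introduced here, and the sample ids are made up. The wrapper sits around any record iterator, so the check lives in one place rather than in each parser, but the exception only surfaces once the duplicate is reached.

```python
from collections import namedtuple

# Hypothetical minimal record type; only the id matters for this sketch.
Record = namedtuple("Record", ["id", "seq"])


def check_unique_ids(records):
    """Yield records unchanged, raising ValueError on a repeated id."""
    seen = set()
    for record in records:
        if record.id in seen:
            raise ValueError("Duplicate record id: %s" % record.id)
        seen.add(record.id)
        yield record


sample = [Record("a", "ACGT"), Record("b", "GGCC"),
          Record("a", "TTAA"), Record("c", "CCGG")]

processed = []
message = None
try:
    for record in check_unique_ids(sample):
        processed.append(record.id)  # the caller has already used these
except ValueError as error:
    message = str(error)

print(processed, message)
```

Here the first two records have been handed to the caller before the duplicate is detected, which is exactly the half-processed-file situation; a caller that needs all-or-nothing behaviour would have to read the whole file first.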
>> Also, wouldn't this prevent us making a SeqRecord >> inherit from Seq (another interesting idea you proposed in the past)? > > Not necessarily; there are two ways to avoid this: > A) SeqRecord could inherit both from list and from Seq; > B) Instead of letting SeqRecord inherit from list, we could add a next() > and __iter__ method to the SeqRecord class (returning record.id and > record, and then StopIteration); this will also let us create a > dictionary with dict(SeqIO.parse(handle, format)). I think I didn't make myself clear. I wanted to reserve the __iter__ method of the SeqRecord class for use like this:

    for residue in record :
        # assuming residue is also a SeqRecord object
        print residue.seq.tostring()

and similarly for __iter__ of a Seq class:

    for residue in seq :
        # assuming residue is also a Seq object
        print residue.tostring()

To me this syntax seems very natural, but it does seem to block your clever dict() plan. Peter From mdehoon at c2b2.columbia.edu Thu Mar 8 16:17:58 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 08 Mar 2007 11:17:58 -0500 Subject: [Biopython-dev] Bio.SeqIO In-Reply-To: <45EF47E0.9070009@maubp.freeserve.co.uk> References: <45E0CD30.4060108@maubp.freeserve.co.uk> <45E237A6.4040801@c2b2.columbia.edu> <45ED4DF1.90609@maubp.freeserve.co.uk> <45EDB59E.2000407@c2b2.columbia.edu> <45EDEBD0.1060000@maubp.freeserve.co.uk> <45EE2BD4.7090400@c2b2.columbia.edu> <45EE9758.1050902@maubp.freeserve.co.uk> <45EF25B3.1050206@c2b2.columbia.edu> <45EF47E0.9070009@maubp.freeserve.co.uk> Message-ID: <45F03736.7080300@c2b2.columbia.edu> Peter wrote: > I have renamed SequenceToDict and SequencesToAlignment as to_dict and > to_alignment, which as Chris Lasher pointed out follows the PEP8 python > style guide. OK, fair enough. --Michiel.
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Thu Mar 8 16:47:32 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 08 Mar 2007 11:47:32 -0500 Subject: [Biopython-dev] Biopython release coming up Message-ID: <45F03E24.4080407@c2b2.columbia.edu> Hi everybody, The Biopython release (1.43, code-named "Bronx") is coming up. This release will include the new Bio.SeqIO code as well as the new Blast parser. I'm planning to finish the release during the weekend of March 17/18, so about ten days from now. Some files have been added to or removed from Biopython recently, so it may be useful to check out a fresh copy of Biopython from CVS. The Biopython tests in CVS all pass, so things are looking good. However, Bugzilla currently lists 17 bugs, so please have a look to see if there's something we can do about them. If you have some code sitting around, now would be a good time to commit it to CVS. However, if you are not sure if your code is ready for prime time, please hold off until after this release. Thanks everybody for your contributions to Biopython. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Thu Mar 8 18:29:25 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Mar 2007 13:29:25 -0500 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703081829.l28ITP45001122@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-08 13:29 EST ------- [I have tried to reassign this to the mailing list...] Could someone familiar with BioSQL take a look at this please?
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 8 18:45:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Mar 2007 13:45:47 -0500 Subject: [Biopython-dev] [Bug 2225] New: Do something with the PROJECT line in GenBank files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2225

           Summary: Do something with the PROJECT line in GenBank files
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

See also bug 1946 where the introduction of this line broke the parser. At the moment the PROJECT line is ignored. Quoting: ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
-------------------------------------------------
1.4 Upcoming Changes

1.4.1 Multiple identifiers for the PROJECT line

The recently-introduced PROJECT linetype (see Section 3.4.7.2) provides a way to link GenBank sequences that are part of a sequencing project to the Entrez Genome Project database, where further details about the project can be found. As of June 2007, multiple identifiers will be valid for the PROJECT line. Here is a mocked-up example of the expected usage:

LOCUS       AANA01000001               2 rc    DNA     linear   BCT 09-FEB-2007
DEFINITION  Polaribacter dokdonensis MED152 whole genome shotgun sequencing
            project.
ACCESSION   AANA01000001
VERSION     AANA01000001.1  GI:85822094
PROJECT     GenomeProject:13543  GenomeProject:99999

There are several situations in which a record could be considered part of two different Genome Projects.
For example, consider an environmental-sampling metagenomic WGS project for which the individual sequence-overlap contigs are not attributed to a specific organism. A Genome Project could exist that provides further details about the sequencing effort, the centers involved, etc. If, in subsequent assembly and annotation phases, scaffold/super-contig/ chromosomal records are created which **are** attributed to a specific organism, then those CON-division records could have two Genome Project IDs: one for the WGS sequencing project as a whole; and a second for organism- specific Genome Projects. Additional examples illustrating the use of multiple Genome Project IDs will be provided in future release notes, and via the GenBank listserv. ------------------------------------------------- End quote For the RecordParser, storing this line as a string should be fine (?) However, for the FeatureParser, which turns the data into a SeqRecord, perhaps this data should be held in the annotation as a list of strings: ['GenomeProject:13543', 'GenomeProject:99999'] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
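The suggestion above, holding the PROJECT data as a list of strings, could look roughly like the following. This is a minimal sketch, not the code in Bio/GenBank/Scanner.py: parse_project_line() is a hypothetical helper, and it assumes the identifiers are separated by whitespace, as in the mocked-up record quoted from the release notes.

```python
def parse_project_line(line):
    """Split a GenBank PROJECT line into a list of identifier strings.

    Hypothetical helper for illustration; assumes whitespace-separated
    identifiers following the PROJECT keyword.
    """
    fields = line.split()
    if not fields or fields[0] != "PROJECT":
        raise ValueError("Not a PROJECT line: %r" % line)
    # Everything after the keyword is one identifier per field.
    return fields[1:]


project_ids = parse_project_line(
    "PROJECT     GenomeProject:13543  GenomeProject:99999")
print(project_ids)
```

A record with a single identifier gives a one-element list under this scheme, so code consuming the annotation would not need to special-case the pre-June-2007 format.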
From bugzilla-daemon at portal.open-bio.org Thu Mar 8 18:46:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Mar 2007 13:46:30 -0500 Subject: [Biopython-dev] [Bug 1946] Parsing GenBank Files - unknown line type PROJECT In-Reply-To: Message-ID: <200703081846.l28IkUYF002586@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1946 biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-08 13:46 EST ------- You can download the example here, which the reporter saved as 'bug.gb': http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi??db=nucleotide&val=NC_001416 Using CVS, the original sample script now runs fine:

    from Bio import GenBank
    feature_parser = GenBank.FeatureParser()
    gb_record = feature_parser.parse(open('bug.gb','r'))

As does something similar using Bio.SeqIO, e.g.

    from Bio import SeqIO
    records = list(SeqIO.parse(open('bug.gb','rU'),"genbank"))
    assert len(records) == 1
    gb_record = records[0]

In both cases, the new GenBankScanner class in Bio/GenBank/Scanner.py will silently ignore the "PROJECT" line, unless run in debug mode. I have filed Bug 2225 to cover doing something useful with the project data. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Mar 8 18:51:10 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Mar 2007 13:51:10 -0500 Subject: [Biopython-dev] [Bug 1999] new frame translation method In-Reply-To: Message-ID: <200703081851.l28IpApE003050@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1999 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-08 13:51 EST ------- Created an attachment (id=583) --> (http://bugzilla.open-bio.org/attachment.cgi?id=583&action=view) Marc's frameTranslations.py rescued from the mailing list 15 May 2006 Quote: > Man this bugzilla doesn't have an option to up load files. > The file can be found on the dev-list. Actually it does, but oddly you can only attach files once the bug has been created. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Thu Mar 8 19:13:20 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 08 Mar 2007 19:13:20 +0000 Subject: [Biopython-dev] Administration of Bugzilla Message-ID: <45F06050.6050404@maubp.freeserve.co.uk> Do any of our regular readers have administrator access to BugZilla? The "version" field should really be updated to include at least 1.42 (the current release) and 1.43 (due soon). At the moment we have the somewhat dated choices of 1.00a4, 1.10, 1.24 and "Not applicable". While we are at it, I would also like to extend the "Components" list. Adding entries for the PDB, Nexus, and SeqIO (or maybe "Sequence Files" in general) might be nice. 
Peter From edschofield at gmail.com Fri Mar 9 18:35:51 2007 From: edschofield at gmail.com (Ed Schofield) Date: Fri, 9 Mar 2007 18:35:51 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython Message-ID: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> On 3/5/07, Ed Schofield wrote: > > > I've started work on a simple wrapper layer for Biopython to use either > Numeric or numpy.oldnumeric. I'll post more details soon. > I've now finished the first version of a patch to add support for NumPy in addition to Numeric. I'll try to attach it to this message; you can also get it from http://edschofield.com/biopython-numpy-support.patch. It applies cleanly to the current CVS version using: patch -p0 < biopython-numpy-support.patch in the root biopython/ directory. The main difficulty I had to overcome was with C extensions, particularly the Cluster module. This is because NumPy defines array dimensions and strides as intp types, whereas Numeric defines them as int, which differs on 64-bit platforms. Some tests fail; these failures all result from overly fragile expectations on the formatting of the output. The tests should be updated, but I haven't done this with this patch. All other tests pass with both NumPy and Numeric on my machine. MarkovModel.py had a bug in its setting of p_initial, p_transition, and p_emission; it made an incorrect assumption about the behaviour of the Python logical "or" operation when applied to Numeric arrays, which is somewhat broken. I've tried to fix it, but someone familiar with MarkovModel.py should look over the relevant lines (176-184) to be sure I haven't changed the intended behaviour. I'd like to continue contributing to BioPython. Whom should I contact about CVS write access? -- Ed -------------- next part -------------- A non-text attachment was scrubbed... 
Name: biopython-numpy-support.patch Type: text/x-patch Size: 48870 bytes Desc: not available URL: From mdehoon at c2b2.columbia.edu Fri Mar 9 18:51:55 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 09 Mar 2007 13:51:55 -0500 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> Message-ID: <45F1ACCB.1030704@c2b2.columbia.edu> Ed Schofield wrote: > I've now finished the first version of a patch to add support for NumPy in > addition to Numeric. I'll try to attach it to this message; you can also > get > it from http://edschofield.com/biopython-numpy-support.patch. ... Thanks, Ed. Quick question: The patch #includes numpy/oldnumeric.h for Python <--> C glue code that uses Numeric. Why is this needed? --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From edschofield at gmail.com Fri Mar 9 19:22:13 2007 From: edschofield at gmail.com (Ed Schofield) Date: Fri, 9 Mar 2007 19:22:13 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> Message-ID: <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> On 3/9/07, Michiel Jan Laurens de Hoon wrote: > > Ed Schofield wrote: > > I've now finished the first version of a patch to add support for NumPy > in > > addition to Numeric. I'll try to attach it to this message; you can also > > get > > it from http://edschofield.com/biopython-numpy-support.patch. ... > > Thanks, Ed. > > Quick question: > The patch #includes numpy/oldnumeric.h for Python <--> C glue code that > uses Numeric. 
Why is this needed? For the CONTIGUOUS and import_array() definitions right now. As you pointed out earlier in the thread, these are only used a couple of times. But including the header is simpler than pasting these two definitions into the C source files and should maximize compatibility should they be extended further in the future. I haven't yet got around to answering your question about import_array() ... Do you know why it appears in oldnumeric.h? As the exact same definition > appears in numpy/core/code_generators/generate_array_api.py, I would > think that there is no need for it in oldnumeric.h . > ... because I don't know how the code generation works within NumPy. But I don't think extension modules will ever use NumPy's internal code generators; they just need the headers. -- Ed From mdehoon at c2b2.columbia.edu Fri Mar 9 20:22:48 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 09 Mar 2007 15:22:48 -0500 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> Message-ID: <45F1C218.6000601@c2b2.columbia.edu> Ed Schofield wrote: >> Quick question: >> The patch #includes numpy/oldnumeric.h for Python <--> C glue code that >> uses Numeric. Why is this needed? > > For the CONTIGUOUS and import_array() definitions right now. For the CONTIGUOUS definition, there's a simpler solution. CONTIGUOUS is currently only used in Bio/Cluster/clustermodule.h, and it's only used as follows: PyArrayObject* array; if (array->flags & CONTIGUOUS)... 
Now, there is a macro in arrayobject.h, both in Numeric and in NumPy, that deals exactly with this situation: In Numeric: #define PyArray_ISCONTIGUOUS(m) ((m)->flags & CONTIGUOUS) In numpy (imported via ndarrayobject.h): #define PyArray_ISCONTIGUOUS(m) PyArray_CHKFLAGS(m, NPY_CONTIGUOUS) So, if we use this macro instead of CONTIGUOUS directly, we can avoid using oldnumeric.h. Or am I missing something? > I haven't yet got around to answering your question about import_array() ... > ... because I don't know how the code generation works within NumPy. Yeah I know, I don't think that code generation in NumPy was a good idea. It makes it too hard to figure out what is going on. > But I don't think extension modules will ever use NumPy's > internal code generators; they just need the headers. I think so too. NumPy itself actually calls import_array without #including oldnumeric.h. For example, see numpy/random/mtrand/mtrand.c. So we too should be fine without oldnumeric.h. But it might be good to check this with the NumPy folks. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Fri Mar 9 21:57:15 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 09 Mar 2007 16:57:15 -0500 Subject: [Biopython-dev] Documentation for new Blast parser Message-ID: <45F1D83B.20208@c2b2.columbia.edu> Hi everybody, For the upcoming Biopython release, I rewrote the chapter on Blast in the tutorial to describe our new Blast parser. For those of you who want to have a preview, I put a copy here: http://biopython.org/DIST/docs/tutorial/Tutorial-new.html Please let me know if you have any comments, or if you find any errors or omissions. --Michiel. 
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From edschofield at gmail.com Sat Mar 10 01:25:37 2007 From: edschofield at gmail.com (Ed Schofield) Date: Sat, 10 Mar 2007 01:25:37 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <45F1C218.6000601@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> Message-ID: <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> On 3/9/07, Michiel Jan Laurens de Hoon wrote: > > Ed Schofield wrote: > >> Quick question: > >> The patch #includes numpy/oldnumeric.h for Python <--> C glue code that > >> uses Numeric. Why is this needed? > > > > For the CONTIGUOUS and import_array() definitions right now. > > For the CONTIGUOUS definition, there's a simpler solution. > > [snip] > > So, if we use this macro instead of CONTIGUOUS directly, we can avoid > using oldnumeric.h. Or am I missing something? Yeah, sure, but why would we want to avoid using oldnumeric.h? Perhaps you're assuming this is the 'oldnumeric' compatibility layer that I mentioned earlier. The C header is actually only a very small part of it; the meat of it is in the numpy.oldnumeric module imported by the Python code, which we're inextricably bound to using as long as we preserve Numeric support. > I haven't yet got around to answering your question about import_array() > ... > > ... because I don't know how the code generation works within NumPy. > > Yeah I know, I don't think that code generation in NumPy was a good > idea. It makes it too hard to figure out what is going on. Well, that might be too harsh a judgment. 
Remember, the code generation is only for the internals -- I don't think it's something extension writers should need to know or worry about... -- Ed From mdehoon at c2b2.columbia.edu Sat Mar 10 03:25:50 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 09 Mar 2007 22:25:50 -0500 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> Message-ID: <45F2253E.2030909@c2b2.columbia.edu> Ed Schofield wrote: > On 3/9/07, Michiel Jan Laurens de Hoon wrote: >> So, if we use this macro instead of CONTIGUOUS directly, we can avoid >> using oldnumeric.h. Or am I missing something? > > Yeah, sure, but why would we want to avoid using oldnumeric.h? Why #include oldnumeric.h if we don't need it? The fewer changes we need to make to Biopython and the cleaner we can keep the code, the better. I see no justification for #including an unnecessary header file. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From chris.lasher at gmail.com Sat Mar 10 04:06:39 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 9 Mar 2007 23:06:39 -0500 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> Message-ID: <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> On 10/9/06, Chris Lasher wrote: > Anybody know if BioPython (I suppose all Open Bio projects) will > switch over to Subversion, and if so, when? 
I think the merits and > advantages of Subversion over CVS speak for themselves. It's certainly > become my revision control system of preference. Anybody else's? I'm raising this issue again. I did some digging and found an Open-Bio mailing list thread from 2006 that states: > We have a new machine just for everyone on this list. It is called > "dev.open-bio.org" and it will be the new home for developers using > CVS as well as the people who want to switch over to using Subversion. The full thread is available from . Considering that Subversion addresses the weaknesses of CVS, I'm surprised that every Open-Bio project still runs from CVS instead of Subversion. Someone's got to lead the way, why not have it be BioPython? We've had a lot of active development as of late, with SeqIO, and now with NumPy transitioning. Either of those cases would have been better tracked through Subversion, which keeps tallies of revisions on a repository-wide basis, rather than a file-by-file basis. I've also found that researchers new to revision control have had great success in picking up and using Subversion in our local Software Carpentry group. I have also created a screencast on using Subversion which demonstrates all the basic commands and activities. This screencast is available in AVI (MPEG4) and OGG formats at I hope the BioPython developers will consider a move to Subversion seriously. If there is support from the devs, but no interest on anyone's part to make it happen, given the proper people to contact, I will be happy to get this moving as a way of contributing back to the BioPython community. 
Best, Chris From chris.lasher at gmail.com Sat Mar 10 04:09:58 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 9 Mar 2007 23:09:58 -0500 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> Message-ID: <128a885f0703092009g4cb20a5ey668d5763613db35e@mail.gmail.com> On 3/9/07, Chris Lasher wrote: > I have also created a screencast on using Subversion > which demonstrates all the basic commands and activities. This > screencast is available in AVI (MPEG4) and OGG formats at > > I need to double-check my links! The CORRECT links to the *SUBVERSION* screencasts are as follows: My apologies, Chris From mdehoon at c2b2.columbia.edu Sat Mar 10 04:36:07 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Fri, 09 Mar 2007 23:36:07 -0500 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> Message-ID: <45F235B7.6000409@c2b2.columbia.edu> Chris Lasher wrote: > I hope the BioPython developers will consider a move to Subversion > seriously. If there is support from the devs, but no interest on > anyone's part to make it happen, given the proper people to contact, I > will be happy to get this moving as a way of contributing back to the > BioPython community. I know very little about CVS and Subversion (which is why I didn't respond to your original post). But I did notice that a lot of software projects are using Subversion instead of CVS nowadays, including Python itself. So I don't have any objections to Biopython moving to Subversion as well (unfortunately, I cannot be of much help here either).
If we are moving to Subversion though, I'd like to ask you not to make any changes until the next Biopython release comes out, which will be in about one week from now. --Michiel. From edschofield at gmail.com Sat Mar 10 10:40:55 2007 From: edschofield at gmail.com (Ed Schofield) Date: Sat, 10 Mar 2007 10:40:55 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <45F2253E.2030909@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> Message-ID: <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> On 3/10/07, Michiel Jan Laurens de Hoon wrote: > > Ed Schofield wrote: > > On 3/9/07, Michiel Jan Laurens de Hoon > wrote: > >> So, if we use this macro instead of CONTIGUOUS directly, we can avoid > >> using oldnumeric.h. Or am I missing something? > > > > Yeah, sure, but why would we want to avoid using oldnumeric.h? > > Why #include oldnumeric.h if we don't need it? The fewer changes we need > to make to Biopython and the cleaner we can keep the code, the better. I > see no justification for #including an unnecessary header file. It's a minor issue, but I can see several reasons to use the header file NumPy provides for the purpose, rather than pasting its definitions into our own source files: - because, by isolating the NumPy definitions from the BioPython source files, it leads to code that's shorter overall and (IMHO) simpler - because we may have C extensions in the future that need other parts of the compatibility interface - because any future bugfixes or changes to NumPy's oldnumeric.h would then be picked up automatically Would we write helloworld.c like this? 
int printf(const char * __restrict, ...) __DARWIN_LDBL_COMPAT(printf); int main() { printf("Hello, world!\n"); } ;) I can understand if you're wanting to make an early start on the porting process to remove the dependence on the oldnumeric compatibility layer entirely. But in this case I don't think it's worth it; a full port to NumPy's native interfaces would break Numeric compatibility, which you're committed to keeping for some time yet. The oldnumeric interface won't be a hindrance for BioPython's users anyway -- with my patch they can either use Numeric or uninstall it entirely and instead pass native NumPy arrays between BioPython and other packages like SciPy, Matplotlib and PyTables. -- Ed From biopython-dev at maubp.freeserve.co.uk Sat Mar 10 12:24:33 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Mar 2007 12:24:33 +0000 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F1D83B.20208@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> Message-ID: <45F2A381.2080303@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Hi everybody, > > For the upcoming Biopython release, I rewrote the chapter on Blast in > the tutorial to describe our new Blast parser. For those of you who want > to have a preview, I put a copy here: > > http://biopython.org/DIST/docs/tutorial/Tutorial-new.html > > Please let me know if you have any comments, or if you find any errors > or omissions. Good work - I've made a few notes. -------------------------------------------------------------------- In section "3.1 Running BLAST locally" I would also stress the fact that this is your only choice if you are using "private" data, for example unpublished data from a company. e.g. add something like this after the "two advantages" of running BLAST locally: Another reason to run blast locally is if you are dealing with proprietary or unpublished sequence data. 
You may not be allowed to redistribute the sequences, so submitting them to the NCBI as a blast query would not be an option. -------------------------------------------------------------------- In section "3.1 Running BLAST locally" the wording about the location of the database files could be a little clearer. You wrote the following (which I have reformatted to use shorter lines): >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis" # I used formatdb to create a BLAST database named bsubtilis in the # directory /home/mdehoon/Data/Genomes/Databases. # The BLAST database consists of the files bsubtilis.nhr, bsubtilis.nin, # and bsubtilis.nsq in this directory. You talk about four files, but only name three of them. I also found the path to be a little unclear... I think you meant this: >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis" # I used formatdb to create a BLAST database named bsubtilis # (for Bacillus subtilis) consisting of the following four files: # /home/mdehoon/Data/Genomes/Databases/bsubtilis.nhr # /home/mdehoon/Data/Genomes/Databases/bsubtilis.nin # /home/mdehoon/Data/Genomes/Databases/bsubtilis.nsq # /home/mdehoon/Data/Genomes/Databases/bsubtilis.??? rather than the file being inside a subdirectory, bsubtilis, like this: >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis" # I used formatdb to create a BLAST database named bsubtilis # (for Bacillus subtilis) consisting of the following four files: # /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nhr # /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nin # /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nsq # /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.??? -------------------------------------------------------------------- I think you should include an explicit example of running standalone blast and getting XML files back, i.e. 
include this at the end of section 3.1 (rather than just mentioning it): >>> from Bio.Blast import NCBIStandalone >>> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, \ 'blastn', my_blast_db, my_blast_file, align_view=7) I am wondering if now is a good time to switch the default output format to XML in NCBIStandalone.blastall, NCBIStandalone.rpsblast etc given NCBIWWW.qblast already defaults to XML. ---------------------------------------------------------------------- There is an extra "the" at the end of the first paragraph of section "3.4 Parsing BLAST output": "..., it is also much easier to parse automatically, making the Biopython a whole lot more stable." Should read: "..., it is also much easier to parse automatically, making Biopython a whole lot more stable." Also should it be "Biopython" or "BioPython"? The website uses a mixture... ----------------------------------------------------------------------- This email is getting a bit long - I'll read the rest of the document later. Peter From biopython-dev at maubp.freeserve.co.uk Sat Mar 10 14:03:10 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Mar 2007 14:03:10 +0000 Subject: [Biopython-dev] SProt and new lines issue In-Reply-To: <45F1F7CC.1010404@maubp.freeserve.co.uk> References: <45F03E24.4080407@c2b2.columbia.edu> <45F1F7CC.1010404@maubp.freeserve.co.uk> Message-ID: <45F2BA9E.2020206@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Some files have been added or removed from Biopython recently, so it > may be useful to checkout a fresh copy of Biopython from CVS. The > Biopython tests in CVS all pass, so things are looking good. I just got a fresh copy of CVS on my Windows machine, and discovered that test_SeqIO fails on a SwissProt file (however test_SProt is OK), specifically: biopython/Tests/SwissProt/sp007 It turns out that test_SeqIO opens file in mode "rU" (reading, universal newline mode) while test_SProt opens files in normal read mode. 
For some reason the sp007 file causes trouble in universal newline mode. Making test_SProt also use "rU" shows the same error as seen in test_SeqIO. I can "fix" the (Windows only?) error by either opening the files in normal read mode, or by running unix2dos on the input file. Very odd. This also suggests the example file sp007 is not stored in CVS as a text file, but as a binary file. Peter P.S. I did some work on test_SProt to compare the results of its RecordParser() and SequenceParser() and to do a basic test of the Iterator() - we should add a multi-record example test case too. From bugzilla-daemon at portal.open-bio.org Sat Mar 10 20:32:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Mar 2007 15:32:15 -0500 Subject: [Biopython-dev] [Bug 2227] New: Writing Nexus files with Bio.SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2227 Summary: Writing Nexus files with Bio.SeqIO Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk I would like to be able to write multiple sequence alignments in Nexus format, using Bio.SeqIO (and possibly also Bio.Nexus). I have tried to do this by creating a nexus object in code (from an existing alignment) using the add_sequence() method, with the intention of then calling its write_nexus_data() method. However, it seems that add_sequence() is intended for use when the alignment matrix has already been created - not for building one from scratch. I will attach code for Bio.SeqIO to write a Nexus alignment WITHOUT using Bio.Nexus, which seemed easier. I would prefer to use Bio.Nexus to do this however...
[This issue can wait till after we release BioPython 1.43] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Mar 10 20:38:07 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Mar 2007 15:38:07 -0500 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200703102038.l2AKc7Lf014970@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-10 15:38 EST ------- I'm having trouble with bugzilla not accepting the attachment, which is a replacement for Bio/SeqIO/NexusIO.py This seems to work in my limited testing. However, the only "validation" I have done to date is checking that Bio.Nexus can read the alignments I create. Also, if the input records all have simple "generic" alphabets, the code cannot decide if they are protein, dna or rna - and raises a ValueError. It's not foolproof, but it might be better to look at the letters in the actual sequence at that point and "guess". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
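[Editorial sketch] Comment #1 above suggests looking at the letters of the sequence to "guess" whether it is protein, dna or rna when the alphabet is generic. A minimal illustration of that idea follows. It is not the code from the attachment; the function name guess_sequence_type and the exact letter sets are assumptions made for this example.

```python
def guess_sequence_type(sequence):
    """Guess 'dna', 'rna' or 'protein' from the letters alone (heuristic).

    Hypothetical helper for illustration only; not part of Bio.SeqIO.
    """
    # Ignore gap, missing-data and stop symbols before looking at the letters.
    letters = set(sequence.upper()) - set("-?.*")
    if not letters:
        raise ValueError("no letters to guess from")
    # Nucleotide check first: a sequence using only A, C, G, T/U and N
    # is assumed to be nucleotide, although a short protein could match too.
    if letters <= set("ACGTN"):
        return "dna"
    if letters <= set("ACGUN"):
        return "rna"
    return "protein"
```

As the comment says, this is not foolproof: a peptide made up only of A, C, G and T would be reported as dna, so a caller that knows the real alphabet should still be able to override the guess.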
From bugzilla-daemon at portal.open-bio.org Sat Mar 10 20:41:09 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Mar 2007 15:41:09 -0500 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200703102041.l2AKf9UE015188@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-10 15:41 EST ------- Created an attachment (id=584) --> (http://bugzilla.open-bio.org/attachment.cgi?id=584&action=view) replacement for Bio/SeqIO/NexusIO.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Sun Mar 11 00:42:01 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Sat, 10 Mar 2007 19:42:01 -0500 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F2A381.2080303@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> Message-ID: <45F35059.6000102@c2b2.columbia.edu> Thanks, Peter! These are all good points. I've updated the tutorial following your suggestions. Peter wrote: >> For the upcoming Biopython release, I rewrote the chapter on Blast in >> the tutorial to describe our new Blast parser. For those of you who >> want to have a preview, I put a copy here: >> >> http://biopython.org/DIST/docs/tutorial/Tutorial-new.html > > I am wondering if now is a good time to switch the default output > format to XML in NCBIStandalone.blastall, NCBIStandalone.rpsblast etc > given NCBIWWW.qblast already defaults to XML. I agree that these functions should return XML by default. Objections, anybody? If not, I'll make this change and update the tutorial accordingly. 
> -------------------------------------------------------------------- > > In section "3.1 Running BLAST locally" the wording about the location > of the database files could be a little clearer. > > You wrote the following (which I have reformatted to use shorter lines): > > >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis" > # I used formatdb to create a BLAST database named bsubtilis in the > # directory /home/mdehoon/Data/Genomes/Databases. > # The BLAST database consists of the files bsubtilis.nhr, bsubtilis.nin, > # and bsubtilis.nsq in this directory. > > You talk about four files, but only name three of them. There are only three files, so maybe my description is confusing and suggests that there should be four files. I've updated the Tutorial to show the full path of the three files. > Also should it be "Biopython" or "BioPython"? The website uses a > mixture... A long time ago it was decided that "Biopython" is the official name. Though BioPython and biopython also appear all over the place. Thanks again, Peter. --Michiel -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Sun Mar 11 01:08:56 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 11 Mar 2007 01:08:56 +0000 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F35059.6000102@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> Message-ID: <45F356A8.6050706@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Thanks, Peter! These are all good points. I've updated the tutorial > following your suggestions. I see you have started to mention Bio.SeqIO in the Blast documentation - is this a hint to get me to update section 2.4 "Parsing biological file formats"? 
I see you are editing the CVS file biopython/Doc/Tutorial.tex - I am happy working with LaTex so that shouldn't be a problem. Link to ViewCVS if anyone is interested: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython I presume you run some flavour of latex to html on it, and then upload the file to website somehow... ---------------------------------------------------------------------- Back to the BLAST tutorial, quoting Section 3.1 Running BLAST locally: > Running BLAST locally (as opposed to over the internet, see Section > 3.2) has two advantages: > > * Local BLAST may be faster than BLAST over the internet; > > * Local BLAST allows you to make your own database to search for > sequences against. > > * Dealing with proprietary or unpublished sequence data can be > another reason to run BLAST locally. You may not be allowed to > redistribute the sequences, so submitting them to the NCBI as a BLAST > query would not be an option. Minor style point: Using the bullet points makes it look like three advantages (when the introduction says two). I wouldn't use a bullet point for the proprietary/unpublished data paragraph. ---------------------------------------------------------------------- > Peter wrote: >> I am wondering if now is a good time to switch the default output >> format to XML in NCBIStandalone.blastall, NCBIStandalone.rpsblast >> etc given NCBIWWW.qblast already defaults to XML. Michiel wrote: > I agree that these functions should return XML by default. > Objections, anybody? If not, I'll make this change and update the > tutorial accordingly. It will catch a few people out, but it seems best to do it now at the same time as the related Blast XML changes. Do the HTML and Text parsers spot when they are fed XML input, and give a helpful error message? Should we also mention this change in the DEPRECATED file? 
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/DEPRECATED?cvsroot=biopython Peter From mdehoon at c2b2.columbia.edu Sun Mar 11 02:40:26 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Sat, 10 Mar 2007 21:40:26 -0500 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F356A8.6050706@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> Message-ID: <45F36C1A.6080005@c2b2.columbia.edu> Peter wrote: > I see you have started to mention Bio.SeqIO in the Blast documentation - > is this a hint to get me to update section 2.4 "Parsing biological file > formats"? If you could do that, then that would be great. If you don't find the time for it, I can also help you by grabbing what you have on your wiki page. Feel free to change section 2.4 as you want -- I don't think what's there is still very relevant to Biopython, and it probably scares people off. > I see you are editing the CVS file biopython/Doc/Tutorial.tex - I am > happy working with LaTex so that shouldn't be a problem. Note that it is not exactly LaTeX, but LaTeX with hevea added. Let me know if you run into problems with hevea. > > I presume you run some flavour of latex to html on it, and then upload > the file to website somehow... Hevea takes care of this; see the Makefile in biopython/Doc. > Back to the BLAST tutorial, quoting Section 3.1 Running BLAST locally: ... > Minor style point: Using the bullet points makes it look like three > advantages (when the introduction says two). I wouldn't use a bullet > point for the proprietary/unpublished data paragraph. Ha, you're right. I should work on my counting skills. It's fixed now. Thanks again, --Michiel. 
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Sun Mar 11 15:39:07 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 11 Mar 2007 15:39:07 +0000 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F36C1A.6080005@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> Message-ID: <45F4229B.9080401@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Peter wrote: >> I see you have started to mention Bio.SeqIO in the Blast documentation - >> is this a hint to get me to update section 2.4 "Parsing biological file >> formats"? > > If you could do that, then that would be great. If you don't find the > time for it, I can also help you by grabbing what you have on your wiki > page. Feel free to change section 2.4 as you want -- I don't think > what's there is still very relevant to Biopython, and it probably scares > people off. I have basically replaced that whole section in Tutorial.tex, and checked it looks fine using pdflatex and hevea on my linux machine. I have not updated the website - so getting feedback from might be a little tricky. Can you tell me how to do that please Michiel? 
Thanks Peter From bugzilla-daemon at portal.open-bio.org Sun Mar 11 17:44:58 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 13:44:58 -0400 Subject: [Biopython-dev] [Bug 2228] New: genbank parser should not print warnings to stdout Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2228 Summary: genbank parser should not print warnings to stdout Product: Biopython Version: 1.24 Platform: PC OS/Version: FreeBSD Status: NEW Severity: critical Priority: P1 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: markd at cse.ucsc.edu In general, it's not a good idea to print warnings from a library, as this is an undocumented and unexpected output from the library API. However printing warnings to stdout is a serious bug, meaning programs using the library can't be used in a pipeline. There are four warning prints to stdout in Bio/GenBank/__init__.py. All of these should go to stderr, and probably be under the control of the debug option. Also, the warning: "WARNING - Unquoted multiline '%s' entry for %s feature with location %s" should be removed. This is not a WARNING, it's a failure of this module to correctly handle the poorly documented Genbank flat-file format -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
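[Editorial sketch] The report above asks that library warnings stay off stdout so that programs using the parser can run in a pipeline. One way to do that in Python is the standard warnings module, which writes to stderr by default and lets the caller filter or record messages. The function parse_locus_line below is a made-up stand-in, not the code in Bio/GenBank, and the "fewer than three fields" rule is only an assumption for the example.

```python
import contextlib
import io
import warnings


def parse_locus_line(line):
    """Return the locus name from a LOCUS line (hypothetical parser step)."""
    fields = line.split()
    if len(fields) < 3:
        # Goes through the warnings machinery (stderr by default),
        # so nothing is written to stdout.
        warnings.warn("minimal LOCUS line found: %r" % line)
    return fields[1] if len(fields) > 1 else None


# Usage: stdout stays empty, and the caller decides what to do with the warning.
captured_stdout = io.StringIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with contextlib.redirect_stdout(captured_stdout):
        name = parse_locus_line("LOCUS AB000123")
```

A plain write to sys.stderr would also keep stdout clean; the warnings module is used here because it additionally gives the calling program a way to silence or collect the messages.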
From bugzilla-daemon at portal.open-bio.org Sun Mar 11 18:31:39 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 14:31:39 -0400 Subject: [Biopython-dev] [Bug 2228] genbank parser should not print warnings to stdout In-Reply-To: Message-ID: <200703111831.l2BIVdkE010082@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2228 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|critical |major Status|NEW |ASSIGNED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 14:31 EST ------- I don't think you are using BioPython 1.24, but probably 1.42 (I know, the bugzilla choices need updating). Since the release of BioPython 1.42, this area of code has been revised significantly (to also deal with EMBL files) and most of those print statements have either vanished or are debug only. Have a look at new file Bio/GenBank/Scanner.py There is currently just one "evil print statement", triggered when faced with a "minimal LOCUS line". I will make this print to standard error instead... I would very much like to have some examples of this as test cases BTW. > Also, the warning: > "WARNING - Unquoted multiline '%s' entry for %s feature with location %s" > should be removed. It has already been removed. > This is not a WARNING, it's a failure of this module > to correctly handle the poorly documented Genbank flat-file format Unquoted multiline feature entries do exist "in the wild" but are in breach of my reading of the NCBI documentation. The parser handles them fine (but grumbled). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From mdehoon at c2b2.columbia.edu Sun Mar 11 18:34:56 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 11 Mar 2007 14:34:56 -0400 Subject: [Biopython-dev] Documentation for new Blast parser In-Reply-To: <45F4229B.9080401@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> Message-ID: <45F44BD0.6010000@c2b2.columbia.edu> Peter wrote: > Michiel Jan Laurens de Hoon wrote: >> Peter wrote: >>> I see you have started to mention Bio.SeqIO in the Blast >>> documentation - is this a hint to get me to update section 2.4 >>> "Parsing biological file formats"? ... > I have basically replaced that whole section in Tutorial.tex, and > checked it looks fine using pdflatex and hevea on my linux machine. For those of you who want to preview the new tutorial, it's available here: http://biopython.org/DIST/docs/tutorial/Tutorial-new.html This contains a description of the new Bio.SeqIO in section 2.4. --Michiel From bugzilla-daemon at portal.open-bio.org Sun Mar 11 18:52:18 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 14:52:18 -0400 Subject: [Biopython-dev] [Bug 2228] genbank parser should not print warnings to stdout In-Reply-To: Message-ID: <200703111852.l2BIqISQ010925@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2228 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 14:52 EST ------- > There is currently just one "evil print statement", triggered > when faced with a "minimal LOCUS line". I will make this print > to standard error instead... 
Done, marking as fixed. You can update to CVS or wait for BioPython 1.43 which should be due this month. P.S. I would like to see some real life examples of GenBank files with "minimal LOCUS lines" to check Biopython does something sensible with them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Mar 11 19:01:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 15:01:50 -0400 Subject: [Biopython-dev] [Bug 2228] genbank parser should not print warnings to stdout In-Reply-To: Message-ID: <200703111901.l2BJ1oGw011375@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2228 markd at cse.ucsc.edu changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |markd at cse.ucsc.edu ------- Comment #3 from markd at cse.ucsc.edu 2007-03-11 15:01 EST ------- thanks for the amazingly quick response!! What documentation page are you using for the genbank flat-file format? The only thing I have ever found is the sample record (http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html), which is not a satisfactory definition of the format. I can report the lack of documentation on using parenthesis to quote multi-line fields to NCBI. I have never seen an example of the `minimal LOCUS lines'; will submit one if I ever encounter it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Mar 11 19:26:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 15:26:30 -0400 Subject: [Biopython-dev] [Bug 2228] genbank parser should not print warnings to stdout In-Reply-To: Message-ID: <200703111926.l2BJQUeJ012797@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2228 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 15:26 EST ------- The "interesting" part of a GenBank file is the feature table, which is also present in EMBL files in almost the same form. One good page for that is here: http://www.insdc.org/files/feature_table.html I'm sure I've seen other pages on the GenBank (in addition to the one you linked to) as well... If memory serves, I have seen "unquoted multiline features" and other "abuses" like blank lines in the feature table in actual NCBI files on occasion. They usually correct this sort of thing in later releases. Regarding "minimal LOCUS lines", I recall them being mentioned on our mailing lists as being produced by certain third party tools. Again I don't have the link to hand, and haven't had reason to chase this up. P.S. Have you joined the mailing list? This bug is starting to go off topic -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mdehoon at c2b2.columbia.edu Sun Mar 11 19:32:32 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 11 Mar 2007 15:32:32 -0400 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> Message-ID: <45F45950.3090104@c2b2.columbia.edu> Ed Schofield wrote: > It's a minor issue, but I can see several reasons to use the header file > NumPy provides for the purpose, rather than pasting its definitions into > our own source files: I am not suggesting to paste the definitions in oldnumeric.h into our own source files. The point is that we don't need any of the definitions in oldnumeric.h. So we don't need to #include them, and we also don't need to paste them into Biopython. For example, your patch adds these lines (among others) to Bio/KDTree/KDTree.i: #if (NDARRAY_VERSION >= 0x00090908) #include "numpy/oldnumeric.h" #endif AFAICT, these three lines can simply be removed. --Michiel. 
From bugzilla-daemon at portal.open-bio.org Sun Mar 11 20:07:10 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 16:07:10 -0400 Subject: [Biopython-dev] [Bug 1741] Bug in fasta consumer in Doc/tutorial.tex and Doc/examples/ In-Reply-To: Message-ID: <200703112007.l2BK7A1O014745@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1741 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Summary|Bug in fasta consumer in |Bug in fasta consumer in |Doc/tutorial.tex and |Doc/tutorial.tex and |Doc/examples/ |Doc/examples/ ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 16:07 EST ------- I've started to update that section of the tutorial to use Bio.SeqIO instead of Bio.Fasta, so the problem in Doc/Tutorial.tex doesn't apply anymore. The example scripts in Doc/examples/*.py will need updating or replacing... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Mar 11 20:42:26 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 16:42:26 -0400 Subject: [Biopython-dev] [Bug 1741] Bug in fasta consumer in Doc/tutorial.tex and Doc/examples/ In-Reply-To: Message-ID: <200703112042.l2BKgQwp016519@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1741 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 16:42 EST ------- fasta_consumer.py - deleted, relevant section in the tutorial also replaced fasta_dictionary.py - checked, then extended to also cover Bio.SeqIO fasta_iterator.py - checked, then extended to also cover Bio.SeqIO -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Mar 11 21:02:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 11 Mar 2007 17:02:32 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200703112102.l2BL2Wl4017557@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|biopython- |biopython-dev at biopython.org |bugzilla at maubp.freeserve.co.| |uk | Status|ASSIGNED |NEW ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-11 17:02 EST ------- Have you had a chance to look at this patch yet Michiel?
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Sun Mar 11 22:01:58 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 11 Mar 2007 22:01:58 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F44BD0.6010000@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> Message-ID: <45F47C56.7070300@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> I have basically replaced that whole section in Tutorial.tex, and >> checked it looks fine using pdflatex and hevea on my linux machine. > > For those of you who want to preview the new tutorial, it's available here: > > http://biopython.org/DIST/docs/tutorial/Tutorial-new.html > > This contains a description of the new Bio.SeqIO in section 2.4. Cheers Michiel. There will be a few more changes still to come. For example, I just noticed that the "Orchid Photos" links which Brad Chapman liked so much are dead - looks like www.millicentorchids.com belongs to a domain parking company now. How about these flickr and google image searches instead? http://www.flickr.com/search/?q=lady+slipper+orchid&s=int&z=t http://images.google.com/images?q=lady%20slipper%20orchid Peter From mdehoon at c2b2.columbia.edu Sun Mar 11 23:53:07 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 11 Mar 2007 19:53:07 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO Message-ID: <45F49663.5000509@c2b2.columbia.edu> I looked at the tutorial documentation for the new Bio.SeqIO. 
Generally it looks good to me, so I just have a few small issues: Section 2.4 is titled "Parsing biological file formats". This used to be appropriate, as the original version of this section described Biopython's parsers in more general terms. Since now it's about Bio.SeqIO, it may be better to rename it to make explicit that it is about parsing sequence files. Last sentence of section 2.4.2: "record.id" should be "seq_record.id". In section 2.4.4: "built python function list" should be "built-in python function list" Last sentence of section 2.4.5: "Not to complicated" should be "Not too complicated". And finally, "lets" in various places should be "let's". That's all, it looks very fine otherwise. Thanks! --Michiel. From edschofield at gmail.com Mon Mar 12 10:49:26 2007 From: edschofield at gmail.com (Ed Schofield) Date: Mon, 12 Mar 2007 10:49:26 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <45F45950.3090104@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> Message-ID: <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> On 3/11/07, Michiel de Hoon wrote: > > Ed Schofield wrote: > > It's a minor issue, but I can see several reasons to use the header file > > NumPy provides for the purpose, rather than pasting its definitions into > > our own source files: > > I am not suggesting to paste the definitions in oldnumeric.h into our > own source files. The point is that we don't need any of the definitions > in oldnumeric.h. 
So we don't need to #include them, and we also don't > need to paste them into Biopython. > > For example, your patch adds these lines (among others) to > Bio/KDTree/KDTree.i: > > #if (NDARRAY_VERSION >= 0x00090908) > #include "numpy/oldnumeric.h" > #endif > > AFAICT, these three lines can simply be removed. Ah, yes, I understand. And I've now re-read your previous post and now understand your point there about clustermodule.c too. So I've changed all (flags & CONTIGUOUS) instances to use PyArray_ISCONTIGUOUS() in my patch and updated it at http://edschofield.com/biopython-numpy-support.patch And you're right ... with these changes, including oldnumeric.h is no longer necessary anywhere. -- Ed From bugzilla-daemon at portal.open-bio.org Mon Mar 12 15:36:33 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 11:36:33 -0400 Subject: [Biopython-dev] [Bug 2229] New: GenBank Scanner fails to scan over headers Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2229 Summary: GenBank Scanner fails to scan over headers Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mcolosimo at mitre.org The new GenBank code fails to read in NCBI-GenBank flat file releases, such as gbvrl1.seq (release 158.0) from Bio import GenBank import sys fh = open(sys.argv[1], 'r') gb_iter = GenBank.Iterator(fh, GenBank.FeatureParser()) for rec in gb_iter: print rec.id This fails because the first item is not 'LOCUS'. The following works. 
Index: Scanner.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/GenBank/Scanner.py,v retrieving revision 1.7 diff -r1.7 Scanner.py 62c62,63 < break --- > self.line = line > return line 69,72c70,72 < raise SyntaxError("Expected line starting '%s', found '%s'" \ < % (self.RECORD_START, line.rstrip())) < self.line = line < return line --- > if self.debug > 1: print "Skipping line" > > return None -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 15:41:24 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 11:41:24 -0400 Subject: [Biopython-dev] [Bug 2230] New: GenBank __init__.py: _Scanner import Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2230 Summary: GenBank __init__.py: _Scanner import Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mcolosimo at mitre.org This is bad coding IMHO and took me several minutes to figure out where the class was. from Scanner import GenBankScanner as _Scanner since this is used only two times and referenced once! Index: __init__.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/GenBank/__init__.py,v retrieving revision 1.67 diff -r1.67 __init__.py 25c25 < _Scanner Set up a GenBank parser to parse a record. --- > GenBankScanner Set up a GenBank parser to parse a record. 
47c47 < from Scanner import GenBankScanner as _Scanner --- > from Scanner import GenBankScanner 178c178 < self._scanner = _Scanner(debug_level) --- > self._scanner = GenBankScanner(debug_level) 202c202 < self._scanner = _Scanner(debug_level) --- > self._scanner = GenBankScanner(debug_level) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 16:05:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 12:05:19 -0400 Subject: [Biopython-dev] [Bug 2229] GenBank Scanner fails to scan over headers In-Reply-To: Message-ID: <200703121605.l2CG5Jk1012292@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2229 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-12 12:05 EST ------- Link to download the file, about 28 MB ftp://ftp.ncbi.nih.gov/genbank/gbvrl1.seq.gz This starts: ------------------------------------------------ GBVRL1.SEQ Genetic Sequence Data Bank February 15 2007 NCBI-GenBank Flat File Release 158.0 Viral Sequences (Part 1) 72061 loci, 66147687 bases, from 72061 reported sequences LOCUS AB000048 2007 bp DNA linear VRL 05-FEB-1999 DEFINITION Feline panleukopenia virus DNA for nonstructural protein 1, complete cds. ACCESSION AB000048 ... 
------------------------------------------------ Much smaller test case, 81 KB compressed: ftp://ftp.ncbi.nih.gov/genbank/gbuna.seq.gz File starts: ------------------------------------------------ GBUNA.SEQ Genetic Sequence Data Bank February 15 2007 NCBI-GenBank Flat File Release 158.0 Unannotated Sequences 211 loci, 114018 bases, from 211 reported sequences LOCUS AB086827 901 bp mRNA linea ... ------------------------------------------------ In both cases, and I assume in all these archives, there is a fairly uniform header present, followed by the GenBank records. I suppose we could/should spot these and skip them... does anyone know offhand whether EMBL does anything similar? From bugzilla-daemon at portal.open-bio.org Mon Mar 12 16:11:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 12:11:35 -0400 Subject: [Biopython-dev] [Bug 2230] GenBank __init__.py: _Scanner import In-Reply-To: Message-ID: <200703121611.l2CGBZtw012643@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2230 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |LATER ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-12 12:11 EST ------- That was done on purpose so that any legacy code directly using the _Scanner object would still work. Note that the original _Scanner class was defined in Bio/GenBank/__init__.py. I agree that it's not very elegant, but it seemed like the easiest way to do it at the time. I don't want to touch this with the next release due imminently, but certainly some housekeeping may be in order after that. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Mon Mar 12 16:16:51 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 12 Mar 2007 16:16:51 +0000 Subject: [Biopython-dev] GenBank file format documentation In-Reply-To: <200703111926.l2BJQUeJ012797@portal.open-bio.org> References: <200703111926.l2BJQUeJ012797@portal.open-bio.org> Message-ID: <45F57CF3.4060402@maubp.freeserve.co.uk> On bug 2228, http://bugzilla.open-bio.org/show_bug.cgi?id=2228#c4 I wrote: > The "interesting" part of a GenBank file is the feature table, which is also > present in EMBL files in almost the same form. One good page for that is here: > > http://www.insdc.org/files/feature_table.html > > I'm sure I've seen other pages on the GenBank (in addition to the one you > linked to) as well... This link looks like the same document, and may have been what I was thinking of: http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html Peter From bugzilla-daemon at portal.open-bio.org Mon Mar 12 16:31:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 12:31:05 -0400 Subject: [Biopython-dev] [Bug 2229] GenBank Scanner fails to scan over headers In-Reply-To: Message-ID: <200703121631.l2CGV5CL013600@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2229 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-12 12:31 EST ------- You were spot on Marc - this was indeed a regression from when I added the combined GenBank / EMBL scanner. 
I've checked in a fix to Bio/GenBank/Scanner.py I plan to add another example to the unit tests as they didn't catch this. Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 17:20:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 13:20:55 -0400 Subject: [Biopython-dev] [Bug 2231] New: NCBI added new sequence type - cRNA Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2231 Summary: NCBI added new sequence type - cRNA Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mcolosimo at mitre.org Breaks GenBank.Scanner, add cRNA into assert array (and description above) assert line[47:54].strip() in ['','DNA','RNA','tRNA','mRNA','uRNA','snRNA', 'cRNA'], \ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
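The two Scanner issues reported above (a release-file header before the first LOCUS line in bug 2229, and the new cRNA molecule type in bug 2231) can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the code that was checked in to Bio/GenBank/Scanner.py: the function names are invented for this example, the input is a made-up header and LOCUS line, and only the column range 47:54 and the list of molecule types are taken from the assert quoted in bug 2231.

```python
import io

RECORD_START = "LOCUS"
# Molecule types from the assert quoted in bug 2231, with 'cRNA' added.
KNOWN_MOLECULE_TYPES = ["", "DNA", "RNA", "tRNA", "mRNA", "uRNA", "snRNA", "cRNA"]


def find_record_start(handle):
    """Return the first line starting with LOCUS, or None at end of file.

    Any lines before it (for example a release-file header) are skipped.
    """
    for line in handle:
        if line.startswith(RECORD_START):
            return line
    return None


def molecule_type(locus_line):
    """Return the molecule type field (columns 47:54) of a LOCUS line."""
    mol = locus_line[47:54].strip()
    if mol not in KNOWN_MOLECULE_TYPES:
        raise ValueError("Unexpected molecule type %r in LOCUS line" % mol)
    return mol


# Made-up input: a short release-style header followed by one LOCUS line
# padded so that the molecule type starts at column 47.
header = (
    "GBUNA.SEQ          Genetic Sequence Data Bank\n"
    "NCBI-GenBank Flat File Release 158.0\n"
    "\n"
)
locus = "LOCUS       EXAMPLE01".ljust(47) + "cRNA".ljust(7) + " linear\n"

first_line = find_record_start(io.StringIO(header + locus))
print(molecule_type(first_line))
```

Returning None when no LOCUS line is found, rather than raising an error, mirrors the behaviour of the patch proposed in the bug report.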
From bugzilla-daemon at portal.open-bio.org Mon Mar 12 17:26:18 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 13:26:18 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703121726.l2CHQIT7016249@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 mcolosimo at mitre.org changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mcolosimo at mitre.org ------- Comment #2 from mcolosimo at mitre.org 2007-03-12 13:26 EST ------- (In reply to comment #1) > [I have tried to reassign this to the mailing list...] > > Could someone familar with BioSQL take a look at this please? > I'll have a look at this. I already have a few changes to submit for MySQL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 17:55:04 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 13:55:04 -0400 Subject: [Biopython-dev] [Bug 2231] NCBI added new sequence type - cRNA In-Reply-To: Message-ID: <200703121755.l2CHt4xH017634@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2231 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-12 13:55 EST ------- Suggested change made in CVS. Thanks Marc. Could you point me at a specific example GenBank file where this is used? 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Mar 12 19:56:58 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Mar 2007 15:56:58 -0400 Subject: [Biopython-dev] [Bug 2231] NCBI added new sequence type - cRNA In-Reply-To: Message-ID: <200703121956.l2CJuw5V023462@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2231 ------- Comment #2 from mcolosimo at mitre.org 2007-03-12 15:56 EST ------- (In reply to comment #1) > Suggested change made in CVS. Thanks Marc. > > Could you point me at a specific example GenBank file where this is used? > Yep, AF039525 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 14 13:42:08 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 09:42:08 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141342.l2EDg80p017603@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #3 from mcolosimo at mitre.org 2007-03-14 09:42 EST ------- Created an attachment (id=593) --> (http://bugzilla.open-bio.org/attachment.cgi?id=593&action=view) Fixed last_id method of Mysql_dbutils class -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Mar 14 14:22:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 10:22:45 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141422.l2EEMjth019835@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #4 from mcolosimo at mitre.org 2007-03-14 10:22 EST ------- Created an attachment (id=594) --> (http://bugzilla.open-bio.org/attachment.cgi?id=594&action=view) Various fixes and possible improvements From bugzilla-daemon at portal.open-bio.org Wed Mar 14 14:27:07 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 10:27:07 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141427.l2EER79G020271@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 mcolosimo at mitre.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #5 from mcolosimo at mitre.org 2007-03-14 10:27 EST ------- I've attached two diffs that fixed a bunch of "bugs". A bug was introduced that broke a key feature (storing of NCBI taxon ids) when trying to fix another bug. I fixed both the code that looked up the NCBI taxon ids and the SQL statement that stored them. Hint: using None is better than "0" or removing it. Also, using my code and the latest BioSQL schema, I was able to load in AY243312 without any problem. 
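The hint in comment #5 about passing None instead of "0" can be illustrated with a small, self-contained sketch. Everything below is an assumption made for the example: it uses the standard library sqlite3 module rather than MySQL, the table is a heavily simplified stand-in and not the real BioSQL schema, and the UNIQUE constraint is only one plausible reason why a placeholder value could cause trouble.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; NOT the real BioSQL taxon table.
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " ncbi_taxon_id INTEGER UNIQUE,"
    " name TEXT)"
)


def store_taxon(conn, name, ncbi_taxon_id=None):
    """Insert a taxon row; None is stored as SQL NULL by the DB-API driver."""
    cur = conn.execute(
        "INSERT INTO taxon (ncbi_taxon_id, name) VALUES (?, ?)",
        (ncbi_taxon_id, name),
    )
    return cur.lastrowid


store_taxon(conn, "Homo sapiens", 9606)
# Two entries without a known NCBI taxon id: NULL values do not
# collide with each other under the UNIQUE constraint.
store_taxon(conn, "unknown organism A")
store_taxon(conn, "unknown organism B")

# Using 0 as a placeholder instead collides on the second insert.
zero_collides = False
try:
    store_taxon(conn, "placeholder A", 0)
    store_taxon(conn, "placeholder B", 0)
except sqlite3.IntegrityError:
    zero_collides = True
conn.commit()

missing = conn.execute(
    "SELECT COUNT(*) FROM taxon WHERE ncbi_taxon_id IS NULL"
).fetchone()[0]
print(missing, zero_collides)
```

A NULL also keeps "no taxon id known" distinguishable from a real id in later queries, which a placeholder value such as 0 does not.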
From bugzilla-daemon at portal.open-bio.org Wed Mar 14 15:16:56 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 11:16:56 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141516.l2EFGurY022990@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-14 11:16 EST ------- Minor comments on patch 594 (which I have only read): Did you mean to change the number of leading spaces in this string: " FROM reference JOIN dbxref USING (dbxref_id)" versus: " FROM reference JOIN dbxref USING (dbxref_id)" You've used a mixture of "MEC" and "mec" tags on some comments - your initials? Personally I would avoid this... I initially assumed it was a three-letter acronym for something biological. From bugzilla-daemon at portal.open-bio.org Wed Mar 14 15:22:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 11:22:12 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141522.l2EFMCYj023249@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-14 11:22 EST ------- Marc, you wrote in comment 5: > I've attached two diffs that fixed a bunch of "bugs". 
A bug was > introduced that broke a key feature (storing of ncbi taxon ids) > when trying to fix a bug. I fixed both the code that looked up > the ncbi taxon ids and the sql statement that stored them. > > Hint: using None is better than "0" or removing it. I'm guessing that you are refering to the change two years ago in biopython/BioSQL/Loader.py (revision 1.16) http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/BioSQL/Loader.py?cvsroot=biopython See also bug 1921 which I think is a different issue (?) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Wed Mar 14 15:51:31 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 14 Mar 2007 11:51:31 -0400 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1ACCB.1030704@c2b2.columbia.edu> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> Message-ID: <45F81A03.4080504@c2b2.columbia.edu> Ed Schofield wrote: > Ah, yes, I understand. And I've now re-read your previous post and now > understand your point there about clustermodule.c too. So I've changed > all (flags & CONTIGUOUS) instances to use PyArray_ISCONTIGUOUS() in my > patch and updated it at > > http://edschofield.com/biopython-numpy-support.patch OK, thanks. 
I have fixed clustermodule.c in Biopython's CVS to use PyArray_ISCONTIGUOUS instead of (flags & CONTIGUOUS). I'll look at this patch in more detail after the upcoming release is out. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Wed Mar 14 15:51:06 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 11:51:06 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200703141551.l2EFp6k9024891@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 ------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2007-03-14 11:51 EST ------- > Have you had a chance to look at this patch yet Michiel? No, I haven't looked at this. Basically I have given up on parsing plain-text output from Blast. I don't think it can be done reliably. On the other hand, if the patch solves some of the issues with plain-text parsing, I'm all for it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Mar 14 16:06:44 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 12:06:44 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141606.l2EG6iOl025748@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #8 from lpritc at scri.sari.ac.uk 2007-03-14 12:06 EST ------- > I'm guessing that you are refering to the change two years ago in > biopython/BioSQL/Loader.py (revision 1.16) I think Marc's improving on my workaround to Loader.py (rev 1.17) and taking an alternative approach to my fix to DBUtils.py. The test_BioSQL.py script in CVS still fails though, due to an unrelated issue of the return string format for locations, and I still need to issue server.adaptor.commit() in MySQL5. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From edschofield at gmail.com Wed Mar 14 18:26:53 2007 From: edschofield at gmail.com (Ed Schofield) Date: Wed, 14 Mar 2007 18:26:53 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F47C56.7070300@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> Message-ID: <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> On 3/11/07, Peter wrote: > Michiel de Hoon wrote: > > Peter wrote: > >> I have basically replaced that whole section in Tutorial.tex, and > >> checked it looks fine using pdflatex and hevea on my linux machine. 
> > > > For those of you who want to preview the new tutorial, it's available here: > > > > http://biopython.org/DIST/docs/tutorial/Tutorial-new.html > > > > This contains a description of the new Bio.SeqIO in section 2.4. > > Cheers Michiel. > > There will be a few more changes still to come. > > For example, I just noticed that the "Orchid Photos" links which Brad > Chapman liked so much are dead - looks like www.millicentorchids.com > belongs to a domain parking company now. Two more points: - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. - The code example in Section 2.4.1 uses record and seq_record inconsistently. Otherwise, looks good to me! -- Ed From bugzilla-daemon at portal.open-bio.org Wed Mar 14 18:34:27 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 14:34:27 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141834.l2EIYRNL001256@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #9 from mcolosimo at mitre.org 2007-03-14 14:34 EST ------- I meant the change in leading spaces, which is only cosmetic. I put my initials in the code for my own use so that I knew what I had touched, so they can be removed. In the change log, I think they should be kept (or add the bug # that these fix). (In reply to comment #6) > Minor comments on patch 594 (which I have only read): > > Did you mean to change the number of leading spaces in this string: > " FROM reference JOIN dbxref USING (dbxref_id)" > versus: > " FROM reference JOIN dbxref USING (dbxref_id)" > > You've used a mixture of "MEC" and "mec" tags on some comments - your initials? > Personally I would avoid this... I initially assumed it was a three-letter > acronym for something biological. 
> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 14 18:41:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 14:41:47 -0400 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200703141841.l2EIflj1001592@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #10 from mcolosimo at mitre.org 2007-03-14 14:41 EST ------- (In reply to comment #8) > > I'm guessing that you are refering to the change two years ago in > > biopython/BioSQL/Loader.py (revision 1.16) > > I think Marc's improving on my workaround to Loader.py (rev 1.17) and taking an > alternative approach to my fix to DBUtils.py. The test_BioSQL.py script in CVS > still fails though, due to an unrelated issue of the return string format for > locations, and I still need to issue server.adaptor.commit() in MySQL5. > This does improve on your work around for bug #1921 and maybe the patch files should be attached to that bug #. I couldn't repeat the error for this bug. This is an older bug and might have been fixed in between. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mdehoon at c2b2.columbia.edu Wed Mar 14 19:08:58 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 14 Mar 2007 15:08:58 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> Message-ID: <45F8484A.9010803@c2b2.columbia.edu> Ed Schofield wrote: > > - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. I've added these files on the server. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Wed Mar 14 19:47:07 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 14 Mar 2007 19:47:07 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> Message-ID: <45F8513B.90802@maubp.freeserve.co.uk> Peter wrote: >> ..., I just noticed that the "Orchid Photos" links which Brad >> Chapman liked so much are dead - looks like www.millicentorchids.com >> belongs to a domain parking company now. 
Unless anyone has a better suggestion, I've gone with the Google Image and Flickr searches instead... Ed Schofield wrote: > Two more points: > > - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. There was a "funny" in the LaTeX code that worked in the PDF output from pdflatex, but not in the HTML output from Hevea. I think I have fixed that in CVS now. Also I see Michiel has put the example files on the website now, so I have also updated the link to point there (rather than downloading them via a ViewCVS checkout). Ed Schofield wrote: > - The code example in Section 2.4.1 uses record and seq_record inconsistently. I think Michiel pointed that out too, and again, I think I have corrected that in CVS. Ed Schofield wrote: > Otherwise, looks good to me! Lovely. I'll try and recompile the HTML edition tonight, and fingers crossed we can get it online for another proof read before the next release. Peter From mdehoon at c2b2.columbia.edu Wed Mar 14 20:18:17 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 14 Mar 2007 16:18:17 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F8513B.90802@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> Message-ID: <45F85889.20104@c2b2.columbia.edu> Peter wrote: > I'll try and recompile the HTML edition tonight, and fingers crossed we > can get it online for another proof read before the next release. 
> If it's easier for you, you can also submit Tutorial.tex to CVS, then I'll get it from there, run hevea on it and put it up for a proof read (since hevea will be used for the final version). --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Wed Mar 14 22:42:01 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 14 Mar 2007 22:42:01 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F85889.20104@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> Message-ID: <45F87A39.4060809@maubp.freeserve.co.uk> Thanks for the feedback Michiel and Ed. When I get back from my holiday (by which time I hope that Biopython 1.43 will have been released without any big issues), I plan to update Section 4.4 "Dealing with alignments". I've made a few changes to the Tutorial in CVS today, but I think I've finished now. Michiel Jan Laurens de Hoon wrote: > If it's easier for you, you can also submit Tutorial.tex to CVS, then > I'll get it from there, run hevea on it and put it up for a proof read > (since hevea will be used for the final version). That would be easier for me. You might notice that I tweaked all the URLs (and updated the addresses of a few that had changed). This got rid of the recursive anchor tag warnings I was getting from hevea, and hasn't had any side effects as far as I can tell. 
I have just checked the updated Tutorial.tex works on Linux with both pdflatex (for Tutorial.pdf) and hevea (for the HTML version). I have not installed hevea on Windows, but after adding the hevea.sty file to my Windows LaTeX installation (I use MiKTeX) I was able to build Tutorial.pdf on Windows too. >Ed Schofield wrote: >> >> - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. >> > I've added these files on the server. Great. I've updated the URLs in the tutorial to point at those files. It might be nice to upload the rest of the files used in Bio/Doc/examples/ as well... I had previously made some of the other examples point to ViewCVS to download the relevant file - this works but is a bit clumsy. Certainly when I was first reading the Tutorial, my immediate reaction was to think where can I get these example sequence files - and having a URL right there in the HTML and PDF editions is very nice. Peter From mdehoon at c2b2.columbia.edu Thu Mar 15 00:04:33 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Wed, 14 Mar 2007 20:04:33 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F87A39.4060809@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> <45F87A39.4060809@maubp.freeserve.co.uk> Message-ID: <45F88D91.5080609@c2b2.columbia.edu> I put the new Tutorial preview at http://biopython.org/DIST/docs/tutorial/Tutorial-new.html Peter wrote: >>> - The links to ls_orchid.fasta and ls_orchid.gbk are also dead. >>> >> I've added these files on the server. > > Great. 
I've updated the URLs in the tutorial to point at those files. > It might be nice to upload the rest of the files used Bio/Doc/examples/ > as well... > I've changed the links to the example files to relative paths, and added these files on the server. With the relative paths, links should work both on the web page and from the local documentation as contained in the Biopython distribution. --Michiel -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Thu Mar 15 00:37:05 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Mar 2007 00:37:05 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F88D91.5080609@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> <45F87A39.4060809@maubp.freeserve.co.uk> <45F88D91.5080609@c2b2.columbia.edu> Message-ID: <45F89531.9070600@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > I've changed the links to the example files to relative paths, and added > these files on the server. With the relative paths, links should work > both on the web page and from the local documentation as contained in > the Biopython distribution. > > ... > > I put the new Tutorial preview at > http://biopython.org/DIST/docs/tutorial/Tutorial-new.html I am the bearer of bad news I'm afraid. 
I assume the local file structure looks something like this:

.../Doc/Tutorial.pdf
.../Doc/Tutorial.html
.../Doc/examples/ls_orchid.fasta
.../Doc/examples/ls_orchid.gbk
etc

With the online copies going here:

http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.fasta
http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk

So yes, relative links like "examples/ls_orchid.fasta" and "examples/ls_orchid.gbk" should in theory work for both the HTML and PDF files, both locally and online.

Sadly it looks like hevea has mangled your relative links. They looked fine in LaTeX, but the HTML file you put online contains things like <A HREF="ls_orchid.fasta"> which of course fails.

Also, and this may depend on the version of LaTeX used, but in the Tutorial.pdf I just built on Windows, using Adobe Reader 6.0, the links display with "file:examples/ls_orchid.fasta" as the tool tip, but seem to get passed to Internet Explorer as "http://examples/ls_orchid.fasta" which fails.

I think it would be simpler and safer to just use an absolute URL to the webpage copy, and mention in the text that the files are included in the source code.

By the way - am I right in thinking that the Windows installer does not come with the documentation directory? I would assume that some Windows users would just download the PDF tutorial and put it anywhere on their hard disk (it's what I would do!), in which case there is no way the relative links idea could work.

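
As a rough illustration of the link resolution discussed above, the following Python sketch uses the standard library's urllib.parse.urljoin to show where a relative link ends up, depending on the page that contains it. The local path /home/user/biopython/Doc/ is an assumed placeholder, not a path from this thread, and the sketch says nothing about how hevea or a PDF viewer actually rewrites links.

```python
from urllib.parse import urljoin

# Online location of the tutorial, as given in the message above.
ONLINE_BASE = "http://biopython.org/DIST/docs/tutorial/Tutorial.html"
# Hypothetical local location of the same page (an assumption for illustration).
LOCAL_BASE = "file:///home/user/biopython/Doc/Tutorial.html"


def resolve(base, link):
    """Resolve a (possibly relative) link against the page that contains it."""
    return urljoin(base, link)


if __name__ == "__main__":
    # The same relative link works from both locations, provided the
    # examples/ directory sits next to the HTML file in each case.
    for base in (ONLINE_BASE, LOCAL_BASE):
        print(resolve(base, "examples/ls_orchid.fasta"))
    # For comparison: a link without the examples/ directory resolves
    # next to Tutorial.html itself, where no such file is listed.
    print(resolve(ONLINE_BASE, "ls_orchid.fasta"))
```

A PDF saved to an arbitrary folder has no examples/ directory beside it, which is why an absolute URL is the safer choice for that case.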
Peter From biopython-dev at maubp.freeserve.co.uk Thu Mar 15 00:41:50 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Mar 2007 00:41:50 +0000 Subject: [Biopython-dev] [BioPython] Biopython hackathon In-Reply-To: <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> References: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> Message-ID: <45F8964E.1030701@maubp.freeserve.co.uk> Ann Loraine wrote: > I hope you will consider the following two requests as possible > hackathon activities: > > (1) If it does not already do this, it would be nice if the blast > "plain text" (non-XML) parser would report the length of the target > ("hit") sequence as well as the query. If I recall correctly, the last > time I used the plain text blast parser, I had to measure the length > of the targets by opening up the fasta copy of the blastable database > and reading the lengths one-by-one. My database wasn't very big, so it > wasn't a hassle to do this, but I can foresee situations where this > kludge would fail. I'm not answering your question, but... The "plain text" (non-XML) parser is currently falling behind the changes the NCBI seem to make each release - see bug 2090: http://bugzilla.open-bio.org/show_bug.cgi?id=2090 We really want to encourage people to move over to the XML Blast parser instead. Do you have a strong reason for preferring to parse the plain text output? 
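
For the hit-length request above, the XML output already carries the numbers. The sketch below is not the Bio.Blast.NCBIXML parser; it is a minimal standard-library illustration, assuming the usual NCBI BLAST XML element names (BlastOutput_query-len, Hit_id, Hit_len). The embedded document is a made-up, heavily trimmed fragment rather than real BLAST output, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up fragment in the style of NCBI BLAST XML output (not real results).
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_query-len>120</BlastOutput_query-len>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit><Hit_id>hypothetical|1</Hit_id><Hit_len>950</Hit_len></Hit>
        <Hit><Hit_id>hypothetical|2</Hit_id><Hit_len>431</Hit_len></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def query_and_hit_lengths(xml_text):
    """Return (query length, {hit id: hit length}) from BLAST-style XML text."""
    root = ET.fromstring(xml_text)
    query_len = int(root.findtext("BlastOutput_query-len"))
    hit_lens = {
        hit.findtext("Hit_id"): int(hit.findtext("Hit_len"))
        for hit in root.iter("Hit")
    }
    return query_len, hit_lens


if __name__ == "__main__":
    q_len, hits = query_and_hit_lengths(SAMPLE_XML)
    print("query length:", q_len)
    for hit_id, hit_len in hits.items():
        print(hit_id, hit_len)
```

With the lengths reported per hit, there is no need to reopen the FASTA copy of the database to measure the targets.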
Peter From bugzilla-daemon at portal.open-bio.org Thu Mar 15 00:50:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 Mar 2007 20:50:40 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200703150050.l2F0oeV7016990@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython- | |bugzilla at maubp.freeserve.co. | |uk ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2007-03-14 20:50 EST ------- > No, I haven't looked at this. Basically I have given up on > parsing plain-text output from Blast. I don't think it can > be done reliably. On the other hand, if the patch solves some > of the issues with plain-text parsing, I'm all for it. I agree with you that maintaining the plain-text parsing is a pain - especially given the recent change for the output for multiple queries where the header is no longer repeated. However, the patch did help for parsing the output of single queries, which is better than nothing. Note that I haven't touched this code since December, but the only change in CVS in the meantime was the trivial "oldengine" switch, so it should be easy to merge this in. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython-dev at maubp.freeserve.co.uk Thu Mar 15 01:38:09 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Mar 2007 01:38:09 +0000 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F88D91.5080609@c2b2.columbia.edu> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> <45F87A39.4060809@maubp.freeserve.co.uk> <45F88D91.5080609@c2b2.columbia.edu> Message-ID: <45F8A381.3020900@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > I've changed the links to the example files to relative paths, and added > these files on the server. ... So far I can see just "ls_orchid.fasta", "ls_orchid.gbk" and "opuntia.fasta" online: http://biopython.org/DIST/docs/tutorial/examples/ To be fair, these are the only files I had explicitly named or linked to. I meant it would be nice to also upload and then link to: (a) protein.aln, used (but not currently hyper-referenced) in the section "Creating your own substitution matrix from an alignment". (b) m_cold.fasta, used (but not currently hyper-referenced) as an example input file in the Blast section, and also in the "What the heck is a handle?" appendix. I think that covers all the non-Python files in Bio/Doc/examples/ except for Bio/Doc/examples/nmr/noed.xpk which does not seem to be used in the tutorial at the moment. 
Peter From mdehoon at c2b2.columbia.edu Thu Mar 15 15:57:05 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 15 Mar 2007 11:57:05 -0400 Subject: [Biopython-dev] Tutorial documentation for Bio.SeqIO In-Reply-To: <45F89531.9070600@maubp.freeserve.co.uk> References: <45F1D83B.20208@c2b2.columbia.edu> <45F2A381.2080303@maubp.freeserve.co.uk> <45F35059.6000102@c2b2.columbia.edu> <45F356A8.6050706@maubp.freeserve.co.uk> <45F36C1A.6080005@c2b2.columbia.edu> <45F4229B.9080401@maubp.freeserve.co.uk> <45F44BD0.6010000@c2b2.columbia.edu> <45F47C56.7070300@maubp.freeserve.co.uk> <1b5a37350703141126o5fd29e5fu1426f2b4eb173991@mail.gmail.com> <45F8513B.90802@maubp.freeserve.co.uk> <45F85889.20104@c2b2.columbia.edu> <45F87A39.4060809@maubp.freeserve.co.uk> <45F88D91.5080609@c2b2.columbia.edu> <45F89531.9070600@maubp.freeserve.co.uk> Message-ID: <45F96CD1.7010002@c2b2.columbia.edu> Peter wrote: > Michiel Jan Laurens de Hoon wrote: >> I've changed the links to the example files to relative paths, and >> added these files on the server. With the relative paths, links should >> work both on the web page and from the local documentation as >> contained in the Biopython distribution. > > Sadly it looks like hevea has mangled your relative links. They looked > fine in LaTeX, but the HTML file you put online contains things like HREF="ls_orchid.fasta"> which of course fails. Are you sure you're looking at the latest version of the HTML? Maybe your browser is showing you an older page from cache. On my browser, I'm seeing . > I think it would be simpler and safer to just use an absolute URL to the > webpage copy, and mention in the text that the files are included in the > source code. > By the way - am I right in thinking that the Windows installer does not > come with the documentation directory? 
I would assume that some Windows > users would just download the PDF tutorial and put it anywhere on their > hard disk (it's what I would do!), in which case there is no way the > relative links idea could work. That's right: The Windows installer comes without the documentation. As a solution, we can include both links -- a local one for off-line use, and an absolute URL for safety. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Sat Mar 17 23:26:50 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Sat, 17 Mar 2007 19:26:50 -0400 Subject: [Biopython-dev] Biopython release 1.43 Message-ID: <45FC793A.2010106@c2b2.columbia.edu> Dear Biopythoneers, We are pleased to announce the release of Biopython 1.43. This release includes a brand-new set of parsers in Bio.SeqIO by Peter Cock for reading biological sequence files in various formats, an updated Blast XML parser in Bio.Blast.NCBIXML, a new UniGene flat-file parser by Sean Davis, and numerous improvements and bug fixes in Bio.PDB, Bio.SwissProt, Bio.Nexus, BioSQL, and others. Believe it or not, even the documentation was updated. Source distributions and Windows installers are available from the Biopython website at http://biopython.org. My thanks to all code contributors who made this new release possible. 
--Michiel on behalf of the Biopython developers -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From chris.lasher at gmail.com Sun Mar 18 16:14:25 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 18 Mar 2007 12:14:25 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <45F235B7.6000409@c2b2.columbia.edu> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> Message-ID: <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> On 3/10/07, Michiel de Hoon wrote: > Chris Lasher wrote: > > I hope the BioPython developers will consider a move to Subversion > > seriously. If there is support from the devs, but no interest on > > anyone's part to make it happen, given the proper people to contact, I > > will be happy to get this moving as a way of contributing back to the > > BioPython community. > > I know very little about CVS and Subversion (which is why I didn't > respond to your original post). But I did notice that a lot of software > projects are using Subversion instead of CVS nowadays, including Python > itself. So I don't have any objections against Biopython moving to > Subversion as well (unfortunately, I cannot be of much help here either). > > If we are moving to Subversion though, I'd like to ask you not to make > any changes until the next Biopython release comes out, which will be in > about one week from now. Since no one else has volunteered, I'm taking up responsibility for the transition. I got the ball moving by contacting "support at open-bio.org" to alert them of our interest and get any contacts we'll need to make this happen. Also, if anybody on the list has any information that would be helpful in this (e.g., who administers the CVS repo) please feel free to send it along. 
Likewise, feel free to raise any questions, concerns, and comments on the list.

Once the Subversion repository is in place and in sync with the CVS repository (done through cvs2svn), we have two options with regard to the CVS repository:

1) Drop support for CVS.

What this means: The CVS repository will either be shut down or be left in place but no longer updated with changes to the code.

Advantages:
* Transitioning users to Subversion should be trivial due to Subversion's inheritance from CVS.
  - Transition documentation already exists from svnbook.red-bean.com, specific examples can be written on Biopython wiki, etc.
* Clean, less challenging solution.

Disadvantages:
* We will need to publicize in a big way that we've transitioned.
* Automated scripts on remote machines that depend on the CVS repository would break or work with Biopython code that becomes increasingly out of date.
  - Trivial to remedy (e.g., s/cvs up -dP/svn up/g)
* Obstinate users will complain.
  - We can't please everybody.

2) Allowing legacy support for CVS via the method found at:

What this means: Briefly, the commits to the Subversion repository are mirrored in the CVS repository. CVS access becomes read-only, commits are not permitted.

Advantages:
* Allows legacy users of CVS repository to receive updates.

Disadvantages:
* We may not have enough administrative access to do this.
* This will require much more time to implement, test, and triple-check.
* Has anybody on the list ever done this? It could lead to a lot of "learning experiences".

Questions, concerns, and comments welcome.

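
The substitution mentioned above for automated scripts, s/cvs up -dP/svn up/g, could also be applied with a few lines of Python. This is only an illustrative sketch: the function name and the sample script text are invented, and whether `svn up` is the right replacement for a given CVS invocation is taken from the message above, not checked here.

```python
import re

# Matches the CVS update command named in the message above.
CVS_UPDATE = re.compile(r"\bcvs up -dP\b")


def migrate_update_commands(script_text):
    """Replace every 'cvs up -dP' in the given text with 'svn up'."""
    return CVS_UPDATE.sub("svn up", script_text)


if __name__ == "__main__":
    # Invented example of a nightly update script.
    old_script = "cd ~/src/biopython\ncvs up -dP\npython setup.py build\n"
    print(migrate_update_commands(old_script))
```

Other CVS commands in a script are left untouched and would need to be reviewed by hand.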
Chris From edschofield at gmail.com Sun Mar 18 16:53:37 2007 From: edschofield at gmail.com (Ed Schofield) Date: Sun, 18 Mar 2007 16:53:37 +0000 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> Message-ID: <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> On 3/18/07, Chris Lasher wrote: > > Since no one else has volunteered, I'm taking up responsibility for > the transition. I got the ball moving by contacting "support at > open-bio.org" to get alert them of our interest and get any contacts > we'll need to make this happen. Also, if anybody on the list has any > information that would be helpful in this (e.g., who administers the > CVS repo) please feel free to send it along. Likewise, feel free to > raise any questions, concerns, and comments on the list. > > Once the Subversion repository is in place and in sync with the CVS > repository (done through cvs2svn), we have two options with regards to > the CVS repository: > ... Nice summary. How many "legacy users" will there be for whom moving to SVN would be a non-trivial task? This list is one channel for soliciting their feedback; are there any others? 
-- Ed From chris.lasher at gmail.com Sun Mar 18 17:02:03 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 18 Mar 2007 13:02:03 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> Message-ID: <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> On 3/18/07, Ed Schofield wrote: > Nice summary. How many "legacy users" will there be for whom moving to > SVN would be a non-trivial task? This list is one channel for > soliciting their feedback; are there any others? I don't know, actually, how many "legacy users" we would have. The other channel for soliciting feedback, and a very important one, is the Biopython user list. That would probably be the best way of assessing how many would still need to rely on CVS. Should I send an email out to that list, or would one of the more senior developers like to do it? 
Chris From sbassi at gmail.com Tue Mar 20 19:41:08 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 20 Mar 2007 16:41:08 -0300 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <45F81A03.4080504@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> <45F81A03.4080504@c2b2.columbia.edu> Message-ID: On 3/14/07, Michiel Jan Laurens de Hoon wrote: > OK, thanks. I have fixed clustermodule.c in Biopython's CVS to use > PyArray_ISCONTIGUOUS instead of (flags & CONTIGUOUS). > I'll look at this patch in more detail after the upcoming release is out. Did the numpy instead of numeric made it into 1.43? Best, SB. 
From mdehoon at c2b2.columbia.edu Tue Mar 20 20:46:52 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Tue, 20 Mar 2007 16:46:52 -0400 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <1b5a37350703091121i282acabbs96a1b55a134bbde0@mail.gmail.com> <1b5a37350703091122t67e6e10fsf2d33a70d5ba8dab@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> <45F81A03.4080504@c2b2.columbia.edu> Message-ID: <4600483C.7010507@c2b2.columbia.edu> Sebastian Bassi wrote: > On 3/14/07, Michiel Jan Laurens de Hoon wrote: >> OK, thanks. I have fixed clustermodule.c in Biopython's CVS to use >> PyArray_ISCONTIGUOUS instead of (flags & CONTIGUOUS). >> I'll look at this patch in more detail after the upcoming release is out. > > Did the numpy instead of numeric made it into 1.43? > Yes and no. Yes): The patch above did make it in, which means that it is now fairly easy to compile Biopython with Numpy instead of Numeric. All you would have to do is to change the #include statements in the C-code to #include . (Why the Numpy folks put arrayobject.h in a different location, I don't know; if they hadn't, a transition to Numpy would have been a lot easier). But, if you're not on a 32-bits platform, Bio.Cluster will not work correctly. Also, some import statements will fail (e.g. from Numeric import * should become from numpy import *). So, in summary, you can compile the code but you'll probably have to tinker with the Python code. On the bright side, most of Biopython does not need Numeric or Numpy, so 90% of Biopython will work. 
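
On the Python side, the change from "from Numeric import *" to "from numpy import *" can be softened with an import fallback. The following is only a sketch of that idea, not code from Biopython or from the patch under discussion; the helper name first_importable is hypothetical, and it does nothing about the C-level or API differences between the two packages.

```python
import importlib


def first_importable(names):
    """Return the first module in `names` that can be imported, or None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer numpy and fall back to Numeric; either (or both) may be absent.
array_module = first_importable(("numpy", "Numeric"))

if __name__ == "__main__":
    if array_module is None:
        print("Neither numpy nor Numeric is installed")
    else:
        print("Using", array_module.__name__)
```

Code that then calls functions through array_module still has to stick to names and behaviour that both packages share.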
No): Adding numpy support is a major undertaking; it's not something I'd want to add one week before a release is coming out. Some tests fail with Numpy. While these don't seem to be major issues, it's something that needs to be fixed first. An additional problem is that we cannot just drop Numeric support, so we'll have to support both for now. Numeric is still needed because (a) Numpy does not compile cleanly on all major platforms (for example on Cygwin); (b) Other Python software relevant to computational biology uses Numeric; (c) Numpy's documentation costs $40; Numeric's free. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From chris.lasher at gmail.com Wed Mar 21 02:39:11 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Tue, 20 Mar 2007 22:39:11 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> Message-ID: <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> On 3/18/07, Chris Lasher wrote: > I don't know, actually, how many "legacy users" we would have. The > other channel for soliciting feedback, and a very important one, is > the Biopython user list. That would probably be the best way of > assessing how many would still need to rely on CVS. Should I send an > email out to that list, or would one of the more senior developers > like to do it? Silence means assent. =-) I'll post to the Biopython users list if nobody else wants to and I don't hear any objections in the next day. 
Chris From edschofield at gmail.com Wed Mar 21 11:36:13 2007 From: edschofield at gmail.com (Ed Schofield) Date: Wed, 21 Mar 2007 11:36:13 +0000 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> Message-ID: <1b5a37350703210436l6aeef938s13f06ebabc2d8a09@mail.gmail.com> On 3/21/07, Chris Lasher wrote: > On 3/18/07, Chris Lasher wrote: > > I don't know, actually, how many "legacy users" we would have. The > > other channel for soliciting feedback, and a very important one, is > > the Biopython user list. That would probably be the best way of > > assessing how many would still need to rely on CVS. Should I send an > > email out to that list, or would one of the more senior developers > > like to do it? > > Silence means assent. =-) Exactly! 
;) -- Ed From mcolosimo at mitre.org Wed Mar 21 12:39:57 2007 From: mcolosimo at mitre.org (Marc Colosimo) Date: Wed, 21 Mar 2007 08:39:57 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> Message-ID: On Mar 20, 2007, at 10:39 PM, Chris Lasher wrote: > On 3/18/07, Chris Lasher wrote: >> I don't know, actually, how many "legacy users" we would have. The >> other channel for soliciting feedback, and a very important one, is >> the Biopython user list. That would probably be the best way of >> assessing how many would still need to rely on CVS. Should I send an >> email out to that list, or would one of the more senior developers >> like to do it? > > Silence means assent. =-) I'll post to the Biopython users list if > nobody else wants to and I don't hear any objections in the next day. > I've found that svn is more useable than cvs. I especially like the move command (svn mv), which moves the file(s) and all the associated information. Thus, you can undo a move and see all the change history for the file(s). In addition, it handles binary files automagically. The one thing that I don't like at times is that you need to explicitly set keyword tags (like $Id$). Also, some places block webDAV commands through proxies. So, keeping a read only cvs would be good. 
Marc From edschofield at gmail.com Wed Mar 21 12:55:34 2007 From: edschofield at gmail.com (Ed Schofield) Date: Wed, 21 Mar 2007 12:55:34 +0000 Subject: [Biopython-dev] PATCH: NumPy support for BioPython In-Reply-To: <4600483C.7010507@c2b2.columbia.edu> References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com> <45F1C218.6000601@c2b2.columbia.edu> <1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com> <45F2253E.2030909@c2b2.columbia.edu> <1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com> <45F45950.3090104@c2b2.columbia.edu> <1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com> <45F81A03.4080504@c2b2.columbia.edu> <4600483C.7010507@c2b2.columbia.edu> Message-ID: <1b5a37350703210555n720d4e9bla03207a3f5758833@mail.gmail.com> On 3/20/07, Michiel Jan Laurens de Hoon wrote: > Sebastian Bassi wrote: > > On 3/14/07, Michiel Jan Laurens de Hoon wrote: > >> OK, thanks. I have fixed clustermodule.c in Biopython's CVS to use > >> PyArray_ISCONTIGUOUS instead of (flags & CONTIGUOUS). > >> I'll look at this patch in more detail after the upcoming release is out. > > > > Did the numpy instead of numeric made it into 1.43? > > > Yes and no. > > Yes): > > The patch above did make it in, which means that it is now fairly easy > to compile Biopython with Numpy instead of Numeric. All you would have > to do is to change the #include statements in > the C-code to #include A more honest answer would have been "no". To compile and _actually use_ NumPy one needs somewhat more than to "probably ... tinker with the Python code". At a minimum one must change the Python import statements, change references to obsoleted function names, fix the broken array boolean operators in MarkovModel.py, and, for 64-bit platforms, fix the width of the dimension data types in cluster.c. These changes are not optional. To make it buildable one also needs to change the distutils setup.py file to get the new header locations. 
And if one ever needs to install Numeric after installing BioPython, one wants a mechanism to avoid segfaults when importing incompatible compiled C modules. In short, one still needs my patch. -- Ed From chris.lasher at gmail.com Wed Mar 21 14:23:58 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 21 Mar 2007 10:23:58 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> Message-ID: <128a885f0703210723q3bc95e41kc587c9077915763d@mail.gmail.com> On 3/21/07, Marc Colosimo wrote: > I've found that svn is more useable than cvs. I especially like the > move command (svn mv), which moves the file(s) and all the associated > information. Thus, you can undo a move and see all the change history > for the file(s). In addition, it handles binary files automagically. Good points. Another +1 for Subversion: it tracks changes concomitantly and maintains a revision across the entire repository, not on a file-by-file basis. The Subversion Book explains this concept better than I can. See and . > The one thing that I don't like at times is that you need to > explicitly set keyword tags (like $Id$). That's true. One thing you can do, however, is configure Subversion to always set keyword substitution on all files of a filetype (e.g., always set keyword substitution for $Revision$ for all ".py" files). > Also, some places block webDAV commands through proxies. > So, keeping a read only cvs would be good. Hmm, I hadn't thought about this. How common is this practice? It's certainly a good argument in favor of maintaining CVS.
(I'm assuming CVS does not do WebDAV). Chris From mhampton at d.umn.edu Wed Mar 21 14:29:20 2007 From: mhampton at d.umn.edu (Marshall Hampton) Date: Wed, 21 Mar 2007 09:29:20 -0500 (CDT) Subject: [Biopython-dev] SAGE project support In-Reply-To: References: Message-ID: Hi, I thought I would alert readers of this list to the fact that I recently asked the project leader for SAGE (Software for Algebra and Geometry Experimentation), William Stein, to add biopython as an optional package for SAGE. In his usual speedy fashion, he did so in a couple of hours! SAGE is a very exciting platform for uniting many open-source projects in mathematics. By leveraging lots of existing code it has progressed extremely rapidly. Currently the main server at the University of Washington is down for maintenance, but there is a mirror (maybe to become the main site) at: www.sagemath.org The screenshots link gives a pretty good idea of what SAGE is currently capable of. By the way, I am currently using biopython heavily in a bioinformatics course and I hope to contribute more to the project in the future. 
Cheers, Marshall Hampton University of Minnesota, Duluth Department of Mathematics and Statistics From mdehoon at c2b2.columbia.edu Thu Mar 22 04:36:12 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Thu, 22 Mar 2007 00:36:12 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> Message-ID: <460207BC.9000907@c2b2.columbia.edu> Maybe this is a silly question but will there be some downtime during the cvs->svn conversion? And can we still make commits to cvs or is it better to wait until the conversion is complete? --Michiel. Chris Lasher wrote: > On 3/18/07, Chris Lasher wrote: >> I don't know, actually, how many "legacy users" we would have. The >> other channel for soliciting feedback, and a very important one, is >> the Biopython user list. That would probably be the best way of >> assessing how many would still need to rely on CVS. Should I send an >> email out to that list, or would one of the more senior developers >> like to do it? > > Silence means assent. =-) I'll post to the Biopython users list if > nobody else wants to and I don't hear any objections in the next day. > > Chris > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mcolosimo at mitre.org Thu Mar 22 10:33:52 2007 From: mcolosimo at mitre.org (Colosimo, Marc E.) 
Date: Thu, 22 Mar 2007 06:33:52 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <460207BC.9000907@c2b2.columbia.edu> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> <460207BC.9000907@c2b2.columbia.edu> Message-ID: I haven't done this, but I would assume that you need to turn off commits to cvs, and I think the conversion would be rather quick (depending on the machine), perhaps overnight. I'll ask some people here who might know better. Marc
From chris.lasher at gmail.com Sat Mar 24 16:36:22 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sat, 24 Mar 2007 12:36:22 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <460207BC.9000907@c2b2.columbia.edu> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <1b5a37350703180953w47ca97c6g1fcfadc0014a7e30@mail.gmail.com> <128a885f0703181002g53ba4305w803f81021006a7ae@mail.gmail.com> <128a885f0703201939v650d76e4m8287ce5535f4891e@mail.gmail.com> <460207BC.9000907@c2b2.columbia.edu> Message-ID: <128a885f0703240936l4cebc2c1j5e8294560b3e7322@mail.gmail.com> On 3/22/07, Michiel de Hoon wrote: > Maybe this is a silly question but will there be some downtime during > the cvs->svn conversion? And can we still make commits to cvs or is it > better to wait until the conversion is complete? Ah, very good question, Michiel. Yes, there will be downtime for the CVS repository. As Marc mentioned, it would probably be one evening. Also, to clarify, once the conversion is complete, CVS commits won't be permitted. If we decide to maintain CVS at all, it will only be possible to check out and update from CVS. (In other words, CVS becomes "read-only".) All commits will go through Subversion, and there are two main reasons for this: 1) There is no easy means by which we can keep the two in sync going both ways. The Subversion to CVS synchronization is quite a bit of hackery on its own. 2) Subversion effectively deprecates CVS.
Taken straight from Subversion's site: > Subversion is meant to be a better CVS, so it has most > of CVS's features. Generally, Subversion's interface to > a particular feature is similar to CVS's, except where > there's a compelling reason to do otherwise. So, aside from the natural human propensity for fear of change, why one would want to continue to work via CVS escapes me. I am not keen on supporting CVS with updates after the transition, but if users must have access to it, I will put in the work it takes to make that happen. In any event, we will strongly encourage Biopython users to make the transition to Subversion. I need to draft a migration strategy which includes the following:

* documentation for developers on switching over to Subversion, in general (available via the Subversion Book), and in the specific context of Biopython, particularly
* documentation for users on how to checkout and update Biopython via Subversion
* the nitty gritty technical details of how we will proceed with the conversion--useful for the other Open-Bio projects which will want to follow in Biopython's wake

I will begin writing this documentation on the Biopython wiki this weekend. I would like to set a target date for the CVS to Subversion transition for May 20th, which gives about two months' worth of anticipation for developers (and users, once I get an email out to that list), and plenty of time prior to BOSC 2007 for the growing pains caused by the transition to be worked out. How does this sound--any red flags or glaring omissions?
Chris From mdehoon at c2b2.columbia.edu Tue Mar 27 00:03:49 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Mon, 26 Mar 2007 20:03:49 -0400 Subject: [Biopython-dev] PATCH: NumPy support for BioPython References: <1b5a37350703091035u176c2ab8m4512466ac77f3a4@mail.gmail.com><45F1C218.6000601@c2b2.columbia.edu><1b5a37350703091725w47b2ef59j260cbe32c170a1cf@mail.gmail.com><45F2253E.2030909@c2b2.columbia.edu><1b5a37350703100240uac7437ds7c9e707503f2434e@mail.gmail.com><45F45950.3090104@c2b2.columbia.edu><1b5a37350703120349k4f155a6cx2b8995a23ef6deeb@mail.gmail.com><45F81A03.4080504@c2b2.columbia.edu><4600483C.7010507@c2b2.columbia.edu> <1b5a37350703210555n720d4e9bla03207a3f5758833@mail.gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5DA@mail2.exch.c2b2.columbia.edu> Ed wrote: > In short, one still needs my patch. Sorry for my late reply. Could you make your patch available via BugZilla? So that further discussion of this patch can be kept in one place. Preferably a version that is consistent with the recently released version of Biopython, so people can try it out. Thanks, --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Tue Mar 27 13:08:37 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 27 Mar 2007 09:08:37 -0400 Subject: [Biopython-dev] [Bug 2251] New: [PATCH] NumPy support for BioPython Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2251 Summary: [PATCH] NumPy support for BioPython Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: edschofield at gmail.com The following patch adds support for NumPy in addition to Numeric. 
It does so with a thin wrapper layer at the C level and Python level, similar in purpose to (but less ambitious than) the numerix wrapper layer used by matplotlib. The patch is designed to prevent imports of incompatible compiled C modules in the case one installs Numeric after installing BioPython with NumPy support. It also changes the following:

- C #include statements
- Python import statements
- references to obsoleted function names
- the width of the dimension data types in cluster.c from int to intp (for 64-bit architectures)
- the distutils setup.py file to supply the correct NumPy header locations.
- the documentation (updating references to NumPy)

It also fixes array boolean operators in MarkovModel.py, which were silently broken before. It applies to BioPython 1.43, as follows:

$ tar xzvf biopython-1.43.tar.gz
$ cd biopython-1.43
$ patch -p1 < /path/to/biopython-1.43-numpy-support-v5.patch

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Mar 27 13:09:58 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 27 Mar 2007 09:09:58 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703271309.l2RD9wte011941@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #1 from edschofield at gmail.com 2007-03-27 09:09 EST ------- Created an attachment (id=610) --> (http://bugzilla.open-bio.org/attachment.cgi?id=610&action=view) Patch for NumPy support through the oldnumeric interface
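The wrapper-layer idea described in the Bug 2251 report can be sketched roughly as follows. This is only an illustration of an import fallback, assuming nothing about how the attached patch is actually written: the function name load_array_backend is hypothetical, and only the module names numpy.oldnumeric and Numeric are taken from the thread.

```python
import importlib


def load_array_backend(candidates=("numpy.oldnumeric", "Numeric")):
    """Return (name, module) for the first importable candidate.

    Hypothetical helper for illustration; the real patch may pick its
    array backend in a different way.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # This backend is not installed; try the next one.
            continue
    raise ImportError("no array backend found among: " + ", ".join(candidates))
```

Code that calls such a helper once and then uses only the returned module would not need to know which array package is installed, which is the kind of indirection a thin wrapper layer provides.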
From bugzilla-daemon at portal.open-bio.org Wed Mar 28 16:09:57 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Mar 2007 12:09:57 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703281609.l2SG9vFE011216@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-03-28 12:09 EST ------- It looks like there's an error in this patch:

@@ -166,7 +184,7 @@
   const int colstride = (*array)->strides[1];
   for (i=0; i < nrows; i++)
   { const char* p = p0;
-    mask[i] = malloc(ncolumns*sizeof(int));
+    mask[i] = malloc(ncolumns*sizeof(int*));
     for (j=0; j < ncolumns; j++, p+=colstride) mask[i][j] = *((int*)p);
     p0 += rowstride;
   }

mask is int**, mask[i] is int*, and we're allocating ncolumns integers. Or am I missing something? From bugzilla-daemon at portal.open-bio.org Wed Mar 28 17:05:57 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Mar 2007 13:05:57 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703281705.l2SH5vhN014252@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #3 from edschofield at gmail.com 2007-03-28 13:05 EST ------- Oops, my mistake. Revised patch attached below.
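The sizing argument in comment #2 can be checked with a few lines of Python using ctypes. This is only an illustration of the arithmetic, not code from clustermodule.c; the helper names row_bytes and row_table_bytes are hypothetical.

```python
import ctypes

INT_SIZE = ctypes.sizeof(ctypes.c_int)                  # sizeof(int)
PTR_SIZE = ctypes.sizeof(ctypes.POINTER(ctypes.c_int))  # sizeof(int*)


def row_bytes(ncolumns):
    """Bytes needed for one row of the mask: ncolumns ints."""
    return ncolumns * INT_SIZE


def row_table_bytes(nrows):
    """Bytes needed for the table of row pointers: nrows int* entries."""
    return nrows * PTR_SIZE


if __name__ == "__main__":
    print("sizeof(int) =", INT_SIZE, " sizeof(int*) =", PTR_SIZE)
    print("one row of 10 ints needs", row_bytes(10), "bytes")
    print("sizing that row with sizeof(int*) requests", 10 * PTR_SIZE, "bytes")
```

Each mask[i] holds ncolumns ints, so sizeof(int) is the element size that matches the data; sizeof(int*) belongs only to the outer table of row pointers. On platforms where a pointer is wider than an int, sizing a row with sizeof(int*) would request more memory than the row uses.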
From bugzilla-daemon at portal.open-bio.org Wed Mar 28 17:06:59 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Mar 2007 13:06:59 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703281706.l2SH6xna014329@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 edschofield at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #610 is|0                           |1
           obsolete|                            |

------- Comment #4 from edschofield at gmail.com 2007-03-28 13:06 EST ------- Created an attachment (id=611) --> (http://bugzilla.open-bio.org/attachment.cgi?id=611&action=view) Patch for NumPy support through the oldnumeric interface From charles.vejnar at isb-sib.ch Wed Mar 28 16:14:33 2007 From: charles.vejnar at isb-sib.ch (Charles Vejnar) Date: Wed, 28 Mar 2007 18:14:33 +0200 Subject: [Biopython-dev] Dynamic class return in Seq class Message-ID: <200703281814.33891.charles.vejnar@isb-sib.ch> Hi, I would like to build a class which would inherit from Seq class. But in some methods of the Seq class, return is like this:

    return Seq(s, self.alphabet)

which makes my sub-class unusable (I always get a Seq instance instead of my sub-class Seq instance). I would like to avoid a delegation schema. So, is it possible to change the returns from Seq(...) to self.__class__(...) (as it's already done in the __getslice__ method) or are there any reasons I am missing which justify these returns?
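The subclassing behaviour described in this message can be reproduced with a toy class. The sketch below is not Bio.Seq.Seq: the Seq stand-in, the AnnotatedSeq subclass, and both method names are hypothetical and exist only to contrast the two return styles.

```python
class Seq:
    """Toy stand-in for Bio.Seq.Seq (not the real class)."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def reversed_hardcoded(self):
        # Hard-coded class name: a subclass instance gets a plain Seq back.
        return Seq(self.data[::-1], self.alphabet)

    def reversed_dynamic(self):
        # self.__class__ preserves the (sub)class of the calling instance.
        return self.__class__(self.data[::-1], self.alphabet)


class AnnotatedSeq(Seq):
    """Hypothetical subclass used only for this illustration."""


s = AnnotatedSeq("ACGT")
print(type(s.reversed_hardcoded()).__name__)  # Seq
print(type(s.reversed_dynamic()).__name__)    # AnnotatedSeq
```

One caveat of the self.__class__ form is that it assumes every subclass constructor accepts the same arguments as the base class; a subclass with a different __init__ signature would break such methods.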
Best regards, Charles From gbastian at pasteur.fr Thu Mar 29 14:46:59 2007 From: gbastian at pasteur.fr (Giacomo Bastianelli) Date: Thu, 29 Mar 2007 16:46:59 +0200 Subject: [Biopython-dev] ResidueDepth Message-ID: <460BD163.9050509@pasteur.fr> Dear Biopython developers, I am trying to use the ResidueDepth class. I have installed the MSMS module and I get this error:

Traceback (most recent call last):
  File "test.py", line 8, in ?
    rd = ResidueDepth(model, '1SBC.pdb')
  File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 132, in __init__
    surface=get_surface(pdb_file)
  File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 83, in get_surface
    surface=_read_vertex_array(surface_file)
  File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 51, in _read_vertex_array
    fp=open(filename, "r")
IOError: [Errno 2] No such file or directory: '/tmp/tmpEoynGC.vert'

I checked the single programs (msms, pdb_to_xyzr) and they seem to work fine. Thanks for your suggestions! Giacomo From mdehoon at c2b2.columbia.edu Thu Mar 29 23:03:22 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 29 Mar 2007 19:03:22 -0400 Subject: [Biopython-dev] Dynamic class return in Seq class In-Reply-To: <200703281814.33891.charles.vejnar@isb-sib.ch> References: <200703281814.33891.charles.vejnar@isb-sib.ch> Message-ID: <460C45BA.1050801@c2b2.columbia.edu> In principle, I think that using self.__class__ instead of Seq is a good idea. But some Biopython tests fail with this substitution. It looks like these failures are trivial, but we do need to find some solution for them. --Michiel. Charles Vejnar wrote: > Hi, > > I would like to build a class which would inherit from Seq class. But in some > methods of the Seq class, return is like this : > return Seq(s, self.alphabet) > which makes my sub-class unusable (I always get a Seq instance instead of my > sub-class Seq instance). > > I would like to avoid a delegation schema.
So, is it possible to change the > returns from Seq(...) to self.__class__(...) (as it's already done in the > __getslice__ method) or are there any reasons I am missing which justify > these returns ? > > > Best regards, > Charles -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From gbastian at pasteur.fr Fri Mar 30 08:22:58 2007 From: gbastian at pasteur.fr (Giacomo Bastianelli) Date: Fri, 30 Mar 2007 10:22:58 +0200 Subject: [Biopython-dev] Residue Depth class Message-ID: <1175242979.7228.4.camel@localhost> Dear Biopython developers, I am trying to use the ResidueDepth class. I have installed the MSMS module and I get this error:

Traceback (most recent call last):
  File "test.py", line 8, in ?
    rd = ResidueDepth(model, '1SBC.pdb')
  File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 132, in __init__
    surface=get_surface(pdb_file)
  File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 83, in get_surface
    surface=_read_vertex_array(surface_file)
  File "/usr/lib64/python2.3/site-packages/Bio/PDB/ResidueDepth.py", line 51, in _read_vertex_array
    fp=open(filename, "r")
IOError: [Errno 2] No such file or directory: '/tmp/tmpEoynGC.vert'

I have python2.4 with biopython 1.43 in a linux ubuntu OS. I checked the single programs (msms, pdb_to_xyzr) and they seem to work fine. this is the code that I use:

----------------------
from string import *
from Bio.PDB import *
parser = PDBParser()
structure = parser.get_structure('1SBC.pdb', '1SBC.pdb')
model = structure[0]
rd = ResidueDepth(model, '1SBC.pdb')
--------------------------

Thanks for your suggestions!
Giacomo From bugzilla-daemon at portal.open-bio.org Sat Mar 31 22:52:46 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 31 Mar 2007 18:52:46 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200703312252.l2VMqkwR007408@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2007-03-31 18:52 EST ------- In the proposed change to py_kcluster in this patch, using npy_generic_dimension_t for nrows, ncolumns will cause a bug. Depending on the platform, npy_generic_dimension_t may not be an int. But the function kcluster, which is called from py_kcluster, expects nrows, ncolumns to be ints. If, for example, npy_generic_dimension_t is an 8-byte long, nrows will be truncated to a 4-byte int in the call to kcluster. So kcluster may get an incorrect number for nrows, ncolumns.
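The truncation described in comment #5 can be imitated in Python with ctypes fixed-width integers. This is a sketch of the arithmetic only, assuming a 4-byte int and an 8-byte dimension type as in the comment; the helper names truncate_to_int32 and checked_int32 are hypothetical and do not appear in the patch or in cluster.c.

```python
import ctypes

INT_MAX = 2**31 - 1  # largest value a 4-byte signed int can hold


def truncate_to_int32(n):
    """Mimic passing an 8-byte value where a 4-byte int is expected.

    ctypes performs no overflow checking, so only the low 32 bits survive.
    """
    return ctypes.c_int32(n).value


def checked_int32(n):
    """Refuse dimensions that would not survive the narrowing."""
    if not 0 <= n <= INT_MAX:
        raise OverflowError("dimension %d does not fit in a 4-byte int" % n)
    return n


nrows = 2**32 + 5                 # representable in an 8-byte long
print(truncate_to_int32(nrows))   # 5
```

A range check of the second kind, placed before the call into the int-based routine, is one way to turn silent truncation into an explicit error; whether the patch should do that or keep int dimensions throughout is left to the discussion on the bug.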