From katel at worldpath.net Sun Oct 7 22:10:50 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] IntelliGenetics parser
Message-ID: <001401c14f9e$748fe920$499403cf@g0fjl>

Tonight I ran into a snag with the iterator. The problem is the starting tag, which is a semicolon. The IntelliGenetics format starts with a block of comment lines, each beginning with a semicolon, so the iterator interprets each comment as the start of a new record. Should I write a custom iterator?

Cayte

From jchang at SMI.Stanford.EDU Mon Oct 8 01:11:36 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] IntelliGenetics parser
In-Reply-To: <001401c14f9e$748fe920$499403cf@g0fjl>
References: <001401c14f9e$748fe920$499403cf@g0fjl>
Message-ID:

I'm not familiar with the IntelliGenetics format. Could you provide a sample of it, so that people will be able to provide more comments?

Thanks,
Jeff

At 10:10 PM -0400 10/7/01, Cayte wrote:
> Tonight I ran into a snag with the iterator. The problem is the
>starting tag, which is a semicolon. The IntelliGenetics format
>starts with a block of comment lines, each beginning with a
>semicolon, so the iterator interprets each comment as the start of
>a new record. Should I write a custom iterator?
>
> Cayte

From katel at worldpath.net Mon Oct 8 13:45:21 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] IntelliGenetics parser
References: <001401c14f9e$748fe920$499403cf@g0fjl>
Message-ID: <000b01c15021$01988480$6b72bbd1@g0fjl>

----- Original Message -----
From: "Jeffrey Chang"
To: "Cayte" ;
Sent: Monday, October 08, 2001 1:11 AM
Subject: Re: [Biopython-dev] IntelliGenetics parser

> I'm not familiar with the IntelliGenetics format.
> Could you provide a sample of it, so that people will be able to
> provide more comments?
>
> Thanks,

It's the same as the MASE format. I committed a sample under Tests\IntelliGenetics.

I saw a way to write the iterator with a simple state machine, so I used that approach. The files are checked in, but I need to check the test output closely.

Cayte

From chapmanb at arches.uga.edu Mon Oct 8 23:39:40 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] GenBank parser fails (on large files?)
In-Reply-To: <20010928081702.A973@ci350185-a.athen1.ga.home.com>
References: <200109281141.f8SBfDM188776@electre.pasteur.fr> <20010928081702.A973@ci350185-a.athen1.ga.home.com>
Message-ID: <20011008233940.B13537@ci350185-a.athen1.ga.home.com>

Michel:
> [Talking about the translation]
> > Note, incidentally, that this is a bit ugly, because the \012's and spaces
> > should have been cleaned out

me:
> I agree with you here -- I haven't yet done any work at massaging
> the feature value information. I'll think about a good way to do
> this (I'm sure there are other cases where this also needs to be
> done), and try to get something done on it this weekend.

I finally managed to come up with a good (in my opinion, of course) way to handle the problem of selectively cleaning up values based on their type. Basically, what I did was add a Bio.GenBank.utils module that has a FeatureValueCleaner class. Right now this class is quite simple (it just deals with the translation problem mentioned), but it could be extended quite easily to deal with other special cases as they come up.

You can use this class by passing it as the feature_cleaner argument to the FeatureParser, i.e.:

from Bio import GenBank
from Bio.GenBank.utils import FeatureValueCleaner

parser = GenBank.FeatureParser(feature_cleaner = FeatureValueCleaner())

Right now this is not enabled by default, but I'm definitely open to opinions about whether or not it should be.
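To make the expected behaviour of a cleaner concrete, here is a minimal standalone sketch. It does not import Biopython and is not the class from Bio.GenBank.utils: the name SketchCleaner is hypothetical, and only the clean_value(key, value) method name is taken from the description above.

```python
class SketchCleaner:
    """Hypothetical stand-in for a feature cleaner (not the Biopython class)."""

    # Qualifier keys whose values should have all whitespace removed.
    keys_to_process = ("translation",)

    def clean_value(self, key_name, value):
        """Return value with whitespace removed if key_name is handled."""
        if key_name in self.keys_to_process:
            # split() with no arguments breaks on spaces and newlines (\012).
            return "".join(value.split())
        # Any other qualifier is passed through unchanged.
        return value


cleaner = SketchCleaner()
raw = "MED\n                     YDPWNLRFQSKYKSRDA"
print(cleaner.clean_value("translation", raw))    # MEDYDPWNLRFQSKYKSRDA
print(cleaner.clean_value("note", "left as is"))  # left as is
```

Any object with a compatible clean_value method could presumably be passed as feature_cleaner in the same way.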
Michel, I'd be happy to hear if this does what you'd like it to. If you have additional things that need cleaning up, I'd be more than happy to accept patches against utils.py adding these things.

The utils.py module is attached, along with the patch against __init__.py. These are also checked into CVS. Hope this works for you.

Brad
--
PGP public key available from http://pgp.mit.edu/
-------------- next part --------------
*** __init__.py.orig Thu Sep 27 16:00:49 2001
--- __init__.py Mon Oct 8 23:16:50 2001
***************
*** 239,245 ****
  class FeatureParser:
      """Parse GenBank files into Seq + Feature objects.
      """
!     def __init__(self, debug_level = 0, use_fuzziness = 1):
          """Initialize a GenBank parser and Feature consumer.

          Arguments:
--- 239,246 ----
  class FeatureParser:
      """Parse GenBank files into Seq + Feature objects.
      """
!     def __init__(self, debug_level = 0, use_fuzziness = 1,
!                  feature_cleaner = None):
          """Initialize a GenBank parser and Feature consumer.

          Arguments:
***************
*** 249,262 ****
          you can set this as high as two and see exactly where a parse fails.
          o use_fuzziness - Specify whether or not to use fuzzy representations.
          The default is 1 (use fuzziness).
          """
          self._scanner = _Scanner(debug_level)
          self.use_fuzziness = use_fuzziness

      def parse(self, handle):
          """Parse the specified handle.
          """
!         self._consumer = _FeatureConsumer(self.use_fuzziness)
          self._scanner.feed(handle, self._consumer)
          return self._consumer.data

--- 250,268 ----
          you can set this as high as two and see exactly where a parse fails.
          o use_fuzziness - Specify whether or not to use fuzzy representations.
          The default is 1 (use fuzziness).
+         o feature_cleaner - A class which will be used to clean out the
+         values of features. This class must implement the function
+         clean_value. GenBank.utils has a "standard" cleaner class.
          """
          self._scanner = _Scanner(debug_level)
          self.use_fuzziness = use_fuzziness
+         self._cleaner = feature_cleaner

      def parse(self, handle):
          """Parse the specified handle.
          """
!         self._consumer = _FeatureConsumer(self.use_fuzziness,
!                                           self._cleaner)
          self._scanner.feed(handle, self._consumer)
          return self._consumer.data

***************
*** 398,409 ****
      Attributes:
      o use_fuzziness - specify whether or not to parse with fuzziness in
      feature locations.
      """
!     def __init__(self, use_fuzziness):
          _BaseGenBankConsumer.__init__(self)
          self.data = SeqRecord(None, id = None)

          self._use_fuzziness = use_fuzziness

          self._seq_type = ''
          self._seq_data = []
--- 404,418 ----
      Attributes:
      o use_fuzziness - specify whether or not to parse with fuzziness in
      feature locations.
+     o feature_cleaner - a class that will be used to provide specialized
+     cleaning-up of feature values.
      """
!     def __init__(self, use_fuzziness, feature_cleaner = None):
          _BaseGenBankConsumer.__init__(self)
          self.data = SeqRecord(None, id = None)

          self._use_fuzziness = use_fuzziness
+         self._feature_cleaner = feature_cleaner

          self._seq_type = ''
          self._seq_data = []
***************
*** 856,861 ****
--- 865,872 ----
          if self._cur_qualifier_key:
              key = self._cur_qualifier_key
              value = self._cur_qualifier_value
+             if self._feature_cleaner is not None:
+                 value = self._feature_cleaner.clean_value(key, value)
              # if the qualifier name exists, append the value
              if self._cur_feature.qualifiers.has_key(key):
                  self._cur_feature.qualifiers[key].append(value)
-------------- next part --------------
"""Useful utilities for helping in parsing GenBank files.
"""
# standard library
import string

class FeatureValueCleaner:
    """Provide specialized capabilities for cleaning up values in features.

    This class is designed to provide a mechanism to clean up and process
    values in the key/value pairs of GenBank features. This is useful
    because in cases like:

         /translation="MED
         YDPWNLRFQSKYKSRDA"

    you'll end up with a value with \012s and spaces in it like:
        "MED\012 YDPWNL..."

    which you probably don't want.

    This cleaning needs to be done on a case by case basis since it is
    impossible to interpret whether you should be concatenating everything
    (as in translations), or combining things with spaces (as might be
    the case with /notes).
    """
    keys_to_process = ["translation"]
    def __init__(self, to_process = keys_to_process):
        """Initialize with the keys we should deal with.
        """
        self._to_process = to_process

    def clean_value(self, key_name, value):
        """Clean the specified value and return it.

        If the value is not specified to be dealt with, the original value
        will be returned.
        """
        if key_name in self._to_process:
            try:
                cleaner = getattr(self, "_clean_%s" % key_name)
                value = cleaner(value)
            except AttributeError:
                raise AssertionError("No function to clean key: %s"
                                     % key_name)

        return value

    def _clean_translation(self, value):
        """Concatenate a translation value to one long protein string.
        """
        translation_parts = value.split()
        return string.join(translation_parts, '')

From chapmanb at arches.uga.edu Tue Oct 9 01:00:50 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] SeqIO
In-Reply-To:
References: <20010926224754.E27721@ci350185-a.athen1.ga.home.com>
Message-ID: <20011009010050.E13537@ci350185-a.athen1.ga.home.com>

Hi Thomas;

Hope you're doing well!

[I ask how many features we want to keep between conversions]
> All of them. I think each GenBank feature has an exact equivalence in EMBL
> and SwissProt (GenPept). So that leaves us just with the definition of the
> corresponding feature names.

[relatedly, I ask in a confusing manner about a "specialized converter"]
> I don't know if I understood this question...

What I mean is that I'm not sure how I would plug in "lossless" EMBL->GenBank conversion into the current scheme.
I can write a generic writer that will convert a basic SeqRecord to a simple GenBank (no features). But the way ReadSeq.Convert works now is that I only get a record, and don't know the starting format. In order to have "smart" EMBL->GenBank I need to know that the format is EMBL, so I can look for the conversions.

It seems like a simple thing to do would be to add an optional second argument to write, so that I could do something like:

def write(self, record, starting_format):
    if starting_format == "embl":
        _do_embl_to_genbank()
    elif starting_format == "swissprot":
        _do_swissprot_to_genbank()
    else:
        _do_generic_to_genbank()

Does this make sense and seem like a good idea? Or am I still making no darn sense?

[I ask about duplicated SeqRecord stuff]
> I copied everything so that I could play around without breaking e.g. your
> code. Now I think the changes are actually backward compatible - so we
> could move it back.

Yeah, you are more than welcome to add additions that are back-compatible to the SeqRecord stuff. This will help eliminate duplication (and thus lots of confusion for me :-).

> P.S. is anybody going to the Atlanta meeting in November ?

Well, I am not going (I was kind of scared away by the "Bioinformatics after Human Genome" sub-title), but I am only an hour away from Atlanta so I'll definitely "be in the area" :-). Many of the talks look good though, so I may try to sneak in to listen to a couple of them.

It's-okay-for-graduate-students-to-be-cheap-ly yr's,
Brad
--
PGP public key available from http://pgp.mit.edu/

From chapmanb at arches.uga.edu Tue Oct 9 03:27:09 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] Implementation of Application interface
Message-ID: <20011009032708.G13537@ci350185-a.athen1.ga.home.com>

Hello all;

I thought this might be of interest to Davide and others interested in accessing applications through Biopython.
We talked a while back about a generic interface for specifying command lines and running programs through biopython, in a thread starting here:

http://www.biopython.org/pipermail/biopython-dev/2001-August/000476.html

Well, I've been working in biopython-corba on implementing interfaces to remote programs (through Novella, http://industry.ebi.ac.uk/novella/) and wrote up a "biopython-like" interface that implements what we were talking about. The code for this is in biopython-corba CVS in BioCorba/Bio/Application/__init__.py, and on-line here (sorry 'bout the long URL):

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython-corba/BioCorba/Bio/Application/__init__.py?rev=1.1&content-type=text/vnd.viewcvs-markup&cvsroot=biopython

Anyways, I thought a working implementation of what we talked about might be of interest to people, and once we get the generic application stuff working in biopython, we can synchronize these interfaces so the biopython-corba stuff models as closely as possible the biopython code.

Comments on whether I accurately implemented what we talked about, etc, are very welcome.

Brad
--
PGP public key available from http://pgp.mit.edu/

From jchang at SMI.Stanford.EDU Wed Oct 10 03:56:30 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] recent checkins to CVS
Message-ID:

Hello everyone,

I've checked in some new stuff into the CVS tree:

1. I've implemented the fastpairwise dynamic programming code in C. It runs much faster now and is probably about as fast as it will get without making some more assumptions.

2. There are 2 new modules in Bio.Tools.Classification: LogisticRegression and MaxEntropy.

3. There is now some preliminary support for the NLM's XML format for Medline. There's a Martel format definition and some code to index the files. However, getting a parser to put things into a data structure will take some work and may not happen soon...

Enjoy, and let me know if anything breaks!

Jeff

From reillywu at yahoo.com Fri Oct 12 23:29:28 2001
From: reillywu at yahoo.com (Chunlei Wu)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] The 'year' attribute of parsed Medline record return empty
Message-ID: <20011013032928.78364.qmail@web20504.mail.yahoo.com>

Hi, all,
Here is some sample code:
==========
....
cur_record=medline_dict[id]
print cur_record.year
=========

The 'year' attribute always returns an empty string, while the 'publication_date' attribute returns the correct whole date string. When I looked into the __init__.py, I found that the 'year' attribute corresponds to the 'YR' qualifier. But 'YR' doesn't exist in most Medline format records. Although we can get the year value easily from cur_record.publication_date[:4], I think it's better to give the proper value to the 'year' attribute.

Thanks

Chunlei Wu

From jchang at SMI.Stanford.EDU Sat Oct 13 17:26:44 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] The 'year' attribute of parsed Medline record return empty
In-Reply-To: <20011013032928.78364.qmail@web20504.mail.yahoo.com>
References: <20011013032928.78364.qmail@web20504.mail.yahoo.com>
Message-ID:

This is as designed. The members of the Record class are supposed to mirror the information given in the MEDLARS format. If there's no YR line, then the year member of the record is empty.

If you need the year member, it should be pretty simple to make a parser that uses it. For example, you could do (untested):

from Bio import Medline

class MyParserWithYear:
    def parse(self, handle):
        rec = Medline.RecordParser().parse(handle)
        # Fall back to the first four characters of the publication date
        # when the record has no YR line.
        if not rec.year:
            rec.year = rec.publication_date[:4]
        return rec

Jeff

At 8:29 PM -0700 10/12/01, Chunlei Wu wrote:
>Hi, all,
> Here is some sample code:
>==========
>....
>cur_record=medline_dict[id]
>print cur_record.year
>=========
>
> The 'year' attribute always returns an empty string,
>while the 'publication_date' attribute returns the correct
>whole date string. When I looked into the __init__.py,
>I found that the 'year' attribute corresponds to the 'YR'
>qualifier. But 'YR' doesn't exist in most Medline
>format records. Although we can get the year value easily
>from cur_record.publication_date[:4], I think it's
>better to give the proper value to the 'year' attribute.
>
>Thanks
>
>Chunlei Wu

From katel at worldpath.net Sun Oct 14 00:48:58 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] bio formats
Message-ID: <000701c1546b$8aeb6100$010a0a0a@cadence.com>

This looks like a great web site:

http://newfish.mbl.edu/Course/Software/FileFormats

Cayte

From katel at worldpath.net Sun Oct 14 01:06:18 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] another great web page
Message-ID: <000701c1546d$f66b9a60$010a0a0a@cadence.com>

http://www.sander.embl-ebi.ac.uk/Services/webin/help/webin-align/align_format_help.html

Cayte

From katel at worldpath.net Sun Oct 14 01:16:47 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] Pir format
Message-ID: <000c01c1546f$6d3e7f80$010a0a0a@cadence.com>

Maybe I'll try PIR next. The other formats (MSF, PHYLIP, Nexus) contain phylogenetic info. I'm not sure how to fit phylogenetic data in with what we have so far. Annotation? Or should we define phylogenetic classes?

So far the MASE output seems fine. But I've checked only 3 sequences so far.
I'll wait for a rainy day before I slog through a letter-by-letter check of some 30 sequences! :)

Cayte

From mkersz at pasteur.fr Mon Oct 15 11:59:16 2001
From: mkersz at pasteur.fr (Michel Kerszberg)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] Re: cleaning features
Message-ID: <3260000.1003161556@cricri>

Dear Brad,

The Bio.GenBank FeatureValueCleaner utility is what the doctor prescribed! Personally, I would vote for applying it by default.

Meanwhile, I discovered that the GenBank BLAST parser stalls at the interactive map when this is included in the HTML file. I guess the parser should ignore anything enclosed in [...] and [...] flags and including a "#graphical-overview" string. Mind you, it is no big deal to take this out by hand, or to check the right options when doing the BLAST!

Also, parsing of TBLASTX records stalls due to the unusual format of the final information:

Matrix: BLOSUM62
Number of Hits to DB: 10,447,982,379
Number of Sequences: 988209
Number of extensions: 209322370
Number of successful extensions: 17383937
Number of sequences better than 1.0e-50: 110
length of database: 1,426,479,391
effective HSP length: 60
effective length of database: 1,367,186,851
effective search space used: 869530837236
frameshift window, decay const: 50, 0.5
T: 13
A: 40
X1: 16 ( 7.3 bits)
X2: 0 ( 0.0 bits)
S1: 41 (21.7 bits)

Thanks for the good work! I appreciate biopython more and more (not to speak of python).

Best regards,

Michel

From jchang at SMI.Stanford.EDU Mon Oct 15 14:05:52 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] Re: cleaning features
In-Reply-To: <3260000.1003161556@cricri>
References: <3260000.1003161556@cricri>
Message-ID:

>Also, parsing of TBLASTX records stalls due to the unusual format of
>the final information:
>
>Matrix: BLOSUM62
>Number of Hits to DB: 10,447,982,379
>Number of Sequences: 988209
>Number of extensions: 209322370
>Number of successful extensions: 17383937
>Number of sequences better than 1.0e-50: 110
>length of database: 1,426,479,391
>effective HSP length: 60
>effective length of database: 1,367,186,851
>effective search space used: 869530837236
>frameshift window, decay const: 50, 0.5
>T: 13
>A: 40
>X1: 16 ( 7.3 bits)
>X2: 0 ( 0.0 bits)
>S1: 41 (21.7 bits)

Thanks for pointing this out. The Standalone parser gets used more often, so fixes make it into there more often than the web one. I'll update the WWW parser, and it should fix this problem.
jeff From biopython-bugs at bioperl.org Tue Oct 23 18:50:52 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/44 Message-ID: <200110232250.f9NMoqB13198@pw600a.bioperl.org> JitterBug notification new message incoming/44 Message summary for PR#44 From: gec@compbio.berkeley.edu Subject: Raised no existant error? Date: Tue, 23 Oct 2001 18:50:52 -0400 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Tue Oct 23 18:50:52 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9NMoqB13192 for ; Tue, 23 Oct 2001 18:50:52 -0400 Date: Tue, 23 Oct 2001 18:50:52 -0400 Message-Id: <200110232250.f9NMoqB13192@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: Raised no existant error? Full_Name: Gavin Crooks Module: SCOP/Dom.py Version: OS: Submission from: sienna.berkeley.edu (128.32.236.51) When fed a corrupt file Dom.DomainParser will attempt to raise "error", but error hasn't been defined. 
NameError: global name 'error' is not defined From biopython-bugs at bioperl.org Tue Oct 23 18:54:40 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/45 Message-ID: <200110232254.f9NMseB13272@pw600a.bioperl.org> JitterBug notification new message incoming/45 Message summary for PR#45 From: gec@compbio.berkeley.edu Subject: PDB sequence numbers can be negative Date: Tue, 23 Oct 2001 18:54:38 -0400 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Tue Oct 23 18:54:38 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9NMscB13266 for ; Tue, 23 Oct 2001 18:54:38 -0400 Date: Tue, 23 Oct 2001 18:54:38 -0400 Message-Id: <200110232254.f9NMscB13266@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: PDB sequence numbers can be negative Full_Name: Gavin Crooks Module: SCOP/Location.py Version: OS: Submission from: sienna.berkeley.edu (128.32.236.51) PDB residue sequence numbers can, on occasion, be negative. e.g. 1B9N. SCOP domains sometimes start on negative sequence numbers. 
This breaks the location parser in Bio.SCOP.Location.py From biopython-bugs at bioperl.org Tue Oct 23 18:56:44 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/46 Message-ID: <200110232256.f9NMuiB13336@pw600a.bioperl.org> JitterBug notification new message incoming/46 Message summary for PR#46 From: gec@compbio.berkeley.edu Subject: PDB sequence numbers can be negative Date: Tue, 23 Oct 2001 18:56:44 -0400 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Tue Oct 23 18:56:44 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9NMuiB13330 for ; Tue, 23 Oct 2001 18:56:44 -0400 Date: Tue, 23 Oct 2001 18:56:44 -0400 Message-Id: <200110232256.f9NMuiB13330@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: PDB sequence numbers can be negative Full_Name: Gavin Crooks Module: SCOP/Location.py Version: OS: Submission from: sienna.berkeley.edu (128.32.236.51) PDB residue sequence numbers can, on occasion, be negative. e.g. 1B9N. SCOP domains sometimes start on negative sequence numbers. 
This breaks the location parser in Bio.SCOP.Location.py

From biopython-bugs at bioperl.org Tue Oct 23 23:19:42 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] Notification: incoming/47
Message-ID: <200110240319.f9O3JgB15062@pw600a.bioperl.org>

JitterBug notification

new message incoming/47

Message summary for PR#47
From: gec@compbio.berkeley.edu
Subject: Tutorial typos
Date: Tue, 23 Oct 2001 23:19:41 -0400
0 replies 0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From gec@compbio.berkeley.edu Tue Oct 23 23:19:41 2001
Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9O3JfB15056 for ; Tue, 23 Oct 2001 23:19:41 -0400
Date: Tue, 23 Oct 2001 23:19:41 -0400
Message-Id: <200110240319.f9O3JfB15056@pw600a.bioperl.org>
From: gec@compbio.berkeley.edu
To: biopython-bugs@bioperl.org
Subject: Tutorial typos

Full_Name: Gavin Crooks
Module: Tutotial.tex
Version:
OS:
Submission from: sdn-ar-013casfrmp012.dialsprint.net (158.252.217.14)

The tutorial contains a few minor bugs.

Page 5: "Installation of FreeBSD" should be "Installation on FreeBSD"
Page 6: The first sentence of section 1.3.3 does not make sense.
Everywhere: "ie." should be "i.~e.~TheNextWord", or "i.~e.,"
Page 11: "created for free for you" should be "created for free"?
Page 43ish: Some HTML has worked its way into the tex file, producing some odd symbols. Plus some of the numbers have hats on.
From biopython-bugs at bioperl.org Wed Oct 24 13:17:43 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/48 Message-ID: <200110241717.f9OHHhB21139@pw600a.bioperl.org> JitterBug notification new message incoming/48 Message summary for PR#48 From: gec@compbio.berkeley.edu Subject: Unclosed file Date: Wed, 24 Oct 2001 13:17:43 -0400 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Wed Oct 24 13:17:43 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9OHHgB21133 for ; Wed, 24 Oct 2001 13:17:43 -0400 Date: Wed, 24 Oct 2001 13:17:43 -0400 Message-Id: <200110241717.f9OHHgB21133@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: Unclosed file Full_Name: Gavin Crooks Module: ParserSupport.AbstractParser Version: OS: Submission from: sdn-ar-005casfrmp182.dialsprint.net (158.252.212.184) AbstractParser.parse_file(self,filename) does not close the file it opens. From biopython-bugs at bioperl.org Wed Oct 24 19:50:14 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/48 Message-ID: <200110242350.f9ONoEB24543@pw600a.bioperl.org> JitterBug notification jchang changed notes Message summary for PR#48 From: gec@compbio.berkeley.edu Subject: Unclosed file Date: Wed, 24 Oct 2001 13:17:43 -0400 0 replies 0 followups Notes: It gets closed implicitly as the reference in parse goes out of scope. However, you're right that it's better to be done explicitly, so I've made the changes in the file. 
Thanks, Jeff ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Wed Oct 24 13:17:43 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9OHHgB21133 for ; Wed, 24 Oct 2001 13:17:43 -0400 Date: Wed, 24 Oct 2001 13:17:43 -0400 Message-Id: <200110241717.f9OHHgB21133@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: Unclosed file Full_Name: Gavin Crooks Module: ParserSupport.AbstractParser Version: OS: Submission from: sdn-ar-005casfrmp182.dialsprint.net (158.252.212.184) AbstractParser.parse_file(self,filename) does not close the file it opens. From biopython-bugs at bioperl.org Wed Oct 24 19:50:14 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/48 Message-ID: <200110242350.f9ONoEB24547@pw600a.bioperl.org> JitterBug notification jchang moved PR#48 from incoming to fixed-bugs Message summary for PR#48 From: gec@compbio.berkeley.edu Subject: Unclosed file Date: Wed, 24 Oct 2001 13:17:43 -0400 0 replies 0 followups Notes: It gets closed implicitly as the reference in parse goes out of scope. However, you're right that it's better to be done explicitly, so I've made the changes in the file. Thanks, Jeff ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Wed Oct 24 13:17:43 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9OHHgB21133 for ; Wed, 24 Oct 2001 13:17:43 -0400 Date: Wed, 24 Oct 2001 13:17:43 -0400 Message-Id: <200110241717.f9OHHgB21133@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: Unclosed file Full_Name: Gavin Crooks Module: ParserSupport.AbstractParser Version: OS: Submission from: sdn-ar-005casfrmp182.dialsprint.net (158.252.212.184) AbstractParser.parse_file(self,filename) does not close the file it opens. 
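
The explicit close discussed in PR#48 can be sketched as follows. This is only an illustration, not the actual ParserSupport code: the class SketchParser and its trivial parse method are hypothetical, and only the parse_file(filename) name comes from the report.

```python
import os
import tempfile


class SketchParser:
    """Hypothetical parser used only to illustrate explicit file closing."""

    def parse(self, handle):
        # Placeholder parse step: return the lines without their newlines.
        return [line.rstrip("\n") for line in handle]

    def parse_file(self, filename):
        handle = open(filename)
        try:
            return self.parse(handle)
        finally:
            # Runs whether parse() returns or raises, so the file is
            # closed deterministically instead of when the last
            # reference to the handle happens to go away.
            handle.close()


# Usage: write a small temporary file, parse it, then clean up.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("first\nsecond\n")
records = SketchParser().parse_file(path)
os.remove(path)
print(records)  # ['first', 'second']
```

Relying on the handle going out of scope works in practice with reference counting, but the try/finally form does not depend on that behaviour.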
From biopython-bugs at bioperl.org Wed Oct 24 19:54:54 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/44 Message-ID: <200110242354.f9ONsrB24695@pw600a.bioperl.org> JitterBug notification jchang changed notes Message summary for PR#44 From: gec@compbio.berkeley.edu Subject: Raised no existant error? Date: Tue, 23 Oct 2001 18:50:52 -0400 0 replies 0 followups Notes: Oops, you're right. Changed to SyntaxError. Jeff ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Tue Oct 23 18:50:52 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9NMoqB13192 for ; Tue, 23 Oct 2001 18:50:52 -0400 Date: Tue, 23 Oct 2001 18:50:52 -0400 Message-Id: <200110232250.f9NMoqB13192@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: Raised no existant error? Full_Name: Gavin Crooks Module: SCOP/Dom.py Version: OS: Submission from: sienna.berkeley.edu (128.32.236.51) When fed a corrupt file Dom.DomainParser will attempt to raise "error", but error hasn't been defined. NameError: global name 'error' is not defined
From biopython-bugs at bioperl.org Wed Oct 24 19:54:54 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/44 Message-ID: <200110242354.f9ONssB24699@pw600a.bioperl.org> JitterBug notification jchang moved PR#44 from incoming to fixed-bugs Message summary for PR#44 From: gec@compbio.berkeley.edu Subject: Raised no existant error? Date: Tue, 23 Oct 2001 18:50:52 -0400 0 replies 0 followups Notes: Oops, you're right. Changed to SyntaxError. Jeff
From biopython-bugs at bioperl.org Wed Oct 24 19:56:24 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/46 Message-ID: <200110242356.f9ONuOB24799@pw600a.bioperl.org> JitterBug notification jchang changed notes Message summary for PR#46 From: gec@compbio.berkeley.edu Subject: PDB sequence numbers can be negative Date: Tue, 23 Oct 2001 18:56:44 -0400 0 replies 0 followups Notes: dup of 45 ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Tue Oct 23 18:56:44 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9NMuiB13330 for ; Tue, 23 Oct 2001 18:56:44 -0400 Date: Tue, 23 Oct 2001 18:56:44 -0400 Message-Id: <200110232256.f9NMuiB13330@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: PDB sequence numbers can be negative Full_Name: Gavin Crooks Module: SCOP/Location.py Version: OS: Submission from: sienna.berkeley.edu (128.32.236.51) PDB residue sequence numbers can, on occasion, be negative. e.g. 1B9N. SCOP domains sometimes start on negative sequence numbers. This breaks the location parser in Bio.SCOP.Location.py
From biopython-bugs at bioperl.org Wed Oct 24 19:56:24 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/46 Message-ID: <200110242356.f9ONuOB24803@pw600a.bioperl.org> JitterBug notification jchang moved PR#46 from incoming to fixed-bugs Message summary for PR#46 From: gec@compbio.berkeley.edu Subject: PDB sequence numbers can be negative Date: Tue, 23 Oct 2001 18:56:44 -0400 0 replies 0 followups Notes: dup of 45
From biopython-bugs at bioperl.org Wed Oct 24 19:57:16 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/49 Message-ID: <200110242357.f9ONvGB24883@pw600a.bioperl.org> JitterBug notification new message incoming/49 Message summary for PR#49 From: Jeffrey Chang Subject: Re: [Biopython-dev] Notification: incoming/46 Date: Wed, 24 Oct 2001 16:58:30 -0700 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From jchang@SMI.Stanford.EDU Wed Oct 24 19:57:15 2001 Received: from crg-gw.Stanford.EDU (root@crg-gw.Stanford.EDU [171.65.32.201]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9ONvAB24866 for ; Wed, 24 Oct 2001 19:57:15 -0400 Received: from [171.65.33.250] (air11-smi.Stanford.EDU [171.65.33.250]) by crg-gw.Stanford.EDU (8.11.5/8.11.5) with ESMTP id f9ONvEC09544 for ; Wed, 24 Oct 2001 16:57:14 -0700 (PDT) Mime-Version: 1.0 X-Sender: jchang@smi.stanford.edu (Unverified) Message-Id: In-Reply-To: <200110232256.f9NMuiB13336@pw600a.bioperl.org> References: <200110232256.f9NMuiB13336@pw600a.bioperl.org> Date: Wed, 24 Oct 2001 16:58:30 -0700 To: biopython-bugs@bioperl.org From: Jeffrey Chang Subject: Re: [Biopython-dev] Notification: incoming/46 Content-Type: text/plain; charset="us-ascii" ; format="flowed" Hi Gavin, Could you send me a sample of this? It'll be helpful to have a test case to test fixes.
Thanks, Jeff >JitterBug notification > >new message incoming/46 > >Message summary for PR#46 > From: gec@compbio.berkeley.edu > Subject: PDB sequence numbers can be negative > Date: Tue, 23 Oct 2001 18:56:44 -0400 > 0 replies 0 followups > >====> ORIGINAL MESSAGE FOLLOWS <==== > >>From gec@compbio.berkeley.edu Tue Oct 23 18:56:44 2001 >Received: from localhost (localhost [127.0.0.1]) > by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9NMuiB13330 > for ; Tue, 23 Oct 2001 >18:56:44 -0400 >Date: Tue, 23 Oct 2001 18:56:44 -0400 >Message-Id: <200110232256.f9NMuiB13330@pw600a.bioperl.org> >From: gec@compbio.berkeley.edu >To: biopython-bugs@bioperl.org >Subject: PDB sequence numbers can be negative > >Full_Name: Gavin Crooks >Module: SCOP/Location.py >Version: >OS: >Submission from: sienna.berkeley.edu (128.32.236.51) > > > >PDB residue sequence numbers can, on occasion, be >negative. e.g. 1B9N. SCOP domains sometimes start >on negative sequence numbers. This breaks the >location parser in Bio.SCOP.Location.py > > >_______________________________________________ >Biopython-dev mailing list >Biopython-dev@biopython.org >http://biopython.org/mailman/listinfo/biopython-dev From biopython-bugs at bioperl.org Wed Oct 24 20:49:43 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/50 Message-ID: <200110250049.f9P0nhB25253@pw600a.bioperl.org> JitterBug notification new message incoming/50 Message summary for PR#50 From: "Gavin E. 
Crooks" Subject: Re: [Biopython-dev] Notification: incoming/49 Date: Wed, 24 Oct 2001 17:40:52 -0700 0 replies 0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From gec@sienna.berkeley.edu Wed Oct 24 20:49:43 2001
Received: from sienna.berkeley.edu (IDENT:root@sienna.Berkeley.EDU [128.32.236.51]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9P0ngB25247 for ; Wed, 24 Oct 2001 20:49:42 -0400
Received: from localhost (localhost [[UNIX: localhost]]) by sienna.berkeley.edu (8.9.3/8.9.3) id RAA03432 for biopython-bugs@bioperl.org; Wed, 24 Oct 2001 17:49:42 -0700
From: "Gavin E. Crooks"
Reply-To: gec@compbio.berkeley.edu
Organization: Very Little
To: biopython-bugs@bioperl.org
Subject: Re: [Biopython-dev] Notification: incoming/49
Date: Wed, 24 Oct 2001 17:40:52 -0700
X-Mailer: KMail [version 1.0.29]
Content-Type: text/plain
References: <200110242357.f9ONvGB24883@pw600a.bioperl.org>
In-Reply-To: <200110242357.f9ONvGB24883@pw600a.bioperl.org>
MIME-Version: 1.0
Message-Id: <01102417494205.14420@sienna.berkeley.edu>
Content-Transfer-Encoding: 8bit

How about "A:-1-126", direct from SCOP...

    16118 px a.4.5.8 d1b9ma1 1b9m A:-1-126

I am in the middle of updating the SCOP module, and I have already refactored that code, and fixed this bug. And I've written a nice shiny unit test. But I was concerned that this same bug could crop up elsewhere. It's the kind of obscure boundary case that could trip up any code working with PDB sequence numbers.

Gavin
gec@compbio.berkeley.edu
http://threeplusone.com

> Hi Gavin,
>
> Could you send me a sample of this? It'll be helpful to have a test
> case to test fixes.
>
> Thanks,
> Jeff
>
> >Full_Name: Gavin Crooks
> >Module: SCOP/Location.py
> >Version:
> >OS:
> >Submission from: sienna.berkeley.edu (128.32.236.51)
> >
> >PDB residue sequence numbers can, on occasion, be
> >negative. e.g. 1B9N. SCOP domains sometimes start
> >on negative sequence numbers.
This breaks the
> >location parser in Bio.SCOP.Location.py
>

From gec at compbio.berkeley.edu Wed Oct 24 21:03:46 2001
From: gec at compbio.berkeley.edu (Gavin E. Crooks)
Date: Sat Mar 5 14:43:05 2005
Subject: [Biopython-dev] Notification: incoming/48
In-Reply-To: <200110242350.f9ONoEB24543@pw600a.bioperl.org>
References: <200110242350.f9ONoEB24543@pw600a.bioperl.org>
Message-ID: <01102418073807.14420@sienna.berkeley.edu>

The new code doesn't work as intended, since parse() may raise an exception. This

    def parse_file(self, filename):
        h = open(filename)
        retval = self.parse(h)
        h.close()
        return retval

should be

    def parse_file(self, filename):
        h = open(filename)
        try:
            return self.parse(h)
        finally:
            h.close()

Gavin

p.s. The viewcvs diff appears to be broken.

On Wed, 24 Oct 2001, you wrote:
> JitterBug notification
>
> jchang changed notes
>
> Message summary for PR#48
> From: gec@compbio.berkeley.edu
> Subject: Unclosed file
> Date: Wed, 24 Oct 2001 13:17:43 -0400
> 0 replies 0 followups
> Notes: It gets closed implicitly as the reference in parse goes out of scope. However,
> you're right that it's better to be done explicitly, so I've made the changes in
> the file.
>
> Thanks,
> Jeff
>
>
> ====> ORIGINAL MESSAGE FOLLOWS <====
>
> From gec@compbio.berkeley.edu Wed Oct 24 13:17:43 2001
> Received: from localhost (localhost [127.0.0.1])
> by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9OHHgB21133
> for ; Wed, 24 Oct 2001 13:17:43 -0400
> Date: Wed, 24 Oct 2001 13:17:43 -0400
> Message-Id: <200110241717.f9OHHgB21133@pw600a.bioperl.org>
> From: gec@compbio.berkeley.edu
> To: biopython-bugs@bioperl.org
> Subject: Unclosed file
>
> Full_Name: Gavin Crooks
> Module: ParserSupport.AbstractParser
> Version:
> OS:
> Submission from: sdn-ar-005casfrmp182.dialsprint.net (158.252.212.184)
>
>
> AbstractParser.parse_file(self,filename) does not close the file it opens.
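[Editorial sketch, not part of the original thread.] The difference between the two parse_file variants discussed above can be checked with a small stand-alone example. Everything here is an assumption made for illustration: `ToyParser` is a hypothetical stand-in, not the real `ParserSupport.AbstractParser`, and the `last_handle` attribute exists only so the handle's state can be inspected afterwards.

```python
import os
import tempfile


class ToyParser:
    """Hypothetical stand-in for a parser; not Biopython's AbstractParser."""

    def __init__(self):
        self.last_handle = None  # kept only so callers can inspect .closed

    def parse(self, handle):
        text = handle.read()
        if "corrupt" in text:
            raise ValueError("cannot parse input")
        return text.split()

    def parse_file_unsafe(self, filename):
        h = open(filename)
        self.last_handle = h
        retval = self.parse(h)  # if this raises, h.close() is never reached
        h.close()
        return retval

    def parse_file(self, filename):
        h = open(filename)
        self.last_handle = h
        try:
            return self.parse(h)
        finally:
            h.close()  # runs on normal return and when parse() raises


def demo():
    """Return (closed_after_unsafe, closed_after_safe) for a failing parse."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as out:
        out.write("corrupt record")
    results = []
    try:
        for method_name in ("parse_file_unsafe", "parse_file"):
            parser = ToyParser()
            try:
                getattr(parser, method_name)(path)
            except ValueError:
                pass
            results.append(parser.last_handle.closed)
            parser.last_handle.close()  # tidy up the handle the unsafe path leaked
    finally:
        os.remove(path)
    return tuple(results)
```

In current Python the same guarantee is usually written with a `with open(filename) as h:` block, which closes the handle on both normal exit and exceptions.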
> > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From biopython-bugs at bioperl.org Wed Oct 24 21:56:27 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/47 Message-ID: <200110250156.f9P1uRB25817@pw600a.bioperl.org> JitterBug notification chapmanb changed notes Message summary for PR#47 From: gec@compbio.berkeley.edu Subject: Tutorial typos Date: Tue, 23 Oct 2001 23:19:41 -0400 0 replies 0 followups Notes: Thanks for the pointers! All are fixed in the tex file and on the web. ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Tue Oct 23 23:19:41 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9O3JfB15056 for ; Tue, 23 Oct 2001 23:19:41 -0400 Date: Tue, 23 Oct 2001 23:19:41 -0400 Message-Id: <200110240319.f9O3JfB15056@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: Tutorial typos Full_Name: Gavin Crooks Module: Tutotial.tex Version: OS: Submission from: sdn-ar-013casfrmp012.dialsprint.net (158.252.217.14) The tutorial contains a few minor bugs. Page 5: "Installation of FreeBSD" should be "Installation on FreeBSD" Page 6: The first sentance of section 1.3.3 does not make sence. Everywhere: "ie." should be "i.~e.~TheNextWord", or "i.~e.," Page 11 : "created for free for you" should be "created for free"? Page 43ish: Some html has worked its way into the tex file, producing some odd symbols. Plus some of the number have hats on. 
From biopython-bugs at bioperl.org Wed Oct 24 21:56:27 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:05 2005 Subject: [Biopython-dev] Notification: incoming/47 Message-ID: <200110250156.f9P1uRB25821@pw600a.bioperl.org> JitterBug notification chapmanb moved PR#47 from incoming to fixed-bugs Message summary for PR#47 From: gec@compbio.berkeley.edu Subject: Tutorial typos Date: Tue, 23 Oct 2001 23:19:41 -0400 0 replies 0 followups Notes: Thanks for the pointers! All are fixed in the tex file and on the web. ====> ORIGINAL MESSAGE FOLLOWS <==== >From gec@compbio.berkeley.edu Tue Oct 23 23:19:41 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9O3JfB15056 for ; Tue, 23 Oct 2001 23:19:41 -0400 Date: Tue, 23 Oct 2001 23:19:41 -0400 Message-Id: <200110240319.f9O3JfB15056@pw600a.bioperl.org> From: gec@compbio.berkeley.edu To: biopython-bugs@bioperl.org Subject: Tutorial typos Full_Name: Gavin Crooks Module: Tutotial.tex Version: OS: Submission from: sdn-ar-013casfrmp012.dialsprint.net (158.252.217.14) The tutorial contains a few minor bugs. Page 5: "Installation of FreeBSD" should be "Installation on FreeBSD" Page 6: The first sentance of section 1.3.3 does not make sence. Everywhere: "ie." should be "i.~e.~TheNextWord", or "i.~e.," Page 11 : "created for free for you" should be "created for free"? Page 43ish: Some html has worked its way into the tex file, producing some odd symbols. Plus some of the number have hats on. From pewilkinson at informaxinc.com Wed Oct 24 22:19:04 2001 From: pewilkinson at informaxinc.com (Peter Wilkinson) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] mxTextools install and biopython 2.1 In-Reply-To: <200110241603.f9OG32B20618@pw600a.bioperl.org> Message-ID: <005501c15cfb$6b776ce0$3ac53604@l001696w00> Does anyone know why the mxTexttools is strangely configures? 
If activestate Python 2.1 comes with Martel, and we install Biopython in the root of the install as Bio: How is mxTexttools supposed to be linked up properly, how and where is it installed? I had a problem with my install and I had to redo it. I can not figure it out. anyone? Peter
From idoerg at cc.huji.ac.il Thu Oct 25 03:55:12 2001 From: idoerg at cc.huji.ac.il (Iddo Friedberg) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] mxTextools install and biopython 2.1 In-Reply-To: <005501c15cfb$6b776ce0$3ac53604@l001696w00> Message-ID: Hi Peter, On Wed, 24 Oct 2001, Peter Wilkinson wrote: : : Does anyone know why the mxTexttools is strangely configures? If activestate : Python 2.1 comes with Martel, and we install Biopython in the root of the : install as Bio: : : How is mxTexttools supposed to be linked up properly, how and where is it : installed?I had a problem with my install and I had to redo it. I can not : figure it out. : Hi Peter, I remember running into the same problems myself, though I'm a bit fuzzy about the whys and wherefores of the solution. Anyhow, it works for me, so I checked my /usr/local/lib/Python2.1/site-packages/mx tree. Apparently I put a dummy __init__.py file under mx/ directory, and the rest is undes mx/TextTools I'm attaching the tree scheme. I hope this helps. Iddo -- Iddo Friedberg | Tel: +972-2-6757374 Dept.
of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 | Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/
-------------- next part --------------
From idoerg@arrakis.md.huji.ac.il Thu Oct 25 09:53:00 2001
Date: Thu, 25 Oct 2001 09:49:33 +0200
From: Iddo Friedberg
To: idoerg@cc.huji.ac.il

/usr/local/lib/python2.1/site-packages/mx/
|-- TextTools
|   |-- Constants
|   |   |-- Sets.py
|   |   |-- Sets.pyc
|   |   |-- TagTables.py
|   |   |-- TagTables.pyc
|   |   |-- __init__.py
|   |   `-- __init__.pyc
|   |-- Doc
|   |   `-- mxTextTools.html
|   |-- Examples
|   |   |-- HTML.py
|   |   |-- HTML.pyc
|   |   |-- Loop.py
|   |   |-- Python.py
|   |   |-- RTF.py
|   |   |-- RegExp.py
|   |   |-- Tim.py
|   |   |-- Words.py
|   |   |-- __init__.py
|   |   |-- __init__.pyc
|   |   |-- altRTF.py
|   |   `-- pytag.py
|   |-- LICENSE
|   |-- Makefile.pkg
|   |-- README
|   |-- TextTools.py
|   |-- TextTools.pyc
|   |-- __init__.py
|   |-- __init__.pyc
|   `-- mxTextTools
|       |-- Makefile
|       |-- Makefile.pre
|       |-- Makefile.pre.in
|       |-- Setup
|       |-- Setup.in
|       |-- __init__.py
|       |-- __init__.pyc
|       |-- config.c
|       |-- mx.h
|       |-- mxTextTools.c
|       |-- mxTextTools.def
|       |-- mxTextTools.h
|       |-- mxTextTools.o
|       |-- mxTextTools.pyd
|       |-- mxTextTools.so
|       |-- mxbmse.c
|       |-- mxbmse.h
|       |-- mxbmse.o
|       |-- mxh.h
|       |-- mxpyapi.h
|       |-- mxstdlib.h
|       |-- mxte.c
|       |-- mxte.h
|       |-- mxte.o
|       |-- sedscript
|       `-- test.py
|-- __init__.py
`-- __init__.pyc

5 directories, 54 files

From adalke at mindspring.com Thu Oct 25 04:18:57 2001
From: adalke at mindspring.com (Andrew Dalke)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] mxTextools install and biopython 2.1
Message-ID: <0abc01c15d2d$b234c9c0$0301a8c0@josiah.dalkescientific.com>

Peter Wilkinson:
>Does anyone know why the mxTexttools is strangely configures? If activestate
>Python 2.1 comes with Martel, and we install Biopython in the root of the
>install as Bio:

ActiveState Python comes with Martel? That's news to me!
I suspect that's a typo, and you meant "comes with mxTextTools." Here's the list of extensions shipped with their Python 2.1

http://aspn.activestate.com/ASPN/Downloads/ActivePython/Extensions/

I don't know anything about that distribution so I can't give you any pointers about it.

>How is mxTexttools supposed to be linked up properly, how and where is it
>installed? I had a problem with my install and I had to redo it. I can not
>figure it out.
>
>anyone?

What went wrong? If it comes with ActiveState Python then it should just work.

If you're installing mxTextTools from scratch, it should be a matter of following the instructions. It's distutils-enabled, right? (Just checked. Yes.) So "python setup.py install" should do things just fine.

It gets installed in the standard installation location. On Unix machines it's something like /usr/local/lib/python2.1/site-packages (where the "/usr/local" comes from the installation prefix and is usually "/usr" for Linux boxes, and where the "2.1" comes from the Python version number.)

(There are a few other places it could be installed which would still work. They are almost never used.)

Just copy&paste your work session. That should be enough for me or someone else on the list to figure out what's munged up.
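[Editorial sketch, not part of the original thread.] The "where did it get installed?" question above can also be answered from inside the interpreter. The snippet below is written for a current Python 3, not the Python 2.1 of this thread; the helper names `site_packages_dir` and `locate` are hypothetical, and `mx.TextTools` is used only because it is the package under discussion.

```python
import importlib.util
import sysconfig


def site_packages_dir():
    """Directory where pure-Python third-party packages are installed."""
    return sysconfig.get_paths()["purelib"]


def locate(module_name):
    """Return the path a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package (e.g. "mx" for "mx.TextTools") is missing.
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("site-packages:", site_packages_dir())
    print("mx.TextTools:", locate("mx.TextTools"))
```

If `locate` returns None for a package that was just installed, the install most likely went to a different interpreter's site-packages directory than the one running the check.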
Andrew
dalke@dalkescientific.com

From biopython-bugs at bioperl.org Thu Oct 25 09:36:28 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Notification: incoming/51
Message-ID: <200110251336.f9PDaRB30900@pw600a.bioperl.org>

JitterBug notification

new message incoming/51

Message summary for PR#51
From: crm17@cornell.edu
Subject: biopython-1.00a3/Bio/__init__.py error
Date: Thu, 25 Oct 2001 09:36:27 -0400
0 replies 0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From crm17@cornell.edu Thu Oct 25 09:36:27 2001
Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9PDaRB30894 for ; Thu, 25 Oct 2001 09:36:27 -0400
Date: Thu, 25 Oct 2001 09:36:27 -0400
Message-Id: <200110251336.f9PDaRB30894@pw600a.bioperl.org>
From: crm17@cornell.edu
To: biopython-bugs@bioperl.org
Subject: biopython-1.00a3/Bio/__init__.py error

Full_Name: Chris Myers
Module: biopython-1.00a3/Bio/__init__.py
Version: biopython-1.00a3
OS: Linux
Submission from: sowhat.tc.cornell.edu (128.84.162.75)

the __all__ definition in biopython-1.00a3/Bio/__init__.py is missing a comma between "Alphabet" and "Blast"

    __all__ = [
        "Align",
        "Alphabet"
        "Blast",
        # ...
        ]

The result is that

    from Bio import *

chokes, claiming:

    AttributeError: 'Bio' module has no attribute 'AlphabetBlast'

Adding the missing comma fixes this.
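[Editorial sketch, not part of the original thread.] The report above comes down to Python joining adjacent string literals at compile time, so a missing comma silently merges two list entries. The example below reproduces that with plain lists; `find_missing_exports` and the `namespace` dict are hypothetical helpers standing in for the Bio package's contents, not anything in Biopython.

```python
# Adjacent string literals are concatenated, so the missing comma below
# turns "Alphabet" and "Blast" into the single entry "AlphabetBlast".
broken = [
    "Align",
    "Alphabet"   # <- missing comma
    "Blast",
]

fixed = [
    "Align",
    "Alphabet",
    "Blast",
]


def find_missing_exports(names, namespace):
    """Return entries of an __all__-style list that are not defined in namespace."""
    return [name for name in names if name not in namespace]


# Hypothetical module namespace, standing in for what the package defines.
namespace = {"Align": object(), "Alphabet": object(), "Blast": object()}

if __name__ == "__main__":
    print(broken)
    print(find_missing_exports(broken, namespace))
    print(find_missing_exports(fixed, namespace))
```

A check along these lines, run over a package's `__all__` in a unit test, would catch this class of typo before `from package import *` fails for a user.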
From biopython-bugs at bioperl.org Thu Oct 25 09:37:29 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] Notification: incoming/52 Message-ID: <200110251337.f9PDbTB30989@pw600a.bioperl.org> JitterBug notification new message incoming/52 Message summary for PR#52 From: crm17@cornell.edu Subject: biopython-1.00a3/Bio/__init__.py error Date: Thu, 25 Oct 2001 09:37:29 -0400 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From crm17@cornell.edu Thu Oct 25 09:37:29 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f9PDbTB30983 for ; Thu, 25 Oct 2001 09:37:29 -0400 Date: Thu, 25 Oct 2001 09:37:29 -0400 Message-Id: <200110251337.f9PDbTB30983@pw600a.bioperl.org> From: crm17@cornell.edu To: biopython-bugs@bioperl.org Subject: biopython-1.00a3/Bio/__init__.py error Full_Name: Chris Myers Module: biopython-1.00a3/Bio/__init__.py Version: biopython-1.00a3 OS: Linux Submission from: sowhat.tc.cornell.edu (128.84.162.75) the __all__ definition in biopython-1.00a3/Bio/__init__.py is missing a comma between "Alphabet" and "Blast" __all__ = [ "Align", "Alphabet" "Blast", # ... ] The result is that from Bio import * chokes, claiming: AttributeError: 'Bio' module has no attribute 'AlphabetBlast' Adding the missing comma fixes this.
From biopython-bugs at bioperl.org Thu Oct 25 15:22:41 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] Notification: incoming/51 Message-ID: <200110251922.f9PJMeB01895@pw600a.bioperl.org> JitterBug notification chapmanb changed notes Message summary for PR#51 From: crm17@cornell.edu Subject: biopython-1.00a3/Bio/__init__.py error Date: Thu, 25 Oct 2001 09:36:27 -0400 0 replies 0 followups Notes: Thanks for pointing this out. Fixed in CVS (my easiest fix ever :-)
From biopython-bugs at bioperl.org Thu Oct 25 15:22:41 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] Notification: incoming/51 Message-ID: <200110251922.f9PJMfB01899@pw600a.bioperl.org> JitterBug notification chapmanb moved PR#51 from incoming to fixed-bugs Message summary for PR#51 From: crm17@cornell.edu Subject: biopython-1.00a3/Bio/__init__.py error Date: Thu, 25 Oct 2001 09:36:27 -0400 0 replies 0 followups Notes: Thanks for pointing this out. Fixed in CVS (my easiest fix ever :-)
From biopython-bugs at bioperl.org Thu Oct 25 15:23:39 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] Notification: incoming/52 Message-ID: <200110251923.f9PJNdB01997@pw600a.bioperl.org> JitterBug notification chapmanb changed notes Message summary for PR#52 From: crm17@cornell.edu Subject: biopython-1.00a3/Bio/__init__.py error Date: Thu, 25 Oct 2001 09:37:29 -0400 0 replies 0 followups Notes: Duplicate of bug 51
From biopython-bugs at bioperl.org Thu Oct 25 15:23:40 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] Notification: incoming/52 Message-ID: <200110251923.f9PJNeB02001@pw600a.bioperl.org> JitterBug notification chapmanb moved PR#52 from incoming to fixed-bugs Message summary for PR#52 From: crm17@cornell.edu Subject: biopython-1.00a3/Bio/__init__.py error Date: Thu, 25 Oct 2001 09:37:29 -0400 0 replies 0 followups Notes: Duplicate of bug 51
From katel at worldpath.net Thu Oct 25 20:02:46 2001 From: katel at worldpath.net (Cayte) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] Last chance to clam PIR parser Message-ID: <000801c15db1$8c21bb60$3170bbd1@g0fjl> I've completed checking the MASE( IntelliGenetics ) parser and I would like to start the PIR parser. Let me know if you are working on it currently. Cayte -------------- next part -------------- An HTML attachment was scrubbed... URL: http://portal.open-bio.org/pipermail/biopython-dev/attachments/20011025/7c3b2d8f/attachment.htm From katel at worldpath.net Thu Oct 25 20:21:17 2001 From: katel at worldpath.net (Cayte) Date: Sat Mar 5 14:43:06 2005 Subject: [Biopython-dev] Phylogenetic rats nests( oops I meant trees ) Message-ID: <001401c15db4$21ffe740$3170bbd1@g0fjl> Many of the bioformats represent phylogenetic data as well as sequence or path data. I can think of a number of problems with phylogenetic data. 1. A number of types of relationships are possible. A sequence may be a descendent or an ancestor of another sequence. They both may have a common ancestor. They may have converged to the same patteern. They may have hopped across species. Whatever the arguments against transgenic species, the assertion that it never happens in nature ain'tr so! The latest issue of Natural History describes how the vertebrate immune system may have once been a parasite. 2. Researchers often don't agree among themselves what these relationships are. Should the links contain epistemology links that describe the level of confidence and the methodology plus journal references? 3. Links will change as new research unfolds. This will be a maintenance issue( luckily not ours ). But we need the ability to easily change links and remove dead links. Should a mechanism for the storage historical information be provided? 4. What if an intermediate is found between an ancestor descendent pair? Should we delete the old link? Then the annotation will be lost. 
Should the old link contain pointers to the new links?

5. Should we limit our scope to just sequences?

Please share your thoughts on this.

Cayte

From adalke at mindspring.com Fri Oct 26 00:27:32 2001
From: adalke at mindspring.com (Andrew Dalke)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Last chance to clam PIR parser
Message-ID: <0d0201c15dd6$885a7a60$0301a8c0@josiah.dalkescientific.com>

Hey Cayte,

Oops! I forgot to follow up when you mentioned this the first time. There's a set of parsers in Martel (pre-CVS version, not yet integrated into the current code base). One of them is for PIR.

The last Martel release I see was version 0.5
  http://www.biopython.org/~dalke/Martel-0.5.tar.gz
and the PIR format definition for Martel is at
  http://www.biopython.org/~dalke/Martel-0.5/formats/PIR_3_0.py

I don't recall the status of the parser. Searching through the back emails I see a thread of mine titled "PIR parsing" so you might want to start from
  http://biopython.org/pipermail/biopython-dev/2000-December/000186.html

BTW, in other news I'm getting closer to signing a development agreement to work more on Biopython, which means spending time finishing up Martel, working on format definitions, etc.

Andrew

From thomas at cbs.dtu.dk Fri Oct 26 03:10:35 2001
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Phylogenetic rats nests( oops I meant trees )
In-Reply-To: "Cayte"'s message of "Thu, 25 Oct 2001 20:21:17 -0400"
References: <001401c15db4$21ffe740$3170bbd1@g0fjl>
Message-ID:

"Cayte" writes:
> Many of the bioformats represent phylogenetic data as well as sequence or path
> data. I can think of a number of problems with
> phylogenetic data.

I do not really understand your questions.
Are you concerned about how to store and convert sequence formats containing alignments, or are you planning a huge phylogeny database project, or are you trying to answer philosophical aspects of molecular evolution :-) ?

The sequence is the base object. An alignment represents one solution, among many, to linearize sequences. A phylogenetic tree is a way of clustering sequences considering evolutionary changes. The reconstruction of a phylogenetic tree is most of the time based on a sequence alignment and depends on how you interpret the data and which method you use (Distance [UPGMA and Neighbor Joining], Maximum Parsimony and Maximum Likelihood).

> 1. A number of types of relationships are possible. A sequence may be a
> descendent or an ancestor of another sequence.
>
> They both may have a common ancestor. They may have converged to the same
> pattern. They may have hopped across species. Whatever the arguments against
> transgenic species, the assertion that it never happens in nature ain't so! The
> latest issue of Natural History describes how the vertebrate immune system may have
> once been a parasite.

Lateral (horizontal) gene transfer [LGT] is very common; the biggest known events are the origin of eukaryotic mitochondria from alpha-proteobacteria and the origin of plant chloroplasts from cyanobacteria. There is a still ongoing flow of genetic material between Thermotoga (eubacteria) and Pyrococcus (archaea), which makes it hard to tell the original "owner" of a gene. But: LGT does mainly affect our _interpretation_ of sequence data.

> 2. Researchers often don't agree among themselves what these relationships are.
> Should the links contain epistemology links that describe the level of confidence
> and the methodology plus journal references?

Who is going to decide the level of confidence? A referee? The bootstrap values? There is no way to prove a phylogenetic relationship with sequences only.

> 3.
Links will change as new research unfolds. This will be a maintenance issue(
> luckily not ours ). But we need the ability to easily change links and remove dead
> links. Should a mechanism for the storage of historical information be provided?

New phylogenetic trees with new information and biological interpretation will emerge and be published ... which will result in new sequence entries.

> 4. What if an intermediate is found between an ancestor descendent pair?
> Should we delete the old link? Then the annotation will be lost. Should
> the old link contain pointers to the new links?

What exactly are "links"? Is this a synonym for nodes in the tree or hyperlinks (XRefs) in e.g. EMBL annotations?

> 5. Should we limit our scope to just sequences?

What is the original problem description? If you are planning normal sequence/alignment/tree format storage, then you should not include additional interpretations and views which are not found in the original experiment (experiment = alignment, tree reconstruction, evolutionary interpretation).

On the other hand, if you are thinking about an internal phylogeny database which gets dynamically updated during e.g. the coordination of sequencing projects, then trees should be reconstructed after each change and contradicting node annotations should get logged inside the database.

Could you please mail me your intended scope?

maybe-I-should-start-the-day-with-coffee-instead-of-biopython-postings'ly yr's
thomas
--
Sicheritz-Ponten Thomas, Ph.D    CBS, Department of Biotechnology
thomas@biopython.org             The Technical University of Denmark
CBS: +45 45 252489               Building 208, DK-2800 Lyngby
Fax +45 45 931585                http://www.cbs.dtu.dk/thomas

De Chelonian Mobile ... The Turtle Moves ...
From klindner at tality.com Fri Oct 26 13:26:42 2001
From: klindner at tality.com (Kathy Lindner)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Re: Phylogenetic scope
Message-ID: <000701c15e43$61492d50$bda58c9e@pc-klindner1.cadence.com>

The scope is simply to represent phylogenetic data if it comes with a sequence. Some formats (nexus for example) support phylogenetic data. No databases, please. :)

Cayte

From katel at worldpath.net Sat Oct 27 00:19:15 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Last chance to clam PIR parser
References: <0d0201c15dd6$885a7a60$0301a8c0@josiah.dalkescientific.com>
Message-ID: <003e01c15e9e$8b931340$ea70bbd1@g0fjl>

----- Original Message -----
From: "Andrew Dalke"
To:
Sent: Friday, October 26, 2001 12:27 AM
Subject: Re: [Biopython-dev] Last chance to clam PIR parser

> The last Martel release I see was version 0.5
> http://www.biopython.org/~dalke/Martel-0.5.tar.gz
> and the PIR format definition for Martel is at
> http://www.biopython.org/~dalke/Martel-0.5/formats/PIR_3_0.py
>

What formats can I work on without a collision? GCG/MSF? The PIR format described in
www.sander.embl-ebi.ac.uk/Services/webin/help/webin-align/align_format_help.html#pir
seems different and more FASTA-like than the way you describe PIR. Are there different renditions of PIR?

Cayte

From adalke at mindspring.com Sat Oct 27 00:34:04 2001
From: adalke at mindspring.com (Andrew Dalke)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Last chance to clam PIR parser
Message-ID: <10a001c15ea0$9ca4aa20$0301a8c0@josiah.dalkescientific.com>

Cayte:
> The PIR format described in
>www.sander.embl-ebi.ac.uk/Services/webin/help/webin-align/align_format_help.html#pir
> seems different and more FASTA-like than the way you describe PIR. Are
> there different renditions of PIR?

Oh, right. Yes, there are. The PIR format I have is for the CODATA format, which includes a lot more data than the NBRF format you're probably thinking of.

PIR releases data in three formats:
  CODATA -- human readable (meaning it has 2D formatting to make it easier to find the different sections), hard to parse -- even with Martel
  NBRF -- machine readable, hard for humans to read
  XML (new) -- somewhere in the middle

These are linked from http://pir.georgetown.edu/pirwww/dbinfo/pirpsd.html

It seems most people want to read the NBRF format, not the CODATA one. But I did the CODATA one because I was thinking about the "convert to HTML" ability of Martel. Also, there were more fields in the CODATA format than the other two -- at least, there were fields there which were undocumented. I sent some email to the PIR people on them but never got a response about what they meant.
Andrew
dalke@dalkescientific.com

From katel at worldpath.net Sat Oct 27 01:04:29 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Last chance to clam PIR parser
References: <10a001c15ea0$9ca4aa20$0301a8c0@josiah.dalkescientific.com>
Message-ID: <005001c15ea4$dd44b6c0$ea70bbd1@g0fjl>

----- Original Message -----
From: "Andrew Dalke"
To:
Sent: Saturday, October 27, 2001 12:34 AM
Subject: Re: [Biopython-dev] Last chance to clam PIR parser

> Cayte:
> > The PIR format described in
> >www.sander.embl-ebi.ac.uk/Services/webin/help/webin-align/align_format_help.html#pir
> > seems different and more FASTA-like than the way you describe PIR. Are
> > there different renditions of PIR?
>
> Oh, right. Yes, there are. The PIR format I have is for the CODATA
> format, which includes a lot more data than the NBRF format you're
> probably thinking of.
>

So should I write a parser for NBRF? With the recession, I'm working 2 days a week. Might as well take advantage of the free time; it won't last.

Cayte

From adalke at mindspring.com Sat Oct 27 00:49:41 2001
From: adalke at mindspring.com (Andrew Dalke)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Last chance to clam PIR parser
Message-ID: <10b801c15ea2$cabbc220$0301a8c0@josiah.dalkescientific.com>

Cayte:
> So should I write a parser for NBRF?

Yes. I think more people are interested in importing data into their DBMS than marking it up into HTML.

Andrew

From j.joung at AptusGenomics.com Tue Oct 30 14:17:29 2001
From: j.joung at AptusGenomics.com (Jeong Joung)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] RE: Parsing Protein GenBank Records
In-Reply-To: <20010918212532.A3580@ci350185-a.athen1.ga.home.com>
Message-ID:

Hello,

Thanks for your help. The updated parser now works well for most REFSEQ proteins. I came across several REFSEQ protein records where the parser still fails on a UNIX machine.
The following is the error message:

Traceback (most recent call last):
    entry = parser.parse(gb_handle)
  File "/usr/.../Bio/GenBank/__init__.py", line 281, in parse
    self._scanner.feed(handle, self_consumer)
  File "/usr/.../Bio/GenBank/__init__.py", line 1143, in feed
    self._parser.parseFile(handle)
  File "/usr/.../Martel/Parser.py", line 226, in parseFile
    self.parseString(fileobj.read())
  File "/usr/.../Martel/Parser.py", line 254, in parseString
    self._err_handler.fatalError(result)
  File "/usr/.../python2.1/xml/sax/handler.py", line 38, in fatalError
    raise exception
ParserPositionException: error parsing at or beyond character 2889

Any help will be greatly appreciated.

Thank You,
Jeong

-----Original Message-----
From: Brad Chapman [mailto:chapmanb@arches.uga.edu]
Sent: Tuesday, September 18, 2001 9:26 PM
To: Jeong Joung
Cc: biopython-dev@biopython.org
Subject: Re: Parsing Protein GenBank Records

Hi Joung;

(ccing this to biopython-dev since this is relevant to everyone)

> I'm having trouble parsing GenBank records obtained from the protein
> database. The parser works fine for nucleotide GenBank records, but not for
> protein records. I would appreciate it very much if you can guide me in
> right direction for parsing such records.
>
> Here is the code and the error that I get back.
>
> >>> parser = GenBank.RecordParser()
> >>> ncbi = GenBank.NCBIDictionary(database='Protein')
> >>> rec = ncbi['6754304']

The parser does work for proteins in general, but does fail badly on this particular REFSEQ sequence. In the past, REFSEQ stuff has been only "sort of" GenBank format, and this record is no exception. It has a lot of formatting problems (has no identifier for the sequence type in the LOCUS line, has extra DBSOURCE tag, has non-standard feature table types and keys (Protein, Region, region_name)). Anyways, it is a big non-standard formatting mess.

I've fixed the GenBank parser to be able to handle this, and checked the changes into CVS.
Diffs to the relevant files (Record.py, __init__.py and genbank_format.py in Bio.GenBank) are also attached to this file in case you don't have CVS access.

Thanks for the bug report. Hope this works for you!

Brad
--
PGP public key available from http://pgp.mit.edu/

From j.joung at AptusGenomics.com Tue Oct 30 15:48:19 2001
From: j.joung at AptusGenomics.com (Jeong Joung)
Date: Sat Mar 5 14:43:06 2005
Subject: [Biopython-dev] Parsing Protein GenBank Records
Message-ID:

Hi,

I just found out that this problem occurs on some REFSEQ nucleotide records as well.

Thank You,
Jeong
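When a parser reports only a character offset, as in "error parsing at or beyond character 2889" in the traceback quoted in this thread, it can help to translate that offset into a line, a column and some surrounding text before filing a report. The traceback shows the whole handle being read with `fileobj.read()`, so the offset is presumably relative to the full text passed to the parser. What follows is a minimal sketch under that assumption; `describe_offset` is a hypothetical helper, not part of Biopython or Martel, and because the message says "at or beyond", the real problem may sit later in the record.

```python
def describe_offset(text, offset, context=40):
    """Return (line_number, column, snippet) for a character offset.

    Line and column are 1-based. The snippet holds up to `context`
    characters on either side of the offset.
    """
    offset = max(0, min(offset, len(text)))
    line_number = text.count("\n", 0, offset) + 1
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when on the first line
    column = offset - line_start + 1
    snippet = text[max(0, offset - context):offset + context]
    return line_number, column, snippet


if __name__ == "__main__":
    # Hypothetical miniature record, used only to exercise the helper.
    record = "LOCUS X\nDEFINITION y\nFEATURES\n"
    line_number, column, snippet = describe_offset(record, 21, context=10)
    print(line_number, column, repr(snippet))
```

For a real case, the text of the failing record would be read from a file and passed in together with the offset from the error message.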