From biopython-bugs at bioperl.org  Thu May  3 20:18:40 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Notification: incoming/30
Message-ID: <200105040018.f440Ie208736@pw600a.bioperl.org>

JitterBug notification

new message incoming/30

Message summary for PR#30
    From: dimlight@lgci.co.kr
    Subject: PRIVATE: About Genbank Iterator
    Date: Thu, 3 May 2001 20:18:39 -0400
    0 replies     0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From dimlight@lgci.co.kr Thu May  3 20:18:39 2001
Received: from localhost (localhost [127.0.0.1])
    by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f440Ic208720
    for ; Thu, 3 May 2001 20:18:39 -0400
Date: Thu, 3 May 2001 20:18:39 -0400
Message-Id: <200105040018.f440Ic208720@pw600a.bioperl.org>
From: dimlight@lgci.co.kr
To: biopython-bugs@bioperl.org
Subject: PRIVATE: About Genbank Iterator

Full_Name: Wankyu Kim
Module: GenBank,SeqFeature
Version: biopython-1.00a1
OS: win98
Submission from: cache14.bora.net (210.120.192.31)


I tried parsing a GenBank-formatted file and printing every element on
screen. I downloaded the RefSeq flat file in GenBank format from the
following site:

ftp://ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/hs.gbff.gz

After unzipping the hs.gbff.gz file, I tried parsing every element of
each RefSeq record. It seemed to be working very well, and I could see
the parsed elements scrolling by on and on... but on parsing the 5287th
record, I got the following error message.

Traceback (innermost last):
  File "C:\Python20\genbank_element.py", line 11, in ?
    cur_record = gb_iterator.next()
  File "c:\python20\Bio\GenBank\__init__.py", line 156, in next
    return self._parser.parse(File.StringHandle(data))
  File "c:\python20\Bio\GenBank\__init__.py", line 233, in parse
    self._scanner.feed(handle, self._consumer)
  File "c:\python20\Bio\GenBank\__init__.py", line 1004, in feed
    self._parser.parseFile(handle)
  File "c:\python20\Martel\Parser.py", line 206, in parseFile
    self.parseString(fileobj.read())
  File "c:\python20\Martel\Parser.py", line 234, in parseString
    self._err_handler.fatalError(result)
  File "c:\python20\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
ParserPositionException: error parsing at or beyond character 446

I had similar errors on RedHat 6.2 too.
Please cut & paste my code and test it. It will take hours to test.

< Code >

from Bio import GenBank
gb_file = "hs.gbff"
from Bio import SeqFeature
gb_handle = open(gb_file, 'r')
feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(gb_handle, feature_parser)
k = 0
while 1:
    cur_record = gb_iterator.next()
    k = k +1
    print
    print "record no", k
    print
    if cur_record is None:
        break
    print "cur_record.seq:", cur_record.seq.tostring()
    print
    print "cur_record.id",cur_record.id
    print
    print "cur_record.name", cur_record.name
    print
    print "cur_record.description", cur_record.description
    print
    print "cur_record.annotations"
    print "gi : ", cur_record.annotations['gi']
    print "organism : ", cur_record.annotations['organism']
    print "taxonomy : ", cur_record.annotations['taxonomy'][:]
    print "keywords : ", cur_record.annotations['keywords']
    print "data_file_division : ", cur_record.annotations['data_file_division']
    print "date : ", cur_record.annotations['date']
    ref_len = len(cur_record.annotations['references'])
    for j in range(ref_len):
        print cur_record.annotations['references'][j].journal
        print cur_record.annotations['references'][j].title
        print cur_record.annotations['references'][j].authors
        print cur_record.annotations['references'][j].medline_id
        print cur_record.annotations['references'][j].pubmed_id
        print cur_record.annotations['references'][j].comment
    print len(cur_record.features)
    i = len(cur_record.features)
    for i in range(i):
        print "type:", '\t\t',cur_record.features[i].type
        print "location:",'\t', cur_record.features[i].location
        for key in cur_record.features[i].qualifiers.keys():
            print key, '\t', cur_record.features[i].qualifiers[key]
    print
    print
print "Congulatulations!!! You've gone through RefSeq file "
print
print

From biopython-bugs at bioperl.org  Fri May  4 12:58:56 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Notification: incoming/30
Message-ID: <200105041658.f44Gwt223011@pw600a.bioperl.org>

JitterBug notification

chapmanb changed notes

Message summary for PR#30
    From: dimlight@lgci.co.kr
    Subject: PRIVATE: About Genbank Iterator
    Date: Thu, 3 May 2001 20:18:39 -0400
    0 replies     0 followups
    Notes:

Problem was GenBank record NM_006141.1, which was lacking a REFERENCE
section. Fixed the parser to be able to handle this case, fixes in CVS.

From biopython-bugs at bioperl.org  Fri May  4 12:58:57 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Notification: incoming/30
Message-ID: <200105041658.f44Gwv223026@pw600a.bioperl.org>

JitterBug notification

chapmanb moved PR#30 from incoming to fixed-bugs

Message summary for PR#30
    From: dimlight@lgci.co.kr
    Subject: PRIVATE: About Genbank Iterator
    Date: Thu, 3 May 2001 20:18:39 -0400
    0 replies     0 followups
    Notes:

Problem was GenBank record NM_006141.1, which was lacking a REFERENCE
section. Fixed the parser to be able to handle this case, fixes in CVS.

From chapmanb at arches.uga.edu  Fri May  4 13:06:16 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] GenBank parser Bug in RefSeq
Message-ID: <15090.57736.803381.414239@taxus.athen1.ga.home.com>

Wankyu;
Thanks for the bug report on the GenBank parser. I downloaded the
RefSeq and it turns out the problem is in NM_006141.1. This record is
completely lacking a REFERENCE section, which I had assumed was
mandatory.

The fixes were applied to biopython CVS. You can either download the
CVS version anonymously (see http://cvs.biopython.org for
instructions), or just replace Bio/GenBank/genbank_format.py:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/genbank_format.py?rev=1.5&content-type=text/vnd.viewcvs-markup&cvsroot=biopython

You may have to wait for a couple hours for anonymous CVS to catch up
and get the new revisions.

After this fix, the parser successfully made it through the whole
file, so you should get to see your Congratulations message :-).

Please let me know if this doesn't fix your problem. Thanks again for
the report.

Brad

From dalke at acm.org  Mon May  7 02:00:47 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] wiki problem
Message-ID: <054701c0d6bb$10566620$0301a8c0@josiah>

I'm trying to add a new topic to the wiki and I found that
I can't edit any pages.
I get:

> Authorization Required
> This server could not verify that you are authorized to access
> the document requested. Either you supplied the wrong credentials
> (e.g., bad password), or your browser doesn't understand how to
> supply the credentials required.

What should I do?  Do I reenter the password and if so, how?
(I'm using MSIE 5 under Win98 if that's any help.)  I can go
to the biopython.org machine and use lynx to get authorization,
but usability under lynx is bad enough that I would rather fix
this problem.

                    Andrew
                    dalke@acm.org

From chapmanb at arches.uga.edu  Mon May  7 21:05:05 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] wiki problem
In-Reply-To: <054701c0d6bb$10566620$0301a8c0@josiah>
References: <054701c0d6bb$10566620$0301a8c0@josiah>
Message-ID: <15095.17985.737963.881457@taxus.athen1.ga.home.com>

Hi Andrew!
Since no one who actually knows anything about the wiki setup
answered, I thought I would give it a shot based on what I know.

> I'm trying to add a new topic to the wiki and I found that
> I can't edit any pages.  I get:
>
> > Authorization Required
> > This server could not verify that you are authorized to access
> > the document requested. Either you supplied the wrong credentials
> > (e.g., bad password), or your browser doesn't understand how to
> > supply the credentials required.

Hmmm, I'm not sure how the authentication works, but is it possible
it's implemented with cookies and you somehow got a cookie with the
wrong username and password? Or, maybe MSIE is trying to use the
annoying "automatic username and password" thing and giving the wiki
the wrong password? Just some guesses, I really don't know much about
it because I always use lynx, leading me to....

> I can go
> to the biopython.org machine and use lynx to get authorization,
> but usability under lynx is bad enough that I would rather fix
> this problem.

Actually, editing wiki pages using lynx can be really sweet once you
get it set up. Lynx can allow you to edit the page in a decent editor
(trying to use lynx's text boxes is madness). If you are doing it on
biopython.org, they have emacs there, and have X11-forwarding enabled
on ssh, so you could set emacs to your editor in your lynx
configuration (hit 'o' for options and then set the 'Editor' option to
emacs), and then hit esc-e in a lynx text box to launch emacs to edit
the page.

Anyways, just a thought -- at least this way you can get stuff done
without fighting with IE.

Hope this helps some,
Brad

From dalke at acm.org  Tue May  8 02:32:56 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] BOSC biopython outline
Message-ID: <3AF79318.C976C12B@acm.org>

Hey all,

  As most of you know, I'm presenting the Biopython talk at BOSC 2001,
which is in Copenhagen.  Unlike almost every other presentation I've
given, I'm starting to work on this one with time to spare.  And
unlike every other one, I'm going to try a collaborative development
through Wiki.

Take a look at
  http://www.biopython.org/wiki/html/BioPython/BoscOhOne.html

(That's "BoscOhOne" since neither BOSC01 nor Bosc01 allow automatic
linking.)

All it contains is a sketch of an outline of what I may talk about.
I'm thinking a three part presentation:
  - executive overview - who/what/...
  - walkthrough/tutorial - this is the main part, and hits the
      highlights of some of the things biopython can do
  - future - what we would like to do

Feel free to make contributions to the page!  All I'm looking
for now are ideas and topics to discuss.  From that I hope to
get enough for a good outline, which should then lead to things
like the presentation, additional tutorials and documentation,
and more wiki nodes.

                    Andrew
                    dalke@acm.org

From thomas at cbs.dtu.dk  Tue May  8 04:01:22 2001
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] BOSC biopython outline
In-Reply-To: Andrew Dalke's message of "Tue, 08 May 2001 00:32:56 -0600"
References: <3AF79318.C976C12B@acm.org>
Message-ID:

Hej Andrew,

> As most of you know, I'm presenting the Biopython talk at BOSC 2001,
> which is in Copenhagen.  Unlike almost every other presentation I've
> given,
> I'm starting to work on this one with time to spare.  And unlike every
> other one, I'm going to try a collaborative development through Wiki.
>
> Take a look at
>   http://www.biopython.org/wiki/html/BioPython/BoscOhOne.html

Ok - brainstorming !

> Written in Python, which some people prefer (warning, down this path lies
> language wars). Lots of libraries. Batteries included. Portable. I
> (AndrewDalke) personally believe Python code scales better for large
> projects than most other languages, but that can be perceived as overly
> antagonistic.

* We probably have to be careful about not raising a flame war. If you
  are the first speaker or if the previous (probably perl) speakers
  haven't mentioned anything - it might be wise to start with some
  diplomatic words about "use the language you are comfortable with" etc.

* There are a lot of perl people there but probably few from the others
  (tcl, java ...). How do you want to compare python ? - this and that
  etc. is better than perl ?

  I can back you up at "Python code scales better for large projects
  than most other languages": I used biotcl (hardly usable at all) and
  developed biowish (bio + tcl + graphics) and used it of course for a
  lot of research .... until my little brother asked me to look into
  python. After 2 days I started to rewrite the tcl code for my last
  (and biggest) PhD project, which was a #$%#$% to write in tcl, mainly
  because it was too big and tcl's OO solutions sucked.
  After one week, I had rewritten all code and expanded the amount of
  features/code by a factor of 10 (and was 3 months ahead of my
  schedule :-) And it's still easy to navigate through the code and add
  new features ....
  !!! I have seen the light !!!

* Our Vision:
  one of my visions - because of python's (almost standard) graphical
  modules (Tk, wx, GTK+ etc.) it's very easy to integrate graphics,
  which makes biopython VERY easy and fast to build tools for graphical
  comparisons (e.g. genomes)

* to mention:
  * one of the strongest features of biopython (IMHO) are the parsers
    and how easy it is to make new parsers ... I don't know what perl
    has to offer - but perl people I met were definitely interested in
    biopython's Swiss and Blast parsers.
  * we have two graphical sequence editors in the standard distribution
    * SeqGui, a wx application by Cayte ???
    * xbbtools, a Tkinter application by me (Thomas Sicheritz-Ponten)

Maybe more after lunch

cheers
-thomas

-- 
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas@biopython.org           The Technical University of Denmark
CBS:  +45 45 252489            Building 208, DK-2800 Lyngby
Fax   +45 45 931585            http://www.cbs.dtu.dk/thomas

        De Chelonian Mobile ... The Turtle Moves ...

From jchang at SMI.Stanford.EDU  Tue May  8 04:19:47 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] BOSC biopython outline
In-Reply-To:
Message-ID:

> * We probably have to be careful about not raising a flame war. If you are the
>   first speaker or if the previous (probably perl) speakers haven't mentioned
>   anything - it might be wise to start with some diplomatic words about "use
>   the language you are comfortable with" etc.

Yes, we don't want to be antagonistic and start a flame war.  However, I
have no problems with rocking the boat from time to time, if we have
well-reasoned claims.  Knowing that Andrew never says anything that he can't
back, I say fire away.
;)

Jeff

From katel at worldpath.net  Tue May 15 04:05:14 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Kabat puzzles
References: <3AF79318.C976C12B@acm.org>
Message-ID: <003a01c0dd15$c6b24f00$010a0a0a@cadence.com>

  Some web surfing convinced me that subscripted Kabat numbers, as in
the following text, represent variable loops in an antibody.  So if I
simply use the Sequence class and string the residues together like
beads, that will be misleading.  Should the loops be features or should
I extend the class?  Any more ideas?  I need to do more surfing and
digging.

SEQTPA    109    100     gat  ASP  D
SEQTPA    110    100A    cta  LEU  L
SEQTPA    111    100B    ccg  PRO  P
SEQTPA    112    100C    cac  HIS  H
SEQTPA    113    100D    aat  ASN  N
SEQTPA    114    100E    gat  ASP  D
SEQTPA    115    100F    ggt  GLY  G
SEQTPA    116    100G    ---  ---  -
SEQTPA    117    100H    ---  ---  -
SEQTPA    118    100I    ---  ---  -
SEQTPA    119    100J    ---  ---  -
SEQTPA    120    100K    ttt  PHE  F

                                                   Cayte

From biopython-bugs at bioperl.org  Wed May 16 04:14:35 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Notification: incoming/31
Message-ID: <200105160814.f4G8EZb32193@pw600a.bioperl.org>

JitterBug notification

new message incoming/31

Message summary for PR#31
    From: hy263book@263.net
    Subject: When I encounter "No hits found"
    Date: Wed, 16 May 2001 04:14:35 -0400
    0 replies     0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From hy263book@263.net Wed May 16 04:14:35 2001
Received: from localhost (localhost [127.0.0.1])
    by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f4G8EYb32187
    for ; Wed, 16 May 2001 04:14:35 -0400
Date: Wed, 16 May 2001 04:14:35 -0400
Message-Id: <200105160814.f4G8EYb32187@pw600a.bioperl.org>
From: hy263book@263.net
To: biopython-bugs@bioperl.org
Subject: When I encounter "No hits found"

Full_Name: Huang Ying
Module: Bio.Blast.NCBIStandalond
Version:
OS: Win2k
Submission from: (NULL) (166.111.30.26)


I use Bio.Blast.NCBIStandalone.BlastParser to analyze
Blast reports. When the blast result is "No hits found", python sends
the wrong message

From gcox at netgenics.com  Wed May 16 09:46:59 2001
From: gcox at netgenics.com (Cox, Greg)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Uniform parsing vocabulary
Message-ID:

I apologize for the cross-posting.

I'm involved in the BioJava project, and I'm looking at constructing a
uniform vocabulary for the parsed keys.  For example, currently in BioJava,
if an EMBL record is parsed the Organism information goes under the key
'OS', while if a Genbank record is parsed, the information is under the key
"ORGANISM".  Are there existing standards that define the keys used, or
should I put together a new proposal?

Greg Cox

From biopython-bugs at bioperl.org  Wed May 16 13:53:26 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Notification: incoming/32
Message-ID: <200105161753.f4GHrQb11648@pw600a.bioperl.org>

JitterBug notification

new message incoming/32

Message summary for PR#32
    From: Jeffrey Chang
    Subject: Re: [Biopython-dev] Notification: incoming/31
    Date: Wed, 16 May 2001 11:58:00 -0700
    0 replies     0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From jchang@SMI.Stanford.EDU Wed May 16 13:53:23 2001
Received: from crg-gw.Stanford.EDU (root@crg-gw.Stanford.EDU [171.65.32.201])
    by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f4GHrJb11642
    for ; Wed, 16 May 2001 13:53:23 -0400
Received: from [171.65.33.127] (chang-smi.Stanford.EDU [171.65.33.127])
    by crg-gw.Stanford.EDU (8.9.1a/8.9.1) with ESMTP id LAA23878;
    Wed, 16 May 2001 11:58:23 -0700 (PDT)
User-Agent: Microsoft-Outlook-Express-Macintosh-Edition/5.02.2022
Date: Wed, 16 May 2001 11:58:00 -0700
Subject: Re: [Biopython-dev] Notification: incoming/31
From: Jeffrey Chang
To:
CC:
Message-ID:
In-Reply-To: <200105160814.f4G8EZb32193@pw600a.bioperl.org>
Mime-version: 1.0
Content-type: text/plain; charset="US-ASCII"
Content-transfer-encoding: 7bit
Content-Transfer-Encoding: 7bit

Hi Huang,

Could you send the file that's generating the output?  We have regression
tests that check for behavior for "No hits found", and it does not generate
any error message, as designed.

helio:~/remotecvs/biopython/Tests/Blast> python
Python 2.1 (#7, Apr 17 2001, 18:53:25)
[GCC 2.8.1] on sunos5
Type "copyright", "credits" or "license" for more information.
>>> from Bio.Blast import NCBIStandalone
>>> rec = NCBIStandalone.BlastParser().parse_file('bt002')
>>> print rec.alignments
[]
>>>

Thanks,
Jeff

> From: biopython-bugs@bioperl.org
> Date: Wed, 16 May 2001 04:14:35 -0400
> To: biopython-dev@biopython.org
> Subject: [Biopython-dev] Notification: incoming/31
>
> JitterBug notification
>
> new message incoming/31
>
> Message summary for PR#31
> From: hy263book@263.net
> Subject: When I encounter "No hits found"
> Date: Wed, 16 May 2001 04:14:35 -0400
> 0 replies 0 followups
>
> ====> ORIGINAL MESSAGE FOLLOWS <====
>
>> From hy263book@263.net Wed May 16 04:14:35 2001
> Received: from localhost (localhost [127.0.0.1])
> by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f4G8EYb32187
> for ; Wed, 16 May 2001 04:14:35 -0400
> Date: Wed, 16 May 2001 04:14:35 -0400
> Message-Id: <200105160814.f4G8EYb32187@pw600a.bioperl.org>
> From: hy263book@263.net
> To: biopython-bugs@bioperl.org
> Subject: When I encounter "No hits found"
>
> Full_Name: Huang Ying
> Module: Bio.Blast.NCBIStandalond
> Version:
> OS: Win2k
> Submission from: (NULL) (166.111.30.26)
>
>
> I use Bio.Blast.NCBIStandalone.BlastParser to analyze Blast reports. When the
> blast result is "No hits found", python sends the wrong message
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
>

From katel at worldpath.net  Sat May 19 02:43:58 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Kabat wire ( or wild goose ) chase
References:
Message-ID: <002601c0e02f$15eb49c0$010a0a0a@cadence.com>

I found that the subscripts of the Kabat numbers do not represent
alternatives.  A helpful email informed me that more than likely the
letters flag insertions compared to a reference.  When I checked the
PubMed reference for a Kabat output file, I found that the sequence in
the NCBI nucleotide database matched the Kabat residues chained
together, without any special handling of the subscripted residues.

  My guess is that the reference used is the canonical form, sort of a
template for what a plain vanilla antibody should contain.

SEQTPA    109    100     gat  ASP  D
SEQTPA    110    100A    cta  LEU  L
SEQTPA    111    100B    ccg  PRO  P
SEQTPA    112    100C    cac  HIS  H
SEQTPA    113    100D    aat  ASN  N
SEQTPA    114    100E    gat  ASP  D
SEQTPA    115    100F    ggt  GLY  G
SEQTPA    116    100G    ---  ---  -
SEQTPA    117    100H    ---  ---  -
SEQTPA    118    100I    ---  ---  -
SEQTPA    119    100J    ---  ---  -
SEQTPA    120    100K    ttt  PHE  F
SEQTPA

                                                   Cayte

From mrp at sanger.ac.uk  Mon May 21 10:18:45 2001
From: mrp at sanger.ac.uk (Matthew Pocock)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Re: [BioXML-dev] Uniform parsing vocabulary
References:
Message-ID: <3B0923C5.4050601@sanger.ac.uk>

Propose away - do we have ontologies for these things knocking about yet?

Cox, Greg wrote:

> I apologize for the cross-posting.
>
> I'm involved in the BioJava project, and I'm looking at constructing a
> uniform vocabulary for the parsed keys.  For example, currently in BioJava,
> if an EMBL record is parsed the Organism information goes under the key
> 'OS', while if a Genbank record is parsed, the information is under the key
> "ORGANISM".  Are there existing standards that define the keys used, or
> should I put together a new proposal?
>
> Greg Cox
> _______________________________________________
> BioXML-dev mailing list - BioXML-dev@bioxml.org
> http://bioxml.org/mailman/listinfo/bioxml-dev

From bradmars at yahoo.com  Mon May 21 13:31:53 2001
From: bradmars at yahoo.com (Bradley Marshall)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] Re: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary
In-Reply-To: <3B0923C5.4050601@sanger.ac.uk>
Message-ID: <20010521173153.35397.qmail@web218.mail.yahoo.com>

Actually, I just looked around for this type of
ontology last week and found nothing.  I think it
would be a valuable tool.

Brad

--- Matthew Pocock wrote:
> Propose away - do we have ontologies for these
> things knocking about yet?
>
> Cox, Greg wrote:
>
> > I apologize for the cross-posting.
> >
> > I'm involved in the BioJava project, and I'm looking at constructing a
> > uniform vocabulary for the parsed keys.  For example, currently in BioJava,
> > if an EMBL record is parsed the Organism information goes under the key
> > 'OS', while if a Genbank record is parsed, the information is under the key
> > "ORGANISM".  Are there existing standards that define the keys used, or
> > should I put together a new proposal?
> >
> > Greg Cox
> > _______________________________________________
> > BioXML-dev mailing list - BioXML-dev@bioxml.org
> > http://bioxml.org/mailman/listinfo/bioxml-dev
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l

__________________________________________________
Do You Yahoo!?
Yahoo! Auctions - buy the things you want at great prices
http://auctions.yahoo.com/

From Solbrig.Harold at mayo.edu  Mon May 21 14:15:28 2001
From: Solbrig.Harold at mayo.edu (Solbrig, Harold R.)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] RE: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary
Message-ID:

The sites below may be of some use:

http://www-smi.stanford.edu/projects/helix/riboweb/kb-pub.html
http://www.geml.org/

Also, there is a wealth of jump-off points anchored at
http://www.semanticweb.org

-----Original Message-----
From: Bradley Marshall [mailto:bradmars@yahoo.com]
Sent: Monday, May 21, 2001 12:32 PM
To: Matthew Pocock; Cox, Greg
Cc: 'biojava-l@biojava.org'; 'bioperl-l@bioperl.org';
'bioxml-dev@bioxml.org'; 'biopython-dev@biopython.org'
Subject: Re: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary

Actually, I just looked around for this type of
ontology last week and found nothing.  I think it
would be a valuable tool.

Brad

--- Matthew Pocock wrote:
> Propose away - do we have ontologies for these
> things knocking about yet?
>
> Cox, Greg wrote:
>
> > I apologize for the cross-posting.
> >
> > I'm involved in the BioJava project, and I'm looking at constructing a
> > uniform vocabulary for the parsed keys.  For example, currently in BioJava,
> > if an EMBL record is parsed the Organism information goes under the key
> > 'OS', while if a Genbank record is parsed, the information is under the key
> > "ORGANISM".  Are there existing standards that define the keys used, or
> > should I put together a new proposal?
> >
> > Greg Cox

From Solbrig.Harold at mayo.edu  Mon May 21 19:10:47 2001
From: Solbrig.Harold at mayo.edu (Solbrig, Harold R.)
Date: Sat Mar  5 14:42:59 2005
Subject: [Biopython-dev] RE: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary
Message-ID:

Another possibility

http://www.geneontology.org/

> -----Original Message-----
> From: Bradley Marshall [mailto:bradmars@yahoo.com]
> Sent: Monday, May 21, 2001 12:32 PM
> To: Matthew Pocock; Cox, Greg
> Cc: 'biojava-l@biojava.org'; 'bioperl-l@bioperl.org';
> 'bioxml-dev@bioxml.org'; 'biopython-dev@biopython.org'
> Subject: Re: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary
>
> Actually, I just looked around for this type of
> ontology last week and found nothing.  I think it
> would be a valuable tool.
>
> Brad
>
> --- Matthew Pocock wrote:
> > Propose away - do we have ontologies for these
> > things knocking about yet?
> >
> > Cox, Greg wrote:
> >
> > > I apologize for the cross-posting.
> > >
> > > I'm involved in the BioJava project, and I'm looking at constructing a
> > > uniform vocabulary for the parsed keys.  For example, currently in BioJava,
> > > if an EMBL record is parsed the Organism information goes under the key
> > > 'OS', while if a Genbank record is parsed, the information is under the key
> > > "ORGANISM".  Are there existing standards that define the keys used, or
> > > should I put together a new proposal?
> > > > > > Greg Cox > > > _______________________________________________ > > > BioXML-dev mailing list - BioXML-dev@bioxml.org > > > http://bioxml.org/mailman/listinfo/bioxml-dev > > > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l@biojava.org > > http://biojava.org/mailman/listinfo/biojava-l > > > __________________________________________________ > Do You Yahoo!? > Yahoo! Auctions - buy the things you want at great prices > http://auctions.yahoo.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From biopython-bugs at bioperl.org Mon May 21 21:27:19 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:42:59 2005 Subject: [Biopython-dev] Notification: incoming/33 Message-ID: <200105220127.f4M1RJb30045@pw600a.bioperl.org> JitterBug notification new message incoming/33 Message summary for PR#33 From: sarah@k-k.oz.au Subject: str matrix in Bio.SubsMat.MatrixInfo Date: Mon, 21 May 2001 21:27:18 -0400 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From sarah@k-k.oz.au Mon May 21 21:27:19 2001 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f4M1RIb30039 for ; Mon, 21 May 2001 21:27:18 -0400 Date: Mon, 21 May 2001 21:27:18 -0400 Message-Id: <200105220127.f4M1RIb30039@pw600a.bioperl.org> From: sarah@k-k.oz.au To: biopython-bugs@bioperl.org Subject: str matrix in Bio.SubsMat.MatrixInfo Full_Name: Sarah Kummerfeld Module: Bio.SubsMat.MatrixInfo Version: biopython-1.00a1 -- not sure of specific module version OS: linux Submission from: metra.ucc.usyd.edu.au (129.78.64.5) There is a matrix called str in the Bio.SubsMat.MatrixInfo set. This causes problems with the python str() function! I couldn't find the get_matrices.py code that is referred to as having generated these substitution matrices. 
So i'm not sure if it should be changed there or in MatrixInfo Sarah From katel at worldpath.net Tue May 22 02:03:33 2001 From: katel at worldpath.net (Cayte) Date: Sat Mar 5 14:42:59 2005 Subject: [Biopython-dev] Re: [BioXML-dev] Uniform parsing vocabulary References: <3B0923C5.4050601@sanger.ac.uk> Message-ID: <003401c0e284$f0c9eea0$010a0a0a@cadence.com> ----- Original Message ----- From: "Matthew Pocock" To: "Cox, Greg" Cc: ; ; ; Sent: Monday, May 21, 2001 7:18 AM Subject: [Biopython-dev] Re: [BioXML-dev] Uniform parsing vocabulary > Propose away - do we have ontologies for these things knocking about yet? > > Cox, Greg wrote: > > > I apologize for the cross-posting. > > > > I'm involved in the BioJava project, and I'm looking at constructing a > > uniform vocabulary for the parsed keys. For example, currently in BioJava, > > if an EMBL record is parsed the Organism information goes under the key > > 'OS', while if a Genbank record is parsed, the information is under the key > > "ORGANISM". Are there existing standards that define the keys used, or > > should I put together a new proposal? > > The IEB ( Information Engineering Branch ) of NCBI is developing a specialized ASN( Abstract Syntax Notation ) for bio to solve interoperability. It is available for download. It may or may not become the last word because sometimes something meaner and leaner comes along and blows away the official protocal, like what happened wth TCP/IP and the 7 layer OSI model. Cayte From rik at cs.ucsd.edu Mon May 21 23:11:51 2001 From: rik at cs.ucsd.edu (Richard K. 
Belew) Date: Sat Mar 5 14:42:59 2005 Subject: [Biopython-dev] Re: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary References: Message-ID: <3B09D8F7.BCF12366@cs.ucsd.edu> > > > -----Original Message----- > > From: Bradley Marshall [mailto:bradmars@yahoo.com] > > Sent: Monday, May 21, 2001 12:32 PM > > To: Matthew Pocock; Cox, Greg > > Cc: 'biojava-l@biojava.org'; 'bioperl-l@bioperl.org'; > > 'bioxml-dev@bioxml.org'; 'biopython-dev@biopython.org' > > Subject: Re: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary > > > > > > > > Actually, I just looked around for this type of > > ontology last week and found nothing. I think it > > would be a valuable tool. "Solbrig, Harold R." wrote: > > Another possibility > > http://www.geneontology.org/ right, GO is what i immediately thought of, too. but perhaps (given the biopython audience) it is tools for manipulating GO that you seek? on a related theme, i'm wondering if anyone has worked on integrating NLM's MESH into biopython? rik -- Richard K. Belew rik@cs.ucsd.edu http://www.cs.ucsd.edu/~rik Computer Science & Engr. Dept. Univ. California -- San Diego 858 / 534-2601 9500 Gilman Dr. (0114) 858 / 532-0702 (msgs) La Jolla CA 92093-0114 USA 858 / 534-7029 (fax) From jchang at SMI.Stanford.EDU Tue May 22 02:46:17 2001 From: jchang at SMI.Stanford.EDU (Jeffrey Chang) Date: Sat Mar 5 14:42:59 2005 Subject: [Biopython-dev] Re: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary In-Reply-To: <3B09D8F7.BCF12366@cs.ucsd.edu> Message-ID: > From: "Richard K. Belew" [discussion on sequence accession ontologies] > on a related theme, i'm wondering if anyone has > worked on integrating NLM's MESH into biopython? This is related to the stuff I work on in my research. Stuff in this area is definitely welcome in biopython. However, it's unclear to me how it can be integrated. (Someone please speak up if you know!) The online MeSH browser seems to be more human-oriented and not as useful for accessing via a script. 
However, the UMLS version requires a license. I don't know if there is free access to MeSH/UMLS information. Jeff From birney at ebi.ac.uk Tue May 22 04:43:37 2001 From: birney at ebi.ac.uk (Ewan Birney) Date: Sat Mar 5 14:42:59 2005 Subject: [Biopython-dev] Re: [BioXML-dev] Uniform parsing vocabulary In-Reply-To: <003401c0e284$f0c9eea0$010a0a0a@cadence.com> Message-ID: Ok guys - can we stop cross posting now ;) Greg - feel free to report back I guess to biojava + biocorba your results. Lets get the cross-project posts a little more confined... (I vote biocorba...) ewan ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From biopython-bugs at bioperl.org Tue May 22 16:57:19 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Notification: incoming/33 Message-ID: <200105222057.f4MKvJb05482@pw600a.bioperl.org> JitterBug notification chapmanb changed notes Message summary for PR#33 From: sarah@k-k.oz.au Subject: str matrix in Bio.SubsMat.MatrixInfo Date: Mon, 21 May 2001 21:27:18 -0400 0 replies 0 followups Notes: Ooops, my fault -- too much automation and not enough sanity checking on my part :-). The pages that were used to generate this don't even exist any more, so the change was made directly to MatrixInfo.py, str is now called structure. Sarah, thanks for finding this! 
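The name clash reported in PR#33 can be reproduced without Biopython. What follows is a minimal sketch in present-day Python, not the actual MatrixInfo code: `fake_matrix_info` and its contents are made up for illustration, and passing the module's names to `eval` is only a stand-in for what `from some_module import *` would do to the caller's namespace.

```python
import types

# Hypothetical stand-in for a module that defines a matrix named "str".
# This is NOT Bio.SubsMat.MatrixInfo; the values are invented.
fake_matrix_info = types.ModuleType("fake_matrix_info")
fake_matrix_info.str = {("A", "A"): 4, ("A", "C"): -2}


def call_str_with(namespace):
    """Evaluate str(42) with the given names visible as globals."""
    try:
        return eval("str(42)", dict(namespace))
    except TypeError as exc:
        return "TypeError: %s" % exc


# With no extra names, str is the builtin.
clean = call_str_with({})

# With the module's names in scope, "str" is the matrix and hides the builtin.
shadowed = call_str_with(vars(fake_matrix_info))

# Renaming the matrix (as in the fix, str -> structure) removes the clash.
fake_matrix_info.structure = fake_matrix_info.str
del fake_matrix_info.str
fixed = call_str_with(vars(fake_matrix_info))

print(clean)
print(shadowed)
print(fixed)
```

The same reasoning applies to any module-level name that matches a builtin (`list`, `id`, `type`, and so on), which is why a quick check of generated names against the builtins is a cheap sanity test for auto-generated modules.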
From biopython-bugs at bioperl.org Tue May 22 16:57:19 2001 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Notification: incoming/33 Message-ID: <200105222057.f4MKvJb05486@pw600a.bioperl.org> JitterBug notification chapmanb moved PR#33 from incoming to fixed-bugs Message summary for PR#33 From: sarah@k-k.oz.au Subject: str matrix in Bio.SubsMat.MatrixInfo Date: Mon, 21 May 2001 21:27:18 -0400 0 replies 0 followups
From muilu at ebi.ac.uk Tue May 22 17:49:33 2001
From: muilu at ebi.ac.uk (Juha Muilu)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Re: [Biojava-l] Re: [BioXML-dev] Uniform parsing vocabulary
References: <3B0923C5.4050601@sanger.ac.uk>
Message-ID: <3B0ADEED.4A3A8817@ebi.ac.uk>

Hi

Perhaps something useful here as well...

http://industry.ebi.ac.uk/candy/

Matthew Pocock wrote:
>
> Propose away - do we have ontologies for these things knocking about yet?
>
> Cox, Greg wrote:
>
> > I apologize for the cross-posting.
> >
> > I'm involved in the BioJava project, and I'm looking at constructing a
> > uniform vocabulary for the parsed keys. For example, currently in BioJava,
> > if an EMBL record is parsed the Organism information goes under the key
> > 'OS', while if a Genbank record is parsed, the information is under the key
> > "ORGANISM". Are there existing standards that define the keys used, or
> > should I put together a new proposal?
> > > > Greg Cox > > _______________________________________________ > > BioXML-dev mailing list - BioXML-dev@bioxml.org > > http://bioxml.org/mailman/listinfo/bioxml-dev > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l -- +--------------------------------------------------------------------+ |Juha Muilu, Ph.D., EMBL Outstation| Email: muilu@ebi.ac.uk | |European Bioinformatics Institute | Phone: +44 (0)1223 494 624 | |Wellcome Trust Genome Campus | Fax: +44 (0)1223 494 468 | |Hinxton, Cambridge CB10 1SD, UK | http://industry.ebi.ac.uk/~muilu| +--------------------------------------------------------------------+ From idoerg at cc.huji.ac.il Thu May 24 07:46:08 2001 From: idoerg at cc.huji.ac.il (Iddo Friedberg) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Output sequence files In-Reply-To: <20000907124634.B74243@fling.sanbi.ac.za> Message-ID: Hi all, Does Biopython provide anything in the field of writing out a sequence (Seq/MutableSeq classes in the usual GenBank/SwissProt/Fasta/... formats? I see that FastaAlign does that, but I cannot find anything for a single sequence. If Biopython really does not provide this feature, maybe a discussion could be started. Writing out the sequence part is easy. Can be implemented with a few functions (to_fasta, to_swiss), or those can even be methods within Seq/MutableSeq Why? Well, basically there is a need for some sort of interface with the common run-of-the mill sequence handling packages which lack APIs such as the Wisconsin package, Clustal, etc. (OK, maybe not Clustal, FastaAlign took care of that, but you get my drift). Currently, using files is the best approach, and would provide a practical solution for people who work with these packages, but which use Biopython as well. Annotation, as usual, is the problem. (SeqRecord in Biopythonese). 
There was some discussion a couple of days ago regarding uniform vocabularies. From what I gathered from the discussion, this is at a pretty preliminary stage. My personal feelings are that this should not be an obstacle to providing some practical tool for writing a skeleton annotation containing, say, accession numbers, organism, keywords and refs. This can always be expanded/modified later when some standardization scheme comes in. OK, I think I should stop here. I hope I'm not raising some sort of moot point. If I'm not way off the mark, I can provide a couple of more specifics. If I am, please chalk this up to me being out of the loop for a while, and not reading the archives carefully. Iddo -- Iddo Friedberg | Tel: +972-2-6758647 Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308 The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il POB 12272, Jerusalem 91120 | Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/ From sarah at k-k.oz.au Thu May 24 19:11:16 2001 From: sarah at k-k.oz.au (Sarah Kummerfeld) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Output sequence files In-Reply-To: Message-ID: > Does Biopython provide anything in the field of writing out a sequence > (Seq/MutableSeq classes in the usual GenBank/SwissProt/Fasta/... formats? I noticed this didn't seem to be available a month or so back and started thinking about the best place to put it. I didn't finish coding it because I was not sure if it should be part of Seq/MutableSeq or in a 'writer' module. The first option seems best in terms of OO design and probably usability, but it seems that most of the input in biopython is done using a reader, so to be consistent with that it should go in a writer module? Also, to output meaningful genbank files I think we really need to operate on SeqFeature objects? In fact even fasta needs the information in a SeqRecord object rather than just a MutableSeq? Sarah p.s. 
For my own purposes I just wrote a writer module with the formats I needed and a '# FIXME' to sort it out later :) From chapmanb at arches.uga.edu Fri May 25 00:28:40 2001 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Output sequence files In-Reply-To: References: <20000907124634.B74243@fling.sanbi.ac.za> Message-ID: <15117.57208.36295.53219@taxus.athen1.ga.home.com> Hi Iddo! Nice to hear from you again! Hope things are going well. Iddo: > Does Biopython provide anything in the field of writing out a sequence > (Seq/MutableSeq classes in the usual GenBank/SwissProt/Fasta/... formats? Nope, not yet. As Sarah noticed, no one has coded this up. I give a big +1 for adding something that has this type of functionality. Iddo: > If Biopython really does not provide this feature, maybe a discussion > could be started. Writing out the sequence part is easy. Can be > implemented with a few functions (to_fasta, to_swiss), or those can even > be methods within Seq/MutableSeq Sarah: > Also, to output meaningful genbank files I think we really need > to operate on SeqFeature objects? In fact even fasta needs the > information in a SeqRecord object rather than just a MutableSeq? I think Sarah is right on this. Seq/MutableSeq classes do not store any useful annotations on the sequence (except the alphabet/type of the sequence). Things should focus on SeqRecord, which has all of the annotation stuff. I was thinking about this problem while I was writing the output functionality for GenBank.Record objects (the GenBank specific Record classes). Here's my 2 cents on what I think should be done: => First, someone needs to work on SeqRecord to beef it up and make it a nicer class for storing annotation information. Right now, everything gets shoved into the annotations or features attribute (take a look at the GenBank stuff for a good example of how someone (me!), can abuse these badly. 
I think a more full featured SeqRecord class would be great.

=> In my mind, instead of focusing on conversions like:

    SeqRecord -> FASTA flat file format

we should do the conversions like:

    SeqRecord -> Fasta.Record class -> FASTA flat file format

(and something similar for GenBank, SwissProt, etc). Since in biopython
we have nice classes for representing specific flat file formats, and
also have a way to output the flat file from the record (at least for
FASTA and GenBank right now), this allows us to use this strength of
biopython and also not duplicate code. This is a big bonus for more
complicated formats like GenBank -- writing a function that outputs
FASTA is not too bad, but GenBank is much more complicated -- I was
amazed at the amount of work I had to do to get output working, even
from a GenBank specific record class. I'd rather not duplicate this
type of code.

=> So, since we've already got the Record -> flat file converters (or
can write them), I think we could focus on writing a converter that
will take a SeqRecord and give you a format specific Record object,
like:

    class SeqRecordConverter:
        def __init__(self, seq_record):
            pass
        def to_fasta(self):
            pass
        def to_genbank(self):
            pass
        def to_swissprot(self):
            pass

This could either go in the Bio/SeqRecord.py module or into something
like Bio/Tools/Converter.py, but I think it is better to separate
these functions away from the SeqRecord class itself: this would help
keep SeqRecord small, and would also allow you to use the
SeqRecordConverter with "SeqRecord-like" objects (i.e. you could code up
your own SeqRecord-like classes for specialized behavior or whatever).

So anyways, these are the ideas that have been mulling around in my
brain concerning this. What do people think? Other opinions on how to
implement this type of functionality?

Thanks Iddo and Sarah -- I'm really glad y'all are interested in
working on this!
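The two-step conversion proposed above (generic record -> format-specific record -> flat file) might look roughly like this in present-day Python. This is only a sketch under stated assumptions: `SimpleSeqRecord` and `FastaRecord` are hypothetical stand-ins written for this example, not Biopython's SeqRecord or Fasta.Record classes, and only `to_fasta` is filled in.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleSeqRecord:
    """Hypothetical stand-in for a generic annotated sequence record."""
    seq: str
    id: str = ""
    description: str = ""
    annotations: dict = field(default_factory=dict)


@dataclass
class FastaRecord:
    """Hypothetical format-specific record that knows how to print itself."""
    title: str
    sequence: str
    width: int = 60  # residues per output line

    def __str__(self):
        lines = [">" + self.title]
        for start in range(0, len(self.sequence), self.width):
            lines.append(self.sequence[start:start + self.width])
        return "\n".join(lines) + "\n"


class SeqRecordConverter:
    """Turn any SeqRecord-like object into format-specific records."""

    def __init__(self, seq_record):
        # Only .seq, .id and .description are used, so any object
        # providing those attributes will do.
        self._record = seq_record

    def to_fasta(self):
        parts = (self._record.id, self._record.description)
        title = " ".join(part for part in parts if part)
        return FastaRecord(title=title, sequence=str(self._record.seq))


record = SimpleSeqRecord("ACGT" * 20, id="X1", description="test sequence")
fasta_text = str(SeqRecordConverter(record).to_fasta())
print(fasta_text, end="")
```

Keeping the flat-file layout inside the format-specific class means the converter only has to decide which annotations map onto which fields; the line wrapping and header syntax live in one place per format.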
Brad

From katel at worldpath.net Fri May 25 04:02:25 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Kabat
References:
Message-ID: <001e01c0e4f1$09cc2940$010a0a0a@cadence.com>

Just committed the Kabat modules.

Cayte

From katel at worldpath.net Sat May 26 02:49:00 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] ASN.1
References: <001e01c0e4f1$09cc2940$010a0a0a@cadence.com>
Message-ID: <004501c0e5af$f321f3e0$010a0a0a@cadence.com>

How much is ASN.1 used? I just looked at a sequence in ASN.1. My first
impression is that it's close enough to xml that it should be xml (a
hierarchical structure), but I haven't delved into it. What does it offer
beyond xml?

Cayte

From jchang at SMI.Stanford.EDU Sat May 26 01:49:30 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] ASN.1
In-Reply-To: <004501c0e5af$f321f3e0$010a0a0a@cadence.com>
Message-ID:

It's a context-free grammar used for data at the NCBI. It was invented
before XML, but technically, should be able to handle equivalent data. I
don't know if it's used outside of NCBI -- I've never seen it.

Jeff

> From: "Cayte"
> Date: Fri, 25 May 2001 23:49:00 -0700
> To: "Cayte" ,
> Subject: [Biopython-dev] ASN.1
>
> How much is ASN.1 used? I just looked at a sequence in ASN.1. My first
> impression is that it's close enough to xml that it should be xml (a
> hierarchical structure), but I haven't delved into it. What does it offer
> beyond xml?
>
> Cayte

From dalke at acm.org Sat May 26 02:30:49 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] ASN.1
Message-ID: <08ed01c0e5ad$68b6fea0$0301a8c0@josiah>

Jeff:
>It's a context-free grammar used for data at the NCBI.
It was invented
>before XML, but technically, should be able to handle equivalent data. I
>don't know if it's used outside of NCBI -- I've never seen it.

I've seen it in two other places: exchange of crypto data and (this is
obscure :)) Steve Brenner's annmm, at
http://predict.sanger.ac.uk/th/papers/annmmpost/annmmpost.html

Years ago I tried using NCBI's toolkit to parse annmm for VMD and got it
to core dump, so I ended up writing an ugly parser in Perl which was
sufficient to handle only annmm. That task turned me off of both ASN.1
and the NCBI toolkit. No doubt the toolkit has improved since then but I
don't know anyone who has used it to tell me its status.

A google search returns lots of "ASN.1" hits. The first is
http://asn1.elibel.tm.fr/ . I came across references to ASN.1 being used
for SNMP, X.400 email, X.500 directory and Z39.50. In other words, big
gnarly international standards I don't use :)

Cayte:
> How much is ASN.1 used?

Outside of NCBI but still in the world of bioinformatics, it isn't.

> My first impression is that it's close enough to xml that it
> should be xml [...] What does it offer beyond xml?

I would argue that it's not the same as raw XML. It contains things
like "integer" or "choice" which are more like XML Schemas, "which offers
facilities for describing the structure and constraining the contents of
XML" (http://www.w3.org/TR/xmlschema-1/). Granted, XML Schemas are written
using XML, but that's like saying SWISS-PROT is written using ASCII.

I don't know enough about it to judge what it offers beyond XML. As with
Jeff, I think they are for the most part equivalent - and for what NCBI
uses it for, they are definitely equivalent, with the advantage going to
XML for having many more tools available for dealing with it.
Andrew From idoerg at cc.huji.ac.il Sat May 26 08:52:25 2001 From: idoerg at cc.huji.ac.il (Iddo Friedberg) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Output sequence files In-Reply-To: <15117.57208.36295.53219@taxus.athen1.ga.home.com> Message-ID: Hi, Well, I'm happy that I hit on something other people find necessary. Always good to know one is not alone :) On Fri, 25 May 2001, Brad Chapman wrote: Brad: : : I think Sarah is right on this. Seq/MutableSeq classes do not store : any useful annotations on the sequence (except the alphabet/type of : the sequence). Things should focus on SeqRecord, which has all of the : annotation stuff. : I concur. It's just that, as you said, SeqRecord should include a lot more stuff for good GenBank/SwissProt records. As it is, it seems to be good enough for FASTA format. But the big formats (anything not Fasta) are not really interconvertible, except maybe GenBank <--> EMBL. So maybe what we need is just the following: 1) {big formats} --> fasta converter 2) A writer for each of the formats ( e.g. SProt.Record.write(handle) ) 3) EMBL <--> GenBank, but that's pretty superfluous The problem arises from annotation. Do you think it's feasable to perform a good GenPept (that's the GenBank translation database) <--> SwissProt converter that will preserve everything? Or a PIR <--> SwissProt converter? I think that anyone seeking to preserve annotation, beyond the bare bones (organism, accession, maybe references, etc) would not want to use a converter anyhow. So the problem is basically downsized to having a writer for each record types. And for SeqRecord which will be a generic record, but could only be written out in Fasta. This way we don't get caught up in trying to create a monster data type which integrates all the information which the various formats like to preserve. (And I haven't even mentioned PDB annotation yet!) 
So maybe we just need a writer for each {database}.Record types, and a to_fasta converter and writer in Tools. Of course, we can beef up SeqRecord to have a bit more than bare-bones annotation capability, for functional reasons, not only for flat-file writing capabilities, but that's a different topic. Iddo -- Iddo Friedberg | Tel: +972-2-6758647 Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308 The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il POB 12272, Jerusalem 91120 | Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/ From jchang at SMI.Stanford.EDU Sat May 26 20:04:21 2001 From: jchang at SMI.Stanford.EDU (Jeffrey Chang) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Output sequence files In-Reply-To: Message-ID: > From: Iddo Friedberg > [...] > The problem arises from annotation. Do you think it's feasable to perform a > good GenPept (that's the GenBank translation database) <--> SwissProt > converter that will preserve everything? > The gold standard for preserving information, is if you can convert A to B back to A, and have it come out exactly the same. That'll probably be possible for a lot of records, but many of them will not work. For example, GenBank locations are much richer than SwissProt ones, so complex location semantics that SwissProt doesn't handle will be lost. > I think that anyone seeking to preserve annotation, beyond the bare bones > (organism, accession, maybe references, etc) would not want to use a converter > anyhow. > Yes, this is a reasonable assumption. > So the problem is basically downsized to having a writer for each record > types. And for SeqRecord which will be a generic record, but could only be > written out in Fasta. This way we don't get caught up in trying to create a > monster data type which integrates all the information which the various > formats like to preserve. (And I haven't even mentioned PDB annotation yet!) Yep, I agree. 
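The round-trip test described above (convert A to B and back to A, then compare) can be written as a small check. The sketch below uses toy data only: the dictionaries, `rich_to_plain` and `plain_to_rich` are invented for illustration and do not model the real GenBank or SwissProt location syntax. The "rich" format keeps every segment of a location, while the "plain" one can only store a single overall span.

```python
def round_trips(record, forward, backward):
    """Return True if forward followed by backward reproduces the record."""
    return backward(forward(record)) == record


def rich_to_plain(rich):
    """Toy converter: collapse all segments into one overall span."""
    starts = [start for start, _ in rich["segments"]]
    ends = [end for _, end in rich["segments"]]
    return {"name": rich["name"], "span": (min(starts), max(ends))}


def plain_to_rich(plain):
    """Toy converter: a span becomes a single-segment location."""
    return {"name": plain["name"], "segments": [plain["span"]]}


simple = {"name": "feat1", "segments": [(10, 50)]}
joined = {"name": "feat2", "segments": [(10, 20), (30, 50)]}

simple_ok = round_trips(simple, rich_to_plain, plain_to_rich)
joined_ok = round_trips(joined, rich_to_plain, plain_to_rich)

print("simple location survives:", simple_ok)
print("joined location survives:", joined_ok)
```

A check like this could be run over a whole file of records to count how many survive a given pair of converters, which gives a concrete measure of how lossy a conversion is before anyone relies on it.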
> So maybe we just need a writer for each {database}.Record types, and a > to_fasta converter and writer in Tools. Isn't this what the SeqIO directory is for? I had always hoped to get SeqIO functionality similar to bioperl's. Jeff From sarah at k-k.oz.au Sat May 26 20:32:11 2001 From: sarah at k-k.oz.au (Sarah Kummerfeld) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Output sequence files In-Reply-To: Message-ID: > Of course, we can beef up SeqRecord to have a bit more than bare-bones > annotation capability, for functional reasons, not only for flat-file > writing capabilities, but that's a different topic. Maybe we could subclass the SeqRecord for different types of formats (sort of like the alignment class is for clustalw....). Then a write method as you suggested would just print out the relevant format. A set of converters could also be provided separately (again just like for alignments). It doesn't solve the preservation of information, but at least it's a 'nice' design for the 'beefing up' of SeqRecord. Alternatively, SeqFeature could be subclassed for that purpose if you want to keep SeqRecord simpler. Sarah From idoerg at cc.huji.ac.il Sat May 26 23:44:57 2001 From: idoerg at cc.huji.ac.il (Iddo Friedberg) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] Output sequence files In-Reply-To: Message-ID: Hi, Iddo: : > So maybe we just need a writer for each {database}.Record types, and a : > to_fasta converter and writer in Tools. Jeff: : : Isn't this what the SeqIO directory is for?I had always hoped to get SeqIO : functionality similar to bioperl's. DAMN! I thought I was missing out on something... Did I also miss the existence of writers for {database}.Record types? Sorry about that. I'll look into SeqIO, and bioperl's one, see if I can learn something. Thanks for clearing this up. Iddo: : > The problem arises from annotation. 
Do you think it's feasible to perform a
: > good GenPept (that's the GenBank translation database) <--> SwissProt
: > converter that will preserve everything?
: >

Jeff:
: The gold standard for preserving information is if you can convert A to B
: back to A, and have it come out exactly the same. That'll probably be
: possible for a lot of records, but many of them will not work. For example,
: GenBank locations are much richer than SwissProt ones, so complex location
: semantics that SwissProt doesn't handle will be lost.
:

Actually, GenBank <--> SwissProt is probably the least convertible of the
kind. Many GenBank records hold the annotation for several CDSs, and
generally a GenBank sequence also holds untranslated regions, etc. GenPept,
the GenBank translation, is not much better: it holds coding information and
all sorts of stuff which is irrelevant to SwissProt. And vice versa.

Iddo

--
Iddo Friedberg                                  | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120                      |
Israel                                          |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/
320------------------------$54 LASER WRITER NT, 2NT----------------------------$54 LASER WRITER 12/640-----------------------------$79 CANON FAX (ON PAGE 9) LASERCLASS 4000 (FX3)---------------------------$59 LASERCLASS 5000,6000,7000 (FX2)-----------------$54 LASERFAX 5000,7000 (FX2)------------------------$54 LASERFAX 8500,9000 (FX4)------------------------$54 CANON COPIERS (PAGE 10) PC 3, 6RE, 7 AND 11 (A30)---------------------$69 PC 300,320,700,720 and 760 (E-40)-------------$89 IF YOUR CARTRIDGE IS NOT LISTED CALL CUSTOMER SERVICE AT 1-888-248-2015 90 DAY UNLIMITED WARRANTY INCLUDED ON ALL PRODUCTS. ALL TRADEMARKS AND BRAND NAMES LISTED ABOVE ARE PROPERTY OF THE RESPECTIVE HOLDERS AND USED FOR DESCRIPTIVE PURPOSES ONLY. From katel at worldpath.net Tue May 29 01:58:31 2001 From: katel at worldpath.net (Cayte) Date: Sat Mar 5 14:43:00 2005 Subject: [Biopython-dev] User friendly files References: Message-ID: <003401c0e804$64dd1ea0$010a0a0a@cadence.com> . I was thinking of ideas to minimize the need to edit, cut and paste files. One idea is to subclass File and add a read_record_line routine. This would contain a state machine. The SEARCHING_FOR_RECORD state would discard lines up to the start of record tag, and retiurn this line and advance to the RECORD_FOUND state. In the RECORD_FOUND state, read_record_line would call readline. At the end of record tag it would advance to the END_RECORD state. The next call would return an empty state and return to SEARCHING_FOR_RECORD. The state would always go to the END_OF_FILE when the code finished reading all the file contents. The state would go to an UNKNOWN state if the user forced the file pointer. I feel the easier our tools are the more likey they'll be used. Please share your ideas. 
Cayte

From jchang at SMI.Stanford.EDU Wed May 30 02:58:49 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] User friendly files
In-Reply-To: <003401c0e804$64dd1ea0$010a0a0a@cadence.com>
Message-ID:

I think this is a good idea, but the timing may be a little off. Would this
have applications beyond writing parsers? We're trying to encourage people
to start using Martel and RecordReader, which essentially does this too.

Jeff

> From: "Cayte"
> Date: Mon, 28 May 2001 22:58:31 -0700
> To:
> Subject: [Biopython-dev] User friendly files
>
> I was thinking of ideas to minimize the need to edit, cut and paste
> files. One idea is to subclass File and add a read_record_line routine.
> This would contain a state machine. The SEARCHING_FOR_RECORD state would
> discard lines up to the start-of-record tag, then return that line and
> advance to the RECORD_FOUND state. In the RECORD_FOUND state,
> read_record_line would call readline. At the end-of-record tag it would
> advance to the END_RECORD state. The next call would return an empty state
> and return to SEARCHING_FOR_RECORD. The state would always go to
> END_OF_FILE once the code had finished reading all of the file contents.
> The state would go to an UNKNOWN state if the user forced the file pointer.
>
> I feel the easier our tools are, the more likely they'll be used. Please
> share your ideas.
>
> Cayte

From katel at worldpath.net Thu May 31 03:06:52 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] User friendly files
References:
Message-ID: <001b01c0e9a0$467229e0$010a0a0a@cadence.com>

----- Original Message -----
From: "Jeffrey Chang"

> I think this is a good idea, but the timing may be a little off. Would this
> have applications beyond writing parsers? We're trying to encourage people
> to start using Martel and RecordReader, which essentially does this too.
>

It complements Martel. Martel doesn't strip boilerplate before, after or
between records. I discussed this with Andrew in an email exchange.

Cayte
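
The read_record_line state machine proposed earlier in this thread could be
sketched roughly as follows. This is only an illustration under several
assumptions, not the Biopython implementation: it wraps a plain text handle
instead of subclassing Biopython's File class, the names RecordLineReader,
read_records, start_tag and end_tag are hypothetical, "return an empty state"
is read here as "return an empty string", and the UNKNOWN state (for a file
pointer moved behind the reader's back) is left out. The example input uses
the GenBank convention that a record starts with LOCUS and ends with //.

```python
import io
from enum import Enum, auto


class State(Enum):
    SEARCHING_FOR_RECORD = auto()
    RECORD_FOUND = auto()
    END_RECORD = auto()
    END_OF_FILE = auto()


class RecordLineReader:
    """Wrap a text handle and return only the lines that belong to records."""

    def __init__(self, handle, start_tag, end_tag):
        self.handle = handle
        self.start_tag = start_tag  # hypothetical: prefix that opens a record
        self.end_tag = end_tag      # hypothetical: prefix that closes a record
        self.state = State.SEARCHING_FOR_RECORD

    def read_record_line(self):
        if self.state is State.END_OF_FILE:
            return ""
        if self.state is State.END_RECORD:
            # The call after the end-of-record line returns an empty string
            # and goes back to looking for the next record.
            self.state = State.SEARCHING_FOR_RECORD
            return ""
        if self.state is State.SEARCHING_FOR_RECORD:
            # Discard boilerplate until a start-of-record line appears.
            for line in iter(self.handle.readline, ""):
                if line.startswith(self.start_tag):
                    self.state = State.RECORD_FOUND
                    return line
            self.state = State.END_OF_FILE
            return ""
        # RECORD_FOUND: pass lines through until the end-of-record tag.
        line = self.handle.readline()
        if not line:
            self.state = State.END_OF_FILE
        elif line.startswith(self.end_tag):
            self.state = State.END_RECORD
        return line


def read_records(reader):
    """Collect each record's lines, skipping the text between records."""
    records, current = [], []
    while reader.state is not State.END_OF_FILE:
        line = reader.read_record_line()
        if line:
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:  # a record cut short by the end of the file
        records.append(current)
    return records


text = "junk\nLOCUS A\nseq a\n//\nmore junk\nLOCUS B\nseq b\n//\ntrailer\n"
reader = RecordLineReader(io.StringIO(text), "LOCUS", "//")
records = read_records(reader)
print(len(records), "records found")
```

With this input the leading "junk", the "more junk" between the records and
the final "trailer" line are all discarded, so a parser fed from
read_record_line only ever sees record lines, which is the boilerplate
stripping described above.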