From Y.Benita at pharm.uu.nl Fri Aug 1 02:49:12 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar 5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <3F298B68.4060200@burnham.org>
Message-ID: 

on 31/7/03 23:34, Iddo Friedberg at idoerg@burnham.org wrote:

> Hi,
>
> Does anybody have a program which calculates a sequence's Grand average
> of hydropathicity (GRAVY)?
>
> TIA,
>
> ./I
>

Here is one of my functions. I have a collection of many protein analysis
functions, maybe its time to put together a module.
Yair

# Kyte & Doolittle hydrophobicity index
kd = {'A':  1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C':  2.5,
      'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I':  4.5,
      'L':  3.8, 'K': -3.9, 'M':  1.9, 'F':  2.8, 'P': -1.6,
      'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V':  4.2}

# calculate the GRAVY score according to Kyte and Doolittle.
def Gravy(ProteinSequence):
    if ProteinSequence.islower():
        ProteinSequence = ProteinSequence.upper()
    ProtGravy = 0.0
    for i in ProteinSequence:
        ProtGravy += kd[i]
    return ProtGravy / len(ProteinSequence)

--
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

From dalke at dalkescientific.com Fri Aug 1 03:40:05 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: 
Message-ID: <5E12B6AE-C3F3-11D7-961C-000393C92466@dalkescientific.com>

Yair:
> Here is one of my functions. I have a collection of many protein
> analysis
> functions, maybe its time to put together a module.

It would be.

BTW, here's a way to make things go faster - make the dict include the
lowercase characters. This means you don't need to scan/convert the
sequence before acting on it.
# Kyte & Doolittle hydrophobicity index
kd = {'A':  1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C':  2.5,
      'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I':  4.5,
      'L':  3.8, 'K': -3.9, 'M':  1.9, 'F':  2.8, 'P': -1.6,
      'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V':  4.2}

# add in the lowercase characters
_full_kd = kd.copy()
_full_kd.update(dict([(k.lower(), v) for k, v in kd.items()]))

# calculate the GRAVY score according to Kyte and Doolittle.
def Gravy(ProteinSequence):
    _kd = _full_kd  # slightly faster performance with a local name lookup
    ProtGravy = 0.0
    for i in ProteinSequence:
        ProtGravy += _kd[i]
    return ProtGravy / len(ProteinSequence)

I don't think there's a faster way. Other tricks, like

    sum([kd[c] for c in s])

and

    sum(map(kd.__getitem__, s))

for the main loop are both slower because they build up the intermediate
list. I even played around with

    def iter_lookup(d, s):
        for c in s:
            yield d[c]

    sum(iter_lookup(_kd, ProteinSequence))

but at least for a short sequence it's also slower - perhaps because of
the '.next()' method call overhead?

Andrew
dalke@dalkescientific.com

From dalke at dalkescientific.com Fri Aug 1 05:04:56 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <5E12B6AE-C3F3-11D7-961C-000393C92466@dalkescientific.com>
Message-ID: <38832842-C3FF-11D7-961C-000393C92466@dalkescientific.com>

Me:
> # add in the lowercase characters
> _full_kd = kd.copy()
> _full_kd.update(dict([(k.lower(), v) for k, v in kd.items()]))

BTW, I've been experimenting with some of the new 2.3 features.
Unfortunately, I went overboard. The following is better:

_full_kd = {}
for k, v in kd.items():
    _full_kd[k] = _full_kd[k.lower()] = v

Andrew
dalke@dalkescientific.com

From Y.Benita at pharm.uu.nl Fri Aug 1 09:07:39 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar 5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <5E12B6AE-C3F3-11D7-961C-000393C92466@dalkescientific.com>
Message-ID: 

on 1/8/03 9:40, Andrew Dalke at dalke@dalkescientific.com wrote:

> Yair:
>> Here is one of my functions. I have a collection of many protein
>> analysis
>> functions, maybe its time to put together a module.
>
> Andrew:
> It would be.

Thanks for the feedback, Andrew. I already implemented your suggestions.
I recall we discussed this issue a few months ago. We talked about a
"Tools" module which would hold all analysis functions. The "Tools"
module evolved into something different and the analysis methods were
forgotten.

My own modules have two analysis classes, which are initialized with an
appropriate sequence object and have these methods:

class DnaAnalysis
    nucleotide content
    GC content
    Codon adaptation index

class ProteinAnalysis
    Amino acid content
    Molecular weight
    Aromaticity
    Calculate scale -> sliding window with any given dictionary.
    Instability index
    Flexibility
    isoelectric point
    Gravy
    secondary structure -> using Garnier

I guess there is some redundancy with existing modules. How should we
proceed with that? Maybe some of you prefer to separate DNA and Protein,
or even make separate functions instead of classes. I will clean up my
code and send it soon.

Yair
--
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

From jchang at smi.stanford.edu Fri Aug 1 13:07:02 2003
From: jchang at smi.stanford.edu (Jeffrey Chang)
Date: Sat Mar 5 14:43:24 2005
Subject: [Biopython-dev] Re: [BioPython] GRAVY index program anyone?
In-Reply-To: <38832842-C3FF-11D7-961C-000393C92466@dalkescientific.com>
Message-ID: <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>

On Friday, August 1, 2003, at 02:04 AM, Andrew Dalke wrote:

> BTW, I've been experimenting with some of the new 2.3 features.

What is everyone's feelings about Python 2.3 for Biopython?
I'd rather be conservative with the versioning (Biopython now requires
Python 2.2), so that people aren't required to upgrade their Python.
However, if there are compelling features in 2.3 that people need to
use, that might be a good reason to bump up the requirement for the
next release. Are there features that people need, and are using in
2.3? I haven't upgraded mine yet.

Jeff

From Yves.Bastide at irisa.fr Mon Aug 4 03:51:22 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sat Mar 5 14:43:24 2005
Subject: [Biopython-dev] Python 2.3 [Was Re: [BioPython] GRAVY index program anyone?]
In-Reply-To: <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
References: <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
Message-ID: <3F2E107A.4090809@irisa.fr>

Jeffrey Chang wrote:
> On Friday, August 1, 2003, at 02:04 AM, Andrew Dalke wrote:
>
>> BTW, I've been experimenting with some of the new 2.3 features.
>
> What is everyone's feelings about Python 2.3 for Biopython? I'd rather
> be conservative with the versioning (Biopython now requires Python 2.2),
> so that people aren't required to upgrade their Python. However, if
> there are compelling features in 2.3 that people need to use, that might
> be a good reason to bump up the requirement for the next release. Are
> there features that people need, and are using in 2.3? I haven't
> upgraded mine yet.

* enumerate: easy to duplicate.
* The dict() constructor: ditto.
* Universal newline support: nice, though Biopython should do it OK by now.
* Some nice modules: logging, csv, optparse. None useful for Biopython.

All in all, I think it's best to stick to Python 2.2 -- well, to
incrementally upgrade Biopython to Python 2 :)

> Jeff

yves

From dalke at dalkescientific.com Mon Aug 4 04:13:38 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:24 2005
Subject: [Biopython-dev] Python 2.3 [Was Re: [BioPython] GRAVY index program anyone?]
In-Reply-To: <3F2E107A.4090809@irisa.fr>
Message-ID: <8D5682C5-C653-11D7-961C-000393C92466@dalkescientific.com>

I don't think there's a strong reason to move to 2.3. Here are the most
relevant changes:

sum and enumerate builtins - we can get by with old-style code.

The module I like most is 'datetime'. At some point we should get our
database records to use this.

For the chemistry work I do, csv is nice, but surprisingly unused in
biology.

The bsddb and bz2 modules are nice for Mindy, but not essential.
(*sigh*, and I need to finish that off.)

The logging might be nice, but I'm just not a logging type of person.

I'm told that optparse is better than getopt, so we should move scripts
over to use the new API.

sets is potentially useful, but will require API changes. Eg, we could
return search results as a set rather than a list. This would allow us
to do intersections and unions pretty easily. But then we lose native
order.

socket timeout might be handy in a few cases. I don't think so though -
the socket should be passed in to objects rather than created
internally.

zipimport suggests a nice way to distribute biopython (excepting the C
extensions)

A lot of ugly slice code can be fixed up with the new slice object
methods.

Andrew

From Y.Benita at pharm.uu.nl Mon Aug 4 10:16:21 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Additions to SeqUtils
Message-ID: 

Hi All,
As promised a few days ago, I submit code to be added to the SeqUtils
module. The modules include:

Codon adaptation index -> for DNA sequence
Protein analysis methods such as isoelectric point, molecular weight
and more. Take a look.

You just have to change the import statement at the top to fit the
location you use for the module.

I would appreciate any comments or feedback.
Thanks,
Yair
--
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 184320 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20030804/084691a3/attachment.obj

From grouse at mail.utexas.edu Mon Aug 4 12:34:22 2003
From: grouse at mail.utexas.edu (Michael Hoffman)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Python 2.3
In-Reply-To: <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
References: <38832842-C3FF-11D7-961C-000393C92466@dalkescientific.com>
 <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
Message-ID: 

On Fri, 1 Aug 2003, Jeffrey Chang wrote:

> What is everyone's feelings about Python 2.3 for Biopython?

I agree with the previous posters--there are some nice features but it
isn't essential. However, I think the programmers might have a hard
time resisting using all those new features forever! ;-) We've already
seen examples of this. It might be a good idea to decide on a migration
schedule that will give the installed base time to migrate to Python
2.3 before we allow check-in of code. Maybe 1 August 2004? Don't worry,
it will be here before you realize it. :-)

As far as enumerate, I put it into Bio.GFF.GenericTools as soon as I
saw the PEP (indeed the code came straight out of the PEP), as are many
other non-biological utility classes/functions. Have fun!

Also, we have been using optik (now optparse) for all of our scripts
locally for some time. It has greatly increased maintainability and
documentation of our code. I highly recommend you check it out.

csv (available standalone for some time) is very useful for GFF stuff
of course. None of the GFF code I have checked in uses it though.
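[For reference, the 2.2-compatible fallback pattern being discussed can be sketched roughly as below. This is an illustration of the try/except-NameError idiom, not the actual code from PEP 279 or Bio.GFF.GenericTools:]

```python
# Sketch of a Python 2.2-compatible fallback for the enumerate()
# builtin that was added in Python 2.3 (see PEP 279).
# Illustration only -- not the exact Bio.GFF.GenericTools code.
try:
    enumerate  # use the builtin when it exists (2.3 and later)
except NameError:
    def enumerate(iterable):
        """Yield (index, item) pairs, like the 2.3 builtin."""
        index = 0
        for item in iterable:
            yield (index, item)
            index += 1

pairs = list(enumerate("GFF"))
# pairs == [(0, 'G'), (1, 'F'), (2, 'F')]
```

On 2.3 the `try` succeeds and the builtin is used unchanged; on 2.2 the
`NameError` branch defines a generator with the same behaviour.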
--
Michael Hoffman
The University of Texas at Austin

From Y.Benita at pharm.uu.nl Tue Aug 5 03:20:02 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Additions to SeqUtils
In-Reply-To: 
Message-ID: 

on 5/8/03 3:16, Mark Yeager at yeagerm@comcast.net wrote:

>> As promised a few days ago I submit code to be added to the SeqUtils module.
>> The modules include:
>> Codon adaptation index -> for DNA sequence
>> Protein analysis methods such as isoelectric point, molecular weight and
>> more. Take a look.
>
> Hello Yair- I am just starting out (one week now) with learning Python and
> BioPython for small bioinformatics utilities. I am an old Fortran programmer,
> so it is very new to me. I started to learn Perl and BioPerl, but I could
> never make useful sense of code examples and I decided to go with Python
> instead.
>
> To start with, I'd like to script adding additional info to flat file
> databases of proteins of interest. Your example of CAI would be a perfect
> starting point.
>
> Is there a small example program to orient me to actually do something
> useful - to specify an accession number, look up the sequence in a fasta
> file and then calculate the CAI? A working example I can play with that
> actually does something useful.
>
> I am continuing to read through the tutorials, but I have yet to make it to
> BioPython. There is probably something already there along these lines -
> perhaps you can point me to that?
>
> Thanks very much for your contributions, best regards,
>
> Mark Yeager

Hi Mark,
You can look in the test files for more examples. Here are a few lines
which can help you fetch a gene from GenBank and get the CAI.

from Bio.WWW import NCBI
from Bio import Fasta
import CodonUsage  # make sure you put the module on your python path

# fetch a gene from GenBank
aGene = NCBI.efetch('nucleotide', id='23113', seq_start=373,
                    seq_stop=1113, rettype='fasta')

# set up the fasta parser to read it.
parser = Fasta.RecordParser()
iterator = Fasta.Iterator(aGene, parser)
record = iterator.next()

# create an instance of CodonAdaptationIndex
aGeneCai = CodonUsage.CodonAdaptationIndex()

# print the gene in fasta format
print record

# print the CAI for the gene using the cai_for_gene method.
# Note that the default Sharp & Li E. coli index is used when
# you don't specify a different index.
# look in test_CodonUsage for an example of making your own index.
print "\nCodon adaptation index for the above gene: %.2f" % \
      aGeneCai.cai_for_gene(record.sequence)

--
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

From andreas.kuntzagk at mdc-berlin.de Tue Aug 5 03:33:33 2003
From: andreas.kuntzagk at mdc-berlin.de (Andreas Kuntzagk)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Python 2.3
In-Reply-To: 
References: <38832842-C3FF-11D7-961C-000393C92466@dalkescientific.com>
 <919ED7E4-C442-11D7-B81C-000A956845CE@smi.stanford.edu>
Message-ID: <1060068757.11841.58.camel@sulawesi>

On Mon, 2003-08-04 at 18:34, Michael Hoffman wrote:
> On Fri, 1 Aug 2003, Jeffrey Chang wrote:
>
> > What is everyone's feelings about Python 2.3 for Biopython?
>
> I agree with the previous posters--there are some nice features but it
> isn't essential.

I think the same.

> However, I think the programmers might have a hard
> time resisting using all those new features forever!

I can resist for some time. (At least until I find out, by accident,
that the next distribution I install comes with 2.3 preinstalled.)

> ;-) We've already
> seen examples of this. It might be a good idea to decide on a
> migration schedule that will give the installed base time to migrate
> to Python 2.3 before we allow check-in of code. Maybe 1 August 2004?
> Don't worry, it will be here before you realize it.
> :-)
>
> As far as enumerate, I put it into Bio.GFF.GenericTools as soon as I
> saw the PEP (indeed the code came straight out of the PEP), as are
> many other non-biological utility classes/functions. Have fun!

So we already depend on 2.3? Or can you use enumerate with 2.2?
The longer I think about enumerate, the more I think it is a useful
feature. Maybe we can change the migration date to September 2003 ;-)

Btw. I was thinking, is there a better way to use the Iterator classes
in the different modules? Up to now, I do it like this:

Parser = GenBank.RecordParser()
File = open("foo")
Iterator = GenBank.Iterator(File, parser=Parser)
while 1:
    record = Iterator.next()
    if not record: break
    ...

I think, nicer looking would be:

File = GenBank.Flatfile("foo", Parser)
for record in File:
    # work with record

Can I already do this? Maybe with the new parsers? (I still haven't
checked it out.)

Andreas

From dalke at dalkescientific.com Tue Aug 5 03:55:16 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Python 2.3
In-Reply-To: <1060068757.11841.58.camel@sulawesi>
Message-ID: <26B71C22-C71A-11D7-961C-000393C92466@dalkescientific.com>

Andreas Kuntzagk:
> So we already depend on 2.3? Or can you use enumerate with 2.2?

The PEP which introduces enumerate includes Python 2.2 compatible code
which implements the same functionality. Michael Hoffman added that
snippet to his code, so he could use it despite not supporting 2.3.

It is possible to have a 'compatibility' module which lets some
2.3-style code work on 2.2 Pythons. For example,

    from Bio.compatibility import enumerate

This would be implemented as

    try:
        enumerate
    except NameError:
        def enumerate ...

I've seen similar code used (rarely) in other modules. I'm not
enthusiastic one way or the other, but I don't see any deep problem
with it.

> Btw. I was thinking, is there a better way to use the Iterator-classes
*sigh* Last time I worked on the parser code we still had 2.1
compatibility, so I have lots of tricky, ugly code to emulate iterators
through __getitem__. I need to clean that up. (And what I really want
is someone to pay me for it. :) Sadly, not happening soon.

Andrew
dalke@dalkescientific.com

From thomas at cbs.dtu.dk Tue Aug 5 07:46:10 2003
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Additions to SeqUtils
In-Reply-To: <6FCDAE95-C712-11D7-BC7F-000A956845CE@jeffchang.com>
References: <6FCDAE95-C712-11D7-BC7F-000A956845CE@jeffchang.com>
Message-ID: 

Jeffrey Chang writes:

> Hej Thomas,
>
> How are you doing? Could you take a look at this and submit it if it
> looks OK to you?

Hhmm .. I am currently moving from Uppsala to Malmö; all the essential
stuff (computers, coffee machine, mp3's) is in the moving company
truck. Arghhhh - I feel so lonely without my computers!!!! :-)

I can look into that after restoring my original chaos, just send me a
reminder email next week,

cheers
-thomas

> Jeff
>
> On Monday, August 4, 2003, at 07:16 AM, Yair Benita wrote:
>
> > Hi All,
> > As promised a few days ago I submit code to be added to the SeqUtils
> > module.
> > The modules include:
> > Codon adaptation index -> for DNA sequence
> > Protein analysis methods such as isoelectric point, molecular weight
> > and
> > more. Take a look.
> >
> > You just have to change the import statement at the top to fit the
> > location
> > you use for the module.
> >
> > I would appreciate any comments or feedback.
> > Thanks,
> > Yair
> > --
> > Yair Benita
> > Pharmaceutical Proteomics
> > Faculty of Pharmacy
> > Utrecht University
> >
> > _______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev@biopython.org
> > http://biopython.org/mailman/listinfo/biopython-dev
>

--
Sicheritz-Ponten Thomas, Ph.D, thomas@biopython.org
(  Center for Biological Sequence Analysis
 \ BioCentrum-DTU, Technical University of Denmark
 ) CBS: +45 45 252485          Building 208, DK-2800 Lyngby
##-----> Fax: +45 45 931585    http://www.cbs.dtu.dk/thomas
 ) / ... damn arrow eating trees ... (

From bugzilla-daemon at portal.open-bio.org Tue Aug 5 11:48:57 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] [Bug 1409] Problems with Martel and setup.py
Message-ID: <200308051548.h75Fmv26001554@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1409

------- Additional Comments From dvorak@mcs.anl.gov  2003-08-05 11:48 -------
Hello:

When I try to install, I am having similar problems. Please see the
last line of output below.

[dvorak@jlogin2 biopython-1.21]$ python setup.py install --prefix=/soft/apps/packages/python-biopython-1.21
running install

*** Martel *** is either not installed or out of date.
This package is required for many Biopython features. Please install
it before you install Biopython. You can find Martel at
http://www.biopython.org/~dalke/Martel/.
Do you want to continue this installation? (y/N) y

*** Reportlab *** is either not installed or out of date.
This package is optional, which means it is only used in a few
specialized modules in Biopython. You probably don't need this if you
are unsure. You can ignore this requirement, and install it later if
you see ImportErrors. You can find Reportlab at
http://www.reportlab.com/download.html.
Do you want to continue this installation?
(Y/n) y
running build
running build_py
Traceback (most recent call last):
  File "setup.py", line 387, in ?
    ext_modules=EXTENSIONS,
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/core.py", line 138, in setup
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/dist.py", line 893, in run_commands
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/dist.py", line 913, in run_command
  File "setup.py", line 137, in run
    install.run(self)
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/command/install.py", line 491, in run
  File "/usr/lib/python2.2/cmd.py", line 330, in run_command
    print "\n"
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/dist.py", line 913, in run_command
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/command/build.py", line 107, in run
  File "/usr/lib/python2.2/cmd.py", line 330, in run_command
    print "\n"
  File "/var/tmp/python-2.2.1-root/usr/lib/python2.2/distutils/dist.py", line 913, in run_command
  File "setup.py", line 144, in run
    if not is_Martel_installed():
  File "setup.py", line 190, in is_Martel_installed
    m = can_import("Martel")
  File "setup.py", line 179, in can_import
    return __import__(module_name)
  File "Martel/__init__.py", line 78, in ?
    NoCase = Expression.NoCase
AttributeError: 'module' object has no attribute 'NoCase'

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jchang at smi.stanford.edu Tue Aug 5 13:14:04 2003
From: jchang at smi.stanford.edu (Jeffrey Chang)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Python 2.3
In-Reply-To: <1060068757.11841.58.camel@sulawesi>
Message-ID: <36C93FA0-C768-11D7-8FEF-000A956845CE@smi.stanford.edu>

On Tuesday, August 5, 2003, at 12:32 AM, Andreas Kuntzagk wrote:

> Btw. I was thinking, is there a better way to use the Iterator-classes
> in the different Modules?
>
> Up to now, I do it like this:
> Parser = GenBank.RecordParser()
> File = open("foo")
> Iterator = GenBank.Iterator(File, parser=Parser)
> while 1:
>     record = Iterator.next()
>     if not record: break
>     ...
>
> I think, nicer looking would be:
>
> File = GenBank.Flatfile("foo", Parser)
> for record in File:
>     # work with record

Yes, that is much nicer. We can do that, now that we require Python 2.2.
However, the code usually lags behind somewhat, because people are
reluctant to change working code (and update the documentation) and
break people's code. I did go back and add iterator support to the
Medline code, which introduced some bugs that are now fixed in the CVS.
:( But yes, I think using iterators is cleaner, and it should be done
at some point.

Jeff

From letondal at pasteur.fr Tue Aug 5 14:23:05 2003
From: letondal at pasteur.fr (Catherine Letondal)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Pise url change
Message-ID: <200308051823.h75IN5BX114747@electre.pasteur.fr>

Hi,

The url of the Pise software that is indicated in:
http://www.biopython.org/scriptcentral/
(http://www-alt.pasteur.fr/~letondal/Pise/#pisepython)
is now invalid.

Would it be possible to change it to the new location:
http://www.pasteur.fr/recherche/unites/sis/Pise/#pisepython

Thanks a lot in advance,

--
Catherine Letondal -- Pasteur Institute Computing Center

From chapmanb at uga.edu Tue Aug 5 14:40:19 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Pise url change
In-Reply-To: <200308051823.h75IN5BX114747@electre.pasteur.fr>
References: <200308051823.h75IN5BX114747@electre.pasteur.fr>
Message-ID: <20030805184018.GB86377@evostick.agtec.uga.edu>

Hi Catherine;

> The url of the Pise software that is indicated in:
> http://www.biopython.org/scriptcentral/
> (http://www-alt.pasteur.fr/~letondal/Pise/#pisepython)
> is now invalid.
>
> Would it be possible to change it to the new location:
> http://www.pasteur.fr/recherche/unites/sis/Pise/#pisepython

No problem. All done.

By the way, you can edit these pages yourself (which I hoped would make
things easier and keep things up to date). The username and password
are at the bottom of:

http://biopython.org/docs/developer/website_technical.html

Same info and password for both the Participants and the Script Central
pages.

Brad

From bugzilla-daemon at portal.open-bio.org Tue Aug 5 20:19:40 2003
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] [Bug 1409] Problems with Martel and setup.py
Message-ID: <200308060019.h760JecW003032@localhost.localdomain>

http://bugzilla.bioperl.org/show_bug.cgi?id=1409

jchang@biopython.org changed:

           What        |Removed     |Added
----------------------------------------------------------------------------
           Status      |NEW         |ASSIGNED

------- Additional Comments From jchang@biopython.org  2003-08-05 20:19 -------
This can happen if mx.TextTools is not already installed before
building Biopython. If this is the case, then "import Martel" fails
because of the mx.TextTools dependency. However, the "Expressions"
module is still left in the namespace. This will be tricky to fix. The
work-around is to install mx.TextTools first.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From Y.Benita at pharm.uu.nl Wed Aug 6 02:33:55 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Updates to protein weights
Message-ID: 

The protein weights in Bio -> Data -> IUPACData are not accurate
enough. For large proteins I get an error which is unacceptable.
Can someone with access please update them:

protein_weights = {
    "A": 89.09,
    "C": 121.16,
    "D": 133.10,
    "E": 147.13,
    "F": 165.19,
    "G": 75.07,
    "H": 155.16,
    "I": 131.18,
    "K": 146.19,
    "L": 131.18,
    "M": 149.21,
    "N": 132.12,
    "P": 115.13,
    "Q": 146.15,
    "R": 174.20,
    "S": 105.09,
    "T": 119.12,
    "V": 117.15,
    "W": 204.23,
    "Y": 181.19}

--
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

From Y.Benita at pharm.uu.nl Wed Aug 6 03:04:55 2003
From: Y.Benita at pharm.uu.nl (Yair Benita)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Additions to SeqUtils
In-Reply-To: 
Message-ID: 

Here is an update to the molecular weight functions in ProtParam:

# Calculate MW from Protein sequence
def molecular_weight(self):
    # make local dictionary for speed
    MwDict = {}
    # remove a molecule of water from the amino acid weights.
    for i in IUPACData.protein_weights.keys():
        MwDict[i] = IUPACData.protein_weights[i] - 18.02
    MW = 18.02  # add just one water molecule for the whole sequence.
    for i in self.sequence:
        MW += MwDict[i]
    return MW

on 4/8/03 16:16, Yair Benita at Y.Benita@pharm.uu.nl wrote:

> Hi All,
> As promised a few days ago I submit code to be added to the SeqUtils module.
> The modules include:
> Codon adaptation index -> for DNA sequence
> Protein analysis methods such as isoelectric point, molecular weight and
> more. Take a look.
>
> You just have to change the import statement at the top to fit the location
> you use for the module.
>
> I would appreciate any comments or feedback.
> Thanks,
> Yair

--
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

From mark_yeager at merck.com Wed Aug 6 09:46:57 2003
From: mark_yeager at merck.com (Yeager, Mark D)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Updates to protein weights
Message-ID: <70A1E2C0DD86D511B81500508BB23F1F07066C6D@uswpmx00.merck.com>

I am very new to Python and Biopython, so I may be making some
irrelevant suggestions...
For those doing mass spectroscopy (proteomics), it might be beneficial
to include monoisotopic as well as average MW for the amino acids, and
bookkeeping of the additional H2O for a polypeptide could be done at
the end of the calculation. For example, here are a couple of amino
acids without H2O:

        Monoiso.     Average
A       71.03711     71.0788
R      156.10111    156.1875

monoiso H2O = 18.01057
average H2O = 18.0152

(I'm not sure about the accuracy of average H2O -- have to dig into
IUPAC.)

With the spreading use of highly accurate FTICR (Fourier transform ion
cyclotron resonance) mass spec, this might be a good thing to have in
place sooner rather than later.

FYI: Carbon is a mix of 12C and 13C isotopes (and 14C, which we neglect
here), with natural abundances of 98.9% and 1.1%. By definition
12C = 12.00000, and all other isotope masses refer to this standard.
13C = 13.00335. 1H = 1.00783. 16O is 15.99491.

-- Mark

-----Original Message-----
From: Yair Benita [mailto:Y.Benita@pharm.uu.nl]
Sent: Wednesday, August 06, 2003 2:34 AM
To: biopython-dev@biopython.org
Subject: [Biopython-dev] Updates to protein weights

The protein weights in Bio -> Data -> IUPACData are not accurate
enough. For large proteins I get an error which is unacceptable.
Can someone with access please update them:

protein_weights = {
    "A": 89.09,
    "C": 121.16,
    "D": 133.10,
    "E": 147.13,
    "F": 165.19,
    "G": 75.07,
    "H": 155.16,
    "I": 131.18,
    "K": 146.19,
    "L": 131.18,
    "M": 149.21,
    "N": 132.12,
    "P": 115.13,
    "Q": 146.15,
    "R": 174.20,
    "S": 105.09,
    "T": 119.12,
    "V": 117.15,
    "W": 204.23,
    "Y": 181.19}

--
Yair Benita
Pharmaceutical Proteomics
Faculty of Pharmacy
Utrecht University

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc.
(Whitehouse Station, New Jersey, USA), and/or its affiliates (which may
be known outside the United States as Merck Frosst, Merck Sharp & Dohme
or MSD) that may be confidential, proprietary copyrighted and/or
legally privileged, and is intended solely for the use of the
individual or entity named on this message. If you are not the intended
recipient, and have received this message in error, please immediately
return this by e-mail and then delete it.
------------------------------------------------------------------------------

From jchang at smi.stanford.edu Wed Aug 6 11:39:40 2003
From: jchang at smi.stanford.edu (Jeffrey Chang)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] Updates to protein weights
In-Reply-To: 
Message-ID: <31B521C2-C824-11D7-A6BC-000A956845CE@smi.stanford.edu>

I have committed these to the CVS.

Jeff

On Tuesday, August 5, 2003, at 11:33 PM, Yair Benita wrote:

> The protein weights in Bio-> Data -> IUPACData are not accurate
> enough. For
> large proteins I get an error which is unacceptable.
> Can someone with
> access
> please update them:
>
> protein_weights = {
> "A": 89.09,
> "C": 121.16,
> "D": 133.10,
> "E": 147.13,
> "F": 165.19,
> "G": 75.07,
> "H": 155.16,
> "I": 131.18,
> "K": 146.19,
> "L": 131.18,
> "M": 149.21,
> "N": 132.12,
> "P": 115.13,
> "Q": 146.15,
> "R": 174.20,
> "S": 105.09,
> "T": 119.12,
> "V": 117.15,
> "W": 204.23,
> "Y": 181.19}
>
> --
> Yair Benita
> Pharmaceutical Proteomics
> Faculty of Pharmacy
> Utrecht University
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev

From Yves.Bastide at irisa.fr Fri Aug 8 06:09:03 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sat Mar 5 14:43:25 2005
Subject: [Biopython-dev] [patch] NCBIStandalone.py cleanup
Message-ID: <3F3376BF.2070304@irisa.fr>

Hi,

Here's a patch bringing Bio.Blast.NCBIStandalone nearer to Python 3
compatibility :-)

Changelog:

* Don't use the string module
* use .find() == -1, not .find() <= 0
* use .startswith() and .endswith()

Regards,

yves

-------------- next part --------------
Index: NCBIStandalone.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Blast/NCBIStandalone.py,v
retrieving revision 1.47
diff -u -p -r1.47 NCBIStandalone.py
--- NCBIStandalone.py	2003/06/09 02:12:09	1.47
+++ NCBIStandalone.py	2003/08/08 10:04:35
@@ -34,9 +34,7 @@ blastpgp       Execute blastpgp.
""" import os -import string import re -from types import * from Bio import File from Bio.ParserSupport import * @@ -145,9 +143,9 @@ class _Scanner: while 1: line = safe_peekline(uhandle) - if line[:9] != 'Searching' and \ - re.search(r"Score +E", line) is None and \ - string.find(line, 'No hits found') < 0: + if (not line.startswith('Searching') and + re.search(r"Score +E", line) is None and + line.find('No hits found') == -1): break self._scan_descriptions(uhandle, consumer) @@ -197,7 +195,7 @@ class _Scanner: # Check for these error lines and ignore them for now. Let # the BlastErrorParser deal with them. line = uhandle.peekline() - if line.find("ERROR:") >= 0 or line.startswith("done"): + if line.find("ERROR:") != -1 or line.startswith("done"): read_and_call_while(uhandle, consumer.noevent, contains="ERROR:") read_and_call(uhandle, consumer.noevent, start="done") @@ -256,7 +254,7 @@ class _Scanner: # Read the descriptions and the following blank lines, making # sure that there are descriptions. - if uhandle.peekline()[:19] != 'Sequences not found': + if not uhandle.peekline().startswith('Sequences not found'): read_and_call_until(uhandle, consumer.description, blank=1) read_and_call_while(uhandle, consumer.noevent, blank=1) @@ -269,7 +267,7 @@ class _Scanner: # Read the descriptions and the following blank lines. read_and_call_while(uhandle, consumer.noevent, blank=1) l = safe_peekline(uhandle) - if l[:9] != 'CONVERGED' and l[0] != '>': + if not l.startswith('CONVERGED') and l[0] != '>': read_and_call_until(uhandle, consumer.description, blank=1) read_and_call_while(uhandle, consumer.noevent, blank=1) @@ -281,7 +279,7 @@ class _Scanner: def _scan_alignments(self, uhandle, consumer): # First, check to see if I'm at the database report. 
line = safe_peekline(uhandle) - if line[:10] == ' Database': + if line.startswith(' Database'): return elif line[0] == '>': # XXX make a better check here between pairwise and masterslave @@ -305,7 +303,7 @@ class _Scanner: # Scan a bunch of score/alignment pairs. while 1: line = safe_peekline(uhandle) - if line[:6] != ' Score': + if not line.startswith(' Score'): break self._scan_hsp(uhandle, consumer) consumer.end_alignment() @@ -318,7 +316,7 @@ class _Scanner: read_and_call(uhandle, consumer.title, start='>') while 1: line = safe_readline(uhandle) - if string.lstrip(line)[:8] == 'Length =': + if line.lstrip().startswith('Length ='): consumer.length(line) break elif is_blank_line(line): @@ -372,7 +370,7 @@ class _Scanner: read_and_call_while(uhandle, consumer.noevent, blank=1) line = safe_peekline(uhandle) # Alignment continues if I see a 'Query' or the spaces for Blastn. - if line[:5] != 'Query' and line[:5] != ' ': + if not (line.startswith('Query') or line.startswith(' ')): break def _scan_masterslave_alignment(self, uhandle, consumer): @@ -382,10 +380,10 @@ class _Scanner: # Check to see whether I'm finished reading the alignment. 
# This is indicated by 1) database section, 2) next psi-blast round # patch by chapmanb - if line[:9] == 'Searching': + if line.startswith('Searching'): uhandle.saveline(line) break - elif line[:10] == ' Database': + elif line.startswith(' Database'): uhandle.saveline(line) break elif is_blank_line(line): @@ -423,7 +421,7 @@ class _Scanner: line = safe_readline(uhandle) uhandle.saveline(line) - if string.find(line, 'Lambda') >= 0: + if line.find('Lambda') != -1: break read_and_call(uhandle, consumer.noevent, start='Lambda') @@ -577,22 +575,22 @@ class _HeaderConsumer: self._header = Record.Header() def version(self, line): - c = string.split(line) + c = line.split() self._header.application = c[0] self._header.version = c[1] self._header.date = c[2][1:-1] def reference(self, line): - if line[:11] == 'Reference: ': + if line.startswith('Reference: '): self._header.reference = line[11:] else: self._header.reference = self._header.reference + line def query_info(self, line): - if line[:7] == 'Query= ': + if line.startswith('Query= '): self._header.query = line[7:] - elif line[:7] != ' ': # continuation of query_info - self._header.query = self._header.query + line + elif not line.startswith(' '): # continuation of query_info + self._header.query = "%s%s" % (self._header.query, line) else: letters, = _re_search( r"([0-9,]+) letters", line, @@ -600,11 +598,11 @@ class _HeaderConsumer: self._header.query_letters = _safe_int(letters) def database_info(self, line): - line = string.rstrip(line) - if line[:10] == 'Database: ': + line = line.rstrip() + if line.startswith('Database: '): self._header.database = line[10:] - elif not line[-13:] == 'total letters': - self._header.database = self._header.database + string.strip(line) + elif not line.endswith('total letters'): + self._header.database = self._header.database + line.strip() else: sequences, letters =_re_search( r"([0-9,]+) sequences; ([0-9,]+) total letters", line, @@ -614,8 +612,8 @@ class _HeaderConsumer: def 
end_header(self): # Get rid of the trailing newlines - self._header.reference = string.rstrip(self._header.reference) - self._header.query = string.rstrip(self._header.query) + self._header.reference = self._header.reference.rstrip() + self._header.query = self._header.query.rstrip() class _DescriptionConsumer: def start_descriptions(self): @@ -629,8 +627,8 @@ class _DescriptionConsumer: self.__has_n = 0 # Does the description line contain an N value? def description_header(self, line): - if line[:19] == 'Sequences producing': - cols = string.split(line) + if line.startswith('Sequences producing'): + cols = line.split() if cols[-1] == 'N': self.__has_n = 1 @@ -656,9 +654,9 @@ class _DescriptionConsumer: pass def round(self, line): - if line[:18] != 'Results from round': + if not line.startswith('Results from round'): raise SyntaxError, "I didn't understand the round line\n%s" % line - self._roundnum = _safe_int(string.strip(line[18:])) + self._roundnum = _safe_int(line[18:]) def end_descriptions(self): pass @@ -674,23 +672,23 @@ class _DescriptionConsumer: # - title must be preserved exactly (including whitespaces) # - score could be equal to e-value (not likely, but what if??) # - sometimes there's an "N" score of '1'. 
- cols = string.split(line) + cols = line.split() if len(cols) < 3: raise SyntaxError, \ "Line does not appear to contain description:\n%s" % line if self.__has_n: - i = string.rfind(line, cols[-1]) # find start of N - i = string.rfind(line, cols[-2], 0, i) # find start of p-value - i = string.rfind(line, cols[-3], 0, i) # find start of score + i = line.rfind(cols[-1]) # find start of N + i = line.rfind(cols[-2], 0, i) # find start of p-value + i = line.rfind(cols[-3], 0, i) # find start of score else: - i = string.rfind(line, cols[-1]) # find start of p-value - i = string.rfind(line, cols[-2], 0, i) # find start of score + i = line.rfind(cols[-1]) # find start of p-value + i = line.rfind(cols[-2], 0, i) # find start of score if self.__has_n: dh.title, dh.score, dh.e, dh.num_alignments = \ - string.rstrip(line[:i]), cols[-3], cols[-2], cols[-1] + line[:i].rstrip(), cols[-3], cols[-2], cols[-1] else: dh.title, dh.score, dh.e, dh.num_alignments = \ - string.rstrip(line[:i]), cols[-2], cols[-1], 1 + line[:i].rstrip(), cols[-2], cols[-1], 1 dh.num_alignments = _safe_int(dh.num_alignments) dh.score = _safe_int(dh.score) dh.e = _safe_float(dh.e) @@ -706,52 +704,52 @@ class _AlignmentConsumer: self._multiple_alignment = Record.MultipleAlignment() def title(self, line): - self._alignment.title = self._alignment.title + string.lstrip(line) + self._alignment.title = "%s%s" % (self._alignment.title, + line.lstrip()) def length(self, line): - self._alignment.length = string.split(line)[2] + self._alignment.length = line.split()[2] self._alignment.length = _safe_int(self._alignment.length) def multalign(self, line): # Standalone version uses 'QUERY', while WWW version uses blast_tmp. - if line[:5] == 'QUERY' or line[:9] == 'blast_tmp': + if line.startswith('QUERY') or line.startswith('blast_tmp'): # If this is the first line of the multiple alignment, # then I need to figure out how the line is formatted. 
# Format of line is: # QUERY 1 acttg...gccagaggtggtttattcagtctccataagagaggggacaaacg 60 try: - name, start, seq, end = string.split(line) + name, start, seq, end = line.split() except ValueError: raise SyntaxError, "I do not understand the line\n%s" \ % line - self._start_index = string.index(line, start, len(name)) - self._seq_index = string.index(line, seq, - self._start_index+len(start)) + self._start_index = line.index(start, len(name)) + self._seq_index = line.index(seq, + self._start_index+len(start)) # subtract 1 for the space self._name_length = self._start_index - 1 self._start_length = self._seq_index - self._start_index - 1 - self._seq_length = string.rfind(line, end) - self._seq_index - 1 + self._seq_length = line.rfind(end) - self._seq_index - 1 - #self._seq_index = string.index(line, seq) + #self._seq_index = line.index(seq) ## subtract 1 for the space - #self._seq_length = string.rfind(line, end) - self._seq_index - 1 - #self._start_index = string.index(line, start) + #self._seq_length = line.rfind(end) - self._seq_index - 1 + #self._start_index = line.index(start) #self._start_length = self._seq_index - self._start_index - 1 #self._name_length = self._start_index # Extract the information from the line - name = string.rstrip(line[:self._name_length]) - start = string.rstrip( - line[self._start_index:self._start_index+self._start_length]) + name = line[:self._name_length] + name = name.rstrip() + start = line[self._start_index:self._start_index+self._start_length] + start = start.rstrip() if start: start = _safe_int(start) - end = string.rstrip( - line[self._seq_index+self._seq_length:]) + end = line[self._seq_index+self._seq_length:].rstrip() if end: end = _safe_int(end) - seq = string.rstrip( - line[self._seq_index:self._seq_index+self._seq_length]) + seq = line[self._seq_index:self._seq_index+self._seq_length].rstrip() # right pad the sequence with spaces if necessary if len(seq) < self._seq_length: seq = seq + ' '*(self._seq_length-len(seq)) @@ 
-826,7 +824,7 @@ class _AlignmentConsumer: def end_alignment(self): # Remove trailing newlines if self._alignment: - self._alignment.title = string.rstrip(self._alignment.title) + self._alignment.title = self._alignment.title.rstrip() # This code is also obsolete. See note above. # If there's a multiple alignment, I will need to make sure @@ -883,16 +881,16 @@ class _HSPConsumer: "I could not find the identities in line\n%s" % line) self._hsp.identities = _safe_int(x), _safe_int(y) - if string.find(line, 'Positives') >= 0: + if line.find('Positives') != -1: x, y = _re_search( r"Positives = (\d+)\/(\d+)", line, "I could not find the positives in line\n%s" % line) self._hsp.positives = _safe_int(x), _safe_int(y) - if string.find(line, 'Gaps') >= 0: + if line.find('Gaps') != -1: x, y = _re_search( r"Gaps = (\d+)\/(\d+)", line, - "I could not find the positives in line\n%s" % line) + "I could not find the gaps in line\n%s" % line) self._hsp.gaps = _safe_int(x), _safe_int(y) @@ -905,7 +903,7 @@ class _HSPConsumer: # Frame can be in formats: # Frame = +1 # Frame = +2 / +2 - if string.find(line, '/') >= 0: + if line.find('/') != -1: self._hsp.frame = _re_search( r"Frame = ([-+][123]) / ([-+][123])", line, "I could not find the frame in line\n%s" % line) @@ -931,7 +929,7 @@ class _HSPConsumer: self._query_len = len(seq) def align(self, line): - seq = string.rstrip(line[self._query_start_index:]) + seq = line[self._query_start_index:].rstrip() if len(seq) < self._query_len: # Make sure the alignment is the same length as the query seq = seq + ' ' * (self._query_len-len(seq)) @@ -948,7 +946,7 @@ class _HSPConsumer: #On occasion, there is a blast hit with no subject match #so far, it only occurs with 1-line short "matches" #I have decided to let these pass as they appear - if not string.strip(seq): + if not seq.strip(): seq = ' ' * self._query_len self._hsp.sbjct = self._hsp.sbjct + seq if self._hsp.sbjct_start is None: @@ -976,8 +974,8 @@ class _DatabaseReportConsumer: 
self._dr.database_name.append(m.group(1)) elif self._dr.database_name: # This must be a continuation of the previous name. - x = self._dr.database_name[-1] + string.strip(line) - self._dr.database_name[-1] = x + self._dr.database_name[-1] = "%s%s" % (self._dr.database_name[-1], + line.strip()) def posted_date(self, line): self._dr.posted_date.append(_re_search( @@ -995,14 +993,14 @@ class _DatabaseReportConsumer: self._dr.num_sequences_in_database.append(_safe_int(sequences)) def ka_params(self, line): - x = string.split(line) + x = line.split() self._dr.ka_params = map(_safe_float, x) def gapped(self, line): self._dr.gapped = 1 def ka_params_gap(self, line): - x = string.split(line) + x = line.split() self._dr.ka_params_gap = map(_safe_float, x) def end_database_report(self): @@ -1013,7 +1011,7 @@ class _ParametersConsumer: self._params = Record.Parameters() def matrix(self, line): - self._params.matrix = string.rstrip(line[8:]) + self._params.matrix = line[8:].rstrip() def gap_penalties(self, line): x = _get_cols( @@ -1021,7 +1019,7 @@ class _ParametersConsumer: self._params.gap_penalties = map(_safe_float, x) def num_hits(self, line): - if string.find(line, '1st pass') >= 0: + if line.find('1st pass') != -1: x, = _get_cols(line, (-4,), ncols=11, expected={2:"Hits"}) self._params.num_hits = _safe_int(x) else: @@ -1029,7 +1027,7 @@ class _ParametersConsumer: self._params.num_hits = _safe_int(x) def num_sequences(self, line): - if string.find(line, '1st pass') >= 0: + if line.find('1st pass') != -1: x, = _get_cols(line, (-4,), ncols=9, expected={2:"Sequences:"}) self._params.num_sequences = _safe_int(x) else: @@ -1037,7 +1035,7 @@ class _ParametersConsumer: self._params.num_sequences = _safe_int(x) def num_extends(self, line): - if string.find(line, '1st pass') >= 0: + if line.find('1st pass') != -1: x, = _get_cols(line, (-4,), ncols=9, expected={2:"extensions:"}) self._params.num_extends = _safe_int(x) else: @@ -1045,7 +1043,7 @@ class _ParametersConsumer: 
self._params.num_extends = _safe_int(x) def num_good_extends(self, line): - if string.find(line, '1st pass') >= 0: + if line.find('1st pass') != -1: x, = _get_cols(line, (-4,), ncols=10, expected={3:"extensions:"}) self._params.num_good_extends = _safe_int(x) else: @@ -1297,8 +1295,13 @@ class Iterator: If set to None, then the raw contents of the file will be returned. """ - if type(handle) is not FileType and type(handle) is not InstanceType: - raise ValueError, "I expected a file handle or file-like object" + try: + dummy = handle.readline + except AttributeError: + raise ValueError( + "I expected a file handle or file-like object, got %s" + % type(handle)) + del dummy self._uhandle = File.UndoHandle(handle) self._parser = parser @@ -1315,7 +1318,8 @@ class Iterator: if not line: break # If I've reached the next one, then put the line back and stop. - if lines and (line[:5] == 'BLAST' or line[1:6] == 'BLAST'): + if lines and (line.startswith('BLAST') + or line.startswith('BLAST', 1)): self._uhandle.saveline(line) break lines.append(line) @@ -1323,7 +1327,7 @@ if not lines: return None - data = string.join(lines, '') + data = ''.join(lines) if self._parser is not None: return self._parser.parse(File.StringHandle(data)) return data @@ -1559,7 +1563,7 @@ def _re_search(regex, line, error_msg): return m.groups() def _get_cols(line, cols_to_get, ncols=None, expected={}): - cols = string.split(line) + cols = line.split() # Check to make sure number of columns is correct if ncols is not None and len(cols) != ncols: @@ -1584,13 +1588,14 @@ def _safe_int(str): except ValueError: # Something went wrong. Try to clean up the string. # Remove all commas from the string - str = string.replace(str, ',', '') + str = str.replace(',', '') try: # try again. return int(str) except ValueError: pass # If it fails again, maybe it's too long? + # XXX why converting to float? 
return long(float(str)) def _safe_float(str): @@ -1599,13 +1604,13 @@ # we need to check the string for this condition. # Sometimes BLAST leaves of the '1' in front of an exponent. - if str[0] in ['E', 'e']: + if str and str[0] in ['E', 'e']: str = '1' + str try: return float(str) except ValueError: # Remove all commas from the string - str = string.replace(str, ',', '') + str = str.replace(',', '') # try again. return float(str) @@ -1613,7 +1618,7 @@ class _BlastErrorConsumer(_BlastConsumer def __init__(self): _BlastConsumer.__init__(self) def noevent(self, line): - if line.find("Query must be at least wordsize") >= 0: + if line.find("Query must be at least wordsize") != -1: raise ShortQueryBlastError, "Query must be at least wordsize" # Now pass the line back up to the superclass. method = getattr(_BlastConsumer, 'noevent', @@ -1687,7 +1692,7 @@ class BlastErrorParser(AbstractParser): # 'Searchingdone' instead of 'Searching......done' seems # to indicate a failure to perform the BLAST due to # low quality sequence - if line[:13] == 'Searchingdone': + if line.startswith('Searchingdone'): raise LowQualityBlastError("Blast failure occured on query: ", data_record.query) line = handle.readline() From Yves.Bastide at irisa.fr Fri Aug 8 08:45:02 2003 From: Yves.Bastide at irisa.fr (Yves Bastide) Date: Sat Mar 5 14:43:25 2005 Subject: [Biopython-dev] Flat files indices Message-ID: <3F339B4E.5090304@irisa.fr> Hi, Are the "Open-bio flat-file indexing systems" implemented somewhere? yves From jchang at jeffchang.com Fri Aug 8 15:31:46 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Sat Mar 5 14:43:25 2005 Subject: [Biopython-dev] [patch] NCBIStandalone.py cleanup In-Reply-To: <3F3376BF.2070304@irisa.fr> Message-ID: Great patch! I've applied it to the CVS repository. 
Jeff

On Friday, August 8, 2003, at 03:09 AM, Yves Bastide wrote:

> Hi,
>
> Here's a patch bringing Bio.Blast.NCBIStandalone nearer to Python 3
> compatibility :-)
>
> Changelog:
> * Don't use the string module
> * use .find() == -1, not .find() <= 0
> * use .startswith() and .endswith()
>
> Regards,
>
> yves
>          method = getattr(_BlastConsumer, 'noevent',
> @@ -1687,7 +1692,7 @@ class BlastErrorParser(AbstractParser):
>              # 'Searchingdone' instead of 'Searching......done' seems
>              # to indicate a failure to perform the BLAST due to
>              # low quality sequence
> -            if line[:13] == 'Searchingdone':
> +            if line.startswith('Searchingdone'):
>                  raise LowQualityBlastError("Blast failure occured on query: ",
>                                             data_record.query)
>              line = handle.readline()
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev

From jefftc at stanford.edu  Fri Aug  8 16:02:38 2003
From: jefftc at stanford.edu (Jeffrey Chang)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Re: [BioPython] Updated Logo
In-Reply-To: <200308071740.h77HeVTN011918@kira.skynet.be>
Message-ID: <4278F5D5-C9DB-11D7-A249-000A956845CE@stanford.edu>

(Moved to the -dev mailing list.)

Brad, what's the best way to add images to the webpage?  Should they be
installed in a separate directory, or mixed in with the other pages?
We will need them to be accessible from other pages.  Someone should
be able to have an IMG SRC point to our logo.

Jeff

On Thursday, August 7, 2003, at 10:42 AM, Thomas Hamelryck wrote:

> Hi,
>
> Here's a smaller version of the biopython logo (300x89).
> The large image scales well using gimp, BTW, or so I think.
> Can it now be added to the biopython homepage?
>
> Regards,
>
> ---
> Thomas Hamelryck
> ULTR/COMO
> Institute for molecular biology/Computer Science Department
> Vrije Universiteit Brussel (VUB)
> Brussels, Belgium
> http://homepages.vub.ac.be/~thamelry
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython

From chapmanb at uga.edu  Fri Aug  8 17:29:19 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Re: [BioPython] Updated Logo
In-Reply-To: <4278F5D5-C9DB-11D7-A249-000A956845CE@stanford.edu>
References: <200308071740.h77HeVTN011918@kira.skynet.be>
	<4278F5D5-C9DB-11D7-A249-000A956845CE@stanford.edu>
Message-ID: <20030808212919.GC47653@evostick.agtec.uga.edu>

Thomas:
> >Here's a smaller version of the biopython logo (300x89).
> >The large image scales well using gimp, BTW, or so I think.
> >Can it now be added to the biopython homepage?

Sure-o. I put it up at the top where the ol' BIOPYTHON text used to
be. Looks quite nice, in my non-artistic opinion. What do people
think?

Jeff:
> Brad, what's the best way to add images to the webpage? Should they be
> installed in a separate directory, or mixed in with the other pages?
> We will need them to be accessible from other pages. Someone should
> be able to have an IMG SRC point to our logo.

I created an images directory, so now:

http://www.biopython.org/images/

has both the full size and scaled logo in it, for easy linking and
all other good things people might like to do.

Very pretty. Thanks for the logo Thomas!
Brad

From grouse at mail.utexas.edu  Fri Aug  8 17:49:16 2003
From: grouse at mail.utexas.edu (Michael Hoffman)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Re: [BioPython] Updated Logo
In-Reply-To: <20030808212919.GC47653@evostick.agtec.uga.edu>
References: <200308071740.h77HeVTN011918@kira.skynet.be>
	<4278F5D5-C9DB-11D7-A249-000A956845CE@stanford.edu>
	<20030808212919.GC47653@evostick.agtec.uga.edu>
Message-ID:

On Fri, 8 Aug 2003, Brad Chapman wrote:

> Sure-o. I put it up at the top where the ol' BIOPYTHON text used to
> be. Looks quite nice, in my non-artistic opinion. What do people
> think?

There's no difference between the major and minor grooves! The horror!

Just kidding. I like it. It would be nice if the web site colors
matched up with the logo more, or vice versa.
--
Michael Hoffman
The University of Texas at Austin

From chapmanb at uga.edu  Fri Aug  8 18:08:23 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Flat files indices
In-Reply-To: <3F339B4E.5090304@irisa.fr>
References: <3F339B4E.5090304@irisa.fr>
Message-ID: <20030808220823.GE47653@evostick.agtec.uga.edu>

Hi Yves;

> Are the "Open-bio flat-file indexing systems" implemented somewhere?

Yes, in Bio.Mindy. All of the components are there to be used,
although the documentation and "ease of use" of them is lagging
behind what has actually been finished. But, you can do useful work
with them.

I'm attaching a file of example code ripped from some work I've been
doing which hopefully demonstrates using it. This current code
indexes a standard set of FASTA files downloaded from GenBank based
on GI numbers. It also has some support for version numbers and
other things tossed in which aren't used here, but which I have used
in other things (aaah, the ugliness of cut-n-paste code).

This uses exclusively the Martel-based parsers, which allows it to
work on pretty darn huge FASTA files.
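The core of the attached indexer -- pulling the GI number out of a
GenBank-style FASTA description line, and picking the newest of several
versioned entries -- can be sketched in plain, self-contained Python.
This is a hypothetical illustration only (the function names are mine),
not the Bio.Mindy or SimpleSeqRecord API:

```python
def gi_number(description):
    """Pull the GI out of a GenBank-style FASTA description.

    Expects descriptions shaped like:
    gi|33304516|gb|AY339367.1| Oryza sativa (japonica cultivar-group)...
    """
    # the id block is everything before the first space
    record_id = description.split(" ")[0]
    record_parts = record_id.split("|")
    if len(record_parts) < 2 or not record_parts[1]:
        raise ValueError("No sequence ID: %s" % description)
    return record_parts[1]  # GI number

def latest_version(names):
    """Of several versioned ids ('AY339367.1', 'AY339367.2', ...),
    return the one with the highest version number."""
    # sanity check: all ids must share the same base accession
    base_names = set(name.split(".")[0] for name in names)
    assert len(base_names) == 1, "Different seqs"
    return max(names, key=lambda name: int(name.split(".")[1]))

print(gi_number("gi|33304516|gb|AY339367.1| Oryza sativa"))  # -> 33304516
print(latest_version(["AY339367.1", "AY339367.2"]))          # -> AY339367.2
```

The attached code does the same splits inside
`_FastaIdIndexer.get_id_dictionary` and
`MindyIndexOrganizer._get_latest_version`; the sketch just strips away
the Mindy plumbing.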
A previous version used the standard Fasta RecordParser, which
doesn't do well on huge entries (I was doing rice work for someone
and had entries like all of chromosome 10 tossed in).

So, yeah -- it's all there but needs some work to make it more
user-friendly and documented. Volunteers are always welcome :-).

Hope this helps!
Brad

-------------- next part --------------
import os
import cStringIO

from Bio import Fasta
from Bio import Mindy
from Bio.Mindy import SimpleSeqRecord
from Bio.expressions import fasta
from Martel.LAX import LAX

class MindyIndexOrganizer:
    """Organizer to deal with a set of Mindy indexes of Fasta (or other) files
    """
    def __init__(self):
        self._databases = {}
        self._opened_mindydbs = {}
        self._retrieved_seqs = {}

    def _get_indexer(self):
        """Derive from and override this class to get different indexers.
        """
        return _FastaIdIndexer()

    def index(self, index_name, index_file, index_dir, force_index = 0):
        """Index, if necessary, the file and load the index into the organizer.
        """
        if not(os.path.exists(index_dir)):
            os.makedirs(index_dir)
        full_index = os.path.join(index_dir, index_name)
        if not(os.path.exists(full_index)) or force_index:
            print "Indexing %s..." % index_file
            indexer = self._get_indexer()
            SimpleSeqRecord.create_berkeleydb([index_file], full_index,
                                              indexer)
        self._databases[index_name] = full_index

    def retrieve(self, database_name, id_key):
        """Retrieve a sequence record from the given Mindy sequence db.

        This returns a Record object, specified by the parser passed to
        the class.
        """
        # see if we've already retrieved this sequence -- save time with
        # big sequences
        try:
            return self._retrieved_seqs[(database_name, id_key)]
        except KeyError:
            pass
        # Get the sequence database
        seq_db = self._get_opened_mindydb(database_name)
        # first try retrieving by the base id
        try:
            seqs = seq_db.lookup(id = id_key)
        # if we can't do that, we have to fetch by alias
        # this deals with the problem of multiple sequence versions
        except KeyError:
            seqs = seq_db.lookup(aliases = id_key)
        # easy case -- 1 sequence
        if len(seqs) == 1:
            seq_ref = seqs[0]
        # harder case -- multiple sequence versions
        else:
            seq_ref = self._get_latest_version(seqs)
        it_builder = fasta.format.make_iterator("record")
        handle = cStringIO.StringIO(seq_ref.text)
        iterator = it_builder.iterateFile(handle,
                                          LAX(fields = ['bioformat:sequence']))
        result = iterator.next()
        seq_record = Fasta.Record()
        seq_record.title = id_key
        seq_record.sequence = "".join(result['bioformat:sequence'])
        # add the retrieved sequence to an in-memory dictionary
        self._retrieved_seqs[(database_name, id_key)] = seq_record
        return seq_record

    def _get_opened_mindydb(self, name):
        """Retrieve an open Mindy database with the given name.

        This stores databases we already opened to try and prevent
        multiple opens of the same database (and thus save time).
        """
        try:
            return self._opened_mindydbs[name]
        except KeyError:
            self._opened_mindydbs[name] = Mindy.open(self._databases[name])
            return self._get_opened_mindydb(name)

    def _get_latest_version(self, seqs):
        """Find the latest version of a sequence from a list of choices.

        This deals with the problem of multiple versions of a sequence
        by finding and returning the most recent version.
        """
        version_dict = {}
        the_base_name = None # error checking
        for seq in seqs:
            text_parts = seq.text.split("\n")
            first_line_parts = text_parts[0].split(" ")
            seq_name = first_line_parts[0]
            assert seq_name.find(".") >= 0, \
              "Expected versioned seqs: %s" % seq_name
            base_name, version = seq_name.split(".")
            if the_base_name is None:
                the_base_name = base_name
            else:
                assert base_name == the_base_name, "Different seqs"
            version_dict[int(version)] = seq
        versions = version_dict.keys()
        versions.sort()
        # use the most recent version, seems the best plan
        return version_dict[max(versions)]

class _FastaIdIndexer(SimpleSeqRecord.BaseSeqRecordIndexer):
    """Simple indexer to index by GI values.

    This assumes that the description of the sequence records is in the
    standard GenBank-like format:

    >gi|33304516|gb|AY339367.1| Oryza sativa (japonica cultivar-group)...

    It indexes these based on the GI number of the sequence.
    """
    def __init__(self):
        SimpleSeqRecord.BaseSeqRecordIndexer.__init__(self)

    def primary_key_name(self):
        return "id"

    def secondary_key_names(self):
        return ["name", "aliases"]

    def get_id_dictionary(self, seq_record):
        parts = seq_record.description.split(" ")
        record_id = parts[0]
        record_parts = record_id.split("|")
        sequence_id = record_parts[1] # GI number
        aliases = []
        if not(sequence_id):
            raise ValueError("No sequence ID: %s" % seq_record.description)
        id_info = {"id" : [sequence_id],
                   "name" : [],
                   "aliases" : aliases}
        return id_info

From Yves.Bastide at irisa.fr  Tue Aug 12 15:18:41 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sat Mar  5 14:43:25 2005
Subject: [Biopython-dev] Flat files indices
In-Reply-To: <20030808220823.GE47653@evostick.agtec.uga.edu>
References: <3F339B4E.5090304@irisa.fr>
	<20030808220823.GE47653@evostick.agtec.uga.edu>
Message-ID: <3F393D91.3010108@irisa.fr>

Brad Chapman wrote:
> Hi Yves;
>
>>Are the "Open-bio flat-file indexing systems" implemented somewhere?
>
> Yes, in Bio.Mindy. All of the components are there to be used,
> although the documentation and "ease of use" of them is lagging
> behind what has actually been finished. But, you can do useful work
> with them.
>
> I'm attaching a file of example code ripped from some work I've been
> doing which hopefully demonstrates using it. This current code
> indexes a standard set of FASTA files downloaded from GenBank based
> on GI numbers. It also has some support for version numbers and
> other things tossed in which aren't used here, but which I have used
> in other things (aaah, the ugliness of cut-n-paste code).
>
> This uses exclusively the Martel-based parsers, which allows it to
> work on pretty darn huge FASTA files. A previous version used the
> standard Fasta RecordParser which doesn't do well on huge entries (I
> was doing rice work for someone and had entries like all of
> chromosome 10 tossed in).
>
> So, yeah -- it's all there but needs some work to make it more
> user-friendly and documented. Volunteers are always welcome :-).
>
> Hope this helps!
> Brad

Thanks!

I think there are a few bugs in Mindy (e.g., the use of fileid_info
in WriteDB); here's a patch with gratuitous cosmetic changes, perhaps
useful ones, and certainly new bugs :)
(I also started to add docstrings, then saw the hour here)

I also patched sprot38.py to read a current snapshot of SwissProt:
2A5D_HUMAN has a multi-line RP. I dunno if there are users of the
parser to change, though...

yves

-------------- next part --------------
Index: Bio/Mindy/BaseDB.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Mindy/BaseDB.py,v
retrieving revision 1.5
diff -u -p -r1.5 BaseDB.py
--- Bio/Mindy/BaseDB.py	2002/12/10 20:56:05	1.5
+++ Bio/Mindy/BaseDB.py	2003/08/12 19:12:21
@@ -3,6 +3,7 @@ import Bio
 import compression
 
 def _int_str(i):
+    # XXX doesn't seem useful
     s = str(i)
     if s[-1:] == "l":
         return s[:-1]
@@ -12,22 +13,22 @@ class WriteDB:
     # Must define 'self.filename_map' mapping from filename -> fileid
     # Must define 'self.fileid_info' mapping from fileid -> (filename,size)
 
-    def add_filename(self, filename, size, fileid_info):
+    def add_filename(self, filename, size):
         fileid = self.filename_map.get(filename, None)
         if fileid is not None:
             return fileid
         s = str(len(self.filename_map))
         self.filename_map[filename] = s # map from filename -> id
-        assert s not in fileid_info.keys(), "Duplicate entry! %s" % (s,)
+        assert s not in self.fileid_info.keys(), "Duplicate entry!
%s" % (s,) self.fileid_info[s] = (filename, size) return s - def load(self, filename, builder, fileid_info, record_tag = "record"): + def load(self, filename, builder, record_tag="record"): formatname = self.formatname size = os.path.getsize(filename) - filetag = self.add_filename(filename, size, fileid_info) + filetag = self.add_filename(filename, size) - source = compression.open_file(filename, "rb") + source = compression.open(filename, "rb") if formatname == "unknown": formatname = "sequence" @@ -66,7 +67,7 @@ class DictLookup: def items(self): return [(key, self[key]) for key in self.keys()] - def get(self, key, default = None): + def get(self, key, default=None): try: return self[key] except KeyError: @@ -97,7 +98,7 @@ class OpenDB(DictLookup): if os.path.getsize(filename) != size: raise TypeError( "File %s has changed size from %d to %d bytes!" % - (size, os.path.getsize(filename))) + (filename, size, os.path.getsize(filename))) self.filename_map = filename_map self.fileid_info = fileid_info Index: Bio/Mindy/BerkeleyDB.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/BerkeleyDB.py,v retrieving revision 1.8 diff -u -p -r1.8 BerkeleyDB.py --- Bio/Mindy/BerkeleyDB.py 2002/12/10 21:55:40 1.8 +++ Bio/Mindy/BerkeleyDB.py 2003/08/12 19:12:21 @@ -1,15 +1,19 @@ +"""Open-Bio BerkeleyDB indexing system for flat-files databanks.""" + import os -from bsddb3 import db +try: + from bsddb import db +except ImportError: + from bsddb3 import db import Location import BaseDB import Bio -_open = open # rename for internal use -- gets redefined below - INDEX_TYPE = "BerkeleyDB/1" def create(dbname, primary_namespace, secondary_namespaces, - formatname = "unknown"): + formatname="unknown"): + """BerkeleyDB creator factory""" os.mkdir(dbname) config_filename = os.path.join(dbname, "config.dat") BaseDB.write_config(config_filename = config_filename, @@ -39,7 +43,7 @@ def create(dbname, primary_namespace, 
se primary_table.close() dbenv.close() - return open(dbname, "rw") + return BerkeleyDB(dbname, "rw") class PrimaryNamespace(BaseDB.DictLookup): @@ -92,7 +96,7 @@ class SecondaryNamespace(BaseDB.DictLook return table.keys() class BerkeleyDB(BaseDB.OpenDB, BaseDB.WriteDB): - def __init__(self, dbname, mode = "r"): + def __init__(self, dbname, mode="r"): if mode not in ("r", "rw"): raise TypeError("Unknown mode: %r" % (mode,)) self.__need_flush = 0 @@ -173,7 +177,7 @@ class BerkeleyDB(BaseDB.OpenDB, BaseDB.W [x.close() for x in self.secondary_tables.values()] self.dbenv.close() self.dbenv = self.primary_table = self.fileid_info = \ - self.secondary_tables = self.fileid_info = None + self.secondary_tables = None def __del__(self): if self.dbenv is not None: @@ -188,4 +192,3 @@ class BerkeleyDB(BaseDB.OpenDB, BaseDB.W return SecondaryNamespace(self, key) open = BerkeleyDB - Index: Bio/Mindy/FlatDB.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/FlatDB.py,v retrieving revision 1.6 diff -u -p -r1.6 FlatDB.py --- Bio/Mindy/FlatDB.py 2002/03/01 15:07:21 1.6 +++ Bio/Mindy/FlatDB.py 2003/08/12 19:12:21 @@ -1,9 +1,11 @@ +"""Open-Bio flat indexing system for flat-files databanks.""" -import os, bisect -import BaseDB, Location +import os +import bisect +import BaseDB +import Location import Bio -_open = open INDEX_TYPE = "flat/1" def _parse_primary_table_entry(s): @@ -11,7 +13,7 @@ def _parse_primary_table_entry(s): return name, filetag, long(startpos), long(length) def _read_primary_table(filename): - infile = _open(filename, "rb") + infile = file(filename, "rb") size = int(infile.read(4)) table = {} while 1: @@ -36,7 +38,7 @@ def _write_primary_table(filename, prima raise AssertionError( "Primary index record too large for format spec! 
" + " %s bytes in %r" % (n, s)) - outfile = _open(filename, "wb") + outfile = file(filename, "wb") outfile.write("%04d" % n) for k, v in info: s = "%s\t%s" % (k, v) @@ -47,7 +49,7 @@ def _parse_secondary_table_entry(s): return s.rstrip().split("\t") def _read_secondary_table(filename): - infile = _open(filename, "rb") + infile = file(filename, "rb") size = int(infile.read(4)) table = {} while 1: @@ -75,7 +77,7 @@ def _write_secondary_table(filename, tab "Secondary index record too large for format spec! " + " %s bytes in %r" % (n, s)) # And write the output - outfile = _open(filename, "wb") + outfile = file(filename, "wb") outfile.write("%04d" % n) for k, v in items: for x in v: @@ -127,7 +129,7 @@ class MemoryFlatDB(BaseDB.WriteDB, BaseF def __init__(self, dbname): self.__in_constructor = 1 self._need_flush = 0 - BaseFlatDB.__init__(self, dbname, INDEX_TYPE) + BaseFlatDB.__init__(self, dbname) primary_filename = os.path.join(self.dbname, "key_%s.key" % (self.primary_namespace,) ) @@ -145,7 +147,8 @@ class MemoryFlatDB(BaseDB.WriteDB, BaseF if len(key_list) != 1: raise TypeError( "Field %s has %d entries but must have only one " - "(must be unique)" % (repr(unique), len(key_list))) + "(must be unique)" % (repr(self.primary_namespace), + len(key_list))) key = key_list[0] if self.primary_table.has_key(key): raise TypeError("Field %r = %r already exists; must be unique" % @@ -227,7 +230,7 @@ class BisectFile: def _find_entry(filename, wantword): size = os.path.getsize(filename) - infile = _open(filename, "rb") + infile = file(filename, "rb") bf = BisectFile(infile, size) left = bisect.bisect_left(bf, wantword) @@ -238,7 +241,7 @@ def _find_entry(filename, wantword): def _find_range(filename, wantword): size = os.path.getsize(filename) - infile = _open(filename, "rb") + infile = file(filename, "rb") bf = BisectFile(infile, size) left = bisect.bisect_left(bf, wantword) @@ -272,7 +275,7 @@ def _lookup_alias(id_filename, word): return primary_keys def create(dbname, 
primary_namespace, secondary_namespaces, - formatname = "unknown"): + formatname="unknown"): os.mkdir(dbname) config_filename = os.path.join(dbname, "config.dat") BaseDB.write_config(config_filename = config_filename, @@ -297,7 +300,7 @@ def create(dbname, primary_namespace, se return open(dbname, "rw") -def open(dbname, mode = "r"): +def open(dbname, mode="r"): if mode == "r": return DiskFlatDB(dbname) elif mode == "rw": @@ -308,7 +311,7 @@ def open(dbname, mode = "r"): raise TypeError("Unknown mode: %r" % (mode,)) def _get_first_words(filename): - infile = _open(filename, "rb") + infile = file(filename, "rb") size = int(infile.read(4)) data = [] while 1: Index: Bio/Mindy/Location.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/Location.py,v retrieving revision 1.2 diff -u -p -r1.2 Location.py --- Bio/Mindy/Location.py 2002/02/26 11:32:26 1.2 +++ Bio/Mindy/Location.py 2003/08/12 19:12:21 @@ -1,6 +1,6 @@ import compression -class Location: +class Location(object): """Handle for a record (use 'text' to get the record's text)""" def __init__(self, namespace, name, filename, startpos, length): self.namespace = namespace @@ -9,26 +9,26 @@ class Location: self.startpos = startpos self.length = length def __repr__(self): - return "Location(namespace = %r, name = %r, filename = %r, startpos = %r, length = %r)" % (self.namespace, self.name, self.filename, self.startpos, self.length) + return "Location(namespace = %r, name = %r, filename = %r," \ + " startpos = %r, length = %r)" % \ + (self.namespace, self.name, self.filename, + self.startpos, self.length) def __str__(self): return "Location(%s:%s at %s: %s, %s)" % \ (self.namespace, self.name, self.filename,self.startpos, self.length) - def __getattr__(self, key): - if key == "text": - infile = compression.open_file(self.filename) - if hasattr(infile, "seek"): - infile.seek(self.startpos) - return infile.read(self.length) - # read 1MB chunks at 
a time - CHUNKSIZE = 1000000 - count = 0 - while count + CHUNKSIZE < self.startpos: - infile.read(CHUNKSIZE) - count += CHUNKSIZE - infile.read(self.startpos - count) + def get_text(self): + infile = compression.open(self.filename) + if hasattr(infile, "seek"): + infile.seek(self.startpos) return infile.read(self.length) - elif key == "__members__": - return ["text"] - raise AttributeError(key) + # read 1MiB chunks at a time + CHUNKSIZE = 1048576 + count = 0 + while count + CHUNKSIZE < self.startpos: + infile.read(CHUNKSIZE) + count += CHUNKSIZE + infile.read(self.startpos - count) + return infile.read(self.length) + text = property(get_text) Index: Bio/Mindy/SimpleSeqRecord.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/SimpleSeqRecord.py,v retrieving revision 1.2 diff -u -p -r1.2 SimpleSeqRecord.py --- Bio/Mindy/SimpleSeqRecord.py 2002/12/10 20:56:05 1.2 +++ Bio/Mindy/SimpleSeqRecord.py 2003/08/12 19:12:22 @@ -94,8 +94,10 @@ class FixDocumentBuilder(BuildSeqRecord) # --- convenience functions for indexing # you should just use these unless you are doing something fancy -def create_berkeleydb(files, db_name, indexer = SimpleIndexer()): +def create_berkeleydb(files, db_name, indexer=None): from Bio.Mindy import BerkeleyDB + if indexer is None: + indexer = SimpleIndexer() unique_name = indexer.primary_key_name() alias_names = indexer.secondary_key_names() creator = BerkeleyDB.create(db_name, unique_name, alias_names) @@ -104,8 +106,10 @@ def create_berkeleydb(files, db_name, in creator.load(filename, builder = builder, fileid_info = {}) creator.close() -def create_flatdb(files, db_name, indexer = SimpleIndexer()): +def create_flatdb(files, db_name, indexer=None): from Bio.Mindy import FlatDB + if indexer is None: + indexer = SimpleIndexer() unique_name = indexer.primary_key_name() alias_names = indexer.secondary_key_names() creator = FlatDB.create(db_name, unique_name, alias_names) 
Index: Bio/Mindy/XPath.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/XPath.py,v retrieving revision 1.3 diff -u -p -r1.3 XPath.py --- Bio/Mindy/XPath.py 2002/03/01 15:07:21 1.3 +++ Bio/Mindy/XPath.py 2003/08/12 19:12:22 @@ -1,4 +1,5 @@ -import xml.sax, re +import xml.sax +import re from Bio import Std @@ -10,7 +11,7 @@ _pat_tag_re = re.compile(r"""^//(%s)(\[@ #') # emacs cruft -def parse_simple_xpath(s): +def _parse_simple_xpath(s): # Only supports two formats # //tag # //tag[@attr="value"] @@ -32,11 +33,23 @@ def parse_simple_xpath(s): def xpath_index(dbname, filenames, primary_namespace, - extract_info, # pair of (data_value, xpath) - format = "sequence", - record_tag = Std.record.tag, - creator_factory = None, + extract_info, + format="sequence", + record_tag=Std.record.tag, + creator_factory=None, ): + """Index a flat-file databank. + + Arguments: + dbname -- databank name + filenames -- list of file names; full paths should be used + primary_namespace -- primary identifier namespace + extract_info -- list of pairs (data_value, xpath) + format -- name of the file format (default: "sequence") + record_tag -- record tag (default: `Bio.Std.record.tag`) + creator_factory -- creator factory (default: BerkeleyDB.create) + + """ if creator_factory is None: import BerkeleyDB creator_factory = BerkeleyDB.create @@ -55,28 +68,32 @@ def xpath_index(dbname, raise TypeError("Property %r has no xpath definition" % (primary_namespace,)) - creator = creator_factory(dbname, primary_namespace, data_names) - builder = GrabXPathNodes(extract_info) + creator = creator_factory(dbname, primary_namespace, data_names, + formatname = format) + builder = _GrabXPathNodes(extract_info) + fileid_info = {} for filename in filenames: - creator.load(filename, builder = builder, record_tag = record_tag, - formatname = format) + creator.load(filename, builder = builder, fileid_info = fileid_info, + record_tag = 
record_tag) creator.close() -class GrabXPathNodes(xml.sax.ContentHandler): +class _GrabXPathNodes(xml.sax.ContentHandler): def __init__(self, extractinfo): + xml.sax.ContentHandler.__init__(self) self._fast_tags = _fast_tags = {} for property, xpath in extractinfo: - tag, attrs = parse_simple_xpath(xpath) + tag, attrs = _parse_simple_xpath(xpath) _fast_tags.setdefault(tag, []).append( (attrs, property) ) # for doing the endElement in the correct order, # which is opposite to the input order - self._rev_tags = _rev_tags = {} + _rev_tags = {} for k, v in self._fast_tags.items(): v = v[:] v.reverse() - self._rev_tags[k] = v + _rev_tags[k] = v + self._rev_tags = _rev_tags def uses_tags(self): return self._fast_tags.keys() Index: Bio/Mindy/__init__.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/__init__.py,v retrieving revision 1.6 diff -u -p -r1.6 __init__.py --- Bio/Mindy/__init__.py 2002/03/01 15:07:21 1.6 +++ Bio/Mindy/__init__.py 2003/08/12 19:12:22 @@ -1,9 +1,13 @@ -import os, sys +import os -_open = open # rename for internal use -- gets redefined below +# For python 2.1 compatibility, one can add +##try: +## file +##except NameError: +## file = open -def open(dbname, mode = "r"): - text = _open(os.path.join(dbname, "config.dat"), "rb").read() +def open(dbname, mode="r"): + text = file(os.path.join(dbname, "config.dat"), "rb").read() line = text.split("\n")[0] if line == "index\tBerkeleyDB/1": import BerkeleyDB @@ -18,19 +22,19 @@ def open(dbname, mode = "r"): def main(): from Bio import Std import XPath - import FlatDB + ##import FlatDB XPath.xpath_index( - #dbname = "sprot_flat", + ##dbname = "sprot_flat", dbname = "sprot_small", filenames = ["/home/dalke/ftps/swissprot/smaller_sprot38.dat", - #filenames = ["/home/dalke/ftps/swissprot/sprot38.dat", + ##filenames = ["/home/dalke/ftps/swissprot/sprot38.dat", ], primary_namespace = "entry", extract_info = [ ("entry", "//entry_name"), 
("accession", "//%s[@type='accession']" % (Std.dbid.tag,)), ], - #creator_factory = FlatDB.CreateFlatDB, + ##creator_factory = FlatDB.CreateFlatDB, ) Index: Bio/Mindy/compression.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/compression.py,v retrieving revision 1.1 diff -u -p -r1.1 compression.py --- Bio/Mindy/compression.py 2002/01/28 20:55:30 1.1 +++ Bio/Mindy/compression.py 2003/08/12 19:12:22 @@ -1,4 +1,5 @@ -import commands, os +import commands +import os _uncompress_table = { ".bz": "bzip2", @@ -8,21 +9,23 @@ _uncompress_table = { ".Z": "compress", } -def open_file(filename, mode = "rb"): +def open(filename, mode="rb"): ext = os.path.splitext(filename)[1] type = _uncompress_table.get(ext) if type is None: - return open(filename, mode) + return file(filename, mode) if type == "gzip": import gzip - gzip.open(filename, mode) + return gzip.open(filename, mode) if type == "bzip2": - cmd = "bzcat --decompress" - cmd += commands.mkarg(filename) - return os.popen(cmd, mode) + try: + import bz2 + except ImportError: + cmd = "bzcat --decompress %s" % commands.mkarg(filename) + return os.popen(cmd, mode) + return bz2.BZ2File(filename, mode) if type == "compress": - cmd = "zcat -d" - cmd += commands.mkarg(filename) + cmd = "zcat -d %s" % commands.mkarg(filename) return os.popen(cmd, mode) raise AssertionError("What's a %r?" 
% type) -------------- next part -------------- Index: Bio/expressions//swissprot/sprot38.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/expressions/swissprot/sprot38.py,v retrieving revision 1.4 diff -u -p -r1.4 sprot38.py --- Bio/expressions//swissprot/sprot38.py 2002/02/27 07:31:32 1.4 +++ Bio/expressions//swissprot/sprot38.py 2003/08/12 19:13:33 @@ -111,9 +111,9 @@ RN = Martel.Group("RN", Martel.Re("RN #--- RP -# occurs once +# 1 or more RP = Simple("RP", "reference_position") - +RP_block = Martel.Group("RP_block", Martel.Rep1(RP)) #--- RC @@ -151,7 +151,7 @@ RL_block = Martel.Group("RL_block", Mart reference = Martel.Group("reference", RN + \ - RP + \ + RP_block + \ Martel.Opt(RC_block) + \ Martel.Opt(RX) + \ RA_block + \ From Yves.Bastide at irisa.fr Tue Aug 12 15:25:14 2003 From: Yves.Bastide at irisa.fr (Yves Bastide) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] Re: [patch] Bio/Mindy In-Reply-To: <20030808220823.GE47653@evostick.agtec.uga.edu> References: <3F339B4E.5090304@irisa.fr> <20030808220823.GE47653@evostick.agtec.uga.edu> Message-ID: <3F393F1A.9050703@irisa.fr> Oops. Do C-x s before cvs diff. 
yves -------------- next part -------------- Index: Bio/Mindy/BaseDB.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/BaseDB.py,v retrieving revision 1.5 diff -u -p -r1.5 BaseDB.py --- Bio/Mindy/BaseDB.py 2002/12/10 20:56:05 1.5 +++ Bio/Mindy/BaseDB.py 2003/08/12 19:17:44 @@ -3,6 +3,7 @@ import Bio import compression def _int_str(i): + # XXX doesn't seem useful s = str(i) if s[-1:] == "l": return s[:-1] @@ -12,22 +13,22 @@ class WriteDB: # Must define 'self.filename_map' mapping from filename -> fileid # Must define 'self.fileid_info' mapping from fileid -> (filename,size) - def add_filename(self, filename, size, fileid_info): + def add_filename(self, filename, size): fileid = self.filename_map.get(filename, None) if fileid is not None: return fileid s = str(len(self.filename_map)) self.filename_map[filename] = s # map from filename -> id - assert s not in fileid_info.keys(), "Duplicate entry! %s" % (s,) + assert s not in self.fileid_info.keys(), "Duplicate entry! %s" % (s,) self.fileid_info[s] = (filename, size) return s - def load(self, filename, builder, fileid_info, record_tag = "record"): + def load(self, filename, builder, record_tag="record"): formatname = self.formatname size = os.path.getsize(filename) - filetag = self.add_filename(filename, size, fileid_info) + filetag = self.add_filename(filename, size) - source = compression.open_file(filename, "rb") + source = compression.open(filename, "rb") if formatname == "unknown": formatname = "sequence" @@ -66,7 +67,7 @@ class DictLookup: def items(self): return [(key, self[key]) for key in self.keys()] - def get(self, key, default = None): + def get(self, key, default=None): try: return self[key] except KeyError: @@ -97,7 +98,7 @@ class OpenDB(DictLookup): if os.path.getsize(filename) != size: raise TypeError( "File %s has changed size from %d to %d bytes!" 
% - (size, os.path.getsize(filename))) + (filename, size, os.path.getsize(filename))) self.filename_map = filename_map self.fileid_info = fileid_info Index: Bio/Mindy/BerkeleyDB.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/BerkeleyDB.py,v retrieving revision 1.8 diff -u -p -r1.8 BerkeleyDB.py --- Bio/Mindy/BerkeleyDB.py 2002/12/10 21:55:40 1.8 +++ Bio/Mindy/BerkeleyDB.py 2003/08/12 19:17:44 @@ -1,15 +1,19 @@ +"""Open-Bio BerkeleyDB indexing system for flat-files databanks.""" + import os -from bsddb3 import db +try: + from bsddb import db +except ImportError: + from bsddb3 import db import Location import BaseDB import Bio -_open = open # rename for internal use -- gets redefined below - INDEX_TYPE = "BerkeleyDB/1" def create(dbname, primary_namespace, secondary_namespaces, - formatname = "unknown"): + formatname="unknown"): + """BerkeleyDB creator factory""" os.mkdir(dbname) config_filename = os.path.join(dbname, "config.dat") BaseDB.write_config(config_filename = config_filename, @@ -39,7 +43,7 @@ def create(dbname, primary_namespace, se primary_table.close() dbenv.close() - return open(dbname, "rw") + return BerkeleyDB(dbname, "rw") class PrimaryNamespace(BaseDB.DictLookup): @@ -92,7 +96,7 @@ class SecondaryNamespace(BaseDB.DictLook return table.keys() class BerkeleyDB(BaseDB.OpenDB, BaseDB.WriteDB): - def __init__(self, dbname, mode = "r"): + def __init__(self, dbname, mode="r"): if mode not in ("r", "rw"): raise TypeError("Unknown mode: %r" % (mode,)) self.__need_flush = 0 @@ -173,7 +177,7 @@ class BerkeleyDB(BaseDB.OpenDB, BaseDB.W [x.close() for x in self.secondary_tables.values()] self.dbenv.close() self.dbenv = self.primary_table = self.fileid_info = \ - self.secondary_tables = self.fileid_info = None + self.secondary_tables = None def __del__(self): if self.dbenv is not None: @@ -188,4 +192,3 @@ class BerkeleyDB(BaseDB.OpenDB, BaseDB.W return SecondaryNamespace(self, key) 
open = BerkeleyDB - Index: Bio/Mindy/FlatDB.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/FlatDB.py,v retrieving revision 1.6 diff -u -p -r1.6 FlatDB.py --- Bio/Mindy/FlatDB.py 2002/03/01 15:07:21 1.6 +++ Bio/Mindy/FlatDB.py 2003/08/12 19:17:44 @@ -1,9 +1,11 @@ +"""Open-Bio flat indexing system for flat-files databanks.""" -import os, bisect -import BaseDB, Location +import os +import bisect +import BaseDB +import Location import Bio -_open = open INDEX_TYPE = "flat/1" def _parse_primary_table_entry(s): @@ -11,7 +13,7 @@ def _parse_primary_table_entry(s): return name, filetag, long(startpos), long(length) def _read_primary_table(filename): - infile = _open(filename, "rb") + infile = file(filename, "rb") size = int(infile.read(4)) table = {} while 1: @@ -36,7 +38,7 @@ def _write_primary_table(filename, prima raise AssertionError( "Primary index record too large for format spec! " + " %s bytes in %r" % (n, s)) - outfile = _open(filename, "wb") + outfile = file(filename, "wb") outfile.write("%04d" % n) for k, v in info: s = "%s\t%s" % (k, v) @@ -47,7 +49,7 @@ def _parse_secondary_table_entry(s): return s.rstrip().split("\t") def _read_secondary_table(filename): - infile = _open(filename, "rb") + infile = file(filename, "rb") size = int(infile.read(4)) table = {} while 1: @@ -75,7 +77,7 @@ def _write_secondary_table(filename, tab "Secondary index record too large for format spec! 
" + " %s bytes in %r" % (n, s)) # And write the output - outfile = _open(filename, "wb") + outfile = file(filename, "wb") outfile.write("%04d" % n) for k, v in items: for x in v: @@ -127,7 +129,7 @@ class MemoryFlatDB(BaseDB.WriteDB, BaseF def __init__(self, dbname): self.__in_constructor = 1 self._need_flush = 0 - BaseFlatDB.__init__(self, dbname, INDEX_TYPE) + BaseFlatDB.__init__(self, dbname) primary_filename = os.path.join(self.dbname, "key_%s.key" % (self.primary_namespace,) ) @@ -145,7 +147,8 @@ class MemoryFlatDB(BaseDB.WriteDB, BaseF if len(key_list) != 1: raise TypeError( "Field %s has %d entries but must have only one " - "(must be unique)" % (repr(unique), len(key_list))) + "(must be unique)" % (repr(self.primary_namespace), + len(key_list))) key = key_list[0] if self.primary_table.has_key(key): raise TypeError("Field %r = %r already exists; must be unique" % @@ -227,7 +230,7 @@ class BisectFile: def _find_entry(filename, wantword): size = os.path.getsize(filename) - infile = _open(filename, "rb") + infile = file(filename, "rb") bf = BisectFile(infile, size) left = bisect.bisect_left(bf, wantword) @@ -238,7 +241,7 @@ def _find_entry(filename, wantword): def _find_range(filename, wantword): size = os.path.getsize(filename) - infile = _open(filename, "rb") + infile = file(filename, "rb") bf = BisectFile(infile, size) left = bisect.bisect_left(bf, wantword) @@ -272,7 +275,7 @@ def _lookup_alias(id_filename, word): return primary_keys def create(dbname, primary_namespace, secondary_namespaces, - formatname = "unknown"): + formatname="unknown"): os.mkdir(dbname) config_filename = os.path.join(dbname, "config.dat") BaseDB.write_config(config_filename = config_filename, @@ -297,7 +300,7 @@ def create(dbname, primary_namespace, se return open(dbname, "rw") -def open(dbname, mode = "r"): +def open(dbname, mode="r"): if mode == "r": return DiskFlatDB(dbname) elif mode == "rw": @@ -308,7 +311,7 @@ def open(dbname, mode = "r"): raise TypeError("Unknown mode: %r" % 
(mode,)) def _get_first_words(filename): - infile = _open(filename, "rb") + infile = file(filename, "rb") size = int(infile.read(4)) data = [] while 1: Index: Bio/Mindy/Location.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/Location.py,v retrieving revision 1.2 diff -u -p -r1.2 Location.py --- Bio/Mindy/Location.py 2002/02/26 11:32:26 1.2 +++ Bio/Mindy/Location.py 2003/08/12 19:17:44 @@ -1,6 +1,6 @@ import compression -class Location: +class Location(object): """Handle for a record (use 'text' to get the record's text)""" def __init__(self, namespace, name, filename, startpos, length): self.namespace = namespace @@ -9,26 +9,26 @@ class Location: self.startpos = startpos self.length = length def __repr__(self): - return "Location(namespace = %r, name = %r, filename = %r, startpos = %r, length = %r)" % (self.namespace, self.name, self.filename, self.startpos, self.length) + return "Location(namespace = %r, name = %r, filename = %r," \ + " startpos = %r, length = %r)" % \ + (self.namespace, self.name, self.filename, + self.startpos, self.length) def __str__(self): return "Location(%s:%s at %s: %s, %s)" % \ (self.namespace, self.name, self.filename,self.startpos, self.length) - def __getattr__(self, key): - if key == "text": - infile = compression.open_file(self.filename) - if hasattr(infile, "seek"): - infile.seek(self.startpos) - return infile.read(self.length) - # read 1MB chunks at a time - CHUNKSIZE = 1000000 - count = 0 - while count + CHUNKSIZE < self.startpos: - infile.read(CHUNKSIZE) - count += CHUNKSIZE - infile.read(self.startpos - count) + def get_text(self): + infile = compression.open(self.filename) + if hasattr(infile, "seek"): + infile.seek(self.startpos) return infile.read(self.length) - elif key == "__members__": - return ["text"] - raise AttributeError(key) + # read 1MiB chunks at a time + CHUNKSIZE = 1048576 + count = 0 + while count + CHUNKSIZE < self.startpos: + 
infile.read(CHUNKSIZE) + count += CHUNKSIZE + infile.read(self.startpos - count) + return infile.read(self.length) + text = property(get_text) Index: Bio/Mindy/SimpleSeqRecord.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/SimpleSeqRecord.py,v retrieving revision 1.2 diff -u -p -r1.2 SimpleSeqRecord.py --- Bio/Mindy/SimpleSeqRecord.py 2002/12/10 20:56:05 1.2 +++ Bio/Mindy/SimpleSeqRecord.py 2003/08/12 19:17:44 @@ -94,8 +94,10 @@ class FixDocumentBuilder(BuildSeqRecord) # --- convenience functions for indexing # you should just use these unless you are doing something fancy -def create_berkeleydb(files, db_name, indexer = SimpleIndexer()): +def create_berkeleydb(files, db_name, indexer=None): from Bio.Mindy import BerkeleyDB + if indexer is None: + indexer = SimpleIndexer() unique_name = indexer.primary_key_name() alias_names = indexer.secondary_key_names() creator = BerkeleyDB.create(db_name, unique_name, alias_names) @@ -104,8 +106,10 @@ def create_berkeleydb(files, db_name, in creator.load(filename, builder = builder, fileid_info = {}) creator.close() -def create_flatdb(files, db_name, indexer = SimpleIndexer()): +def create_flatdb(files, db_name, indexer=None): from Bio.Mindy import FlatDB + if indexer is None: + indexer = SimpleIndexer() unique_name = indexer.primary_key_name() alias_names = indexer.secondary_key_names() creator = FlatDB.create(db_name, unique_name, alias_names) Index: Bio/Mindy/XPath.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/XPath.py,v retrieving revision 1.3 diff -u -p -r1.3 XPath.py --- Bio/Mindy/XPath.py 2002/03/01 15:07:21 1.3 +++ Bio/Mindy/XPath.py 2003/08/12 19:17:44 @@ -1,4 +1,5 @@ -import xml.sax, re +import xml.sax +import re from Bio import Std @@ -10,7 +11,7 @@ _pat_tag_re = re.compile(r"""^//(%s)(\[@ #') # emacs cruft -def parse_simple_xpath(s): +def 
_parse_simple_xpath(s): # Only supports two formats # //tag # //tag[@attr="value"] @@ -32,11 +33,23 @@ def parse_simple_xpath(s): def xpath_index(dbname, filenames, primary_namespace, - extract_info, # pair of (data_value, xpath) - format = "sequence", - record_tag = Std.record.tag, - creator_factory = None, + extract_info, + format="sequence", + record_tag=Std.record.tag, + creator_factory=None, ): + """Index a flat-file databank. + + Arguments: + dbname -- databank name + filenames -- list of file names; full paths should be used + primary_namespace -- primary identifier namespace + extract_info -- list of pairs (data_value, xpath) + format -- name of the file format (default: "sequence") + record_tag -- record tag (default: `Bio.Std.record.tag`) + creator_factory -- creator factory (default: BerkeleyDB.create) + + """ if creator_factory is None: import BerkeleyDB creator_factory = BerkeleyDB.create @@ -55,28 +68,31 @@ def xpath_index(dbname, raise TypeError("Property %r has no xpath definition" % (primary_namespace,)) - creator = creator_factory(dbname, primary_namespace, data_names) - builder = GrabXPathNodes(extract_info) + creator = creator_factory(dbname, primary_namespace, data_names, + formatname = format) + builder = _GrabXPathNodes(extract_info) + fileid_info = {} for filename in filenames: - creator.load(filename, builder = builder, record_tag = record_tag, - formatname = format) + creator.load(filename, builder = builder, record_tag = record_tag) creator.close() -class GrabXPathNodes(xml.sax.ContentHandler): +class _GrabXPathNodes(xml.sax.ContentHandler): def __init__(self, extractinfo): + xml.sax.ContentHandler.__init__(self) self._fast_tags = _fast_tags = {} for property, xpath in extractinfo: - tag, attrs = parse_simple_xpath(xpath) + tag, attrs = _parse_simple_xpath(xpath) _fast_tags.setdefault(tag, []).append( (attrs, property) ) # for doing the endElement in the correct order, # which is opposite to the input order - self._rev_tags = _rev_tags = 
{} + _rev_tags = {} for k, v in self._fast_tags.items(): v = v[:] v.reverse() - self._rev_tags[k] = v + _rev_tags[k] = v + self._rev_tags = _rev_tags def uses_tags(self): return self._fast_tags.keys() Index: Bio/Mindy/__init__.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/__init__.py,v retrieving revision 1.6 diff -u -p -r1.6 __init__.py --- Bio/Mindy/__init__.py 2002/03/01 15:07:21 1.6 +++ Bio/Mindy/__init__.py 2003/08/12 19:17:44 @@ -1,9 +1,13 @@ -import os, sys +import os -_open = open # rename for internal use -- gets redefined below +# For python 2.1 compatibility, one can add +##try: +## file +##except NameError: +## file = open -def open(dbname, mode = "r"): - text = _open(os.path.join(dbname, "config.dat"), "rb").read() +def open(dbname, mode="r"): + text = file(os.path.join(dbname, "config.dat"), "rb").read() line = text.split("\n")[0] if line == "index\tBerkeleyDB/1": import BerkeleyDB @@ -18,7 +22,7 @@ def open(dbname, mode = "r"): def main(): from Bio import Std import XPath - import FlatDB + ##import FlatDB XPath.xpath_index( #dbname = "sprot_flat", dbname = "sprot_small", Index: Bio/Mindy/compression.py =================================================================== RCS file: /home/repository/biopython/biopython/Bio/Mindy/compression.py,v retrieving revision 1.1 diff -u -p -r1.1 compression.py --- Bio/Mindy/compression.py 2002/01/28 20:55:30 1.1 +++ Bio/Mindy/compression.py 2003/08/12 19:17:44 @@ -1,4 +1,5 @@ -import commands, os +import commands +import os _uncompress_table = { ".bz": "bzip2", @@ -8,21 +9,23 @@ _uncompress_table = { ".Z": "compress", } -def open_file(filename, mode = "rb"): +def open(filename, mode="rb"): ext = os.path.splitext(filename)[1] type = _uncompress_table.get(ext) if type is None: - return open(filename, mode) + return file(filename, mode) if type == "gzip": import gzip - gzip.open(filename, mode) + return gzip.open(filename, mode) if type 
== "bzip2": - cmd = "bzcat --decompress" - cmd += commands.mkarg(filename) - return os.popen(cmd, mode) + try: + import bz2 + except ImportError: + cmd = "bzcat --decompress %s" % commands.mkarg(filename) + return os.popen(cmd, mode) + return bz2.BZ2File(filename, mode) if type == "compress": - cmd = "zcat -d" - cmd += commands.mkarg(filename) + cmd = "zcat -d %s" % commands.mkarg(filename) return os.popen(cmd, mode) raise AssertionError("What's a %r?" % type) From e.bettler at cmbi.kun.nl Wed Aug 13 17:32:24 2003 From: e.bettler at cmbi.kun.nl (Dr E Bettler) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] reference in an article Message-ID: <200308132332.24489.e.bettler@cmbi.kun.nl> Hi, we developped a project that is using Biopython libraries. In an article, how can we referenced Biopython ? just the url ? thanks, best regards, -- Dr Emmanuel BETTLER /-------------------------------/ CMBI University of Nijmegen P.O. Box 9010, 6500 GL Nijmegen, the Netherlands http://www.cmbi.kun.nl/staff/EBettler.shtml Tel. +31 (0)24 36 53338 (CMBI A-3031) +31 (0)24 36 53391 (CMBI's secretary) Fax. +31 (0)24 36 52977 Mob. +31 (0)6 25 175 619 From jefftc at stanford.edu Wed Aug 13 19:12:03 2003 From: jefftc at stanford.edu (Jeffrey Chang) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] reference in an article In-Reply-To: <200308132332.24489.e.bettler@cmbi.kun.nl> Message-ID: <8CB67A2E-CDE3-11D7-A091-000A956845CE@stanford.edu> Yes. Please cite the URL: http://www.biopython.org Jeff On Wednesday, August 13, 2003, at 02:32 PM, Dr E Bettler wrote: > Hi, > we developped a project that is using Biopython libraries. In an > article, how > can we referenced Biopython ? just the url ? > > thanks, > > best regards, > > -- > Dr Emmanuel BETTLER > /-------------------------------/ > > CMBI > University of Nijmegen > P.O. Box 9010, > 6500 GL Nijmegen, the Netherlands > http://www.cmbi.kun.nl/staff/EBettler.shtml > > Tel. 
+31 (0)24 36 53338 (CMBI A-3031) > +31 (0)24 36 53391 (CMBI's secretary) > Fax. +31 (0)24 36 52977 > Mob. +31 (0)6 25 175 619 > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Mon Aug 18 14:58:11 2003 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] [Bug 1492] New: Martel Parser fails on Bio.db["protein-genbank-cgi"] entry Message-ID: <200308181858.h7IIwBwt013428@localhost.localdomain> http://bugzilla.bioperl.org/show_bug.cgi?id=1492 Summary: Martel Parser fails on Bio.db["protein-genbank-cgi"] entry Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Martel/Mindy AssignedTo: biopython-dev@biopython.org ReportedBy: henschel@mpi-cbg.de # Hi There! #The following few lines broke on a rather simple parsing, however it looks #reproducible (using biopython 1.21 on python 2.2.2) #It works for most other entries I tested. from Bio import GenBank from Bio import db genbank=db["protein-genbank-cgi"] parser=GenBank.FeatureParser() h=genbank["3891376"] res=parser.parse(h) # Causes Error: Martel.Parser.ParserPositionException: error parsing at or #beyond character 4462 >>> print genbank["3891376"].read() # still works though! # Hope it makes sense to you # Cheers, Andreas ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
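As an aside on the Location.get_text rewrite in the Bio/Mindy patch earlier in this thread: the seek-or-chunked-read fallback it implements (seek when the file object supports it, otherwise skip forward in fixed-size chunks, as needed for pipes from zcat/bzcat) can be sketched as a standalone helper. `read_slice` is an illustrative name, not Biopython API:

```python
def read_slice(infile, startpos, length, chunksize=1048576):
    """Return `length` bytes starting at `startpos` from a file-like object.

    Seeks when possible; otherwise reads forward in `chunksize` pieces,
    mirroring the chunked fallback in the Location.get_text patch.
    """
    if hasattr(infile, "seek"):
        infile.seek(startpos)
        return infile.read(length)
    # No seek support (e.g. a pipe): discard data up to startpos in chunks
    # so we never hold more than one chunk in memory.
    count = 0
    while count + chunksize < startpos:
        infile.read(chunksize)
        count += chunksize
    infile.read(startpos - count)
    return infile.read(length)
```

With an `io.BytesIO` it takes the fast seek path; wrapping the same buffer in an object that only exposes `read` exercises the chunked path and yields the same bytes.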
From bugzilla-daemon at portal.open-bio.org Mon Aug 18 15:36:08 2003 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] [Bug 1493] New: Failure to load a MySQL database using BioSQL Message-ID: <200308181936.h7IJa84E013527@localhost.localdomain> http://bugzilla.bioperl.org/show_bug.cgi?id=1493 Summary: Failure to load a MySQL database using BioSQL Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: BioSQL AssignedTo: biopython-dev@biopython.org ReportedBy: idoerg@burnham.org Hi, Following a tutorial example on BioSQL, I ran into the following dump. Seems like the taxon_id table does not have columns "binomial" or "variant". I used the table definitions for MySQLdb off the CVS on the MySQLDb site. Iddo >>> gdb = server.new_database("px01") >>> from Bio import GenBank >>> parser = GenBank.FeatureParser() >>> iterator = GenBank.Iterator(open("/usr/home/iddo/results/anthrax_03/px01_gb"),parser) Traceback (most recent call last): File "", line 1, in ? IOError: [Errno 2] No such file or directory: '/usr/home/iddo/results/anthrax_03/px01_gb' >>> iterator =\ GenBank.Iterator(open("/usr/home/iddo/results/anthrax_03/px01_03.gb"),parser) >>> gdb.load(iterator) Traceback (most recent call last): File "", line 1, in ?
File "/home/iddo/biopy_cvs/biopython/BioSQL/BioSeqDatabase.py", line 337, in load db_loader.load_seqrecord(cur_record) File "/home/iddo/biopy_cvs/biopython/BioSQL/Loader.py", line 30, in load_seqrecord bioentry_id = self._load_bioentry_table(record) File "/home/iddo/biopy_cvs/biopython/BioSQL/Loader.py", line 173, in _load_bioentry_table taxon_id = self._get_taxon_id(record) File "/home/iddo/biopy_cvs/biopython/BioSQL/Loader.py", line 107, in _get_taxon_id taxa = self.adaptor.execute_and_fetchall(sql, (binomial, variant)) File "/home/iddo/biopy_cvs/biopython/BioSQL/BioSeqDatabase.py", line 236, in execute_and_fetchall self.cursor.execute(sql, args) File "/usr/lib/python2.2/site-packages/MySQLdb/cursors.py", line 95, in execute return self._execute(query, args) File "/usr/lib/python2.2/site-packages/MySQLdb/cursors.py", line 114, in _execute self.errorhandler(self, exc, value) File "/usr/lib/python2.2/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler raise errorclass, errorvalue _mysql_exceptions.OperationalError: (1054, "Unknown column 'binomial' in 'where clause'") >>> ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From andreas.kuntzagk at mdc-berlin.de Tue Aug 19 04:46:51 2003 From: andreas.kuntzagk at mdc-berlin.de (Andreas Kuntzagk) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] biopython into PyPi ? Message-ID: <1061282763.9530.13.camel@sulawesi> Hi, do you think it would good to add biopython to the Python Package Index? ( http://www.python.org/pypi ) This would maybe bring more developer/user to biopython ( if this is wanted.) Andreas From mdehoon at ims.u-tokyo.ac.jp Tue Aug 19 07:16:30 2003 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] biopython into PyPi ? 
In-Reply-To: <1061282763.9530.13.camel@sulawesi> References: <1061282763.9530.13.camel@sulawesi> Message-ID: <3F42070E.8090408@ims.u-tokyo.ac.jp> The PyPI page seems to describe smaller projects than Biopython, so Biopython may be out of place there. Other large Python projects are also not represented there. On the other hand, it wouldn't hurt. --Michiel. Andreas Kuntzagk wrote: > Hi, > > do you think it would good to add biopython to the Python Package Index? > ( http://www.python.org/pypi ) > This would maybe bring more developer/user to biopython ( if this is > wanted.) > > Andreas > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev > > -- Michiel de Hoon Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From bugzilla-daemon at portal.open-bio.org Tue Aug 19 17:49:04 2003 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] [Bug 1493] Failure to load a MySQL database using BioSQL Message-ID: <200308192149.h7JLn4WP019183@localhost.localdomain> http://bugzilla.bioperl.org/show_bug.cgi?id=1493 idoerg@burnham.org changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |major Summary|Failure to load a MySQL |Failure to load a MySQL |database using BioSQL |database using BioSQL ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Aug 19 18:33:05 2003 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] [Bug 1493] Failure to load a MySQL database using BioSQL Message-ID: <200308192233.h7JMX5Ge019394@localhost.localdomain> http://bugzilla.bioperl.org/show_bug.cgi?id=1493 ------- Additional Comments From chapmanb@arches.uga.edu 2003-08-19 18:33 ------- Iddo -- Yup, the Biopython BioSQL code is definitely out of phase with the current schema. I just plain haven't had time to work on it and get things back up to date. Anyone else is definitely welcome to step up. The SQL schemas in the test directory (Tests/BioSQL) are the schemas that they work with. These are reasonably recent (depending, of course, on your definition of reasonable) and should do most things for Biopython only use. That's the best alternative to updating the code that we can offer at the moment. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jchang at jeffchang.com Wed Aug 20 02:18:11 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] biopython into PyPi ? In-Reply-To: <1061282763.9530.13.camel@sulawesi> Message-ID: <1344B701-D2D6-11D7-9EA3-000A956845CE@jeffchang.com> Good idea. I tried to create an account there, but got an SMTP error. Do you (or someone) already have an account, and can maintain the PyPI entry? Otherwise, I will try again later. Jeff On Tuesday, August 19, 2003, at 01:46 AM, Andreas Kuntzagk wrote: > Hi, > > do you think it would good to add biopython to the Python Package > Index? > ( http://www.python.org/pypi ) > This would maybe bring more developer/user to biopython ( if this is > wanted.) 
> > Andreas > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From andreas.kuntzagk at mdc-berlin.de Wed Aug 20 04:17:11 2003 From: andreas.kuntzagk at mdc-berlin.de (Andreas Kuntzagk) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] biopython into PyPi ? In-Reply-To: <1344B701-D2D6-11D7-9EA3-000A956845CE@jeffchang.com> References: <1344B701-D2D6-11D7-9EA3-000A956845CE@jeffchang.com> Message-ID: <1061367382.9525.15.camel@sulawesi> On Wed, 2003-08-20 at 08:18, Jeffrey Chang wrote: > Good idea. I tried to create an account there, but got an SMTP error. Same with me. > Do you (or someone) already have an account, and can maintain the PyPI > entry? Otherwise, I will try again later. From idoerg at burnham.org Mon Aug 25 18:30:06 2003 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] Re: [BioPython] Enzyme module In-Reply-To: <3F4A75EB.4040008@burnham.org> References: <3F4A75EB.4040008@burnham.org> Message-ID: <3F4A8DEE.5070805@burnham.org> In response to myself: I am writing the consumer. Watch this space... ./I Iddo Friedberg wrote: > Hi, > > Has anybody ever used the Enzyme module and written some sort of > consumer for it? I'd like to do some very basic parsing... and steal > some code :) > > Thanks, > > Iddo > > -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 646 3171 http://ffas.ljcrf.edu/~iddo From jchang at jeffchang.com Mon Aug 25 20:12:42 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] biopython into PyPi ? In-Reply-To: <1061367382.9525.15.camel@sulawesi> Message-ID: <02B3A831-D75A-11D7-82B3-000A956845CE@jeffchang.com> The SMTP problems seem to have gone away, so I've registered Biopython under PyPI.
It was extremely easy, once my account was set up: python setup.py register It takes the metadata out of the setup.py file! Jeff On Wednesday, August 20, 2003, at 01:16 AM, Andreas Kuntzagk wrote: > On Wed, 2003-08-20 at 08:18, Jeffrey Chang wrote: >> Good idea. I tried to create an account there, but got an SMTP error. > > Same with me. > >> Do you (or someone) already have an account, and can maintain the PyPI >> entry? Otherwise, I will try again later. > From bugzilla-daemon at portal.open-bio.org Fri Aug 29 05:32:43 2003 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:26 2005 Subject: [Biopython-dev] [Bug 1492] Martel Parser fails on Bio.db["protein-genbank-cgi"] entry Message-ID: <200308290932.h7T9Whud002272@localhost.localdomain> http://bugzilla.bioperl.org/show_bug.cgi?id=1492 Peter.Bienstman@ugent.be changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Additional Comments From Peter.Bienstman@ugent.be 2003-08-29 05:32 ------- Fixed in current CVS (added keywords 'het' and 'heterogen') ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
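The compression.py rewrites posted in this thread all turn on one idea: dispatch on the filename extension and prefer an in-process library (gzip, bz2) over a zcat/bzcat pipe. A minimal sketch of that dispatch with the modern standard library — the extension table below is trimmed to the formats Python itself handles, and `open_compressed` is an illustrative name, not the Bio.Mindy API:

```python
import bz2
import gzip
import os

# Extension -> opener, along the lines of _uncompress_table in the patch.
_openers = {
    ".gz": gzip.open,
    ".bz2": bz2.BZ2File,
}

def open_compressed(filename, mode="rb"):
    """Open filename, transparently decompressing based on its extension.

    Unknown extensions fall through to a plain open(), as in the patch.
    """
    ext = os.path.splitext(filename)[1]
    opener = _openers.get(ext)
    if opener is None:
        return open(filename, mode)
    return opener(filename, mode)
```

The same shape extends to formats without a stdlib module (.Z) by having the table map to a function that spawns the external decompressor, which is exactly the fallback the patch keeps for compress.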