From JBonis at imim.es  Tue Mar  2 12:46:21 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Tue Mar  2 12:54:30 2004
Subject: [BioPython] Martel problems and MS-Windows
Message-ID: <10708933FD0CD511BCFE000102BE5F7202CB4047@hpserv.imim.es>

Hi everyone,

I have tried to install Biopython. It seems to work properly, but when
trying the example from the cookbook the Martel module fails (or this is
what I think).

When typing: GenBank.FeatureParser() I get the error: "ImportError: No
module named Martel". Of course, trying 'import Martel' doesn't work
either.

I have installed:

python2.3
mx-base 2.0.5 for python 2.3
numarray 0.8 for python 2.3
biopython1.24 for python2.3

following the installation instructions.

I can't find any Windows installer for Martel.

Should I enter the complex world of compiling Python under Windows? Any
ideas, howto, or tutorial?

From absmythe at ucdavis.edu  Tue Mar  2 13:05:45 2004
From: absmythe at ucdavis.edu (ashleigh smythe)
Date: Tue Mar  2 13:11:36 2004
Subject: [BioPython] problems parsing alignment files
Message-ID: <1078250745.16948.193.camel@nate.ucdavis.edu>

Hello biopythoneers! I've been having problems parsing alignments, but
I can parse fasta files with no problem so I don't think I'm totally off
base. I can't get either the NBRF parser or the Clustalw parser to
work. I updated biopython on February 27th and just updated Martel
- both via the command "CVS update", but I guess it looks like a Martel
problem. I think the NBRF (.pir) files and .aln files I'm using are
fine as they open with no problem directly in ClustalX.

Here's the traceback for NBRF:

>>> from Bio import NBRF
>>> parser=NBRF.RecordParser()
>>> afile=open('allopen6ext2.pir')
>>> iterator=NBRF.Iterator(afile, parser)
>>> cur_record=iterator.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.2/site-packages/Bio/NBRF/__init__.py", line 63, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.2/site-packages/Bio/NBRF/__init__.py", line 158, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.2/site-packages/Bio/NBRF/__init__.py", line 108, in feed
    self._parser.parseFile(handle)
  File "/usr/lib/python2.2/site-packages/Martel/Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "/usr/lib/python2.2/site-packages/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 2517

And here's the one from Clustalw (I'm not sure if I've got the syntax
right on this as it doesn't seem to have the same RecordParser as NBRF
or Fasta):

>>> from Bio import Clustalw
>>> dir(Clustalw)
['Alignment', 'Alphabet', 'ClustalAlignment', 'IUPAC',
'MultipleAlignCL', 'Seq', 'SeqRecord', '_AlignCreator', '__all__',
'__builtins__', '__doc__', '__file__', '__name__', '__path__',
'clustal_format', 'do_alignment', 'handler', 'os', 'parse_file',
'saxutils', 'string']
>>> parser=Clustalw.parse_file('allopen4ext2.aln')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.2/site-packages/Bio/Clustalw/__init__.py", line 60, in parse_file
    parser.parseFile(to_parse)
  File "/usr/lib/python2.2/site-packages/Martel/Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "/usr/lib/python2.2/site-packages/Martel/Parser.py", line 361, in parseString
    self._err_handler.fatalError(ParserIncompleteException(pos))
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserIncompleteException: error parsing at or beyond character 8311 (unparsed text remains)
>>>

Thanks for any help that anybody can provide!
Ashleigh

--
****************************
Ashleigh B. Smythe
Graduate Research Assistant
Department of Nematology
University of California, Davis, CA 95616
email:absmythe@ucdavis.edu
phone:530-754-4321

From chapmanb at uga.edu  Tue Mar  2 14:42:53 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Mar  2 14:55:05 2004
Subject: [BioPython] Martel problems and MS-Windows
In-Reply-To: <10708933FD0CD511BCFE000102BE5F7202CB4047@hpserv.imim.es>
References: <10708933FD0CD511BCFE000102BE5F7202CB4047@hpserv.imim.es>
Message-ID: <20040302194253.GQ24150@evostick.agtec.uga.edu>

Hello;

> I have tried to install biopython. It seem to works propertly, but when
> trying the example of the cookbook the Martel module fails (or this is what
> I think).
>
> When typing: GenBank.FeatureParser() I get the error: "ImportError: No
> module named Martel".

It looks like you did everything perfectly fine -- I believe the
problem is that Martel (the underlying parsing engine) was
accidentally left out of the Windows installer.

Michiel kindly provided the installers, so maybe he'd be willing to
double-check if it might have been left out -- since I'm not much
of a Windows person myself.

(As an aside to Michiel -- I think this problem is caused by the way
that the setup.py works. Specifically, it only installs Martel if the
version number of the "distributed Martel" is bigger than the
"installed Martel." I put a note in the release instructions and
will fix this in the future by bumping the Martel version on each
release. This also ensures that any fixes/changes to Martel get
installed).

So yes, it looks like a problem on our part, and hopefully we can
get it sorted for you. Thanks for reporting it and sorry for any
confusion.
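
To make the aside about the version check concrete, here is a minimal
sketch of that kind of comparison. It is not the actual Biopython
setup.py code: parse_version, should_install and the allow_equal flag
are hypothetical names made up for illustration, and the dotted version
strings are assumed to contain only integers.

```python
# Hypothetical sketch of a "only install if newer" check.
# None of these names come from Biopython's real setup.py.

def parse_version(text):
    """Turn a dotted version string such as '0.84' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def should_install(distributed, installed, allow_equal=False):
    """Decide whether the bundled package should replace the installed one.

    distributed -- version string shipped with the release
    installed   -- version string found on the machine, or None if absent
    allow_equal -- illustrative option: also install when versions match
    """
    if installed is None:
        # Nothing on the machine yet, so always install.
        return True
    new, old = parse_version(distributed), parse_version(installed)
    if allow_equal:
        return new >= old
    return new > old
```

With the strict comparison, a build machine that already has the same
Martel version skips the install, which would be consistent with Martel
going missing from an installer built on such a machine.
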
Brad

From chapmanb at uga.edu  Tue Mar  2 14:48:35 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Mar  2 15:00:45 2004
Subject: [BioPython] problems parsing alignment files
In-Reply-To: <1078250745.16948.193.camel@nate.ucdavis.edu>
References: <1078250745.16948.193.camel@nate.ucdavis.edu>
Message-ID: <20040302194835.GR24150@evostick.agtec.uga.edu>

Hi Ashleigh;

> Hello biopythoneers! I've been having problems parsing alignments, but
> I can parse fasta files with no problem so I don't think I'm totally off
> base. I can't get either the NBRF parser or the Clustalw parser to
> work. I have updated biopython on February 27th and just updated Martel
> - both via the command "CVS update", but I guess it looks like a Martel
> problem. I think the NBRF (.pir) files and .aln files I'm using are
> fine as they open with no problem directly in ClustalX.
>
> Here's the traceback for NBRF:
[...]
> And here's the one from Clustalw (I'm not sure if I've got the syntax
> right on this as it doesn't seem to have the same RecordParser as NBRF
> or Fasta):
[...]

Yes, it looks like you are doing everything right -- the tracebacks
indicate that the underlying Martel parser is having trouble
dealing with the files you are providing, for some reason.

If you could send me the NBRF and ALN files (off-list), I can
have a look and see if I can figure out where they are failing and try
to get everything fixed.

Also, that is a good point about the Clustalw parser not having the
same syntax as the other parsers. Thanks for bringing that up.

Please do send the files and thanks for the bug report.

Brad

From robert.roth at home.se  Tue Mar  2 15:00:03 2004
From: robert.roth at home.se (Robert Roth)
Date: Tue Mar  2 15:08:59 2004
Subject: [BioPython] Martel problems and MS-Windows
Message-ID: <1078257603.44835580robert.roth@home.se>

Hi Julio,

>I have tried to install biopython.
>It seem to works
>propertly, but when
>trying the example of the cookbook the Martel module
>fails (or this is what
>I think).
>
>When typing: GenBank.FeatureParser() I get the
>error: "ImportError: No
>module named Martel".
>
>I dont find any windows installer for Marvel.
>
>Shoudl I enter in the complex world of compiling py
>under windows?

I ran into similar problems when running some tests after
installing Biopython 1.24 on Windows. The tests fail and
give a dead link (I don't remember the link exactly,
www.biopython.org/~dalke/Martel something).

I solved this by downloading the source for Martel from
CVS and installing it with

python setup.py install

Not very complex, and all tests worked after this.

Hope this helps.

/Robert

From chapmanb at uga.edu  Tue Mar  2 18:15:15 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Mar  2 18:27:24 2004
Subject: [BioPython] problems parsing alignment files
In-Reply-To: <20040302194835.GR24150@evostick.agtec.uga.edu>
References: <1078250745.16948.193.camel@nate.ucdavis.edu>
    <20040302194835.GR24150@evostick.agtec.uga.edu>
Message-ID: <20040302231515.GB49406@evostick.agtec.uga.edu>

Hey all;

[Ashleigh reports errors in parsing Clustalw and PIR files]

Me:
> If you could send me the NBRF and ALN files (off-list), I can
> have a look and see if I can figure where they are failing and try
> to get everything fixed.

To follow up to the list -- I got the files and just now squished
the bugs. Here were the problems:

=> The clustalw format didn't accept question marks in title names.

=> The NBRF/PIR parser didn't handle Clustalw format PIR, which is
valid but different than the PIR files you can download.

Fixes for both of these are in CVS and I also squashed some other
bugs in the PIR parsing that I found.

Thanks again to Ashleigh for the report.
Brad

From mdehoon at ims.u-tokyo.ac.jp  Tue Mar  2 20:43:37 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Tue Mar  2 20:50:08 2004
Subject: [BioPython] Martel problems and MS-Windows
In-Reply-To: <20040302194253.GQ24150@evostick.agtec.uga.edu>
References: <10708933FD0CD511BCFE000102BE5F7202CB4047@hpserv.imim.es>
    <20040302194253.GQ24150@evostick.agtec.uga.edu>
Message-ID: <40453849.8050402@ims.u-tokyo.ac.jp>

You are right, Martel did get skipped because it was already present on my
machine. Why is it necessary to check version number of Martel? Is it
possible that a version of Martel is already present that is newer than the
one included with Biopython?

--Michiel.

Brad Chapman wrote:

> Hello;
>
>
>>I have tried to install biopython. It seem to works propertly, but when
>>trying the example of the cookbook the Martel module fails (or this is what
>>I think).
>>
>>When typing: GenBank.FeatureParser() I get the error: "ImportError: No
>>module named Martel".
>
>
> It looks like you did everything perfectly find -- I believe the
> problem is that Martel (the underlying parsing engine) was
> accidentally left out of the windows installer.
>
> Michiel kindly provided the installers, so maybe he'd be willing to
> double-check if the might have been left out -- since I'm not much
> of a Windows person myself.
>
> (As an aside to Michiel -- I think this problem is caused by the way
> that the setup.py works. Specifically, only installing Martel if the
> version number of the "distributed Martel" is bigger than the
> "installed Martel." I put a note in the release instructions and
> will fix this in the future by bumping the Martel version on each
> release. This also ensures that any fixes/changes to Martel get
> installed).
>
> So yes, it looks like a problem on our part, and hopefully we can
> get it sorted for you. Thanks for reporting it and sorry for any
> confusion.
>
> Brad
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
>
>

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From jeffrey_chang at stanfordalumni.org  Wed Mar  3 08:56:43 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Wed Mar  3 09:02:38 2004
Subject: [BioPython] Martel problems and MS-Windows
In-Reply-To: <40453849.8050402@ims.u-tokyo.ac.jp>
References: <10708933FD0CD511BCFE000102BE5F7202CB4047@hpserv.imim.es>
    <20040302194253.GQ24150@evostick.agtec.uga.edu>
    <40453849.8050402@ims.u-tokyo.ac.jp>
Message-ID: <9AB287DF-6D1A-11D8-869E-000A956845CE@stanfordalumni.org>

This was my doing. At the time (and still now, I believe), Andrew
also wanted to distribute Martel as a separate package that people can
use outside of Biopython. Thus, the CVS version, and possible
distributions, were ahead of Biopython. But now that it appears that
Martel development is stabilizing, perhaps it is time to revisit that
decision.

Jeff


On Mar 2, 2004, at 8:43 PM, Michiel Jan Laurens de Hoon wrote:

> You are right, Martel did get skipped because it was already present
> on my machine. Why is it necessary to check version number of Martel?
> Is it possible that a version of Martel is already present that is
> newer than the one included with Biopython?
>
> --Michiel.
>
> Brad Chapman wrote:
>
>> Hello;
>>> I have tried to install biopython. It seem to works propertly, but
>>> when
>>> trying the example of the cookbook the Martel module fails (or this
>>> is what
>>> I think).
>>>
>>> When typing: GenBank.FeatureParser() I get the error: "ImportError:
>>> No
>>> module named Martel".
>> It looks like you did everything perfectly find -- I believe the
>> problem is that Martel (the underlying parsing engine) was
>> accidentally left out of the windows installer.
>> Michiel kindly provided the installers, so maybe he'd be willing to
>> double-check if the might have been left out -- since I'm not much
>> of a Windows person myself.
>> (As an aside to Michiel -- I think this problem is caused by the way
>> that the setup.py works.
>> Specifically, only installing Martel if the
>> version number of the "distributed Martel" is bigger than the
>> "installed Martel." I put a note in the release instructions and
>> will fix this in the future by bumping the Martel version on each
>> release. This also ensures that any fixes/changes to Martel get
>> installed).
>> So yes, it looks like a problem on our part, and hopefully we can
>> get it sorted for you. Thanks for reporting it and sorry for any
>> confusion.
>> Brad
>> _______________________________________________
>> BioPython mailing list  -  BioPython@biopython.org
>> http://biopython.org/mailman/listinfo/biopython
>
> --
> Michiel de Hoon, Assistant Professor
> University of Tokyo, Institute of Medical Science
> Human Genome Center
> 4-6-1 Shirokane-dai, Minato-ku
> Tokyo 108-8639
> Japan
> http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
>
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython

From chapmanb at uga.edu  Wed Mar  3 11:00:39 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar  3 11:12:46 2004
Subject: [BioPython] Martel problems and MS-Windows
In-Reply-To: <9AB287DF-6D1A-11D8-869E-000A956845CE@stanfordalumni.org>
References: <10708933FD0CD511BCFE000102BE5F7202CB4047@hpserv.imim.es>
    <20040302194253.GQ24150@evostick.agtec.uga.edu>
    <40453849.8050402@ims.u-tokyo.ac.jp>
    <9AB287DF-6D1A-11D8-869E-000A956845CE@stanfordalumni.org>
Message-ID: <20040303160039.GB51845@evostick.agtec.uga.edu>

Hey guys;

Michiel:
> >You are right, Martel did get skipped because it was already present
> >on my machine. Why is it necessary to check version number of Martel?
> >Is it possible that a version of Martel is already present that is
> >newer than the one included with Biopython?

Jeff:
> This was my doing. At the time (and still now, I believe), Andrew
> wanted also to distribute Martel as a separate package, that people can
> use outside of Biopython.
> So thus, the CVS version, and possible
> distributions, were ahead of Biopython. But now that it appears that
> Martel development is stabilizing, perhaps it is time to revisit that
> decision.

My vote is that it's not really necessary, since I thought that
distutils would already do the work of determining if the files to
be installed are newer than the stuff in site-packages. Of course, I
guess this is based on time-stamps so it's not a perfect system, but it
should work for most cases and make things a little less confusing.

On the other hand, it's not too tough to bump the version on making
new releases or if new changes go into CVS -- I think I was just
sleeping (well, really studying for exams) during the whole Martel
install debate a while back, so I didn't know to do this on the last
release. I added it to the build release directions, so it shouldn't
be a problem in the future if we leave things the way they are.

By the way, Michiel -- the quick fix is to just remove Martel from
your site-packages before building the installer. We'll have a real
fix of some sort for the next release.

Brad

From ralf.sigmund at ipk-gatersleben.de  Wed Mar  3 13:21:22 2004
From: ralf.sigmund at ipk-gatersleben.de (Ralf Sigmund)
Date: Wed Mar  3 13:27:13 2004
Subject: [BioPython] wu-blast 2.0 standalone parser?
Message-ID: <40462222.5060603@ipk-gatersleben.de>

Hi,

are there any efforts to build a pythonic parser for the licensed
version of Washington University wu-blast 2.0 standalone?
should this be done using the event based martel parser architecture?

is there a martel-based parser for ncbi blast?

there are production quality parsers for these formats in BioPerl.
Would you favour calling Bioperl from Biopython over reimplementing a
wu-blast parser in biopython?

slightly off topic - does it still make sense to support both wu and
ncbi blast?

cheers
Ralf

From mdehoon at ims.u-tokyo.ac.jp  Wed Mar  3 21:34:22 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Wed Mar  3 21:40:52 2004
Subject: [BioPython] Martel problems and MS-Windows
In-Reply-To: <20040303160039.GB51845@evostick.agtec.uga.edu>
References: <10708933FD0CD511BCFE000102BE5F7202CB4047@hpserv.imim.es>
    <20040302194253.GQ24150@evostick.agtec.uga.edu>
    <40453849.8050402@ims.u-tokyo.ac.jp>
    <9AB287DF-6D1A-11D8-869E-000A956845CE@stanfordalumni.org>
    <20040303160039.GB51845@evostick.agtec.uga.edu>
Message-ID: <404695AE.3070100@ims.u-tokyo.ac.jp>

I have put fixed Windows installers that now contain Martel on the biopython
website. I am bound to forget to delete Martel on my machine again for the
next biopython release, so I also vote to remove the version check. Another
option would be to install Martel if the distributed version is newer than
*or equal to* the installed version.

--Michiel.

Brad Chapman wrote:
> Hey guys;
>
> Michiel:
>
>>>You are right, Martel did get skipped because it was already present
>>>on my machine. Why is it necessary to check version number of Martel?
>>>Is it possible that a version of Martel is already present that is
>>>newer than the one included with Biopython?
>
>
> Jeff:
>
>>This was my doing. At the time (and still now, I believe), Andrew
>>wanted also to distribute Martel as a separate package, that people can
>>use outside of Biopython. So thus, the CVS version, and possible
>>distributions, were ahead of Biopython. But now that it appears that
>>Martel development is stabilizing, perhaps it is time to revisit that
>>decision.
>
>
> My vote is that it's not really necessary, since I thought that
> distutils would already do the work of determining if the files to
> be installed are newer than the stuff in site-packages. Of course, I
> guess this based on time-stamps so it's not a perfect system, but
> should work for most cases and make things a little less confusing.
>
> On the other hand, it's not too tough to bump the version on making
> new releases or if new changes go into CVS -- I think I was just
> sleeping (well, really studying for exams) during the whole Martel
> install debate a while back, so I didn't know to do this on the last
> release. I added it to the build release directions, so it shouldn't
> be a problem in the future if we leave things the way they are.
>
> By the way, Michiel -- the quick fix is to just remove Martel from
> your site-packages before building the installer. We'll have a real
> fix of some sort for the next release.
>
> Brad
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
>
>

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From anunberg at oriongenomics.com  Thu Mar  4 11:02:33 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Thu Mar  4 11:12:31 2004
Subject: [BioPython] Question about indexing fasta files
Message-ID: 

Hey,
Does the fasta indexer normally convert sequence to all upper case
automatically? I have sequences with lowercase masking and want to
preserve that when fetching sequences.

Andy
--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From chapmanb at uga.edu  Thu Mar  4 12:01:45 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Thu Mar  4 12:13:49 2004
Subject: [BioPython] Martel problems and MS-Windows
In-Reply-To: <404695AE.3070100@ims.u-tokyo.ac.jp>
References: <10708933FD0CD511BCFE000102BE5F7202CB4047@hpserv.imim.es>
    <20040302194253.GQ24150@evostick.agtec.uga.edu>
    <40453849.8050402@ims.u-tokyo.ac.jp>
    <9AB287DF-6D1A-11D8-869E-000A956845CE@stanfordalumni.org>
    <20040303160039.GB51845@evostick.agtec.uga.edu>
    <404695AE.3070100@ims.u-tokyo.ac.jp>
Message-ID: <20040304170144.GA810@evostick.agtec.uga.edu>

Michiel:
> I have put fixed Windows installers that now contain Martel on the
> biopython website.

Thanks much for updating them. Julio, Robert, any other Windows
installer users -- please do try the new installers and let us know
if you have any problems.

> I am bound to forget to delete Martel on my machine
> again for the next biopython release, so I also vote to remove the version
> check. Another option would be to install Martel if the distributed version
> is newer than *or equal to* the installed version.

Okay, that seems like a reasonable compromise and covers us if we
forget to bump the version. I just checked this into CVS --
hopefully it'll eliminate problems in the future.

Thanks again for the installers.
Brad

From chapmanb at uga.edu  Thu Mar  4 12:08:54 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Thu Mar  4 12:20:59 2004
Subject: [BioPython] wu-blast 2.0 standalone parser?
In-Reply-To: <40462222.5060603@ipk-gatersleben.de>
References: <40462222.5060603@ipk-gatersleben.de>
Message-ID: <20040304170854.GB810@evostick.agtec.uga.edu>

Hi Ralf;

> are there any efforts to build a pythonic parser for the licensed
> version of wahington university wu-blast 2.0 standalone?
> should this be done using the event based martel parser architecture?

There is some initial effort -- Andrew wrote a wu-blast grammar for
Martel, which you can find in Bio/expressions/blast/wublast.py.
There is no "Biopython-like" framework built up around that yet,
however, just the grammar.

> is there a martel - based parser for ncbi blast?

Yes, there is also Martel code for NCBI blast in the same directory
-- Bio/expressions/blast/ncbiblast.py

The Bio.Blast code does not use Martel currently -- at some time in
the future I'd like to see it integrated, if only to keep the
ncbiblast.py code up to date and to speed up parsing.

> slightly off topic - does it still make sense to support both wu and
> ncbi blast?

Sure, we are happy to support whatever format people are still
using. As with anything, it just takes someone willing to put the
time into producing and maintaining a quality parser.

Hope this helps.
Brad

From chapmanb at uga.edu  Thu Mar  4 12:17:08 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Thu Mar  4 12:29:11 2004
Subject: [BioPython] Question about indexing fasta files
In-Reply-To: 
References: 
Message-ID: <20040304171708.GC810@evostick.agtec.uga.edu>

Hi Andy;

> Does the fasta indexer normally convert sequence to all upper case
> automatically? I have seq with lc masking and want to preserver that when
> fetching sequences..

No, the indexer doesn't do any conversion of the sequence
information -- you should get it back from a RecordParser or
SequenceParser in the same case as you put it in.

Hope this helps.
Brad

From chapmanb at uga.edu  Thu Mar  4 12:35:25 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Thu Mar  4 12:47:31 2004
Subject: [BioPython] Re: biopython install problems
In-Reply-To: <40474AF8.1010404@daimi.au.dk>
References: <40474AF8.1010404@daimi.au.dk>
Message-ID: <20040304173525.GD810@evostick.agtec.uga.edu>

Hi Jakob;
[Copying to the Biopython list]

> I'm following the instructions at
>
> http://www.biopython.org/docs/install/Installation.html
>
> python setup.py install --home=~/usr
>
> to install the package under my ~/usr directory, and it works fine (the
> package is installed in ~/usr/lib/python).
>
> I do the exact same thing with the Numeric package and it does put a
> directory called Numeric (along with a Numeric.pth file) where I
> expected, but it also puts some other files in ~/usr/include/python.
>
> And when I try to import this package, I get
>
> Python 2.3.2 (#1, Nov 13 2003, 20:02:35)
> [GCC 3.3.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import mx
> >>> from Numeric import *
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ImportError: No module named Numeric
> >>>
>
> When I look in the ~/usr/lib/python/Numeric directory, I find no
> __init__.py file as under ~/usr/lib/python/mx. Does that have anything
> to do with the problem, and do you have other suggestions?

Yes, the problem is that Numeric uses that Numeric.pth file to
basically change the sys.path to include its directory for
importing. The problem is that *.pth files only work in the
"official" site-packages directory (/usr/local/lib/python2.3/site-packages
type places).

The fix, as you mention, is just to add an __init__.py to the
Numeric directory to make it importable as a package ('touch
__init__.py' in that directory works fine).
Then if ~/usr/lib/Numeric is in your
PYTHONPATH, everything should work smoothly:

>>> import Numeric
>>> Numeric.__file__
'/usr/home/chapmanb/usr/lib/python/Numeric/__init__.py'
>>>

Hope this makes sense and sorry about the confusion.
Brad

From chili at daimi.au.dk  Fri Mar  5 03:34:56 2004
From: chili at daimi.au.dk (Jakob Fredslund)
Date: Fri Mar  5 03:40:50 2004
Subject: [BioPython] Re: biopython install problems
In-Reply-To: <20040304173525.GD810@evostick.agtec.uga.edu>
References: <40474AF8.1010404@daimi.au.dk>
    <20040304173525.GD810@evostick.agtec.uga.edu>
Message-ID: <40483BB0.5080208@daimi.au.dk>

Brad Chapman wrote:
> Hi Jacob;
> [Copying to the Biopython list]
>
>
>>I'm following the instructions at
>>
>>http://www.biopython.org/docs/install/Installation.html
>>
>>python setup.py install --home=~/usr
>>
>>to install the package under my ~/usr directory, and it works fine (the
>>package is installed in ~/usr/lib/python).
>>
>>I do the exact same thing with the Numeric package and it does put a
>>directory called Numeric (along with a Numeric.pth file) where I
>>expected, but it also puts some other files in ~/usr/include/python.
>>
>>And when I try to import this package, I get
>>
>>Python 2.3.2 (#1, Nov 13 2003, 20:02:35)
>>[GCC 3.3.1] on linux2
>>Type "help", "copyright", "credits" or "license" for more information.
>> >>> import mx
>> >>> from Numeric import *
>>
>>Traceback (most recent call last):
>>  File "<stdin>", line 1, in ?
>>ImportError: No module named Numeric
>>
>>When I look in the ~/usr/lib/python/Numeric directory, I find no
>>__init__.py file as under ~/usr/lib/python/mx. Does that have anything
>>to do with the problem, and do you have other suggestions?
>
>
> Yes, the problem is that Numeric uses that Numeric.pth file to
> basically change the sys.path to include it's directory for
> importing. The problem is, that *.pth files only work in the
> "official" site-packages directory (/usr/local/lib/python2.3/site-packages
> type places).
>
> The fix, as you mention, is just to add an __init__.py to the
> Numeric directory to make it executable ('touch __init__.py' in that
> directory works file). Then if ~/usr/lib/Numeric is in your
> PYTHONPATH, everything should work smoothly:
>
>
> >>> import Numeric
> >>> Numeric.__file__
>
> '/usr/home/chapmanb/usr/lib/python/Numeric/__init__.py'
>
>
> Hope this makes sense and sorry about the confusion.
> Brad

Hi Brad,

Yup, that solved the problem. Thanks-a-many!

jakob

--
Jakob Fredslund, PhD.
Assistant Professor
Bioinformatics Research Center
University of Aarhus
Denmark

From postmaster at tenere.ucam-campos.br  Sat Mar  6 18:56:04 2004
From: postmaster at tenere.ucam-campos.br (postmaster@tenere.ucam-campos.br)
Date: Sat Mar  6 17:56:04 2004
Subject: [BioPython] VIRUS IN YOUR MAIL TO marilia
Message-ID: <200403062356.i26Nu4Y5013056@tenere.ucam-campos.br>

V I R U S  A L E R T

Our viruschecker found a VIRUS in your email to "marilia".
We stopped delivery of this email!

Now it is on you to check your system for viruses

For further information about this viruschecker see:
http://amavis.org/
AMaViS - A Mail Virus Scanner, licenced GPL

For your reference, here are the headers from your email:

------------------------- BEGIN HEADERS -----------------------------
Return-Path:
Received: from ucam-campos.br (dialup-200-184-36-241.intelignet.com.br [200.184.36.241] (may be forged))
    by tenere.ucam-campos.br (8.12.8/8.12.4) with ESMTP id i26NtrAR012946
    for ; Sat, 6 Mar 2004 20:55:54 -0300
Posted-Date: Sat, 6 Mar 2004 20:55:53 -0300
Received-Date: Sat, 6 Mar 2004 20:55:54 -0300
Message-Id: <200403062355.i26NtrAR012946@tenere.ucam-campos.br>
From: biopython@biopython.org
To: marilia@ucam-campos.br
Subject: Re: Excel file
Date: Sat, 6 Mar 2004 19:49:21 -0300
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="----=_NextPart_000_0002_00007064.00000238"
X-Priority: 3
X-MSMail-Priority: Normal
X-AntiVirus: scanned for viruses by AMaViS 0.2.1 (http://amavis.org/)
-------------------------- END HEADERS ------------------------------

From letondal at pasteur.fr  Sun Mar  7 05:50:36 2004
From: letondal at pasteur.fr (Catherine Letondal)
Date: Sun Mar  7 05:56:20 2004
Subject: [BioPython] problem in NCBI Blast Parser ?
Message-ID: <20040307115036.A122645@electre.pasteur.fr>

Hi,

It seems that the Blast.NCBIWWW parser has a problem?

Traceback (most recent call last):
  File "blast_swissprot_parse.py", line 6, in ?
    record = blast_parser.parse(blast_results)
  File "/home1/sis/njoly/tmp/bp/biopython-1.24/build/lib.osf1-V5.1-alpha-2.3/Bio/Blast/NCBIWWW.py", line 48, in parse
  File "/home1/sis/njoly/tmp/bp/biopython-1.24/build/lib.osf1-V5.1-alpha-2.3/Bio/Blast/NCBIWWW.py", line 97, in feed
  File "/home1/sis/njoly/tmp/bp/biopython-1.24/build/lib.osf1-V5.1-alpha-2.3/Bio/ParserSupport.py", line 335, in read_and_call_until
  File "/home1/sis/njoly/tmp/bp/biopython-1.24/build/lib.osf1-V5.1-alpha-2.3/Bio/ParserSupport.py", line 411, in safe_readline
SyntaxError: Unexpected end of stream.

The file handle that was passed to the parse() method was on a file
just saved with the NCBIWWW.blast.read() method

(as explained in tutorials, either ours or the official one:
http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html
exercises 11.16 and 11.17.
)

Thanks a lot, it is for a course next Tuesday :-(

--
Catherine Letondal -- Pasteur Institute Computing Center

From ericbrown99 at yahoo.com.au  Sun Mar  7 20:42:47 2004
From: ericbrown99 at yahoo.com.au (=?iso-8859-1?q?Eric=20Brown?=)
Date: Sun Mar  7 20:48:28 2004
Subject: [BioPython] Why Perl
Message-ID: <20040308014247.91903.qmail@web41207.mail.yahoo.com>

Dear All,

I am new to the discussion group. I am a student in comparative
genomics and know a bit of Python and Java. Would anybody advise me
about choosing between Perl and Python for bioinformatics
applications? I guess user base, resource richness, job prospects
and language strength are the issues for me to compare before I
start with one.
I will go with the maturity of modules and user base.

With thanks and regards,
Eric
From chapmanb at uga.edu  Mon Mar  8 14:36:35 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Mon Mar  8 14:48:26 2004
Subject: [BioPython] problem in NCBI Blast Parser ?
In-Reply-To: <20040307115036.A122645@electre.pasteur.fr>
References: <20040307115036.A122645@electre.pasteur.fr>
Message-ID: <20040308193635.GJ3423@evostick.agtec.uga.edu>

Hi Catherine;

> It seems that the Blast.NCBIWWW parser has a problem?
[...]
> The file handler that was passed to the parse() method was on a file
> just saved with the NCBIWWW.blast.read() method
> 
> (as explained in tutorials, either ours our the official one:
> http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html
> exercises 11.16 and 11.17.

The problem really isn't in the parser, but rather in the
NCBIWWW.blast call. Calling blast in the way you do in the tutorial
gives back Text output (
<pre> tags and the raw text) rather than the
expected HTML. Thus, the NCBIWWW parser chokes on the file, since it
is not really the HTML it expects.

It turns out that the problem was that the format_type parameter of
blast defaulted to 'html'. NCBI no longer accepts this and only
takes 'HTML'. I updated the blast function so that it now has 'HTML'
as the default, so you can either update from CVS or pass
format_type = "HTML" as an argument to the blast call.
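
For anyone who cannot update right away, a quick sanity check on the
saved results before handing them to the parser might look like the
sketch below. It is only an illustration: both function names are made
up for this example and are not part of Biopython, and the heuristic
assumes that HTML output carries an <html> tag near the top while the
Text output is just a bare <pre> block.

```python
def looks_like_html(text):
    """Return True if a saved BLAST result appears to be HTML.

    Heuristic only (an assumption, not Biopython behaviour): look for
    an <html> tag near the start of the saved output.
    """
    head = text[:2000].lower()
    return "<html" in head


def check_saved_result(path):
    """Read a saved result and fail early if it is not HTML.

    Hypothetical helper: raises ValueError with a hint about the
    format_type argument instead of letting the parser hit an
    unexpected end of stream later on.
    """
    with open(path) as handle:
        text = handle.read()
    if not looks_like_html(text):
        raise ValueError(
            '%s does not look like HTML; try format_type="HTML"' % path
        )
    return text
```

Failing early this way gives a clearer message than the parser's
"Unexpected end of stream" when the wrong output format was saved.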

After these changes, the parser seems to work fine. Thanks for the
report and hope this helps.
Brad
From anunberg at oriongenomics.com  Mon Mar  8 17:23:01 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Mon Mar  8 17:26:56 2004
Subject: [BioPython] Updates to CVS -- please do test
In-Reply-To: <20040229223143.GJ24150@evostick.agtec.uga.edu>
Message-ID: 

I tried running my fasta indexing script which normally works and got the
following error
File "/loginhome/anunberg/bin/index_fasta.py", line 39, in ?
    main()
  File "/loginhome/anunberg/bin/index_fasta.py", line 36, in main
    Fasta.index_file(fasta_file,options.name,get_id)
  File "/compbio/lib/python/Bio/Fasta/__init__.py", line 229, in index_file
    SimpleSeqRecord.create_berkeleydb([filename], indexname, indexer)
  File "/compbio/lib/python/Bio/Mindy/SimpleSeqRecord.py", line 98, in
create_berkeleydb
    from Bio.Mindy import BerkeleyDB
  File "/compbio/lib/python/Bio/Mindy/BerkeleyDB.py", line 2, in ?
    from bsddb3 import db
ImportError: No module named bsddb3

Do I need to install something? I did a check out of cvs and installed it..
Andy
> Hello everyone;
> I wanted to write a quick mail because I made a number of changes to
> CVS. Specifically, I did some work on the GenBank parser and then
> checked in the new Martel-based Fasta parser I wrote about last
> week. Since I know these are some of the more widely used modules in
> Biopython, this might make the CVS a little more unstable (well,
> potentially having more bugs) than normal.
> 
> I'd appreciate it if interested people would check it out and give
> the new modules and changes a spin. If we can catch and squish bugs
> now, then it'll help make the next release smooth as normal.
> 
> For the curious, here are the major changes:
> 
> -> Checked in the Fasta parser using Martel. The gruesome details
> were described here:
> 
> http://portal.open-bio.org/pipermail/biopython/2004-February/001877.html
> 
> -> Updated the GenBank parser, specifically the Martel GenBank
> format. This involved several things:
> * Removing the restricted list of names of feature and qualifier
>   keys. We now use more general regular expressions. Hopefully
>   this will make life easier for developers and users.
> * Adding useful bits of code from the redundant
>   Bio/expressions/genbank.py, which is Andrew's take on the
>   GenBank parsing problem.
> * Moved Bio/GenBank/genbank_format.py to
>   Bio/expressions/genbank.py to keep the Martel formats together.
> 
> -> Misc fixes to GenBank/__init__.py to make the new changes work.
> 
> Thanks in advance for testing and reporting bugs!
> Brad
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
> 

-- 
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com


From chapmanb at uga.edu  Mon Mar  8 21:02:54 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Mon Mar  8 21:14:45 2004
Subject: [BioPython] Updates to CVS -- please do test
In-Reply-To: 
References: <20040229223143.GJ24150@evostick.agtec.uga.edu>
	
Message-ID: <20040309020254.GB30775@evostick.agtec.uga.edu>

Hi Andy;
Thanks for checking out the changes in CVS.

> I tried running my fasta indexing script which normally works 
> and got the following error
[...]
> ImportError: No module named bsddb3
> 
> Do I need to install something? I did a check out of cvs and 
> installed it..

It's complaining about not having the bsddb3 module -- which are
bindings to BerkeleyDB.

After some reflection and digging around, I'm realizing that I made
a mistake in making the Berkeley indexing the default -- although
bsddb is included with Python, it seems like it will not be built on
a lot of platforms. Sounds like more trouble than it solves.

So, I've adjusted the scripts to use the Flat file indexing by
default, with an option for BerkeleyDB indexing if you want to do it
that way. Introducing new required dependencies is bad.
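
Roughly, the idea of keeping BerkeleyDB optional can be sketched as
follows. This is an illustration only and not the code checked in to
Bio/Mindy: the function name and the backend labels are invented for
the example, and the only real name used is the bsddb3 module from the
traceback above.

```python
def choose_index_backend(prefer_berkeley=False):
    """Pick an indexing backend, falling back to flat files.

    The returned labels ("flat", "berkeleydb") are illustrative and
    are not identifiers used by Bio.Mindy.
    """
    if prefer_berkeley:
        try:
            import bsddb3  # optional third-party BerkeleyDB bindings
        except ImportError:
            # BerkeleyDB indexing was requested but the bindings are
            # not installed, so degrade to flat-file indexing rather
            # than failing at import time.
            return "flat"
        return "berkeleydb"
    # Default: flat-file indexing, which needs no extra modules.
    return "flat"
```

Importing the optional module only inside the branch that needs it
means a plain install never touches bsddb3 at all.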

So, I've just fixed the index_file function in CVS -- please try out
the new version and let me know if it gives you any problems (it
also requires a fix to Bio/Mindy which is also checked in). Thanks
again for testing things out -- I'm very happy to have people
looking at this.

Brad
From dlondon at ebi.ac.uk  Wed Mar 10 09:35:02 2004
From: dlondon at ebi.ac.uk (Darin London)
Date: Wed Mar 10 09:40:40 2004
Subject: [BioPython] BOSC 2004 Announcement and Call for Papers (fwd)
Message-ID: 

 {Please pass the word!}
 
 MEETING ANNOUNCEMENT & CALL FOR SPEAKERS
 
 The 5th annual Bioinformatics Open Source Conference (BOSC'2004) is 
 organized by the not-for-profit Open Bioinformatics Foundation. The 
 meeting will take place July 29-30, 2004 in Glasgow, Scotland, and is 
 one of several Special Interest Group (SIG) meetings occurring in 
 conjunction with the 12th International Conference on Intelligent 
 Systems for Molecular Biology.

 see http://www.iscb.org/ismb2004/ for more information.

 The focus of the meeting will be on current and emerging Open Source** 
 informatics tools and toolkits. BOSC provides a forum for developers, 
 project groups, users and interested parties to meet personally, exchange ideas and 
 collaborate together.

 In addition, keynote speeches from well known Open Source Bioinformatics 
 leaders are being planned.
 
 BOSC PROGRAM & CONTACT INFO
 
 * Web: http://www.open-bio.org/bosc2004/
 * Email: bosc@open-bio.org
 * Online registration: https://www.cteusa.com/iscb3/
 
 
 FEES
 
 * Corporate: GBP £165.00 (British pounds sterling)
 * Academic: GBP £120.00 (British pounds sterling)
 * Student: GBP £90.00 (British pounds sterling)
 
 A 17.5% Value Added Tax (VAT) will be added to all fees.

 Note: We have tried to set our fees as low as possible without risking 
 the chance that the foundation will lose money on the event. We budget 
 with the goal of breaking even on costs or realizing a small profit.
 
 REGISTER ONLINE FOR BOSC'2004 & ISMB AT:
 https://www.cteusa.com/iscb3/
 
 SPEAKERS & ABSTRACTS WANTED
 
 The program committee is currently seeking abstracts for talks at BOSC 
 2004. BOSC is a great opportunity for you to tell the community about 
 your use, development, or philosophy of open source software development 
 in bioinformatics. The committee will select several submitted abstracts 
 for 25-minute talks and others for shorter "lightning" talks. Accepted 
 abstracts will be published on the BOSC web site.
 
 If you are interested in speaking at BOSC 2004, 
 please send us:

 * an abstract (no more than a few paragraphs)
 * a URL for the project page, if applicable  
 * information about the open source license used for your software or
   your release plans.
 
 LIGHTNING-TALK SPEAKERS WANTED!
 
 The program committee is currently seeking speakers for the lightning 
 talks at BOSC 2004. Lightning talks are quick - only five minutes 
 long - and a great opportunity for you to give people a quick 
 summary of your open source project, code, idea, or vision of the future.

 If you are interested in giving a lightning talk at BOSC 2004, 
 please send us:

 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * information about the open source license used for your software or 
   your release plans.

 We will accept entries on-line until BOSC starts, but
 space for demos and lightning talks is limited.

 SOFTWARE DEMONSTRATIONS WANTED!

 If you are involved in the development of Open Source Bioinformatics 
 Software, you are invited to provide a short demonstration to attendees 
 of BOSC 2004.

 If you are interested in giving a software demonstration at BOSC 2004, 
 please send us:

 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * Internet connectivity requirements (e.g. website application served 
   on the World Wide Web, or web-based client application).

 We will accept entries on-line until BOSC starts, but
 space for demos and lightning talks is limited.

 ** Because the mission of the OBF is to promote Open Source software, 
 we will favor submissions for projects that apply a recognized Open 
 Source License, or adhere to the general Open Source Philosophy. See 
 the following websites for further details:

 http://www.opensource.org/licenses/
 http://www.opensource.org/docs/definition.php
From qinfo8 at bvimailbox.com  Wed Mar 10 11:45:22 2004
From: qinfo8 at bvimailbox.com (Info)
Date: Wed Mar 10 11:50:55 2004
Subject: [BioPython] Communique / Press release
Message-ID: <200403101650.i2AGoltk009453@portal.open-bio.org>

Publications Canadiennes / Canadian Publications
4865 Hwy 138, r.r. 1
St-Andrews west
Ontario, KOC 2A0

PRESS RELEASE

CANADIAN SUBSIDY DIRECTORY YEAR 2004 EDITION
Legal Deposit-National Library of Canada
ISBN 2-922870-05-7

The Canadian Subsidy Directory 2004 is now available, newly revised it
is the most complete and affordable reference for anyone looking for
financing. It is the perfect tool for new and existing businesses,
individuals, foundations and associations.

This Publication contains more than 2000 direct and indirect financial
subsidies, grants and loans offered by government departments and
agencies, foundations, associations and organisations. In this new 2004
edition all programs are well described.
The Canadian Subsidy Directory is the most comprehensive tool to start up a business, improve existent activities, set up a business plan, or obtain assistance from experts in fields such as: Industry, transport, agriculture, communications, municipal infrastructure, education, import-export, labor, construction and renovation, the service sector, hi-tech industries, research and development, joint ventures, arts, cinema, theatre, music and recording industry, the self employed, contests, and new talents. Assistance from and for foundations and associations, guidance to prepare a business plan, market surveys, computers, and much more! The Canadian Subsidy Directory is sold $ 69.95, to obtain a copy please call 819-322-5756 or visit the web site at: http://www.netpublications.net From libsvm at tom.com Wed Mar 10 20:00:08 2004 From: libsvm at tom.com (denny) Date: Wed Mar 10 20:07:26 2004 Subject: [BioPython] Is there any more detailed documentation to BioPython? Message-ID: <200403110100.i2B106tk013180@portal.open-bio.org> Hi, I use BioPython about one year and it is really a good programming language. But I think the ONLY drawback is that BioPython has simple or poor documentation. When I use a certain module in BioPython, I have to read all of its source code. It is really not very convenience.On the contrast, BioPerl gives more detailed documentation. Maybe someone has the same feeling like me. Regards. Denny         libsvm@tom.com           2005-03-12 From dag at sonsorol.org Wed Mar 10 21:30:13 2004 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed Mar 10 21:35:56 2004 Subject: [BioPython] O|B|F mail update -- making progress on anti-spam issues with our mailing lists Message-ID: <404FCF35.5010705@sonsorol.org> Hi folks, Apologies for the cross-posting but I just wanted to give our list members and admins an update on some new anti-spam measures we have (re)enabled. Good news to report basically... 
The most annoying spams recently have been the simple plain text messages without any HTML, attachments or mime-encoding that just slip right by our filters. Some lists have been forced to switch over to "only members can post" while other lists (like bioperl) have consistantly voted to stay as open as possible. I'll update you on our current efforts as well as a new effort that is about 24 hours old but already working really well so far. Until yesterday we had three main lines of defense against spam: 1. The mailserver itself (rejects mail from nonexistant domains, etc.) 2. The sendmail Mail::Milter extention (MIMEDefang+SpamAssassin are used to scan all incoming messages. Anything that scores higher than 8.0 is simply discarded automatically. MIMEDefang also strips dangerous attachments like .exe and .pif) 3. Our mailing list moderation queue (emails with attachments, odd MIME encodings and spamassassin scores from 0.0 - 7.9 are held in a moderator queue for a human to make an accept/discard decision) Here are some stats on how this system worked over the past few days: o 138 attempts to relay mail through our server blocked o 192 emails blocked due to forged or unresolvable sender domain o 577 emails discarded automatically by SpamAssassin+MIMEDefang This system worked *ok* but put a lot of work onto the shoulders of our list admins who constantly had to weed out the spam caught up in the mailing list moderator system. Yesterday I brought online another system that seems to be already working really well. It catches spam before we even accept it on our server which makes the load easier on both our scanning software and our human list moderators. The system is the RBL+ blackhole list from http://www.mail-abuse.org and the way it works is that we now query (via DNS) the RBL+ database each time someone connects to our mail server. If the RBL check against the sender IP address comes back as "positive" we reject the incoming email. 
RBL+ is a combination of four constantly updated databases: 1. RBL -- IP addresses of known, documented spammers and spam machines 2. RSS -- IP addresses of documented/tested unsecured email relays 3. OPS -- IP addresses of documented open proxy servers w/ spam history 3. DUL -- IP addresses belonging to ISP dialup and DHCP customers We have already blocked 137 email attempts in the last 24 hours from machines that were listed in one or more of the RBL databases. It is too soon to tell but if the RBL+ system plus our existing anti-spam measures work well enough we may be in a position where our "closed" mailing lists could revert back to being 'anyone can post'. Feedback appreciated. Especially if you get a "reject" message from us saying that you are listed in the RBL+ blackhole database! Regards, Chris O|B|F From cavallo at biochem.ucl.ac.uk Thu Mar 11 06:42:44 2004 From: cavallo at biochem.ucl.ac.uk (Antonio Cavallo) Date: Thu Mar 11 06:48:27 2004 Subject: [BioPython] embl Message-ID: <405050B4.5060007@biochem.ucl.ac.uk> Hi, I'm new (and quite confused) to biopython. I have a simple question (maybe it looks silly): how do I parse an embl data file using biopython? Is there any way to retrieve the sequence information (The CDS section)? What about the position of the CDS sections (they are split in sub pieces)? kindly regards, antonio cavallo From lpritc at scri.sari.ac.uk Thu Mar 11 07:53:20 2004 From: lpritc at scri.sari.ac.uk (Leighton Pritchard) Date: Thu Mar 11 07:59:03 2004 Subject: [BioPython] embl In-Reply-To: <405050B4.5060007@biochem.ucl.ac.uk> References: <405050B4.5060007@biochem.ucl.ac.uk> Message-ID: <40506140.5050401@scri.sari.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Antonio Cavallo wrote: | Hi, | I'm new (and quite confused) to biopython. | I have a simple question (maybe it looks silly): | how do I parse an embl data file using biopython? | Is there any way to retrieve the sequence information (The CDS section)? 
| What about the position of the CDS sections (they are split in sub pieces)? Not that silly a question. I had a similar problem when I was working with .tab files (with no header information) from the Sanger, and ended up writing a BioPython-style parser for them. It's not the most robust code in the world, but you're welcome to a copy if it might help you. - -- Dr Leighton Pritchard AMRSC D104, PPI, Scottish Crop Research Institute Invergowrie, Dundee, DD2 5DA, Scotland, UK E: lpritc@scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/index.shtml T: +44 (0)1382 568579 F: +44 (0)1382 568578 PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAUGFAL1gZ+OWLpBsRAj9yAJ4tCmuI43Xzdz/oa7AQvPQ07HvKrACeLZjj m67cOc3ZCZzkfhGDIFmft80= =EBcY -----END PGP SIGNATURE----- From biopython at wardroper.org Thu Mar 11 10:32:35 2004 From: biopython at wardroper.org (Alan Wardroper) Date: Thu Mar 11 10:32:37 2004 Subject: [BioPython] contig mapping in BioPython Message-ID: <6.0.3.0.2.20040424150959.03801db0@wardroper.org> I'm thinking about writing some BioPython modules for contig/genome mapping - something akin to BioPerl's Bio::Assembler::contig - for use in genome mapping (and whatever else it ends up lending itself to). Can't find any references to any such projects that are ongoing but would like to check if anyone else is working on this before I put in too much time in reinventing more wheels than we need. Anyone think this would/would not be useful? Thanks for your input... 
======================== Alan Wardroper ======================== From anunberg at oriongenomics.com Thu Mar 11 11:20:43 2004 From: anunberg at oriongenomics.com (Andrew Nunberg) Date: Thu Mar 11 11:24:34 2004 Subject: [BioPython] contig mapping in BioPython In-Reply-To: <6.0.3.0.2.20040424150959.03801db0@wardroper.org> Message-ID: There is something similar in Bio.Sequencing Have you checked out Biopython from CVS? On a totally different note: I agree that as a simple user of biopython that the documentation can be confusing because python does not use special characters to denote basic data types(variables,lists, dictionary And I recollect that in other places the object to pass or what is returned is not documented either(ie you pass a Seq object and get a SeqFeature object in return... If I ever get any time, or if I get so fed up to make the time, I will go through some of the libraries I use most often and try to create more documentation Andy > I'm thinking about writing some BioPython modules for contig/genome mapping > - something akin to BioPerl's Bio::Assembler::contig - for use in genome > mapping (and whatever else it ends up lending itself to). > > Can't find any references to any such projects that are ongoing but would > like to check if anyone else is working on this before I put in too much > time in reinventing more wheels than we need. > Anyone think this would/would not be useful? > > Thanks for your input... 
> > ======================== > Alan Wardroper > ======================== > > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython > -- Andrew Nunberg Bioinformagician Orion Genomics (314)-615-6989 www.oriongenomics.com From mbatesalann at netscape.net Wed Mar 10 14:31:32 2004 From: mbatesalann at netscape.net (mbatesalann@netscape.net) Date: Thu Mar 11 16:28:06 2004 Subject: [BioPython] REPLY SOON Message-ID: Dear Friend, As you read this, I don't want you to feel sorry for me, because, I believe everyone will die someday. My name is BATES ALAN a merchant in Dubai, in the U.A.E.I have been diagnosed with Esophageal cancer. It has defiled all forms of medical treatment, and right now I have only about a few months to live, according to medical experts. I have not particularly lived my life so well, as I never really cared for anyone(not even myself)but my business. Though I am very rich, I was never generous, I was always hostile to people and only focused on my business as that was the only thing I cared for. But now I regret all this as I now know that there is more to life than just wanting to have or make all the money in the world. I believe when God gives me a second chance to come to this world I would live my life a different way from how I have lived it. Now that God has called me, I have willed and given most of my property and assets to my immediate and extended family members as well as a few close friends. I want God to be merciful to me and accept my soul so, I have decided to give alms to charity organizations, as I want this to be one of the last good deeds I do on earth. So far, I have distributed money to some charity organizations in the U.A.E, Algeria and Malaysia. Now that my health has deteriorated so badly, I cannot do this myself anymore. 
I once asked members of my family to close one of my accounts and distribute the money which I have there to charity organization in Bulgaria and Pakistan, they refused and kept the money to themselves. Hence, I do not trust them anymore, as they seem not to be contended with what I have left for them. The last of my money which no one knows of is the huge cash deposit of eighteen million dollars $18,000,000,00 that I have with a finance/Security Company abroad. I will want you to help me collect this deposit and dispatched it to charity organizations. I have set aside 10% for you and for your time. God be with you. BATES ALAN From sbassi at asalup.org Fri Mar 12 10:20:33 2004 From: sbassi at asalup.org (Sebastian Bassi) Date: Fri Mar 12 18:19:33 2004 Subject: [BioPython] [Fwd: Re: Tm calc: 1.3 (This is the good one!)] Message-ID: <4051D541.6020107@asalup.org> Brad: I send this using the list because it seems you have an antispam filter that doesn't allow me to reach you using attachments. -------- Original Message -------- Date: Mon, 08 Mar 2004 21:04:55 -0300 From: Sebastian Bassi Reply-To: sbassi@asalup.org Organization: ASALUP To: Brad Chapman Subject: Re: Tm calc: 1.3 (This is the good one!) Brad Chapman wrote: > Hi Sebastian; >>Did you get my last version of Tm function? > I didn't. If you could send it again that would be great. I was > wondering what happened with that :-). Here is the new version, see inside the zip (I did zip it because plain text often get corrupted by email) -- Best regards, //=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\ \=// IT Manager Advanta Seeds - Balcarce Research Center - \=// //=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\ \=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 - \=// http://Bioinformatica.info -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tm.zip Type: application/zip Size: 3775 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/biopython/attachments/20040312/6c805e64/tm-0001.zip -------------- next part -------------- *** tmori.py Fri Mar 5 11:06:45 2004 --- tm.py Fri Mar 5 11:10:25 2004 *************** *** 1,6 **** --- 1,9 ---- import string import math + STRONG_BONDS = ["G", "C"] + WEAK_BONDS = ["A", "T", "U"] + def Tm_staluc(s,dnac=50,saltc=50,rna=0): """Returns DNA/DNA tm using nearest neighbor thermodynamics. dnac is DNA concentration [nM] and saltc is salt concentration [mM]. *************** *** 34,66 **** if rna==0: #DNA/DNA #Allawi and SantaLucia (1997). Biochemistry 36 : 10581-10594 ! if stri[0]=="G" or stri[0]=="C": deltah=deltah-0.1 deltas=deltas+2.8 ! elif stri[0]=="A" or stri[0]=="T": deltah=deltah-2.3 deltas=deltas-4.1 ! if stri[-1]=="G" or stri[-1]=="C": ! deltah=deltah-0.1 deltas=deltas+2.8 ! elif stri[-1]=="A" or stri[-1]=="T": deltah=deltah-2.3 deltas=deltas-4.1 dhL=dh+deltah dsL=ds+deltas return dsL,dhL elif rna==1: ! #RNA ! if stri[0]=="G" or stri[0]=="C": deltah=deltah-3.61 deltas=deltas-1.5 ! elif stri[0]=="A" or stri[0]=="T" or stri[0]=="U": deltah=deltah-3.72 deltas=deltas+10.5 ! if stri[-1]=="G" or stri[-1]=="C": deltah=deltah-3.61 deltas=deltas-1.5 ! elif stri[-1]=="A" or stri[-1]=="T" or stri[0]=="U": deltah=deltah-3.72 deltas=deltas+10.5 dhL=dh+deltah --- 37,69 ---- if rna==0: #DNA/DNA #Allawi and SantaLucia (1997). Biochemistry 36 : 10581-10594 ! if stri[0] in STRONG_BONDS: deltah=deltah-0.1 deltas=deltas+2.8 ! elif stri[0] in WEAK_BONDS: deltah=deltah-2.3 deltas=deltas-4.1 ! if stri[0] in STRONG_BONDS: ! deltah=deltah-0.1 deltas=deltas+2.8 ! elif stri[0] in WEAK_BONDS: deltah=deltah-2.3 deltas=deltas-4.1 dhL=dh+deltah dsL=ds+deltas return dsL,dhL elif rna==1: ! #RNA/RNA ! if stri[0] in STRONG_BONDS: deltah=deltah-3.61 deltas=deltas-1.5 ! elif stri[0] in WEAK_BONDS: deltah=deltah-3.72 deltas=deltas+10.5 ! 
if stri[0] in STRONG_BONDS: deltah=deltah-3.61 deltas=deltas-1.5 ! elif stri[0] in WEAK_BONDS: deltah=deltah-3.72 deltas=deltas+10.5 dhL=dh+deltah *************** *** 68,90 **** # print "delta h=",dhL return dsL,dhL ! def overcount(st,p): ! """Returns how many p are on st, works even for overlapping""" ! ocu=0 ! x=0 ! while 1: ! try: ! i=st.index(p,x) ! except ValueError: ! break ! ocu=ocu+1 ! x=i+1 ! return ocu sup=string.upper(s) R=1.987 # universal gas constant in Cal/degrees C*Mol vsTC,vh=tercorr(sup) vs=vsTC k=(dnac/4.0)*1e-8 #With complementary check on, the 4.0 should be changed to a variable. --- 71,91 ---- # print "delta h=",dhL return dsL,dhL ! def countdinucs(s): ! """Counts dinucleotide frequencies in a sequence""" ! dinucs={} ! map(dinucs.__setitem__,[a+b for a in 'ACGT' for b in 'ACGT'],[0]*16) ! for i in range(len(s)-1): ! dn=s[i:i+2] ! dinucs[dn]+=1 ! return dinucs sup=string.upper(s) R=1.987 # universal gas constant in Cal/degrees C*Mol vsTC,vh=tercorr(sup) vs=vsTC + dinuc=countdinucs(sup) + k=(dnac/4.0)*1e-8 #With complementary check on, the 4.0 should be changed to a variable. *************** *** 92,136 **** if rna==0: #DNA/DNA #Allawi and SantaLucia (1997). Biochemistry 36 : 10581-10594 ! vh=vh+((overcount(sup,"AA"))*7.9+(overcount(sup,"TT"))* ! 7.9+(overcount(sup,"AT"))*7.2+(overcount(sup,"TA"))* ! 7.2+(overcount(sup,"CA"))*8.5+(overcount(sup,"TG"))* ! 8.5+(overcount(sup,"GT"))*8.4+(overcount(sup,"AC"))*8.4) ! vh=vh+((overcount(sup,"CT"))*7.8+(overcount(sup,"AG"))* ! 7.8+(overcount(sup,"GA"))*8.2+(overcount(sup,"TC"))*8.2) ! vh=vh+((overcount(sup,"CG"))*10.6+(overcount(sup,"GC"))* ! 10.6+(overcount(sup,"GG"))*8+(overcount(sup,"CC"))*8) ! ! vs=vs+((overcount(sup,"AA"))*22.2+(overcount(sup,"TT"))* ! 22.2+(overcount(sup,"AT"))*20.4+(overcount(sup,"TA"))*21.3) ! vs=vs+((overcount(sup,"CA"))*22.7+(overcount(sup,"TG"))* ! 22.7+(overcount(sup,"GT"))*22.4+(overcount(sup,"AC"))*22.4) ! vs=vs+((overcount(sup,"CT"))*21.0+(overcount(sup,"AG"))* ! 
21.0+(overcount(sup,"GA"))*22.2+(overcount(sup,"TC"))*22.2) ! vs=vs+((overcount(sup,"CG"))*27.2+(overcount(sup,"GC"))* ! 27.2+(overcount(sup,"GG"))*19.9+(overcount(sup,"CC"))*19.9) ds=vs dh=vh else: #RNA/RNA hybridisation of Xia et al (1998) #Biochemistry 37: 14719-14735 ! vh=vh+((overcount(sup,"AA"))*6.82+(overcount(sup,"TT"))* ! 6.6+(overcount(sup,"AT"))*9.38+(overcount(sup,"TA"))* ! 7.69+(overcount(sup,"CA"))*10.44+(overcount(sup,"TG"))* ! 10.5+(overcount(sup,"GT"))*11.4+(overcount(sup,"AC"))*10.2) ! vh=vh+((overcount(sup,"CT"))*10.48+(overcount(sup,"AG"))* ! 7.6+(overcount(sup,"GA"))*12.44+(overcount(sup,"TC"))*13.3) ! vh= vh+((overcount(sup,"CG"))*10.64+(overcount(sup,"GC"))* ! 14.88+(overcount(sup,"GG"))*13.39+(overcount(sup,"CC"))*12.2) ! ! vs=vs+((overcount(sup,"AA"))*19.0+(overcount(sup,"TT"))* ! 18.4+(overcount(sup,"AT"))*26.7+(overcount(sup,"TA"))*20.5) ! vs=vs+((overcount(sup,"CA"))*26.9+(overcount(sup,"TG"))* ! 27.8+(overcount(sup,"GT"))*29.5+(overcount(sup,"AC"))*26.2) ! vs=vs+((overcount(sup,"CT"))*27.1+(overcount(sup,"AG"))* ! 19.2+(overcount(sup,"GA"))*32.5+(overcount(sup,"TC"))*35.5) ! vs=vs+((overcount(sup,"CG"))*26.7+(overcount(sup,"GC"))* ! 36.9+(overcount(sup,"GG"))*32.7+(overcount(sup,"CC"))*29.7) ds=vs dh=vh --- 93,119 ---- if rna==0: #DNA/DNA #Allawi and SantaLucia (1997). Biochemistry 36 : 10581-10594 ! vh=vh+dinuc["AA"]*7.9+dinuc["TT"]*7.9+dinuc["AT"]*7.2+dinuc["TA"]*7.2+\ ! dinuc["CA"]*8.5+dinuc["TG"]*8.5+dinuc["GT"]*8.4+dinuc["AC"]*8.4+\ ! dinuc["CT"]*7.8+dinuc["AG"]*7.8+dinuc["GA"]*8.2+dinuc["TC"]*8.2+\ ! dinuc["CG"]*10.6+dinuc["GC"]*10.6+dinuc["GG"]*8+dinuc["CC"]*8 ! vs=vs+dinuc["AA"]*22.2+dinuc["TT"]*22.2+dinuc["AT"]*20.4+dinuc["TA"]*21.3+\ ! dinuc["CA"]*22.7+dinuc["TG"]*22.7+dinuc["GT"]*22.4+dinuc["AC"]*22.4+\ ! dinuc["CT"]*21.0+dinuc["AG"]*21.0+dinuc["GA"]*22.2+dinuc["TC"]*22.2+\ ! 
dinuc["CG"]*27.2+dinuc["GC"]*27.2+dinuc["GG"]*19.9+dinuc["CC"]*19.9 ds=vs dh=vh else: #RNA/RNA hybridisation of Xia et al (1998) #Biochemistry 37: 14719-14735 ! vh=dinuc["AA"]*6.6+dinuc["TT"]*6.6+dinuc["AT"]*5.7+dinuc["TA"]*8.1+\ ! dinuc["CA"]*10.5+dinuc["TG"]*10.5+dinuc["GT"]*10.2+dinuc["AC"]*10.2+\ ! dinuc["CT"]*7.6+dinuc["AG"]*7.6+dinuc["GA"]*13.3+dinuc["TC"]*13.3+\ ! dinuc["CG"]*8.0+dinuc["GC"]*14.2+dinuc["GG"]*12.2+dinuc["CC"]*12.2+\ ! dinuc["AA"]*18.4+dinuc["TT"]*18.4+dinuc["AT"]*15.5+dinuc["TA"]*16.9 ! vs=vs+dinuc["CA"]*27.8+dinuc["TG"]*27.8+dinuc["GT"]*26.2+dinuc["AC"]*26.2+\ ! dinuc["CT"]*19.2+dinuc["AG"]*19.2+dinuc["GA"]*35.5+dinuc["TC"]*35.5+\ ! dinuc["CG"]*19.4+dinuc["GC"]*34.9+dinuc["GG"]*29.7+dinuc["CC"]*29.7 ds=vs dh=vh *************** *** 138,141 **** tm=((1000* (-dh))/(-ds+(R * (math.log(k)))))-273.15 # print "ds="+str(ds) # print "dh="+str(dh) ! return tm \ No newline at end of file --- 121,124 ---- tm=((1000* (-dh))/(-ds+(R * (math.log(k)))))-273.15 # print "ds="+str(ds) # print "dh="+str(dh) ! return tm From nauman.maqbool at agresearch.co.nz Sun Mar 14 22:47:37 2004 From: nauman.maqbool at agresearch.co.nz (Maqbool, Nauman) Date: Sun Mar 14 22:53:11 2004 Subject: [BioPython] Blast parser error Message-ID: Hi I am new to biopython and am trying out the NCBI Standalone Blast parser. 
While trying the blast parsing methods from the cookbook (parsing
standalone Blastn output) I got the following error message:

>>> ================================ RESTART ================================
>>> 

Traceback (most recent call last):
  File "C:/Python/NM Python work/SV/sing_blst_SVparse.py", line 26, in -toplevel-
    testparser(in_file)
  File "C:/Python/NM Python work/SV/sing_blst_SVparse.py", line 11, in testparser
    b_record = b_iterator.next()
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1332, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 557, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 97, in feed
    self._scan_rounds(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 153, in _scan_rounds
    self._scan_alignments(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 287, in _scan_alignments
    self._scan_pairwise_alignments(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 297, in _scan_pairwise_alignments
    self._scan_one_pairwise_alignment(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 309, in _scan_one_pairwise_alignment
    self._scan_hsp(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 337, in _scan_hsp
    self._scan_hsp_alignment(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 368, in _scan_hsp_alignment
    read_and_call(uhandle, consumer.query, start='Query')
  File "C:\Python23\Lib\site-packages\Bio\ParserSupport.py", line 300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'Query':
ncbiClient.20040311_1242_9391.log
>>> 

The version of Blast we are running is: 2.2.6 [Apr-09-2003].
I found a similar blast parser error in the biopython archives but that
was referring to the output format change in blastx. I don't think that
the Blastn output has changed in the recent past, so it might be due to
something that I might be missing in my script. Here is the script that
I am running:

from Bio.Blast import NCBIStandalone

in_file = 'test.blast'

def testparser(blastfile):
    blast_out = open(blastfile, "r")
    b_parser = NCBIStandalone.BlastParser()
    b_iterator = NCBIStandalone.Iterator(blast_out, b_parser)

    while 1:
        b_record = b_iterator.next()

        if b_record is None:
            break

        E_VALUE_THRESH = 0.05
        for alignment in b_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < E_VALUE_THRESH:
                    print '*****Alignmnent*****'
                    print 'sequence:', alignment.title
                    print 'length:', alignment.length
                    print 'e value:', hsp.expect

# Main
testparser(in_file)

Any help will be highly appreciated.

Regards

Nauman

********************************************
Nauman J Maqbool PhD
Bioinformatics Group
AgResearch Invermay
Private Bag 50034
Puddle Alley
Mosgiel
New Zealand
email: nauman.maqbool@agresearch.co.nz
Tel: +64-3-489 9031
Fax: +64-3-489 3739
********************************************

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From jeffrey_chang at stanfordalumni.org Mon Mar 15 00:14:24 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Mon Mar 15 00:19:55 2004
Subject: [BioPython] Blast parser error
In-Reply-To:
References:
Message-ID: <9FC1F012-763F-11D8-8B70-000A956845CE@stanfordalumni.org>

It is likely because the BLAST format has changed. If you can send the
BLAST output file that is causing the problem, I can take a look at it
for you. Otherwise, you will need to figure out which line is causing
the problem and update the parser to deal with it properly.

Jeff

On Mar 14, 2004, at 7:47 PM, Maqbool, Nauman wrote:

> Hi
>
> I am new to biopython and am trying out the NCBI Standalone Blast
> parser. While trying the blast parsing methods from the cookbook
> (parsing standalone Blastn output) I got the following error message:
>
> >>> ================================ RESTART ================================
> >>>
>
> Traceback (most recent call last):
>   File "C:/Python/NM Python work/SV/sing_blst_SVparse.py", line 26, in -toplevel-
>     testparser(in_file)
>   File "C:/Python/NM Python work/SV/sing_blst_SVparse.py", line 11, in testparser
>     b_record = b_iterator.next()
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1332, in next
>     return self._parser.parse(File.StringHandle(data))
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 557, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 97, in feed
>     self._scan_rounds(uhandle, consumer)
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 153, in _scan_rounds
>     self._scan_alignments(uhandle, consumer)
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 287, in _scan_alignments
>     self._scan_pairwise_alignments(uhandle, consumer)
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 297, in _scan_pairwise_alignments
>     self._scan_one_pairwise_alignment(uhandle, consumer)
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 309, in _scan_one_pairwise_alignment
>     self._scan_hsp(uhandle, consumer)
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 337, in _scan_hsp
>     self._scan_hsp_alignment(uhandle, consumer)
>   File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 368, in _scan_hsp_alignment
>     read_and_call(uhandle, consumer.query, start='Query')
>   File "C:\Python23\Lib\site-packages\Bio\ParserSupport.py", line 300, in read_and_call
>     raise SyntaxError, errmsg
> SyntaxError: Line does not start with 'Query':
> ncbiClient.20040311_1242_9391.log
> >>>
>
> The version of Blast we are running is: 2.2.6 [Apr-09-2003]. I found a
> similar blast parser error in the biopython archives but that was
> referring to the output format change in blastx. I don't think that the
> Blastn output has changed in the recent past, so it might be due to
> something that I might be missing in my script, here is the script that
> I am running:
>
> from Bio.Blast import NCBIStandalone
>
> in_file = 'test.blast'
>
> def testparser(blastfile):
>     blast_out = open(blastfile, "r")
>     b_parser = NCBIStandalone.BlastParser()
>     b_iterator = NCBIStandalone.Iterator(blast_out, b_parser)
>
>     while 1:
>         b_record = b_iterator.next()
>
>         if b_record is None:
>             break
>
>         E_VALUE_THRESH = 0.05
>         for alignment in b_record.alignments:
>             for hsp in alignment.hsps:
>                 if hsp.expect < E_VALUE_THRESH:
>                     print '*****Alignmnent*****'
>                     print 'sequence:', alignment.title
>                     print 'length:', alignment.length
>                     print 'e value:', hsp.expect
>
> # Main
> testparser(in_file)
>
> Any help will be highly appreciated.
>
> Regards
>
> Nauman
>
> ********************************************
> Nauman J Maqbool PhD
> Bioinformatics Group
> AgResearch Invermay
> Private Bag 50034
> Puddle Alley
> Mosgiel
> New Zealand
> email: nauman.maqbool@agresearch.co.nz
> Tel: +64-3-489 9031
> Fax: +64-3-489 3739
> ********************************************

From absmythe at ucdavis.edu Mon Mar 15 16:57:48 2004
From: absmythe at ucdavis.edu (ashleigh smythe)
Date: Mon Mar 15 17:03:19 2004
Subject: [BioPython] trying to make NBRF dictionary
Message-ID: <1079387868.6757.20.camel@nate.ucdavis.edu>

Hello. As there seems to be no existing Bio.Fasta-style dictionary code
for alignments (Clustalw or NBRF), I thought I'd try to write a simple
script using the NBRF iterator to make a dictionary of sequence
name:sequence key:value pairs. My ultimate goal is to be able to combine
different aligned datasets where the sequence names (taxa) are the same
but they are in a different order (otherwise I could just append one to
the other). It seemed like a good use of a dictionary, only I'm still
pretty lame at python.
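
A minimal sketch of the combining step described above, in plain Python
and independent of any Biopython parser. It assumes each dataset has
already been read into a dictionary mapping sequence name to aligned
sequence string. The function name combine_alignments and the choice to
pad a taxon missing from one dataset with gap characters are
illustrative assumptions, not part of Biopython.

```python
def combine_alignments(first, second, gap="-"):
    """Concatenate two alignments keyed by sequence name.

    `first` and `second` map sequence name -> aligned sequence string.
    The order of names in the input files does not matter because the
    lookup is by key. A taxon missing from one dataset is padded with
    gap characters so every combined row has the same length.
    """
    # Alignment width of each dataset (0 if the dataset is empty).
    first_len = len(next(iter(first.values()), ""))
    second_len = len(next(iter(second.values()), ""))

    combined = {}
    for name in sorted(set(first) | set(second)):
        left = first.get(name, gap * first_len)
        right = second.get(name, gap * second_len)
        combined[name] = left + right
    return combined


if __name__ == "__main__":
    # Hypothetical toy data: same taxa, different order, one extra taxon.
    gene_a = {"taxon1": "AC-GT", "taxon2": "ACCGT"}
    gene_b = {"taxon2": "TTG", "taxon1": "T-G", "taxon3": "TAG"}
    for name, seq in combine_alignments(gene_a, gene_b).items():
        print(name, seq)
```

Keying on the sequence name is what removes the dependence on file
order; whether padding or rejecting unmatched taxa is the right policy
depends on the downstream analysis.
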
I thought I'd start with just trying to get one file into a dictionary, and I'm stuck already. My code seems to make a dictionary of sorts, but it behaves like it only has 1 key:value pair rather than 4 (len(mydict) returns 1) and the keys are just my variable name (cur_record.sequence_name), not what I think the keys should be - the actual data I put into the dictionary. I'm guessing that means I have some scope problem. Can anybody please give me some tips on where to go, at least for this first chunk? Here is my script: import Bio from Bio import NBRF mydict={} def makedict(file1): parser=NBRF.RecordParser() first_file=open(file1, 'r') iterator=NBRF.Iterator(first_file, parser) while 1: cur_record=iterator.next() if cur_record is None: break name=cur_record.sequence_name sequence=cur_record.sequence.data mydict[name] = sequence return mydict And here is what I get: >>> seqcombine2.makedict('test.pir') {'9.1Otostrongylus_sp._U81589.1': '----------------------------------------------------------------------------------------------------T-GTC-GA--GTTC-A--CC------TT--C--A---AG-T-GA--AA-C-TGCGAACGGCTCATTAG-AGCAGATG-T-CATT---TATT-CG--G--AA-A--A-T--C--C--A-TTT-GGA--TAACTGCG--GTAAT-TCTGGAGCTAATACATGCG-ATTA-A-AC-CCTG-AC---T--T-T--T---GAAA--GGGTGCAAT-TA-TTAGAG---C---AA-A-TCAAT-CAT-------------T-T---TC----------G-GA------TG----TAGTT----------T---GCT---G-A-C-TC-TGAATA-A---CG--CAG--CATA-TCGG-CGGC-T-T-GT---TCGCCGATAAT-CCGAAAA----AG---TGT-C-TGCCC-TATCA--AC---CT---GA-TGGTAGTCTATTAGTCTA-CCATGGTTATTACGGGTAACGGAGAATAAGGGTT-CGACTCCGGAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAG-GCGCGAAACTTATCCAA-T-CTTG-----A-ATAGATGA-GATAGTGACT-----------------------AAAAATAAAAA--GACCA---TTCC-T-AT-G--GAACG-GTTATTTCAATGAGT--TGATCATAAACCTTTTTT--C-G-AGGA--TCAAGTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTC--CACTAGTGTA-AATCGTCATTGCTGCGGTTAAAAAGC-TCGTAGTTGGAT-C-TGAGTCGC---AT--GCA-AT-G-GTTCG--C-CT----T--TG--G--CGT----TAAT------C---AT-TG-TTGTG---ACTA---T------T-T---G--CTG--G-T-T--TTCT-AT--TG-A--AA-----TTTC----
-G-A-TT-----TCTTTA-GTG-GC-TA--GCGA-GTT-TA-CTTTGA-AT-AAATTAGAGTGCT-CAGAACAAG---CGTT-----T--GC-TT-G--AAT-G-GTCGAT-CATGGAATAA-----TAAAAGAGGAC--TTCG---GT-T------CTATT-T----ATTGGTTC-AG---G-AA------CTG------AAAT-AATGGTTAAGAGGGACA--ATTC-GGGGGCATTCGTATCCCTGCGCGAGAGGTGAAATTCGTG-GACCG-CAGGGGGACGCCCTAAAGCGAAAG-CATTTGCC-AAGAAT--GTCTTCATTAATCA-AGAACGAAAGTCAGAGGTTCGAAGGCGATTAGATA--CCGCCC-TAGTTCTGACCGTAAACTATGCCATCTAGC-GA--TCC-GAT--GG-GG--TA--T--TG--T-T----GCCTT--GTCGAGG-AGCTT-CCCGGAAACGA--AA-GTCTTTCGGT-TCCTGGGGTAGTATGGTTGC-AAAGCT-G-AAACTTAAAGA-AATTGACGGAATGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGA--AAACT-CACCC-GGCCCGGACACCGTAA-GGATTGAC-----AGATTGA--A---AGCTCTTTCTC-GATTTGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTG-GTGGAG-CGATTTGTCTGGTTTATTCC-GAT-AACGAGCGAGACTCT-AG-C-C--TG-CTAAA-TA-G--TGA--CAA---------------GA----TT-----------TT------T----ATGTC-------TA-G----T--C-------TA-------------C-TT-----CTT-AG---AGGGATAAG-CGG---TGTT-T-----A-G-C--CGCA--CG-AGATTGAGCGATAACAGGTCTGTGATGCCCTTAGATGTCCGGGG-CTG-CACGCGCGCTACAATGGAAG-AAT-CAGT--TGGC---CTA--T----CCAT-TGC-CG-A-AAGGT-AT----T----GGTAAACCG-TTGAAACT--CTTCC-GTG-ACCGGGATAGGGAATTGT--A-ATT---------ATT---TCCC-TTGAACG-AGGAATTCCTAGTAAGTGTG-AGTCATCAGCTCACGCTGATTACGTCCC-TGCCATTTGTACACACCGCCCGTCGCTGTC-CGGG-ACTG--AGC-TGTC--TCGAGAGGACT-GCGG-A-CTG----CT--GTA----TTGA-GG---CCT-------T---CGGG------TCG-----TGGTA----TAGCG---GG-AAA-CAG-TTC-AATC-G-CAATG-G--CTTGAACCGGGTAAAAGTCGT-AACAAGGTATCTG---------------------------------------------------------------------', '813Otostrongylus_circumlitus_A': 
'-----------------------------------------------------------------------------------GATT-AAGCCATG-CA-T-GTC-GA--GTTC-A--GC------TT--C--A---AG-T-GA--AA-C-TGCGAACGGCTCATTAG-AGCAGATG-T-CATT---TATT-CG--G--AA-A--A-T--C--C--A-TTT-GGA--TAACTGCG--GTAAT-TCTGGAGCTAATACATGCG-ATTA-A-AC-CCTG-AC---T--T-T--T---GAAA--GGGTGCAAT-TA-TTAGAG---C---AA-A-TCAAT-CAT-------------T-T---TC----------G-GA------TG----TAGTT----------T---GCT---G-A-C-TC-TGAATA-A---CG--CAG--CATA-TCGG-CGGC-T-T-GT---TCGCCGATAAT-CCGAAAA----AG---TGT-C-TGCCC-TATCA--AC---CT---GA-TGGTAGTCTATTAGTCTA-CCATGGTTATTACGGGTAACGGAGAATAAGGGTT-CGACTCCGGAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAG-GCGCGAAACTTATCCAA-T-CTTG-----A-ATAGATGA-GATAGTGACT-----------------------AAAAATAAAAA--GACCA---TTCC-T-AT-G--GAACG-GTTATTTCAATGAGT--TGATCATAAACCTTTTTT--C-G-AGGA--TCAAGTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTC--CACTAGTGTA-AATCGTCATTGCTGCGGTTAAAAAGC-TCGTAGTTGGAT-C-TGAGTCGC---AT--GCA-AT-G-GTTCG--C-CT----T--TG--G--CGT----TAAT------C---AT-TG-TTGTG---ACTA---T------T-T---G--CTG--G-T-T--TTCT-AT--TG-A--AA-----TTTC-----G-A-TT-----TCTTTA-GTG-GC-TA--GCGA-GTT-TA-CTTTGA-AT-AAATTAGAGTGCT-CAGAACAAG---CGTT-----T--GC-TT-G--AAT-G-GTCGAT-CATGGAATAA-----TAAAAGAGGAC--TTCG---GT-T------CTATT-T----ATTGGTTC-AG---G-AA------CTG------AAAT-AATGGTTAAGAGGGACA--ATTC-GGGGGCATTCGTATCCCTGCGCGAGAGGTGAAATTCGTG-GACCG-CAGGGGGACGCCCTAAAGCGAAAG-CATTTGCC-AAGAAT--GTCTTCATTAATCA-AGAACGAAAGTCAGAGGTTCGAAGGCGATTAGATA--CCGCCC-TAGTTCTGACCGTAAACTATGCCATCTAGC-GA--TCC-GAT--GG-GG--TA--T--TG--T-T----GCCTT--GTCGAGG-AGCTT-CCCGGAAACGA--AA-GTCTTTCGGT-TCCTGGGGTAGTATGGTTGC-AAAGCT-G-AAACTTAAAGA-AATTGACGGAATGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGA--AAACT-CACCC-GGCCCGGACACCGTAA-GGATTGAC-----AGATTGA--A---AGCTCTTTCTC-GATTTGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTG-GTGGAG-CGATTTGTCTGGTTTATTCC-GAT-AACGAGCGAGACTCT-AG-C-C--TG-CTAAA-TA-G--TGA--CAA---------------GA----TT-----------TT------T----ATGTC-------TA-G----T--C-------TA-------------C-TT-----CTT-AG---AGGGATAAG-CGG---TGTT-T-----A-G-C--CGCA--CG-AGATTGAGCGATAACA
GGTCTGTGATGCCCTTAGATGTCCGGGG-CTG-CACGCGCGCTACAATGGAAG-AAT-CAGT--TGGC---CTA--T----CCAT-TGC-CG-A-AAGGT-AT----T----GGTAAACCG-TTGAAACT--CTTCC-GTG-ACCGGGATAGGGAATTGT--A-ATT---------ATT---TCCC-TTGAACG-AGGAATTCCTAGTAAGTGTG-AGTCATCAGCTCACGCTGATTACGTCCC-TGCCATTTGTACACACCGCCCGTCGCTGTC-CGGG-ACTG--AGC-TGTC--TCGAGAGGACT-GCGG-A-CTG----CT--GTA----TTGA-GG---CCT-------T---CGGG------TCG-----TGGTA----TAGCG---GG-AAA-CAG-TTC-AATC-G-CAATG-G--CTTGAACCGGGTAAAAGTCGT-AACAAGGTATCTGTAGGTGAACCTGG--------------------------------------------------------', '815Parelaphostrongylus_odocoil': '------------------------------------------------------------------------------------ATT-AAGCCATG-CA-T-GTG-GA--GTTC-A--AC------TT--CA-A---AG-T-GA--AA-C-TGCGAACGGCTCATTAG-AGCAGATG-T-CATT---TATT-CG--G--AA-A--A-T--CC-T--T-AAT-GGA--TAACTGCG--GTAAT-TCTGGAGCTAATACATATGCAT-A-A-AC-CCTG-AC---T--C-TG-T---GAAA--GGGTGCAAT-TA-TTAGAG---C---AA-A-TCAAT-CAT-------------T-T---TC----------G-GA------TG----TAGTT----------T---GCT---G-A-C-TC-TGAATA-A---CG--CAG--CATA-TCGG-CGGC-T-T-GT---TCGCCGATATT-CCGAAAA----AG---TGT-C-TGCCC-TATCA--AC---CT---GA-TGGTAGTCTATTAGTCTA-CCATGGTTATTACGGGTAACGGAGAATAAGGGTT-CGACTCCGGAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAG-GCGCGAAACTTATCCAA-T-CTTG-----A-ATAGATGA-GATAGTGACT-----------------------AAAAATAAAAA--GACCA---TTCC-T-AT-G--GAACG-GTCATTTCAATGAGT--TGATCATAAACCTTTTTT--C-G-AGTA--TCAAGTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTC--CACTAGTGTA-AATCGTCATTGCTGCGGTTAAAAAGC-TCGTAGTTGGAT-C-TGAGTCGC---AT--GCA-AT-G-ATTCG--C-CT----T--TG--G--CGT----TAAT------C---AT-TG-TTGTG---ACTA---T------T-T---G--CTG--G-T-T--TTCT-AT--TG-A--AA-----TTTC-----G-A-TT-----TCTATA-GTG-GC-TA--GCGA-GTT-TA-CTTTGA-AT-AAATTAAAGTGCT-CAGAACAAG---CGTT-----T--GC-TT-G--AAT-G-GTCGAT-CATGGAATAA-----TAAAAGAGGAC--TTCG---GT-T------CTATT-T----ATTGGTTC-AG---G-AA------CTG------AAAT-AATGGTTAAGAGGGACA--ATTC-GGGGGCATTCGTATCCCTGCGCGAGAGGTGAAATTCGTG-GACCG-CAGGGGGACGCCCTAAAGCGAAAG-CATTTGCC-AAGAAT--GTCTTCATTAATCA-AGAACGAAAGTCAGAGGTTCGAAGGCGATTAGATA--CCGCCC-TAGTTCTGAC
CGTAAACTATGCCATCTAGC-GA--TCC-GAT--GG-GG--TA--T--TG--T-T----GCCTT--GTCGAGG-AGCTT-CCCGGAAACGA--AA-GTCTTTCGGT-TCCTGGGGTAGTATGGTTGC-AAAGCT-G-AAACTTAAAGA-AATTGACGGAATGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGA--AAACT-CACCC-GGCCCGGACACCGTAA-GGATTGAC-----AGATTGA--A---AGCTCTTTCTC-GATTTGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTG-GTGGAG-CGATTTGTCTGGTTTATTCC-GAT-AACGAGCGAGACTCT-AG-C-C--TG-CTAAA-TA-G--TGA--CTA---------------GA----T------------ACG-----T----ATGTC-------TA-G----T--C-------TA-------------C-TT-----CTT-AG---AGGGATAAG-CGG---TGTT-T-----A-G-C--CGCA--CG-AGATTGAGCGATAACAGGTCTGTGATGCCCTTAGATGTTCGGGG-CTG-CACGCGCGCTACAATGGAAG-AAT-CAGC--TGGC---CTA--T----CCAT-TAC-CG-A-AAGGT-AT----T----GGTAAACCG-TTGAAACT--CTTCC-GTG-ACCGGGATAGGGAATTGT--A-ATT---------ATT---TCCC-TTGAACG-AGGAATTCCTAGTAAGTGTG-AGTCATCAGCTCACGCTGATTACGTCCC-TGCCATTTGTACACACCGCCCGTCGCTGTC-CGGG-ACTG--AGC-TGTC--TCGAGAGGACT-GCGG-A-CTA----CT--GTA----TTGA-GG---CCT-------T---CGGG------TCG-----CGATA----TGGCG---GG-AAA-CAG-TTC-AATC-G-CAATG-G--CTTGAACCGGGTAAAAGTCGT-A---------------------------------------------------------------------------------', '804Angiostrongylus_cantonensis': 
'------------------------------------------------------------------------------------ATT-AAGCCATG-CA-T-GAG-GA--GTTC-A--GC------TT--TA-A----G-T-GA--AA-C-TGCGAACGGCTCATTAG-AGCAGATG-T-GATT---TATT-CG--G--AA-A--A-T--CC-T----ATT-GGA--TAACTGCG--GTAAT-TCTGGAGCTAATACATGCGTAT-A-A-AC-CCTG-AC---T--T-T--C---GAAA--GGGTGCAAT-TA-TTAGAG---C---AA-A-TCAAT-CAT-------------T-T---TC----------G-GA------TG----TAGTT----------T---GCT---G-A-C-TC-TGAATA-A---CG--CAG--CATA-TCGG-CGGC-T-T-GT---TCGCCGATAAT-CCGAAAA----AG---TGT-C-TGCCC-TATCA--AC---CT---GA-TGGTAGTCTATTAGTCTA-CCATGGTTATTACGGGTAACGGAGAATAAGGGTT-CGACTCCGGAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAG-GCGCGAAACTTATCCAA-T-CTTG-----A-ATAGATGA-GATAGTGACT-----------------------AAAAATAAAAA--GACCA---TTCC-T-AT-G--GAACG-GTTATTTCAATGAGT--TGATCATAAACCTTTTTT--C-G-AGTA--TCCAGTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTC--CACTAGTGTA-AATCGTCATTGCTGCGGTTAAAAAGC-TCGTAGTTGGAT-C-TGAGTTGC---AT--GCA-AT-G-ATTCG--C-CT----T--TG--G--CGT----TAAT------C---AT-TG-TTGTG---ACTA---T------T-T---G--CTG--G-T-T--TTCT-AT--TG-A--AA-----TTTC-----G-A-TT-----TCTTTA-GTG-GC-TA--GCGA-GTT-TA-CTTTGA-AT-AAATTAAAGTGCT-CAGAACAAG---CGTT-----T--GC-TT-G--AAT-G-GTCGAT-CATGGAATAA-----TAAAAGAGGAC--TTCG---GT-T------CTATT-T----ATTGGTTC-AG---G-AA------CTG------AAGT-AATGATTAAGAGGGACA--ATTC-GGGGGCATTCGTATCCCTGCGCGAGAGGTGAAATTCGTG-GACCG-CAGGGGGACGCCCTAAAGCGAAAG-CATTTGCC-AAGAAT--GTCTTCATTAATCA-AGAACGAAAGTCAGAGGTTCGAAGGCGATTAGATA--CCGCCC-TAGTTCTGACCGTAAACTATGCCATCTAGC-GA--TCC-GAT--GG-GG--TA--T--TG--T-T----GCCTT--GTCGAGG-AGCTT-CCCGGAAACGA--AA-GTCTTTCGGT-TCCTGGGGTAGTATGGTTGC-AAAGCT-G-AAACTTAAAGA-AATTGACGGAATGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGA--AAACT-CACCC-GGCCCGGACACCGTAA-GGATTGAC-----AGATTGA--A---AGCTCTTTCTC-GATTTGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTG-GTGGAG-CGATTTGTCTGGTTTATTCC-GAT-AACGAGCGAGACTCT-AG-C-C--TG-CTAAA-TA-G--TGA--CTA---------------GA----TT-----------AT------T----GAGTC-------TA-G----T--C-------TA-------------C-TT-----CTT-AG---AGGGATAAG-CGG---TGTT-T-----A-G-C--CGCA--CG-AGATTGAGCGATAACA
GGTCTGTGATGCCCTTAGATGTCCGGGG-CTG-CACGCGCGCTACAATGGAAG-AAT-CAGC--TGGC---CTA--T----CCAT-TGC-CG-A-AAGGT-AT----T----GGTAAACCG-TTGAAACT--CTTCC-GTG-ACCGGGATAGGGAATTGT--A-ATT---------ATT---TCCC-TTGAACG-AGGAATTCCTAGTAAGTGTG-AGTCATCAGCTCACGCTGATTACGTCCC-TGCCATTTGTACACACCGCCCGTCG CTGTC-CGGG-ACTG--AGC-TGTC--TCGAGAGGACT-GCGG-A-CTA----CT--GTA----TTGA-GG---CCT-------T---CGGG------TCG-----CGATA----TGGCG---GG-AAA-CAG-TTC-AATC-G-CAATG-G--CTTGAACCGGGTAAAAGTCGT-AACAAGGTATCTG---------------------------------------------------------------------'} Thanks for any input. Ashleigh From nauman.maqbool at agresearch.co.nz Mon Mar 15 20:41:52 2004 From: nauman.maqbool at agresearch.co.nz (Maqbool, Nauman) Date: Mon Mar 15 20:47:25 2004 Subject: [BioPython] Blast parser error Message-ID: Hi everyone I have another (beginner's) question. Is there a way the query title in the header of the Blast report can be returned? By Query title I mean the title of the sequence used as the query for the Blast search. I notice that other objects e.g. from title, length and info about hsps can be returned very easily but returning objects from header, databasereport or parameters is not that straight forward, or is it? Regards Nauman > On Mar 14, 2004, at 7:47 PM, Maqbool, Nauman wrote: > > > Hi > > > > I am new to biopython and am trying out the NCBI Standalone Blast > > parser. 
> > While trying the blast parsing methods from the cookbook
> > (parsing standalone Blastn output) I got the following error message:
> >
> > >>> ================================ RESTART ================================
//snip
> >
> > ********************************************
> > Nauman J Maqbool PhD
> > Bioinformatics Group
> > AgResearch Invermay
> > Private Bag 50034
> > Puddle Alley
> > Mosgiel
> > New Zealand
> > email: nauman.maqbool@agresearch.co.nz
> > Tel: +64-3-489 9031
> > Fax: +64-3-489 3739
> > ********************************************

From lpritc at scri.sari.ac.uk Tue Mar 16 07:00:25 2004
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Tue Mar 16 07:05:59 2004
Subject: [BioPython] Blast parser error
In-Reply-To:
References:
Message-ID: <4056EC59.2060302@scri.sari.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Maqbool, Nauman wrote:

| Hi everyone
|
| I have another (beginner's) question. Is there a way the query title in
| the header of the Blast report can be returned? By Query title I mean
| the title of the sequence used as the query for the Blast search.
|
| I notice that other objects e.g. from title, length and info about hsps
| can be returned very easily but returning objects from header,
| databasereport or parameters is not that straight forward, or is it?
|
| Regards
|
| Nauman

Hi Nauman,

If the record object being returned from the parser is b_record, then
the title of the query sequence in the search producing the record is
b_record.query (for orientation, alignments are in b_record.alignments,
hsps in b_record.alignments[0].hsps and so on).

Try dir(b_record) for a list of the attributes of your record.

Best,

- --
Dr Leighton Pritchard AMRSC
D104, PPI, Scottish Crop Research Institute
Invergowrie, Dundee, DD2 5DA, Scotland, UK
E: lpritc@scri.sari.ac.uk
W: http://bioinf.scri.sari.ac.uk/index.shtml
T: +44 (0)1382 568579
F: +44 (0)1382 568578
PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAVuxYL1gZ+OWLpBsRAu8+AJwN6vd2wU/YvLMz/yVKUHMkU2Um2QCfQJNG
VS+VEgE3Nd4wuKyk4xig4+0=
=ZqmN
-----END PGP SIGNATURE-----

From chapmanb at uga.edu Wed Mar 17 20:00:29 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 20:11:53 2004
Subject: [BioPython] Could BioPython transfer GenBank file to Fasta file like BioPerl?
In-Reply-To: <20040227175451.19840.qmail@web12704.mail.yahoo.com>
References: <20040227175451.19840.qmail@web12704.mail.yahoo.com>
Message-ID: <20040318010029.GA99271@evostick.agtec.uga.edu>

Hi Long;

> I have a GenBank file, and want to transfer it to
> Fasta file. Of course, I can use FeatureParser to get
> the "sequence", "id", "description" ..., then write to
> a file.
>
> I want to know if there is a simple command or a
> module to do that? In Perl/BioPerl, it is very simple
> to transfer files between kinds of file format.

We do have a FormatIO system under development which is meant to act
much like the BioPerl SeqIO system and make simple format conversions
much easier. I wrote up some cookbook style documentation for the system
(and for the "by hand" system) you describe above.
You can get them from the documentation page:

http://biopython.org/documentation/

under "Cookbook-style documentation" and "Converting GenBank (and other
formats) to Fasta."

Hopefully this is helpful -- please let me know if there are any
questions or I can improve the docs. Thanks!

Brad

From chapmanb at uga.edu Wed Mar 17 20:18:53 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 20:30:18 2004
Subject: [BioPython] embl
In-Reply-To: <405050B4.5060007@biochem.ucl.ac.uk>
References: <405050B4.5060007@biochem.ucl.ac.uk>
Message-ID: <20040318011853.GC99271@evostick.agtec.uga.edu>

Hi Antonio;

> I'm new (and quite confused) to biopython.
> I have a simple question (maybe it looks silly):
> how do I parse an embl data file using biopython?
> Is there any way to retrieve the sequence information (The CDS section)?
> What about the position of the CDS sections (they are split in sub pieces)?

EMBL support is still lacking in Biopython. Currently we do have the
basis for developing an EMBL parser -- there is a Martel (the underlying
parsing system in Biopython) grammar for embl. This is located in
Bio/expressions/embl/embl65.py. We still do need someone to help do the
work to build this grammar into a "Biopython-style" parser.

As a workaround, the GenBank parser in Biopython is quite functional and
widely used -- so you could fetch your sequences in GenBank format and
parse out the features from there, as described in the documentation:

http://biopython.org/docs/tutorial/Tutorial004.html#toc13

Hope this helps!

Brad

From chapmanb at uga.edu Wed Mar 17 20:39:43 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 20:51:07 2004
Subject: [BioPython] Is there any more detailed documentation to BioPython?
In-Reply-To: <200403110100.i2B106tk013180@portal.open-bio.org>
References: <200403110100.i2B106tk013180@portal.open-bio.org>
Message-ID: <20040318013943.GF99271@evostick.agtec.uga.edu>

Hi Denny;

> I use BioPython about one year and it is really
> a good programming language. But I think the ONLY drawback
> is that BioPython has simple or poor documentation.

Thanks for the comments. Definitely I feel the same as you --
documentation is always something where we are lacking. A couple of ways
in which I am thinking about trying to improve things are:

* Getting a better representation of the documentation that is in the
  modules. Many of the Biopython docstrings are very useful, but the
  automatic extraction tool I've been using (Happydoc) hasn't always
  made me happy. Just last week I was pointed to epydoc:

  http://epydoc.sourceforge.net/

  which I am hearing is much better. So we may be improving like that.

* Modularizing the Biopython Tutorial documentation into smaller
  "cook-book" like sections. Honestly, the Tutorial is getting too big
  and unwieldy to maintain, and I'm planning on working to section it up
  into smaller parts that describe individual sections. We have already
  started doing this with the installation instructions and BioSQL, and
  recently I've written a couple other smaller bits.

The second part -- writing small documentation on doing something with
Biopython -- is something we can always use help with. We are definitely
looking for Biopython users to contribute here, and I do hope that with
an emphasis on writing just a small document that describes something we
can get more people to contribute on this front.

Thanks for the feedback.
Brad

From chapmanb at uga.edu Wed Mar 17 20:47:38 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 20:59:01 2004
Subject: [BioPython] contig mapping in BioPython
In-Reply-To: <6.0.3.0.2.20040424150959.03801db0@wardroper.org>
References: <6.0.3.0.2.20040424150959.03801db0@wardroper.org>
Message-ID: <20040318014738.GG99271@evostick.agtec.uga.edu>

Hi Alan;

> I'm thinking about writing some BioPython modules for contig/genome mapping
> - something akin to BioPerl's Bio::Assembler::contig - for use in genome
> mapping (and whatever else it ends up lending itself to).

Cool. This is definitely something we can use. A reasonable set of
Python objects to hold contig information, and annotations on the
assemblies of contigs, would be excellent. BioPerl does seem a
reasonable place to start to look at these objects, especially since
they've dealt with some of the messy problems of sequence coordinates
along contigs.

> Can't find any references to any such projects that are ongoing but would
> like to check if anyone else is working on this before I put in too much
> time in reinventing more wheels than we need.
> Anyone think this would/would not be useful?

Definitely useful. As Andy pointed out, the helpful code along these
lines that we currently have is in CVS in Bio/Sequencing. These are
parsers for Phred and Ace contig files, the latter of which I imagine
might be most useful/relevant.

But yes, I do offer definite encouragement :-). Please do keep us up to
date with ideas/code. Thanks for the mail.

Brad

From chapmanb at uga.edu Wed Mar 17 21:32:28 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 21:43:58 2004
Subject: [BioPython] trying to make NBRF dictionary
In-Reply-To: <1079387868.6757.20.camel@nate.ucdavis.edu>
References: <1079387868.6757.20.camel@nate.ucdavis.edu>
Message-ID: <20040318023228.GJ99271@evostick.agtec.uga.edu>

Hi Ashleigh;

> Hello.
> As there seems to be no existing Bio.Fasta-style dictionary code
> for alignments (Clustalw or NBRF), I thought I'd try to write a simple
> script using the NBRF iterator to make a dictionary of sequence
> name:sequence key:value pairs.

Okay, this makes good sense.

> I'm stuck already. My code seems to make a dictionary of sorts,
> but it behaves like it only
> has 1 key:value pair rather than 4 (len(mydict) returns 1) and the keys
> are just my variable name (cur_record.sequence_name), not what I think
> the keys should be - the actual data I put into the dictionary. I'm
> guessing that means I have some scope problem.

Yes, I think you're right. The output you gave seems to be what you
actually want (or at least what you describe you want above) but the
code itself does contain a bit of confusion with the mydict dictionary,
so it's probably something in the code that we don't see in the example.

> mydict={}
>
> def makedict(file1):
>     parser=NBRF.RecordParser()
>     first_file=open(file1, 'r')
>     iterator=NBRF.Iterator(first_file, parser)
>
>     while 1:
>         cur_record=iterator.next()
>         if cur_record is None:
>             break
>         name=cur_record.sequence_name
>         sequence=cur_record.sequence.data
>         mydict[name] = sequence
>
>     return mydict

Okay, the major confusion here is that mydict should be internal to the
makedict function.
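
To illustrate the point about keeping the dictionary inside the
function, here is a small sketch in plain Python. The record lists and
the names make_shared and make_local are made up for the example, and no
NBRF parsing is involved. With a module-level dictionary every call
fills, and returns, the same object, so entries from a second file end
up mixed with those from the first.

```python
shared = {}  # module-level dictionary shared by every call


def make_shared(records):
    # Writes into the module-level dictionary: every call adds to,
    # and returns, the very same object.
    for name, sequence in records:
        shared[name] = sequence
    return shared


def make_local(records):
    # Builds a fresh dictionary on each call, so results stay separate.
    result = {}
    for name, sequence in records:
        result[name] = sequence
    return result


# Hypothetical (name, sequence) pairs standing in for two parsed files.
file1 = [("taxonA", "ACGT"), ("taxonB", "AC-T")]
file2 = [("taxonC", "GGCC")]

a = make_shared(file1)
b = make_shared(file2)
print(a is b, len(a))          # one dict, holding entries from both files

c = make_local(file1)
d = make_local(file2)
print(c is d, len(c), len(d))  # two independent dicts
```
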
It seems like you would get an UnboundLocalError with the code you
posted, so I'm not exactly sure, but I'm guessing your function should
look like:

def makedict(file1):
    parser=NBRF.RecordParser()
    first_file=open(file1, 'r')
    iterator=NBRF.Iterator(first_file, parser)

    mydict = {}
    while 1:
        cur_record=iterator.next()
        if cur_record is None:
            break
        name=cur_record.sequence_name
        sequence=cur_record.sequence.data
        mydict[name] = sequence

    return mydict

Then you should be able to call it without any problem doing something
like:

file1_dict = makedict("my_file1.nbrf")
file2_dict = makedict("my_file2.nbrf")

From the problems you are describing, it sounds like you are doing
something where you reassign mydict because it is used both internally
and externally of the function.

One of the major problems with using functions (definitely forgive me if
I'm being too simplified here) is not having a good grasp of which
variables are internal to the function and which are external. In
general you want to focus on remembering that the only outside
information you should be passing the function is the argument (file1 in
this case) and the only information you should get back is what you
return (the dictionary in this case).

But, I digress. Hope this helps.

Brad

From thamelry at binf.ku.dk Fri Mar 19 04:08:28 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Fri Mar 19 08:31:49 2004
Subject: [BioPython] PDB header parser
In-Reply-To: <20040226001009.GB20365@evostick.agtec.uga.edu>
References: <20040226001009.GB20365@evostick.agtec.uga.edu>
Message-ID: <200403191008.28748.thamelry@binf.ku.dk>

Hi everybody,

Thanks to Kristian Rother (again!), Bio.PDB (the Biopython module that
deals with macromolecular structure data) now also provides convenient
access to a PDB file's header information. The Structure class now has a
'header' attribute which is a dictionary whose keys are the header
fields.
For example:

>>> structure.header["structure_method"]
x-ray diffraction
>>> structure.header["resolution"]
2.2

The code is in the CVS repository.

Best regards,

---
Thomas Hamelryck
Bioinformatik centret
Universitetsparken 15
Bygning 10
DK-2100 København Ø
Denmark
http://www.binf.ku.dk/users/thamelry/

From thamelry at binf.ku.dk  Fri Mar 19 04:23:01 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Fri Mar 19 08:31:50 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <20040212235232.GB2841@evostick.agtec.uga.edu>
References: <401AC30A.E639BD2E@ebc.uu.se> <40226C51.F70315B7@ebc.uu.se>
	<20040212235232.GB2841@evostick.agtec.uga.edu>
Message-ID: <200403191023.01597.thamelry@binf.ku.dk>

Hi everybody,

I recently moved to a new position and of course immediately started
to convert my colleagues to (Bio)Python :-). One of the most often asked
questions with respect to Biopython here is "Does Biopython have the same
functionalities as Bioperl?". Is there a document somewhere that compares
BioPerl and BioPython? Would be REALLY useful.

Another question: is anybody using BioPerl from BioPython?
If so, how? BioCorba?

And then some suggestions: I think it's time to do something about the
Biopython documentation, and maybe remove some obsolete, incomplete and/or
unmaintained code. At the moment it's a bit difficult to get a good overview
of what is present and useable in Biopython, I think...

Brad, I vaguely remember that you mentioned something about replacing
HappyDoc? I'd be happy to help out....

Best regards,

---
Thomas Hamelryck
Bioinformatik centret
Universitetsparken 15
Bygning 10
DK-2100 København Ø
Denmark
http://www.binf.ku.dk/users/thamelry/

From sbassi at asalup.org  Fri Mar 19 12:06:18 2004
From: sbassi at asalup.org (Sebastian Bassi)
Date: Fri Mar 19 12:12:31 2004
Subject: [BioPython] Parsing genbank problem
Message-ID: <405B288A.5050902@asalup.org>

Hello,

I've been trying to parse a gb file to no avail.
Here is my code (extracted from the Biopython cookbook):

from Bio import GenBank
from Bio.Seq import MutableSeq
from Bio.Alphabet import IUPAC
from Bio import utils

gb_handle = open("f:\\download\\ors.gbk","r")
feature_parser=GenBank.FeatureParser()
iterator = GenBank.Iterator(gb_handle, feature_parser)
while 1:
    cur_entry=iterator.next()
    if cur_entry is None:
        break
    print "test", cur_entry.id
gb_handle.close()

The ors.gbk file contains lots of GenBank entries (I'm not posting it
here because it is 800 KB long).

Here is what I get:

Traceback (most recent call last):
  File "C:/Program Files/Python22/parseGB.py", line 13, in ?
    cur_entry=iterator.next()
  File "C:\PROGRA~1\Python22\Lib\site-packages\Bio\GenBank\__init__.py", line 183, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\PROGRA~1\Python22\Lib\site-packages\Bio\GenBank\__init__.py", line 268, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\PROGRA~1\Python22\Lib\site-packages\Bio\GenBank\__init__.py", line 1255, in feed
    self._parser.parseFile(handle)
  File "C:\PROGRA~1\Python22\Lib\site-packages\Martel\Parser.py", line 338, in parseFile
    self.parseString(fileobj.read())
  File "C:\PROGRA~1\Python22\Lib\site-packages\Martel\Parser.py", line 366, in parseString
    self._err_handler.fatalError(result)
  File "C:\PROGRA~1\Python22\Lib\site-packages\_xmlplus\sax\handler.py", line 38, in fatalError
    raise exception
ParserPositionException: error parsing at or beyond character 1496

From chapmanb at uga.edu  Fri Mar 19 12:18:57 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Mar 19 12:30:16 2004
Subject: [BioPython] Parsing genbank problem
In-Reply-To: <405B288A.5050902@asalup.org>
References: <405B288A.5050902@asalup.org>
Message-ID: <20040319171857.GB95219@evostick.agtec.uga.edu>

Hi Sebastian;

> I've been trying to parse a gb file to no avail.
> Here is my code (extracted from biopython cookbook)

Your code looks just fine, so the traceback...

> Traceback (most recent call last):
[...]
> ParserPositionException: error parsing at or beyond character 1496

...indicates that there is a problem with the Martel grammar reading
one of the records in your file. I've done a number of fixes to the
GenBank parser since the last release, so if you could check things out
with the latest CVS that should hopefully fix things.

Alternatively (or if you still have problems), you can find out more
information about where the parser is failing by initializing your
parser with debug_level = 2:

feature_parser=GenBank.FeatureParser(debug_level = 2)

This will cause Martel to spit out lots of information and likely tell
you exactly where things are failing.

But the best bet is to get the latest CVS and use that. I'm hoping to
push out a new release semi-soon to get the code out there, but CVS is
the way to go until then.

Hope this helps!
Brad

From chapmanb at uga.edu  Fri Mar 19 12:37:03 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Mar 19 12:48:21 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <200403191023.01597.thamelry@binf.ku.dk>
References: <401AC30A.E639BD2E@ebc.uu.se> <40226C51.F70315B7@ebc.uu.se>
	<20040212235232.GB2841@evostick.agtec.uga.edu>
	<200403191023.01597.thamelry@binf.ku.dk>
Message-ID: <20040319173703.GC95219@evostick.agtec.uga.edu>

Hi Thomas;

> I recently moved to a new position and of course immediately started
> to convert my colleagues to (Bio)Python :-). One of the most often asked
> questions with respect to Biopython here is "Does Biopython have the same
> functionalities as Bioperl?". Is there a document somewhere that compares
> BioPerl and BioPython? Would be REALLY useful.

No, not that I know of. Honestly, I am not a big fan of
BioPerl/Biopython comparisons just as I'm not a huge fan of
Perl/Python comparisons -- I'm all for sticking with what you like
and working with it. But definitely if it were useful to people
looking at the projects, I'd be for having that kind of document.
> Another question: is anybody using BioPerl from BioPython?
> If so, how? BioCorba?

BioCorba is for all intents and purposes dead. The code still works and
all but it was not really being used and I've stopped doing development
on it (so I can graduate and all :-).

> And then some suggestions: I think it's time to do something about the
> Biopython documentation, and maybe remove some obsolete, incomplete and/or
> unmaintained code. At the moment it's a bit difficult to get a good overview
> of what is present and useable in Biopython, I think...

Agreed. About the docs -- as I mentioned the other day, I'm planning
to factor out the Tutorial into smaller cookbook-style sections
(there is a directory in CVS -- Doc/cookbook, that I've started
populating). For this weekend I have in my head to pull out at least
the "Working with sequences" section thanks to the feedback I got
from Marc, and to pull out and update the Bio.db registries section.

I'd certainly welcome help on this front -- from yourself and anyone
else. I'm taking it as quickly as it'll go, but that's my current plan
to keep the useful docs and fix them to be as up to date as possible.

If you are talking to beginners in Python, it might also be nice to
point them to Katja and Catherine's course:

http://www.pasteur.fr/recherche/unites/sis/formation/python/

which is linked from the documentation page. This does have a lot of
very nice code and explanations for getting started.

As far as code goes -- if you have suggestions for modules that are
no longer useful and we don't think can be fixed/updated easily
please do suggest a plan of action. We can get a survey on whether
others use these modules and then decide where to go. It's always a
good idea to keep out cruft.

> Brad, I vaguely remember that you mentioned something about replacing
> HappyDoc? I'd be happy to help out....

Yeah, I will also play around with that this weekend.
But, I do think the number one priority on the doc front is extracting
the Tutorial into smaller sections. If you want to help on that, it
would be great. For instance, the PDB module could have its own
documentation section :-).

Thanks for the comments -- I'd be interested to know what others
think about the plans, and also very interested in others picking a
section of the Tutorial to work on :-).

Thanks-for-the-PDB-updates-as-well-ly yr's,
Brad

From idoerg at burnham.org  Fri Mar 19 12:46:34 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri Mar 19 12:54:39 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <200403191023.01597.thamelry@binf.ku.dk>
References: <401AC30A.E639BD2E@ebc.uu.se> <40226C51.F70315B7@ebc.uu.se>
	<20040212235232.GB2841@evostick.agtec.uga.edu>
	<200403191023.01597.thamelry@binf.ku.dk>
Message-ID: <405B31FA.5000509@burnham.org>

Hi,

Regarding the documentation: how about adopting two models to help keep
it up-to-date:

1) CVS the Biopython Book. In that manner, it will be easy for people to
insert fixes, updates, new entries, etc. See the Plone book

http://plone.org/documentation

CVS on http://sourceforge.net/projects/plone-docs

2) Online comments from users, like in the Zope or MySQL manuals. That
would be helpful in identifying glaring gaps in the docs.

Thomas, I'm glad to hear you're performing some missionary work as well
...;)

Oh, and thanks for the PDB header addition.

./I

Thomas Hamelryck wrote:

> Hi everybody,
>
> I recently moved to a new position and of course immediately started
> to convert my colleagues to (Bio)Python :-). One of the most often asked
> questions with respect to Biopython here is "Does Biopython have the same
> functionalities as Bioperl?". Is there a document somewhere that compares
> BioPerl and BioPython? Would be REALLY useful.
>
> Another question: is anybody using BioPerl from BioPython?
> If so, how? BioCorba?
>
> And then some suggestions: I think it's time to do something about the
> Biopython documentation, and maybe remove some obsolete, incomplete and/or
> unmaintained code. At the moment it's a bit difficult to get a good overview
> of what is present and useable in Biopython, I think...
>
> Brad, I vaguely remember that you mentioned something about replacing
> HappyDoc? I'd be happy to help out....
>
> Best regards,
>
> ---
> Thomas Hamelryck
> Bioinformatik centret
> Universitetsparken 15
> Bygning 10
> DK-2100 København Ø
> Denmark
> http://www.binf.ku.dk/users/thamelry/

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From anunberg at oriongenomics.com  Fri Mar 19 13:29:51 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Fri Mar 19 14:15:11 2004
Subject: [BioPython] Problem parsing genbank file
Message-ID:

I just updated from CVS and got this error when trying to parse a
GenBank file that had multiple GenBank records in it:

Traceback (most recent call last):
  File "/loginhome/anunberg/bin/bac_hits.py", line 239, in ?
    main()
  File "/loginhome/anunberg/bin/bac_hits.py", line 98, in main
    seq_record = iterator.next()# go through each record
  File "/compbio/lib/python/Bio/GenBank/__init__.py", line 130, in next
    return self._parser.parse(File.StringHandle(data))
  File "/compbio/lib/python/Bio/GenBank/__init__.py", line 220, in parse
    self._scanner.feed(handle, self._consumer)
  File "/compbio/lib/python/Bio/GenBank/__init__.py", line 1248, in feed
    self._parser.parseFile(handle)
  File "/compbio/lib/python/Martel/Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "/compbio/lib/python/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/local/lib/python2.3/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 20805

The parser seems to work if the GenBank file only has one record.

I will suggest it again, PLEASE PLEASE PLEASE tag the code in CVS so I can
revert to stable versions easily.
I am now using Biopython regularly and I am on a bit of a schedule for
some of this work. Updating code is fine; however, tagging it will save
some headaches.

--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From jeffrey_chang at stanfordalumni.org  Fri Mar 19 14:23:57 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Fri Mar 19 14:29:23 2004
Subject: [BioPython] Problem parsing genbank file
In-Reply-To:
References:
Message-ID:

On Mar 19, 2004, at 1:29 PM, Andrew Nunberg wrote:

> I will suggest it again, PLEASE PLEASE PLEASE tag the code in cvs so I
> can revert to stable versions easily.
> I am now using biopython regularly and I am on a bit of schedule for
> some of this work. Updating code is fine however tagging it will save
> some headaches..

We have tagged all the releases, back to the beginning.
You should be able to revert to any of the following releases:

symbolic names:
	biopython-124: 1.8
	biopython-123: 1.7
	biopython-122: 1.7
	biopython-121: 1.6
	biopython-120: 1.6
	biopython-110: 1.5
	biopython-100a4: 1.4
	biopython-100a3: 1.4
	biopython-100a2: 1.4
	biopython-100a1: 1.3
	biopython-090d02: 1.1
	biopython-090d01: 1.1

If I recall, though, the tag for one of the very early releases got
lost somewhere along the way... :(

Jeff

From thamelry at binf.ku.dk  Fri Mar 19 15:15:49 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Fri Mar 19 15:27:18 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <405B31FA.5000509@burnham.org>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<405B31FA.5000509@burnham.org>
Message-ID: <200403192115.49577.thamelry@binf.ku.dk>

On Friday 19 March 2004 18:46, Iddo Friedberg wrote:
> 1) CVS the Biopython Book. In that manner, it will be easy for people to
> insert fixes/updates new entries, etc. etc. See the plone book

That sounds like a good idea... Didn't it use to be in the CVS? I can't
find it anymore...

> Oh, and thanks for the PDB header adition.

Thanks should go to Kristian Rother (he also donated a module to
download PDB files and keep a local PDB database up-to-date previously).
Nice!

-Thomas

From thamelry at binf.ku.dk  Fri Mar 19 15:51:05 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Fri Mar 19 16:24:04 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <20040319173703.GC95219@evostick.agtec.uga.edu>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
Message-ID: <200403192151.05004.thamelry@binf.ku.dk>

On Friday 19 March 2004 18:37, Brad Chapman wrote:
> No, not that I know of.
Honestly, I am not a big fan of
> BioPerl/Biopython comparisons just as I'm not a huge fan of
> Perl/Python comparisons

I agree, but still, it would be handy to have a page that lists BioPerl and
Biopython features so that people could decide what they want to use for a
certain purpose. I'm not suggesting a 'BioPython is better than BioPerl'
page. That is indeed pointless.

> As far as code goes -- if you have suggestions for modules that are
> no longer useful and we don't think can be fixed/updated easily
> please do suggest a plan of action. We can get a survey on whether
> others use these modules and then decide where to go. It's always a
> good idea to keep out cruft.

We could make a list of modules that will be potentially removed, post it to
the biopython list, and then actually remove them when no-one objects. Is
anybody using the two HMMs (HMM and MarkovModel) for instance? Or the
support vector machine (SVM) and NeuralNetwork modules? The xKMeans,
KNN and KMeans clustering modules also seem to be obsolete in view of
Michiel de Hoon's clustering module.

> Yeah, I will also play around with that this weekend. But, I do
> think the number one priority on the doc front is extracting the
> Tutorial into smaller sections. If you want to help on that, it
> would be great. For instance, the PDB module could have it's own
> documentation section :-).

That's coming up! I am very much in favor of automatically generated
documentation. Each module should at least have 5-10 lines or so in
api/Bio/index.html that describe what the module actually does!

What are you thinking of using in the future? I must admit that the HappyDoc
requirements for generating good and readable descriptions are a bit of a
mystery to me.... Bio.PDB looks very ugly (my fault, probably). I'd sure like
to improve that (especially since Bio.PDB is actually pretty well commented
code).

In any case, thanks a lot for all the work you put into Biopython!
-Thomas

From mdehoon at ims.u-tokyo.ac.jp  Sat Mar 20 01:29:55 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sat Mar 20 01:35:31 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <200403192151.05004.thamelry@binf.ku.dk>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
Message-ID: <405BE4E3.6000103@ims.u-tokyo.ac.jp>

Thomas Hamelryck wrote:
> We could make a list of modules that will be potentially removed, post it to
> the biopython list, and then actually remove them when no-one objects. Is
> anybody using the two HMMs (HMM and MarkovModel) for instance? Or the
> support vector machine (SVM) and NeuralNetwork modules? The xKMeans,
> KNN and KMeans clustering modules also seem to be obsolete in view of Michiel
> de Hoons clustering module.

The xKMeans and KMeans can be considered obsolete, as they are included in
Bio.Cluster. The KNN and other modules under Bio/Tools/Classification are
currently not obsolete, as they contain supervised learning methods, which are
not included in Bio.Cluster.

I am not sure what the purpose is of the GA (Genetic Algorithm Neural Network)
module and the NeuralNetwork module. Are they the same? Is their usage
described somewhere?

After cleaning up the modules, it may be a good idea to set up some kind of
unified way to deal with gene expression data in Biopython.

--Michiel.
--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From thamelry at binf.ku.dk  Sat Mar 20 02:54:09 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Sat Mar 20 21:53:17 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <405BE4E3.6000103@ims.u-tokyo.ac.jp>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403192151.05004.thamelry@binf.ku.dk>
	<405BE4E3.6000103@ims.u-tokyo.ac.jp>
Message-ID: <200403200854.09983.thamelry@binf.ku.dk>

On Saturday 20 March 2004 07:29, Michiel Jan Laurens de Hoon wrote:
> I am not sure what the purpose is of the GA (Genetic Algorithm Neural
> Network) module and the NeuralNetwork module. Are they the same? Is their
> usage described somewhere?

They are not the same. GA is a genetic algorithm framework and NeuralNetwork
is a neural network (which seems to have some special features to deal with
genes as input). They are both potentially interesting (providing that they
actually work) but it's a complete mystery how they are to be used. Does
anybody know who implemented these modules?

-Thomas

From chapmanb at uga.edu  Sat Mar 20 11:32:02 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 20 22:51:16 2004
Subject: [BioPython] Problem parsing genbank file
In-Reply-To:
References:
Message-ID: <20040320163202.GF95219@evostick.agtec.uga.edu>

Hi Andy;

> I just updated from cvs and got this error when trying to parse a genbank
> file that had mutliple genbank files in it, I got this error :
> Traceback (most recent call last):
[...]
> Martel.Parser.ParserPositionException: error parsing at or beyond character
> 20805

Thanks for sending me the file separately. It actually looks like
one of the records in the file: AF124045, was somehow corrupted.
The region where the parser fails looks like:

     misc_feature    <38880..>39000
                     /note="putative breakpoint of recombination in
                     orthologous maize region, 38875 bp is the end of
                     homology, >38875-50877repeat_region   join(<38904..38924,38960..>39022)
                     /note="CT-rich stretches"
                     /evidence=not_experimental

where in the original file (from NCBI), it looks like:

     misc_feature    <38880..>39000
                     /note="putative breakpoint of recombination in
                     orthologous maize region, 38875 bp is the end of
                     homology, >38875-50877< region missing in maize;
                     Region: Breakpoint"
                     /evidence=not_experimental
     repeat_region   join(<38904..38924,38960..>39022)
                     /note="CT-rich stretches"
                     /evidence=not_experimental

So somehow it looks like the text from "< region missing in" to the
next feature key (repeat_region) was deleted. I've not seen
something like this before, but the best solution seems to be to
re-download this record and try parsing it all again. All of the
other records in your file seem to parse fine.

Hope this helps.
Brad

From chapmanb at uga.edu  Sun Mar 21 12:46:05 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sun Mar 21 12:57:31 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <200403192151.05004.thamelry@binf.ku.dk>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
Message-ID: <20040321174605.GA18818@evostick.agtec.uga.edu>

Hey all;
Great discussion on things. I'll try to touch on all the points in
one e-mail; apologies for the length.

[Automated documentation generation]

Thomas:
> What are you thinking of using in the future? I must admit that the HappyDoc
> requirements for generating good and readable descriptions are a bit of a
> mystery to me.... Bio.PDB looks very ugly (my fault, probably). I'd sure like
> to improve that (especially since Bio.PDB is actually pretty well commented
> code).

Yes, I've not been a fan of HappyDoc for a while.
I was pointed to, and really like, epydoc. Please take a look at:

http://biopython.org/docs/api/private/trees.html

and let me know what you think. I am a big fan of the new output,
and we can just pull out the text documentation of modules,
classes and functions without having to try and format it up in some
pretty way.

I made a number of small modifications to the docs to get them to
look nicer under this system. I'd like to stick with epydoc unless
people have objections. I added some documentation to the end of the
contributing guidelines to describe the simple things you can do to
make your modules, classes and functions be maximally useful with
epydoc:

http://biopython.org/docs/developer/contrib.html

[Non-automated documentation (someone has to write it style)]

Iddo:
> Regarding the documentation: how about adopting two models to help keep
> it up-to-date:
>
> 1) CVS the Biopython Book. In that manner, it will be easy for people to
> insert fixes/updates new entries, etc. etc. See the plone book
>
> http://plone.org/documentation

The documentation is in CVS -- Doc/Tutorial.tex and Doc/cookbook
for all the new cookbook stuff (one directory per example there).

As far as getting a framework like Plone in place, I honestly am not
sure I am really for that. I do think it is a good idea, but our
attempts at the Wiki in the past have really soured me on "fancier"
ways to generate documentation.

Really, what I'd like to see is contributions from people in the new
cookbook style. This requires no need to learn any type of system --
I'm happy to accept docs in plain text, html, pdf -- anything that
will be viewable on the web. So people can write documentation
however they feel comfortable.

> 2) Online comments from users, like in the Zope or MySQL manuals. that
> would be helpful in identifying glaring gaps in the docs.

This is a good idea, but also along the lines of my biases against
trying to be fancier than we need to be.
Honestly, the user bases of
Zope or MySQL dwarf those of Biopython (although we are catching up
fast :-) and I don't want to put the cart before the horse (or
however that cliche goes).

But those are just my opinions -- I can always be convinced
otherwise :-).

[Removal/Deprecation of modules]

Thomas:
> We could make a list of modules that will be potentially removed, post it to
> the biopython list, and then actually remove them when no-one objects. Is
> anybody using the two HMMs (HMM and MarkovModel) for instance? Or the
> support vector machine (SVM) and NeuralNetwork modules?

Is potential non-use (or trying to assess non-use) really a good
model to remove modules? If they work and are decently coded then I
think they have a potential use -- I definitely do know that a lot of the
different supervised learning methods are useful to people doing
clustering of literature (which is what I'm pretty positive Jeff
worked on for his thesis).

If things don't work, or are duplicated, then I'm in favor of trying
to get rid of that, but working code seems useful to me.

Thomas:
> The xKMeans,
> KNN and KMeans clustering modules also seem to be obsolete in view of Michiel
> de Hoons clustering module.

Michiel:
> The xKMeans and KMeans can be considered obsolete, as they are included in
> Bio.Cluster. The KNN and other modules under Bio/Tools/Classification are
> currently not obsolete, as they contain supervised learning methods, which
> are not included in Bio.Cluster.

If things are duplicated then the right thing to do is to remove the
duplication. I'd like to consider two things, though:

1. I'd like Jeff to chime in since these are his modules (I think).
I don't have enough knowledge about clustering to know if
Bio.Cluster also does the things that he needed his code to do.

2. We want to make sure to be careful about back-compatibility.
If we decide to remove things, I'd like to first have them raise
DeprecationWarnings for a couple of releases so that people have
time to change their code -- and also have some quick docs about how
to change from old to new. Breaking code is bad, and I want to make
it as easy as possible for people to keep up with changes.

Thomas:
> GA is a genetic algorithm framework and NeuralNetwork
> is a neural network (which seems to have some special features to deal with
> genes as input). They are both potentially interesting (providing that they
> actually work) but it's a complete mystery how they are to be used. Does
> anybody know who implemented these modules?

Yup, I did. I don't think they are perfect by any means, but they
should still work (all of the tests still pass, at least) and be
useful. I honestly don't use them much myself anymore since I
finished the project I used them for. But, they do need
documentation.

Michiel:
> After cleaning up the modules, it may be a good idea to set up some kind of
> unified way to deal with gene expression data in Biopython.

Definitely -- I would welcome this. I nominate you to be in charge
:-).

[Miscellaneous bits]

Me:
> > No, not that I know of. Honestly, I am not a big fan of
> > BioPerl/Biopython comparisons just as I'm not a huge fan of
> > Perl/Python comparisons

Thomas:
> I agree, but still, it would be handy to have a page that lists BioPerl and
> Biopython features so that people could decide what they want to use for a
> certain purpose.

I agree. I'd be happy to accept a document like this :-).

Thanks again for everyone's comments!
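
The deprecation step described above can be sketched with the standard
library alone. This is only an illustration under stated assumptions:
the function name kmeans_cluster and its placeholder body are
hypothetical stand-ins, not code from Biopython.

```python
import warnings


def kmeans_cluster(data, k):
    """Hypothetical stand-in for a function in a module slated for removal."""
    warnings.warn(
        "kmeans_cluster is deprecated; please switch to the replacement module",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # Placeholder behaviour for the sketch: return k empty clusters.
    return [[] for _ in range(k)]


# Old code keeps working, but the warning is emitted and can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = kmeans_cluster([1, 2, 3], 2)

print(result)
print(caught[0].category.__name__)
```

Callers see the warning for a few releases and still get a result, which
gives them time to migrate before the function is actually removed.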
Brad

From jeffrey_chang at stanfordalumni.org  Sun Mar 21 16:30:09 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Sun Mar 21 16:35:34 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <20040321174605.GA18818@evostick.agtec.uga.edu>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
	<20040321174605.GA18818@evostick.agtec.uga.edu>
Message-ID:

On Mar 21, 2004, at 12:46 PM, Brad Chapman wrote:

> Yes, I've not been a fan of HappyDoc for a while. I was pointed to,
> and really like, epydoc. Please take a look at:
>
> http://biopython.org/docs/api/private/trees.html

This looks very nice to me. Is there any way to ask it to hide private
methods or variables, i.e. those that begin with "_"? Although knowing
what those are is occasionally useful, exposing that extra information
may be confusing for people reading the docs and trying to figure out
how to use the module.

>> 2) Online comments from users, like in the Zope or MySQL manuals. that
>> would be helpful in identifying glaring gaps in the docs.
>
> This is a good idea, but also along the lines of my biases against
> trying to be fancier than we need to be. Honestly, the user bases of
> Zope or MySQL dwarf those of Biopython (although we are catching up
> fast :-) and I don't want to put the cart before the horse (or
> however that cliche goes).

Agreed. People did not take to our wiki, and bioperl doesn't use theirs
much either.

> [Removal/Deprecation of modules]
> Thomas:
>> We could make a list of modules that will be potentially removed,
>> post it to the biopython list, and then actually remove them when
>> no-one objects. Is anybody using the two HMMs (HMM and MarkovModel)
>> for instance? Or the support vector machine (SVM) and NeuralNetwork
>> modules?
>
> Is potential non-use (or trying to assess non-use) really a good
> model to remove modules?
If they work and are decently coded then I
> think they have a potential use -- I definitely do know that a lot of
> the different supervised learning methods are useful to people doing
> clustering of literature (which is what I'm pretty positive Jeff
> worked on for his thesis).
>
> If things don't work, or are duplicated, then I'm in favor of trying
> to get rid of that, but working code seems useful to me.
>
> Thomas:
>> The xKMeans,
>> KNN and KMeans clustering modules also seem to be obsolete in view of
>> Michiel de Hoons clustering module.
>
> Michiel:
>> The xKMeans and KMeans can be considered obsolete, as they are
>> included in Bio.Cluster. The KNN and other modules under
>> Bio/Tools/Classification are currently not obsolete, as they contain
>> supervised learning methods, which are not included in Bio.Cluster.
>
> If things are duplicated then the right thing to do is to remove the
> duplication. I'd like to consider two things, though:
>
> 1. I'd like Jeff to chime in since these are his modules (I think).
> I don't have enough knowledge about clustering to know if
> Bio.Cluster also does the things that he needed his code to do.

kMeans is superseded by Bio.Cluster, and can be deprecated. Thomas
wrote xkMeans, which is a visualizer for kMeans, and could be rewritten
to use Bio.Cluster instead.

MarkovModel is redundant with HMM. Probably only one of them is
necessary.

SVM is superseded by libsvm. It should be deprecated.

kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful,
but need more documentation. Also, another idea is that they could be
donated to the pyml project. Currently, no code in Biopython depends
on them. However, they might be useful for a microarray package, in
which case donating them would introduce another dependency.
Jeff

From chapmanb at uga.edu  Mon Mar 22 18:06:54 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Mon Mar 22 18:18:13 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To:
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
	<20040321174605.GA18818@evostick.agtec.uga.edu>
Message-ID: <20040322230654.GF22666@evostick.agtec.uga.edu>

Hey Jeff and everyone;

Me:
> >Yes, I've not been a fan of HappyDoc for a while. I was pointed to,
> >and really like, epydoc. Please take a look at:
> >
> >http://biopython.org/docs/api/private/trees.html
>
> This looks very nice to me. Is there any way to ask it to hide private
> methods or variables, i.e. those that begin with "_"? Although knowing
> what those are is occasionally useful, exposing that extra information
> may be confusing for people reading the docs and trying to figure out
> how to use the module.

Good points. The one I have linked to is actually the version that
includes private variables. If you substitute public for private in
the URL above (or just click "hide private" at the top), the private
functions are hidden.

The problem I've had so far is that epydoc hides some public modules
by labelling them as private. I hadn't figured out exactly how it
decides what is public and what is private in terms of modules (it
seems to use the _Underscore for classes and functions, which I'm
happy with).

I played around with it a bit since then and it looks like it was
using the __all__ variable to determine what is public and private.
To be honest, I'd like to remove the use of __all__ completely
unless people object. Unless I'm mistaken it controls what happens
when people do from Bio import * (or from Bio.Whatever import *).
Doing the import * is pretty discouraged now, and for maintenance it
is fairly annoying to have variables you have to make sure are
updated.

Would anyone object if we stopped using __all__?
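
For anyone unfamiliar with __all__, here is a minimal sketch of the
behaviour under discussion. It builds a throwaway module in memory; the
module name demo_mod and both function names are hypothetical and exist
only for this illustration.

```python
import sys
import types

# Source for a throwaway module; only public_func is listed in __all__.
source = '''
__all__ = ["public_func"]

def public_func():
    return "listed in __all__"

def other_func():
    return "public name, but not listed in __all__"
'''

# Create the module in memory and register it so it can be imported.
demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# Run a star-import in a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_mod import *", namespace)

print("public_func" in namespace)
print("other_func" in namespace)

# Names left out of __all__ are still reachable explicitly.
print(demo_mod.other_func())
```

The star-import only brings in the names listed in __all__, which is
why every new public name has to be added to that list by hand for
import * to see it.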
Any reasons to keep it? I may be missing the point of it completely. > kMeans is superceded by Bio.Cluster, and can be deprecated. Thomas > wrote xkMeans, which is a visualizer for kMeans, and could be rewritten > to use Bio.Cluster instead. Okay. I guess this would involve a couple of steps: 1. Starting to raise a Deprecation Warning for the kMeans module. 2. Trying to write some kind of short document on how to switch from using kMeans to using Bio.Cluster.kcluster. BioPerl has a document called DEPRECATED with this kind of info -- that seems like a reasonable step to follow. Jeff and Michiel, would it be possible to write something up quick. 3. Thomas needs to decide if he wants to rewrite xkMeans or deprecate it as well. Also, Thomas did mention the potential usefulness of having both pure Python and Python/C implementation, in case someone wanted to use the code for learning purposes. I'm not sure how much this weighs on people's minds versus maintaining a slimmer code base. It does seem to me like duplicate versions are a bad for confusion issues, and because we have limited developer time to maintain and document things. Anyways, just a point to bring up. > MarkovModel is redundant with HMM. Probably only one of them is > necessary. Okay, I wrote HMM a long time ago and really haven't used it much since then. I think you wrote MarkovModel. Both have tests and things. MarkovModel has the serious advantage of having a C module underlying it, which I think makes it the best candidate for keeping. I'd be very happy if we could get a volunteer to look at these and decide if one has more functionality then the other, and then move forward on this. Anyone excited about volunteering? If I can't get someone, I can try to look at this myself (but not real soon). > SVM is superceded by libsvm. It should be deprecated. > > kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, > but need more documentation. 
Also, another idea is that they could be > donated to the pyml project. Currently, no code in Biopython depends > on them. However, they might be useful for a microarray package, in > which case donating them would introduce another dependency. Ah, I didn't know about PyML. It does seem like it would be useful to try and coordinate with their project -- do you happen to know the author (Stanford connections and all)? Other candidates for donation are the recently discussed GA and Neural Network packages. Lots of thoughts. I think for the next release (which I'd like to try and do soon-like) I think we should work on the kMeans code as a priority and go from there. Brad From jeffrey_chang at stanfordalumni.org Mon Mar 22 21:56:15 2004 From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang) Date: Mon Mar 22 22:01:41 2004 Subject: [BioPython] Questions & suggestions In-Reply-To: <20040322230654.GF22666@evostick.agtec.uga.edu> References: <401AC30A.E639BD2E@ebc.uu.se> <200403191023.01597.thamelry@binf.ku.dk> <20040319173703.GC95219@evostick.agtec.uga.edu> <200403192151.05004.thamelry@binf.ku.dk> <20040321174605.GA18818@evostick.agtec.uga.edu> <20040322230654.GF22666@evostick.agtec.uga.edu> Message-ID: On Mar 22, 2004, at 6:06 PM, Brad Chapman wrote: [Jeff] >> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, >> but need more documentation. Also, another idea is that they could be >> donated to the pyml project. Currently, no code in Biopython depends >> on them. However, they might be useful for a microarray package, in >> which case donating them would introduce another dependency. > > Ah, I didn't know about PyML. It does seem like it would be useful > to try and coordinate with their project -- do you happen to know the > author (Stanford connections and all)? Other candidates for donation > are the recently discussed GA and Neural Network packages. 

Yes, I've talked to Asa about merging this machine learning code into pyml, and he seemed open to the idea. However, it looked like it would be a bit of work to port things over to the pyml style of doing things. OTOH, it's still a bit of work getting the code documented up to the point where it is generally useful...

Jeff

From mdehoon at ims.u-tokyo.ac.jp Mon Mar 22 22:51:19 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Mar 22 22:57:12 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To:
References: <401AC30A.E639BD2E@ebc.uu.se> <200403191023.01597.thamelry@binf.ku.dk> <20040319173703.GC95219@evostick.agtec.uga.edu> <200403192151.05004.thamelry@binf.ku.dk> <20040321174605.GA18818@evostick.agtec.uga.edu>
Message-ID: <405FB437.5020300@ims.u-tokyo.ac.jp>

Thomas:
> The xKMeans, KNN and KMeans clustering modules also seem to be obsolete in
> view of Michiel de Hoon's clustering module.

Michiel:
> The xKMeans and KMeans can be considered obsolete, as they are included in
> Bio.Cluster. The KNN and other modules under Bio/Tools/Classification are
> currently not obsolete, as they contain supervised learning methods, which
> are not included in Bio.Cluster.

Jeffrey Chang wrote:
> kMeans is superseded by Bio.Cluster, and can be deprecated. Thomas wrote
> xkMeans, which is a visualizer for kMeans, and could be rewritten to use
> Bio.Cluster instead.

Jeffrey Chang wrote:
> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, but
> need more documentation. Also, another idea is that they could be donated to
> the pyml project. Currently, no code in Biopython depends on them. However,
> they might be useful for a microarray package, in which case donating them
> would introduce another dependency.

Brad:
> Okay. I guess this would involve a couple of steps:
> 1. Starting to raise a Deprecation Warning for the kMeans module.
> 2. Trying to write some kind of short document on how to switch from using
>    kMeans to using Bio.Cluster.kcluster. BioPerl has a document called
>    DEPRECATED with this kind of info -- that seems like a reasonable step to
>    follow. Jeff and Michiel, would it be possible to write something up quick.
> 3. Thomas needs to decide if he wants to rewrite xkMeans or deprecate it
>    as well.

Michiel again:

1. OK.

2. OK, I'll work on that.

3. If I understand correctly, the xkMeans module provides a visualization of the progress of the k-means clustering algorithm by showing the cluster sizes. If so, it is not clear how to switch that to using the kcluster in Bio.Cluster. One of the key points in Bio.Cluster's kcluster is that it automatically repeats the k-means algorithm starting from different initial (random) clusterings. For the kMeans module, I assume it performs one run of the k-means algorithm, for which the visualization in xkMeans makes sense. For repeated k-means runs, such a visualization may not be as useful.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From mdehoon at ims.u-tokyo.ac.jp Mon Mar 22 22:59:20 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Mar 22 23:04:54 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To:
References: <401AC30A.E639BD2E@ebc.uu.se> <200403191023.01597.thamelry@binf.ku.dk> <20040319173703.GC95219@evostick.agtec.uga.edu> <200403192151.05004.thamelry@binf.ku.dk> <20040321174605.GA18818@evostick.agtec.uga.edu>
Message-ID: <405FB618.1070901@ims.u-tokyo.ac.jp>

Jeffrey Chang wrote:
> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, but
> need more documentation. Also, another idea is that they could be donated to
> the pyml project. Currently, no code in Biopython depends on them. However,
> they might be useful for a microarray package, in which case donating them
> would introduce another dependency.

Biopython and pyml are likely to have different goals for these routines. In particular, a routine such as kNN is quite useful for microarray data analysis, and I expect that the routine will get updated over time to be better suited for biological (microarray or other) data analysis. So I would suggest keeping these routines in Biopython and continuing to work on them. We can still donate these routines to pyml as well, but I would expect that e.g. the Biopython kNN and the pyml kNN will diverge over time to best suit each package's requirements.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From letondal at pasteur.fr Tue Mar 23 01:49:37 2004
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue Mar 23 01:55:00 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: Your message of "Mon, 22 Mar 2004 18:06:54 EST." <20040322230654.GF22666@evostick.agtec.uga.edu>
Message-ID: <200403230649.i2N6nbUe239936@electre.pasteur.fr>

Hi,

> > >http://biopython.org/docs/api/private/trees.html

This document is really useful!

It would also be useful to have an example in the documentation of each module (available through pydoc or a Web page like the one at this URL), something like the Synopsis in bioperl modules, where the rule is that you can cut and paste the example and it is supposed to work.

Including full examples that explain the interactions between several classes could also be useful. Having recently taught biopython to biologists, I observed that this was the most difficult part: which class is playing which role - that's quite complex.
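
One way the cut-and-paste "Synopsis" idea above could look in Python is a docstring whose examples are checked with the standard doctest module. This is only a sketch: the module and the function count_bases are hypothetical and are not part of Biopython.

```python
"""Hypothetical module with a cut-and-paste Synopsis section.

Synopsis
--------
>>> counts = count_bases("ACGTAC")
>>> counts["A"]
2
>>> sorted(counts.items())
[('A', 2), ('C', 2), ('G', 1), ('T', 1)]
"""
import doctest


def count_bases(sequence):
    """Return a dict mapping each base (upper-cased) to its count.

    >>> count_bases("GGC")["G"]
    2
    """
    counts = {}
    for base in sequence.upper():
        counts[base] = counts.get(base, 0) + 1
    return counts


if __name__ == "__main__":
    # Re-run every example in the docstrings; silent when they all pass.
    doctest.testmod()
```

Because the examples are executed, a synopsis written this way cannot silently drift out of date with the code it documents.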

--
Catherine Letondal -- Pasteur Institute Computing Center

From anunberg at oriongenomics.com Tue Mar 23 12:51:40 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Tue Mar 23 13:25:25 2004
Subject: [BioPython] GFF parser?
Message-ID:

Hi,

I was looking through Biopython for a GFF library that does parsing and creates a seq feature object. I did find a GFF module but I wasn't sure what it was doing.

Andy

--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From aaron at ocelot-atroxen.dyndns.org Thu Mar 25 01:14:56 2004
From: aaron at ocelot-atroxen.dyndns.org (Aaron Zschau)
Date: Thu Mar 25 01:20:17 2004
Subject: [BioPython] parsing blast results for use in clustal
Message-ID:

I'm new to biopython and python in general. I am trying to take the results from a blast search to feed into a clustal multiple alignment. I followed the cookbook tutorials and can get results from blast but parsing into a file that clustal can read is giving me some trouble. (My current code prints all results under the e_value threshold with index numbers, and I then take user input to take the selected results and print them into a file.)

From what I can tell, some of the title records in my blast results have newline characters in them and are causing my resulting file to throw up seg faults when it runs in clustal. Is there an easier way to send selected blast results to a file that clustal can easily read?

thanks in advance,

Aaron Zschau

from Bio.Blast import NCBIWWW

b_parser = NCBIWWW.BlastParser()
b_record = b_parser.parse(blast_results)

index = 0
E_VALUE_THRESH = 0.01
for alignment in b_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect < E_VALUE_THRESH:
            print "[" + str(index) + "] " + alignment.title + " " + hsp.match[0:20] + '...'
            index = index + 1

output_file = open('clustal-in', 'w')
while 1:
    input = raw_input("\nEnter the index of the next sequence to align or 'a' to align\n")
    if input=='a':
        break
    else:
        output_file.write(b_record.alignments[int(input)].title[0:5] + " " + b_record.alignments[int(input)].hsps[0].match[0:alignment.length] + "\n")
output_file.close()

From pal at cbu.uib.no Thu Mar 25 06:06:29 2004
From: pal at cbu.uib.no (Paal Puntervoll)
Date: Thu Mar 25 06:11:52 2004
Subject: [BioPython] PHI-BLAST support?
Message-ID: <20040325110629.GB6048@svartfuru.ii.uib.no>

Hi,

I'm wondering whether BioPython supports doing PHI-BLAST searches (locally), and whether parsing of PHI-BLAST output information such as pattern positions is supported (see below)?

Excerpt from PHI-BLAST output
-------
Query: 541 SEGHGVSLGSSLASPDLKMGNLQNSPVNMNPPPLSKMGSLDSKDCFGLYGEPSEGTTGQA 600
pattern 557                ****
           SEGHGVSLGSSLASPDLKMGNLQNSPVNMNPPPLSKMGSLDSKDCFGLYGEPSEGTTGQA
Sbjct: 541 SEGHGVSLGSSLASPDLKMGNLQNSPVNMNPPPLSKMGSLDSKDCFGLYGEPSEGTTGQA 600
-------

btw, here's the command I issue to run PHI-BLAST from command-line:

blastpgp -i [infile] -k [patternfile] -p patseedp

Pål

--
Pål Puntervoll
Computational Biology Unit
Bergen Centre for Computational Science
University of Bergen
Phone: +47 555 84040

From pwilkinson at videotron.ca Fri Mar 26 00:04:23 2004
From: pwilkinson at videotron.ca (Peter Wilkinson)
Date: Fri Mar 26 00:09:43 2004
Subject: [BioPython] Martel Question
Message-ID: <5.2.0.9.0.20040325232407.00b28650@pop.videotron.ca>

I have just built a parser for Quantarray ... but not with Martel. I did this as a warm-up for a pile of code that I need to write for a microarray project (~3000 arrays). It has been some time since I have written any code, and I need to "get into it".

This parser is built with the typical scanner consumer model philosophy, built on a state machine that will handle the quantarray output files. This parser was not built to load anything into the memory.
It was meant to be as fast as possible at transforming the original file into a new format written to disk: I was processing many files, 5 MB each x 3000. So each line was read from the input stream, processed, and discarded.

Eventually I will be building matrices of 19000 genes by 3000 samples from the quantarray files that I will be reading and I need something that can load into a Quantarray Record object, however I was a little worried about the Record sizes. There is only 1 record per file, which might be the saving grace. When I was parsing Genbank genomic files (many megs), the Genbank parser was slowing to a crawl (and required piles of memory); 1 genomic record per file.

I would like to know if Martel scales to processing 5 MB Records at a time, if the entire file is in memory? Has Martel been improved over the last few months in that regard ... I may have a need to parse large Genbank NT records again.

In Dalke's Martel paper it reads "Similarly, it should be possible to read data from the input stream only when required, so that overall memory footprint stays low. " Is that still to be done?

Peter

From chapmanb at uga.edu Thu Mar 25 19:39:02 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Mar 26 00:52:23 2004
Subject: [BioPython] parsing blast results for use in clustal
In-Reply-To:
References:
Message-ID: <20040326003902.GA24957@misterbd.agtec.uga.edu>

Hi Aaron;

> I'm new to biopython and python in general. I am trying to take the
> results from a blast search to feed into a clustal multiple alignment.
> I followed the cookbook tutorials and can get results from blast but
> parsing into a file that clustal can read is giving me some trouble.

Okay, so if I'm understanding you correctly, what you want is a file that you can put into Clustalw to do an alignment. From the code you supplied, it looks like what you are printing out is clustalw aln output -- the results from an alignment.

If I can try and extrapolate, what you probably want to do is retrieve the FASTA record for the hit and then write this to a file -- then subsequently use clustalw for the alignment.

If I'm at all interpreting you correctly, then you can do this quite readily. Since it's NCBIWWW, I'll assume you are BLASTing against some kind of standard NCBI database. Then you'll just need to split the title of the hit to get out the GI or accession number. With this, you can retrieve the corresponding full length FASTA record from NCBI with code like:

>>> accession = "AAN04997.1"
>>> from Bio import GenBank
>>> dict = GenBank.NCBIDictionary(format = "fasta")
>>> rec = dict[accession]
>>> print rec
>gi|22725997|gb|AAN04997.1| putative transcription initiation factor [Oryza sativa (japonica cultivar-group)]
MGSADLVLKAACEGCGSPSDLYGTSCKHTTLCSSCGKSMALSGARCLVCSAPITNLIREYNVRANATTDK
SFSIGRFVTGLPPFSKKKSAENKWSLHKEGLQGRQIPENMREKYNRKPWILEDETGQYQYQGQMEGSQSS
TATYYLLMMHGKEFHAYPAGSWYNFSKIAQYKQLTLEEAEEKMNKRKTSATGYERWMMKAATNGPAAFGS
DVKKLEPTNGTEKENARPKKGKNNEEGNNSDKGEEDEEEEAARKNRLALNKKSMDDDEEGGKDLDFDLDD
EIEKGDDWEHEETFTDDDEAVDIDPEERADLAPEIPAPPEIKQDDEENEEEGGLSKSGKELKKLLGKAAG
LNESDADEDDEDDDQEDESSPVLAPKQKDQPKDEPVDNSPAKPTPSGHARGTPPASKSKQKRKSGGGDDS
KASGGAASKKAKVESDTKPSVAKDETPSSSKPASKATAASKTSANVSPVTEDEIRTVLLAVAPVTTQDLV
SRFKSRLRGPEDKNAFAEILKKISKIQKTNGHNYVVLRDDKK

So the returned record is a string FASTA record and you can replace your output_file.write(...) code with:

output_file.write(rec)

and then end up with a file full of FASTA sequences, which clustalw will take as input to do a subsequent alignment.
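
As a rough sketch of the "file full of FASTA sequences" step, the helper below joins several FASTA-formatted strings so that consecutive records never run together, and tidies titles that contain stray newlines (the problem described in the original question). It uses plain Python only; join_fasta_strings and clean_title are hypothetical names, not Biopython functions, and the input is assumed to be FASTA text like the record shown above.

```python
def clean_title(title):
    """Collapse newlines and runs of whitespace in a title to single spaces."""
    return " ".join(title.split())


def join_fasta_strings(records):
    """Return one string in which every non-empty record ends with a newline.

    Leading and trailing whitespace around each record is removed first,
    so records with or without a final newline are handled the same way.
    """
    cleaned = [record.strip() for record in records]
    return "".join(text + "\n" for text in cleaned if text)


# Example use (selected_records is an assumed list of FASTA strings):
#     output_file.write(join_fasta_strings(selected_records))
```

Building the whole text first keeps the file-writing code to a single write call, which makes it easier to inspect what clustalw will actually be given.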

If you wanted to trim the sequence to the length of the hit, you could parse the Fasta result you retrieve:

>>> from Bio import Fasta
>>> fasta_parser = Fasta.RecordParser()
>>> import StringIO
>>> fasta_rec = fasta_parser.parse(StringIO.StringIO(rec))

Manipulate the sequence:

>>> fasta_rec.sequence = fasta_rec.sequence[20:70]

And then write this out to your file:

output_file.write(str(fasta_rec) + "\n")

Hope some of that helped!

Brad

From idoerg at burnham.org Fri Mar 26 16:31:52 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri Mar 26 16:37:37 2004
Subject: [BioPython] Prothon
Message-ID: <4064A148.9020405@burnham.org>

Slightly off topic, but people on this list might be interested:

http://www.prothon.org

From the homepage (I have nothing to do with these guys, just copied & pasted):

``Prothon is a fresh new language that gets rid of classes altogether in the same way that Self does and regains the original practical and fun sensibility of Python. This major improvement plus many minor ones make for a clean new revolutionary break in language development. Prothon is quite simple and yet offers the power of Python and Self.

Prothon is also an industrial-strength alternative to Python and Self. Prothon uses native threads and a 64-bit architecture to maximize performance in applications such as multiple-cpu hosting.''

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From pieter at laeremans.org Mon Mar 29 18:56:18 2004
From: pieter at laeremans.org (Pieter Laeremans)
Date: Mon Mar 29 19:16:29 2004
Subject: [BioPython] Installation problem on debian
Message-ID: <877jx3b42l.fsf@hades.kotnet.org>

Hi,

I'm trying to install biopython on a debian system (sarge), with python2.3 and all the dependencies installed.
But I get this error:

/tmp/biopython-1.24 $ python setup.py build
running build
running build_py
creating build
creating build/lib.linux-i686-2.3
creating build/lib.linux-i686-2.3/Bio
copying Bio/DBXRef.py -> build/lib.linux-i686-2.3/Bio
copying Bio/Decode.py -> build/lib.linux-i686-2.3/Bio
copying Bio/DocSQL.py -> build/lib.linux-i686-2.3/
....
.....
....
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -IBio -I/usr/include/python2.3 -c Bio/PDB/mmCIF/MMCIFlexmodule.c -o build/temp.linux-i686-2.3/Bio/PDB/mmCIF/MMCIFlexmodule.o
Bio/PDB/mmCIF/MMCIFlexmodule.c: In function `MMCIFlex_open_file':
Bio/PDB/mmCIF/MMCIFlexmodule.c:14: warning: implicit declaration of function `mmcif_set_file'
Bio/PDB/mmCIF/MMCIFlexmodule.c: In function `MMCIFlex_get_token':
Bio/PDB/mmCIF/MMCIFlexmodule.c:42: warning: implicit declaration of function `mmcif_get_token'
Bio/PDB/mmCIF/MMCIFlexmodule.c:47: warning: implicit declaration of function `mmcif_get_string'
Bio/PDB/mmCIF/MMCIFlexmodule.c: At top level:
Bio/PDB/mmCIF/MMCIFlexmodule.c:65: warning: function declaration isn't a prototype
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -IBio -I/usr/include/python2.3 -c Bio/PDB/mmCIF/lex.yy.c -o build/temp.linux-i686-2.3/Bio/PDB/mmCIF/lex.yy.o
mmcif.lex:52: warning: function declaration isn't a prototype
lex.yy.c:1046: warning: `yyunput' defined but not used
gcc -pthread -shared build/temp.linux-i686-2.3/Bio/PDB/mmCIF/lex.yy.o build/temp.linux-i686-2.3/Bio/PDB/mmCIF/MMCIFlexmodule.o -lfl -o build/lib.linux-i686-2.3/Bio/PDB/mmCIF/MMCIFlex.so
/usr/bin/ld: cannot find -lfl
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

So I think there has to be a library 'fl' which has to be installed. But I don't know which library it is. Has someone succeeded in installing this software on a debian system?

Thanks,

Pieter

From pieter at laeremans.org Tue Mar 30 02:54:41 2004
From: pieter at laeremans.org (Pieter Laeremans)
Date: Tue Mar 30 03:00:00 2004
Subject: [BioPython] Installation problem on debian
In-Reply-To: <200403300928.06024.thamelry@binf.ku.dk> (Thomas Hamelryck's message of "Tue, 30 Mar 2004 09:28:06 +0200")
References: <877jx3b42l.fsf@hades.kotnet.org> <200403300928.06024.thamelry@binf.ku.dk>
Message-ID: <87lllidb26.fsf@laeremans.org>

Thomas Hamelryck writes:

>
> Hi Pieter,
>
> The missing library is flex, the GNU version of lex.
> Alternatively, you can comment out the MMCIF lines
> in setup.py if you do not need it...
>
> Best regards,
>

Thank you very much! It does work now.

kind regards,

Pieter

From thamelry at binf.ku.dk Tue Mar 30 02:28:06 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Tue Mar 30 08:23:11 2004
Subject: [BioPython] Installation problem on debian
In-Reply-To: <877jx3b42l.fsf@hades.kotnet.org>
References: <877jx3b42l.fsf@hades.kotnet.org>
Message-ID: <200403300928.06024.thamelry@binf.ku.dk>

On Tuesday 30 March 2004 01:56, Pieter Laeremans wrote:
[snip]
> So I think there has to be a library 'fl' which has to be installed.
> But I don't know which library it is. Has someone succeeded in
> installing this software on a debian system?

Hi Pieter,

The missing library is flex, the GNU version of lex. Alternatively, you can comment out the MMCIF lines in setup.py if you do not need it...

Best regards,

---
Thomas Hamelryck
Bioinformatik centret
Universitetsparken 15
Bygning 10
DK-2100 København Ø
Denmark
http://www.binf.ku.dk/users/thamelry/

From chapmanb at uga.edu Tue Mar 30 19:17:00 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Mar 30 19:27:54 2004
Subject: [BioPython] PHI-BLAST support?
In-Reply-To: <20040325110629.GB6048@svartfuru.ii.uib.no>
References: <20040325110629.GB6048@svartfuru.ii.uib.no>
Message-ID: <20040331001700.GF29401@evostick.agtec.uga.edu>

Hi Pål;

> I'm wondering whether BioPython supports doing PHI-BLAST searches (locally),
>
> btw, here's the command I issue to run PHI-BLAST from command-line:
>
> blastpgp -i [infile] -k [patternfile] -p patseedp

Yes, we do support that. If you had your infile in a variable 'input_file' and your patternfile in a variable 'pattern_file', then you could do the search against, say, a local swissprot database with:

from Bio.Blast import NCBIStandalone

result_handle, error_handle = NCBIStandalone.blastpgp(
    "/usr/local/bin/blastpgp", "swissprot", input_file,
    program = "patseedp", hit_infile = pattern_file)

The variable result_handle contains the output of this run, and error_handle any errors that may have occurred.

> and whether parsing of PHI-BLAST output information such as pattern positions
> is supported (see below)?

I don't believe the BLAST parser currently supports output from PHI-BLAST searches. We'd certainly accept contributions towards this goal.

Hope this helps!

Brad

From chapmanb at uga.edu Tue Mar 30 19:24:44 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Mar 30 19:35:38 2004
Subject: [BioPython] Martel Question
In-Reply-To: <5.2.0.9.0.20040325232407.00b28650@pop.videotron.ca>
References: <5.2.0.9.0.20040325232407.00b28650@pop.videotron.ca>
Message-ID: <20040331002444.GG29401@evostick.agtec.uga.edu>

Hi Peter;

> I have just built a parser for Quantarray ... but not with Martel.
> This parser is built with the typical scanner consumer model philosophy,
> built on a state machine that will handle the quantarray output files. This
> parser was not built to load anything into the memory. [...]
> I will be reading and I need something that can
> load into a Quantarray Record object, however I was a little worried about
> the Record sizes. There is only 1 record per file, which might be the
> saving grace. When I was parsing Genbank genomic files (many megs), the
> Genbank parser was slowing to a crawl (and required piles of memory); 1
> genomic record per file.

I know there were some speedups done on the GenBank parser over recent releases. Nothing related specifically to Martel, but rather to some of my Python code which utilizes it. Have you tried it lately on your files and machines and found it to be especially slow?

But yes, we haven't done any work on making big records not be stored in memory.

> I would like to know if Martel scales to processing 5mb Records at a time,
> if the entire file is is in memory? Has Martel been improved over the last
> few months in the regard ... I may have a need to parse large Genbank NT
> records again.

There haven't been any specific changes to Martel -- if you are basing the memory problems solely on the GenBank parser, I know a number of parts of that were written badly (by myself) and I have attempted to fix them up.

> In Dalke's Martel paper it reads "Similarly, it should be possible to read
> data from the input stream only when required, so that overall memory
> footprint stays low. " is that still to be done ?

Nothing drastic has been done to Martel by Andrew in the last few months, so I assume so. He could probably give a better answer.

Yeah, so sorry, but my answer sums up to -- I'm not sure how it will act, I guess you'll have to try and see. From my own experience using the new Fasta parser (which uses Martel) -- it works quite well on large chromosome sized FASTA sequences on my machine (nothing fancy, just a standard desktop).

Hope this answer helps some, sorry I can't be more specific.
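
The idea quoted above, reading from the input stream only when required, can be sketched with a plain Python generator. This is an illustration only: it does not use Martel and says nothing about how Martel or the GenBank parser work internally. The function name iter_fasta and the choice of FASTA as the example format are assumptions made for the sketch.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) pairs one record at a time from a FASTA handle.

    Only the record currently being assembled is held in memory; earlier
    records can be discarded by the caller as soon as they are processed.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


# Small in-memory example standing in for a large file on disk.
example = io.StringIO(">one\nACGT\nAC\n>two\nGG\n")
records = list(iter_fasta(example))
```

Note that this only bounds memory across records. A file holding a single multi-megabyte record, as in the cases discussed above, would still be assembled in full, so that situation needs line-by-line processing instead.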
Brad

From karin.lagesen at labmed.uio.no Wed Mar 31 11:43:54 2004
From: karin.lagesen at labmed.uio.no (Karin Lagesen)
Date: Wed Mar 31 11:49:08 2004
Subject: [BioPython] error with Fasta.Record?
Message-ID: <20040331164354.GA9655@uracil.uio.no>

I use the following code to read in a fasta file:

    genes = quick_FASTA_reader(geneFile)
    genelist = {}
    rec = Fasta.Record()
    iterator = 10001
    for entry in genes:
        g = ecoligene.EcoliGene(entry)
        oname = os.path.join(over300, str(iterator))
        if dofiles:
            rec.title, rec.sequence = entry
            print iterator, rec.title, rec.sequence
            ofile = open(oname, 'w')
            ofile.write(str(rec))
            ofile.close()

I do this with a test file:

adenine:18:38> cat /med/adenine/u2/projects/locator/gard/testfile
>1_dapB_to_carA_29196_29650
gtctataagtgccaaaaattacatgttttgtcttctgtttttgttgttttaatgtaaatt
ttgaccatttggtccacttttttctgctcgtttttatttcatgcaatc
>2_caiT_to_fixA_41932_42366
aattattattaacctcgtggacgcgttaatggctaactcataatgggtattcaataagct
gtattct
>3_caiT_to_fixA_41932_42366
aattattattaacctcgtggacgcgttaatggctaactcataatgggtattcaataagct
gtattctgtgattggtatcacatttttgtttcgggtgaatagagggcgttttttcgttaa
t
>4_caiT_to_fixA_41932_42366
aattattattaacctcgtggacgcgttaatggctaactcataatgggtattcaataagct
gtattctgtgattggtatcacatttttgtttcgggtgaatagagggcgttttttcgttaa
ttttgattaataatcagtttgttatgctctgttgtgagtaaaaaataacatctgac
>5_fruR_to_yabB_89033_89633
gcttcgcacgttggacgtaaaataaacaacgctgatattagccgtaaacatcgggttttt
tacctcggtatgccttgtgac
>6_fruR_to_yabB_89033_89633
aaacaacgctgatattagccgtaaacatcgggttttttacctcggtatgccttgtgac
>7_aroP_to_pdhR_121552_122091
gtttacatcaaagaagtttgaattgttacaaaaagacttccgtcagatcaagaataatgg
tatg
adenine:18:38>

And the files I get look like this:

adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10001
>1_dapB_to_carA_29196_29650
GTCTATAAGTGCCAAAAATTACATGTTTTGTCTTCTGTTTTTGTTGTTTTAATGTAAATT
adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10002
>2_caiT_to_fixA_41932_42366
AATTATTATTAACCTCGTGGACGCGTTAATGGCTAACTCATAATGGGTATTCAATAAGCT
adenine:18:37> cat
/med/adenine/u2/projects/locator/gard/singles/10003
>3_caiT_to_fixA_41932_42366
AATTATTATTAACCTCGTGGACGCGTTAATGGCTAACTCATAATGGGTATTCAATAAGCT
GTATTCTGTGATTGGTATCACATTTTTGTTTCGGGTGAATAGAGGGCGTTTTTTCGTTAA
adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10004
>4_caiT_to_fixA_41932_42366
AATTATTATTAACCTCGTGGACGCGTTAATGGCTAACTCATAATGGGTATTCAATAAGCT
GTATTCTGTGATTGGTATCACATTTTTGTTTCGGGTGAATAGAGGGCGTTTTTTCGTTAA
adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10005
>5_fruR_to_yabB_89033_89633
GCTTCGCACGTTGGACGTAAAATAAACAACGCTGATATTAGCCGTAAACATCGGGTTTTT
adenine:18:38> cat /med/adenine/u2/projects/locator/gard/singles/10006
>6_fruR_to_yabB_89033_89633
adenine:18:38> cat /med/adenine/u2/projects/locator/gard/singles/10007
>7_aroP_to_pdhR_121552_122091
GTTTACATCAAAGAAGTTTGAATTGTTACAAAAAGACTTCCGTCAGATCAAGAATAATGG
adenine:18:38>

I try printing the rec object to test if the sequences are read in
correctly, and they are. Thus it seems to be a problem with writing this
object to file. Is this something I do wrong, or is it something else?

Karin
--
Karin Lagesen, PhD student
karin.lagesen@labmed.uio.no
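
Comparing the test file with the output above, every written file holds
only the complete 60-character lines of its sequence: record 6, at 58
residues, comes out empty, and record 1 loses its last 48 residues. That
suggests the trailing partial line is dropped when the record is turned
into a string, though the cause is not confirmed here. As a possible
workaround, the sketch below formats the record by hand. The function
name format_fasta and the default width of 60 are assumptions made for
illustration and are not part of Bio.Fasta.

```python
def format_fasta(title, sequence, width=60):
    """Return a FASTA-formatted string, keeping the final partial line."""
    lines = [">" + title]
    # Step through the sequence in chunks of `width`. Slicing past the end
    # just yields a shorter last chunk, so trailing residues are kept.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Record 6 from the test file above: 58 residues, shorter than one line.
text = format_fasta(
    "6_fruR_to_yabB_89033_89633",
    "aaacaacgctgatattagccgtaaacatcgggttttttacctcggtatgccttgtgac",
)
```

In the loop from the original post, ofile.write(format_fasta(rec.title,
rec.sequence)) would then take the place of ofile.write(str(rec)).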