From gca500 at york.ac.uk Mon Oct 2 15:46:02 2006
From: gca500 at york.ac.uk (gca500 at york.ac.uk)
Date: 02 Oct 2006 20:46:02 +0100
Subject: [BioPython] Genbank parsing problem and fix
Message-ID:

Hi All,

Been having a problem using the Genbank RecordParser with some Genbank files that have recently been added to NCBI. After a bit of trial and error, I realised the problem only occurs if a REFERENCE field isn't followed by an AUTHOR field (for example in reference 2 of this record: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=88602864).

There's a very easy fix on line 289 of Genbank.py. Decided to post this to the list to save anyone else who stumbles across this problem from tearing their hair out like I've been doing this afternoon!

Change:

authors_block +

to:

Martel.Opt(authors_block) +

and it works!

Hope this is useful,

Gemma

_____________________________________________
Gemma Atkinson
PhD student
Professor Sandie Baldauf's lab
Department of Biology
University of York, UK
YO10 5DD
gca500 at york.ac.uk
www.gemma-atkinson.co.uk

From ewijaya at i2r.a-star.edu.sg Tue Oct 3 02:34:09 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 03 Oct 2006 14:34:09 +0800
Subject: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net>

Dear experts,

I have the following script, which tries to use Bio.SeqIO's FASTA method to read sequences and simply print the actual sequence.

__BEGIN__
from Bio.SeqIO import FASTA
import sys

handle = open(sys.argv[1])
it = FASTA.FastaReader(handle)
seq = it.next()
while seq:
    print seq.seq
    seq = it.next()
handle.close()
__END__

But how come the output looks like this?
Seq('AACTAACAGTTTCCCTTGTCTAAAGCCTGCTCCCGATAAAAATAAGGCTGTGGGTTCTGG ...', Alphabet()) Seq('CACCATCAGGGCGAGATTTAGCCGCTAGGTTTGTCTCATGGAAGAAAAGCAGTAGAAAAA ...', Alphabet()) Seq('ACTTCCCACGTACGTCTGCAGGAACTTGCCTGTACCACAGGAAGACGATCGTCATGAGAA ...', Alphabet()) Is there a way to get the actual plain ATCG sequence (i.e wihtout brackets,quotes,and Alphabet()). Sorry I'm new with Python. Please bear with me. Thanks and hope to hear from you again. Regards, Edward WIJAYA ------------ Institute For Infocomm Research - Disclaimer ------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you. -------------------------------------------------------- From ewijaya at i2r.a-star.edu.sg Tue Oct 3 04:13:29 2006 From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward) Date: Tue, 03 Oct 2006 16:13:29 +0800 Subject: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA References: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> <1159862405.45221885016f3@imp.rezolwenta.eu.org> Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061580@mailbe01.teak.local.net> Hi Bartek, Thanks for the reply. Where can I find info about tostring() ? I cannot seem to find in the Bio.SeqIO documentation itself? Regards, Edward WIJAYA ________________________________ From: bartek wilczynski [mailto:bartek at rezolwenta.eu.org] Sent: Tue 10/3/2006 4:00 PM To: Wijaya Edward Cc: biopython at lists.open-bio.org Subject: Re: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA Citing Wijaya Edward : seq.seq.tostring() will return a string instead of a sequence object seq.seq ------------ Institute For Infocomm Research - Disclaimer ------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. 
Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you. -------------------------------------------------------- From bartek at rezolwenta.eu.org Tue Oct 3 04:00:05 2006 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 03 Oct 2006 10:00:05 +0200 Subject: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> References: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> Message-ID: <1159862405.45221885016f3@imp.rezolwenta.eu.org> Citing Wijaya Edward : > > Dear experts, > > I have the following script which try to > use Bio.SeqIO's FASTA method to read > sequence and simply print the actual sequence. > > __BEGIN__ > from Bio.SeqIO import FASTA > import sys > > handle = open(sys.argv[1]) > it = FASTA.FastaReader(handle) > seq = it.next() > while seq: > print seq.seq > seq = it.next() > handle.close() > __END__ > > > But how come the output looks like this? > > Seq('AACTAACAGTTTCCCTTGTCTAAAGCCTGCTCCCGATAAAAATAAGGCTGTGGGTTCTGG ...', > Alphabet()) > Seq('CACCATCAGGGCGAGATTTAGCCGCTAGGTTTGTCTCATGGAAGAAAAGCAGTAGAAAAA ...', > Alphabet()) > Seq('ACTTCCCACGTACGTCTGCAGGAACTTGCCTGTACCACAGGAAGACGATCGTCATGAGAA ...', > Alphabet()) > > Is there a way to get the actual plain ATCG sequence (i.e wihtout > brackets,quotes,and Alphabet()). > Sorry I'm new with Python. Please bear with me. 
> Hi, seq.seq.tostring() will return a string instead of a sequence object seq.seq regards Bartek Wilczynski From bartek at rezolwenta.eu.org Tue Oct 3 04:23:42 2006 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 03 Oct 2006 10:23:42 +0200 Subject: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061580@mailbe01.teak.local.net> References: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> <1159862405.45221885016f3@imp.rezolwenta.eu.org> <3ACF03E372996C4EACD542EA8A05E66A061580@mailbe01.teak.local.net> Message-ID: <1159863822.45221e0ebeb39@imp.rezolwenta.eu.org> Citing Wijaya Edward : > > Hi Bartek, > > Thanks for the reply. > Where can I find info about tostring() ? > I cannot seem to find in the Bio.SeqIO documentation itself? your seq.seq object is an instance of Bio.Seq.Seq class. The tostring method is mentioned in the tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 regards Bartek From biopython at maubp.freeserve.co.uk Tue Oct 3 05:54:29 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 03 Oct 2006 10:54:29 +0100 Subject: [BioPython] Genbank parsing problem and fix In-Reply-To: References: Message-ID: <45223355.6010304@maubp.freeserve.co.uk> gca500 at york.ac.uk wrote: > Hi All, > > Been having a problem using the Genbank RecordParser with some Genbank > files that have recently been added to NCBI. After a bit of trial and > error, I realised the problem only occurs if a REFERENCE field isn't > followed by an AUTHOR field (for example in reference 2 of this record: > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=88602864). > > There's a very easy fix on line 289 of Genbank.py. Decided to post this to > the list to save any one else who stumbles across this problem tearing > their hair out like I've been doing this afternoon! > > Change ... and it works! 
> > Hope this is useful, > > Gemma Hi Gemma, I have made your suggested change to biopython/Bio/formatdefs/genbank.py as CVS revision 1.10, which should be viewable online soon: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/expressions/genbank.py?cvsroot=biopython I am curious as to why you are using this code (part of the FormatIO system), rather than the Bio.GenBank module. Thank you, Peter From gca500 at york.ac.uk Tue Oct 3 07:36:58 2006 From: gca500 at york.ac.uk (Gemma Atkinson) Date: Tue, 3 Oct 2006 12:36:58 +0100 Subject: [BioPython] Genbank parsing problem and fix In-Reply-To: <45223355.6010304@maubp.freeserve.co.uk> References: <45223355.6010304@maubp.freeserve.co.uk> Message-ID: Hi Peter, I was using the Bio.Genbank module. This is the code I've been using: from Bio import GenBank parser = GenBank.RecordParser(debug_level=2) record = parser.parse(open("test4.txt")) It was the expressions/genbank.py file, imported from within the Genbank module that I've been changing. I haven't touched the formatdefs/genbank.py file (should have made that clear before - sorry). This was the error I was getting before I changed expressions/ genbank.py: File "testgbparser.py", line 3, in ? 
record = parser.parse(open("test4.txt")) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/Bio/GenBank/__init__.py", line 240, in parse self._scanner.feed(handle, self._consumer) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/Bio/GenBank/__init__.py", line 1259, in feed self._parser.parseFile(handle) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/Martel/Parser.py", line 328, in parseFile self.parseString(fileobj.read()) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/Martel/Parser.py", line 356, in parseString self._err_handler.fatalError(result) File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/xml/sax/handler.py", line 38, in fatalError raise exception Martel.Parser.ParserPositionException: error parsing at or beyond character 1153 Gemma On 3 Oct 2006, at 10:54, Peter wrote: > gca500 at york.ac.uk wrote: >> Hi All, >> Been having a problem using the Genbank RecordParser with some >> Genbank files that have recently been added to NCBI. After a bit >> of trial and error, I realised the problem only occurs if a >> REFERENCE field isn't followed by an AUTHOR field (for example in >> reference 2 of this record: http://www.ncbi.nlm.nih.gov/entrez/ >> viewer.fcgi?db=protein&val=88602864). >> There's a very easy fix on line 289 of Genbank.py. Decided to post >> this to the list to save any one else who stumbles across this >> problem tearing their hair out like I've been doing this afternoon! >> Change ... and it works! >> Hope this is useful, >> Gemma > > Hi Gemma, > > I have made your suggested change to biopython/Bio/formatdefs/ > genbank.py as CVS revision 1.10, which should be viewable online soon: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/ > expressions/genbank.py?cvsroot=biopython > > I am curious as to why you are using this code (part of the > FormatIO system), rather than the Bio.GenBank module. 
> > Thank you, > > Peter > From biopython at maubp.freeserve.co.uk Tue Oct 3 09:33:48 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 03 Oct 2006 14:33:48 +0100 Subject: [BioPython] Genbank parsing problem and fix In-Reply-To: References: <45223355.6010304@maubp.freeserve.co.uk> Message-ID: <452266BC.9060809@maubp.freeserve.co.uk> >> Hi Gemma, >> >> I have made your suggested change to biopython/Bio/formatdefs/ >> genbank.py as CVS revision 1.10, which should be viewable online soon: >> >> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/ >> expressions/genbank.py?cvsroot=biopython I got the URL right, but I mean to say Bio/expressions/genbank.py (which actually has the Martel definition in it) not Bio/formatdefs/genbank.py Peter wrote: >> I am curious as to why you are using this code ... Gemma replied: > I was using the Bio.Genbank module. This is the code I've been using: > > from Bio import GenBank > parser = GenBank.RecordParser(debug_level=2) > record = parser.parse(open("test4.txt")) I would guess you are using BioPython 1.41 (or older) then, as your stack trace was indeed using Martel internally. Recent versions of BioPython (1.42 and later) use a pure python parser in Bio.GenBank as the old Martel code didn't scale well with large input files (to the point of being almost useless on large genomes). If you do update your installation, and run into any problems with the GenBank parser, please do let us know. Peter From ewijaya at i2r.a-star.edu.sg Tue Oct 3 10:16:27 2006 From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward) Date: Tue, 03 Oct 2006 22:16:27 +0800 Subject: [BioPython] BioPython for TRANSFAC Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061584@mailbe01.teak.local.net> Hi there, Is there a method in BioPython that allow me to pass the query "fruitfly" or "drosophila" and then returning the: 1. already characterized TF and their binding sites (BS), 2. their respective coregulated genes, and 3. 
the location/position of the TFBS in the genes.

all from TRANSFAC database.

--
Regards,
Edward WIJAYA

From m.stantoncook at gmail.com Wed Oct 4 09:38:03 2006
From: m.stantoncook at gmail.com (Mitchell Stanton-Cook)
Date: Wed, 4 Oct 2006 23:38:03 +1000
Subject: [BioPython] Creating fusion protein like constructs with BioPython
Message-ID:

Hello all.

I am trying to create a fusion protein-like model from two separate pdb files. I introduce a CYS mutant in the target protein, and then wish to form a disulphide bond between it and a small peptide.

This is pure computational work. I am using Bio.PDB.

As the two structures are in arbitrary frames of reference I need to rotate and translate to form the "construct". I wish to have TargetProtein-CB-SY-SY-CB-SmallPeptide (the peptide is not really added to the N/C term).

I have tried many different approaches but have failed miserably to get SmallPeptide rotated relative to TargetProtein at the correct dihedral angle (+/-90deg) and bond lengths.

My current approach is (omitting the correct bond length at this time):

TP-CB-SY SY-CB-SP
   1  2  3  4

Translate 2 onto 3
Calculate the angle between 1-(23)-4
Calculate the cross product of 1-23 x 23-4
Generate the rotation matrix given the angle and vector
Rotate all SP (SmallPeptide) atoms by this rotation matrix.

This has not worked. I have had some other ideas and have written code for them. Ideally, I wish to calculate the rotations about X,Y,Z to place the SP at the correct dihedral angle followed by translation, but I have no idea how to do this.
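One way to think about the dihedral step is sketched below in plain Python. This is not Bio.PDB code and not the approach listed above: instead of superimposing atom 2 on atom 3, it assumes atom 3 has already been placed at bonding distance from atom 2, and then rotates the peptide atoms about the 2-3 bond (Rodrigues' rotation formula) until the 1-2-3-4 dihedral reaches the target value. The helper names (`dihedral`, `rotate_about_axis`, `set_dihedral`) and all coordinates are hypothetical and only for illustration.

```python
import math

# Small 3-vector helpers on plain tuples (no external libraries assumed).
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (radians) defined by four points."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2))

def rotate_about_axis(point, origin, axis, angle):
    """Rotate point by angle (radians) about the line through origin along axis."""
    k = scale(axis, 1.0 / norm(axis))
    v = sub(point, origin)
    c, s = math.cos(angle), math.sin(angle)
    rotated = add(add(scale(v, c), scale(cross(k, v), s)),
                  scale(k, dot(k, v) * (1.0 - c)))
    return add(rotated, origin)

def set_dihedral(p1, p2, p3, moving, target):
    """Rotate the moving atoms about the p2-p3 bond so that the dihedral
    p1-p2-p3-moving[0] equals target. moving[0] plays the role of atom 4."""
    delta = target - dihedral(p1, p2, p3, moving[0])
    axis = sub(p3, p2)
    return [rotate_about_axis(p, p3, axis, delta) for p in moving]

# Made-up coordinates: CB and S of the target protein, S of the peptide
# (placed 2.05 units away, an illustrative S-S distance), then peptide atoms.
cb_target = (1.0, 0.3, -0.8)
s_target = (0.0, 0.0, 0.0)
s_peptide = (0.0, 0.0, 2.05)
peptide_atoms = [(1.2, -0.9, 2.6), (2.0, -1.0, 3.5)]

target = math.radians(90.0)
placed = set_dihedral(cb_target, s_target, s_peptide, peptide_atoms, target)
```

Because every peptide atom is rotated about the same axis, distances within the peptide and the distance from each atom to the peptide sulphur are unchanged; only the torsion about the S-S bond moves.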
1) Can I use Bio.PDB to do this above task or do I need to look at something else? 2) Does anyone have any ideas on how to complete this goal? Thanking you for your time. Mitch From biopython at maubp.freeserve.co.uk Thu Oct 5 05:47:30 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 05 Oct 2006 10:47:30 +0100 Subject: [BioPython] Creating fusion protein like constructs with BioPython In-Reply-To: References: Message-ID: <4524D4B2.8030600@maubp.freeserve.co.uk> Mitchell Stanton-Cook wrote: > Hello all. > > I am trying to create fusion protein-like model from two separate pdb files. > I introduce a CYS mutant in the target protein, and then wish to form a > disulphide bound between it and a small peptide. > > This is pure computational work. > > ... > > 1) Can I use Bio.PDB to do this above task or do I need to look at something > else? My gut instinct is that yes, you probably can - but you will have to do a lot of the work with your own code. Its not something I have ever tried though. > 2) Does anyone have any ideas on how to complete this goal? You might want to have a look at MMTK, which on the face of it would be better suited. Assuming MMTK will read both PDB files you might have better luck - this proviso is because I have found MMTK will choke on "odd" PDB files, and its support for non-standard residues could be better. http://starship.python.net/crew/hinsen/MMTK/index.html Peter From thamelry at binf.ku.dk Thu Oct 5 05:52:56 2006 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Thu, 5 Oct 2006 11:52:56 +0200 Subject: [BioPython] Creating fusion protein like constructs with BioPython In-Reply-To: References: Message-ID: <2d7c25310610050252j2f889242h84411e0927fb4502@mail.gmail.com> Hi, > I am trying to create fusion protein-like model from two separate pdb files. > I introduce a CYS mutant in the target protein, and then wish to form a > disulphide bound between it and a small peptide. ... 
> 1) Can I use Bio.PDB to do this above task or do I need to look at something > else? Bio.PDB has functionality to do vector/rotation calculations. Take a look at the Vector.py module. Best, ---- Thomas Hamelryck, Post-doctoral researcher Bioinformatics center Institute of Molecular Biology and Physiology University of Copenhagen Universitetsparken 15 - Bygning 10 DK-2100 Copenhagen ? Denmark Homepage: http://www.binf.ku.dk/Protein_structure From gebauer-jung at ice.mpg.de Thu Oct 5 06:30:36 2006 From: gebauer-jung at ice.mpg.de (Steffi Gebauer-Jung) Date: Thu, 05 Oct 2006 12:30:36 +0200 Subject: [BioPython] Problem parsing Blast XML output from different sources Message-ID: <4524DECC.3030307@ice.mpg.de> Hello, because of blastall 2.2.14 output was not parsed from the Bio.Blast.NCBIStandalone parser, I tried to switch to the recommended Bio.Blast.NCBIXML parser. Thereby I found, that the xml output of the locally installed standalone blastall (2.2.14) differs from the web xml output. For BlastN hsps on Plus/Minus strands, the xml gives query_frame/hit_frame 1 / -1 as usual. But query and frame positions and sequences are switched in direction (would match frames -1/1). As the Bio.Blast.Record returned by the NCBIXML parser only gives frames, sequences and start positions it is not possible (without knowing the source of the xml file) to be sure to find the right data. This is clearly a problem of Blast. But because of the missing end positions in the returned record object it becomes a problem for users of the parser too. Could somebody try to confirm the different behaviour of the xml blast output with his/her own examples/installation? 
Thanks, Steffi From mdehoon at c2b2.columbia.edu Thu Oct 5 12:01:04 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 05 Oct 2006 12:01:04 -0400 Subject: [BioPython] Problem parsing Blast XML output from different sources In-Reply-To: <4524DECC.3030307@ice.mpg.de> References: <4524DECC.3030307@ice.mpg.de> Message-ID: <45252C40.8040806@c2b2.columbia.edu> Which sequence are you running blast on? I'd like to try this on our local blast installation. --Michiel. Steffi Gebauer-Jung wrote: > Hello, > > because of blastall 2.2.14 output was not parsed from the > Bio.Blast.NCBIStandalone parser, > I tried to switch to the recommended Bio.Blast.NCBIXML parser. > > Thereby I found, that the xml output of the locally installed standalone > blastall (2.2.14) > differs from the web xml output. > > For BlastN hsps on Plus/Minus strands, the xml gives > query_frame/hit_frame 1 / -1 as usual. > But query and frame positions and sequences are switched in direction > (would match frames -1/1). > > As the Bio.Blast.Record returned by the NCBIXML parser only gives > frames, sequences > and start positions it is not possible (without knowing the source of > the xml file) > to be sure to find the right data. > > This is clearly a problem of Blast. > But because of the missing end positions in the returned record object > it becomes a problem for users of the parser too. > > Could somebody try to confirm the different behaviour of the xml blast > output > with his/her own examples/installation? 
> > Thanks, Steffi > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032
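The strand discrepancy reported in this thread (frames given as 1 / -1 while the query coordinates run backwards) can be made concrete with a small sketch. It assumes you have the from/to coordinates of the query and the hit for one HSP, and it infers the orientation from those coordinates instead of trusting the reported frames. The function names and argument layout are hypothetical and are not part of Bio.Blast; the numbers in the usage lines are illustrative.

```python
def hsp_orientation(query_from, query_to, hit_from, hit_to):
    """Return (query_strand, hit_strand) as +1 or -1, judged only from
    whether each coordinate pair runs forwards or backwards."""
    query_strand = 1 if query_from <= query_to else -1
    hit_strand = 1 if hit_from <= hit_to else -1
    return query_strand, hit_strand

def normalize_hsp(query_from, query_to, hit_from, hit_to):
    """Return the coordinates with the query running forwards.

    If the query runs backwards, both pairs are swapped, which corresponds
    to reverse-complementing the whole alignment (the aligned sequences
    would need to be reverse-complemented as well)."""
    if query_from > query_to:
        return query_to, query_from, hit_to, hit_from
    return query_from, query_to, hit_from, hit_to

# Illustrative HSP whose query coordinates run backwards.
strands = hsp_orientation(789, 727, 8270, 8332)
forward = normalize_hsp(789, 727, 8270, 8332)
```

After normalization the query runs forwards and the hit runs backwards, which is how a Plus / Minus HSP is laid out in the pairwise text report.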
From mdehoon at c2b2.columbia.edu Sun Oct 8 00:51:09 2006 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 08 Oct 2006 00:51:09 -0400 Subject: [BioPython] Problem parsing Blast XML output from different sources In-Reply-To: <45261086.4070708@ice.mpg.de> References: <4524DECC.3030307@ice.mpg.de> <45252C40.8040806@c2b2.columbia.edu> <45261086.4070708@ice.mpg.de> Message-ID: <452883BD.7050907@c2b2.columbia.edu> Hi Steffi, I am trying to replicate this problem with Blast. Where did you get the pat database? I searched for it with google, but there seems to be more than one blast database called pat. --Michiel. Steffi Gebauer-Jung wrote: > Hello, > > I don't know what local databases you have available for testing. > The discrepancy between xml and 'pairwise text' output should be seen > for every Plus/Minus Hsp created by local Blastn (local server or > standalone blastall from command line, I use version 2.2.14) > > I tried several combinations, one is M38240 vs. pat database, > the hsp hit was BD298385. 
> Here are the interesting output snippets:
>
> >dbj|BD298385.1| CLEAN SYNTHETIC VECTORS, PLASMIDS, TRANSGENIC PLANTS AND PLANT PARTS
>           CONTAINING THEM, AND METHODS FOR OBTAINING THEM
>           Length = 14108
>
>  Score = 125 bits (63), Expect = 1e-25
>  Identities = 63/63 (100%)
>  Strand = Plus / Minus
>
> Query: 727  aatgaagactaatctttttctctttctcatcttttcacttctcctatcattatcctcggc 786
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 8332 aatgaagactaatctttttctctttctcatcttttcacttctcctatcattatcctcggc 8273
>
> Query: 787  cga 789
>             |||
> Sbjct: 8272 cga 8270
>
> =====================================================
>
> <Hit>
>   <Hit_num>15</Hit_num>
>   <Hit_id>gi|92136243|dbj|BD298385.1|</Hit_id>
>   <Hit_def>CLEAN SYNTHETIC VECTORS, PLASMIDS, TRANSGENIC PLANTS AND PLANT PARTS CONTAINING THEM, AND METHODS FOR OBTAINING THEM</Hit_def>
>   <Hit_accession>BD298385</Hit_accession>
>   <Hit_len>14108</Hit_len>
>   <Hit_hsps>
>     <Hsp>
>       <Hsp_num>1</Hsp_num>
>       <Hsp_bit-score>125.381</Hsp_bit-score>
>       <Hsp_score>63</Hsp_score>
>       <Hsp_evalue>9.63859e-26</Hsp_evalue>
>       <Hsp_query-from>789</Hsp_query-from>
>       <Hsp_query-to>727</Hsp_query-to>
>       <Hsp_hit-from>8270</Hsp_hit-from>
>       <Hsp_hit-to>8332</Hsp_hit-to>
>       <Hsp_query-frame>1</Hsp_query-frame>
>       <Hsp_hit-frame>-1</Hsp_hit-frame>
>       <Hsp_identity>63</Hsp_identity>
>       <Hsp_positive>63</Hsp_positive>
>       <Hsp_align-len>63</Hsp_align-len>
>       <Hsp_qseq>TCGGCCGAGGATAATGATAGGAGAAGTGAAAAGATGAGAAAGAGAAAAAGATTAGTCTTCATT</Hsp_qseq>
>       <Hsp_hseq>TCGGCCGAGGATAATGATAGGAGAAGTGAAAAGATGAGAAAGAGAAAAAGATTAGTCTTCATT</Hsp_hseq>
>       <Hsp_midline>|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
>     </Hsp>
>   </Hit_hsps>
> </Hit>
>
> Thanks, Steffi
>
>
> Michiel Jan Laurens de Hoon wrote:
>
>> Which sequence are you running blast on?
>> I'd like to try this on our local blast installation.
>>
>> --Michiel.
>>
>> Steffi Gebauer-Jung wrote:
>>
>>> Hello,
>>>
>>> because of blastall 2.2.14 output was not parsed from the
>>> Bio.Blast.NCBIStandalone parser,
>>> I tried to switch to the recommended Bio.Blast.NCBIXML parser.
>>>
>>> Thereby I found, that the xml output of the locally installed
>>> standalone blastall (2.2.14)
>>> differs from the web xml output.
>>>
>>> For BlastN hsps on Plus/Minus strands, the xml gives
>>> query_frame/hit_frame 1 / -1 as usual.
>>> But query and frame positions and sequences are switched in direction
>>> (would match frames -1/1).
>>> >>> As the Bio.Blast.Record returned by the NCBIXML parser only gives >>> frames, sequences >>> and start positions it is not possible (without knowing the source of >>> the xml file) >>> to be sure to find the right data. >>> >>> This is clearly a problem of Blast. >>> But because of the missing end positions in the returned record object >>> it becomes a problem for users of the parser too. >>> >>> Could somebody try to confirm the different behaviour of the xml >>> blast output >>> with his/her own examples/installation? >>> >>> Thanks, Steffi >>> >>> >>> >>> _______________________________________________ >>> BioPython mailing list - BioPython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython >> >> >> > From luca.beltrame at unimi.it Tue Oct 10 08:01:59 2006 From: luca.beltrame at unimi.it (Luca Beltrame) Date: Tue, 10 Oct 2006 14:01:59 +0200 Subject: [BioPython] Querying Entrez Gene Message-ID: <200610101401.59622.luca.beltrame@unimi.it> Hello. I'm currently in need of querying the Entrez Gene database using a list of IDs I have. After searching in the Biopython documentation, I have found no indication of whether that is possible or not. Is there a way to query NCBI's Entrez Gene database? Thanks in advance. From cjfields at uiuc.edu Tue Oct 10 08:46:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Oct 2006 07:46:53 -0500 Subject: [BioPython] Querying Entrez Gene In-Reply-To: <200610101401.59622.luca.beltrame@unimi.it> References: <200610101401.59622.luca.beltrame@unimi.it> Message-ID: <4E706F3E-C29C-43F1-936D-A2670E1D5A0C@uiuc.edu> There is a BioPerl way (Bio::DB::EntrezGene); not sure about BioPython. Chris On Oct 10, 2006, at 7:01 AM, Luca Beltrame wrote: > Hello. > I'm currently in need of querying the Entrez Gene database using a > list of IDs > I have. After searching in the Biopython documentation, I have > found no > indication of whether that is possible or not. 
> Is there a way to query NCBI's Entrez Gene database?
> Thanks in advance.
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From winter at biotec.tu-dresden.de Tue Oct 10 08:50:46 2006
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Tue, 10 Oct 2006 14:50:46 +0200
Subject: [BioPython] Querying Entrez Gene
In-Reply-To: <200610101401.59622.luca.beltrame@unimi.it>
References: <200610101401.59622.luca.beltrame@unimi.it>
Message-ID: <452B9726.1060101@biotec.tu-dresden.de>

Dear Luca,

you probably need this:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/

The code below is Python, not Biopython, but should work. You further need to parse the resulting XML.

There is also an EUtils package as part of Biopython, but I never tried it:
http://biopython.org/DIST/docs/api/public/Bio.EUtils-module.html

Cheers,
Christof

# Python eutils example
import urllib2
eutilsURL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esummary(db, ids):
    idlist = ",".join(ids)
    url = eutilsURL + "esummary.fcgi?db=%(db)s&id=%(idlist)s&retmode=xml"
    req = urllib2.Request(url % vars())
    handle = urllib2.urlopen(req)
    return handle.read()

print esummary("gene", ["3487"])

Luca Beltrame wrote:
> Hello.
> I'm currently in need of querying the Entrez Gene database using a list of IDs
> I have. After searching in the Biopython documentation, I have found no
> indication of whether that is possible or not.
> Is there a way to query NCBI's Entrez Gene database?
> Thanks in advance.
> _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From aloraine at gmail.com Tue Oct 10 08:16:56 2006 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 10 Oct 2006 07:16:56 -0500 Subject: [BioPython] Querying Entrez Gene In-Reply-To: <200610101401.59622.luca.beltrame@unimi.it> References: <200610101401.59622.luca.beltrame@unimi.it> Message-ID: <83722dde0610100516o532c0b5eg329b01cda11eb156@mail.gmail.com> Dear Luca, Whenever I need data from Entrez gene (usually mRNA-to-Gene id mappings, in my case) I download one of the tab-delimited from the NCBI "Gene" ftp site: e.g., gene2go, gene2accession. see: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA -Ann On 10/10/06, Luca Beltrame wrote: > Hello. > I'm currently in need of querying the Entrez Gene database using a list of IDs > I have. After searching in the Biopython documentation, I have found no > indication of whether that is possible or not. > Is there a way to query NCBI's Entrez Gene database? > Thanks in advance. > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From palle at birc.au.dk Thu Oct 12 02:59:26 2006 From: palle at birc.au.dk (Palle Villesen) Date: Thu, 12 Oct 2006 08:59:26 +0200 Subject: [BioPython] Querying Entrez Gene In-Reply-To: <200610101401.59622.luca.beltrame@unimi.it> References: <200610101401.59622.luca.beltrame@unimi.it> Message-ID: <452DE7CE.1090008@birc.au.dk> Luca Beltrame wrote: > Hello. > I'm currently in need of querying the Entrez Gene database using a list of IDs > I have. After searching in the Biopython documentation, I have found no > indication of whether that is possible or not. 
> Is there a way to query NCBI's Entrez Gene database?
> Thanks in advance.
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>

EUtils are also a part of BioPython. You should look in the biopython tutorial for how to use it.

Below is my own small "mass downloader" utility in python. (Running on a non-administrator install of both python and biopython).

The basic function/module you need is the HistoryClient that can search and retrieve large sets - instead of looping through all your ids one at a time.

Anyway - check the tutorial, it's quite good (at least for a person with the same very basic python knowledge as me).

sincerely,
Palle Villesen, BiRC, DK

Program: gb_search.py
-------------------
#!/web/biopv/usr/local/bin/python

import sys
import time
biopython_path='/web/biopv/usr/local/lib/python'
sys.path.insert(0,biopython_path)

def help():
    from Bio.EUtils import Config
    dbs=" ".join(Config.databases.keys())
    help= """
GenBank retrieve tool.

Usage: gb_search.py QUERY [RECS] [DB] [FORMAT]

QUERY  : the entrez query enclosed in " "
RECS   : Number of records/sequences to get at a time (default=20)
DB     : Database, (default='nucleotide')
         (%s)
Format : Record format (default='fasta', but 'docsum', 'brief', 'gi' - and many others are available)
""" % dbs
    sys.exit(help)
    return 0

# Default values
step=20
database="nucleotide"
format="fasta"
time2sleep=3

if len(sys.argv) ==1:
    help()

search_term=sys.argv[1]
if len(sys.argv)>2 : step=int(sys.argv[2])
if len(sys.argv)>3 : database=sys.argv[3]
if len(sys.argv)>4 : format=sys.argv[4]
if len(sys.argv)>5 : time2sleep=int(sys.argv[5])

from Bio.EUtils import HistoryClient

s = HistoryClient.HistoryClient().search(search_term,db=database)

print >>sys.stderr, "Getting %s seqs, %s sequences at a time" % (len(s),step)

i=0
while i<len(s):
    print >>sys.stderr, "Getting sequences from ",i,"to",min(i+step,len(s)),
    print s[i:i+step].efetch(retmode = "text", rettype = format).read()
    if i+step > len(s):
        print >>sys.stderr, "..done"
        break
    print >>sys.stderr, "...done (sleeping %s seconds)" % time2sleep
    i+=step
    time.sleep(time2sleep)
-------------------------------------

--
Palle Villesen, Ph.D.
BiRC, Build.
090, University of Aarhus
DK - 8000 Aarhus C, Denmark
palle.retrosearch.dk - +45 61708600
---------------------------------------------------------------------

From gebauer-jung at ice.mpg.de Fri Oct 13 09:23:35 2006
From: gebauer-jung at ice.mpg.de (Steffi Gebauer-Jung)
Date: Fri, 13 Oct 2006 15:23:35 +0200
Subject: [BioPython] Problem parsing Blast XML output from different sources
In-Reply-To: <452ABE8B.2050200@c2b2.columbia.edu>
References: <4524DECC.3030307@ice.mpg.de> <45252C40.8040806@c2b2.columbia.edu> <45261086.4070708@ice.mpg.de> <452883BD.7050907@c2b2.columbia.edu> <452A3CB5.5080308@ice.mpg.de> <452ABE8B.2050200@c2b2.columbia.edu>
Message-ID: <452F9357.8020101@ice.mpg.de>

Hello Michiel,

the fix works fine. Thanks for the fast reply and the fix!

Maybe there should be a hint for other users not to rely on the frame information in the Blast XML output, but to compare the start/end positions of the HSP sequences instead, and to be aware of reversed query sequences.

For my needs I have to have the query sequence in the forward direction. That's why I reverse-complement the complete alignment if this isn't the case yet. In doing so I found that Bio.Seq.Seq.complement() cannot handle unicode sequences, even though Bio.Seq.Seq can be initialized with unicode strings:

>>> import Bio.Seq
>>> s = Bio.Seq.Seq(u'acgt')
>>> s
Seq(u'acgt', Alphabet())
>>> s.complement()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.5/site-packages/Bio/Seq.py", line 101, in complement
    s = self.data.translate(ttable)
TypeError: character mapping must return integer, None or unicode

And just another idea: in order to (reverse-)complement aligned sequences it would be useful to have the gap sign '-' in the alphabets.

Steffi

Michiel Jan Laurens de Hoon wrote:
> Hi Steffi,
>
> I had the same result when running Blast locally.
>
> I added hsp.query_end and hsp.sbjct_end to the Blast XML parser, so
> you can get around this problem.
> Could you try the fixed Blast parser?
> You'll need to pick up Bio/Blast/NCBIXML.py and Bio/Blast/Record.py from
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython
>
> If it works fine (or if it doesn't), please send a message to the
> Biopython mailing list (instead of my email address), so that this
> gets into the mailing list archives.
>
> --Michiel.
>
>
> Steffi Gebauer-Jung wrote:
>
>> Hello,
>>
>> the db was downloaded from ftp://ftp.ncbi.nih.gov//blast/db/patnt.tar.gz
>>
>> In fact the special query sequence and db shouldn't matter.
>>
>> If you have any 'Plus / Minus' HSP in a pairwise BlastN output
>> you can run BlastN again in order to get the xml formatted output.
>>
>> Comparing the special HSP in both formats you should see the effect.
>
>
>

From dtoomey at rcsi.ie Fri Oct 13 09:58:41 2006
From: dtoomey at rcsi.ie (David Toomey)
Date: Fri, 13 Oct 2006 14:58:41 +0100
Subject: [BioPython] NCBIStandalone.iterator hangs
Message-ID:

Hi,

I have been writing some scripts which make use of the NCBIStandalone module, and I have found that when I iterate over the results from some local blast runs my script hangs. I have attached 2 examples.

I have been running them against two fasta files from DrugBank:
http://redpoll.pharmacy.ualberta.ca/drugbank/download.htm

I have tried them against two of the downloads from this site, the redundant and non-redundant "drug target protein sequences". The first query sequence (problem.txt) only hangs when run against the non-redundant set; the other sequence hangs when run against either.

I have tracked down the line that hangs to "line = self._handle.readline(*args,**keywds)" in the readline method of Bio.File.UndoHandle.

I have no idea why this would be happening, so any help would be appreciated.

I am using Biopython 1.42, Python 2.4.3 and blast 2.2.13.

Cheers,
Dave

-------------- next part --------------
A non-text attachment was scrubbed...
Name: problem_fasta_sequences.zip
Type: application/x-zip-compressed
Size: 1000 bytes
Desc: problem_fasta_sequences.zip
Url : http://lists.open-bio.org/pipermail/biopython/attachments/20061013/2942b829/attachment.bin

From biopython at maubp.freeserve.co.uk Fri Oct 13 11:22:40 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Fri, 13 Oct 2006 16:22:40 +0100
Subject: [BioPython] NCBIStandalone.iterator hangs
In-Reply-To:
References:
Message-ID: <452FAF40.2030202@maubp.freeserve.co.uk>

Hi David,

David Toomey wrote:
> I have been writing some scripts which make use of the NCBIStandalone module
> and I have found that when I iterate the results from some local blast runs
> my script will hang.

When you say the first query sequence (problem.txt) "hangs", do you get a python stack trace?

What is the script you are using? I'm going to guess that you are using python to invoke the NCBI standalone blast program.

Have you tried running standalone blast "by hand" at the command line, and had a look at the output?

Are you getting plain text output or XML from the blast program?

> I am using Biopython 1.42, Python 2.4.3 and blast 2.2.13

On Windows, Linux or Mac OS X?

Peter

From aloraine at gmail.com Sun Oct 15 22:21:40 2006
From: aloraine at gmail.com (Ann Loraine)
Date: Sun, 15 Oct 2006 21:21:40 -0500
Subject: [BioPython] too lazy to install InterProScan on my computer
Message-ID: <83722dde0610151921n427d438aj94a3613c8871cdaf@mail.gmail.com>

Greetings all,

I've been trying to get EBI's InterProScan Web service to work with ZSI, which I understand is the preferred SOAP messaging implementation for python. I haven't had much luck so far: I get an error that looks like an incompatibility between EBI's WSDL (http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl) and what ZSI expects (see below).

Has anyone on the list used EBI's InterProScan from python using ZSI? If yes, did it work for you?

Any sample code would be much appreciated, as I am new to SOAP. (EBI provides some example perl and java client code, but I'd like to take advantage of python's interpreter, which in theory would allow me to submit a bunch of requests to EBI using the asynchronous option, collect the job ids in a dictionary, and then fetch the data later on in the same interactive session.)

-Ann

import ZSI
ebi_wsdl = 'http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl'
>>> service = ZSI.ServiceProxy(ebi_wsdl)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.4/site-packages/ZSI/ServiceProxy.py", line 34, in __init__
    wsdl = ZSI.wstools.WSDLTools.WSDLReader().loadFromURL(wsdl)
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/WSDLTools.py", line 42, in loadFromURL
    wsdl.load(document)
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/WSDLTools.py", line 260, in load
    schema = reader.loadFromNode(WSDLToolsAdapter(self), item)
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 80, in loadFromNode
    schema.load(reader)
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 1116, in load
    tp.fromDom(node)
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 2283, in fromDom
    self.content.fromDom(contents[indx])
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 1996, in fromDom
    content[-1].fromDom(i)
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 1764, in fromDom
    self.setAttributes(node)
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 627, in setAttributes
    self.__checkAttributes()
  File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 673, in __checkAttributes
    raise SchemaError,\
ZSI.wstools.XMLSchema.SchemaError: class instance ZSI.wstools.XMLSchema.LocalElementDeclaration, missing required attribute name

--
Ann Loraine
Assistant Professor
Section on
Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From idoerg at burnham.org Mon Oct 16 00:25:08 2006 From: idoerg at burnham.org (Iddo Friedberg) Date: Sun, 15 Oct 2006 21:25:08 -0700 Subject: [BioPython] too lazy to install InterProScan on my computer In-Reply-To: <83722dde0610151921n427d438aj94a3613c8871cdaf@mail.gmail.com> References: <83722dde0610151921n427d438aj94a3613c8871cdaf@mail.gmail.com> Message-ID: <453309A4.60802@burnham.org> Ann, Just a general comment from one who has recently installed InterProScan on his workstation for the first time: 1) Installation is truly a breeze. If you want to mass use interproscan, you should do it on your machine. You'll be happier down the road. So will EBI: I believe there is a reason they moved from 6 sequences at a time to 1 at a time in their web interface. 2) Having said that, I have a Python client that queries InterProScan and parses the results. I can give you the code if you like. Not ZSI, just plain hacking. Cheers, Iddo Ann Loraine wrote: > Greetings all, > > I've been trying to get EBI's InterProScan Web service to work with > ZSI, which I understand is the preferred SOAP messaging > implementation for python. I haven't had much luck so far .. I've get > an error that look like an incompatibility in the WSDL of EBI > (http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl) and > what ZSI expects (see below.) > > Has anyone on the list used EBI's InterProScan from python using ZSI? > If yes, did it work for you? > > Any sample code would be much appreciated, as I am new to SOAP. (EBI > provides some example perl and java client code, but I'd like to take > advantage of python's interpreter, which in theory would allow me to > submit a bunch of requests to EBI using the asynchronous option, > collect the job ids in a dictionary, and then fetch the data later on > in the same interactive session.) 
> > -Ann > > import ZSI > ebi_wsdl = 'http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl' >>>> service = ZSI.ServiceProxy(ebi_wsdl) > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/local/lib/python2.4/site-packages/ZSI/ServiceProxy.py", > line 34, in __init__ > wsdl = ZSI.wstools.WSDLTools.WSDLReader().loadFromURL(wsdl) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/WSDLTools.py", > line 42, in loadFromURL > wsdl.load(document) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/WSDLTools.py", > line 260, in load > schema = reader.loadFromNode(WSDLToolsAdapter(self), item) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 80, in loadFromNode > schema.load(reader) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 1116, in load > tp.fromDom(node) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 2283, in fromDom > self.content.fromDom(contents[indx]) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 1996, in fromDom > content[-1].fromDom(i) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 1764, in fromDom > self.setAttributes(node) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 627, in setAttributes > self.__checkAttributes() > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 673, in __checkAttributes > raise SchemaError,\ > ZSI.wstools.XMLSchema.SchemaError: class instance > ZSI.wstools.XMLSchema.LocalElementDeclaration, missing required > attribute name > > > > -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037, USA
T: +1 858 646 3100 x3516
http://iddo-friedberg.org
http://BioFunctionPrediction.org

From dtoomey at rcsi.ie Mon Oct 16 04:23:31 2006
From: dtoomey at rcsi.ie (David Toomey)
Date: Mon, 16 Oct 2006 09:23:31 +0100
Subject: [BioPython] NCBIStandalone.iterator hangs
Message-ID:

Hi Peter,

It is part of a large script that I have written, but I have replicated the problem with the following simple script:

from Bio.Blast import NCBIStandalone

blast_out, error_info = NCBIStandalone.blastall("C:/Program Files/Blast/bin" , "blastp", "C:/blast_test/new_prot_target_for_download.txt", "C:/blast_test/problem.txt")
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(blast_out, b_parser)
record = b_iterator.next()
print record.query

I don't get any stack trace. I have tried it from the Windows command line, Komodo and also on a Linux box (although the Linux box has Python 2.3.3 rather than 2.4). When I say it hangs, I mean it just doesn't return after the b_iterator.next() statement. There is no error message or stack trace.

I have run the blast manually from the Windows command line and it works fine. The above script works fine with other fasta query sequences against the same database.

Cheers,
Dave

-----Original Message-----
From: Peter (BioPython List) [mailto:biopython at maubp.freeserve.co.uk]
Sent: 13 October 2006 16:23
To: David Toomey
Cc: biopython at biopython.org
Subject: Re: [BioPython] NCBIStandalone.iterator hangs

Hi David,

David Toomey wrote:
> I have been writing some scripts which make use of the NCBIStandalone module
> and I have found that when I iterate the results from some local blast runs
> my script will hang.

When you say the first query sequence (problem.txt) "hangs", do you get a python stack trace?

What is the script you are using? I'm going to guess that you are using python to invoke the NCBI standalone blast program.

Have you tried running standalone blast "by hand" at the command line, and had a look at the output?
Are you getting plain text output or XML from the blast program?

> I am using Biopython 1.42, Python 2.4.3 and blast 2.2.13

On Windows, Linux or Mac OS X?

Peter

From dtoomey at rcsi.ie Mon Oct 16 07:38:05 2006
From: dtoomey at rcsi.ie (David Toomey)
Date: Mon, 16 Oct 2006 12:38:05 +0100
Subject: [BioPython] NCBIStandalone.iterator hangs
Message-ID:

Thanks for the help, Peter.

The attached file has the output from two queries, problem.txt and works.txt, when run manually from the command line.

I also edited the NCBIStandalone module to add a print statement to Iterator.next() and then ran the same two files using the script. If you compare the two reports for problem.txt you can see on which line of the report the iterator hangs. I have had a look at this line and can't see anything unusual about it.

The last line output by the script is

Query: 266 LAAHIDQYDIDAMTGIRATDIEKTDEAIKVTLENGAVLESKTVIIATGAGWRKLNIPGEE 325

and the manual report continues with

I + + ++ + VL
Sbjct: 68 GIMSIPTLILFKGGE-PVKQLIGYQPKEQLEAQLADVL- 125

Even though problem.txt generates a valid report when run manually, it does output a load of errors of the type below, but I am not sure how this would cause the script to stop at the line above.

[NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 stop(365) >= len(329)
[NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 stop(336) >= len(329)
[NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 start(337) >= len(329)
[NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 start(338) >= len(329)
[NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02113 start(284) >= len(149)

If it is easier for you I can certainly raise a bug; I just wanted to be sure it wasn't anything silly that I was doing before I did this.

Cheers,
Dave

-----Original Message-----
From: Peter [mailto:biopython at maubp.freeserve.co.uk]
Sent: 16 October 2006 12:11
To: David Toomey
Subject: Re: [BioPython] NCBIStandalone.iterator hangs

David Toomey wrote:
> Hi Peter
>
> It is part of a large script that I have written but I have
> replicated the problem with the following simple script

Thank you.

> I don't get any stack trace. I have tried it from the windows command
> line, Komodo and also on a linux box (although the linux box is
> Python 2.3.3 rather than 2.4) When I say it hangs I mean it just
> doesn't return after the b_iterator.next() statement. There is no
> error message or stack trace.
>
> I have run the blast manually from the windows command line and it
> works fine. The above script works fine with other fasta query
> sequences against the same database.

The fact that the script runs fine on other query sequences is important.

My guess is that the output for problem.txt is somehow different (maybe no matches), and the parser can't cope.

Could you email me the blast output from using problem.txt as input, and a second working output from a different query? Or, if you would rather, file a bug and attach the two output files to it.

You could do this from the command line, or use python to save the blast_out text generated by NCBIStandalone.blastall to a file.

(Looking back over the emails, I can't see what version of NCBI Standalone BLAST you have, but it should be specified in the blast output.)

Thanks

Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast_test.zip
Type: application/x-zip-compressed
Size: 27560 bytes
Desc: blast_test.zip
Url : http://lists.open-bio.org/pipermail/biopython/attachments/20061016/37cfe905/attachment-0001.bin

From biopython at maubp.freeserve.co.uk Mon Oct 16 11:39:17 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Oct 2006 16:39:17 +0100
Subject: [BioPython] NCBIStandalone.iterator hangs
In-Reply-To:
References:
Message-ID: <4533A7A5.6020102@maubp.freeserve.co.uk>

David Toomey wrote:
> Thanks for the help Peter
>
> The attached file has the output from two queries, problem.txt and
> works.txt, when run manually from the command line

Excellent - that looks like everything I asked for.

> I also edited the NCBIStandalone module to add a print statement to
> Iterator.next() and then ran the same two files using the script.
> If you compare the two reports for problem.txt you can see on which line of
> the report the iterator is hanging. I have had a look at this and can't see
> anything about the line that is unusual?
>
> The last line outputted by the script is
> Query: 266 LAAHIDQYDIDAMTGIRATDIEKTDEAIKVTLENGAVLESKTVIIATGAGWRKLNIPGEE 325
>
>
> And the manual report continues with
> I + + ++ + VL
> Sbjct: 68 GIMSIPTLILFKGGE-PVKQLIGYQPKEQLEAQLADVL- 125
>

Your script is hanging during the second alignment in the results for:

Nadph Dehydrogenase 1 - Clostridium beijerinckii (Clostridium MP)

This alignment does look "funny" to me, but it's not the first "funny" alignment in the results. I suspect you have found a problem with NCBI blast, or perhaps have a malformed database (especially given the errors you mention below).

However, there are similar odd pairwise alignments before this one in the results which BioPython has apparently coped with... so it is a little odd that BioPython gets stuck at this particular point.

As to why I think there is a problem:

Notice that the Query sequence continues for several lines (up to Query 504) while the Sbjct sequence is blank (up to Sbjct 313) except for a single lone gap character at position 246. I would have expected the match to finish at about Query 303 / Match 103. Very odd.

In addition, notice that the header information is inconsistent:

Score = 149 bits (376), Expect = 7e-037
Identities = 0/309 (0%), Positives = 0/309 (0%), Gaps = 15/309 (4%)

Even looking at just the second set of 60 characters (quoted above) we have three identical matches (I, V and L) and five close matches.

In all I would say there were five identical matches (A, S, I, V, L) and a further nine close matches. So the identities score should be 5/length, and the positives 14/length.

I would also say the alignment length is either 297 (based on the length of the gapped query shown) or 99+1 (based on the length of the gapped subject sequence shown). Even allowing for my quick counts being out by plus or minus one, I can't see where the stated length of 309 comes from.

>
> Even though problem.txt generates a valid report when run manually it does
> output a load of errors of the type below, but I am not sure how this would
> cause the script to stop at the line above.
>
> [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 stop(365) >= len(329)
> [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 stop(336) >= len(329)
> [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 start(337) >= len(329)
> [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 start(338) >= len(329)
> [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02113 start(284) >= len(149)
>
>
> If it is easier for you I can certainly raise a bug, I just wanted to be
> sure it wasn't anything silly that I was doing before I did this.
>

Have a look over the output yourself, and see if you agree with me.

I assume you get exactly the same results from running Blast on both Linux and Windows.

I see you are using standalone BLASTP 2.2.13 [Nov-27-2005], so one thing you could try is updating your copy of Blast. I would also double check how you created/installed the database.

I think BioPython is going wrong because it's been given "funny" input. It may be possible for us to improve that, but even so, I wouldn't trust those blast results.

Good luck

Peter

From cariaso at yahoo.com Mon Oct 16 12:25:42 2006
From: cariaso at yahoo.com (Mike Cariaso)
Date: Mon, 16 Oct 2006 09:25:42 -0700 (PDT)
Subject: [BioPython] can someone create biopython-1.42.win32-py2.5.exe
Message-ID: <20061016162542.30834.qmail@web90601.mail.mud.yahoo.com>

Python 2.5 has been out for a while now. It would be very helpful if whoever creates the win32 installers could create a new one for Python 2.5.

thanks,
Mike Cariaso

--
Mike Cariaso * Bioinformatics Software * http://cariaso.com

From mdehoon at c2b2.columbia.edu Mon Oct 16 19:03:16 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Mon, 16 Oct 2006 19:03:16 -0400
Subject: [BioPython] can someone create biopython-1.42.win32-py2.5.exe
In-Reply-To: <20061016162542.30834.qmail@web90601.mail.mud.yahoo.com>
References: <20061016162542.30834.qmail@web90601.mail.mud.yahoo.com>
Message-ID: <45340FB4.1020308@c2b2.columbia.edu>

Done. See the Biopython download page. Let me know if you have any problems.

--Michiel.

Mike Cariaso wrote:
> python 2.5 has been out for a while now. It would be very helpful if whoever creates the win32 installers could create a new one for python2.5.
> > thanks, > Mike Cariaso > > > -- > Mike Cariaso * Bioinformatics Software * http://cariaso.com > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From hlapp at gmx.net Mon Oct 16 20:46:56 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 16 Oct 2006 20:46:56 -0400 Subject: [BioPython] NESCent Phyloinformatics Hackathon Message-ID: <480EBF5D-0290-429B-A7FF-826DD4A23FD0@gmx.net> (apologies in advance to those who receive this multiple times) The National Evolutionary Synthesis Center (NESCent) in collaboration with Arlin Stoltzfus (U. Maryland, NIST), Aaron Mackey (GSK), Rutger Vos (UBC), and Mark Holder (FSU) sponsors a Phyloinformatics Hackathon to take place Dec 11-15 in Durham, NC. The (wiki) website with more information and a formal proposal is at https://www.nescent.org/wg_phyloinformatics/ In short, the goal is to leverage the Bio* toolkits to provide the "glue" for evolutionary analyses of various types that depend on automation, interoperability, and data integration. CALL FOR INPUT: The specific objectives are driven by "use cases", that is, specific target problems of interest to evolutionary biologists (click 'Use Cases' at the above website). We invite community input in order to focus efforts on the most urgent or pervasive problems. The wiki for the hackathon allows direct editing of the use cases after registration. You may also upload data files, or add comments to the "Forum" page. Alternatively, send email to hlapp at nescent.org. You may also contact any of the organizers with questions or comments. ATTENDANCE: The hackathon is scheduled for Dec 11-15, 2006 in Durham NC. Space is limited, and attendance is by invitation. 
If you have not been contacted but desire to attend, please contact Hilmar Lapp (hlapp at nescent.org). ORGANIZERS: Hilmar Lapp (NESCent; hlapp at nescent.org) Aaron Mackey (GSK; aaron.j.mackey at gsk.com) Mark Holder (FSU; mholder at scs.fsu.edu) Arlin Stoltzfus (CARB, NIST; arlin.stoltzfus at nist.gov) Todd Vision (NESCent; tjv at bio.unc.edu) Rutger Vos (UBC; rvosa at sfu.ca) From aloraine at gmail.com Tue Oct 17 08:39:54 2006 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 17 Oct 2006 07:39:54 -0500 Subject: [BioPython] intalling on Windows, was: Re: can someone create biopython-1.42.win32-py2.5.exe Message-ID: <83722dde0610170539j4865950dyf721ff52baaae4b7@mail.gmail.com> Hi, Along these lines, a question: I helped a student install biopython on his Windows laptop this week. However, we had a couple of problems getting BioPython to work properly with cygwin's python: our attempts to import Bio packages failed, I guess because they were not in cygwin's python's search path. (We were able to get the cygwin's python to find a custom module he had written by setting PYTHONPATH properly, using Windows' Control Panel, and then checking python's search path in sys.path.) How would I tell python where to find the BioPython packages? Is there a way to use PYTHONPATH environment variable to do this? We will try this next time I meet with him, but any tips or advice you might have would be great. Sorry for the possibly dumb questions -- I am not very familiar with how Windows does things. Sincerely, Ann Loraine On 10/16/06, Mike Cariaso wrote: > python 2.5 has been out for a while now. It would be very helpful if whoever creates the win32 installers could create a new one for python2.5. 
>
> thanks,
> Mike Cariaso
>
>
> --
> Mike Cariaso * Bioinformatics Software * http://cariaso.com
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

--
Ann Loraine
Assistant Professor
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org

From biopython at maubp.freeserve.co.uk Tue Oct 17 09:25:20 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 17 Oct 2006 14:25:20 +0100
Subject: [BioPython] intalling on Windows, was: Re: can someone create biopython-1.42.win32-py2.5.exe
In-Reply-To: <83722dde0610170539j4865950dyf721ff52baaae4b7@mail.gmail.com>
References: <83722dde0610170539j4865950dyf721ff52baaae4b7@mail.gmail.com>
Message-ID: <4534D9C0.6080600@maubp.freeserve.co.uk>

Ann Loraine wrote:
> Hi,
>
> Along these lines, a question:
>
> I helped a student install biopython on his Windows laptop this week.
> However, we had a couple of problems getting BioPython to work
> properly with cygwin's python: our attempts to import Bio packages
> failed, I guess because they were not in cygwin's python's search
> path.
>
> (We were able to get the cygwin's python to find a custom module he
> had written by setting PYTHONPATH properly, using Windows' Control
> Panel, and then checking python's search path in sys.path.)
>
> How would I tell python where to find the BioPython packages? Is there
> a way to use PYTHONPATH environment variable to do this?
>
> We will try this next time I meet with him, but any tips or advice you
> might have would be great.
>
> Sorry for the possibly dumb questions -- I am not very familiar with
> how Windows does things.
>
> Sincerely,
>
> Ann Loraine

Hello Ann,

Personally, when I tried cygwin's python over a year ago, it did not seem very reliable (especially using idle).
Things may have improved, but I would recommend you use the "pure windows" version of Python.

I suspect you are trying to combine cygwin's python with the pre-compiled windows setup.exe for BioPython. This is probably not going to work, and having to mess about with PYTHONPATH would not surprise me. The bits of BioPython written in pure python may be fine, but some of it is compiled C code... and then you are asking for trouble.

Pure Windows (not using Cygwin)
===============================

For windows, I would recommend you install a pure windows version of Python 2.5 and BioPython 1.42 using:

http://www.python.org/ftp/python/2.5/python-2.5.msi
http://biopython.org/DIST/biopython-1.42.win32-py2.5.exe

Or the Python 2.4 versions:

http://www.python.org/ftp/python/2.4.4/python-2.4.4c1.msi
http://biopython.org/DIST/biopython-1.42.win32-py2.4.exe

Or, if you have some good reason, the Python 2.3 versions.

Check the python website for the latest versions; these were correct at the time of writing.

It should "just work", without any messing about with paths.

Using cygwin on Windows (ignoring any pure windows python installation)
=======================

Basically follow the unix instructions.

Install the cygwin version of python, gcc, flex, ..., using cygwin setup.

Compile and install BioPython FROM SOURCE at the cygwin command line, using the cygwin version of python.

Peter

From alpersoyler at yahoo.com Wed Oct 18 06:33:47 2006
From: alpersoyler at yahoo.com (alper soyler)
Date: Wed, 18 Oct 2006 03:33:47 -0700 (PDT)
Subject: [BioPython] Standalone Blast
Message-ID: <20061018103347.1681.qmail@web56513.mail.re3.yahoo.com>

Hi all,

I want to use the formatdb option of standalone BLAST. I have 400 files with a ".pep" extension and I want to format all of them. I looked at the "Biopython tutorial and cookbook" but there was no explanation about it. If you can help me, I would be glad. Thank you in advance.

Alper

From sbassi at gmail.com Wed Oct 18 12:31:39 2006
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed, 18 Oct 2006 13:31:39 -0300
Subject: [BioPython] Standalone Blast
In-Reply-To: <20061018103347.1681.qmail@web56513.mail.re3.yahoo.com>
References: <20061018103347.1681.qmail@web56513.mail.re3.yahoo.com>
Message-ID:

On 10/18/06, alper soyler wrote:
> I want to use formatdb option of standalone BLAST. I have 400 files with ".pep" extension and I want to format all of them. I looked at the "Biopython tutorial and cookbok" but there was no explanation about it. If you help me, I would be glad. Thank you in advance.
>

Hello, it is not in the Biopython tutorial because it is out of scope; I mean, the Biopython tutorial assumes you have a working blast installation.

I think you should join all your sequences in one file before running formatdb (like cat *.pep > allfiles.txt). I assume that all the pep files are fasta files.

Best regards,
SB.

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From tomee at genesilico.pl Thu Oct 19 20:52:33 2006
From: tomee at genesilico.pl (Tomek Jarzynka)
Date: Fri, 20 Oct 2006 02:52:33 +0200
Subject: [BioPython] Automated ligand extraction from PDB
Message-ID: <200610200252.34046.tomee@genesilico.pl>

Hi,

I would like to create a script that would take a PDB file as input, try to identify the ligand structures and delete them from the PDB file. I figured this would be possible with Biopython by looking at the atom id (whether it contains 'H_') and subclassing Select to not allow those items to be written to a file.

Is there any more elegant way of doing this, perhaps another PDB parser framework, or maybe someone has already done similar work?

Thanks in advance,
--
Tomasz K.
Jarzynka / +48 601 706 601 / tomee(a-t)genesilico(d-o-t)pl
Laboratory of Bioinformatics and Protein Engineering | www.genesilico.pl
International Institute of Molecular and Cell Biology | www.iimcb.gov.pl
"You can have either freedom of speech or quality of communication. -- Orson Scott Card"

From biopython at maubp.freeserve.co.uk Fri Oct 20 05:21:39 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Oct 2006 10:21:39 +0100
Subject: [BioPython] Automated ligand extraction from PDB
In-Reply-To: <200610200252.34046.tomee@genesilico.pl>
References: <200610200252.34046.tomee@genesilico.pl>
Message-ID: <45389523.5000802@maubp.freeserve.co.uk>

Tomek Jarzynka wrote:
> Hi,
>
> I would like to create a script that would take a PDB file as input,
> try and identify the ligand structures and delete them from the PDB
> file. I figured this would be possible with Biopython by looking at
> the atom id (whether it contains 'H_') and subclassing Select to
> not allow those items to be written to a file.
> Is there any more elegant way of doing this, perhaps another PDB parser
> framework, or maybe someone has already done similar work?
>
> Thanks in advance,

Your plan sounds fine.

If you are going to use BioPython to analyse PDB files later, then this would be a good exercise to get to grips with it.

Also, if you use the BioPython PDB parser in strict (non-permissive) mode, it may flag up some potential problems in the PDB file which you may be interested in for your work.

A quick and dirty alternative would be to read in the input file line by line, and output selected lines:

* If it is not an atom record, just output it.
* If it is an atom record, look at the atom id to decide.
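
A minimal sketch of the line-by-line approach above, not a tested tool. It assumes that ligands appear as HETATM records in a fixed-width PDB file and that the residue name sits in columns 18-20; the function name strip_hetero_records and its keep_resnames parameter are made up for this illustration and are not part of Biopython.

```python
def strip_hetero_records(lines, keep_resnames=()):
    """Yield PDB lines, dropping HETATM records.

    keep_resnames is an optional collection of three-letter residue
    names (for example "HOH" or "MSE") whose HETATM records are kept.
    This is a hypothetical helper, not a Biopython function.
    """
    keep = set(keep_resnames)
    for line in lines:
        if line.startswith("HETATM"):
            # Assumption: fixed-width PDB layout, residue name in columns 18-20.
            resname = line[17:20].strip()
            if resname not in keep:
                continue  # drop this hetero atom (ligand, ion, solvent)
        # Everything that is not a dropped HETATM record passes through unchanged.
        yield line


# Small in-memory example: one protein atom, one water, one END record.
sample = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n",
    "HETATM 1234  O   HOH A 201      10.000  10.000  10.000  1.00 20.00           O\n",
    "END\n",
]
cleaned = list(strip_hetero_records(sample))
```

Two caveats under these assumptions: modified residues inside the chain (selenomethionine, MSE, is a common case) are also stored as HETATM and would be removed unless listed in keep_resnames, and bookkeeping records such as HET or CONECT that refer to the removed atoms are left in place.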
Peter

From tomee at genesilico.pl Fri Oct 20 07:47:14 2006
From: tomee at genesilico.pl (Tomek Jarzynka)
Date: Fri, 20 Oct 2006 13:47:14 +0200
Subject: [BioPython] Automated ligand extraction from PDB
In-Reply-To: <45389523.5000802@maubp.freeserve.co.uk>
References: <200610200252.34046.tomee@genesilico.pl> <45389523.5000802@maubp.freeserve.co.uk>
Message-ID: <200610201347.15086.tomee@genesilico.pl>

On Friday 20 October 2006 11:21, Peter wrote:
> Your plan sounds fine.
> If you are going to use BioPython to analyse PDB files later, then this
> would be a good exercise to get to grips with it.

I tried some simple code and it works, but I find the 'H_' comparison pretty inelegant. Is there any better way to look for HET atoms, or ligand atoms in particular?

> Also, using the BioPython PDB parser in strict mode (non-permissive)
> may flag up some potential problems in the PDB file which
> you may be interested in for your work.

Thanks, I'll take a look at it. Actually, Kristian Rother, who's referenced as one of the PDB code authors, is my 'roommate' at the institute :)

BTW, how does PDB in Biopython relate to pymmlib and cctbx?

> A quick and dirty alternative would be to read in the input file line by
> line, and output selected lines:
> * If it is not an atom record, just output it.
> * If it is an atom record, look at the atom id to decide.

Yeah, doing this in a shell script would be easy at first but I am expecting trouble with some non-standard PDB files.

--
Tomasz K. Jarzynka / +48 601 706 601 / tomee(a-t)genesilico(d-o-t)pl
Laboratory of Bioinformatics and Protein Engineering | www.genesilico.pl
International Institute of Molecular and Cell Biology | www.iimcb.gov.pl
"Któż nie chciałby stać się przez szczęście głupszy, zamiast być mądrzejszy przez szkodę. -- Kamil C.
Norwid" From sdavis2 at mail.nih.gov Fri Oct 20 21:20:21 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 20 Oct 2006 21:20:21 -0400 Subject: [BioPython] Martel-based parsing of Unigene flat files Message-ID: <453975D5.4070701@mail.nih.gov> I am relatively new to python and biopython (coming from perl side of things). I would like to make a parser for Unigene flat file format. However, after digging through the LocusLink parsing code (as probably the most similar format, etc.), I'm still at a loss for how Martel-based parsing works. I understand the big picture (converting an re-based parsing of a file into events), but it is the detail that I am missing. I know about pydoc, but the pydoc for much of Martel is not very helpful to me, at least not in my current state of knowledge. Any suggestions on how to get started? Thanks, Sean From Thenaturenook1 at aol.com Sat Oct 21 09:35:13 2006 From: Thenaturenook1 at aol.com (Thenaturenook1 at aol.com) Date: Sat, 21 Oct 2006 09:35:13 EDT Subject: [BioPython] installing biopython Message-ID: Hi, I am in the process of installing BioPython on a Windows XP System. I have been using the Windows installers throughout and following the instruction in the PDF file. I have installed all of the compulsary pre requisite modules in the order designated, but when i type the test code, this AttributeError happens: Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. **************************************************************** Personal firewall software may warn about the connection IDLE makes to its subprocess using this computer's internal loopback interface. This connection is not visible on any external interface and no data is sent to or received from the Internet. 
**************************************************************** IDLE 1.2 >>> from Bio.Seq import Seq >>> from Bio.Alphabet.IUPAC import unambiguous_dna >>> new_seq = Seq('GATCAGAAC', unambiguous_dna) >>> new_seq[0:2] Seq('GA', IUPACUnambiguousDNA()) >>> from Bio import Translate >>> translator = Translate.umambiguous_dna_by_name["Standard"] Traceback (most recent call last): File "", line 1, in translator = Translate.umambiguous_dna_by_name["Standard"] AttributeError: 'module' object has no attribute 'umambiguous_dna_by_name' >>> Can anyone help? thanks,tim From idoerg at burnham.org Sat Oct 21 10:58:01 2006 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat, 21 Oct 2006 07:58:01 -0700 Subject: [BioPython] installing biopython References: Message-ID: <1F97379A556D0946AAEFE3F63FD6F5744D46C0@MAIL.burnham.org> > AttributeError: 'module' object has no attribute 'umambiguous_dna_by_name' You misspelled the module's name. ./I -- Iddo Friedberg, PhD Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA T: +1 858 646 3100 x3516 http://iddo-friedberg.org http://BioFunctionPrediction.org -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Thenaturenook1 at aol.com Sent: Sat 10/21/2006 6:35 AM To: biopython at biopython.org Subject: [BioPython] installing biopython Hi, I am in the process of installing BioPython on a Windows XP System. I have been using the Windows installers throughout and following the instruction in the PDF file. I have installed all of the compulsary pre requisite modules in the order designated, but when i type the test code, this AttributeError happens: Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. 
**************************************************************** Personal firewall software may warn about the connection IDLE makes to its subprocess using this computer's internal loopback interface. This connection is not visible on any external interface and no data is sent to or received from the Internet. **************************************************************** IDLE 1.2 >>> from Bio.Seq import Seq >>> from Bio.Alphabet.IUPAC import unambiguous_dna >>> new_seq = Seq('GATCAGAAC', unambiguous_dna) >>> new_seq[0:2] Seq('GA', IUPACUnambiguousDNA()) >>> from Bio import Translate >>> translator = Translate.umambiguous_dna_by_name["Standard"] Traceback (most recent call last): File "", line 1, in translator = Translate.umambiguous_dna_by_name["Standard"] AttributeError: 'module' object has no attribute 'umambiguous_dna_by_name' >>> Can anyone help? thanks,tim _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From sbassi at gmail.com Sat Oct 21 10:51:11 2006 From: sbassi at gmail.com (Sebastian Bassi) Date: Sat, 21 Oct 2006 11:51:11 -0300 Subject: [BioPython] installing biopython In-Reply-To: References: Message-ID: On 10/21/06, Thenaturenook1 at aol.com wrote: > Hi, > I am in the process of installing BioPython on a Windows XP System. I have .... > the order designated, but when i type the test code, this AttributeError This is not a problem with the biopython installation. Could you please send me the URL of the PDF file? 
--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From biopython at maubp.freeserve.co.uk Sat Oct 21 10:39:38 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 21 Oct 2006 15:39:38 +0100
Subject: [BioPython] installing biopython
In-Reply-To:
References:
Message-ID: <453A312A.9010001@maubp.freeserve.co.uk>

Thenaturenook1 at aol.com wrote:
>>>> from Bio import Translate
>>>> translator = Translate.umambiguous_dna_by_name["Standard"]
>
> Traceback (most recent call last):
> File "", line 1, in
> translator = Translate.umambiguous_dna_by_name["Standard"]
> AttributeError: 'module' object has no attribute 'umambiguous_dna_by_name'
>
> Can anyone help?

You have a typing error there, uMambiguous rather than uNambiguous. Also try dir(Translate) to see what else is on offer.

i.e. Try this:

translator = Translate.unambiguous_dna_by_name["Standard"]

Peter

From thamelry at binf.ku.dk Sun Oct 22 14:26:17 2006
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Sun, 22 Oct 2006 20:26:17 +0200
Subject: [BioPython] Automated ligand extraction from PDB
In-Reply-To: <200610200252.34046.tomee@genesilico.pl>
References: <200610200252.34046.tomee@genesilico.pl>
Message-ID: <2d7c25310610221126l7fbd5d65u4b8feb5888a81935@mail.gmail.com>

On 10/20/06, Tomek Jarzynka wrote:
> Hi,
>
> I would like to create a script that would take a PDB file as input,
> try and identify the ligand structures and delete them from the PDB
> file.

Take a look at the extract function in Bio.PDB's Dice module.

extract(s, "A", 1, 100, "out.pdb")

will write all amino acids between 1 and 100 of chain A in structure s to file "out.pdb". Hetero residues (i.e. ligands) and hydrogens are not included.

-Thomas

From konrad.koehler at mac.com Mon Oct 23 12:32:09 2006
From: konrad.koehler at mac.com (Konrad Forster Koehler)
Date: Mon, 23 Oct 2006 09:32:09 -0700
Subject: [BioPython] Using Bio.PDB to add atoms to a structure?
Message-ID: <3938052.1161621129819.JavaMail.konrad.koehler@mac.com>

Using Bio.PDB, I was wondering if there is a way to:

1) read in a structure from a PDB file
2) calculate the position of a new chain/residue/atom
3) add these new chains/residues/atoms to the structure
4) write out the PDB file containing the original + new atoms

I have no problem with steps 1, 2, and 4, but I am stuck on step #3.

I have tried things like:

http://bioinformatics.org/bradstuff/bp/api/Bio/PDB/StructureBuilder_StructureBuilder.py.html#init_chain

builder = StructureBuilder()
chain_x = builder.init_chain("0","X")

error message: init_chain() takes exactly 2 arguments (3 given)

or

chain_x = builder.init_chain("X")

error message: StructureBuilder instance has no attribute 'model'

I have googled for examples or documentation on adding new atoms using Bio.PDB, but I have not found anything. Can anyone provide me with any pointers, examples, etc?

Best regards,

Konrad

From thamelry at binf.ku.dk Mon Oct 23 15:10:35 2006
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Mon, 23 Oct 2006 21:10:35 +0200
Subject: [BioPython] Using Bio.PDB to add atoms to a structure?
In-Reply-To: <3938052.1161621129819.JavaMail.konrad.koehler@mac.com>
References: <3938052.1161621129819.JavaMail.konrad.koehler@mac.com>
Message-ID: <2d7c25310610231210p64876f23k25bb4a67ca58c821@mail.gmail.com>

Hi,

Adding a chain:

from Bio.PDB.Chain import Chain
chain=Chain('B')
# model is a Model object from the parsed structure, e.g. structure[0]
model.add(chain)

You have to make sure that the id (in this case 'B') is not yet present in the model.

Similar for atom, residue and model.
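
The same pattern extends down the hierarchy. The following is a minimal sketch, assuming a recent Biopython release with NumPy installed; the constructor arguments shown are those of current Bio.PDB and may differ in the 2006-era release discussed in this thread. The ids, residue name and coordinates are invented purely for illustration.

```python
import numpy as np
from Bio.PDB.Structure import Structure
from Bio.PDB.Model import Model
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue
from Bio.PDB.Atom import Atom

# Build an empty structure here so the sketch is self-contained; in
# practice `structure` would come from PDBParser().get_structure(...).
structure = Structure("demo")
model = Model(0)
structure.add(model)

# Add a chain, checking first that the id is not already taken.
if not model.has_id("X"):
    chain = Chain("X")
    model.add(chain)

# Residue id is (hetero flag, sequence number, insertion code).
residue = Residue((" ", 1, " "), "GLY", " ")
chain.add(residue)

# Atom(name, coord, bfactor, occupancy, altloc, fullname, serial_number, element)
atom = Atom("CA", np.array([1.0, 2.0, 3.0], dtype="f"), 20.0, 1.0, " ", " CA ", 1, element="C")
residue.add(atom)

# The new atom is now reachable through the usual hierarchy.
print(model["X"][1]["CA"].get_coord())
```

Each add() call raises an error if a child with the same id already exists, which is why the id check matters when extending a parsed structure rather than an empty one.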
Best, -Thomas From biopython at maubp.freeserve.co.uk Wed Oct 25 05:27:45 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Oct 2006 10:27:45 +0100 Subject: [BioPython] [Biopython-dev] Martel-based parsing of Unigene flat files In-Reply-To: <128a885f0610241922h5db02fbfod1a83cfeade29801@mail.gmail.com> References: <453975D5.4070701@mail.nih.gov> <128a885f0610241922h5db02fbfod1a83cfeade29801@mail.gmail.com> Message-ID: <453F2E11.7030408@maubp.freeserve.co.uk> Sean, I did have a little look at Unigene, but wasn't sure which files exactly you wanted to parse. Like the NCBI, they seem to offer lots of different file formats. Chris Lasher wrote: > Hi Sean, > > FWIW this should probably have been posted to BioPython-dev, but I > don't think that would improve your chances of getting a response. I > am cross-posting it there, anyways. Unfortunately for you, I do not > have an answer for you. :-( The dev list would probably have been a better idea. I had seen Sean' email and was meaning to write something in the absence of any other takers. > I, myself, would be interested in a response to this question from the > Devs, as I would like to write a parser for PTT files. Last I saw > there was a lot of chatter about the Martel parsers being incredibly > slow compared to straightforward solutions. It seems that standard > format parsers would be one of the easiest ways for BioPython newbies > to contribute to developing the BioPython project, however, there > isn't very much in the way of documentation on the BioPython way to do > so, let alone developer documentation at all. I would like to know > what can be done to get some dev docs going on the wiki. I'm one of the more recent contributors - for example, I changed the GenBank parser from Martel to just Python. This was done on the pretext that the old parser (when it worked) was exceedingly slow on large files. There is still room for improvement, but I can now load whole chromosomes/genomes. 
If for your file format, the individual records (repeating units) are over 10MB in size, then I would begin to worry about the performance using Martel. Otherwise it might be OK... In the process of this work I did eventually get a feel for how Martel works, and how to define file formats etc. Its rather a clever design but it is daunting for new comers. Also, when someone manages to find a file formatted sufficiently different to what a Martel parser expects, working out what exactly needs to be fixed is sometimes tricky. Over on the developers list we have had some talk about where to go in future, and at the moment I have been working on a SeqIO system a little like BioPerl's, http://bugzilla.open-bio.org/show_bug.cgi?id=2059 http://biopython.org/wiki/SeqIO This is a work in progress... I've been planning to actually check something into CVS in the near future. I would also need to lay down guidelines on how annotations are stored so that file format conversion is as smooth as possible. Chris mentioned PTT files (protein table files), available from the NCBI (and probably other databases too). I think PTT files had been mentioned on the dev list in the context of SeqIO (sequence input/output), and one suggestion was to load them as annotated SeqRecord objects with an empty Sequence. Depending on what people want to do with a PTT file, this may not suit everyone. 
Peter

From sdavis2 at mail.nih.gov Wed Oct 25 09:17:10 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 25 Oct 2006 09:17:10 -0400
Subject: [BioPython] [Biopython-dev] Martel-based parsing of Unigene flat files
In-Reply-To: <453F2E11.7030408@maubp.freeserve.co.uk>
References: <453975D5.4070701@mail.nih.gov> <128a885f0610241922h5db02fbfod1a83cfeade29801@mail.gmail.com> <453F2E11.7030408@maubp.freeserve.co.uk>
Message-ID: <200610250917.10160.sdavis2@mail.nih.gov>

On Wednesday 25 October 2006 05:27, Peter wrote:
> Sean,
>
> I did have a little look at Unigene, but wasn't sure which files exactly
> you wanted to parse. Like the NCBI, they seem to offer lots of
> different file formats.

I was thinking about files like Hs.data. These files are very simple file formats and can be parsed using simple regexes and if statements VERY quickly. I have written one in perl (because the bioperl one creates objects when none were needed in my case, and so was slow). I simply wanted to do the same in python, but wanted to "do it right".

> Chris Lasher wrote:
> > Hi Sean,
> >
> > FWIW this should probably have been posted to BioPython-dev, but I
> > don't think that would improve your chances of getting a response. I
> > am cross-posting it there, anyways. Unfortunately for you, I do not
> > have an answer for you. :-(
>
> The dev list would probably have been a better idea.

I will join and certainly use the dev list in the future for questions along these lines. It always takes a bit to get the culture of a new set of lists correct.

> I'm one of the more recent contributors - for example, I changed the
> GenBank parser from Martel to just Python. This was done on the pretext
> that the old parser (when it worked) was exceedingly slow on large
> files. There is still room for improvement, but I can now load whole
> chromosomes/genomes.

Good to know. I'll take a look at this code.
> If for your file format, the individual records (repeating units) are > over 10MB in size, then I would begin to worry about the performance > using Martel. Otherwise it might be OK... > > In the process of this work I did eventually get a feel for how Martel > works, and how to define file formats etc. Its rather a clever design > but it is daunting for new comers. > > Also, when someone manages to find a file formatted sufficiently > different to what a Martel parser expects, working out what exactly > needs to be fixed is sometimes tricky. > > Over on the developers list we have had some talk about where to go in > future, and at the moment I have been working on a SeqIO system a little > like BioPerl's, Just keep in mind that on the bioperl side, as annotations have gotten richer and file size has become a non-issue for storage, some of those parsers are not keeping up in terms of speed. SeqIO is fairing quite well, but the BLAST parser isn't, just as an example. There is a fine line between creating objects for everything and speedy parsing into raw data structures. In fact, having a couple of parsers (not fully deprecating a fast but trivial parser) is probably the best general way to go. In short, the parser/consumer model is relatively new to me and I think that is where I need to spend a bit of time learning the lay of the land. Thanks for the hints and pointers. I'll look a bit more at code and then try to ask more specific questions as they arise. Sean From pmmagic at gmail.com Thu Oct 26 11:19:19 2006 From: pmmagic at gmail.com (paul m) Date: Thu, 26 Oct 2006 11:19:19 -0400 Subject: [BioPython] Postdoctoral Position - Systems/Computational Biology Message-ID: <991e7bc10610260819h3b8834a5o4418807eed32051@mail.gmail.com> I hope this email is considered appropriate for the BioPython mailing list. Python is the language of choice for compuational work in my lab and I hope there may be some folks on the list who are in the process of finishing up Ph.D. 
work and looking towards the next step... Postdoctoral Position in Systems/Computational Biology A two year postdoctoral position is available in the Department of Biology at Duke University and the newly formed Duke Center for Systems Biology. We seek a highly motivated postdoctoral research associate who has a strong background in statistical and computational methods. The successful candidate will help to develop quantitative models of the regulatory networks underlying complex traits in yeast. The person who fills this position will also participate in a Howard Hughes Medical Institute funded initiative to develop quantitative laboratory materials for an undergraduate biology course. To apply for this position please send a cover letter, CV and the names and contact information for three references to: Dr. Paul Magwene, Department of Biology, Duke University, P.O. Box 91000, Durham, NC 27708. You may also email this information to paul.magwene at duke.edu. From mdehoon at c2b2.columbia.edu Mon Oct 30 17:08:50 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Mon, 30 Oct 2006 17:08:50 -0500 Subject: [BioPython] Unigene flat file parser Message-ID: <454677F2.1050309@c2b2.columbia.edu> Hi everybody, [If you're also on biopython-dev, you've already received this post. Sorry for the cross-post.] Sean Davis of NIH has written a parser for the Unigene flat file format described here: ftp://ftp.ncbi.nih.gov/repository/UniGene/README under the Hs.data section. A natural place to include this in Biopython would be under Bio/UniGene. However, there is some code already under Bio/Unigene, but I couldn't find documentation for it and it hasn't been updated in more than two years, so it may be some dead code sitting around. If so, we may as well remove this code and put Sean's code there. So just to make sure that this wouldn't harm somebody's work: Is anybody using the current Bio/UniGene code? Thanks, --Michiel. 
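
For readers who have not seen the format being discussed, a line-oriented reader for Hs.data-style records might look like the sketch below. This is not Sean's parser and not the existing Bio/UniGene code; it assumes each record is a run of "TAG value" lines terminated by a line starting with "//", and the function name and sample values are hypothetical.

```python
def parse_unigene_records(lines):
    """Yield one dict per record, mapping each tag to a list of raw values.

    Assumes 'TAG<whitespace>value' lines and '//' as the record terminator.
    Values are left unparsed; repeated tags accumulate in order.
    """
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("//"):
            if record:
                yield record
            record = {}
            continue
        parts = line.split(None, 1)
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        record.setdefault(tag, []).append(value)
    if record:
        # Tolerate a final record with no terminator.
        yield record


if __name__ == "__main__":
    # Made-up sample in the assumed layout, not real UniGene data.
    sample = [
        "ID          Hs.0000\n",
        "TITLE       Example cluster title\n",
        "SEQUENCE    ACC=XX000001.1\n",
        "SEQUENCE    ACC=XX000002.1\n",
        "//\n",
    ]
    for rec in parse_unigene_records(sample):
        print(rec["ID"][0], len(rec.get("SEQUENCE", [])))
```

Keeping the values as plain strings is deliberate: it mirrors the "fast but trivial" style of parser mentioned earlier in the thread, and richer objects can be layered on top where they are needed.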
--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From arareko at campus.iztacala.unam.mx Tue Oct 31 10:08:58 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 31 Oct 2006 09:08:58 -0600
Subject: [BioPython] FreeBSD port updated
Message-ID: <4547670A.1010009@campus.iztacala.unam.mx>

biopython-l,

This is to inform you that the FreeBSD port for BioPython has been updated from 1.41 to 1.42. Many thanks to Thomas Abthorpe who created the patch for this update.

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From gca500 at york.ac.uk Mon Oct 2 19:46:02 2006
From: gca500 at york.ac.uk (gca500 at york.ac.uk)
Date: 02 Oct 2006 20:46:02 +0100
Subject: [BioPython] Genbank parsing problem and fix
Message-ID:

Hi All,

Been having a problem using the Genbank RecordParser with some Genbank files that have recently been added to NCBI. After a bit of trial and error, I realised the problem only occurs if a REFERENCE field isn't followed by an AUTHOR field (for example in reference 2 of this record: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=88602864).

There's a very easy fix on line 289 of Genbank.py. Decided to post this to the list to save any one else who stumbles across this problem tearing their hair out like I've been doing this afternoon!

Change:

authors_block +

to:

Martel.Opt(authors_block) +

and it works!
Hope this is useful, Gemma _____________________________________________ Gemma Atkinson PhD student Professor Sandie Baldauf's lab Department of Biology University of York, UK YO10 5DD gca500 at york.ac.uk www.gemma-atkinson.co.uk From ewijaya at i2r.a-star.edu.sg Tue Oct 3 06:34:09 2006 From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward) Date: Tue, 03 Oct 2006 14:34:09 +0800 Subject: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> Dear experts, I have the following script which try to use Bio.SeqIO's FASTA method to read sequence and simply print the actual sequence. __BEGIN__ from Bio.SeqIO import FASTA import sys handle = open(sys.argv[1]) it = FASTA.FastaReader(handle) seq = it.next() while seq: print seq.seq seq = it.next() handle.close() __END__ But how come the output looks like this? Seq('AACTAACAGTTTCCCTTGTCTAAAGCCTGCTCCCGATAAAAATAAGGCTGTGGGTTCTGG ...', Alphabet()) Seq('CACCATCAGGGCGAGATTTAGCCGCTAGGTTTGTCTCATGGAAGAAAAGCAGTAGAAAAA ...', Alphabet()) Seq('ACTTCCCACGTACGTCTGCAGGAACTTGCCTGTACCACAGGAAGACGATCGTCATGAGAA ...', Alphabet()) Is there a way to get the actual plain ATCG sequence (i.e wihtout brackets,quotes,and Alphabet()). Sorry I'm new with Python. Please bear with me. Thanks and hope to hear from you again. Regards, Edward WIJAYA ------------ Institute For Infocomm Research - Disclaimer ------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you. 
-------------------------------------------------------- From ewijaya at i2r.a-star.edu.sg Tue Oct 3 08:13:29 2006 From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward) Date: Tue, 03 Oct 2006 16:13:29 +0800 Subject: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA References: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> <1159862405.45221885016f3@imp.rezolwenta.eu.org> Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061580@mailbe01.teak.local.net> Hi Bartek, Thanks for the reply. Where can I find info about tostring() ? I cannot seem to find in the Bio.SeqIO documentation itself? Regards, Edward WIJAYA ________________________________ From: bartek wilczynski [mailto:bartek at rezolwenta.eu.org] Sent: Tue 10/3/2006 4:00 PM To: Wijaya Edward Cc: biopython at lists.open-bio.org Subject: Re: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA Citing Wijaya Edward : seq.seq.tostring() will return a string instead of a sequence object seq.seq ------------ Institute For Infocomm Research - Disclaimer ------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you. -------------------------------------------------------- From bartek at rezolwenta.eu.org Tue Oct 3 08:00:05 2006 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 03 Oct 2006 10:00:05 +0200 Subject: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> References: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> Message-ID: <1159862405.45221885016f3@imp.rezolwenta.eu.org> Citing Wijaya Edward : > > Dear experts, > > I have the following script which try to > use Bio.SeqIO's FASTA method to read > sequence and simply print the actual sequence. 
> > __BEGIN__ > from Bio.SeqIO import FASTA > import sys > > handle = open(sys.argv[1]) > it = FASTA.FastaReader(handle) > seq = it.next() > while seq: > print seq.seq > seq = it.next() > handle.close() > __END__ > > > But how come the output looks like this? > > Seq('AACTAACAGTTTCCCTTGTCTAAAGCCTGCTCCCGATAAAAATAAGGCTGTGGGTTCTGG ...', > Alphabet()) > Seq('CACCATCAGGGCGAGATTTAGCCGCTAGGTTTGTCTCATGGAAGAAAAGCAGTAGAAAAA ...', > Alphabet()) > Seq('ACTTCCCACGTACGTCTGCAGGAACTTGCCTGTACCACAGGAAGACGATCGTCATGAGAA ...', > Alphabet()) > > Is there a way to get the actual plain ATCG sequence (i.e wihtout > brackets,quotes,and Alphabet()). > Sorry I'm new with Python. Please bear with me. > Hi, seq.seq.tostring() will return a string instead of a sequence object seq.seq regards Bartek Wilczynski From bartek at rezolwenta.eu.org Tue Oct 3 08:23:42 2006 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 03 Oct 2006 10:23:42 +0200 Subject: [BioPython] How to access the actual sequence from Bio.SeqIO.FASTA In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061580@mailbe01.teak.local.net> References: <3ACF03E372996C4EACD542EA8A05E66A06157F@mailbe01.teak.local.net> <1159862405.45221885016f3@imp.rezolwenta.eu.org> <3ACF03E372996C4EACD542EA8A05E66A061580@mailbe01.teak.local.net> Message-ID: <1159863822.45221e0ebeb39@imp.rezolwenta.eu.org> Citing Wijaya Edward : > > Hi Bartek, > > Thanks for the reply. > Where can I find info about tostring() ? > I cannot seem to find in the Bio.SeqIO documentation itself? your seq.seq object is an instance of Bio.Seq.Seq class. 
The tostring method is mentioned in the tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 regards Bartek From biopython at maubp.freeserve.co.uk Tue Oct 3 09:54:29 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 03 Oct 2006 10:54:29 +0100 Subject: [BioPython] Genbank parsing problem and fix In-Reply-To: References: Message-ID: <45223355.6010304@maubp.freeserve.co.uk> gca500 at york.ac.uk wrote: > Hi All, > > Been having a problem using the Genbank RecordParser with some Genbank > files that have recently been added to NCBI. After a bit of trial and > error, I realised the problem only occurs if a REFERENCE field isn't > followed by an AUTHOR field (for example in reference 2 of this record: > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=88602864). > > There's a very easy fix on line 289 of Genbank.py. Decided to post this to > the list to save any one else who stumbles across this problem tearing > their hair out like I've been doing this afternoon! > > Change ... and it works! > > Hope this is useful, > > Gemma Hi Gemma, I have made your suggested change to biopython/Bio/formatdefs/genbank.py as CVS revision 1.10, which should be viewable online soon: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/expressions/genbank.py?cvsroot=biopython I am curious as to why you are using this code (part of the FormatIO system), rather than the Bio.GenBank module. Thank you, Peter From gca500 at york.ac.uk Tue Oct 3 11:36:58 2006 From: gca500 at york.ac.uk (Gemma Atkinson) Date: Tue, 3 Oct 2006 12:36:58 +0100 Subject: [BioPython] Genbank parsing problem and fix In-Reply-To: <45223355.6010304@maubp.freeserve.co.uk> References: <45223355.6010304@maubp.freeserve.co.uk> Message-ID: Hi Peter, I was using the Bio.Genbank module. 
This is the code I've been using: from Bio import GenBank parser = GenBank.RecordParser(debug_level=2) record = parser.parse(open("test4.txt")) It was the expressions/genbank.py file, imported from within the Genbank module that I've been changing. I haven't touched the formatdefs/genbank.py file (should have made that clear before - sorry). This was the error I was getting before I changed expressions/ genbank.py: File "testgbparser.py", line 3, in ? record = parser.parse(open("test4.txt")) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/Bio/GenBank/__init__.py", line 240, in parse self._scanner.feed(handle, self._consumer) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/Bio/GenBank/__init__.py", line 1259, in feed self._parser.parseFile(handle) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/Martel/Parser.py", line 328, in parseFile self.parseString(fileobj.read()) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/ python2.4/Martel/Parser.py", line 356, in parseString self._err_handler.fatalError(result) File "/Library/Frameworks/Python.framework/Versions/2.4//lib/ python2.4/xml/sax/handler.py", line 38, in fatalError raise exception Martel.Parser.ParserPositionException: error parsing at or beyond character 1153 Gemma On 3 Oct 2006, at 10:54, Peter wrote: > gca500 at york.ac.uk wrote: >> Hi All, >> Been having a problem using the Genbank RecordParser with some >> Genbank files that have recently been added to NCBI. After a bit >> of trial and error, I realised the problem only occurs if a >> REFERENCE field isn't followed by an AUTHOR field (for example in >> reference 2 of this record: http://www.ncbi.nlm.nih.gov/entrez/ >> viewer.fcgi?db=protein&val=88602864). >> There's a very easy fix on line 289 of Genbank.py. Decided to post >> this to the list to save any one else who stumbles across this >> problem tearing their hair out like I've been doing this afternoon! >> Change ... 
and it works! >> Hope this is useful, >> Gemma > > Hi Gemma, > > I have made your suggested change to biopython/Bio/formatdefs/ > genbank.py as CVS revision 1.10, which should be viewable online soon: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/ > expressions/genbank.py?cvsroot=biopython > > I am curious as to why you are using this code (part of the > FormatIO system), rather than the Bio.GenBank module. > > Thank you, > > Peter > From biopython at maubp.freeserve.co.uk Tue Oct 3 13:33:48 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 03 Oct 2006 14:33:48 +0100 Subject: [BioPython] Genbank parsing problem and fix In-Reply-To: References: <45223355.6010304@maubp.freeserve.co.uk> Message-ID: <452266BC.9060809@maubp.freeserve.co.uk> >> Hi Gemma, >> >> I have made your suggested change to biopython/Bio/formatdefs/ >> genbank.py as CVS revision 1.10, which should be viewable online soon: >> >> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/ >> expressions/genbank.py?cvsroot=biopython I got the URL right, but I mean to say Bio/expressions/genbank.py (which actually has the Martel definition in it) not Bio/formatdefs/genbank.py Peter wrote: >> I am curious as to why you are using this code ... Gemma replied: > I was using the Bio.Genbank module. This is the code I've been using: > > from Bio import GenBank > parser = GenBank.RecordParser(debug_level=2) > record = parser.parse(open("test4.txt")) I would guess you are using BioPython 1.41 (or older) then, as your stack trace was indeed using Martel internally. Recent versions of BioPython (1.42 and later) use a pure python parser in Bio.GenBank as the old Martel code didn't scale well with large input files (to the point of being almost useless on large genomes). If you do update your installation, and run into any problems with the GenBank parser, please do let us know. 
Peter

From ewijaya at i2r.a-star.edu.sg Tue Oct 3 14:16:27 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 03 Oct 2006 22:16:27 +0800
Subject: [BioPython] BioPython for TRANSFAC
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061584@mailbe01.teak.local.net>

Hi there,

Is there a method in BioPython that allows me to pass the query "fruitfly" or "drosophila" and have it return:

1. already characterized TFs and their binding sites (BS),
2. their respective coregulated genes, and
3. the location/position of the TFBS in the genes,

all from the TRANSFAC database?

--
Regards,
Edward WIJAYA

From m.stantoncook at gmail.com Wed Oct 4 13:38:03 2006
From: m.stantoncook at gmail.com (Mitchell Stanton-Cook)
Date: Wed, 4 Oct 2006 23:38:03 +1000
Subject: [BioPython] Creating fusion protein like constructs with BioPython
Message-ID:

Hello all.

I am trying to create a fusion protein-like model from two separate pdb files. I introduce a CYS mutant in the target protein, and then wish to form a disulphide bond between it and a small peptide.

This is purely computational work.

I am using Bio.PDB. As the two structures are in arbitrary frames of reference I need to rotate and translate to form the "construct". I wish to have TargetProtein-CB-SY-SY-CB-SmallPeptide (the peptide is not really added to the N/C terminus).

I have tried many different approaches but have failed miserably to get SmallPeptide rotated relative to TargetProtein at the correct dihedral angle (+/-90deg) and bond lengths.
My current approach is (omitting the correct bond length at this time):

TP-CB-SY  SY-CB-SP
   1  2   3  4

Translate 2 onto 3
Calculate the angle between 1-(23)-4
Calculate the cross product of 1-23 x 23-4
Generate the rotation matrix given the angle and vector
Rotate all SP (SmallPeptide) atoms by this rotation matrix.

This has not worked. I have had some other ideas and have written code for them. Ideally, I wish to calculate the rotations about X,Y,Z to place the SP at the correct dihedral angle followed by translation, but I have no idea how to do this.

1) Can I use Bio.PDB to do this above task or do I need to look at something else?
2) Does anyone have any ideas on how to complete this goal?

Thanking you for your time.

Mitch

From biopython at maubp.freeserve.co.uk Thu Oct 5 09:47:30 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 05 Oct 2006 10:47:30 +0100
Subject: [BioPython] Creating fusion protein like constructs with BioPython
In-Reply-To:
References:
Message-ID: <4524D4B2.8030600@maubp.freeserve.co.uk>

Mitchell Stanton-Cook wrote:
> Hello all.
>
> I am trying to create fusion protein-like model from two separate pdb files.
> I introduce a CYS mutant in the target protein, and then wish to form a
> disulphide bound between it and a small peptide.
>
> This is pure computational work.
>
> ...
>
> 1) Can I use Bio.PDB to do this above task or do I need to look at something
> else?

My gut instinct is that yes, you probably can - but you will have to do a lot of the work with your own code. It's not something I have ever tried though.

> 2) Does anyone have any ideas on how to complete this goal?

You might want to have a look at MMTK, which on the face of it would be better suited. Assuming MMTK will read both PDB files you might have better luck - this proviso is because I have found MMTK will choke on "odd" PDB files, and its support for non-standard residues could be better.
http://starship.python.net/crew/hinsen/MMTK/index.html Peter From thamelry at binf.ku.dk Thu Oct 5 09:52:56 2006 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Thu, 5 Oct 2006 11:52:56 +0200 Subject: [BioPython] Creating fusion protein like constructs with BioPython In-Reply-To: References: Message-ID: <2d7c25310610050252j2f889242h84411e0927fb4502@mail.gmail.com> Hi, > I am trying to create fusion protein-like model from two separate pdb files. > I introduce a CYS mutant in the target protein, and then wish to form a > disulphide bound between it and a small peptide. ... > 1) Can I use Bio.PDB to do this above task or do I need to look at something > else? Bio.PDB has functionality to do vector/rotation calculations. Take a look at the Vector.py module. Best, ---- Thomas Hamelryck, Post-doctoral researcher Bioinformatics center Institute of Molecular Biology and Physiology University of Copenhagen Universitetsparken 15 - Bygning 10 DK-2100 Copenhagen ? Denmark Homepage: http://www.binf.ku.dk/Protein_structure From gebauer-jung at ice.mpg.de Thu Oct 5 10:30:36 2006 From: gebauer-jung at ice.mpg.de (Steffi Gebauer-Jung) Date: Thu, 05 Oct 2006 12:30:36 +0200 Subject: [BioPython] Problem parsing Blast XML output from different sources Message-ID: <4524DECC.3030307@ice.mpg.de> Hello, because of blastall 2.2.14 output was not parsed from the Bio.Blast.NCBIStandalone parser, I tried to switch to the recommended Bio.Blast.NCBIXML parser. Thereby I found, that the xml output of the locally installed standalone blastall (2.2.14) differs from the web xml output. For BlastN hsps on Plus/Minus strands, the xml gives query_frame/hit_frame 1 / -1 as usual. But query and frame positions and sequences are switched in direction (would match frames -1/1). As the Bio.Blast.Record returned by the NCBIXML parser only gives frames, sequences and start positions it is not possible (without knowing the source of the xml file) to be sure to find the right data. 
This is clearly a problem of Blast. But because of the missing end positions in the returned record object it becomes a problem for users of the parser too. Could somebody try to confirm the different behaviour of the xml blast output with his/her own examples/installation? Thanks, Steffi From mdehoon at c2b2.columbia.edu Thu Oct 5 16:01:04 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 05 Oct 2006 12:01:04 -0400 Subject: [BioPython] Problem parsing Blast XML output from different sources In-Reply-To: <4524DECC.3030307@ice.mpg.de> References: <4524DECC.3030307@ice.mpg.de> Message-ID: <45252C40.8040806@c2b2.columbia.edu> Which sequence are you running blast on? I'd like to try this on our local blast installation. --Michiel. Steffi Gebauer-Jung wrote: > Hello, > > because of blastall 2.2.14 output was not parsed from the > Bio.Blast.NCBIStandalone parser, > I tried to switch to the recommended Bio.Blast.NCBIXML parser. > > Thereby I found, that the xml output of the locally installed standalone > blastall (2.2.14) > differs from the web xml output. > > For BlastN hsps on Plus/Minus strands, the xml gives > query_frame/hit_frame 1 / -1 as usual. > But query and frame positions and sequences are switched in direction > (would match frames -1/1). > > As the Bio.Blast.Record returned by the NCBIXML parser only gives > frames, sequences > and start positions it is not possible (without knowing the source of > the xml file) > to be sure to find the right data. > > This is clearly a problem of Blast. > But because of the missing end positions in the returned record object > it becomes a problem for users of the parser too. > > Could somebody try to confirm the different behaviour of the xml blast > output > with his/her own examples/installation? 
> > Thanks, Steffi > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032
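
A minimal sketch of the check this thread is circling around: deciding the orientation of an HSP from its from/to coordinates rather than from the frame fields. It reads the XML directly with the Python standard library, so it does not depend on any Biopython version. The element names are the standard Hsp_* fields of NCBI BLAST XML; the sample fragment is hand-written for illustration, and mapping the coordinates quoted elsewhere in this thread (789, 727, 8270, 8332) onto those fields is an assumption based on the usual field order. The helper name hsp_orientation is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the shape of NCBI BLAST XML output (illustrative).
# Frames say query +1 / hit -1, but the coordinates run the other way round.
SAMPLE_HSP = """
<Hsp>
  <Hsp_num>1</Hsp_num>
  <Hsp_query-from>789</Hsp_query-from>
  <Hsp_query-to>727</Hsp_query-to>
  <Hsp_hit-from>8270</Hsp_hit-from>
  <Hsp_hit-to>8332</Hsp_hit-to>
  <Hsp_query-frame>1</Hsp_query-frame>
  <Hsp_hit-frame>-1</Hsp_hit-frame>
</Hsp>
"""


def hsp_orientation(hsp):
    """Return (query_strand, hit_strand) as +1 or -1, using only from/to."""
    def strand(start_tag, end_tag):
        start = int(hsp.findtext(start_tag))
        end = int(hsp.findtext(end_tag))
        # Ascending coordinates mean the sequence is shown in forward direction.
        return 1 if start <= end else -1

    return (strand("Hsp_query-from", "Hsp_query-to"),
            strand("Hsp_hit-from", "Hsp_hit-to"))


hsp = ET.fromstring(SAMPLE_HSP)
print(hsp_orientation(hsp))
```

For the sample fragment the coordinates give a reverse query and a forward hit, whatever the frame fields claim, which is why the end positions are needed in addition to the start positions.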
From mdehoon at c2b2.columbia.edu Sun Oct 8 04:51:09 2006 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 08 Oct 2006 00:51:09 -0400 Subject: [BioPython] Problem parsing Blast XML output from different sources In-Reply-To: <45261086.4070708@ice.mpg.de> References: <4524DECC.3030307@ice.mpg.de> <45252C40.8040806@c2b2.columbia.edu> <45261086.4070708@ice.mpg.de> Message-ID: <452883BD.7050907@c2b2.columbia.edu> Hi Steffi, I am trying to replicate this problem with Blast. Where did you get the pat database? I searched for it with google, but there seems to be more than one blast database called pat. --Michiel. Steffi Gebauer-Jung wrote: > Hello, > > I don't know what local databases you have available for testing. > The discrepancy between xml and 'pairwise text' output should be seen > for every Plus/Minus Hsp created by local Blastn (local server or > standalone blastall from command line, I use version 2.2.14) > > I tried several combinations, one is M38240 vs. pat database, > the hsp hit was BD298385. 
> Here are the interesting output snippets: > >> dbj|BD298385.1| >> >> CLEAN SYNTHETIC VECTORS, PLASMIDS, TRANSGENIC PLANTS AND PLANT PARTS > CONTAINING THEM, AND METHODS FOR OBTAINING THEM > Length = 14108 > > Score = 125 bits (63), Expect = 1e-25 > Identities = 63/63 (100%) > Strand = Plus / Minus > > > Query: 727 aatgaagactaatctttttctctttctcatcttttcacttctcctatcattatcctcggc > 786 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 8332 aatgaagactaatctttttctctttctcatcttttcacttctcctatcattatcctcggc > 8273 > > Query: 787 cga 789 > ||| > Sbjct: 8272 cga 8270 > > ===================================================== > > 15 > gi|92136243|dbj|BD298385.1| > CLEAN SYNTHETIC VECTORS, PLASMIDS, TRANSGENIC PLANTS > AND PLANT PARTS CONTAINING THEM, AND METHODS FOR OBTAINING THEM > BD298385 > 14108 > > > 1 > 125.381 > 63 > 9.63859e-26 > 789 > 727 > 8270 > 8332 > 1 > -1 > 63 > 63 > 63 > > TCGGCCGAGGATAATGATAGGAGAAGTGAAAAGATGAGAAAGAGAAAAAGATTAGTCTTCATT > > > TCGGCCGAGGATAATGATAGGAGAAGTGAAAAGATGAGAAAGAGAAAAAGATTAGTCTTCATT > > > ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > > > > > Thanks, Steffi > > > > > > > Michiel Jan Laurens de Hoon wrote: > >> Which sequence are you running blast on? >> I'd like to try this on our local blast installation. >> >> --Michiel. >> >> Steffi Gebauer-Jung wrote: >> >>> Hello, >>> >>> because of blastall 2.2.14 output was not parsed from the >>> Bio.Blast.NCBIStandalone parser, >>> I tried to switch to the recommended Bio.Blast.NCBIXML parser. >>> >>> Thereby I found, that the xml output of the locally installed >>> standalone blastall (2.2.14) >>> differs from the web xml output. >>> >>> For BlastN hsps on Plus/Minus strands, the xml gives >>> query_frame/hit_frame 1 / -1 as usual. >>> But query and frame positions and sequences are switched in direction >>> (would match frames -1/1). 
>>> >>> As the Bio.Blast.Record returned by the NCBIXML parser only gives >>> frames, sequences >>> and start positions it is not possible (without knowing the source of >>> the xml file) >>> to be sure to find the right data. >>> >>> This is clearly a problem of Blast. >>> But because of the missing end positions in the returned record object >>> it becomes a problem for users of the parser too. >>> >>> Could somebody try to confirm the different behaviour of the xml >>> blast output >>> with his/her own examples/installation? >>> >>> Thanks, Steffi >>> >>> >>> >>> _______________________________________________ >>> BioPython mailing list - BioPython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython >> >> >> > From luca.beltrame at unimi.it Tue Oct 10 12:01:59 2006 From: luca.beltrame at unimi.it (Luca Beltrame) Date: Tue, 10 Oct 2006 14:01:59 +0200 Subject: [BioPython] Querying Entrez Gene Message-ID: <200610101401.59622.luca.beltrame@unimi.it> Hello. I'm currently in need of querying the Entrez Gene database using a list of IDs I have. After searching in the Biopython documentation, I have found no indication of whether that is possible or not. Is there a way to query NCBI's Entrez Gene database? Thanks in advance. From cjfields at uiuc.edu Tue Oct 10 12:46:53 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Oct 2006 07:46:53 -0500 Subject: [BioPython] Querying Entrez Gene In-Reply-To: <200610101401.59622.luca.beltrame@unimi.it> References: <200610101401.59622.luca.beltrame@unimi.it> Message-ID: <4E706F3E-C29C-43F1-936D-A2670E1D5A0C@uiuc.edu> There is a BioPerl way (Bio::DB::EntrezGene); not sure about BioPython. Chris On Oct 10, 2006, at 7:01 AM, Luca Beltrame wrote: > Hello. > I'm currently in need of querying the Entrez Gene database using a > list of IDs > I have. After searching in the Biopython documentation, I have > found no > indication of whether that is possible or not. 
> Is there a way to query NCBI's Entrez Gene database?
> Thanks in advance.
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From winter at biotec.tu-dresden.de Tue Oct 10 12:50:46 2006
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Tue, 10 Oct 2006 14:50:46 +0200
Subject: [BioPython] Querying Entrez Gene
In-Reply-To: <200610101401.59622.luca.beltrame@unimi.it>
References: <200610101401.59622.luca.beltrame@unimi.it>
Message-ID: <452B9726.1060101@biotec.tu-dresden.de>

Dear Luca,

you probably need this:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/

The code below is Python, not Biopython, but should work. You further need to parse the resulting XML.

There is also an EUtils package as part of Biopython, but I never tried it:
http://biopython.org/DIST/docs/api/public/Bio.EUtils-module.html

Cheers,
Christof

# Python eutils example
import urllib2

eutilsURL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esummary(db, ids):
    idlist = ",".join(ids)
    url = eutilsURL + "esummary.fcgi?db=%(db)s&id=%(idlist)s&retmode=xml"
    req = urllib2.Request(url % vars())
    handle = urllib2.urlopen(req)
    return handle.read()

print esummary("gene", ["3487"])

Luca Beltrame wrote:
> Hello.
> I'm currently in need of querying the Entrez Gene database using a list of IDs
> I have. After searching in the Biopython documentation, I have found no
> indication of whether that is possible or not.
> _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From aloraine at gmail.com Tue Oct 10 12:16:56 2006 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 10 Oct 2006 07:16:56 -0500 Subject: [BioPython] Querying Entrez Gene In-Reply-To: <200610101401.59622.luca.beltrame@unimi.it> References: <200610101401.59622.luca.beltrame@unimi.it> Message-ID: <83722dde0610100516o532c0b5eg329b01cda11eb156@mail.gmail.com> Dear Luca, Whenever I need data from Entrez gene (usually mRNA-to-Gene id mappings, in my case) I download one of the tab-delimited from the NCBI "Gene" ftp site: e.g., gene2go, gene2accession. see: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA -Ann On 10/10/06, Luca Beltrame wrote: > Hello. > I'm currently in need of querying the Entrez Gene database using a list of IDs > I have. After searching in the Biopython documentation, I have found no > indication of whether that is possible or not. > Is there a way to query NCBI's Entrez Gene database? > Thanks in advance. > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From palle at birc.au.dk Thu Oct 12 06:59:26 2006 From: palle at birc.au.dk (Palle Villesen) Date: Thu, 12 Oct 2006 08:59:26 +0200 Subject: [BioPython] Querying Entrez Gene In-Reply-To: <200610101401.59622.luca.beltrame@unimi.it> References: <200610101401.59622.luca.beltrame@unimi.it> Message-ID: <452DE7CE.1090008@birc.au.dk> Luca Beltrame wrote: > Hello. > I'm currently in need of querying the Entrez Gene database using a list of IDs > I have. After searching in the Biopython documentation, I have found no > indication of whether that is possible or not. 
> Is there a way to query NCBI's Entrez Gene database? > Thanks in advance. > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > EUtils are also a part of BioPython. You should look in the biopython tutorial for how to use it. Below is my own small "mass downloader" utility in python. (Running on a non-administrator install of both python and biopython). The basic function/module you need is the HistoryClient that can search and retrieve large sets - instead of looping through all your ids one at a time. Anyway - check the tutorial, it's quite good (at least for a person with the same very basic python knowledge as me). sincerely, Palle Villesen, BiRC, DK Program: gb_search.py ------------------- #!/web/biopv/usr/local/bin/python import sys import time biopython_path='/web/biopv/usr/local/lib/python' sys.path.insert(0,biopython_path) def help(): from Bio.EUtils import Config dbs=" ".join(Config.databases.keys()) help= """ GenBank retrieve tool. 
Usage: gb_search.py QUERY [RECS] [DB] [FORMAT]

QUERY  : the entrez query enclosed in " "
RECS   : Number of records/sequences to get at a time (default=20)
DB     : Database, (default='nucleotide') (%s)
Format : Record format (default='fasta', but 'docsum', 'brief', 'gi' - and many others are available)
""" % dbs
    sys.exit(help)
    return 0

# Default values
step=20
database="nucleotide"
format="fasta"
time2sleep=3

if len(sys.argv) ==1: help()
search_term=sys.argv[1]
if len(sys.argv)>2 : step=int(sys.argv[2])
if len(sys.argv)>3 : database=sys.argv[3]
if len(sys.argv)>4 : format=sys.argv[4]
if len(sys.argv)>5 : time2sleep=int(sys.argv[5])

from Bio.EUtils import HistoryClient

s = HistoryClient.HistoryClient().search(search_term,db=database)
print >>sys.stderr, "Getting %s seqs, %s sequences at a time" % (len(s),step)

i=0
# Loop condition reconstructed: the archive lost the text between "<" and ">".
while i<len(s):
    print >>sys.stderr, "Getting sequences from ",i,"to",min(i+step,len(s)),
    print s[i:i+step].efetch(retmode = "text", rettype = format).read()
    if i+step > len(s):
        print >>sys.stderr, "..done"
        break
    print >>sys.stderr, "...done (sleeping %s seconds)" % time2sleep
    i+=step
    time.sleep(time2sleep)
-------------------------------------

--
Palle Villesen, Ph.D.
BiRC, Build.
090, University of Aarhus
DK - 8000 Aarhus C, Denmark
palle.retrosearch.dk - +45 61708600
---------------------------------------------------------------------

From gebauer-jung at ice.mpg.de Fri Oct 13 13:23:35 2006
From: gebauer-jung at ice.mpg.de (Steffi Gebauer-Jung)
Date: Fri, 13 Oct 2006 15:23:35 +0200
Subject: [BioPython] Problem parsing Blast XML output from different sources
In-Reply-To: <452ABE8B.2050200@c2b2.columbia.edu>
References: <4524DECC.3030307@ice.mpg.de> <45252C40.8040806@c2b2.columbia.edu> <45261086.4070708@ice.mpg.de> <452883BD.7050907@c2b2.columbia.edu> <452A3CB5.5080308@ice.mpg.de> <452ABE8B.2050200@c2b2.columbia.edu>
Message-ID: <452F9357.8020101@ice.mpg.de>

Hello Michiel,

the fix works fine. Thanks for the fast reply and fixing!

Maybe there should be a hint for other users not to rely on the frame information of the blast xml output, to test the start/end positions of the hsp sequences instead, and to be aware of reversed query sequences.

For my needs I have to have the query sequence in forward direction. That's why I try to reverse-complement the complete alignment if this isn't the case yet. In doing so I found that Bio.Seq.Seq.complement() cannot handle unicode sequences, even though Bio.Seq.Seq can be initialized with unicode strings:

>>> import Bio.Seq
>>> s = Bio.Seq.Seq(u'acgt')
>>> s
Seq(u'acgt', Alphabet())
>>> s.complement()
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python2.5/site-packages/Bio/Seq.py", line 101, in complement
    s = self.data.translate(ttable)
TypeError: character mapping must return integer, None or unicode

And just another idea: in order to (reverse-)complement aligned sequences it would be useful to have the gap sign '-' in the alphabets.

Steffi

Michiel Jan Laurens de Hoon wrote:
> Hi Steffi,
>
> I had the same result when running Blast locally.
>
> I added hsp.query_end and hsp.sbjct_end to the Blast XML parser, so
> you can get around this problem.
Could you try the fixed Blast parser? > You'll need to pick up Bio/Blast/NCBIXML.py and Bio/Blast/Record.py from > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython > > > If it works fine (or if it doesn't), please send a message to the > Biopython mailing list (instead of my email address), so that this > gets into the mailing list archives. > > --Michiel. > > > Steffi Gebauer-Jung wrote: > >> Hello, >> >> the db was downloaded from ftp://ftp.ncbi.nih.gov//blast/db/patnt.tar.gz >> >> In fact the special query sequence and db shouldn't matter. >> >> If you have any 'Plus / Minus' HSP in a pairwise BlastN output >> you can run BlastN again in order to get the xml formatted output. >> >> Comparing the special HSP in both formats you should see the effect. > > > From dtoomey at rcsi.ie Fri Oct 13 13:58:41 2006 From: dtoomey at rcsi.ie (David Toomey) Date: Fri, 13 Oct 2006 14:58:41 +0100 Subject: [BioPython] NCBIStandalone.iterator hangs Message-ID: Hi I have been writing some scripts which make use of the NCBIStandalone module and I have found that when I iterate the results from some local blast runs my script will hang. I have attached 2 examples. I have been running them against two fasta files from drugbank http://redpoll.pharmacy.ualberta.ca/drugbank/download.htm I have tried them against two of the downloads from this site, the redundant and non-redundant "drug target protein sequences" The first query sequence (problem.txt) only hangs when run against the non redundant, the other sequence hangs when run against either. I have tracked down the line that hangs to "line = self._handle.readline(*args,**keywds)" in the readline method of Bio.File.UndoHandle I have no idea why this would be happening so any help would be appreciated. I am using Biopython 1.42, Python 2.4.3 and blast 2.2.13 Cheers, Dave -------------- next part -------------- A non-text attachment was scrubbed... 
Name: problem_fasta_sequences.zip Type: application/x-zip-compressed Size: 1000 bytes Desc: problem_fasta_sequences.zip URL: From biopython at maubp.freeserve.co.uk Fri Oct 13 15:22:40 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython List)) Date: Fri, 13 Oct 2006 16:22:40 +0100 Subject: [BioPython] NCBIStandalone.iterator hangs In-Reply-To: References: Message-ID: <452FAF40.2030202@maubp.freeserve.co.uk> Hi David David Toomey wrote: > I have been writing some scripts which make use of the NCBIStandalone module > and I have found that when I iterate the results from some local blast runs > my script will hang. When you say the first query sequence (problem.txt) "hangs" do you get a python stack trace? What is the script you are using? I'm going to guess that you are using python to invoke the NCBI standalone blast program. Have you tried running standalone blast "by hand" at the command line, and had a look at the output? Are you getting plain text output or XML from the blast program? > I am using Biopython 1.42, Python 2.4.3 and blast 2.2.13 On Windows, Linux or Mac OS X? Peter From aloraine at gmail.com Mon Oct 16 02:21:40 2006 From: aloraine at gmail.com (Ann Loraine) Date: Sun, 15 Oct 2006 21:21:40 -0500 Subject: [BioPython] too lazy to install InterProScan on my computer Message-ID: <83722dde0610151921n427d438aj94a3613c8871cdaf@mail.gmail.com> Greetings all, I've been trying to get EBI's InterProScan Web service to work with ZSI, which I understand is the preferred SOAP messaging implementation for python. I haven't had much luck so far .. I've get an error that look like an incompatibility in the WSDL of EBI (http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl) and what ZSI expects (see below.) Has anyone on the list used EBI's InterProScan from python using ZSI? If yes, did it work for you? Any sample code would be much appreciated, as I am new to SOAP. 
(EBI provides some example perl and java client code, but I'd like to take advantage of python's interpreter, which in theory would allow me to submit a bunch of requests to EBI using the asynchronous option, collect the job ids in a dictionary, and then fetch the data later on in the same interactive session.) -Ann import ZSI ebi_wsdl = 'http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl' >>> service = ZSI.ServiceProxy(ebi_wsdl) Traceback (most recent call last): File "", line 1, in ? File "/usr/local/lib/python2.4/site-packages/ZSI/ServiceProxy.py", line 34, in __init__ wsdl = ZSI.wstools.WSDLTools.WSDLReader().loadFromURL(wsdl) File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/WSDLTools.py", line 42, in loadFromURL wsdl.load(document) File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/WSDLTools.py", line 260, in load schema = reader.loadFromNode(WSDLToolsAdapter(self), item) File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 80, in loadFromNode schema.load(reader) File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 1116, in load tp.fromDom(node) File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 2283, in fromDom self.content.fromDom(contents[indx]) File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 1996, in fromDom content[-1].fromDom(i) File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 1764, in fromDom self.setAttributes(node) File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 627, in setAttributes self.__checkAttributes() File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", line 673, in __checkAttributes raise SchemaError,\ ZSI.wstools.XMLSchema.SchemaError: class instance ZSI.wstools.XMLSchema.LocalElementDeclaration, missing required attribute name -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham 
http://www.ssg.uab.edu http://www.transvar.org From idoerg at burnham.org Mon Oct 16 04:25:08 2006 From: idoerg at burnham.org (Iddo Friedberg) Date: Sun, 15 Oct 2006 21:25:08 -0700 Subject: [BioPython] too lazy to install InterProScan on my computer In-Reply-To: <83722dde0610151921n427d438aj94a3613c8871cdaf@mail.gmail.com> References: <83722dde0610151921n427d438aj94a3613c8871cdaf@mail.gmail.com> Message-ID: <453309A4.60802@burnham.org> Ann, Just a general comment from one who has recently installed InterProScan on his workstation for the first time: 1) Installation is truly a breeze. If you want to mass use interproscan, you should do it on your machine. You'll be happier down the road. So will EBI: I believe there is a reason they moved from 6 sequences at a time to 1 at a time in their web interface. 2) Having said that, I have a Python client that queries InterProScan and parses the results. I can give you the code if you like. Not ZSI, just plain hacking. Cheers, Iddo Ann Loraine wrote: > Greetings all, > > I've been trying to get EBI's InterProScan Web service to work with > ZSI, which I understand is the preferred SOAP messaging > implementation for python. I haven't had much luck so far .. I've get > an error that look like an incompatibility in the WSDL of EBI > (http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl) and > what ZSI expects (see below.) > > Has anyone on the list used EBI's InterProScan from python using ZSI? > If yes, did it work for you? > > Any sample code would be much appreciated, as I am new to SOAP. (EBI > provides some example perl and java client code, but I'd like to take > advantage of python's interpreter, which in theory would allow me to > submit a bunch of requests to EBI using the asynchronous option, > collect the job ids in a dictionary, and then fetch the data later on > in the same interactive session.) 
> > -Ann > > import ZSI > ebi_wsdl = 'http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl' >>>> service = ZSI.ServiceProxy(ebi_wsdl) > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/local/lib/python2.4/site-packages/ZSI/ServiceProxy.py", > line 34, in __init__ > wsdl = ZSI.wstools.WSDLTools.WSDLReader().loadFromURL(wsdl) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/WSDLTools.py", > line 42, in loadFromURL > wsdl.load(document) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/WSDLTools.py", > line 260, in load > schema = reader.loadFromNode(WSDLToolsAdapter(self), item) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 80, in loadFromNode > schema.load(reader) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 1116, in load > tp.fromDom(node) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 2283, in fromDom > self.content.fromDom(contents[indx]) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 1996, in fromDom > content[-1].fromDom(i) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 1764, in fromDom > self.setAttributes(node) > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 627, in setAttributes > self.__checkAttributes() > File "/usr/local/lib/python2.4/site-packages/ZSI/wstools/XMLSchema.py", > line 673, in __checkAttributes > raise SchemaError,\ > ZSI.wstools.XMLSchema.SchemaError: class instance > ZSI.wstools.XMLSchema.LocalElementDeclaration, missing required > attribute name > > > > -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037, USA T: +1 858 646 3100 x3516 http://iddo-friedberg.org http://BioFunctionPrediction.org From dtoomey at rcsi.ie Mon Oct 16 08:23:31 2006 From: dtoomey at rcsi.ie (David Toomey) Date: Mon, 16 Oct 2006 09:23:31 +0100 Subject: [BioPython] NCBIStandalone.iterator hangs Message-ID: Hi Peter It is part of a large script that I have written but I have replicated the problem with the following simple script blast_out, error_info = NCBIStandalone.blastall("C:/Program Files/Blast/bin" , "blastp", "C:/blast_test/new_prot_target_for_download.txt", "C:/blast_test/problem.txt") b_parser = NCBIStandalone.BlastParser() b_iterator = NCBIStandalone.Iterator(blast_out, b_parser) record = b_iterator.next() print record.query I don't get any stack trace. I have tried it from the windows command line, Komodo and also on a linux box (although the linux box is Python 2.3.3 rather than 2.4) When I say it hangs I mean it just dosn't return after the b_iterator.next() statement. There is no error message or stack trace. I have run the blast manually from the windows command line and it works fine. The above script works fine with other fasta query sequences against the same database. Cheers, Dave -----Original Message----- From: Peter (BioPython List) [mailto:biopython at maubp.freeserve.co.uk] Sent: 13 October 2006 16:23 To: David Toomey Cc: biopython at biopython.org Subject: Re: [BioPython] NCBIStandalone.iterator hangs Hi David David Toomey wrote: > I have been writing some scripts which make use of the NCBIStandalone module > and I have found that when I iterate the results from some local blast runs > my script will hang. When you say the first query sequence (problem.txt) "hangs" do you get a python stack trace? What is the script you are using? I'm going to guess that you are using python to invoke the NCBI standalone blast program. Have you tried running standalone blast "by hand" at the command line, and had a look at the output? 
Are you getting plain text output or XML from the blast program? > I am using Biopython 1.42, Python 2.4.3 and blast 2.2.13 On Windows, Linux or Mac OS X? Peter From dtoomey at rcsi.ie Mon Oct 16 11:38:05 2006 From: dtoomey at rcsi.ie (David Toomey) Date: Mon, 16 Oct 2006 12:38:05 +0100 Subject: [BioPython] NCBIStandalone.iterator hangs Message-ID: Thanks for the help Peter The attached file has the output from two queries, problem.txt and works.txt, when run manually from the command line I also edited the NCBIStandalone module to add a print statement to Iterator.next() and then ran the same two files using the script. If you compare the two reports for problem.txt you can see on which line of the report the iterator is hanging. I have had a look at this and can't see anything about the line that is unusual? The last line outputted by the script is Query: 266 LAAHIDQYDIDAMTGIRATDIEKTDEAIKVTLENGAVLESKTVIIATGAGWRKLNIPGEE 325 And the manual report continues with I + + ++ + VL Sbjct: 68 GIMSIPTLILFKGGE-PVKQLIGYQPKEQLEAQLADVL- 125 Even though problem.txt generates a valid report when run manually it does output a load of errors of the type below, but I am not sure how this would cause the script to stop at the line above. [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 s top(365) >= len(329) [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 s top(336) >= len(329) [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 s tart(337) >= len(329) [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 s tart(338) >= len(329) [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02113 s tart(284) >= len(149) If it is easier for you I can certainly raise a bug, I just wanted to be sure it wasn't anything silly that I was doing before I did this. 
Cheers, Dave -----Original Message----- From: Peter [mailto:biopython at maubp.freeserve.co.uk] Sent: 16 October 2006 12:11 To: David Toomey Subject: Re: [BioPython] NCBIStandalone.iterator hangs David Toomey wrote: > Hi Peter > > It is part of a large script that I have written but I have > replicated the problem with the following simple script Thank you. > I don't get any stack trace. I have tried it from the windows command > line, Komodo and also on a linux box (although the linux box is > Python 2.3.3 rather than 2.4) When I say it hangs I mean it just > dosn't return after the b_iterator.next() statement. There is no > error message or stack trace. > > I have run the blast manually from the windows command line and it > works fine. The above script works fine with other fasta query > sequences against the same database. That fact that the script runs fine on other query sequences is important. My guess is that the output for problem.txt is somehow different (maybe no matches), and the parser can't cope. Could you email me the blast output from using problem.txt as input, and a second working output from a different query? Or, if you would rather, file a bug and attach the two output files to it. You could do this from the command line, or using python save the blast_out text generated by NCBIStandalone.blastall to a file. (Looking back over the emails, I can't see what version of NCBI Standalone BLAST you have - but it should be specified in the blast output.) Thanks Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: blast_test.zip Type: application/x-zip-compressed Size: 27560 bytes Desc: blast_test.zip URL: From biopython at maubp.freeserve.co.uk Mon Oct 16 15:39:17 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Oct 2006 16:39:17 +0100 Subject: [BioPython] NCBIStandalone.iterator hangs In-Reply-To: References: Message-ID: <4533A7A5.6020102@maubp.freeserve.co.uk> David Toomey wrote: > Thanks for the help Peter > > The attached file has the output from two queries, problem.txt and > works.txt, when run manually from the command line Excellent - that looks like everything I asked for. > I also edited the NCBIStandalone module to add a print statement to > Iterator.next() and then ran the same two files using the script. > If you compare the two reports for problem.txt you can see on which line of > the report the iterator is hanging. I have had a look at this and can't see > anything about the line that is unusual? > > The last line outputted by the script is > Query: 266 LAAHIDQYDIDAMTGIRATDIEKTDEAIKVTLENGAVLESKTVIIATGAGWRKLNIPGEE 325 > > > And the manual report continues with > I + + ++ + VL > Sbjct: 68 GIMSIPTLILFKGGE-PVKQLIGYQPKEQLEAQLADVL- 125 > Your script is hanging during the second alignment in the results for: Nadph Dehydrogenase 1 - Clostridium beijerinckii (Clostridium MP) This alignment does look "funny" to me, but its not the first "funny" alignment in the results. I suspect you have found a problem with NCBI blast, or perhaps have a malformed database (especially given the errors you mention below). However, there are similar odd pairwise alignments before this one in the results which BioPython has apparently coped with... so it is a little odd that BioPython get stuck at this particular point. As to why I think there is a problem: Notice that the Query sequence continues for several lines (up to Query 504) while the Sbjct sequence is blank (up to Sbjct 313) except for a single lone gap character at position 246. 
I would have expected the match to finish at about Query 303 / Match 103. Very odd. In addition, notice that the header information is inconsistent: Score = 149 bits (376), Expect = 7e-037 Identities = 0/309 (0%), Positives = 0/309 (0%), Gaps = 15/309 (4%) Even looking at just the second set of 60 characters (quoted above) we have three identical matches (I, V and L) and five close matches. In all I would say there were five identical matches (A, S, I, V, L) and a further nine close matches. So the identities score should be 5/length, and the positives 14/length. I would also say the alignment length is either 297 (based on the length of the gapped query shown) or 99+1 (based on the length of the gapped subject sequence shown). Even allowing for my quick counts being out by plus or minus one, I can't see where the stated length of 309 comes from. > > Even though problem.txt generates a valid report when run manually it does > output a load of errors of the type below, but I am not sure how this would > cause the script to stop at the line above. > > [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 stop(365) >= len(329) > [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 stop(336) >= len(329) > [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 start(337) >= len(329) > [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02286 start(338) >= len(329) > [NULL_Caption] ERROR: ncbiapi [000.000] AHPF_STAAC: SeqPortNew: lcl|EXPT02113 start(284) >= len(149) > > > If it is easier for you I can certainly raise a bug, I just wanted to be > sure it wasn't anything silly that I was doing before I did this. > Have a look over the output yourself, and see if you agree with me. I assume you get exactly the same results from running Blast on both Linux and Windows.
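For anyone wanting to repeat the identity/positive counts above without doing it by eye, here is a minimal sketch in Python 3 (standard library only). It assumes the usual BLASTP text convention for the middle line of a pairwise block: a letter marks an identical pair, '+' marks a positive-scoring substitution, and a space marks anything else, with the "Positives" figure including the identities. The function name and the example string are made up for illustration; the string is not copied from the report discussed in this thread.

```python
def count_midline(midline):
    """Return (identities, positives) for one BLAST midline string.

    Letters are counted as identities, '+' as conservative matches.
    Positives are reported the way the BLAST header does it, i.e.
    identities plus the '+' positions.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    plus_only = midline.count("+")
    return identities, identities + plus_only


# Hypothetical midline for illustration only.
example = "I  + +    ++ +   VL"
identities, positives = count_midline(example)
print("identities=%d positives=%d" % (identities, positives))
```

Summing the two numbers over every midline of an alignment and comparing them with the "Identities = .../..." header line is a cheap sanity check on a suspicious report.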
I see you are using standalone BLASTP 2.2.13 [Nov-27-2005], so one thing you could try is updating your copy of Blast. I would also double check how you created/installed the database. I think BioPython is going wrong because its been given "funny" input. It may be possible for us to improve that, but even so, I wouldn't trust those blast results. Good luck Peter From cariaso at yahoo.com Mon Oct 16 16:25:42 2006 From: cariaso at yahoo.com (Mike Cariaso) Date: Mon, 16 Oct 2006 09:25:42 -0700 (PDT) Subject: [BioPython] can someone create biopython-1.42.win32-py2.5.exe Message-ID: <20061016162542.30834.qmail@web90601.mail.mud.yahoo.com> python 2.5 has been out for a while now. It would be very helpful if whoever creates the win32 installers could create a new one for python2.5. thanks, Mike Cariaso -- Mike Cariaso * Bioinformatics Software * http://cariaso.com From mdehoon at c2b2.columbia.edu Mon Oct 16 23:03:16 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Mon, 16 Oct 2006 19:03:16 -0400 Subject: [BioPython] can someone create biopython-1.42.win32-py2.5.exe In-Reply-To: <20061016162542.30834.qmail@web90601.mail.mud.yahoo.com> References: <20061016162542.30834.qmail@web90601.mail.mud.yahoo.com> Message-ID: <45340FB4.1020308@c2b2.columbia.edu> Done. See the Biopython download page. Let me know if you have any problems. --Michiel. Mike Cariaso wrote: > python 2.5 has been out for a while now. It would be very helpful if whoever creates the win32 installers could create a new one for python2.5. 
> > thanks, > Mike Cariaso > > > -- > Mike Cariaso * Bioinformatics Software * http://cariaso.com > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From hlapp at gmx.net Tue Oct 17 00:46:56 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 16 Oct 2006 20:46:56 -0400 Subject: [BioPython] NESCent Phyloinformatics Hackathon Message-ID: <480EBF5D-0290-429B-A7FF-826DD4A23FD0@gmx.net> (apologies in advance to those who receive this multiple times) The National Evolutionary Synthesis Center (NESCent) in collaboration with Arlin Stoltzfus (U. Maryland, NIST), Aaron Mackey (GSK), Rutger Vos (UBC), and Mark Holder (FSU) sponsors a Phyloinformatics Hackathon to take place Dec 11-15 in Durham, NC. The (wiki) website with more information and a formal proposal is at https://www.nescent.org/wg_phyloinformatics/ In short, the goal is to leverage the Bio* toolkits to provide the "glue" for evolutionary analyses of various types that depend on automation, interoperability, and data integration. CALL FOR INPUT: The specific objectives are driven by "use cases", that is, specific target problems of interest to evolutionary biologists (click 'Use Cases' at the above website). We invite community input in order to focus efforts on the most urgent or pervasive problems. The wiki for the hackathon allows direct editing of the use cases after registration. You may also upload data files, or add comments to the "Forum" page. Alternatively, send email to hlapp at nescent.org. You may also contact any of the organizers with questions or comments. ATTENDANCE: The hackathon is scheduled for Dec 11-15, 2006 in Durham NC. Space is limited, and attendance is by invitation. 
If you have not been contacted but desire to attend, please contact Hilmar Lapp (hlapp at nescent.org). ORGANIZERS: Hilmar Lapp (NESCent; hlapp at nescent.org) Aaron Mackey (GSK; aaron.j.mackey at gsk.com) Mark Holder (FSU; mholder at scs.fsu.edu) Arlin Stoltzfus (CARB, NIST; arlin.stoltzfus at nist.gov) Todd Vision (NESCent; tjv at bio.unc.edu) Rutger Vos (UBC; rvosa at sfu.ca) From aloraine at gmail.com Tue Oct 17 12:39:54 2006 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 17 Oct 2006 07:39:54 -0500 Subject: [BioPython] intalling on Windows, was: Re: can someone create biopython-1.42.win32-py2.5.exe Message-ID: <83722dde0610170539j4865950dyf721ff52baaae4b7@mail.gmail.com> Hi, Along these lines, a question: I helped a student install biopython on his Windows laptop this week. However, we had a couple of problems getting BioPython to work properly with cygwin's python: our attempts to import Bio packages failed, I guess because they were not in cygwin's python's search path. (We were able to get the cygwin's python to find a custom module he had written by setting PYTHONPATH properly, using Windows' Control Panel, and then checking python's search path in sys.path.) How would I tell python where to find the BioPython packages? Is there a way to use PYTHONPATH environment variable to do this? We will try this next time I meet with him, but any tips or advice you might have would be great. Sorry for the possibly dumb questions -- I am not very familiar with how Windows does things. Sincerely, Ann Loraine On 10/16/06, Mike Cariaso wrote: > python 2.5 has been out for a while now. It would be very helpful if whoever creates the win32 installers could create a new one for python2.5. 
> > thanks, > Mike Cariaso > > > -- > Mike Cariaso * Bioinformatics Software * http://cariaso.com > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From biopython at maubp.freeserve.co.uk Tue Oct 17 13:25:20 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython List)) Date: Tue, 17 Oct 2006 14:25:20 +0100 Subject: [BioPython] intalling on Windows, was: Re: can someone create biopython-1.42.win32-py2.5.exe In-Reply-To: <83722dde0610170539j4865950dyf721ff52baaae4b7@mail.gmail.com> References: <83722dde0610170539j4865950dyf721ff52baaae4b7@mail.gmail.com> Message-ID: <4534D9C0.6080600@maubp.freeserve.co.uk> Ann Loraine wrote: > Hi, > > Along these lines, a question: > > I helped a student install biopython on his Windows laptop this week. > However, we had a couple of problems getting BioPython to work > properly with cygwin's python: our attempts to import Bio packages > failed, I guess because they were not in cygwin's python's search > path. > > (We were able to get the cygwin's python to find a custom module he > had written by setting PYTHONPATH properly, using Windows' Control > Panel, and then checking python's search path in sys.path.) > > How would I tell python where to find the BioPython packages? Is there > a way to use PYTHONPATH environment variable to do this? > > We will try this next time I meet with him, but any tips or advice you > might have would be great. > > Sorry for the possibly dumb questions -- I am not very familiar with > how Windows does things. > > Sincerely, > > Ann Loraine Hello Ann Personally went I tried cygwin's python over a year ago, it did not seem very reliable (especially using idle). 
Things may have improved but I would recommend you use the "pure windows" version of Python. I suspect you are trying to combine cygwin's python with the pre-compiled windows setup.exe for BioPython. This is probably not going to work - and having to mess about with PYTHONPATH would not surprise me. The bits of BioPython written in pure python may be fine, but some of it is compiled C code... and then you are asking for trouble. Pure Windows (not using Cygwin) =============================== For windows, I would recommend you install a pure windows version of Python 2.5 and BioPython 1.42 using: http://www.python.org/ftp/python/2.5/python-2.5.msi http://biopython.org/DIST/biopython-1.42.win32-py2.5.exe Or the Python 2.4 versions: http://www.python.org/ftp/python/2.4.4/python-2.4.4c1.msi http://biopython.org/DIST/biopython-1.42.win32-py2.4.exe Or, if you have some good reason, the Python 2.3 versions. Check the python website for the latest versions; these were correct at the time of writing. It should "just work", without any messing about with paths. Using cygwin on Windows (ignoring any pure windows python installation) ======================= Basically follow the unix instructions. Install the cygwin version of python, gcc, flex, ..., using cygwin setup. Compile and install BioPython FROM SOURCE at the cygwin command line, using the cygwin version of python. Peter From alpersoyler at yahoo.com Wed Oct 18 10:33:47 2006 From: alpersoyler at yahoo.com (alper soyler) Date: Wed, 18 Oct 2006 03:33:47 -0700 (PDT) Subject: [BioPython] Standalone Blast Message-ID: <20061018103347.1681.qmail@web56513.mail.re3.yahoo.com> Hi all, I want to use the formatdb option of standalone BLAST. I have 400 files with the ".pep" extension and I want to format all of them. I looked at the "Biopython Tutorial and Cookbook" but there was no explanation about it. If you could help me, I would be glad. Thank you in advance.
Alper From sbassi at gmail.com Wed Oct 18 16:31:39 2006 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 18 Oct 2006 13:31:39 -0300 Subject: [BioPython] Standalone Blast In-Reply-To: <20061018103347.1681.qmail@web56513.mail.re3.yahoo.com> References: <20061018103347.1681.qmail@web56513.mail.re3.yahoo.com> Message-ID: On 10/18/06, alper soyler wrote: > I want to use formatdb option of standalone BLAST. I have 400 files with ".pep" extension and I want to format all of them. I looked at the "Biopython tutorial and cookbok" but there was no explanation about it. If you help me, I would be glad. Thank you in advance. > Hello, it is not in the Biopython tutorial because it is out of scope, I mean, the biopython tutorial assumes you have a working blast installation. I think you should join all your sequences in one file before running formatdb (like cat *.pep > allfiles.txt). I assume that all pep files are fasta files. Best regards, SB. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From tomee at genesilico.pl Fri Oct 20 00:52:33 2006 From: tomee at genesilico.pl (Tomek Jarzynka) Date: Fri, 20 Oct 2006 02:52:33 +0200 Subject: [BioPython] Automated ligand extraction from PDB Message-ID: <200610200252.34046.tomee@genesilico.pl> Hi, I would like to create a script that would take a PDB file as input, try and identify the ligand structures and delete them from the PDB file. I figured this would be possible with Biopython by looking at the atom id (whether it contains 'H_') and subclassing Select to not allow those items to be written to a file. Is there any more elegant way of doing this, perhaps another PDB parser framework, or maybe someone has already done similar work? Thanks in advance, -- Tomasz K.
Jarzynka / +48 601 706 601 / tomee(a-t)genesilico(d-o-t)pl Laboratory of Bioinformatics and Protein Engineering | www.genesilico.pl International Institute of Molecular and Cell Biology | www.iimcb.gov.pl "You can have either freedom of speech or quality of communication. -- Orson Scott Card" From biopython at maubp.freeserve.co.uk Fri Oct 20 09:21:39 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Oct 2006 10:21:39 +0100 Subject: [BioPython] Automated ligand extraction from PDB In-Reply-To: <200610200252.34046.tomee@genesilico.pl> References: <200610200252.34046.tomee@genesilico.pl> Message-ID: <45389523.5000802@maubp.freeserve.co.uk> Tomek Jarzynka wrote: > Hi, > > I would like to create a script that would take a PDB file as input, > try and identify the ligand structures and delete them from the PDB > file. I figured this would be possible with Biopython by looking at > the atom id (whether it contains 'H_') and subclassing Select to > not allow those items to be written to a file. > Is there any more elegant way of doing this, perhaps another PDB parser > framework, or maybe someone has already done similar work? > > Thanks in advance, Your plan sounds fine. If you are going to use BioPython to analyse PDB files later, then this would be a good exercise to get to grips with it. Also, if you use the BioPython PDB parser in strict (non-permissive) mode, it may flag up potential problems in the PDB file which could be relevant to your work. A quick and dirty alternative would be to read in the input file line by line, and output selected lines: * If it is not an atom record, just output it. * If it is an atom record, look at the atom id to decide.
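A minimal sketch of that quick and dirty route, in Python 3 with no Biopython needed. It assumes a fixed-column PDB file in which hetero atoms are written as HETATM records and the residue name occupies columns 18-20. The function name, the keep_water option and the sample lines are invented for illustration. Note that HETATM also covers waters and modified residues such as selenomethionine, so a blanket filter like this removes more than just ligands.

```python
def strip_hetatm(lines, keep_water=False):
    """Yield PDB lines, dropping HETATM records.

    Everything that is not a HETATM record (ATOM, TER, header
    lines and so on) is passed through unchanged.
    """
    for line in lines:
        if line.startswith("HETATM"):
            # Residue name is in columns 18-20 (0-based slice 17:20).
            if keep_water and line[17:20] == "HOH":
                yield line
            continue
        yield line


# Made-up records, laid out in PDB fixed columns.
sample = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "HETATM 1001  C1  LIG A 201      10.000  10.000  10.000  1.00 20.00           C",
    "HETATM 1002  O   HOH A 301      12.000  12.000  12.000  1.00 30.00           O",
    "END",
]
for kept in strip_hetatm(sample):
    print(kept)
```

For a real file, pass an open file handle as `lines` and write each yielded line to the output file. Any CONECT records that refer to removed atoms are left in place by this sketch.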
Peter From tomee at genesilico.pl Fri Oct 20 11:47:14 2006 From: tomee at genesilico.pl (Tomek Jarzynka) Date: Fri, 20 Oct 2006 13:47:14 +0200 Subject: [BioPython] Automated ligand extraction from PDB In-Reply-To: <45389523.5000802@maubp.freeserve.co.uk> References: <200610200252.34046.tomee@genesilico.pl> <45389523.5000802@maubp.freeserve.co.uk> Message-ID: <200610201347.15086.tomee@genesilico.pl> On Friday 20 October 2006 11:21, Peter wrote: > Your plan sounds fine. > If you are going to use BioPython to analyse PDB files later, then this > would be a good exercise to get to grips with it. I tried some simple code and it works, but I find the 'H_' comparison pretty inelegant. Is there any better way to look for HET atoms, or ligand atoms in particular? > Also, using the BioPython PDB parser in strict mode (non-permissive) > then it may also flag up some potential problems in the PDB file which > you may be interested in for your work. Thanks, I'll take a look at it. Actually, Kristian Rother who's referenced as one of the PDB code authors is my 'roommate' at the institute :) BTW, how does PDB in Biopython relate to pymmlib and cctbx? > A quick and dirty alternative would be to read in the input file line by > line, and output selected lines: > * If it is not an atom record, just output it. > * If it is an atom record, look at the atom id to decide. Yeah, doing this in a shell script would be easy at first but I am expecting trouble with some non-standard PDB files. -- Tomasz K. Jarzynka / +48 601 706 601 / tomee(a-t)genesilico(d-o-t)pl Laboratory of Bioinformatics and Protein Engineering | www.genesilico.pl International Institute of Molecular and Cell Biology | www.iimcb.gov.pl "Któż nie chciałby stać się przez szczęście głupszy, zamiast być mądrzejszy przez szkodę. -- Kamil C.
Norwid" From sdavis2 at mail.nih.gov Sat Oct 21 01:20:21 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 20 Oct 2006 21:20:21 -0400 Subject: [BioPython] Martel-based parsing of Unigene flat files Message-ID: <453975D5.4070701@mail.nih.gov> I am relatively new to python and biopython (coming from perl side of things). I would like to make a parser for Unigene flat file format. However, after digging through the LocusLink parsing code (as probably the most similar format, etc.), I'm still at a loss for how Martel-based parsing works. I understand the big picture (converting an re-based parsing of a file into events), but it is the detail that I am missing. I know about pydoc, but the pydoc for much of Martel is not very helpful to me, at least not in my current state of knowledge. Any suggestions on how to get started? Thanks, Sean From Thenaturenook1 at aol.com Sat Oct 21 13:35:13 2006 From: Thenaturenook1 at aol.com (Thenaturenook1 at aol.com) Date: Sat, 21 Oct 2006 09:35:13 EDT Subject: [BioPython] installing biopython Message-ID: Hi, I am in the process of installing BioPython on a Windows XP System. I have been using the Windows installers throughout and following the instruction in the PDF file. I have installed all of the compulsary pre requisite modules in the order designated, but when i type the test code, this AttributeError happens: Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. **************************************************************** Personal firewall software may warn about the connection IDLE makes to its subprocess using this computer's internal loopback interface. This connection is not visible on any external interface and no data is sent to or received from the Internet. 
**************************************************************** IDLE 1.2 >>> from Bio.Seq import Seq >>> from Bio.Alphabet.IUPAC import unambiguous_dna >>> new_seq = Seq('GATCAGAAC', unambiguous_dna) >>> new_seq[0:2] Seq('GA', IUPACUnambiguousDNA()) >>> from Bio import Translate >>> translator = Translate.umambiguous_dna_by_name["Standard"] Traceback (most recent call last): File "", line 1, in translator = Translate.umambiguous_dna_by_name["Standard"] AttributeError: 'module' object has no attribute 'umambiguous_dna_by_name' >>> Can anyone help? thanks,tim From idoerg at burnham.org Sat Oct 21 14:58:01 2006 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat, 21 Oct 2006 07:58:01 -0700 Subject: [BioPython] installing biopython References: Message-ID: <1F97379A556D0946AAEFE3F63FD6F5744D46C0@MAIL.burnham.org> > AttributeError: 'module' object has no attribute 'umambiguous_dna_by_name' You misspelled the module's name. ./I -- Iddo Friedberg, PhD Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA T: +1 858 646 3100 x3516 http://iddo-friedberg.org http://BioFunctionPrediction.org From sbassi at gmail.com Sat Oct 21 14:51:11 2006 From: sbassi at gmail.com (Sebastian Bassi) Date: Sat, 21 Oct 2006 11:51:11 -0300 Subject: [BioPython] installing biopython In-Reply-To: References: Message-ID: On 10/21/06, Thenaturenook1 at aol.com wrote: > Hi, > I am in the process of installing BioPython on a Windows XP System. I have .... > the order designated, but when i type the test code, this AttributeError This is not a problem with the biopython installation. Could you please send me the URL of the PDF file?
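Since the AttributeError in this thread comes down to a one-letter typo, the interpreter can do some of the looking for you. A minimal sketch, assuming Python 3 and only the standard library: it lists the public names on an object with dir() and asks difflib for the closest spellings. The helper name is made up, and the `demo` module with its two attributes is a hand-built stand-in for illustration, not the real Bio.Translate module.

```python
import difflib
import types


def suggest_attribute(obj, wrong_name, n=3):
    """Return public attribute names of obj that resemble wrong_name."""
    candidates = [name for name in dir(obj) if not name.startswith("_")]
    return difflib.get_close_matches(wrong_name, candidates, n=n)


# Stand-in module built by hand; the attribute names are illustrative.
demo = types.ModuleType("demo")
demo.unambiguous_dna_by_name = {}
demo.ambiguous_dna_by_name = {}

print(suggest_attribute(demo, "umambiguous_dna_by_name"))
```

The same call works on any imported module or object, so it can be pointed at whatever raised the AttributeError.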
-- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython at maubp.freeserve.co.uk Sat Oct 21 14:39:38 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 21 Oct 2006 15:39:38 +0100 Subject: [BioPython] installing biopython In-Reply-To: References: Message-ID: <453A312A.9010001@maubp.freeserve.co.uk> Thenaturenook1 at aol.com wrote: > >>> from Bio import Translate > >>> translator = Translate.umambiguous_dna_by_name["Standard"] > > Traceback (most recent call last): > File "", line 1, in > translator = Translate.umambiguous_dna_by_name["Standard"] > AttributeError: 'module' object has no attribute 'umambiguous_dna_by_name' > > Can anyone help? You have a typing error there, uMambiguous rather than uNambiguous. Also try dir(Translate) to see what else is on offer. i.e. Try this: translator = Translate.unambiguous_dna_by_name["Standard"] Peter From thamelry at binf.ku.dk Sun Oct 22 18:26:17 2006 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Sun, 22 Oct 2006 20:26:17 +0200 Subject: [BioPython] Automated ligand extraction from PDB In-Reply-To: <200610200252.34046.tomee@genesilico.pl> References: <200610200252.34046.tomee@genesilico.pl> Message-ID: <2d7c25310610221126l7fbd5d65u4b8feb5888a81935@mail.gmail.com> On 10/20/06, Tomek Jarzynka wrote: > Hi, > > I would like to create a script that would take a PDB file as input, > try and identify the ligand structures and delete them from the PDB > file. Take a look at the extract function in Bio.PDB's Dice module. extract(s, "A", 1, 100, "out.pdb") will write all amino acids between 1 and 100 of chain A in structure s to file "out.pdb". Hetero residues (i.e. ligands) and hydrogens are not included. -Thomas From konrad.koehler at mac.com Mon Oct 23 16:32:09 2006 From: konrad.koehler at mac.com (Konrad Forster Koehler) Date: Mon, 23 Oct 2006 09:32:09 -0700 Subject: [BioPython] Using Bio.PDB to add atoms to a structure?
Message-ID: <3938052.1161621129819.JavaMail.konrad.koehler@mac.com> Using Bio.PDB, I was wondering if there is a way to: 1) read in a structure from a PDB file 2) calculate the position of a new chain/residue/atom 3) add these new chains/residues/atoms to the structure 4) write out the PDB file containing the original + new atoms I have no problem with steps 1, 2, and 4, but I am stuck on step #3. I have tried things like: http://bioinformatics.org/bradstuff/bp/api/Bio/PDB/StructureBuilder_StructureBuilder.py.html#init_chain builder = StructureBuilder() chain_x = builder.init_chain("0","X") error message: init_chain() takes exactly 2 arguments (3 given) or chain_x = builder.init_chain("X") error message: StructureBuilder instance has no attribute 'model' I have googled for examples or documentation on adding new atoms using Bio.PDB, but I have not found anything. Can anyone provide me with any pointers, examples, etc? Best regards, Konrad From thamelry at binf.ku.dk Mon Oct 23 19:10:35 2006 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Mon, 23 Oct 2006 21:10:35 +0200 Subject: [BioPython] Using Bio.PDB to add atoms to a structure? In-Reply-To: <3938052.1161621129819.JavaMail.konrad.koehler@mac.com> References: <3938052.1161621129819.JavaMail.konrad.koehler@mac.com> Message-ID: <2d7c25310610231210p64876f23k25bb4a67ca58c821@mail.gmail.com> Hi, Adding a chain: from Bio.PDB.Chain import Chain chain = Chain('B') model.add(chain) You have to make sure that the id (in this case 'B') is not yet present in the model. The same applies to atoms, residues and models.
Best, -Thomas From biopython at maubp.freeserve.co.uk Wed Oct 25 09:27:45 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Oct 2006 10:27:45 +0100 Subject: [BioPython] [Biopython-dev] Martel-based parsing of Unigene flat files In-Reply-To: <128a885f0610241922h5db02fbfod1a83cfeade29801@mail.gmail.com> References: <453975D5.4070701@mail.nih.gov> <128a885f0610241922h5db02fbfod1a83cfeade29801@mail.gmail.com> Message-ID: <453F2E11.7030408@maubp.freeserve.co.uk> Sean, I did have a little look at Unigene, but wasn't sure which files exactly you wanted to parse. Like the NCBI, they seem to offer lots of different file formats. Chris Lasher wrote: > Hi Sean, > > FWIW this should probably have been posted to BioPython-dev, but I > don't think that would improve your chances of getting a response. I > am cross-posting it there, anyways. Unfortunately for you, I do not > have an answer for you. :-( The dev list would probably have been a better idea. I had seen Sean's email and was meaning to write something in the absence of any other takers. > I, myself, would be interested in a response to this question from the > Devs, as I would like to write a parser for PTT files. Last I saw > there was a lot of chatter about the Martel parsers being incredibly > slow compared to straightforward solutions. It seems that standard > format parsers would be one of the easiest ways for BioPython newbies > to contribute to developing the BioPython project, however, there > isn't very much in the way of documentation on the BioPython way to do > so, let alone developer documentation at all. I would like to know > what can be done to get some dev docs going on the wiki. I'm one of the more recent contributors - for example, I changed the GenBank parser from Martel to just Python. This was done on the pretext that the old parser (when it worked) was exceedingly slow on large files. There is still room for improvement, but I can now load whole chromosomes/genomes.
If for your file format, the individual records (repeating units) are over 10MB in size, then I would begin to worry about the performance using Martel. Otherwise it might be OK... In the process of this work I did eventually get a feel for how Martel works, and how to define file formats etc. It's rather a clever design but it is daunting for newcomers. Also, when someone manages to find a file formatted sufficiently differently from what a Martel parser expects, working out what exactly needs to be fixed is sometimes tricky. Over on the developers list we have had some talk about where to go in future, and at the moment I have been working on a SeqIO system a little like BioPerl's, http://bugzilla.open-bio.org/show_bug.cgi?id=2059 http://biopython.org/wiki/SeqIO This is a work in progress... I've been planning to actually check something into CVS in the near future. I would also need to lay down guidelines on how annotations are stored so that file format conversion is as smooth as possible. Chris mentioned PTT files (protein table files), available from the NCBI (and probably other databases too). I think PTT files had been mentioned on the dev list in the context of SeqIO (sequence input/output), and one suggestion was to load them as annotated SeqRecord objects with an empty Sequence. Depending on what people want to do with a PTT file, this may not suit everyone.
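To make the "annotated SeqRecord with an empty sequence" suggestion concrete, here is a rough sketch. It is not code from the SeqIO work; it assumes a current Biopython and assumes PTT rows are tab-separated under a header line naming columns such as Location, Strand, PID, Gene and Product. The function name ptt_row_to_record and the sample row are invented for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Invented sample lines; the column names are an assumption about the PTT layout.
HEADER = "Location\tStrand\tLength\tPID\tGene\tProduct"
ROW = "190..255\t+\t21\t12345\texA\thypothetical example protein"


def ptt_row_to_record(header, row):
    """Turn one PTT table row into a SeqRecord that has no sequence (hypothetical helper)."""
    names = header.rstrip("\n").split("\t")
    values = row.rstrip("\n").split("\t")
    fields = dict(zip(names, values))
    record = SeqRecord(
        Seq(""),  # the table carries annotation only, so the sequence stays empty
        id=fields.get("PID", "<unknown id>"),
        name=fields.get("Gene", "<unknown name>"),
        description=fields.get("Product", ""),
    )
    # Keep every column, including the ones already used above, as annotations.
    record.annotations.update(fields)
    return record


record = ptt_row_to_record(HEADER, ROW)
```

Anything that expects len(record.seq) to match the Length column will be surprised by such a record, which is one reason this representation may not suit every use.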
Peter From sdavis2 at mail.nih.gov Wed Oct 25 13:17:10 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 25 Oct 2006 09:17:10 -0400 Subject: [BioPython] [Biopython-dev] Martel-based parsing of Unigene flat files In-Reply-To: <453F2E11.7030408@maubp.freeserve.co.uk> References: <453975D5.4070701@mail.nih.gov> <128a885f0610241922h5db02fbfod1a83cfeade29801@mail.gmail.com> <453F2E11.7030408@maubp.freeserve.co.uk> Message-ID: <200610250917.10160.sdavis2@mail.nih.gov> On Wednesday 25 October 2006 05:27, Peter wrote: > Sean, > > I did have a little look at Unigene, but wasn't sure which files exactly > you wanted to parse. Like the NCBI, they seem to offer lots of > different file formats. I was thinking about files like Hs.data. These files are very simple file formats and can be parsed using simple regexes and if statements VERY quickly. I have written one in perl (because the bioperl one creates objects when none were needed in my case, and so was slow). I simply wanted to do the same in python, but wanted to "do it right". > Chris Lasher wrote: > > Hi Sean, > > > > FWIW this should probably have been posted to BioPython-dev, but I > > don't think that would improve your chances of getting a response. I > > am cross-posting it there, anyways. Unfortunately for you, I do not > > have an answer for you. :-( > > The dev list would probably have been a better idea. I will join and certainly use the dev list in the future for questions along these lines. It always takes a bit to get the culture of a new set of lists correct. > I'm one of the more recent contributors - for example, I changed the > GenBank parser from Martel to just Python. This was done on the pretext > that the old parser (when it worked) was exceedingly slow on large > files. There is still room for improvement, but I can now load whole > chromosomes/genomes. Good to know. I'll take a look at this code.
> If for your file format, the individual records (repeating units) are > over 10MB in size, then I would begin to worry about the performance > using Martel. Otherwise it might be OK... > > In the process of this work I did eventually get a feel for how Martel > works, and how to define file formats etc. It's rather a clever design > but it is daunting for newcomers. > > Also, when someone manages to find a file formatted sufficiently > different to what a Martel parser expects, working out what exactly > needs to be fixed is sometimes tricky. > > Over on the developers list we have had some talk about where to go in > future, and at the moment I have been working on a SeqIO system a little > like BioPerl's, Just keep in mind that on the bioperl side, as annotations have gotten richer and file size has become a non-issue for storage, some of those parsers are not keeping up in terms of speed. SeqIO is faring quite well, but the BLAST parser isn't, just as an example. There is a fine line between creating objects for everything and speedy parsing into raw data structures. In fact, having a couple of parsers (not fully deprecating a fast but trivial parser) is probably the best general way to go. In short, the parser/consumer model is relatively new to me and I think that is where I need to spend a bit of time learning the lay of the land. Thanks for the hints and pointers. I'll look a bit more at code and then try to ask more specific questions as they arise. Sean From pmmagic at gmail.com Thu Oct 26 15:19:19 2006 From: pmmagic at gmail.com (paul m) Date: Thu, 26 Oct 2006 11:19:19 -0400 Subject: [BioPython] Postdoctoral Position - Systems/Computational Biology Message-ID: <991e7bc10610260819h3b8834a5o4418807eed32051@mail.gmail.com> I hope this email is considered appropriate for the BioPython mailing list. Python is the language of choice for computational work in my lab and I hope there may be some folks on the list who are in the process of finishing up Ph.D.
work and looking towards the next step... Postdoctoral Position in Systems/Computational Biology A two year postdoctoral position is available in the Department of Biology at Duke University and the newly formed Duke Center for Systems Biology. We seek a highly motivated postdoctoral research associate who has a strong background in statistical and computational methods. The successful candidate will help to develop quantitative models of the regulatory networks underlying complex traits in yeast. The person who fills this position will also participate in a Howard Hughes Medical Institute funded initiative to develop quantitative laboratory materials for an undergraduate biology course. To apply for this position please send a cover letter, CV and the names and contact information for three references to: Dr. Paul Magwene, Department of Biology, Duke University, P.O. Box 91000, Durham, NC 27708. You may also email this information to paul.magwene at duke.edu. From mdehoon at c2b2.columbia.edu Mon Oct 30 22:08:50 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Mon, 30 Oct 2006 17:08:50 -0500 Subject: [BioPython] Unigene flat file parser Message-ID: <454677F2.1050309@c2b2.columbia.edu> Hi everybody, [If you're also on biopython-dev, you've already received this post. Sorry for the cross-post.] Sean Davis of NIH has written a parser for the Unigene flat file format described here: ftp://ftp.ncbi.nih.gov/repository/UniGene/README under the Hs.data section. A natural place to include this in Biopython would be under Bio/UniGene. However, there is some code already under Bio/UniGene, but I couldn't find documentation for it and it hasn't been updated in more than two years, so it may be some dead code sitting around. If so, we may as well remove this code and put Sean's code there. So just to make sure that this wouldn't harm somebody's work: Is anybody using the current Bio/UniGene code? Thanks, --Michiel.
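For anyone who has not looked at Hs.data, the following is a minimal sketch of reading that kind of file in plain Python. It is not Sean's parser. It assumes each record is a run of lines that start with a tag, followed by whitespace and a value, and that a line containing only // ends the record. The function name iter_unigene_records and the sample records (including the accession-like strings) are invented for illustration.

```python
import io

# Invented sample in the assumed tag/value layout; values are placeholders.
SAMPLE = """\
ID          Hs.1
TITLE       example gene one
GENE        EX1
SEQUENCE    ACC=AAA00001; SEQTYPE=mRNA
SEQUENCE    ACC=AAA00002; SEQTYPE=EST
//
ID          Hs.2
TITLE       example gene two
//
"""


def iter_unigene_records(handle):
    """Yield one dict per record, mapping each tag to a list of its values."""
    record = {}
    for line in handle:
        stripped = line.strip()
        if stripped == "//":
            # End of record: hand it over and start a fresh one.
            if record:
                yield record
            record = {}
            continue
        if not stripped:
            continue
        parts = stripped.split(None, 1)
        tag = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        # Tags such as SEQUENCE repeat, so values are collected in lists.
        record.setdefault(tag, []).append(value)
    if record:
        # Tolerate a final record that lacks the closing // line.
        yield record


records = list(iter_unigene_records(io.StringIO(SAMPLE)))
```

Because it is a generator over plain dicts, nothing is built beyond what the caller asks for, which is the trade-off between speed and rich objects discussed earlier in the thread.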
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From arareko at campus.iztacala.unam.mx Tue Oct 31 15:08:58 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Tue, 31 Oct 2006 09:08:58 -0600 Subject: [BioPython] FreeBSD port updated Message-ID: <4547670A.1010009@campus.iztacala.unam.mx> biopython-l, This is to inform you that the FreeBSD port for BioPython has been updated from 1.41 to 1.42. Many thanks to Thomas Abthorpe who created the patch for this update. Regards, Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM