From clayton_kd at yahoo.com  Tue May  2 03:47:35 2006
From: clayton_kd at yahoo.com (Kyle Dent)
Date: Tue, 2 May 2006 00:47:35 -0700 (PDT)
Subject: [BioPython] GenBank parsing
In-Reply-To: <4453E0B3.9040409@maubp.freeserve.co.uk>
Message-ID: <20060502074735.40114.qmail@web31515.mail.mud.yahoo.com>

Hey Peter,

Thanks for the help; that update managed to solve a separate problem I
encountered with a GenBank file I downloaded in December.

Unfortunately my existing problem stands. Here is the latest complaint
(courtesy of the revised parser):

Traceback (most recent call last):
  File "C:\blast\bin\script.py", line 21, in ?
    cur_record = gb_iterator.next()
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 146, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 191, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 1541, in feed
    line = self._feed_header(handle, consumer)
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 1507, in _feed_header
    getattr(consumer, self._consumer_dict[line_type])(data)
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 511, in title
    self._current_ref.title = content
AttributeError: 'NoneType' object has no attribute 'title'

I have compared GenBank files downloaded recently with those I downloaded
in the past (it's been a few months since I've had to work with this). The
major difference is a TITLE field which is adjacent to the ACCESSION field.
BioEdit seems to add these title fields as well, hence my not being able to
parse BioEdited GenBank files either.

Help with this would be greatly appreciated,

Regards,

--- "Peter (BioPython)" wrote:

> Kyle Dent wrote:
> > Dear All,
> >
> > My script was successfully using the GenBank
> > parser until just today, when I was trying to get it to
> > parse a genpept file.
> > After much experimentation I
> > discovered that it was actually having trouble parsing
> > even newly downloaded GenBank files as well
> > (downloaded off NCBI).
> >
> > I wanted to ask if anyone is aware of this problem. I
> > understand the flat file format was updated this month
> > and is probably the cause of this.
>
> I'm aware that earlier in 2006, there was a new project line added. I
> haven't been aware of any further changes... on the other hand, I don't
> think I've ever used a "genpept" file either.
>
> Anyway, from the error message you are using the "old" Martel based
> parser shipped with BioPython 1.41
>
> We recommend you update to the current CVS parser which is (a) more up
> to date, (b) faster, (c) should give slightly more helpful error
> messages if it does get stuck.
>
> For most cases you can simply download this file, replacing your
> Bio/GenBank/__init__.py after making a backup of the old version:
>
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython
>
> If you see errors about ReseekFile then you will need to make a few
> other changes...
>
> If you are still having trouble, or need further help making the update,
> please reply back. Including the GenBank reference of any problem file
> would be handy.
>
> Thank you
>
> Peter

From biopython at maubp.freeserve.co.uk  Tue May  2 06:41:11 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 02 May 2006 11:41:11 +0100
Subject: [BioPython] GenBank parsing
In-Reply-To: <20060502074735.40114.qmail@web31515.mail.mud.yahoo.com>
References: <20060502074735.40114.qmail@web31515.mail.mud.yahoo.com>
Message-ID: <44573747.7070605@maubp.freeserve.co.uk>

Kyle Dent wrote:
> Hey Peter,
>
> Thanks for the help; that update managed to solve a
> separate problem I encountered with a GenBank file I
> downloaded in December.
>
> Unfortunately my existing problem stands. Here is the
> latest complaint (courtesy of the revised parser):
..
> I have compared GenBank files downloaded recently
> with those I downloaded in the past (it's been a few
> months since I've had to work with this). The major
> difference is a TITLE field which is adjacent to the
> ACCESSION field. BioEdit seems to add these title fields
> as well, hence my not being able to parse BioEdited
> GenBank files either.
>
> Help with this would be greatly appreciated,
>
> Regards,

I've just checked with a random small bacterial genome, freshly
downloaded from the NCBI, and don't have a problem here: there is no
"TITLE" as part of the "ACCESSION" line.

Could you post a link to one of these "problem files" please? (Or send a
copy directly to me only, off the mailing list, to avoid clogging up
everyone else's mailboxes.)

Thanks

Peter

From alpersoyler at yahoo.com  Tue May  2 11:24:47 2006
From: alpersoyler at yahoo.com (alper soyler)
Date: Tue, 2 May 2006 08:24:47 -0700 (PDT)
Subject: [BioPython] Question!!!
Message-ID: <20060502152447.28632.qmail@web36804.mail.mud.yahoo.com>

Hi all,

I have a protein in FASTA format in a file called 'fasta' and I am
writing the script below:

from Bio.Blast import NCBIWWW
result_handle = NCBIWWW.qblast('blastp', 'nr', fasta)

However, it gives me the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'qblast'

Is something missing from my Biopython? How can I solve this problem?
Thank you.

Regards,
Alper

From mdehoon at c2b2.columbia.edu  Tue May  2 12:21:58 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 May 2006 12:21:58 -0400
Subject: [BioPython] Question!!!
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECF26@cgcmail.cgc.cpmc.columbia.edu>

>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast('blastp', 'nr', fasta)

Do you have anything between these two lines? Because otherwise, fasta is
not defined.

> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'module' object has no attribute 'qblast'

Which version of Biopython are you using?

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From winter at biotec.tu-dresden.de  Tue May  2 12:05:13 2006
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Tue, 02 May 2006 18:05:13 +0200
Subject: [BioPython] Question!!!
In-Reply-To: <20060502152447.28632.qmail@web36804.mail.mud.yahoo.com>
References: <20060502152447.28632.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <44578339.8010005@biotec.tu-dresden.de>

Dear Alper,

you might want to try

print NCBIWWW.__doc__
print NCBIWWW.__file__

to first see whether the doc string of the NCBIWWW module has a section
that mentions qblast as a function. If not, check whether the source code
of the module is correct (__file__ tells you its location). It should
contain a qblast function (line 1038 in my version of NCBIWWW.py).

HTH,
Christof

From cce at clarkevans.com  Wed May  3 19:23:23 2006
From: cce at clarkevans.com (Clark C. Evans)
Date: Wed, 3 May 2006 19:23:23 -0400
Subject: [BioPython] Google Summer of Code
Message-ID: <20060503232323.GA47256@prometheusresearch.com>

Proposals are due by May 8, 2006 13:00 PDT.
Proposals made before the deadline may get comments, which lets you
update your submission to reflect that feedback.

----- Forwarded message from Neal Norwitz -----

Date: Thu, 20 Apr 2006 00:32:56 -0700
From: "Neal Norwitz"
To: python-list at python.org
Subject: Python Software Foundation seeks mentors and students for
	Google Summer of Code

This spring and summer, Google will again provide stipends
for students (18+, undergraduate thru PhD programs) to write
new open-source code.

The Python Software Foundation (PSF) http://www.python.org/psf/
will again act as a sponsoring organization in Google's Summer of
Code, matching mentors and projects benefiting Python and Python
users. Projects can include work on the core Python language,
programmer utilities, libraries, packages, frameworks related to
Python, or other Python implementations like Jython, PyPy, or
IronPython.

Please add your project ideas to the existing set at
http://wiki.python.org/moin/SummerOfCode
Mentoring instructions are also on this page. The deadline is soon,
so please sign up as a mentor early. If you are a student considering
a project, you should start deciding now.

Feel free to ask questions on python-dev at python.org

The main page for the Summer of Code is
http://code.google.com/summerofcode.html
At the bottom are links to StudentFAQ, MentorFAQ, and TermsOfService.
The first two have the timeline. Note that student applications are
due between May 1, 17:00 PST and May 8, 17:00 PST.

People interested in mentoring a student through PSF are encouraged to
contact me, Neal Norwitz at nnorwitz at gmail.com. People unknown to
Guido or myself should find a couple of people known within the Python
community who are willing to act as references.

Feel free to contact me if you have any questions. I look forward to
meeting many new mentors and students. Let's improve Python!
----- End forwarded message -----

From mmokrejs at ribosome.natur.cuni.cz  Thu May  4 15:09:57 2006
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Thu, 04 May 2006 21:09:57 +0200
Subject: [BioPython] Need help parsing Blastoutput
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEF7@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEF7@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <445A5185.7090304@ribosome.natur.cuni.cz>

Hi,

Michiel De Hoon wrote:
> A general question is if anybody still needs the parser for Blast text
> output. Currently, we are confusing our users by having a Blast text parser
> that tends to break. A broken parser may be worse than no parser.

I still do think it is worth the effort to keep it in the tree, for
study purposes and for the fact that not everybody uses XML formatted output
yet. ;-)

Martin

From sbassi at gmail.com  Thu May  4 15:45:56 2006
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 May 2006 16:45:56 -0300
Subject: [BioPython] Need help parsing Blastoutput
In-Reply-To: <445A5185.7090304@ribosome.natur.cuni.cz>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEF7@cgcmail.cgc.cpmc.columbia.edu>
	<445A5185.7090304@ribosome.natur.cuni.cz>
Message-ID:

On 5/4/06, Martin MOKREJŠ wrote:
> I still do think it is worth the effort to keep it in the tree, for
> study purposes and for the fact that not everybody uses XML formatted output
> yet. ;-)

While I agree that nobody is using XML output by default (I also use
text or HTML output), I think people will use the XML output if they
BLAST knowing that they will parse it using Biopython.
So I think that text parser should stay in Biopython, but examples in
documentation should be in XML.
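
To illustrate why XML output tends to be the sturdier target for parsing, here is a minimal sketch that reads hit coordinates from BLAST XML using only Python's standard library. The sample document is invented for illustration (the accession EXAMPLE_1 and every number in it are hypothetical), the function name hsp_coordinates is made up here, and the element names (Hit_accession, Hsp_evalue, Hsp_query-from, Hsp_hit-to and so on) are assumed to follow NCBI's classic BLAST XML output. It is not the Biopython parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed BLAST XML report used only for this sketch.
SAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical example protein</Hit_def>
          <Hit_accession>EXAMPLE_1</Hit_accession>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-20</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>50</Hsp_query-to>
              <Hsp_hit-from>11</Hsp_hit-from>
              <Hsp_hit-to>60</Hsp_hit-to>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def hsp_coordinates(xml_text, max_evalue=0.001):
    """Return one dict per HSP whose e-value is at or below max_evalue."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter("Hit"):
        accession = hit.findtext("Hit_accession")
        for hsp in hit.iter("Hsp"):
            evalue = float(hsp.findtext("Hsp_evalue"))
            if evalue > max_evalue:
                continue  # skip weak matches
            rows.append({
                "accession": accession,
                "evalue": evalue,
                "query_start": int(hsp.findtext("Hsp_query-from")),
                "query_end": int(hsp.findtext("Hsp_query-to")),
                "sbjct_start": int(hsp.findtext("Hsp_hit-from")),
                "sbjct_end": int(hsp.findtext("Hsp_hit-to")),
            })
    return rows


for row in hsp_coordinates(SAMPLE_XML):
    print(row["accession"], row["sbjct_start"], row["sbjct_end"])
```

Because each coordinate sits in its own element, the query and subject end positions are read directly instead of being recovered from column positions in a text report.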

From dag23 at duke.edu  Fri May  5 17:00:03 2006
From: dag23 at duke.edu (David Garfield)
Date: Fri, 5 May 2006 17:00:03 -0400
Subject: [BioPython] Where'd sbjct_end go in the Bio.Blast.Record?
Message-ID: <9FBD5F55-F7A9-4239-B7A1-5CFC3FDC54C4@duke.edu>

It's been a while since I used the Bio.Blast modules, and I've been
through one BioPython upgrade (at least) since then. It appears that in
the interim the sbjct_end and query_end in the Record object have
gone away. Am I totally wedged on this? If they're gone, does anyone
have suggestions about other ways of deriving the ends, which are oh
so useful for other tasks?

Cheers,
David

From mdehoon at c2b2.columbia.edu  Fri May  5 18:18:18 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 5 May 2006 18:18:18 -0400
Subject: [BioPython] Where'd sbjct_end go in the Bio.Blast.Record?
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECF31@cgcmail.cgc.cpmc.columbia.edu>

Can you show the code that used to work and doesn't work any more?
You can probably get sbjct_end from sbjct_start and the sequence length.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From lee.byung-chul at kaist.ac.kr  Thu May 11 14:01:43 2006
From: lee.byung-chul at kaist.ac.kr (Lee, Byung-chul)
Date: Fri, 12 May 2006 03:01:43 +0900
Subject: [BioPython] PDBParser, chain iterator problem.
 Terminating just after TER record
Message-ID: <44637C07.30200@kaist.ac.kr>

Hi biopython users,

While parsing PDB files, I got a surprising result.

I coded my program like below:

from Bio.PDB import *
p = PDBParser()
s = p.get_structure('test','pdb1j1t.ent')
m0 = s[0]
for r in m0['A']:
    print r

then the result was obtained like:
...

But in the PDB file, the TER record is written before residue CA, like
below:
...
ATOM   1763  OD1 ASN A 233      -3.371  16.572  33.547  1.00 51.28           O
ATOM   1764  ND2 ASN A 233      -3.068  17.873  31.741  1.00 49.95           N
ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O
TER    1766      ASN A 233
HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA
HETATM 1768  S   SO4   302      21.891  21.921  14.715  1.00 50.50           S

Thus I think BioPython's model iterator must stop before
<Residue CA het=H_ CA resseq=301 icode= >, and I want to know why this
happens and how I can solve this.

Regards,
Byung-chul Lee

--
--------------------------------------------------------
The important thing is not to stop questioning.
                                  : Albert Einstein

Byung chul Lee at Dept. BioSystems KAIST, Korea
Ph.D candidate
82-42-869-4357
--------------------------------------------------------

From biopython at maubp.freeserve.co.uk  Thu May 11 17:50:30 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython))
Date: Thu, 11 May 2006 22:50:30 +0100
Subject: [BioPython] PDBParser, chain iterator problem.
 Terminating just after TER record
In-Reply-To: <44637C07.30200@kaist.ac.kr>
References: <44637C07.30200@kaist.ac.kr>
Message-ID: <4463B1A6.8010006@maubp.freeserve.co.uk>

Lee, Byung-chul wrote:
> Hi biopython users,
>
> While parsing PDB files, I got a surprising result.
>
> I coded my program like below:
>
> from Bio.PDB import *
> p = PDBParser()
> s = p.get_structure('test','pdb1j1t.ent')
> m0 = s[0]
> for r in m0['A']:
>     print r
>
> then the result was obtained like:
> ...
>
> But in the PDB file, the TER record is written before residue CA, like
> below:
> ...
> ATOM   1763  OD1 ASN A 233      -3.371  16.572  33.547  1.00 51.28           O
> ATOM   1764  ND2 ASN A 233      -3.068  17.873  31.741  1.00 49.95           N
> ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O
> TER    1766      ASN A 233
> HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA
> HETATM 1768  S   SO4   302      21.891  21.921  14.715  1.00 50.50           S
>
> Thus I think BioPython's model iterator must stop before
> <Residue CA het=H_ CA resseq=301 icode= >, and I want to know why this
> happens and how I can solve this.

I inferred from the filename used that you are talking about the full
PDB file for record 1j1t; certainly this seems to match the sample
lines you quoted.

Are the HETATM 1767 and 1768 really part of the chain? What about the
rest of the SO4 HETATM lines (1768 to 1772) and the following waters
(HETATM 1773 to 2092)?

I would guess that BioPython treats a termination record (TER) as the
end of the chain, and on the face of it that is the correct action.

If you really want to treat these atoms as part of the chain you might
try editing the PDB file to move the TER line a little lower down.

I'm sure Thomas Hamelryck (author of the BioPython PDB parser) will be
along in a while with a definitive answer...

Peter

From lee.byung-chul at kaist.ac.kr  Fri May 12 01:17:48 2006
From: lee.byung-chul at kaist.ac.kr (Lee, Byung-chul)
Date: Fri, 12 May 2006 14:17:48 +0900
Subject: [BioPython] PDBParser, chain iterator problem.
 Terminating just after TER record
In-Reply-To: <4463B1A6.8010006@maubp.freeserve.co.uk>
References: <44637C07.30200@kaist.ac.kr>
	<4463B1A6.8010006@maubp.freeserve.co.uk>
Message-ID: <44641A7C.6020105@kaist.ac.kr>

Thank you, Boris and Peter, for your efforts and kind replies.

My point was that I wanted to parse only the ATOM/HETATM records before
the TER record; in my case, that is the region from SER (resseq=6) to
ASN (resseq=233).

While looking for the cause of the problem in pdb1j1t.ent after
receiving your mails, I found that the HETATM 1767 record carries the
chain ID 'A'. After deleting the chain ID 'A' in column 22 of the
HETATM 1767 record, I obtained the result I expected.

So I think PDBParser does not use the TER record to end a chain but
relies only on the chain ID column, probably because of the many
problems with the PDB ent file format.

I will contact Thomas Hamelryck, and thanks again.

Regards,
Byung-chul

--
--------------------------------------------------------
The important thing is not to stop questioning.
                                  : Albert Einstein

Byung chul Lee at Dept. BioSystems KAIST, Korea
Ph.D candidate
82-42-869-4357
--------------------------------------------------------

From saskiavillinger at gmx.de  Wed May 17 12:08:59 2006
From: saskiavillinger at gmx.de (Saskia Villinger)
Date: Wed, 17 May 2006 18:08:59 +0200 (MEST)
Subject: [BioPython] PDBParser,
	problem with pdb files missing b factors and occupancies
Message-ID: <5333.1147882139@www099.gmx.net>

Hi!

I'd like to use the PDBParser for PDB files missing B factors and
occupancies. When I write:

from Bio.PDB.PDBParser import PDBParser
p=PDBParser(PERMISSIVE=1)
s=p.get_structure("test", "test.pdb")

... I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 65, in get_structure
    self._parse(file.readlines())
  File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 86, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 149, in _parse_coordinates
    occupancy=float(line[54:60])
ValueError: empty string for float()

... which is due to the missing occupancies and B factors. I'd like to
know whether the PDBParser has any option that makes it possible to use
these files. I could probably add random occupancies and B factors to
the file, but I thought maybe there is a simpler solution?

Thank you,
Saskia

From biopython at maubp.freeserve.co.uk  Wed May 17 15:40:37 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython))
Date: Wed, 17 May 2006 20:40:37 +0100
Subject: [BioPython] PDBParser,
	problem with pdb files missing b factors and occupancies
In-Reply-To: <5333.1147882139@www099.gmx.net>
References: <5333.1147882139@www099.gmx.net>
Message-ID: <446B7C35.1090309@maubp.freeserve.co.uk>

Saskia Villinger wrote:
> Hi!
>
> I'd like to use the PDBParser for PDB files missing B factors and
> occupancies. When I write:
>
> from Bio.PDB.PDBParser import PDBParser
> p=PDBParser(PERMISSIVE=1)
> s=p.get_structure("test", "test.pdb")
>
> ... I get the following error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 65, in get_structure
>     self._parse(file.readlines())
>   File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 86, in _parse
>     self.trailer=self._parse_coordinates(coords_trailer)
>   File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 149, in _parse_coordinates
>     occupancy=float(line[54:60])
> ValueError: empty string for float()
>
> ... which is due to the missing occupancies and B factors. I'd like to
> know whether the PDBParser has any option that makes it possible to use
> these files. I could probably add random occupancies and B factors to
> the file, but I thought maybe there is a simpler solution?

It sounds like you are writing your own PDB files, in which case putting
some default values in these fields is probably the best solution (e.g.
one, or maybe zero, for the occupancy).

The python PDB parser author (Thomas Hamelryck) might have some feedback
for you...

Peter

From idoerg at burnham.org  Mon May 22 13:41:32 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 22 May 2006 10:41:32 -0700
Subject: [BioPython] BOSC 2006 2nd Call for Papers
Message-ID: <4471F7CC.9000705@burnham.org>

>2nd CALL FOR SPEAKERS
>
>This is the second and last official call for speakers to submit their
>abstracts to speak at BOSC 2006 in Fortaleza, Brasil. In order to be
>considered as a potential speaker, an abstract must be received by
>Monday, June 5th, 2006. We look forward to a great conference this
>year. Please consult the Official BOSC 2006 Website at:
>
>http://www.open-bio.org/wiki/BOSC_2006
>
>for more details and information.
>
>In addition, a BOSC weblog has been set up to make it easier to
>disseminate all BOSC related announcements:
>
>http://wiki.open-bio.org/boscblog/
>
>And if you have an iCal compatible calendar, there is an EventDB
>calendar set up with all BOSC related deadlines.
>
>http://eventful.com/groups/G0-001-000014747-0
>
>More information about ISMB can be found at the Official ISMB 2006 Website:
>
>http://ismb2006.cbi.cnptia.embrapa.br/
>
>Thank You, and we look forward to seeing you all,
>
>The BOSC Organizing Committee.

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.org

From denilw at yahoo.com  Fri May 26 12:57:11 2006
From: denilw at yahoo.com (Denil Wickrama)
Date: Fri, 26 May 2006 09:57:11 -0700 (PDT)
Subject: [BioPython] NCBIWWW.qblast with refseq by organism
Message-ID: <20060526165711.94194.qmail@web51708.mail.yahoo.com>

Hi,

I would like to BLAST a list of proteins against the refseq database and
retrieve the corresponding accession numbers of the exact hits. I get
errors when I change from the nr database to the refseq database. I am
also trying to restrict the results by organism name, but that was not
successful.

result_handle = NCBIWWW.qblast("blastp", "nr", seq, entrez_query='"rattus norvegicus" [Organism]')
result_handle = NCBIWWW.qblast("blastp", "refseq", seq, entrez_query='"rattus norvegicus" [Organism]')

Is it possible to do refseq searches with NCBIWWW.qblast?
What do you think I am doing wrong? Any help would be appreciated.
The code is attached below.

Thanks,
Denil Wickrama
Bioinformatics research assistant
Biochemistry and Biomedical Sciences
McMaster University

from Bio import Fasta
from Bio import GenBank
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIStandalone
import csv
from Bio.SeqRecord import SeqRecord
from sys import *

ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'fasta', parser = Fasta.RecordParser())

#gi_number is sent to this function from another function
def BlastGI(gi_number):
    print gi_number
    seq = ncbi_dict[gi_number]
    # I would like to change "nr" to "refseq" but I get an error when I do so
    # I am trying to use entrez_query='"rattus norvegicus" [Organism]' to restrict the result to those from rat
    save_file = open('my_blast.out', 'w')
    blast_results = result_handle.read()

    save_file.write("")
    save_file.write(blast_results)
    save_file.close()
    blast_out = open('my_blast.out', 'r')
    from Bio.Blast import NCBIXML
    b_parser = NCBIXML.BlastParser()
    b_record = b_parser.parse(blast_out)

    #http://biopython.org/docs/tutorial/Tutorial004.html#toc10
    E_VALUE_THRESH = 0.001
    hspHit = False
    exactHit = False
    print 'GI:', gi_number,"";
    print 'alignments', len(b_record.alignments),"";
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                hspHit = True
                if hsp.sbjct == seq.sequence:
                    #print 'sequence:', hsp.sbjct
                    print 'title:', hits.name
                    exactHit = True
    if exactHit == True:
        print "Exact HIT: True, 1, 1"
    else:
        print "Exact HIT: False, 0, 1"

From mdehoon at c2b2.columbia.edu  Fri May 26 18:33:15 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Fri, 26 May 2006 15:33:15 -0700
Subject: [BioPython] NCBIWWW.qblast with refseq by organism
In-Reply-To: <20060526165711.94194.qmail@web51708.mail.yahoo.com>
References: <20060526165711.94194.qmail@web51708.mail.yahoo.com>
Message-ID: <4477822B.80002@c2b2.columbia.edu>

Which gi_number do you use when you call BlastGI?

--Michiel

--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From nauman.maqbool at agresearch.co.nz  Mon May 29 17:41:09 2006
From: nauman.maqbool at agresearch.co.nz (Maqbool, Nauman)
Date: Tue, 30 May 2006 09:41:09 +1200
Subject: [BioPython] Reading Affy's .CEL files
Message-ID:

Hi

I was wondering how the Bio.Affy.CelFile module handles a .CEL file
when that file is in binary format. Does the .CEL file need to be
converted to a text file first using GCOS?

Any pointers would be much appreciated.

Regards

Nauman
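
Whether a given Biopython release reads binary CEL files is not settled in this thread. One way to find out which kind of file you have before choosing a route is to look at its first bytes. The following is a minimal sketch only: the function name sniff_cel_format is made up here, and the markers it checks (a leading "[CEL]" for the text format, a little-endian 32-bit magic number 64 for the version 4 binary format, a leading byte 59 for Command Console files) are assumptions taken from Affymetrix's published file-format notes, so please verify them against your own files.

```python
import struct


def sniff_cel_format(head):
    """Guess the CEL flavour from the first few bytes of a file.

    All three markers are assumptions (see the note above), not something
    confirmed by Bio.Affy.CelFile itself.
    """
    if head.startswith(b"[CEL]"):
        return "text (version 3)"
    if len(head) >= 4 and struct.unpack("<i", head[:4])[0] == 64:
        return "binary (version 4)"
    if head[:1] == b"\x3b":  # 59 in decimal
        return "binary (Command Console)"
    return "unknown"


# Small demonstration on in-memory byte strings rather than real files.
print(sniff_cel_format(b"[CEL]\nVersion=3\n"))
print(sniff_cel_format(struct.pack("<ii", 64, 4)))
```

For a real file, open it in binary mode and pass the first eight bytes or so, for example the result of handle.read(8).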
From clayton_kd at yahoo.com Tue May 2 07:47:35 2006
From: clayton_kd at yahoo.com (Kyle Dent)
Date: Tue, 2 May 2006 00:47:35 -0700 (PDT)
Subject: [BioPython] GenBank parsing
In-Reply-To: <4453E0B3.9040409@maubp.freeserve.co.uk>
Message-ID: <20060502074735.40114.qmail@web31515.mail.mud.yahoo.com>

Hey Peter,

Thanks for the help, that update managed to solve a separate problem I
encountered with a GenBank file I downloaded in December.

Unfortunately my existing problem stands. Here is the latest complaint
(courtesy of the revised parser):

Traceback (most recent call last):
  File "C:\blast\bin\script.py", line 21, in ?
    cur_record = gb_iterator.next()
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 146, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 191, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 1541, in feed
    line = self._feed_header(handle, consumer)
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 1507, in _feed_header
    getattr(consumer, self._consumer_dict[line_type])(data)
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 511, in title
    self._current_ref.title = content
AttributeError: 'NoneType' object has no attribute 'title'

I had examined the difference between GenBank files downloaded recently
and those I downloaded in the past (it's been a few months since I've had
to work with this). The major difference is a TITLE field which is
adjacent to the ACCESSION field. BioEdit seems to add these title fields
as well, hence my not being able to parse BioEdited GenBank files either.

Help with this would be greatly appreciated,

Regards,

--- "Peter (BioPython)" wrote:

> Kyle Dent wrote:
> > Dear All,
> >
> > My script was successfully implementing the Genbank parser until
> > just today I was trying to get it to parse a genpept file. After
> > much experimentation I discovered that it was actually having
> > trouble parsing even newly downloaded GenBank files as well
> > (downloaded of NCBI).
> >
> > I wanted to ask if anyone is aware of this problem, I understand
> > the flat file format was updated this month and is probably the
> > cause of this.
>
> I'm aware that earlier in 2006, there was a new project line added. I
> haven't been aware of any further changes... on the other hand, I don't
> think I've ever used a "genpept" file either.
>
> Anyway, from the error message you are using the "old" Martel based
> parser shipped with BioPython 1.41
>
> We recommend you update to the current CVS parser which is (a) more up
> to date, (b) faster, (c) should give slightly more helpful error
> messages if it does get stuck.
>
> For most cases you can simply download this file, replacing your
> Bio/GenBank/__init__.py after making a backup of the old version:
>
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython
>
> If you see errors about ReseekFile then you will need to make a few
> other changes...
>
> If you are still having trouble, or need further help making the
> update, please reply back. Including the GenBank reference of any
> problem file would be handy.
>
> Thank you
>
> Peter

From biopython at maubp.freeserve.co.uk Tue May 2 10:41:11 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 02 May 2006 11:41:11 +0100
Subject: [BioPython] GenBank parsing
In-Reply-To: <20060502074735.40114.qmail@web31515.mail.mud.yahoo.com>
References: <20060502074735.40114.qmail@web31515.mail.mud.yahoo.com>
Message-ID: <44573747.7070605@maubp.freeserve.co.uk>

Kyle Dent wrote:
> Hey Peter,
>
> Thanks for the help, that update managed to solve a separate problem I
> encountered with a GenBank file I downloaded in December.
>
> Unfortunately my existing problem stands. Here is the latest complaint
> (courtesy of the revised parser):
..
> I had examined the difference between GenBank files downloaded recently
> and those I downloaded in the past (it's been a few months since I've
> had to work with this). The major difference is a TITLE field which is
> adjacent to the ACCESSION field.
> BioEdit seems to add these title fields as well, hence my not being
> able to parse BioEdited GenBank files either.
>
> Help with this would be greatly appreciated,
>
> Regards,

I've just checked with a random small bacterial genome, freshly
downloaded from the NCBI, and don't have a problem here: there is no
"TITLE" as part of the "ACCESSION" line.

Could you post a link to one of these "problem files" please? (Or send a
copy directly to me only, off the mailing list, to avoid clogging up
everyone else's mailboxes.)

Thanks

Peter

From alpersoyler at yahoo.com Tue May 2 15:24:47 2006
From: alpersoyler at yahoo.com (alper soyler)
Date: Tue, 2 May 2006 08:24:47 -0700 (PDT)
Subject: [BioPython] Question!!!
Message-ID: <20060502152447.28632.qmail@web36804.mail.mud.yahoo.com>

Hi all,

I have a protein in fasta format in a file called 'fasta' and I am
writing the script below:

from Bio.Blast import NCBIWWW
result_handle = NCBIWWW.qblast('blastp', 'nr', fasta)

However, it gives me the following error:

Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'module' object has no attribute 'qblast'

Is something missing in my Biopython? How can I solve this problem?
Thank you.

Regards,
Alper

From mdehoon at c2b2.columbia.edu Tue May 2 16:21:58 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 May 2006 12:21:58 -0400
Subject: [BioPython] Question!!!
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECF26@cgcmail.cgc.cpmc.columbia.edu>

>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast('blastp', 'nr', fasta)

Do you have anything between these two lines? Because otherwise, fasta
is not defined.

> Traceback (most recent call last):
>   File "", line 1, in ?
> AttributeError: 'module' object has no attribute 'qblast'

Which version of Biopython are you using?

--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of alper soyler
Sent: Tue 5/2/2006 11:24 AM
To: biopython at lists.open-bio.org
Subject: [BioPython] Question!!!

Hi all,

I have a protein in fasta format in a file called 'fasta' and I am
writing the script below:

from Bio.Blast import NCBIWWW
result_handle = NCBIWWW.qblast('blastp', 'nr', fasta)

However, it gives me the following error:

Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'module' object has no attribute 'qblast'

Is something missing in my Biopython? How can I solve this problem?
Thank you.

Regards,
Alper

_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From winter at biotec.tu-dresden.de Tue May 2 16:05:13 2006
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Tue, 02 May 2006 18:05:13 +0200
Subject: [BioPython] Question!!!
In-Reply-To: <20060502152447.28632.qmail@web36804.mail.mud.yahoo.com>
References: <20060502152447.28632.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <44578339.8010005@biotec.tu-dresden.de>

Dear Alper,

you might want to try

print NCBIWWW.__doc__
print NCBIWWW.__file__

to first see whether the doc string of the NCBIWWW module has a section
that mentions qblast as a function and, if not, to check whether the
source code of the module (__file__ tells you the location) is correct.
It should contain a qblast function (line 1038 in my version of
NCBIWWW.py).
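
The check described above can be wrapped in a small helper. This is only a sketch under stated assumptions: inspect_module is a hypothetical name, it is written for a current Python 3 interpreter rather than the Python 2 used in this thread, and the standard-library json module stands in for Bio.Blast.NCBIWWW so that it runs without Biopython installed.

```python
import importlib


def inspect_module(module_name, attribute):
    """Report where a module was loaded from and whether it defines `attribute`."""
    module = importlib.import_module(module_name)
    return {
        # Path of the file that was actually imported (None for built-ins).
        "file": getattr(module, "__file__", None),
        # True if the module object exposes the attribute at all.
        "has_attribute": hasattr(module, attribute),
        # True if the module docstring mentions the attribute by name.
        "doc_mentions": attribute in (module.__doc__ or ""),
    }


# Stand-in example; for the case in this thread one would pass
# "Bio.Blast.NCBIWWW" and "qblast" instead (assumes Biopython is installed).
report = inspect_module("json", "dumps")
print(report)
```

If has_attribute comes back False while the reported file sits in an unexpected directory, an older or shadowing copy of the module is probably being imported.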

HTH,
Christof

alper soyler wrote:
> Hi all,
>
> I have a protein in fasta format in a file called 'fasta' and I am
> writing the script below:
>
> from Bio.Blast import NCBIWWW
> result_handle = NCBIWWW.qblast('blastp', 'nr', fasta)
>
> However, it gives me the following error:
>
> Traceback (most recent call last):
>   File "", line 1, in ?
> AttributeError: 'module' object has no attribute 'qblast'
>
> Is something missing in my Biopython? How can I solve this problem?
> Thank you.
>
> Regards,
> Alper

From cce at clarkevans.com Wed May 3 23:23:23 2006
From: cce at clarkevans.com (Clark C. Evans)
Date: Wed, 3 May 2006 19:23:23 -0400
Subject: [BioPython] Google Summer of Code
Message-ID: <20060503232323.GA47256@prometheusresearch.com>

Proposals are due by May 8, 2006 17:00 PDT. Proposals made before
this deadline may get comments and permit you to update to reflect
feedback on what you submitted.

----- Forwarded message from Neal Norwitz -----

Date: Thu, 20 Apr 2006 00:32:56 -0700
From: "Neal Norwitz"
To: python-list at python.org
Subject: Python Software Foundation seeks mentors and students for Google Summer of Code

This spring and summer, Google will again provide stipends
for students (18+, undergraduate thru PhD programs) to write
new open-source code.

The Python Software Foundation (PSF) http://www.python.org/psf/
will again act as a sponsoring organization in Google's Summer of
Code, matching mentors and projects benefiting Python and Python
users. Projects can include work on the core Python language,
programmer utilities, libraries, packages, frameworks related to
Python, or other Python implementations like Jython, PyPy, or
IronPython.

Please add your project ideas to the existing set at
http://wiki.python.org/moin/SummerOfCode
Mentoring instructions are also on this page. The deadline is soon,
so please sign up as a mentor early. If you are a student considering
a project, you should start deciding now.
Feel free to ask questions on python-dev at python.org

The main page for the Summer of Code is
http://code.google.com/summerofcode.html
At the bottom are links to StudentFAQ, MentorFAQ, and TermsOfService.
The first two have the timeline. Note that student applications are
due between May 1, 17:00 PST and May 8, 17:00 PST.

People interested in mentoring a student through PSF are encouraged to
contact me, Neal Norwitz at nnorwitz at gmail.com. People unknown to
Guido or myself should find a couple of people known within the Python
community who are willing to act as references.

Feel free to contact me if you have any questions. I look forward to
meeting many new mentors and students. Let's improve Python!

----- End forwarded message -----

From mmokrejs at ribosome.natur.cuni.cz Thu May 4 19:09:57 2006
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Thu, 04 May 2006 21:09:57 +0200
Subject: [BioPython] Need help parsing Blastoutput
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEF7@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEF7@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <445A5185.7090304@ribosome.natur.cuni.cz>

Hi,

Michiel De Hoon wrote:
> A general question is if anybody still needs the parser for Blast text
> output. Currently, we are confusing our users by having a Blast text
> parser that tends to break. A broken parser may be worse than no parser.

I still do think it is worth the effort to keep it in the tree, for
study purposes and for the fact that not everybody uses XML formatted
output yet.
;-)
Martin

From sbassi at gmail.com Thu May 4 19:45:56 2006
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 May 2006 16:45:56 -0300
Subject: [BioPython] Need help parsing Blastoutput
In-Reply-To: <445A5185.7090304@ribosome.natur.cuni.cz>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEF7@cgcmail.cgc.cpmc.columbia.edu> <445A5185.7090304@ribosome.natur.cuni.cz>
Message-ID:

On 5/4/06, Martin MOKREJŠ wrote:
> I still do think it is worth the effort to keep it in the tree, for
> study purposes and for the fact that not everybody uses XML formatted
> output yet. ;-)

While I agree that nobody is using XML output by default (I also use
text or HTML output), I think people will use the XML output if they
BLAST knowing that they will parse it using Biopython.
So I think that the text parser should stay in Biopython, but the
examples in the documentation should be in XML.

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From dag23 at duke.edu Fri May 5 21:00:03 2006
From: dag23 at duke.edu (David Garfield)
Date: Fri, 5 May 2006 17:00:03 -0400
Subject: [BioPython] Where'd sbjct_end go in the Bio.Blast.Record?
Message-ID: <9FBD5F55-F7A9-4239-B7A1-5CFC3FDC54C4@duke.edu>

It's been a while since I used the Bio.Blast modules, and I've been
through one BioPython upgrade (at least) since then. It appears that in
the interlude the sbjct_end and query_end in the Record object have gone
away. Am I totally wedged on this? If they're gone, does anyone have
suggestions about other ways of deriving the ends, which are oh so
useful for other tasks?

Cheers,
David

From mdehoon at c2b2.columbia.edu Fri May 5 22:18:18 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 5 May 2006 18:18:18 -0400
Subject: [BioPython] Where'd sbjct_end go in the Bio.Blast.Record?
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECF31@cgcmail.cgc.cpmc.columbia.edu>

Can you show the code that used to work and doesn't work any more?
You can probably get sbjct_end from sbjct_start and the sequence length.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of David Garfield
Sent: Fri 5/5/2006 5:00 PM
To: biopython at biopython.org
Subject: [BioPython] Where'd sbjct_end go in the Bio.Blast.Record?

It's been a while since I used the Bio.Blast modules, and I've been
through one BioPython upgrade (at least) since then. It appears that in
the interlude the sbjct_end and query_end in the Record object have gone
away. Am I totally wedged on this? If they're gone, does anyone have
suggestions about other ways of deriving the ends, which are oh so
useful for other tasks?

Cheers,
David

_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From lee.byung-chul at kaist.ac.kr Thu May 11 18:01:43 2006
From: lee.byung-chul at kaist.ac.kr (Lee, Byung-chul)
Date: Fri, 12 May 2006 03:01:43 +0900
Subject: [BioPython] PDBParser, chain iterator problem. Terminating just after TER record
Message-ID: <44637C07.30200@kaist.ac.kr>

Hi biopython users,

While parsing PDB files, I met a ridiculous parsing result.

I coded my program like below:

from Bio.PDB import *
p = PDBParser()
s = p.get_structure('test','pdb1j1t.ent')
m0 = s[0]
for r in m0['A']:
    print r

then the result was obtained like:
...

But in the pdb file, the TER record was written before Residue CA like
below:
...
ATOM   1763  OD1 ASN A 233      -3.371  16.572  33.547  1.00 51.28           O
ATOM   1764  ND2 ASN A 233      -3.068  17.873  31.741  1.00 49.95           N
ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O
TER    1766      ASN A 233
HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA
HETATM 1768  S   SO4   302      21.891  21.921  14.715  1.00 50.50           S

Thus I think BioPython's model iterator must stop before
<Residue CA het=H_ CA resseq=301 icode= >, and I want to know why this
happens and how I can solve this.

Regards,
Byung-chul Lee

--
--------------------------------------------------------
The important thing is not to stop questioning.
: Albert Einstein

Byung chul Lee at Dept. BioSystems
KAIST, Korea
Ph.D candidate
82-42-869-4357
--------------------------------------------------------

From biopython at maubp.freeserve.co.uk Thu May 11 21:50:30 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython))
Date: Thu, 11 May 2006 22:50:30 +0100
Subject: [BioPython] PDBParser, chain iterator problem. Terminating just after TER record
In-Reply-To: <44637C07.30200@kaist.ac.kr>
References: <44637C07.30200@kaist.ac.kr>
Message-ID: <4463B1A6.8010006@maubp.freeserve.co.uk>

Lee, Byung-chul wrote:
> Hi biopython users,
>
> While parsing PDB files, I met a ridiculous parsing result.
>
> I coded my program like below:
>
> from Bio.PDB import *
> p = PDBParser()
> s = p.get_structure('test','pdb1j1t.ent')
> m0 = s[0]
> for r in m0['A']:
>     print r
>
> then the result was obtained like:
> ...
>
> But in the pdb file, the TER record was written before Residue CA like
> below:
> ...
> ATOM   1763  OD1 ASN A 233      -3.371  16.572  33.547  1.00 51.28           O
> ATOM   1764  ND2 ASN A 233      -3.068  17.873  31.741  1.00 49.95           N
> ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O
> TER    1766      ASN A 233
> HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA
> HETATM 1768  S   SO4   302      21.891  21.921  14.715  1.00 50.50           S
>
> Thus I think BioPython's model iterator must stop before <Residue CA
> het=H_ CA resseq=301 icode= >, and I want to know why this happens and
> how I can solve this.

I inferred from the filename used that you are talking about the full
PDB file for record 1j1t - certainly this seems to match the sample
lines you quoted.

Are the HETATM 1767 and 1768 really part of the chain? What about the
rest of the SO4 HETATM lines (1768 to 1772) and the following waters
(HETATM 1773 to 2092)?

I would guess that BioPython treats a termination record (TER) as the
end of the chain, and on the face of it that is the correct action.

If you really want to treat these atoms as part of the chain you might
try editing the PDB file to move the TER line a little lower down.

I'm sure Thomas Hamelryck (author of the BioPython PDB parser) will be
along in a while with a definitive answer...

Peter

From lee.byung-chul at kaist.ac.kr Fri May 12 05:17:48 2006
From: lee.byung-chul at kaist.ac.kr (Lee, Byung-chul)
Date: Fri, 12 May 2006 14:17:48 +0900
Subject: [BioPython] PDBParser, chain iterator problem. Terminating just after TER record
In-Reply-To: <4463B1A6.8010006@maubp.freeserve.co.uk>
References: <44637C07.30200@kaist.ac.kr> <4463B1A6.8010006@maubp.freeserve.co.uk>
Message-ID: <44641A7C.6020105@kaist.ac.kr>

Thanks, Boris and Peter, for your sincere efforts and kind replies.

My point was that I wanted to parse only the ATOM/HETATM records before
the TER record, so in my case I wanted the region from SER (resseq=6) to
ASN (resseq=233).
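
For pulling out only the records that precede the TER line, a plain-text pass over the file is one option. The sketch below is a minimal, Biopython-free illustration written for a current Python 3 interpreter: chain_lines_before_ter and make_record are hypothetical names, the chain identifier is assumed to sit in column 22 as in the standard PDB layout, and the sample records reuse the serial numbers quoted in this thread without their coordinates.

```python
def make_record(record, serial, atom, resname, chain, resseq):
    """Build the leading fixed-width columns (1-26) of a PDB record."""
    return "%-6s%5d %-4s %3s %1s%4d" % (record, serial, atom, resname, chain, resseq)


def chain_lines_before_ter(pdb_lines, chain_id):
    """Return ATOM/HETATM lines of chain_id that occur before its first TER."""
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        # Stop at the first TER seen once lines of the chain have been collected.
        if record == "TER" and kept:
            break
        # Column 22 (index 21) holds the chain identifier.
        if record in ("ATOM", "HETATM") and line[21:22] == chain_id:
            kept.append(line)
    return kept


sample = [
    make_record("ATOM", 1764, " ND2", "ASN", "A", 233),
    make_record("ATOM", 1765, " OXT", "ASN", "A", 233),
    make_record("TER", 1766, "", "ASN", "A", 233),
    make_record("HETATM", 1767, "CA", "CA", "A", 301),
]
selected = chain_lines_before_ter(sample, "A")
print(len(selected))
```

Within Bio.PDB itself, a similar effect should be achievable without touching the file by skipping residues whose hetero flag (the first element of the residue id, quoted above as het=H_ CA) is not blank.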

While looking for the cause of the problem in pdb1j1t.ent after
receiving your mails, I found that the chain id 'A' was included in the
HETATM 1767 record. After deleting the chain id 'A' in the 22nd column
of HETATM 1767, I obtained the result I expected.

So I think PDBParser does not recognize the 'TER' record and looks only
at the chain id column, because of the many problems with the PDB ent
file format.

I will contact Thomas Hamelryck, and thanks again.

Regards,
Byung-chul

Peter (BioPython) wrote:
> Lee, Byung-chul wrote:
>
>> Hi biopython users,
>>
>> While parsing PDB files, I met a ridiculous parsing result.
>>
>> I coded my program like below:
>>
>> from Bio.PDB import *
>> p = PDBParser()
>> s = p.get_structure('test','pdb1j1t.ent')
>> m0 = s[0]
>> for r in m0['A']:
>>     print r
>>
>> then the result was obtained like:
>> ...
>>
>> But in the pdb file, the TER record was written before Residue CA like
>> below:
>> ...
>> ATOM   1763  OD1 ASN A 233      -3.371  16.572  33.547  1.00 51.28           O
>> ATOM   1764  ND2 ASN A 233      -3.068  17.873  31.741  1.00 49.95           N
>> ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O
>> TER    1766      ASN A 233
>> HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA
>> HETATM 1768  S   SO4   302      21.891  21.921  14.715  1.00 50.50           S
>>
>> Thus I think BioPython's model iterator must stop before <Residue CA
>> het=H_ CA resseq=301 icode= >, and I want to know why this happens and
>> how I can solve this.
>
> I inferred from the filename used that you are talking about the full
> PDB file for record 1j1t - certainly this seems to match the sample
> lines you quoted.
>
> Are the HETATM 1767 and 1768 really part of the chain? What about the
> rest of the SO4 HETATM lines (1768 to 1772) and the following waters
> (HETATM 1773 to 2092)?
>
> I would guess that BioPython treats a termination record (TER) as the
> end of the chain, and on the face of it that is the correct action.
>
> If you really want to treat these atoms as part of the chain you might
> try editing the PDB file to move the TER line a little lower down.
>
> I'm sure Thomas Hamelryck (author of the BioPython PDB parser) will be
> along in a while with a definitive answer...
>
> Peter

--
--------------------------------------------------------
The important thing is not to stop questioning.
: Albert Einstein

Byung chul Lee at Dept. BioSystems
KAIST, Korea
Ph.D candidate
82-42-869-4357
--------------------------------------------------------

From saskiavillinger at gmx.de Wed May 17 16:08:59 2006
From: saskiavillinger at gmx.de (Saskia Villinger)
Date: Wed, 17 May 2006 18:08:59 +0200 (MEST)
Subject: [BioPython] PDBParser, problem with pdb files missing b factors and occupancies
Message-ID: <5333.1147882139@www099.gmx.net>

Hi!

I'd like to use the PDBParser for pdb files missing b factors and
occupancies. When I write:

from Bio.PDB.PDBParser import PDBParser
p=PDBParser(PERMISSIVE=1)
s=p.get_structure("test", "test.pdb")

... I get the following error:

Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 65, in get_structure
    self._parse(file.readlines())
  File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 86, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 149, in _parse_coordinates
    occupancy=float(line[54:60])
ValueError: empty string for float()

... which is due to the missing occupancies and b factors. I'd like to
know if there is any option for the PDBParser that makes it possible to
use these files. I could probably add random occupancies and b factors
to the file, but I thought maybe there is a simpler solution?

Thank you,
Saskia
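
The workaround mentioned above, filling in placeholder occupancies and B-factors before parsing, could be scripted roughly as follows. This is a sketch under stated assumptions, written for a current Python 3 interpreter: pad_occupancy_bfactor is a hypothetical helper, the occupancy field is taken to be columns 55-60 (the line[54:60] slice in the traceback) with the B-factor in the following six columns (61-66), and the defaults of 1.00 and 0.00 are arbitrary placeholders rather than meaningful values.

```python
def pad_occupancy_bfactor(line, occupancy=1.0, bfactor=0.0):
    """Fill blank occupancy (cols 55-60) and B-factor (cols 61-66) fields."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave every other record type untouched
    body = line.rstrip("\r\n")
    ending = line[len(body):]   # preserve the original line ending, if any
    body = body.ljust(66)       # make sure both fields exist as blanks
    occ = body[54:60]
    bfac = body[60:66]
    if not occ.strip():
        occ = "%6.2f" % occupancy
    if not bfac.strip():
        bfac = "%6.2f" % bfactor
    return body[:54] + occ + bfac + body[66:] + ending


# Hypothetical record that stops after the z coordinate (column 54).
short = "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
    "ATOM", 1, " N", "ALA", "A", 1, 11.104, 6.134, -6.504)
fixed = pad_occupancy_bfactor(short)
print(repr(fixed[54:66]))
```

Running every line of the file through the helper and writing the result to a new file leaves existing values alone and only fills fields that are blank.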

From biopython at maubp.freeserve.co.uk Wed May 17 19:40:37 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython))
Date: Wed, 17 May 2006 20:40:37 +0100
Subject: [BioPython] PDBParser, problem with pdb files missing b factors and occupancies
In-Reply-To: <5333.1147882139@www099.gmx.net>
References: <5333.1147882139@www099.gmx.net>
Message-ID: <446B7C35.1090309@maubp.freeserve.co.uk>

Saskia Villinger wrote:
> Hi!
>
> I'd like to use the PDBParser for pdb files missing b factors and
> occupancies. When I write:
>
> from Bio.PDB.PDBParser import PDBParser
> p=PDBParser(PERMISSIVE=1)
> s=p.get_structure("test", "test.pdb")
>
> ... I get the following error:
>
> Traceback (most recent call last):
>   File "", line 1, in ?
>   File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 65, in get_structure
>     self._parse(file.readlines())
>   File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 86, in _parse
>     self.trailer=self._parse_coordinates(coords_trailer)
>   File "/usr/lib/python2.3/site-packages/Bio/PDB/PDBParser.py", line 149, in _parse_coordinates
>     occupancy=float(line[54:60])
> ValueError: empty string for float()
>
> ... which is due to the missing occupancies and b factors. I'd like to
> know, if there is any option for the PDBParser, so that it gets
> possible to use these files? I could probably add random occupancies
> and b factors to the file, but I thought maybe there is a simpler
> solution?

It sounds like you are writing your own PDB files - in which case
putting some default values in these fields is probably the best
solution (e.g. one, or maybe zero, for the occupancy).

The python PDB parser author (Thomas Hamelryck) might have some feedback
for you...

Peter

From idoerg at burnham.org Mon May 22 17:41:32 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 22 May 2006 10:41:32 -0700
Subject: [BioPython] BOSC 2006 2nd Call for Papers
Message-ID: <4471F7CC.9000705@burnham.org>

>2nd CALL FOR SPEAKERS
>
>This is the second and last official call for speakers to submit their
>abstracts to speak at BOSC 2006 in Fortaleza, Brazil. In order to be
>considered as a potential speaker, an abstract must be received by
>Monday, June 5th, 2006. We look forward to a great conference this
>year. Please consult The Official BOSC 2006 Website at:
>
>http://www.open-bio.org/wiki/BOSC_2006
>
>for more details and information.
>
>In addition, a BOSC weblog has been set up to make it easier to
>disseminate all BOSC related announcements:
>
>http://wiki.open-bio.org/boscblog/
>
>And if you have an ICAL compatible Calendar, there is an EventDB
>calendar set up with all BOSC related deadlines.
>
>http://eventful.com/groups/G0-001-000014747-0
>
>More information about ISMB can be found at the Official ISMB 2006 Website:
>
>http://ismb2006.cbi.cnptia.embrapa.br/
>
>Thank You, and we look forward to seeing you all,
>
>The BOSC Organizing Committee.

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.org

From denilw at yahoo.com Fri May 26 16:57:11 2006
From: denilw at yahoo.com (Denil Wickrama)
Date: Fri, 26 May 2006 09:57:11 -0700 (PDT)
Subject: [BioPython] NCBIWWW.qblast with refseq by organism
Message-ID: <20060526165711.94194.qmail@web51708.mail.yahoo.com>

Hi,
I would like to BLAST a list of proteins against the refseq database
and retrieve the corresponding accession numbers of the exact hits. I
get errors when I change from the nr database to the refseq database.
Also, I am trying to restrict the results by organism name, but that was
not successful.
result_handle = NCBIWWW.qblast("blastp", "nr", seq, entrez_query='"rattus norvegicus" [Organism]')
result_handle = NCBIWWW.qblast("blastp", "refseq", seq, entrez_query='"rattus norvegicus" [Organism]')
Is it possible to do refseq searches with NCBIWWW.qblast?
What do you think I am doing wrong, any help would be appreciated.
The code attached below.

Thanks,
Denil Wickrama
Bioinformatics research assistant
Biochemistry and Biomedical Sciences
McMaster University


from Bio import Fasta
from Bio import GenBank
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIStandalone
import csv
from Bio.SeqRecord import SeqRecord
from sys import *

ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'fasta', parser = Fasta.RecordParser())

#gi_number is sent to this function from another function
def BlastGI(gi_number):
    print gi_number
    seq = ncbi_dict[gi_number]
    # I would like to change "nr" to "refseq" but I get an error when I do so
    # I am trying to us entrez_query='"rattus norvegicus" [Organism]' to restrict the result to those from rat
    save_file = open('my_blast.out', 'w')
    blast_results = result_handle.read()

    save_file.write("")
    save_file.write(blast_results)
    save_file.close()
    blast_out = open('my_blast.out', 'r')
    from Bio.Blast import NCBIXML
    b_parser = NCBIXML.BlastParser()
    b_record = b_parser.parse(blast_out)

    #http://biopython.org/docs/tutorial/Tutorial004.html#toc10
    E_VALUE_THRESH = 0.001
    hspHit = False
    exactHit = False
    print 'GI:', gi_number,"";
    print 'alignments', len(b_record.alignments),"";
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                hspHit = True
                if hsp.sbjct == seq.sequence:
                    #print 'sequence:', hsp.sbjct
                    print 'title:', hits.name
                    exactHit = True
    if exactHit == True:
        print "Exact HIT: True, 1, 1"
    else:
        print "Exact HIT: False, 0, 1"

From mdehoon at c2b2.columbia.edu Fri May 26 22:33:15 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Fri, 26 May 2006 15:33:15 -0700
Subject: [BioPython] NCBIWWW.qblast with refseq by organism
In-Reply-To: <20060526165711.94194.qmail@web51708.mail.yahoo.com>
References: <20060526165711.94194.qmail@web51708.mail.yahoo.com>
Message-ID: <4477822B.80002@c2b2.columbia.edu>

Which gi_number do you use when you call BlastGI?

--Michiel

Denil Wickrama wrote:
> Hi,
> I would like to BLAST a list of proteins against the refseq database
> and retrieve the corresponding accession numbers of the exact hits. I
> get errors when I change from the nr database to the refseq database.
> Also I am trying to restrict the results by organism name, but that
> was not successful.
> result_handle = NCBIWWW.qblast("blastp", "nr", seq, entrez_query='"rattus norvegicus" [Organism]')
> result_handle = NCBIWWW.qblast("blastp", "refseq", seq, entrez_query='"rattus norvegicus" [Organism]')
> Is it possible to do refseq searches with NCBIWWW.qblast?
> What do you think I am doing wrong, any help would be appreciated.
> The code attached below.
>
> Thanks,
> Denil Wickrama
> Bioinformatics research assistant
> Biochemistry and Biomedical Sciences
> McMaster University
>
>
> from Bio import Fasta
> from Bio import GenBank
> from Bio.Blast import NCBIWWW
> from Bio.Blast import NCBIStandalone
> import csv
> from Bio.SeqRecord import SeqRecord
> from sys import *
>
> ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'fasta', parser = Fasta.RecordParser())
>
> #gi_number is sent to this function from another function
> def BlastGI(gi_number):
>     print gi_number
>     seq = ncbi_dict[gi_number]
>     # I would like to change "nr" to "refseq" but I get an error when I do so
>     # I am trying to us entrez_query='"rattus norvegicus" [Organism]' to restrict the result to those from rat
>     save_file = open('my_blast.out', 'w')
>     blast_results = result_handle.read()
>
>     save_file.write("")
>     save_file.write(blast_results)
>     save_file.close()
>     blast_out = open('my_blast.out', 'r')
>     from Bio.Blast import NCBIXML
>     b_parser = NCBIXML.BlastParser()
>     b_record = b_parser.parse(blast_out)
>
>     #http://biopython.org/docs/tutorial/Tutorial004.html#toc10
>     E_VALUE_THRESH = 0.001
>     hspHit = False
>     exactHit = False
>     print 'GI:', gi_number,"";
>     print 'alignments', len(b_record.alignments),"";
>     for alignment in b_record.alignments:
>         for hsp in alignment.hsps:
>             if hsp.expect < E_VALUE_THRESH:
>                 hspHit = True
>                 if hsp.sbjct == seq.sequence:
>                     #print 'sequence:', hsp.sbjct
>                     print 'title:', hits.name
>                     exactHit = True
>     if exactHit == True:
>         print "Exact HIT: True, 1, 1"
>     else:
>         print "Exact HIT: False, 0, 1"
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From nauman.maqbool at agresearch.co.nz Mon May 29 21:41:09 2006
From: nauman.maqbool at agresearch.co.nz (Maqbool, Nauman)
Date: Tue, 30 May 2006 09:41:09 +1200
Subject: [BioPython] Reading Affy's .CEL files
Message-ID:

Hi

I was wondering how the Bio.Affy.CelFile module handles a .Cel file,
while the .Cel file is in binary format? Does the .Cel file need to be
converted to a text file first using GCOS?

Any pointers would be much appreciated.

Regards
Nauman

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From alpersoyler at yahoo.com Wed May 31 11:40:48 2006
From: alpersoyler at yahoo.com (alper soyler)
Date: Wed, 31 May 2006 04:40:48 -0700 (PDT)
Subject: [BioPython] NCBIWWW.qblast
Message-ID: <20060531114048.83077.qmail@web36813.mail.mud.yahoo.com>

Dear All,

I have a fasta file (called fasta) containing 20 proteins. I want to
blast them one after another. How can I write the results for these 20
proteins to different output files? I tried the script below, but the
'my_blast2.out' file turned out empty. Can you help me please?

regards,
Alper

#!usr/local/bin/python

from Bio import Fasta

file_for_blast = open('fasta', 'r')
f_iterator = Fasta.Iterator(file_for_blast)

f_record = f_iterator.next()

from Bio.Blast import NCBIWWW
result_handle = NCBIWWW.qblast('blastp', 'nr', f_record)

seqnum = 0
for f_record in f_iterator:
    save_file = open('my_blast.out', 'w')
    blast_results = result_handle.read()
    save_file.write(blast_results)
    save_file.close()
    seqnum += 1

save_file2 = open('my_blast2.out', 'w')
blast_results = result_handle.read()
save_file2.write(blast_results)
save_file2.close()
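
One likely reason for the empty file is that the script runs a single search before the loop and then calls read() on the same handle several times; once a handle has been read to the end, further reads return an empty string. A rough sketch of the "one search, one file per sequence" pattern follows. It is written for a current Python 3 interpreter, save_each_result and fake_search are hypothetical names, and fake_search is a stand-in so the example runs offline; in real use the search argument would wrap a call like the NCBIWWW.qblast('blastp', 'nr', ...) line from the script, followed by read().

```python
import os
import tempfile


def save_each_result(records, search, out_dir):
    """Run `search` once per record and write each result to its own file."""
    paths = []
    for number, record in enumerate(records, start=1):
        result_text = search(record)  # a fresh search for every sequence
        path = os.path.join(out_dir, "my_blast%d.out" % number)
        with open(path, "w") as handle:
            handle.write(result_text)
        paths.append(path)
    return paths


def fake_search(record):
    """Offline stand-in for a real BLAST call; returns a text result."""
    return "results for " + record


out_dir = tempfile.mkdtemp()
written = save_each_result([">p1\nMKV", ">p2\nMLS"], fake_search, out_dir)
print(written)
```

Numbering the output files from the loop counter keeps one result per sequence and avoids overwriting 'my_blast.out' on every pass.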