From rcsqtc at iiqab.csic.es Mon Aug 1 11:32:18 2005
From: rcsqtc at iiqab.csic.es (Ramon Crehuet)
Date: Mon Aug 1 11:23:17 2005
Subject: [BioPython] Superimposing CA atoms of a chain
Message-ID: <42EE4082.9030702@iiqab.csic.es>

I'd like to superimpose two chains (all atoms from all residues) but
calculating the RMS only from CA atoms. That is, I'd like to calculate
the transformation matrix for the CA atoms and apply it to all atoms.
(A common operation, I guess...)
Can I do that with the PDB.superimpose module? Otherwise, if I need to
use the SVDSuperimpose, can I manipulate atom instances, or does it only
work with numeric arrays?

Thanks,
Ramon

From thamelry at binf.ku.dk Mon Aug 1 11:26:52 2005
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Mon Aug 1 12:11:36 2005
Subject: [BioPython] Superimposing CA atoms of a chain
In-Reply-To: <42EE4082.9030702@iiqab.csic.es>
References: <42EE4082.9030702@iiqab.csic.es>
Message-ID: <200508011726.52569.thamelry@binf.ku.dk>

On Monday 01 August 2005 17:32, Ramon Crehuet wrote:
> I'd like to superimpose two chains (all atoms from all residues) but
> calculating the RMS only from CA atoms. That is, I'd like to calculate
> the transformation matrix for the CA atoms and apply it to all atoms.
> (A common operation, I guess...)
> Can I do that with the PDB.superimpose module?

Yes. Use the PDB.superimpose module to calculate the rotation/translation
for the CA atoms only, and then apply these to the atoms you want using the
transform(rotation, translation) method of the atom object.

-Thomas

From amorgan at mitre.org Tue Aug 2 16:07:50 2005
From: amorgan at mitre.org (Alexander A. Morgan)
Date: Tue Aug 2 15:59:24 2005
Subject: [BioPython] Changes in NCBI BLAST output format !!??
In-Reply-To: <1121786507.42dd1a8b9dee5@imp3-q.free.fr>
References: <1121786507.42dd1a8b9dee5@imp3-q.free.fr>
Message-ID: <42EFD296.3020708@mitre.org>

Hello:
I've just run into the same problem, and I haven't seen a suggested fix
go by, so I apologize if this is redundant information, but it seems that
the files I've been getting from NCBI have a <...> removed from the header
between the "RID: " line and the "<...>Query" line, and it is just a blank
line now. If you edit Bio.Blast.NCBIWWW to not look for the "<...>", it
seems to work okay.

class _Scanner:
    ....
    def _scan_header(self, uhandle, consumer):
        ....

change:
    attempt_read_and_call(uhandle, consumer.noevent, start='<...>')
to:
    attempt_read_and_call(uhandle, consumer.noevent)

aurelie.bornot@free.fr wrote:

>Thank you very much Jessica !!!
>
>Unfortunately, I need a lot of thing in the BLAST reports.....
>It will be difficult to do the same thing as you did....
>
>I will try to do something in the code of parser of Python.
>But it will be difficult for me..
>so if you or someone has advices !!!
>
>Thanks a lot again for your answer Jessica !
>Aurélie
>
>--------------
>Aurelie BORNOT
>MNHN
>Paris

From mdehoon at c2b2.columbia.edu Wed Aug 3 14:37:33 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed Aug 3 14:32:59 2005
Subject: [BioPython] Changes in NCBI BLAST output format !!??
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE7AC283@cgcmail.cgc.cpmc.columbia.edu>

Do you happen to know if this change can break anything else in the Blast
parser? From running Biopython's tests for Blast, it seems that this change
is OK. On the other hand, I don't use Blast much myself, so I don't trust my
own judgement in this matter.
If making this change does not cause any new bugs, I'd be happy to include it
in CVS.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From aurelie.bornot at free.fr Wed Aug 3 14:54:36 2005
From: aurelie.bornot at free.fr (Aurélie Bornot)
Date: Wed Aug 3 14:44:42 2005
Subject: [BioPython] Changes in NCBI BLAST output format !!??
References: <1121786507.42dd1a8b9dee5@imp3-q.free.fr> <42EFD296.3020708@mitre.org>
Message-ID: <001b01c5985c$cc046360$0b413851@YSENGARD>

Thank you very much Alexander !!
I didn't dare to change the code in Bio.Blast.NCBIWWW on my own
because I didn't have time to run tests...
So I simply added the <...> back to the Blast file automatically...
not very nice.. I know !
I will try your method instead !
Thanks !
Aurélie

--------------
Aurelie BORNOT
MNHN
Paris

From dtomso at athenixcorp.com Thu Aug 4 16:40:17 2005
From: dtomso at athenixcorp.com (Daniel Tomso)
Date: Thu Aug 4 16:32:27 2005
Subject: [BioPython] Blast and multiple processors
Message-ID:

Hello, all.

I'm working on improving my BLAST throughput, and I have some questions
about how the program handles multiple processors and multiple processes.

Specifically, I've been experimenting with using BioPython's NCBIStandalone
to handle 3 or 4 simultaneous blast requests, since my system has 4
processors. I spin out the requests via NCBIStandalone.blastall(blah, blah),
then grab the blast_out and blast_err file handles in a list. Afterwards, I
use blast_out.read() to collect the reports from each of the 4 processes.

Is this wise and/or efficient? My execution times do drop off when I do,
say, 4 jobs at a time instead of 1 at a time, so it is helping. Do the
processor flags for blastall accomplish this more efficiently?

Sorry if this is not specific enough, but any insight would be welcome!!!!

Dan T.

Daniel J. Tomso
Senior Scientist, Bioinformatics
Athenix Corporation
2202 Ellis Road Suite B
Durham, NC 27703
919.281.0920
dtomso@athenixcorp.com
www.athenixcorp.com

Disclaimer: This message (including any attachments) may contain
confidential or privileged information and is intended only for the use of
the addressee named above. If you are not the intended recipient of this
message, you are hereby notified that you must not use, copy, disclose or
take any action based on this message or information herein. If you have
received this message in error, please advise the sender immediately and
erase all copies of this message and any related attachments. Thank you.

From mdehoon at c2b2.columbia.edu Sun Aug 7 20:08:24 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun Aug 7 20:01:04 2005
Subject: [BioPython] Changes in NCBI BLAST output format !!??
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE7AC297@cgcmail.cgc.cpmc.columbia.edu>

I've updated Biopython in CVS with this fix. See Bio/Blast/NCBIWWW.py
revision 1.41. Please let me know if you find any problems.
Thanks for finding this solution.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: Alexander A. Morgan [mailto:amorgan@mitre.org]
Sent: Wed 8/3/2005 2:48 PM
To: Michiel De Hoon
Subject: Re: [BioPython] Changes in NCBI BLAST output format !!??

Michiel:
I couldn't find anything wrong with it. The Blast record objects seem
to be correct and have the right alignments. However, I don't know of a
thorough way to test it. In general the parser is pretty fragile though and
will break for even the most minor changes in NCBI format, but it would be
very challenging to try to make it more robust.
Thanks,
-Alex

Michiel De Hoon wrote:

>Do you happen to know if this change can break anything else in the Blast
>parser? From running Biopython's tests for Blast, it seems that this change
>is OK.
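
The change discussed in this thread (no longer requiring one specific markup
line between the "RID:" line and the "Query" line) can be sketched outside
Biopython as shown below. This is only a minimal illustration and not the code
in Bio/Blast/NCBIWWW.py: read_header and its separator argument are
hypothetical names, the two sample reports are invented, and because the
literal tag did not survive in the archived messages, "<p>" is just a stand-in
for it.

```python
import io


def read_header(handle, separator="<p>"):
    """Read a toy 'RID:' / 'Query' header, tolerating a changed separator.

    `separator` is a stand-in (an assumption) for the markup line that older
    reports carried between the two header lines; newer reports have a blank
    line in that position.
    """
    rid_line = handle.readline().strip()
    if not rid_line.startswith("RID:"):
        raise ValueError("expected an 'RID:' line, got %r" % rid_line)

    line = handle.readline()
    # Accept the old markup line or the new blank line, but require neither.
    if line.strip() in ("", separator):
        line = handle.readline()
    if not line.strip().startswith("Query"):
        raise ValueError("expected a 'Query' line, got %r" % line)

    rid = rid_line.split(":", 1)[1].strip()
    return rid, line.strip()


# Invented sample headers: one in the old layout, one in the new layout.
old_style = io.StringIO("RID: 1123\n<p>\nQuery= test\n")
new_style = io.StringIO("RID: 1123\n\nQuery= test\n")
print(read_header(old_style))
print(read_header(new_style))
```

Both layouts parse to the same result here. Accepting either form, instead of
insisting on one, is the kind of tolerance Alex notes the parser generally
lacks.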

------------------------------

Message: 4
Date: Tue, 09 Aug 2005 09:13:10 -0400
From: Greg Wilson
Subject: [BioPython] re: software skills course
To: biopython@biopython.org
Message-ID:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,

I'm working with support from the Python Software Foundation to develop
an open source course on basic software development skills for people
with backgrounds in science and engineering. I have a beta version of
the course notes ready for review, and would like to pull in people
in sci&eng to look it over and give me feedback. If you know anyone
who fits this bill (particularly people who might be interested in
following along with a trial run of the course this fall), I'd be
grateful for pointers.

Thanks,
Greg Wilson

------------------------------

Message: 5
Date: Wed, 10 Aug 2005 13:55:09 +0800
From: "xuying"
Subject: [BioPython] where to find an updated cookbook?
To: "biopython"
Message-ID: <20050810055436.9B14C10DECC@smtp.sibsnet.org>
Content-Type: text/plain; charset="gb2312"

Can anyone tell me where to find an updated biopython tutorial?
Examples in the online tutorial are full of errors. Thanks!

xuying
xuying@sibs.ac.cn
2005-08-10

------------------------------

Message: 6
Date: Wed, 10 Aug 2005 13:11:49 -0700
From: Ann Loraine
Subject: Re: [BioPython] re: software skills course
To: Greg Wilson
Cc: biopython@biopython.org
Message-ID: <6f16141077ca7fbb9bd08da6746e2b5d@loraine.net>
Content-Type: text/plain; charset=US-ASCII; format=flowed

Hello,

I would appreciate the chance to see the notes. It would be helpful to
the postdocs and students I supervise who would like to learn Python.

Yours,

Ann Loraine

------------------------------

Message: 7
Date: Thu, 11 Aug 2005 13:59:35 -0400
From: "Michiel De Hoon"
Subject: RE: [BioPython] where to find an updated cookbook?
To: "xuying", "biopython"
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE7AC2B0@cgcmail.cgc.cpmc.columbia.edu>
Content-Type: text/plain; charset="iso-8859-1"

> Can anyone tell me where to find an updated biopython tutorial?
> Examples in the online tutorial are full of errors. Thanks!

Can you make a list of the errors that you found? Then it'll be easier for us
to fix those errors. If you have a solution to the errors that you found,
even better!

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

------------------------------

Message: 8
Date: Fri, 19 Aug 2005 09:41:10 +0200
From: Jerome PANSANEL
Subject: [BioPython] Vector NTI file import
To: biopython@biopython.org
Message-ID: <200508190941.11168.j.pansanel@pansanel.net>
Content-Type: text/plain; charset="iso-8859-1"

Hello,

I would like to write some support for Vector NTI file (derived from
GenBank format) for biopython. Is already someone working on it ?
Is someone interested for debugging ?

Thanks,

Jerome Pansanel

------------------------------

Message: 9
Date: Fri, 19 Aug 2005 11:39:09 +0200
From: Frederic Sohm
Subject: Re: [BioPython] Vector NTI file import
To: biopython@biopython.org, Jerome PANSANEL
Message-ID: <200508191139.09551.frederic.sohm@iaf.cnrs-gif.fr>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I am definitely. Interested I mean. How do you plan to work on the NTI file
format? I have had a look at it and it seems particularly complex. What kind
of support do you have in mind?

Cheers

Fred

--
Frédéric Sohm
Equipe INRA U1126
"Morphogenèse du système nerveux des Chordés"
UPR 2197 DEPSN, CNRS
Institut de Neurosciences A. Fessard
1 Avenue de la Terrasse
91 198 GIF-SUR-YVETTE
FRANCE
Phone: +33 (0) 1 69 82 34 12
Fax:+33 (0) 1 69 82 34 47

------------------------------

Message: 10
Date: Fri, 19 Aug 2005 09:31:55 -0300
From: Sebastian Bassi
Subject: Re: [BioPython] Vector NTI file import
To: biopython@biopython.org
Message-ID: <4305D13B.3060308@genesdigitales.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Jerome PANSANEL wrote:
> I would like to write some support for Vector NTI file (derived from GenBank
> format) for biopython. Is already someone working on it ?
> Is someone interested for debugging ?

I do have a valid license for VNTI (not mine, but from the place I
work), I could provide files.
First obvious question: Is it documented?
I could check manual if needed :)

PS: Just changed email address, sorry if it is duplicate.

------------------------------

Message: 11
Date: Fri, 19 Aug 2005 17:43:18 +0200
From: Jerome PANSANEL
Subject: Re: [BioPython] Vector NTI file import
To: biopython@biopython.org
Message-ID: <200508191743.19070.j.pansanel@pansanel.net>
Content-Type: text/plain; charset="iso-8859-1"

On Friday 19 August 2005 14:31, Sebastian Bassi wrote:
> Jerome PANSANEL wrote:
> > I would like to write some support for Vector NTI file (derived from
> > GenBank format) for biopython. Is already someone working on it ?
> > Is someone interested for debugging ?
>
> I do have a valid license for VNTI (not mine, but from the place I
> work), I could provide files.

That would be great!

> First obvious question: Is it documented?
> I could check manual if needed :)

I have not found any documentation. I only know that it is very similar to
the GenBank file format. The main differences are:
- the header, which can only contain LOCUS and SOURCE
- a lot of COMMENT entries

Jerome Pansanel

>
>
> PS: Just changed email address, sorry if it is duplicate.
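
Jerome's description (a header limited to LOCUS and SOURCE, plus many COMMENT
entries) is easy to check against real files with a small keyword tally. The
sketch below assumes, as the thread suggests but does not confirm, that Vector
NTI exports keep the GenBank flat-file conventions: keywords start in column
0, continuation lines are indented, and "//" ends a record.
top_level_keywords is a hypothetical helper, and the sample record is invented
for illustration only.

```python
from collections import Counter


def top_level_keywords(lines):
    """Count the column-0 keywords in the first GenBank-style record."""
    counts = Counter()
    for line in lines:
        if line.startswith("//"):
            break  # record terminator: stop after the first record
        if not line.strip() or line[0].isspace():
            continue  # blank line, or an indented continuation/sequence line
        counts[line.split()[0]] += 1
    return counts


# Invented sample record (not taken from a real Vector NTI export).
sample = """LOCUS       demo 12 bp DNA
SOURCE      .
COMMENT     first note
COMMENT     second note
            continued on an indented line
FEATURES             Location/Qualifiers
ORIGIN
        1 acgtacgtac gt
//
LOCUS       second_record_is_ignored
""".splitlines()

for keyword, count in sorted(top_level_keywords(sample).items()):
    print(keyword, count)
```

Running the same tally over a plain GenBank file and over a Vector NTI export
would show which keywords appear in one but not the other, which is a cheap
first step before deciding how much of the format an importer has to handle.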

------------------------------

_______________________________________________
BioPython mailing list  -  BioPython@biopython.org
http://biopython.org/mailman/listinfo/biopython

End of BioPython Digest, Vol 32, Issue 2
****************************************

From frederic.sohm at iaf.cnrs-gif.fr Mon Aug 22 04:13:13 2005
From: frederic.sohm at iaf.cnrs-gif.fr (Frederic Sohm)
Date: Mon Aug 22 04:04:28 2005
Subject: [BioPython] Vector NTI file import
Message-ID: <200508221013.13555.frederic.sohm@iaf.cnrs-gif.fr>

On Friday 19 August 2005 17:44, you wrote:
> Hi
>
> On Friday 19 August 2005 11:39, you wrote:
> > Hi,
> >
> > I am definitely. Interested I mean. How do you plan to work on the NTI
> > file format? I have had a look to it and it seems particularly complex.
> > What kind of support do you have in mind?
>
> I think about importing data. vector NTI seems to easely import genbank
> file, so it is not necessary to export this type of file, it isn't ?
>

Yes, that is largely enough. But what I mean is: do you plan to import the
GenBank part of the Vector NTI format (the GenBank fields after Features), or
the Vector NTI part of it, which records everything displayed by Vector NTI
in the graphical map? It makes quite a difference to the amount of code to
write.
Anyway, I can do some testing of your code.
Good luck.

Fred

> Jerome

--
Frédéric Sohm
Equipe INRA U1126
"Morphogenèse du système nerveux des Chordés"
UPR 2197 DEPSN, CNRS
Institut de Neurosciences A. Fessard
1 Avenue de la Terrasse
91 198 GIF-SUR-YVETTE
FRANCE
Phone: +33 (0) 1 69 82 34 12
Fax:+33 (0) 1 69 82 34 47

From mdehoon at c2b2.columbia.edu Mon Aug 22 10:26:01 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon Aug 22 10:17:07 2005
Subject: [BioPython] FW: NETTAB 2005 - Deadlines approaching: early registration and call (fwd)
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE7AC2CB@cgcmail.cgc.cpmc.columbia.edu>

FYI

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: mailman-bounces@portal.open-bio.org on behalf of Paolo Romano
Sent: Mon 8/22/2005 9:52 AM
To: biopython-owner@biopython.org
Subject: NETTAB 2005 - Deadlines approaching: early registration and call (fwd)

Dear list owner,
I would be glad if you could forward the following message to your
mailing list.
Thank you in advance. Best regards.

Paolo Romano

This message has been forwarded by paolo.romano@istge.it
--------

Dear all,

this is a reminder of the upcoming deadlines of the NETTAB 2005 Workshop on
"Workflows management: new abilities for the biological information overflow"
that will be held on October 5-7, 2005, in Naples, Italy.

----------------------------------------------------------------------

The Scientific Programme is now available on-line at
http://www.nettab.org/2005/progr.html .

The Opening Lecture will be given by Francis Ouellette.
Francis Ouellette is the Director of the UBiC, the Bioinformatics Centre
of the University of British Columbia, Canada.
He is Associate Professor of the Michael Smith Laboratories and of the
Department of Medical Genetics of UBC. He is also a core faculty member in
the new UBC graduate Training Program in Bioinformatics for Health Research,
associate director of Bioinformatics at Genome British Columbia and director
of the Canadian Genetic Diseases Network (CGDN) bioinformatics core facility,
where he helps coordinate the Canadian Bioinformatics Workshops.
Francis has an exceptional curriculum, and his recent research, training and
coordination activities make him one of the best known and most appreciated
bioinformaticians.
The title of his opening talk will be "Workflow management in
bioinformatics: the possibilities and the challenges".

----------------------------------------------------------------------

The Call for posters and position papers closes next Friday, August 26, 2005.
You are all warmly invited to present your recent activity related to the
workshop's topics by submitting a poster abstract (1-2 A4 pages, font size
12 pt, MS Word format) by email to posters2005@nettab.org .

Topics are the following:

Technologies and technological platforms of interest, with emphasis on:
- Web Services (SOAP, WSDL, WSFL, UDDI, ....)
- Web Services Choreography and Orchestration
- Semantic Web (RDF, LSID, OWL, ...)
- comparison of available technologies, limitations, pros and cons
- knowledge representation
- biological data and knowledge modeling tools
- Ontologies, Databases and Applications of Semantics in Bioinformatics

Workflow management systems in bioinformatics
- implementations of web services
- implementations of registries
- reuse and versioning of web services and workflows
- workflow management systems
- web interfaces for accessing and executing workflows
- interactive systems to support workflows

Applications of workflow management systems in bioinformatics
- Methodologies for life sciences analysis, such as:
  - gene expression,
  - genome annotation,
  - mass spec peptide fragment identification,
- Encoding of the above in workflows
- Case studies
- Scenarios and use cases

Check all details at: http://www.nettab.org/2005/call.html .

-----------------------------------------------------------------------

The early registration deadline is also next Friday, August 26, 2005.
The registration form is available on-line at:
http://www.nettab.org/2005/rform.html .
The payment of the fee can either be done on-line, through the Online Payment
Form of the Bioinformatics Italian Society (BITS), or by direct money
transfer.

Participation fees are as follows:

Until August 26, 2005:
- Students: 70,00 Euro
- Academic: 130,00 Euro (reduced fee: 117,00 Euro)
- Non-academic: 270,00 Euro (reduced fee: 243,00 Euro)

After August 26, 2005:
- Students: 70,00 Euro
- Academic: 180,00 Euro (reduced fee: 162,00 Euro)
- Non-academic: 370,00 Euro (reduced fee: 333,00 Euro)

The 10% reduction on fees is applied for members of:
- ISCB (International Society for Computational Biology), http://www.iscb.org/
- BITS (Bioinformatics Italian Society), http://www.bioinformatics.it/
- Hormone Responsive Breast Cancer (HRBC) Genomics Network, http://www.hrbc-genomics.net/
- Oncology over Internet (O2I) project, http://www.o2i.it/
- Interdisciplinary Laboratory for Technologies in Bioinformatics (LITBIO)

--------------------------------------------------------------------------------

I'm looking forward to meeting many of you in Naples quite soon. Ciao.

Paolo

--
Paolo Romano (paolo.romano@istge.it)
Bioinformatics and Structural Proteomics
National Cancer Research Institute (IST)
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288 Fax: +39-010-5737-295
Web: http://www.nettab.org/promano/

From meames at itsa.ucsf.edu Tue Aug 23 15:42:32 2005
From: meames at itsa.ucsf.edu (meames@itsa.ucsf.edu)
Date: Tue Aug 23 15:31:59 2005
Subject: [BioPython] Formating files for Clustalw
Message-ID: <200508231942.j7NJgWmG029335@itsa.ucsf.edu>

Hi all

I'm working my way through the cookbook and I've run into a snag in section
3.5.1 - Clustalw

I've created a simple two-entry FASTA file for aligning, but the parser
appears to reject it. There are no question marks or other punctuation in the
titles (such as I've read about on this board) that would seem to give it
trouble, so I'm at a bit of a loss. Can anyone help? (I'm running clustalw
1.81)

Here is the error message:

Traceback (most recent call last):
  File "./practice.py", line 21, in ?
alignment = Clustalw.do_alignment(cline) File "/usr/lib/python2.3/site-packages/Bio/Clustalw/__init__.py", line 116, in do_alignme nt return parse_file(out_file, alphabet) File "/usr/lib/python2.3/site-packages/Bio/Clustalw/__init__.py", line 55, in parse_file parser.parseFile(to_parse) File "/usr/lib/python2.3/site-packages/Martel/Parser.py", line 328, in parseFile self.parseString(fileobj.read()) File "/usr/lib/python2.3/site-packages/Martel/Parser.py", line 356, in parseString self._err_handler.fatalError(result) File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError raise exception Martel.Parser.ParserPositionException: error parsing at or beyond character 0 Here is the code: cline = MultipleAlignCL('file_to_align') cline.set_output('test.aln') alignment = Clustalw.do_alignment(cline) Here is the file_to_align: >sptrembl|Q00647|Q00647 Myosin I heavy chain. [Emericella nidulans] MGHSRRPAGGEKKSRFGRSKAAADVGDGRQAGGKPQVRKAVFESTKKKEIGVSDLTLLSK ISNEAINDNLKLRFQHDEIYTYIGHVLVSVNPFRDLGIYTDSVLNSYRGKNRLEVPPHVF AVAESAYYNMKSYKDNQCVIISGESGAGKTEAAKRIMQYIASVSGGSDSSIQQTKDMVLA TNPLLESFGNAKTLRNNNSSRFGKYLELEFNAQGEPVGANITNYLLEKSRVVGQITNERN FHIFYQFAKGAPQKYRDSFGVQQPQSYLYTSRSKCFDVPGVDDVAEFQDTLNAMSVIGMS EAEQDNVFRMLAAILWMGNIQFAEDDSGNAAITDQSVVDFVAYLLEVDAGQVNQALTIRM METSRGGRRGSVYEVPLNTTQALAVRDALAKAIYFNLFDWIVGRVNQSLTAKGAVANSIG ILDIYGFEIFEKNSFEQLCINYVNEKLQQIFIQLTLKAEQDEYEREQITWTPIKYFDNKV VCSLIEDKRPPGVFAALNDACATAHADSGAADNTFVGRLNFLGQNPNFENRQGQFIIKHY AGDVSYAVQGMTDKNKDQLLKDLLNLVQSSSNHFVHTLFPEQVNQDDKRRPPTASDKIKA SANDLVAMLMKAQPSYIRTIKPNDNKAPKEFNESNVLHQIKYLGLQENVRIRRAGFAYRQ TFDKFVERFYLLSPKTSYAGDYTWTGDVETGARQILKDTRIPAEEYQMGITKVFIKTPET LFALEAMRDRYWHNMAIRIQRAWRNYLRYRTECAIRIQRFWPRMNGGLELLKLRDQGHTI LGGRKERRRMSILGSRRFLGDYVGISNKGGPGEMIRSGAAISTSDDVLFSCRGEVLVSKF GRSSKPSPRIFVLTNRHVYIVSQNFVNNQLVISSERTIPIGAIKTVSASSYRDDWFSLVV GGQEPDPLCNCVFKTEFFTHLHNALRGQLNLKIGPEIEYNKKPGKLATVKVVKDGSQVDS YKSGTIHTGPGEPPNSVSKPTPRGKQVAARPVTKGKLLRLAVQAVARPNWLPDLYQSVGL 
YHSPRLKQPRRNRHQRPDPFLNQWQPLQHPIHVLHLLPPQGHHPRLLPRPPAAAGPKKAK ALYDFSSDNNGMLSISAGQIVEIVSKEGNGWWLCMNLETSAQGWTPEAYLEEQVAPTPKP APPPPPPVAPRASPAPVNGSAAVAAAKAKAAPPPPAKRPNMAGRKTAPAPPPAPRDSAVS MNSQGDSSGASGRGTPSSVSNACLAGGLAEALRRRQSAMQGKQDDDDDW >gi|17507983|ref|NP_492393.1| F29D10.4 [Caenorhabditis elegans] MAFHWQSKVNVQHVGVDDMVLLPKLTEQSIVENLKKRLQANSIFTYIGPVLISVNPFKQM PYFTEKEMLLYQGAAQYENAPHIYALADNMYRNMLIDNESQCVIISGESGAGKTVNAKFI MNYISRISGGGQKVQHIKDVILQSNPLLEAFGNSATVRNWNSSRFGKYVEIVFSRGGEPI GGKLSNFLLEKSRVVHQNEGDRNFHVFYQLCAGADKNLRSTFGIGELQYYNYLNMSGVFK ADDTDDGKEFESTLHAMKVVGVNDQDQLEVLRIVATVLHIGNITFTEENNFAAVSGKDYL EYPAFLLGLTSADIEAKLTGRKMESKWGTQKEEIDMKLNVEQASYTRDAWVKAIYARLFD YLVKKVNDAMNITSQSTSDNFSVGILDIYGFEIFNNNGFEQFCINFVNEKLQQIFIELTL KAEQEEYVREGIKWTEIDYFDNKIVCDLIETKRPPGIMSLLDDTCAQNHGQREGVDRQLL TTLSKSFAGHPHFGPGSDSFVIKHYAGDVTYNVDGFCDRNRDVLYPDLILLMQKSSRPFI QALFPENVAASAGKRPTTFSTKIRTQANTLVESLMKCSPHYVRCIKPNETKRPNDWEESR VKHQVEYLGLRENIRVRRAGFAYRRAFDKFAQRYAIVSPQTWPCFQGDQQRACEIICDSV HMEKNQYQMGKTKIFVKNPESLFLLEETRERKFDGYARVIQKAWRQFSARKQHIKQKEQA ADLMYGKKERRRYSLNRNFVGDYIGLEHHPTLQSLVGKRQRVLFACTANKYDRKFRVTKL DLLLTVNHLTLIGKEKVKNGPEKGKIVEVIKRQFDLPQIKSIGLSPYQDDFVILYLGNDD YSSLLETPFKTEFCTALSKAYKERTNGTLHLDFRSSHVVSYKKMKFDFSDGKRTVQFGND GTSSAEKTLKPNGKVLNVSIGTGLPNTTRPSTERPQGGYTPRRDQLRTSTRRTKQNNQSY GQNGQSQAMRAPVPAHGMNNNYNQTPAPVSTNHQYSQEPARIPVMGNVINQLNNMNLSGN GNSPAGRGPPPARGPKPPPPAKPKLNPVVIAVYPYEAQDVDELSFEAGAEIELMNKDASG WWQGKVNNRVGLFPGNYVKE I've also attempted to run the simple command line: clustalw ./file_to_align -OUTFILE=test.aln without success, resulting in the error message: Error: unknown option -./file_to_align Thanks Matt "I'm new at this" Eames From j.pansanel at pansanel.net Wed Aug 24 03:42:02 2005 From: j.pansanel at pansanel.net (Jerome PANSANEL) Date: Wed Aug 24 03:37:59 2005 Subject: [BioPython] Formating files for Clustalw In-Reply-To: <200508231942.j7NJgWmG029335@itsa.ucsf.edu> References: <200508231942.j7NJgWmG029335@itsa.ucsf.edu> Message-ID: <200508240942.03064.j.pansanel@pansanel.net> Le 
Mardi 23 Août 2005 21:42, meames@itsa.ucsf.edu a écrit :
> Hi all
> ...

Hi,

1. clustalw -infile=file_to_align -outfile=test.aln works fine for me (clustalw 1.83).

2. In your code, what does your test.aln file look like? Is it a valid clustalw file? Does your 'test.aln' file look like this:

CLUSTAL W (1.83) multiple sequence alignment

sptrembl|Q00647|Q00647          MGHSRRPAGGEKKSRFGRSKAAADVGDGRQAGGKPQVRKAVFESTKKKEI
gi|17507983|ref|NP_492393.1|    MAFHWQSK------------------------------------VNVQHV
                                *.. :. .: :.:

sptrembl|Q00647|Q00647          GVSDLTLLSKISNEAINDNLKLRFQHDEIYTYIGHVLVSVNPFRDLGIYT
gi|17507983|ref|NP_492393.1|    GVDDMVLLPKLTEQSIVENLKKRLQANSIFTYIGPVLISVNPFKQMPYFT
                                **.*:.**.*:::::* :*** *:* :.*:**** **:*****::: :*
...

Jerome Pansanel
--

From pwilkinson_m at xbioinformatics.org Wed Aug 24 16:31:30 2005
From: pwilkinson_m at xbioinformatics.org (Peter Wilkinson)
Date: Wed Aug 24 16:20:05 2005
Subject: [BioPython] Vector NTI file import
In-Reply-To: <200508191744.49362.j.pansanel@pansanel.net>
References: <200508190941.11168.j.pansanel@pansanel.net> <200508191139.09551.frederic.sohm@iaf.cnrs-gif.fr> <200508191744.49362.j.pansanel@pansanel.net>
Message-ID: <6.2.1.2.0.20050824162113.034978e8@mail.xbioinformatics.org>

Hi there,

.... I used to work for the company that created Vector NTI.

The Vector NTI format is an adulteration of the GenBank format. The format is simple: GenBank plus additional data in the COMMENT tag. Vector NTI stores additional data associated with your sequence in flat files in the back end. To keep its exports in the 'GenBank' format, it takes that additional data and stores it as name/value pairs inside a GenBank COMMENT tag, along with some serialization information kept in NTI (an NTI proprietary serialization format ...). You will immediately see how this works if you export a GenBank file from Vector NTI.
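
A rough sketch of how the COMMENT lines of such an export could be pulled out in plain Python follows. The pipe-delimited NAME|value| layout, the EXAMPLE_* names and the sample record are all assumptions made up for illustration; this thread does not show the layout Vector NTI actually writes, so check a real export before relying on any of it.

```python
def comment_lines(genbank_text):
    """Return the text of every COMMENT line in a GenBank flat file."""
    lines = []
    in_comment = False
    for raw in genbank_text.splitlines():
        if raw.startswith("COMMENT"):
            in_comment = True
            lines.append(raw[len("COMMENT"):].strip())
        elif in_comment and raw.startswith(" "):
            # Continuation lines of a keyword block are indented.
            lines.append(raw.strip())
        else:
            in_comment = False
    return lines


def name_value_pairs(lines, sep="|"):
    """Split 'NAME|value|' lines into a dict (ASSUMED layout, see above)."""
    pairs = {}
    for line in lines:
        fields = [f for f in line.split(sep) if f]
        if len(fields) >= 2:
            pairs[fields[0]] = sep.join(fields[1:])
    return pairs


# Made-up record: not real Vector NTI output.
SAMPLE = """\
LOCUS       EXAMPLE                   12 bp    DNA     linear   SYN 24-AUG-2005
DEFINITION  Made-up record for illustration only.
COMMENT     This line is ordinary free text.
COMMENT     EXAMPLE_NAME|my_plasmid|
COMMENT     EXAMPLE_AUTHOR|someone|
FEATURES             Location/Qualifiers
ORIGIN
        1 acgtacgtac gt
//
"""

pairs = name_value_pairs(comment_lines(SAMPLE))
print(pairs)
```

Free-text COMMENT lines without the assumed separator are simply skipped, so an ordinary GenBank record yields an empty dict rather than an error.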
If you had added annotations, primers, or anything else, that is where you would find them. This was done to make sure that a GenBank file exported from Vector NTI stays compatible with other software that recognises the GenBank format, while retaining the information that was added within NTI. For me, if you mess with the output formatting, it's no longer officially a GenBank format but a Vector NTI format ... but that is a philosophical debate.

Feel free to contact me and send me a sample output. I have not worked with NTI for a while ... but it will all come back to me.

Peter

At 11:44 AM 19/08/2005, Jerome PANSANEL wrote:
>Hi
>
>Le Vendredi 19 Août 2005 11:39, vous avez écrit :
> > Hi,
> >
> > I am definitely. Interested I mean. How do you plane to work on the NTI
> > file format? I have had a look to it and it seems particularly complex.
> > What kind of support do you have in mind?
>
>I think about importing data. vector NTI seems to easely import genbank file,
>so it is not necessary to export this type of file, it isn't ?
>
>Jerome
>
> > Cheers
> >
> > Fred
> >
> > Le vendredi 19 Août 2005 09:41, Jerome PANSANEL a écrit :
> > > Hello,
> > >
> > > I would like to write some support for Vector NTI file (derived from
> > > GenBank format) for biopython. Is already someone working on it ?
> > > Is someone interested for debugging ?
> > >
> > > Thanks,
> > >
> > > Jerome Pansanel
> > >
> > > _______________________________________________
> > > BioPython mailing list - BioPython@biopython.org
> > > http://biopython.org/mailman/listinfo/biopython
> >
>_______________________________________________
>BioPython mailing list - BioPython@biopython.org
>http://biopython.org/mailman/listinfo/biopython

From pwilkinson_m at xbioinformatics.org Wed Aug 24 16:43:15 2005
From: pwilkinson_m at xbioinformatics.org (Peter Wilkinson)
Date: Wed Aug 24 16:31:43 2005
Subject: [BioPython] Fasta parser, minor (bug/feature?)
In-Reply-To: <200508240942.03064.j.pansanel@pansanel.net>
References: <200508231942.j7NJgWmG029335@itsa.ucsf.edu> <200508240942.03064.j.pansanel@pansanel.net>
Message-ID: <6.2.1.2.0.20050824163137.03497658@mail.xbioinformatics.org>

It seems that the fasta parser retains the OS-specific line endings when it stores the title and sequence in the Record object. When I read a file while working in Windows (eeeeek) and then display the output in a true text editor like Context, I have to write out something like this:

    file_out.writelines(str(cur_record).replace('\r',''))

... because all the line endings are '\r\n', and are displayed in the text editor as 2 returns, or double spacing the text when written to file instead of single space:

>gi|272209|gb|M61959.1| EST00007 Fetal brain, Stratagene (cat#936206) ...
CTTCCCTTTTGTTCCCCTCAGTGTCCCTTTTAATTGCTTCCCTCCATTTTCCTTAGCAGC
ATCCTAGTTGATGGTCTGGGTTATCAGAGGAGCAAAAACATTTAAGTGTCAAATAATGCT
CATTGTCTCCCTGGGATTTCTAAACAGAAAAAATGAAGAAAGAGGCAGAGAAGAGCTTCA

Should the behavior allow both single and OS-specific line returns, or just '\n'? I realise that the Record __str__() method uses os.linesep, but when working with fasta files in a true text editor in Windows ... only the \n is needed. Also, I generally work in a mixed environment, and the \r\n should be avoided. I am unsure why os.linesep is used here. My vote is to just have a plain '\n' applied to each end of line.

Peter

From sbassi at genesdigitales.com Thu Aug 25 14:59:12 2005
From: sbassi at genesdigitales.com (Sebastian Bassi)
Date: Thu Aug 25 14:48:47 2005
Subject: [BioPython] Fasta parser, minor (bug/feature?)
In-Reply-To: <6.2.1.2.0.20050824163137.03497658@mail.xbioinformatics.org>
References: <200508231942.j7NJgWmG029335@itsa.ucsf.edu> <200508240942.03064.j.pansanel@pansanel.net> <6.2.1.2.0.20050824163137.03497658@mail.xbioinformatics.org>
Message-ID: <430E1500.1010804@genesdigitales.com>

Peter Wilkinson wrote:
> It seems that the fasta parser retains the os specific line endings when
> it stores the title and sequence in the Record object, so I have to
> write out something like this when I read a file from working in windows
> (eeeeek), then display using a true text editor like Context:
....

I've just run into this problem too! But it seems that this is something that changed, and I will tell you why I think so: I wrote a script in June 2004 and it worked as expected (without showing double spacing). Then I changed PC and installed Py2.4 and the latest biopython (1.4). Today I ran the same program and it printed the fasta format with double spacing (see attached example). So I think this is a bug caused by either Python 2.4 or BioPy1.4 :)

>QH_CA_Contig1507for
ATTACGGTCGGGGAGTGGATCCGATATCGATATGATGGTAGGGATCCCTAACTCGCGATCTTCAATACGT
TGCTGCAAGTCGTGACAATTCATTTGATTGGGTATGGAGAAACATCATGAGTTATCCGGATGTCAAATTT
CCTTACATAGCAGTTGGTAACGAGGTCAACCCATCCGATGGCACATTGGCTCCATTGGTTCATCCGGCTT
TGACCAACATCCAAGAAGCTGTCTCGTTTTATGGCCTCAAGGATCAAATCAAAGTTTCAACTTCGATCGA
CACATCTATGATTGGAGTTAGTTATCCTCCGTCACAAGGTGCATTCAGCGATGATGCCCGTGCGTACATA
GACCCGATCATCGGGTTCCTAGTTGCCATCAATGCACCATTGTTGGTTAATGTCTATCCATATTTCAGTT
ACACAGGAAATCCGACACAGATATCACTAGCCTATGCAACATTTACTTCTCCTGGAACCGTAGTACAAGA
TGGAGCAAATGGATACCAAAACCTTTTTGACGCGATAGTAGATGCGATGTACTCAGCGTTAGAGAGGGCC

From chris.lasher at gmail.com Thu Aug 25 18:03:24 2005
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu Aug 25 17:54:23 2005
Subject: [BioPython] Why would this GenBank file choke the GB parser?
Message-ID: <128a885f050825150342c609d3@mail.gmail.com>

Hello,

I have a GenBank file, accession AY499671.gb, and 21 like it that I would like to process through BioPython (I am using BioPython 1.40b with Windows), but I am encountering trouble. It seems that the GenBank parser is choking on something in the files themselves, but I could really use help determining what this would be, and in determining how to fix it. The error seems to be raised by the Martel Parser, but exactly what is causing it to raise the error is beyond my knowledge and experience.

I obtained the files from GenBank via the NCBI Entrez website pages, i.e., http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=41080113 . From a page like this one, I selected "File" in the dialog box labeled "Send to", and saved the file. I also tried obtaining the files via BioEdit and saving those, but the parser had difficulty with those as well.

I am attaching my script "gbtoseq.py" that I'm trying to process my GB files with. I have had success with this script on sequences obtained from GenBank in the manner described above and can recreate this success, and I am including one of those successful sequences, AFU75647. I am also attaching the error output when it chokes on these 22 most recent sequences I've obtained.

I sincerely appreciate any help anyone has to offer.

Thanks very much in advance,
Chris Lasher

-------------- next part --------------
A non-text attachment was scrubbed...
Name: AY499671.gb
Type: pubmed/text
Size: 2663 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050825/78c4dd26/AY499671-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AFU75647.gb
Type: pubmed/text
Size: 3096 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050825/78c4dd26/AFU75647-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gbtoseq.py
Type: text/x-python
Size: 1134 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050825/78c4dd26/gbtoseq-0001.py
-------------- next part --------------
C:\Documents and Settings\chris\My Documents\scripts\pythonscripts\gbtoseq>gbtoseq.py
Now on AFU75647.gb
Writing to AFU75647.seq
Now on AY499671.gb
Traceback (most recent call last):
  File "C:\Documents and Settings\chris\My Documents\scripts\pythonscripts\gbtoseq\gbtoseq.py", line 30, in ?
    parserecord = gbiterator.next()
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 129, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 219, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python24\Lib\site-packages\Bio\GenBank\__init__.py", line 1259, in feed
    self._parser.parseFile(handle)
  File "C:\Python24\Lib\site-packages\Martel\Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "C:\Python24\Lib\site-packages\Martel\Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "C:\Python24\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 64

From biopython at maubp.freeserve.co.uk Fri Aug 26 05:01:09 2005
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri Aug 26 04:51:56 2005
Subject: [BioPython] Why would this GenBank file choke the GB parser?
In-Reply-To: <128a885f050825150342c609d3@mail.gmail.com>
References: <128a885f050825150342c609d3@mail.gmail.com>
Message-ID: <430EDA55.303@maubp.freeserve.co.uk>

Chris Lasher wrote:
> Hello,
>
> I have a GenBank file, accession AY499671.gb, and 21 like it that I
> would like to process through BioPython (I am using BioPython 1.40b
> with Windows), but I am encountering trouble....

Hi Chris,

Looking at your GenBank files by eye, I didn't spot anything "wrong" except I note there is a blank final line which has caused trouble in the past:

http://www.biopython.org/pipermail/biopython/2005-April/002607.html

Could you edit the GenBank file by hand to confirm this is the problem?

I don't know if a fix for this was ever made... it should just be a small tweak to the GenBank file format definition for the Martel parser.

-----------------------------------------------------------------------

Alternatively, I'm using a different GenBank parser (in order to cope with much larger GenBank files) and this works fine with your example (attached to previous email).

You could try the patch on this bug to see if it solves your problem:

http://bugzilla.open-bio.org/show_bug.cgi?id=1747

If you have trouble with the patch file, I can send you a modified version of the Bio/GenBank/__init__.py file which you can use to replace the existing one if that is easier.

Note that this version might not work with the GenBank.Dictionary as I have never tried that...

Peter

From jtk at cmp.uea.ac.uk Fri Aug 26 05:59:46 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri Aug 26 05:51:22 2005
Subject: [BioPython] GenBank Format & Parsing (was: Why would this GenBank file choke the GB parser?)
In-Reply-To: <128a885f050825150342c609d3@mail.gmail.com>
References: <128a885f050825150342c609d3@mail.gmail.com>
Message-ID: <20050826095946.GG4175@jtkpc.cmp.uea.ac.uk>

On Thu, Aug 25, 2005 at 06:03:24PM -0400, Chris Lasher wrote:
> Hello,
>
> I have a GenBank file, accession AY499671.gb, and 21 like it that I
> would like to process through BioPython (I am using BioPython 1.40b
> with Windows), but I am encountering trouble. It seems that the
> GenBank parser is choking on something in the files themselves, but I
> could really use help determining what this would be, and in
> determining how to fix it. The error seems to be raised by the Martel
> Parser, but exactly what is causing it to raise the error is beyond my
> lack of knowledge and inexperience.

I've run into similar problems a while ago, the parser is rather picky about certain things.

In your case, AY499671 gives "ENV" as the division in the LOCUS line (first line of the file), and it turns out that BioPython doesn't know about this division. Specifically, this is in Bio/expressions/genbank.py:

    valid_divisions = ["PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT", "RNA",
                       "VRL", "PHG", "SYN", "UNA", "EST", "PAT", "STS", "GSS",
                       "HTG", "HTC", "CON"]

Chances are very good that by adding "ENV" to that list, you'll fix your problem. I've tried changing ENV to BCT in the GenBank file and that fixed it.

While we're at this: Yeast chromosome GenBank files which I downloaded recently have

    ACCESSION   NC_001133 REGION: 1..230208

which the GenBank parser doesn't like either. I've patched my Bio/expressions/genbank.py to accept this, but I haven't been able to find any documentation of this -- I just checked the GenBank release notes (ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt) again. Can anyone comment on this?

Personally, I can't help but wonder whether it would not be possible for the GenBank format to converge to stability after so many years...

Best regards, Jan
--
+- Jan T.
Kim -------------------------------------------------------+
| *NEW* email: jtk@cmp.uea.ac.uk |
| *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk |
*-----=< hierarchical systems are for files, not for humans >=-----*

From chris.lasher at gmail.com Fri Aug 26 10:17:09 2005
From: chris.lasher at gmail.com (Chris Lasher)
Date: Fri Aug 26 10:07:06 2005
Subject: [BioPython] Re: GenBank Format & Parsing (was: Why would this GenBank file choke the GB parser?)
In-Reply-To: <20050826095946.GG4175@jtkpc.cmp.uea.ac.uk>
References: <128a885f050825150342c609d3@mail.gmail.com> <20050826095946.GG4175@jtkpc.cmp.uea.ac.uk>
Message-ID: <128a885f050826071722ba6ea5@mail.gmail.com>

That was tremendously helpful! Thank you very much, Dr. Kim!

Should this change be added to the CVS version of Bio/expressions/genbank.py, and if so, is that something I should do, or something one of the active developers should do?

Thanks again, very much,
Chris Lasher

On 8/26/05, Jan T. Kim wrote:
> I've run into similar problems a while ago, the parser is rather picky
> about certain things.
>
> In your case, AY499671 gives "ENV" as the division in the LOCUS line
> (first line of the file), and it turns out that BioPython doesn't know
> about this division. Specifically, this is in Bio/expressions/genbank.py:
>
> valid_divisions = ["PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT", "RNA",
>                    "VRL", "PHG", "SYN", "UNA", "EST", "PAT", "STS", "GSS",
>                    "HTG", "HTC", "CON"]
>
> Chances are very good that by adding "ENV" to that list, you'll fix your
> problem. I've tried changing ENV to BCT in the GenBank file and that
> fixed it.
>
> While we're at this: Yeast chromosome GenBank files which I downloaded
> recently have
>
> ACCESSION   NC_001133 REGION: 1..230208
>
> which the GenBank parser doesn't like either.
I've patched my
> Bio/expressions/genbank.py to accept this, but I haven't been able to
> find any documentation of this -- I just checked the GenBank release
> notes (ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt) again. Can anyone comment
> on this?
>
> Personally, I can't help but wonder whether it would not be possible for
> the GenBank format to converge to stability after so many years...
>
> Best regards, Jan
> --
> +- Jan T. Kim -------------------------------------------------------+
> | *NEW* email: jtk@cmp.uea.ac.uk |
> | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk |
> *-----=< hierarchical systems are for files, not for humans >=-----*
>

From kael.fischer at gmail.com Tue Aug 30 18:10:04 2005
From: kael.fischer at gmail.com (Kael Fischer)
Date: Tue Aug 30 17:59:18 2005
Subject: [BioPython] BioPython/BioSQL Status? How to move forward?
Message-ID:

Hi all:

I am involved in a metagenomic project that needs a powerful, fast, relational database. After some study of the schema I have decided I would like to try to use BioSQL. Numerous issues with BioPython/BioSQL have come up. Of course I have been doing a lot of Googling and CVS browsing to see what people have found before me, with limited success.

Although BioSQL has some documentation problems, from what I can tell there are a lot of BioPython-specific problems too. At least there are problems we can start working on from the BioPython side.

I am wondering if there is interest here in getting into the nuts and bolts of this. In particular:

1) How much interest, in general, is there in BioSQL and BioPython playing well together?

2) Where should I send my patches?

Rgds,
Kael

--
Kael Fischer, Ph.D
DeRisi Lab - Univ. Of California San Francisco
415-514-4320