From mictadlo at gmail.com Mon Aug 5 02:58:30 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 5 Aug 2013 16:58:30 +1000
Subject: [Biopython] GTF writing
Message-ID:

Hello,

What is necessary to change the following example
http://biopython.org/wiki/GFF_Parsing#Writing_GFF3_from_scratch
so that the output would be GTF instead of GFF3?

Thank you in advance.

Cheers,
Mic

From chapmanb at 50mail.com Mon Aug 5 06:44:05 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 05 Aug 2013 06:44:05 -0400
Subject: [Biopython] GTF writing
In-Reply-To:
References:
Message-ID: <87li4g8lje.fsf@fastmail.fm>

Mic;

> What is necessary to change the following example
> http://biopython.org/wiki/GFF_Parsing#Writing_GFF3_from_scratch
> so the output would be GTF instead of GFF3?

Unfortunately, it doesn't support output to GTF. This was never a priority
because GTF is an underdefined format that GFF3 replaces, so the goal was
to move towards the current specification.

Hope this helps,
Brad

From dalke at dalkescientific.com Mon Aug 5 19:09:05 2013
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 6 Aug 2013 01:09:05 +0200
Subject: [Biopython] Reading large files, Biopython cookbook example
In-Reply-To:
References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu>
Message-ID: <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>

A bit late, but a bit of background:

> On Sun, Jul 14, 2013 at 5:40 PM, Katrina Lexa wrote:
>> My PDB file came from Maestro, so that is the ordering it follows after 9999.

On Jul 15, 2013, at 7:46 PM, Peter Cock wrote:
> i.e. This software package? http://www.schrodinger.com/productpage/14/12/
>
> Could you contact their support to find out why they are doing this please?

Yes, that's the Maestro Katrina was almost certainly talking about. It's a
commercial package which has been around for a while; the company started
in 1990 as a commercialization of the Jaguar QM package from Richard
Friesner's and William Goddard's labs at CalTech.
Maestro is the GUI to their QM and MM codes.

Their conversion routines support various options. See:
https://www.schrodinger.com//AcrobatFile.php?type=supportdocs&type2=&ident=530

The key ones are:

  -hex : Use hexadecimal encoding for atom numbers greater
  than 99999 and for residue numbers greater than 9999

and

  -hybrid36 : Use the hybrid36 scheme for atom serial numbers.
  On input, integers of up to 6 digits and hexadecimal numbers are
  recognized on ATOM records by default. On output, the default is
  to use integers for fewer than 100,000 atoms, and hexadecimal for
  100,000 atoms or more.

Annoyingly, as Robert Hanson reported in:
http://www.mailinglistarchive.com/html/jmol-users at lists.sourceforge.net/2013-01/msg00111.html
(and see the thread at)
http://article.gmane.org/gmane.science.chemistry.blue-obelisk/1659/match=pdb+ok+who%27s+wise+guy

their default output generates records like:

ATOM 99998 H1 TIP3W3304 -28.543 60.673 40.064 1.00 0.00 WT5 H
ATOM 99999 H2 TIP3W3304 -27.773 60.376 41.353 1.00 0.00 WT5 H
ATOM 186a0 OH2 TIP3W3305 -24.713 61.533 47.372 1.00 0.00 WT5 O
ATOM 186a1 H1 TIP3W3305 -25.652 61.772 47.519 1.00 0.00 WT5 H
ATOM 186a2 H2 TIP3W3305 -24.713 61.625 46.379 1.00 0.00 WT5 H

which means there can be two atoms with serial number "18700" (or "99999",
etc.) in the same file, where the same string means two different things.

This obviously messes up all of the other PDB annotations which use a
serial id, but I presume that most Maestro users only use PDB files for
coordinate data, and not for the other fields.

Maestro is the only program I know of which uses this awful form. Enabling
the "-hybrid36" option (first-digit-is-in-base-36) by default would make it
more consistent with tools in the X-PLOR/VMD heritage, where A0000 follows
99999. Presumably they want the full 1,048,575 atom range.
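[Editor's note: the "A0000 follows 99999" rule mentioned above can be made concrete. Below is a minimal sketch of a hybrid-36 decoder for the 5-character atom serial field, based on the published hybrid-36 scheme (plain decimal up to 99999, then base-36 strings starting with an uppercase letter, then lowercase). It is an illustration, not Schrodinger's, X-PLOR's, or cctbx's actual code.]

```python
def hy36decode_serial(s):
    """Decode a 5-character hybrid-36 atom serial field.

    Plain integers cover 0..99999; uppercase base-36 strings
    ("A0000".."ZZZZZ") continue from 100000; lowercase strings
    continue after the uppercase range ends.  A sketch of the
    published hybrid-36 rule, not any program's implementation.
    """
    s = s.strip()
    if s[0].isdigit() or s[0] in "+-":
        return int(s)                      # plain decimal serial
    offset = 10 * 36 ** 4                  # base-36 value of "A0000"
    if s[0].isupper():
        # "A0000" -> 100000, "A0001" -> 100001, ...
        return int(s, 36) - offset + 10 ** 5
    # lowercase range starts where the uppercase range ends (26 * 36**4 values)
    return int(s.upper(), 36) - offset + 10 ** 5 + 26 * 36 ** 4

print(hy36decode_serial("99999"))  # 99999
print(hy36decode_serial("A0000"))  # 100000
```

This reaches well past the 1,048,575 serials the 5-digit hexadecimal scheme allows, which is one reason the X-PLOR/VMD lineage adopted it.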
> If there are guidelines in the PDB specification for when this field overflows
> I missed them, but it is a problem if there are rival hacks in common use
> (roll-over/wrap-around versus this semi-hex scheme).

There are no specs for how to handle more than 9999 residues, just like
there are no specs for how to handle more than 99999 atoms.

Cheers,

Andrew
dalke at dalkescientific.com

From p.j.a.cock at googlemail.com Tue Aug 6 05:35:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 6 Aug 2013 10:35:25 +0100
Subject: [Biopython] Reading large files, Biopython cookbook example
In-Reply-To: <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu>
 <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
Message-ID:

On Tue, Aug 6, 2013 at 12:09 AM, Andrew Dalke wrote:
> A bit late, but a bit of background:
>
>> On Sun, Jul 14, 2013 at 5:40 PM, Katrina Lexa wrote:
>>> My PDB file came from Maestro, so that is the ordering it follows after 9999.
>
> On Jul 15, 2013, at 7:46 PM, Peter Cock wrote:
>> i.e. This software package? http://www.schrodinger.com/productpage/14/12/
>>
>> Could you contact their support to find out why they are doing this please?
>
> Yes, that's the Maestro Katrina was almost certainly talking about. It's a
> commercial package which has been around for a while; the company
> started in 1990 as a commercialization of the Jaguar QM package from
> Richard Friesner's and William Goddard's labs at CalTech. Maestro is
> the GUI to their QM and MM codes.
>
> Their conversion routines support various options. See:
> https://www.schrodinger.com//AcrobatFile.php?type=supportdocs&type2=&ident=530
>
> The key ones are:
>
> -hex : Use hexadecimal encoding for atom numbers greater
> than 99999 and for residue numbers greater than 9999
>
> and
>
> -hybrid36 : Use the hybrid36 scheme for atom serial numbers.
> On input, integers of up to 6 digits and hexadecimal numbers are
> recognized on ATOM records by default. On output, the default is
> to use integers for fewer than 100,000 atoms, and hexadecimal for
> 100,000 atoms or more.
>
> Annoyingly, as Robert Hanson reported in:
> http://www.mailinglistarchive.com/html/jmol-users at lists.sourceforge.net/2013-01/msg00111.html
> (and see the thread at)
> http://article.gmane.org/gmane.science.chemistry.blue-obelisk/1659/match=pdb+ok+who%27s+wise+guy
>
> their default output generates records like:
>
> ATOM 99998 H1 TIP3W3304 -28.543 60.673 40.064 1.00 0.00 WT5 H
> ATOM 99999 H2 TIP3W3304 -27.773 60.376 41.353 1.00 0.00 WT5 H
> ATOM 186a0 OH2 TIP3W3305 -24.713 61.533 47.372 1.00 0.00 WT5 O
> ATOM 186a1 H1 TIP3W3305 -25.652 61.772 47.519 1.00 0.00 WT5 H
> ATOM 186a2 H2 TIP3W3305 -24.713 61.625 46.379 1.00 0.00 WT5 H
>
> which means there can be two atoms with serial number "18700" (or
> "99999", etc.) in the same file, where the same string means two
> different things.
>
> This obviously messes up all of the other PDB annotations which use
> a serial id, but I presume that most Maestro users only use PDB files
> for coordinate data, and not for the other fields.
>
> Maestro is the only program I know of which uses this awful form. A
> default enabling of the "-hybrid36" option (first-digit-is-in-base-36)
> would make it more consistent with tools in the X-PLOR/VMD
> heritage, where A0000 follows 99999. Presumably they want
> the full 1,048,575 atom range.
>
>> If there are guidelines in the PDB specification for when this field overflows
>> I missed them, but it is a problem if there are rival hacks in common use
>> (roll-over/wrap-around versus this semi-hex scheme).
>
> There are no specs for how to handle more than 9999 residues,
> just like there are no specs for how to handle more than 99999 atoms.
>
> Cheers,
>
> Andrew
> dalke at dalkescientific.com

Thanks Andrew - useful background.
In the long run this problem should go away as the PDB moves to using
the PDBx/mmCIF format:
http://www.wwpdb.org/news/news_2013.html#22-May-2013

Peter

From dalke at dalkescientific.com Tue Aug 6 14:49:35 2013
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 6 Aug 2013 20:49:35 +0200
Subject: [Biopython] Reading large files, Biopython cookbook example
In-Reply-To:
References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu>
 <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
Message-ID:

On Aug 6, 2013, at 11:35 AM, Peter Cock wrote:
> In the long run this problem should go away as the PDB moves
> to using the PDBx/mmCIF format:
> http://www.wwpdb.org/news/news_2013.html#22-May-2013

Either you are optimistic or an ultra marathon runner! The move over to
mmCIF started, of course, 20 years ago, and that link you gave said the
change applies only to very large structures:

  Structures that do not exceed the limitations of the PDB format
  will continue to be provided as PDB files in the archive for the
  foreseeable future.

Even for large files, which previously would split the structure over
multiple records, there will be a "best-effort" PDB format, available
as a web service.

40 years of the PDB format => well-entrenched => not going to get rid
of it any time soon.

For another historical side-note, the PDB format started in the early
1970s, but contains a kernel which is even older! Quoting from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf :

In order to establish the PDB, acceptance by the crystallographic
community was necessary, requiring a pilgrimage in 1970 to the Medical
Research Council (MRC) laboratory and Crystal Data Centre (CDC) in
Cambridge. One result of this exchange was a concession that coordinates
of protein structures would be stored in the same format as the small
molecule CDC database (with a redundant ATOM label at the beginning of
each card), retaining the now-arcane counting number at the end.
But the idea of a PDB was accepted by Professors Perutz, Blow, Kennard,
Diamond, and colleagues in Cambridge.

The "now-arcane" counting number has long disappeared from the spec. It
was there, I believe, so that if the punch cards were dropped then they
could be resorted based on the last few columns. (I imagine you could
also write a program to strip out the C-alpha cards, work with them,
then merge the C-alphas back into the card deck correctly.)

Andrew
dalke at dalkescientific.com

From anaryin at gmail.com Tue Aug 6 15:08:20 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 6 Aug 2013 12:08:20 -0700
Subject: [Biopython] Reading large files, Biopython cookbook example
In-Reply-To:
References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu>
 <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
Message-ID:

It's quite hopeful indeed to believe that the PDB format is going to be
phased out... unfortunately structural biology is quite conservative (a
nice word for stubborn) regarding formats! The new format will probably
only be "yet another one", although I'm hopeful it will bring some fresh
air.

From cjfields at illinois.edu Tue Aug 6 14:59:09 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 6 Aug 2013 18:59:09 +0000
Subject: [Biopython] Reading large files, Biopython cookbook example
In-Reply-To:
References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu>
 <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF7B1630D8@CHIMBX5.ad.uillinois.edu>

On Aug 6, 2013, at 1:49 PM, Andrew Dalke wrote:
>
> ...
> For another historical side-note, the PDB format started in
> the early 1970s, but contains a kernel which is even older!
> Quoting from
>
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf :
>
> In order to establish the PDB, acceptance by the crystallographic
> community was necessary, requiring a pilgrimage in 1970 to the Medical
> Research Council (MRC) laboratory and Crystal Data Centre (CDC) in
> Cambridge. One result of this exchange was a concession that coordinates
> of protein structures would be stored in the same format as the small
> molecule CDC database (with a redundant ATOM label at the beginning of
> each card), retaining the now-arcane counting number at the end. But the
> idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond,
> and colleagues in Cambridge.
>
> The "now-arcane" counting number has long disappeared from the
> spec. It was there, I believe, so that if the punch cards were
> dropped then they could be resorted based on the last few columns.
> (I imagine you could also write a program to strip out the
> C-alpha cards, work with them, then merge the C-alphas back into
> the card deck correctly.)
>
> Andrew
> dalke at dalkescientific.com

Now *that* is backwards-compatibility taken to an extreme.

chris

From Jared.Sampson at nyumc.org Tue Aug 6 16:10:25 2013
From: Jared.Sampson at nyumc.org (Sampson, Jared)
Date: Tue, 6 Aug 2013 20:10:25 +0000
Subject: [Biopython] Reading large files, Biopython cookbook example
In-Reply-To:
References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu>
 <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
Message-ID: <4B22CFF6-F543-45B8-B82C-704642A9CED7@nyumc.org>

For the curious, there has been a conversation on the CCP4 Bulletin Board
over the past few days addressing exactly this topic. The takeaway message
is essentially what Andrew has mentioned: PDB format is here for the
foreseeable future.
http://www.mail-archive.com/ccp4bb at jiscmail.ac.uk/msg32321.html

Cheers,
Jared

--
Jared Sampson
Xiangpeng Kong Lab
NYU Langone Medical Center
Old Public Health Building, Room 610
341 East 25th Street
New York, NY 10016
212-263-7898
http://kong.med.nyu.edu/

On Aug 6, 2013, at 2:49 PM, Andrew Dalke wrote:

On Aug 6, 2013, at 11:35 AM, Peter Cock wrote:
In the long run this problem should go away as the PDB moves
to using the PDBx/mmCIF format:
http://www.wwpdb.org/news/news_2013.html#22-May-2013

Either you are optimistic or an ultra marathon runner! The move over to
mmCIF started of course 20 years ago, and that link you gave said the
change applies only to very large structures:

Structures that do not exceed the limitations of the PDB format will
continue to be provided as PDB files in the archive for the foreseeable
future.

Even for large files, which previously would split the structure over
multiple records, there will be a "best-effort" PDB format, available as
a web service.

40 years of the PDB format => well-entrenched => not going to get rid of
it any time soon.

For another historical side-note, the PDB format started in the early
1970s, but contains a kernel which is even older! Quoting from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf :

In order to establish the PDB, acceptance by the crystallographic
community was necessary, requiring a pilgrimage in 1970 to the Medical
Research Council (MRC) laboratory and Crystal Data Centre (CDC) in
Cambridge. One result of this exchange was a concession that coordinates
of protein structures would be stored in the same format as the small
molecule CDC database (with a redundant ATOM label at the beginning of
each card), retaining the now-arcane counting number at the end. But the
idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond,
and colleagues in Cambridge.

The "now-arcane" counting number has long disappeared from the spec.
It was there, I believe, so that if the punch cards were dropped then
they could be resorted based on the last few columns. (I imagine you
could also write a program to strip out the C-alpha cards, work with
them, then merge the C-alphas back into the card deck correctly.)

Andrew
dalke at dalkescientific.com

_______________________________________________
Biopython mailing list - Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From anaryin at gmail.com Tue Aug 6 16:46:17 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 6 Aug 2013 13:46:17 -0700
Subject: [Biopython] Reading large files, Biopython cookbook example
In-Reply-To: <4B22CFF6-F543-45B8-B82C-704642A9CED7@nyumc.org>
References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu>
 <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
 <4B22CFF6-F543-45B8-B82C-704642A9CED7@nyumc.org>
Message-ID:

Really nice discussion Jared, thanks for sharing.

2013/8/6 Sampson, Jared

> For the curious, there has been a conversation on the CCP4 Bulletin Board
> over the past few days addressing exactly this topic. The takeaway message
> is essentially what Andrew has mentioned: PDB format is here for the
> foreseeable future.
>
> http://www.mail-archive.com/ccp4bb at jiscmail.ac.uk/msg32321.html
>
> Cheers,
> Jared
>
> --
> Jared Sampson
> Xiangpeng Kong Lab
> NYU Langone Medical Center
> Old Public Health Building, Room 610
> 341 East 25th Street
> New York, NY 10016
> 212-263-7898
> http://kong.med.nyu.edu/
>
> On Aug 6, 2013, at 2:49 PM, Andrew Dalke wrote:
>
> On Aug 6, 2013, at 11:35 AM, Peter Cock wrote:
> In the long run this problem should go away as the PDB moves
> to using the PDBx/mmCIF format:
> http://www.wwpdb.org/news/news_2013.html#22-May-2013
>
> Either you are optimistic or an ultra marathon runner!
> The move over to mmCIF started of course 20 years ago, and that
> link you gave said the change applies only to very large
> structures:
>
> Structures that do not exceed the limitations of the PDB
> format will continue to be provided as PDB files in the
> archive for the foreseeable future.
>
> Even for large files, which previously would split the structure
> over multiple records, there will be a "best-effort" PDB format,
> available as a web service.
>
> 40 years of the PDB format => well-entrenched => not going to
> get rid of it any time soon.
>
> For another historical side-note, the PDB format started in
> the early 1970s, but contains a kernel which is even older!
> Quoting from
>
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf :
>
> In order to establish the PDB, acceptance by the crystallographic
> community was necessary, requiring a pilgrimage in 1970 to the Medical
> Research Council (MRC) laboratory and Crystal Data Centre (CDC) in
> Cambridge. One result of this exchange was a concession that coordinates
> of protein structures would be stored in the same format as the small
> molecule CDC database (with a redundant ATOM label at the beginning of
> each card), retaining the now-arcane counting number at the end. But the
> idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond,
> and colleagues in Cambridge.
>
> The "now-arcane" counting number has long disappeared from the
> spec. It was there, I believe, so that if the punch cards were
> dropped then they could be resorted based on the last few columns.
> (I imagine you could also write a program to strip out the
> C-alpha cards, work with them, then merge the C-alphas back into
> the card deck correctly.)
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From arklenna at gmail.com Thu Aug 8 15:54:58 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 8 Aug 2013 15:54:58 -0400
Subject: [Biopython] PDB occupancy behavior
Message-ID:

Hi all,

I just submitted a pull request I'd like wider feedback on.

https://github.com/biopython/biopython/pull/207

In summary, I am using software-produced PDB files that simply stop after
the coordinate data, so occupancy data is missing. Currently, the Biopython
PDBParser sets missing or blank occupancy to 0.0. I am suggesting changing
this to 1.0.

I would like to see if anyone knows of situations in which this would be a
bad idea.

Cheers,

Lenna

From anaryin at gmail.com Thu Aug 8 16:02:39 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 8 Aug 2013 13:02:39 -0700
Subject: [Biopython] PDB occupancy behavior
In-Reply-To:
References:
Message-ID:

Hi Lenna,

As I mentioned in the Github email, I think it's fine. It doesn't matter
if the occupancy is 0 or 1 in case of a model most of the time. I agree
with it. The only bad thing I can think about is having occupancy for a
certain atom larger than 1 in some bogus cases but to be honest, no
software that I know of bothers checking that...

Cheers,

João

2013/8/8 Lenna Peterson

> Hi all,
>
> I just submitted a pull request I'd like wider feedback on.
>
> https://github.com/biopython/biopython/pull/207
>
> In summary, I am using software-produced PDB files that simply stop after
> the coordinate data, so occupancy data is missing. Currently, the Biopython
> PDBParser sets missing or blank occupancy to 0.0.
> I am suggesting changing this to 1.0.
>
> I would like to see if anyone knows of situations in which this would be a
> bad idea.
>
> Cheers,
>
> Lenna
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From Jared.Sampson at nyumc.org Thu Aug 8 16:30:31 2013
From: Jared.Sampson at nyumc.org (Sampson, Jared)
Date: Thu, 8 Aug 2013 20:30:31 +0000
Subject: [Biopython] PDB occupancy behavior
In-Reply-To:
References:
Message-ID:

Thanks, Lenna and João -

I also agree, 1.0 is a better default occupancy value. For most structural
manipulation purposes, unless specified otherwise, we must assume the atoms
listed are present in the structure at full occupancy. Setting a reduced
occupancy can be useful for partially bound ligands, disordered loops, and
so forth, but doing so is the exception, not the rule.

Cheers,
Jared

--
Jared Sampson
Xiangpeng Kong Lab
NYU Langone Medical Center
Old Public Health Building, Room 610
341 East 25th Street
New York, NY 10016
212-263-7898
http://kong.med.nyu.edu/

On Aug 8, 2013, at 4:02 PM, João Rodrigues wrote:

Hi Lenna,

As I mentioned in the Github email, I think it's fine. It doesn't matter
if the occupancy is 0 or 1 in case of a model most of the time. I agree
with it. The only bad thing I can think about is having occupancy for a
certain atom larger than 1 in some bogus cases but to be honest, no
software that I know of bothers checking that...

Cheers,

João

2013/8/8 Lenna Peterson

> Hi all,
>
> I just submitted a pull request I'd like wider feedback on.
>
> https://github.com/biopython/biopython/pull/207
>
> In summary, I am using software-produced PDB files that simply stop after
> the coordinate data, so occupancy data is missing. Currently, the Biopython
> PDBParser sets missing or blank occupancy to 0.0. I am suggesting changing
> this to 1.0.
>
> I would like to see if anyone knows of situations in which this would be a
> bad idea.
Cheers,

Lenna
_______________________________________________
Biopython mailing list - Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

_______________________________________________
Biopython mailing list - Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From p.j.a.cock at googlemail.com Thu Aug 8 18:37:27 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 8 Aug 2013 23:37:27 +0100
Subject: [Biopython] PDB occupancy behavior
In-Reply-To:
References:
Message-ID:

Thanks everyone - that seems like a clear consensus, patch applied :)

Peter

On Thu, Aug 8, 2013 at 9:30 PM, Sampson, Jared wrote:
> Thanks, Lenna and João -
>
> I also agree, 1.0 is a better default occupancy value. For most
> structural manipulation purposes, unless specified otherwise, we must assume
> the atoms listed are present in the structure at full occupancy. Setting a
> reduced occupancy can be useful for partially bound ligands, disordered
> loops, and so forth, but doing so is the exception, not the rule.
>
> Cheers,
> Jared
>
> --
> Jared Sampson
> Xiangpeng Kong Lab
> NYU Langone Medical Center
> Old Public Health Building, Room 610
> 341 East 25th Street
> New York, NY 10016
> 212-263-7898
> http://kong.med.nyu.edu/
>
> On Aug 8, 2013, at 4:02 PM, João Rodrigues wrote:
>
> Hi Lenna,
>
> As I mentioned in the Github email, I think it's fine. It doesn't matter
> if the occupancy is 0 or 1 in case of a model most of the time. I agree
> with it. The only bad thing I can think about is having occupancy for
> a certain atom larger than 1 in some bogus cases but to be honest,
> no software that I know of bothers checking that...
>
> Cheers,
>
> João
>
> 2013/8/8 Lenna Peterson
>>
>> Hi all,
>>
>> I just submitted a pull request I'd like wider feedback on.
>> https://github.com/biopython/biopython/pull/207
>>
>> In summary, I am using software-produced PDB files that simply stop after
>> the coordinate data, so occupancy data is missing. Currently, the
>> Biopython PDBParser sets missing or blank occupancy to 0.0. I am
>> suggesting changing this to 1.0.
>>
>> I would like to see if anyone knows of situations in which this would be a
>> bad idea.
>>
>> Cheers,
>>
>> Lenna

From sainitin7 at gmail.com Fri Aug 9 15:12:39 2013
From: sainitin7 at gmail.com (sai nitin)
Date: Fri, 9 Aug 2013 21:12:39 +0200
Subject: [Biopython] Issue in retrieving Pubmed Ids
Message-ID:

Hi all,

I have a set of genes (ASCL1, AEBP1, MLF1) and I want to search PubMed for
literature in glioma, i.e. I want to get PubMed IDs for these genes in
glioma. To achieve this I tried a Biopython script as follows. First I
stored these terms in a file like this:

ASCL1 and glioma

AEBP1 and glioma

.....

infile = open("file.txt")

for line in infile.readlines():

    single_id = line

    # Retrieving information
    data = Entrez.esearch(db="pubmed", term=single_id)

    res = Entrez.read(data)

    PMID = res["IdList"]

    print "%s" % (PMID)

    out_put.write("%s\n" % (PMID))

out_put.close()

It reads only the first line, "ASCL1 and glioma", and gives this result:

['22859994', '18796682', '18636433', '17146289', '17124508', '16103883',
'11433425']

Traceback (most recent call last):
  File "PUBMED.py", line 13, in <module>
    res=Entrez.read(data)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/__init__.py", line 367, in read
    record = handler.read(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 184, in read
    self.parser.ParseFile(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 322, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Empty term and query_key - nothing todo

It looks like it does not read the second line in file.txt. Can anybody
tell me how to solve this issue?
Thanks,

--
Sainitin D

From arklenna at gmail.com Fri Aug 9 15:42:13 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Fri, 9 Aug 2013 15:42:13 -0400
Subject: [Biopython] Issue in retrieving Pubmed Ids
In-Reply-To:
References:
Message-ID:

If your file has blank lines as shown, then the second iteration is calling
`data = Entrez.esearch(db="pubmed", term="\n")`

When reading files I often have a check like this:

    line = line.strip()
    if not line:
        continue

Hope that helps.

Cheers,

Lenna

On Fri, Aug 9, 2013 at 3:12 PM, sai nitin wrote:

> Hi all,
>
> I have set of genes ( ASCL1, AEBP1, MLF1) i want to search PUBMED for
> literature in glioma. Means i want to get Pubmed Ids for these genes in
> glioma. To achieve this i tried biopython script as follows. First i stored
> this terms in file as follows
>
> ASCL1 and glioma
>
> AEBP1 and glioma
>
> .....
>
> infile = "file.txt"
>
> for line in infile.readlines():
>
>     single_id = line
>
>     #Retreiving information
>     data = Entrez.esearch(db="pubmed",term = single_id)
>
>     res=Entrez.read(data)
>
>     PMID = res["IdList"]
>
>     print "%s" %(PMID)
>
>     out_put.write("%s\n" %(PMID))
>
> out_put.close()
>
> It reads only first line ASCL1 and glioma ... and gives result as follows
>
> ['22859994', '18796682', '18636433', '17146289', '17124508', '16103883',
> '11433425']
>
> Traceback (most recent call last):
>   File "PUBMED.py", line 13, in <module>
>     res=Entrez.read(data)
>   File "/Library/Python/2.7/site-packages/Bio/Entrez/__init__.py", line
> 367, in read
>     record = handler.read(handle)
>   File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 184,
> in read
>     self.parser.ParseFile(handle)
>   File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 322,
> in endElementHandler
>     raise RuntimeError(value)
> RuntimeError: Empty term and query_key - nothing todo
>
> It looks like it does not read second line..in file.txt..Can any body tell
> how to solve this issue..
> Thanks,
>
> --
> Sainitin D
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From golubchi at stats.ox.ac.uk Wed Aug 21 04:41:48 2013
From: golubchi at stats.ox.ac.uk (Tanya Golubchik)
Date: Wed, 21 Aug 2013 09:41:48 +0100
Subject: [Biopython] NcbiblastnCommandline vs subprocess blast
Message-ID: <52147D4C.4090900@stats.ox.ac.uk>

Hello,

The following refers to Biopython 1.61. Does anyone know if there are any
hidden or hard-coded defaults for any parameters in NcbiblastnCommandline?
Or any known bugs that could cause hits not to be reported?

I've encountered an extremely frustrating issue that I've never seen
before. The upshot is that the blastn result obtained through
NcbiblastnCommandline occasionally reports "no hits" in a way that seems
dependent on the query file. This strange output is different from that
obtained via a subprocess call (or outside Python entirely) -- both of
which recover all hits consistently. I am using *exactly* the same
parameters, inputs, and exactly the same path to the blastn executable.

What actually happens is that for my large fasta file of 3000-odd
nucleotide queries, several give no hits with the NcbiblastnCommandline
call (these are not at the end of the query file, just throughout the
file). On the other hand, cutting the file down to about 2000 queries, but
not changing it in any other way, does give hits for these missing
queries. Note that *nothing* else is changed; the parameters and call to
blastn remain identical. This only happens for some, not all, of the blast
databases I'm searching, making it look like there are variable deletions
between the samples.

I can provide (large) test data files if anyone thinks they can help. I
have the query file that produces the wrong 'patchy' output, another that
produces the correct output, and a sample blast database for which this
happens.
The actual calls are (paths substituted):

NcbiblastnCommandline(cmd='/path/to/blastn', outfmt=5,
query='untrimmed.fasta', db='/path/to/db/C00006635', gapopen=5,
gapextend=2, culling_limit=2)

The above gives 'no hits' for about 25 queries out of the 3000+ in the file.

stdout, stderr = sp.Popen("/path/to/blastn -db /path/to/db/C00006635
-query untrimmed.fasta -outfmt 5 -word_size 17 -gapopen 5 -gapextend 2
-culling_limit 2".split(), stdout=sp.PIPE, stderr=sp.PIPE).communicate()

The above call to subprocess returns all hits, correctly.

Thanks
Tanya

From p.j.a.cock at googlemail.com Wed Aug 21 05:39:04 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 21 Aug 2013 10:39:04 +0100
Subject: [Biopython] NcbiblastnCommandline vs subprocess blast
In-Reply-To: <52147D4C.4090900@stats.ox.ac.uk>
References: <52147D4C.4090900@stats.ox.ac.uk>
Message-ID:

On Wed, Aug 21, 2013 at 9:41 AM, Tanya Golubchik wrote:
> Hello,
>
> The following refers to Biopython 1.61. Does anyone know if there are any
> hidden or hard-coded defaults for any parameters in NcbiblastnCommandline?
> Or any known bugs that could cause hits not to be reported?
>
> I've encountered an extremely frustrating issue that I've never seen before.
> The upshot is that the blastn result obtained through NcbiblastnCommandline
> occasionally reports "no hits" in a way that seems dependent on the query
> file. This strange output is different from that obtained via a subprocess
> call (or outside python entirely) -- both of which recover all hits
> consistently. I am using *exactly* the same parameters, inputs, and exactly
> the same path to the blastn executable.
>
> What actually happens is that for my large fasta file of 3000-odd nucleotide
> queries, several give no hits with the NcbiblastnCommandline call (these are
> not at the end of the query file, just throughout the file).
> On the other hand, cutting the file down to about 2000 queries, but not
> changing it in any other way, does give hits for these missing queries.
> Note that *nothing* else is changed; the parameters and call to blastn
> remain identical. This only happens for some, not all, of the blast
> databases I'm searching, making it look like there are variable deletions
> between the samples.
>
> I can provide (large) test data files if anyone thinks they can help. I have
> the query file that produces the wrong 'patchy' output, another that
> produces the correct output, and a sample blast database for which this
> happens.
>
> The actual calls are (paths substituted):
>
> NcbiblastnCommandline(cmd='/path/to/blastn', outfmt=5,
> query='untrimmed.fasta', db='/path/to/db/C00006635', gapopen=5, gapextend=2,
> culling_limit=2)
>
> The above gives 'no hits' for about 25 queries out of the 3000+ in the file.

Are you using it like this, which also uses subprocess,

cline = NcbiblastnCommandline(...)
stdout, stderr = cline()

> stdout, stderr = sp.Popen("/path/to/blastn -db /path/to/db/C00006635 -query
> untrimmed.fasta -outfmt 5 -word_size 17 -gapopen 5 -gapextend 2
> -culling_limit 2".split(), stdout=sp.PIPE, stderr=sp.PIPE).communicate()
>
> The above call to subprocess returns all hits, correctly.

Note there is a subtle difference in the order of the command line which could (depending on how the command line parsing is done) reveal a bug in BLAST:

>>> from Bio.Blast.Applications import NcbiblastnCommandline
>>> cline = NcbiblastnCommandline(cmd='/path/to/blastn', outfmt=5, query='untrimmed.fasta', db='/path/to/db/C00006635', gapopen=5, gapextend=2, culling_limit=2)
>>> str(cline)
'/path/to/blastn -outfmt 5 -query untrimmed.fasta -db /path/to/db/C00006635 -gapopen 5 -gapextend 2 -culling_limit 2'

Just to rule this out, retry running this string instead. However, the more likely explanation is that you didn't set the word size, unless that is a typo in this email?
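[Editor's note: the difference Peter is pointing at is easiest to see when every BLAST option is spelled out explicitly. The sketch below is a stdlib-only illustration; `blastn_args` is a hypothetical helper, not part of Biopython or BLAST+, and the paths/db names are placeholders.]

```python
def blastn_args(executable, **params):
    """Build an argument vector for blastn with every option explicit,
    so a forgotten parameter (like word_size in the thread above) is
    easy to spot by inspecting the list."""
    args = [executable]
    for key, value in params.items():  # kwargs keep insertion order (Python 3.7+)
        args.extend(["-" + key, str(value)])
    return args

args = blastn_args("blastn", db="C00006635", query="untrimmed.fasta",
                   outfmt=5, word_size=17, gapopen=5, gapextend=2,
                   culling_limit=2)

# Running it would mirror the sp.Popen() call quoted above, e.g.:
# import subprocess
# proc = subprocess.run(args, capture_output=True, text=True)
```

Printing or asserting on `args` before launching the subprocess makes a missing `-word_size` obvious, whichever wrapper is used to run the command.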
Regards,

Peter

From golubchi at stats.ox.ac.uk Wed Aug 21 07:38:36 2013
From: golubchi at stats.ox.ac.uk (Tanya Golubchik)
Date: Wed, 21 Aug 2013 12:38:36 +0100
Subject: [Biopython] NcbiblastnCommandline vs subprocess blast
In-Reply-To:
References: <52147D4C.4090900@stats.ox.ac.uk>
Message-ID: <5214A6BC.2070003@stats.ox.ac.uk>

Hi Peter,

Thank you for your help. You're right that it's the word size, although not in the way this was suggesting -- what happened was that with my selected word size, some queries were failing with -culling_limit 1 (I still don't quite know why this happens, it seems to be a blast bug) and defaulting to culling limit 2 -- which is fine, but at this point they were losing the word size argument (my stuff-up).

Thanks again!
Tanya

From csaba.kiss at lanl.gov Sun Aug 25 15:20:40 2013
From: csaba.kiss at lanl.gov (Kiss, Csaba)
Date: Sun, 25 Aug 2013 19:20:40 +0000
Subject: [Biopython] Mac installation problem
Message-ID:

I have a problem installing biopython on a MAC system using the easy_install method. I have tried these two commands:

sudo easy_install -f http://biopython.org/DIST/ biopython
sudo easy_install -U biopython

None of them worked. I get these error messages:

Reading http://pypi.python.org/simple/biopython/
No local packages or download links found for biopython
error: Could not find suitable distribution for Requirement.parse('biopython')

Can someone help me rectify this? I have a paper being reviewed and the reviewer has problems getting biopython going, which means it's pretty crucial for me.

Csaba

From p.j.a.cock at googlemail.com Sun Aug 25 15:47:19 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 25 Aug 2013 20:47:19 +0100
Subject: [Biopython] Mac installation problem
In-Reply-To:
References:
Message-ID:

You'll need Apple's XCode for the compilers etc, available for free on the App Store - and then from within XCode you must install the optional command line utilities as well.
Also, I'd recommend the old fashioned way, just download and decompress the tar-ball, then

python setup.py build
python setup.py test
sudo python setup.py install

This should match our wiki download page,
http://biopython.org/wiki/Download

Peter

On Sunday, August 25, 2013, Kiss, Csaba wrote:
> I have a problem installing biopython on a MAC system using the
> easy_install method. I have tried these two commands:
>
> sudo easy_install -f http://biopython.org/DIST/ biopython
>
> sudo easy_install -U biopython
>
> None of them worked. I get these error messages:
>
> Reading http://pypi.python.org/simple/biopython/
> No local packages or download links found for biopython
> error: Could not find suitable distribution for
> Requirement.parse('biopython')
>
> Can someone help me rectify this? I have a paper being reviewed and the
> reviewer has problems getting biopython going, which means it's pretty
> crucial for me.
>
> Csaba
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From devaniranjan at gmail.com Sun Aug 25 17:24:25 2013
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Sun, 25 Aug 2013 17:24:25 -0400
Subject: [Biopython] Mac installation problem
In-Reply-To:
References:
Message-ID:

I have found that installing MAC packages through FINK is helpful (other options similar to fink are macports and homebrew but I have found fink sufficient and it does install biopython )

On Sun, Aug 25, 2013 at 3:47 PM, Peter Cock wrote:
> You'll need Apple's XCode for the compilers
> etc, available for free on the App Store - and then
> from within XCode you must install the optional
> command line utilities as well.
> > Also, I'd recommend the old fashioned way, just > download and decompress the tar-ball, then > > python setup.py build > python setup.py test > sudo python setup.py install > > This should match our wiki download page, > http://biopython.org/wiki/Download > > Peter > > On Sunday, August 25, 2013, Kiss, Csaba wrote: > > > I have a problem installing biopython on a MAC system using the > > easy_install method. I have tried these two commands: > > > > > > > > sudo easy_install -f http://biopython.org/DIST/ biopython > > > > sudo easy_install -U biopython > > > > None of them worked. I get these error messages: > > > > Reading http://pypi.python.org/simple/biopython/ > > No local packages or download links found for biopython > > error: Could not find suitable distribution for > > Requirement.parse('biopython') > > > > Can someone help me rectify this? I have a paper being reviewed and the > > reviewer has problems getting biopython going, which means it's pretty > > crucial for me. > > > > Csaba > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From csaba.kiss at lanl.gov Mon Aug 26 09:47:09 2013 From: csaba.kiss at lanl.gov (Csaba Kiss) Date: Mon, 26 Aug 2013 07:47:09 -0600 Subject: [Biopython] Mac installation problem In-Reply-To: References: Message-ID: <521B5C5D.1000102@lanl.gov> Thanks, George. I will give FINK a try. 
Csaba On 8/25/2013 3:24 PM, George Devaniranjan wrote: > I have found that installing MAC packages through FINK is helpful > (other options similar to fink are macports and homebrew but I have > found fink sufficient and it does install biopython ) > > > On Sun, Aug 25, 2013 at 3:47 PM, Peter Cock > wrote: > > You'll need Apple's XCode for the compilers > etc, available for free on the App Store - and then > from within XCode you must install the optional > command line utilities as well. > > Also, I'd recommend the old fashioned way, just > download and decompress the tar-ball, then > > python setup.py build > python setup.py test > sudo python setup.py install > > This should match our wiki download page, > http://biopython.org/wiki/Download > > Peter > > On Sunday, August 25, 2013, Kiss, Csaba wrote: > > > I have a problem installing biopython on a MAC system using the > > easy_install method. I have tried these two commands: > > > > > > > > sudo easy_install -f http://biopython.org/DIST/ biopython > > > > sudo easy_install -U biopython > > > > None of them worked. I get these error messages: > > > > Reading http://pypi.python.org/simple/biopython/ > > No local packages or download links found for biopython > > error: Could not find suitable distribution for > > Requirement.parse('biopython') > > > > Can someone help me rectify this? I have a paper being reviewed > and the > > reviewer has problems getting biopython going, which means it's > pretty > > crucial for me. 
> > > > Csaba > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > -- Best Regards: Csaba Kiss PhD, MSc, BSc TA-43, HRL-1, MS888 Los Alamos National Laboratory Work: 1-505-667-9898 Cell: 1-505-920-5774 From nlindberg at mkei.org Mon Aug 26 10:08:19 2013 From: nlindberg at mkei.org (Nick Lindberg) Date: Mon, 26 Aug 2013 14:08:19 +0000 Subject: [Biopython] Mac installation problem In-Reply-To: <521B5C5D.1000102@lanl.gov> Message-ID: I would highly recommend doing it the old fashioned way, as the first poster recommended. The package managers start to do things a little funky the way they link packages, and depending on if you're using the built in Python versus a homebrewed/ported copy, etc. I don't think fink is actively maintained any longer. Nick Lindberg Sr. Consulting Engineer, HPC Milwaukee Institute 414.727.6413 (W) http://www.mkei.org On 8/26/13 8:47 AM, "Csaba Kiss" wrote: >Thanks, George. I will give FINK a try. > >Csaba >On 8/25/2013 3:24 PM, George Devaniranjan wrote: >> I have found that installing MAC packages through FINK is helpful >> (other options similar to fink are macports and homebrew but I have >> found fink sufficient and it does install biopython ) >> >> >> On Sun, Aug 25, 2013 at 3:47 PM, Peter Cock > > wrote: >> >> You'll need Apple's XCode for the compilers >> etc, available for free on the App Store - and then >> from within XCode you must install the optional >> command line utilities as well. 
>> >> Also, I'd recommend the old fashioned way, just >> download and decompress the tar-ball, then >> >> python setup.py build >> python setup.py test >> sudo python setup.py install >> >> This should match our wiki download page, >> http://biopython.org/wiki/Download >> >> Peter >> >> On Sunday, August 25, 2013, Kiss, Csaba wrote: >> >> > I have a problem installing biopython on a MAC system using the >> > easy_install method. I have tried these two commands: >> > >> > >> > >> > sudo easy_install -f http://biopython.org/DIST/ biopython >> > >> > sudo easy_install -U biopython >> > >> > None of them worked. I get these error messages: >> > >> > Reading http://pypi.python.org/simple/biopython/ >> > No local packages or download links found for biopython >> > error: Could not find suitable distribution for >> > Requirement.parse('biopython') >> > >> > Can someone help me rectify this? I have a paper being reviewed >> and the >> > reviewer has problems getting biopython going, which means it's >> pretty >> > crucial for me. 
>> > >> > Csaba >> > >> > _______________________________________________ >> > Biopython mailing list - Biopython at lists.open-bio.org >> >> > http://lists.open-bio.org/mailman/listinfo/biopython >> > >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/biopython >> >> > >-- >Best Regards: >Csaba Kiss PhD, MSc, BSc >TA-43, HRL-1, MS888 >Los Alamos National Laboratory >Work: 1-505-667-9898 >Cell: 1-505-920-5774 > >_______________________________________________ >Biopython mailing list - Biopython at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biopython From arklenna at gmail.com Mon Aug 26 12:57:23 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Mon, 26 Aug 2013 12:57:23 -0400 Subject: [Biopython] Mac installation problem In-Reply-To: References: <521B5C5D.1000102@lanl.gov> Message-ID: Furthermore, I don't believe installing any of the Mac "package managers" eliminates the need to install developer tools/XCode. Cheers, Lenna On Mon, Aug 26, 2013 at 10:08 AM, Nick Lindberg wrote: > I would highly recommend doing it the old fashioned way, as the first > poster recommended. The package managers start to do things a little > funky the way they link packages, and depending on if you're using the > built in Python versus a homebrewed/ported copy, etc. > > I don't think fink is actively maintained any longer. > > Nick Lindberg > Sr. Consulting Engineer, HPC > Milwaukee Institute > 414.727.6413 (W) > http://www.mkei.org > > > > > > > > > > > > On 8/26/13 8:47 AM, "Csaba Kiss" wrote: > > >Thanks, George. I will give FINK a try. 
> > > >Csaba > >On 8/25/2013 3:24 PM, George Devaniranjan wrote: > >> I have found that installing MAC packages through FINK is helpful > >> (other options similar to fink are macports and homebrew but I have > >> found fink sufficient and it does install biopython ) > >> > >> > >> On Sun, Aug 25, 2013 at 3:47 PM, Peter Cock >> > wrote: > >> > >> You'll need Apple's XCode for the compilers > >> etc, available for free on the App Store - and then > >> from within XCode you must install the optional > >> command line utilities as well. > >> > >> Also, I'd recommend the old fashioned way, just > >> download and decompress the tar-ball, then > >> > >> python setup.py build > >> python setup.py test > >> sudo python setup.py install > >> > >> This should match our wiki download page, > >> http://biopython.org/wiki/Download > >> > >> Peter > >> > >> On Sunday, August 25, 2013, Kiss, Csaba wrote: > >> > >> > I have a problem installing biopython on a MAC system using the > >> > easy_install method. I have tried these two commands: > >> > > >> > > >> > > >> > sudo easy_install -f http://biopython.org/DIST/ biopython > >> > > >> > sudo easy_install -U biopython > >> > > >> > None of them worked. I get these error messages: > >> > > >> > Reading http://pypi.python.org/simple/biopython/ > >> > No local packages or download links found for biopython > >> > error: Could not find suitable distribution for > >> > Requirement.parse('biopython') > >> > > >> > Can someone help me rectify this? I have a paper being reviewed > >> and the > >> > reviewer has problems getting biopython going, which means it's > >> pretty > >> > crucial for me. 
> >> > > >> > Csaba > >> > > >> > _______________________________________________ > >> > Biopython mailing list - Biopython at lists.open-bio.org > >> > >> > http://lists.open-bio.org/mailman/listinfo/biopython > >> > > >> _______________________________________________ > >> Biopython mailing list - Biopython at lists.open-bio.org > >> > >> http://lists.open-bio.org/mailman/listinfo/biopython > >> > >> > > > >-- > >Best Regards: > >Csaba Kiss PhD, MSc, BSc > >TA-43, HRL-1, MS888 > >Los Alamos National Laboratory > >Work: 1-505-667-9898 > >Cell: 1-505-920-5774 > > > >_______________________________________________ > >Biopython mailing list - Biopython at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython >

From p.j.a.cock at googlemail.com Wed Aug 28 18:47:04 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 28 Aug 2013 23:47:04 +0100
Subject: [Biopython] Biopython 1.62 released
Message-ID:

Dear Biopythoneers,

Source distributions and Windows installers for Biopython 1.62 are now available from the downloads page on the official Biopython website and (soon) from the Python Package Index (PyPI).

Python support

This is our first release of Biopython which officially supports Python 3. Specifically, this is supported under Python 3.3. Older versions of Python 3 may still work albeit with some issues, but are not supported. We still fully support Python 2.5, 2.6, and 2.7. Support under Jython is available for versions 2.5 and 2.7 and under PyPy for versions 1.9 and 2.0. However, unlike CPython, Jython and PyPy support is partial: NumPy and our C extensions are not covered.

Please note that this release marks our last official release to support Python 2.5. Beginning from Biopython 1.63, the minimum supported Python version will be 2.6.
Highlights

The translation functions will give a warning on any partial codons (and this will probably become an error in a future release). If you know you are dealing with partial sequences, either pad with 'N' to extend the sequence length to a multiple of three, or explicitly trim the sequence.

The handling of joins and related complex features in GenBank/EMBL files has been changed with the introduction of a CompoundLocation object. Previously a SeqFeature for something like a multi-exon CDS would have a child SeqFeature (under the sub_features attribute) for each exon. The sub_features property will still be populated for now, but is deprecated and will in future be removed. Please consult the examples in the help (docstrings) and Tutorial.

Thanks to the efforts of Ben Morris, the Phylo module now supports the file formats NeXML and CDAO. The Newick parser is also significantly faster, and can now optionally extract bootstrap values from the Newick comment field (like Molphy and Archaeopteryx do). Nate Sutton added a wrapper for FastTree to Bio.Phylo.Applications.

New module Bio.UniProt adds parsers for the GAF, GPA and GPI formats from UniProt-GOA.

The BioSQL module is now supported in Jython. MySQL and PostgreSQL databases can be used. The relevant JDBC driver should be available in the CLASSPATH.

Feature labels on circular GenomeDiagram figures now support the label_position argument (start, middle or end) in addition to the current default placement, and in a change to prior releases these labels are outside the features, which is now consistent with the linear diagrams.

The code for parsing 3D structures in mmCIF files was updated to use the Python standard library's shlex module instead of C code using flex.

The Bio.Sequencing.Applications module now includes a BWA command line wrapper.

Bio.motifs supports JASPAR format files with multiple position-frequency matrices.

Additionally there have been other minor bug fixes and more unit tests.
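[Editor's note: the padding advice in the first highlight can be sketched in a couple of lines of plain Python; `pad_to_codons` is a hypothetical helper for illustration, not part of Biopython.]

```python
def pad_to_codons(seq):
    """Pad a nucleotide string with 'N' so its length is a multiple of
    three, avoiding the partial-codon warning described above.
    (-len(seq) % 3 is 0, 1 or 2: the number of bases missing.)"""
    return seq + "N" * (-len(seq) % 3)

padded = pad_to_codons("ATGGCCA")  # 7 bases -> padded to 9 with "NN"
```

The padded string can then be translated without triggering the partial-codon warning; sequences already a multiple of three long are returned unchanged.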
Contributors

Many thanks to the Biopython developers and community for making this release possible, especially the following contributors:

Alexander Campbell (first contribution)
Andrea Rizzi (first contribution)
Anthony Mathelier (first contribution)
Ben Morris (first contribution)
Brad Chapman
Christian Brueffer
David Arenillas (first contribution)
David Martin (first contribution)
Eric Talevich
Iddo Friedberg
Jian-Long Huang (first contribution)
Joao Rodrigues
Kai Blin
Lenna Peterson
Michiel de Hoon
Matsuyuki Shirota (first contribution)
Nate Sutton (first contribution)
Peter Cock
Petra Kubincová (first contribution)
Phillip Garland
Saket Choudhary (first contribution)
Tiago Antao
Wibowo 'Bow' Arindrarto
Xabier Bello (first contribution)

Thank you all.

Release announcement here (RSS feed available):
http://news.open-bio.org/news/2013/08/biopython-1-62-released/

P.S. You can follow @Biopython on Twitter
https://twitter.com/Biopython

From jadolfbr at gmail.com Wed Aug 28 20:25:43 2013
From: jadolfbr at gmail.com (Gmail)
Date: Wed, 28 Aug 2013 19:25:43 -0500
Subject: [Biopython] deleting/detaching residues from a chain
In-Reply-To:
References:
Message-ID: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com>

> Hi All,
>
> I am trying to delete residues from a chain using
>
> id = res.id
> old_chain.detach_child(id)
>
> Which was talked about here:
> http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython
>
> I used both the release versions and built the github version, but I get an error message:
>
> File "/PyIgClassify/tools/renumbering.py", line 296, in delete_res_in_old_chain
>     old_chain.detach_child(id)
> File "/Library/Python/2.7/site-packages/Bio/PDB/Entity.py", line 76, in detach_child
>     child=self.child_dict[id]
> TypeError: unhashable type: 'list'
>
> I have tried this in iPython with arbitrary chains and the result seems to be the same.
> Any advice for a newbie Biopython coder? Is this a bug or is there some way around this?
>
> Thanks!!
>
> Jared Adolf-Bryfogle
> PhD Candidate
> Lab of Dr. Roland Dunbrack
> FCCC/DrexelMed

From anaryin at gmail.com Wed Aug 28 20:31:04 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 29 Aug 2013 01:31:04 +0100
Subject: [Biopython] deleting/detaching residues from a chain
In-Reply-To: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com>
References: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com>
Message-ID:

Hi Jared,

Can you give us an example of the code you are running? child_dict *should* be a dictionary, not a list, so there must be something wrong in there.

Cheers,

João

From jadolfbr at gmail.com Wed Aug 28 20:53:22 2013
From: jadolfbr at gmail.com (Jared Adolf-Bryfogle)
Date: Wed, 28 Aug 2013 20:53:22 -0400
Subject: [Biopython] deleting/detaching residues from a chain
In-Reply-To:
References: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com>
Message-ID:

Sure, I was thinking it's the res.id list that's passed to the child_dict to detach it? Here is the code snippet:

def delete_res_in_old_chain(old_chain, start, end):
    seq_position = 1
    for res in old_chain:
        if not (start <= seq_position <= end):
            id = res.id
            old_chain.detach_child(id)
        if not res.id[0] == ' ':
            seq_position += 1

On Wed, Aug 28, 2013 at 8:31 PM, João Rodrigues wrote:
> Hi Jared,
>
> Can you give us an example of the code you are running? child_dict
> *should* be a dictionary, not a list, so there must be something wrong in
> there.
>
> Cheers,
>
> João

From anaryin at gmail.com Wed Aug 28 21:02:32 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 29 Aug 2013 02:02:32 +0100
Subject: [Biopython] deleting/detaching residues from a chain
In-Reply-To:
References: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com>
Message-ID:

True, if id is a list something is wrong. The code on github seems good, and my local copy here works good too. What exactly is old_chain? Are you sure it is a true chain? What is the content of res.id?
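[Editor's note: João's diagnosis can be reproduced without Bio.PDB at all. `detach_child()` does a dictionary lookup on the id, and Bio.PDB residue ids are tuples of the form (hetero flag, sequence number, insertion code); the dictionary below is a stand-in for `child_dict`, not Biopython code.]

```python
# A stand-in for Entity.child_dict, keyed by residue-id tuples.
child_dict = {(" ", 11, " "): "GLY", (" ", 12, " "): "ALA"}

# Tuples are immutable and hashable, so lookup works.
name = child_dict[(" ", 11, " ")]

# The same values in a list are unhashable, reproducing the traceback above.
try:
    child_dict[[" ", 11, " "]]
    raised = False
except TypeError:  # "unhashable type: 'list'"
    raised = True
```

This is why passing `res.id` works only while it is the original tuple: converting it to a list anywhere upstream makes every later `child_dict[id]` lookup fail with exactly the TypeError in Jared's traceback.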
From jadolfbr at gmail.com Wed Aug 28 22:04:23 2013
From: jadolfbr at gmail.com (Jared Adolf-Bryfogle)
Date: Wed, 28 Aug 2013 22:04:23 -0400
Subject: [Biopython] deleting/detaching residues from a chain
In-Reply-To:
References: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com>
Message-ID:

Yes - the list was the problem. Looking through the code again, I made res.id a list in another function where I changed residue numbers - instead of passing it a tuple like I should have. Sorry about this - thanks for taking the time to help me.

-Jared

On Wed, Aug 28, 2013 at 9:02 PM, João Rodrigues wrote:
> True, if id is list something is wrong. The code on github seems good, and
> my local copy here works good too. What exactly is old_chain? Are you sure
> it is a true chain? What is the content of res.id?

From jdjensen at eng.ucsd.edu Thu Aug 29 19:04:41 2013
From: jdjensen at eng.ucsd.edu (James Jensen)
Date: Thu, 29 Aug 2013 16:04:41 -0700
Subject: [Biopython] (Bio.PDB) problems with NeighborSearch: error at levels above "A", residue index discrepancy with unfold_entities
Message-ID: <521FD389.1090207@eng.ucsd.edu>

Hello!

I am writing a function that, given two chains in a PDB file, should return 1) the positions and identities of all residues that are in contact with (distance < 5 angstroms) a residue on the other chain, and 2) the amino acid sequences of the chains. I've been doing this with NeighborSearch.search_all(radius=5, level='A') and then for each atom pair, seeing what its parent residue is and whether the parent residues of the two atoms belong to different chains. This may seem like a roundabout way of doing it, but if I call search_all(radius=5, level='R'), or indeed with level=any level other than 'A', I get the error

TypeError: unorderable types: Residue() < Residue()

So my first question is why it might be that search_all isn't working at higher levels.
For the adjacent residue pairs I identify using NeighborSearch, I get each residue's position in its respective chain by residue.get_id()[1]. I've noticed, however, that if I get the sequence of the chain using

seq = Selection.unfold_entities(chain, 'R')

and then reference (i.e. seq[index]) the amino acids using the indices returned by the NeighborSearch step, they are not the same residues that I get if during the NeighborSearch step I report residue.get_resname() for each adjacent residue. I've tried it with several proteins, and the problem is the same. Chains A and C of 2h62 are an example.

I then noticed that the lowest residue ID number of the residues yielded from Selection.unfold_entities(chain, 'R') is not 1. For chain A, it's 11, and for chain C, it's 34. Not knowing why this was, I thought I'd try subtracting the lowest ID number from the indices returned by the NeighborSearch step (i.e. in chain A, 11 -> 0 so seq[0] would be the first residue, the one with ID 11). This happened to seem to work for chain A. However, it gives me negative indices for some of the contacts in chain C. This means that NeighborSearch can return residues that are not returned by unfold_entities(). The lowest residue ID returned by NeighborSearch for chain C was 24, whereas for unfold_entities() it was 34.

For both chains A and C, I was given the warning

PDBConstructionWarning: WARNING: Chain [letter] is discontinuous at line [line number].

In fact, I seem to get this warning for just about every chain of every structure I load. Is this the reason that the first residues in the two chains are at 11 and 34, rather than 1? If so, could it be that NeighborSearch is able to work around the discontinuity while unfold_entities is not?

Any suggestions?
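[Editor's note: the index arithmetic James describes breaks as soon as a chain's numbering has a gap, which the "discontinuous" warning suggests. The sketch below uses hypothetical residue numbers, not data from 2h62, to show why offsets into the unfold_entities() list are fragile and why a lookup keyed by the residue id number is safer.]

```python
# Hypothetical discontinuous chain: numbering starts at 34 with a gap at 36,
# like the chains the PDBConstructionWarning complains about.
residues = [(34, "MET"), (35, "LYS"), (37, "GLU")]  # (id number, name)
seq = [name for _, name in residues]                # what seq[index] walks over

first = residues[0][0]
# Fragile: seq[num - first] assumes contiguous numbering from the first id.
try:
    seq[37 - first]          # computes seq[3], but only indices 0-2 exist
    gap_breaks_indexing = False
except IndexError:
    gap_breaks_indexing = True

# Robust: look residues up by their id number rather than by list offset.
by_id = dict(residues)
```

With real Bio.PDB objects the same idea would be a dict keyed by `residue.get_id()`, built once from the chain, so NeighborSearch results and sequence positions always refer to the same residue regardless of gaps or the starting number.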
Thanks for your time and help,

James Jensen

From p.j.a.cock at googlemail.com Tue Aug 6 09:35:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 6 Aug 2013 10:35:25 +0100
Subject: [Biopython] Reading large files, Biopython cookbook example
In-Reply-To: <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu> <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com>
Message-ID:

On Tue, Aug 6, 2013 at 12:09 AM, Andrew Dalke wrote:
> A bit late, but a bit of background:
>
>> On Sun, Jul 14, 2013 at 5:40 PM, Katrina Lexa wrote:
>>> My PDB file came from Maestro, so that is the ordering it follows after 9999.
>
> On Jul 15, 2013, at 7:46 PM, Peter Cock wrote:
>> i.e. This software package? http://www.schrodinger.com/productpage/14/12/
>>
>> Could you contact their support to find out why they are doing this please?
>
> Yes, that's the Maestro Katrina was almost certainly talking about. It's a
> commercial package which has been around for a while; the company
> started in 1990 as a commercialization of the Jaguar QM package from
> Richard Friesner's and William Goddard's labs at CalTech. Maestro is
> the GUI to their QM and MM codes.
>
> Their conversion routines support various options. See:
> https://www.schrodinger.com//AcrobatFile.php?type=supportdocs&type2=&ident=530
>
> The key ones are:
>
> -hex : Use hexadecimal encoding for atom numbers greater
> than 99999 and for residue numbers greater than 9999
>
> and
>
> -hybrid36 : Use the hybrid36 scheme for atom serial numbers.
> On input, integers of up to 6 digits and hexadecimal numbers are > recognized on ATOM records by default. On output, the default is > to use integers for less than 100 000 atoms, and hexadecimal for > 100 000 atoms or more > > > Annoyingly, as Robert Hanson reported in: > http://www.mailinglistarchive.com/html/jmol-users at lists.sourceforge.net/2013-01/msg00111.html > (and see the thread at) > http://article.gmane.org/gmane.science.chemistry.blue-obelisk/1659/match=pdb+ok+who%27s+wise+guy > > their default output generates records like:
> ATOM 99998 H1 TIP3W3304 -28.543 60.673 40.064 1.00 0.00 WT5 H
> ATOM 99999 H2 TIP3W3304 -27.773 60.376 41.353 1.00 0.00 WT5 H
> ATOM 186a0 OH2 TIP3W3305 -24.713 61.533 47.372 1.00 0.00 WT5 O
> ATOM 186a1 H1 TIP3W3305 -25.652 61.772 47.519 1.00 0.00 WT5 H
> ATOM 186a2 H2 TIP3W3305 -24.713 61.625 46.379 1.00 0.00 WT5 H
> > which means there can be two atoms with serial numbers "18700" (or > "99999", etc) in the same file, with different meanings of what those > numbers really mean. > > This obviously messes up all of the other PDB annotations which use > a serial id, but I presume that most Maestro users only use PDB files > for coordinate data, and not for the other fields. > > Maestro is the only program I know of which uses this awful form. A > default enabling of the "-hybrid36" option (first-digit-is-in-base-36) > would make it more consistent with tools in the X-PLOR/VMD > heritage, where A0000 follows 99999. Presumably they want > the full 1,048,575 atom range. > > >> If there are guidelines in the PDB specification for when this field overflows >> I missed them, but it is a problem if there are rival hacks in common use >> (roll-over/wrap-around versus this semi-hex scheme). > > There are no specs for how to handle more than 9999 residues, > just like there are no specs for how to handle more than 99999 atoms. > > Cheers, > > Andrew > dalke at dalkescientific.com Thanks Andrew - useful background. 
In the long run this problem should go away as the PDB moves to using the PDBx/mmCIF format: http://www.wwpdb.org/news/news_2013.html#22-May-2013 Peter From dalke at dalkescientific.com Tue Aug 6 18:49:35 2013 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 6 Aug 2013 20:49:35 +0200 Subject: [Biopython] Reading large files, Biopython cookbook example In-Reply-To: References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu> <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com> Message-ID: On Aug 6, 2013, at 11:35 AM, Peter Cock wrote: > In the long run this problem should go away as the PDB moves > to using the PDBx/mmCIF format: > http://www.wwpdb.org/news/news_2013.html#22-May-2013 Either you are optimistic or an ultra marathon runner! The move over to mmCIF started of course 20 years ago, and that link you gave said the change applies only to very large structures: Structures that do not exceed the limitations of the PDB format will continue to be provided as PDB files in the archive for the foreseeable future. Even for large files, which previously would split the structure over multiple records, there will be a "best-effort" PDB format, available as a web service. 40 years of the PDB format => well-entrenched => not going to get rid of it any time soon. For another historical side-note, the PDB format started in the early 1970s, but contains a kernel which is even older! Quoting from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf : In order to establish the PDB, acceptance by the crystallographic community was necessary, requiring a pilgrimage in 1970 to the Medical Research Council (MRC) laboratory and Crystal Data Centre (CDC) in Cambridge. One result of this exchange was a concession that coordinates of protein structures would be stored in the same format as the small molecule CDC database (with a redundant ATOM label at the beginning of each card), retaining the now-arcane counting number at the end. 
But the idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond, and colleagues in Cambridge. The "now-arcane" counting number has long disappeared from the spec. It was there, I believe, so that if the punch cards were dropped then they could be resorted based on the last few columns. (I imagine you could also write a program to strip out the C-alpha cards, work with them, then merge the C-alphas back into the card deck correctly.) Andrew dalke at dalkescientific.com From anaryin at gmail.com Tue Aug 6 19:08:20 2013 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 6 Aug 2013 12:08:20 -0700 Subject: [Biopython] Reading large files, Biopython cookbook example In-Reply-To: References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu> <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com> Message-ID: It's quite hopeful indeed to believe that PDB is going to be phased out... unfortunately structural biology is quite conservative (nice word for stubborn) regarding formats! The new format will probably only be "yet another one", although I'm hopeful it will bring some fresh air. From cjfields at illinois.edu Tue Aug 6 18:59:09 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 6 Aug 2013 18:59:09 +0000 Subject: [Biopython] Reading large files, Biopython cookbook example In-Reply-To: References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu> <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF7B1630D8@CHIMBX5.ad.uillinois.edu> On Aug 6, 2013, at 1:49 PM, Andrew Dalke wrote: > >... > For another historical side-note, the PDB format started in > the early 1970s, but contains a kernel which is even older! 
> Quoting from > > http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf : > > In order to establish the PDB, acceptance by the crystallographic > community was necessary, requiring a pilgrimage in 1970 to the Medical > Research Council (MRC) laboratory and Crystal Data Centre (CDC) in > Cambridge. One result of this exchange was a concession that coordinates > of protein structures would be stored in the same format as the small > molecule CDC database (with a redundant ATOM label at the beginning of > each card), retaining the now-arcane counting number at the end. But the > idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond, > and colleagues in Cambridge. > > The "now-arcane" counting number has long disappeared from the > spec. It was there, I believe, so that if the punch cards were > dropped then they could be resorted based on the last few columns. > (I imagine you could also write a program to strip out the > C-alpha cards, work with them, then merge the C-alphas back into > the card deck correctly.) > > Andrew > dalke at dalkescientific.com Now *that* is backwards-compatibility taken to an extreme. chris From Jared.Sampson at nyumc.org Tue Aug 6 20:10:25 2013 From: Jared.Sampson at nyumc.org (Sampson, Jared) Date: Tue, 6 Aug 2013 20:10:25 +0000 Subject: [Biopython] Reading large files, Biopython cookbook example In-Reply-To: References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu> <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com> Message-ID: <4B22CFF6-F543-45B8-B82C-704642A9CED7@nyumc.org> For the curious, there has been a conversation on the CCP4 Bulletin Board over the past few days addressing exactly this topic. The takeaway message is essentially what Andrew has mentioned: PDB format is here for the foreseeable future. 
http://www.mail-archive.com/ccp4bb at jiscmail.ac.uk/msg32321.html Cheers, Jared -- Jared Sampson Xiangpeng Kong Lab NYU Langone Medical Center Old Public Health Building, Room 610 341 East 25th Street New York, NY 10016 212-263-7898 http://kong.med.nyu.edu/ On Aug 6, 2013, at 2:49 PM, Andrew Dalke wrote: On Aug 6, 2013, at 11:35 AM, Peter Cock wrote: In the long run this problem should go away as the PDB moves to using the PDBx/mmCIF format: http://www.wwpdb.org/news/news_2013.html#22-May-2013 Either you are optimistic or an ultra marathon runner! The move over to mmCIF started of course 20 years ago, and that link you gave said the change applies only to very large structures: Structures that do not exceed the limitations of the PDB format will continue to be provided as PDB files in the archive for the foreseeable future. Even for large files, which previously would split the structure over multiple records, there will be a "best-effort" PDB format, available as a web service. 40 years of the PDB format => well-entrenched => not going to get rid of it any time soon. For another historical side-note, the PDB format started in the early 1970s, but contains a kernel which is even older! Quoting from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf : In order to establish the PDB, acceptance by the crystallographic community was necessary, requiring a pilgrimage in 1970 to the Medical Research Council (MRC) laboratory and Crystal Data Centre (CDC) in Cambridge. One result of this exchange was a concession that coordinates of protein structures would be stored in the same format as the small molecule CDC database (with a redundant ATOM label at the beginning of each card), retaining the now-arcane counting number at the end. But the idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond, and colleagues in Cambridge. The "now-arcane" counting number has long disappeared from the spec. 
It was there, I believe, so that if the punch cards were dropped then they could be resorted based on the last few columns. (I imagine you could also write a program to strip out the C-alpha cards, work with them, then merge the C-alphas back into the card deck correctly.) Andrew dalke at dalkescientific.com _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From anaryin at gmail.com Tue Aug 6 20:46:17 2013 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 6 Aug 2013 13:46:17 -0700 Subject: [Biopython] Reading large files, Biopython cookbook example In-Reply-To: <4B22CFF6-F543-45B8-B82C-704642A9CED7@nyumc.org> References: <5EA03B7D-5815-4C23-912B-12471E1D28A4@umich.edu> <5516DC46-FE01-405A-92EA-D4E947C79761@dalkescientific.com> <4B22CFF6-F543-45B8-B82C-704642A9CED7@nyumc.org> Message-ID: Really nice discussion Jared, thanks for sharing. 2013/8/6 Sampson, Jared > For the curious, there has been a conversation on the CCP4 Bulletin Board > over the past few days addressing exactly this topic. The takeaway message > is essentially what Andrew has mentioned: PDB format is here for the > foreseeable future. > > http://www.mail-archive.com/ccp4bb at jiscmail.ac.uk/msg32321.html > > Cheers, > Jared > > -- > Jared Sampson > Xiangpeng Kong Lab > NYU Langone Medical Center > Old Public Health Building, Room 610 > 341 East 25th Street > New York, NY 10016 > 212-263-7898 > http://kong.med.nyu.edu/ > > > > > On Aug 6, 2013, at 2:49 PM, Andrew Dalke > wrote: > > On Aug 6, 2013, at 11:35 AM, Peter Cock wrote: > In the long run this problem should go away as the PDB moves > to using the PDBx/mmCIF format: > http://www.wwpdb.org/news/news_2013.html#22-May-2013 > > Either you are optimistic or an ultra marathon runner! 
The > move over to mmCIF started of course 20 years ago, and that > link you gave said the change applies only to very large > structures: > > Structures that do not exceed the limitations of the PDB > format will continue to be provided as PDB files in the > archive for the foreseeable future. > > Even for large files, which previously would split the structure > over multiple records, there will be a "best-effort" PDB format, > available as a web service. > > > 40 years of the PDB format => well-entrenched => not going to > get rid of it any time soon. > > > > For another historical side-note, the PDB format started in > the early 1970s, but contains a kernel which is even older! > Quoting from > > http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf : > > In order to establish the PDB, acceptance by the crystallographic > community was necessary, requiring a pilgrimage in 1970 to the Medical > Research Council (MRC) laboratory and Crystal Data Centre (CDC) in > Cambridge. One result of this exchange was a concession that coordinates > of protein structures would be stored in the same format as the small > molecule CDC database (with a redundant ATOM label at the beginning of > each card), retaining the now-arcane counting number at the end. But the > idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond, > and colleagues in Cambridge. > > The "now-arcane" counting number has long disappeared from the > spec. It was there, I believe, so that if the punch cards were > dropped then they could be resorted based on the last few columns. > (I imagine you could also write a program to strip out the > C-alpha cards, work with them, then merge the C-alphas back into > the card deck correctly.) 
> > Andrew > dalke at dalkescientific.com > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From arklenna at gmail.com Thu Aug 8 19:54:58 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 8 Aug 2013 15:54:58 -0400 Subject: [Biopython] PDB occupancy behavior Message-ID: Hi all, I just submitted a pull request I'd like wider feedback on. https://github.com/biopython/biopython/pull/207 In summary, I am using software-produced PDB files that simply stop after the coordinate data, so occupancy data is missing. Currently, the Biopython PDBParser sets missing or blank occupancy to 0.0. I am suggesting changing this to 1.0. I would like to see if anyone knows of situations in which this would be a bad idea. Cheers, Lenna From anaryin at gmail.com Thu Aug 8 20:02:39 2013 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Thu, 8 Aug 2013 13:02:39 -0700 Subject: [Biopython] PDB occupancy behavior In-Reply-To: References: Message-ID: Hi Lenna, As I mentioned in the Github email, I think it's fine. It doesn't matter if the occupancy is 0 or 1 in case of a model most of the time. I agree with it. The only bad thing I can think about is having occupancy for a certain atom larger than 1 in some bogus cases but to be honest, no software that I know of bothers checking that... Cheers, João 2013/8/8 Lenna Peterson > Hi all, > > I just submitted a pull request I'd like wider feedback on. > > https://github.com/biopython/biopython/pull/207 > > In summary, I am using software-produced PDB files that simply stop after > the coordinate data, so occupancy data is missing. Currently, the Biopython > PDBParser sets missing or blank occupancy to 0.0. 
I am suggesting changing > this to 1.0. > > I would like to see if anyone knows of situations in which this would be a > bad idea. > > Cheers, > > Lenna > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From Jared.Sampson at nyumc.org Thu Aug 8 20:30:31 2013 From: Jared.Sampson at nyumc.org (Sampson, Jared) Date: Thu, 8 Aug 2013 20:30:31 +0000 Subject: [Biopython] PDB occupancy behavior In-Reply-To: References: Message-ID: Thanks, Lenna and João - I also agree, 1.0 is a better default occupancy value. For most structural manipulation purposes, unless specified otherwise, we must assume the atoms listed are present in the structure at full occupancy. Setting a reduced occupancy can be useful for partially bound ligands, disordered loops, and so forth, but doing so is the exception, not the rule. Cheers, Jared -- Jared Sampson Xiangpeng Kong Lab NYU Langone Medical Center Old Public Health Building, Room 610 341 East 25th Street New York, NY 10016 212-263-7898 http://kong.med.nyu.edu/ On Aug 8, 2013, at 4:02 PM, João Rodrigues > wrote: Hi Lenna, As I mentioned in the Github email, I think it's fine. It doesn't matter if the occupancy is 0 or 1 in case of a model most of the time. I agree with it. The only bad thing I can think about is having occupancy for a certain atom larger than 1 in some bogus cases but to be honest, no software that I know of bothers checking that... Cheers, João 2013/8/8 Lenna Peterson > Hi all, I just submitted a pull request I'd like wider feedback on. https://github.com/biopython/biopython/pull/207 In summary, I am using software-produced PDB files that simply stop after the coordinate data, so occupancy data is missing. Currently, the Biopython PDBParser sets missing or blank occupancy to 0.0. I am suggesting changing this to 1.0. I would like to see if anyone knows of situations in which this would be a bad idea. 
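In fixed-column terms the change under discussion is small; here is a sketch of the idea (illustrative code with made-up records, not the actual Bio.PDB parser):

```python
def parse_occupancy(atom_line, default=1.0):
    # PDB ATOM/HETATM records keep occupancy in columns 55-60 (0-based
    # slice 54:60); coordinate-only writers stop after the coordinates,
    # so fall back to the given default when the field is missing or blank.
    field = atom_line[54:60].strip()
    return float(field) if field else default

# Hypothetical records: one truncated after the coordinates, one full.
short = "ATOM      1  N   ASP A   1      11.860  13.207  12.724"
full  = "ATOM      1  N   ASP A   1      11.860  13.207  12.724  0.50 30.00"
assert parse_occupancy(short) == 1.0   # missing -> assume full occupancy
assert parse_occupancy(full) == 0.5
```

With `default=1.0` a truncated file behaves like a fully occupied model, while files that do state a reduced occupancy are unaffected.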
Cheers, Lenna _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at googlemail.com Thu Aug 8 22:37:27 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 8 Aug 2013 23:37:27 +0100 Subject: [Biopython] PDB occupancy behavior In-Reply-To: References: Message-ID: Thanks everyone - that seems like a clear consensus, patch applied :) Peter On Thu, Aug 8, 2013 at 9:30 PM, Sampson, Jared wrote: > Thanks, Lenna and João - > > I also agree, 1.0 is a better default occupancy value. For most > structural manipulation purposes, unless specified otherwise, we must assume > the atoms listed are present in the structure at full occupancy. Setting a > reduced occupancy can be useful for partially bound ligands, disordered > loops, and so forth, but doing so is the exception, not the rule. > > Cheers, > Jared > > -- > Jared Sampson > Xiangpeng Kong Lab > NYU Langone Medical Center > Old Public Health Building, Room 610 > 341 East 25th Street > New York, NY 10016 > 212-263-7898 > http://kong.med.nyu.edu/ > > > > > On Aug 8, 2013, at 4:02 PM, João Rodrigues > > wrote: > > Hi Lenna, > > As I mentioned in the Github email, I think it's fine. It doesn't matter > if the occupancy is 0 or 1 in case of a model most of the time. I agree > with it. The only bad thing I can think about is having occupancy for > a certain atom larger than 1 in some bogus cases but to be honest, > no software that I know of bothers checking that... > > Cheers, > > João > > > 2013/8/8 Lenna Peterson > > > Hi all, > > I just submitted a pull request I'd like wider feedback on. 
> > https://github.com/biopython/biopython/pull/207 > > In summary, I am using software-produced PDB files that simply stop after > the coordinate data, so occupancy data is missing. Currently, the > Biopython PDBParser sets missing or blank occupancy to 0.0. I am > suggesting changing this to 1.0. > > I would like to see if anyone knows of situations in which this would be a > bad idea. > > Cheers, > > Lenna From sainitin7 at gmail.com Fri Aug 9 19:12:39 2013 From: sainitin7 at gmail.com (sai nitin) Date: Fri, 9 Aug 2013 21:12:39 +0200 Subject: [Biopython] Issue in retrieving Pubmed Ids Message-ID: Hi all, I have a set of genes (ASCL1, AEBP1, MLF1) and I want to search PubMed for literature in glioma, i.e. I want to get PubMed IDs for these genes in glioma. To achieve this I tried a Biopython script as follows. First I stored the terms in a file as follows:

ASCL1 and glioma

AEBP1 and glioma

.....

infile = "file.txt"
for line in infile.readlines():
    single_id = line
    #Retreiving information
    data = Entrez.esearch(db="pubmed",term = single_id)
    res=Entrez.read(data)
    PMID = res["IdList"]
    print "%s" %(PMID)
    out_put.write("%s\n" %(PMID))
out_put.close()

It reads only the first line, ASCL1 and glioma, and gives the result as follows:

['22859994', '18796682', '18636433', '17146289', '17124508', '16103883', '11433425']

Traceback (most recent call last):
  File "PUBMED.py", line 13, in
    res=Entrez.read(data)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/__init__.py", line 367, in read
    record = handler.read(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 184, in read
    self.parser.ParseFile(handle)
  File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 322, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Empty term and query_key - nothing todo

It looks like it does not read the second line in file.txt. Can anybody tell me how to solve this issue? 
Thanks, -- Sainitin D From arklenna at gmail.com Fri Aug 9 19:42:13 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Fri, 9 Aug 2013 15:42:13 -0400 Subject: [Biopython] Issue in retrieving Pubmed Ids In-Reply-To: References: Message-ID: If your file has blank lines as shown, then the second iteration is calling `data = Entrez.esearch(db="pubmed",term = "\n")` When reading files I often have a check like this:

line = line.strip()
if not line: continue

Hope that helps. Cheers, Lenna On Fri, Aug 9, 2013 at 3:12 PM, sai nitin wrote: > Hi all, > > I have set of genes ( ASCL1, AEBP1, MLF1) i want to search PUBMED for > literature in glioma. Means i want to get Pubmed Ids for these genes in > glioma. To achieve this i tried biopython script as follows. First i stored > this terms in file as follows > > ASCL1 and glioma > > AEBP1 and glioma > > ..... > > infile = "file.txt" > > for line in infile.readlines(): > > single_id = line > > #Retreiving information > > data = Entrez.esearch(db="pubmed",term = single_id) > > res=Entrez.read(data) > > PMID = res["IdList"] > > print "%s" %(PMID) > > out_put.write("%s\n" %(PMID)) > > out_put.close() > > It reads only first line ASCL1 and glioma ... and gives result as follows > > ['22859994', '18796682', '18636433', '17146289', '17124508', '16103883', > '11433425'] > > Traceback (most recent call last): > > File "PUBMED.py", line 13, in > > res=Entrez.read(data) > > File "/Library/Python/2.7/site-packages/Bio/Entrez/__init__.py", line > 367, in read > > record = handler.read(handle) > > File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 184, > in read > > self.parser.ParseFile(handle) > > File "/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py", line 322, > in endElementHandler > > raise RuntimeError(value) > > RuntimeError: Empty term and query_key - nothing todo > > > It looks like it does not read second line..in file.txt..Can any body tell > how to solve this issue.. 
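Folding that check into the original loop, a corrected version might look like this (an untested sketch: the helper names are mine, and the Entrez calls assume Entrez.email has been set):

```python
def clean_terms(lines):
    # Skip blank lines so an empty term is never sent to esearch, which
    # is what triggers "Empty term and query_key - nothing todo".
    for line in lines:
        term = line.strip()
        if term:
            yield term

def pmids_for_term(term):
    # One esearch call per term; returns the list of PubMed IDs.
    from Bio import Entrez  # imported here so the helper above stays standalone
    handle = Entrez.esearch(db="pubmed", term=term)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]

# Usage sketch:
# with open("file.txt") as infile, open("pmids.txt", "w") as out_put:
#     for term in clean_terms(infile):
#         out_put.write("%s\n" % ",".join(pmids_for_term(term)))
```

Opening the file explicitly also avoids the `infile = "file.txt"` / `infile.readlines()` mismatch in the posted script.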
> > > Thanks, > > -- > > Sainitin D > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From golubchi at stats.ox.ac.uk Wed Aug 21 08:41:48 2013 From: golubchi at stats.ox.ac.uk (Tanya Golubchik) Date: Wed, 21 Aug 2013 09:41:48 +0100 Subject: [Biopython] NcbiblastnCommandline vs subprocess blast Message-ID: <52147D4C.4090900@stats.ox.ac.uk> Hello, The following refers to Biopython 1.61. Does anyone know if there are any hidden or hard-coded defaults for any parameters in NcbiblastnCommandline? Or any known bugs that could cause hits not to be reported? I've encountered an extremely frustrating issue that I've never seen before. The upshot is that the blastn result obtained through NcbiblastnCommandline occasionally reports "no hits" in a way that seems dependent on the query file. This strange output is different from that obtained via a subprocess call (or outside python entirely) -- both of which recover all hits consistently. I am using *exactly* the same parameters, inputs, and exactly the same path to the blastn executable. What actually happens is that for my large fasta file of 3000-odd nucleotide queries, several give no hits with the NcbiblastnCommandline call (these are not at the end of the query file, just throughout the file). On the other hand, cutting the file down to about 2000 queries, but not changing it in any other way, does give hits for these missing queries. Note that *nothing* else is changed; the parameters and call to blastn remain identical. This only happens for some, not all, of the blast databases I'm searching, making it look like there are variable deletions between the samples. I can provide (large) test data files if anyone thinks they can help. I have the query file that produces the wrong 'patchy' output, another that produces the correct output, and a sample blast database for which this happens. 
The actual calls are (paths substituted): NcbiblastnCommandline(cmd='/path/to/blastn', outfmt=5, query='untrimmed.fasta', db='/path/to/db/C00006635', gapopen=5, gapextend=2, culling_limit=2) The above gives 'no hits' for about 25 queries out of the 3000+ in the file. stdout, stderr = sp.Popen("/path/to/blastn -db /path/to/db/C00006635 -query untrimmed.fasta -outfmt 5 -word_size 17 -gapopen 5 -gapextend 2 -culling_limit 2".split(), stdout=sp.PIPE, stderr=sp.PIPE).communicate() The above call to subprocess returns all hits, correctly. Thanks Tanya From p.j.a.cock at googlemail.com Wed Aug 21 09:39:04 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 21 Aug 2013 10:39:04 +0100 Subject: [Biopython] NcbiblastnCommandline vs subprocess blast In-Reply-To: <52147D4C.4090900@stats.ox.ac.uk> References: <52147D4C.4090900@stats.ox.ac.uk> Message-ID: On Wed, Aug 21, 2013 at 9:41 AM, Tanya Golubchik wrote: > Hello, > > The following refers to Biopython 1.61. Does anyone know if there are any > hidden or hard-coded defaults for any parameters in NcbiblastnCommandline? > Or any known bugs that could cause hits not to be reported? > > I've encountered an extremely frustrating issue that I've never seen before. > The upshot is that the blastn result obtained through NcbiblastnCommandline > occasionally reports "no hits" in a way that seems dependent on the query > file. This strange output is different from that obtained via a subprocess > call (or outside python entirely) -- both of which recover all hits > consistently. I am using *exactly* the same parameters, inputs, and exactly > the same path to the blastn executable. > > What actually happens is that for my large fasta file of 3000-odd nucleotide > queries, several give no hits with the NcbiblastnCommandline call (these are > not at the end of the query file, just throughout the file). 
On the other > hand, cutting the file down to about 2000 queries, but not changing it in > any other way, does give hits for these missing queries. Note that *nothing* > else is changed; the parameters and call to blastn remain identical. This > only happens for some, not all, of the blast databases I'm searching, making > it look like there are variable deletions between the samples. > > I can provide (large) test data files if anyone thinks they can help. I have > the query file that produces the wrong 'patchy' output, another that > produces the correct output, and a sample blast database for which this > happens. > > The actual calls are (paths substituted): > > NcbiblastnCommandline(cmd='/path/to/blastn', outfmt=5, > query='untrimmed.fasta', db='/path/to/db/C00006635', gapopen=5, gapextend=2, > culling_limit=2) > > The above gives 'no hits' for about 25 queries out of the 3000+ in the file. Are you using it like this, which also uses subprocess,

cline = NcbiblastnCommandline(...)
stdout, stderr = cline()

> stdout, stderr = sp.Popen("/path/to/blastn -db /path/to/db/C00006635 -query > untrimmed.fasta -outfmt 5 -word_size 17 -gapopen 5 -gapextend 2 > -culling_limit 2".split(), stdout=sp.PIPE, stderr=sp.PIPE).communicate() > > The above call to subprocess returns all hits, correctly. Note there is a subtle difference in the order of the command line which could (depending on how the command line parsing is done) reveal a bug in BLAST:

>>> from Bio.Blast.Applications import NcbiblastnCommandline
>>> cline = NcbiblastnCommandline(cmd='/path/to/blastn', outfmt=5, query='untrimmed.fasta', db='/path/to/db/C00006635', gapopen=5, gapextend=2, culling_limit=2)
>>> str(cline)
'/path/to/blastn -outfmt 5 -query untrimmed.fasta -db /path/to/db/C00006635 -gapopen 5 -gapextend 2 -culling_limit 2'

Just to rule this out, retry running this string instead. However, the more likely explanation is you didn't set the wordsize, unless that is a typo in this email? 
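One mechanical way to rule the argument ordering in or out (a sketch, reusing the placeholder paths from the messages above) is to run the exact string the wrapper builds through subprocess and compare the two outputs:

```python
import shlex
import subprocess

def run_blast(cmd_string):
    # Split the command line the way a shell would (shlex copes with
    # quoted arguments, unlike plain str.split) and capture stdout/stderr.
    args = shlex.split(cmd_string)
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.communicate()

# Usage sketch, with the string printed by str(cline):
# stdout, stderr = run_blast("/path/to/blastn -outfmt 5 -query untrimmed.fasta "
#                            "-db /path/to/db/C00006635 -gapopen 5 -gapextend 2 "
#                            "-culling_limit 2")
```

If that string behaves like the wrapper call rather than the hand-written one, the ordering matters; if not, a missing parameter such as -word_size is the likelier culprit.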
Regards, Peter From golubchi at stats.ox.ac.uk Wed Aug 21 11:38:36 2013 From: golubchi at stats.ox.ac.uk (Tanya Golubchik) Date: Wed, 21 Aug 2013 12:38:36 +0100 Subject: [Biopython] NcbiblastnCommandline vs subprocess blast In-Reply-To: References: <52147D4C.4090900@stats.ox.ac.uk> Message-ID: <5214A6BC.2070003@stats.ox.ac.uk> Hi Peter, Thank you for your help. You're right that it's the word size, although not in the way this was suggesting -- what happened was that with my selected word size, some queries were failing with -culling_limit 1 (I still don't quite know why this happens, it seems to be a blast bug) and defaulting to culling limit 2 -- which is fine, but at this point they were losing the word size argument (my stuff-up). Thanks again! Tanya From csaba.kiss at lanl.gov Sun Aug 25 19:20:40 2013 From: csaba.kiss at lanl.gov (Kiss, Csaba) Date: Sun, 25 Aug 2013 19:20:40 +0000 Subject: [Biopython] Mac installation problem Message-ID: I have a problem installing biopython on a MAC system using the easy_install method. I have tried these two commands: sudo easy_install -f http://biopython.org/DIST/ biopython sudo easy_install -U biopython None of them worked. I get these error messages: Reading http://pypi.python.org/simple/biopython/ No local packages or download links found for biopython error: Could not find suitable distribution for Requirement.parse('biopython') Can someone help me rectify this? I have a paper being reviewed and the reviewer has problems getting biopython going, which means it's pretty crucial for me. Csaba From p.j.a.cock at googlemail.com Sun Aug 25 19:47:19 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 25 Aug 2013 20:47:19 +0100 Subject: [Biopython] Mac installation problem In-Reply-To: References: Message-ID: You'll need Apple's XCode for the compilers etc, available for free on the App Store - and then from within XCode you must install the optional command line utilities as well. 
Also, I'd recommend the old fashioned way, just download and decompress the tar-ball, then python setup.py build python setup.py test sudo python setup.py install This should match our wiki download page, http://biopython.org/wiki/Download Peter On Sunday, August 25, 2013, Kiss, Csaba wrote: > I have a problem installing biopython on a MAC system using the > easy_install method. I have tried these two commands: > > > > sudo easy_install -f http://biopython.org/DIST/ biopython > > sudo easy_install -U biopython > > None of them worked. I get these error messages: > > Reading http://pypi.python.org/simple/biopython/ > No local packages or download links found for biopython > error: Could not find suitable distribution for > Requirement.parse('biopython') > > Can someone help me rectify this? I have a paper being reviewed and the > reviewer has problems getting biopython going, which means it's pretty > crucial for me. > > Csaba > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From devaniranjan at gmail.com Sun Aug 25 21:24:25 2013 From: devaniranjan at gmail.com (George Devaniranjan) Date: Sun, 25 Aug 2013 17:24:25 -0400 Subject: [Biopython] Mac installation problem In-Reply-To: References: Message-ID: I have found that installing MAC packages through FINK is helpful (other options similar to fink are macports and homebrew but I have found fink sufficient and it does install biopython ) On Sun, Aug 25, 2013 at 3:47 PM, Peter Cock wrote: > You'll need Apple's XCode for the compilers > etc, available for free on the App Store - and then > from within XCode you must install the optional > command line utilities as well. 
> > Also, I'd recommend the old fashioned way, just > download and decompress the tar-ball, then > > python setup.py build > python setup.py test > sudo python setup.py install > > This should match our wiki download page, > http://biopython.org/wiki/Download > > Peter > > On Sunday, August 25, 2013, Kiss, Csaba wrote: > > > I have a problem installing biopython on a MAC system using the > > easy_install method. I have tried these two commands: > > > > > > > > sudo easy_install -f http://biopython.org/DIST/ biopython > > > > sudo easy_install -U biopython > > > > None of them worked. I get these error messages: > > > > Reading http://pypi.python.org/simple/biopython/ > > No local packages or download links found for biopython > > error: Could not find suitable distribution for > > Requirement.parse('biopython') > > > > Can someone help me rectify this? I have a paper being reviewed and the > > reviewer has problems getting biopython going, which means it's pretty > > crucial for me. > > > > Csaba > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From csaba.kiss at lanl.gov Mon Aug 26 13:47:09 2013 From: csaba.kiss at lanl.gov (Csaba Kiss) Date: Mon, 26 Aug 2013 07:47:09 -0600 Subject: [Biopython] Mac installation problem In-Reply-To: References: Message-ID: <521B5C5D.1000102@lanl.gov> Thanks, George. I will give FINK a try. 
Csaba On 8/25/2013 3:24 PM, George Devaniranjan wrote: > I have found that installing MAC packages through FINK is helpful > (other options similar to fink are macports and homebrew but I have > found fink sufficient and it does install biopython ) > > > On Sun, Aug 25, 2013 at 3:47 PM, Peter Cock > wrote: > > You'll need Apple's XCode for the compilers > etc, available for free on the App Store - and then > from within XCode you must install the optional > command line utilities as well. > > Also, I'd recommend the old fashioned way, just > download and decompress the tar-ball, then > > python setup.py build > python setup.py test > sudo python setup.py install > > This should match our wiki download page, > http://biopython.org/wiki/Download > > Peter > > On Sunday, August 25, 2013, Kiss, Csaba wrote: > > > I have a problem installing biopython on a MAC system using the > > easy_install method. I have tried these two commands: > > > > > > > > sudo easy_install -f http://biopython.org/DIST/ biopython > > > > sudo easy_install -U biopython > > > > None of them worked. I get these error messages: > > > > Reading http://pypi.python.org/simple/biopython/ > > No local packages or download links found for biopython > > error: Could not find suitable distribution for > > Requirement.parse('biopython') > > > > Can someone help me rectify this? I have a paper being reviewed > and the > > reviewer has problems getting biopython going, which means it's > pretty > > crucial for me. 
> > > > Csaba > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > -- Best Regards: Csaba Kiss PhD, MSc, BSc TA-43, HRL-1, MS888 Los Alamos National Laboratory Work: 1-505-667-9898 Cell: 1-505-920-5774 From nlindberg at mkei.org Mon Aug 26 14:08:19 2013 From: nlindberg at mkei.org (Nick Lindberg) Date: Mon, 26 Aug 2013 14:08:19 +0000 Subject: [Biopython] Mac installation problem In-Reply-To: <521B5C5D.1000102@lanl.gov> Message-ID: I would highly recommend doing it the old fashioned way, as the first poster recommended. The package managers start to do things a little funky the way they link packages, and depending on if you're using the built in Python versus a homebrewed/ported copy, etc. I don't think fink is actively maintained any longer. Nick Lindberg Sr. Consulting Engineer, HPC Milwaukee Institute 414.727.6413 (W) http://www.mkei.org On 8/26/13 8:47 AM, "Csaba Kiss" wrote: >Thanks, George. I will give FINK a try. > >Csaba >On 8/25/2013 3:24 PM, George Devaniranjan wrote: >> I have found that installing MAC packages through FINK is helpful >> (other options similar to fink are macports and homebrew but I have >> found fink sufficient and it does install biopython ) >> >> >> On Sun, Aug 25, 2013 at 3:47 PM, Peter Cock > > wrote: >> >> You'll need Apple's XCode for the compilers >> etc, available for free on the App Store - and then >> from within XCode you must install the optional >> command line utilities as well. 
>> >> Also, I'd recommend the old fashioned way, just >> download and decompress the tar-ball, then >> >> python setup.py build >> python setup.py test >> sudo python setup.py install >> >> This should match our wiki download page, >> http://biopython.org/wiki/Download >> >> Peter >> >> On Sunday, August 25, 2013, Kiss, Csaba wrote: >> >> > I have a problem installing biopython on a MAC system using the >> > easy_install method. I have tried these two commands: >> > >> > >> > >> > sudo easy_install -f http://biopython.org/DIST/ biopython >> > >> > sudo easy_install -U biopython >> > >> > None of them worked. I get these error messages: >> > >> > Reading http://pypi.python.org/simple/biopython/ >> > No local packages or download links found for biopython >> > error: Could not find suitable distribution for >> > Requirement.parse('biopython') >> > >> > Can someone help me rectify this? I have a paper being reviewed >> and the >> > reviewer has problems getting biopython going, which means it's >> pretty >> > crucial for me. 
>> > >> > Csaba >> > >> > _______________________________________________ >> > Biopython mailing list - Biopython at lists.open-bio.org >> >> > http://lists.open-bio.org/mailman/listinfo/biopython >> > >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/biopython >> >> > >-- >Best Regards: >Csaba Kiss PhD, MSc, BSc >TA-43, HRL-1, MS888 >Los Alamos National Laboratory >Work: 1-505-667-9898 >Cell: 1-505-920-5774 > >_______________________________________________ >Biopython mailing list - Biopython at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biopython From arklenna at gmail.com Mon Aug 26 16:57:23 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Mon, 26 Aug 2013 12:57:23 -0400 Subject: [Biopython] Mac installation problem In-Reply-To: References: <521B5C5D.1000102@lanl.gov> Message-ID: Furthermore, I don't believe installing any of the Mac "package managers" eliminates the need to install developer tools/XCode. Cheers, Lenna On Mon, Aug 26, 2013 at 10:08 AM, Nick Lindberg wrote: > I would highly recommend doing it the old fashioned way, as the first > poster recommended. The package managers start to do things a little > funky the way they link packages, and depending on if you're using the > built in Python versus a homebrewed/ported copy, etc. > > I don't think fink is actively maintained any longer. > > Nick Lindberg > Sr. Consulting Engineer, HPC > Milwaukee Institute > 414.727.6413 (W) > http://www.mkei.org > > > > > > > > > > > > On 8/26/13 8:47 AM, "Csaba Kiss" wrote: > > >Thanks, George. I will give FINK a try. 
> > > >Csaba > >On 8/25/2013 3:24 PM, George Devaniranjan wrote: > >> I have found that installing MAC packages through FINK is helpful > >> (other options similar to fink are macports and homebrew but I have > >> found fink sufficient and it does install biopython ) > >> > >> > >> On Sun, Aug 25, 2013 at 3:47 PM, Peter Cock >> > wrote: > >> > >> You'll need Apple's XCode for the compilers > >> etc, available for free on the App Store - and then > >> from within XCode you must install the optional > >> command line utilities as well. > >> > >> Also, I'd recommend the old fashioned way, just > >> download and decompress the tar-ball, then > >> > >> python setup.py build > >> python setup.py test > >> sudo python setup.py install > >> > >> This should match our wiki download page, > >> http://biopython.org/wiki/Download > >> > >> Peter > >> > >> On Sunday, August 25, 2013, Kiss, Csaba wrote: > >> > >> > I have a problem installing biopython on a MAC system using the > >> > easy_install method. I have tried these two commands: > >> > > >> > > >> > > >> > sudo easy_install -f http://biopython.org/DIST/ biopython > >> > > >> > sudo easy_install -U biopython > >> > > >> > None of them worked. I get these error messages: > >> > > >> > Reading http://pypi.python.org/simple/biopython/ > >> > No local packages or download links found for biopython > >> > error: Could not find suitable distribution for > >> > Requirement.parse('biopython') > >> > > >> > Can someone help me rectify this? I have a paper being reviewed > >> and the > >> > reviewer has problems getting biopython going, which means it's > >> pretty > >> > crucial for me. 
> >> > > >> > Csaba > >> > > >> > _______________________________________________ > >> > Biopython mailing list - Biopython at lists.open-bio.org > >> > >> > http://lists.open-bio.org/mailman/listinfo/biopython > >> > > >> _______________________________________________ > >> Biopython mailing list - Biopython at lists.open-bio.org > >> > >> http://lists.open-bio.org/mailman/listinfo/biopython > >> > >> > > > >-- > >Best Regards: > >Csaba Kiss PhD, MSc, BSc > >TA-43, HRL-1, MS888 > >Los Alamos National Laboratory > >Work: 1-505-667-9898 > >Cell: 1-505-920-5774 > > > >_______________________________________________ > >Biopython mailing list - Biopython at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Wed Aug 28 22:47:04 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 28 Aug 2013 23:47:04 +0100 Subject: [Biopython] Biopython 1.62 released Message-ID: Dear Biopythoneers, Source distributions and Windows installers for Biopython 1.62 are now available from the downloads page on the official Biopython website and (soon) from the Python Package Index (PyPI). Python support This is our first release of Biopython which officially supports Python 3. Specifically, this is supported under Python 3.3. Older versions of Python 3 may still work albeit with some issues, but are not supported. We still fully support Python 2.5, 2.6, and 2.7. Support under Jython is available for versions 2.5 and 2.7 and under PyPy for versions 1.9 and 2.0. However, unlike CPython, Jython and PyPy support is partial: NumPy and our C extensions are not covered. Please note that this release marks our last official release to support Python 2.5. Beginning from Biopython 1.63, the minimum supported Python version will be 2.6. 
Highlights The translation functions will give a warning on any partial codons (and this will probably become an error in a future release). If you know you are dealing with partial sequences, either pad with 'N' to extend the sequence length to a multiple of three, or explicitly trim the sequence. The handling of joins and related complex features in GenBank/EMBL files has been changed with the introduction of a CompoundLocation object. Previously a SeqFeature for something like a multi-exon CDS would have a child SeqFeature (under the sub_features attribute) for each exon. The sub_features property will still be populated for now, but is deprecated and will in future be removed. Please consult the examples in the help (docstrings) and Tutorial. Thanks to the efforts of Ben Morris, the Phylo module now supports the file formats NeXML and CDAO. The Newick parser is also significantly faster, and can now optionally extract bootstrap values from the Newick comment field (like Molphy and Archaeopteryx do). Nate Sutton added a wrapper for FastTree to Bio.Phylo.Applications. New module Bio.UniProt adds parsers for the GAF, GPA and GPI formats from UniProt-GOA. The BioSQL module is now supported in Jython. MySQL and PostgreSQL databases can be used. The relevant JDBC driver should be available in the CLASSPATH. Feature labels on circular GenomeDiagram figures now support the label_position argument (start, middle or end) in addition to the current default placement, and in a change to prior releases these labels are placed outside the features, which is now consistent with the linear diagrams. The code for parsing 3D structures in mmCIF files was updated to use the Python standard library's shlex module instead of C code using flex. The Bio.Sequencing.Applications module now includes a BWA command line wrapper. Bio.motifs supports JASPAR format files with multiple position-frequency matrices. Additionally there have been other minor bug fixes and more unit tests. 
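The padding advice from the highlights above can be sketched in plain Python (the helper name pad_to_codons is ours for illustration, not a Biopython function):

```python
def pad_to_codons(seq):
    """Pad a nucleotide string with 'N' so its length is a multiple of three.

    This avoids the partial-codon translation warning by extending the
    sequence rather than trimming it.
    """
    remainder = len(seq) % 3
    if remainder:
        seq += "N" * (3 - remainder)
    return seq

print(pad_to_codons("ATGGCCAA"))  # one base short of a codon -> "ATGGCCAAN"
```

The padded string can then be handed to Seq.translate() (or any translation routine) without triggering the partial-codon warning.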
Contributors Many thanks to the Biopython developers and community for making this release possible, especially the following contributors: Alexander Campbell (first contribution) Andrea Rizzi (first contribution) Anthony Mathelier (first contribution) Ben Morris (first contribution) Brad Chapman Christian Brueffer David Arenillas (first contribution) David Martin (first contribution) Eric Talevich Iddo Friedberg Jian-Long Huang (first contribution) Joao Rodrigues Kai Blin Lenna Peterson Michiel de Hoon Matsuyuki Shirota (first contribution) Nate Sutton (first contribution) Peter Cock Petra Kubincová (first contribution) Phillip Garland Saket Choudhary (first contribution) Tiago Antao Wibowo 'Bow' Arindrarto Xabier Bello (first contribution) Thank you all. Release announcement here (RSS feed available): http://news.open-bio.org/news/2013/08/biopython-1-62-released/ P.S. You can follow @Biopython on Twitter https://twitter.com/Biopython From jadolfbr at gmail.com Thu Aug 29 00:25:43 2013 From: jadolfbr at gmail.com (Gmail) Date: Wed, 28 Aug 2013 19:25:43 -0500 Subject: [Biopython] deleting/detaching residues from a chain In-Reply-To: References: Message-ID: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com> > Hi All, > > I am trying to delete residues from a chain using > > id = res.id > old_chain.detach_child(id) > > Which was talked about here: http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython > > I used both the release versions and built the github version, but I get an error message: > > File "/PyIgClassify/tools/renumbering.py", line 296, in delete_res_in_old_chain > old_chain.detach_child(id) > File "/Library/Python/2.7/site-packages/Bio/PDB/Entity.py", line 76, in detach_child > child=self.child_dict[id] > TypeError: unhashable type: 'list' > > I have tried this in iPython with arbitrary chains and the result seems to be the same. > Any advice for a newbie Biopython coder? Is this a bug or is there some way around this? 
> > Thanks!! > > Jared Adolf-Bryfogle > PhD Candidate > Lab of Dr. Roland Dunbrack > FCCC/DrexelMed > From anaryin at gmail.com Thu Aug 29 00:31:04 2013 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Thu, 29 Aug 2013 01:31:04 +0100 Subject: [Biopython] deleting/detaching residues from a chain In-Reply-To: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com> References: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com> Message-ID: Hi Jared, Can you give us an example of the code you are running? child_dict *should* be a dictionary, not a list, so there must be something wrong in there. Cheers, João From jadolfbr at gmail.com Thu Aug 29 00:53:22 2013 From: jadolfbr at gmail.com (Jared Adolf-Bryfogle) Date: Wed, 28 Aug 2013 20:53:22 -0400 Subject: [Biopython] deleting/detaching residues from a chain In-Reply-To: References: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com> Message-ID: Sure, I was thinking it's the res.id list that's passed to the child_dict to detach it? Here is the code snippet: def delete_res_in_old_chain(old_chain, start, end): seq_position = 1 for res in old_chain: if not (start <= seq_position <= end): id = res.id old_chain.detach_child(id) if not res.id[0]==' ': seq_position+=1 On Wed, Aug 28, 2013 at 8:31 PM, João Rodrigues wrote: > Hi Jared, > > Can you give us an example of the code you are running? child_dict > *should* be a dictionary, not a list, so there must be something wrong in > there. > > Cheers, > > João > From anaryin at gmail.com Thu Aug 29 01:02:32 2013 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Thu, 29 Aug 2013 02:02:32 +0100 Subject: [Biopython] deleting/detaching residues from a chain In-Reply-To: References: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com> Message-ID: True, if id is a list something is wrong. The code on github seems good, and my local copy here works good too. What exactly is old_chain? Are you sure it is a true chain? What is the content of res.id? 
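As background on the TypeError in the original report: Python dictionaries require hashable keys, Entity.detach_child looks the id up in child_dict (as the traceback shows), and Bio.PDB residue ids are (hetfield, resseq, icode) tuples. A list with the same contents cannot be a dictionary key. A minimal sketch, with a plain dict standing in for child_dict:

```python
# Bio.PDB residue ids are (hetfield, resseq, icode) tuples; a plain
# dict stands in for Entity.child_dict here.
child_dict = {}

good_id = (' ', 42, ' ')      # a tuple is hashable, so it works as a key
child_dict[good_id] = "residue 42"
assert child_dict[good_id] == "residue 42"

bad_id = [' ', 42, ' ']       # same contents as a list: not hashable
try:
    child_dict[bad_id]
except TypeError as err:
    print(err)                # -> unhashable type: 'list'
```

If an id has accidentally become a list somewhere upstream, tuple(bad_id) restores a usable key.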
From jadolfbr at gmail.com Thu Aug 29 02:04:23 2013 From: jadolfbr at gmail.com (Jared Adolf-Bryfogle) Date: Wed, 28 Aug 2013 22:04:23 -0400 Subject: [Biopython] deleting/detaching residues from a chain In-Reply-To: References: <5DE600F8-F3B6-4098-B64C-DE2C75F1DF59@gmail.com> Message-ID: Yes - the list was the problem. Looking through the code again, I made res.id a list in another function where I changed residue numbers - instead of passing it a tuple like I should have. Sorry about this - thanks for taking the time to help me. -Jared On Wed, Aug 28, 2013 at 9:02 PM, João Rodrigues wrote: > True, if id is a list something is wrong. The code on github seems good, and > my local copy here works good too. What exactly is old_chain? Are you sure > it is a true chain? What is the content of res.id? > From jdjensen at eng.ucsd.edu Thu Aug 29 23:04:41 2013 From: jdjensen at eng.ucsd.edu (James Jensen) Date: Thu, 29 Aug 2013 16:04:41 -0700 Subject: [Biopython] (Bio.PDB) problems with NeighborSearch: error at levels above "A", residue index discrepancy with unfold_entities Message-ID: <521FD389.1090207@eng.ucsd.edu> Hello! I am writing a function that, given two chains in a PDB file, should return 1) the positions and identities of all residues that are in contact with (distance < 5 angstroms) a residue on the other chain, and 2) the amino acid sequences of the chains. I've been doing this with NeighborSearch.search_all(radius=5, level='A') and then for each atom pair, seeing what its parent residue is and whether the parent residues of the two atoms belong to different chains. This may seem like a roundabout way of doing it, but if I call search_all(radius=5, level='R'), or indeed with level=any level other than 'A', I get the error TypeError: unorderable types: Residue() < Residue() So my first question is why it might be that search_all isn't working at higher levels. 
For the adjacent residue pairs I identify using NeighborSearch, I get each residue's position in its respective chain by residue.get_id()[1]. I've noticed, however, that if I get the sequence of the chain using seq = Selection.unfold_entities(chain, 'R') and then reference (i.e. seq[index]) the amino acids using the indices returned by the NeighborSearch step, they are not the same residues that I get if during the NeighborSearch step I report residue.get_resname() for each adjacent residue. I've tried it with several proteins, and the problem is the same. Chains A and C of 2h62 are an example. I then noticed that the lowest residue ID number of the residues yielded from Selection.unfold_entities(chain, 'R') is not 1. For chain A, it's 11, and for chain C, it's 34. Not knowing why this was, I thought I'd try subtracting the lowest ID number from the indices returned by the NeighborSearch step (i.e. in chain A, 11 -> 0 so seq[0] would be the first residue, the one with ID 11). This happened to seem to work for chain A. However, it gives me negative indices for some of the contacts in chain C. This means that NeighborSearch can return residues that are not returned by unfold_entities(). The lowest residue ID returned by NeighborSearch for chain C was 24, whereas for unfold_entities() it was 34. For both chains A and C, I was given the warning PDBConstructionWarning: WARNING: Chain [letter] is discontinuous at line [line number]. In fact, I seem to get this warning for just about every chain of every structure I load. Is this the reason that the first residues in the two chains are at 11 and 34, rather than 1? If so, could it be that NeighborSearch is able to work around the discontinuity while unfold_entities is not? Any suggestions? Thanks for your time and help, James Jensen
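A pure-Python sketch of the atom-level approach described above (no Bio.PDB here; the function name, the toy atom records, and the coordinates are all invented for illustration). The key point is to report residues by their PDB residue id number rather than by list position, since chains often start above 1 and may be discontinuous:

```python
import math

# Each atom is a (chain_id, residue_id, residue_name, xyz) record, a minimal
# stand-in for Bio.PDB Atom objects. The function finds all atom pairs within
# `radius` angstroms, keeps only pairs spanning the two chains of interest,
# and returns the corresponding parent-residue pairs.
def cross_chain_contacts(atoms, chain_a, chain_b, radius=5.0):
    contacts = set()
    for i, (ch1, res1, name1, xyz1) in enumerate(atoms):
        for ch2, res2, name2, xyz2 in atoms[i + 1:]:
            if {ch1, ch2} != {chain_a, chain_b}:
                continue  # skip same-chain pairs and other chains
            if math.dist(xyz1, xyz2) < radius:
                # order each pair as (chain_a residue, chain_b residue)
                if ch1 == chain_a:
                    contacts.add(((res1, name1), (res2, name2)))
                else:
                    contacts.add(((res2, name2), (res1, name1)))
    return sorted(contacts)

atoms = [
    ("A", 11, "GLY", (0.0, 0.0, 0.0)),
    ("A", 12, "ALA", (8.0, 0.0, 0.0)),
    ("C", 34, "SER", (3.0, 0.0, 0.0)),   # within 5 A of A:11 only
]
print(cross_chain_contacts(atoms, "A", "C"))
# -> [((11, 'GLY'), (34, 'SER'))]
```

With real Bio.PDB objects, the same pattern applies by taking each atom's parent residue and reading chain id and residue number from its full id, sidestepping any index arithmetic against the list returned by unfold_entities.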