From xiaochuan.liu at mssm.edu Thu May 3 18:46:58 2012
From: xiaochuan.liu at mssm.edu (Liu, XiaoChuan)
Date: Thu, 3 May 2012 22:46:58 +0000
Subject: [Biopython] How to use SeqRecord to get the subseq location information
Message-ID:

Dear all,

I face a problem: how do I use SeqRecord to get the subseq location information? My code is like this:

>>> from Bio.Seq import Seq
>>> simple_seq = Seq("gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugaguccugcucuuguucugagcaccaccccucucucaga")
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> example_feature = SeqFeature(FeatureLocation(25382494, 25382583), type="mRNA", strand=-1)
>>> simple_seq_r = SeqRecord(simple_seq, id="17_329.4",features=[example_feature])
>>> simple_seq_r
SeqRecord(seq=Seq('gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugagu...aga', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])
>>> simple_seq_r.features
[SeqFeature(FeatureLocation(ExactPosition(25382494),ExactPosition(25382583)), type='mRNA', strand=-1)]
>>> simple_seq_r.features[0]
SeqFeature(FeatureLocation(ExactPosition(25382494),ExactPosition(25382583)), type='mRNA', strand=-1)
>>> subseq=simple_seq_r[3:10]
>>> subseq
SeqRecord(seq=Seq('ggaagag', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])

But when I type "subseq.features" like this:

>>> subseq.features
[]

I cannot get the location information of subseq. Why? Does anybody know how to get this information?

Thank you very much!

Best,
Xiaochuan

From w.arindrarto at gmail.com Fri May 4 02:09:31 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 4 May 2012 08:09:31 +0200
Subject: [Biopython] How to use SeqRecord to get the subseq location information
In-Reply-To:
References:
Message-ID:

Hi Liu,

It looks like the problem is caused by the values you put in your SeqFeature: your sequence is shorter than the feature location values. If you plug in numbers within range, like this:

>>> example_feature = SeqFeature(FeatureLocation(5, 7), type="mRNA", strand=-1)

you should keep the feature in your subsequence, like so:

>>> subseq = simple_seq_r[3:10]
>>> subseq.features
[SeqFeature(FeatureLocation(ExactPosition(2), ExactPosition(4), strand=-1), type='mRNA')]

Hope that helps :),
cheers,
Bow

From xiaochuan.liu at mssm.edu Fri May 4 12:19:47 2012
From: xiaochuan.liu at mssm.edu (Liu, XiaoChuan)
Date: Fri, 4 May 2012 16:19:47 +0000
Subject: [Biopython] How to use SeqRecord to get the subseq location information
In-Reply-To:
References:
Message-ID:

Hi Bow,

Thank you very much for your help! But even with your suggestion, I still face this problem. See below:

>>> example_feature = SeqFeature(FeatureLocation(0, 88), type="mRNA", strand=-1)
>>> simple_seq_r = SeqRecord(simple_seq, id="17_329.4",features=[example_feature])
>>> simple_seq_r
SeqRecord(seq=Seq('gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugagu...aga', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])
>>> simple_seq_r.features
[SeqFeature(FeatureLocation(ExactPosition(0),ExactPosition(88)), type='mRNA', strand=-1)]
>>> subseq=simple_seq_r[3:10]
>>> subseq
SeqRecord(seq=Seq('ggaagag', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])
>>> subseq.features
[]

I still cannot get the location information of subseq. Why? Thank you very much!

Best,
Xiaochuan

From p.j.a.cock at googlemail.com Fri May 4 12:31:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 4 May 2012 17:31:07 +0100
Subject: [Biopython] How to use SeqRecord to get the subseq location information
In-Reply-To:
References:
Message-ID:

On Fri, May 4, 2012 at 5:19 PM, Liu, XiaoChuan wrote:
> ...
> I could not get the location information of subseq yet. Why? Thank you very much!

What numbers are you trying to get? In your example the parent sequence (simple_seq_r) has a feature from 0 to 88, but when you slice a SeqRecord only features fully inside the slice are kept, so no features are kept for the child record (subseq). We do not break up larger features which straddle the cut sites.
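Peter's explanation can be checked directly, and if what is wanted is the part of a larger feature that falls inside the slice, the coordinates can be clipped by hand. Below is a minimal sketch that assumes current Biopython conventions: the strand is given on the location rather than on the SeqFeature, unlike in the 2012 sessions above. The `clipped_features` helper is hypothetical. It returns plain tuples rather than new SeqFeature objects.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def clipped_features(record, start, end):
    """Return (type, start, end, strand) for features overlapping record[start:end].

    Hypothetical helper: coordinates are clipped to the window and shifted so
    they are relative to the slice. SeqRecord slicing, by contrast, keeps only
    features lying entirely inside the slice.
    """
    hits = []
    for feature in record.features:
        f_start, f_end = int(feature.location.start), int(feature.location.end)
        if f_end <= start or f_start >= end:
            continue  # no overlap with the window
        hits.append((feature.type,
                     max(f_start, start) - start,
                     min(f_end, end) - start,
                     feature.location.strand))
    return hits


seq = Seq("gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugaguccugcucuuguucugagcaccaccccucucucaga")
inside = SeqFeature(FeatureLocation(5, 7, strand=-1), type="mRNA")
spanning = SeqFeature(FeatureLocation(0, len(seq), strand=-1), type="mRNA")
record = SeqRecord(seq, id="17_329.4", features=[inside, spanning])

sub = record[3:10]
print(len(sub.features))                # 1: only the fully contained feature survives
print(clipped_features(record, 3, 10))  # both features, clipped to the window
```

The spanning feature is dropped by the slice but still reported by the helper, with coordinates 0 to 7 relative to the 7-letter subsequence.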
Peter

From p.j.a.cock at googlemail.com Sun May 6 07:09:30 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 6 May 2012 12:09:30 +0100
Subject: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop
In-Reply-To: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>
References: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>
Message-ID:

Dear Biopythoneers,

Are any of us planning to attend the SciPy meeting? The 2012 SciPy Bioinformatics Workshop is crying out for a Biopython-related talk... and from the email below it sounds like they're not just looking for developers' perspectives, but also for how Python is being used in bioinformatics.

It is quite close after BOSC and ISMB, but July 19 doesn't actually clash:
http://www.open-bio.org/wiki/BOSC_2012

SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it clashes with the planned CodeFest too:
http://www.open-bio.org/wiki/EU_Codefest_2012

July is definitely conference season...

Peter

---------- Forwarded message ----------
From: *Chris Mueller*
Date: Thursday, May 3, 2012
Subject: [Numpy-discussion] 2012 SciPy Bioinformatics Workshop
To: "chris.mueller at lab7.io"

We are pleased to announce the 2012 SciPy Bioinformatics Workshop held in conjunction with SciPy 2012 this July in Austin, TX.

Python in biology is not dead yet... in fact, it's alive and well!

Remember just a few short years ago when BioPerl ruled the world? Just one minor paradigm shift* later and Python now has a commanding presence in bioinformatics. From Python bindings to common tools all the way to entire Python-based informatics platforms, Python is used everywhere** in modern bioinformatics.

If you use Python for bioinformatics or just want to learn more about how it's being used, join us at the 2012 SciPy Bioinformatics Workshop. We will have speakers from both academia and industry showcasing how Python is enabling biologists to effectively work with large, complex data sets.

The workshop will be held the evening of July 19 from 5-6:30.

More information about SciPy is available on the conference site:
http://conference.scipy.org/scipy2012/

!! Participate !!

Are you using Python in bioinformatics? We'd love to have you share your story. We are looking for 3-4 speakers to share their experiences using Python for bioinformatics.

Please contact Chris Mueller at chris.mueller [at] lab7.io and Ray Roberts at rroberts [at] enthought.com to volunteer. Please include a brief description or link to a paper/topic which you would like to discuss. Presentations will last for 15 minutes each and will be followed by a panel Q&A.

--
* That would be next generation sequencing
** Yes, we aRe awaRe of that otheR language used eveRywhere, but let's celebRate Python Right now.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From tiagoantao at gmail.com Sun May 6 07:16:36 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 6 May 2012 12:16:36 +0100
Subject: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop
In-Reply-To:
References: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>
Message-ID:

Hi,

On Sun, May 6, 2012 at 12:09 PM, Peter Cock wrote:
> SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it
> clashes with the planned CodeFest too:
> http://www.open-bio.org/wiki/EU_Codefest_2012

Are any people from here going to the CodeFest?

Tiago

From cjfields at illinois.edu Sun May 6 11:03:27 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 6 May 2012 15:03:27 +0000
Subject: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop
In-Reply-To:
References: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>,
Message-ID:

On May 6, 2012, at 6:12 AM, "Peter Cock" wrote:

> ...
> July is definitely conference season...

Galaxy community conference as well.

Chris

> ...
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From arklenna at gmail.com Sun May 6 17:26:30 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Sun, 6 May 2012 17:26:30 -0400
Subject: [Biopython] GSoC python variant update
Message-ID:

Hi all,

I've written a few new posts on my blog; here's the latest:
http://arklenna.tumblr.com/post/22542372076/spot-isa-dog

I will attach a UML diagram and include the part of the post addressing the diagram. Click through to the full post for a bonus Einstein quote!

-------

My main goals include, but are not limited to:

* Make the structure parser- and file-format-agnostic: an abstracted OO design should allow anything to be slotted in (for example, Marjan's C GFF parser?)
* Maintain encapsulation: limit how much each object can see of objects above and below it
* Allow extension at multiple levels: some existing parsers may process data in different ways; this structure should allow handling both raw data and data in various formats.

The `Variant` object's constructor allows an end user to change the default parsers. Practical implementation details of `parse()` and `write()` will need to be finessed, for example ways to help the user sift through immense quantities of data. I'm still in the process of comparing the data contained in VCF/GVF files as well as the APIs of PyVCF and BCBio.GFF.
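The adapter layout described in this post has a few parts. It uses abstract `Parser` and `Writer` classes whose methods raise `NotImplementedError`, and a format-specific wrapper that inherits from both. A generic `Variant` front end takes a constructor argument that lets the caller swap in a different back end. A minimal sketch follows. All names and signatures are hypothetical. `DummyVCFWrapper` stands in for a real PyVCF or BCBio.GFF wrapper and only understands four tab-separated columns.

```python
from typing import Iterable, Iterator, List, Optional


class Parser:
    """Abstract parser: concrete wrappers must override parse()."""

    def parse(self, lines: Iterable[str]) -> Iterator[dict]:
        raise NotImplementedError("%s does not implement parse()" % type(self).__name__)


class Writer:
    """Abstract writer: concrete wrappers must override write()."""

    def write(self, records: Iterable[dict]) -> List[str]:
        raise NotImplementedError("%s does not implement write()" % type(self).__name__)


class DummyVCFWrapper(Parser, Writer):
    """Hypothetical adapter; understands tab-separated CHROM, POS, REF, ALT only."""

    def parse(self, lines):
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and header/meta lines
            chrom, pos, ref, alt = line.rstrip("\n").split("\t")[:4]
            yield {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}

    def write(self, records):
        return ["%s\t%d\t%s\t%s" % (r["chrom"], r["pos"], r["ref"], r["alt"])
                for r in records]


class Variant:
    """Generic front end; the constructor lets the user replace the defaults."""

    def __init__(self, parser: Optional[Parser] = None, writer: Optional[Writer] = None):
        default = DummyVCFWrapper()
        self.parser = parser if parser is not None else default
        self.writer = writer if writer is not None else default
        self.records: List[dict] = []

    def parse(self, lines):
        # Delegate to whichever parser was chosen at construction time.
        self.records = list(self.parser.parse(lines))
        return self.records

    def write(self):
        return self.writer.write(self.records)
```

Because `Variant` only talks to the abstract interface, a GFF-backed wrapper could be passed in as `Variant(parser=..., writer=...)` without touching the front end.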
`Parser` and `Writer` are both abstract classes that will define all methods found in known parsers/writers, raising `NotImplementedError`. I'm speculating on whether a Variant-specific exception would be useful, but a custom message should suffice.

Continuing down the diagram, `PyVCFWrapper` and `BCBioGFFWrapper` would each inherit from both `Parser` and `Writer`. As the name implies, they would serve as the adapter between the generic `Variant` and the specific parser.

I anticipate that this structure could easily be extended to allow intermediate storage in DBs as well as innumerable sorting/comparing/filtering methods inside `Variant`.

-------

I would appreciate any and all feedback about the overall structure. Namespace is definitely flexible. I'd also appreciate any specific genomic variant workflows, and if somebody can point me to smallish sample files of the same data in both VCF and GVF, I'd be eternally grateful.

Regards,
Lenna
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Variant_UML.png
Type: image/png
Size: 23313 bytes
Desc: not available
URL:

From p.j.a.cock at googlemail.com Mon May 7 04:37:38 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 7 May 2012 09:37:38 +0100
Subject: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop
In-Reply-To:
References: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>
Message-ID:

On Sun, May 6, 2012 at 12:16 PM, Tiago Antão wrote:
> Are any people from here going to the codefest?

Brad is going to the pre-BOSC CodeFest in California,
http://www.open-bio.org/wiki/Codefest_2012

I'm not sure if we have any Biopython folk signed up for the post-BOSC EU CodeFest in Italy yet.
http://www.open-bio.org/wiki/EU_Codefest_2012

I aim to attend one of the CodeFests, and I'm trying to firm up summer travel plans now...

Peter

From devaniranjan at gmail.com Mon May 7 18:25:46 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Mon, 7 May 2012 18:25:46 -0400
Subject: [Biopython] PDBParser
Message-ID:

Hi,

I have a question about using PDBParser:

from Bio.PDB.PDBParser import PDBParser

parser=PDBParser()

structure=parser.get_structure("test", "1fat.pdb")
model=structure[0]
chain=model["A"]
residue=chain[1]

I want to use it to extract and WRITE to a file the coordinates of residues 10 to 20 only (or whatever residue range I specify).

Using the PDB parser I can extract the residue ids in the range, but how do I trace back and write the file in the exact format found in the PDB, so that I can view it in a program like VMD/PyMOL? (That is, I want to write the coordinates and all information as found in the PDB, but only for the selected residues that I pass in.) I know I can do it using VMD, but I want to do it for thousands of PDB files and would like to build a database of such extracted fragments.

The other alternative is of course to go line by line through each file and write the lines that match the specified residue range, but I was wondering if there is a way of doing the same thing using the PDBParser?

Thank you,
George

From anaryin at gmail.com Tue May 8 04:16:44 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 8 May 2012 10:16:44 +0200
Subject: [Biopython] PDBParser
In-Reply-To:
References:
Message-ID:

Hello George,

You want to write only a part of the PDB file? What do you mean by 'all the info'? If it includes header information as well, then this is not possible, but for coordinates it is. You can do it in two ways:

1. Delete all residues/chains/models that are not part of your region of interest and then write the structure with PDBIO.

2. Use the 'Select' class from PDBIO.py and trim the region of interest.
For example, for residues 1-10 you could do something like this:

from Bio.PDB import PDBParser, PDBIO, Select

class ResidueFilter(Select):
    def accept_residue(self, residue):
        # residue.id is (hetero flag, residue number, insertion code)
        return 1 <= residue.id[1] <= 10

P = PDBParser()
s = P.get_structure('dummy', 'foo.pdb')

io = PDBIO()
io.set_structure(s)
io.save('foo_1-10.pdb', ResidueFilter())

Check the FAQ for a more detailed explanation:
http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao
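George mentions running this over thousands of files. Below is a minimal sketch of how the `Select` approach above might be generalized to one chain and an arbitrary inclusive residue range. `ResidueRangeSelect` and `extract_fragment` are hypothetical names. Only the overridden hooks `accept_chain` and `accept_residue`, along with `PDBParser`, `PDBIO.set_structure` and `PDBIO.save`, come from Bio.PDB. As João notes, only coordinate records are written, not the header.

```python
from Bio.PDB import PDBIO, PDBParser, Select


class ResidueRangeSelect(Select):
    """Keep residues start..end (inclusive) of a single chain (hypothetical helper)."""

    def __init__(self, chain_id, start, end):
        self.chain_id = chain_id
        self.start = start
        self.end = end

    def accept_chain(self, chain):
        return chain.id == self.chain_id

    def accept_residue(self, residue):
        # residue.id is (hetero flag, residue number, insertion code)
        return self.start <= residue.id[1] <= self.end


def extract_fragment(pdb_path, out_path, chain_id, start, end):
    """Write the selected fragment of one PDB file to out_path (hypothetical helper)."""
    parser = PDBParser(QUIET=True)  # silence construction warnings in batch runs
    structure = parser.get_structure("fragment", pdb_path)
    io = PDBIO()
    io.set_structure(structure)
    io.save(out_path, ResidueRangeSelect(chain_id, start, end))


# Example (placeholder file names), e.g. inside a loop over many files:
# extract_fragment("1fat.pdb", "1fat_A10-20.pdb", "A", 10, 20)
```

Putting the range into the selector's constructor means one class serves every file and range, so the loop only has to vary the arguments.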
From devaniranjan at gmail.com Tue May 8 08:44:09 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Tue, 8 May 2012 08:44:09 -0400
Subject: [Biopython] PDBParser
In-Reply-To:
References:
Message-ID:

Thank you everyone,

I just need the coordinates of certain fragments from PDB files and this works for me. I was trying to use the PDBParser only, but thank you for pointing out PDBIO to me.

Thank you,
George

From devaniranjan at gmail.com Tue May 8 15:12:22 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Tue, 8 May 2012 15:12:22 -0400
Subject: [Biopython] PDBParser-chain breaks
Message-ID:

Hi,

I thought using PERMISSIVE=0 would raise an exception if I pass a PDB file with chain breaks. However, nothing like that seems to happen...

For instance:

P=PDBParser(PERMISSIVE=0)
structure=P.get_structure('test', '7ODC.pdb')

7ODC has 3 chain breaks, but it does not raise an exception.
Thank you
George

From anaryin at gmail.com Tue May 8 15:34:26 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 8 May 2012 22:34:26 +0300
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

Hi George,

Chain breaks are pretty "harmless" and usually do not represent a faulty PDB file. The PERMISSIVE flag is for "features" like missing B-factors.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

From devaniranjan at gmail.com Tue May 8 15:37:47 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Tue, 8 May 2012 15:37:47 -0400
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

Hi João,

Is there a way, though, to find PDB files with chain breaks using Biopython?

Thank you,
George
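Besides relying on parser warnings, a missing segment can be detected from the coordinates themselves. Residues joined by a peptide bond have a backbone C(i) to N(i+1) distance of roughly 1.33 Å. Below is a minimal sketch built on a hypothetical `find_chain_breaks` helper. The 1.8 Å cutoff mirrors the default `radius` of `Bio.PDB.Polypeptide.PPBuilder`, which splits polypeptides on a similar distance test. Residues lacking C or N atoms, such as waters, are skipped.

```python
from Bio.PDB import PDBParser


def find_chain_breaks(chain, max_bond=1.8):
    """Return (residue_number, next_residue_number) pairs where the backbone is broken.

    Hypothetical helper: a break is reported when the C atom of one residue
    is farther than max_bond angstroms from the N atom of the next residue.
    """
    # Only residues with both backbone atoms can take part in a peptide bond.
    backbone = [res for res in chain if "C" in res and "N" in res]
    breaks = []
    for prev, curr in zip(backbone, backbone[1:]):
        # Subtracting two Bio.PDB Atom objects returns their distance.
        if prev["C"] - curr["N"] > max_bond:
            breaks.append((prev.id[1], curr.id[1]))
    return breaks


def report_breaks(pdb_path):
    """Map chain id -> list of breaks for the first model of a PDB file."""
    structure = PDBParser(QUIET=True).get_structure("query", pdb_path)
    return {chain.id: find_chain_breaks(chain) for chain in structure[0]}


# Example (file name from the thread):
# print(report_breaks("7ODC.pdb"))
```

A file has chain breaks if any list in the returned dictionary is non-empty.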
From anaryin at gmail.com Tue May 8 15:39:02 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 8 May 2012 22:39:02 +0300
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

Of course. Since they issue a warning, just count the warnings: pick out the chain break ones, and if there are more than 0, you have chain breaks.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

From eric.talevich at gmail.com Tue May 8 21:59:13 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 8 May 2012 21:59:13 -0400
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

The warnings module also lets you convert any warning to an error (or ignore it, etc.).
Use a regular expression to match the warning message:

from Bio import PDB
import warnings
warnings.filterwarnings('error', message='.*discontinuous at.*')
p = PDB.PDBParser()
s = p.get_structure("", "3BEG.pdb")

From anaryin at gmail.com Wed May 9 01:39:40 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 9 May 2012 08:39:40 +0300
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

For some reason, however, I didn't get the discontinuous error... That's why I proposed this alternative.

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

From eric.talevich at gmail.com Wed May 9 10:31:34 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 9 May 2012 10:31:34 -0400
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

Oh, there's a caveat to the warnings module: if a given warning isn't captured this way the first time, it's never issued again. So parsing 3BEG once normally, and again with the setup I gave, won't trigger the warning again and therefore won't raise an error.
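João's counting idea and Eric's caveat can be combined. Recording inside `warnings.catch_warnings(record=True)` with the "always" action captures every warning on every parse, so a file parsed earlier is not silently skipped. Below is a minimal sketch built on a hypothetical `construction_warnings` helper. It keeps the parser's default `QUIET=False`, because a quiet parser suppresses `PDBConstructionWarning` itself. Check this against the Bio.PDB source for your version: the "Chain ... is discontinuous" warning appears to be emitted when a chain's records are interrupted in the file, with the chain ID reappearing later, rather than when residues are missing. That may be why 7ODC produced no such warning.

```python
import warnings

from Bio.PDB import PDBParser
from Bio.PDB.PDBExceptions import PDBConstructionWarning


def construction_warnings(pdb_path, pattern="discontinuous"):
    """Parse pdb_path and return PDBConstructionWarning messages containing pattern.

    Hypothetical helper. The "always" action inside catch_warnings records every
    warning, even one already issued earlier from the same code location.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        PDBParser().get_structure("query", pdb_path)  # QUIET=False by default
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, PDBConstructionWarning) and pattern in str(w.message)
    ]


# Example (file name from the thread):
# print(len(construction_warnings("3BEG.pdb")))
```

Returning the messages rather than raising keeps the check usable in a loop over many files, where a non-empty list flags the file.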
Use a regular expression to match the warning message: >> >> from Bio import PDB >> import warnings >> warnings.filterwarnings('error', message='.*discontinuous at.*') >> p = PDB.PDBParser() >> s = p.get_structure("", "3BEG.pdb") >> >> >> >> On Tue, May 8, 2012 at 3:39 PM, Jo?o Rodrigues wrote: >> >>> Of course. Since they throw a warning just make sure to count the >>> warnings, >>> parse the chain break ones, and if they are more than 0, you have chain >>> breaks. >>> >>> Cheers, >>> >>> Jo?o [...] Rodrigues >>> http://nmr.chem.uu.nl/~joao >>> >>> >>> >>> 2012/5/8 George Devaniranjan >>> >>> > Hi Jo?o, >>> > >>> > Is there a way though to find PDB's with chain breaks? using biopython? >>> > >>> > Thank you, >>> > George >>> > >>> > On Tue, May 8, 2012 at 3:34 PM, Jo?o Rodrigues >>> wrote: >>> > >>> >> Hi George, >>> >> >>> >> Chain breaks are pretty "harmless" and usually do not represent a >>> faulty >>> >> PDB file. The PERMISSIVE flag is for "features" like missing >>> b-factors. >>> >> >>> >> Cheers, >>> >> >>> >> Jo?o [...] Rodrigues >>> >> http://nmr.chem.uu.nl/~joao >>> >> >>> >> >>> >> >>> >> 2012/5/8 George Devaniranjan >>> >> >>> >>> Hi, >>> >>> >>> >>> I thought using PERMISSIVE=0 would raise an exception if I pass a PDB >>> >>> with >>> >>> chain breaks. >>> >>> However, nothing like that seems to happen..... >>> >>> >>> >>> For instance >>> >>> >>> >>> P=PDBParser(PERMISSIVE=0) >>> >>> structure=P.get_structure('test', '7ODC.pdb') >>> >>> >>> >>> >>> >>> 7ODC has 3 chain breaks but it does not raise an exception. 
>>> >>>
>>> >>> Thank you
>>> >>> George

From w.arindrarto at gmail.com Wed May 9 12:24:43 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 9 May 2012 18:24:43 +0200
Subject: [Biopython] GSoC Project Update -- 1

Hi everyone,

I just posted my latest blog update here: http://bow.web.id/blog/2012/05/warming-up-for-the-coding-period/

To summarize, I've spent most of my time getting to know the programs I will support better. This has been done by:

1. Playing around with the programs to see how many different outputs I can generate.
2. Writing scripts to automate test case generation for each of the programs.
3. Writing wrappers (for programs not yet wrapped by Biopython: FASTA, HMMER, and BLAT) to ease writing the test case generators.
4. Continuing to complete my proposed SearchIO object naming scheme (http://bit.ly/searchio-terms).

The test cases, their generators, and the wrappers I've written are available in my non-Biopython GSoC repo here: http://github.com/bow/gsoc/. Additionally, I've used the generated test cases to improve a recent bug report and submitted a fix for the next release.

For the coming weeks before coding starts, I'm planning to play around more with XML and SQLite, as I will use them in the code. I might also start to add more skeleton code to my current development branch (https://github.com/bow/biopython).
cheers,
Bow

From mmokrejs at fold.natur.cuni.cz Wed May 9 14:01:08 2012
From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs)
Date: Wed, 09 May 2012 20:01:08 +0200
Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header?
In-Reply-To: <87r4ynu3cx.fsf@fastmail.fm>
References: <87r4ynu3cx.fsf@fastmail.fm>
Message-ID: <4FAAB0E4.30409@fold.natur.cuni.cz>

Hi Brad,
I just got bitten by this myself as well. Could the legacy BLAST parser be improved to give a clearer error message? E.g. that it failed to find the LOCUS line or whatever it was looking for? With the legacy BLAST documentation gone from the current Tutorial it is easy to pick the wrong parser. ;)

And BTW, please do not drop support for legacy BLAST. I just cannot make BLAST+ give me the same alignment, no matter what arguments I use to adjust for the (it gives me a wider alignment than wanted and I can make it look shorter, but shortening it just a bit like legacy BLAST output .. is not doable).

And, it took me a while to find the old biopython-1.52.tar.gz to look up the old docs. Could there be a hyperlink from the Tutorial to these unpacked, browseable sources? ;) I am speaking about http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc85 .
Thanks,
Martin

Brad Chapman wrote:
>
> Sar;
>
>> I am new to both python and biopython.
>
> Welcome. Thanks for including your code along with the problem report.
>
>> What I'm trying to do is to parse a blast result xml file (myblast.xml),
>> attached here.
>>
>> The code looks like this:
> [...]
>> blast_parser = NCBIStandalone.BlastParser()
> [...]
>> ValueError: Invalid header?
>
> You are using NCBIStandalone, which parses plain text blast output.
> To parse the XML output, you should use the NCBIXML parser:
>
> from Bio.Blast import NCBIXML
> blast_records = NCBIXML.parse(result_handle)
>
> The tutorial has more details and examples:
>
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc87
>
> Hope this helps,
> Brad

From p.j.a.cock at googlemail.com Wed May 9 14:25:36 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 May 2012 19:25:36 +0100
Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header?
In-Reply-To: <4FAAB0E4.30409@fold.natur.cuni.cz>
References: <87r4ynu3cx.fsf@fastmail.fm> <4FAAB0E4.30409@fold.natur.cuni.cz>

On Wed, May 9, 2012 at 7:01 PM, Martin Mokrejs wrote:
> Hi Brad,
> I just got bitten by this myself as well. Could be the legacy blast parser
> improved to give clearer error message? E.g. that it failed to find the LOCUS
> line or whatever was it looking for? With the legacy BLAST documentation being
> gone from current Tutorial it is easy to pick the wrong parser. ;)
>
> And BTW, please do not drop support for legacy BLAST. I just cannot make BLAST+
> give me same alignment, no matter what arguments I use to adjust for the (it gives me
> wider alignment than wanted and I can make it a look shorter, but shortening it just
> a bit like legacy BLAST output .. is not doable).

Have you contacted the NCBI about this possible regression?

> And, took me a while to find old biopython-1.52.tar.gz to lookup the old docs.
> Could there be a hyperlink from Tutorial to these unpacked, browseable sources? ;)
> I am speaking about http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc85 .
> Thanks,
> Martin

Could you clarify if you are talking about documentation for calling the 'legacy' BLAST command line tools (e.g.
blastall), or documentation for parsing the plain text human readable output (which still exists in BLAST+)?

On a related point, Bow's just done a bit of work updating our plain text parser to cope with BLAST+ (specifically changes in BLAST 2.2.25+ and/or 2.2.26+).

One of the aims of Bow's GSoC project is to make dealing with the different BLAST formats a lot simpler.

Peter

From mmokrejs at fold.natur.cuni.cz Wed May 9 14:48:19 2012
From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs)
Date: Wed, 09 May 2012 20:48:19 +0200
Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header?
References: <87r4ynu3cx.fsf@fastmail.fm> <4FAAB0E4.30409@fold.natur.cuni.cz>
Message-ID: <4FAABBF3.1030007@fold.natur.cuni.cz>

Hi Peter,

Peter Cock wrote:
> On Wed, May 9, 2012 at 7:01 PM, Martin Mokrejs wrote:
>> Hi Brad,
>> I just got bitten by this myself as well. Could be the legacy blast parser
>> improved to give clearer error message? E.g. that it failed to find the LOCUS
>> line or whatever was it looking for? With the legacy BLAST documentation being
>> gone from current Tutorial it is easy to pick the wrong parser. ;)
>>
>> And BTW, please do not drop support for legacy BLAST. I just cannot make BLAST+
>> give me same alignment, no matter what arguments I use to adjust for the (it gives me
>> wider alignment than wanted and I can make it a look shorter, but shortening it just
>> a bit like legacy BLAST output .. is not doable).
>
> Have you contacted the NCBI about this possible regression?

No, not yet.

>> And, took me a while to find old biopython-1.52.tar.gz to lookup the old docs.
>> Could there be a hyperlink from Tutorial to these unpacked, browseable sources? ;)
>> I am speaking about http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc85 .
>> Thanks,
>> Martin
>
> Could you clarify if you are talking about documentation for calling
> the 'legacy'
> BLAST command line tools (e.g.
> blastall), or documentation for parsing the

Yes, "blastall -p blastn ...", the default plaintext pairwise output (-m 0), version 2.2.24.

> plain text human readable output (which still exists in BLAST+)?
>
> On a related point, Bow's just done a bit of work updating our plain
> text parser to
> cope with BLAST+ (specifically changes in BLAST 2.2.25+ and/or 2.2.26+).
>
> One of the aims of Bow's GSoC project will make dealing with the different
> BLAST formats a lot simpler.

It's great that we have GSoC students; if I had some spare time I would mentor one. Good luck and thanks for your care, Peter!
Martin

From livingstonemark at gmail.com Fri May 11 22:06:07 2012
From: livingstonemark at gmail.com (Mark Livingstone)
Date: Sat, 12 May 2012 12:06:07 +1000
Subject: [Biopython] Superposition

Hi Guys,

Thanks to Andrew and others for help in my previous message.

I have gone through various incarnations of my code, and suddenly found this simple code works for the small test I have done.

I am using the datafiles from:

Kellogg, E. H., Leaver-Fay, A., & Baker, D. (2011). Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins, 79(3), 830-838. doi:10.1002/prot.22921

These files have been modified so that there are matched PDBs which vary only by one mutated residue, and I am trying to carbon alpha superimpose the PDB which is the Mutanttype over the Wildtype and save to a PDB - which I seem to have fluked how to do. I am still working on the code for directory traversal so I have not tried it on the hundreds of matched PDBs yet.

Is there anything in this code which is going to bite me? How can I improve it?
------------------------------------------------------------------------------------
#!/usr/bin/env python

# Wildtype (wt) = reference, Mutanttype (mt) = alternate

from Bio.PDB import *

#parsing the PDBs
parser = PDBParser(PERMISSIVE=1)

l_wt_atoms = []
l_mt_atoms = []
pdb_out_filename = "./1bti_aligned.pdb"

wt_structure = parser.get_structure("1bpi", './1bpi.pdb')
mt_structure = parser.get_structure("1bti", './1bti.pdb')

wt_model = wt_structure[0]
mt_model = mt_structure[0]

wt_chain = wt_model["A"]
mt_chain = mt_model["A"]

for wt_residue in wt_chain:
    resnum = wt_residue.get_id()[1]
    l_wt_atoms.append( wt_residue['CA'])

for mt_residue in mt_chain:
    resnum = mt_residue.get_id()[1]
    l_mt_atoms.append( mt_residue['CA'])

##SuperImpose
sup = Superimposer()
## Specify the atom lists
## ""wildtype"" and ""mutanttype"" are lists of Atom objects
## The mt atoms will be put on the wt atoms
sup.set_atoms(l_wt_atoms, l_mt_atoms)

## Print rotation/translation/rmsd
print "ROTRAN: ", sup.rotran
print "RMS: ", sup.rms

## Apply rotation/translation to the moving atoms
sup.apply(l_mt_atoms)

print "Saving aligned structure as PDB file %s" % pdb_out_filename
io=PDBIO()
io.set_structure(mt_structure)
io.save(pdb_out_filename)

print "Done"
------------------------------------------------------------------------------------

Thanks in advance,

Mark Livingstone
B.InfoTech (Hons) Student
Griffith University School of ICT
Southport Qld Australia

From livingstonemark at gmail.com Sun May 13 02:54:28 2012
From: livingstonemark at gmail.com (Mark Livingstone)
Date: Sun, 13 May 2012 16:54:28 +1000
Subject: [Biopython] .ent versus .pdb files?

Hi Guys,

I have a bunch of files which appear to be pdb file-like but have the .ent file extension. Is there any difference of significance to Bio.PDB?
Thanks in advance,

MarkL

From anaryin at gmail.com Sun May 13 05:16:00 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Sun, 13 May 2012 11:16:00 +0200
Subject: [Biopython] .ent versus .pdb files?

Hi Mark,

Nope, not at all.

Cheers,

João

On 13 May 2012 08:55, "Mark Livingstone" <livingstonemark at gmail.com> wrote:

> Hi Guys,
>
> I have a bunch of files which appear to be pdb file-like but have the
> .ent file extension. Is there any difference of significance to
> Bio.PDB?
>
> Thanks in advance,
>
> MarkL

From anubratadas at gmail.com Mon May 14 08:03:29 2012
From: anubratadas at gmail.com (Anubrata Das)
Date: Mon, 14 May 2012 17:33:29 +0530
Subject: [Biopython] parsing genbank file

I am new to Biopython. I wanted to parse through individual records from the GenBank file of the Deinococcus radiodurans chromosome 1 sequence. For example, I wanted the list of identifiers for each record:

>>> identifiers=[seq_record.id for seq_record in SeqIO.parse("C:\\Dr1.gb","genbank")]
>>> identifiers[:10]
['NC_001263.1']

but I would only get one master entry. Then, if I wanted to parse through individual records:

>>> record=SeqIO.parse("C:\\Dr1.gb","genbank")
>>> record.next()
SeqRecord(seq=UnknownSeq(2648638, alphabet = IUPACAmbiguousDNA(), character = 'N'), id='NC_001263.1', name='NC_001263', description='Deinococcus radiodurans R1 chromosome 1, complete sequence.', dbxrefs=['Project:57665'])
>>> record.next()
Traceback (most recent call last):
  File "", line 1, in
    record.next()
StopIteration

I get this output.
Please tell me the correct method.

regards
--
Anubrata Das
Scientific Officer
Molecular Biology Division
Bhabha Atomic Research Centre

From p.j.a.cock at googlemail.com Mon May 14 09:48:39 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 14 May 2012 14:48:39 +0100
Subject: [Biopython] parsing genbank file

On Mon, May 14, 2012 at 1:03 PM, Anubrata Das wrote:
> i am new to biopython. i wanted to parse through individual records
> from the genbank file of deinococcus radiodurans chromosome 1
> sequence.

Probably your GenBank file only contains one record (for the whole of chr1). You could use Bio.SeqIO.read(...) in this case:

from Bio import SeqIO
record = SeqIO.read(r"C:\Dr1.gb", "genbank")
print record.id
print len(record.features)

Peter

From David.Lapointe at umassmed.edu Mon May 14 10:00:20 2012
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Mon, 14 May 2012 14:00:20 +0000
Subject: [Biopython] parsing genbank file
Message-ID: <86BFEB1DFA6CB3448DB8AB1FC52F4059081148@ummscsmbx06.ad.umassmed.edu>

You might try looking for sequences here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Deinococcus_radiodurans_R1_uid57665/

David

-----Original Message-----
From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Peter Cock
Sent: Monday, May 14, 2012 9:49 AM
To: Anubrata Das
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] parsing genbank file

On Mon, May 14, 2012 at 1:03 PM, Anubrata Das wrote:
> i am new to biopython. i wanted to parse through individual records
> from the genbank file of deinococcus radiodurans chromosome 1
> sequence.

Probably your GenBank file only contains one record (for the whole of chr1). You could use Bio.SeqIO.read(...)
in this case:

from Bio import SeqIO
record = SeqIO.read(r"C:\Dr1.gb", "genbank")
print record.id
print len(record.features)

Peter

From mmokrejs at fold.natur.cuni.cz Wed May 16 05:32:34 2012
From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs)
Date: Wed, 16 May 2012 11:32:34 +0200
Subject: [Biopython] Legacy blast XML parser returns prematurely StopIteration
Message-ID: <4FB37432.7060707@fold.natur.cuni.cz>

Hi,
I am parsing some blast 2.2.24 XML output and the last record I get is the one from iteration 124. I see that entry is followed by a new section which is probably the culprit. I will try a newer legacy blast, but still, could biopython maybe overcome this bug in the XML input?

blastall -p blastn -A 4 -i SRR068315.fasta -d my_targets.fasta -F 0 -S 1 -r 2 -e 10e-30 -m 7

blastn
blastn 2.2.24 [Aug-08-2010]
~Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ~Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), ~"Gapped BLAST and PSI-BLAST: a new generation of protein database search~programs", Nucleic Acids Res. 25:3389-3402.
my_targets.fasta lcl|1_0 FYUQ5C204IQCOE length=283 xy=3463_2076 region=4 run=R_2009_07_08_19_30_38_ 318 1e-29 2 -3 5 2 F
[cut]
124 lcl|124_0 FYUQ5C204JXGMI length=44 xy=3954_2264 region=4 run=R_2009_07_08_19_30_38_ 350 22 9262 0 0 0.41 0.625 0.78 No hits found
1 22 9262 0 0 0.41 0.625 0.78
125 lcl|125_0 FYUQ5C204JFG82 length=173 xy=3749_2948 region=4 run=R_2009_07_08_19_30_38_ 208 22 9262 0 0 0.41 0.625 0.78 No hits found
126 lcl|126_0 FYUQ5C204I2D3A length=146 xy=3600_2628 region=4 run=R_2009_07_08_19_30_38_ 205 22 9262 0 0 0.41 0.625 0.78 No hits found

Grepping for the iteration numbers, I see a few more cases like that further on in the XML file:

234 1 235 236
345 1 346 347
450 1 451 452
555 1 556 557
655 1 656 657
759 1 760 761
859 1 860 861
956 1 957 958
1050 1 1051 1052
1145 1 1146 1147
1239 1 1240 1241
1333 1 1334 1335
1430 1 1431 1432
1523 1 1524 1525
1610 1 1611 1612
1703 1 1704 1705
1792 1 1793 1794
1881 1 1882 1883

After that, the problem does not occur again until the end of the XML file at: 25698

Thanks for comments,
Martin

From p.j.a.cock at googlemail.com Wed May 16 05:48:10 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 May 2012 10:48:10 +0100
Subject: [Biopython] Legacy blast XML parser returns prematurely StopIteration
In-Reply-To: <4FB37432.7060707@fold.natur.cuni.cz>
References: <4FB37432.7060707@fold.natur.cuni.cz>

On Wed, May 16, 2012 at 10:32 AM, Martin Mokrejs wrote:
> Hi,
> I am parsing some blast 2.2.24 XML output and the last record I get is the one from
> iteration 124. I see that entry is followed by a new section which
> is probably the culprit. I will try newer legacy blast but still, biopython could maybe
> overcome this bug in XML input?
>
> blastall -p blastn -A 4 -i SRR068315.fasta -d my_targets.fasta -F 0 -S 1 -r 2 -e 10e-30 -m 7
>

Could you file a bug here and attach the complete XML test case please?
http://redmine.open-bio.org/projects/biopython

Our XML parser should handle both NCBI 'legacy' BLAST and BLAST+.

Thanks,

Peter

From josefergil at gmail.com Wed May 23 10:18:44 2012
From: josefergil at gmail.com (jose gil)
Date: Wed, 23 May 2012 16:18:44 +0200
Subject: [Biopython] starting with biopython

Hello everyone,

I'm starting with the program and I have some problems, because I don't know how to download the files so that the program can load them. From the Python shell I follow the instructions in the tutorial to load the sequence, but I don't actually know where the correct place is to save the files I download, for example from GenBank.

Thank you very much for your help,

--
José Fernando Gil R.

From p.j.a.cock at googlemail.com Wed May 23 10:45:04 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 23 May 2012 15:45:04 +0100
Subject: [Biopython] starting with biopython

On Wed, May 23, 2012 at 3:18 PM, jose gil wrote:
> Hello everyone,
>
> I'm starting with the program and I have some problems, because I don't
> know how download the files in order the program can load them.
> from the python shell I follow the instructions in the tutorial in order to
> load the sequence but I don't know actually where is the correct place to
> save the files I download for example from GenBank.
>
> Thank you very much for your help,

The simplest approach is to put your Python scripts and data files all in the same folder together. Then you don't need to bother with giving paths; just local filenames will be fine. Some experience with working at the command line would help with understanding paths, absolute paths, and relative paths.

Are you working on Windows? Note that by default Windows hides the extension of known file formats - I always turn this off so that in Explorer I see the full file names.
What I mean is I prefer to see "example.fasta" and "example.gbk" instead of two files apparently called "example" but with a different icon. You'll find there are lots of file extensions in Bioinformatics, and they are important.

Peter

From animesh.agrawal at anu.edu.au Thu May 24 01:58:12 2012
From: animesh.agrawal at anu.edu.au (Animesh Agrawal)
Date: Thu, 24 May 2012 15:58:12 +1000
Subject: [Biopython] Integrating SQL query to biopython
Message-ID: <000001cd3972$33fc4c40$9bf4e4c0$@agrawal@anu.edu.au>

Hi,

I am running small SQL queries to select sequences from a local BioSQL database. One such query is as follows:

SELECT  biosequence.*
FROM    biosequence JOIN bioentry USING (bioentry_id)
WHERE   biosequence.seq NOT LIKE "%X%"
AND     biosequence.alphabet = 'protein'

I am wondering how I can integrate this SQL query with Biopython code to get the output in the form of SeqRecord or Seq objects.

Cheers,
Animesh

Animesh Agrawal
PhD Scholar
Computational & Conceptual Biology, JCSMR
Australian National University
Canberra, Australia
Tel: +61 2 6125 8303
Email: animesh.agrawal at anu.edu.au

From p.j.a.cock at googlemail.com Thu May 24 05:10:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 24 May 2012 10:10:13 +0100
Subject: [Biopython] Integrating SQL query to biopython
In-Reply-To: <4fbdcf67.ca48340a.3ba1.5c63SMTPIN_ADDED@mx.google.com>
References: <4fbdcf67.ca48340a.3ba1.5c63SMTPIN_ADDED@mx.google.com>

On Thu, May 24, 2012 at 6:58 AM, Animesh Agrawal wrote:
> Hi,
>
> I am running small SQL queries to select sequences from a local BIOSQL
> database. One instance such query is as follows:
>
> SELECT  biosequence.*
> FROM    biosequence JOIN bioentry USING (bioentry_id)
> WHERE   biosequence.seq NOT LIKE "%X%"
> AND     biosequence.alphabet = 'protein'
>
> I am wondering, how do I integrate this SQL query with Biopython code to get
> the output in form of SeqRecord or Seq objects.

From your direct database access, get the bioentry table's primary ID, and then use that to create a DBSeqRecord object (which is a subclass of SeqRecord and will also load the sequence for you). You will also need the adaptor object as the other initialization argument, which is how the DBSeqRecord knows which database to read from. Get that by connecting to the BioSQL database through the Biopython code as usual.

Something like this (untested):

from BioSQL import BioSeqDatabase
from BioSQL.BioSeq import DBSeqRecord
#Connect to BioSQL database as usual,
server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                                      passwd="", host="localhost", db="bioseqdb")
primary_id = .... #your code here
#Use Biopython's BioSQL SeqRecord loading:
record = DBSeqRecord(server.adaptor, primary_id)

Peter

From chapmanb at 50mail.com Thu May 24 05:15:02 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 24 May 2012 05:15:02 -0400
Subject: [Biopython] Integrating SQL query to biopython
In-Reply-To: <000001cd3972$33fc4c40$9bf4e4c0$@agrawal@anu.edu.au>
References: <000001cd3972$33fc4c40$9bf4e4c0$@agrawal@anu.edu.au>
Message-ID: <87d35uos89.fsf@fastmail.fm>

Animesh;

> I am running small SQL queries to select sequences from a local BIOSQL
> database. One instance such query is as follows:
>
> SELECT  biosequence.*
> FROM    biosequence JOIN bioentry USING (bioentry_id)
> WHERE   biosequence.seq NOT LIKE "%X%"
> AND     biosequence.alphabet = 'protein'
>
> I am wondering, how do I integrate this SQL query with Biopython code to get
> the output in form of SeqRecord or Seq objects.

If you have the internal bioentry IDs you can use the BioSQL code directly to get SeqRecord compatible objects:

from BioSQL import BioSeqDatabase
from BioSQL.BioSeq import DBSeqRecord

server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                    passwd = "", host = "localhost", db="bioseqdb")
rec = DBSeqRecord(server.adaptor, your_bioentry_id)

Hope this helps,
Brad

From p.j.a.cock at googlemail.com Thu May 24 05:28:14 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 24 May 2012 10:28:14 +0100
Subject: [Biopython] Integrating SQL query to biopython
In-Reply-To: <87d35uos89.fsf@fastmail.fm>
References: <87d35uos89.fsf@fastmail.fm>

On Thu, May 24, 2012 at 10:15 AM, Brad Chapman wrote:
>
> Animesh;
>
>> I am running small SQL queries to select sequences from a local BIOSQL
>> database. One instance such query is as follows:
>>
>> SELECT  biosequence.*
>> FROM    biosequence JOIN bioentry USING (bioentry_id)
>> WHERE   biosequence.seq NOT LIKE "%X%"
>> AND     biosequence.alphabet = 'protein'
>>
>> I am wondering, how do I integrate this SQL query with Biopython code to get
>> the output in form of SeqRecord or Seq objects.
>
> If you have the internal bioentry IDs you can use the BioSQL code
> directly to get SeqRecord compatible objects:
>
> from BioSQL import BioSeqDatabase
> from BioSQL.BioSeq import DBSeqRecord
>
> server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
>                     passwd = "", host = "localhost", db="bioseqdb")
> rec = DBSeqRecord(server.adaptor, your_bioentry_id)
>
> Hope this helps,
> Brad

Good to see we gave the same answer :)

Peter

From livingstonemark at gmail.com Sun May 27 21:55:25 2012
From: livingstonemark at gmail.com (Mark Livingstone)
Date: Mon, 28 May 2012 11:55:25 +1000
Subject: [Biopython] Getting the atom number of a CA residue

Hi Guys,

I want to use the Bio.PDB.NeighborSearch. To do so, it seems I need to tell it what atom to center the search on. I have constructed this convoluted center finding method ;-) and I'm wondering if there is something simpler!!
I have constructed this convoluted center finding method ;-) and I'm wondering if there is something simpler!! atoms = Bio.PDB.Selection.unfold_entities(mtc, "A") # we find the atom number of the mutation site residue's CA atom which becomes the center of our search radius center = atoms[mtc[mutation_site].get_unpacked_list()[1].get_serial_number()].get_coord() bions = Bio.PDB.NeighborSearch(atoms) atoms_found = bions.search(center, 5.0, "A") residues_found = bions.search(center, 5.0, "R") Using 1bti.pdb and asking for the Residue 22 [=mutation_site] CA [1] does give me atom #187 coords which is correct. At the moment I am only interested in CA, but I realise this somewhat hardcoded solution will not scale! Secondly, what I eventually want to get is more of a range function where I can find e.g. what is between 5-10A from center. Since neighborsearch doesn't give access to distances of the atoms / residues, am I correct in thinking I will have to "roll my own" neighbourhoodsearch and construct a sorted by distance list and iterate through it getting e.g. 5 < distance < 10 or similar? Thanks for your thoughts. MarkL From livingstonemark at gmail.com Tue May 29 23:09:41 2012 From: livingstonemark at gmail.com (Mark Livingstone) Date: Wed, 30 May 2012 13:09:41 +1000 Subject: [Biopython] Getting side chain atoms? Message-ID: Hi Guys, I notice on the wiki that it says the mailing list is at biopython at biopython.org, but when I suscribed it said to use biopython at lists.open-bio.org, so I'm wondering what the difference is? What is the simplest way to get a list of the side chain atoms given say a residue number? Also, not entirely related to Biopython, but I'm wondering if there is some way to get a sense of the overall shape of a protein? Like is it globular, a big string, a sheet or what? I can see if you looked at the bounding box, that might be a starting point, but does anyone have any other ideas? 
I have been looking at it as a geometry type problem but haven't gotten too far yet.

Thanks in advance,

MarkL

From dilara.ally at gmail.com Tue May 29 23:30:27 2012
From: dilara.ally at gmail.com (Dilara Ally)
Date: Tue, 29 May 2012 20:30:27 -0700
Subject: [Biopython] replace header

Hi Guys,

I'm interested in replacing just one part of the header for every read in a 40Gb fastq file. Because the files are so huge I don't want to read the entire file into memory, just the single read, and then rewrite it to a new file. The problem as it stands is that I'm creating all-new SeqRecord objects and appending them to a list called newsolid. Then, once that list is complete with all records, I write that list to a new file. Preferably I'd like to write each new SeqRecord to a file immediately. Sorry if I've missed this lesson in the Biopython Tutorial and Cookbook!

Any help would be greatly appreciated! Here is the code.

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

newsolid=[]
for seq_record in SeqIO.parse("solid_1.fastq", "fastq"):
    print seq_record.id
    original_header=seq_record.id
    import re
    subfind=r"(\w+)_(\w+)"
    result=re.search(subfind, original_header)
    print result.groups()
    subheader="_1"
    subreplace=r"\1_1"
    new_header=re.sub(subfind, subreplace, original_header)
    print new_header
    newfastqrecord=SeqRecord(seq_record.seq, id=new_header, letter_annotations=seq_record.letter_annotations)
    newsolid.append(newfastqrecord)

output="newsolid_1.fastq"
from Bio import SeqIO
SeqIO.write(newsolid, output, "fastq")

Cheers,
Dilara

From arklenna at gmail.com Tue May 29 23:44:41 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 29 May 2012 23:44:41 -0400
Subject: [Biopython] replace header

Hi Dilara,

Opening a file for append with 'a' allows successive writes to go to the end of the file.

Before the loop:

    out_handle = open("newsolid_1.fastq", 'a')

In the loop:

    SeqIO.write(newfastqrecord, out_handle, "fastq")

After the loop:

    out_handle.close()

You may have to manually write newlines to the file but hopefully the fastq writer handles that properly.

Hope that helps,

Lenna

From dilara.ally at gmail.com Tue May 29 23:54:13 2012
From: dilara.ally at gmail.com (Dilara Ally)
Date: Tue, 29 May 2012 20:54:13 -0700
Subject: [Biopython] replace header

Thanks! That worked.

On May 29, 2012, at 8:44 PM, Lenna Peterson wrote:

> Hi Dilara,
>
> Opening a file for append with 'a' allows successive writes to go to
> the end of the file.
>
> Before the loop:
>
>     out_handle = open("newsolid_1.fastq", 'a')
>
> In the loop:
>
>     SeqIO.write(newfastqrecord, out_handle, "fastq")
>
> After the loop:
>
>     out_handle.close()
>
> You may have to manually write newlines to the file but hopefully the
> fastq writer handles that properly.
>
> Hope that helps,
>
> Lenna

From anaryin at gmail.com Wed May 30 02:04:14 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 30 May 2012 08:04:14 +0200
Subject: [Biopython] Getting side chain atoms?

Hi Mark,

The gyration tensor should give you the means of calculating how oblate or prolate your molecule is.
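As an illustration of the gyration-tensor suggestion, here is a minimal NumPy sketch. It assumes the atom coordinates have already been collected into an N x 3 array (with Bio.PDB, something like `[atom.get_coord() for atom in structure.get_atoms()]`). The function name `gyration_shape` and the use of relative shape anisotropy (kappa squared) as a one-number summary are illustrative choices, not part of the reply.

```python
import numpy as np


def gyration_shape(coords):
    """Return (radius_of_gyration, eigenvalues, kappa2) for an N x 3 coordinate array."""
    xyz = np.asarray(coords, dtype=float)
    centred = xyz - xyz.mean(axis=0)          # move the centroid to the origin
    tensor = centred.T @ centred / len(xyz)   # S_mn = (1/N) * sum_i r_im * r_in
    evals = np.linalg.eigvalsh(tensor)        # principal moments, ascending order
    l1, l2, l3 = evals
    total = l1 + l2 + l3
    if total == 0:
        raise ValueError("all points coincide; shape is undefined")
    radius_of_gyration = np.sqrt(total)
    # Relative shape anisotropy: 0 for a spherically symmetric cloud, 1 for points on a line
    kappa2 = 1.0 - 3.0 * (l1 * l2 + l2 * l3 + l3 * l1) / total ** 2
    return radius_of_gyration, evals, kappa2
```

A kappa squared near 0 points to a roughly globular atom cloud and a value near 1 to an extended, rod-like one. The individual eigenvalues separate prolate shapes (one eigenvalue much larger than the other two) from oblate ones (one eigenvalue much smaller than the other two).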
Regarding the sidechain, i think you just have to manually do it, but since the backbone atoms are always the same it shouldn't be too hard. Cheers, Jo?o No dia 30 de Mai de 2012 05:10, "Mark Livingstone" < livingstonemark at gmail.com> escreveu: > Hi Guys, > > I notice on the wiki that it says the mailing list is at > biopython at biopython.org, but when I suscribed it said to use > biopython at lists.open-bio.org, so I'm wondering what the difference is? > > What is the simplest way to get a list of the side chain atoms given > say a residue number? > > Also, not entirely related to Biopython, but I'm wondering if there is > some way to get a sense of the overall shape of a protein? Like is it > globular, a big string, a sheet or what? I can see if you looked at > the bounding box, that might be a starting point, but does anyone have > any other ideas? I habe been looking at it as a geometry type problem > but haven't gotten too far yet. > > Thanks in advance, > > MarkL > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Wed May 30 03:45:29 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 30 May 2012 08:45:29 +0100 Subject: [Biopython] Getting side chain atoms? In-Reply-To: References: Message-ID: On Wednesday, May 30, 2012, Mark Livingstone wrote: > Hi Guys, > > I notice on the wiki that it says the mailing list is at > biopython at biopython.org , but when I suscribed it said to > use > biopython at lists.open-bio.org , so I'm wondering what the > difference is? > > They are the same - the OBF (open-bio.org) machine also handles the BioPerl mailing lists etc as well. Peter From p.j.a.cock at googlemail.com Wed May 30 03:45:29 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 30 May 2012 08:45:29 +0100 Subject: [Biopython] Getting side chain atoms? 
In-Reply-To: References: Message-ID: On Wednesday, May 30, 2012, Mark Livingstone wrote: > Hi Guys, > > I notice on the wiki that it says the mailing list is at > biopython at biopython.org , but when I suscribed it said to > use > biopython at lists.open-bio.org , so I'm wondering what the > difference is? > > They are the same - the OBF (open-bio.org) machine also handles the BioPerl mailing lists etc as well. Peter From p.j.a.cock at googlemail.com Wed May 30 04:56:57 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 30 May 2012 09:56:57 +0100 Subject: [Biopython] replace header In-Reply-To: References: Message-ID: On Wed, May 30, 2012 at 4:54 AM, Dilara Ally wrote: > Thanks! ?That worked. > > On May 29, 2012, at 8:44 PM, Lenna Peterson wrote: > >> Hi c, >> >> Opening a file for append with 'a' allows successive writes to go to >> the end of the file. >> >> Before the loop: >> >> ? ?out_handle = open("newsolid_1.fastq", 'a') >> >> In the loop: >> >> ? ?SeqIO.write(newfastqrecord, out_handle, "fastq") >> >> After the loop: >> >> ? ?out_handle.close() Hi Dilara & Lenna, I would use append mode with caution - it will have side effects like if you run this script twice, the output file will double in size (the first run plus the second run). Wouldn't opening in write mode work just as well here? i.e. Open the handle, do the loop, close the handle. There are some other further changes I would suggest. First, you don't need to create a new SeqRecord, you can modify the old record in situ. This will be faster as it avoids extra object creation: ... newfastqrecord=SeqRecord(seq_record.seq, id=new_header, letter_annotations=seq_record.letter_annotations) ... becomes just: ... seq_record.id = new_header ... Next, it is better to call SeqIO.write(...) once to do the whole file. On simple file formats like FASTA, FASTQ, GenBank, there is no header/footer structure so you can write each record independently. In general this is not possible, e.g. SFF files. 
Moreover, multiple calls to SeqIO.write(...) are slower than one single call.

The key point about using SeqIO.write(...) once to do a whole file is that this requires an iterator-based approach. For example, using a generator expression and a function acting on a single record:

    from Bio import SeqIO

    def modify_record(record):
        # Do something sensible to the headers here:
        record.id = "modified"
        return record

    # This is a generator expression:
    modified = (modify_record(r) for r in SeqIO.parse("solid_1.fastq", "fastq"))
    count = SeqIO.write(modified, "newsolid_1.fastq", "fastq")
    print("Modified %i records" % count)

Equivalently, using a generator function which does the looping itself:

    def modify_records(records):
        for record in records:
            # Do something sensible to the headers here:
            record.id = "modified"
            yield record

    count = SeqIO.write(modify_records(SeqIO.parse("solid_1.fastq", "fastq")),
                        "newsolid_1.fastq", "fastq")
    print("Modified %i records" % count)

Getting to grips with iterators and thinking this way takes a while, but it is extremely useful for dealing with large datasets efficiently (without running out of memory).

Now, for FASTQ in particular, the files are usually very large, and using SeqIO and SeqRecord objects can be too slow. You might find this useful:
http://news.open-bio.org/news/2009/09/biopython-fast-fastq/

Peter

From ferreirafm at usp.br Wed May 30 08:57:49 2012
From: ferreirafm at usp.br (Frederico Moras Ferreira)
Date: Wed, 30 May 2012 09:57:49 -0300
Subject: [Biopython] Getting side chain atoms?
Message-ID: <4FC6194D.1060802@usp.br>

Hi Mark,
I'm also very interested in overall protein shape analysis. I'm completely new to Biopython and can't help you much. As for your question itself, it is not trivial. One approach would be to calculate the center of mass of your protein and iteratively calculate the moment of inertia about three mutually perpendicular axes, so that it is maximal in one direction and minimal in another.
Sampling the moment of inertia about the third axis and comparing it with the other two will give a good estimate of your protein's overall shape.
Best of luck,
Fred

On 30-05-2012 03:04, João Rodrigues wrote:
> Hi Mark,
>
> The gyration tensor should give you the means of calculating how oblate or
> prolate your molecule is.
>
> Regarding the sidechain, i think you just have to manually do it, but since
> the backbone atoms are always the same it shouldn't be too hard.
>
> Cheers,
>
> João

From golubchi at stats.ox.ac.uk Wed May 30 09:40:38 2012
From: golubchi at stats.ox.ac.uk (Tanya Golubchik)
Date: Wed, 30 May 2012 14:40:38 +0100
Subject: [Biopython] Bio.Phylo: midpoint root?
Message-ID: <4FC62356.2030709@stats.ox.ac.uk>

Hello,

Does anyone have a quickie method for calculating the midpoint root of a given tree?

Thanks,
Tanya

From eric.talevich at gmail.com Wed May 30 10:55:11 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 30 May 2012 10:55:11 -0400
Subject: [Biopython] Bio.Phylo: midpoint root?
In-Reply-To: <4FC62356.2030709@stats.ox.ac.uk>
References: <4FC62356.2030709@stats.ox.ac.uk>

On Wed, May 30, 2012 at 9:40 AM, Tanya Golubchik wrote:
> Hello,
>
> Does anyone have a quickie method for calculating the midpoint root of a
> given tree?

It's been on my to-do list. (The first step was adding the keyword argument 'outgroup_branch_length' to root_with_outgroup.) The tree method 'depths' should also be handy.

The algorithm I had in mind looks like:

1. Take the depths() of each clade under the root.
2. Identify the deepest tip under each clade.
3. Assuming the tree is bifurcating, take the shallower tip as the "out_tip" and the deeper tip as the "in_tip".
4. If the difference between the depths of "out_tip" and "in_tip" is greater than the length of the branch connecting the two clades below the root (tree.clade[0].branch_length + tree.clade[1].branch_length), there's a possibility that a better out_tip is hiding inside the deeper clade. So, repeat the operation on tree.clade[1], recursively, until meeting the stop condition I just described.
5. To identify the midpoint, halve the distance between in_tip and out_tip, and trace backward from in_tip by that distance to reach the new root.

With multifurcations, the algorithm looks similar, but with more loops.

I too would be delighted to see a better algorithm for this.

-E

From anaryin at gmail.com Wed May 30 11:13:41 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 30 May 2012 17:13:41 +0200
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: <4FC6194D.1060802@usp.br> References: <4FC6194D.1060802@usp.br> Message-ID: Dear Frederico and Mark, I have a few scripts to do exactly what Frederico described, that play with Biopython. I will share them tomorrow and put an example here of how they work. Eventually it will become part of Biopython, in a future release I hope.. Cheers, Jo?o [...] Rodrigues http://nmr.chem.uu.nl/~joao 2012/5/30 Frederico Moras Ferreira > Hi Mark, > I'm also very interested in overall protein shape analysis. I'm completely > new to Biopython and can't help you much. Regarding to your question > itself, that's something not trivial. One of the approaches would be to > calculate the center of mass of your protein and iteratively calculate the > momentum of inertia along three mutually perpendicular axes so as it is > maximum in one direction and minimum in another. Sampling the momentum of > inertia of the third axis and comparing with the other two will give a good > estimation of your protein overall shape. > Best of luck, > Fred > > Em 30-05-2012 03:04, Jo?o Rodrigues escreveu: > > Hi Mark, >> >> The gyration tensor should give you the means of calculating how oblate or >> prolate your molecule is. >> >> Regarding the sidechain, i think you just have to manually do it, but >> since >> the backbone atoms are always the same it shouldn't be too hard. >> >> Cheers, >> >> Jo?o >> No dia 30 de Mai de 2012 05:10, "Mark Livingstone"< >> livingstonemark at gmail.com> escreveu: >> >> Hi Guys, >>> >>> I notice on the wiki that it says the mailing list is at >>> biopython at biopython.org, but when I suscribed it said to use >>> biopython at lists.open-bio.org, so I'm wondering what the difference is? >>> >>> What is the simplest way to get a list of the side chain atoms given >>> say a residue number? >>> >>> Also, not entirely related to Biopython, but I'm wondering if there is >>> some way to get a sense of the overall shape of a protein? 
Like is it >>> globular, a big string, a sheet or what? I can see if you looked at >>> the bounding box, that might be a starting point, but does anyone have >>> any other ideas? I habe been looking at it as a geometry type problem >>> but haven't gotten too far yet. >>> >>> Thanks in advance, >>> >>> MarkL >>> ______________________________**_________________ >>> Biopython mailing list - Biopython at lists.open-bio.org >>> http://lists.open-bio.org/**mailman/listinfo/biopython >>> >>> ______________________________**_________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/**mailman/listinfo/biopython >> > > ______________________________**_________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/**mailman/listinfo/biopython > From ferreirafm at usp.br Wed May 30 12:55:23 2012 From: ferreirafm at usp.br (Frederico Moras Ferreira) Date: Wed, 30 May 2012 13:55:23 -0300 Subject: [Biopython] Getting side chain atoms? In-Reply-To: References: <4FC6194D.1060802@usp.br> Message-ID: <4FC650FB.8090502@usp.br> That's great! Look forward hearing from you. Cheers, Fred Em 30-05-2012 12:13, Jo?o Rodrigues escreveu: > Dear Frederico and Mark, > > I have a few scripts to do exactly what Frederico described, that play > with Biopython. I will share them tomorrow and put an example here of > how they work. Eventually it will become part of Biopython, in a > future release I hope.. > > Cheers, > > Jo?o [...] Rodrigues > http://nmr.chem.uu.nl/~joao > > > > 2012/5/30 Frederico Moras Ferreira > > > Hi Mark, > I'm also very interested in overall protein shape analysis. I'm > completely new to Biopython and can't help you much. Regarding to > your question itself, that's something not trivial. 
One of the > approaches would be to calculate the center of mass of your > protein and iteratively calculate the momentum of inertia along > three mutually perpendicular axes so as it is maximum in one > direction and minimum in another. Sampling the momentum of inertia > of the third axis and comparing with the other two will give a > good estimation of your protein overall shape. > Best of luck, > Fred > > Em 30-05-2012 03:04, Jo?o Rodrigues escreveu: > > Hi Mark, > > The gyration tensor should give you the means of calculating > how oblate or > prolate your molecule is. > > Regarding the sidechain, i think you just have to manually do > it, but since > the backbone atoms are always the same it shouldn't be too hard. > > Cheers, > > Jo?o > No dia 30 de Mai de 2012 05:10, "Mark Livingstone"< > livingstonemark at gmail.com > > escreveu: > > Hi Guys, > > I notice on the wiki that it says the mailing list is at > biopython at biopython.org , > but when I suscribed it said to use > biopython at lists.open-bio.org > , so I'm wondering > what the difference is? > > What is the simplest way to get a list of the side chain > atoms given > say a residue number? > > Also, not entirely related to Biopython, but I'm wondering > if there is > some way to get a sense of the overall shape of a protein? > Like is it > globular, a big string, a sheet or what? I can see if you > looked at > the bounding box, that might be a starting point, but does > anyone have > any other ideas? I habe been looking at it as a geometry > type problem > but haven't gotten too far yet. 
> > Thanks in advance, > > MarkL > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > From eric.talevich at gmail.com Thu May 31 00:36:49 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 31 May 2012 00:36:49 -0400 Subject: [Biopython] Bio.Phylo: midpoint root? In-Reply-To: References: <4FC62356.2030709@stats.ox.ac.uk> Message-ID: On Wed, May 30, 2012 at 10:55 AM, Eric Talevich wrote: > On Wed, May 30, 2012 at 9:40 AM, Tanya Golubchik wrote: > >> Hello, >> >> Does anyone have a quickie method for calculating the midpoint root of a >> given tree? >> >> > It's been on my to-do list. (The first step was adding the keyword > argument 'outgroup_branch_length' to root_with_outgroup.) The tree method > 'depths' should also be handy. > > The algorithm I had in mind looks like: > > > I too would be delighted to see a better algorithm for this. > > I implemented this in an intuitive but very inefficient way, calculating the pairwise distances between all tips of the tree. You can try it from git: https://github.com/biopython/biopython/commit/94c128bd428cc5d53b50edd1d2e4730ee212f530 It would still be nice to see a better algorithm, if anyone has one on hand. -E From francesco.strozzi at gmail.com Thu May 31 05:11:25 2012 From: francesco.strozzi at gmail.com (Francesco Strozzi) Date: Thu, 31 May 2012 11:11:25 +0200 Subject: [Biopython] EU Codefest 2012 Announcement Message-ID: The Open Bioinformatics Foundation (OBF) EU-CodeFest will be held in Parco Tecnologico Padano (PTP) Lodi, Italy on the19th ? 20th of July. 
The CodeFest is a small, focused event under the auspices of the Open Bioinformatics Foundation, and is a sister event of BOSC2012, which is being held in California, USA this year.

Three main topics will be worked on during the CodeFest:

- NGS and high-performance parsers for OpenBio projects.
- RDF and the semantic web for bioinformatics.
- Bioinformatics pipeline definition, execution and distribution.

The number of places is limited to a maximum of 30 participants, on a first-come, first-served basis. Undergraduate and PhD students are welcome to participate. The cost of the event is EUR 100 per person, which also includes lunches, coffee breaks and the social dinner on the 19th of July.

For students only, we can sponsor a limited number of attendees, who will not pay the registration fee. Students who wish to participate free of charge will be asked to submit their qualifications and experience in software development. The organizing committee will review students' applications before final acceptance.

Talks and abstracts may be presented during the CodeFest in sessions of 10 minutes plus questions. Coding activities will continue during the talks.

The city of Lodi is very close to Milano and has good hotel facilities. Air connections are excellent, via the Milano Malpensa, Milano Linate and Bergamo Orio Al Serio airports.

Please register soon using the form at http://tecnoparco.org/codefest; places may run out quickly.

--
Francesco

From anaryin at gmail.com Thu May 31 06:54:55 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 31 May 2012 12:54:55 +0200
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: <4FC650FB.8090502@usp.br>
References: <4FC6194D.1060802@usp.br> <4FC650FB.8090502@usp.br>

I have already included the header as a Biopython module, since parts of this script are my work at GSOC 2010, and others are Ezgi's independent work that she agreed to contribute.
Just a safety measure :)

The usage of the code is very simple. Parse a structure with Biopython and use the calculate_shape_param function of this geometry module to calculate all the values required to compute the shape of your molecule. It will not tell you what shape it is, but it will give you all the ingredients (for example, if the anisotropy is 0, your molecule is spherical).

Let me know if you have any comments, and let Ezgi know too, as she is the main contributor to this.

*Again, this should not be included, for now, in the main distribution, nor considered an "official" addition.*

Download here: http://nmr.chem.uu.nl/~joao/f/geometry.py

Cheers,

João

From golubchi at stats.ox.ac.uk Thu May 31 08:10:21 2012
From: golubchi at stats.ox.ac.uk (Tanya Golubchik)
Date: Thu, 31 May 2012 13:10:21 +0100
Subject: Re: [Biopython] Bio.Phylo: midpoint root?
References: <4FC62356.2030709@stats.ox.ac.uk>
Message-ID: <4FC75FAD.2050909@stats.ox.ac.uk>

Thanks, Eric - will have a go!

T
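For comparison with the approaches Eric describes, here is a sketch of a different, linear-time technique: the "double sweep" for trees with non-negative branch lengths. The node farthest from any starting node is one end of the longest tip-to-tip path, and the node farthest from that end is the other end. The midpoint root then lies halfway along the path between them. This is plain Python 3 on a hypothetical adjacency-list tree, where each node name maps to a list of (neighbour, branch length) pairs. The names build_graph, distances_from and midpoint are illustrative, not Bio.Phylo functions, and converting to or from Bio.Phylo Clade objects is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical representation: node name -> [(neighbour, branch_length), ...]
Graph = Dict[str, List[Tuple[str, float]]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an undirected tree from (node, node, branch_length) triples."""
    graph: Graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph


def distances_from(graph: Graph, start: str):
    """Path length from start to every node, plus each node's parent on that path."""
    dist = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    stack = [start]
    while stack:  # iterative DFS; a tree has exactly one path between two nodes
        u = stack.pop()
        for v, length in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + length
                parent[v] = u
                stack.append(v)
    return dist, parent


def midpoint(graph: Graph, any_node: str) -> Tuple[str, str, float]:
    """Return (u, v, offset): the midpoint lies on branch u-v, offset from u."""
    dist, _ = distances_from(graph, any_node)
    a = max(dist, key=dist.get)  # first sweep: one end of the longest path
    dist, parent = distances_from(graph, a)
    b = max(dist, key=dist.get)  # second sweep: the other end
    half = dist[b] / 2.0
    node = b
    # Walk back from b towards a until we reach the branch holding the midpoint.
    while parent[node] is not None and dist[parent[node]] >= half:
        node = parent[node]
    upper = parent[node]
    if upper is None:  # degenerate case: single node or zero-length tree
        return node, node, 0.0
    return upper, node, half - dist[upper]


if __name__ == "__main__":
    tree = build_graph([("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0),
                        ("Y", "C", 4.0), ("Y", "D", 1.0)])
    print(midpoint(tree, "A"))  # ('C', 'Y', 3.5)
```

In the example, the longest tip-to-tip path runs from B to C (length 7.0), so the new root goes 3.5 along the C-Y branch, measured from C. Each sweep visits every node once, and nothing is rerooted while the distances are measured.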
From ferreirafm at usp.br Thu May 31 09:06:54 2012
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Thu, 31 May 2012 10:06:54 -0300
Subject: [Biopython] Getting side chain atoms?
References: <4FC6194D.1060802@usp.br> <4FC650FB.8090502@usp.br>
Message-ID: <4FC76CEE.1070300@usp.br>

Hi João and Ezgi,

That's a very nice piece of code. I'll do some tests and let you know the results.

All the Best,
Fred

From b.invergo at gmail.com Thu May 31 09:31:21 2012
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 31 May 2012 15:31:21 +0200
Subject: [Biopython] Bio.Phylo: midpoint root?
References: <4FC62356.2030709@stats.ox.ac.uk>
Message-ID: <1338471081.627.7.camel@localhost.localdomain>

On Thu, 2012-05-31 at 00:36 -0400, Eric Talevich wrote:
> On Wed, May 30, 2012 at 10:55 AM, Eric Talevich wrote:
> I implemented this in an intuitive but very inefficient way, calculating
> the pairwise distances between all tips of the tree.
> You can try it from git:
> https://github.com/biopython/biopython/commit/94c128bd428cc5d53b50edd1d2e4730ee212f530
>
> It would still be nice to see a better algorithm, if anyone has one on hand.
>
> -E

I sped it up a little by getting rid of those nested for loops:
https://github.com/brandoninvergo/biopython/commit/102189cd49d448423ee160a0a0ad891b58f56c26

According to a naive benchmark comparing execution times for the unit test, this version is about 40% faster (0.901s vs 0.524s on my computer). I'll do a pull request...

As for the problem of accumulating floating-point rounding errors, perhaps you can do the root operations on copies of the tree instead...

-brandon

From eric.talevich at gmail.com Thu May 31 12:30:07 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 31 May 2012 12:30:07 -0400
Subject: [Biopython] Bio.Phylo: midpoint root?
In-Reply-To: <1338471081.627.7.camel@localhost.localdomain>
References: <4FC62356.2030709@stats.ox.ac.uk> <1338471081.627.7.camel@localhost.localdomain>

On Thu, May 31, 2012 at 9:31 AM, Brandon Invergo wrote:
> As for the problem of accumulating floating point rounding errors,
> perhaps you can do the root operations on copies of the tree instead...

Looks better, thanks! I merged it.

I'll look into the rounding issue some more. It might be enough to make a single copy of the tree, do all the rerooting and distance calculation there, and use the original copy to calculate the outgroup branch length and do a single rerooting. Alternatively, I could add a separate tree method that generates pairwise distances without rerooting the tree -- either producing a big dictionary, or an iterable of ((node1, node2), distance) which could easily be fed to a dictionary if needed.

From vinkurella at yahoo.com Thu May 31 13:28:13 2012
From: vinkurella at yahoo.com (vinodh kurella)
Date: Thu, 31 May 2012 10:28:13 -0700 (PDT)
Subject: [Biopython] Help needed to fix an error in biopython on mac osx
Message-ID: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>

Hi Biopython community,

I am trying to install Biopython on Mac OS X (10.7.4). After downloading biopython-1.59.tar.gz from http://biopython.org/wiki/Download and running the installation, I get the error pasted below.

    $ cd biopython-1.59
    $ sudo python setup.py install

It works until it reaches this error and then aborts. Could anyone please let me know what is wrong? I appreciate your help. Below is the report; I have cut out the middle (......) and shown only the start and end of the output.

Vinodh

    running install
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.7-intel-2.7
    creating build/lib.macosx-10.7-intel-2.7/Bio
    copying Bio/__init__.py -> build/lib.macosx-10.7-intel-2.7/Bio
    copying Bio/_py3k.py -> build/lib.macosx-10.7-intel-2.7/Bio
    . . . .
    copying Bio/PopGen/SimCoal/data/ssm_2d.par -> build/lib.macosx-10.7-intel-2.7/Bio/PopGen/SimCoal/data
    running build_ext
    building 'Bio.cpairwise2' extension
    creating build/temp.macosx-10.7-intel-2.7
    creating build/temp.macosx-10.7-intel-2.7/Bio
    llvm-gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -IBio -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.7-intel-2.7/Bio/cpairwise2module.o
    unable to execute llvm-gcc-4.2: No such file or directory
    error: command 'llvm-gcc-4.2' failed with exit status 1

From arklenna at gmail.com Thu May 31 13:46:05 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 31 May 2012 13:46:05 -0400
Subject: [Biopython] Help needed to fix an error in biopython on mac osx
In-Reply-To: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>
References: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>

Hi Vinodh,

Do you have Xcode installed? It's required to build Biopython on a Mac. llvm-gcc-4.2 is a C compiler used by Apple.

Lenna

From anaryin at gmail.com Thu May 31 13:49:17 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 31 May 2012 19:49:17 +0200
Subject: [Biopython] Help needed to fix an error in biopython on mac osx
References: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>

Is this on Lion?
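The log shows the build stopping because the compiler named in the error, llvm-gcc-4.2, cannot be executed while building the Bio.cpairwise2 C extension. As a rough diagnostic (a sketch, not part of Biopython or its installer), the snippet below asks Python which C compiler it was configured with, lets a CC environment variable take precedence as distutils does, and checks whether that executable is on the PATH. It uses only os, sysconfig and shutil.which, so it needs Python 3.3 or later. The function name configured_compiler is hypothetical, and the real compiler-selection logic on macOS has extra special cases, so treat the result as a hint rather than a guarantee.

```python
import os
import shutil
import sysconfig
from typing import Optional, Tuple


def configured_compiler() -> Tuple[Optional[str], bool]:
    """Return (compiler executable, found_on_PATH) for building C extensions.

    The CC environment variable, if set, takes precedence over the value
    recorded when this Python was built (sysconfig's "CC").
    """
    cc = os.environ.get("CC") or sysconfig.get_config_var("CC")
    parts = (cc or "").split()
    if not parts:
        return None, False  # nothing recorded (typical of Windows builds)
    executable = parts[0]  # drop any flags, e.g. "gcc -pthread" -> "gcc"
    return executable, shutil.which(executable) is not None


if __name__ == "__main__":
    name, found = configured_compiler()
    if name is None:
        print("No C compiler recorded for this Python")
    elif found:
        print(f"{name} found on PATH")
    else:
        print(f"{name} NOT found; install the platform's C compiler tools")
```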
From arklenna at gmail.com Thu May 31 13:55:40 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 31 May 2012 13:55:40 -0400
Subject: [Biopython] replace header

On Wed, May 30, 2012 at 4:56 AM, Peter Cock wrote:
>
> Hi Dilara & Lenna,
>
> I would use append mode with caution - it will have side effects
> like if you run this script twice, the output file will double in size
> (the first run plus the second run). Wouldn't opening in write
> mode work just as well here?
> i.e. Open the handle, do the loop, close the handle.

Hi Peter,

Thanks for the warning. Python is making me adjust my thought patterns every day. I'm used to the shell's > vs >> (e.g. with cat). I had never tried multiple writes to a single open file, but the behavior is logical.

> The key point about using SeqIO.write(...) once to do a whole
> file is this requires an iterator based approach. For example,
> using a generator expression and a function acting on a single
> record:
> [...]
> Equivalently using a generator function which does the
> looping itself:
> [...]

The generator function is nice, too. I presume this only works because SeqIO.write knows how to write from an iterator?
Lenna

From p.j.a.cock at googlemail.com Thu May 31 14:06:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 May 2012 19:06:44 +0100
Subject: [Biopython] Help needed to fix an error in biopython on mac osx
References: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>

On Thu, May 31, 2012 at 6:49 PM, João Rodrigues wrote:
> Is this on Lion?

Yes, he said OS X 10.7, which is Lion. You can download Xcode free from the Apple Mac App Store (it just takes a while, as it is several GB in size).

Peter

From p.j.a.cock at googlemail.com Thu May 31 14:10:11 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 May 2012 19:10:11 +0100
Subject: [Biopython] replace header

On Thu, May 31, 2012 at 6:55 PM, Lenna Peterson wrote:
> The generator function is nice, too. I presume this only works because
> SeqIO.write knows how to write from an iterator?

Bio.SeqIO.write is *designed* to take a Python iterator of SeqRecord objects.
That can be a generator function, generator expression, a custom class which supports iteration, or even a simple list or tuple of SeqRecord objects all in memory. As a special-case convenience it also accepts a single SeqRecord.

Peter

From ferreirafm at usp.br Thu May 31 14:52:00 2012
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Thu, 31 May 2012 15:52:00 -0300
Subject: [Biopython] geometry.py
Message-ID: <4FC7BDD0.2020905@usp.br>

Hi João,

The radius of gyration (Rg) calculation is running just fine. The values are in excellent agreement with those from some models I have tested. However, the maximum dimensions do not match at all. Did you orient the model before the tensor analysis?

Fred

From xiaochuan.liu at mssm.edu Thu May 3 22:46:58 2012
From: xiaochuan.liu at mssm.edu (Liu, XiaoChuan)
Date: Thu, 3 May 2012 22:46:58 +0000
Subject: [Biopython] How to use SeqRecord to get the subseq location information
Message-ID:

Dear all,

I have a problem: how do I use SeqRecord to get the subseq location information?
My code is like this:

>>> from Bio.Seq import Seq
>>> simple_seq = Seq("gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugaguccugcucuuguucugagcaccaccccucucucaga")
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> example_feature = SeqFeature(FeatureLocation(25382494, 25382583), type="mRNA", strand=-1)
>>> simple_seq_r = SeqRecord(simple_seq, id="17_329.4",features=[example_feature])
>>> simple_seq_r
SeqRecord(seq=Seq('gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugagu...aga', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])
>>> simple_seq_r.features
[SeqFeature(FeatureLocation(ExactPosition(25382494),ExactPosition(25382583)), type='mRNA', strand=-1)]
>>> simple_seq_r.features[0]
SeqFeature(FeatureLocation(ExactPosition(25382494),ExactPosition(25382583)), type='mRNA', strand=-1)
>>> subseq=simple_seq_r[3:10]
>>> subseq
SeqRecord(seq=Seq('ggaagag', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])

But when I type "subseq.features" like this:

>>> subseq.features
[]

I could not get the location information of subseq. Why? Does anybody know how to get this information?

Thank you very much!

Best,
Xiaochuan

From w.arindrarto at gmail.com Fri May 4 06:09:31 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 4 May 2012 08:09:31 +0200
Subject: [Biopython] How to use SeqRecord to get the subseq location information
In-Reply-To:
References:
Message-ID:

Hi Liu,

It looks like the problem is caused by the values you put in your SeqFeature. Your sequence length is less than the feature location values.
If you try plugging in a number in range, like this:

>>> example_feature = SeqFeature(FeatureLocation(5, 7), type="mRNA", strand=-1)

You should still keep the feature in your subsequence, like so:

>>> subseq = simple_seq_r[3:10]
>>> subseq.features
[SeqFeature(FeatureLocation(ExactPosition(2), ExactPosition(4), strand=-1), type='mRNA')]

Hope that helps :),
cheers,
Bow

From xiaochuan.liu at mssm.edu Fri May 4 16:19:47 2012
From: xiaochuan.liu at mssm.edu (Liu, XiaoChuan)
Date: Fri, 4 May 2012 16:19:47 +0000
Subject: [Biopython] How to use SeqRecord to get the subseq location information
In-Reply-To:
References:
Message-ID:

Hi Bow,

Thank you very much for your help! But even following your suggestion, I still face this problem. See below:

>>> example_feature = SeqFeature(FeatureLocation(0, 88), type="mRNA", strand=-1)
>>> simple_seq_r = SeqRecord(simple_seq, id="17_329.4",features=[example_feature])
>>> simple_seq_r
SeqRecord(seq=Seq('gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugagu...aga', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])
>>> simple_seq_r.features
[SeqFeature(FeatureLocation(ExactPosition(0),ExactPosition(88)), type='mRNA', strand=-1)]
>>> subseq=simple_seq_r[3:10]
>>> subseq
SeqRecord(seq=Seq('ggaagag', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])
>>> subseq.features
[]

I still could not get the location information of subseq. Why?

Thank you very much!

Best,
Xiaochuan

-----Original Message-----
From: Wibowo Arindrarto [mailto:w.arindrarto at gmail.com]
Sent: Friday, May 04, 2012 2:10 AM
To: Liu, XiaoChuan
Cc: biopython at biopython.org
Subject: Re: [Biopython] How to use SeqRecord to get the subseq location information

Hi Liu,

It looks like the problem is caused by the values you put in your SeqFeature. Your sequence length is less than the feature location values.
If you try plugging in a number in range, like this:

>>> example_feature = SeqFeature(FeatureLocation(5, 7), type="mRNA", strand=-1)

You should still keep the feature in your subsequence, like so:

>>> subseq = simple_seq_r[3:10]
>>> subseq.features
[SeqFeature(FeatureLocation(ExactPosition(2), ExactPosition(4), strand=-1), type='mRNA')]

Hope that helps :),
cheers,
Bow

From p.j.a.cock at googlemail.com Fri May 4 16:31:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 4 May 2012 17:31:07 +0100
Subject: [Biopython] How to use SeqRecord to get the subseq location information
In-Reply-To:
References:
Message-ID:

On Fri, May 4, 2012 at 5:19 PM, Liu, XiaoChuan wrote:
> Hi Bow,
>
> Thank you very much for your helps!
> But according to your suggestion, I also face this problem. See below:
>
>>>> example_feature = SeqFeature(FeatureLocation(0, 88), type="mRNA", strand=-1)
>>>> simple_seq_r = SeqRecord(simple_seq, id="17_329.4",features=[example_feature])
>>>> simple_seq_r
> SeqRecord(seq=Seq('gugggaagagggguggggcccgggacuguacccaugugaggacuauucuugagu...aga', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])
>>>> simple_seq_r.features
> [SeqFeature(FeatureLocation(ExactPosition(0),ExactPosition(88)), type='mRNA', strand=-1)]
>>>> subseq=simple_seq_r[3:10]
>>>> subseq
> SeqRecord(seq=Seq('ggaagag', Alphabet()), id='17_329.4', name='', description='', dbxrefs=[])
>>>> subseq.features
> []
>
> I could not get the location information of subseq yet. Why? Thank you very much!
>

What numbers are you trying to get? In your example the parent sequence (simple_seq_r) has a feature from 0 to 88, but when you slice a SeqRecord only features fully inside the slice are kept - so no features are kept for the child record (subseq). We do not break up larger features which straddle the cut sites.
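The slicing rule Peter describes (keep only features lying entirely inside the slice, and shift their coordinates by the slice start) can be sketched in plain Python. This is an illustration, not Biopython's internal code. It assumes hypothetical `(start, end)` tuples in Python's 0-based, end-exclusive convention, and it reproduces the results reported in this thread.

```python
from typing import List, Tuple

Location = Tuple[int, int]  # hypothetical (start, end), 0-based, end-exclusive


def slice_features(features: List[Location], start: int, stop: int) -> List[Location]:
    """Keep features fully inside [start, stop) and re-base them on the slice.

    Features that straddle either cut site are dropped, not truncated.
    """
    kept = []
    for f_start, f_end in features:
        if start <= f_start and f_end <= stop:
            kept.append((f_start - start, f_end - start))
    return kept


# Bow's example: a feature at 5..7 survives the [3:10] slice as 2..4
print(slice_features([(5, 7)], 3, 10))   # [(2, 4)]
# Liu's example: a feature at 0..88 straddles the cut sites and is dropped
print(slice_features([(0, 88)], 3, 10))  # []
```

Dropping straddling features, rather than trimming them, avoids inventing partial features whose biological meaning would be unclear.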
Peter

From p.j.a.cock at googlemail.com Sun May 6 11:09:30 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 6 May 2012 12:09:30 +0100
Subject: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop
In-Reply-To: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>
References: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>
Message-ID:

Dear Biopythoneers,

Are any of us planning to attend the SciPy meeting? The 2012 SciPy Bioinformatics Workshop is crying out for a Biopython-related talk... and from the email below it sounds like they're not just looking for developers' perspectives, but also for how Python is being used in bioinformatics.

It is quite close after BOSC and ISMB, but July 19 doesn't actually clash:
http://www.open-bio.org/wiki/BOSC_2012

SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it clashes with the planned CodeFest too:
http://www.open-bio.org/wiki/EU_Codefest_2012

July is definitely conference season...

Peter

---------- Forwarded message ----------
From: *Chris Mueller*
Date: Thursday, May 3, 2012
Subject: [Numpy-discussion] 2012 SciPy Bioinformatics Workshop
To: "chris.mueller at lab7.io"

We are pleased to announce the 2012 SciPy Bioinformatics Workshop held in conjunction with SciPy 2012 this July in Austin, TX.

Python in biology is not dead yet... in fact, it's alive and well!

Remember just a few short years ago when BioPerl ruled the world? Just one minor paradigm shift* later and Python now has a commanding presence in bioinformatics. From Python bindings to common tools all the way to entire Python-based informatics platforms, Python is used everywhere** in modern bioinformatics.

If you use Python for bioinformatics or just want to learn more about how it's being used, join us at the 2012 SciPy Bioinformatics Workshop. We will have speakers from both academia and industry showcasing how Python is enabling biologists to effectively work with large, complex data sets.
The workshop will be held the evening of July 19 from 5-6:30.

More information about SciPy is available on the conference site:
http://conference.scipy.org/scipy2012/

!! Participate !!

Are you using Python in bioinformatics? We'd love to have you share your story. We are looking for 3-4 speakers to share their experiences using Python for bioinformatics.

Please contact Chris Mueller at chris.mueller [at] lab7.io and Ray Roberts at rroberts [at] enthought.com to volunteer. Please include a brief description or link to a paper/topic which you would like to discuss. Presentations will last for 15 minutes each and will be followed by a panel Q&A.

--
* That would be next generation sequencing
** Yes, we aRe awaRe of that otheR language used eveRywhere, but let's celebRate Python Right now.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From tiagoantao at gmail.com Sun May 6 11:16:36 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 6 May 2012 12:16:36 +0100
Subject: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop
In-Reply-To:
References: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>
Message-ID:

Hi,

On Sun, May 6, 2012 at 12:09 PM, Peter Cock wrote:
> SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it
> clashes with the planned CodeFest too:
> http://www.open-bio.org/wiki/EU_Codefest_2012

Are any people from here going to the codefest?

Tiago

From cjfields at illinois.edu Sun May 6 15:03:27 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 6 May 2012 15:03:27 +0000
Subject: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop
In-Reply-To:
References: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>,
Message-ID:

On May 6, 2012, at 6:12 AM, "Peter Cock" wrote:

> ...
> Is it quite close after BOSC and ISMB but July 19 doesn't actually clash:
> http://www.open-bio.org/wiki/BOSC_2012
>
> SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it
> clashes with the planned CodeFest too:
> http://www.open-bio.org/wiki/EU_Codefest_2012
>
> July is definitely conference season...

Galaxy community conference as well.

Chris

>
> Peter
>
> ---------- Forwarded message ----------
> From: *Chris Mueller*
> Date: Thursday, May 3, 2012
> Subject: [Numpy-discussion] 2012 SciPy Bioinformatics Workshop
> To: "chris.mueller at lab7.io"
>
>
> We are pleased to announce the 2012 SciPy Bioinformatics Workshop held in
> conjunction with SciPy 2012 this July in Austin, TX.
>
> Python in biology is not dead yet... in fact, it's alive and well!
>
> Remember just a few short years ago when BioPerl ruled the world? Just one
> minor paradigm shift* later and Python now has a commanding presence in
> bioinformatics. From Python bindings to common tools all the way to entire
> Python-based informatics platforms, Python is used everywhere** in modern
> bioinformatics.
>
> If you use Python for bioinformatics or just want to learn more about how
> its being used, join us at the 2012 SciPy Bioinformatics Workshop. We will
> have speakers from both academia and industry showcasing how Python is
> enabling biologists to effectively work with large, complex data sets.
>
> The workshop will be held the evening of July 19 from 5-6:30.
>
> More information about SciPy is available on the conference site:
> http://conference.scipy.org/scipy2012/
>
> !! Participate !!
>
> Are you using Python in bioinformatics? We'd love to have you share your
> story. We are looking for 3-4 speakers to share their experiences using
> Python for bioinformatics.
>
> Please contact Chris Mueller at chris.mueller [at] lab7.io and Ray Roberts
> at rroberts [at] enthought.com to volunteer.
> Please include a brief
> description or link to a paper/topic which you would like to discuss.
> Presentations will last for 15 minutes each and will be followed by a panel
> Q&A.
>
> --
> * That would be next generation sequencing
> ** Yes, we aRe awaRe of that otheR language used eveRywhere, but let's
> celebRate Python Right now.
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From arklenna at gmail.com Sun May 6 21:26:30 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Sun, 6 May 2012 17:26:30 -0400
Subject: [Biopython] GSoC python variant update
Message-ID:

Hi all,

I've written a few new posts on my blog; here's the latest:
http://arklenna.tumblr.com/post/22542372076/spot-isa-dog

I will attach a UML diagram and include the part of the post addressing the diagram. Click through to the full post for a bonus Einstein quote!

-------

My main goals include, but are not limited to:

* Make the structure parser-agnostic and file-format-agnostic: an abstracted OO design should allow anything to be slotted in (for example, Marjan's C GFF parser?)
* Maintain encapsulation: limit how much each object can see of objects above and below it
* Allow extension at multiple levels: some existing parsers may process data in different ways; this structure should allow handling both raw data and data in various formats.

The `Variant` object's constructor allows an end user to change the default parsers. Practical implementation details of `parse()` and `write()` will need to be finessed - for example, ways to help the user sift through immense quantities of data. I'm still in the process of comparing the data contained in VCF/GVF files as well as the APIs of PyVCF and BCBio.GFF.
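A minimal sketch of the pluggable design described in this post: a `Variant` whose constructor accepts swappable parser and writer objects, backed by abstract `Parser`/`Writer` classes whose methods raise `NotImplementedError`. Every name here (`Parser`, `Writer`, `TabAdapter`, `Variant`, the dict-based records) is a hypothetical illustration, not the project's actual API. The toy tab-separated adapter stands in for real wrappers such as the planned PyVCF and BCBio.GFF adapters.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO

Record = Dict[str, str]  # hypothetical: one variant as a simple dict


class Parser:
    """Abstract parser: concrete adapters override parse()."""

    def parse(self, handle: TextIO) -> Iterator[Record]:
        raise NotImplementedError("parse() is not supported by this parser")


class Writer:
    """Abstract writer: concrete adapters override write()."""

    def write(self, records: List[Record], handle: TextIO) -> None:
        raise NotImplementedError("write() is not supported by this writer")


class TabAdapter(Parser, Writer):
    """Toy adapter for a tab-separated 'chrom pos ref alt' format."""

    FIELDS = ("chrom", "pos", "ref", "alt")

    def parse(self, handle: TextIO) -> Iterator[Record]:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                yield dict(zip(self.FIELDS, line.rstrip("\n").split("\t")))

    def write(self, records: List[Record], handle: TextIO) -> None:
        for rec in records:
            handle.write("\t".join(rec[f] for f in self.FIELDS) + "\n")


class Variant:
    """Generic front end; callers can swap in any Parser/Writer."""

    def __init__(self, parser: Optional[Parser] = None,
                 writer: Optional[Writer] = None):
        self.parser = parser if parser is not None else TabAdapter()
        self.writer = writer if writer is not None else TabAdapter()

    def parse(self, handle: TextIO) -> List[Record]:
        return list(self.parser.parse(handle))

    def write(self, records: List[Record], handle: TextIO) -> None:
        self.writer.write(records, handle)


v = Variant()
records = v.parse(io.StringIO("#chrom\tpos\tref\talt\n1\t100\tA\tG\n"))
out = io.StringIO()
v.write(records, out)
print(records)  # [{'chrom': '1', 'pos': '100', 'ref': 'A', 'alt': 'G'}]
```

Passing a bare `Parser()` to `Variant` makes `parse()` raise `NotImplementedError`, which is the behaviour the abstract base classes are meant to enforce.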
`Parser` and `Writer` are both abstract classes that will define all methods found in known parsers/writers as stubs raising `NotImplementedError`. I'm speculating on whether a Variant-specific exception would be useful, but a custom message should suffice.

Continuing down the diagram, `PyVCFWrapper` and `BCBioGFFWrapper` would each inherit from both `Parser` and `Writer`. As the names imply, they would serve as adapters between the generic `Variant` and the specific parser.

I anticipate that this structure could easily be extended to allow intermediate storage in DBs as well as innumerable sorting/comparing/filtering methods inside `Variant`.

-------

I would appreciate any and all feedback about the overall structure. Namespace is definitely flexible.

I'd also appreciate any specific genomic variant workflows, and if somebody can point me to smallish sample files of the same data in both VCF and GVF, I'd be eternally grateful.

Regards,
Lenna
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Variant_UML.png
Type: image/png
Size: 23313 bytes
Desc: not available
URL:

From p.j.a.cock at googlemail.com Mon May 7 08:37:38 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 7 May 2012 09:37:38 +0100
Subject: [Biopython] Fwd: 2012 SciPy Bioinformatics Workshop
In-Reply-To:
References: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>
Message-ID:

On Sun, May 6, 2012 at 12:16 PM, Tiago Antão wrote:
> Hi,
>
> On Sun, May 6, 2012 at 12:09 PM, Peter Cock wrote:
>> SciPy 2012 as a whole does clash with ISMB, and for those in Europe, it
>> clashes with the planned CodeFest too:
>> http://www.open-bio.org/wiki/EU_Codefest_2012
>
> Are any people from here going to the codefest?
>
> Tiago

Brad is going to the pre-BOSC CodeFest in California,
http://www.open-bio.org/wiki/Codefest_2012

I'm not sure if we have any Biopython folk signed up for the post-BOSC EU CodeFest in Italy yet.
http://www.open-bio.org/wiki/EU_Codefest_2012

I aim to attend one of the CodeFests - trying to firm up summer travel plans now...

Peter

From devaniranjan at gmail.com Mon May 7 22:25:46 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Mon, 7 May 2012 18:25:46 -0400
Subject: [Biopython] PDBParser
Message-ID:

Hi,

I have a question about using PDBParser

from Bio.PDB.PDBParser import PDBParser

parser=PDBParser()

structure=parser.get_structure("test", "1fat.pdb")
model=structure[0]
chain=model["A"]
residue=chain[1]

I want to use it to extract and WRITE to a file the coordinates of residues 10 to 20 only.
(or whatever residue range I specify)

Using the PDB Parser file I can extract residue id in the range but how do I trace back and write the file in the exact format that is found in the PDB so that I can view it in a program like VMD/Pymol?
(that is I want to write the coordinates and all information as found in the PDB but only for selected residues that I pass into it)
I know I can do it using VMD but I want to do it for thousands of PDB files and would like to write a database of such extracted fragments.

The other alternative is of course to go line by line in each file and write the lines that match the residue range specified but I was wondering if there is a way of doing the same thing using the PDBParser?

Thank you,
George

From anaryin at gmail.com Tue May 8 08:16:44 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 8 May 2012 10:16:44 +0200
Subject: [Biopython] PDBParser
In-Reply-To:
References:
Message-ID:

Hello George,

You want to write only a part of the PDB file? What do you mean by 'all the info'? If you mean header information as well, then this is not possible, but for coordinates it is. You can do it in two ways:

1. Delete all residues/chains/models that are not part of your region of interest and then write the structure with PDBIO.

2. Use the 'Select' class from PDBIO.py and trim the region of interest.
For example, for residues 1-10 you could do something like this:

from Bio.PDB import PDBParser
from Bio.PDB import PDBIO
from Bio.PDB import Select

class ResidueFilter(Select):
    def accept_residue(self, residue):
        # residue.id is (hetero flag, residue number, insertion code);
        # keep only residues numbered 1 to 10
        if residue.id[1] in range(1, 11):
            return 1
        return 0

P = PDBParser()
s = P.get_structure('dummy', 'foo.pdb')

io = PDBIO()

io.set_structure(s)
io.save('foo_1-10.pdb', ResidueFilter())

Check the FAQ for a more detailed explanation:

http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

2012/5/8 George Devaniranjan

> Hi,
>
> I have a question about using PDBParser
>
>
> from Bio.PDB.PDBParser import PDBParser
>
> parser=PDBParser()
>
> structure=parser.get_structure("test", "1fat.pdb")
> model=structure[0]
> chain=model["A"]
> residue=chain[1]
>
> I want to use it to extract and WRITE to a file the coordinates of residues
> 10 to 20 only.
> (or whatever residue range I specify)
>
> Using the PDB Parser file I can extract residue id in the range but how to
> I back trace and write the file in the exact format that is found in the
> PDB so that I can view it in a program like VMD/Pymol?
> (that is I want to write the coordinates and all information as found in
> the PDB but only for selected residues that I pass into it )
> I know I can do it using VMD but I want to do it for thousands of PDB and
> would like to write a database of such extracted fragments.
>
> The other alternative is of course to go line by line in each file and
> write the lines that match the residue range specified but I was wondering
> if there is a way of doing the same thing using the PDBParser?
>
> Thank you,
> George

From devaniranjan at gmail.com Tue May 8 12:44:09 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Tue, 8 May 2012 08:44:09 -0400
Subject: [Biopython] PDBParser
In-Reply-To:
References:
Message-ID:

Thank you everyone,

I just need the coordinates of certain fragments from PDB files and this works for me.
I was trying to use the PDBParser only, but thank you for pointing out PDBIO to me.

Thank you,
George

On Tue, May 8, 2012 at 4:16 AM, João Rodrigues wrote:

> Hello George,
>
> You want to write only a part of the PDB file? What do you mean by 'all
> the info'? If it is header information as well, then this is not possible,
> but coordinates it is. You can do it in two ways:
>
> 1. Delete all residues/chains/models that are not part of your region of
> interest and then write the structure with PDBIO.
>
> 2. Use the 'Select' class from PDBIO.py and trim the region of interest.
> For example, for residues 1-10 you could do something like this:
>
> from Bio.PDB import PDBParser
> from Bio.PDB import PDBIO
> from Bio.PDB import Select
>
> class ResidueFilter(Select):
>     def accept_residue(self, residue):
>         if residue.id[1] in range(1,11):
>             return 1
>
> P = PDBParser()
> s = P.get_structure('dummy', 'foo.pdb')
>
> io = PDBIO()
>
> io.set_structure(s)
> io.save('foo_1-10.pdb', ResidueFilter())
>
>
> Check the FAQ for a more detailed explanation:
>
> http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
>
> 2012/5/8 George Devaniranjan
>
>> Hi,
>>
>> I have a question about using PDBParser
>>
>>
>> from Bio.PDB.PDBParser import PDBParser
>>
>> parser=PDBParser()
>>
>> structure=parser.get_structure("test", "1fat.pdb")
>> model=structure[0]
>> chain=model["A"]
>> residue=chain[1]
>>
>> I want to use it to extract and WRITE to a file the coordinates of
>> residues
>> 10 to 20 only.
>> (or whatever residue range I specify)
>>
>> Using the PDB Parser file I can extract residue id in the range but how
>> to
>> I back trace and write the file in the exact format that is found in the
>> PDB so that I can view it in a program like VMD/Pymol?
>> (that is I want to write the coordinates and all information as found in
>> the PDB but only for selected residues that I pass into it )
>> I know I can do it using VMD but I want to do it for thousands of PDB and
>> would like to write a database of such extracted fragments.
>>
>> The other alternative is of course to go line by line in each file and
>> write the lines that match the residue range specified but I was wondering
>> if there is a way of doing the same thing using the PDBParser?
>>
>> Thank you,
>> George
>>
>
>

From devaniranjan at gmail.com Tue May 8 19:12:22 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Tue, 8 May 2012 15:12:22 -0400
Subject: [Biopython] PDBParser-chain breaks
Message-ID:

Hi,

I thought using PERMISSIVE=0 would raise an exception if I pass a PDB with chain breaks.
However, nothing like that seems to happen.....

For instance

P=PDBParser(PERMISSIVE=0)
structure=P.get_structure('test', '7ODC.pdb')

7ODC has 3 chain breaks but it does not raise an exception.
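As the replies in this thread explain, chain breaks are reported as warnings rather than exceptions, so PERMISSIVE=0 does not turn them into errors. João's suggestion of counting the warnings can be sketched with the standard-library `warnings` module: `catch_warnings(record=True)` plus `simplefilter("always")` collects every warning issued while parsing. `fake_parse` is a hypothetical stand-in so the sketch runs without Biopython. With Biopython installed you would call your `PDBParser().get_structure(...)` there instead, and the exact warning text may vary between Biopython versions. On recent Python 3 versions, resetting the filters this way should also avoid the once-only suppression Eric describes, but treat that as an assumption to check.

```python
import warnings
from typing import Callable, List


def fake_parse(break_lines: List[int]) -> None:
    """Hypothetical stand-in for PDBParser().get_structure(...).

    Issues one warning per chain break, worded like the "discontinuous at"
    messages matched by the regular expression in this thread.
    """
    for line in break_lines:
        warnings.warn("Chain A is discontinuous at line %d." % line)


def count_chain_breaks(parse: Callable[[], object]) -> int:
    """Call parse() and count the chain-break warnings it issues."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" records every warning issued inside this block, even repeats
        warnings.simplefilter("always")
        parse()
    return sum("discontinuous at" in str(w.message) for w in caught)


# Illustrative line numbers only; not taken from a real PDB file.
n = count_chain_breaks(lambda: fake_parse([10, 20, 30]))
print(n)  # 3
# A second run still sees all three warnings.
print(count_chain_breaks(lambda: fake_parse([10, 20, 30])))  # 3
```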
Thank you
George

From anaryin at gmail.com Tue May 8 19:34:26 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 8 May 2012 22:34:26 +0300
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

Hi George,

Chain breaks are pretty "harmless" and usually do not represent a faulty PDB file. The PERMISSIVE flag is for "features" like missing b-factors.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

2012/5/8 George Devaniranjan

> Hi,
>
> I thought using PERMISSIVE=0 would raise an exception if I pass a PDB with
> chain breaks.
> However, nothing like that seems to happen.....
>
> For instance
>
> P=PDBParser(PERMISSIVE=0)
> structure=P.get_structure('test', '7ODC.pdb')
>
>
> 7ODC has 3 chain breaks but it does not raise an exception.
>
> Thank you
> George

From devaniranjan at gmail.com Tue May 8 19:37:47 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Tue, 8 May 2012 15:37:47 -0400
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

Hi João,

Is there a way, though, to find PDB files with chain breaks using Biopython?

Thank you,
George

On Tue, May 8, 2012 at 3:34 PM, João Rodrigues wrote:

> Hi George,
>
> Chain breaks are pretty "harmless" and usually do not represent a faulty
> PDB file. The PERMISSIVE flag is for "features" like missing b-factors.
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
>
> 2012/5/8 George Devaniranjan
>
>> Hi,
>>
>> I thought using PERMISSIVE=0 would raise an exception if I pass a PDB with
>> chain breaks.
>> However, nothing like that seems to happen.....
>>
>> For instance
>>
>> P=PDBParser(PERMISSIVE=0)
>> structure=P.get_structure('test', '7ODC.pdb')
>>
>>
>> 7ODC has 3 chain breaks but it does not raise an exception.
>>
>> Thank you
>> George
>>
>
>

From anaryin at gmail.com Tue May 8 19:39:02 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 8 May 2012 22:39:02 +0300
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

Of course. Since they trigger a warning, just make sure to count the warnings, parse the chain break ones, and if there are more than 0, you have chain breaks.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

2012/5/8 George Devaniranjan

> Hi João,
>
> Is there a way though to find PDB's with chain breaks? using biopython?
>
> Thank you,
> George
>
> On Tue, May 8, 2012 at 3:34 PM, João Rodrigues wrote:
>
>> Hi George,
>>
>> Chain breaks are pretty "harmless" and usually do not represent a faulty
>> PDB file. The PERMISSIVE flag is for "features" like missing b-factors.
>>
>> Cheers,
>>
>> João [...] Rodrigues
>> http://nmr.chem.uu.nl/~joao
>>
>>
>>
>> 2012/5/8 George Devaniranjan
>>
>>> Hi,
>>>
>>> I thought using PERMISSIVE=0 would raise an exception if I pass a PDB
>>> with
>>> chain breaks.
>>> However, nothing like that seems to happen.....
>>>
>>> For instance
>>>
>>> P=PDBParser(PERMISSIVE=0)
>>> structure=P.get_structure('test', '7ODC.pdb')
>>>
>>>
>>> 7ODC has 3 chain breaks but it does not raise an exception.
>>>
>>> Thank you
>>> George
>>>
>>
>>
>

From eric.talevich at gmail.com Wed May 9 01:59:13 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 8 May 2012 21:59:13 -0400
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

The warnings module also lets you convert any warning to an error (or ignore it, etc.).
Use a regular expression to match the warning message:

from Bio import PDB
import warnings
warnings.filterwarnings('error', message='.*discontinuous at.*')
p = PDB.PDBParser()
s = p.get_structure("", "3BEG.pdb")

On Tue, May 8, 2012 at 3:39 PM, João Rodrigues wrote:

> Of course. Since they throw a warning just make sure to count the warnings,
> parse the chain break ones, and if they are more than 0, you have chain
> breaks.
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
>
> 2012/5/8 George Devaniranjan
>
> > Hi João,
> >
> > Is there a way though to find PDB's with chain breaks? using biopython?
> >
> > Thank you,
> > George
> >
> > On Tue, May 8, 2012 at 3:34 PM, João Rodrigues wrote:
> >
> >> Hi George,
> >>
> >> Chain breaks are pretty "harmless" and usually do not represent a faulty
> >> PDB file. The PERMISSIVE flag is for "features" like missing b-factors.
> >>
> >> Cheers,
> >>
> >> João [...] Rodrigues
> >> http://nmr.chem.uu.nl/~joao
> >>
> >>
> >>
> >> 2012/5/8 George Devaniranjan
> >>
> >>> Hi,
> >>>
> >>> I thought using PERMISSIVE=0 would raise an exception if I pass a PDB
> >>> with
> >>> chain breaks.
> >>> However, nothing like that seems to happen.....
> >>>
> >>> For instance
> >>>
> >>> P=PDBParser(PERMISSIVE=0)
> >>> structure=P.get_structure('test', '7ODC.pdb')
> >>>
> >>>
> >>> 7ODC has 3 chain breaks but it does not raise an exception.
> >>>
> >>> Thank you
> >>> George
> >>>
> >>
> >>
> >
>

From anaryin at gmail.com Wed May 9 05:39:40 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 9 May 2012 08:39:40 +0300
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

For some reason, however, I didn't get the discontinuous error. That's why I proposed this alternative.

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

2012/5/9 Eric Talevich

> The warnings module also lets you convert any warning to an error (or
> ignore it, etc.). Use a regular expression to match the warning message:
>
> from Bio import PDB
> import warnings
> warnings.filterwarnings('error', message='.*discontinuous at.*')
> p = PDB.PDBParser()
> s = p.get_structure("", "3BEG.pdb")
>
>
>
> On Tue, May 8, 2012 at 3:39 PM, João Rodrigues wrote:
>
>> Of course. Since they throw a warning just make sure to count the
>> warnings,
>> parse the chain break ones, and if they are more than 0, you have chain
>> breaks.
>>
>> Cheers,
>>
>> João [...] Rodrigues
>> http://nmr.chem.uu.nl/~joao
>>
>>
>>
>> 2012/5/8 George Devaniranjan
>>
>> > Hi João,
>> >
>> > Is there a way though to find PDB's with chain breaks? using biopython?
>> >
>> > Thank you,
>> > George
>> >
>> > On Tue, May 8, 2012 at 3:34 PM, João Rodrigues wrote:
>> >
>> >> Hi George,
>> >>
>> >> Chain breaks are pretty "harmless" and usually do not represent a faulty
>> >> PDB file. The PERMISSIVE flag is for "features" like missing b-factors.
>> >>
>> >> Cheers,
>> >>
>> >> João [...] Rodrigues
>> >> http://nmr.chem.uu.nl/~joao
>> >>
>> >>
>> >>
>> >> 2012/5/8 George Devaniranjan
>> >>
>> >>> Hi,
>> >>>
>> >>> I thought using PERMISSIVE=0 would raise an exception if I pass a PDB
>> >>> with
>> >>> chain breaks.
>> >>> However, nothing like that seems to happen.....
>> >>>
>> >>> For instance
>> >>>
>> >>> P=PDBParser(PERMISSIVE=0)
>> >>> structure=P.get_structure('test', '7ODC.pdb')
>> >>>
>> >>>
>> >>> 7ODC has 3 chain breaks but it does not raise an exception.
>> >>>
>> >>> Thank you
>> >>> George
>> >>>
>> >>
>> >>
>> >
>>
>

From eric.talevich at gmail.com Wed May 9 14:31:34 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 9 May 2012 10:31:34 -0400
Subject: [Biopython] PDBParser-chain breaks
In-Reply-To:
References:
Message-ID:

Oh, there's a caveat to the warnings module -- if a given warning isn't captured this way the first time, it's never issued again. So, parsing 3BEG once normally, and again with the setup I gave, won't trigger the warning again and therefore won't raise an error.

On Wed, May 9, 2012 at 1:39 AM, João Rodrigues wrote:

> For some reason however, I didn't get the discontinuous error .. That's
> why I proposed this alternative.
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
>
> 2012/5/9 Eric Talevich
>
>> The warnings module also lets you convert any warning to an error (or
>> ignore it, etc.).
Use a regular expression to match the warning message: >> >> from Bio import PDB >> import warnings >> warnings.filterwarnings('error', message='.*discontinuous at.*') >> p = PDB.PDBParser() >> s = p.get_structure("", "3BEG.pdb") >> >> >> >> On Tue, May 8, 2012 at 3:39 PM, João Rodrigues wrote: >> >>> Of course. Since they throw a warning just make sure to count the >>> warnings, >>> parse the chain break ones, and if they are more than 0, you have chain >>> breaks. >>> >>> Cheers, >>> >>> João [...] Rodrigues >>> http://nmr.chem.uu.nl/~joao >>> >>> >>> >>> 2012/5/8 George Devaniranjan >>> >>> > Hi João, >>> > >>> > Is there a way though to find PDB's with chain breaks? using biopython? >>> > >>> > Thank you, >>> > George >>> > >>> > On Tue, May 8, 2012 at 3:34 PM, João Rodrigues >>> wrote: >>> > >>> >> Hi George, >>> >> >>> >> Chain breaks are pretty "harmless" and usually do not represent a >>> faulty >>> >> PDB file. The PERMISSIVE flag is for "features" like missing >>> b-factors. >>> >> >>> >> Cheers, >>> >> >>> >> João [...] Rodrigues >>> >> http://nmr.chem.uu.nl/~joao >>> >> >>> >> >>> >> >>> >> 2012/5/8 George Devaniranjan >>> >> >>> >>> Hi, >>> >>> >>> >>> I thought using PERMISSIVE=0 would raise an exception if I pass a PDB >>> >>> with >>> >>> chain breaks. >>> >>> However, nothing like that seems to happen..... >>> >>> >>> >>> For instance >>> >>> >>> >>> P=PDBParser(PERMISSIVE=0) >>> >>> structure=P.get_structure('test', '7ODC.pdb') >>> >>> >>> >>> >>> >>> 7ODC has 3 chain breaks but it does not raise an exception.
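João's suggestion of counting the chain-break warnings, together with Eric's caveat that a warning may only be issued once, can be sketched with the standard library's `warnings.catch_warnings(record=True)`. This is a minimal sketch under stated assumptions: `fake_parse` is a hypothetical stand-in that only emits warnings worded to match Eric's `'.*discontinuous at.*'` pattern; with Biopython you would pass the real parser method instead, e.g. `count_chain_breaks(PDBParser().get_structure, "test", "7ODC.pdb")`.

```python
import warnings


def fake_parse(break_lines):
    """Hypothetical stand-in for PDBParser().get_structure(...).

    It only emits warnings worded like the chain-break message that
    Eric's regex matches ('... discontinuous at line N.').
    """
    for line_no in break_lines:
        warnings.warn("Chain A is discontinuous at line %d." % line_no)
    return "structure"


def count_chain_breaks(parse, *args, **kwargs):
    """Call parse(*args, **kwargs) and count 'discontinuous at' warnings."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" records every occurrence, including repeats from the
        # same source line. On recent Python 3 releases changing the
        # filters also clears the "already shown" registry behind Eric's
        # caveat; older Pythons may still suppress a repeat.
        warnings.simplefilter("always")
        result = parse(*args, **kwargs)
    n_breaks = sum(1 for w in caught if "discontinuous at" in str(w.message))
    return result, n_breaks


if __name__ == "__main__":
    for attempt in range(2):
        _, n = count_chain_breaks(fake_parse, [120, 450, 901])
        print("attempt %d: %d chain breaks" % (attempt + 1, n))
```

Recording the warnings, rather than turning them into errors, means every chain break in a file is counted instead of parsing stopping at the first one.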
>>> >>> >>> Thank you >>> >>> George >>> >>> _______________________________________________ >>> >>> Biopython mailing list - Biopython at lists.open-bio.org >>> >>> http://lists.open-bio.org/mailman/listinfo/biopython >>> >>> >>> >> >>> >> >>> > >>> >>> _______________________________________________ >>> Biopython mailing list - Biopython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython >>> >> >> > From w.arindrarto at gmail.com Wed May 9 16:24:43 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 9 May 2012 18:24:43 +0200 Subject: [Biopython] GSoC Project Update -- 1 Message-ID: Hi everyone, I just posted my latest blog update here: http://bow.web.id/blog/2012/05/warming-up-for-the-coding-period/ To summarize, I've spent most of my time getting to know the programs I will support better. This has been done by: 1. Playing around with the programs to see how many different outputs I can generate. 2. Writing scripts to automate test case generation for each of the programs. 3. Writing wrappers (for programs not yet wrapped by Biopython: FASTA, HMMER, and BLAT) to ease writing the test case generators. 4. Continuing to complete my proposed SearchIO object naming scheme (http://bit.ly/searchio-terms) The test cases, their generators, and the wrappers I've written are available in my non-Biopython gsoc repo here: http://github.com/bow/gsoc/. Additionally, I've used the generated test case to improve a recent bug report and submitted a fix for the next release. For the coming weeks prior to coding start, I'm planning to play around more with XML and SQLite as I will use them in the code. I might start to add more skeleton code to my current development branch as well (https://github.com/bow/biopython).
cheers, Bow From mmokrejs at fold.natur.cuni.cz Wed May 9 18:01:08 2012 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Wed, 09 May 2012 20:01:08 +0200 Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header? In-Reply-To: <87r4ynu3cx.fsf@fastmail.fm> References: <87r4ynu3cx.fsf@fastmail.fm> Message-ID: <4FAAB0E4.30409@fold.natur.cuni.cz> Hi Brad, I just got bitten by this myself as well. Could the legacy BLAST parser be improved to give a clearer error message? E.g. that it failed to find the LOCUS line or whatever it was looking for? With the legacy BLAST documentation being gone from the current Tutorial it is easy to pick the wrong parser. ;) And BTW, please do not drop support for legacy BLAST. I just cannot make BLAST+ give me the same alignment, no matter what arguments I use to adjust for the (it gives me wider alignment than wanted and I can make it a lot shorter, but shortening it just a bit like legacy BLAST output .. is not doable). And it took me a while to find the old biopython-1.52.tar.gz to look up the old docs. Could there be a hyperlink from the Tutorial to these unpacked, browseable sources? ;) I am speaking about http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc85 . Thanks, Martin Brad Chapman wrote: > > Sar; > >> I am new to both python and biopython. > > Welcome. Thanks for including your code along with the problem report. > >> What I'm trying to do is to parse a blast result xml file (myblast.xml), >> attached here. >> >> The code looks like this: > [...] >> blast_parser = NCBIStandalone.BlastParser() > [...] >> ValueError: Invalid header? > > You are using NCBIStandalone, which parses plain text blast output.
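The `ValueError: Invalid header?` here comes from handing XML (`-m 7`) output to the plain-text `NCBIStandalone` parser. One way to get the clearer message Martin asks for is to look at the first non-blank line before choosing a parser. A minimal sketch, assuming XML reports begin with `<?xml` or `<BlastOutput` and plain-text reports begin with the program name (e.g. `BLASTN 2.2.24 [Aug-08-2010]`); both function names are hypothetical helpers, not part of Biopython:

```python
import io


def sniff_blast_format(handle):
    """Guess whether a BLAST report is XML (-m 7) or plain text (-m 0).

    Assumes plain-text reports start with the program name and version,
    e.g. 'BLASTN 2.2.24 [Aug-08-2010]'.
    """
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip leading blank lines
        if line.startswith("<?xml") or line.startswith("<BlastOutput"):
            return "xml"
        if line.upper().startswith(("BLAST", "TBLAST")):
            return "text"
        return "unknown"
    return "empty"


def check_blast_format(handle, expected):
    """Raise a descriptive error instead of a bare 'Invalid header?'."""
    found = sniff_blast_format(handle)
    handle.seek(0)  # rewind so the real parser sees the whole file
    if found != expected:
        raise ValueError("Expected %s BLAST output, but the file looks like %s"
                         % (expected, found))


if __name__ == "__main__":
    print(sniff_blast_format(io.StringIO('<?xml version="1.0"?>\n<BlastOutput>\n')))
```

If the check reports "xml", `NCBIXML.parse(handle)` as Brad describes is the parser to use. Because `check_blast_format` rewinds with `seek(0)`, it assumes a regular file rather than a pipe.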
To > parse the XML output, you should use the NCBIXML parser: > > from Bio.Blast import NCBIXML > blast_records = NCBIXML.parse(result_handle) > > The tutorial has more details and examples: > > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc87 > > Hope this helps, > Brad > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > From p.j.a.cock at googlemail.com Wed May 9 18:25:36 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 9 May 2012 19:25:36 +0100 Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header? In-Reply-To: <4FAAB0E4.30409@fold.natur.cuni.cz> References: <87r4ynu3cx.fsf@fastmail.fm> <4FAAB0E4.30409@fold.natur.cuni.cz> Message-ID: On Wed, May 9, 2012 at 7:01 PM, Martin Mokrejs wrote: > Hi Brad, > I just got bitten by this myself as well. Could the legacy BLAST parser > be improved to give a clearer error message? E.g. that it failed to find the LOCUS > line or whatever it was looking for? With the legacy BLAST documentation being > gone from the current Tutorial it is easy to pick the wrong parser. ;) > > And BTW, please do not drop support for legacy BLAST. I just cannot make BLAST+ > give me the same alignment, no matter what arguments I use to adjust for the (it gives me > wider alignment than wanted and I can make it a lot shorter, but shortening it just > a bit like legacy BLAST output .. is not doable). Have you contacted the NCBI about this possible regression? > And it took me a while to find the old biopython-1.52.tar.gz to look up the old docs. > Could there be a hyperlink from the Tutorial to these unpacked, browseable sources? ;) > I am speaking about http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc85 . > Thanks, > Martin Could you clarify if you are talking about documentation for calling the 'legacy' BLAST command line tools (e.g.
blastall), or documentation for parsing the plain text human readable output (which still exists in BLAST+)? On a related point, Bow's just done a bit of work updating our plain text parser to cope with BLAST+ (specifically changes in BLAST 2.2.25+ and/or 2.2.26+). One of the aims of Bow's GSoC project is to make dealing with the different BLAST formats a lot simpler. Peter From mmokrejs at fold.natur.cuni.cz Wed May 9 18:48:19 2012 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Wed, 09 May 2012 20:48:19 +0200 Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header? In-Reply-To: References: <87r4ynu3cx.fsf@fastmail.fm> <4FAAB0E4.30409@fold.natur.cuni.cz> Message-ID: <4FAABBF3.1030007@fold.natur.cuni.cz> Hi Peter, Peter Cock wrote: > On Wed, May 9, 2012 at 7:01 PM, Martin Mokrejs > wrote: >> Hi Brad, >> I just got bitten by this myself as well. Could the legacy BLAST parser >> be improved to give a clearer error message? E.g. that it failed to find the LOCUS >> line or whatever it was looking for? With the legacy BLAST documentation being >> gone from the current Tutorial it is easy to pick the wrong parser. ;) >> >> And BTW, please do not drop support for legacy BLAST. I just cannot make BLAST+ >> give me the same alignment, no matter what arguments I use to adjust for the (it gives me >> wider alignment than wanted and I can make it a lot shorter, but shortening it just >> a bit like legacy BLAST output .. is not doable). > > Have you contacted the NCBI about this possible regression? No, not yet. >> And it took me a while to find the old biopython-1.52.tar.gz to look up the old docs. >> Could there be a hyperlink from the Tutorial to these unpacked, browseable sources? ;) >> I am speaking about http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc85 . >> Thanks, >> Martin > > Could you clarify if you are talking about documentation for calling > the 'legacy' > BLAST command line tools (e.g.
blastall), or documentation for parsing the Yes, "blastall -p blastn ...", the default plaintext pairwise output (-m 0), version 2.2.24. > plain text human readable output (which still exists in BLAST+)? > > On a related point, Bow's just done a bit of work updating our plain > text parser to > cope with BLAST+ (specifically changes in BLAST 2.2.25+ and/or 2.2.26+). > > One of the aims of Bow's GSoC project is to make dealing with the different > BLAST formats a lot simpler. It's great that we have GSoC students; if I had some spare time I would mentor one. Good luck and thanks for your care, Peter! Martin From livingstonemark at gmail.com Sat May 12 02:06:07 2012 From: livingstonemark at gmail.com (Mark Livingstone) Date: Sat, 12 May 2012 12:06:07 +1000 Subject: [Biopython] Superposition Message-ID: Hi Guys, Thanks to Andrew and others for help with my previous message. I have gone through various incarnations of my code, and suddenly found this simple code works for the small test I have done. I am using the data files from: Kellogg, E. H., Leaver-Fay, A., & Baker, D. (2011). Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins, 79(3), 830-838. doi:10.1002/prot.22921 These files have been modified so that there are matched PDBs which differ only by one mutated residue, and I am trying to superimpose, on the alpha carbons, the PDB which is the mutant type over the wild type and save the result to a PDB file - which I seem to have fluked how to do. I am still working on the code for directory traversal so I have not tried it on the hundreds of matched PDBs yet. Is there anything in this code which is going to bite me? How can I improve it?
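Two things in the script that follows may be worth checking. First, the CA lists are built by iterating each chain independently, so a residue with no CA (for example a water in chain A) raises `KeyError`, and a residue present in only one file either makes the list lengths differ or shifts later pairs; `Superimposer.set_atoms` needs two lists of the same length whose entries correspond. Second, `sup.apply(l_mt_atoms)` moves only the atoms in that list, so to save the whole superimposed mutant one would apply it to all of its atoms (for example `list(mt_structure.get_atoms())`). The pairing idea is sketched below in plain Python; the `(res_id, {atom_name: coord})` tuples are a hypothetical stand-in for iterating a Bio.PDB chain, where `residue.get_id()` gives the ID and `residue["CA"]` the atom.

```python
def paired_ca_coords(ref_residues, mov_residues):
    """Pair CA entries from two chains by residue ID.

    Each argument is a list of (res_id, {atom_name: coord}) tuples, a
    hypothetical plain-Python stand-in for iterating over a Bio.PDB chain.
    Returns two equal-length lists, as a superimposer needs.
    """
    mov_by_id = dict(mov_residues)
    ref_cas, mov_cas = [], []
    for res_id, ref_atoms in ref_residues:
        mov_atoms = mov_by_id.get(res_id)
        # Skip residues missing from the other chain, and residues with no
        # CA at all (waters, ligands), which would otherwise raise KeyError.
        if mov_atoms is None or "CA" not in ref_atoms or "CA" not in mov_atoms:
            continue
        ref_cas.append(ref_atoms["CA"])
        mov_cas.append(mov_atoms["CA"])
    return ref_cas, mov_cas


if __name__ == "__main__":
    wt = [((" ", 1, " "), {"N": (0.0, 1.0, 0.0), "CA": (0.0, 0.0, 0.0)}),
          ((" ", 2, " "), {"CA": (3.8, 0.0, 0.0)}),
          (("W", 101, " "), {"O": (9.0, 9.0, 9.0)})]  # a water: no CA
    mt = [((" ", 2, " "), {"CA": (3.9, 0.1, 0.0)}),
          ((" ", 1, " "), {"CA": (0.1, 0.0, 0.0)})]
    print(paired_ca_coords(wt, mt))
```

Because the pairs follow the reference chain's order and are matched by ID, the order of residues in the moving file does not matter.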
------------------------------------------------------------------------------------
#!/usr/bin/env python
# Wildtype (wt) = reference, Mutanttype (mt) = alternate
from Bio.PDB import *

# parsing the PDBs
parser = PDBParser(PERMISSIVE=1)
l_wt_atoms = []
l_mt_atoms = []
pdb_out_filename = "./1bti_aligned.pdb"
wt_structure = parser.get_structure("1bpi", './1bpi.pdb')
mt_structure = parser.get_structure("1bti", './1bti.pdb')
wt_model = wt_structure[0]
mt_model = mt_structure[0]
wt_chain = wt_model["A"]
mt_chain = mt_model["A"]

for wt_residue in wt_chain:
    resnum = wt_residue.get_id()[1]
    l_wt_atoms.append(wt_residue['CA'])
for mt_residue in mt_chain:
    resnum = mt_residue.get_id()[1]
    l_mt_atoms.append(mt_residue['CA'])

## SuperImpose
sup = Superimposer()
## Specify the atom lists
## "wildtype" and "mutanttype" are lists of Atom objects
## The mt atoms will be put on the wt atoms
sup.set_atoms(l_wt_atoms, l_mt_atoms)
## Print rotation/translation/rmsd
print "ROTRAN: ", sup.rotran
print "RMS: ", sup.rms
## Apply rotation/translation to the moving atoms
sup.apply(l_mt_atoms)
print "Saving aligned structure as PDB file %s" % pdb_out_filename
io = PDBIO()
io.set_structure(mt_structure)
io.save(pdb_out_filename)
print "Done"
------------------------------------------------------------------------------------
Thanks in advance, Mark Livingstone B.InfoTech (Hons) Student Griffith University School of ICT Southport Qld Australia From livingstonemark at gmail.com Sun May 13 06:54:28 2012 From: livingstonemark at gmail.com (Mark Livingstone) Date: Sun, 13 May 2012 16:54:28 +1000 Subject: [Biopython] .ent versus .pdb files? Message-ID: Hi Guys, I have a bunch of files which appear to be pdb file-like but have the .ent file extension. Is there any difference of significance to Bio.PDB?
Thanks in advance, MarkL From anaryin at gmail.com Sun May 13 09:16:00 2012 From: anaryin at gmail.com (João Rodrigues) Date: Sun, 13 May 2012 11:16:00 +0200 Subject: [Biopython] .ent versus .pdb files? In-Reply-To: References: Message-ID: Hi Mark, Nope, not at all. Cheers, João On 13 May 2012 08:55, "Mark Livingstone" < livingstonemark at gmail.com> wrote: > Hi Guys, > > I have a bunch of files which appear to be pdb file-like but have the > .ent file extension. Is there any difference of significance to > Bio.PDB? > > Thanks in advance, > > MarkL > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From anubratadas at gmail.com Mon May 14 12:03:29 2012 From: anubratadas at gmail.com (Anubrata Das) Date: Mon, 14 May 2012 17:33:29 +0530 Subject: [Biopython] parsing genbank file Message-ID: I am new to Biopython. I wanted to parse through individual records from the GenBank file of the Deinococcus radiodurans chromosome 1 sequence. For example, I wanted the list of identifiers for each record:
>>> identifiers=[seq_record.id for seq_record in SeqIO.parse("C:\\Dr1.gb","genbank")]
>>> identifiers[:10]
['NC_001263.1']
but I would only get one master entry. Then, when I tried to parse through the individual records:
>>> record=SeqIO.parse("C:\\Dr1.gb","genbank")
>>> record.next()
SeqRecord(seq=UnknownSeq(2648638, alphabet = IUPACAmbiguousDNA(), character = 'N'), id='NC_001263.1', name='NC_001263', description='Deinococcus radiodurans R1 chromosome 1, complete sequence.', dbxrefs=['Project:57665'])
>>> record.next()
Traceback (most recent call last):
  File "", line 1, in
    record.next()
StopIteration
I get this output.
Please tell me the correct method. Regards -- Anubrata Das Scientific Officer Molecular Biology Division Bhabha Atomic Research Centre From p.j.a.cock at googlemail.com Mon May 14 13:48:39 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 14 May 2012 14:48:39 +0100 Subject: [Biopython] parsing genbank file In-Reply-To: References: Message-ID: On Mon, May 14, 2012 at 1:03 PM, Anubrata Das wrote: > I am new to Biopython. I wanted to parse through individual records > from the GenBank file of the Deinococcus radiodurans chromosome 1 > sequence. Probably your GenBank file only contains one record (for the whole of chr1). You could use Bio.SeqIO.read(...) in this case:
from Bio import SeqIO
record = SeqIO.read(r"C:\Dr1.gb", "genbank")
print record.id
print len(record.features)
Peter From David.Lapointe at umassmed.edu Mon May 14 14:00:20 2012 From: David.Lapointe at umassmed.edu (Lapointe, David) Date: Mon, 14 May 2012 14:00:20 +0000 Subject: [Biopython] parsing genbank file In-Reply-To: References: Message-ID: <86BFEB1DFA6CB3448DB8AB1FC52F4059081148@ummscsmbx06.ad.umassmed.edu> You might try looking for sequences here ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Deinococcus_radiodurans_R1_uid57665/ David -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Peter Cock Sent: Monday, May 14, 2012 9:49 AM To: Anubrata Das Cc: biopython at lists.open-bio.org Subject: Re: [Biopython] parsing genbank file On Mon, May 14, 2012 at 1:03 PM, Anubrata Das wrote: > I am new to Biopython. I wanted to parse through individual records > from the GenBank file of the Deinococcus radiodurans chromosome 1 > sequence. Probably your GenBank file only contains one record (for the whole of chr1). You could use Bio.SeqIO.read(...)
in this case: from Bio import SeqIO record = SeqIO.read(r"C:\Dr1.gb","genbank") print record.id print len(record.features) Peter _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From mmokrejs at fold.natur.cuni.cz Wed May 16 09:32:34 2012 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Wed, 16 May 2012 11:32:34 +0200 Subject: [Biopython] Legacy blast XML parser returns prematurely StopIteration Message-ID: <4FB37432.7060707@fold.natur.cuni.cz> Hi, I am parsing some blast 2.2.24 XML output and the last record I get is the one from iteration 124. I see that entry is followed by a new section which is probably the culprit. I will try a newer legacy BLAST, but still, could Biopython maybe work around this bug in the XML input? blastall -p blastn -A 4 -i SRR068315.fasta -d my_targets.fasta -F 0 -S 1 -r 2 -e 10e-30 -m 7 blastn blastn 2.2.24 [Aug-08-2010] ~Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ~Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), ~"Gapped BLAST and PSI-BLAST: a new generation of protein database search~programs", Nucleic Acids Res. 25:3389-3402.
my_targets.fasta lcl|1_0 FYUQ5C204IQCOE length=283 xy=3463_2076 region=4 run=R_2009_07_08_19_30_38_ 318 1e-29 2 -3 5 2 F [cut] 124 lcl|124_0 FYUQ5C204JXGMI length=44 xy=3954_2264 region=4 run=R_2009_07_08_19_30_38_ 350 22 9262 0 0 0.41 0.625 0.78 No hits found 1 22 9262 0 0 0.41 0.625 0.78 125 lcl|125_0 FYUQ5C204JFG82 length=173 xy=3749_2948 region=4 run=R_2009_07_08_19_30_38_ 208 22 9262 0 0 0.41 0.625 0.78 No hits found 126 lcl|126_0 FYUQ5C204I2D3A length=146 xy=3600_2628 region=4 run=R_2009_07_08_19_30_38_ 205 22 9262 0 0 0.41 0.625 0.78 No hits found Grep-ping for the iteration numbers, I foresee a few more cases like that further on in the XML file: 234 1 235 236 345 1 346 347 450 1 451 452 555 1 556 557 655 1 656 657 759 1 760 761 859 1 860 861 956 1 957 958 1050 1 1051 1052 1145 1 1146 1147 1239 1 1240 1241 1333 1 1334 1335 1430 1 1431 1432 1523 1 1524 1525 1610 1 1611 1612 1703 1 1704 1705 1792 1 1793 1794 1881 1 1882 1883 After that, the problem does not occur again until the end of the XML file at: 25698 Thanks for comments, Martin From p.j.a.cock at googlemail.com Wed May 16 09:48:10 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 16 May 2012 10:48:10 +0100 Subject: [Biopython] Legacy blast XML parser returns prematurely StopIteration In-Reply-To: <4FB37432.7060707@fold.natur.cuni.cz> References: <4FB37432.7060707@fold.natur.cuni.cz> Message-ID: On Wed, May 16, 2012 at 10:32 AM, Martin Mokrejs wrote: > Hi, > I am parsing some blast 2.2.24 XML output and the last record I get is the one from > iteration 124. I see that entry is followed by a new section which > is probably the culprit. I will try a newer legacy BLAST, but still, could Biopython maybe > work around this bug in the XML input? > > blastall -p blastn -A 4 -i SRR068315.fasta -d my_targets.fasta -F 0 -S 1 -r 2 -e 10e-30 -m 7 > Could you file a bug here and attach the complete XML test case please?
http://redmine.open-bio.org/projects/biopython Our XML parser should handle both NCBI 'legacy' BLAST and BLAST+. Thanks, Peter From josefergil at gmail.com Wed May 23 14:18:44 2012 From: josefergil at gmail.com (jose gil) Date: Wed, 23 May 2012 16:18:44 +0200 Subject: [Biopython] starting with biopython Message-ID: Hello everyone, I'm starting with the program and I have some problems, because I don't know how to download the files so that the program can load them. From the Python shell I follow the instructions in the tutorial to load the sequence, but I don't actually know where the correct place is to save the files I download, for example from GenBank. Thank you very much for your help, -- José Fernando Gil R. From p.j.a.cock at googlemail.com Wed May 23 14:45:04 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 23 May 2012 15:45:04 +0100 Subject: [Biopython] starting with biopython In-Reply-To: References: Message-ID: On Wed, May 23, 2012 at 3:18 PM, jose gil wrote: > Hello everyone, > > I'm starting with the program and I have some problems, because I don't > know how to download the files so that the program can load them. > From the Python shell I follow the instructions in the tutorial to > load the sequence, but I don't actually know where the correct place is to > save the files I download, for example from GenBank. > > Thank you very much for your help, The simplest approach is to put your Python scripts and data files all in the same folder together. Then you don't need to bother with giving paths, just local filenames will be fine. Some experience with working at the command line would help with understanding paths, absolute paths, and relative paths. Are you working on Windows? Note that by default Windows hides the extension of known file formats - I always turn this off so that in Explorer I see the full file names.
What I mean is I prefer to see "example.fasta" and "example.gbk" instead of two files apparently called "example" but with a different icon. You'll find there are lots of file extensions in Bioinformatics, and they are important. Peter From animesh.agrawal at anu.edu.au Thu May 24 05:58:12 2012 From: animesh.agrawal at anu.edu.au (Animesh Agrawal) Date: Thu, 24 May 2012 15:58:12 +1000 Subject: [Biopython] Integrating SQL query to biopython Message-ID: <000001cd3972$33fc4c40$9bf4e4c0$@agrawal@anu.edu.au> Hi, I am running small SQL queries to select sequences from a local BioSQL database. One such query is as follows:
SELECT biosequence.*
FROM   biosequence JOIN bioentry USING (bioentry_id)
WHERE  biosequence.seq NOT LIKE "%X%"
AND    biosequence.alphabet = 'protein'
I am wondering how to integrate this SQL query with Biopython code to get the output in the form of SeqRecord or Seq objects. Cheers, Animesh Animesh Agrawal PhD Scholar Computational & Conceptual Biology, JCSMR Australian National University Canberra, Australia Tel: +61 2 6125 8303 Email: animesh.agrawal at anu.edu.au From p.j.a.cock at googlemail.com Thu May 24 09:10:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 May 2012 10:10:13 +0100 Subject: [Biopython] Integrating SQL query to biopython In-Reply-To: <4fbdcf67.ca48340a.3ba1.5c63SMTPIN_ADDED@mx.google.com> References: <4fbdcf67.ca48340a.3ba1.5c63SMTPIN_ADDED@mx.google.com> Message-ID: On Thu, May 24, 2012 at 6:58 AM, Animesh Agrawal wrote: > Hi, > > I am running small SQL queries to select sequences from a local BioSQL > database. One such query is as follows: > > > > SELECT biosequence.* > > FROM biosequence JOIN bioentry USING (bioentry_id) > > WHERE biosequence.seq NOT LIKE "%X%" > > AND biosequence.alphabet = 'protein' > > > > I am wondering how to integrate this SQL query with Biopython code to get > the output in the form of SeqRecord or Seq objects.
From your direct database access, get the bioentry table's primary ID, and then use that to create a DBSeqRecord object (which is a subclass of SeqRecord and will also load the sequence for you). You will also need the adaptor object as the other initialization argument, which is how the DBSeqRecord knows which database to read from. Get that by connecting to the BioSQL database through the Biopython code as usual. Something like this (untested):
from BioSQL import BioSeqDatabase
from BioSQL.BioSeq import DBSeqRecord
# Connect to the BioSQL database as usual
server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                                      passwd="", host="localhost", db="bioseqdb")
primary_id = .... # your code here
# Use Biopython's BioSQL SeqRecord loading:
record = DBSeqRecord(server.adaptor, primary_id)
Peter From chapmanb at 50mail.com Thu May 24 09:15:02 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 24 May 2012 05:15:02 -0400 Subject: [Biopython] Integrating SQL query to biopython In-Reply-To: <000001cd3972$33fc4c40$9bf4e4c0$@agrawal@anu.edu.au> References: <000001cd3972$33fc4c40$9bf4e4c0$@agrawal@anu.edu.au> Message-ID: <87d35uos89.fsf@fastmail.fm> Animesh; > I am running small SQL queries to select sequences from a local BioSQL > database. One such query is as follows: > > SELECT biosequence.* > FROM biosequence JOIN bioentry USING (bioentry_id) > WHERE biosequence.seq NOT LIKE "%X%" > AND biosequence.alphabet = 'protein' > > I am wondering how to integrate this SQL query with Biopython code to get > the output in the form of SeqRecord or Seq objects.
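Both replies assume the bioentry IDs are already in hand. A minimal sketch of getting them from Animesh's query: select only `bioentry_id` and collect the first column. It runs against a hypothetical in-memory SQLite table layout holding just the columns the query touches; against a real BioSQL database the same SQL would go through that database's own connection, and each ID would then be passed to `DBSeqRecord(server.adaptor, bioentry_id)`.

```python
import sqlite3

# Animesh's query, narrowed to the one column DBSeqRecord needs.
# Single quotes are used for the LIKE pattern (standard SQL string literal).
PROTEIN_NO_X_QUERY = """
    SELECT bioentry_id
    FROM biosequence JOIN bioentry USING (bioentry_id)
    WHERE biosequence.seq NOT LIKE '%X%'
      AND biosequence.alphabet = 'protein'
    ORDER BY bioentry_id
"""


def protein_ids_without_x(conn):
    """Return the bioentry_id of every protein sequence containing no X."""
    return [row[0] for row in conn.execute(PROTEIN_NO_X_QUERY)]


def toy_biosql():
    """Hypothetical in-memory database with only the columns used above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE biosequence (bioentry_id INTEGER, alphabet TEXT, seq TEXT);
        INSERT INTO bioentry VALUES (1, 'prot1'), (2, 'prot2'), (3, 'dna1');
        INSERT INTO biosequence VALUES
            (1, 'protein', 'MKTAYIAKQR'),
            (2, 'protein', 'MKTXYIAKQR'),
            (3, 'dna', 'ACGTACGTAC');
    """)
    return conn


if __name__ == "__main__":
    print(protein_ids_without_x(toy_biosql()))
```

Selecting only the ID keeps the query small, since, as Peter notes, `DBSeqRecord` loads the sequence itself.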
If you have the internal bioentry IDs you can use the BioSQL code directly to get SeqRecord compatible objects:
from BioSQL import BioSeqDatabase
from BioSQL.BioSeq import DBSeqRecord

server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                                      passwd="", host="localhost", db="bioseqdb")
rec = DBSeqRecord(server.adaptor, your_bioentry_id)
Hope this helps, Brad From p.j.a.cock at googlemail.com Thu May 24 09:28:14 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 May 2012 10:28:14 +0100 Subject: [Biopython] Integrating SQL query to biopython In-Reply-To: <87d35uos89.fsf@fastmail.fm> References: <87d35uos89.fsf@fastmail.fm> Message-ID: On Thu, May 24, 2012 at 10:15 AM, Brad Chapman wrote: > > Animesh; > >> I am running small SQL queries to select sequences from a local BioSQL >> database. One such query is as follows: >> >> SELECT biosequence.* >> FROM biosequence JOIN bioentry USING (bioentry_id) >> WHERE biosequence.seq NOT LIKE "%X%" >> AND biosequence.alphabet = 'protein' >> >> I am wondering how to integrate this SQL query with Biopython code to get >> the output in the form of SeqRecord or Seq objects. > > If you have the internal bioentry IDs you can use the BioSQL code > directly to get SeqRecord compatible objects: > > from BioSQL import BioSeqDatabase > from BioSQL.BioSeq import DBSeqRecord > > server = BioSeqDatabase.open_database(driver="MySQLdb", user="root", > passwd = "", host = "localhost", db="bioseqdb") > rec = DBSeqRecord(server.adaptor, your_bioentry_id) > > Hope this helps, > Brad Good to see we gave the same answer :) Peter From livingstonemark at gmail.com Mon May 28 01:55:25 2012 From: livingstonemark at gmail.com (Mark Livingstone) Date: Mon, 28 May 2012 11:55:25 +1000 Subject: [Biopython] Getting the atom number of a CA residue Message-ID: Hi Guys, I want to use Bio.PDB.NeighborSearch. To do so, it seems I need to tell it which atom to center the search on.
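For the distance-range part of Mark's question (continued below), one option is to compute the distances yourself. In Bio.PDB the centre can be taken straight from the residue, e.g. `mtc[mutation_site]["CA"].get_coord()`, indexing by atom name as the superposition script earlier does; `NeighborSearch.search(center, 10.0)` could then supply candidates within the outer radius, leaving only the inner bound to apply. The plain-Python sketch below shows that filtering step on hypothetical `(name, (x, y, z))` pairs and returns hits sorted by distance (it uses `math.dist`, so Python 3.8 or later).

```python
import math


def atoms_in_shell(atoms, center, r_min, r_max):
    """Return (distance, name) pairs with r_min < distance <= r_max.

    `atoms` is an iterable of (name, (x, y, z)) pairs; results are sorted
    nearest first, so the list can be walked in order of distance.
    """
    hits = []
    for name, coord in atoms:
        d = math.dist(center, coord)  # Euclidean distance
        if r_min < d <= r_max:
            hits.append((d, name))
    hits.sort()
    return hits


if __name__ == "__main__":
    toy_atoms = [("CB", (0.0, 0.0, 3.0)), ("OG", (0.0, 6.0, 0.0)),
                 ("NZ", (8.0, 0.0, 0.0)), ("OH", (0.0, 0.0, 12.0))]
    for d, name in atoms_in_shell(toy_atoms, (0.0, 0.0, 0.0), 5.0, 10.0):
        print("%-3s %.2f" % (name, d))
```

Keeping the distance in each result also answers the "sorted by distance list" part of the question without a second pass.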
I have constructed this convoluted center-finding method ;-) and I'm wondering if there is something simpler!!
atoms = Bio.PDB.Selection.unfold_entities(mtc, "A")
# we find the atom number of the mutation site residue's CA atom which becomes the center of our search radius
center = atoms[mtc[mutation_site].get_unpacked_list()[1].get_serial_number()].get_coord()
bions = Bio.PDB.NeighborSearch(atoms)
atoms_found = bions.search(center, 5.0, "A")
residues_found = bions.search(center, 5.0, "R")
Using 1bti.pdb and asking for Residue 22 [=mutation_site] CA [1] does give me the coordinates of atom #187, which is correct. At the moment I am only interested in CA, but I realise this somewhat hardcoded solution will not scale! Secondly, what I eventually want to get is more of a range function where I can find, e.g., what is between 5 and 10 Å from the center. Since NeighborSearch doesn't give access to the distances of the atoms / residues, am I correct in thinking I will have to "roll my own" neighbourhood search, construct a list sorted by distance and iterate through it, selecting e.g. 5 < distance < 10 or similar? Thanks for your thoughts. MarkL From livingstonemark at gmail.com Wed May 30 03:09:41 2012 From: livingstonemark at gmail.com (Mark Livingstone) Date: Wed, 30 May 2012 13:09:41 +1000 Subject: [Biopython] Getting side chain atoms? Message-ID: Hi Guys, I notice on the wiki that it says the mailing list is at biopython at biopython.org, but when I subscribed it said to use biopython at lists.open-bio.org, so I'm wondering what the difference is? What is the simplest way to get a list of the side chain atoms given say a residue number? Also, not entirely related to Biopython, but I'm wondering if there is some way to get a sense of the overall shape of a protein? Like is it globular, a big string, a sheet or what? I can see if you looked at the bounding box, that might be a starting point, but does anyone have any other ideas?
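João's reply further down suggests both answers: side-chain atoms are whatever is left once the fixed backbone names are removed, and the gyration tensor summarises overall shape. A minimal sketch of each in plain Python, assuming heavy atoms only and taking N, CA, C, O and the C-terminal OXT as the backbone set; with Bio.PDB the names and coordinates would come from `atom.get_name()` and `atom.get_coord()` for the atoms of `chain[residue_number]`. The relative shape anisotropy kappa^2 computed here is 0 for a spherically symmetric cloud and 1 for points on a line; telling oblate from prolate needs the individual eigenvalues of the tensor (for example via NumPy), which this sketch does not compute.

```python
import math

# Backbone heavy-atom names in PDB files; OXT is the extra C-terminal oxygen.
BACKBONE = {"N", "CA", "C", "O", "OXT"}


def side_chain_names(atom_names):
    """Keep names that are not backbone heavy atoms (hydrogens not handled)."""
    return [name for name in atom_names if name not in BACKBONE]


def gyration_tensor(coords):
    """3x3 gyration tensor S_ij = mean((r_i - c_i) * (r_j - c_j))."""
    n = float(len(coords))
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    d = [[p[k] - c[k] for k in range(3)] for p in coords]
    return [[sum(q[i] * q[j] for q in d) / n for j in range(3)] for i in range(3)]


def radius_of_gyration(coords):
    """Rg is the square root of the tensor's trace."""
    S = gyration_tensor(coords)
    return math.sqrt(S[0][0] + S[1][1] + S[2][2])


def relative_shape_anisotropy(coords):
    """kappa^2 = 1 - 3*I2/I1**2 from the tensor invariants.

    I1 is the trace and I2 the sum of the principal 2x2 minors, so no
    eigenvalue computation is needed.
    """
    S = gyration_tensor(coords)
    i1 = S[0][0] + S[1][1] + S[2][2]
    i2 = (S[0][0] * S[1][1] - S[0][1] * S[1][0]
          + S[1][1] * S[2][2] - S[1][2] * S[2][1]
          + S[0][0] * S[2][2] - S[0][2] * S[2][0])
    return 1.0 - 3.0 * i2 / (i1 * i1)


if __name__ == "__main__":
    print(side_chain_names(["N", "CA", "C", "O", "CB", "OG"]))
    rod = [(float(i), 0.0, 0.0) for i in range(10)]
    cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
    print(relative_shape_anisotropy(rod), relative_shape_anisotropy(cube))
```

A flat, symmetric arrangement such as the four corners of a square gives 0.25, so intermediate values separate sheet-like from globular or rod-like shapes.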
I have been looking at it as a geometry type problem but haven't gotten too far yet. Thanks in advance, MarkL From dilara.ally at gmail.com Wed May 30 03:30:27 2012 From: dilara.ally at gmail.com (Dilara Ally) Date: Tue, 29 May 2012 20:30:27 -0700 Subject: [Biopython] replace header Message-ID: Hi Guys, I'm interested in replacing just one part of the header for every read in a 40Gb fastq file. Because the files are so huge, I don't want to read the entire file into memory, just one read at a time, which I then rewrite to a new file. The problem as it stands is that I'm creating a new SeqRecord object for every read and appending it to a list called newsolid. Then, once that list is complete with all records, I write that list to a new file. Preferably I'd like to write each new SeqRecord to the file immediately. Sorry if I've missed this lesson in the Biopython Tutorial and Cookbook! Any help would be greatly appreciated! Here is the code.
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

newsolid=[]
for seq_record in SeqIO.parse("solid_1.fastq", "fastq"):
    print seq_record.id
    original_header=seq_record.id
    import re
    subfind=r"(\w+)_(\w+)"
    result=re.search(subfind, original_header)
    print result.groups()
    subheader="_1"
    subreplace=r"\1_1"
    new_header=re.sub(subfind, subreplace, original_header)
    print new_header
    newfastqrecord=SeqRecord(seq_record.seq, id=new_header, letter_annotations=seq_record.letter_annotations)
    newsolid.append(newfastqrecord)
output="newsolid_1.fastq"
from Bio import SeqIO
SeqIO.write(newsolid, output, "fastq")
Cheers, Dilara From arklenna at gmail.com Wed May 30 03:44:41 2012 From: arklenna at gmail.com (Lenna Peterson) Date: Tue, 29 May 2012 23:44:41 -0400 Subject: [Biopython] replace header In-Reply-To: References: Message-ID: Hi Dilara, Opening a file for append with 'a' allows successive writes to go to the end of the file. Before the loop:
out_handle = open("newsolid_1.fastq", 'a')
In the loop:
SeqIO.write(newfastqrecord, out_handle, "fastq")
After the loop:
out_handle.close()
You may have to manually write newlines to the file but hopefully the fastq writer handles that properly. Hope that helps, Lenna From dilara.ally at gmail.com Wed May 30 03:54:13 2012 From: dilara.ally at gmail.com (Dilara Ally) Date: Tue, 29 May 2012 20:54:13 -0700 Subject: [Biopython] replace header In-Reply-To: References: Message-ID: Thanks! That worked. On May 29, 2012, at 8:44 PM, Lenna Peterson wrote: > Hi Dilara, > > Opening a file for append with 'a' allows successive writes to go to > the end of the file. > > Before the loop: > > out_handle = open("newsolid_1.fastq", 'a') > > In the loop: > > SeqIO.write(newfastqrecord, out_handle, "fastq") > > After the loop: > > out_handle.close() > > > You may have to manually write newlines to the file but hopefully the > fastq writer handles that properly.
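Lenna's fix already keeps memory use flat. As a further sketch, the records need not become SeqRecord objects at all when only the ID changes: read the four lines of each record, rewrite the title, and write them straight out (Biopython's `Bio.SeqIO.QualityIO.FastqGeneralIterator`, which yields title/sequence/quality strings, is a similar lower-level route). This assumes unwrapped four-line FASTQ (one sequence line and one quality line per record), and `rewrite_fastq` and `rename_id` are hypothetical helpers reusing Dilara's regex. Opening the output once with mode `"w"` before the loop is enough; `"a"` would also append to whatever an earlier run left in the file.

```python
import io
import re

# Dilara's pattern: because \w also matches "_", the greedy first group
# runs to the last underscore, so "1_8_102_F3" becomes "1_8_102_1".
ID_PATTERN = re.compile(r"(\w+)_(\w+)")


def rename_id(record_id):
    return ID_PATTERN.sub(r"\1_1", record_id, count=1)


def rewrite_fastq(in_handle, out_handle):
    """Stream 4-line FASTQ records, rewriting each ID; returns the count.

    Assumes unwrapped FASTQ (one sequence line and one quality line per
    record), so memory use stays constant whatever the file size.
    """
    count = 0
    while True:
        title = in_handle.readline()
        if not title:
            break  # end of file
        seq = in_handle.readline()
        plus = in_handle.readline()
        qual = in_handle.readline()
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("Unexpected FASTQ layout near record %d" % (count + 1))
        # Only the ID (text before the first space) is renamed.
        record_id, sep, rest = title[1:].rstrip("\n").partition(" ")
        out_handle.write("@%s%s%s\n" % (rename_id(record_id), sep, rest))
        out_handle.write(seq)
        out_handle.write("+\n")  # drop any repeated ID on the '+' line
        out_handle.write(qual)
        count += 1
    return count


if __name__ == "__main__":
    demo = io.StringIO("@1_8_102_F3\nT0123\n+\n!!!!\n")
    out = io.StringIO()
    rewrite_fastq(demo, out)
    print(out.getvalue())
```

For the real files the same function would be called with `open("solid_1.fastq")` and `open("newsolid_1.fastq", "w")`, ideally inside a `with` block so both handles are closed.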
>
> Hope that helps,
>
> Lenna

From anaryin at gmail.com Wed May 30 06:04:14 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 30 May 2012 08:04:14 +0200
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: References: Message-ID:

Hi Mark,

The gyration tensor should give you the means of calculating how oblate or prolate your molecule is.

Regarding the sidechain, I think you just have to manually do it, but since the backbone atoms are always the same it shouldn't be too hard.

Cheers,

João

On 30 May 2012 05:10, "Mark Livingstone" <livingstonemark at gmail.com> wrote:

> Hi Guys,
>
> I notice on the wiki that it says the mailing list is at
> biopython at biopython.org, but when I subscribed it said to use
> biopython at lists.open-bio.org, so I'm wondering what the difference is?
>
> What is the simplest way to get a list of the side chain atoms given
> say a residue number?
>
> Also, not entirely related to Biopython, but I'm wondering if there is
> some way to get a sense of the overall shape of a protein? Like is it
> globular, a big string, a sheet or what? I can see if you looked at
> the bounding box, that might be a starting point, but does anyone have
> any other ideas? I have been looking at it as a geometry type problem
> but haven't gotten too far yet.
>
> Thanks in advance,
>
> MarkL
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From p.j.a.cock at googlemail.com Wed May 30 07:45:29 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 May 2012 08:45:29 +0100
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: References: Message-ID:

On Wednesday, May 30, 2012, Mark Livingstone wrote:

> Hi Guys,
>
> I notice on the wiki that it says the mailing list is at
> biopython at biopython.org, but when I subscribed it said to use
> biopython at lists.open-bio.org, so I'm wondering what the
> difference is?
>

They are the same - the OBF (open-bio.org) machine also handles the BioPerl mailing lists etc. as well.

Peter

From p.j.a.cock at googlemail.com Wed May 30 08:56:57 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 30 May 2012 09:56:57 +0100
Subject: [Biopython] replace header
In-Reply-To: References: Message-ID:

On Wed, May 30, 2012 at 4:54 AM, Dilara Ally wrote:
> Thanks! That worked.
>
> On May 29, 2012, at 8:44 PM, Lenna Peterson wrote:
>
>> Hi Dilara,
>>
>> Opening a file for append with 'a' allows successive writes to go to
>> the end of the file.
>>
>> Before the loop:
>>
>>     out_handle = open("newsolid_1.fastq", 'a')
>>
>> In the loop:
>>
>>     SeqIO.write(newfastqrecord, out_handle, "fastq")
>>
>> After the loop:
>>
>>     out_handle.close()

Hi Dilara & Lenna,

I would use append mode with caution - it will have side effects, e.g. if you run this script twice, the output file will double in size (the first run plus the second run). Wouldn't opening in write mode work just as well here? i.e. Open the handle, do the loop, close the handle.

There are some further changes I would suggest. First, you don't need to create a new SeqRecord; you can modify the old record in situ. This will be faster as it avoids extra object creation:

...
newfastqrecord=SeqRecord(seq_record.seq, id=new_header, letter_annotations=seq_record.letter_annotations)
...

becomes just:

...
seq_record.id = new_header
...

Next, it is better to call SeqIO.write(...) once to do the whole file. On simple file formats like FASTA, FASTQ and GenBank, there is no header/footer structure, so you can write each record independently. In general this is not possible, e.g. SFF files.
Moreover, multiple calls to SeqIO.write(...) are slower than one single call.

The key point about using SeqIO.write(...) once to do a whole file is that this requires an iterator-based approach. For example, using a generator expression and a function acting on a single record:

from Bio import SeqIO

def modify_record(record):
    #Do something sensible to the headers here:
    record.id = "modified"
    return record
#This is a generator expression:
modified = (modify_record(r) for r in SeqIO.parse("solid_1.fastq", "fastq"))
count = SeqIO.write(modified, "newsolid_1.fastq", "fastq")
print("Modified %i records" % count)

Equivalently, using a generator function which does the looping itself:

def modify_records(records):
    for record in records:
        #Do something sensible to the headers here:
        record.id = "modified"
        yield record
count = SeqIO.write(modify_records(SeqIO.parse("solid_1.fastq", "fastq")),
                    "newsolid_1.fastq", "fastq")
print("Modified %i records" % count)

Getting to grips with iterators and thinking this way takes a while, but it is extremely useful for dealing with large datasets efficiently (without running out of memory).

Now, for FASTQ in particular, the files are usually very large, and using SeqIO and SeqRecord objects can be too slow. You might find this useful:

http://news.open-bio.org/news/2009/09/biopython-fast-fastq/

Peter

From ferreirafm at usp.br Wed May 30 12:57:49 2012
From: ferreirafm at usp.br (Frederico Moras Ferreira)
Date: Wed, 30 May 2012 09:57:49 -0300
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: References: Message-ID: <4FC6194D.1060802@usp.br>

Hi Mark,

I'm also very interested in overall protein shape analysis. I'm completely new to Biopython and can't help you much. Regarding your question itself, that's not trivial. One of the approaches would be to calculate the center of mass of your protein and iteratively calculate the moment of inertia along three mutually perpendicular axes so that it is maximum in one direction and minimum in another.
Sampling the moment of inertia of the third axis and comparing with the other two will give a good estimate of your protein's overall shape.

Best of luck,

Fred

On 30-05-2012 03:04, João Rodrigues wrote:
> Hi Mark,
>
> The gyration tensor should give you the means of calculating how oblate or
> prolate your molecule is.
>
> Regarding the sidechain, I think you just have to manually do it, but since
> the backbone atoms are always the same it shouldn't be too hard.
>
> Cheers,
>
> João
> On 30 May 2012 05:10, "Mark Livingstone" <livingstonemark at gmail.com> wrote:
>
>> Hi Guys,
>>
>> I notice on the wiki that it says the mailing list is at
>> biopython at biopython.org, but when I subscribed it said to use
>> biopython at lists.open-bio.org, so I'm wondering what the difference is?
>>
>> What is the simplest way to get a list of the side chain atoms given
>> say a residue number?
>>
>> Also, not entirely related to Biopython, but I'm wondering if there is
>> some way to get a sense of the overall shape of a protein? Like is it
>> globular, a big string, a sheet or what? I can see if you looked at
>> the bounding box, that might be a starting point, but does anyone have
>> any other ideas? I have been looking at it as a geometry type problem
>> but haven't gotten too far yet.
>>
>> Thanks in advance,
>>
>> MarkL

From golubchi at stats.ox.ac.uk Wed May 30 13:40:38 2012
From: golubchi at stats.ox.ac.uk (Tanya Golubchik)
Date: Wed, 30 May 2012 14:40:38 +0100
Subject: [Biopython] Bio.Phylo: midpoint root?
Message-ID: <4FC62356.2030709@stats.ox.ac.uk>

Hello,

Does anyone have a quickie method for calculating the midpoint root of a given tree?

Thanks,

Tanya

From eric.talevich at gmail.com Wed May 30 14:55:11 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 30 May 2012 10:55:11 -0400
Subject: [Biopython] Bio.Phylo: midpoint root?
In-Reply-To: <4FC62356.2030709@stats.ox.ac.uk>
References: <4FC62356.2030709@stats.ox.ac.uk>
Message-ID:

On Wed, May 30, 2012 at 9:40 AM, Tanya Golubchik wrote:

> Hello,
>
> Does anyone have a quickie method for calculating the midpoint root of a
> given tree?
>

It's been on my to-do list. (The first step was adding the keyword argument 'outgroup_branch_length' to root_with_outgroup.) The tree method 'depths' should also be handy.

The algorithm I had in mind looks like:

1. Take the depths() of each clade under the root.
2. Identify the deepest tip under each clade.
3. Assuming the tree is bifurcating, take the shallower tip as the "out_tip" and the deeper tip as the "in_tip".
4. If the difference between the depths of "out_tip" and "in_tip" is greater than the length of the branch connecting the two clades below the root (tree.clade[0].branch_length + tree.clade[1].branch_length), there's a possibility that a better out_tip is hiding inside the deeper clade. So, repeat the operation on tree.clade[1], recursively, until meeting the stop condition I just described.
5. To identify the midpoint, halve the distance between in_tip and out_tip, and trace backward from in_tip by that distance to reach the new root.

With multifurcations, the algorithm looks similar, but with more loops.

I too would be delighted to see a better algorithm for this.

-E

From anaryin at gmail.com Wed May 30 15:13:41 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 30 May 2012 17:13:41 +0200
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: <4FC6194D.1060802@usp.br>
References: <4FC6194D.1060802@usp.br>
Message-ID:

Dear Frederico and Mark,

I have a few scripts to do exactly what Frederico described, that play with Biopython. I will share them tomorrow and put an example here of how they work. Eventually it will become part of Biopython, in a future release, I hope.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

2012/5/30 Frederico Moras Ferreira

> Hi Mark,
> I'm also very interested in overall protein shape analysis. I'm completely
> new to Biopython and can't help you much. Regarding your question
> itself, that's not trivial. One of the approaches would be to
> calculate the center of mass of your protein and iteratively calculate the
> moment of inertia along three mutually perpendicular axes so that it is
> maximum in one direction and minimum in another. Sampling the moment of
> inertia of the third axis and comparing with the other two will give a good
> estimate of your protein's overall shape.
> Best of luck,
> Fred
>
> On 30-05-2012 03:04, João Rodrigues wrote:
> > Hi Mark,
>>
>> The gyration tensor should give you the means of calculating how oblate or
>> prolate your molecule is.
>>
>> Regarding the sidechain, I think you just have to manually do it, but
>> since
>> the backbone atoms are always the same it shouldn't be too hard.
>>
>> Cheers,
>>
>> João
>> On 30 May 2012 05:10, "Mark Livingstone" <livingstonemark at gmail.com> wrote:
>>
>> Hi Guys,
>>>
>>> I notice on the wiki that it says the mailing list is at
>>> biopython at biopython.org, but when I subscribed it said to use
>>> biopython at lists.open-bio.org, so I'm wondering what the difference is?
>>>
>>> What is the simplest way to get a list of the side chain atoms given
>>> say a residue number?
>>>
>>> Also, not entirely related to Biopython, but I'm wondering if there is
>>> some way to get a sense of the overall shape of a protein? Like is it
>>> globular, a big string, a sheet or what? I can see if you looked at
>>> the bounding box, that might be a starting point, but does anyone have
>>> any other ideas? I have been looking at it as a geometry type problem
>>> but haven't gotten too far yet.
>>>
>>> Thanks in advance,
>>>
>>> MarkL

From ferreirafm at usp.br Wed May 30 16:55:23 2012
From: ferreirafm at usp.br (Frederico Moras Ferreira)
Date: Wed, 30 May 2012 13:55:23 -0300
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: References: <4FC6194D.1060802@usp.br>
Message-ID: <4FC650FB.8090502@usp.br>

That's great! Looking forward to hearing from you.

Cheers,

Fred

On 30-05-2012 12:13, João Rodrigues wrote:
> Dear Frederico and Mark,
>
> I have a few scripts to do exactly what Frederico described, that play
> with Biopython. I will share them tomorrow and put an example here of
> how they work. Eventually it will become part of Biopython, in a
> future release, I hope.
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
>
> 2012/5/30 Frederico Moras Ferreira
>
>
> Hi Mark,
> I'm also very interested in overall protein shape analysis. I'm
> completely new to Biopython and can't help you much. Regarding
> your question itself, that's not trivial. One of the
> approaches would be to calculate the center of mass of your
> protein and iteratively calculate the moment of inertia along
> three mutually perpendicular axes so that it is maximum in one
> direction and minimum in another. Sampling the moment of inertia
> of the third axis and comparing with the other two will give a
> good estimate of your protein's overall shape.
> Best of luck,
> Fred
>
> On 30-05-2012 03:04, João Rodrigues wrote:
>
> Hi Mark,
>
> The gyration tensor should give you the means of calculating
> how oblate or
> prolate your molecule is.
>
> Regarding the sidechain, I think you just have to manually do
> it, but since
> the backbone atoms are always the same it shouldn't be too hard.
>
> Cheers,
>
> João
> On 30 May 2012 05:10, "Mark Livingstone" <livingstonemark at gmail.com> wrote:
>
> Hi Guys,
>
> I notice on the wiki that it says the mailing list is at
> biopython at biopython.org,
> but when I subscribed it said to use
> biopython at lists.open-bio.org, so I'm wondering
> what the difference is?
>
> What is the simplest way to get a list of the side chain
> atoms given
> say a residue number?
>
> Also, not entirely related to Biopython, but I'm wondering
> if there is
> some way to get a sense of the overall shape of a protein?
> Like is it
> globular, a big string, a sheet or what? I can see if you
> looked at
> the bounding box, that might be a starting point, but does
> anyone have
> any other ideas? I have been looking at it as a geometry
> type problem
> but haven't gotten too far yet.
>
> Thanks in advance,
>
> MarkL

From eric.talevich at gmail.com Thu May 31 04:36:49 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 31 May 2012 00:36:49 -0400
Subject: [Biopython] Bio.Phylo: midpoint root?
In-Reply-To: References: <4FC62356.2030709@stats.ox.ac.uk>
Message-ID:

On Wed, May 30, 2012 at 10:55 AM, Eric Talevich wrote:

> On Wed, May 30, 2012 at 9:40 AM, Tanya Golubchik wrote:
>
>> Hello,
>>
>> Does anyone have a quickie method for calculating the midpoint root of a
>> given tree?
>>
>>
> It's been on my to-do list. (The first step was adding the keyword
> argument 'outgroup_branch_length' to root_with_outgroup.) The tree method
> 'depths' should also be handy.
>
> The algorithm I had in mind looks like:
>
>
> I too would be delighted to see a better algorithm for this.
>
>
I implemented this in an intuitive but very inefficient way, calculating the pairwise distances between all tips of the tree. You can try it from git:
https://github.com/biopython/biopython/commit/94c128bd428cc5d53b50edd1d2e4730ee212f530

It would still be nice to see a better algorithm, if anyone has one on hand.

-E

From francesco.strozzi at gmail.com Thu May 31 09:11:25 2012
From: francesco.strozzi at gmail.com (Francesco Strozzi)
Date: Thu, 31 May 2012 11:11:25 +0200
Subject: [Biopython] EU Codefest 2012 Announcement
Message-ID:

The Open Bioinformatics Foundation (OBF) EU-CodeFest will be held at Parco Tecnologico Padano (PTP), Lodi, Italy, on the 19th and 20th of July.
The CodeFest is a small, focused event under the auspices of the Open Bioinformatics Foundation, and is a sister event of BOSC2012, being held in California, USA, this year.

Three main topics will be worked on during the CodeFest:

- NGS and high-performance parsers for OpenBio projects.
- RDF and the semantic web for bioinformatics.
- Bioinformatics pipeline definition, execution and distribution.

The number of places is limited to a maximum of 30 participants, on a first come, first served basis. Undergraduate and PhD students are welcome to participate.

The cost of the event is EUR 100 per person, which also includes lunches, coffee breaks and the social dinner on the 19th of July. For students only, we can sponsor a limited number of attendees, who will not pay the registration fee. Students wishing to attend the event for free will be asked to submit their qualifications and experience in software development. The organizing committee will review students' applications before final acceptance.

Talks and abstracts may be presented during the CodeFest in sessions of 10 minutes plus questions. Coding activities will continue during the talks.

The city of Lodi is very close to Milano and has good hotel facilities. Air connections are excellent, via Milano Malpensa, Milano Linate and Bergamo Orio Al Serio airports.

Please register soon using the form at http://tecnoparco.org/codefest, as places may run out quickly.

--
Francesco

From anaryin at gmail.com Thu May 31 10:54:55 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 31 May 2012 12:54:55 +0200
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: <4FC650FB.8090502@usp.br>
References: <4FC6194D.1060802@usp.br> <4FC650FB.8090502@usp.br>
Message-ID:

I have already included the header as a Biopython module, since parts of this script are my work at GSOC 2010, and others are Ezgi's independent work that she agreed to contribute.
Just a safety measure :)

The usage of the code is very simple. Parse a structure with Biopython and use the calculate_shape_param function of this geometry module to calculate all the values required to compute the shape of your molecule. It will not tell you what shape it is, but it will give you all the ingredients (if the anisotropy is 0, your molecule is spherical, for example).

Let me know of any comments, and Ezgi too, as she is the main contributor to this.

*Again, this should not be included, for now, in the main distribution, nor considered an "official" addition.*

Download here: http://nmr.chem.uu.nl/~joao/f/geometry.py

Cheers,

João

From golubchi at stats.ox.ac.uk Thu May 31 12:10:21 2012
From: golubchi at stats.ox.ac.uk (Tanya Golubchik)
Date: Thu, 31 May 2012 13:10:21 +0100
Subject: [Biopython] Bio.Phylo: midpoint root?
In-Reply-To: References: <4FC62356.2030709@stats.ox.ac.uk>
Message-ID: <4FC75FAD.2050909@stats.ox.ac.uk>

Thanks, Eric - will have a go!

T

On 31/05/12 05:36, Eric Talevich wrote:
> On Wed, May 30, 2012 at 10:55 AM, Eric Talevich
> wrote:
>
> On Wed, May 30, 2012 at 9:40 AM, Tanya Golubchik
> wrote:
>
> Hello,
>
> Does anyone have a quickie method for calculating the midpoint
> root of a
> given tree?
>
>
> It's been on my to-do list. (The first step was adding the keyword
> argument 'outgroup_branch_length' to root_with_outgroup.) The tree
> method 'depths' should also be handy.
>
> The algorithm I had in mind looks like:
>
>
>
>
> I too would be delighted to see a better algorithm for this.
>
>
> I implemented this in an intuitive but very inefficient way, calculating
> the pairwise distances between all tips of the tree. You can try it from
> git:
> https://github.com/biopython/biopython/commit/94c128bd428cc5d53b50edd1d2e4730ee212f530
>
> It would still be nice to see a better algorithm, if anyone has one on hand.
>
> -E

From ferreirafm at usp.br Thu May 31 13:06:54 2012
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Thu, 31 May 2012 10:06:54 -0300
Subject: [Biopython] Getting side chain atoms?
In-Reply-To: References: <4FC6194D.1060802@usp.br> <4FC650FB.8090502@usp.br>
Message-ID: <4FC76CEE.1070300@usp.br>

Hi João and Ezgi,

That's a very nice piece of code. I'll do some tests and let you know the results.

All the best,

Fred

On 31-05-2012 07:54, João Rodrigues wrote:
> I included the header already as a biopython module, since parts of
> this script are my work at GSOC 2010, and others are Ezgi's
> independent work that she agreed to contribute. Just a safety measure :)
>
> The usage of the code is very simple. Parse a structure with biopython
> and use the calculate_shape_param function of this geometry module to
> calculate all values required to compute the shape of your molecule.
> It will not tell you what shape it is, but it will give you all the
> ingredients (if the anisotropy is 0, your molecule is spherical, for
> example).
>
> Let me know of any comments, and Ezgi too, as she is the main
> contributor to this.
>
> *Again, this should not be included, for now, in the main
> distribution, nor considered an "official" addition.*
>
> Download here: http://nmr.chem.uu.nl/~joao/f/geometry.py
>
>
> Cheers,
>
> João

From b.invergo at gmail.com Thu May 31 13:31:21 2012
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 31 May 2012 15:31:21 +0200
Subject: [Biopython] Bio.Phylo: midpoint root?
In-Reply-To: References: <4FC62356.2030709@stats.ox.ac.uk>
Message-ID: <1338471081.627.7.camel@localhost.localdomain>

On Thu, 2012-05-31 at 00:36 -0400, Eric Talevich wrote:
> On Wed, May 30, 2012 at 10:55 AM, Eric Talevich wrote:
> I implemented this in an intuitive but very inefficient way, calculating
> the pairwise distances between all tips of the tree. You can try it from
> git:
> https://github.com/biopython/biopython/commit/94c128bd428cc5d53b50edd1d2e4730ee212f530
>
> It would still be nice to see a better algorithm, if anyone has one on hand.
>
> -E

I sped it up a little bit by getting rid of those nested for loops:
https://github.com/brandoninvergo/biopython/commit/102189cd49d448423ee160a0a0ad891b58f56c26

According to a naive benchmark comparing execution times for the unit test, this version is about 40% faster (0.901s vs 0.524s on my computer). I'll do a pull request...

As for the problem of accumulating floating point rounding errors, perhaps you can do the root operations on copies of the tree instead...

-brandon

From eric.talevich at gmail.com Thu May 31 16:30:07 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 31 May 2012 12:30:07 -0400
Subject: [Biopython] Bio.Phylo: midpoint root?
In-Reply-To: <1338471081.627.7.camel@localhost.localdomain>
References: <4FC62356.2030709@stats.ox.ac.uk> <1338471081.627.7.camel@localhost.localdomain>
Message-ID:

On Thu, May 31, 2012 at 9:31 AM, Brandon Invergo wrote:

> On Thu, 2012-05-31 at 00:36 -0400, Eric Talevich wrote:
> > On Wed, May 30, 2012 at 10:55 AM, Eric Talevich wrote:
> > I implemented this in an intuitive but very inefficient way, calculating
> > the pairwise distances between all tips of the tree. You can try it from
> > git:
> >
> https://github.com/biopython/biopython/commit/94c128bd428cc5d53b50edd1d2e4730ee212f530
> >
> > It would still be nice to see a better algorithm, if anyone has one on
> hand.
> >
> > -E
>
> I sped it up a little bit by getting rid of those nested for loops:
>
> https://github.com/brandoninvergo/biopython/commit/102189cd49d448423ee160a0a0ad891b58f56c26
>
> According to a naive benchmark of comparing execution times for the unit
> test, this version is about 40% faster (0.901s vs 0.524s on my
> computer). I'll do a pull request...
>
> As for the problem of accumulating floating point rounding errors,
> perhaps you can do the root operations on copies of the tree instead...
>
> -brandon
>
>
Looks better, thanks! I merged it.

I'll look into the rounding issue some more. It might be enough to make a single copy of the tree, do all the rerooting and distance calculation there, and use the original copy to calculate the outgroup branch length and do a single rerooting. Alternatively, I could add a separate tree method that generates pairwise distances without rerooting the tree -- either producing a big dictionary, or an iterable of ((node1, node2), distance) which could be easily fed to a dictionary if needed.

From vinkurella at yahoo.com Thu May 31 17:28:13 2012
From: vinkurella at yahoo.com (vinodh kurella)
Date: Thu, 31 May 2012 10:28:13 -0700 (PDT)
Subject: [Biopython] Help needed to fix an error in biopython on mac osx
Message-ID: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>

Hi Biopython community,

I am trying to install Biopython on my Mac OS X (10.7.4). After downloading biopython 1.59.tar.gz from http://biopython.org/wiki/Download and running the installation, I get the error pasted below.

    $ cd biopython-1.59
    $ sudo python setup.py install

It works until it gets to this error and then aborts. Could anyone please let me know what is wrong? Thanks, I appreciate your help.

Below is the report; I have cut out the middle (......) and shown only the start and the end of the output.

Vinodh

    running install
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.7-intel-2.7
    creating build/lib.macosx-10.7-intel-2.7/Bio
    copying Bio/__init__.py -> build/lib.macosx-10.7-intel-2.7/Bio
    copying Bio/_py3k.py -> build/lib.macosx-10.7-intel-2.7/Bio
    .
    .
    .
    .
    copying Bio/PopGen/SimCoal/data/ssm_2d.par -> build/lib.macosx-10.7-intel-2.7/Bio/PopGen/SimCoal/data
    running build_ext
    building 'Bio.cpairwise2' extension
    creating build/temp.macosx-10.7-intel-2.7
    creating build/temp.macosx-10.7-intel-2.7/Bio
    llvm-gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -IBio -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.7-intel-2.7/Bio/cpairwise2module.o
    unable to execute llvm-gcc-4.2: No such file or directory
    error: command 'llvm-gcc-4.2' failed with exit status 1

From arklenna at gmail.com Thu May 31 17:46:05 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 31 May 2012 13:46:05 -0400
Subject: [Biopython] Help needed to fix an error in biopython on mac osx
In-Reply-To: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>
References: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>
Message-ID:

Hi Vinodh,

Do you have Xcode installed? It's required to build Biopython on a Mac. llvm-gcc-4.2 is a C compiler used by Apple.

Lenna

From anaryin at gmail.com Thu May 31 17:49:17 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 31 May 2012 19:49:17 +0200
Subject: [Biopython] Help needed to fix an error in biopython on mac osx
In-Reply-To: References: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>
Message-ID:

Is this on Lion?
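Lenna's diagnosis is that the extension build is calling a C compiler that is not installed (llvm-gcc-4.2 in the log above). As an illustrative sketch only (not part of Biopython), the function below reports which compiler a distutils-style build would invoke and whether that program is on PATH. It assumes a Unix-like CPython, where the CC environment variable, if set, overrides the CC value recorded in the standard-library sysconfig module, and it uses shutil.which, which needs Python 3.3 or later.

```python
import os
import shutil
import sysconfig


def find_build_compiler():
    """Return (program, found) for the C compiler extension builds will call.

    Assumption: on Unix-like CPython, distutils/setuptools take the compiler
    from the CC environment variable if set, otherwise from the CC value
    recorded when Python itself was built.
    """
    cc = os.environ.get("CC") or sysconfig.get_config_var("CC") or ""
    parts = cc.split()
    # CC may include flags (e.g. "gcc -pthread"); the first token is the program.
    program = parts[0] if parts else ""
    found = bool(program) and shutil.which(program) is not None
    return program, found


if __name__ == "__main__":
    program, found = find_build_compiler()
    if not program:
        print("No C compiler recorded in this Python's build configuration")
    elif found:
        print("C compiler %r found on PATH" % program)
    else:
        print("C compiler %r not found; install Xcode (or its command line tools)" % program)
```

If the program is reported as missing, installing Xcode (as suggested in this thread) is the usual way to provide Apple's compiler before rerunning `python setup.py install`.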
From arklenna at gmail.com Thu May 31 17:55:40 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 31 May 2012 13:55:40 -0400
Subject: [Biopython] replace header
In-Reply-To: References: Message-ID:

On Wed, May 30, 2012 at 4:56 AM, Peter Cock wrote:
>
> Hi Dilara & Lenna,
>
> I would use append mode with caution - it will have side effects,
> e.g. if you run this script twice, the output file will double in size
> (the first run plus the second run). Wouldn't opening in write
> mode work just as well here?
> i.e. Open the handle, do the loop, close the handle.
>

Hi Peter,

Thanks for the warning. Python, making me adjust my thought patterns every day. I'm used to the shell, with > vs >> for cat etc. I had never tried multiple writes to a single open file. But the behavior is logical.

>
> The key point about using SeqIO.write(...) once to do a whole
> file is that this requires an iterator-based approach. For example,
> using a generator expression and a function acting on a single
> record:
>
> def modify_record(record):
>     #Do something sensible to the headers here:
>     record.id = "modified"
>     return record
> #This is a generator expression:
> modified = (modify_record(r) for r in SeqIO.parse("solid_1.fastq", "fastq"))
> count = SeqIO.write(modified, "newsolid_1.fastq", "fastq")
> print("Modified %i records" % count)
>
> Equivalently, using a generator function which does the
> looping itself:
>
> def modify_records(records):
>     for record in records:
>         #Do something sensible to the headers here:
>         record.id = "modified"
>         yield record
> count = SeqIO.write(modify_records(SeqIO.parse("solid_1.fastq",
> "fastq")), "newsolid_1.fastq", "fastq")
> print("Modified %i records" % count)

The generator function is nice, too. I presume this only works because SeqIO.write knows how to write from an iterator?

Lenna

From p.j.a.cock at googlemail.com Thu May 31 18:06:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 May 2012 19:06:44 +0100
Subject: [Biopython] Help needed to fix an error in biopython on mac osx
In-Reply-To: References: <1338485293.26613.YahooMailNeo@web161302.mail.bf1.yahoo.com>
Message-ID:

On Thu, May 31, 2012 at 6:49 PM, João Rodrigues wrote:
> Is this on Lion?

Yes, he said OS X 10.7, which is Lion. You can download Xcode free from the Apple Mac App Store (it just takes a while, as it is several GB in size).

Peter

From p.j.a.cock at googlemail.com Thu May 31 18:10:11 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 31 May 2012 19:10:11 +0100
Subject: [Biopython] replace header
In-Reply-To: References: Message-ID:

On Thu, May 31, 2012 at 6:55 PM, Lenna Peterson wrote:
>> The key point about using SeqIO.write(...) once to do a whole
>> file is that this requires an iterator-based approach. For example,
>> using a generator expression and a function acting on a single
>> record:
>>
>> def modify_record(record):
>>     #Do something sensible to the headers here:
>>     record.id = "modified"
>>     return record
>> #This is a generator expression:
>> modified = (modify_record(r) for r in SeqIO.parse("solid_1.fastq", "fastq"))
>> count = SeqIO.write(modified, "newsolid_1.fastq", "fastq")
>> print("Modified %i records" % count)
>>
>> Equivalently, using a generator function which does the
>> looping itself:
>>
>> def modify_records(records):
>>     for record in records:
>>         #Do something sensible to the headers here:
>>         record.id = "modified"
>>         yield record
>> count = SeqIO.write(modify_records(SeqIO.parse("solid_1.fastq",
>> "fastq")), "newsolid_1.fastq", "fastq")
>> print("Modified %i records" % count)
>
>
> The generator function is nice, too. I presume this only works because
> SeqIO.write knows how to write from an iterator?
>
> Lenna

Bio.SeqIO.write is *designed* to take a Python iterator of SeqRecord objects.
That can be a generator function, a generator expression, a custom class which supports iteration, or even a simple list or tuple of SeqRecord objects all in memory. As a special-case convenience, it also accepts a single SeqRecord.

Peter

From ferreirafm at usp.br Thu May 31 18:52:00 2012
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Thu, 31 May 2012 15:52:00 -0300
Subject: [Biopython] geometry.py
Message-ID: <4FC7BDD0.2020905@usp.br>

Hi João,

The radius of gyration (Rg) calculation is running just fine. The values are in excellent agreement with those from some models I have tested. However, the maximum dimensions do not match at all. Did you orient the model before the tensor analysis?

Fred
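Fred's question turns on the difference between orientation-independent and orientation-dependent shape measures. The following is a minimal sketch, not the contents of geometry.py (which is only linked in this thread). It builds the gyration tensor João mentioned from plain (x, y, z) coordinates and reports the radius of gyration and the relative shape anisotropy κ². κ² is 0 when the gyration tensor is isotropic (for example, a spherically symmetric distribution) and 1 for points on a straight line. Both values are computed from rotation invariants of the tensor (its trace and the sum of its principal 2x2 minors), so they do not depend on how the model is oriented. Optional per-point weights, such as atomic masses, are an assumption of this sketch.

```python
import math


def shape_descriptors(coords, weights=None):
    """Radius of gyration and relative shape anisotropy from 3D coordinates.

    coords  -- sequence of (x, y, z) triples
    weights -- optional per-point weights (e.g. masses); all 1.0 if omitted
    Returns (rg, kappa2).
    """
    if weights is None:
        weights = [1.0] * len(coords)
    total = float(sum(weights))
    # Weighted centre (centre of mass when the weights are masses).
    centre = [sum(w * p[i] for w, p in zip(weights, coords)) / total for i in range(3)]

    # Gyration tensor: S[i][j] = sum_k w_k * d_ki * d_kj / total, where d = p - centre.
    s = [[0.0] * 3 for _ in range(3)]
    for w, p in zip(weights, coords):
        d = [p[i] - centre[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                s[i][j] += w * d[i] * d[j] / total

    # Rotation invariants: I1 = trace (sum of eigenvalues),
    # I2 = sum of principal 2x2 minors (sum of pairwise eigenvalue products).
    i1 = s[0][0] + s[1][1] + s[2][2]
    i2 = (s[0][0] * s[1][1] - s[0][1] ** 2
          + s[1][1] * s[2][2] - s[1][2] ** 2
          + s[0][0] * s[2][2] - s[0][2] ** 2)

    rg = math.sqrt(i1)
    # kappa^2 = 1 - 3*I2/I1^2: 0 when all eigenvalues are equal, 1 for a straight line.
    kappa2 = 1.0 - 3.0 * i2 / (i1 * i1) if i1 > 0 else 0.0
    return rg, kappa2
```

With Bio.PDB, a list such as `[atom.get_coord() for atom in structure.get_atoms()]` could serve as input, because the function only indexes each point. Extents measured along the fixed x/y/z axes of a PDB file are a different matter: they change when the model is rotated. One plausible reason "maximum dimensions" disagree between tools is that only one of them first aligns the model to its principal axes (the eigenvectors of this tensor).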