From mjldehoon at yahoo.com Wed Aug 1 01:14:52 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 31 Jul 2012 22:14:52 -0700 (PDT)
Subject: [Biopython] Bio.Motif search_pwm
Message-ID: <1343798092.97341.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi everybody,

I was using the search_pwm method in Bio.Motif (which btw is very useful, thanks Bartek) to search for motif instances on both strands of a sequence. If the motif starts at position and is located on the forward strand, this function returns +position; if it is located on the reverse strand, it returns -position. So for position==0, we cannot deduce from the sign whether the motif is located on the forward or on the backward strand.

How about using Python-style negative indices to indicate the strand? For example, +20 means that the motif is located at [20:20+motif_length] on the forward strand, while -20 means that the motif is located at [-20:-20+motif_length].

Alternatively, we could return the strand explicitly.

In the same function, I wish we could get rid of this line:

sequence=sequence.tostring().upper()

since this assumes that sequence is a Biopython Seq object, and not a plain string. We could either use str(sequence) instead of sequence.tostring() to cover both cases, or have the Seq class inherit from strings (which we have been discussing for some time; see https://redmine.open-bio.org/issues/2351).

Best,
-Michiel.
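The sign convention proposed above can be sketched in a few lines. This is only one reading of the proposal, not the code in Bio.Motif: `decode_hit` and its parameters are hypothetical names introduced for illustration, and the sketch assumes a negative value is interpreted like a Python negative index counted from the end of the sequence.

```python
def decode_hit(position, motif_length, seq_length):
    """Turn a signed hit position into (strand, start, end).

    Hypothetical helper, not part of Biopython. Non-negative values are
    forward-strand hits; negative values are reverse-strand hits whose
    start is counted from the end of the sequence, like seq[-20].
    """
    if position >= 0:
        strand, start = +1, position
    else:
        strand, start = -1, seq_length + position
    return strand, start, start + motif_length


# Under the old convention the sign of the start position carried the
# strand, so a hit at position 0 was ambiguous because -0 == 0.
assert -0 == 0

print(decode_hit(20, 6, 100))    # (1, 20, 26): forward strand
print(decode_hit(-20, 6, 100))   # (-1, 80, 86): reverse strand
print(decode_hit(0, 6, 100))     # (1, 0, 6): unambiguously forward
print(decode_hit(-100, 6, 100))  # (-1, 0, 6): reverse-strand hit at the start
```

Converting to non-negative coordinates before slicing also sidesteps a Python detail: a literal slice such as `seq[-20:-20 + motif_length]` is empty when `motif_length` equals 20, because the stop index becomes 0.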
From p.j.a.cock at googlemail.com Wed Aug 1 04:31:15 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 09:31:15 +0100
Subject: [Biopython] Bio.Motif search_pwm
In-Reply-To: <1343798092.97341.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <1343798092.97341.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID:

On Wed, Aug 1, 2012 at 6:14 AM, Michiel de Hoon wrote:
> Hi everybody,
>
> I was using the search_pwm method in Bio.Motif (which btw
> is very useful, thanks Bartek) to search for motif instances
> on both strands of a sequence. If the motif starts at position
> and is located on the forward strand, this function returns
> +position; if it is located on the reverse strand, it returns
> -position. So for position==0, we cannot deduce from the
> sign whether the motif is located on the forward or on the
> backward strand.

That is a problem :(

> How about using Python-style negative indices to indicate
> the strand? For example, +20 means that the motif is
> located at [20:20+motif_length] on the forward strand,
> while -20 means that the motif is located at [-20:-20+motif_length].
>
> Alternatively, we could return the strand explicitly.

Either makes sense. It would break backwards compatibility, but that is probably a necessary break.

> In the same function, I wish we could get rid of this line:
>
> sequence=sequence.tostring().upper()
>
> since this assumes that sequence is a Biopython Seq
> object, and not a plain string.

Allowing a plain string makes good sense. +1

> We could either use str(sequence) instead of sequence.tostring()
> to cover both cases,

That would also accept other objects accidentally, e.g. a list, and probably lead to some obscure errors downstream.

> or have the Seq class inherit from strings (which we have
> been discussing for some time; see
> https://redmine.open-bio.org/issues/2351).
Or perhaps the Seq is already string-like enough for this function (it supports upper()) so no casting is needed? That would be simpler, although likely not as fast.

Or, we could follow the pattern used in Bio.SeqUtils and try the tostring() method, catching any AttributeError and then treating it like a string (since real strings don't have this). The advantage of this route is low risk.

Peter

From livingstonemark at gmail.com Thu Aug 2 00:41:29 2012
From: livingstonemark at gmail.com (Mark Livingstone)
Date: Thu, 2 Aug 2012 14:41:29 +1000
Subject: [Biopython] Superimpose description error?
Message-ID:

Hi Guys,

In Bio.PDB.Superimpose, it says:

set_atoms(self, fixed, moving)
Put (translate/rotate) the atoms in fixed on the atoms in moving, in such a way that the RMSD is minimized.

Aren't the words fixed & moving in that description round the wrong way?

In my research at present I am using a curated set of 2,141 pairs of PDB files, chosen because they have only 1 mutation. Unfortunately, because Superimpose counts atoms before alignment, only 9 / 2,141 PDB pairs will align using the example shown in Superimpose.py source code.

The way I have gotten around this is for each of the two PDBs to make a List of the CA atoms and then align these lists. Not optimal, but seems to work.

The only other way I can see would be to get a full list of atoms then snip out the mutation side chain atoms allowing Superimpose to work as per the source code example - also not optimal but close.

I am doing this because I am experimenting with different ways of using RMSD, so the better I get the alignment, the better my results - even if it is only different in decimal place differences. Are there any better approaches?
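The CA-only workaround described above can be sketched as follows. This is a minimal illustration, not the example from the Biopython source: `paired_ca_atoms` and `superimpose_on_ca` are hypothetical helper names, and the sketch assumes both chains use the same residue numbering, so that CA atoms can be matched by residue id and the two lists always have equal length. It also assumes that the second list passed to `set_atoms` is the one that gets moved.

```python
def paired_ca_atoms(chain_a, chain_b):
    """Return two equal-length lists of CA atoms matched by residue id.

    Residues without a CA atom, or present in only one chain, are skipped,
    so a point mutation or a missing residue cannot unbalance the lists.
    """
    ca_b = {res.id: res["CA"] for res in chain_b if "CA" in res}
    fixed, moving = [], []
    for res in chain_a:
        if "CA" in res and res.id in ca_b:
            fixed.append(res["CA"])
            moving.append(ca_b[res.id])
    return fixed, moving


def superimpose_on_ca(chain_a, chain_b):
    """Fit chain_b onto chain_a using matched CA atoms and return the RMSD."""
    # Imported here so the pairing helper above can be used without Biopython.
    from Bio.PDB import Superimposer

    fixed, moving = paired_ca_atoms(chain_a, chain_b)
    sup = Superimposer()
    sup.set_atoms(fixed, moving)           # computes the rotation/translation
    sup.apply(list(chain_b.get_atoms()))   # moves every atom of chain_b
    return sup.rms
```

Matching by residue id rather than by list position is a design choice that trades a little bookkeeping for robustness when one structure has unresolved residues.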
Thanks in advance,

MArkL

From mjldehoon at yahoo.com Thu Aug 2 09:23:30 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 2 Aug 2012 06:23:30 -0700 (PDT)
Subject: [Biopython] Bio.Motif search_pwm
In-Reply-To:
Message-ID: <1343913810.81963.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Hi everybody,

> On Wed, Aug 1, 2012 at 7:14 AM, Michiel de Hoon wrote:
> > Hi everybody,
> >
> > I was using the search_pwm method in Bio.Motif (which
> > btw is very useful, thanks Bartek) to search for motif
> > instances on both strands of a sequence. If the motif starts
> > at position and is located on the forward strand, this
> > function returns +position; if it is located on the reverse
> > strand, it returns -position. So for position==0, we cannot
> > deduce from the sign whether the motif is located on the
> > forward or on the backward strand.
> >
> > How about using Python-style negative indices to
> > indicate the strand? For example, +20 means that the motif
> > is located at [20:20+motif_length] on the forward strand,
> > while -20 means that the motif is located at
> > [-20:-20+motif_length].
>
> Very nice idea! +1 from me

Done; see
https://github.com/biopython/biopython/commit/d7b67b7192b211b6bd1e4ca6e42eee55c2bc34a8

Best,
-Michiel.

From mjldehoon at yahoo.com Thu Aug 2 09:27:56 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 2 Aug 2012 06:27:56 -0700 (PDT)
Subject: [Biopython] Bio.Motif search_pwm
In-Reply-To:
Message-ID: <1343914076.83712.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi all,

> > We could either use str(sequence) instead of
> > sequence.tostring() to cover both [plain strings and Seq objects]
>
> That would also accept other objects accidentally, e.g. a
> list, and probably lead to some obscure errors downstream.
>
> > or have the Seq class inherit from strings (which we
> > have been discussing for some time; see
> > https://redmine.open-bio.org/issues/2351).
>
> Or perhaps the Seq is already string like enough for this
> function (it supports upper()) so no casting is needed?

This indeed works, so I simply removed the casting.

> Or, we could follow the pattern used in Bio.SeqUtils and try
> the tostring() method, catching any AttributeError and then
> treating it like a string (since real strings don't have
> this). The advantage of this route is low risk.

To avoid these kinds of complications, at some point we should really move forward to let Seq inherit from plain strings. We have been discussing this issue for five years now.

Best,
-Michiel.

From anaryin at gmail.com Fri Aug 3 03:47:20 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 3 Aug 2012 09:47:20 +0200
Subject: [Biopython] Superimpose description error?
In-Reply-To:
References:
Message-ID:

Hey Mark,

Indeed, it is the wrong way round.

Well, I'd say it depends on what you are looking at. You can trim all selections to bb + cb/h (gly) so that you always have the same number of atoms. That is what we do here for mutant analysis. You can also write code to iteratively add side chain carbons until the wt and mutant no longer match. This would give you a better description of the side chain orientation.

On 2 Aug 2012 06:42, "Mark Livingstone" wrote:

> Hi Guys,
>
> In Bio.PDB.Superimpose, it says:
>
> set_atoms(self, fixed, moving)
> Put (translate/rotate) the atoms in fixed on the atoms in moving, in
> such a way that the RMSD is minimized.
>
> Aren't the words fixed & moving in that description round the wrong way?
>
> In my research at present I am using a curated set of 2,141 pairs of
> PDB files, chosen because they have only 1 mutation. Unfortunately,
> because Superimpose counts atoms before alignment, only 9 / 2,141 PDB
> pairs will align using the example shown in Superimpose.py source
> code.
>
> The way I have gotten around this is for each of the two PDBs to make
> a List of the CA atoms and then align these lists.
> Not optimal, but
> seems to work.
>
> The only other way I can see would be to get a full list of atoms then
> snip out the mutation side chain atoms allowing Superimpose to work as
> per the source code example - also not optimal but close.
>
> I am doing this because I am experimenting with different ways of
> using RMSD, so the better I get the alignment, the better my results -
> even if it is only different in decimal place differences. Are there
> any better approaches?
>
> Thanks in advance,
>
> MArkL
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From anna.kostikova at gmail.com Tue Aug 7 06:13:16 2012
From: anna.kostikova at gmail.com (Anna Kostikova)
Date: Tue, 7 Aug 2012 12:13:16 +0200
Subject: [Biopython] Bio.Geography module (Geography.GbifXml)
Message-ID:

Dear list users,

I am wondering if the Bio.Geography module is supported in Biopython?
I was trying to access it following the http://biopython.org/wiki/BioGeography guidelines, but got an error message (ImportError: No module named Geography.GbifXml)

Thanks a lot in advance,
Anna

From chapmanb at 50mail.com Wed Aug 8 09:57:35 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 08 Aug 2012 09:57:35 -0400
Subject: [Biopython] Bio.Geography module (Geography.GbifXml)
In-Reply-To:
References:
Message-ID: <871ujhh4fk.fsf@fastmail.fm>

Anna;
Bio.Geography is available on a separate fork of Biopython and wasn't merged into the main codebase, so you'd need to install from there:

http://github.com/nmatzke/biopython/tree/Geography

Brad

> Dear list users,
>
> I am wondering if Bio.Geography module is supported in Biopython?
> I was trying to access it following
> http://biopython.org/wiki/BioGeography guidelines,
> but got an error message (ImportError: No module named Geography.GbifXml)
>
> Thanks a lot in advance,
> Anna

From anna.kostikova at gmail.com Thu Aug 9 05:35:35 2012
From: anna.kostikova at gmail.com (Anna Kostikova)
Date: Thu, 9 Aug 2012 11:35:35 +0200
Subject: [Biopython] Bio.Geography module (Geography.GbifXml)
In-Reply-To: <871ujhh4fk.fsf@fastmail.fm>
References: <871ujhh4fk.fsf@fastmail.fm>
Message-ID:

Thanks a lot Brad,

It works nicely once I've added it as separate module.

Thanks again!
Anna

2012/8/8 Brad Chapman :
>
> Anna;
> Bio.Geography is available on a separate fork of Biopython and wasn't
> merged into the main codebase, so you'd need to install from there:
>
> http://github.com/nmatzke/biopython/tree/Geography
>
> Brad
>
>> Dear list users,
>>
>> I am wondering if Bio.Geography module is supported in Biopython?
>> I was trying to access it following
>> http://biopython.org/wiki/BioGeography guidelines,
>> but got an error message (ImportError: No module named Geography.GbifXml)
>>
>> Thanks a lot in advance,
>> Anna

From devaniranjan at gmail.com Sat Aug 11 17:37:48 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Sat, 11 Aug 2012 17:37:48 -0400
Subject: [Biopython] Markov modelling
Message-ID:

Hi,

I am sorry, I know this is not the appropriate forum for asking this question but I am not sure which is, so please pardon me for asking this here.
Any help in this matter would be much appreciated.
I have some data that is quite large and I wish to find the relationship between them but doing it by visual inspection is not possible.
I was told maybe Markov modelling might help.
My question is, is there a simple modelling tool that is out there I can use to do Markov modelling? I would appreciate any help in this as I am completely out of my depth in this question.

Thank you very much and once again my apologies for asking this question in this forum.

George

From chris.mit7 at gmail.com Sat Aug 11 18:32:42 2012
From: chris.mit7 at gmail.com (Chris Mitchell)
Date: Sat, 11 Aug 2012 18:32:42 -0400
Subject: [Biopython] Markov modelling
In-Reply-To:
References:
Message-ID:

Hi George,

A few things I would do:
1) Make a spare dataset by randomly selecting a subset of your data to visualize. Just blindly fitting the data points to any distribution usually doesn't end well.
2) Find a copy of Numerical Recipes and read up on fitting data/modeling. There is a good section on Markov Models and other fitting routines.
3) Think about what you are trying to fit. Is the process generating the data a Poisson Process? What sort of equation should be used to model it, and what do the extracted parameters mean in relation to your data? Start with the simplest model you can explain (linear) and add/change parameters only if you can justify it.

Chris

On Sat, Aug 11, 2012 at 5:37 PM, George Devaniranjan wrote:

> Hi,
>
> I am sorry, I know this is not the appropriate forum for asking this
> question but I am not sure which is, so please pardon me for asking this
> here.
> Any help in this matter would be much appreciated.
>
> I have some data that is quite large and I wish to find the relationship
> between them but doing it by visual inspection is not possible.
> I was told maybe Markov modelling might help.
> My question is, is there a simple modelling tool that is out there I can
> use to do Markov modelling?
> I would appreciate any help in this as I am
> completely out of my depth in this question.
>
> Thank you very much and once again my apologies for asking this question in
> this forum.
>
> George
>

From devaniranjan at gmail.com Sat Aug 11 22:03:26 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Sat, 11 Aug 2012 22:03:26 -0400
Subject: [Biopython] Markov modelling
In-Reply-To:
References:
Message-ID:

Thank you very much Chris--I will read Numerical Recipes.
I should have explained more about the data.
I extracted unique 5-mer patterns of amino acids from a database and looked at their secondary structures. I want to know whether there is a relationship between the amino acid sequence and the secondary structure.
This is the reason I thought Markov modelling might give me some ideas.

On Sat, Aug 11, 2012 at 6:32 PM, Chris Mitchell wrote:

> Hi George,
>
> A few things I would do:
> 1) Make a spare dataset by randomly selecting a subset of your data to
> visualize. Just blindly fitting the data points to any distribution
> usually doesn't end well.
> 2) Find a copy of Numerical Recipes and read up on fitting data/modeling.
> There is a good section on Markov Models and other fitting routines.
> 3) Think about what you are trying to fit. Is the process generating the
> data a Poisson Process? What sort of equation should be used to model it,
> and what do the extracted parameters mean in relation to your data? Start
> with the simplest model you can explain (linear) and add/change parameters
> only if you can justify it.
>
> Chris
>
> On Sat, Aug 11, 2012 at 5:37 PM, George Devaniranjan <
> devaniranjan at gmail.com> wrote:
>
>> Hi,
>>
>> I am sorry, I know this is not the appropriate forum for asking this
>> question but I am not sure which is, so please pardon me for asking this
>> here.
>> Any help in this matter would be much appreciated.
>>
>> I have some data that is quite large and I wish to find the relationship
>> between them but doing it by visual inspection is not possible.
>> I was told maybe Markov modelling might help.
>> My question is, is there a simple modelling tool that is out there I can
>> use to do Markov modelling? I would appreciate any help in this as I am
>> completely out of my depth in this question.
>>
>> Thank you very much and once again my apologies for asking this question
>> in
>> this forum.
>>
>> George
>>
>
>

From ferreirafm at usp.br Mon Aug 13 14:21:45 2012
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Mon, 13 Aug 2012 15:21:45 -0300
Subject: [Biopython] Markov modelling
In-Reply-To:
References:
Message-ID: <502945B9.5080108@usp.br>

Hi George,
It's not exactly a simple modelling tool, but you can give HMMER (http://hmmer.janelia.org) a try. It comes with very good tutorials on Markov modelling.
Best,
Fred

On 11-08-2012 18:37, George Devaniranjan wrote:
> Hi,
>
> I am sorry, I know this is not the appropriate forum for asking this
> question but I am not sure which is, so please pardon me for asking this
> here.
> Any help in this matter would be much appreciated.
>
> I have some data that is quite large and I wish to find the relationship
> between them but doing it by visual inspection is not possible.
> I was told maybe Markov modelling might help.
> My question is, is there a simple modelling tool that is out there I can
> use to do Markov modelling? I would appreciate any help in this as I am
> completely out of my depth in this question.
>
> Thank you very much and once again my apologies for asking this question in
> this forum.
>
> George
>

From smwilson at hpc.unm.edu Tue Aug 14 10:10:53 2012
From: smwilson at hpc.unm.edu (Susan Wilson)
Date: Tue, 14 Aug 2012 08:10:53 -0600
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
Message-ID: <502A5C6D.5040905@hpc.unm.edu>

Hi,

I am parsing the gb files with biopython. My problem is that none of the seqfeature.strand values are returning the plus strand (value == 1).

The commands below are a bit fabricated. (For instance, I have left out the opening and closing of fout.) I have read in Homo_sapiens.GRCh37.68.chromosome.1.dat using SeqIO.read.

The file output of command [13] shows only "-1" and "None". Is there a bug in the parser? Or am I making a mistake of some sort?

Thanks.

Susan

In [10]: genome
Out[10]: SeqRecord(seq=Seq('NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNN', Alphabet()), id='1GRCh37', name='1', description='Homo sapiens chromosome 1 GRCh37 full sequence 1..249250621 reannotated via EnsEMBL', dbxrefs=[])

In [11]: len(genome)
Out[11]: 249250621

In [12]: len(genome.features)
Out[12]: 109751

In [13]: for f in genome.features:
   ...:     fout.write(str(f.strand) + "~" + str(f.location) + \
   ...:         "~" + str(f.qualifiers.get('gene')) + "\n")

From p.j.a.cock at googlemail.com Tue Aug 14 10:46:54 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 15:46:54 +0100
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
In-Reply-To: <502A5C6D.5040905@hpc.unm.edu>
References: <502A5C6D.5040905@hpc.unm.edu>
Message-ID:

On Tue, Aug 14, 2012 at 3:10 PM, Susan Wilson wrote:
> Hi,
>
> I am parsing the gb files with biopython. My problem is that none of the
> seqfeature.strand values are returning the plus strand (value == 1).

That should happen with a protein sequence.
> The commands below are a bit fabricated. (For instance, I have left out the
> opening and closing of fout.) I have read in
> Homo_sapiens.GRCh37.68.chromosome.1.dat using SeqIO.read.

What URL are you getting that file from?

Which version of Biopython are you using? There were some strand related changes recently (internally moving it from the SeqFeature to the SeqFeature's location object).

Thanks,

Peter

From smwilson at hpc.unm.edu Tue Aug 14 10:54:24 2012
From: smwilson at hpc.unm.edu (Susan Wilson)
Date: Tue, 14 Aug 2012 08:54:24 -0600
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
In-Reply-To:
References: <502A5C6D.5040905@hpc.unm.edu>
Message-ID: <502A66A0.2090207@hpc.unm.edu>

Hi Peter,

Thanks for quick response. I have downloaded the files from ftp://ftp.ensembl.org/pub/release-68/genbank/homo_sapiens/. Got version 1.53 of biopython. Maybe I should try 1.6?

Here's some diagnostics:

$ head Homo_sapiens.GRCh37.68.chromosome.1.dat
LOCUS 1 249250621 bp DNA HTG 14-JUL-2012
DEFINITION Homo sapiens chromosome 1 GRCh37 full sequence 1..249250621 reannotated via EnsEMBL
ACCESSION chromosome:GRCh37:1:1:249250621:1
VERSION 1GRCh37
KEYWORDS .
SOURCE human
ORGANISM Homo sapiens
.
COMMENT This sequence was annotated by the Ensembl system. Please visit the

Output from ipython:

import sys
sys.version_info
Out[3]: (2, 6, 5, 'final', 0)
sys.version
Out[4]: '2.6.5 (r265:79063, Apr 16 2010, 13:57:41) \n[GCC 4.4.3]'
import Bio
print Bio.__version__
1.53

On 08/14/2012 08:46 AM, Peter Cock wrote:
> On Tue, Aug 14, 2012 at 3:10 PM, Susan Wilson wrote:
>> Hi,
>>
>> I am parsing the gb files with biopython. My problem is that none of the
>> seqfeature.strand values are returning the plus strand (value == 1).
> That should happen with a protein sequence.
>
>> The commands below are a bit fabricated. (For instance, I have left out the
>> opening and closing of fout.) I have read in
>> Homo_sapiens.GRCh37.68.chromosome.1.dat using SeqIO.read.
> What URL are you getting that file from?
>
> Which version of Biopython are you using? There were some strand
> related changes recently (internally moving it from the SeqFeature to
> the SeqFeature's location object).
>
> Thanks,
>
> Peter

From p.j.a.cock at googlemail.com Tue Aug 14 11:38:42 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 16:38:42 +0100
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
In-Reply-To: <502A66A0.2090207@hpc.unm.edu>
References: <502A5C6D.5040905@hpc.unm.edu> <502A66A0.2090207@hpc.unm.edu>
Message-ID:

On Tue, Aug 14, 2012 at 3:54 PM, Susan Wilson wrote:
> Hi Peter,
>
> Thanks for quick response. I have downloaded the files from
> ftp://ftp.ensembl.org/pub/release-68/genbank/homo_sapiens/. Got version 1.53
> of biopython. Maybe I should try 1.6?

Biopython 1.53 was released over two years ago (December 2009). The current release is 1.60 (one dot sixty), there never was a 1.6 (one dot six).

Yes, please try the current Biopython. It seems fine here at least - using this quick test I seem to get strands of +1 or -1 only as expected:

from Bio import SeqIO
genome = SeqIO.read("Homo_sapiens.GRCh37.68.chromosome.1.dat", "gb")
for f in genome.features:
    print f.strand, f.location, f.qualifiers.get("gene")

Going back to Biopython 1.53 on my machine (which didn't allow a filename in SeqIO thus needs an explicit open), I get a parser warning:

UserWarning: Malformed LOCUS line found - is this correct?
LOCUS 1 249250621 bp DNA HTG 14-JUL-2012

You should have seen this warning on your machine. Did you?

This meant the sequence wasn't considered DNA or RNA (but an unspecified alphabet), and as a result the strand wasn't set to +1, but left as None (which would normally only happen on proteins).

At some point the LOCUS line handling was updated, so it now does recognise this as a nucleotide sequence.
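A quick way to check for the symptom described above is to tally the strand values instead of printing one line per feature. The sketch below is an illustration under stated assumptions: `strand_counts` is a hypothetical helper, and it reads the strand from `feature.location.strand`, which is where this thread says the value now lives; on an older release such as 1.53 one would read `feature.strand` instead. Stand-in objects are used so the sketch runs without Biopython or the chromosome file.

```python
from collections import Counter
from types import SimpleNamespace


def strand_counts(features):
    """Count how often each strand value (+1, -1 or None) occurs."""
    return Counter(f.location.strand for f in features)


# Stand-in features for demonstration; real ones would come from
# SeqIO.read("Homo_sapiens.GRCh37.68.chromosome.1.dat", "gb").features
demo = [
    SimpleNamespace(location=SimpleNamespace(strand=s))
    for s in (1, -1, None, 1)
]
counts = strand_counts(demo)
print(counts)

# A nucleotide record whose tally contains no +1 at all would match the
# behaviour reported for Biopython 1.53 in this thread.
if counts[1] == 0:
    print("no plus-strand features found; check the parser version")
```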
Peter

From p.j.a.cock at googlemail.com Tue Aug 14 15:55:42 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 20:55:42 +0100
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
In-Reply-To: <502A9442.9070702@hpc.unm.edu>
References: <502A5C6D.5040905@hpc.unm.edu> <502A66A0.2090207@hpc.unm.edu> <502A9442.9070702@hpc.unm.edu>
Message-ID:

On Tue, Aug 14, 2012 at 7:09 PM, Susan Wilson wrote:
> Peter!
>
> Thanks. I've been in a meeting all morning and apologize for not responding
> sooner.
>
> Yes I always see the warning about the malformed LOCUS line. I googled
> around, but couldn't find much, except about the second entry on the line
> being a problem, but when I substituted a character string for the "1", it
> didn't change anything, and since most everything else I was parsing seemed
> correct, I was just ignoring it.
>
> I will get the 1.60 version of Biopython and try it all again. I greatly
> appreciate your assistance.
>
> Susan

Great - good luck,

Peter

From bala.biophysics at gmail.com Thu Aug 30 06:35:19 2012
From: bala.biophysics at gmail.com (Bala subramanian)
Date: Thu, 30 Aug 2012 12:35:19 +0200
Subject: [Biopython] creating atom list
Message-ID:

Friends,
1) Sorry if this question is repeated. I want to create an atom list for superimposition. What I understand from the documentation and examples given on the net is that I have to loop through model-chain-residues to create a list of the atoms that I need. Is there any way to select CA or backbone atoms in a single line rather than looping over the SMCRA hierarchy? Once I create a structure object, I just want to create a list of its backbone atoms.

2) I couldn't do a search in the Biopython user forum for previous questions posted on a topic. I looked in the following link provided in the Biopython wiki. Am I looking at the wrong link?

http://lists.open-bio.org/pipermail/biopython/

Thanks,
Bala

--
C.
Balasubramanian

From anaryin at gmail.com Thu Aug 30 07:08:56 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 30 Aug 2012 13:08:56 +0200
Subject: [Biopython] creating atom list
In-Reply-To:
References:
Message-ID:

On my phone, short answer:

# Backbone atom names as Bio.PDB reports them (upper case)
bb = set(['CA', 'C', 'N', 'O'])
selection = [atom for atom in struct.get_atoms() if atom.name in bb]

On 30 Aug 2012 12:38, "Bala subramanian" <bala.biophysics at gmail.com> wrote:

> Friends,
> 1) Sorry if this question is repeated. I want to create an atom list
> for super-imposition, what i understand from documentation and
> examples given on the net is that i have to loop through
> model-chain-residues to create a list of atoms that i need. Is there
> any way to select CA or backbone atoms in a single line rather than looping
> over the SMCRA hierarchy. Once i create a structure object, i just
> want to create a list of its backbone atoms.
>
> 2) I couldn't do a search in the biopython user forum for previous
> questions posted on a topic. I looked in the following link provided
> in Biopython wiki. Am i looking at the wrong link,
>
> http://lists.open-bio.org/pipermail/biopython/
>
> Thanks,
> Bala
>
> --
> C. Balasubramanian
>

From p.j.a.cock at googlemail.com Thu Aug 30 07:32:30 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 Aug 2012 12:32:30 +0100
Subject: [Biopython] creating atom list
In-Reply-To:
References:
Message-ID:

On Thu, Aug 30, 2012 at 11:35 AM, Bala subramanian wrote:
> Friends,
> 1) Sorry if this question is repeated. I want to create an atom list
> for super-imposition, what i understand from documentation and
> examples given on the net is that i have to loop through
> model-chain-residues to create a list of atoms that i need.
Is there > any way to select CA or backbone atoms in a single line rather looping > over the SMCRA hierarchy. Once i create a structure object, i just > want to create a list of its backbone atoms. Joao has given a short answer - see also this (old) example, http://www.warwick.ac.uk/go/peter_cock/python/protein_superposition/ > 2) I could nt do a search in the biopython user forum for previous > questions posted on a topic. I looked in the following link provided > in Biopython wiki. Am i looking at the wrong link, > > http://lists.open-bio.org/pipermail/biopython/ That is the archive, and can be searched in Google in combination with a URL restriction, e.g. use: superposition site:http://lists.open-bio.org/pipermail/biopython/ See also http://www.biopython.org/wiki/Mailing_lists and in particular the Gmane archive is searchable: http://dir.gmane.org/gmane.comp.python.bio.general Peter From mjldehoon at yahoo.com Wed Aug 1 05:14:52 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 31 Jul 2012 22:14:52 -0700 (PDT) Subject: [Biopython] Bio.Motif search_pwm Message-ID: <1343798092.97341.YahooMailClassic@web164004.mail.gq1.yahoo.com> Hi everybody, I was using the search_pwm method in Bio.Motif (which btw is very useful, thanks Bartek) to search for motif instances on both strands of a sequence. If the motif starts at position and is located on the forward strand, this function returns +position; if it is located on the reverse strand, it returns -position. So for position==0, we cannot deduce from the sign whether the motif is located on the forward or on the backward strand. How about using Python-style negative indices to indicate the strand? For example, +20 means that the motif is located at [20:20+motif_length] on the forward strand, while -20 means that the motif is located at [-20:-20+motif_length]. Alternatively, we could return the strand explicitly. 
In the same function, I wish we could get rid of this line: sequence=sequence.tostring().upper() since this assumes that sequence is a Biopython Seq object, and not a plain string. We could either use str(sequence) instead of sequence.tostring() to cover both cases, or have the Seq class inherit from strings (which we have been discussing for some time; see https://redmine.open-bio.org/issues/2351). Best, -Michiel. From p.j.a.cock at googlemail.com Wed Aug 1 08:31:15 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 1 Aug 2012 09:31:15 +0100 Subject: [Biopython] Bio.Motif search_pwm In-Reply-To: <1343798092.97341.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1343798092.97341.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Wed, Aug 1, 2012 at 6:14 AM, Michiel de Hoon wrote: > Hi everybody, > > I was using the search_pwm method in Bio.Motif (which btw > is very useful, thanks Bartek) to search for motif instances > on both strands of a sequence. If the motif starts at position > and is located on the forward strand, this function returns > +position; if it is located on the reverse strand, it returns > -position. So for position==0, we cannot deduce from the > sign whether the motif is located on the forward or on the > backward strand. That is a problem :( > How about using Python-style negative indices to indicate > the strand? For example, +20 means that the motif is > located at [20:20+motif_length] on the forward strand, > while -20 means that the motif is located at [-20:-20+motif_length]. > > Alternatively, we could return the strand explicitly. Either makes sense, but would be a break - but probably a necessary break in backwards compatibility. > In the same function, I wish we could get rid of this line: > > sequence=sequence.tostring().upper() > > since this assumes that sequence is a Biopython Seq > object, and not a plain string. Allowing a plain string makes good sense. 
+1

> We could either use str(sequence) instead of sequence.tostring()
> to cover both cases,

That would also accept other objects accidentally, e.g. a list,
and probably lead to some obscure errors downstream.

> or have the Seq class inherit from strings (which we have
> been discussing for some time; see
> https://redmine.open-bio.org/issues/2351).

Or perhaps the Seq is already string-like enough for this
function (it supports upper()) so no casting is needed? That
would be simpler - although likely not as fast.

Or, we could follow the pattern used in Bio.SeqUtils and try
the tostring() method, catching any AttributeError and then
treating it like a string (since real strings don't have this).
The advantage of this route is low risk.

Peter

From livingstonemark at gmail.com Thu Aug 2 04:41:29 2012
From: livingstonemark at gmail.com (Mark Livingstone)
Date: Thu, 2 Aug 2012 14:41:29 +1000
Subject: [Biopython] Superimpose description error?
Message-ID:

Hi Guys,

In Bio.PDB.Superimpose, it says:

set_atoms(self, fixed, moving)
Put (translate/rotate) the atoms in fixed on the atoms in moving, in
such a way that the RMSD is minimized.

Aren't the words fixed & moving in that description round the wrong way?

In my research at present I am using a curated set of 2,141 pairs of
PDB files, chosen because they have only 1 mutation. Unfortunately,
because Superimpose counts atoms before alignment, only 9 / 2,141 PDB
pairs will align using the example shown in Superimpose.py source
code.

The way I have gotten around this is for each of the two PDBs to make
a List of the CA atoms and then align these lists. Not optimal, but
seems to work.

The only other way I can see would be to get a full list of atoms then
snip out the mutation side chain atoms allowing Superimpose to work as
per the source code example - also not optimal but close.
I am doing this because I am experimenting with different ways of using RMSD, so the better I get the alignment, the better my results - even if it is only different in decimal place differences. Are there any better approaches? Thanks in advance, MArkL From mjldehoon at yahoo.com Thu Aug 2 13:23:30 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 2 Aug 2012 06:23:30 -0700 (PDT) Subject: [Biopython] Bio.Motif search_pwm In-Reply-To: Message-ID: <1343913810.81963.YahooMailClassic@web164006.mail.gq1.yahoo.com> Hi everybody, > On Wed, Aug 1, 2012 at 7:14 AM, Michiel de Hoon > wrote: > > Hi everybody, > > > > I was using the search_pwm method in Bio.Motif (which > btw is very useful, thanks Bartek) to search for motif > instances on both strands of a sequence. If the motif starts > at position and is located on the forward strand, this > function returns +position; if it is located on the reverse > strand, it returns -position. So for position==0, we cannot > deduce from the sign whether the motif is located on the > forward or on the backward strand. > > > > How about using Python-style negative indices to > indicate the strand? For example, +20 means that the motif > is located at [20:20+motif_length] on the forward strand, > while -20 means that the motif is located at > [-20:-20+motif_length]. > > Very nice idea! +1 from me Done; see https://github.com/biopython/biopython/commit/d7b67b7192b211b6bd1e4ca6e42eee55c2bc34a8 Best, -Michiel. From mjldehoon at yahoo.com Thu Aug 2 13:27:56 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 2 Aug 2012 06:27:56 -0700 (PDT) Subject: [Biopython] Bio.Motif search_pwm In-Reply-To: Message-ID: <1343914076.83712.YahooMailClassic@web164005.mail.gq1.yahoo.com> Hi all, > > We could either use str(sequence) instead of > > sequence.tostring() to cover both [plain strings and Seq objects] > > That would also accept other objects accidentally, e.g. a > list, and probably lead to some obscure errors downstream. 
>
> > or have the Seq class inherit from strings (which we
> > have been discussing for some time; see
> > https://redmine.open-bio.org/issues/2351).
>
> Or perhaps the Seq is already string like enough for this
> function (it supports upper()) so no casting is needed?

This indeed works, so I simply removed the casting.

> Or, we could follow the pattern used in Bio.SeqUtils and try
> the tostring() method, catching any AttributeError and then
> treating it like a string (since real strings don't have
> this). The advantage of this route is low risk.

To avoid these kinds of complications, at some point we should really
move forward to let Seq inherit from plain strings. We have been
discussing this issue for five years now.

Best,
-Michiel.

From anaryin at gmail.com Fri Aug 3 07:47:20 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 3 Aug 2012 09:47:20 +0200
Subject: [Biopython] Superimpose description error?
In-Reply-To:
References:
Message-ID:

Hey Mark,

Indeed, it is the wrong way round.

Well, I'd say it depends on what you are looking at. You can trim all
selections to bb + CB/H (Gly) so that you always have the same number
of atoms. That is what we do here for mutant analysis. You can also
write code to iteratively add side chain carbons until the wt and
mutant no longer match. This would give you a better description of
the side chain orientation.

On 2 Aug 2012 06:42, "Mark Livingstone" wrote:

> Hi Guys,
>
> In Bio.PDB.Superimpose, it says:
>
> set_atoms(self, fixed, moving)
> Put (translate/rotate) the atoms in fixed on the atoms in moving, in
> such a way that the RMSD is minimized.
>
> Aren't the words fixed & moving in that description round the wrong way?
>
> In my research at present I am using a curanted set of 2,141 pairs of
> PDB files curated because they have only 1 mutation.
Unfortunately, > because Superimpose counts atoms before alignment, only 9 / 2,141 PDB > pairs will align using the example shown in Superimpose.py source > code. > > The way I have gotten around this is for each of the two PDBs to make > a List of the CA atoms and the align these lists. Not optimal, but > seems to work. > > The only other way I can see would be to get a full list of atoms then > snip out the mutation side chain atoms allowing Superimpose to work as > per the source code example - also not optimal but close. > > I am doing this because I am experimenting with different ways of > using RMSD, so the better I get the alignment, the better my results - > even if it is only different in decimal place differences. Are there > any better approaches? > > Thanks in advance, > > MArkL > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From anna.kostikova at gmail.com Tue Aug 7 10:13:16 2012 From: anna.kostikova at gmail.com (Anna Kostikova) Date: Tue, 7 Aug 2012 12:13:16 +0200 Subject: [Biopython] Bio.Geography module (Geography.GbifXml) Message-ID: Dear list users, I am wondering if Bio.Geography module is supported in Biophython? 
I was trying to access it following http://biopython.org/wiki/BioGeography guidelines, but got an error message (ImportError: No module named Geography.GbifXml) Thanks a lot in advance, Anna From chapmanb at 50mail.com Wed Aug 8 13:57:35 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 08 Aug 2012 09:57:35 -0400 Subject: [Biopython] Bio.Geography module (Geography.GbifXml) In-Reply-To: References: Message-ID: <871ujhh4fk.fsf@fastmail.fm> Anna; Bio.Geography is available on a separate fork of Biopython and wasn't merged into the main codebase, so you'd need to install from there: http://github.com/nmatzke/biopython/tree/Geography Brad > Dear list users, > > I am wondering if Bio.Geography module is supported in Biophython? > I was trying to access it following > http://biopython.org/wiki/BioGeography guidelines, > but got an error message (ImportError: No module named Geography.GbifXml) > > Thanks a lot in advance, > Anna > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From anna.kostikova at gmail.com Thu Aug 9 09:35:35 2012 From: anna.kostikova at gmail.com (Anna Kostikova) Date: Thu, 9 Aug 2012 11:35:35 +0200 Subject: [Biopython] Bio.Geography module (Geography.GbifXml) In-Reply-To: <871ujhh4fk.fsf@fastmail.fm> References: <871ujhh4fk.fsf@fastmail.fm> Message-ID: Thanks a lot Brad, It works nicely once I've added it as separate module. Thanks again! Anna 2012/8/8 Brad Chapman : > > Anna; > Bio.Geography is available on a separate fork of Biopython and wasn't > merged into the main codebase, so you'd need to install from there: > > http://github.com/nmatzke/biopython/tree/Geography > > Brad > >> Dear list users, >> >> I am wondering if Bio.Geography module is supported in Biophython? 
>> I was trying to access it following >> http://biopython.org/wiki/BioGeography guidelines, >> but got an error message (ImportError: No module named Geography.GbifXml) >> >> Thanks a lot in advance, >> Anna >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython From devaniranjan at gmail.com Sat Aug 11 21:37:48 2012 From: devaniranjan at gmail.com (George Devaniranjan) Date: Sat, 11 Aug 2012 17:37:48 -0400 Subject: [Biopython] Markov modelling Message-ID: Hi, I am sorry, I know this is not the appropriate forum for asking this question but I am not sure which is, so please pardon me for asking this here. Any help in this matter would be much appreciated. I have some data that is quite large and I wish to find the relationship between them but doing it by visual inspection is not possible. I was told maybe Markov modelling might help. My question is, is there a simple modelling tool that is out there I can use to do Markov modelling? I would appreciate any help in this as I am completely out of my depth in this question. Thank you very much and once again my apologies for asking this question in this forum. George From chris.mit7 at gmail.com Sat Aug 11 22:32:42 2012 From: chris.mit7 at gmail.com (Chris Mitchell) Date: Sat, 11 Aug 2012 18:32:42 -0400 Subject: [Biopython] Markov modelling In-Reply-To: References: Message-ID: Hi George, A few things I would do: 1) Make a spare dataset by randomly selecting a subset of your data to visualize. Just blindly fitting the data points to any distribution usually doesn't end well. 2) Find a copy of Numerical Recipes and read up on fitting data/modeling. There is a good section on Markov Models and other fitting routines. 3) Think about what you are trying to fit. Is the process generating the data a Poisson Process? 
What sort of equation should be used to model it, and what do the extracted parameters mean in relation to your data? Start with the simplest model you can explain (linear) and add/change parameters only if you can justify it. Chris On Sat, Aug 11, 2012 at 5:37 PM, George Devaniranjan wrote: > Hi, > > I am sorry, I know this is not the appropriate forum for asking this > question but I am not sure which is, so please pardon me for asking this > here. > Any help in this matter would be much appreciated. > > I have some data that is quite large and I wish to find the relationship > between them but doing it by visual inspection is not possible. > I was told maybe Markov modelling might help. > My question is, is there a simple modelling tool that is out there I can > use to do Markov modelling? I would appreciate any help in this as I am > completely out of my depth in this question. > > Thank you very much and once again my apologies for asking this question in > this forum. > > George > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From devaniranjan at gmail.com Sun Aug 12 02:03:26 2012 From: devaniranjan at gmail.com (George Devaniranjan) Date: Sat, 11 Aug 2012 22:03:26 -0400 Subject: [Biopython] Markov modelling In-Reply-To: References: Message-ID: Thank you very much Chris--I will read the numerical recipes. I should have explained more about the data. I extracted unique 5 mer patterns of amino acids from a database and look at their secondary structures. I want to know is there a relationship between the amino acid sequence and the secondary structure. This is the reason I thought Markov modelling might give me some ideas. On Sat, Aug 11, 2012 at 6:32 PM, Chris Mitchell wrote: > Hi George, > > A few things I would do: > 1) Make a spare dataset by randomly selecting a subset of your data to > visualize. 
Just blindly fitting the data points to any distribution > usually doesn't end well. > 2) Find a copy of Numerical Recipes and read up on fitting data/modeling. > There is a good section on Markov Models and other fitting routines. > 3) Think about what you are trying to fit. Is the process generating the > data a Poisson Process? What sort of equation should be used to model it, > and what do the extracted parameters mean in relation to your data? Start > with the simplest model you can explain (linear) and add/change parameters > only if you can justify it. > > Chris > > On Sat, Aug 11, 2012 at 5:37 PM, George Devaniranjan < > devaniranjan at gmail.com> wrote: > >> Hi, >> >> I am sorry, I know this is not the appropriate forum for asking this >> question but I am not sure which is, so please pardon me for asking this >> here. >> Any help in this matter would be much appreciated. >> >> I have some data that is quite large and I wish to find the relationship >> between them but doing it by visual inspection is not possible. >> I was told maybe Markov modelling might help. >> My question is, is there a simple modelling tool that is out there I can >> use to do Markov modelling? I would appreciate any help in this as I am >> completely out of my depth in this question. >> >> Thank you very much and once again my apologies for asking this question >> in >> this forum. >> >> George >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > From ferreirafm at usp.br Mon Aug 13 18:21:45 2012 From: ferreirafm at usp.br (Frederico Moraes Ferreira) Date: Mon, 13 Aug 2012 15:21:45 -0300 Subject: [Biopython] Markov modelling In-Reply-To: References: Message-ID: <502945B9.5080108@usp.br> Hi George, That's not exactly a simple modelling tool, but you can give a try to the HMMER (http://hmmer.janelia.org). It comes with very good tutorials on Markov modelling. 
Best,
Fred

On 11-08-2012 18:37, George Devaniranjan wrote:
> Hi,
>
> I am sorry, I know this is not the appropriate forum for asking this
> question but I am not sure which is, so please pardon me for asking this
> here.
> Any help in this matter would be much appreciated.
>
> I have some data that is quite large and I wish to find the relationship
> between them but doing it by visual inspection is not possible.
> I was told maybe Markov modelling might help.
> My question is, is there a simple modelling tool that is out there I can
> use to do Markov modelling? I would appreciate any help in this as I am
> completely out of my depth in this question.
>
> Thank you very much and once again my apologies for asking this question in
> this forum.
>
> George
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>

From smwilson at hpc.unm.edu Tue Aug 14 14:10:53 2012
From: smwilson at hpc.unm.edu (Susan Wilson)
Date: Tue, 14 Aug 2012 08:10:53 -0600
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
Message-ID: <502A5C6D.5040905@hpc.unm.edu>

Hi,

I am parsing the gb files with biopython. My problem is that none of
the seqfeature.strand values are returning the plus strand (value == 1).

The commands below are a bit fabricated. (For instance, I have left out
the opening and closing of fout.) I have read in
Homo_sapiens.GRCh37.68.chromosome.1.dat using SeqIO.read.

The file output of command [13] shows only "-1" and "None". Is there a
bug in the parser? Or am I making a mistake of some sort?

Thanks.
Susan

In [10]: genome
Out[10]: SeqRecord(seq=Seq('NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNN', Alphabet()), id='1GRCh37', name='1', description='Homo sapiens chromosome 1 GRCh37 full sequence 1..249250621 reannotated via EnsEMBL', dbxrefs=[])

In [11]: len(genome)
Out[11]: 249250621

In [12]: len(genome.features)
Out[12]: 109751

In [13]: for f in genome.features:
   ...:     fout.write(str(f.strand) + "~" + str(f.location) + \
   ...:                "~" + str(f.qualifiers.get('gene')) + "\n")

From p.j.a.cock at googlemail.com Tue Aug 14 14:46:54 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 15:46:54 +0100
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
In-Reply-To: <502A5C6D.5040905@hpc.unm.edu>
References: <502A5C6D.5040905@hpc.unm.edu>
Message-ID:

On Tue, Aug 14, 2012 at 3:10 PM, Susan Wilson wrote:
> Hi,
>
> I am parsing the gb files with biopython. My problem is that none of the
> seqfeature.strand values are returning the plus strand (value == 1).

That should happen with a protein sequence.

> The commands below are a bit fabricated. (For instance, I have left out the
> opening and closing of fout.) I have read in
> Homo_sapiens.GRCh37.68.chromosome.1.dat using SeqIO.read.

What URL are you getting that file from?

Which version of Biopython are you using? There were some strand
related changes recently (internally moving it from the SeqFeature to
the SeqFeature's location object).

Thanks,

Peter

From smwilson at hpc.unm.edu Tue Aug 14 14:54:24 2012
From: smwilson at hpc.unm.edu (Susan Wilson)
Date: Tue, 14 Aug 2012 08:54:24 -0600
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
In-Reply-To:
References: <502A5C6D.5040905@hpc.unm.edu>
Message-ID: <502A66A0.2090207@hpc.unm.edu>

Hi Peter,

Thanks for quick response. I have downloaded the files from
ftp://ftp.ensembl.org/pub/release-68/genbank/homo_sapiens/. Got version
1.53 of biopython. Maybe I should try 1.6?
Here are some diagnostics:

$ head Homo_sapiens.GRCh37.68.chromosome.1.dat
LOCUS 1 249250621 bp DNA HTG 14-JUL-2012
DEFINITION Homo sapiens chromosome 1 GRCh37 full sequence 1..249250621
           reannotated via EnsEMBL
ACCESSION chromosome:GRCh37:1:1:249250621:1
VERSION 1GRCh37
KEYWORDS .
SOURCE human
  ORGANISM Homo sapiens
           .
COMMENT This sequence was annotated by the Ensembl system. Please visit the

Output from ipython:

import sys
sys.version_info
Out[3]: (2, 6, 5, 'final', 0)
sys.version
Out[4]: '2.6.5 (r265:79063, Apr 16 2010, 13:57:41) \n[GCC 4.4.3]'
import Bio
print Bio.__version__
1.53

On 08/14/2012 08:46 AM, Peter Cock wrote:
> On Tue, Aug 14, 2012 at 3:10 PM, Susan Wilson wrote:
>> Hi,
>>
>> I am parsing the gb files with biopython. My problem is that none of the
>> seqfeature.strand values are returning the plus strand (value == 1).
> That should happen with a protein sequence.
>
>> The commands below are a bit fabricated. (For instance, I have left out the
>> opening and closing of fout.) I have read in
>> Homo_sapiens.GRCh37.68.chromosome.1.dat using SeqIO.read.
> What URL are you getting that file from?
>
> Which version of Biopython are you using? There were some strand
> related changes recently (internally moving it from the SeqFeature to
> the SeqFeature's location object).
>
> Thanks,
>
> Peter

From p.j.a.cock at googlemail.com Tue Aug 14 15:38:42 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 16:38:42 +0100
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
In-Reply-To: <502A66A0.2090207@hpc.unm.edu>
References: <502A5C6D.5040905@hpc.unm.edu> <502A66A0.2090207@hpc.unm.edu>
Message-ID:

On Tue, Aug 14, 2012 at 3:54 PM, Susan Wilson wrote:
> Hi Peter,
>
> Thanks for quick response. I have downloaded the files from
> ftp://ftp.ensembl.org/pub/release-68/genbank/homo_sapiens/. Got version 1.53
> of biopython. Maybe I should try 1.6?

Biopython 1.53 was released over two years ago (December 2009).
The current release is 1.60 (one dot sixty); there never was a 1.6
(one dot six).

Yes, please try the current Biopython. It seems fine here at least -
using this quick test I seem to get strands of +1 or -1 only as
expected:

from Bio import SeqIO
genome = SeqIO.read("Homo_sapiens.GRCh37.68.chromosome.1.dat", "gb")
for f in genome.features:
    print f.strand, f.location, f.qualifiers.get("gene")

Going back to Biopython 1.53 on my machine (which didn't allow a
filename in SeqIO thus needs an explicit open), I get a parser
warning:

UserWarning: Malformed LOCUS line found - is this correct?
LOCUS 1 249250621 bp DNA HTG 14-JUL-2012

You should have seen this warning on your machine. Did you?

This meant the sequence wasn't considered DNA or RNA (but an
unspecified alphabet), and as a result the strand wasn't set
to +1, but left as None (which would normally only happen on
proteins).

At some point the LOCUS line handling was updated, so it now
does recognise this as a nucleotide sequence.

Peter

From p.j.a.cock at googlemail.com Tue Aug 14 19:55:42 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 20:55:42 +0100
Subject: [Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
In-Reply-To: <502A9442.9070702@hpc.unm.edu>
References: <502A5C6D.5040905@hpc.unm.edu> <502A66A0.2090207@hpc.unm.edu> <502A9442.9070702@hpc.unm.edu>
Message-ID:

On Tue, Aug 14, 2012 at 7:09 PM, Susan Wilson wrote:
> Peter!
>
> Thanks. I've been in a meeting all morning and apologize for not responding
> sooner.
>
> Yes I always see the warning about the malformed LOCUS line. I googled
> around, but couldn't find much, except about the second entry on the line
> being a problem, but when I substituted a character string for the "1", it
> didn't change anything, and since most everything else I was parsing seemed
> correct, I was just ignoring it.
>
> I will get the 1.60 version of Biopython and try it all again. I greatly
> appreciate your assistance.
>
> Susan

Great - good luck,

Peter

From bala.biophysics at gmail.com Thu Aug 30 10:35:19 2012
From: bala.biophysics at gmail.com (Bala subramanian)
Date: Thu, 30 Aug 2012 12:35:19 +0200
Subject: [Biopython] creating atom list
Message-ID:

Friends,
1) Sorry if this question is repeated. I want to create an atom list
for superimposition. What I understand from the documentation and
examples given on the net is that I have to loop through
model-chain-residues to create a list of the atoms that I need. Is
there any way to select CA or backbone atoms in a single line rather
than looping over the SMCRA hierarchy? Once I create a structure
object, I just want to create a list of its backbone atoms.

2) I couldn't do a search in the Biopython user forum for previous
questions posted on a topic. I looked in the following link provided
in the Biopython wiki. Am I looking at the wrong link?

http://lists.open-bio.org/pipermail/biopython/

Thanks,
Bala

--
C. Balasubramanian

From anaryin at gmail.com Thu Aug 30 11:08:56 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 30 Aug 2012 13:08:56 +0200
Subject: [Biopython] creating atom list
In-Reply-To:
References:
Message-ID:

On my phone, short answer:

bb = set(['CA', 'C', 'N', 'O'])
selection = [atom for atom in struct.get_atoms() if atom.name in bb]

On 30 Aug 2012 12:38, "Bala subramanian" <bala.biophysics at gmail.com> wrote:

> Friends,
> 1) Sorry if this question is repeated. I want to create an atom list
> for super-imposition, what i understand from documentation and
> examples given on the net is that i have to loop through
> model-chain-residues to create a list of atoms that i need. Is there
> any way to select CA or backbone atoms in a single line rather looping
> over the SMCRA hierarchy. Once i create a structure object, i just
> want to create a list of its backbone atoms.
>
> 2) I could nt do a search in the biopython user forum for previous
> questions posted on a topic.
I looked in the following link provided > in Biopython wiki. Am i looking at the wrong link, > > http://lists.open-bio.org/pipermail/biopython/ > > Thanks, > Bala > > -- > C. Balasubramanian > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Aug 30 11:32:30 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 Aug 2012 12:32:30 +0100 Subject: [Biopython] creating atom list In-Reply-To: References: Message-ID: On Thu, Aug 30, 2012 at 11:35 AM, Bala subramanian wrote: > Friends, > 1) Sorry if this question is repeated. I want to create an atom list > for super-imposition, what i understand from documentation and > examples given on the net is that i have to loop through > model-chain-residues to create a list of atoms that i need. Is there > any way to select CA or backbone atoms in a single line rather looping > over the SMCRA hierarchy. Once i create a structure object, i just > want to create a list of its backbone atoms. Joao has given a short answer - see also this (old) example, http://www.warwick.ac.uk/go/peter_cock/python/protein_superposition/ > 2) I could nt do a search in the biopython user forum for previous > questions posted on a topic. I looked in the following link provided > in Biopython wiki. Am i looking at the wrong link, > > http://lists.open-bio.org/pipermail/biopython/ That is the archive, and can be searched in Google in combination with a URL restriction, e.g. use: superposition site:http://lists.open-bio.org/pipermail/biopython/ See also http://www.biopython.org/wiki/Mailing_lists and in particular the Gmane archive is searchable: http://dir.gmane.org/gmane.comp.python.bio.general Peter
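
For readers following the "creating atom list" thread, the one-line selection discussed above can be sketched as a small helper. This is only an illustration under stated assumptions: the atoms are assumed to expose a `name` attribute holding upper-case PDB atom names (as Bio.PDB atoms do), and `FakeAtom`, `select_atoms` and `BACKBONE_NAMES` are hypothetical names introduced here so that the example runs without Biopython or a PDB file.

```python
from collections import namedtuple

# Backbone atom names in upper case. Assumption of this sketch: each atom
# object exposes its name through a ``name`` attribute.
BACKBONE_NAMES = frozenset(["N", "CA", "C", "O"])


def select_atoms(atoms, names=BACKBONE_NAMES):
    """Return the atoms whose name is in ``names``, preserving input order."""
    return [atom for atom in atoms if atom.name in names]


# Hypothetical stand-in for real atom objects, used only for this demo.
FakeAtom = namedtuple("FakeAtom", ["name", "resseq"])

atoms = [
    FakeAtom("N", 1), FakeAtom("CA", 1), FakeAtom("C", 1), FakeAtom("O", 1),
    FakeAtom("CB", 1),  # side chain atom, should be filtered out
    FakeAtom("N", 2), FakeAtom("CA", 2), FakeAtom("C", 2), FakeAtom("O", 2),
]

backbone = select_atoms(atoms)
ca_only = select_atoms(atoms, names=frozenset(["CA"]))
print(len(backbone), len(ca_only))
```

With a parsed structure, the same helper could presumably be fed `structure.get_atoms()` in place of the demo list. Selecting only CA or backbone atoms in both structures is also one way to give the two atom lists passed to the superimposer the same length, which the "Superimpose description error?" thread identified as the stumbling block for single-mutation pairs.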