From romeliasalomon at gmail.com Mon Dec 2 18:43:52 2013
From: romeliasalomon at gmail.com (Romelia Salomon)
Date: Mon, 2 Dec 2013 15:43:52 -0800
Subject: [Biopython] error object is not callable, but callable() returns True
Message-ID:

Hi

I am having problems using biopython on my laptop. I have Python version
v2.6 and I recently installed biopython 1.53-1 using synaptic (I am using
Ubuntu), but I can't get it to work right; I am having a problem running
even the examples in the tutorial.

When I try to use any of the command line wrappers I get an error saying
the object is not callable, even though when I check that same object with
callable() it does seem to be callable. For example:

mycomputer$ python
Python 2.6.5 (r265:79063, Sep 26 2013, 18:48:04)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.Align.Applications import MuscleCommandline
>>> muscle_cline = MuscleCommandline(input="opuntia.fasta")
>>> stdout, stderr = muscle_cline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'MuscleCommandline' object is not callable
>>> callable(MuscleCommandline)
True

Thanks for your help!

--
****************************************
Dr. Romelia Salomon Ferrer
Nagarajan Research Group
Immunology Department
City of Hope

From p.j.a.cock at googlemail.com Tue Dec 3 05:31:28 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 3 Dec 2013 10:31:28 +0000
Subject: [Biopython] error object is not callable, but callable() returns True
In-Reply-To:
References:
Message-ID:

Hi Romelia,

The simple answer is from the FAQ (in the current Biopython Tutorial),
http://biopython.org/DIST/docs/tutorial/Tutorial.html

"Why can't I run command line tools directly from the application wrappers?
You need Biopython 1.55 or later. Alternatively, use the Python
subprocess module directly."

You are using Biopython 1.53 (which is now four years old), and it
doesn't define the __call__ method that makes instances of the class
callable in this way. Your investigation *almost* identified the
problem, but the key test you missed was:

callable(muscle_cline)

Unfortunately checking callable(MuscleCommandline) just confirmed
you can invoke MuscleCommandline(...), which you did to create
the object stored as muscle_cline.

So, I would suggest you uninstall the (very old) Ubuntu Biopython
(using 'sudo apt-get remove python-biopython') and then instead
install the current release from source. See:
http://biopython.org/wiki/Download#Ubuntu_or_Debian

Regards,

Peter

From romeliasalomon at gmail.com Tue Dec 3 11:34:30 2013
From: romeliasalomon at gmail.com (Romelia Salomon)
Date: Tue, 3 Dec 2013 08:34:30 -0800
Subject: [Biopython] error object is not callable, but callable() returns True
In-Reply-To:
References:
Message-ID:

Thanks Peter! It works perfectly now.
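
For readers of the archive, the class-versus-instance distinction Peter
points out can be reproduced without Biopython. The sketch below is only an
illustration under assumptions: OldWrapper and NewWrapper are made-up
stand-ins (they are not Biopython classes) for a wrapper without and with a
__call__ method, and callable() is the ordinary Python built-in.

```python
class OldWrapper(object):
    """Hypothetical stand-in for a wrapper class that lacks __call__."""

    def __init__(self, input):
        self.input = input


class NewWrapper(OldWrapper):
    """Hypothetical stand-in for a wrapper class that defines __call__."""

    def __call__(self):
        # A real wrapper would run the external tool here; this sketch
        # only returns two placeholder strings.
        return "stdout for %s" % self.input, ""


old_cline = OldWrapper(input="opuntia.fasta")
new_cline = NewWrapper(input="opuntia.fasta")

# Any class is callable: calling it is how instances get created.
print(callable(OldWrapper))   # True
# An instance is callable only if its class defines __call__.
print(callable(old_cline))    # False
print(callable(new_cline))    # True

stdout, stderr = new_cline()
```

So callable(SomeClass) returning True says nothing about whether the
objects it creates can be called; the instance itself has to be tested.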

From aradwen at gmail.com Wed Dec 4 12:40:10 2013
From: aradwen at gmail.com (Aniba, Radhouane)
Date: Wed, 4 Dec 2013 12:40:10 -0500
Subject: [Biopython] Python crowd-coding
Message-ID: <65B72E1F-AAD9-4AE8-AD6A-8F075A3E216C@gmail.com>

Hello 'pythonians',

I apologize in advance if the message sounds like an advertisement, but I
thought it might be useful to the users of the Biopython list, so I will
keep it short.

I just wanted to announce the first public release of CodersCrowd, the
bioinformatics crowdcoding platform. This first release of our application
comes with a lot of features that will be, we hope, of great help for your
day-to-day bioinformatics development and research.

It was less than a year ago that I first thought about a platform where
people share their skills and methodologies to solve different kinds of
problems. After months of feedback from the community, I think that we have
here something that will be of great help, an addition to some excellent
initiatives that are already changing the way we are doing bioinformatics,
such as biostars, seqanswers and rosalind.

The first release of CodersCrowd comes already loaded with a lot of
features, but this is only the tip of the iceberg. We decided to go public
with version 1.1, which was ready about a month ago (we were running a
battery of tests to make sure everything works great). The future of
CodersCrowd is incredibly bright and we welcome you to join us on the
journey of its first chapter.

You can sign up here: coderscrowd.com, and I would be glad to have some
feedback if any.

Thanks,
Rad, the guy behind CodersCrowd

From cfriedline at vcu.edu Wed Dec 4 11:48:24 2013
From: cfriedline at vcu.edu (Christopher J Friedline)
Date: Wed, 4 Dec 2013 11:48:24 -0500
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
Message-ID:

Hi everyone,

I ran into this problem today, and wanted to bring it up to see if anyone
else has seen it. Searching the archives didn't get me anywhere. I'm
running a simple restriction analysis (on a random genome), and getting
this error. If I look for cut sites with the enzymes individually, it works
for each. However, when I put them into a RestrictionBatch, I get an error
on the class.

type object 'RestrictionType' has no attribute 'size'

I put an IPython notebook describing this here:

http://nbviewer.ipython.org/gist/cfriedline/7790932

Bio (1.6.2), IPython (1.1.0), Python (2.7.6/Anaconda 1.8)

Any help is appreciated,

Thanks,
Chris

From p.j.a.cock at googlemail.com Wed Dec 4 18:38:08 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 4 Dec 2013 23:38:08 +0000
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

Hi Chris,

Unfortunately when I tried it with a random genome like your example
it didn't fail... can you replace your random genome with something
fixed and still get the exception from the .with_sites() call?
i.e. Can we eliminate the randomness as a factor?

Here's a cut down example which works for me (64 bit Linux,
Python 2.7.3, latest Biopython from git):

>>> from Bio.Restriction import *
>>> from Bio.Seq import Seq
>>> g = Seq(EcoRI.site + "ACGT" + MstI.site + "ACGT" + EcoRI.site)
>>> rb = RestrictionBatch(['EcoRI', 'MstI'])
>>> ana = Analysis(rb, g, linear=True)
>>> ana.with_sites()
{EcoRI: [2, 22], MstI: [14]}

This rings a couple of bells (oddities with the .size property),
http://lists.open-bio.org/pipermail/biopython-dev/2013-October/010935.html

Also a much older thread where RestrictionBatch gave trouble,
which it appears was never properly resolved
http://lists.open-bio.org/pipermail/biopython/2009-December/006004.html
http://lists.open-bio.org/pipermail/biopython/2009-December/006005.html

That example fails today; with hindsight one of us should have
filed a bug rather than just forgetting about it once the original
poster's problem was solved.

It looks like it could be down to the version of Python...
Again using Biopython from git,

$ python
Python 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.Restriction import *
>>> EcoRI.size == len(EcoRI) == 6
True
>>> rb = RestrictionBatch(first=[], suppliers=['F','R'])
>>> len(rb) #varies depending on version of REBASE
235
>>> len([x for x in rb if x.size == 6])
125
>>> len(rb.lambdasplit(lambda x: x.size == 6)) #same?
0
>>> len(rb.lambdasplit(lambda x: len(x) == 6)) #same?
0

(This is probably a separate issue to yours though.)

Peter

From w.arindrarto at gmail.com Wed Dec 4 19:03:02 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 5 Dec 2013 01:03:02 +0100
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

Hi Christopher, Peter,

Thank you for reporting the issue. After prodding around, I have to
say this looks like a very interesting 'bug' :). I'm going to ramble a
bit below, so I'll give a short TL;DR first:

I don't think this is a bug from Biopython per se, since if you try
the same code in a regular Python shell, it works (as Peter has
shown). Don't get me wrong,
I do appreciate the report in IPython notebook form (I'm a user myself
and I think more people should use it, also for reporting bugs like
this). But in the context of fixing the problem, I'd suggest that you
use a regular Python shell instead.

Now, that being said, let's delve a bit into what's causing the bug
(with a small disclaimer: these are my observations and I'm quite sure
there are others who know more about this issue).

If you look at the stack
trace of the error, you'll see that a huge chunk is coming from the
IPython codebase. Initially I thought this was something of an
artifact (I expected the entire stack to trace only Biopython code),
but I was wrong. That stack trace really points to the cause of the
problem: the way IPython displays cell results in the browser.

More specifically, I've pinpointed this problem to the way IPython
displays restriction enzymes. You can try it in your same IPython
notebook:

from Bio import Restriction as rst
rst.EcoRI

The code above will never fail in a regular Python shell, but will
fail in IPython.

Why is this the case? I suspect it has something to do with the way
Biopython's Restriction module is written.
Turns out, there are already
some nifty metaclass tricks being employed there that allow a given
enzyme to have the class type of itself (e.g. instead of having a
RestrictionType class with an instance called EcoRI, we have a
metaclass RestrictionType with a class called EcoRI). To keep it
short, IPython seems to do an attribute lookup on these metaclasses
when it tries to display the object. You'll notice that on line 333
of
/data7/cfriedline/anaconda/envs/conda/lib/python2.7/site-packages/IPython/lib/pretty.py,
there is an expression that tries to get the class of the object to
display:

obj_class = getattr(obj, '__class__', None) or type(obj)

However, since these metaclasses have not been instantiated into a
class, they do not have any 'size' attribute, so IPython complains and
raises an error.

So to sum up, this behavior may be caused by IPython instead of
Biopython. It's perhaps a good idea to mention this in the IPython
forum / mailing list, too. They've been doing a very impressive job so
far, so I have a hunch this may have popped up in one of their
discussions.

I have to admit that my metaclass chops are still very limited as
these usually aren't required for most Python programming. But if
you're interested, this SO page
(http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python)
has a very easy to follow explanation about what they are and why they
are useful.

Anyway, that's enough for now I guess :).

Cheers,
Bow

P.S. Peter, perhaps these other Restriction bugs are related to the
metaclass? I've never delved that deep into the submodule's codebase,
but it looks like there are some very interesting tidbits there.

From cfriedline at vcu.edu Wed Dec 4 20:56:05 2013
From: cfriedline at vcu.edu (Christopher J Friedline)
Date: Wed, 4 Dec 2013 20:56:05 -0500
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

I may have spoken too soon about the notebook specificity - I got the same
error when I ran .full() on the Analysis object in the ipython console.

From cfriedline at vcu.edu Wed Dec 4 20:53:26 2013
From: cfriedline at vcu.edu (Christopher J Friedline)
Date: Wed, 4 Dec 2013 20:53:26 -0500
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

Bow,

Thanks for your excellent reply and for taking the time to look - it never
occurred to me to use the regular (or IPython) REPL - because it's just not
my way of developing. I think you're right about it not being a Biopython
bug, but I think the problems are related to the IPython notebook
specifically, rather than IPython in general. I downloaded the notebook
and ran it in the ipython console (%run magic), and it completes without
error. Good catch - I'll touch base with the IPython folks.

Thanks,
Chris

From antony.lee at berkeley.edu Wed Dec 4 22:38:22 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Wed, 4 Dec 2013 19:38:22 -0800
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

The restriction code does *way* more magic with metaclasses than needed
(and I don't think IPython is really to blame here as the code breaks some
fairly basic assumptions about the Python object model IMHO). I have in
fact a PR from nearly one year ago that basically reimplemented the whole
module from scratch (https://github.com/biopython/biopython/pull/148).
Feel free to try it.

Antony

From w.arindrarto at gmail.com Thu Dec 5 05:01:04 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 5 Dec 2013 11:01:04 +0100
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

Hi everyone,

Christopher: Ah yes, I actually meant the IPython notebook (but I
guess it turns out the same occurs in the IPython console then :) ).

Antony: That may be the case, too. But thanks for the pull request (I
think Peter has just looked at it, actually
https://github.com/biopython/biopython/pull/148). I do think there is
room for improvement there, especially since the code probably
predates modern Python conventions (& assumptions).
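
The metaclass arrangement discussed in this thread can be illustrated with
a toy example. This is a sketch under assumptions, not Biopython's
Restriction code and not a diagnosis of the IPython behaviour:
RestrictionTypeSketch and EcoRISketch are hypothetical names. It shows an
"enzyme" that is itself a class created by a metaclass, with size stored on
the enzyme class, and how looking size up on the metaclass instead produces
a message of the same shape as the one reported.

```python
class RestrictionTypeSketch(type):
    """Hypothetical metaclass: every 'enzyme' is a class of this type."""

    def __len__(cls):
        # len(EnzymeClass) is answered by the metaclass, using the
        # size attribute stored on the enzyme class itself.
        return cls.size


# Calling the metaclass directly builds the enzyme class; this spelling
# works on both Python 2 and Python 3.
EcoRISketch = RestrictionTypeSketch(
    "EcoRISketch", (object,), {"site": "GAATTC", "size": 6}
)

print(type(EcoRISketch) is RestrictionTypeSketch)
print(EcoRISketch.size == len(EcoRISketch) == 6)

# The attribute lives on the enzyme class, not on the metaclass, so a
# lookup on the metaclass itself fails with AttributeError.
try:
    RestrictionTypeSketch.size
    message = None
except AttributeError as exc:
    message = str(exc)
print(message)
```

Any code that inspects type(enzyme) rather than the enzyme class itself
would hit this kind of failure; whether that is what happens inside IPython
is not established here.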

Cheers,
Bow

From p.j.a.cock at googlemail.com Thu Dec 5 05:30:27 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 10:30:27 +0000
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

Even Frederic Sohm (the original author) agreed that the current
Bio.Restriction code is too complicated (I called it 'magic' in
our discussion back in 2010 regarding a Python 2.6 problem with
super http://bugzilla.open-bio.org/show_bug.cgi?id=2604 or now
https://redmine.open-bio.org/issues/2604 ). And also I dislike
the fact it does one-based counting.

However, none of our currently active developers really understand
the code, so changing it is hard - and backward compatibility
constrains us greatly. I think the best route forward is to replace
Bio.Restriction with a new, less complicated implementation that
follows modern Python conventions (using zero-based counting!),
likely based on Antony's branch
https://github.com/biopython/biopython/pull/148 and then deprecate
and later remove Bio.Restriction. (We should continue that debate
on the biopython-dev list, CC'd.)

In terms of Christopher's problems - it would not surprise me if
they are specific to IPython, since introspection of the 'magic'
classes seems problematic.

Regards,

Peter

From davidsshin at lbl.gov Thu Dec 5 07:05:00 2013
From: davidsshin at lbl.gov (David Shin)
Date: Thu, 5 Dec 2013 04:05:00 -0800
Subject: [Biopython] 400 error, Did Entrez block me or is it something else?
Message-ID:

Hi all -

During off peak times, I was working on a script that takes a list of
gi numbers (a list of 8) for protein sequences to use as input to get
fasta sequences via Entrez.efetch. I passed along my email address
during the searches. I am still new to python, etc. so had been
testing my program.

At some point I started getting the error below, and I'm not sure if
it is my program, my web provider, or if ncbi got mad at me. It's been
like this for about an hour, so giving up for the night.

Thanks

error:

Traceback (most recent call last):
  File "../gi-to-fasta5.py", line 11, in <module>
    handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", id=gi_numbers)
  File "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 144, in efetch
    return _open(cgi, variables, post)
  File "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 460, in _open
    raise exception
urllib2.HTTPError: HTTP Error 400: Bad Request

--
David Shin, Ph.D
Lawrence Berkeley National Labs
1 Cyclotron Road
MS 83-R0101
Berkeley, CA 94720
USA

From p.j.a.cock at googlemail.com Thu Dec 5 07:31:07 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 12:31:07 +0000
Subject: [Biopython] 400 error, Did Entrez block me or is it something else?
In-Reply-To:
References:
Message-ID:

Hi David,

This is probably an intermittent failure, perhaps the NCBI
is very busy or there could be a temporary network problem
somewhere. The chances are it will work tomorrow...

Peter

From dhoworth at mrc-lmb.cam.ac.uk Thu Dec 5 09:42:47 2013
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Thu, 05 Dec 2013 14:42:47 +0000
Subject: [Biopython] Bio.PDB local MMCIF files
Message-ID: <52A090E7.10706@mrc-lmb.cam.ac.uk>

I'm just looking at Bio.PDB with a view to switching over to it, so I
can use it to read MMCIF files in preparation for 4-character chain IDs
etc. I presently use a homegrown perl module to read PDB files, and I
have a local PDB mirror. I see from

http://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ

that there's some sort of facility to use local files, perhaps using
PDBList(), but I don't see from the FAQ or the PDBList class docs
exactly which directory it wants and whether it is PDB files only or a
complete archive. The only hints seem to point to a single directory of
PDB files or a divided directory hierarchy of PDB files, neither of
which will contain the MMCIF files. So is it:

/$TOP/data
/$TOP/data/structures
/$TOP/data/structures/all

Or is it some other technique entirely to access MMCIF files?

Looking at the code, it seems to want a full path to an MMCIF file, and
the MMCIF file must be uncompressed.
Since all the files in the archive are compressed, that would be
another thing I'd want to fix.

Oh, I'm currently using the V1.58 that came with the machine I'm
testing on. I can upgrade if essential.

Thanks and regards,
Dave

From anaryin at gmail.com Thu Dec 5 10:09:17 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 5 Dec 2013 16:09:17 +0100
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To: <52A090E7.10706@mrc-lmb.cam.ac.uk>
References: <52A090E7.10706@mrc-lmb.cam.ac.uk>
Message-ID:

Dear Dave,

I'm not quite sure I understood your question. PDBList is used to
download and maintain a local copy of the PDB, which would not suit you
since you are looking for mmCIF data. It could be tweaked however to
download mmCIF files. Is this what you are looking for?

As for mmCIF parsing and manipulation, currently the parser accepts a
path to the file (relative paths should do) but indeed it does not
handle compression. I think it would be up to the user to inflate the
gz file before parsing.

Best,

João

From phyrexian.kavu at gmail.com Thu Dec 5 10:13:02 2013
From: phyrexian.kavu at gmail.com (phyrexian.kavu at gmail.com)
Date: Thu, 5 Dec 2013 09:13:02 -0600
Subject: [Biopython] 400 error, Did Entrez block me or is it something else?
In-Reply-To:
References:
Message-ID:

Hi David!

I had the same error while trying to download hundreds of sequences
with Entrez.efetch. I'm afraid this error cannot be resolved on our
side, since as Peter says it is an internet error. I avoided it by
making a loop and adding try - except statements, so whenever this
error appeared, the script just went back and tried again until the
sequences were downloaded correctly. I know it is a brute force-like
approach, but this was the best solution I could think of. I hope it
helps you too.

Miguel

[Theropoda is my profession]

From anaryin at gmail.com Thu Dec 5 11:53:46 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 5 Dec 2013 17:53:46 +0100
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To: <52A0ADA8.5080406@mrc-lmb.cam.ac.uk>
References: <52A090E7.10706@mrc-lmb.cam.ac.uk> <52A0ADA8.5080406@mrc-lmb.cam.ac.uk>
Message-ID:

Hi Dave,

I understand your concern. Python has the gzip module that can
decompress the files on the fly and provide a handle for the content.
This will not work for the parsers since they expect a filename. I will
have a look at the parsers' code and if it's simple, I'll add a layer
to do exactly this.

Cheers,

João

From dhoworth at mrc-lmb.cam.ac.uk Thu Dec 5 11:45:28 2013
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Thu, 05 Dec 2013 16:45:28 +0000
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To:
References: <52A090E7.10706@mrc-lmb.cam.ac.uk>
Message-ID: <52A0ADA8.5080406@mrc-lmb.cam.ac.uk>

João Rodrigues wrote:
> Dear Dave,
>
> I'm not quite sure I understood your question. PDBList is used to download
> and maintain a local copy of the PDB, which would not suit you since you
> are looking for mmCIF data. It could be tweaked however to download mmCIF
> files. Is this what you are looking for?

Sorry, I didn't express myself very well. I misunderstood the purpose of
PDBList, and at the time thought it was simply a way to tell Biopython
where the local archive was. I already have access to a PDB/mmCIF
archive; I don't need to create one.

> As for mmCIF parsing and manipulation, currently the parser accepts a path
> to the file (relative paths should do) but indeed it does not handle
> compression. I think it would be up to the user to inflate the gz file
> before parsing..

I don't think that is very convenient, since all the files are normally
stored compressed. That's the usual case. Using a filename as the only
way to specify a file means that I would have to open the file in the
archive, read and uncompress it and store it in another file before
passing the name of that file to the mmCIF parser. Unless python
supports some means to incorporate a decompression layer specification
into the 'filename'? (Sorry, I'm new to python)

I would think that the 'nicest' solution would be for the parser to
recognize a compressed file and use a gzip layer to decompress it on the
fly.
Alternatively, the parser could accept an open file handle as an
alternative to a filename and the caller would be responsible for
opening the file through a decompression layer.

Since the caller is going to have to deal with prepending the library
base to the filename anyway, I suppose having it produce a decompressed
stream is not a great problem, if only it could pass the stream to the
parser!

Cheers, Dave

> Best,
>
> João

PS, Sorry if I'm breaking threads by replying to the copy of the email
that João sent to me, but the copy from the mail server hasn't arrived
here yet, despite being visible at gmane.

From p.j.a.cock at googlemail.com Thu Dec 5 12:35:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 17:35:50 +0000
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To:
References: <52A090E7.10706@mrc-lmb.cam.ac.uk> <52A0ADA8.5080406@mrc-lmb.cam.ac.uk>
Message-ID:

Yeah, most of Biopython's parsers take a handle only, some
like the top level functions in Bio.SeqIO, AlignIO, SearchIO
will take a filename or a handle for convenience.

I agree that it is unfortunate that Bio.PDB currently only
takes a filename (historical design choice).
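
Until Bio.PDB accepts handles, one workaround for the compressed-archive
case raised in this thread is to inflate each file to a temporary
location and pass that path to the parser. The following is only a
minimal sketch under that assumption: `inflate_to_tempfile` and
`parse_gzipped_mmcif` are hypothetical helper names, not part of
Biopython, and the second function needs Biopython installed.

```python
import gzip
import os
import shutil
import tempfile


def inflate_to_tempfile(gz_path, suffix=".cif"):
    """Decompress a .gz file into a temporary file and return its path.

    The caller is responsible for deleting the temporary file.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as out, gzip.open(gz_path, "rb") as src:
        shutil.copyfileobj(src, out)  # stream the data, no full read into memory
    return tmp_path


def parse_gzipped_mmcif(structure_id, gz_path):
    """Hypothetical usage: parse a gzip-compressed mmCIF file by filename."""
    # Imported here so the helper above works without Biopython.
    from Bio.PDB.MMCIFParser import MMCIFParser

    tmp_path = inflate_to_tempfile(gz_path)
    try:
        return MMCIFParser().get_structure(structure_id, tmp_path)
    finally:
        os.remove(tmp_path)  # clean up the inflated copy
```

The cost is one extra copy on disk per structure, which is the
inconvenience Dave describes; the temporary file is removed as soon as
parsing finishes.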

If we can tweak it to take either a filename or a handle
that would be much better :)

Regards,

Peter

From davidsshin at lbl.gov Fri Dec 6 02:27:16 2013
From: davidsshin at lbl.gov (David Shin)
Date: Thu, 5 Dec 2013 23:27:16 -0800
Subject: [Biopython] going from protein to gene to oligos for cloning
Message-ID:

Hi again,

I'm trying to use biopython to help me grab a lot of protein sequences
that will eventually be used as the basis for cloning. I'm almost done
screening my protein sequences, and pretty much ok on that part...

I was just curious if anyone has already developed, or has any decent
advice on, going from protein codes to getting the actual coding
sequences of the genes.

At this point, my plan is to take protein codes (i.e. the numbers in
gi|145323746|) and use these to search the Entrez nucleotide databases
directly to get hits (I have tested it once and it seems to work to get
GenBank records), then try to use the information inside to get the
nucleotide sequences... or I guess the other way is to use the top hit
from tblastn somehow?

Thanks,
Dave

--
David Shin, Ph.D
Lawrence Berkeley National Labs
1 Cyclotron Road
MS 83-R0101
Berkeley, CA 94720
USA

From davidsshin at lbl.gov Fri Dec 6 02:20:00 2013
From: davidsshin at lbl.gov (David Shin)
Date: Thu, 5 Dec 2013 23:20:00 -0800
Subject: [Biopython] 400 error, Did Entrez block me or is it something else?
In-Reply-To:
References:
Message-ID:

Thanks guys -

For now it eventually worked its way out, but if you could show me a
snippet of how try and except statements work, I would be thankful (I
know I can probably google it too). I'm also new to python in general,
so coding examples always help.

Thanks again,

Dave

--
David Shin, Ph.D
Lawrence Berkeley National Labs
1 Cyclotron Road
MS 83-R0101
Berkeley, CA 94720
USA

From karin.lagesen at medisin.uio.no Fri Dec 6 04:26:06 2013
From: karin.lagesen at medisin.uio.no (Karin Lagesen)
Date: Fri, 06 Dec 2013 10:26:06 +0100
Subject: [Biopython] 400 error, Did Entrez block me or is it something else?
In-Reply-To:
References:
Message-ID: <52A1982E.1070501@medisin.uio.no>

IIRC, NCBI requires a 3 second delay between fetches to avoid flooding
the server. I cannot find again the site where I found this, but that's
what I learned the last time I did this, and I am also finding the same
info in the bioperl documentation.

Karin

--
Karin Lagesen, Ph.D.
Department of Medical Genetics and
Norwegian High-Throughput Sequencing Centre (NSC)
Oslo University Hospital

From norbert.auer at boku.ac.at Fri Dec 6 05:09:18 2013
From: norbert.auer at boku.ac.at (Norbert Auer)
Date: Fri, 06 Dec 2013 11:09:18 +0100
Subject: [Biopython] 400 error, Did Entrez block me or is it something else?
In-Reply-To:
References:
Message-ID: <52A1A24E.80403@boku.ac.at>

Hi David,

The Entrez Service seems to be very buggy. Sometimes I even got
different results for the same query. Therefore I stopped experiments
with the Entrez API Service.

But you can use a faster and more robust approach. You can simply
download the whole data at once from NCBI's ftp site.
For example, the address for the complete gene information from mouse is:

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Mus_musculus.ags.gz

Then you can use the NCBI ToolKit utility programs to convert this ags
binary file into XML. This is the same format you get from the
Entrez.efetch command. It is also described in the Biopython tutorial:

http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec128

You can use this simple shell script to download the data, extract the
right genome, and convert it into XML:

curl -C - ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/{All_Mammalia.ags.gz} -o "Sources/ASN/#1"
gene2xml -b T -c T -i Sources/ASN/All_Mammalia.ags.gz -t 10029 -o Output/10029.xml

These lines extract all the gene information for Cricetulus griseus (10029)
in one go.

I am also working on a parser to extract RefSeq and GI ids
(genomic-mRNA-peptide) from this XML into CSV and HTML files. If you
happen to work with mouse or CHO sequences, you can take a look here:

http://ala.boku.ac.at:4080/nauer/tools/genomes

Hope this helps.

Best regards,

Norbert

On 2013-12-06 08:20, David Shin wrote:
> Thanks guys -
>
> For now it eventually worked its way out, but if you could show me a
> snippet of how try and except statements work, I would be thankful (I know
> I can probably google it too), I'm also new to python in general, so coding
> examples always help.
>
> Thanks again,
>
> Dave
>
>
> On Thu, Dec 5, 2013 at 7:13 AM, wrote:
>
>> Hi David!
>>
>> I had the same error while trying to download hundreds of sequences from
>> Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter
>> says it is an internet error. I avoided it by making a loop and adding the
>> try - except statements, so whenever this error appeared, the script just
>> returned and tried it again until the sequences were downloaded correctly.
>> I know it is a brute force-like approach, but this was the best solution I
>> could think of, I hope It helps you too.
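A minimal sketch of the loop-plus-try/except approach described in the paragraph above, for readers who want the kind of snippet David asked for. The helper name `fetch_with_retry`, its parameters, and the back-off values are illustrative assumptions and are not part of Biopython; the commented usage assumes the `Entrez.efetch` call from David's script.

```python
import time


def fetch_with_retry(fetch, attempts=5, delay=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch    -- zero-argument callable that performs the download
    attempts -- maximum number of tries before giving up
    delay    -- seconds to wait before the first retry (doubled each time)
    sleep    -- injectable so the wait can be skipped when testing
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except IOError as err:
            # urllib's HTTPError and URLError are IOError subclasses,
            # so network failures such as "HTTP Error 400" land here.
            if attempt == attempts:
                raise  # out of tries: let the caller see the real error
            print("Attempt %d failed (%s); retrying in %.0f s"
                  % (attempt, err, delay))
            sleep(delay)
            delay *= 2


# Hypothetical usage with Biopython (needs network access):
#
#   from Bio import Entrez
#   Entrez.email = "you@example.org"
#   handle = fetch_with_retry(lambda: Entrez.efetch(
#       db="protein", rettype="fasta", retmode="text", id=gi_numbers))
#   print(handle.read())
```

Capping the number of attempts matters here: an HTTP 400 can also mean the request itself is malformed, in which case retrying forever would never succeed.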
>> >> Miguel >> >> >> [Theropoda is my profession] >> >>> On 05/12/2013, at 06:31, Peter Cock wrote: >>> >>> Hi David, >>> >>> This is probably an intermittent failure, perhaps the NCBI >>> is very busy or there could be a temporary network problem >>> somewhere. The chances are it will work tomorrow... >>> >>> Peter >>> >>> >>>> On Thu, Dec 5, 2013 at 12:05 PM, David Shin wrote: >>>> Hi all - >>>> >>>> During off peak times, I was working on a script that takes a list of gi >>>> numbers (a list of 8) for protein sequences to use as input to get fasta >>>> sequences via Entrez.efetch. I passed along my email address during the >>>> searches. I am still new to python, etc. so had been testing my program. >>>> >>>> At some point I started getting the error below, and I'm not sure if it >> is >>>> my program, my web provider, or if ncbi got mad at me. It's been like >> this >>>> for about an hour, so giving up for the night. >>>> >>>> Thanks >>>> >>>> error: >>>> >>>> Traceback (most recent call last): >>>> File "../gi-to-fasta5.py", line 11, in >>>> handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", >>>> id=gi_numbers) >>>> File >>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>> line 144, in efetch >>>> return _open(cgi, variables, post) >>>> File >>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>> line 460, in _open >>>> raise exception >>>> urllib2.HTTPError: HTTP Error 400: Bad Request >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> David Shin, Ph.D >>>> Lawrence Berkeley National Labs >>>> 1 Cyclotron Road >>>> MS 83-R0101 >>>> Berkeley, CA 94720 >>>> USA >>>> _______________________________________________ >>>> Biopython mailing list - Biopython at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython >>> _______________________________________________ >>> Biopython mailing list - Biopython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython 
>> > > > From norbert.auer at boku.ac.at Fri Dec 6 05:05:10 2013 From: norbert.auer at boku.ac.at (Norbert Auer) Date: Fri, 06 Dec 2013 11:05:10 +0100 Subject: [Biopython] 400 error, Did Entrez block me or is it something else? In-Reply-To: <52A1982E.1070501@medisin.uio.no> References: <52A1982E.1070501@medisin.uio.no> Message-ID: <52A1A156.1050701@boku.ac.at> This is automatically enforced by Biopython. See http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec119 Best regards, Norbert Am 2013-12-06 10:26, schrieb Karin Lagesen: > IIRC, NCBI requires a 3 second delay between fetches to avoid flooding > the server. I cannot find again the site where I found this, but that's > what I learned the last time I did this, and I am also finding the same > info in bioperl documentation. > > Karin > > On 06.12.2013 08:20, David Shin wrote: >> Thanks guys - >> >> For now it eventually worked its way out, but if you could show me a >> snippet of how try and except statements work, I would be thankful (I >> know >> I can probably google it too), I'm also new to python in general, so >> coding >> examples always help. >> >> Thanks again, >> >> Dave >> >> >> On Thu, Dec 5, 2013 at 7:13 AM, wrote: >> >>> Hi David! >>> >>> I had the same error while trying to download hundreds of sequences from >>> Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter >>> says it is an internet error. I avoided it by making a loop and >>> adding the >>> try - except statements, so whenever this error appeared, the script >>> just >>> returned and tried it again until the sequences were downloaded >>> correctly. >>> I know it is a brute force-like approach, but this was the best >>> solution I >>> could think of, I hope It helps you too. 
>>> >>> Miguel >>> >>> >>> [Theropoda is my profession] >>> >>>> On 05/12/2013, at 06:31, Peter Cock wrote: >>>> >>>> Hi David, >>>> >>>> This is probably an intermittent failure, perhaps the NCBI >>>> is very busy or there could be a temporary network problem >>>> somewhere. The chances are it will work tomorrow... >>>> >>>> Peter >>>> >>>> >>>>> On Thu, Dec 5, 2013 at 12:05 PM, David Shin >>>>> wrote: >>>>> Hi all - >>>>> >>>>> During off peak times, I was working on a script that takes a list >>>>> of gi >>>>> numbers (a list of 8) for protein sequences to use as input to get >>>>> fasta >>>>> sequences via Entrez.efetch. I passed along my email address during >>>>> the >>>>> searches. I am still new to python, etc. so had been testing my >>>>> program. >>>>> >>>>> At some point I started getting the error below, and I'm not sure >>>>> if it >>> is >>>>> my program, my web provider, or if ncbi got mad at me. It's been like >>> this >>>>> for about an hour, so giving up for the night. 
>>>>> >>>>> Thanks >>>>> >>>>> error: >>>>> >>>>> Traceback (most recent call last): >>>>> File "../gi-to-fasta5.py", line 11, in >>>>> handle = Entrez.efetch(db="protein", rettype="fasta", >>>>> retmode="text", >>>>> id=gi_numbers) >>>>> File >>>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>>> >>>>> line 144, in efetch >>>>> return _open(cgi, variables, post) >>>>> File >>>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>>> >>>>> line 460, in _open >>>>> raise exception >>>>> urllib2.HTTPError: HTTP Error 400: Bad Request >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> David Shin, Ph.D >>>>> Lawrence Berkeley National Labs >>>>> 1 Cyclotron Road >>>>> MS 83-R0101 >>>>> Berkeley, CA 94720 >>>>> USA >>>>> _______________________________________________ >>>>> Biopython mailing list - Biopython at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biopython >>>> _______________________________________________ >>>> Biopython mailing list - Biopython at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython >>> >> >> >> > > From davidsshin at lbl.gov Fri Dec 6 05:04:53 2013 From: davidsshin at lbl.gov (David Shin) Date: Fri, 6 Dec 2013 02:04:53 -0800 Subject: [Biopython] 400 error, Did Entrez block me or is it something else? In-Reply-To: <52A1982E.1070501@medisin.uio.no> References: <52A1982E.1070501@medisin.uio.no> Message-ID: Hi Karin, Yes, I saw that in the tutorial manual, but appeared biopython took care of that part. I was thinking about putting a delay in my script anyway. On Fri, Dec 6, 2013 at 1:26 AM, Karin Lagesen wrote: > IIRC, NCBI requires a 3 second delay between fetches to avoid flooding the > server. I cannot find again the site where I found this, but that's what I > learned the last time I did this, and I am also finding the same info in > bioperl documentation. 
> > Karin > > > On 06.12.2013 08:20, David Shin wrote: > >> Thanks guys - >> >> For now it eventually worked its way out, but if you could show me a >> snippet of how try and except statements work, I would be thankful (I know >> I can probably google it too), I'm also new to python in general, so >> coding >> examples always help. >> >> Thanks again, >> >> Dave >> >> >> On Thu, Dec 5, 2013 at 7:13 AM, wrote: >> >> Hi David! >>> >>> I had the same error while trying to download hundreds of sequences from >>> Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter >>> says it is an internet error. I avoided it by making a loop and adding >>> the >>> try - except statements, so whenever this error appeared, the script just >>> returned and tried it again until the sequences were downloaded >>> correctly. >>> I know it is a brute force-like approach, but this was the best solution >>> I >>> could think of, I hope It helps you too. >>> >>> Miguel >>> >>> >>> [Theropoda is my profession] >>> >>> On 05/12/2013, at 06:31, Peter Cock wrote: >>>> >>>> Hi David, >>>> >>>> This is probably an intermittent failure, perhaps the NCBI >>>> is very busy or there could be a temporary network problem >>>> somewhere. The chances are it will work tomorrow... >>>> >>>> Peter >>>> >>>> >>>> On Thu, Dec 5, 2013 at 12:05 PM, David Shin >>>>> wrote: >>>>> Hi all - >>>>> >>>>> During off peak times, I was working on a script that takes a list of >>>>> gi >>>>> numbers (a list of 8) for protein sequences to use as input to get >>>>> fasta >>>>> sequences via Entrez.efetch. I passed along my email address during the >>>>> searches. I am still new to python, etc. so had been testing my >>>>> program. >>>>> >>>>> At some point I started getting the error below, and I'm not sure if it >>>>> >>>> is >>> >>>> my program, my web provider, or if ncbi got mad at me. It's been like >>>>> >>>> this >>> >>>> for about an hour, so giving up for the night. 
>>>>> >>>>> Thanks >>>>> >>>>> error: >>>>> >>>>> Traceback (most recent call last): >>>>> File "../gi-to-fasta5.py", line 11, in >>>>> handle = Entrez.efetch(db="protein", rettype="fasta", >>>>> retmode="text", >>>>> id=gi_numbers) >>>>> File >>>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/ >>>>> Entrez/__init__.py", >>>>> line 144, in efetch >>>>> return _open(cgi, variables, post) >>>>> File >>>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/ >>>>> Entrez/__init__.py", >>>>> line 460, in _open >>>>> raise exception >>>>> urllib2.HTTPError: HTTP Error 400: Bad Request >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> David Shin, Ph.D >>>>> Lawrence Berkeley National Labs >>>>> 1 Cyclotron Road >>>>> MS 83-R0101 >>>>> Berkeley, CA 94720 >>>>> USA >>>>> _______________________________________________ >>>>> Biopython mailing list - Biopython at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biopython >>>>> >>>> _______________________________________________ >>>> Biopython mailing list - Biopython at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython >>>> >>> >>> >> >> >> > > -- > Karin Lagesen, Ph.D. 
> Department of Medical Genetics > and Norwegian High-Throughput Sequencing Centre (NSC) > Oslo University Hospital > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- David Shin, Ph.D Lawrence Berkeley National Labs 1 Cyclotron Road MS 83-R0101 Berkeley, CA 94720 USA From p.j.a.cock at googlemail.com Fri Dec 6 05:24:38 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 6 Dec 2013 10:24:38 +0000 Subject: [Biopython] going from protein to gene to oligos for cloning In-Reply-To: References: Message-ID: On Fri, Dec 6, 2013 at 7:27 AM, David Shin wrote: > Hi again, > > I'm trying to use biopython to help me grab a lot of protein sequences that > will eventually be used as the basis for cloning. I'm almost done screening > my protein sequences, and pretty much ok on that part... > > I was just curious if anyone has already developed, or has any decent > advice on going from protein codes to getting the actual coding sequences > of the genes. > > At this point, my plan is to take protein codes (ie. numbers in > gi|145323746|) and use these to search entrez nucleotide databases directly > to get hits (I have tested it once seems to work to get genbank records... > then try to use the information inside to get the nucleotide sequences... > or I guess the other way is to use the top hit from tblastn somehow? > > Thanks, > > Dave Hi Dave, The catch here is the protein IDs are not directly usable in the nucleotide database - which is where ELink (Entrez Link) comes in, available as the Entrez.elink(...) function in Biopython. 
I've not tried it myself, but a colleague posted a long example on his
blog which sounds close to what you are aiming for:

http://armchairbiology.blogspot.co.uk/2013/02/surely-this-has-been-done-already.html
https://github.com/widdowquinn/scripts/blob/master/bioinformatics/get_NCBI_cds_from_protein.py

Peter

From tiagoantao at gmail.com Fri Dec 6 06:27:28 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Dec 2013 11:27:28 +0000
Subject: [Biopython] Biopython 1.63 released
Message-ID:

Source distributions and Windows installers for Biopython 1.63 are now
available from the downloads page on the official Biopython website and
(soon) from the Python Package Index (PyPI).

The current version removed the requirement of the 2to3 library. This was
made possible by dropping Python 2.5 (and Jython 2.5). This release of
Biopython supports Python 2.6 and 2.7, and also Python 3.3.

The Biopython Tutorial & Cookbook, and the docstring examples in the source
code, now use the Python 3 style print function in place of the Python 2
style print statement. This language feature is available under Python 2.6
and 2.7 via:

from __future__ import print_function

Similarly we now use the Python 3 style built-in next function in place of
the Python 2 style iterators' .next() method. This language feature is also
available under Python 2.6 and 2.7.

The restriction enzyme list in Bio.Restriction has been updated to the
December 2013 release of REBASE.

Many thanks to the Biopython developers and community for making this
release possible, especially the following contributors:

Chris Mitchell (first contribution)
Christian Brueffer
Eric Talevich
Gokcen Eraslan (first contribution)
Josha Inglis (first contribution)
Konstantin Tretyakov (first contribution)
Lenna Peterson
Martin Mokrejs
Nigel Delaney (first contribution)
Peter Cock
Sergei Lebedev (first contribution)
Tiago Antao
Wayne Decatur (first contribution)
Wibowo 'Bow' Arindrarto

From davidsshin at lbl.gov Tue Dec 10 02:59:33 2013
From: davidsshin at lbl.gov (David Shin)
Date: Mon, 9 Dec 2013 23:59:33 -0800
Subject: [Biopython] using prosite
Message-ID:

Hi Again,

I'm trying to find motifs in a group of proteins using prosite via
biopython. I've tried to copy what's in the tutorial (pg 152-153) into a
script to get started.

If I run it on my test sequence I get nothing, so I assume the default is
to "exclude motifs with a high probability of occurrence from the scan"
like on the website. So I'm trying to substitute some flag to turn off this
feature, or to just scan for specific motifs, ie. PS00001 Asn-Glycosylation,
but have had no luck. Any suggestions on syntax here?

Below is a basic script where I tried to give a specific motif:

from Bio.ExPASy import ScanProsite

sequence = "MSQHLLLLILLSLLLLSLLLLHKPISATTIIQKFKEAPQFYNSADCPLIDDSESDDDVVAKPIFCSRRAVHVAMTLDAAYIRGSVAAVLSVLQHSSCPENIVFHFVASASADASSLRATISSSFPYLDFTVYVFNVSSVSRLISSSIRSALDCPLNYARSYLADLLPPCVRRVVYLDSDLILVDDIAKLAATDLGRDSVLAAPEYCNANFTSYFTSTFWSNPTLSLTFADRKACYFNTGVMVIDLSRWREGAYTSRIEEWMAMQKRMRIYELGSLPPFLLVFAGLIKPVNHRWNQHGLGGDNFRGLCRDLHPGPVSLLHWSGKGKPWARLDAGRPCPLDALWAPYPDPDPDDLLQTPFALDS"

handle = ScanProsite.scan(seq=sequence, accession=PS00001)
result = ScanProsite.read(handle)

print type(result)
print len(result)
print result[0]
print result[1]

Thanks!

Dave

From philipp.schiffer at gmail.com Sat Dec 21 10:46:40 2013
From: philipp.schiffer at gmail.com (Philipp Schiffer)
Date: Sat, 21 Dec 2013 16:46:40 +0100
Subject: [Biopython] Python getting stuck reading fastq file
Message-ID: <981B2822CDB34C8B89869B7C68D04898@gmail.com>

Hi!

I am experiencing a problem when reading from a fastq file (qualities in
Sanger scoring). Whatever I do, at one point through my file the reading
(or writing, or comparing, which is later in my script and excluded here)
gets stuck. The following output is from an ipython session (Python 2.7.5).
Biopython was installed through pip on a Scientific Linux 6.2 system.
Is this an error with the SeqIO parser? Or am I doing something wrong?

Any help with this would be highly appreciated.

Kind regards

Philipp

import string
from pprint import pprint
import os
from Bio import SeqIO
from subprocess import call
import sys
import re

fqoutfile = open('/data2/PS1159/reads_Feb_12/fqoutfile.fq', 'w')
my_abundantheads=set()
my_onetwo = re.compile('\/[1-2]')

abundant = open('/data2/PS1159/reads_Feb_12/120126_0281_AD088PACXX_5_SA-PE-023_shuf.clean.fq.gz.keep.gz.abundfilt', 'rU')
for record in SeqIO.parse(abundant, "fastq"):
    ids = my_onetwo.split(record.id)
    my_abundantheads.update([ids[0]])

...

^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
in ()
----> 1 for record in SeqIO.parse(abundant, "fastq"):
      2     ids = my_onetwo.split(record.id)
      3     my_abundantheads.update([ids[0]])
      4

/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/__init__.pyc in parse(handle, format, alphabet)
    539         raise ValueError("Unknown format '%s'" % format)
    540     #This imposes some overhead... wait until we drop Python 2.4 to fix it
--> 541     for r in i:
    542         yield r
    543

/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqPhredIterator(handle, alphabet, title2ids)
   1034     for letter in range(0, 255):
   1035         q_mapping[chr(letter)] = letter - SANGER_SCORE_OFFSET
-> 1036     for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
   1037         if title2ids:
   1038             id, name, descr = title2ids(title_line)

/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqGeneralIterator(handle)
    934         #There may now be more quality data, or another sequence, or EOF
    935         while True:
--> 936             line = handle_readline()
    937             if not line:
    938                 break  # end of file

KeyboardInterrupt:

--
Philipp Schiffer
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

From philipp.schiffer at gmail.com Sat Dec 21 10:57:35 2013
From: philipp.schiffer at gmail.com (Philipp Schiffer)
Date: Sat, 21 Dec 2013 16:57:35 +0100
Subject: [Biopython] Python getting stuck reading fastq file
In-Reply-To: <981B2822CDB34C8B89869B7C68D04898@gmail.com>
References: <981B2822CDB34C8B89869B7C68D04898@gmail.com>
Message-ID: <83CECE51818A4D9D8E85862AF331A838@gmail.com>

Hi again!

Seems to be working now, after I pulled the latest Biopython from github.
Very sorry for Saturday evening disturbance.

Have a great Christmas season everybody.

Philipp

--
Philipp Schiffer
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Saturday, 21 December 2013 at 16:46, Philipp Schiffer wrote:
> Hi!
>
> I am experiencing a problem when reading from a fastq file (qualities in Sanger scoring). Whatever I do, at one point through my file the reading (or writing, or comparing, which is later in my script and excluded here) gets stuck. The following output is from an ipython session (Python 2.7.5).
> Biopython was installed through pip on a Scientific Linux 6.2 system.
> Is this an error with the SeqIO parser?
Or am I doing something wrong? > > Any help with this would be highly appreciated. > > Kind regards > > Philipp > > import string > from pprint import pprint > import os > from Bio import SeqIO > from subprocess import call > import sys > import re > > fqoutfile = open('/data2/PS1159/reads_Feb_12/fqoutfile.fq', 'w') > my_abundantheads=set() > my_onetwo = re.compile('\/[1-2]?) > > abundant = open('/data2/PS1159/reads_Feb_12/120126_0281_AD088PACXX_5_SA-PE-023_shuf.clean.fq.gz.keep.gz.abundfilt', 'rU') > for record in SeqIO.parse(abundant, "fastq"): > ids = my_onetwo.split(record.id) > > my_abundantheads.update([ids[0]]) > > > > ?. > > ^C--------------------------------------------------------------------------- > KeyboardInterrupt Traceback (most recent call last) > in () > ----> 1 for record in SeqIO.parse(abundant, "fastq"): > 2 ids = my_onetwo.split(record.id) > 3 my_abundantheads.update([ids[0]]) > 4 > > /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/__init__.pyc in parse(handle, format, alphabet) > 539 raise ValueError("Unknown format '%s'" % format) > 540 #This imposes some overhead... 
wait until we drop Python 2.4 to fix it > --> 541 for r in i: > 542 yield r > 543 > > /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqPhredIterator(handle, alphabet, title2ids) > 1034 for letter in range(0, 255): > 1035 q_mapping[chr(letter)] = letter - SANGER_SCORE_OFFSET > -> 1036 for title_line, seq_string, quality_string in FastqGeneralIterator(handle): > 1037 if title2ids: > 1038 id, name, descr = title2ids(title_line) > > /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqGeneralIterator(handle) > 934 #There may now be more quality data, or another sequence, or EOF > 935 while True: > --> 936 line = handle_readline() > 937 if not line: > 938 break # end of file > > KeyboardInterrupt: > > > > > -- > Philipp Schiffer > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > From p.j.a.cock at googlemail.com Sat Dec 21 11:01:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 21 Dec 2013 16:01:15 +0000 Subject: [Biopython] Python getting stuck reading fastq file In-Reply-To: <83CECE51818A4D9D8E85862AF331A838@gmail.com> References: <981B2822CDB34C8B89869B7C68D04898@gmail.com> <83CECE51818A4D9D8E85862AF331A838@gmail.com> Message-ID: On Sat, Dec 21, 2013 at 3:57 PM, Philipp Schiffer wrote: > Hi again! > > Seems to be working now, after I pulled the latest Biopython from github. > Very sorry for Saturday evening disturbance. > > Have a great Christmas season everybody. OK, good :) What made you think it is stuck though (and not just taking a long time)? Also, since you are only using the ID rather than the full SeqRecord representation, it would be faster to use the FastqGeneralIterator iterator function instead, e.g. 
http://news.open-bio.org/news/2009/09/biopython-fast-fastq/

Regards,

Peter

From mjldehoon at yahoo.com Fri Dec 27 21:10:24 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Dec 2013 18:10:24 -0800 (PST)
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To:
Message-ID: <1388196624.13802.YahooMailBasic@web164001.mail.gq1.yahoo.com>

Various issues of the MMCIF parser also came up previously, see for
example the thread starting here:

http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010452.html

I'd like to propose to make the MMCIF parser a bit more Pythonic and more
consistent with other Biopython modules.
This is a description of how the MMCIF parser would work:

http://biopython.org/DIST/docs/tutorial/Tutorial.mmcif_proposal.html#htoc149

This uses a new Bio.PDB.mmcif module that has a read() function (taking
both file names and file handles) which stores the information in an MMCIF
file in a mmcif.Record object, which is a Python dictionary.

Please have a look if you agree or have any comments or suggestions. If
this looks OK, I can upload the new code.

Best,
-Michiel.

--------------------------------------------
On Thu, 12/5/13, Peter Cock wrote:

Subject: Re: [Biopython] Bio.PDB local MMCIF files
To: "João Rodrigues"
Cc: "Biopython Mailing List"
Date: Thursday, December 5, 2013, 12:35 PM

On Thu, Dec 5, 2013 at 4:53 PM, João Rodrigues wrote:
> Hi Dave,
>
> I understand your concern. Python has the gzip module that can decompress
> the files on the fly and provide a handle for the content. This will not
> work for the parsers since they expect a filename. I will have a look at
> the parsers code and if it's simple, I'll add a layer to do this exactly.
>
> Cheers,
>
> João

Yeah, most of Biopython's parsers take a handle only, some
like the top level functions in Bio.SeqIO, AlignIO, SearchIO
will take a filename or a handle for convenience.
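As a small illustration of the handle approach discussed here: a decompressing handle from the standard gzip module can be given to any parser that accepts a handle instead of a filename. This is only a sketch. The helper name `open_maybe_gzip` is hypothetical, and the `SeqIO.parse` lines in the comment are just one example of a handle-accepting parser.

```python
import gzip


def open_maybe_gzip(path):
    """Return a text-mode handle, decompressing on the fly for .gz files."""
    if path.endswith(".gz"):
        # "rt" gives decoded text lines rather than raw bytes
        return gzip.open(path, "rt")
    return open(path, "r")


# Hypothetical usage with a handle-accepting Biopython parser:
#
#   from Bio import SeqIO
#   with open_maybe_gzip("reads.fasta.gz") as handle:
#       for record in SeqIO.parse(handle, "fasta"):
#           print(record.id)
```

A parser that only accepts a filename cannot benefit from this, which is the limitation of Bio.PDB raised in this thread.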
I agree that it is unfortunate that Bio.PDB currently only takes
a filename (historical design choice). If we can tweak it to take
either a filename or a handle that would be much better :)

Regards,

Peter

_______________________________________________
Biopython mailing list - Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From anaryin at gmail.com Sat Dec 28 15:11:08 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Sat, 28 Dec 2013 20:11:08 +0000
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To: <1388196624.13802.YahooMailBasic@web164001.mail.gq1.yahoo.com>
References: <1388196624.13802.YahooMailBasic@web164001.mail.gq1.yahoo.com>
Message-ID:

Dear all,

I agree that the code for mmcif parsing should be improved. However, I also
think that we should move to an entirely new module to have sane names.

The people at the EBI-EMBL have developed an efficient and robust mmcif
parser that they would like to share with us so we can integrate it in our
code.

I'm currently on holidays, I'll be back to work in a couple of weeks, so I
suggest we postpone this discussion until then?

Cheers,

João

On 28/12/2013 02:10, "Michiel de Hoon" wrote:
> Various issues of the MMCIF parser also came up previously, see for
> example the thread starting here:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010452.html
>
> I'd like to propose to make the MMCIF parser a bit more Pythonic and more
> consistent with other Biopython modules.
> This is a description of how the MMCIF parser would work:
>
>
> http://biopython.org/DIST/docs/tutorial/Tutorial.mmcif_proposal.html#htoc149
>
> This uses a new Bio.PDB.mmcif module that has a read() function (taking
> both file names and file handles) which stores the information in an MMCIF
> file in a mmcif.Record object, which is a Python dictionary.
>
> Please have a look if you agree or have any comments or suggestions.
If > this looks OK, I can upload the new code. > > Best, > -Michiel. > > -------------------------------------------- > On Thu, 12/5/13, Peter Cock wrote: > > Subject: Re: [Biopython] Bio.PDB local MMCIF files > To: "Jo?o Rodrigues" > Cc: "Biopython Mailing List" > Date: Thursday, December 5, 2013, 12:35 PM > > On Thu, Dec 5, 2013 at 4:53 PM, Jo?o > Rodrigues > wrote: > > Hi Dave, > > > > I understand your concern. Python has the gzip module > that can decompress > > the files on the file and provide a handle for the > content. This will not > > work for the parsers since they except a filename. I > will have a look at > > the parsers code and if it's simple, I'll add a layer > to do this exactly. > > > > Cheers, > > > > Jo?o > > Yeah, most of Biopython's parsers take a handle only, some > like the top level functions in Bio.SeqIO, AlignIO, > SearchIO > will take a filename or a handle for convenience. > > I agree that is is unfortunate that Bio.PDB currently only > takes > a filename (historical design choice). If we can tweak it to > take > either a filename or a handle that would be much better :) > > Regards, > > Peter > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > From mjldehoon at yahoo.com Sun Dec 29 22:37:30 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 29 Dec 2013 19:37:30 -0800 (PST) Subject: [Biopython] Bio.PDB local MMCIF files Message-ID: <1388374650.98611.BPMail_high_carrier@web164003.mail.gq1.yahoo.com> OK, I will hold off for now then. -Michiel ------------------------------ On Sat, Dec 28, 2013 3:11 PM EST Jo?o Rodrigues wrote: >Dear all, > >I agree that the code for mmcif parsing should be improved. However, I also >think that we should move to an entirely new module to have sane names. 
> >The people at the EBI-EMBL have developed an efficient and robust mmcif >parser that they would like to share with us so we can integrate in our >code. > >I'm currently on holidays, I'll be back to work in a couple of weeks, so I >suggest we postpone this discussion until then? > >Cheers, >Jo?o >Em 28/12/2013 02:10, "Michiel de Hoon" escreveu: > >> Various issues of the MMCIF parser also came up previously, see for >> example the thread starting here: >> >> http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010452.html >> >> I'd like to propose to make the MMCIF parser a bit more Pythonic and more >> consistent with other Biopython modules. >> This is a description of how the MMCIF parser would work: >> >> >> http://biopython.org/DIST/docs/tutorial/Tutorial.mmcif_proposal.html#htoc149 >> >> This uses a new Bio.PDB.mmcif module that has a read() function (taking >> both file names and file handles) which stores the information in an MMCIF >> file in a mmcif.Record object, which is a Python dictionary. >> >> Please have a look if you agree or have any comments or suggestions. If >> this looks OK, I can upload the new code. >> >> Best, >> -Michiel. >> >> -------------------------------------------- >> On Thu, 12/5/13, Peter Cock wrote: >> >> Subject: Re: [Biopython] Bio.PDB local MMCIF files >> To: "Jo?o Rodrigues" >> Cc: "Biopython Mailing List" >> Date: Thursday, December 5, 2013, 12:35 PM >> >> On Thu, Dec 5, 2013 at 4:53 PM, Jo?o >> Rodrigues >> wrote: >> > Hi Dave, >> > >> > I understand your concern. Python has the gzip module >> that can decompress >> > the files on the file and provide a handle for the >> content. This will not >> > work for the parsers since they except a filename. I >> will have a look at >> > the parsers code and if it's simple, I'll add a layer >> to do this exactly. 
>> > >> > Cheers, >> > >> > Jo?o >> >> Yeah, most of Biopython's parsers take a handle only, some >> like the top level functions in Bio.SeqIO, AlignIO, >> SearchIO >> will take a filename or a handle for convenience. >> >> I agree that is is unfortunate that Bio.PDB currently only >> takes >> a filename (historical design choice). If we can tweak it to >> take >> either a filename or a handle that would be much better :) >> >> Regards, >> >> Peter >> >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> >> From romeliasalomon at gmail.com Mon Dec 2 23:43:52 2013 From: romeliasalomon at gmail.com (Romelia Salomon) Date: Mon, 2 Dec 2013 15:43:52 -0800 Subject: [Biopython] error object is not callable, but callable() returns True Message-ID: Hi I am having problems using biopython on my laptop. I have Python version v2.6 and I recently installed biopython 1.53-1 using synaptic (I am using Ubuntu), but I can't get it to work right, I am having a problem running even the examples in the tutorial. When I try to use any of the command line wrappers I get an error saying the object is not callable, even though when I check that same object with callable() it does seem to be callable. For example: mycomputer$ python Python 2.6.5 (r265:79063, Sep 26 2013, 18:48:04) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from Bio.Align.Applications import MuscleCommandline >>> muscle_cline = MuscleCommandline(input="opuntia.fasta") >>> stdout, stderr = muscle_cline() Traceback (most recent call last): File "", line 1, in TypeError: 'MuscleCommandline' object is not callable >>> callable(MuscleCommandline) True Thanks for your help! -- **************************************** Dr. 
Romelia Salomon Ferrer
Nagarajan Research Group
Immunology Department
City of Hope

From p.j.a.cock at googlemail.com Tue Dec 3 10:31:28 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 3 Dec 2013 10:31:28 +0000
Subject: [Biopython] error object is not callable, but callable() returns True
In-Reply-To:
References:
Message-ID:

On Mon, Dec 2, 2013 at 11:43 PM, Romelia Salomon wrote:
> Hi
>
> I am having problems using biopython on my laptop. I have Python version
> v2.6 and I recently installed biopython 1.53-1 using synaptic (I am using
> Ubuntu), but I can't get it to work right, I am having a problem running
> even the examples in the tutorial.
>
> When I try to use any of the command line wrappers I get an error saying
> the object is not callable, even though when I check that same object with
> callable() it does seem to be callable. For example:
>
> mycomputer$ python
> Python 2.6.5 (r265:79063, Sep 26 2013, 18:48:04)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from Bio.Align.Applications import MuscleCommandline
>>>> muscle_cline = MuscleCommandline(input="opuntia.fasta")
>>>> stdout, stderr = muscle_cline()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'MuscleCommandline' object is not callable
>>>> callable(MuscleCommandline)
> True
>
> Thanks for your help!

Hi Romelia,

The simple answer is from the FAQ (in the current Biopython Tutorial),
http://biopython.org/DIST/docs/tutorial/Tutorial.html

"Why can't I run command line tools directly from the application wrappers?
You need Biopython 1.55 or later. Alternatively, use the Python
subprocess module directly."

You are using Biopython 1.53 (which is now four years old), and it
doesn't have the __call__ method defined which made instances of the
class callable in this way.
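A minimal sketch of what that __call__ method changes, using two made-up toy classes (OldStyleWrapper and NewStyleWrapper are invented for illustration and are not Biopython code): a class is always callable, since calling it creates an instance, but an instance is only callable if its class defines __call__.

```python
# Toy stand-ins for a command line wrapper; these are NOT Biopython classes.
class OldStyleWrapper(object):
    """Mimics a wrapper without __call__ (the situation described for 1.53)."""

    def __init__(self, **options):
        self.options = options


class NewStyleWrapper(OldStyleWrapper):
    """Mimics a wrapper that defines __call__, so its instances can be invoked."""

    def __call__(self):
        # A real wrapper would run the external tool here; we only echo the options.
        return "ran with %r" % (self.options,), ""


old_cline = OldStyleWrapper(input="opuntia.fasta")
new_cline = NewStyleWrapper(input="opuntia.fasta")

# Every class is callable: calling it is how an instance gets created.
print(callable(OldStyleWrapper))
print(callable(NewStyleWrapper))

# Instances are only callable when their class defines __call__.
print(callable(old_cline))
print(callable(new_cline))

stdout, stderr = new_cline()
```

So a True result from callable() on the class says nothing about whether the object created from it can be called.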
Your investigation *almost* identified the
problem, but the key test you missed was:

callable(muscle_cline)

Unfortunately checking callable(MuscleCommandline) just confirmed
you can invoke MuscleCommandline(...) which you did to create
the object stored as muscle_cline.

So, I would suggest you uninstall the (very old) Ubuntu Biopython
(using 'sudo apt-get remove python-biopython') and then instead
install the current release from source. See:
http://biopython.org/wiki/Download#Ubuntu_or_Debian

Regards,

Peter

From romeliasalomon at gmail.com Tue Dec 3 16:34:30 2013
From: romeliasalomon at gmail.com (Romelia Salomon)
Date: Tue, 3 Dec 2013 08:34:30 -0800
Subject: [Biopython] error object is not callable, but callable() returns True
In-Reply-To:
References:
Message-ID:

Thanks Peter! It works perfectly now.

On Tue, Dec 3, 2013 at 2:31 AM, Peter Cock wrote:

> On Mon, Dec 2, 2013 at 11:43 PM, Romelia Salomon wrote:
> > Hi
> >
> > I am having problems using biopython on my laptop. I have Python version
> > v2.6 and I recently installed biopython 1.53-1 using synaptic (I am using
> > Ubuntu), but I can't get it to work right, I am having a problem running
> > even the examples in the tutorial.
> >
> > When I try to use any of the command line wrappers I get an error saying
> > the object is not callable, even though when I check that same object with
> > callable() it does seem to be callable. For example:
> >
> > mycomputer$ python
> > Python 2.6.5 (r265:79063, Sep 26 2013, 18:48:04)
> > [GCC 4.4.3] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> >>>> from Bio.Align.Applications import MuscleCommandline
> >>>> muscle_cline = MuscleCommandline(input="opuntia.fasta")
> >>>> stdout, stderr = muscle_cline()
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: 'MuscleCommandline' object is not callable
> >>>> callable(MuscleCommandline)
> > True
> >
> > Thanks for your help!
>
> Hi Romelia,
>
> The simple answer is from the FAQ (in the current Biopython Tutorial),
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
>
> "Why can't I run command line tools directly from the application wrappers?
> You need Biopython 1.55 or later. Alternatively, use the Python
> subprocess module directly."
>
> You are using Biopython 1.53 (which is now four years old), and it
> doesn't have the __call__ method defined which made the class
> callable in this way. Your investigation *almost* identified the
> problem, but the key test you missed was:
>
> callable(muscle_cline)
>
> Unfortunately checking callable(MuscleCommandline) just confirmed
> you can invoke MuscleCommandline(...) which you did to create
> the object stored as muscle_cline.
>
> So, I would suggest you uninstall the (very old) Ubuntu Biopython
> (using 'sudo apt-get remove python-biopython') and then instead
> install the current release from source. See:
> http://biopython.org/wiki/Download#Ubuntu_or_Debian
>
> Regards,
>
> Peter
>

--
****************************************
Dr. Romelia Salomon Ferrer
Nagarajan Research Group
Immunology Department
City of Hope

From aradwen at gmail.com Wed Dec 4 17:40:10 2013
From: aradwen at gmail.com (Aniba, Radhouane)
Date: Wed, 4 Dec 2013 12:40:10 -0500
Subject: [Biopython] Python crowd-coding
Message-ID: <65B72E1F-AAD9-4AE8-AD6A-8F075A3E216C@gmail.com>

Hello 'Pythonians',

I apologize in advance if this message sounds like an advertisement, but I
thought it might be useful to the users of the Biopython list, so I will
keep it short.

I just wanted to announce the first public release of CodersCrowd, the
bioinformatics crowdcoding platform. This first release of our application
comes with a lot of features that will be, we hope, of great help for your
day to day bioinformatics development and research.
It was less than a year ago that I first thought about a platform where
people share their skills and methodologies to solve different kinds of
problems. After months of feedback from the community, I think that we have
here something that will be of great help, an addition to some excellent
initiatives that are already changing the way we are doing bioinformatics,
such as Biostars, SEQanswers and Rosalind.

The first release of CodersCrowd comes already loaded with a lot of
features, but this is only the tip of the iceberg. We decided to go public
with version 1.1, which was ready about a month ago (we were running a
battery of tests to make sure everything works well). The future of
CodersCrowd is incredibly bright and we welcome you to join us on the
journey of its first chapter.

You can sign up here: coderscrowd.com, and I would be glad to have your
feedback, if any.

Thanks,
Rad, the guy behind CodersCrowd

From cfriedline at vcu.edu Wed Dec 4 16:48:24 2013
From: cfriedline at vcu.edu (Christopher J Friedline)
Date: Wed, 4 Dec 2013 11:48:24 -0500
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
Message-ID:

Hi everyone,

I ran into this problem today, and wanted to bring it up to see if anyone
else has seen it. Searching the archives didn't get me anywhere. I'm
running a simple restriction analysis (on a random genome), and getting
this error. If I look for cut sites with the enzymes individually, it works
for each. However, when I put them into a RestrictionBatch, I get an error
on the class.
type object 'RestrictionType' has no attribute 'size'

I put an IPython notebook describing this here:

http://nbviewer.ipython.org/gist/cfriedline/7790932

Bio (1.6.2), IPython (1.1.0), Python (2.7.6/Anaconda 1.8)

Any help is appreciated,

Thanks,
Chris

From p.j.a.cock at googlemail.com Wed Dec 4 23:38:08 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 4 Dec 2013 23:38:08 +0000
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

On Wed, Dec 4, 2013 at 4:48 PM, Christopher J Friedline wrote:
> Hi everyone,
>
> I ran into this problem today, and wanted to bring it up to see if anyone
> else has seen it. Searching the archives didn't get me anywhere. I'm
> running a simple restriction analysis (on a random genome), and getting
> this error. If I look for cut sites with the enzymes individually, it works
> for each. However, when I put them into a RestrictionBatch, I get an error
> on the class.
>
> type object 'RestrictionType' has no attribute 'size'
>
> I put an IPython notebook describing this here:
>
> http://nbviewer.ipython.org/gist/cfriedline/7790932
>
> Bio (1.6.2), IPython (1.1.0), Python (2.7.6/Anaconda 1.8)
>
> Any help is appreciated,
>
> Thanks,
> Chris

Hi Chris,

Unfortunately when I tried it with a random genome like your example
it didn't fail... Can you replace your random genome with something
fixed and still get the exception from the .with_sites() call?
i.e. can we eliminate the randomness as a factor?
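One way to take the randomness out while still using a "random" genome is to seed the generator, so that every run on a given Python version produces the same sequence. A minimal sketch using only the standard library (the function name, the seed and the length are arbitrary choices for illustration, not taken from the notebook):

```python
import random


def fixed_random_genome(length, seed=42):
    """Return a reproducible pseudo-random DNA string of the given length."""
    rng = random.Random(seed)  # private generator, leaves the global state alone
    return "".join(rng.choice("ACGT") for _ in range(length))


genome = fixed_random_genome(1000)

# The same seed yields the same sequence again, so a failure can be reproduced.
assert genome == fixed_random_genome(1000)
print(genome[:60])
```

The resulting string can then be wrapped in a Seq object and analysed exactly as before.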
Here's a cut down example which works for me (64 bit Linux, Python 2.7.3,
latest Biopython from git):

>>> from Bio.Restriction import *
>>> from Bio.Seq import Seq
>>> g = Seq(EcoRI.site + "ACGT" + MstI.site + "ACGT" + EcoRI.site)
>>> rb = RestrictionBatch(['EcoRI', 'MstI'])
>>> ana = Analysis(rb, g, linear=True)
>>> ana.with_sites()
{EcoRI: [2, 22], MstI: [14]}

This rings a couple of bells (oddities with the .size property),
http://lists.open-bio.org/pipermail/biopython-dev/2013-October/010935.html

Also a much older thread where RestrictionBatch gave trouble, which
it appears was never properly resolved
http://lists.open-bio.org/pipermail/biopython/2009-December/006004.html
http://lists.open-bio.org/pipermail/biopython/2009-December/006005.html

That example fails today, with hindsight one of us should have filed
a bug rather than just forgetting about it once the original poster's
problem was solved. It looks like it could be down to the version of
Python... Again using Biopython from git,

$ python
Python 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.Restriction import *
>>> EcoRI.size == len(EcoRI) == 6
True
>>> rb = RestrictionBatch(first=[], suppliers=['F','R'])
>>> len(rb) #varies depending on version of REBASE
235
>>> len([x for x in rb if x.size == 6])
125
>>> len(rb.lambdasplit(lambda x: x.size == 6)) #same?
0
>>> len(rb.lambdasplit(lambda x: len(x) == 6)) #same?
0

(This is probably a separate issue to yours though.)

Peter

From w.arindrarto at gmail.com Thu Dec 5 00:03:02 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 5 Dec 2013 01:03:02 +0100
Subject: [Biopython] type object 'RestrictionType' has no attribute 'size'
In-Reply-To:
References:
Message-ID:

Hi Christopher, Peter,

Thank you for reporting the issue. After prodding around, I have to
say this looks like a very interesting 'bug' :).
I'm going to ramble a
bit below, so I'll give a short TL;DR first:

I don't think this is a bug in Biopython per se, since if you try
the same code in a regular Python shell, it works (as Peter has
shown). Don't get me wrong, I do appreciate the report in IPython
notebook form (I'm a user myself and I think more people should use
it, also for reporting bugs like this). But in the context of fixing
the problem, I'd suggest that you use a regular Python shell instead.

Now, that being said, let's delve a bit into what's causing the bug
(with a small disclaimer: these are my observations and I'm quite sure
there are others who know more about this issue).

If you look at the stack trace of the error, you'll see that a huge
chunk is coming from the IPython codebase. Initially I thought this was
something of an artifact (I expected the entire stack to trace only
Biopython code), but I was wrong. That stack trace really points to the
cause of the problem: the way IPython displays cell results in the
browser.

More specifically, I've pinpointed this problem to the way IPython
displays restriction enzymes. You can try it in your same IPython
notebook:

from Bio import Restriction as rst
rst.EcoRI

The code above will never fail in a regular Python shell, but will
fail in IPython.

Why is this the case? I suspect it has something to do with the way
Biopython's Restriction module is written. Turns out, there are already
some nifty metaclass tricks being employed there that make each enzyme
a class in its own right (e.g. instead of having a RestrictionType
class with an instance called EcoRI, we have a metaclass
RestrictionType with a class called EcoRI). To keep it short, IPython
seems to do an attribute lookup on these metaclasses when it tries to
display the object.
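To illustrate the class/metaclass arrangement, here is a toy sketch in Python 3 syntax (the thread itself concerns Python 2.7). ToyRestrictionType and ToyEcoRI are invented for this example and are far simpler than the real Bio.Restriction code; the only point is that an attribute defined on an enzyme class is not visible on its metaclass, which produces the same kind of error message as in the report.

```python
class ToyRestrictionType(type):
    """Toy metaclass: its instances are enzyme *classes*, not enzyme objects."""

    def __len__(cls):
        # len(SomeEnzyme) is answered by the metaclass, using the class attribute.
        return cls.size


class ToyEcoRI(metaclass=ToyRestrictionType):
    site = "GAATTC"
    size = len(site)


# The enzyme is itself a class, and its type is the metaclass.
print(type(ToyEcoRI) is ToyRestrictionType)
print(len(ToyEcoRI))

# 'size' lives on the enzyme class, so looking it up on the metaclass fails.
try:
    ToyRestrictionType.size
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Any tool that inspects type(obj) and then expects enzyme attributes on the result would trip over this, whatever the exact code path turns out to be.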
You'll notice that on line 333 of
/data7/cfriedline/anaconda/envs/conda/lib/python2.7/site-packages/IPython/lib/pretty.py,
there is an expression that tries to get the class of the object to
display:

obj_class = getattr(obj, '__class__', None) or type(obj)

However, because these metaclasses have not been instantiated into a
class and thus do not have any 'size' attribute, IPython complains and
raises an error.

So to sum up, this behavior may be caused by IPython instead of
Biopython. It's perhaps a good idea to mention this in the IPython
forum / mailing list, too. They've been doing a very impressive job so
far, so I have a hunch this may have popped up in one of their
discussions.

I have to admit that my metaclass chops are still very limited as
these usually aren't required for most Python programming. But if
you're interested this SO page
(http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python)
has a very easy to follow explanation about what they are and why they
are useful.

Anyway, that's enough for now I guess :).

Cheers,
Bow

P.S. Peter, perhaps these other Restriction bugs are related to the
metaclass? I've never delved that deep into the submodule's codebase,
but it looks like there are some very interesting tidbits there.

On Wed, Dec 4, 2013 at 5:48 PM, Christopher J Friedline wrote:
> Hi everyone,
>
> I ran into this problem today, and wanted to bring it up to see if anyone
> else has seen it. Searching the archives didn't get me anywhere. I'm
> running a simple restriction analysis (on a random genome), and getting
> this error. If I look for cut sites with the enzymes individually, it works
> for each. However, when I put them into a RestrictionBatch, I get an error
> on the class.
> > type object 'RestrictionType' has no attribute 'size' > > I put an IPython notebook describing this here: > > http://nbviewer.ipython.org/gist/cfriedline/7790932 > > Bio (1.6.2), IPython (1.1.0), Python (2.7.6/Anaconda 1.8) > > Any help is appreciated, > > Thanks, > Chris > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From cfriedline at vcu.edu Thu Dec 5 01:56:05 2013 From: cfriedline at vcu.edu (Christopher J Friedline) Date: Wed, 4 Dec 2013 20:56:05 -0500 Subject: [Biopython] type object 'RestrictionType' has no attribute 'size' In-Reply-To: References: Message-ID: I may have spoken too soon about the notebook specificity - got the same error when I run .full() on the Analysis object in the ipython console. On Wed, Dec 4, 2013 at 8:53 PM, Christopher J Friedline wrote: > Bow, > > Thanks for your excellent reply and for taking the time to look - it never > occurred to me to use the regular (or IPython) REPL - because it's just not > my way of developing. I think you're right about it not being a Biopython > bug, but I think the problems are related to the IPython notebook > specifically, rather than IPython, in general. I downloaded the notebook > and ran it in the ipython console (%run magic), and it completes without > error. Good catch - I'll touch base with the IPython folks. > > Thanks, > Chris > > > > On Wed, Dec 4, 2013 at 7:03 PM, Wibowo Arindrarto wrote: > >> Hi Christopher, Peter, >> >> Thank you for reporting the issue. After prodding around, I have to >> say this looks like a very interesting 'bug' :). I'm going to ramble a >> bit below, so I'll give a short TL;DR first: >> >> I don't think this is a bug from Biopython per se, since if you try >> the same code in a regular Python shell, it works (as Peter has >> shown). 
Don't get me wrong, >> I do appreciate the report in IPython notebook form (I'm a user myself >> and I think more people should use it, also for reporting bugs like >> this). But in the context of fixing the problem, I'd suggest that you >> use a regular Python shell instead. >> >> Now, that being said, let's delve in a bit into what's causing the bug >> (with a small disclaimer: these are my observations and I'm quite sure >> there are others who know more about this issue). >> >> If you see the stack >> trace of the error, you'd see that a huge chunk is coming from the >> IPython codebase. Initially I thought this was something of an >> artifact (I expected the entire stack to trace only Biopython code), >> but I was wrong. That stack trace really points to the cause of the >> problem: the way IPython displays cell results in the browser. >> >> More specifically, I've pinpointed this problem into the way IPython >> displays restriction enzymes. You can try it in your same Ipython >> notebook: >> >> from Bio import Restriction as rst >> rst.EcoRI >> >> The code above will never fail in a regular Python shell, but will >> fail in IPython. >> >> Why is this the case? I suspect it has something to do with the way >> Biopython's Restriction module is written. Turns out, there's already >> some nifty metaclass tricks being employed there that allows a given >> enzyme to have the class type of itself (e.g. instead of having a >> RestrictionType class with an instance called EcoRI, we have a >> metaclass RestrictionType with a class called EcoRI). To keep it >> short, IPython seems to do an attribute lookup on these metaclasses >> when it tries to display the object. 
You'll notice that on line 333 >> of >> /data7/cfriedline/anaconda/envs/conda/lib/python2.7/site-packages/IPython/lib/pretty.py, >> there is an expression that tries to get the class of the object to >> display: >> >> obj_class = getattr(obj, '__class__', None) or type(obj) >> >> However, these metaclasses have not been instantiated into a class and >> thus do not have any 'size' attribute, IPython complains and raises an >> error. >> >> So to sum up, this behavior may be cause by IPython instead of >> Biopython. It's perhaps a good idea to mention this in the IPython >> forum / mailing list, too. They've been doing a very impressive job so >> far, so I have a hunch this may have popped up in one of their >> discussions. >> >> I have to admit that my metaclass chops are still very limited as >> these usually aren't required for most Python programming. But if >> you're interested this SO page >> (http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python) >> has a very easy to follow explanation about what they are and why they >> are useful. >> >> Anyway, that's enough for now I guess :). >> >> Cheers, >> Bow >> >> P.S. Peter, perhaps these other Restriction bugs are related with the >> metaclass? I've never delved that deep into the submodule's codebase, >> but it looks like there are some very interesting tidbits there. >> >> On Wed, Dec 4, 2013 at 5:48 PM, Christopher J Friedline >> wrote: >> > Hi everyone, >> > >> > I ran into this problem today, and wanted to bring it up to see if >> anyone >> > else has seen it. Searching the archives didn't get me anywhere. I'm >> > running a simple restriction analysis (on a random genome), and getting >> > this error. If I look for cut sites with the enzymes individually, it >> works >> > for each. However, when I put them into a RestrictionBatch, I get an >> error >> > on the class. 
>> > >> > type object 'RestrictionType' has no attribute 'size' >> > >> > I put an IPython notebook describing this here: >> > >> > http://nbviewer.ipython.org/gist/cfriedline/7790932 >> > >> > Bio (1.6.2), IPython (1.1.0), Python (2.7.6/Anaconda 1.8) >> > >> > Any help is appreciated, >> > >> > Thanks, >> > Chris >> > _______________________________________________ >> > Biopython mailing list - Biopython at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython >> > > From cfriedline at vcu.edu Thu Dec 5 01:53:26 2013 From: cfriedline at vcu.edu (Christopher J Friedline) Date: Wed, 4 Dec 2013 20:53:26 -0500 Subject: [Biopython] type object 'RestrictionType' has no attribute 'size' In-Reply-To: References: Message-ID: Bow, Thanks for your excellent reply and for taking the time to look - it never occurred to me to use the regular (or IPython) REPL - because it's just not my way of developing. I think you're right about it not being a Biopython bug, but I think the problems are related to the IPython notebook specifically, rather than IPython, in general. I downloaded the notebook and ran it in the ipython console (%run magic), and it completes without error. Good catch - I'll touch base with the IPython folks. Thanks, Chris On Wed, Dec 4, 2013 at 7:03 PM, Wibowo Arindrarto wrote: > Hi Christopher, Peter, > > Thank you for reporting the issue. After prodding around, I have to > say this looks like a very interesting 'bug' :). I'm going to ramble a > bit below, so I'll give a short TL;DR first: > > I don't think this is a bug from Biopython per se, since if you try > the same code in a regular Python shell, it works (as Peter has > shown). Don't get me wrong, > I do appreciate the report in IPython notebook form (I'm a user myself > and I think more people should use it, also for reporting bugs like > this). But in the context of fixing the problem, I'd suggest that you > use a regular Python shell instead. 
> > Now, that being said, let's delve in a bit into what's causing the bug > (with a small disclaimer: these are my observations and I'm quite sure > there are others who know more about this issue). > > If you see the stack > trace of the error, you'd see that a huge chunk is coming from the > IPython codebase. Initially I thought this was something of an > artifact (I expected the entire stack to trace only Biopython code), > but I was wrong. That stack trace really points to the cause of the > problem: the way IPython displays cell results in the browser. > > More specifically, I've pinpointed this problem into the way IPython > displays restriction enzymes. You can try it in your same Ipython > notebook: > > from Bio import Restriction as rst > rst.EcoRI > > The code above will never fail in a regular Python shell, but will > fail in IPython. > > Why is this the case? I suspect it has something to do with the way > Biopython's Restriction module is written. Turns out, there's already > some nifty metaclass tricks being employed there that allows a given > enzyme to have the class type of itself (e.g. instead of having a > RestrictionType class with an instance called EcoRI, we have a > metaclass RestrictionType with a class called EcoRI). To keep it > short, IPython seems to do an attribute lookup on these metaclasses > when it tries to display the object. You'll notice that on line 333 > of > /data7/cfriedline/anaconda/envs/conda/lib/python2.7/site-packages/IPython/lib/pretty.py, > there is an expression that tries to get the class of the object to > display: > > obj_class = getattr(obj, '__class__', None) or type(obj) > > However, these metaclasses have not been instantiated into a class and > thus do not have any 'size' attribute, IPython complains and raises an > error. > > So to sum up, this behavior may be cause by IPython instead of > Biopython. It's perhaps a good idea to mention this in the IPython > forum / mailing list, too. 
They've been doing a very impressive job so > far, so I have a hunch this may have popped up in one of their > discussions. > > I have to admit that my metaclass chops are still very limited as > these usually aren't required for most Python programming. But if > you're interested this SO page > (http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python) > has a very easy to follow explanation about what they are and why they > are useful. > > Anyway, that's enough for now I guess :). > > Cheers, > Bow > > P.S. Peter, perhaps these other Restriction bugs are related with the > metaclass? I've never delved that deep into the submodule's codebase, > but it looks like there are some very interesting tidbits there. > > On Wed, Dec 4, 2013 at 5:48 PM, Christopher J Friedline > wrote: > > Hi everyone, > > > > I ran into this problem today, and wanted to bring it up to see if anyone > > else has seen it. Searching the archives didn't get me anywhere. I'm > > running a simple restriction analysis (on a random genome), and getting > > this error. If I look for cut sites with the enzymes individually, it > works > > for each. However, when I put them into a RestrictionBatch, I get an > error > > on the class. 
> > > > type object 'RestrictionType' has no attribute 'size' > > > > I put an IPython notebook describing this here: > > > > http://nbviewer.ipython.org/gist/cfriedline/7790932 > > > > Bio (1.6.2), IPython (1.1.0), Python (2.7.6/Anaconda 1.8) > > > > Any help is appreciated, > > > > Thanks, > > Chris > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > From antony.lee at berkeley.edu Thu Dec 5 03:38:22 2013 From: antony.lee at berkeley.edu (Antony Lee) Date: Wed, 4 Dec 2013 19:38:22 -0800 Subject: [Biopython] type object 'RestrictionType' has no attribute 'size' In-Reply-To: References: Message-ID: The restriction code does *way* more magic with metaclasses than needed (and I don't think IPython is really to blame here as the code breaks some fairly basic assumptions about the Python object model IMHO). I have in fact a PR from nearly one year ago that basically reimplemented the whole module from scratch (https://github.com/biopython/biopython/pull/148). Feel free to try it. Antony 2013/12/4 Christopher J Friedline > I may have spoken too soon about the notebook specificity - got the same > error when I run .full() on the Analysis object in the ipython console. > > On Wed, Dec 4, 2013 at 8:53 PM, Christopher J Friedline > wrote: > > > Bow, > > > > Thanks for your excellent reply and for taking the time to look - it > never > > occurred to me to use the regular (or IPython) REPL - because it's just > not > > my way of developing. I think you're right about it not being a > Biopython > > bug, but I think the problems are related to the IPython notebook > > specifically, rather than IPython, in general. I downloaded the notebook > > and ran it in the ipython console (%run magic), and it completes without > > error. Good catch - I'll touch base with the IPython folks. 
> > > > Thanks, > > Chris > > > > > > > > On Wed, Dec 4, 2013 at 7:03 PM, Wibowo Arindrarto < > w.arindrarto at gmail.com>wrote: > > > >> Hi Christopher, Peter, > >> > >> Thank you for reporting the issue. After prodding around, I have to > >> say this looks like a very interesting 'bug' :). I'm going to ramble a > >> bit below, so I'll give a short TL;DR first: > >> > >> I don't think this is a bug from Biopython per se, since if you try > >> the same code in a regular Python shell, it works (as Peter has > >> shown). Don't get me wrong, > >> I do appreciate the report in IPython notebook form (I'm a user myself > >> and I think more people should use it, also for reporting bugs like > >> this). But in the context of fixing the problem, I'd suggest that you > >> use a regular Python shell instead. > >> > >> Now, that being said, let's delve in a bit into what's causing the bug > >> (with a small disclaimer: these are my observations and I'm quite sure > >> there are others who know more about this issue). > >> > >> If you see the stack > >> trace of the error, you'd see that a huge chunk is coming from the > >> IPython codebase. Initially I thought this was something of an > >> artifact (I expected the entire stack to trace only Biopython code), > >> but I was wrong. That stack trace really points to the cause of the > >> problem: the way IPython displays cell results in the browser. > >> > >> More specifically, I've pinpointed this problem into the way IPython > >> displays restriction enzymes. You can try it in your same Ipython > >> notebook: > >> > >> from Bio import Restriction as rst > >> rst.EcoRI > >> > >> The code above will never fail in a regular Python shell, but will > >> fail in IPython. > >> > >> Why is this the case? I suspect it has something to do with the way > >> Biopython's Restriction module is written. 
Turns out, there's already > >> some nifty metaclass tricks being employed there that allows a given > >> enzyme to have the class type of itself (e.g. instead of having a > >> RestrictionType class with an instance called EcoRI, we have a > >> metaclass RestrictionType with a class called EcoRI). To keep it > >> short, IPython seems to do an attribute lookup on these metaclasses > >> when it tries to display the object. You'll notice that on line 333 > >> of > >> > /data7/cfriedline/anaconda/envs/conda/lib/python2.7/site-packages/IPython/lib/pretty.py, > >> there is an expression that tries to get the class of the object to > >> display: > >> > >> obj_class = getattr(obj, '__class__', None) or type(obj) > >> > >> However, these metaclasses have not been instantiated into a class and > >> thus do not have any 'size' attribute, IPython complains and raises an > >> error. > >> > >> So to sum up, this behavior may be cause by IPython instead of > >> Biopython. It's perhaps a good idea to mention this in the IPython > >> forum / mailing list, too. They've been doing a very impressive job so > >> far, so I have a hunch this may have popped up in one of their > >> discussions. > >> > >> I have to admit that my metaclass chops are still very limited as > >> these usually aren't required for most Python programming. But if > >> you're interested this SO page > >> ( > http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python) > >> has a very easy to follow explanation about what they are and why they > >> are useful. > >> > >> Anyway, that's enough for now I guess :). > >> > >> Cheers, > >> Bow > >> > >> P.S. Peter, perhaps these other Restriction bugs are related with the > >> metaclass? I've never delved that deep into the submodule's codebase, > >> but it looks like there are some very interesting tidbits there. 
> >> > >> On Wed, Dec 4, 2013 at 5:48 PM, Christopher J Friedline > >> wrote: > >> > Hi everyone, > >> > > >> > I ran into this problem today, and wanted to bring it up to see if > >> anyone > >> > else has seen it. Searching the archives didn't get me anywhere. I'm > >> > running a simple restriction analysis (on a random genome), and > getting > >> > this error. If I look for cut sites with the enzymes individually, it > >> works > >> > for each. However, when I put them into a RestrictionBatch, I get an > >> error > >> > on the class. > >> > > >> > type object 'RestrictionType' has no attribute 'size' > >> > > >> > I put an IPython notebook describing this here: > >> > > >> > http://nbviewer.ipython.org/gist/cfriedline/7790932 > >> > > >> > Bio (1.6.2), IPython (1.1.0), Python (2.7.6/Anaconda 1.8) > >> > > >> > Any help is appreciated, > >> > > >> > Thanks, > >> > Chris > >> > _______________________________________________ > >> > Biopython mailing list - Biopython at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/biopython > >> > > > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From w.arindrarto at gmail.com Thu Dec 5 10:01:04 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 5 Dec 2013 11:01:04 +0100 Subject: [Biopython] type object 'RestrictionType' has no attribute 'size' In-Reply-To: References: Message-ID: Hi everyone, Christopher: Ah yes, I actually meant the IPython notebook (but I guess it turns out the same occurs in the IPython console then :) ). Antony: That may be the case, too. But thanks for the pull request (I think Peter has just looked at it, actually https://github.com/biopython/biopython/pull/148). I do think there is room for improvement there, especially since the code probably predates modern Python conventions (& assumptions). 
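For comparison, the more conventional layout mentioned earlier in the thread (a plain class with one instance per enzyme) could look roughly like the sketch below. The names are made up, this is not the code from the pull request, and it ignores everything the real module does beyond storing a recognition site and finding where it starts in a plain string.

```python
class ToyEnzyme(object):
    """Plain class: each enzyme is an ordinary instance, no metaclass involved."""

    def __init__(self, name, site):
        self.name = name
        self.site = site
        self.size = len(site)

    def __len__(self):
        return self.size

    def __repr__(self):
        return self.name

    def search(self, sequence):
        """Return 1-based start positions of the site (not cut positions)."""
        positions = []
        start = sequence.find(self.site)
        while start != -1:
            positions.append(start + 1)
            start = sequence.find(self.site, start + 1)
        return positions


eco_ri = ToyEnzyme("EcoRI", "GAATTC")
hits = eco_ri.search("GAATTCACGTGAATTC")
print(repr(eco_ri), len(eco_ri), hits)
```

With this layout type(eco_ri) is an ordinary class, so generic tools that inspect an object's class see nothing unusual.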
Cheers, Bow On Thu, Dec 5, 2013 at 4:38 AM, Antony Lee wrote: > The restriction code does *way* more magic with metaclasses than needed (and > I don't think IPython is really to blame here as the code breaks some fairly > basic assumptions about the Python object model IMHO). I have in fact a PR > from nearly one year ago that basically reimplemented the whole module from > scratch (https://github.com/biopython/biopython/pull/148). Feel free to try > it. > > Antony > > > 2013/12/4 Christopher J Friedline >> >> I may have spoken too soon about the notebook specificity - got the same >> error when I run .full() on the Analysis object in the ipython console. >> >> On Wed, Dec 4, 2013 at 8:53 PM, Christopher J Friedline >> wrote: >> >> > Bow, >> > >> > Thanks for your excellent reply and for taking the time to look - it >> > never >> > occurred to me to use the regular (or IPython) REPL - because it's just >> > not >> > my way of developing. I think you're right about it not being a >> > Biopython >> > bug, but I think the problems are related to the IPython notebook >> > specifically, rather than IPython, in general. I downloaded the >> > notebook >> > and ran it in the ipython console (%run magic), and it completes without >> > error. Good catch - I'll touch base with the IPython folks. >> > >> > Thanks, >> > Chris >> > >> > >> > >> > On Wed, Dec 4, 2013 at 7:03 PM, Wibowo Arindrarto >> > wrote: >> > >> >> Hi Christopher, Peter, >> >> >> >> Thank you for reporting the issue. After prodding around, I have to >> >> say this looks like a very interesting 'bug' :). I'm going to ramble a >> >> bit below, so I'll give a short TL;DR first: >> >> >> >> I don't think this is a bug from Biopython per se, since if you try >> >> the same code in a regular Python shell, it works (as Peter has >> >> shown). 
Don't get me wrong, >> >> I do appreciate the report in IPython notebook form (I'm a user myself >> >> and I think more people should use it, also for reporting bugs like >> >> this). But in the context of fixing the problem, I'd suggest that you >> >> use a regular Python shell instead. >> >> >> >> Now, that being said, let's delve in a bit into what's causing the bug >> >> (with a small disclaimer: these are my observations and I'm quite sure >> >> there are others who know more about this issue). >> >> >> >> If you see the stack >> >> trace of the error, you'd see that a huge chunk is coming from the >> >> IPython codebase. Initially I thought this was something of an >> >> artifact (I expected the entire stack to trace only Biopython code), >> >> but I was wrong. That stack trace really points to the cause of the >> >> problem: the way IPython displays cell results in the browser. >> >> >> >> More specifically, I've pinpointed this problem into the way IPython >> >> displays restriction enzymes. You can try it in your same Ipython >> >> notebook: >> >> >> >> from Bio import Restriction as rst >> >> rst.EcoRI >> >> >> >> The code above will never fail in a regular Python shell, but will >> >> fail in IPython. >> >> >> >> Why is this the case? I suspect it has something to do with the way >> >> Biopython's Restriction module is written. Turns out, there's already >> >> some nifty metaclass tricks being employed there that allows a given >> >> enzyme to have the class type of itself (e.g. instead of having a >> >> RestrictionType class with an instance called EcoRI, we have a >> >> metaclass RestrictionType with a class called EcoRI). To keep it >> >> short, IPython seems to do an attribute lookup on these metaclasses >> >> when it tries to display the object. 
You'll notice that on line 333 >> >> of >> >> >> >> /data7/cfriedline/anaconda/envs/conda/lib/python2.7/site-packages/IPython/lib/pretty.py, >> >> there is an expression that tries to get the class of the object to >> >> display: >> >> >> >> obj_class = getattr(obj, '__class__', None) or type(obj) >> >> >> >> However, these metaclasses have not been instantiated into a class and >> >> thus do not have any 'size' attribute, IPython complains and raises an >> >> error. >> >> >> >> So to sum up, this behavior may be cause by IPython instead of >> >> Biopython. It's perhaps a good idea to mention this in the IPython >> >> forum / mailing list, too. They've been doing a very impressive job so >> >> far, so I have a hunch this may have popped up in one of their >> >> discussions. >> >> >> >> I have to admit that my metaclass chops are still very limited as >> >> these usually aren't required for most Python programming. But if >> >> you're interested this SO page >> >> >> >> (http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python) >> >> has a very easy to follow explanation about what they are and why they >> >> are useful. >> >> >> >> Anyway, that's enough for now I guess :). >> >> >> >> Cheers, >> >> Bow >> >> >> >> P.S. Peter, perhaps these other Restriction bugs are related with the >> >> metaclass? I've never delved that deep into the submodule's codebase, >> >> but it looks like there are some very interesting tidbits there. >> >> >> >> On Wed, Dec 4, 2013 at 5:48 PM, Christopher J Friedline >> >> wrote: >> >> > Hi everyone, >> >> > >> >> > I ran into this problem today, and wanted to bring it up to see if >> >> anyone >> >> > else has seen it. Searching the archives didn't get me anywhere. >> >> > I'm >> >> > running a simple restriction analysis (on a random genome), and >> >> > getting >> >> > this error. If I look for cut sites with the enzymes individually, it >> >> works >> >> > for each. 
However, when I put them into a RestrictionBatch, I get an >> >> error >> >> > on the class. >> >> > >> >> > type object 'RestrictionType' has no attribute 'size' >> >> > >> >> > I put an IPython notebook describing this here: >> >> > >> >> > http://nbviewer.ipython.org/gist/cfriedline/7790932 >> >> > >> >> > Bio (1.6.2), IPython (1.1.0), Python (2.7.6/Anaconda 1.8) >> >> > >> >> > Any help is appreciated, >> >> > >> >> > Thanks, >> >> > Chris >> >> > _______________________________________________ >> >> > Biopython mailing list - Biopython at lists.open-bio.org >> >> > http://lists.open-bio.org/mailman/listinfo/biopython >> >> >> > >> > >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython > > From p.j.a.cock at googlemail.com Thu Dec 5 10:30:27 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 5 Dec 2013 10:30:27 +0000 Subject: [Biopython] type object 'RestrictionType' has no attribute 'size' In-Reply-To: References: Message-ID: On Thu, Dec 5, 2013 at 10:01 AM, Wibowo Arindrarto wrote: > Hi everyone, > > Christopher: Ah yes, I actually meant the IPython notebook (but I > guess it turns out the same occurs in the IPython console then :) ). > > Antony: That may be the case, too. But thanks for the pull request (I > think Peter has just looked at it, actually > https://github.com/biopython/biopython/pull/148). I do think there is > room for improvement there, especially since the code probably > predates modern Python conventions (& assumptions). > > Cheers, > Bow Even Frederic Sohm (the original author) agreed that the current Bio.Restriction code is too complicated (I called it 'magic' in our discussion back in 2010 regarding a Python 2.6 problem with super http://bugzilla.open-bio.org/show_bug.cgi?id=2604 or now https://redmine.open-bio.org/issues/2604 ). And also I dislike the fact it does one-based counting. 
However, none of our currently active developers really understand the code, so changing it is hard - and backward compatibility constrains us greatly.

I think the best route forward is to replace Bio.Restriction with a new, less complicated implementation trying to follow modern Python conventions (using zero-based counting!), likely based on Antony's branch https://github.com/biopython/biopython/pull/148, and then deprecate and later remove Bio.Restriction. (We should continue that debate on the biopython-dev list, CC'd)

In terms of Christopher's problems - it would not surprise me if they are specific to IPython, since introspection of the 'magic' classes seems problematic.

Regards,

Peter

From davidsshin at lbl.gov Thu Dec 5 12:05:00 2013
From: davidsshin at lbl.gov (David Shin)
Date: Thu, 5 Dec 2013 04:05:00 -0800
Subject: [Biopython] 400 error, Did Entrez block me or is it something else?
Message-ID:

Hi all -

During off-peak times, I was working on a script that takes a list of gi numbers (a list of 8) for protein sequences to use as input to get fasta sequences via Entrez.efetch. I passed along my email address during the searches. I am still new to Python, etc., so I had been testing my program.

At some point I started getting the error below, and I'm not sure if it is my program, my web provider, or if NCBI got mad at me. It's been like this for about an hour, so I'm giving up for the night.
Thanks error: Traceback (most recent call last): File "../gi-to-fasta5.py", line 11, in handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", id=gi_numbers) File "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 144, in efetch return _open(cgi, variables, post) File "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 460, in _open raise exception urllib2.HTTPError: HTTP Error 400: Bad Request -- David Shin, Ph.D Lawrence Berkeley National Labs 1 Cyclotron Road MS 83-R0101 Berkeley, CA 94720 USA From p.j.a.cock at googlemail.com Thu Dec 5 12:31:07 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 5 Dec 2013 12:31:07 +0000 Subject: [Biopython] 400 error, Did Entrez block me or is it something else? In-Reply-To: References: Message-ID: Hi David, This is probably an intermittent failure, perhaps the NCBI is very busy or there could be a temporary network problem somewhere. The chances are it will work tomorrow... Peter On Thu, Dec 5, 2013 at 12:05 PM, David Shin wrote: > Hi all - > > During off peak times, I was working on a script that takes a list of gi > numbers (a list of 8) for protein sequences to use as input to get fasta > sequences via Entrez.efetch. I passed along my email address during the > searches. I am still new to python, etc. so had been testing my program. > > At some point I started getting the error below, and I'm not sure if it is > my program, my web provider, or if ncbi got mad at me. It's been like this > for about an hour, so giving up for the night. 
> > Thanks > > error: > > Traceback (most recent call last): > File "../gi-to-fasta5.py", line 11, in > handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", > id=gi_numbers) > File > "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", > line 144, in efetch > return _open(cgi, variables, post) > File > "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", > line 460, in _open > raise exception > urllib2.HTTPError: HTTP Error 400: Bad Request > > > > > > -- > David Shin, Ph.D > Lawrence Berkeley National Labs > 1 Cyclotron Road > MS 83-R0101 > Berkeley, CA 94720 > USA > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From dhoworth at mrc-lmb.cam.ac.uk Thu Dec 5 14:42:47 2013 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Thu, 05 Dec 2013 14:42:47 +0000 Subject: [Biopython] Bio.PDB local MMCIF files Message-ID: <52A090E7.10706@mrc-lmb.cam.ac.uk> I'm just looking at Bio.PDB with a view to switching over to it, so I can use it to read MMCIF files in preparation for 4-character chain IDs etc. I presently use a homegrown perl module to read PDB files, and I have a local PDB mirror. I see from http://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ that there's some sort of facility to use local files, perhaps using PDBList() but I don't see from the FAQ or the PDBList class docs exactly which directory it wants and whether it is PDB files only or a complete archive. The only hints seem to hint at a single directory of PDB files or a divided directory hierarchy of PDB files, neither of which will contain the MMCIF files. So is it: /$TOP/data /$TOP/data/structures /$TOP/data/structures/all Or is it some other technique entirely to access MMCIF files? Looking at the code, it seems to want a full path to an MMCIF file, and the MMCIF file must be uncompressed. 
Since all the files in the archive are compressed, that would be another thing I'd want to fix.

Oh, I'm currently using the V1.58 that came with the machine I'm testing on. I can upgrade if essential.

Thanks and regards,
Dave

From anaryin at gmail.com Thu Dec 5 15:09:17 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 5 Dec 2013 16:09:17 +0100
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To: <52A090E7.10706@mrc-lmb.cam.ac.uk>
References: <52A090E7.10706@mrc-lmb.cam.ac.uk>
Message-ID:

Dear Dave,

I'm not quite sure I understood your question. PDBList is used to download and maintain a local copy of the PDB, which would not suit you since you are looking for mmCIF data. It could be tweaked, however, to download mmCIF files. Is this what you are looking for?

As for mmCIF parsing and manipulation, currently the parser accepts a path to the file (relative paths should do) but indeed it does not handle compression. I think it would be up to the user to inflate the gz file before parsing.

Best,

João

2013/12/5 Dave Howorth

> I'm just looking at Bio.PDB with a view to switching over to it, so I
> can use it to read MMCIF files in preparation for 4-character chain IDs
> etc. I presently use a homegrown perl module to read PDB files, and I
> have a local PDB mirror. I see from
>
> http://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ
>
> that there's some sort of facility to use local files, perhaps using
> PDBList() but I don't see from the FAQ or the PDBList class docs exactly
> which directory it wants and whether it is PDB files only or a complete
> archive. The only hints seem to hint at a single directory of PDB files
> or a divided directory hierarchy of PDB files, neither of which will
> contain the MMCIF files. So is it:
>
> /$TOP/data
> /$TOP/data/structures
> /$TOP/data/structures/all
>
> Or is it some other technique entirely to access MMCIF files?
> > Looking at the code, it seems to want a full path to an MMCIF file, and > the MMCIF file must be uncompressed. Since all the files in the archive > are compressed, that would be another thing I'd want to fix. > > Oh, I'm currently using the V1.58 that came with the machine I'm testing > on. I can upgrade if essential. > > Thanks and regards, > Dave > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From phyrexian.kavu at gmail.com Thu Dec 5 15:13:02 2013 From: phyrexian.kavu at gmail.com (phyrexian.kavu at gmail.com) Date: Thu, 5 Dec 2013 09:13:02 -0600 Subject: [Biopython] 400 error, Did Entrez block me or is it something else? In-Reply-To: References: Message-ID: Hi David! I had the same error while trying to download hundreds of sequences from Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter says it is an internet error. I avoided it by making a loop and adding the try - except statements, so whenever this error appeared, the script just returned and tried it again until the sequences were downloaded correctly. I know it is a brute force-like approach, but this was the best solution I could think of, I hope It helps you too. Miguel [Theropoda is my profession] > On 05/12/2013, at 06:31, Peter Cock wrote: > > Hi David, > > This is probably an intermittent failure, perhaps the NCBI > is very busy or there could be a temporary network problem > somewhere. The chances are it will work tomorrow... > > Peter > > >> On Thu, Dec 5, 2013 at 12:05 PM, David Shin wrote: >> Hi all - >> >> During off peak times, I was working on a script that takes a list of gi >> numbers (a list of 8) for protein sequences to use as input to get fasta >> sequences via Entrez.efetch. I passed along my email address during the >> searches. I am still new to python, etc. so had been testing my program. 
>>
>> At some point I started getting the error below, and I'm not sure if it is
>> my program, my web provider, or if ncbi got mad at me. It's been like this
>> for about an hour, so giving up for the night.
>>
>> Thanks
>>
>> error:
>>
>> Traceback (most recent call last):
>> File "../gi-to-fasta5.py", line 11, in
>> handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
>> id=gi_numbers)
>> File
>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py",
>> line 144, in efetch
>> return _open(cgi, variables, post)
>> File
>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py",
>> line 460, in _open
>> raise exception
>> urllib2.HTTPError: HTTP Error 400: Bad Request
>>
>> --
>> David Shin, Ph.D
>> Lawrence Berkeley National Labs
>> 1 Cyclotron Road
>> MS 83-R0101
>> Berkeley, CA 94720
>> USA

From anaryin at gmail.com Thu Dec 5 16:53:46 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 5 Dec 2013 17:53:46 +0100
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To: <52A0ADA8.5080406@mrc-lmb.cam.ac.uk>
References: <52A090E7.10706@mrc-lmb.cam.ac.uk> <52A0ADA8.5080406@mrc-lmb.cam.ac.uk>
Message-ID:

Hi Dave,

I understand your concern. Python has the gzip module, which can decompress the files on the fly and provide a handle for the content. This will not work for the parsers, since they expect a filename. I will have a look at the parsers' code and, if it's simple, I'll add a layer to do exactly this.

Cheers,

João

2013/12/5 Dave Howorth

> João Rodrigues wrote:
> > Dear Dave,
> >
> > I'm not quite sure I understood your question.
PDBList is used to > download > > and maintain a local copy of the PDB, which would not suit you since you > > are looking for mmCIF data. It could be tweaked however to download mmCIF > > files. Is this what you are looking for? > > Sorry, I didn't express myself very well. I misunderstood the purpose of > PDBList, and at the time thought it was simply a way to tell Biopython > where the local archive was. I already have access to a PDB/mmCIF > archive; I don't need to create one. > > > As for mmCIF parsing and manipulation, currently the parser accepts a > path > > to the file (relative paths should do) but indeed it does not handle > > compression. I think it would be up to the user to inflate the gz file > > before parsing.. > > I don't think that is very convenient, since all the files are normally > stored compressed. That's the usual case. Using a filename as the only > way to specify a file means that I would have to open the file in the > archive, read and uncompress it and store it in another file before > passing the name of that file to the mmCIF parser. Unless python > supports some means to incorporate a decopmression layer specification > into the 'filename'? (Sorry, I'm new to python) > > I would think that the 'nicest' solution would be for the parser to > recognize a compressed file and use a gzip layer to decompress it on the > fly. Alternatively, the parser could accept an open file handle as an > alternative to a filename and the caller would be responsible for > opening the file through a decompression layer. > > Since the caller is going to have to deal with prepending the library > base to the filename anyway, I suppose having it produce a decompressed > stream is not a great problem, if only it could pass the stream to the > parser! 
>
> Cheers, Dave
>
> > Best,
> >
> > João
>
> PS, Sorry if I'm breaking threads by replying to the copy of the email
> that João sent to me, but the copy from the mail server hasn't arrived
> here yet, despite being visible at gmane.
>

From dhoworth at mrc-lmb.cam.ac.uk Thu Dec 5 16:45:28 2013
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Thu, 05 Dec 2013 16:45:28 +0000
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To:
References: <52A090E7.10706@mrc-lmb.cam.ac.uk>
Message-ID: <52A0ADA8.5080406@mrc-lmb.cam.ac.uk>

João Rodrigues wrote:
> Dear Dave,
>
> I'm not quite sure I understood your question. PDBList is used to download
> and maintain a local copy of the PDB, which would not suit you since you
> are looking for mmCIF data. It could be tweaked however to download mmCIF
> files. Is this what you are looking for?

Sorry, I didn't express myself very well. I misunderstood the purpose of PDBList, and at the time thought it was simply a way to tell Biopython where the local archive was. I already have access to a PDB/mmCIF archive; I don't need to create one.

> As for mmCIF parsing and manipulation, currently the parser accepts a path
> to the file (relative paths should do) but indeed it does not handle
> compression. I think it would be up to the user to inflate the gz file
> before parsing..

I don't think that is very convenient, since all the files are normally stored compressed. That's the usual case. Using a filename as the only way to specify a file means that I would have to open the file in the archive, read and uncompress it, and store it in another file before passing the name of that file to the mmCIF parser. Unless Python supports some means to incorporate a decompression layer specification into the 'filename'? (Sorry, I'm new to Python)

I would think that the 'nicest' solution would be for the parser to recognize a compressed file and use a gzip layer to decompress it on the fly.
Alternatively, the parser could accept an open file handle as an alternative to a filename and the caller would be responsible for opening the file through a decompression layer.

Since the caller is going to have to deal with prepending the library base to the filename anyway, I suppose having it produce a decompressed stream is not a great problem, if only it could pass the stream to the parser!

Cheers, Dave

> Best,
>
> João

PS, Sorry if I'm breaking threads by replying to the copy of the email that João sent to me, but the copy from the mail server hasn't arrived here yet, despite being visible at gmane.

From p.j.a.cock at googlemail.com Thu Dec 5 17:35:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 17:35:50 +0000
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To:
References: <52A090E7.10706@mrc-lmb.cam.ac.uk> <52A0ADA8.5080406@mrc-lmb.cam.ac.uk>
Message-ID:

On Thu, Dec 5, 2013 at 4:53 PM, João Rodrigues wrote:
> Hi Dave,
>
> I understand your concern. Python has the gzip module that can decompress
> the files on the fly and provide a handle for the content. This will not
> work for the parsers since they expect a filename. I will have a look at
> the parsers code and if it's simple, I'll add a layer to do this exactly.
>
> Cheers,
>
> João

Yeah, most of Biopython's parsers take a handle only; some, like the top level functions in Bio.SeqIO, AlignIO and SearchIO, will take a filename or a handle for convenience.

I agree that it is unfortunate that Bio.PDB currently only takes a filename (historical design choice).
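Until a parser accepts a handle, the workaround discussed in this thread (inflate the .gz file first, then hand the parser a plain filename) can be sketched with the standard library alone. This is a minimal sketch, not part of Biopython: the helper name `gunzip_to_tempfile` and the example path are hypothetical, and the caller is assumed to clean up the temporary file.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_to_tempfile(gz_path, suffix=".cif"):
    """Decompress gz_path into a temporary file and return that file's path.

    The caller is responsible for deleting the temporary file afterwards.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as dst:
        src = gzip.open(gz_path, "rb")
        try:
            # Stream the decompressed bytes so large entries are not
            # held in memory all at once.
            shutil.copyfileobj(src, dst)
        finally:
            src.close()
    return tmp_path


# Hypothetical usage with a filename-only parser (the path is made up):
#     from Bio.PDB.MMCIFParser import MMCIFParser
#     cif_path = gunzip_to_tempfile("/mirror/mmCIF/ab/1abc.cif.gz")
#     try:
#         structure = MMCIFParser().get_structure("1abc", cif_path)
#     finally:
#         os.remove(cif_path)
```

The extra copy costs disk I/O for every structure read, which is the inconvenience raised above; a handle-accepting parser would avoid it.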
If we can tweak it to take either a filename or a handle, that would be much better :)

Regards,

Peter

From davidsshin at lbl.gov Fri Dec 6 07:27:16 2013
From: davidsshin at lbl.gov (David Shin)
Date: Thu, 5 Dec 2013 23:27:16 -0800
Subject: [Biopython] going from protein to gene to oligos for cloning
Message-ID:

Hi again,

I'm trying to use Biopython to help me grab a lot of protein sequences that will eventually be used as the basis for cloning. I'm almost done screening my protein sequences, and pretty much ok on that part...

I was just curious if anyone has already developed, or has any decent advice on, going from protein codes to getting the actual coding sequences of the genes.

At this point, my plan is to take the protein codes (i.e. the numbers in gi|145323746|) and use them to search the Entrez nucleotide databases directly to get hits (I have tested this once and it seems to work for getting GenBank records), then try to use the information inside those records to get the nucleotide sequences. Or I guess the other way is to use the top hit from tblastn somehow?

Thanks,
Dave

--
David Shin, Ph.D
Lawrence Berkeley National Labs
1 Cyclotron Road
MS 83-R0101
Berkeley, CA 94720
USA

From davidsshin at lbl.gov Fri Dec 6 07:20:00 2013
From: davidsshin at lbl.gov (David Shin)
Date: Thu, 5 Dec 2013 23:20:00 -0800
Subject: [Biopython] 400 error, Did Entrez block me or is it something else?
In-Reply-To:
References:
Message-ID:

Thanks guys -

For now it eventually worked its way out, but if you could show me a snippet of how try and except statements work, I would be thankful (I know I can probably google it too). I'm also new to Python in general, so coding examples always help.

Thanks again,

Dave

On Thu, Dec 5, 2013 at 7:13 AM, wrote:

> Hi David!
>
> I had the same error while trying to download hundreds of sequences from
> Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter
> says it is an internet error.
I avoided it by making a loop and adding the > try - except statements, so whenever this error appeared, the script just > returned and tried it again until the sequences were downloaded correctly. > I know it is a brute force-like approach, but this was the best solution I > could think of, I hope It helps you too. > > Miguel > > > [Theropoda is my profession] > > > On 05/12/2013, at 06:31, Peter Cock wrote: > > > > Hi David, > > > > This is probably an intermittent failure, perhaps the NCBI > > is very busy or there could be a temporary network problem > > somewhere. The chances are it will work tomorrow... > > > > Peter > > > > > >> On Thu, Dec 5, 2013 at 12:05 PM, David Shin wrote: > >> Hi all - > >> > >> During off peak times, I was working on a script that takes a list of gi > >> numbers (a list of 8) for protein sequences to use as input to get fasta > >> sequences via Entrez.efetch. I passed along my email address during the > >> searches. I am still new to python, etc. so had been testing my program. > >> > >> At some point I started getting the error below, and I'm not sure if it > is > >> my program, my web provider, or if ncbi got mad at me. It's been like > this > >> for about an hour, so giving up for the night. 
> >> > >> Thanks > >> > >> error: > >> > >> Traceback (most recent call last): > >> File "../gi-to-fasta5.py", line 11, in > >> handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", > >> id=gi_numbers) > >> File > >> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", > >> line 144, in efetch > >> return _open(cgi, variables, post) > >> File > >> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", > >> line 460, in _open > >> raise exception > >> urllib2.HTTPError: HTTP Error 400: Bad Request > >> > >> > >> > >> > >> > >> -- > >> David Shin, Ph.D > >> Lawrence Berkeley National Labs > >> 1 Cyclotron Road > >> MS 83-R0101 > >> Berkeley, CA 94720 > >> USA > >> _______________________________________________ > >> Biopython mailing list - Biopython at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > -- David Shin, Ph.D Lawrence Berkeley National Labs 1 Cyclotron Road MS 83-R0101 Berkeley, CA 94720 USA From karin.lagesen at medisin.uio.no Fri Dec 6 09:26:06 2013 From: karin.lagesen at medisin.uio.no (Karin Lagesen) Date: Fri, 06 Dec 2013 10:26:06 +0100 Subject: [Biopython] 400 error, Did Entrez block me or is it something else? In-Reply-To: References: Message-ID: <52A1982E.1070501@medisin.uio.no> IIRC, NCBI requires a 3 second delay between fetches to avoid flooding the server. I cannot find again the site where I found this, but that's what I learned the last time I did this, and I am also finding the same info in bioperl documentation. 
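A minimal sketch of the try/except retry loop asked about earlier in this thread, with a pause between attempts. The helper name `fetch_with_retries` and its parameters are hypothetical, not a Biopython API; the 3 second default simply follows the figure recalled above and should be checked against NCBI's current guidelines. Retrying only helps with temporary failures: if the request itself is malformed, an HTTP 400 will come back every time and the last error is re-raised.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay=3.0, retry_on=(IOError,)):
    """Call fetch() until it succeeds or `attempts` tries are used up.

    fetch    -- zero-argument callable that performs one request
    delay    -- seconds to wait between tries
    retry_on -- exception types treated as temporary failures; HTTP errors
                raised by urllib2 (Python 2) and urllib (Python 3) derive
                from IOError
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise          # out of tries: re-raise the last error
            time.sleep(delay)  # wait before trying again


# Hypothetical usage (gi_numbers and the e-mail address are placeholders):
#     from Bio import Entrez
#     Entrez.email = "you@example.org"
#     handle = fetch_with_retries(
#         lambda: Entrez.efetch(db="protein", rettype="fasta",
#                               retmode="text", id=gi_numbers))
#     fasta_text = handle.read()
```

Passing the request as a callable keeps the retry logic separate from the Entrez call, so the same helper can wrap esearch or any other network request.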
Karin On 06.12.2013 08:20, David Shin wrote: > Thanks guys - > > For now it eventually worked its way out, but if you could show me a > snippet of how try and except statements work, I would be thankful (I know > I can probably google it too), I'm also new to python in general, so coding > examples always help. > > Thanks again, > > Dave > > > On Thu, Dec 5, 2013 at 7:13 AM, wrote: > >> Hi David! >> >> I had the same error while trying to download hundreds of sequences from >> Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter >> says it is an internet error. I avoided it by making a loop and adding the >> try - except statements, so whenever this error appeared, the script just >> returned and tried it again until the sequences were downloaded correctly. >> I know it is a brute force-like approach, but this was the best solution I >> could think of, I hope It helps you too. >> >> Miguel >> >> >> [Theropoda is my profession] >> >>> On 05/12/2013, at 06:31, Peter Cock wrote: >>> >>> Hi David, >>> >>> This is probably an intermittent failure, perhaps the NCBI >>> is very busy or there could be a temporary network problem >>> somewhere. The chances are it will work tomorrow... >>> >>> Peter >>> >>> >>>> On Thu, Dec 5, 2013 at 12:05 PM, David Shin wrote: >>>> Hi all - >>>> >>>> During off peak times, I was working on a script that takes a list of gi >>>> numbers (a list of 8) for protein sequences to use as input to get fasta >>>> sequences via Entrez.efetch. I passed along my email address during the >>>> searches. I am still new to python, etc. so had been testing my program. >>>> >>>> At some point I started getting the error below, and I'm not sure if it >> is >>>> my program, my web provider, or if ncbi got mad at me. It's been like >> this >>>> for about an hour, so giving up for the night. 
>>>> >>>> Thanks >>>> >>>> error: >>>> >>>> Traceback (most recent call last): >>>> File "../gi-to-fasta5.py", line 11, in >>>> handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", >>>> id=gi_numbers) >>>> File >>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>> line 144, in efetch >>>> return _open(cgi, variables, post) >>>> File >>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>> line 460, in _open >>>> raise exception >>>> urllib2.HTTPError: HTTP Error 400: Bad Request >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> David Shin, Ph.D >>>> Lawrence Berkeley National Labs >>>> 1 Cyclotron Road >>>> MS 83-R0101 >>>> Berkeley, CA 94720 >>>> USA >>>> _______________________________________________ >>>> Biopython mailing list - Biopython at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython >>> _______________________________________________ >>> Biopython mailing list - Biopython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython >> > > > -- Karin Lagesen, Ph.D. Department of Medical Genetics and Norwegian High-Throughput Sequencing Centre (NSC) Oslo University Hospital From norbert.auer at boku.ac.at Fri Dec 6 10:09:18 2013 From: norbert.auer at boku.ac.at (Norbert Auer) Date: Fri, 06 Dec 2013 11:09:18 +0100 Subject: [Biopython] 400 error, Did Entrez block me or is it something else? In-Reply-To: References: Message-ID: <52A1A24E.80403@boku.ac.at> Hi David, The Entrez Service seems to be very buggy. Sometimes I even got different results for the same query. Therefore I stopped experiments with the Entrez API Service. But you can use a faster and more robust approach. You can simply download the whole data at once from NCBI's ftp site. 
For example, the address for the complete gene information for mouse is:

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Mus_musculus.ags.gz

Then you can use the NCBI ToolKit utility programs to convert this ags
binary file into XML. This is the same format you get from the
Entrez.efetch command. It is also described in the Biopython tutorial:

http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec128

You can use this simple shell script to download the data, extract the
right genome and convert it into XML:

curl -C - ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/{All_Mammalia.ags.gz} -o "Sources/ASN/#1"
gene2xml -b T -c T -i Sources/ASN/All_Mammalia.ags.gz -t 10029 -o Output/10029.xml

These lines extract the complete gene information for Cricetulus griseus
(10029) in one go. I am also working on a parser to extract RefSeq and GI
ids (genomic-mRNA-peptide) from this XML into CSV and HTML files. If by
any chance you work with mouse or CHO sequences, you can take a look here:

http://ala.boku.ac.at:4080/nauer/tools/genomes

Hope this helps.

Best regards,
Norbert

On 2013-12-06 08:20, David Shin wrote:
> Thanks guys -
>
> For now it eventually worked its way out, but if you could show me a
> snippet of how try and except statements work, I would be thankful (I know
> I can probably google it too), I'm also new to python in general, so coding
> examples always help.
>
> Thanks again,
>
> Dave
>
>
> On Thu, Dec 5, 2013 at 7:13 AM, wrote:
>
>> Hi David!
>>
>> I had the same error while trying to download hundreds of sequences from
>> Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter
>> says it is an internet error. I avoided it by making a loop and adding the
>> try - except statements, so whenever this error appeared, the script just
>> returned and tried it again until the sequences were downloaded correctly.
>> I know it is a brute force-like approach, but this was the best solution I
>> could think of, I hope It helps you too.
>> >> Miguel >> >> >> [Theropoda is my profession] >> >>> On 05/12/2013, at 06:31, Peter Cock wrote: >>> >>> Hi David, >>> >>> This is probably an intermittent failure, perhaps the NCBI >>> is very busy or there could be a temporary network problem >>> somewhere. The chances are it will work tomorrow... >>> >>> Peter >>> >>> >>>> On Thu, Dec 5, 2013 at 12:05 PM, David Shin wrote: >>>> Hi all - >>>> >>>> During off peak times, I was working on a script that takes a list of gi >>>> numbers (a list of 8) for protein sequences to use as input to get fasta >>>> sequences via Entrez.efetch. I passed along my email address during the >>>> searches. I am still new to python, etc. so had been testing my program. >>>> >>>> At some point I started getting the error below, and I'm not sure if it >> is >>>> my program, my web provider, or if ncbi got mad at me. It's been like >> this >>>> for about an hour, so giving up for the night. >>>> >>>> Thanks >>>> >>>> error: >>>> >>>> Traceback (most recent call last): >>>> File "../gi-to-fasta5.py", line 11, in >>>> handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", >>>> id=gi_numbers) >>>> File >>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>> line 144, in efetch >>>> return _open(cgi, variables, post) >>>> File >>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>> line 460, in _open >>>> raise exception >>>> urllib2.HTTPError: HTTP Error 400: Bad Request >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> David Shin, Ph.D >>>> Lawrence Berkeley National Labs >>>> 1 Cyclotron Road >>>> MS 83-R0101 >>>> Berkeley, CA 94720 >>>> USA >>>> _______________________________________________ >>>> Biopython mailing list - Biopython at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython >>> _______________________________________________ >>> Biopython mailing list - Biopython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython 
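Since the thread above asks for a try/except snippet, here is a minimal sketch of the retry loop Miguel describes. It is not code from any poster: the helper name fetch_with_retries and its parameters are made up for illustration, and it assumes the failure surfaces as an IOError (urllib2.HTTPError is a subclass of IOError on Python 2, and the Python 3 equivalent is a subclass of OSError, which IOError aliases). The number of attempts is capped, because a 400 can also mean the request itself is wrong, in which case retrying forever will not help.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay=3.0, retry_on=(IOError,)):
    """Call fetch() until it succeeds or the attempts run out.

    fetch    -- any zero-argument callable that performs the download
    attempts -- maximum number of tries before giving up
    delay    -- seconds to wait between tries
    retry_on -- exception types that are treated as temporary failures
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as err:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            print("Attempt %d failed (%s); retrying in %s s" % (attempt, err, delay))
            time.sleep(delay)


# Hypothetical usage with Biopython (needs network access, so left commented out):
# from Bio import Entrez
# Entrez.email = "you@example.org"
# handle = fetch_with_retries(
#     lambda: Entrez.efetch(db="protein", rettype="fasta",
#                           retmode="text", id=gi_numbers))
```

Passing the download as a callable keeps the retry logic separate from the Entrez call, so the same loop can wrap any other request.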
>> > > > From norbert.auer at boku.ac.at Fri Dec 6 10:05:10 2013 From: norbert.auer at boku.ac.at (Norbert Auer) Date: Fri, 06 Dec 2013 11:05:10 +0100 Subject: [Biopython] 400 error, Did Entrez block me or is it something else? In-Reply-To: <52A1982E.1070501@medisin.uio.no> References: <52A1982E.1070501@medisin.uio.no> Message-ID: <52A1A156.1050701@boku.ac.at> This is automatically enforced by Biopython. See http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec119 Best regards, Norbert Am 2013-12-06 10:26, schrieb Karin Lagesen: > IIRC, NCBI requires a 3 second delay between fetches to avoid flooding > the server. I cannot find again the site where I found this, but that's > what I learned the last time I did this, and I am also finding the same > info in bioperl documentation. > > Karin > > On 06.12.2013 08:20, David Shin wrote: >> Thanks guys - >> >> For now it eventually worked its way out, but if you could show me a >> snippet of how try and except statements work, I would be thankful (I >> know >> I can probably google it too), I'm also new to python in general, so >> coding >> examples always help. >> >> Thanks again, >> >> Dave >> >> >> On Thu, Dec 5, 2013 at 7:13 AM, wrote: >> >>> Hi David! >>> >>> I had the same error while trying to download hundreds of sequences from >>> Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter >>> says it is an internet error. I avoided it by making a loop and >>> adding the >>> try - except statements, so whenever this error appeared, the script >>> just >>> returned and tried it again until the sequences were downloaded >>> correctly. >>> I know it is a brute force-like approach, but this was the best >>> solution I >>> could think of, I hope It helps you too. 
>>> >>> Miguel >>> >>> >>> [Theropoda is my profession] >>> >>>> On 05/12/2013, at 06:31, Peter Cock wrote: >>>> >>>> Hi David, >>>> >>>> This is probably an intermittent failure, perhaps the NCBI >>>> is very busy or there could be a temporary network problem >>>> somewhere. The chances are it will work tomorrow... >>>> >>>> Peter >>>> >>>> >>>>> On Thu, Dec 5, 2013 at 12:05 PM, David Shin >>>>> wrote: >>>>> Hi all - >>>>> >>>>> During off peak times, I was working on a script that takes a list >>>>> of gi >>>>> numbers (a list of 8) for protein sequences to use as input to get >>>>> fasta >>>>> sequences via Entrez.efetch. I passed along my email address during >>>>> the >>>>> searches. I am still new to python, etc. so had been testing my >>>>> program. >>>>> >>>>> At some point I started getting the error below, and I'm not sure >>>>> if it >>> is >>>>> my program, my web provider, or if ncbi got mad at me. It's been like >>> this >>>>> for about an hour, so giving up for the night. 
>>>>> >>>>> Thanks >>>>> >>>>> error: >>>>> >>>>> Traceback (most recent call last): >>>>> File "../gi-to-fasta5.py", line 11, in >>>>> handle = Entrez.efetch(db="protein", rettype="fasta", >>>>> retmode="text", >>>>> id=gi_numbers) >>>>> File >>>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>>> >>>>> line 144, in efetch >>>>> return _open(cgi, variables, post) >>>>> File >>>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >>>>> >>>>> line 460, in _open >>>>> raise exception >>>>> urllib2.HTTPError: HTTP Error 400: Bad Request >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> David Shin, Ph.D >>>>> Lawrence Berkeley National Labs >>>>> 1 Cyclotron Road >>>>> MS 83-R0101 >>>>> Berkeley, CA 94720 >>>>> USA >>>>> _______________________________________________ >>>>> Biopython mailing list - Biopython at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biopython >>>> _______________________________________________ >>>> Biopython mailing list - Biopython at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython >>> >> >> >> > > From davidsshin at lbl.gov Fri Dec 6 10:04:53 2013 From: davidsshin at lbl.gov (David Shin) Date: Fri, 6 Dec 2013 02:04:53 -0800 Subject: [Biopython] 400 error, Did Entrez block me or is it something else? In-Reply-To: <52A1982E.1070501@medisin.uio.no> References: <52A1982E.1070501@medisin.uio.no> Message-ID: Hi Karin, Yes, I saw that in the tutorial manual, but appeared biopython took care of that part. I was thinking about putting a delay in my script anyway. On Fri, Dec 6, 2013 at 1:26 AM, Karin Lagesen wrote: > IIRC, NCBI requires a 3 second delay between fetches to avoid flooding the > server. I cannot find again the site where I found this, but that's what I > learned the last time I did this, and I am also finding the same info in > bioperl documentation. 
> > Karin > > > On 06.12.2013 08:20, David Shin wrote: > >> Thanks guys - >> >> For now it eventually worked its way out, but if you could show me a >> snippet of how try and except statements work, I would be thankful (I know >> I can probably google it too), I'm also new to python in general, so >> coding >> examples always help. >> >> Thanks again, >> >> Dave >> >> >> On Thu, Dec 5, 2013 at 7:13 AM, wrote: >> >> Hi David! >>> >>> I had the same error while trying to download hundreds of sequences from >>> Entrez.efetch, I'm afraid this error cannot be resolved, since as Peter >>> says it is an internet error. I avoided it by making a loop and adding >>> the >>> try - except statements, so whenever this error appeared, the script just >>> returned and tried it again until the sequences were downloaded >>> correctly. >>> I know it is a brute force-like approach, but this was the best solution >>> I >>> could think of, I hope It helps you too. >>> >>> Miguel >>> >>> >>> [Theropoda is my profession] >>> >>> On 05/12/2013, at 06:31, Peter Cock wrote: >>>> >>>> Hi David, >>>> >>>> This is probably an intermittent failure, perhaps the NCBI >>>> is very busy or there could be a temporary network problem >>>> somewhere. The chances are it will work tomorrow... >>>> >>>> Peter >>>> >>>> >>>> On Thu, Dec 5, 2013 at 12:05 PM, David Shin >>>>> wrote: >>>>> Hi all - >>>>> >>>>> During off peak times, I was working on a script that takes a list of >>>>> gi >>>>> numbers (a list of 8) for protein sequences to use as input to get >>>>> fasta >>>>> sequences via Entrez.efetch. I passed along my email address during the >>>>> searches. I am still new to python, etc. so had been testing my >>>>> program. >>>>> >>>>> At some point I started getting the error below, and I'm not sure if it >>>>> >>>> is >>> >>>> my program, my web provider, or if ncbi got mad at me. It's been like >>>>> >>>> this >>> >>>> for about an hour, so giving up for the night. 
>>>>> >>>>> Thanks >>>>> >>>>> error: >>>>> >>>>> Traceback (most recent call last): >>>>> File "../gi-to-fasta5.py", line 11, in >>>>> handle = Entrez.efetch(db="protein", rettype="fasta", >>>>> retmode="text", >>>>> id=gi_numbers) >>>>> File >>>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/ >>>>> Entrez/__init__.py", >>>>> line 144, in efetch >>>>> return _open(cgi, variables, post) >>>>> File >>>>> "/Users/daanac/onda/lib/python2.7/site-packages/Bio/ >>>>> Entrez/__init__.py", >>>>> line 460, in _open >>>>> raise exception >>>>> urllib2.HTTPError: HTTP Error 400: Bad Request >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> David Shin, Ph.D >>>>> Lawrence Berkeley National Labs >>>>> 1 Cyclotron Road >>>>> MS 83-R0101 >>>>> Berkeley, CA 94720 >>>>> USA >>>>> _______________________________________________ >>>>> Biopython mailing list - Biopython at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biopython >>>>> >>>> _______________________________________________ >>>> Biopython mailing list - Biopython at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython >>>> >>> >>> >> >> >> > > -- > Karin Lagesen, Ph.D. 
> Department of Medical Genetics > and Norwegian High-Throughput Sequencing Centre (NSC) > Oslo University Hospital > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- David Shin, Ph.D Lawrence Berkeley National Labs 1 Cyclotron Road MS 83-R0101 Berkeley, CA 94720 USA From p.j.a.cock at googlemail.com Fri Dec 6 10:24:38 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 6 Dec 2013 10:24:38 +0000 Subject: [Biopython] going from protein to gene to oligos for cloning In-Reply-To: References: Message-ID: On Fri, Dec 6, 2013 at 7:27 AM, David Shin wrote: > Hi again, > > I'm trying to use biopython to help me grab a lot of protein sequences that > will eventually be used as the basis for cloning. I'm almost done screening > my protein sequences, and pretty much ok on that part... > > I was just curious if anyone has already developed, or has any decent > advice on going from protein codes to getting the actual coding sequences > of the genes. > > At this point, my plan is to take protein codes (ie. numbers in > gi|145323746|) and use these to search entrez nucleotide databases directly > to get hits (I have tested it once seems to work to get genbank records... > then try to use the information inside to get the nucleotide sequences... > or I guess the other way is to use the top hit from tblastn somehow? > > Thanks, > > Dave Hi Dave, The catch here is the protein IDs are not directly usable in the nucleotide database - which is where ELink (Entrez Link) comes in, available as the Entrez.elink(...) function in Biopython. 
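As a rough illustration of the ELink step, the sketch below asks NCBI which nucleotide records are linked to one protein GI and collects their IDs. The function names are invented for this example, the choice of db="nuccore" is an assumption about which target database is wanted, and the linked record may be a whole mRNA or genome rather than just the coding sequence, so extracting the CDS would still need feature parsing.

```python
def linked_ids(linksets):
    """Collect the target IDs from a parsed ELink result (a list of link sets)."""
    ids = []
    for linkset in linksets:
        for linksetdb in linkset.get("LinkSetDb", []):
            for link in linksetdb.get("Link", []):
                ids.append(link["Id"])
    return ids


def protein_gi_to_nucleotide_ids(gi, email):
    """Return nucleotide IDs linked to a protein GI (needs network access)."""
    # Imported here so that linked_ids() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(dbfrom="protein", db="nuccore", id=gi)
    try:
        return linked_ids(Entrez.read(handle))
    finally:
        handle.close()


# Hypothetical usage, with the GI from the message above:
# print(protein_gi_to_nucleotide_ids("145323746", "you@example.org"))
```

The IDs returned this way could then be passed to Entrez.efetch against the nucleotide database to retrieve the GenBank records.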
I've not tried it myself, but a colleague posted a long example
on his blog which sounds close to what you are aiming for:

http://armchairbiology.blogspot.co.uk/2013/02/surely-this-has-been-done-already.html
https://github.com/widdowquinn/scripts/blob/master/bioinformatics/get_NCBI_cds_from_protein.py

Peter

From tiagoantao at gmail.com Fri Dec 6 11:27:28 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Dec 2013 11:27:28 +0000
Subject: [Biopython] Biopython 1.63 released
Message-ID:

Source distributions and Windows installers for Biopython 1.63 are now
available from the downloads page on the official Biopython website and
(soon) from the Python Package Index (PyPI).

The current version removed the requirement of the 2to3 library. This was
made possible by dropping Python 2.5 (and Jython 2.5).

This release of Biopython supports Python 2.6 and 2.7, and also Python 3.3.

The Biopython Tutorial & Cookbook, and the docstring examples in the source
code, now use the Python 3 style print function in place of the Python 2
style print statement. This language feature is available under Python 2.6
and 2.7 via:

from __future__ import print_function

Similarly we now use the Python 3 style built-in next function in place of
the Python 2 style iterators' .next() method. This language feature is also
available under Python 2.6 and 2.7.

The restriction enzyme list in Bio.Restriction has been updated to the
December 2013 release of REBASE.

Many thanks to the Biopython developers and community for making this
release possible, especially the following contributors:

Chris Mitchell (first contribution)
Christian Brueffer
Eric Talevich
Gokcen Eraslan (first contribution)
Josha Inglis (first contribution)
Konstantin Tretyakov (first contribution)
Lenna Peterson
Martin Mokrejs
Nigel Delaney (first contribution)
Peter Cock
Sergei Lebedev (first contribution)
Tiago Antao
Wayne Decatur (first contribution)
Wibowo "Bow" Arindrarto

From davidsshin at lbl.gov Tue Dec 10 07:59:33 2013
From: davidsshin at lbl.gov (David Shin)
Date: Mon, 9 Dec 2013 23:59:33 -0800
Subject: [Biopython] using prosite
Message-ID:

Hi Again,

I'm trying to find motifs in a group of proteins using prosite via
biopython. I've tried to copy what's in the tutorial (pg 152-153) into a
script to get started. If I run it on my test sequence I get nothing, so I
assume the default is to "exclude motifs with a high probability of
occurrence from the scan" like on the website. So I'm trying to substitute
some flag to turn off this feature, or to just scan for specific motifs,
ie. PS00001 Asn-Glycosylation, but have had no luck.

Any suggestions on syntax here? Below is a basic script where I tried to
give a specific motif:

from Bio.ExPASy import ScanProsite
sequence = "MSQHLLLLILLSLLLLSLLLLHKPISATTIIQKFKEAPQFYNSADCPLIDDSESDDDVVAKPIFCSRRAVHVAMTLDAAYIRGSVAAVLSVLQHSSCPENIVFHFVASASADASSLRATISSSFPYLDFTVYVFNVSSVSRLISSSIRSALDCPLNYARSYLADLLPPCVRRVVYLDSDLILVDDIAKLAATDLGRDSVLAAPEYCNANFTSYFTSTFWSNPTLSLTFADRKACYFNTGVMVIDLSRWREGAYTSRIEEWMAMQKRMRIYELGSLPPFLLVFAGLIKPVNHRWNQHGLGGDNFRGLCRDLHPGPVSLLHWSGKGKPWARLDAGRPCPLDALWAPYPDPDPDDLLQTPFALDS"
handle = ScanProsite.scan(seq=sequence, accession=PS00001)
result = ScanProsite.read(handle)
print type(result)
print len(result)
print result[0]
print result[1]

Thanks!
Dave

From philipp.schiffer at gmail.com Sat Dec 21 15:46:40 2013
From: philipp.schiffer at gmail.com (Philipp Schiffer)
Date: Sat, 21 Dec 2013 16:46:40 +0100
Subject: [Biopython] Python getting stuck reading fastq file
Message-ID: <981B2822CDB34C8B89869B7C68D04898@gmail.com>

Hi!

I am experiencing a problem when reading from a fastq file (qualities in
Sanger scoring). Whatever I do, at one point through my file the reading
(or writing, or comparing, which is later in my script and excluded here)
gets stuck. The following output is from an ipython session (Python 2.7.5).
Biopython was installed through pip on a Scientific Linux 6.2 system.
Is this an error with the SeqIO parser? Or am I doing something wrong?

Any help with this would be highly appreciated.

Kind regards

Philipp

import string
from pprint import pprint
import os
from Bio import SeqIO
from subprocess import call
import sys
import re

fqoutfile = open('/data2/PS1159/reads_Feb_12/fqoutfile.fq', 'w')
my_abundantheads=set()
my_onetwo = re.compile('\/[1-2]')

abundant = open('/data2/PS1159/reads_Feb_12/120126_0281_AD088PACXX_5_SA-PE-023_shuf.clean.fq.gz.keep.gz.abundfilt', 'rU')
for record in SeqIO.parse(abundant, "fastq"):
    ids = my_onetwo.split(record.id)
    my_abundantheads.update([ids[0]])

...

^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
 in ()
----> 1 for record in SeqIO.parse(abundant, "fastq"):
      2     ids = my_onetwo.split(record.id)
      3     my_abundantheads.update([ids[0]])
      4

/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/__init__.pyc in parse(handle, format, alphabet)
    539             raise ValueError("Unknown format '%s'" % format)
    540         #This imposes some overhead...
wait until we drop Python 2.4 to fix it --> 541 for r in i: 542 yield r 543 /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqPhredIterator(handle, alphabet, title2ids) 1034 for letter in range(0, 255): 1035 q_mapping[chr(letter)] = letter - SANGER_SCORE_OFFSET -> 1036 for title_line, seq_string, quality_string in FastqGeneralIterator(handle): 1037 if title2ids: 1038 id, name, descr = title2ids(title_line) /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqGeneralIterator(handle) 934 #There may now be more quality data, or another sequence, or EOF 935 while True: --> 936 line = handle_readline() 937 if not line: 938 break # end of file KeyboardInterrupt: -- Philipp Schiffer Sent with Sparrow (http://www.sparrowmailapp.com/?sig) From philipp.schiffer at gmail.com Sat Dec 21 15:57:35 2013 From: philipp.schiffer at gmail.com (Philipp Schiffer) Date: Sat, 21 Dec 2013 16:57:35 +0100 Subject: [Biopython] Python getting stuck reading fastq file In-Reply-To: <981B2822CDB34C8B89869B7C68D04898@gmail.com> References: <981B2822CDB34C8B89869B7C68D04898@gmail.com> Message-ID: <83CECE51818A4D9D8E85862AF331A838@gmail.com> Hi again! Seems to be working now, after I pulled the latest Biopython from github. Very sorry for Saturday evening disturbance. Have a great Christmas season everybody. Philipp -- Philipp Schiffer Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Saturday, 21 December 2013 at 16:46, Philipp Schiffer wrote: > Hi! > > I am experiencing a problem when reading from a fastq file (qualities in Sanger scoring). Whatever I do, at one point through my file the reading (or writing, or comparing, which is later in my script and excluded here) gets stuck. The following output is from an ipython session (Python 2.7.5). > Biopython was installed through pip on a Scientific Linux 6.2 system. > Is this an error with the SeqIO parser? 
Or am I doing something wrong?
>
> Any help with this would be highly appreciated.
>
> Kind regards
>
> Philipp
>
> import string
> from pprint import pprint
> import os
> from Bio import SeqIO
> from subprocess import call
> import sys
> import re
>
> fqoutfile = open('/data2/PS1159/reads_Feb_12/fqoutfile.fq', 'w')
> my_abundantheads=set()
> my_onetwo = re.compile('\/[1-2]')
>
> abundant = open('/data2/PS1159/reads_Feb_12/120126_0281_AD088PACXX_5_SA-PE-023_shuf.clean.fq.gz.keep.gz.abundfilt', 'rU')
> for record in SeqIO.parse(abundant, "fastq"):
>     ids = my_onetwo.split(record.id)
>     my_abundantheads.update([ids[0]])
>
> ...
>
> ^C---------------------------------------------------------------------------
> KeyboardInterrupt                         Traceback (most recent call last)
>  in ()
> ----> 1 for record in SeqIO.parse(abundant, "fastq"):
>       2     ids = my_onetwo.split(record.id)
>       3     my_abundantheads.update([ids[0]])
>       4
>
> /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/__init__.pyc in parse(handle, format, alphabet)
>     539             raise ValueError("Unknown format '%s'" % format)
>     540         #This imposes some overhead...
wait until we drop Python 2.4 to fix it > --> 541 for r in i: > 542 yield r > 543 > > /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqPhredIterator(handle, alphabet, title2ids) > 1034 for letter in range(0, 255): > 1035 q_mapping[chr(letter)] = letter - SANGER_SCORE_OFFSET > -> 1036 for title_line, seq_string, quality_string in FastqGeneralIterator(handle): > 1037 if title2ids: > 1038 id, name, descr = title2ids(title_line) > > /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqGeneralIterator(handle) > 934 #There may now be more quality data, or another sequence, or EOF > 935 while True: > --> 936 line = handle_readline() > 937 if not line: > 938 break # end of file > > KeyboardInterrupt: > > > > > -- > Philipp Schiffer > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > From p.j.a.cock at googlemail.com Sat Dec 21 16:01:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 21 Dec 2013 16:01:15 +0000 Subject: [Biopython] Python getting stuck reading fastq file In-Reply-To: <83CECE51818A4D9D8E85862AF331A838@gmail.com> References: <981B2822CDB34C8B89869B7C68D04898@gmail.com> <83CECE51818A4D9D8E85862AF331A838@gmail.com> Message-ID: On Sat, Dec 21, 2013 at 3:57 PM, Philipp Schiffer wrote: > Hi again! > > Seems to be working now, after I pulled the latest Biopython from github. > Very sorry for Saturday evening disturbance. > > Have a great Christmas season everybody. OK, good :) What made you think it is stuck though (and not just taking a long time)? Also, since you are only using the ID rather than the full SeqRecord representation, it would be faster to use the FastqGeneralIterator iterator function instead, e.g. 
http://news.open-bio.org/news/2009/09/biopython-fast-fastq/

Regards,

Peter

From mjldehoon at yahoo.com Sat Dec 28 02:10:24 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 27 Dec 2013 18:10:24 -0800 (PST)
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To:
Message-ID: <1388196624.13802.YahooMailBasic@web164001.mail.gq1.yahoo.com>

Various issues of the MMCIF parser also came up previously, see for
example the thread starting here:

http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010452.html

I'd like to propose to make the MMCIF parser a bit more Pythonic and more
consistent with other Biopython modules.
This is a description of how the MMCIF parser would work:

http://biopython.org/DIST/docs/tutorial/Tutorial.mmcif_proposal.html#htoc149

This uses a new Bio.PDB.mmcif module that has a read() function (taking
both file names and file handles) which stores the information in an MMCIF
file in a mmcif.Record object, which is a Python dictionary.

Please have a look if you agree or have any comments or suggestions. If
this looks OK, I can upload the new code.

Best,
-Michiel.

--------------------------------------------
On Thu, 12/5/13, Peter Cock wrote:

Subject: Re: [Biopython] Bio.PDB local MMCIF files
To: "João Rodrigues"
Cc: "Biopython Mailing List"
Date: Thursday, December 5, 2013, 12:35 PM

On Thu, Dec 5, 2013 at 4:53 PM, João Rodrigues wrote:
> Hi Dave,
>
> I understand your concern. Python has the gzip module that can decompress
> the files on the fly and provide a handle for the content. This will not
> work for the parsers since they expect a filename. I will have a look at
> the parsers code and if it's simple, I'll add a layer to do this exactly.
>
> Cheers,
>
> João

Yeah, most of Biopython's parsers take a handle only, some
like the top level functions in Bio.SeqIO, AlignIO, SearchIO
will take a filename or a handle for convenience.
I agree that it is unfortunate that Bio.PDB currently only takes
a filename (historical design choice). If we can tweak it to take
either a filename or a handle that would be much better :)

Regards,

Peter

_______________________________________________
Biopython mailing list - Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From anaryin at gmail.com Sat Dec 28 20:11:08 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Sat, 28 Dec 2013 20:11:08 +0000
Subject: [Biopython] Bio.PDB local MMCIF files
In-Reply-To: <1388196624.13802.YahooMailBasic@web164001.mail.gq1.yahoo.com>
References: <1388196624.13802.YahooMailBasic@web164001.mail.gq1.yahoo.com>
Message-ID:

Dear all,

I agree that the code for mmcif parsing should be improved. However, I also
think that we should move to an entirely new module to have sane names.

The people at the EBI-EMBL have developed an efficient and robust mmcif
parser that they would like to share with us so we can integrate it into
our code.

I'm currently on holidays, I'll be back to work in a couple of weeks, so I
suggest we postpone this discussion until then?

Cheers,
João

On 28/12/2013 02:10, "Michiel de Hoon" wrote:

> Various issues of the MMCIF parser also came up previously, see for
> example the thread starting here:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010452.html
>
> I'd like to propose to make the MMCIF parser a bit more Pythonic and more
> consistent with other Biopython modules.
> This is a description of how the MMCIF parser would work:
>
> http://biopython.org/DIST/docs/tutorial/Tutorial.mmcif_proposal.html#htoc149
>
> This uses a new Bio.PDB.mmcif module that has a read() function (taking
> both file names and file handles) which stores the information in an MMCIF
> file in a mmcif.Record object, which is a Python dictionary.
>
> Please have a look if you agree or have any comments or suggestions.
If
> this looks OK, I can upload the new code.
>
> Best,
> -Michiel.
>
> --------------------------------------------
> On Thu, 12/5/13, Peter Cock wrote:
>
> Subject: Re: [Biopython] Bio.PDB local MMCIF files
> To: "João Rodrigues"
> Cc: "Biopython Mailing List"
> Date: Thursday, December 5, 2013, 12:35 PM
>
> On Thu, Dec 5, 2013 at 4:53 PM, João Rodrigues wrote:
> > Hi Dave,
> >
> > I understand your concern. Python has the gzip module that can decompress
> > the files on the fly and provide a handle for the content. This will not
> > work for the parsers since they expect a filename. I will have a look at
> > the parsers code and if it's simple, I'll add a layer to do this exactly.
> >
> > Cheers,
> >
> > João
>
> Yeah, most of Biopython's parsers take a handle only, some like the top
> level functions in Bio.SeqIO, AlignIO, SearchIO will take a filename or
> a handle for convenience.
>
> I agree that it is unfortunate that Bio.PDB currently only takes a
> filename (historical design choice). If we can tweak it to take either
> a filename or a handle that would be much better :)
>
> Regards,
>
> Peter
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>
From mjldehoon at yahoo.com Mon Dec 30 03:37:30 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 29 Dec 2013 19:37:30 -0800 (PST)
Subject: [Biopython] Bio.PDB local MMCIF files
Message-ID: <1388374650.98611.BPMail_high_carrier@web164003.mail.gq1.yahoo.com>

OK, I will hold off for now then.

-Michiel

------------------------------
On Sat, Dec 28, 2013 3:11 PM EST João Rodrigues wrote:

>Dear all,
>
>I agree that the code for mmcif parsing should be improved. However, I also
>think that we should move to an entirely new module to have sane names.
>
>The people at the EBI-EMBL have developed an efficient and robust mmcif
>parser that they would like to share with us so we can integrate it into
>our code.
>
>I'm currently on holidays, I'll be back to work in a couple of weeks, so I
>suggest we postpone this discussion until then?
>
>Cheers,
>João
>On 28/12/2013 02:10, "Michiel de Hoon" wrote:
>
>> Various issues of the MMCIF parser also came up previously, see for
>> example the thread starting here:
>>
>> http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010452.html
>>
>> I'd like to propose to make the MMCIF parser a bit more Pythonic and more
>> consistent with other Biopython modules.
>> This is a description of how the MMCIF parser would work:
>>
>> http://biopython.org/DIST/docs/tutorial/Tutorial.mmcif_proposal.html#htoc149
>>
>> This uses a new Bio.PDB.mmcif module that has a read() function (taking
>> both file names and file handles) which stores the information in an MMCIF
>> file in a mmcif.Record object, which is a Python dictionary.
>>
>> Please have a look if you agree or have any comments or suggestions. If
>> this looks OK, I can upload the new code.
>>
>> Best,
>> -Michiel.
>>
>> --------------------------------------------
>> On Thu, 12/5/13, Peter Cock wrote:
>>
>> Subject: Re: [Biopython] Bio.PDB local MMCIF files
>> To: "João Rodrigues"
>> Cc: "Biopython Mailing List"
>> Date: Thursday, December 5, 2013, 12:35 PM
>>
>> On Thu, Dec 5, 2013 at 4:53 PM, João Rodrigues wrote:
>> > Hi Dave,
>> >
>> > I understand your concern. Python has the gzip module that can decompress
>> > the files on the fly and provide a handle for the content. This will not
>> > work for the parsers since they expect a filename. I will have a look at
>> > the parsers code and if it's simple, I'll add a layer to do this exactly.
>> >
>> > Cheers,
>> >
>> > João
>>
>> Yeah, most of Biopython's parsers take a handle only, some like the top
>> level functions in Bio.SeqIO, AlignIO, SearchIO will take a filename or
>> a handle for convenience.
>>
>> I agree that it is unfortunate that Bio.PDB currently only takes a
>> filename (historical design choice). If we can tweak it to take either
>> a filename or a handle that would be much better :)
>>
>> Regards,
>>
>> Peter
>>
>> _______________________________________________
>> Biopython mailing list - Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>>
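To make the "filename or handle" idea from this thread concrete, here is a minimal standard-library sketch, assuming Python 3. The helper name open_text is hypothetical and is not part of Bio.PDB or of the proposed Bio.PDB.mmcif module; it only shows one way a reader function could accept a plain file name, a gzipped file name, or an already open handle.

```python
import contextlib
import gzip
import io


@contextlib.contextmanager
def open_text(source):
    """Yield a text handle for a filename (plain or .gz) or an open handle."""
    if hasattr(source, "read"):
        # Already a handle: the caller opened it, so the caller closes it.
        yield source
        return
    # Choose the opener from the file extension (an assumption of this sketch).
    opener = gzip.open if str(source).endswith(".gz") else io.open
    handle = opener(source, "rt")
    try:
        yield handle
    finally:
        handle.close()


# Hypothetical usage: the same loop works for all three kinds of input.
# with open_text("1abc.cif.gz") as handle:
#     for line in handle:
#         ...
```

A parser written against a helper like this only ever sees a handle, which is why compressed files then come for free.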