From b.invergo at gmail.com Fri Jul 1 04:46:02 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Fri, 01 Jul 2011 10:46:02 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> Message-ID: <4E0D894A.7050209@gmail.com> Hi Eric, Yes, I was feeling that as well while I was doing it. I think it would be easy enough to implement, so I'll take care of that today... Cheers, Brandon On 07/01/2011 04:17 AM, Eric Talevich wrote: > Hi Brandon, > > Looks good, thanks! It's just enough to get the point across, and the > wiki is a fine place for extended examples. > > Reading this, I notice that the cml.set_option(key, value) gets kind > of tedious when a lot of options need to be set. It would be nice to > be able to set them all in one go, as keyword arguments: > > cml.set_options( > seqtype=1, > verbose=0, > noisy=0, > RateAncestor=0, > model=0, > NSsites=[0, 1, 2], > CodonFreq=2, > cleandata=1, > fix_alpha=1, > kappa=4.54006, > ) > > What do you think? Worth implementing? > > Cheers, > Eric > > > On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo > wrote: > > Well, it's not much, but how's this? > https://github.com/brandoninvergo/biopython/tree/doc-branch > Do you want me to go more into detail about the options available like > in the wikior is this sufficient as a tutorial? Just let me know... > > Cheers, > Brandon > > On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo > > wrote: > > Hi Eric, > > No problem, I'll start writing something up now. 
> > Cheers, > > -brandon > > > > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich > > wrote: > >> Hi Brandon, > >> > >> I just added a stub for Bio.Phylo.PAML to the main Tutorial: > >> > https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8 > >> > >> Do you think you could add some more to that section, maybe > pulling a chunk > >> of content from the wiki page you just wrote? If you're not > comfortable with > >> LaTeX you can just point me to some text and I'll add it. > >> > >> Thanks, > >> Eric > >> > >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo > > > >> wrote: > >>> > >>> Ok, the documentation is finished: > >>> http://biopython.org/wiki/PAML > >>> > >>> Cheers, > >>> Brandon > >>> > >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman > > wrote: > >>> > Brandon; > >>> > > >>> >> Ok I've just sent the email to the main list. > >>> > > >>> > Awesome, thanks for this. Hope this convinces some other > folks to > >>> > take a look. > >>> > > >>> >> I can write up some documentation this week. What is the > official > >>> >> procedure for adding documentation to the wiki, if any? Or > can I just > >>> >> create an account and start writing? > >>> > > >>> > Create an account and start writing. Nothing official except > that > >>> > documentation is good. > >>> > > >>> >> Also, just to double-check, are my docstrings all > sufficient or should > >>> >> I expand those? > >>> > > >>> > Your code comments looked great to me. The end user > documentation > >>> > seems to be the main thing at this point: describing how > someone can > >>> > pick up and get started with the code. 
> >>> >
> >>> > Thanks again for all the work,
> >>> > Brad
> >>> > _______________________________________________
> >>> > Biopython-dev mailing list
> >>> > Biopython-dev at lists.open-bio.org
> >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
> >>> >
> >>> _______________________________________________
> >>> Biopython-dev mailing list
> >>> Biopython-dev at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> >>
> >>
> >
> >

From b.invergo at gmail.com Fri Jul 1 06:28:22 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 01 Jul 2011 12:28:22 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak>
Message-ID: <4E0DA146.7040703@gmail.com>

Since the options are stored in a dict() object keyed by strings, I think it would be easiest to have the arguments passed as ("option", value) tuples or to have a dict() passed in with "option":value pairs.

Alternatively, the set_option & get_option methods can be dropped altogether and we can just let the user have full access to the options dict. The write_ctl_file method can be used then to check the validity of each option, to prevent erroneous option names from being used. Perhaps that's better, despite being slightly more work to implement.

What do you think?

Cheers,
Brandon

From b.invergo at gmail.com Fri Jul 1 06:37:45 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 01 Jul 2011 12:37:45 +0200
Subject: [Biopython-dev] "Developing on Github" wiki amendment
Message-ID: <4E0DA379.4030902@gmail.com>

Hi everyone,

Based on my own experiences working with Github, I suggest a minor addition to the wiki tutorial on using it. I'm working behind an HTTP proxy at my university, which doesn't pose big problems with cloning or pushing, which use the git http_proxy setting without problem. However, git-pull for whatever reason doesn't seem to use that setting but instead relies on the GitProxy setting. I haven't been able to get this to play nicely, so I had trouble pulling upstream changes to my repository for a while. In the end, the easiest solution was to just change my upstream master to https://github.com/biopython/biopython.git rather than git:// so that the git https_proxy is used.
The only problem is that I couldn't find a git command to change the upstream master (didn't search very deeply, admittedly), so I did it by manually editing the .git/config file in my repository. Does anyone know if there is one?

So, if it's ok with everyone, I would write a small addition of a sentence or two offering this as a work-around for people having problems pulling upstream changes from behind a proxy.

Before I do so, perhaps it would be prudent to ask if there are any problems about using https:// rather than git:// for pulling.

Cheers,
-brandon

From eric.talevich at gmail.com Fri Jul 1 11:47:09 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 1 Jul 2011 11:47:09 -0400
Subject: [Biopython-dev] "Developing on Github" wiki amendment
In-Reply-To: <4E0DA379.4030902@gmail.com>
References: <4E0DA379.4030902@gmail.com>
Message-ID:

On Fri, Jul 1, 2011 at 6:37 AM, Brandon Invergo wrote:

> Hi everyone,
>
> Based on my own experiences working with Github, I suggest a minor addition
> to the wiki tutorial on using it. I'm working behind an HTTP proxy at my
> university, which doesn't pose big problems with cloning or pushing, which
> use the git http_proxy setting without problem. However, git-pull for
> whatever reason doesn't seem to use that setting but instead relies on the
> GitProxy setting. I haven't been able to get this to play nicely, so I had
> trouble pulling upstream changes to my repository for a while. In the end,
> the easiest solution was to just change my upstream master to
> https://github.com/biopython/biopython.git rather than git:// so that the
> git https_proxy is used.
>
> The only problem is that I couldn't find a git command to change the
> upstream master (didn't search very deeply, admittedly), so I did it by
> manually editing the .git/config file in my repository. Does anyone know if
> there is one?
>

I think the "git remote" command can do what you need, but editing .git/config is fine too. The config file is meant to be useful to power-users.

> So, if it's ok with everyone, I would write a small addition of a sentence
> or two offering this as a work-around for people having problems pulling
> upstream changes from behind a proxy.
>

You mean the http://biopython.org/wiki/GitUsage ? Sure, go right ahead, it's a wiki :)

> Before I do so, perhaps it would be prudent to ask if there are any
> problems about using https:// rather than git:// for pulling.
>

If you're a committer, then the public https:// option won't let you push changes back upstream (from the command line); the repo is read-only. For pulling, it should be perfectly fine.

Cheers,
Eric

From eric.talevich at gmail.com Fri Jul 1 12:03:40 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 1 Jul 2011 12:03:40 -0400
Subject: [Biopython-dev] pypaml
In-Reply-To: <4E0DA146.7040703@gmail.com>
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> <4E0DA146.7040703@gmail.com>
Message-ID:

I think the style of Bio.Applications should be the guide here. Check out these examples:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc74

To make this work, you'll need to accept **kwargs somewhere and use the kwargs dictionary (with some validation) for the options dict. Validation can happen either before setting the options dict, or just before writing out the config file -- I think Bio.Applications checks the args just before running the application.

A complication is that you have two sets of options -- for generating the config file, and for running the program on the command line. You could require these to be set separately, to avoid some confusion in the code, and still use **kwargs for each.
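
[Illustration] The "validate just before writing out the config file" variant described above could be sketched roughly as follows. This is only a sketch under stated assumptions: the class name ControlFileSketch, the KNOWN_OPTIONS set (copied from the option names in the example earlier in this thread) and the "name = value" line layout are all hypothetical stand-ins, not the actual Bio.Phylo.PAML code.

```python
class ControlFileSketch(object):
    """Hypothetical stand-in for a PAML wrapper; not Biopython's API."""

    # Assumption: a small subset of option names, taken from the
    # cml.set_options(...) example quoted in this thread.
    KNOWN_OPTIONS = frozenset([
        "seqtype", "verbose", "noisy", "RateAncestor", "model",
        "NSsites", "CodonFreq", "cleandata", "fix_alpha", "kappa",
    ])

    def __init__(self, **kwargs):
        # Accept anything here; checking is deferred until write time.
        self.options = dict(kwargs)

    def validate(self):
        """Raise KeyError if any stored option name is not recognised."""
        unknown = sorted(set(self.options) - self.KNOWN_OPTIONS)
        if unknown:
            raise KeyError("Unknown option(s): " + ", ".join(unknown))

    def render_ctl(self):
        """Validate, then return the control file text as a string."""
        self.validate()
        lines = []
        for name in sorted(self.options):
            value = self.options[name]
            # Lists such as NSsites=[0, 1, 2] become space-separated values.
            if isinstance(value, (list, tuple)):
                value = " ".join(str(item) for item in value)
            lines.append("%s = %s" % (name, value))
        return "\n".join(lines) + "\n"
```

With this layout the user may touch the options dict freely, and a misspelled name is only reported when the control file is about to be produced.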
Also worth considering:

(i) Use Bio.Phylo.PAML to generate the config file
(ii) Under the hood, call out to Bio.Phylo.Applications._PamlCommandline to actually run the program with a given config file (already written)
(iii) Use Bio.Phylo.PAML again to parse the output.

The API of Bio.Phylo.PAML would stay basically the same, but this could help keep the options separate and maybe decouple the three phases of running the programs. (I can help with this over the weekend if you'd like.)

Cheers,
Eric

From b.invergo at gmail.com Fri Jul 1 12:20:30 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 01 Jul 2011 18:20:30 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak>
Message-ID: <4E0DF3CE.7020504@gmail.com>

Hi Eric,
You're right, I had the functionality of kwargs mixed up in my head (I've rarely used it) and I forgot that it's passed in as a dict. In that case, it's relatively straight-forward to do. Something like this:

def set_options(self, **kwargs):
    for option, value in kwargs.items():
        if option in self._options:
            self._options[option] = value
        # else:
        #     Raise exception

Not sure if raising an exception would really be necessary here. (ps - I haven't tested that code, I just typed it up quickly now)

Regarding the splitting of functionality, to an extent it makes sense but I wonder if it's worth it, since the PAML commandline programs only take a single argument, the path to the control file. However, if the main advantages lie in code readability and standardization with the rest of the applications framework, then I think it's ok.
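
[Illustration] For reference, the untested set_options snippet in this message could be filled out into something runnable along these lines. It is a sketch only: the OptionsSketch class and its default option names are hypothetical, and raising KeyError for unknown names is one possible answer to the open question about exceptions, not necessarily what the real module does.

```python
class OptionsSketch(object):
    """Hypothetical holder for control-file options; not Biopython's API."""

    def __init__(self):
        # Assumption: the class pre-populates every valid option with None.
        self._options = {
            "seqtype": None,
            "verbose": None,
            "NSsites": None,
            "kappa": None,
        }

    def set_options(self, **kwargs):
        """Set several options at once, rejecting unknown names."""
        # Check every name first so that a bad call changes nothing.
        unknown = sorted(name for name in kwargs if name not in self._options)
        if unknown:
            raise KeyError("Invalid option(s): " + ", ".join(unknown))
        for option, value in kwargs.items():
            self._options[option] = value

    def get_option(self, option):
        """Return the current value of a single option."""
        return self._options[option]
```

Checking all names before assigning any of them means a call with one misspelled keyword leaves the existing settings untouched, which is arguably easier to debug than a half-applied update.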
Unfortunately I'll be unavailable all weekend (starting in about 3 minutes) but I should be free on Monday to work on it.

Cheers,
Brandon

From b.invergo at gmail.com Fri Jul 1 12:28:09 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 01 Jul 2011 18:28:09 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <4E0DF3CE.7020504@gmail.com>
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> <4E0DF3CE.7020504@gmail.com>
Message-ID: <4E0DF599.8080703@gmail.com>

Hi Eric,
I lied. I took a moment to at least implement the kwargs change:
https://github.com/brandoninvergo/biopython/commit/533b06476899b631ec28a6e4cc97a2e669a45ea0

It seems to work swimmingly but I haven't tested it extensively yet. There was already exception-handling in place.

Ok, *now* I'm off for the weekend!

Cheers,
Brandon

From redmine at redmine.open-bio.org Fri Jul 1 19:29:27 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 1 Jul 2011 23:29:27 +0000
Subject: [Biopython-dev] [Biopython - Feature #3217] (In Progress) Bio.Phylo I/O support for the NeXML format
References:
Message-ID:

Issue #3217 has been updated by Eric Talevich.

Status changed from New to In Progress
Assignee changed from Eric Talevich to Biopython Dev Mailing List
% Done changed from 0 to 20

Jaime Huerta-Cepas pointed me to a strategy he's using to support both phyloXML and NeXML in ETE. A separate program called generateDS.py generates parsers automatically from the XSD files defining the specs. Here's the code:
https://github.com/jhcepas/phylogenetic-XML-python-parsers

I suggest:

1. Copying nexml.py into the Biopython source tree as Bio/Phylo/_nexml_gds.py
2. Writing something basic to convert the essential tree elements into compatible Bio.Phylo object types. Call that NexmlIO, for now? Also write unit tests.
3. As time permits, write more converters to make _nexml_gds.py objects compatible with existing Biopython types. This could include character matrices for AlignIO, and more tree annotations for Phylo.

When generateDS.py is updated we'll just copy the newly generated nexml.py into _nexml_gds.py manually -- hopefully this won't require many changes in the converters each time.

Timeline: After the 1.58 release.
---------------------------------------- Feature #3217: Bio.Phylo I/O support for the NeXML format https://redmine.open-bio.org/issues/3217 Author: Eric Talevich Status: In Progress Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: Not Applicable URL: The future data exchange standard is... approaching rapidly. NeXML is going to become the format of choice for TreeBASE, Mesquite and probably MIAPA-targeted tools over the next year or two, and Biopython should be there to support it. Notes: * Another Python library, DendroPy, already supports (some of?) the NeXML format. Jeet Sukumaran and Mark Holder changed the license to BSD to allow other projects -- particularly us -- to share their code. So let's start there. * NeXML was designed so its elements can be treated as RDF triples, so see if RDFLib can help -- either as the underlying parser, or to provide some additional (optional) functionality. See: http://nexml.org/ http://packages.python.org/DendroPy/ http://www.rdflib.net/ -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Fri Jul 1 21:40:48 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sat, 2 Jul 2011 01:40:48 +0000 Subject: [Biopython-dev] [Biopython - Feature #3260] (New) Draw a Bio.Phylo tree as a phylogram Message-ID: Issue #3260 has been reported by Eric Talevich. ---------------------------------------- Feature #3260: Draw a Bio.Phylo tree as a phylogram https://redmine.open-bio.org/issues/3260 Author: Eric Talevich Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Bio.Phylo should be able to draw a decent tree. 
This means a standard phylogram, with accurate branch lengths, labels for taxa and any internal nodes that have them, and support values -- like Phylip's "drawtree" does.

The two existing functions don't quite suffice: draw_graphviz ignores branch lengths, though it's nice for unrooted trees; draw_ascii looks more like a typical phylogram, but it's ascii art and can't display internal node labels or support values (or large trees).

I wrote a function to do this, based on the algorithm in draw_ascii. It uses matplotlib for display. I tested it on all the trees under Tests/Nexus and Tests/PhyloXML; it's nice.

----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin

--
You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org

From updates at feedmyinbox.com Sat Jul 2 07:56:58 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Sat, 2 Jul 2011 07:56:58 -0400
Subject: [Biopython-dev] 7/2 newest questions tagged biopython - Stack Overflow
Message-ID:

// biopython import in Enthought Python Distribution?
// July 1, 2011 at 2:30 PM
http://stackoverflow.com/questions/6551862/biopython-import-in-enthought-python-distribution

I am using the Enthought Python Distribution v7.0-2 (32-bit) and I am having trouble importing biopython. Does anyone know how to import biopython in EPD? I can import other libraries like numpy, matplotlib, etc. with no problem, but import biopython is not recognized. What's going on? Thanks in advance for the help.

--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest
Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email
Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

--
This email was carefully delivered by FeedMyInbox.com.
PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Sun Jul 3 19:14:46 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sun, 3 Jul 2011 19:14:46 -0400 Subject: [Biopython-dev] 7/3 newest questions tagged biopython - Stack Overflow Message-ID: // Display line starting with word in a file Python regex // July 2, 2011 at 6:41 AM http://stackoverflow.com/questions/6556514/display-line-starting-with-word-in-a-file-python-regex I have a file "abc.txt" with the following contents.. EMBOSS_001 601 FEDSESRRDSLFVPHRPGERRNSNGTTTETEVRKRRLSSYQISMEMLEDS 650 :...::.||...||....|..|.|.... |..:.|.|.|..: EMBOSS_002 1 -----NPSLTVTVPIAVGESDFENLNTEEFSSE----SELEESKEKLNAT 41 EMBOSS_001 651 SGRQRS-MSIASILTNTMEELE-ESRQKCPPCW-------YRFANVFLIW 691 |..:.| :.:|........|:| |...|...|: :.|..|.... EMBOSS_002 42 SSSEGSTVDVAPPREGEQAEIEPEEDLKPEACFTEGCIKKFPFCQVSTEE 91 I want to create three strings.. the first string "a" should have all characters thats written after EMBOSS_001 (of both the lines) ie A="FEDSESRRDSLFVPHRPGERRNSNGTTTETEVRKRRLSSYQISMEMLEDSSGRQRS-MSIASILTNTMEELE-ESRQKCPPCW-------YRFANVFLIW" Second string should have everything written after EMBOSS_002 (of both the lines) minus numbers ie B="-----NPSLTVTVPIAVGESDFENLNTEEFSSE----SELEESKEKLNATSSSEGSTVDVAPPREGEQAEIEPEEDLKPEACFTEGCIKKFPFCQVSTEE" and the third string C should be whatever is between EMBOSS_1 and EMBOSS_2 (alphanumeric characters or -) in both the lines C=" :...::.||...||....|..|.|.... |..:.|.|.|..|..:.| :.:|........|:| |...|...|: :.|..|...." The original spaces at start, end(if any) and at the middle of C should be intact. In this case 5 spaces are at the start since C starts from "F" of A and "-" of B Thanks -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com.
PO Box 682532 Franklin, TN 37068

From updates at feedmyinbox.com Tue Jul 5 06:56:46 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Tue, 5 Jul 2011 06:56:46 -0400
Subject: [Biopython-dev] 7/5 biopython Questions - BioStar
Message-ID: <46e62975e9d52606c34141cdfe578caf@74.63.51.88>

// GenBank to Fasta failing with CONTIG fields
// July 5, 2011 at 6:31 AM

http://biostar.stackexchange.com/questions/9892/genbank-to-fasta-failing-with-contig-fields

I used to generate FASTA out of my GenBank source files using a simple conversion script:

    #!/usr/bin/env python
    import sys, signal
    from Bio import SeqIO

    def wrap( text, width=80 ):
        for i in xrange( 0, len( text ), width ):
            yield text[i:i+width]

    if __name__ == "__main__":
        status = progress()
        for record in SeqIO.parse( sys.stdin, "genbank"):
            try:
                gi = record.annotations["gi"]
            except KeyError:
                gi = None
            accession = record.id
            desc = record.description
            seq = record.seq
            locus = record.name
            print ">gi|%s|emb|%s|%s| %s" % (gi, accession, locus, desc)
            for block in wrap( seq ):
                print block

When I changed the sequence files to newer versions some of the resulting FASTA file sequences were just filled with Ns. After closer inspection of the GenBank source files, it turns out that they have replaced the ORIGIN block

    ORIGIN
    sequence...

with a CONTIG block, something like

    CONTIG join(BX640437.1:1..347356,BX640438.1:51..347786,...)

Is there a way to resolve this using BioPython? I was working with BioPython 1.52 and 1.57 (latest). Thanks for your suggestions.

// Parsing BLAST output BioPython Error
// July 5, 2011 at 2:25 AM

http://biostar.stackexchange.com/questions/9882/parsing-blast-output-biopython-error

Hi, I have the following code

    def runBLAST(self):
        print "Running BLAST .........."
        cmd=subprocess.Popen("blastp -db nr -query repeat.txt -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 5",shell=True)
        cmd.communicate()[0]
        f1=open("out.faa")
        blast_records = NCBIXML.parse(f1)
        save_file = open("my_fasta_seq.fasta", 'w')
        for blast_record in blast_records[:10]:
            for alignment in blast_record.alignments:
                for hsp in alignment.hsps:
                    save_file.write('>%s\n' % (alignment.hseq,))
        save_file.close()
        f1.close()
        f2=open("my_fasta_seq.fasta")
        for record in SeqIO.parse(f2,"fasta"):
            f=open("tempBLAST1.txt","w")
            f.write(">"+"\n"+str(record.name)+"\n"+str(record.seq)+"\n")
            f.close()

I get the error on TypeError: for blast_record in blast_records[:10]: saying 'generator' object is not subscriptable. I am looking to get top 10 blast hits (sequences)

// Getting top 10 sequences of BLAST results Bio Python
// July 5, 2011 at 12:29 AM

http://biostar.stackexchange.com/questions/9880/getting-top-10-sequences-of-blast-results-bio-python

Hi, I want to get top 10 sequences of BLAST results (just the sequences, no alignment or score or e-value etc). I am inputting a text file containing 5 fasta file. So my output should be top 10 blast hits of each fasta file.. therefore my output file will have 50 sequences. I am reading each of my input fasta file through Bio.SeqIO, writing it as temp.faa and then passing it to command line BLAST through subprocess as

    blastp -db nr -query temp.faa -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 2

the output has lots of other information. Should I parse this output now or there's a better way.
Thanks P.S XML might be the way, but I didn't find a relevant NCBIXML parser syntax -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Tue Jul 5 06:56:52 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Tue, 5 Jul 2011 06:56:52 -0400 Subject: [Biopython-dev] 7/5 newest questions tagged biopython - Stack Overflow Message-ID: <11de02f7858811397f05ce54a2078248@74.63.51.88> // Getting top 10 sequences of BLAST results Bio Python // July 5, 2011 at 12:31 AM http://stackoverflow.com/questions/6577975/getting-top-10-sequences-of-blast-results-bio-python I want to get top 10 sequences of BLAST results (just the sequences, no alignment or score or e-value etc). I am inputting a text file containing 5 fasta file. So my output should be top 10 blast hits of each fasta file.. therefore my output file will have 50 sequences. I am reading each of my input fasta file through Bio.SeqIO, writing it as temp.faa and then passing it to command line BLAST through subprocess as blastp -db nr -query temp.faa -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 2 the output has lots of other information. Should I parse this output now or there's a better way. Thanks P.S XML might be a way but I didn't find a relevant NCBIXML parser syntax.
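
One possible way to take only the first ten items from a lazily evaluated parser is itertools.islice, since a generator cannot be sliced with [:10]. The sketch below is not taken from the questions above: first_n and fake_hits are hypothetical names, and fake_hits merely stands in for a record iterator such as the one returned by NCBIXML.parse, so that the example runs without Biopython or a BLAST output file.

```python
from itertools import islice


def first_n(iterable, n=10):
    """Return the first n items of any iterable, including a generator."""
    return list(islice(iterable, n))


def fake_hits():
    # Hypothetical stand-in for a lazy parser such as NCBIXML.parse(handle):
    # it yields items one at a time and cannot be sliced with [:10].
    for i in range(25):
        yield "hit_%d" % i


top_hits = first_n(fake_hits(), 10)
print(len(top_hits))  # number of items kept
print(top_hits[0])    # first item produced by the generator
```

With real BLAST XML output the same helper could presumably be applied to the alignments of each record, on the assumption that the hits are already listed in the order BLAST reported them.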
-- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From p.j.a.cock at googlemail.com Tue Jul 5 12:10:14 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Jul 2011 17:10:14 +0100 Subject: [Biopython-dev] Fwd: SeqXML an alternative for FASTA In-Reply-To: <9A1DEE28-F2AD-40A8-998B-538137226584@scilifelab.se> References: <9A1DEE28-F2AD-40A8-998B-538137226584@scilifelab.se> Message-ID: Hi all, I've been in touch with Thomas Schmitt about merging read/write support for the SeqXML file format (see below and http://seqxml.org/ ) into Biopython's SeqIO module. BioPerl already supports this (under format name "seqxml") and a BioJava v3 implementation is in progress. We're discussing this and the format itself on the cross project OBF mailing list (see below), http://lists.open-bio.org/pipermail/open-bio-l/2011-July/000805.html Please feel free to join that list if you want to discuss anything general, or comment here on the Biopython implementation. I've got a branch which seems nearly ready for merging on github, https://github.com/peterjc/biopython/commits/seqxml2 a rebase of https://github.com/peterjc/biopython/commits/seqxml Regards, Peter ---------- Forwarded message ---------- From: Thomas Schmitt Date: Fri, Jul 1, 2011 at 8:57 AM Subject: [Open-bio-l] SeqXML an alternative for FASTA To: open-bio-l at lists.open-bio.org Hello everybody, We recently published a new XML format called SeqXML to store biological sequences. 
Our aim was to create a lightweight alternative to FASTA that stores the metadata typically squeezed into a FASTA header in a standardized way. It looks something like this:

[XML example; the markup was lost in archiving. Surviving text: dystroglycan 1, AAGGCGAUGUC.....ACAU, AAGGCGAAA...CACJOXA]

Check out the paper at http://bib.oxfordjournals.org/content/early/2011/06/10/bib.bbr025.full?keytype=ref&ijkey=dWzLPFBuzrdZme8

There is also a website (http://seqxml.org) where you can find the schema and detailed documentation. The whole thing emerged from developing formats for the orthology community, so you will also find information about our orthology format OrthoXML at these resources.

To my knowledge the only format comparable to SeqXML is TinySeq, which has some significant limitations:
- It doesn't support database cross referencing
- The identifiers are more NCBI specific
- It is more verbose
- There is only a very primitive DTD
- It doesn't allow validating the sequence alphabet
- It isn't possible to define the source of the sequences
- It doesn't support key value pair annotations

We are trying to get IO implementations for SeqXML for all Bio* projects. There is already an implementation in BioPerl maintained by Dave Messina. We do have an implementation for the legacy version of BioJava and Andrew Yates promised to help us migrate it into BioJava 3. I'm also in contact with Peter Cock about a Biopython integration. He in fact asked me to move the discussion to this list.

What do you guys think about the format? Is there anybody who wants to contribute a BioRuby implementation?
Best regards, Thomas _______________________________________________ Open-Bio-l mailing list Open-Bio-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/open-bio-l From updates at feedmyinbox.com Wed Jul 6 07:04:23 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 6 Jul 2011 07:04:23 -0400 Subject: [Biopython-dev] 7/6 biopython Questions - BioStar Message-ID: <47ab7fabfee60bc6430c75c127531996@74.63.51.88> // Biopython not working on Window 7 64 (import bio function not working) // July 6, 2011 at 1:56 AM http://biostar.stackexchange.com/questions/9947/biopython-not-working-on-window-7-64-import-bio-function-not-working Dear All I am having trouble using biopython as my 'import bio' does not work. I have Window 7 , 64 system with Python 2.7.1 with Piston , Django and NumPy site packages installed and they all work well with the import function. Any ideas? thanks! cheers Gary -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Wed Jul 6 07:04:40 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 6 Jul 2011 07:04:40 -0400 Subject: [Biopython-dev] 7/6 newest questions tagged biopython - Stack Overflow Message-ID: <467a099f2f1f03ff9fe38323f1bc5d33@74.63.51.88> // Biopython not working on Window 7 64 (import bio function not working) // July 6, 2011 at 1:55 AM http://stackoverflow.com/questions/6592127/biopython-not-working-on-window-7-64-import-bio-function-not-working I am having trouble using biopython as my 'import bio' does not work. 
I have Window 7 , 64 system with Python 2.7.1 with Piston , Django and NumPy site packages installed and they all work well with the import function. Any ideas? thanks! cheers Gary -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From w.arindrarto at gmail.com Wed Jul 6 21:16:13 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 7 Jul 2011 03:16:13 +0200 Subject: [Biopython-dev] SeqIO Abi Parser Message-ID: Hi everyone, This is my first post in the dev mailing list, so greetings :). I've been using Biopython for a few months in total now (in a period of ~1.5 years) and before that Python for ~0.5 years. Most of the time, I'm working with Sanger sequencing results and at one point I was a bit disappointed that I couldn't find any (bio)python module for reading .ab1 files. That compelled me to write my first python module that reads those files and extracts the useful information out of them. In the process I became more interested in python itself and finally thought it might be neat if biopython could do this, built-in. So I forked the main repo, made some changes to my module so it became a parser for the SeqIO submodule that reads Abi files. It's not cooked 100% yet, but if anyone is interested in seeing/commenting/criticizing the code, I'd appreciate that very much. 
Here's the link: https://github.com/bow/biopython/blob/seqio-abif/Bio/SeqIO/AbiIO.py Some features to note: - I've included a method to trim the sequence based on its quality scores - the parser does not extract the entire metadata of the trace files, only ones I consider important for further analysis/annotations. Of course, this could be changed if the community think some other data should be included/excluded - For those of you already familiar with the Abi format, I deliberately chose the 'PBAS2' tag for the sequence information, which is the unedited bases after base-calling by the sequencing program. Some things that I'm doing right now: - writing unit tests - making sure it's compatible with Python 3 (thanks Peter :)! ) - completing the docs - making sure it's compatible with most Abi format versions. Currently I've only tested it with files from the 310, 3100, and 3700 machines. Does anyone have some other versions that I can test this with? As I understand as well, this is not the only Sanger sequencing trace format out there (e.g. SCF is another). I would be glad to learn more and write a parser for the SCF format as well. The problem is, I'm not sure this would be useful in the long run as I've personally never seen anyone use an SCF file and so I've never had a chance to play around with one. If anyone has an SCF file lying around and thinks SCF support would be beneficial, I'd be happy to accept them :). I guess that's all for now. Thanks for reading! 
---
Wibowo Arindrarto (bow)
http://bow.web.id

From genivaldo.gueiros at gmail.com Thu Jul 7 21:18:30 2011
From: genivaldo.gueiros at gmail.com (Genivaldo Gueiros)
Date: Thu, 7 Jul 2011 18:18:30 -0700
Subject: [Biopython-dev] Contributing - description of my code [Sequence_cleaner]
Message-ID:

Hey guys , I'd like to make a contribution to Biopython community ,Well what I wanna share my script using python to clean sequences up , you should know analyzing poor data takes CPU time and interpreting the results from poor data takes people time, so always is important make a preprocessing.

Let me call my script as "Sequence_cleaner" and the big idea is to remove duplicate sequences, remove sequence too short ( the user define the minimum length) and remove sequences which has too much unknown nucleotides ( N) ( the user define the % of N is allows ) and in the end the user can choose if he/she wanna have a file as output or print the result.

Let me know if you are interested

--
Cheers,
Geni

From p.j.a.cock at googlemail.com Fri Jul 8 08:55:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 8 Jul 2011 13:55:32 +0100
Subject: [Biopython-dev] Contributing - description of my code [Sequence_cleaner]
In-Reply-To:
References:
Message-ID:

On Fri, Jul 8, 2011 at 2:18 AM, Genivaldo Gueiros wrote:
> Hey guys , I'd like to make a contribution to Biopython community ,Well
> what I wanna share my script using python to clean sequences up , you should
> know analyzing poor data takes CPU time and interpreting the results from
> poor data takes people time, so always is important make a preprocessing.
>
> Let me call my script as "Sequence_cleaner" and the big idea is to remove
> duplicate sequences, remove sequence too short ( the user define the minimum
> length) and remove sequences which has too much unknown nucleotides ( N) (
> the user define the % of N is allows ) and in the end the user can choose if
> he/she wanna have a file as output or print the result.
>
> Let me know if you are interested
>

Hi Genivaldo,

This sounds like something you could add to the list here,
http://biopython.org/wiki/Scriptcentral

Or it might make a nice Cookbook example:
http://biopython.org/wiki/Category:Cookbook

Peter

From genivaldo.gueiros at gmail.com Fri Jul 8 12:00:03 2011
From: genivaldo.gueiros at gmail.com (Genivaldo Gueiros)
Date: Fri, 8 Jul 2011 09:00:03 -0700
Subject: [Biopython-dev] Contributing - description of my code [Sequence_cleaner]
In-Reply-To:
References:
Message-ID:

Gotcha , I gonna read and probably add. Thx

2011/7/8 Peter Cock

> On Fri, Jul 8, 2011 at 2:18 AM, Genivaldo Gueiros wrote:
> > Hey guys , I'd like to make a contribution to Biopython community ,Well
> > what I wanna share my script using python to clean sequences up , you should
> > know analyzing poor data takes CPU time and interpreting the results from
> > poor data takes people time, so always is important make a preprocessing.
> >
> > Let me call my script as "Sequence_cleaner" and the big idea is to remove
> > duplicate sequences, remove sequence too short ( the user define the minimum
> > length) and remove sequences which has too much unknown nucleotides ( N) (
> > the user define the % of N is allows ) and in the end the user can choose if
> > he/she wanna have a file as output or print the result.
> >
> > Let me know if you are interested
> >
>
> Hi Genivaldo,
>
> This sounds like something you could add to the list here,
> http://biopython.org/wiki/Scriptcentral
>
> Or it might make a nice Cookbook example:
> http://biopython.org/wiki/Category:Cookbook
>
> Peter
>

--
Cheers,
Geni

From tim.te.beek at nbic.nl Mon Jul 11 04:34:16 2011
From: tim.te.beek at nbic.nl (Tim te Beek)
Date: Mon, 11 Jul 2011 10:34:16 +0200
Subject: [Biopython-dev] Bio.GenBank.LocationParser chokes on misc_feature in Desulfurococcus kamchatkensis 1221n/NC_011766.gbk
In-Reply-To:
References:
Message-ID:

When parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk using SeqIO.read(genbank_file, 'genbank') I get the following stacktrace:

...
    gbk_records = (SeqIO.read(genbank_file, 'genbank') for genbank_file in genbank_files)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py", line 604, in read
    first = iterator.next()
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py", line 532, in parse
    for r in i:
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py", line 440, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py", line 423, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py", line 395, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py", line 347, in _feed_feature_table
    consumer.location(location_string)
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: order(1078481..1078483,join(1078778,1078800..1078810))

The offending feature is:

     misc_feature    complement(order(1078481..1078483,join(1078778,
                     1078800..1078810)))
                     /locus_tag="DKAM_1147"
                     /note="active site"
                     /db_xref="CDD:73252"

Could you look into whether this is a bug in the parser or in the input file?

From tim.te.beek at nbic.nl Mon Jul 11 04:46:54 2011
From: tim.te.beek at nbic.nl (Tim te Beek)
Date: Mon, 11 Jul 2011 10:46:54 +0200
Subject: [Biopython-dev] Bio.GenBank.LocationParser chokes on misc_feature in Desulfurococcus kamchatkensis 1221n/NC_011766.gbk
In-Reply-To:
References:
Message-ID:

The same happens when parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Saccharopolyspora_erythraea_NRRL_2338_uid62947/NC_009142.gbk, offending features:

     misc_feature    order(2409324..2409326,2409399..2409401,2409528..2409533,
                     2409619..2409624,2409679..2409681,2409748..2409753,
                     2409754..2409759,2409835..2409837,join(2409886..2409890,
                     2409892..2409898),2409911..2409913,2409920..2409925)
                     /locus_tag="SACE_2218"
                     /note="active site"
                     /db_xref="CDD:119408"
     misc_feature    order(2409324..2409326,2409399..2409401,2409528..2409530)
                     /locus_tag="SACE_2218"
                     /note="catalytic tetrad; other site"
                     /db_xref="CDD:119408"

could have something to do with the order() instruction, but I'm not sure.

On Mon, Jul 11, 2011 at 10:34, Tim te Beek wrote:
> When parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk
> using SeqIO.read(genbank_file, 'genbank') I get the following
> stacktrace:
>
> ...
>     gbk_records = (SeqIO.read(genbank_file, 'genbank') for
> genbank_file in genbank_files)
>   File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py",
> line 604, in read
>     first = iterator.next()
>   File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py",
> line 532, in parse
>     for r in i:
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py",
> line 440, in parse_records
>     record = self.parse(handle, do_features)
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py",
> line 423, in parse
>     if self.feed(handle, consumer, do_features):
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py",
> line 395, in feed
>     self._feed_feature_table(consumer, self.parse_features(skip=False))
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py",
> line 347, in _feed_feature_table
>     consumer.location(location_string)
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/__init__.py",
> line 975, in location
>     raise LocationParserError(location_line)
> Bio.GenBank.LocationParserError:
> order(1078481..1078483,join(1078778,1078800..1078810))
>
> The offending feature is:
> misc_feature    complement(order(1078481..1078483,join(1078778,
>                 1078800..1078810)))
>                 /locus_tag="DKAM_1147"
>                 /note="active site"
>                 /db_xref="CDD:73252"
>
> Could you look into whether this is a bug in the parser or in the input file?
>

From p.j.a.cock at googlemail.com Mon Jul 11 05:38:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 11 Jul 2011 10:38:03 +0100
Subject: [Biopython-dev] Bio.GenBank.LocationParser chokes on misc_feature in Desulfurococcus kamchatkensis 1221n/NC_011766.gbk
In-Reply-To:
References:
Message-ID:

On Mon, Jul 11, 2011 at 9:34 AM, Tim te Beek wrote:
> When parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk
> using SeqIO.read(genbank_file, 'genbank') I get the following
> stacktrace:
>
> ...
>     gbk_records = (SeqIO.read(genbank_file, 'genbank') for
> genbank_file in genbank_files)
> ...
> Bio.GenBank.LocationParserError:
> order(1078481..1078483,join(1078778,1078800..1078810))
>
> The offending feature is:
> misc_feature    complement(order(1078481..1078483,join(1078778,
>                 1078800..1078810)))
>                 /locus_tag="DKAM_1147"
>                 /note="active site"
>                 /db_xref="CDD:73252"
>
> Could you look into whether this is a bug in the parser or in the input file?
>

That looks like the issue reported in Bug 3197, which turned out to be invalid GenBank files:
https://redmine.open-bio.org/issues/3197

Quoting from: http://www.ncbi.nlm.nih.gov/collab/FT/
>>
>> 3.4.2.2 Operators
>>
>> ...
>>
>> Note : location operator "complement" can be used in combination with
>> either "join" or "order" within the same location; combinations of "join"
>> and "order" within the same location (nested operators) are illegal.

Please report this problem with NC_011766.gbk and NC_009142.gbk to the NCBI (could you CC me too?), try using gb-admin at ncbi.nlm.nih.gov

The next release of Biopython will have a clearer error message in this situation.

Thank you,

Peter

From redmine at redmine.open-bio.org Mon Jul 11 05:44:25 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 11 Jul 2011 09:44:25 +0000
Subject: [Biopython-dev] [Biopython - Bug #3197] SeqIO parse error with some genbank files
References:
Message-ID:

Issue #3197 has been updated by Peter Cock.

Two more examples from the NCBI Bacteria FTP site, reported by Tim te Beek on our mailing list:

http://lists.open-bio.org/pipermail/biopython-dev/2011-July/009018.html
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk

LOCUS       NC_011766            1365223 bp    DNA     circular BCT 20-MAY-2011
DEFINITION  Desulfurococcus kamchatkensis 1221n chromosome, complete genome.
ACCESSION   NC_011766
VERSION     NC_011766.1  GI:218883314
DBLINK      Project: 59133
KEYWORDS    .
SOURCE      Desulfurococcus kamchatkensis 1221n
...
misc_feature complement(order(1078481..1078483,join(1078778, 1078800..1078810))) /locus_tag="DKAM_1147" /note="active site" /db_xref="CDD:73252" http://lists.open-bio.org/pipermail/biopython-dev/2011-July/009019.html ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Saccharopolyspora_erythraea_NRRL_2338_uid62947/NC_009142.gbk LOCUS NC_009142 8212805 bp DNA circular BCT 14-FEB-2011 DEFINITION Saccharopolyspora erythraea NRRL 2338 chromosome, complete genome. ACCESSION NC_009142 VERSION NC_009142.1 GI:134096620 DBLINK Project: 62947 KEYWORDS complete genome. SOURCE Saccharopolyspora erythraea NRRL 2338 ... misc_feature order(2409324..2409326,2409399..2409401,2409528..2409533, 2409619..2409624,2409679..2409681,2409748..2409753, 2409754..2409759,2409835..2409837,join(2409886..2409890, 2409892..2409898),2409911..2409913,2409920..2409925) /locus_tag="SACE_2218" /note="active site" /db_xref="CDD:119408" misc_feature order(2409324..2409326,2409399..2409401,2409528..2409530) /locus_tag="SACE_2218" /note="catalytic tetrad; other site" /db_xref="CDD:119408" ---------------------------------------- Bug #3197: SeqIO parse error with some genbank files https://redmine.open-bio.org/issues/3197 Author: Cedar McKay Status: Resolved Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: 1.56 URL: I've found a file that seems to choke SeqIO genbank parsing. I downloaded this file straight from NCBI, so it should be a good file. I've found a couple of other files that do the same thing. I reproduced this bug on another machine, also with biopython 1.56. I am able to successfully parse other genbank files. Maybe it has something to do with that very long location? Please let me know if I can provide any other information! Thanks! 
Cedar

>>> from Bio import SeqIO
>>> record = SeqIO.read('./Acorus_americanus_NC_010093.gb', 'genbank')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/SeqIO/__init__.py", line 597, in read
    first = iterator.next()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/SeqIO/__init__.py", line 525, in parse
    for r in i:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 437, in parse_records
    record = self.parse(handle, do_features)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 420, in parse
    if self.feed(handle, consumer, do_features):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 392, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 344, in _feed_feature_table
    consumer.location(location_string)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: order(join(42724..42726,43455..43457),43464..43469,43476..43481,43557..43562,43569..43574,43578..43583,43677..43682,44434..44439)

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From sbassi at clubdelarazon.org Tue Jul 12 11:28:59 2011
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Tue, 12 Jul 2011 12:28:59 -0300
Subject: [Biopython-dev] Contributing - description of my code [Sequence_cleaner]
In-Reply-To:
References:
Message-ID:

On Thu, Jul 7, 2011 at 10:18 PM, Genivaldo Gueiros wrote:
> Let me call my script "Sequence_cleaner". The big idea is to remove
> duplicate sequences, remove sequences that are too short (the user defines the minimum
> length), and remove sequences which have too many unknown nucleotides (N)
> (the user defines the % of N allowed). In the end the user can choose whether
> he/she wants a file as output or to print the result.

You should take a look at the seqclean utility. Some methods should be applied only to the extremes.

From updates at feedmyinbox.com Wed Jul 13 07:22:38 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Wed, 13 Jul 2011 07:22:38 -0400
Subject: [Biopython-dev] 7/13 biopython Questions - BioStar
Message-ID: <454384b04ce8959586cf724c16ed5e54@74.63.51.88>

// Compare two fasta files by headers
// July 12, 2011 at 7:00 AM
http://biostar.stackexchange.com/questions/10185/compare-two-fasta-files-by-headers

Hi everyone; this is my first question on the forum.

How can I compare if two fasta files contain the same sequence headers? Does any BioPython module exist for doing this?

Thanks in advance,
peixe

--
Website: http://biostar.stackexchange.com/questions/tagged/biopython

From redmine at redmine.open-bio.org Wed Jul 13 23:02:35 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 14 Jul 2011 03:02:35 +0000
Subject: [Biopython-dev] [Biopython - Feature #3260] (Closed) Draw a Bio.Phylo tree as a phylogram
References:
Message-ID:

Issue #3260 has been updated by Eric Talevich.

Status changed from New to Closed
% Done changed from 0 to 100

Committed: https://github.com/biopython/biopython/commit/d3aa24c808b4558dabcf024a485e0128792aa4aa

Folks: lemme know how this works for you.

----------------------------------------
Feature #3260: Draw a Bio.Phylo tree as a phylogram
https://redmine.open-bio.org/issues/3260

Author: Eric Talevich
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version:
URL:

Bio.Phylo should be able to draw a decent tree. This means a standard phylogram, with accurate branch lengths, labels for taxa and any internal nodes that have them, and support values -- like Phylip's "drawtree" does.

The two existing functions don't quite suffice: draw_graphviz ignores branch lengths, though it's nice for unrooted trees; draw_ascii looks more like a typical phylogram, but it's ascii art and can't display internal node labels or support values (or large trees).

I wrote a function to do this, based on the algorithm in draw_ascii. It uses matplotlib for display. I tested it on all the trees under Tests/Nexus and Tests/PhyloXML; it's nice.

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From redmine at redmine.open-bio.org Wed Jul 13 23:08:52 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 14 Jul 2011 03:08:52 +0000
Subject: [Biopython-dev] [Biopython - Bug #3263] (New) Phylo: Move clade 'color' and 'width' attributes to BaseTree
Message-ID:

Issue #3263 has been reported by Eric Talevich.

----------------------------------------
Bug #3263: Phylo: Move clade 'color' and 'width' attributes to BaseTree
https://redmine.open-bio.org/issues/3263

Author: Eric Talevich
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category:
Target version:
URL:

The 'color' and 'width' attributes are associated with PhyloXML trees right now, but are useful enough to be associated with the base Tree object (which you'd get from parsing a Newick or Nexus file), even though Newick and Nexus can't serialize this info.

----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin

From eric.talevich at gmail.com Wed Jul 13 23:23:15 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 13 Jul 2011 23:23:15 -0400
Subject: [Biopython-dev] pypaml
In-Reply-To: <4E0DF599.8080703@gmail.com>
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> <4E0DF3CE.7020504@gmail.com> <4E0DF599.8080703@gmail.com>
Message-ID:

Hey Brandon,

I cherry-picked those commits, finally:
https://github.com/biopython/biopython/commit/e2bb900212bd5113385a239b4ed42b570f06e146
https://github.com/biopython/biopython/commit/ab62ac508f02b3df1d2475f599fcd92eda6c078b
https://github.com/biopython/biopython/commit/de671e1baf3faa0ed8c10835397e308b1cf1b59d

Cheers,
Eric

On Fri, Jul 1, 2011 at 12:28 PM, Brandon Invergo wrote:
> Hi Eric,
>
> I lied. I took a moment to at least implement the kwargs change:
> https://github.com/brandoninvergo/biopython/commit/533b06476899b631ec28a6e4cc97a2e669a45ea0
>
> It seems to work swimmingly but I haven't tested it extensively yet. There
> was already exception-handling in place.
>
> Ok, *now* I'm off for the weekend!
> Cheers,
> Brandon
>
> On Fri 01 Jul 2011 06:20:30 PM CEST, Brandon Invergo wrote:
>
>> Hi Eric,
>> You're right, I had the functionality of kwargs mixed up in my head (I've
>> rarely used it) and I forgot that it's passed in as a dict. In that case,
>> it's relatively straight-forward to do. Something like this:
>>
>> def set_options(self, **kwargs):
>>     for option, value in kwargs.items():
>>         if option in self._options:
>>             self._options[option] = value
>>         # else:
>>         #     Raise exception
>>
>> Not sure if raising an exception would really be necessary here. (ps - I
>> haven't tested that code, I just typed it up quickly now)
>>
>> Regarding the splitting of functionality, to an extent it makes sense but
>> I wonder if it's worth it, since the PAML commandline programs only take a
>> single argument, the path to the control file. However, if the main
>> advantages lie in code readability and standardization with the rest of the
>> applications framework, then I think it's ok.
>>
>> Unfortunately I'll be unavailable all weekend (starting in about 3
>> minutes) but I should be free on Monday to work on it.
>>
>> Cheers, Brandon
>>
>> On Fri 01 Jul 2011 04:17:26 AM CEST, Eric Talevich wrote:
>>
>>> Hi Brandon,
>>>
>>> Looks good, thanks! It's just enough to get the point across, and the
>>> wiki is a fine place for extended examples.
>>>
>>> Reading this, I notice that the cml.set_option(key, value) gets kind of
>>> tedious when a lot of options need to be set. It would be nice to be able to
>>> set them all in one go, as keyword arguments:
>>>
>>> cml.set_options(
>>>     seqtype=1,
>>>     verbose=0,
>>>     noisy=0,
>>>     RateAncestor=0,
>>>     model=0,
>>>     NSsites=[0, 1, 2],
>>>     CodonFreq=2,
>>>     cleandata=1,
>>>     fix_alpha=1,
>>>     kappa=4.54006,
>>> )
>>>
>>> What do you think? Worth implementing?
>>>
>>> Cheers,
>>> Eric
>>>
>>>
>>> On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo wrote:
>>>
>>> Well, it's not much, but how's this?
>>> https://github.com/brandoninvergo/biopython/tree/doc-branch
>>> Do you want me to go more into detail about the options available like
>>> in the wiki, or is this sufficient as a tutorial? Just let me know...
>>>
>>> Cheers,
>>> Brandon
>>>
>>> On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo wrote:
>>> > Hi Eric,
>>> > No problem, I'll start writing something up now.
>>> > Cheers,
>>> > -brandon
>>> >
>>> > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich wrote:
>>> >> Hi Brandon,
>>> >>
>>> >> I just added a stub for Bio.Phylo.PAML to the main Tutorial:
>>> >> https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8
>>> >>
>>> >> Do you think you could add some more to that section, maybe pulling a chunk
>>> >> of content from the wiki page you just wrote? If you're not comfortable with
>>> >> LaTeX you can just point me to some text and I'll add it.
>>> >>
>>> >> Thanks,
>>> >> Eric
>>> >>
>>> >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo wrote:
>>> >>>
>>> >>> Ok, the documentation is finished:
>>> >>> http://biopython.org/wiki/PAML
>>> >>>
>>> >>> Cheers,
>>> >>> Brandon
>>> >>>
>>> >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman wrote:
>>> >>> > Brandon;
>>> >>> >
>>> >>> >> Ok I've just sent the email to the main list.
>>> >>> >
>>> >>> > Awesome, thanks for this. Hope this convinces some other folks to
>>> >>> > take a look.
>>> >>> >
>>> >>> >> I can write up some documentation this week. What is the official
>>> >>> >> procedure for adding documentation to the wiki, if any? Or can I just
>>> >>> >> create an account and start writing?
>>> >>> >
>>> >>> > Create an account and start writing. Nothing official except that
>>> >>> > documentation is good.
>>> >>> >
>>> >>> >> Also, just to double-check, are my docstrings all sufficient or should
>>> >>> >> I expand those?
>>> >>> >
>>> >>> > Your code comments looked great to me.
The end user documentation
>>> >>> > seems to be the main thing at this point: describing how someone can
>>> >>> > pick up and get started with the code.
>>> >>> >
>>> >>> > Thanks again for all the work,
>>> >>> > Brad
>>> >>> > _______________________________________________
>>> >>> > Biopython-dev mailing list
>>> >>> > Biopython-dev at lists.open-bio.org
>>> >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>> >>> >
>>> >>> _______________________________________________
>>> >>> Biopython-dev mailing list
>>> >>> Biopython-dev at lists.open-bio.org
>>> >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>> >>
>>> >
>>>
>>
>

From p.j.a.cock at googlemail.com Thu Jul 14 03:56:09 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Jul 2011 08:56:09 +0100
Subject: [Biopython-dev] [Biopython] Gene ontology parsing
In-Reply-To:
References:
Message-ID:

Hi Kyle,

Last year you wrote this on the main Biopython mailing list,

On Fri, Jul 23, 2010 at 5:17 PM, Kyle wrote:
>> There are already several people working on GO stuff in branches on github,
>> e.g. Chris Lasher, Kyle Ellrott, Tamás Nepusz. I don't know if any of them are
>> doing OBO v1.2, but it would be sensible to check and try and combine efforts.
>
> The branch at http://github.com/kellrott/biopython/tree/gosupport
> should parse most of the information held in OBO v1.2.
> Chris's original version was targeted only for the GO OBO file, as
> there was a typecheck to make sure the node ID's started with 'GO:'.
> That's disabled in my branch, and I've used the package to parse a few
> of the other ontologies found at www.obofoundry.org.
> The module is currently called Bio.GO, but maybe it should be
> re-factored to represent the fact that it covers general OBO files,
> and not just the GO file specifically.
>
> The main things keeping it from merging into the main branch
> are proper documentation, complete unit tests, and making sure that it
> covers all of the standard usage practices.
>
> If you can try it out, and let me know which functions are missing (and
> maybe contribute some code), we can push this thing forward.
>
> Kyle

Does your code still exist, perhaps on a different branch? I couldn't find it at the URL http://github.com/kellrott/biopython/tree/gosupport

I'm at the CodeFest preceding BOSC 2011, and Herve (CC'd) is interested in parsing OBO files in Biopython.

Thanks,

Peter

From p.j.a.cock at googlemail.com Thu Jul 14 04:25:26 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Jul 2011 09:25:26 +0100
Subject: [Biopython-dev] PAML unit test failure
Message-ID:

Hi guys,

We're seeing new failures on the buildbot under Python 3, e.g.
http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.2/builds/147/steps/shell/logs/stdio

======================================================================
ERROR: testOptionExists (test_Baseml.ModTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Baseml.py", line 94, in testOptionExists
    self.bml.set_option, "xxxx", 1)
AttributeError: 'Baseml' object has no attribute 'set_option'

======================================================================
ERROR: testOptionExists (test_Codeml.ModTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Codeml.py", line 92, in testOptionExists
    self.cml.set_option, "xxxx", 1)
AttributeError: 'Codeml' object has no attribute 'set_option'

======================================================================
ERROR: testOptionExists (test_Yn00.ModTest)
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Yn00.py", line 67, in testOptionExists self.yn00.set_option, "xxxx", 1) AttributeError: 'Yn00' object has no attribute 'set_option' ---------------------------------------------------------------------- Caused by this commit, https://github.com/biopython/biopython/tree/de671e1baf3faa0ed8c10835397e308b1cf1b59d I couldn't see a matching change to the unit tests on Brandon's branch to apply, so I just fixed it: https://github.com/biopython/biopython/commit/145fe2a01afb4092cb2e862142dd04234410b74f Peter From b.invergo at gmail.com Thu Jul 14 04:49:07 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 14 Jul 2011 10:49:07 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> <4E0DF3CE.7020504@gmail.com> <4E0DF599.8080703@gmail.com> Message-ID: <4E1EAD83.3080603@gmail.com> Hi Eric, I'm really sorry about that!!! I was on holiday for some days and now I'm up to my neck in grant applications, so it totally slipped my mind. Is there anything left to do? I see Peter's email about the unit tests so I'll look into that... Apologies, -brandon On Thu 14 Jul 2011 05:23:15 AM CEST, Eric Talevich wrote: > Hey Brandon, > > I cherry-picked those commits, finally: > https://github.com/biopython/biopython/commit/e2bb900212bd5113385a239b4ed42b570f06e146 > https://github.com/biopython/biopython/commit/ab62ac508f02b3df1d2475f599fcd92eda6c078b > https://github.com/biopython/biopython/commit/de671e1baf3faa0ed8c10835397e308b1cf1b59d > > Cheers, > Eric > > On Fri, Jul 1, 2011 at 12:28 PM, Brandon Invergo > wrote: > > Hi Eric, > > I lied. 
I took a moment to at least implement the kwargs change: > https://github.com/__brandoninvergo/biopython/__commit/__533b06476899b631ec28a6e4cc97a2__e669a45ea0 > > > It seems to work swimmingly but I haven't tested it extensively yet. > There was already exception-handling in place. > > Ok, *now* I'm off for the weekend! > Cheers, > Brandon > > > > On Fri 01 Jul 2011 06:20:30 PM CEST, Brandon Invergo wrote: > > Hi Eric, > You're right, I had the functionality of kwargs mixed up in my > head (I've rarely used it) and I forgot that it's passed in as a > dict. In that case, it's relatively straight-forward to do. > Something like this: > > def set_options(self, **kwargs): > for option, value in kwargs.items(): > if option in self._options: > self._options[option] = value > # else: > # Raise exception > > Not sure if raising an exception would really be necessary here. > (ps - I haven't tested that code, I just typed it up quickly now) > > Regarding the splitting of functionality, to an extent it makes > sense but I wonder if it's worth it, since the PAML commandline > programs only take a single argument, the path to the control > file. However, if the main advantages lie in code readability > and standardization with the rest of the applications framework, > then I think it's ok. > > Unfortunately I'll be unavailable all weekend (starting in about > 3 minutes) but I should be free on Monday to work on it. > > Cheers, Brandon > > On Fri 01 Jul 2011 04:17:26 AM CEST, Eric Talevich wrote: > > Hi Brandon, > > Looks good, thanks! It's just enough to get the point > across, and the wiki is a fine place for extended examples. > > Reading this, I notice that the cml.set_option(key, value) > gets kind of tedious when a lot of options need to be set. 
> It would be nice to be able to set them all in one go, as > keyword arguments: > > cml.set_options( > seqtype=1, > verbose=0, > noisy=0, > RateAncestor=0, > model=0, > NSsites=[0, 1, 2], > CodonFreq=2, > cleandata=1, > fix_alpha=1, > kappa=4.54006, > ) > > What do you think? Worth implementing? > > Cheers, > Eric > > > On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo > > >> > wrote: > > Well, it's not much, but how's this? > https://github.com/__brandoninvergo/biopython/tree/__doc-branch > > Do you want me to go more into detail about the options > available like > in the wikior is this sufficient as a tutorial? Just let me > know... > > Cheers, > Brandon > > On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo > > >> > wrote: > > Hi Eric, > > No problem, I'll start writing something up now. > > Cheers, > > -brandon > > > > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich > > >> wrote: > >> Hi Brandon, > >> > >> I just added a stub for Bio.Phylo.PAML to the main Tutorial: > >> > https://github.com/biopython/__biopython/commit/__190a85c5bde9c079fa5cee4ab9f8ee__3362538cb8 > > > >> > >> Do you think you could add some more to that section, maybe > pulling a chunk > >> of content from the wiki page you just wrote? If you're not > comfortable with > >> LaTeX you can just point me to some text and I'll add it. > >> > >> Thanks, > >> Eric > >> > >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo > > >> > >> wrote: > >>> > >>> Ok, the documentation is finished: > >>> http://biopython.org/wiki/PAML > >>> > >>> Cheers, > >>> Brandon > >>> > >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman > > >> > wrote: > >>> > Brandon; > >>> > > >>> >> Ok I've just sent the email to the main list. > >>> > > >>> > Awesome, thanks for this. Hope this convinces some > other folks to > >>> > take a look. > >>> > > >>> >> I can write up some documentation this week. What is the > official > >>> >> procedure for adding documentation to the wiki, if > any? 
Or can I just
> >>> >> create an account and start writing?
> >>> >
> >>> > Create an account and start writing. Nothing official except that
> >>> > documentation is good.
> >>> >
> >>> >> Also, just to double-check, are my docstrings all sufficient or should
> >>> >> I expand those?
> >>> >
> >>> > Your code comments looked great to me. The end user documentation
> >>> > seems to be the main thing at this point: describing how someone can
> >>> > pick up and get started with the code.
> >>> >
> >>> > Thanks again for all the work,
> >>> > Brad

From kellrott at gmail.com Thu Jul 14 05:19:48 2011
From: kellrott at gmail.com (Kyle)
Date: Thu, 14 Jul 2011 02:19:48 -0700
Subject: [Biopython-dev] [Biopython] Gene ontology parsing
In-Reply-To:
References:
Message-ID:

(Hitting Reply-All this time)

I think I let the work being done by Chris Lasher and Tamas supersede my work. My patches got pulled into Chris's branch: https://github.com/gotgenes/biopython/tree/gosupport/Bio/GO

I'll be at BOSC starting tomorrow. (About to get on a plane to Vienna)

Kyle

On Thu, Jul 14, 2011 at 12:56 AM, Peter Cock wrote:
> Hi Kyle,
>
> Last year you wrote this on the main Biopython mailing list,
>
> On Fri, Jul 23, 2010 at 5:17 PM, Kyle wrote:
>>> There are already several people working on GO stuff in branches on github,
>>> e.g. Chris Lasher, Kyle Ellrott, Tamás Nepusz. I don't know if any of them are
>>> doing OBO v1.2, but it would be sensible to check and try and combine efforts.
>>
>> The branch at http://github.com/kellrott/biopython/tree/gosupport
>> should parse most of the information held in OBO v1.2.
>> Chris's original version was targeted only for the GO OBO file, as
>> there was a typecheck to make sure the node ID's started with 'GO:'.
>> That's disabled in my branch, and I've used the package to parse a few
>> of the other ontologies found at www.obofoundry.org.
>> The module is currently called Bio.GO, but maybe it should be
>> re-factored to represent the fact that it covers general OBO files,
>> and not just the GO file specifically.
>>
>> The main things keeping it from merging into the main branch
>> are proper documentation, complete unit tests, and making sure that it
>> covers all of the standard usage practices.
>>
>> If you can try it out, and let me know which functions are missing (and
>> maybe contribute some code), we can push this thing forward.
>>
>> Kyle
>
> Does your code still exist, perhaps on a different branch? I couldn't
> find it at the URL http://github.com/kellrott/biopython/tree/gosupport
>
> I'm at the CodeFest preceding BOSC 2011, and Herve (CC'd) is
> interested in parsing OBO files in Biopython.
>
> Thanks,
>
> Peter
>

From b.invergo at gmail.com Thu Jul 14 05:23:14 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 14 Jul 2011 11:23:14 +0200
Subject: [Biopython-dev] PAML unit test failure
In-Reply-To:
References:
Message-ID: <4E1EB582.1090005@gmail.com>

So, just to confirm, this is resolved then?

Sorry again everyone, I dropped the ball on this one.

-brandon

On Thu 14 Jul 2011 10:25:26 AM CEST, Peter Cock wrote:
> Hi guys,
>
> We're seeing new failures on the buildbot under Python 3, e.g.
> http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.2/builds/147/steps/shell/logs/stdio > > ====================================================================== > ERROR: testOptionExists (test_Baseml.ModTest) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Baseml.py", > line 94, in testOptionExists > self.bml.set_option, "xxxx", 1) > AttributeError: 'Baseml' object has no attribute 'set_option' > > ====================================================================== > ERROR: testOptionExists (test_Codeml.ModTest) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Codeml.py", > line 92, in testOptionExists > self.cml.set_option, "xxxx", 1) > AttributeError: 'Codeml' object has no attribute 'set_option' > > ====================================================================== > ERROR: testOptionExists (test_Yn00.ModTest) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Yn00.py", > line 67, in testOptionExists > self.yn00.set_option, "xxxx", 1) > AttributeError: 'Yn00' object has no attribute 'set_option' > > ---------------------------------------------------------------------- > > Caused by this commit, > > https://github.com/biopython/biopython/tree/de671e1baf3faa0ed8c10835397e308b1cf1b59d > > I couldn't see a matching change to the unit tests on Brandon's branch > to apply, so I just fixed it: > > https://github.com/biopython/biopython/commit/145fe2a01afb4092cb2e862142dd04234410b74f > > Peter From p.j.a.cock at googlemail.com Thu Jul 14 08:54:42 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Jul 2011 13:54:42 
+0100 Subject: [Biopython-dev] PAML unit test failure In-Reply-To: <4E1EB582.1090005@gmail.com> References: <4E1EB582.1090005@gmail.com> Message-ID: On Thu, Jul 14, 2011 at 10:23 AM, Brandon Invergo wrote: > So, just to confirm, this is resolved then? Yes - but you should make sure you get this fix in your branches if required. > Sorry again everyone, I dropped the ball on this one. > > -brandon No problem - one of the points of the buildbot is to catch oversights. Peter From p.j.a.cock at googlemail.com Thu Jul 14 09:15:12 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Jul 2011 14:15:12 +0100 Subject: [Biopython-dev] SeqXML an alternative for FASTA In-Reply-To: References: <9A1DEE28-F2AD-40A8-998B-538137226584@scilifelab.se> Message-ID: On Tue, Jul 5, 2011 at 5:10 PM, Peter Cock wrote: > Hi all, > > I've been in touch with Thomas Schmitt about merging read/write > support for the SeqXML file format (see below and http://seqxml.org/ ) > into Biopython's SeqIO module. > > BioPerl already supports this (under format name "seqxml") and a > BioJava v3 implementation is in progress. We're discussing this and > the format itself on the cross project OBF mailing list (see below), > > http://lists.open-bio.org/pipermail/open-bio-l/2011-July/000805.html > > Please feel free to join that list if you want to discuss anything > general, or comment here on the Biopython implementation. > I've got a branch which seems nearly ready for merging on > github, https://github.com/peterjc/biopython/commits/seqxml2 > a rebase of https://github.com/peterjc/biopython/commits/seqxml > > Regards, > > Peter I've just applied this to the trunk. Thomas, please keep in touch if and when there are any further changes to the seqxml specification and the code needs updating. 
Thanks,

Peter

From mathieu.giraud at lifl.fr Fri Jul 15 01:44:08 2011
From: mathieu.giraud at lifl.fr (Mathieu Giraud)
Date: Fri, 15 Jul 2011 07:44:08 +0200
Subject: [Biopython-dev] Biomanycores, GPU bioinformatics with BioJava, BioPerl and Biopython
Message-ID: <345DF252-C388-4DFB-9BF5-9BF077A43CC2@lifl.fr>

Dear colleagues,

We are pleased to announce the latest release of Biomanycores, a library of bioinformatics applications for GPU and other manycore processors. We gather some CUDA and OpenCL codes and provide interfaces with the latest versions of BioJava, BioPerl and Biopython, and we aim to soon provide benchmarks on GPU-accelerated applications through the Bio* frameworks.

==> http://www.biomanycores.org/ <==

Currently, Biomanycores has codes for sequence alignment (Smith-Waterman), genome alignment (MUMmer), transcription factor lookup with position-weight matrices, and RNA secondary structure and pseudo-knot prediction. We plan to integrate other applications in the coming months.

We invite you to try Biomanycores applications and interfaces. We welcome any feedback, most notably on the way we realised the integrations. The main developer of Biomanycores, Jean-Frédéric Berthelot, is full-time on the project and can provide help. In particular, we are willing to provide support on the following points:
1) you can propose other GPU applications for inclusion;
2) if you have some computation time bottlenecks on a BioJava, BioPerl or Biopython pipeline, we can work together to see if some current or potential GPU applications could speed up your scripts.

You can join us at . Moreover, we will be present this week in Wien at BOSC 2011.

Best regards,
The Biomanycores team

--
Mathieu Giraud - http://www.lifl.fr/~giraud
CNRS, LIFL, Université Lille 1, INRIA Lille, France

From anaryin at gmail.com Fri Jul 15 09:20:03 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 15 Jul 2011 15:20:03 +0200
Subject: [Biopython-dev] Fwd: Update [GSoC - M. Trellet]
Message-ID:

Updates on https://github.com/mtrellet/biopython/tree/interface_analysis

The wiki was also updated with some (not all) information.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

---------- Forwarded message ----------
From: Mikael Trellet
Date: Fri, Jul 15, 2011 at 3:01 PM
Subject: Update
To: Eric Talevich
Cc: Joao Rodrigues

Hello Eric,

Some news from the project! We are still ahead of the road map; I have much more time than during the last weeks and can work essentially on it. I added some stuff this morning, such as the unit tests for the ExtendedResidues class and other smaller functions. I'm going to focus on the buried surface area during the following days. I'm still wondering which method I will use: NACCESS, HSExposure or another one...

Don't hesitate if you have any questions!

Cheers,

--
Mikael TRELLET,
Computational structural biology group, Utrecht University
Bijvoet Center, The Netherlands

From redmine at redmine.open-bio.org Sat Jul 16 17:42:16 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sat, 16 Jul 2011 21:42:16 +0000
Subject: [Biopython-dev] [Biopython - Bug #3266] (New) Bio/PDB/PDBParser.py: local variable 'i' referenced before assignment
Message-ID:

Issue #3266 has been reported by Boris Nagaev.

----------------------------------------
Bug #3266: Bio/PDB/PDBParser.py: local variable 'i' referenced before assignment
https://redmine.open-bio.org/issues/3266

Author: Boris Nagaev
Status: New
Priority: Normal
Assignee:
Category:
Target version:
URL:

Hey!

Look at file Bio/PDB/PDBParser.py, line 100:
     91     def _get_header(self, header_coords_trailer):
     92         "Get the header of the PDB file, return the rest."
     93         structure_builder=self.structure_builder
     94         for i in range(0, len(header_coords_trailer)):
     95             structure_builder.set_line_counter(i+1)
     96             line=header_coords_trailer[i]
     97             record_type=line[0:6] 
     98             if(record_type=='ATOM  ' or record_type=='HETATM' or record_type=='MODEL '):
     99                 break
    100         header=header_coords_trailer[0:i]
    101         # Return the rest of the coords+trailer for further processing
    102         self.line_counter=i
    103         coords_trailer=header_coords_trailer[i:]
    104         header_dict=_parse_pdb_header_list(header)
    105         return header_dict, coords_trailer
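A note on the listing above: `i` is only bound inside the `for` loop, so an empty `header_coords_trailer` reaches line 100 with `i` unbound. A minimal sketch of one way to guard against that, written as a standalone helper; the name `split_header` and the constant `COORD_RECORDS` are hypothetical, and this is not the fix adopted in Biopython. As a design choice of this sketch, input with no coordinate records at all is treated as all header.

```python
# Record types that mark the end of the PDB header section.
COORD_RECORDS = ("ATOM  ", "HETATM", "MODEL ")


def split_header(lines):
    """Split PDB lines into (header_lines, coords_and_trailer_lines)."""
    # Default split point: everything is header. This also covers an
    # empty list, where the loop body below never runs.
    split_at = len(lines)
    for i, line in enumerate(lines):
        if line[0:6] in COORD_RECORDS:
            split_at = i
            break
    return lines[:split_at], lines[split_at:]


header, rest = split_header([])
print(header, rest)
```

Calling `split_header` on a list that starts with `HEADER` and `REMARK` lines followed by `ATOM` lines returns the first two as the header and the remainder untouched.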
At line 100, variable i is used, but it may be undefined, if len(header_coords_trailer) = 0 Biopython version 1.57-1+b1, debian lenny ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From venkatesh20 at gmail.com Wed Jul 20 07:51:01 2011 From: venkatesh20 at gmail.com (Venkatesh U) Date: Wed, 20 Jul 2011 17:21:01 +0530 Subject: [Biopython-dev] Contributing to Bio-python Message-ID: Hi Friends, I am interested in contributing to Bio-Python, I am programmer with 5+ years of experience in applying the Machine Learning / Data mining in the telecom domain. I am comfortable with the programming languages Java, Python, also proficient in SQL and NOSql databases and some experience with map reduce and hadoop. Recently I started learning Bio-informatics and got impressed with the contribution that Bio-python is making to this field. I am very interested in contributing to BioPython. I would highly appreciate if you could help me get started. Thanks in Advance. Thanks, Venki From venkatesh20 at gmail.com Wed Jul 20 08:45:20 2011 From: venkatesh20 at gmail.com (Venkatesh U) Date: Wed, 20 Jul 2011 18:15:20 +0530 Subject: [Biopython-dev] Willing to Contribute to Biopython Message-ID: Hi Friends, I am interested in contributing to Bio-Python, I am programmer with 5+ years of experience in applying the Machine Learning / Data mining in the telecom domain. I am comfortable with the programming languages Java, Python, also proficient in SQL and NOSql databases and some experience with map reduce and hadoop. Recently I started learning Bio-informatics and got impressed with the contribution that Bio-python is making to this field. I am very interested in contributing to BioPython. 
I would highly appreciate if you could help me get started. Thanks in Advance. Thanks, Venki From chapmanb at 50mail.com Wed Jul 20 12:00:28 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 20 Jul 2011 12:00:28 -0400 Subject: [Biopython-dev] Willing to Contribute to Biopython In-Reply-To: References: Message-ID: <20110720160028.GB13254@sobchak> Venki; > I am very interested in contributing to BioPython. I would highly appreciate > if you could help me get started. Thanks in Advance. Welcome, we're very happy to have you interested. The best way to get started is to begin using Biopython for your projects of interest and then contribute back documentation about how to do useful tasks. The Cookbook page is a great example of useful parts contributed by users: http://biopython.org/wiki/Category:Cookbook As you begin doing this and get more familiar with Biopython, you'll likely run into areas where additional libraries might be useful. At that point feel free to suggest potential new libraries and ideas to the list; or get started coding and ask for feedback on what you've written.
An alternative approach to getting started is to look at the Issue tracker and pick some problems of interest: https://redmine.open-bio.org/projects/biopython Thanks again for the message and interest, Brad From updates at feedmyinbox.com Wed Jul 20 18:05:09 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 20 Jul 2011 18:05:09 -0400 Subject: [Biopython-dev] 7/20 biopython Questions - BioStar Message-ID: // PSI-Blast commandline version using Bipython // July 20, 2011 at 11:10 AM http://biostar.stackexchange.com/questions/10442/psi-blast-commandline-version-using-bipython Hi Niek, I have been trying to get my program for PSI-Blast run, i have been using the wrapper from Bio.Blast.Applications.NcbipsiblastCommanline: I saw the code you posted at http://biostar.stackexchange.com/questions/2515/how-to-parse-psiblast-results-using-biopython-and-blast-2-2-24 i tried to use the same but it doesnt seem to work for me. can you suggest me why? code that i used: psi_cline = NcbipsiblastCommandline('psiblast', db = 'refseq_protein',\ query = queryID+".fasta", evalue = 10 , \ out = queryID+"_psi.xml", outfmt = 7, \ out_pssm = queryID+"_pssm") #p = subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,\ # stderr=subprocess.PIPE,shell=(sys.platform!="win32")) #blastParser(p.stdout) str(psi_cline) psi_cline() here is the error that i get : Traceback (most recent call last): File "psiBlast.py", line 116, in psi_cline() File "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/init.py", line 432, in call stdout_str, stderr_str) Bio.Application.ApplicationError: Command 'psiblast -out NP_012649_psi.xml -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: not found' i do not get any XML as my output as the command is not found apparently. in this case queryID is a protein name and the queryID.fasta consists of the fasta file. 
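Exit status 127 together with '/bin/sh: psiblast: not found' indicates that the shell could not locate the executable, rather than a problem in the wrapper call itself. A minimal sketch of checking for the program up front, assuming Python 3 and the standard library's `shutil.which`; the helper name `require_executable` is hypothetical:

```python
import shutil


def require_executable(name):
    """Return the full path to `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            "%r was not found on PATH; install it or use its full path" % name
        )
    return path
```

The returned path could then be given as the first argument of the command-line wrapper in place of the bare 'psiblast' string, so a missing BLAST+ installation fails early with a clear message.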
looking forward for your reply. Molly -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Thu Jul 21 06:54:55 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Thu, 21 Jul 2011 06:54:55 -0400 Subject: [Biopython-dev] 7/21 biopython Questions - BioStar Message-ID: // PSI-Blast commandline version using Biopython // July 20, 2011 at 11:10 AM http://biostar.stackexchange.com/questions/10442/psi-blast-commandline-version-using-biopython Hi Niek, I have been trying to get my program for PSI-Blast run, i have been using the wrapper from Bio.Blast.Applications.NcbipsiblastCommanline: I saw the code you posted at http://biostar.stackexchange.com/questions/2515/how-to-parse-psiblast-results-using-biopython-and-blast-2-2-24 i tried to use the same but it doesnt seem to work for me. can you suggest me why? 
code that i used: psi_cline = NcbipsiblastCommandline('psiblast', db = 'refseq_protein',\ query = queryID+".fasta", evalue = 10 , \ out = queryID+"_psi.xml", outfmt = 7, \ out_pssm = queryID+"_pssm") #p = subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,\ # stderr=subprocess.PIPE,shell=(sys.platform!="win32")) #blastParser(p.stdout) str(psi_cline) psi_cline() here is the error that i get : Traceback (most recent call last): File "psiBlast.py", line 116, in psi_cline() File "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/init.py", line 432, in call stdout_str, stderr_str) Bio.Application.ApplicationError: Command 'psiblast -out NP_012649_psi.xml -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: not found' i do not get any XML as my output as the command is not found apparently. in this case queryID is a protein name and the queryID.fasta consists of the fasta file. looking forward for your reply. Molly -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From redmine at redmine.open-bio.org Thu Jul 21 10:55:02 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 21 Jul 2011 14:55:02 +0000 Subject: [Biopython-dev] [Biopython - Bug #3267] (New) Empty files trigger Bio.SeqIO exception from Bio.SeqIO.SeqXML under Jython Message-ID: Issue #3267 has been reported by Peter Cock. 
---------------------------------------- Bug #3267: Empty files trigger Bio.SeqIO exception from Bio.SeqIO.SeqXML under Jython https://redmine.open-bio.org/issues/3267 Author: Peter Cock Status: New Priority: Normal Assignee: Category: Target version: URL: Buildbot test failures following addition of seqxml support to Bio.SeqIO $ jython test_SeqIO.py Traceback (most recent call last): File "test_SeqIO.py", line 392, in records = list(SeqIO.parse(handle, t_format)) File "/Users/pjcock/jython2.5.1/Lib/site-packages/Bio/SeqIO/__init__.py", line 536, in parse for r in i: File "/Users/pjcock/jython2.5.1/Lib/site-packages/Bio/SeqIO/SeqXmlIO.py", line 53, in __iter__ for event,node in self._events: File "/Users/pjcock/jython2.5.1/Lib/xml/dom/pulldom.py", line 231, in next rc = self.getEvent() File "/Users/pjcock/jython2.5.1/Lib/xml/dom/pulldom.py", line 275, in _slurp self.parser.parse(self.stream) File "/Users/pjcock/jython2.5.1/Lib/xml/sax/drivers2/drv_javasax.py", line 141, in parse self._parser.parse(JyInputSourceWrapper(source)) File "/Users/pjcock/jython2.5.1/Lib/xml/sax/drivers2/drv_javasax.py", line 58, in fatalError self._err_handler.fatalError(_wrap_sax_exception(exc)) File "/Users/pjcock/jython2.5.1/Lib/xml/sax/handler.py", line 38, in fatalError raise exception xml.sax._exceptions.SAXParseException: :1:1: Premature end of file. Turns out to be due to different handling of empty XML files under Jython. I will file a bug in Jython shortly (currently http://www.jython.org/ is down). 
Reduced test case: import sys print sys.version from StringIO import StringIO from xml.dom import pulldom from xml.sax import SAXParseException handle = StringIO() # simulate empty file try: for event,node in pulldom.parse(handle): print event except SAXParseException, e: print repr(e) print "Line number", e.getLineNumber() print "Column number", e.getColumnNumber() print "Done" $ python2.5 ../../sax_empty_xml.py 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] SAXParseException('no element found',) Line number 1 Column number 0 Done $ python2.6 ../../sax_empty_xml.py 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) [GCC 4.2.1 (Apple Inc. build 5646)] SAXParseException('no element found',) Line number 1 Column number 0 Done $ jython ../../sax_empty_xml.py 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54) [Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] SAXParseException(u'Premature end of file.',) Line number 1 Column number 1 Done Notice (a) different exception description, (b) different column number: ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Thu Jul 21 13:55:54 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 18:55:54 +0100 Subject: [Biopython-dev] Jython updated on mac build slave Message-ID: Hi all, If anyone is curious about the following unit test failure, FAIL: test_SeqIO ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 241, in runTest assert expected_line == output_line, \ AssertionError: Output : " Failed: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)" Expected: " Checking can write/read as 'clustal' format" This was from converting a new SeqXML file into FASTA, where a description contained non-ASCII. It works fine on C Python, but was failing under Jython 2.5.1. The solution? Update to Jython 2.5.2 As part of this work it means all the build slaves now run Jython 2.5.2 rather than a mix of this and Jython 2.5.1 so I have standardised this for the buildbot. There is now one less column listed here: http://testing.open-bio.org:8010/tgrid or one less row with the other view: http://testing.open-bio.org:8010/grid Sadly this seems to have lost the old build history under Jython, but I don't mind too much. Peter From p.j.a.cock at googlemail.com Thu Jul 21 16:18:12 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 21:18:12 +0100 Subject: [Biopython-dev] Jython updated on mac build slave In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 6:55 PM, Peter Cock wrote: > Hi all, > > If anyone is curious about the following unit test failure, > > FAIL: test_SeqIO > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "run_tests.py", line 241, in runTest > ? 
?assert expected_line == output_line, \ > AssertionError: > Output ?: " Failed: 'ascii' codec can't decode byte 0xe5 in position > 0: ordinal not in range(128)" > Expected: " Checking can write/read as 'clustal' format" > > This was from converting a new SeqXML file into FASTA, where a > description contained non-ASCII. It works fine on C Python, but was > failing under Jython 2.5.1. The solution? Update to Jython 2.5.2 > > As part of this work it means all the build slaves now run Jython 2.5.2 > rather than a mix of this and Jython 2.5.1 so I have standardised this > for the buildbot. There is now one less column listed here: > > http://testing.open-bio.org:8010/tgrid > > or one less row with the other view: > > http://testing.open-bio.org:8010/grid > > Sadly this seems to have lost the old build history under Jython, > but I don't mind too much. Note the Windows buildslave is still on Jython 2.5.1 right now, http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Jython%202.5/builds/0/steps/shell/logs/stdio Peter From p.j.a.cock at googlemail.com Thu Jul 21 16:48:38 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 21:48:38 +0100 Subject: [Biopython-dev] float('nan') fails on Python 2.5 on Windows (PAML) Message-ID: On Sun, Jun 5, 2011 at 10:51 PM, Peter wrote: > Hi all, > > As explained in PEP 754, prior to Python 2.6 float('inf'), float('-inf') > and also float('nan') were passed to the underlying C library, which > may or may not return the IEEE special floating point value for > infinity, minus infinity or nan. See: > http://www.python.org/dev/peps/pep-0754/ > > This is the root cause of this unit test failure on Windows Python 2.5, > ... 
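The PEP 754 behaviour quoted above means a parser cannot rely on `float("nan")` succeeding on every platform. A minimal sketch of a wrapper that falls back to a constructed NaN when the underlying converter rejects the literal. It is written for modern Python 3, where `float("nan")` always succeeds, so a deliberately strict converter stands in for the old Windows behaviour. The names `make_nan_safe` and `strict_float` are illustrative assumptions, not Biopython API.

```python
import math


def make_nan_safe(parse=float):
    """Wrap `parse` so that the literal "nan" always yields a NaN."""
    try:
        parse("nan")
        return parse  # this converter already understands "nan"
    except ValueError:
        def _safe(txt):
            try:
                return parse(txt)
            except ValueError:
                if txt.strip().lower() == "nan":
                    return math.nan
                raise  # any other bad literal is still an error
        return _safe


def strict_float(txt):
    """Stand-in for a float() that rejects "nan" (assumed for illustration)."""
    if txt.strip().lower() == "nan":
        raise ValueError("invalid literal for float(): %s" % txt)
    return float(txt)


safe_float = make_nan_safe(strict_float)
print(math.isnan(safe_float("nan")))
print(safe_float("1.5"))
```

Probing the converter once at setup time keeps the common path free of extra checks on platforms where the plain converter already works.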
The related problem of float("nan") on Python 2.5 or older on Windows is the cause of this problem in the new PAML code too: http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/229/steps/shell/logs/stdio ValueError: invalid literal for float(): nan I guess _parse_codeml.py might need something like this:

try:
    float("nan")
    _float = float
except ValueError:
    def _float(txt):
        try:
            return float(txt)
        except ValueError, e:
            if txt=="nan":
                return XXX
            else:
                raise e

And then use the nan safe _float function in the parser. Unless anyone has a nicer solution? Peter From p.j.a.cock at googlemail.com Fri Jul 22 13:35:25 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 22 Jul 2011 18:35:25 +0100 Subject: [Biopython-dev] Jython updated on mac build slave In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 9:18 PM, Peter Cock wrote: > On Thu, Jul 21, 2011 at 6:55 PM, Peter Cock wrote: >> Hi all, >> >> If anyone is curious about the following unit test failure, ... >> It works fine on C Python, but was failing under Jython 2.5.1. >> The solution? Update to Jython 2.5.2 > > ... > > Note the Windows buildslave is still on Jython 2.5.1 right now, > http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Jython%202.5/builds/0/steps/shell/logs/stdio That's updated now, Peter From p.j.a.cock at googlemail.com Fri Jul 22 13:36:46 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 22 Jul 2011 18:36:46 +0100 Subject: [Biopython-dev] float('nan') fails on Python 2.5 on Windows (PAML) In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 9:48 PM, Peter Cock wrote: > On Sun, Jun 5, 2011 at 10:51 PM, Peter wrote: >> Hi all, >> >> As explained in PEP 754, prior to Python 2.6 float('inf'), float('-inf') >> and also float('nan') were passed to the underlying C library, which >> may or may not return the IEEE special floating point value for >> infinity, minus infinity or nan.
See: >> http://www.python.org/dev/peps/pep-0754/ >> >> This is the root cause of this unit test failure on Windows Python 2.5, >> ... > > The related problem of float("nan") on Python 2.5 or older on > Windows is the cause of this problem in the new PAML code too: > > http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/229/steps/shell/logs/stdio > > ValueError: invalid literal for float(): nan > > I guess _parse_codeml.py might need something like this: > >
> try:
>     float("nan")
>     _float = float
> except ValueError:
>     def _float(txt):
>         try:
>             return float(txt)
>         except ValueError, e:
>             if txt=="nan":
>                 return XXX
>             else:
>                 raise e
> > And then use the nan safe _float function in the parser. > > Unless anyone has a nicer solution? > > Peter I've committed something like that after testing on Windows, https://github.com/biopython/biopython/commit/7539e9163839642ada24e1ebb9c3aff1bb25d573 Not very elegant, but it works. Peter From redmine at redmine.open-bio.org Fri Jul 22 13:43:49 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Fri, 22 Jul 2011 17:43:49 +0000 Subject: [Biopython-dev] [Biopython - Bug #3268] (New) Windows and Python 3 specific unicode problem in SeqIO / SeqXML Message-ID: Issue #3268 has been reported by Peter Cock. ---------------------------------------- Bug #3268: Windows and Python 3 specific unicode problem in SeqIO / SeqXML https://redmine.open-bio.org/issues/3268 Author: Peter Cock Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: We're seeing unittest failures under Python 3.1 and 3.2 on Windows XP via the buildbot, e.g.
http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%203.1/builds/240/steps/shell/logs/stdio or http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%203.2/builds/100/steps/shell/logs/stdio ====================================================================== FAIL: test_unicode_characters_desc (test_SeqIO_SeqXML.TestDetailedRead) Test special unicode characters in the description. ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBot\win31\build\build\py3.1\Tests\test_SeqIO_SeqXML.py", line 55, in test_unicode_characters_desc self.assertEqual(self.records["rna"][2].description, "\u00E5\u00C5\u00FC\u00F6\u00D6\u00DF\u00F8\u00E4\u00A2\u00A3$\u20AC\u9999\u80A0") AssertionError: '??????????$???\xa0' != '????????$?\u9999\u80a0' ---------------------------------------------------------------------- This test is currently working on Linux, Mac OS X for Python 3. There was a similar failure in Jython 2.5.1 (cross platform), now fixed in Jython 2.5.2, see: http://lists.open-bio.org/pipermail/biopython-dev/2011-July/009044.html Peter ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Fri Jul 22 13:50:33 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 22 Jul 2011 18:50:33 +0100 Subject: [Biopython-dev] Windows buildslave back online Message-ID: Hi all, After a period where my Windows machine was being borrowed, it should now be back online as a buildslave again. Most of the Windows specific unit test failures that have developed during that last few weeks have now been fixed, except this one: https://redmine.open-bio.org/issues/3268 Peter P.S. If anyone has some suitable networked machines that could be left on as buildslaves, we'd like more - especially a 64 bit Windows box (which will be an interesting challenge to get Python, NumPy and Biopython compiling and running nicely in the first place). See also http://www.biopython.org/wiki/Continuous_integration From tiagoantao at gmail.com Fri Jul 22 15:38:21 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 22 Jul 2011 20:38:21 +0100 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 6:50 PM, Peter Cock wrote: > P.S. If anyone has some suitable networked machines that could > be left on as buildslaves, we'd like more - especially a 64 bit > Windows box (which will be an interesting challenge to get Python, > NumPy and Biopython compiling and running nicely in the first place). > See also http://www.biopython.org/wiki/Continuous_integration I can put mine online (now things are more quiet and I have time again), but it be up just a few times a week or so. Better than nothing?
From p.j.a.cock at googlemail.com Fri Jul 22 15:46:29 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 22 Jul 2011 20:46:29 +0100 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: On Friday, July 22, 2011, Tiago Antão wrote: > On Fri, Jul 22, 2011 at 6:50 PM, Peter Cock wrote: >> P.S. If anyone has some suitable networked machines that could >> be left on as buildslaves, we'd like more - especially a 64 bit >> Windows box (which will be an interesting challenge to get Python, >> NumPy and Biopython compiling and running nicely in the first place). >> See also http://www.biopython.org/wiki/Continuous_integration > > I can put mine online (now things are more quiet and I have time > again), but it be up just a few times a week or so. Better than > nothing? Certainly much better than nothing - yes please. Peter From tiagoantao at gmail.com Fri Jul 22 22:52:56 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 23 Jul 2011 02:52:56 +0000 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 5:50 PM, Peter Cock wrote: > P.S. If anyone has some suitable networked machines that could > be left on as buildslaves, we'd like more - especially a 64 bit > Windows box (which will be an interesting challenge to get Python, > NumPy and Biopython compiling and running nicely in the first place). > See also http://www.biopython.org/wiki/Continuous_integration I tried compiling a 64 bit version. It is a bit of hell.
1. There is a 64 bit Windows version of Python and NumPy. So far so good.
2. Python does not support mingw64. One needs to tweak the python-dev include files (there are several bugs/requests related to this).
3. Visual Studio Express does not generate 64 bit binaries, only the Pro version can do this (but see below).
4. distutils does not support recent versions of VS (i.e. not version 10, only version 9). It can work, but it requires tweaking the distutils source code.
So: no joy unless with either (i) tweaking the python compiler source code for mingw OR (ii) having the pro version of VS (which I do not have - and is what Christoph Gohlke seems to be doing) OR ... There is a way to generate a 64 bit version with VS Express + the MS SDK (all free), but it will require minor changes to setup.py (because of point 4 above). I am now able to compile biopython fully 64 bit with free (as in beer) tools. I just now have to streamline the process. To sum it up, it requires: VS Express + MS SDK + minor changes to setup.py (subclassing distutils for VS 2010 Express + MS SDK). Going to bed now, it's late around here, F#!$#!!! Windows! Tiago From redmine at redmine.open-bio.org Mon Jul 25 01:54:30 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 25 Jul 2011 05:54:30 +0000 Subject: [Biopython-dev] [Biopython - Bug #2578] The GenBank SeqRecord parser does not record molecule type or if circular References: Message-ID: Issue #2578 has been updated by Mark Diekhans. Not having the molecule type is a fairly serious problem with the GenBank parser. The fact that it guesses the type when writing the record is corrupting data. The priority of fixing this should be increased. At the very least this should be documented in the class, since it's a huge waste of time looking for where it is stored in the SeqRecord.
---------------------------------------- Bug #2578: The GenBank SeqRecord parser does not record molecule type or if circular https://redmine.open-bio.org/issues/2578 Author: Peter Cock Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: 1.47 URL: Filing this bug after discussion on the mailing list, where the issue was raised by Chris Lasher: http://lists.open-bio.org/pipermail/biopython/2008-September/004474.html http://lists.open-bio.org/pipermail/biopython/2008-September/004475.html http://lists.open-bio.org/pipermail/biopython/2008-September/004476.html The LOCUS line at the start of a GenBank record can record the molecule type (DNA, RNA, mRNA, protein etc) and also if the sequence is linear or circular, e.g. LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008 Currently Bio.SeqIO (and Bio.GenBank.FeatureParser if called directly) do not record these two bits of information in the SeqRecord. Bio.SeqIO uses the Bio.GenBank.FeatureParser, which gets passed this information from the Scanner via the residue_type event. This is a combined lump of data containing both the sequence type (DNA, RNA etc) and if it is linear or circular. It is currently only used to determine the Seq alphabet, and has never been recorded. So in addition to not recording if the LOCUS line said the sequence was circular, if the LOCUS line contained cDNA, mRNA, ... this fine detail is also currently lost in the SeqRecord representation. On the other hand, the Bio.GenBank.RecordParser stores all this as the record's residue_type property (a single combined field, presumably reflecting the layout of early GenBank files). It would be a logical improvement to record the sequence data (molecule type and if circular) in the SeqRecord's annotations dictionary - perhaps as two fields but we'd need to check if that would be straight forward for EMBL files too. 
Alternatively, if Biopython included a native CircularSeq object, we could use that explicitly when the sequence is declared as circular. This might be considered a little surprising though. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Mon Jul 25 04:32:11 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 25 Jul 2011 08:32:11 +0000 Subject: [Biopython-dev] [Biopython - Bug #2578] The GenBank SeqRecord parser does not record molecule type or if circular References: Message-ID: Issue #2578 has been updated by Peter Cock. Regarding how is_circular is/should be stored in BioSQL, http://lists.open-bio.org/pipermail/biosql-l/2011-July/001774.html http://lists.open-bio.org/pipermail/bioperl-l/2011-July/035433.html ---------------------------------------- Bug #2578: The GenBank SeqRecord parser does not record molecule type or if circular https://redmine.open-bio.org/issues/2578 Author: Peter Cock Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: 1.47 URL: Filing this bug after discussion on the mailing list, where the issue was raised by Chris Lasher: http://lists.open-bio.org/pipermail/biopython/2008-September/004474.html http://lists.open-bio.org/pipermail/biopython/2008-September/004475.html http://lists.open-bio.org/pipermail/biopython/2008-September/004476.html The LOCUS line at the start of a GenBank record can record the molecule type (DNA, RNA, mRNA, protein etc) and also if the sequence is linear or circular, e.g. LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008 Currently Bio.SeqIO (and Bio.GenBank.FeatureParser if called directly) do not record these two bits of information in the SeqRecord. 
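To make the two missing fields concrete, here is a minimal sketch of pulling the molecule type and topology out of a LOCUS line like the example above. It assumes simple whitespace splitting, which is a simplification: real LOCUS lines are column-oriented and have historical variants. The function name `parse_locus_line` and the returned keys are hypothetical, not Biopython API or the parser's actual approach.

```python
TOPOLOGIES = ("linear", "circular")


def parse_locus_line(line):
    """Extract name, length, molecule type and topology from a LOCUS line."""
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % line)
    # The length unit ("bp" or "aa") anchors the fields that follow it.
    for unit in ("bp", "aa"):
        if unit in tokens:
            unit_index = tokens.index(unit)
            break
    else:
        raise ValueError("no length unit found in: %r" % line)
    rest = tokens[unit_index + 1:]
    molecule_type = None
    topology = None
    # Assumption: if the next token is not a topology word, it is the
    # molecule type (DNA, RNA, mRNA, ...).
    if rest and rest[0] not in TOPOLOGIES:
        molecule_type = rest[0]
        rest = rest[1:]
    if rest and rest[0] in TOPOLOGIES:
        topology = rest[0]
    return {
        "name": tokens[1],
        "length": int(tokens[unit_index - 1]),
        "molecule_type": molecule_type,
        "topology": topology,
    }


info = parse_locus_line(
    "LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008"
)
print(info)
```

Keeping molecule type and topology as two separate values mirrors the "two fields" option mentioned in the bug description, as opposed to the single combined residue_type string.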
Bio.SeqIO uses the Bio.GenBank.FeatureParser, which gets passed this information from the Scanner via the residue_type event. This is a combined lump of data containing both the sequence type (DNA, RNA etc) and if it is linear or circular. It is currently only used to determine the Seq alphabet, and has never been recorded. So in addition to not recording if the LOCUS line said the sequence was circular, if the LOCUS line contained cDNA, mRNA, ... this fine detail is also currently lost in the SeqRecord representation. On the other hand, the Bio.GenBank.RecordParser stores all this as the record's residue_type property (a single combined field, presumably reflecting the layout of early GenBank files). It would be a logical improvement to record the sequence data (molecule type and if circular) in the SeqRecord's annotations dictionary - perhaps as two fields but we'd need to check if that would be straight forward for EMBL files too. Alternatively, if Biopython included a native CircularSeq object, we could use that explicitly when the sequence is declared as circular. This might be considered a little surprising though. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From w.arindrarto at gmail.com Tue Jul 26 05:08:38 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 26 Jul 2011 11:08:38 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi everyone, A few weeks ago I wrote about my interest in making Biopython able to parse the Abi trace file. I've finished writing the SeqIO plugin and some tests. I thought this would be useful to a number of people, so I was wondering about what I should do after this (how will my code be reviewed?, should I just go with a pull request?). 
Of course, there are things that I might have missed when writing the plugin, so feel free to criticize/comment :)! Here's the SeqIO plugin: https://github.com/bow/biopython/blob/seqio-abi/Bio/SeqIO/AbiIO.py Looking forward to the reply, --- Wibowo Arindrarto (bow) http://bow.web.id On Thu, Jul 7, 2011 at 03:16, Wibowo Arindrarto wrote: > Hi everyone, > > This is my first post in the dev mailing list, so greetings :). > > I've been using Biopython for a few months in total now (in a period of > ~1.5 years) and before that Python for ~0.5 years. Most of the time, I'm > working with Sanger sequencing results and at one point I was a bit > disappointed that I couldn't find any (bio)python module for reading .ab1 > files. That compelled me to write my first python module that reads those > files and extracts the useful information out of them. In the process I > became more interested in python itself and finally thought it might be neat > if biopython could do this, built-in. > > So I forked the main repo, made some changes to my module so it became a > parser for the SeqIO submodule that reads Abi files. It's not cooked 100% > yet, but if anyone is interested in seeing/commenting/criticizing the code, > I'd appreciate that very much. Here's the link: > https://github.com/bow/biopython/blob/seqio-abif/Bio/SeqIO/AbiIO.py > > Some features to note: > - I've included a method to trim the sequence based on its quality scores > - the parser does not extract the entire metadata of the trace files, only > ones I consider important for further analysis/annotations. Of course, this > could be changed if the community think some other data should be > included/excluded > - For those of you already familiar with the Abi format, I deliberately > chose the 'PBAS2' tag for the sequence information, which is the unedited > bases after base-calling by the sequencing program. 
> > Some things that I'm doing right now: > - writing unit tests > - making sure it's compatible with Python 3 (thanks Peter :)! ) > - completing the docs > - making sure it's compatible with most Abi format versions. Currently I've > only tested it with files from the 310, 3100, and 3700 machines. Does anyone > have some other versions that I can test this with? > > As I understand as well, this is not the only Sanger sequencing trace > format out there (e.g. SCF is another). I would be glad to learn more and > write a parser for the SCF format as well. The problem is, I'm not sure this > would be useful in the long run as I've personally never seen anyone use an > SCF file and so I've never had a chance to play around with one. If anyone > has an SCF file lying around and thinks SCF support would be beneficial, I'd > be happy to accept them :). > > I guess that's all for now. Thanks for reading! > > --- > Wibowo Arindrarto (bow) > http://bow.web.id > From p.j.a.cock at googlemail.com Tue Jul 26 05:59:33 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 26 Jul 2011 10:59:33 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Tue, Jul 26, 2011 at 10:08 AM, Wibowo Arindrarto wrote: > Hi everyone, > > A few weeks ago I wrote about my interest in making Biopython able to parse > the Abi trace file. I've finished writing the SeqIO plugin and some tests. I > thought this would be useful to a number of people, so I was wondering about > what I should do after this (how will my code be reviewed?, should I just go > with a pull request?). Of course, there are things that I might have missed > when writing the plugin, so feel free to criticize/comment :)! > > Here's the SeqIO plugin: > https://github.com/bow/biopython/blob/seqio-abi/Bio/SeqIO/AbiIO.py > > Looking forward to the reply, Hi Wibowo, You could send a pull request if you wanted, but this email to the dev list is enough. 
I probably wouldn't just merge it - I prefer to rebase to the current master first to get a clean history (especially if there are already several merges in the branch history). I will review your work. In particular I plan to cross test with EMBOSS seqret to verify you produce the same sequence and the same PHRED quality scores - this could be done in test_Emboss.py Thanks for your work! Peter From w.arindrarto at gmail.com Tue Jul 26 06:37:23 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 26 Jul 2011 12:37:23 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, You're welcome :). In that case, I'll just stick to submitting through this mailing list. I'm looking forward to your review! --- Wibowo Arindrarto (bow) http://bow.web.id On Tue, Jul 26, 2011 at 11:59, Peter Cock wrote: > On Tue, Jul 26, 2011 at 10:08 AM, Wibowo Arindrarto > wrote: > > Hi everyone, > > > > A few weeks ago I wrote about my interest in making Biopython able to > parse > > the Abi trace file. I've finished writing the SeqIO plugin and some > tests. I > > thought this would be useful to a number of people, so I was wondering > about > > what I should do after this (how will my code be reviewed?, should I just > go > > with a pull request?). Of course, there are things that I might have > missed > > when writing the plugin, so feel free to criticize/comment :)! > > > > Here's the SeqIO plugin: > > https://github.com/bow/biopython/blob/seqio-abi/Bio/SeqIO/AbiIO.py > > > > Looking forward to the reply, > > Hi Wibowo, > > You could send a pull request if you wanted, but this email to the dev list > is enough. I probably wouldn't just merge it - I prefer to rebase to the > current master first to get a clean history (especially if there are > already > several merges in the branch history). > > I will review your work. 
> > In particular I plan to cross test with EMBOSS seqret to verify you produce > the same sequence and the same PHRED quality scores - this could be > done in test_Emboss.py > > Thanks for your work! > > Peter > From peter at maubp.freeserve.co.uk Tue Jul 26 14:02:23 2011 From: peter at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Jul 2011 19:02:23 +0100 Subject: [Biopython-dev] Fwd: [blast-announce] New SOAP based BLAST service In-Reply-To: References: Message-ID: FYI ---------- Forwarded message ---------- From: Mcginnis, Scott (NIH/NLM/NCBI) [E] Date: Tue, Jul 26, 2011 at 5:12 PM Subject: [blast-announce] New SOAP based BLAST service To: NLM/NCBI List blast-announce A SOAP based BLAST service is available. This service makes use of the Simple Object Access Protocol to submit and retrieve searches with the NCBI BLAST web server. The service can also query the server for other information. A simple ("Lite") interface is available that should be suitable for most projects. Documentation and links to the WSDL and sample clients are at http://www.ncbi.nlm.nih.gov/books/NBK55699/ From redmine at redmine.open-bio.org Tue Jul 26 16:16:43 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 26 Jul 2011 20:16:43 +0000 Subject: [Biopython-dev] [Biopython - Feature #3271] (New) Updates to PDBList.py- downloading PDB structures Message-ID: Issue #3271 has been reported by David Cain. ---------------------------------------- Feature #3271: Updates to PDBList.py- downloading PDB structures https://redmine.open-bio.org/issues/3271 Author: David Cain Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: 1.57 URL: https://github.com/DavidCain/biopython PDBList.py is somewhat out of date: it has support for .Z compression, but the ftp://ftp.wwpdb.org/ server only has .gz archives. It also relies on a system utility to decompress the downloaded archives.
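As a rough illustration of decompressing without any external tool, the sketch below uses only Python's standard gzip and shutil modules. It is not the code from the linked branch, and the function name and file paths are made up for the example.

```python
import gzip
import shutil


def gunzip_file(gz_path, out_path):
    """Decompress gz_path into out_path using only the standard library."""
    # Stream the data so that large structure files are never held
    # in memory all at once.
    with gzip.open(gz_path, "rb") as compressed:
        with open(out_path, "wb") as plain:
            shutil.copyfileobj(compressed, plain)
    return out_path
```

Because nothing here shells out, the same call should behave the same way on POSIX systems and on Windows.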
The default, gunzip, is effective enough for posix systems, but Windows requires the installation of a command line tool, such as 7zip. I've rewritten it to use the gzip module, and to ignore the compression parameter (as all files are .gz anyway). I left the 'uncompress' and 'compression' parameters for backwards compatibility. I've also made it so that the user can override and use a system decompression tool if desired. I'm not sure if this is the best way to handle it, as the retrieve_pdb_file() function would work just fine removing support for system decompression and the 'compression' parameter. Also, when calling retrieve_pdb_file() repeatedly, urllib can generate too many FTP connections and crash (for example) a script attempting to download some structures in succession. Updating to urllib2 removes this issue. My GitHub branch is linked, and the only file I've modified (PDBList.py) is attached. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From tiagoantao at gmail.com Tue Jul 26 23:19:48 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 27 Jul 2011 03:19:48 +0000 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: Hi, Just some developments on the win 64 bit front: It now compiles and mostly tests (I say "mostly" because I do not have all external binaries installed, neither reportlab). I would like to enquire about the possibility of adding a (default) option to windows linkage. By default, link.exe generates manifest files, but on VS 2010 in some cases it does not (a somewhat related discussion can be found here http://bugs.python.org/issue4431 ).
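For readers following the linker discussion, here is a hedged sketch of how a /MANIFEST link flag (the workaround this thread takes from that bug report) might be limited to 64-bit Windows builds in a setup.py. The helper name is hypothetical, and the sketch does not check which compiler is in use, so a mingw build would need separate handling.

```python
import sys


def manifest_link_args(platform=None, is_64bit=None):
    """Return extra linker arguments, non-empty only on 64-bit Windows.

    Both parameters default to the running interpreter; they exist so
    the decision can be exercised on any machine.
    """
    if platform is None:
        platform = sys.platform  # "win32" on 32- and 64-bit Windows alike
    if is_64bit is None:
        is_64bit = sys.maxsize > 2 ** 32
    if platform == "win32" and is_64bit:
        return ["/MANIFEST"]
    return []


# Keyword arguments that could be passed on to each Extension(...) call.
extension_kwargs = {"extra_link_args": manifest_link_args()}
```

On every other platform the list is empty, so the extension definitions stay unchanged there.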
I would like to try and add the workaround described on that bug page, namely extra_link_args=["/MANIFEST"] to all Extensions on setup.py, for example

    EXTENSIONS.append(
        Extension('Bio.Cluster.cluster',
                  ['Bio/Cluster/clustermodule.c',
                   'Bio/Cluster/cluster.c'],
                  include_dirs=[numpy_include_dir],
                  extra_link_args=["/MANIFEST"]
                  ))

[Note the extra_link_args] I am far from being a windows specialist, but from what I have read, this is harmless and would sort out the only issue that I have with compiling on VS2010 + Win SDK to generate native 64-bit binaries. Tiago From p.j.a.cock at googlemail.com Wed Jul 27 04:02:31 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Jul 2011 09:02:31 +0100 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: 2011/7/27 Tiago Antão : > Hi, > > Just some developments on the win 64 bit front: It now compiles and > mostly tests (I say "mostly" because I do not have all external > binaries installed, neither reportlab). > > I would like to enquire about the possibility of adding a (default) > option to windows linkage. By default, link.exe generates manifest > files, but on VS 2010 in some cases it does not (a somewhat related > discussion can be found here http://bugs.python.org/issue4431 ). I > would like to try and add the workaround described on that bug page, > namely extra_link_args=["/MANIFEST"] to all Extensions on setup.py, > for example
>     EXTENSIONS.append(
>         Extension('Bio.Cluster.cluster',
>                   ['Bio/Cluster/clustermodule.c',
>                    'Bio/Cluster/cluster.c'],
>                   include_dirs=[numpy_include_dir],
>                   extra_link_args=["/MANIFEST"]
>                   ))
> > [Note the extra_link_args] > > I am far from being a windows specialist, but from what I have read, > this is harmless and would sort out the only issue that I have with > compiling on VS2010 + Win SDK to generate native 64-bit binaries.
> > Tiago Can you make it conditional on Windows? Peter From tiagoantao at gmail.com Wed Jul 27 07:17:39 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 27 Jul 2011 11:17:39 +0000 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: 2011/7/27 Peter Cock : > Can you make it conditional on Windows? Silly me, just forgot all other platforms. Furthermore, the 32-bit compiler in Windows is VS? Because if it is mingw then something extra is needed. I will work on a new git fork to change the setup.py only to accommodate windows 64. Tiago From tiagoantao at gmail.com Wed Jul 27 07:59:42 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 27 Jul 2011 11:59:42 +0000 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: One final question: If VS Express is used to compile the official biopython, what is the version? 2008? Tiago 2011/7/27 Tiago Antão : > 2011/7/27 Peter Cock : >> Can you make it conditional on Windows? > > Silly me, just forgot all other platforms. Furthermore, the 32-bit > compiler in Windows is VS? Because if it is mingw then something extra > is needed. > I will work on a new git fork to change the setup.py only to > accommodate windows 64. > > Tiago > -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Wed Jul 27 08:17:41 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Jul 2011 13:17:41 +0100 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: 2011/7/27 Tiago Antão : > 2011/7/27 Peter Cock : >> Can you make it conditional on Windows? > > Silly me, just forgot all other platforms. Furthermore, the 32-bit > compiler in Windows is VS? Because if it is mingw then something extra > is needed. > I will work on a new git fork to change the setup.py only to > accommodate windows 64.
> > Tiago Actually we (well - my Windows machine since that is the one that has done all the recent releases) are using a mixture depending on the version of Python. According to my old email from Dec 2010, since I don't have the machine in front of me right now: >> We're using mingw32 from Cygwin on older versions of Python, >> and I think Python 2.6 onwards I'm using Microsoft's free VC++ >> 2008 Express Edition which was downloaded from >> http://www.microsoft.com/express/download/ See: http://lists.open-bio.org/pipermail/biopython-dev/2010-December/008582.html It would be worth asking anyone else on the dev list who has previously compiled and built installers on Windows (e.g. Michael) to also test any changes. I might one day be able to set up a 64bit Windows virtual machine, but right now I don't have time and it would require finding the right person in IT for the install media and licence etc. Peter From tiagoantao at gmail.com Wed Jul 27 08:26:08 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 27 Jul 2011 13:26:08 +0100 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: I am using VS Express 2010 plus the Windows SDK. It seems to work. It is possible to generate 64 bit binaries on 32 bit machines, BTW (with cross compiling). I have made some very simple changes to setup.py : https://github.com/tiagoantao/biopython/blob/master/setup.py They should only impact Windows with 64-bit interpreters. Even if they do not work (which they seem to do), they should only break 64-bit win interpreters (which are not functional anyway). Tiago 2011/7/27 Peter Cock : > 2011/7/27 Tiago Antão : >> 2011/7/27 Peter Cock : >>> Can you make it conditional on Windows? >> >> Silly me, just forgot all other platforms. Furthermore, the 32-bit >> compiler in Windows is VS? Because if it is mingw then something extra >> is needed.
>> I will work on a new git fork to change the setup.py only to >> accommodate windows 64. >> >> Tiago > > Actually we (well - my Windows machine since that is the one > that has done all the recent releases) are using a mixture > depending on the version of Python. > > According to my old email from Dec 2010, since I don't have > the machine in front of me right now: >>> We're using mingw32 from Cygwin on older versions of Python, >>> and I think Python 2.6 onwards I'm using Microsoft's free VC++ >>> 2008 Express Edition which was downloaded from >>> http://www.microsoft.com/express/download/ > > See: http://lists.open-bio.org/pipermail/biopython-dev/2010-December/008582.html > > It would be worth asking anyone else on the dev list who has > previously compiled and built installers on Windows (e.g. Michael) > to also test any changes. > > I might one day be able to set up a 64bit Windows virtual machine, > but right now I don't have time and it would require finding the > right person in IT for the install media and licence etc. > > Peter > -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Thu Jul 28 19:01:29 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 29 Jul 2011 00:01:29 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Tue, Jul 26, 2011 at 11:37 AM, Wibowo Arindrarto wrote: > > I'm looking forward to your review! > Hi Bow, I made a start tonight, https://github.com/peterjc/biopython/tree/seqio-abi I haven't added the ABI files to test_SeqIO.py yet, there is some alphabet issue to check there. I noticed that EMBOSS seqret (at least the old copy I had installed on my laptop, v6.1.0 I think) was able to give ABI records identifiers (rather than the hack of using the filename).
Peter From w.arindrarto at gmail.com Fri Jul 29 04:07:43 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 29 Jul 2011 10:07:43 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, I made a local branch tracking your seqio-abi tree. I agree with most of the changes, but I think I'm a bit lost on the filename part. My intention is to use the filename of the Abi file as the ID for the SeqRecord, instead of the stored record identifier returned by seqret. The reason is that it's easier to see which Abi file a SeqRecord came from by looking at the ID (or output file name, in case the SeqRecord is written as another format), since the record identifier data is not readily available. I chose to store the record identifier in SeqRecord.name (sample_id), so users can still cross check if they want to. My 'except' block (AbiIO.py:83) is a bad way to deal with '.name' being absent, now that I think of it. But do you think instead of 'None', maybe we could use 'file_id = str(handle)' or 'file_id = self.name'? And lastly, could you clarify what you mean by alphabet issue on test_SeqIO.py? Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Fri, Jul 29, 2011 at 01:01, Peter Cock wrote: > On Tue, Jul 26, 2011 at 11:37 AM, Wibowo Arindrarto wrote: > > > > I'm looking forward to your review! > > > > Hi Bow, > > I made a start tonight, > https://github.com/peterjc/biopython/tree/seqio-abi > > I haven't added the ABI files to test_SeqIO.py yet, > there is some alphabet issue to check there. > > I noticed that EMBOSS seqret (at least the old > copy I had installed on my laptop, v6.1.0 I think) > was able to give ABI records identifiers (rather > than the hack of using the filename).
> > Peter > From p.j.a.cock at googlemail.com Fri Jul 29 05:39:20 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 29 Jul 2011 10:39:20 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Fri, Jul 29, 2011 at 9:07 AM, Wibowo Arindrarto wrote: > Hi Peter, > I made a local branch tracking your seqio-abi tree. I agree to most of the > changes, but I think I'm a bit lost on the filename part. > My intention is to use the filename of the Abi file as the ID for the > SeqRecord, instead of the stored records identified returned by seqret. The > reason is because it's easier to see which Abi file a SeqRecord came from by > looking at the ID (or output file name, in case the SeqRecord is written as > another format), since the records identifier data is not readily available. > I chose to store the records identifier in SeqRecord.name (sample_id), so > users can still cross check if they want to. > My 'except' block (AbiIO.py:83) is a bad way to deal with '.name' being > absent, now that I think of it. But do you think instead of 'None', maybe we > could use 'file_id = str(handle)' or 'file_id = self.name'? There may not be a filename - the ABI file might be piped from stdin, or supplied as a StringIO handle, or a network handle. So using the filename as the primary identifier seems wrong to me. I would want the same ID regardless of how the file was loaded, or what the name was. Using the filename as the SeqRecord name (if available, "" if not) would be OK with me. The other justification for using the ID in the file as the SeqRecord's id is consistency with EMBOSS. We should also check how BioPerl does it - but I'm not sure if I have all the dependencies installed. Also, is it possible to concatenate multiple ABI files together? > And lastly, could you clarify what you mean by alphabet issue on > test_SeqIO.py? 
Add the three good ABI test files to the list in test_SeqIO.py and run the test, you'll get a complaint about the alphabet handling. I didn't have time to look into what exactly was going on yet. Peter From w.arindrarto at gmail.com Fri Jul 29 07:34:12 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 29 Jul 2011 13:34:12 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, Thanks for explaining. I understand why we should stick to the stored sequence id. In this case, we can use the filename as SeqRecord.name as well. Regarding BioPerl, I don't have it installed myself -- but I took a quick look at their source and it seems they also use the stored sequence ID as their main identifier instead of the filename. If the stored sequence ID is not present, it's "(unknown)" in their case. As for concatenation, I don't think it's possible. The official spec from ABI does not mention anything about combining ABI records. Plus, the file structure itself does not allow multiple sequences to be stored. I'll look at the test_SeqIO.py over the weekend. I think it'll have something to do with some ambiguous dna base stored in the abi files. Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Fri, Jul 29, 2011 at 11:39, Peter Cock wrote: > On Fri, Jul 29, 2011 at 9:07 AM, Wibowo Arindrarto > wrote: > > Hi Peter, > > I made a local branch tracking your seqio-abi tree. I agree with most of > the > > changes, but I think I'm a bit lost on the filename part. > > My intention is to use the filename of the Abi file as the ID for the > > SeqRecord, instead of the stored record identifier returned by seqret. > The > > reason is that it's easier to see which Abi file a SeqRecord came from > by > > looking at the ID (or output file name, in case the SeqRecord is written > as > > another format), since the record identifier data is not readily > available.
> > I chose to store the records identifier in SeqRecord.name (sample_id), so > > users can still cross check if they want to. > > My 'except' block (AbiIO.py:83) is a bad way to deal with '.name' being > > absent, now that I think of it. But do you think instead of 'None', maybe > we > > could use 'file_id = str(handle)' or 'file_id = self.name'? > > There may not be a filename - the ABI file might be piped from stdin, > or supplied as a StringIO handle, or a network handle. So using the > filename as the primary identifier seems wrong to me. I would want > the same ID regardless of how the file was loaded, or what the name > was. Using the filename as the SeqRecord name (if available, "" if > not) would be OK with me. > > The other justification for using the ID in the file as the SeqRecord's id > is consistency with EMBOSS. We should also check how BioPerl does > it - but I'm not sure if I have all the dependencies installed. > > Also, is it possible to concatenate multiple ABI files together? > > > And lastly, could you clarify what you mean by alphabet issue on > > test_SeqIO.py? > > Add the three good ABI test files to the list in test_SeqIO.py and > run the test, you'll get a complaint about the alphabet handling. > I didn't have time to look into what exactly was going on yet. > > Peter > From p.j.a.cock at googlemail.com Fri Jul 29 08:14:06 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 29 Jul 2011 13:14:06 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Fri, Jul 29, 2011 at 12:34 PM, Wibowo Arindrarto wrote: > Hi Peter, > Thanks for explaining. I understand why we should stick to the stored > sequence id. In this case, we can use the filename as SeqRecord.name as > well. Regarding BioPerl, I don't have it installed myself -- but I took a > quick look at their source and it seems they also use the stored sequence ID > as their main identifier instead of the filename. 
If the stored sequence ID > is not present, it's "(unknown)" in their case. OK good, that means Biopython, BioPerl and EMBOSS should be consistent :) > As for concatenation, I don't think it's possible. The official spec from > ABI does not mention anything about combining ABI records. Plus, the file > structure itself does not allow multiple sequence to be stored. OK good, I didn't think it was allowed but wanted to check. > I'll look on the test_SeqIO.py over the weekend. I think it'll have > something to do with some ambiguous dna base stored in the abi files. > Regards, Some of the alphabet stuff is a bit nasty - so please feel free to ask or get me to help. Peter From p.j.a.cock at googlemail.com Fri Jul 29 12:20:23 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 29 Jul 2011 17:20:23 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi again, I had a bit of time this afternoon so I looked at this. On Fri, Jul 29, 2011 at 1:14 PM, Peter Cock wrote: > On Fri, Jul 29, 2011 at 12:34 PM, Wibowo Arindrarto wrote: >> Hi Peter, >> Thanks for explaining. I understand why we should stick to the stored >> sequence id. In this case, we can use the filename as SeqRecord.name as >> well. Regarding BioPerl, I don't have it installed myself -- but I took a >> quick look at their source and it seems they also use the stored sequence ID >> as their main identifier instead of the filename. If the stored sequence ID >> is not present, it's "(unknown)" in their case. > > OK good, that means Biopython, BioPerl and EMBOSS should be > consistent :) I've made that switch, >> I'll look on the test_SeqIO.py over the weekend. I think it'll have >> something to do with some ambiguous dna base stored in the abi files. >> Regards, > > Some of the alphabet stuff is a bit nasty - so please feel free to ask > or get me to help. I've done enough to get the test_SeqIO.py unit test to pass. 
We probably need a check (like in SFF) to check the user hasn't given a handle opened in text mode. That should probably have a unit test too. I still haven't cross checked the sequence and PHRED scores from your code and EMBOSS. Anyway - I'll leave the code for you to work on for now... Peter From w.arindrarto at gmail.com Sat Jul 30 03:42:04 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 30 Jul 2011 09:42:04 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, I've done some more improvements to the code: - I've written the check and unittest for the file handle mode. I've set it so that abi file has to be opened in 'rb' mode, otherwise it'll return an error. While it's ok to open in 'r' mode in python 2 in Linux, it has to be specified as 'rb' in Windows and/or Python 3 for the file to be read correctly. So I decided forcing it to 'rb' is the best. Because of this, I changed 'test_SeqIO.py:503' to include the mode argument when opening. - I've also checked against test_Emboss.py for seqret output, after including the abi format in it. My EMBOSS version is 6.4.0. There was a slight problem with this testing, since for some reason the ID returned by seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS installation, since when I previously tested it against 6.1.0, the ID was correct (although the qual values not, so I had to upgrade). As expected, if I comment out the code that tests for sequence id ('test_Emboss.py:168-172') the tests pass. Maybe you could try testing it as well and see if EMBOSS also returns the default id instead of the sample name? - Finally, I did some small cosmetic changes to the code (typos, etc). All changes have been pushed to my github fork. Now I still have time for the weekend to improve whatever needs to be improved :). 
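A binary-mode check of the kind described in the first point could look roughly like the sketch below. This is an illustration rather than the code pushed to the fork, and the helper name is invented for the example.

```python
import io


def require_binary_handle(handle):
    """Raise ValueError if handle looks like a text-mode file object."""
    mode = getattr(handle, "mode", None)
    # Text-mode objects in Python 3 derive from io.TextIOBase; for other
    # handles, fall back on a string "mode" attribute when one exists.
    if isinstance(handle, io.TextIOBase) or (
        isinstance(mode, str) and "b" not in mode
    ):
        raise ValueError(
            "ABI files must be opened in binary mode, e.g. open(path, 'rb')"
        )
    return handle
```

Handles without a usable mode attribute, such as an in-memory io.BytesIO, pass through unchanged, so data that does not come from a named file can still be parsed.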
Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Fri, Jul 29, 2011 at 18:20, Peter Cock wrote: > Hi again, > > I had a bit of time this afternoon so I looked at this. > > On Fri, Jul 29, 2011 at 1:14 PM, Peter Cock > wrote: > > On Fri, Jul 29, 2011 at 12:34 PM, Wibowo Arindrarto wrote: > >> Hi Peter, > >> Thanks for explaining. I understand why we should stick to the stored > >> sequence id. In this case, we can use the filename as SeqRecord.name as > >> well. Regarding BioPerl, I don't have it installed myself -- but I took > a > >> quick look at their source and it seems they also use the stored > sequence ID > >> as their main identifier instead of the filename. If the stored sequence > ID > >> is not present, it's "(unknown)" in their case. > > > > OK good, that means Biopython, BioPerl and EMBOSS should be > > consistent :) > > I've made that switch, > > >> I'll look on the test_SeqIO.py over the weekend. I think it'll have > >> something to do with some ambiguous dna base stored in the abi files. > >> Regards, > > > > Some of the alphabet stuff is a bit nasty - so please feel free to ask > > or get me to help. > > I've done enough to get the test_SeqIO.py unit test to pass. > > We probably need a check (like in SFF) to check the user hasn't given > a handle opened in text mode. That should probably have a unit test > too. > > I still haven't cross checked the sequence and PHRED scores from > your code and EMBOSS. > > Anyway - I'll leave the code for you to work on for now... > > Peter > From redmine at redmine.open-bio.org Sun Jul 31 16:22:05 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 31 Jul 2011 20:22:05 +0000 Subject: [Biopython-dev] [Biopython - Feature #3271] Updates to PDBList.py- downloading PDB structures References: Message-ID: Issue #3271 has been updated by Eric Talevich. Hi David, Thanks for doing this. Overall I agree with your solution. 
I peppered your proposed fix with review comments on Github: https://github.com/DavidCain/biopython/commit/e6eef7e2a8117b6de4e9fdea3b4bd77575d383cf Once you've looked at it again can you submit your pdb-fixes branch as a pull request on GitHub? (If not, no worries, I can cherry-pick it. Just let us know when you're ready.) -Eric ---------------------------------------- Feature #3271: Updates to PDBList.py- downloading PDB structures https://redmine.open-bio.org/issues/3271 Author: David Cain Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: 1.57 URL: https://github.com/DavidCain/biopython PDBList.py is somewhat out of date: it has support for .Z compression, but the ftp://ftp.wwpdb.org/ server only has .gz archives. It also relies on a system utility to decompress the downloaded archives. The default, gunzip, is effective enough for posix systems, but Windows requires the installation of a command line tool, such as 7zip. I've rewritten it to use the gzip module, and to ignore the compression parameter (as all files are .gz anyway). I left the 'uncompress' and 'compression' parameters for backwards compatibility. I've also made it so that the user can override and use a system decompression tool if desired. I'm not sure if this is the best way to handle it, as the retrieve_pdb_file() function would work just fine removing support for system decompression and the 'compression' parameter. Also, when calling retrieve_pdb_file() repeatedly, urllib can generate too many FTP connections and crash (for example) a script attempting to download some structures in succession. Updating to urllib2 removes this issue. My GitHub branch is linked, and the only file I've modified (PDBList.py) is attached. -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From eric.talevich at gmail.com Fri Jul 1 02:17:26 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 30 Jun 2011 22:17:26 -0400 Subject: [Biopython-dev] pypaml In-Reply-To: References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> Message-ID: Hi Brandon, Looks good, thanks! It's just enough to get the point across, and the wiki is a fine place for extended examples. Reading this, I notice that the cml.set_option(key, value) gets kind of tedious when a lot of options need to be set. It would be nice to be able to set them all in one go, as keyword arguments: cml.set_options( seqtype=1, verbose=0, noisy=0, RateAncestor=0, model=0, NSsites=[0, 1, 2], CodonFreq=2, cleandata=1, fix_alpha=1, kappa=4.54006, ) What do you think? Worth implementing? Cheers, Eric On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo wrote: > Well, it's not much, but how's this? > https://github.com/brandoninvergo/biopython/tree/doc-branch > Do you want me to go more into detail about the options available like > in the wikior is this sufficient as a tutorial? Just let me know... > > Cheers, > Brandon > > On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo > wrote: > > Hi Eric, > > No problem, I'll start writing something up now. > > Cheers, > > -brandon > > > > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich > wrote: > >> Hi Brandon, > >> > >> I just added a stub for Bio.Phylo.PAML to the main Tutorial: > >> > https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8 > >> > >> Do you think you could add some more to that section, maybe pulling a > chunk > >> of content from the wiki page you just wrote? If you're not comfortable > with > >> LaTeX you can just point me to some text and I'll add it. 
> >> > >> Thanks, > >> Eric > >> > >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo > >> wrote: > >>> > >>> Ok, the documentation is finished: > >>> http://biopython.org/wiki/PAML > >>> > >>> Cheers, > >>> Brandon > >>> > >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman > wrote: > >>> > Brandon; > >>> > > >>> >> Ok I've just sent the email to the main list. > >>> > > >>> > Awesome, thanks for this. Hope this convinces some other folks to > >>> > take a look. > >>> > > >>> >> I can write up some documentation this week. What is the official > >>> >> procedure for adding documentation to the wiki, if any? Or can I > just > >>> >> create an account and start writing? > >>> > > >>> > Create an account and start writing. Nothing official except that > >>> > documentation is good. > >>> > > >>> >> Also, just to double-check, are my docstrings all sufficient or > should > >>> >> I expand those? > >>> > > >>> > Your code comments looked great to me. The end user documentation > >>> > seems to be the main thing at this point: describing how someone can > >>> > pick up and get started with the code. 
> >>> > > >>> > Thanks again for all the work, > >>> > Brad > >>> > _______________________________________________ > >>> > Biopython-dev mailing list > >>> > Biopython-dev at lists.open-bio.org > >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev > >>> > > >>> _______________________________________________ > >>> Biopython-dev mailing list > >>> Biopython-dev at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev > >> > >> > > > From b.invergo at gmail.com Fri Jul 1 08:46:02 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Fri, 01 Jul 2011 10:46:02 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> Message-ID: <4E0D894A.7050209@gmail.com> Hi Eric, Yes, I was feeling that as well while I was doing it. I think it would be easy enough to implement, so I'll take care of that today... Cheers, Brandon On 07/01/2011 04:17 AM, Eric Talevich wrote: > Hi Brandon, > > Looks good, thanks! It's just enough to get the point across, and the > wiki is a fine place for extended examples. > > Reading this, I notice that the cml.set_option(key, value) gets kind > of tedious when a lot of options need to be set. It would be nice to > be able to set them all in one go, as keyword arguments: > > cml.set_options( > seqtype=1, > verbose=0, > noisy=0, > RateAncestor=0, > model=0, > NSsites=[0, 1, 2], > CodonFreq=2, > cleandata=1, > fix_alpha=1, > kappa=4.54006, > ) > > What do you think? Worth implementing? > > Cheers, > Eric > > > On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo > wrote: > > Well, it's not much, but how's this? > https://github.com/brandoninvergo/biopython/tree/doc-branch > Do you want me to go more into detail about the options available like > in the wikior is this sufficient as a tutorial? Just let me know... 
> >>> > > >>> > Thanks again for all the work, > >>> > Brad > >>> > _______________________________________________ > >>> > Biopython-dev mailing list > >>> > Biopython-dev at lists.open-bio.org > > >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev > >>> > > >>> _______________________________________________ > >>> Biopython-dev mailing list > >>> Biopython-dev at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev > >> > >> > > > > From b.invergo at gmail.com Fri Jul 1 10:28:22 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Fri, 01 Jul 2011 12:28:22 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> Message-ID: <4E0DA146.7040703@gmail.com> Since the options are stored in a dict() object keyed by strings, I think it would be easiest to have the arguments passed as ("option", value) tuples or to have a dict() passed in with "option":value pairs. Alternatively, the set_option & get_option methods can be dropped altogether and we can just let the user have full access to the options dict. The write_ctl_file method can be used then to check the validity of each option, to prevent erroneous option names from being used. Perhaps that's better, despite being slightly more work to implement. What do you think? Cheers, Brandon On 07/01/2011 04:17 AM, Eric Talevich wrote: > Hi Brandon, > > Looks good, thanks! It's just enough to get the point across, and the > wiki is a fine place for extended examples. > > Reading this, I notice that the cml.set_option(key, value) gets kind > of tedious when a lot of options need to be set. 
It would be nice to > be able to set them all in one go, as keyword arguments: > > cml.set_options( > seqtype=1, > verbose=0, > noisy=0, > RateAncestor=0, > model=0, > NSsites=[0, 1, 2], > CodonFreq=2, > cleandata=1, > fix_alpha=1, > kappa=4.54006, > ) > > What do you think? Worth implementing? > > Cheers, > Eric > > > On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo > wrote: > > Well, it's not much, but how's this? > https://github.com/brandoninvergo/biopython/tree/doc-branch > Do you want me to go more into detail about the options available like > in the wikior is this sufficient as a tutorial? Just let me know... > > Cheers, > Brandon > > On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo > > wrote: > > Hi Eric, > > No problem, I'll start writing something up now. > > Cheers, > > -brandon > > > > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich > > wrote: > >> Hi Brandon, > >> > >> I just added a stub for Bio.Phylo.PAML to the main Tutorial: > >> > https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8 > >> > >> Do you think you could add some more to that section, maybe > pulling a chunk > >> of content from the wiki page you just wrote? If you're not > comfortable with > >> LaTeX you can just point me to some text and I'll add it. > >> > >> Thanks, > >> Eric > >> > >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo > > > >> wrote: > >>> > >>> Ok, the documentation is finished: > >>> http://biopython.org/wiki/PAML > >>> > >>> Cheers, > >>> Brandon > >>> > >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman > > wrote: > >>> > Brandon; > >>> > > >>> >> Ok I've just sent the email to the main list. > >>> > > >>> > Awesome, thanks for this. Hope this convinces some other > folks to > >>> > take a look. > >>> > > >>> >> I can write up some documentation this week. What is the > official > >>> >> procedure for adding documentation to the wiki, if any? Or > can I just > >>> >> create an account and start writing? 
> >>> > > >>> > Create an account and start writing. Nothing official except > that > >>> > documentation is good. > >>> > > >>> >> Also, just to double-check, are my docstrings all > sufficient or should > >>> >> I expand those? > >>> > > >>> > Your code comments looked great to me. The end user > documentation > >>> > seems to be the main thing at this point: describing how > someone can > >>> > pick up and get started with the code. > >>> > > >>> > Thanks again for all the work, > >>> > Brad > >>> > _______________________________________________ > >>> > Biopython-dev mailing list > >>> > Biopython-dev at lists.open-bio.org > > >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev > >>> > > >>> _______________________________________________ > >>> Biopython-dev mailing list > >>> Biopython-dev at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev > >> > >> > > > > From b.invergo at gmail.com Fri Jul 1 10:37:45 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Fri, 01 Jul 2011 12:37:45 +0200 Subject: [Biopython-dev] "Developing on Github" wiki amendment Message-ID: <4E0DA379.4030902@gmail.com> Hi everyone, Based on my own experiences working with Github, I suggest a minor addition to the wiki tutorial on using it. I'm working behind an HTTP proxy at my university, which doesn't pose big problems with cloning or pushing, which use the git http_proxy setting without problem. However, git-pull for whatever reason doesn't seem to use that setting but instead relies on the GitProxy setting. I haven't been able to get this to play nicely, so I had trouble pulling upstream changes to my repository for a while. In the end, the easiest solution was to just change my upstream master to https://github.com/biopython/biopython.git rather than git:// so that the git https_proxy is used. 
The only problem is that I couldn't find a git command to change the upstream master (didn't search very deeply, admittedly), so I did it by manually editing the .git/config file in my repository. Does anyone know if there is one?

So, if it's ok with everyone, I would write a small addition of a sentence or two offering this as a work-around for people having problems pulling upstream changes from behind a proxy.

Before I do so, perhaps it would be prudent to ask if there are any problems about using https:// rather than git:// for pulling.

Cheers,
-brandon

From eric.talevich at gmail.com Fri Jul 1 15:47:09 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 1 Jul 2011 11:47:09 -0400
Subject: [Biopython-dev] "Developing on Github" wiki amendment
In-Reply-To: <4E0DA379.4030902@gmail.com>
References: <4E0DA379.4030902@gmail.com>
Message-ID:

On Fri, Jul 1, 2011 at 6:37 AM, Brandon Invergo wrote:

> Hi everyone,
>
> Based on my own experiences working with Github, I suggest a minor addition
> to the wiki tutorial on using it. I'm working behind an HTTP proxy at my
> university, which doesn't pose big problems with cloning or pushing, which
> use the git http_proxy setting without problem. However, git-pull for
> whatever reason doesn't seem to use that setting but instead relies on the
> GitProxy setting. I haven't been able to get this to play nicely, so I had
> trouble pulling upstream changes to my repository for a while. In the end,
> the easiest solution was to just change my upstream master to
> https://github.com/biopython/biopython.git rather than git:// so that the
> git https_proxy is used.
>
> The only problem is that I couldn't find a git command to change the
> upstream master (didn't search very deeply, admittedly), so I did it by
> manually editing the .git/config file in my repository. Does anyone know if
> there is one?
>

I think the "git remote" command can do what you need, but editing .git/config is fine too.
The config file is meant to be useful to power-users.

> So, if it's ok with everyone, I would write a small addition of a sentence
> or two offering this as a work-around for people having problems pulling
> upstream changes from behind a proxy.
>

You mean the http://biopython.org/wiki/GitUsage ? Sure, go right ahead, it's a wiki :)

> Before I do so, perhaps it would be prudent to ask if there are any
> problems about using https:// rather than git:// for pulling.
>
>

If you're a committer, then the public https:// option won't let you push changes back upstream (from the command line); the repo is read-only. For pulling, it should be perfectly fine.

Cheers,
Eric

From eric.talevich at gmail.com Fri Jul 1 16:03:40 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 1 Jul 2011 12:03:40 -0400
Subject: [Biopython-dev] pypaml
In-Reply-To: <4E0DA146.7040703@gmail.com>
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> <4E0DA146.7040703@gmail.com>
Message-ID:

I think the style of Bio.Applications should be the guide here. Check out these examples:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc74

To make this work, you'll need to accept **kwargs somewhere and use the kwargs dictionary (with some validation) for the options dict. Validation can happen either before setting the options dict, or just before writing out the config file -- I think Bio.Applications checks the args just before running the application.

A complication is that you have two sets of options -- for generating the config file, and for running the program on the command line. You could require these to be set separately, to avoid some confusion in the code, and still use **kwargs for each.
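The idea in the two paragraphs above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `OptionSet`, the method `get_option`, and the run-time option names `working_dir` and `verbose` are made up here and are not part of Bio.Phylo.PAML or Bio.Applications. Validation is done before the options dict is touched, which is one of the two choices mentioned.

```python
class OptionSet:
    """A fixed set of named options, updated through keyword arguments.

    Hypothetical illustration only; not an existing Biopython class.
    """

    def __init__(self, **defaults):
        # The keys given here define which option names are valid.
        self._options = dict(defaults)

    def set_options(self, **kwargs):
        # Validate every name first, so a bad call changes nothing.
        unknown = sorted(set(kwargs) - set(self._options))
        if unknown:
            raise KeyError("Unknown option(s): " + ", ".join(unknown))
        self._options.update(kwargs)

    def get_option(self, name):
        return self._options[name]


# Two separate sets: one for the control file, one for running the program.
ctl_options = OptionSet(seqtype=None, model=None, NSsites=None, kappa=None)
run_options = OptionSet(working_dir=None, verbose=False)

ctl_options.set_options(seqtype=1, model=0, NSsites=[0, 1, 2], kappa=4.54006)
run_options.set_options(verbose=True)
```

Keeping the two sets in separate objects means a control-file name passed to the run-time set (or the reverse) is caught as an unknown option rather than silently stored.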
Also worth considering: (i) Use Bio.Phylo.PAML to generate the config file (ii) Under the hood, call out to Bio.Phylo.Applications._PamlCommandline to actually run the program with a given config file (already written) (iii) Use Bio.Phylo.PAML again to parse the output. The API of Bio.Phylo.PAML would stay basically the same, but this could help keep the options separate and maybe decouple the three phases of running the programs. (I can help with this over the weekend if you'd like.) Cheers, Eric On Fri, Jul 1, 2011 at 6:28 AM, Brandon Invergo wrote: > Since the options are stored in a dict() object keyed by strings, I think > it would be easiest to have the arguments passed as ("option", value) tuples > or to have a dict() passed in with "option":value pairs. > > Alternatively, the set_option & get_option methods can be dropped > altogether and we can just let the user have full access to the options > dict. The write_ctl_file method can be used then to check the validity of > each option, to prevent erroneous option names from being used. Perhaps > that's better, despite being slightly more work to implement. > > What do you think? > > > Cheers, > Brandon > > On 07/01/2011 04:17 AM, Eric Talevich wrote: > > Hi Brandon, > > Looks good, thanks! It's just enough to get the point across, and the wiki > is a fine place for extended examples. > > Reading this, I notice that the cml.set_option(key, value) gets kind of > tedious when a lot of options need to be set. It would be nice to be able to > set them all in one go, as keyword arguments: > > cml.set_options( > seqtype=1, > verbose=0, > noisy=0, > RateAncestor=0, > model=0, > NSsites=[0, 1, 2], > CodonFreq=2, > cleandata=1, > fix_alpha=1, > kappa=4.54006, > ) > > What do you think? Worth implementing? > > Cheers, > Eric > > > On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo wrote: > >> Well, it's not much, but how's this? 
The end user documentation
>> >>> > seems to be the main thing at this point: describing how someone can
>> >>> > pick up and get started with the code.
>> >>> >
>> >>> > Thanks again for all the work,
>> >>> > Brad
>> >>> > _______________________________________________
>> >>> > Biopython-dev mailing list
>> >>> > Biopython-dev at lists.open-bio.org
>> >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
>> >>> >
>> >>> _______________________________________________
>> >>> Biopython-dev mailing list
>> >>> Biopython-dev at lists.open-bio.org
>> >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>> >>
>> >>
>> >
>> >
>
>

From b.invergo at gmail.com Fri Jul 1 16:20:30 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 01 Jul 2011 18:20:30 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak>
Message-ID: <4E0DF3CE.7020504@gmail.com>

Hi Eric,

You're right, I had the functionality of kwargs mixed up in my head (I've rarely used it) and I forgot that it's passed in as a dict. In that case, it's relatively straight-forward to do. Something like this:

    def set_options(self, **kwargs):
        for option, value in kwargs.items():
            if option in self._options:
                self._options[option] = value
            # else:
            #     Raise exception

Not sure if raising an exception would really be necessary here. (ps - I haven't tested that code, I just typed it up quickly now)

Regarding the splitting of functionality, to an extent it makes sense but I wonder if it's worth it, since the PAML commandline programs only take a single argument, the path to the control file. However, if the main advantages lie in code readability and standardization with the rest of the applications framework, then I think it's ok.
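To make the "single argument" point concrete, here is a rough sketch of how the two phases could stay separate: one function renders an options dict as control-file text, another builds the argument list for the program. Both function names are hypothetical, and the "name = value" layout with space-separated lists is an assumption about the control file format made for this example, not something taken from the code under discussion.

```python
def format_ctl(options):
    """Render an options dict as 'name = value' lines.

    Options set to None are skipped; list values are space-separated.
    Hypothetical helper for illustration only.
    """
    lines = []
    for name, value in options.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = " ".join(str(item) for item in value)
        lines.append("%s = %s" % (name, value))
    return "\n".join(lines) + "\n"


def build_command(program, ctl_path):
    """Return the argument list: the program name plus the control file path."""
    return [program, ctl_path]
```

With this split, everything that varies between analyses lives in the control file text, and the command line itself stays a fixed two-element list that could be handed to whatever runs the program.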
Unfortunately I'll be unavailable all weekend (starting in about 3 minutes) but I should be free on Monday to work on it. Cheers, Brandon On Fri 01 Jul 2011 04:17:26 AM CEST, Eric Talevich wrote: > Hi Brandon, > > Looks good, thanks! It's just enough to get the point across, and the > wiki is a fine place for extended examples. > > Reading this, I notice that the cml.set_option(key, value) gets kind of > tedious when a lot of options need to be set. It would be nice to be > able to set them all in one go, as keyword arguments: > > cml.set_options( > seqtype=1, > verbose=0, > noisy=0, > RateAncestor=0, > model=0, > NSsites=[0, 1, 2], > CodonFreq=2, > cleandata=1, > fix_alpha=1, > kappa=4.54006, > ) > > What do you think? Worth implementing? > > Cheers, > Eric > > > On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo > wrote: > > Well, it's not much, but how's this? > https://github.com/brandoninvergo/biopython/tree/doc-branch > Do you want me to go more into detail about the options available like > in the wikior is this sufficient as a tutorial? Just let me know... > > Cheers, > Brandon > > On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo > > wrote: > > Hi Eric, > > No problem, I'll start writing something up now. > > Cheers, > > -brandon > > > > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich > > wrote: > >> Hi Brandon, > >> > >> I just added a stub for Bio.Phylo.PAML to the main Tutorial: > >> > https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8 > >> > >> Do you think you could add some more to that section, maybe > pulling a chunk > >> of content from the wiki page you just wrote? If you're not > comfortable with > >> LaTeX you can just point me to some text and I'll add it. 
> >>> > > >>> > Thanks again for all the work, > >>> > Brad > >>> > _______________________________________________ > >>> > Biopython-dev mailing list > >>> > Biopython-dev at lists.open-bio.org > > >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev > >>> > > >>> _______________________________________________ > >>> Biopython-dev mailing list > >>> Biopython-dev at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev > >> > >> > > From b.invergo at gmail.com Fri Jul 1 16:28:09 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Fri, 01 Jul 2011 18:28:09 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: <4E0DF3CE.7020504@gmail.com> References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> <4E0DF3CE.7020504@gmail.com> Message-ID: <4E0DF599.8080703@gmail.com> Hi Eric, I lied. I took a moment to at least implement the kwargs change: https://github.com/brandoninvergo/biopython/commit/533b06476899b631ec28a6e4cc97a2e669a45ea0 It seems to work swimmingly but I haven't tested it extensively yet. There was already exception-handling in place. Ok, *now* I'm off for the weekend! Cheers, Brandon On Fri 01 Jul 2011 06:20:30 PM CEST, Brandon Invergo wrote: > Hi Eric, > You're right, I had the functionality of kwargs mixed up in my head > (I've rarely used it) and I forgot that it's passed in as a dict. In > that case, it's relatively straight-forward to do. Something like this: > > def set_options(self, **kwargs): > for option, value in kwargs.items(): > if option in self._options: > self._options[option] = value > # else: > # Raise exception > > Not sure if raising an exception would really be necessary here. 
(ps - I > haven't tested that code, I just typed it up quickly now) > > Regarding the splitting of functionality, to an extent it makes sense > but I wonder if it's worth it, since the PAML commandline programs only > take a single argument, the path to the control file. However, if the > main advantages lie in code readability and standardization with the > rest of the applications framework, then I think it's ok. > > Unfortunately I'll be unavailable all weekend (starting in about 3 > minutes) but I should be free on Monday to work on it. > > Cheers, Brandon > > On Fri 01 Jul 2011 04:17:26 AM CEST, Eric Talevich wrote: >> Hi Brandon, >> >> Looks good, thanks! It's just enough to get the point across, and the >> wiki is a fine place for extended examples. >> >> Reading this, I notice that the cml.set_option(key, value) gets kind >> of tedious when a lot of options need to be set. It would be nice to >> be able to set them all in one go, as keyword arguments: >> >> cml.set_options( >> seqtype=1, >> verbose=0, >> noisy=0, >> RateAncestor=0, >> model=0, >> NSsites=[0, 1, 2], >> CodonFreq=2, >> cleandata=1, >> fix_alpha=1, >> kappa=4.54006, >> ) >> >> What do you think? Worth implementing? >> >> Cheers, >> Eric >> >> >> On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo > > wrote: >> >> Well, it's not much, but how's this? >> https://github.com/brandoninvergo/biopython/tree/doc-branch >> Do you want me to go more into detail about the options available like >> in the wikior is this sufficient as a tutorial? Just let me know... >> >> Cheers, >> Brandon >> >> On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo >> > wrote: >> > Hi Eric, >> > No problem, I'll start writing something up now. 
>> >>> >
>> >>> > Thanks again for all the work,
>> >>> > Brad
>> >>> > _______________________________________________
>> >>> > Biopython-dev mailing list
>> >>> > Biopython-dev at lists.open-bio.org
>> >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
>> >>> >
>> >>> _______________________________________________
>> >>> Biopython-dev mailing list
>> >>> Biopython-dev at lists.open-bio.org
>> >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>> >>
>> >>
>> >

From redmine at redmine.open-bio.org Fri Jul 1 23:29:27 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 1 Jul 2011 23:29:27 +0000
Subject: [Biopython-dev] [Biopython - Feature #3217] (In Progress) Bio.Phylo I/O support for the NeXML format
References:
Message-ID:

Issue #3217 has been updated by Eric Talevich.

Status changed from New to In Progress
Assignee changed from Eric Talevich to Biopython Dev Mailing List
% Done changed from 0 to 20

Jaime Huerta-Cepas pointed me to a strategy he's using to support both phyloXML and NeXML in ETE. A separate program called generateDS.py generates parsers automatically from the XSD files defining the specs. Here's the code:
https://github.com/jhcepas/phylogenetic-XML-python-parsers

I suggest:

1. Copying nexml.py into the Biopython source tree as Bio/Phylo/_nexml_gds.py

2. Writing something basic to convert the essential tree elements into compatible Bio.Phylo object types. Call that NexmlIO, for now? Also write unit tests.

3. As time permits, write more converters to make _nexml_gds.py objects compatible with existing Biopython types. This could include character matrices for AlignIO, and more tree annotations for Phylo.

When generateDS.py is updated we'll just copy the newly generated nexml.py into _nexml_gds.py manually -- hopefully this won't require many changes in the converters each time.

Timeline: After the 1.58 release.
---------------------------------------- Feature #3217: Bio.Phylo I/O support for the NeXML format https://redmine.open-bio.org/issues/3217 Author: Eric Talevich Status: In Progress Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: Not Applicable URL: The future data exchange standard is... approaching rapidly. NeXML is going to become the format of choice for TreeBASE, Mesquite and probably MIAPA-targeted tools over the next year or two, and Biopython should be there to support it. Notes: * Another Python library, DendroPy, already supports (some of?) the NeXML format. Jeet Sukumaran and Mark Holder changed the license to BSD to allow other projects -- particularly us -- to share their code. So let's start there. * NeXML was designed so its elements can be treated as RDF triples, so see if RDFLib can help -- either as the underlying parser, or to provide some additional (optional) functionality. See: http://nexml.org/ http://packages.python.org/DendroPy/ http://www.rdflib.net/ -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Sat Jul 2 01:40:48 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sat, 2 Jul 2011 01:40:48 +0000 Subject: [Biopython-dev] [Biopython - Feature #3260] (New) Draw a Bio.Phylo tree as a phylogram Message-ID: Issue #3260 has been reported by Eric Talevich. ---------------------------------------- Feature #3260: Draw a Bio.Phylo tree as a phylogram https://redmine.open-bio.org/issues/3260 Author: Eric Talevich Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Bio.Phylo should be able to draw a decent tree. 
This means a standard phylogram, with accurate branch lengths, labels for taxa and any internal nodes that have them, and support values -- like Phylip's "drawtree" does.

The two existing functions don't quite suffice: draw_graphviz ignores branch lengths, though it's nice for unrooted trees; draw_ascii looks more like a typical phylogram, but it's ascii art and can't display internal node labels or support values (or large trees).

I wrote a function to do this, based on the algorithm in draw_ascii. It uses matplotlib for display. I tested it on all the trees under Tests/Nexus and Tests/PhyloXML; it's nice.

----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin

--
You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org

From updates at feedmyinbox.com Sat Jul 2 11:56:58 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Sat, 2 Jul 2011 07:56:58 -0400
Subject: [Biopython-dev] 7/2 newest questions tagged biopython - Stack Overflow
Message-ID:

// biopython import in Enthought Python Distribution?
// July 1, 2011 at 2:30 PM
http://stackoverflow.com/questions/6551862/biopython-import-in-enthought-python-distribution

I am using the Enthought Python Distribution v7.0-2 (32-bit) and I am having trouble importing biopython. Does anyone know how to import biopython in EPD? I can import other libraries like numpy, matplotlib, etc. with no problem, but import biopython is not recognized. What's going on? Thanks in advance for the help.

--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest
Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email
Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

--
This email was carefully delivered by FeedMyInbox.com.
PO Box 682532 Franklin, TN 37068

From updates at feedmyinbox.com Sun Jul 3 23:14:46 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Sun, 3 Jul 2011 19:14:46 -0400
Subject: [Biopython-dev] 7/3 newest questions tagged biopython - Stack Overflow
Message-ID:

// Display line starting with word in a file Python regex
// July 2, 2011 at 6:41 AM
http://stackoverflow.com/questions/6556514/display-line-starting-with-word-in-a-file-python-regex

I have a file "abc.txt" with the following contents..

EMBOSS_001 601 FEDSESRRDSLFVPHRPGERRNSNGTTTETEVRKRRLSSYQISMEMLEDS 650
:...::.||...||....|..|.|.... |..:.|.|.|..:
EMBOSS_002 1 -----NPSLTVTVPIAVGESDFENLNTEEFSSE----SELEESKEKLNAT 41

EMBOSS_001 651 SGRQRS-MSIASILTNTMEELE-ESRQKCPPCW-------YRFANVFLIW 691
|..:.| :.:|........|:| |...|...|: :.|..|....
EMBOSS_002 42 SSSEGSTVDVAPPREGEQAEIEPEEDLKPEACFTEGCIKKFPFCQVSTEE 91

I want to create three strings.. the first string "a" should have all characters thats written after EMBOSS_001 (of both the lines) ie

A="FEDSESRRDSLFVPHRPGERRNSNGTTTETEVRKRRLSSYQISMEMLEDSSGRQRS-MSIASILTNTMEELE-ESRQKCPPCW-------YRFANVFLIW"

Second string should have everything written after EMBOSS_002 (of both the lines) minus numbers ie

B="-----NPSLTVTVPIAVGESDFENLNTEEFSSE----SELEESKEKLNATSSSEGSTVDVAPPREGEQAEIEPEEDLKPEACFTEGCIKKFPFCQVSTEE"

and the third string C should be whatever is between EMBOSS_1 and EMBOSS_2 (alphanumeric characters or -) in both the lines

C=" :...::.||...||....|..|.|.... |..:.|.|.|..|..:.| :.:|........|:| |...|...|: :.|..|...."

The original spaces at start, end(if any) and at the middle of C should be intact. In this case 5 spaces are at the start since C starts from "F" of A and "-" of B

Thanks

--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest
Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email
Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

--
This email was carefully delivered by FeedMyInbox.com.
PO Box 682532 Franklin, TN 37068

From updates at feedmyinbox.com Tue Jul 5 10:56:46 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Tue, 5 Jul 2011 06:56:46 -0400
Subject: [Biopython-dev] 7/5 biopython Questions - BioStar
Message-ID: <46e62975e9d52606c34141cdfe578caf@74.63.51.88>

// GenBank to Fasta failing with CONTIG fields
// July 5, 2011 at 6:31 AM
http://biostar.stackexchange.com/questions/9892/genbank-to-fasta-failing-with-contig-fields

I used to generate FASTA out of my GenBank source files using a simple conversion script:

    #!/usr/bin/env python
    import sys, signal
    from Bio import SeqIO

    def wrap( text, width=80 ):
        for i in xrange( 0, len( text ), width ):
            yield text[i:i+width]

    if __name__ == "__main__":
        status = progress()
        for record in SeqIO.parse( sys.stdin, "genbank"):
            try:
                gi = record.annotations["gi"]
            except KeyError:
                gi = None
            accession = record.id
            desc = record.description
            seq = record.seq
            locus = record.name
            print ">gi|%s|emb|%s|%s| %s" % (gi, accession, locus, desc)
            for block in wrap( seq ):
                print block

When I changed the sequence files to newer versions some of the resulting FASTA file sequences were just filled with Ns. After closer inspection of the GenBank source files, it turns out that they have replaced the ORIGIN block

    ORIGIN
    sequence...

with a CONTIG block, something like

    CONTIG join(BX640437.1:1..347356,BX640438.1:51..347786,...)

Is there a way to resolve this using BioPython? I was working with BioPython 1.52 and 1.57 (latest). Thanks for your suggestions.

// Parsing BLAST output BioPython Error
// July 5, 2011 at 2:25 AM
http://biostar.stackexchange.com/questions/9882/parsing-blast-output-biopython-error

Hi, I have the following code

    def runBLAST(self):
        print "Running BLAST .........."
        cmd=subprocess.Popen("blastp -db nr -query repeat.txt -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 5",shell=True)
        cmd.communicate()[0]
        f1=open("out.faa")
        blast_records = NCBIXML.parse(f1)
        save_file = open("my_fasta_seq.fasta", 'w')
        for blast_record in blast_records[:10]:
            for alignment in blast_record.alignments:
                for hsp in alignment.hsps:
                    save_file.write('>%s\n' % (alignment.hseq,))
        save_file.close()
        f1.close()
        f2=open("my_fasta_seq.fasta")
        for record in SeqIO.parse(f2,"fasta"):
            f=open("tempBLAST1.txt","w")
            f.write(">"+"\n"+str(record.name)+"\n"+str(record.seq)+"\n")
            f.close()

I get the error on TypeError: for blast_record in blast_records[:10]: saying 'generator' object is not subscriptable. I am looking to get top 10 blast hits (sequences)

// Getting top 10 sequences of BLAST results Bio Python
// July 5, 2011 at 12:29 AM
http://biostar.stackexchange.com/questions/9880/getting-top-10-sequences-of-blast-results-bio-python

Hi, I want to get top 10 sequences of BLAST results (just the sequences, no alignment or score or e-value etc). I am inputting a text file containing 5 fasta file. So my output should be top 10 blast hits of each fasta file.. therefore my output file will have 50 sequences. I am reading each of my input fasta file through Bio.SeqIO, writing it as temp.faa and then passing it to command line BLAST through subprocess as

    blastp -db nr -query temp.faa -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 2

the output has lots of other information. Should I parse this output now or there's a better way.
Thanks P.S XML might be the way, but I didn't find a relavant NCBIXML parser syntax -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Tue Jul 5 10:56:52 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Tue, 5 Jul 2011 06:56:52 -0400 Subject: [Biopython-dev] 7/5 newest questions tagged biopython - Stack Overflow Message-ID: <11de02f7858811397f05ce54a2078248@74.63.51.88> // Getting top 10 sequences of BLAST results Bio Python // July 5, 2011 at 12:31 AM http://stackoverflow.com/questions/6577975/getting-top-10-sequences-of-blast-results-bio-python I want to get top 10 sequences of BLAST results (just the sequences, no alignment or score or e-value etc). I am inputting a text file containing 5 fasta file. So my output should be top 10 blast hits of each fasta file.. therefore my output file will have 50 sequences. I am reading each of my input fasta file through Bio.SeqIO, writing it as temp.faa and then passing it to command line BLAST through subprocess as blastp -db nr -query temp.faa -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 2 the output has lots of other information. Should I parse this output now or there's a better way. Thanks P.S XML might be a way but I didn't find a relavant NCBIXML parser syntax. 
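
[Editorial sketch, not part of the original digest.] The TypeError quoted in the BioStar question above comes from slicing a generator, and the "top 10 hits per query" goal can be met by iterating over every record and limiting the hits inside each one. The following is a minimal sketch under stated assumptions: the attribute names alignments, title, hsps and sbjct are the ones exposed by records from Bio.Blast.NCBIXML.parse, while top_hit_sequences and fake_records are hypothetical names introduced here. Stand-in objects replace real BLAST output so the example runs without Biopython installed.

```python
from itertools import islice
from types import SimpleNamespace


def top_hit_sequences(blast_records, n=10):
    """Yield (title, subject_sequence) for the first n hits of each query.

    blast_records may be a generator (it is only iterated, never sliced).
    Only the aligned part of each subject is available from an HSP, so the
    gap characters are stripped and the result is not the full database
    sequence.
    """
    for record in blast_records:
        # islice also works on plain lists, so no [:n] slicing is needed.
        for alignment in islice(record.alignments, n):
            hsp = alignment.hsps[0]  # first HSP reported for this hit
            yield alignment.title, hsp.sbjct.replace("-", "")


def fake_records(queries=2, hits=12):
    """Hypothetical stand-in for NCBIXML.parse(handle): yields record-like objects."""
    for q in range(queries):
        alignments = [
            SimpleNamespace(title="hit%d_%d" % (q, i),
                            hsps=[SimpleNamespace(sbjct="AC-GT")])
            for i in range(hits)
        ]
        yield SimpleNamespace(alignments=alignments)


results = list(top_hit_sequences(fake_records(), n=10))
```

With real data the generator passed in would be the result of NCBIXML.parse on XML output (-outfmt 5), and each (title, sequence) pair could be written out as a FASTA entry.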
-- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From p.j.a.cock at googlemail.com Tue Jul 5 16:10:14 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Jul 2011 17:10:14 +0100 Subject: [Biopython-dev] Fwd: SeqXML an alternative for FASTA In-Reply-To: <9A1DEE28-F2AD-40A8-998B-538137226584@scilifelab.se> References: <9A1DEE28-F2AD-40A8-998B-538137226584@scilifelab.se> Message-ID: Hi all, I've been in touch with Thomas Schmitt about merging read/write support for the SeqXML file format (see below and http://seqxml.org/ ) into Biopython's SeqIO module. BioPerl already supports this (under format name "seqxml") and a BioJava v3 implementation is in progress. We're discussing this and the format itself on the cross project OBF mailing list (see below), http://lists.open-bio.org/pipermail/open-bio-l/2011-July/000805.html Please feel free to join that list if you want to discuss anything general, or comment here on the Biopython implementation. I've got a branch which seems nearly ready for merging on github, https://github.com/peterjc/biopython/commits/seqxml2 a rebase of https://github.com/peterjc/biopython/commits/seqxml Regards, Peter ---------- Forwarded message ---------- From: Thomas Schmitt Date: Fri, Jul 1, 2011 at 8:57 AM Subject: [Open-bio-l] SeqXML an alternative for FASTA To: open-bio-l at lists.open-bio.org Hello everybody, We recently published a new XML format called SeqXML to store biological sequences. 
Our aim was to create a lightweight alternative to FASTA that allows storing, in a standardized way, the metadata that is typically squeezed into a FASTA header. It looks something like this:

[XML example not preserved in the archive; the markup was stripped, leaving only the text "dystroglycan 1", "AAGGCGAUGUC.....ACAU" and "AAGGCGAAA...CACJOXA"]

Check out the paper at http://bib.oxfordjournals.org/content/early/2011/06/10/bib.bbr025.full?keytype=ref&ijkey=dWzLPFBuzrdZme8

There is also a website (http://seqxml.org) where you can find the schema and detailed documentation. The whole thing emerged from developing formats for the orthology community, so you will also find information about our orthology format OrthoXML at these resources.

To my knowledge the only format comparable to SeqXML is TinySeq, which does have some significant limitations:

- It doesn't support database cross referencing
- The identifiers are more NCBI specific
- It is more verbose
- There is only a very primitive DTD
- It doesn't allow validating the sequence alphabet
- It isn't possible to define the source of the sequences
- It doesn't support key value pair annotations

We are trying to get IO implementations for SeqXML for all Bio* projects. There is already an implementation in BioPerl maintained by Dave Messina. We do have an implementation for the legacy version of BioJava, and Andrew Yates promised to help us migrate it into BioJava 3. I'm also in contact with Peter Cock about a Biopython integration. He in fact asked me to move the discussion to this list.

What do you guys think about the format? Is there anybody who wants to contribute a BioRuby implementation?
Best regards, Thomas _______________________________________________ Open-Bio-l mailing list Open-Bio-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/open-bio-l From updates at feedmyinbox.com Wed Jul 6 11:04:23 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 6 Jul 2011 07:04:23 -0400 Subject: [Biopython-dev] 7/6 biopython Questions - BioStar Message-ID: <47ab7fabfee60bc6430c75c127531996@74.63.51.88> // Biopython not working on Window 7 64 (import bio function not working) // July 6, 2011 at 1:56 AM http://biostar.stackexchange.com/questions/9947/biopython-not-working-on-window-7-64-import-bio-function-not-working Dear All I am having trouble using biopython as my 'import bio' does not work. I have Window 7 , 64 system with Python 2.7.1 with Piston , Django and NumPy site packages installed and they all work well with the import function. Any ideas? thanks! cheers Gary -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Wed Jul 6 11:04:40 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 6 Jul 2011 07:04:40 -0400 Subject: [Biopython-dev] 7/6 newest questions tagged biopython - Stack Overflow Message-ID: <467a099f2f1f03ff9fe38323f1bc5d33@74.63.51.88> // Biopython not working on Window 7 64 (import bio function not working) // July 6, 2011 at 1:55 AM http://stackoverflow.com/questions/6592127/biopython-not-working-on-window-7-64-import-bio-function-not-working I am having trouble using biopython as my 'import bio' does not work. 
I have Window 7 , 64 system with Python 2.7.1 with Piston , Django and NumPy site packages installed and they all work well with the import function. Any ideas? thanks! cheers Gary -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From w.arindrarto at gmail.com Thu Jul 7 01:16:13 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 7 Jul 2011 03:16:13 +0200 Subject: [Biopython-dev] SeqIO Abi Parser Message-ID: Hi everyone, This is my first post in the dev mailing list, so greetings :). I've been using Biopython for a few months in total now (in a period of ~1.5 years) and before that Python for ~0.5 years. Most of the time, I'm working with Sanger sequencing results and at one point I was a bit disappointed that I couldn't find any (bio)python module for reading .ab1 files. That compelled me to write my first python module that reads those files and extracts the useful information out of them. In the process I became more interested in python itself and finally thought it might be neat if biopython could do this, built-in. So I forked the main repo, made some changes to my module so it became a parser for the SeqIO submodule that reads Abi files. It's not cooked 100% yet, but if anyone is interested in seeing/commenting/criticizing the code, I'd appreciate that very much. 
Here's the link: https://github.com/bow/biopython/blob/seqio-abif/Bio/SeqIO/AbiIO.py Some features to note: - I've included a method to trim the sequence based on its quality scores - the parser does not extract the entire metadata of the trace files, only ones I consider important for further analysis/annotations. Of course, this could be changed if the community think some other data should be included/excluded - For those of you already familiar with the Abi format, I deliberately chose the 'PBAS2' tag for the sequence information, which is the unedited bases after base-calling by the sequencing program. Some things that I'm doing right now: - writing unit tests - making sure it's compatible with Python 3 (thanks Peter :)! ) - completing the docs - making sure it's compatible with most Abi format versions. Currently I've only tested it with files from the 310, 3100, and 3700 machines. Does anyone have some other versions that I can test this with? As I understand as well, this is not the only Sanger sequencing trace format out there (e.g. SCF is another). I would be glad to learn more and write a parser for the SCF format as well. The problem is, I'm not sure this would be useful in the long run as I've personally never seen anyone use an SCF file and so I've never had a chance to play around with one. If anyone has an SCF file lying around and thinks SCF support would be beneficial, I'd be happy to accept them :). I guess that's all for now. Thanks for reading! 
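
[Editorial sketch, not part of the original message.] The quality-based trimming mentioned in the feature list above could look roughly like the following. This is only an illustration of the general idea, assuming a simple per-base threshold applied from both ends; it is not the method implemented in the linked AbiIO.py, and the name trim_by_quality and the default cutoff of 20 are hypothetical.

```python
def trim_by_quality(seq, quals, min_q=20):
    """Return seq with low-quality bases removed from both ends.

    seq   -- the base calls as a string
    quals -- one integer quality score per base, same length as seq
    min_q -- bases scoring below this are trimmed from the ends
             (hypothetical default)
    Low-quality bases in the middle of the read are left untouched.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    start = 0
    while start < len(quals) and quals[start] < min_q:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


trimmed = trim_by_quality("ACGTACGT", [5, 10, 30, 40, 35, 30, 12, 8])
all_bad = trim_by_quality("ACGT", [1, 2, 3, 4])
```

A read whose scores are all below the cutoff comes back as an empty string, so callers may want to check for that before further analysis.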
---
Wibowo Arindrarto (bow)
http://bow.web.id

From genivaldo.gueiros at gmail.com Fri Jul 8 01:18:30 2011
From: genivaldo.gueiros at gmail.com (Genivaldo Gueiros)
Date: Thu, 7 Jul 2011 18:18:30 -0700
Subject: [Biopython-dev] Contributing - description of my code [Sequence_cleaner]
Message-ID:

Hey guys, I'd like to make a contribution to the Biopython community. Well, what I wanna share is my script using python to clean sequences up. You should know analyzing poor data takes CPU time and interpreting the results from poor data takes people time, so it is always important to do some preprocessing.

Let me call my script "Sequence_cleaner". The big idea is to remove duplicate sequences, remove sequences that are too short (the user defines the minimum length) and remove sequences which have too many unknown nucleotides (N) (the user defines the % of N allowed), and in the end the user can choose if he/she wants to have a file as output or print the result.

Let me know if you are interested

--
Cheers, Geni

From p.j.a.cock at googlemail.com Fri Jul 8 12:55:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 8 Jul 2011 13:55:32 +0100
Subject: [Biopython-dev] Contributing - description of my code [Sequence_cleaner]
In-Reply-To:
References:
Message-ID:

On Fri, Jul 8, 2011 at 2:18 AM, Genivaldo Gueiros wrote:
> Hey guys, I'd like to make a contribution to the Biopython community. Well,
> what I wanna share is my script using python to clean sequences up. You should
> know analyzing poor data takes CPU time and interpreting the results from
> poor data takes people time, so it is always important to do some preprocessing.
>
> Let me call my script "Sequence_cleaner". The big idea is to remove
> duplicate sequences, remove sequences that are too short (the user defines the
> minimum length) and remove sequences which have too many unknown nucleotides
> (N) (the user defines the % of N allowed), and in the end the user can choose if
> he/she wants to have a file as output or print the result.
>
> Let me know if you are interested
>

Hi Genivaldo,

This sounds like something you could add to the list here,
http://biopython.org/wiki/Scriptcentral

Or it might make a nice Cookbook example:
http://biopython.org/wiki/Category:Cookbook

Peter

From genivaldo.gueiros at gmail.com Fri Jul 8 16:00:03 2011
From: genivaldo.gueiros at gmail.com (Genivaldo Gueiros)
Date: Fri, 8 Jul 2011 09:00:03 -0700
Subject: [Biopython-dev] Contributing - description of my code [Sequence_cleaner]
In-Reply-To:
References:
Message-ID:

Gotcha, I'm gonna read those and probably add it. Thx

2011/7/8 Peter Cock
> On Fri, Jul 8, 2011 at 2:18 AM, Genivaldo Gueiros
> wrote:
> > Hey guys, I'd like to make a contribution to the Biopython community. Well,
> > what I wanna share is my script using python to clean sequences up. You
> > should know analyzing poor data takes CPU time and interpreting the results
> > from poor data takes people time, so it is always important to do some
> > preprocessing.
> >
> > Let me call my script "Sequence_cleaner". The big idea is to remove
> > duplicate sequences, remove sequences that are too short (the user defines
> > the minimum length) and remove sequences which have too many unknown
> > nucleotides (N) (the user defines the % of N allowed), and in the end the
> > user can choose if he/she wants to have a file as output or print the result.
> >
> > Let me know if you are interested
> >
>
> Hi Genivaldo,
>
> This sounds like something you could add to the list here,
> http://biopython.org/wiki/Scriptcentral
>
> Or it might make a nice Cookbook example:
> http://biopython.org/wiki/Category:Cookbook
>
> Peter
>

--
Cheers, Geni

From tim.te.beek at nbic.nl Mon Jul 11 08:34:16 2011
From: tim.te.beek at nbic.nl (Tim te Beek)
Date: Mon, 11 Jul 2011 10:34:16 +0200
Subject: [Biopython-dev] Bio.GenBank.LocationParser chokes on misc_feature in Desulfurococcus kamchatkensis 1221n/NC_011766.gbk
In-Reply-To:
References:
Message-ID:

When parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk using SeqIO.read(genbank_file, 'genbank') I get the following stacktrace:

...
    gbk_records = (SeqIO.read(genbank_file, 'genbank') for genbank_file in genbank_files)
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py", line 604, in read
    first = iterator.next()
  File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py", line 532, in parse
    for r in i:
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py", line 440, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py", line 423, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py", line 395, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py", line 347, in _feed_feature_table
    consumer.location(location_string)
  File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: order(1078481..1078483,join(1078778,1078800..1078810))

The offending feature is:

     misc_feature    complement(order(1078481..1078483,join(1078778,
                     1078800..1078810)))
                     /locus_tag="DKAM_1147"
                     /note="active site"
                     /db_xref="CDD:73252"

Could you look into whether this is a bug in the parser or in the input file?

From tim.te.beek at nbic.nl Mon Jul 11 08:46:54 2011
From: tim.te.beek at nbic.nl (Tim te Beek)
Date: Mon, 11 Jul 2011 10:46:54 +0200
Subject: [Biopython-dev] Bio.GenBank.LocationParser chokes on misc_feature in Desulfurococcus kamchatkensis 1221n/NC_011766.gbk
In-Reply-To:
References:
Message-ID:

The same happens when parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Saccharopolyspora_erythraea_NRRL_2338_uid62947/NC_009142.gbk, offending features:

     misc_feature    order(2409324..2409326,2409399..2409401,2409528..2409533,
                     2409619..2409624,2409679..2409681,2409748..2409753,
                     2409754..2409759,2409835..2409837,join(2409886..2409890,
                     2409892..2409898),2409911..2409913,2409920..2409925)
                     /locus_tag="SACE_2218"
                     /note="active site"
                     /db_xref="CDD:119408"
     misc_feature    order(2409324..2409326,2409399..2409401,2409528..2409530)
                     /locus_tag="SACE_2218"
                     /note="catalytic tetrad; other site"
                     /db_xref="CDD:119408"

could have something to do with the order() instruction, but I'm not sure.

On Mon, Jul 11, 2011 at 10:34, Tim te Beek wrote:
> When parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk
> using SeqIO.read(genbank_file, 'genbank') I get the following
> stacktrace:
>
> ...
>     gbk_records = (SeqIO.read(genbank_file, 'genbank') for
> genbank_file in genbank_files)
>   File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py",
> line 604, in read
>     first = iterator.next()
>   File "/usr/local/lib/python2.6/dist-packages/Bio/SeqIO/__init__.py",
> line 532, in parse
>     for r in i:
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py",
> line 440, in parse_records
>     record = self.parse(handle, do_features)
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py",
> line 423, in parse
>     if self.feed(handle, consumer, do_features):
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py",
> line 395, in feed
>     self._feed_feature_table(consumer, self.parse_features(skip=False))
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/Scanner.py",
> line 347, in _feed_feature_table
>     consumer.location(location_string)
>   File "/usr/local/lib/python2.6/dist-packages/Bio/GenBank/__init__.py",
> line 975, in location
>     raise LocationParserError(location_line)
> Bio.GenBank.LocationParserError:
> order(1078481..1078483,join(1078778,1078800..1078810))
>
> The offending feature is:
> misc_feature    complement(order(1078481..1078483,join(1078778,
>                 1078800..1078810)))
>                 /locus_tag="DKAM_1147"
>                 /note="active site"
>                 /db_xref="CDD:73252"
>
> Could you look into whether this is a bug in the parser or in the input file?
>

From p.j.a.cock at googlemail.com Mon Jul 11 09:38:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 11 Jul 2011 10:38:03 +0100
Subject: [Biopython-dev] Bio.GenBank.LocationParser chokes on misc_feature in Desulfurococcus kamchatkensis 1221n/NC_011766.gbk
In-Reply-To:
References:
Message-ID:

On Mon, Jul 11, 2011 at 9:34 AM, Tim te Beek wrote:
> When parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk
> using SeqIO.read(genbank_file, 'genbank') I get the following
> stacktrace:
>
> ...
>     gbk_records = (SeqIO.read(genbank_file, 'genbank') for
> genbank_file in genbank_files)
> ...
> Bio.GenBank.LocationParserError:
> order(1078481..1078483,join(1078778,1078800..1078810))
>
> The offending feature is:
> misc_feature    complement(order(1078481..1078483,join(1078778,
>                 1078800..1078810)))
>                 /locus_tag="DKAM_1147"
>                 /note="active site"
>                 /db_xref="CDD:73252"
>
> Could you look into whether this is a bug in the parser or in the input file?
>

That looks like the issue reported in Bug 3197, which turned out to be invalid GenBank files:
https://redmine.open-bio.org/issues/3197

Quoting from: http://www.ncbi.nlm.nih.gov/collab/FT/
>>
>> 3.4.2.2 Operators
>>
>> ...
>>
>> Note : location operator "complement" can be used in combination with
>> either "join" or "order" within the same location; combinations of "join"
>> and "order" within the same location (nested operators) are illegal.

Please report this problem with NC_011766.gbk and NC_009142.gbk to the NCBI (could you CC me too?), try using gb-admin at ncbi.nlm.nih.gov

The next release of Biopython will have a clearer error message in this situation.

Thank you,

Peter

From redmine at redmine.open-bio.org Mon Jul 11 09:44:25 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 11 Jul 2011 09:44:25 +0000
Subject: [Biopython-dev] [Biopython - Bug #3197] SeqIO parse error with some genbank files
References:
Message-ID:

Issue #3197 has been updated by Peter Cock.

Two more examples from the NCBI Bacteria FTP site, reported by Tim te Beek on our mailing list:

http://lists.open-bio.org/pipermail/biopython-dev/2011-July/009018.html
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk

LOCUS       NC_011766            1365223 bp    DNA     circular BCT 20-MAY-2011
DEFINITION  Desulfurococcus kamchatkensis 1221n chromosome, complete genome.
ACCESSION   NC_011766
VERSION     NC_011766.1  GI:218883314
DBLINK      Project: 59133
KEYWORDS    .
SOURCE      Desulfurococcus kamchatkensis 1221n
...
     misc_feature    complement(order(1078481..1078483,join(1078778,
                     1078800..1078810)))
                     /locus_tag="DKAM_1147"
                     /note="active site"
                     /db_xref="CDD:73252"

http://lists.open-bio.org/pipermail/biopython-dev/2011-July/009019.html
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Saccharopolyspora_erythraea_NRRL_2338_uid62947/NC_009142.gbk

LOCUS       NC_009142 8212805 bp DNA circular BCT 14-FEB-2011
DEFINITION  Saccharopolyspora erythraea NRRL 2338 chromosome, complete genome.
ACCESSION   NC_009142
VERSION     NC_009142.1 GI:134096620
DBLINK      Project: 62947
KEYWORDS    complete genome.
SOURCE      Saccharopolyspora erythraea NRRL 2338
...
     misc_feature    order(2409324..2409326,2409399..2409401,2409528..2409533,
                     2409619..2409624,2409679..2409681,2409748..2409753,
                     2409754..2409759,2409835..2409837,join(2409886..2409890,
                     2409892..2409898),2409911..2409913,2409920..2409925)
                     /locus_tag="SACE_2218"
                     /note="active site"
                     /db_xref="CDD:119408"
     misc_feature    order(2409324..2409326,2409399..2409401,2409528..2409530)
                     /locus_tag="SACE_2218"
                     /note="catalytic tetrad; other site"
                     /db_xref="CDD:119408"

----------------------------------------
Bug #3197: SeqIO parse error with some genbank files
https://redmine.open-bio.org/issues/3197

Author: Cedar McKay
Status: Resolved
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.56
URL:

I've found a file that seems to choke SeqIO genbank parsing. I downloaded this file straight from NCBI, so it should be a good file. I've found a couple of other files that do the same thing. I reproduced this bug on another machine, also with biopython 1.56. I am able to successfully parse other genbank files. Maybe it has something to do with that very long location?

Please let me know if I can provide any other information!

Thanks!

Cedar

>>> from Bio import SeqIO
>>> record = SeqIO.read('./Acorus_americanus_NC_010093.gb', 'genbank')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/SeqIO/__init__.py", line 597, in read
    first = iterator.next()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/SeqIO/__init__.py", line 525, in parse
    for r in i:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 437, in parse_records
    record = self.parse(handle, do_features)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 420, in parse
    if self.feed(handle, consumer, do_features):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 392, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/Scanner.py", line 344, in _feed_feature_table
    consumer.location(location_string)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: order(join(42724..42726,43455..43457),43464..43469,43476..43481,43557..43562,43569..43574,43578..43583,43677..43682,44434..44439)

-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From sbassi at clubdelarazon.org Tue Jul 12 15:28:59 2011
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Tue, 12 Jul 2011 12:28:59 -0300
Subject: [Biopython-dev] Contributing - description of my code [Sequence_cleaner]
In-Reply-To:
References:
Message-ID:

On Thu, Jul 7, 2011 at 10:18 PM, Genivaldo Gueiros wrote:
> Let me call my script as "Sequence_cleaner" and the big idea is to remove
> duplicate sequences, remove sequence too short ( the user define the minimum
> length) and remove sequences which has too much unknown nucleotides ( N) (
> the user define the % of N is allows ) and in the end the use can choose if
> he/she wanna have a file as output or print the result.

You should take a look at the seqclean utility.
Some methods should be applied only to the extremes.

From updates at feedmyinbox.com Wed Jul 13 11:22:38 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Wed, 13 Jul 2011 07:22:38 -0400
Subject: [Biopython-dev] 7/13 biopython Questions - BioStar
Message-ID: <454384b04ce8959586cf724c16ed5e54@74.63.51.88>

// Compare two fasta files by headers
// July 12, 2011 at 7:00 AM

http://biostar.stackexchange.com/questions/10185/compare-two-fasta-files-by-headers

Hi everyone; this is my first question on the forum.

How can I compare if two fasta files contain the same sequence headers?
Does any BioPython module exist for doing this?

Thanks in advance, peixe

-- 
Website: http://biostar.stackexchange.com/questions/tagged/biopython
Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email
Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

-- 
This email was carefully delivered by FeedMyInbox.com.
PO Box 682532 Franklin, TN 37068

From redmine at redmine.open-bio.org Thu Jul 14 03:02:35 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 14 Jul 2011 03:02:35 +0000
Subject: [Biopython-dev] [Biopython - Feature #3260] (Closed) Draw a Bio.Phylo tree as a phylogram
References:
Message-ID:

Issue #3260 has been updated by Eric Talevich.

Status changed from New to Closed
% Done changed from 0 to 100

Committed:
https://github.com/biopython/biopython/commit/d3aa24c808b4558dabcf024a485e0128792aa4aa

Folks: lemme know how this works for you.

----------------------------------------
Feature #3260: Draw a Bio.Phylo tree as a phylogram
https://redmine.open-bio.org/issues/3260

Author: Eric Talevich
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version:
URL:

Bio.Phylo should be able to draw a decent tree. This means a standard phylogram, with accurate branch lengths, labels for taxa and any internal nodes that have them, and support values -- like Phylip's "drawtree" does. The two existing functions don't quite suffice: draw_graphviz ignores branch lengths, though it's nice for unrooted trees; draw_ascii looks more like a typical phylogram, but it's ascii art and can't display internal node labels or support values (or large trees).

I wrote a function to do this, based on the algorithm in draw_ascii. It uses matplotlib for display. I tested it on all the trees under Tests/Nexus and Tests/PhyloXML; it's nice.

-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From redmine at redmine.open-bio.org Thu Jul 14 03:08:52 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 14 Jul 2011 03:08:52 +0000
Subject: [Biopython-dev] [Biopython - Bug #3263] (New) Phylo: Move clade 'color' and 'width' attributes to BaseTree
Message-ID:

Issue #3263 has been reported by Eric Talevich.

----------------------------------------
Bug #3263: Phylo: Move clade 'color' and 'width' attributes to BaseTree
https://redmine.open-bio.org/issues/3263

Author: Eric Talevich
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category:
Target version:
URL:

The 'color' and 'width' attributes are associated with PhyloXML trees right now, but are useful enough to be associated with the base Tree object (which you'd get from parsing a Newick or Nexus file), even though Newick and Nexus can't serialize this info.

----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin

-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From eric.talevich at gmail.com Thu Jul 14 03:23:15 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 13 Jul 2011 23:23:15 -0400
Subject: [Biopython-dev] pypaml
In-Reply-To: <4E0DF599.8080703@gmail.com>
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> <4E0DF3CE.7020504@gmail.com> <4E0DF599.8080703@gmail.com>
Message-ID:

Hey Brandon,

I cherry-picked those commits, finally:
https://github.com/biopython/biopython/commit/e2bb900212bd5113385a239b4ed42b570f06e146
https://github.com/biopython/biopython/commit/ab62ac508f02b3df1d2475f599fcd92eda6c078b
https://github.com/biopython/biopython/commit/de671e1baf3faa0ed8c10835397e308b1cf1b59d

Cheers,
Eric

On Fri, Jul 1, 2011 at 12:28 PM, Brandon Invergo wrote:

> Hi Eric,
>
> I lied. I took a moment to at least implement the kwargs change:
> https://github.com/brandoninvergo/biopython/commit/533b06476899b631ec28a6e4cc97a2e669a45ea0
>
> It seems to work swimmingly but I haven't tested it extensively yet. There
> was already exception-handling in place.
>
> Ok, *now* I'm off for the weekend!
> Cheers,
> Brandon
>
>
>
> On Fri 01 Jul 2011 06:20:30 PM CEST, Brandon Invergo wrote:
>
>> Hi Eric,
>> You're right, I had the functionality of kwargs mixed up in my head (I've
>> rarely used it) and I forgot that it's passed in as a dict. In that case,
>> it's relatively straight-forward to do. Something like this:
>>
>> def set_options(self, **kwargs):
>>     for option, value in kwargs.items():
>>         if option in self._options:
>>             self._options[option] = value
>>         # else:
>>         #     Raise exception
>>
>> Not sure if raising an exception would really be necessary here. (ps - I
>> haven't tested that code, I just typed it up quickly now)
>>
>> Regarding the splitting of functionality, to an extent it makes sense but
>> I wonder if it's worth it, since the PAML commandline programs only take a
>> single argument, the path to the control file. However, if the main
>> advantages lie in code readability and standardization with the rest of the
>> applications framework, then I think it's ok.
>>
>> Unfortunately I'll be unavailable all weekend (starting in about 3
>> minutes) but I should be free on Monday to work on it.
>>
>> Cheers, Brandon
>>
>> On Fri 01 Jul 2011 04:17:26 AM CEST, Eric Talevich wrote:
>>
>>> Hi Brandon,
>>>
>>> Looks good, thanks! It's just enough to get the point across, and the
>>> wiki is a fine place for extended examples.
>>>
>>> Reading this, I notice that the cml.set_option(key, value) gets kind of
>>> tedious when a lot of options need to be set. It would be nice to be able to
>>> set them all in one go, as keyword arguments:
>>>
>>> cml.set_options(
>>>     seqtype=1,
>>>     verbose=0,
>>>     noisy=0,
>>>     RateAncestor=0,
>>>     model=0,
>>>     NSsites=[0, 1, 2],
>>>     CodonFreq=2,
>>>     cleandata=1,
>>>     fix_alpha=1,
>>>     kappa=4.54006,
>>> )
>>>
>>> What do you think? Worth implementing?
>>>
>>> Cheers,
>>> Eric
>>>
>>>
>>> On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>>>
>>> Well, it's not much, but how's this?
>>> https://github.com/brandoninvergo/biopython/tree/doc-branch
>>> Do you want me to go more into detail about the options available like
>>> in the wiki or is this sufficient as a tutorial? Just let me know...
>>>
>>> Cheers,
>>> Brandon
>>>
>>> On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo wrote:
>>> > Hi Eric,
>>> > No problem, I'll start writing something up now.
>>> > Cheers,
>>> > -brandon
>>> >
>>> > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich wrote:
>>> >> Hi Brandon,
>>> >>
>>> >> I just added a stub for Bio.Phylo.PAML to the main Tutorial:
>>> >> https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8
>>> >>
>>> >> Do you think you could add some more to that section, maybe pulling a chunk
>>> >> of content from the wiki page you just wrote? If you're not comfortable with
>>> >> LaTeX you can just point me to some text and I'll add it.
>>> >>
>>> >> Thanks,
>>> >> Eric
>>> >>
>>> >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo wrote:
>>> >>>
>>> >>> Ok, the documentation is finished:
>>> >>> http://biopython.org/wiki/PAML
>>> >>>
>>> >>> Cheers,
>>> >>> Brandon
>>> >>>
>>> >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman wrote:
>>> >>> > Brandon;
>>> >>> >
>>> >>> >> Ok I've just sent the email to the main list.
>>> >>> >
>>> >>> > Awesome, thanks for this. Hope this convinces some other folks to
>>> >>> > take a look.
>>> >>> >
>>> >>> >> I can write up some documentation this week. What is the official
>>> >>> >> procedure for adding documentation to the wiki, if any? Or can I just
>>> >>> >> create an account and start writing?
>>> >>> >
>>> >>> > Create an account and start writing. Nothing official except that
>>> >>> > documentation is good.
>>> >>> >
>>> >>> >> Also, just to double-check, are my docstrings all sufficient or should
>>> >>> >> I expand those?
>>> >>> >
>>> >>> > Your code comments looked great to me.
>>> >>> > The end user documentation seems to be the main thing at this
>>> >>> > point: describing how someone can pick up and get started with
>>> >>> > the code.
>>> >>> >
>>> >>> > Thanks again for all the work,
>>> >>> > Brad
>>> >>> > _______________________________________________
>>> >>> > Biopython-dev mailing list
>>> >>> > Biopython-dev at lists.open-bio.org
>>> >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>> >>> >
>>> >>> _______________________________________________
>>> >>> Biopython-dev mailing list
>>> >>> Biopython-dev at lists.open-bio.org
>>> >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>> >>
>>> >>
>>> >
>>>
>>
>

From p.j.a.cock at googlemail.com Thu Jul 14 07:56:09 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Jul 2011 08:56:09 +0100
Subject: [Biopython-dev] [Biopython] Gene ontology parsing
In-Reply-To:
References:
Message-ID:

Hi Kyle,

Last year you wrote this on the main Biopython mailing list,

On Fri, Jul 23, 2010 at 5:17 PM, Kyle wrote:
>> There are already several people working on GO stuff in branches on github,
>> e.g. Chris Lasher, Kyle Ellrott, Tamás Nepusz. I don't know if any of them are
>> doing OBO v1.2, but it would be sensible to check and try and combine efforts.
>
> The branch at http://github.com/kellrott/biopython/tree/gosupport
> should parse most of the information held in OBO v1.2.
> Chris's original version was targeted only for the GO OBO file, as
> there was a typecheck to make sure the node ID's started with 'GO:'.
> That's disabled in my branch, and I've used the package to parse a few
> of the other ontologies found at www.obofoundry.org.
> The module is currently called Bio.GO, but maybe it should be
> re-factored to represent the fact that it covers general OBO files,
> and not just the GO file specifically.
>
> The main things keeping it from merging into the main branch
> are proper documentation, complete unit tests, and making sure that it
> covers all of the standard usage practices.
>
> If you can try it out, and let me know which functions are missing (and
> maybe contribute some code), we can push this thing forward.
>
> Kyle

Does your code still exist, perhaps on a different branch? I couldn't
find it at the URL http://github.com/kellrott/biopython/tree/gosupport

I'm at the CodeFest preceding BOSC 2011, and Herve (CC'd) is
interested in parsing OBO files in Biopython.

Thanks,

Peter

From p.j.a.cock at googlemail.com Thu Jul 14 08:25:26 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Jul 2011 09:25:26 +0100
Subject: [Biopython-dev] PAML unit test failure
Message-ID:

Hi guys,

We're seeing new failures on the buildbot under Python 3, e.g.
http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.2/builds/147/steps/shell/logs/stdio

======================================================================
ERROR: testOptionExists (test_Baseml.ModTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Baseml.py", line 94, in testOptionExists
    self.bml.set_option, "xxxx", 1)
AttributeError: 'Baseml' object has no attribute 'set_option'

======================================================================
ERROR: testOptionExists (test_Codeml.ModTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Codeml.py", line 92, in testOptionExists
    self.cml.set_option, "xxxx", 1)
AttributeError: 'Codeml' object has no attribute 'set_option'

======================================================================
ERROR: testOptionExists (test_Yn00.ModTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Yn00.py", line 67, in testOptionExists
    self.yn00.set_option, "xxxx", 1)
AttributeError: 'Yn00' object has no attribute 'set_option'

----------------------------------------------------------------------

Caused by this commit,

https://github.com/biopython/biopython/tree/de671e1baf3faa0ed8c10835397e308b1cf1b59d

I couldn't see a matching change to the unit tests on Brandon's branch
to apply, so I just fixed it:

https://github.com/biopython/biopython/commit/145fe2a01afb4092cb2e862142dd04234410b74f

Peter

From b.invergo at gmail.com Thu Jul 14 08:49:07 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 14 Jul 2011 10:49:07 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> <4E0DF3CE.7020504@gmail.com> <4E0DF599.8080703@gmail.com>
Message-ID: <4E1EAD83.3080603@gmail.com>

Hi Eric,

I'm really sorry about that!!! I was on holiday for some days and now
I'm up to my neck in grant applications, so it totally slipped my mind.
Is there anything left to do? I see Peter's email about the unit tests
so I'll look into that...

Apologies,
-brandon

On Thu 14 Jul 2011 05:23:15 AM CEST, Eric Talevich wrote:
> Hey Brandon,
>
> I cherry-picked those commits, finally:
> https://github.com/biopython/biopython/commit/e2bb900212bd5113385a239b4ed42b570f06e146
> https://github.com/biopython/biopython/commit/ab62ac508f02b3df1d2475f599fcd92eda6c078b
> https://github.com/biopython/biopython/commit/de671e1baf3faa0ed8c10835397e308b1cf1b59d
>
> Cheers,
> Eric
>
> On Fri, Jul 1, 2011 at 12:28 PM, Brandon Invergo wrote:
>
> Hi Eric,
>
> I lied. I took a moment to at least implement the kwargs change:
> https://github.com/brandoninvergo/biopython/commit/533b06476899b631ec28a6e4cc97a2e669a45ea0
>
> It seems to work swimmingly but I haven't tested it extensively yet.
> There was already exception-handling in place.
>
> Ok, *now* I'm off for the weekend!
> Cheers,
> Brandon
>
>
>
> On Fri 01 Jul 2011 06:20:30 PM CEST, Brandon Invergo wrote:
>
> Hi Eric,
> You're right, I had the functionality of kwargs mixed up in my
> head (I've rarely used it) and I forgot that it's passed in as a
> dict. In that case, it's relatively straight-forward to do.
> Something like this:
>
> def set_options(self, **kwargs):
>     for option, value in kwargs.items():
>         if option in self._options:
>             self._options[option] = value
>         # else:
>         #     Raise exception
>
> Not sure if raising an exception would really be necessary here.
> (ps - I haven't tested that code, I just typed it up quickly now)
>
> Regarding the splitting of functionality, to an extent it makes
> sense but I wonder if it's worth it, since the PAML commandline
> programs only take a single argument, the path to the control
> file. However, if the main advantages lie in code readability
> and standardization with the rest of the applications framework,
> then I think it's ok.
>
> Unfortunately I'll be unavailable all weekend (starting in about
> 3 minutes) but I should be free on Monday to work on it.
>
> Cheers, Brandon
>
> On Fri 01 Jul 2011 04:17:26 AM CEST, Eric Talevich wrote:
>
> Hi Brandon,
>
> Looks good, thanks! It's just enough to get the point
> across, and the wiki is a fine place for extended examples.
>
> Reading this, I notice that the cml.set_option(key, value)
> gets kind of tedious when a lot of options need to be set.
> It would be nice to be able to set them all in one go, as
> keyword arguments:
>
> cml.set_options(
>     seqtype=1,
>     verbose=0,
>     noisy=0,
>     RateAncestor=0,
>     model=0,
>     NSsites=[0, 1, 2],
>     CodonFreq=2,
>     cleandata=1,
>     fix_alpha=1,
>     kappa=4.54006,
> )
>
> What do you think? Worth implementing?
>
> Cheers,
> Eric
>
>
> On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo wrote:
>
> Well, it's not much, but how's this?
> https://github.com/brandoninvergo/biopython/tree/doc-branch
>
> Do you want me to go more into detail about the options available like
> in the wiki or is this sufficient as a tutorial? Just let me know...
>
> Cheers,
> Brandon
>
> On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo wrote:
> > Hi Eric,
> > No problem, I'll start writing something up now.
> > Cheers,
> > -brandon
> >
> > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich wrote:
> >> Hi Brandon,
> >>
> >> I just added a stub for Bio.Phylo.PAML to the main Tutorial:
> >> https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8
> >>
> >> Do you think you could add some more to that section, maybe pulling a chunk
> >> of content from the wiki page you just wrote? If you're not comfortable with
> >> LaTeX you can just point me to some text and I'll add it.
> >>
> >> Thanks,
> >> Eric
> >>
> >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo wrote:
> >>>
> >>> Ok, the documentation is finished:
> >>> http://biopython.org/wiki/PAML
> >>>
> >>> Cheers,
> >>> Brandon
> >>>
> >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman wrote:
> >>> > Brandon;
> >>> >
> >>> >> Ok I've just sent the email to the main list.
> >>> >
> >>> > Awesome, thanks for this. Hope this convinces some other folks to
> >>> > take a look.
> >>> >
> >>> >> I can write up some documentation this week. What is the official
> >>> >> procedure for adding documentation to the wiki, if any?
> >>> >> Or can I just create an account and start writing?
> >>> >
> >>> > Create an account and start writing. Nothing official except that
> >>> > documentation is good.
> >>> >
> >>> >> Also, just to double-check, are my docstrings all sufficient or should
> >>> >> I expand those?
> >>> >
> >>> > Your code comments looked great to me. The end user documentation
> >>> > seems to be the main thing at this point: describing how someone can
> >>> > pick up and get started with the code.
> >>> >
> >>> > Thanks again for all the work,
> >>> > Brad
> >>> > _______________________________________________
> >>> > Biopython-dev mailing list
> >>> > Biopython-dev at lists.open-bio.org
> >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
> >>> >
> >>> _______________________________________________
> >>> Biopython-dev mailing list
> >>> Biopython-dev at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> >>
> >>
> >

From kellrott at gmail.com Thu Jul 14 09:19:48 2011
From: kellrott at gmail.com (Kyle)
Date: Thu, 14 Jul 2011 02:19:48 -0700
Subject: [Biopython-dev] [Biopython] Gene ontology parsing
In-Reply-To:
References:
Message-ID:

(Hitting Reply-All this time)
I think the work being done by Chris Lasher and Tamas supersedes
my work. My patches got pulled into Chris's branch:
https://github.com/gotgenes/biopython/tree/gosupport/Bio/GO

I'll be at BOSC starting tomorrow. (About to get on a plane to Vienna)

Kyle

On Thu, Jul 14, 2011 at 12:56 AM, Peter Cock wrote:
> Hi Kyle,
>
> Last year you wrote this on the main Biopython mailing list,
>
> On Fri, Jul 23, 2010 at 5:17 PM, Kyle wrote:
>>> There are already several people working on GO stuff in branches on github,
>>> e.g. Chris Lasher, Kyle Ellrott, Tamás Nepusz. I don't know if any of them are
>>> doing OBO v1.2, but it would be sensible to check and try and combine efforts.
>>
>> The branch at http://github.com/kellrott/biopython/tree/gosupport
>> should parse most of the information held in OBO v1.2.
>> Chris's original version was targeted only for the GO OBO file, as
>> there was a typecheck to make sure the node ID's started with 'GO:'.
>> That's disabled in my branch, and I've used the package to parse a few
>> of the other ontologies found at www.obofoundry.org.
>> The module is currently called Bio.GO, but maybe it should be
>> re-factored to represent the fact that it covers general OBO files,
>> and not just the GO file specifically.
>>
>> The main things keeping it from merging into the main branch
>> are proper documentation, complete unit tests, and making sure that it
>> covers all of the standard usage practices.
>>
>> If you can try it out, and let me know which functions are missing (and
>> maybe contribute some code), we can push this thing forward.
>>
>> Kyle
>
> Does your code still exist, perhaps on a different branch? I couldn't
> find it at the URL http://github.com/kellrott/biopython/tree/gosupport
>
> I'm at the CodeFest preceding BOSC 2011, and Herve (CC'd) is
> interested in parsing OBO files in Biopython.
>
> Thanks,
>
> Peter
>

From b.invergo at gmail.com Thu Jul 14 09:23:14 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 14 Jul 2011 11:23:14 +0200
Subject: [Biopython-dev] PAML unit test failure
In-Reply-To:
References:
Message-ID: <4E1EB582.1090005@gmail.com>

So, just to confirm, this is resolved then?

Sorry again everyone, I dropped the ball on this one.

-brandon

On Thu 14 Jul 2011 10:25:26 AM CEST, Peter Cock wrote:
> Hi guys,
>
> We're seeing new failures on the buildbot under Python 3, e.g.
> http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.2/builds/147/steps/shell/logs/stdio
>
> ======================================================================
> ERROR: testOptionExists (test_Baseml.ModTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Baseml.py", line 94, in testOptionExists
>     self.bml.set_option, "xxxx", 1)
> AttributeError: 'Baseml' object has no attribute 'set_option'
>
> ======================================================================
> ERROR: testOptionExists (test_Codeml.ModTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Codeml.py", line 92, in testOptionExists
>     self.cml.set_option, "xxxx", 1)
> AttributeError: 'Codeml' object has no attribute 'set_option'
>
> ======================================================================
> ERROR: testOptionExists (test_Yn00.ModTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/BuildBot/lin32/build/build/py3.2/Tests/test_Yn00.py", line 67, in testOptionExists
>     self.yn00.set_option, "xxxx", 1)
> AttributeError: 'Yn00' object has no attribute 'set_option'
>
> ----------------------------------------------------------------------
>
> Caused by this commit,
>
> https://github.com/biopython/biopython/tree/de671e1baf3faa0ed8c10835397e308b1cf1b59d
>
> I couldn't see a matching change to the unit tests on Brandon's branch
> to apply, so I just fixed it:
>
> https://github.com/biopython/biopython/commit/145fe2a01afb4092cb2e862142dd04234410b74f
>
> Peter

From p.j.a.cock at googlemail.com Thu Jul 14 12:54:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Jul 2011 13:54:42
+0100
Subject: [Biopython-dev] PAML unit test failure
In-Reply-To: <4E1EB582.1090005@gmail.com>
References: <4E1EB582.1090005@gmail.com>
Message-ID:

On Thu, Jul 14, 2011 at 10:23 AM, Brandon Invergo wrote:
> So, just to confirm, this is resolved then?

Yes - but you should make sure you get this fix in your branches
if required.

> Sorry again everyone, I dropped the ball on this one.
>
> -brandon

No problem - one of the points of the buildbot is to catch oversights.

Peter

From p.j.a.cock at googlemail.com Thu Jul 14 13:15:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Jul 2011 14:15:12 +0100
Subject: [Biopython-dev] SeqXML an alternative for FASTA
In-Reply-To:
References: <9A1DEE28-F2AD-40A8-998B-538137226584@scilifelab.se>
Message-ID:

On Tue, Jul 5, 2011 at 5:10 PM, Peter Cock wrote:
> Hi all,
>
> I've been in touch with Thomas Schmitt about merging read/write
> support for the SeqXML file format (see below and http://seqxml.org/ )
> into Biopython's SeqIO module.
>
> BioPerl already supports this (under format name "seqxml") and a
> BioJava v3 implementation is in progress. We're discussing this and
> the format itself on the cross project OBF mailing list (see below),
>
> http://lists.open-bio.org/pipermail/open-bio-l/2011-July/000805.html
>
> Please feel free to join that list if you want to discuss anything
> general, or comment here on the Biopython implementation.
> I've got a branch which seems nearly ready for merging on
> github, https://github.com/peterjc/biopython/commits/seqxml2
> a rebase of https://github.com/peterjc/biopython/commits/seqxml
>
> Regards,
>
> Peter

I've just applied this to the trunk.

Thomas, please keep in touch if and when there are any further
changes to the seqxml specification and the code needs updating.

Thanks,

Peter

From mathieu.giraud at lifl.fr Fri Jul 15 05:44:08 2011
From: mathieu.giraud at lifl.fr (Mathieu Giraud)
Date: Fri, 15 Jul 2011 07:44:08 +0200
Subject: [Biopython-dev] Biomanycores, GPU bioinformatics with BioJava, BioPerl and Biopython
Message-ID: <345DF252-C388-4DFB-9BF5-9BF077A43CC2@lifl.fr>

Dear colleagues,

We are pleased to announce the latest release of Biomanycores, a library of bioinformatics applications for GPU and other manycore processors. We gather some CUDA and OpenCL codes and provide interfaces with the latest versions of BioJava, BioPerl and Biopython, and we aim to provide soon benchmarks on GPU-accelerated applications through the Bio* frameworks.

==> http://www.biomanycores.org/ <==

Currently, Biomanycores has codes for sequence alignment (Smith-Waterman), genome alignment (MUMmer), transcription factor lookup with position-weight matrices, and RNA secondary structure and pseudo-knot prediction. We plan to integrate other applications in the next months.

We invite you to try Biomanycores applications and interfaces. We welcome any feedback, most notably on the way we realised the integrations.

The main developer of Biomanycores, Jean-Frédéric Berthelot, is full-time on the project and can provide help. In particular, we are willing to provide support on the following points:
1) you can propose other GPU applications for inclusion ;
2) if you have some computation time bottlenecks on a BioJava, BioPerl or Biopython pipeline, we can work together to see if some current or potential GPU applications could speed-up your scripts.

You can join us at . Moreover, we will be present this week in Wien at BOSC 2011.

Best regards,
The Biomanycores team

-- 
Mathieu Giraud - http://www.lifl.fr/~giraud
CNRS, LIFL, Université Lille 1, INRIA Lille, France

From anaryin at gmail.com Fri Jul 15 13:20:03 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 15 Jul 2011 15:20:03 +0200
Subject: [Biopython-dev] Fwd: Update [GSoC - M. Trellet]
Message-ID:

Updates on https://github.com/mtrellet/biopython/tree/interface_analysis

The wiki was also updated with some (not all) information.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

---------- Forwarded message ----------
From: Mikael Trellet
Date: Fri, Jul 15, 2011 at 3:01 PM
Subject: Update
To: Eric Talevich
Cc: Joao Rodrigues

Hello Eric,

Some news from the project! We are still ahead on the road map, I have much more time than during the last weeks and can work essentially on it. I added some stuff this morning, such as the unit tests for the ExtendedResidues class and other smaller functions. I'm going to focus on the buried surface area during the following days, I'm still wondering which method I will use, NACCESS, HSExposure or another one...

Don't hesitate if you have any question!

Cheers,

-- 
Mikael TRELLET,
Computational structural biology group, Utrecht University
Bijvoet Center, The Netherlands

From redmine at redmine.open-bio.org Sat Jul 16 21:42:16 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sat, 16 Jul 2011 21:42:16 +0000
Subject: [Biopython-dev] [Biopython - Bug #3266] (New) Bio/PDB/PDBParser.py: local variable 'i' referenced before assignment
Message-ID:

Issue #3266 has been reported by Boris Nagaev.

----------------------------------------
Bug #3266: Bio/PDB/PDBParser.py: local variable 'i' referenced before assignment
https://redmine.open-bio.org/issues/3266

Author: Boris Nagaev
Status: New
Priority: Normal
Assignee:
Category:
Target version:
URL:

Hey!

Look at file Bio/PDB/PDBParser.py, line 100:
     91     def _get_header(self, header_coords_trailer):
     92         "Get the header of the PDB file, return the rest."
     93         structure_builder=self.structure_builder
     94         for i in range(0, len(header_coords_trailer)):
     95             structure_builder.set_line_counter(i+1)
     96             line=header_coords_trailer[i]
     97             record_type=line[0:6] 
     98             if(record_type=='ATOM  ' or record_type=='HETATM' or record_type=='MODEL '):
     99                 break
    100         header=header_coords_trailer[0:i]
    101         # Return the rest of the coords+trailer for further processing
    102         self.line_counter=i
    103         coords_trailer=header_coords_trailer[i:]
    104         header_dict=_parse_pdb_header_list(header)
    105         return header_dict, coords_trailer
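
The failure mode in this report can be reproduced outside Biopython. The sketch below is a minimal illustration, written in Python 3 syntax and using hypothetical function names; it mirrors the shape of the loop in the listing rather than the real `PDBParser` code, and the guard shown is only one possible fix, not necessarily the one adopted upstream.

```python
# Record types that mark the end of the header, as in the listing above.
COORD_RECORDS = ("ATOM  ", "HETATM", "MODEL ")


def split_header_unguarded(lines):
    """Same loop shape as the listing: `i` is bound only if the loop runs."""
    for i in range(0, len(lines)):
        if lines[i][0:6] in COORD_RECORDS:
            break
    # With an empty list the loop body never executes, so `i` is unbound here.
    return lines[0:i], lines[i:]


def split_header_guarded(lines):
    """Hypothetical guard: give `i` a value before the loop starts."""
    i = 0
    for i in range(0, len(lines)):
        if lines[i][0:6] in COORD_RECORDS:
            break
    return lines[0:i], lines[i:]


if __name__ == "__main__":
    try:
        split_header_unguarded([])
    except UnboundLocalError as err:
        print("unguarded version failed:", err)
    print(split_header_guarded([]))
```

Initialising `i` keeps the behaviour for non-empty input unchanged and makes empty input return two empty lists.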
At line 100, the variable i is used, but it may be undefined if len(header_coords_trailer) == 0.

Biopython version 1.57-1+b1, Debian lenny

----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From venkatesh20 at gmail.com Wed Jul 20 12:45:20 2011
From: venkatesh20 at gmail.com (Venkatesh U)
Date: Wed, 20 Jul 2011 18:15:20 +0530
Subject: [Biopython-dev] Willing to Contribute to Biopython
Message-ID:

Hi Friends,

I am interested in contributing to Biopython. I am a programmer with 5+ years of experience applying machine learning and data mining in the telecom domain. I am comfortable with Java and Python, proficient in SQL and NoSQL databases, and have some experience with MapReduce and Hadoop.

Recently I started learning bioinformatics and was impressed by the contribution that Biopython is making to this field.

I am very interested in contributing to Biopython. I would highly appreciate it if you could help me get started. Thanks in advance.

Thanks,
Venki

From chapmanb at 50mail.com Wed Jul 20 16:00:28 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 20 Jul 2011 12:00:28 -0400
Subject: [Biopython-dev] Willing to Contribute to Biopython
In-Reply-To:
References:
Message-ID: <20110720160028.GB13254@sobchak>

Venki;

> I am very interested in contributing to BioPython. I would highly appreciate
> if you could help me get started. Thanks in Advance.

Welcome, we're very happy to have you interested. The best way to get started is to begin using Biopython for your projects of interest and then contribute back documentation about how to do useful tasks. The Cookbook page is a great example of useful parts contributed by users:

http://biopython.org/wiki/Category:Cookbook

As you begin doing this and get more familiar with Biopython, you'll likely run into areas where additional libraries might be useful. At that point feel free to suggest potential new libraries and ideas to the list; or get started coding and ask for feedback on what you've written.
An alternative approach to getting started is to look at the Issue tracker and pick some problems of interest:

https://redmine.open-bio.org/projects/biopython

Thanks again for the message and interest,
Brad

From updates at feedmyinbox.com Wed Jul 20 22:05:09 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Wed, 20 Jul 2011 18:05:09 -0400
Subject: [Biopython-dev] 7/20 biopython Questions - BioStar
Message-ID:

// PSI-Blast commandline version using Biopython
// July 20, 2011 at 11:10 AM

http://biostar.stackexchange.com/questions/10442/psi-blast-commandline-version-using-bipython

Hi Niek,

I have been trying to get my PSI-BLAST program to run, using the wrapper Bio.Blast.Applications.NcbipsiblastCommandline. I saw the code you posted at http://biostar.stackexchange.com/questions/2515/how-to-parse-psiblast-results-using-biopython-and-blast-2-2-24 and tried to use the same, but it doesn't seem to work for me. Can you suggest why?

Code that I used:

    from Bio.Blast.Applications import NcbipsiblastCommandline

    psi_cline = NcbipsiblastCommandline('psiblast', db='refseq_protein',
                                        query=queryID + ".fasta", evalue=10,
                                        out=queryID + "_psi.xml", outfmt=7,
                                        out_pssm=queryID + "_pssm")
    # p = subprocess.Popen(str(psi_cline), stdout=subprocess.PIPE,
    #                      stderr=subprocess.PIPE, shell=(sys.platform != "win32"))
    # blastParser(p.stdout)
    str(psi_cline)
    psi_cline()

Here is the error that I get:

    Traceback (most recent call last):
      File "psiBlast.py", line 116, in <module>
        psi_cline()
      File "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/__init__.py", line 432, in __call__
        stdout_str, stderr_str)
    Bio.Application.ApplicationError: Command 'psiblast -out NP_012649_psi.xml -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: not found'

I do not get any XML as my output, as the command is apparently not found. In this case queryID is a protein name and queryID.fasta is the corresponding FASTA file.
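
Exit status 127 together with '/bin/sh: psiblast: not found' is the shell reporting that no psiblast executable could be found on the PATH, so the wrapper never got as far as running BLAST. A minimal sketch of checking for the executable up front is shown below. It uses only the Python standard library and Python 3 syntax (unlike the Python 2.6 session above), and `require_executable` is a hypothetical helper name, not part of Biopython.

```python
import shutil


def require_executable(name):
    """Return the full path to `name`, or raise if it is not on the PATH.

    Hypothetical helper for illustration; it is not a Biopython function.
    """
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            "%r was not found on the PATH; install it or use its full path" % name
        )
    return path


if __name__ == "__main__":
    try:
        print(require_executable("psiblast"))
    except FileNotFoundError as err:
        print("Cannot run PSI-BLAST:", err)
```

If the lookup succeeds, the returned path could be given as the first argument of the command line wrapper in place of the bare 'psiblast' string used in the question.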
looking forward for your reply.

Molly

--
Website: http://biostar.stackexchange.com/questions/tagged/biopython
Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email
Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email
--
This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068

From redmine at redmine.open-bio.org Thu Jul 21 14:55:02 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 21 Jul 2011 14:55:02 +0000
Subject: [Biopython-dev] [Biopython - Bug #3267] (New) Empty files trigger Bio.SeqIO exception from Bio.SeqIO.SeqXML under Jython
Message-ID:

Issue #3267 has been reported by Peter Cock.
----------------------------------------
Bug #3267: Empty files trigger Bio.SeqIO exception from Bio.SeqIO.SeqXML under Jython
https://redmine.open-bio.org/issues/3267

Author: Peter Cock
Status: New
Priority: Normal
Assignee:
Category:
Target version:
URL:

Buildbot test failures following addition of seqxml support to Bio.SeqIO

    $ jython test_SeqIO.py
    Traceback (most recent call last):
      File "test_SeqIO.py", line 392, in <module>
        records = list(SeqIO.parse(handle, t_format))
      File "/Users/pjcock/jython2.5.1/Lib/site-packages/Bio/SeqIO/__init__.py", line 536, in parse
        for r in i:
      File "/Users/pjcock/jython2.5.1/Lib/site-packages/Bio/SeqIO/SeqXmlIO.py", line 53, in __iter__
        for event,node in self._events:
      File "/Users/pjcock/jython2.5.1/Lib/xml/dom/pulldom.py", line 231, in next
        rc = self.getEvent()
      File "/Users/pjcock/jython2.5.1/Lib/xml/dom/pulldom.py", line 275, in _slurp
        self.parser.parse(self.stream)
      File "/Users/pjcock/jython2.5.1/Lib/xml/sax/drivers2/drv_javasax.py", line 141, in parse
        self._parser.parse(JyInputSourceWrapper(source))
      File "/Users/pjcock/jython2.5.1/Lib/xml/sax/drivers2/drv_javasax.py", line 58, in fatalError
        self._err_handler.fatalError(_wrap_sax_exception(exc))
      File "/Users/pjcock/jython2.5.1/Lib/xml/sax/handler.py", line 38, in fatalError
        raise exception
    xml.sax._exceptions.SAXParseException: <unknown>:1:1: Premature end of file.

Turns out to be due to different handling of empty XML files under Jython. I will file a bug in Jython shortly (currently http://www.jython.org/ is down).
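
Because the handling of empty XML input differs between interpreters, one hedged way for calling code to stay independent of it is to detect the empty case before the XML parser is involved at all. The sketch below uses only the standard library and Python 3 syntax; `events_or_empty` is a hypothetical helper, and this is not the change that was made in Biopython.

```python
from io import StringIO
from xml.dom import pulldom


def events_or_empty(handle):
    """Return the pulldom event names for `handle`, or [] if it is empty.

    Hypothetical helper for illustration only.
    """
    text = handle.read()
    if not text.strip():
        # Empty (or whitespace-only) input: report "no events" ourselves, so
        # the result does not depend on how a given XML parser reports it.
        return []
    return [event for event, node in pulldom.parseString(text)]


if __name__ == "__main__":
    print(events_or_empty(StringIO()))  # simulate an empty file
    print(events_or_empty(StringIO("<a><b/></a>")))
```

Reading the whole handle into memory is a simplification that gives up the streaming benefit of pulldom, and malformed non-empty XML will still raise a parse error.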
Reduced test case:

    import sys
    print sys.version
    from StringIO import StringIO
    from xml.dom import pulldom
    from xml.sax import SAXParseException
    handle = StringIO()  # simulate empty file
    try:
        for event,node in pulldom.parse(handle):
            print event
    except SAXParseException, e:
        print repr(e)
        print "Line number", e.getLineNumber()
        print "Column number", e.getColumnNumber()
    print "Done"

    $ python2.5 ../../sax_empty_xml.py
    2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
    [GCC 4.0.1 (Apple Computer, Inc. build 5363)]
    SAXParseException('no element found',)
    Line number 1
    Column number 0
    Done

    $ python2.6 ../../sax_empty_xml.py
    2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
    [GCC 4.2.1 (Apple Inc. build 5646)]
    SAXParseException('no element found',)
    Line number 1
    Column number 0
    Done

    $ jython ../../sax_empty_xml.py
    2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
    [Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)]
    SAXParseException(u'Premature end of file.',)
    Line number 1
    Column number 1
    Done

Notice (a) different exception description, (b) different column number.

From p.j.a.cock at googlemail.com Thu Jul 21 17:55:54 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Jul 2011 18:55:54 +0100
Subject: [Biopython-dev] Jython updated on mac build slave
Message-ID:

Hi all,

If anyone is curious about the following unit test failure,

    FAIL: test_SeqIO
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "run_tests.py", line 241, in runTest
        assert expected_line == output_line, \
    AssertionError:
    Output  : " Failed: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)"
    Expected: " Checking can write/read as 'clustal' format"

This was from converting a new SeqXML file into FASTA, where a description contained non-ASCII. It works fine on C Python, but was failing under Jython 2.5.1. The solution? Update to Jython 2.5.2

As part of this work it means all the build slaves now run Jython 2.5.2 rather than a mix of this and Jython 2.5.1 so I have standardised this for the buildbot. There is now one less column listed here:

http://testing.open-bio.org:8010/tgrid

or one less row with the other view:

http://testing.open-bio.org:8010/grid

Sadly this seems to have lost the old build history under Jython, but I don't mind too much.

Peter

From p.j.a.cock at googlemail.com Thu Jul 21 20:18:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Jul 2011 21:18:12 +0100
Subject: [Biopython-dev] Jython updated on mac build slave
In-Reply-To:
References:
Message-ID:

On Thu, Jul 21, 2011 at 6:55 PM, Peter Cock wrote:
> Hi all,
>
> If anyone is curious about the following unit test failure,
>
> FAIL: test_SeqIO
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "run_tests.py", line 241, in runTest
>     assert expected_line == output_line, \
> AssertionError:
> Output  : " Failed: 'ascii' codec can't decode byte 0xe5 in position
> 0: ordinal not in range(128)"
> Expected: " Checking can write/read as 'clustal' format"
>
> This was from converting a new SeqXML file into FASTA, where a
> description contained non-ASCII. It works fine on C Python, but was
> failing under Jython 2.5.1. The solution? Update to Jython 2.5.2
>
> As part of this work it means all the build slaves now run Jython 2.5.2
> rather than a mix of this and Jython 2.5.1 so I have standardised this
> for the buildbot. There is now one less column listed here:
>
> http://testing.open-bio.org:8010/tgrid
>
> or one less row with the other view:
>
> http://testing.open-bio.org:8010/grid
>
> Sadly this seems to have lost the old build history under Jython,
> but I don't mind too much.

Note the Windows buildslave is still on Jython 2.5.1 right now,
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Jython%202.5/builds/0/steps/shell/logs/stdio

Peter

From p.j.a.cock at googlemail.com Thu Jul 21 20:48:38 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Jul 2011 21:48:38 +0100
Subject: [Biopython-dev] float('nan') fails on Python 2.5 on Windows (PAML)
Message-ID:

On Sun, Jun 5, 2011 at 10:51 PM, Peter wrote:
> Hi all,
>
> As explained in PEP 754, prior to Python 2.6 float('inf'), float('-inf')
> and also float('nan') were passed to the underlying C library, which
> may or may not return the IEEE special floating point value for
> infinity, minus infinity or nan. See:
> http://www.python.org/dev/peps/pep-0754/
>
> This is the root cause of this unit test failure on Windows Python 2.5,
> ...
The related problem of float("nan") on Python 2.5 or older on Windows is the cause of this problem in the new PAML code too:

http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/229/steps/shell/logs/stdio

    ValueError: invalid literal for float(): nan

I guess _parse_codeml.py might need something like this:

    try:
        float("nan")
        _float = float
    except ValueError:
        def _float(txt):
            try:
                return float(txt)
            except ValueError, e:
                if txt=="nan":
                    return XXX
                else:
                    raise e

And then use the nan safe _float function in the parser.

Unless anyone has a nicer solution?

Peter

From p.j.a.cock at googlemail.com Fri Jul 22 17:35:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Jul 2011 18:35:25 +0100
Subject: [Biopython-dev] Jython updated on mac build slave
In-Reply-To:
References:
Message-ID:

On Thu, Jul 21, 2011 at 9:18 PM, Peter Cock wrote:
> On Thu, Jul 21, 2011 at 6:55 PM, Peter Cock wrote:
>> Hi all,
>>
>> If anyone is curious about the following unit test failure, ...
>> It works fine on C Python, but was failing under Jython 2.5.1.
>> The solution? Update to Jython 2.5.2
>
> ...
>
> Note the Windows buildslave is still on Jython 2.5.1 right now,
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Jython%202.5/builds/0/steps/shell/logs/stdio

That's updated now,

Peter

From p.j.a.cock at googlemail.com Fri Jul 22 17:36:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Jul 2011 18:36:46 +0100
Subject: [Biopython-dev] float('nan') fails on Python 2.5 on Windows (PAML)
In-Reply-To:
References:
Message-ID:

On Thu, Jul 21, 2011 at 9:48 PM, Peter Cock wrote:
> On Sun, Jun 5, 2011 at 10:51 PM, Peter wrote:
>> Hi all,
>>
>> As explained in PEP 754, prior to Python 2.6 float('inf'), float('-inf')
>> and also float('nan') were passed to the underlying C library, which
>> may or may not return the IEEE special floating point value for
>> infinity, minus infinity or nan. See:
>> http://www.python.org/dev/peps/pep-0754/
>>
>> This is the root cause of this unit test failure on Windows Python 2.5,
>> ...
>
> The related problem of float("nan") on Python 2.5 or older on
> Windows is the cause of this problem in the new PAML code too:
>
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/229/steps/shell/logs/stdio
>
> ValueError: invalid literal for float(): nan
>
> I guess _parse_codeml.py might need something like this:
>
> try:
>     float("nan")
>     _float = float
> except ValueError:
>     def _float(txt):
>         try:
>             return float(txt)
>         except ValueError, e:
>             if txt=="nan":
>                 return XXX
>             else:
>                 raise e
>
> And then use the nan safe _float function in the parser.
>
> Unless anyone has a nicer solution?
>
> Peter

I've committed something like that after testing on Windows,
https://github.com/biopython/biopython/commit/7539e9163839642ada24e1ebb9c3aff1bb25d573

Not very elegant, but it works.

Peter

From redmine at redmine.open-bio.org Fri Jul 22 17:43:49 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 22 Jul 2011 17:43:49 +0000
Subject: [Biopython-dev] [Biopython - Bug #3268] (New) Windows and Python 3 specific unicode problem in SeqIO / SeqXML
Message-ID:

Issue #3268 has been reported by Peter Cock.

----------------------------------------
Bug #3268: Windows and Python 3 specific unicode problem in SeqIO / SeqXML
https://redmine.open-bio.org/issues/3268

Author: Peter Cock
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version:
URL:

We're seeing unittest failures under Python 3.1 and 3.2 on Windows XP via the buildbot, e.g.
http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%203.1/builds/240/steps/shell/logs/stdio
or
http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%203.2/builds/100/steps/shell/logs/stdio

    ======================================================================
    FAIL: test_unicode_characters_desc (test_SeqIO_SeqXML.TestDetailedRead)
    Test special unicode characters in the description.
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "c:\repositories\BuildBot\win31\build\build\py3.1\Tests\test_SeqIO_SeqXML.py", line 55, in test_unicode_characters_desc
        self.assertEqual(self.records["rna"][2].description, "\u00E5\u00C5\u00FC\u00F6\u00D6\u00DF\u00F8\u00E4\u00A2\u00A3$\u20AC\u9999\u80A0")
    AssertionError: '??????????$???\xa0' != '????????$?\u9999\u80a0'
    ----------------------------------------------------------------------

This test is currently working on Linux and Mac OS X for Python 3.

There was a similar failure in Jython 2.5.1 (cross platform), now fixed in Jython 2.5.2, see:
http://lists.open-bio.org/pipermail/biopython-dev/2011-July/009044.html

Peter

From p.j.a.cock at googlemail.com Fri Jul 22 17:50:33 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Jul 2011 18:50:33 +0100
Subject: [Biopython-dev] Windows buildslave back online
Message-ID:

Hi all,

After a period where my Windows machine was being borrowed, it should now be back online as a buildslave again. Most of the Windows specific unit test failures that have developed during the last few weeks have now been fixed, except this one:

https://redmine.open-bio.org/issues/3268

Peter

P.S. If anyone has some suitable networked machines that could be left on as buildslaves, we'd like more - especially a 64 bit Windows box (which will be an interesting challenge to get Python, NumPy and Biopython compiling and running nicely in the first place). See also http://www.biopython.org/wiki/Continuous_integration

From tiagoantao at gmail.com Fri Jul 22 19:38:21 2011
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 22 Jul 2011 20:38:21 +0100
Subject: [Biopython-dev] Windows buildslave back online
In-Reply-To:
References:
Message-ID:

On Fri, Jul 22, 2011 at 6:50 PM, Peter Cock wrote:
> P.S. If anyone has some suitable networked machines that could
> be left on as buildslaves, we'd like more - especially a 64 bit
> Windows box (which will be an interesting challenge to get Python,
> NumPy and Biopython compiling and running nicely in the first place).
> See also http://www.biopython.org/wiki/Continuous_integration

I can put mine online (now things are more quiet and I have time again), but it would be up just a few times a week or so. Better than nothing?
From p.j.a.cock at googlemail.com Fri Jul 22 19:46:29 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Jul 2011 20:46:29 +0100
Subject: [Biopython-dev] Windows buildslave back online
In-Reply-To:
References:
Message-ID:

On Friday, July 22, 2011, Tiago Antão wrote:
> On Fri, Jul 22, 2011 at 6:50 PM, Peter Cock wrote:
>> P.S. If anyone has some suitable networked machines that could
>> be left on as buildslaves, we'd like more - especially a 64 bit
>> Windows box (which will be an interesting challenge to get Python,
>> NumPy and Biopython compiling and running nicely in the first place).
>> See also http://www.biopython.org/wiki/Continuous_integration
>
> I can put mine online (now things are more quiet and I have time
> again), but it would be up just a few times a week or so. Better than
> nothing?

Certainly much better than nothing - yes please.

Peter

From tiagoantao at gmail.com Sat Jul 23 02:52:56 2011
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sat, 23 Jul 2011 02:52:56 +0000
Subject: [Biopython-dev] Windows buildslave back online
In-Reply-To:
References:
Message-ID:

On Fri, Jul 22, 2011 at 5:50 PM, Peter Cock wrote:
> P.S. If anyone has some suitable networked machines that could
> be left on as buildslaves, we'd like more - especially a 64 bit
> Windows box (which will be an interesting challenge to get Python,
> NumPy and Biopython compiling and running nicely in the first place).
> See also http://www.biopython.org/wiki/Continuous_integration

I tried compiling a 64 bit version. It is a bit of hell.

1. There is a 64 bit Windows version of Python and NumPy. So far so good.
2. Python does not support mingw64. One needs to tweak the python-dev include files (there are several bugs/requests related to this).
3. Visual Studio Express does not generate 64 bit binaries; only the Pro version can do this (but see below).
4. distutils does not support recent versions of VS (i.e. not version 10, only version 9). It can work, but it requires tweaking the distutils source code.

So: no joy unless either (i) tweaking the Python compiler source code for mingw OR (ii) having the Pro version of VS (which I do not have - and is what Christoph Gohlke seems to be doing) OR ...

There is a way to generate a 64 bit version with VS Express + the MS SDK (all free), but it will require minor changes to setup.py (because of point 4 above).

I am now able to compile Biopython fully 64 bit with free (as in beer) tools. I just now have to streamline the process.

To sum it up, it requires: VS Express + MS SDK + minor changes to setup.py (subclassing distutils for VS 2010 Express + MS SDK).

Going to bed now, it's late around here. F#!$#!!! Windows!

Tiago

From redmine at redmine.open-bio.org Mon Jul 25 05:54:30 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 25 Jul 2011 05:54:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #2578] The GenBank SeqRecord parser does not record molecule type or if circular
References:
Message-ID:

Issue #2578 has been updated by Mark Diekhans.

Not having the molecule type is a fairly serious problem with the GenBank parser. The fact that it guesses the type when writing the record is corrupting data. The priority of fixing this should be increased. On top of that, at the very least this should be documented in the class, since it's a huge waste of time looking for where it is stored in the SeqRecord.
---------------------------------------- Bug #2578: The GenBank SeqRecord parser does not record molecule type or if circular https://redmine.open-bio.org/issues/2578 Author: Peter Cock Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: 1.47 URL: Filing this bug after discussion on the mailing list, where the issue was raised by Chris Lasher: http://lists.open-bio.org/pipermail/biopython/2008-September/004474.html http://lists.open-bio.org/pipermail/biopython/2008-September/004475.html http://lists.open-bio.org/pipermail/biopython/2008-September/004476.html The LOCUS line at the start of a GenBank record can record the molecule type (DNA, RNA, mRNA, protein etc) and also if the sequence is linear or circular, e.g. LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008 Currently Bio.SeqIO (and Bio.GenBank.FeatureParser if called directly) do not record these two bits of information in the SeqRecord. Bio.SeqIO uses the Bio.GenBank.FeatureParser, which gets passed this information from the Scanner via the residue_type event. This is a combined lump of data containing both the sequence type (DNA, RNA etc) and if it is linear or circular. It is currently only used to determine the Seq alphabet, and has never been recorded. So in addition to not recording if the LOCUS line said the sequence was circular, if the LOCUS line contained cDNA, mRNA, ... this fine detail is also currently lost in the SeqRecord representation. On the other hand, the Bio.GenBank.RecordParser stores all this as the record's residue_type property (a single combined field, presumably reflecting the layout of early GenBank files). It would be a logical improvement to record the sequence data (molecule type and if circular) in the SeqRecord's annotations dictionary - perhaps as two fields but we'd need to check if that would be straight forward for EMBL files too. 
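
To make the proposal concrete, the sketch below pulls the molecule type and the linear/circular flag out of the example LOCUS line quoted above and stores them as two separate fields in a dictionary. It is an illustration in Python 3 under stated assumptions: the function name and the keys `molecule_type` and `topology` are hypothetical, and splitting on whitespace is a simplification that would not cope with protein records or LOCUS lines that omit the molecule type.

```python
def parse_locus_line(line):
    """Extract molecule type and topology from a nucleotide LOCUS line.

    Simplified, whitespace-based illustration; the dictionary keys are
    assumptions made for this sketch.
    """
    fields = line.split()
    if len(fields) < 5 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("Not a nucleotide LOCUS line: %r" % line)
    annotations = {"molecule_type": fields[4], "topology": None}
    # The linear/circular token is optional in the LOCUS line.
    if len(fields) > 5 and fields[5] in ("linear", "circular"):
        annotations["topology"] = fields[5]
    return annotations


example = "LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008"
print(parse_locus_line(example))
# prints {'molecule_type': 'DNA', 'topology': 'circular'}
```

Keeping the two values in separate fields means finer detail such as cDNA or mRNA survives independently of whether a topology was given.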
Alternatively, if Biopython included a native CircularSeq object, we could use that explicitly when the sequence is declared as circular. This might be considered a little surprising though.

From redmine at redmine.open-bio.org Mon Jul 25 08:32:11 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 25 Jul 2011 08:32:11 +0000
Subject: [Biopython-dev] [Biopython - Bug #2578] The GenBank SeqRecord parser does not record molecule type or if circular
References:
Message-ID:

Issue #2578 has been updated by Peter Cock.

Regarding how is_circular is/should be stored in BioSQL,
http://lists.open-bio.org/pipermail/biosql-l/2011-July/001774.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-July/035433.html

From w.arindrarto at gmail.com Tue Jul 26 09:08:38 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 26 Jul 2011 11:08:38 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To:
References:
Message-ID:

Hi everyone,

A few weeks ago I wrote about my interest in making Biopython able to parse the Abi trace file. I've finished writing the SeqIO plugin and some tests. I thought this would be useful to a number of people, so I was wondering about what I should do after this (how will my code be reviewed?, should I just go with a pull request?).
Of course, there are things that I might have missed when writing the plugin, so feel free to criticize/comment :)! Here's the SeqIO plugin: https://github.com/bow/biopython/blob/seqio-abi/Bio/SeqIO/AbiIO.py Looking forward to the reply, --- Wibowo Arindrarto (bow) http://bow.web.id On Thu, Jul 7, 2011 at 03:16, Wibowo Arindrarto wrote: > Hi everyone, > > This is my first post in the dev mailing list, so greetings :). > > I've been using Biopython for a few months in total now (in a period of > ~1.5 years) and before that Python for ~0.5 years. Most of the time, I'm > working with Sanger sequencing results and at one point I was a bit > disappointed that I couldn't find any (bio)python module for reading .ab1 > files. That compelled me to write my first python module that reads those > files and extracts the useful information out of them. In the process I > became more interested in python itself and finally thought it might be neat > if biopython could do this, built-in. > > So I forked the main repo, made some changes to my module so it became a > parser for the SeqIO submodule that reads Abi files. It's not cooked 100% > yet, but if anyone is interested in seeing/commenting/criticizing the code, > I'd appreciate that very much. Here's the link: > https://github.com/bow/biopython/blob/seqio-abif/Bio/SeqIO/AbiIO.py > > Some features to note: > - I've included a method to trim the sequence based on its quality scores > - the parser does not extract the entire metadata of the trace files, only > ones I consider important for further analysis/annotations. Of course, this > could be changed if the community think some other data should be > included/excluded > - For those of you already familiar with the Abi format, I deliberately > chose the 'PBAS2' tag for the sequence information, which is the unedited > bases after base-calling by the sequencing program. 
> > Some things that I'm doing right now: > - writing unit tests > - making sure it's compatible with Python 3 (thanks Peter :)! ) > - completing the docs > - making sure it's compatible with most Abi format versions. Currently I've > only tested it with files from the 310, 3100, and 3700 machines. Does anyone > have some other versions that I can test this with? > > As I understand as well, this is not the only Sanger sequencing trace > format out there (e.g. SCF is another). I would be glad to learn more and > write a parser for the SCF format as well. The problem is, I'm not sure this > would be useful in the long run as I've personally never seen anyone use an > SCF file and so I've never had a chance to play around with one. If anyone > has an SCF file lying around and thinks SCF support would be beneficial, I'd > be happy to accept them :). > > I guess that's all for now. Thanks for reading! > > --- > Wibowo Arindrarto (bow) > http://bow.web.id > From p.j.a.cock at googlemail.com Tue Jul 26 09:59:33 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 26 Jul 2011 10:59:33 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Tue, Jul 26, 2011 at 10:08 AM, Wibowo Arindrarto wrote: > Hi everyone, > > A few weeks ago I wrote about my interest in making Biopython able to parse > the Abi trace file. I've finished writing the SeqIO plugin and some tests. I > thought this would be useful to a number of people, so I was wondering about > what I should do after this (how will my code be reviewed?, should I just go > with a pull request?). Of course, there are things that I might have missed > when writing the plugin, so feel free to criticize/comment :)! > > Here's the SeqIO plugin: > https://github.com/bow/biopython/blob/seqio-abi/Bio/SeqIO/AbiIO.py > > Looking forward to the reply, Hi Wibowo, You could send a pull request if you wanted, but this email to the dev list is enough. 
I probably wouldn't just merge it - I prefer to rebase to the current master first to get a clean history (especially if there are already several merges in the branch history). I will review your work. In particular I plan to cross test with EMBOSS seqret to verify you produce the same sequence and the same PHRED quality scores - this could be done in test_Emboss.py Thanks for your work! Peter From w.arindrarto at gmail.com Tue Jul 26 10:37:23 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 26 Jul 2011 12:37:23 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, You're welcome :). In that case, I'll just stick to submitting through this mailing list. I'm looking forward to your review! --- Wibowo Arindrarto (bow) http://bow.web.id On Tue, Jul 26, 2011 at 11:59, Peter Cock wrote: > On Tue, Jul 26, 2011 at 10:08 AM, Wibowo Arindrarto > wrote: > > Hi everyone, > > > > A few weeks ago I wrote about my interest in making Biopython able to > parse > > the Abi trace file. I've finished writing the SeqIO plugin and some > tests. I > > thought this would be useful to a number of people, so I was wondering > about > > what I should do after this (how will my code be reviewed?, should I just > go > > with a pull request?). Of course, there are things that I might have > missed > > when writing the plugin, so feel free to criticize/comment :)! > > > > Here's the SeqIO plugin: > > https://github.com/bow/biopython/blob/seqio-abi/Bio/SeqIO/AbiIO.py > > > > Looking forward to the reply, > > Hi Wibowo, > > You could send a pull request if you wanted, but this email to the dev list > is enough. I probably wouldn't just merge it - I prefer to rebase to the > current master first to get a clean history (especially if there are > already > several merges in the branch history). > > I will review your work. 
> > In particular I plan to cross test with EMBOSS seqret to verify you produce > the same sequence and the same PHRED quality scores - this could be > done in test_Emboss.py > > Thanks for your work! > > Peter > From peter at maubp.freeserve.co.uk Tue Jul 26 18:02:23 2011 From: peter at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Jul 2011 19:02:23 +0100 Subject: [Biopython-dev] Fwd: [blast-announce] New SOAP based BLAST service In-Reply-To: References: Message-ID: FYI ---------- Forwarded message ---------- From: Mcginnis, Scott (NIH/NLM/NCBI) [E] Date: Tue, Jul 26, 2011 at 5:12 PM Subject: [blast-announce] New SOAP based BLAST service To: NLM/NCBI List blast-announce A SOAP based BLAST service is available. This service makes use of the Simple Object Access Protocol to submit and retrieve searches with the NCBI BLAST web server. The service can also query the server for other information. A simple ("Lite") interface is available that should be suitable for most projects. Documentation and links to the WSDL and sample clients are at http://www.ncbi.nlm.nih.gov/books/NBK55699/ From redmine at redmine.open-bio.org Tue Jul 26 20:16:43 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 26 Jul 2011 20:16:43 +0000 Subject: [Biopython-dev] [Biopython - Feature #3271] (New) Updates to PDBList.py- downloading PDB structures Message-ID: Issue #3271 has been reported by David Cain. ---------------------------------------- Feature #3271: Updates to PDBList.py- downloading PDB structures https://redmine.open-bio.org/issues/3271 Author: David Cain Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: 1.57 URL: https://github.com/DavidCain/biopython PDBList.py is somewhat out of date: it has support for .Z compression, but the ftp://ftp.wwpdb.org/ server only has .gz archives. It also relies on a system utility to decompress the downloaded archives.
The default, gunzip, is effective enough for posix systems, but Windows requires the installation of a command line tool, such as 7zip. I've rewritten it to use the gzip module, and to ignore the compression parameter (as all files are .gz anyway). I left the 'uncompress' and 'compression' parameters for backwards compatibility. I've also made it so that the user can override and use a system decompression tool if desired. I'm not sure if this is the best way to handle it, as the retrieve_pdb_file() function would work just fine removing support for system decompression and the 'compression' parameter. Also, when calling retrieve_pdb_file() repeatedly, urllib can generate too many FTP connections and crash (for example) a script attempting to download some structures in succession. Updating to urllib2 removes this issue. My GitHub branch is linked, and the only file I've modified (PDBList.py) is attached. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From tiagoantao at gmail.com Wed Jul 27 03:19:48 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 27 Jul 2011 03:19:48 +0000 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: Hi, Just some developments on the win 64 bit front: It now compiles and mostly tests (I say "mostly" because I do not have all external binaries installed, neither reportlab). I would like to enquire about the possibility of adding a (default) option to windows linkage. By default, link.exe generates manifest files, but on VS 2010 in some cases it does not (a somewhat related discussion can be found here http://bugs.python.org/issue4431 ).
I would like to try and add the workaround described on that bug page, namely extra_link_args=["/MANIFEST"] to all Extensions on setup.py, for example

    EXTENSIONS.append(
        Extension('Bio.Cluster.cluster',
                  ['Bio/Cluster/clustermodule.c',
                   'Bio/Cluster/cluster.c'],
                  include_dirs=[numpy_include_dir],
                  extra_link_args=["/MANIFEST"]
                  ))

[Note the extra_link_args] I am far from being a windows specialist, but from what I have read, this is harmless and would sort out the only issue that I have with compiling on VS2010 + Win SDK to generate native 64-bit binaries. Tiago From p.j.a.cock at googlemail.com Wed Jul 27 08:02:31 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Jul 2011 09:02:31 +0100 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: 2011/7/27 Tiago Antão : > Hi, > > Just some developments on the win 64 bit front: It now compiles and > mostly tests (I say "mostly" because I do not have all external > binaries installed, neither reportlab). > > I would like to enquire about the possibility of adding a (default) > option to windows linkage. By default, link.exe generates manifest > files, but on VS 2010 in some cases it does not (a somewhat related > discussion can be found here http://bugs.python.org/issue4431 ). I > would like to try and add the workaround described on that bug page, > namely extra_link_args=["/MANIFEST"] to all Extensions on setup.py, > for example
>     EXTENSIONS.append(
>         Extension('Bio.Cluster.cluster',
>                   ['Bio/Cluster/clustermodule.c',
>                    'Bio/Cluster/cluster.c'],
>                   include_dirs=[numpy_include_dir],
>                   extra_link_args=["/MANIFEST"]
>                   ))
> > [Note the extra_link_args] > > I am far from being a windows specialist, but from what I have read, > this is harmless and would sort out the only issue that I have with > compiling on VS2010 + Win SDK to generate native 64-bit binaries.
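A sketch of how the workaround quoted above could be limited to Windows builds, written for Python 3. The helper name windows_link_args is hypothetical and is not taken from Biopython's setup.py. The sketch only checks the platform and does not tell MSVC apart from mingw, which would likely need separate handling.

```python
import sys

def windows_link_args(platform=None):
    """Return extra linker arguments, adding /MANIFEST only on Windows."""
    if platform is None:
        platform = sys.platform
    # sys.platform is "win32" on both 32-bit and 64-bit Windows interpreters.
    if platform == "win32":
        return ["/MANIFEST"]
    return []

# Intended use inside setup.py (shown as a comment so the sketch stays self-contained):
#   Extension('Bio.Cluster.cluster',
#             ['Bio/Cluster/clustermodule.c', 'Bio/Cluster/cluster.c'],
#             include_dirs=[numpy_include_dir],
#             extra_link_args=windows_link_args())
```

On other platforms the helper returns an empty list, so the Extension definitions there stay unchanged.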
> > Tiago Can you make it conditional on Windows? Peter From tiagoantao at gmail.com Wed Jul 27 11:17:39 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 27 Jul 2011 11:17:39 +0000 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: 2011/7/27 Peter Cock : > Can you make it conditional on Windows? Silly me, just forgot all other platforms. Furthermore, the 32-bit compiler in Windows is VS? Because if it is mingw then something extra is needed. I will work on a new git fork to change the setup.py only to accommodate windows 64. Tiago From tiagoantao at gmail.com Wed Jul 27 11:59:42 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 27 Jul 2011 11:59:42 +0000 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: One final question: If VS Express is used to compile the official biopython, what is the version? 2008? Tiago 2011/7/27 Tiago Antão : > 2011/7/27 Peter Cock : >> Can you make it conditional on Windows? > > Silly me, just forgot all other platforms. Furthermore, the 32-bit > compiler in Windows is VS? Because if it is mingw then something extra > is needed. > I will work on a new git fork to change the setup.py only to > accommodate windows 64. > > Tiago > -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Wed Jul 27 12:17:41 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Jul 2011 13:17:41 +0100 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: 2011/7/27 Tiago Antão : > 2011/7/27 Peter Cock : >> Can you make it conditional on Windows? > > Silly me, just forgot all other platforms. Furthermore, the 32-bit > compiler in Windows is VS? Because if it is mingw then something extra > is needed. > I will work on a new git fork to change the setup.py only to > accommodate windows 64.
> > Tiago Actually we (well - my Windows machine since that is the one that has done all the recent releases) are using a mixture depending on the version of Python. According to my old email from Dec 2010, since I don't have the machine in front of me right now: >> We're using mingw32 from Cygwin on older versions of Python, >> and I think Python 2.6 onwards I'm using Microsoft's free VC++ >> 2008 Express Edition which was downloaded from >> http://www.microsoft.com/express/download/ See: http://lists.open-bio.org/pipermail/biopython-dev/2010-December/008582.html It would be worth asking anyone else on the dev list who has previously compiled and built installers on Windows (e.g. Michael) to also test any changes. I might one day be able to set up a 64bit Windows virtual machine, but right now I don't have time and it would require finding the right person in IT for the install media and licence etc. Peter From tiagoantao at gmail.com Wed Jul 27 12:26:08 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 27 Jul 2011 13:26:08 +0100 Subject: [Biopython-dev] Windows buildslave back online In-Reply-To: References: Message-ID: I am using VS Express 2010 plus the Windows SDK. It seems to work. It is possible to generate 64 bit binaries on 32 bit machines, BTW (with cross compiling). I have made some very simple changes to setup.py : https://github.com/tiagoantao/biopython/blob/master/setup.py They should only impact Windows with 64-bit interpreters. Even if they do not work (which they seem to do), they should only break 64-bit win interpreters (which are not functional anyway). Tiago 2011/7/27 Peter Cock : > 2011/7/27 Tiago Antão : >> 2011/7/27 Peter Cock : >>> Can you make it conditional on Windows? >> >> Silly me, just forgot all other platforms. Furthermore, the 32-bit >> compiler in Windows is VS? Because if it is mingw then something extra >> is needed.
>> I will work on a new git fork to change the setup.py only to >> accommodate windows 64. >> >> Tiago > > Actually we (well - my Windows machine since that is the one > that has done all the recent releases) are using a mixture > depending on the version of Python. > > According to my old email from Dec 2010, since I don't have > the machine in front of me right now: >>> We're using mingw32 from Cygwin on older versions of Python, >>> and I think Python 2.6 onwards I'm using Microsoft's free VC++ >>> 2008 Express Edition which was downloaded from >>> http://www.microsoft.com/express/download/ > > See: http://lists.open-bio.org/pipermail/biopython-dev/2010-December/008582.html > > It would be worth asking anyone else on the dev list who has > previously compiled and built installers on Windows (e.g. Michael) > to also test any changes. > > I might one day be able to set up a 64bit Windows virtual machine, > but right now I don't have time and it would require finding the > right person in IT for the install media and licence etc. > > Peter > -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Thu Jul 28 23:01:29 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 29 Jul 2011 00:01:29 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Tue, Jul 26, 2011 at 11:37 AM, Wibowo Arindrarto wrote: > > I'm looking forward to your review! > Hi Bow, I made a start tonight, https://github.com/peterjc/biopython/tree/seqio-abi I haven't added the ABI files to test_SeqIO.py yet, there is some alphabet issue to check there. I noticed that EMBOSS seqret (at least the old copy I had installed on my laptop, v6.1.0 I think) was able to give ABI records identifiers (rather than the hack of using the filename).
Peter From w.arindrarto at gmail.com Fri Jul 29 08:07:43 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 29 Jul 2011 10:07:43 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, I made a local branch tracking your seqio-abi tree. I agree with most of the changes, but I think I'm a bit lost on the filename part. My intention is to use the filename of the Abi file as the ID for the SeqRecord, instead of the stored record identifier returned by seqret. The reason is that it's easier to see which Abi file a SeqRecord came from by looking at the ID (or output file name, in case the SeqRecord is written as another format), since the record identifier data is not readily available. I chose to store the record identifier in SeqRecord.name (sample_id), so users can still cross check if they want to. My 'except' block (AbiIO.py:83) is a bad way to deal with '.name' being absent, now that I think of it. But do you think instead of 'None', maybe we could use 'file_id = str(handle)' or 'file_id = self.name'? And lastly, could you clarify what you mean by alphabet issue on test_SeqIO.py? Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Fri, Jul 29, 2011 at 01:01, Peter Cock wrote: > On Tue, Jul 26, 2011 at 11:37 AM, Wibowo Arindrarto wrote: > > > > I'm looking forward to your review! > > > > Hi Bow, > > I made a start tonight, > https://github.com/peterjc/biopython/tree/seqio-abi > > I haven't added the ABI files to test_SeqIO.py yet, > there is some alphabet issue to check there. > > I noticed that EMBOSS seqret (at least the old > copy I had installed on my laptop, v6.1.0 I think) > was able to give ABI records identifiers (rather > than the hack of using the filename).
> > Peter > From p.j.a.cock at googlemail.com Fri Jul 29 09:39:20 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 29 Jul 2011 10:39:20 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Fri, Jul 29, 2011 at 9:07 AM, Wibowo Arindrarto wrote: > Hi Peter, > I made a local branch tracking your seqio-abi tree. I agree to most of the > changes, but I think I'm a bit lost on the filename part. > My intention is to use the filename of the Abi file as the ID for the > SeqRecord, instead of the stored records identified returned by seqret. The > reason is because it's easier to see which Abi file a SeqRecord came from by > looking at the ID (or output file name, in case the SeqRecord is written as > another format), since the records identifier data is not readily available. > I chose to store the records identifier in SeqRecord.name (sample_id), so > users can still cross check if they want to. > My 'except' block (AbiIO.py:83) is a bad way to deal with '.name' being > absent, now that I think of it. But do you think instead of 'None', maybe we > could use 'file_id = str(handle)' or 'file_id = self.name'? There may not be a filename - the ABI file might be piped from stdin, or supplied as a StringIO handle, or a network handle. So using the filename as the primary identifier seems wrong to me. I would want the same ID regardless of how the file was loaded, or what the name was. Using the filename as the SeqRecord name (if available, "" if not) would be OK with me. The other justification for using the ID in the file as the SeqRecord's id is consistency with EMBOSS. We should also check how BioPerl does it - but I'm not sure if I have all the dependencies installed. Also, is it possible to concatenate multiple ABI files together? > And lastly, could you clarify what you mean by alphabet issue on > test_SeqIO.py? 
Add the three good ABI test files to the list in test_SeqIO.py and run the test; you'll get a complaint about the alphabet handling. I didn't have time to look into what exactly was going on yet. Peter From w.arindrarto at gmail.com Fri Jul 29 11:34:12 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 29 Jul 2011 13:34:12 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, Thanks for explaining. I understand why we should stick to the stored sequence id. In this case, we can use the filename as SeqRecord.name as well. Regarding BioPerl, I don't have it installed myself -- but I took a quick look at their source and it seems they also use the stored sequence ID as their main identifier instead of the filename. If the stored sequence ID is not present, it's "(unknown)" in their case. As for concatenation, I don't think it's possible. The official spec from ABI does not mention anything about combining ABI records. Plus, the file structure itself does not allow multiple sequences to be stored. I'll look at the test_SeqIO.py over the weekend. I think it'll have something to do with some ambiguous DNA base stored in the abi files. Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Fri, Jul 29, 2011 at 11:39, Peter Cock wrote: > On Fri, Jul 29, 2011 at 9:07 AM, Wibowo Arindrarto > wrote: > > Hi Peter, > > I made a local branch tracking your seqio-abi tree. I agree to most of > the > > changes, but I think I'm a bit lost on the filename part. > > My intention is to use the filename of the Abi file as the ID for the > > SeqRecord, instead of the stored records identified returned by seqret. > The > > reason is because it's easier to see which Abi file a SeqRecord came from > by > > looking at the ID (or output file name, in case the SeqRecord is written > as > > another format), since the records identifier data is not readily > available.
> > I chose to store the records identifier in SeqRecord.name (sample_id), so > > users can still cross check if they want to. > > My 'except' block (AbiIO.py:83) is a bad way to deal with '.name' being > > absent, now that I think of it. But do you think instead of 'None', maybe > we > > could use 'file_id = str(handle)' or 'file_id = self.name'? > > There may not be a filename - the ABI file might be piped from stdin, > or supplied as a StringIO handle, or a network handle. So using the > filename as the primary identifier seems wrong to me. I would want > the same ID regardless of how the file was loaded, or what the name > was. Using the filename as the SeqRecord name (if available, "" if > not) would be OK with me. > > The other justification for using the ID in the file as the SeqRecord's id > is consistency with EMBOSS. We should also check how BioPerl does > it - but I'm not sure if I have all the dependencies installed. > > Also, is it possible to concatenate multiple ABI files together? > > > And lastly, could you clarify what you mean by alphabet issue on > > test_SeqIO.py? > > Add the three good ABI test files to the list in test_SeqIO.py and > run the test, you'll get a complaint about the alphabet handling. > I didn't have time to look into what exactly was going on yet. > > Peter > From p.j.a.cock at googlemail.com Fri Jul 29 12:14:06 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 29 Jul 2011 13:14:06 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Fri, Jul 29, 2011 at 12:34 PM, Wibowo Arindrarto wrote: > Hi Peter, > Thanks for explaining. I understand why we should stick to the stored > sequence id. In this case, we can use the filename as SeqRecord.name as > well. Regarding BioPerl, I don't have it installed myself -- but I took a > quick look at their source and it seems they also use the stored sequence ID > as their main identifier instead of the filename. 
If the stored sequence ID > is not present, it's "(unknown)" in their case. OK good, that means Biopython, BioPerl and EMBOSS should be consistent :) > As for concatenation, I don't think it's possible. The official spec from > ABI does not mention anything about combining ABI records. Plus, the file > structure itself does not allow multiple sequence to be stored. OK good, I didn't think it was allowed but wanted to check. > I'll look on the test_SeqIO.py over the weekend. I think it'll have > something to do with some ambiguous dna base stored in the abi files. > Regards, Some of the alphabet stuff is a bit nasty - so please feel free to ask or get me to help. Peter From p.j.a.cock at googlemail.com Fri Jul 29 16:20:23 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 29 Jul 2011 17:20:23 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi again, I had a bit of time this afternoon so I looked at this. On Fri, Jul 29, 2011 at 1:14 PM, Peter Cock wrote: > On Fri, Jul 29, 2011 at 12:34 PM, Wibowo Arindrarto wrote: >> Hi Peter, >> Thanks for explaining. I understand why we should stick to the stored >> sequence id. In this case, we can use the filename as SeqRecord.name as >> well. Regarding BioPerl, I don't have it installed myself -- but I took a >> quick look at their source and it seems they also use the stored sequence ID >> as their main identifier instead of the filename. If the stored sequence ID >> is not present, it's "(unknown)" in their case. > > OK good, that means Biopython, BioPerl and EMBOSS should be > consistent :) I've made that switch, >> I'll look on the test_SeqIO.py over the weekend. I think it'll have >> something to do with some ambiguous dna base stored in the abi files. >> Regards, > > Some of the alphabet stuff is a bit nasty - so please feel free to ask > or get me to help. I've done enough to get the test_SeqIO.py unit test to pass. 
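The identifier convention agreed in this thread (stored sample ID as the record's id, filename only as the name, and an empty name when no filename exists) can be sketched in Python 3 as follows. The function name is hypothetical, and reusing BioPerl's "(unknown)" as the fallback is an assumption for illustration rather than something decided for Biopython here.

```python
import os

def choose_id_and_name(sample_id, handle, fallback_id="(unknown)"):
    """Pick (id, name) for a trace record from the stored ID and the handle."""
    record_id = sample_id.strip() if sample_id else ""
    if not record_id:
        # Assumption: mirror the "(unknown)" fallback mentioned for BioPerl.
        record_id = fallback_id
    # StringIO/BytesIO objects have no .name; pipes report names like "<stdin>".
    file_name = getattr(handle, "name", "")
    if isinstance(file_name, str) and file_name and not file_name.startswith("<"):
        name = os.path.splitext(os.path.basename(file_name))[0]
    else:
        name = ""
    return record_id, name
```

With this shape the id stays the same however the file was loaded, and only the name varies with the handle.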
We probably need a check (like in SFF) to check the user hasn't given a handle opened in text mode. That should probably have a unit test too. I still haven't cross checked the sequence and PHRED scores from your code and EMBOSS. Anyway - I'll leave the code for you to work on for now... Peter From w.arindrarto at gmail.com Sat Jul 30 07:42:04 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 30 Jul 2011 09:42:04 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, I've done some more improvements to the code: - I've written the check and unittest for the file handle mode. I've set it so that abi file has to be opened in 'rb' mode, otherwise it'll return an error. While it's ok to open in 'r' mode in python 2 in Linux, it has to be specified as 'rb' in Windows and/or Python 3 for the file to be read correctly. So I decided forcing it to 'rb' is the best. Because of this, I changed 'test_SeqIO.py:503' to include the mode argument when opening. - I've also checked against test_Emboss.py for seqret output, after including the abi format in it. My EMBOSS version is 6.4.0. There was a slight problem with this testing, since for some reason the ID returned by seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS installation, since when I previously tested it against 6.1.0, the ID was correct (although the qual values not, so I had to upgrade). As expected, if I comment out the code that tests for sequence id ('test_Emboss.py:168-172') the tests pass. Maybe you could try testing it as well and see if EMBOSS also returns the default id instead of the sample name? - Finally, I did some small cosmetic changes to the code (typos, etc). All changes have been pushed to my github fork. Now I still have time for the weekend to improve whatever needs to be improved :). 
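A minimal sketch of the kind of handle-mode check discussed above, written for Python 3. It is not the code from the seqio-abi branch, and the function name and error message are made up for illustration. It rejects text-mode handles by type first, then falls back to inspecting a string mode attribute when one exists.

```python
import io

def require_binary_handle(handle):
    """Raise ValueError unless the handle was opened in binary mode."""
    # open(path, "r") and io.StringIO both yield text-mode objects.
    if isinstance(handle, io.TextIOBase):
        raise ValueError("ABI files must be opened in binary mode, e.g. 'rb'.")
    mode = getattr(handle, "mode", None)
    # Some file-like objects expose a non-string mode, so only check strings.
    if isinstance(mode, str) and "b" not in mode:
        raise ValueError("ABI files must be opened in binary mode, e.g. 'rb'.")
    return handle
```

Handles without a mode attribute, such as io.BytesIO, pass the check, which keeps in-memory and network handles usable.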

Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id

On Fri, Jul 29, 2011 at 18:20, Peter Cock wrote:

> Hi again,
>
> I had a bit of time this afternoon so I looked at this.
>
> On Fri, Jul 29, 2011 at 1:14 PM, Peter Cock wrote:
> > On Fri, Jul 29, 2011 at 12:34 PM, Wibowo Arindrarto wrote:
> >> Hi Peter,
> >> Thanks for explaining. I understand why we should stick to the stored
> >> sequence id. In this case, we can use the filename as SeqRecord.name as
> >> well. Regarding BioPerl, I don't have it installed myself -- but I took a
> >> quick look at their source and it seems they also use the stored sequence ID
> >> as their main identifier instead of the filename. If the stored sequence ID
> >> is not present, it's "(unknown)" in their case.
> >
> > OK good, that means Biopython, BioPerl and EMBOSS should be
> > consistent :)
>
> I've made that switch.
>
> >> I'll look at test_SeqIO.py over the weekend. I think it'll have
> >> something to do with some ambiguous DNA base stored in the ABI files.
> >> Regards,
> >
> > Some of the alphabet stuff is a bit nasty - so please feel free to ask
> > or get me to help.
>
> I've done enough to get the test_SeqIO.py unit test to pass.
>
> We probably need a check (like in SFF) to ensure the user hasn't given
> a handle opened in text mode. That should probably have a unit test
> too.
>
> I still haven't cross-checked the sequence and PHRED scores from
> your code and EMBOSS.
>
> Anyway - I'll leave the code for you to work on for now...
>
> Peter
>

From redmine at redmine.open-bio.org Sun Jul 31 20:22:05 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 31 Jul 2011 20:22:05 +0000
Subject: [Biopython-dev] [Biopython - Feature #3271] Updates to PDBList.py- downloading PDB structures
References:
Message-ID:

Issue #3271 has been updated by Eric Talevich.

Hi David,

Thanks for doing this. Overall I agree with your solution.
I peppered your proposed fix with review comments on GitHub:
https://github.com/DavidCain/biopython/commit/e6eef7e2a8117b6de4e9fdea3b4bd77575d383cf

Once you've looked at it again, can you submit your pdb-fixes branch as a
pull request on GitHub? (If not, no worries, I can cherry-pick it. Just let
us know when you're ready.)

-Eric
----------------------------------------
Feature #3271: Updates to PDBList.py- downloading PDB structures
https://redmine.open-bio.org/issues/3271

Author: David Cain
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category:
Target version: 1.57
URL: https://github.com/DavidCain/biopython

PDBList.py is somewhat out of date: it has support for .Z compression, but
the ftp://ftp.wwpdb.org/ server only has .gz archives. It also relies on a
system utility to decompress the downloaded archives. The default, gunzip,
is effective enough for POSIX systems, but Windows requires the installation
of a command line tool, such as 7zip.

I've rewritten it to use the gzip module, and to ignore the compression
parameter (as all files are .gz anyway). I left the 'uncompress' and
'compression' parameters for backwards compatibility. I've also made it so
that the user can override this and use a system decompression tool if
desired. I'm not sure if this is the best way to handle it, as the
retrieve_pdb_file() function would work just fine without support for
system decompression and the 'compression' parameter.

Also, when calling retrieve_pdb_file() repeatedly, urllib can generate too
many FTP connections and crash (for example) a script attempting to download
some structures in succession. Updating to urllib2 removes this issue.

My GitHub branch is linked, and the only file I've modified (PDBList.py) is
attached.

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org
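
The change described in the issue above, decompressing a downloaded .gz archive with the standard gzip module rather than a system gunzip call, might look roughly like this. It is a sketch under stated assumptions, not the code from the linked branch: the helper name `gunzip_file` and its arguments are hypothetical.

```python
import gzip
import shutil


def gunzip_file(gz_path, out_path):
    """Decompress gz_path into out_path using only the standard library.

    Hypothetical helper for illustration; not part of PDBList.py.
    """
    # gzip.open reads the compressed stream; copyfileobj copies it in
    # chunks so the whole file never has to fit in memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Because this relies only on modules shipped with Python, it behaves the same on Windows and POSIX systems, which is the portability concern the issue raises.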