From w.arindrarto at gmail.com Thu Nov 1 04:19:58 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 1 Nov 2012 09:19:58 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
Message-ID:

Hi Kai, Michiel,

(I hope this gets through to the mailing list. I'm CC-ing several
people in the discussion as well, just in case).

I've made a new branch based on Kai's SearchIO rebase here:
https://github.com/bow/biopython/tree/searchio-rebase, with the
following important changes:

>>Does anyone have preference between '.acc' or '.accession'? If not, I
>>can change the current '.acc' into '.accession'.
>
> I would prefer .accession for clarity.

1. All accession attributes now use the 'accession' name
(https://github.com/bow/biopython/commit/002b08df91040e6bcf3f0dd3d087b3d378005632).
There's a similar attribute from blast-tab, which is the accession
number and its version. This has also been renamed from 'acc_ver' to
'accession_version'. The docs have been updated accordingly.

> See the attached hmmpfam output. You'll notice that the domain table
> is not in the order of the hit table. As I'd like to preserve the
> order of the hit table, the current setup of the API forces me to
> either repeatedly parse the domain annotations until I find the
> correct domain annotations for my hit, or to create the hits in the
> order of the domain annotation table and then reshuffle them to make
> sure they're in the order of the hit table.
>
> If I could just create "empty" hit objects when parsing the hit table,
> I could easily preserve the order of the hits but still add the hsps
> as I parse them.

2. Regarding the Hit object API change, I've changed it so that Hit
objects can now be created without any HSPs
(https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4).
However, per my explanation about keeping as few places as possible to
store the same value (in this case the hit and query ID and
description), the empty Hit object will raise errors if any of these
attributes are accessed. Setting and getting these attributes will
only work if there is at least one HSP in the Hit. Other Hit
functions, like append, should work OK as long as they don't involve
accessing these attributes. I think this will allow parsing of file
formats like HMMER2 plain text while maintaining the attribute storage
constraint.

Hope these help :).

regards,
Bow

From kai.blin at biotech.uni-tuebingen.de Thu Nov 1 05:10:11 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 01 Nov 2012 10:10:11 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To:
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
Message-ID: <50923C73.8060609@biotech.uni-tuebingen.de>

On 2012-11-01 09:19, Wibowo Arindrarto wrote:

Hi Bow,

> 2. Regarding the Hit object API change, I've changed it so that Hit
> objects can now be created without any HSPs
> (https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4).
> However, per my explanation about keeping as few places as possible to
> store the same value (in this case the hit and query ID and
> description), the empty Hit object will raise errors if any of these
> attributes are accessed. Setting and getting these attributes will
> only work if there is at least one HSP in the Hit. Other Hit
> functions, like append, should work OK as long as they don't involve
> accessing these attributes. I think this will allow parsing of file
> formats like HMMER2 plain text while maintaining the attribute storage
> constraint.

I totally agree the Hit object isn't valid until it has at least one
HSP. Thanks for that change.

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From redmine at redmine.open-bio.org Thu Nov 1 06:48:11 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 1 Nov 2012 10:48:11 +0000
Subject: [Biopython-dev] [Biopython - Bug #3297] (Rejected) newline added in quated features
References:
Message-ID:

Issue #3297 has been updated by Peter Cock.

Status changed from New to Rejected

Was this really filed a year ago or is that an oddity in RedMine?
All the discussion is in the last day...

This to me is a bug in the GenBank data, rather than this:
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"
the data should have been line-split in a more sensible place, e.g.
                     /product="Glutamate synthase [NADPH] small chain (EC
                     1.4.1.13)"
In any case, the suggested fix is inappropriate for two reasons.
First, as noted by Paul, it would remove the white space between
words (the typical case). Second, the GenBank parser uses a
scanner/consumer, with the GenBank-specific consumer attempting
to closely model the underlying data (and in this case keep the
newlines as given) while the SeqRecord consumer (used by SeqIO)
would convert the newlines into spaces. As noted by Paul, the
translation value is a special case.

Closing issue.

----------------------------------------
Bug #3297: newline added in quated features
https://redmine.open-bio.org/issues/3297

Author: Jesse van Dam
Status: Rejected
Priority: Normal
Assignee: Biopython Dev Mailing List
Category:
Target version:
URL:

Note: sorry for the duplicate reporting, did not notice the makeup of the bug reporting system

When I have a feature line like this (which spans multiple lines) in a GenBank file
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"

Then a space/newline will be added between 1.4.1 and .13 in the result, so when printing the feature with the following code
  print(source[0].qualifiers["product"])
it will print (with an unwanted space)
Glutamate synthase [NADPH] small chain (EC 1.4.1 .13)
I changed the following in scanner.py to fix this problem:
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
-                               value += "\n" + iterator.next() 
+                               value += iterator.next() 
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))
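
The trade-off discussed in this issue can be sketched in a few lines of plain Python. This is only an illustration, not the Bio.GenBank scanner code: the helper name `join_wrapped` and the second example value are made up here. It compares joining wrapped qualifier lines with a space against joining them with nothing, as the proposed patch would do.

```python
# Hypothetical helper for illustration only; this is not Biopython code.
def join_wrapped(lines, sep):
    """Join the continuation lines of a wrapped qualifier value using sep."""
    return sep.join(line.strip() for line in lines)


# The awkwardly wrapped value from the report, and a made-up value that
# wraps between two words (the typical case).
ec_lines = ['"Glutamate synthase [NADPH] small chain (EC 1.4.1', '.13)"']
word_lines = ['"Glutamate synthase [NADPH] small', 'chain"']

# Joining with a space leaves the unwanted gap inside the EC number ...
print(join_wrapped(ec_lines, " "))
# ... joining with nothing repairs it ...
print(join_wrapped(ec_lines, ""))
# ... but glues words together when the wrap falls between words.
print(join_wrapped(word_lines, ""))
```

Neither separator is right for both inputs, which is consistent with the view above that the line break in the data itself is in the wrong place.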

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From w.arindrarto at gmail.com Thu Nov 1 10:36:36 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 1 Nov 2012 15:36:36 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <50923C73.8060609@biotech.uni-tuebingen.de>
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com> <50923C73.8060609@biotech.uni-tuebingen.de>
Message-ID:

Hi Kai,

You're welcome :). I was thinking of changing Hit to be similar to
QueryResult, which you can create without containing any items. The
trade-off is that there are more attributes to keep track of (4
instead of 2) due to them being stored apart from the contained
objects, so I chose not to do it for now.

Anyway, let me know if there are still parsing difficulties because of
the object model.

cheers,
Bow

On Thu, Nov 1, 2012 at 10:10 AM, Kai Blin wrote:

> On 2012-11-01 09:19, Wibowo Arindrarto wrote:
>
> Hi Bow,
>
> > 2. Regarding the Hit object API change, I've changed it so that Hit
> > objects can now be created without any HSPs
> > (https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4).
> > However, per my explanation about keeping as few places as possible to
> > store the same value (in this case the hit and query ID and
> > description), the empty Hit object will raise errors if any of these
> > attributes are accessed. Setting and getting these attributes will
> > only work if there is at least one HSP in the Hit. Other Hit
> > functions, like append, should work OK as long as they don't involve
> > accessing these attributes. I think this will allow parsing of file
> > formats like HMMER2 plain text while maintaining the attribute storage
> > constraint.
>
> I totally agree the Hit object isn't valid until it has at least one
> HSP.
> Thanks for that change.
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
> D-72076 Tübingen Fax : ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
>

From eric.talevich at gmail.com Thu Nov 1 14:10:17 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 1 Nov 2012 14:10:17 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To:
References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> <874nldqi3t.fsf@fastmail.fm>
Message-ID:

On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock wrote:

> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman wrote:
> >
> > Peter;
> >
> >> In the case of Bow's SearchIO code, what would you prefer?
> >> e.g. Bio.SearchIO as it is now on his branch?
> >
> > I like plain ol' Search the best but don't have a strong preference. I'm
> > terrible at naming things so trust everyone's judgment on this.
> >
> > Brad
>
> Since we have no clear consensus, I propose we add Bow's code
> as Bio.SearchIO (which is how it is written right now), with the new
> BiopythonExperimentalWarning in place (to alert people that it may
> change in the next release). We can then rename or move it at a
> later date. This will make it easier for people to test the code, and
> also suggest further changes or additions (e.g. Kai's HMMER work).
>
> If and when we agree a consolidation of the Bio.SeqXXX
> modules, then Bio.SearchIO could move too. If this happens
> before any public release as Bio.SearchIO so much the better.
>
> Adopting lower case module names under Python 3 is also a
> separate issue.
>
> Peter
>
>

+1

Regarding the "great upheaval" of module renaming and reorganization:

0. If the only change is to combine the SeqIO, Seq, SeqRecord and
SeqFeature classes under a single module, we probably can do that in a
backwards-compatible way. But that means keeping our StudlyCaps module
names for the most part.

1. If we're going to change the API substantially, we might as well "do
it right". Besides our PEP8 non-compliance, there are some dark, dusty
corners of Biopython that we ought to clean up while we're at it --
reorganize the little historical fiefdoms into a coherent structure.
We'd call it Biopython 2.

2. Observing BioPerl and BioRuby, it could make sense to split the
distribution into multiple, with a sequence- and data-oriented
"biopython-core" package and separate packages for, say, 3D structures
("biopython-struct") and perhaps other existing components that have
ready maintainers and which the "core" of Biopython doesn't rely on. I
don't think we need to fragment the code base much, primarily just
extract PDB, SCOP and the other parts that depend on NumPy. On GitHub,
these repositories would still be under the biopython organization
name.

3. If we've decided to focus on Python 3 for the reorganization, we can
take advantage of new features in that lineage for packaging,
organization and distribution. These features could make it easier to
have side-by-side Biopython 1 and 2 installations (maybe), and also
plugging additional modules into the main "bio" package (namespace
packages, new in Py3.3).

4. Naming: "bio" is clean but might cause problems on Windows? (I
wouldn't know, nyah); "bio2" is nearly as clean; "biopy" follows the
numpy/scipy convention.

5. Porting: I, personally, would keep using the old Biopython for
everything that's meant to run on Python 2, which is, currently,
everything. Biopython2 running on Python 3 would give me an excuse to
start using Python 3 for new code.
Keeping these separate would be more difficult if the lowercasing were
done under the same "Bio" namespace.

Thoughts?

-Eric

From p.j.a.cock at googlemail.com Thu Nov 1 14:46:36 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 18:46:36 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To:
References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> <874nldqi3t.fsf@fastmail.fm>
Message-ID:

On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich wrote:
> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock
> wrote:
>>
>> Since we have no clear consensus, I propose we add Bow's code
>> as Bio.SearchIO (which is how it is written right now), with the new
>> BiopythonExperimentalWarning in place (to alert people that it may
>> change in the next release). We can then rename or move it at a
>> later date. This will make it easier for people to test the code, and
>> also suggest further changes or additions (e.g. Kai's HMMER work).
>>
>> If and when we agree a consolidation of the Bio.SeqXXX
>> modules, then Bio.SearchIO could move too. If this happens
>> before any public release as Bio.SearchIO so much the better.
>>
>> Adopting lower case module names under Python 3 is also a
>> separate issue.
>>
>> Peter
>>
>
> +1
>
> Regarding the "great upheaval" of module renaming and reorganization:
>
> 0. If the only change is to combine the SeqIO, Seq, SeqRecord and
> SeqFeature classes under a single module, we probably can do that
> in a backwards-compatible way. But that means keeping our
> StudlyCaps module names for the most part.

Yes, that is something we could do in a backwards compatible
way, with the old "StudlyCaps" Bio.SeqXXX modules persisting
as legacy imports for at least a year (say). But is it worth it?
See below.

> 1. If we're going to change the API substantially, we might as well "do it
> right".
> Besides our PEP8 non-compliance, there are some dark, dusty corners
> of Biopython that we ought to clean up while we're at it -- reorganize the
> little historical fiefdoms into a coherent structure. We'd call it Biopython
> 2.

Absolutely there are things we've lived with out of backwards
compatibility - the Alphabet objects are one example (foremost
the way gaps and stop codons were done with wrapper objects).
I'd also like us to switch the restriction digest module to using
zero based counting as Guido intended, and simplify some of
the more 'magical' code which has caused trouble porting to the
other Python implementations.

> 2. Observing BioPerl and BioRuby, it could make sense to split the
> distribution into multiple, with a sequence- and data-oriented
> "biopython-core" package and separate packages for, say, 3D structures
> ("biopython-struct") and perhaps other existing components that have ready
> maintainers and which the "core" of Biopython doesn't rely on. I don't think
> we need to fragment the code base much, primarily just extract PDB, SCOP and
> the other parts that depend on NumPy. On GitHub, these repositories would
> still be under the biopython organization name.

A clearer divide would be good - something we have at some level
already along the lines with and without numpy. However, given
the still unclear future for python packaging I'm not quite so sure
if we can/should go all the way to separate packages. Perhaps I
am being unduly worried by the concerns in the numpy/scipy
community? After all, we have no fortran code!

> 3. If we've decided to focus on Python 3 for the reorganization, we can take
> advantage of new features in that lineage for packaging, organization and
> distribution. These features could make it easier to have side-by-side
> Biopython 1 and 2 installations (maybe), and also plugging additional
> modules into the main "bio" package (namespace packages, new in Py3.3).

We can and should port the current namespace to Python 3,
but writing "Biopython 2" for Python 3 only (not Python 2)
sounds wise. More on this below.

> 4. Naming: "bio" is clean but might cause problems on Windows? (I wouldn't
> know, nyah); "bio2" is nearly as clean; "biopy" follows the numpy/scipy
> convention.

As noted before, we couldn't use "bio" on the average Mac
either - the default file system is like Windows, case insensitive.

The name biopy is in-line with numpy/scipy, which is a plus.
I know not everyone liked this name, but personally it seems
fine. Better than bio2 in my view.

> 5. Porting: I, personally, would keep using the old Biopython for everything
> that's meant to run on Python 2, which is, currently, everything. Biopython2
> running on Python 3 would give me an excuse to start using Python 3 for new
> code. Keeping these separate would be more difficult if the lowercasing were
> done under the same "Bio" namespace.
>
> Thoughts?

As noted above, I'm on board with planning a Biopython 2 requiring
Python 3 or later. I would regard this as effectively forking from the
current code base, porting individual modules on a case by case basis
(doing a final 2to3 conversion manually as part of this). The code
could be shared as a series of 'alpha' level releases for early
testing - assume we want to make some releases, particularly for
Windows where fewer potential testers would have all the compilers
setup to follow the repository.

However, if we do that, we would still support Biopython 1.xx under
Python 3 as well (via 2to3 as we are now, currently 'beta' level support)
for some time in parallel (although likely not getting major new features -
just bug fixes and if required updates for format changes).

Is there enough enthusiasm now to start planning what we'd change for
a (potentially Python 3 only) Biopython 2 yet?

Peter

From p.j.a.cock at googlemail.com Thu Nov 1 15:40:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 19:40:32 +0000
Subject: [Biopython-dev] Fwd: OBF server outage announcement / call for SysAdmin volunteers
In-Reply-To:
References:
Message-ID:

FYI regarding the Biopython website and recent mailing list outage.

Peter

PS you can also keep an eye on @Biopython and @OBF_news on Twitter,
which are a useful alternative when the mailing lists are down.

---------- Forwarded message ----------
From: *Peter Cock*
Date: Thursday, November 1, 2012
Subject: OBF server outage announcement / call for SysAdmin volunteers
To: open-bio-l at lists.open-bio.org, OBF Members
Cc: Chris Dagdigian , OBF Board

Dear all,

As many of you may have noticed, yesterday the Open Bioinformatics
Foundation (OBF) server hosting the mailing lists and most of the Bio*
websites went down. The mailing lists and simple static webpages (e.g.
download pages for Bio* releases) seem to be back online, as is the OBF
news blog: http://news.open-bio.org/news/ - but the wiki pages are down
(which unfortunately means the Bio* homepages are unavailable).

Services on the failing server are being moved to virtual machines on
the Amazon Cloud, so it may take a few days until everything has been
set up properly and the wiki will be back.

If there is anybody from the Bio* projects who wants to join the OBF's
SysAdmin team and help out with projects like this one, this would be a
good moment to volunteer - please email me or Chris Dagdigian (the OBF
Treasurer and our head Systems Administrator).

Thank you, and please bear with us,

Peter

On behalf of the OBF Board of Directors.

From p.j.a.cock at googlemail.com Thu Nov 1 15:50:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 19:50:50 +0000
Subject: [Biopython-dev] OBF server outage announcement / call for SysAdmin volunteers
In-Reply-To:
References:
Message-ID:

On Thu, Nov 1, 2012 at 7:40 PM, Peter Cock wrote:

> FYI regarding the Biopython website and recent mailing list outage.
>
> Peter
>
> PS you can also keep an eye on @Biopython and @OBF_news on Twitter,
> which are a useful alternative when the mailing lists are down.
>
>

I should have added that while the wiki is down (which does
unfortunately include the Biopython home page), the Biopython
downloads remain available via http://biopython.org/DIST/ and
other 'static' content like the Tutorial and API pages are up:

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/api/

Our source code repository is on GitHub, also fine:
https://github.com/biopython/biopython

Issue tracking is on our RedMine server, also fine:
https://redmine.open-bio.org/projects/biopython

Nightly unit tests are on our Buildbot server, also fine:
http://testing.open-bio.org/biopython/tgrid

Continuous integration testing is on TravisCI, also fine:
http://travis-ci.org/biopython/biopython

Regards,

Peter

From andrewscz at gmail.com Thu Nov 1 16:32:10 2012
From: andrewscz at gmail.com (Andrew Sczesnak)
Date: Thu, 1 Nov 2012 13:32:10 -0700
Subject: [Biopython-dev] Pull Request: MafIO.py
In-Reply-To:
References: <620A45B10433AE4C81D3F931A02812F93BE3FB5721@LESMBX1.adf.bham.ac.uk>
Message-ID:

Thanks Nick!

I updated the MafIO branch to allow reading of other key names not
specified in the MAF spec. However, writing is still restricted to
"score" and "pass" keys.
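
The behaviour described above (accept any key on a MAF "a" line when reading, but only emit "score" and "pass" when writing) might look roughly like the sketch below. This is a guess at the idea, not the code on the MafIO branch: the function names, the `WRITABLE_KEYS` constant and the sample line are all hypothetical.

```python
# Illustrative only; these names are hypothetical, not the MafIO API.
WRITABLE_KEYS = ("score", "pass")  # the keys the thread says are written out


def parse_a_line(line):
    """Parse a MAF 'a' line into a dict, accepting any key=value pair."""
    fields = line.strip().split()
    if not fields or fields[0] != "a":
        raise ValueError("not an 'a' line: %r" % line)
    return dict(field.split("=", 1) for field in fields[1:])


def format_a_line(annotations):
    """Format an 'a' line, keeping only the keys in WRITABLE_KEYS."""
    fields = ["%s=%s" % (key, annotations[key])
              for key in WRITABLE_KEYS if key in annotations]
    return " ".join(["a"] + fields)


# A made-up line with the extra "label" and "mult" keys mentioned below.
annotations = parse_a_line("a score=1234 label=5 mult=3")
print(annotations)
print(format_a_line(annotations))
```

With this split, a file carrying extra keys can be read without error, while anything written back out stays within the keys named in the spec.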

On Thu, Nov 1, 2012 at 4:51 AM, Nick Loman wrote:
> Hi Andrew
>
> Here you go:
>
> https://gist.github.com/58bc53d492ecc112d926
>
> Thanks for your help
>
> Regards
>
> Nick
>
>
>
> On Wed, Oct 31, 2012 at 6:10 PM, Andrew Sczesnak
> wrote:
>>
>> Nick,
>>
>> Can you provide a snippet of a file from mugsy for the unit tests?
>>
>> Thanks,
>> Andrew
>>
>> On Oct 31, 2012, at 9:00 AM, biopython-dev-request at lists.open-bio.org
>> wrote:
>>
>> > From: Nick Loman
>> > Date: Tue, Oct 30, 2012 at 6:34 AM
>> > Subject: Pull Request: MafIO.py
>> >
>> >
>> > Hi there
>> >
>> > Thanks for the MafIO branch. In order to get it to read MAF files
>> > produced by Mugsy (mugsy.sourceforge.net) I had to make the
>> > following change:
>> >
>> > diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
>> > index 6eda0ca..4bb1407 100644
>> > --- a/Bio/AlignIO/MafIO.py
>> > +++ b/Bio/AlignIO/MafIO.py
>> > @@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet = single_letter_alphabet):
>> >
>> >              annotations = dict([x.split("=") for x in line.strip().split()[1:]])
>> >
>> > -            if len([x for x in annotations.keys() if x not in ("score", "pass")]) > 0:
>> > +            if len([x for x in annotations.keys() if x not in ("score", "pass", "label", "mult")]) > 0:
>> >                  raise ValueError("Error parsing alignment - invalid key in 'a' line")
>> >          elif line.startswith("#"):
>> >              # ignore comments
>> >
>> >
>> > My Python fork is a bit confusing right now so hope you don't mind me
>> > sending this pull request via email!
>> >
>> > Cheers
>> >
>> > Nick
>
>

From eric.talevich at gmail.com Thu Nov 1 22:47:56 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 1 Nov 2012 22:47:56 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To:
References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> <874nldqi3t.fsf@fastmail.fm>
Message-ID:

On Thu, Nov 1, 2012 at 2:46 PM, Peter Cock wrote:

> On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich
> wrote:
>
> > 2. Observing BioPerl and BioRuby, it could make sense to split the
> > distribution into multiple, with a sequence- and data-oriented
> > "biopython-core" package and separate packages for, say, 3D structures
> > ("biopython-struct") and perhaps other existing components that have
> > ready maintainers and which the "core" of Biopython doesn't rely on.
> > I don't think we need to fragment the code base much, primarily just
> > extract PDB, SCOP and the other parts that depend on NumPy. On GitHub,
> > these repositories would still be under the biopython organization name.
>
> A clearer divide would be good - something we have at some level
> already along the lines with and without numpy. However, given
> the still unclear future for python packaging I'm not quite so sure
> if we can/should go all the way to separate packages. Perhaps I
> am being unduly worried by the concerns in the numpy/scipy
> community? After all, we have no fortran code!
>

My own use of packaging features and setuptools in particular is pretty
primitive, so I'm not sure what the risks are.

Having a separate repository for structure-related code would make it
much easier for me and João to hack on a Bio.PDB successor, I think. It
would also be nice to have a dependency-free "core" and then a bit more
flexibility in using dependencies for add-on packages -- there are a lot
of good existing libraries for structural biology, for instance, and
since performance is so important there we even might want to start
using Cython for some of that code. Then there's Lenna's pure-Python
mmCIF parser which depends on PLY.

> > 5. Porting: I, personally, would keep using the old Biopython for
> > everything that's meant to run on Python 2, which is, currently,
> > everything. Biopython2 running on Python 3 would give me an excuse
> > to start using Python 3 for new code. Keeping these separate would
> > be more difficult if the lowercasing were done under the same "Bio"
> > namespace.
> >
> > Thoughts?
>
>
> As noted above, I'm on board with planning a Biopython 2 requiring
> Python 3 or later. I would regard this as effectively forking from the
> current code base, porting individual modules on a case by case basis
> (doing a final 2to3 conversion manually as part of this). The code
> could be shared as a series of 'alpha' level releases for early
> testing - assume we want to make some releases, particularly for
> Windows where fewer potential testers would have all the compilers
> setup to follow the repository.
>
>

Sounds good to me.

> However, if we do that, we would still support Biopython 1.xx under
> Python 3 as well (via 2to3 as we are now, currently 'beta' level support)
> for some time in parallel (although likely not getting major new features -
> just bug fixes and if required updates for format changes).
>
>

Sure. I'm assuming it will be some time before we have a Biopython2
we're happy with, sorting out the module organization, dusting off old
code, dealing with module-specific dependencies and so on, and I'm OK
with that.

> Is there enough enthusiasm now to start planning what we'd change for
> a (potentially Python 3 only) Biopython 2 yet?
>
> Peter
>

Maybe a good time to create the initial fork would be after we've merged
the latest GSoC work and any feasible long-running branches. The
Bio.PDB-related GSoC work, on the other hand, seems to be held up
specifically because we're afraid to muck with the existing sub-package
too much with unstable new code, and I can imagine it would be easier to
land it in a new namespace.

-Eric

From mjldehoon at yahoo.com Fri Nov 2 12:01:35 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 2 Nov 2012 09:01:35 -0700 (PDT)
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To:
Message-ID: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi everybody,

--- On Thu, 11/1/12, Eric Talevich wrote:
> 1. If we're going to change the API substantially, we might
> as well "do it right". Besides our PEP8 non-compliance, there
> are some dark, dusty corners of Biopython that we ought to clean
> up while we're at it -- reorganize the little historical fiefdoms
> into a coherent structure. We'd call it Biopython 2.

+1.

> 2. Observing BioPerl and BioRuby, it could make sense to
> split the distribution into multiple, with a sequence- and
> data-oriented "biopython-core" package and separate packages
> for, say, 3D structures ("biopython-struct") and perhaps other
> existing components that have ready
> maintainers and which the "core" of Biopython doesn't rely
> on. I don't think we need to fragment the code base much,
> primarily just extract PDB, SCOP and the other parts that
> depend on NumPy.

This goes against the "coherent structure" in point 1. What is the
advantage of splitting the distribution according to whether a module
needs NumPy or not? I don't see an advantage to the user, and I don't
see an advantage to the developers either. Already I feel that we need
to install too many packages to get going with Python in bioinformatics
(Python itself, NumPy, Matplotlib and its dependencies, Pysam, Cython
(needed to compile Pysam), ezsetup, perhaps SciPy, Biopython). I find
this hard to explain to people new to bioinformatics or new to Python.
So I would prefer to keep one distribution.

We can be more lenient in terms of dependencies, especially those that
don't occur at compile time.

> 4. Naming: "bio" is clean but might cause problems on
> Windows?
> (I wouldn't know, nyah); "bio2" is nearly as clean;
> "biopy" follows the numpy/scipy convention.

Any problems on Windows will only occur during a transition period, so I
wouldn't worry about that too much. Perhaps we should check if there
would be any problems; if they are severe, we could check for an
existing Biopython installation in setup.py.

bio2 would stay with us forever (well at least until bio3) and is just
plain ugly, especially to new users who are not aware of the transition.
Then there is the issue that "bio2" would not be for Python 2 but for
Python 3.

The "py" is needed in numpy and scipy because otherwise it would be
"num" and "sci", which is too short. On the other hand, "bio" is used as
a prefix in lots of words, and can stand on its own. Therefore, hurray
for "bio".

> 5. Porting: I, personally, would keep using the old Biopython for
> everything that's meant to run on Python 2, which is, currently,
> everything. Biopython2 running on Python 3 would give me an
> excuse to start using Python 3 for new code. Keeping these
> separate would be more difficult if the lowercasing were done
> under the same "Bio" namespace.

Yes that makes sense.

Best,
-Michiel.

From anaryin at gmail.com Sat Nov 3 07:12:37 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Sat, 3 Nov 2012 12:12:37 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID:

Hi everyone,

A bit late for the party but my two cents.

I agree with Eric in that we should take the opportunity to review some
"dark corners" of the code. Regarding what I can contribute to, there
are a lot of changes planned for Bio.PDB that could benefit from a
"cleaner start". However, and also in line with Michiel, splitting the
distribution in core/extras would be more cumbersome for new users.
However, what about having a part in the setup file where the user can
turn installation of particular parts of the package on or off? This
way you can control whether you need the dependencies or not. By
default you would install everything as it is now, but it would give
you a larger degree of control.

As for the namespace and lowercase, I don't really have strong
arguments, but I like 'bio'.

Cheers,

João

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

2012/11/2 Michiel de Hoon

> Hi everybody,
>
> --- On Thu, 11/1/12, Eric Talevich wrote:
> > 1. If we're going to change the API substantially, we might
> > as well "do it right". Besides our PEP8 non-compliance, there
> > are some dark, dusty corners of Biopython that we ought to clean
> > up while we're at it -- reorganize the little historical fiefdoms
> > into a coherent structure. We'd call it Biopython 2.
>
> +1.
>
> > 2. Observing BioPerl and BioRuby, it could make sense to
> > split the distribution into multiple, with a sequence- and
> > data-oriented "biopython-core" package and separate packages
> > for, say, 3D structures ("biopython-struct") and perhaps other
> > existing components that have ready
> > maintainers and which the "core" of Biopython doesn't rely
> > on. I don't think we need to fragment the code base much,
> > primarily just extract PDB, SCOP and the other parts that
> > depend on NumPy.
>
> This goes against the "coherent structure" in point 1. What is the
> advantage of splitting the distribution according to whether a module needs
> NumPy or not? I don't see an advantage to the user, and I don't see an
> advantage to the developers either. Already I feel that we need to install
> too many packages to get going with Python in bioinformatics (Python
> itself, NumPy, Matplotlib and its dependencies, Pysam, Cython (needed to
> compile Pysam), ezsetup, perhaps SciPy, Biopython). I find this hard to
> explain to people new to bioinformatics or new to Python.
So I would prefer > to keep one distribution. > > We can be more lenient in terms of dependencies, especially those that > don't occur at compile time. > > > 4. Naming: "bio" is clean but might cause problems on > > Windows? (I wouldn't know, nyah); "bio2" is nearly as clean; > > "biopy" follows the numpy/scipy convention. > > Any problems on Windows will only occur during a transition period, so I > wouldn't worry about that too much. Perhaps we should check if there would > be any problems; if they are severe, we could check for an existing > Biopython installation in setup.py. > > bio2 would stay with us forever (well at least until bio3) and is just > plain ugly, especially to new users who are not aware of the transition. > Then there is the issue that "bio2" would not be for Python 2 but for > Python 3. > > The "py" is needed in numpy and scipy because otherwise it would be "num" > and "sci", which is too short. On the other hand, "bio" is used as a prefix > in lots of words, and can stand on its own. Therefore, hurray for "bio". > > > 5. Porting: I, personally, would keep using the old Biopython for > > everything that's meant to run on Python 2, which is, currently, > > everything. Biopython2 running on Python 3 would give me an > > excuse to start using Python 3 for new code. Keeping these > > separate would be more difficult if the lowercasing were done > > under the same "Bio" namespace. > > Yes that makes sense. > > Best, > -Michiel. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tiagoantao at gmail.com Sun Nov 4 08:09:35 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sun, 4 Nov 2012 13:09:35 +0000 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com> References: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi, On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon wrote: > Already I feel that we need to install too many packages to get going with > Python in bioinformatics (Python itself, NumPy, Matplotlib and its > dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps > SciPy, Biopython). I find this hard to explain to people new to > bioinformatics or new to Python. So I would prefer to keep one distribution. > > We can be more lenient in terms of dependencies, especially those that > don't occur at compile time. > > One of the things that I always found lacking with biopython is a clear, consistent policy on dependencies: Depending on the mood of the day it could be either good/bad to add a library dependency. As an example, this ended up with there being a dependency on reportlab, but not on scipy. Whatever the policy, I think that is should be consistent all across. Preferably simple to both users and developers. A few ideas on policy: 1. I totally agree with the the idea of being as lenient as possible with dependencies (as you say, especially with those that do not occur at compile time). 2. Biopython belongs to a certain software ecology. I think it would make sense to see as natural adding dependencies on well established python libraries. 3. (1+2) If a developer wants to add a dependency on a package, that should not be a major problem (as long as the package is maintained for long/well known/stable). Users should only have to deal with the dependency if they need the functionality that depends on that package. Python being a dynamic language, there does not have to be a burden on users/developers if a remote part of Biopython depends on something more exotic (which most users/developers will never see/install in any case). 
Again by "exotic" I mean well known libraries with a track record of years of stability. Tiago PS - Another issue that it would be interesting see cleared-up would be the policy on compile time (linkage) dependencies. Are new ones encouraged? What about Java/Jython based? From p.j.a.cock at googlemail.com Sun Nov 4 09:01:16 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 14:01:16 +0000 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? Message-ID: Retitling thread On Sun, Nov 4, 2012 at 1:09 PM, Tiago Ant?o wrote: > Hi, > > > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon wrote: >> >> Already I feel that we need to install too many packages to get going with >> Python in bioinformatics (Python itself, NumPy, Matplotlib and its >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps >> SciPy, Biopython). I find this hard to explain to people new to >> bioinformatics or new to Python. So I would prefer to keep one distribution. >> >> We can be more lenient in terms of dependencies, especially those that >> don't occur at compile time. >> > > One of the things that I always found lacking with biopython is a clear, > consistent policy on dependencies: It would be good to have something written down, just as we did with the deprecation policy. > Depending on the mood of the day it could be either good/bad > to add a library dependency. As an example, this ended up > with there being a dependency on reportlab, but not on scipy. The ReportLab dependency is a 'run time only' dependency and has been in Biopython for a very long time. You'd have to remind me if there was any compile time issue with scipy, but my recollection was we were loath to add a dependency on scipy (which is quite a complex library to install if not using a package) for just one or two functions - however you were planning something more substantial in the PopGen code which would justify it (using lots of statistics). 
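For reference, a run-time-only dependency of the kind discussed above is usually handled by importing the optional package inside the code path that needs it, and raising a clear error when it is missing. The sketch below is not taken from Biopython: the names `MissingOptionalDependency` and `require` are hypothetical, and it is written for a current Python 3 rather than the Python 2 of this thread.

```python
import importlib


class MissingOptionalDependency(ImportError):
    """Raised when an optional run-time dependency is not installed."""


def require(module_name, feature):
    """Import module_name on demand, or explain which feature needs it.

    Hypothetical helper: nothing is imported until a caller actually
    asks for the feature, so users who never use it never need the package.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingOptionalDependency(
            "%s requires the optional package %r; install it to use this feature."
            % (feature, module_name)
        ) from exc
```

A plotting function could then start with something like `plt = require("matplotlib.pyplot", "Tree drawing")`, so the cost of the dependency falls only on users who call it.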
> Whatever the policy, I think that is should be consistent all across. > Preferably simple to both users and developers. > > A few ideas on policy: > > 1. I totally agree with the the idea of being as lenient as possible with > dependencies (as you say, especially with those that do not occur at > compile time). > 2. Biopython belongs to a certain software ecology. I think it would make > sense to see as natural adding dependencies on well established python > libraries. > 3. (1+2) If a developer wants to add a dependency on a package, that should > not be a major problem (as long as the package is maintained for long/well > known/stable). Users should only have to deal with the dependency if they > need the functionality that depends on that package. > > Python being a dynamic language, there does not have to be a burden on > users/developers if a remote part of Biopython depends on something more > exotic (which most users/developers will never see/install in any case). > Again by "exotic" I mean well known libraries with a track record of years > of stability. That all sounds reasonable. It is compile time dependencies that I am most wary of. However, from an end user perspective having installed Biopython and then trying a script from a colleague and only then finding 101 optional run time dependencies are also needed would be annoying. For Linux packages like Debian there is a 'recommends' field for this kind of soft dependency. Where do we stand with declaring dependencies in setup.py so that if using a package manager like pip this it less painful? In fact, how many 'soft' dependencies like this do we already have? Just from a quick look at the README file many are not mentioned under the current 'System Requirements' text (e.g. Network X). > Tiago > PS - Another issue that it would be interesting see cleared-up would be the > policy on compile time (linkage) dependencies. Are new ones encouraged? Currently discouraged. 
They make installation much more painful, and have tended to be left untested, e.g. mmCIF was for many years disabled by default because no one could work out how to detect its requirements at compile time. > What about Java/Jython based? I'm not so keen on something providing Java/Jython-only functionality. However, something where we could require library X under Jython while using library Y under CPython makes sense. Database access would be a perfect example - things like Python's sqlite3 don't yet exist under Jython. Peter From sbassi at clubdelarazon.org Sun Nov 4 12:34:55 2012 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sun, 4 Nov 2012 14:34:55 -0300 Subject: [Biopython-dev] 403 link Message-ID: On page http://biopython.org/wiki/Documentation there are 2 links that return a 403 error: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf I can't correct this doc since I don't know where they are. From p.j.a.cock at googlemail.com Sun Nov 4 13:08:40 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 18:08:40 +0000 Subject: [Biopython-dev] 403 link In-Reply-To: References: Message-ID: On Sun, Nov 4, 2012 at 5:34 PM, Sebastian Bassi wrote: > On page http://biopython.org/wiki/Documentation there are 2 links that return a > 403 error: > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > I can't correct this doc since I don't know where they are. The links are correct - this is a side effect of the current migration from the (dying) OBF server to an Amazon hosted virtual machine. As of yesterday the static pages were up and the wiki down, for now it is the other way round... it's being worked on. Regards, Peter From eric.talevich at gmail.com Sun Nov 4 14:47:53 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 4 Nov 2012 14:47:53 -0500 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names?
In-Reply-To: References: Message-ID: On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock wrote: > Retitling thread > > On Sun, Nov 4, 2012 at 1:09 PM, Tiago Ant?o wrote: > > Hi, > > > > > > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon > wrote: > >> > >> Already I feel that we need to install too many packages to get going > with > >> Python in bioinformatics (Python itself, NumPy, Matplotlib and its > >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps > >> SciPy, Biopython). I find this hard to explain to people new to > >> bioinformatics or new to Python. So I would prefer to keep one > distribution. > >> > >> We can be more lenient in terms of dependencies, especially those that > >> don't occur at compile time. > >> > > > > One of the things that I always found lacking with biopython is a clear, > > consistent policy on dependencies: > > It would be good to have something written down, just as we > did with the deprecation policy. > Should we start a page for this on the wiki? > > Depending on the mood of the day it could be either good/bad > > to add a library dependency. As an example, this ended up > > with there being a dependency on reportlab, but not on scipy. > > The ReportLab dependency is a 'run time only' dependency and > has been in Biopython for a very long time. You'd have to remind > me if there was any compile time issue with scipy, but my > recollection was we were loath to add a dependency on scipy > (which is quite a complex library to install if not using a package) > for just one or two functions - however you were planning something > more substantial in the PopGen code which would justify it (using > lots of statistics). > > > Whatever the policy, I think that is should be consistent all across. > > Preferably simple to both users and developers. > > > > A few ideas on policy: > > > > 1. 
I totally agree with the the idea of being as lenient as possible with > > dependencies (as you say, especially with those that do not occur at > > compile time). > > 2. Biopython belongs to a certain software ecology. I think it would make > > sense to see as natural adding dependencies on well established python > > libraries. > > 3. (1+2) If a developer wants to add a dependency on a package, that > should > > not be a major problem (as long as the package is maintained for > long/well > > known/stable). Users should only have to deal with the dependency if they > > need the functionality that depends on that package. > > > > Python being a dynamic language, there does not have to be a burden on > > users/developers if a remote part of Biopython depends on something more > > exotic (which most users/developers will never see/install in any case). > > Again by "exotic" I mean well known libraries with a track record of > years > > of stability. > > That all sounds reasonable. It is compile time dependencies that I am > most wary of. > Pure-Python dependencies seem less scary -- a package like PLY should work on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the dependencies that are most tempting are the ones with essential C extensions (numpy, scipy, matplotlib). However, from an end user perspective having installed Biopython and > then trying a script from a colleague and only then finding 101 optional > run time dependencies are also needed would be annoying. > > For Linux packages like Debian there is a 'recommends' field for this kind > of soft dependency. Where do we stand with declaring dependencies in > setup.py so that if using a package manager like pip this it less painful? > > In fact, how many 'soft' dependencies like this do we already have? > Just from a quick look at the README file many are not mentioned > under the current 'System Requirements' text (e.g. Network X). > I just used "git grep import Bio/" to find out. 
The only egregious undocumented dependencies are the ones I added in Phylo for graphics: networkx and matplotlib/pylab. Other *possible* dependencies are sqlite3 in the case of Jython (Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7 (Bio._py3k). Should we add these to the "install_recommends" list in setup.py? > > Tiago > > PS - Another issue that it would be interesting see cleared-up would be > the > > policy on compile time (linkage) dependencies. Are new ones encouraged? > > Currently discouraged. They make installation much more painful, and > have tended to be left untested, e.g. mmCIF was for many years disabled > by default because no one could work out how to detect its requirements > at compile time. > > > What about Java/Jython based? > > I'm not so keen on something providing Java/Jython only functionality. > However, something where we could require library X under Jython > while using library Y under C Python makes sense. Database access > would be a perfect example - things like Python's sqlite3 don't yet exist > under Jython. > > Peter > From tiagoantao at gmail.com Sun Nov 4 15:49:33 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sun, 4 Nov 2012 20:49:33 +0000 Subject: [Biopython-dev] Jython DB Message-ID: Howdy, On Sun, Nov 4, 2012 at 2:01 PM, Peter Cock wrote: > Retitling thread > Again ;) > while using library Y under C Python makes sense. Database access > would be a perfect example - things like Python's sqlite3 don't yet exist > under Jython. > > I noticed that there is 1 reference to sqlite3: Bio.SeqIO._index Other stuff on BioSQL is just really related to database configuration and does not impair functionality (exception to a test case that really depends on sqlite3). I suppose that a "default" DB with Jython would probably be JavaDB (aka Apache Derby)? It is available as a default on the Sun/Oracle JDK (though not the JRE). 
I could go ahead and have a try at evaluating the portability costs for sqlite3->javadb. In theory it should be easy ( http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html) -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Sun Nov 4 15:49:58 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 20:49:58 +0000 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? In-Reply-To: References: Message-ID: On Sunday, November 4, 2012, Eric Talevich wrote: > On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock > > wrote: > >> Retitling thread >> >> On Sun, Nov 4, 2012 at 1:09 PM, Tiago Ant?o > >> wrote: >> > Hi, >> > >> > >> > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon > >> wrote: >> >> >> >> Already I feel that we need to install too many packages to get going >> with >> >> Python in bioinformatics (Python itself, NumPy, Matplotlib and its >> >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps >> >> SciPy, Biopython). I find this hard to explain to people new to >> >> bioinformatics or new to Python. So I would prefer to keep one >> distribution. >> >> >> >> We can be more lenient in terms of dependencies, especially those that >> >> don't occur at compile time. >> >> >> > >> > One of the things that I always found lacking with biopython is a clear, >> > consistent policy on dependencies: >> >> It would be good to have something written down, just as we >> did with the deprecation policy. >> > > Should we start a page for this on the wiki? > > The wiki is online again now :) Maybe agree a draft by email first? > > Depending on the mood of the day it could be either good/bad >> > to add a library dependency. As an example, this ended up >> > with there being a dependency on reportlab, but not on scipy. >> >> The ReportLab dependency is a 'run time only' dependency and >> has been in Biopython for a very long time. 
You'd have to remind >> me if there was any compile time issue with scipy, but my >> recollection was we were loath to add a dependency on scipy >> (which is quite a complex library to install if not using a package) >> for just one or two functions - however you were planning something >> more substantial in the PopGen code which would justify it (using >> lots of statistics). >> >> > Whatever the policy, I think that is should be consistent all across. >> > Preferably simple to both users and developers. >> > >> > A few ideas on policy: >> > >> > 1. I totally agree with the the idea of being as lenient as possible >> with >> > dependencies (as you say, especially with those that do not occur at >> > compile time). >> > 2. Biopython belongs to a certain software ecology. I think it would >> make >> > sense to see as natural adding dependencies on well established python >> > libraries. >> > 3. (1+2) If a developer wants to add a dependency on a package, that >> should >> > not be a major problem (as long as the package is maintained for >> long/well >> > known/stable). Users should only have to deal with the dependency if >> they >> > need the functionality that depends on that package. >> > >> > Python being a dynamic language, there does not have to be a burden on >> > users/developers if a remote part of Biopython depends on something more >> > exotic (which most users/developers will never see/install in any case). >> > Again by "exotic" I mean well known libraries with a track record of >> years >> > of stability. >> >> That all sounds reasonable. It is compile time dependencies that I am >> most wary of. >> > > Pure-Python dependencies seem less scary -- a package like PLY should work > on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the > dependencies that are most tempting are the ones with essential C > extensions (numpy, scipy, matplotlib). > But (for example) matplotlib wouldn't be a build time dependency for us. 
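One existing way to make run-time-only dependencies such as matplotlib visible to pip, without making them hard requirements, is the `extras_require` keyword of setuptools. The fragment below is only a sketch of how that could look: the extra names (`graphics`, `phylo-drawing`, `all`), the placeholder distribution name, and the grouping of packages are assumptions for illustration, not Biopython's actual setup.py.

```python
# Sketch of a setup.py fragment; the extra names are hypothetical.
EXTRAS = {
    "graphics": ["reportlab"],
    "phylo-drawing": ["matplotlib", "networkx"],
}
# Convenience extra that pulls in every optional package at once.
EXTRAS["all"] = sorted({pkg for pkgs in EXTRAS.values() for pkg in pkgs})


def setup_kwargs():
    """Keyword arguments that a setup.py could pass to setuptools.setup()."""
    return {
        "name": "example-package",  # placeholder, not the real distribution
        "install_requires": [],     # hard requirements would go here
        "extras_require": EXTRAS,   # soft, run-time-only requirements
    }
```

With a declaration like this, `pip install "example-package[phylo-drawing]"` would pull in the plotting packages, while a plain `pip install example-package` would not.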
> However, from an end user perspective having installed Biopython and >> then trying a script from a colleague and only then finding 101 optional >> run time dependencies are also needed would be annoying. >> >> For Linux packages like Debian there is a 'recommends' field for this kind >> of soft dependency. Where do we stand with declaring dependencies in >> setup.py so that if using a package manager like pip this it less painful? >> >> In fact, how many 'soft' dependencies like this do we already have? >> Just from a quick look at the README file many are not mentioned >> under the current 'System Requirements' text (e.g. Network X). >> > > I just used "git grep import Bio/" to find out. The only egregious > undocumented dependencies are the ones I added in Phylo for graphics: > networkx and matplotlib/pylab. > Could you add those to the README file then? > Other *possible* dependencies are sqlite3 in the case of Jython > (Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7 (Bio._py3k). > > Should we add these to the "install_recommends" list in setup.py? > No, they are in the standard lib on C Python, except in the case of OrderedDict on older Pythons were we bundle a backport anyway. Jython has an open bug on including the sqlite3 module, and might be worth mentioning under a new Jython specific section of the README. Peter From tiagoantao at gmail.com Sun Nov 4 16:00:10 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sun, 4 Nov 2012 21:00:10 +0000 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? In-Reply-To: References: Message-ID: On Sun, Nov 4, 2012 at 8:49 PM, Peter Cock wrote: > Jython has an open bug on including the sqlite3 module, > > This will go nowhere fast as it will be dependent on a JNI library (i.e. linkage of C code). The only durable option in the Java space would be a native implementation of sqlite3. All other options are not of the "embeddable" type (e.g. 
JDBC driver to something running outside), defeating the main purpose of sqlite3. To sum it up: I doubt that sqlite3 will be a realistic solution in the Jython space. As per previous email, I suspect that a Python DBI to JDBC bridge (bundled with Jython by default) + a default database (javadb/derby or H2 or HSQLDB) is probably more realistic in the Java space. On the OracleJDK javadb will require 0 dependencies. On other JDK or a JRE, Apache derby. -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Sun Nov 4 16:47:20 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 21:47:20 +0000 Subject: [Biopython-dev] Jython DB In-Reply-To: References: Message-ID: Hi Tiago, On Sun, Nov 4, 2012 at 8:49 PM, Tiago Ant?o wrote: > Howdy, > > On Sun, Nov 4, 2012 at 2:01 PM, Peter Cock wrote: >> >> Retitling thread > > > Again ;) > > >> >> while using library Y under C Python makes sense. Database access >> would be a perfect example - things like Python's sqlite3 don't yet exist >> under Jython. >> > > I noticed that there is 1 reference to sqlite3: > Bio.SeqIO._index > > Other stuff on BioSQL is just really related to database configuration and > does not impair functionality (exception to a test case that really depends > on sqlite3). > > I suppose that a "default" DB with Jython would probably be JavaDB (aka > Apache Derby)? It is available as a default on the Sun/Oracle JDK (though > not the JRE). > > I could go ahead and have a try at evaluating the portability costs for > sqlite3->javadb. In theory it should be easy > (http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html) The database stuff in Biopython currently is BioSQL (which under C Python supports a MySQL, PostgreSQL or SQLite back end) and things like SeqIO.index which use SQLite3 directly. 
None of this currently works under Jython :( I was hoping Jython would implement an sqlite3 module which we (and any other Python library) could just use - there seems to be no progress on that: http://bugs.jython.org/issue1682864 The same goes for the MySQLdb and PostgreSQL modules. Failing a port allowing our current code to "just work", someone could write alternative code for Biopython to call an appropriate Java DB interface directly. For our BioSQL we already have a structure to cope with a range of backends, so this should be quite clean. In the case of Bio.SeqIO.index_db, we probably only use a fraction of the full sqlite3 module's capabilities, so special casing this under Jython to call JavaDB might not be too complicated... (for anyone who knows their way around Jython and JavaDB)? If you fancy exploring SQLite3 under Jython, go for it :) Peter From p.j.a.cock at googlemail.com Sun Nov 4 16:48:56 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 21:48:56 +0000 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? In-Reply-To: References: Message-ID: On Sun, Nov 4, 2012 at 9:00 PM, Tiago Antão wrote: > On Sun, Nov 4, 2012 at 8:49 PM, Peter Cock wrote: >> >> Jython has an open bug on including the sqlite3 module, >> > > This will go nowhere fast as it will be dependent on a JNI library (i.e. > linkage of C code). > The only durable option in the Java space would be a native implementation > of sqlite3. > All other options are not of the "embeddable" type (e.g. JDBC driver to > something running outside), defeating the main purpose of sqlite3.
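The idea raised in this thread, using one database library under CPython and another under Jython, can be sketched as a small import-time fallback. This is only an illustration: `first_available` is a hypothetical helper, and the Jython candidate (`com.ziclix.python.sql`, the package that provides Jython's zxJDBC bridge) is an assumption about what such a port might use, not something Biopython is confirmed to do.

```python
import importlib


def first_available(candidates):
    """Return (name, module) for the first importable module in candidates."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # not installed on this interpreter, try the next one
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(candidates)
    )


# Preference order: the standard-library sqlite3 on CPython, then the
# package behind Jython's JDBC bridge (assumed name) as a fallback.
DB_CANDIDATES = ["sqlite3", "com.ziclix.python.sql"]
```

Code that needs a database would call `first_available(DB_CANDIDATES)` once and branch on the returned name, which keeps the interpreter-specific choice in one place.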
Let's continue this on the new thread: http://lists.open-bio.org/pipermail/biopython-dev/2012-November/010072.html Peter From redmine at redmine.open-bio.org Sun Nov 4 17:47:21 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 4 Nov 2012 22:47:21 +0000 Subject: [Biopython-dev] [Biopython - Bug #3392] (New) unable to download almost any documentation - the download links are invalid Message-ID: Issue #3392 has been reported by Brad Zoltick. ---------------------------------------- Bug #3392: unable to download almost any documentation - the download links are invalid https://redmine.open-bio.org/issues/3392 Author: Brad Zoltick Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Documentation Target version: Not Applicable URL: People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response: Forbidden You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server. Apache/2.2.23 (Amazon) Server at biopython.org Port 80 ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Sun Nov 4 17:47:23 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 4 Nov 2012 22:47:23 +0000 Subject: [Biopython-dev] [Biopython - Bug #3393] (New) unable to download almost any documentation - the download links are invalid Message-ID: Issue #3393 has been reported by Brad Zoltick. ---------------------------------------- Bug #3393: unable to download almost any documentation - the download links are invalid https://redmine.open-bio.org/issues/3393 Author: Brad Zoltick Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Documentation Target version: Not Applicable URL: People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response: Forbidden You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server. Apache/2.2.23 (Amazon) Server at biopython.org Port 80 -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Sun Nov 4 19:06:10 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 5 Nov 2012 00:06:10 +0000 Subject: [Biopython-dev] [Biopython - Bug #3392] unable to download almost any documentation - the download links are invalid References: Message-ID: Issue #3392 has been updated by Peter Cock.
Category changed from Documentation to Website Priority changed from Normal to Urgent Yep, we know about it - but thanks for letting us know just in case: http://lists.open-bio.org/pipermail/biopython-dev/2012-November/010069.html The same issue affects our release downloads too which is more annoying. Its a side effect during server migration from a dying machine to a virtual machine on the Amazon Cloud, http://lists.open-bio.org/pipermail/biopython/2012-November/008248.html Leaving this bug open until the new server is fixed... ---------------------------------------- Bug #3392: unable to download almost any documentation - the download links are invalid https://redmine.open-bio.org/issues/3392 Author: Brad Zoltick Status: New Priority: Urgent Assignee: Biopython Dev Mailing List Category: Website Target version: Not Applicable URL: People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response: Forbidden You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server. Apache/2.2.23 (Amazon) Server at biopython.org Port 80 -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Mon Nov 5 18:07:09 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 5 Nov 2012 23:07:09 +0000 Subject: [Biopython-dev] OBF server outage announcement / call for SysAdmin volunteers In-Reply-To: References: Message-ID: On Thu, Nov 1, 2012 at 7:50 PM, Peter Cock wrote: > On Thu, Nov 1, 2012 at 7:40 PM, Peter Cock wrote: >> FYI regarding the Biopython website and recent mailing list outage. >> >> Peter >> >> PS you also keep an eye on @Biopython and @OBF_news on Twitter, >> which are a useful alternative when the mailing lists are down. 
>> >> > > I should have added that while the wiki is down (which does > unfortunately include the Biopython home page), the Biopython > downloads remain available via http://biopython.org/DIST/ and > other 'static' content like the Tutorial and API pages are up: > > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > http://biopython.org/DIST/docs/api/ Hosting of biopython.org (and the bioperl.org and open-bio.org websites) was transferred to an Amazon cloud machine over the weekend, which fixed the wiki but temporarily disabled the static pages (like the Tutorial and downloads). Those should all be working again now. At some later date (to be announced) the server running the OBF mailing lists will be transferred, which would make the mailing lists unavailable for a short period. Regards, Peter From redmine at redmine.open-bio.org Mon Nov 5 18:13:43 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 5 Nov 2012 23:13:43 +0000 Subject: [Biopython-dev] [Biopython - Bug #3392] (Resolved) unable to download almost any documentation - the download links are invalid References: Message-ID: Issue #3392 has been updated by Peter Cock. Status changed from New to Resolved % Done changed from 0 to 100 This should be working again now :) ---------------------------------------- Bug #3392: unable to download almost any documentation - the download links are invalid https://redmine.open-bio.org/issues/3392 Author: Brad Zoltick Status: Resolved Priority: Urgent Assignee: Biopython Dev Mailing List Category: Website Target version: Not Applicable URL: People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response: Forbidden You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server. 
Apache/2.2.23 (Amazon) Server at biopython.org Port 80 -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From kai.blin at biotech.uni-tuebingen.de Mon Nov 19 09:11:42 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Mon, 19 Nov 2012 15:11:42 +0100 Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails when coordinates are outside of the parent_sequence. Message-ID: <50AA3E1E.70407@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi folks, I'm currently investigating an error caused by an invalid GenBank file input that annotates CDS features with invalid coordinates. The GenBank parser accepts these features, but later my program crashes. It turns out the crash is because I'm calling the extract() method for my seq features, which then return an empty Seq object for out-of-range parent_sequence. I have the feeling that raising an exception would be the best way of dealing with this, but of course I can also check the result of extract() to be different from an empty Seq object. The line I'd like to throw a ValueError on out-of-bounds coordinates is https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811 What are your thoughts on this? Cheers, Kai - -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://www.enigmail.net/ iQEcBAEBAgAGBQJQqj4eAAoJEKM5lwBiwTTP7rsIANURFpsEtHOIgJ1z3r6nV3mQ rI0Vo0fBh59beZA0NYi2rMez+TUFXf87Ih3b9LGIH4xaFsAwpXJrUjvbqC1tuqBv KFg65psNCnDlp9Pc4DZQnaAS7ycoDrDiJStV387XWE6CA7dTiCkBUfKwuaf7S/om m1je0XMJ6j6J5+Jn2qW/QMpf2G9e8lAkZyeNIQyYtGF+RbPkBPSxpZFTEn6KsymT dOLoCQVhlf1R9X0S+nLBAh9Q29akf6/tkUcqdUg5ROoNqvqjudDWbz0JgoTgsf7n j24rlTIpxktl3KKna6DtoX5ig4EKF5IOnQmo00JrWWL8Liy0oKTY/LRkF5CB85k= =djFF -----END PGP SIGNATURE----- From p.j.a.cock at googlemail.com Mon Nov 19 11:10:15 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 19 Nov 2012 16:10:15 +0000 Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails when coordinates are outside of the parent_sequence. In-Reply-To: <50AA3E1E.70407@biotech.uni-tuebingen.de> References: <50AA3E1E.70407@biotech.uni-tuebingen.de> Message-ID: On Mon, Nov 19, 2012 at 2:11 PM, Kai Blin wrote: > Hi folks, > > I'm currently investigating an error caused by an invalid GenBank file > input that annotates CDS features with invalid coordinates. The > GenBank parser accepts these features, but later my program crashes. Perhaps we should have a parser error/warning at that point? (as well as any fix to the extract method) > It turns out the crash is because I'm calling the extract() method for > my seq features, which then return an empty Seq object for > out-of-range parent_sequence.
> > I have the feeling that raising an exception would be the best way of > dealing with this, but of course I can also check the result of > extract() to be different from an empty Seq object. > > The line I'd like to throw a ValueError on out-of-bounds coordinates is > https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811 > > What are your thoughts on this? Some might find this surprising given the (initially rather odd) Python slicing behaviour with out of range coordinates (which indirectly causes the behaviour observed here): >>> "hello"[100:200] '' i.e. Slicing a string outside its bounds gives an empty string. On balance you're probably right that an error in this situation makes more sense (a discrepancy between feature location and the given parent sequence not being long enough). Peter From p.j.a.cock at googlemail.com Mon Nov 19 11:32:11 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 19 Nov 2012 16:32:11 +0000 Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails when coordinates are outside of the parent_sequence. In-Reply-To: <8045681f-e3ca-470c-973d-89b5fcc6d259@email.android.com> References: <50AA3E1E.70407@biotech.uni-tuebingen.de> <8045681f-e3ca-470c-973d-89b5fcc6d259@email.android.com> Message-ID: On Mon, Nov 19, 2012 at 4:25 PM, Kai Blin wrote: > Peter Cock wrote: > >>> GenBank parser accepts these features, but later my program crashes. >> >>Perhaps we should have a parser error/warning at that point? >>(as well as any fix to the extract method) > > Probably a bit tricky because the GenBank file might not contain a > sequence at all, and we can't tell until we either see the sequence or > an end of record marker. The first line should tell you the length, and we already have a warning in place for naughty GenBank files where the actual sequence has a different length.
Those could be a problem for this new warning, as you'd only know the expected sequence length from the header while parsing the features. >>> I have the feeling that raising an exception would be the best way >>> of dealing with this, but of course I can also check the result >>> of extract() to be different from an empty Seq object. >>> >>> The line I'd like to throw a ValueError on out-of-bounds coordinates >>> is >>> >>> https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811 >>> >>> What are your thoughts on this? >> >>Some might find this surprising given the (initially rather odd) >>Python slicing behaviour with out of range coordinates (which >>indirectly causes the behaviour observed here): >> >>>>> "hello"[100:200] >>'' >> >>i.e. Slicing a string outside its bounds gives an empty string. > > Yes, that is why we end up with an empty Seq object. > >>On balance you're probably right that an error in this situation >>makes more sense (a discrepancy between feature location >>and the given parent sequence not being long enough). > > Yes. The way I understand the intention of the parent sequence, > the whole point is that the feature should be located on it. > > I'll gladly prepare a patch (and some test). > Cheers, > Kai OK. Peter From redmine at redmine.open-bio.org Tue Nov 20 08:41:47 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 13:41:47 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] (New) Biopython trie implementation can't load large data sets Message-ID: Issue #3395 has been reported by Michał Nowotka. ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Michał Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Nov 20 09:02:01 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 14:02:01 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Peter Cock. Can you try the same test case without gzip? i.e. Can you load /tmp/trie.dat rather than /tmp/trie.dat.gz? Also I would try explicitly opening the files in binary mode. P.S. Which OS, which version of Python, which version of Biopython? ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Nov 20 09:18:46 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 14:18:46 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Micha? Nowotka. Sure, I'll update this issue as soon as I check that. 
---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Nov 20 11:31:13 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 16:31:13 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Micha? Nowotka. OK, I tried using standard python file handler with explicit binary mode and it also failed. The file is now 165.5MB. I also tried bz2 and zip compression, without any luck... ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. 
Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Nov 20 12:02:48 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:02:48 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Peter Cock. Well that is progress - it means this isn't a problem coming from reading a compressed file on disk - you've made the test case simpler. Can you actually share a self-contained example script? If not, I suggest you try halving the dataset (only record the first half of the tries), and retest. Then repeat - this should tell you if the problem is, as you suspect, the size of the dataset, or something specific about a particular value. Alternatively can you share the (compressed) file? I could at least check if it fails the same way here, and perhaps add some debugging code to get more information. The error message itself is coming from some C code, which hasn't changed for some time: https://github.com/biopython/biopython/blob/master/Bio/triemodule.c The error itself is likely triggered in function _deserialize_transition in trie.c: https://github.com/biopython/biopython/blob/master/Bio/trie.c You still haven't told us the important information of which OS, which version of Python, which version of Biopython. Given it is C code, I'd also like to know how Biopython was installed (e.g. did you compile it from source yourself).
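The halving strategy suggested above can be sketched as a small bisection helper. This is only an illustration under stated assumptions: `find_smallest_failing_prefix` and the `round_trips` callable are hypothetical names, not part of Bio.trie. In practice `round_trips` would build a trie from the given keys, save it, and try to load it back. The sketch assumes that an empty key list round-trips fine and that once a prefix of the keys fails, every longer prefix fails too.

```python
def find_smallest_failing_prefix(keys, round_trips):
    """Return the smallest n such that round_trips(keys[:n]) is False.

    Returns None if the full key list round-trips successfully.
    Assumes an empty list works, and that failures are monotonic:
    once a prefix fails, every longer prefix fails as well.
    """
    if round_trips(keys):
        return None
    low, high = 0, len(keys)  # invariant: keys[:low] works, keys[:high] fails
    while high - low > 1:
        mid = (low + high) // 2
        if round_trips(keys[:mid]):
            low = mid
        else:
            high = mid
    return high


# Stand-in check for demonstration only: pretend that saving and
# reloading breaks as soon as the key "bad" is included.
def fake_round_trip(subset):
    return "bad" not in subset


keys = ["a", "b", "c", "bad", "d", "e"]
n = find_smallest_failing_prefix(keys, fake_round_trip)
print(n, keys[n - 1])
```

If the reported prefix length is always about the same regardless of which keys are used, that points at a size limit; if the last key of the failing prefix is the culprit wherever it is placed, that points at a specific value.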
---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Nov 20 12:14:21 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:14:21 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Micha? Nowotka. I'm using Ubuntu 12.04 LTS, Biopython 1.6 and Python 2.7.3. Can you tell me where should I place compressed file? ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. 
Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Nov 20 12:21:58 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:21:58 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Peter Cock. Sadly RedMine is limited to 5MB attachments. You could use DropBox or something similar, or if you have your own server put the file online temporarily for me to download it? You probably have Biopython 1.60 (one dot sixty), there was no Biopython 1.6, one dot six. Did you install Biopython using the Ubuntu package manager? i.e. the GUI tool, or at the command line with something like 'apt-get install biopython'? ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Nov 20 12:43:21 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:43:21 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Micha? Nowotka. I put the file here: http://mnowotka.kei.pl/trie.4.dat.gz ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Nov 20 12:56:47 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:56:47 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Micha? Nowotka. I confirm, it's 1.60 version, I'm using. I installed it either by apt-get install or pip. 
---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Mon Nov 26 08:29:58 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 13:29:58 +0000 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? Message-ID: On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich wrote: > On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock > wrote: >> >> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman wrote: >> > >> > Peter; >> > >> >> In the case of Bow's SearchIO code, what would you prefer? >> >> e.g. Bio.SearchIO as it is now on his branch? >> > >> > I like plain ol' Search the best but don't have a strong preference. I'm >> > terrible at naming things so trust everyone's judgment on this. >> > >> > Brad >> >> Since we have no clear consensus, I propose we add Bow's code >> as Bio.SearchIO (which is how it is written right now), with the new >> BiopythonExperimentalWarning in place (to alert people that it may >> change in the next release). We can then rename or move it at a >> later date. This will make it easier for people to test the code, and >> also suggest further changes or additions (e.g. 
Kai's HMMER work). >> >> If we and when we agree a consolidation of the Bio.SeqXXX >> modules, then Bio.SearchIO could move too. If this happens >> before any public release as Bio.SearchIO so much the better. >> >> Adopting lower case module names under Python 3 is also a >> separate issue. >> >> Peter >> > > +1 > > Regarding ... I plan to do the commit today, barring any last minute objections. I am leaning towards a merge from Bow's original (un-rebased) branch, which had only three trivial conflicts to handle. Peter From w.arindrarto at gmail.com Mon Nov 26 08:38:23 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 26 Nov 2012 14:38:23 +0100 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: Hi Peter and everyone, If it helps, I've done the rebase (also resolving the three conflicts) with the latest master branch. On top of it, I've also added the new BiopythonExperimentalWarning in Bio.SearchIO.__init__.py. It's available here: https://github.com/bow/biopython/tree/searchio. However if you're interested in inspecting the non-rebased branch, I've also kept it here: https://github.com/bow/biopython/tree/searchio-nonrebased. Note that this one doesn't have the new experimental warning since it's a feature added more recently. Also, in both branches, the tutorial has been changed with the addition of the (draft) Bio.SearchIO tutorial. Let me know which one you prefer and I'll submit a pull request :). cheers, Bow On Mon, Nov 26, 2012 at 2:29 PM, Peter Cock wrote: > On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich wrote: >> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock >> wrote: >>> >>> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman wrote: >>> > >>> > Peter; >>> > >>> >> In the case of Bow's SearchIO code, what would you prefer? >>> >> e.g. Bio.SearchIO as it is now on his branch? >>> > >>> > I like plain ol' Search the best but don't have a strong preference. 
I'm >>> > terrible at naming things so trust everyone's judgment on this. >>> > >>> > Brad >>> >>> Since we have no clear consensus, I propose we add Bow's code >>> as Bio.SearchIO (which is how it is written right now), with the new >>> BiopythonExperimentalWarning in place (to alert people that it may >>> change in the next release). We can then rename or move it at a >>> later date. This will make it easier for people to test the code, and >>> also suggest further changes or additions (e.g. Kai's HMMER work). >>> >>> If we and when we agree a consolidation of the Bio.SeqXXX >>> modules, then Bio.SearchIO could move too. If this happens >>> before any public release as Bio.SearchIO so much the better. >>> >>> Adopting lower case module names under Python 3 is also a >>> separate issue. >>> >>> Peter >>> >> >> +1 >> >> Regarding ... > > I plan to do the commit today, barring any last minute objections. > > I am leaning towards a merge from Bow's original (un-rebased) branch, > which had only three trivial conflicts to handle. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Mon Nov 26 08:49:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 13:49:44 +0000 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 1:38 PM, Wibowo Arindrarto wrote: > Hi Peter and everyone, > > If it helps, I've done the rebase (also resolving the three conflicts) > with the latest master branch. On top of it, I've also added the new > BiopythonExperimentalWarning in Bio.SearchIO.__init__.py. It's > available here: https://github.com/bow/biopython/tree/searchio. > > However if you're interested in inspecting the non-rebased branch, > I've also kept it here: > https://github.com/bow/biopython/tree/searchio-nonrebased. 
Note that > this one doesn't have the new experimental warning since it's a > feature added more recently. > > Also, in both branches, the tutorial has been changed with the > addition of the (draft) Bio.SearchIO tutorial. > > Let me know which one you prefer and I'll submit a pull request :). > > cheers, > Bow That's fine - I found both branches :) I've actually done a trial merge on the non-rebased one and then cherry-picked the experimental warning - looks good. Once that's done there is some housekeeping to do, like the indexing code duplication with Bio.SeqIO, and tackling indexing BGZF compressed files with Bio.SearchIO which I will have a go at. Peter P.S. I had intended to do this earlier this month, but we had the OBF server issues to deal with. From w.arindrarto at gmail.com Mon Nov 26 09:06:03 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 26 Nov 2012 15:06:03 +0100 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: > That's fine - I found both branches :) > > I've actually done a trial merge on the non-rebased one and > then cherry-picked the experimental warning - looks good. Ah, good then :). > Once that's done there is some housekeeping to do, like > the indexing code duplication with Bio.SeqIO, and tackling > indexing BGZF compressed files with Bio.SearchIO which > I will have a go at. Yes. I'm pretty sure there will also be changes we need to implement after more feedback from users. > P.S. I had intended to do this earlier this month, but we > had the OBF server issues to deal with. That's ok, I also noticed that it's not until quite recently that the commits become frequent again. 
From mauriceling at gmail.com Mon Nov 26 09:48:24 2012 From: mauriceling at gmail.com (Maurice Ling) Date: Mon, 26 Nov 2012 08:48:24 -0600 Subject: [Biopython-dev] Error in Bio.Entrez.__init__ Message-ID: Hi I am getting an error running this: from Bio import Entrez from Bio import Medline handle = Entrez.efetch(db="pubmed", id=[19300000], rettype="medline", retmode="text") The traceback is Traceback (most recent call last): File "C:\Users\Maurice.Ling\Desktop\muscorian\archive\pubmed_dump.py", line 16, in <module> retmode="text") File "C:\Python27\lib\site-packages\Bio\Entrez\__init__.py", line 133, in efetch keywords["id"] = ",".join(keywds["id"]) TypeError: sequence item 0: expected string, int found When I changed line 133 of Bio.Entrez.__init__ from keywords["id"] = ",".join(keywds["id"]) to keywords["id"] = ",".join(str(keywds["id"])) The error disappeared. Maurice LING mobile: +1(605)5920300, +6596669233 www: http://maurice.vodien.com CV: http://maurice.vodien.com/maurice_resume.pdf Linkedin: http://www.linkedin.com/in/mauriceling ResearchGate: https://www.researchgate.net/profile/Maurice_HT_Ling From p.j.a.cock at googlemail.com Mon Nov 26 09:57:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 14:57:28 +0000 Subject: [Biopython-dev] Error in Bio.Entrez.__init__ In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 2:48 PM, Maurice Ling wrote: > Hi > > I am getting an error running this: > > from Bio import Entrez > from Bio import Medline > handle = Entrez.efetch(db="pubmed", id=[19300000], rettype="medline", > retmode="text") > I would have used this: Entrez.efetch(db="pubmed", id=["19300000"], rettype="medline", retmode="text") In general the NCBI identifiers are arbitrary strings, although perhaps the pubmed identifiers could be treated as integers. This is perhaps worth changing in the Bio.Entrez code... What do you think Michael?
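A note on the `str()` workaround quoted above: calling `str()` on the whole list produces the text `[19300000]`, and `join` then iterates over that string one character at a time, so the TypeError goes away but the resulting ID string is not the intended one. A minimal sketch of converting each identifier separately instead, as one possible way to accept integer IDs (the helper name `join_ids` is hypothetical and not part of Bio.Entrez):

```python
def join_ids(ids):
    """Build a comma-separated ID string from one ID or a list of IDs.

    Hypothetical helper for illustration; each ID may be a str or an int.
    """
    if isinstance(ids, (str, int)):
        ids = [ids]
    return ",".join(str(identifier) for identifier in ids)


print(join_ids([19300000]))               # 19300000
print(join_ids(["19300000", 19300001]))   # 19300000,19300001
print(join_ids("19300000"))               # 19300000

# For comparison, str() applied to the whole list before joining:
print(",".join(str([19300000])))          # [,1,9,3,0,0,0,0,0,]
```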
Peter From mauriceling at gmail.com Mon Nov 26 10:23:31 2012 From: mauriceling at gmail.com (Maurice Ling) Date: Mon, 26 Nov 2012 09:23:31 -0600 Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations Message-ID: Hi I found something strange in my download script to pull a list of pubmed citations. This was working in the past (back in 2008 period)... The script is ID_start = 19000000 ID_stop = 19000010 downtime = 1.2 from Bio import Entrez from Bio import Medline import string import time import cPickle Entrez.email = 'maurice.ling at sdstate.edu' while (ID_start < ID_stop): try: handle = Entrez.efetch(db="pubmed", id=[str(ID_start)], rettype="medline", retmode="text") records = list(Medline.parse(handle))[0] print records cPickle.dump(records, open(str(ID_start) + '.txt', 'w'), -1) ID_start = ID_start + 1 time.sleep(downtime) print 'ID count: ', str(ID_start) except: print 'ID count: error ', str(ID_start) ID_start = ID_start + 1 But the results from print records kept showing the same thing: {'STAT': 'MEDLINE', 'IP': '2', 'JT': 'Biochemical medicine', 'DA': '19760116', 'FAU': ['Makar, A B', 'McMartin, K E', 'Palese, M', 'Tephly, T R'], 'DP': '1975 Jun', 'OWN': 'NLM', 'PT': ['Journal Article', "Research Support, U.S. 
Gov't, P.H.S."], 'LA': ['eng'], 'CRDT': ['1975/06/01 00:00'], 'DCOM': '19760116', 'LR': '20091111', 'PG': '117-26', 'TI': 'Formate assay in body fluids: application in methanol poisoning.', 'RN': ['0 (Formates)', '124-38-9 (Carbon Dioxide)', '67-56-1 (Methanol)', 'EC 1.2.- (Aldehyde Oxidoreductases)'], 'PL': 'UNITED STATES', 'TA': 'Biochem Med', 'JID': '0151424', 'VI': '13', 'IS': '0006-2944 (Print) 0006-2944 (Linking)', 'AU': ['Makar AB', 'McMartin KE', 'Palese M', 'Tephly TR'], 'MHDA': '1975/06/01 00:01', 'MH': ['Aldehyde Oxidoreductases/metabolism', 'Animals', 'Body Fluids/*analysis', 'Carbon Dioxide/blood', 'Formates/blood/*poisoning', 'Haplorhini', 'Humans', 'Hydrogen-Ion Concentration', 'Kinetics', 'Methanol/blood', 'Methods', 'Pseudomonas/enzymology'], 'EDAT': '1975/06/01', 'SO': 'Biochem Med. 1975 Jun;13(2):117-26.', 'SB': 'IM', 'PMID': '1', 'PST': 'ppublish'} It seems to keep efetching PMID 1 (http://www.ncbi.nlm.nih.gov/pubmed/1) Any idea? Thanks in advance. Maurice LING mobile: +1(605)5920300, +6596669233 www: http://maurice.vodien.com CV: http://maurice.vodien.com/maurice_resume.pdf Linkedin: http://www.linkedin.com/in/mauriceling ResearchGate: https://www.researchgate.net/profile/Maurice_HT_Ling From p.j.a.cock at googlemail.com Mon Nov 26 10:36:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 15:36:13 +0000 Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 3:23 PM, Maurice Ling wrote: > Hi > > I found something strange in my download script to pull a list of pubmed > citations. This was working in the past (back in 2008 period)... 
> > The script is > > ID_start = 19000000 > ID_stop = 19000010 > downtime = 1.2 > > from Bio import Entrez > from Bio import Medline > import string > import time > import cPickle > > Entrez.email = 'maurice.ling at sdstate.edu' > > while (ID_start < ID_stop): > try: > handle = Entrez.efetch(db="pubmed", id=[str(ID_start)], > rettype="medline", > retmode="text") > records = list(Medline.parse(handle))[0] > print records > cPickle.dump(records, open(str(ID_start) + '.txt', 'w'), -1) > ID_start = ID_start + 1 > time.sleep(downtime) > print 'ID count: ', str(ID_start) > except: > print 'ID count: error ', str(ID_start) > ID_start = ID_start + 1
Are you sure you didn't run something slightly different? The simplest possibility would be a line accidentally setting ID_start to equal 1, rather than increasing it. Also, using a for loop would be much cleaner (with the identifiers as either integers or as strings). For instance,
for identifier in range(19000000, 19000010):
    #Do stuff
Note you have a discrepancy with ID_stop vs ID_end
This seems to work for me:
ID_start = 19000000
ID_stop = 19000010
downtime = 1.2
from Bio import Entrez
from Bio import Medline
import string
import time
import cPickle
Entrez.email = 'maurice.ling at sdstate.edu'
for identifier in range(ID_start, ID_stop):
    identifier = str(identifier)
    try:
        handle = Entrez.efetch(db="pubmed", id=identifier, rettype="medline", retmode="text")
        records = list(Medline.parse(handle))[0]
        print records
        cPickle.dump(records, open('%s.txt' % identifier, 'w'), -1)
    except Exception, error:
        print "Error for %s - %s" % (identifier, error)
However, rather than parsing the Medline records and saving the pickled object, I would save the plain text Medline data itself. That way you can use the files outside of Python (e.g. working at the Unix command line with grep).
Peter From p.j.a.cock at googlemail.com Mon Nov 26 11:08:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 16:08:28 +0000 Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 3:42 PM, Maurice Ling wrote: > Thanks Peter > > Now, that seems to work... still scratching my uncaffeinated head though.... > Great. I'm sure a coffee will help :) Peter P.S. Next time could you use the main list for usage queries, rather than the development list, biopython-dev - thanks! From p.j.a.cock at googlemail.com Mon Nov 26 11:46:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 16:46:44 +0000 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 2:06 PM, Wibowo Arindrarto wrote: >> That's fine - I found both branches :) >> >> I've actually done a trial merge on the non-rebased one and >> then cherry-picked the experimental warning - looks good. > > Ah, good then :). Done, https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513 >> Once that's done there is some housekeeping to do, like >> the indexing code duplication with Bio.SeqIO, and tackling >> indexing BGZF compressed files with Bio.SearchIO which >> I will have a go at. > > Yes. Started, it seems the two _index.py files have diverged a little more than I'd expected: https://github.com/biopython/biopython/commit/ad1786b99afd2a50248246d877ff00a53949546b >> P.S. I had intended to do this earlier this month, but we >> had the OBF server issues to deal with. > > That's ok, I also noticed that it's not until quite recently that the > commits become frequent again. Christian Brueffer deserves some of the credit for the recent burst of commits - he's been very busy sending pull requests! 
Peter From p.j.a.cock at googlemail.com Mon Nov 26 11:55:32 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 16:55:32 +0000 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 4:46 PM, Peter Cock wrote: > On Mon, Nov 26, 2012 at 2:06 PM, Wibowo Arindrarto > wrote: >>> That's fine - I found both branches :) >>> >>> I've actually done a trial merge on the non-rebased one and >>> then cherry-picked the experimental warning - looks good. >> >> Ah, good then :). > > Done, > https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513 I've put a short note in the NEWS file, https://github.com/biopython/biopython/commit/43f7d4467dd56e67a7ad475e5ff3bf3d4f31d1d7 Congratulations Bow :) I guess this would be a good excuse for you to write another blog post ;) Speaking of which, unless we expect to release Biopython 1.61 soon, we should probably have something on the news blog too (which reminds me I was supposed to co-ordinate a general OBF GSoC 2012 post). Maybe I will manage that will on leave in December? Regards, Peter From w.arindrarto at gmail.com Mon Nov 26 12:05:43 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 26 Nov 2012 18:05:43 +0100 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: >>>> That's fine - I found both branches :) >>>> >>>> I've actually done a trial merge on the non-rebased one and >>>> then cherry-picked the experimental warning - looks good. >>> >>> Ah, good then :). >> >> Done, >> https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513 > > I've put a short note in the NEWS file, > https://github.com/biopython/biopython/commit/43f7d4467dd56e67a7ad475e5ff3bf3d4f31d1d7 > > Congratulations Bow :) Thank you :D! It feels great to see the code in master. 
> I guess this would be a good excuse for you to write another blog post ;) It is, and one should come up in the next couple of days :). Now I'm anxiously waiting for the next Biopython release ~ and the submodule's 'final' form after more feedback ;). cheers, Bow From p.j.a.cock at googlemail.com Mon Nov 26 12:22:00 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 17:22:00 +0000 Subject: [Biopython-dev] [GSoC] GSoC python variant final update In-Reply-To: References: Message-ID: On Mon, Aug 20, 2012 at 5:22 AM, Lenna Peterson wrote: > Post: http://arklenna.tumblr.com/post/29808300789/ > > The coordinate mapper, with updated documentation, is now located on > this branch: https://github.com/lennax/biopython/tree/f_loc4 > It awaits the merging of Peter's f_loc4 branch. > > I've written an entry on coordinate mapping for the Cookbook: > http://biopython.org/wiki/Coordinate_mapping Hi Lenna, Do you need my f_loc4 branch for the main GSoC variants work, or just the coordinate mapper? Thanks, Peter From chapmanb at 50mail.com Mon Nov 26 15:18:09 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 26 Nov 2012 15:18:09 -0500 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: <87vccs15ku.fsf@fastmail.fm> Bow and Peter; >> Congratulations Bow :) > > Thank you :D! It feels great to see the code in master. Awesome, nice work on this project and congratulations on getting it integrated. 
It's great to see this go in, Brad From p.j.a.cock at googlemail.com Tue Nov 27 04:35:46 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 27 Nov 2012 09:35:46 +0000 Subject: [Biopython-dev] Minor buildbot issues from SearchIO Message-ID: Hi all, The BuildBot flagged two new issues overnight, http://testing.open-bio.org/biopython/tgrid Python 2.5 on Windows - doctests are failing due to floating point decimal place differences in the exponent (down to C library differences, something fixed in later Python releases). Perhaps a Python 2.5 hack is the way to go here? http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/664/steps/shell/logs/stdio Python 3.2 and 3.3 on Windows are showing some XML character encoding oddity. Perhaps there is some encoding setting needed under Python 3 for the BLAST XML files? http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.2/builds/512/steps/shell/logs/stdio http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.3/builds/29/steps/shell/logs/stdio There is a separate cross-platform issue on Python 3.1, "TypeError: invalid event tuple" again with XML parsing. Curiously this had started a few days back in the UniprotIO tests on one machine, pre-dating the SearchIO merge. I'm not sure what triggered it. http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.1/builds/767 http://testing.open-bio.org/biopython/builders/Linux%2064%20-%20Python%203.1/builds/766/steps/shell/logs/stdio http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.1/builds/648/steps/shell/logs/stdio (Note TravisCI doesn't officially support Python 3.1, although until recently they did offer it unofficially - Python 3.3 support is happening soon through). 
Peter From diego_zea at yahoo.com.ar Tue Nov 27 09:25:48 2012 From: diego_zea at yahoo.com.ar (Diego Zea) Date: Tue, 27 Nov 2012 06:25:48 -0800 (PST) Subject: [Biopython-dev] Numpy/Scipy and Biopython Message-ID: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com> Hi!!! This is my firts mail in the list. I relative new in BioPython (I used to code more in Perl) but I want to colaborate with the project. I did this post in Stackoverflow, and I want to share my question to all of you ;) http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated Best wishes, ? if ((dx*dp)>=(h/(2*pi))) { printf("Diego Javier Zea\n"); } From anaryin at gmail.com Tue Nov 27 10:40:58 2012 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 27 Nov 2012 16:40:58 +0100 Subject: [Biopython-dev] Numpy/Scipy and Biopython In-Reply-To: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com> References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com> Message-ID: Hi Diego, Nice post and nice ideas. As for Bio.PDB, indeed representing the entire structure as a Nx3 matrix of coordinates is super attractive, but would require a deep change in the current framework. Also, manipulation of the structure (removing atoms, adding atoms, etc) would become a bit more complicated.. If you have good ideas to do this, please do share them. I know for example ProDy and csb use a similar approach. Cheers, Jo?o 2012/11/27 Diego Zea > > http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated From redmine at redmine.open-bio.org Tue Nov 27 19:46:22 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 28 Nov 2012 00:46:22 +0000 Subject: [Biopython-dev] [Biopython - Feature #3396] (New) Add alignment score, % identity, % similarity, % gaps, etc to EmbossIO Message-ID: Issue #3396 has been reported by Olga Botvinnik. 
---------------------------------------- Feature #3396: Add alignment score, % identity, % similarity, % gaps, etc to EmbossIO https://redmine.open-bio.org/issues/3396 Author: Olga Botvinnik Status: New Priority: Normal Assignee: Olga Botvinnik Category: Target version: URL: As of BioPython 1.59, if an alignment is read in with Bio.AlignIO(handle, 'emboss'), the metadata such as the substitution matrix used, gap_penalty, extend_penalty, identity, similarity, gaps, and score in the header is ignored:
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
# Score: 100.0
#
#
#=======================================
I edited the EmbossIO.py file to read these metadata and add them as annotations to each SeqRecord in the MultipleSequenceAlignment object, since the MultipleSequenceAlignment object does not have the option for annotations. I also added the appropriate unit tests. Please let me know if there is a bug in the code that I missed. For example, for the above alignment, the SeqRecord objects would have the following annotations:
{'identity_denominator': 131, 'matrix': 'EBLOSUM62', 'similarity': 0.8549618320610687, 'similarity_numerator': 112, 'similarity_denominator': 131, 'gaps': 0.1450381679389313, 'identity_numerator': 112, 'gap_penalty': 10.0, 'extend_penalty': 0.5, 'gaps_denominator': 131, 'score': 591.5, 'identity': 0.8549618320610687, 'gaps_numerator': 19}
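As a rough illustration of how header lines like the ones above could be turned into annotation keys of that shape, here is a minimal sketch. It is not the code from the attached patch: the function name `parse_emboss_header` and the two regular expressions are assumptions made up for this example, and only the key naming follows the dictionary shown above.

```python
import re

# Hypothetical helper for illustration only; not the actual EmbossIO change.
# "# Identity:      95/131 (72.5%)" style lines carry a numerator/denominator pair.
_RATIO = re.compile(r"^#\s*(Identity|Similarity|Gaps):\s*(\d+)/(\d+)")
# "# Matrix: EBLOSUM62" style lines carry a single value.
_VALUE = re.compile(r"^#\s*(Matrix|Gap_penalty|Extend_penalty|Score):\s*(\S+)")


def parse_emboss_header(lines):
    """Collect alignment metadata from EMBOSS header lines into a dict."""
    annotations = {}
    for line in lines:
        match = _RATIO.match(line)
        if match:
            key = match.group(1).lower()
            numerator = int(match.group(2))
            denominator = int(match.group(3))
            annotations[key + "_numerator"] = numerator
            annotations[key + "_denominator"] = denominator
            # Store the fraction as well, guarding against a zero denominator.
            annotations[key] = float(numerator) / denominator if denominator else 0.0
            continue
        match = _VALUE.match(line)
        if match:
            key = match.group(1).lower()
            value = match.group(2)
            # The matrix name stays a string; the other values are numeric.
            annotations[key] = value if key == "matrix" else float(value)
    return annotations
```

Keeping the numerator and denominator next to the fraction means nothing is lost to rounding in the printed percentage.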
I decided to keep the numerators and denominators separately from the identity, similarity, and gap percentages just in case a user wanted to do something else with them. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From diego_zea at yahoo.com.ar Tue Nov 27 22:09:58 2012 From: diego_zea at yahoo.com.ar (Diego Zea) Date: Tue, 27 Nov 2012 19:09:58 -0800 (PST) Subject: [Biopython-dev] Numpy/Scipy and Biopython In-Reply-To: References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com> Message-ID: <1354072198.13226.YahooMailNeo@web140606.mail.bf1.yahoo.com>
"""
Hi João (and others)!!!
Thanks :)
I think someone with more Numpy knowledge can do this better, but this is my idea:
1- Load the PDB directly into numpy (I did this fast and badly, don't trust this parser)
2- Use an nx3 matrix for xyz and one matrix with named columns for the other information. ( I don't know how ) [ The index is the same, and you can use one to slice the other with boolean arrays ;) ]
3- Define methods for the most common operations
This is an example of my idea (works on 1AB0 from PDB)...
"""
import numpy
names=[]
descript=[]
xyz = []
# The example structure is
# http://www.rcsb.org/pdb/explore.do?structureId=1ab0
with open("/home/dzea/databases/PDB/1ab0.pdb","r") as fh:
    """ Very naive parser. I wrote this in a couple of minutes.
    It's bad, but it's only to show the idea """
    for line in fh:
        if line[0:4]=='ATOM':
            temp =[]
            temp2 =[]
            temp.append(line[4:11].replace(" ",""))
            temp2.append(line[11:16].replace(" ",""))
            temp2.append(line[17:21].replace(" ",""))
            temp.append(line[22:27].replace(" ",""))
            xyz.append(line[31:56].split())
            temp.append(line[55:60].replace(" ",""))
            temp.append(line[60:67].replace(" ",""))
            temp2.append(line[-5:].replace(" ","").replace("\n",""))
            descript.append(temp)
            names.append(temp2)
# I'm not good at using different dtypes
# in different columns,
# but columns with names would be better than this:
names_array = numpy.array(names,numpy.character)
descript_array = numpy.array(descript,numpy.float16)
xyz_array = numpy.array(xyz,numpy.float16)
def select_atom(names,xyz,descript,atom='CA'):
    xyz_s = xyz[names[:,0]==atom,:]
    names_s = names[names[:,0]==atom,:]
    descript_s = descript[names[:,0]==atom,:]
    return names_s,xyz_s,descript_s
def delete_res_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,1]!=num,:]
    names_s = names[descript[:,1]!=num,:]
    descript_s = descript[descript[:,1]!=num,:]
    return names_s,xyz_s,descript_s
def delete_atom_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,0]!=num,:]
    names_s = names[descript[:,0]!=num,:]
    descript_s = descript[descript[:,0]!=num,:]
    return names_s,xyz_s,descript_s
def add_atom(new_name,new_xyz,new_descript,names,xyz,descript):
    # Using vstack ;)
    new_name = numpy.array(new_name,numpy.character)
    new_descript = numpy.array(new_descript,numpy.float16)
    new_xyz = numpy.array(new_xyz,numpy.float16)
    xyz_s = numpy.vstack((xyz,new_xyz))
    names_s = numpy.vstack((names,new_name))
    descript_s = numpy.vstack((descript,new_descript))
    return names_s,xyz_s,descript_s
## Example (works!!!)
xyz_array.shape
delete_atom_num(names_array,xyz_array,descript_array)[1].shape
add_atom(['H','H','H'],[0,0,0],[0,0,0,0],names_array,xyz_array,descript_array)[1].shape
if ((dx*dp)>=(h/(2*pi))) { printf("Diego Javier Zea\n"); } >________________________________ > De: Jo?o Rodrigues >Para: Diego Zea >CC: "biopython-dev at lists.open-bio.org" >Enviado: martes, 27 de noviembre de 2012 12:40 >Asunto: Re: [Biopython-dev] Numpy/Scipy and Biopython > > >Hi Diego, > > >Nice post and nice ideas. As for Bio.PDB, indeed representing the entire structure as a Nx3 matrix of coordinates is super attractive, but would require a deep change in the current framework. Also, manipulation of the structure (removing atoms, adding atoms, etc) would become a bit more complicated.. If you have good ideas to do this, please do share them. I know for example ProDy and csb use a similar approach. > > >Cheers, > > >Jo?o > > >2012/11/27 Diego Zea > >http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated > > > From redmine at redmine.open-bio.org Thu Nov 29 04:09:49 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 29 Nov 2012 09:09:49 +0000 Subject: [Biopython-dev] [Biopython - Feature #3398] (New) Oracle BioSQL Message-ID: Issue #3398 has been reported by Hyungyong Kim. ---------------------------------------- Feature #3398: Oracle BioSQL https://redmine.open-bio.org/issues/3398 Author: Hyungyong Kim Status: New Priority: Normal Assignee: Category: Target version: URL: I just tested Oracle BioSQL for Biopython using cx_Oracle. It includes some Biopython modification due to my genbank file test. I attached this patch and describe how it was generated.
[yong27 at dev biopython]$ git ls-remote --heads origin
902947a7df49d8529faeb7e1bfb55b2d06252272        refs/heads/master
[yong27 at dev biopython]$ git diff origin/master master > oracle_biosql.diff
[yong27 at dev biopython]$
This is an example of how to use Oracle BioSQL. Oracle, the Oracle BioSQL schema, and cx_Oracle have to be installed.
from contextlib import contextmanager
from BioSQL import BioSeqDatabase

@contextmanager
def biosqlconn(dbname):
    server = BioSeqDatabase.open_database(driver='cx_Oracle', user='USER', passwd='PASS')
    conn = server[dbname]
    try:
        yield conn
    except:
        conn.adaptor.rollback()
        raise
    else:
        conn.adaptor.commit()
    finally:
        conn.adaptor.close()

with biosqlconn('mydb') as biosqldb:
    record = biosqldb.lookup(accession='1234')
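
The commit-or-rollback behaviour of a context manager like the one above can be checked without an Oracle server. The following is only a sketch under stated assumptions: `FakeAdaptor` and `transaction` are hypothetical names invented for this illustration, and the fake object merely records which methods were called; it is not part of BioSQL or cx_Oracle.

```python
from contextlib import contextmanager


class FakeAdaptor(object):
    """Stand-in for a database adaptor; it only records the calls made on it."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


@contextmanager
def transaction(adaptor):
    """Commit on success, roll back on error, and always close."""
    try:
        yield adaptor
    except Exception:
        adaptor.rollback()
        raise
    else:
        adaptor.commit()
    finally:
        adaptor.close()


# Successful block: commit, then close.
ok = FakeAdaptor()
with transaction(ok):
    pass

# Failing block: rollback, then close, and the error still propagates.
failed = FakeAdaptor()
try:
    with transaction(failed):
        raise ValueError("simulated failure")
except ValueError:
    pass
```

Catching `Exception` rather than using a bare `except:` is a choice made for this sketch; it leaves `KeyboardInterrupt` and `SystemExit` untouched by the rollback branch while the `finally` clause still closes the connection.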

---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Thu Nov 29 05:56:04 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 29 Nov 2012 10:56:04 +0000 Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper In-Reply-To: <50B6F8FF.2090206@brueffer.de> References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de> Message-ID: Can we continue this on the biopython-dev mailing list (CC'd)? On Thu, Nov 29, 2012 at 5:56 AM, Christian Brueffer wrote: > On 11/29/2012 10:42 AM, Christian Brueffer wrote: >> >> Hi, >> >> in preparation of cleaning up the AlignACE wrapper, I wanted to test >> the current wrapper. However, it doesn't seem to work at all ... >> >> For the record, I'm testing with the Linux version of the binary >> (AlignACE version 2.3 October 27, 1998). >> > > Some of the test files in the Tests directory mention the following AlignACE > version: "AlignACE 4.0 05/13/04" > > This may be the answer to my problems. Does anyone know where to get hold > of this version? > > The website (http://atlas.med.harvard.edu/) is down and the only > other one I found (http://arep.med.harvard.edu/mrnadata/mrnasoft.html) > only distributes the old 2.3 version that I have. Hmm, I don't see any existing unit tests dedicated to this wrapper. There should really be a file named test_AlignACE_tool.py or similar. I would also like some doctests in Bio/Motif/Applications/_AlignAce.py which must be non-executing so they can be run without dependencies, which of course isn't actually a functional test but it does still catch some issues - but primarily would be as documentation to demonstrate typical usage. 
I don't appear to have AlignAce installed on my own machines - in particular, the nightly buildslaves don't have it. I don't think there is a Debian/Ubuntu package for AlignAce, so testing this under TravisCI is non-trivial - it looks like their licence agreement could block packaging it. Thanks, Peter From p.j.a.cock at googlemail.com Thu Nov 29 06:22:51 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 29 Nov 2012 11:22:51 +0000 Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper In-Reply-To: <50B74199.6020904@brueffer.de> References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de> <50B74199.6020904@brueffer.de> Message-ID: On Thu, Nov 29, 2012 at 11:06 AM, Christian Brueffer wrote: > On 11/29/2012 06:56 PM, Peter Cock wrote: >> >> Can we continue this on the biopython-dev mailing list (CC'd)? >> > > (moved to biopython-dev) > Thanks. > Indeed. I already have a cleaned up wrapper and unit tests in my local > tree, but I don't want to submit them without actually testing them with an > up to date binary ;-) Excellent - I suspected you'd been doing something like this ;) > archive.org has a version of http://atlas.med.harvard.edu/ from 2011, > I have contacted the responsible person mentioned on the page. It was Bartek who wrote the original wrapper (I only made re-factoring changes since then), hopefully he still has a working AliceACE installation and can tell us the version numbers etc that he was using. Regards, Peter From christian at brueffer.de Thu Nov 29 06:06:01 2012 From: christian at brueffer.de (Christian Brueffer) Date: Thu, 29 Nov 2012 19:06:01 +0800 Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper In-Reply-To: References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de> Message-ID: <50B74199.6020904@brueffer.de> On 11/29/2012 06:56 PM, Peter Cock wrote: > Can we continue this on the biopython-dev mailing list (CC'd)? 
> > On Thu, Nov 29, 2012 at 5:56 AM, Christian Brueffer > wrote: >> On 11/29/2012 10:42 AM, Christian Brueffer wrote: >>> >>> Hi, >>> >>> in preparation of cleaning up the AlignACE wrapper, I wanted to test >>> the current wrapper. However, it doesn't seem to work at all ... >>> >>> For the record, I'm testing with the Linux version of the binary >>> (AlignACE version 2.3 October 27, 1998). >>> >> >> Some of the test files in the Tests directory mention the following AlignACE >> version: "AlignACE 4.0 05/13/04" >> >> This may be the answer to my problems. Does anyone know where to get hold >> of this version? >> >> The website (http://atlas.med.harvard.edu/) is down and the only >> other one I found (http://arep.med.harvard.edu/mrnadata/mrnasoft.html) >> only distributes the old 2.3 version that I have. > > Hmm, I don't see any existing unit tests dedicated to this wrapper. > There should really be a file named test_AlignACE_tool.py or similar. > > I would also like some doctests in Bio/Motif/Applications/_AlignAce.py > which must be non-executing so they can be run without dependencies, > which of course isn't actually a functional test but it does still catch some > issues - but primarily would be as documentation to demonstrate typical > usage. > > I don't appear to have AlignAce installed on my own machines - in > particular, the nightly buildslaves don't have it. I don't think there is > a Debian/Ubuntu package for AlignAce, so testing this under > TravisCI is non-trivial - it looks like their licence agreement could > block packaging it. > (moved to biopython-dev) Indeed. I already have a cleaned up wrapper and unit tests in my local tree, but I don't want to submit them without actually testing them with an up to date binary ;-) archive.org has a version of http://atlas.med.harvard.edu/ from 2011, I have contacted the responsible person mentioned on the page. 
Cheers, Chris From mjldehoon at yahoo.com Thu Nov 29 09:33:12 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 29 Nov 2012 06:33:12 -0800 (PST) Subject: [Biopython-dev] Error in Bio.Entrez.__init__ In-Reply-To: Message-ID: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com> --- On Mon, 11/26/12, Peter Cock wrote: > In general the NCBI identifiers are arbitrary strings, > although perhaps the pubmed identifiers could be treated as > integers. > This is perhaps worth changing in the Bio.Entrez code... > > What do you think Michael? If we change this in the Bio.Entrez code, we should put str(..) around all NCBI identifiers, not just the pubmed ones. Otherwise we'd have special treatment for one of the Entrez databases, which may cause problems in the future. I'm OK if somebody else adds the calls to str(..), but I wouldn't champion it myself. Best, -Michiel. From p.j.a.cock at googlemail.com Thu Nov 29 09:49:42 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 29 Nov 2012 14:49:42 +0000 Subject: [Biopython-dev] Error in Bio.Entrez.__init__ In-Reply-To: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com> References: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Thu, Nov 29, 2012 at 2:33 PM, Michiel de Hoon wrote: > --- On Mon, 11/26/12, Peter Cock wrote: >> In general the NCBI identifiers are arbitrary strings, >> although perhaps the pubmed identifiers could be treated as >> integers. >> This is perhaps worth changing in the Bio.Entrez code... >> >> What do you think Michael? > > If we change this in the Bio.Entrez code, we should put str(..) around > all NCBI identifiers, not just the pubmed ones. Otherwise we'd have > special treatment for one of the Entrez databases, which may cause > problems in the future. Yes, after all there are other Entrez database with 'numerical' identifiers. > I'm OK if somebody else adds the calls to str(..), but I wouldn't champion > it myself. 
I don't mind doing the commit (and a unit test), but do you have any specific concern in mind? Peter From redmine at redmine.open-bio.org Thu Nov 29 12:12:31 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 29 Nov 2012 17:12:31 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Peter Cock. File trie_debug.patch added I can reproduce the problem with your saved file under Mac OS X, using the latest Biopython from github, e.g. $ python Python 2.7.2 (default, Jun 20 2012, 16:23:33) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from Bio import trie >>> import gzip >>> with gzip.open("trie.4.dat.gz") as handle: ... t = trie.load(handle) ... Traceback (most recent call last): File "", line 2, in RuntimeError: loading failed for some reason Adding a little debugging to the C code tells us where this fails (see attachment), line 669: 668 if(has_value) { 669 if(!(trie->value = (*read_value)(data))) 670 goto _deserialize_trie_error; 371 } What kind of CPU does your machine have? i.e. is it a normal Intel or AMD CPU, or something unusual like a PowerPC where we have to worry about the bit order interpretation? We may need a complete example creating the trie as well - the problem could be in the trie itself, the serialisation (writing to disk), or de-serialisation (loading from disk). ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? 
Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Thu Nov 29 12:21:30 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 29 Nov 2012 17:21:30 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Micha? Nowotka. I'm using ubuntu virtual machine running on MacBookPro using single Intel? Core? i7-2720QM CPU @ 2.20GHz processor. I will try to prepare code and data for which it fails. ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'w') tr = trie.trie() #fill in the trie trie.save(f, trie) Now /tmp/trie.dat.gz is about 50MB. Let's try to read it: from Bio import trie import gzip f = gzip.open('/tmp/trie.dat.gz', 'r') tr = trie.load(f) Unfortunately I'm getting meaningless error saying: "loading failed for some reason" Any hints? 
-- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From w.arindrarto at gmail.com Thu Nov 29 21:35:25 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 30 Nov 2012 03:35:25 +0100 Subject: [Biopython-dev] Minor buildbot issues from SearchIO In-Reply-To: References: Message-ID: Hi everyone, I've done some digging around to see how to deal with these issues. Here's what I found: > The BuildBot flagged two new issues overnight, > http://testing.open-bio.org/biopython/tgrid > > Python 2.5 on Windows - doctests are failing due to floating point decimal place > differences in the exponent (down to C library differences, something fixed in > later Python releases). Perhaps a Python 2.5 hack is the way to go here? > http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/664/steps/shell/logs/stdio I've submitted a pull request to fix this here: https://github.com/biopython/biopython/pull/98 > Python 3.2 and 3.3 on Windows are showing some XML character encoding oddity. > Perhaps there is some encoding setting needed under Python 3 for the BLAST > XML files? > http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.2/builds/512/steps/shell/logs/stdio > http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.3/builds/29/steps/shell/logs/stdio I've also addressed these failures here: https://github.com/biopython/biopython/pull/99 > There is a separate cross-platform issue on Python 3.1, "TypeError: > invalid event tuple" > again with XML parsing. Curiously this had started a few days back in > the UniprotIO > tests on one machine, pre-dating the SearchIO merge. I'm not sure what > triggered it. 
> http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.1/builds/767 > http://testing.open-bio.org/biopython/builders/Linux%2064%20-%20Python%203.1/builds/766/steps/shell/logs/stdio > http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.1/builds/648/steps/shell/logs/stdio As for this one, it seems that it's caused by a bug in Python 3.1 (http://bugs.python.org/issue9257) due to the way `xml.etree.cElementTree.iterparse` accepts the `events` argument. I haven't submitted any pull request for this bug, since the fix looks quite messy. Should we try to address this or simply make note that XML parsing in Python 3.1 will not work? Like Peter noted, currently this bug involves Bio.SearchIO blast xml parsing, SeqIO.UniprotIO, and Phylo.PhyloXMLIO. regards, Bow From diego_zea at yahoo.com.ar Fri Nov 30 08:00:20 2012 From: diego_zea at yahoo.com.ar (Diego Zea) Date: Fri, 30 Nov 2012 05:00:20 -0800 (PST) Subject: [Biopython-dev] Numpy/Scipy and Biopython In-Reply-To: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com> References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com> Message-ID: <1354280420.4305.YahooMailNeo@web140605.mail.bf1.yahoo.com> Hi! I was checking Seq/AlignIO, and I think it could be possible to avoid the overhead of creating Bio objects after the Numpy object, by adding an optional function in __init__ with an argument set to False by default. When this argument becomes True, objects based on Numpy are generated too. At the same time, it might be easier to interchange between simple Python objects and Numpy-based objects, and to use all the functionality of Bio along with the fast numerical operations of Numpy arrays... It's only an idea, what do you think? Thanks!!! :)
if ((dx*dp)>=(h/(2*pi))) { printf("Diego Javier Zea\n"); } >________________________________ > De: Diego Zea >Para: "biopython-dev at lists.open-bio.org" >Enviado: martes, 27 de noviembre de 2012 11:25 >Asunto: [Biopython-dev] Numpy/Scipy and Biopython > >Hi!!! >This is my firts mail in the list. >I relative new in BioPython (I used to code more in Perl) but I want to colaborate with the project. >I did this post in Stackoverflow, and I want to share my question to all of you ;) > >http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated >Best wishes, > >? >if ((dx*dp)>=(h/(2*pi))) >{ >printf("Diego Javier Zea\n"); >} >_______________________________________________ >Biopython-dev mailing list >Biopython-dev at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > From w.arindrarto at gmail.com Thu Nov 1 08:19:58 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 1 Nov 2012 09:19:58 +0100 Subject: [Biopython-dev] Working with the new SearchIO API In-Reply-To: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com> References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com> Message-ID: Hi Kai, Michiel, (I hope this gets through to the mailing list. I'm CC-ing several people in the discussion as well, just in case). I've made a new branch based on Kai's SearchIO rebase here: https://github.com/bow/biopython/tree/searchio-rebase, with the following important changes: >>Does anyone have preference between '.acc' or '.accession'? If not, I >>can change the current '.acc' into '.accession'. > > I would prefer .accession for clarity. 1. All accession attributes now use the 'accession' name (https://github.com/bow/biopython/commit/002b08df91040e6bcf3f0dd3d087b3d378005632). There's a similar attribute from blast-tab, which is the accession number and its version. This has also been renamed from 'acc_ver' to 'accession_version'. 
The docs have been updated accordingly. > See the attached hmmpfam output. You'll notice that the domain table > is not in the order of the hit table. As I'd like to preserve the > order of the hit table, the current setup of the API forces me to > either repeatedly parse the domain annotations until I find the > correct domain annotations for my hit, or to create the hits in the > order of the domain annotation table and then reshuffle them to make > sure they're in the order of the hit table. > > If I could just create "empty" hit objects when parsing the hit table, > I could easily preserve the order of the hits but still add the hsps > as I parse them. 2. Regarding the Hit object API change, I've changed it so that Hit objects can now be created without any HSPs (https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4). However, per my explanation about keeping as few places possible to store the same value (in this case the hit and query ID and description), the empty Hit object will raise errors if any of these attributes are accessed. Setting and getting these attributes will only work if there is at least one HSP in the Hit. Other Hit functions, like append, should work ok as long as it doesn't involve accessing these attributes. I think this will allow parsing of file formats like HMMER2 plain text while maintaining the attribute storage constraint. Hope these help :). regards, Bow From kai.blin at biotech.uni-tuebingen.de Thu Nov 1 09:10:11 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 01 Nov 2012 10:10:11 +0100 Subject: [Biopython-dev] Working with the new SearchIO API In-Reply-To: References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com> Message-ID: <50923C73.8060609@biotech.uni-tuebingen.de> On 2012-11-01 09:19, Wibowo Arindrarto wrote: Hi Bow, > 2. 
Regarding the Hit object API change, I've changed it so that Hit > objects can now be created without any HSPs > (https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4). > However, per my explanation about keeping as few places possible to > store the same value (in this case the hit and query ID and > description), the empty Hit object will raise errors if any of these > attributes are accessed. Setting and getting these attributes will > only work if there is at least one HSP in the Hit. Other Hit > functions, like append, should work ok as long as it doesn't involve > accessing these attributes. I think this will allow parsing of file > formats like HMMER2 plain text while maintaining the attribute storage > constraint. I totally agree the Hit object isn't valid until it has at least one HSP. Thanks for that change. Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-University of Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From redmine at redmine.open-bio.org Thu Nov 1 10:48:11 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 1 Nov 2012 10:48:11 +0000 Subject: [Biopython-dev] [Biopython - Bug #3297] (Rejected) newline added in quated features References: Message-ID: Issue #3297 has been updated by Peter Cock. Status changed from New to Rejected Was this really filed a year ago or is that an oddity in RedMine? All the discussion is in the last day... This to me is a bug in the GenBank data, rather than this:
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"
the data should have been line-split in a more sensible place, e.g.
                     /product="Glutamate synthase [NADPH] small chain (EC
                     1.4.1.13)"
In any case, the suggested fix is inappropriate for two reasons. First, as noted by Paul, it would remove the white space between words (the typical case). Second, the GenBank parser uses a scanner/consumer, with the GenBank specific consumer attempting to closely model the underlying data (and in this case keep the new lines as given) while the SeqRecord consumer (used by SeqIO) would convert the newlines into spaces. As noted by Paul, the translation value is a special case. Closing issue. ---------------------------------------- Bug #3297: newline added in quated features https://redmine.open-bio.org/issues/3297 Author: Jesse van Dam Status: Rejected Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: Note: sorry for the duplicate reporting, did not notice the makeup of the bug reporting system When I have a feature line like (which spans multiple lines) in a genbank file
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"

Then a space/newline will be added between 1.4.1 and .13 in the result so when printing the feature with the following code
  print(source[0].qualifiers["product"])
It will print (with an unwanted space)
Glutamate synthase [NADPH] small chain (EC 1.4.1 .13)
I changed the following in scanner.py to fix this problem:
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
-                               value += "\n" + iterator.next() 
+                               value += iterator.next() 
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))
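
To make the trade-off discussed above concrete, here is a minimal Python sketch. It is not the Biopython scanner code: the helper `join_qualifier_lines` and its `sep` parameter are hypothetical names used only for illustration. Under that assumption, it shows why joining continuation lines with no separator repairs the badly wrapped EC number but glues ordinary words together when the line was wrapped at a space.

```python
def join_qualifier_lines(lines, sep):
    """Join the wrapped lines of a quoted qualifier value using sep.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    return sep.join(line.strip() for line in lines)


# The badly wrapped value from the bug report (split inside the EC number).
bad_wrap = ['"Glutamate synthase [NADPH] small chain (EC 1.4.1', '.13)"']
# The same value wrapped at a word boundary, which is the typical case.
word_wrap = ['"Glutamate synthase [NADPH] small chain (EC', '1.4.1.13)"']

# Joining with a space keeps words apart but leaves "1.4.1 .13".
print(join_qualifier_lines(bad_wrap, " "))
# Joining with nothing repairs the EC number...
print(join_qualifier_lines(bad_wrap, ""))
# ...but glues "(EC" to "1.4.1.13" when the wrap was at a word boundary.
print(join_qualifier_lines(word_wrap, ""))
```

Since a parser cannot tell from the text alone which of the two wraps it is looking at, neither separator is right for both inputs, which is consistent with treating the first wrap as a problem in the data.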

-- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From w.arindrarto at gmail.com Thu Nov 1 14:36:36 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 1 Nov 2012 15:36:36 +0100 Subject: [Biopython-dev] Working with the new SearchIO API In-Reply-To: <50923C73.8060609@biotech.uni-tuebingen.de> References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com> <50923C73.8060609@biotech.uni-tuebingen.de> Message-ID: Hi Kai, You're welcome :). I was thinking changing Hit similar to QueryResult, which you can create without containing any items. The trade off is that there's more attributes to keep track of (4 instead of 2) due to them being stored apart from the contained objects, so I chose not to do it for now. Anyway, let me know if there are still parsing difficulties because of the object model. cheers, Bow On Thu, Nov 1, 2012 at 10:10 AM, Kai Blin wrote: > On 2012-11-01 09:19, Wibowo Arindrarto wrote: > > Hi Bow, > > > 2. Regarding the Hit object API change, I've changed it so that Hit > > objects can now be created without any HSPs > > ( > https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4 > ). > > However, per my explanation about keeping as few places possible to > > store the same value (in this case the hit and query ID and > > description), the empty Hit object will raise errors if any of these > > attributes are accessed. Setting and getting these attributes will > > only work if there is at least one HSP in the Hit. Other Hit > > functions, like append, should work ok as long as it doesn't involve > > accessing these attributes. I think this will allow parsing of file > > formats like HMMER2 plain text while maintaining the attribute storage > > constraint. > > I totally agree the Hit object isn't valid until it has at least one > HSP. 
Thanks for that change. > > Cheers, > Kai > > -- > Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de > Institute for Microbiology and Infection Medicine > Division of Microbiology/Biotechnology > Eberhard-Karls-University of T?bingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 T?bingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben > From eric.talevich at gmail.com Thu Nov 1 18:10:17 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 1 Nov 2012 14:10:17 -0400 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> <874nldqi3t.fsf@fastmail.fm> Message-ID: On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock wrote: > On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman wrote: > > > > Peter; > > > >> In the case of Bow's SearchIO code, what would you prefer? > >> e.g. Bio.SearchIO as it is now on his branch? > > > > I like plain ol' Search the best but don't have a strong preference. I'm > > terrible at naming things so trust everyone's judgment on this. > > > > Brad > > Since we have no clear consensus, I propose we add Bow's code > as Bio.SearchIO (which is how it is written right now), with the new > BiopythonExperimentalWarning in place (to alert people that it may > change in the next release). We can then rename or move it at a > later date. This will make it easier for people to test the code, and > also suggest further changes or additions (e.g. Kai's HMMER work). > > If we and when we agree a consolidation of the Bio.SeqXXX > modules, then Bio.SearchIO could move too. If this happens > before any public release as Bio.SearchIO so much the better. > > Adopting lower case module names under Python 3 is also a > separate issue. 
> > Peter > > +1 Regarding the "great upheaval" of module renaming and reorganization: 0. If the only change is to combine the SeqIO, Seq, SeqRecord and SeqFeature classes under a single module, we probably can do that in a backwards-compatible way. But that means keeping our StudlyCaps module names for the most part. 1. If we're going to change the API substantially, we might as well "do it right". Besides our PEP8 non-compliance, there are some dark, dusty corners of Biopython that we ought to clean up while we're at it -- reorganize the little historical fiefdoms into a coherent structure. We'd call it Biopython 2. 2. Observing BioPerl and BioRuby, it could make sense to split the distribution into multiple, with a sequence- and data-oriented "biopython-core" package and separate packages for, say, 3D structures ("biopython-struct") and perhaps other existing components that have ready maintainers and which the "core" of Biopython doesn't rely on. I don't think we need to fragment the code base much, primarily just extract PDB, SCOP and the other parts that depend on NumPy. On GitHub, these repositories would still be under the biopython organization name. 3. If we've decided to focus on Python 3 for the reorganization, we can take advantage of new features in that lineage for packaging, organization and distribution. These features could make it easier to have side-by-side Biopython 1 and 2 installations (maybe), and also plugging additional modules into the main "bio" package (namespace packages, new in Py3.3). 4. Naming: "bio" is clean but might cause problems on Windows? (I wouldn't know, nyah); "bio2" is nearly as clean; "biopy" follows the numpy/scipy convention. 5. Porting: I, personally, would keep using the old Biopython for everything that's meant to run on Python 2, which is, currently, everything. Biopython2 running on Python 3 would give me an excuse to start using Python 3 for new code. 
Keeping these separate would be more difficult if the lowercasing were done under the same "Bio" namespace. Thoughts? -Eric From p.j.a.cock at googlemail.com Thu Nov 1 18:46:36 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 1 Nov 2012 18:46:36 +0000 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> <874nldqi3t.fsf@fastmail.fm> Message-ID: On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich wrote: > On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock > wrote: >> >> Since we have no clear consensus, I propose we add Bow's code >> as Bio.SearchIO (which is how it is written right now), with the new >> BiopythonExperimentalWarning in place (to alert people that it may >> change in the next release). We can then rename or move it at a >> later date. This will make it easier for people to test the code, and >> also suggest further changes or additions (e.g. Kai's HMMER work). >> >> If we and when we agree a consolidation of the Bio.SeqXXX >> modules, then Bio.SearchIO could move too. If this happens >> before any public release as Bio.SearchIO so much the better. >> >> Adopting lower case module names under Python 3 is also a >> separate issue. >> >> Peter >> > > +1 > > Regarding the "great upheaval" of module renaming and reorganization: > > 0. If the only change is to combine the SeqIO, Seq, SeqRecord and > SeqFeature classes under a single module, we probably can do that > in a backwards-compatible way. But that means keeping our > StudlyCaps module names for the most part. Yes, that is something we could do in a backwards compatible way, with the old "StudlyCaps" Bio.SeqXXX modules persisting as legacy imports for at least a year (say). But is it worth it? See below. > 1. If we're going to change the API substantially, we might as well "do it > right".
Besides our PEP8 non-compliance, there are some dark, dusty corners > of Biopython that we ought to clean up while we're at it -- reorganize the > little historical fiefdoms into a coherent structure. We'd call it Biopython > 2. Absolutely there are things we've lived with out of backwards compatibility - the Alphabet objects are one example (foremost the way gaps and stops codons were done with wrapper objects). I'd also like us to switch the restriction digest module to using zero based counting as Guido intended, and simplify some of the more 'magical' code which has caused trouble porting to the other Python implementations. > 2. Observing BioPerl and BioRuby, it could make sense to split the > distribution into multiple, with a sequence- and data-oriented > "biopython-core" package and separate packages for, say, 3D structures > ("biopython-struct") and perhaps other existing components that have ready > maintainers and which the "core" of Biopython doesn't rely on. I don't think > we need to fragment the code base much, primarily just extract PDB, SCOP and > the other parts that depend on NumPy. On GitHub, these repositories would > still be under the biopython organization name. A clearer divide would be good - something we have at some level already along the lines with and without numpy. However, given the still unclear future for python packaging I'm not quite so sure if we can/should go all the way to separate packages. Perhaps I am being unduly worried by the concerns in the numpy/scipy community? After all, we have no fortran code! > 3. If we've decided to focus on Python 3 for the reorganization, we can take > advantage of new features in that lineage for packaging, organization and > distribution. These features could make it easier to have side-by-side > Biopython 1 and 2 installations (maybe), and also plugging additional > modules into the main "bio" package (namespace packages, new in Py3.3). 
We can and should port the current namespace to Python 3, but writing "Biopython 2" for Python 3 only (not Python 2) sounds wise. More on this below. > 4. Naming: "bio" is clean but might cause problems on Windows? (I wouldn't > know, nyah); "bio2" is nearly as clean; "biopy" follows the numpy/scipy > convention. As noted before, we couldn't use "bio" on the average Mac either - the default file system is like Windows, case insensitive. The name biopy is in line with numpy/scipy, which is a plus. I know not everyone liked this name, but personally it seems fine. Better than bio2 in my view. > 5. Porting: I, personally, would keep using the old Biopython for everything > that's meant to run on Python 2, which is, currently, everything. Biopython2 > running on Python 3 would give me an excuse to start using Python 3 for new > code. Keeping these separate would be more difficult if the lowercasing were > done under the same "Bio" namespace. > > Thoughts? As noted above, I'm on board with planning a Biopython 2 requiring Python 3 or later. I would regard this as effectively forking from the current code base, porting individual modules on a case by case basis (doing a final 2to3 conversion manually as part of this). The code could be shared as a series of 'alpha' level releases for early testing - assuming we want to make some releases, particularly for Windows where fewer potential testers would have all the compilers set up to follow the repository. However, if we do that, we would still support Biopython 1.xx under Python 3 as well (via 2to3 as we are now, currently 'beta' level support) for some time in parallel (although likely not getting major new features - just bug fixes and if required updates for format changes). Is there enough enthusiasm now to start planning what we'd change for a (potentially Python 3 only) Biopython 2 yet?
Peter From p.j.a.cock at googlemail.com Thu Nov 1 19:40:32 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 1 Nov 2012 19:40:32 +0000 Subject: [Biopython-dev] Fwd: OBF server outage announcement / call for SysAdmin volunteers In-Reply-To: References: Message-ID: FYI regarding the Biopython website and recent mailing list outage. Peter PS you also keep an eye on @Biopython and @OBF_news on Twitter, which are a useful alternative when the mailing lists are down. ---------- Forwarded message ---------- From: *Peter Cock* Date: Thursday, November 1, 2012 Subject: OBF server outage announcement / call for SysAdmin volunteers To: open-bio-l at lists.open-bio.org, OBF Members Cc: Chris Dagdigian , OBF Board Dear all, As many of you may have noticed, yesterday the Open Bioinformatics Foundation (OBF) server hosting the mailing lists and most of the Bio* websites went down. The mailing lists and simple static webpages (e.g. download pages for Bio* releases) seem to be back online, as is the OBF news blog: http://news.open-bio.org/news/ - but the wiki pages are down (which unfortunately means the Bio* homepages are unavailable). Services on the failing server are being moved to virtual machines on the Amazon Cloud, so it may take a few days until everything has been set up properly and the wiki will be back. If there is anybody from the Bio* projects who wants to join the OBF's SysAdmin team and help out with projects like this one, this would be a good moment to volunteer - please email me or Chris Dagdigian (the OBF Treasurer and our head Systems Administrator). Thank you, and please bear with us, Peter On behalf of the OBF Board of Directors. 
From p.j.a.cock at googlemail.com Thu Nov 1 19:50:50 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 1 Nov 2012 19:50:50 +0000 Subject: [Biopython-dev] OBF server outage announcement / call for SysAdmin volunteers In-Reply-To: References: Message-ID: On Thu, Nov 1, 2012 at 7:40 PM, Peter Cock wrote: > FYI regarding the Biopython website and recent mailing list outage. > > Peter > > PS you also keep an eye on @Biopython and @OBF_news on Twitter, > which are a useful alternative when the mailing lists are down. > > I should have added that while the wiki is down (which does unfortunately include the Biopython home page), the Biopython downloads remain available via http://biopython.org/DIST/ and other 'static' content like the Tutorial and API pages are up: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf http://biopython.org/DIST/docs/api/ Our source code repository is on GitHub, also fine: https://github.com/biopython/biopython Issue tracking is on our RedMine server, also fine: https://redmine.open-bio.org/projects/biopython Nightly unit tests are on our Buildbot server, also fine: http://testing.open-bio.org/biopython/tgrid Continuous integration testing is on TravisCI, also fine: http://travis-ci.org/biopython/biopython Regards, Peter From andrewscz at gmail.com Thu Nov 1 20:32:10 2012 From: andrewscz at gmail.com (Andrew Sczesnak) Date: Thu, 1 Nov 2012 13:32:10 -0700 Subject: [Biopython-dev] Pull Request: MafIO.py In-Reply-To: References: <620A45B10433AE4C81D3F931A02812F93BE3FB5721@LESMBX1.adf.bham.ac.uk> Message-ID: Thanks Nick! I updated the MafIO branch to allow reading of other key names not specified in the MAF spec. However, writing is still restricted to "score" and "pass" keys. 
On Thu, Nov 1, 2012 at 4:51 AM, Nick Loman wrote: > Hi Andrew > > Here you go: > > https://gist.github.com/58bc53d492ecc112d926 > > Thanks for your help > > Regards > > Nick > > > > On Wed, Oct 31, 2012 at 6:10 PM, Andrew Sczesnak > wrote: >> >> Nick, >> >> Can you provide a snippet of a file from mugsy for the unit tests? >> >> Thanks, >> Andrew >> >> On Oct 31, 2012, at 9:00 AM, biopython-dev-request at lists.open-bio.org >> wrote: >> >> > From: Nick Loman >> > Date: Tue, Oct 30, 2012 at 6:34 AM >> > Subject: Pull Request: MafIO.py >> > >> > >> > Hi there >> > >> > Thanks for the MafIO branch. In order to get it to read MAF files >> > produced >> > by Mugsy (mugsy.sourceforge.net) I had to make the following change: >> > >> > diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py >> > index 6eda0ca..4bb1407 100644 >> > --- a/Bio/AlignIO/MafIO.py >> > +++ b/Bio/AlignIO/MafIO.py >> > @@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet = >> > single_letter_alphabet): >> > >> > annotations = dict([x.split("=") for x in >> > line.strip().split()[1:]]) >> > >> > - if len([x for x in annotations.keys() if x not in ("score", >> > "pass")]) > 0: >> > + if len([x for x in annotations.keys() if x not in ("score", >> > "pass", "label", "mult")]) > 0: >> > raise ValueError("Error parsing alignment - invalid key >> > in >> > 'a' line") >> > elif line.startswith("#"): >> > # ignore comments >> > >> > >> > My Python fork is a bit confusing right now so hope you don't mind me >> > sending this pull request via email! >> > >> > Cheers >> > >> > Nick > > From eric.talevich at gmail.com Fri Nov 2 02:47:56 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 1 Nov 2012 22:47:56 -0400 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: References: <1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com> <508A694B.7030800@biotech.uni-tuebingen.de> <508A8041.2020203@biotech.uni-tuebingen.de> <87pq42s9lt.fsf@fastmail.fm> <874nldqi3t.fsf@fastmail.fm> Message-ID: On Thu, Nov 1, 2012 at 2:46 PM, Peter Cock wrote: > On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich > wrote: > > > 2. Observing BioPerl and BioRuby, it could make sense to split the > > distribution into multiple, with a sequence- and data-oriented > > "biopython-core" package and separate packages for, say, 3D structures > > ("biopython-struct") and perhaps other existing components that have > ready > > maintainers and which the "core" of Biopython doesn't rely on. I don't > think > > we need to fragment the code base much, primarily just extract PDB, SCOP > and > > the other parts that depend on NumPy. On GitHub, these repositories would > > still be under the biopython organization name. > > A clearer divide would be good - something we have at some level > already along the lines with and without numpy. However, given > the still unclear future for python packaging I'm not quite so sure > if we can/should go all the way to separate packages. Perhaps I > am being unduly worried by the concerns in the numpy/scipy > community? After all, we have no fortran code! > My own use of packaging features and setuptools in particular is pretty primitive, so I'm not sure what the risks are. Having a separate repository for structure-related code would make it much easier for me and Jo?o to hack on a Bio.PDB successor, I think. It would also be nice to have a dependency-free "core" and then a bit more flexibility in using dependencies for add-on packages -- there are a lot of good existing libraries for structural biology, for instance, and since performance is so important there we even might want to start using Cython for some of that code. Then there's Lenna's pure-Python mmCIF parser which depends on PLY. > > 5. 
Porting: I, personally, would keep using the old Biopython for > everything > > that's meant to run on Python 2, which is, currently, everything. > Biopython2 > > running on Python 3 would give me an excuse to start using Python 3 for > new > > code. Keeping these separate would be more difficult if the lowercasing > were > > done under the same "Bio" namespace. > > > > Thoughts? > > > As noted above, I'm on board with planning a Biopython 2 requiring Python 3 > or later. I would regard this as effectively be forking from the current > code > base, porting individual modules on a case by case basis (doing a final > 2to3 > conversion manually as part of this). The code could be shared as a series > of 'alpha' level releases for early testing - assume we want to make some > releases, particularly for Windows where fewer potential testers would > have all the compilers setup to follow the repository. > > Sounds good to me. > However, if we do that, we would still support Biopython 1.xx under > Python 3 as well (via 2to3 as we are now, currently 'beta' level support) > for some time in parallel (although likely not getting major new features - > just bug fixes and if required updates for format changes). > > Sure. I'm assuming it will be some time before we have a Biopython2 we're happy with, sorting out the module organization, dusting off old code, dealing with module-specific dependencies and so on, and I'm OK with that. > Is there enough enthusiasm now to start planning what we'd change for > a (potentially Python 3 only) Biopython 2 yet? > > Peter > Maybe a good time to create the initial fork would be after we've merged the latest GSoC work and any feasible long-running branches. The Bio.PDB-related GSoC work, on the other hand, seems to be held up specifically because we're afraid to muck with the existing sub-package too much with unstable new code, and I can imagine it would be easier to land it in a new namespace. 
-Eric From mjldehoon at yahoo.com Fri Nov 2 16:01:35 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 2 Nov 2012 09:01:35 -0700 (PDT) Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: Message-ID: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com> Hi everybody, --- On Thu, 11/1/12, Eric Talevich wrote: > 1. If we're going to change the API substantially, we might > as well "do it right". Besides our PEP8 non-compliance, there > are some dark, dusty corners of Biopython that we ought to clean > up while we're at it -- reorganize the little historical fiefdoms > into a coherent structure. We'd call it Biopython 2. +1. > 2. Observing BioPerl and BioRuby, it could make sense to > split the distribution into multiple, with a sequence- and > data-oriented "biopython-core" package and separate packages > for, say, 3D structures ("biopython-struct") and perhaps other > existing components that have ready > maintainers and which the "core" of Biopython doesn't rely > on. I don't think we need to fragment the code base much, > primarily just extract PDB, SCOP and the other parts that > depend on NumPy. This goes against the "coherent structure" in point 1. What is the advantage of splitting the distribution according to whether a module needs NumPy or not? I don't see an advantage to the user, and I don't see an advantage to the developers either. Already I feel that we need to install too many packages to get going with Python in bioinformatics (Python itself, NumPy, Matplotlib and its dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps SciPy, Biopython). I find this hard to explain to people new to bioinformatics or new to Python. So I would prefer to keep one distribution. We can be more lenient in terms of dependencies, especially those that don't occur at compile time. > 4. Naming: "bio" is clean but might cause problems on > Windows? 
(I wouldn't know, nyah); "bio2" is nearly as clean; > "biopy" follows the numpy/scipy convention. Any problems on Windows will only occur during a transition period, so I wouldn't worry about that too much. Perhaps we should check if there would be any problems; if they are severe, we could check for an existing Biopython installation in setup.py. bio2 would stay with us forever (well at least until bio3) and is just plain ugly, especially to new users who are not aware of the transition. Then there is the issue that "bio2" would not be for Python 2 but for Python 3. The "py" is needed in numpy and scipy because otherwise it would be "num" and "sci", which is too short. On the other hand, "bio" is used as a prefix in lots of words, and can stand on its own. Therefore, hurray for "bio". > 5. Porting: I, personally, would keep using the old Biopython for > everything that's meant to run on Python 2, which is, currently, > everything. Biopython2 running on Python 3 would give me an > excuse to start using Python 3 for new code. Keeping these > separate would be more difficult if the lowercasing were done > under the same "Bio" namespace. Yes that makes sense. Best, -Michiel. From anaryin at gmail.com Sat Nov 3 11:12:37 2012 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Sat, 3 Nov 2012 12:12:37 +0100 Subject: [Biopython-dev] PEP8 lower case module names? In-Reply-To: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com> References: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi everyone, A bit late for the party but my two cents. I agree with Eric in that we should take the opportunity to review some "dark corners" of the code. Regarding what I can contribute to, there are a lot of changes planned for Bio.PDB that could benefit from a "cleaner start". However, and also in line with Michiel, splitting the distribution in core/extras would be more cumbersome for new users. 
However, what about having in the setup file a part where the user can turn on/off installation of particular parts of the package. This way you can control if you need the dependencies or not. By default you would install everything as it is now, but it would give you a larger degree of control. As for the namespace and lowercase, I don't really have strong arguments, but I like 'bio'. Cheers, João João [...] Rodrigues http://nmr.chem.uu.nl/~joao 2012/11/2 Michiel de Hoon > Hi everybody, > > --- On Thu, 11/1/12, Eric Talevich wrote: > > 1. If we're going to change the API substantially, we might > > as well "do it right". Besides our PEP8 non-compliance, there > > are some dark, dusty corners of Biopython that we ought to clean > > up while we're at it -- reorganize the little historical fiefdoms > > into a coherent structure. We'd call it Biopython 2. > > +1. > > > 2. Observing BioPerl and BioRuby, it could make sense to > > split the distribution into multiple, with a sequence- and > > data-oriented "biopython-core" package and separate packages > > for, say, 3D structures ("biopython-struct") and perhaps other > > existing components that have ready > > maintainers and which the "core" of Biopython doesn't rely > > on. I don't think we need to fragment the code base much, > > primarily just extract PDB, SCOP and the other parts that > > depend on NumPy. > > This goes against the "coherent structure" in point 1. What is the > advantage of splitting the distribution according to whether a module needs > NumPy or not? I don't see an advantage to the user, and I don't see an > advantage to the developers either. Already I feel that we need to install > too many packages to get going with Python in bioinformatics (Python > itself, NumPy, Matplotlib and its dependencies, Pysam, Cython (needed to > compile Pysam), ezsetup, perhaps SciPy, Biopython). I find this hard to > explain to people new to bioinformatics or new to Python.
So I would prefer > to keep one distribution. > > We can be more lenient in terms of dependencies, especially those that > don't occur at compile time. > > > 4. Naming: "bio" is clean but might cause problems on > > Windows? (I wouldn't know, nyah); "bio2" is nearly as clean; > > "biopy" follows the numpy/scipy convention. > > Any problems on Windows will only occur during a transition period, so I > wouldn't worry about that too much. Perhaps we should check if there would > be any problems; if they are severe, we could check for an existing > Biopython installation in setup.py. > > bio2 would stay with us forever (well at least until bio3) and is just > plain ugly, especially to new users who are not aware of the transition. > Then there is the issue that "bio2" would not be for Python 2 but for > Python 3. > > The "py" is needed in numpy and scipy because otherwise it would be "num" > and "sci", which is too short. On the other hand, "bio" is used as a prefix > in lots of words, and can stand on its own. Therefore, hurray for "bio". > > > 5. Porting: I, personally, would keep using the old Biopython for > > everything that's meant to run on Python 2, which is, currently, > > everything. Biopython2 running on Python 3 would give me an > > excuse to start using Python 3 for new code. Keeping these > > separate would be more difficult if the lowercasing were done > > under the same "Bio" namespace. > > Yes that makes sense. > > Best, > -Michiel. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tiagoantao at gmail.com Sun Nov 4 13:09:35 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sun, 4 Nov 2012 13:09:35 +0000 Subject: [Biopython-dev] PEP8 lower case module names? 
In-Reply-To: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com> References: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi, On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon wrote: > Already I feel that we need to install too many packages to get going with > Python in bioinformatics (Python itself, NumPy, Matplotlib and its > dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps > SciPy, Biopython). I find this hard to explain to people new to > bioinformatics or new to Python. So I would prefer to keep one distribution. > > We can be more lenient in terms of dependencies, especially those that > don't occur at compile time. > > One of the things that I always found lacking with biopython is a clear, consistent policy on dependencies: Depending on the mood of the day it could be either good/bad to add a library dependency. As an example, this ended up with there being a dependency on reportlab, but not on scipy. Whatever the policy, I think that is should be consistent all across. Preferably simple to both users and developers. A few ideas on policy: 1. I totally agree with the the idea of being as lenient as possible with dependencies (as you say, especially with those that do not occur at compile time). 2. Biopython belongs to a certain software ecology. I think it would make sense to see as natural adding dependencies on well established python libraries. 3. (1+2) If a developer wants to add a dependency on a package, that should not be a major problem (as long as the package is maintained for long/well known/stable). Users should only have to deal with the dependency if they need the functionality that depends on that package. Python being a dynamic language, there does not have to be a burden on users/developers if a remote part of Biopython depends on something more exotic (which most users/developers will never see/install in any case). 
Again by "exotic" I mean well known libraries with a track record of years of stability. Tiago PS - Another issue that it would be interesting to see cleared up would be the policy on compile time (linkage) dependencies. Are new ones encouraged? What about Java/Jython based? From p.j.a.cock at googlemail.com Sun Nov 4 14:01:16 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 14:01:16 +0000 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? Message-ID: Retitling thread On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão wrote: > Hi, > > > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon wrote: >> >> Already I feel that we need to install too many packages to get going with >> Python in bioinformatics (Python itself, NumPy, Matplotlib and its >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps >> SciPy, Biopython). I find this hard to explain to people new to >> bioinformatics or new to Python. So I would prefer to keep one distribution. >> >> We can be more lenient in terms of dependencies, especially those that >> don't occur at compile time. >> > > One of the things that I always found lacking with biopython is a clear, > consistent policy on dependencies: It would be good to have something written down, just as we did with the deprecation policy. > Depending on the mood of the day it could be either good/bad > to add a library dependency. As an example, this ended up > with there being a dependency on reportlab, but not on scipy. The ReportLab dependency is a 'run time only' dependency and has been in Biopython for a very long time. You'd have to remind me if there was any compile time issue with scipy, but my recollection was we were loath to add a dependency on scipy (which is quite a complex library to install if not using a package) for just one or two functions - however you were planning something more substantial in the PopGen code which would justify it (using lots of statistics).
> Whatever the policy, I think that is should be consistent all across. > Preferably simple to both users and developers. > > A few ideas on policy: > > 1. I totally agree with the the idea of being as lenient as possible with > dependencies (as you say, especially with those that do not occur at > compile time). > 2. Biopython belongs to a certain software ecology. I think it would make > sense to see as natural adding dependencies on well established python > libraries. > 3. (1+2) If a developer wants to add a dependency on a package, that should > not be a major problem (as long as the package is maintained for long/well > known/stable). Users should only have to deal with the dependency if they > need the functionality that depends on that package. > > Python being a dynamic language, there does not have to be a burden on > users/developers if a remote part of Biopython depends on something more > exotic (which most users/developers will never see/install in any case). > Again by "exotic" I mean well known libraries with a track record of years > of stability. That all sounds reasonable. It is compile time dependencies that I am most wary of. However, from an end user perspective having installed Biopython and then trying a script from a colleague and only then finding 101 optional run time dependencies are also needed would be annoying. For Linux packages like Debian there is a 'recommends' field for this kind of soft dependency. Where do we stand with declaring dependencies in setup.py so that if using a package manager like pip this it less painful? In fact, how many 'soft' dependencies like this do we already have? Just from a quick look at the README file many are not mentioned under the current 'System Requirements' text (e.g. Network X). > Tiago > PS - Another issue that it would be interesting see cleared-up would be the > policy on compile time (linkage) dependencies. Are new ones encouraged? Currently discouraged. 
They make installation much more painful, and have tended to be left untested, e.g. mmCIF was for many years disabled by default because no one could work out how to detect its requirements at compile time. > What about Java/Jython based? I'm not so keen on something providing Java/Jython only functionality. However, something where we could require library X under Jython while using library Y under C Python makes sense. Database access would be a perfect example - things like Python's sqlite3 don't yet exist under Jython. Peter From sbassi at clubdelarazon.org Sun Nov 4 17:34:55 2012 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Sun, 4 Nov 2012 14:34:55 -0300 Subject: [Biopython-dev] 403 link Message-ID: On page http://biopython.org/wiki/Documentation there are 2 links to a 403 error: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf I can't correct this doc since I don't know were they are. From p.j.a.cock at googlemail.com Sun Nov 4 18:08:40 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 18:08:40 +0000 Subject: [Biopython-dev] 403 link In-Reply-To: References: Message-ID: On Sun, Nov 4, 2012 at 5:34 PM, Sebastian Bassi wrote: > On page http://biopython.org/wiki/Documentation there are 2 links to a > 403 error: > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > I can't correct this doc since I don't know were they are. The links are correct - this is a side effect of the current migration from the (dying) OBF server to an Amazon hosted virtual machine. As of yesterday the static pages were up and the wiki down, for now it is the other way round... its being worked on. Regards, Peter From eric.talevich at gmail.com Sun Nov 4 19:47:53 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 4 Nov 2012 14:47:53 -0500 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? 
In-Reply-To: References: Message-ID: On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock wrote: > Retitling thread > > On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão wrote: > > Hi, > > > > > > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon > wrote: > >> > >> Already I feel that we need to install too many packages to get going > with > >> Python in bioinformatics (Python itself, NumPy, Matplotlib and its > >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps > >> SciPy, Biopython). I find this hard to explain to people new to > >> bioinformatics or new to Python. So I would prefer to keep one > distribution. > >> > >> We can be more lenient in terms of dependencies, especially those that > >> don't occur at compile time. > >> > > > > One of the things that I always found lacking with biopython is a clear, > > consistent policy on dependencies: > > It would be good to have something written down, just as we > did with the deprecation policy. > Should we start a page for this on the wiki? > > Depending on the mood of the day it could be either good/bad > > to add a library dependency. As an example, this ended up > > with there being a dependency on reportlab, but not on scipy. > > The ReportLab dependency is a 'run time only' dependency and > has been in Biopython for a very long time. You'd have to remind > me if there was any compile time issue with scipy, but my > recollection was we were loath to add a dependency on scipy > (which is quite a complex library to install if not using a package) > for just one or two functions - however you were planning something > more substantial in the PopGen code which would justify it (using > lots of statistics). > > > Whatever the policy, I think that is should be consistent all across. > > Preferably simple to both users and developers. > > > > A few ideas on policy: > > > > 1.
I totally agree with the the idea of being as lenient as possible with > > dependencies (as you say, especially with those that do not occur at > > compile time). > > 2. Biopython belongs to a certain software ecology. I think it would make > > sense to see as natural adding dependencies on well established python > > libraries. > > 3. (1+2) If a developer wants to add a dependency on a package, that > should > > not be a major problem (as long as the package is maintained for > long/well > > known/stable). Users should only have to deal with the dependency if they > > need the functionality that depends on that package. > > > > Python being a dynamic language, there does not have to be a burden on > > users/developers if a remote part of Biopython depends on something more > > exotic (which most users/developers will never see/install in any case). > > Again by "exotic" I mean well known libraries with a track record of > years > > of stability. > > That all sounds reasonable. It is compile time dependencies that I am > most wary of. > Pure-Python dependencies seem less scary -- a package like PLY should work on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the dependencies that are most tempting are the ones with essential C extensions (numpy, scipy, matplotlib). However, from an end user perspective having installed Biopython and > then trying a script from a colleague and only then finding 101 optional > run time dependencies are also needed would be annoying. > > For Linux packages like Debian there is a 'recommends' field for this kind > of soft dependency. Where do we stand with declaring dependencies in > setup.py so that if using a package manager like pip this it less painful? > > In fact, how many 'soft' dependencies like this do we already have? > Just from a quick look at the README file many are not mentioned > under the current 'System Requirements' text (e.g. Network X). > I just used "git grep import Bio/" to find out. 
The only egregious undocumented dependencies are the ones I added in Phylo for graphics: networkx and matplotlib/pylab. Other *possible* dependencies are sqlite3 in the case of Jython (Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7 (Bio._py3k). Should we add these to the "install_recommends" list in setup.py? > > Tiago > > PS - Another issue that it would be interesting see cleared-up would be > the > > policy on compile time (linkage) dependencies. Are new ones encouraged? > > Currently discouraged. They make installation much more painful, and > have tended to be left untested, e.g. mmCIF was for many years disabled > by default because no one could work out how to detect its requirements > at compile time. > > > What about Java/Jython based? > > I'm not so keen on something providing Java/Jython only functionality. > However, something where we could require library X under Jython > while using library Y under C Python makes sense. Database access > would be a perfect example - things like Python's sqlite3 don't yet exist > under Jython. > > Peter > From tiagoantao at gmail.com Sun Nov 4 20:49:33 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sun, 4 Nov 2012 20:49:33 +0000 Subject: [Biopython-dev] Jython DB Message-ID: Howdy, On Sun, Nov 4, 2012 at 2:01 PM, Peter Cock wrote: > Retitling thread > Again ;) > while using library Y under C Python makes sense. Database access > would be a perfect example - things like Python's sqlite3 don't yet exist > under Jython. > > I noticed that there is 1 reference to sqlite3: Bio.SeqIO._index Other stuff on BioSQL is just really related to database configuration and does not impair functionality (exception to a test case that really depends on sqlite3). I suppose that a "default" DB with Jython would probably be JavaDB (aka Apache Derby)? It is available as a default on the Sun/Oracle JDK (though not the JRE). 
I could go ahead and have a try at evaluating the portability costs for sqlite3->javadb. In theory it should be easy ( http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html) -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Sun Nov 4 20:49:58 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 20:49:58 +0000 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? In-Reply-To: References: Message-ID: On Sunday, November 4, 2012, Eric Talevich wrote: > On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock > > wrote: > >> Retitling thread >> >> On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão > >> wrote: >> > Hi, >> > >> > >> > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon > >> wrote: >> >> >> >> Already I feel that we need to install too many packages to get going >> with >> >> Python in bioinformatics (Python itself, NumPy, Matplotlib and its >> >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps >> >> SciPy, Biopython). I find this hard to explain to people new to >> >> bioinformatics or new to Python. So I would prefer to keep one >> distribution. >> >> >> >> We can be more lenient in terms of dependencies, especially those that >> >> don't occur at compile time. >> >> >> > >> > One of the things that I always found lacking with biopython is a clear, >> > consistent policy on dependencies: >> >> It would be good to have something written down, just as we >> did with the deprecation policy. >> > > Should we start a page for this on the wiki? > > The wiki is online again now :) Maybe agree a draft by email first? > > Depending on the mood of the day it could be either good/bad >> > to add a library dependency. As an example, this ended up >> > with there being a dependency on reportlab, but not on scipy. >> >> The ReportLab dependency is a 'run time only' dependency and >> has been in Biopython for a very long time.
You'd have to remind >> me if there was any compile time issue with scipy, but my >> recollection was we were loath to add a dependency on scipy >> (which is quite a complex library to install if not using a package) >> for just one or two functions - however you were planning something >> more substantial in the PopGen code which would justify it (using >> lots of statistics). >> >> > Whatever the policy, I think that is should be consistent all across. >> > Preferably simple to both users and developers. >> > >> > A few ideas on policy: >> > >> > 1. I totally agree with the the idea of being as lenient as possible >> with >> > dependencies (as you say, especially with those that do not occur at >> > compile time). >> > 2. Biopython belongs to a certain software ecology. I think it would >> make >> > sense to see as natural adding dependencies on well established python >> > libraries. >> > 3. (1+2) If a developer wants to add a dependency on a package, that >> should >> > not be a major problem (as long as the package is maintained for >> long/well >> > known/stable). Users should only have to deal with the dependency if >> they >> > need the functionality that depends on that package. >> > >> > Python being a dynamic language, there does not have to be a burden on >> > users/developers if a remote part of Biopython depends on something more >> > exotic (which most users/developers will never see/install in any case). >> > Again by "exotic" I mean well known libraries with a track record of >> years >> > of stability. >> >> That all sounds reasonable. It is compile time dependencies that I am >> most wary of. >> > > Pure-Python dependencies seem less scary -- a package like PLY should work > on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the > dependencies that are most tempting are the ones with essential C > extensions (numpy, scipy, matplotlib). > But (for example) matplotlib wouldn't be a build time dependency for us. 
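A minimal sketch of what a 'run time only' dependency such as matplotlib could look like in code. The helper `require_optional`, the exception class, and the `draw_tree` function are hypothetical names made up for illustration, not Biopython API; the point is only that the import happens when the feature is used, so building and importing the package never needs the library.

```python
import importlib


class MissingOptionalDependencyError(ImportError):
    """Raised when an optional run-time dependency is not installed."""


def require_optional(module_name, feature):
    """Import module_name on first use, or say which feature needs it.

    Hypothetical helper for illustration; not part of Biopython.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise MissingOptionalDependencyError(
            "%s requires the optional module %r; please install it."
            % (feature, module_name)
        ) from err


def draw_tree(labels):
    """Hypothetical plotting entry point; needs matplotlib only when called."""
    pyplot = require_optional("matplotlib.pyplot", "Tree drawing")
    fig, ax = pyplot.subplots()
    for y, label in enumerate(labels):
        # Place each label on its own row of the figure.
        ax.text(0, y, label)
    return fig
```

With this shape, a user who never calls `draw_tree` never needs matplotlib, and one who does gets a message naming the missing module instead of a bare ImportError from deep inside the package.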
> However, from an end user perspective having installed Biopython and >> then trying a script from a colleague and only then finding 101 optional >> run time dependencies are also needed would be annoying. >> >> For Linux packages like Debian there is a 'recommends' field for this kind >> of soft dependency. Where do we stand with declaring dependencies in >> setup.py so that if using a package manager like pip this it less painful? >> >> In fact, how many 'soft' dependencies like this do we already have? >> Just from a quick look at the README file many are not mentioned >> under the current 'System Requirements' text (e.g. Network X). >> > > I just used "git grep import Bio/" to find out. The only egregious > undocumented dependencies are the ones I added in Phylo for graphics: > networkx and matplotlib/pylab. > Could you add those to the README file then? > Other *possible* dependencies are sqlite3 in the case of Jython > (Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7 (Bio._py3k). > > Should we add these to the "install_recommends" list in setup.py? > No, they are in the standard lib on C Python, except in the case of OrderedDict on older Pythons were we bundle a backport anyway. Jython has an open bug on including the sqlite3 module, and might be worth mentioning under a new Jython specific section of the README. Peter From tiagoantao at gmail.com Sun Nov 4 21:00:10 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sun, 4 Nov 2012 21:00:10 +0000 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? In-Reply-To: References: Message-ID: On Sun, Nov 4, 2012 at 8:49 PM, Peter Cock wrote: > Jython has an open bug on including the sqlite3 module, > > This will go nowhere fast as it will be dependent on a JNI library (i.e. linkage of C code). The only durable option in the Java space would be a native implementation of sqlite3. All other options are not of the "embeddable" type (e.g. 
JDBC driver to something running outside), defeating the main purpose of sqlite3. To sum it up: I doubt that sqlite3 will be a realistic solution in the Jython space. As per previous email, I suspect that a Python DBI to JDBC bridge (bundled with Jython by default) + a default database (javadb/derby or H2 or HSQLDB) is probably more realistic in the Java space. On the OracleJDK javadb will require 0 dependencies. On other JDK or a JRE, Apache derby. -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Sun Nov 4 21:47:20 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 21:47:20 +0000 Subject: [Biopython-dev] Jython DB In-Reply-To: References: Message-ID: Hi Tiago, On Sun, Nov 4, 2012 at 8:49 PM, Tiago Antão wrote: > Howdy, > > On Sun, Nov 4, 2012 at 2:01 PM, Peter Cock wrote: >> >> Retitling thread > > > Again ;) > > >> >> while using library Y under C Python makes sense. Database access >> would be a perfect example - things like Python's sqlite3 don't yet exist >> under Jython. >> > > I noticed that there is 1 reference to sqlite3: > Bio.SeqIO._index > > Other stuff on BioSQL is just really related to database configuration and > does not impair functionality (exception to a test case that really depends > on sqlite3). > > I suppose that a "default" DB with Jython would probably be JavaDB (aka > Apache Derby)? It is available as a default on the Sun/Oracle JDK (though > not the JRE). > > I could go ahead and have a try at evaluating the portability costs for > sqlite3->javadb. In theory it should be easy > (http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html) The database stuff in Biopython currently is BioSQL (which under C Python supports a MySQL, PostgreSQL or SQLite back end) and things like SeqIO.index which use SQLite3 directly.
None of this currently works under Jython :( I was hoping Jython would implement an sqlite3 module which we (and any other Python library) could just use - there seems to be no progress on that: http://bugs.jython.org/issue1682864 Likewise the MySQLdb and PostgreSQL modules. Failing a port allowing our current code to "just work", someone could write alternative code for Biopython to call an appropriate Java DB interface directly. For our BioSQL we already have a structure to cope with a range of backends, so this should be quite clean. In the case of Bio.SeqIO.index_db, we probably only use a fraction of the full sqlite3 module's capabilities, so special casing this under Jython to call JavaDB might not be too complicated... (for anyone who knows their way around Jython and JavaDB)? If you fancy exploring SQLite3 under Jython, go for it :) Peter From p.j.a.cock at googlemail.com Sun Nov 4 21:48:56 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 4 Nov 2012 21:48:56 +0000 Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names? In-Reply-To: References: Message-ID: On Sun, Nov 4, 2012 at 9:00 PM, Tiago Antão wrote: > On Sun, Nov 4, 2012 at 8:49 PM, Peter Cock wrote: >> >> Jython has an open bug on including the sqlite3 module, >> > > This will go nowhere fast as it will be dependent on a JNI library (i.e. > linkage of C code). > The only durable option in the Java space would be a native implementation > of sqlite3. > All other options are not of the "embeddable" type (e.g. JDBC driver to > something running outside), defeating the main purpose of sqlite3.
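The "library X under Jython, library Y under C Python" idea from the Jython DB discussion could be sketched as a small dispatch on the interpreter. This is only an illustration under stated assumptions: `pick_index_backend` and the backend labels are hypothetical, and the "jdbc" branch stands in for whatever Java database layer (JavaDB/Derby or similar) would actually be chosen.

```python
import platform


def pick_index_backend(implementation=None):
    """Return a label for the database layer to use on this interpreter.

    Hypothetical helper for illustration; the labels are placeholders,
    not module names guaranteed to exist on every platform.
    """
    if implementation is None:
        # Returns e.g. "CPython", "Jython" or "PyPy".
        implementation = platform.python_implementation()
    if implementation == "Jython":
        # No sqlite3 module under Jython, so use a JDBC-based store instead.
        return "jdbc"
    return "sqlite3"
```

Keeping the decision in one function means code such as an index builder would only need to ask for a backend once, rather than scattering interpreter checks around.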
Let's continue this on the new thread: http://lists.open-bio.org/pipermail/biopython-dev/2012-November/010072.html Peter
From redmine at redmine.open-bio.org Sun Nov 4 22:47:21 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 4 Nov 2012 22:47:21 +0000 Subject: [Biopython-dev] [Biopython - Bug #3392] (New) unable to download almost any documentation - the download links are invalid Message-ID: Issue #3392 has been reported by Brad Zoltick. ---------------------------------------- Bug #3392: unable to download almost any documentation - the download links are invalid https://redmine.open-bio.org/issues/3392 Author: Brad Zoltick Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Documentation Target version: Not Applicable URL: People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response: Forbidden You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server. Apache/2.2.23 (Amazon) Server at biopython.org Port 80 ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Sun Nov 4 22:47:23 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 4 Nov 2012 22:47:23 +0000 Subject: [Biopython-dev] [Biopython - Bug #3393] (New) unable to download almost any documentation - the download links are invalid Message-ID: Issue #3393 has been reported by Brad Zoltick. ---------------------------------------- Bug #3393: unable to download almost any documentation - the download links are invalid https://redmine.open-bio.org/issues/3393 Author: Brad Zoltick Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Documentation Target version: Not Applicable URL: People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response: Forbidden You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server. Apache/2.2.23 (Amazon) Server at biopython.org Port 80 -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Mon Nov 5 00:06:10 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 5 Nov 2012 00:06:10 +0000 Subject: [Biopython-dev] [Biopython - Bug #3392] unable to download almost any documentation - the download links are invalid References: Message-ID: Issue #3392 has been updated by Peter Cock.
Category changed from Documentation to Website Priority changed from Normal to Urgent Yep, we know about it - but thanks for letting us know just in case: http://lists.open-bio.org/pipermail/biopython-dev/2012-November/010069.html The same issue affects our release downloads too, which is more annoying. It's a side effect of the server migration from a dying machine to a virtual machine on the Amazon Cloud, http://lists.open-bio.org/pipermail/biopython/2012-November/008248.html Leaving this bug open until the new server is fixed... ---------------------------------------- Bug #3392: unable to download almost any documentation - the download links are invalid https://redmine.open-bio.org/issues/3392 Author: Brad Zoltick Status: New Priority: Urgent Assignee: Biopython Dev Mailing List Category: Website Target version: Not Applicable URL: People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response: Forbidden You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server. Apache/2.2.23 (Amazon) Server at biopython.org Port 80 -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From p.j.a.cock at googlemail.com Mon Nov 5 23:07:09 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 5 Nov 2012 23:07:09 +0000 Subject: [Biopython-dev] OBF server outage announcement / call for SysAdmin volunteers In-Reply-To: References: Message-ID: On Thu, Nov 1, 2012 at 7:50 PM, Peter Cock wrote: > On Thu, Nov 1, 2012 at 7:40 PM, Peter Cock wrote: >> FYI regarding the Biopython website and recent mailing list outage. >> >> Peter >> >> PS you can also keep an eye on @Biopython and @OBF_news on Twitter, >> which are a useful alternative when the mailing lists are down.
>> >> > > I should have added that while the wiki is down (which does > unfortunately include the Biopython home page), the Biopython > downloads remain available via http://biopython.org/DIST/ and > other 'static' content like the Tutorial and API pages are up: > > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > http://biopython.org/DIST/docs/api/ Hosting of biopython.org (and the bioperl.org and open-bio.org websites) was transferred to an Amazon cloud machine over the weekend, which fixed the wiki but temporarily disabled the static pages (like the Tutorial and downloads). Those should all be working again now. At some later date (to be announced) the server running the OBF mailing lists will be transferred, which would make the mailing lists unavailable for a short period. Regards, Peter From redmine at redmine.open-bio.org Mon Nov 5 23:13:43 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 5 Nov 2012 23:13:43 +0000 Subject: [Biopython-dev] [Biopython - Bug #3392] (Resolved) unable to download almost any documentation - the download links are invalid References: Message-ID: Issue #3392 has been updated by Peter Cock. Status changed from New to Resolved % Done changed from 0 to 100 This should be working again now :) ---------------------------------------- Bug #3392: unable to download almost any documentation - the download links are invalid https://redmine.open-bio.org/issues/3392 Author: Brad Zoltick Status: Resolved Priority: Urgent Assignee: Biopython Dev Mailing List Category: Website Target version: Not Applicable URL: People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response: Forbidden You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server. 
Apache/2.2.23 (Amazon) Server at biopython.org Port 80 -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From kai.blin at biotech.uni-tuebingen.de Mon Nov 19 14:11:42 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Mon, 19 Nov 2012 15:11:42 +0100 Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails when coordinates are outside of the parent_sequence. Message-ID: <50AA3E1E.70407@biotech.uni-tuebingen.de> Hi folks, I'm currently investigating an error caused by an invalid GenBank file input that annotates CDS features with invalid coordinates. The GenBank parser accepts these features, but later my program crashes. It turns out the crash is because I'm calling the extract() method for my seq features, which then return an empty Seq object for out-of-range parent_sequence. I have the feeling that raising an exception would be the best way of dealing with this, but of course I can also check the result of extract() to be different from an empty Seq object. The line where I'd like to throw a ValueError on out-of-bounds coordinates is https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811 What are your thoughts on this? Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
From p.j.a.cock at googlemail.com Mon Nov 19 16:10:15 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 19 Nov 2012 16:10:15 +0000 Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails when coordinates are outside of the parent_sequence. In-Reply-To: <50AA3E1E.70407@biotech.uni-tuebingen.de> References: <50AA3E1E.70407@biotech.uni-tuebingen.de> Message-ID: On Mon, Nov 19, 2012 at 2:11 PM, Kai Blin wrote: > Hi folks, > > I'm currently investigating an error caused by an invalid GenBank file > input that annotates CDS features with invalid coordinates. The > GenBank parser accepts these features, but later my program crashes. Perhaps we should have a parser error/warning at that point? (as well as any fix to the extract method) > It turns out the crash is because I'm calling the extract() method for > my seq features, which then return an empty Seq object for > out-of-range parent_sequence.
> > I have the feeling that raising an exception would be the best way of > dealing with this, but of course I can also check the result of > extract() to be different from an empty Seq object. > > The line I'd like to throw a ValueError on out-of-bounds coordinates is > https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811 > > What are your thoughts on this? Some might find this surprising given the (initially rather odd) Python slicing behaviour with out of range coordinates (which indirectly causes the behaviour observed here):
>>> "hello"[100:200]
''
i.e. Slicing a string outside its bounds gives an empty string. On balance you're probably right that an error in this situation makes more sense (a discrepancy between feature location and the given parent sequence not being long enough). Peter
From p.j.a.cock at googlemail.com Mon Nov 19 16:32:11 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 19 Nov 2012 16:32:11 +0000 Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails when coordinates are outside of the parent_sequence. In-Reply-To: <8045681f-e3ca-470c-973d-89b5fcc6d259@email.android.com> References: <50AA3E1E.70407@biotech.uni-tuebingen.de> <8045681f-e3ca-470c-973d-89b5fcc6d259@email.android.com> Message-ID: On Mon, Nov 19, 2012 at 4:25 PM, Kai Blin wrote: > Peter Cock wrote: > >>> GenBank parser accepts these features, but later my program crashes. >> >>Perhaps we should have a parser error/warning at that point? >>(as well as any fix to the extract method) > > Probably a bit tricky because the GenBank file might not contain a > sequence at all, and we can't tell until we either see the sequence or > an end of record marker. The first line should tell you the length, and we already have a warning in place for naughty GenBank files where the actual sequence has a different length.
Those could be a problem for this new warning, as you'd only know the expected sequence length from the header while parsing the features. >>> I have the feeling that raising an exception would be the best way >>> of dealing with this, but of course I can also check the result >>> of extract() to be different from an empty Seq object. >>> >>> The line I'd like to throw a ValueError on out-of-bounds coordinates >>> is >>> >>> https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811 >>> >>> What are your thoughts on this? >> >>Some might find this surprising given the (initially rather odd) >>Python slicing behaviour with out of range coordinates (which >>indirectly causes the behaviour observed here): >> >>>>> "hello"[100:200] >>'' >> >>i.e. Slicing a string outside its bounds gives an empty string. > > Yes, that is why we end up with an empty Seq object. > >>On balance you're probably right that an error in this situation >>makes more sense (a discrepancy between feature location >>and the given parent sequence not being long enough). > > Yes. The way I understand the intention of the parent sequence, > the whole point is that the feature should be located on it. > > I'll gladly prepare a patch (and some tests). > Cheers, > Kai OK. Peter
From redmine at redmine.open-bio.org Tue Nov 20 13:41:47 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 13:41:47 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] (New) Biopython trie implementation can't load large data sets Message-ID: Issue #3395 has been reported by Michał Nowotka. ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Michał
Nowotka Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Imagine I have Biopython trie:

    from Bio import trie
    import gzip
    f = gzip.open('/tmp/trie.dat.gz', 'w')
    tr = trie.trie()
    #fill in the trie
    trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

    from Bio import trie
    import gzip
    f = gzip.open('/tmp/trie.dat.gz', 'r')
    tr = trie.load(f)

Unfortunately I'm getting a meaningless error saying: "loading failed for some reason" Any hints? ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Tue Nov 20 14:02:01 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 14:02:01 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Peter Cock. Can you try the same test case without gzip? i.e. Can you load /tmp/trie.dat rather than /tmp/trie.dat.gz? Also I would try explicitly opening the files in binary mode. P.S. Which OS, which version of Python, which version of Biopython?
From redmine at redmine.open-bio.org Tue Nov 20 14:18:46 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 14:18:46 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Michał Nowotka. Sure, I'll update this issue as soon as I check that.
From redmine at redmine.open-bio.org Tue Nov 20 16:31:13 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 16:31:13 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Michał Nowotka. OK, I tried using a standard Python file handle with explicit binary mode and it also failed. The file is now 165.5MB. I also tried bz2 and zip compression, without any luck...
From redmine at redmine.open-bio.org Tue Nov 20 17:02:48 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:02:48 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Peter Cock. Well that is progress - it means this isn't a problem coming from reading a compressed file on disk - you've made the test case simpler. Can you actually share a self contained example script? If not, I suggest you try halving the dataset (only record the first half of the tries), and retest. Then repeat - this should tell you if the problem is, as you suspect, a large dataset, or something specific about a special value. Alternatively can you share the (compressed) file? I could at least check if it fails the same way here, and perhaps add some debugging code to get more information. The error message itself is coming from some C code, which hasn't changed for some time: https://github.com/biopython/biopython/blob/master/Bio/triemodule.c The error itself is likely triggered in function _deserialize_transition in trie.c: https://github.com/biopython/biopython/blob/master/Bio/trie.c You still haven't told us the important information of which OS, which version of Python, which version of Biopython. Given it is C code, I'd also like to know how Biopython was installed (e.g. did you compile it from source yourself).
From redmine at redmine.open-bio.org Tue Nov 20 17:14:21 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:14:21 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Michał Nowotka. I'm using Ubuntu 12.04 LTS, Biopython 1.6 and Python 2.7.3. Can you tell me where I should place the compressed file?
From redmine at redmine.open-bio.org Tue Nov 20 17:21:58 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:21:58 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Peter Cock. Sadly RedMine is limited to 5MB attachments. You could use DropBox or something similar, or if you have your own server put the file online temporarily for me to download it? You probably have Biopython 1.60 (one dot sixty), there was no Biopython 1.6, one dot six. Did you install Biopython using the Ubuntu package manager? i.e. the GUI tool, or at the command line with something like 'apt-get install biopython'?
From redmine at redmine.open-bio.org Tue Nov 20 17:43:21 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:43:21 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Michał Nowotka. I put the file here: http://mnowotka.kei.pl/trie.4.dat.gz
From redmine at redmine.open-bio.org Tue Nov 20 17:56:47 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 20 Nov 2012 17:56:47 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Michał Nowotka. I confirm, it's version 1.60 that I'm using. I installed it either by apt-get install or pip.
From p.j.a.cock at googlemail.com Mon Nov 26 13:29:58 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 13:29:58 +0000 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? Message-ID: On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich wrote: > On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock > wrote: >> >> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman wrote: >> > >> > Peter; >> > >> >> In the case of Bow's SearchIO code, what would you prefer? >> >> e.g. Bio.SearchIO as it is now on his branch? >> > >> > I like plain ol' Search the best but don't have a strong preference. I'm >> > terrible at naming things so trust everyone's judgment on this. >> > >> > Brad >> >> Since we have no clear consensus, I propose we add Bow's code >> as Bio.SearchIO (which is how it is written right now), with the new >> BiopythonExperimentalWarning in place (to alert people that it may >> change in the next release). We can then rename or move it at a >> later date. This will make it easier for people to test the code, and >> also suggest further changes or additions (e.g.
Kai's HMMER work). >> >> If and when we agree on a consolidation of the Bio.SeqXXX >> modules, then Bio.SearchIO could move too. If this happens >> before any public release as Bio.SearchIO so much the better. >> >> Adopting lower case module names under Python 3 is also a >> separate issue. >> >> Peter >> > > +1 > > Regarding ... I plan to do the commit today, barring any last minute objections. I am leaning towards a merge from Bow's original (un-rebased) branch, which had only three trivial conflicts to handle. Peter
From w.arindrarto at gmail.com Mon Nov 26 13:38:23 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 26 Nov 2012 14:38:23 +0100 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: Hi Peter and everyone, If it helps, I've done the rebase (also resolving the three conflicts) with the latest master branch. On top of it, I've also added the new BiopythonExperimentalWarning in Bio.SearchIO.__init__.py. It's available here: https://github.com/bow/biopython/tree/searchio. However if you're interested in inspecting the non-rebased branch, I've also kept it here: https://github.com/bow/biopython/tree/searchio-nonrebased. Note that this one doesn't have the new experimental warning since it's a feature added more recently. Also, in both branches, the tutorial has been changed with the addition of the (draft) Bio.SearchIO tutorial. Let me know which one you prefer and I'll submit a pull request :). cheers, Bow On Mon, Nov 26, 2012 at 2:29 PM, Peter Cock wrote: > On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich wrote: >> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock >> wrote: >>> >>> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman wrote: >>> > >>> > Peter; >>> > >>> >> In the case of Bow's SearchIO code, what would you prefer? >>> >> e.g. Bio.SearchIO as it is now on his branch? >>> > >>> > I like plain ol' Search the best but don't have a strong preference.
I'm >>> > terrible at naming things so trust everyone's judgment on this. >>> > >>> > Brad >>> >>> Since we have no clear consensus, I propose we add Bow's code >>> as Bio.SearchIO (which is how it is written right now), with the new >>> BiopythonExperimentalWarning in place (to alert people that it may >>> change in the next release). We can then rename or move it at a >>> later date. This will make it easier for people to test the code, and >>> also suggest further changes or additions (e.g. Kai's HMMER work). >>> >>> If and when we agree on a consolidation of the Bio.SeqXXX >>> modules, then Bio.SearchIO could move too. If this happens >>> before any public release as Bio.SearchIO so much the better. >>> >>> Adopting lower case module names under Python 3 is also a >>> separate issue. >>> >>> Peter >>> >> >> +1 >> >> Regarding ... > > I plan to do the commit today, barring any last minute objections. > > I am leaning towards a merge from Bow's original (un-rebased) branch, > which had only three trivial conflicts to handle. > > Peter
From p.j.a.cock at googlemail.com Mon Nov 26 13:49:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 13:49:44 +0000 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 1:38 PM, Wibowo Arindrarto wrote: > Hi Peter and everyone, > > If it helps, I've done the rebase (also resolving the three conflicts) > with the latest master branch. On top of it, I've also added the new > BiopythonExperimentalWarning in Bio.SearchIO.__init__.py. It's > available here: https://github.com/bow/biopython/tree/searchio. > > However if you're interested in inspecting the non-rebased branch, > I've also kept it here: > https://github.com/bow/biopython/tree/searchio-nonrebased.
Note that > this one doesn't have the new experimental warning since it's a > feature added more recently. > > Also, in both branches, the tutorial has been changed with the > addition of the (draft) Bio.SearchIO tutorial. > > Let me know which one you prefer and I'll submit a pull request :). > > cheers, > Bow That's fine - I found both branches :) I've actually done a trial merge on the non-rebased one and then cherry-picked the experimental warning - looks good. Once that's done there is some housekeeping to do, like the indexing code duplication with Bio.SeqIO, and tackling indexing BGZF compressed files with Bio.SearchIO which I will have a go at. Peter P.S. I had intended to do this earlier this month, but we had the OBF server issues to deal with. From w.arindrarto at gmail.com Mon Nov 26 14:06:03 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 26 Nov 2012 15:06:03 +0100 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: > That's fine - I found both branches :) > > I've actually done a trial merge on the non-rebased one and > then cherry-picked the experimental warning - looks good. Ah, good then :). > Once that's done there is some housekeeping to do, like > the indexing code duplication with Bio.SeqIO, and tackling > indexing BGZF compressed files with Bio.SearchIO which > I will have a go at. Yes. I'm pretty sure there will also be changes we need to implement after more feedback from users. > P.S. I had intended to do this earlier this month, but we > had the OBF server issues to deal with. That's ok, I also noticed that it's not until quite recently that the commits become frequent again. 
From mauriceling at gmail.com Mon Nov 26 14:48:24 2012
From: mauriceling at gmail.com (Maurice Ling)
Date: Mon, 26 Nov 2012 08:48:24 -0600
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
Message-ID:

Hi

I am getting an error running this:

from Bio import Entrez
from Bio import Medline
handle = Entrez.efetch(db="pubmed", id=[19300000], rettype="medline",
                       retmode="text")

The traceback is

Traceback (most recent call last):
  File "C:\Users\Maurice.Ling\Desktop\muscorian\archive\pubmed_dump.py", line 16, in <module>
    retmode="text")
  File "C:\Python27\lib\site-packages\Bio\Entrez\__init__.py", line 133, in efetch
    keywords["id"] = ",".join(keywds["id"])
TypeError: sequence item 0: expected string, int found

When I changed line 133 of Bio.Entrez.__init__ from

keywords["id"] = ",".join(keywds["id"])

to

keywords["id"] = ",".join(str(keywds["id"]))

the error disappeared.

Maurice LING
mobile: +1(605)5920300, +6596669233
www: http://maurice.vodien.com
CV: http://maurice.vodien.com/maurice_resume.pdf
Linkedin: http://www.linkedin.com/in/mauriceling
ResearchGate: https://www.researchgate.net/profile/Maurice_HT_Ling

From p.j.a.cock at googlemail.com Mon Nov 26 14:57:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 14:57:28 +0000
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
In-Reply-To:
References:
Message-ID:

On Mon, Nov 26, 2012 at 2:48 PM, Maurice Ling wrote:
> Hi
>
> I am getting an error running this:
>
> from Bio import Entrez
> from Bio import Medline
> handle = Entrez.efetch(db="pubmed", id=[19300000], rettype="medline",
>                        retmode="text")
>

I would have used this:

Entrez.efetch(db="pubmed", id=["19300000"], rettype="medline", retmode="text")

In general the NCBI identifiers are arbitrary strings, although perhaps the pubmed identifiers could be treated as integers. This is perhaps worth changing in the Bio.Entrez code...

What do you think Michael?
Peter From mauriceling at gmail.com Mon Nov 26 15:23:31 2012 From: mauriceling at gmail.com (Maurice Ling) Date: Mon, 26 Nov 2012 09:23:31 -0600 Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations Message-ID: Hi I found something strange in my download script to pull a list of pubmed citations. This was working in the past (back in 2008 period)... The script is ID_start = 19000000 ID_stop = 19000010 downtime = 1.2 from Bio import Entrez from Bio import Medline import string import time import cPickle Entrez.email = 'maurice.ling at sdstate.edu' while (ID_start < ID_stop): try: handle = Entrez.efetch(db="pubmed", id=[str(ID_start)], rettype="medline", retmode="text") records = list(Medline.parse(handle))[0] print records cPickle.dump(records, open(str(ID_start) + '.txt', 'w'), -1) ID_start = ID_start + 1 time.sleep(downtime) print 'ID count: ', str(ID_start) except: print 'ID count: error ', str(ID_start) ID_start = ID_start + 1 But the results from print records kept showing the same thing: {'STAT': 'MEDLINE', 'IP': '2', 'JT': 'Biochemical medicine', 'DA': '19760116', 'FAU': ['Makar, A B', 'McMartin, K E', 'Palese, M', 'Tephly, T R'], 'DP': '1975 Jun', 'OWN': 'NLM', 'PT': ['Journal Article', "Research Support, U.S. 
Gov't, P.H.S."], 'LA': ['eng'], 'CRDT': ['1975/06/01 00:00'], 'DCOM': '19760116', 'LR': '20091111', 'PG': '117-26', 'TI': 'Formate assay in body fluids: application in methanol poisoning.', 'RN': ['0 (Formates)', '124-38-9 (Carbon Dioxide)', '67-56-1 (Methanol)', 'EC 1.2.- (Aldehyde Oxidoreductases)'], 'PL': 'UNITED STATES', 'TA': 'Biochem Med', 'JID': '0151424', 'VI': '13', 'IS': '0006-2944 (Print) 0006-2944 (Linking)', 'AU': ['Makar AB', 'McMartin KE', 'Palese M', 'Tephly TR'], 'MHDA': '1975/06/01 00:01', 'MH': ['Aldehyde Oxidoreductases/metabolism', 'Animals', 'Body Fluids/*analysis', 'Carbon Dioxide/blood', 'Formates/blood/*poisoning', 'Haplorhini', 'Humans', 'Hydrogen-Ion Concentration', 'Kinetics', 'Methanol/blood', 'Methods', 'Pseudomonas/enzymology'], 'EDAT': '1975/06/01', 'SO': 'Biochem Med. 1975 Jun;13(2):117-26.', 'SB': 'IM', 'PMID': '1', 'PST': 'ppublish'} It seems to keep efetching PMID 1 (http://www.ncbi.nlm.nih.gov/pubmed/1) Any idea? Thanks in advance. Maurice LING mobile: +1(605)5920300, +6596669233 www: http://maurice.vodien.com CV: http://maurice.vodien.com/maurice_resume.pdf Linkedin: http://www.linkedin.com/in/mauriceling ResearchGate: https://www.researchgate.net/profile/Maurice_HT_Ling From p.j.a.cock at googlemail.com Mon Nov 26 15:36:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 15:36:13 +0000 Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 3:23 PM, Maurice Ling wrote: > Hi > > I found something strange in my download script to pull a list of pubmed > citations. This was working in the past (back in 2008 period)... 
>
> The script is
>
> ID_start = 19000000
> ID_stop = 19000010
> downtime = 1.2
>
> from Bio import Entrez
> from Bio import Medline
> import string
> import time
> import cPickle
>
> Entrez.email = 'maurice.ling at sdstate.edu'
>
> while (ID_start < ID_stop):
>     try:
>         handle = Entrez.efetch(db="pubmed", id=[str(ID_start)],
>                                rettype="medline",
>                                retmode="text")
>         records = list(Medline.parse(handle))[0]
>         print records
>         cPickle.dump(records, open(str(ID_start) + '.txt', 'w'), -1)
>         ID_start = ID_start + 1
>         time.sleep(downtime)
>         print 'ID count: ', str(ID_start)
>     except:
>         print 'ID count: error ', str(ID_start)
>         ID_start = ID_start + 1

Are you sure you didn't run something slightly different? The simplest possibility would be a line accidentally setting ID_start to equal 1, rather than increasing it.

Also, using a for loop would be much cleaner (with the identifiers as either integers or as strings). For instance,

for identifier in range(19000000, 19000010):
    #Do stuff

Note you have a discrepancy with ID_stop vs ID_end.

This seems to work for me:

ID_start = 19000000
ID_stop = 19000010
downtime = 1.2

from Bio import Entrez
from Bio import Medline
import string
import time
import cPickle

Entrez.email = 'maurice.ling at sdstate.edu'

for identifier in range(ID_start, ID_stop):
    identifier = str(identifier)
    try:
        handle = Entrez.efetch(db="pubmed", id=identifier,
                               rettype="medline",
                               retmode="text")
        records = list(Medline.parse(handle))[0]
        print records
        cPickle.dump(records, open('%s.txt' % identifier, 'w'), -1)
    except Exception, error:
        print "Error for %s - %s" % (identifier, error)

However, rather than parsing the Medline records and saving the pickled object, I would save the plain text Medline data itself. That way you can use the files outside of Python (e.g. working at the Unix command line with grep).
Peter From p.j.a.cock at googlemail.com Mon Nov 26 16:08:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 16:08:28 +0000 Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 3:42 PM, Maurice Ling wrote: > Thanks Peter > > Now, that seems to work... still scratching my uncaffeinated head though.... > Great. I'm sure a coffee will help :) Peter P.S. Next time could you use the main list for usage queries, rather than the development list, biopython-dev - thanks! From p.j.a.cock at googlemail.com Mon Nov 26 16:46:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 16:46:44 +0000 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 2:06 PM, Wibowo Arindrarto wrote: >> That's fine - I found both branches :) >> >> I've actually done a trial merge on the non-rebased one and >> then cherry-picked the experimental warning - looks good. > > Ah, good then :). Done, https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513 >> Once that's done there is some housekeeping to do, like >> the indexing code duplication with Bio.SeqIO, and tackling >> indexing BGZF compressed files with Bio.SearchIO which >> I will have a go at. > > Yes. Started, it seems the two _index.py files have diverged a little more than I'd expected: https://github.com/biopython/biopython/commit/ad1786b99afd2a50248246d877ff00a53949546b >> P.S. I had intended to do this earlier this month, but we >> had the OBF server issues to deal with. > > That's ok, I also noticed that it's not until quite recently that the > commits become frequent again. Christian Brueffer deserves some of the credit for the recent burst of commits - he's been very busy sending pull requests! 
Peter From p.j.a.cock at googlemail.com Mon Nov 26 16:55:32 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 16:55:32 +0000 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: On Mon, Nov 26, 2012 at 4:46 PM, Peter Cock wrote: > On Mon, Nov 26, 2012 at 2:06 PM, Wibowo Arindrarto > wrote: >>> That's fine - I found both branches :) >>> >>> I've actually done a trial merge on the non-rebased one and >>> then cherry-picked the experimental warning - looks good. >> >> Ah, good then :). > > Done, > https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513 I've put a short note in the NEWS file, https://github.com/biopython/biopython/commit/43f7d4467dd56e67a7ad475e5ff3bf3d4f31d1d7 Congratulations Bow :) I guess this would be a good excuse for you to write another blog post ;) Speaking of which, unless we expect to release Biopython 1.61 soon, we should probably have something on the news blog too (which reminds me I was supposed to co-ordinate a general OBF GSoC 2012 post). Maybe I will manage that will on leave in December? Regards, Peter From w.arindrarto at gmail.com Mon Nov 26 17:05:43 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 26 Nov 2012 18:05:43 +0100 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: >>>> That's fine - I found both branches :) >>>> >>>> I've actually done a trial merge on the non-rebased one and >>>> then cherry-picked the experimental warning - looks good. >>> >>> Ah, good then :). >> >> Done, >> https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513 > > I've put a short note in the NEWS file, > https://github.com/biopython/biopython/commit/43f7d4467dd56e67a7ad475e5ff3bf3d4f31d1d7 > > Congratulations Bow :) Thank you :D! It feels great to see the code in master. 
> I guess this would be a good excuse for you to write another blog post ;) It is, and one should come up in the next couple of days :). Now I'm anxiously waiting for the next Biopython release ~ and the submodule's 'final' form after more feedback ;). cheers, Bow From p.j.a.cock at googlemail.com Mon Nov 26 17:22:00 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Nov 2012 17:22:00 +0000 Subject: [Biopython-dev] [GSoC] GSoC python variant final update In-Reply-To: References: Message-ID: On Mon, Aug 20, 2012 at 5:22 AM, Lenna Peterson wrote: > Post: http://arklenna.tumblr.com/post/29808300789/ > > The coordinate mapper, with updated documentation, is now located on > this branch: https://github.com/lennax/biopython/tree/f_loc4 > It awaits the merging of Peter's f_loc4 branch. > > I've written an entry on coordinate mapping for the Cookbook: > http://biopython.org/wiki/Coordinate_mapping Hi Lenna, Do you need my f_loc4 branch for the main GSoC variants work, or just the coordinate mapper? Thanks, Peter From chapmanb at 50mail.com Mon Nov 26 20:18:09 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 26 Nov 2012 15:18:09 -0500 Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names? In-Reply-To: References: Message-ID: <87vccs15ku.fsf@fastmail.fm> Bow and Peter; >> Congratulations Bow :) > > Thank you :D! It feels great to see the code in master. Awesome, nice work on this project and congratulations on getting it integrated. 
It's great to see this go in, Brad From p.j.a.cock at googlemail.com Tue Nov 27 09:35:46 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 27 Nov 2012 09:35:46 +0000 Subject: [Biopython-dev] Minor buildbot issues from SearchIO Message-ID: Hi all, The BuildBot flagged two new issues overnight, http://testing.open-bio.org/biopython/tgrid Python 2.5 on Windows - doctests are failing due to floating point decimal place differences in the exponent (down to C library differences, something fixed in later Python releases). Perhaps a Python 2.5 hack is the way to go here? http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/664/steps/shell/logs/stdio Python 3.2 and 3.3 on Windows are showing some XML character encoding oddity. Perhaps there is some encoding setting needed under Python 3 for the BLAST XML files? http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.2/builds/512/steps/shell/logs/stdio http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.3/builds/29/steps/shell/logs/stdio There is a separate cross-platform issue on Python 3.1, "TypeError: invalid event tuple" again with XML parsing. Curiously this had started a few days back in the UniprotIO tests on one machine, pre-dating the SearchIO merge. I'm not sure what triggered it. http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.1/builds/767 http://testing.open-bio.org/biopython/builders/Linux%2064%20-%20Python%203.1/builds/766/steps/shell/logs/stdio http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.1/builds/648/steps/shell/logs/stdio (Note TravisCI doesn't officially support Python 3.1, although until recently they did offer it unofficially - Python 3.3 support is happening soon through). 
Peter

From diego_zea at yahoo.com.ar Tue Nov 27 14:25:48 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Tue, 27 Nov 2012 06:25:48 -0800 (PST)
Subject: [Biopython-dev] Numpy/Scipy and Biopython
Message-ID: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>

Hi!!!

This is my first mail to the list. I am relatively new to BioPython (I used to code more in Perl) but I want to collaborate with the project.

I wrote this post on Stackoverflow, and I want to share my question with all of you ;)

http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated

Best wishes,

if ((dx*dp)>=(h/(2*pi))) { printf("Diego Javier Zea\n"); }

From anaryin at gmail.com Tue Nov 27 15:40:58 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 27 Nov 2012 16:40:58 +0100
Subject: [Biopython-dev] Numpy/Scipy and Biopython
In-Reply-To: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Message-ID:

Hi Diego,

Nice post and nice ideas. As for Bio.PDB, indeed representing the entire structure as a Nx3 matrix of coordinates is super attractive, but would require a deep change in the current framework. Also, manipulation of the structure (removing atoms, adding atoms, etc) would become a bit more complicated. If you have good ideas to do this, please do share them. I know for example ProDy and csb use a similar approach.

Cheers,

João

2012/11/27 Diego Zea

> http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated

From redmine at redmine.open-bio.org Wed Nov 28 00:46:22 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 28 Nov 2012 00:46:22 +0000
Subject: [Biopython-dev] [Biopython - Feature #3396] (New) Add alignment score, % identity, % similarity, % gaps, etc to EmbossIO
Message-ID:

Issue #3396 has been reported by Olga Botvinnik.
----------------------------------------
Feature #3396: Add alignment score, % identity, % similarity, % gaps, etc to EmbossIO
https://redmine.open-bio.org/issues/3396

Author: Olga Botvinnik
Status: New
Priority: Normal
Assignee: Olga Botvinnik
Category:
Target version:
URL:

As of BioPython 1.59, if an alignment is read in with Bio.AlignIO.read(handle, 'emboss'), the metadata in the header, such as the substitution matrix used, gap_penalty, extend_penalty, identity, similarity, gaps, and score, is ignored:
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
# Score: 100.0
#
#
#=======================================
I edited the EmbossIO.py file to read this metadata and add it as annotations to each SeqRecord in the MultipleSequenceAlignment object, since the MultipleSequenceAlignment object itself has no place to store annotations. I also added the appropriate unit tests. Please let me know if there is a bug in the code that I missed. For example, for an alignment like the one above, the SeqRecord objects would have annotations of the following form:
{'identity_denominator': 131, 'matrix': 'EBLOSUM62', 'similarity': 0.8549618320610687, 'similarity_numerator': 112, 'similarity_denominator': 131, 'gaps': 0.1450381679389313, 'identity_numerator': 112, 'gap_penalty': 10.0, 'extend_penalty': 0.5, 'gaps_denominator': 131, 'score': 591.5, 'identity': 0.8549618320610687, 'gaps_numerator': 19}
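The header parsing described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the function name parse_emboss_header and the two regular expressions are hypothetical, and the key names simply mirror the annotation dictionary shown above.

```python
import re

# Hypothetical helper, not part of Bio.AlignIO: turn EMBOSS "# Key: value"
# header lines into a flat dictionary of annotations.
_COUNT_RE = re.compile(r"^#\s*(Identity|Similarity|Gaps):\s*(\d+)/(\d+)")
_VALUE_RE = re.compile(r"^#\s*(Matrix|Gap_penalty|Extend_penalty|Score):\s*(\S+)")


def parse_emboss_header(lines):
    annotations = {}
    for line in lines:
        match = _COUNT_RE.match(line)
        if match:
            key = match.group(1).lower()
            numerator = int(match.group(2))
            denominator = int(match.group(3))
            annotations[key + "_numerator"] = numerator
            annotations[key + "_denominator"] = denominator
            # Keep the fraction as well as the raw counts.
            annotations[key] = numerator / float(denominator)
            continue
        match = _VALUE_RE.match(line)
        if match:
            key = match.group(1).lower()
            value = match.group(2)
            # The matrix name stays a string; the other fields are numeric.
            annotations[key] = value if key == "matrix" else float(value)
    return annotations


header = """\
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
# Score: 100.0
""".splitlines()

annotations = parse_emboss_header(header)
```

Lines the sketch does not recognise, such as "# Length: 131", are skipped rather than treated as errors, so an unexpected header field would not break parsing.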
I decided to keep the numerators and denominators separately from the identity, similarity, and gap percentages just in case a user wanted to do something else with them.

----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From diego_zea at yahoo.com.ar Wed Nov 28 03:09:58 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Tue, 27 Nov 2012 19:09:58 -0800 (PST)
Subject: [Biopython-dev] Numpy/Scipy and Biopython
In-Reply-To:
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Message-ID: <1354072198.13226.YahooMailNeo@web140606.mail.bf1.yahoo.com>

"""
Hi João (and others)!!!

Thanks :) I think someone with more Numpy knowledge can do this better, but this is my idea:

1- Load the PDB directly into numpy (I did this quickly and badly, don't trust this parser)
2- Use an n x 3 matrix for xyz and one matrix with named columns for the other information (I don't know how). [The index is the same, so you can use one to slice the other with boolean arrays ;)]
3- Define methods for the most common operations

This is an example of my idea (it works on 1AB0 from the PDB)...
"""

import numpy

names=[]
descript=[]
xyz = []

# The example structure is
# http://www.rcsb.org/pdb/explore.do?structureId=1ab0

with open("/home/dzea/databases/PDB/1ab0.pdb","r") as fh:
    """ Very naive parser. I wrote this in a couple of minutes.
    It's bad, but it's only to show the idea """
    for line in fh:
        if line[0:4]=='ATOM':
            temp =[]
            temp2 =[]
            temp.append(line[4:11].replace(" ",""))
            temp2.append(line[11:16].replace(" ",""))
            temp2.append(line[17:21].replace(" ",""))
            temp.append(line[22:27].replace(" ",""))
            xyz.append(line[31:56].split())
            temp.append(line[55:60].replace(" ",""))
            temp.append(line[60:67].replace(" ",""))
            temp2.append(line[-5:].replace(" ","").replace("\n",""))
            descript.append(temp)
            names.append(temp2)

# I am not good at using different dtypes
# in different columns,
# but named columns would be better than this:
names_array = numpy.array(names,numpy.character)
descript_array = numpy.array(descript,numpy.float16)
xyz_array = numpy.array(xyz,numpy.float16)

def select_atom(names,xyz,descript,atom='CA'):
    xyz_s = xyz[names[:,0]==atom,:]
    names_s = names[names[:,0]==atom,:]
    descript_s = descript[names[:,0]==atom,:]
    return names_s,xyz_s,descript_s

def delete_res_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,1]!=num,:]
    names_s = names[descript[:,1]!=num,:]
    descript_s = descript[descript[:,1]!=num,:]
    return names_s,xyz_s,descript_s

def delete_atom_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,0]!=num,:]
    names_s = names[descript[:,0]!=num,:]
    descript_s = descript[descript[:,0]!=num,:]
    return names_s,xyz_s,descript_s

def add_atom(new_name,new_xyz,new_descript,names,xyz,descript):
    # Using vstack ;)
    new_name = numpy.array(new_name,numpy.character)
    new_descript = numpy.array(new_descript,numpy.float16)
    new_xyz = numpy.array(new_xyz,numpy.float16)
    xyz_s = numpy.vstack((xyz,new_xyz))
    names_s = numpy.vstack((names,new_name))
    descript_s = numpy.vstack((descript,new_descript))
    return names_s,xyz_s,descript_s

## Example (works!!!)
xyz_array.shape
delete_atom_num(names_array,xyz_array,descript_array)[1].shape
add_atom(['H','H','H'],[0,0,0],[0,0,0,0],names_array,xyz_array,descript_array)[1].shape

if ((dx*dp)>=(h/(2*pi))) { printf("Diego Javier Zea\n"); }

>________________________________
> From: João Rodrigues
>To: Diego Zea
>CC: "biopython-dev at lists.open-bio.org"
>Sent: Tuesday, 27 November 2012 12:40
>Subject: Re: [Biopython-dev] Numpy/Scipy and Biopython
>
>
>Hi Diego,
>
>
>Nice post and nice ideas. As for Bio.PDB, indeed representing the entire structure as a Nx3 matrix of coordinates is super attractive, but would require a deep change in the current framework. Also, manipulation of the structure (removing atoms, adding atoms, etc) would become a bit more complicated.. If you have good ideas to do this, please do share them. I know for example ProDy and csb use a similar approach.
>
>
>Cheers,
>
>
>João
>
>
>2012/11/27 Diego Zea
>
>http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
>
>
>

From redmine at redmine.open-bio.org Thu Nov 29 09:09:49 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 29 Nov 2012 09:09:49 +0000
Subject: [Biopython-dev] [Biopython - Feature #3398] (New) Oracle BioSQL
Message-ID:

Issue #3398 has been reported by Hyungyong Kim.

----------------------------------------
Feature #3398: Oracle BioSQL
https://redmine.open-bio.org/issues/3398

Author: Hyungyong Kim
Status: New
Priority: Normal
Assignee:
Category:
Target version:
URL:

I just tested Oracle BioSQL for Biopython using cx_Oracle. It includes some Biopython modifications that my GenBank file test required. I have attached the patch; this is how it was generated:
[yong27 at dev biopython]$ git ls-remote --heads origin
902947a7df49d8529faeb7e1bfb55b2d06252272        refs/heads/master
[yong27 at dev biopython]$ git diff origin/master master > oracle_biosql.diff
[yong27 at dev biopython]$
This is an example of how to use Oracle BioSQL. Oracle, the Oracle BioSQL schema, and cx_Oracle have to be installed.
from contextlib import contextmanager
from BioSQL import BioSeqDatabase

@contextmanager
def biosqlconn(dbname):
    server = BioSeqDatabase.open_database(driver='cx_Oracle', user='USER', passwd='PASS')
    conn = server[dbname]
    try:
        yield conn
    except:
        conn.adaptor.rollback()
        raise
    else:
        conn.adaptor.commit()
    finally:
        conn.adaptor.close()

with biosqlconn('mydb') as biosqldb:
    record = biosqldb.lookup(accession='1234')

---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Thu Nov 29 10:56:04 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 29 Nov 2012 10:56:04 +0000 Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper In-Reply-To: <50B6F8FF.2090206@brueffer.de> References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de> Message-ID: Can we continue this on the biopython-dev mailing list (CC'd)? On Thu, Nov 29, 2012 at 5:56 AM, Christian Brueffer wrote: > On 11/29/2012 10:42 AM, Christian Brueffer wrote: >> >> Hi, >> >> in preparation of cleaning up the AlignACE wrapper, I wanted to test >> the current wrapper. However, it doesn't seem to work at all ... >> >> For the record, I'm testing with the Linux version of the binary >> (AlignACE version 2.3 October 27, 1998). >> > > Some of the test files in the Tests directory mention the following AlignACE > version: "AlignACE 4.0 05/13/04" > > This may be the answer to my problems. Does anyone know where to get hold > of this version? > > The website (http://atlas.med.harvard.edu/) is down and the only > other one I found (http://arep.med.harvard.edu/mrnadata/mrnasoft.html) > only distributes the old 2.3 version that I have. Hmm, I don't see any existing unit tests dedicated to this wrapper. There should really be a file named test_AlignACE_tool.py or similar. I would also like some doctests in Bio/Motif/Applications/_AlignAce.py which must be non-executing so they can be run without dependencies, which of course isn't actually a functional test but it does still catch some issues - but primarily would be as documentation to demonstrate typical usage. 
I don't appear to have AlignAce installed on my own machines - in particular, the nightly buildslaves don't have it. I don't think there is a Debian/Ubuntu package for AlignAce, so testing this under TravisCI is non-trivial - it looks like their licence agreement could block packaging it. Thanks, Peter From p.j.a.cock at googlemail.com Thu Nov 29 11:22:51 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 29 Nov 2012 11:22:51 +0000 Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper In-Reply-To: <50B74199.6020904@brueffer.de> References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de> <50B74199.6020904@brueffer.de> Message-ID: On Thu, Nov 29, 2012 at 11:06 AM, Christian Brueffer wrote: > On 11/29/2012 06:56 PM, Peter Cock wrote: >> >> Can we continue this on the biopython-dev mailing list (CC'd)? >> > > (moved to biopython-dev) > Thanks. > Indeed. I already have a cleaned up wrapper and unit tests in my local > tree, but I don't want to submit them without actually testing them with an > up to date binary ;-) Excellent - I suspected you'd been doing something like this ;) > archive.org has a version of http://atlas.med.harvard.edu/ from 2011, > I have contacted the responsible person mentioned on the page. It was Bartek who wrote the original wrapper (I only made re-factoring changes since then), hopefully he still has a working AliceACE installation and can tell us the version numbers etc that he was using. Regards, Peter From christian at brueffer.de Thu Nov 29 11:06:01 2012 From: christian at brueffer.de (Christian Brueffer) Date: Thu, 29 Nov 2012 19:06:01 +0800 Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper In-Reply-To: References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de> Message-ID: <50B74199.6020904@brueffer.de> On 11/29/2012 06:56 PM, Peter Cock wrote: > Can we continue this on the biopython-dev mailing list (CC'd)? 
> > On Thu, Nov 29, 2012 at 5:56 AM, Christian Brueffer > wrote: >> On 11/29/2012 10:42 AM, Christian Brueffer wrote: >>> >>> Hi, >>> >>> in preparation of cleaning up the AlignACE wrapper, I wanted to test >>> the current wrapper. However, it doesn't seem to work at all ... >>> >>> For the record, I'm testing with the Linux version of the binary >>> (AlignACE version 2.3 October 27, 1998). >>> >> >> Some of the test files in the Tests directory mention the following AlignACE >> version: "AlignACE 4.0 05/13/04" >> >> This may be the answer to my problems. Does anyone know where to get hold >> of this version? >> >> The website (http://atlas.med.harvard.edu/) is down and the only >> other one I found (http://arep.med.harvard.edu/mrnadata/mrnasoft.html) >> only distributes the old 2.3 version that I have. > > Hmm, I don't see any existing unit tests dedicated to this wrapper. > There should really be a file named test_AlignACE_tool.py or similar. > > I would also like some doctests in Bio/Motif/Applications/_AlignAce.py > which must be non-executing so they can be run without dependencies, > which of course isn't actually a functional test but it does still catch some > issues - but primarily would be as documentation to demonstrate typical > usage. > > I don't appear to have AlignAce installed on my own machines - in > particular, the nightly buildslaves don't have it. I don't think there is > a Debian/Ubuntu package for AlignAce, so testing this under > TravisCI is non-trivial - it looks like their licence agreement could > block packaging it. > (moved to biopython-dev) Indeed. I already have a cleaned up wrapper and unit tests in my local tree, but I don't want to submit them without actually testing them with an up to date binary ;-) archive.org has a version of http://atlas.med.harvard.edu/ from 2011, I have contacted the responsible person mentioned on the page. 
Cheers, Chris From mjldehoon at yahoo.com Thu Nov 29 14:33:12 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 29 Nov 2012 06:33:12 -0800 (PST) Subject: [Biopython-dev] Error in Bio.Entrez.__init__ In-Reply-To: Message-ID: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com> --- On Mon, 11/26/12, Peter Cock wrote: > In general the NCBI identifiers are arbitrary strings, > although perhaps the pubmed identifiers could be treated as > integers. > This is perhaps worth changing in the Bio.Entrez code... > > What do you think Michael? If we change this in the Bio.Entrez code, we should put str(..) around all NCBI identifiers, not just the pubmed ones. Otherwise we'd have special treatment for one of the Entrez databases, which may cause problems in the future. I'm OK if somebody else adds the calls to str(..), but I wouldn't champion it myself. Best, -Michiel. From p.j.a.cock at googlemail.com Thu Nov 29 14:49:42 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 29 Nov 2012 14:49:42 +0000 Subject: [Biopython-dev] Error in Bio.Entrez.__init__ In-Reply-To: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com> References: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Thu, Nov 29, 2012 at 2:33 PM, Michiel de Hoon wrote: > --- On Mon, 11/26/12, Peter Cock wrote: >> In general the NCBI identifiers are arbitrary strings, >> although perhaps the pubmed identifiers could be treated as >> integers. >> This is perhaps worth changing in the Bio.Entrez code... >> >> What do you think Michael? > > If we change this in the Bio.Entrez code, we should put str(..) around > all NCBI identifiers, not just the pubmed ones. Otherwise we'd have > special treatment for one of the Entrez databases, which may cause > problems in the future. Yes, after all there are other Entrez database with 'numerical' identifiers. > I'm OK if somebody else adds the calls to str(..), but I wouldn't champion > it myself. 
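
To make the str(..) idea concrete, here is a minimal sketch of how identifiers could be normalised before being joined. The helper name format_ids is hypothetical and is not part of Bio.Entrez. The last lines also show why wrapping the whole list in str(..), as in the workaround reported earlier in this thread, silences the TypeError but builds the wrong query string.

```python
def format_ids(ids):
    """Return a comma separated string from one ID or a list of IDs."""
    if isinstance(ids, (str, int)):
        # A single identifier, given either as text or as a number.
        return str(ids)
    # Convert each element, not the list as a whole.
    return ",".join(str(identifier) for identifier in ids)


good = format_ids([19300000, "19300001"])

# str() applied to the list gives "[19300000]", and join() then
# puts a comma between every character of that text.
bad = ",".join(str([19300000]))
```

Converting element by element treats every Entrez database the same way, which is the point made above about not special-casing pubmed.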
I don't mind doing the commit (and a unit test), but do you have any specific concern in mind? Peter From redmine at redmine.open-bio.org Thu Nov 29 17:12:31 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 29 Nov 2012 17:12:31 +0000 Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets References: Message-ID: Issue #3395 has been updated by Peter Cock. File trie_debug.patch added I can reproduce the problem with your saved file under Mac OS X, using the latest Biopython from github, e.g. $ python Python 2.7.2 (default, Jun 20 2012, 16:23:33) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from Bio import trie >>> import gzip >>> with gzip.open("trie.4.dat.gz") as handle: ... t = trie.load(handle) ... Traceback (most recent call last): File "", line 2, in RuntimeError: loading failed for some reason Adding a little debugging to the C code tells us where this fails (see attachment), line 669: 668 if(has_value) { 669 if(!(trie->value = (*read_value)(data))) 670 goto _deserialize_trie_error; 371 } What kind of CPU does your machine have? i.e. is it a normal Intel or AMD CPU, or something unusual like a PowerPC where we have to worry about the bit order interpretation? We may need a complete example creating the trie as well - the problem could be in the trie itself, the serialisation (writing to disk), or de-serialisation (loading from disk). ---------------------------------------- Bug #3395: Biopython trie implementation can't load large data sets https://redmine.open-bio.org/issues/3395 Author: Micha? 
Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version:
URL:

Imagine I have a Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
# fill in the trie
trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting an unhelpful error saying: "loading failed for some reason"

Any hints?

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From redmine at redmine.open-bio.org Thu Nov 29 17:21:30 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 29 Nov 2012 17:21:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets
References:
Message-ID:

Issue #3395 has been updated by Michał Nowotka.

I'm using an Ubuntu virtual machine running on a MacBook Pro with a single Intel® Core™ i7-2720QM CPU @ 2.20GHz processor. I will try to prepare code and data for which it fails.

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version:
URL:

Imagine I have a Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
# fill in the trie
trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting an unhelpful error saying: "loading failed for some reason"

Any hints?
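
One possible way to narrow this down, following the suggestion earlier in the thread that the fault could lie in the serialisation or in the de-serialisation, is to round-trip the object once through an in-memory buffer and once through a gzip file opened in binary mode, then compare the results. The sketch below is not Bio.trie code: roundtrip_in_memory, roundtrip_gzip, demo_save and demo_load are hypothetical helper names, and pickle with a plain dict stands in for the trie so that the example runs on its own.

```python
import gzip
import io
import os
import pickle
import tempfile


def roundtrip_in_memory(obj, save, load):
    """Serialise obj into an in-memory buffer and load it straight back."""
    buffer = io.BytesIO()
    save(buffer, obj)
    buffer.seek(0)
    return load(buffer)


def roundtrip_gzip(obj, save, load, path):
    """Serialise obj into a gzip file (binary mode), close it, then reload."""
    with gzip.open(path, "wb") as handle:
        save(handle, obj)
    with gzip.open(path, "rb") as handle:
        return load(handle)


# Stand-in serialiser (assumption, not the Bio.trie API): pickle.dump takes
# (obj, handle), so it is wrapped to match the (handle, obj) order of the
# save call shown in the report.
def demo_save(handle, obj):
    pickle.dump(obj, handle)


def demo_load(handle):
    return pickle.load(handle)


# Made-up test data standing in for the contents of the trie.
data = dict(("ACGT%05d" % i, i) for i in range(1000))

fd, tmp_path = tempfile.mkstemp(suffix=".dat.gz")
os.close(fd)
try:
    in_memory = roundtrip_in_memory(data, demo_save, demo_load)
    via_gzip = roundtrip_gzip(data, demo_save, demo_load, tmp_path)
finally:
    os.remove(tmp_path)

# If only the gzip round trip differs, the file layer is the suspect;
# if both differ, the save/load pair itself is.
memory_ok = in_memory == data
gzip_ok = via_gzip == data
```

With a real trie, trie.save and trie.load (the calls shown in the report above) would take the place of demo_save and demo_load. Whether they accept an in-memory buffer is an assumption that would need checking.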
From w.arindrarto at gmail.com Fri Nov 30 02:35:25 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 30 Nov 2012 03:35:25 +0100
Subject: [Biopython-dev] Minor buildbot issues from SearchIO
In-Reply-To:
References:
Message-ID:

Hi everyone,

I've done some digging around to see how to deal with these issues. Here's what I found:

> The BuildBot flagged two new issues overnight,
> http://testing.open-bio.org/biopython/tgrid
>
> Python 2.5 on Windows - doctests are failing due to floating point decimal place
> differences in the exponent (down to C library differences, something fixed in
> later Python releases). Perhaps a Python 2.5 hack is the way to go here?
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/664/steps/shell/logs/stdio

I've submitted a pull request to fix this here: https://github.com/biopython/biopython/pull/98

> Python 3.2 and 3.3 on Windows are showing some XML character encoding oddity.
> Perhaps there is some encoding setting needed under Python 3 for the BLAST
> XML files?
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.2/builds/512/steps/shell/logs/stdio
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.3/builds/29/steps/shell/logs/stdio

I've also addressed these failures here: https://github.com/biopython/biopython/pull/99

> There is a separate cross-platform issue on Python 3.1, "TypeError:
> invalid event tuple"
> again with XML parsing. Curiously this had started a few days back in
> the UniprotIO
> tests on one machine, pre-dating the SearchIO merge. I'm not sure what
> triggered it.
> http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.1/builds/767
> http://testing.open-bio.org/biopython/builders/Linux%2064%20-%20Python%203.1/builds/766/steps/shell/logs/stdio
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.1/builds/648/steps/shell/logs/stdio

As for this one, it seems that it's caused by a bug in Python 3.1 (http://bugs.python.org/issue9257) due to the way `xml.etree.cElementTree.iterparse` accepts the `events` argument. I haven't submitted any pull request for this bug, since the fix looks quite messy. Should we try to address this, or simply make a note that XML parsing in Python 3.1 will not work? As Peter noted, this bug currently affects Bio.SearchIO BLAST XML parsing, SeqIO.UniprotIO, and Phylo.PhyloXMLIO.

regards,
Bow

From diego_zea at yahoo.com.ar Fri Nov 30 13:00:20 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Fri, 30 Nov 2012 05:00:20 -0800 (PST)
Subject: [Biopython-dev] Numpy/Scipy and Biopython
In-Reply-To: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Message-ID: <1354280420.4305.YahooMailNeo@web140605.mail.bf1.yahoo.com>

Hi!

I was looking at Seq/AlignIO, and I think it may be possible to avoid the overhead of creating Bio objects after the Numpy object, by adding an optional feature to __init__ controlled by an argument that defaults to False. When this argument is set to True, Numpy-based objects would be generated too. That might also make it easier to convert between plain Python objects and Numpy-based objects, and to use all the functionality of Bio together with the fast numerical operations of Numpy arrays...

It's only an idea, what do you think?

Thanks!!! :)
if ((dx*dp)>=(h/(2*pi)))
{
printf("Diego Javier Zea\n");
}

>________________________________
> From: Diego Zea
>To: "biopython-dev at lists.open-bio.org"
>Sent: Tuesday, 27 November 2012 11:25
>Subject: [Biopython-dev] Numpy/Scipy and Biopython
>
>Hi!!!
>This is my first mail to the list.
>I'm relatively new to Biopython (I used to code more in Perl) but I want to collaborate with the project.
>I posted this question on Stack Overflow, and I want to share it with all of you ;)
>
>http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
>Best wishes,
>
>if ((dx*dp)>=(h/(2*pi)))
>{
>printf("Diego Javier Zea\n");
>}
>_______________________________________________
>Biopython-dev mailing list
>Biopython-dev at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biopython-dev