From meesters at uni-mainz.de Mon Sep 1 02:57:24 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 1 Sep 2008 08:57:24 +0200
Subject: [BioPython] NumPy
In-Reply-To: <128888.36737.qm@web62405.mail.re1.yahoo.com>
References: <128888.36737.qm@web62405.mail.re1.yahoo.com>
Message-ID: <1220252244.28769.6.camel@meesters.biologie.uni-mainz.de>

Hi,

Apart from compiling issues - which can be fixed in collaboration with
the numpy developers - another reason for updating to numpy is Python 3.
In numpy there are already efforts to make the transition:
http://www.scipy.org/Python3k
While this task is not yet completed, it (almost) certainly never will
be in the case of Numeric.

Christian

From chapmanb at 50mail.com Mon Sep 1 14:48:24 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 1 Sep 2008 14:48:24 -0400
Subject: [BioPython] NumPy changes ready to check in
Message-ID: <20080901184823.GA7915@localdomain>

Hi all;
Hope everyone is doing well. Nice to see many familiar faces on the
lists. The NumPy discussion over the past couple of days is very timely,
as I was hoping to get permission to check in changes to convert over to
NumPy with very reasonable back compatibility with Numeric.

Recently, I wanted to utilize the PDB parser, and realized it still
used the old Numeric module. Over the last few evenings, I made the
adjustments and made sure things are not broken based on regression
and simple script testing.

The changes are fairly extensive in terms of modules touched, but
limited in terms of the actual change. All python level changes use
the "oldnumeric" compatibility layer if the Numeric import fails:

    try:
        from Numeric import array, Float0
    except ImportError:
        from numpy.oldnumeric import array, Float0

So if people are still using Numeric, these changes should not impact
them, but will allow use of NumPy if desired. Over time we would
probably want to get rid of the Numeric imports as everyone moves to
NumPy.
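The same fallback could be written once and reused across modules. What
follows is only a sketch and is not taken from the proposed changes:
import_first is a hypothetical helper name, and it assumes the candidate
modules expose the same names (as the compatibility layer is meant to).

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that can be imported.

    Hypothetical helper (not part of Biopython): it generalises the
    try/except ImportError fallback to any list of candidates.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # this candidate is unavailable, try the next one
    raise ImportError("none of %s could be imported" % ", ".join(module_names))


if __name__ == "__main__":
    # Prefer Numeric, fall back to NumPy's compatibility layer.
    try:
        numeric = import_first("Numeric", "numpy.oldnumeric")
        print("using", numeric.__name__)
    except ImportError as err:
        print(err)
```

The order of the arguments encodes the preference, so dropping Numeric
later would mean deleting one argument rather than editing every module.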
The C/C++ changes are limited to KDTree and Cluster and are not back
compatible, but do not require code level changes other than the
imports as they also use the C compatibility layer in NumPy. So the
major transition point would be people using KDTree and Cluster to
install NumPy.

I still have my CVS account and can check in these changes if y'all
approve. Thanks all,
Brad
--
Brad Chapman
Codon Devices
http://www.codondevices.com

From p.j.a.cock at googlemail.com Mon Sep 1 15:08:06 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 1 Sep 2008 20:08:06 +0100
Subject: [BioPython] NumPy changes ready to check in
In-Reply-To: <20080901184823.GA7915@localdomain>
References: <20080901184823.GA7915@localdomain>
Message-ID: <320fb6e00809011208w4fb7a34h577a1a7dcb6a938d@mail.gmail.com>

On Mon, Sep 1, 2008 at 7:48 PM, Brad Chapman wrote:
> Hi all;
> Hope everyone is doing well. Nice to see many familiar faces on the
> lists.

Wow - welcome back!

> The NumPy discussion over the past couple of days is very timely,
> as I was hoping to get permission to check in changes to convert over to
> NumPy with very reasonable back compatibility with Numeric.
>
> Recently, I wanted to utilize the PDB parser, and realized it still
> used the old Numeric module. Over the last few evenings, I made the
> adjustments and made sure things are not broken based on regression
> and simple script testing.
>
> The changes are fairly extensive in terms of modules touched, but
> limited in terms of the actual change. All python level changes use
> the "oldnumeric" compatibility layer if the Numeric import fails:
>
> try:
>     from Numeric import array, Float0
> except ImportError:
>     from numpy.oldnumeric import array, Float0
>
> So if people are still using Numeric, these changes should not impact
> them, but will allow use of NumPy if desired. Over time we would
> probably want to get rid of the Numeric imports as everyone moves to
> NumPy.
This on-the-fly system is great for the pure python code...

> The C/C++ changes are limited to KDTree and Cluster and are not back
> compatible, but do not require code level changes other than the
> imports as they also use the C compatibility layer in NumPy. So the
> major transition point would be people using KDTree and Cluster to
> install NumPy.

It's certainly the C/C++ bits that make life complicated.

Have you looked at the patch on Bug 2251 at all?
http://bugzilla.open-bio.org/show_bug.cgi?id=2251

> I still have my CVS account and can check in these changes if y'all
> approve. Thanks all,
>
> Brad

Michiel and I had chatted about doing one more release with Numeric
only - there are a few things I'd like to try and get in first like
ambiguous translation (Bug 2530) but perhaps we should just do
Biopython 1.48 pretty much as is.
http://bugzilla.open-bio.org/show_bug.cgi?id=2530

I'm happy to try and put together the release this week - but will
need a hand for the Windows builds.

If that pans out, then we could start the Numeric to numpy transition
in CVS.

Peter

From chris.lasher at gmail.com Mon Sep 1 20:00:20 2008
From: chris.lasher at gmail.com (Chris Lasher)
Date: Mon, 1 Sep 2008 20:00:20 -0400
Subject: [BioPython] Determining if GenBank record is circular
Message-ID: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com>

Is there a way to tell whether a sequence in a GenBank record is
circular from the information contained in the SeqRecord object created
by parsing the record with SeqIO? I couldn't find it in the annotations,
description, or format attributes of SeqRecord.
An example file where I'm looking for this is
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Mesorhizobium_loti/NC_002678.gbk

Thanks,
Chris

From chris.lasher at gmail.com Mon Sep 1 21:25:09 2008
From: chris.lasher at gmail.com (Chris Lasher)
Date: Mon, 1 Sep 2008 21:25:09 -0400
Subject: [BioPython] Determining if GenBank record is circular
In-Reply-To:
References: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com>
Message-ID: <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com>

On Mon, Sep 1, 2008 at 8:19 PM, Iddo Friedberg wrote:
>
> Should be in LOCUS:
>
> LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008

Ah, sure. Let me re-state my question more precisely: Where is this
represented in the SeqRecord object created by SeqIO.parse(), or is it
represented at all?

Best,
Chris

From biopython at maubp.freeserve.co.uk Tue Sep 2 05:00:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 2 Sep 2008 10:00:41 +0100
Subject: [BioPython] Determining if GenBank record is circular
In-Reply-To: <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com>
References: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com> <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com>
Message-ID: <320fb6e00809020200s5ac59ed5xe4a65ae5117d0597@mail.gmail.com>

On Tue, Sep 2, 2008 at 2:25 AM, Chris Lasher wrote:
> On Mon, Sep 1, 2008 at 8:19 PM, Iddo Friedberg wrote:
>>
>> Should be in LOCUS:
>>
>> LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008
>
> Ah, sure. Let me re-state my question more precisely: Where is this
> represented in the SeqRecord object created by SeqIO.parse(), or is it
> represented at all?

Currently if the sequence is circular I don't think it is represented
at all when parsed in a SeqRecord.

Bio.SeqIO uses the Bio.GenBank.FeatureParser, which gets passed this
information from the Scanner via the residue_type event.
This is a combined lump of data containing both the sequence type
(DNA, RNA etc) and if it is linear or circular. It is currently only
used to determine the Seq alphabet, and has never been recorded. So in
addition to not recording if the LOCUS line said the sequence was
circular, if the LOCUS line contained cDNA, mRNA, ... this fine detail
is also currently lost in the SeqRecord representation. On the other
hand, the Bio.GenBank.RecordParser stores all this as the record's
residue_type property (a single combined field, presumably reflecting
the layout of early GenBank files).

It would be a logical improvement to record the sequence data
(molecule type and if circular) in the SeqRecord's annotations
dictionary - perhaps as two fields but we'd need to check if that
would be straightforward for EMBL files too. Alternatively, if
Biopython included a native CircularSeq object, we could use that
explicitly when the sequence is declared as circular. This might be
considered a little surprising though.

Do you want to file a bug on this Chris?

Peter

From chapmanb at 50mail.com Tue Sep 2 08:04:38 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 2 Sep 2008 08:04:38 -0400
Subject: [BioPython] NumPy changes ready to check in
In-Reply-To: <320fb6e00809011208w4fb7a34h577a1a7dcb6a938d@mail.gmail.com>
References: <20080901184823.GA7915@localdomain> <320fb6e00809011208w4fb7a34h577a1a7dcb6a938d@mail.gmail.com>
Message-ID: <20080902120438.GB7990@localdomain>

Hi Peter;

> Wow - welcome back!

Thanks. Glad to be able to help; great to see you around.

> It's certainly the C/C++ bits that make life complicated.
>
> Have you looked at the patch on Bug 2251 at all?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2251

I did; sorry, forgot to mention that in my last e-mail. I wasn't
exactly sure why this wasn't checked in then as the thread dies a
bit. It was a great help for getting started.

I think the idea there is good, but might be a little more
complicated than needed.
My thoughts were that we need to manage a
transition to NumPy rather than support both Numeric and NumPy
concurrently. Numeric hasn't been updated since 2005 and the
compatibility modules for transitioning seem solid, so most people
should have moved by now or need some encouragement to do so. We are
probably being more annoying to up-to-date users by keeping Numeric
support.

The way it is implemented in my recent changes is that Numeric users will
not get broken builds, but will rather not get the C extensions
compiled unless they have NumPy. So the users who absolutely need to
install NumPy are those still using Numeric and using the C/C++
extensions.

> Michiel and I had chatted about doing one more release with Numeric
> only - there are a few things I'd like to try and get in first like
> ambiguous translation (Bug 2530) but perhaps we should just do
> Biopython 1.48 pretty much as is.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2530
>
> I'm happy to try and put together the release this week - but will
> need a hand for the windows builds.
>
> If that pans out, then we could start the Numeric to numpy transition in CVS.

That sounds fine to me. Once the new release is out I can check
things in and we can find out if there are any issues.

Brad
--
Brad Chapman
Codon Devices
http://www.codondevices.com

From biopython at maubp.freeserve.co.uk Tue Sep 2 08:46:53 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 2 Sep 2008 13:46:53 +0100
Subject: [BioPython] Bio.MetaTool
In-Reply-To: <46010.36121.qm@web62405.mail.re1.yahoo.com>
References: <46010.36121.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00809020546y385bb9dcw749fe7f4713c8955@mail.gmail.com>

On Sat, Aug 30, 2008 at 3:45 AM, Michiel de Hoon wrote:
> Hi everybody,
>
> Is anybody using the Bio.MetaTool module? If not, can we deprecate it?
>
> The Bio.MetaTool tests suggest that this module was written for MetaTool
> version 3.5 (28.03.2001), while the most current MetaTool version is at 5.0.
> Since MetaTool is written for Matlab/Octave, and it seems to be out of date,
> I expect that few people are using it with Python.
>
> Currently, Bio.MetaTool is the only non-deprecated module in Biopython
> that uses Martel. If we can deprecate Bio.MetaTool, then (over time) we
> can deprecate Martel, which means that Biopython won't need the
> mxTextTools any more, making Biopython's installation a lot easier.

I have now deprecated Bio.MetaTool in CVS. If anyone is still using
this, please speak up.

I have also marked Bio.Mindy and Martel as obsolete (documentation
changes only - no warning messages), and likely to be deprecated or
removed in a future release.

Peter

From bsouthey at gmail.com Tue Sep 2 09:55:30 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 02 Sep 2008 08:55:30 -0500
Subject: [BioPython] NumPy changes ready to check in
In-Reply-To: <20080902120438.GB7990@localdomain>
References: <20080901184823.GA7915@localdomain> <320fb6e00809011208w4fb7a34h577a1a7dcb6a938d@mail.gmail.com> <20080902120438.GB7990@localdomain>
Message-ID: <48BD45D2.4090408@gmail.com>

Brad Chapman wrote:
> That sounds fine to me. Once the new release is out I can check
> things in and we can find out if there are any issues.
>
> Brad

Hi,
This is excellent work!

I agree that at least one more version of BioPython should be released
first. This version would allow backporting of any fixes if necessary.

The transition should require NumPy 1.2 or at least NumPy 1.1 due to
NumPy API changes. On that note (referring to the related thread),
NumPy's API has to change to support Python3K (due to Python API
changes), which limits the porting of BioPython to Python3K (ignoring
the significant porting effort for BioPython). Also, probably only
NumPy will be ported to Python3K.
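A minimum-version requirement like this could be checked before trying
to build the C extensions. The following is a rough sketch only, assuming
a plain dotted version string; version_tuple and meets_minimum are
hypothetical helpers, not anything in Biopython's setup.py.

```python
import re


def version_tuple(version):
    """Turn a dotted version string such as '1.1.0rc1' into (1, 1, 0).

    Only the leading digits of each dot-separated field are used, and
    parsing stops at the first field that has no leading digits.
    """
    parts = []
    for field in version.split("."):
        match = re.match(r"\d+", field)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum=(1, 1)):
    """Return True if version is at least minimum (default: NumPy 1.1)."""
    return version_tuple(version) >= minimum


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        # Mirrors the behaviour described earlier in the thread:
        # no NumPy means no compiled extensions, not a broken build.
        print("NumPy not installed; C extensions would be skipped")
    else:
        print("NumPy", numpy.__version__, "new enough:",
              meets_minimum(numpy.__version__))
```

Comparing tuples of integers avoids the string-comparison trap where
"1.10" would sort before "1.2".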
Also I would like to point out this thread on the future directions of
NumPy that should make NumPy easier to support:
http://projects.scipy.org/pipermail/numpy-discussion/2008-August/036909.html

Bruce

From matzke at berkeley.edu Tue Sep 2 19:19:01 2008
From: matzke at berkeley.edu (Nick Matzke)
Date: Tue, 02 Sep 2008 16:19:01 -0700
Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb
Message-ID: <48BDC9E5.3020905@berkeley.edu>

Hi all,

I'm following the BioSQL tutorial at the biopython website
(http://www.biopython.org/wiki/BioSQL#NCBI_Taxonomy ). I can get
bioseqdb to work, and the biosql python scripts etc.

However I can't get these directions to work in loading the taxonomy
database into bioseqdb. I get: "Can't locate object method "connect"
via package "DBI" "

I double-checked to make sure I've got DBI in perl (see error message
below) but that doesn't seem to help.

Any help appreciated!
Nick

commands/returns:
===========================
===========================
===========================
mws2:/bioinformatics/eg nick$ perl -v

This is perl, v5.10.0 built for darwin-thread-multi-2level
(with 3 registered patches, see perl -V for more detail)

Copyright 1987-2007, Larry Wall

Binary build 1002 [283697] provided by ActiveState
http://www.ActiveState.com
Built Jan 10 2008 11:08:33

Perl may be copied only under the terms of either the Artistic License
or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
mws2:/bioinformatics/eg nick$ sudo cpan DBI
Password:

CPAN: Storable loaded ok (v2.18)
Going to read /usr/local/Metadata
Database was generated on Wed, 23 Jul 2008 10:02:50 GMT
CPAN: LWP::UserAgent loaded ok (v5.810)
CPAN: Time::HiRes loaded ok (v1.9711)
Fetching with LWP:
http://ppm.activestate.com/CPAN/authors/01mailrc.txt.gz
CPAN: YAML loaded ok (v0.66)
Going to read /Users/nick/.cpan/sources/authors/01mailrc.txt.gz
............................................................................DONE
Fetching with LWP:
http://ppm.activestate.com/CPAN/modules/02packages.details.txt.gz
Going to read /Users/nick/.cpan/sources/modules/02packages.details.txt.gz
Database was generated on Mon, 01 Sep 2008 10:02:51 GMT
............................................................................DONE
Fetching with LWP:
http://ppm.activestate.com/CPAN/modules/03modlist.data.gz
Going to read /Users/nick/.cpan/sources/modules/03modlist.data.gz
............................................................................DONE
Going to write /usr/local/Metadata
DBI is up to date (1.607).

mws2:/bioinformatics/eg nick$
/bioinformatics/pythonstuff/biosql-1.0.0/scripts/load_ncbi_taxonomy.pl
--dbname bioseqdb --driver mysql --dbuser root --download true

Can't locate object method "connect" via package "DBI" at
/bioinformatics/pythonstuff/biosql-1.0.0/scripts/load_ncbi_taxonomy.pl
line 323.

mws2:/bioinformatics/eg nick$
===========================
===========================
===========================

--
====================================================
Nicholas J. Matzke
Ph.D. student, Graduate Student Researcher
Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept.
personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html
Lab phone: 510-643-6299
Dept. fax: 510-643-6264
Cell phone: 510-301-0179
Email: matzke at berkeley.edu

Office hours for Bio1B, Spring 2008: Biology: Plants, Evolution, Ecology
VLSB 2013, Monday 1-1:30 (some TA there for all hours during work week)

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140
====================================================

From matzke at berkeley.edu Tue Sep 2 19:36:45 2008
From: matzke at berkeley.edu (Nick Matzke)
Date: Tue, 02 Sep 2008 16:36:45 -0700
Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb
In-Reply-To: <48BDC9E5.3020905@berkeley.edu>
References: <48BDC9E5.3020905@berkeley.edu>
Message-ID: <48BDCE0D.5000905@berkeley.edu>

Based on some web advice on this general error I also did this:

sudo cpan DBD::mysql

...which did install something that wasn't installed. But the same
error resulted when the load_ncbi_taxonomy.pl script was run.

Nick

From matzke at berkeley.edu Tue Sep 2 19:43:31 2008
From: matzke at berkeley.edu (Nick Matzke)
Date: Tue, 02 Sep 2008 16:43:31 -0700
Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb
In-Reply-To: <48BDCE0D.5000905@berkeley.edu>
References: <48BDC9E5.3020905@berkeley.edu> <48BDCE0D.5000905@berkeley.edu>
Message-ID: <48BDCFA3.4050605@berkeley.edu>

Perl *does* successfully recognize DBI and DBD::mysql, so e.g. not
finding them in @INC or something doesn't seem to be the problem...
(these commands return '1' if the module is found)

============
perl -MDBD::mysql -e 'print 1'
1

perl -MDBI -e 'print 1'
1
============

From biopython at maubp.freeserve.co.uk Wed Sep 3 07:19:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Sep 2008 12:19:29 +0100
Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb
In-Reply-To: <48BDC9E5.3020905@berkeley.edu>
References: <48BDC9E5.3020905@berkeley.edu>
Message-ID: <320fb6e00809030419x3a90cbb7vfa2195d48637b058@mail.gmail.com>

On Wed, Sep 3, 2008 at 12:19 AM, Nick Matzke wrote:
> Hi all,
>
> I'm following the BioSQL tutorial at the biopython website
> (http://www.biopython.org/wiki/BioSQL#NCBI_Taxonomy ). I can get bioseqdb
> to work, and the biosql python scripts etc.
>
> However I can't get these directions to work in loading the taxonomy
> database into bioseqdb. I get: "Can't locate object method "connect" via
> package "DBI" "
>
> I double-checked to make sure I've got DBI in perl (see error message below)
> but that doesn't seem to help.

This does sound like a question for the BioSQL mailing list (which I
see you've now asked on).
I'm no perl expert - so if you can resolve this via
the BioSQL mailing list, and we can improve the Biopython BioSQL wiki
page, that would be great.

Peter

From chris.lasher at gmail.com Wed Sep 3 12:34:24 2008
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 3 Sep 2008 12:34:24 -0400
Subject: [BioPython] Determining if GenBank record is circular
In-Reply-To: <320fb6e00809020200s5ac59ed5xe4a65ae5117d0597@mail.gmail.com>
References: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com> <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com> <320fb6e00809020200s5ac59ed5xe4a65ae5117d0597@mail.gmail.com>
Message-ID: <128a885f0809030934s2b466966m813eaf55932afb19@mail.gmail.com>

On Tue, Sep 2, 2008 at 5:00 AM, Peter wrote:
> Do you want to file a bug on this Chris?

Would you mind filing it, Peter? I've got a poster to complete very
soon. I think you did a fine job describing the features we'd like to
add.

Thanks,
Chris

From biopython at maubp.freeserve.co.uk Wed Sep 3 12:49:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Sep 2008 17:49:29 +0100
Subject: [BioPython] Determining if GenBank record is circular
In-Reply-To: <128a885f0809030934s2b466966m813eaf55932afb19@mail.gmail.com>
References: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com> <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com> <320fb6e00809020200s5ac59ed5xe4a65ae5117d0597@mail.gmail.com> <128a885f0809030934s2b466966m813eaf55932afb19@mail.gmail.com>
Message-ID: <320fb6e00809030949m78ea710cq9d18a5677ebb8edf@mail.gmail.com>

>> Do you want to file a bug on this Chris?
>
> Would you mind filing it, Peter? I've got a poster to complete very
> soon. I think you did a fine job describing the features we'd like to
> add.

Done, Bug 2578 - The GenBank SeqRecord parser does not record molecule
type or if circular
http://bugzilla.open-bio.org/show_bug.cgi?id=2578

You might want to add yourself as a CC. I'm not planning to tackle
this right away, and I'll be on holiday most of next week.
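In the meantime the topology can be read straight from the LOCUS line
of the file. This is a minimal sketch, not Biopython API:
parse_locus_topology is a hypothetical helper, and it assumes
whitespace-separated fields with the molecule type following the "bp"
unit, as in the NC_002678 example earlier in the thread.

```python
def parse_locus_topology(locus_line):
    """Return (molecule_type, topology) from a GenBank LOCUS line.

    Hypothetical helper, not part of Biopython. Either value is None
    when the line does not state it (e.g. no "linear"/"circular" word).
    """
    fields = locus_line.split()
    if not fields or fields[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % locus_line)
    topologies = ("linear", "circular")
    molecule_type = None
    topology = None
    if "bp" in fields:
        # The molecule type (DNA, mRNA, ...) is the field after "bp",
        # unless that field is already the topology keyword.
        after_unit = fields[fields.index("bp") + 1:]
        if after_unit and after_unit[0] not in topologies:
            molecule_type = after_unit[0]
    # Skip "LOCUS" and the locus name when looking for the topology.
    for field in fields[2:]:
        if field in topologies:
            topology = field
            break
    return molecule_type, topology


if __name__ == "__main__":
    line = "LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008"
    print(parse_locus_topology(line))  # ('DNA', 'circular')
```

Returning the two values separately matches the idea of storing them as
two annotation fields rather than one combined residue_type string.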
Peter From matzke at berkeley.edu Wed Sep 3 18:45:53 2008 From: matzke at berkeley.edu (Nick Matzke) Date: Wed, 03 Sep 2008 15:45:53 -0700 Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb In-Reply-To: <320fb6e00809030419x3a90cbb7vfa2195d48637b058@mail.gmail.com> References: <48BDC9E5.3020905@berkeley.edu> <320fb6e00809030419x3a90cbb7vfa2195d48637b058@mail.gmail.com> Message-ID: <48BF13A1.3020804@berkeley.edu> (Resolved this on the BioSQL list but I figured I would follow up to biopython also -- thanks! & sorry for the confusion) Well, I'm not sure what I did, but some combination of these things seems to have worked. 1. moved the site/lib directory (which contains DBI.pm) to the front of my PERL5LIB (which goes into @INC) export PERL5LIB=$PERL5LIB:/usr/local/ActivePerl-5.10/site/lib:/usr/local/ActivePerl-5.10/man/man3:/usr/local/ActivePerl-5.10/site/lib/Bundle 2. Checked to make sure DBI & DBD::mysql were installed ========================================== mws2:/usr/local/ActivePerl-5.10/bin nick$ sudo perl -MCPAN -e 'install DBI' CPAN: Storable loaded ok (v2.18) Going to read /usr/local/Metadata Database was generated on Mon, 01 Sep 2008 10:02:51 GMT DBI is up to date (1.607). mws2:/usr/local/ActivePerl-5.10/bin nick$ sudo perl -MCPAN -e 'install DBD::mysql' CPAN: Storable loaded ok (v2.18) Going to read /usr/local/Metadata Database was generated on Mon, 01 Sep 2008 10:02:51 GMT DBD::mysql is up to date (4.008). ========================================== 3. (Make sure you have an empty version of the db, at least for me I got errors if I had already loaded sequences etc. 
into it...I got errors like this: ========================================== note: node (28;331111;27;species;;) is retired; failed to delete: Cannot delete or update a parent row: a foreign key constraint fails (`bioseqdb/bioentry`, CONSTRAINT `FKtaxon_bioentry` FOREIGN KEY (`taxon_id`) REFERENCES `taxon` (`taxon_id`)) note: node (70;300268;69;species;;) is retired; failed to delete: Cannot delete or update a parent row: a foreign key constraint fails (`bioseqdb/bioentry`, CONSTRAINT `FKtaxon_bioentry` FOREIGN KEY (`taxon_id`) REFERENCES `taxon` (`taxon_id`)) note: node (77;3002 ========================================== 4. Ran it again: (I also added '--host localhost') mws2:/usr/local/ActivePerl-5.10/bin nick$ sudo perl /bioinformatics/pythonstuff/biosql-1.0.0/scripts/load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true --host localhost Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. So thanks for the help, something or other worked! Cheers, Nick Peter wrote: > On Wed, Sep 3, 2008 at 12:19 AM, Nick Matzke wrote: >> Hi all, >> >> I'm following the BioSQL tutorial at the biopython website >> (http://www.biopython.org/wiki/BioSQL#NCBI_Taxonomy ). I can get bioseqdb >> to work, and the biosql python scripts etc. >> >> However I can't get these directions to work in loading the taxonomy >> database into bioseqdb. I get: "Can't locate object method "connect" via >> package "DBI" " >> >> I double-checked to make sure I've got DBI in perl (see error message below) >> but that doesn't seem to help. > > This does sound like a question for the BioSQL mailing list (which I > see you've now asked on). 
I'm no perl expert - so if you can resolve > this via the BioSQL mailing list, and we can improve the Biopython > BioSQL wiki page, that would be great. > > Peter > -- ==================================================== Nicholas J. Matzke Ph.D. student, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: matzke at berkeley.edu Office hours for Bio1B, Spring 2008: Biology: Plants, Evolution, Ecology VLSB 2013, Monday 1-1:30 (some TA there for all hours during work week) Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 ==================================================== From quwubin at gmail.com Mon Sep 8 09:43:05 2008 From: quwubin at gmail.com (Wubin Qu) Date: Mon, 8 Sep 2008 21:43:05 +0800 Subject: [BioPython] BioPythonGUI: Graphical User Interface for BioPython Message-ID: Hi all, I started a new project named BioPythonGUI for a few of days. The following is the 'About' page from BioPythonGUI project. BioPythonGUI is a Graphical User Interface of BioPython. BioPython is a widely used python module set in bioinformatics. It help researchers: - Parsing files in different database formats - Interfaces into programs like Blast, Entrez and PubMed - A sequence class (can transcribe, translate, invert, etc) - Code for handling alignments of sequences - Clustering algorithms - etc. However, it's not everyone can use the BioPython, especially ones who do not know much about the programming. 
How can you expect a professor who never known about any programming to use BioPython to parse the BLAST report file? This is the problem which the BioPythonGUI would solve. I started the project with the goal "Everyone can use BioPython with BioPythonGUI". Until now, there are two modules SeqGUI and BlastGUI are available in BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send me the feedback. Please see the developer's blog for details. Project Blog: http://biopythongui.blogspot.com/ Download: https://sites.google.com/site/biopythongui/download Screenshots: http://picasaweb.google.com/quwubin/BioPythonGUI02# ______________________________ Best regards, Wubin Qu
From p.j.a.cock at googlemail.com Mon Sep 8 10:12:15 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Sep 2008 15:12:15 +0100 Subject: [BioPython] BioPythonGUI: Graphical User Interface for BioPython In-Reply-To: References: Message-ID: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com> On Mon, Sep 8, 2008 at 2:43 PM, Wubin Qu wrote: > Hi all, > > I started a new project named BioPythonGUI for a few of days. Hello Wubin Qu, > BioPythonGUI is a Graphical User Interface of BioPython. I'm uncomfortable about the name BioPythonGUI, as this to me implies it is part of Biopython (whereas is it currently just a third party project built on top of Biopython). What do other people think? > However, it's not everyone can use the BioPython, especially ones who do not > know much about the programming. How can you expect a professor who never > known about any programming to use BioPython to parse the BLAST report file? > This is the problem which the BioPythonGUI would solve. I started the > project with the goal "Everyone can use BioPython with BioPythonGUI". I don't really understand your goal. How would a non-programming professor use your program to parse a BLAST report file? The NCBI already try and make the HTML and plain text output useful to non-programmers and from looking at the screenshots I don't see how your tool would help. > Until now, there are two modules SeqGUI and BlastGUI are available in > BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send me > the feedback. I see your module SeqGUI builds on the SeqGui.py in BioPython (in the scripts directory). It might make sense to include your improvements to this code as part of Biopython.
I haven't looked at your code yet, so I don't know how much you've changed things. It is nice to be able to translate, transcribe, reverse complement etc in a GUI, but personally I don't see the point of writing a little application just for this. Also, there are probably many existing tools out there that already offer this functionality. However, I am happy writing code, so I am not in your target audience. Regarding your BlastGUI idea, I can see that a GUI for standalone blast is nicer than the command line for some people. However, I don't see how this is more useful than running a local blast web server (something the NCBI already provides). Sorry for being so negative, Peter From quwubin at gmail.com Mon Sep 8 10:38:27 2008 From: quwubin at gmail.com (Wubin Qu) Date: Mon, 8 Sep 2008 22:38:27 +0800 Subject: [BioPython] BioPythonGUI: Graphical User Interface for BioPython In-Reply-To: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com> References: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com> Message-ID: Hi Peter, Thanks for your reply. My goal is simple: programs with a GUI are easy to use, and BioPython with a GUI will make things easier for people. The next module is BlastParserGUI. I think it will be useful. Yes, SeqGUI is built on SeqGui.py. I learned a lot from SeqGui.py, and it inspired me to build other modules. I mentioned this here . ______________________________ Best regards, Wubin Qu 2008/9/8 Peter Cock > On Mon, Sep 8, 2008 at 2:43 PM, Wubin Qu wrote: > > Hi all, > > > > I started a new project named BioPythonGUI for a few of days. > > Hello Wubin Qu, > > > BioPythonGUI is a Graphical User Interface of BioPython. > > I'm uncomfortable about the name BioPythonGUI, as this to me implies > it is part of Biopython (whereas is it currently just a third party > project built on top of Biopython). What do other people think?
> > > However, it's not everyone can use the BioPython, especially ones who do > not > > know much about the programming. How can you expect a professor who never > > known about any programming to use BioPython to parse the BLAST report > file? > > This is the problem which the BioPythonGUI would solve. I started the > > project with the goal "Everyone can use BioPython with BioPythonGUI". > > I don't really understand your goal. How would a non-programming > professor use your program to parse a BLAST report file? The NCBI > already try and make the HTML and plain text output useful to > non-programmers and from looking at the screenshots I don't see how > your tool would help. > > > Until now, there are two modules SeqGUI and BlastGUI are available in > > BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send > me > > the feedback. > > I see your module SeqGUI builds on the SeqGui.py in BioPython (in the > scripts directory). It might make sense to include your improvements > to this code as part of Biopython. I haven't looked at your code yet, > so I don't know how much you've changed things. > > It is nice to be able to be able to translate, transcribe, reverse > complement etc in a GUI, but personally I don't see the point or > writing a little application just for this. Also, there are probably > many many existing tools out there that already offer this > functionality. However, I am happy writing code, so I am not in your > target audience. > > Regarding your BlastGUI idea, I can see that a GUI for standalone > blast is nicer than the command line for some people. However, I > don't see how this is more useful than running a local blast web > server (something the NCBI already provides). 
> > Sorry for being so negative, > > Peter > From biopython at maubp.freeserve.co.uk Tue Sep 9 06:14:11 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Sep 2008 11:14:11 +0100 Subject: [BioPython] Biopython 1.48 released Message-ID: <320fb6e00809090314s722f404bqda71d7d9f97360e7@mail.gmail.com> We are pleased to announce the release of Biopython 1.48. Some new functionality has been added, a few bugs have been fixed, the documentation has been updated, plus several obsolete modules have been deprecated (or explicitly labelled as obsolete). The following additional file formats are now supported in Bio.SeqIO and Bio.AlignIO: * reading and writing "tab" format (simple tab separated) * writing "nexus" files * reading "pir" files (NBRF/PIR) * basic support for writing "genbank" files (GenBank plain text) This release also fixes some problems reading Clustal alignments (introduced in Biopython 1.46 when consolidating Bio.AlignIO and Bio.Clustalw), and some updates to the Bio.Sequencing parsers. The SeqRecord and Alignment objects have a new method to get the object as a string in a given file format (handled via Bio.SeqIO and Bio.AlignIO). Bio.PubMed and the online code in Bio.GenBank are now considered obsolete, and we intend to deprecate them after the next release. For accessing PubMed and GenBank, please use Bio.Entrez instead. Martel and Bio.Mindy are now considered to be obsolete, and are likely to be deprecated and removed in a future release, at which point we will drop the optional dependency on mxTextTools. Bio.Fasta is also considered to be obsolete, please use Bio.SeqIO instead. We do intend to deprecate this module eventually, however, for several years this was the primary FASTA parsing module in Biopython and is likely to be in use in many existing scripts. In addition a number of other modules have been deprecated, including: Bio.MetaTool, Bio.EUtils, Bio.Saf, Bio.NBRF, and Bio.IntelliGenetics - see the DEPRECATED file for full details. 
Source distributions are available from the Biopython website at http://biopython.org, and Windows installers will be added shortly. My thanks to all bug reporters, code contributors and others who made this new release possible. Peter, on behalf of the Biopython developers P.S. This message will be forwarded to the Biopython announcement mailing list shortly. For those of you who prefer news readers to email lists, have a look at the OBF news server: http://news.open-bio.org/news/2008/09/biopython-release-148/ where there are Biopython news feeds available: http://news.open-bio.org/news/category/obf-projects/biopython/feed/rdf http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss2 http://news.open-bio.org/news/category/obf-projects/biopython/feed/atom From lueck at ipk-gatersleben.de Tue Sep 9 07:11:56 2008 From: lueck at ipk-gatersleben.de (Stefanie Lück) Date: Tue, 9 Sep 2008 13:11:56 +0200 Subject: [BioPython] Biopython 1.48 released Message-ID: <006b01c9126c$ded5eee0$1022a8c0@ipkgatersleben.de> Great work! Thanks to all! Stefanie ;-) From bsantos at biocant.pt Wed Sep 10 17:42:31 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Wed, 10 Sep 2008 21:42:31 +0000 Subject: [BioPython] Entrez.read Message-ID: <20080910214231.f492d16a@mail.biocant.pt> Hi everyone, I am trying to use Entrez.esearch to find the ids I can then use to fetch the GenBank records. For that purpose I went looking in the Biopython tutorial for more information on how to do this, and I found the following: >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") >>> record = Entrez.read(handle) >>> record["Count"] '25' >>> record["IdList"] ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] Unfortunately, when I try to follow these instructions I find that I don't have any Entrez.read to parse the output of Entrez.esearch.
Is this a new option in the last release? Thanks in advance, Bruno Santos From bsantos at biocant.pt Thu Sep 11 11:09:58 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Thu, 11 Sep 2008 16:09:58 +0100 Subject: [BioPython] Entrez.read In-Reply-To: <20080910214231.f492d16a@mail.biocant.pt> References: <20080910214231.f492d16a@mail.biocant.pt> Message-ID: <001501c91420$769d0240$63d706c0$@pt> No answers? -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Bruno Santos Sent: Wednesday, 10 September 2008 22:43 To: biopython at biopython.org Subject: [BioPython] Entrez.read Hi everyone, I am trying to use Entrez.esearch to find the ids I can then use to fetch the GenBank records. For that purpose I went looking in the Biopython tutorial for more information on how to do this, and I found the following: >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") >>> record = Entrez.read(handle) >>> record["Count"] '25' >>> record["IdList"] ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] Unfortunately, when I try to follow these instructions I find that I don't have any Entrez.read to parse the output of Entrez.esearch. Is this a new option in the last release? Thanks in advance, Bruno Santos _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From idoerg at gmail.com Thu Sep 11 11:37:58 2008 From: idoerg at gmail.com (Iddo Friedberg) Date: Thu, 11 Sep 2008 08:37:58 -0700 Subject: [BioPython] Entrez.read In-Reply-To: <20080910214231.f492d16a@mail.biocant.pt> References: <20080910214231.f492d16a@mail.biocant.pt> Message-ID: Can you provide the steps and error messages you are getting? It seems to work fine for me.
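For anyone trying to reproduce this, a small sketch of how the search term in the tutorial example is put together. The helper name build_entrez_term is made up for illustration; only the [Orgn] and [Gene] field tags and the esearch/read calls come from the messages in this thread, and nothing here contacts NCBI.

```python
def build_entrez_term(organism, gene=None):
    """Combine an organism and an optional gene into an Entrez search term.

    Hypothetical helper, not part of Bio.Entrez.
    """
    parts = ["%s[Orgn]" % organism]
    if gene:
        parts.append("%s[Gene]" % gene)
    return " AND ".join(parts)


term = build_entrez_term("Cypripedioideae", "matK")
print(term)

# With a Biopython release that provides Entrez.read, plus network access,
# the term would then be used as in the tutorial session:
#   from Bio import Entrez
#   handle = Entrez.esearch(db="nucleotide", term=term)
#   record = Entrez.read(handle)
```

Building the term separately makes it easy to print exactly what was sent when reporting a problem to the list.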
On Wed, Sep 10, 2008 at 2:42 PM, Bruno Santos wrote: > Hi everyone, > I am trying to use Entrez.esearch to find the ids I can then use to fetch > the GenBank records. > For that purpose I went looking in the Biopython tutorial for more information > on how to do this, and I found the following: > > >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") > >>> record = Entrez.read(handle) > >>> record["Count"] > '25' > >>> record["IdList"] > ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] > > Unfortunately, when I try to follow these instructions I find that I > don't have any Entrez.read > to parse the output of Entrez.esearch. Is this a new option in the last > release? > Thanks in advance, > Bruno Santos > -- Iddo Friedberg, Ph.D. CALIT2, mail code 0440 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0440, USA T: +1 (858) 534-0570 T: +1 (858) 646-3100 x3516 http://iddo-friedberg.org From fkauff at biologie.uni-kl.de Thu Sep 11 11:26:58 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Thu, 11 Sep 2008 17:26:58 +0200 Subject: [BioPython] Entrez.read In-Reply-To: <001501c91420$769d0240$63d706c0$@pt> References: <20080910214231.f492d16a@mail.biocant.pt> <001501c91420$769d0240$63d706c0$@pt> Message-ID: <48C938C2.2080902@biologie.uni-kl.de> Hi Bruno, from what I see your code should work. Without knowing what version of Biopython you are running and what error message you get, it's a bit difficult to tackle the problem. I'd guess your Biopython is outdated. Frank Bruno Santos wrote: > No answers?
> -----Original Message----- > From: biopython-bounces at lists.open-bio.org > [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Bruno Santos > Sent: Wednesday, 10 September 2008 22:43 > To: biopython at biopython.org > Subject: [BioPython] Entrez.read > > Hi everyone, > I am trying to use Entrez.esearch to find the ids I can then use to fetch > the GenBank records. > For that purpose I went looking in the Biopython tutorial for more information > on how to do this, and I found the following: > > >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") > >>> record = Entrez.read(handle) > >>> record["Count"] > '25' > >>> record["IdList"] > ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] > > Unfortunately, when I try to follow these instructions I find that I > don't have any Entrez.read > to parse the output of Entrez.esearch. Is this a new option in the last > release? > Thanks in advance, > Bruno Santos From bsantos at biocant.pt Thu Sep 11 12:36:43 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Thu, 11 Sep 2008 17:36:43 +0100 Subject: [BioPython] Entrez.read In-Reply-To: <48C938C2.2080902@biologie.uni-kl.de> References: <20080910214231.f492d16a@mail.biocant.pt> <001501c91420$769d0240$63d706c0$@pt> <48C938C2.2080902@biologie.uni-kl.de> Message-ID: <001d01c9142c$94e2d520$bea87f60$@pt> OK, this seems to be a completely stupid error: I was playing a little bit with the PyDev console in Eclipse and Entrez.read raised an exception, but if I place this instruction in a script it works fine.
I have no idea why this is happening, but in fact this method doesn't appear in the Eclipse auto-complete list. Thank you all for your answers. Now I have another question: is it possible to limit the esearch not only to a database but also to a species? I only need the ids associated with Drosophila melanogaster, and having to fetch all the sequences and only then check whether they belong to this species seems a waste of bandwidth. Thanks in advance, Bruno Santos -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Frank Kauff Sent: Thursday, 11 September 2008 16:27 To: biopython at biopython.org Subject: Re: [BioPython] Entrez.read Hi Bruno, from what I see your code should work. Without knowing what version of Biopython you are running and what error message you get, it's a bit difficult to tackle the problem. I'd guess your Biopython is outdated. Frank Bruno Santos wrote: > No answers? > -----Original Message----- > From: biopython-bounces at lists.open-bio.org > [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Bruno Santos > Sent: Wednesday, 10 September 2008 22:43 > To: biopython at biopython.org > Subject: [BioPython] Entrez.read > > Hi everyone, > I am trying to use Entrez.esearch to find the ids I can then use to fetch > the GenBank records. > For that purpose I went looking in the Biopython tutorial for more information > on how to do this, and I found the following: > > >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") > >>> record = Entrez.read(handle) > >>> record["Count"] > '25' > >>> record["IdList"] > ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] > > Unfortunately, when I try to follow these instructions I find that I > don't have any Entrez.read > to parse the output of Entrez.esearch. Is this a new option in the last > release?
> Thanks in advance, > Bruno Santos From idoerg at gmail.com Thu Sep 11 12:54:28 2008 From: idoerg at gmail.com (Iddo Friedberg) Date: Thu, 11 Sep 2008 09:54:28 -0700 Subject: [BioPython] Entrez.read In-Reply-To: References: <20080910214231.f492d16a@mail.biocant.pt> <001501c91420$769d0240$63d706c0$@pt> <48C938C2.2080902@biologie.uni-kl.de> <001d01c9142c$94e2d520$bea87f60$@pt> Message-ID: The example *is* limiting your search to the subfamily Cypripedioideae. To limit to a species you can write along the same lines: handle = Entrez.esearch(db="nucleotide",term="Escherichia coli[Orgn]") (With apologies to Brad Chapman. I'm a microbe guy, not a plant guy). ./I On Thu, Sep 11, 2008 at 9:36 AM, Bruno Santos wrote: > OK, this seems to be a completely stupid error: I was playing a little bit > with the PyDev console in Eclipse and Entrez.read raised an exception, > but if I place this instruction in a script it works fine. I have no idea why this > is happening, but in fact this method doesn't appear in the Eclipse auto-complete > list. > Thank you all for your answers. > Now I have another question: is it possible to limit the esearch not only to > a database but also to a species? I only need the ids associated with > Drosophila melanogaster, and having to fetch all the sequences and only then > check whether they belong to this species seems a waste of bandwidth.
> > Thanks in advance, > Bruno Santos > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org > [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Frank Kauff > Sent: Thursday, 11 September 2008 16:27 > To: biopython at biopython.org > Subject: Re: [BioPython] Entrez.read > > Hi Bruno, > > from what I see your code should work. Without knowing what version of > Biopython you are running and what error message you get, it's a bit > difficult to tackle the problem. I'd guess your Biopython is outdated. > > Frank > > Bruno Santos wrote: > > No answers? > > -----Original Message----- > > From: biopython-bounces at lists.open-bio.org > > [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Bruno Santos > > Sent: Wednesday, 10 September 2008 22:43 > > To: biopython at biopython.org > > Subject: [BioPython] Entrez.read > > > > Hi everyone, > > I am trying to use Entrez.esearch to find the ids I can then use to fetch > > the GenBank records. > > For that purpose I went looking in the Biopython tutorial for more information > > on how to do this, and I found the following: > > > > >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") > > >>> record = Entrez.read(handle) > > >>> record["Count"] > > '25' > > >>> record["IdList"] > > ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] > > > > Unfortunately, when I try to follow these instructions I find that I > > don't have any Entrez.read > > to parse the output of Entrez.esearch. Is this a new option in the last > > release?
> > Thanks in advance, > > Bruno Santos > -- Iddo Friedberg, Ph.D. CALIT2, mail code 0440 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0440, USA T: +1 (858) 534-0570 T: +1 (858) 646-3100 x3516 http://iddo-friedberg.org From biopython at maubp.freeserve.co.uk Tue Sep 16 16:20:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Sep 2008 21:20:58 +0100 Subject: [BioPython] Entrez.read In-Reply-To: <20080910214231.f492d16a@mail.biocant.pt> References: <20080910214231.f492d16a@mail.biocant.pt> Message-ID: <320fb6e00809161320l2249e12t5b5537becc50c238@mail.gmail.com> On Wed, Sep 10, 2008 at 10:42 PM, Bruno Santos wrote: > Hi everyone, > > Unfortunately, when I try to follow these instructions I find that I don't have any Entrez.read > to parse the output of Entrez.esearch. Is this a new option in the last release? My initial guess would be the answer in the FAQ (near the start of the tutorial), >> Why doesn't Bio.Entrez.read() work? The module imports fine but there is no read function! >> You need Biopython 1.46 or later. But then you said later it was something odd about the pydev console in eclipse - which stumps me.
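One way to narrow down a console-versus-script difference like this is to print which interpreter and which Biopython install each environment is actually using. A minimal sketch using only the standard library plus whatever Biopython happens to be importable; describe_biopython is a hypothetical helper, and since some releases may not define Bio.__version__ the lookups go through getattr with a fallback.

```python
import sys


def describe_biopython():
    """Return a few lines describing the interpreter and its Biopython install."""
    lines = ["interpreter: %s" % sys.executable]
    try:
        import Bio
        from Bio import Entrez
    except ImportError:
        lines.append("Biopython: not importable from this interpreter")
        return lines
    # __file__ shows which copy of Biopython was picked up on sys.path.
    lines.append("Bio package: %s" % getattr(Bio, "__file__", "unknown"))
    lines.append("Bio version: %s" % getattr(Bio, "__version__", "unknown"))
    lines.append("Entrez.read available: %s" % hasattr(Entrez, "read"))
    return lines


for line in describe_biopython():
    print(line)
```

Running the same snippet in the console and as a script, then comparing the output, would show whether the two are picking up different installs.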
Peter From biopython at maubp.freeserve.co.uk Thu Sep 18 08:32:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Sep 2008 13:32:43 +0100 Subject: [BioPython] Deprecating Bio.Mindy and Martel Message-ID: <320fb6e00809180532t38c9aa19leff6b65f1902ca5f@mail.gmail.com> Dear all, As of Biopython release 1.48, Martel and Bio.Mindy are officially considered to be obsolete. These modules provided a regular expression based parsing framework originally used in several of the Biopython parsers, but required the external dependency mxTextTools. None of the current parsers in Biopython uses Martel or Bio.Mindy anymore. I would like to deprecate Martel and Bio.Mindy (and closely associated support modules) for the next release. This means the code will still be present and functional, but using it will trigger a warning message. We can then remove them in a future release. This change should not affect the vast majority of Biopython users. However, if there are any Biopython users on the mailing list who are using Martel and Bio.Mindy in their own parsers, please get in touch. Thank you, Peter From mjldehoon at yahoo.com Fri Sep 19 23:02:51 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 19 Sep 2008 20:02:51 -0700 (PDT) Subject: [BioPython] Migrating from Numerical Python to numpy Message-ID: <623262.17729.qm@web62407.mail.re1.yahoo.com> Dear all, As you probably are well aware, Biopython releases to date have used the now obsolete Numeric python library. This is no longer being maintained and has been superseded by the numpy library. See http://www.scipy.org/History_of_SciPy for more details on the history of numerical python. Biopython 1.48 should be the last Numeric-only release of Biopython - we have already started moving to numpy in CVS. Supporting both Numeric and numpy ought to be fairly straightforward for the pure python modules in Biopython.
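As an illustration of the pure-Python case, a minimal sketch of picking whichever array library is installed. The helper name and the order of preference are assumptions for illustration only, not what Biopython's CVS code does.

```python
def import_array_module(candidates=("numpy", "Numeric")):
    """Return the first importable module from candidates, or None.

    Hypothetical helper; tries each name in order and skips the ones
    that are not installed.
    """
    for name in candidates:
        try:
            return __import__(name)
        except ImportError:
            continue
    return None


array_module = import_array_module()
if array_module is None:
    print("neither numpy nor Numeric is installed")
else:
    print("using %s" % array_module.__name__)
```

A module written this way still has to stick to calls that behave the same in both libraries, which is where most of the real porting effort would go.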
However, we also have C code which must interact with Numeric/numpy, and trying to support both would be harder. Would anyone be inconvenienced if the next release of Biopython supported numpy ONLY (dropping support for Numeric)? If so please speak up now - either here or on the development mailing list. Otherwise, a simple switch from Numeric to numpy will probably be the most straightforward migration plan. Thank you, --Mihiel. From ericgibert at yahoo.fr Sat Sep 20 22:46:01 2008 From: ericgibert at yahoo.fr (Eric Gibert) Date: Sun, 21 Sep 2008 10:46:01 +0800 Subject: [BioPython] Migrating from Numerical Python to numpy In-Reply-To: <623262.17729.qm@web62407.mail.re1.yahoo.com> References: <623262.17729.qm@web62407.mail.re1.yahoo.com> Message-ID: <93940882994C47E3AEDA37AC021C187D@Gecko> Go ahead! :-) numpy only. Eric -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Michiel de Hoon Sent: Saturday, September 20, 2008 11:03 AM To: biopython at biopython.org Subject: [BioPython] Migrating from Numerical Python to numpy Dear all, As you probably are well aware, Biopython releases to date have used the now obsolete Numeric python library. This is no longer being maintained and has been superseded by the numpy library. See http://www.scipy.org/History_of_SciPy for more about details on the history of numerical python. Biopython 1.48 should be the last Numeric only release of Biopython - we have already started moving to numpy in CVS. Supporting both Numeric and numpy ought to be fairly straightforward for the pure python modules in Biopython. However, we also have C code which must interact with Numeric/numpy, and trying to support both would be harder. Would anyone be inconvenienced if the next release of Biopython supported numpy ONLY (dropping support for Numeric)? If so please speak up now - either here or on the development mailing list. 
Otherwise, a simple switch from Numeric to numpy will probably be the most straightforward migration plan. Thank you, --Mihiel. _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From lunt at ctbp.ucsd.edu Sun Sep 21 17:57:59 2008 From: lunt at ctbp.ucsd.edu (Bryan Lunt) Date: Sun, 21 Sep 2008 14:57:59 -0700 Subject: [BioPython] Biopython 1.48 Still using Numeric!? Message-ID: Greetings, It seems that v1.48 still uses Numeric, and not Numpy? The last time I installed biopython (v1.47) I was able to use the migration tools included with numpy, plus some manual judgement and search/replace to migrate the code. It seemed like the tests worked after that, but maybe there was something I missed? Obviously it must be a huge job, or someone would have done it by now. (Which I suppose supports the "something I missed" hypothesis...) -Bryan Lunt From biopython at maubp.freeserve.co.uk Sun Sep 21 18:26:24 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 21 Sep 2008 23:26:24 +0100 Subject: [BioPython] Biopython 1.48 Still using Numeric!? In-Reply-To: References: Message-ID: <320fb6e00809211526y6abcac90i9f76b97455e1e9af@mail.gmail.com> On Sun, Sep 21, 2008 at 10:57 PM, Bryan Lunt wrote: > Greetings, > It seems that v1.48 still uses Numeric, and not Numpy? Yes, that is true. See below... > The last time I installed biopython (v1.47) I was able to use the > migration tools included with numpy, plus some manual judgement and > search/replace to migrate the code. > It seemed like the tests worked after that, but maybe there was > something I missed? > > Obviously it must be a huge job, or someone would have done it by now. > (Which I suppose supports the "something I missed" hypothesis...) I'm guessing you missed the earlier emails on this list - we are in the process of moving from Numeric to numpy for the next release. 
This is non-trivial due to Biopython having C code using Numeric. If you are brave, you could checkout the latest code from CVS and see how that works for you? Peter From meesters at uni-mainz.de Mon Sep 22 07:29:26 2008 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 22 Sep 2008 13:29:26 +0200 Subject: [BioPython] numpy migration for Bio.PDB.Vector Message-ID: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> Hi, For my own software I've re-written Bio.PDB.Vector. (Or at least something similar, but I've adpated my code to the look & feel of Thomas Hamelryck's original code.) My code avoids using the 'from numpy.oldnumeric import'-statements. I've send the file some months ago to Thomas, but it seems it didn't make it to CVS. Anybody interested in using it as a template (it includes some unit tests)? Christian PS I don't have time to join the development, but appreciate any progress in biopython. Thanks to all of you and keep up the good work! From biopython at maubp.freeserve.co.uk Mon Sep 22 09:14:14 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Sep 2008 14:14:14 +0100 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> Message-ID: <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> On Mon, Sep 22, 2008 at 12:29 PM, Christian Meesters wrote: > Hi, > > For my own software I've re-written Bio.PDB.Vector. (Or at least > something similar, but I've adpated my code to the look & feel of Thomas > Hamelryck's original code.) > My code avoids using the 'from numpy.oldnumeric import'-statements. I've > send the file some months ago to Thomas, but it seems it didn't make it > to CVS. Anybody interested in using it as a template (it includes some > unit tests)? > > Christian > > PS I don't have time to join the development, but appreciate any > progress in biopython. 
Thanks to all of you and keep up the good work! Hi Christian, It seems Thomas has been a little too busy lately to spend much time on Biopython. Could you file an enhancement bug on bugzilla for this, and then upload your code as an attachment? Depending on what exactly you mean be re-writing Bio/PDB/Vector.py either a patch or just the new python file(s) could be best. Further unit tests are also welcome :) Thanks, Peter From meesters at uni-mainz.de Mon Sep 22 10:56:13 2008 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 22 Sep 2008 16:56:13 +0200 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> Message-ID: <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> > > It seems Thomas has been a little too busy lately to spend much time > on Biopython. Could you file an enhancement bug on bugzilla for this, > and then upload your code as an attachment? Done. Only that I didn't find how to attach files. So I simply pasted the code Feel free to use or drop anything. Christian From biopython at maubp.freeserve.co.uk Mon Sep 22 11:19:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Sep 2008 16:19:33 +0100 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> Message-ID: <320fb6e00809220819s27fb8cb4s6e713e56fe89cc@mail.gmail.com> On Mon, Sep 22, 2008 at 3:56 PM, Christian Meesters wrote: > >> >> It seems Thomas has been a little too busy lately to spend much time >> on Biopython. 
Could you file an enhancement bug on bugzilla for this, >> and then upload your code as an attachment? > > Done. Only that I didn't find how to attach files. So I simply pasted > the code Feel free to use or drop anything. > > Christian Thanks for that. I did wonder if I should be more explicit - its a two step process, file the bug, then add the attachment. Odd, I agree. As to the code, I've only skimmed over it. One thing that caught my eye is you are using scipy, which we currently do not want to introduce as a Biopython dependency. Also, by the nature of your code (using the new numpy API, not numpy.oldnumeric), making this kind of change will have to wait until we have agreed to completely drop support for Numeric. There is a thread proposing a simple transition from Numeric only (Biopython 1.48 or older) to numpy only (Biopython 1.49 onwards), but a transition period supporting both could be possible (but would probably be more work!). If you have any views on this migration, please voice them (ideally on that thread). Thanks, Peter From meesters at uni-mainz.de Mon Sep 22 11:44:30 2008 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 22 Sep 2008 17:44:30 +0200 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <320fb6e00809220819s27fb8cb4s6e713e56fe89cc@mail.gmail.com> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220819s27fb8cb4s6e713e56fe89cc@mail.gmail.com> Message-ID: <1222098270.5737.85.camel@meesters.biologie.uni-mainz.de> > Thanks for that. I did wonder if I should be more explicit - its a > two step process, file the bug, then add the attachment. Odd, I > agree. Indeed ;-). > > As to the code, I've only skimmed over it. One thing that caught my > eye is you are using scipy, which we currently do not want to > introduce as a Biopython dependency. 
Alright, this was only a shortcut to get the determinant. Actually there is no need to use scipy. > If you have any views on this migration, please voice them > (ideally on that thread). No, nothing to add to things already written. Christian From biopython at maubp.freeserve.co.uk Mon Sep 22 12:05:13 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Sep 2008 17:05:13 +0100 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <1222098270.5737.85.camel@meesters.biologie.uni-mainz.de> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220819s27fb8cb4s6e713e56fe89cc@mail.gmail.com> <1222098270.5737.85.camel@meesters.biologie.uni-mainz.de> Message-ID: <320fb6e00809220905x6c0c0c82wc3b06344d87714b0@mail.gmail.com> >> As to the code, I've only skimmed over it. One thing that caught my >> eye is you are using scipy, which we currently do not want to >> introduce as a Biopython dependency. > > Alright, this was only a shortcut to get the determinant. Actually there > is no need to use scipy. Good - that's what I would have assumed anyway. >> If you have any views on this migration, please voice them >> (ideally on that thread). > > No, nothing to add to things already written. 
OK, good :) Peter From biopython at maubp.freeserve.co.uk Tue Sep 23 12:18:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Sep 2008 17:18:31 +0100 Subject: [BioPython] Removing Bio.ECell, Bio.Rebase, Bio.CDD and Bio.SGMLExtractor Message-ID: <320fb6e00809230918s2ceb3ee7hbd14f3d72a46c903@mail.gmail.com> Dear all, The following unmaintained modules were deprecated a couple of releases ago, and unless anyone objects, they will shortly be removed, and will not be present in the next release of Biopython: Bio.ECell - obsolete with ECell updates Bio.Rebase - could not parse current Rebase records Bio.CDD - could not parse current HTML files from CDD Bio.SGMLExtractor - old parsing code no longer needed within Biopython As usual, the files will still be preserved in CVS so if at a later date someone wants to resurrect this functionality, this is possible. Peter From adityashukla1983 at gmail.com Tue Sep 23 20:13:06 2008 From: adityashukla1983 at gmail.com (aditya shukla) Date: Tue, 23 Sep 2008 19:13:06 -0500 Subject: [BioPython] Creating and traversing an ultrametric tree Message-ID: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> Hello Folks, Can someone explain how can an ultrametric tree be created and traversed? I found think link on ultrametric trees but didn't get a lot from it. 
http://www.koders.com/python/fid414AE4B3075F9E434B76B0AEA3CE336C5A9F60F8.aspx?s=cdef%3Atree#L4 Thanks Aditya From biopython at maubp.freeserve.co.uk Wed Sep 24 05:00:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Sep 2008 10:00:52 +0100 Subject: [BioPython] Creating and traversing an ultrametric tree In-Reply-To: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> References: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> Message-ID: <320fb6e00809240200y144500cbl86f9023cb868da89@mail.gmail.com> On Wed, Sep 24, 2008 at 1:13 AM, aditya shukla wrote: > Hello Folks, > Hi Aditya, > > Can someone explain how can an ultrametric tree be created and traversed? > What kind of data are you starting with? People tend to use specialist software to generate trees using a range of algorithms, often starting from a set of aligned sequences. > > I found think link on ultrametric trees but didn't get a lot from it. > http://www.koders.com/python/fid414AE4B3075F9E434B76B0AEA3CE336C5A9F60F8.aspx?s=cdef%3Atree#L4 > Its a shame that code doesn't seem to include a header with any copyright or authorship statement. 
It is not using Biopython, and my initial guess was the "import newick" was using Thomas Mailund's Newick tree parser: http://www.daimi.au.dk/~mailund/newick.html (You can alternatively use the Bio.Nexus module in Biopython to parse a Newick tree) However, clicking on some of the other neighbouring files on that koders.com website I realised you are looking at a copy of the lagrange python package ("Likelihood analysis of geographic range evolution"), the homepage for which is here: http://code.google.com/p/lagrange/ Peter From biopython at maubp.freeserve.co.uk Wed Sep 24 16:26:37 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Sep 2008 21:26:37 +0100 Subject: [BioPython] Creating and traversing an ultrametric tree In-Reply-To: <73045cca0809241132x30bc4d63t7ac0b9967a20e76c@mail.gmail.com> References: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> <320fb6e00809240200y144500cbl86f9023cb868da89@mail.gmail.com> <73045cca0809241132x30bc4d63t7ac0b9967a20e76c@mail.gmail.com> Message-ID: <320fb6e00809241326i16a337das844f4ac74766b459@mail.gmail.com> On Wed, Sep 24, 2008 at 7:32 PM, aditya shukla wrote: > Hello Peter , > > Thanks for the reply , > I have attached a file with of the kind of data that i wanna parse. > I tried using Thomas Mailund's Newick tree parser but this dosen't > seem to work , so is there any other module that can help? Your file looks like this (in case anyone on the mailing list recognises it), /T_0_size=105((-bin-ulockmgr_server:0.99[&&NHX:C=0.195.0], (((-bin-hostname:0.00[&&NHX:C=200.0.0], (-bin-dnsdomainname:0.00[&&NHX:C=200.0.0], ...):0.99):0.99):0.99):0.99); [with a large chunk removed, and new lines inserted] I'm guessing this is some kind of computer system profile - nothing to do with bioinformatics. I'm not 100% sure this is Newick format - it might be worth trying to parse everything after the "/T_0_size=105" text which looks out of place to me. 
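A minimal sketch of that suggestion, assuming the file holds a single tree whose text begins at the first opening parenthesis. The sample string is a shortened, made-up stand-in for the real file, and strip_prefix is a hypothetical helper name, not part of Biopython or of any Newick parser:

```python
def strip_prefix(text):
    """Return text from the first '(' onwards (hypothetical helper)."""
    start = text.find("(")
    # If there is no '(' at all, hand the text back unchanged.
    return text if start == -1 else text[start:]


# Made-up, abbreviated sample in the style of the file described above.
raw = "/T_0_size=105((a:0.99,(b:0.00,c:0.00):0.99):0.99);"
newick_text = strip_prefix(raw)
print(newick_text)
```

Whether a given parser then accepts the remaining text is a separate question; this only removes the leading label.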
If it is a valid Newick format tree file, then it is using named
internal nodes which is something Biopython can't currently parse (see
Bug 2543, http://bugzilla.open-bio.org/show_bug.cgi?id=2543 ). So I
don't think you can use the Bio.Nexus module in Biopython to read this
tree.

The only other python package I can suggest you try is NetworkX,
https://networkx.lanl.gov/wiki

Good luck,

Peter

From peter at maubp.freeserve.co.uk Fri Sep 26 11:59:51 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 16:59:51 +0100
Subject: [BioPython] More string methods for the Seq object
Message-ID: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com>

Dear Biopythoneers,

On bugzilla I've put forward a couple of suggested string-like methods
which I would like to add to the Seq object.

Bug 2351 comment 15 - adding a split method
http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15
Here I have suggested the separator be non-optional (for strings this
defaults to white space)

Bug 2596 - adding strip, rstrip and lstrip
http://bugzilla.open-bio.org/show_bug.cgi?id=2596
Here I have suggested these default to stripping gap characters (for
strings these default to stripping white space)

Any comments? Positive or negative feedback is welcome.

Peter

From mmokrejs at ribosome.natur.cuni.cz Fri Sep 26 12:34:01 2008
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Fri, 26 Sep 2008 18:34:01 +0200
Subject: [BioPython] More string methods for the Seq object
In-Reply-To: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com>
References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com>
Message-ID: <48DD0EF9.80204@ribosome.natur.cuni.cz>

Hi Peter,

Peter wrote:
> Dear Biopythoneers,
>
> On bugzilla I've put forward a couple of suggested string-like methods
> which I would like to add to the Seq object.
>
> Bug 2351 comment 15 - adding a split method
> http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15
> Here I have suggested the separator be non-optional (for strings this
> defaults to white space)

Please excuse my ignorance, but what is this useful for?

> Bug 2596 - adding strip, rstrip and lstrip
> http://bugzilla.open-bio.org/show_bug.cgi?id=2596
> Here I have suggested these default to stripping gap characters (for
> strings these default to stripping white space)

Again, what is this useful for? Aren't there checks for quality
of the sequence when one tries to instantiate the object?

M.

From peter at maubp.freeserve.co.uk Fri Sep 26 12:42:19 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 17:42:19 +0100
Subject: [BioPython] More string methods for the Seq object
In-Reply-To: <48DD0EF9.80204@ribosome.natur.cuni.cz>
References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com>
    <48DD0EF9.80204@ribosome.natur.cuni.cz>
Message-ID: <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com>

>> Bug 2351 comment 15 - adding a split method
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15
>> Here I have suggested the separator be non-optional (for strings this
>> defaults to white space)
>
> Please excuse my ignorance, but what is this useful for?

Suppose you had translated a nucleotide sequence into, for example,
"SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" (as a Seq object with a protein
alphabet). You might want to split the sequence at terminators, to get
the open reading frames (and then filter them on length). Right now the
Seq object doesn't have a split method so you would have to switch to
using python strings (and then go back to a Biopython Seq object later
if need be).
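A minimal sketch of the split-then-filter idea described above, using plain Python strings only so that nothing depends on a Seq method that does not exist yet. The threshold min_length = 5 is an arbitrary value chosen purely for illustration:

```python
# Plain Python strings only; no Biopython required.
protein = "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ"

# Split at the terminator symbol, as a Seq.split("*") method would.
candidates = protein.split("*")

# Keep only the pieces that are at least min_length residues long.
# min_length = 5 is an illustrative assumption, not a recommended value.
min_length = 5
orfs = [piece for piece in candidates if len(piece) >= min_length]

print(candidates)  # ['SADKCNKADND', 'AKDNCDNADK', 'AK', 'NCAKNSHJ']
print(orfs)        # ['SADKCNKADND', 'AKDNCDNADK', 'NCAKNSHJ']
```

With a split method on the Seq object, the same filter could run on Seq objects directly and keep the alphabet attached.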
>> Bug 2596 - adding strip, rstrip and lstrip
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2596
>> Here I have suggested these default to stripping gap characters (for
>> strings these default to stripping white space)
>
> Again, what is this useful for? Aren't there checks for quality
> of the sequence when one tries to instantiate the object?

I'm not sure what you mean by quality of the sequence here (are you
talking about sequencing quality scores?)

Suppose you have some sequences which you have aligned in ClustalW, and
most have leading or trailing gap characters, e.g. given
"---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet)
you might want to strip off the leading and trailing gaps to have just
"SAD-KCNKADND" (as a Seq object with the same alphabet). Right now the
Seq object doesn't have a strip method, so you would have to switch to a
string and back again.

I could write these up as examples in python if it would help.

Peter

From peter at maubp.freeserve.co.uk Fri Sep 26 12:52:13 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 17:52:13 +0100
Subject: [BioPython] More string methods for the Seq object
In-Reply-To: <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com>
References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com>
    <48DD0EF9.80204@ribosome.natur.cuni.cz>
    <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com>
Message-ID: <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com>

> Suppose you had translated a nucleotide sequence into, for example,
> "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" (as a Seq object with a protein
> alphabet). You might want to split the sequence at terminators, to
> get the open reading frames (and then filter them on length). Right
> now the Seq object doesn't have a split method so you would have to
> switch to using python strings (and then go back to a Biopython Seq
> object later if need be).

Using pure python strings:

str_seq = "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ"
orf_str_list = str_seq.split("*")

Using Biopython Seq objects:

from Bio.Seq import Seq
from Bio.Alphabet import generic_protein
seq = Seq("SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ", generic_protein)
#I want to be able to do this:
orf_seq_list = seq.split("*")
#Right now I have to do something like this:
orf_seq_list = [Seq(x, generic_protein) for x in seq.tostring().split("*")]

Another example of using a Seq object .split() method would be for
restriction enzymes (although the Bio.Restriction package should be
more general).

> Suppose you have some sequences which you have aligned in ClustalW,
> and most have leading or trailing gap characters, e.g. given
> "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet)
> you might want to strip off the leading and trailing gaps to have just
> "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now
> the Seq object doesn't have a strip method, so you would have to
> switch to a string and back again.

Using pure python strings:

long_seq_str = "---SAD-KCNKADND---"
trimmed_seq_str = long_seq_str.strip("-")

Using Biopython Seq objects:

from Bio.Seq import Seq
from Bio.Alphabet import generic_protein
long_seq = Seq("---SAD-KCNKADND---", generic_protein)
#I want to be able to do this:
trimmed_seq = long_seq.strip("-")
#Right now, I have to do something like this:
trimmed_seq = Seq(long_seq.tostring().strip("-"), generic_protein)

Another possible example is if you have some EST sequences and you want
to strip the poly A tail on the trailing end (right side), e.g.
"ACACTGCAGCATCAGCAAAAAAA".rstrip("A")

Peter

From mmokrejs at ribosome.natur.cuni.cz Fri Sep 26 12:54:15 2008
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Fri, 26 Sep 2008 18:54:15 +0200
Subject: [BioPython] More string methods for the Seq object
In-Reply-To: <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com>
References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com>
    <48DD0EF9.80204@ribosome.natur.cuni.cz>
    <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com>
Message-ID: <48DD13B7.50007@ribosome.natur.cuni.cz>

Hi Peter,
thanks for the explanation, now I get your point.

By 'quality checks' I meant a test for compliance of the sequence
with, say, IUPAC definitions. OK, it is irrelevant now.

As long as you document these functions I am fine with that. ;-)

Regards,
M.

From peter at maubp.freeserve.co.uk Fri Sep 26 13:10:57 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 18:10:57 +0100
Subject: [BioPython] More string methods for the Seq object
In-Reply-To: <48DD13B7.50007@ribosome.natur.cuni.cz>
References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com>
    <48DD0EF9.80204@ribosome.natur.cuni.cz>
    <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com>
    <48DD13B7.50007@ribosome.natur.cuni.cz>
Message-ID: <320fb6e00809261010h2733e132u93182ea19dcdc456@mail.gmail.com>

On Fri, Sep 26, 2008 at 5:54 PM, Martin MOKREJŠ
wrote: > Hi Peter, > thanks for explanation, now I got your point. Good. > > With 'quality checks' I meant test for compliance of the sequence > with say IUPAC definitions. OK, it is irrelevant now. > I did wonder if this is what you meant. Checking Seq objects for compliance with the stated (IUPAC) alphabets is a separate issue - which I have been thinking about. I've just filed Bug 2597 so we don't forget about it. http://bugzilla.open-bio.org/show_bug.cgi?id=2597 > > As long you document these functions I am fine with that. ;-) > Sure. The suggested code for these new Seq object methods includes docstrings, but I would also intend to show how they might be used in the tutorial (probably examples like the ones shown in this email discussion). Peter From bsouthey at gmail.com Fri Sep 26 14:45:58 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 26 Sep 2008 13:45:58 -0500 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> Message-ID: <48DD2DE6.10908@gmail.com> Peter wrote: >> Support you had translated a nucleotide sequence into for example, >> "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" (as a Seq object with a protein >> alphabet). You might want to split the sequence at terminators, to >> get the open reading frames (and then filter them on length). Right >> now the Seq object doesn't have a split method so you would have to >> switch to using python strings (and then go back to a Biopython Seq >> object later if need be). 
>> > > Using pure python strings: > > str_seq = "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" > orf_str_list = str_seq.split("*") > > Using Biopython Seq objects: > > from Bio.Seq import Seq > from Bio.Alphabet import generic_protein > seq = Seq("SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ", generic_protein) > #I want to be able to do this: > orf_seq_list = seq.split("*") > #Right now I have to do something like this: > orf_seq_list = [Seq(x, generic_protein) for x in seq.tostring().split("*")] > > Another example of using a Seq object .split() method would be for > restriction enzymes (although the Bio.Restriction package should be > more general). > > >> Suppose you have some sequences which you have aligned in ClustalW, >> and most have leading or trailing gaps characters. e.g. Given >> "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet) >> you might want to strip off the leading and trailing gaps to have just >> "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now >> the Seq object doesn't have a strip method, so you would have to >> switch to a string and back again. 
>> > > Using pure python strings: > > long_seq_str = "---SAD-KCNKADND---" > trimmed_seq_str = long_seq_str.strip("-") > > Using Biopython Seq objects: > > from Bio.Seq import Seq > from Bio.Alphabet import generic_protein > long_seq = Seq("---SAD-KCNKADND---", generic_protein) > #I want to be able to do this: > trimmed_seq = long_seq.strip("-") > #Right now, I have to do something like this: > trimmed_seq = Seq(long_seq.tostring().strip("-"), generic_protein) > > Another possible example is if you have some EST sequences and you > want to strip the poly A tail on the trailing end (right side), e.g > "ACACTGCAGCATCAGCAAAAAAA".rstrip("A") > > Peter > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > Hi, While I do like the idea, strip(), as defined here, is inconsistent with the Python string version. Python documentation: strip([chars]): "Return a copy of the string with the leading and trailing characters removed." Rather you should use an alternative word like compress to remove the said character from within a sequence. Bruce From biopython at maubp.freeserve.co.uk Fri Sep 26 17:13:19 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 22:13:19 +0100 Subject: [BioPython] Creating and traversing an ultrametric tree In-Reply-To: <73045cca0809261340s1ee6692fo8834d2261d3a5ab@mail.gmail.com> References: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> <320fb6e00809240200y144500cbl86f9023cb868da89@mail.gmail.com> <73045cca0809241132x30bc4d63t7ac0b9967a20e76c@mail.gmail.com> <320fb6e00809241326i16a337das844f4ac74766b459@mail.gmail.com> <73045cca0809261340s1ee6692fo8834d2261d3a5ab@mail.gmail.com> Message-ID: <320fb6e00809261413o1e4da682s6968aded56e6bd14@mail.gmail.com> On Fri, Sep 26, 2008 at 9:40 PM, aditya shukla wrote: > > Hello, Thanks for the help i appreciate it. 
> > I got the Thomas mailund's newick module to work with the kind of files i > have (they are in extended newick format).I wanna calculate > distances between the inner nodes , the current module dosen't do that.Do > you have any idea of how to accomplish this? Using Thomas Mailund's module, I think you would have to identify the shared parent node, then add up the branch lengths on the two branches. You can do it - but its a little tricky. When I last did any work on trees like this, I think I tended to use recursive functions, but I don't have any examples handy. Peter From biopython at maubp.freeserve.co.uk Fri Sep 26 17:22:48 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 22:22:48 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <48DD2DE6.10908@gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> Message-ID: <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> >>> Suppose you have some sequences which you have aligned in ClustalW, >>> and most have leading or trailing gaps characters. e.g. Given >>> "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet) >>> you might want to strip off the leading and trailing gaps to have just >>> "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now >>> the Seq object doesn't have a strip method, so you would have to >>> switch to a string and back again. >> >> Using pure python strings: >> >> long_seq_str = "---SAD-KCNKADND---" >> trimmed_seq_str = long_seq_str.strip("-") This gives "SAD-KCNKADND", it does NOT remove the internal "-" character. 
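To make the point concrete, here is a small sketch with plain Python strings showing that strip, lstrip and rstrip only ever touch the ends. It illustrates the behaviour of the built-in str methods, not the proposed Seq implementation:

```python
aligned = "---SAD-KCNKADND---"

both_ends = aligned.strip("-")    # "SAD-KCNKADND"
left_only = aligned.lstrip("-")   # "SAD-KCNKADND---"
right_only = aligned.rstrip("-")  # "---SAD-KCNKADND"

# The internal gap between "SAD" and "KCNKADND" survives in every case.
for result in (both_ends, left_only, right_only):
    assert "SAD-KCNKADND" in result
    print(result)
```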
>> Using Biopython Seq objects:
>>
>> from Bio.Seq import Seq
>> from Bio.Alphabet import generic_protein
>> long_seq = Seq("---SAD-KCNKADND---", generic_protein)
>> #I want to be able to do this:
>> trimmed_seq = long_seq.strip("-")
>> #Right now, I have to do something like this:
>> trimmed_seq = Seq(long_seq.tostring().strip("-"), generic_protein)

This gives Seq("SAD-KCNKADND", ProteinAlphabet()), i.e. it would NOT
remove the internal "-" character.

> While I do like the idea, strip(), as defined here, is inconsistent with the
> Python string version. Python documentation: strip([chars]): "Return a
> copy of the string with the leading and trailing characters removed."

My proposed Seq strip method is intended to behave EXACTLY like the
python string method, apart from the default strip characters (I would
suggest defaulting to the gap character rather than white space). My
proposed implementation even calls the python string strip method
internally. Have another look at the suggested code:
http://bugzilla.open-bio.org/show_bug.cgi?id=2596

> Rather you should use an alternative word like compress to remove the said
> character from within a sequence.

I suspect you have misunderstood my intention. My Seq object .strip()
method would NOT remove the given characters from the interior of the
sequence - only from the ends.

However, there is certainly a case for wanting an .ungap() method for
the Seq class (or a more general method to remove all of a particular
character), but I hadn't intended to raise this issue yet.
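
For contrast, a rough sketch of what removing every gap would look like on a plain Python string. The function name ungap and its gap parameter are hypothetical, chosen only to mirror the method name floated above; this is not an existing Seq method:

```python
def ungap(sequence, gap="-"):
    """Remove every gap character, internal ones included (hypothetical helper)."""
    return sequence.replace(gap, "")


aligned = "---SAD-KCNKADND---"
print(aligned.strip("-"))  # ends only: SAD-KCNKADND
print(ungap(aligned))      # all gaps:  SADKCNKADND
```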
Peter From bsouthey at gmail.com Fri Sep 26 17:53:35 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 26 Sep 2008 16:53:35 -0500 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> Message-ID: <48DD59DF.1000504@gmail.com> Peter wrote: >>>> Suppose you have some sequences which you have aligned in ClustalW, >>>> and most have leading or trailing gaps characters. e.g. Given >>>> "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet) >>>> you might want to strip off the leading and trailing gaps to have just >>>> "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now >>>> the Seq object doesn't have a strip method, so you would have to >>>> switch to a string and back again. >>>> >>> Using pure python strings: >>> >>> long_seq_str = "---SAD-KCNKADND---" >>> trimmed_seq_str = long_seq_str.strip("-") >>> > > This gives "SAD-KCNKADND", it does NOT remove the internal "-" character. > > >>> Using Biopython Seq objects: >>> >>> from Bio.Seq import Seq >>> from Bio.Alphabet import generic_protein >>> long_seq = Seq("---SAD-KCNKADND---", generic_protein) >>> #I want to be able to do this: >>> trimmed_seq = long_seq.strip("-") >>> #Right now, I have to do something like this: >>> trimmed_seq = Seq(long_seq.tostring().strip("-"), generic_protein) >>> > > This gives Seq("SAD-KCNKADND", ProteinAlphabet()), i.e. it would NOT > remove the internal "-" character. > > >> While I do like the idea, strip(), as defined here, is inconsistent with the >> Python string version. 
Python documentation: strip([chars]): "Return a >> copy of the string with the leading and trailing characters removed." >> > > My intended Seq strip method is intended EXACTLY like the python > string apart from the default strip characters (except I would suggest > defaulting to the gap character rather than white space). My proposed > implementation even calls the python string strip method internally. > Have another look at the suggested code: > http://bugzilla.open-bio.org/show_bug.cgi?id=2596 > > >> Rather you should use an alternative word like compress to remove the said >> character from within a sequence. >> > > I suspect you have misunderstood my intension. My Seq object .strip() > method would NOT remove the given characters from the interior of the > sequence - only from the ends. > > However, there is certainly a case for wanting an .ungap() method for > the Seq class (or a more general method to remove all of a particular > character), but I hadn't intended to raise this issue yet. > > Peter > > Yes, sorry about that. I misunderstood because I confused myself with the first part that uses the split. Bruce From biopython at maubp.freeserve.co.uk Fri Sep 26 17:57:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 22:57:31 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <48DD59DF.1000504@gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> Message-ID: <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> >> I suspect you have misunderstood my intension. 
My Seq object .strip() >> method would NOT remove the given characters from the interior of the >> sequence - only from the ends. >> >> However, there is certainly a case for wanting an .ungap() method for >> the Seq class (or a more general method to remove all of a particular >> character), but I hadn't intended to raise this issue yet. >> >> Peter > > Yes, sorry about that. I misunderstood because I confused myself with the > first part that uses the split. > > Bruce Fair enough - maybe I shouldn't have tackled both methods in one email... but I'm glad we cleared that up. Anyway - do you think adding the split and strip methods to the Seq object is worthwhile? Peter From bsouthey at gmail.com Fri Sep 26 21:55:18 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 26 Sep 2008 20:55:18 -0500 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> Message-ID: On Fri, Sep 26, 2008 at 4:57 PM, Peter wrote: >>> I suspect you have misunderstood my intention. My Seq object .strip() >>> method would NOT remove the given characters from the interior of the >>> sequence - only from the ends. >>> >>> However, there is certainly a case for wanting an .ungap() method for >>> the Seq class (or a more general method to remove all of a particular >>> character), but I hadn't intended to raise this issue yet. >>> >>> Peter >> >> Yes, sorry about that. I misunderstood because I confused myself with the >> first part that uses the split.
>> >> Bruce > > Fair enough - maybe I shouldn't have tackled both methods in one > email... but I'm glad we cleared that up. > > Anyway - do you think adding the split and strip methods to the Seq object > is worthwhile? > > Peter > Yes - in fact probably essential now that many users are likely to have to and want to parse genome sequences. I really would like to see many of the sequence methods 'work' in the same manner as Python string methods. The string methods that I use a lot for sequences are: strip split join find (I don't use the 'l' and 'r' versions very much.) So you would address the first two. I do something like your ungap() idea with strings using join: >>> ''.join(sequence.split('-')) Python 2.5 introduced 'partition(sep): Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator'. While I don't use it (because I usually split multiple times) it has advantages if you are looking for the first occurrence of a pattern: >>> a='GTATGCGTAATG' >>> a.partition('ATG') ('GT', 'ATG', 'CGTAATG') Regards Bruce From biopython at maubp.freeserve.co.uk Sat Sep 27 08:57:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 27 Sep 2008 13:57:41 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> Message-ID: <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> > >> Anyway - do you think adding the split and strip methods to the Seq object >> is worthwhile?
> > Yes - in fact probably essential now that many users are likely to have to > and want to parse genome sequences. > > I really would like to see many of the sequence methods 'work' in the > same manner as Python string methods. The string methods that I use a lot > for sequences are: > strip > split > join > find > > (I don't use the 'l' and 'r' versions very much.) > So you would address the first two. I was planning to deal with strip and split first, and then move on to discuss the remaining string methods. No one has objected to adding strip and split (plus lstrip and rstrip) so if we take that as a consensus, the only point we should still debate is their default arguments. Other alternatives to what I have already put forward include following the python string and defaulting to white space (which would never normally be present in a sequence), or making the arguments non-optional. Adding the join method is more complicated as regards the alphabet of the sequence and the list of sequences given (which could be strings or Seq objects) - but in principle I think we should support it. I'd prefer to leave this one till last! Adding support for find should be straightforward. > I do something like your ungap() idea with strings using join: >>>> ''.join(sequence.split('-')) That would work, but getting a sensible alphabet forces a much longer version with Seq objects - something like this: Seq("", generic_protein).join(my_seq.split("-")) Having my_seq.ungap() or my_seq.ungap("-") would in my opinion be much clearer for the reader, plus the ungap method would also be able to amend the alphabet appropriately. > Python 2.5 introduced 'partition(sep): Split the string at the first > occurrence of sep, and return a 3-tuple containing the part before the > separator, the separator itself, and the part after the separator'.
> While I don't use it (because I usually split multiple times) it has > advantages if you are looking for the first occurrence of a patten: >>>> a='GTATGCGTAATG' >>>> a.partition('ATG') > ('GT', 'ATG', 'CGTAATG') Thanks for pointing that out. I hadn't noticed the addition of the partition method to python - until recently my main machine ran python 2.4 (and even now I still use python 2.3 on some occasions). However, we could still add a partition method to the Seq object, but wouldn't be able to take advantage of the string implementation on the older versions of python. Peter From bsouthey at gmail.com Sat Sep 27 21:06:54 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Sat, 27 Sep 2008 20:06:54 -0500 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> Message-ID: On Sat, Sep 27, 2008 at 7:57 AM, Peter wrote: >> >>> Anyway - do think adding the split and strip methods to the Seq object >>> is worthwhile? >> >> Yes - in fact probably essential now many users are likely to have to >> and want to parse genome sequences. >> >> I really would like to see many of the sequence methods 'work' in the >> same manner Python string methods. The string methods that I use a lot >> for sequences are: >> strip >> split >> join >> find >> >> (I don't the 'l' and 'r' versions very much.) >> So you would address the first two. 
> > I was planning deal with strip and split first, and then move on to > discuss the remaining string methods. > > No one has objected to adding strip and split (plus lstrip and rstrip) > so if we take that as a consensus, the only point we should still > debate is their default arguments. Other alternatives to what I have > already put forward include following the python string and defaulting > to white space (which would never normally be present in a sequence), > or making the arguments non optional. I do agree especially in terms of attempting to keep the standard Python defaults and behavior. > > Adding the join method is more complicated as regards the alphabet of > the sequence and the list of sequences given (which could be strings > or Seq objects) - but in principle I think we should support it. I'd > prefer to leave this one till last! Well I have the view that if this is easier to do now then it should be done now. > > Adding support for find should be straight forward. This would be great to have. > >> I do something like your ungap() idea with strings using join: >>>>> ''.join(sequence.split('-')) > > That would work but to get a sensible alphabet forces a much longer > version with Seq objects - something like this: > Seq("", generic_protein).join(my_seq.split("-")) > > Having my_seq.ungap() or my_seq.ungap("-") would in my opinion be much > clearer for the reader, plus the ungap method would also be able to > amend the alphabet appropriately. I do agree and the terminology is appropriate. > >> Python 2.5 introduced 'partition(sep): Split the string at the first >> occurrence of sep, and return a 3-tuple containing the part before the >> separator, the separator itself, and the part after the separator'. 
>> While I don't use it (because I usually split multiple times) it has >> advantages if you are looking for the first occurrence of a patten: >>>>> a='GTATGCGTAATG' >>>>> a.partition('ATG') >> ('GT', 'ATG', 'CGTAATG') > > Thanks for pointing that out. I hadn't noticed the addition of the > partition method to python - until recently my main machine ran python > 2.4 (and even now I still use python 2.3 on some occasions). However, > we could still add a partition method to the Seq object, but wouldn't > be able to take advantage of the string implementation on the older > versions of python. > The real question is would this functionality be sufficiently useful to justify it? I can see that it is useful for very special cases like open reading frames but I do not think that this is sufficient. Regards Bruce From biopython at maubp.freeserve.co.uk Mon Sep 29 07:50:21 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Sep 2008 12:50:21 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> Message-ID: <320fb6e00809290450m6fedbaacu15a75107e5c39658@mail.gmail.com> >> No one has objected to adding strip and split (plus lstrip and rstrip) >> so if we take that as a consensus, the only point we should still >> debate is their default arguments. Other alternatives to what I have >> already put forward include following the python string and defaulting >> to white space (which would never normally be present in a sequence), >> or making the arguments non optional. 
Listing out those explicitly, for the proposed Seq object methods: (a) strip, lstrip and rstrip default to the gap character. split argument is mandatory. (b) strip, lstrip, rstrip and split all default to white space (following string default) (c) strip, lstrip, rstrip and split all have a mandatory argument (explicit better than implicit) > I do agree especially in terms of attempting to keep the standard > Python defaults and behavior. I assume you are agreeing with what I have now called (b): follow the string defaults of white space for stripping or splitting (for consistency, even though this won't typically be useful for sequences). On balance this would probably be best from a principle of consistency and least surprise for the user - I'll update the patches. Peter From biopython at maubp.freeserve.co.uk Mon Sep 29 08:06:17 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Sep 2008 13:06:17 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809290450m6fedbaacu15a75107e5c39658@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> <320fb6e00809290450m6fedbaacu15a75107e5c39658@mail.gmail.com> Message-ID: <320fb6e00809290506p8aa2b51p4901b693ebb268bf@mail.gmail.com> > I assume you [Bruce] are agreeing with ... follow[ing] the > string defaults of white space for stripping or splitting (for > consistency, even though this won't typically be useful for > sequences). On balance this would probably be best from > a principle of consistency and least surprise for the user - > I'll update the patches.
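Option (b) can be sketched roughly as follows. This is a toy stand-in class, not the patch attached to Bug 2596; the name `MiniSeq` and its `data` and `alphabet` attributes are assumptions made for illustration. Each method simply delegates to the Python string method of the same name, so omitting the argument strips or splits on white space exactly as a str would.

```python
class MiniSeq(object):
    """Toy stand-in for a Seq object (hypothetical, for illustration only)."""

    def __init__(self, data, alphabet=None):
        self.data = data          # the sequence as a plain string
        self.alphabet = alphabet  # carried along unchanged by every method

    def __repr__(self):
        return "MiniSeq(%r)" % self.data

    def strip(self, chars=None):
        # chars=None means "strip white space", the str default.
        return MiniSeq(self.data.strip(chars), self.alphabet)

    def lstrip(self, chars=None):
        return MiniSeq(self.data.lstrip(chars), self.alphabet)

    def rstrip(self, chars=None):
        return MiniSeq(self.data.rstrip(chars), self.alphabet)

    def split(self, sep=None, maxsplit=-1):
        # sep=None means "split on runs of white space", the str default.
        parts = self.data.split(sep, maxsplit)
        return [MiniSeq(part, self.alphabet) for part in parts]


gapped = MiniSeq("---SAD-KCNKADND---", "protein")
print(gapped.strip("-"))             # MiniSeq('SAD-KCNKADND')
print(gapped.strip())                # MiniSeq('---SAD-KCNKADND---')
print(MiniSeq("ACGT TTGA").split())  # [MiniSeq('ACGT'), MiniSeq('TTGA')]
```

With these defaults a bare `strip()` is a no-op on a typical gapped sequence, which is the price of matching string behaviour; the gap character has to be passed explicitly.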
New patch for Seq object split, strip, lstrip and rstrip methods on Bug 2596 which follows the python string defaults (splitting on or stripping of white space). http://bugzilla.open-bio.org/show_bug.cgi?id=2596 If everyone is happy with this, I can extend the unit tests and documentation too. Peter From lpritc at scri.ac.uk Mon Sep 29 08:15:11 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Mon, 29 Sep 2008 13:15:11 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809290450m6fedbaacu15a75107e5c39658@mail.gmail.com> Message-ID: +1 for following the Python default. A less straightforward implementation might use a knowledge of the Alphabet for the Seq to strip out non-Alphabet characters, such as gaps, but I think that this, along with (a) and (c), would be overkill. I like the principle of least surprise in this case. On 29/09/2008 12:50, "Peter" wrote: >>> No one has objected to adding strip and split (plus lstrip and rstrip) >>> so if we take that as a consensus, the only point we should still >>> debate is their default arguments. Other alternatives to what I have >>> already put forward include following the python string and defaulting >>> to white space (which would never normally be present in a sequence), >>> or making the arguments non optional. > > Listing out those explicitly, for the poposed Seq object methods: > (a) strip, lstrip and rstrip default to the gap character. split > argument is mandatory. > (b) strip, lstrip, rstrip and split all default to white space > (following string default) > (c) strip, lstrip, rstrip and split all have a mandatory argument > (explicit better than implicit) > >> I do agree especially in terms of attempting to keep the standard >> Python defaults and behavior. 
> > I assume you are agreeing with what I have now called (b), follow the > string defaults of white space for stipping or splitting (for > consistency, even though this won't typically be useful for > sequences). On balance this would probably be best from a principle > of consistency and least surprise for the user - I'll update the > patches. > > Peter -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
______________________________________________________________________ From meesters at uni-mainz.de Mon Sep 1 06:57:24 2008 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 1 Sep 2008 08:57:24 +0200 Subject: [BioPython] NumPy In-Reply-To: <128888.36737.qm@web62405.mail.re1.yahoo.com> References: <128888.36737.qm@web62405.mail.re1.yahoo.com> Message-ID: <1220252244.28769.6.camel@meesters.biologie.uni-mainz.de> Hi, Apart from compiling issues - which can be fixed in collaboration with the numpy developers - another reason for updating to numpy is Python 3. In numpy there are already efforts to make the transition: http://www.scipy.org/Python3k While this task is not yet completed, it (almost) certainly never will be in the case of Numeric. Christian From chapmanb at 50mail.com Mon Sep 1 18:48:24 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 1 Sep 2008 14:48:24 -0400 Subject: [BioPython] NumPy changes ready to check in Message-ID: <20080901184823.GA7915@localdomain> Hi all; Hope everyone is doing well. Nice to see many familiar faces on the lists. The NumPy discussion over the past couple of days is very timely, as I was hoping to get permission to check in changes to convert over to NumPy with very reasonable back compatibility with Numeric. Recently, I wanted to utilize the PDB parser, and realized it still used the old Numeric module. Over the last few evenings, I made the adjustments and made sure things are not broken based on regression and simple script testing. The changes are fairly extensive in terms of modules touched, but limited in terms of the actual change. All python level changes use the "oldnumeric" compatibility layer if the Numeric import fails: try: from Numeric import array, Float0 except ImportError: from numpy.oldnumeric import array, Float0 So if people are still using Numeric, these changes should not impact them, but will allow use of NumPy if desired. 
Over time we would probably want to get rid of the Numeric imports as everyone moves to NumPy. The C/C++ changes are limited to KDTree and Cluster and are not back compatible, but do not require code level changes over than the imports as they also use the C compatibility layer in NumPy. So the major transition point would be people using KDTree and Cluster to install NumPy. I still have my CVS account and can check in these changes if y'all approve. Thanks all, Brad -- Brad Chapman Codon Devices http://www.codondevices.com From p.j.a.cock at googlemail.com Mon Sep 1 19:08:06 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 1 Sep 2008 20:08:06 +0100 Subject: [BioPython] NumPy changes ready to check in In-Reply-To: <20080901184823.GA7915@localdomain> References: <20080901184823.GA7915@localdomain> Message-ID: <320fb6e00809011208w4fb7a34h577a1a7dcb6a938d@mail.gmail.com> On Mon, Sep 1, 2008 at 7:48 PM, Brad Chapman wrote: > Hi all; > Hope everyone is doing well. Nice to see many familiar faces on the > lists. Wow - welcome back! > The NumPy discussion over the past couple of days is very timely, > as I was hoping to get permission to check in changes to convert over to > NumPy with very reasonable back compatibility with Numeric. > > Recently, I wanted to utilize the PDB parser, and realized it still > used the old Numeric module. Over the last few evenings, I made the > adjustments and made sure things are not broken based on regression > and simple script testing. > > The changes are fairly extensive in terms of modules touched, but > limited in terms of the actual change. All python level changes use > the "oldnumeric" compatibility layer if the Numeric import fails: > > try: > from Numeric import array, Float0 > except ImportError: > from numpy.oldnumeric import array, Float0 > > So if people are still using Numeric, these changes should not impact > them, but will allow use of NumPy if desired. 
Over time we would > probably want to get rid of the Numeric imports as everyone moves to > NumPy. This on-the-fly system is great for the pure python code... > The C/C++ changes are limited to KDTree and Cluster and are not back > compatible, but do not require code level changes other than the > imports as they also use the C compatibility layer in NumPy. So the > major transition point would be people using KDTree and Cluster to > install NumPy. It's certainly the C/C++ bits that make life complicated. Have you looked at the patch on Bug 2251 at all? http://bugzilla.open-bio.org/show_bug.cgi?id=2251 > I still have my CVS account and can check in these changes if y'all > approve. Thanks all, > > Brad Michiel and I had chatted about doing one more release with Numeric only - there are a few things I'd like to try and get in first like ambiguous translation (Bug 2530) but perhaps we should just do Biopython 1.48 pretty much as is. http://bugzilla.open-bio.org/show_bug.cgi?id=2530 I'm happy to try and put together the release this week - but will need a hand for the Windows builds. If that pans out, then we could start the Numeric to numpy transition in CVS. Peter From chris.lasher at gmail.com Tue Sep 2 00:00:20 2008 From: chris.lasher at gmail.com (Chris Lasher) Date: Mon, 1 Sep 2008 20:00:20 -0400 Subject: [BioPython] Determining if GenBank record is circular Message-ID: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com> Is there a way to tell, from the information in a SeqRecord object created by parsing the record with SeqIO, whether the sequence in a GenBank record is circular? I couldn't find it in the annotations, description, or format attributes of SeqRecord.
An example file that I'm looking for this in is ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Mesorhizobium_loti/NC_002678.gbk Thanks, Chris From chris.lasher at gmail.com Tue Sep 2 01:25:09 2008 From: chris.lasher at gmail.com (Chris Lasher) Date: Mon, 1 Sep 2008 21:25:09 -0400 Subject: [BioPython] Determining if GenBank record is circular In-Reply-To: References: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com> Message-ID: <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com> On Mon, Sep 1, 2008 at 8:19 PM, Iddo Friedberg wrote: > > Should be in LOCUS: > > LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008 Ah, sure. Let me re-state my question more precisely: Where is this represented in the SeqRecord object created by SeqIO.parse(), or is it represented at all? Best, Chris From biopython at maubp.freeserve.co.uk Tue Sep 2 09:00:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 2 Sep 2008 10:00:41 +0100 Subject: [BioPython] Determining if GenBank record is circular In-Reply-To: <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com> References: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com> <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com> Message-ID: <320fb6e00809020200s5ac59ed5xe4a65ae5117d0597@mail.gmail.com> On Tue, Sep 2, 2008 at 2:25 AM, Chris Lasher wrote: > On Mon, Sep 1, 2008 at 8:19 PM, Iddo Friedberg wrote: >> >> Should be in LOCUS: >> >> LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008 > > Ah, sure. Let me re-state my question more precisely: Where is this > represented in the SeqRecord object created by SeqIO.parse(), or is it > represented at all? Currently if the sequence is circular I don't think it is represented at all when parsed in a SeqRecord. Bio.SeqIO uses the Bio.GenBank.FeatureParser, which gets passed this information from the Scanner via the residue_type event. 
This is a combined lump of data containing both the sequence type (DNA, RNA etc) and if it is linear or circular. It is currently only used to determine the Seq alphabet, and has never been recorded. So in addition to not recording if the LOCUS line said the sequence was circular, if the LOCUS line contained cDNA, mRNA, ... this fine detail is also currently lost in the SeqRecord representation. On the other hand, the Bio.GenBank.RecordParser stores all this as the record's residue_type property (a single combined field, presumably reflecting the layout of early GenBank files). It would be a logical improvement to record the sequence data (molecule type and if circular) in the SeqRecord's annotations dictionary - perhaps as two fields but we'd need to check if that would be straight forward for EMBL files too. Alternatively, if Biopython included a native CircularSeq object, we could use that explicitly when the sequence is declared as circular. This might be considered a little surprising though. Do you want to file a bug on this Chris? Peter From chapmanb at 50mail.com Tue Sep 2 12:04:38 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 2 Sep 2008 08:04:38 -0400 Subject: [BioPython] NumPy changes ready to check in In-Reply-To: <320fb6e00809011208w4fb7a34h577a1a7dcb6a938d@mail.gmail.com> References: <20080901184823.GA7915@localdomain> <320fb6e00809011208w4fb7a34h577a1a7dcb6a938d@mail.gmail.com> Message-ID: <20080902120438.GB7990@localdomain> Hi Peter; > Wow - welcome back! Thanks. Glad to be able to help; great to see you around. > Its certainly the C/C++ bits that make life complicated. > > Have you looked at the patch on Bug 2251 at all? > http://bugzilla.open-bio.org/show_bug.cgi?id=2251 I did; sorry, forgot to mention that in my last e-mail. I wasn't exactly sure why this wasn't checked in then as the thread dies a bit. It was a great help for getting started. I think the idea there is good, but might be a little more complicated than needed. 
My thoughts were that we need to manage a transition to NumPy rather than support both Numeric and NumPy concurrently. Numeric hasn't been updated since 2005 and the compatibility modules for transitioning seem solid, so most people should have moved by now or need some encouragement to do so. We are probably being more annoying to up to date users by keeping Numeric support. The way it is implemented in my recent changes are that Numeric users will not get broken builds, but will rather not get the C extensions compiled unless they have NumPy. So the users who absolutely need to install NumPy are those still using Numeric and using the C/C++ extensions. > Michiel and I had chatted about doing one more release with Numeric > only - there are a few things I'd like to try and get in first like > ambiguous translation (Bug 2530) but perhaps we should just do > Biopython 1.48 pretty much as is. > http://bugzilla.open-bio.org/show_bug.cgi?id=2530 > > I'm happy to try and put together the release this week - but will > need a hand for the windows builds. > > If that pans out, then we could start the Numeric to numpy transition in CVS. That sounds fine to me. Once the new release is out I can check things in and we can find out if there are any issues. Brad -- Brad Chapman Codon Devices http://www.codondevices.com From biopython at maubp.freeserve.co.uk Tue Sep 2 12:46:53 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 2 Sep 2008 13:46:53 +0100 Subject: [BioPython] Bio.MetaTool In-Reply-To: <46010.36121.qm@web62405.mail.re1.yahoo.com> References: <46010.36121.qm@web62405.mail.re1.yahoo.com> Message-ID: <320fb6e00809020546y385bb9dcw749fe7f4713c8955@mail.gmail.com> On Sat, Aug 30, 2008 at 3:45 AM, Michiel de Hoon wrote: > Hi everybody, > > Is anybody using the Bio.MetaTool module? If not, can we deprecate it? 
> > The Bio.MetaTool tests suggest that this module was written for MetaTool > version 3.5 (28.03.2001), while the most current MetaTool version is at 5.0. > Since MetaTool is written for Matlab/Octave, and it seems to be out of data, > I expect that few people are using it with Python. > > Currently, Bio.MetaTool is the only non-deprecated module in Biopython > that uses Martel. If we can deprecate Bio.MetaTool, then (over time) we > can deprecate Martel, which means that Biopython won't need the > mxTextTools any more, making Biopython's installation a lot easier. I have now deprecated Bio.MetaTool in CVS. If anyone is still using this, please speak up. I have also marked Bio.Mindy and Martel as obsolete (documentation changes only - no warning messages), and likely to be deprecated or removed in a future release. Peter From bsouthey at gmail.com Tue Sep 2 13:55:30 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 02 Sep 2008 08:55:30 -0500 Subject: [BioPython] NumPy changes ready to check in In-Reply-To: <20080902120438.GB7990@localdomain> References: <20080901184823.GA7915@localdomain> <320fb6e00809011208w4fb7a34h577a1a7dcb6a938d@mail.gmail.com> <20080902120438.GB7990@localdomain> Message-ID: <48BD45D2.4090408@gmail.com> Brad Chapman wrote: > Hi Peter; > > >> Wow - welcome back! >> > > Thanks. Glad to be able to help; great to see you around. > > >> Its certainly the C/C++ bits that make life complicated. >> >> Have you looked at the patch on Bug 2251 at all? >> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 >> > > I did; sorry, forgot to mention that in my last e-mail. I wasn't > exactly sure why this wasn't checked in then as the thread dies a > bit. It was a great help for getting started. > > I think the idea there is good, but might be a little more > complicated than needed. My thoughts were that we need to manage a > transition to NumPy rather than support both Numeric and NumPy > concurrently. 
Numeric hasn't been updated since 2005 and the > compatibility modules for transitioning seem solid, so most people > should have moved by now or need some encouragement to do so. We are > probably being more annoying to up to date users by keeping Numeric > support. > > The way it is implemented in my recent changes are that Numeric users will > not get broken builds, but will rather not get the C extensions > compiled unless they have NumPy. So the users who absolutely need to > install NumPy are those still using Numeric and using the C/C++ > extensions. > > >> Michiel and I had chatted about doing one more release with Numeric >> only - there are a few things I'd like to try and get in first like >> ambiguous translation (Bug 2530) but perhaps we should just do >> Biopython 1.48 pretty much as is. >> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 >> >> I'm happy to try and put together the release this week - but will >> need a hand for the windows builds. >> >> If that pans out, then we could start the Numeric to numpy transition in CVS. >> > > That sounds fine to me. Once the new release is out I can check > things in and we can find out if there are any issues. > > Brad > Hi, This is excellent work! I agree that at least one more version of BioPython should be released. This version would allow backporting of any fixes if necessary. The transition should require NumPy 1.2 or at least NumPy 1.1 due to NumPy API changes. On that note (referring to the related thread), NumPy's API has to change to support Python3K (due to Python API changes) which limit the porting of BioPython to Python3K (ignoring the significant porting effort for BioPython). Also that probably only NumPy will be ported to Python3K. 
Also I would like to point out this thread on the future directions of NumPy that should make NumPy easier to support: http://projects.scipy.org/pipermail/numpy-discussion/2008-August/036909.html Bruce From matzke at berkeley.edu Tue Sep 2 23:19:01 2008 From: matzke at berkeley.edu (Nick Matzke) Date: Tue, 02 Sep 2008 16:19:01 -0700 Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb Message-ID: <48BDC9E5.3020905@berkeley.edu> Hi all, I'm following the BioSQL tutorial at the biopython website (http://www.biopython.org/wiki/BioSQL#NCBI_Taxonomy ). I can get bioseqdb to work, and the biosql python scripts etc. However I can't get these directions to work in loading the taxonomy database into bioseqdb. I get: "Can't locate object method "connect" via package "DBI" " I double-checked to make sure I've got DBI in perl (see error message below) but that doesn't seem to help. Any help appreciated! Nick commands/returns: =========================== =========================== =========================== mws2:/bioinformatics/eg nick$ perl -v This is perl, v5.10.0 built for darwin-thread-multi-2level (with 3 registered patches, see perl -V for more detail) Copyright 1987-2007, Larry Wall Binary build 1002 [283697] provided by ActiveState http://www.ActiveState.com Built Jan 10 2008 11:08:33 Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit. Complete documentation for Perl, including FAQ lists, should be found on this system using "man perl" or "perldoc perl". If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page. 
mws2:/bioinformatics/eg nick$ sudo cpan DBI Password: CPAN: Storable loaded ok (v2.18) Going to read /usr/local/Metadata Database was generated on Wed, 23 Jul 2008 10:02:50 GMT CPAN: LWP::UserAgent loaded ok (v5.810) CPAN: Time::HiRes loaded ok (v1.9711) Fetching with LWP: http://ppm.activestate.com/CPAN/authors/01mailrc.txt.gz CPAN: YAML loaded ok (v0.66) Going to read /Users/nick/.cpan/sources/authors/01mailrc.txt.gz ............................................................................DONE Fetching with LWP: http://ppm.activestate.com/CPAN/modules/02packages.details.txt.gz Going to read /Users/nick/.cpan/sources/modules/02packages.details.txt.gz Database was generated on Mon, 01 Sep 2008 10:02:51 GMT ............................................................................DONE Fetching with LWP: http://ppm.activestate.com/CPAN/modules/03modlist.data.gz Going to read /Users/nick/.cpan/sources/modules/03modlist.data.gz ............................................................................DONE Going to write /usr/local/Metadata DBI is up to date (1.607). mws2:/bioinformatics/eg nick$ /bioinformatics/pythonstuff/biosql-1.0.0/scripts/load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true Can't locate object method "connect" via package "DBI" at /bioinformatics/pythonstuff/biosql-1.0.0/scripts/load_ncbi_taxonomy.pl line 323. mws2:/bioinformatics/eg nick$ =========================== =========================== =========================== -- ==================================================== Nicholas J. Matzke Ph.D. student, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. 
personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: matzke at berkeley.edu Office hours for Bio1B, Spring 2008: Biology: Plants, Evolution, Ecology VLSB 2013, Monday 1-1:30 (some TA there for all hours during work week) Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 ==================================================== From matzke at berkeley.edu Tue Sep 2 23:36:45 2008 From: matzke at berkeley.edu (Nick Matzke) Date: Tue, 02 Sep 2008 16:36:45 -0700 Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb In-Reply-To: <48BDC9E5.3020905@berkeley.edu> References: <48BDC9E5.3020905@berkeley.edu> Message-ID: <48BDCE0D.5000905@berkeley.edu> Based on some web advice on this general error I also did this: sudo cpan DBD::mysql ...which did install something that wasn't installed. But the same error resulted when the load_ncbi_taxonomy.pl script was run. Nick
From matzke at berkeley.edu Tue Sep 2 23:43:31 2008 From: matzke at berkeley.edu (Nick Matzke) Date: Tue, 02 Sep 2008 16:43:31 -0700 Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb In-Reply-To: <48BDCE0D.5000905@berkeley.edu> References: <48BDC9E5.3020905@berkeley.edu> <48BDCE0D.5000905@berkeley.edu> Message-ID: <48BDCFA3.4050605@berkeley.edu> Perl *does* successfully recognize DBI and DBD::mysql, so e.g. not finding them in @INC or something doesn't seem to be the problem...
(these commands return '1' if the module is found) ============ perl -MDBD::mysql -e 'print 1' 1 perl -MDBI -e 'print 1' 1 ============
From biopython at maubp.freeserve.co.uk Wed Sep 3 11:19:29 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Sep 2008 12:19:29 +0100 Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb In-Reply-To: <48BDC9E5.3020905@berkeley.edu> References: <48BDC9E5.3020905@berkeley.edu> Message-ID: <320fb6e00809030419x3a90cbb7vfa2195d48637b058@mail.gmail.com> On Wed, Sep 3, 2008 at 12:19 AM, Nick Matzke wrote: > Hi all, > > I'm following the BioSQL tutorial at the biopython website > (http://www.biopython.org/wiki/BioSQL#NCBI_Taxonomy ). I can get bioseqdb > to work, and the biosql python scripts etc. > > However I can't get these directions to work in loading the taxonomy > database into bioseqdb. I get: "Can't locate object method "connect" via > package "DBI" " > > I double-checked to make sure I've got DBI in perl (see error message below) > but that doesn't seem to help. This does sound like a question for the BioSQL mailing list (which I see you've now asked on).
I'm no perl expert - so if you can resolve this via the BioSQL mailing list, and we can improve the Biopython BioSQL wiki page, that would be great. Peter From chris.lasher at gmail.com Wed Sep 3 16:34:24 2008 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 3 Sep 2008 12:34:24 -0400 Subject: [BioPython] Determining if GenBank record is circular In-Reply-To: <320fb6e00809020200s5ac59ed5xe4a65ae5117d0597@mail.gmail.com> References: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com> <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com> <320fb6e00809020200s5ac59ed5xe4a65ae5117d0597@mail.gmail.com> Message-ID: <128a885f0809030934s2b466966m813eaf55932afb19@mail.gmail.com> On Tue, Sep 2, 2008 at 5:00 AM, Peter wrote: > > On Tue, Sep 2, 2008 at 2:25 AM, Chris Lasher wrote: > > On Mon, Sep 1, 2008 at 8:19 PM, Iddo Friedberg wrote: > >> > >> Should be in LOCUS: > >> > >> LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008 > > > > Ah, sure. Let me re-state my question more precisely: Where is this > > represented in the SeqRecord object created by SeqIO.parse(), or is it > > represented at all? > > Currently if the sequence is circular I don't think it is represented > at all when parsed in a SeqRecord. > > Bio.SeqIO uses the Bio.GenBank.FeatureParser, which gets passed this > information from the Scanner via the residue_type event. This is a > combined lump of data containing both the sequence type (DNA, RNA etc) > and if it is linear or circular. It is currently only used to > determine the Seq alphabet, and has never been recorded. So in > addition to not recording if the LOCUS line said the sequence was > circular, if the LOCUS line contained cDNA, mRNA, ... this fine detail > is also currently lost in the SeqRecord representation. On the other > hand, the Bio.GenBank.RecordParser stores all this as the record's > residue_type property (a single combined field, presumably reflecting > the layout of early GenBank files). 
> > It would be a logical improvement to record the sequence data > (molecule type and if circular) in the SeqRecord's annotations > dictionary - perhaps as two fields but we'd need to check if that > would be straight forward for EMBL files too. Alternatively, if > Biopython included a native CircularSeq object, we could use that > explicitly when the sequence is declared as circular. This might be > considered a little surprising though. > > Do you want to file a bug on this Chris? Would you mind filing it, Peter? I've got a poster to complete very soon. I think you did a fine job describing the features we'd like to add. Thanks, Chris From biopython at maubp.freeserve.co.uk Wed Sep 3 16:49:29 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Sep 2008 17:49:29 +0100 Subject: [BioPython] Determining if GenBank record is circular In-Reply-To: <128a885f0809030934s2b466966m813eaf55932afb19@mail.gmail.com> References: <128a885f0809011700h69333d68xf0b0f2a1c4188cd1@mail.gmail.com> <128a885f0809011825m4550ca68q50557f1b7a75ee36@mail.gmail.com> <320fb6e00809020200s5ac59ed5xe4a65ae5117d0597@mail.gmail.com> <128a885f0809030934s2b466966m813eaf55932afb19@mail.gmail.com> Message-ID: <320fb6e00809030949m78ea710cq9d18a5677ebb8edf@mail.gmail.com> >> Do you want to file a bug on this Chris? > > Would you mind filing it, Peter? I've got a poster to complete very > soon. I think you did a fine job describing the features we'd like to > add. Done, Bug 2578 - The GenBank SeqRecord parser does not record module type or if circular http://bugzilla.open-bio.org/show_bug.cgi?id=2578 You might want to add yourself as a CC. I'm not planning to tackle this right away, and I'll be on holiday most of next week. 
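To make the proposal in Bug 2578 concrete, here is a minimal standalone sketch of pulling the molecule type and topology out of a LOCUS line. It is an illustration only, not Biopython code: the function name parse_locus_topology is hypothetical, and the sketch assumes the whitespace-separated LOCUS layout shown earlier in this thread (name, length, "bp" or "aa", optional molecule type, optional "circular"/"linear", division, date).

```python
def parse_locus_topology(locus_line):
    """Return (molecule_type, topology) from a GenBank LOCUS line.

    Hypothetical helper for illustration; either value is None when
    the corresponding token is absent from the line.
    """
    tokens = locus_line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % locus_line)

    topologies = ("circular", "linear")

    # Topology is stated explicitly, if at all.
    topology = None
    for word in topologies:
        if word in tokens:
            topology = word

    # Assumption: the molecule type (DNA, mRNA, ...) directly follows
    # the length unit, unless the next token is already the topology.
    molecule_type = None
    for unit in ("bp", "aa"):
        if unit in tokens:
            idx = tokens.index(unit) + 1
            if idx < len(tokens) and tokens[idx] not in topologies:
                molecule_type = tokens[idx]
    return molecule_type, topology


line = "LOCUS       NC_002678  7036071 bp    DNA     circular BCT 22-JUL-2008"
molecule_type, topology = parse_locus_topology(line)
# These two values are what could be stored as separate entries in a
# SeqRecord's annotations dictionary, as suggested above.
print(molecule_type, topology)
```

Keeping the two values separate, rather than one combined residue_type string, follows the "perhaps as two fields" suggestion; whether EMBL files map onto the same pair would still need checking.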
Peter From matzke at berkeley.edu Wed Sep 3 22:45:53 2008 From: matzke at berkeley.edu (Nick Matzke) Date: Wed, 03 Sep 2008 15:45:53 -0700 Subject: [BioPython] problem loading NCBI_taxonomy database into bioseqdb In-Reply-To: <320fb6e00809030419x3a90cbb7vfa2195d48637b058@mail.gmail.com> References: <48BDC9E5.3020905@berkeley.edu> <320fb6e00809030419x3a90cbb7vfa2195d48637b058@mail.gmail.com> Message-ID: <48BF13A1.3020804@berkeley.edu> (Resolved this on the BioSQL list but I figured I would follow up to biopython also -- thanks! & sorry for the confusion) Well, I'm not sure what I did, but some combination of these things seems to have worked. 1. moved the site/lib directory (which contains DBI.pm) to the front of my PERL5LIB (which goes into @INC) export PERL5LIB=$PERL5LIB:/usr/local/ActivePerl-5.10/site/lib:/usr/local/ActivePerl-5.10/man/man3:/usr/local/ActivePerl-5.10/site/lib/Bundle 2. Checked to make sure DBI & DBD::mysql were installed ========================================== mws2:/usr/local/ActivePerl-5.10/bin nick$ sudo perl -MCPAN -e 'install DBI' CPAN: Storable loaded ok (v2.18) Going to read /usr/local/Metadata Database was generated on Mon, 01 Sep 2008 10:02:51 GMT DBI is up to date (1.607). mws2:/usr/local/ActivePerl-5.10/bin nick$ sudo perl -MCPAN -e 'install DBD::mysql' CPAN: Storable loaded ok (v2.18) Going to read /usr/local/Metadata Database was generated on Mon, 01 Sep 2008 10:02:51 GMT DBD::mysql is up to date (4.008). ========================================== 3. (Make sure you have an empty version of the db, at least for me I got errors if I had already loaded sequences etc. 
into it...I got errors like this: ========================================== note: node (28;331111;27;species;;) is retired; failed to delete: Cannot delete or update a parent row: a foreign key constraint fails (`bioseqdb/bioentry`, CONSTRAINT `FKtaxon_bioentry` FOREIGN KEY (`taxon_id`) REFERENCES `taxon` (`taxon_id`)) note: node (70;300268;69;species;;) is retired; failed to delete: Cannot delete or update a parent row: a foreign key constraint fails (`bioseqdb/bioentry`, CONSTRAINT `FKtaxon_bioentry` FOREIGN KEY (`taxon_id`) REFERENCES `taxon` (`taxon_id`)) note: node (77;3002 ========================================== 4. Ran it again: (I also added '--host localhost') mws2:/usr/local/ActivePerl-5.10/bin nick$ sudo perl /bioinformatics/pythonstuff/biosql-1.0.0/scripts/load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true --host localhost Loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert / update / delete taxon nodes ... (committing nodes) ... rebuilding nested set left/right values ... reading in taxon names from names.dmp ... deleting old taxon names ... inserting new taxon names ... cleaning up Done. So thanks for the help, something or other worked! Cheers, Nick Peter wrote: > On Wed, Sep 3, 2008 at 12:19 AM, Nick Matzke wrote: >> Hi all, >> >> I'm following the BioSQL tutorial at the biopython website >> (http://www.biopython.org/wiki/BioSQL#NCBI_Taxonomy ). I can get bioseqdb >> to work, and the biosql python scripts etc. >> >> However I can't get these directions to work in loading the taxonomy >> database into bioseqdb. I get: "Can't locate object method "connect" via >> package "DBI" " >> >> I double-checked to make sure I've got DBI in perl (see error message below) >> but that doesn't seem to help. > > This does sound like a question for the BioSQL mailing list (which I > see you've now asked on). 
I'm no perl expert - so if you can resolve > this via the BioSQL mailing list, and we can improve the Biopython > BioSQL wiki page, that would be great. > > Peter >
From quwubin at gmail.com Mon Sep 8 13:43:05 2008 From: quwubin at gmail.com (Wubin Qu) Date: Mon, 8 Sep 2008 21:43:05 +0800 Subject: [BioPython] BioPythonGUI: Graphical User Interface for BioPython Message-ID: Hi all, I started a new project named BioPythonGUI for a few of days. The following is the 'About' page from BioPythonGUI project. BioPythonGUI is a Graphical User Interface of BioPython. BioPython is a widely used python module set in bioinformatics. It help researchers: - Parsing files in different database formats - Interfaces into programs like Blast, Entrez and PubMed - A sequence class (can transcribe, translate, invert, etc) - Code for handling alignments of sequences - Clustering algorithms - etc. However, it's not everyone can use the BioPython, especially ones who do not know much about the programming.
How can you expect a professor who never known about any programming to use BioPython to parse the BLAST report file? This is the problem which the BioPythonGUI would solve. I started the project with the goal "Everyone can use BioPython with BioPythonGUI". Until now, there are two modules SeqGUI and BlastGUI are available in BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send me the feedback. Please see the developer's blog for details. Project Blog: http://biopythongui.blogspot.com/ Download: https://sites.google.com/site/biopythongui/download Screenshots: http://picasaweb.google.com/quwubin/BioPythonGUI02# ______________________________ Best regards, Wubin Qu
From p.j.a.cock at googlemail.com Mon Sep 8 14:12:15 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Sep 2008 15:12:15 +0100 Subject: [BioPython] BioPythonGUI: Graphical User Interface for BioPython In-Reply-To: References: Message-ID: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com> On Mon, Sep 8, 2008 at 2:43 PM, Wubin Qu wrote: > Hi all, > > I started a new project named BioPythonGUI for a few of days. Hello Wubin Qu, > BioPythonGUI is a Graphical User Interface of BioPython. I'm uncomfortable about the name BioPythonGUI, as this to me implies it is part of Biopython (whereas is it currently just a third party project built on top of Biopython). What do other people think? > However, it's not everyone can use the BioPython, especially ones who do not > know much about the programming. How can you expect a professor who never > known about any programming to use BioPython to parse the BLAST report file? > This is the problem which the BioPythonGUI would solve. I started the > project with the goal "Everyone can use BioPython with BioPythonGUI". I don't really understand your goal. How would a non-programming professor use your program to parse a BLAST report file? The NCBI already try and make the HTML and plain text output useful to non-programmers and from looking at the screenshots I don't see how your tool would help. > Until now, there are two modules SeqGUI and BlastGUI are available in > BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send me > the feedback. I see your module SeqGUI builds on the SeqGui.py in BioPython (in the scripts directory). It might make sense to include your improvements to this code as part of Biopython.
I haven't looked at your code yet, so I don't know how much you've changed things. It is nice to be able to be able to translate, transcribe, reverse complement etc in a GUI, but personally I don't see the point or writing a little application just for this. Also, there are probably many many existing tools out there that already offer this functionality. However, I am happy writing code, so I am not in your target audience. Regarding your BlastGUI idea, I can see that a GUI for standalone blast is nicer than the command line for some people. However, I don't see how this is more useful than running a local blast web server (something the NCBI already provides). Sorry for being so negative, Peter From quwubin at gmail.com Mon Sep 8 14:38:27 2008 From: quwubin at gmail.com (Wubin Qu) Date: Mon, 8 Sep 2008 22:38:27 +0800 Subject: [BioPython] BioPythonGUI: Graphical User Interface for BioPython In-Reply-To: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com> References: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com> Message-ID: Hi Peter, Thans for your reply. My goal is simple: Programs with GUI are easily to use. BioPython with GUI will facilitate people. The next module is: BlastParserGUI. I think it will be useful. Yes, SeqGUI is built on SeqGui.py. And I learn a lot from SeqGui.py. It inspires me to build other modules. I mentioned this here . ______________________________ Best regards, Wubin Qu 2008/9/8 Peter Cock > On Mon, Sep 8, 2008 at 2:43 PM, Wubin Qu wrote: > > Hi all, > > > > I started a new project named BioPythonGUI for a few of days. > > Hello Wubin Qu, > > > BioPythonGUI is a Graphical User Interface of BioPython. > > I'm uncomfortable about the name BioPythonGUI, as this to me implies > it is part of Biopython (whereas is it currently just a third party > project built on top of Biopython). What do other people think? 
> > > However, it's not everyone can use the BioPython, especially ones who do > not > > know much about the programming. How can you expect a professor who never > > known about any programming to use BioPython to parse the BLAST report > file? > > This is the problem which the BioPythonGUI would solve. I started the > > project with the goal "Everyone can use BioPython with BioPythonGUI". > > I don't really understand your goal. How would a non-programming > professor use your program to parse a BLAST report file? The NCBI > already try and make the HTML and plain text output useful to > non-programmers and from looking at the screenshots I don't see how > your tool would help. > > > Until now, there are two modules SeqGUI and BlastGUI are available in > > BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send > me > > the feedback. > > I see your module SeqGUI builds on the SeqGui.py in BioPython (in the > scripts directory). It might make sense to include your improvements > to this code as part of Biopython. I haven't looked at your code yet, > so I don't know how much you've changed things. > > It is nice to be able to be able to translate, transcribe, reverse > complement etc in a GUI, but personally I don't see the point or > writing a little application just for this. Also, there are probably > many many existing tools out there that already offer this > functionality. However, I am happy writing code, so I am not in your > target audience. > > Regarding your BlastGUI idea, I can see that a GUI for standalone > blast is nicer than the command line for some people. However, I > don't see how this is more useful than running a local blast web > server (something the NCBI already provides). 
> > Sorry for being so negative, > > Peter > From biopython at maubp.freeserve.co.uk Tue Sep 9 10:14:11 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Sep 2008 11:14:11 +0100 Subject: [BioPython] Biopython 1.48 released Message-ID: <320fb6e00809090314s722f404bqda71d7d9f97360e7@mail.gmail.com> We are pleased to announce the release of Biopython 1.48. Some new functionality has been added, a few bugs have been fixed, the documentation has been updated, plus several obsolete modules have been deprecated (or explicitly labelled as obsolete). The following additional file formats are now supported in Bio.SeqIO and Bio.AlignIO: * reading and writing "tab" format (simple tab separated) * writing "nexus" files * reading "pir" files (NBRF/PIR) * basic support for writing "genbank" files (GenBank plain text) This release also fixes some problems reading Clustal alignments (introduced in Biopython 1.46 when consolidating Bio.AlignIO and Bio.Clustalw), and some updates to the Bio.Sequencing parsers. The SeqRecord and Alignment objects have a new method to get the object as a string in a given file format (handled via Bio.SeqIO and Bio.AlignIO). Bio.PubMed and the online code in Bio.GenBank are now considered obsolete, and we intend to deprecate them after the next release. For accessing PubMed and GenBank, please use Bio.Entrez instead. Martel and Bio.Mindy are now considered to be obsolete, and are likely to be deprecated and removed in a future release, at which point we will drop the optional dependency on mxTextTools. Bio.Fasta is also considered to be obsolete, please use Bio.SeqIO instead. We do intend to deprecate this module eventually, however, for several years this was the primary FASTA parsing module in Biopython and is likely to be in use in many existing scripts. In addition a number of other modules have been deprecated, including: Bio.MetaTool, Bio.EUtils, Bio.Saf, Bio.NBRF, and Bio.IntelliGenetics - see the DEPRECATED file for full details. 
Source distributions are available from the Biopython website at http://biopython.org, and Windows installers will be added shortly. My thanks to all bug reporters, code contributors and others who made this new release possible. Peter, on behalf of the Biopython developers P.S. This message will be forwarded to the Biopython announcement mailing list shortly. For those of you who prefer news readers to email lists, have a look at the OBF news server: http://news.open-bio.org/news/2008/09/biopython-release-148/ where there are Biopython news feeds available: http://news.open-bio.org/news/category/obf-projects/biopython/feed/rdf http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss2 http://news.open-bio.org/news/category/obf-projects/biopython/feed/atom From lueck at ipk-gatersleben.de Tue Sep 9 11:11:56 2008 From: lueck at ipk-gatersleben.de (Stefanie Lück) Date: Tue, 9 Sep 2008 13:11:56 +0200 Subject: [BioPython] Biopython 1.48 released Message-ID: <006b01c9126c$ded5eee0$1022a8c0@ipkgatersleben.de> Great work! Thanks to all! Stefanie ;-) From bsantos at biocant.pt Wed Sep 10 21:42:31 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Wed, 10 Sep 2008 21:42:31 +0000 Subject: [BioPython] Entrez.read Message-ID: <20080910214231.f492d16a@mail.biocant.pt> Hi everyone, I am trying to use Entrez.esearch to find the ids I can then use to fetch the genbank records. For that purpose I looked in the Biopython tutorial for more information on how to do this, and found the following: >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") >>> record = Entrez.read(handle) >>> record["Count"] '25' >>> record["IdList"] ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] Unfortunately, when I try to follow these instructions, I find that I don't have any Entrez.read function to parse the output of Entrez.esearch.
Is this a new option in the last release? Thanks in advance, Bruno Santos From bsantos at biocant.pt Thu Sep 11 15:09:58 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Thu, 11 Sep 2008 16:09:58 +0100 Subject: [BioPython] Entrez.read In-Reply-To: <20080910214231.f492d16a@mail.biocant.pt> References: <20080910214231.f492d16a@mail.biocant.pt> Message-ID: <001501c91420$769d0240$63d706c0$@pt> No answers? -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Bruno Santos Sent: Wednesday, 10 September 2008 22:43 To: biopython at biopython.org Subject: [BioPython] Entrez.read Hi everyone, I am trying to use Entrez.esearch to find the ids I can then use to fetch the genbank records. For that purpose I went looking for biopython tutorial for more information in how to do this, and I get the following information: >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") >>> record = Entrez.read(handle) >>> record["Count"] '25' >>> record["IdList"] ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] Unfortunately when I try to follow this instructions I found out that I don't have any Entrez.read to parse the output of Entrez.esearch. Is this a new option in the last release? Thanks in advance, Bruno Santos _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From idoerg at gmail.com Thu Sep 11 15:37:58 2008 From: idoerg at gmail.com (Iddo Friedberg) Date: Thu, 11 Sep 2008 08:37:58 -0700 Subject: [BioPython] Entrez.read In-Reply-To: <20080910214231.f492d16a@mail.biocant.pt> References: <20080910214231.f492d16a@mail.biocant.pt> Message-ID: Can you provide the steps and error messages you are getting? It seems to work fine for me.
On Wed, Sep 10, 2008 at 2:42 PM, Bruno Santos wrote: > Hi everyone, > I am trying to use Entrez.esearch to find the ids I can then use to fetch > the genbank records. > For that purpose I went looking for biopython tutorial for more information > in how to do this, and I get the following information: > > > >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND > matK[Gene]") > >>> record = Entrez.read(handle) > >>> record["Count"] > '25' > >>> record["IdList"] > ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] > > Unfortunately when I try to follow this instructions I found out that I > don't have any Entrez.read > to parse the output of Entrez.esearch. Is this a new option in the last > release? > Thanks in advance, > Bruno Santos > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Iddo Friedberg, Ph.D. CALIT2, mail code 0440 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0440, USA T: +1 (858) 534-0570 T: +1 (858) 646-3100 x3516 http://iddo-friedberg.org From fkauff at biologie.uni-kl.de Thu Sep 11 15:26:58 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Thu, 11 Sep 2008 17:26:58 +0200 Subject: [BioPython] Entrez.read In-Reply-To: <001501c91420$769d0240$63d706c0$@pt> References: <20080910214231.f492d16a@mail.biocant.pt> <001501c91420$769d0240$63d706c0$@pt> Message-ID: <48C938C2.2080902@biologie.uni-kl.de> Hi Bruno, from what I see your code should work. Without knowing what version of Biopython you are running and what error message you get, it's a bit difficult to tackle he problem. I'd guess your biopython is outdated. Frank Bruno Santos wrote: > No answers? 
> -----Mensagem original----- > De: biopython-bounces at lists.open-bio.org > [mailto:biopython-bounces at lists.open-bio.org] Em nome de Bruno Santos > Enviada: quarta-feira, 10 de Setembro de 2008 22:43 > Para: biopython at biopython.org > Assunto: [BioPython] Entrez.read > > Hi everyone, > I am trying to use Entrez.esearch to find the ids I can then use to fetch > the genbank records. > For that purpose I went looking for biopython tutorial for more information > in how to do this, and I get the following information: > > > >>>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND >>>> > matK[Gene]") > >>>> record = Entrez.read(handle) >>>> record["Count"] >>>> > '25' > >>>> record["IdList"] >>>> > ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] > > Unfortunately when I try to follow this instructions I found out that I > don't have any Entrez.read > to parse the output of Entrez.esearch. Is this a new option in the last > release? > Thanks in advance, > Bruno Santos > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > From bsantos at biocant.pt Thu Sep 11 16:36:43 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Thu, 11 Sep 2008 17:36:43 +0100 Subject: [BioPython] Entrez.read In-Reply-To: <48C938C2.2080902@biologie.uni-kl.de> References: <20080910214231.f492d16a@mail.biocant.pt> <001501c91420$769d0240$63d706c0$@pt> <48C938C2.2080902@biologie.uni-kl.de> Message-ID: <001d01c9142c$94e2d520$bea87f60$@pt> Ok this seem to be a completely stupid error, I was playing a little bit with the pydev console in eclipse and the Entrez.read raise an exception, but if I place this instruction in a script it works fine. 
No idea why this is happening, but in fact this method doesn't appear in the Eclipse autocomplete list. Thank you all for your answers. Now I have another question: is it possible to limit the esearch not only to a database but also to a species? I only need the ids associated with Drosophila melanogaster, and having to fetch all the sequences and only then check whether they belong to this species seems a waste of bandwidth. Thanks in advance, Bruno Santos -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Frank Kauff Sent: Thursday, 11 September 2008 16:27 To: biopython at biopython.org Subject: Re: [BioPython] Entrez.read Hi Bruno, from what I see your code should work. Without knowing what version of Biopython you are running and what error message you get, it's a bit difficult to tackle the problem. I'd guess your biopython is outdated. Frank Bruno Santos wrote: > No answers? > -----Original Message----- > From: biopython-bounces at lists.open-bio.org > [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Bruno Santos > Sent: Wednesday, 10 September 2008 22:43 > To: biopython at biopython.org > Subject: [BioPython] Entrez.read > > Hi everyone, > I am trying to use Entrez.esearch to find the ids I can then use to fetch > the genbank records. > For that purpose I went looking for biopython tutorial for more information > in how to do this, and I get the following information: > > > >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]") > >>> record = Entrez.read(handle) > >>> record["Count"] > '25' > >>> record["IdList"] > ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] > > Unfortunately when I try to follow this instructions I found out that I > don't have any Entrez.read > to parse the output of Entrez.esearch. Is this a new option in the last > release?
> Thanks in advance, > Bruno Santos > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From idoerg at gmail.com Thu Sep 11 16:54:28 2008 From: idoerg at gmail.com (Iddo Friedberg) Date: Thu, 11 Sep 2008 09:54:28 -0700 Subject: [BioPython] Entrez.read In-Reply-To: References: <20080910214231.f492d16a@mail.biocant.pt> <001501c91420$769d0240$63d706c0$@pt> <48C938C2.2080902@biologie.uni-kl.de> <001d01c9142c$94e2d520$bea87f60$@pt> Message-ID: The example *is* limiting your search to the subfamily Cypripedioideae. To limit to a species you can write along the same lines: handle = Entrez.esearch(db="nucleotide",term="Escherichia coli[Orgn]") (With apologies to Brad Chapman. I'm a microbe guy, not a plant guy). ./I On Thu, Sep 11, 2008 at 9:36 AM, Bruno Santos wrote: > Ok this seem to be a completely stupid error, I was playing a little bit > with the pydev console in eclipse and the Entrez.read raise an exception, > but if I place this instruction in a script it works fine. No idea why this > is happening but in fact this method don't appear listed in eclipse auto > complete. > Thank you all for your answers. > No I have another question it is possible to limit the esearch not only to > database but also to species? Because I only need the ids associated with > droshophila melanogaster, and have to fetch all the sequences and only than > check if they belong to this species seems a waste of bandwidth. 
> > Thanks in advance, > Bruno Santos > > -----Mensagem original----- > De: biopython-bounces at lists.open-bio.org > [mailto:biopython-bounces at lists.open-bio.org] Em nome de Frank Kauff > Enviada: quinta-feira, 11 de Setembro de 2008 16:27 > Para: biopython at biopython.org > Assunto: Re: [BioPython] Entrez.read > > Hi Bruno, > > from what I see your code should work. Without knowing what version of > Biopython you are running and what error message you get, it's a bit > difficult to tackle he problem. I'd guess your biopython is outdated. > > Frank > > Bruno Santos wrote: > > No answers? > > -----Mensagem original----- > > De: biopython-bounces at lists.open-bio.org > > [mailto:biopython-bounces at lists.open-bio.org] Em nome de Bruno Santos > > Enviada: quarta-feira, 10 de Setembro de 2008 22:43 > > Para: biopython at biopython.org > > Assunto: [BioPython] Entrez.read > > > > Hi everyone, > > I am trying to use Entrez.esearch to find the ids I can then use to fetch > > the genbank records. > > For that purpose I went looking for biopython tutorial for more > information > > in how to do this, and I get the following information: > > > > > > > >>>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] > AND > >>>> > > matK[Gene]") > > > >>>> record = Entrez.read(handle) > >>>> record["Count"] > >>>> > > '25' > > > >>>> record["IdList"] > >>>> > > ['186972394', '186972384', '186972382', '186972378', ..., '61585484'] > > > > Unfortunately when I try to follow this instructions I found out that I > > don't have any Entrez.read > > to parse the output of Entrez.esearch. Is this a new option in the last > > release? 
> > Thanks in advance, > > Bruno Santos > > > > _______________________________________________ > > BioPython mailing list - BioPython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > > > > _______________________________________________ > > BioPython mailing list - BioPython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Iddo Friedberg, Ph.D. CALIT2, mail code 0440 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0440, USA T: +1 (858) 534-0570 T: +1 (858) 646-3100 x3516 http://iddo-friedberg.org From biopython at maubp.freeserve.co.uk Tue Sep 16 20:20:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Sep 2008 21:20:58 +0100 Subject: [BioPython] Entrez.read In-Reply-To: <20080910214231.f492d16a@mail.biocant.pt> References: <20080910214231.f492d16a@mail.biocant.pt> Message-ID: <320fb6e00809161320l2249e12t5b5537becc50c238@mail.gmail.com> On Wed, Sep 10, 2008 at 10:42 PM, Bruno Santos wrote: > Hi everyone, > > Unfortunately when I try to follow this instructions I found out that I don't have any Entrez.read > to parse the output of Entrez.esearch. Is this a new option in the last release? My initial guess would be the answer in the FAQ (near the start of the tutorial), >> Why doesn't Bio.Entrez.read() work? The module imports fine but there is no read function! >> You need Biopython 1.46 or later. But then you said later it was something odd about the pydev console in eclipse - which stumps me. 
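One way to narrow down a case like this, where a function seems to be missing in one environment but works in another, is to print where the module was loaded from and whether the attribute exists. The following is only a sketch: `describe_module` is a hypothetical helper (not part of Biopython), and it assumes the top-level package exposes a `__version__` attribute, which may not hold for every release.

```python
import importlib


def describe_module(module_name, attribute):
    """Report where a module was loaded from and whether it has an attribute.

    Hypothetical diagnostic helper, not part of Biopython.
    Returns None if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # The version string, if any, usually lives on the top-level package.
    top_level = importlib.import_module(module_name.split(".")[0])
    return {
        "file": getattr(module, "__file__", None),
        "version": getattr(top_level, "__version__", None),
        "has_attribute": hasattr(module, attribute),
    }


if __name__ == "__main__":
    # Assumed usage: run once in the IDE console and once as a script,
    # then compare the two results.
    print(describe_module("Bio.Entrez", "read"))
```

If the two environments report different file paths, they are probably picking up different installations, which would be consistent with the FAQ answer about needing a recent enough release.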
Peter From biopython at maubp.freeserve.co.uk Thu Sep 18 12:32:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Sep 2008 13:32:43 +0100 Subject: [BioPython] Deprecating Bio.Mindy and Martel Message-ID: <320fb6e00809180532t38c9aa19leff6b65f1902ca5f@mail.gmail.com> Dear all, As of Biopython release 1.48, Martel and Bio.Mindy are officially considered to be obsolete. These modules provided a regular expression based parsing framework originally used in several of the Biopython parsers, but required the external dependency mxTextTools. None of the current parsers in Biopython use Martel or Bio.Mindy anymore. I would like to deprecate Martel and Bio.Mindy (and closely associated support modules) for the next release. This means the code will still be present and functional, but using it will trigger a warning message. We can then remove them in a future release. This change should not affect the vast majority of Biopython users. However, if there are any Biopython users on the mailing list who are using Martel and Bio.Mindy in their own parsers, please get in touch. Thank you, Peter From mjldehoon at yahoo.com Sat Sep 20 03:02:51 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 19 Sep 2008 20:02:51 -0700 (PDT) Subject: [BioPython] Migrating from Numerical Python to numpy Message-ID: <623262.17729.qm@web62407.mail.re1.yahoo.com> Dear all, As you probably are well aware, Biopython releases to date have used the now obsolete Numeric python library. This is no longer being maintained and has been superseded by the numpy library. See http://www.scipy.org/History_of_SciPy for more details on the history of numerical python. Biopython 1.48 should be the last Numeric only release of Biopython - we have already started moving to numpy in CVS. Supporting both Numeric and numpy ought to be fairly straightforward for the pure python modules in Biopython.
However, we also have C code which must interact with Numeric/numpy, and trying to support both would be harder. Would anyone be inconvenienced if the next release of Biopython supported numpy ONLY (dropping support for Numeric)? If so, please speak up now - either here or on the development mailing list. Otherwise, a simple switch from Numeric to numpy will probably be the most straightforward migration plan. Thank you, --Michiel. From ericgibert at yahoo.fr Sun Sep 21 02:46:01 2008 From: ericgibert at yahoo.fr (Eric Gibert) Date: Sun, 21 Sep 2008 10:46:01 +0800 Subject: [BioPython] Migrating from Numerical Python to numpy In-Reply-To: <623262.17729.qm@web62407.mail.re1.yahoo.com> References: <623262.17729.qm@web62407.mail.re1.yahoo.com> Message-ID: <93940882994C47E3AEDA37AC021C187D@Gecko> Go ahead! :-) numpy only. Eric -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Michiel de Hoon Sent: Saturday, September 20, 2008 11:03 AM To: biopython at biopython.org Subject: [BioPython] Migrating from Numerical Python to numpy Dear all, As you probably are well aware, Biopython releases to date have used the now obsolete Numeric python library. This is no longer being maintained and has been superseded by the numpy library. See http://www.scipy.org/History_of_SciPy for more about details on the history of numerical python. Biopython 1.48 should be the last Numeric only release of Biopython - we have already started moving to numpy in CVS. Supporting both Numeric and numpy ought to be fairly straightforward for the pure python modules in Biopython. However, we also have C code which must interact with Numeric/numpy, and trying to support both would be harder. Would anyone be inconvenienced if the next release of Biopython supported numpy ONLY (dropping support for Numeric)? If so please speak up now - either here or on the development mailing list.
Otherwise, a simple switch from Numeric to numpy will probably be the most straightforward migration plan. Thank you, --Mihiel. _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From lunt at ctbp.ucsd.edu Sun Sep 21 21:57:59 2008 From: lunt at ctbp.ucsd.edu (Bryan Lunt) Date: Sun, 21 Sep 2008 14:57:59 -0700 Subject: [BioPython] Biopython 1.48 Still using Numeric!? Message-ID: Greetings, It seems that v1.48 still uses Numeric, and not Numpy? The last time I installed biopython (v1.47) I was able to use the migration tools included with numpy, plus some manual judgement and search/replace to migrate the code. It seemed like the tests worked after that, but maybe there was something I missed? Obviously it must be a huge job, or someone would have done it by now. (Which I suppose supports the "something I missed" hypothesis...) -Bryan Lunt From biopython at maubp.freeserve.co.uk Sun Sep 21 22:26:24 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 21 Sep 2008 23:26:24 +0100 Subject: [BioPython] Biopython 1.48 Still using Numeric!? In-Reply-To: References: Message-ID: <320fb6e00809211526y6abcac90i9f76b97455e1e9af@mail.gmail.com> On Sun, Sep 21, 2008 at 10:57 PM, Bryan Lunt wrote: > Greetings, > It seems that v1.48 still uses Numeric, and not Numpy? Yes, that is true. See below... > The last time I installed biopython (v1.47) I was able to use the > migration tools included with numpy, plus some manual judgement and > search/replace to migrate the code. > It seemed like the tests worked after that, but maybe there was > something I missed? > > Obviously it must be a huge job, or someone would have done it by now. > (Which I suppose supports the "something I missed" hypothesis...) I'm guessing you missed the earlier emails on this list - we are in the process of moving from Numeric to numpy for the next release. 
This is non-trivial due to Biopython having C code using Numeric. If you are brave, you could check out the latest code from CVS and see how that works for you? Peter From meesters at uni-mainz.de Mon Sep 22 11:29:26 2008 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 22 Sep 2008 13:29:26 +0200 Subject: [BioPython] numpy migration for Bio.PDB.Vector Message-ID: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> Hi, For my own software I've re-written Bio.PDB.Vector. (Or at least something similar, but I've adapted my code to the look & feel of Thomas Hamelryck's original code.) My code avoids using 'from numpy.oldnumeric import' statements. I sent the file to Thomas some months ago, but it seems it didn't make it to CVS. Anybody interested in using it as a template (it includes some unit tests)? Christian PS I don't have time to join the development, but appreciate any progress in biopython. Thanks to all of you and keep up the good work! From biopython at maubp.freeserve.co.uk Mon Sep 22 13:14:14 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Sep 2008 14:14:14 +0100 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> Message-ID: <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> On Mon, Sep 22, 2008 at 12:29 PM, Christian Meesters wrote: > Hi, > > For my own software I've re-written Bio.PDB.Vector. (Or at least > something similar, but I've adapted my code to the look & feel of Thomas > Hamelryck's original code.) > My code avoids using 'from numpy.oldnumeric import' statements. I > sent the file to Thomas some months ago, but it seems it didn't make it > to CVS. Anybody interested in using it as a template (it includes some > unit tests)? > > Christian > > PS I don't have time to join the development, but appreciate any > progress in biopython.
Thanks to all of you and keep up the good work! Hi Christian, It seems Thomas has been a little too busy lately to spend much time on Biopython. Could you file an enhancement bug on bugzilla for this, and then upload your code as an attachment? Depending on what exactly you mean by re-writing Bio/PDB/Vector.py, either a patch or just the new python file(s) could be best. Further unit tests are also welcome :) Thanks, Peter From meesters at uni-mainz.de Mon Sep 22 14:56:13 2008 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 22 Sep 2008 16:56:13 +0200 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> Message-ID: <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> > > It seems Thomas has been a little too busy lately to spend much time > on Biopython. Could you file an enhancement bug on bugzilla for this, > and then upload your code as an attachment? Done, except that I couldn't find how to attach files, so I simply pasted the code. Feel free to use or drop anything. Christian From biopython at maubp.freeserve.co.uk Mon Sep 22 15:19:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Sep 2008 16:19:33 +0100 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> Message-ID: <320fb6e00809220819s27fb8cb4s6e713e56fe89cc@mail.gmail.com> On Mon, Sep 22, 2008 at 3:56 PM, Christian Meesters wrote: > >> >> It seems Thomas has been a little too busy lately to spend much time >> on Biopython.
Could you file an enhancement bug on bugzilla for this, >> and then upload your code as an attachment? > > Done. Only that I didn't find how to attach files. So I simply pasted > the code Feel free to use or drop anything. > > Christian Thanks for that. I did wonder if I should be more explicit - its a two step process, file the bug, then add the attachment. Odd, I agree. As to the code, I've only skimmed over it. One thing that caught my eye is you are using scipy, which we currently do not want to introduce as a Biopython dependency. Also, by the nature of your code (using the new numpy API, not numpy.oldnumeric), making this kind of change will have to wait until we have agreed to completely drop support for Numeric. There is a thread proposing a simple transition from Numeric only (Biopython 1.48 or older) to numpy only (Biopython 1.49 onwards), but a transition period supporting both could be possible (but would probably be more work!). If you have any views on this migration, please voice them (ideally on that thread). Thanks, Peter From meesters at uni-mainz.de Mon Sep 22 15:44:30 2008 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 22 Sep 2008 17:44:30 +0200 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <320fb6e00809220819s27fb8cb4s6e713e56fe89cc@mail.gmail.com> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220819s27fb8cb4s6e713e56fe89cc@mail.gmail.com> Message-ID: <1222098270.5737.85.camel@meesters.biologie.uni-mainz.de> > Thanks for that. I did wonder if I should be more explicit - its a > two step process, file the bug, then add the attachment. Odd, I > agree. Indeed ;-). > > As to the code, I've only skimmed over it. One thing that caught my > eye is you are using scipy, which we currently do not want to > introduce as a Biopython dependency. 
Alright, this was only a shortcut to get the determinant. Actually there is no need to use scipy. > If you have any views on this migration, please voice them > (ideally on that thread). No, nothing to add to things already written. Christian From biopython at maubp.freeserve.co.uk Mon Sep 22 16:05:13 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Sep 2008 17:05:13 +0100 Subject: [BioPython] numpy migration for Bio.PDB.Vector In-Reply-To: <1222098270.5737.85.camel@meesters.biologie.uni-mainz.de> References: <1222082966.5737.53.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220614q161d05a3sf21d2f3ecea651f2@mail.gmail.com> <1222095373.5737.73.camel@meesters.biologie.uni-mainz.de> <320fb6e00809220819s27fb8cb4s6e713e56fe89cc@mail.gmail.com> <1222098270.5737.85.camel@meesters.biologie.uni-mainz.de> Message-ID: <320fb6e00809220905x6c0c0c82wc3b06344d87714b0@mail.gmail.com> >> As to the code, I've only skimmed over it. One thing that caught my >> eye is you are using scipy, which we currently do not want to >> introduce as a Biopython dependency. > > Alright, this was only a shortcut to get the determinant. Actually there > is no need to use scipy. Good - that's what I would have assumed anyway. >> If you have any views on this migration, please voice them >> (ideally on that thread). > > No, nothing to add to things already written. 
OK, good :) Peter From biopython at maubp.freeserve.co.uk Tue Sep 23 16:18:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Sep 2008 17:18:31 +0100 Subject: [BioPython] Removing Bio.ECell, Bio.Rebase, Bio.CDD and Bio.SGMLExtractor Message-ID: <320fb6e00809230918s2ceb3ee7hbd14f3d72a46c903@mail.gmail.com> Dear all, The following unmaintained modules were deprecated a couple of releases ago, and unless anyone objects, they will shortly be removed, and will not be present in the next release of Biopython: Bio.ECell - obsolete with ECell updates Bio.Rebase - could not parse current Rebase records Bio.CDD - could not parse current HTML files from CDD Bio.SGMLExtractor - old parsing code no longer needed within Biopython As usual, the files will still be preserved in CVS so if at a later date someone wants to resurrect this functionality, this is possible. Peter From adityashukla1983 at gmail.com Wed Sep 24 00:13:06 2008 From: adityashukla1983 at gmail.com (aditya shukla) Date: Tue, 23 Sep 2008 19:13:06 -0500 Subject: [BioPython] Creating and traversing an ultrametric tree Message-ID: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> Hello Folks, Can someone explain how can an ultrametric tree be created and traversed? I found think link on ultrametric trees but didn't get a lot from it. 
http://www.koders.com/python/fid414AE4B3075F9E434B76B0AEA3CE336C5A9F60F8.aspx?s=cdef%3Atree#L4 Thanks Aditya From biopython at maubp.freeserve.co.uk Wed Sep 24 09:00:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Sep 2008 10:00:52 +0100 Subject: [BioPython] Creating and traversing an ultrametric tree In-Reply-To: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> References: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> Message-ID: <320fb6e00809240200y144500cbl86f9023cb868da89@mail.gmail.com> On Wed, Sep 24, 2008 at 1:13 AM, aditya shukla wrote: > Hello Folks, > Hi Aditya, > > Can someone explain how can an ultrametric tree be created and traversed? > What kind of data are you starting with? People tend to use specialist software to generate trees using a range of algorithms, often starting from a set of aligned sequences. > > I found think link on ultrametric trees but didn't get a lot from it. > http://www.koders.com/python/fid414AE4B3075F9E434B76B0AEA3CE336C5A9F60F8.aspx?s=cdef%3Atree#L4 > Its a shame that code doesn't seem to include a header with any copyright or authorship statement. 
It is not using Biopython, and my initial guess was the "import newick" was using Thomas Mailund's Newick tree parser: http://www.daimi.au.dk/~mailund/newick.html (You can alternatively use the Bio.Nexus module in Biopython to parse a Newick tree) However, clicking on some of the other neighbouring files on that koders.com website I realised you are looking at a copy of the lagrange python package ("Likelihood analysis of geographic range evolution"), the homepage for which is here: http://code.google.com/p/lagrange/ Peter From biopython at maubp.freeserve.co.uk Wed Sep 24 20:26:37 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Sep 2008 21:26:37 +0100 Subject: [BioPython] Creating and traversing an ultrametric tree In-Reply-To: <73045cca0809241132x30bc4d63t7ac0b9967a20e76c@mail.gmail.com> References: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> <320fb6e00809240200y144500cbl86f9023cb868da89@mail.gmail.com> <73045cca0809241132x30bc4d63t7ac0b9967a20e76c@mail.gmail.com> Message-ID: <320fb6e00809241326i16a337das844f4ac74766b459@mail.gmail.com> On Wed, Sep 24, 2008 at 7:32 PM, aditya shukla wrote: > Hello Peter , > > Thanks for the reply , > I have attached a file with of the kind of data that i wanna parse. > I tried using Thomas Mailund's Newick tree parser but this dosen't > seem to work , so is there any other module that can help? Your file looks like this (in case anyone on the mailing list recognises it), /T_0_size=105((-bin-ulockmgr_server:0.99[&&NHX:C=0.195.0], (((-bin-hostname:0.00[&&NHX:C=200.0.0], (-bin-dnsdomainname:0.00[&&NHX:C=200.0.0], ...):0.99):0.99):0.99):0.99); [with a large chunk removed, and new lines inserted] I'm guessing this is some kind of computer system profile - nothing to do with bioinformatics. I'm not 100% sure this is Newick format - it might be worth trying to parse everything after the "/T_0_size=105" text which looks out of place to me. 
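As a rough illustration of the suggestion to parse only what follows the "/T_0_size=105" text, here is a minimal sketch in plain Python that drops everything before the first opening parenthesis. The helper name strip_tree_prefix and the shortened sample string are made up for this example, and it says nothing about whether the remainder is acceptable to any particular Newick parser.

```python
def strip_tree_prefix(text):
    """Return the text from the first '(' onwards (hypothetical helper)."""
    start = text.find("(")
    if start == -1:
        raise ValueError("no '(' found, so this does not look like a Newick tree")
    return text[start:].strip()


# Shortened, made-up sample in the same style as the file described above.
raw = "/T_0_size=105((-bin-a:0.99,(-bin-b:0.00,-bin-c:0.00):0.99):0.99);"
tree_text = strip_tree_prefix(raw)
print(tree_text)
```

The cleaned string could then be handed to whichever tree parser is being tried.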
If it is a valid Newick format tree file, then it is using named internal nodes which is something Biopython can't currently parse (see Bug 2543, http://bugzilla.open-bio.org/show_bug.cgi?id=2543 ). So I don't think you can use the Bio.Nexus module in Biopython to read this tree. The only other python package I can suggest you try is NetworkX, https://networkx.lanl.gov/wiki Good luck, Peter From peter at maubp.freeserve.co.uk Fri Sep 26 15:59:51 2008 From: peter at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 16:59:51 +0100 Subject: [BioPython] More string methods for the Seq object Message-ID: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> Dear Biopythoneers, On bugzilla I've put forward a couple of suggested string-like methods which I would like to add to the Seq object. Bug 2351 comment 15 - adding a split method http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15 Here I have suggested the separator be non-optional (for strings this defaults to white space) Bug 2596 - adding strip, rstrip and lstrip http://bugzilla.open-bio.org/show_bug.cgi?id=2596 Here I have suggested these default to stripping gap characters (for strings these default to stripping white space) Any comments? Positive or negative feedback is welcome. Peter From mmokrejs at ribosome.natur.cuni.cz Fri Sep 26 16:34:01 2008 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Fri, 26 Sep 2008 18:34:01 +0200 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> Message-ID: <48DD0EF9.80204@ribosome.natur.cuni.cz> Hi Peter, Peter wrote: > Dear Biopythoneers, > > On bugzilla I've put forward a couple of suggested string-like methods > which I would like to add to the Seq object. 
> > Bug 2351 comment 15 - adding a split method > http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15 > Here I have suggested the separator be non-optional (for strings this > defaults to white space) Please excuse my ignorance, but what is this useful for? > Bug 2596 - adding strip, rstrip and lstrip > http://bugzilla.open-bio.org/show_bug.cgi?id=2596 > Here I have suggested these default to stripping gap characters (for > strings these default to stripping white space) Again, what is this useful for? Aren't there checks for quality of the sequence when one tries to instantiate the object? M. From peter at maubp.freeserve.co.uk Fri Sep 26 16:42:19 2008 From: peter at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 17:42:19 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <48DD0EF9.80204@ribosome.natur.cuni.cz> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> Message-ID: <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> >> Bug 2351 comment 15 - adding a split method >> http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15 >> Here I have suggested the separator be non-optional (for strings this >> defaults to white space) > > Please excuse my ignorance, but what is this useful for? Suppose you had translated a nucleotide sequence into, for example, "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" (as a Seq object with a protein alphabet). You might want to split the sequence at terminators, to get the open reading frames (and then filter them on length). Right now the Seq object doesn't have a split method so you would have to switch to using python strings (and then go back to a Biopython Seq object later if need be).
>> Bug 2596 - adding strip, rstrip and lstrip >> http://bugzilla.open-bio.org/show_bug.cgi?id=2596 >> Here I have suggested these default to stripping gap characters (for >> strings these default to stripping white space) > > Again, what is this useful for? Aren't there checks for quality > of the sequence when one tries to instantiate the object? I'm not sure what you mean by quality of the sequence here (are you talking about sequencing quality scores?) Suppose you have some sequences which you have aligned in ClustalW, and most have leading or trailing gap characters, e.g. given "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet) you might want to strip off the leading and trailing gaps to have just "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now the Seq object doesn't have a strip method, so you would have to switch to a string and back again. I could write these up as examples in python if it would help. Peter From peter at maubp.freeserve.co.uk Fri Sep 26 16:52:13 2008 From: peter at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 17:52:13 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> Message-ID: <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> > Suppose you had translated a nucleotide sequence into, for example, > "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" (as a Seq object with a protein > alphabet). You might want to split the sequence at terminators, to > get the open reading frames (and then filter them on length). Right > now the Seq object doesn't have a split method so you would have to > switch to using python strings (and then go back to a Biopython Seq > object later if need be).
Using pure python strings:

str_seq = "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ"
orf_str_list = str_seq.split("*")

Using Biopython Seq objects:

from Bio.Seq import Seq
from Bio.Alphabet import generic_protein
seq = Seq("SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ", generic_protein)
#I want to be able to do this:
orf_seq_list = seq.split("*")
#Right now I have to do something like this:
orf_seq_list = [Seq(x, generic_protein) for x in seq.tostring().split("*")]

Another example of using a Seq object .split() method would be for restriction enzymes (although the Bio.Restriction package should be more general).

> Suppose you have some sequences which you have aligned in ClustalW,
> and most have leading or trailing gaps characters. e.g. Given
> "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet)
> you might want to strip off the leading and trailing gaps to have just
> "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now
> the Seq object doesn't have a strip method, so you would have to
> switch to a string and back again.
Using pure python strings:

long_seq_str = "---SAD-KCNKADND---"
trimmed_seq_str = long_seq_str.strip("-")

Using Biopython Seq objects:

from Bio.Seq import Seq
from Bio.Alphabet import generic_protein
long_seq = Seq("---SAD-KCNKADND---", generic_protein)
#I want to be able to do this:
trimmed_seq = long_seq.strip("-")
#Right now, I have to do something like this:
trimmed_seq = Seq(long_seq.tostring().strip("-"), generic_protein)

Another possible example is if you have some EST sequences and you want to strip the poly A tail on the trailing end (right side), e.g. "ACACTGCAGCATCAGCAAAAAAA".rstrip("A")

Peter

From mmokrejs at ribosome.natur.cuni.cz Fri Sep 26 16:54:15 2008 From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ) Date: Fri, 26 Sep 2008 18:54:15 +0200 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> Message-ID: <48DD13B7.50007@ribosome.natur.cuni.cz>

Hi Peter, thanks for the explanation, now I get your point. By 'quality checks' I meant a test for compliance of the sequence with, say, IUPAC definitions. OK, it is irrelevant now. As long as you document these functions I am fine with that. ;-) Regards, M.

From peter at maubp.freeserve.co.uk Fri Sep 26 17:10:57 2008 From: peter at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 18:10:57 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <48DD13B7.50007@ribosome.natur.cuni.cz> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <48DD13B7.50007@ribosome.natur.cuni.cz> Message-ID: <320fb6e00809261010h2733e132u93182ea19dcdc456@mail.gmail.com>

On Fri, Sep 26, 2008 at 5:54 PM, Martin MOKREJŠ
wrote: > Hi Peter, > thanks for explanation, now I got your point. Good. > > With 'quality checks' I meant test for compliance of the sequence > with say IUPAC definitions. OK, it is irrelevant now. > I did wonder if this is what you meant. Checking Seq objects for compliance with the stated (IUPAC) alphabets is a separate issue - which I have been thinking about. I've just filed Bug 2597 so we don't forget about it. http://bugzilla.open-bio.org/show_bug.cgi?id=2597 > > As long you document these functions I am fine with that. ;-) > Sure. The suggested code for these new Seq object methods includes docstrings, but I would also intend to show how they might be used in the tutorial (probably examples like the ones shown in this email discussion). Peter From bsouthey at gmail.com Fri Sep 26 18:45:58 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 26 Sep 2008 13:45:58 -0500 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> Message-ID: <48DD2DE6.10908@gmail.com> Peter wrote: >> Support you had translated a nucleotide sequence into for example, >> "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" (as a Seq object with a protein >> alphabet). You might want to split the sequence at terminators, to >> get the open reading frames (and then filter them on length). Right >> now the Seq object doesn't have a split method so you would have to >> switch to using python strings (and then go back to a Biopython Seq >> object later if need be). 
>> > > Using pure python strings: > > str_seq = "SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ" > orf_str_list = str_seq.split("*") > > Using Biopython Seq objects: > > from Bio.Seq import Seq > from Bio.Alphabet import generic_protein > seq = Seq("SADKCNKADND*AKDNCDNADK*AK*NCAKNSHJ", generic_protein) > #I want to be able to do this: > orf_seq_list = seq.split("*") > #Right now I have to do something like this: > orf_seq_list = [Seq(x, generic_protein) for x in seq.tostring().split("*")] > > Another example of using a Seq object .split() method would be for > restriction enzymes (although the Bio.Restriction package should be > more general). > > >> Suppose you have some sequences which you have aligned in ClustalW, >> and most have leading or trailing gaps characters. e.g. Given >> "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet) >> you might want to strip off the leading and trailing gaps to have just >> "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now >> the Seq object doesn't have a strip method, so you would have to >> switch to a string and back again. 
>> > > Using pure python strings: > > long_seq_str = "---SAD-KCNKADND---" > trimmed_seq_str = long_seq_str.strip("-") > > Using Biopython Seq objects: > > from Bio.Seq import Seq > from Bio.Alphabet import generic_protein > long_seq = Seq("---SAD-KCNKADND---", generic_protein) > #I want to be able to do this: > trimmed_seq = long_seq.strip("-") > #Right now, I have to do something like this: > trimmed_seq = Seq(long_seq.tostring().strip("-"), generic_protein) > > Another possible example is if you have some EST sequences and you > want to strip the poly A tail on the trailing end (right side), e.g > "ACACTGCAGCATCAGCAAAAAAA".rstrip("A") > > Peter > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > Hi, While I do like the idea, strip(), as defined here, is inconsistent with the Python string version. Python documentation: strip([chars]): "Return a copy of the string with the leading and trailing characters removed." Rather you should use an alternative word like compress to remove the said character from within a sequence. Bruce From biopython at maubp.freeserve.co.uk Fri Sep 26 21:13:19 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 22:13:19 +0100 Subject: [BioPython] Creating and traversing an ultrametric tree In-Reply-To: <73045cca0809261340s1ee6692fo8834d2261d3a5ab@mail.gmail.com> References: <73045cca0809231713v219c3ec3tfc24461c7af6b453@mail.gmail.com> <320fb6e00809240200y144500cbl86f9023cb868da89@mail.gmail.com> <73045cca0809241132x30bc4d63t7ac0b9967a20e76c@mail.gmail.com> <320fb6e00809241326i16a337das844f4ac74766b459@mail.gmail.com> <73045cca0809261340s1ee6692fo8834d2261d3a5ab@mail.gmail.com> Message-ID: <320fb6e00809261413o1e4da682s6968aded56e6bd14@mail.gmail.com> On Fri, Sep 26, 2008 at 9:40 PM, aditya shukla wrote: > > Hello, Thanks for the help i appreciate it. 
> > I got Thomas Mailund's newick module to work with the kind of files I > have (they are in extended newick format). I want to calculate > distances between the inner nodes, the current module doesn't do that. Do > you have any idea of how to accomplish this? Using Thomas Mailund's module, I think you would have to identify the shared parent node, then add up the branch lengths on the two branches. You can do it - but it's a little tricky. When I last did any work on trees like this, I think I tended to use recursive functions, but I don't have any examples handy. Peter From biopython at maubp.freeserve.co.uk Fri Sep 26 21:22:48 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 22:22:48 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <48DD2DE6.10908@gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> Message-ID: <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> >>> Suppose you have some sequences which you have aligned in ClustalW, >>> and most have leading or trailing gaps characters. e.g. Given >>> "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet) >>> you might want to strip off the leading and trailing gaps to have just >>> "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now >>> the Seq object doesn't have a strip method, so you would have to >>> switch to a string and back again. >> >> Using pure python strings: >> >> long_seq_str = "---SAD-KCNKADND---" >> trimmed_seq_str = long_seq_str.strip("-") This gives "SAD-KCNKADND", it does NOT remove the internal "-" character.
>> Using Biopython Seq objects: >> >> from Bio.Seq import Seq >> from Bio.Alphabet import generic_protein >> long_seq = Seq("---SAD-KCNKADND---", generic_protein) >> #I want to be able to do this: >> trimmed_seq = long_seq.strip("-") >> #Right now, I have to do something like this: >> trimmed_seq = Seq(long_seq.tostring().strip("-"), generic_protein) This gives Seq("SAD-KCNKADND", ProteinAlphabet()), i.e. it would NOT remove the internal "-" character. > While I do like the idea, strip(), as defined here, is inconsistent with the > Python string version. Python documentation: strip([chars]): "Return a > copy of the string with the leading and trailing characters removed." My proposed Seq strip method is intended to behave EXACTLY like the python string method apart from the default strip characters (I would suggest defaulting to the gap character rather than white space). My proposed implementation even calls the python string strip method internally. Have another look at the suggested code: http://bugzilla.open-bio.org/show_bug.cgi?id=2596 > Rather you should use an alternative word like compress to remove the said > character from within a sequence. I suspect you have misunderstood my intention. My Seq object .strip() method would NOT remove the given characters from the interior of the sequence - only from the ends. However, there is certainly a case for wanting an .ungap() method for the Seq class (or a more general method to remove all of a particular character), but I hadn't intended to raise this issue yet.
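To make the strip versus ungap distinction concrete, here is a small sketch using plain Python strings only (no Seq object involved). The ungap function is a hypothetical stand-in for the method mentioned above, not existing Biopython code.

```python
def ungap(sequence, gap="-"):
    """Remove every gap character, including internal ones (hypothetical helper)."""
    return sequence.replace(gap, "")


aligned = "---SAD-KCNKADND---"
ends_trimmed = aligned.strip("-")  # only leading and trailing gaps are removed
all_removed = ungap(aligned)       # the internal gap is removed as well
print(ends_trimmed)
print(all_removed)
```

The first result keeps the internal "-" while the second does not, which is the difference being discussed.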
Peter From bsouthey at gmail.com Fri Sep 26 21:53:35 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 26 Sep 2008 16:53:35 -0500 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> Message-ID: <48DD59DF.1000504@gmail.com> Peter wrote: >>>> Suppose you have some sequences which you have aligned in ClustalW, >>>> and most have leading or trailing gaps characters. e.g. Given >>>> "---SAD-KCNKADND---" (as a Seq object with a gapped protein alphabet) >>>> you might want to strip off the leading and trailing gaps to have just >>>> "SAD-KCNKADND" (as a Seq object with the same alphabet). Right now >>>> the Seq object doesn't have a strip method, so you would have to >>>> switch to a string and back again. >>>> >>> Using pure python strings: >>> >>> long_seq_str = "---SAD-KCNKADND---" >>> trimmed_seq_str = long_seq_str.strip("-") >>> > > This gives "SAD-KCNKADND", it does NOT remove the internal "-" character. > > >>> Using Biopython Seq objects: >>> >>> from Bio.Seq import Seq >>> from Bio.Alphabet import generic_protein >>> long_seq = Seq("---SAD-KCNKADND---", generic_protein) >>> #I want to be able to do this: >>> trimmed_seq = long_seq.strip("-") >>> #Right now, I have to do something like this: >>> trimmed_seq = Seq(long_seq.tostring().strip("-"), generic_protein) >>> > > This gives Seq("SAD-KCNKADND", ProteinAlphabet()), i.e. it would NOT > remove the internal "-" character. > > >> While I do like the idea, strip(), as defined here, is inconsistent with the >> Python string version. 
Python documentation: strip([chars]): "Return a >> copy of the string with the leading and trailing characters removed." >> > > My intended Seq strip method is intended EXACTLY like the python > string apart from the default strip characters (except I would suggest > defaulting to the gap character rather than white space). My proposed > implementation even calls the python string strip method internally. > Have another look at the suggested code: > http://bugzilla.open-bio.org/show_bug.cgi?id=2596 > > >> Rather you should use an alternative word like compress to remove the said >> character from within a sequence. >> > > I suspect you have misunderstood my intension. My Seq object .strip() > method would NOT remove the given characters from the interior of the > sequence - only from the ends. > > However, there is certainly a case for wanting an .ungap() method for > the Seq class (or a more general method to remove all of a particular > character), but I hadn't intended to raise this issue yet. > > Peter > > Yes, sorry about that. I misunderstood because I confused myself with the first part that uses the split. Bruce From biopython at maubp.freeserve.co.uk Fri Sep 26 21:57:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 22:57:31 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <48DD59DF.1000504@gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> Message-ID: <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> >> I suspect you have misunderstood my intension. 
My Seq object .strip() >> method would NOT remove the given characters from the interior of the >> sequence - only from the ends. >> >> However, there is certainly a case for wanting an .ungap() method for >> the Seq class (or a more general method to remove all of a particular >> character), but I hadn't intended to raise this issue yet. >> >> Peter > > Yes, sorry about that. I misunderstood because I confused myself with the > first part that uses the split. > > Bruce Fair enough - maybe I shouldn't have tackled both methods in one email... but I'm glad we cleared that up. Anyway - do think adding the split and strip methods to the Seq object is worthwhile? Peter From bsouthey at gmail.com Sat Sep 27 01:55:18 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 26 Sep 2008 20:55:18 -0500 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> Message-ID: On Fri, Sep 26, 2008 at 4:57 PM, Peter wrote: >>> I suspect you have misunderstood my intension. My Seq object .strip() >>> method would NOT remove the given characters from the interior of the >>> sequence - only from the ends. >>> >>> However, there is certainly a case for wanting an .ungap() method for >>> the Seq class (or a more general method to remove all of a particular >>> character), but I hadn't intended to raise this issue yet. >>> >>> Peter >> >> Yes, sorry about that. I misunderstood because I confused myself with the >> first part that uses the split. 
>> >> Bruce > > Fair enough - maybe I shouldn't have tackled both methods in one > email... but I'm glad we cleared that up. > > Anyway - do think adding the split and strip methods to the Seq object > is worthwhile? > > Peter > Yes - in fact probably essential now that many users are likely to have to and want to parse genome sequences. I really would like to see many of the sequence methods 'work' in the same manner as Python string methods. The string methods that I use a lot for sequences are: strip split join find (I don't use the 'l' and 'r' versions very much.) So you would address the first two. I do something like your ungap() idea with strings using join: >>> ''.join(sequence.split('-')) Python 2.5 introduced 'partition(sep): Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator'. While I don't use it (because I usually split multiple times) it has advantages if you are looking for the first occurrence of a pattern: >>> a='GTATGCGTAATG' >>> a.partition('ATG') ('GT', 'ATG', 'CGTAATG') Regards Bruce From biopython at maubp.freeserve.co.uk Sat Sep 27 12:57:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 27 Sep 2008 13:57:41 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> Message-ID: <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> > >> Anyway - do think adding the split and strip methods to the Seq object >> is worthwhile?
> > Yes - in fact probably essential now that many users are likely to have to > and want to parse genome sequences. > > I really would like to see many of the sequence methods 'work' in the > same manner as Python string methods. The string methods that I use a lot > for sequences are: > strip > split > join > find > > (I don't use the 'l' and 'r' versions very much.) > So you would address the first two. I was planning to deal with strip and split first, and then move on to discuss the remaining string methods. No one has objected to adding strip and split (plus lstrip and rstrip) so if we take that as a consensus, the only point we should still debate is their default arguments. Other alternatives to what I have already put forward include following the python string and defaulting to white space (which would never normally be present in a sequence), or making the arguments non optional. Adding the join method is more complicated as regards the alphabet of the sequence and the list of sequences given (which could be strings or Seq objects) - but in principle I think we should support it. I'd prefer to leave this one till last! Adding support for find should be straightforward. > I do something like your ungap() idea with strings using join: > >>> ''.join(sequence.split('-')) That would work but to get a sensible alphabet forces a much longer version with Seq objects - something like this: Seq("", generic_protein).join(my_seq.split("-")) Having my_seq.ungap() or my_seq.ungap("-") would in my opinion be much clearer for the reader, plus the ungap method would also be able to amend the alphabet appropriately. > Python 2.5 introduced 'partition(sep): Split the string at the first > occurrence of sep, and return a 3-tuple containing the part before the > separator, the separator itself, and the part after the separator'.
> While I don't use it (because I usually split multiple times) it has > advantages if you are looking for the first occurrence of a patten: >>>> a='GTATGCGTAATG' >>>> a.partition('ATG') > ('GT', 'ATG', 'CGTAATG') Thanks for pointing that out. I hadn't noticed the addition of the partition method to python - until recently my main machine ran python 2.4 (and even now I still use python 2.3 on some occasions). However, we could still add a partition method to the Seq object, but wouldn't be able to take advantage of the string implementation on the older versions of python. Peter From bsouthey at gmail.com Sun Sep 28 01:06:54 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Sat, 27 Sep 2008 20:06:54 -0500 Subject: [BioPython] More string methods for the Seq object In-Reply-To: <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <48DD0EF9.80204@ribosome.natur.cuni.cz> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> Message-ID: On Sat, Sep 27, 2008 at 7:57 AM, Peter wrote: >> >>> Anyway - do think adding the split and strip methods to the Seq object >>> is worthwhile? >> >> Yes - in fact probably essential now many users are likely to have to >> and want to parse genome sequences. >> >> I really would like to see many of the sequence methods 'work' in the >> same manner Python string methods. The string methods that I use a lot >> for sequences are: >> strip >> split >> join >> find >> >> (I don't the 'l' and 'r' versions very much.) >> So you would address the first two. 
> > I was planning deal with strip and split first, and then move on to > discuss the remaining string methods. > > No one has objected to adding strip and split (plus lstrip and rstrip) > so if we take that as a consensus, the only point we should still > debate is their default arguments. Other alternatives to what I have > already put forward include following the python string and defaulting > to white space (which would never normally be present in a sequence), > or making the arguments non optional. I do agree especially in terms of attempting to keep the standard Python defaults and behavior. > > Adding the join method is more complicated as regards the alphabet of > the sequence and the list of sequences given (which could be strings > or Seq objects) - but in principle I think we should support it. I'd > prefer to leave this one till last! Well I have the view that if this is easier to do now then it should be done now. > > Adding support for find should be straight forward. This would be great to have. > >> I do something like your ungap() idea with strings using join: >>>>> ''.join(sequence.split('-')) > > That would work but to get a sensible alphabet forces a much longer > version with Seq objects - something like this: > Seq("", generic_protein).join(my_seq.split("-")) > > Having my_seq.ungap() or my_seq.ungap("-") would in my opinion be much > clearer for the reader, plus the ungap method would also be able to > amend the alphabet appropriately. I do agree and the terminology is appropriate. > >> Python 2.5 introduced 'partition(sep): Split the string at the first >> occurrence of sep, and return a 3-tuple containing the part before the >> separator, the separator itself, and the part after the separator'. 
>> While I don't use it (because I usually split multiple times) it has >> advantages if you are looking for the first occurrence of a patten: >>>>> a='GTATGCGTAATG' >>>>> a.partition('ATG') >> ('GT', 'ATG', 'CGTAATG') > > Thanks for pointing that out. I hadn't noticed the addition of the > partition method to python - until recently my main machine ran python > 2.4 (and even now I still use python 2.3 on some occasions). However, > we could still add a partition method to the Seq object, but wouldn't > be able to take advantage of the string implementation on the older > versions of python. > The real question is would this functionality be sufficiently useful to justify it? I can see that it is useful for very special cases like open reading frames but I do not think that this is sufficient. Regards Bruce From biopython at maubp.freeserve.co.uk Mon Sep 29 11:50:21 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Sep 2008 12:50:21 +0100 Subject: [BioPython] More string methods for the Seq object In-Reply-To: References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com> <320fb6e00809260942m611a6afdxa7c08b9130bdb587@mail.gmail.com> <320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com> <48DD2DE6.10908@gmail.com> <320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com> <48DD59DF.1000504@gmail.com> <320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com> <320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com> Message-ID: <320fb6e00809290450m6fedbaacu15a75107e5c39658@mail.gmail.com> >> No one has objected to adding strip and split (plus lstrip and rstrip) >> so if we take that as a consensus, the only point we should still >> debate is their default arguments. Other alternatives to what I have >> already put forward include following the python string and defaulting >> to white space (which would never normally be present in a sequence), >> or making the arguments non optional. 

Listing out those explicitly, for the proposed Seq object methods:

(a) strip, lstrip and rstrip default to the gap character. split
argument is mandatory.
(b) strip, lstrip, rstrip and split all default to white space
(following string default)
(c) strip, lstrip, rstrip and split all have a mandatory argument
(explicit better than implicit)

> I do agree especially in terms of attempting to keep the standard
> Python defaults and behavior.

I assume you are agreeing with what I have now called (b): follow the
string defaults of white space for stripping or splitting (for
consistency, even though this won't typically be useful for
sequences). On balance this would probably be best from a principle
of consistency and least surprise for the user - I'll update the
patches.

Peter

From biopython at maubp.freeserve.co.uk Mon Sep 29 12:06:17 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Sep 2008 13:06:17 +0100
Subject: [BioPython] More string methods for the Seq object
In-Reply-To: <320fb6e00809290450m6fedbaacu15a75107e5c39658@mail.gmail.com>
References: <320fb6e00809260859r23c7915buc114c5c0b71e195@mail.gmail.com>
	<320fb6e00809260952i2f3485abp715797e7120147e9@mail.gmail.com>
	<48DD2DE6.10908@gmail.com>
	<320fb6e00809261422n6e4c4889p734508613898cc3f@mail.gmail.com>
	<48DD59DF.1000504@gmail.com>
	<320fb6e00809261457j65dc0876hd59d17aee01bc983@mail.gmail.com>
	<320fb6e00809270557n73b81b5ayb93fe85f0f466626@mail.gmail.com>
	<320fb6e00809290450m6fedbaacu15a75107e5c39658@mail.gmail.com>
Message-ID: <320fb6e00809290506p8aa2b51p4901b693ebb268bf@mail.gmail.com>

> I assume you [Bruce] are agreeing with ... follow[ing] the
> string defaults of white space for stripping or splitting (for
> consistency, even though this won't typically be useful for
> sequences). On balance this would probably be best from
> a principle of consistency and least surprise for the user -
> I'll update the patches.

New patch for Seq object split, strip, lstrip and rstrip methods on
Bug 2596 which follows the python string defaults (splitting on or
stripping of white space).

http://bugzilla.open-bio.org/show_bug.cgi?id=2596

If everyone is happy with this, I can extend the unit tests and
documentation too.

Peter

From lpritc at scri.ac.uk Mon Sep 29 12:15:11 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 29 Sep 2008 13:15:11 +0100
Subject: [BioPython] More string methods for the Seq object
In-Reply-To: <320fb6e00809290450m6fedbaacu15a75107e5c39658@mail.gmail.com>
Message-ID:

+1 for following the Python default.

A less straightforward implementation might use a knowledge of the
Alphabet for the Seq to strip out non-Alphabet characters, such as gaps,
but I think that this, along with (a) and (c), would be overkill. I like
the principle of least surprise in this case.

On 29/09/2008 12:50, "Peter" wrote:

>>> No one has objected to adding strip and split (plus lstrip and rstrip)
>>> so if we take that as a consensus, the only point we should still
>>> debate is their default arguments. Other alternatives to what I have
>>> already put forward include following the python string and defaulting
>>> to white space (which would never normally be present in a sequence),
>>> or making the arguments non-optional.
>
> Listing out those explicitly, for the proposed Seq object methods:
> (a) strip, lstrip and rstrip default to the gap character. split
> argument is mandatory.
> (b) strip, lstrip, rstrip and split all default to white space
> (following string default)
> (c) strip, lstrip, rstrip and split all have a mandatory argument
> (explicit better than implicit)
>
>> I do agree especially in terms of attempting to keep the standard
>> Python defaults and behavior.
>
> I assume you are agreeing with what I have now called (b), follow the
> string defaults of white space for stripping or splitting (for
> consistency, even though this won't typically be useful for
> sequences). On balance this would probably be best from a principle
> of consistency and least surprise for the user - I'll update the
> patches.
>
> Peter

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
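
For readers following the thread, here is a minimal sketch of the plain
Python string behaviour that options (a), (b) and (c) are being compared
against. It uses only built-in str methods, not the Seq object or the
patch on Bug 2596. The example sequence and the strip_gaps helper are
assumptions made up for illustration; they are not part of Biopython.

```python
# Plain Python str only; no Biopython Seq object is involved here.
gapped = "--MKV-LA--"  # made-up gapped sequence for illustration

# Option (b): the str defaults act on white space, so a gapped
# sequence containing no white space comes back unchanged.
default_strip = gapped.strip()         # '--MKV-LA--'
default_split = gapped.split()         # ['--MKV-LA--']

# Option (c) style: the caller always names the character explicitly.
explicit_strip = gapped.strip("-")     # 'MKV-LA'
explicit_lstrip = gapped.lstrip("-")   # 'MKV-LA--'
explicit_rstrip = gapped.rstrip("-")   # '--MKV-LA'
explicit_split = gapped.split("-")     # ['', '', 'MKV', 'LA', '', '']


def strip_gaps(sequence, chars="-"):
    """Hypothetical helper showing option (a): default to the gap character."""
    return sequence.strip(chars)


# The join/split idiom mentioned earlier in the thread for removing gaps.
ungapped = "".join(gapped.split("-"))  # 'MKVLA'

# str.partition (Python 2.5 and later) splits at the first occurrence only.
before, sep, after = "GTATGCGTAATG".partition("ATG")
```

With option (b), code that wants gaps removed has to pass "-" itself,
which is the trade-off accepted above in return for matching what
users already expect from strings.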