From hoffman at ebi.ac.uk  Mon Mar  1 08:27:19 2004
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] Bio.SeqIO.FASTA fix
Message-ID:

Bio.SeqIO.FASTA.FastaReader.next() will return a SeqRecord that has the id attribute set to either a list or a string, depending on how many words are in the definition line (a list if there is one word; a string if there is more than one word!). This is a fix so that it will always be a string. OK to check in?

Index: FASTA.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/SeqIO/FASTA.py,v
retrieving revision 1.6
diff -u -r1.6 FASTA.py
--- FASTA.py	11 Apr 2003 20:04:54 -0000	1.6
+++ FASTA.py	1 Mar 2004 13:34:58 -0000
@@ -32,7 +32,7 @@
         # description. If there's only one word, it's the id.
         x = string.split(line[1:].rstrip(), None, 1)
         if len(x) == 1:
-            id = x
+            id = x[0]
             desc = ""
         else:
             id, desc = x

-- 
Michael Hoffman
European Bioinformatics Institute

From chapmanb at uga.edu  Mon Mar  1 13:52:13 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] Bio.SeqIO.FASTA fix
In-Reply-To:
References:
Message-ID: <20040301185213.GO24150@evostick.agtec.uga.edu>

Hi Michael;

> Bio.SeqIO.FASTA.FastaReader.next() will return a SeqRecord that has
> the id attribute set to either a list or a string, depending on how
> many words are in the definition line (a list if there is one word; a
> string if there is more than one word!). This is a fix so that it
> will always be a string.

Cool. I'm glad that you are using the SeqIO stuff -- I must admit that I don't use it much myself, and it's great to have someone looking after it. Please go ahead and check-in away (now, that can't be proper English phrasing) with impunity. Thanks -- always so glad to have the fixes.
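The list-versus-string behavior being fixed here is easy to see with plain string splitting; a minimal standalone sketch (the helper name `parse_defline` is invented for illustration, not part of Bio.SeqIO):

```python
# Mimic the definition-line parsing in Bio.SeqIO.FASTA.FastaReader.
# split(None, 1) always returns a LIST; the bug was assigning the whole
# one-element list to `id` instead of its first element.

def parse_defline(line):
    """Split a FASTA '>' line into (id, description) -- both strings."""
    x = line[1:].rstrip().split(None, 1)
    if len(x) == 1:
        return x[0], ""   # x[0], not x: keep id a plain string
    return x[0], x[1]

print(parse_defline(">seq1"))               # one word on the line
print(parse_defline(">seq1 some protein"))  # id plus description
```

With the unpatched `id = x`, the one-word case would hand back `["seq1"]` instead of `"seq1"`, so downstream code comparing or printing ids would see a list.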
Brad

From mcolosimo at mitre.org  Wed Mar 10 15:12:47 2004
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] Problem with MutableSeq
Message-ID: <4C6EBA4C-72CF-11D8-8096-000A95A5D8B2@mitre.org>

I've been banging my head against my monitor over this for a while. Here is the problem (using stuff from ).

I want to reverse my DNA Seq object, so I did this:

mut_seq = my_seq.tomutable()
mut_seq.reverse()
my_seq = mut_seq

I thought these behaved the same (silly me). Later on I translate it; however, I get a TypeError! I had to pull out the code to see what the hell was going on, because print my_seq looks fine.

The problem is that MutableSeq.data is an array whereas Seq.data is a real string. So when you do this:

s = my_seq.data
n = len(s)
for i in range(0, n-n%3, 3):
    print s[i:i+3]

for Seq it prints codons like CGC, but for MutableSeq it prints things like array('c', 'ACG') -- which is what causes the TypeError!!! It should be put in the docs that you need to call the lonely method .toseq() to get back a real sequence. Or change MutableSeq.data to MutableSeq.array_data and make MutableSeq.data a string.

From idoerg at burnham.org  Fri Mar 12 16:45:57 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] reduced alphabets
Message-ID: <40522F95.40703@burnham.org>

Hi all,

I am thinking of incorporating reduced alphabets into biopython. Reduced (or redundant) alphabets are used to represent protein sequences in an alternative alphabet which lumps together several amino acids into one letter, based on physico-chemical traits. For example, all the aliphatics (I, L, V) are usually quite interchangeable, so many sequence studies lump them into one letter. We don't have that, do we?

This can also be applied to DNA, although I have only heard of a 4->2 reduction (to purines & pyrimidines), and it is usually less useful.
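The I/L/V lumping described above amounts to a simple letter-translation table; a toy sketch on plain strings (the grouping letter "J" is invented for illustration, not one of the published alphabets):

```python
# A toy reduced-alphabet translation: I, L and V collapse to one letter.
# The table below is illustrative only, not a published reduction scheme.
reduction_table = {"I": "J", "L": "J", "V": "J"}  # J = "aliphatic"

def reduce_string(seq, table):
    """Map each residue through the table; unlisted letters pass through."""
    return "".join(table.get(letter, letter) for letter in seq)

print(reduce_string("MIVLK", reduction_table))  # -> "MJJJK"
```

A real implementation would carry an Alphabet object along with the reduced string, which is what the proposed `reduce_sequence` does.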
You can see examples of reduced alphabets here:

http://viscose.ifg.uni-muenster.de/html/alphabets.html

I was thinking of making additions in two places:

1) In util.py I will add a function "reduce_sequence":

def reduce_sequence(seq, reduction_table, new_alphabet=None):
    """Given an amino-acid sequence, return it in reduced alphabet form,
    based on the letter-translation table passed.

    seq: a Seq.Seq type sequence
    reduction_table: a dictionary whose keys are the "from" alphabet,
    and values are the "to" alphabet"""
    if new_alphabet is None:
        new_alphabet = Alphabet.single_letter_alphabet
        new_alphabet.letters = ''
        for letter in reduction_table:
            new_alphabet.letters += letter
        new_alphabet.size = len(new_alphabet.letters)
    new_seq = Seq.Seq('', new_alphabet)
    for letter in seq:
        new_seq += reduction_table[letter]
    return new_seq

******************

2) In Bio.Alphabets I will:

2.1) add a module with some dictionaries mapping the 20- and 23-letter amino acid alphabets to "brand name" reduced alphabets, and

2.2) add another module, along the lines of IUPAC.py, with the brand-name alphabets as instances of SingleLetterAlphabet.

Comments, suggestions?

Thanks,

./I

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From idoerg at burnham.org  Mon Mar 15 17:37:28 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] why buffer in utils.count_monomers?
Message-ID: <40563028.2010905@burnham.org>

Hi,

I tried the following (CVS version from last Friday, Python 2.2):

>>> l = Seq.Seq('MINAIRTPDQRFSNLDQYPFSPNYLDDLPGYPGLRAHYLDEGNSDAEDVFLCLHGEPTWS', IUPAC.protein)
>>> utils.count_monomers(l)
Traceback (most recent call last):
  File "", line 1, in ?
  File "/home/iddo/biopy_cvs/biopython/Bio/utils.py", line 64, in count_monomers
    dict[c] = string.count(s, c)
  File "/usr/lib/python2.2/string.py", line 161, in count
    return s.count(*args)
AttributeError: 'buffer' object has no attribute 'count'

When I replaced the variable "s" in line 64 of utils.py with "seq.data", everything worked fine. In line 62, "s" is defined as:

s = buffer(seq.data)

Does that serve a purpose? Can we do without it (meaning I deposit the bugfix), or is it important?

thanks,

Iddo

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From hoffman at ebi.ac.uk  Mon Mar 15 17:55:22 2004
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] why buffer in utils.count_monomers?
In-Reply-To: <40563028.2010905@burnham.org>
References: <40563028.2010905@burnham.org>
Message-ID:

On Mon, 15 Mar 2004, Iddo Friedberg wrote:

> when I replaced the variable "s" in line 64 in utils.py with "seq.data"
> everything worked fine. In line 62 "s" is defined as:
> s=buffer(seq.data)
>
> Does that serve a purpose? Can we do without it, (meaning I deposit the
> bugfix) or is it important?

There is a comment that says that using a buffer makes the function work for both strings and arrays. Of course right now it works for neither...
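The fix Iddo describes -- counting on seq.data directly instead of through a buffer() wrapper -- reduces to ordinary str.count; a standalone sketch:

```python
def count_monomers(data, letters):
    """Tally each letter of the alphabet in the raw sequence string.

    Operating on the plain string (seq.data) sidesteps the buffer()
    wrapper, which has no .count() method and raised the AttributeError.
    """
    return dict((c, data.count(c)) for c in letters)

print(count_monomers("GATTACA", "ACGT"))
```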
-- Michael Hoffman European Bioinformatics Institute From idoerg at burnham.org Tue Mar 16 17:53:42 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] Reduced alphabets Message-ID: <40578576.60309@burnham.org> Hi, Thanks to overwhelming demand (well, nobody really objected ;) biopython now has the rudimentaries for handling reduced alphabets. I committed the following changes: 1) in Bio.utils I added reduce_sequence(seq, reduction_table, new_alphabet=None) 2) in Alphabet, I added Reduce.py, which has reduction tables, and reduced alphabet definitions + literature citations Enjoy, Iddo -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo From chapmanb at uga.edu Wed Mar 17 20:13:41 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] Problem with MutableSeq In-Reply-To: <4C6EBA4C-72CF-11D8-8096-000A95A5D8B2@mitre.org> References: <4C6EBA4C-72CF-11D8-8096-000A95A5D8B2@mitre.org> Message-ID: <20040318011341.GB99271@evostick.agtec.uga.edu> Hi Marc; Thanks for the feedback on the documentation and MutableSeq. Sorry for the delay in responding -- I've been out of town and am just getting myself back together. > I've been banging my head against my monitor over this for awhile. Here > is the problem (using stuff from > ) > > I want to reverse my DNA Seq object, so I did this: > > mut_seq = my_seq.tomutable() > mut_seq.reverse() > my_seq = mut_seq > > I thought these behaved the same (silly me). Later on I translate it, > however, I get a TypeError! > > I had to pull out the code to see what the hell was going on because > print my_seq looks fine. > > The problem is that MutableSeq.data is an array whereas Seq.data is > real data. So when you do this: [...] > which is a TypeError!!! 
This should be put in the doc that you need to > call the lonely method .toseq to get back a real sequence. Or change > MutableSeq.data to MutableSeq.array_data and make MutableSeq.data a > string. Yes, I definitely agree that this is confusing. When Andrew implemented MutableSeq it uses the array to represent the sequence instead of strings, as you correctly point out. This does confuse things because many of the functions that deal with sequences aren't set up to deal with both arrays and strings for the data object. I do think the real answer to fix the problem is to adjust the docs so they make this clear in the transition between the mutable seq part and the translation part. I am currently trying to re-do documentation into a more small sized cookbook format, to make it easier to maintain and update (and for people to contribute :-). I will try to put it on my list to pull out this section and update it to avoid this kind of problem. Sorry for the frustration and thanks for sharing your experience so we can make the documentation better. Brad From chapmanb at uga.edu Wed Mar 17 20:25:46 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] Reduced alphabets In-Reply-To: <40578576.60309@burnham.org> References: <40578576.60309@burnham.org> Message-ID: <20040318012546.GD99271@evostick.agtec.uga.edu> Hi Iddo; > Thanks to overwhelming demand (well, nobody really objected ;) biopython > now has the rudimentaries for handling reduced alphabets. I committed > the following changes: > > 1) in Bio.utils I added > > reduce_sequence(seq, reduction_table, new_alphabet=None) > > 2) in Alphabet, I added Reduce.py, which has reduction tables, and > reduced alphabet definitions + literature citations Thanks for this. Sorry I didn't have a chance to weigh in earlier, but I was out of town (actually on Biopython business). 
Everything looks good -- my only suggestion would be to add a bit more documentation to the modules, specifically Alphabet/Reduced.py. I think just copying and pasting the relevant bits from your original e-mail to a doc-string at the top would be a real help for someone searching around and saying "wellllll...what do we have here." Other than that, all good. Thanks for the fix on count_monomers -- I do think that's the right thing to do. We should really discourage using MutableSeqs (which is where the array stuff comes from) on for anything besides, well, mutating them -- so this fix is fine. Thanks for the contribution! Brad From idoerg at burnham.org Wed Mar 17 20:52:08 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] Reduced alphabets In-Reply-To: <20040318012546.GD99271@evostick.agtec.uga.edu> References: <40578576.60309@burnham.org> <20040318012546.GD99271@evostick.agtec.uga.edu> Message-ID: <405900C8.6070106@burnham.org> Hi Brad, Document my code? Why do you think they call it "code"? OK, I'll do it. Tomorrow. After I recover from the green beer we are about to consume as part of a farewell party+St. Patrick's day lab night on the town.... ./I Brad Chapman wrote: > Hi Iddo; > > >>Thanks to overwhelming demand (well, nobody really objected ;) biopython >>now has the rudimentaries for handling reduced alphabets. I committed >>the following changes: >> >>1) in Bio.utils I added >> >> reduce_sequence(seq, reduction_table, new_alphabet=None) >> >>2) in Alphabet, I added Reduce.py, which has reduction tables, and >>reduced alphabet definitions + literature citations > > > Thanks for this. Sorry I didn't have a chance to weigh in earlier, > but I was out of town (actually on Biopython business). > > Everything looks good -- my only suggestion would be to add a bit > more documentation to the modules, specifically Alphabet/Reduced.py. 
> I think just copying and pasting the relevant bits from your > original e-mail to a doc-string at the top would be a real help for > someone searching around and saying "wellllll...what do we have > here." > > Other than that, all good. Thanks for the fix on count_monomers -- I > do think that's the right thing to do. We should really discourage > using MutableSeqs (which is where the array stuff comes from) on > for anything besides, well, mutating them -- so this fix is fine. > > Thanks for the contribution! > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev > > -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo From jeffrey_chang at stanfordalumni.org Mon Mar 22 11:10:40 2004 From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] Fwd: Auto-discard notification Message-ID: <768C25C8-7C1B-11D8-B2FD-000A956845CE@stanfordalumni.org> Forwarding a message that was discarded by the spam filter... Jeff > From: Yair Benita > Date: March 22, 2004 9:35:50 AM EST > To: > Subject: Saving an HMM > > Hi All, > I have been playing with the HMM in biopython and I am happy to say it > actually works. However, it takes me about 30 minutes to train the > model and > I would like to save the trained HMM so that I can use it to predict > whenever I need to. Any idea how I can save the trained model? 
> Thanks, > Yair > -- > Yair Benita > Pharmaceutical Proteomics > Faculty of Pharmacy > Utrecht University From dyoo at hkn.eecs.berkeley.edu Mon Mar 22 21:18:22 2004 From: dyoo at hkn.eecs.berkeley.edu (Danny Yoo) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] Fwd: Auto-discard notification In-Reply-To: <768C25C8-7C1B-11D8-B2FD-000A956845CE@stanfordalumni.org> Message-ID: > > I have been playing with the HMM in biopython and I am happy to say it > > actually works. However, it takes me about 30 minutes to train the > > model and I would like to save the trained HMM so that I can use it to > > predict whenever I need to. Any idea how I can save the trained model? Hi Yair, I haven't tested this yet, but if the HMM is in pure Python (as it appears to be!), then the 'pickle' or 'shelve' modules from the Standard Library may be able to store the HMM. Here's a link to the documentation: http://www.python.org/doc/lib/module-shelve.html http://www.python.org/doc/lib/module-pickle.html Good luck! From bugzilla-daemon at portal.open-bio.org Mon Mar 22 21:55:24 2004 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] [Bug 1605] New: kMeans.py should be deprecated Message-ID: <200403230255.i2N2tO12025207@portal.open-bio.org> http://bugzilla.bioperl.org/show_bug.cgi?id=1605 Summary: kMeans.py should be deprecated Product: Biopython Version: 1.24 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev@biopython.org ReportedBy: jchang@biopython.org The functionality is present in Bio.Cluster, so this is now duplicated code. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
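Danny's pickle suggestion a little further up can be sketched as follows (TrainedModel here is a stand-in class, not the Biopython HMM type):

```python
import os
import pickle
import tempfile

class TrainedModel:
    """Stand-in for an expensively trained object such as an HMM."""
    def __init__(self, params):
        self.params = params

model = TrainedModel({"transition_prob": 0.9})
path = os.path.join(tempfile.gettempdir(), "hmm.pickle")

# Save once, right after the long training run...
with open(path, "wb") as f:
    pickle.dump(model, f)

# ...then reload instantly whenever predictions are needed.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.params)
```

One caveat worth noting: a pickle stores a reference to the class, not the class itself, so the module defining the trained object must still be importable when the model is loaded back.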
From mcolosimo at mitre.org  Wed Mar 24 15:28:29 2004
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] BioSQL bugs
Message-ID:

First, I've added support for pgdb to DBUtils and did some testing; the diff is at the end.

Second, the fix for taxon doesn't work. The problem is that it tries to enter NULLs for fields that are required to be unique.

BioSQL.Loader, line 188:

parent_taxon_id = None
for taxon in lineage:
    self.adaptor.execute(
        "INSERT INTO taxon(parent_taxon_id, ncbi_taxon_id, node_rank,"\
        " left_value, right_value)" \
        " VALUES (%s, %s, %s, %s, %s)", (parent_taxon_id, taxon[0],
                                         taxon[1], left_value,
                                         right_value))

This might work the first time, but since parent_taxon and others need to be unique, this fails. I don't know a simple solution for this, except to give up and not put in a taxon_id (which isn't required for a bioentry).

Index: DBUtils.py
===================================================================
RCS file: /home/repository/biopython/biopython/BioSQL/DBUtils.py,v
retrieving revision 1.2
diff -r1.2 DBUtils.py
37c37
< class Pg_dbutils(Generic_dbutils):
---
> class Psycopg_dbutils(Generic_dbutils):
54c54,75
< _dbutils["psycopg"] = Pg_dbutils
---
> _dbutils["psycopg"] = Psycopg_dbutils
> 
> class Pgdb_dbutils(Generic_dbutils):
>     def next_id(self, cursor, table):
>         table = self.tname(table)
>         sql = r"select nextval('%s_pk_seq')" % table
>         cursor.execute(sql)
>         rv = cursor.fetchone()
>         return rv[0]
> 
>     def last_id(self, cursor, table):
>         table = self.tname(table)
>         sql = r"select currval('%s_pk_seq')" % table
>         cursor.execute(sql)
>         rv = cursor.fetchone()
>         return rv[0]
> 
>     def autocommit(self, conn, y = True):
>         raise NotImplementedError("pgdb does not support this!")
> 
> _dbutils["pgdb"] = Pgdb_dbutils

From chapmanb at uga.edu  Wed Mar 24 16:35:53 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] BioSQL bugs
In-Reply-To:
References:
Message-ID:
<20040324213553.GL22666@evostick.agtec.uga.edu> Hi Marc; > First, I've added support for pgdb to DBUtils and did some testing the > diff is at the end. Thanks. I've just checked your patch in. The only problem I have is with the autocommit functionality. I dug around on mailing lists and the like and do see that PyGreSQL doesn't support anything like this -- however, do you have any ideas to make the Tests work without this type of functionality. The problem (as far as I can see it right now) is that if a connection is opened then you can't do DROPs (or CREATE?). However if you don't have an open connection, then you can't execute SQL so you can't do the DROPs either. So I guess maybe it's a catch 22 that really only affects the tests (where we need to do this annoying dropping and creating automatically), but do you (or anything) have any clever ideas to work around this so that the Tests will work? > Second the fix for taxon doesn't work. The problem > is that it tries to enter NULLs for fields that are required to be > unique. > > BioSQL.Loader > line 188 parent_taxon_id = None > for taxon in lineage: > self.adaptor.execute( > "INSERT INTO taxon(parent_taxon_id, ncbi_taxon_id, > node_rank,"\ > " left_value, right_value)" \ > " VALUES (%s, %s, %s, %s, %s)", (parent_taxon_id, > taxon[0], > taxon[1], > left_value, > right_value)) > > This might work the first time, but since parent_taxon and other need > to be unique this fails. I don't know a simple solution for this, > except to give up and not put in a taxon_id (which isn't required for a > bioentry). Okay, I was playing around with this and fixed it for a problem I was having (with non-unique right_values) in an ugly way which I'm sure is not right. 
My real problem is I don't understand the table:

CREATE TABLE taxon (
       taxon_id          INT(10) UNSIGNED NOT NULL auto_increment,
       ncbi_taxon_id     INT(10),
       parent_taxon_id   INT(10) UNSIGNED,
       node_rank         VARCHAR(32),
       genetic_code      TINYINT UNSIGNED,
       mito_genetic_code TINYINT UNSIGNED,
       left_value        INT(10) UNSIGNED,
       right_value       INT(10) UNSIGNED,
       PRIMARY KEY (taxon_id),
       UNIQUE (ncbi_taxon_id),
       UNIQUE (left_value),
       UNIQUE (right_value)
) TYPE=INNODB;

Okay, so the problem is that I have no idea what parent_taxon_id, left_value and right_value are. I assume that they are supposed to represent some kind of hierarchy of taxonomy. As near as I can figure, if you have a tree like:

A -> B -> C -> D
          |
          --> E

then this table would be filled in for C with parent_taxon_id being B's taxon_id, left_value being D's taxon_id and right_value being E's taxon_id.

Is this right at all, or am I completely confused? I can take a stab at this, but without really getting the table I've been stumped so far and just stare at it scratching my head.

Thanks for the work on BioSQL. Sorry if I am a bit (a lot) confused about things at the moment.

Brad

From mcolosimo at mitre.org  Fri Mar 26 10:48:07 2004
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] BioSQL bugs
Message-ID: <406450B7.6060401@mitre.org>

>Hi Marc;
>
>> First, I've added support for pgdb to DBUtils and did some testing;
>> the diff is at the end.
>
>Thanks. I've just checked your patch in. The only problem I have is
>with the autocommit functionality. I dug around on mailing lists and
>the like and do see that PyGreSQL doesn't support anything like this
>-- however, do you have any ideas to make the Tests work without
>this type of functionality.
>
>The problem (as far as I can see it right now) is that if a
>connection is opened then you can't do DROPs (or CREATE?). However
>if you don't have an open connection, then you can't execute SQL so
>you can't do the DROPs either.
So I guess maybe it's a catch 22 that >really only affects the tests (where we need to do this annoying >dropping and creating automatically), but do you (or anything) have >any clever ideas to work around this so that the Tests will work? > > I think bioperl makes use of functions (see the biosqldb-pg.sql). I was thinking about adding some of these function calls to the DBUtils section to speed up the transactions. Removing some of the constraints will increase the speed as the database grows. This code works fine for small sets, but it quickly slows down (probably because of the checks). >>/ Second the fix for taxon doesn't work. The problem >/>/ is that it tries to enter NULLs for fields that are required to be >/>/ unique. >/>/ >/>/ BioSQL.Loader >/>/ line 188 parent_taxon_id = None >/>/ for taxon in lineage: >/>/ self.adaptor.execute( >/>/ "INSERT INTO taxon(parent_taxon_id, ncbi_taxon_id, >/>/ node_rank,"\ >/>/ " left_value, right_value)" \ >/>/ " VALUES (%s, %s, %s, %s, %s)", (parent_taxon_id, >/>/ taxon[0], >/>/ taxon[1], >/>/ left_value, >/>/ right_value)) >/>/ >/>/ This might work the first time, but since parent_taxon and other need >/>/ to be unique this fails. I don't know a simple solution for this, >/>/ except to give up and not put in a taxon_id (which isn't required for a >/>/ bioentry). >/ >Okay, I was playing around with this and fixed it for a problem I >was having (with non-unique right_values) in an ugly way which I'm >sure is not right. 
> >My real problem is I don't understand the table: > >CREATE TABLE taxon ( > taxon_id INT(10) UNSIGNED NOT NULL auto_increment, > ncbi_taxon_id INT(10), > parent_taxon_id INT(10) UNSIGNED, > node_rank VARCHAR(32), > genetic_code TINYINT UNSIGNED, > mito_genetic_code TINYINT UNSIGNED, > left_value INT(10) UNSIGNED, > right_value INT(10) UNSIGNED, > PRIMARY KEY (taxon_id), > UNIQUE (ncbi_taxon_id), > UNIQUE (left_value), > UNIQUE (right_value) >) TYPE=INNODB; > >Okay, so the problem is that I have no idea what parent_taxon_id, >left_value and right_value are. I assume that they are supposed to >represent some kind of heirarchy of taxonomy. As near as I can >figure if you have a tree like: > > These values are needed for nested-set representation . They are used to quickly limit a branch of a tree. Selecting on the values >= the left and <= the right gives you all the elements under that part of the tree. I don't think it would be easy to add a new element to the tree with out rebuilding the whole representation. Therefore, I just skip it and put in a null (and print out that it wasn't known). This needs to be fixed in the source of the data. >A -> B -> C -> D > | > --> E > >Then this table would be filled for C with parent_taxon_id to be B's >taxon_id, left_value to be D's taxon_id and right_value to be E's >taxon_id. > >Is this right at all or am I completely confused? I can take a hit >at this, but without really getting the table I've been stumped so >far and just stare at it scratching my head. > >Thanks for the work on BioSQL. Sorry if I am a bit (a lot) confused >about things at the moment. 
>Brad > From bugzilla-daemon at portal.open-bio.org Fri Mar 26 11:03:47 2004 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] [Bug 1608] NCBIStandalone.py dies parsing long blastpgp -m6 output Message-ID: <200403261603.i2QG3l8t011835@portal.open-bio.org> http://bugzilla.bioperl.org/show_bug.cgi?id=1608 ------- Additional Comments From j.a.casbon@qmul.ac.uk 2004-03-26 11:03 ------- Created an attachment (id=121) --> (http://bugzilla.bioperl.org/attachment.cgi?id=121&action=view) sample blast output that causes crash This blast output causes the crash for me. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 26 11:00:58 2004 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] [Bug 1608] New: NCBIStandalone.py dies parsing long blastpgp -m6 output Message-ID: <200403261600.i2QG0wcF011745@portal.open-bio.org> http://bugzilla.bioperl.org/show_bug.cgi?id=1608 Summary: NCBIStandalone.py dies parsing long blastpgp -m6 output Product: Biopython Version: 1.24 Platform: All OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev@biopython.org ReportedBy: j.a.casbon@qmul.ac.uk When using: blastout = os.popen("zcat %s" % file) b_parser = NCBIStandalone.PSIBlastParser() b_iterator = NCBIStandalone.Iterator(blastout, b_parser) for b_record in b_iterator: to parse blast output in multiple alignment format (-m6), the parser dies on some files, and not others. It seems not to like longer files - but this is just my feeling. This is biopython 2.4 on debian linux unstable. 
I have blast output that produces this bug available at:

http://compbio.mds.qmw.ac.uk/~james/78700.blo.gz

Here is the stack trace:

  File "/home/james/exp/iss/shared_sequences", line 63, in get_seqs
    for b_record in b_iterator:
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/Blast/NCBIStandalone.py", line 1332, in next
    return self._parser.parse(File.StringHandle(data))
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/Blast/NCBIStandalone.py", line 571, in parse
    self._scanner.feed(handle, self._consumer)
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/Blast/NCBIStandalone.py", line 97, in feed
    self._scan_rounds(uhandle, consumer)
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/Blast/NCBIStandalone.py", line 152, in _scan_rounds
    self._scan_descriptions(uhandle, consumer)
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/Blast/NCBIStandalone.py", line 272, in _scan_descriptions
    read_and_call_until(uhandle, consumer.description, blank=1)
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/ParserSupport.py", line 340, in read_and_call_until
    method(line)
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/Blast/NCBIStandalone.py", line 637, in description
    dh = self._parse(line)
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/Blast/NCBIStandalone.py", line 694, in _parse
    dh.score = _safe_int(dh.score)
  File "/home/james/biopython-1.24/build/lib.linux-i686-2.3/Bio/Blast/NCBIStandalone.py", line 1602, in _safe_int
    return long(float(str))
ValueError: invalid literal for float(): RPVV----RDD-----R-------P----D----L-I-Y--R-----------T---MEG

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From iliketobicycle at yahoo.ca  Sat Mar 27 09:50:41 2004
From: iliketobicycle at yahoo.ca (Harry Zuzan)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] documentation for modules: happydoc/epydoc ?
Message-ID: <20040327145041.96064.qmail@web21403.mail.yahoo.com>

Hi,

I'm trying to put some code in shape for submission to BioPython. It's for handling data from Affymetrix GeneChips. It efficiently handles both the DAT image files and the probe cell data, including the probe sequences.

The first thing that I want to do is take care of documentation. I'm not sure if I should be using happydoc or epydoc. I'm also not sure how to get both HTML and PDF documentation from the same source. Is the code in C and C++ modules documented in a separate way?

Best,
Harry Zuzan

From chapmanb at uga.edu  Tue Mar 30 18:06:45 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] documentation for modules: happydoc/epydoc ?
In-Reply-To: <20040327145041.96064.qmail@web21403.mail.yahoo.com>
References: <20040327145041.96064.qmail@web21403.mail.yahoo.com>
Message-ID: <20040330230645.GD29401@evostick.agtec.uga.edu>

Hi Harry;

> I'm trying to put some code in shape for submission to BioPython. It's
> for handling data from Affymetrix GeneChips. It efficiently handles
> both the DAT image files and the probe cell data, including the probe
> sequences.

Great! We could definitely use something like this. Thanks for writing it and thinking of submitting it.

> The first thing that I want to do is take care of
> documentation.
> I'm not sure if I should be using happydoc or epydoc.

Well, you don't really need to use either. These are just automated ways to extract documentation from the source code (module, class and function documentation) so that it is readable on the web. The important thing is to have good documentation in your code, and the tool will then extract it.
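Concretely, what these tools extract is just well-placed docstrings; a skeleton in the style pydoc/epydoc pick up (module and class names are invented for illustration):

```python
"""Handle Affymetrix GeneChip data files (illustrative skeleton only).

A module docstring like this one, plus the class and function
docstrings below, is exactly what tools like epydoc and pydoc
extract into browsable API documentation.
"""


class DatImage:
    """One DAT image file from a GeneChip scan."""

    def __init__(self, filename):
        """Initialize from the path to a .DAT file on disk."""
        self.filename = filename

    def dimensions(self):
        """Return the (rows, columns) of the image grid.

        This stub returns a fixed size; a real parser would read
        the DAT header.
        """
        return (0, 0)


print(DatImage.__doc__)  # the same text pydoc would display
```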
At the bottom of our guide for contributing:

http://www.biopython.org/docs/developer/contrib.html

there are some hints about writing documentation strings so that epydoc (which is what we are using now) can deal with them best.

> I'm also not
> sure how to get both html and pdf documentation from the same source.

This is a bit of a different question. If you'd like to write separate cookbook-style documents for how to use your code (which is definitely encouraged), you can write them in any format you like which is displayable on the web. Plain text or HTML (preferably simple, hand-editable HTML -- not the MS Word generated type) is fine. The PDF/HTML documentation is generated using LaTeX, along with HeVeA to make the HTML. But if you don't know LaTeX -- any way to write it that you like is great.

> Is the code in C and C++ modules documented in a separate way?

I'd just encourage good C commenting here. Not knowing how your modules work, I'd assume you have some Python code wrappers around the C code so that people don't access the C-written code directly. In that case, it's most important that the code itself is documented so that others can understand what you wrote and how to find or fix bugs, if any come up.

Thanks again and be sure to write if you have other questions!

Brad

From chapmanb at uga.edu  Tue Mar 30 19:31:37 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] BioSQL bugs
In-Reply-To: <406450B7.6060401@mitre.org>
References: <406450B7.6060401@mitre.org>
Message-ID: <20040331003137.GH29401@evostick.agtec.uga.edu>

Hi Marc;

[I ask how we can make tests work without autocommit]

> I think bioperl makes use of functions (see the biosqldb-pg.sql). I was
> thinking about adding some of these function calls to the DBUtils
> section to speed up the transactions. Removing some of the constraints
> will increase the speed as the database grows.
> This code works fine for small sets, but it quickly slows down
> (probably because of the checks).

That would be great -- honestly, I am not a database expert at all (as
you can probably tell from my mails). This seems like a good place to
start. I'd definitely appreciate more contributions along this line
from you, if you'd be willing to do more work on it.

[I'm confused about the taxon table as well]

> These values are needed for nested-set representation.
> They are used to quickly limit a branch of a tree. Selecting on the
> values >= the left and <= the right gives you all the elements under
> that part of the tree. I don't think it would be easy to add a new
> element to the tree without rebuilding the whole representation.
> Therefore, I just skip it and put in a null (and print out that it
> wasn't known). This needs to be fixed in the source of the data.

Thanks for the link. That makes good sense now -- it seems the intent
is to have the taxonomy information pre-loaded from taxon tables, and
then to link to the taxon table when loading records. I agree with you
-- I think the best way to handle it is to add functionality (maybe to
a mixin class that DBServer can derive from) to load taxon table
information into a database. Then, if this taxon information exists,
link to it; otherwise add nulls as you suggest.

Thanks for the explanations!
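[The left/right range check described above is easy to demonstrate
outside BioSQL. A toy sketch using Python's sqlite3 -- the table name,
columns, and values are made up for illustration, not the real BioSQL
taxon schema.]

```python
import sqlite3

# Toy taxon table with nested-set (left, right) values:
#   root(1,8) -> mammals(2,5) -> human(3,4); birds(6,7)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (name TEXT, left_value INT, right_value INT)")
rows = [("root", 1, 8), ("mammals", 2, 5), ("human", 3, 4), ("birds", 6, 7)]
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?)", rows)

def subtree(conn, name):
    """All taxa under `name` (inclusive), via the left/right range check."""
    left, right = conn.execute(
        "SELECT left_value, right_value FROM taxon WHERE name = ?",
        (name,)).fetchone()
    return [n for (n,) in conn.execute(
        "SELECT name FROM taxon WHERE left_value >= ? AND right_value <= ?",
        (left, right))]

print(subtree(conn, "mammals"))  # contains 'mammals' and 'human' only
```

Note that this also shows why inserts are expensive: adding a node
means renumbering the left/right values of much of the table.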
Brad

From bugzilla-daemon at portal.open-bio.org  Tue Mar 30 20:04:39 2004
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] [Bug 1608] NCBIStandalone.py dies parsing long blastpgp -m6 output
Message-ID: <200403310104.i2V14dmk029750@portal.open-bio.org>

http://bugzilla.bioperl.org/show_bug.cgi?id=1608

------- Additional Comments From chapmanb@arches.uga.edu  2004-03-30 20:04 -------

Created an attachment (id=126)
 --> (http://bugzilla.bioperl.org/attachment.cgi?id=126&action=view)

Fix for the problem committed to revision 1.52 of NCBIStandalone

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Mar 30 20:07:14 2004
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] [Bug 1608] NCBIStandalone.py dies parsing long blastpgp -m6 output
Message-ID: <200403310107.i2V17EfI029794@portal.open-bio.org>

http://bugzilla.bioperl.org/show_bug.cgi?id=1608

chapmanb@arches.uga.edu changed:

           What        |Removed     |Added
----------------------------------------------------------------------------
           Status      |NEW         |RESOLVED
           Resolution  |            |FIXED

------- Additional Comments From chapmanb@arches.uga.edu  2004-03-30 20:07 -------

James, thanks for the report. The problem was that sometimes the line:

    Sequences not found previously or not previously below threshold:

which the parser expected to be followed by one or more descriptions,
actually contains no descriptions (i.e. there are no more sequences to
find). In this case the parser kept trying to get descriptions, and it
hit the error you noted by trying to parse an alignment line as a
description. Fixes are checked into CVS and a patch is attached to the
bug. Thanks!

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
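[The failure mode above -- a section header that may be followed by
zero items -- is guarded against by letting the read loop return an
empty list. A sketch of the idea; this is not the actual
NCBIStandalone code, and the line-classification heuristics are
illustrative only.]

```python
def read_descriptions(lines):
    """Collect description lines that follow a section header.

    Returns as soon as a line no longer looks like a description, so a
    header followed immediately by an alignment block (i.e. zero
    descriptions) yields [] instead of a parse error.
    """
    descriptions = []
    for line in lines:
        # Heuristic for illustration: alignment records start with '>'
        # and sections are separated by blank lines.
        if line.startswith(">") or not line.strip():
            break
        descriptions.append(line.rstrip())
    return descriptions

# Zero descriptions: the header was immediately followed by an alignment.
assert read_descriptions([">gi|123|some_hit ..."]) == []
```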
From miehe at mail.ipk-gatersleben.de  Wed Mar 31 11:41:51 2004
From: miehe at mail.ipk-gatersleben.de (miehe@mail.ipk-gatersleben.de)
Date: Sat Mar  5 14:43:31 2005
Subject: [Biopython-dev] (no subject)
Message-ID: <1080751311.406af4cfe2d54@webmail.ipk-gatersleben.de>

Hello,

The parser expression for blastn defined in
Bio.expressions.blast.ncbiblast.py is broken for BLASTN 2.2.8
[Jan-05-2004] (and even for the older BLASTN 2.2.6 [Apr-09-2003]). The
output contains an additional 'hsp_info' section, which was not defined
in the blastn expression. This can be patched in one line. Here is the
diff against the current CVS version:

***************
*** 441,444 ****
--- 441,445 ----
      gap_penalties_stats +
      generic_info1 +
+     Opt(hsp_info) +
      generic_info2 +
      t_info +

With best regards,
Heiko
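[The fix wraps the new section in Opt(), Martel's way of marking a
sub-expression as optional (zero or one occurrences). The same idea in
plain regular expressions -- the line texts below are illustrative
stand-ins, not exact BLAST output.]

```python
import re

# The hsp_info block appears in newer BLASTN output but not in older
# versions; marking it optional lets one pattern parse both formats.
hsp_info = r"(?:Number of HSPs better than \d+ without gapping: \d+\n)?"

pattern = re.compile(
    r"Gap Penalties: Existence: \d+, Extension: \d+\n"
    + hsp_info
    + r"Number of Sequences: \d+\n"
)

old_output = ("Gap Penalties: Existence: 5, Extension: 2\n"
              "Number of Sequences: 10\n")
new_output = ("Gap Penalties: Existence: 5, Extension: 2\n"
              "Number of HSPs better than 10 without gapping: 0\n"
              "Number of Sequences: 10\n")

assert pattern.match(old_output) is not None
assert pattern.match(new_output) is not None
```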