From mdehoon at ims.u-tokyo.ac.jp Mon Jul 5 00:40:05 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] Bio.Seq and alphabets
Message-ID: <40E8DBA5.6000102@ims.u-tokyo.ac.jp>

I've been working on a complement() and reverse_complement() function for
Bio.Seq's Seq and MutableSeq classes. Previously, similar functions existed
in various places in Biopython. I am not sure, though, how to deal with the
alphabet associated with a Seq or MutableSeq object. For example, a Seq can
be created where the sequence is inconsistent with the alphabet:

>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> Seq('GATCGACXYSMDG_or_any_funny_char_u_like_eg_*&$%', IUPAC.unambiguous_dna)
Seq('GATCGACXYSMDG_or_any_funny_char_u_like_eg_*&$%', IUPACUnambiguousDNA())

With a MutableSeq, one can change the sequence regardless of the alphabet:

>>> from Bio.Seq import MutableSeq
>>> s = MutableSeq('ACTGCCATCGT', IUPAC.unambiguous_dna)
>>> s[9] = 'X'
>>> s
MutableSeq(array('c', 'ACTGCCATCXT'), IUPACUnambiguousDNA())

Anyway, my immediate concern is how to deal with uppercase and lowercase
characters. The reverse_complement function in Bio.GFF.easy converts
lowercase characters to uppercase before taking the complement:

def _forward_complement_list_with_table(table, seq):
    return [table[x] for x in seq.tostring().upper()]

However, the complement and antiparallel functions in Bio.SeqUtils are not
implemented for lowercase sequences:

_before = ''.join(IUPACData.ambiguous_dna_complement.keys())
_after = ''.join(IUPACData.ambiguous_dna_complement.values())
_ttable = maketrans(_before, _after)

def complement(seq):
    """Returns the complementary sequence (NOT antiparallel).

    This works on string sequences, not on Bio.Seq objects.
    """
    #Much faster on really long sequences than the previous loop based one.
    #thx to Michael Palmer, University of Waterloo
    return seq.translate(_ttable)

So there are two issues we need to decide:
1) Should we modify the Seq and MutableSeq classes such that the sequence is
   always consistent with the alphabet?
2) Should we allow lowercase characters in the sequence?

My own preference at this point is 1) yes 2) no, but I'd like to check what
y'all think.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From crocha at dc.uba.ar Tue Jul 6 13:52:11 2004
From: crocha at dc.uba.ar (Cristian S. Rocha)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] hmmpfam parser
In-Reply-To: 40226C51.F70315B7@ebc.uu.se
Message-ID: <1089136330.19621.34.camel@numero2>

Hi,

While searching for a hmmsearch output parser for Biopython, I found a
mail from you to the biopython-dev list with source code. I'm interested
in using it to parse a lot of hmm results, but I would like to know whether
a mature version exists and whether you can add it to the Biopython CVS.

If I can help you with testing and adding code to your parser, it will be a
pleasure. I really need this code. I was learning Martel to write a parser,
but I would rather help you than write one alone.

Thanks,
Cristian.

PD: Sorry about my bad english... :)

--
Lic. Cristian S. Rocha.
Departamento de Computación. FCEyN. UBA.
Pabellon I. Cuarto 9. Ciudad Universitaria.
(1428) Buenos Aires. Argentina.
Tel: +54-11-4576-3390/96 int 714
Tel/Fax: +54-11-4576-3359
Cel: 15-5-607-9192

From tcwilliams79 at verizon.net Mon Jul 5 21:35:44 2004
From: tcwilliams79 at verizon.net (Thomas C. Williams)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] ReportLab toolkit is now at http://www.reportlab.org/rl_toolkit.html
Message-ID: <20040706013548.CKMQ6671.out003.verizon.net@TCWHP>

From h.j.tipney at stud.man.ac.uk Tue Jul 13 09:56:43 2004
From: h.j.tipney at stud.man.ac.uk (h.j.tipney@stud.man.ac.uk)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] python newbies blast problem
Message-ID:

Hi
I posted this to the other mailing list and got no response so I'm
hoping you guys can help me. I'm very new to programming and even newer
to python, so I apologise in advance if this is a simple problem with an
obvious solution but there are no python programmers near to help me.
Anyway, I inherited the script below and have been using it on and off as
part of a larger workflow. It has been running fine, but I ran it again
last week and it didn't give the output I expected - it returned the 'your
results will be updated in X seconds' page rather than the actual results.
It has been a while since I had used this program and both blast and
biopython had been updated so I've now got the new biopython release (1.30)
but I still get the 'wrong' output. I'm using python 2.3.3 on solaris, if
that helps.
Any help would be greatly appreciated!
Thank you in advance
Hannah Tipney

#!/opt/cs/bin/python
from Bio import Fasta
from Bio.Blast import NCBIWWW
import sys
import getopt

opts, args = getopt.getopt(sys.argv[1:],"",['program=','database=','format=','entrez_query='])

print sys.argv
print opts

if len(args)==0:
    print "no file given"
    sys.exit(2)

program = "blastn"
database = "nr"
format = "Text"
#"Homo sapiens [ORGN]"

short_query=""

for o,a in opts:
    print o,a
    if o == "--program":
        program = a
    if o == "--database":
        database = a
    if o == "--format":
        format = a
    if o == "--entrez_query":
        short_query = a

if short_query=="human":
    query="Homo sapiens [ORGN]"
else:
    query=""

print "program = %s , database = %s, query = %s" % (program,database,query)

file_for_blast = open(args[0], 'r')
f_iterator = Fasta.Iterator(file_for_blast)

f_record = f_iterator.next()
file_for_blast.close()
b_results = NCBIWWW.blast(program, database, f_record,format_type=format, entrez_query=query,timeout=60)

blast_results = b_results.read()
sys.stdout.write(blast_results)

------- End of forwarded message -------
------------------------------------------
Hannah Tipney
Manchester University,
Academic Unit of Medical Genetics,
St Mary's Hospital,
Hathersage Road,
Manchester. M13 0JH.
UK

tel: +44 (0)161 276 6602
fax: +44 (0)161 276 6606

From jeffrey_chang at stanfordalumni.org Tue Jul 13 23:47:32 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] python newbies blast problem
In-Reply-To:
References:
Message-ID: <893C92DD-D548-11D8-8676-000A956845CE@stanfordalumni.org>

Hello,

This is because the NCBI website is not really meant to be queried by
computer scripts. It looks like a recent change has broken the
NCBIWWW.blast function. Fortunately, NCBI does have a computer friendly
BLAST API called QBLAST. I added an interface to QBLAST into biopython
called NCBIWWW.qblast.
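
As a rough sketch of how the script above might call it, assuming a current
Biopython in which NCBIWWW.qblast takes (program, database, sequence) plus the
entrez_query and format_type keywords. The helper names qblast_kwargs and
run_qblast are made up for illustration, and the code is written for Python 3
rather than the Python 2.3 of the original script:

```python
def qblast_kwargs(short_query="", format_type="Text"):
    """Build keyword arguments for NCBIWWW.qblast (hypothetical helper)."""
    kwargs = {"format_type": format_type}
    # Same shorthand as the original script: "human" restricts the search
    # to Homo sapiens; anything else leaves the Entrez filter unset.
    if short_query == "human":
        kwargs["entrez_query"] = "Homo sapiens [ORGN]"
    return kwargs


def run_qblast(fasta_text, program="blastn", database="nr", short_query=""):
    """Submit one FASTA-formatted query via QBLAST and return the report text."""
    # Imported lazily so qblast_kwargs can be exercised without Biopython.
    from Bio.Blast import NCBIWWW

    handle = NCBIWWW.qblast(program, database, fasta_text,
                            **qblast_kwargs(short_query))
    return handle.read()
```

Calling run_qblast with the contents of a FASTA file needs network access and
a reachable NCBI service, so only the argument-building step can be checked
offline.
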
Please get the updated version of NCBIWWW.py from CVS, and replace
NCBIWWW.blast with NCBIWWW.qblast in your script, and see if that fixes
things. The anonymous CVS is at:
http://cvs.biopython.org/

Jeff


On Jul 13, 2004, at 9:56 AM, h.j.tipney@stud.man.ac.uk wrote:

> Hi
> I posted this to the other mailing list and got no response so I'm
> hoping you guys can help me. I'm very new to programming and even newer
> to python, so I apologise in advance if this is a simple problem with an
> obvious solution but there are no python programmers near to help me.
> Anyway, I inherited the script below and have been using it on and off
> as part of a larger workflow. It has been running fine, but I ran it
> again last week and it didn't give the output I expected - it returned
> the 'your results will be updated in X seconds' page rather than the
> actual results. It has been a while since I had used this program and
> both blast and biopython had been updated so I've now got the new
> biopython release (1.30) but I still get the 'wrong' output. I'm using
> python 2.3.3 on solaris, if that helps.
> Any help would be greatly appreciated!
> Thank you in advance
> Hannah Tipney
>
> #!/opt/cs/bin/python
> from Bio import Fasta
> from Bio.Blast import NCBIWWW
> import sys
> import getopt
>
> opts, args = getopt.getopt(sys.argv[1:],"",['program=','database=','format=','entrez_query='])
>
> print sys.argv
> print opts
>
> if len(args)==0:
>     print "no file given"
>     sys.exit(2)
>
> program = "blastn"
> database = "nr"
> format = "Text"
> #"Homo sapiens [ORGN]"
>
> short_query=""
>
> for o,a in opts:
>     print o,a
>     if o == "--program":
>         program = a
>     if o == "--database":
>         database = a
>     if o == "--format":
>         format = a
>     if o == "--entrez_query":
>         short_query = a
>
> if short_query=="human":
>     query="Homo sapiens [ORGN]"
> else:
>     query=""
>
> print "program = %s , database = %s, query = %s" % (program,database,query)
>
> file_for_blast = open(args[0], 'r')
> f_iterator = Fasta.Iterator(file_for_blast)
>
> f_record = f_iterator.next()
> file_for_blast.close()
> b_results = NCBIWWW.blast(program, database, f_record,format_type=format, entrez_query=query,timeout=60)
>
> blast_results = b_results.read()
> sys.stdout.write(blast_results)
>
> ------- End of forwarded message -------
> ------------------------------------------
> Hannah Tipney
> Manchester University,
> Academic Unit of Medical Genetics,
> St Mary's Hospital,
> Hathersage Road,
> Manchester. M13 0JH.
> UK
>
> tel: +44 (0)161 276 6602
> fax: +44 (0)161 276 6606
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev

From bugzilla-daemon at portal.open-bio.org Wed Jul 14 00:35:46 2004
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] [Bug 1667] New: PUBMED key collision in dbxref table
Message-ID: <200407140435.i6E4ZkSV012149@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1667

           Summary: PUBMED key collision in dbxref table
           Product: Biopython
           Version: Not Applicable
          Platform: Macintosh
        OS/Version: MacOS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: open-bio@zesty.ca


I am using BioPython 1.30. While loading records from the human genome
into a MySQL database, BioSQL causes the error:
"Duplicate entry PUBMED-0 for key 2". PUBMED appears in the dbxref table.

I looked at the code that inserts entries into the dbxref table:
the method _add_dbxref at line 97 of BioSQL/Loader.py.

_add_dbxref is called twice, at lines 333 and 336. I believe the second
call has a bug, since both calls supply "reference.medline_id" as an
argument.

        if reference.medline_id:
            dbxref_id = self._add_dbxref("MEDLINE",
                                         reference.medline_id, 0)
        elif reference.pubmed_id:
            dbxref_id = self._add_dbxref("PUBMED",
                                         reference.medline_id, 0)

It seems clear to me that the last line above should say
"reference.pubmed_id". If I make this change in my local copy of
BioSQL/Loader.py, the MySQL error about the duplicate key value
indeed goes away.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From Hegedus.Tamas at mayo.edu Thu Jul 15 14:58:18 2004
From: Hegedus.Tamas at mayo.edu (Hegedus, Tamas .)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] ModBioSQL release 0.12
Message-ID:

Dear All,

Since I used Python and BioPython in my Modular BioSQL package, my site
may be of interest to you:
http://www.biomembrane.hu/~hegedus/modbiosql/

Best regards,
Tamas

--
Tamas Hegedus, Research Fellow | phone: 480-301-6041
Mayo Clinic Scottsdale         | fax:   480-301-7017
13000 E. Shea Blvd             | mailto:hegedus.tamas@mayo.edu
Scottsdale, AZ, 85259          | http://www.biomembrane.hu/~hegedus

From bugzilla-daemon at portal.open-bio.org Sat Jul 17 13:42:13 2004
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] [Bug 1669] New: SwissProt Parser error - cannot read recent SwissProt entries
Message-ID: <200407171742.i6HHgDhF025690@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1669

           Summary: SwissProt Parser error - cannot read recent SwissProt
                    entries
           Product: Biopython
           Version: 1.24
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: kris@math.princeton.edu


The RX field of a SwissProt entry can, in newer records, be more than one
line long, while SProt.py only accepts one RX line per reference. See the
error message below. RX is the database reference of the article relevant
to the entry, and SwissProt has recently added DOI references as well. RA
should be the next field after RX, but if there are two RX lines in the
record, the parser chokes.

Traceback (most recent call last):
  File "", line 1, in ?
  File "find_pdb_orgs.py", line 33, in parse_yeast
    curr=siter.next()
  File "Bio/SwissProt/SProt.py", line 166, in next
    return self._parser.parse(File.StringHandle(data))
  File "Bio/SwissProt/SProt.py", line 290, in parse
    self._scanner.feed(handle, self._consumer)
  File "Bio/SwissProt/SProt.py", line 333, in feed
    self._scan_record(uhandle, consumer)
  File "Bio/SwissProt/SProt.py", line 338, in _scan_record
    fn(self, uhandle, consumer)
  File "Bio/SwissProt/SProt.py", line 414, in _scan_reference
    self._scan_ra(uhandle, consumer)
  File "Bio/SwissProt/SProt.py", line 436, in _scan_ra
    one_or_more=1)
  File "Bio/SwissProt/SProt.py", line 360, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "Bio/ParserSupport.py", line 300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'RA':
RX DOI=10.1128/JB.183.20.5942-5955.2001;



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From fsms at users.sourceforge.net Mon Jul 26 12:45:51 2004
From: fsms at users.sourceforge.net (fsms@users.sourceforge.net)
Date: Sat Mar 5 14:43:35 2005
Subject: [Biopython-dev] Restriction analysis package
Message-ID: <4105353F.3000102@users.sourceforge.net>

Hi,

The restriction analysis package is now ready, complete with a
tutorial/cookbook section in HTML. If you give me access to the CVS, I can
commit it this week. Alternatively, the files are in the CVS at:
http://cvs.sourceforge.net/viewcvs.py/rana/rana/Bio/Restriction/

Fred