From idoerg at burnham.org Thu Feb 2 16:14:57 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu Feb 2 16:18:54 2006
Subject: [Biopython-dev] Re: [BioPython] Bug in Bio.SeqUtils ?
In-Reply-To: <43E250E5.6030901@burnham.org>
References: <71dea9850602020635u37a7294dv15911521deeee656@mail.gmail.com>
 <43E250E5.6030901@burnham.org>
Message-ID: <43E27651.3010504@burnham.org>

Oh, sorry.

Your second problem was with protein_scale, which does indeed break on
any letter not of the 20 regular amino acids.

I inserted this into a try/except clause which produces a warning to
stderr, instead of raising an exception. It is now in CVS.

Yair, is that OK, or would we rather leave the exception raising bit
there? There are arguments either way...

./I

Iddo Friedberg wrote:

> Which version are you using? I tried the 1a8y sequence which you gave,
> and also a sequence with an 'X', and they worked fine for me. CVS
> version.
>
> # seq is a Record object. seq.sequence is a string with the protein sequence
>
> >>> from Bio.SeqUtils import ProtParam
> >>> ps = ProtParam.ProteinAnalysis(seq.sequence)
> >>> ps.isoelectric_point()
> 3.9298931884765151
>
>
> # and for a sequence with an 'x'
> >>> ps2 = ProtParam.ProteinAnalysis('xsdfgvcrtyip')
> >>> ps2.isoelectric_point()
> 5.8285980224609375
>
> Bin Hu wrote:
>
>> Hi,
>>
>> When using Bio.SeqUtils to estimate isoelectric point for PDB entry
>> 1a8y, it seems the function isoelectric_point() cannot reach an end,
>> although it worked pretty well for all the other entries that I've
>> tested. Could this be a bug in Bio.SeqUtils?
>>
>> If anyone wants to test it, below is the sequence of 1a8y:
>>
>> eegldfpeydgvdrvinvnaknyknvfkkyevlallyheppeddkasqrqfemeelilel
>> aaqvledkgvgfglvdsekdaavakklglteedsiyvfkedevieydgefsadtlvefll
>> dvledpveliegerelqafeniedeikligyfknkdsehykafkeaaeefhpyipffatf
>> dskvakkltlklneidfyeafmeepvtipdkpnseeeivnfveehrrstlrklkpesmye
>> tweddmdgihivafaeeadpdgyefleilksvaqdntdnpdlsiiwidpddfpllvpywe
>> ktfdidlsapqigvvnvtdadsvwmemddeedlpsaeeledwledvlegeintedddded
>> ddddddd
>>
>> For PDB entry 1rb9, the hydrophilicity of this protein cannot be
>> estimated because its sequence starts with "X", which is not in the
>> key list used by SeqUtils. It produces the following error message:
>>
>> Traceback (most recent call last):
>>   File "./dataGen.py", line 62, in ?
>>     aHydrophilicityList = aSeqObj.protein_scale(ProtParamData.hw, 5)
>>   File "/usr/lib/python2.4/site-packages/Bio/SeqUtils/ProtParam.py", line 206, in protein_scale
>>     score += weight[j] * ParamDict[subsequence[j]] + weight[j] * ParamDict[subsequence[Window-j-1]]
>> KeyError: 'X'
>>
>> Although I can delete the "X" in this protein, could the author
>> implement a warning message and work around this error stop? Thank you.
>>
>> Bin
>>
>> _______________________________________________
>> BioPython mailing list - BioPython@biopython.org
>> http://biopython.org/mailman/listinfo/biopython

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.org

From bugzilla-daemon at portal.open-bio.org Fri Feb 3 04:44:54 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 3 05:34:57 2006
Subject: [Biopython-dev] [Bug 1942] New: GenBank RecordParser fails on particular qualifier structure
Message-ID: <200602030944.k139isnV025390@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

           Summary: GenBank RecordParser fails on particular qualifier
                    structure
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: lpritc@scri.sari.ac.uk

When parsing some GenBank record files, the GenBank.RecordParser throws an
error at a (poorly-formatted) qualifier entry:

Python 2.3.4 (#1, Feb 2 2005, 12:11:53)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_002758.gbk'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 240, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 1533, in feed
    assert line[0:1]=='/', \
AssertionError: Expected start of new qualifier, not: similar to bacteriophage terminase small subunit"

This problem has been observed for several GenBank .gbk files, including
NC_002758 above, and NC_002929.
It appears to be caused by qualifiers structured like /note in the
following example:

     CDS             878043..878612
                     /locus_tag="SAV0800"
                     /note="
                     similar to bacteriophage terminase small subunit"
                     /codon_start=1
                     /transl_table=11
                     /product="similar to bacteriophage terminase small subunit"
                     /protein_id="NP_371324.1"
                     /db_xref="GI:15923790"
                     /db_xref="GeneID:1120775"
                     /translation="MSELTAKQARFVNEYIRTLNVTQSAIKAGYSANSAHVTGCRLLK
                     KPHIKQYIQEQKDKIIDENVLTAKELLHVLTNAAVGDETETKEVVVKRGEYKENPQSG
                     KVQLVYNEHVELIEVPIKPSDRLKARDMLGKYHKLFTDKHDINGNVPIFINIGEWDGD
                     DEELDKTVKDVSNANPNHTVIVDDIPLED"

where the first double-quotes in the qualifier value are directly followed
by '\n', and the description continues on the next line.

Editing the source .gbk file directly to remove this resolves the problem.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mcolosimo at mitre.org Fri Feb 3 08:21:20 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Fri Feb 3 09:39:00 2006
Subject: [Biopython-dev] Error building PDB with cygwin
Message-ID: <43E358D0.5000703@mitre.org>

I got this error while trying to build biopython (normally I use OS X or
linux). What is the fl lib?

gcc -shared -Wl,--enable-auto-image-base build/temp.cygwin-1.5.19-i686-2.4/Bio/PDB/mmCIF/lex.yy.o build/temp.cygwin-1.5.19-i686-2.4/Bio/PDB/mmCIF/MMCIFlexmodule.o -L/usr/lib/python2.4/config -lfl -lpython2.4 -o build/lib.cygwin-1.5.19-i686-2.4/Bio/PDB/mmCIF/MMCIFlex.dll
/usr/lib/gcc/i686-pc-cygwin/3.4.4/../../../../i686-pc-cygwin/bin/ld: cannot find -lfl
collect2: ld returned 1 exit status

From yair.benita at gmail.com Fri Feb 3 03:39:38 2006
From: yair.benita at gmail.com (Yair Benita)
Date: Fri Feb 3 10:27:48 2006
Subject: [Biopython-dev] Re: [BioPython] Bug in Bio.SeqUtils ?
In-Reply-To: <43E27651.3010504@burnham.org>
Message-ID:

Hi,
Sorry I failed to follow up on that bug.
I need to revise the isoelectric point anyway since in some rare cases it
gets stuck in an endless while loop. I will also look into adding code to
handle the X in the amino acid sequence.
For now I think it's OK to produce a warning instead of an exception.

Yair

on 2/2/06 10:14 PM, Iddo Friedberg at idoerg@burnham.org wrote:

> Oh, sorry.
>
> Your second problem was with protein_scale, which does indeed break on
> any letter not of the 20 regular amino acids.
>
> I inserted this into a try/except clause which produces a warning to
> stderr, instead of raising an exception. It is now in CVS.
>
> Yair, is that OK, or would we rather leave the exception raising bit
> there? There are arguments either way...
>
> ./I

From bugzilla-daemon at portal.open-bio.org Fri Feb 3 11:13:29 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 3 11:35:46 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular qualifier structure
Message-ID: <200602031613.k13GDT4L000333@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-03 11:13 -------
Which version of BioPython are you using?
I thought this was fixed in CVS, see bug 1903

From bugzilla-daemon at portal.open-bio.org Fri Feb 3 11:30:41 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 3 11:36:04 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular qualifier structure
Message-ID: <200602031630.k13GUfLF000661@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

------- Comment #2 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-03 11:30 -------
Using the CVS copy of Bio/GenBank/__init__.py your example works for me.
Please reopen the bug or follow up on the mailing list if that doesn't
solve the problem for you.

My copy of the NC_002758 GenBank file has the same "bad" note entry, it
starts:

LOCUS       NC_002758            2878529 bp    DNA     circular BCT 19-JAN-2005

Sample output:

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_002758.gbk'))
>>> print record.features[1635]
     CDS             878043..878612
                     /locus_tag="SAV0800"
                     /note=" similar to bacteriophage terminase small subunit"
                     /codon_start=1
                     /transl_table=11
                     /product="similar to bacteriophage terminase small subunit"
                     /protein_id="NP_371324.1"
                     /db_xref="GI:15923790"
                     /db_xref="GeneID:1120775"
                     /translation="MSELTAKQARFVNEYIRTLNVTQSAIKAGYSANSAHVTGCRLLK
                     KPHIKQYIQEQKDKIIDENVLTAKELLHVLTNAAVGDETETKEVVVKRGEYKENPQSG
                     KVQLVYNEHVELIEVPIKPSDRLKARDMLGKYHKLFTDKHDINGNVPIFINIGEWDGD
                     DEELDKTVKDVSNANPNHTVIVDDIPLED"

Notice that the original "bad" formatting has not been preserved - which is
arguably a bug...

*** This bug has been marked as a duplicate of 1903 ***

From bugzilla-daemon at portal.open-bio.org Fri Feb 3 13:48:04 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 3 14:35:46 2006
Subject: [Biopython-dev] [Bug 1943] New: Bad Documentation in Bio.Fasta
Message-ID: <200602031848.k13Im4mB003238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1943

           Summary: Bad Documentation in Bio.Fasta
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: mcolosimo@mitre.org

This has been getting me every year I think. First, I explicitly state what
these objects really are. Second, I fixed the order in the documentation
for title2ids to match the actual code.

Diff below

Index: __init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Fasta/__init__.py,v
retrieving revision 1.13
diff -r1.13 __init__.py
9,10c9,10
< RecordParser     Parses FASTA sequence data into a Record object.
< SequenceParser   Parses FASTA sequence data into a Sequence object.
---
> RecordParser     Parses FASTA sequence data into a Fasta.Record object.
> SequenceParser   Parses FASTA sequence data into a SeqRecord object.
109c109
<     """Parses FASTA sequence data into a Record object.
---
>     """Parses FASTA sequence data into a Fasta.Record object.
126c126
<     """Parses FASTA sequence data into a Sequence object.
---
>     """Parses FASTA sequence data into a SeqRecord object.
136c136
<         file (without the beginning >), will return the name, id and
---
>         file (without the beginning >), will return the id, name, and

From bugzilla-daemon at portal.open-bio.org Fri Feb 3 14:13:05 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 3 14:36:05 2006
Subject: [Biopython-dev] [Bug 1944] New: Align.Generic adding iterator and more
Message-ID: <200602031913.k13JD5wD003550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

           Summary: Align.Generic adding iterator and more
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: mcolosimo@mitre.org

I thought it would be nice to be able to directly iterate over the
SeqRecords in an alignment. So, I wrote it up and tested it with
Bio.Clustalw.
I also added the ability to fill in other fields of the SeqRecord (similar
to Fasta.SequenceParser)

Diff below:

Index: Generic.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Align/Generic.py,v
retrieving revision 1.5
diff -r1.5 Generic.py
32c32
<         # hold everything at a list of seq record objects
---
>         # hold everything as a list of SeqRecord objects
33a34
>         self._iter_pos = 0
34a36,51
>     def __iter__(self):
>         self._iter_pos = 0
>         return iter(self.next, None)
>
>     def next(self):
>         """Returns one sequence record at a time.
>
>         @return: a SeqRecord or None if end of iteration.
>         """
>         if self._iter_pos >= len(self._records):
>             return None
>
>         rec = self._records[self._iter_pos]
>         self._iter_pos += 1
>         return rec
>
38c55,56
<         The return value is a list of SeqRecord objects.
---
>         @return: a list of the sequences.
>         @rtype: SeqRecord
45,49c63,66
<         Returns:
<         o A Seq object for the requested sequence.
<
<         Raises:
<         o IndexError - If the specified number is out of range.
---
>         @param number: the number of the sequence in the consensus.
>         @return: the requested sequence.
>         @rtype: SeqRecord.
>         @raise IndexError: If the specified number is out of range.
69c86
<                      weight = 1.0):
---
>                      weight = 1.0, title2ids = None):
86a104,107
>         o title2ids - A function that, when given the descriptor,
>         will return the id, name, and description (in that order)
>         for the record. If this is not given, then the entire descriptor
>         line will be used as the description.
89c110,117
<         new_record = SeqRecord(new_seq, description = descriptor)
---
>         rec = SeqRecord(new_seq)
>         if title2ids:
>             seq_id, name, descr = title2ids(descriptor)
>             rec.id = seq_id
>             rec.name = name
>             rec.description = descr
>         else:
>             rec.description = descriptor
99c127
<             new_record.annotations['start'] = start
---
>             rec.annotations['start'] = start
101c129
<             new_record.annotations['end'] = end
---
>             rec.annotations['end'] = end
104c132
<         new_record.annotations['weight'] = weight
---
>         rec.annotations['weight'] = weight
106c134,147
<         self._records.append(new_record)
---
>         # what happens if we're iterating?
>         self._records.append(rec)
>
>
>     def addSeqRecord(self, seqRec):
>         """Add a Sequence Record to the Alignment
>
>         @param seqRec: a sequence record (SeqRecord) to add.
>         """
>         if isinstance(seqRec, SeqRecord):
>             self._records.append(seqRec)
>         else:
>             raise TypeError("sequence is NOT a SeqRecord Object")
>
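The iteration part of the patch above could also be written without a
position counter stored on the alignment object. What follows is only a
minimal sketch under stated assumptions, not the Bio.Align.Generic code:
SimpleRecord and SimpleAlignment are hypothetical stand-ins for SeqRecord
and Alignment, and add_record plays the role of the proposed addSeqRecord.
Handing out a fresh iterator from __iter__ gives every loop its own
position, so nested loops over the same alignment do not disturb each other.

```python
class SimpleRecord(object):
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""

    def __init__(self, seq, id="<unknown id>", description=""):
        self.seq = seq
        self.id = id
        self.description = description


class SimpleAlignment(object):
    """Hypothetical stand-in for Bio.Align.Generic.Alignment."""

    def __init__(self):
        # hold everything as a list of record objects
        self._records = []

    def add_record(self, record):
        """Append a record, rejecting anything that is not a SimpleRecord."""
        if not isinstance(record, SimpleRecord):
            raise TypeError("record is not a SimpleRecord object")
        self._records.append(record)

    def __iter__(self):
        # A new list iterator per call: no position state kept on self.
        return iter(self._records)

    def __len__(self):
        return len(self._records)


alignment = SimpleAlignment()
alignment.add_record(SimpleRecord("ACGT-A", id="seq1"))
alignment.add_record(SimpleRecord("ACGTTA", id="seq2"))

# Plain iteration over the records.
ids = [rec.id for rec in alignment]

# Nested iteration works because each loop has its own iterator.
pairs = [(a.id, b.id) for a in alignment for b in alignment]
```

With a single counter stored on the object, the inner loop of the nested
example would reset and exhaust the position used by the outer loop.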
From bugzilla-daemon at portal.open-bio.org Fri Feb 3 21:23:26 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 3 21:34:47 2006
Subject: [Biopython-dev] [Bug 1946] New: Parsing GenBank Files - ParserPositionException:
Message-ID: <200602040223.k142NQwM009545@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1946

           Summary: Parsing GenBank Files - ParserPositionException:
           Product: Biopython
           Version: Not Applicable
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Martel/Mindy
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: julius.lucks@gmail.com

Parsing a genbank file with the following code (BioPython version 1.41
installed with fink on python 2.3 on OS X):

from Bio import GenBank

feature_parser = GenBank.FeatureParser()
gb_record = feature_parser.parse(open('bug.gb','r'))

I get a trace:

Traceback (most recent call last):
  File "bug.py", line 11, in ?
    gb_record = feature_parser.parse(open(gb_file,'r'))
  File "/sw/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 219, in parse
    self._scanner.feed(handle, self._consumer)
  File "/sw/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 1259, in feed
    self._parser.parseFile(handle)
  File "/sw/lib/python2.3/site-packages/Martel/Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "/sw/lib/python2.3/site-packages/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/sw/lib/python2.3/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 196

The contents of bug.gb are:

LOCUS       NC_001416              48502 bp    DNA     linear   PHG 08-DEC-2005
DEFINITION  Enterobacteria phage lambda, complete genome.
ACCESSION   NC_001416
VERSION     NC_001416.1  GI:9626243
PROJECT     GenomeProject:14204
KEYWORDS    .
.. (Truncated)

If I remove the PROJECT line, the bug is fixed.
This seems to be an uncommon tag in GenBank files, so I am not sure if the
parser takes this into account.

From bugzilla-daemon at portal.open-bio.org Sun Feb 5 07:00:15 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sun Feb 5 07:35:08 2006
Subject: [Biopython-dev] [Bug 1946] Parsing GenBank Files - unknown line type PROJECT
Message-ID: <200602051200.k15C0FsZ010981@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1946

biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
          Component|Martel/Mindy                |Main Distribution
         OS/Version|Mac OS                      |All
            Summary|Parsing GenBank Files -     |Parsing GenBank Files -
                   |ParserPositionException:    |unknown line type PROJECT

------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-05 07:00 -------
The non-martel GenBank parser in CVS is also unaware of the project line in
GenBank files. I would expect it to fail with an assertion error:

Unknown line type, PROJECT found:
PROJECT     GenomeProject:14204

This looks like an easy fix, however we need to decide how to store the
project information. Maybe a simple string for now, "GenomeProject:14204"

Also maybe unknown line types in the header should trigger warnings rather
than errors that stop the parsing...

---------------------------------------
Quoting from ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
---------------------------------------

1.4.1 New Linetype for Genome Project Identifier

DDBJ, EMBL, and GenBank are working to create a collaborative system that
will assign a unique numeric identifier to genome projects. The purpose of
this new identifier is to provide a link among sequence records that
pertain to a specific genome sequencing project.
At GenBank, this new identifier will be presented in the flatfile format
via a new linetype : PROJECT . Here is a mocked-up example demonstrating
the new linetype's use:

LOCUS       CH476840             1669278 bp    DNA     linear   CON 05-OCT-2005
DEFINITION  Magnaporthe grisea 70-15 supercont5.200 genomic scaffold, whole
            genome shotgun sequence.
ACCESSION   CH476840 AACU02000000
VERSION     CH476840.1  GI:77022292
PROJECT     GENOME_PROJECT:12345

The integer 12345 represents the value of a possible genome project
identifier. There is a possibility that the contents of the PROJECT line
might change somewhat from this example by the time the new identifier is
implemented. We will keep you posted of any such changes via these release
notes and the GenBank listserv.

These Genome Project identifiers will be searchable within NCBI's Entrez:
Genome-Project database:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj

The earliest date on which this new linetype will appear in the GenBank
flatfile format is February 15 2006.

---------------------------------------

Looks like they are ahead of schedule in releasing this new line type.
From bugzilla-daemon at portal.open-bio.org Mon Feb 6 05:24:39 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 6 05:35:12 2006
Subject: [Biopython-dev] [Bug 1946] Parsing GenBank Files - unknown line type PROJECT
Message-ID: <200602061024.k16AOdoG030083@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1946

biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
           Platform|Macintosh                   |All

------- Comment #2 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-06 05:24 -------
Initial fix checked in, see Bio/GenBank/__init__.py revision 1.57

The parser will now print a warning when unknown header lines are found,
but will continue parsing the file. Previously parsing would halt at the
unknown line.

This will allow people to deal with files containing the new project line
(and other headers the NCBI may introduce in the future).

Once the NCBI settle on the exact format of the new project line, we should
sort out how to represent it in BioPython. I have therefore left this bug
open for now...
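The behaviour described in comment #2 (warn about an unknown header line
type, then carry on parsing) can be sketched roughly as below. This is an
illustration only and not the code checked in as revision 1.57: scan_header,
KNOWN_HEADER_KEYS and the 12-column keyword width are assumptions made for
the example, the key list is deliberately incomplete, and the PROJECT value
is kept as a plain string as suggested in comment #1.

```python
import warnings

# Assumption for this sketch: header keywords occupy the first 12 columns.
HEADER_WIDTH = 12

# Hypothetical, incomplete set of header line types the scanner understands.
KNOWN_HEADER_KEYS = frozenset(
    ["LOCUS", "DEFINITION", "ACCESSION", "VERSION", "PROJECT", "KEYWORDS"]
)


def scan_header(lines, known_keys=KNOWN_HEADER_KEYS):
    """Collect header values by keyword, warning on unknown line types."""
    header = {}
    for line in lines:
        key = line[:HEADER_WIDTH].strip()
        value = line[HEADER_WIDTH:].strip()
        if not key:
            # Continuation lines are skipped to keep the sketch short.
            continue
        if key not in known_keys:
            # Warn and keep going instead of halting at the unknown line.
            warnings.warn(
                "Ignoring an unknown line type, %s found: %s"
                % (key, line.rstrip())
            )
            continue
        header[key] = value
    return header


sample = [
    "LOCUS       NC_001416              48502 bp    DNA     linear   PHG 08-DEC-2005",
    "DEFINITION  Enterobacteria phage lambda, complete genome.",
    "ACCESSION   NC_001416",
    "VERSION     NC_001416.1  GI:9626243",
    "PROJECT     GenomeProject:14204",
    "KEYWORDS    .",
]

# A scanner that knows about PROJECT stores it as a simple string.
header = scan_header(sample)

# A scanner that does not know PROJECT warns once and still finishes.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = scan_header(sample, known_keys=KNOWN_HEADER_KEYS - {"PROJECT"})
```

Passing the set of known keys as a parameter keeps the policy for new line
types in one place, which may help once the NCBI settle on the final format.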
From bugzilla-daemon at portal.open-bio.org Mon Feb 6 05:51:18 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 6 06:35:09 2006
Subject: [Biopython-dev] [Bug 1943] Bad Documentation in Bio.Fasta
Message-ID: <200602061051.k16ApIul030526@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1943

biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-06 05:51 -------
I have checked in Marc's suggested three changes to the comments in
Bio/Fasta/__init__.py - see revision 1.14

The mistake about (name,id,descr) versus (id,name,descr) in the doc string
for the SequenceParser about the title2ids argument had been there a long
time.

From gould at embl.de Thu Feb 9 04:37:07 2006
From: gould at embl.de (gould@embl.de)
Date: Thu Feb 9 04:32:59 2006
Subject: [Biopython-dev] uniprot release 49/biopython script no longer work
Message-ID: <20060209103707.1ckq8osc6d5c844s@webmail.embl.de>

hi

I've been having problems with some of our applications here that use
biopython scripts to retrieve a record from uniprot/swissprot given an
accession nr/ID....As far as I'm aware the problem only occurred after the
release 49.0 of uniprot/swissprot db yesterday...I see from the release
notes that some changes were made to the annotation format and suspect this
is why the biopython scripts are no longer happy??....I've checked to make
sure I have the latest version of biopython but this has not remedied the
problem.....This problem would seem to lie with biopython but I was
wondering if you are aware of this problem and if any fix is to be made
available??
thanks

Kate Gould
________________________________________________________________________________
Software Engineer
Gibson Team
Structural and Computational Biology Unit
EMBL
Meyerhofstrasse 1
69117 Heidelberg, Germany
phone: +49 6221 387 451
fax: +49 6221 387 517
http://elm.eu.org/
http://phospho.elm.eu.org/

From bugzilla-daemon at portal.open-bio.org Thu Feb 9 08:59:49 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Thu Feb 9 09:34:50 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular qualifier structure
Message-ID: <200602091359.k19DxnVl013204@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

lpritc@scri.sari.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lpritc@scri.sari.ac.uk
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |

------- Comment #3 from lpritc@scri.sari.ac.uk 2006-02-09 08:59 -------
The updated CVS code from February 8th falls over on the note qualifier of
the following record from NC_007633.gbk

     CDS             391217..391771
                     /locus_tag="MCAP_0327"
                     /note="Similar non-mycoplasma proteins have and additional
                     120 amino acids at the COOH end; identified by similarity
                     to SP:P54575; match to protein family HMM PF06574"
                     /codon_start=1
                     /transl_table=4
                     /product="riboflavin kinase (flavokinase) domain protein"
                     /protein_id="YP_424312.1"
                     /db_xref="GI:83319941"
                     /db_xref="GeneID:3828958"
                     /translation="MIYINESFNKLKKLNIKKAIITIGNFDGFHIYHQKIINKVIQIA
                     KQENLTSIVMSFDKKIKDNITYTNLATKKQKLDFINNNLSDLDYFFDIKVDDSLIKTT
                     KDQFIDVLINKLNVIKIVEGQDFKFGYLSQGNIDDLIKAFSKKNVIIFKRDNDISSTK
                     IKKLLDENLVDKAQELLGIDLKLK"

Deleting the extra \n in the record resolves the problem.
From bugzilla-daemon at portal.open-bio.org Thu Feb 9 13:52:27 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Thu Feb 9 14:35:30 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular qualifier structure
Message-ID: <200602091852.k19IqQHQ018021@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

------- Comment #4 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-09 13:52 -------
This does seem to work for me using a freshly downloaded NC_007633.gbk that
starts:

LOCUS       NC_007633            1010023 bp    DNA     circular BCT 18-JAN-2006

It has the blank line 7114 you reported in locus MCAP_0327

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_007633.gbk'))
WARNING - Ignoring an unknown line type, PROJECT found:
PROJECT     GenomeProject:16208
>>> print record.features[644]
     CDS             391217..391771
                     /locus_tag="MCAP_0327"
                     /note="Similar non-mycoplasma proteins have and additional
                     120 amino acids at the COOH end; identified by similarity
                     to SP:P54575; match to protein family HMM PF06574"
                     /codon_start=1
                     /transl_table=4
                     /product="riboflavin kinase (flavokinase) domain protein"
                     /protein_id="YP_424312.1"
                     /db_xref="GI:83319941"
                     /db_xref="GeneID:3828958"
                     /translation="MIYINESFNKLKKLNIKKAIITIGNFDGFHIYHQKIINKVIQIA
                     KQENLTSIVMSFDKKIKDNITYTNLATKKQKLDFINNNLSDLDYFFDIKVDDSLIKTT
                     KDQFIDVLINKLNVIKIVEGQDFKFGYLSQGNIDDLIKAFSKKNVIIFKRDNDISSTK
                     IKKLLDENLVDKAQELLGIDLKLK"

The warning about the PROJECT line is a recent change, see bug 1946

I am using the latest version of Bio/GenBank/__init__.py which is revision
1.57 checked in 6 Feb 2006. This should be the same as yours if you
downloaded it on 8 Feb...
Assuming you have the same genbank file (same date in the LOCUS line) and
the same Bio/GenBank/__init__.py as me, then maybe there is something else
different between our machines, maybe in another part of BioPython.

Or, it could be a Windows/Unix line ending problem? Or worse, LF vs CR vs
CRLF.

Did you download the file by FTP or via the website? This might make a
difference if the original file contained a mixture of CR and CRLF.

So far I have only tried this on Windows (and I downloaded the file via the
NCBI website), and BioPython copes with the GenBank file in either windows
or unix format. I have not (yet) tried it on Linux...

Could you check what happens if you use dos2unix and/or unix2dos on your
GenBank file?

Thanks

From bugzilla-daemon at portal.open-bio.org Fri Feb 10 06:45:54 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 07:35:14 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on blank lines in features
Message-ID: <200602101145.k1ABjsRA028960@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED
         OS/Version|Linux                       |All
           Platform|PC                          |All
            Summary|GenBank RecordParser fails  |GenBank RecordParser fails
                   |on particular qualifier     |on blank lines in features
                   |structure                   |

------- Comment #5 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-10 06:45 -------
Tried on Linux with the file downloaded by HTTP from here, using the send
to file option:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi??db=nucleotide&val=NC_007633

..and it works perfectly. The "blank line" has the expected 21 spaces.
Then I tried (again on Linux) with the file downloaded by FTP from here:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Mycoplasma_capricolum_ATCC_27343/NC_007633.gbk

And it failed. Looking at the file in an editor, the blank line is empty -
it doesn't have the 21 spaces. As a result, the parser failure is
understandable:

Traceback (most recent call last):
  File "/home/maubp/GenBank/bug1942.py", line 3, in -toplevel-
    record = parser.parse(file('NC_007633.gbk'))
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 212, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 1630, in feed
    assert line[0:FEATURE_QUALIFIER_INDENT]==FEATURE_QUALIFIER_SPACER, \
AssertionError: Expected qualifier description continuation, not:

Is this the same error you saw, Leighton?

I would say the file with a blank line is actually a malformed GenBank
file, but as this is an official NCBI supplied file I'll try and get the
parser to support this too.

From bugzilla-daemon at portal.open-bio.org Fri Feb 10 07:18:18 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 07:35:28 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on blank lines in features
Message-ID: <200602101218.k1ACIIYo029244@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

lpritc@scri.sari.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|All                         |Linux
           Platform|All                         |PC

------- Comment #6 from lpritc@scri.sari.ac.uk 2006-02-10 07:18 -------
Hi Peter,

I normally update biopython incrementally from CVS using `cvs update`, and
the last update I made was February 8th.
I also downloaded the source for __init__.py v1.57 via ViewCVS from the BioPython site, and ran a diff against the installed versions:

[lpritc@lplinuxdev downloads]$ diff __init__.py /usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py
[lpritc@lplinuxdev downloads]$ diff __init__.py /usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py

Neither run reported any differences.

The NCBI files are downloaded by ftp in BIN mode direct to the Linux box I work on. Checking the offending record with khexedit shows that the two linebreaks are both LFs (0a), which you would expect to be handled equivalently on Windows and Linux. The NC_007633.gbk file I'm using has the same date as yours.

Running unix2dos, then attempting to parse, and dos2unix, and attempting to parse, on a freshly-downloaded copy of NC_007633.gbk threw the same error in each case on Linux:

>>> parser.parse(fhandle)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "Bio/GenBank/__init__.py", line 212, in parse
    self._scanner.feed(handle, self._consumer)
  File "Bio/GenBank/__init__.py", line 1630, in feed
    assert line[0:FEATURE_QUALIFIER_INDENT]==FEATURE_QUALIFIER_SPACER, \
AssertionError: Expected qualifier description continuation, not:

I get the same error message using the same file on Windows, even after conversion with unix2dos, and on Mac OS X with a fresh download of NC_007633.gbk via ftp and a fresh install of biopython from CVS.

I must say I'm baffled as to why I'm getting a different result to you on this file. On the bright side, it's the only current bacterial genome .gbk file I'm having a problem with.
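A quick way to settle the line-ending question without a hex editor is to count the terminators in the raw bytes and list any lines that are completely empty (as opposed to lines holding the expected 21 spaces). The following is a minimal sketch using only the standard library; the function names are hypothetical and it is not part of Biopython.

```python
def summarize_line_endings(data):
    """Count CRLF, bare LF (0x0a) and bare CR (0x0d) terminators in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CR not followed by LF
    }


def find_empty_lines(data):
    """Return the 1-based numbers of lines that contain no characters at all."""
    return [
        number
        for number, line in enumerate(data.splitlines(), start=1)
        if line == b""
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_endings.py NC_007633.gbk
    with open(sys.argv[1], "rb") as handle:  # binary mode: no newline translation
        raw = handle.read()
    print(summarize_line_endings(raw))
    print("empty lines:", find_empty_lines(raw))
```

Reading in binary mode matters here, because text mode would translate the terminators before they could be counted.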
From bugzilla-daemon at portal.open-bio.org Fri Feb 10 07:23:54 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Feb 10 07:35:46 2006 Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on blank lines in features Message-ID: <200602101223.k1ACNsjQ029272@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1942 ------- Comment #7 from lpritc@scri.sari.ac.uk 2006-02-10 07:23 ------- Interesting collision, there ;) Yep - I'm getting exactly that error message, and when I checked the NC_007633.gbk file with khexedit I didn't see the run of 21 spaces, either. Is it possible that that run of spaces is stripped out by my ftp client during transfer? ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mcolosimo at mitre.org Fri Feb 10 08:27:14 2006 From: mcolosimo at mitre.org (Marc Colosimo) Date: Fri Feb 10 08:23:06 2006 Subject: [Biopython-dev] An iterator for Align.Generic Message-ID: <43EC94B2.6090209@mitre.org> Last week I sent in a patch to Align.Generic to make it an iterator [Bug 1944]. Has anyone looked at this? I've found this to be very useful to me and I really don't want to keep a patch file around to add this functionality each time I checkout biopython. 
Marc From bugzilla-daemon at portal.open-bio.org Fri Feb 10 08:59:33 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Fri Feb 10 09:35:09 2006 Subject: [Biopython-dev] [Bug 1948] New: uniprot release 49/SProt.Record Parser Problem Message-ID: <200602101359.k1ADxXTk030188@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1948 Summary: uniprot release 49/SProt.Record Parser Problem Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: major Priority: P2 Component: Main Distribution AssignedTo: biopython-dev@biopython.org ReportedBy: gould@embl.de I've been having problems with some of our applications that use biopython scripts to retrieve a record from uniprot/swissprot given an accession nr/ID....As far as I'm aware the problem only occurred after the release 49.0 of uniprot/swissprot db on 6th Feb...I see from the release notes that some changes were made to the annotation format and suspect this is why the biopython scripts are no longer happy??....I've checked to make sure I have the latest version of biopython but this has not remedied the problem.....This problem would seem to lie with biopython. Are any fixes is to be made available?? An example of the error being thrown is below: Python 2.4 (#1, Dec 10 2004, 11:49:12) [GCC 3.3.1 (SuSE Linux)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> from Bio.WWW import ExPASy >>> from Bio.SwissProt import SProt >>> from Bio import File >>> acc='Q14155' >>> results = ExPASy.get_sprot_raw(acc.strip()).read() >>> sp_parser = SProt.RecordParser() File "", line 1 sp_parser = SProt.RecordParser() ^ SyntaxError: invalid syntax >>> sp_parser = SProt.RecordParse File "", line 1 sp_parser = SProt.RecordParse ^ SyntaxError: invalid syntax >>> sp_parser = SProt.RecordParser() >>> sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser) >>> Record = sp_iterator.next() Traceback (most recent call last): File "", line 1, in ? File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 166 , in next return self._parser.parse(File.StringHandle(data)) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 290 , in parse self._scanner.feed(handle, self._consumer) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 332 , in feed self._scan_record(uhandle, consumer) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 337 , in _scan_record fn(self, uhandle, consumer) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 369 , in _scan_id self._scan_line('ID', uhandle, consumer.identification, exactly_one=1) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 359 , in _scan_line read_and_call(uhandle, event_fn, start=line_type) File "/usr/local/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, in read_and_ca ll raise SyntaxError, errmsg SyntaxError: Line does not start with 'ID': >>> ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 10 08:52:09 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 09:35:26 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on blank lines in features
Message-ID: <200602101352.k1ADq91w030107@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #8 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-10 08:52 -------
I have checked in a fix for blank lines in Genbank feature entries. The parser will print a warning to screen and ignore the blank line(s).

See Bio/GenBank/__init__.py revision: 1.58

If you manage to find any more "unusual Genbank files" please file another bug.

Thanks Leighton.

Peter

P.S. Leighton wrote:
> Is it possible that that run of spaces is stripped
> out by my ftp client during transfer?

I don't know - but if so, the same thing happened with my browser's FTP support. Even if this is just a file transfer issue, I still think we should cope.

My personal opinion is that this is just a one-off bad entry in the NCBI's records, but that BioPython should be able to read anything they produce.

From biopython-dev at maubp.freeserve.co.uk Fri Feb 10 10:03:24 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri Feb 10 10:18:32 2006
Subject: [Biopython-dev] An iterator for Align.Generic
In-Reply-To: <43EC94B2.6090209@mitre.org>
References: <43EC94B2.6090209@mitre.org>
Message-ID: <43ECAB3C.4010708@maubp.freeserve.co.uk>

Marc Colosimo wrote:
> Last week I sent in a patch to Align.Generic to make it an iterator [Bug
> 1944]. Has anyone looked at this? I've found this to be very useful to
> me and I really don't want to keep a patch file around to add this
> functionality each time I checkout biopython.
>
> Marc

Hi Marc,

I haven't looked at your code, but I do use Clustal alignments quite often in BioPython. Could you put together a short example showing how this would work? If this was added to BioPython then we could use this to update the cook book.

Peter

From bugzilla-daemon at portal.open-bio.org Fri Feb 10 10:13:12 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 10:35:40 2006
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
Message-ID: <200602101513.k1AFDCPd031502@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

------- Comment #1 from mdehoon@ims.u-tokyo.ac.jp 2006-02-10 10:13 -------
Can you write an example script of how to use this, and how this is different from the current usage of Align.Generic?

From bugzilla-daemon at portal.open-bio.org Fri Feb 10 10:17:30 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 10:35:49 2006
Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem
Message-ID: <200602101517.k1AFHUgD031595@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948

------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-10 10:17 -------
I'm not familiar with this module, but I get a rather different result.

Could you attach the file that ExPASy.get_sprot_raw() returns to this bug? It looks like you got an HTML file back - I would guess this was an error page due to a temporary problem. If you try again I think something else will happen...
When I just did this on Windows, I did get a valid looking file back, but BioPython still failed to parse it: from Bio.WWW import ExPASy from Bio.SwissProt import SProt from Bio import File acc='Q14155' results = ExPASy.get_sprot_raw(acc.strip()).read() sp_parser = SProt.RecordParser() sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser) Record = sp_iterator.next() It also failed at the iterator next step, but in a different way: Traceback (most recent call last): File "c:\temp\bug1948.py", line 8, in -toplevel- Record = sp_iterator.next() File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 166, in next return self._parser.parse(File.StringHandle(data)) File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 290, in parse self._scanner.feed(handle, self._consumer) File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 332, in feed self._scan_record(uhandle, consumer) File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 337, in _scan_record fn(self, uhandle, consumer) File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 378, in _scan_dt self._scan_line('DT', uhandle, consumer.date, exactly_one=1) File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 359, in _scan_line read_and_call(uhandle, event_fn, start=line_type) File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 301, in read_and_call method(line) File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 551, in date assert rel_index >= 0, \ AssertionError: Could not find Rel. in DT line: DT 01-NOV-1997, integrated into UniProtKB/Swiss-Prot. Looking at the file returned gave: >>> print results ID ARHG7_HUMAN STANDARD; PRT; 803 AA. AC Q14155; Q6P9G3; Q6PII2; Q86W63; Q8N3M1; DT 01-NOV-1997, integrated into UniProtKB/Swiss-Prot. DT 19-JUL-2004, sequence version 2. DT 07-FEB-2006, entry version 55. DE Rho guanine nucleotide exchange factor 7 (PAK-interacting exchange DE factor beta) (Beta-Pix) (COOL-1) (p85). 
...
//

Reading Bio/SwissProt/SProt.py class _RecordConsumer method date(), none of those three DT lines look like what the code is expecting.

From mcolosimo at mitre.org Fri Feb 10 11:48:10 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Fri Feb 10 11:55:45 2006
Subject: [Biopython-dev] An iterator for Align.Generic
In-Reply-To: <43ECAB3C.4010708@maubp.freeserve.co.uk>
References: <43EC94B2.6090209@mitre.org> <43ECAB3C.4010708@maubp.freeserve.co.uk>
Message-ID: <43ECC3CA.4080804@mitre.org>

Peter wrote:
> Marc Colosimo wrote:
>
>> Last week I sent in a patch to Align.Generic to make it an iterator
>> [Bug 1944]. Has anyone looked at this? I've found this to be very
>> useful to me and I really don't want to keep a patch file around to
>> add this functionality each time I checkout biopython.
>>
>> Marc
>
>
> Hi Marc,
>
> I haven't looked at your code, but I do use Clustal alignments quite
> often in BioPython. Could you put together a short example showing
> how this would work? If this was added to BioPython then we could use
> this to update the cook book.
>
> Peter
>

Peter,

First, I think I need to make one change in the code so that one can re-iterate (basically reset the _iter_pos to 0).

Now, here is some code (I think this will work), which shows that you can make one function to handle sequences from different file types.
from Bio import Clustalw
from Bio import Fasta

def doSomethingInteresting(recIter):
    for seqRec in recIter:
        print seqRec.description
        print seqRec.seq.tostring()

#main
fasta_iter = Fasta.Iterator( open("my.fasta"), Fasta.SequenceParser() ) # where my.fasta is the unaligned sequences
aln_iter = Clustalw.parse_file("my.aln") # where my.aln is the aligned sequences
doSomethingInteresting(fasta_iter)
doSomethingInteresting(aln_iter)

From mdehoon at c2b2.columbia.edu Fri Feb 10 16:39:53 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Feb 10 16:43:31 2006
Subject: [Biopython-dev] uniprot release 49/biopython script no longer work
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE5A@cgcmail.cgc.cpmc.columbia.edu>

Could you post the error message that you're getting? Preferably, with a simple script that causes the error to appear?

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-dev-bounces@portal.open-bio.org on behalf of gould@embl.de
Sent: Thu 2/9/2006 4:37 AM
To: biopython-dev@biopython.org
Subject: [Biopython-dev] uniprot release 49/biopython script no longer work

hi

I've been having problems with some of our applications here that use biopython scripts to retrieve a record from uniprot/swissprot given an accession nr/ID....As far as I'm aware the problem only occurred after the release 49.0 of uniprot/swissprot db yesterday...I see from the release notes that some changes were made to the annotation format and suspect this is why the biopython scripts are no longer happy??....I've checked to make sure I have the latest version of biopython but this has not remedied the problem.....This problem would seem to lie with biopython but I was wondering if you are aware of this problem and if any fix is to be made available??
thanks Kate Gould _____________________________________________________________________________ ___ Software Engineer Gibson Team Structural and Computational Biology Unit EMBL Meyerhofstrasse 1 69117 Heidelberg, Germany phone: +49 6221 387 451 fax: +49 6221 387 517 http://elm.eu.org/ http://phospho.elm.eu.org/ _______________________________________________ Biopython-dev mailing list Biopython-dev@biopython.org http://biopython.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Sun Feb 12 15:49:03 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sun Feb 12 16:34:49 2006 Subject: [Biopython-dev] [Bug 1949] New: Biopython Nexus: Trees.py. Check for False fails Message-ID: <200602122049.k1CKn3X3017202@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1949 Summary: Biopython Nexus: Trees.py. Check for False fails Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev@biopython.org ReportedBy: wb@binf.ku.dk Line 383 in Trees.py should be changed from: if not newroot_subtree: to: if newroot_subtree == False The problem arises whenever the value of newroot_subtree is 0. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
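Before changing the test in Trees.py it may help to see how Python itself treats the candidate values. The sketch below is only an illustration of the language semantics. Which sentinel Trees.py actually uses for "no subtree" (False, None, or something else) is not shown in this report, so that is an open assumption, and `describe` is a hypothetical helper.

```python
def describe(value):
    """Return how four common 'is it false?' tests treat a value."""
    return {
        "not value": not value,
        "value == False": value == False,  # noqa: E712 (deliberate comparison)
        "value is False": value is False,
        "value is None": value is None,
    }


for candidate in (0, False, None, 3):
    print(repr(candidate), describe(candidate))
```

Because `bool` is a subclass of `int`, `0 == False` evaluates to True, so an equality test still treats a subtree id of 0 like False. An identity test (`is False` or `is None`, depending on the sentinel) is the form that tells them apart.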
From gould at embl.de Mon Feb 13 03:10:46 2006 From: gould at embl.de (gould@embl.de) Date: Mon Feb 13 03:06:39 2006 Subject: [Biopython-dev] uniprot release 49/biopython script no longer work In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE5A@cgcmail.cgc.cpmc.columbia.edu> References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE5A@cgcmail.cgc.cpmc.columbia.edu> Message-ID: <20060213091046.pejpszhykf3kssc4@webmail.embl.de> the simple script/error message which occurs when attempting to parse the result from uniprot is as follows: gould@milou:~> python Python 2.4 (#1, Dec 10 2004, 11:49:12) [GCC 3.3.1 (SuSE Linux)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from Bio.WWW import ExPASy >>> from Bio.SwissProt import SProt >>> from Bio import File >>> acc='Q14155' >>> results = ExPASy.get_sprot_raw(acc.strip()).read() >>> sp_parser = SProt.RecordParser() >>> sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser) >>> Record = sp_iterator.next() Traceback (most recent call last): File "", line 1, in ? 
File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 166, in next return self._parser.parse(File.StringHandle(data)) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 290, in parse self._scanner.feed(handle, self._consumer) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 332, in feed self._scan_record(uhandle, consumer) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 337, in _scan_record fn(self, uhandle, consumer) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 378, in _scan_dt self._scan_line('DT', uhandle, consumer.date, exactly_one=1) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 359, in _scan_line read_and_call(uhandle, event_fn, start=line_type) File "/usr/local/lib/python2.4/site-packages/Bio/ParserSupport.py", line 301, in read_and_call method(line) File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 551, in date assert rel_index >= 0, \ AssertionError: Could not find Rel. in DT line: DT 01-NOV-1997, integrated into UniProtKB/Swiss-Prot. Quoting Michiel De Hoon : > Could you post the error message that you're getting? Preferably, with a > simple script that causes the error to appear? > > --Michiel. 
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
> -----Original Message-----
> From: biopython-dev-bounces@portal.open-bio.org on behalf of gould@embl.de
> Sent: Thu 2/9/2006 4:37 AM
> To: biopython-dev@biopython.org
> Subject: [Biopython-dev] uniprot release 49/biopython script no longer work
>
> hi
>
> I've been having problems with some of our applications here that use biopython
> scripts to retrieve a record from uniprot/swissprot given an accession
> nr/ID....As far as I'm aware the problem only occurred after the release 49.0
> of uniprot/swissprot db yesterday...I see from the release notes that some
> changes were made to the annotation format and suspect this is why the
> biopython scripts are no longer happy??....I've checked to make sure I have the
> latest version of biopython but this has not remedied the problem.....This
> problem would seem to lie with biopython but I was wondering if you
> are aware of this problem and if any fix is to be made available??
> > thanks > > Kate Gould > > > _________________________________________________________________________= From bugzilla-daemon at portal.open-bio.org Mon Feb 13 03:21:38 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Mon Feb 13 03:34:50 2006 Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem Message-ID: <200602130821.k1D8Lclc024275@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1948 ------- Comment #2 from gould@embl.de 2006-02-13 03:21 ------- (In reply to comment #0) > I've been having problems with some of our applications that use biopython > scripts to retrieve a record from uniprot/swissprot given an accession > nr/ID....As far as I'm aware the problem only occurred after the release 49.0 > of uniprot/swissprot db on 6th Feb...I see from the release notes that some > changes were made to the annotation format and suspect this is why the > biopython scripts are no longer happy??....I've checked to make sure I have the > latest version of biopython but this has not remedied the problem.....This > problem would seem to lie with biopython. > Are any fixes is to be made available?? > An example of the error being thrown is below: > > Python 2.4 (#1, Dec 10 2004, 11:49:12) > [GCC 3.3.1 (SuSE Linux)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> from Bio.WWW import ExPASy > >>> from Bio.SwissProt import SProt > >>> from Bio import File > >>> acc='Q14155' > >>> results = ExPASy.get_sprot_raw(acc.strip()).read() > >>> sp_parser = SProt.RecordParser() > File "", line 1 > sp_parser = SProt.RecordParser() > ^ > SyntaxError: invalid syntax > >>> sp_parser = SProt.RecordParse > File "", line 1 > sp_parser = SProt.RecordParse > ^ > SyntaxError: invalid syntax > >>> sp_parser = SProt.RecordParser() > >>> sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser) > >>> Record = sp_iterator.next() > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line > 166 > , in next > return self._parser.parse(File.StringHandle(data)) > File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line > 290 > , in parse > self._scanner.feed(handle, self._consumer) > File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line > 332 > , in feed > self._scan_record(uhandle, consumer) > File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line > 337 > , in _scan_record > fn(self, uhandle, consumer) > File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line > 369 > , in _scan_id > self._scan_line('ID', uhandle, consumer.identification, exactly_one=1) > File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line > 359 > , in _scan_line > read_and_call(uhandle, event_fn, start=line_type) > File "/usr/local/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, > > in read_and_ca > > ll > raise SyntaxError, errmsg > SyntaxError: Line does not start with 'ID': > > > >>> > (In reply to comment #1) > I'm not familar with this module, but I get a rather different result. > > Could you attached the file that ExPASy.get_sprot_raw() returns to this bug? > It looks like you got an HTML file back - I would guess this was an error page > due to a temporary problem. 
If you try again I think something else will > happen... > > When I just did this on Windows, I did get a valid looking file back, but > BioPython still failed to parse it: > > from Bio.WWW import ExPASy > from Bio.SwissProt import SProt > from Bio import File > acc='Q14155' > results = ExPASy.get_sprot_raw(acc.strip()).read() > sp_parser = SProt.RecordParser() > sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser) > Record = sp_iterator.next() > > It also failed at the iterator next step, but in a different way: > Traceback (most recent call last): > File "c:\temp\bug1948.py", line 8, in -toplevel- > Record = sp_iterator.next() > File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 166, in > next > return self._parser.parse(File.StringHandle(data)) > File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 290, in > parse > self._scanner.feed(handle, self._consumer) > File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 332, in > feed > self._scan_record(uhandle, consumer) > File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 337, in > _scan_record > fn(self, uhandle, consumer) > File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 378, in > _scan_dt > self._scan_line('DT', uhandle, consumer.date, exactly_one=1) > File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 359, in > _scan_line > read_and_call(uhandle, event_fn, start=line_type) > File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 301, in > read_and_call > method(line) > File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 551, in > date > assert rel_index >= 0, \ > AssertionError: Could not find Rel. in DT line: DT 01-NOV-1997, integrated > into UniProtKB/Swiss-Prot. > > > > Looking at the file returned gave: > > >>> print results > ID ARHG7_HUMAN STANDARD; PRT; 803 AA. > AC Q14155; Q6P9G3; Q6PII2; Q86W63; Q8N3M1; > DT 01-NOV-1997, integrated into UniProtKB/Swiss-Prot. 
> DT 19-JUL-2004, sequence version 2.
> DT 07-FEB-2006, entry version 55.
> DE Rho guanine nucleotide exchange factor 7 (PAK-interacting exchange
> DE factor beta) (Beta-Pix) (COOL-1) (p85).
> ...
> //
>
> Reading Bio/SwissProt/SProt.py class _RecordConsumer method date(), none of
> those three DT lines look like what the code is expecting.
>

I'm not sure I follow what you are saying. I don't have a problem reading the file, and I get the same result as you did. The problem is parsing the results (the error above occurs).

From bugzilla-daemon at portal.open-bio.org Mon Feb 13 07:43:21 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 13 08:35:11 2006
Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem
Message-ID: <200602131243.k1DChLWS027589@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948

biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|Linux                       |All

------- Comment #3 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-13 07:43 -------
I'm unclear what you meant in comment 2, Kate.

Your original bug report had the following:

SyntaxError: Line does not start with 'ID':

This suggests that instead of getting a plain text SProt file (which should start 'ID'), you got an HTML file.

One reason for this MIGHT be a temporary problem with the ExPASy website - returning an error message in HTML.

If you still get the error message, could you attach the raw HTML to this bug (you could use "print results" at the Python prompt).

If the HTML problem has gone away on its own (which wouldn't surprise me if it was a temporary problem with the server) do you see the problem I talked about in comment 1 of the bug?
I have tried this on both Linux and Windows now, both show the problem described in comment 1 where the 'DT' lines do not match what BioPython is expecting. Quoting your original bug report: > I see from the release notes that some changes were made to the > annotation format and suspect this is why the biopython scripts > are no longer happy? Yes - this does explain the 'DT' line problem, BioPython will need to be updated to cope with the new format DT lines: http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0 Quoting: Changes concerning dates and versions numbers (DT lines) We changed from showing only the dates corresponding to full UniProtKB releases in the DT lines to displaying the date of the biweekly release at which an entry is integrated or updated. We dropped the information concerning the release number and introduced entry and sequence version numbers in the DT lines. The new format of the three DT lines is: DT DD-MMM-YYYY, integrated into UniProtKB/database_name. DT DD-MMM-YYYY, sequence version version_number. DT DD-MMM-YYYY, entry version version_number. Example for UniProtKB/Swiss-Prot: DT 01-JAN-1998, integrated into UniProtKB/Swiss-Prot. DT 15-OCT-2001, sequence version 3. DT 01-APR-2004, entry version 14. Example for UniProtKB/TrEMBL: DT 01-FEB-1999, integrated into UniProtKB/TrEMBL. DT 15-OCT-2000, sequence version 2. DT 15-DEC-2004, entry version 5. The sequence version number of an entry is incremented by one when its amino acid sequence is modified. The entry version number is incremented by one whenever any data in the flat file representation of the entry is modified. We retrofitted the entry and sequence version numbers, as well as all dates, using archived UniProtKB releases. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
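To make the new DT layout concrete, here is a minimal sketch that parses the three line shapes quoted from the release notes. It is an illustration only, not the change made to Bio/SwissProt/SProt.py; `DT_PATTERN`, `MONTHS` and `parse_dt_line` are hypothetical names, and the month table is spelled out to avoid locale-dependent date parsing.

```python
import re
from datetime import date

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# One pattern covering the three new-style DT lines:
#   DT DD-MMM-YYYY, integrated into UniProtKB/database_name.
#   DT DD-MMM-YYYY, sequence version version_number.
#   DT DD-MMM-YYYY, entry version version_number.
DT_PATTERN = re.compile(
    r"^DT\s+(?P<date>\d{2}-[A-Z]{3}-\d{4}), "
    r"(?:integrated into UniProtKB/(?P<database>[\w-]+)"
    r"|sequence version (?P<sequence_version>\d+)"
    r"|entry version (?P<entry_version>\d+))\.$"
)


def parse_dt_line(line):
    """Parse one new-style DT line into a dict; raise ValueError otherwise."""
    match = DT_PATTERN.match(line.rstrip())
    if match is None:
        raise ValueError("Unrecognised DT line: %r" % line)
    day, month, year = match.group("date").split("-")
    result = {"date": date(int(year), MONTHS.index(month) + 1, int(day))}
    for key in ("database", "sequence_version", "entry_version"):
        value = match.group(key)
        if value is not None:
            # Version numbers become integers, the database name stays text.
            result[key] = value if key == "database" else int(value)
    return result


for dt in ("DT   01-NOV-1997, integrated into UniProtKB/Swiss-Prot.",
           "DT   19-JUL-2004, sequence version 2.",
           "DT   07-FEB-2006, entry version 55."):
    print(parse_dt_line(dt))
```

An old-style line mentioning "Rel." does not match the pattern and raises ValueError, so a caller could fall back to the previous parsing logic for archived files.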
From bugzilla-daemon at portal.open-bio.org Mon Feb 13 08:59:53 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Mon Feb 13 09:34:50 2006 Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem Message-ID: <200602131359.k1DDxr2s028325@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1948 ------- Comment #4 from gould@embl.de 2006-02-13 08:59 ------- (In reply to comment #3) > I'm unclear what you meant in comment 2 Kate. > > Your original bug report had the following: > > SyntaxError: Line does not start with 'ID': > > > This suggests that instead of getting a plain text SProt file > (which should start 'ID'), you got an HTML file. > > Onre reason for this MIGHT be a temporary problem with the ExPASy > website - returning an error message in HTML. > > If you still get the error message, could you > attach the raw HTML to this bug (you could use "print results" > at the Python prompt). > > If the HTML problem has gone away on its own (which wouldn't > surprise me if it was a temporary problem with the server) do you > see the problem I talked about in comment 1 of the bug? > > I have tried this on both Linux and Windows now, both show the > problem described in comment 1 where the 'DT' lines do not match > what BioPython is expecting. > > Quoting your original bug report: > > I see from the release notes that some changes were made to the > > annotation format and suspect this is why the biopython scripts > > are no longer happy? > > Yes - this does explain the 'DT' line problem, BioPython will need > to be updated to cope with the new format DT lines: > > http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0 > > Quoting: > > Changes concerning dates and versions numbers (DT lines) > > We changed from showing only the dates corresponding to full UniProtKB releases > in the DT lines to displaying the date of the biweekly release at which an > entry is integrated or updated. 
We dropped the information concerning the > release number and introduced entry and sequence version numbers in the DT > lines. > > The new format of the three DT lines is: > > DT DD-MMM-YYYY, integrated into UniProtKB/database_name. > DT DD-MMM-YYYY, sequence version version_number. > DT DD-MMM-YYYY, entry version version_number. > > Example for UniProtKB/Swiss-Prot: > > DT 01-JAN-1998, integrated into UniProtKB/Swiss-Prot. > DT 15-OCT-2001, sequence version 3. > DT 01-APR-2004, entry version 14. > > Example for UniProtKB/TrEMBL: > > DT 01-FEB-1999, integrated into UniProtKB/TrEMBL. > DT 15-OCT-2000, sequence version 2. > DT 15-DEC-2004, entry version 5. > > The sequence version number of an entry is incremented by one when its amino > acid sequence is modified. The entry version number is incremented by one > whenever any data in the flat file representation of the entry is modified. > > We retrofitted the entry and sequence version numbers, as well as all dates, > using archived UniProtKB releases. > Yes, I understand what you are saying now....I'm no longer getting the HTML file but a plain text SProt file which is not being parsed correctly ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
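The two failure modes discussed in this bug (an HTML error page versus a plain-text record the parser rejects) can be told apart before parsing with a simple guard. This is a sketch under the assumption that a valid record's first non-blank line begins with the `ID` line code; `looks_like_swissprot` is a hypothetical helper, not a Biopython function.

```python
def looks_like_swissprot(text):
    """Return True if the first non-blank line starts with the 'ID' line code."""
    for line in text.splitlines():
        if line.strip():
            return line.startswith("ID ")
    return False  # empty response


# A plain-text record passes, an HTML error page does not.
print(looks_like_swissprot("ID   ARHG7_HUMAN   STANDARD;   PRT;   803 AA.\n"))
print(looks_like_swissprot("<html><body>Service unavailable</body></html>"))
```

Calling this on the string returned by the download step would let a script report "server returned something other than a flat file" separately from a genuine parser failure.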
From bugzilla-daemon at portal.open-bio.org Mon Feb 13 19:16:23 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Mon Feb 13 19:35:29 2006 Subject: [Biopython-dev] [Bug 1950] addition of element, SSBOND, OBSLTE, CAVEAT fields Message-ID: <200602140016.k1E0GNXW002951@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1950 ------- Comment #1 from edmonds@fas.harvard.edu 2006-02-13 19:16 ------- Created an attachment (id=287) --> (http://bugzilla.open-bio.org/attachment.cgi?id=287&action=view) patch to PDB to add element, SSBOND, OBSLTE, CAVEAT ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 13 19:15:14 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Mon Feb 13 19:36:09 2006 Subject: [Biopython-dev] [Bug 1950] New: addition of element, SSBOND, OBSLTE, CAVEAT fields Message-ID: <200602140015.k1E0FEQL002939@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1950 Summary: addition of element, SSBOND, OBSLTE, CAVEAT fields Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev@biopython.org ReportedBy: edmonds@fas.harvard.edu I find it useful to be able to sort atoms according to their element (H, C, N, O, etc), which is contained in columns 77-78 of the PDB, so I have added it to the parsing of the PDB, and have added get_element to Atom. I did not add it to the MMCIFParser because I don't know anything about MMCIFs and don't know if it's even applicable for MMCIFs. I also added trivial SSBOND, OBSLTE, and CAVEAT parsing to the PDB header parser. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
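Since PDB columns are counted from 1, columns 77-78 of an ATOM or HETATM record correspond to the Python slice [76:78]. A minimal sketch of reading the element that way and sorting records by it; element_from_pdb_line and sort_lines_by_element are hypothetical helpers, not the get_element method from the attached patch.

```python
def element_from_pdb_line(line):
    """Return the element symbol stored in columns 77-78 of a PDB record."""
    record = line[:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("Not an ATOM/HETATM record: %r" % line)
    # Columns 77-78 (1-based) are right-justified, e.g. " N" or "FE".
    return line[76:78].strip()


def sort_lines_by_element(lines):
    """Sort ATOM/HETATM records alphabetically by element symbol."""
    return sorted(lines, key=element_from_pdb_line)


# Padded dummy records; only the record name and the element columns matter here.
records = ["ATOM".ljust(76) + " O",
           "ATOM".ljust(76) + " C",
           "HETATM".ljust(76) + "FE"]
print([element_from_pdb_line(r) for r in sort_lines_by_element(records)])
```

Records shorter than 78 columns yield an empty string, which is one reason a real parser might want to fall back to guessing the element from the atom name.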
From bugzilla-daemon at portal.open-bio.org Mon Feb 13 19:16:59 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Mon Feb 13 19:36:28 2006 Subject: [Biopython-dev] [Bug 1950] addition of element, SSBOND, OBSLTE, CAVEAT fields to PDB Message-ID: <200602140016.k1E0Gxne002963@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1950 edmonds@fas.harvard.edu changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|addition of element, SSBOND,|addition of element, SSBOND, |OBSLTE, CAVEAT fields |OBSLTE, CAVEAT fields to PDB ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 15 08:07:51 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Wed Feb 15 08:35:12 2006 Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem Message-ID: <200602151307.k1FD7pmM031301@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1948 biopython-bugzilla@maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-15 08:07 ------- I have checked in a "short term fix" to the SwissProt parser, see Bio/SwissProt/SProt.py revision 1.32 If you want to test this, the simplest way is just backup your local copy of Bio/SwissProt/SProt.py and then replace it with the latest version from CVS via this URL: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython With this change BioPython will recognise the new style DT lines BUT WILL IGNORE THEM and carry on. This should allow people to do any analysis they need to, as long as they don't need the date information. 
I have logged bug 1956 to do something sensible with the new DT lines. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 15 08:01:10 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Wed Feb 15 08:35:29 2006 Subject: [Biopython-dev] [Bug 1956] New: SwissProt release 49 - Support for new DT lines Message-ID: <200602151301.k1FD1A9t031191@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1956 Summary: SwissProt release 49 - Support for new DT lines Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev@biopython.org ReportedBy: biopython-bugzilla@maubp.freeserve.co.uk See also bug 1948 (which I am marking fixed) where the parser would fail on the new files. I am checking in a fix to recognise the new DT lines but ignore them. This bug is to do something useful with the new format DT lines. http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0 Quoting: -------------------------------------------------------- Changes concerning dates and versions numbers (DT lines) We changed from showing only the dates corresponding to full UniProtKB releases in the DT lines to displaying the date of the biweekly release at which an entry is integrated or updated. We dropped the information concerning the release number and introduced entry and sequence version numbers in the DT lines. The new format of the three DT lines is: DT DD-MMM-YYYY, integrated into UniProtKB/database_name. DT DD-MMM-YYYY, sequence version version_number. DT DD-MMM-YYYY, entry version version_number. Example for UniProtKB/Swiss-Prot: DT 01-JAN-1998, integrated into UniProtKB/Swiss-Prot. DT 15-OCT-2001, sequence version 3. DT 01-APR-2004, entry version 14. 
Example for UniProtKB/TrEMBL: DT 01-FEB-1999, integrated into UniProtKB/TrEMBL. DT 15-OCT-2000, sequence version 2. DT 15-DEC-2004, entry version 5. The sequence version number of an entry is incremented by one when its amino acid sequence is modified. The entry version number is incremented by one whenever any data in the flat file representation of the entry is modified. We retrofitted the entry and sequence version numbers, as well as all dates, using archived UniProtKB releases. -------------------------------------------------------- End quote. We should expose the three new bits of information: database_name, e.g. "UniProtKB/Swiss-Prot" or maybe just "Swiss-Prot" sequence_version, e.g. 3 entry_version, e.g. 14 Also the precise meaning of the three dates has changed... Finally as the "release number" is no longer included, perhaps that record property should be deprecated. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 15 09:37:29 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Wed Feb 15 10:35:12 2006 Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem Message-ID: <200602151437.k1FEbTJY032137@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1948 ------- Comment #6 from gould@embl.de 2006-02-15 09:37 ------- (In reply to comment #5) > I have checked in a "short term fix" to the SwissProt parser, see > Bio/SwissProt/SProt.py revision 1.32 > > If you want to test this, the simplest way is just backup your local copy of > Bio/SwissProt/SProt.py and then replace it with the latest version from CVS via > this URL: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython > > With this change BioPython will recognise the new style DT lines BUT WILL > IGNORE THEM and carry on.
> > This should allow people to do any analysis they need to, as long as they don't > need the date information. > > I have logged bug 1956 to do something sensible with the new DT lines. > Yes, I've checked that 'short term fix' and it works for me so thanks for your help on that matter.... ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Wed Feb 15 11:39:52 2006 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed Feb 15 11:35:52 2006 Subject: [Biopython-dev] [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex on Windows Message-ID: <43F35958.702@maubp.freeserve.co.uk> Thomas Hamelryck wrote: > If the mmCIF module causes problems it can just be commented out. I'm beginning to think that might be best (certainly on Windows, as it doesn't seem to work "out of the box" using MSVC or cygwin gcc as the compiler). I have just been trying to work out why Bio.PDB.mmCIF.MMCIFlex won't compile on Windows with MSVC 6.0 First of all, running "setup.py build" doesn't seem to call flex. For example, it doesn't regenerate lex.yy.c if I delete it before hand. The version of Bio/PDB/mmCIF/lex.yy.c currently in CVS has the following as line 12: #include This is unconditional, and won't work with MSVC because the header does not exist. 
If I change to the relevant directory, and run "flex mmcif.lex" using the cygwin version of flex version 2.5.4 then the lex.yy.c file is recreated, and in addition to some lines moving about, this include statement becomes: #ifndef _WIN32 #include #endif This still will not compile because it has not defined exit, malloc, realloc and free: lex.yy.c(1505) : warning C4013: 'exit' undefined; assuming extern returning int lex.yy.c(1568) : warning C4013: 'malloc' undefined; assuming extern returning int lex.yy.c(1586) : warning C4013: 'realloc' undefined; assuming extern returning int lex.yy.c(1596) : warning C4013: 'free' undefined; assuming extern returning int If instead of using the cygwin version of flex, I use the gnuwin32 port, which also claims to be flex version 2.5.4 then the lex.yy.c is slightly different again - it has NO conditional statements checking for win32. http://gnuwin32.sourceforge.net/packages/flex.htm I think the include lines should be something like: #ifdef _WIN32 #include #else #include #endif After messing about with lex.yy.c include statements, I can get MSVC to compile lex.yy.obj (with warnings) as shown here: C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IBio -Ic:\python23\include -Ic:\python23\PC /TcBio/PDB/mmCIF/MMCIFlexmodule.c /Fobuild\temp.win32-2.3\Release\Bio/PDB/mmCIF/MMCIFlexmodule.obj MMCIFlexmodule.c Bio/PDB/mmCIF/MMCIFlexmodule.c(16) : warning C4013: 'mmcif_set_file' undefined; assuming extern returning int Bio/PDB/mmCIF/MMCIFlexmodule.c(44) : warning C4013: 'mmcif_get_token' undefined; assuming extern returning int C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IBio -Ic:\python23\include -Ic:\python23\PC /TcBio/PDB/mmCIF/lex.yy.c /Fobuild\temp.win32-2.3\Release\Bio/PDB/mmCIF/lex.yy.obj lex.yy.c But then it fails at the link stage: C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe /DLL /nologo /INCREMENTAL:NO
/LIBPATH:c:\python23\libs /LIBPATH:c:\python23\PCBuild fl.lib /EXPORT:initMMCIFlex build\temp.win32-2.3\Release\Bio/PDB/mmCIF/lex.yy.obj build\temp.win32-2.3\Release\Bio/PDB/mmCIF/MMCIFlexmodule.obj /OUT:build\lib.win32-2.3\Bio\PDB\mmCIF\MMCIFlex.pyd /IMPLIB:build\temp.win32-2.3\Release\Bio/PDB/mmCIF\MMCIFlex.lib LINK : fatal error LNK1181: cannot open input file "fl.lib" error: command '"C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe"' failed with exit status 1181 This looks similar to the linking problem Michiel sees on Windows using cygwin gcc as the compiler: http://www.biopython.org/pipermail/biopython/2006-February/002923.html Is the problem that the linker can't find the flex library? I assume it needs either the gnuwin32 flex file installed by default here: C:\Program Files\GnuWin32\lib\libfl.a Or, if using the cygwin flex, here: C:\cygwin\lib\libfl.a Peter From thamelry at binf.ku.dk Wed Feb 15 11:46:46 2006 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Wed Feb 15 12:00:40 2006 Subject: [Biopython-dev] Re: [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex on Windows In-Reply-To: <43F35958.702@maubp.freeserve.co.uk> References: <43F35958.702@maubp.freeserve.co.uk> Message-ID: <33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk> On Wed, February 15, 2006 5:39 pm, Peter wrote: > First of all, running "setup.py build" doesn't seem to call flex. For > example, it doesn't regenerate lex.yy.c if I delete it before hand. It's not meant to: lex.yy.c is distributed as part of biopython. You just need the Flex libraries to compile it. Anyways, I've commented out the mmCif module. People who need it can uncomment the relevant lines in setup.py.
Cheers, -Thomas From biopython-dev at maubp.freeserve.co.uk Wed Feb 15 13:12:48 2006 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed Feb 15 13:25:29 2006 Subject: [Biopython-dev] Re: [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex on Windows In-Reply-To: <33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk> References: <43F35958.702@maubp.freeserve.co.uk> <33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk> Message-ID: <43F36F20.9090207@maubp.freeserve.co.uk> Peter wrote: >> First of all, running "setup.py build" doesn't seem to call flex. >> For example, it doesn't regenerate lex.yy.c if I delete it >> before hand. Thomas Hamelryck wrote: > It's not meant to: lex.yy.c is distributed as part of > biopython. You just need the Flex libraries to compile it. OK - I was unclear on this. > Anyways, I've commented out the mmCif module. > People who need it can uncomment the relevant lines > in setup.py. Seems like a good compromise for now (unless someone wants to contribute a patch for setup.py to check if flex is installed). As lex.yy.c is created by flex, would you agree the problems compiling this file with MSVC are a flex problem? e.g. the #include What about the linker problem (seen with by me with MSVC, and by Michiel with cygwin gcc) not finding the flex library? Might this just be a path issue? It would be nice if we could get the module to work on Windows, at least for people with suitable compilers. Peter From mdehoon at c2b2.columbia.edu Wed Feb 15 13:41:26 2006 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed Feb 15 13:36:58 2006 Subject: [Biopython-dev] Re: [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex onWindows Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE6A@cgcmail.cgc.cpmc.columbia.edu> > What about the linker problem (seen with by me with MSVC, and by Michiel > with cygwin gcc) not finding the flex library? Might this just be a > path issue? In my case, it's probably just because I don't have flex installed. 
I remember that at one point (when I was creating the Windows installer for an older Biopython version), I was able to build the flex library. But anyway, users will have to install flex themselves also, because they will need the flex DLL. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From thamelry at binf.ku.dk Wed Feb 15 15:10:08 2006 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Wed Feb 15 15:05:33 2006 Subject: [Biopython-dev] Re: [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex on Windows In-Reply-To: <43F36F20.9090207@maubp.freeserve.co.uk> References: <43F35958.702@maubp.freeserve.co.uk> <33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk> <43F36F20.9090207@maubp.freeserve.co.uk> Message-ID: <32785.87.72.27.226.1140034208.squirrel@www.binf.ku.dk> > As lex.yy.c is created by flex, would you agree the problems compiling > this file with MSVC are a flex problem? e.g. the #include Uh...no idea. :-) I know next to nothing about Windows, but I can imagine it should in principle work with cygwin. Maybe flex needs to be re-run on windows before compiling? Was that tried? Cheers, -Thomas From biopython-dev at maubp.freeserve.co.uk Wed Feb 15 15:52:48 2006 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed Feb 15 15:48:44 2006 Subject: [Biopython-dev] Re: Compiling Bio.PDB.mmCIF.MMCIFlex on Windows In-Reply-To: <32785.87.72.27.226.1140034208.squirrel@www.binf.ku.dk> References: <43F35958.702@maubp.freeserve.co.uk> <33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk> <43F36F20.9090207@maubp.freeserve.co.uk> <32785.87.72.27.226.1140034208.squirrel@www.binf.ku.dk> Message-ID: <43F394A0.9010000@maubp.freeserve.co.uk> >>As lex.yy.c is created by flex, would you agree the problems compiling >>this file with MSVC are a flex problem? e.g. the #include > > Uh...no idea. 
:-) > I know next to nothing about Windows, but > I can imagine it should in principle work with cygwin. > Maybe flex needs to be re-run on windows before compiling? > Was that tried? Yes - I tried both the cygwin flex, and a windows port from here: http://gnuwin32.sourceforge.net/packages/flex.htm While both claimed to be flex version 2.5.4, they produced different lex.yy.c files, however neither worked for me. See my earlier email: http://www.biopython.org/pipermail/biopython-dev/2006-February/002280.html Peter P.S. I'll be away for the next few days, so I won't be responding till next week From bugzilla-daemon at portal.open-bio.org Thu Feb 16 19:33:25 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Thu Feb 16 20:16:00 2006 Subject: [Biopython-dev] [Bug 1919] Transcribe DNA Message-ID: <200602170033.k1H0XP3i023754@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1919 mdehoon@ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #2 from mdehoon@ims.u-tokyo.ac.jp 2006-02-16 19:33 ------- > I was reading some examples in the biopython tutorial and cookbook and for the > first time, since I'd already read it many times, I get confused... > Transcribing the dna sequence ATCG produces the AUCG rna sequence or the UAGC? > Biopython does the first one, but until today I was completely sure that the > correct one is the second. DNA sequences are (almost?) always shown as the coding (sense) strand. So if a DNA sequence is written as ATCG, then this is the coding strand; the complementary template strand has TAGC in the 3'->5' direction. The mRNA is produced by base-pairing to the template strand, so you end up with AUCG. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
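The two readings of "transcribe" in the comment above can be checked with plain strings. This is a rough sketch only: transcribe, complement_strand and rna_from_paired_strand are hypothetical helpers, not Biopython functions, and the input is assumed to be the strand as written, 5'->3'.

```python
def transcribe(written_dna):
    """mRNA for a DNA sequence as written (5'->3'): same letters, U instead of T."""
    return written_dna.upper().replace("T", "U")


def complement_strand(written_dna):
    """Base-pairing partner of the written strand, position by position (3'->5')."""
    return written_dna.upper().translate(str.maketrans("ACGT", "TGCA"))


def rna_from_paired_strand(paired_dna):
    """RNA built by base-pairing against a DNA strand, position by position."""
    return paired_dna.upper().translate(str.maketrans("ACGT", "UGCA"))


written = "ATCG"
partner = complement_strand(written)      # the strand the polymerase reads
print(partner)
print(transcribe(written))
print(rna_from_paired_strand(partner))    # same result as transcribe(written)
```

Pairing RNA bases against the partner strand gives back the written sequence with U for T, which is why replacing T with U on the written strand is enough.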
From mdehoon at c2b2.columbia.edu Sun Feb 19 12:15:57 2006 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun Feb 19 12:11:31 2006 Subject: [Biopython-dev] RE: problem in using biopython-1.41 Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE7A@cgcmail.cgc.cpmc.columbia.edu> Dear Sarosh, From the test results, it looks like all of Biopython is working correctly, except for Bio.Cluster. So if you don't plan on using the clustering algorithms in Bio.Cluster, you've got nothing to worry about. I would like to find out though why Bio.Cluster is failing. It may have to do with the fact that you're on a 64-bit machine; the code has not been tested there. So I'd like to ask you the following: a) Which version of Numerical Python are you using? b) Can you run the following commands and send me the output of step 3: 1) Install biopython with "python setup.py install" 2) From the biopython-1.41/Tests directory, run "python -i test_Cluster.py" 3) From the python prompt, execute "run_tests("Bio.Cluster")" This will show you the exact output from the Bio.Cluster tests. Thanks in advance, --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: mailman-bounces@portal.open-bio.org on behalf of Sarosh Fatakia Sent: Sat 2/18/2006 9:14 PM To: biopython-dev-owner@biopython.org Subject: problem in using biopython-1.41 Hi I tried sending my problem to biopython developers. Hope you can please help, thanks sarosh ---------- Forwarded message ---------- From: biopython-dev-owner@portal.open-bio.org < biopython-dev-owner@portal.open-bio.org> Date: Feb 18, 2006 7:15 PM Subject: problem in using biopython-1.41 To: sarosh.fatakia@gmail.com You are not allowed to post to this mailing list, and your message has been automatically rejected.
If you think that your messages are being rejected in error, contact the mailing list owner at biopython-dev-owner@biopython.org. ---------- Forwarded message ---------- From: "Sarosh Fatakia" To: biopython-dev@biopython.org Date: Sat, 18 Feb 2006 18:50:58 -0500 Subject: problem in using biopython-1.41 Greetings! I installed biopython-1.41 on my unix box which is: Linux DK12AR4059LNX1 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:11:56 EST 2006 x86_64 x86_64 x86_64 GNU/Linux After following the preliminary steps in http://bioinformatics.org/bradstuff/bp/tut/Tutorial001.html I get an error message for the tests performed as: python setup.py test 2>&1 | tee python_setup.py-test.out.txt The main error message is below and the full diagnostic info is attached as a txt file. I hope you can please help figure out the problem since I am a novice python user, and want to get into biopython for using it as a research tool asap. Thanks Sarosh, NIDDK/NIH ====================================================================== FAIL: test_Cluster ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 148, in runTest self.runSafeTest() File "run_tests.py", line 185, in runSafeTest expected_handle) File "run_tests.py", line 285, in compare_output assert expected_line == output_line, \ AssertionError: Output : 'Wrong clustering solution found.\n' Expected: 'Correct clustering solution found.\n' ---------------------------------------------------------------------- Ran 93 tests in 78.138s -- Sarosh N. Fatakia http://budoe.bu.edu/~sfatakia/sarosh/sarosh.html From mdehoon at c2b2.columbia.edu Tue Feb 21 10:50:13 2006 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue Feb 21 10:45:43 2006 Subject: [Biopython-dev] RE: problem in using biopython-1.41 Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE80@cgcmail.cgc.cpmc.columbia.edu> Hi Sarosh, Thanks for your reply. 
The problem may be the fact that you're using numarray instead of Numerical Python. If I remember correctly, numarray has some facilities to handle large arrays on 64-bit machines. Since Bio.Cluster is created for Numerical Python, it doesn't know about the numarray-specific stuff. So the simplest solution may be to install Numerical Python instead of numarray. From your output of "python setup.py install", it appears that even the compilation of Bio.Cluster failed -- if not all modules that need compilation: I don't see any of the compiled modules in the install output. So it looks like something terrible went wrong during "python setup.py build". Did you get any error messages while running "python setup.py build"? Finally, there's a typo when you execute 'run_tests("Bio.Cluster")': That should be Bio.Cluster, not Bio_Cluster. I think it's best to execute "python setup.py build" again and check if you get any error messages. Once that works OK, you probably won't get any testing errors any more. If you do, try again with Numerical Python instead of numarray. The latest good version of Numerical Python is 24.2. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Sarosh Fatakia [mailto:sarosh.fatakia@gmail.com] Sent: Tue 2/21/2006 10:01 AM To: Michiel De Hoon Cc: biopython-dev@biopython.org Subject: Re: problem in using biopython-1.41 Hello Michiel, Thanks for your response. Hope you can please help resolve the issue. a) The numpy (numarray) version is: numarray-1.5.1/ b) I am attaching the outputs you require. I hope there is sufficient info. Please do let me know if more information is required. Thanks once again, best, sarosh On 2/19/06, Michiel De Hoon wrote: > > Dear Sarosh, > > From the test results, it looks like all of Biopython is working > correctly, > except for Bio.Cluster.
So if you don't plan on using the clustering > algorithms in Bio.Cluster, you've got nothing to worry about. > > I would like to find out though why Bio.Cluster is failing. It may have to > do > with the fact that you're on a 64-bit machine; the code has not been > tested > there. So I'd like to ask you the following: > a) Which version of Numerical Python are you using? > b) Can you run the following commands and send me the output of step 3: > 1) Install biopython with "python setup.py install" > 2) From the biopython-1.41/Tests directory, run "python -i > test_Cluster.py" > 3) From the python prompt, execute "run_tests("Bio.Cluster")" > This will show you the exact output from the Bio.Cluster tests. > > Thanks in advance, > > --Michiel. > > > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > > -----Original Message----- > From: mailman-bounces@portal.open-bio.org on behalf of Sarosh Fatakia > Sent: Sat 2/18/2006 9:14 PM > To: biopython-dev-owner@biopython.org > Subject: problem in using biopython-1.41 > > Hi I tried sending my problem to > biopython developers. Hope you can please help, > thanks > sarosh > > > ---------- Forwarded message ---------- > From: biopython-dev-owner@portal.open-bio.org < > biopython-dev-owner@portal.open-bio.org> > Date: Feb 18, 2006 7:15 PM > Subject: problem in using biopython-1.41 > To: sarosh.fatakia@gmail.com > > You are not allowed to post to this mailing list, and your message has > been automatically rejected. If you think that your messages are > being rejected in error, contact the mailing list owner at > biopython-dev-owner@biopython.org. > > > > > ---------- Forwarded message ---------- > From: "Sarosh Fatakia" < sarosh.fatakia@gmail.com> > To: biopython-dev@biopython.org > Date: Sat, 18 Feb 2006 18:50:58 -0500 > Subject: problem in using biopython-1.41 > Greetings!
> I installed biopython-1.41 on my unix box which is: > Linux DK12AR4059LNX1 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:11:56 EST 2006 > x86_64 x86_64 x86_64 GNU/Linux > > After following the preliminary steps in > http://bioinformatics.org/bradstuff/bp/tut/Tutorial001.html > I get an error message for the tests performed as: > > python setup.py test 2>&1 | tee python_setup.py- test.out.txt > > The main error message is below and the full diagnostic info is attached > as > a txt file. > I hope you can please help figure out the problem since I am a novice > python > user, > and want to get into biopython for using it as a research tool asap. > Thanks > Sarosh, > NIDDK/NIH > > > ====================================================================== > FAIL: test_Cluster > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "run_tests.py", line 148, in runTest > self.runSafeTest() > File "run_tests.py", line 185, in runSafeTest > expected_handle) > File "run_tests.py", line 285, in compare_output > assert expected_line == output_line, \ > AssertionError: > Output : 'Wrong clustering solution found.\n' > Expected: 'Correct clustering solution found.\n' > > ---------------------------------------------------------------------- > Ran 93 tests in 78.138s > > > > > > -- > Sarosh N. Fatakia > http://budoe.bu.edu/~sfatakia/sarosh/sarosh.html > > > -- Sarosh N. 
Fatakia http://budoe.bu.edu/~sfatakia/sarosh/sarosh.html From bugzilla-daemon at portal.open-bio.org Tue Feb 21 18:31:21 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Tue Feb 21 19:15:40 2006 Subject: [Biopython-dev] [Bug 1933] Iterator support for Standalone XML blast output with multiple querys Message-ID: <200602212331.k1LNVL8o002535@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1933 ------- Comment #4 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-21 18:31 ------- Created an attachment (id=290) --> (http://bugzilla.open-bio.org/attachment.cgi?id=290&action=view) RPS-BLAST 2.2.10 multi query XML output file for testing iterator support Example multi-record XML test file, actually from rpsblast.exe 2.2.10 running on windows despite what the version string in the file claims. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Feb 21 18:34:06 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Tue Feb 21 19:16:01 2006 Subject: [Biopython-dev] [Bug 1933] Iterator support for Standalone XML blast output with multiple querys Message-ID: <200602212334.k1LNY6Ht002578@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1933 ------- Comment #5 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-21 18:34 ------- Created an attachment (id=291) --> (http://bugzilla.open-bio.org/attachment.cgi?id=291&action=view) RPS-BLAST 2.2.10 multi query TXT output file for testing iterator support The matching "plain text" output file to go with the XML file just attached. This is human readable and should help for any testing of the XML file parsing.
Note that BioPython does not support the RPS-BLAST style plain text file format, see bug 1715 ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Feb 21 18:36:04 2006 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Tue Feb 21 19:16:19 2006 Subject: [Biopython-dev] [Bug 1933] Iterator support for Standalone XML blast output with multiple querys Message-ID: <200602212336.k1LNa4dN002614@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1933 ------- Comment #6 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-21 18:36 ------- Created an attachment (id=292) --> (http://bugzilla.open-bio.org/attachment.cgi?id=292&action=view) The FASTA file used as input to generate the test cases The attached FASTA amino acid file was used to create the previous two test cases running rpsblast.exe 2.2.10 on windows XP using the CDD database: rpsblast -i xbt_iter.faa -d data_cdd/Cdd -e 0.0001 > xbt_iter_rps.txt rpsblast -i xbt_iter.faa -d data_cdd/Cdd -e 0.0001 -m 7 > xbt_iter_rps.xml ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
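For anyone wanting to eyeball a multi-query XML file such as xbt_iter_rps.xml without the patch, the per-query blocks can be streamed with the standard library. This is a minimal sketch, not the patch from attachment 266. It assumes each query sits in its own Iteration element carrying Iteration_iter-num and Iteration_query-def children, which is the layout of later NCBI BLAST XML as far as I know; output from 2.2.10 may well differ. iter_query_defs and SAMPLE are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for a multi-query BLAST XML file (assumed layout).
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>query one</Iteration_query-def>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-def>query two</Iteration_query-def>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def iter_query_defs(handle):
    """Yield (iteration number, query definition) one query at a time."""
    for _event, element in ET.iterparse(handle, events=("end",)):
        if element.tag == "Iteration":
            number = int(element.findtext("Iteration_iter-num"))
            query = element.findtext("Iteration_query-def")
            yield number, query
            element.clear()  # drop the finished block to keep memory use flat


for number, query in iter_query_defs(io.BytesIO(SAMPLE)):
    print(number, query)
```

With a real file the handle would come from open("xbt_iter_rps.xml", "rb") instead of the in-memory sample.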
From bugzilla-daemon at portal.open-bio.org Wed Feb 22 05:49:40 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Wed Feb 22 06:15:54 2006
Subject: [Biopython-dev] [Bug 1929] Extra reference in BLASTPGP plain text output
Message-ID: <200602221049.k1MAneI8013509@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1929

biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-22 05:49 -------
This was fixed in CVS on 16 Aug 2005 by Jeff Chang, originally reported by Zhengwei Zhu.

I can't find any reference to this in Bugzilla, and if it was reported on the mailing list I can't find it.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Feb 22 06:19:26 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Wed Feb 22 07:15:43 2006
Subject: [Biopython-dev] [Bug 1933] Iterator support for Standalone XML blast output with multiple querys
Message-ID: <200602221119.k1MBJQ1j014525@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1933

------- Comment #7 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-22 06:19 -------
Created an attachment (id=293)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view)
Python script to test the Blast XML iteration

This is a simple test script which uses the standalone RPS-BLAST output file xbt_iter_rps.xml as the input file, attachment 290 on this bug.

Using Michael Anthony Maibaum's patch (attachment 266) this seems to work fine.

I would be happy to check in the patch and then integrate the XML iteration into Tests/test_GenBank.py

Note: It might be even better to create a matched set of normal BLAST files (plain text and XML) with a test script to confirm they behave identically in BioPython.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mdehoon at c2b2.columbia.edu Wed Feb 22 11:23:25 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed Feb 22 11:21:23 2006
Subject: [Biopython-dev] RE: please help
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE84@cgcmail.cgc.cpmc.columbia.edu>

The reason that test_Cluster fails is that on 64-bit machines, the size of an int is not equal to the size of a long. Numerical Python integer arrays are of type long by default, and casting to int causes the data stored in the arrays to be misinterpreted at the C level. The fix is more or less straightforward; I expect to have this fixed within a few days.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: Sarosh Fatakia [mailto:sarosh.fatakia@gmail.com]
Sent: Tue 2/21/2006 7:26 PM
To: Michiel De Hoon; biopython-dev@biopython.org
Subject: please help

Hi folks,

I have been trying all weekend to make biopython functional. It seems that more than one module is non-functional on the 64-bit architecture of my machine, viz:

Linux DK12AR4059LNX1 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:11:56 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

and the OS is Red Hat Enterprise Linux WS release 4. (Red Hat 3.4.3-9.EL4)

Is there a web page that describes, step by step, the installation of all the dependent modules needed for a complete biopython-1.41 installation?

The python version is: Python 2.3.4

With a partial installation I can only have limited functionality. The lone test that fails in biopython-1.41 is test_Cluster.py.
Hope you can please help,
Thanks,
sarosh

From bugzilla-daemon at portal.open-bio.org Sat Feb 25 09:33:25 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Feb 25 10:20:52 2006
Subject: [Biopython-dev] [Bug 1963] New: Adding __str__ method to codon tables and translators
Message-ID: <200602251433.k1PEXPgO016034@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1963

           Summary: Adding __str__ method to codon tables and translators
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: biopython-bugzilla@maubp.freeserve.co.uk

The existing CodonTable and Translator objects do not provide a simple way to "see" the table.

It would be nice to be able to just "print" them using the __str__ method:

e.g.

>>> import Bio.Data.CodonTable
>>> print Bio.Data.CodonTable.standard_dna_table

  |   G     |  A     |  T     |  C     |
--+---------+--------+--------+--------+--
G | GGG G   |GAG E   |GTG V   |GCG A   | G
G | GGA G   |GAA E   |GTA V   |GCA A   | A
G | GGT G   |GAT D   |GTT V   |GCT A   | T
G | GGC G   |GAC D   |GTC V   |GCC A   | C
--+---------+--------+--------+--------+--
A | AGG R   |AAG K   |ATG M(s)|ACG T   | G
A | AGA R   |AAA K   |ATA I   |ACA T   | A
A | AGT S   |AAT N   |ATT I   |ACT T   | T
A | AGC S   |AAC N   |ATC I   |ACC T   | C
--+---------+--------+--------+--------+--
T | TGG W   |TAG Stop|TTG L(s)|TCG S   | G
T | TGA Stop|TAA Stop|TTA L   |TCA S   | A
T | TGT C   |TAT Y   |TTT F   |TCT S   | T
T | TGC C   |TAC Y   |TTC F   |TCC S   | C
--+---------+--------+--------+--------+--
C | CGG R   |CAG Q   |CTG L(s)|CCG P   | G
C | CGA R   |CAA Q   |CTA L   |CCA P   | A
C | CGT R   |CAT H   |CTT L   |CCT P   | T
C | CGC R   |CAC H   |CTC L   |CCC P   | C
--+---------+--------+--------+--------+--

This was done by adding the following method to Bio/Data/CodonTable.py

class CodonTable:
    def __str__(self) :
        """Returns a simple text representation of the codon table"""
        answer="  | " + "|".join( \
            ["  %s     " % c2 for c2 in self.nucleotide_alphabet.letters] \
            ) + "|"
        answer = answer + "\n--+---------+--------+--------+--------+--"
        for c1 in self.nucleotide_alphabet.letters :
            for c3 in self.nucleotide_alphabet.letters :
                line = c1 + " | "
                for c2 in self.nucleotide_alphabet.letters :
                    codon = c1+c2+c3
                    if codon in self.start_codons :
                        line = line + "%s %s(s)|" \
                            % (codon, self.forward_table[codon])
                    elif codon in self.stop_codons :
                        line = line + "%s Stop|" \
                            % (codon)
                    else:
                        line = line + "%s %s   |" \
                            % (codon, self.forward_table[codon])
                line = line + " " + c3
                answer = answer + "\n"+ line
            answer = answer + "\n--+---------+--------+--------+--------+--"
        return answer

A similar __str__ method could be added to Bio/Translate.py to call the codon table's __str__ method.

Comments?

Should the order be UCAG rather than following self.nucleotide_alphabet.letters?

Should it include three letter amino acid codes as well?

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
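A minimal standalone sketch of the layout discussed in this report, for readers who want to try it without a Biopython checkout. It is not the proposed patch: it runs on a plain dict, uses the conventional TCAG ordering asked about above, marks only ATG as a start codon, and the names build_forward_table and format_codon_table are hypothetical.

```python
# Standard genetic code, listed with the first, second and third base each
# cycling through T, C, A, G (third base varies fastest). "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_forward_table():
    """Return a dict mapping each DNA codon to its one-letter amino acid."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, AMINO))


def format_codon_table(table, start_codons=("ATG",), letters=BASES):
    """Render the table as text: rows share the first base, columns the second."""
    rule = "--+" + "+".join("---------" for _ in letters) + "+--"
    header = "  |" + "|".join("  %s      " % c2 for c2 in letters) + "|"
    lines = [header, rule]
    for c1 in letters:
        for c3 in letters:
            cells = []
            for c2 in letters:
                codon = c1 + c2 + c3
                amino = table.get(codon, "?")  # "?" for codons missing from the dict
                if amino == "*":
                    label = "Stop"
                elif codon in start_codons:
                    label = amino + "(s)"
                else:
                    label = amino + "   "
                cells.append(" %s %s" % (codon, label))
            lines.append("%s |%s| %s" % (c1, "|".join(cells), c3))
        lines.append(rule)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_codon_table(build_forward_table()))
```

Building each row from a list of fixed-width cells keeps the column separators aligned with the rule line regardless of which alphabet is passed in as letters.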
From bugzilla-daemon at portal.open-bio.org Sun Feb 26 10:35:32 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sun Feb 26 11:20:31 2006
Subject: [Biopython-dev] [Bug 1963] Adding __str__ method to codon tables and translators
Message-ID: <200602261535.k1QFZWmI014403@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1963

------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk 2006-02-26 10:35 -------
Revised version which:
* Uses the "conventional" nucleotide ordering
* Works for the ambiguous tables
* Shows the table's ID and name(s)

Again, add this method to Bio/Data/CodonTable.py

class CodonTable:
    def __str__(self) :
        """Returns a simple text representation of the codon table"""
        if self.id :
            answer = "Table %i" % self.id
        else :
            answer = "Table ID unknown"
        if self.names :
            answer = answer + " " + ", ".join(filter(None, self.names))

        """
        #Use the conventional ordering for the codon table
        #and only use the main four - even for ambiguous tables
        letters = self.nucleotide_alphabet.letters
        if "T" in letters :
            #DNA
            letters = "TCAG"
        elif "U" in letters :
            #RNA
            letters = "UCAG"
        else :
            print "WARNING - Unexpected alphabet"
        """

        #Use the conventional ordering for the codon table
        letters = self.nucleotide_alphabet.letters
        if "GATC" == letters :
            #DNA
            letters = "TCAG"
        elif "GAUC" == letters :
            #RNA
            letters = "UCAG"

        answer=answer + "\n\n  |" + "|".join( \
            ["  %s      " % c2 for c2 in letters] \
            ) + "|"
        answer=answer + "\n--+" \
               + "+".join(["---------" for c2 in letters]) + "+--"
        for c1 in letters :
            for c3 in letters :
                line = c1 + " |"
                for c2 in letters :
                    codon = c1+c2+c3
                    line = line + " %s" % codon
                    if codon in self.stop_codons :
                        line = line + " Stop|"
                    else :
                        try :
                            amino = self.forward_table[codon]
                        except KeyError :
                            amino = "?"
                        except TranslationError :
                            amino = "?"
                        if codon in self.start_codons :
                            line = line + " %s(s)|" % amino
                        else :
                            line = line + " %s   |" % amino
                line = line + " " + c3
                answer = answer + "\n"+ line
            answer=answer + "\n--+" \
                   + "+".join(["---------" for c2 in letters]) + "+--"
        return answer

Example:

>>> import Bio.Data.CodonTable
>>> print Bio.Data.CodonTable.unambiguous_dna_by_id[11]

Table 11 Bacterial

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I(s)| ACT T   | AAT N   | AGT S   | T
A | ATC I(s)| ACC T   | AAC N   | AGC S   | C
A | ATA I(s)| ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V(s)| GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--

>>> print Bio.Data.CodonTable.unambiguous_rna_by_id[1]

Table 1 Standard, SGC0

  |  U      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
U | UUU F   | UCU S   | UAU Y   | UGU C   | U
U | UUC F   | UCC S   | UAC Y   | UGC C   | C
U | UUA L   | UCA S   | UAA Stop| UGA Stop| A
U | UUG L(s)| UCG S   | UAG Stop| UGG W   | G
--+---------+---------+---------+---------+--
C | CUU L   | CCU P   | CAU H   | CGU R   | U
C | CUC L   | CCC P   | CAC H   | CGC R   | C
C | CUA L   | CCA P   | CAA Q   | CGA R   | A
C | CUG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | AUU I   | ACU T   | AAU N   | AGU S   | U
A | AUC I   | ACC T   | AAC N   | AGC S   | C
A | AUA I   | ACA T   | AAA K   | AGA R   | A
A | AUG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GUU V   | GCU A   | GAU D   | GGU G   | U
G | GUC V   | GCC A   | GAC D   | GGC G   | C
G | GUA V   | GCA A   | GAA E   | GGA G   | A
G | GUG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--

Question One: Is this worth adding to BioPython or not?

Question Two: What is the preferred behaviour for ambiguous tables? Just a 4x4x4 table as for the unambiguous tables? Or the full 15x15x15 table? I have implemented both (see commented out code)

Question Three: Is there a standard BioPython function to convert from one letter amino acid sequences into three letter names? i.e. like one_to_three from Bio.PDB.Polypeptide but more general. That function does not cope with ambiguous names.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mdehoon at c2b2.columbia.edu Sun Feb 26 16:56:59 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun Feb 26 16:54:42 2006
Subject: [Biopython-dev] RE: please help
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE8D@cgcmail.cgc.cpmc.columbia.edu>

> I have been trying all weekend to make the biopython functional. It seems
> that more than one module is non-functional in the 64 bit arch of my machine
...
> With a partial installation I can only have a limited functionality. The
> lone test that fails in the biopython-1.41 is test_Cluster.py.

I fixed this in CVS. If you download Bio/Cluster/clustermodule.c from there and copy it over the one in Biopython-1.41, the problems on 64-bit machines should be solved. Please let me know if you're still finding problems with test_Cluster.py.

--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From mdehoon at c2b2.columbia.edu Mon Feb 27 13:16:47 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon Feb 27 13:16:10 2006
Subject: [Biopython-dev] RE: [BioPython] qblast fails on parsing XML results
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE91@cgcmail.cgc.cpmc.columbia.edu>

There is a simpler solution to this, which is to use urllib instead of the socket library in the functions _send_to_qblast and _send_to_blasturl. If we use urllib, we get the results automatically without the HTTP header.

So .... does anybody know why socket is used instead of urllib? If it's because older Python versions didn't have urllib, we can just replace socket by urllib to solve this problem. Or am I missing something?

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces@portal.open-bio.org on behalf of Ilya Soifer
Sent: Mon 2/27/2006 10:38 AM
To: biopython@biopython.org
Subject: [BioPython] qblast fails on parsing XML results

Hi, I hope that I am sending this to the correct list.
When I run qblast I get

>>> res1 = NCBIWWW.qblast("blastn", "nr", seq1)

Traceback (most recent call last):
  File "", line 1, in -toplevel-
    res1 = NCBIWWW.qblast("blastn", "nr", seq1)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 1130, in qblast
    i = results.index("Connection: close")
ValueError: substring not found

This happens because the results that Blast returns no longer have this header

# HTTP/1.1 200 OK
# Date: Wed, 05 Oct 2005 02:13:33 GMT
# Server: Nde
# Content-Type: text/plain
# Connection: close
#

but this one

HTTP/1.0 200 OK
Date: Mon, 27 Feb 2006 11:54:40 GMT
Content-Type: application/xml
Server: Nde
Via: 1.1 proxy7 (NetCache NetApp/6.0.2)

I guess it might be better to look for something like "

http://bugzilla.open-bio.org/show_bug.cgi?id=1964

           Summary: GenBank.FeatureParser dies on LOCUS Record ADRCG
           Product: Biopython
           Version: Not Applicable
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: mcolosimo@mitre.org

from Bio import GenBank
gi_list = GenBank.search_for("ADRCG")
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = GenBank.FeatureParser())
rec = ncbi_dict[gi_list[0]]

Traceback:
[snip]
Bio/GenBank/__init__.py", line 1507, in feed
    line = self._feed_header(handle, consumer)
Bio/GenBank/__init__.py", line 1436, in _feed_header
    consumer.reference_bases(data[data.find(' ')+1:])
Bio/GenBank/__init__.py", line 458, in reference_bases
    locations = self._split_reference_locations(ref_base_info)
Bio/GenBank/__init__.py", line 496, in _split_reference_locations
    start, end = base_info.split('to')
ValueError: unpack list of wrong size

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
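The ValueError at the end of the traceback above is what Python raises when base_info.split('to') yields anything other than exactly two pieces. A minimal sketch of a more tolerant splitter follows. It is not Biopython's code: the function name only mirrors the one in the traceback, and the sample strings are hypothetical, since the report does not show the reference line of the ADRCG record.

```python
import re

# One "start to end" range, e.g. "1 to 100". The pattern is an assumption
# about the input format, not taken from the Biopython source.
_RANGE = re.compile(r"(\d+)\s+to\s+(\d+)")


def split_reference_locations(base_info):
    """Return every (start, end) pair found in base_info as integers.

    Text containing no recognisable range gives an empty list rather than
    raising, so the caller decides how to handle unusual records.
    """
    return [(int(start), int(end)) for start, end in _RANGE.findall(base_info)]


if __name__ == "__main__":
    # Hypothetical inputs: a simple range, two ranges, and no range at all.
    for text in ("1 to 100", "1 to 100; 200 to 300", "sites"):
        print(repr(text), "->", split_reference_locations(text))
```

With the two-range string, a bare split('to') produces three pieces, and with a string containing no "to" it produces one, so tuple unpacking into start, end fails in both cases.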
From bugzilla-daemon at portal.open-bio.org Tue Feb 28 16:05:03 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Tue Feb 28 16:20:43 2006
Subject: [Biopython-dev] [Bug 1965] New: GenBank FeatureParser converts dates from 4 digits to TWO!
Message-ID: <200602282105.k1SL53eD000322@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1965

           Summary: GenBank FeatureParser converts dates from 4 digits to TWO!
           Product: Biopython
           Version: Not Applicable
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: mcolosimo@mitre.org

People spent millions, maybe billions, at the end of the 1990s to fix this problem and somehow biopython undoes it.

Given a LOCUS line with the date of "23-AUG-2002", using FeatureParser converts it to "23-AUG-02".

GenBank._Scanner._feed_locus seems to do the correct thing. So I'm at a loss at this time as to what is doing this "cleaning", but it would be nice to keep it as YYYY.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
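A small check along the lines of this report could be used in a regression test once the parsed date is available as a string. The sketch below is only an illustration: check_locus_date is a hypothetical helper, not a Biopython function, and it assumes the DD-MON-YYYY form quoted in the report.

```python
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def check_locus_date(text):
    """Validate a DD-MON-YYYY date and return it normalised to upper case.

    Raises ValueError if the year has been shortened to two digits or if
    any other part of the date is malformed.
    """
    parts = text.strip().split("-")
    if len(parts) != 3:
        raise ValueError("expected DD-MON-YYYY, got %r" % text)
    day, month, year = parts
    if not day.isdigit() or not 1 <= int(day) <= 31:
        raise ValueError("bad day in %r" % text)
    if month.upper() not in MONTHS:
        raise ValueError("bad month in %r" % text)
    if len(year) != 4 or not year.isdigit():
        raise ValueError("year is not four digits in %r" % text)
    return "%02d-%s-%s" % (int(day), month.upper(), year)


if __name__ == "__main__":
    print(check_locus_date("23-AUG-2002"))
    try:
        check_locus_date("23-AUG-02")
    except ValueError as err:
        print("rejected:", err)
```

Parsing the fields by hand avoids locale-dependent month names, which matters because GenBank month abbreviations are always English.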