From idoerg at burnham.org  Thu Feb  2 16:14:57 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu Feb  2 16:18:54 2006
Subject: [Biopython-dev] Re: [BioPython] Bug in Bio.SeqUtils ?
In-Reply-To: <43E250E5.6030901@burnham.org>
References: <71dea9850602020635u37a7294dv15911521deeee656@mail.gmail.com>
	<43E250E5.6030901@burnham.org>
Message-ID: <43E27651.3010504@burnham.org>

Oh, sorry.

Your second problem was with protein_scale, which does indeed break on
any letter that is not one of the 20 standard amino acids.

I wrapped the failing lookup in a try/except clause that writes a warning
to stderr instead of raising an exception. It is now in CVS.

Yair, is that OK, or would we rather leave the exception raising bit 
there? There are arguments either way...
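
For readers following the thread, the warn-and-continue idea can be sketched
roughly as below. This is not the code that went into CVS: the function name
window_scores, the toy scale values and the plain (unweighted) window average
are all assumptions made for illustration.

```python
import sys

# Hypothetical toy scale for illustration only; real scales cover all
# 20 standard amino acids.
TOY_SCALE = {"A": 1.0, "G": 2.0, "S": 3.0}


def window_scores(sequence, scale, window):
    """Average scale values over each window, skipping unknown letters."""
    scores = []
    for start in range(len(sequence) - window + 1):
        total = 0.0
        counted = 0
        for residue in sequence[start:start + window].upper():
            try:
                total += scale[residue]
                counted += 1
            except KeyError:
                # Warn on stderr and carry on instead of raising.
                sys.stderr.write(
                    "warning: %r is not in the scale, skipped\n" % residue
                )
        # A window made up only of unknown letters scores 0.0 here.
        scores.append(total / counted if counted else 0.0)
    return scores


print(window_scores("AXGS", TOY_SCALE, 3))
```

Whether an unknown letter should be skipped, scored as zero, or still raise
is exactly the design question raised above; the sketch only shows the
skipping variant.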


./I


Iddo Friedberg wrote:

> Which version are you using? I tried the 1a8y sequence which you gave, 
> and also a sequence with an 'X', and they worked fine for me. CVS 
> version.
>
> # seq is a Record object; seq.sequence is a string with the protein sequence
>
> >>> from Bio.SeqUtils import ProtParam
> >>> ps = ProtParam.ProteinAnalysis(seq.sequence)
> >>> ps.isoelectric_point()
> 3.9298931884765151
>
>
> # and for a sequence with an 'x'
> >>> ps2 = ProtParam.ProteinAnalysis('xsdfgvcrtyip')
> >>> ps2.isoelectric_point()
> 5.8285980224609375
>
> Bin Hu wrote:
>
>> Hi,
>>
>> When using Bio.SeqUtils to estimate the isoelectric point for PDB entry
>> 1a8y, it seems the function isoelectric_point() never terminates, although
>> it worked well for all the other entries that I've tested. Could this be
>> a bug in Bio.SeqUtils?
>>
>> If anyone wants to test it, below is the sequence of 1a8y:
>>
>> eegldfpeydgvdrvinvnaknyknvfkkyevlallyheppeddkasqrqfemeelilel
>> aaqvledkgvgfglvdsekdaavakklglteedsiyvfkedevieydgefsadtlvefll
>> dvledpveliegerelqafeniedeikligyfknkdsehykafkeaaeefhpyipffatf
>> dskvakkltlklneidfyeafmeepvtipdkpnseeeivnfveehrrstlrklkpesmye
>> tweddmdgihivafaeeadpdgyefleilksvaqdntdnpdlsiiwidpddfpllvpywe
>> ktfdidlsapqigvvnvtdadsvwmemddeedlpsaeeledwledvlegeintedddded
>> ddddddd
>>
>> For PDB entry 1rb9, the hydrophilicity of the protein cannot be estimated
>> because its sequence starts with "X", which is not in the key list used by
>> SeqUtils. It produces the following error message:
>>
>> Traceback (most recent call last):
>>  File "./dataGen.py", line 62, in ?
>>    aHydrophilicityList = aSeqObj.protein_scale(ProtParamData.hw, 5)
>>  File "/usr/lib/python2.4/site-packages/Bio/SeqUtils/ProtParam.py", line
>> 206, in protein_scale
>>    score += weight[j] * ParamDict[subsequence[j]] + weight[j] *
>> ParamDict[subsequence[Window-j-1]]
>> KeyError: 'X'
>>
>> Although I can delete the "X" from this protein, could the author make
>> the code emit a warning and carry on instead of stopping with this error?
>> Thank you.
>>
>> Bin
>>
>> _______________________________________________
>> BioPython mailing list  -  BioPython@biopython.org
>> http://biopython.org/mailman/listinfo/biopython
>>
>>
>>  
>>
>
>


-- 
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.org

From bugzilla-daemon at portal.open-bio.org  Fri Feb  3 04:44:54 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb  3 05:34:57 2006
Subject: [Biopython-dev] [Bug 1942] New: GenBank RecordParser fails on
	particular qualifier structure
Message-ID: <200602030944.k139isnV025390@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942

           Summary: GenBank RecordParser fails on particular qualifier
                    structure
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: lpritc@scri.sari.ac.uk


When parsing some GenBank record files, the GenBank.RecordParser throws an
error at a (poorly-formatted) qualifier entry:

Python 2.3.4 (#1, Feb  2 2005, 12:11:53)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_002758.gbk'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 240, in
parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 1533,
in feed
    assert line[0:1]=='/', \
AssertionError: Expected start of new qualifier, not:
similar to bacteriophage terminase small subunit"

This problem has been observed for several GenBank .gbk files, including
NC_002758 above, and NC_002929.  It appears to be caused by qualifiers
structured like /note in the following example:

     CDS             878043..878612
                     /locus_tag="SAV0800"
                     /note="
                     similar to bacteriophage terminase small subunit"
                     /codon_start=1
                     /transl_table=11
                     /product="similar to bacteriophage terminase small
                     subunit"
                     /protein_id="NP_371324.1"
                     /db_xref="GI:15923790"
                     /db_xref="GeneID:1120775"
                     /translation="MSELTAKQARFVNEYIRTLNVTQSAIKAGYSANSAHVTGCRLLK
                     KPHIKQYIQEQKDKIIDENVLTAKELLHVLTNAAVGDETETKEVVVKRGEYKENPQSG
                     KVQLVYNEHVELIEVPIKPSDRLKARDMLGKYHKLFTDKHDINGNVPIFINIGEWDGD
                     DEELDKTVKDVSNANPNHTVIVDDIPLED"

where the first double-quotes in the qualifier value are directly followed by
'\n', and the description continues on the next line.  Editing the source .gbk
file directly to remove this resolves the problem.
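
A rough sketch of how a scanner could tolerate this layout follows. It is not
the Bio.GenBank scanner: join_qualifiers is a hypothetical helper that treats
any line not starting with '/' as a continuation of the previous qualifier
and ignores blank lines inside a value. It joins continuation lines with a
single space, which would be wrong for /translation values.

```python
def join_qualifiers(lines):
    """Group GenBank qualifier lines into (key, value) pairs."""
    qualifiers = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # tolerate stray blank lines inside a value
        if text.startswith("/"):
            # New qualifier: split "/key=value" at the first "=".
            key, _, value = text[1:].partition("=")
            qualifiers.append([key, value])
        elif qualifiers:
            # Continuation line: append it to the previous value. No
            # separator is needed when the value so far is only the
            # opening quote (the layout reported in this bug).
            value = qualifiers[-1][1]
            separator = "" if value in ("", '"') else " "
            qualifiers[-1][1] = value + separator + text
        else:
            raise ValueError("continuation before any qualifier: %r" % text)
    return [(key, value.strip('"')) for key, value in qualifiers]


example = [
    '                     /locus_tag="SAV0800"',
    '                     /note="',
    '                     similar to bacteriophage terminase small subunit"',
    "                     /codon_start=1",
]
for key, value in join_qualifiers(example):
    print(key, "=", value)
```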


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From mcolosimo at mitre.org  Fri Feb  3 08:21:20 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Fri Feb  3 09:39:00 2006
Subject: [Biopython-dev] Error building PDB with cygwin
Message-ID: <43E358D0.5000703@mitre.org>

I got this error while trying to build Biopython under Cygwin (normally I
use OS X or Linux). What is the fl library that the linker cannot find (-lfl)?

gcc -shared -Wl,--enable-auto-image-base 
build/temp.cygwin-1.5.19-i686-2.4/Bio/P
DB/mmCIF/lex.yy.o 
build/temp.cygwin-1.5.19-i686-2.4/Bio/PDB/mmCIF/MMCIFlexmodule
.o -L/usr/lib/python2.4/config -lfl -lpython2.4 -o 
build/lib.cygwin-1.5.19-i686-
2.4/Bio/PDB/mmCIF/MMCIFlex.dll
/usr/lib/gcc/i686-pc-cygwin/3.4.4/../../../../i686-pc-cygwin/bin/ld: 
cannot find
 -lfl
collect2: ld returned 1 exit status

From yair.benita at gmail.com  Fri Feb  3 03:39:38 2006
From: yair.benita at gmail.com (Yair Benita)
Date: Fri Feb  3 10:27:48 2006
Subject: [Biopython-dev] Re: [BioPython] Bug in Bio.SeqUtils ?
In-Reply-To: <43E27651.3010504@burnham.org>
Message-ID: <C008D55A.B0DB%yair.benita@gmail.com>

Hi,
Sorry I failed to follow up on that bug.
I need to revise the isoelectric point code anyway, since in some rare cases it
gets stuck in an endless while loop. I will also look into adding code to
handle the X in the amino acid sequence. For now I think it's OK to produce a
warning instead of an exception.
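
One common way to rule out an endless loop in this kind of search is a
bisection with a hard iteration cap. The sketch below is not the ProtParam
algorithm: find_zero_crossing and toy_charge are hypothetical names, and the
pK values 4 and 10 are made up so that the net charge is zero at pH 7.

```python
def toy_charge(ph):
    """Net charge of a made-up molecule: one basic group (pK 10), one acidic (pK 4)."""
    positive = 1.0 / (1.0 + 10 ** (ph - 10.0))
    negative = 1.0 / (1.0 + 10 ** (4.0 - ph))
    return positive - negative


def find_zero_crossing(charge, low=0.0, high=14.0, tol=1e-4, max_iter=100):
    """Bisect for the pH where a decreasing charge function crosses zero.

    The loop is bounded by max_iter, so it always terminates, even when
    the tolerance can never be met.
    """
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        value = charge(mid)
        if abs(value) < tol or (high - low) / 2.0 < tol:
            return mid
        if value > 0:
            low = mid   # still positively charged: zero lies at higher pH
        else:
            high = mid  # negatively charged: zero lies at lower pH
    return (low + high) / 2.0


print(find_zero_crossing(toy_charge))
```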

Yair




From bugzilla-daemon at portal.open-bio.org  Fri Feb  3 11:13:29 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb  3 11:35:46 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular
	qualifier structure
Message-ID: <200602031613.k13GDT4L000333@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942


------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-03 11:13 -------
Which version of BioPython are you using?

I thought this was fixed in CVS, see bug 1903


From bugzilla-daemon at portal.open-bio.org  Fri Feb  3 11:30:41 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb  3 11:36:04 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular
	qualifier structure
Message-ID: <200602031630.k13GUfLF000661@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #2 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-03 11:30 -------
Using the CVS copy of Bio/GenBank/__init__.py your example works for me. 
Please reopen the bug or follow up on the mailing list if that doesn't solve
the problem for you.

My copy of the NC_002758 GenBank file has the same "bad" note entry, it starts:

LOCUS       NC_002758            2878529 bp    DNA     circular BCT 19-JAN-2005

Sample output:

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_002758.gbk'))
>>> print record.features[1635]
     CDS             878043..878612
                     /locus_tag="SAV0800"
                     /note=" similar to bacteriophage terminase small subunit"
                     /codon_start=1
                     /transl_table=11
                     /product="similar to bacteriophage terminase small
                     subunit"
                     /protein_id="NP_371324.1"
                     /db_xref="GI:15923790"
                     /db_xref="GeneID:1120775"
                     /translation="MSELTAKQARFVNEYIRTLNVTQSAIKAGYSANSAHVTGCRLLK
                     KPHIKQYIQEQKDKIIDENVLTAKELLHVLTNAAVGDETETKEVVVKRGEYKENPQSG
                     KVQLVYNEHVELIEVPIKPSDRLKARDMLGKYHKLFTDKHDINGNVPIFINIGEWDGD
                     DEELDKTVKDVSNANPNHTVIVDDIPLED"

Notice that the original "bad" formatting has not been preserved - which is
arguably a bug...

*** This bug has been marked as a duplicate of 1903 ***


From bugzilla-daemon at portal.open-bio.org  Fri Feb  3 13:48:04 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb  3 14:35:46 2006
Subject: [Biopython-dev] [Bug 1943]  New: Bad Documentation in Bio.Fasta
Message-ID: <200602031848.k13Im4mB003238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1943

           Summary: Bad Documentation in Bio.Fasta
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: mcolosimo@mitre.org


This has been catching me out every year, I think. The patch explicitly states
what these objects really are.

Second, it fixes the order given in the documentation for title2ids to match
the actual code.

Diff below

Index: __init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Fasta/__init__.py,v
retrieving revision 1.13
diff -r1.13 __init__.py
9,10c9,10
< RecordParser       Parses FASTA sequence data into a Record object.
< SequenceParser     Parses FASTA sequence data into a Sequence object.
---
> RecordParser       Parses FASTA sequence data into a Fasta.Record object.
> SequenceParser     Parses FASTA sequence data into a SeqRecord object.
109c109
<     """Parses FASTA sequence data into a Record object.
---
>     """Parses FASTA sequence data into a Fasta.Record object.
126c126
<     """Parses FASTA sequence data into a Sequence object.
---
>     """Parses FASTA sequence data into a SeqRecord object.
136c136
<         file (without the beginning >), will return the name, id and
---
>         file (without the beginning >), will return the id, name,  and


From bugzilla-daemon at portal.open-bio.org  Fri Feb  3 14:13:05 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb  3 14:36:05 2006
Subject: [Biopython-dev] [Bug 1944] New: Align.Generic adding iterator and
	more
Message-ID: <200602031913.k13JD5wD003550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

           Summary: Align.Generic adding iterator and more
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: mcolosimo@mitre.org


I thought it would be nice to be able to directly iterate over the SeqRecords
in an alignment. So, I wrote it up and tested it with Bio.Clustalw. I also
added the ability to fill in other fields of the SeqRecord (similar to
Fasta.SequenceParser)

Diff below:

Index: Generic.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Align/Generic.py,v
retrieving revision 1.5
diff -r1.5 Generic.py
32c32
<         # hold everything at a list of seq record objects
---
>         # hold everything as a list of SeqRecord objects
33a34
>         self._iter_pos = 0
34a36,51
>     def __iter__(self):
>         self._iter_pos = 0
>         return iter(self.next, None)
>         
>     def next(self):
>         """Returns one sequence record at a time.
>         
>         @return: a SeqRecord or None if end of iteration.
>         """
>         if self._iter_pos >= len(self._records):
>             return None
>             
>         rec = self._records[self._iter_pos]
>         self._iter_pos += 1
>         return rec
>         
38c55,56
<         The return value is a list of SeqRecord objects.
---
>         @return: a list of the sequences.
>         @rtype: SeqRecord
45,49c63,66
<         Returns:
<         o A Seq object for the requested sequence.
< 
<         Raises:
<         o IndexError - If the specified number is out of range.
---
>         @param number: the number of the sequence in the consensus.
>         @return: the requested sequence.
>         @rtype: SeqRecord.
>         @raise IndexError: If the specified number is out of range.
69c86
<                      weight = 1.0):
---
>                      weight = 1.0, title2ids = None):
86a104,107
>         o title2ids - A function that, when given the descriptor,
>         will return the id, name, and description (in that order)
>         for the record. If this is not given, then the entire descriptor
>         line will be used as the description.
89c110,117
<         new_record = SeqRecord(new_seq, description = descriptor)
---
>         rec = SeqRecord(new_seq)
>         if title2ids:
>             seq_id, name, descr = title2ids(descriptor)
>             rec.id = seq_id
>             rec.name = name
>             rec.description = descr
>         else:
>             rec.description = descriptor
99c127
<             new_record.annotations['start'] = start
---
>             rec.annotations['start'] = start
101c129
<             new_record.annotations['end'] = end
---
>             rec.annotations['end'] = end
104c132
<         new_record.annotations['weight'] = weight
---
>         rec.annotations['weight'] = weight
106c134,147
<         self._records.append(new_record)
---
>         # what happens if we're iterating?
>         self._records.append(rec)
> 
>         
>     def addSeqRecord(self, seqRec):
>         """Add a Sequence Record to the Alignment
> 
>         @param seqRec: a sequence record (SeqRecord) to add.
>         """
>         if isinstance(seqRec, SeqRecord):
>             self._records.append(seqRec)
>         else:
>             raise TypeError("sequence is NOT a SeqRecord Object")
>

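As a point of comparison with the position-counter approach in the diff, a
simpler alternative is to hand out a fresh list iterator on every call to
__iter__. This is a minimal sketch only: TinyAlignment and add_record are
hypothetical stand-ins, not part of Bio.Align.Generic.

```python
class TinyAlignment:
    """Hypothetical stand-in for an alignment holding a list of records."""

    def __init__(self):
        self._records = []

    def add_record(self, record):
        self._records.append(record)

    def __len__(self):
        return len(self._records)

    def __iter__(self):
        # Each call returns a fresh list iterator, so nested or repeated
        # loops do not share (or need to reset) a position counter.
        return iter(self._records)


alignment = TinyAlignment()
alignment.add_record("seq1")
alignment.add_record("seq2")

# Nested iteration works because the two loops use independent iterators.
pairs = [(a, b) for a in alignment for b in alignment]
print(len(pairs))
```

With a single shared counter, the inner loop above would exhaust the position
and end the outer loop early, which is one reading of the "what happens if
we're iterating?" comment in the patch.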

From bugzilla-daemon at portal.open-bio.org  Fri Feb  3 21:23:26 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb  3 21:34:47 2006
Subject: [Biopython-dev] [Bug 1946] New: Parsing GenBank Files -
	ParserPositionException:
Message-ID: <200602040223.k142NQwM009545@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1946

           Summary: Parsing GenBank Files - ParserPositionException:
           Product: Biopython
           Version: Not Applicable
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Martel/Mindy
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: julius.lucks@gmail.com


Parsing a genbank file with the following code
(BioPython version 1.41 installed with fink on python 2.3 on OS X):

from Bio import GenBank
feature_parser = GenBank.FeatureParser()
gb_record = feature_parser.parse(open('bug.gb','r'))

I get a trace:
Traceback (most recent call last):
  File "bug.py", line 11, in ?
    gb_record = feature_parser.parse(open(gb_file,'r'))
  File "/sw/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 219, in
parse
    self._scanner.feed(handle, self._consumer)
  File "/sw/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 1259, in
feed
    self._parser.parseFile(handle)
  File "/sw/lib/python2.3/site-packages/Martel/Parser.py", line 328, in
parseFile
    self.parseString(fileobj.read())
  File "/sw/lib/python2.3/site-packages/Martel/Parser.py", line 356, in
parseString
    self._err_handler.fatalError(result)
  File "/sw/lib/python2.3/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 196


The contents of bug.gb are:
LOCUS       NC_001416              48502 bp    DNA     linear   PHG 08-DEC-2005
DEFINITION  Enterobacteria phage lambda, complete genome.
ACCESSION   NC_001416
VERSION     NC_001416.1  GI:9626243
PROJECT     GenomeProject:14204
KEYWORDS    .
.. (Truncated)

If I remove the PROJECT line, the bug is fixed.  This seems to be an uncommon
tag in GenBank files, so I am not sure if the parser takes this into account.


From bugzilla-daemon at portal.open-bio.org  Sun Feb  5 07:00:15 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sun Feb  5 07:35:08 2006
Subject: [Biopython-dev] [Bug 1946] Parsing GenBank Files - unknown line
	type PROJECT
Message-ID: <200602051200.k15C0FsZ010981@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1946


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
          Component|Martel/Mindy                |Main Distribution
         OS/Version|Mac OS                      |All
            Summary|Parsing GenBank Files -     |Parsing GenBank Files -
                   |ParserPositionException:    |unknown line type PROJECT


------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-05 07:00 -------
The non-martel GenBank parser in CVS is also unaware of the project line in
GenBank files.

I would expect it to fail with an assertion error:

Unknown line type, PROJECT found:
PROJECT     GenomeProject:14204

This looks like an easy fix, however we need to decide how to store the project
information.  Maybe a simple string for now, "GenomeProject:14204"

Also maybe unknown line types in the header should trigger warnings rather than
errors that stop the parsing...

---------------------------------------

Quoting from 
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt

---------------------------------------

1.4.1 New Linetype for Genome Project Identifier

  DDBJ, EMBL, and GenBank are working to create a collaborative system that
will assign a unique numeric identifier to genome projects. The purpose of
this new identifier is to provide a link among sequence records that pertain
to a specific genome sequencing project.

  At GenBank, this new identifier will be presented in the flatfile format
via a new linetype : PROJECT . Here is a mocked-up example demonstrating
the new linetype's use:

LOCUS       CH476840             1669278 bp    DNA     linear   CON 05-OCT-2005
DEFINITION  Magnaporthe grisea 70-15 supercont5.200 genomic scaffold, whole
            genome shotgun sequence.
ACCESSION   CH476840 AACU02000000
VERSION     CH476840.1  GI:77022292
PROJECT     GENOME_PROJECT:12345

The integer 12345 represents the value of a possible genome project
identifier.

There is a possibility that the contents of the PROJECT line might change 
somewhat from this example by the time the new identifier is implemented.
We will keep you posted of any such changes via these release notes and the
GenBank listserv.

  These Genome Project identifiers will be searchable within NCBI's
Entrez: Genome-Project database:

  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj

  The earliest date on which this new linetype will appear in the GenBank
flatfile format is February 15 2006.
---------------------------------------

Looks like they are ahead of schedule in releasing this new line type.


From bugzilla-daemon at portal.open-bio.org  Mon Feb  6 05:24:39 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb  6 05:35:12 2006
Subject: [Biopython-dev] [Bug 1946] Parsing GenBank Files - unknown line
	type PROJECT
Message-ID: <200602061024.k16AOdoG030083@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1946


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
           Platform|Macintosh                   |All


------- Comment #2 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-06 05:24 -------
Initial fix checked in, see Bio/GenBank/__init__.py revision 1.57

The parser will now print a warning when unknown header lines are found, but
will continue parsing the file.  Previously parsing would halt at the unknown
line.  This will allow people to deal with files containing the new project
line (and other headers the NCBI may introduce in the future).
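
The behaviour described above can be sketched as follows. This is not the
code in revision 1.57: parse_header and KNOWN_HEADERS are hypothetical, only
a handful of header keywords are listed, and the sketch assumes the keyword
occupies the first 12 columns of each header line.

```python
import sys

# Assumed subset of header keywords, for illustration only.
KNOWN_HEADERS = {"LOCUS", "DEFINITION", "ACCESSION", "VERSION", "KEYWORDS"}


def parse_header(lines, warn=sys.stderr.write):
    """Collect known header lines; warn about unknown ones and keep going."""
    header = {}
    current = None
    for line in lines:
        keyword = line[:12].strip()
        data = line[12:].strip()
        if not keyword:
            # Continuation line: extend the previous known entry, if any.
            if current is not None:
                header[current] += " " + data
            continue
        if keyword in KNOWN_HEADERS:
            header[keyword] = data
            current = keyword
        else:
            warn("WARNING - Ignoring an unknown line type, %s found:\n%s\n"
                 % (keyword, line))
            current = None
    return header


sample = [
    "LOCUS       NC_001416              48502 bp    DNA     linear   PHG 08-DEC-2005",
    "DEFINITION  Enterobacteria phage lambda, complete genome.",
    "ACCESSION   NC_001416",
    "VERSION     NC_001416.1  GI:9626243",
    "PROJECT     GenomeProject:14204",
    "KEYWORDS    .",
]
print(sorted(parse_header(sample)))
```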

Once the NCBI settle on the exact format of the new project line, we should
sort out how to represent it in BioPython.  I have therefore left this bug open
for now...


From bugzilla-daemon at portal.open-bio.org  Mon Feb  6 05:51:18 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb  6 06:35:09 2006
Subject: [Biopython-dev] [Bug 1943] Bad Documentation in Bio.Fasta
Message-ID: <200602061051.k16ApIul030526@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1943


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-06 05:51 -------
I have checked in Marc's suggested three changes to the comments in
Bio/Fasta/__init__.py - see revision 1.14

The mistake about (name,id,descr) versus (id,name,descr) in the doc string for
the SequenceParser about the title2ids argument had been there a long time.
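
For reference, a callback following the (id, name, descr) order might look
like the sketch below. The splitting rule (first whitespace-delimited word is
used as both id and name) is an assumption for illustration, not something
Bio.Fasta prescribes.

```python
def title2ids(title):
    """Split a FASTA title line (without the '>') into (id, name, descr)."""
    parts = title.split(None, 1)
    seq_id = parts[0] if parts else ""
    descr = parts[1] if len(parts) > 1 else ""
    # Return order matters: id first, then name, then description.
    return seq_id, seq_id, descr


print(title2ids("gi|123 some protein"))
```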


From gould at embl.de  Thu Feb  9 04:37:07 2006
From: gould at embl.de (gould@embl.de)
Date: Thu Feb  9 04:32:59 2006
Subject: [Biopython-dev] uniprot release 49/biopython script no longer work
Message-ID: <20060209103707.1ckq8osc6d5c844s@webmail.embl.de>

hi

I've been having problems with some of our applications here that use Biopython
scripts to retrieve a record from UniProt/Swiss-Prot given an accession
number or ID. As far as I'm aware, the problem only appeared after release 49.0
of the UniProt/Swiss-Prot database yesterday. I see from the release notes that
some changes were made to the annotation format, and I suspect this is why the
Biopython scripts no longer work. I've checked that I have the latest version
of Biopython, but this has not remedied the problem. The problem would seem to
lie with Biopython; are you aware of it, and will a fix be made available?

thanks

Kate Gould


________________________________________________________________________________
Software Engineer

Gibson Team
Structural and Computational Biology Unit
EMBL
Meyerhofstrasse 1
69117 Heidelberg,  Germany

phone:         +49 6221 387 451
fax:        +49 6221 387 517

http://elm.eu.org/
http://phospho.elm.eu.org/
From bugzilla-daemon at portal.open-bio.org  Thu Feb  9 08:59:49 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Thu Feb  9 09:34:50 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular
	qualifier structure
Message-ID: <200602091359.k19DxnVl013204@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942


lpritc@scri.sari.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lpritc@scri.sari.ac.uk
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |


------- Comment #3 from lpritc@scri.sari.ac.uk  2006-02-09 08:59 -------
The updated CVS code from February 8th falls over on the note qualifier of the
following record from NC_007633.gbk

     CDS             391217..391771
                     /locus_tag="MCAP_0327"
                     /note="Similar non-mycoplasma proteins have and additional

                     120 amino acids at the COOH end; identified by similarity
                     to SP:P54575; match to protein family HMM PF06574"
                     /codon_start=1
                     /transl_table=4
                     /product="riboflavin kinase (flavokinase) domain protein"
                     /protein_id="YP_424312.1"
                     /db_xref="GI:83319941"
                     /db_xref="GeneID:3828958"
                     /translation="MIYINESFNKLKKLNIKKAIITIGNFDGFHIYHQKIINKVIQIA
                     KQENLTSIVMSFDKKIKDNITYTNLATKKQKLDFINNNLSDLDYFFDIKVDDSLIKTT
                     KDQFIDVLINKLNVIKIVEGQDFKFGYLSQGNIDDLIKAFSKKNVIIFKRDNDISSTK
                     IKKLLDENLVDKAQELLGIDLKLK"

Deleting the extra \n in the record resolves the problem.


From bugzilla-daemon at portal.open-bio.org  Thu Feb  9 13:52:27 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Thu Feb  9 14:35:30 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular
	qualifier structure
Message-ID: <200602091852.k19IqQHQ018021@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942


------- Comment #4 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-09 13:52 -------
This does seem to work for me using a freshly downloaded NC_007633.gbk that
starts:

LOCUS       NC_007633            1010023 bp    DNA     circular BCT 18-JAN-2006

It has the blank line 7114 you reported in locus MCAP_0327

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_007633.gbk'))
WARNING - Ignoring an unknown line type, PROJECT found:
PROJECT     GenomeProject:16208

>>> print record.features[644]
     CDS             391217..391771
                     /locus_tag="MCAP_0327"
                     /note="Similar non-mycoplasma proteins have and additional
                     120 amino acids at the COOH end; identified by similarity
                     to SP:P54575; match to protein family HMM PF06574"
                     /codon_start=1
                     /transl_table=4
                     /product="riboflavin kinase (flavokinase) domain protein"
                     /protein_id="YP_424312.1"
                     /db_xref="GI:83319941"
                     /db_xref="GeneID:3828958"
                     /translation="MIYINESFNKLKKLNIKKAIITIGNFDGFHIYHQKIINKVIQIA
                     KQENLTSIVMSFDKKIKDNITYTNLATKKQKLDFINNNLSDLDYFFDIKVDDSLIKTT
                     KDQFIDVLINKLNVIKIVEGQDFKFGYLSQGNIDDLIKAFSKKNVIIFKRDNDISSTK
                     IKKLLDENLVDKAQELLGIDLKLK"

The warning about the PROJECT line is a recent change, see bug 1946

I am using the latest version of Bio/GenBank/__init__.py which is revision 1.57
checked in 6 Feb 2006.  This should be the same as yours if you downloaded it
on 8 Feb...

Assuming you have the same genbank file (same date in the LOCUS line) and the
same Bio/GenBank/__init__.py as me, then maybe there is something else
different between our machines, maybe in another part of BioPython.

Or, it could be a Windows/Unix line ending problem?  Or worse, LF vs CR vs
CRLF.  Did you download the file by FTP or via the website?  This might make a
difference if the original file contained a mixture of CR and CRLF.

So far I have only tried this on Windows (and I downloaded the file via the
NCBI website), and BioPython copes with the GenBank file in either Windows or
Unix format.

I have not (yet) tried it on Linux...

Could you check what happens if you use dos2unix and/or unix2dos on your
GenBank file?
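
A quick way to see which line endings a file really contains is to count them
in the raw bytes. A minimal sketch; count_line_endings is a hypothetical helper
and the sample bytes are made up for illustration:

```python
def count_line_endings(data):
    """Count CRLF, lone CR and lone LF sequences in a bytes object."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # CRs that are not part of a CRLF
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF
    return {"CRLF": crlf, "CR": cr, "LF": lf}


# Deliberately mixed endings, to show all three counters moving.
sample = b"LOCUS line\r\nFEATURES\n     CDS\rORIGIN\r\n"
counts = count_line_endings(sample)
```

For a real file, the bytes would come from something like
open("NC_007633.gbk", "rb").read(); opening in binary mode matters, because
text mode may translate the line endings before they can be counted.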

Thanks


From bugzilla-daemon at portal.open-bio.org  Fri Feb 10 06:45:54 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 07:35:14 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on blank
	lines in features
Message-ID: <200602101145.k1ABjsRA028960@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED
         OS/Version|Linux                       |All
           Platform|PC                          |All
            Summary|GenBank RecordParser fails  |GenBank RecordParser fails
                   |on particular qualifier     |on blank lines in features
                   |structure                   |


------- Comment #5 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-10 06:45 -------
Tried on Linux with the file downloaded by HTTP from here, using the send to file
option:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi??db=nucleotide&val=NC_007633

...and it works perfectly.  The "blank line" has the expected 21 spaces.

Then I tried (again on Linux) with the file downloaded by FTP from here:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Mycoplasma_capricolum_ATCC_27343/NC_007633.gbk

And it failed.  Looking at the file in an editor, the blank line is empty - it
doesn't have the 21 spaces.  As a result, the parser failure is understandable:

Traceback (most recent call last):
  File "/home/maubp/GenBank/bug1942.py", line 3, in -toplevel-
    record = parser.parse(file('NC_007633.gbk'))
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 212, in
parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 1630,
in feed
    assert line[0:FEATURE_QUALIFIER_INDENT]==FEATURE_QUALIFIER_SPACER, \
AssertionError: Expected qualifier description continuation, not:

Is this the same error you saw, Leighton?

I would say the file with a blank line is actually a malformed GenBank file,
but as this is an official NCBI-supplied file I'll try to get the parser to
support this too.
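
To tell the two kinds of "blank" line apart without opening an editor, one can
scan the file and report which lines are truly empty and which are padded with
spaces. A minimal sketch; classify_blank_lines is a hypothetical helper and the
sample lines are made up:

```python
def classify_blank_lines(lines):
    """Return (empty, padded) lists of 1-based line numbers.

    'empty' lines have no characters before the newline; 'padded' lines
    contain only spaces.
    """
    empty, padded = [], []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text == "":
            empty.append(number)
        elif text.strip(" ") == "":
            padded.append(number)
    return empty, padded


sample = [
    " " * 21 + '/note="first part\n',
    "\n",               # truly empty, as in the FTP copy
    " " * 21 + "\n",    # padded with 21 spaces, as in the HTTP copy
    " " * 21 + 'second part"\n',
]
empty, padded = classify_blank_lines(sample)
```

Passing an open file handle instead of the sample list should work the same
way, since the function only iterates over lines.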


From bugzilla-daemon at portal.open-bio.org  Fri Feb 10 07:18:18 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 07:35:28 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on blank
	lines in features
Message-ID: <200602101218.k1ACIIYo029244@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942


lpritc@scri.sari.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|All                         |Linux
           Platform|All                         |PC


------- Comment #6 from lpritc@scri.sari.ac.uk  2006-02-10 07:18 -------
Hi Peter,

I normally update biopython incrementally from CVS using `cvs update`, and the
last update I made was February 8th.  I also downloaded the source for
__init__.py v1.57 via ViewCVS from the BioPython site, and ran a diff against
the installed versions:

[lpritc@lplinuxdev downloads]$ diff __init__.py
/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py
[lpritc@lplinuxdev downloads]$ diff __init__.py
/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py

Neither run reported any differences.

The NCBI files are downloaded by ftp in BIN mode direct to the Linux box I work
on.  Checking the offending record with khexedit shows that the two linebreaks
are both LFs (0a), which you would expect to be handled equivalently on Windows
and Linux.  The NC_007633.gbk file I'm using has the same date as yours.

Running unix2dos and then parsing, or dos2unix and then parsing, on a
freshly downloaded copy of NC_007633.gbk threw the same error in each case
on Linux:

>>> parser.parse(fhandle)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "Bio/GenBank/__init__.py", line 212, in parse
    self._scanner.feed(handle, self._consumer)
  File "Bio/GenBank/__init__.py", line 1630, in feed
    assert line[0:FEATURE_QUALIFIER_INDENT]==FEATURE_QUALIFIER_SPACER, \
AssertionError: Expected qualifier description continuation, not:

I get the same error message using the same file on Windows, even after
conversion with unix2dos, and on Mac OS X with a fresh download of
NC_007633.gbk via ftp, and fresh install of biopython from CVS.

I must say I'm baffled as to why I'm getting a different result to you on this
file.  On the bright side, it's the only current bacterial genome .gbk file I'm
having a problem with.


From bugzilla-daemon at portal.open-bio.org  Fri Feb 10 07:23:54 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 07:35:46 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on blank
	lines in features
Message-ID: <200602101223.k1ACNsjQ029272@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942


------- Comment #7 from lpritc@scri.sari.ac.uk  2006-02-10 07:23 -------
Interesting collision, there ;)

Yep - I'm getting exactly that error message, and when I checked the
NC_007633.gbk file with khexedit I didn't see the run of 21 spaces, either.  Is
it possible that that run of spaces is stripped out by my ftp client during
transfer?


From mcolosimo at mitre.org  Fri Feb 10 08:27:14 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Fri Feb 10 08:23:06 2006
Subject: [Biopython-dev] An iterator for Align.Generic
Message-ID: <43EC94B2.6090209@mitre.org>

Last week I sent in a patch to Align.Generic to make it an iterator [Bug 
1944]. Has anyone looked at this? I've found this to be very useful to 
me and I really don't want to keep a patch file around to add this 
functionality each time I checkout biopython.

Marc

From bugzilla-daemon at portal.open-bio.org  Fri Feb 10 08:59:33 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 09:35:09 2006
Subject: [Biopython-dev] [Bug 1948] New: uniprot release 49/SProt.Record
	Parser Problem
Message-ID: <200602101359.k1ADxXTk030188@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948

           Summary: uniprot release 49/SProt.Record Parser Problem
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: gould@embl.de


I've been having problems with some of our applications that use biopython
scripts to retrieve a record from uniprot/swissprot given an accession
nr/ID....As far as I'm aware the problem only occurred after the release 49.0
of uniprot/swissprot db on 6th Feb...I see from the release notes that some
changes were made to the annotation format and suspect this is why the
biopython scripts are no longer happy??....I've checked to make sure I have the
latest version of biopython but this has not remedied the problem.....This
problem would seem to lie with biopython.
Are any fixes to be made available?
An example of the error being thrown is below:

Python 2.4 (#1, Dec 10 2004, 11:49:12)
[GCC 3.3.1 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.WWW import ExPASy
>>> from Bio.SwissProt import SProt
>>> from Bio import File
>>> acc='Q14155'
>>> results = ExPASy.get_sprot_raw(acc.strip()).read()
>>>  sp_parser = SProt.RecordParser()
  File "<stdin>", line 1
    sp_parser = SProt.RecordParser()
    ^
SyntaxError: invalid syntax
>>>  sp_parser = SProt.RecordParse
  File "<stdin>", line 1
    sp_parser = SProt.RecordParse
    ^
SyntaxError: invalid syntax
>>> sp_parser = SProt.RecordParser()
>>> sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser)
>>> Record = sp_iterator.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 166, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 290, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 332, in feed
    self._scan_record(uhandle, consumer)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 337, in _scan_record
    fn(self, uhandle, consumer)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 369, in _scan_id
    self._scan_line('ID', uhandle, consumer.identification, exactly_one=1)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 359, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "/usr/local/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'ID':
<HTML LANG="EN">

>>>


From bugzilla-daemon at portal.open-bio.org  Fri Feb 10 08:52:09 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 09:35:26 2006
Subject: [Biopython-dev] [Bug 1942] GenBank RecordParser fails on blank
	lines in features
Message-ID: <200602101352.k1ADq91w030107@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1942


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #8 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-10 08:52 -------
I have checked in a fix for blank lines in Genbank feature entries.  The parser
will print a warning to screen and ignore the blank line(s).

See Bio/GenBank/__init__.py revision: 1.58
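
Purely as an illustration of the behaviour described (warn, then skip), and
not the code that was checked in, a tolerant reader for the lines of one
qualifier value might look like the sketch below. The function name and the
constants are hypothetical; the indent width of 21 is the one mentioned
earlier in this thread.

```python
import warnings

QUALIFIER_INDENT = 21  # indent width mentioned in this thread
QUALIFIER_SPACER = " " * QUALIFIER_INDENT


def join_qualifier_lines(lines):
    """Join the lines of one qualifier value, skipping blank lines.

    Illustrative only; this is not the code checked in as revision 1.58.
    """
    parts = []
    for line in lines:
        text = line.rstrip("\r\n")
        if text.strip() == "":
            # Empty or space-only line: warn and carry on.
            warnings.warn("Ignoring blank line in feature qualifier")
            continue
        if not text.startswith(QUALIFIER_SPACER):
            raise ValueError("Expected qualifier continuation, not: %r" % text)
        parts.append(text[QUALIFIER_INDENT:])
    return " ".join(parts)


sample = [
    QUALIFIER_SPACER + '/note="Similar proteins have an additional\n',
    "\n",
    QUALIFIER_SPACER + '120 amino acids at the COOH end"\n',
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = join_qualifier_lines(sample)
```

Raising an error for lines that are neither blank nor correctly indented keeps
the strictness of the original assertion for everything except the blank-line
case.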

If you manage to find any more "unusual GenBank files" please file another bug.

Thanks Leighton.

Peter

P.S.

Leighton wrote:
> Is it possible that that run of spaces is stripped
> out by my ftp client during transfer?

I don't know - but if so, the same thing happened with my browser's FTP
support.  Even if this is just a file transfer issue, I still think we should
cope.  My personal opinion is that this is just a one-off bad entry in the
NCBI's records, but that BioPython should be able to read anything they produce.


From biopython-dev at maubp.freeserve.co.uk  Fri Feb 10 10:03:24 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri Feb 10 10:18:32 2006
Subject: [Biopython-dev] An iterator for Align.Generic
In-Reply-To: <43EC94B2.6090209@mitre.org>
References: <43EC94B2.6090209@mitre.org>
Message-ID: <43ECAB3C.4010708@maubp.freeserve.co.uk>

Marc Colosimo wrote:
> Last week I sent in a patch to Align.Generic to make it an iterator [Bug 
> 1944]. Has anyone looked at this? I've found this to be very useful to 
> me and I really don't want to keep a patch file around to add this 
> functionality each time I checkout biopython.
> 
> Marc

Hi Marc,

I haven't looked at your code, but I do use Clustal alignments quite 
often in BioPython.  Could you put together a short example showing how 
this would work?  If this was added to BioPython then we could use this 
to update the cook book.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Feb 10 10:13:12 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 10:35:40 2006
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
Message-ID: <200602101513.k1AFDCPd031502@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #1 from mdehoon@ims.u-tokyo.ac.jp  2006-02-10 10:13 -------
Can you write an example script of how to use this, and how this is different
from the current usage of Align.Generic?


From bugzilla-daemon at portal.open-bio.org  Fri Feb 10 10:17:30 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Fri Feb 10 10:35:49 2006
Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser
	Problem
Message-ID: <200602101517.k1AFHUgD031595@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948


------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-10 10:17 -------
I'm not familiar with this module, but I get a rather different result.

Could you attach the file that ExPASy.get_sprot_raw() returns to this bug?
It looks like you got an HTML file back - I would guess this was an error page
due to a temporary problem.  If you try again I think something else will
happen...
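
A cheap guard against this kind of failure is to check that the downloaded
text starts with an ID line before handing it to the parser. A minimal sketch;
looks_like_swissprot is a hypothetical helper, and the HTML sample is made up
apart from its first line:

```python
def looks_like_swissprot(text):
    """Return True if the text starts with a Swiss-Prot style ID line."""
    return text.lstrip().startswith("ID   ")


html_page = '<HTML LANG="EN">\n<HEAD>...</HEAD>\n'
flat_file = "ID   ARHG7_HUMAN    STANDARD;      PRT;   803 AA.\n"

html_ok = looks_like_swissprot(html_page)
flat_ok = looks_like_swissprot(flat_file)
```

This only catches the "server returned an error page" case; it says nothing
about whether the rest of the record is in a format the parser understands.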

When I just did this on Windows, I did get a valid looking file back, but
BioPython still failed to parse it:

from Bio.WWW import ExPASy
from Bio.SwissProt import SProt
from Bio import File
acc='Q14155'
results = ExPASy.get_sprot_raw(acc.strip()).read()
sp_parser = SProt.RecordParser()
sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser)
Record = sp_iterator.next()

It also failed at the iterator next step, but in a different way:
Traceback (most recent call last):
  File "c:\temp\bug1948.py", line 8, in -toplevel-
    Record = sp_iterator.next()
  File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 166, in
next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 290, in
parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 332, in
feed
    self._scan_record(uhandle, consumer)
  File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 337, in
_scan_record
    fn(self, uhandle, consumer)
  File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 378, in
_scan_dt
    self._scan_line('DT', uhandle, consumer.date, exactly_one=1)
  File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 359, in
_scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 301, in
read_and_call
    method(line)
  File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 551, in
date
    assert rel_index >= 0, \
AssertionError: Could not find Rel. in DT line: DT   01-NOV-1997, integrated
into UniProtKB/Swiss-Prot.


Looking at the file returned gave:

>>> print results
ID   ARHG7_HUMAN    STANDARD;      PRT;   803 AA.
AC   Q14155; Q6P9G3; Q6PII2; Q86W63; Q8N3M1;
DT   01-NOV-1997, integrated into UniProtKB/Swiss-Prot.
DT   19-JUL-2004, sequence version 2.
DT   07-FEB-2006, entry version 55.
DE   Rho guanine nucleotide exchange factor 7 (PAK-interacting exchange
DE   factor beta) (Beta-Pix) (COOL-1) (p85).
...
//

Reading the date() method of class _RecordConsumer in Bio/SwissProt/SProt.py,
none of those three DT lines looks like what the code is expecting.
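
To make the mismatch concrete, here is a minimal sketch of a DT line reader
that accepts both styles. The new style is copied from the entry shown above;
the old "(Rel. N, comment)" style is written from memory of earlier releases
and should be treated as an assumption, as should the helper name
parse_dt_line. This is not the fix for SProt.py.

```python
import re

# Older style, as recalled from pre-release-49 entries (an assumption):
#   DT   01-NOV-1997 (Rel. 35, Created)
OLD_DT = re.compile(r"^DT   (\d{2}-[A-Z]{3}-\d{4}) \(Rel\. (\d+), (.+)\)$")
# Newer style, copied from the entry shown above:
#   DT   19-JUL-2004, sequence version 2.
NEW_DT = re.compile(r"^DT   (\d{2}-[A-Z]{3}-\d{4}), (.+?)\.$")


def parse_dt_line(line):
    """Return (date, release_or_None, comment) for either DT style."""
    line = line.rstrip()
    match = OLD_DT.match(line)
    if match:
        date, release, comment = match.groups()
        return date, int(release), comment
    match = NEW_DT.match(line)
    if match:
        date, comment = match.groups()
        return date, None, comment
    raise ValueError("Unrecognised DT line: %r" % line)


old = parse_dt_line("DT   01-NOV-1997 (Rel. 35, Created)")
new = parse_dt_line("DT   01-NOV-1997, integrated into UniProtKB/Swiss-Prot.")
```

Trying the old pattern first and falling back to the new one would let a
parser read files from before and after the format change.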


From mcolosimo at mitre.org  Fri Feb 10 11:48:10 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Fri Feb 10 11:55:45 2006
Subject: [Biopython-dev] An iterator for Align.Generic
In-Reply-To: <43ECAB3C.4010708@maubp.freeserve.co.uk>
References: <43EC94B2.6090209@mitre.org>
	<43ECAB3C.4010708@maubp.freeserve.co.uk>
Message-ID: <43ECC3CA.4080804@mitre.org>

Peter wrote:

> Marc Colosimo wrote:
>
>> Last week I sent in a patch to Align.Generic to make it an iterator 
>> [Bug 1944]. Has anyone looked at this? I've found this to be very 
>> useful to me and I really don't want to keep a patch file around to 
>> add this functionality each time I checkout biopython.
>>
>> Marc
>
>
> Hi Marc,
>
> I haven't looked at your code, but I do use Clustal alignments quite 
> often in BioPython.  Could you put together a short example showing 
> how this would work?  If this was added to BioPython then we could use 
> this to update the cook book.
>
> Peter
>
Peter,

First, I think I need to make one change in the code so that one can 
re-iterate (basically reset the _iter_pos to 0).

Now, here is some code (I think this will work), which shows that you 
can make one function to handle sequences from different file types.

from Bio import Clustalw
from Bio import Fasta

def doSomethingInteresting(recIter):
    # Works on any iterable that yields SeqRecord-like objects.
    for seqRec in recIter:
        print seqRec.description
        print seqRec.seq.tostring()

# main

# my.fasta holds the unaligned sequences
fasta_iter = Fasta.Iterator(open("my.fasta"), Fasta.SequenceParser())
# my.aln holds the aligned sequences
aln_iter = Clustalw.parse_file("my.aln")

doSomethingInteresting(fasta_iter)
doSomethingInteresting(aln_iter)
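
On the re-iteration point, an alternative to resetting _iter_pos is to have
__iter__ hand back a fresh iterator each time it is called. A minimal sketch
in current Python; SimpleAlignment is a hypothetical stand-in, not
Align.Generic and not the patch attached to Bug 1944:

```python
class SimpleAlignment:
    """Toy stand-in for an alignment holding a list of records.

    Hypothetical class for illustration; not Bio.Align.Generic.
    """

    def __init__(self, records):
        self._records = list(records)

    def __iter__(self):
        # A new iterator is created on every call, so each loop starts
        # from the first record without resetting any stored position.
        return iter(self._records)


alignment = SimpleAlignment(["seq1", "seq2", "seq3"])
first_pass = [record for record in alignment]
second_pass = [record for record in alignment]
```

With this design the object keeps no iteration state of its own, so nested or
repeated loops over the same alignment do not interfere with each other.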


From mdehoon at c2b2.columbia.edu  Fri Feb 10 16:39:53 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Feb 10 16:43:31 2006
Subject: [Biopython-dev] uniprot release 49/biopython script no longer work
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE5A@cgcmail.cgc.cpmc.columbia.edu>

Could you post the error message that you're getting? Preferably, with a
simple script that causes the error to appear?

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-dev-bounces@portal.open-bio.org on behalf of gould@embl.de
Sent: Thu 2/9/2006 4:37 AM
To: biopython-dev@biopython.org
Subject: [Biopython-dev] uniprot release 49/biopython script no longer work
 
hi

I've been having problems with some of our applications here that use
biopython
scripts to retrieve a record from uniprot/swissprot given an accession
nr/ID....As far as I'm aware the problem only occurred after the release 49.0
of uniprot/swissprot db yesterday...I see from the release notes that some
changes were made to the annotation format and suspect this is why the
biopython scripts are no longer happy??....I've checked to make sure I have
the
latest version of biopython but this has not remedied the problem.....This
problem would seem to lie with biopython but I was wondering if you
are aware of this problem and if any fix is to be made available??

thanks

Kate Gould


________________________________________________________________________________
Software Engineer

Gibson Team
Structural and Computational Biology Unit
EMBL
Meyerhofstrasse 1
69117 Heidelberg,  Germany

phone:         +49 6221 387 451
fax:        +49 6221 387 517

http://elm.eu.org/
http://phospho.elm.eu.org/
_______________________________________________
Biopython-dev mailing list
Biopython-dev@biopython.org
http://biopython.org/mailman/listinfo/biopython-dev


From bugzilla-daemon at portal.open-bio.org  Sun Feb 12 15:49:03 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sun Feb 12 16:34:49 2006
Subject: [Biopython-dev] [Bug 1949] New: Biopython Nexus: Trees.py. Check
	for False fails
Message-ID: <200602122049.k1CKn3X3017202@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1949

           Summary: Biopython Nexus: Trees.py. Check for False fails
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Other
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: wb@binf.ku.dk


Line 383 in Trees.py should be changed from:

    if not newroot_subtree: 

to:

    if newroot_subtree == False:

The problem arises whenever the value of newroot_subtree is 0.
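
One thing worth checking before applying the change: in Python, 0 == False
evaluates to true, so an equality test may still treat a value of 0 like
False. An identity test tells the two apart. A minimal sketch, assuming
newroot_subtree is either False or an integer; is_missing is a hypothetical
helper, not code from Trees.py:

```python
def is_missing(value):
    """Return True only for the False singleton, not for 0."""
    return value is False


checks = {
    "not 0": not 0,                      # truthiness test: 0 counts as false
    "0 == False": 0 == False,            # equality test: still true for 0
    "0 is False": is_missing(0),         # identity test: 0 is not False
    "False is False": is_missing(False),
}
```

Only the identity test gives a different answer for 0 and for False.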


From gould at embl.de  Mon Feb 13 03:10:46 2006
From: gould at embl.de (gould@embl.de)
Date: Mon Feb 13 03:06:39 2006
Subject: [Biopython-dev] uniprot release 49/biopython script no longer work
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE5A@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE5A@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <20060213091046.pejpszhykf3kssc4@webmail.embl.de>

The simple script and the error message that occurs when attempting to parse
the result from uniprot are as follows:


gould@milou:~> python
Python 2.4 (#1, Dec 10 2004, 11:49:12)
[GCC 3.3.1 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.WWW import ExPASy
>>> from Bio.SwissProt import SProt
>>> from Bio import File
>>> acc='Q14155'
>>> results = ExPASy.get_sprot_raw(acc.strip()).read()
>>> sp_parser = SProt.RecordParser()
>>> sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser)
>>> Record = sp_iterator.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line
166, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line
290, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line
332, in feed
    self._scan_record(uhandle, consumer)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line
337, in _scan_record
    fn(self, uhandle, consumer)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line
378, in _scan_dt
    self._scan_line('DT', uhandle, consumer.date, exactly_one=1)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line
359, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "/usr/local/lib/python2.4/site-packages/Bio/ParserSupport.py", line 301,
in read_and_call
    method(line)
  File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line
551, in date
    assert rel_index >= 0, \
AssertionError: Could not find Rel. in DT line: DT   01-NOV-1997, integrated
into UniProtKB/Swiss-Prot.


Quoting Michiel De Hoon <mdehoon@c2b2.columbia.edu>:

> Could you post the error message that you're getting? Preferably, with a
> simple script that causes the error to appear?
>
> --Michiel.
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: biopython-dev-bounces@portal.open-bio.org on behalf of gould@embl.de
> Sent: Thu 2/9/2006 4:37 AM
> To: biopython-dev@biopython.org
> Subject: [Biopython-dev] uniprot release 49/biopython script no longer work
>
> hi
>
> I've been having problems with some of our applications here that use
> biopython
> scripts to retrieve a record from uniprot/swissprot given an accession
> nr/ID....As far as I'm aware the problem only occurred after the release 49.0
> of uniprot/swissprot db yesterday...I see from the release notes that some
> changes were made to the annotation format and suspect this is why the
> biopython scripts are no longer happy??....I've checked to make sure I have
> the
> latest version of biopython but this has not remedied the problem.....This
> problem would seem to lie with biopython but I was wondering if you
> are aware of this problem and if any fix is to be made available??
>
> thanks
>
> Kate Gould
>
>
> _________________________________________________________________________

From bugzilla-daemon at portal.open-bio.org  Mon Feb 13 03:21:38 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 13 03:34:50 2006
Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser
	Problem
Message-ID: <200602130821.k1D8Lclc024275@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948


------- Comment #2 from gould@embl.de  2006-02-13 03:21 -------
(In reply to comment #0)
> I've been having problems with some of our applications that use biopython
> scripts to retrieve a record from uniprot/swissprot given an accession
> nr/ID....As far as I'm aware the problem only occurred after the release 49.0
> of uniprot/swissprot db on 6th Feb...I see from the release notes that some
> changes were made to the annotation format and suspect this is why the
> biopython scripts are no longer happy??....I've checked to make sure I have the
> latest version of biopython but this has not remedied the problem.....This
> problem would seem to lie with biopython.
> Are any fixes to be made available?
> An example of the error being thrown is below:
> 
> Python 2.4 (#1, Dec 10 2004, 11:49:12)
> [GCC 3.3.1 (SuSE Linux)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from Bio.WWW import ExPASy
> >>> from Bio.SwissProt import SProt
> >>> from Bio import File
> >>> acc='Q14155'
> >>> results = ExPASy.get_sprot_raw(acc.strip()).read()
> >>>  sp_parser = SProt.RecordParser()
>   File "<stdin>", line 1
>     sp_parser = SProt.RecordParser()
>     ^
> SyntaxError: invalid syntax
> >>>  sp_parser = SProt.RecordParse
>   File "<stdin>", line 1
>     sp_parser = SProt.RecordParse
>     ^
> SyntaxError: invalid syntax
> >>> sp_parser = SProt.RecordParser()
> >>> sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser)
> >>> Record = sp_iterator.next()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 166, in next
>     return self._parser.parse(File.StringHandle(data))
>   File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 290, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 332, in feed
>     self._scan_record(uhandle, consumer)
>   File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 337, in _scan_record
>     fn(self, uhandle, consumer)
>   File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 369, in _scan_id
>     self._scan_line('ID', uhandle, consumer.identification, exactly_one=1)
>   File "/usr/local/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 359, in _scan_line
>     read_and_call(uhandle, event_fn, start=line_type)
>   File "/usr/local/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, in read_and_call
>     raise SyntaxError, errmsg
> SyntaxError: Line does not start with 'ID':
> <HTML LANG="EN">
> 
> >>>
> 

(In reply to comment #1)
> I'm not familiar with this module, but I get a rather different result.
> 
> Could you attach the file that ExPASy.get_sprot_raw() returns to this bug?
> It looks like you got an HTML file back - I would guess this was an error page
> due to a temporary problem.  If you try again I think something else will
> happen...
> 
> When I just did this on Windows, I did get a valid looking file back, but
> BioPython still failed to parse it:
> 
> from Bio.WWW import ExPASy
> from Bio.SwissProt import SProt
> from Bio import File
> acc='Q14155'
> results = ExPASy.get_sprot_raw(acc.strip()).read()
> sp_parser = SProt.RecordParser()
> sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser)
> Record = sp_iterator.next()
> 
> It also failed at the iterator next step, but in a different way:
> Traceback (most recent call last):
>   File "c:\temp\bug1948.py", line 8, in -toplevel-
>     Record = sp_iterator.next()
>   File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 166, in next
>     return self._parser.parse(File.StringHandle(data))
>   File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 290, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 332, in feed
>     self._scan_record(uhandle, consumer)
>   File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 337, in _scan_record
>     fn(self, uhandle, consumer)
>   File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 378, in _scan_dt
>     self._scan_line('DT', uhandle, consumer.date, exactly_one=1)
>   File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 359, in _scan_line
>     read_and_call(uhandle, event_fn, start=line_type)
>   File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 301, in read_and_call
>     method(line)
>   File "C:\Python23\lib\site-packages\Bio\SwissProt\SProt.py", line 551, in date
>     assert rel_index >= 0, \
> AssertionError: Could not find Rel. in DT line: DT   01-NOV-1997, integrated into UniProtKB/Swiss-Prot.
> 
> 
> 
> Looking at the file returned gave:
> 
> >>> print results
> ID   ARHG7_HUMAN    STANDARD;      PRT;   803 AA.
> AC   Q14155; Q6P9G3; Q6PII2; Q86W63; Q8N3M1;
> DT   01-NOV-1997, integrated into UniProtKB/Swiss-Prot.
> DT   19-JUL-2004, sequence version 2.
> DT   07-FEB-2006, entry version 55.
> DE   Rho guanine nucleotide exchange factor 7 (PAK-interacting exchange
> DE   factor beta) (Beta-Pix) (COOL-1) (p85).
> ...
> //
> 
> Reading Bio/SwissProt/SProt.py class _RecordConsumer method date(), none of
> those three DT lines look like what the code is expecting.
> 


I'm not sure I follow what you are saying. I don't have a problem reading the
file, and I get the same result as you did. The problem is parsing the results
(the error above occurs).


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org  Mon Feb 13 07:43:21 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 13 08:35:11 2006
Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser
	Problem
Message-ID: <200602131243.k1DChLWS027589@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|Linux                       |All


------- Comment #3 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-13 07:43 -------
I'm unclear what you meant in comment 2 Kate.

Your original bug report had the following:

SyntaxError: Line does not start with 'ID':
<HTML LANG="EN">

This suggests that instead of getting a plain text SProt file
(which should start 'ID'), you got an HTML file.

One reason for this MIGHT be a temporary problem with the ExPASy
website - returning an error message in HTML.

If you still get the <HTML LANG="EN"> error message, could you
attach the raw HTML to this bug (you could use "print results"
at the Python prompt).
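
A quick way to tell the two cases apart before handing the text to the
parser is to look at how the download starts. This is only a sketch: the
function name check_sprot_text is made up for illustration and is not part
of Biopython, and it assumes the downloaded text is already in a string
(like "results" in comment 1).

```python
def check_sprot_text(text):
    """Classify downloaded text by its first non-blank characters.

    Hypothetical helper, not a Biopython function.
    """
    stripped = text.lstrip()
    if stripped.startswith("ID   "):
        # A plain text Swiss-Prot record begins with an ID line.
        return "swissprot"
    if stripped[:1] == "<":
        # Probably an HTML page, e.g. an error message from the server.
        return "html"
    return "unknown"
```

If this reports "html", the text is worth attaching to the bug as-is.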

If the HTML problem has gone away on its own (which wouldn't
surprise me if it was a temporary problem with the server) do you
see the problem I talked about in comment 1 of the bug?

I have tried this on both Linux and Windows now, both show the
problem described in comment 1 where the 'DT' lines do not match
what BioPython is expecting.

Quoting your original bug report:
> I see from the release notes that some changes were made to the
> annotation format and suspect this is why the biopython scripts
> are no longer happy?

Yes - this does explain the 'DT' line problem, BioPython will need
to be updated to cope with the new format DT lines:

http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0

Quoting:

Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB releases
in the DT lines to displaying the date of the biweekly release at which an
entry is integrated or updated. We dropped the information concerning the
release number and introduced entry and sequence version numbers in the DT
lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.


From bugzilla-daemon at portal.open-bio.org  Mon Feb 13 08:59:53 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 13 09:34:50 2006
Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser
	Problem
Message-ID: <200602131359.k1DDxr2s028325@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948


------- Comment #4 from gould@embl.de  2006-02-13 08:59 -------
(In reply to comment #3)
> I'm unclear what you meant in comment 2 Kate.
> 
> Your original bug report had the following:
> 
> SyntaxError: Line does not start with 'ID':
> <HTML LANG="EN">
> 
> This suggests that instead of getting a plain text SProt file
> (which should start 'ID'), you got an HTML file.
> 
> One reason for this MIGHT be a temporary problem with the ExPASy
> website - returning an error message in HTML.
> 
> If you still get the <HTML LANG="EN"> error message, could you
> attach the raw HTML to this bug (you could use "print results"
> at the Python prompt).
> 
> If the HTML problem has gone away on its own (which wouldn't
> surprise me if it was a temporary problem with the server) do you
> see the problem I talked about in comment 1 of the bug?
> 
> I have tried this on both Linux and Windows now, both show the
> problem described in comment 1 where the 'DT' lines do not match
> what BioPython is expecting.
> 
> Quoting your original bug report:
> > I see from the release notes that some changes were made to the
> > annotation format and suspect this is why the biopython scripts
> > are no longer happy?
> 
> Yes - this does explain the 'DT' line problem, BioPython will need
> to be updated to cope with the new format DT lines:
> 
> http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0
> 
> Quoting:
> 
> Changes concerning dates and versions numbers (DT lines)
> 
> We changed from showing only the dates corresponding to full UniProtKB releases
> in the DT lines to displaying the date of the biweekly release at which an
> entry is integrated or updated. We dropped the information concerning the
> release number and introduced entry and sequence version numbers in the DT
> lines.
> 
> The new format of the three DT lines is:
> 
> DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
> DT   DD-MMM-YYYY, sequence version version_number.
> DT   DD-MMM-YYYY, entry version version_number.
> 
> Example for UniProtKB/Swiss-Prot:
> 
> DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
> DT   15-OCT-2001, sequence version 3.
> DT   01-APR-2004, entry version 14.
> 
> Example for UniProtKB/TrEMBL:
> 
> DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
> DT   15-OCT-2000, sequence version 2.
> DT   15-DEC-2004, entry version 5.
> 
> The sequence version number of an entry is incremented by one when its amino
> acid sequence is modified. The entry version number is incremented by one
> whenever any data in the flat file representation of the entry is modified.
> 
> We retrofitted the entry and sequence version numbers, as well as all dates,
> using archived UniProtKB releases.
> 


Yes, I understand what you are saying now. I'm no longer getting the HTML
file but a plain text SProt file, which is not being parsed correctly.


From bugzilla-daemon at portal.open-bio.org  Mon Feb 13 19:16:23 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 13 19:35:29 2006
Subject: [Biopython-dev] [Bug 1950] addition of element, SSBOND, OBSLTE,
	CAVEAT fields
Message-ID: <200602140016.k1E0GNXW002951@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1950


------- Comment #1 from edmonds@fas.harvard.edu  2006-02-13 19:16 -------
Created an attachment (id=287)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=287&action=view)
patch to PDB to add element, SSBOND, OBSLTE, CAVEAT


From bugzilla-daemon at portal.open-bio.org  Mon Feb 13 19:15:14 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 13 19:36:09 2006
Subject: [Biopython-dev] 
	[Bug 1950]  New: addition of element, SSBOND, OBSLTE, CAVEAT fields
Message-ID: <200602140015.k1E0FEQL002939@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1950

           Summary: addition of element, SSBOND, OBSLTE, CAVEAT fields
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: edmonds@fas.harvard.edu


I find it useful to be able to sort atoms according to their element (H, C, N,
O, etc), which is contained in columns 77-78 of the PDB, so I have added it to
the parsing of the PDB, and have added get_element to Atom.  

I did not add it to the MMCIFParser because I don't know anything about MMCIFs
and don't know if it's even applicable for MMCIFs.  

I also added trivial SSBOND, OBSLTE, and CAVEAT parsing to the PDB header
parser.
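
To illustrate the idea (this is not the attached patch, just a minimal
sketch), the element symbol can be read from columns 77-78 of a fixed-width
ATOM record and used as a sort key. The helper names make_atom_line,
get_element and sort_by_element are invented for this example, and the
occupancy and B-factor values are dummy numbers.

```python
def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Build an 80-column PDB ATOM record (hypothetical demo helper)."""
    fmt = ("%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f"
           + " " * 10 + "%2s%2s")
    return fmt % ("ATOM", serial, name, "", res_name, chain, res_seq, "",
                  x, y, z, 1.0, 0.0, element, "")


def get_element(line):
    """Return the element symbol from columns 77-78 ('' if absent)."""
    # Columns are 1-based in the PDB format, so 77-78 is the slice 76:78.
    return line[76:78].strip().upper()


def sort_by_element(lines):
    """Sort ATOM records alphabetically by element symbol."""
    return sorted(lines, key=get_element)
```

Records from older files that stop before column 77 simply yield an empty
element here; a real parser might instead guess from the atom name.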


From bugzilla-daemon at portal.open-bio.org  Mon Feb 13 19:16:59 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Mon Feb 13 19:36:28 2006
Subject: [Biopython-dev] [Bug 1950] addition of element, SSBOND, OBSLTE,
	CAVEAT fields to PDB
Message-ID: <200602140016.k1E0Gxne002963@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1950


edmonds@fas.harvard.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|addition of element, SSBOND,|addition of element, SSBOND,
                   |OBSLTE, CAVEAT fields       |OBSLTE, CAVEAT fields to PDB


From bugzilla-daemon at portal.open-bio.org  Wed Feb 15 08:07:51 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Wed Feb 15 08:35:12 2006
Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser
	Problem
Message-ID: <200602151307.k1FD7pmM031301@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-15 08:07 -------
I have checked in a "short term fix" to the SwissProt parser, see
Bio/SwissProt/SProt.py revision 1.32

If you want to test this, the simplest way is just to back up your local copy of
Bio/SwissProt/SProt.py and then replace it with the latest version from CVS via
this URL:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython

With this change BioPython will recognise the new style DT lines BUT WILL
IGNORE THEM and carry on.

This should allow people to do any analysis they need to, as long as they don't
need the date information.

I have logged bug 1956 to do something sensible with the new DT lines.
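
For anyone curious what "recognise but ignore" amounts to, here is a rough
sketch. It is NOT the code checked in to SProt.py; the function name
handle_dt_line is invented, and the old-style layout
"DT   DD-MMM-YYYY (Rel. NN, Created)" is assumed from earlier Swiss-Prot
releases.

```python
import re

# Old style (assumed layout): DT   01-NOV-1997 (Rel. 35, Created)
OLD_DT = re.compile(r"^DT   (\d{2}-[A-Z]{3}-\d{4}) \(Rel\. (\d+), ([^)]+)\)$")
# New style: DT   01-NOV-1997, integrated into UniProtKB/Swiss-Prot.
NEW_DT = re.compile(
    r"^DT   \d{2}-[A-Z]{3}-\d{4}, "
    r"(integrated into|sequence version|entry version) "
)


def handle_dt_line(line):
    """Parse an old-style DT line; return None for a new-style one."""
    line = line.rstrip()
    match = OLD_DT.match(line)
    if match:
        date, release, event = match.groups()
        return (date, int(release), event)
    if NEW_DT.match(line):
        # Recognised as the new format, but deliberately ignored.
        return None
    raise ValueError("Unrecognised DT line: %r" % line)
```

The point is only that a new-style line no longer raises; the date
information in it is thrown away.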


From bugzilla-daemon at portal.open-bio.org  Wed Feb 15 08:01:10 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Wed Feb 15 08:35:29 2006
Subject: [Biopython-dev] [Bug 1956] New: SwissProt release 49 - Support for
	new DT lines
Message-ID: <200602151301.k1FD1A9t031191@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1956

           Summary: SwissProt release 49 - Support for new DT lines
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: biopython-bugzilla@maubp.freeserve.co.uk


See also bug 1948 (which I am marking fixed) where the parser would fail on the
new files.  I am checking in a fix to recognise the new DT lines but ignore
them.

This bug is to do something useful with the new format DT lines.

http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0

Quoting:
--------------------------------------------------------
Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB releases
in the DT lines to displaying the date of the biweekly release at which an
entry is integrated or updated. We dropped the information concerning the
release number and introduced entry and sequence version numbers in the DT
lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.
--------------------------------------------------------
End quote.

We should expose the three new bits of information:

database_name, e.g. "UniProtKB/Swiss-Prot" or maybe just "Swiss-Prot"
sequence_version, e.g. 3
entry_version, e.g. 14

Also the precise meaning of the three dates has changed...

Finally as the "release number" is no longer included, perhaps that record
property should be deprecated.
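
As a starting point for discussion, the three new-style lines could be
pulled apart roughly like this. It is a sketch only: parse_new_dt_lines and
the dictionary keys are names I made up, not an existing or agreed Biopython
API, and the patterns follow the format quoted above.

```python
import re

_DATE = r"(\d{2}-[A-Z]{3}-\d{4})"
INTEGRATED = re.compile(r"^DT   " + _DATE + r", integrated into (.+)\.$")
SEQ_VERSION = re.compile(r"^DT   " + _DATE + r", sequence version (\d+)\.$")
ENTRY_VERSION = re.compile(r"^DT   " + _DATE + r", entry version (\d+)\.$")


def parse_new_dt_lines(lines):
    """Collect dates and version numbers from new-style DT lines."""
    info = {}
    for line in lines:
        line = line.rstrip()
        match = INTEGRATED.match(line)
        if match:
            info["integrated_date"] = match.group(1)
            info["database_name"] = match.group(2)
            continue
        match = SEQ_VERSION.match(line)
        if match:
            info["sequence_date"] = match.group(1)
            info["sequence_version"] = int(match.group(2))
            continue
        match = ENTRY_VERSION.match(line)
        if match:
            info["entry_date"] = match.group(1)
            info["entry_version"] = int(match.group(2))
    return info
```

Whether database_name should keep the "UniProtKB/" prefix is left open here;
the sketch returns it unchanged.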


From bugzilla-daemon at portal.open-bio.org  Wed Feb 15 09:37:29 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Wed Feb 15 10:35:12 2006
Subject: [Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser
	Problem
Message-ID: <200602151437.k1FEbTJY032137@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1948


------- Comment #6 from gould@embl.de  2006-02-15 09:37 -------
(In reply to comment #5)
> I have checked in a "short term fix" to the SwissProt parser, see
> Bio/SwissProt/SProt.py revision 1.32
> 
> If you want to test this, the simplest way is just to back up your local copy of
> Bio/SwissProt/SProt.py and then replace it with the latest version from CVS via
> this URL:
> 
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython
> 
> With this change BioPython will recognise the new style DT lines BUT WILL
> IGNORE THEM and carry on.
> 
> This should allow people to do any analysis they need to, as long as they don't
> need the date information.
> 
> I have logged bug 1956 to do something sensible with the new DT lines.
> 


Yes, I've checked that 'short term fix' and it works for me so thanks for your
help on that matter....


From biopython-dev at maubp.freeserve.co.uk  Wed Feb 15 11:39:52 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed Feb 15 11:35:52 2006
Subject: [Biopython-dev] [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex on
	Windows
Message-ID: <43F35958.702@maubp.freeserve.co.uk>

Thomas Hamelryck wrote:
 > If the mmCIF module causes problems it can just be commented out.

I'm beginning to think that might be best (certainly on Windows, as it 
doesn't seem to work "out of the box" using MSVC or cygwin gcc as the 
compiler).

I have just been trying to work out why Bio.PDB.mmCIF.MMCIFlex won't 
compile on Windows with MSVC 6.0

First of all, running "setup.py build" doesn't seem to call flex.  For 
example, it doesn't regenerate lex.yy.c if I delete it beforehand.

The version of Bio/PDB/mmCIF/lex.yy.c currently in CVS has the following 
as line 12:

#include <unistd.h>

This is unconditional, and won't work with MSVC because the header does 
not exist.

If I change to the relevant directory, and run "flex mmcif.lex" using 
the cygwin version of flex version 2.5.4 then the lex.yy.c file is 
recreated, and in addition to some lines moving about, this include 
statement becomes:

#ifndef _WIN32
#include <unistd.h>
#endif

This still will not compile because it has not defined exit, malloc, 
realloc and free:

lex.yy.c(1505) : warning C4013: 'exit' undefined; assuming extern returning int
lex.yy.c(1568) : warning C4013: 'malloc' undefined; assuming extern returning int
lex.yy.c(1586) : warning C4013: 'realloc' undefined; assuming extern returning int
lex.yy.c(1596) : warning C4013: 'free' undefined; assuming extern returning int

If instead of using the cygwin version of flex, I use the gnuwin32 port, 
which also claims to be flex version 2.5.4 then the lex.yy.c is slightly 
different again - it has NO conditional statements checking for win32.

http://gnuwin32.sourceforge.net/packages/flex.htm

I think the include lines should be something like:

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

After messing about with lex.yy.c include statements, I can get MSVC to 
compile lex.yy.obj (with warnings) as shown here:

C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IBio -Ic:\python23\include -Ic:\python23\PC /TcBio/PDB/mmCIF/MMCIFlexmodule.c /Fobuild\temp.win32-2.3\Release\Bio/PDB/mmCIF/MMCIFlexmodule.obj
MMCIFlexmodule.c
Bio/PDB/mmCIF/MMCIFlexmodule.c(16) : warning C4013: 'mmcif_set_file' undefined; assuming extern returning int
Bio/PDB/mmCIF/MMCIFlexmodule.c(44) : warning C4013: 'mmcif_get_token' undefined; assuming extern returning int
C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IBio -Ic:\python23\include -Ic:\python23\PC /TcBio/PDB/mmCIF/lex.yy.c /Fobuild\temp.win32-2.3\Release\Bio/PDB/mmCIF/lex.yy.obj
lex.yy.c


But then it fails at the link stage:


C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:c:\python23\libs /LIBPATH:c:\python23\PCBuild fl.lib /EXPORT:initMMCIFlex build\temp.win32-2.3\Release\Bio/PDB/mmCIF/lex.yy.obj build\temp.win32-2.3\Release\Bio/PDB/mmCIF/MMCIFlexmodule.obj /OUT:build\lib.win32-2.3\Bio\PDB\mmCIF\MMCIFlex.pyd /IMPLIB:build\temp.win32-2.3\Release\Bio/PDB/mmCIF\MMCIFlex.lib
LINK : fatal error LNK1181: cannot open input file "fl.lib"
error: command '"C:\Program Files\Microsoft Visual Studio\VC98\BIN\link.exe"' failed with exit status 1181


This looks similar to the linking problem Michiel sees on Windows using 
cygwin gcc as the compiler:

http://www.biopython.org/pipermail/biopython/2006-February/002923.html

Is the problem that the linker can't find the flex library?

I assume it needs either the gnuwin32 flex library file, installed by default here:

C:\Program Files\GnuWin32\lib\libfl.a

Or, if using the cygwin flex, here:

C:\cygwin\lib\libfl.a

Peter


From thamelry at binf.ku.dk  Wed Feb 15 11:46:46 2006
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Wed Feb 15 12:00:40 2006
Subject: [Biopython-dev] Re: [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex on
	Windows
In-Reply-To: <43F35958.702@maubp.freeserve.co.uk>
References: <43F35958.702@maubp.freeserve.co.uk>
Message-ID: <33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk>


On Wed, February 15, 2006 5:39 pm, Peter wrote:

> First of all, running "setup.py build" doesn't seem to call flex.  For
> example, it doesn't regenerate lex.yy.c if I delete it beforehand.

It's not meant to: lex.yy.c is distributed as part of
biopython. You just need the Flex libraries to compile it.

Anyways, I've commented out the mmCif module.
People who need it can uncomment the relevant lines
in setup.py.

Cheers,

-Thomas

From biopython-dev at maubp.freeserve.co.uk  Wed Feb 15 13:12:48 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed Feb 15 13:25:29 2006
Subject: [Biopython-dev] Re: [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex on
	Windows
In-Reply-To: <33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk>
References: <43F35958.702@maubp.freeserve.co.uk>
	<33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk>
Message-ID: <43F36F20.9090207@maubp.freeserve.co.uk>

Peter wrote:
>> First of all, running "setup.py build" doesn't seem to call flex.
 >> For example, it doesn't regenerate lex.yy.c if I delete it
 >> beforehand.

Thomas Hamelryck wrote:
> It's not meant to: lex.yy.c is distributed as part of
> biopython. You just need the Flex libraries to compile it.

OK - I was unclear on this.

> Anyways, I've commented out the mmCif module.
> People who need it can uncomment the relevant lines
> in setup.py.

Seems like a good compromise for now (unless someone wants to contribute 
a patch for setup.py to check if flex is installed).
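
A very rough idea of what such a check might look like is below. This is an
untested sketch rather than a proposed patch: find_flex_library is a made-up
name, the file names libfl.a and fl.lib are taken from the paths and linker
error earlier in this thread, and the list of directories to search would
have to be supplied per platform.

```python
import os

# File names the flex support library may go by (an assumption based on
# the linker error "fl.lib" and the libfl.a paths mentioned in this thread).
LIBRARY_NAMES = ("libfl.a", "fl.lib")


def find_flex_library(search_dirs):
    """Return the path of the first flex library found, or None."""
    for directory in search_dirs:
        for name in LIBRARY_NAMES:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None
```

setup.py could then add the MMCIFlex extension only when this returns a
path, and print a note saying the module was skipped otherwise.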

As lex.yy.c is created by flex, would you agree the problems compiling 
this file with MSVC are a flex problem?  e.g. the #include <unistd.h>

What about the linker problem (seen by me with MSVC, and by Michiel 
with cygwin gcc) not finding the flex library?  Might this just be a 
path issue?

It would be nice if we could get the module to work on Windows, at least 
for people with suitable compilers.

Peter

From mdehoon at c2b2.columbia.edu  Wed Feb 15 13:41:26 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed Feb 15 13:36:58 2006
Subject: [Biopython-dev] Re: [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex
	onWindows
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE6A@cgcmail.cgc.cpmc.columbia.edu>

> What about the linker problem (seen by me with MSVC, and by Michiel 
> with cygwin gcc) not finding the flex library?  Might this just be a 
> path issue?

In my case, it's probably just because I don't have flex installed. I
remember that at one point (when I was creating the Windows installer for an
older Biopython version), I was able to build the flex library. But anyway,
users will have to install flex themselves also, because they will need the
flex DLL.

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From thamelry at binf.ku.dk  Wed Feb 15 15:10:08 2006
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Wed Feb 15 15:05:33 2006
Subject: [Biopython-dev] Re: [BioPython] Compiling Bio.PDB.mmCIF.MMCIFlex on
	Windows
In-Reply-To: <43F36F20.9090207@maubp.freeserve.co.uk>
References: <43F35958.702@maubp.freeserve.co.uk>
	<33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk>
	<43F36F20.9090207@maubp.freeserve.co.uk>
Message-ID: <32785.87.72.27.226.1140034208.squirrel@www.binf.ku.dk>


> As lex.yy.c is created by flex, would you agree the problems compiling
> this file with MSVC are a flex problem?  e.g. the #include <unistd.h>

Uh...no idea. :-)
I know next to nothing about Windows, but
I can imagine it should in principle work with cygwin.
Maybe flex needs to be re-run on windows before compiling?
Was that tried?

Cheers,

-Thomas

From biopython-dev at maubp.freeserve.co.uk  Wed Feb 15 15:52:48 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed Feb 15 15:48:44 2006
Subject: [Biopython-dev] Re: Compiling Bio.PDB.mmCIF.MMCIFlex on Windows
In-Reply-To: <32785.87.72.27.226.1140034208.squirrel@www.binf.ku.dk>
References: <43F35958.702@maubp.freeserve.co.uk>
	<33458.192.168.10.162.1140022006.squirrel@www.binf.ku.dk>
	<43F36F20.9090207@maubp.freeserve.co.uk>
	<32785.87.72.27.226.1140034208.squirrel@www.binf.ku.dk>
Message-ID: <43F394A0.9010000@maubp.freeserve.co.uk>

>>As lex.yy.c is created by flex, would you agree the problems compiling
>>this file with MSVC are a flex problem?  e.g. the #include <unistd.h>
> 
> Uh...no idea. :-)
> I know next to nothing about Windows, but
> I can imagine it should in principle work with cygwin.
> Maybe flex needs to be re-run on windows before compiling?
> Was that tried?

Yes - I tried both the cygwin flex, and a windows port from here:

http://gnuwin32.sourceforge.net/packages/flex.htm

While both claimed to be flex version 2.5.4, they produced different 
lex.yy.c files, however neither worked for me.  See my earlier email:

http://www.biopython.org/pipermail/biopython-dev/2006-February/002280.html

Peter

P.S. I'll be away for the next few days, so I won't be responding till 
next week

From bugzilla-daemon at portal.open-bio.org  Thu Feb 16 19:33:25 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Thu Feb 16 20:16:00 2006
Subject: [Biopython-dev] [Bug 1919] Transcribe DNA
Message-ID: <200602170033.k1H0XP3i023754@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1919


mdehoon@ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #2 from mdehoon@ims.u-tokyo.ac.jp  2006-02-16 19:33 -------
> I was reading some examples in the biopython tutorial and cookbook and for the
> first time, since I'd already read it many times, I get confused...
> Transcribing the dna sequence ATCG produces the AUCG rna sequence or the UAGC?
> Biopython does the first one, but until today I was completely sure that the
> correct one is the second.

DNA sequences are (almost?) always shown as the coding (sense) strand. So if a
DNA sequence is written as ATCG, then this is the coding strand; the template
(non-coding) strand has TAGC in the 3'->5' direction. The mRNA is produced by
base-pairing to the template strand, so you end up with AUCG.
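
The same reasoning in a few lines of plain Python (a sketch that does not
use Biopython; the function names are made up for illustration). Transcribing
the strand as written only swaps T for U, and going the long way round via
the complementary strand that the mRNA base-pairs to gives the same answer.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(dna):
    """Transcribe a DNA sequence written 5'->3' by replacing T with U."""
    return dna.upper().replace("T", "U")


def paired_strand(dna):
    """Return the complementary DNA strand, read in the 3'->5' direction."""
    return "".join(DNA_COMPLEMENT[base] for base in dna.upper())


def transcribe_via_pairing(dna):
    """Build the mRNA by base-pairing against the complementary strand."""
    return "".join(RNA_PAIRING[base] for base in paired_strand(dna))
```

For ATCG the complementary strand is TAGC, and both routes give AUCG.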


From mdehoon at c2b2.columbia.edu  Sun Feb 19 12:15:57 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun Feb 19 12:11:31 2006
Subject: [Biopython-dev] RE: problem in using biopython-1.41
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE7A@cgcmail.cgc.cpmc.columbia.edu>

Dear Sarosh,

From the test results, it looks like all of Biopython is working correctly,
except for Bio.Cluster. So if you don't plan on using the clustering
algorithms in Bio.Cluster, you've got nothing to worry about.

I would like to find out though why Bio.Cluster is failing. It may have to do
with the fact that you're on a 64-bit machine; the code has not been tested
there. So I'd like to ask you the following:
a) Which version of Numerical Python are you using?
b) Can you run the following commands and send me the output of step 3:
    1) Install biopython with "python setup.py install"
    2) From the biopython-1.41/Tests directory, run "python -i test_Cluster.py"
    3) From the Python prompt, execute 'run_tests("Bio.Cluster")'
This will show you the exact output from the Bio.Cluster tests.
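
For question a), something along these lines should report which array
package is installed and its version. It is a small sketch: module_versions
is a made-up helper, and it assumes the package exposes a __version__
attribute (which I believe both Numeric and numarray do).

```python
def module_versions(names):
    """Map each module name to its version string, or None if missing."""
    found = {}
    for name in names:
        try:
            module = __import__(name)
        except ImportError:
            found[name] = None
            continue
        # Fall back to "unknown" when the module has no __version__.
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in sorted(module_versions(["Numeric", "numarray"]).items()):
        print("%s: %s" % (name, version or "not installed"))
```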

Thanks in advance,

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: mailman-bounces@portal.open-bio.org on behalf of Sarosh Fatakia
Sent: Sat 2/18/2006 9:14 PM
To: biopython-dev-owner@biopython.org
Subject: problem in using biopython-1.41
 
Hi I tried sending my problem to
biopython developers. Hope you can please help,
thanks
sarosh


---------- Forwarded message ----------
From: biopython-dev-owner@portal.open-bio.org <
biopython-dev-owner@portal.open-bio.org>
Date: Feb 18, 2006 7:15 PM
Subject: problem in using biopython-1.41
To: sarosh.fatakia@gmail.com

You are not allowed to post to this mailing list, and your message has
been automatically rejected.  If you think that your messages are
being rejected in error, contact the mailing list owner at
biopython-dev-owner@biopython.org.


---------- Forwarded message ----------
From: "Sarosh Fatakia" <sarosh.fatakia@gmail.com>
To: biopython-dev@biopython.org
Date: Sat, 18 Feb 2006 18:50:58 -0500
Subject: problem in using biopython-1.41
Greetings!
I installed biopython-1.41 on my unix box which is:
Linux DK12AR4059LNX1 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:11:56 EST 2006
x86_64 x86_64 x86_64 GNU/Linux

After following the preliminary steps in
http://bioinformatics.org/bradstuff/bp/tut/Tutorial001.html
I get an error message for the tests performed as:

python setup.py test  2>&1 |  tee python_setup.py-test.out.txt

The main error message is below and the full diagnostic info is attached as
a txt file.
I hope you can please help figure out the problem since I am a novice python
user,
and want to get into biopython for using it as a research tool asap.
Thanks
Sarosh,
NIDDK/NIH


======================================================================
FAIL: test_Cluster
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 148, in runTest
    self.runSafeTest()
  File "run_tests.py", line 185, in runSafeTest
    expected_handle)
  File "run_tests.py", line 285, in compare_output
    assert expected_line == output_line, \
AssertionError:
Output  : 'Wrong clustering solution found.\n'
Expected: 'Correct clustering solution found.\n'

----------------------------------------------------------------------
Ran 93 tests in 78.138s


--
Sarosh N. Fatakia
http://budoe.bu.edu/~sfatakia/sarosh/sarosh.html


From mdehoon at c2b2.columbia.edu  Tue Feb 21 10:50:13 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue Feb 21 10:45:43 2006
Subject: [Biopython-dev] RE: problem in using biopython-1.41
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE80@cgcmail.cgc.cpmc.columbia.edu>

Hi Sarosh,

Thanks for your reply. The problem may be the fact that you're using numarray
instead of Numerical Python. If I remember correctly, numarray has some
facilities to handle large arrays on 64-bit machines. Since Bio.Cluster is
created for Numerical Python, it doesn't know about the numarray-specific
stuff. So the simplest solution may be to install Numerical Python instead of
numarray.

Judging from your output of "python setup.py install", it appears that even
the compilation of Bio.Cluster failed, and possibly that of every module that
needs compilation: I don't see any of the compiled modules in the install
output. So it looks like something went badly wrong during "python setup.py
build". Did you get any error messages while running "python setup.py build"?

Finally, there's a typo when you execute 'run_tests("Bio.Cluster")': That
should be Bio.Cluster, not Bio_Cluster.

I think it's best to execute "python setup.py build" again and check if you
get any error messages. Once that works OK, you probably won't get any
testing errors any more. If you do, try again with Numerical Python instead
of numarray. The latest good version of Numerical Python is 24.2.
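
A minimal sketch for checking which of the two array packages an interpreter
can actually import, which answers the "which version are you using" question
directly. The helper name find_array_packages is hypothetical, and the sketch
is written for a current Python; on the Python 2.3 used in this thread the
built-in __import__ would take the place of importlib.

```python
import importlib


def find_array_packages(names=("Numeric", "numarray")):
    """Map each package name to its version string, or None if not importable."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Fall back to "unknown" if the package exposes no __version__.
            found[name] = getattr(module, "__version__", "unknown")
    return found


for name, version in sorted(find_array_packages().items()):
    print("%-10s %s" % (name, version or "not installed"))
```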

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: Sarosh Fatakia [mailto:sarosh.fatakia@gmail.com]
Sent: Tue 2/21/2006 10:01 AM
To: Michiel De Hoon
Cc: biopython-dev@biopython.org
Subject: Re: problem in using biopython-1.41
 
Hello Michiel,
Thanks for your response. Hope you can please help resolve the issue.
a) The numpy (numarray) version is:   numarray-1.5.1/
b) I am attaching the outputs you require. I hope there is sufficient info.
Please do let me know if more information is required.
Thanks once again,
best,
sarosh


On 2/19/06, Michiel De Hoon <mdehoon@c2b2.columbia.edu> wrote:
>
> Dear Sarosh,
>
> From the test results, it looks like all of Biopython is working
> correctly,
> except for Bio.Cluster. So if you don't plan on using the clustering
> algorithms in Bio.Cluster, you've got nothing to worry about.
>
> I would like to find out though why Bio.Cluster is failing. It may have to
> do
> with the fact that you're on a 64-bit machine; the code has not been
> tested
> there. So I'd like to ask you the following:
> a) Which version of Numerical Python are you using?
> b) Can you run the following commands and send me the output of step 3:
>     1) Install biopython with "python setup.py install"
>     2) From the biopython-1.41/Tests directory, run "python -i
> test_Cluster.py"
>     3) From the python prompt, execute "run_tests("Bio.Cluster")"
> This will show you the exact output from the Bio.Cluster tests.
>
> Thanks in advance,
>
> --Michiel.
>
>
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: mailman-bounces@portal.open-bio.org on behalf of Sarosh Fatakia
> Sent: Sat 2/18/2006 9:14 PM
> To: biopython-dev-owner@biopython.org
> Subject: problem in using biopython-1.41
>
> Hi I tried sending my problem to
> biopython developers. Hope you can please help,
> thanks
> sarosh
>
>
> ---------- Forwarded message ----------
> From: biopython-dev-owner@portal.open-bio.org <
> biopython-dev-owner@portal.open-bio.org>
> Date: Feb 18, 2006 7:15 PM
> Subject: problem in using biopython-1.41
> To: sarosh.fatakia@gmail.com
>
> You are not allowed to post to this mailing list, and your message has
> been automatically rejected.  If you think that your messages are
> being rejected in error, contact the mailing list owner at
> biopython-dev-owner@biopython.org.
>
>
>
>
> ---------- Forwarded message ----------
> From: "Sarosh Fatakia" < sarosh.fatakia@gmail.com>
> To: biopython-dev@biopython.org
> Date: Sat, 18 Feb 2006 18:50:58 -0500
> Subject: problem in using biopython-1.41
> Greetings!
> I installed biopython-1.41 on my unix box which is:
> Linux DK12AR4059LNX1 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:11:56 EST 2006
> x86_64 x86_64 x86_64 GNU/Linux
>
> After following the preliminary steps in
> http://bioinformatics.org/bradstuff/bp/tut/Tutorial001.html
> I get an error message for the tests performed as:
>
> python setup.py test  2>&1 |  tee python_setup.py-test.out.txt
>
> The main error message is below and the full diagnostic info is attached
> as
> a txt file.
> I hope you can please help figure out the problem since I am a novice
> python
> user,
> and want to get into biopython for using it as a research tool asap.
> Thanks
> Sarosh,
> NIDDK/NIH
>
>
> ======================================================================
> FAIL: test_Cluster
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "run_tests.py", line 148, in runTest
>     self.runSafeTest()
>   File "run_tests.py", line 185, in runSafeTest
>     expected_handle)
>   File "run_tests.py", line 285, in compare_output
>     assert expected_line == output_line, \
> AssertionError:
> Output  : 'Wrong clustering solution found.\n'
> Expected: 'Correct clustering solution found.\n'
>
> ----------------------------------------------------------------------
> Ran 93 tests in 78.138s
>
>
>
>
>
> --
> Sarosh N. Fatakia
> http://budoe.bu.edu/~sfatakia/sarosh/sarosh.html
>
>
>


--
Sarosh N. Fatakia
http://budoe.bu.edu/~sfatakia/sarosh/sarosh.html


From bugzilla-daemon at portal.open-bio.org  Tue Feb 21 18:31:21 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Tue Feb 21 19:15:40 2006
Subject: [Biopython-dev] [Bug 1933] Iterator support for Standalone XML
	blast output with multiple querys
Message-ID: <200602212331.k1LNVL8o002535@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1933


------- Comment #4 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-21 18:31 -------
Created an attachment (id=290)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=290&action=view)
RPS-BLAST 2.2.10 multi query XML output file for testing iterator support

Example multi-record XML test file, actually from rpsblast.exe 2.2.10 running
on Windows, despite what the version string in the file claims.


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org  Tue Feb 21 18:34:06 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Tue Feb 21 19:16:01 2006
Subject: [Biopython-dev] [Bug 1933] Iterator support for Standalone XML
	blast output with multiple querys
Message-ID: <200602212334.k1LNY6Ht002578@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1933


------- Comment #5 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-21 18:34 -------
Created an attachment (id=291)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=291&action=view)
RPS-BLAST 2.2.10 multi query TXT output file for testing iterator support

The matching "plain text" output file to go with the XML file just attached.

This is at least human readable and should help with any testing of the XML
file parsing.

Note that BioPython does not support the RPS-BLAST style plain text file
format, see bug 1715


From bugzilla-daemon at portal.open-bio.org  Tue Feb 21 18:36:04 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Tue Feb 21 19:16:19 2006
Subject: [Biopython-dev] [Bug 1933] Iterator support for Standalone XML
	blast output with multiple querys
Message-ID: <200602212336.k1LNa4dN002614@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1933


------- Comment #6 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-21 18:36 -------
Created an attachment (id=292)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=292&action=view)
The FASTA file used as input to generate the test cases

The attached FASTA amino acid file was used to create the previous two test
cases running rpsblast.exe 2.2.10 on windows XP using the CDD database:

rpsblast -i xbt_iter.faa -d data_cdd/Cdd -e 0.0001 > xbt_iter_rps.txt

rpsblast -i xbt_iter.faa -d data_cdd/Cdd -e 0.0001 -m 7 > xbt_iter_rps.xml


From bugzilla-daemon at portal.open-bio.org  Wed Feb 22 05:49:40 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Wed Feb 22 06:15:54 2006
Subject: [Biopython-dev] [Bug 1929] Extra reference in BLASTPGP plain text
	output
Message-ID: <200602221049.k1MAneI8013509@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1929


biopython-bugzilla@maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-22 05:49 -------
This was fixed in CVS on 16 Aug 2005 by Jeff Chang, originally reported by
Zhengwei Zhu.

I can't find any reference to this in Bugzilla, and if it was reported on the
mailing list I can't find it.


From bugzilla-daemon at portal.open-bio.org  Wed Feb 22 06:19:26 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Wed Feb 22 07:15:43 2006
Subject: [Biopython-dev] [Bug 1933] Iterator support for Standalone XML
	blast output with multiple querys
Message-ID: <200602221119.k1MBJQ1j014525@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1933


------- Comment #7 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-22 06:19 -------
Created an attachment (id=293)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view)
Python script to test the Blast XML iteration

This is a simple test script which uses the standalone RPS-BLAST output file
xbt_iter_rps.xml as the input file, attachment 290 on this bug.

Using Michael Anthony Maibaum's patch (attachment 266) this seems to work fine.

I would be happy to check in the patch and then integrate the XML iteration
into Tests/test_GenBank.py

Note:
It might be even better to create a matched set of normal BLAST files (plain
text and XML) with a test script to confirm they behave identically in
BioPython.


From mdehoon at c2b2.columbia.edu  Wed Feb 22 11:23:25 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed Feb 22 11:21:23 2006
Subject: [Biopython-dev] RE: please help
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE84@cgcmail.cgc.cpmc.columbia.edu>

The reason that test_Cluster fails is that on 64-bit machines, the size of
an int is not equal to the size of a long. Numerical Python integer arrays
are of type long by default, and casting to int causes the data stored in the
arrays to be misinterpreted at the C-level. The fix is more or less
straightforward; I expect to have this fixed within a few days.
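
A minimal sketch of the kind of misreading described above, written in Python
rather than at the C level. It does not use the Bio.Cluster code; the
cluster_ids values are made up, and fixed-size little-endian formats stand in
for the C types so the result is the same on any machine.

```python
import ctypes
import struct

# Sizes of the C types on the machine running this sketch; on the x86_64
# Linux box from this thread they would differ (int smaller than long).
int_size = ctypes.sizeof(ctypes.c_int)
long_size = ctypes.sizeof(ctypes.c_long)
print("sizeof(int) = %d, sizeof(long) = %d" % (int_size, long_size))

# Three made-up cluster ids stored as 8-byte integers.
cluster_ids = [0, 1, 2]
raw = struct.pack("<3q", *cluster_ids)

# Reading the same bytes as 4-byte integers yields twice as many values,
# and the first three no longer match the data that was stored.
misread = struct.unpack("<6i", raw)
print(misread)
```

The first three values read back this way differ from cluster_ids, which is
the sort of corruption that could turn a correct clustering solution into a
wrong one.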

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: Sarosh Fatakia [mailto:sarosh.fatakia@gmail.com]
Sent: Tue 2/21/2006 7:26 PM
To: Michiel De Hoon; biopython-dev@biopython.org
Subject: please help
 
Hi folks,
I have been trying all weekend to get Biopython working. It seems that more
than one module is non-functional on the 64-bit architecture of my machine, viz:
Linux DK12AR4059LNX1 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:11:56 EST 2006
x86_64 x86_64 x86_64 GNU/Linux
and the OS is
Red Hat Enterprise Linux WS release 4.
(Red Hat 3.4.3-9.EL4)
Is there a web page that describes, step by step, the installation of all the
dependent modules needed for a complete biopython-1.41 installation?
The Python version is: Python 2.3.4
With a partial installation I only have limited functionality. The
lone test that fails in biopython-1.41 is test_Cluster.py.
Hope you can please help,
Thanks,
sarosh


From bugzilla-daemon at portal.open-bio.org  Sat Feb 25 09:33:25 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sat Feb 25 10:20:52 2006
Subject: [Biopython-dev] [Bug 1963] New: Adding __str__ method to codon
	tables and translators
Message-ID: <200602251433.k1PEXPgO016034@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1963

           Summary: Adding __str__ method to codon tables and translators
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: biopython-bugzilla@maubp.freeserve.co.uk


The existing CodonTable and Translator objects do not provide a simple way to
"see" the table.  It would be nice to be able to just "print" them using the
__str__ method:

e.g.

>>> import Bio.Data.CodonTable
>>> print Bio.Data.CodonTable.standard_dna_table
  |   G     |  A     |  T     |  C     |
--+---------+--------+--------+--------+--
G | GGG G   |GAG E   |GTG V   |GCG A   | G
G | GGA G   |GAA E   |GTA V   |GCA A   | A
G | GGT G   |GAT D   |GTT V   |GCT A   | T
G | GGC G   |GAC D   |GTC V   |GCC A   | C
--+---------+--------+--------+--------+--
A | AGG R   |AAG K   |ATG M(s)|ACG T   | G
A | AGA R   |AAA K   |ATA I   |ACA T   | A
A | AGT S   |AAT N   |ATT I   |ACT T   | T
A | AGC S   |AAC N   |ATC I   |ACC T   | C
--+---------+--------+--------+--------+--
T | TGG W   |TAG Stop|TTG L(s)|TCG S   | G
T | TGA Stop|TAA Stop|TTA L   |TCA S   | A
T | TGT C   |TAT Y   |TTT F   |TCT S   | T
T | TGC C   |TAC Y   |TTC F   |TCC S   | C
--+---------+--------+--------+--------+--
C | CGG R   |CAG Q   |CTG L(s)|CCG P   | G
C | CGA R   |CAA Q   |CTA L   |CCA P   | A
C | CGT R   |CAT H   |CTT L   |CCT P   | T
C | CGC R   |CAC H   |CTC L   |CCC P   | C
--+---------+--------+--------+--------+--

This was done by adding the following method to Bio/Data/CodonTable.py class
CodonTable:

    def __str__(self) :
        """Returns a simple text representation of the codon table"""
        answer="  | " + "|".join( \
            ["  %s     " % c2 for c2 in self.nucleotide_alphabet.letters] \
            ) + "|"
        answer = answer + "\n--+---------+--------+--------+--------+--"
        for c1 in self.nucleotide_alphabet.letters :
            for c3 in self.nucleotide_alphabet.letters :
                line = c1 + " | "
                for c2 in self.nucleotide_alphabet.letters :
                    codon = c1+c2+c3
                    if codon in self.start_codons :
                        line = line + "%s %s(s)|" \
                               % (codon, self.forward_table[codon])
                    elif codon in self.stop_codons :
                        line = line + "%s Stop|" \
                               % (codon)
                    else:
                        line = line + "%s %s   |" \
                               % (codon, self.forward_table[codon])
                line = line + " " + c3
                answer = answer + "\n"+ line 
            answer = answer + "\n--+---------+--------+--------+--------+--"
        return answer

A similar __str__ method could be added to Bio/Translate.py to call the codon
table's __str__ method.

Comments?  Should the order be UCAG rather than following
self.nucleotide_alphabet.letters?   Should it include three letter amino acid
codes as well?


From bugzilla-daemon at portal.open-bio.org  Sun Feb 26 10:35:32 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Sun Feb 26 11:20:31 2006
Subject: [Biopython-dev] [Bug 1963] Adding __str__ method to codon tables
	and translators
Message-ID: <200602261535.k1QFZWmI014403@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1963


------- Comment #1 from biopython-bugzilla@maubp.freeserve.co.uk  2006-02-26 10:35 -------
Revised version which:
* Uses the "conventional" nucleotide ordering
* Works for the ambiguous tables
* Shows the table's ID and name(s)

Again, add this method to Bio/Data/CodonTable.py class
CodonTable:

    def __str__(self) :
        """Returns a simple text representation of the codon table"""
        if self.id :
            answer = "Table %i" % self.id
        else :
            answer = "Table ID unknown"
        if self.names :
            answer = answer + " " + ", ".join(filter(None, self.names))

        """
        #Use the conventional ordering for the codon table
        #and only use the main four - even for ambiguous tables
        letters = self.nucleotide_alphabet.letters
        if "T" in letters :
            #DNA
            letters = "TCAG"
        elif "U" in letters :
            #RNA
            letters = "UCAG"
        else :
            print "WARNING - Unexpected alphabet"
        """

        #Use the conventional ordering for the codon table
        letters = self.nucleotide_alphabet.letters
        if "GATC" == letters :
            #DNA
            letters = "TCAG"
        elif "GAUC" == letters :
            #RNA
            letters = "UCAG"


        answer=answer + "\n\n  |" + "|".join( \
            ["  %s      " % c2 for c2 in letters] \
            ) + "|"
        answer=answer + "\n--+" \
               + "+".join(["---------" for c2 in letters]) + "+--"
        for c1 in letters :
            for c3 in letters :
                line = c1 + " |"
                for c2 in letters :
                    codon = c1+c2+c3
                    line = line + " %s" % codon
                    if codon in self.stop_codons :
                        line = line + " Stop|"
                    else :
                        try :
                            amino = self.forward_table[codon]
                        except KeyError :
                            amino = "?"
                        except TranslationError :
                            amino = "?"
                        if codon in self.start_codons :
                            line = line + " %s(s)|" % amino
                        else :
                            line = line + " %s   |" % amino
                line = line + " " + c3
                answer = answer + "\n"+ line 
            answer=answer + "\n--+" \
                  + "+".join(["---------" for c2 in letters]) + "+--"
        return answer

Example:

>>> import Bio.Data.CodonTable
>>> print Bio.Data.CodonTable.unambiguous_dna_by_id[11]
Table 11 Bacterial

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I(s)| ACT T   | AAT N   | AGT S   | T
A | ATC I(s)| ACC T   | AAC N   | AGC S   | C
A | ATA I(s)| ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V(s)| GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
>>> print Bio.Data.CodonTable.unambiguous_rna_by_id[1]
Table 1 Standard, SGC0

  |  U      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
U | UUU F   | UCU S   | UAU Y   | UGU C   | U
U | UUC F   | UCC S   | UAC Y   | UGC C   | C
U | UUA L   | UCA S   | UAA Stop| UGA Stop| A
U | UUG L(s)| UCG S   | UAG Stop| UGG W   | G
--+---------+---------+---------+---------+--
C | CUU L   | CCU P   | CAU H   | CGU R   | U
C | CUC L   | CCC P   | CAC H   | CGC R   | C
C | CUA L   | CCA P   | CAA Q   | CGA R   | A
C | CUG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | AUU I   | ACU T   | AAU N   | AGU S   | U
A | AUC I   | ACC T   | AAC N   | AGC S   | C
A | AUA I   | ACA T   | AAA K   | AGA R   | A
A | AUG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GUU V   | GCU A   | GAU D   | GGU G   | U
G | GUC V   | GCC A   | GAC D   | GGC G   | C
G | GUA V   | GCA A   | GAA E   | GGA G   | A
G | GUG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--

Question One:
Is this worth adding to BioPython or not?

Question Two:
What is the preferred behaviour for ambiguous tables?  Just a 4x4x4 table as
for the unambiguous tables?  Or the full 15x15x15 table?  I have implemented
both (see commented out code)

Question Three:
Is there a standard BioPython function to convert from one letter amino acid
sequences into three letter names?  i.e. like one_to_three from
Bio.PDB.Polypeptide but more general.  That function does not cope with
ambiguous names.
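
A minimal sketch of what such a more general conversion could look like. The
names THREE_LETTER and seq1_to_seq3 are hypothetical, not existing Biopython
functions; the codes for the ambiguous letters (B = Asx, Z = Glx, X = Xaa) and
Ter for a stop follow the usual IUPAC convention.

```python
# One-letter to three-letter amino acid codes, including the common
# ambiguity codes and "*" for a stop.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx", "X": "Xaa", "*": "Ter",
}


def seq1_to_seq3(seq, unknown="Xaa"):
    """Convert a one-letter protein string to concatenated three-letter codes.

    Letters not in THREE_LETTER are replaced by the value of `unknown`.
    """
    return "".join(THREE_LETTER.get(letter.upper(), unknown) for letter in seq)


print(seq1_to_seq3("MAGB*"))
```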


From mdehoon at c2b2.columbia.edu  Sun Feb 26 16:56:59 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun Feb 26 16:54:42 2006
Subject: [Biopython-dev] RE: please help
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE8D@cgcmail.cgc.cpmc.columbia.edu>

> I have been trying all weekend to make the biopython functional. It seems
> that more than one module is non-functional in the 64 bit arch of my
> machine
...
> With a partial installation I can only have a limited functionality. The
> lone test that fails in the biopython-1.41 is test_Cluster.py.

I fixed this in CVS. If you download Bio/Cluster/clustermodule.c from there
and copy it over the one in Biopython-1.41, the problems on 64-bit machines
should be solved. Please let me know if you're still finding problems with
test_Cluster.py.

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From mdehoon at c2b2.columbia.edu  Mon Feb 27 13:16:47 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon Feb 27 13:16:10 2006
Subject: [Biopython-dev] RE: [BioPython] qblast fails on parsing XML results
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE91@cgcmail.cgc.cpmc.columbia.edu>

There is a simpler solution to this, which is to use urllib instead of the
socket library in the functions _send_to_qblast and _send_to_blasturl. If we
use urllib, we get the results automatically without the HTTP header.

So .... does anybody know why socket is used instead of urllib? If it's
because older Python versions didn't have urllib, we can just replace socket
by urllib to solve this problem. Or am I missing something?
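
A minimal sketch of the point about urllib, written for a current Python
(urllib.request; in the Python 2 of this thread the call was urllib.urlopen).
It does not contact NCBI: FakeBlastHandler and BODY are made-up stand-ins
served from a throwaway local server, so only the header handling is shown.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up payload standing in for BLAST XML output.
BODY = b'<?xml version="1.0"?>\n<BlastOutput></BlastOutput>\n'


class FakeBlastHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the BLAST server; always returns BODY."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), FakeBlastHandler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

try:
    url = "http://127.0.0.1:%d/" % server.server_port
    # An empty ProxyHandler keeps the request away from any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    handle = opener.open(url)
    results = handle.read()
    handle.close()
finally:
    server.shutdown()
    server.server_close()

# The status line and headers were consumed by the library;
# only the body is returned to the caller.
print(results.decode("ascii"))
```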

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-bounces@portal.open-bio.org on behalf of Ilya Soifer
Sent: Mon 2/27/2006 10:38 AM
To: biopython@biopython.org
Subject: [BioPython] qblast fails on parsing XML results
 
Hi,
I hope that I am sending this to the correct list.
When I run qblast I get

>>> res1 = NCBIWWW.qblast("blastn", "nr", seq1)

Traceback (most recent call last):
 File "<pyshell#24>", line 1, in -toplevel-
   res1 = NCBIWWW.qblast("blastn", "nr", seq1)
 File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 1130, in qblast
   i = results.index("Connection: close")
ValueError: substring not found

This happens because the results that BLAST returns no longer have this header:

   # HTTP/1.1 200 OK
   # Date: Wed, 05 Oct 2005 02:13:33 GMT
   # Server: Nde
   # Content-Type: text/plain
   # Connection: close
   #


but this one

HTTP/1.0 200 OK
Date: Mon, 27 Feb 2006 11:54:40 GMT
Content-Type: application/xml
Server: Nde
Via: 1.1 proxy7 (NetCache NetApp/6.0.2)

I guess it might be better to look for something like "<?xml" etc. in
order to remove the annoying header.
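
A minimal sketch of that idea, assuming the XML declaration is always present
in a successful reply. The helper name strip_http_header is hypothetical and
the XML body in the example is made up; the header lines are the ones shown
above. Written for a current Python.

```python
def strip_http_header(results):
    """Return the part of `results` that starts at the XML declaration."""
    i = results.find("<?xml")
    if i == -1:
        raise ValueError("no XML declaration found in BLAST results")
    return results[i:]


raw = (
    "HTTP/1.0 200 OK\r\n"
    "Date: Mon, 27 Feb 2006 11:54:40 GMT\r\n"
    "Content-Type: application/xml\r\n"
    "Server: Nde\r\n"
    "Via: 1.1 proxy7 (NetCache NetApp/6.0.2)\r\n"
    "\r\n"
    '<?xml version="1.0"?>\n<BlastOutput>\n</BlastOutput>\n'
)
xml_text = strip_http_header(raw)
print(xml_text)
```

Unlike results.index("Connection: close"), this does not depend on which
header lines the server or an intermediate proxy happens to send.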

Ilya

_______________________________________________
BioPython mailing list  -  BioPython@biopython.org
http://biopython.org/mailman/listinfo/biopython


From bugzilla-daemon at portal.open-bio.org  Tue Feb 28 13:38:54 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Tue Feb 28 14:20:44 2006
Subject: [Biopython-dev] [Bug 1964] New: GenBank.FeatureParser dies on LOCUS
	Record ADRCG 
Message-ID: <200602281838.k1SIcs0I029984@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1964

           Summary: GenBank.FeatureParser dies on LOCUS Record ADRCG
           Product: Biopython
           Version: Not Applicable
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: mcolosimo@mitre.org


from Bio import GenBank
gi_list = GenBank.search_for("ADRCG")
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser =  
                    GenBank.FeatureParser())
rec = ncbi_dict[gi_list[0]]

Traceback:
[snip]
Bio/GenBank/__init__.py", line 1507, in feed
    line = self._feed_header(handle, consumer)
Bio/GenBank/__init__.py", line 1436, in _feed_header
    consumer.reference_bases(data[data.find(' ')+1:])
Bio/GenBank/__init__.py", line 458, in reference_bases
    locations = self._split_reference_locations(ref_base_info)
Bio/GenBank/__init__.py", line 496, in _split_reference_locations
    start, end = base_info.split('to')
ValueError: unpack list of wrong size
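
A minimal sketch of how the last frame can fail, and of a more defensive
split. The input "sites" is an assumption about what the reference line of
this record contains, and split_reference_location is a hypothetical helper,
not the Biopython code. Written for a current Python.

```python
def split_reference_location(base_info):
    """Return (start, end) as ints, or None if base_info is not 'X to Y'."""
    parts = base_info.split("to")
    if len(parts) != 2:
        return None
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        return None


# Unpacking the result of split() directly only works when the text
# contains "to" exactly once.
try:
    start, end = "sites".split("to")
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(unpack_failed)
print(split_reference_location("1 to 2036"))
print(split_reference_location("sites"))
```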


From bugzilla-daemon at portal.open-bio.org  Tue Feb 28 16:05:03 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org)
Date: Tue Feb 28 16:20:43 2006
Subject: [Biopython-dev] [Bug 1965] New: GenBank FeatureParser converts
	dates from 4 digits to TWO!
Message-ID: <200602282105.k1SL53eD000322@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1965

           Summary: GenBank FeatureParser converts dates from 4 digits to
                    TWO!
           Product: Biopython
           Version: Not Applicable
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev@biopython.org
        ReportedBy: mcolosimo@mitre.org


People spent millions, maybe billions, at the end of the 1990s to fix this
problem, and somehow Biopython undoes it.

Given a LOCUS line with the date of "23-AUG-2002", using FeatureParser converts
it to "23-AUG-02".

GenBank._Scanner._feed_locus seems to do the correct thing, so I'm at a loss
for now as to what is doing this "cleaning", but it would be nice to keep the
year as YYYY.
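
A minimal sketch of a round-trip check that would catch this kind of
truncation. The helpers parse_locus_date and format_locus_date are
hypothetical and are not the Biopython code; where the year is actually
shortened inside the parser is not known here. Written for a current Python.

```python
import datetime

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def parse_locus_date(text):
    """Parse a LOCUS-style date such as '23-AUG-2002' into a date object."""
    day, month, year = text.split("-")
    return datetime.date(int(year), MONTHS.index(month.upper()) + 1, int(day))


def format_locus_date(date, four_digit_year=True):
    """Format a date LOCUS-style, with either a 4-digit or a 2-digit year."""
    if four_digit_year:
        year = "%04d" % date.year
    else:
        year = "%02d" % (date.year % 100)
    return "%02d-%s-%s" % (date.day, MONTHS[date.month - 1], year)


original = "23-AUG-2002"
parsed = parse_locus_date(original)
print(format_locus_date(parsed))                         # keeps the century
print(format_locus_date(parsed, four_digit_year=False))  # drops the century
```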

