From mjldehoon at yahoo.com  Mon May  5 10:55:42 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 5 May 2008 07:55:42 -0700 (PDT)
Subject: [Biopython-dev] BOSC 2008 announcement and call for submissions
Message-ID: <698765.93604.qm@web62401.mail.re1.yahoo.com>


BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will
 take place in Toronto, Ontario, Canada, as one of several Special
 Interest Group (SIG) meetings occurring in conjunction with the 16th annual
 Intelligent Systems for Molecular Biology Conference (ISMB 2008).

This is the final reminder to submit your proposals for talks to the
 BOSC submission system before May 11.

Submission Process:
All abstracts must be submitted through our Open Conference Systems
 site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a
 full paper.  The small Abstract Text should be a summary, while the
 longer abstract should provide more details, including the open-source
 license requirement details.
Full-length abstracts are limited to one page with one inch (2.5 cm)
 margins on the top, sides, and bottom.  The full-length abstract should
 include the title, authors, and affiliations.  We prefer your abstract
 to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

       

From bugzilla-daemon at portal.open-bio.org  Wed May  7 11:36:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 May 2008 11:36:43 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs
	urgent optimization
In-Reply-To: <bug-2494-42@http.bugzilla.open-bio.org/>
Message-ID: <200805071536.m47FahTU028186@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2494


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-07 11:36 EST -------
Created an attachment (id=917)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=917&action=view)
Patch to BioSQL/BioSeq.py

Hi Eric.

I've tried your script with MySQL 5.0 under Linux, and see similar example
timings, e.g.:

getTaxonSQLsimplex took 458.646 ms
getTaxonSQL took 8152.112 ms
getTaxonSQLall took 8565.304 ms
getTaxonLoop took 18.612 ms

However, your loop function doesn't return exactly the same list as the
original code.  In particular you do not exclude taxonomy lineage entries with
a rank of "no rank".  Also I didn't like the hard coded assumption about
taxon_id 1 as a top node.  What do you think of this version:

def getTaxonLoopPeter(adaptor, taxon_id):
    # Climb up the hierarchy: bottom-up approach based on the child/parent
    # link with parent_taxon_id.
    taxonomy = []
    while taxon_id:
        name, rank, parent_taxon_id = adaptor.execute_one(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id"
            " FROM taxon, taxon_name"
            " WHERE taxon.taxon_id=taxon_name.taxon_id"
            " AND taxon_name.name_class='scientific name'"
            " AND taxon.taxon_id = %s", (taxon_id,))
        if taxon_id == parent_taxon_id:
            # If the taxon table has been populated by the BioSQL script
            # load_ncbi_taxonomy.pl this is how top parent nodes are stored.
            # Personally, I would have used a NULL parent_taxon_id here.
            break
        if rank != "no rank":
            # For consistency with older versions of Biopython, we are only
            # interested in taxonomy entries with a stated rank.
            # Add this to the start of the lineage list.
            taxonomy.insert(0, name)
        taxon_id = parent_taxon_id
    return taxonomy

I'm attaching a patch to BioSQL/BioSeq.py that uses this code in place of the
current left/right dependent version.  While this does seem to be much faster
in your test script, I'm not sure how much difference this will make in normal
usage.
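The parent-climbing loop can be tried without a BioSQL server.  The sketch
below is only an illustration under stated assumptions: it uses Python's
standard sqlite3 module with "?" placeholders (rather than the "%s" style of
the MySQL adaptor), the four-row taxon/taxon_name tables are invented toy
data, and lineage_by_parent_links is a hypothetical name, not part of
Biopython or the attached patch.

```python
import sqlite3


def lineage_by_parent_links(conn, taxon_id):
    """Walk parent_taxon_id links upward, returning ranked names root-first."""
    lineage = []
    while taxon_id:
        row = conn.execute(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id"
            " FROM taxon, taxon_name"
            " WHERE taxon.taxon_id = taxon_name.taxon_id"
            " AND taxon_name.name_class = 'scientific name'"
            " AND taxon.taxon_id = ?",
            (taxon_id,),
        ).fetchone()
        if row is None:
            break  # dangling parent link: stop instead of failing on unpack
        name, rank, parent_taxon_id = row
        if taxon_id == parent_taxon_id:
            break  # a self-parented row marks the top node
        if rank != "no rank":
            lineage.insert(0, name)  # keep only entries with a stated rank
        taxon_id = parent_taxon_id
    return lineage


# Toy data (made up): root <- unranked clade <- genus <- species
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, node_rank TEXT,
                    parent_taxon_id INTEGER);
CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT);
INSERT INTO taxon VALUES (1, 'no rank', 1);
INSERT INTO taxon VALUES (2, 'no rank', 1);
INSERT INTO taxon VALUES (3, 'genus', 2);
INSERT INTO taxon VALUES (4, 'species', 3);
INSERT INTO taxon_name VALUES (1, 'root', 'scientific name');
INSERT INTO taxon_name VALUES (2, 'some clade', 'scientific name');
INSERT INTO taxon_name VALUES (3, 'Examplus', 'scientific name');
INSERT INTO taxon_name VALUES (4, 'Examplus testii', 'scientific name');
""")
print(lineage_by_parent_links(conn, 4))
```

Each step costs one small indexed lookup, so the total work grows with the
depth of the lineage rather than with the size of the taxon table.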

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu May  8 07:56:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 07:56:24 -0400
Subject: [Biopython-dev] [Bug 2496] New: Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
Message-ID: <bug-2496-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496

           Summary: Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST
                    option
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
                CC: betainverse at gmail.com


Problem reported on the mailing list by Katie Edmonds.

We need to add the CGI option RUN_PSIBLAST to the Blast URL in order to support
PSI-BLAST.  However, the current Biopython code can't parse the RID from the
resulting HTML which needs another fix.

Patch to follow.
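
As a rough illustration of what adding the option to the request means, the
sketch below builds a URL-encoded parameter string and appends RUN_PSIBLAST
only when asked.  It is not the patch: build_put_query is a hypothetical
helper, no request is sent, and the other parameter names (CMD, PROGRAM,
DATABASE, QUERY, I_THRESH) are assumed here from the usual NCBI URL
conventions rather than taken from this report.

```python
from urllib.parse import parse_qs, urlencode


def build_put_query(program, database, sequence, run_psiblast=False, **extra):
    """Return a URL-encoded parameter string for a BLAST 'Put' request."""
    params = [
        ("CMD", "Put"),
        ("PROGRAM", program),
        ("DATABASE", database),
        ("QUERY", sequence),
    ]
    if run_psiblast:
        # The CGI option named in this bug report.
        params.append(("RUN_PSIBLAST", "on"))
    # Any further keyword options are upper-cased and appended in sorted order.
    params.extend((key.upper(), str(value)) for key, value in sorted(extra.items()))
    return urlencode(params)


query = build_put_query("blastp", "nr", "MAYHSFLVEPISCHAWNKDRTQ",
                        run_psiblast=True, i_thresh=0.05)
print(query)
```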



From bugzilla-daemon at portal.open-bio.org  Thu May  8 07:58:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 07:58:46 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
In-Reply-To: <bug-2496-42@http.bugzilla.open-bio.org/>
Message-ID: <200805081158.m48Bwkxq028674@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-08 07:58 EST -------
Created an attachment (id=918)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=918&action=view)
Patch to Bio/Blast/NCBIWWW.py

This seems to work, however there is another problem in the XML parser.  e.g.

from Bio.Blast.NCBIWWW import qblast
#gi|160837788|ref|NP_075631.2| actin related protein 2/3 complex, subunit 1B
sequence = \
"MAYHSFLVEPISCHAWNKDRTQIAICPNNHEVHIYEKSGAKWNKVHELKEHNGQVTGIDWAPESNRIVTC" \
+ "GTDRNAYVWTLKGRTWKPTLVILRINRAARCVRWAPNENKFAVGSGSRVISICYFEQENDWWVCKHIKKP" \
+ "IRSTVLSLDWHPNNVLLAAGSCDFKCRIFSAYIKEVEERPAPTPWGSKMPFGELMFESSSSCGWVHGVCF" \
+ "SASGSRVAWVSHDSTVCLVDADKKMAVATLASETLPLLAVTFITENSLVAAGHDCFPVLFTYDNAAVTLS" \
+ "FGGRLDVPKQSSQRGMTARERFQNLDKKASSEGGAATGAGLDSLHKNSVSQISVLSGGKAKCSQFCTTGM" \
+  "DGGMSIWDVKSLESALKDLKIK"
result_handle1 = qblast('blastp', 'nr', sequence, expect=0.001)
result_handle2 = qblast('blastp', 'nr', sequence, i_thresh=0.05, expect=10,
run_psiblast="on")



From bugzilla-daemon at portal.open-bio.org  Thu May  8 10:28:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 10:28:21 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
In-Reply-To: <bug-2496-42@http.bugzilla.open-bio.org/>
Message-ID: <200805081428.m48ESLbe006861@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-08 10:28 EST -------
This patch seems to be working - note that you will also need to update
Bio/Blast/NCBIXML.py to CVS revision 1.18 in order to parse the results.  This
is due to a small change in the formatting of the version number in the latest
XML output.

I would like someone familiar with PSI-Blast to confirm this is OK before I
commit this change to CVS.



From bugzilla-daemon at portal.open-bio.org  Fri May  9 05:01:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 05:01:46 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
In-Reply-To: <bug-2496-42@http.bugzilla.open-bio.org/>
Message-ID: <200805090901.m4991kut017980@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-09 05:01 EST -------
Katie has reported back via the mailing list that there are still issues with
multiple PSI-Blast iterations, see:
http://lists.open-bio.org/pipermail/biopython/2008-May/004220.html

See also the original thread:
http://lists.open-bio.org/pipermail/biopython/2008-May/004213.html



From bugzilla-daemon at portal.open-bio.org  Fri May  9 07:21:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 07:21:33 -0400
Subject: [Biopython-dev] [Bug 2497] New: Unit tests do not cover
	Bio.Blast.NCBIWWW.qblast()
Message-ID: <bug-2497-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2497

           Summary: Unit tests do not cover Bio.Blast.NCBIWWW.qblast()
           Product: Biopython
           Version: 1.45
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Recent NCBI changes to use BLAST 2.2.18+ with their online API broke our XML
parser.  This was actually reported via the mailing list and fixed quickly.

Adding an online unit test to explicitly run a few queries with
Bio.Blast.NCBIWWW.qblast() and parse the XML output could have caught this
earlier.

I'm going to attach a proposed additional unit test to do this,
test_NCBIWWW_online.py



From bugzilla-daemon at portal.open-bio.org  Fri May  9 07:24:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 07:24:48 -0400
Subject: [Biopython-dev] [Bug 2497] Unit tests do not cover
	Bio.Blast.NCBIWWW.qblast()
In-Reply-To: <bug-2497-42@http.bugzilla.open-bio.org/>
Message-ID: <200805091124.m49BOmUD023507@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2497


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-09 07:24 EST -------
Created an attachment (id=919)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=919&action=view)
Addition unit test

This is a simple unit test which calls qblast() twice, once using blastp and
once using blastn.

The XML results are then parsed, and the test checks that a few pre-defined
expected matches are found.  There is minimal output to the console/output
file as I do not want minor details like the precise number of hits to be
reported (anticipating these to fluctuate as the databases grow).
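
The idea of checking for a few known matches while ignoring exact hit counts
can be sketched offline as follows.  This is not the attached
test_NCBIWWW_online.py: missing_expected is a hypothetical helper, and the
hit titles are made-up stand-ins for parsed online results.

```python
import unittest


def missing_expected(found_titles, expected_fragments):
    """Return the expected fragments that match none of the hit titles."""
    return [frag for frag in expected_fragments
            if not any(frag in title for title in found_titles)]


class ExpectedHitsTest(unittest.TestCase):
    # Canned titles stand in for parsed online results (invented examples).
    found = [
        "gi|0000001| hypothetical actin related protein [Example species A]",
        "gi|0000002| unrelated hypothetical protein [Example species B]",
    ]

    def test_expected_matches_present(self):
        # Passes however many other hits the database returns.
        self.assertEqual(missing_expected(self.found, ["actin related protein"]), [])

    def test_missing_match_is_reported(self):
        self.assertEqual(missing_expected(self.found, ["kinase"]), ["kinase"])


# Run the two tests quietly, without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExpectedHitsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the assertion is about presence only, growth of the database adds
hits without changing the outcome.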



From quantrum75 at yahoo.com  Fri May  9 09:37:05 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Fri, 9 May 2008 06:37:05 -0700 (PDT)
Subject: [Biopython-dev] Anyone needs help?
In-Reply-To: <mailman.7167.1210332098.2995.biopython-dev@lists.open-bio.org>
Message-ID: <686395.82650.qm@web31404.mail.mud.yahoo.com>

Hi there,
I am a newbie who is interested in contributing. I was wondering if anyone needed any help with a project? I have tried contributing at a few places before, and the problem I ran into was that the requirements were too long and unfocused, so nothing came of it in the end.
What I am looking for is,

1) Something small to start off with.
2) Something I can complete within a short period of time (focused work of a day or two) and reach a definite conclusion.
3) No work is too small for me.
4) I'd be willing to do any kind of grunt work and would be glad to help with documentation etc.
5) Ideally, it would be something like reviewing some documentation and correcting it, or writing some documentation for a function or whatever for someone who needs to do it but just does not have the time to do it.
6) The kind of work I like to do is work that can be completed.

If anyone has such a job in mind, let me know.
Thanks for your time.
Sincerely
Regards
Rama



From biopython at maubp.freeserve.co.uk  Fri May  9 11:33:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 May 2008 16:33:08 +0100
Subject: [Biopython-dev] Anyone needs help?
In-Reply-To: <686395.82650.qm@web31404.mail.mud.yahoo.com>
References: <mailman.7167.1210332098.2995.biopython-dev@lists.open-bio.org>
	<686395.82650.qm@web31404.mail.mud.yahoo.com>
Message-ID: <320fb6e00805090833w6977bb3fr6ca32d70cb2887ea@mail.gmail.com>

On Fri, May 9, 2008 at 2:37 PM, Rama wrote:
> Hi there,
> I am newbie who is interested in contributing. I was wondering if anyone needed any help with a project?

Hello Rama.

What is your background? Do you know anything about bioinformatics for
example?  Also how experienced are you with python, and have you ever
worked with the tools diff, patch and CVS?

> I have tried contributing at a few places before and the problem I ran into was that it was too long
> and unfocused requirements and nothing came of it in the end. What I am looking for is,
> ...
> If anyone has such a job in mind, let me know.

I would suggest you have a go at Bug 2446, which is small and
shouldn't be too complicated.   The bug reporter Dave Thompson has
been kind enough to provide a few test cases and example code to
demonstrate the problem.

http://bugzilla.open-bio.org/show_bug.cgi?id=2446

Could you try modifying the Ace parser to just ignore these comment
sections?  The file you need to look at is Bio/Sequencing/Ace.py

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Sequencing/Ace.py?cvsroot=biopython

As you can see from the CVS history, this code hasn't changed since
our latest release of Biopython 1.45, so you could work from that if
it's easier than learning about CVS too.   If you can get this to work,
then prepare a patch file against the CVS code (or Biopython 1.45) and
attach it to the bug.

Let me know what you think about trying this.

Regards,

Peter
(one of the Biopython developers)

From bugzilla-daemon at portal.open-bio.org  Fri May  9 14:20:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 14:20:12 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing
	taxon entries in lineage
In-Reply-To: <bug-2475-42@http.bugzilla.open-bio.org/>
Message-ID: <200805091820.m49IKCMh009431@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475


------- Comment #31 from mmokrejs at ribosome.natur.cuni.cz  2008-05-09 14:20 EST -------
Hi,
  I wanted to test what you have but lack some more user friendly
documentation. Specifically, I lack documentation for the class BioSeqDatabase
in BioSeqDatabase.py (attachment 915). In the method load which Eric has
modified it is not clear to me what would be fetched from NCBI Taxonomy DB. I
guess the full lineage, but still I do not know whether as a string or a list
of strings or similarly just taxids?

  The Loader.py (attachment 914) has a scary function called remove()
and I would like to see a more elaborate explanation of what it really does.
Imagine I have two subspecies of the same species in the database and want
to delete the first one. Will it zap the parents common to both
of them? I wish not. ;-)

Also, I am a bit surprised that _get_taxon_id() would actually modify a local
database. Could there be another name, or could it be split into two functions,
one doing the search over the local db (and optionally fetching data via
internet) and a second one modifying the local db?

And, shouldn't the 'if self.fetch_NCBI_taxonomy' have a corresponding elif for
the second attempt and the third one? It is a bit too long to read. ;-)



From bugzilla-daemon at portal.open-bio.org  Mon May 12 14:40:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 May 2008 14:40:34 -0400
Subject: [Biopython-dev] [Bug 2499] New: Bio.Blast.NCBIXML cannot handle XML
	without date in BlastOutput_version
Message-ID: <bug-2499-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499

           Summary: Bio.Blast.NCBIXML cannot handle XML without date in
                    BlastOutput_version
           Product: Biopython
           Version: 1.44
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: n.j.loman at bham.ac.uk


I got the following XML file directly from the NCBI website.

<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.2.18+</BlastOutput_version>
  <BlastOutput_reference>Altschul, Stephen F., Thomas L. Madden, Alejandro A.
Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
(1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>env_nr</BlastOutput_db>
...

This output raises an exception when put through NCBIXML.parse() due to the
absence of a date after the string BLASTP 2.2.18+

The following diff sorts it out:

--- /home/nick/biopython/biopython-1.44/Bio/Blast/NCBIXML.py    2007-07-27 21:34:07.000000000 +0100
+++ NCBIXML.py  2008-05-12 18:01:36.000000000 +0100
@@ -212,8 +212,10 @@

         Save this to put on each blast record object
         """
-        self._header.version = self._value.split()[1]
-        self._header.date = self._value.split()[2][1:-1]
+        s = self._value.split()
+        self._header.version = s[1]
+        if len(s) > 2:
+           self._header.date = s[2][1:-1]

     def _end_BlastOutput_reference(self):
         """a reference to the article describing the algorithm

I'm sorry, I haven't checked to see if this is fixed in 1.45.
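
The logic of the diff can be exercised on its own with a small stand-alone
function.  This is a sketch, not the NCBIXML parser: split_blast_version is a
hypothetical name, returning an empty string for a missing date is a choice
made here (the diff simply leaves the attribute untouched), and the second
sample string is an assumed example of the older dated format.

```python
def split_blast_version(value):
    """Split a BlastOutput_version string into (version, date).

    The date is an empty string when the bracketed date field is absent.
    """
    fields = value.split()
    version = fields[1]
    # Strip the surrounding brackets from the optional third field.
    date = fields[2][1:-1] if len(fields) > 2 else ""
    return version, date


print(split_blast_version("BLASTP 2.2.18+"))
print(split_blast_version("blastp 2.2.17 [Aug-26-2007]"))
```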



From bugzilla-daemon at portal.open-bio.org  Tue May 13 04:09:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 04:09:53 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML
	without date in BlastOutput_version
In-Reply-To: <bug-2499-42@http.bugzilla.open-bio.org/>
Message-ID: <200805130809.m4D89ro7003140@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-13 04:09 EST -------
Hi Nick,

This was reported earlier on the mailing list, and fixed in
Bio/Blast/NCBIXML.py revision 1.18 (at the time I didn't bother to file a bug,
maybe I should have):
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIXML.py?cvsroot=biopython

If you need the fix urgently, you can either get the whole of Biopython from
CVS and install from source, or just replace that one file, which can simply be
downloaded from ViewCVS (link above).  Your exception error will tell you where
exactly your local copy of Bio/Blast/NCBIXML.py is.

Peter



From bugzilla-daemon at portal.open-bio.org  Tue May 13 05:16:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 05:16:18 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML
	without date in BlastOutput_version
In-Reply-To: <bug-2499-42@http.bugzilla.open-bio.org/>
Message-ID: <200805130916.m4D9GIMV006160@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499


------- Comment #2 from n.j.loman at bham.ac.uk  2008-05-13 05:16 EST -------
Many thanks!



From mjldehoon at yahoo.com  Tue May 13 08:07:15 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 13 May 2008 05:07:15 -0700 (PDT)
Subject: [Biopython-dev] Reportlab requirement
Message-ID: <305778.65303.qm@web62415.mail.re1.yahoo.com>

Hi everybody,

Currently, setup.py checks if Reportlab is installed or not. If not, you get the following message:

*** Reportlab *** is either not installed or out of date.

This package is optional, which means it is only used in a few
specialized modules in Biopython.  You probably don't need this if you
are unsure.  You can ignore this requirement, and install it later if
you see ImportErrors.
You can find Reportlab at http://www.reportlab.org/downloads.html.

Do you want to continue this installation? (Y/n)  


Reportlab is only used in Bio.Graphics. Unlike e.g. Numeric, Reportlab can be installed later if needed without having to rebuild Biopython. The Biopython unit tests already skip Bio.Graphics if Reportlab is not found.

So I think it is sufficient to check for Reportlab presence only if a user tries to use Bio.Graphics. This will save us the "Do you want to continue this installation? (Y/n) " question above, which may scare off users (and I am quite tired of it myself, too)...
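
A check at first use could look roughly like the sketch below.  It is only an
illustration of the idea: require_optional is a hypothetical helper, not
existing Biopython code, and it is written for a current Python interpreter.

```python
import importlib


def require_optional(module_name, needed_for):
    """Import an optional dependency on first use, with a helpful error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            "%s is required for %s but could not be imported; "
            "install it and try again." % (module_name, needed_for)
        ) from err


# Demonstration with a standard-library module, which is always importable.
json_module = require_optional("json", "this demonstration")
```

A graphics module would call require_optional("reportlab", "Bio.Graphics")
when it is first imported, so setup.py would not need to ask anything.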

Any objections?

--Michiel.
 
       

From sdavis2 at mail.nih.gov  Tue May 13 08:34:20 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 13 May 2008 08:34:20 -0400
Subject: [Biopython-dev] Reportlab requirement
In-Reply-To: <305778.65303.qm@web62415.mail.re1.yahoo.com>
References: <305778.65303.qm@web62415.mail.re1.yahoo.com>
Message-ID: <264855a00805130534q6451e40fj427a51e4aa729b18@mail.gmail.com>

On Tue, May 13, 2008 at 8:07 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
>  Currently, setup.py checks if Reportlab is installed or not. If not, you get the following message:
>
>  *** Reportlab *** is either not installed or out of date.
>
>  This package is optional, which means it is only used in a few
>  specialized modules in Biopython.  You probably don't need this if you
>  are unsure.  You can ignore this requirement, and install it later if
>  you see ImportErrors.
>  You can find Reportlab at http://www.reportlab.org/downloads.html.
>
>  Do you want to continue this installation? (Y/n)
>
>
>  Reportlab is only used in Bio.Graphics. Unlike e.g. Numeric, Reportlab can be installed later if needed without having to rebuild Biopython. The Biopython unit tests already skip Bio.Graphics if Reportlab is not found.
>
>   So I think it is sufficient to check for Reportlab presence only if a user tries to use Bio.Graphics. This will save us the "Do you want to continue this installation? (Y/n) " question above, which may scare of users (and I am quite tired of it myself, too)...
>
>  Any objections?

I personally think it is a good idea to remove the question, yes.

Sean

From bugzilla-daemon at portal.open-bio.org  Tue May 13 12:25:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 12:25:49 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200805131625.m4DGPn3W028364@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-13 12:25 EST -------
I see some interesting parallels between the __getitem__ options for a sequence
alignment and the recent and ongoing discussions on the numpy discussion list
about the __getitem__ behaviour of matrices versus arrays.  In particular, some
participants favour returning row/column vector objects in some situations.
Also, methods to allow iteration over rows or columns have been suggested.

Here with the sequence Alignment class, we could have SeqRecords for the rows,
but Seq or strings for the columns.  Perhaps we should wait and see how the
numpy discussion turns out?

However, some of the other options discussed here on this bug are probably
worth committing soon (e.g. the __str__ and __repr__ methods)
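
To make the row/column distinction concrete, here is a toy container in which
indexing gives whole rows and a separate method gives a column as a plain
string.  It is a sketch only: ToyAlignment and get_column are hypothetical
names used for illustration, rows are simple (name, sequence) pairs rather
than SeqRecords, and nothing here is the Align.Generic API.

```python
class ToyAlignment:
    """Minimal row/column container; a stand-in, not Bio.Align.Generic."""

    def __init__(self, rows):
        # rows: list of (identifier, sequence) pairs of equal sequence length
        lengths = {len(seq) for _, seq in rows}
        if len(lengths) > 1:
            raise ValueError("all sequences must have the same length")
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        # Iteration is over rows.
        return iter(self._rows)

    def __getitem__(self, index):
        # An integer or slice index selects whole row(s).
        return self._rows[index]

    def get_column(self, col):
        # A column comes back as a plain string, one letter per row.
        return "".join(seq[col] for _, seq in self._rows)

    def __repr__(self):
        return "<ToyAlignment with %d rows>" % len(self)

    def __str__(self):
        return "\n".join("%s %s" % (seq, name) for name, seq in self._rows)


aln = ToyAlignment([("alpha", "ACG-T"), ("beta", "ACGGT"), ("gamma", "AC-GT")])
print(repr(aln))
print(aln)
```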



From bugzilla-daemon at portal.open-bio.org  Wed May 14 16:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 16:49:08 -0400
Subject: [Biopython-dev] [Bug 2500] New: should use python-numpy instead of
	python-num{eric, array}
Message-ID: <bug-2500-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2500

           Summary: should use python-numpy instead of python-
                    num{eric,array}
           Product: Biopython
           Version: 1.45
          Platform: All
               URL: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478457
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mail at philipp-benner.de


Both python-numeric and python-numarray do not see new upstream releases
anymore; the currently maintained project is python-numpy. Please convert
the package to use python-numpy instead.



From bugzilla-daemon at portal.open-bio.org  Wed May 14 20:58:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 20:58:11 -0400
Subject: [Biopython-dev] [Bug 2500] should use python-numpy instead of
	python-num{eric, array}
In-Reply-To: <bug-2500-42@http.bugzilla.open-bio.org/>
Message-ID: <200805150058.m4F0wBCO023044@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2500


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-05-14 20:58 EST -------


*** This bug has been marked as a duplicate of bug 2251 ***



From bugzilla-daemon at portal.open-bio.org  Wed May 14 20:58:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 20:58:13 -0400
Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython
In-Reply-To: <bug-2251-42@http.bugzilla.open-bio.org/>
Message-ID: <200805150058.m4F0wDfd023057@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2251


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mail at philipp-benner.de


------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp  2008-05-14 20:58 EST -------
*** Bug 2500 has been marked as a duplicate of this bug. ***



From tiagoantao at gmail.com  Thu May 15 09:04:22 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 15 May 2008 14:04:22 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
References: <6d941f120805150248j11d0c02cq39626c304c9c7e29@mail.gmail.com>
	<320fb6e00805150540s50912664r462a194261c5c8c2@mail.gmail.com>
	<320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
Message-ID: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>

Hi all,

We are trying to submit an abstract for BOSC 2008 regarding Biopython.
Below is the current version.
Comments would be very much appreciated (we are already after the deadline,
so they should come in fast ;) ).
Michiel, do you want to add anything to the "future" section?


---------------------------------------

Biopython Project Update

Tiago Antao[1], Peter Cock[2]

In this talk we present the current status of the Biopython project,
focusing on features developed since BOSC 2007 and on future plans for
the project, and present example usages of the new population genetics
module.

The latest Biopython release is 1.45 made available on 22 March 2008.
Some of the new features are:

  1. A new population genetics module including support for
coalescent simulation, selection detection and the GenePop file
format. The new module relies on existing open source external
software (e.g., the open source Simcoal2 for coalescent simulation,
which can take advantage of multiple core CPUs for computationally
intensive tasks).
  2. Improved documentation.
  3. Deprecation of many modules which were either obsolete or had
been superseded by other code.
  4. Many bug fixes, including updates for evolving file formats.

Since the Biopython 1.45 release, further work is planned to extend
the Population Genetics module (e.g., with a statistics component).  A
new sequence alignment module is also being implemented with a uniform
API for reading and writing various alignment files, based on the
approach of the Bio.SeqIO module added last year for working with
sequences.  Work to improve Biopython's BioSQL support is also
ongoing.

Time permitting, the talk will also show usage examples of the new
population genetics module. The focus will be put not only on the
population genetics side, but also on strategies to easily use all
available computational power on new multiple core computers. This is
useful for users of most scripting languages, as most language
interpreter implementations impose strict limits on multi-threaded
programming efficiency, which matters when running CPU-intensive
computational biology code. We will take this opportunity to
discuss strategies to overcome those language limitations.


Any feedback would really be much appreciated, thanks!

From biopython at maubp.freeserve.co.uk  Thu May 15 09:48:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 May 2008 14:48:26 +0100
Subject: [Biopython-dev] Bio.AlignIO for sequence alignment input/output
Message-ID: <320fb6e00805150648y42e91765oa99eab7e5e1cf8fa@mail.gmail.com>

Those of you subscribed to the CVS update feed (see
http://biopython.org/wiki/Tracking_CVS_commits and the RSS link) will
have noticed some activity in Bio.AlignIO, which I originally proposed
adding a year ago.  See also enhancement Bug 2285,
http://bugzilla.open-bio.org/show_bug.cgi?id=2285

I've been using this code on and off in my own work, and have put
together a reasonable unit test.  I've finished a first draft of a new
chapter in the tutorial describing the module (you'll need to run
pdflatex or hevea on biopython/Doc/Tutorial.tex to read this), and
started a wiki page too: http://www.biopython.org/wiki/AlignIO

The API is deliberately very close to that of Bio.SeqIO, but deals
with Alignment objects rather than SeqRecord objects.  I'm hoping for
some feedback now, even if it is as little as pointing out any typos
in the documentation.  Also additional example input files would be
good - and checking the Biopython output is understood by third party
tools.

One particular issue with the API is handling ambiguous FASTA files
which have been used to store more than one alignment (discussed in
the updated tutorial).  There is an optional argument to the
Bio.AlignIO.parse() function to specify the number of sequences
expected per alignment which covers the most typical scenarios.  I am
open to the idea of simply removing this option, which means if the
user really wants to parse one of the ambiguous files, they would have
to read in the individual sequences using Bio.SeqIO, batch them as
needed, and then create the alignments.
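
For illustration only, the "batch them as needed" step could be sketched
roughly as follows. This is a minimal sketch, not Biopython code: the
helper name batch_records is hypothetical, and plain (id, sequence) tuples
stand in for the SeqRecord objects that Bio.SeqIO.parse() would yield.

```python
def batch_records(records, seqs_per_alignment):
    """Yield lists holding `seqs_per_alignment` consecutive records each.

    Hypothetical helper: `records` can be any iterable, for example the
    SeqRecord iterator returned by Bio.SeqIO.parse(handle, "fasta").
    """
    if seqs_per_alignment < 1:
        raise ValueError("seqs_per_alignment must be at least 1")
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == seqs_per_alignment:
            yield batch
            batch = []
    if batch:
        # The file did not contain a whole number of alignments.
        raise ValueError("%d leftover record(s)" % len(batch))


# Stand-in data: two alignments of two sequences each.
records = [("a1", "ACGT"), ("b1", "AC-T"), ("a2", "GGCA"), ("b2", "GG-A")]
alignments = list(batch_records(records, 2))
```

Each yielded batch would then be turned into one alignment object by the
caller. Raising an error on leftover records is a design choice made for
this sketch; silently returning a short final batch would hide a wrong
sequence count.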

Peter

From p.j.a.cock at googlemail.com  Thu May 15 09:51:59 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 15 May 2008 14:51:59 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
References: <6d941f120805150248j11d0c02cq39626c304c9c7e29@mail.gmail.com>
	<320fb6e00805150540s50912664r462a194261c5c8c2@mail.gmail.com>
	<320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
	<6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
Message-ID: <320fb6e00805150651md383437w2233bc1419589d40@mail.gmail.com>

One little typo I should have spotted earlier:

 4. Plus many bugs were fixed, included updates for evolving file formats.

Should be:

 4. Plus many bugs were fixed, including updates for evolving file formats.

Also I didn't insert our addresses for the [1] and [2] implied footnotes.

Peter

From mjldehoon at yahoo.com  Fri May 16 23:04:54 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 16 May 2008 20:04:54 -0700 (PDT)
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
Message-ID: <89450.67823.qm@web62411.mail.re1.yahoo.com>

Dear Tiago,
Thank you for representing Biopython at BOSC!
If there's still time, I would suggest aiming the abstract (and also the
talk itself) more at the general audience, who may know very little about
Biopython or Python. So perhaps include an overview of the main modules (no
details, just to give an idea of what is covered by Biopython), the
Population Genetics module, the number of developers, the number of users,
and perhaps just mention the existence of some other big packages
(Numerical Python, matplotlib, MMTK, ...) that are relevant to science and
biology with Python. The point is that most people in the audience are not
Biopython users (yet), so for them a general introduction is more suitable.

--Michiel.



From bugzilla-daemon at portal.open-bio.org  Sat May 17 02:13:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 17 May 2008 02:13:33 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing
	taxon entries in lineage
In-Reply-To: <bug-2475-42@http.bugzilla.open-bio.org/>
Message-ID: <200805170613.m4H6DXDZ016145@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #914 is|0                           |1
           obsolete|                            |


------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp  2008-05-17 02:13 EST -------
Created an attachment (id=920)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=920&action=view)
Replacement for "Usage ... to load a SeqRecord's taxonomy"

Recently I made some changes to the Taxonomy parser in Bio.Entrez, specifically
to make it more consistent with the other parsers in Bio.Entrez. Some fields in
the XML are now accessed slightly differently. I updated Loader.py accordingly.



From mjldehoon at yahoo.com  Sun May 18 10:33:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 18 May 2008 07:33:25 -0700 (PDT)
Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files
Message-ID: <157512.3075.qm@web62408.mail.re1.yahoo.com>

Hi everybody,

In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now
installed using a specialized install_data_biopython class. For Bio.Entrez,
I am using the package_data argument to the setup function instead. Does
anybody know why the install_data_biopython class was used? If there's no
specific reason, I'd prefer to use the package_data argument instead.
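
For readers unfamiliar with package_data, a minimal sketch of the idea
follows. The package name and file pattern are placeholders chosen for
illustration, not Biopython's actual layout; package_data itself is the
standard setup() keyword that maps a package name to glob patterns
relative to that package's directory.

```python
# Hypothetical fragment of a setup.py. "example" and "data/*.txt" are
# placeholders, not the real Biopython packages or data files.
setup_kwargs = {
    "name": "example",
    "packages": ["example", "example.data_user"],
    # Package name -> glob patterns, relative to that package's directory.
    # Matching files are installed alongside the package's modules.
    "package_data": {"example.data_user": ["data/*.txt"]},
}

# In a real setup.py these would be handed to setup(), for instance:
#     from setuptools import setup
#     setup(**setup_kwargs)
```

With this approach no custom install command class is needed, since the
data files travel with the package they belong to.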

--Michiel.

       
From bugzilla-daemon at portal.open-bio.org  Mon May 19 05:30:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 05:30:59 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing
	taxon entries in lineage
In-Reply-To: <bug-2475-42@http.bugzilla.open-bio.org/>
Message-ID: <200805190930.m4J9UxLu016813@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475


------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-19 05:30 EST -------
This bug is also linked to Bug 2494 (currently titled "_retrieve_taxon in
BioSQL.py needs urgent optimization") which is about not using the left/right
values when retrieving data from the database.

This is important because changes made in this bug (i.e. Bug 2475) may leave
the left/right values NULL when writing new lineages.

Also, in reply to comment 31, all of the other _get_...() methods of the
Loader class can also add things to the database (e.g. qualifier keys).  Once
you know this, the fact that _get_taxon_id() does this too isn't a shock.
Also, yes, the _get_taxon_id() function is getting far too long, and should
probably be restructured as part of this bug.



From tiagoantao at gmail.com  Mon May 19 08:09:57 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 19 May 2008 13:09:57 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <89450.67823.qm@web62411.mail.re1.yahoo.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
Message-ID: <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>

On Sat, May 17, 2008 at 4:04 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> The point is that most people in the audience are not Biopython users (yet),
> so for them a general introduction is more suitable.

Actually this issue is of major concern to me... Does anybody have a
feel for what audience will be there? I think it is important to
tune the message to the audience. I was actually speculating that
people would know about Biopython. But if that is not the case, as you
imply, then maybe we need something that makes Biopython more competitive
for people who might be deciding which system (language and
libraries) might be the best approach...

From p.j.a.cock at googlemail.com  Mon May 19 08:26:31 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 19 May 2008 13:26:31 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
	<6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
Message-ID: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>

>> The point is that most people in the audience are not Biopython users (yet),
>> so for them a general introduction is more suitable.
>
> Actually this issue is of major concern to me... Does anybody have a
> feel for what audience will be there? I think it is important to
> tune the message to the audience. I was actually speculating that
> people would know about Biopython. But if that is not the case, as you
> imply, then maybe we need something that makes Biopython more competitive
> for people who might be deciding which system (language and
> libraries) might be the best approach...

Perhaps I should have given you a broader introduction to BOSC itself.
 There will probably be talks from BioPerl, BioJava and BioRuby in the
same session, and I would expect almost all the audience to be
familiar with at least one of these projects.  However, they may or
may not use python, and I would expect that the majority will not be
Biopython users.  At least, that was my impression last year at BOSC
2007.  Reading over the talk titles/abstracts from last year should
give you a feel for the sort of people there presenting work outside
the Bio* projects.  In terms of general impressions, I felt most of
the attendees actually did some hands on coding.

So yes, as Michiel says, perhaps the current text isn't general
enough.  This is a regular opportunity to try to raise awareness of the
project. Although I personally wouldn't give a "hard sell", we should
try to give a general overview of Biopython's capabilities.

Peter

From sbassi at gmail.com  Mon May 19 08:36:15 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 19 May 2008 09:36:15 -0300
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
	<6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
	<320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
Message-ID: <b43bf2080805190536q546a01c1s2feb0f9ecb386a00@mail.gmail.com>

On Mon, May 19, 2008 at 9:26 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
....
> project, although I personally wouldn't give a "hard sell", we should
> try to give a general overview of Biopython's capabilities.

This work may give some ideas about introducing Biopython:
http://openwetware.org/wiki/Julius_B._Lucks/Projects/Python_All_A_Scientist_Needs

From tiagoantao at gmail.com  Mon May 19 08:38:34 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 19 May 2008 13:38:34 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
	<6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
	<320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
Message-ID: <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com>

In order to address this I am thinking of changing the starting
paragraph of the "paper" along the following lines:

In this talk we present the current status of the Biopython project.
We start by giving a short overview of Biopython - presenting existing
functionality - and useful software libraries for computational
biology in the Python development 'ecology' (from plotting libraries
capable of producing publication quality figures to numerical
libraries, among others). We then focus on features developed since
BOSC 2007, future plans for the project and present example usages of
the new population genetics module.


I think changing the abstract along these lines might also be good.

I think I will target most of the presentation to the idea that the
Python ecology of software development is really good (e.g. putting
one slide on matplotlib with code and result, to show how concise and
simple code can produce nice results). "Selling" Biopython in the
whole Python context.



-- 
http://www.tiago.org

From tiagoantao at gmail.com  Mon May 19 08:49:29 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 19 May 2008 13:49:29 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
	<6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
	<320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
	<6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com>
Message-ID: <6d941f120805190549u773310aj5df318952eca5e52@mail.gmail.com>

By the way, the suggested abstract proposal:

Introduction to and news from the Biopython project presenting both
existing modules and current developments including a new Population
Genetics module and XML parsers for the NCBI's Entrez web interface.

An overview of the existing Python software ecology will also be
presented in relation to computational biology. Libraries for,
among others, plotting, numerical analysis and molecular modeling will
be presented in connection with Biopython and from the point of view of
having a complete platform to do research in computational biology.

Biopython is freely available on http://www.biopython.org under a
liberal "MIT style" open source license,
http://www.biopython.org/DIST/LICENSE




-- 
http://www.tiago.org


From bugzilla-daemon at portal.open-bio.org  Mon May 19 09:46:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 09:46:22 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805191346.m4JDkMMf028474@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|blocker                     |major


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-19 09:46 EST -------
I see from comment 11 that some nasty quote escaping may be needed (which could
be an NCBI bug).

Have you been able to try using relative paths at the command line (avoiding
spaces ideally)?

Unfortunately my Windows machine is currently without internet access, which is
one reason why I haven't made time to sit down and explore this issue.

P.S. I don't think this is a critical bug in Biopython, although I do take your
point that in your setup this is a big issue.  Downgrading this to severity
"major".



From bugzilla-daemon at portal.open-bio.org  Mon May 19 17:03:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 17:03:44 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805192103.m4JL3iSk021133@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #13 from drpatnaik at yahoo.com  2008-05-19 17:03 EST -------
To get Biopython to call BLAST, this works:
  1. my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"'

Variations like these do not work:
  2. "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"
  3. r"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"

The error being:
  'C:/Documents' is not recognized as an internal or external command, operable
program or batch file.

With my_blast_exe set to the 1st value constant, and trying different
my_blast_db values, BLAST reports:
  [NULL_Caption] ERROR: Arguments must start with '-' (the offending argument
#5 was: 'and') /* or 'and\' or 'and\\' */

The values tried for my_blast_db are:
  4. 'C:/Documents and Settings/patnaik/My Documents/blast/bin/mine'
  5. 'C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine'
  6. 'C:/Documents\\ and\ Settings/patnaik/My\\ Documents/blast/bin/mine'

  7. "C:/Documents and Settings/patnaik/My Documents/blast/bin/mine"
  8. "C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine"
  9. "C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine"

  10. r'C:/Documents and Settings/patnaik/My Documents/blast/bin/mine'
  11. r'C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine'
  12. r'C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine'

  13. r"C:/Documents and Settings/patnaik/My Documents/blast/bin/mine"
  14. r"C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine"
  15. r"C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine"

But a different error ...:
  'C:/Documents' is not recognized as an internal or external command, operable
program or batch file.

... is shown with these values:

  16. r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/mine"'
  17. r'"C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine"'
  18. r'"C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine"'

That same error is also seen when I try these variations of the value that
works in command-line BLAST (comment #10 above):

  19. r'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'
  20. r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""'
  20. "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""
  21. r"\"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"\""
  22. r"'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'"

Doesn't this suggest that Biopython is not passing the my_blast_db value
properly? 



From bugzilla-daemon at portal.open-bio.org  Mon May 19 17:36:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 17:36:42 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805192136.m4JLag7h022387@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #14 from drpatnaik at yahoo.com  2008-05-19 17:36 EST -------
(continuing comment #13)

  23. r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine"'
  24. '"C:\\Documents and Settings\\patnaik\\My Documents\\blast\\bin\\mine"'
  25. '\\"C:\\Documents and Settings\\patnaik\\My
Documents\\blast\\bin\\mine\\"'
  26. r"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""
  27. r"'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'"



From bugzilla-daemon at portal.open-bio.org  Mon May 19 17:47:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 17:47:00 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805192147.m4JLl0HQ022723@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #15 from drpatnaik at yahoo.com  2008-05-19 17:47 EST -------
(In reply to comment #13)
> To get BioPython call BLAST, this works:
>   1. my_blast_exe = r'"C:/Documents and Settings/patnaik/My
> Documents/blast/bin/blastall.exe"'

I forgot to add that I had to comment out the os.path.exists check in
NCBIStandalone.py to get to that step. Equivalently, with this script I get the
'does not exist' message:

   import os
   my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"'
   if not os.path.exists(my_blast_exe):
     print 'cannot find my_blast_exe'
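
The check fails because os.path.exists treats the embedded double quotes
as part of the file name. A minimal sketch of one way around this, under
stated assumptions: the helper strip_outer_quotes is hypothetical (not
part of Biopython), a temporary directory stands in for the Windows path,
and the code is written for a current Python 3 rather than the Python 2
used in this report.

```python
import os
import tempfile


def strip_outer_quotes(path):
    """Remove one pair of surrounding double quotes, if present."""
    if len(path) >= 2 and path[0] == '"' and path[-1] == '"':
        return path[1:-1]
    return path


# Create a real file under a directory whose name contains a space.
base = tempfile.mkdtemp()
folder = os.path.join(base, "My Documents")
os.mkdir(folder)
exe = os.path.join(folder, "blastall.exe")
open(exe, "w").close()

quoted = '"%s"' % exe
# The quotes are taken literally, so no such file is found.
quoted_found = os.path.exists(quoted)
# Checking the bare path succeeds; quoting can be added later, only
# when the command line is assembled.
unquoted_found = os.path.exists(strip_outer_quotes(quoted))
```

The idea is to keep the stored path free of quotes and add them only at
the point where a shell command string is built.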



From bugzilla-daemon at portal.open-bio.org  Tue May 20 12:31:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 20 May 2008 12:31:41 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
In-Reply-To: <bug-2496-42@http.bugzilla.open-bio.org/>
Message-ID: <200805201631.m4KGVfF8016867@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-20 12:31 EST -------
Follow up discussion on the mailing list:
http://lists.open-bio.org/pipermail/biopython/2008-May/004231.html

Katie wrote:
> I asked NCBI about this, and they (eventually) replied that it's "not
> officially supported."  I have been unable to figure out how to get it to
> return iterations after the first one.

I'm going to close this bug as "invalid" unless the NCBI do make a public API
for PSI-BLAST.  It looks like the only solution for now would be to install the
standalone blast tools...



From bugzilla-daemon at portal.open-bio.org  Tue May 20 22:45:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 20 May 2008 22:45:58 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805210245.m4L2jwgM013784@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #16 from drpatnaik at yahoo.com  2008-05-20 22:45 EST -------
Similar to what I mentioned in comment #10 this BLAST command-line code works:

(1)   "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"
-p blastn -d "\"C:\Documents and Settings\patnaik\My
Documents\blast\bin\mine\"" -i "C:\Documents and Settings\patnaik\My
Documents\blast\bin\hairpin" -m 7

Now I've been trying to see the system call that popen3 makes in line 1662 of
NCBIStandalone.py by putting this line of code before the call to
os.popen3(" ".join([blastcmd] + params)):

   print " ".join([blastcmd] + params)

(as reported in comment #15, I do have to first disable the os.path.exists)

Using these values in my test script:
   my_blast_db =r'"\"C:\Documents and Settings\patnaik\My
Documents\blast\bin\mine\""'
   my_blast_file =r'"C:\Documents and Settings\patnaik\My
Documents\blast\bin\hairpin"'
   my_blast_exe =r'"C:/Documents and Settings/patnaik/My
Documents/blast/bin/blastall.exe"'

I get a print command result that is identical to the working BLAST
command-line code (1).

   "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p
blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i
"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7

But BLAST doesn't get called and the error reported is:

   'C:/Documents' is not recognized as ...

Finally I tried replacing the code inside the os.popen3 of NCBIStandalone.py
with the string (1):

   w, r, e = os.popen3(r'"C:/Documents and Settings/patnaik/My
Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and
Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and
Settings\patnaik\My Documents\blast\bin\hairpin" -m 7')

And I get the same error:

   'C:/Documents' is not recognized as ...

With a non-Biopython-dependent script, I get the same error (irrespective of
the quote combinations I tried):

   import os
   w, r, e = os.popen3(r'"C:/Documents and Settings/patnaik/My
Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and
Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and
Settings\patnaik\My Documents\blast\bin\hairpin" -m 7')
   print e.read()

-------------------------------------------------------------------

FINAL THOUGHTS

I think I have to give up on this.

There seem to be two incurable issues, unlikely to be Biopython-specific:
os.path.exists (see comment #15) and os.popen3.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed May 21 04:34:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 04:34:52 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805210834.m4L8YqVL004607@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-21 04:34 EST -------
The os.path.exists(...) check in Biopython should be easy to fix, probably by
the user specifying the exe name without quotes and biopython adding the quotes
when building the command line.
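
A minimal sketch of that idea, assuming the subprocess module (Python 2.4 or
later; shown here in Python 3 syntax) is acceptable in place of os.popen3.
Paths are passed unquoted as a list, so os.path.exists sees the raw path and
no shell quoting is involved.  The helper name build_blast_command is
hypothetical and not part of Biopython; subprocess.list2cmdline is only used
to show the quoting that would be applied on Windows.

```python
import os
import subprocess


def build_blast_command(blast_exe, program, database, infile, align_view=7):
    """Build an argument list for blastall from unquoted values.

    Hypothetical helper for illustration; not a Biopython function.
    """
    return [blast_exe, "-p", program, "-d", database,
            "-i", infile, "-m", str(align_view)]


# Example values modelled on the paths in this bug report (no quotes added).
blast_exe = r"C:\Documents and Settings\patnaik\My Documents\blast\bin\blastall.exe"
cmd = build_blast_command(
    blast_exe,
    "blastn",
    r"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine",
    r"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin",
)

# The existence check sees the plain path, without any quote characters.
exe_found = os.path.exists(blast_exe)

# Show the quoting subprocess applies when it builds a Windows command line.
command_line = subprocess.list2cmdline(cmd)
print(command_line)

# To actually run BLAST (only sensible if exe_found is true):
# child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# out, err = child.communicate()
```

Because the arguments stay in a list, the caller never has to embed quote
characters in the path strings, which is what tripped up the existence check.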

For specifying the NCBI database locations, have you set the database folder
using NCBI.ini yet?  I'm not sure if it will work if the INI file is in the
BLAST directory as the NCBI documentation says it should go in the Windows
directory (which you don't have write access to).  Perhaps anywhere on the path
will work.  See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html

There is also the option of using relative paths...

You might get more success talking to the machine administrator and asking them
to install BLAST for you?

The good news is my home internet connection is up and running, so I may be
able to do a little investigation on this issue now (time permitting).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Wed May 21 05:21:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 21 May 2008 10:21:51 +0100
Subject: [Biopython-dev] Next release?
Message-ID: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>

>From the discussion list, quite a few people have suffered from the
NCBI tweaking the online Blast XML format with 2.2.18+ (bug 2499), so
it would be nice to get a new release out soon to address this.  See
http://bugzilla.open-bio.org/show_bug.cgi?id=2499

How do the other modules stand at the moment?

Bio.PopGen (Tiago). Is this currently stable, or are you in the middle
of adding more features?

Bio.Entrez (Michiel). I see you've been very busy with the new
simplified XML parsers (see bug 2488).  This looks like a big
improvement on the rather repetitive coding needed in the first draft.
 Are you still actively making further refinements?  How many Entrez
XML file formats are still needed?
http://bugzilla.open-bio.org/show_bug.cgi?id=2488

Bio.AlignIO - this is new, but has a reasonable amount of
documentation and a small unit test (see bug 2285).  If we did do a
release soon, it could be announced as "in beta", and subject to
change, but feedback welcomed.
http://bugzilla.open-bio.org/show_bug.cgi?id=2285

In terms of the unit tests, I haven't run them on Windows recently
(internet access issues, hopefully resolved now), but on Linux things
look fine.

Peter

From mjldehoon at yahoo.com  Wed May 21 05:40:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 21 May 2008 02:40:25 -0700 (PDT)
Subject: [Biopython-dev] Next release?
In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
Message-ID: <928585.24226.qm@web62401.mail.re1.yahoo.com>

Peter <biopython at maubp.freeserve.co.uk> wrote:
> Bio.Entrez (Michiel). I see you've been very busy with the new
> simplified XML parsers (see bug 2488).  This looks like a big
> improvement on the rather repetitive coding needed in the first draft.
>  Are you still actively making further refinements?  How many Entrez
> XML file formats are still needed?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2488

I am still making refinements. I am using this module a lot for my own work, and I have a lot of changes that are not in CVS yet. The final result should be much simpler than what is in CVS now. In particular, we won't have to write a Python module for each DTD, but can let Python figure out the DTD for itself.
Once this is finished (hopefully soon), I'd be happy to make a new release.

--Michiel.

       
From bugzilla-daemon at portal.open-bio.org  Wed May 21 06:48:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 06:48:40 -0400
Subject: [Biopython-dev] [Bug 2501] New: Minor erratas in module
	Bio.SeqRecord
Message-ID: <bug-2501-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2501

           Summary: Minor erratas in module Bio.SeqRecord
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: xbello at gmail.com


line 32: description - Seqeuence description, optional (string) 
line 63: if self.description : lines.append("Desription: %s" %
self.description)

Seqeuence instead of Sequence
Desription instead of Description


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed May 21 07:28:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 07:28:33 -0400
Subject: [Biopython-dev] [Bug 2501] Minor erratas in module Bio.SeqRecord
In-Reply-To: <bug-2501-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211128.m4LBSX99014512@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2501


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-21 07:28 EST -------
Thanks for pointing those out - fixed in CVS revision 1.16


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From tiagoantao at gmail.com  Wed May 21 07:41:15 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 21 May 2008 12:41:15 +0100
Subject: [Biopython-dev] Next release?
In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
References: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
Message-ID: <6d941f120805210441w4f3fc3d7m42ee5531dca127df@mail.gmail.com>

On Wed, May 21, 2008 at 10:21 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Bio.PopGen (Tiago). Is this currently stable, or are you in the middle
> of adding more features?

Long story: I will only add more after the move to SVN. Actually the most
important part is going to be added next, but I am waiting for SVN
(any news on that front?). The statistics part that I will be
committing is the core of the module...

Short story: Don't worry about me if you are doing a release in the
next couple of weeks...

From bugzilla-daemon at portal.open-bio.org  Wed May 21 07:51:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 07:51:17 -0400
Subject: [Biopython-dev] [Bug 2502] New: PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
Message-ID: <bug-2502-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

           Summary: PSIBlastParser fails with blastpgp 2.2.18 though works
                    with blastpgp 2.2.15
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: ibdeno at gmail.com


When parsing a PSI-Blast result from blastpgp version 2.2.18 I get this error:

Traceback (most recent call last):
  File "./lpbl.py", line 23, in <module>
    b_record = b_parser.parse(blast_out)
  File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 760,
in parse
    self._scanner.feed(handle, self._consumer)
  File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 98,
in feed
    self._scan_header(uhandle, consumer)
  File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 208,
in _scan_header
    raise ValueError("Invalid header?")
ValueError: Invalid header?

The same script and same input just works with blastpgp 2.2.15

I will attach script and input file later.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed May 21 07:56:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 07:56:53 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211156.m4LBurSt016108@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #1 from ibdeno at gmail.com  2008-05-21 07:56 EST -------
Created an attachment (id=921)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=921&action=view)
Contains a script and an example sequence to reproduce the bug

Change in the script the location of the blast command and of the database to
be used.

Run it as:

./lpbl.py hsTXN.prot.fasta 3

The second argument is the number of iterations for blastpgp


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed May 21 09:05:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 09:05:13 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211305.m4LD5DhV020562@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-21 09:05 EST -------
Miguel - could you also attach the XML output from blastpgp 2.2.15 and 2.2.18
please?

e.g.  Something like this if you want to do it via Biopython:

blast_out, error_info = NCBIStandalone.blastpgp(
        blastcmd='/opt/Bio/blast-2.2.15/bin/blastpgp',
        database='/opt/databases/BlastDB/nrdb100ncbi',
        infile=file,
        npasses=passes,
        align_view='0',
        matrix_outfile=file + '.pssm')

handle = open("blastpgp_2.2.15.xml","w")
handle.write(blast_out.read())
handle.close()

Thanks, Peter.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed May 21 10:44:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 10:44:41 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211444.m4LEifII025392@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #3 from ibdeno at gmail.com  2008-05-21 10:44 EST -------
Created an attachment (id=922)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=922&action=view)
Plain text and XML outputs from blastpgp

The names should be self-explanatory.
The log files were produced with the appropriate blastpgp version using the
command line:

blastpgp -i hsTXN.prot.fasta -d /drives/databases/BlastDB/nrdb100ncbi -j 1 -m
[0,7]

m = 0 is plain text (as in the original submitted bug)
m = 7 is XML


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed May 21 13:21:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 13:21:27 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211721.m4LHLRX1003810@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #18 from drpatnaik at yahoo.com  2008-05-21 13:21 EST -------
The BLAST database folder being inside blast/bin seems to be fine as
command-line BLAST does work. I haven't tried relative paths. It should work,
as should using an external drive that can provide for space-less paths. But
the issue of spaces in paths on Windows remains. I thank you for your
suggestions and efforts looking into it.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jblanca at btc.upv.es  Thu May 22 03:30:52 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Thu, 22 May 2008 09:30:52 +0200
Subject: [Biopython-dev] sequence class proposal
Message-ID: <200805220930.53004.jblanca@btc.upv.es>

Dear Biopython developers,
I've been using Python and Biopython for some time now and I would like to
talk with you about the sequence classes in Biopython. I have had some issues
using the SeqRecord and Alignment classes, and I have been discussing and
implementing with two students (Victor Sanchez and Pablo Martinez) a proposal
for a new sequence class. We would like to present this implementation as a
contribution to the discussion about the design of the sequence classes in
Biopython, and we're eager to receive your comments.

The first problem that I found with SeqRecord is the lack of support for
qualities. It is also difficult to implement this quality support in a
class derived from SeqRecord, because the current SeqRecord API gets in
the way. Let me explain.
Currently SeqRecord has a seq property, and if you want a slice or need
to reverse or complement the sequence you would do something like:
my_seq = SeqRecord()
my_seq.seq = Seq('ACTG')
my_seq.seq[0:2]
my_seq.seq = my_seq.reverse()
If I derive a class from SeqRecord with a qual property, I don't know how to
reverse both the sequence and the quality at the same time, because the
Seq methods are called directly without SeqRecord being aware of it. To
support this we have discussed a new class with a slightly different
API and have done a preliminary implementation. We have named this new
class RichSeq, and we think that it could solve the quality problem.
With this new class it would work like this:
myseq = RichSeq(seq='ACTG', qual=[50,50,50,50])
subseq = myseq[0:2]
myseq.reverse()
myseq.complement()
RichSeq is equivalent to SeqRecord and it has the same properties as 
SeqRecord, but it adds the methods __getitem__, reverse, complement and 
reverse_complement.
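
To illustrate the idea of keeping letters and qualities in step, here is a
minimal sketch.  It is not the attached RichSeq code: the class name QualSeq
is hypothetical, and unlike the mutable class described in this mail it
returns new objects instead of changing itself in place.

```python
class QualSeq(object):
    """Toy sequence holding letters plus one quality score per letter.

    Hypothetical illustration only; this is not the RichSeq implementation.
    """

    def __init__(self, seq, qual):
        if len(seq) != len(qual):
            raise ValueError("seq and qual must have the same length")
        self.seq = seq
        self.qual = list(qual)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # Apply the same index to letters and qualities so they stay aligned.
        if isinstance(index, slice):
            return QualSeq(self.seq[index], self.qual[index])
        return self.seq[index], self.qual[index]

    def reverse(self):
        """Return a new object with letters and qualities both reversed."""
        return self[::-1]


read = QualSeq("ACTG", [10, 20, 30, 40])
sub = read[0:2]
rev = read.reverse()
print(sub.seq, sub.qual)
print(rev.seq, rev.qual)
```

Routing every slice through one __getitem__ is what keeps the two lists from
drifting apart; a complement method would only need to touch the letters.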

We have also implemented a new type of feature, which we have called
RichFeature. It is similar to SeqFeature. The main difference is that
instead of a location and a location operator, it has a BioRange (another
new class). This BioRange is inspired by/copied from the BioPerl library. The
BioRange is optional, so some RichFeature uses would be:
RichFeature(id='a_feature', type='annotation',
            feature='this is an annotation')
RichFeature(id='a_feature', type='subsequence', feature=Seq('ACTG'))
range = BioRange(start=3,end=6)
feat = RichFeature(type='annotation', range=range,
                   feature='some_annotation, e.g. an exon')
seq = RichSeq(seq='ACTGACTG', features=[feat])

With this implementation you can easily define a sequence with seq, qual and
annotations associated with a range, and afterwards reverse and complement
them trivially.
range = BioRange(start=3,end=6)
feat = RichFeature(type='annotation', range=range, feature='some_annotation')
seq = RichSeq(seq='ACTGACTG', qual=[60,60,60,60,60,60,60,60], features=[feat])
seq.reverse()

By the way, this is a mutable class, although that could be easily changed.

You can even use Seqs and RichSeq as subsequences and ask for slices or 
complements.
range = BioRange(start=1,end=2)
feat = RichFeature(type='subsequence', feature=RichSeq(seq='CT'), range=range)
seq = RichSeq(seq='ACTG', features=[feat])
seq2 = seq[1:2]
seq.reverse()
This capability makes this RichSeq an excellent candidate for a base class for 
an Alignment implementation, but we have not implemented this yet.

Attached to this mail you can find the implementation of these new classes. They
have some tests that give an idea of their intended use. We would like
to know your opinions and suggestions. Do you think that this kind of
functionality is desirable? Please let us know about any flaws, especially in
the API. I think that my work would be easier using a sequence class similar
to RichSeq, but maybe there's an easier way.
Do you think that it is a good idea to attach these classes to Bugzilla? Should
we open a new bug, or is there one already open for this sequence class debate?
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: richseq.0.0.1.tar.gz
Type: application/x-tgz
Size: 7075 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20080522/aba24889/attachment-0001.bin>

From biopython at maubp.freeserve.co.uk  Thu May 22 11:47:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 22 May 2008 16:47:58 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <200805220930.53004.jblanca@btc.upv.es>
References: <200805220930.53004.jblanca@btc.upv.es>
Message-ID: <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>

On Thu, May 22, 2008 at 8:30 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Dear Biopython developers,
> I've been using Python and Biopython for some time now and I would like to
> talk with you about the sequence classes in Biopython. I have had some issues
> using the SeqRecord and Alignment classes, and I have been discussing and
> implementing with two students (Victor Sanchez and Pablo Martinez) a proposal
> for a new sequence class. We would like to present this implementation as a
> contribution to the discussion about the design of the sequence classes in
> Biopython, and we're eager to receive your comments.

If I understood your terminology correctly, "qualities" is a list of
scores, one for each letter in the sequence.  I see this as a special
case of a more general situation where you have per-letter-annotation
information.  Examples include secondary structure or residue
coordinates of a protein sequence.  Very often for example, secondary
structures are stored in files as a simple string whose length matches
the length of the sequence.  Also related are sub-features like
domains or promoter sites which span a range of residues.

So I would agree with you that an enhanced class would be useful,
where the per letter annotations were respected in slicing, reversing
etc.  Handling sub-features when slicing is less straightforward.
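
As a rough sketch of what sub-feature handling could involve, the two
functions below map a feature range onto a slice or onto the reversed
sequence.  The function names are hypothetical, and the coordinates are
assumed to be 0-based and half-open like Python slices, which may not match
the BioRange convention in the proposal.  Truncating a partly overlapping
feature is only one possible policy; dropping it or flagging it as partial
are others.

```python
def slice_range(start, end, lo, hi):
    """Map a feature range onto the slice [lo:hi] of its parent sequence.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    Returns the (start, end) of the feature relative to the slice, or None
    if the feature lies completely outside the slice.
    """
    new_start = max(start, lo)
    new_end = min(end, hi)
    if new_start >= new_end:
        return None
    return new_start - lo, new_end - lo


def reverse_range(start, end, length):
    """Map a half-open range onto the reversed sequence of the given length."""
    return length - end, length - start


seq = "ACTGACTG"
feature = (3, 6)                                   # covers seq[3:6] == "GAC"

inside = slice_range(3, 6, 0, 8)                   # slice keeps the feature
clipped = slice_range(3, 6, 4, 8)                  # slice cuts the feature
outside = slice_range(3, 6, 6, 8)                  # slice misses the feature
flipped = reverse_range(3, 6, len(seq))            # feature after reversal
print(inside, clipped, outside, flipped)
```

After reversal the feature still covers the same letters, read backwards:
seq[::-1][2:5] is "CAG", the reverse of "GAC".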

The current SeqRecord and Seq classes separate the sequence annotation
from the sequence letters themselves, making this sort of integration
difficult.  Making the SeqRecord a direct subclass of the Seq object
has previously been suggested and would pave the way for this sort of
operation.

See Bug 2351 where some of these ideas have been floated...
http://bugzilla.open-bio.org/show_bug.cgi?id=2351

There are a lot of things that would need to be discussed - for
example how would you handle the per-sequence annotation (e.g. record
identifiers) when adding two "rich" sequences?  I've been content with
making small steps for now, with backwards compatibility always in
mind.

On another note, I'm also thinking about the need for an annotated
sequence alignment object, where there are similar concerns.

Also, have you discussed the alphabet objects?

> Do you think that it is a good idea to attach these classes to Bugzilla? Should
> we open a new bug, or is there one already open for this sequence class debate?

Your proposals do seem very broad, so have a look at Bug 2351 first,
but perhaps start a new enhancement bug, and then attach the code.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri May 23 06:06:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 06:06:44 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231006.m4NA6itj022486@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 06:06 EST -------
I've worked out that the original problem was due to trying to parse XML output
with the Bio.Blast.NCBIStandalone.PSIBlastParser (which expects the plain text
output only).  Perhaps the error message could be more helpful in this
situation?
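
One way such a check could look, as a sketch only: the helper name
require_plain_text is hypothetical and not part of Biopython, and it assumes
the handle supports seek(), which holds for files but not for pipes.

```python
import io


def require_plain_text(handle):
    """Return handle unchanged, or raise if it looks like BLAST XML output.

    Hypothetical helper, not part of Biopython.  Assumes a seekable handle.
    """
    first_line = handle.readline()
    handle.seek(0)  # rewind so the real parser still sees the whole input
    if first_line.lstrip().startswith("<?xml"):
        raise ValueError("Input looks like BLAST XML output; this parser "
                         "expects plain text (try Bio.Blast.NCBIXML instead)")
    return handle


# Plain text passes through untouched.
plain = io.StringIO("BLASTP 2.2.18\n")
checked = require_plain_text(plain)

# XML input is rejected with a message that names the likely cause.
xml = io.StringIO('<?xml version="1.0"?>\n<BlastOutput>\n')
try:
    require_plain_text(xml)
    message = None
except ValueError as err:
    message = str(err)
print(message)
```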

I'm using Biopython from CVS, but it seems to parse the plain text output from
both 2.2.15 and 2.2.18 fine.  Here is a modified version of your code which
reads from the example plain text files provided:

#!/usr/bin/env python
#
import os, re, string, operator
from Bio.Blast import NCBIStandalone
from sys import *

E_VALUE_THRESH = 0.005

nolf = re.compile('\n')
nogaps = re.compile('-')

blast_out = open("blastpgp.2.2.18.txt")
b_parser = NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(blast_out)

if b_record.converged == 1:
    print '*** Converged!!! ***'

fastaout = open('test_psiblast.fasta','w')
summout = open('test_psiblast.txt','w')

for alignment in b_record.rounds[-1].alignments:
    for hsp in alignment.hsps:
        if hsp.expect < E_VALUE_THRESH:
            ident = 100.0*hsp.identities[0]/hsp.identities[1]
            simil = 100.0*hsp.positives[0]/hsp.positives[1]
            mytitle = nolf.sub(' ',alignment.title)
            mysbjct = nogaps.sub('',hsp.sbjct)
            summout.write('****Alignment****\n')
            summout.write('sequence: %s\n' % mytitle[0:70])
            summout.write('e value: %e\n' % hsp.expect)
            summout.write('alignment length: %i\n' % hsp.positives[1])
            summout.write('identity:   %(ident)5.2f\n' % {'ident': ident} )
            summout.write('similarity: %(simil)5.2f\n' % {'simil': simil} )
            summout.write('query:   from %i to %i\n' % (hsp.query_start,
hsp.query_end))
            summout.write('subject: from %i to %i\n' % (hsp.sbjct_start,
hsp.sbjct_end))
            summout.write('%s ...\n' % hsp.query[0:75])
            summout.write('%s ...\n' % hsp.match[0:75])
            summout.write('%s ...\n' % hsp.sbjct[0:75])
            fastaout.write('%s\n%s\n' % (mytitle,mysbjct))

summout.close()
fastaout.close()
print "Done"

----------------------------------------------------------------------

So, as far as I can tell, the plain text PSI Blast parser is fine.

As I do not have the relevant databases installed, I have not tried using
Biopython to call blastpgp to run PSI-Blast.  It could be there is a problem
here with specifying the output format...

As to the XML output, you can sort of parse this with Bio.Blast.NCBIXML and I
think you get back an iterator yielding a record for each iteration.  However,
as the example you provided had only one query and one iteration, this should
be tested further.  The record is not showing all the information extracted by
the PSI-Blast text parser, which should be in the XML file.  Perhaps you would
like to investigate this?

Example code:

from Bio.Blast import NCBIStandalone, NCBIXML

for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] :
    print
    print filename
    print "="*len(filename)
    handle = open(filename)
    record = NCBIStandalone.PSIBlastParser().parse(handle)
    print record.query
    if record.converged : print '*** Converged!!! ***'
    for iter_round in record.rounds :
        print "Iteration with %i alignments" \
              % (len(iter_round.alignments))
        print "%i new sequences, %i reused" \
              %(len(iter_round.new_seqs), len(iter_round.reused_seqs))
    print "End of plain text output"

for filename in ["blastpgp.2.2.15.xml", "blastpgp.2.2.18.xml"] :
    print
    print filename
    print "="*len(filename)
    handle = open(filename)
    for iter_round in NCBIXML.parse(handle) :
        print iter_round.query
        print "Iteration with %i alignments" \
              % (len(iter_round.alignments))
    print "End of XML output"

The output:

blastpgp.2.2.15.txt
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
500 new sequences, 0 reused
End of plain text output

blastpgp.2.2.18.txt
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
500 new sequences, 0 reused
End of plain text output

blastpgp.2.2.15.xml
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 500 alignments
End of XML output

blastpgp.2.2.18.xml
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
End of XML output

Notice that NCBI must have changed the XML format in some way (500 versus 250
alignments between versions 2.2.15 and 2.2.18).  I have not explored this in
any detail.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri May 23 06:45:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 06:45:58 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231045.m4NAjwr4023917@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #5 from ibdeno at gmail.com  2008-05-23 06:45 EST -------
Hi Peter,

Thank you. The problem must be then with the blastpgp call from Biopython,
since my code was trying to obtain plain text via the align_view='0' option:

blast_out, error_info = NCBIStandalone.blastpgp(
        blastcmd='/home/mortiz/Progs/blast-2.2.15/bin/blastpgp',
        database='/drives/databases/BlastDB/nrdb100ncbi',
        infile=file,
        npasses=passes,
        align_view='0',
        matrix_outfile=file + '.nrdb100ncbi.pssm')

However, when I print the result of this call with the handler you proposed:

handle = open("blastpgp_2.2.18.txt","w")
handle.write(blast_out.read())
handle.close()

I actually get plain text!

The same blastpgp call (same binary, same database, same input file sequence,
same number of PSI-Blast iterations) still gives the error reported in the bug
with version 2.2.18, but works all right with 2.2.15.
Because the error appears within seconds, I'm wondering if the parser is not
trying to read the results before blastpgp has actually finished the iterations
(about 3 minutes in my test)
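
A sketch of one way to rule that timing hypothesis out, assuming the
subprocess module can be used to launch the program directly: communicate()
blocks until the child process has exited, so the parser only ever sees
complete output.  The helper name run_to_completion is hypothetical, and the
command below is a stand-in so the sketch runs without BLAST installed.

```python
import io
import subprocess
import sys


def run_to_completion(cmd):
    """Run cmd, wait for it to exit, and return (stdout_handle, stderr_text).

    Hypothetical helper for illustration; not a Biopython function.
    """
    child = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
    out, err = child.communicate()  # blocks until the child has exited
    return io.StringIO(out), err


# Stand-in command; a real call would pass the blastpgp executable and its
# arguments as a list instead.
cmd = [sys.executable, "-c", "print('BLASTP 2.2.18')"]
handle, errors = run_to_completion(cmd)
first_line = handle.readline()
print(first_line.strip())
```

With the call shown in this comment it may also be worth printing
error_info.read(), in case blastpgp 2.2.18 writes a message there.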

I'm without a clue...


Miguel

(In reply to comment #4)
> I've worked out that the original problem was due to trying to parse XML output
> with the Bio.Blast.NCBIStandalone.PSIBlastParser (which expects the plain text
> output only).  Perhaps the error message could be more helpful in this
> situation?
> 
> I'm using Biopython from CVS, but it seems to parse the plain text output from
> both 2.2.15 and 2.2.18 fine.  Here is a modified version of your code which
> reads from the example plain text files provided:
> 
> #!/usr/bin/env python
> #
> import os, re, string, operator
> from Bio.Blast import NCBIStandalone
> from sys import *
> 
> E_VALUE_THRESH = 0.005
> 
> nolf = re.compile('\n')
> nogaps = re.compile('-')
> 
> blast_out = open("blastpgp.2.2.18.txt")
> b_parser = NCBIStandalone.PSIBlastParser()
> b_record = b_parser.parse(blast_out)
> 
> if b_record.converged == 1:
>     print '*** Converged!!! ***'
> 
> fastaout = open('test_psiblast.fasta','w')
> summout = open('test_psiblast.txt','w')
> 
> for alignment in b_record.rounds[-1].alignments:
>     for hsp in alignment.hsps:
>         if hsp.expect < E_VALUE_THRESH:
>             ident = 100.0*hsp.identities[0]/hsp.identities[1]
>             simil = 100.0*hsp.positives[0]/hsp.positives[1]
>             mytitle = nolf.sub(' ',alignment.title)
>             mysbjct = nogaps.sub('',hsp.sbjct)
>             summout.write('****Alignment****\n')
>             summout.write('sequence: %s\n' % mytitle[0:70])
>             summout.write('e value: %e\n' % hsp.expect)
>             summout.write('alignment length: %i\n' % hsp.positives[1])
>             summout.write('identity:   %(ident)5.2f\n' % {'ident': ident} )
>             summout.write('similarity: %(simil)5.2f\n' % {'simil': simil} )
>             summout.write('query:   from %i to %i\n' % (hsp.query_start, hsp.query_end))
>             summout.write('subject: from %i to %i\n' % (hsp.sbjct_start, hsp.sbjct_end))
>             summout.write('%s ...\n' % hsp.query[0:75])
>             summout.write('%s ...\n' % hsp.match[0:75])
>             summout.write('%s ...\n' % hsp.sbjct[0:75])
>             fastaout.write('%s\n%s\n' % (mytitle,mysbjct))
> 
> summout.close()
> fastaout.close()
> print "Done"
> 
> ----------------------------------------------------------------------
> 
> So, as far as I can tell, the plain text PSI Blast parser is fine.
> 
> As I do not have the relevant databases installed, I have not tried using
> Biopython to call blastpgp to run PSI-Blast.  It could be there is a problem
> here with specifying the output format...
> 
> As to the XML output, you can sort of parse this with Bio.Blast.NCBIXML and I
> think you get back an iterator yielding a record for each iteration.  However,
> as the example you provided had only one query and one iteration, this should
> be tested further.  The record is not showing all the information extracted by
> the PSI-Blast text parser, which should be in the XML file.  Perhaps you would
> like to investigate this?
> 
> Example code:
> 
> from Bio.Blast import NCBIStandalone, NCBIXML
> 
> for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] :
>     print
>     print filename
>     print "="*len(filename)
>     handle = open(filename)
>     record = NCBIStandalone.PSIBlastParser().parse(handle)
>     print record.query
>     if record.converged : print '*** Converged!!! ***'
>     for iter_round in record.rounds :
>         print "Iteration with %i alignments" \
>               % (len(iter_round.alignments))
>         print "%i new sequences, %i reused" \
>               %(len(iter_round.new_seqs), len(iter_round.reused_seqs))
>     print "End of plain text output"
> 
> for filename in ["blastpgp.2.2.15.xml", "blastpgp.2.2.18.xml"] :
>     print
>     print filename
>     print "="*len(filename)
>     handle = open(filename)
>     for iter_round in NCBIXML.parse(handle) :
>         print iter_round.query
>         print "Iteration with %i alignments" \
>               % (len(iter_round.alignments))
>     print "End of XML output"
> 
> The output:
> 
> blastpgp.2.2.15.txt
> ===================
> gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
> Iteration with 250 alignments
> 500 new sequences, 0 reused
> End of plain text output
> 
> blastpgp.2.2.18.txt
> ===================
> gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
> Iteration with 250 alignments
> 500 new sequences, 0 reused
> End of plain text output
> 
> blastpgp.2.2.15.xml
> ===================
> gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
> Iteration with 500 alignments
> End of XML output
> 
> blastpgp.2.2.18.xml
> ===================
> gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
> Iteration with 250 alignments
> End of XML output
> 
> Notice that NCBI must have changed the XML format in some way (500 versus 250
> alignments between versions 2.2.15 and 2.2.18).  I have not explored this in
> any detail.
> 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri May 23 07:02:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 07:02:44 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231102.m4NB2iPS024763@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 07:02 EST -------
That's an interesting theory - reading directly from standard out is causing
the problem (comment 5).  One thing you could try is writing the blastpgp
output to a file, and then opening the file for reading.

I'm not sure if blastpgp has a file output option.  You could just try this:

blast_out, error_info = NCBIStandalone.blastpgp(...)
handle = open("blastpgp_2.2.18.txt","w")
handle.write(blast_out.read())
handle.close()
blast_out = open("blastpgp_2.2.18.txt")
b_parser = NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(blast_out)
...

Or, for a very crude workaround:

from time import sleep
blast_out, error_info = NCBIStandalone.blastpgp(...)
sleep(5*60) #Five minutes
b_parser = NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(blast_out)
...

If those work, it would be good evidence that your theory is right.
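
As background to the theory being tested: a read() on a child process's
standard output normally blocks until the child closes its end of the pipe, so
a handle of that kind should not hand over a truncated report just because the
program is still running. The sketch below illustrates this with the standard
subprocess module and a stand-in child process rather than blastpgp itself. It
is written for Python 3, the names CHILD_CODE and run_and_read are made up for
the illustration, and how NCBIStandalone.blastpgp creates its handles is an
open question here and may differ.

```python
import subprocess
import sys
import time

# Stand-in for a slow external program: it sleeps, then prints its "report".
CHILD_CODE = "import time; time.sleep(0.5); print('report complete')"


def run_and_read():
    """Start the child, read all of its stdout, and time the whole read."""
    start = time.time()
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # read() returns only once the child has closed its end of the pipe,
    # that is, after it has finished writing its output.
    output = child.stdout.read()
    child.stdout.close()
    child.wait()
    return output, time.time() - start


if __name__ == "__main__":
    text, elapsed = run_and_read()
    print(repr(text), "after %.2f s" % elapsed)
```

If the read returns within seconds for a job known to take minutes, the child
has most likely exited early or written to a different stream, which is a
separate problem from the parser reading too soon.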



From jblanca at btc.upv.es  Fri May 23 07:10:13 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 23 May 2008 13:10:13 +0200
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
References: <200805220930.53004.jblanca@btc.upv.es>
	<320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
Message-ID: <200805231310.13408.jblanca@btc.upv.es>

Hi:
After reading the suggestions in Bug 2351 I've coded a MutableSeq class that 
inherits from UserString.MutableString instead of using an array stored in 
self.data. It's quite easy to make it work like the MutableSeq present in 
Biopython 1.45, but there are some problems to solve.
I don't know if this class would be faster or easier to maintain than the 
MutableSeq that uses array.array. I've just done that as an experiment to 
learn something about Biopython.

Now the compatibility problems that I have found...

self.data is not an array but a str. That's not easy to solve because 
MutableString uses self.data internally. I tried to define a property, 
but MutableString is an old-style class. Maybe I don't know enough Python, 
but I don't know how to solve this type mismatch.

append() and extend() could be coded using __add__(). insert() and remove() 
are not supported by MutableSeq and would have to be coded. But I don't see 
the point of these methods in a sequence class. I think that the Seq and the 
MutableSeq API should be as similar as possible and since Seq uses __add__() 
I don't understand why MutableSeq should use append() and extend().

I also have problems with del seq[2:4:-1] and seq[2::3] = "N" * len(seq[2::3])
All the other tests for MutableSeq just work. 
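
For reference, the two extended-slice operations mentioned above behave as
follows on a plain Python list, which is what the sketch below uses as
storage. ToyMutableSeq is a hypothetical class written only for this
illustration (Python 3); it is neither the array-based MutableSeq in Biopython
nor the MutableString-based experiment described in this message.

```python
class ToyMutableSeq(object):
    """Hypothetical list-backed mutable sequence, for illustration only."""

    def __init__(self, data=""):
        self.data = list(data)  # one single-character string per position

    def __str__(self):
        return "".join(self.data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return ToyMutableSeq(self.data[index])
        return self.data[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            # With a step other than 1, list requires the replacement to
            # have exactly as many items as the slice it replaces.
            self.data[index] = list(str(value))
        else:
            self.data[index] = value

    def __delitem__(self, index):
        del self.data[index]

    def __add__(self, other):
        # Returns a new object, as Seq does; the original is unchanged.
        return ToyMutableSeq(self.data + list(str(other)))

    def append(self, letter):
        self.data.append(letter)

    def extend(self, other):
        # In-place counterpart of __add__.
        self.data.extend(str(other))


seq = ToyMutableSeq("ACGTACGTAC")
seq[2::3] = "N" * len(seq[2::3])  # overwrite positions 2, 5 and 8
del seq[2:4:-1]                   # empty slice (step -1, start < stop): removes nothing
seq.extend("GG")
print(seq)
```

The difference between __add__() and extend() in this sketch is only whether
a new object is returned or the existing one is modified in place.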

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From bugzilla-daemon at portal.open-bio.org  Fri May 23 08:38:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 08:38:28 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231238.m4NCcS0S028452@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #7 from ibdeno at gmail.com  2008-05-23 08:38 EST -------
Unfortunately the hypothesis was not correct.
If I create an intermediate file, the parser works well if the file comes from
blastpgp 2.2.15 but chokes on 2.2.18.

There is a new reference in 2.2.18 header:

Reference for compositional score matrix adjustment: Altschul, Stephen F., 
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.


which falls between the two that exist in the 2.2.15 version and makes the
header longer in terms of number of lines... Might this be the cause?


Miguel

(In reply to comment #6)



From bugzilla-daemon at portal.open-bio.org  Fri May 23 10:30:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 10:30:42 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231430.m4NEUgVL001388@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 10:30 EST -------
I'm using the CVS version of Biopython under Linux.  The main file
NCBIStandalone.py hasn't changed since Biopython 1.45, although Record.py has.

I am a little puzzled about why I can parse both the 2.2.15 and the 2.2.18
plain text examples you provided without problems, but something fails for you.
 Could you double check what happens on your machine using these two example
files from attachment 922 comment 3, and this code I gave in comment 4:

from Bio.Blast import NCBIStandalone
for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] :
    print
    print filename
    print "="*len(filename)
    handle = open(filename)
    record = NCBIStandalone.PSIBlastParser().parse(handle)
    print record.query
    if record.converged : print '*** Converged!!! ***'
    for iter_round in record.rounds :
        print "Iteration with %i alignments" \
              % (len(iter_round.alignments))
        print "%i new sequences, %i reused" \
              %(len(iter_round.new_seqs), len(iter_round.reused_seqs))
    print "End of plain text output"

If this doesn't work, please give the full stack trace - "chokes" is a little
vague.

Looking at the example files you provided in attachment 922 comment 3, they
seem to have replaced one reference with another.  This is the start of the
diff output comparing the two files:

1c1
< BLASTP 2.2.15 [Oct-15-2006]
---
> BLASTP 2.2.18 [Mar-02-2008]
10,15c10,13
< Reference for composition-based statistics:
< Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
< Sergei Shavirin, John L. Spouge, Yuri I. Wolf,  
< Eugene V. Koonin, and Stephen F. Altschul (2001), 
< "Improving the accuracy of PSI-BLAST protein database searches with 
< composition-based statistics and other refinements",  Nucleic Acids Res. 29:2994-3005.
---
> Reference for compositional score matrix adjustment: Altschul, Stephen F., 
> John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
> Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
> using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

This reference change doesn't seem to cause a problem on my machine.  I didn't
notice anything else worth commenting about.
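
A comparison like the one above can also be made from within Python using the
standard difflib module. The sketch below is only an illustration (Python 3):
the two short line lists are copied from the diff output in this comment
rather than read from the attachments, header_diff is a made-up helper name,
and difflib produces unified-format output rather than the classic "1c1" style
shown above.

```python
import difflib

# First lines that differ, copied from the diff output quoted above.
old_header = [
    "BLASTP 2.2.15 [Oct-15-2006]",
    "Reference for composition-based statistics:",
]
new_header = [
    "BLASTP 2.2.18 [Mar-02-2008]",
    "Reference for compositional score matrix adjustment: Altschul, Stephen F., ",
]


def header_diff(old_lines, new_lines):
    """Return the unified diff between two lists of header lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="blastpgp.2.2.15.txt",
            tofile="blastpgp.2.2.18.txt",
            lineterm="",  # the input lines carry no trailing newlines
        )
    )


if __name__ == "__main__":
    for line in header_diff(old_header, new_header):
        print(line)
```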



From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:02:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:02:10 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231502.m4NF2AZm003440@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #9 from ibdeno at gmail.com  2008-05-23 11:02 EST -------
Hi Peter,

Thank you for your patience and sorry not to be clear.

1. By 'choke' I meant that it produced the same error mentioned in the original
bug report.

2. I see now that my attachments (#922) were not appropriate: to save some time
I had requested no iterations from blastpgp, that is, I used '-j 1'. I can
actually parse the plain text from 2.2.18 that I had submitted in those
attachments both with your code and mine. This also explains the differences in
the headers... 

I will now submit two plain text outputs from blastpgp with 2 iterations ('-j
3'). Your code and mine can parse 2.2.15 but both fail (with the "Incorrect
header ?" error) with 2.2.18.

Sorry again...


Miguel

(In reply to comment #8)



From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:05:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:05:16 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231505.m4NF5G1k003638@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #10 from ibdeno at gmail.com  2008-05-23 11:05 EST -------
Created an attachment (id=923)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=923&action=view)
Plain text outputs from blastpgp versions 2.2.15 and 2.2.18 with 2 iterations

These files are the result of calling blastpgp with the -j 3 option.
The files sent with attachment #922 were actually not problematic; the parsing
problem with blastpgp version 2.2.18 only appears when at least one iteration
is carried out.

Perhaps due to the insertion of a new Reference in the header of the blastpgp
output?

Cheers,

Miguel



From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:16:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:16:14 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231516.m4NFGExh004121@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 11:16 EST -------
Great - I now get the same error as you :)

I'll try and have a look at this over the weekend.  Would you be able to make
matching XML files as well?  While I'm playing with blastpgp output it would be
worth checking exactly what the XML files do...

P.S. Would you object to me using any of your examples as test cases for the
Biopython unit tests?



From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:25:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:25:19 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231525.m4NFPJVY004581@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 11:25 EST -------
You are right - it is the extra reference which was causing the failure.

I've checked in a fix to Bio/Blast/NCBIStandalone.py with CVS revision 1.72

Could you update your Biopython installation to CVS and retest?  Or just
replace /home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py with
revision 1.72 from the ViewCVS website once it's updated:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython

(I haven't closed this bug yet - I'd like your confirmation that this fixes
things, and adding a new test case would probably be wise.)
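
One way to make a header scanner robust against this kind of change is to
accept any number of reference blocks instead of a fixed count. The sketch
below shows that idea only; it is not the code checked in as revision 1.72,
which is not reproduced in this thread. collect_references is a hypothetical
helper (Python 3), and the sample header is assembled from the reference text
quoted earlier in this bug, with a placeholder Query line.

```python
def collect_references(lines):
    """Group header lines into reference blocks, however many there are.

    A block starts at a line beginning with "Reference" and runs until a
    blank line or the next "Reference" line.
    """
    references = []
    current = None
    for line in lines:
        text = line.strip()
        if text.startswith("Reference"):
            current = [text]
            references.append(current)
        elif text and current is not None:
            current.append(text)
        else:
            # A blank line, or text before any reference, ends the block.
            current = None
    return [" ".join(block) for block in references]


HEADER = """BLASTP 2.2.18 [Mar-02-2008]

Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

Reference for composition-based statistics:
Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
Eugene V. Koonin, and Stephen F. Altschul (2001),

Query= placeholder
"""

if __name__ == "__main__":
    for reference in collect_references(HEADER.splitlines()):
        print(reference[:60] + "...")
```

A scanner written this way would not care whether a given blastpgp release
prints two, three or more references before the query line.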



From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:39:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:39:49 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231539.m4NFdn83005197@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #13 from ibdeno at gmail.com  2008-05-23 11:39 EST -------
Created an attachment (id=924)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=924&action=view)
XML equivalent of the files in the previous attachment (#923)



From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:41:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:41:17 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231541.m4NFfHv7005278@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #14 from ibdeno at gmail.com  2008-05-23 11:41 EST -------
I have now submitted the XML equivalent files.
Sure, please use the examples and code if you find them useful.

Cheers,


Miguel
(In reply to comment #11)



From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:42:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:42:50 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231542.m4NFgolb005350@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #15 from ibdeno at gmail.com  2008-05-23 11:42 EST -------
I will try as soon as revision 1.72 is available through the link you provided.
So far, the latest is 1.71

Thank you!


Miguel

(In reply to comment #12)



From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:56:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:56:13 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231556.m4NFuDpd005873@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #16 from ibdeno at gmail.com  2008-05-23 11:56 EST -------
Sorry, I won't be able to try your fix until next week: I don't have access to
the computer due to maintenance.

I'll let you know as soon as possible.


Miguel

(In reply to comment #15)



From bugzilla-daemon at portal.open-bio.org  Sat May 24 03:16:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 03:16:44 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805240716.m4O7GiqV007275@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #17 from ibdeno at gmail.com  2008-05-24 03:16 EST -------
I have managed to get access to a different computer and tested your revised
(1.72) version of NCBIStandalone.py

I'm glad I can confirm it does work.

I guess the best way to avoid such problems in future would be to have an
appropriate XML parser for PSI-Blast.
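
As a rough illustration of what reading PSI-Blast rounds from XML could look
like, the sketch below counts the hits in each iteration with the standard
xml.etree.ElementTree module (Python 3). It is not Bio.Blast.NCBIXML, the
helper name hits_per_iteration is made up, and the XML fragment is a tiny
hand-written stand-in that assumes the usual BLAST XML element names
(Iteration, Iteration_iter-num, Iteration_hits, Hit); real blastpgp output
carries far more detail.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for blastpgp XML output; the hit names are invented.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_def>hypothetical hit A</Hit_def></Hit>
        <Hit><Hit_def>hypothetical hit B</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_def>hypothetical hit A</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def hits_per_iteration(xml_text):
    """Return a list of (iteration number, number of hits) pairs."""
    root = ET.fromstring(xml_text)
    counts = []
    for iteration in root.iter("Iteration"):
        number = int(iteration.findtext("Iteration_iter-num"))
        hits = iteration.findall("Iteration_hits/Hit")
        counts.append((number, len(hits)))
    return counts


if __name__ == "__main__":
    for number, count in hits_per_iteration(SAMPLE_XML):
        print("Iteration %i: %i hits" % (number, count))
```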

Thank you very much for your assistance.


(In reply to comment #12)



From peter at maubp.freeserve.co.uk  Sat May 24 07:02:51 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 24 May 2008 12:02:51 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <200805231310.13408.jblanca@btc.upv.es>
References: <200805220930.53004.jblanca@btc.upv.es>
	<320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
	<200805231310.13408.jblanca@btc.upv.es>
Message-ID: <320fb6e00805240402t68094be2v4cbad1414f3e21b9@mail.gmail.com>

Hi Jose,

Your ideas are interesting for switching the MutableSeq class from an
array of char internally to a python mutable string.  However, are you
talking about the UserString.MutableString object? The documentation
suggests it's not going to be as fast as a list or a character array:
http://pydoc.org/2.5.1/UserString.html#MutableString

Note that at some point we will be moving from Numeric to numpy, so
the exact internals of the current array based MutableSeq will change
slightly then.

I will be away most of next week, so don't worry if I seem to be ignoring you ;)

Peter

From bugzilla-daemon at portal.open-bio.org  Sat May 24 08:10:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 08:10:24 -0400
Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser
In-Reply-To: <bug-2382-42@http.bugzilla.open-bio.org/>
Message-ID: <200805241210.m4OCAOol018283@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-24 08:10 EST -------
See also http://www.bioperl.org/wiki/Qual_sequence_format where there is a
similar looking file format which they call "qual" described as also being used
by PHRAP and CAP3.  e.g.

>HSMETOO 134bp
10 20 30 40 50 50 50 50 50 20 25 25 30 30 20 15 20 35 50 50 50 50 50 50 
50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 
50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 
50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 
50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 
50 50 50 20 30 20 10 10
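
Records laid out like this can be read with a few lines of Python. The sketch
below (Python 3) assumes only what the example shows: a title line starting
with ">" followed by whitespace-separated integer scores that may wrap over
several lines. parse_qual is a hypothetical helper name, not an existing
Biopython function.

```python
def parse_qual(lines):
    """Yield (title, scores) pairs from FASTA-style quality records."""
    title = None
    scores = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if title is not None:
                yield title, scores  # finish the previous record
            title = line[1:]
            scores = []
        else:
            if title is None:
                raise ValueError("scores found before any title line")
            scores.extend(int(value) for value in line.split())
    if title is not None:
        yield title, scores  # finish the last record


if __name__ == "__main__":
    example = [
        ">first_read",
        "10 20 30 40 50",
        "50 50 20",
        ">second_read",
        "30 30",
    ]
    for name, values in parse_qual(example):
        print(name, len(values), values)
```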



From bugzilla-daemon at portal.open-bio.org  Sat May 24 08:15:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 08:15:23 -0400
Subject: [Biopython-dev] [Bug 2503] New: An error when parsing NCBIWWW Blast
	output
Message-ID: <bug-2503-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2503

           Summary: An error when parsing NCBIWWW Blast output
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: hebbar.prashanth at gmail.com


Hi All,
I get the following error when I start parsing NCBIWWW blast output.
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in -toplevel-
    b_record = b_parser.parse(blast_results)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 43, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 94, in feed
    has_re=re.compile(r'<b>.?BLAST'))
  File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 335, in read_and_call_until
    line = safe_readline(uhandle)
  File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."
SyntaxError: Unexpected end of stream.
Can anyone please help me solve this? I am using Biopython version 1.44 (I
tried 1.45 too, and the same error occurs) on a Windows system.
Thank you in anticipation,
Prashanth



From bugzilla-daemon at portal.open-bio.org  Sat May 24 08:25:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 08:25:59 -0400
Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast
	output
In-Reply-To: <bug-2503-42@http.bugzilla.open-bio.org/>
Message-ID: <200805241225.m4OCPxTc018893@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2503


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-24 08:25 EST -------
We need more information.

Could you show us the example code that causes this problem?  

If you are trying to parse a file (e.g. from standalone blast), could you
attach it to this bug?

From the look of the stack trace, you are trying to parse the HTML output from
blast (?).  We do recommend parsing the XML output if possible (not the plain
text or HTML output).

Thank you,
Peter.



From mjldehoon at yahoo.com  Sat May 24 10:26:27 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 24 May 2008 07:26:27 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtils
In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
Message-ID: <893127.27535.qm@web62412.mail.re1.yahoo.com>

Dear all,

I have essentially completed the parser in Bio.Entrez. AFAICT, it works with all kinds of XML files returned by NCBI's Entrez Utilities, except for the Pubmed Central database (Pubmed itself is fine). I am using this module a lot for my own work, so it has received quite a lot of testing. As a case in point, there are 40 unit tests for the Bio.Entrez parser. These, by the way, can show you some examples of how to use this module. The documentation is now also updated.

This module may at some point replace Bio.EUtils, so if you are using this module you might want to try Bio.Entrez to see if it covers everything Bio.EUtils covers.

--Michiel

Peter <biopython at maubp.freeserve.co.uk> wrote:
> Bio.Entrez (Michiel). I see you've been very busy with the new
> simplified XML parsers (see bug 2488).  This looks like a big
> improvement on the rather repetitive coding needed in the first draft.
> Are you still actively making further refinements?  How many Entrez
> XML file formats are still needed?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2488


From mjldehoon at yahoo.com  Sat May 24 10:16:15 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 24 May 2008 07:16:15 -0700 (PDT)
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <320fb6e00805240402t68094be2v4cbad1414f3e21b9@mail.gmail.com>
Message-ID: <135625.21242.qm@web62412.mail.re1.yahoo.com>

Peter <peter at maubp.freeserve.co.uk> wrote:
> Note that at some point we will be moving from Numeric to numpy, so
> the exact internals of the current array based MutableSeq will change
> slightly then.

MutableSeq uses Python's array, not Numeric's array, so it should not be affected by moving from Numeric to numpy.

--Michiel.

       
From biopython at maubp.freeserve.co.uk  Sun May 25 06:36:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 25 May 2008 11:36:14 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <1211479809.4835b70111c71@webmail.upv.es>
References: <200805220930.53004.jblanca@btc.upv.es>
	<320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
	<1211479809.4835b70111c71@webmail.upv.es>
Message-ID: <320fb6e00805250336u251dd2buae72397aa10374b0@mail.gmail.com>

On May 22, 2008, Blanca Postigo Jose Miguel wrote:
>> If I understood your terminology correctly, "qualities" is a list of
>> scores, one for each letter in the sequence.
> You're right. I'm sorry, I used them a lot and I reserved them a special place
> in the API, my fault, I will remove it, only the sequence should have a
> relevant place in the API, the rest should be stored as features.

I've asked on the BioSQL mailing list about this sort of "per letter"
annotation.  Currently there is no mechanism to store this sort of
thing in the schema.
http://lists.open-bio.org/pipermail/biosql-l/2008-May/001269.html

However, Hilmar did point out some relevant bits of BioPerl to have a look at:

Hilmar Lapp wrote:
> In BioPerl we have Bio::Seq::SeqWithQuality and the more generic
> Bio::Seq::MetaI.

The BioPerl SeqWithQuality sounds like what you were most interested
in  Jose, although the Meta-Interface may be of relevance too.
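
To make the "per letter" idea concrete, here is a minimal sketch of a
container that keeps one score per letter and keeps the two in step when
sliced. ToySeqWithQuality is a hypothetical name for this illustration
(Python 3); it is not an existing Biopython class, not the BioPerl
implementation, and says nothing about how BioSQL might store such data.

```python
class ToySeqWithQuality(object):
    """Hypothetical container pairing letters with per-letter scores."""

    def __init__(self, letters, qualities):
        if len(letters) != len(qualities):
            raise ValueError("need exactly one quality score per letter")
        self.letters = letters
        self.qualities = list(qualities)

    def __len__(self):
        return len(self.letters)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice both parts with the same slice so they stay aligned.
            return ToySeqWithQuality(self.letters[index], self.qualities[index])
        return self.letters[index], self.qualities[index]


if __name__ == "__main__":
    read = ToySeqWithQuality("ACGT", [10, 20, 30, 40])
    trimmed = read[1:3]
    print(trimmed.letters, trimmed.qualities)
```

Storing the scores as a feature instead would need the same guarantee, namely
that any operation changing the sequence length also updates the scores.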

Peter

From biopython at maubp.freeserve.co.uk  Sun May 25 08:06:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 25 May 2008 13:06:50 +0100
Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files
In-Reply-To: <157512.3075.qm@web62408.mail.re1.yahoo.com>
References: <157512.3075.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com>

On Sun, May 18, 2008, Michiel de Hoon wrote:
> Hi everybody,
>
> In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now
> installed using a specialized install_data_biopython class. For Bio.Entrez, I am
> using the package_data argument to the setup function instead. Does anybody
> know why the install_data_biopython class was used? If there's no specific
> reason, I'd prefer to use the package_data argument instead.

I think I've found one reason not to - it doesn't seem to be supported
in Python 2.3 as shown here:

C:\TEMP\biopython_cvs\biopython_all\biopython>c:\Python23\python.exe
setup.py install
c:\Python23\lib\distutils\dist.py:227: UserWarning: Unknown distribution option:
 'package_data'
  warnings.warn(msg)
running install
...

If I'd known this earlier, I would of course have said something.  On
the other hand, I may be the only person still using Biopython with
python 2.3.
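
For what it's worth, here is a rough sketch of how setup.py could cope with
both cases. The helper name build_setup_kwargs and the "DTDs/*.dtd" pattern
are made up for illustration, and it assumes (as far as I know) that
package_data only arrived in distutils with Python 2.4:

```python
import sys


def build_setup_kwargs(version_info=sys.version_info):
    """Return keyword arguments for setup(); hypothetical helper."""
    kwargs = {
        "name": "example",  # placeholder, not the real Biopython metadata
        "packages": ["Bio", "Bio.Entrez"],
    }
    # Assumption: distutils understands package_data from Python 2.4 onwards.
    # Older interpreters only emit "Unknown distribution option".
    if tuple(version_info[:2]) >= (2, 4):
        kwargs["package_data"] = {"Bio.Entrez": ["DTDs/*.dtd"]}
    return kwargs


old_kwargs = build_setup_kwargs((2, 3, 5))
new_kwargs = build_setup_kwargs((2, 4, 0))
```

On Python 2.3 the data files would then still need a custom install_data
class, so this only avoids the warning, it does not install anything.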

Peter

From tiagoantao at gmail.com  Sun May 25 08:48:35 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sun, 25 May 2008 13:48:35 +0100
Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files
In-Reply-To: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com>
References: <157512.3075.qm@web62408.mail.re1.yahoo.com>
	<320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com>
Message-ID: <6d941f120805250548t357d6d0fwe36d5d1b39eaaa77@mail.gmail.com>

> If I'd known this earlier, I would of course have said something.  On
> the other hand, I may be the only person still using Biopython with
> python 2.3.

What about doing a survey (or a web poll on the site) on the main list
to know what python versions people are using? To have a sense of what
should be supported/deprecated...


From jblanca at btc.upv.es  Mon May 26 01:24:30 2008
From: jblanca at btc.upv.es (Blanca Postigo Jose Miguel)
Date: Mon, 26 May 2008 07:24:30 +0200
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com>
References: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com>
Message-ID: <1211779470.483a498e18e3e@webmail.upv.es>

> One of your points seemed to be that the SeqRecord couldn't have a
> __getitem__ and methods like reverse, complement, etc.  I don't see
> why it couldn't have these.  Perhaps rather than introducing a whole
> new class, enhancing the SeqRecord would be a better avenue.
My main concern with SeqRecord is that it has a Seq. If we want a slice or a
reverse we would do:
    my_seq = SeqRecord(Seq('ACTGTGAC'))
    my_seq.seq[1:5]
    my_seq.seq.reverse()
If we add residue annotations (like qualities) to SeqRecord, how could they be
reversed when we call .seq.reverse() directly? I don't know how this could
work.
    my_seq = SeqRecord(Seq('ACTG'), Qual([10,20,30,40]))
    my_seq.seq.reverse()
It would create a non-valid sequence:
    str(my_seq.seq) -> 'GTCA'
    str(my_seq.qual) -> [10,20,30,40]
One possibility is to have methods like __getitem__ in SeqRecord as well as in
Seq, so that both of these would work:
    my_seq.seq[1:3]
    my_seq[1:3]
Just for testing I have written a RichSeq that is compatible with both Seq and
SeqRecord, but that's very confusing. Does this SeqRecord HAVE a sequence or IS
it a sequence? It could work, but I feel that it is wrong, and it is easier to
explain to users that a new, improved SeqRecord has been created (RichSeq) and
that they should migrate to it.
Another problem is difficult to solve. If RichSeq is compatible with Seq, as
Michiel wants and I agree, how could it also be compatible with SeqRecord? The
parameters in their constructors are not compatible:
    SeqRecord(seq, ...)
    Seq(data, alphabet...)
I would happily improve RichSeq, but I don't know how to do it in a sane way.
What do you think?
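
To make the synchronisation problem concrete, here is a minimal sketch in
plain Python. It does not use Biopython at all; the class name QualRecord and
its methods are hypothetical, and it is only meant to show slicing and
reversing done on the container so that letters and scores cannot drift apart:

```python
class QualRecord(object):
    """Hypothetical container: a sequence string plus one score per letter."""

    def __init__(self, seq, qual):
        if len(seq) != len(qual):
            raise ValueError("need exactly one quality score per letter")
        self.seq = seq
        self.qual = list(qual)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # Always return a QualRecord, even for a single position, so that
        # rec[1] and rec[1:2] have the same type.
        if isinstance(index, int):
            if index < 0:
                index += len(self)
            if not 0 <= index < len(self):
                raise IndexError("position out of range")
            index = slice(index, index + 1)
        return QualRecord(self.seq[index], self.qual[index])

    def reverse(self):
        # Return a new record with letters and scores reversed together.
        return QualRecord(self.seq[::-1], self.qual[::-1])


rec = QualRecord("ACTG", [10, 20, 30, 40])
rev = rec.reverse()
part = rec[1:3]
```

Because the operations live on the container, there is no way to reverse the
letters and forget the scores, which is the failure shown above with
.seq.reverse().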

>
> Also, I do think we should bear in mind the BioSQL sequence
> representation, which we currently expose in a SeqRecord/Seq like way.
>  I wouldn't want to lose this / have to completely re-write the
> Biopython BioSQL code.
I would look into that.
Best regards,

Jose Blanca


>
> Peter
>
> On Sun, May 25, 2008 at 9:12 PM, Blanca Postigo Jose Miguel
> <jblanca at btc.upv.es> wrote:
> > Dear biopythonistas:
> > First of all, my apologies for the MutableSeq reimplementation. I did it
> > just for the sake of learning more about Python and Biopython, not to
> > achieve a speedier implementation. It has been a good learning exercise
> > for me, but now let's go for the meat...
> >
> > Everything that follows is just my opinion on the sequence classes. Mine
> > is not a well-informed opinion, and I would just like to show my ideas to
> > you to get some feedback and to learn from you.
> >
> > Since this sequence class remodelling is a complex topic I would like to
> > explain my ideas about it in some order. I won't go into implementation
> > details, I will just discuss the API of the classes.
> > I think that Seq and MutableSeq are pretty OK, although MutableSeq has some
> > extra methods that depend on the implementation and are not relevant for a
> > sequence class (append, insert, pop, remove). In general Seq and MutableSeq
> > should have the same API; that would make them simpler to use.
> >
> > I think that the main problem is SeqRecord. SeqRecord IS NOT a sequence, it
> > HAS a sequence; that's its main flaw. A more capable Seq class should be a
> > Seq. My proposal is to create a RichSeq that inherits from Seq and a
> > MutableRichSeq that inherits from MutableSeq. I've been doing some coding
> > and some thinking about that. I'm discussing this with you because I would
> > like to improve the design of the API of such a sequence, and I could
> > implement it. Its main design guidelines would be:
> > - Compatible with Seq or with MutableSeq. Every time that you can use a Seq
> > class you can also use a more capable RichSeq without changing anything in
> > your program.
> > - RichSeq IS a Seq, it inherits from Seq.
> > - RichSeq is similar to SeqRecord, but they aren't compatible.
> >        The SeqRecord constructor is:
> >    def __init__(self, seq, id = "<unknown id>", name = "<unknown name>",
> >                 description = "<unknown description>", dbxrefs = None,
> >                 features = None):
> >        and the RichSeq one may be:
> >    def __init__(self, seq=None, alphabet = None,
> >                 id = "<unknown id>", name = "<unknown name>",
> >                 description = "<unknown description>", dbxrefs = None,
> >                 features = None):
> >        RichSeq has a seq (or it could be called data) and an alphabet (like
> >        the Seq class) while SeqRecord has a Seq object.
> >        RichSeq would not have a .seq property.
> > - RichSeq has a __getitem__ method capable of things like RichSeq[1:2]. It
> > would also have the methods reverse, complement, etc. That's not possible
> > with SeqRecord.
> > - RichSeq should be a new-style class; what about Seq and MutableSeq?
> > - From one of Michiel's comments:
> >        1) A Seq object is basically a string, so it should behave as if it
> >        were subclassed from string.
> >        2) As a result, functions that have a sequence as an argument, but
> >        don't need the added features of a Seq object, should work with
> >        strings as well as Seq objects.
> >        4) Currently, Seq objects have an associated alphabet; SeqRecord
> >        objects have annotations, dbxrefs, a description, features, id, and
> >        name. I think a new Seq object should have both, so that we can
> >        avoid having both a Seq and a SeqRecord class. Of course, some or
> >        all of these fields can remain None. (I would add that even the seq
> >        could be None.)
> > If Biopython had a class like RichSeq I wouldn't use SeqRecord. Also, the
> > transition from using SeqRecord to RichSeq would be very easy and both
> > classes could coexist for as long as you like.
> > Also, using the features, the per-residue annotation is very easy to
> > implement. In fact I have done it already using a RichFeature class, but I
> > will discuss that in another mail.
> > RichSeq is easier to extend than SeqRecord; that's its main advantage. I
> > have pretty wild plans for a class like RichSeq. A class like
> > SeqWithQuality or the Bio::Seq::MetaI from BioPerl would be very easy to
> > derive from RichSeq. They would be just easier interfaces to the more
> > capable and general RichSeq. Even Alignment would be derived from RichSeq.
> > An Alignment IS a sequence with subsequences in it. I have also implemented
> > a prototype of that and it works quite OK with very little coding.
> > These are the more general remarks about RichSeq. What do you think? Is it
> > a good idea to go beyond SeqRecord for Biopython? Could something like
> > RichSeq be a possible way to do it?
> >
> > Now I would like to list the open discussion points regarding the sequence
> > class APIs.
> > - annotations is not in the constructor of SeqRecord. There are two
> > options: add it to the RichSeq constructor or remove it altogether. In my
> > implementation a feature can span the whole sequence length or can have a
> > range attached. In this way annotations are just a special case of
> > features. We would have to decide between dict and list for the API.
> >
> > - __getitem__ should always return a RichSeq. It's more consistent to
> > return the same type for a_seq[1:2] and a_seq[1]. If someone wants a
> > character they can do str(seq)[1].
> >
> > - no seq property in RichSeq.
> >
> > - __str__ is enough, so tostring() is not necessary; for more complex
> > representations we have __repr__. tostring() could be kept for
> > compatibility with the Seq and MutableSeq API.
> >
> > - What to do with id, name and the str annotations when a slice is
> > requested? If seq.name is 'a_sequence' should seq[1:10].name be
> > 'a_sequence' or 'a_sequence [1:10]' or ''? Same problem with __add__ and
> > __radd__. This is a problem, but one of the three alternatives should be
> > chosen and explained in the documentation. A better solution is in my
> > RichFeature class, but I won't discuss it now.
> >
> > - __iter__ iterates over the sequence as a character string.
> >
> > - __add__ and __radd__
> >
> > - .upper(), .count(), .lower()
> >
> > - .data property. I think that this is an implementation detail and it
> > should be deprecated from Seq and MutableSeq.
> >
> > Well, that's all; sorry for the long mail. I'm enjoying working on this
> > problem and learning from you.
> > Best regards,
> >
> > Jose Blanca
> >
> >
>


-- 


From bugzilla-daemon at portal.open-bio.org  Wed May 28 08:17:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 May 2008 08:17:25 -0400
Subject: [Biopython-dev] [Bug 2506] New: SELECT problems on
	_get_seqfeature_dbxref in Loader.py with postgresql
Message-ID: <bug-2506-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2506

           Summary: SELECT problems on _get_seqfeature_dbxref in Loader.py
                    with postgresql
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: andrea at biodec.com
                CC: andrea at biodec.com


Using: 
  - postgres 8.3 or less # the version is not important
  - BioSQL 1.0.0 installed on a postgresql database (on Linux) # the version is
not important
  - python-psycopg 1.1.21-14 or less
  - python-psycopg2 2.0.5.1-6 or less
  - python 2.4.4-2 # not important
  - Biopython CVS version 28/05/08,
    - Loader.py version 1.30
  - "psycopg" or "psycopg2" as BioSeqDatabase.open_database drivers

During insertion into the BioSQL database of a seq_record object derived from a
GenBank Iterator, the procedure _get_seqfeature_dbxref fails with the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/var/lib/python-support/python2.4/BioSQL/BioSeqDatabase.py", line 420,
in load
    db_loader.load_seqrecord(cur_record)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 50, in
load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 542, in
_load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 641, in
_load_seqfeature_qualifiers
    seqfeature_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 679, in
_load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 712, in
_get_seqfeature_dbxref
    result = self.adaptor.execute_and_fetch_col0(sql, (seqfeature_id,
  File "/var/lib/python-support/python2.4/BioSQL/BioSeqDatabase.py", line 295,
in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg.ProgrammingError: ERROR:  column "195" does not exist

SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id =
"195" AND dbxref_id = "207739"

The problem is that there is an error in the query format at lines 710 and 711
of Loader.py in Biopython/BioSQL:
    709    # Check for an existing record
    710    sql = r'SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref ' \
    711          r'WHERE seqfeature_id = "%s" AND dbxref_id = "%s"'
because the query has double quotes (") around the values, and
postgres interprets them as column names and not values.

Using single quotes in the query instead fixes the error:
    709    # Check for an existing record
    710    sql = r"SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref " \
    711          r"WHERE seqfeature_id = '%s' AND dbxref_id = '%s'"
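
Another option, shown here only as an illustration and not as the patch for
this bug, is to leave the placeholders unquoted and let the DB-API driver do
the quoting. The sketch below uses the standard library sqlite3 module and a
toy two-column table so that it can run anywhere; note that sqlite3 uses "?"
placeholders, while psycopg and MySQLdb use "%s":

```python
import sqlite3

# Toy stand-in for the BioSQL table; only the two columns from the query.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE seqfeature_dbxref (seqfeature_id INTEGER, dbxref_id INTEGER)"
)
cur.execute("INSERT INTO seqfeature_dbxref VALUES (?, ?)", (195, 207739))

# No quotes around the placeholders: the driver binds the values itself,
# so they can never be mistaken for column names.
sql = (
    "SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref "
    "WHERE seqfeature_id = ? AND dbxref_id = ?"
)
cur.execute(sql, (195, 207739))
rows = cur.fetchall()

cur.execute(sql, (195, 1))
no_rows = cur.fetchall()
conn.close()
```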


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mjldehoon at yahoo.com  Wed May 28 08:31:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 28 May 2008 05:31:33 -0700 (PDT)
Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files
In-Reply-To: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com>
Message-ID: <499799.68733.qm@web62408.mail.re1.yahoo.com>

That's odd ... I had tried with a Python version 2.3, and it worked there. Maybe this feature was added during the Python 2.3 cycle.
Then, I guess we need to use the  install_data_biopython class for now, and start using package_data once we stop supporting Python 2.3.

--Michiel

Peter <biopython at maubp.freeserve.co.uk> wrote: On Sun, May 18, 2008, Michiel de Hoon wrote:
> Hi everybody,
>
> In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now
> installed using a specialized install_data_biopython class. For Bio.Entrez, I am
> using the package_data argument to the setup function instead. Does anybody
> know why the install_data_biopython class was used? If there's no specific
> reason, I'd prefer to use the package_data argument instead.

I think I've found one reason not to - it doesn't seem to be supported
in Python 2.3 as shown here:

C:\TEMP\biopython_cvs\biopython_all\biopython>c:\Python23\python.exe
setup.py install
c:\Python23\lib\distutils\dist.py:227: UserWarning: Unknown distribution option:
 'package_data'
  warnings.warn(msg)
running install
...

If I'd known this earlier, I would of course have said something.  On
the other hand, I may be the only person still using Biopython with
python 2.3.

Peter


From fkauff at biologie.uni-kl.de  Thu May 29 05:20:56 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Thu, 29 May 2008 11:20:56 +0200
Subject: [Biopython-dev] CVS access and developers web site
Message-ID: <483E7578.50402@biologie.uni-kl.de>

Hi folks,

although I've been quiet for a while, I'm still doing some changes to 
the Nexus parser of biopython from time to time.... I totally lost my 
passwords to access the repository. Could someone please send me a new 
password to get write access to cvs? And I would also like to change the 
information on the biopython developers web site, as they are somewhat 
outdated.
And is this the right place to ask for such things?

Thanks!

Frank

From bugzilla-daemon at portal.open-bio.org  Thu May 29 06:47:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 May 2008 06:47:29 -0400
Subject: [Biopython-dev] [Bug 2506] SELECT problems on
	_get_seqfeature_dbxref in Loader.py with postgresql
In-Reply-To: <bug-2506-42@http.bugzilla.open-bio.org/>
Message-ID: <200805291047.m4TAlT18002239@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2506


------- Comment #1 from andrea at biodec.com  2008-05-29 06:47 EST -------
Created an attachment (id=926)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=926&action=view)
Proposed patch


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From p.j.a.cock at googlemail.com  Thu May 29 17:46:46 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 May 2008 22:46:46 +0100
Subject: [Biopython-dev] CVS access and developers web site
In-Reply-To: <483E7578.50402@biologie.uni-kl.de>
References: <483E7578.50402@biologie.uni-kl.de>
Message-ID: <320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>

Hi Frank,

I would try emailing support at helpdesk.open-bio.org using the email
address associated with your CVS username.  If you've changed email
address, and you run into problems, I expect Michiel or I could vouch
for you.

For the website, the wiki usernames are entirely separate and you
should be able to create a new account if you don't have one already.
If you want to update the tutorial, new HTML and PDF files are loaded
with each release from the version in CVS.

Peter

On Thu, May 29, 2008 at 10:20 AM, Frank Kauff <fkauff at biologie.uni-kl.de> wrote:
> Hi folks,
>
> although I've been quiet for a while, I'm still doing some changes to the
> Nexus parser of biopython from time to time.... I totally lost my passwords
> to access the repository. Could someone please send me a new password to get
> write access to cvs? And I would also like to change the information on the
> biopython developers web site, as they are somewhat outdated.
> And is this the right place to ask for such things?
>
> Thanks!
>
> Frank

From bugzilla-daemon at portal.open-bio.org  Fri May 30 07:15:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 May 2008 07:15:23 -0400
Subject: [Biopython-dev] [Bug 2506] SELECT problems on
	_get_seqfeature_dbxref in Loader.py with postgresql
In-Reply-To: <bug-2506-42@http.bugzilla.open-bio.org/>
Message-ID: <200805301115.m4UBFNE3011942@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2506


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-30 07:15 EST -------
Thanks for the report.  I've fixed this issue (method _get_seqfeature_dbxref at
line 710) and a similar one (in _get_bioentry_dbxref at line 761) in CVS
BioSQL/Loader.py revision 1.31

Note that I have only tested this with MySQL under Linux.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Fri May 30 10:17:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 30 May 2008 15:17:08 +0100
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <893127.27535.qm@web62412.mail.re1.yahoo.com>
References: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
	<893127.27535.qm@web62412.mail.re1.yahoo.com>
Message-ID: <320fb6e00805300717v60f0b153i88b5e9a8aee1744c@mail.gmail.com>

On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.

The internals of the new design look more complicated to start with,
but I can see how much more general it is than the older versions :)

Should it work starting from an empty DTDs folder - or will we ship
Biopython with most of the current files?  I've had trouble with
Biopython trying to fetch missing DTD files from the internet.  I
think the problem is the NCBI using relative URLs.  The following
quick hack seems to help in Parser.py but only in some cases (because
as listed below, the NCBI have two different base paths):

279,280c279,288
<             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<             handle = urllib.urlopen(systemId)
---
>             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>             if "/" in systemId :
>                 #Assume this is a full path, e.g.
>                 #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>                 handle = urllib.urlopen(systemId)
>             else :
>                 #Its a relative path, and I'm not sure how to best get the base path:
>                 handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)

(Also note there seem to be some tab/space issues in this file).

From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:

egquery.dtd
eSearch_020511.dtd
nlmcommon_080101.dtd
pubmed_080101.dtd
eInfo_020511.dtd
eSpell.dtd
nlmmedline_080101.dtd
taxon.dtd
eLink_020511.dtd
eSummary_041029.dtd
nlmmedlinecitation_080101.dtd
uilist.dtd
ePost_020511.dtd
nlmsharedcatcit_080101.dtd

Additionally http://www.ncbi.nlm.nih.gov/dtd/ provided some further
XML files needed for the test_Entrez.py unit test:

NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd

With all the above files, then the unit test file test_Entrez.py
doesn't give any missing DTD warnings - but still has a couple of
failures.
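
For the relative systemId problem described above, one possible approach is
to build a list of candidate URLs from both NCBI base paths and try them in
turn. This is only a sketch for a current Python (urllib.parse); the function
name candidate_dtd_urls is made up and is not part of Parser.py:

```python
from urllib.parse import urljoin, urlparse

# The two base paths mentioned in this mail.
NCBI_DTD_BASES = [
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/",
    "http://www.ncbi.nlm.nih.gov/dtd/",
]


def candidate_dtd_urls(system_id, bases=NCBI_DTD_BASES):
    """Return the URLs worth trying for a DTD systemId (hypothetical helper)."""
    if urlparse(system_id).scheme:
        # Already a full URL, so use it unchanged.
        return [system_id]
    # Relative name: resolve it against each known base path.
    return [urljoin(base, system_id) for base in bases]


relative = candidate_dtd_urls("eSpell.dtd")
absolute = candidate_dtd_urls(
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd"
)
```

The caller would still have to fetch each candidate until one succeeds; that
part is left out here.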

Peter

From bugzilla-daemon at portal.open-bio.org  Fri May 30 11:15:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 May 2008 11:15:16 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805301515.m4UFFGhJ024631@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-30 11:15 EST -------
The XML parser seems to be fine on your example output.  However, the XML
output does not appear to list/flag any difference between:

"Sequences used in model and found again"
"Sequences not found previously or not previously below threshold"

This means there is no way to populate the .new_seqs and .reused_seqs lists. 
If you care about this information, then for now using the plain text output
might be best.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed May  7 15:36:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 May 2008 11:36:43 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs
	urgent optimization
In-Reply-To: <bug-2494-42@http.bugzilla.open-bio.org/>
Message-ID: <200805071536.m47FahTU028186@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2494


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-07 11:36 EST -------
Created an attachment (id=917)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=917&action=view)
Patch to BioSQL/BioSeq.py

Hi Eric.

I've tried your script with MySQL 5.0 under Linux, and see similar example
timings, e.g.:

getTaxonSQLsimplex took 458.646 ms
getTaxonSQL took 8152.112 ms
getTaxonSQLall took 8565.304 ms
getTaxonLoop took 18.612 ms

However, your loop function doesn't return exactly the same list as the
original code.  In particular you do not exclude taxonomy lineage entries with
a rank of "no rank".  Also I didn't like the hard coded assumption about
taxon_id 1 as a top node.  What do you think of this version:

def getTaxonLoopPeter(adaptor, taxon_id):
    # Climb up the hierarchy: bottom-up approach based on the child/parent
    # link with parent_taxon_id.
    taxonomy = []
    while taxon_id:
        name, rank, parent_taxon_id = adaptor.execute_one(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id"
            " FROM taxon, taxon_name"
            " WHERE taxon.taxon_id=taxon_name.taxon_id"
            " AND taxon_name.name_class='scientific name'"
            " AND taxon.taxon_id = %s", (taxon_id,))
        if taxon_id == parent_taxon_id:
            # If the taxon table has been populated by the BioSQL script
            # load_ncbi_taxonomy.pl this is how top parent nodes are stored.
            # Personally, I would have used a NULL parent_taxon_id here.
            break
        if rank != "no rank":
            # For consistency with older versions of Biopython, we are only
            # interested in taxonomy entries with a stated rank.
            # Add this to the start of the lineage list.
            taxonomy.insert(0, name)
        taxon_id = parent_taxon_id
    return taxonomy

I'm attaching a patch to BioSQL/BioSeq.py that uses this code in place of the
current left/right dependent version.  While this does seem to be much faster
in your test script, I'm not sure how much difference this will make in normal
usage.
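
To check the loop logic without a database, the same climb can be run on a
small in-memory table. This is just a sketch: the taxon ids below are made up
for the example (they are not real NCBI ids), and the dict stands in for the
SQL query:

```python
# taxon_id -> (scientific name, node_rank, parent_taxon_id); toy data only.
TAXA = {
    1: ("root", "no rank", 1),  # top node is its own parent
    2: ("cellular organisms", "no rank", 1),
    3: ("Bacteria", "superkingdom", 2),
    4: ("Proteobacteria", "phylum", 3),
    5: ("Escherichia coli", "species", 4),
}


def lineage(taxa, taxon_id):
    """Climb from taxon_id to the top node, keeping only ranked entries."""
    names = []
    while taxon_id:
        name, rank, parent_taxon_id = taxa[taxon_id]
        if taxon_id == parent_taxon_id:
            break  # reached a self-referencing top node
        if rank != "no rank":
            names.insert(0, name)
        taxon_id = parent_taxon_id
    return names


result = lineage(TAXA, 5)
```

With real data this is one lookup per level of the hierarchy, which would fit
the timings above.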

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu May  8 11:56:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 07:56:24 -0400
Subject: [Biopython-dev] [Bug 2496] New: Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
Message-ID: <bug-2496-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496

           Summary: Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST
                    option
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
                CC: betainverse at gmail.com


Problem reported on the mailing list by Katie Edmonds.

We need to add the CGI option RUN_PSIBLAST to the Blast URL in order to support
PSI-BLAST.  However, the current Biopython code can't parse the RID from the
resulting HTML which needs another fix.

Patch to follow.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu May  8 11:58:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 07:58:46 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
In-Reply-To: <bug-2496-42@http.bugzilla.open-bio.org/>
Message-ID: <200805081158.m48Bwkxq028674@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-08 07:58 EST -------
Created an attachment (id=918)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=918&action=view)
Patch to Bio/Blast/NCBIWWW.py

This seems to work, however there is another problem in the XML parser.  e.g.

from Bio.Blast.NCBIWWW import qblast
#gi|160837788|ref|NP_075631.2| actin related protein 2/3 complex, subunit 1B
sequence = (
    "MAYHSFLVEPISCHAWNKDRTQIAICPNNHEVHIYEKSGAKWNKVHELKEHNGQVTGIDWAPESNRIVTC"
    "GTDRNAYVWTLKGRTWKPTLVILRINRAARCVRWAPNENKFAVGSGSRVISICYFEQENDWWVCKHIKKP"
    "IRSTVLSLDWHPNNVLLAAGSCDFKCRIFSAYIKEVEERPAPTPWGSKMPFGELMFESSSSCGWVHGVCF"
    "SASGSRVAWVSHDSTVCLVDADKKMAVATLASETLPLLAVTFITENSLVAAGHDCFPVLFTYDNAAVTLS"
    "FGGRLDVPKQSSQRGMTARERFQNLDKKASSEGGAATGAGLDSLHKNSVSQISVLSGGKAKCSQFCTTGM"
    "DGGMSIWDVKSLESALKDLKIK"
)
result_handle1 = qblast('blastp', 'nr', sequence, expect=0.001)
result_handle2 = qblast('blastp', 'nr', sequence, i_thresh=0.05, expect=10,
                        run_psiblast="on")


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu May  8 14:28:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 10:28:21 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
In-Reply-To: <bug-2496-42@http.bugzilla.open-bio.org/>
Message-ID: <200805081428.m48ESLbe006861@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-08 10:28 EST -------
This patch seems to be working - note that you will also need to update
Bio/Blast/NCBIXML.py to CVS revision 1.18 in order to parse the results.  This
is due to a small change in the formatting of the version number in the latest
XML output.

I would like someone familiar with PSI-Blast to confirm this is OK before I
commit this change to CVS.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May  9 09:01:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 05:01:46 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
In-Reply-To: <bug-2496-42@http.bugzilla.open-bio.org/>
Message-ID: <200805090901.m4991kut017980@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-09 05:01 EST -------
Katie has reported back via the mailing list that there are still issues with
multiple PSI-Blast iterations, see:
http://lists.open-bio.org/pipermail/biopython/2008-May/004220.html

See also the original thread:
http://lists.open-bio.org/pipermail/biopython/2008-May/004213.html


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May  9 11:21:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 07:21:33 -0400
Subject: [Biopython-dev] [Bug 2497] New: Unit tests do not cover
	Bio.Blast.NCBIWWW.qblast()
Message-ID: <bug-2497-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2497

           Summary: Unit tests do not cover Bio.Blast.NCBIWWW.qblast()
           Product: Biopython
           Version: 1.45
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Recent NCBI changes to use BLAST 2.2.18+ with their online API broke our XML
parser.  This was actually reported via the mailing list and fixed quickly.

Adding an online unit test to explicitly run a few queries with
Bio.Blast.NCBIWWW.qblast() and parse the XML output could have caught this
earlier.

I'm going to attach a proposed additional unit test to do this,
test_NCBIWWW_online.py


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May  9 11:24:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 07:24:48 -0400
Subject: [Biopython-dev] [Bug 2497] Unit tests do not cover
	Bio.Blast.NCBIWWW.qblast()
In-Reply-To: <bug-2497-42@http.bugzilla.open-bio.org/>
Message-ID: <200805091124.m49BOmUD023507@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2497


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-09 07:24 EST -------
Created an attachment (id=919)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=919&action=view)
Addition unit test

This is a simple unit test which calls qblast() twice, once using blastp and
once using blastn.

The XML results are then parsed, and it checks that a few pre-defined expected
matches are found.  There is minimal output to the console/output file as I do
not want minor details like the precise number of hits to be reported
(anticipating these to fluctuate as the databases grow).
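
The checking strategy described above can be sketched roughly as follows. This is not the attached test: the function name missing_expected_hits and the second hit title are placeholders invented here, and a real test would take the titles from the parsed XML rather than a hard-coded list.

```python
def missing_expected_hits(hit_titles, expected_fragments):
    """Return the expected fragments that do not occur in any hit title."""
    return [fragment for fragment in expected_fragments
            if not any(fragment in title for title in hit_titles)]


# Stand-ins for the hit descriptions of a parsed BLAST record. The first is
# the protein quoted in Bug 2496; the second is a made-up placeholder.
hit_titles = [
    "gi|160837788|ref|NP_075631.2| actin related protein 2/3 complex, subunit 1B",
    "placeholder title for some other hit",
]

# Only the presence of known matches is checked, never the number of hits,
# so the test keeps passing as the databases grow.
missing = missing_expected_hits(hit_titles, ["NP_075631.2"])
assert not missing, "Expected hits not found: %s" % missing
```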


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From quantrum75 at yahoo.com  Fri May  9 13:37:05 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Fri, 9 May 2008 06:37:05 -0700 (PDT)
Subject: [Biopython-dev] Anyone needs help?
In-Reply-To: <mailman.7167.1210332098.2995.biopython-dev@lists.open-bio.org>
Message-ID: <686395.82650.qm@web31404.mail.mud.yahoo.com>

Hi there,
I am a newbie who is interested in contributing. I was wondering if anyone needed any help with a project? I have tried contributing at a few places before, but the problem I ran into was that the requirements were too long and unfocused, and nothing came of it in the end.
What I am looking for is,

1) Something small to start off with.
2) Something I can complete within a short period of time (focused work of a day or two) and reach a definite conclusion.
3) No work is too small for me.
4) I'd be willing to do any kind of grunt work and would be glad to help with documentation etc.
5) Ideally, it would be something like reviewing some documentation and correcting it, or writing some documentation for a function or whatever for someone who needs to do it but just does not have the time to do it.
6) The kind of work I like to do is work that can be completed.

If anyone has such a job in mind, let me know.
Thanks for your time.
Sincerely
Regards
Rama




From biopython at maubp.freeserve.co.uk  Fri May  9 15:33:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 May 2008 16:33:08 +0100
Subject: [Biopython-dev] Anyone needs help?
In-Reply-To: <686395.82650.qm@web31404.mail.mud.yahoo.com>
References: <mailman.7167.1210332098.2995.biopython-dev@lists.open-bio.org>
	<686395.82650.qm@web31404.mail.mud.yahoo.com>
Message-ID: <320fb6e00805090833w6977bb3fr6ca32d70cb2887ea@mail.gmail.com>

On Fri, May 9, 2008 at 2:37 PM, Rama wrote:
> Hi there,
> I am a newbie who is interested in contributing. I was wondering if anyone needed any help with a project?

Hello Rama.

What is your background? Do you know anything about bioinformatics for
example?  Also how experienced are you with python, and have you ever
worked with the tools diff, patch and CVS?

> I have tried contributing at a few places before, but the problem I ran into was that the requirements
> were too long and unfocused, and nothing came of it in the end. What I am looking for is,
> ...
> If anyone has such a job in mind, let me know.

I would suggest you have a go at Bug 2446, which is small and
shouldn't be too complicated.   The bug reporter Dave Thompson has
been kind enough to provide a few test cases and example code to
demonstrate the problem.

http://bugzilla.open-bio.org/show_bug.cgi?id=2446

Could you try modifying the Ace parser to just ignore these comment
sections?  The file you need to look at is Bio/Sequencing/Ace.py

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Sequencing/Ace.py?cvsroot=biopython

As you can see from the CVS history, this code hasn't changed since
our latest release of Biopython 1.45, so you could work from that if
it's easier than learning about CVS too.   If you can get this to work,
then prepare a patch file against the CVS code (or Biopython 1.45) and
attach it to the bug.

Let me know what you think about trying this.

Regards,

Peter
(one of the Biopython developers)


From bugzilla-daemon at portal.open-bio.org  Fri May  9 18:20:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 14:20:12 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing
	taxon entries in lineage
In-Reply-To: <bug-2475-42@http.bugzilla.open-bio.org/>
Message-ID: <200805091820.m49IKCMh009431@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475


------- Comment #31 from mmokrejs at ribosome.natur.cuni.cz  2008-05-09 14:20 EST -------
Hi,
  I wanted to test what you have, but I lack more user-friendly
documentation. Specifically, I lack documentation for the class BioSeqDatabase
in BioSeqDatabase.py (attachment 915). In the load method, which Eric has
modified, it is not clear to me what would be fetched from the NCBI Taxonomy DB. I
guess the full lineage, but I still do not know whether as a string, a list
of strings, or just taxids.

  The Loader.py (attachment 914) has a scary function called remove(),
and I would like to see a more elaborate explanation of what it really does.
Imagine I have two subspecies of the same species in the database and want
to delete the first one. Will it zap the parents common to both
of them? I hope not. ;-)

Also, I am a bit surprised that _get_taxon_id() would actually modify a local
database. Could it have another name, or could it be split into two functions,
one searching the local db (and optionally fetching data via the internet)
and a second one modifying the local db?

And, shouldn't the 'if self.fetch_NCBI_taxonomy' have a corresponding elif for
the second attempt and the third one? It is a bit too long to read. ;-)
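
To make the suggested split concrete, here is a minimal sketch. It is not the code in Loader.py: the three-column taxon table is a simplified stand-in for the BioSQL schema, the function names are invented for illustration, and the optional fetch from NCBI is left out entirely.

```python
import sqlite3


def find_taxon_id(conn, ncbi_taxon_id):
    """Read-only lookup: return the local taxon_id, or None if not present."""
    row = conn.execute(
        "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
        (ncbi_taxon_id,),
    ).fetchone()
    return row[0] if row else None


def add_taxon(conn, ncbi_taxon_id, parent_taxon_id=None):
    """Modify the local database: insert one taxon row, return its new id."""
    cursor = conn.execute(
        "INSERT INTO taxon (ncbi_taxon_id, parent_taxon_id) VALUES (?, ?)",
        (ncbi_taxon_id, parent_taxon_id),
    )
    return cursor.lastrowid


def get_or_add_taxon_id(conn, ncbi_taxon_id, parent_taxon_id=None):
    """Explicitly named wrapper: search first, insert only if missing."""
    taxon_id = find_taxon_id(conn, ncbi_taxon_id)
    if taxon_id is None:
        taxon_id = add_taxon(conn, ncbi_taxon_id, parent_taxon_id)
    return taxon_id


# Simplified, hypothetical table; the real BioSQL taxon table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon ("
    "taxon_id INTEGER PRIMARY KEY, "
    "ncbi_taxon_id INTEGER UNIQUE, "
    "parent_taxon_id INTEGER)"
)
first = get_or_add_taxon_id(conn, 9606)   # inserts a row
second = get_or_add_taxon_id(conn, 9606)  # reuses the existing row
```

With this layout a caller who only wants to search can use find_taxon_id() and be sure nothing is written.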


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon May 12 18:40:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 May 2008 14:40:34 -0400
Subject: [Biopython-dev] [Bug 2499] New: Bio.Blast.NCBIXML cannot handle XML
	without date in BlastOutput_version
Message-ID: <bug-2499-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499

           Summary: Bio.Blast.NCBIXML cannot handle XML without date in
                    BlastOutput_version
           Product: Biopython
           Version: 1.44
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: n.j.loman at bham.ac.uk


I got the following XML file directly from the NCBI website.

<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.2.18+</BlastOutput_version>
  <BlastOutput_reference>Altschul, Stephen F., Thomas L. Madden, Alejandro A.
Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
(1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>env_nr</BlastOutput_db>
...

This output raises an exception when put through NCBIXML.parse() due to the
absence of a date after the string BLASTP 2.2.18+

The following diff sorts it out:

--- /home/nick/biopython/biopython-1.44/Bio/Blast/NCBIXML.py    2007-07-27 21:34:07.000000000 +0100
+++ NCBIXML.py  2008-05-12 18:01:36.000000000 +0100
@@ -212,8 +212,10 @@

         Save this to put on each blast record object
         """
-        self._header.version = self._value.split()[1]
-        self._header.date = self._value.split()[2][1:-1]
+        s = self._value.split()
+        self._header.version = s[1]
+        if len(s) > 2:
+           self._header.date = s[2][1:-1]

     def _end_BlastOutput_reference(self):
         """a reference to the article describing the algorithm

I'm sorry, I haven't checked to see if this is fixed in 1.45.
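
The logic of the diff above, pulled out into a standalone function so it can be tried without the parser (a sketch: the function name parse_blast_version is made up, and the dated string is an illustrative value in the older bracketed style, not output copied from NCBI):

```python
def parse_blast_version(value):
    """Split a BlastOutput_version string into (version, date).

    The date is None when the string carries no bracketed date,
    as in the newer "BLASTP 2.2.18+" output.
    """
    parts = value.split()
    version = parts[1]
    # Strip the surrounding brackets from the optional third field.
    date = parts[2][1:-1] if len(parts) > 2 else None
    return version, date


print(parse_blast_version("BLASTP 2.2.18+"))
# -> ('2.2.18+', None)
print(parse_blast_version("BLASTP 2.2.17 [Aug-26-2007]"))
# -> ('2.2.17', 'Aug-26-2007')
```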


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue May 13 08:09:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 04:09:53 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML
	without date in BlastOutput_version
In-Reply-To: <bug-2499-42@http.bugzilla.open-bio.org/>
Message-ID: <200805130809.m4D89ro7003140@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-13 04:09 EST -------
Hi Nick,

This was reported earlier on the mailing list, and fixed in
Bio/Blast/NCBIXML.py revision 1.18 (at the time I didn't bother to file a bug,
maybe I should have):
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIXML.py?cvsroot=biopython

If you need the fix urgently, you can either get the whole of Biopython from
CVS and install from source, or just replace that one file which can simply be
downloaded from ViewCVS (link above).  Your exception error will tell you where
exactly your local copy of Bio/Blast/NCBIXML.py is.

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue May 13 09:16:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 05:16:18 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML
	without date in BlastOutput_version
In-Reply-To: <bug-2499-42@http.bugzilla.open-bio.org/>
Message-ID: <200805130916.m4D9GIMV006160@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499


------- Comment #2 from n.j.loman at bham.ac.uk  2008-05-13 05:16 EST -------
Many thanks!


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Tue May 13 12:07:15 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 13 May 2008 05:07:15 -0700 (PDT)
Subject: [Biopython-dev] Reportlab requirement
Message-ID: <305778.65303.qm@web62415.mail.re1.yahoo.com>

Hi everybody,

Currently, setup.py checks if Reportlab is installed or not. If not, you get the following message:

*** Reportlab *** is either not installed or out of date.

This package is optional, which means it is only used in a few
specialized modules in Biopython.  You probably don't need this if you
are unsure.  You can ignore this requirement, and install it later if
you see ImportErrors.
You can find Reportlab at http://www.reportlab.org/downloads.html.

Do you want to continue this installation? (Y/n)  


Reportlab is only used in Bio.Graphics. Unlike e.g. Numeric, Reportlab can be installed later if needed without having to rebuild Biopython. The Biopython unit tests already skip Bio.Graphics if Reportlab is not found.

So I think it is sufficient to check for Reportlab presence only if a user tries to use Bio.Graphics. This will save us the "Do you want to continue this installation? (Y/n) " question above, which may scare off users (and I am quite tired of it myself, too)...
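
One way such a check at the point of use could look is sketched below. This is an illustration of the idea, not what Bio.Graphics does: the helper name require_optional and the wording of the message are invented here.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency, or fail with a readable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            "%s requires the optional package %r; "
            "please install it and try again." % (feature, module_name)
        ) from err


# A module that needs Reportlab would call this when it is first imported,
# rather than setup.py asking a question at install time, for example:
#     reportlab = require_optional("reportlab", "Bio.Graphics")
```

The user then sees the requirement only when it actually matters, and installation runs without prompts.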

Any objections?

--Michiel.
 
       


From sdavis2 at mail.nih.gov  Tue May 13 12:34:20 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 13 May 2008 08:34:20 -0400
Subject: [Biopython-dev] Reportlab requirement
In-Reply-To: <305778.65303.qm@web62415.mail.re1.yahoo.com>
References: <305778.65303.qm@web62415.mail.re1.yahoo.com>
Message-ID: <264855a00805130534q6451e40fj427a51e4aa729b18@mail.gmail.com>

On Tue, May 13, 2008 at 8:07 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
>  Currently, setup.py checks if Reportlab is installed or not. If not, you get the following message:
>
>  *** Reportlab *** is either not installed or out of date.
>
>  This package is optional, which means it is only used in a few
>  specialized modules in Biopython.  You probably don't need this if you
>  are unsure.  You can ignore this requirement, and install it later if
>  you see ImportErrors.
>  You can find Reportlab at http://www.reportlab.org/downloads.html.
>
>  Do you want to continue this installation? (Y/n)
>
>
>  Reportlab is only used in Bio.Graphics. Unlike e.g. Numeric, Reportlab can be installed later if needed without having to rebuild Biopython. The Biopython unit tests already skip Bio.Graphics if Reportlab is not found.
>
>   So I think it is sufficient to check for Reportlab presence only if a user tries to use Bio.Graphics. This will save us the "Do you want to continue this installation? (Y/n) " question above, which may scare off users (and I am quite tired of it myself, too)...
>
>  Any objections?

I personally think it is a good idea to remove the question, yes.

Sean


From bugzilla-daemon at portal.open-bio.org  Tue May 13 16:25:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 12:25:49 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200805131625.m4DGPn3W028364@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-13 12:25 EST -------
I see some interesting parallels for the __getitem__ options for a sequence
alignment, and recent and ongoing discussions on the numpy discussion list for
the __getitem__ behaviour of matrices versus arrays.  In particular, some
participants favour return of row/column vector objects in some situations. 
Also methods to allow iteration over rows or columns have been suggested.

Here with the sequence Alignment class, we could have SeqRecords for the rows,
but Seq or strings for the columns.  Perhaps we should wait and see how the
numpy discussion turns out?

However, some of the other options discussed here on this bug are probably
worth committing soon (e.g. the __str__ and __repr__ methods)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed May 14 20:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 16:49:08 -0400
Subject: [Biopython-dev] [Bug 2500] New: should use python-numpy instead of
	python-num{eric, array}
Message-ID: <bug-2500-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2500

           Summary: should use python-numpy instead of python-
                    num{eric,array}
           Product: Biopython
           Version: 1.45
          Platform: All
               URL: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478457
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mail at philipp-benner.de


Both python-numeric and python-numarray do not see new upstream releases
anymore; the currently maintained project is python-numpy. Please convert
the package to use python-numpy instead.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu May 15 00:58:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 20:58:11 -0400
Subject: [Biopython-dev] [Bug 2500] should use python-numpy instead of
	python-num{eric, array}
In-Reply-To: <bug-2500-42@http.bugzilla.open-bio.org/>
Message-ID: <200805150058.m4F0wBCO023044@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2500


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2008-05-14 20:58 EST -------


*** This bug has been marked as a duplicate of bug 2251 ***


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu May 15 00:58:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 20:58:13 -0400
Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython
In-Reply-To: <bug-2251-42@http.bugzilla.open-bio.org/>
Message-ID: <200805150058.m4F0wDfd023057@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2251


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mail at philipp-benner.de


------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp  2008-05-14 20:58 EST -------
*** Bug 2500 has been marked as a duplicate of this bug. ***


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From tiagoantao at gmail.com  Thu May 15 13:04:22 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 15 May 2008 14:04:22 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
References: <6d941f120805150248j11d0c02cq39626c304c9c7e29@mail.gmail.com>
	<320fb6e00805150540s50912664r462a194261c5c8c2@mail.gmail.com>
	<320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
Message-ID: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>

Hi all,

We are trying to submit an abstract for BOSC 2008 regarding Biopython.
Below is the current version.
Comments would be very appreciated (we are already after the deadline,
so they should come in fast ;) ).
Michiel, do you want to add anything to the "future" section?


---------------------------------------

Biopython Project Update

Tiago Antao[1], Peter Cock[2]

In this talk we present the current status of the Biopython project,
focusing on features developed since BOSC 2007 and on future plans for
the project, and present example usages of the new population genetics
module.

The latest Biopython release is 1.45 made available on 22 March 2008.
Some of the new features are:

  1. A new population genetics module including support for
coalescent simulation, selection detection and the GenePop file
format. The new module relies on existing open source external
software (e.g., the open source Simcoal2 for coalescent simulation,
which can take advantage of multiple core CPUs for computationally
intensive tasks).
  2. Improved documentation.
  3. Deprecation of many modules which were either obsolete or had
been superseded by other code.
  4. Plus many bugs were fixed, included updates for evolving file formats.

Since the Biopython 1.45 release, further work is planned to extend
the Population Genetics module (e.g., with a statistics component).  A
new sequence alignment module is also being implemented with a uniform
API for reading and writing various alignment files, based on the
approach of the Bio.SeqIO module added last year for working with
sequences.  Work to improve Biopython's BioSQL support is also
ongoing.

Time permitting, the talk will also show usage examples of the new
population genetics module. The focus will be put not only on the
population genetics side, but also on strategies to easily use all
available computational power on new multiple core computers. This is
useful for users of most scripting languages, as most language
interpreter implementations impose stern limits on multi-threaded
programming efficiency, which is important when using computational
biology code which is CPU intensive. We will take this opportunity to
discuss strategies to overcome those language limitations.


Any feedback would really be much appreciated, thanks!


From biopython at maubp.freeserve.co.uk  Thu May 15 13:48:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 May 2008 14:48:26 +0100
Subject: [Biopython-dev] Bio.AlignIO for sequence alignment input/output
Message-ID: <320fb6e00805150648y42e91765oa99eab7e5e1cf8fa@mail.gmail.com>

Those of you subscribed to the CVS update feed (see
http://biopython.org/wiki/Tracking_CVS_commits and the RSS link) will
have noticed some activity in Bio.AlignIO which I originally proposed
adding a year ago.  See also enhancement Bug 2285,
http://bugzilla.open-bio.org/show_bug.cgi?id=2285

I've been using this code on and off in my own work, and have put
together a reasonable unit test.  I've finished a first draft of a new
chapter in the tutorial describing the module (you'll need to run
pdflatex or hevea on biopython/Doc/Tutorial.tex to read this), and
started a wiki page too: http://www.biopython.org/wiki/AlignIO

The API is deliberately very close to that of Bio.SeqIO, but deals
with Alignment objects rather than SeqRecord objects.  I'm hoping for
some feedback now, even if it is as little as pointing out any typos
in the documentation.  Also additional example input files would be
good - and checking the Biopython output is understood by third party
tools.

One particular issue with the API is handling ambiguous FASTA files
which have been used to store more than one alignment (discussed in
the updated tutorial).  There is an optional argument to the
Bio.AlignIO.parse() function to specify the number of sequences
expected per alignment which covers the most typical scenarios.  I am
open to the idea of simply removing this option, which means if the
user really wants to parse one of the ambiguous files, they would have
to read in the individual sequences using Bio.SeqIO, batch them as
needed, and then create the alignments.
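
For reference, the manual batching step mentioned above could be sketched like this. It is plain Python and independent of Biopython: the function name batch_records is made up, and in practice the records would come from Bio.SeqIO.parse() and each batch would then be turned into an Alignment object.

```python
from itertools import islice


def batch_records(records, seq_count):
    """Yield consecutive lists of seq_count records from any iterable.

    Raises ValueError if the final batch is incomplete, since that usually
    means the file does not hold a whole number of alignments.
    """
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, seq_count))
        if not batch:
            return
        if len(batch) != seq_count:
            raise ValueError(
                "Expected %i records per alignment, found a final batch of %i"
                % (seq_count, len(batch))
            )
        yield batch


# Placeholder names standing in for SeqRecord objects.
names = ["alpha_1", "beta_1", "alpha_2", "beta_2"]
for batch in batch_records(names, 2):
    print(batch)
```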

Peter


From p.j.a.cock at googlemail.com  Thu May 15 13:51:59 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 15 May 2008 14:51:59 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
References: <6d941f120805150248j11d0c02cq39626c304c9c7e29@mail.gmail.com>
	<320fb6e00805150540s50912664r462a194261c5c8c2@mail.gmail.com>
	<320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
	<6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
Message-ID: <320fb6e00805150651md383437w2233bc1419589d40@mail.gmail.com>

One little typo I should have spotted earlier:

 4. Plus many bugs were fixed, included updates for evolving file formats.

Should be:

 4. Plus many bugs were fixed, including updates for evolving file formats.

Also I didn't insert our addresses for the [1] and [2] implied footnotes.

Peter


From mjldehoon at yahoo.com  Sat May 17 03:04:54 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 16 May 2008 20:04:54 -0700 (PDT)
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
Message-ID: <89450.67823.qm@web62411.mail.re1.yahoo.com>

Dear Tiago,
Thank you for representing Biopython at BOSC!
If there's still time, I would suggest aiming the abstract (and also the talk itself) more at the general audience, who may know very little about Biopython or Python. So perhaps an overview of the main modules (no details, just to give an idea of what is covered by Biopython), the Population Genetics module, number of developers, number of users, and perhaps just mention the existence of some other big packages (numerical python, matplotlib, MMTK, ...) that are relevant to science & biology with Python. The point is that most people in the audience are not Biopython users (yet), so for them a general introduction is more suitable.

--Michiel.

Tiago Ant???o <tiagoantao at gmail.com> wrote: Hi all,

We are trying to submit an abstract for BOSC 2008 regarding Biopython.
Below is the current version.
Comments would be very appreciated (we are already after the deadline,
so they should come in fast ;) ).
Michiel, do you want to add anything to the "future" section?


---------------------------------------

Biopython Project Update

Tiago Antao[1], Peter Cock[2]

In this talk we present the current status of the Biopython project,
we focus on features developed since BOSC 2007, future plans for the
project and present example usages of the new population genetics
module.

The latest Biopython release is 1.45 made available on 22 March 2008.
Some of the new features are:

  1. A new population genetics module including support for
coalescent simulation, selection detection and the GenePop file
format. The new module relies on existing open source external
software (e.g., the open source Simcoal2 for coalescent simulation
which is can take advantage of multiple core CPUs for computationally
intensive tasks).
  2. Improved documentation.
  3. Deprecation of many modules which were either obsolete or had
been superseded by other code.
  4. Plus many bugs were fixed, included updates for evolving file formats.

Since the Biopython 1.45 release, further work is planned to extend
the Population Genetics module (e.g., with a statistics component).  A
new sequence alignment module is also being implemented with a uniform
API for reading and writing various alignment files, based on the
approach of the Bio.SeqIO module added last year for working with
sequences.  Work to improve Biopython's BioSQL support is also
ongoing.

Time permitting, the talk will also show usage examples of the new
population genetics module. The focus will be put not only on the
population genetics side, but also on strategies to easily use all
available computational power on new multiple core computers. This is
useful for users of the most scripting languages as most language
interpreter implementations impose stern limits on multi-threaded
programming efficiency, which is important when using computational
biology code which is CPU intensive. We will take this opportunity to
discuss strategies to overcome those language limitations.
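
One such strategy can be sketched as follows. This is only a minimal
illustration, not code from Bio.PopGen: when the heavy computation is
done by an external program, several copies can be launched at once and
the operating system will spread them over the available cores. The
helper names (run_command, run_in_parallel) are hypothetical, and small
Python child processes stand in for the real simulator.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(args):
    """Run one external command and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_in_parallel(commands, max_workers=4):
    """Run several external commands at once, keeping the input order."""
    # Threads are enough here: each thread only waits for its child
    # process, and the child processes themselves run on separate cores.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


# Stand-in jobs: each "simulation" is a tiny Python child process.
# A real job list would name the external simulator instead.
jobs = [[sys.executable, "-c", "print(%d * %d)" % (n, n)] for n in range(1, 5)]
results = run_in_parallel(jobs)
print(results)
```

Because the interpreter threads spend their time waiting rather than
computing, the interpreter's own threading limits matter little in this
arrangement.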


Any feedback would really be much appreciated, thanks!
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev


From bugzilla-daemon at portal.open-bio.org  Sat May 17 06:13:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 17 May 2008 02:13:33 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing
	taxon entries in lineage
In-Reply-To: <bug-2475-42@http.bugzilla.open-bio.org/>
Message-ID: <200805170613.m4H6DXDZ016145@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #914 is|0                           |1
           obsolete|                            |


------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp  2008-05-17 02:13 EST -------
Created an attachment (id=920)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=920&action=view)
Replacement for "Usage ... to load a SeqRecord's taxonomy"

Recently I made some changes to the Taxonomy parser in Bio.Entrez, specifically
to make it more consistent with the other parsers in Bio.Entrez. Some fields in
the XML are now accessed slightly differently. I updated Loader.py accordingly.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Sun May 18 14:33:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 18 May 2008 07:33:25 -0700 (PDT)
Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files
Message-ID: <157512.3075.qm@web62408.mail.re1.yahoo.com>

Hi everybody,

In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now installed using a specialized install_data_biopython class. For Bio.Entrez, I am using the package_data argument to the setup function instead. Does anybody know why the install_data_biopython class was used? If there's no specific reason, I'd prefer to use the package_data argument instead.
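
For readers unfamiliar with the package_data argument, here is a rough
sketch of what it looks like. The distribution name and the file
pattern are assumptions made for illustration only; they are not taken
from Biopython's actual setup.py.

```python
def build_setup_kwargs():
    """Keyword arguments for setup(), with data files declared per package."""
    return {
        "name": "example-distribution",  # hypothetical name
        "packages": ["Bio", "Bio.Entrez"],
        # Maps a package name to glob patterns, relative to that package's
        # directory, of data files to install alongside the modules.
        "package_data": {"Bio.Entrez": ["DTDs/*.dtd"]},  # assumed pattern
    }


setup_kwargs = build_setup_kwargs()
# In a real setup.py this dictionary would be passed on as:
#     setup(**setup_kwargs)
print(setup_kwargs["package_data"])
```

With this approach the data files travel with their package, so no
custom install command class is needed.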

--Michiel.

       
From bugzilla-daemon at portal.open-bio.org  Mon May 19 09:30:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 05:30:59 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing
	taxon entries in lineage
In-Reply-To: <bug-2475-42@http.bugzilla.open-bio.org/>
Message-ID: <200805190930.m4J9UxLu016813@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475


------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-19 05:30 EST -------
This bug is also linked to Bug 2494 (currently titled "_retrieve_taxon in
BioSQL.py needs urgent optimization") which is about not using the left/right
values when retrieving data from the database.

This is important because changes made in this bug (i.e. Bug 2475) may leave
the left/right values NULL when writing new lineages.
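
As background, the left/right values are a nested-set numbering of the
taxonomy tree. The sketch below is a generic illustration of that idea
on a made-up miniature tree; it is not the BioSQL loader code, and the
function names are hypothetical. It shows why a lineage lookup based on
left/right cannot work for entries whose values were never filled in.

```python
def assign_left_right(children, root):
    """Return {node: (left, right)} for a tree given as {parent: [children]}."""
    values = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter           # numbered on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1             # numbered again on the way back up
        values[node] = (left, counter)

    visit(root)
    return values


def lineage(values, node):
    """Ancestors of node (root first), found purely from left/right values."""
    left, right = values[node]
    ancestors = [n for n, (l, r) in values.items() if l < left and r > right]
    return sorted(ancestors, key=lambda n: values[n][0])


# Hypothetical miniature taxonomy, for illustration only.
tree = {"Eukaryota": ["Metazoa", "Fungi"], "Metazoa": ["Homo sapiens"]}
values = assign_left_right(tree, "Eukaryota")
print(values)
print(lineage(values, "Homo sapiens"))
```

An ancestor is any node whose interval encloses the node's own
interval, so every node on the path needs valid numbers.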

Also, in reply to comment 31, all of the other _get_...() methods of the
Loader class can also add things to the database (e.g. qualifier keys).  Once
you know this, the fact that _get_taxon_id() does this too isn't a shock. 
Also, yes, the _get_taxon_id() function is getting far too long, and should
probably be restructured as part of this bug.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From tiagoantao at gmail.com  Mon May 19 12:09:57 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 19 May 2008 13:09:57 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <89450.67823.qm@web62411.mail.re1.yahoo.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
Message-ID: <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>

On Sat, May 17, 2008 at 4:04 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> The point is that most people in the audience are not Biopython users (yet),
> so for them a general introduction is more suitable.

Actually this issue is of major concern to me... Do you (or anybody)
have a feel for what audience will be there? I think it is important to
tune the message to the audience. I was actually speculating that
people would know about Biopython. But if that is not the case, as you
imply, then maybe something that makes Biopython more competitive
for people who might be deciding which system (language and
libraries) might be the best approach...


From p.j.a.cock at googlemail.com  Mon May 19 12:26:31 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 19 May 2008 13:26:31 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
	<6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
Message-ID: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>

>> The point is that most people in the audience are not Biopython users (yet),
>> so for them a general introduction is more suitable.
>
> Actually this issue is of a major concern to me... Do you (or anybody)
> has a feel of what audience will be there? I think it is important to
> tune the message to the audience. I actually was speculating that
> people would know about biopython. But if that is not the case, as you
> imply, then maybe a something that makes biopython more competitive
> for people which might be deciding which system (language and
> libraries) might be the best approach...

Perhaps I should have given you a broader introduction to BOSC itself.
 There will probably be talks from BioPerl, BioJava and BioRuby in the
same session, and I would expect almost all the audience to be
familiar with at least one of these projects.  However, they may or
may not use python, and I would expect that the majority will not be
Biopython users.  At least, that was my impression last year at BOSC
2007.  Reading over the talk titles/abstracts from last year should
give you a feel for the sort of people there presenting work outside
the Bio* projects.  In terms of general impressions, I felt most of
the attendees actually did some hands on coding.

So yes, as Michiel says, perhaps the current text isn't general
enough.  This is a regular opportunity to try to raise awareness of the
project, although I personally wouldn't give a "hard sell", we should
try to give a general overview of Biopython's capabilities.

Peter


From sbassi at gmail.com  Mon May 19 12:36:15 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 19 May 2008 09:36:15 -0300
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
	<6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
	<320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
Message-ID: <b43bf2080805190536q546a01c1s2feb0f9ecb386a00@mail.gmail.com>

On Mon, May 19, 2008 at 9:26 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
....
> project, although I personally wouldn't give a "hard sell", we should
> try to give a general overview of Biopython's capabilities.

This work may give some ideas about introducing Biopython:
http://openwetware.org/wiki/Julius_B._Lucks/Projects/Python_All_A_Scientist_Needs


From tiagoantao at gmail.com  Mon May 19 12:38:34 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 19 May 2008 13:38:34 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
	<6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
	<320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
Message-ID: <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com>

In order to address this I am thinking of changing the starting
paragraph of the "paper" along the following lines:

In this talk we present the current status of the Biopython project.
We start by giving a short overview of Biopython - presenting existing
functionality - and useful software libraries for computational
biology in the Python development 'ecology' (from plotting libraries
capable of producing publication quality figures to numerical
libraries, among others). We then focus on features developed since
BOSC 2007, future plans for the project and present example usages of
the new population genetics module.


I think changing the abstract along these lines might also be good.

I think I will target most of the presentation to the idea that the
Python ecology of software development is really good (e.g. putting
one slide on matplotlib with code and result, to show how concise and
simple code can produce nice results). "Selling" Biopython in the
whole python context.



-- 
http://www.tiago.org


From tiagoantao at gmail.com  Mon May 19 12:49:29 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 19 May 2008 13:49:29 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com>
References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
	<89450.67823.qm@web62411.mail.re1.yahoo.com>
	<6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com>
	<320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com>
	<6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com>
Message-ID: <6d941f120805190549u773310aj5df318952eca5e52@mail.gmail.com>

By the way, the suggested abstract proposal:

Introduction to and news from the Biopython project presenting both
existing modules and current developments including a new Population
Genetics module and XML parsers for the NCBI's Entrez web interface.

An overview of the existing python software ecology will also be
presented in relationship with computational biology. Libraries to do,
among others, plotting, numerical analysis and molecular modeling will
be presented in connection with Biopython and from the point of view of
having a complete platform to do research in computational biology.

Biopython is freely available on http://www.biopython.org under a
liberal "MIT style" open source license,
http://www.biopython.org/DIST/LICENSE




-- 
http://www.tiago.org


From bugzilla-daemon at portal.open-bio.org  Mon May 19 13:46:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 09:46:22 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805191346.m4JDkMMf028474@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|blocker                     |major


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-19 09:46 EST -------
I see from comment 11 that some nasty quote escaping may be needed (which could
be an NCBI bug).

Have you been able to try using relative paths at the command line (avoiding
spaces ideally)?

Unfortunately my Windows machine is currently without internet access, which is
one reason why I haven't made time to sit down and explore this issue.

P.S. I don't think this is a critical bug in Biopython, although I do take your
point that in your setup this is a big issue.  Downgrading this to severity
"major".


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon May 19 21:03:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 17:03:44 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805192103.m4JL3iSk021133@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #13 from drpatnaik at yahoo.com  2008-05-19 17:03 EST -------
To get Biopython to call BLAST, this works:
  1. my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"'

Variations like these do not work:
  2. "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"
  3. r"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"

The error being:
  'C:/Documents' is not recognized as an internal or external command, operable
program or batch file.

With my_blast_exe set to the 1st value constant, and trying different
my_blast_db values, BLAST reports:
  [NULL_Caption] ERROR: Arguments must start with '-' (the offending argument
#5 was: 'and') /* or 'and\' or 'and\\' */

The values tried for my_blast_db are:
  4. 'C:/Documents and Settings/patnaik/My Documents/blast/bin/mine'
  5. 'C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine'
  6. 'C:/Documents\\ and\ Settings/patnaik/My\\ Documents/blast/bin/mine'

  7. "C:/Documents and Settings/patnaik/My Documents/blast/bin/mine"
  8. "C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine"
  9. "C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine"

  10. r'C:/Documents and Settings/patnaik/My Documents/blast/bin/mine'
  11. r'C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine'
  12. r'C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine'

  13. r"C:/Documents and Settings/patnaik/My Documents/blast/bin/mine"
  14. r"C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine"
  15. r"C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine"

But a different error ...:
  'C:/Documents' is not recognized as an internal or external command, operable
program or batch file.

... is shown with these values:

  16. r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/mine"'
  17. r'"C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine"'
  18. r'"C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine"'

That same error is also seen when I try these variations of the value that
works in command-line BLAST (comment #10 above):

  19. r'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'
  20. r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""'
  20. "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""
  21. r"\"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"\""
  22. r"'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'"

Doesn't this suggest that Biopython is not passing the my_blast_db value
properly? 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon May 19 21:36:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 17:36:42 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805192136.m4JLag7h022387@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #14 from drpatnaik at yahoo.com  2008-05-19 17:36 EST -------
(continuing comment #13)

  23. r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine"'
  24. '"C:\\Documents and Settings\\patnaik\\My Documents\\blast\\bin\\mine"'
  25. '\\"C:\\Documents and Settings\\patnaik\\My
Documents\\blast\\bin\\mine\\"'
  26. r"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""
  27. r"'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'"


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon May 19 21:47:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 17:47:00 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805192147.m4JLl0HQ022723@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #15 from drpatnaik at yahoo.com  2008-05-19 17:47 EST -------
(In reply to comment #13)
> To get BioPython call BLAST, this works:
>   1. my_blast_exe = r'"C:/Documents and Settings/patnaik/My
> Documents/blast/bin/blastall.exe"'

I forgot to add that I had to comment-out the os.path.exists in
NCBIStandalone.py to get to that step. Equivalently, with this script I get the
'does not exist' message:

   import os
   my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"'
   if not os.path.exists(my_blast_exe):
       print('cannot find my_blast_exe')
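
The behaviour reported here can be reproduced without BLAST. The sketch
below is an illustration under assumptions: it creates a placeholder
file in a temporary directory whose name contains a space, and the
helper strip_quotes is hypothetical. os.path.exists treats the double
quotes as part of the file name, so only the unquoted path is found.

```python
import os
import shutil
import tempfile


def strip_quotes(path):
    """Remove one pair of surrounding double quotes, if present."""
    if len(path) >= 2 and path[0] == '"' and path[-1] == '"':
        return path[1:-1]
    return path


# Create a real file inside a directory whose name contains a space.
workdir = tempfile.mkdtemp(prefix="My Documents ")
try:
    exe_path = os.path.join(workdir, "blastall.exe")
    with open(exe_path, "w") as handle:
        handle.write("placeholder")

    quoted_path = '"%s"' % exe_path
    # The quotes are treated as literal characters of the name.
    found_quoted = os.path.exists(quoted_path)
    # The plain path, quotes removed, points at the real file.
    found_plain = os.path.exists(strip_quotes(quoted_path))
finally:
    shutil.rmtree(workdir)

print(found_quoted, found_plain)
```

Quoting is something the shell needs, not the file system, so the
existence check is best done on the plain path.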


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue May 20 16:31:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 20 May 2008 12:31:41 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not
	support RUN_PSIBLAST option
In-Reply-To: <bug-2496-42@http.bugzilla.open-bio.org/>
Message-ID: <200805201631.m4KGVfF8016867@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-20 12:31 EST -------
Follow up discussion on the mailing list:
http://lists.open-bio.org/pipermail/biopython/2008-May/004231.html

Katie wrote:
> I asked NCBI about this, and they (eventually) replied that it's "not
> officially supported."  I have been unable to figure out how to get it to
> return iterations after the first one.

I'm going to close this bug as "invalid" unless the NCBI do make a public API
for PSI-BLAST.  It looks like the only solution for now would be to install the
standalone blast tools...


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed May 21 02:45:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 20 May 2008 22:45:58 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805210245.m4L2jwgM013784@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #16 from drpatnaik at yahoo.com  2008-05-20 22:45 EST -------
Similar to what I mentioned in comment #10 this BLAST command-line code works:

(1)   "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"
-p blastn -d "\"C:\Documents and Settings\patnaik\My
Documents\blast\bin\mine\"" -i "C:\Documents and Settings\patnaik\My
Documents\blast\bin\hairpin" -m 7

Now I've been trying to see the system call popen3 makes in line 1662 of
NCBIStandalone.py by putting this line of code before the
os.popen3(" ".join([blastcmd] + params)) call:

   print " ".join([blastcmd] + params)

(as reported in comment #15, I do have to first disable the os.path.exists)

Using these values in my test script:
   my_blast_db = r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""'
   my_blast_file = r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin"'
   my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"'

I get a print command result that is identical to the working BLAST
command-line code (1).

   "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p
blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i
"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7

But BLAST doesn't get called and the error reported is:

   'C:/Documents' is not recognized as ...

Finally I tried replacing the code inside the os.popen3 of NCBIStandalone.py
with the string (1):

   w, r, e = os.popen3(r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7')

And I get the same error:

   'C:/Documents' is not recognized as ...

With a non-Biopython-dependent script, I get the same error (irrespective of
the quote combinations I tried):

   import os
   w, r, e = os.popen3(r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7')
   print e.read()

-------------------------------------------------------------------

FINAL THOUGHTS

I think I have to give up on this.

There seem to be two incurable issues, unlikely to be Biopython-specific:
os.path.exists (see comment #15) and os.popen3
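
A possible way around the os.popen3 quoting problem, offered only as a
sketch and not as what NCBIStandalone.py does, is the standard
subprocess module with the command given as a list of arguments. Each
list item reaches the program as one argument, spaces included, so no
manual quoting is needed. A Python child process that echoes its
arguments stands in for blastall here, and run_with_arguments is a
hypothetical helper name.

```python
import subprocess
import sys


def run_with_arguments(args):
    """Run a program given as a list of arguments; return (stdout, stderr)."""
    process = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return process.communicate()


# Stand-in for blastall: a child process that prints each argument it receives.
db_path = r"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine"
echo_code = "import sys; [print(a) for a in sys.argv[1:]]"
args = [sys.executable, "-c", echo_code, "-d", db_path, "-m", "7"]
out, err = run_with_arguments(args)
received = out.splitlines()
print(received)

# On Windows the list is joined into one command string following the
# quoting rules implemented by subprocess.list2cmdline:
command_line = subprocess.list2cmdline(["blastall.exe", "-d", db_path])
print(command_line)
```

The database path arrives as a single argument even though it contains
spaces, which is the step that fails with the hand-built command string.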


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed May 21 08:34:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 04:34:52 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805210834.m4L8YqVL004607@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-21 04:34 EST -------
The os.path.exists(...) check in Biopython should be easy to fix, probably by
the user specifying the exe name without quotes and biopython adding the quotes
when building the command line.

For specifying the NCBI database locations, have you set the database folder
using NCBI.ini yet?  I'm not sure if it will work if the INI file is in the
BLAST directory as the NCBI documentation says it should go in the Windows
directory (which you don't have write access to).  Perhaps anywhere on the path
will work.  See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html

There is also the option of using relative paths...

You might get more success talking to the machine administrator and asking them
to install BLAST for you?

The good news is my home internet connection is up and running, so I may be
able to do a little investigation on this issue now (time permitting).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Wed May 21 09:21:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 21 May 2008 10:21:51 +0100
Subject: [Biopython-dev] Next release?
Message-ID: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>

From the discussion list, quite a few people have suffered from the
NCBI tweaking the online Blast XML format with 2.2.18+ (bug 2499), so
it would be nice to get a new release out soon to address this.  See
http://bugzilla.open-bio.org/show_bug.cgi?id=2499

How do the other modules stand at the moment?

Bio.PopGen (Tiago). Is this currently stable, or are you in the middle
of adding more features?

Bio.Entrez (Michiel). I see you've been very busy with the new
simplified XML parsers (see bug 2488).  This looks like a big
improvement on the rather repetitive coding needed in the first draft.
 Are you still actively making further refinements?  How many Entrez
XML file formats are still needed?
http://bugzilla.open-bio.org/show_bug.cgi?id=2488

Bio.AlignIO - this is new, but has a reasonable amount of
documentation and a small unit test (see bug 2285).  If we did do a
release soon, it could be announced as "in beta", and subject to
change, but feedback welcomed.
http://bugzilla.open-bio.org/show_bug.cgi?id=2285

In terms of the unit tests, I haven't run them on Windows recently
(internet access issues, hopefully resolved now), but on Linux things
look fine.

Peter


From mjldehoon at yahoo.com  Wed May 21 09:40:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 21 May 2008 02:40:25 -0700 (PDT)
Subject: [Biopython-dev] Next release?
In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
Message-ID: <928585.24226.qm@web62401.mail.re1.yahoo.com>

Peter <biopython at maubp.freeserve.co.uk> wrote:
> Bio.Entrez (Michiel). I see you've been very busy with the new
> simplified XML parsers (see bug 2488).  This looks like a big
> improvement on the rather repetitive coding needed in the first draft.
> Are you still actively making further refinements?  How many Entrez
> XML file formats are still needed?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2488

I am still making refinements. I am using this module a lot for my own work, and I have a lot of changes that are not in CVS yet. The final result should be much simpler than what is in CVS now. In particular, we won't have to write a Python module for each DTD; instead, Python will figure out the DTD for itself.
Once this is finished (hopefully soon), I'd be happy to make a new release.
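
As an illustration of how a parser can learn which DTD a file uses, the
sketch below reads the DOCTYPE declaration with the standard expat
module. This is an assumption about one possible approach, not the code
planned for Bio.Entrez; the function name find_dtd, the sample XML and
the DTD URL are all hypothetical.

```python
from xml.parsers import expat


def find_dtd(xml_text):
    """Return the root element name and DTD identifiers from the DOCTYPE."""
    found = {}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat when it reaches the <!DOCTYPE ...> declaration.
        found["root"] = name
        found["system_id"] = system_id
        found["public_id"] = public_id

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(xml_text, True)
    return found


# Hypothetical document and DTD location, for illustration only.
sample = """<?xml version="1.0"?>
<!DOCTYPE ExampleResult SYSTEM "http://example.org/dtd/example_result.dtd">
<ExampleResult><Name>pubmed</Name></ExampleResult>
"""
info = find_dtd(sample)
print(info)
```

Once the DTD is known from the file itself, a single generic parser
could choose how to handle each element instead of one module per DTD.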

--Michiel.

       
From bugzilla-daemon at portal.open-bio.org  Wed May 21 10:48:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 06:48:40 -0400
Subject: [Biopython-dev] [Bug 2501] New: Minor erratas in module
	Bio.SeqRecord
Message-ID: <bug-2501-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2501

           Summary: Minor erratas in module Bio.SeqRecord
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: xbello at gmail.com


line 32: description - Seqeuence description, optional (string) 
line 63: if self.description : lines.append("Desription: %s" %
self.description)

Seqeuence instead of Sequence
Desription instead of Description


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed May 21 11:28:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 07:28:33 -0400
Subject: [Biopython-dev] [Bug 2501] Minor erratas in module Bio.SeqRecord
In-Reply-To: <bug-2501-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211128.m4LBSX99014512@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2501


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-21 07:28 EST -------
Thanks for pointing those out - fixed in CVS revision 1.16




From tiagoantao at gmail.com  Wed May 21 11:41:15 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 21 May 2008 12:41:15 +0100
Subject: [Biopython-dev] Next release?
In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
References: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
Message-ID: <6d941f120805210441w4f3fc3d7m42ee5531dca127df@mail.gmail.com>

On Wed, May 21, 2008 at 10:21 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Bio.PopGen (Tiago). Is this currently stable, or are you in the middle
> of adding more features?

Long story: I will only add more after the move to SVN. Actually the most
important part is going to be added next, but I am waiting for SVN
(any news on that front?). The statistics part that I will be
committing is the core of the module...

Short story: Don't worry about me if you are doing a release in the
next couple of weeks...


From bugzilla-daemon at portal.open-bio.org  Wed May 21 11:51:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 07:51:17 -0400
Subject: [Biopython-dev] [Bug 2502] New: PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
Message-ID: <bug-2502-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

           Summary: PSIBlastParser fails with blastpgp 2.2.18 though works
                    with blastpgp 2.2.15
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: ibdeno at gmail.com


When parsing a PSI-Blast result from blastpgp version 2.2.18 I get this error:

Traceback (most recent call last):
  File "./lpbl.py", line 23, in <module>
    b_record = b_parser.parse(blast_out)
  File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 760,
in parse
    self._scanner.feed(handle, self._consumer)
  File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 98,
in feed
    self._scan_header(uhandle, consumer)
  File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 208,
in _scan_header
    raise ValueError("Invalid header?")
ValueError: Invalid header?

The same script and same input just works with blastpgp 2.2.15

I will attach script and input file later.




From bugzilla-daemon at portal.open-bio.org  Wed May 21 11:56:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 07:56:53 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211156.m4LBurSt016108@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #1 from ibdeno at gmail.com  2008-05-21 07:56 EST -------
Created an attachment (id=921)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=921&action=view)
Contains a script and an example sequence to reproduce the bug

In the script, change the location of the blast command and of the database
to be used.

Run it as:

./lpbl.py hsTXN.prot.fasta 3

The second argument is the number of iterations for blastpgp




From bugzilla-daemon at portal.open-bio.org  Wed May 21 13:05:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 09:05:13 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211305.m4LD5DhV020562@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-21 09:05 EST -------
Miguel - could you also attach the XML output from blastpgp 2.2.15 and 2.2.18
please?

e.g.  Something like this if you want to do it via Biopython:

blast_out, error_info = NCBIStandalone.blastpgp(
        blastcmd='/opt/Bio/blast-2.2.15/bin/blastpgp',
        database='/opt/databases/BlastDB/nrdb100ncbi',
        infile=file,
        npasses=passes,
        align_view='0',
        matrix_outfile=file + '.pssm')

handle = open("blastpgp_2.2.15.xml","w")
handle.write(blast_out.read())
handle.close()

Thanks, Peter.




From bugzilla-daemon at portal.open-bio.org  Wed May 21 14:44:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 10:44:41 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211444.m4LEifII025392@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #3 from ibdeno at gmail.com  2008-05-21 10:44 EST -------
Created an attachment (id=922)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=922&action=view)
Plain text and XML outputs from blastpgp

The names should be self-explanatory.
The log files were produced with the appropriate blastpgp version using the
command line:

blastpgp -i hsTXN.prot.fasta -d /drives/databases/BlastDB/nrdb100ncbi -j 1 -m
[0,7]

m = 0 is plain text (as in the original submitted bug)
m = 7 is XML




From bugzilla-daemon at portal.open-bio.org  Wed May 21 17:21:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 13:21:27 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
	file-path values
In-Reply-To: <bug-2480-42@http.bugzilla.open-bio.org/>
Message-ID: <200805211721.m4LHLRX1003810@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480


------- Comment #18 from drpatnaik at yahoo.com  2008-05-21 13:21 EST -------
The BLAST database folder being inside blast/bin seems to be fine as
command-line BLAST does work. I haven't tried relative paths. It should work,
as should using an external drive that can provide for space-less paths. But
the issue of spaces in paths on Windows remains. I thank you for your
suggestions and efforts looking into it.




From jblanca at btc.upv.es  Thu May 22 07:30:52 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Thu, 22 May 2008 09:30:52 +0200
Subject: [Biopython-dev] sequence class proposal
Message-ID: <200805220930.53004.jblanca@btc.upv.es>

Dear Biopython developers,
I've been using Python and Biopython for some time now and I would like to
talk with you about the sequence classes in Biopython. I have had some issues
using the SeqRecord and Alignment classes, and I have been discussing and
implementing a proposal for a new sequence class with two students (Victor
Sanchez and Pablo Martinez). We would like to present this implementation as
a contribution to the discussion about the design of the sequence classes in
Biopython, and we're eager to receive your comments.

The first problem that I found with SeqRecord is the lack of support for
qualities. It is also difficult to implement this quality support in a
SeqRecord-derived class, because the current SeqRecord API gets in the way.
Let me explain.
Currently SeqRecord has a seq property, and if you want a slice or need
to reverse or complement the sequence you would do something like:
my_seq = SeqRecord()
my_seq.seq = Seq('ACTG')
my_seq.seq[0:2]
my_seq.seq = my_seq.reverse()
If I derive a class from SeqRecord with a qual property, I don't know how to
reverse both the sequence and the quality at the same time, because the
Seq methods are called directly without SeqRecord being aware of it. To
support that, we have discussed a new class with a slightly different
API and have written a preliminary implementation. We have named this new
class RichSeq, and we think that it could solve the quality problem.
With this new class it would work like this:
myseq = RichSeq(seq='ACTG', qual=[50,50,50,50])
subseq = myseq[0:2]
myseq.reverse()
myseq.complement()
RichSeq is equivalent to SeqRecord and it has the same properties as 
SeqRecord, but it adds the methods __getitem__, reverse, complement and 
reverse_complement.
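
As a rough illustration of the idea (not the attached RichSeq code), the
sketch below keeps the letters and the qualities in one object so that
slicing and reversing act on both. The class name QualSeq and its
complement table are hypothetical, it is written for current Python, and
unlike the mutable RichSeq described here it returns new objects instead
of changing the sequence in place.

```python
class QualSeq:
    """Hypothetical stand-in for the proposed RichSeq: letters plus qualities."""

    _COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

    def __init__(self, seq, qual=None):
        if qual is not None and len(qual) != len(seq):
            raise ValueError("seq and qual must have the same length")
        self.seq = seq
        self.qual = list(qual) if qual is not None else None

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # Only slices are handled in this sketch; the same slice is
        # applied to the letters and to the qualities.
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slicing")
        qual = self.qual[index] if self.qual is not None else None
        return QualSeq(self.seq[index], qual)

    def reverse(self):
        # Reversing is just a slice with step -1, so qualities follow along.
        return self[::-1]

    def complement(self):
        # Complementing changes the letters but leaves the qualities in place.
        letters = "".join(self._COMPLEMENT[base] for base in self.seq)
        return QualSeq(letters, self.qual)

    def reverse_complement(self):
        return self.reverse().complement()


read = QualSeq("ACTG", qual=[10, 20, 30, 40])
sub = read[0:2]
rc = read.reverse_complement()
print(sub.seq, sub.qual)
print(rc.seq, rc.qual)
```

Because every operation goes through the container, there is no way to
reverse the letters while forgetting the qualities, which is the problem
described above for a SeqRecord subclass.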

We have also implemented a new type of feature, which we have called
RichFeature. It is similar to SeqFeature. The main difference is that
instead of a location and a location operator, it has a BioRange (another
new class). This BioRange is inspired by (and partly copied from) the
Bioperl library. The BioRange is optional, so some RichFeature uses would be:
RichFeature(id='a_feature', type='annotation',
            feature='this is an annotation')
RichFeature(id='a_feature', type='subsequence', feature=Seq('ACTG'))
range = BioRange(start=3,end=6)
feat = RichFeature(type='annotation', range=range,
                   feature='some_annotation, e.g. an exon')
seq = RichSeq(seq='ACTGACTG', features=[feat])

With this implementation you can easily define a sequence with seq, qual and
annotations associated with a range, and after that you can
reverse and complement them trivially.
range = BioRange(start=3,end=6)
feat = RichFeature(type='annotation', range=range, feature='some_annotation')
seq = RichSeq(seq='ACTGACTG', qual=[60,60,60,60,60,60,60,60], features=[feat])
seq.reverse()
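
One detail that reversing has to handle is where a ranged feature ends up.
The sketch below shows the arithmetic under an assumed convention of
0-based, half-open coordinates as in Python slices; the coordinate system
BioRange actually uses is not stated in this mail, and reverse_range is a
hypothetical helper, not part of the attached code.

```python
def reverse_range(start, end, length):
    """Map a 0-based, half-open range [start, end) onto the reversed sequence."""
    if not 0 <= start <= end <= length:
        raise ValueError("range must lie within the sequence")
    return length - end, length - start


seq = "ACTGACTG"
start, end = 3, 6                      # covers "GAC"
new_start, new_end = reverse_range(start, end, len(seq))
reversed_seq = seq[::-1]
# The feature covers the same letters, read backwards, at the new position.
print(seq[start:end], reversed_seq[new_start:new_end])
```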

By the way, this is a mutable class, although that could be easily changed.

You can even use Seqs and RichSeq as subsequences and ask for slices or 
complements.
range = BioRange(start=1,end=2)
feat = RichFeature(type='subsequence', feature=RichSeq(seq='CT'), range=range)
seq = RichSeq(seq='ACTG', features=[feat])
seq2 = seq[1:2]
seq.reverse()
This capability makes this RichSeq an excellent candidate for a base class for 
an Alignment implementation, but we have not implemented this yet.

Attached to this mail you can find the implementation of these new classes.
They have some tests that give an idea of their intended use. We would like
to know your opinions and suggestions. Do you think that this kind of
functionality is desirable? Please let us know about any flaws, especially in
the API. I think that my work would be easier using a sequence class similar
to RichSeq, but maybe there's an easier way.
Do you think it is a good idea to attach these classes to bugzilla? Should we
open a new bug, or is there one already open for this sequence class debate?
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: richseq.0.0.1.tar.gz
Type: application/x-tgz
Size: 7075 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20080522/aba24889/attachment-0002.bin>

From biopython at maubp.freeserve.co.uk  Thu May 22 15:47:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 22 May 2008 16:47:58 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <200805220930.53004.jblanca@btc.upv.es>
References: <200805220930.53004.jblanca@btc.upv.es>
Message-ID: <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>

On Thu, May 22, 2008 at 8:30 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Dear Biopython developers,
> I've been using python and Biopython for some time now and I would like to
> talk with you about the sequence classes in Biopython. I have had some issues
> using the SeqRecord and Alignment classes and I have being discussing and
> implementing with two students (Victor Sanchez y Pablo Martinez) a proposal
> of a new sequence class. We would like to present this implementation as a
> tip in the discussion about the design of the sequence classes in Biopython
> and we're eager to receive your comments.

If I understood your terminology correctly, "qualities" is a list of
scores, one for each letter in the sequence.  I see this as a special
case of a more general situation where you have per-letter-annotation
information.  Examples include secondary structure or residue
coordinates of a protein sequence.  Very often, for example, secondary
structures are stored in files as a simple string whose length matches
the length of the sequence.  Also related are sub-features like
domains or promoter sites which span a range of residues.
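
To make the general case concrete, here is a minimal sketch in which any
number of per-letter tracks (a string or a list, each as long as the
sequence) are sliced together with the letters. The function
slice_annotated, the track names and the example data are made up for
illustration and are not an existing Biopython API.

```python
def slice_annotated(seq, per_letter, index):
    """Slice a sequence and every per-letter annotation track with one slice."""
    for name, track in per_letter.items():
        if len(track) != len(seq):
            raise ValueError("track %r does not match the sequence length" % name)
    return seq[index], {name: track[index] for name, track in per_letter.items()}


protein = "MKVLAAGIV"
tracks = {
    "secondary_structure": "CCHHHHEEC",      # one character per residue
    "score": [1, 2, 3, 4, 5, 6, 7, 8, 9],    # one number per residue
}
sub_seq, sub_tracks = slice_annotated(protein, tracks, slice(2, 6))
print(sub_seq, sub_tracks)
```

Strings and lists both support the same slice object, so one code path
covers quality scores and secondary structure strings alike.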

So I would agree with you that an enhanced class would be useful,
where the per-letter annotations were respected in slicing, reversing
etc.  Handling sub-features when slicing is less straightforward.

The current SeqRecord and Seq classes separate the sequence annotation
from the sequence letters themselves, making this sort of integration
difficult.  Making the SeqRecord a direct subclass of the Seq object
has previously been suggested and would pave the way for this sort of
operation.

See Bug 2351 where some of these ideas have been floated...
http://bugzilla.open-bio.org/show_bug.cgi?id=2351

There are a lot of things that would need to be discussed - for
example how would you handle the per-sequence annotation (e.g. record
identifiers) when adding two "rich" sequences?  I've been content with
making small steps for now, with backwards compatibility always in
mind.

On another note, I'm also thinking about the need for an annotated
sequence alignment object, where there are similar concerns.

Also, have you discussed the alphabet objects?

> Do you think that is a good idea to attach this classes to bugzilla? Do we
> open a new bug or there's one for this sequence class debate already open?

Your proposals do seem very broad, so have a look at Bug 2351 first,
but perhaps start a new enhancement bug, and then attach the code.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri May 23 10:06:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 06:06:44 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231006.m4NA6itj022486@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 06:06 EST -------
I've worked out that the original problem was due to trying to parse XML output
with the Bio.Blast.NCBIStandalone.PSIBlastParser (which expects the plain text
output only).  Perhaps the error message could be more helpful in this
situation?
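
One possible shape for such a check, sketched independently of the
Biopython code: peek at the start of the input and raise a descriptive
error if it looks like XML. The function name check_plain_text_blast is
hypothetical, the handle is assumed to be seekable (a file on disk, not a
pipe), and the sample inputs are invented stand-ins for real BLAST output.

```python
import io


def check_plain_text_blast(handle):
    """Raise a descriptive ValueError if the handle appears to contain XML."""
    start = handle.read(200)
    handle.seek(0)  # rewind so a real parser would still see the whole file
    if start.lstrip().startswith("<?xml"):
        raise ValueError(
            "Input looks like BLAST XML output; the plain text parser "
            "expects plain text output (blastpgp -m 0)."
        )
    return handle


xml_handle = io.StringIO('<?xml version="1.0"?>\n<BlastOutput>\n')
text_handle = io.StringIO("BLASTP 2.2.18\n")

try:
    check_plain_text_blast(xml_handle)
    message = None
except ValueError as err:
    message = str(err)

first_line = check_plain_text_blast(text_handle).readline().strip()
print(message)
print(first_line)
```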

I'm using Biopython from CVS, but it seems to parse the plain text output from
both 2.2.15 and 2.2.18 fine.  Here is a modified version of your code which
reads from the example plain text files provided:

#!/usr/bin/env python
#
import os, re, string, operator
from Bio.Blast import NCBIStandalone
from sys import *

E_VALUE_THRESH = 0.005

nolf = re.compile('\n')
nogaps = re.compile('-')

blast_out = open("blastpgp.2.2.18.txt")
b_parser = NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(blast_out)

if b_record.converged == 1:
    print '*** Converged!!! ***'

fastaout = open('test_psiblast.fasta','w')
summout = open('test_psiblast.txt','w')

for alignment in b_record.rounds[-1].alignments:
    for hsp in alignment.hsps:
        if hsp.expect < E_VALUE_THRESH:
            ident = 100.0*hsp.identities[0]/hsp.identities[1]
            simil = 100.0*hsp.positives[0]/hsp.positives[1]
            mytitle = nolf.sub(' ',alignment.title)
            mysbjct = nogaps.sub('',hsp.sbjct)
            summout.write('****Alignment****\n')
            summout.write('sequence: %s\n' % mytitle[0:70])
            summout.write('e value: %e\n' % hsp.expect)
            summout.write('alignment length: %i\n' % hsp.positives[1])
            summout.write('identity:   %(ident)5.2f\n' % {'ident': ident} )
            summout.write('similarity: %(simil)5.2f\n' % {'simil': simil} )
            summout.write('query:   from %i to %i\n' % (hsp.query_start,
hsp.query_end))
            summout.write('subject: from %i to %i\n' % (hsp.sbjct_start,
hsp.sbjct_end))
            summout.write('%s ...\n' % hsp.query[0:75])
            summout.write('%s ...\n' % hsp.match[0:75])
            summout.write('%s ...\n' % hsp.sbjct[0:75])
            fastaout.write('%s\n%s\n' % (mytitle,mysbjct))

summout.close()
fastaout.close()
print "Done"

----------------------------------------------------------------------

So, as far as I can tell, the plain text PSI Blast parser is fine.

As I do not have the relevant databases installed, I have not tried using
Biopython to call blastpgp to run PSI-Blast.  It could be there is a problem
here with specifying the output format...

As to the XML output, you can sort of parse this with Bio.Blast.NCBIXML and I
think you get back an iterator yielding a record for each iteration.  However,
as the example you provided had only one query and one iteration, this should
be tested further.  The record is not showing all the information extracted by
the PSI-Blast text parser, which should be in the XML file.  Perhaps you would
like to investigate this?

Example code:

from Bio.Blast import NCBIStandalone, NCBIXML

for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] :
    print
    print filename
    print "="*len(filename)
    handle = open(filename)
    record = NCBIStandalone.PSIBlastParser().parse(handle)
    print record.query
    if record.converged : print '*** Converged!!! ***'
    for iter_round in record.rounds :
        print "Iteration with %i alignments" \
              % (len(iter_round.alignments))
        print "%i new sequences, %i reused" \
              %(len(iter_round.new_seqs), len(iter_round.reused_seqs))
    print "End of plain text output"

for filename in ["blastpgp.2.2.15.xml", "blastpgp.2.2.18.xml"] :
    print
    print filename
    print "="*len(filename)
    handle = open(filename)
    for iter_round in NCBIXML.parse(handle) :
        print iter_round.query
        print "Iteration with %i alignments" \
              % (len(iter_round.alignments))
    print "End of XML output"

The output:

blastpgp.2.2.15.txt
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
500 new sequences, 0 reused
End of plain text output

blastpgp.2.2.18.txt
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
500 new sequences, 0 reused
End of plain text output

blastpgp.2.2.15.xml
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 500 alignments
End of XML output

blastpgp.2.2.18.xml
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
End of XML output

Notice that NCBI must have changed the XML format in some way (500 versus 250
alignments between versions 2.2.15 and 2.2.18).  I have not explored this in
any detail.




From bugzilla-daemon at portal.open-bio.org  Fri May 23 10:45:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 06:45:58 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231045.m4NAjwr4023917@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #5 from ibdeno at gmail.com  2008-05-23 06:45 EST -------
Hi Peter,

Thank you. The problem must be then with the blastpgp call from Biopython,
since my code was trying to obtain plain text via the align_view='0' option:

blast_out, error_info = NCBIStandalone.blastpgp(
        blastcmd='/home/mortiz/Progs/blast-2.2.15/bin/blastpgp',
        database='/drives/databases/BlastDB/nrdb100ncbi',
        infile=file,
        npasses=passes,
        align_view='0',
        matrix_outfile=file + '.nrdb100ncbi.pssm')

However, when I print the result of this call with the handler you proposed:

handle = open("blastpgp_2.2.18.txt","w")
handle.write(blast_out.read())
handle.close()

I actually get plain text!

The same blastpgp call (same binary, same database, same input file sequence,
same number of PSI-Blast iterations) still gives the error reported in the bug
with version 2.2.18, but works all right with 2.2.15.
Because the error appears within seconds, I'm wondering if the parser is not
trying to read the results before blastpgp has actually finished the iterations
(about 3 minutes in my test)

I'm without a clue...


Miguel

(In reply to comment #4)




From bugzilla-daemon at portal.open-bio.org  Fri May 23 11:02:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 07:02:44 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231102.m4NB2iPS024763@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 07:02 EST -------
That's an interesting theory - reading directly from standard out is causing
the problem (comment 5).  One thing you could try is writing the blastpgp
output to a file, and then opening the file for reading.

I'm not sure if blastpgp has a file output option.  You could just try this:

blast_out, error_info = NCBIStandalone.blastpgp(...)
handle = open("blastpgp_2.2.18.txt","w")
handle.write(blast_out.read())
handle.close()
blast_out = open("blastpgp_2.2.18.txt")
b_parser = NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(blast_out)
...

Or, for a very crude workaround:

from time import sleep
blast_out, error_info = NCBIStandalone.blastpgp(...)
sleep(5*60) #Five minutes
b_parser = NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(blast_out)
...

If those work, it would be good evidence that your theory is right.
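
A less crude way to rule out a timing problem is to wait for the child
process to exit before reading anything. The sketch below uses the
standard subprocess module in current Python; run_to_file is a
hypothetical helper, and a trivial Python command stands in for the real
blastpgp command line so that the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(command, out_path):
    """Run a command, wait for it to exit, and leave its stdout in out_path."""
    with open(out_path, "w") as out_handle:
        # subprocess.run() only returns once the child process has finished.
        result = subprocess.run(
            command, stdout=out_handle, stderr=subprocess.PIPE, text=True
        )
    if result.returncode != 0:
        raise RuntimeError("command failed: %s" % result.stderr)
    return out_path


# Stand-in for the real blastpgp command line (an assumption for the demo).
command = [sys.executable, "-c", "print('BLASTP stand-in output')"]
out_path = os.path.join(tempfile.mkdtemp(), "blastpgp_output.txt")
run_to_file(command, out_path)
with open(out_path) as handle:
    first_line = handle.readline().strip()
print(first_line)
```

If the parser still fails on a file written this way, the output was
complete when it was read and the cause lies in the content of the file.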




From jblanca at btc.upv.es  Fri May 23 11:10:13 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 23 May 2008 13:10:13 +0200
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
References: <200805220930.53004.jblanca@btc.upv.es>
	<320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
Message-ID: <200805231310.13408.jblanca@btc.upv.es>

Hi:
After reading the suggestions in Bug 2351 I've coded a MutableSeq class that
inherits from UserString.MutableString instead of using an array stored in
self.data. It's quite easy to make it work like the MutableSeq present in
Biopython 1.45, but there are some problems to solve.
I don't know if this class would be faster or easier to maintain than the
MutableSeq that uses array.array. I've just done it as an experiment to
learn something about Biopython.

Now the compatibility problems that I have found...

self.data is not an array but a str. That's not easy to solve because
MutableString uses self.data internally. I tried to define a property,
but MutableString is an old-style class. Maybe I don't know enough Python,
but I don't know how to solve this type mismatch.

append() and extend() could be coded using __add__(). insert() and remove() 
are not supported by MutableSeq and would have to be coded. But I don't see 
the point of these methods in a sequence class. I think that the Seq and the 
MutableSeq API should be as similar as possible and since Seq uses __add__() 
I don't understand why MutableSeq should use append() and extend().

I also have problems with del seq[2:4:-1] and seq[2::3] = "N" * len(seq[2::3])
All the other tests for MutableSeq just work. 
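
For illustration, a toy list-backed class (ListBackedSeq is a hypothetical name, not the Biopython MutableSeq) showing how extended-slice assignment and deletion can be delegated to a Python list, and how "+=" can stand in for append() and extend(). This is only a sketch of the idea under those assumptions, not the class described above.

```python
class ListBackedSeq:
    """Toy mutable sequence storing one character per list element."""

    def __init__(self, data=""):
        self._chars = list(data)

    def __str__(self):
        return "".join(self._chars)

    def __len__(self):
        return len(self._chars)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return ListBackedSeq(self._chars[index])
        return self._chars[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            # A list requires an extended slice to receive exactly as many
            # items as it selects, so "N" * len(seq[2::3]) fits.
            self._chars[index] = list(str(value))
        else:
            self._chars[index] = value

    def __delitem__(self, index):
        del self._chars[index]

    def __add__(self, other):
        return ListBackedSeq(self._chars + list(str(other)))

    def __iadd__(self, other):
        # In-place "+=" covers what append() and extend() would do.
        self._chars.extend(str(other))
        return self


seq = ListBackedSeq("ACGTACGTAC")
seq[2::3] = "N" * len(seq[2::3])
masked = str(seq)
del seq[2:4:-1]  # selects nothing, so the sequence is unchanged
unchanged = str(seq)
del seq[0:4:2]
trimmed = str(seq)
seq += "GG"
extended = str(seq)
```

Note that the slice 2:4:-1 selects no positions at all, so deleting it is a no-op on a list.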

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From bugzilla-daemon at portal.open-bio.org  Fri May 23 12:38:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 08:38:28 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231238.m4NCcS0S028452@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #7 from ibdeno at gmail.com  2008-05-23 08:38 EST -------
Unfortunately the hypothesis was not correct.
If I create an intermediate file, the parser works well if the file comes from
blastpgp 2.2.15 but chokes on 2.2.18.

There is a new reference in 2.2.18 header:

Reference for compositional score matrix adjustment: Altschul, Stephen F., 
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.


which falls between the two existing ones in the 2.2.15 version and makes the
header longer in terms of number of lines... Might this be it?


Miguel

(In reply to comment #6)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 14:30:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 10:30:42 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231430.m4NEUgVL001388@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 10:30 EST -------
I'm using the CVS version of Biopython under Linux.  The main file
NCBIStandalone.py hasn't changed since Biopython 1.45, although Record.py has.

I am a little puzzled about why I can parse both the 2.2.15 and the 2.2.18
plain text examples you provided without problems, but something fails for you.
 Could you double check what happens on your machine using these two example
files from attachment 922 comment 3, and this code I gave in comment 4:

from Bio.Blast import NCBIStandalone
for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] :
    print
    print filename
    print "="*len(filename)
    handle = open(filename)
    record = NCBIStandalone.PSIBlastParser().parse(handle)
    print record.query
    if record.converged : print '*** Converged!!! ***'
    for iter_round in record.rounds :
        print "Iteration with %i alignments" \
              % (len(iter_round.alignments))
        print "%i new sequences, %i reused" \
              %(len(iter_round.new_seqs), len(iter_round.reused_seqs))
    print "End of plain text output"

If this doesn't work, please give the full stack trace - "chokes" is a little
vague.

Looking at the example files you provided in attachment 922 comment 3, they
seem to have replaced one reference with another.  This is the start of the
diff output comparing the two files:

1c1
< BLASTP 2.2.15 [Oct-15-2006]
---
> BLASTP 2.2.18 [Mar-02-2008]
10,15c10,13
< Reference for composition-based statistics:
< Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
< Sergei Shavirin, John L. Spouge, Yuri I. Wolf,  
< Eugene V. Koonin, and Stephen F. Altschul (2001), 
< "Improving the accuracy of PSI-BLAST protein database searches with 
< composition-based statistics and other refinements",  Nucleic Acids Res. 29:2994-3005.
---
> Reference for compositional score matrix adjustment: Altschul, Stephen F., 
> John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
> Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
> using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

This reference change doesn't seem to cause a problem on my machine.  I didn't
notice anything else worth commenting about.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 15:02:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:02:10 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231502.m4NF2AZm003440@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #9 from ibdeno at gmail.com  2008-05-23 11:02 EST -------
Hi Peter,

Thank you for your patience and sorry not to be clear.

1. By 'choke' I meant that it produced the same error mentioned in the original
bug report.

2. I see now that my attachments (#922) were not appropriate: to save some time
I had requested no iterations from blastpgp, that is: I used '-j 1'. I can
actually parse the plain text from 2.2.18 that I had submitted in those
attachments both with your and my code. This also explains the differences in
the headers... 

I will now submit two plain text outputs from blastpgp with 2 iterations ('-j
3'). Your code and mine can parse 2.2.15 but both fail (with the "Incorrect
header ?" error) with 2.2.18.

Sorry again...


Miguel

(In reply to comment #8)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 15:05:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:05:16 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231505.m4NF5G1k003638@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #10 from ibdeno at gmail.com  2008-05-23 11:05 EST -------
Created an attachment (id=923)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=923&action=view)
Plain text outputs from blastpgp versions 2.2.15 and 2.2.18 with 2 iterations

These files are the result of calling blastpgp with the -j 3 option.
The files sent with attachment #922 were actually not problematic; the parsing
problem only appears with blastpgp version 2.2.18 when at least one iteration
is carried out.

Perhaps due to the insertion of a new Reference in the header of the blastpgp
output?

Cheers,

Miguel


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 15:16:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:16:14 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231516.m4NFGExh004121@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 11:16 EST -------
Great - I now get the same error as you :)

I'll try and have a look at this over the weekend.  Would you be able to make
matching XML files as well?  While I'm playing with blastpgp output it would be
worth checking exactly what the XML files do...

P.S. Would you object to me using any of your examples as test cases for the
Biopython unit tests?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 15:25:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:25:19 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231525.m4NFPJVY004581@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-23 11:25 EST -------
You are right - it is the extra reference which was causing the failure.

I've checked in a fix to Bio/Blast/NCBIStandalone.py with CVS revision 1.72

Could you update your Biopython installation to CVS and retest?  Or just
replace /home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py with
revision 1.72 from the ViewCVS website once it's updated:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython

(I haven't closed this bug yet - I'd like your confirmation that this fixes
things, adding a new test case would probably be wise.)
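
For illustration only, one way a header scanner can avoid depending on a fixed number of reference blocks is to collect every block that starts with "Reference" until something else appears. This is a hypothetical sketch (read_references is an invented name and the header text is an abbreviated toy sample); it is not the change made in NCBIStandalone.py revision 1.72.

```python
def read_references(lines):
    """Collect every block that starts with 'Reference', however many exist.

    Returns (references, remaining_lines). A block runs from a line
    starting with 'Reference' up to the next blank line.
    """
    references = []
    current = None
    index = 0
    while index < len(lines):
        line = lines[index].rstrip()
        if current is not None:
            if line:
                current.append(line)
            else:
                references.append(" ".join(current))
                current = None
        elif line.startswith("Reference"):
            current = [line]
        elif line:
            # First non-blank line outside a reference block: header is over.
            break
        index += 1
    if current is not None:
        references.append(" ".join(current))
    return references, lines[index:]


# Abbreviated toy header; the reference texts are placeholders.
header = """BLASTP 2.2.18 [Mar-02-2008]

Reference: first reference (text abbreviated)

Reference for compositional score matrix adjustment: second
reference (text abbreviated)

Reference for composition-based statistics: third reference

Query= example
"""
lines = header.splitlines()
version_line, rest = lines[0], lines[1:]
references, remaining = read_references(rest)
```

With this approach a header with two, three or more references is read the same way, and parsing resumes at the first line that is not part of a reference.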


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 15:39:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:39:49 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231539.m4NFdn83005197@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #13 from ibdeno at gmail.com  2008-05-23 11:39 EST -------
Created an attachment (id=924)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=924&action=view)
XML equivalent of the files in the previous attachment (#923)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 15:41:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:41:17 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231541.m4NFfHv7005278@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #14 from ibdeno at gmail.com  2008-05-23 11:41 EST -------
I have now submitted the XML equivalent files.
Sure, please use the examples and code if you find them useful.

Cheers,


Miguel
(In reply to comment #11)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 15:42:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:42:50 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231542.m4NFgolb005350@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #15 from ibdeno at gmail.com  2008-05-23 11:42 EST -------
I will try as soon as revision 1.72 is available through the link you provided.
So far, the latest is 1.71

Thank you!


Miguel

(In reply to comment #12)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri May 23 15:56:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:56:13 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805231556.m4NFuDpd005873@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #16 from ibdeno at gmail.com  2008-05-23 11:56 EST -------
Sorry, I won't be able to try your fix until next week: I don't have access to
the computer due to maintenance.

I'll let you know as soon as possible.


Miguel

(In reply to comment #15)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sat May 24 07:16:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 03:16:44 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805240716.m4O7GiqV007275@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #17 from ibdeno at gmail.com  2008-05-24 03:16 EST -------
I have managed to access a different computer and tested your revised (1.72)
version of NCBIStandalone.py

I'm glad I can confirm it does work.

I guess the best way to avoid such problems in future would be to have an
appropriate XML parser for PSI-Blast.
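
For illustration, a minimal sketch of reading per-round hits from BLAST XML with the standard library's ElementTree. It assumes each PSI-BLAST round appears as one Iteration element; the XML below is a hand-written toy sample (not real blastpgp output), and hits_per_iteration is a hypothetical helper, not a Biopython function.

```python
import xml.etree.ElementTree as ET

# Hand-written toy sample using BLAST XML element names.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>toy|hit1</Hit_id></Hit>
        <Hit><Hit_id>toy|hit2</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>toy|hit1</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def hits_per_iteration(xml_text):
    """Return a list of (round number, [hit ids]) tuples."""
    root = ET.fromstring(xml_text)
    summary = []
    for iteration in root.iter("Iteration"):
        number = int(iteration.findtext("Iteration_iter-num"))
        hit_ids = [hit.findtext("Hit_id") for hit in iteration.iter("Hit")]
        summary.append((number, hit_ids))
    return summary


rounds = hits_per_iteration(SAMPLE_XML)
```

Because the elements are looked up by name, extra header text such as an added reference does not affect this kind of parsing.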

Thank you very much for your assistance.


(In reply to comment #12)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From peter at maubp.freeserve.co.uk  Sat May 24 11:02:51 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 24 May 2008 12:02:51 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <200805231310.13408.jblanca@btc.upv.es>
References: <200805220930.53004.jblanca@btc.upv.es>
	<320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
	<200805231310.13408.jblanca@btc.upv.es>
Message-ID: <320fb6e00805240402t68094be2v4cbad1414f3e21b9@mail.gmail.com>

Hi Jose,

Your ideas are interesting for switching the MutableSeq class from an
array of char internally to a python mutable string.  However, are you
talking about the UserString.MutableString object? The documentation
suggests it's not going to be as fast as a list or a character array:
http://pydoc.org/2.5.1/UserString.html#MutableString

Note that at some point we will be moving from Numeric to numpy, so
the exact internals of the current array based MutableSeq will change
slightly then.

I will be away most of next week, so don't worry if I seem to be ignoring you ;)

Peter


From bugzilla-daemon at portal.open-bio.org  Sat May 24 12:10:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 08:10:24 -0400
Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser
In-Reply-To: <bug-2382-42@http.bugzilla.open-bio.org/>
Message-ID: <200805241210.m4OCAOol018283@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-24 08:10 EST -------
See also http://www.bioperl.org/wiki/Qual_sequence_format where there is a
similar looking file format which they call "qual" described as also being used
by PHRAP and CAP3.  e.g.

>HSMETOO 134bp
10 20 30 40 50 50 50 50 50 20 25 25 30 30 20 15 20 35 50 50 50 50 50 50 
50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 
50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 
50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 
50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 
50 50 50 20 30 20 10 10


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sat May 24 12:15:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 08:15:23 -0400
Subject: [Biopython-dev] [Bug 2503] New: An error when parsing NCBIWWW Blast
	output
Message-ID: <bug-2503-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2503

           Summary: An error when parsing NCBIWWW Blast output
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: hebbar.prashanth at gmail.com


Hi All,
I get the following error when I start parsing NCBIWWW blast output.
Traceback (most recent call last):
 File "<pyshell#17>", line 1, in -toplevel-
   b_record = b_parser.parse(blast_results)
 File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 43, in parse
   self._scanner.feed(handle, self._consumer)
 File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 94, in feed
   has_re=re.compile(r'<b>.?BLAST'))
 File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 335, in read_and_call_until
   line = safe_readline(uhandle)
 File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
   raise SyntaxError, "Unexpected end of stream."
SyntaxError: Unexpected end of stream.
Can anyone please help me to solve this? I am using Biopython version 1.44 (I
tried with 1.45 too, and the same error occurs)
on a Windows system.
Thank you in anticipation,
Prashanth


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sat May 24 12:25:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 08:25:59 -0400
Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast
	output
In-Reply-To: <bug-2503-42@http.bugzilla.open-bio.org/>
Message-ID: <200805241225.m4OCPxTc018893@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2503


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-24 08:25 EST -------
We need more information.

Could you show us the example code that causes this problem?  

If you are trying to parse a file (e.g. from standalone blast), could you attach
it to this bug?

From the look of the stack trace, you are trying to parse the HTML output from
blast (?).  We do recommend parsing the XML output if possible (not the plain
text or HTML output).

Thank you,
Peter.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Sat May 24 14:26:27 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 24 May 2008 07:26:27 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtils
In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
Message-ID: <893127.27535.qm@web62412.mail.re1.yahoo.com>

Dear all,

I have essentially completed the parser in Bio.Entrez. AFAICT, it works with all kinds of XML files returned by NCBI's Entrez Utilities, except for the Pubmed Central database (Pubmed itself is fine). I am using this module a lot for my own work, so it has received quite a lot of testing. As a case in point, there are 40 unit tests for the Bio.Entrez parser. These, by the way, can show you some examples of how to use this module. The documentation is now also updated.

This module may at some point replace Bio.EUtils, so if you are using this module you might want to try Bio.Entrez to see if it covers everything Bio.EUtils covers.

--Michiel

Peter <biopython at maubp.freeserve.co.uk> wrote:
Bio.Entrez (Michiel). I see you've been very busy with the new
simplified XML parsers (see bug 2488).  This looks like a big
improvement on the rather repetitive coding needed in the first draft.
 Are you still actively making further refinements?  How many Entrez
XML file formats are still needed?
http://bugzilla.open-bio.org/show_bug.cgi?id=2488


From mjldehoon at yahoo.com  Sat May 24 14:16:15 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 24 May 2008 07:16:15 -0700 (PDT)
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <320fb6e00805240402t68094be2v4cbad1414f3e21b9@mail.gmail.com>
Message-ID: <135625.21242.qm@web62412.mail.re1.yahoo.com>

Peter <peter at maubp.freeserve.co.uk> wrote:
> Note that at some point we will be moving from Numeric to numpy, so
> the exact internals of the current array based MutableSeq will change
> slightly then.

MutableSeq uses Python's array, not Numeric's array, so it should not be affected by moving from Numeric to numpy.

--Michiel.

       
From biopython at maubp.freeserve.co.uk  Sun May 25 10:36:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 25 May 2008 11:36:14 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <1211479809.4835b70111c71@webmail.upv.es>
References: <200805220930.53004.jblanca@btc.upv.es>
	<320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
	<1211479809.4835b70111c71@webmail.upv.es>
Message-ID: <320fb6e00805250336u251dd2buae72397aa10374b0@mail.gmail.com>

On May 22, 2008, Blanca Postigo Jose Miguel wrote:
>> If I understood your terminology correctly, "qualities" is a list of
>> scores, one for each letter in the sequence.
> You're right. I'm sorry, I used them a lot and a reserved them a special place
> in the API, my fault, I will remove it, only the sequence should have a
> relevant place in the API, the rest should be stored as features.

I've asked on the BioSQL mailing list about this sort of "per letter"
annotation.  Currently there is no mechanism to store this sort of
thing in the schema.
http://lists.open-bio.org/pipermail/biosql-l/2008-May/001269.html

However, Hilmar did point out some relevant bits of BioPerl to have a look at:

Hilmar Lapp wrote:
> In BioPerl we have Bio::Seq::SeqWithQuality and the more generic
> Bio::Seq::MetaI.

The BioPerl SeqWithQuality sounds like what you were most interested
in  Jose, although the Meta-Interface may be of relevance too.

Peter


From biopython at maubp.freeserve.co.uk  Sun May 25 12:06:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 25 May 2008 13:06:50 +0100
Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files
In-Reply-To: <157512.3075.qm@web62408.mail.re1.yahoo.com>
References: <157512.3075.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com>

On Sun, May 18, 2008, Michiel de Hoon wrote:
> Hi everybody,
>
> In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now
> installed using a specialized install_data_biopython class. For Bio.Entrez, I am
> using the package_data argument to the setup function instead. Does anybody
> know why the install_data_biopython class was used? If there's no specific
> reason, I'd prefer to use the package_data argument instead.

I think I've found one reason not to - it doesn't seem to be supported
in Python 2.3 as shown here:

C:\TEMP\biopython_cvs\biopython_all\biopython>c:\Python23\python.exe
setup.py install
c:\Python23\lib\distutils\dist.py:227: UserWarning: Unknown distribution option:
 'package_data'
  warnings.warn(msg)
running install
...

If I'd known this earlier, I would of course have said something.  On
the other hand, I may be the only person still using Biopython with
python 2.3.
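
For illustration, a sketch of how a setup script could decide whether to pass package_data at all, given that distutils only gained that option in Python 2.4. The helper data_file_options and the file pattern are hypothetical, not taken from Biopython's setup.py.

```python
import sys


def data_file_options(package_data, version_info=None):
    """Return the setup() keyword arguments for shipping package data.

    package_data support arrived in distutils with Python 2.4, so on
    older interpreters the option is left out entirely.
    """
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) >= (2, 4):
        return {"package_data": package_data}
    return {}


# Illustrative pattern only (an assumption for this demo).
wanted = {"Bio.Entrez": ["DTDs/*.dtd"]}
options_on_23 = data_file_options(wanted, version_info=(2, 3, 5))
options_here = data_file_options(wanted)
```

On an interpreter where the option is dropped, the data files would still need another mechanism, such as a custom install_data command.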

Peter


From tiagoantao at gmail.com  Sun May 25 12:48:35 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sun, 25 May 2008 13:48:35 +0100
Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files
In-Reply-To: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com>
References: <157512.3075.qm@web62408.mail.re1.yahoo.com>
	<320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com>
Message-ID: <6d941f120805250548t357d6d0fwe36d5d1b39eaaa77@mail.gmail.com>

> If I'd known this earlier, I would of course have said something.  On
> the other hand, I may be the only person still using Biopython with
> python 2.3.

What about doing a survey (or a web poll on the site) on the main list
to know what python versions people are using? To have a sense of what
should be supported/deprecated...


From jblanca at btc.upv.es  Mon May 26 05:24:30 2008
From: jblanca at btc.upv.es (Blanca Postigo Jose Miguel)
Date: Mon, 26 May 2008 07:24:30 +0200
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com>
References: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com>
Message-ID: <1211779470.483a498e18e3e@webmail.upv.es>

> One of your points seemed to be that the SeqRecord couldn't have a
> __getitem__ and methods like reverse, complement, etc.  I don't see
> why it couldn't have these.  Perhaps rather than introducing a whole
> new class, enhancing the SeqRecord would be a better avenue.
My main concern with SeqRecord is that it has a Seq; if we want a slice or a
reverse we would do:
my_seq = SeqRecord(Seq('ACTGTGAC'))
my_seq.seq[1:5]
my_seq.seq.reverse()
If we add residue annotations (like qualities) to SeqRecord, how could they be
reversed if we are calling .seq.reverse() directly? I don't know how
this could work.
my_seq = SeqRecord(Seq('ACTG'), Qual([10,20,30,40]))
my_seq.seq.reverse()
It would create a non-valid sequence:
str(my_seq.seq) -> 'GTCA'
str(my_seq.qual) -> [10,20,30,40]
One possibility is to have methods like __getitem__ in SeqRecord and in Seq, it
would be like:
my_seq.seq[1:3]
my_seq[1:3]
Just for testing I have done a RichSeq that is compatible with Seq and
SeqRecord, but that's very confusing. Does this SeqRecord HAVE a sequence, or
IS it a sequence?
It could work, but I feel that it is wrong, and it is easier to explain to the
users that a new improved SeqRecord has been created (RichSeq) and that they
should migrate to that.
Another problem is difficult to solve. If RichSeq is compatible with Seq, as
Michiel wants and I agree, how could it also be compatible with SeqRecord? The
parameters in their constructors are not compatible:
SeqRecord(seq, ...)
Seq(data, alphabet...)
I would happily improve on RichSeq, but I don't know how to do it in a sane way.
What do you think?
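
For illustration, a toy class (QualSeq is a hypothetical name, not RichSeq and not part of Biopython) that shows the point about keeping per-letter qualities in step with the letters: slicing and reversing go through one object, so both are transformed together.

```python
class QualSeq:
    """Toy sequence that carries one quality score per letter."""

    def __init__(self, letters, qualities):
        if len(letters) != len(qualities):
            raise ValueError("need exactly one quality per letter")
        self.letters = str(letters)
        self.qualities = list(qualities)

    def __len__(self):
        return len(self.letters)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Apply the same slice to both, so they cannot drift apart.
            return QualSeq(self.letters[index], self.qualities[index])
        return self.letters[index], self.qualities[index]

    def reverse(self):
        """Return a new object with letters and qualities both reversed."""
        return self[::-1]


record = QualSeq("ACTG", [10, 20, 30, 40])
flipped = record.reverse()
piece = record[1:3]
```

Here reverse() returns a new object rather than changing the record in place; that is a design choice of this sketch, not something settled in the discussion.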

>
> Also, I do think we should bear in mind the BioSQL sequence
> representation, which we currently expose in a SeqRecord/Seq like way.
>  I wouldn't want to lose this / have to completely re-write the
> Biopython BioSQL code.
I will look into that.
Best regards,

Jose Blanca


>
> Peter
>
> On Sun, May 25, 2008 at 9:12 PM, Blanca Postigo Jose Miguel
> <jblanca at btc.upv.es> wrote:
> > Dear biopythonistas:
> > First of all my apologies for the MutableSeq reimplementation. I did it
> > just for the sake of learning more about python and Biopython, not to
> > achieve a speedier implementation. It has been a good learning exercise
> > for me, but now let's go for the meat...
> >
> > Everything that follows is just my opinion on the sequence classes. Mine
> > is not a well informed opinion and I would just like to show my ideas to
> > you to get some feedback and to learn from you.
> >
> > Since this sequence class remodelling is a complex topic I would like to
> > explain my ideas about it with some order. I won't enter into
> > implementation details, I will just discuss the API of the classes.
> > I think that Seq and MutableSeq are pretty ok, although MutableSeq has
> > some extra methods that depend on implementation and are not relevant for
> > a sequence class (append, insert, pop, remove). In general Seq and
> > MutableSeq should have the same API, that would make their use simpler.
> >
> > I think that the main problem is SeqRecord. SeqRecord IS NOT a sequence,
> > it HAS a sequence, that's its main flaw. A more capable Seq class should
> > be a Seq. My proposal is to create a RichSeq that inherits from Seq and a
> > MutableRichSeq that inherits from MutableSeq. I've been doing some coding
> > and some thinking about that. I'm discussing this with you, because I
> > would like to improve the design of the API of such a sequence and I
> > could implement it. Its main design guidelines would be:
> > - Compatible with Seq or with MutableSeq. Every time that you can use a
> >   Seq class you can also use a more capable RichSeq without changing
> >   anything in your program.
> > - RichSeq IS a Seq, it inherits from Seq.
> > - RichSeq is similar to SeqRecord, but they aren't compatible.
> >        The SeqRecord constructor is:
> >    def __init__(self, seq, id = "<unknown id>", name = "<unknown name>",
> >                 description = "<unknown description>", dbxrefs = None,
> >                 features = None):
> >        and the RichSeq one maybe:
> >    def __init__(self, seq=None, alphabet = None,
> >                 id = "<unknown id>", name = "<unknown name>",
> >                 description = "<unknown description>", dbxrefs = None,
> >                 features = None):
> >        RichSeq has a seq(or could be data) and an alphabet (like the Seq
> class) while
> > SeqRecord has a Seq object.
> >        RichSeq would not have a .seq property.
> > - RichSeq has a __getitem__ method capable of things like RichSeq[1:2]. And
> it
> > would also had the methods reverse, complement, etc.. That's not possible
> with
> > SeqRecord.
> > - RichSeq should be a new-style class; what about Seq and MutableSeq?
> > - From one of Michiel's comments:
> >        1) A Seq object is basically a string, so it should behave as if it
> >        were subclassed from string.
> >        2) As a result, functions that have a sequence as an argument, but
> >        don't need the added features of a Seq object, should work with
> >        strings as well as Seq objects.
> >        4) Currently, Seq objects have an associated alphabet; SeqRecord
> >        objects have annotations, dbxrefs, a description, features, id, and
> >        name. I think a new Seq object should have both, so that we can avoid
> >        having both a Seq and a SeqRecord class. Of course, some or all of
> >        these fields can remain None.
> > (I would add that even the seq could be None.)
> > If Biopython had a class like RichSeq I wouldn't use SeqRecord. Also, the
> > transition from using SeqRecord to RichSeq would be very easy, and both
> > classes could coexist as long as you would like.
> > Also, using the features, per-residue annotation is very easy to implement.
> > In fact I have done it already using a RichFeature class, but I would
> > discuss that in another mail.
> > RichSeq is easier to extend than SeqRecord; that's its main advantage. I
> > have pretty wild plans for a class like RichSeq. A class like SeqWithQuality
> > or the Bio::Seq::MetaI from Bioperl would be very easy to derive from
> > RichSeq. They would be just easier interfaces to the more capable and
> > general RichSeq. Even Alignment would be derived from RichSeq. An Alignment
> > IS a sequence with subsequences in it. I have also implemented a prototype
> > of that and it works quite OK with very little coding.
> > These are the more general remarks about RichSeq. What do you think? Is it
> > a good idea to go beyond SeqRecord for Biopython? Could something like
> > RichSeq be a possible way to do it?
> >
> > Now I would like to list the open discussion points regarding the sequence
> > class APIs.
> > - annotations is not in the constructor of SeqRecord. There are two options:
> > add it to the RichSeq constructor or remove it altogether. In my
> > implementation a feature can span the whole sequence length or can have a
> > range attached. In this way annotations are just a special case of features.
> > We would have to decide between dict and list for the API.
> >
> > - __getitem__ should always return a RichSeq. It's more consistent to
> > return the same type for a_seq[1:2] and a_seq[1]. If someone wants a
> > character, they can do str(seq)[1].
> >
> > - no seq property in RichSeq.
> >
> > - __str__ is enough, so tostring() is not necessary; for more complex
> > representations we have __repr__. tostring() could be kept for compatibility
> > with the Seq and MutableSeq API.
> >
> > - What to do with id, name and the str annotations when a slice is
> > requested? If seq.name is 'a_sequence', should seq[1:10].name be
> > 'a_sequence', 'a_sequence [1:10]' or ''? Same problem with __add__ and
> > __radd__. This is a problem, but one of the three alternatives should be
> > chosen and explained in the documentation. A better solution is in my
> > RichFeature class, but I won't discuss it now.
> >
> > - __iter__ iterates over the sequence as a character string.
> >
> > - __add__ and __radd__
> >
> > - .upper(), .count(), .lower()
> >
> > - .data property. I think that this is an implementation detail and it
> > should be deprecated from Seq and MutableSeq.
> >
> > Well, that's all; sorry for the long mail. I'm enjoying working on this
> > problem and learning from you.
> > Best regards,
> >
> > Jose Blanca
> >
> >
>
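
To make the quoted proposal concrete, here is a minimal sketch of the RichSeq
idea, under loud assumptions: the Seq class below is a hypothetical stand-in
and not Bio.Seq.Seq, the gc_count helper is invented for illustration, and
keeping id/name unchanged on slicing is just one of the three options listed
in the mail. It only shows the "IS a Seq" relationship and a __getitem__ that
always returns a RichSeq.

```python
class Seq:
    """Hypothetical stand-in for a Seq class (NOT the real Bio.Seq.Seq)."""

    def __init__(self, data="", alphabet=None):
        self._data = str(data)
        self.alphabet = alphabet

    def __str__(self):
        return self._data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return Seq(self._data[index], self.alphabet)


class RichSeq(Seq):
    """Sketch of the proposed class: a Seq that also carries annotation."""

    def __init__(self, seq=None, alphabet=None,
                 id="<unknown id>", name="<unknown name>",
                 description="<unknown description>",
                 dbxrefs=None, features=None):
        Seq.__init__(self, "" if seq is None else seq, alphabet)
        self.id = id
        self.name = name
        self.description = description
        self.dbxrefs = list(dbxrefs or [])
        self.features = list(features or [])

    def __getitem__(self, index):
        # Always return a RichSeq, for a_seq[1:2] and a_seq[1] alike.
        # Assumption: id, name and description are copied unchanged, and
        # features are dropped because remapping them is out of scope here.
        return RichSeq(self._data[index], self.alphabet,
                       id=self.id, name=self.name,
                       description=self.description,
                       dbxrefs=self.dbxrefs)


def gc_count(seq):
    """Hypothetical helper that only relies on str(seq)."""
    text = str(seq)
    return text.count("G") + text.count("C")


rich = RichSeq("ATGCGC", id="seq1", name="a_sequence")
piece = rich[1:3]      # a RichSeq holding "TG"
single = rich[1]       # also a RichSeq; str(single)[0] gives the character
total = gc_count(rich) # code written against Seq accepts a RichSeq unchanged
```

Because RichSeq inherits from the stand-in Seq, code written against Seq
accepts it without modification, which is the compatibility guideline in the
proposal.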


-- 


From bugzilla-daemon at portal.open-bio.org  Wed May 28 12:17:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 May 2008 08:17:25 -0400
Subject: [Biopython-dev] [Bug 2506] New: SELECT problems on
	_get_seqfeature_dbxref in Loader.py with postgresql
Message-ID: <bug-2506-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2506

           Summary: SELECT problems on _get_seqfeature_dbxref in Loader.py
                    with postgresql
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: andrea at biodec.com
                CC: andrea at biodec.com


Using: 
  - postgres 8.3 or less # the version is not important
  - BioSQL 1.0.0 installed on a postgresql database (on Linux) # the version is not important
  - python-psycopg 1.1.21-14 or less
  - python-psycopg2 2.0.5.1-6 or less
  - python 2.4.4-2 # not important
  - Biopython CVS version 28/05/08,
    - Loader.py version 1.30
  - "psycopg" or "psycopg2" as BioSeqDatabase.open_database drivers

While inserting a seq_record object derived from a GenBank Iterator into the
BioSQL database, the procedure _get_seqfeature_dbxref fails with the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/var/lib/python-support/python2.4/BioSQL/BioSeqDatabase.py", line 420, in load
    db_loader.load_seqrecord(cur_record)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 50, in load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 542, in _load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 641, in _load_seqfeature_qualifiers
    seqfeature_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 679, in _load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 712, in _get_seqfeature_dbxref
    result = self.adaptor.execute_and_fetch_col0(sql, (seqfeature_id,
  File "/var/lib/python-support/python2.4/BioSQL/BioSeqDatabase.py", line 295, in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg.ProgrammingError: ERROR:  column "195" does not exist

SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id =
"195" AND dbxref_id = "207739"

The problem is an error in the query format at lines 710 and 711 of Loader.py
in Biopython/BioSQL:
    709    # Check for an existing record
    710    sql = r'SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref ' \
    711          r'WHERE seqfeature_id = "%s" AND dbxref_id = "%s"'
The query has double quotes (") around the values, and PostgreSQL interprets
double-quoted tokens as column names (identifiers), not as values.

Using single quotes around the values instead corrects the error:
    709    # Check for an existing record
    710    sql = r"SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref " \
    711          r"WHERE seqfeature_id = '%s' AND dbxref_id = '%s'"
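
As a hedged aside, an alternative to choosing a quote character is to leave
quoting to the database driver by passing the values as query parameters. The
sketch below is not the Loader.py code: it uses Python's built-in sqlite3
module so that it runs without a server, with a simplified two-column copy of
the table. sqlite3 uses "?" as its placeholder, while psycopg and MySQLdb use
an unquoted %s in the same position.

```python
import sqlite3

# In-memory database standing in for a BioSQL server (assumption: only the
# two columns used by the failing query are modelled here).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE seqfeature_dbxref (seqfeature_id INTEGER, dbxref_id INTEGER)"
)
cur.execute("INSERT INTO seqfeature_dbxref VALUES (?, ?)", (195, 207739))

# No quotes around the placeholders: the driver binds the values itself.
sql = (
    "SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref "
    "WHERE seqfeature_id = ? AND dbxref_id = ?"
)
cur.execute(sql, (195, 207739))
rows = cur.fetchall()
conn.close()
```

With bound parameters the same statement text should behave the same way
across back ends, apart from the placeholder style each driver expects.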


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Wed May 28 12:31:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 28 May 2008 05:31:33 -0700 (PDT)
Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files
In-Reply-To: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com>
Message-ID: <499799.68733.qm@web62408.mail.re1.yahoo.com>

That's odd ... I had tried with a Python version 2.3, and it worked there. Maybe this feature was added during the Python 2.3 cycle.
Then, I guess we need to use the install_data_biopython class for now, and start using package_data once we stop supporting Python 2.3.

--Michiel

Peter <biopython at maubp.freeserve.co.uk> wrote: On Sun, May 18, 2008, Michiel de Hoon wrote:
> Hi everybody,
>
> In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now
> installed using a specialized install_data_biopython class. For Bio.Entrez, I am
> using the package_data argument to the setup function instead. Does anybody
> know why the install_data_biopython class was used? If there's no specific
> reason, I'd prefer to use the package_data argument instead.

I think I've found one reason not to - it doesn't seem to be supported
in Python 2.3 as shown here:

C:\TEMP\biopython_cvs\biopython_all\biopython>c:\Python23\python.exe
setup.py install
c:\Python23\lib\distutils\dist.py:227: UserWarning: Unknown distribution option:
 'package_data'
  warnings.warn(msg)
running install
...

If I'd known this earlier, I would of course have said something.  On
the other hand, I may be the only person still using Biopython with
python 2.3.

Peter


From fkauff at biologie.uni-kl.de  Thu May 29 09:20:56 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Thu, 29 May 2008 11:20:56 +0200
Subject: [Biopython-dev] CVS access and developers web site
Message-ID: <483E7578.50402@biologie.uni-kl.de>

Hi folks,

although I've been quiet for a while, I'm still making some changes to
the Nexus parser of biopython from time to time.... I have totally lost my
passwords to access the repository. Could someone please send me a new
password to get write access to cvs? And I would also like to change the
information on the biopython developers web site, as it is somewhat
outdated.
And is this the right place to ask for such things?

Thanks!

Frank


From bugzilla-daemon at portal.open-bio.org  Thu May 29 10:47:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 May 2008 06:47:29 -0400
Subject: [Biopython-dev] [Bug 2506] SELECT problems on
	_get_seqfeature_dbxref in Loader.py with postgresql
In-Reply-To: <bug-2506-42@http.bugzilla.open-bio.org/>
Message-ID: <200805291047.m4TAlT18002239@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2506


------- Comment #1 from andrea at biodec.com  2008-05-29 06:47 EST -------
Created an attachment (id=926)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=926&action=view)
Proposed patch




From p.j.a.cock at googlemail.com  Thu May 29 21:46:46 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 May 2008 22:46:46 +0100
Subject: [Biopython-dev] CVS access and developers web site
In-Reply-To: <483E7578.50402@biologie.uni-kl.de>
References: <483E7578.50402@biologie.uni-kl.de>
Message-ID: <320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>

Hi Frank,

I would try emailing support at helpdesk.open-bio.org using the email
address associated with your CVS username.  If you've changed email
address, and you run into problems, I expect Michiel or I could vouch
for you.

For the website, the wiki usernames are entirely separate and you
should be able to create a new account if you don't have one already.
If you want to update the tutorial, new HTML and PDF files are loaded
with each release from the version in CVS.

Peter

On Thu, May 29, 2008 at 10:20 AM, Frank Kauff <fkauff at biologie.uni-kl.de> wrote:
> Hi folks,
>
> although I've been quiet for a while, I'm still making some changes to the
> Nexus parser of biopython from time to time.... I have totally lost my
> passwords to access the repository. Could someone please send me a new
> password to get write access to cvs? And I would also like to change the
> information on the biopython developers web site, as it is somewhat outdated.
> And is this the right place to ask for such things?
>
> Thanks!
>
> Frank


From bugzilla-daemon at portal.open-bio.org  Fri May 30 11:15:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 May 2008 07:15:23 -0400
Subject: [Biopython-dev] [Bug 2506] SELECT problems on
	_get_seqfeature_dbxref in Loader.py with postgresql
In-Reply-To: <bug-2506-42@http.bugzilla.open-bio.org/>
Message-ID: <200805301115.m4UBFNE3011942@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2506


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-30 07:15 EST -------
Thanks for the report.  I've fixed this issue (method _get_seqfeature_dbxref at
line 710) and a similar one (in _get_bioentry_dbxref at line 761) in CVS
BioSQL/Loader.py revision 1.31

Note that I have only tested this with MySQL under Linux.




From biopython at maubp.freeserve.co.uk  Fri May 30 14:17:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 30 May 2008 15:17:08 +0100
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <893127.27535.qm@web62412.mail.re1.yahoo.com>
References: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>
	<893127.27535.qm@web62412.mail.re1.yahoo.com>
Message-ID: <320fb6e00805300717v60f0b153i88b5e9a8aee1744c@mail.gmail.com>

On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.

The internals of the new design look more complicated to start with,
but I can see how much more general it is than the older versions :)

Should it work starting from an empty DTDs folder - or will we ship
Biopython with most of the current files?  I've had trouble with
Biopython trying to fetch missing DTD files from the internet.  I
think the problem is the NCBI using relative URLs.  The following
quick hack seems to help in Parser.py but only in some cases (because
as listed below, the NCBI have two different base paths):

279,280c279,288
<             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<             handle = urllib.urlopen(systemId)
---
>             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>             if "/" in systemId :
>                 #Assume this is a full path, e.g.
>                 #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>                 handle = urllib.urlopen(systemId)
>             else :
>                 #Its a relative path, and I'm not sure how to best get the base path:
>                 handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)

(Also note there seem to be some tab/space issues in this file).
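
For what it's worth, the standard library can do the relative-versus-absolute
check in the hack above. This is only a sketch, not a patch for Parser.py:
the function name resolve_dtd_url is made up, and the base URL is an
assumption that covers just one of the two NCBI base paths mentioned in this
mail. It is written for Python 3, where urljoin lives in urllib.parse (in
Python 2 it was in the urlparse module).

```python
from urllib.parse import urljoin

# Assumed default base; DTDs under http://www.ncbi.nlm.nih.gov/dtd/ would
# need a different base, which this sketch does not try to guess.
DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def resolve_dtd_url(system_id, base=DTD_BASE):
    """Return an absolute URL for a DTD system identifier.

    urljoin leaves an absolute system_id untouched and resolves a
    relative one against the base.
    """
    return urljoin(base, system_id)


relative = resolve_dtd_url("eSummary_041029.dtd")
absolute = resolve_dtd_url(
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd"
)
```

The open question from the mail remains: the right base path for a relative
system identifier still has to come from somewhere, for example from the URL
the XML itself was fetched from.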

From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:

egquery.dtd
eSearch_020511.dtd
nlmcommon_080101.dtd
pubmed_080101.dtd
eInfo_020511.dtd
eSpell.dtd
nlmmedline_080101.dtd
taxon.dtd
eLink_020511.dtd
eSummary_041029.dtd
nlmmedlinecitation_080101.dtd
uilist.dtd
ePost_020511.dtd
nlmsharedcatcit_080101.dtd

Additionally, http://www.ncbi.nlm.nih.gov/dtd/ provided some further
DTD files needed for the test_Entrez.py unit test:

NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd

With all the above files in place, the unit test file test_Entrez.py
no longer gives any missing DTD warnings, but it still has a couple of
failures.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri May 30 15:15:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 May 2008 11:15:16 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
	2.2.18 though works with blastpgp 2.2.15
In-Reply-To: <bug-2502-42@http.bugzilla.open-bio.org/>
Message-ID: <200805301515.m4UFFGhJ024631@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502


------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk  2008-05-30 11:15 EST -------
The XML parser seems to be fine on your example output.  However, the XML
output does not appear to list/flag any difference between:

"Sequences used in model and found again"
"Sequences not found previously or not previously below threshold"

This means there is no way to populate the .new_seqs and .reused_seqs lists. 
If you care about this information, then for now using the plain text output
might be best.

