From ytu888 at hotmail.com  Mon Oct  1 07:39:50 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 1 Oct 2007 06:39:50 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FD5927.3000207@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <BAY119-W24ADB785F1713680E663638FAD0@phx.gbl>


Thanks Peter, 

However, I still haven't installed the mxTextTools module on my Mac. Also, could you tell me how to run the ReportLab test file from inside Python, that is, after launching Python and importing the test file? Thanks.


> Date: Fri, 28 Sep 2007 20:42:31 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com
> CC: biopython at lists.open-bio.org
> Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X
> 
> Y Tu wrote:
> > Thank you, Peter for the prompt answer.
> > 
> > I did install the PIL already and tested with the commands "from PIL
> > import Image", then "import _imaging". Both commands succeeded.
> > That's why I don't understand why the test won't work. I used the
> > command "python test_pdfgen_general.py" under the shell prompt, which
> > generated the error. Since I installed PIL and succeeded in importing
> > the module of PIL, I thought maybe I can solve the problem by running
> > the test under Python.
> 
> Looking in more detail at the original stack trace,
> 
> >   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
> >     d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
> >   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
> >     raise IOError("decoder %s not available" % decoder_name)
> > IOError: decoder jpeg not available
> 
> It's possible that PIL needs some optional JPEG library, which ReportLab 
> wants to use.  I suggest you search the ReportLab website and users' 
> mailing list, and if you can't work out what is wrong, sign up to their 
> mailing list and ask them: http://www.reportlab.org/
> 
> Very little of Biopython needs ReportLab, you should be able to install 
> Biopython without it.
> 
> Peter
> 
> 


From ytu888 at hotmail.com  Mon Oct  1 13:54:00 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 1 Oct 2007 12:54:00 -0500
Subject: [BioPython]  Error for installation of  MySALdb on Mac OS X
In-Reply-To: <46FD5927.3000207@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>


I downloaded mysql-5.0.45-osx10.4-i686.dmg from the MySQL website and installed it. Then I tried to install MySQL-python-1.2.2 but got the following error. How do I create the mysql_config.path file? Thank you very much.

leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$ python setup.py build
sh: line 1: mysql_config: command not found
Traceback (most recent call last):
  File "setup.py", line 16, in <module>
    metadata, options = get_config()
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config
    raise EnvironmentError, "%s not found" % mysql_config.path
EnvironmentError: mysql_config not found


From lists.steve at arachnedesign.net  Mon Oct  1 16:18:04 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Mon, 1 Oct 2007 16:18:04 -0400
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
Message-ID: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>

> I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and  
> installed it. Then I tried to install MySQL-python-1.2.2 but got  
> the following error. How to create the mysql_config.path file?  
> Thank you very much.
>
> leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$  
> python setup.py build
> sh: line 1: mysql_config: command not found

It seems as if you need to have the `mysql_config` command in your  
PATH variable and it's not there.

Look for where mysql was installed (maybe /usr/local/mysql/...) and  
add its bin directory to your PATH environment variable. Or maybe it  
installed some binaries/symlinks into your /usr/local/bin directory?

I think that'll do it for you.
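
To illustrate the suggestion above: the shell finds `mysql_config` by checking each directory listed in PATH, in order. A minimal Python sketch of that lookup follows. The helper name `find_on_path` is made up for this illustration and is not part of MySQL-python or Biopython.

```python
import os


def find_on_path(program, path=None):
    """Return the first executable file named `program` found in the
    directories listed in `path` (default: the PATH environment variable),
    or None if there is no match."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, program)
        # The file must exist and be executable by the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("mysql_config")
    if location is None:
        print("mysql_config is not on PATH; setup.py will not find it either")
    else:
        print("mysql_config found at " + location)
```

If this reports that nothing was found, the build step is likely to fail in the same way, because it relies on the same PATH lookup.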

-steve


From biopython at maubp.freeserve.co.uk  Mon Oct  1 17:06:37 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 1 Oct 2007 22:06:37 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <BAY119-W24ADB785F1713680E663638FAD0@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W24ADB785F1713680E663638FAD0@phx.gbl>
Message-ID: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>

On 10/1/07, Y Tu <ytu888 at hotmail.com> wrote:
>
> Thanks Peter,
>
> However, I still haven't install mxText module in my Mac yet.

I see you've signed up to the eGenix mailing list - I hope they can
solve your mxTextTools installation problems.

> Also could you tell me how to run the test file of ReportLab, when I
> launch Python and then import the test file into the python. Thanks.

In general I think most tests are designed to be run from the command
line, not by running python, typing an import statement, and typing
another command.  You should check the ReportLab documentation to see
what they recommend.

To run a specific Biopython unit test, such as the general graphics
unit test, you would do this:

python run_tests.py test_GraphicsGeneral.py

That would run the test, and check the output matched the expected
results.  Alternatively, you can do:

python test_GraphicsGeneral.py
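
As a rough illustration of why importing a test file often does nothing visible: many Python test scripts only start their tests inside an `if __name__ == "__main__":` block, which runs when the file is executed from the command line but not when it is imported. The sketch below is a generic `unittest` example with made-up names. It is not taken from ReportLab or Biopython, whose test files may be organised differently.

```python
import unittest


class ArithmeticTest(unittest.TestCase):
    """A stand-in test case; the name and contents are hypothetical."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def run_tests():
    """Run the tests explicitly, e.g. after importing this file."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ArithmeticTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    # Reached with "python this_file.py", but skipped on "import this_file".
    run_tests()
```

After an import, a file written this way would need an explicit call such as `run_tests()` before anything happens.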

I hope that helps.

Peter

From ULNJUJERYDIX at spammotel.com  Tue Oct  2 02:52:53 2007
From: ULNJUJERYDIX at spammotel.com (Kevin Lam)
Date: Tue, 2 Oct 2007 14:52:53 +0800
Subject: [BioPython] Fwd: **Fwd: [Bioperl-l] divide and blast blastunsplit
	blast subsequence
In-Reply-To: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
Message-ID: <5b6410e0710012352s520b537bj7374dd874dc93104@mail.gmail.com>

Hi!
I am trying to annotate a 200 kb sequence by running blastx to locate its
protein-coding regions.
I need to split the sequence up so that I get the best hits for each region
(if I search with the whole sequence, the top BLAST hits mask the smaller
proteins).
If I were doing this manually, I could set the subsequence range in the web
GUI for NCBI's BLAST.
That way, the BLAST hit coordinates are relative to the whole 200 kb.

But I can't find this option in BLAST, or a straightforward way to do it in
BioPerl.

I found similar solutions, such as divide and blast
http://www.bio.davidson.edu/projects/DAB/DAB.html
(but I want to specify coordinates rather than fixed intervals).

There is also this from the BioPerl archives:
http://bioinformatics.org/pipermail/bioclusters/2002-August/000375.html

But isn't there an easier way, where I can tell BLAST to use subsequence
200-900 of a FASTA file and get the blastx hits back in coordinates relative
to the whole 200 kb?
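
One way to get whole-sequence coordinates, sketched under assumptions: cut out the region yourself, BLAST the fragment with any tool, then shift the reported query coordinates by the fragment's start. The function names below are hypothetical, coordinates are taken to be 1-based and inclusive, and the BLAST step itself is left out.

```python
def extract_subsequence(sequence, start, end):
    """Return sequence[start..end], with 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates outside the sequence")
    return sequence[start - 1:end]


def to_whole_coords(hit_start, hit_end, sub_start):
    """Map hit coordinates reported on a fragment back onto the whole
    sequence, given the 1-based position where the fragment starts."""
    offset = sub_start - 1
    return hit_start + offset, hit_end + offset


if __name__ == "__main__":
    whole = "ACGT" * 250                              # stand-in 1000-base sequence
    fragment = extract_subsequence(whole, 200, 900)  # region to send to BLAST
    # Suppose a hit is reported at positions 1-30 of the fragment (made-up values).
    print(to_whole_coords(1, 30, 200))
```

The same shift applies to every hit from a given fragment, so arbitrary regions can be searched separately and their results merged afterwards.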

From mdehoon at c2b2.columbia.edu  Tue Oct  2 05:06:54 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 05:06:54 -0400
Subject: [BioPython] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>

Hi everybody,

Since no users of Bio.MultiProc came forward, I deprecated it for the
upcoming release.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
Sent: Tue 9/11/2007 10:37 AM
To: BioPython Developers List; biopython at biopython.org
Subject: [BioPython] Bio.MultiProc
 
Hi everybody,

In preparation for the upcoming release, I was running the Biopython 
test suite and found that test_copen.py hangs on Cygwin. It doesn't 
fail, it just sits there forever. This may be related to the use of 
fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it 
is probably possible to fix this, I'd have to dig fairly deep into the 
code, and I am not sure if it is worth it. It looks like the copen 
functions are used only in Bio/config, which is needed for Bio.db. A 
description of the functionality of this module can be found in the 
tutorial, section 4.7.2.

Now, I don't remember users asking about this module on the mailing 
list. From the tutorial documentation, it seems to be a nice piece of 
code, but I doubt that it is being used often in practice.

So I was wondering:
1) Is anybody on this list using this code?
2) If not, can I mark it as deprecated for the upcoming release? 
Hopefully, people who are using this code will notice, and let us know 
that they need it.

--Michiel.
_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From ytu888 at hotmail.com  Tue Oct  2 07:36:58 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 06:36:58 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W24ADB785F1713680E663638FAD0@phx.gbl> 
	<320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>
Message-ID: <BAY119-W116B22B2B8E4727BB9196F8FAE0@phx.gbl>


Thank you very much, Peter.



From ytu888 at hotmail.com  Tue Oct  2 08:29:46 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 07:29:46 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID: <BAY119-W23852A91CA4B6611B79BC58FAE0@phx.gbl>


Hi Steve,

I checked the PATH and added /usr/local/mysql/bin to it, but I still get the same error message when running setup.py.

Thanks.



From idoerg at gmail.com  Tue Oct  2 12:00:41 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Tue, 2 Oct 2007 09:00:41 -0700
Subject: [BioPython] [Biopython-dev]  Bio.MultiProc
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
References: <46E6A845.3030601@c2b2.columbia.edu>
	<6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <b5bbbc970710020900n66c9816bs311fa29eb52d3f25@mail.gmail.com>

Would it be possible to include the module, comment out the unworkable
source code and print a deprecation warning when it is imported? That way
we:

1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that it
is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is not
lost forever).
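
The import-time warning described above could look roughly like this. It is a generic sketch using the standard `warnings` module, with a made-up function name and message; it is not the code that went into Bio.MultiProc.

```python
import warnings


def warn_deprecated(module_name):
    """Emit a DeprecationWarning for `module_name`.

    In a real module the warnings.warn(...) call would sit at the top level
    of the file, so that it fires once when the module is first imported.
    """
    warnings.warn(
        "%s is deprecated and may be removed in a future release; "
        "please contact the developers if you still need it." % module_name,
        DeprecationWarning,
        stacklevel=2,
    )


if __name__ == "__main__":
    # DeprecationWarning is hidden by default in many Python versions,
    # so switch the filter on to make the message visible here.
    warnings.simplefilter("always", DeprecationWarning)
    warn_deprecated("Bio.MultiProc")
```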

Also, is it possible to track down the original author?

./I



-- 

I. Friedberg

"The only problem with troubleshooting is that
sometimes trouble shoots back."

From mdehoon at c2b2.columbia.edu  Tue Oct  2 20:18:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 20:18:59 -0400
Subject: [BioPython] [Biopython-dev]  Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu><6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
	<b5bbbc970710020900n66c9816bs311fa29eb52d3f25@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu>

> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?

That is what I did.

> 3) Leave an option of fixing and commenting the code back in (i.e. it is
> not lost forever).

Even after removing the code in some future release, the code will not be
lost forever. It can always be retrieved from CVS and from older Biopython
releases.

> Also, is it possible to track down the original author?

That would be Jeff Chang.

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032




From ytu888 at hotmail.com  Wed Oct  3 08:44:32 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Wed, 3 Oct 2007 07:44:32 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID: <BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>


Here is a copy of the Terminal output. Please help me find out what's wrong. Thanks.


Last login: Wed Oct  3 08:28:38 on ttyp4
Welcome to Darwin!
LeesComputer:~ Lee$ echo $PATH
/Library/Frameworks/Python.framework/Versions/Current/bin:/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin
LeesComputer:~ Lee$ cd /applications/python_bio/MySQL-python-1.2.2
LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ python setup.py build
sh: line 1: mysql_config: command not found
Traceback (most recent call last):
  File "setup.py", line 16, in <module>
    metadata, options = get_config()
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config
    raise EnvironmentError, "%s not found" % mysql_config.path
EnvironmentError: mysql_config not found
LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ cd /usr/local
LeesComputer:/usr/local Lee$ ls -al
total 8
drwxr-xr-x    8 root  wheel  272 Oct  1 13:02 .
drwxr-xr-x   10 root  wheel  340 Sep 26 11:30 ..
drwxr-xr-x    8 root  admin  272 Aug  6 04:00 ActivePerl-5.8
drwxr-xr-x   15 root  wheel  510 Oct  2 03:52 bin
drwxr-xr-x    6 root  wheel  204 Sep 27 05:22 include
drwxr-xr-x   12 root  wheel  408 Sep 27 05:21 lib
lrwxr-xr-x    1 root  wheel   25 Oct  1 13:02 mysql -> mysql-5.0.45-osx10.4-i686
drwxr-xr-x   19 root  wheel  646 Jul  4 13:54 mysql-5.0.45-osx10.4-i686



From lists.steve at arachnedesign.net  Wed Oct  3 09:01:09 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 09:01:09 -0400
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
Message-ID: <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>

Hi,

On Oct 3, 2007, at 8:44 AM, Y Tu wrote:

> Here is the copy of the output in the Terminal. Please help me to  
> find out what's wrong. Thanks.
>
> Last login: Wed Oct  3 08:28:38 on ttyp4
> Welcome to Darwin!
> LeesComputer:~ Lee$ echo $PATH
> /Library/Frameworks/Python.framework/Versions/Current/bin:/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin

It still looks like your PATH is screwed up: /usr/local/mysql/bin  
isn't in there. Instead you have:
/usr/local/mysql:/bin
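
To make the diagnosis concrete, the PATH string from the terminal output can be split on ':' the same way the shell does. This is only a sketch for inspection; the variable names are arbitrary.

```python
# PATH value copied from the terminal output earlier in the thread.
path_value = (
    "/Library/Frameworks/Python.framework/Versions/Current/bin:"
    "/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin"
)

entries = path_value.split(":")

for entry in entries:
    print(entry)

# "/usr/local/mysql" and "/bin" are two separate entries, so the directory
# that presumably holds mysql_config (/usr/local/mysql/bin) is never searched.
print("/usr/local/mysql/bin" in entries)
```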

Here's a test. Open up a terminal and type:

$ which mysql_config

If you don't get an answer back that indicates that the system can  
find the binary, then your script won't either. For instance, this is  
how it looks for me:

$ which mysql_config
/Library/MySQL/bin/mysql_config

(I have an older version of mysql, which was installed into
/Library/MySQL)

Yours should say:

$ which mysql_config
/usr/local/mysql/bin/mysql_config

Or something like that.

Try that and see ...

-steve

From lists.steve at arachnedesign.net  Wed Oct  3 10:47:41 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 10:47:41 -0400
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
Message-ID: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>

> Steve, thank you very much. It fixed the problem and I got through  
> the build and install steps. But when I tested the installation  
> inside Python I got the following error. Please help me with it.  
> Thanks.
>
> >>> import MySQLdb
> /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path
>   import sys, pkg_resources, imp
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "MySQLdb/__init__.py", line 19, in <module>
>     import _mysql
>   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in <module>
>   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__
> ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
>   Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
>   Reason: image not found


Sorry, don't know exactly what's happening here. Is this from a  
"fresh" python prompt?

How did you install MySQLdb, did you use easy_install? If so, try to  
install from the sourceforge download.

Try to remove it, remove the "build" directory from your mysqldb  
download and redo the whole
python setup.py build / python setup.py install process

To remove it, nuke this:
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg

And try to reinstall?

Perhaps someone who knows what the problem is here can give you a  
better idea on what to do.

-steve

From sbassi at gmail.com  Thu Oct  4 02:47:44 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 03:47:44 -0300
Subject: [BioPython] Problem with blast xml
Message-ID: <b43bf2080710032347r1791e181t569cecb1051ae0ea@mail.gmail.com>

I am having a problem that does not originate in Biopython, but it
affects the Biopython (1.43) BLAST XML parser.
I have two XML files; one can be parsed and the other can't.
Here are the commands I ran to get the XML files:

sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TAB.fasta -e 0.0001 -m 7 -o TABB2.xml
sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TABv2.fasta -e 0.0001 -m 7 -o TABB2v2.xml

The only relevant difference is the input file: the sequences are
different, but the output files should have the same format (shouldn't
they?).
When I parse the files, I find that this is not the case.
This is the file that can be parsed without problems:
This is the file that can be parsed without problem:

>>> from Bio.Blast import NCBIXML
>>> bout=open('bioinfo/INTA/TABB2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 31'
>>> y.query
u'fragment 67'
>>> x.alignments
[<Bio.Blast.Record.Alignment instance at 0xb659850c>]
>>> y.alignments
[<Bio.Blast.Record.Alignment instance at 0xb65a3c6c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3cec>,
<Bio.Blast.Record.Alignment instance at 0xb65a3d8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3e8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3f8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3e4c>,
<Bio.Blast.Record.Alignment instance at 0xb65aa1ac>]

Now let's look at what seems to be a malformed(?) XML file:

>>> bout=open('bioinfo/INTA/TABB2v2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 1'
>>> y.query
u'fragment 57'
>>> x.alignments
[]
>>> y.alignments
[<Bio.Blast.Record.Alignment instance at 0xb65a374c>]

The first record has an empty alignments list.

Here is a fragment of the "normal" one (TABB2.xml):

      <Parameters_gap-extend>2</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>31</Iteration_iter-num>
      <Iteration_query-ID>lcl|31_0</Iteration_query-ID>
      <Iteration_query-def>fragment 31 </Iteration_query-def>
      <Iteration_query-len>1174</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|1788520|gb|AE000309.1|AE000309</Hit_id>
          <Hit_def>Escherichia coli K-12 MG1655 section 199 of 400 of
the complete genome</Hit_def>
          <Hit_accession>AE000309</Hit_accession>
          <Hit_len>13453</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>

Here is a fragment of the "malformed" one (TABB2v2.xml):

      <Parameters_gap-extend>2</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>400</Statistics_db-num>
          <Statistics_db-len>4662239</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>
          <Statistics_kappa>0.710603</Statistics_kappa>
          <Statistics_lambda>1.37406</Statistics_lambda>
          <Statistics_entropy>1.30725</Statistics_entropy>
        </Statistics>
      </Iteration_stat>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>57</Iteration_iter-num>

Why is this happening? Is this expected behavior?

I uploaded the xml files here:
http://www.bioinformatica.info/TABB2.xml
http://www.bioinformatica.info/TABB2v2.xml

-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From ytu888 at hotmail.com  Thu Oct  4 08:24:18 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Thu, 4 Oct 2007 07:24:18 -0500
Subject: [BioPython] Error generated by Clustalw example in Tutorial
Message-ID: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>


Hi,

I'm reading the Biopython tutorial and running the clustalw example, but it generates the following error. What's wrong? Thanks.

>>> from Bio import Clustalw

>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta"))

>>> cline.set_output("result.aln")

>>> print cline
clustalw .\opuntia.fasta -OUTFILE=result.aln

>>> alignment = Clustalw.do_alignment(cline)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file result.aln not produced, commandline: clustalw .\opuntia.fasta -OUTFILE=result.aln


From sbassi at gmail.com  Thu Oct  4 12:19:22 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 13:19:22 -0300
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
References: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
Message-ID: <b43bf2080710040919r4bb281edpac6f594d284fc940@mail.gmail.com>

On 10/4/07, Y Tu <ytu888 at hotmail.com> wrote:
> >>> print cline
> clustalw .\opuntia.fasta -OUTFILE=result.aln

I am not sure if this command is properly formatted. I don't think the
backslash (".\") should be there, but I don't have a Windows box to try this.
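
For reference, the ".\" prefix in the printed command line comes from the tutorial's os.path.join(os.curdir, "opuntia.fasta") call rather than from Biopython itself. A minimal sketch (written for Python 3; the helper name tutorial_path is only for illustration) that reproduces both spellings on any platform by calling the Windows and POSIX path modules directly:

```python
import ntpath      # Windows flavour of os.path
import posixpath   # Unix / Mac OS X flavour of os.path


def tutorial_path(path_module, filename="opuntia.fasta"):
    """Build the path the same way the tutorial does: join(curdir, filename)."""
    return path_module.join(path_module.curdir, filename)


windows_style = tutorial_path(ntpath)
posix_style = tutorial_path(posixpath)

print(windows_style)  # .\opuntia.fasta
print(posix_style)    # ./opuntia.fasta
```

Both forms simply mean "opuntia.fasta in the current directory", so the prefix on its own is probably not what makes the command fail.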


From mdehoon at c2b2.columbia.edu  Thu Oct  4 21:01:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 4 Oct 2007 21:01:59 -0400
Subject: [BioPython] Problem with blast xml
References: <b43bf2080710032347r1791e181t569cecb1051ae0ea@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>

Can you create two minimal XML files that demonstrate the problem?
For example, by removing records from the two files you have and checking if
parsing still works for one and fails for the other.
By doing so, you may be able to identify exactly what the essential
difference between the two files is.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032



From sbassi at gmail.com  Fri Oct  5 01:39:44 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Fri, 5 Oct 2007 02:39:44 -0300
Subject: [BioPython] Problem with blast xml
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
References: <b43bf2080710032347r1791e181t569cecb1051ae0ea@mail.gmail.com>
	<6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
Message-ID: <b43bf2080710042239t5bcb9c22u6cffdd915bf5a862@mail.gmail.com>

On 10/4/07, Michiel De Hoon <mdehoon at c2b2.columbia.edu> wrote:
> Can you create two minimal XML files that demonstrate the problem?
> For example, by removing records from the two files you have and checking if
> parsing still works for one and fails for the other.
> By doing so, you may be able to identify exactly what the essential
> difference between the two files is.

After some tests, I found two minimal XML files with this issue:
http://www.bioinformatica.info/mitoA.xml
http://www.bioinformatica.info/mitoB.xml

(only 3.5 kb each).



From mdehoon at c2b2.columbia.edu  Fri Oct  5 02:34:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 5 Oct 2007 02:34:56 -0400
Subject: [BioPython] Problem with blast xml
References: <b43bf2080710032347r1791e181t569cecb1051ae0ea@mail.gmail.com><6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
	<b43bf2080710042239t5bcb9c22u6cffdd915bf5a862@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B631@mail2.exch.c2b2.columbia.edu>

Looking at the XML files, it seems that the Biopython Blast XML parser
is doing the right thing. Isn't it?
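
To illustrate: an <Iteration> that contains only an <Iteration_stat> block and no <Iteration_hits> is a query with no hits, so an empty alignments list is a reasonable result. Below is a minimal sketch (Python 3, standard library only, not the Biopython parser) that lists such iterations. The SAMPLE string is a heavily trimmed, made-up stand-in modelled on the fragments quoted in this thread, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for BLAST XML output; real files contain many more elements.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_stat>
        <Statistics><Statistics_db-num>400</Statistics_db-num></Statistics>
      </Iteration_stat>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>57</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_num>1</Hit_num></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def iterations_without_hits(xml_text):
    """Return the iter-num of every <Iteration> that has no <Hit> children."""
    root = ET.fromstring(xml_text)
    empty = []
    for iteration in root.iter("Iteration"):
        if not iteration.findall("Iteration_hits/Hit"):
            empty.append(iteration.findtext("Iteration_iter-num"))
    return empty


print(iterations_without_hits(SAMPLE))  # ['1']
```

With the Biopython records themselves, the equivalent check would be to skip any record whose alignments list is empty.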

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032



From biopython at maubp.freeserve.co.uk  Fri Oct  5 05:26:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 05 Oct 2007 10:26:06 +0100
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
References: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
Message-ID: <4706032E.1020703@maubp.freeserve.co.uk>

Y Tu wrote:
> Hi,
> 
> I'm reading the Biopython tutorial and running the example of clustalw. But it generate the following error. What's wrong? Thanks.
> 
>>>> from Bio import Clustalw
>>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta"))
>>>> cline.set_output("result.aln")
>>>> print cline
> clustalw .\opuntia.fasta -OUTFILE=result.aln

The Windows version of ClustalW is very fussy.  To experiment try 
running this by hand at the windows command prompt - note that I'm not 
at my Windows machine so I haven't double checked this:

clustalw .\opuntia.fasta -OUTFILE=result.aln

or,

clustalw opuntia.fasta -OUTFILE=result.aln

Any error messages would be helpful.

I suggest you try this in Biopython:

from Bio import Clustalw
cline = Clustalw.MultipleAlignCL("opuntia.fasta")
cline.set_output("result.aln")
print cline

Also, we have made a few tweaks to this code since Biopython 1.43 was 
released (see emails with Emanuel Hey in July 2007).  If you like, you 
can try updating this module to the CVS version.  Simply backup the 
existing C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py and 
replace it with the latest code from here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Clustalw/__init__.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python

Peter


From ytu888 at hotmail.com  Fri Oct  5 12:32:05 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 5 Oct 2007 11:32:05 -0500
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <4706032E.1020703@maubp.freeserve.co.uk>
References: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
	<4706032E.1020703@maubp.freeserve.co.uk>
Message-ID: <BAY119-W411E7B2D457790F765CC158FA90@phx.gbl>


I tested both commands at the Windows command prompt. Initially both generated an error because Windows didn't know where clustalw was. Once I gave the correct path to clustalw, both generated alignment results without any error. BTW, I used the one bundled with BioEdit; I did not find a clustalw that comes with Biopython. It looks like Python uses the online program at ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right?

Then I replaced the old __init__.py with the new one, but there is a new error message similar to the old one:

>>> alignment = Clustalw.do_alignment(cline)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    # check if the outfile exists before parsing
IOError: Output .aln file result1.aln not produced, commandline: clustalw opuntia.fasta -OUTFILE=result1.aln

I also tested the example on OS X, and the same error was generated:

>>> alignment = Clustalw.do_alignment(cline)
sh: line 1: clustalw: command not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file result1.aln not produced, commandline: clustalw ./opuntia.fasta -OUTFILE=result1.aln

It seems like the problem is not linked to the OS. What else could be wrong? Thanks.



From biopython at maubp.freeserve.co.uk  Fri Oct  5 14:35:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 05 Oct 2007 19:35:05 +0100
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <BAY119-W411E7B2D457790F765CC158FA90@phx.gbl>
References: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>	<4706032E.1020703@maubp.freeserve.co.uk>
	<BAY119-W411E7B2D457790F765CC158FA90@phx.gbl>
Message-ID: <470683D9.90808@maubp.freeserve.co.uk>

Y Tu wrote:
> I tested both commands under window prompt, initially both generated 
> error because window don't know clustalw.

This is expected.  You must either supply the full path of the clustalw 
executable, or have it on the system path.  Otherwise Windows doesn't 
know how to find the clustalw program.

> Once I give the correct path of the clustalw, both generated 
> alignment results without any error. BTW, I used the one inside 
> BioEdit, I did not find clustalw coming with Biopython. It looks like
> python use online program at
> ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right?

Clustalw is a standalone program (completely separate from Biopython)
which you must install separately if you want to use it.  It is 
available from several servers - the one you chose looks fine.

> Then I replace the old _ini_with the new one, but there is a new 
> error message similar to the old one:
> 
>>>> alignment = Clustalw.do_alignment(cline)
> Traceback (most recent call last): File "<interactive input>", line 
> 1, in <module> File 
> "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, 
> in do_alignment # check if the outfile exists before parsing IOError:
>  Output .aln file result1.aln not produced, commandline: clustalw 
> opuntia.fasta -OUTFILE=result1.aln
> 
> Also I tested the example on OS X, the same error was generated:
> 
>>>> alignment = Clustalw.do_alignment(cline)
> sh: line 1: clustalw: command not found Traceback (most recent call 
> last): File "<stdin>", line 1, in <module> File 
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
>  line 117, in do_alignment % (out_file, command_line)) IOError: 
> Output .aln file result1.aln not produced, commandline: clustalw 
> ./opuntia.fasta -OUTFILE=result1.aln
> 
> It seems like the problem is not linked to OS. What other things 
> could be wrong? Thanks.

In both cases, you are not explicitly providing the path to clustalw - 
so for this to work the clustalw executable must be on the system path.

The other obvious thing to check is the location of the files versus the 
working directory. Is your python script in the same folder as the 
opuntia.fasta file?

What happens if you try those exact command lines (which Biopython says 
it is trying to run) at the command prompt in the directory where your 
python script is located? i.e.

Windows:
clustalw opuntia.fasta -OUTFILE=result1.aln

Mac:
clustalw ./opuntia.fasta -OUTFILE=result1.aln
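
Both checks (is clustalw on the path, and is the input file in the working directory) can also be scripted before calling do_alignment. This is only a sketch for Python 3 using the standard library; the function name clustalw_preflight and its default arguments are made up for illustration and are not part of Biopython.

```python
import os
import shutil


def clustalw_preflight(executable="clustalw", input_file="opuntia.fasta"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # shutil.which searches the system PATH, like the shell would.
    if shutil.which(executable) is None:
        problems.append("executable not found on PATH: %s" % executable)
    # Relative file names are resolved against the current working directory.
    if not os.path.isfile(input_file):
        problems.append("input file not found (cwd is %s): %s"
                        % (os.getcwd(), input_file))
    return problems


for problem in clustalw_preflight():
    print(problem)
```

If either message is printed, the "Output .aln file ... not produced" error is to be expected, because the external program never ran successfully.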

Peter


From meesters at uni-mainz.de  Mon Oct  8 11:07:54 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 8 Oct 2007 17:07:54 +0200
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
Message-ID: <1191856074.5425.24.camel@cmeesters>

Hi,

I'm trying to 'split' a structure into several pieces, e.g. a former chain
'A' should be split into 'A' and 'B', 'B' into 'C' and 'D' and so on.
Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...

Perhaps some code explains better what I'm trying to achieve:

breakpoints = [1254, 5444,
                6690, 10888,
                10889, 16332,
                16333, 21776,
                21776, 27220,
                27221, 32665]

def split_chain(structure, breakpoints, outname = 'split.pdb'):
    chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
                    'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
                    'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
                    'X', 'Y', 'Z']

    chain = chains.pop(0)
    for atom in structure.get_atoms():
        number = atom.get_serial_number()
        if breaks and number == breaks[0]:
            breaks.pop(0)
            chain = chains.pop(0)
        atom.parent.parent.id = chain # assign new chain

    iostream = PDBIO()
    try:
        outfile = open(outname, 'w')
        iostream.set_structure(structure.structure)
        iostream.save(outfile)
    except IOError, msg:
        raise IOError(msg)

So, chain 'A' should stay 'A' from atom 1 to 1254 and become 'B' from 1254 to
5444. Instead the written pdb-file contains all atoms, but with the
wrong chain ids (see above). (Please don't tell me how unpythonic the
code reads; the point is that I've tried so many different things that I
first need to understand my logic mistake.)

Any ideas where my mistake is?

Thanks,
Christian


From meesters at uni-mainz.de  Mon Oct  8 11:54:32 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 8 Oct 2007 17:54:32 +0200
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
In-Reply-To: <470A508C.4060803@maubp.freeserve.co.uk>
References: <1191856074.5425.24.camel@cmeesters>
	<470A508C.4060803@maubp.freeserve.co.uk>
Message-ID: <1191858872.5425.32.camel@cmeesters>


> > breakpoints = [1254, 5444,
> >                 6690, 10888,
> >                 10889, 16332,
> >                 16333, 21776,
> >                 21776, 27220,
> >                 27221, 32665]
> 
> I'm assuming this is "breaks" later on.
Absolutely - that's the pain with copy & paste for demos ... sorry.

> As the reason, I think this is what is happening: Given an atom, then 
> atom.parent will be a residue object, and atom.parent.parent will be a 
> chain object.  Note all the atoms in a single amino acid residue will 
> share the same .parent, and all the atoms in a single chain will 
> share the same .parent.parent
> 
> i.e. You have renamed Chain "A" to "A", and then later renamed this 
> chain to "B", and then again to "C".  You didn't ever split up the chain 
> into sub chains.
Mh, makes sense. 
> 
> To be honest, I would be tempted to write a quick and dirty script which 
> parsed the raw PDB file, and rewrote the chain field based on the atom 
> sequence number - without the overhead of the PDB parser.
Yes, would have been too easy ;-). Only wanted to add this functionality
to a larger application and make it easy to use. There is no strict need
to do so, but it would have been nice.
However, thanks for the input.

Christian

From biopython at maubp.freeserve.co.uk  Mon Oct  8 11:45:16 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 08 Oct 2007 16:45:16 +0100
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
In-Reply-To: <1191856074.5425.24.camel@cmeesters>
References: <1191856074.5425.24.camel@cmeesters>
Message-ID: <470A508C.4060803@maubp.freeserve.co.uk>

Christian Meesters wrote:
> Hi,
> 
> I'm trying to 'split' a structure in several pieces, e.g. a former chain
> 'A' should be splitted in 'A' and 'B', 'B' in 'C' and 'D' and so on.
> Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...
> 
> Perhaps some code explains better what I'm trying to achieve:
> 
> breakpoints = [1254, 5444,
>                 6690, 10888,
>                 10889, 16332,
>                 16333, 21776,
>                 21776, 27220,
>                 27221, 32665]

I'm assuming this is "breaks" later on.

> def split_chain(structure, breakpoints, outname = 'split.pdb'):
>     chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
>                     'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
>                     'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
>                     'X', 'Y', 'Z']
> 
>     chain = chains.pop(0)
>     for atom in structure.get_atoms():
>         number = atom.get_serial_number()
>         if breaks and number == breaks[0]:
>             breaks.pop(0)
>             chain = chains.pop(0)
>         atom.parent.parent.id = chain # assign new chain
> 
>     iostream = PDBIO()
>     try:
>         outfile = open(outname, 'w')
>         iostream.set_structure(structure.structure)
>         iostream.save(outfile)
>     except IOError, msg:
>         raise IOError(msg)
> 
> So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to
> 5444. Instead the written pdb-file contains all atoms, but with the
> wrong chain ids (see above). (Please don't tell my how unpythonic the
> code reads, point is that I've tried so many different things that I
> first need to understand my logic mistake.)
> 
> Any ideas, where my mistake is?

As for the reason, I think this is what is happening: Given an atom, 
atom.parent will be a residue object, and atom.parent.parent will be a 
chain object.  Note all the atoms in a single amino acid residue will 
share the same .parent, and all the atoms in a single chain will 
share the same .parent.parent

i.e. You have renamed Chain "A" to "A", and then later renamed this 
chain to "B", and then again to "C".  You didn't ever split up the chain 
into sub chains.
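
The effect can be reproduced without Bio.PDB. In the toy sketch below (Python 3), the Node class is a made-up stand-in for the atom/residue/chain hierarchy, not the real Bio.PDB classes, and the break positions 3 and 5 are arbitrary. Every atom reaches the same chain object, so each assignment just renames that one object and the last label wins.

```python
class Node:
    """Toy stand-in for an entity that knows its id and its parent."""

    def __init__(self, id, parent=None):
        self.id = id
        self.parent = parent


chain = Node("A")                                  # one shared chain object
residue = Node(1, chain)                           # one residue in that chain
atoms = [Node(serial, residue) for serial in range(1, 7)]

labels = iter("ABC")
current = next(labels)                             # start with "A"
for atom in atoms:
    if atom.id in (3, 5):                          # hypothetical breakpoints
        current = next(labels)
    atom.parent.parent.id = current                # renames the shared chain

print(chain.id)                                    # C
print({atom.parent.parent.id for atom in atoms})   # {'C'}
```

That matches the symptom reported above: each original chain ends up carrying only the last letter assigned to it.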

I think you need to create new chain objects instead... but I'm not 
sure offhand how best to do this with Bio.PDB

To be honest, I would be tempted to write a quick and dirty script which 
parsed the raw PDB file, and rewrote the chain field based on the atom 
sequence number - without the overhead of the PDB parser.
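
Such a script might look roughly like the sketch below (Python 3). It relies on the fixed-column PDB layout, where the atom serial number is in columns 7-11 and the chain identifier in column 22. The function name relabel_chains, the rule "a new chain starts at each breakpoint serial", and the tiny test records are assumptions for illustration; TER records and more than 26 chains are not handled.

```python
from bisect import bisect_right


def relabel_chains(lines, breakpoints,
                   chain_ids="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Rewrite the chain field of ATOM/HETATM lines by atom serial number.

    Atoms below the first breakpoint get chain_ids[0]; an atom whose
    serial number is >= the n-th breakpoint gets chain_ids[n].
    """
    breakpoints = sorted(set(breakpoints))   # duplicates would skip a letter
    relabelled = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            serial = int(line[6:11])                      # columns 7-11
            chain = chain_ids[bisect_right(breakpoints, serial)]
            line = line[:21] + chain + line[22:]          # column 22
        relabelled.append(line)
    return relabelled


# Six fake CA atoms, all in chain A, split at serial numbers 3 and 5.
records = ["ATOM  %5d  CA  ALA A%4d" % (n, n) for n in range(1, 7)]
new_records = relabel_chains(records, [3, 5])
print([record[21] for record in new_records])
# ['A', 'A', 'B', 'B', 'C', 'C']
```

Lines that are not ATOM or HETATM records are passed through unchanged, so the header and any remarks survive the rewrite.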

Peter


From bbrazelton at gmail.com  Mon Oct  8 20:33:03 2007
From: bbrazelton at gmail.com (B. Brazelton)
Date: Mon, 8 Oct 2007 17:33:03 -0700
Subject: [BioPython] BLAST XML parser trouble
Message-ID: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>

I tried to follow the BLAST XML parser example in the tutorial, but I
always get the following error when attempting to iterate through the
records:

Traceback (most recent call last):
  File "BlastXML_Parser.py", line 10, in ?
    for blast_record in blast_records:
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 572, in parse
    expat_parser.Parse(text, False)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
    eval("self.%s()" % method)
  File "<string>", line 0, in ?
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 215, in _end_BlastOutput_version
    self._header.version = self._value.split()[1]
IndexError: list index out of range

All I did was:

result_handle = open('NifH_Blast.xml')
from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)
for blast_record in blast_records:
    ... etc

I put my script and xml file here:
http://www.staff.washington.edu/braz/files

I'm using Biopython 1.43, and I get the same error on both Python
2.3.5 and Python 2.5.

It seems like my commands are exactly what is in the tutorial, so I'm
confused. My best guess is that there is a difference in the XML
format, but it's NCBI XML. Thanks for any help,

Bill Brazelton

From sbassi at gmail.com  Mon Oct  8 20:48:50 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 8 Oct 2007 21:48:50 -0300
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>
References: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>
Message-ID: <b43bf2080710081748r1d7e5935sde58fb894d820def@mail.gmail.com>

On 10/8/07, B. Brazelton <bbrazelton at gmail.com> wrote:
> I tried to follow the BLAST XML parser example in the tutorial, but I
> always get the following error when attempting to iterate through the
> records:

I got the same result as you. Could you please tell me the URL of the
tutorial where you saw this?


From mdehoon at c2b2.columbia.edu  Mon Oct  8 22:55:21 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon, 8 Oct 2007 22:55:21 -0400
Subject: [BioPython] BLAST XML parser trouble
References: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>

How did you produce the XML file? In particular, which Blast version did you
use?
The Blast XML parser trips over the following line in your XML file:

    <BlastOutput_version>unspecified</BlastOutput_version>

This is supposed to be:

  <BlastOutput_version>BLASTP 2.2.12 [Aug-07-2005]</BlastOutput_version>

, of course depending on which Blast version you are using.
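
The traceback follows directly from this: the parser takes the second whitespace-separated field of the version string, and "unspecified" has only one field. A small sketch (Python 3) of the failure and of one tolerant way to read the field; the function blast_version is hypothetical and is not the code used in Biopython.

```python
def blast_version(value):
    """Return the version number from a BlastOutput_version string.

    Returns None when the string has no second field, as in "unspecified".
    """
    fields = value.split()
    if len(fields) < 2:
        return None
    return fields[1]


print(blast_version("BLASTP 2.2.12 [Aug-07-2005]"))  # 2.2.12
print(blast_version("unspecified"))                  # None

# The unguarded form, value.split()[1], fails for the second string:
try:
    "unspecified".split()[1]
except IndexError as error:
    print("IndexError:", error)  # IndexError: list index out of range
```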

--Michiel


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032




From kbaa at novonordisk.com  Tue Oct  9 08:26:14 2007
From: kbaa at novonordisk.com (KBAA (Kent Bondensgaard))
Date: Tue, 9 Oct 2007 14:26:14 +0200
Subject: [BioPython] FW: Parsing sequence information in patents
Message-ID: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>


Does anyone know how to parse protein sequence information in patents with Biopython?

BR, Kent Bondensgaard

__________________________________

Kent Bondensgaard
Research Scientist
Protein Structure and Biophysics

Novo Nordisk A/S
Novo Nordisk Park
DK-2760 Måløv
Denmark
+45 4443 4510 (direct)
+45 3075 4510 (mobile)
+45 4466 3450 (fax)
kbaa at novonordisk.com



From sbassi at gmail.com  Tue Oct  9 09:04:51 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 9 Oct 2007 10:04:51 -0300
Subject: [BioPython] FW: Parsing sequence information in patents
In-Reply-To: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
References: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
Message-ID: <b43bf2080710090604s5dcff35asd31dd65cd6a254d6@mail.gmail.com>

On 10/9/07, KBAA (Kent Bondensgaard) <kbaa at novonordisk.com> wrote:
>
> Does anyone know how to parse protein sequence information in patents with Biopython?

What about using patAA and patNT from NCBI? They are both available as
blast ready, you could retrieve the fasta file using fastacmd.

-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From bbrazelton at gmail.com  Tue Oct  9 16:24:58 2007
From: bbrazelton at gmail.com (B. Brazelton)
Date: Tue, 9 Oct 2007 13:24:58 -0700
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
References: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>
	<6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
Message-ID: <da0826f20710091324m7ec36306v29b36d6f2029073c@mail.gmail.com>

I put in 'tblastx 2.2.15 [Oct-15-2006]' and it worked fine.

Thanks for your help, sorry for the newbie question.

(FYI, I was using results generated from the CAMERA database
(http://camera.calit2.net/), and I was using the main Biopython
tutorial and cookbook from biopython.org.) Thanks again,
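
For anyone who hits the same IndexError, the workaround above (putting a full version string into the XML) can be sketched roughly as follows. This is only an illustration: the function name fix_blast_version and the default version string are assumptions, and the right value depends on which Blast program actually produced the file.

```python
import re

# Matches the element that the Biopython 1.43 parser splits on whitespace.
VERSION_TAG = re.compile(r"<BlastOutput_version>([^<]*)</BlastOutput_version>")


def fix_blast_version(xml_text, version="tblastx 2.2.15 [Oct-15-2006]"):
    """Replace a one-word BlastOutput_version value with a full version string."""

    def replace(match):
        value = match.group(1)
        # Values with two or more words already work, so leave them alone.
        if len(value.split()) >= 2:
            return match.group(0)
        return "<BlastOutput_version>%s</BlastOutput_version>" % version

    return VERSION_TAG.sub(replace, xml_text)


if __name__ == "__main__":
    broken = "<BlastOutput_version>unspecified</BlastOutput_version>"
    print(fix_blast_version(broken))
```

The patched text can then be written to a new file and handed to NCBIXML.parse as before.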

BB

On 10/8/07, Michiel De Hoon <mdehoon at c2b2.columbia.edu> wrote:
> How did you produce the XML file? In particular, which Blast version did you
> use?
> The Blast XML parser trips over the following line in your XML file:
>
>     <BlastOutput_version>unspecified</BlastOutput_version>
>
> This is supposed to be:
>
>   <BlastOutput_version>BLASTP 2.2.12 [Aug-07-2005]</BlastOutput_version>
>
> , of course depending on which Blast version you are using.
>
> --Michiel
>
>
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org on behalf of B. Brazelton
> Sent: Mon 10/8/2007 8:33 PM
> To: biopython at biopython.org
> Subject: [BioPython] BLAST XML parser trouble
>
> I tried to follow the BLAST XML parser example in the tutorial, but I
> always get the following error when attempting to iterate through the
> records:
>
> Traceback (most recent call last):
>   File "BlastXML_Parser.py", line 10, in ?
>     for blast_record in blast_records:
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-
> packages/Bio/Blast/NCBIXML.py",
> line 572, in parse
>     expat_parser.Parse(text, False)
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-
> packages/Bio/Blast/NCBIXML.py",
> line 98, in endElement
>     eval("self.%s()" % method)
>   File "<string>", line 0, in ?
>   File
> "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-
> packages/Bio/Blast/NCBIXML.py",
> line 215, in _end_BlastOutput_version
>     self._header.version = self._value.split()[1]
> IndexError: list index out of range
>
> All I did was:
>
> result_handle = open('NifH_Blast.xml')
> from Bio.Blast import NCBIXML
> blast_records = NCBIXML.parse(result_handle)
> for blast_record in blast_records:
>     ... etc
>
> I put my script and xml file here:
> http://www.staff.washington.edu/braz/files
>
> I'm using biopython 1.43, and I get the same error on both Python
> 2.3.5 and Python 2.5.
>
> It seems like my commands are exactly what is in the tutorial, so I'm
> confused. My best guess is that there is a difference in the XML
> format, but it's NCBI XML. Thanks for any help,
>
> Bill Brazelton
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>

From sbassi at gmail.com  Tue Oct  9 17:09:09 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 9 Oct 2007 18:09:09 -0300
Subject: [BioPython] Getting Qv using Python?
Message-ID: <b43bf2080710091409t19b0ae14t3e587d64b011ccf3@mail.gmail.com>

Is there an automated way to get Quality Values (QV) from an ab1 file?
I wrap Abiview [1] to get the sequence, but now I need the QV.

[1] http://bioweb.pasteur.fr/docs/EMBOSS/abiview.html

-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From prashanth at ibioinformatics.org  Wed Oct 10 08:17:26 2007
From: prashanth at ibioinformatics.org (Prashantha Hebbar Kiradi)
Date: Wed, 10 Oct 2007 17:47:26 +0530
Subject: [BioPython] where is SeqIO.parse()?
Message-ID: <470CC2D6.1090504@ibioinformatics.org>

Hi everybody,

While trying the example of 'Parsing sequence file formats' from section
2.4 of Biopython tutorial:
-------------------------------------------------
from Bio import SeqIO
handle = open("ls_orchid.fasta")
for seq_record in SeqIO.parse(handle, "fasta") :
    print seq_record.id
    print seq_record.seq
    print len(seq_record.seq)
handle.close()
-------------------------------------------------


I get this error:
-------------------------------------------------
Traceback (most recent call last):
  File "fastEx.py", line 5, in <module>
    for seq_record in SeqIO.parse(handle, "fasta") :
AttributeError: 'module' object has no attribute 'parse'
-------------------------------------------------

Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
using is opening correctly.

API documentation reports that the 'parse' function is there. What am I
doing wrong?

I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Thanks in advance,

Prashantha Hebbar
Institute of Bioinformatics
ITPL, Bangalore,
INDIA


From fennan at gmail.com  Wed Oct 10 08:20:56 2007
From: fennan at gmail.com (Fernando)
Date: Wed, 10 Oct 2007 14:20:56 +0200
Subject: [BioPython] Code publications
Message-ID: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>

Hi everybody,

This might be off-topic, or maybe not:

I've been working with biopython for a while and I am curious about what the
authors get from all the exceptional work they are doing... I know it won't
have anything to do with money, but in terms of publication / copyrights etc.,
what are the advantages of having your code in biopython? Is there a journal
/ conference where the authors publish their work so that it can be
referenced, or something like that?

Thanks,
Fernando

From mdehoon at c2b2.columbia.edu  Wed Oct 10 08:24:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 10 Oct 2007 08:24:33 -0400
Subject: [BioPython] where is SeqIO.parse()?
References: <470CC2D6.1090504@ibioinformatics.org>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B635@mail2.exch.c2b2.columbia.edu>

> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Use Biopython 1.43.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Prashantha Hebbar
Kiradi
Sent: Wed 10/10/2007 8:17 AM
To: biopython at biopython.org
Subject: [BioPython] where is SeqIO.parse()?
 
Hi everybody,

While trying the example of 'Parsing sequence file formats' from section
2.4 of Biopython tutorial:
-------------------------------------------------
from Bio import SeqIO
handle = open("ls_orchid.fasta")
for seq_record in SeqIO.parse(handle, "fasta") :
    print seq_record.id
    print seq_record.seq
    print len(seq_record.seq)
handle.close()
-------------------------------------------------


I get this error:
-------------------------------------------------
Traceback (most recent call last):
  File "fastEx.py", line 5, in <module>
    for seq_record in SeqIO.parse(handle, "fasta") :
AttributeError: 'module' object has no attribute 'parse'
-------------------------------------------------

Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
using is opening correctly.

API documentation reports that the 'parse' function is there. What am I
doing wrong?

I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Thanks in advance,

Prashantha Hebbar
Institute of Bioinformatics
ITPL, Bangalore,
INDIA

_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From cjfields at uiuc.edu  Wed Oct 10 10:14:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 Oct 2007 09:14:48 -0500
Subject: [BioPython] Code publications
In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
Message-ID: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>

This is a question that could be posed for any open-source project.

It differs per person in my opinion.  For instance, I donate time and  
code to BioPerl based on several factors.  Not reinventing the wheel,  
giving back to the community, access to the code base, and the joy of  
programming (believe it or not) are among them, but they aren't the  
only ones.

Publications don't hurt but they aren't my primary motivation.  It  
generally isn't the focus of my research, only a means to an end (to  
parse or generate data).  I don't see anything wrong with it being  
someone else's primary drive to donate as long as they continue to  
support their code post-publication, an issue that unfortunately pops  
up quite frequently.

chris

On Oct 10, 2007, at 7:20 AM, Fernando wrote:

> Hi everybody,
>
> This might be off-topic, or maybe not:
>
> I've been working with biopython for a while and I am curious about  
> what the
> authors get from all the exceptional work they are doing... I know  
> it won't
> have anything to do with money, but in terms of publication /  
> copyrights etc.,
> what are the advantages of having your code in biopython? Is there  
> a journal
> / conference where the author publish their works and likewise they  
> can be
> referenced or something like that?
>
> Thanks,
> Fernando
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From biopython at maubp.freeserve.co.uk  Wed Oct 10 08:42:01 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Oct 2007 13:42:01 +0100
Subject: [BioPython] Code publications
In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
Message-ID: <470CC899.6080802@maubp.freeserve.co.uk>

Fernando wrote:
> Hi everybody,
> 
> This might be off-topic, or maybe not:
> 
> I've been working with biopython for a while and I am curious about what the
> authors get from all the exceptional work they are doing... I know it won't
> have anything to do with money, but in terms of publication / copyrights etc.,
> what are the advantages of having your code in biopython? Is there a journal
> / conference where the author publish their works and likewise they can be
> referenced or something like that?

Pride? Looks good on a CV?  Although I must say working on BioPerl would 
have been a better choice from the point of view of job hunting ;)

Some of the specific modules have associated publications which get 
cited (e.g. Bio.PDB and Bio.Cluster - although the latter is also 
available independently of Biopython).  The closest to a general 
Biopython paper is currently Chapman and Chang 2000.

In terms of talks, most recently I gave a talk at BOSC 2007 in July, the 
"Biopython Project Update". Which reminds me, I have a few photos and 
the slides (sadly in PowerPoint - my initial attempt to convert them 
into PDF wasn't great, font issues leading to content getting cropped).

Peter


From tiagoantao at gmail.com  Wed Oct 10 12:59:56 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Wed, 10 Oct 2007 17:59:56 +0100
Subject: [BioPython] Code publications
In-Reply-To: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>
References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
	<865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>
Message-ID: <470D050C.7060500@gmail.com>

I am currently submitting my population genetics code to biopython 
and I can talk about my motivations.

Most of the code that I am submitting was used in something that I have 
done in the past (sometimes published). I figured that if I have the 
code sitting here, I might as well donate it. This has one interesting 
advantage for me: all the code that I know I will try to submit to 
biopython is designed with care, while all the code that is a one-off is 
really a big mess. For me, making code public is a motivator to maintain 
clean code.

It is also a way to get to know people that are interested in this type 
of problem, and I think that, as with all things in life, knowing more 
people is a good thing.

Maybe in 12 to 18 months' time I might suggest to other people that we 
write an article on the popgen work in biopython. Let's face it, that 
is also a good motivator. But if it is the only one, I would agree that 
is not good (as Chris says, maintenance after publication...).

Last, but not least: ethical and moral issues. Having spent some time 
outside of science, I do think most scientific work is done in a very 
closed fashion (it was a shock to me, really). From my personal point of 
view, open science and free software are causes to which I attach 
moral value.

Tiago

Chris Fields wrote:
> This is a question that could be posed for any open-source project.
>
> It differs per person in my opinion.  For instance, I donate time and  
> code to BioPerl based on several factors.  Not reinventing the wheel,  
> giving back to the community, access to the code base, and the joy of  
> programming (believe it or not) are among them, but they aren't the  
> only ones.
>
> Publications don't hurt but they aren't my primary motivation.  It  
> generally isn't the focus of my research, only a means to an end (to  
> parse or generate data).  I don't see anything wrong with it being  
> someone else's primary drive to donate as long as they continue to  
> support their code post-publication, an issue that unfortunately pops  
> up quite frequently.
>
> chris
>
> On Oct 10, 2007, at 7:20 AM, Fernando wrote:
>
>   
>> Hi everybody,
>>
>> This might be off-topic, or maybe not:
>>
>> I've been working with biopython for a while and I am curious about  
>> what the
>> authors get from all the exceptional work they are doing... I know  
>> it won't
>> have anything to do with money, but in terms of publication /  
>> copyrights etc.,
>> what are the advantages of having your code in biopython? Is there  
>> a journal
>> / conference where the author publish their works and likewise they  
>> can be
>> referenced or something like that?
>>
>> Thanks,
>> Fernando
>> _______________________________________________
>> BioPython mailing list  -  BioPython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>   


From rebekah.rogers at gmail.com  Thu Oct 11 14:57:21 2007
From: rebekah.rogers at gmail.com (Rebekah Rogers)
Date: Thu, 11 Oct 2007 14:57:21 -0400
Subject: [BioPython] running PAML in python
Message-ID: <79def59f0710111157h7483d5b5m6e6cdb3b86266750@mail.gmail.com>

Hello:

Does anyone know of an existing library that can run aligned sequences
in PAML and then pull out the dN/dS values?

Thanks!
-Rebekah

From The_Polymorph at rocketmail.com  Sun Oct 14 13:04:48 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 14 Oct 2007 10:04:48 -0700 (PDT)
Subject: [BioPython] Performing sequence alignments, etc.
Message-ID: <311410.84366.qm@web50801.mail.re2.yahoo.com>

Hi all.

I'm relatively new to the field of bioinformatics and I'm trying to
perform a multiple sequence alignment on 5-6 sequences (fasta format -
dna sequences). I'd like the output to be formatted in the following
manner (clustalw standalone output):

accession_number1: atctcgatatcgggcgctcta...
accession_number2: atctctattctctggatctct...
...

When one or more nucleotide columns are identical, clustalw displays
an asterisk. If not, a blank space is displayed. Is this a standard
feature of BioPython?

Also, I'm evaluating several sequences but I'd like to obtain the most
recent complete genomes possible from various countries. Is there a
convenient source to use (GenBank?) if I don't know the accession
numbers?

Thanks,

~Caitlin
   


From biopython at maubp.freeserve.co.uk  Sun Oct 14 13:38:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 14 Oct 2007 18:38:32 +0100
Subject: [BioPython] Performing sequence alignments, etc.
In-Reply-To: <311410.84366.qm@web50801.mail.re2.yahoo.com>
References: <311410.84366.qm@web50801.mail.re2.yahoo.com>
Message-ID: <47125418.5020009@maubp.freeserve.co.uk>

Caitlin wrote:
> Hi all.
> 
> I'm relatively new to the field of bioinformatics and I'm trying to
> perform a multiple sequence alignment on 5-6 sequences (fasta format -
> dna sequences). I'd like the output to be formatted in the following
> manner (clustalw standalone output):

For reading and writing Clustalw alignment files, you could either use 
Bio.SeqIO (format name "clustal") or the Bio.Clustalw module.
http://biopython.org/wiki/SeqIO

> When one or more nucleotide columns are identical, clustalw displays
> an asterisk. If not, a blank space is displayed. Is this a standard
> feature of BioPython?

There is an example of Clustalw output online here - note there can also 
be a column of numbers on the right hand side (not shown here):
http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format

It sounds like you are describing the simple consensus string which 
clustalw outputs under the alignment (using *:. and space).

Biopython has a SummaryInfo object which can calculate simple consensus 
sequences (see the tutorial). Perhaps this would be close to what you 
want to do.
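
As a rough illustration of the asterisk line described above, here is a minimal sketch in plain Python that marks fully identical columns with "*" and everything else with a space. It does not use Biopython, the function name identity_line is made up for this example, and it ignores the ":" and "." symbols that clustalw uses for similar (non-identical) columns.

```python
def identity_line(sequences):
    """Return a string with '*' under each column where all sequences agree."""
    if not sequences:
        return ""
    length = len(sequences[0])
    if any(len(sequence) != length for sequence in sequences):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*sequences):
        # A column counts as identical only if it has one distinct letter
        # and that letter is not a gap character.
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


if __name__ == "__main__":
    aligned = ["atctcg", "atctct", "atgtca"]
    for sequence in aligned:
        print(sequence)
    print(identity_line(aligned))
```

The input must already be aligned (equal-length rows with gaps as "-"); computing the alignment itself is still a job for clustalw.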

> Also, I'm evaluating several sequences but I'd like to obtain the most
> recent complete genomes possible from various countries. Is there a
> convenient source to use (GenBank?) if I don't know the accession
> numbers?

What sort of Genomes? Bacteria? Vertebrates?  You could start by having 
a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these 
three are kept in sync with each other).

Biopython has quite a nice interface for searching and downloading 
sequences from GenBank (again, see the tutorial) so that would be my 
first suggestion.

Peter


From The_Polymorph at rocketmail.com  Sun Oct 14 22:13:24 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 14 Oct 2007 19:13:24 -0700 (PDT)
Subject: [BioPython] Performing sequence alignments, etc.
In-Reply-To: <47125418.5020009@maubp.freeserve.co.uk>
Message-ID: <129586.66498.qm@web50807.mail.re2.yahoo.com>

Thanks Peter. The genomes are viral. I'll definitely read that
tutorial. Your help is much appreciated.

~Caitlin

--- Peter <biopython at maubp.freeserve.co.uk> wrote:

> Caitlin wrote:
> > Hi all.
> > 
> > I'm relatively new to the field of bioinformatics and I'm trying to
> > perform a multiple sequence alignment on 5-6 sequences (fasta
> format -
> > dna sequences). I'd like the output to be formatted in the
> following
> > manner (clustalw standalone output):
> 
> For reading and writing Clustalw alignment files, you could either
> use 
> Bio.SeqIO (format name "clustal") or the Bio.Clustalw module.
> http://biopython.org/wiki/SeqIO
> 
> > When one or more nucleotide columns are identical, clustalw
> displays
> > an asterisk. If not, a blank space is displayed. Is this a standard
> > feature of BioPython?
> 
> There is an example of Clustalw output online here - note there can
> also 
> be a column of numbers on the right hand side (not shown here):
> http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format
> 
> It sounds like you are describing the simple consensus string which 
> clustalw outputs under the alignment (using *:. and space).
> 
> Biopython has a SummaryInfo object which can calculate simple
> consensus 
> sequences (see the tutorial). Perhaps this would be close to what you
> 
> want to do.
> 
> > Also, I'm evaluating several sequences but I'd like to obtain the
> most
> > recent complete genomes possible from various countries. Is there a
> > convenient source to use (GenBank?) if I don't know the accession
> > numbers?
> 
> What sort of Genomes? Bacteria? Vertebrates?  You could start by
> having 
> a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these 
> three are kept in sync with each other).
> 
> Biopython has quite a nice interface for searching and downloading 
> sequences from GenBank (again, see the tutorial) so that would be my 
> first suggestion.
> 
> Peter
> 
> 
> 
> 


"Be who you are and say what you feel because those who mind don't 
matter and those who matter don't mind." 

- Dr. Seuss, "Oh the Places You'll Go"


 

From fredgca at hotmail.com  Mon Oct 15 09:02:27 2007
From: fredgca at hotmail.com (Frederico Arnoldi)
Date: Mon, 15 Oct 2007 13:02:27 +0000
Subject: [BioPython] where is SeqIO.parse()?
In-Reply-To: <mailman.33040.1192414808.2686.biopython@lists.open-bio.org>
References: <mailman.33040.1192414808.2686.biopython@lists.open-bio.org>
Message-ID: <BLU105-W10B76818834BD20393B37FBFA30@phx.gbl>


Dear Kiradi,
Concerning your subject question: where is SeqIO.parse()?
>>> from Bio import SeqIO
>>> SeqIO
<module 'Bio.SeqIO' from '/usr/lib/python2.4/site-packages/Bio/SeqIO/__init__.py'>


  So, on my system, it is at /usr/lib/python2.4/site-packages/Bio/SeqIO/__init__.py. Try the same command in your Python console and see where it is on yours.

Concerning your problem:
Try
>>> from Bio import SeqIO
>>> dir()
['SeqIO', '__builtins__', '__doc__', '__name__']
>>> dir(SeqIO)
['Alignment', 'ClustalIO', 'FastaIO', 'InsdcIO', 'Interfaces', 'NexusIO', 'PhylipIO', 'Seq', 'SeqRecord', 'StockholmIO', 'StringIO', 'SwissIO', '_FormatToIterator', '_FormatToWriter', '__builtins__', '__doc__', '__file__', '__name__', '__path__', 'generic_alphabet', 'generic_protein', 'os', 'parse', 'to_alignment', 'to_dict', 'write']

   Do you get the same result? Note that "parse" is in my SeqIO. Is it in yours? 
   I noticed that after installing Biopython via apt in Ubuntu, the __init__.py in Bio/SeqIO was empty. Maybe that is the source of your problem. If I am right, then when you type dir(SeqIO) on your system, you will get ['__builtins__', '__doc__', '__file__', '__name__', '__path__'], confirming that your __init__.py is empty. Check it.
  If this is your problem, try installing Biopython from the tar.gz file available on the Biopython home page. 
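
The same check can be scripted. The following is a minimal sketch, written for a current Python 3 interpreter rather than the Python 2.x used in this thread; the helper name module_has_attribute is hypothetical and only wraps the standard importlib and hasattr calls.

```python
import importlib


def module_has_attribute(module_name, attribute):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed at all (or is broken at import time).
        return None
    return hasattr(module, attribute)


if __name__ == "__main__":
    # With an empty Bio/SeqIO/__init__.py this would print False.
    print(module_has_attribute("Bio.SeqIO", "parse"))
```

A result of False means the module imports but has no parse function, which matches the AttributeError in the original message.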

Good luck,
Fred 


----------------------------------------------------------------------
> Message: 1
> Date: Wed, 10 Oct 2007 17:47:26 +0530
> From: Prashantha Hebbar Kiradi
> Subject: [BioPython] where is SeqIO.parse()?
> To: biopython at biopython.org
> Message-ID:
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi everybody,
>
> While trying the example of 'Parsing sequence file formats' from section
> 2.4 of Biopython tutorial:
> -------------------------------------------------
> from Bio import SeqIO
> handle = open("ls_orchid.fasta")
> for seq_record in SeqIO.parse(handle, "fasta") :
>     print seq_record.id
>     print seq_record.seq
>     print len(seq_record.seq)
> handle.close()
> -------------------------------------------------
>
>
> I get this error:
> -------------------------------------------------
> Traceback (most recent call last):
>   File "fastEx.py", line 5, in <module>
>     for seq_record in SeqIO.parse(handle, "fasta") :
> AttributeError: 'module' object has no attribute 'parse'
> -------------------------------------------------
>
> Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
> using is opening correctly.
>
> API documentation reports that the 'parse' function is there. What am I
> doing wrong?
>
> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.
>
> Thanks in advance,
>
> Prashantha Hebbar
> Institute of Bioinformatics
> ITPL,

From ytu888 at hotmail.com  Mon Oct 15 12:19:47 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 15 Oct 2007 11:19:47 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
Message-ID: <BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>


Hi Steve,

Thank you for your email. I was away for a week. 
What do you mean by a "fresh" Python prompt?
I installed MySQL using MYSQL-5.0.45-osx10.4-i686.dmg, downloaded online. 
I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, am I right?

Once again, thank you very much for your help.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython]  Error for installation of  MySALdb on Mac OS X
> Date: Wed, 3 Oct 2007 10:47:41 -0400
> To: ytu888 at hotmail.com
> 
> > Steve, thank you very much. It fixed the problem and I got through  
> > the build and install step. But when I tested inside the python for  
> > the installation I got following error. Please help me about it.  
> > Thanks.
> >
> > >>> import MySQLdb
> > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ 
> > site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ 
> > _mysql.py:3: UserWarning: Module _mysql was already imported from / 
> > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- 
> > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc,  
> > but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to  
> > sys.path
> >   import sys, pkg_resources, imp
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "MySQLdb/__init__.py", line 19, in <module>
> >     import _mysql
> >   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in  
> > <module>
> >   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in  
> > __bootstrap__
> > ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2- 
> > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: / 
> > usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
> >   Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2- 
> > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
> >   Reason: image not found
> 
> 
> Sorry, don't know exactly what's happening here. Is this from a  
> "fresh" python prompt?
> 
> How did you install MySQLdb, did you use easy_install? If so, try to  
> install from the sourceforge download.
> 
> Try to remove it, remove the "build" directory from your mysqldb  
> download and redo the whole
> python setup.py build / python setup.py install process
> 
> To remove it, nuke this:
> /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- 
> packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg
> 
> And try to reinstall?
> 
> Perhaps someone who knows what the problem is here can give you a  
> better idea on what to do.
> 
> -steve


From lists.steve at arachnedesign.net  Mon Oct 15 12:30:21 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Mon, 15 Oct 2007 12:30:21 -0400
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
	<BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
Message-ID: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>

Hi,

> Thank you for your email. I was away for a week.
> What do you mean by a "fresh" Python prompt?
> I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded  
> online.
> I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb,  
> am I right?

I'm not sure, exactly.

Last time I checked, the only thing you needed to use mysql from  
python was:

(a) A working mysql install (the client/server)
(b) The mysqldb package from: http://sourceforge.net/projects/mysql-python

I'm assuming (a) is installed correctly since you are using the .mpkg  
from mysql.org, so I'd just try to fix (b).

You can try to do so as follows:

(1) Remove your original attempt at installing the python mysqldb  
library. From the looks of your error messages, it seems to be  
installed here:

/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/

(2) remove the build directory in your mysqldb directory (the one you  
are installing from) by cd-ing into your mysqldb download, and  
removing the build directory you find there.

(3) reinstall mysqldb by doing the usual `python setup.py build` and
`sudo python setup.py install` dance
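
Steps (1) and (2) above could be sketched in Python roughly as follows.
This is only an illustration: the helper name remove_old_install and the
source directory in the usage comment are hypothetical, and deleting
anything under /Library normally needs administrator rights.

```python
import shutil
from pathlib import Path


def remove_old_install(egg_path, source_dir):
    """Delete a stale egg and the leftover build/ directory, if present.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for target in (Path(egg_path), Path(source_dir) / "build"):
        if target.is_dir():
            # An unzipped egg and the build/ directory are both directories.
            shutil.rmtree(target)
            removed.append(str(target))
        elif target.exists():
            # A zipped egg is a single file.
            target.unlink()
            removed.append(str(target))
    return removed


# Hypothetical usage; adjust both paths to your own machine:
# remove_old_install(
#     "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/"
#     "site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg",
#     "/path/to/your/MySQL-python-1.2.2",
# )
```

After that, step (3) is the usual build and install from the source
directory.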

For the record, I'm not sure what distinction you are drawing with
"MySQL_python_1.2.2, not MySQLdb".

Are you trying to install two python libraries to access mysql?

-steve


From ytu888 at hotmail.com  Mon Oct 15 13:18:42 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 15 Oct 2007 12:18:42 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
	<BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
	<908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
Message-ID: <BAY119-W26619613084E7C6274C1008FA30@phx.gbl>


By "MySQL_python_1.2.2, not MySQLdb" I meant uninstalling MySQL_python, not the mysql client/server installed with the mpkg.

I just deleted the MYSQL....fat.egg file and downloaded MySQL-python-1.2.2.tar. I repeated the installation process. However, when I run import MySQLdb, I get the same error message. Is there anything else I should look at? Thank you very much.




From ytu888 at hotmail.com  Tue Oct 16 13:06:36 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 16 Oct 2007 12:06:36 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
	<BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
	<908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
Message-ID: <BAY119-W36D33A0C262F9D2A0101058F9C0@phx.gbl>


Hi,

I reinstalled everything and checked every step. I found that there were some warnings in the "build" step (the /usr/bin/ld lines below). I wonder if they are the reason why I get the error messages when running "import MySQLdb" at the python prompt, and how to fix the problem.
Thank you very much.

LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build
running build
running build_py
... ...
/usr/bin/ld: for architecture ppc
/usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
/usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install
Password:
running install
... ...
Adding MySQL-python 1.2.2 to easy-install.pth file

Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg
Processing dependencies for MySQL-python==1.2.2
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path
  import sys, pkg_resources, imp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MySQLdb/__init__.py", line 19, in <module>
    import _mysql
  File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in <module>
  File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
  Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
  Reason: image not found
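
The traceback asks for the client library under /usr/local/mysql/lib/mysql/,
while the ld warning during the build names it under /usr/local/mysql/lib/.
A small check of which of those files actually exists might help narrow
this down. This is only a sketch: find_existing is a hypothetical helper,
and the two candidate paths are copied from the output above.

```python
import os


def find_existing(paths):
    """Return the subset of candidate paths that exist on this machine."""
    return [path for path in paths if os.path.exists(path)]


CANDIDATES = [
    # Path that dlopen() reported as missing in the traceback.
    "/usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib",
    # Path that ld mentioned while linking during "setup.py build".
    "/usr/local/mysql/lib/libmysqlclient_r.dylib",
]

# Usage: print(find_existing(CANDIDATES))
```

If only the second path exists, the extension module is probably looking
in the wrong directory for the library.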



From fennan at gmail.com  Tue Oct 16 13:51:30 2007
From: fennan at gmail.com (Fernando)
Date: Tue, 16 Oct 2007 19:51:30 +0200
Subject: [BioPython] Precompute database information
Message-ID: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>

Hi everybody,

I am thinking of including some algorithms that I work with in Biopython.
My first concern is that I'm using a local copy of the Gene Ontology
database to perform several operations. To avoid such database
accesses, I could precompute the information I need and load it when the
module is imported. How should I do it? Is there a style guideline for
loading external data, or something like that? Any other ideas/suggestions?

Thanks

From fennan at gmail.com  Tue Oct 16 14:55:54 2007
From: fennan at gmail.com (Fernando)
Date: Tue, 16 Oct 2007 20:55:54 +0200
Subject: [BioPython] Precompute database information
In-Reply-To: <4714FD13.2020708@maubp.freeserve.co.uk>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
	<4714FD13.2020708@maubp.freeserve.co.uk>
Message-ID: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>

Hi Peter,

>How big would your pre-computed data be?  If it's some sort of table or
>other simple data you could perhaps use a simple text file; another idea
>for complicated objects is to use Python's pickle module.

It would be big... I am dealing with pairwise term comparisons and I want
to consider different species as well.

>How often would the pre-computed data need to be updated?  Every time
>there is a new Gene Ontology release?  It might be better to have the
>module download and cache the latest version on request (rather than
>shipping an out-of-date dataset with Biopython).

Yes, I could do that... Would it be OK to use MySQL in Biopython? If so, the
module could download the latest GO version on request, install it and work
with that version until the user decides to update it.


From sdavis2 at mail.nih.gov  Tue Oct 16 15:26:18 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 16 Oct 2007 15:26:18 -0400
Subject: [BioPython] Precompute database information
In-Reply-To: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>	<4714FD13.2020708@maubp.freeserve.co.uk>
	<7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>
Message-ID: <4715105A.30705@mail.nih.gov>

Fernando wrote:
> Hi Peter,
> 
>> How big would your pre-computed data be?  If it's some sort of table or
>> other simple data you could perhaps use a simple text file; another idea
>> for complicated objects is to use Python's pickle module.
> 
> It would be big... I am dealing with pairwise term comparisons and I want
> to consider different species as well.
> 
>> How often would the pre-computed data need to be updated?  Every time
>> there is a new Gene Ontology release?  It might be better to have the
>> module download and cache the latest version on request (rather than
>> shipping an out-of-date dataset with Biopython).
> 
> Yes, I could do that... Would it be OK to use MySQL in Biopython? If so, the
> module could download the latest GO version on request, install it and work
> with that version until the user decides to update it.

Asking users to use MySQL to do updates might be a bit much.  Could this
be done from the .obo files?

Sean

From biopython at maubp.freeserve.co.uk  Tue Oct 16 14:04:03 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 19:04:03 +0100
Subject: [BioPython] Precompute database information
In-Reply-To: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
Message-ID: <4714FD13.2020708@maubp.freeserve.co.uk>

Fernando wrote:
> Hi everybody,
> 
> I am thinking in including some algorithms that I work with into biopython.
> My first concern is that I'm using a local image of the Gene Ontology
> database to perform several operations. In order to avoid such database
> accesses I could precompute the information I need and load it once the
> module is called. How should I do it? Is there a guideline style to load
> external variables or something like that? Any other ideas/suggestions?

I think you need to go into more detail.

How big would your pre-computed data be?  If it's some sort of table or
other simple data you could perhaps use a simple text file; another idea
for complicated objects is to use Python's pickle module.

How often would the pre-computed data need to be updated?  Every time
there is a new Gene Ontology release?  It might be better to have the
module download and cache the latest version on request (rather than
shipping an out-of-date dataset with Biopython).

I don't think we have anything in Biopython that requires regular 
updates.  Things like genomes and sequence databases are left up to the 
user.
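
The pickle-and-cache idea could look something like the sketch below. It
is not existing Biopython code: load_or_build, the cache path and the
build callable are hypothetical names chosen for illustration.

```python
import os
import pickle


def load_or_build(cache_path, build):
    """Return cached data if cache_path exists, else build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    # No cache yet: do the expensive computation once and save the result.
    data = build()
    with open(cache_path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

Deleting the cache file forces a rebuild, for example after a new Gene
Ontology release. Pickle files should only be loaded from trusted
sources, since unpickling can execute arbitrary code.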

Peter


From fennan at gmail.com  Wed Oct 17 07:12:36 2007
From: fennan at gmail.com (Fernando)
Date: Wed, 17 Oct 2007 07:12:36 -0400
Subject: [BioPython] Precompute database information
In-Reply-To: <4715105A.30705@mail.nih.gov>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
	<4714FD13.2020708@maubp.freeserve.co.uk>
	<7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>
	<4715105A.30705@mail.nih.gov>
Message-ID: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>

>Asking users to use MySQL to do updates might be a bit much.  Could this
>be done from the .obo files?

I think that's probably the best solution... Is there any Python module for
working with OBO / OWL formats? I've been searching, but people seem to use
BioPerl for this.


From sdavis2 at mail.nih.gov  Wed Oct 17 11:34:17 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 17 Oct 2007 11:34:17 -0400
Subject: [BioPython] Precompute database information
In-Reply-To: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>	
	<4714FD13.2020708@maubp.freeserve.co.uk>	
	<7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>	
	<4715105A.30705@mail.nih.gov>
	<7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
Message-ID: <47162B79.8080204@mail.nih.gov>

Fernando wrote:
>>Asking users to use MySQL to do updates might be a bit much.  Could this
>>be done from the .obo files?
> 
> I think that's probably the best solution... Is there any python module
> for working with OBO / OWL  formats? I've been searching but people seem
> to use BioPerl for this matter

In a way, it seems silly to reimplement the Bio::OntologyIO stuff in
python, but I (and others, after a quick google search) would probably
benefit from such a thing.  I'm not able to devote much time right this
minute to the project, but I think that, given the huge number of
ontology files available, particularly in obo format, there would be use
for such parsers and tools in biopython.  How much interest/need is there
for a Bio.OntologyIO-like thing?  Has anyone made any attempts at creating one?

For a list of available biologic ontologies (to see what we are
missing), see here:

http://obofoundry.org/
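
To give a feel for how little is needed for the simplest cases, here is a
rough sketch of reading [Term] stanzas from an OBO flat file in plain
Python. It is not an existing Biopython module: parse_obo_terms is a
hypothetical name, the sample uses made-up EX: identifiers, and it
deliberately ignores escapes, quoted strings, trailing modifiers and
every stanza type other than [Term].

```python
def parse_obo_terms(lines):
    """Yield one dict per [Term] stanza, mapping tag -> list of values."""
    term = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; emit the previous term, if any.
            if term is not None:
                yield term
            term = {} if line == "[Term]" else None
        elif term is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop a trailing "! comment", as used on is_a lines.
            value = value.split(" ! ", 1)[0]
            term.setdefault(tag, []).append(value)
    if term is not None:
        yield term


# Hand-written sample in OBO syntax (identifiers are invented).
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: root term

[Term]
id: EX:0000002
name: child term
is_a: EX:0000001 ! root term

[Typedef]
id: part_of
"""

# Usage:
# for term in parse_obo_terms(SAMPLE.splitlines()):
#     print(term["id"][0], term.get("is_a", []))
```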

Sean

From luca.beltrame at unimi.it  Wed Oct 17 11:59:47 2007
From: luca.beltrame at unimi.it (Luca Beltrame)
Date: Wed, 17 Oct 2007 17:59:47 +0200
Subject: [BioPython] Precompute database information
In-Reply-To: <47162B79.8080204@mail.nih.gov>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
	<7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
	<47162B79.8080204@mail.nih.gov>
Message-ID: <200710171759.48595.luca.beltrame@unimi.it>

On Wednesday 17 October 2007 17:34:17 Sean Davis wrote:

> In a way, it seems silly to reimplement the Bio::OntologyIO stuff in

It depends on the perspective: for some, learning yet another programming
language would be a drawback.

> parsers and tools in biopython.  How much interest/need is there for a
> Bio.OntologyIO like thing?  Has anyone made any attempts at creating one?

Personally speaking, I would love it. No time (and skill) to even think about 
doing something like that, though.

-- 
Luca Beltrame, MSc. - Molecular Medicine PhD Student
Dipartimento di Scienze e Tecnologie Biomediche - UniMI
CNR - Institute of Biomedical Technologies Research Fellow
E-mail: luca dot beltrame [at] unimi dot it - Phone: +39-02-50320924

From jimmy.musselwhite at gmail.com  Wed Oct 17 17:20:41 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 17:20:41 -0400
Subject: [BioPython] Question about Seq.count()
Message-ID: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>

Hello all
I have a script that runs through a list of about 250,000 sequence
records and counts the number of occurrences of substrings 3-5
nucleotides in length

Here is some example code

from Bio import SeqIO

search = 'ATTCG'

# use SeqIO to get a big list of records
sequences = list(SeqIO.parse(file, "fasta"))

for record in sequences:
    # count occurrences of `search` in record.seq here
    ...

Now the code I want to do is
record.seq.count(search)

but what I am forced to do is
record.seq.tostring().count(search)

The problem here is that when I am forced to use .tostring() on every single
seq object it devastates my memory usage in a BIG way. It eats up about
1.2 GB and then crashes. If I remove the .tostring() and just tell it to
search for 'A', it will run fine and use memory at about 1/100th the rate

So my question boils down to: is there any way to make .count() search
for strings and not just single characters? Otherwise my work is going to
grind to a halt here.

Thanks!
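
One possibility worth checking, though it is only an assumption here, is
that the memory growth comes from holding all 250,000 records in a list
rather than from the counting itself. The sketch below streams records
one at a time instead. It uses a small plain-Python FASTA reader as a
stand-in so that it runs without Biopython; iter_fasta and count_motif
are hypothetical names. With Biopython, the equivalent would be looping
over SeqIO.parse(...) directly instead of wrapping it in list().

```python
def iter_fasta(handle):
    """Yield (title, sequence) pairs one record at a time."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; emit the one collected so far.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def count_motif(handle, motif):
    """Total non-overlapping occurrences of motif over all records."""
    return sum(seq.count(motif) for _, seq in iter_fasta(handle))


# Usage (the file name is a placeholder):
# with open("sequences.fasta") as handle:
#     print(count_motif(handle, "ATTCG"))
```

Only one sequence is held in memory at any time, so memory use should
stay roughly constant regardless of how many records the file contains.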

From biopython at maubp.freeserve.co.uk  Wed Oct 17 18:03:51 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Oct 2007 23:03:51 +0100
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
Message-ID: <471686C7.6050305@maubp.freeserve.co.uk>

Jimmy Musselwhite wrote:
> Now the code I want to do is
> record.seq.count(search)
> 
> but what I am forced to do is
> record.seq.tostring().count(search)
 >
> The problem here is that when I am forced to use .tostring() on every single
> seq object it devastates my memory usage in a BIG way. It eats up about
> 1.2gigs and then crashes. If I remove the .tostring() and just tell if to
> search for 'A', it will run fine and use memory at about 1/100th the rate

In the short term, try record.seq.data.count(search) which is what the 
tostring() method is doing anyway (the Seq object stores the sequence 
internally as a string).  Does that help?

We might be tweaking the Seq object after the next release to act a bit 
more like a string - at which point the .data property might go away.

> So my question sums down to, is there any way to make .count() be able to
> search for strings and not just characters?

I'd never noticed that - I would call it a bug...

 >>> from Bio.Seq import Seq
 >>> my_seq = Seq("AAACACACGGTTTT")
 >>> my_seq.data.count("GG")
1
 >>> my_seq.data.count("G")
2
 >>> my_seq.tostring().count("G")
2
 >>> my_seq.tostring().count("GG")
1
 >>> my_seq.count("G")
2
 >>> my_seq.count("GG")
0
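
As a side note for short motifs: the count() method of a plain Python
string counts non-overlapping matches only. If overlapping matches
matter, something like the following sketch could be used on the plain
string; count_overlapping is a hypothetical helper, not part of
Biopython.

```python
def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Resume one position later so overlapping matches are found too.
        start = sequence.find(motif, start + 1)
    return count


# Usage, with the example sequence from this message:
# "AAACACACGGTTTT".count("TT")                 -> non-overlapping count
# count_overlapping("AAACACACGGTTTT", "TT")    -> overlapping count
```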

Peter


From jimmy.musselwhite at gmail.com  Wed Oct 17 18:48:09 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 18:48:09 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
Message-ID: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>

Thanks guys! That worked great.


From jimmy.musselwhite at gmail.com  Wed Oct 17 18:52:07 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 18:52:07 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
Message-ID: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>

Just kidding, it didn't work great. It only "fixed" it because I was
printing out the output of count() and so it was just executing 100 times
slower and thus eating RAM 100 times slower :(

It doesn't seem like there is a good way for me to fix this.


From jimmy.musselwhite at gmail.com  Wed Oct 17 19:04:26 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 19:04:26 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
	<86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
Message-ID: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>

In response to the first reply you gave me, where you said this

I'd never noticed that - I would call it a bug...

 >>> from Bio.Seq import Seq
 >>> my_seq = Seq("AAACACACGGTTTT")
 >>> my_seq.data.count("GG")
1
 >>> my_seq.data.count("G")
2
 >>> my_seq.tostring().count("G")
2
 >>> my_seq.tostring().count("GG")
1
 >>> my_seq.count("G")
2
 >>> my_seq.count("GG")
0


I've tried that many many times and I always get 0 when I do
my_seq.count("GG")
I just rebuilt biopython from the latest CVS tarball and it still does not
work. I have no idea why yours works and mine doesn't.


From jimmy.musselwhite at gmail.com  Wed Oct 17 19:06:03 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 19:06:03 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
	<86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
	<86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>
Message-ID: <86e5e8970710171606x4ac9b3feg23f2409a4385d237@mail.gmail.com>

Man, I'm sorry, I didn't read that well enough. It doesn't work for you
either. I'm gonna stop responding to this e-mail now :) I'm clearly tired or
something.


On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
>
> In response to the first reply you gave me, where you said this
>
> You I'd never noticed that - I would call it a bug...
>
>  >>> from Bio.Seq import Seq
>  >>> my_seq = Seq("AAACACACGGTTTT")
>  >>> my_seq.data.count("GG")
> 1
>  >>> my_seq.data.count("G")
> 2
>  >>> my_seq.tostring().count("G")
> 2
>  >>> my_seq.tostring().count("GG")
> 1
>  >>> my_seq.count("G")
> 2
>  >>> my_seq.count("GG")
> 0
>
>
> I've tried that many many times and I always get 0 when I do
> my_seq.count("GG")
> I just rebuilt biopython from the latest CVS tarball and it still does not
> work. I have no idea why yours works and mine doesn't.
>
> On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
> >
> > Just kidding, it didn't work great. It only "fixed" it because I was
> > printing out the output of count() and so it was just executing 100 times
> > slower and thus eating RAM 100 times slower :(
> >
> > It doesn't seem like there is a good way for me to fix this.
> >
> > On 10/17/07, Jimmy Musselwhite < jimmy.musselwhite at gmail.com> wrote:
> > >
> > > Thanks guys! That worked great.
> > >
> > > On 10/17/07, Peter < biopython at maubp.freeserve.co.uk> wrote:
> > > >
> > > > Jimmy Musselwhite wrote:
> > > > > Now the code I want to do is
> > > > > record.seq.count(search)
> > > > >
> > > > > but what I am forced to do is
> > > > > record.seq.tostring().count(search)
> > > > >
> > > > > The problem here is that when I am forced to use .tostring() on
> > > > every single
> > > > > seq object it devastates my memory usage in a BIG way. It eats up
> > > > about
> > > > > 1.2gigs and then crashes. If I remove the .tostring() and just
> > > > tell if to
> > > > > search for 'A', it will run fine and use memory at about 1/100th
> > > > the rate
> > > >
> > > > In the short term, try record.seq.data.count (search) which is what
> > > > the
> > > > tostring() method is doing anyway (the Seq object stores the
> > > > sequence
> > > > internally as a string).  Does that help?
> > > >
> > > > We might be tweaking the Seq object after the next release to act a
> > > > bit
> > > > more like a string - at which point the .data property might go
> > > > away.
> > > >
> > > > > So my question sums down to, is there any way to make .count() be
> > > > able to
> > > > > search for strings and not just characters?
> > > >
> > > > > I'd never noticed that - I would call it a bug...
> > > >
> > > > >>> from Bio.Seq import Seq
> > > > >>> my_seq = Seq("AAACACACGGTTTT")
> > > > >>> my_seq.data.count("GG")
> > > > 1
> > > > >>> my_seq.data.count("G")
> > > > 2
> > > > >>> my_seq.tostring().count("G")
> > > > 2
> > > > >>> my_seq.tostring().count("GG")
> > > > 1
> > > > >>> my_seq.count("G")
> > > > 2
> > > > >>> my_seq.count("GG")
> > > > 0
> > > >
> > > > Peter
> > > >
> > > >
> > >
> >
>

From jimmy.musselwhite at gmail.com  Thu Oct 18 08:48:41 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Thu, 18 Oct 2007 08:48:41 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <471733DE.6050803@maubp.freeserve.co.uk>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
	<86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
	<471733DE.6050803@maubp.freeserve.co.uk>
Message-ID: <86e5e8970710180548u48e5780crc8d5178401d116d5@mail.gmail.com>

Peter
Well after a day of not thinking very hard I found my problem and it didn't
have anything to do with strings at all. That was just my best guess at the
time of writing this e-mail. Sorry about that =(

On 10/18/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Jimmy Musselwhite wrote:
> > Just kidding, it didn't work great. It only "fixed" it because I was
> > printing out the output of count() and so it was just executing 100
> times
> > slower and thus eating RAM 100 times slower :(
> >
> > It doesn't seem like there is a good way for me to fix this.
>
> Both of these are using the python string method to count "GG", the only
> difference is the tostring() method has the additional small overhead of
> an extra function call:
>
> my_seq.data.count("GG")
> my_seq.tostring().count("GG")
>
> However, comparing these:
>
> my_seq.data.count("G")         # using python's string count method
> my_seq.tostring().count("G")   # using python's string count method
> my_seq.count("G")              # using an iterator internally
>
> It could be that the Seq record's current single letter search is simply
> very memory efficient compared to the python string's more flexible
> multi-letter search.
>
> How are you measuring the RAM?  I'd like to see memory usage figures for
> the five simple examples above on a large sequence - plus doing this
> directly on the equivalent string.
>
> Are you using Linux or Windows or Mac OS, and what version of python?  I
> know there have been some string optimisations in Python 2.5 (although I
> don't know if any are relevant to the count method).
>
> Peter
>
>

From ytu888 at hotmail.com  Thu Oct 18 13:35:15 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Thu, 18 Oct 2007 12:35:15 -0500
Subject: [BioPython] Error for running the test code in BioSQL with
 Biopython manual
In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
	<BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
	<908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
Message-ID: <BAY119-W666D7B45057D45AB8D7AC8F9E0@phx.gbl>


I am still waiting for help fixing the problem on my Mac (attached at the bottom). However, to keep the project going, I found an old PC and installed Python, MySQL, BioSQL and Biopython on it. When I tested the code that comes with Basic BioSQL with Biopython, I got the following error:
=======================================my PC problem===============================

>>> from BioSQL import BioSeqDatabase
>>> server=BioSeqDatabase.open_database(driver="MySQLdb", user="root",
...     passwd="MySQLdb", host="localhost", db="bioseqdb")
>>> db=server.new_database("Viral")
>>> from Bio import GenBank
>>> parser=GenBank.FeatureParser()
>>> iterator = GenBank.Iterator(open("gbvrl.gb"), parser)
>>> db.load(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\lib\site-packages\BioSQL\BioSeqDatabase.py", line 414, in lo
ad
    db_loader.load_seqrecord(cur_record)
  File "C:\Python25\lib\site-packages\BioSQL\Loader.py", line 37, in load_seqrec
ord
    bioentry_id = self._load_bioentry_table(record)
  File "C:\Python25\lib\site-packages\BioSQL\Loader.py", line 260, in _load_bioe
ntry_table
    bioentry_id = self.adaptor.last_id('bioentry')
  File "C:\Python25\lib\site-packages\BioSQL\BioSeqDatabase.py", line 148, in la
st_id
    return self.dbutils.last_id(self.cursor, table)
  File "C:\Python25\Lib\site-packages\BioSQL\DBUtils.py", line 34, in last_id
    return cursor.insert_id()
AttributeError: 'Cursor' object has no attribute 'insert_id'
+++++++++++++++++++++++++++++++++++++++++++++++++

Please help me to fix the problem, thanks.
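
One possibly relevant detail, offered as a guess rather than a confirmed fix: the traceback ends in DBUtils.py calling cursor.insert_id(), which the cursor in this installation does not have. Many DB-API drivers report the key generated by the last INSERT through the cursor's lastrowid attribute instead. The sketch below uses the standard-library sqlite3 module as a stand-in for MySQLdb so that it runs anywhere, and last_id() is a hypothetical helper, not the actual BioSQL function.

```python
import sqlite3


def last_id(cursor):
    """Return the key generated by the cursor's most recent INSERT.

    Hypothetical helper: uses insert_id() when the driver's cursor has
    such a method, and otherwise falls back to the DB-API lastrowid
    attribute.
    """
    insert_id = getattr(cursor, "insert_id", None)
    if callable(insert_id):
        return insert_id()
    return cursor.lastrowid


# sqlite3 stands in for MySQLdb here; the table is a toy, not the BioSQL schema.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute(
    "CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT)"
)
cursor.execute("INSERT INTO bioentry (name) VALUES (?)", ("example",))
new_id = last_id(cursor)
print(new_id)
connection.close()
```

Whether the same fallback is appropriate inside BioSQL's DBUtils.py depends on the installed MySQLdb version, which the traceback alone does not settle.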


========================================my old Mac problem========================
Date: Tue, 16 Oct 2007 12:06:36 -0500
From: Y Tu <ytu888 at hotmail.com>
Subject: Re: [BioPython] Error for installation of  MySALdb on Mac OS
	X
To: Steve Lianoglou <lists.steve at arachnedesign.net>
Cc: biopython at lists.open-bio.org
Message-ID: <BAY119-W36D33A0C262F9D2A0101058F9C0 at phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"
 
 
Hi,
 
I reinstalled everything and checked every step. I found that there were some warnings in the "build" step (underlined). I wonder whether they are the reason I get the error messages when running "import MySQLdb" at the Python prompt, and how to fix the problem.
Thank you very much.
 
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build
running build
running build_py
... ...
/usr/bin/ld: for architecture ppc
/usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
/usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install
Password:
running install
... ...
Adding MySQL-python 1.2.2 to easy-install.pth file
 
Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg
Processing dependencies for MySQL-python==1.2.2
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path
  import sys, pkg_resources, imp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MySQLdb/__init__.py", line 19, in <module>
    import _mysql
  File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in <module>
  File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
  Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
  Reason: image not found


From biopython at maubp.freeserve.co.uk  Thu Oct 18 06:22:22 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Oct 2007 11:22:22 +0100
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>	<471686C7.6050305@maubp.freeserve.co.uk>	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
	<86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
Message-ID: <471733DE.6050803@maubp.freeserve.co.uk>

Jimmy Musselwhite wrote:
> Just kidding, it didn't work great. It only "fixed" it because I was
> printing out the output of count() and so it was just executing 100 times
> slower and thus eating RAM 100 times slower :(
> 
> It doesn't seem like there is a good way for me to fix this.

Both of these are using the python string method to count "GG", the only 
difference is the tostring() method has the additional small overhead of 
an extra function call:

my_seq.data.count("GG")
my_seq.tostring().count("GG")

However, comparing these:

my_seq.data.count("G")         # using python's string count method
my_seq.tostring().count("G")   # using python's string count method
my_seq.count("G")              # using an iterator internally

It could be that the Seq record's current single letter search is simply 
very memory efficient compared to the python string's more flexible 
multi-letter search.

How are you measuring the RAM?  I'd like to see memory usage figures for 
the five simple examples above on a large sequence - plus doing this 
directly on the equivalent string.
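
One way such figures could be gathered is sketched below. It assumes a modern Python 3 interpreter, because the tracemalloc module it relies on did not exist in the Python 2.5 era of this thread. count_by_iteration() is a hypothetical stand-in for an element-by-element count, and peak_memory() is an illustrative helper, not part of Biopython.

```python
import tracemalloc


def count_by_iteration(text, item):
    # Element-by-element count: builds a temporary list of matching letters.
    return len([x for x in text if x == item])


def peak_memory(func, *args):
    # Run func(*args) and report the peak number of bytes allocated meanwhile.
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


sequence = "ACGT" * 50000  # a 200,000-letter test sequence

n_str, peak_str = peak_memory(sequence.count, "G")
n_iter, peak_iter = peak_memory(count_by_iteration, sequence, "G")

print("str.count   : count=%d peak=%d bytes" % (n_str, peak_str))
print("by iteration: count=%d peak=%d bytes" % (n_iter, peak_iter))
```

The same helper can be pointed at each of the five calls listed above to compare them on one sequence.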

Are you using Linux or Windows or Mac OS, and what version of python?  I 
know there have been some string optimisations in Python 2.5 (although I 
don't know if any are relevant to the count method).

Peter


From dalloliogm at gmail.com  Fri Oct 19 09:38:50 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 19 Oct 2007 15:38:50 +0200
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
Message-ID: <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>

2007/10/18, Peter <biopython at maubp.freeserve.co.uk>:
>  >>> from Bio.Seq import Seq
>  >>> my_seq = Seq("AAACACACGGTTTT")
>  >>> my_seq.count("G")
> 2
>  >>> my_seq.count("GG")
> 0


I've found the bug!

The code for Bio.Seq.count is:


    def count(self, item):
        return len([x for x in self.data if x == item])

It does not work for patterns of two nucleotides, because '[x for x in
self.data]' iterates over the sequence one letter at a time:

>>> s = Seq( 'ACTTgGCATYCGgtGACGACTGGGcATCGGTCAGTCGGTTT')
>>> [x for x in s.data]
['A', 'C', 'T', 'T', 'g', 'G', 'C', 'A', 'T', 'Y', 'C', 'G', 'g', 't',
'G', 'A', 'C', 'G', 'A', 'C', 'T', 'G', 'G', 'G', 'c', 'A', 'T', 'C',
'G', 'G', 'T', 'C', 'A', 'G', 'T', 'C', 'G', 'G', 'T', 'T', 'T']
>>> for x in s.data:
...     print x, 'GG', x == 'GG'
(always False)


Something like [len('GG' in s.data)] also won't work, because "'GG' in
s.data" returns a Boolean value:
>>> 'GG' in s.data
True

What about using regular expressions instead?

>>> import re
>>> r = re.compile('GG')
>>> count = len(r.findall(my_seq.data))

They don't seem to differ much in execution time:

# for i in $( seq 10); do time python -m re -c '"cdasd".count("cc")';
done 2>&1| grep real
real    0m0.091s
real    0m0.106s
real    0m0.081s
real    0m0.110s
real    0m0.076s
real    0m0.109s
real    0m0.109s
real    0m0.062s
real    0m0.110s
real    0m0.062s


# for i in $(seq 10); do time python -m re -c 'len(re.findall("cc",
"cdasd"))'; done 2>&1|grep real
real    0m0.065s
real    0m0.108s
real    0m0.079s
real    0m0.082s
real    0m0.111s
real    0m0.113s
real    0m0.110s
real    0m0.112s
real    0m0.112s
real    0m0.111s
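
Note that the figures above time a whole interpreter run, so they are dominated by start-up, and with "-m re" the string after "-c" is most likely handed to the module as an argument rather than executed. A sketch that times only the two calls, using the standard timeit module, might look like this. The test string and repeat count are arbitrary choices for illustration, and the syntax assumes Python 3.

```python
import re
import timeit

TEXT = "ACGGTCCGGA" * 1000   # arbitrary 10,000-letter test string
PATTERN = "GG"


def count_with_str():
    # Non-overlapping substring count using the built-in string method.
    return TEXT.count(PATTERN)


def count_with_re():
    # The same count using the regular expression module.
    return len(re.findall(PATTERN, TEXT))


str_seconds = timeit.timeit(count_with_str, number=1000)
re_seconds = timeit.timeit(count_with_re, number=1000)

print("str.count : %.4f s for 1000 calls" % str_seconds)
print("re.findall: %.4f s for 1000 calls" % re_seconds)
```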


Compiling a short pattern with the re module shouldn't take too much
time, and maybe in future implementations it will allow us to do more
interesting things: for example, we will be able to add an
'ignorecase' parameter to Seq.count:

>>> Bio.Seq('ACAGtcAGgCATGCGG').count('GG', 'ignorecase')
2
>>> Bio.Seq('ACAGtcAGgCATGCGG').count('GG')
1
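
A minimal sketch of how such an option could behave, written as a plain function on strings rather than as a Seq method. The name count_pattern and the ignore_case keyword are hypothetical, chosen only for illustration; nothing like this exists in Biopython as far as this thread shows.

```python
import re


def count_pattern(sequence, pattern, ignore_case=False):
    """Count non-overlapping occurrences of pattern in sequence.

    re.escape() makes the pattern match literally, so the result agrees
    with str.count() when ignore_case is False.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(re.escape(pattern), sequence, flags))


print(count_pattern("ACAGtcAGgCATGCGG", "GG", ignore_case=True))
print(count_pattern("ACAGtcAGgCATGCGG", "GG"))
```

With the example sequence above, the case-insensitive call also counts the "Gg" pair, which is why it reports one more match than the case-sensitive call.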

What do you think?

Cheers,
Giovanni


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From biopython at maubp.freeserve.co.uk  Fri Oct 19 10:50:56 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 19 Oct 2007 15:50:56 +0100
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>
Message-ID: <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>

> I've found the bug!
>
> The code for Bio.Seq.count is:
>
> def count(self, item):
>         return len([x for x in self.data if x == item])

Yeah - by design this (and the functionally similar version for the
MutableSeq) both expect the count argument to be a single letter.  The
simple fix for the Seq object is to use the string method internally:

def count(self, item):
        return self.data.count(item)

For the MutableSeq things are not so straightforward, but supporting
multiple character arguments can be done.
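
To illustrate both cases, here is a small self-contained sketch. MiniSeq and MiniMutableSeq are hypothetical stand-ins, not the real Biopython classes, and the mutable one is assumed to hold its letters in a plain list; the actual MutableSeq storage may differ.

```python
class MiniSeq(object):
    """Toy immutable sequence that stores its letters as a string."""

    def __init__(self, data):
        self.data = data

    def count(self, item):
        # Delegate to the string method, so multi-letter items work.
        return self.data.count(item)


class MiniMutableSeq(object):
    """Toy mutable sequence that stores its letters in a list."""

    def __init__(self, data):
        self.data = list(data)

    def count(self, item):
        if len(item) == 1:
            # A single letter can be counted directly in the list.
            return self.data.count(item)
        # For longer items, join the letters and use the string method.
        return "".join(self.data).count(item)


my_seq = MiniSeq("AAACACACGGTTTT")
my_mutable = MiniMutableSeq("AAACACACGGTTTT")
print(my_seq.count("G"), my_seq.count("GG"))
print(my_mutable.count("G"), my_mutable.count("GG"))
```

Joining the list costs a temporary copy of the sequence, which is one reason the mutable case is less simple than the immutable one.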

> What about using regular expressions instead?
> ...
> What do you think?

I think the Seq object's count method should act just like a normal
python string's count method.  If anyone wants to get fancy with
regular expressions, they can do so.

Peter

From anaryin at gmail.com  Mon Oct 22 08:21:49 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Mon, 22 Oct 2007 13:21:49 +0100
Subject: [BioPython] Scripts cannot connect
Message-ID: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>

Hello all! I solved my problem a few weeks ago on Windows but now that I've
changed to Linux, it is back again.

I have this script:

#!/usr/bin/env python

from SOAPpy import WSDL

wsdl = 'http://soap.genome.jp/KEGG.wsdl'

serv = WSDL.Proxy(wsdl)

genes = ["eco:b1002", "eco:b2388"]

results = serv.mark_pathway_by_objects("path:eco00010", genes)

print results

Every time I try to run it, I get a timeout. I solved the problem on
Windows by setting environment variables. Here, bash can access the web (it
has the http_proxy environment variable set) but my scripts can't. Any help?

Thanks in advance!

João Rodrigues


From biopython at maubp.freeserve.co.uk  Mon Oct 22 08:48:52 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Oct 2007 13:48:52 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
Message-ID: <471C9C34.7000006@maubp.freeserve.co.uk>

João Rodrigues wrote:
> Everytime I try to run it, it gets me a timeout. I solved the problem in
> Windows by setting up env_variables. Here, the bash can access the web (it
> has its env_var http_proxy set) but my scripts can't.. any help?

What does this do if you add it to your script?

import os
print os.environ.keys()
try :
     print os.environ["http_proxy"]
except KeyError :
     print "http_proxy environment variable not setup"

How have you setup the environment variables in Linux? Via your .bashrc 
file?
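
A related check, sketched for a modern Python 3 interpreter (the snippet above is Python 2): the standard library can report which proxy its own URL handling would pick up. The helper name http_proxy_seen_by_python is hypothetical, and whether SOAPpy consults the same settings is an assumption, not something established here.

```python
import os
import urllib.request


def http_proxy_seen_by_python():
    # getproxies() collects proxy settings, including *_proxy environment
    # variables, and returns a dict keyed by URL scheme.
    return urllib.request.getproxies().get("http")


print("http_proxy in os.environ:", "http_proxy" in os.environ)
print("proxy urllib would use  :", http_proxy_seen_by_python())
```

If the first line prints False when the script is launched, the variable was probably set in the shell without being exported.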

Peter


From anaryin at gmail.com  Mon Oct 22 09:11:46 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Mon, 22 Oct 2007 14:11:46 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <471C9C34.7000006@maubp.freeserve.co.uk>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
	<471C9C34.7000006@maubp.freeserve.co.uk>
Message-ID: <b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>

Hello again!

It says that the proxy isn't set.. I've added the line to my .bashrc ( I had
to create it). Yet, it doesn't work.

What am I doing wrong? (or not doing)

From tiagoantao at gmail.com  Mon Oct 22 10:01:53 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Mon, 22 Oct 2007 15:01:53 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>	<471C9C34.7000006@maubp.freeserve.co.uk>
	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
Message-ID: <471CAD51.101@gmail.com>

João Rodrigues wrote:
> It says that the proxy isn't set.. I've added the line to my .bashrc ( I had
> to create it). Yet, it doesn't work.
> 
> What am I doing wrong? (or not doing)


Are you doing an export of the variable?
Try doing env at the prompt and check if http_proxy is defined (you will 
get a big list of environment variables, just search or grep for the 
proxy one). Like:
$ env | grep http_proxy


On another front, your .bash_profile should exist and be sourcing 
.bashrc (either that, or you put http_proxy on .bash_profile)

Regards,
Tiago


-- 
tiagoantao at gmail.com
http://tiago.org/ps


From anaryin at gmail.com  Mon Oct 22 11:38:19 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Mon, 22 Oct 2007 16:38:19 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
	<471C9C34.7000006@maubp.freeserve.co.uk>
	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
	<320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
Message-ID: <b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>

Well, the problem is something else then. I've set the environment variables by
hand and it worked. It detects the proxy and works through it.

However, it still doesn't connect to the web. I'm using the example they
gave on the KEGG API reference manual so it *should* work..

I've used a test script to check if other scripts could connect and they do.
I've tried with the urllib to retrieve the kegg page and it does. I guess
the problem is with the webservice... I'll try to figure it out.

Thanks for your help! (Again :) )

From bsantos at biocant.pt  Tue Oct 23 11:57:58 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 23 Oct 2007 16:57:58 +0100
Subject: [BioPython] Problems with NCBIXML.py
Message-ID: <001101c8158d$7d146600$2300a8c0@bsantos>

I am trying to build a simple script that given a multi FASTA sequence file
perform a web BLAST and replace the name of the sequence by the hit with the
lowest E-Value.

But now I'm getting an exception and I don't know why it's happening:

 
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 16, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in parse
    expat_parser.Parse(text, False)
ExpatError: mismatched tag: line 2823, column 362

 
And here is my script:

 
from Bio import SeqIO
from Bio.Blast import NCBIWWW
import cStringIO
from Bio.Blast import NCBIXML

#for file in dir
file_handle = open(r'C:/FASTASeq/Results/Well9/assembled_file_well9_Dt_DIST.fna') #Open file to an handler
records = SeqIO.parse(file_handle, format="fasta") #Store the file in a Seq Object
save_file = open(r'C:/FASTASeq/Results/Well9/D1_Blast.xml', "w")

for record in records:
    sequence = record.seq.data #Converts record to Plain Text
    result_handle = NCBIWWW.qblast("blastn", "nr", sequence) #Performs a Blastn against the database nr
    blast_results = result_handle.read() #Catch the results
    save_file.write(blast_results) #Write all the information to an XML file

result_handle = open(r'C:/FASTASeq/Results/Well9/D1_Blast.xml')
blast_records = NCBIXML.parse(result_handle)

for blast_record in blast_records:
    alignment = blast_record.alignments
    nIdent = (alignment[0].hsps[0].positives/float(alignment[0].hsps[0].align_length))*100.0
    if nIdent >= 97:
        record.name = alignment[0].hit_def

for record in records:
    print('>description_%s length_%d\n' % (record.name, len(record.seq)))
    print('%s\n' % record.seq)

save_file.close()
file_handle.close()

 
Thank you,

Bruno Santos


From bsantos at biocant.pt  Tue Oct 23 13:17:24 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 23 Oct 2007 18:17:24 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <471E1CBC.30601@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
	<471E1CBC.30601@maubp.freeserve.co.uk>
Message-ID: <001b01c81598$95f7b3b0$2300a8c0@bsantos>

I have manually checked the file and I didn't find any problem.
Sorry about sending the message three times. It was my mistake: I sent it
before registering and then thought I had to send it again.
This is getting stranger: every time I run the script it gives me a different
error. Now I get this one on the first run:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: unclosed token: line 2826, column 8

Now if I run the script without closing it first, I get the following error:
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: no element found: line 2823, column 81

Now if I execute the close operation on both files in the interactive window
and run the script again I get:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: no element found: line 2827, column 0

I have uploaded my script, the FASTA file I'm using and the XML. Can anyone
take a look?

XML File: http://www.drivehq.com/folder/p2731454.aspx
Script: http://www.drivehq.com/folder/p2731447.aspx
FASTA File: http://www.drivehq.com/folder/p2731426.aspx


Unidade de Bioinformática  

3060-197 Cantanhede  
Tel: 231 410 892
http://bioinformatics.biocant.pt

-----Original Message-----
From: Peter [mailto:biopython at maubp.freeserve.co.uk] 
Sent: Tuesday, 23 October 2007 17:10
To: Bruno Santos
Cc: biopython at biopython.org
Subject: Re: [BioPython] Problems with NCBIXML.py

Bruno Santos wrote:
> I am trying to build a simple script that given a multi FASTA sequence file
> perform a web BLAST and replace the name of the sequence by the hit with the
> lowest E-Value.
> 
> But now I'm getting an exception and I don't know why it's happening:
> 
> Traceback (most recent call last):
> ...
> 
>     for blast_record in blast_records:
> 
>   File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in
> parse
> 
>     expat_parser.Parse(text, False)
> 
> ExpatError: mismatched tag: line 2823, column 362

That sounds like an error in the XML file - have a look at this 
particular XML file by hand in a text editor; maybe its only a partial 
download, or an HTML error page or something.

Peter


From biopython at maubp.freeserve.co.uk  Tue Oct 23 14:14:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 19:14:43 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001b01c81598$95f7b3b0$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>
	<001b01c81598$95f7b3b0$2300a8c0@bsantos>
Message-ID: <471E3A13.5080505@maubp.freeserve.co.uk>

Bruno Santos wrote:
> I have manually checked the file and I didn't find any problem.
> Sorry about sending it three times; it was my mistake. I sent the message
> before registering and then thought I had to send it again.
> This is getting stranger: every time I run the script it gives me a
> different error. Now I get this one on the first run:
>
> ...
> 
> Now if I run the script again without closing it first, I get the following error:
> Traceback (most recent call last):
> 

Without seeing the XML file I'm having to guess - but this could be 
something to do with trying to read files from disk before the OS has 
finished flushing the data out.  Mismatched tags could certainly be 
explained if the parser was only getting part of the data.

You could try inserting a sleep of a few seconds after writing and 
closing the XML file.  Also try handle.flush() before the handle.close() 
when you save the XML file to disk.
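
A minimal sketch of that save-then-read pattern, using only the standard library. The helper names save_text and load_text are made up for illustration, and the os.fsync call goes one step beyond the flush suggested above, so treat it as an optional extra rather than a required step:

```python
import os


def save_text(path, text):
    """Write text to path and push it to disk before returning."""
    handle = open(path, "w")
    try:
        handle.write(text)
        handle.flush()              # empty Python's own buffer
        os.fsync(handle.fileno())   # ask the OS to write its buffers to disk
    finally:
        handle.close()              # always close, even if the write failed


def load_text(path):
    """Read the whole file back, closing the handle afterwards."""
    handle = open(path)
    try:
        return handle.read()
    finally:
        handle.close()
```

The point is simply that the file is closed before anything tries to parse it, so the parser sees the complete data.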

> I have uploaded my script, the FASTA file I'm using, and the XML file. Could
> anyone take a look?
> 
> XML File: http://www.drivehq.com/folder/p2731454.aspx
> Script: http://www.drivehq.com/folder/p2731447.aspx
> FASTA File: http://www.drivehq.com/folder/p2731426.aspx

That didn't work - the easy solution is to file a bug, and then attach 
the three files:

http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

Peter


From dag23 at duke.edu  Tue Oct 23 17:06:53 2007
From: dag23 at duke.edu (David Garfield)
Date: Tue, 23 Oct 2007 17:06:53 -0400
Subject: [BioPython] Syntax error while parsing Blast output
Message-ID: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>

Hey list,

    I'm having an issue with the BlastParser and Iterator from  
NCBIStandalone.  I assume it's because NCBI has gone and changed the  
output file (again)...or I'm an idiot....but maybe there's a real  
problem here.


I'm trying to parse a blast result using the following code:

def filter_blast_results(blast_results, blast_cut_off):
	b_parser = NCBIStandalone.BlastParser()
	b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
	hit_results = {}
	while 1:
		b_record = b_iterator.next()
		if b_record is None:
			break
		header = b_record.Header.query
		temp = []
		for alignment in b_record.alignments:
			for hsp in alignment.hsps:
				if hsp.expect < blast_cut_off:
					temp.append(alignment.title)
		# we now remove duplicates from the temp list and add that to hit_results
		hit_results[header] = remove_duplicates(temp)
	return hit_results

And I get the error I've included at the bottom of this message,  
something about "SyntaxError: Line does not start with 'Reference':"
I know that blast is working because I can print out what appears to  
my untrained eye to be a perfectly good XML of the results I see when  
I run blast manually.


Any help would be very much appreciated,

David


Traceback (most recent call last):
   File "test_scripts.py", line 7, in <module>
     single_blast_sequence.run_2way_blast('single_test_in.fasta','/ 
Users/dagarfield/urchins/blastdbs/urchin_2.0','/Users/dagarfield/ 
urchins/blastdbs/urchin_2.0','NA',.001,'/Users/dagarfield/urchins/ 
urchin_bin/blastall')
   File "/private/var/automount/Network/Share2/genomeScans/urchins/ 
alignment_methods/blast/single_blast_sequence.py", line 57, in  
run_2way_blast
     input_to_other_blast_matches = filter_blast_results 
(blast_results, blast_cut_off)
   File "/private/var/automount/Network/Share2/genomeScans/urchins/ 
alignment_methods/blast/single_blast_sequence.py", line 39, in  
filter_blast_results
     b_record = b_iterator.next()
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ 
python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1403, in next
     return self._parser.parse(File.StringHandle(data))
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ 
python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 616, in parse
     self._scanner.feed(handle, self._consumer)
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ 
python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 96, in feed
     self._scan_header(uhandle, consumer)
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ 
python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 125, in  
_scan_header
     read_and_call(uhandle, consumer.reference, start='Reference')
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ 
python2.5/site-packages/Bio/ParserSupport.py", line 300, in  
read_and_call
     raise SyntaxError, errmsg
SyntaxError: Line does not start with 'Reference':
   <BlastOutput_db>/Users/dagarfield/urchins/blastdbs/urchin_2.0</ 
BlastOutput_db>


From biopython at maubp.freeserve.co.uk  Tue Oct 23 17:45:38 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 22:45:38 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
Message-ID: <471E6B82.5010700@maubp.freeserve.co.uk>

David Garfield wrote:
> Hey list,
> 
>     I'm having an issue with the BlastParser and Iterator from  
> NCBIStandalone.  I assume it's because NCBI has gone and changed the  
> output file (again)...or I'm an idiot....but maybe there's a real  
> problem here.

The code you gave uses the NCBIStandalone parser/iterator, which expects 
plain text output - yet you say later the raw file looks like a 
perfectly good XML file.  If you have an XML file (which we recommend 
over the plain text) then you should use the NCBIXML module instead.
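
A rough sketch of the same filtering logic written against records from the XML parser. It assumes each record exposes .query and .alignments, each alignment has .title and .hsps, and each HSP has .expect. The function name filter_blast_records is made up for illustration, and the file name in the usage comment is a placeholder:

```python
def filter_blast_records(blast_records, blast_cut_off):
    """Map each query name to the titles of alignments with an HSP below the cut-off.

    blast_records is any iterable of objects with .query and .alignments,
    where each alignment has .title and .hsps, and each HSP has .expect.
    """
    hit_results = {}
    for record in blast_records:
        titles = []
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                # Skip titles already collected, so no separate de-duplication pass.
                if hsp.expect < blast_cut_off and alignment.title not in titles:
                    titles.append(alignment.title)
        hit_results[record.query] = titles
    return hit_results


# Assumed usage with Biopython installed and an XML result file on disk
# ("blast_results.xml" is a placeholder name):
#
#     from Bio.Blast import NCBIXML
#     handle = open("blast_results.xml")
#     hits = filter_blast_records(NCBIXML.parse(handle), 0.001)
#     handle.close()
```

Because the function only relies on attribute names, it can be tried out on small hand-made objects before pointing it at real BLAST output.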

Also, a style point - I personally much prefer this:

b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
for b_record in b_iterator :
     #etc

over this:

b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
while 1:
     b_record = b_iterator.next()
     if b_record is None: break
     #etc

Peter


From dag23 at duke.edu  Tue Oct 23 17:59:33 2007
From: dag23 at duke.edu (David Garfield)
Date: Tue, 23 Oct 2007 17:59:33 -0400
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <471E6B82.5010700@maubp.freeserve.co.uk>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
Message-ID: <B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>

Thanks, Peter.  You've found the problem exactly.

Interestingly, the code I presented was taken directly from the  
BioPython cookbook (including the "while 1" bit).

Somewhere in the subsequent versions since that document was  
released, the output of NCBIStandalone has changed from text to XML  
and the NCBIStandalone Iterators and Parser either no longer seem to  
work with the output of NCBIStandalone.blastall or there is an option  
not mentioned in the Cookbook to ensure that the output is in text  
rather than XML.

In any event, the problem is now fixed.  Thanks!

--DG


On Oct 23, 2007, at 5:45 PM, Peter wrote:

> David Garfield wrote:
>> Hey list,
>>     I'm having an issue with the BlastParser and Iterator from
>> NCBIStandalone.  I assume it's because NCBI has gone and changed
>> the output file (again)...or I'm an idiot....but maybe there's a
>> real problem here.
>
> The code you gave uses the NCBIStandalone parser/iterator, which  
> expects plain text output - yet you say later the raw file looks  
> like a perfectly good XML file.  If you have an XML file (which we  
> recommend over the plain text) then you should use the NCBIXML  
> module instead.
>
> Also, a style point - I personally much prefer this:
>
> b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
> for b_record in b_iterator :
>     #etc
>
> over this:
>
> b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
> while 1:
>     b_record = b_iterator.next()
>     if b_record is None: break
>     #etc
>
> Peter
>


From biopython at maubp.freeserve.co.uk  Tue Oct 23 18:48:28 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 23:48:28 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
Message-ID: <471E7A3C.5010301@maubp.freeserve.co.uk>

David Garfield wrote:
> Thanks, Peter.  You've found the problem exactly.
> 
> Interestingly, the code I presented was taken directly from the 
> BioPython cookbook (including the "while 1" bit).

So it is.  Michiel - do you fancy tweaking that section of the tutorial?

> Somewhere in the subsequent versions since that document was released, 
> the output of NCBIStandalone has changed from text to XML and the 
> NCBIStandalone Iterators and Parser either no longer seem to work with 
> the output of NCBIStandalone.blastall or there is an option not 
> mentioned in the Cookbook to ensure that the output is in text rather 
> than XML.

Biopython 1.43 switched the default from text to XML, because we really 
wanted to encourage people to use the XML output by default as 
maintaining the text format parser is such an ongoing maintenance 
effort.  The release notes did mention this, but it was bound to catch 
someone out.

There is an option to override this...

from Bio.Blast import NCBIStandalone
help(NCBIStandalone.blastall)

You need the align_view option (what the NCBI refers to as the alignment 
view), corresponding to the -m command line option of the NCBI blastall 
tool.  Biopython currently defaults to seven to get XML output.

alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = query-anchored no identities and blunt ends,
6 = flat query-anchored, no identities and blunt ends,
7 = XML Blast output,
8 = tabular,
9 = tabular with comment lines,
10 = ASN, text,
11 = ASN, binary [Integer]
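
To make the link between the view number and the -m switch concrete, here is a small sketch that only assembles a command line for the legacy blastall tool and runs nothing. The helper name blastall_argv and the constant names are hypothetical. The -p, -d and -i switches are the usual blastall program, database and input options:

```python
ALIGN_VIEW_PAIRWISE = 0   # pairwise (plain text)
ALIGN_VIEW_XML = 7        # XML Blast output
ALIGN_VIEW_TABULAR = 8    # tabular


def blastall_argv(blastcmd, program, database, infile, align_view=ALIGN_VIEW_XML):
    """Build an argument list for the legacy NCBI blastall tool.

    Hypothetical helper for illustration; it only assembles the arguments
    and does not run anything.
    """
    if not 0 <= int(align_view) <= 11:
        raise ValueError("align_view must be between 0 and 11")
    return [
        blastcmd,
        "-p", program,
        "-d", database,
        "-i", infile,
        "-m", str(align_view),
    ]
```

Passing ALIGN_VIEW_PAIRWISE here corresponds to asking for the plain text output that the NCBIStandalone parser is meant to read.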

Peter


From biopython at maubp.freeserve.co.uk  Tue Oct 23 12:09:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 17:09:32 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001101c8158d$7d146600$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
Message-ID: <471E1CBC.30601@maubp.freeserve.co.uk>

Bruno Santos wrote:
> I am trying to build a simple script that, given a multi-FASTA sequence
> file, performs a web BLAST and replaces the name of each sequence with the
> hit with the lowest E-value.
> 
> But now I'm getting an exception and I don't know why it's happening:
> 
> Traceback (most recent call last):
> ...
> 
>     for blast_record in blast_records:
> 
>   File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in
> parse
> 
>     expat_parser.Parse(text, False)
> 
> ExpatError: mismatched tag: line 2823, column 362

That sounds like an error in the XML file - have a look at this 
particular XML file by hand in a text editor; maybe it's only a partial 
download, or an HTML error page or something.
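
Besides looking by eye, a quick well-formedness check is possible with the standard library's expat bindings, the same parser named in the traceback. This is only a sketch: the helper name check_well_formed is made up for illustration, and it is written for a current Python rather than the Python 2.5 used in this thread:

```python
import xml.parsers.expat


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else (line, column, message).

    Hypothetical helper for illustration; not part of Biopython.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Second argument True tells expat this is the final chunk of data.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None


if __name__ == "__main__":
    print(check_well_formed("<a><b></b></a>"))
    print(check_well_formed("<a><b></a>"))
```

The reported line number is a useful hint: a problem at the very end of the file would be consistent with a partial download.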

Peter


From mdehoon at c2b2.columbia.edu  Tue Oct 23 20:19:47 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 23 Oct 2007 20:19:47 -0400
Subject: [BioPython] Syntax error while parsing Blast output
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk><B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
	<471E7A3C.5010301@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>

> > Interestingly, the code I presented was taken directly from the 
> > BioPython cookbook (including the "while 1" bit).
> 
> So it is.  Michiel - do you fancy tweaking that section of the tutorial?

That part of the tutorial is in the section "Deprecated BLAST parsers", which
will be removed once the plain-text Blast parser is removed from Biopython.
The description of NCBIStandalone.blastall says

"This command will generate BLAST output in XML format, ..."

So this is being described correctly in the documentation.

Nevertheless, it may be a good idea to remove the plain text Blast parser
completely from Biopython in the upcoming release (which will probably be
done this week), to avoid further confusion.

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Peter
Sent: Tue 10/23/2007 6:48 PM
To: David Garfield; biopython at lists.open-bio.org
Subject: Re: [BioPython] Syntax error while parsing Blast output
 
David Garfield wrote:
> Thanks, Peter.  You've found the problem exactly.
> 
> Somewhere in the subsequent versions since that document was released, 
> the output of NCBIStandalone has changed from text to XML and the 
> NCBIStandalone Iterators and Parser either no longer seem to work with 
> the output of NCBIStandalone.blastall or there is an option not 
> mentioned in the Cookbook to ensure that the output is in text rather 
> than XML.

Biopython 1.43 switched the default from text to XML, because we really 
wanted to encourage people to use the XML output by default as 
maintaining the text format parser is such an ongoing maintenance 
effort.  The release notes did mention this, but it was bound to catch 
someone out.

There is an option to override this...

from Bio.Blast import NCBIStandalone
help(NCBIStandalone.blastall)

You need the align_view option (what the NCBI refers to as the alignment 
view), corresponding to the -m command line option of the NCBI blastall 
tool.  Biopython currently defaults to seven to get XML output.

alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = query-anchored no identities and blunt ends,
6 = flat query-anchored, no identities and blunt ends,
7 = XML Blast output,
8 = tabular,
9 = tabular with comment lines,
10 = ASN, text,
11 = ASN, binary [Integer]

Peter

_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From biopython at maubp.freeserve.co.uk  Wed Oct 24 04:22:45 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 09:22:45 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>

[Sorry you got this twice Michiel, I forgot to set the from/to fields]

> That part of the tutorial is in the section "Deprecated BLAST parsers", which
> will be removed once the plain-text Blast parser is removed from Biopython.
> ...
> Nevertheless, it may be a good idea to remove the plain text Blast parser
> completely from Biopython in the upcoming release (which will probably be
> done this week), to avoid further confusion.

Removing it sounds too drastic - especially as we have had people on
the mailing list using it  deliberately fairly recently.  If you really do want
to remove this code, then adding a deprecation warning to the plain text
parser for the next release would be a more gentle route.
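
For anyone unfamiliar with how such a warning is usually added, a minimal sketch with the standard warnings module follows. The class PlainTextParser is a hypothetical stand-in, not the actual Biopython parser, and the message text is invented:

```python
import warnings


class PlainTextParser(object):
    """Hypothetical stand-in for a parser that is being phased out."""

    def __init__(self):
        warnings.warn(
            "The plain text BLAST parser is deprecated; "
            "consider the XML output and parser instead.",
            DeprecationWarning,
            stacklevel=2,  # report the caller's line, not this one
        )

    def parse(self, handle):
        # Placeholder body: a real parser would build a record here.
        return handle.read()
```

Existing scripts keep working, but users get a message each time they create the parser. Depending on the Python version and the active warning filters, DeprecationWarning may be hidden by default, so it is worth checking that it is actually shown.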

I think there is still some benefit in having the plain text parser, and that
it could be fixed to cope with current multi-query files without too much
pain.  Maybe I should try this weekend...

Anyone want to voice their opinion?

Peter

From mmokrejs at ribosome.natur.cuni.cz  Wed Oct 24 07:01:26 2007
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Wed, 24 Oct 2007 13:01:26 +0200
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk><B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
Message-ID: <471F2606.8080500@ribosome.natur.cuni.cz>

Hi,

Michiel De Hoon wrote:
>>> Interestingly, the code I presented was taken directly from the 
>>> BioPython cookbook (including the "while 1" bit).
>> So it is.  Michiel - do you fancy tweaking that section of the tutorial?
> 
> That part of the tutorial is in the section "Deprecated BLAST parsers", which
> will be removed once the plain-text Blast parser is removed from Biopython.
> The description of NCBIStandalone.blastall says
> 
> "This command will generate BLAST output in XML format, ..."
> 
> So this is being described correctly in the documentation.
> 
> Nevertheless, it may be a good idea to remove the plain text Blast parser
> completely from Biopython in the upcoming release (which will probably be
> done this week), to avoid further confusion.

Although I understand your points, are you sure you want to REMOVE it? What if
people need to parse BLAST text outputs generated elsewhere, maybe even some
time ago? If you meant that you will REMOVE the text-based parser because it
won't be maintained anymore and would probably be usable for only one or two
NCBI BLAST versions, then it is more understandable. Otherwise I guess more
people will move to bioperl. ;) BTW, what if some people have older BLAST
versions that generate broken XML, or have to parse such old files again?

Martin

From winter at biotec.tu-dresden.de  Wed Oct 24 08:22:09 2007
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 24 Oct 2007 14:22:09 +0200
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk><B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
Message-ID: <471F38F1.1030600@biotec.tu-dresden.de>

Michiel De Hoon wrote:
> Nevertheless, it may be a good idea to remove the plain text Blast parser
> completely from Biopython in the upcoming release (which will probably be
> done this week), to avoid further confusion.

I agree with Peter and Martin that removing the plain text parser is maybe too 
much. Although I further agree that there is benefit in having the plain text 
parser, I am not sure if Biopython should ensure supporting every small format 
change that NCBI might come up with in the future.

I use XML and tabular output only, BTW.

Cheers,
Christof

From cjfields at uiuc.edu  Wed Oct 24 09:49:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 24 Oct 2007 08:49:09 -0500
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
	<320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>
Message-ID: <3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu>


On Oct 24, 2007, at 3:22 AM, Peter wrote:

> [Sorry you got this twice Michiel, I forgot to set the from/to fields]
>
>> That part of the tutorial is in the section "Deprecated BLAST  
>> parsers", which
>> will be removed once the plain-text Blast parser is removed from  
>> Biopython.
>> ...
>> Nevertheless, it may be a good idea to remove the plain text Blast  
>> parser
>> completely from Biopython in the upcoming release (which will  
>> probably be
>> done this week), to avoid further confusion.
>
> Removing it sounds too drastic - especially as we have had people on
> the mailing list using it  deliberately fairly recently.  If you  
> really do want
> to remove this code, then adding a deprecation warning to the plain  
> text
> parser for the next release would be a more gentle route.
>
> I think there is still some benefit in having the plain text  
> parser, and that
> it could be fixed to cope with current multi-query files without  
> too much
> pain.  Maybe I should try this weekend...
>
> Anyone want to voice their opinion?
>
> Peter

We have a similar issue with the bioperl parsers.  We basically  
promote the BLAST XML parser over the text parser, but we have  
retained both due to demand.  In fact, we have two text parsers, a  
pull and a push parser (we're gluttons for punishment).  As for  
maintenance, we never guarantee how long it will take to fix text  
parsing if it breaks as the text format is fairly unstable by NCBI's  
own admission.

Our deprecation cycle is usually: (1) announce it on list to get  
feedback, (2) if deprecation is planned, add warnings to the module  
in the next release, (3) remove completely in a later release.  It  
gives everyone time to change over.

chris

From bsantos at biocant.pt  Wed Oct 24 12:23:56 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Wed, 24 Oct 2007 17:23:56 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <471E3A13.5080505@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>
	<001b01c81598$95f7b3b0$2300a8c0@bsantos>
	<471E3A13.5080505@maubp.freeserve.co.uk>
Message-ID: <001601c8165a$48248600$2300a8c0@bsantos>

Peter Wrote:
>Without seeing the XML file I'm having to guess - but this could be 
>something to do with trying to read files from disk before the OS has 
>finished flushing the data out.  Mismatched tags could certainly be 
>explained if the parser was only getting part of the data.
>
>You could try inserting a sleep of a few seconds after writing and 
>closing the XML file.  Also try handle.flush() before the handle.close() 
>when you save the XML file to disk.

You were right: I was reading the data before it had been written to the
file. Now it's working perfectly. 

But now I have another problem. Is it possible, instead of making a single
request to NCBI_Blast with one sequence, to make the request for all the
sequences in a multi-FASTA file?

I'm trying to use threads to do this, but so far without luck.

Thanks in advance,
Bruno Santos


From biopython at maubp.freeserve.co.uk  Wed Oct 24 13:32:52 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 18:32:52 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001601c8165a$48248600$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
	<471E1CBC.30601@maubp.freeserve.co.uk>
	<001b01c81598$95f7b3b0$2300a8c0@bsantos>
	<471E3A13.5080505@maubp.freeserve.co.uk>
	<001601c8165a$48248600$2300a8c0@bsantos>
Message-ID: <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>

On 10/24/07, Bruno Santos <bsantos at biocant.pt> wrote:
> You were right: I was reading the data before it had been written to the
> file. Now it's working perfectly.

Great.

> But now I have another problem. Is it possible, instead of making a single
> request to NCBI_Blast with one sequence, to make the request for all the
> sequences in a multi-FASTA file?
>
> I'm trying to use threads to do this, but so far without luck.

I would suggest you install standalone blast, then give it the
multi-record FASTA file as input.  You should then get multiple blast
records back (in the same order).  This works fine with the XML output
(but currently does not work for plain text output on recent versions
of NCBI Blast).
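
Since the records come back in the same order as the FASTA entries, the renaming step from the original question could be sketched roughly as below. It assumes records shaped like those from the XML parser (.alignments, each with .title and .hsps, each HSP with .expect). Both function names are made up for illustration:

```python
def best_hit(blast_record):
    """Return (expect, title) for the lowest E-value HSP, or None if no hits."""
    best = None
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if best is None or hsp.expect < best[0]:
                best = (hsp.expect, alignment.title)
    return best


def rename_by_best_hit(query_names, blast_records):
    """Pair queries with records by position and pick a new name for each."""
    renamed = []
    for name, record in zip(query_names, blast_records):
        hit = best_hit(record)
        # Keep the original name when BLAST found nothing for this query.
        renamed.append((name, hit[1] if hit is not None else name))
    return renamed
```

Pairing by position relies entirely on the ordering guarantee mentioned above, so it is worth comparing the record's own query name against the FASTA name as a sanity check.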

If you really want to make multiple blast submissions in parallel
online, first check the NCBI's website for any usage restrictions -
they don't want their servers to be abused.

Peter

From biosql at hotmail.com  Wed Oct 24 16:53:19 2007
From: biosql at hotmail.com (Jonathan Boulais)
Date: Wed, 24 Oct 2007 16:53:19 -0400
Subject: [BioPython] Loading SwissProt to BioSQL
Message-ID: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>


Hello, 

I'm a biologist and quite new to Biopython. I'm trying to build the Swissprot database locally with BioSQL and I'm having some problems. 
I have installed the latest version from CVS and I'm using Python 2.5 on Mac OS X 10.4. 

First, I get this weird problem. Since I need to connect to MySQL, I started by writing a simple script (Biosql.py) containing only this line: from BioSQL import BioSeqDatabase. When I run this script in the terminal with python Biosql.py, I get this message: **ImportError: cannot import name BioSeqDatabase**. But the weird thing is that if I start a Python session in the terminal by simply invoking python and then manually import BioSeqDatabase, it works! 
Is there any reason for that?

Second, I then decided to continue with the Python session, since I'm able to import BioSeqDatabase there. The connection to MySQL works fine, but when I try to import the flat file I get this: 


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "/sw/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting


Here are the lines I'm using: 

from BioSQL import BioSeqDatabase
from Bio.SwissProt import SProt

server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "", passwd = "", host = "localhost", db = "bioseqdb")
s_parser = SProt.SequenceParser()
s_iterator = SProt.Iterator(open("path to/uniprot_sprot.dat", "r"), s_parser)
db = server.new_database("Swiss")
db.load(s_iterator)


Does anybody understand this ?

Many thanks if someone can help !

Jonathan



From biopython at maubp.freeserve.co.uk  Wed Oct 24 17:15:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 22:15:10 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
Message-ID: <471FB5DE.6080506@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
> Hello,
> 
> I'm a biologist and quite new to Biopython. I'm trying to build 
> the Swissprot database locally with BioSQL and I'm having some 
> problems. I have installed the latest version from CVS and I'm 
> using Python 2.5 on Mac OS X 10.4.
> 
> First, I get this weird problem. Since I need to connect to MySQL, I
> started by writing a simple script (Biosql.py) containing only this
> line: from BioSQL import BioSeqDatabase. When I run this script in the
> terminal with python Biosql.py, I get this message: **ImportError:
> cannot import name BioSeqDatabase**. But the weird thing is that if I
> start a Python session in the terminal by simply invoking python and
> then manually import BioSeqDatabase, it works! Is there any reason
> for that?

In both cases are you running python from the command prompt?  If so 
then the same environment variables (e.g. paths) should apply.  Odd.

My guess is you shouldn't call your script "Biosql.py", call it 
"Biosql_test.py" or something.  Python thinks the line "from BioSQL 
import BioSeqDatabase" means importing from the script itself because 
that is also called BioSQL.
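
A tiny sketch of a check for this kind of name clash. The helper shadows_module is hypothetical, and comparing the names case-insensitively is an assumption made here because Biosql.py and BioSQL differ only in case:

```python
import os


def shadows_module(script_path, module_name):
    """Return True if the script's file name could be mistaken for the module."""
    # "Biosql.py" -> "Biosql"
    stem = os.path.splitext(os.path.basename(script_path))[0]
    # Assumption: treat names differing only in case as a clash.
    return stem.lower() == module_name.lower()
```

Another way to see what actually got imported is to print the module's __file__ attribute from inside the failing script and check whether it points at the script itself.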

Peter


From biopython at maubp.freeserve.co.uk  Wed Oct 24 17:22:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 22:22:05 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
Message-ID: <471FB77D.5060103@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
> from Bio.SwissProt import SProt
> s_parser = SProt.SequenceParser()
> s_iterator = SProt.Iterator(open("path to/uniprot_sprot.dat", "r"), s_parser)

This won't help with the database issue, but you should also be able to 
load the SwissProt text file with Bio.SeqIO:

from Bio import SeqIO
s_iterator = SeqIO.parse(open("path/to/uniprot_sprot.dat"), "swiss")

This in fact will call the Bio.SwissProt.SProt module internally, and 
get it to return SeqRecord objects.

The Bio.SeqIO interface is meant to make it easy to switch the input 
file format (e.g. GenBank or EMBL).

Peter


From mdehoon at c2b2.columbia.edu  Wed Oct 24 20:40:18 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 24 Oct 2007 20:40:18 -0400
Subject: [BioPython] Syntax error while parsing Blast output
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
	<320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>
	<3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu>

>> Nevertheless, it may be a good idea to remove the plain text Blast  
>> parser
>> completely from Biopython in the upcoming release (which will  
>> probably be
>> done this week), to avoid further confusion.
>
> Removing it sounds too drastic - especially as we have had people on
> the mailing list using it  deliberately fairly recently.  If you  
> really do want
> to remove this code, then adding a deprecation warning to the plain  
> text
> parser for the next release would be a more gentle route.
>

Sorry, I was confused; I was under the impression that the plain text Blast
parser was already deprecated (I was getting confused with the blast and
blasturl functions in Bio.Blast.NCBIWWW, which are already deprecated in
favor of qblast). OK, then let's keep the plain-text Blast parser as is, and
maybe think again about this issue after the upcoming release.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From mmayhew at mcb.mcgill.ca  Thu Oct 25 00:12:06 2007
From: mmayhew at mcb.mcgill.ca (Michael Mayhew)
Date: Thu, 25 Oct 2007 00:12:06 -0400
Subject: [BioPython] Any planned BioPython presence at PyCon 2008?
Message-ID: <47201796.2050902@mcb.mcgill.ca>

Was planning on going to PyCon 2008 anyway, but would have even more 
incentive if there is going to be a big BioPython community turnout.

Would love to pitch in on a development session or something like that.

Michael Mayhew

From biosql at hotmail.com  Thu Oct 25 10:52:02 2007
From: biosql at hotmail.com (Jonathan Boulais)
Date: Thu, 25 Oct 2007 10:52:02 -0400
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <471FB5DE.6080506@maubp.freeserve.co.uk>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
	<471FB5DE.6080506@maubp.freeserve.co.uk>
Message-ID: <BLU109-W2996B11A548AEE3A63AC9C5950@phx.gbl>


> Date: Wed, 24 Oct 2007 22:15:10 +0100
> From: biopython at maubp.freeserve.co.uk
> To: biosql at hotmail.com; biopython at lists.open-bio.org
> Subject: Re: [BioPython] Loading SwissProt to BioSQL
> 
> Jonathan Boulais wrote:
> > Hello,
> > 
> > I'm a biologist and quite newb with Biopython. I'm trying to build 
> > locally the Swissprot database with BioSQL and I'm having some 
> > problems. I have installed the latest version from the CVS and I'm 
> > using python 2.5 on a Mac Os 10.4.
> > 
> > First, I get this weird problem. Since I need to connect with MySQL I
> > started to write a simple script (Biosql.py) with only this (from
> > BioSQL import BioSeqDatabase). When I run this script in the
> > terminal: python Biosql.py, I get this message **ImportError: cannot
> > import name BioSeqDatabase**. But the weird thing is if I start a
> > python session in the terminal by simply invoking python and then
> > manually import BioSeqDatabase, it's working ! Is there any reason
> > for that ?
> 
> In both cases are you running python from the command prompt?  If so 
> then the same environment variables (e.g. paths) should apply.  Odd.
> 
> My guess is you shouldn't call your script "Biosql.py", call it 
> "Biosql_test.py" or something.  Python thinks the line "from BioSQL 
> import BioSeqDatabase" means importing from the script itself because 
> that is also called BioSQL.
> 
> Peter
> 

Peter you were right about the name of the file. Nice call and thank you !
But I still get the same error as before when I'm running it. 

Traceback (most recent call last):
  File "DB.py", line 14, in <module>
    db.load(s_iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "/sw/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting


Is it the MySQLdb driver, or are bad arguments being passed to MySQLdb?

Again, thank you for your time. 

Jonathan 


From biopython at maubp.freeserve.co.uk  Thu Oct 25 13:22:46 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 18:22:46 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <BLU109-W2996B11A548AEE3A63AC9C5950@phx.gbl>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>	<471FB5DE.6080506@maubp.freeserve.co.uk>
	<BLU109-W2996B11A548AEE3A63AC9C5950@phx.gbl>
Message-ID: <4720D0E6.8000609@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
>> My guess is you shouldn't call your script "Biosql.py", call it 
>> "Biosql_test.py" or something.  Python thinks the line "from BioSQL 
>> import BioSeqDatabase" means importing from the script itself because 
>> that is also called BioSQL.
> 
> Peter you were right about the name of the file. Nice call and thank you !

Great - I wasn't sure if the case would matter or not.
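
To illustrate why the script name mattered: Python searches the
directories on sys.path in order, and the directory holding the script
comes first, so a local file can hide an installed package of the same
name. A minimal sketch using only the standard library of a current
Python 3 (the module name demo_shadow_mod and both temporary directories
are made up for illustration, they are not part of Biopython or BioSQL):

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def first_match(module_name, directories):
    """Return the file Python would load for module_name when the given
    directories are searched (in order) ahead of the normal sys.path."""
    saved_path = list(sys.path)
    sys.path[:0] = directories
    try:
        importlib.invalidate_caches()
        spec = importlib.util.find_spec(module_name)
        return spec.origin if spec is not None else None
    finally:
        sys.path[:] = saved_path


def build_demo():
    """Create a fake 'script directory' holding demo_shadow_mod.py and a
    fake 'site-packages' holding a demo_shadow_mod package."""
    script_dir = tempfile.mkdtemp()
    site_dir = tempfile.mkdtemp()
    with open(os.path.join(script_dir, "demo_shadow_mod.py"), "w") as handle:
        handle.write("# stands in for the user's own script\n")
    os.mkdir(os.path.join(site_dir, "demo_shadow_mod"))
    init_file = os.path.join(site_dir, "demo_shadow_mod", "__init__.py")
    with open(init_file, "w") as handle:
        handle.write("# stands in for the installed package\n")
    return script_dir, site_dir


if __name__ == "__main__":
    script_dir, site_dir = build_demo()
    # The script directory is searched first, so the lone .py file wins.
    print(first_match("demo_shadow_mod", [script_dir, site_dir]))
```

Whether a name differing only in case triggers this depends on the
file system and platform, which is why I wasn't sure about it.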

> But I still get the same error as before when I'm running it. 
> ...

I've not used BioSQL myself (yet), but looking at the code you posted 
earlier, you set up the connection like this:

from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="MySQLdb", user="", 
passwd="", host="localhost", db="bioseqdb")

I think the driver="MySQLdb" is fine, but don't you need a database 
username (and perhaps a password)?
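
On the error message itself: the last line of your traceback is plain
Python string formatting (query % db.literal(args)), and that TypeError
is what the % operator raises when there are more arguments than
placeholders. I can't tell from here which side is at fault, but the
behaviour is easy to reproduce. A minimal sketch (the query below is a
made-up example, not the real BioSQL statement):

```python
def try_format(query, args):
    """Apply Python's % formatting to a query and report any TypeError
    as text instead of raising it."""
    try:
        return query % args
    except TypeError as err:
        return "TypeError: %s" % err


# Hypothetical query with two placeholders (not taken from BioSQL).
QUERY = "INSERT INTO demo (name, version) VALUES (%s, %s)"

if __name__ == "__main__":
    print(try_format(QUERY, ("'P12345'", "1")))       # placeholders match
    print(try_format(QUERY, ("'P12345'", "1", "2")))  # one argument too many
    print(try_format(QUERY, ("'P12345'",)))           # one argument too few
```

So it would be worth comparing the number of values being passed with
the number of %s markers in the SQL for that table.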

Peter


From biopython at maubp.freeserve.co.uk  Thu Oct 25 05:44:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 10:44:43 +0100
Subject: [BioPython] Any planned BioPython presence at PyCon 2008?
In-Reply-To: <47201796.2050902@mcb.mcgill.ca>
References: <47201796.2050902@mcb.mcgill.ca>
Message-ID: <4720658B.4020103@maubp.freeserve.co.uk>

Michael Mayhew wrote:
> Was planning on going to PyCon 2008 anyway, but would have even more 
> incentive if there is going to be a big BioPython community turnout.
> 
> Would love to pitch in on a development session or something like that.
> 
> Michael Mayhew

http://us.pycon.org/2008/about/
http://pycon.blogspot.com/2007/10/call-for-talk-tutorial-proposals.html
 > Proposals for PyCon 2008 talks & tutorials are now being accepted.
 > The deadline for proposals is November 16.  PyCon 2008 will be held
 > in Chicago, Illinois, USA, from March 13-20.

It is remotely possible that I'll be working in the USA next year, but I 
have to say at this point that it looks unlikely that I'll be able to 

Peter


From biopython at maubp.freeserve.co.uk  Thu Oct 25 05:57:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 10:57:10 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>	<471E6B82.5010700@maubp.freeserve.co.uk>	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>	<471E7A3C.5010301@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>	<320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>	<3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu>
	<6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu>
Message-ID: <47206876.9040905@maubp.freeserve.co.uk>

Michiel De Hoon wrote:
> 
> Sorry, I was confused; I was under the impression that the plain text Blast
> parser was already deprecated (I was getting confused with the blast and
> blasturl functions in Bio.Blast.NCBIWWW, which are already deprecated in
> favor of qblast). OK, then let's keep the plain-text Blast parser as is, and
> maybe think again about this issue after the upcoming release.
> 

Panic averted - but it was good to hear some passionate defence of the 
plain text BLAST parser; it looks like it still gets quite a bit of use.

Peter


From bsantos at biocant.pt  Fri Oct 26 05:13:58 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Fri, 26 Oct 2007 10:13:58 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
	<471E1CBC.30601@maubp.freeserve.co.uk>
	<001b01c81598$95f7b3b0$2300a8c0@bsantos>
	<471E3A13.5080505@maubp.freeserve.co.uk>
	<001601c8165a$48248600$2300a8c0@bsantos>
	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
Message-ID: <000301c817b0$8c868c10$2300a8c0@bsantos>

Peter Said
>I would suggest you install standalone blast, then give it the
>multi-record FASTA file as input.  You should then get multiple blast
>records back (in the same order).  This works fine with the XML output
>(but currently does not work for plain text output on recent versions
>of NCBI Blast).
>
>If you really want to make multiple blast submissions in parallel
>online, first check the NCBI's website for any usage restrictions -
>they don't want their servers to be abused.
>
>Peter

I have followed your advice and decided to install standalone blast. As I
want to blast against the nt database, I have downloaded it pre-compiled
from the NCBI ftp server. I have created a script to do this, but for some
reason I'm not getting any results: the program does not write anything to
the XML file.

Here is my script:
from Bio import SeqIO
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
import time
import math

my_blast_db = (r'e:/nt.00')
my_blast_file = r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna'
my_blast_exe = r'C:/BLAST/bin/'
save_file = open(r'C:/FASTASeq/Results/well9/V6_BLAST.xml', 'w')
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
"blastn",my_blast_db, my_blast_file)
blast_results = result_handle.read() #Catch the results
save_file.write(blast_results) #Write all the information to an XML file
save_file.close()
print time.ctime()

Since I downloaded the files from NCBI, I have a lot of files in the database
directory. Is there any way to perform a search against all of them?

Thanks in advance,
Bruno Santos 

Unidade de Bioinformática  

3060-197 Cantanhede  
Tel: 231 410 892
http://bioinformatics.biocant.pt


From biopython at maubp.freeserve.co.uk  Fri Oct 26 05:52:34 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 10:52:34 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000301c817b0$8c868c10$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
	<000301c817b0$8c868c10$2300a8c0@bsantos>
Message-ID: <4721B8E2.2040902@maubp.freeserve.co.uk>

Bruno Santos wrote:
> Peter Said
>> I would suggest you install standalone blast, then give it the
>> multi-record FASTA file as input.  You should then get multiple blast
>> records back (in the same order).  This works fine with the XML output
>> (but currently does not work for plain text output on recent versions
>> of NCBI Blast).
>>
>> If you really want to make multiple blast submissions in parallel
>> online, first check the NCBI's website for any usage restrictions -
>> they don't want their servers to be abused.
>>
>> Peter
> 
> I have followed your advice and decided to install standalone blast. As I
> want to blast against the nt database, I have downloaded it pre-compiled
> from the NCBI ftp server. I have created a script to do this, but for some
> reason I'm not getting any results: the program does not write anything to
> the XML file.
> 
> Here is my script:
> from Bio import SeqIO
> from Bio.Blast import NCBIStandalone
> from Bio.Blast import NCBIXML
> import time
> import math

You are running on Windows, so the paths should have "\" rather than "/" 
in them.  However, in many cases this isn't essential - and indeed for 
some Unix programs ported to Windows using "/" is sometimes best!
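
As far as I know Python itself does not rewrite the separators in a
string; it is Windows that generally accepts either form when opening
files, while some command line tools may read "/" as the start of an
option. If you want to see or enforce the backslash form, the standard
library can do the conversion. A small sketch (ntpath applies the
Windows rules on any platform, which makes it easy to try out):

```python
import ntpath  # Windows path rules, importable on any platform


def to_windows_style(path):
    """Normalise a path using Windows conventions: forward slashes become
    backslashes and a trailing separator is dropped."""
    return ntpath.normpath(path)


if __name__ == "__main__":
    print(to_windows_style(r"C:/BLAST/bin/blastall.exe"))
    print(to_windows_style(r"C:/BLAST/bin/"))
```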

> my_blast_db = (r'e:/nt.00')

I'm not sure if that is correct, but it's difficult to tell without 
seeing your setup.

> my_blast_file =
> r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna'
> my_blast_exe = r'C:/BLAST/bin/'

That is wrong, try something like:
my_blast_exe = r'C:\BLAST\bin\blastall.exe'

I would urge you to try running blastall "by hand" at the command line 
first for a few small examples, to get the hang of it.  Because any 
error messages get printed to the command line, it makes debugging 
simpler. This will also help you work out how to prepare the arguments in 
Biopython.  Within python you would have to check what was 
written to the error_info output handle.
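
If you prefer to stay inside python while debugging, the same idea
(look at the error output as well as the normal output) can be sketched
with the standard subprocess module of a current Python 3. This is not
how NCBIStandalone does it internally; run_and_capture is a made-up
helper name, and the demo command is just the python interpreter
standing in for blastall:

```python
import os
import subprocess
import sys


def run_and_capture(command):
    """Run a command (a list of strings) and return
    (exit_code, stdout_text, stderr_text) so error text is not lost."""
    executable = command[0]
    if os.path.dirname(executable) and not os.path.isfile(executable):
        # Catches the case where a directory was given instead of a program.
        raise ValueError("not a file: %r" % executable)
    completed = subprocess.run(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in command: the interpreter writes a message to stderr.
    # For BLAST this would be the full path to blastall plus its arguments.
    demo = [sys.executable, "-c",
            "import sys; sys.stderr.write('no database given')"]
    code, out, err = run_and_capture(demo)
    print("exit code:", code)
    print("stderr:", err)
```

An empty output together with a non-empty error text is usually the
quickest hint that the program path or the arguments are wrong.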

> Since I downloaded the files from NCBI, I have a lot of files in the database
> directory. Is there any way to perform a search against all of them?

I'm not sure what exactly you are asking.  BLAST can make databases from 
FASTA files, so you might want to build a database from all your FASTA 
files... check the documentation for the BLAST formatdb program.

Peter


From bsantos at biocant.pt  Fri Oct 26 09:40:40 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Fri, 26 Oct 2007 14:40:40 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <4721B8E2.2040902@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
	<000301c817b0$8c868c10$2300a8c0@bsantos>
	<4721B8E2.2040902@maubp.freeserve.co.uk>
Message-ID: <000701c817d5$d0e8f4e0$2300a8c0@bsantos>

>You are running on Windows, so the paths should have "\" rather than "/" 
>in them.  However, in many cases this isn't essential - and indeed for 
>some Unix programs ported to Windows using "/" is sometimes best!
>
> my_blast_db = (r'e:/nt.00')
>
>I'm not sure if that is correct, but it's difficult to tell without 
>seeing your setup.

It's ok to use the "/" because it seems that the python interpreter converts
it to the symbol used by the OS. 

> my_blast_file =
> r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna'
> my_blast_exe = r'C:/BLAST/bin/'
>
>That is wrong, try something like:
>my_blast_exe = r'C:\BLAST\bin\blastall.exe'

You were right about that. It's ok now

> Since I downloaded the files from NCBI, I have a lot of files in the
> database directory. Is there any way to perform a search against all of them?

>I'm not sure what exactly you are asking.  BLAST can make databases from 
>FASTA files, so you might want to build a database from all your FASTA 
>files... check the documentation for the BLAST formatdb program.

I have downloaded the pre-compiled files, which means I have five different
files like (nt.00.nhr, nt.01.nhr, nt.02.nhr...) and also the same files with
all the other extensions. But I have found I can use them all at the same
time by passing them on the command line between "". So now I have:

my_blast_db = (r'\"e:/nt.00 e:/nt.01 e:/nt.02 e:/nt.03 e:/nt.04 e:/nt.05 \"')

I now have another question: is it possible to pass the result_handle to
blast_results line by line, or something like that? I'm getting a memory
error in the step shown below:

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
"blastn",my_blast_db, my_blast_file)
blast_results = result_handle.read() #Catch the results 

Maybe if I read one line at a time and write it immediately to the XML file
it will work. 

Thanks once more,
Bruno Santos


From biopython at maubp.freeserve.co.uk  Fri Oct 26 10:37:45 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 15:37:45 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000701c817d5$d0e8f4e0$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>	<000301c817b0$8c868c10$2300a8c0@bsantos>	<4721B8E2.2040902@maubp.freeserve.co.uk>
	<000701c817d5$d0e8f4e0$2300a8c0@bsantos>
Message-ID: <4721FBB9.1040408@maubp.freeserve.co.uk>

> I now have another question: is it possible to pass the result_handle to
> blast_results line by line, or something like that? I'm getting a memory
> error in the step shown below:
> 
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
> "blastn",my_blast_db, my_blast_file)
> blast_results = result_handle.read() #Catch the results 
> 
> Maybe if I pass one line at a time and write it immediately to the xml file
> it will work. 

XML files are big.  Lots of query sequences will also make things 
bigger.  And the default expectation threshold will also give lots of 
results - setting this to something harsher will help by giving fewer 
matches.

Unless you want to keep the XML file for other analysis, it might be 
simpler to parse the output from blast directly with Biopython - 
avoiding having the large XML file on disk.

Keeping the XML intermediate file can be a good idea when working on 
smaller datasets, where you want to tweak your analysis (without 
re-running blast each time).
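
If you do want to keep the XML file, your idea of writing it out piece
by piece is sound: the handle can be read in fixed-size chunks so the
whole output never sits in memory at once. A minimal sketch
(copy_in_chunks is a made-up helper name; the standard library's
shutil.copyfileobj does the same job, and the StringIO objects below
merely stand in for the blast handle and the output file):

```python
import io


def copy_in_chunks(source, destination, chunk_size=64 * 1024):
    """Copy everything from one file-like object to another without
    holding more than chunk_size characters in memory at a time.
    Returns the number of characters copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # an empty read means end of data
        destination.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    fake_results = io.StringIO("<BlastOutput>...</BlastOutput>\n" * 1000)
    fake_file = io.StringIO()
    print(copy_in_chunks(fake_results, fake_file, chunk_size=4096))
```

In your script, source would be result_handle and destination would be
save_file.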

Peter

From bsantos at biocant.pt  Fri Oct 26 11:50:48 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Fri, 26 Oct 2007 16:50:48 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <4721FBB9.1040408@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>	<000301c817b0$8c868c10$2300a8c0@bsantos>	<4721B8E2.2040902@maubp.freeserve.co.uk>
	<000701c817d5$d0e8f4e0$2300a8c0@bsantos>
	<4721FBB9.1040408@maubp.freeserve.co.uk>
Message-ID: <000801c817e7$fd1bc940$2300a8c0@bsantos>

Peter Said:
>XML files are big.  Lots of query sequences will also make things 
>bigger.  And the default expectation threshold will also give lots of 
>results - setting this to something harsher will help by giving fewer 
>matches.
>
>Unless you want to keep the XML file for other analysis, it might be 
>simpler to parse the output from blast directly with Biopython - 
>avoiding having the large XML file on disk.
>
>Keeping the XML intermediate file can be a good idea when working on 
>smaller datasets, where you want to tweak your analysis (without 
>re-running blast each time).

But even if I don't want to save the results to an XML file, I still have to
do the <blast_results = result_handle.read() #Catch the results> step, right?
My problem is in this step, not in writing to the file.
Or can I use the result_handle directly? I was reading the Biopython
documentation but it's not very clear.


From biopython at maubp.freeserve.co.uk  Fri Oct 26 12:04:40 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 17:04:40 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000801c817e7$fd1bc940$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>	<000301c817b0$8c868c10$2300a8c0@bsantos>	<4721B8E2.2040902@maubp.freeserve.co.uk>	<000701c817d5$d0e8f4e0$2300a8c0@bsantos>	<4721FBB9.1040408@maubp.freeserve.co.uk>
	<000801c817e7$fd1bc940$2300a8c0@bsantos>
Message-ID: <47221018.9090104@maubp.freeserve.co.uk>

Bruno Santos wrote:
> Peter Said:
>> Unless you want to keep the XML file for other analysis, it might be 
>> simpler to parse the output from blast directly with Biopython - 
>> avoiding having the large XML file on disk.
> 
> But even if I don't want to save the results to an XML file, I still have to
> do the <blast_results = result_handle.read() #Catch the results> step, right?
> My problem is in this step, not in writing to the file.
> Or can I use the result_handle directly? I was reading the Biopython
> documentation but it's not very clear.

The intention is something like this:

result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe,
"blastn",my_blast_db, my_blast_file)

blast_records = NCBIXML.parse(result_handle)
for record in blast_records:
    pass  # do stuff with each record here

The bit about saving the results to a file and loading that to give a 
new handle is optional, but very handy if you need to look at the raw 
file by hand.  Perhaps that section of the tutorial could be a little 
clearer ...
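
To show what "one record at a time" buys you, here is a rough
stand-alone sketch of the same streaming idea using only the standard
library of a current Python 3. It is not the Biopython parser and
returns far less information; iter_query_hits is a made-up name, and
the tiny XML document is hand-written, using the element names found in
NCBI BLAST XML output (Iteration, Iteration_query-def, Hit, Hit_def):

```python
import io
import xml.etree.ElementTree as ET


def iter_query_hits(handle):
    """Yield (query description, [hit descriptions]) for one query at a
    time, discarding each query's XML once it has been reported."""
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def")
            hits = [hit.findtext("Hit_def") for hit in elem.iter("Hit")]
            yield query, hits
            elem.clear()  # release the children of this finished query


SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>query1</Iteration_query-def><Iteration_hits>
<Hit><Hit_def>first hit</Hit_def></Hit>
<Hit><Hit_def>second hit</Hit_def></Hit>
</Iteration_hits></Iteration>
<Iteration><Iteration_query-def>query2</Iteration_query-def>
<Iteration_hits></Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>"""

if __name__ == "__main__":
    for query, hits in iter_query_hits(io.BytesIO(SAMPLE)):
        print(query, len(hits))
```

The point is only that nothing forces you to call read() on the whole
handle before looking at the first result.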

Peter


From mdehoon at c2b2.columbia.edu  Sun Oct 28 02:32:40 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 02:32:40 -0400
Subject: [BioPython] Biopython release 1.44 ready
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>

Hi everybody,

Biopython release 1.44 is now available for download from the Biopython
website at http://biopython.org.

This release includes lots of code improvements and fixes in the Blast
interface and parsers, sequence input/output, the SwissProt parser, the
clustering routines, as well as a brand new module for population genetics.
For reasons of compatibility, some radical changes were necessary in some
parts of the code; please let us know if you find some functionality missing.

My thanks to all code contributors who made this new release possible.

--Michiel on behalf of the Biopython developers


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From tiagoantao at gmail.com  Sun Oct 28 17:31:58 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Sun, 28 Oct 2007 21:31:58 +0000
Subject: [BioPython] Biopython citation
Message-ID: <4724FFCE.20103@gmail.com>

Hello,

I am submitting a paper regarding a Jython selection detection program 
that we have done, and I would like to cite biopython. What is really 
the best, most recent, citation?

Tiago
-- 
tiagoantao at gmail.com
http://tiago.org/ps


From biopython at maubp.freeserve.co.uk  Sun Oct 28 16:52:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Oct 2007 20:52:05 +0000
Subject: [BioPython] Biopython citation
In-Reply-To: <4724FFCE.20103@gmail.com>
References: <4724FFCE.20103@gmail.com>
Message-ID: <4724F675.8030902@maubp.freeserve.co.uk>

Tiago Antao wrote:
> I am submitting a paper regarding a Jython selection detection program 
> that we have done, and I would like to cite biopython. What is really 
> the best, most recent, citation?
> 
> Tiago

For a general project reference, I think the most recent is Brad &
Jeff's 2000 newsletter article:

Chapman, B. and Chang, J. (2000) Biopython: python tools for
computational biology. ACM SIG-BIO Newsletter, 20, 15-19.

However, I confess I only cited the www.biopython.org website in my last 
paper.

Peter

P.S. There are specific papers for some modules, e.g. Bio.PDB and 
Bio.Cluster


From skhadar at gmail.com  Mon Oct 29 09:15:30 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Mon, 29 Oct 2007 18:45:30 +0530
Subject: [BioPython] Biopython citation
In-Reply-To: <4724F675.8030902@maubp.freeserve.co.uk>
References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk>
Message-ID: <b6ff81950710290615u4d8fea74le58869d526ddd50c@mail.gmail.com>

Hi Peter,

I would be interested to look at it, but we don't have access to ACM.
Do you have a copy of that paper?

Thanks,
Shameer

On 10/29/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Tiago Antao wrote:
> > I am submitting a paper regarding a Jython selection detection program
> > that we have done, and I would like to cite biopython. What is really
> > the best, most recent, citation?
> >
> > Tiago
>
> For a general project reference, I think the most recent is Brad &
> Jeff's 2000 newsletter article:
>
> Chapman, B. and Chang, J. (2000) Biopython: python tools for
> computational biology. ACM SIG-BIO Newsletter, 20, 15-19.
>
> However, I confess I only cited the www.biopython.org website in my last
> paper.
>
> Peter
>
> P.S. There are specific papers for some modules, e.g. Bio.PDB and
> Bio.Cluster
>

From skhadar at gmail.com  Mon Oct 29 10:11:41 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Mon, 29 Oct 2007 19:41:41 +0530
Subject: [BioPython] Biopython citation
In-Reply-To: <4725E655.8080608@maubp.freeserve.co.uk>
References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk>
	<b6ff81950710290615u4d8fea74le58869d526ddd50c@mail.gmail.com>
	<4725E655.8080608@maubp.freeserve.co.uk>
Message-ID: <b6ff81950710290711w5cedfcc2s85a6a12a05c4034b@mail.gmail.com>

Hi ,

Thanks for that !!!
--
Shameer

On 10/29/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Shameer Khadar wrote:
> > Hi Peter,
> >
> > I would be interested to look at it, but we don't have access to ACM.
> > Do you have a copy of that paper?
> >
> > Thanks, Shameer
>
> It's not actually very informative, especially as the examples are now
> rather dated.  Anyway, I believe the newsletter article was the same as
> the document available on our website:
>
> http://biopython.org/DIST/docs/acm/ACMbiopy.html
> http://biopython.org/DIST/docs/acm/ACMbiopy.pdf
>
> Chapman, B. and Chang, J. (2000) Biopython: python tools for
> computational biology. ACM SIG-BIO Newsletter, 20, 15-19.
>
> Peter
>

From biopython at maubp.freeserve.co.uk  Mon Oct 29 09:55:33 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 13:55:33 +0000
Subject: [BioPython] Biopython citation
In-Reply-To: <b6ff81950710290615u4d8fea74le58869d526ddd50c@mail.gmail.com>
References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk>
	<b6ff81950710290615u4d8fea74le58869d526ddd50c@mail.gmail.com>
Message-ID: <4725E655.8080608@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Hi Peter,
> 
> I would be interested to look at it, but we don't have access to ACM.
> Do you have a copy of that paper?
> 
> Thanks, Shameer

It's not actually very informative, especially as the examples are now
rather dated.  Anyway, I believe the newsletter article was the same as 
the document available on our website:

http://biopython.org/DIST/docs/acm/ACMbiopy.html
http://biopython.org/DIST/docs/acm/ACMbiopy.pdf

Chapman, B. and Chang, J. (2000) Biopython: python tools for
computational biology. ACM SIG-BIO Newsletter, 20, 15-19.

Peter

From biopython at maubp.freeserve.co.uk  Mon Oct 29 15:22:20 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 19:22:20 +0000
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <4720D0E6.8000609@maubp.freeserve.co.uk>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
	<471FB5DE.6080506@maubp.freeserve.co.uk>
	<BLU109-W2996B11A548AEE3A63AC9C5950@phx.gbl>
	<4720D0E6.8000609@maubp.freeserve.co.uk>
Message-ID: <320fb6e00710291222l1a5746e9m3bbc5c4c9fd03921@mail.gmail.com>

Jonathan Boulais wrote:
> But I still get the same error as before when I'm running it.
> ...

For anyone wanting to track this issue, Jonathan has filed
Bug 2390 - Error importing Swiss Prot in BioSQL
http://bugzilla.open-bio.org/show_bug.cgi?id=2390

Peter

From anaryin at gmail.com  Mon Oct 29 21:28:21 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Tue, 30 Oct 2007 01:28:21 +0000
Subject: [BioPython] Fwd:  Scripts cannot connect
In-Reply-To: <b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
	<471C9C34.7000006@maubp.freeserve.co.uk>
	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
	<320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
	<b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>
Message-ID: <b537e3710710291828p753e957fy68cba7dc00a07a0b@mail.gmail.com>

I've checked all my connection settings, tested an awful lot of
possibilities and I came to this conclusion. When using a webservice, I
can't connect to the internet. In the same script, I can get for instance,
the google page, but the lines regarding the webservice itself, they won't
connect.

I've tried to set the proxy environment variable (through export
http_proxy='blabla:yyyy') in the script itself and nothing. I've set
os.environ[blabla] and it doesn't work.

So, does anyone have an idea of why this is happening? Shouldn't the
webservice, if using http protocol (as it does), work just like any other
command (let's say, urllib.urlopen)?

I know this falls out of the BioPython theme but I consider it quite
relevant for my BioPython work :)

Thank you all in advance!

From biopython at maubp.freeserve.co.uk  Tue Oct 30 04:53:14 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 08:53:14 +0000
Subject: [BioPython] Fwd:  Scripts cannot connect
In-Reply-To: <b537e3710710291828p753e957fy68cba7dc00a07a0b@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>	<471C9C34.7000006@maubp.freeserve.co.uk>	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>	<320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>	<b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>
	<b537e3710710291828p753e957fy68cba7dc00a07a0b@mail.gmail.com>
Message-ID: <4726F0FA.6000209@maubp.freeserve.co.uk>

João Rodrigues wrote:
> I've checked all my connection settings, tested an awful lot of
> possibilities and I came to this conclusion. When using a webservice, I
> can't connect to the internet. In the same script, I can get for instance,
> the google page, but the lines regarding the webservice itself, they won't
> connect.

Are you still finding things work on Windows, but fail on Linux?
If so, are you running the same version of python (and Biopython) on both?

> I've tried to set the proxy environment variable (through export
> http_proxy='blabla:yyyy') in the script itself and nothing. I've set
> os.environ[blabla] and it doesn't work.

When you say "it doesn't work", do you mean (a) the environment variable 
isn't set, or (b) the environment variable is set but has no effect?
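
One way to tell (a) from (b) is to ask python directly. A minimal
sketch for a current Python 3, where the proxy lookup lives in
urllib.request (in older versions it was in urllib itself). The proxy
address is a placeholder, proxy_report is a made-up helper name, and
whether a particular web-service library consults the same setting is a
separate question that this does not answer:

```python
import os
import urllib.request


def proxy_report(scheme="http"):
    """Return (value found in os.environ, value urllib would use) for
    the given scheme, so the two can be compared."""
    env_value = os.environ.get(scheme + "_proxy")
    urllib_value = urllib.request.getproxies().get(scheme)
    return env_value, urllib_value


if __name__ == "__main__":
    # Placeholder address; replace with the real proxy host and port.
    os.environ["http_proxy"] = "http://proxy.example.invalid:8080"
    print(proxy_report("http"))
```

Note that an "export" typed in one shell only affects programs started
from that shell, and a value assigned to os.environ inside a script has
to be set before the connection is opened.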

> So, does anyone have an idea of why this is happening? Shouldn't the
> webservice, if using http protocol (as it does), work just like any other
> command (let's say, urllib.urlopen)?

Are you saying there is a difference depending on the URL type (plain 
page versus web-service)?

Or, are you saying there is a difference depending on what python 
library you use (e.g. urllib or something else)?

> I know this falls out of the BioPython theme but I consider it quite
> relevant for my BioPython work :)
> 
> Thank you all in advance!

This must be very frustrating for you.  Have you been able to find your 
University's official documentation for the proxy?

Peter


From biopython at maubp.freeserve.co.uk  Tue Oct 30 08:32:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 12:32:10 +0000
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>	
	<471686C7.6050305@maubp.freeserve.co.uk>	
	<5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>
	<320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>
Message-ID: <4727244A.4010705@maubp.freeserve.co.uk>

Peter wrote:
>> I've found the bug!
>>
>> The code for Bio.Seq.count is:
>>
>> def count(self, item):
>>         return len([x for x in self.data if x == item])
> 
> Yeah - by design this (and the functionally similar version for the
> MutableSeq) both expect the count argument to be a single letter.  The
> simple fix for the Seq object is to use the string method internally:
> 
> def count(self, item):
>         return self.data.count(item)
> 
> For the MutableSeq things are not so straight forward, but supporting
> multiple character arguments can be done.

Bug 2386 and proposed patch here:
http://bugzilla.open-bio.org/show_bug.cgi?id=2386

This also lets the count methods take Seq or MutableSeq objects as 
arguments - in addition to plain strings.

Note there is room for improvement in my patch: For the case of the 
MutableSeq, we might want to investigate counting from the array of 
characters directly, rather than taking the lazy option of turning it 
into a string and counting that way.
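
For anyone who hasn't followed the bug report, the difference between
the old and the proposed behaviour can be shown with plain strings. A
minimal sketch (the two function names are made up for illustration and
operate on a str rather than a Seq object):

```python
def count_letters(data, item):
    """Old behaviour: compare item against each single character, so a
    multi-letter item can never match."""
    return len([x for x in data if x == item])


def count_substrings(data, item):
    """Proposed behaviour: defer to str.count, which counts
    non-overlapping occurrences of item."""
    return data.count(item)


if __name__ == "__main__":
    sequence = "AATAAT"
    print(count_letters(sequence, "A"), count_substrings(sequence, "A"))
    print(count_letters(sequence, "AA"), count_substrings(sequence, "AA"))
```

One thing to keep in mind is that str.count does not count overlapping
matches, so "AAAA" contains "AA" twice by this definition, not three
times.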

Peter


From anaryin at gmail.com  Tue Oct 30 12:29:00 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Tue, 30 Oct 2007 16:29:00 +0000
Subject: [BioPython] Fwd: Scripts cannot connect
In-Reply-To: <4726F0FA.6000209@maubp.freeserve.co.uk>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
	<471C9C34.7000006@maubp.freeserve.co.uk>
	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
	<320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
	<b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>
	<b537e3710710291828p753e957fy68cba7dc00a07a0b@mail.gmail.com>
	<4726F0FA.6000209@maubp.freeserve.co.uk>
Message-ID: <b537e3710710300929u37cee4ecm3aa56e080fe475ac@mail.gmail.com>

> Are you still finding things work on Windows, but fail on Linux?
> If so, are you running the same version of python (and Biopython) on both?

The same version is installed on all operating systems. I'm using XP (one
32-bit, the other 64-bit) on the Windows machines (one at home, another at
"work") and Ubuntu 7.10 on both my laptop and the workstation at the
University (it's dual-booted). Regarding Biopython, it's the same version on
all but my laptop, which has the latest upgrade of the 28th October (but
still, it never worked before). But since I'm not using any modules, it
should not have anything to do with it.

> When you say "it doesn't work", do you mean (a) the environment variable
> isn't set, or (b) the environment variable is set but has no effect?

An example: I start a new session in my laptop and open the console. I type
"export http_proxy='blabla'" to set the variable. I then type "env" and it
returns me a list of all env variable *including* the http_proxy one. I run
"aptitude update" and it works. If I do the same in a Python Script, it
doesn't (at least when connecting to a webservice). I believe then, that the
variable is set but it doesn't work somehow.
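
A quick way to check what a Python process itself sees might look like the sketch below. It is written against the Python 3 standard library (`urllib.request`), which differs from the `urllib` of Python 2.5; the function names and the proxy address `http://proxy.example.org:8080` are placeholders, not values from this thread. It does not involve ZSI at all, so it only shows whether the standard library picks up the variable and how a proxy could be passed explicitly.

```python
import urllib.request


def proxies_seen_by_python():
    """Return the proxy mapping urllib detects (environment variables first)."""
    return urllib.request.getproxies()


def opener_with_explicit_proxy(proxy_url):
    """Build an opener that is told the proxy directly.

    This bypasses the environment lookup, which can help to tell apart
    'variable not seen' from 'proxy not honoured'.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Prints whatever proxy settings this process has picked up.
    print(proxies_seen_by_python())
```

If the printed mapping is empty inside the script but `env` shows the variable in the shell, the script is probably being started from an environment where the export never happened.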

> Are you saying there is a difference depending on the URL type (plain
> page versus web-service)?

I *think*, or suppose, that somehow, the two "types" of connection, despite
using HTTP and the same  proxy env. variable, are working differently.


> Or, are you saying there is a difference depending on what python
> library you use (e.g. urllib or something else)?

Which other libraries can I try out? Other than urllib?


> This must be very frustrating for you.
> Have you been able to find your University's official documentation for the
> proxy?

It's a dilemma. On the one hand, I have a perfectly set up Windows system that
can access the internet through the scripts I write. However, there is no
ZSI for it (or at least, I can't install it). As such, no SOAP support, no
API I can get to work.
On the other hand, GNU/Linux. It works perfectly, the *.deb packages exist
and are quite easy to install, so I have ZSI and SOAP support to work with
the API. However, I can't access the web with the ZSI module.

I'll try to talk to the University Informatics Service to see if they can
figure it out. Really hope they can, otherwise, I guess I'll just have to
work from home since it works there.. :)

Again, very thankful!

João Rodrigues


From ytu888 at hotmail.com  Mon Oct  1 11:39:50 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 1 Oct 2007 06:39:50 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FD5927.3000207@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <BAY119-W24ADB785F1713680E663638FAD0@phx.gbl>


Thanks Peter, 

However, I still haven't installed the mxText module on my Mac yet. Also, could you tell me how to run the ReportLab test file from inside Python, i.e. after launching Python and importing the test file? Thanks.


> Date: Fri, 28 Sep 2007 20:42:31 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com
> CC: biopython at lists.open-bio.org
> Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X
> 
> Y Tu wrote:
> > Thank you, Peter for the prompt answer.
> > 
> > I did install the PIL already and tested with the commands "from PIL
> > import Image", then "import _imaging". Both commands succeeded.
> > That's why I don't understand why the test won't work. I used the
> > command "python test_pdfgen_general.py" under the shell prompt, which
> > generated the error. Since I installed PIL and succeeded in importing
> > the module of PIL, I thought maybe I can solve the problem by running
> > the test under Python.
> 
> Looking in more detail at the original stack trace,
> 
> >   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
> >     d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
> >   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
> >     raise IOError("decoder %s not available" % decoder_name)
> > IOError: decoder jpeg not available
> 
> It's possible that PIL needs some optional JPEG library, which ReportLab 
> wants to use.  I suggest you search the ReportLab website & user's 
> mailing list, and if you can't work out what is wrong sign up to their 
> mailing list and ask them, http://www.reportlab.org/
> 
> Very little of Biopython needs ReportLab, you should be able to install 
> Biopython without it.
> 
> Peter
> 
> 



From ytu888 at hotmail.com  Mon Oct  1 17:54:00 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 1 Oct 2007 12:54:00 -0500
Subject: [BioPython]  Error for installation of  MySALdb on Mac OS X
In-Reply-To: <46FD5927.3000207@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>


I downloaded mysql-5.0.45-osx10.4-i686.dmg from the MySQL website and installed it. Then I tried to install MySQL-python-1.2.2 but got the following error. How do I create the mysql_config.path file? Thank you very much.

leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$ python setup.py build
sh: line 1: mysql_config: command not found
Traceback (most recent call last):
  File "setup.py", line 16, in <module>
    metadata, options = get_config()
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config
    raise EnvironmentError, "%s not found" % mysql_config.path
EnvironmentError: mysql_config not found



From lists.steve at arachnedesign.net  Mon Oct  1 20:18:04 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Mon, 1 Oct 2007 16:18:04 -0400
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
Message-ID: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>

> I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and  
> installed it. Then I tried to install MySQL-python-1.2.2 but got  
> the following error. How to create the mysql_config.path file?  
> Thank you very much.
>
> leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$  
> python setup.py build
> sh: line 1: mysql_config: command not found

It seems as if you need to have the `mysql_config` command in your  
PATH variable and it's not there.

Look for where mysql was installed (maybe /usr/local/mysql/...) and  
add its bin directory to your PATH environment variable. Or maybe it  
installed some binaries/symlinks into your /usr/local/bin directory?
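
As a rough sketch of that check from Python itself (Python 3 standard library; the helper name `find_tool` and the candidate directory are only assumptions for illustration), something like this reports whether a command is reachable through PATH and, if not, whether it sits in a directory you suspect:

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Return the full path of an executable, or None if it is not found.

    Looks on the current PATH first, then in extra_dirs (hypothetical
    candidate directories such as a MySQL bin directory).
    """
    found = shutil.which(name)
    if found:
        return found
    if extra_dirs:
        return shutil.which(name, path=os.pathsep.join(extra_dirs))
    return None


if __name__ == "__main__":
    # /usr/local/mysql/bin is the location guessed at above, not a certainty.
    print(find_tool("mysql_config", ["/usr/local/mysql/bin"]))
```

If this only finds the command via the extra directory, then that directory is what needs adding to PATH before running setup.py.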

I think that'll do it for you.

-steve


From biopython at maubp.freeserve.co.uk  Mon Oct  1 21:06:37 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 1 Oct 2007 22:06:37 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <BAY119-W24ADB785F1713680E663638FAD0@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W24ADB785F1713680E663638FAD0@phx.gbl>
Message-ID: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>

On 10/1/07, Y Tu <ytu888 at hotmail.com> wrote:
>
> Thanks Peter,
>
> However, I still haven't install mxText module in my Mac yet.

I see you've signed up to the eGenix mailing list - I hope they can
solve your mxTextTools installation problems.

> Also could you tell me how to run the test file of ReportLab, when I
> launch Python and then import the test file into the python. Thanks.

In general I think most tests are designed to be run from the command
line, not by running python, typing an import statement, and typing
another command.  You should check the ReportLab documentation to see
what they recommend.

To run a specific Biopython unit test, such as the general graphics
unit test, you would do this:

python run_tests.py test_GraphicsGeneral.py

That would run the test, and check the output matched the expected
results.  Alternatively, you can do:

python test_GraphicsGeneral.py
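
To show why running from the command line and importing behave differently, here is a minimal, generic sketch of a test file (Python 3 `unittest`; this is not a ReportLab or Biopython test, and `ExampleTest` and `run_suite` are made-up names). Code under the `__name__` check runs when the file is executed as a script, but not when the file is merely imported.

```python
import unittest


class ExampleTest(unittest.TestCase):
    """A stand-in test case, just to have something to run."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the tests explicitly; this is what you would call after an import."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    # Only reached when started as a script from the command line.
    run_suite()
```

After an import, nothing happens until a function such as `run_suite()` is called by hand, which is why test files are usually easier to run from the shell.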

I hope that helps.

Peter


From ULNJUJERYDIX at spammotel.com  Tue Oct  2 06:52:53 2007
From: ULNJUJERYDIX at spammotel.com (Kevin Lam)
Date: Tue, 2 Oct 2007 14:52:53 +0800
Subject: [BioPython] Fwd: **Fwd: [Bioperl-l] divide and blast blastunsplit
	blast subsequence
In-Reply-To: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
Message-ID: <5b6410e0710012352s520b537bj7374dd874dc93104@mail.gmail.com>

Hi!
I am trying to annotate a 200kb sequence by running blastx to find the
locations of the protein sequences.
I need to split the sequence up so that I get the best hits for each region
(the top BLAST hits will mask the smaller proteins if I do it as a whole
sequence).
If I were to do it manually, I could set the subsequence in the web GUI for
NCBI's BLAST.
This way, the BLAST hit coordinates are based on the whole 200kb.

But I can't find this option in blast or a straightforward way to do it in
BioPerl.

I found similar solutions like
http://www.bio.davidson.edu/projects/DAB/DAB.html
divide and blast (but I want to specify coords rather than fixed intervals)

There is also this from the bioperl archives:
http://bioinformatics.org/pipermail/bioclusters/2002-August/000375.html

But isn't there an easier way, e.g. specifying a BLAST subsequence such as
200-900 of the FASTA file and getting the blastx hits back in terms of the
whole 200kb?
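
For what it's worth, the coordinate bookkeeping itself is small if the slicing is done by hand. The sketch below is plain Python with made-up helper names (`extract_region`, `to_full_coords`); it assumes 1-based inclusive coordinates on the nucleotide query and does not call BLAST or any BioPerl/Biopython function.

```python
def extract_region(seq, start, end):
    """Return the subsequence from start to end (1-based, inclusive)."""
    return seq[start - 1:end]


def to_full_coords(hit_start, hit_end, region_start):
    """Shift hit coordinates on a region back onto the full sequence.

    hit_start/hit_end are 1-based positions within the extracted region;
    region_start is where that region begins on the full sequence.
    """
    offset = region_start - 1
    return hit_start + offset, hit_end + offset


if __name__ == "__main__":
    full = "ACGTACGTAC"
    region = extract_region(full, 3, 6)
    print(region)
    print(to_full_coords(1, 4, 3))
```

Each region would be written out as its own FASTA record, searched separately, and the reported query positions shifted with `to_full_coords` afterwards.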


From mdehoon at c2b2.columbia.edu  Tue Oct  2 09:06:54 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 05:06:54 -0400
Subject: [BioPython] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>

Hi everybody,

Since no users of Bio.MultiProc came forward, I deprecated it for the
upcoming release.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
Sent: Tue 9/11/2007 10:37 AM
To: BioPython Developers List; biopython at biopython.org
Subject: [BioPython] Bio.MultiProc
 
Hi everybody,

In preparation for the upcoming release, I was running the Biopython 
test suite and found that test_copen.py hangs on Cygwin. It doesn't 
fail, it just sits there forever. This may be related to the use of 
fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it 
is probably possible to fix this, I'd have to dig fairly deep into the 
code, and I am not sure if it is worth it. It looks like the copen 
functions are used only in Bio/config, which is needed for Bio.db. A 
description of the functionality of this module can be found in the 
tutorial section 4.7.2.

Now, I don't remember users asking about this module on the mailing 
list. From the tutorial documentation, it seems to be a nice piece of 
code, but I doubt that it is being used often in practice.

So I was wondering:
1) Is anybody on this list using this code?
2) If not, can I mark it as deprecated for the upcoming release? 
Hopefully, people who are using this code will notice, and let us know 
that they need it.

--Michiel.
_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From ytu888 at hotmail.com  Tue Oct  2 11:36:58 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 06:36:58 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W24ADB785F1713680E663638FAD0@phx.gbl> 
	<320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>
Message-ID: <BAY119-W116B22B2B8E4727BB9196F8FAE0@phx.gbl>


Thank you very much, Peter.

> Date: Mon, 1 Oct 2007 22:06:37 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com
> Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X
> CC: biopython at lists.open-bio.org
> 
> On 10/1/07, Y Tu <ytu888 at hotmail.com> wrote:
> >
> > Thanks Peter,
> >
> > However, I still haven't install mxText module in my Mac yet.
> 
> I see you've signed up to the eGenix mailing list - I hope they can
> solve your mxTextTools installation problems.
> 
> > Also could you tell me how to run the test file of ReportLab, when I
> > launch Python and then import the test file into the python. Thanks.
> 
> In general I think most tests are designed to be run from the command
> line, not by running python, typing an import statement, and typing
> another command.  You should check the ReportLab documentation to see
> what they recommend.
> 
> To run a specific Biopython unit test, such as the general graphics
> unit test, you would do this:
> 
> python run_tests.py test_GraphicsGeneral.py
> 
> That would run the test, and check the output matched the expected
> results.  Alternatively, you can do:
> 
> python test_GraphicsGeneral.py
> 
> I hope that helps.
> 
> Peter



From ytu888 at hotmail.com  Tue Oct  2 12:29:46 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 07:29:46 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID: <BAY119-W23852A91CA4B6611B79BC58FAE0@phx.gbl>


Hi Steve,

I checked the PATH and added /usr/local/mysql/bin into it. But I still got the same error message when running the setup.py.

Thanks.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython]  Error for installation of  MySALdb on Mac OS X
> Date: Mon, 1 Oct 2007 16:18:04 -0400
> To: ytu888 at hotmail.com
> 
> > I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and  
> > installed it. Then I tried to install MySQL-python-1.2.2 but got  
> > the following error. How to create the mysql_config.path file?  
> > Thank you very much.
> >
> > leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$  
> > python setup.py build
> > sh: line 1: mysql_config: command not found
> 
> It seems as if you need to have the `mysql_config` command in your  
> PATH variable and it's not there.
> 
> Look for where mysql was installed (maybe /usr/local/mysql/...) and  
> add its bin directory to your PATH environment variable. Or maybe it  
> installed some binaries/symlinks into your /usr/local/bin directory?
> 
> I think that'll do it for you.
> 
> -steve
> 



From idoerg at gmail.com  Tue Oct  2 16:00:41 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Tue, 2 Oct 2007 09:00:41 -0700
Subject: [BioPython] [Biopython-dev]  Bio.MultiProc
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
References: <46E6A845.3030601@c2b2.columbia.edu>
	<6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <b5bbbc970710020900n66c9816bs311fa29eb52d3f25@mail.gmail.com>

Would it be possible to include the module, comment out the unworkable
source code and print a deprecation warning when it is imported? That way
we:

1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that it
is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is not
lost forever).
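
Something along these lines is what I have in mind, as a minimal sketch only (Python standard `warnings` module; the helper name `warn_deprecated` and the message wording are made up, and this is not what is in CVS):

```python
import warnings


def warn_deprecated(module_name):
    """Emit a DeprecationWarning naming the module being phased out."""
    warnings.warn(
        "%s is deprecated and may be removed in a future release; "
        "please contact the mailing list if you still need it." % module_name,
        DeprecationWarning,
        stacklevel=2,
    )


if __name__ == "__main__":
    # In a real module this call would sit at the top of the module file,
    # so that it fires on import.
    warnings.simplefilter("always")
    warn_deprecated("Bio.MultiProc")
```

Note that recent Python versions hide DeprecationWarning by default in many situations, so a user may need to enable warnings to see it.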

Also, is it possible to track down the original author?

./I

On 10/2/07, Michiel De Hoon <mdehoon at c2b2.columbia.edu> wrote:
>
> Hi everybody,
>
> Since no users of Bio.MultiProc came forward, I deprecated it for the
> upcoming release.
>
> --Michiel.
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
> Sent: Tue 9/11/2007 10:37 AM
> To: BioPython Developers List; biopython at biopython.org
> Subject: [BioPython] Bio.MultiProc
>
> Hi everybody,
>
> In preparation for the upcoming release, I was running the Biopython
> test suite and found that test_copen.py hangs on Cygwin. It doesn't
> fail, it just sits there forever. This may be related to the use of
> fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it
> is probably possible to fix this, I'd have to dig fairly deep into the
> code, and I am not sure if it is worth it. It looks like the copen
> functions are used only in Bio/config, which is needed for Bio.db. A
> description of the functionality of this module can be found in the
> tutorial section 4.7.2.
>
> Now, I don't remember users asking about this module on the mailing
> list. From the tutorial documentation, it seems to be a nice piece of
> code, but I doubt that it is being used often in practice.
>
> So I was wondering:
> 1) Is anybody on this list using this code?
> 2) If not, can I mark it as deprecated for the upcoming release?
> Hopefully, people who are using this code will notice, and let us know
> that they need it.
>
> --Michiel.
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


-- 

I. Friedberg

"The only problem with troubleshooting is that
sometimes trouble shoots back."


From mdehoon at c2b2.columbia.edu  Wed Oct  3 00:18:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 20:18:59 -0400
Subject: [BioPython] [Biopython-dev]  Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu><6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
	<b5bbbc970710020900n66c9816bs311fa29eb52d3f25@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu>

> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?

That is what I did.

> 3) Leave an option of fixing and commenting the code back in (i.e. it is not
> lost forever).

Even after removing the code in some future release, the code will not be
lost forever. It can always be retrieved from CVS and from older Biopython
releases.

> Also, is it possible to track down the original author?

That would be Jeff Chang.

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032




From ytu888 at hotmail.com  Wed Oct  3 12:44:32 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Wed, 3 Oct 2007 07:44:32 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID: <BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>


Here is the copy of the output in the Terminal. Please help me to find out what's wrong. Thanks.


Last login: Wed Oct  3 08:28:38 on ttyp4
Welcome to Darwin!
LeesComputer:~ Lee$ echo $PATH
/Library/Frameworks/Python.framework/Versions/Current/bin:/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin

LeesComputer:~ Lee$ cd /applications/python_bio/MySQL-python-1.2.2
LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ python setup.py build
sh: line 1: mysql_config: command not found
Traceback (most recent call last):
  File "setup.py", line 16, in <module>
    metadata, options = get_config()
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config
    raise EnvironmentError, "%s not found" % mysql_config.path
EnvironmentError: mysql_config not found

LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ cd /usr/local
LeesComputer:/usr/local Lee$ ls -al
total 8
drwxr-xr-x    8 root  wheel  272 Oct  1 13:02 .
drwxr-xr-x   10 root  wheel  340 Sep 26 11:30 ..
drwxr-xr-x    8 root  admin  272 Aug  6 04:00 ActivePerl-5.8
drwxr-xr-x   15 root  wheel  510 Oct  2 03:52 bin
drwxr-xr-x    6 root  wheel  204 Sep 27 05:22 include
drwxr-xr-x   12 root  wheel  408 Sep 27 05:21 lib
lrwxr-xr-x    1 root  wheel   25 Oct  1 13:02 mysql -> mysql-5.0.45-osx10.4-i686
drwxr-xr-x   19 root  wheel  646 Jul  4 13:54 mysql-5.0.45-osx10.4-i686

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython]  Error for installation of  MySALdb on Mac OS X
> Date: Mon, 1 Oct 2007 16:18:04 -0400
> To: ytu888 at hotmail.com
> 
> > I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and  
> > installed it. Then I tried to install MySQL-python-1.2.2 but got  
> > the following error. How to create the mysql_config.path file?  
> > Thank you very much.
> >
> > leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$  
> > python setup.py build
> > sh: line 1: mysql_config: command not found
> 
> It seems as if you need to have the `mysql_config` command in your  
> PATH variable and it's not there.
> 
> Look for where mysql was installed (maybe /usr/local/mysql/...) and  
> add its bin directory to your PATH environment variable. Or maybe it  
> installed some binaries/symlinks into your /usr/local/bin directory?
> 
> I think that'll do it for you.
> 
> -steve
> 



From lists.steve at arachnedesign.net  Wed Oct  3 13:01:09 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 09:01:09 -0400
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
Message-ID: <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>

Hi,

On Oct 3, 2007, at 8:44 AM, Y Tu wrote:

> Here is the copy of the output in the Terminal. Please help me to  
> find out what's wrong. Thanks.
>
> Last login: Wed Oct  3 08:28:38 on ttyp4
> Welcome to Darwin!
> LeesComputer:~ Lee$ echo $PATH
> /Library/Frameworks/Python.framework/Versions/Current/bin:/usr/ 
> local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin

It still looks like your PATH is screwed up: /usr/local/mysql/bin  
isn't in there. You have:
/usr/local/mysql:/bin

Here's a test. Open up a terminal and type:

$ which mysql_config

If you don't get an answer back that indicates that the system can  
find the binary, then your script won't either. For instance, this is  
how it looks for me:

$ which mysql_config
/Library/MySQL/bin/mysql_config

(I have an older version of mysql which was installed into /Library/ 
MySQL)

Yours should say:

$ which mysql_config
/usr/local/mysql/bin/mysql_config

Or something like that.
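
If it helps to see why the PATH you pasted doesn't do what you intended, here is a small sketch (plain Python 3; the helper name `path_entries` is made up) that splits the value on the colon separator the way the shell does. The string is the PATH from your terminal output.

```python
REPORTED_PATH = (
    "/Library/Frameworks/Python.framework/Versions/Current/bin:"
    "/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin"
)


def path_entries(path_value, sep=":"):
    """Split a PATH-style string into its individual directories."""
    return [entry for entry in path_value.split(sep) if entry]


if __name__ == "__main__":
    for entry in path_entries(REPORTED_PATH):
        print(entry)
```

The colon between `/usr/local/mysql` and `/bin` makes them two separate directories, so `/usr/local/mysql/bin` is never searched.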

Try that and see ...

-steve


From lists.steve at arachnedesign.net  Wed Oct  3 14:47:41 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 10:47:41 -0400
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
Message-ID: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>

> Steve, thank you very much. It fixed the problem and I got through  
> the build and install step. But when I tested inside the python for  
> the installation I got following error. Please help me about it.  
> Thanks.
>
> >>> import MySQLdb
> /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ 
> site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ 
> _mysql.py:3: UserWarning: Module _mysql was already imported from / 
> Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- 
> packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc,  
> but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to  
> sys.path
>   import sys, pkg_resources, imp
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "MySQLdb/__init__.py", line 19, in <module>
>     import _mysql
>   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in  
> <module>
>   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in  
> __bootstrap__
> ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2- 
> py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: / 
> usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
>   Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2- 
> py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
>   Reason: image not found


Sorry, don't know exactly what's happening here. Is this from a  
"fresh" python prompt?

How did you install MySQLdb, did you use easy_install? If so, try to  
install from the sourceforge download.

Try to remove it, remove the "build" directory from your mysqldb  
download and redo the whole
python setup.py build / python setup.py install process

To remove it, nuke this:
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- 
packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg

And try to reinstall?

Perhaps someone who knows what the problem is here can give you a  
better idea on what to do.
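
One thing that can be checked directly is whether the library named in the "Library not loaded" line exists at all. A minimal sketch, written for a current Python 3; the path is copied from the traceback above and the helper name library_present is made up for illustration:

```python
import os

# Path copied from the "Library not loaded" line of the traceback above.
LIB = "/usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib"

def library_present(path=LIB):
    """Return True if the shared library file exists at `path`."""
    return os.path.isfile(path)

print(LIB, "exists" if library_present() else "is missing")
```

If the file is missing, the MySQL client library is probably installed somewhere else (or not at all), which would match the "image not found" reason.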

-steve


From sbassi at gmail.com  Thu Oct  4 06:47:44 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 03:47:44 -0300
Subject: [BioPython] Problem with blast xml
Message-ID: <b43bf2080710032347r1791e181t569cecb1051ae0ea@mail.gmail.com>

I am having a problem that does not originate in Biopython, but it
is affecting the Biopython (1.43) BLAST XML parser.
I have two XML files; one can be parsed and the other can't.
Here are the commands I ran to get the XML files:

sbassi at xubuntu:~/blast-2.2.16/bin$  ./blastall -p blastn -d
/media/vic300/BLASTdb/ecoli.nt -i
/media/vic300/INTA/mitofragsB2-TAB.fasta -e 0.0001 -m 7 -o TABB2.xml
sbassi at xubuntu:~/blast-2.2.16/bin$  ./blastall -p blastn -d
/media/vic300/BLASTdb/ecoli.nt -i
/media/vic300/INTA/mitofragsB2-TABv2.fasta -e 0.0001 -m 7 -o
TABB2v2.xml

The only relevant difference is the input file: the sequences are
different, but the output files should have the same format (shouldn't
they?).
When I parse the files, I find that this is not the case.
This is the file that parses without problems:

>>> bout=open('bioinfo/INTA/TABB2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 31'
>>> y.query
u'fragment 67'
>>> x.alignments
[<Bio.Blast.Record.Alignment instance at 0xb659850c>]
>>> y.alignments
[<Bio.Blast.Record.Alignment instance at 0xb65a3c6c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3cec>,
<Bio.Blast.Record.Alignment instance at 0xb65a3d8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3e8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3f8c>,
<Bio.Blast.Record.Alignment instance at 0xb65a3e4c>,
<Bio.Blast.Record.Alignment instance at 0xb65aa1ac>]

Now the XML file that seems to be malformed:

>>> bout=open('bioinfo/INTA/TABB2v2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 1'
>>> y.query
u'fragment 57'
>>> x.alignments
[]
>>> y.alignments
[<Bio.Blast.Record.Alignment instance at 0xb65a374c>]

The first record has an empty alignments list.

Here is a fragment of the "normal" one (TABB2.xml):

      <Parameters_gap-extend>2</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>31</Iteration_iter-num>
      <Iteration_query-ID>lcl|31_0</Iteration_query-ID>
      <Iteration_query-def>fragment 31 </Iteration_query-def>
      <Iteration_query-len>1174</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|1788520|gb|AE000309.1|AE000309</Hit_id>
          <Hit_def>Escherichia coli K-12 MG1655 section 199 of 400 of
the complete genome</Hit_def>
          <Hit_accession>AE000309</Hit_accession>
          <Hit_len>13453</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>

Here is a fragment of the "malformed" one (TABB2v2.xml):

      <Parameters_gap-extend>2</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>400</Statistics_db-num>
          <Statistics_db-len>4662239</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>
          <Statistics_kappa>0.710603</Statistics_kappa>
          <Statistics_lambda>1.37406</Statistics_lambda>
          <Statistics_entropy>1.30725</Statistics_entropy>
        </Statistics>
      </Iteration_stat>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>57</Iteration_iter-num>

Why is this happening? Is this expected behavior?

I uploaded the xml files here:
http://www.bioinformatica.info/TABB2.xml
http://www.bioinformatica.info/TABB2v2.xml

-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


From ytu888 at hotmail.com  Thu Oct  4 12:24:18 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Thu, 4 Oct 2007 07:24:18 -0500
Subject: [BioPython] Error generated by Clustalw example in Tutorial
Message-ID: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>


Hi,

I'm reading the Biopython tutorial and running the ClustalW example, but it generates the following error. What's wrong? Thanks.

>>> from Bio import Clustalw
>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta"))
>>> cline.set_output("result.aln")
>>> print cline
clustalw .\opuntia.fasta -OUTFILE=result.aln
>>> alignment = Clustalw.do_alignment(cline)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file result.aln not produced, commandline: clustalw .\opuntia.fasta -OUTFILE=result.aln



From sbassi at gmail.com  Thu Oct  4 16:19:22 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 13:19:22 -0300
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
References: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
Message-ID: <b43bf2080710040919r4bb281edpac6f594d284fc940@mail.gmail.com>

On 10/4/07, Y Tu <ytu888 at hotmail.com> wrote:
> >>> print cline
> clustalw .\opuntia.fasta -OUTFILE=result.aln

I am not sure if this command is properly formatted. The backslash
should not be there, but I don't have a Windows box to try this.

-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


From mdehoon at c2b2.columbia.edu  Fri Oct  5 01:01:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 4 Oct 2007 21:01:59 -0400
Subject: [BioPython] Problem with blast xml
References: <b43bf2080710032347r1791e181t569cecb1051ae0ea@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>

Can you create two minimal XML files that demonstrate the problem?
For example, by removing records from the two files you have and checking if
parsing still works for one and fails for the other.
By doing so, you may be able to identify exactly what the essential
difference between the two files is.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032




From sbassi at gmail.com  Fri Oct  5 05:39:44 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Fri, 5 Oct 2007 02:39:44 -0300
Subject: [BioPython] Problem with blast xml
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
References: <b43bf2080710032347r1791e181t569cecb1051ae0ea@mail.gmail.com>
	<6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
Message-ID: <b43bf2080710042239t5bcb9c22u6cffdd915bf5a862@mail.gmail.com>

On 10/4/07, Michiel De Hoon <mdehoon at c2b2.columbia.edu> wrote:
> Can you create two minimal XML files that demonstrate the problem?
> For example, by removing records from the two files you have and checking if
> parsing still works for one and fails for the other.
> By doing so, you may be able to identify exactly what the essential
> difference between the two files is.

After some tests, I found two minimal XML files with this issue:
http://www.bioinformatica.info/mitoA.xml
http://www.bioinformatica.info/mitoB.xml

(only 3.5 kb each).


-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


From mdehoon at c2b2.columbia.edu  Fri Oct  5 06:34:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 5 Oct 2007 02:34:56 -0400
Subject: [BioPython] Problem with blast xml
References: <b43bf2080710032347r1791e181t569cecb1051ae0ea@mail.gmail.com><6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
	<b43bf2080710042239t5bcb9c22u6cffdd915bf5a862@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B631@mail2.exch.c2b2.columbia.edu>

Looking at the XML files, it seems that the Biopython Blast XML parser
is doing the right thing, isn't it?
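
To illustrate, here is a minimal sketch using only the standard library (not the Biopython parser itself), written for a current Python 3. The XML is a trimmed-down stand-in modelled on the fragments posted earlier in this thread, not the real mitoA.xml or mitoB.xml, and the function name hits_per_iteration is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for BLAST XML output: the first iteration has
# statistics but no <Iteration_hits>, the second has one hit.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_stat><Statistics/></Iteration_stat>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>57</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_num>1</Hit_num></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

def hits_per_iteration(xml_text):
    """Map each iteration number to the number of <Hit> elements it holds."""
    root = ET.fromstring(xml_text)
    counts = {}
    for iteration in root.iter("Iteration"):
        number = int(iteration.findtext("Iteration_iter-num"))
        # An iteration without <Iteration_hits> simply yields zero hits.
        counts[number] = len(iteration.findall("Iteration_hits/Hit"))
    return counts

print(hits_per_iteration(SAMPLE))
```

An iteration that carries no hits in the XML naturally gives a record with an empty list of alignments, so the empty list reflects what is in the file.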

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032




From biopython at maubp.freeserve.co.uk  Fri Oct  5 09:26:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 05 Oct 2007 10:26:06 +0100
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
References: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
Message-ID: <4706032E.1020703@maubp.freeserve.co.uk>

Y Tu wrote:
> Hi,
> 
> I'm reading the Biopython tutorial and running the example of clustalw. But it generate the following error. What's wrong? Thanks.
> 
>>>> from Bio import Clustalw
>>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta"))
>>>> cline.set_output("result.aln")
>>>> print cline
> clustalw .\opuntia.fasta -OUTFILE=result.aln

The Windows version of ClustalW is very fussy.  To experiment try 
running this by hand at the windows command prompt - note that I'm not 
at my Windows machine so I haven't double checked this:

clustalw .\opuntia.fasta -OUTFILE=result.aln

or,

clustalw opuntia.fasta -OUTFILE=result.aln

Any error messages would be helpful.

I suggest you try this in Biopython:

from Bio import Clustalw
cline = Clustalw.MultipleAlignCL("opuntia.fasta")
cline.set_output("result.aln")
print cline

Also, we have made a few tweaks to this code since Biopython 1.43 was 
released (see emails with Emanuel Hey in July 2007).  If you like, you 
can try updating this module to the CVS version.  Simply back up the 
existing C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py and 
replace it with the latest code from here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Clustalw/__init__.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python

Peter


From ytu888 at hotmail.com  Fri Oct  5 16:32:05 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 5 Oct 2007 11:32:05 -0500
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <4706032E.1020703@maubp.freeserve.co.uk>
References: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>
	<4706032E.1020703@maubp.freeserve.co.uk>
Message-ID: <BAY119-W411E7B2D457790F765CC158FA90@phx.gbl>


I tested both commands at the Windows command prompt. Initially both failed because Windows could not find clustalw. Once I gave the correct path to clustalw, both produced alignment results without any error. BTW, I used the copy inside BioEdit; I did not find a clustalw that comes with Biopython. It looks like Python uses the program available online at ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right?

Then I replaced the old __init__.py with the new one, but there is a new error message similar to the old one:

>>> alignment = Clustalw.do_alignment(cline)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    # check if the outfile exists before parsing
IOError: Output .aln file result1.aln not produced, commandline: clustalw opuntia.fasta -OUTFILE=result1.aln

I also tested the example on OS X, and the same error was generated:

>>> alignment = Clustalw.do_alignment(cline)
sh: line 1: clustalw: command not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file result1.aln not produced, commandline: clustalw ./opuntia.fasta -OUTFILE=result1.aln

It seems like the problem is not linked to the OS. What else could be wrong? Thanks.





From biopython at maubp.freeserve.co.uk  Fri Oct  5 18:35:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 05 Oct 2007 19:35:05 +0100
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <BAY119-W411E7B2D457790F765CC158FA90@phx.gbl>
References: <BAY119-W6570C19B5E5A30AB2EC978FA80@phx.gbl>	<4706032E.1020703@maubp.freeserve.co.uk>
	<BAY119-W411E7B2D457790F765CC158FA90@phx.gbl>
Message-ID: <470683D9.90808@maubp.freeserve.co.uk>

Y Tu wrote:
> I tested both commands at the Windows command prompt. Initially both
> failed because Windows could not find clustalw.

This is expected.  You must either supply the full path of the clustalw 
executable, or have it on the system path.  Otherwise Windows doesn't 
know how to find the clustalw program.
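
A minimal sketch of that check from within Python, assuming a current Python 3 (shutil.which does not exist in Python 2.5); the helper name find_executable is made up for illustration:

```python
import shutil

def find_executable(name="clustalw"):
    """Return the full path of `name` if it is on the system path, else None."""
    return shutil.which(name)

path = find_executable("clustalw")
if path is None:
    print("clustalw is not on the system path; supply its full path instead")
else:
    print("clustalw found at", path)
```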

> Once I gave the correct path to clustalw, both produced alignment
> results without any error. BTW, I used the copy inside BioEdit; I did
> not find a clustalw that comes with Biopython. It looks like Python
> uses the program available online at
> ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right?

Clustalw is a standalone program (completely separate from Biopython)
which you must install separately if you want to use it.  It is 
available from several servers - the one you chose looks fine.

> Then I replaced the old __init__.py with the new one, but there is a
> new error message similar to the old one:
> 
>>>> alignment = Clustalw.do_alignment(cline)
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in <module>
>   File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
>     # check if the outfile exists before parsing
> IOError: Output .aln file result1.aln not produced, commandline: clustalw opuntia.fasta -OUTFILE=result1.aln
> 
> I also tested the example on OS X, and the same error was generated:
> 
>>>> alignment = Clustalw.do_alignment(cline)
> sh: line 1: clustalw: command not found
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 117, in do_alignment
>     % (out_file, command_line))
> IOError: Output .aln file result1.aln not produced, commandline: clustalw ./opuntia.fasta -OUTFILE=result1.aln
> 
> It seems like the problem is not linked to the OS. What else could be
> wrong? Thanks.

In both cases, you are not explicitly providing the path to clustalw - 
so for this to work the clustalw executable must be on the system path.

The other obvious thing to check is the location of the files versus the 
working directory. Is your python script in the same folder as the 
opuntia.fasta file?
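
A quick way to see this from the Python prompt, sketched for a current Python 3; the helper name describe_input is made up for illustration:

```python
import os

def describe_input(filename):
    """Show how a relative filename resolves against the working directory."""
    resolved = os.path.abspath(filename)
    return {
        "cwd": os.getcwd(),                  # where relative paths start from
        "resolved": resolved,                # the file that would be opened
        "exists": os.path.isfile(resolved),  # False means the tool cannot find it
    }

print(describe_input("opuntia.fasta"))
```

If "exists" comes back False, either change to the folder holding the FASTA file or pass its full path.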

What happens if you try those exact command lines (which Biopython says 
it is trying to run) at the command prompt in the directory where your 
python script is located? i.e.

Windows:
clustalw opuntia.fasta -OUTFILE=result1.aln

Mac:
clustalw ./opuntia.fasta -OUTFILE=result1.aln

Peter


From meesters at uni-mainz.de  Mon Oct  8 15:07:54 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 8 Oct 2007 17:07:54 +0200
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
Message-ID: <1191856074.5425.24.camel@cmeesters>

Hi,

I'm trying to 'split' a structure into several pieces, e.g. a former chain
'A' should be split into 'A' and 'B', 'B' into 'C' and 'D' and so on.
Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...

Perhaps some code explains better what I'm trying to achieve:

breakpoints = [1254, 5444,
                6690, 10888,
                10889, 16332,
                16333, 21776,
                21776, 27220,
                27221, 32665]

def split_chain(structure, breakpoints, outname = 'split.pdb'):
    chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
                    'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
                    'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
                    'X', 'Y', 'Z']

    chain = chains.pop(0)
    for atom in structure.get_atoms():
        number = atom.get_serial_number()
        if breaks and number == breaks[0]:
            breaks.pop(0)
            chain = chains.pop(0)
        atom.parent.parent.id = chain # assign new chain

    iostream = PDBIO()
    try:
        outfile = open(outname, 'w')
        iostream.set_structure(structure.structure)
        iostream.save(outfile)
    except IOError, msg:
        raise IOError(msg)

So, chain 'A' should stay 'A' from atom 1 to 1254 and become 'B' from
1254 to 5444. Instead the written PDB file contains all atoms, but with
the wrong chain IDs (see above). (Please don't tell me how unpythonic the
code reads; the point is that I've tried so many different things that I
first need to understand my logic mistake.)

Any ideas where my mistake is?

Thanks,
Christian


From meesters at uni-mainz.de  Mon Oct  8 15:54:32 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 8 Oct 2007 17:54:32 +0200
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
In-Reply-To: <470A508C.4060803@maubp.freeserve.co.uk>
References: <1191856074.5425.24.camel@cmeesters>
	<470A508C.4060803@maubp.freeserve.co.uk>
Message-ID: <1191858872.5425.32.camel@cmeesters>


> > breakpoints = [1254, 5444,
> >                 6690, 10888,
> >                 10889, 16332,
> >                 16333, 21776,
> >                 21776, 27220,
> >                 27221, 32665]
> 
> I'm assuming this is "breaks" later on.
Absolutely - that's the pain with copy & paste for demos ... sorry.

> As for the reason, I think this is what is happening: Given an atom, 
> atom.parent will be a residue object, and atom.parent.parent will be a 
> chain object.  Note all the atoms in a single amino acid residue will 
> share the same .parent, and all the atoms in a single chain will 
> share the same .parent.parent
> 
> i.e. You have renamed Chain "A" to "A", and then later renamed this 
> chain to "B", and then again to "C".  You didn't ever split up the chain 
> into sub chains.
Mh, makes sense. 
> 
> To be honest, I would be tempted to write a quick and dirty script which 
> parsed the raw PDB file, and rewrote the chain field based on the atom 
> sequence number - without the overhead of the PDB parser.
Yes, would have been too easy ;-). Only wanted to add this functionality
to a larger application and make it easy to use. There is no strict need
to do so, but it would have been nice.
However, thanks for the input.

Christian


From biopython at maubp.freeserve.co.uk  Mon Oct  8 15:45:16 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 08 Oct 2007 16:45:16 +0100
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
In-Reply-To: <1191856074.5425.24.camel@cmeesters>
References: <1191856074.5425.24.camel@cmeesters>
Message-ID: <470A508C.4060803@maubp.freeserve.co.uk>

Christian Meesters wrote:
> Hi,
> 
> I'm trying to 'split' a structure into several pieces, e.g. a former chain
> 'A' should be split into 'A' and 'B', 'B' into 'C' and 'D' and so on.
> Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...
> 
> Perhaps some code explains better what I'm trying to achieve:
> 
> breakpoints = [1254, 5444,
>                 6690, 10888,
>                 10889, 16332,
>                 16333, 21776,
>                 21776, 27220,
>                 27221, 32665]

I'm assuming this is "breaks" later on.

> def split_chain(structure, breakpoints, outname = 'split.pdb'):
>     chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
>                     'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
>                     'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
>                     'X', 'Y', 'Z']
> 
>     chain = chains.pop(0)
>     for atom in structure.get_atoms():
>         number = atom.get_serial_number()
>         if breaks and number == breaks[0]:
>             breaks.pop(0)
>             chain = chains.pop(0)
>         atom.parent.parent.id = chain # assign new chain
> 
>     iostream = PDBIO()
>     try:
>         outfile = open(outname, 'w')
>         iostream.set_structure(structure.structure)
>         iostream.save(outfile)
>     except IOError, msg:
>         raise IOError(msg)
> 
> So, chain 'A' should stay 'A' from atom 1 to 1254 and become 'B' from
> 1254 to 5444. Instead the written PDB file contains all atoms, but with
> the wrong chain IDs (see above). (Please don't tell me how unpythonic the
> code reads; the point is that I've tried so many different things that I
> first need to understand my logic mistake.)
> 
> Any ideas where my mistake is?

As for the reason, I think this is what is happening: Given an atom, 
atom.parent will be a residue object, and atom.parent.parent will be a 
chain object.  Note all the atoms in a single amino acid residue will 
share the same .parent, and all the atoms in a single chain will 
share the same .parent.parent

i.e. You have renamed Chain "A" to "A", and then later renamed this 
chain to "B", and then again to "C".  You didn't ever split up the chain 
into sub chains.

I think you need to create new chain objects instead... but I'm not 
sure off hand how best to do this with Bio.PDB

To be honest, I would be tempted to write a quick and dirty script which 
parsed the raw PDB file, and rewrote the chain field based on the atom 
sequence number - without the overhead of the PDB parser.
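
A minimal sketch of such a script, written for a current Python 3 and relying only on the fixed-column PDB layout (atom serial number in columns 7-11, chain ID in column 22). It assumes that an atom whose serial number equals a breakpoint starts the next chain, and that duplicate breakpoints can be dropped. The name relabel_chains and the sample lines are made up for illustration; TER records are left untouched, and needing more than 26 chains would raise an IndexError:

```python
import bisect

def relabel_chains(pdb_lines, breakpoints,
                   chain_ids="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Yield PDB lines with the chain ID rewritten from the atom serial number.

    Atoms below the first breakpoint get chain_ids[0]; each breakpoint
    reached moves on to the next chain ID.
    """
    breakpoints = sorted(set(breakpoints))
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])  # columns 7-11
            # Number of breakpoints <= serial selects the chain letter.
            index = bisect.bisect_right(breakpoints, serial)
            line = line[:21] + chain_ids[index] + line[22:]  # column 22
        yield line

# Four made-up atoms, all in chain A, split at serial number 3.
sample = ["ATOM  %5d  N   MET A%4d\n" % (n, n) for n in (1, 2, 3, 4)]
for new_line in relabel_chains(sample, [3]):
    print(new_line, end="")
```

For a real file, the same generator can be fed an open input file and its output written line by line to a new file.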

Peter


From bbrazelton at gmail.com  Tue Oct  9 00:33:03 2007
From: bbrazelton at gmail.com (B. Brazelton)
Date: Mon, 8 Oct 2007 17:33:03 -0700
Subject: [BioPython] BLAST XML parser trouble
Message-ID: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>

I tried to follow the BLAST XML parser example in the tutorial, but I
always get the following error when attempting to iterate through the
records:

Traceback (most recent call last):
  File "BlastXML_Parser.py", line 10, in ?
    for blast_record in blast_records:
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py",
line 572, in parse
    expat_parser.Parse(text, False)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py",
line 98, in endElement
    eval("self.%s()" % method)
  File "<string>", line 0, in ?
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py",
line 215, in _end_BlastOutput_version
    self._header.version = self._value.split()[1]
IndexError: list index out of range

All I did was:

result_handle = open('NifH_Blast.xml')
from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)
for blast_record in blast_records:
    ... etc

I put my script and xml file here:
http://www.staff.washington.edu/braz/files

I'm using Biopython 1.43, and I get the same error on both Python
2.3.5 and Python 2.5.

It seems like my commands are exactly what is in the tutorial, so I'm
confused. My best guess is that there is a difference in the XML
format, but it's NCBI XML. Thanks for any help,

Bill Brazelton


From sbassi at gmail.com  Tue Oct  9 00:48:50 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 8 Oct 2007 21:48:50 -0300
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>
References: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>
Message-ID: <b43bf2080710081748r1d7e5935sde58fb894d820def@mail.gmail.com>

On 10/8/07, B. Brazelton <bbrazelton at gmail.com> wrote:
> I tried to follow the BLAST XML parser example in the tutorial, but I
> always get the following error when attempting to iterate through the
> records:

I got the same result as you. Could you please tell me the URL of the
tutorial where you saw this?

-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


From mdehoon at c2b2.columbia.edu  Tue Oct  9 02:55:21 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon, 8 Oct 2007 22:55:21 -0400
Subject: [BioPython] BLAST XML parser trouble
References: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>

How did you produce the XML file? In particular, which Blast version did you
use?
The Blast XML parser trips over the following line in your XML file:

    <BlastOutput_version>unspecified</BlastOutput_version>

This is supposed to be:

  <BlastOutput_version>BLASTP 2.2.12 [Aug-07-2005]</BlastOutput_version>

, of course depending on which Blast version you are using.
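
To illustrate why the parser trips, here is a minimal sketch. The only
line taken from NCBIXML.py is the `self._value.split()[1]` expression
shown in the traceback; `parse_version` is a hypothetical helper written
for this example and is not part of Biopython.

```python
def parse_version(value):
    """Return the version token from a <BlastOutput_version> string.

    Hypothetical helper, not part of Biopython. It mirrors the
    `value.split()[1]` lookup from the traceback, but returns None
    instead of raising IndexError when the string holds fewer than
    two whitespace-separated tokens.
    """
    tokens = value.split()
    if len(tokens) < 2:
        # "unspecified" splits into a single token, so index 1 does not exist.
        return None
    return tokens[1]


# A well-formed header has the program name first and the version second.
print(parse_version("BLASTP 2.2.12 [Aug-07-2005]"))
# The header from the problem file has only one token.
print(parse_version("unspecified"))
```

The unguarded `split()[1]` assumes the "PROGRAM VERSION [DATE]" layout,
which is why a one-word value raises "list index out of range".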

--Michiel


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of B. Brazelton
Sent: Mon 10/8/2007 8:33 PM
To: biopython at biopython.org
Subject: [BioPython] BLAST XML parser trouble
 
I tried to follow the BLAST XML parser example in the tutorial, but I
always get the following error when attempting to iterate through the
records:

Traceback (most recent call last):
  File "BlastXML_Parser.py", line 10, in ?
    for blast_record in blast_records:
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 572, in parse
    expat_parser.Parse(text, False)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
    eval("self.%s()" % method)
  File "<string>", line 0, in ?
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 215, in _end_BlastOutput_version
    self._header.version = self._value.split()[1]
IndexError: list index out of range

All I did was:

result_handle = open('NifH_Blast.xml')
from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)
for blast_record in blast_records:
    ... etc

I put my script and xml file here:
http://www.staff.washington.edu/braz/files

I'm using biopython 1.43, and I get the same error on both Python
2.3.5 and Python 2.5.

It seems like my commands are exactly what is in the tutorial, so I'm
confused. My best guess is that there is a difference in the XML
format, but it's NCBI XML. Thanks for any help,

Bill Brazelton
_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From kbaa at novonordisk.com  Tue Oct  9 12:26:14 2007
From: kbaa at novonordisk.com (KBAA (Kent Bondensgaard))
Date: Tue, 9 Oct 2007 14:26:14 +0200
Subject: [BioPython] FW: Parsing sequence information in patents
Message-ID: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>


Does anyone know how to parse protein sequence information in patents with Biopython?

BR, Kent Bondensgaard

__________________________________

Kent Bondensgaard
Research Scientist
Protein Structure and Biophysics

Novo Nordisk A/S
Novo Nordisk Park
DK-2760 Måløv
Denmark
+45 4443 4510 (direct)
+45 3075 4510 (mobile)
+45 4466 3450 (fax)
kbaa at novonordisk.com



From sbassi at gmail.com  Tue Oct  9 13:04:51 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 9 Oct 2007 10:04:51 -0300
Subject: [BioPython] FW: Parsing sequence information in patents
In-Reply-To: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
References: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
Message-ID: <b43bf2080710090604s5dcff35asd31dd65cd6a254d6@mail.gmail.com>

On 10/9/07, KBAA (Kent Bondensgaard) <kbaa at novonordisk.com> wrote:
>
> Does anyone know how to parse protein sequence information in patents with Biopython?

What about using patAA and patNT from NCBI? Both are available as
BLAST-ready databases, so you could retrieve the FASTA file using fastacmd.

-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


From bbrazelton at gmail.com  Tue Oct  9 20:24:58 2007
From: bbrazelton at gmail.com (B. Brazelton)
Date: Tue, 9 Oct 2007 13:24:58 -0700
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
References: <da0826f20710081733t666cb56bjc17faed3fc86c0bc@mail.gmail.com>
	<6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
Message-ID: <da0826f20710091324m7ec36306v29b36d6f2029073c@mail.gmail.com>

I put in 'tblastx 2.2.15 [Oct-15-2006]' and it worked fine.
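
For anyone hitting the same problem, the manual edit can be sketched as
a small script. This is only an illustration under stated assumptions:
`patch_blast_version` and `OLD_TAG` are hypothetical names made up for
this example, and the default version string is simply the one that
worked for my file.

```python
# The exact element the parser could not handle in the problem file.
OLD_TAG = "<BlastOutput_version>unspecified</BlastOutput_version>"


def patch_blast_version(xml_text, version="tblastx 2.2.15 [Oct-15-2006]"):
    """Replace the 'unspecified' version element with a real version string.

    Hypothetical helper, not part of Biopython. Text that does not
    contain the problem element is returned unchanged.
    """
    new_tag = "<BlastOutput_version>%s</BlastOutput_version>" % version
    return xml_text.replace(OLD_TAG, new_tag)


# Demonstration on an in-memory fragment rather than a real file.
sample = "<BlastOutput>\n  " + OLD_TAG + "\n</BlastOutput>"
print(patch_blast_version(sample))
```

To apply it to a real file, read NifH_Blast.xml into a string, pass it
through the function, and write the result to a new file before handing
that file to NCBIXML.parse.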

Thanks for your help, sorry for the newbie question.

FYI, I was using results generated from the CAMERA database
(http://camera.calit2.net/), and I was following the main Biopython
tutorial and cookbook from biopython.org. Thanks again,

BB

On 10/8/07, Michiel De Hoon <mdehoon at c2b2.columbia.edu> wrote:
> How did you produce the XML file? In particular, which Blast version did you
> use?
> The Blast XML parser trips over the following line in your XML file:
>
>     <BlastOutput_version>unspecified</BlastOutput_version>
>
> This is supposed to be:
>
>   <BlastOutput_version>BLASTP 2.2.12 [Aug-07-2005]</BlastOutput_version>
>
> , of course depending on which Blast version you are using.
>
> --Michiel
>
>
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org on behalf of B. Brazelton
> Sent: Mon 10/8/2007 8:33 PM
> To: biopython at biopython.org
> Subject: [BioPython] BLAST XML parser trouble
>
> I tried to follow the BLAST XML parser example in the tutorial, but I
> always get the following error when attempting to iterate through the
> records:
>
> Traceback (most recent call last):
>   File "BlastXML_Parser.py", line 10, in ?
>     for blast_record in blast_records:
>   File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 572, in parse
>     expat_parser.Parse(text, False)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
>     eval("self.%s()" % method)
>   File "<string>", line 0, in ?
>   File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 215, in _end_BlastOutput_version
>     self._header.version = self._value.split()[1]
> IndexError: list index out of range
>
> All I did was:
>
> result_handle = open('NifH_Blast.xml')
> from Bio.Blast import NCBIXML
> blast_records = NCBIXML.parse(result_handle)
> for blast_record in blast_records:
>     ... etc
>
> I put my script and xml file here:
> http://www.staff.washington.edu/braz/files
>
> I'm using biopython 1.43, and I get the same error on both Python
> 2.3.5 and Python 2.5.
>
> It seems like my commands are exactly what is in the tutorial, so I'm
> confused. My best guess is that there is a difference in the XML
> format, but it's NCBI XML. Thanks for any help,
>
> Bill Brazelton
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>


From sbassi at gmail.com  Tue Oct  9 21:09:09 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 9 Oct 2007 18:09:09 -0300
Subject: [BioPython] Getting Qv using Python?
Message-ID: <b43bf2080710091409t19b0ae14t3e587d64b011ccf3@mail.gmail.com>

Is there an automated way to get Quality Values (QV) from an ab1 file?
I wrapped Abiview [1] to get the sequence, but now I need the QVs.

[1] http://bioweb.pasteur.fr/docs/EMBOSS/abiview.html

-- 
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


From prashanth at ibioinformatics.org  Wed Oct 10 12:17:26 2007
From: prashanth at ibioinformatics.org (Prashantha Hebbar Kiradi)
Date: Wed, 10 Oct 2007 17:47:26 +0530
Subject: [BioPython] where is SeqIO.parse()?
Message-ID: <470CC2D6.1090504@ibioinformatics.org>

Hi everybody,

While trying the example of 'Parsing sequence file formats' from section
2.4 of Biopython tutorial:
-------------------------------------------------
from Bio import SeqIO
handle = open("ls_orchid.fasta")
for seq_record in SeqIO.parse(handle, "fasta") :
    print seq_record.id
    print seq_record.seq
    print len(seq_record.seq)
handle.close()
-------------------------------------------------


I get this error:
-------------------------------------------------
Traceback (most recent call last):
  File "fastEx.py", line 5, in <module>
    for seq_record in SeqIO.parse(handle, "fasta") :
AttributeError: 'module' object has no attribute 'parse'
-------------------------------------------------

Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
using is opening correctly.

API documentation reports that the 'parse' function is there. What am I
doing wrong?

I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Thanks in advance,

Prashantha Hebbar
Institute of Bioinformatics
ITPL, Bangalore,
INDIA


From fennan at gmail.com  Wed Oct 10 12:20:56 2007
From: fennan at gmail.com (Fernando)
Date: Wed, 10 Oct 2007 14:20:56 +0200
Subject: [BioPython] Code publications
Message-ID: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>

Hi everybody,

This might be off-topic, or maybe not:

I've been working with biopython for a while and I am curious about what the
authors get from all the exceptional work they are doing... I know it won't
have anything to do with money, but in terms of publications / copyrights etc.,
what are the advantages of having your code in biopython? Is there a journal
/ conference where the authors publish their work so that they can be
referenced or something like that?

Thanks,
Fernando


From mdehoon at c2b2.columbia.edu  Wed Oct 10 12:24:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 10 Oct 2007 08:24:33 -0400
Subject: [BioPython] where is SeqIO.parse()?
References: <470CC2D6.1090504@ibioinformatics.org>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B635@mail2.exch.c2b2.columbia.edu>

> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Use Biopython 1.43.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Prashantha Hebbar
Kiradi
Sent: Wed 10/10/2007 8:17 AM
To: biopython at biopython.org
Subject: [BioPython] where is SeqIO.parse()?
 
Hi everybody,

While trying the example of 'Parsing sequence file formats' from section
2.4 of Biopython tutorial:
-------------------------------------------------
from Bio import SeqIO
handle = open("ls_orchid.fasta")
for seq_record in SeqIO.parse(handle, "fasta") :
    print seq_record.id
    print seq_record.seq
    print len(seq_record.seq)
handle.close()
-------------------------------------------------


I get this error:
-------------------------------------------------
Traceback (most recent call last):
  File "fastEx.py", line 5, in <module>
    for seq_record in SeqIO.parse(handle, "fasta") :
AttributeError: 'module' object has no attribute 'parse'
-------------------------------------------------

Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
using is opening correctly.

API documentation reports that the 'parse' function is there. What am I
doing wrong?

I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Thanks in advance,

Prashantha Hebbar
Institute of Bioinformatics
ITPL, Bangalore,
INDIA

_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From cjfields at uiuc.edu  Wed Oct 10 14:14:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 Oct 2007 09:14:48 -0500
Subject: [BioPython] Code publications
In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
Message-ID: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>

This is a question that could be posed for any open-source project.

It differs per person in my opinion.  For instance, I donate time and  
code to BioPerl based on several factors.  Not reinventing the wheel,  
giving back to the community, access to the code base, and the joy of  
programming (believe it or not) are among them, but they aren't the  
only ones.

Publications don't hurt but they aren't my primary motivation.  Code
generally isn't the focus of my research, only a means to an end (to
parse or generate data).  I don't see anything wrong with publication being
someone else's primary drive to donate as long as they continue to
support their code post-publication, an issue that unfortunately pops
up quite frequently.

chris

On Oct 10, 2007, at 7:20 AM, Fernando wrote:

> Hi everybody,
>
> This might be off-topic, or maybe not:
>
> I've been working with biopython for a while and I am curious about what the
> authors get from all the exceptional work they are doing... I know it won't
> have anything to do with money, but in terms of publications / copyrights etc.,
> what are the advantages of having your code in biopython? Is there a journal
> / conference where the authors publish their work so that they can be
> referenced or something like that?
>
> Thanks,
> Fernando
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From biopython at maubp.freeserve.co.uk  Wed Oct 10 12:42:01 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Oct 2007 13:42:01 +0100
Subject: [BioPython] Code publications
In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
Message-ID: <470CC899.6080802@maubp.freeserve.co.uk>

Fernando wrote:
> Hi everybody,
> 
> This might be off-topic, or maybe not:
> 
> I've been working with biopython for a while and I am curious about what the
> authors get from all the exceptional work they are doing... I know it won't
> have anything to do with money, but in terms of publications / copyrights etc.,
> what are the advantages of having your code in biopython? Is there a journal
> / conference where the authors publish their work so that they can be
> referenced or something like that?

Pride? Looks good on a CV?  Although I must say working on BioPerl would 
have been a better choice from the point of view of job hunting ;)

Some of the specific modules have associated publications which get
cited (e.g. Bio.PDB and Bio.Cluster - although the latter is also
available independently of Biopython).  The closest to a general
Biopython paper is currently Chapman and Chang 2000.

In terms of talks, most recently I gave a talk at BOSC 2007 in July, the 
"Biopython Project Update". Which reminds me, I have a few photos and 
the slides (sadly in PowerPoint - my initial attempt to convert them 
into PDF wasn't great, font issues leading to content getting cropped).

Peter


From tiagoantao at gmail.com  Wed Oct 10 16:59:56 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Wed, 10 Oct 2007 17:59:56 +0100
Subject: [BioPython] Code publications
In-Reply-To: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>
References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
	<865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>
Message-ID: <470D050C.7060500@gmail.com>

I am currently submitting my population genetics code to biopython
and I can talk about my motivations.

Most of the code that I am submitting was used in something that I have
done in the past (sometimes published). I figured that if I have the
code sitting here, I might as well donate it. This has one interesting
advantage for me: all the code that I know I will try to submit to
biopython is designed with care, while all the code that is a one-off is
really a big mess. For me, making code public is a motivator to maintain
clean code.

It is also a way to get to know people that are interested in this type 
of problems, and I think that, as with all things in life, knowing more 
people is a good thing.

Maybe in 12-18 months' time I might suggest to other people that we
write an article on the popgen work in biopython. Let's face it, that
is also a good motivator. But if it is the only one, I would agree that
is not good (as Chris says, maintenance after publication...)

Last, but not least: ethical and moral issues. Having spent some time 
outside of science I do think most scientific work is done in a very 
closed fashion (it was a shock to me, really). From my personal point of 
view open science and free software are arguments to which I connect 
moral value.

Tiago

Chris Fields wrote:
> This is a question that could be posed for any open-source project.
>
> It differs per person in my opinion.  For instance, I donate time and  
> code to BioPerl based on several factors.  Not reinventing the wheel,  
> giving back to the community, access to the code base, and the joy of  
> programming (believe it or not) are among them, but they aren't the  
> only ones.
>
> Publications don't hurt but they aren't my primary motivation.  Code
> generally isn't the focus of my research, only a means to an end (to
> parse or generate data).  I don't see anything wrong with publication being
> someone else's primary drive to donate as long as they continue to
> support their code post-publication, an issue that unfortunately pops
> up quite frequently.
>
> chris
>
> On Oct 10, 2007, at 7:20 AM, Fernando wrote:
>
>   
>> Hi everybody,
>>
>> This might be off-topic, or maybe not:
>>
>> I've been working with biopython for a while and I am curious about what the
>> authors get from all the exceptional work they are doing... I know it won't
>> have anything to do with money, but in terms of publications / copyrights etc.,
>> what are the advantages of having your code in biopython? Is there a journal
>> / conference where the authors publish their work so that they can be
>> referenced or something like that?
>>
>> Thanks,
>> Fernando
>> _______________________________________________
>> BioPython mailing list  -  BioPython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>   


From rebekah.rogers at gmail.com  Thu Oct 11 18:57:21 2007
From: rebekah.rogers at gmail.com (Rebekah Rogers)
Date: Thu, 11 Oct 2007 14:57:21 -0400
Subject: [BioPython] running PAML in python
Message-ID: <79def59f0710111157h7483d5b5m6e6cdb3b86266750@mail.gmail.com>

Hello:

Does anyone know of an existing library that can run aligned sequences
in PAML and then pull out the dN/dS values?

Thanks!
-Rebekah


From The_Polymorph at rocketmail.com  Sun Oct 14 17:04:48 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 14 Oct 2007 10:04:48 -0700 (PDT)
Subject: [BioPython] Performing sequence alignments, etc.
Message-ID: <311410.84366.qm@web50801.mail.re2.yahoo.com>

Hi all.

I'm relatively new to the field of bioinformatics and I'm trying to
perform a multiple sequence alignment on 5-6 sequences (fasta format -
dna sequences). I'd like the output to be formatted in the following
manner (clustalw standalone output):

accession_number1: atctcgatatcgggcgctcta...
accession_number2: atctctattctctggatctct...
...

When one or more nucleotide columns are identical, clustalw displays
an asterisk. If not, a blank space is displayed. Is this a standard
feature of BioPython?

Also, I'm evaluating several sequences but I'd like to obtain the most
recent complete genomes possible from various countries. Is there a
convenient source to use (GenBank?) if I don't know the accession
numbers?

Thanks,

~Caitlin
   





From biopython at maubp.freeserve.co.uk  Sun Oct 14 17:38:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 14 Oct 2007 18:38:32 +0100
Subject: [BioPython] Performing sequence alignments, etc.
In-Reply-To: <311410.84366.qm@web50801.mail.re2.yahoo.com>
References: <311410.84366.qm@web50801.mail.re2.yahoo.com>
Message-ID: <47125418.5020009@maubp.freeserve.co.uk>

Caitlin wrote:
> Hi all.
> 
> I'm relatively new to the field of bioinformatics and I'm trying to
> perform a multiple sequence alignment on 5-6 sequences (fasta format -
> dna sequences). I'd like the output to be formatted in the following
> manner (clustalw standalone output):

For reading and writing Clustalw alignment files, you could either use 
Bio.SeqIO (format name "clustal") or the Bio.Clustalw module.
http://biopython.org/wiki/SeqIO

> When one or more nucleotide columns are identical, clustalw displays
> an asterisk. If not, a blank space is displayed. Is this a standard
> feature of BioPython?

There is an example of Clustalw output online here - note there can also 
be a column of numbers on the right hand side (not shown here):
http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format

It sounds like you are describing the simple consensus string which 
clustalw outputs under the alignment (using *:. and space).

Biopython has a SummaryInfo object which can calculate simple consensus 
sequences (see the tutorial). Perhaps this would be close to what you 
want to do.
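
As a rough illustration of the asterisk line described above, here is a
plain-Python sketch. It does not use SummaryInfo or any other Biopython
class; `identity_line` is a hypothetical helper made up for this
example, and it only handles the "*" versus space case (the ":" and "."
marks that clustalw uses for similar residues are ignored).

```python
def identity_line(aligned_seqs):
    """Return '*' for every fully identical column, and ' ' otherwise.

    Hypothetical helper, not part of Biopython. All sequences are
    assumed to be aligned already and to have the same length.
    Columns consisting of gap characters are never marked.
    """
    marks = []
    for column in zip(*aligned_seqs):
        first = column[0]
        if first != "-" and all(residue == first for residue in column):
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


# Three short, already aligned DNA fragments (made-up data).
seqs = ["atctcgat", "atctctat", "atcacgat"]
for seq in seqs:
    print(seq)
print(identity_line(seqs))
```

Printing the returned string under the alignment rows gives a layout
similar to the consensus line in clustalw output.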

> Also, I'm evaluating several sequences but I'd like to obtain the most
> recent complete genomes possible from various countries. Is there a
> convenient source to use (GenBank?) if I don't know the accession
> numbers?

What sort of Genomes? Bacteria? Vertebrates?  You could start by having 
a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these 
three are kept in sync with each other).

Biopython has quite a nice interface for searching and downloading 
sequences from GenBank (again, see the tutorial) so that would be my 
first suggestion.

Peter


From The_Polymorph at rocketmail.com  Mon Oct 15 02:13:24 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 14 Oct 2007 19:13:24 -0700 (PDT)
Subject: [BioPython] Performing sequence alignments, etc.
In-Reply-To: <47125418.5020009@maubp.freeserve.co.uk>
Message-ID: <129586.66498.qm@web50807.mail.re2.yahoo.com>

Thanks Peter. The genomes are viral. I'll definitely read that
tutorial. Your help is much appreciated.

~Caitlin

--- Peter <biopython at maubp.freeserve.co.uk> wrote:

> Caitlin wrote:
> > Hi all.
> > 
> > I'm relatively new to the field of bioinformatics and I'm trying to
> > perform a multiple sequence alignment on 5-6 sequences (fasta format -
> > dna sequences). I'd like the output to be formatted in the following
> > manner (clustalw standalone output):
> 
> For reading and writing Clustalw alignment files, you could either use
> Bio.SeqIO (format name "clustal") or the Bio.Clustalw module.
> http://biopython.org/wiki/SeqIO
> 
> > When one or more nucleotide columns are identical, clustalw displays
> > an asterisk. If not, a blank space is displayed. Is this a standard
> > feature of BioPython?
> 
> There is an example of Clustalw output online here - note there can also
> be a column of numbers on the right hand side (not shown here):
> http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format
> 
> It sounds like you are describing the simple consensus string which
> clustalw outputs under the alignment (using *:. and space).
> 
> Biopython has a SummaryInfo object which can calculate simple consensus
> sequences (see the tutorial). Perhaps this would be close to what you
> want to do.
> 
> > Also, I'm evaluating several sequences but I'd like to obtain the most
> > recent complete genomes possible from various countries. Is there a
> > convenient source to use (GenBank?) if I don't know the accession
> > numbers?
> 
> What sort of Genomes? Bacteria? Vertebrates?  You could start by having
> a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these
> three are kept in sync with each other).
> 
> Biopython has quite a nice interface for searching and downloading
> sequences from GenBank (again, see the tutorial) so that would be my
> first suggestion.
> 
> Peter


"Be who you are and say what you feel because those who mind don't 
matter and those who matter don't mind." 

- Dr. Seuss, "Oh the Places You'll Go"


 

From fredgca at hotmail.com  Mon Oct 15 13:02:27 2007
From: fredgca at hotmail.com (Frederico Arnoldi)
Date: Mon, 15 Oct 2007 13:02:27 +0000
Subject: [BioPython] where is SeqIO.parse()?
In-Reply-To: <mailman.33040.1192414808.2686.biopython@lists.open-bio.org>
References: <mailman.33040.1192414808.2686.biopython@lists.open-bio.org>
Message-ID: <BLU105-W10B76818834BD20393B37FBFA30@phx.gbl>


Dear Kiradi,
Concerning your subject question: where is SeqIO.parse()?
>>> from Bio import SeqIO
>>> SeqIO
<module 'Bio.SeqIO' from '/usr/lib/python2.4/site-packages/Bio/SeqIO/__init__.py'>

  So, on my system, it is at /usr/lib/python2.4/site-packages/Bio/SeqIO/__init__.py. Try the same command in your python console and see where it is on yours.

Concerning your problem:
Try
>>> from Bio import SeqIO
>>> dir()
['SeqIO', '__builtins__', '__doc__', '__name__']
>>> dir(SeqIO)
['Alignment', 'ClustalIO', 'FastaIO', 'InsdcIO', 'Interfaces', 'NexusIO', 'PhylipIO', 'Seq', 'SeqRecord', 'StockholmIO', 'StringIO', 'SwissIO', '_FormatToIterator', '_FormatToWriter', '__builtins__', '__doc__', '__file__', '__name__', '__path__', 'generic_alphabet', 'generic_protein', 'os', 'parse', 'to_alignment', 'to_dict', 'write']

   Do you get the same result? Note that "parse" is in my SeqIO. Is it in yours?
   I noticed that when biopython is installed via apt on Ubuntu, the __init__.py in Bio/SeqIO is empty. Maybe that is the source of your problem. If I am right, then typing dir(SeqIO) on your system gives ['__builtins__', '__doc__', '__file__', '__name__', '__path__'], confirming that your __init__.py is empty. Check it.
  If this is your problem, try installing biopython from the tar.gz file available on the Biopython home page.
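
The check above can also be done in one step. The following is only a
sketch: `module_has` is a hypothetical helper written for this message,
and the demonstration uses the standard library's json module so that it
runs even without Biopython installed.

```python
import importlib


def module_has(module_name, attribute):
    """Import a module and report where it lives and whether it has an attribute.

    Hypothetical helper for illustration. Returns a tuple of
    (path of the module file or None, True/False for the attribute).
    """
    module = importlib.import_module(module_name)
    location = getattr(module, "__file__", None)
    return location, hasattr(module, attribute)


# Demonstration with a standard library module.
location, present = module_has("json", "loads")
print(location)
print(present)
```

Calling module_has("Bio.SeqIO", "parse") on your system would show which
__init__.py is being imported and whether it defines parse.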

Good luck,
Fred 


----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 10 Oct 2007 17:47:26 +0530
> From: Prashantha Hebbar Kiradi
> Subject: [BioPython] where is SeqIO.parse()?
> To: biopython at biopython.org
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi everybody,
>
> While trying the example of 'Parsing sequence file formats' from section
> 2.4 of Biopython tutorial:
> -------------------------------------------------
> from Bio import SeqIO
> handle = open("ls_orchid.fasta")
> for seq_record in SeqIO.parse(handle, "fasta") :
>     print seq_record.id
>     print seq_record.seq
>     print len(seq_record.seq)
> handle.close()
> -------------------------------------------------
>
> I get this error:
> -------------------------------------------------
> Traceback (most recent call last):
>   File "fastEx.py", line 5, in <module>
>     for seq_record in SeqIO.parse(handle, "fasta") :
> AttributeError: 'module' object has no attribute 'parse'
> -------------------------------------------------
>
> Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
> using is opening correctly.
>
> API documentation reports that the 'parse' function is there. What am I
> doing wrong?
>
> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.
>
> Thanks in advance,
>
> Prashantha Hebbar
> Institute of Bioinformatics
> ITPL,


From ytu888 at hotmail.com  Mon Oct 15 16:19:47 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 15 Oct 2007 11:19:47 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
Message-ID: <BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>


Hi Steve,

Thank you for your email. I was away for a week.
What do you mean by a "fresh" python prompt?
I installed MySQL using MYSQL-5.0.45-osx10.4-i686.dmg, downloaded online.
I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, am I right?

Once again, thank you very much for your help.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython]  Error for installation of  MySALdb on Mac OS X
> Date: Wed, 3 Oct 2007 10:47:41 -0400
> To: ytu888 at hotmail.com
> 
> > Steve, thank you very much. It fixed the problem and I got through  
> > the build and install step. But when I tested inside the python for  
> > the installation I got following error. Please help me about it.  
> > Thanks.
> >
> > >>> import MySQLdb
> > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path
> >   import sys, pkg_resources, imp
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "MySQLdb/__init__.py", line 19, in <module>
> >     import _mysql
> >   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in <module>
> >   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__
> > ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
> >   Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
> >   Reason: image not found
> 
> 
> Sorry, don't know exactly what's happening here. Is this from a  
> "fresh" python prompt?
> 
> How did you install MySQLdb, did you use easy_install? If so, try to  
> install from the sourceforge download.
> 
> Try to remove it, remove the "build" directory from your mysqldb  
> download and redo the whole
> python setup.py build / python setup.py install process
> 
> To remove it, nuke this:
> /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- 
> packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg
> 
> And try to reinstall?
> 
> Perhaps someone who knows what the problem is here can give you a  
> better idea on what to do.
> 
> -steve



From lists.steve at arachnedesign.net  Mon Oct 15 16:30:21 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Mon, 15 Oct 2007 12:30:21 -0400
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
	<BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
Message-ID: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>

Hi,

> Thank you for your email. I was away for a week.
> What do you mean "fresh" python prompt?
> I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded  
> online.
> I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb,  
> am I right?

I'm not sure, exactly.

Last time I checked, the only thing you needed to use mysql from  
python was:

(a) A working mysql install (the client/server)
(b) The mysqldb package from: http://sourceforge.net/projects/mysql- 
python

I'm assuming (a) is installed correctly since you are using the .mpkg  
from mysql.org, so I'd just try to fix (b).

You can try to do so by doing the following:

(1) Remove your original attempt at installing the python mysqldb  
library. From the looks of your error messages, it seems to be  
installed here:

/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/

(2) remove the build directory in your mysqldb directory (the one you  
are installing from) by cd-ing into your mysqldb download, and  
removing the build directory you find there.

(3) reinstall mysqldb by doing the usual `python setup.py build` and  
`sudo python setup.py install` dance
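
If it helps to script the cleanup in steps (1) and (2), here is a minimal Python sketch. The helper name remove_if_present is made up for illustration, and it only deletes whatever path you hand it, so double-check the path first because the removal cannot be undone.

```python
import os
import shutil


def remove_if_present(path):
    """Delete a file or directory tree if it exists.

    Returns True if something was removed, False if the path was absent.
    """
    if os.path.isdir(path) and not os.path.islink(path):
        # A real directory (e.g. an unpacked .egg or a build/ folder).
        shutil.rmtree(path)
    elif os.path.lexists(path):
        # A plain file or a symlink (e.g. a zipped .egg file).
        os.remove(path)
    else:
        return False
    return True
```

You would call it once with the egg path from step (1) and once with the build directory inside your mysqldb download, then carry on with step (3). Removing things under /Library will most likely need to be run with sudo.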

For the record, I'm not sure what you are talking about when you are  
distinguishing between "MySQL_python_1.2.2, not MySQLdb"

are you trying to install two python libraries to access mysql?

-steve


From ytu888 at hotmail.com  Mon Oct 15 17:18:42 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 15 Oct 2007 12:18:42 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
	<BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
	<908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
Message-ID: <BAY119-W26619613084E7C6274C1008FA30@phx.gbl>


What I meant by "MySQL_python_1.2.2, not MySQLdb" was to uninstall MySQL_python, not the mysql client/server installed with the mpkg.

I just deleted the MYSQL....fat.egg file and downloaded MySQL-python-1.2.2.tar. I repeated the installation process. However, when I run "import MySQLdb", I get the same error message. Is there anything else I should take a look at? Thank you very much.


> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython]  Error for installation of  MySALdb on Mac OS X
> Date: Mon, 15 Oct 2007 12:30:21 -0400
> To: ytu888 at hotmail.com
> 
> Hi,
> 
> > Thank you for your email. I was away for a week.
> > What do you mean "fresh" python prompt?
> > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded  
> > online.
> > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb,  
> > am I right?
> 
> I'm not sure, exactly.
> 
> Last time I checked, the only thing you needed to use mysql from  
> python was:
> 
> (a) A working mysql install (the client/server)
> (b) The mysqldb package from: http://sourceforge.net/projects/mysql- 
> python
> 
> I'm assuming (a) is installed correctly since you are using the .mpkg  
> from mysql.org, so I'd just try to fix (b).
> 
> You can try to do so by doing the following:
> 
> (1) Remove your original attempt at installing the python mysqldb  
> library. From the looks of your error messages, it seems to be  
> installed here:
> 
> Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- 
> packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/
> 
> (2) remove the build directory in your mysqldb directory (the one you  
> are installing from) by cd-ing into your mysqldb download, and  
> removing the build directory you find there.
> 
> (3) reinstall mysqldb by doing the usual `python setup.py build` and  
> `sudo python setup.py install` dance
> 
> For the record, I'm not sure what you are talking about when you are  
> distinguishing between "MySQL_python_1.2.2, not MySQLdb"
> 
> are you trying to install two python libraries to access mysql?
> 
> -steve
> 



From ytu888 at hotmail.com  Tue Oct 16 17:06:36 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 16 Oct 2007 12:06:36 -0500
Subject: [BioPython] Error for installation of  MySALdb on Mac OS X
In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
	<BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
	<908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
Message-ID: <BAY119-W36D33A0C262F9D2A0101058F9C0@phx.gbl>


Hi,

I reinstalled everything and checked every step. I found that there were some warnings in the "build" step (the /usr/bin/ld lines below). I wonder whether they are the reason I get the error messages when running "import MySQLdb" at the Python prompt, and how to fix the problem.
Thank you very much.

LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build
running build
running build_py
... ...
/usr/bin/ld: for architecture ppc
/usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
/usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install
Password:
running install
... ...
Adding MySQL-python 1.2.2 to easy-install.pth file

Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg
Processing dependencies for MySQL-python==1.2.2
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path
  import sys, pkg_resources, imp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MySQLdb/__init__.py", line 19, in <module>
    import _mysql
  File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in <module>
  File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
  Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
  Reason: image not found
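
One way to narrow this down might be to check which of the library paths mentioned above actually exist on disk. The sketch below is only a diagnostic under assumptions: the first path is copied from the ImportError, the second from the linker warning in the build step, and the function name find_missing_libraries is hypothetical.

```python
import os


def find_missing_libraries(paths):
    """Return the paths from the given list that do not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]


candidates = [
    # Path that dlopen() reported as "image not found".
    "/usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib",
    # Path that the linker mentioned during "python setup.py build".
    "/usr/local/mysql/lib/libmysqlclient_r.dylib",
]

for path in find_missing_libraries(candidates):
    print("missing:", path)
```

If only the first path is reported as missing, that would suggest _mysql.so is looking for the client library in a directory (lib/mysql) that this MySQL install does not use.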

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython]  Error for installation of  MySALdb on Mac OS X
> Date: Mon, 15 Oct 2007 12:30:21 -0400
> To: ytu888 at hotmail.com
> 
> Hi,
> 
> > Thank you for your email. I was away for a week.
> > What do you mean "fresh" python prompt?
> > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded  
> > online.
> > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb,  
> > am I right?
> 
> I'm not sure, exactly.
> 
> Last time I checked, the only thing you needed to use mysql from  
> python was:
> 
> (a) A working mysql install (the client/server)
> (b) The mysqldb package from: http://sourceforge.net/projects/mysql- 
> python
> 
> I'm assuming (a) is installed correctly since you are using the .mpkg  
> from mysql.org, so I'd just try to fix (b).
> 
> You can try to do so by doing the following:
> 
> (1) Remove your original attempt at installing the python mysqldb  
> library. From the looks of your error messages, it seems to be  
> installed here:
> 
> Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- 
> packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/
> 
> (2) remove the build directory in your mysqldb directory (the one you  
> are installing from) by cd-ing into your mysqldb download, and  
> removing the build directory you find there.
> 
> (3) reinstall mysqldb by doing the usual `python setup.py build` and  
> `sudo python setup.py install` dance
> 
> For the record, I'm not sure what you are talking about when you are  
> distinguishing between "MySQL_python_1.2.2, not MySQLdb"
> 
> are you trying to install two python libraries to access mysql?
> 
> -steve
> 



From fennan at gmail.com  Tue Oct 16 17:51:30 2007
From: fennan at gmail.com (Fernando)
Date: Tue, 16 Oct 2007 19:51:30 +0200
Subject: [BioPython] Precompute database information
Message-ID: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>

Hi everybody,

I am thinking of including some algorithms that I work with into Biopython.
My first concern is that I'm using a local image of the Gene Ontology
database to perform several operations. In order to avoid such database
accesses I could precompute the information I need and load it once the
module is called. How should I do it? Is there a guideline style to load
external variables or something like that? Any other ideas/suggestions?

Thanks


From fennan at gmail.com  Tue Oct 16 18:55:54 2007
From: fennan at gmail.com (Fernando)
Date: Tue, 16 Oct 2007 20:55:54 +0200
Subject: [BioPython] Precompute database information
In-Reply-To: <4714FD13.2020708@maubp.freeserve.co.uk>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
	<4714FD13.2020708@maubp.freeserve.co.uk>
Message-ID: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>

Hi Peter,

>How big would your pre-computed data be?  If its some sort of table or
>other simple data you could perhaps use a simple text file; Another idea
> for complicated objects is to use python's pickle module.

It would be big... I am dealing with pairwise term comparisons and I want
to consider different species as well.

>How often would the pre-computed data need to be updated?  Every time
>there is a new Gene Ontology release?  It might be better have the
>module download and cache the latest version on request (rather than
>shipping an out of date dataset with Biopython).

Yes, I could do that... Would it be OK to use MySQL in Biopython? If so, the
module could download the latest GO version on request, install it, and work
with that version until the user decides to update it.

On 10/16/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Fernando wrote:
> > Hi everybody,
> >
> > I am thinking in including some algorithms that I work with into
> biopython.
> > My first concern is that I'm using a local image of the Gene Ontology
> > database to perform several operations. In order to avoid such database
> > accesses I could precompute the information I need and load it once the
> > module is called. How should I do it? Is there a guideline style to load
> > external variables or something like that? Any other ideas/suggestions?
>
> I think you need to go into more detail.
>
> How big would your pre-computed data be?  If its some sort of table or
> other simple data you could perhaps use a simple text file; Another idea
> for complicated objects is to use python's pickle module.
>
> How often would the pre-computed data need to be updated?  Every time
> there is a new Gene Ontology release?  It might be better have the
> module download and cache the latest version on request (rather than
> shipping an out of date dataset with Biopython).
>
> I don't think we have anything in Biopython that requires regular
> updates.  Things like genomes and sequence databases are left up to the
> user.
>
> Peter
>
>


From sdavis2 at mail.nih.gov  Tue Oct 16 19:26:18 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 16 Oct 2007 15:26:18 -0400
Subject: [BioPython] Precompute database information
In-Reply-To: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>	<4714FD13.2020708@maubp.freeserve.co.uk>
	<7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>
Message-ID: <4715105A.30705@mail.nih.gov>

Fernando wrote:
> Hi Peter,
> 
>> How big would your pre-computed data be?  If its some sort of table or
>> other simple data you could perhaps use a simple text file; Another idea
>> for complicated objects is to use python's pickle module.
> 
> It would be big... I an dealing with pairwise terms comparisons and I want
> to consider different species as well.
> 
>> How often would the pre-computed data need to be updated?  Every time
>> there is a new Gene Ontology release?  It might be better have the
>> module download and cache the latest version on request (rather than
>> shipping an out of date dataset with Biopython).
> 
> Yes, I could do that... It would be OK in Biopython to use mysql? If so the
> module could download the last GO version on request, install it and work
> with that version until the users decides to update it.

Asking users to use MySQL to do updates might be a bit much.  Could this
be done from the .obo files?
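
To give a rough idea of what reading a .obo file directly could look like, here is a minimal sketch. It is not an existing Biopython API: the function name parse_obo_terms is made up, the sample IDs are invented, and it assumes the simple "tag: value" stanza layout with "!" starting a trailing comment (escaped "\!" is not handled).

```python
def parse_obo_terms(lines):
    """Yield one dict per [Term] stanza, mapping each tag to a list of values."""
    current = None
    for raw in lines:
        # Drop trailing "! comment" text and surrounding whitespace.
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("["):
            # A new stanza starts: emit the previous term, if any.
            if current is not None:
                yield current
            current = {} if line == "[Term]" else None
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.setdefault(tag.strip(), []).append(value.strip())
    if current is not None:
        yield current


# Made-up sample data, only to show the shape of the output.
sample = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example root term

[Term]
id: EX:0000002
name: example child term
is_a: EX:0000001 ! example root term

[Typedef]
id: part_of
""".splitlines()

for term in parse_obo_terms(sample):
    print(term["id"][0], term.get("is_a", []))
```

Values are kept as lists because tags such as is_a can appear more than once in a stanza.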

Sean


From biopython at maubp.freeserve.co.uk  Tue Oct 16 18:04:03 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 19:04:03 +0100
Subject: [BioPython] Precompute database information
In-Reply-To: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
Message-ID: <4714FD13.2020708@maubp.freeserve.co.uk>

Fernando wrote:
> Hi everybody,
> 
> I am thinking in including some algorithms that I work with into biopython.
> My first concern is that I'm using a local image of the Gene Ontology
> database to perform several operations. In order to avoid such database
> accesses I could precompute the information I need and load it once the
> module is called. How should I do it? Is there a guideline style to load
> external variables or something like that? Any other ideas/suggestions?

I think you need to go into more detail.

How big would your pre-computed data be?  If it's some sort of table or
other simple data you could perhaps use a simple text file; another idea
for complicated objects is to use Python's pickle module.

How often would the pre-computed data need to be updated?  Every time
there is a new Gene Ontology release?  It might be better to have the
module download and cache the latest version on request (rather than
shipping an out-of-date dataset with Biopython).
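
As a rough sketch of the pickle-plus-cache idea, assuming the pre-computed data fits in memory: the function name load_or_build and its arguments are hypothetical, and build stands for whatever code downloads or computes the data.

```python
import os
import pickle


def load_or_build(cache_path, build, refresh=False):
    """Return cached data from cache_path, calling build() when needed.

    build is any zero-argument callable that produces the data;
    refresh=True forces a rebuild, e.g. after a new ontology release.
    """
    if not refresh and os.path.exists(cache_path):
        # Only unpickle files you created yourself; pickle is not safe
        # for untrusted input.
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    data = build()
    with open(cache_path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

The first call pays the cost of build() and writes the cache file; later calls just load it until the user asks for a refresh.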

I don't think we have anything in Biopython that requires regular 
updates.  Things like genomes and sequence databases are left up to the 
user.

Peter


From fennan at gmail.com  Wed Oct 17 11:12:36 2007
From: fennan at gmail.com (Fernando)
Date: Wed, 17 Oct 2007 07:12:36 -0400
Subject: [BioPython] Precompute database information
In-Reply-To: <4715105A.30705@mail.nih.gov>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
	<4714FD13.2020708@maubp.freeserve.co.uk>
	<7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>
	<4715105A.30705@mail.nih.gov>
Message-ID: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>

>Asking users to use MySQL to do updates might be a bit much.  Could this
>be done from the .obo files?

I think that's probably the best solution... Is there any Python module for
working with OBO / OWL formats? I've been searching, but people seem to use
BioPerl for this.

On 10/16/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
> Fernando wrote:
> > Hi Peter,
> >
> >> How big would your pre-computed data be?  If its some sort of table or
> >> other simple data you could perhaps use a simple text file; Another
> idea
> >> for complicated objects is to use python's pickle module.
> >
> > It would be big... I an dealing with pairwise terms comparisons and I
> want
> > to consider different species as well.
> >
> >> How often would the pre-computed data need to be updated?  Every time
> >> there is a new Gene Ontology release?  It might be better have the
> >> module download and cache the latest version on request (rather than
> >> shipping an out of date dataset with Biopython).
> >
> > Yes, I could do that... It would be OK in Biopython to use mysql? If so
> the
> > module could download the last GO version on request, install it and
> work
> > with that version until the users decides to update it.
>
> Asking users to use MySQL to do updates might be a bit much.  Could this
> be done from the .obo files?
>
> Sean
>


From sdavis2 at mail.nih.gov  Wed Oct 17 15:34:17 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 17 Oct 2007 11:34:17 -0400
Subject: [BioPython] Precompute database information
In-Reply-To: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>	
	<4714FD13.2020708@maubp.freeserve.co.uk>	
	<7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>	
	<4715105A.30705@mail.nih.gov>
	<7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
Message-ID: <47162B79.8080204@mail.nih.gov>

Fernando wrote:
>>Asking users to use MySQL to do updates might be a bit much.  Could this
>>be done from the .obo files?
> 
> I think that's probably the best solution... Is there any python module
> for working with OBO / OWL  formats? I've been searching but people seem
> to use BioPerl for this matter

In a way, it seems silly to reimplement the Bio::OntologyIO stuff in
python, but I (and others, after a quick google search) would probably
benefit from such a thing.  I'm not able to devote much time right this
minute to the project, but I think that, given the huge number of
particularly obo format files available, there would be use for such
parsers and tools in biopython.  How much interest/need is there for a
Bio.OntologyIO like thing?  Has anyone made any attempts at creating one?

For a list of available biologic ontologies (to see what we are
missing), see here:

http://obofoundry.org/

Sean


From luca.beltrame at unimi.it  Wed Oct 17 15:59:47 2007
From: luca.beltrame at unimi.it (Luca Beltrame)
Date: Wed, 17 Oct 2007 17:59:47 +0200
Subject: [BioPython] Precompute database information
In-Reply-To: <47162B79.8080204@mail.nih.gov>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
	<7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
	<47162B79.8080204@mail.nih.gov>
Message-ID: <200710171759.48595.luca.beltrame@unimi.it>

Il Wednesday 17 October 2007 17:34:17 Sean Davis ha scritto:

> In a way, it seems silly to reimplement the Bio::OntologyIO stuff in

It depends on the perspective: for some, learning yet another programming 
language would be a drawback.

> parsers and tools in biopython.  How much interest/need is there for a
> Bio.OntologyIO like thing?  Has anyone made any attempts at creating one?

Personally speaking, I would love it. No time (and skill) to even think about 
doing something like that, though.

-- 
Luca Beltrame, MSc. - Molecular Medicine PhD Student
Dipartimento di Scienze e Tecnologie Biomediche - UniMI
CNR - Institute of Biomedical Technologies Research Fellow
E-mail: luca dot beltrame [at] unimi dot it - Phone: +39-02-50320924


From jimmy.musselwhite at gmail.com  Wed Oct 17 21:20:41 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 17:20:41 -0400
Subject: [BioPython] Question about Seq.count()
Message-ID: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>

Hello all
I have a script that runs through a list of about 250,000 sequence
records and counts how many times substrings of 3-5 nucleotides in
length occur in each one.

Here is some example code

search = 'ATTCG'

#use SeqIO to get a big list of records
sequences = list(SeqIO.parse(file, "fasta"))

for record in sequences :

Now the code I want to do is
record.seq.count(search)

but what I am forced to do is
record.seq.tostring().count(search)

The problem here is that when I am forced to use .tostring() on every single
seq object it devastates my memory usage in a BIG way. It eats up about
1.2 GB and then crashes. If I remove the .tostring() and just tell it to
search for 'A', it will run fine and use memory at about 1/100th the rate.

So my question boils down to: is there any way to make .count() able to
search for strings and not just characters? Otherwise my work is going to
grind to a halt here.
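
For reference, here is a stripped-down sketch of the counting I am after, using plain Python strings in place of Seq objects. The names motif_counts and read_sequences are made up, and whether going through the records one at a time (instead of building a list first) would help with the memory use is only a guess.

```python
def motif_counts(sequences, motif):
    """Yield the non-overlapping count of motif in each sequence string."""
    for sequence in sequences:
        yield sequence.count(motif)


def read_sequences():
    """Hypothetical stand-in for iterating over the sequence records.

    With Biopython this would be the string form of each record's
    sequence, taken from SeqIO.parse() one record at a time.
    """
    yield "ATTCGATTCG"
    yield "GGGATTCGA"
    yield "ACGT"


print(list(motif_counts(read_sequences(), "ATTCG")))
```

Because both functions are generators, only one sequence needs to be held at a time while counting.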

Thanks!


From biopython at maubp.freeserve.co.uk  Wed Oct 17 22:03:51 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Oct 2007 23:03:51 +0100
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
Message-ID: <471686C7.6050305@maubp.freeserve.co.uk>

Jimmy Musselwhite wrote:
> Now the code I want to do is
> record.seq.count(search)
> 
> but what I am forced to do is
> record.seq.tostring().count(search)
 >
> The problem here is that when I am forced to use .tostring() on every single
> seq object it devastates my memory usage in a BIG way. It eats up about
> 1.2gigs and then crashes. If I remove the .tostring() and just tell if to
> search for 'A', it will run fine and use memory at about 1/100th the rate

In the short term, try record.seq.data.count(search) which is what the 
tostring() method is doing anyway (the Seq object stores the sequence 
internally as a string).  Does that help?

We might be tweaking the Seq object after the next release to act a bit 
more like a string - at which point the .data property might go away.

> So my question sums down to, is there any way to make .count() be able to
> search for strings and not just characters?

I'd never noticed that - I would call it a bug...

 >>> from Bio.Seq import Seq
 >>> my_seq = Seq("AAACACACGGTTTT")
 >>> my_seq.data.count("GG")
1
 >>> my_seq.data.count("G")
2
 >>> my_seq.tostring().count("G")
2
 >>> my_seq.tostring().count("GG")
1
 >>> my_seq.count("G")
2
 >>> my_seq.count("GG")
0
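
A plausible explanation for the 0 above, not checked against the Seq source, is that Seq.count() compares its argument with each letter in turn, so a two-letter argument can never match. The sketch below models that assumption with plain strings; count_per_letter is a hypothetical stand-in, not the real method.

```python
def count_per_letter(sequence, item):
    """Count the letters of sequence that are equal to item.

    Hypothetical model of the behaviour seen with my_seq.count().
    """
    return sum(1 for letter in sequence if letter == item)


text = "AAACACACGGTTTT"

# Letter-by-letter comparison: fine for "G", never matches "GG".
print(count_per_letter(text, "G"), count_per_letter(text, "GG"))

# Python's own str.count() handles substrings (non-overlapping matches).
print(text.count("G"), text.count("GG"))
```

If that is what is going on, counting on the underlying string is the safer route for anything longer than one letter.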

Peter


From jimmy.musselwhite at gmail.com  Wed Oct 17 22:48:09 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 18:48:09 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
Message-ID: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>

Thanks guys! That worked great.

On 10/17/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Jimmy Musselwhite wrote:
> > Now the code I want to do is
> > record.seq.count(search)
> >
> > but what I am forced to do is
> > record.seq.tostring().count(search)
> >
> > The problem here is that when I am forced to use .tostring() on every
> single
> > seq object it devastates my memory usage in a BIG way. It eats up about
> > 1.2gigs and then crashes. If I remove the .tostring() and just tell if
> to
> > search for 'A', it will run fine and use memory at about 1/100th the
> rate
>
> In the short term, try record.seq.data.count(search) which is what the
> tostring() method is doing anyway (the Seq object stores the sequence
> internally as a string).  Does that help?
>
> We might be tweaking the Seq object after the next release to act a bit
> more like a string - at which point the .data property might go away.
>
> > So my question sums down to, is there any way to make .count() be able
> to
> > search for strings and not just characters?
>
> You I'd never noticed that - I would call it a bug...
>
> >>> from Bio.Seq import Seq
> >>> my_seq = Seq("AAACACACGGTTTT")
> >>> my_seq.data.count("GG")
> 1
> >>> my_seq.data.count("G")
> 2
> >>> my_seq.tostring().count("G")
> 2
> >>> my_seq.tostring().count("GG")
> 1
> >>> my_seq.count("G")
> 2
> >>> my_seq.count("GG")
> 0
>
> Peter
>
>


From jimmy.musselwhite at gmail.com  Wed Oct 17 22:52:07 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 18:52:07 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
Message-ID: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>

Just kidding, it didn't work great. It only "fixed" it because I was
printing out the output of count() and so it was just executing 100 times
slower and thus eating RAM 100 times slower :(

It doesn't seem like there is a good way for me to fix this.

On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
>
> Thanks guys! That worked great.
>
> On 10/17/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
> >
> > Jimmy Musselwhite wrote:
> > > Now the code I want to do is
> > > record.seq.count(search)
> > >
> > > but what I am forced to do is
> > > record.seq.tostring().count(search)
> > >
> > > The problem here is that when I am forced to use .tostring() on every
> > single
> > > seq object it devastates my memory usage in a BIG way. It eats up
> > about
> > > 1.2gigs and then crashes. If I remove the .tostring() and just tell if
> > to
> > > search for 'A', it will run fine and use memory at about 1/100th the
> > rate
> >
> > In the short term, try record.seq.data.count (search) which is what the
> > tostring() method is doing anyway (the Seq object stores the sequence
> > internally as a string).  Does that help?
> >
> > We might be tweaking the Seq object after the next release to act a bit
> > more like a string - at which point the .data property might go away.
> >
> > > So my question sums down to, is there any way to make .count() be able
> > to
> > > search for strings and not just characters?
> >
> > You I'd never noticed that - I would call it a bug...
> >
> > >>> from Bio.Seq import Seq
> > >>> my_seq = Seq("AAACACACGGTTTT")
> > >>> my_seq.data.count("GG")
> > 1
> > >>> my_seq.data.count("G")
> > 2
> > >>> my_seq.tostring().count("G")
> > 2
> > >>> my_seq.tostring().count("GG")
> > 1
> > >>> my_seq.count("G")
> > 2
> > >>> my_seq.count("GG")
> > 0
> >
> > Peter
> >
> >
>


From jimmy.musselwhite at gmail.com  Wed Oct 17 23:04:26 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 19:04:26 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
	<86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
Message-ID: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>

In response to the first reply you gave me, where you said this

You I'd never noticed that - I would call it a bug...

 >>> from Bio.Seq import Seq
 >>> my_seq = Seq("AAACACACGGTTTT")
 >>> my_seq.data.count("GG")
1
 >>> my_seq.data.count("G")
2
 >>> my_seq.tostring().count("G")
2
 >>> my_seq.tostring().count("GG")
1
 >>> my_seq.count("G")
2
 >>> my_seq.count("GG")
0


I've tried that many many times and I always get 0 when I do
my_seq.count("GG")
I just rebuilt biopython from the latest CVS tarball and it still does not
work. I have no idea why yours works and mine doesn't.

On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
>
> Just kidding, it didn't work great. It only "fixed" it because I was
> printing out the output of count() and so it was just executing 100 times
> slower and thus eating RAM 100 times slower :(
>
> It doesn't seem like there is a good way for me to fix this.
>
> On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
> >
> > Thanks guys! That worked great.
> >
> > On 10/17/07, Peter < biopython at maubp.freeserve.co.uk> wrote:
> > >
> > > Jimmy Musselwhite wrote:
> > > > Now the code I want to do is
> > > > record.seq.count(search)
> > > >
> > > > but what I am forced to do is
> > > > record.seq.tostring().count(search)
> > > >
> > > > The problem here is that when I am forced to use .tostring() on
> > > every single
> > > > seq object it devastates my memory usage in a BIG way. It eats up
> > > about
> > > > 1.2gigs and then crashes. If I remove the .tostring() and just tell
> > > if to
> > > > search for 'A', it will run fine and use memory at about 1/100th the
> > > rate
> > >
> > > In the short term, try record.seq.data.count (search) which is what
> > > the
> > > tostring() method is doing anyway (the Seq object stores the sequence
> > > internally as a string).  Does that help?
> > >
> > > We might be tweaking the Seq object after the next release to act a
> > > bit
> > > more like a string - at which point the .data property might go away.
> > >
> > > > So my question sums down to, is there any way to make .count() be
> > > able to
> > > > search for strings and not just characters?
> > >
> > > You I'd never noticed that - I would call it a bug...
> > >
> > > >>> from Bio.Seq import Seq
> > > >>> my_seq = Seq("AAACACACGGTTTT")
> > > >>> my_seq.data.count("GG")
> > > 1
> > > >>> my_seq.data.count("G")
> > > 2
> > > >>> my_seq.tostring().count("G")
> > > 2
> > > >>> my_seq.tostring().count("GG")
> > > 1
> > > >>> my_seq.count("G")
> > > 2
> > > >>> my_seq.count("GG")
> > > 0
> > >
> > > Peter
> > >
> > >
> >
>


From jimmy.musselwhite at gmail.com  Wed Oct 17 23:06:03 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 19:06:03 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
	<86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
	<86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>
Message-ID: <86e5e8970710171606x4ac9b3feg23f2409a4385d237@mail.gmail.com>

Man, I'm sorry, I didn't read that well enough. It doesn't work for you
either. I'm gonna stop responding to this e-mail now :) I'm clearly tired or
something.


On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
>
> In response to the first reply you gave me, where you said this
>
> You I'd never noticed that - I would call it a bug...
>
>  >>> from Bio.Seq import Seq
>  >>> my_seq = Seq("AAACACACGGTTTT")
>  >>> my_seq.data.count("GG")
> 1
>  >>> my_seq.data.count("G")
> 2
>  >>> my_seq.tostring().count("G")
> 2
>  >>> my_seq.tostring().count("GG")
> 1
>  >>> my_seq.count("G")
> 2
>  >>> my_seq.count("GG")
> 0
>
>
> I've tried that many many times and I always get 0 when I do
> my_seq.count("GG")
> I just rebuilt biopython from the latest CVS tarball and it still does not
> work. I have no idea why yours works and mine doesn't.
>
> On 10/17/07, Jimmy Musselwhite <jimmy.musselwhite at gmail.com> wrote:
> >
> > Just kidding, it didn't work great. It only "fixed" it because I was
> > printing out the output of count() and so it was just executing 100 times
> > slower and thus eating RAM 100 times slower :(
> >
> > It doesn't seem like there is a good way for me to fix this.
> >
> > On 10/17/07, Jimmy Musselwhite < jimmy.musselwhite at gmail.com> wrote:
> > >
> > > Thanks guys! That worked great.
> > >
> > > On 10/17/07, Peter < biopython at maubp.freeserve.co.uk> wrote:
> > > >
> > > > Jimmy Musselwhite wrote:
> > > > > Now the code I want to do is
> > > > > record.seq.count(search)
> > > > >
> > > > > but what I am forced to do is
> > > > > record.seq.tostring().count(search)
> > > > >
> > > > > The problem here is that when I am forced to use .tostring() on
> > > > every single
> > > > > seq object it devastates my memory usage in a BIG way. It eats up
> > > > about
> > > > > 1.2gigs and then crashes. If I remove the .tostring() and just
> > > > tell if to
> > > > > search for 'A', it will run fine and use memory at about 1/100th
> > > > the rate
> > > >
> > > > In the short term, try record.seq.data.count (search) which is what
> > > > the
> > > > tostring() method is doing anyway (the Seq object stores the
> > > > sequence
> > > > internally as a string).  Does that help?
> > > >
> > > > We might be tweaking the Seq object after the next release to act a
> > > > bit
> > > > more like a string - at which point the .data property might go
> > > > away.
> > > >
> > > > > So my question sums down to, is there any way to make .count() be
> > > > able to
> > > > > search for strings and not just characters?
> > > >
> > > > You I'd never noticed that - I would call it a bug...
> > > >
> > > > >>> from Bio.Seq import Seq
> > > > >>> my_seq = Seq("AAACACACGGTTTT")
> > > > >>> my_seq.data.count("GG")
> > > > 1
> > > > >>> my_seq.data.count("G")
> > > > 2
> > > > >>> my_seq.tostring().count("G")
> > > > 2
> > > > >>> my_seq.tostring().count("GG")
> > > > 1
> > > > >>> my_seq.count("G")
> > > > 2
> > > > >>> my_seq.count("GG")
> > > > 0
> > > >
> > > > Peter
> > > >
> > > >
> > >
> >
>


From jimmy.musselwhite at gmail.com  Thu Oct 18 12:48:41 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Thu, 18 Oct 2007 08:48:41 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <471733DE.6050803@maubp.freeserve.co.uk>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
	<86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
	<471733DE.6050803@maubp.freeserve.co.uk>
Message-ID: <86e5e8970710180548u48e5780crc8d5178401d116d5@mail.gmail.com>

Peter
Well after a day of not thinking very hard I found my problem and it didn't
have anything to do with strings at all. That was just my best guess at the
time of writing this e-mail. Sorry about that =(

On 10/18/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Jimmy Musselwhite wrote:
> > Just kidding, it didn't work great. It only "fixed" it because I was
> > printing out the output of count() and so it was just executing 100
> times
> > slower and thus eating RAM 100 times slower :(
> >
> > It doesn't seem like there is a good way for me to fix this.
>
> Both of these are using the python string method to count "GG", the only
> difference is the tostring() method has the additional small overhead of
> an extra function call:
>
> my_seq.data.count("GG")
> my_seq.tostring().count("GG")
>
> However, comparing these:
>
> my_seq.data.count("G")         # using python's string count method
> my_seq.tostring().count("G")   # using python's string count method
> my_seq.count("G")              # using an iterator internally
>
> It could be that the Seq record's current single letter search is simply
> very memory efficient compared than the python string's more flexible
> multi-letter search.
>
> How are you measuring the RAM?  If like to see memory usage figures for
> the five simple examples above on a large sequence - plus doing this
> directly on the equivalent string.
>
> Are you using Linux or Windows or Mac OS, and what version of python?  I
> know there have been some string optimisations in Python 2.5 (although I
> don't know if any are relevant to the count method).
>
> Peter
>
>


From ytu888 at hotmail.com  Thu Oct 18 17:35:15 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Thu, 18 Oct 2007 12:35:15 -0500
Subject: [BioPython] Error for running the test code in BioSQL with
 Biopython manual
In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
	<46FD5927.3000207@maubp.freeserve.co.uk>
	<BAY119-W5AA9F19386CAED13C993A8FAD0@phx.gbl>
	<374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
	<BAY119-W109FA474DC20CBE2CA2ADB8FAF0@phx.gbl>
	<38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
	<BAY119-W34D61C857373444BB72CAC8FAF0@phx.gbl>
	<14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
	<BAY119-W23B6C7065F7B8F3D5E07BE8FA30@phx.gbl>
	<908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
Message-ID: <BAY119-W666D7B45057D45AB8D7AC8F9E0@phx.gbl>


I am still waiting for help with the problem on my Mac (attached at the bottom). However, to keep the project going I found an old PC and installed Python, MySQL, BioSQL and Biopython on it. When I tested the code from the "Basic BioSQL with Biopython" manual, I got the following error:
=======================================my PC problem===============================

>>> from BioSQL import BioSeqDatabase
>>> server=BioSeqDatabase.open_database(driver="MySQLdb", user="root",
...     passwd="MySQLdb", host="localhost", db="bioseqdb")
>>> db=server.new_database("Viral")
>>> from Bio import GenBank
>>> parser=GenBank.FeatureParser()
>>> iterator = GenBank.Iterator(open("gbvrl.gb"), parser)
>>> db.load(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\lib\site-packages\BioSQL\BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "C:\Python25\lib\site-packages\BioSQL\Loader.py", line 37, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "C:\Python25\lib\site-packages\BioSQL\Loader.py", line 260, in _load_bioentry_table
    bioentry_id = self.adaptor.last_id('bioentry')
  File "C:\Python25\lib\site-packages\BioSQL\BioSeqDatabase.py", line 148, in last_id
    return self.dbutils.last_id(self.cursor, table)
  File "C:\Python25\Lib\site-packages\BioSQL\DBUtils.py", line 34, in last_id
    return cursor.insert_id()
AttributeError: 'Cursor' object has no attribute 'insert_id'
+++++++++++++++++++++++++++++++++++++++++++++++++

Please help me to fix the problem, thanks.
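
For context on the AttributeError above: the Python DB-API (PEP 249) describes an optional cursor.lastrowid attribute, and MySQLdb connection objects have an insert_id() method, so a compatibility helper could try both. The sketch below is only an illustration under that assumption and is not the actual BioSQL code. last_insert_id is a hypothetical name, and sqlite3 stands in for MySQL so that the example runs without a database server.

```python
import sqlite3


def last_insert_id(cursor, connection=None):
    """Return the id of the most recently inserted row (hypothetical helper)."""
    # PEP 249 optional extension; many drivers provide it on the cursor.
    row_id = getattr(cursor, "lastrowid", None)
    if row_id is not None:
        return row_id
    # Fallback for drivers that only expose insert_id() on the connection.
    if connection is not None and hasattr(connection, "insert_id"):
        return connection.insert_id()
    raise AttributeError(
        "no lastrowid on the cursor and no insert_id() on the connection")


# Demo with an in-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO bioentry (name) VALUES (?)", ("example",))
new_id = last_insert_id(cur, conn)
```

Whether the installed BioSQL and MySQLdb versions fit this pattern would still need checking against their own source.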


========================================my old Mac problem========================
Date: Tue, 16 Oct 2007 12:06:36 -0500
From: Y Tu <ytu888 at hotmail.com>
Subject: Re: [BioPython] Error for installation of  MySALdb on Mac OS
	X
To: Steve Lianoglou <lists.steve at arachnedesign.net>
Cc: biopython at lists.open-bio.org
Message-ID: <BAY119-W36D33A0C262F9D2A0101058F9C0 at phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"
 
 
Hi,
 
I reinstalled everything and checked every step. I found that there were some warnings in the "build" step (the /usr/bin/ld lines below). I wonder whether they are the reason I get the error messages when running "import MySQLdb" at the Python prompt, and how to fix the problem.
Thank you very much.
 
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build
running build
running build_py
... ...
/usr/bin/ld: for architecture ppc
/usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
/usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install
Password:
running install
... ...
Adding MySQL-python 1.2.2 to easy-install.pth file
 
Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg
Processing dependencies for MySQL-python==1.2.2
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path
  import sys, pkg_resources, imp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MySQLdb/__init__.py", line 19, in <module>
    import _mysql
  File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in <module>
  File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
  Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
  Reason: image not found


From biopython at maubp.freeserve.co.uk  Thu Oct 18 10:22:22 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Oct 2007 11:22:22 +0100
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>	<471686C7.6050305@maubp.freeserve.co.uk>	<86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
	<86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
Message-ID: <471733DE.6050803@maubp.freeserve.co.uk>

Jimmy Musselwhite wrote:
> Just kidding, it didn't work great. It only "fixed" it because I was
> printing out the output of count() and so it was just executing 100 times
> slower and thus eating RAM 100 times slower :(
> 
> It doesn't seem like there is a good way for me to fix this.

Both of these are using the python string method to count "GG", the only 
difference is the tostring() method has the additional small overhead of 
an extra function call:

my_seq.data.count("GG")
my_seq.tostring().count("GG")

However, comparing these:

my_seq.data.count("G")         # using python's string count method
my_seq.tostring().count("G")   # using python's string count method
my_seq.count("G")              # using an iterator internally

It could be that the Seq object's current single letter search is simply 
much more memory efficient than the python string's more flexible 
multi-letter search.

How are you measuring the RAM?  I'd like to see memory usage figures for 
the five simple examples above on a large sequence - plus doing this 
directly on the equivalent string.

Are you using Linux or Windows or Mac OS, and what version of python?  I 
know there have been some string optimisations in Python 2.5 (although I 
don't know if any are relevant to the count method).

Peter


From dalloliogm at gmail.com  Fri Oct 19 13:38:50 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 19 Oct 2007 15:38:50 +0200
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
Message-ID: <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>

2007/10/18, Peter <biopython at maubp.freeserve.co.uk>:
>  >>> from Bio.Seq import Seq
>  >>> my_seq = Seq("AAACACACGGTTTT")
>  >>> my_seq.count("G")
> 2
>  >>> my_seq.count("GG")
> 0


I've found the bug!

The code for Bio.Seq.count is:


def count(self, item):
    return len([x for x in self.data if x == item])

It does not work for patterns of two nucleotides, because '[x for x in
self.data]' iterates over the sequence one letter at a time:

>>> s = Seq('ACTTgGCATYCGgtGACGACTGGGcATCGGTCAGTCGGTTT')
>>> [x for x in s.data]
['A', 'C', 'T', 'T', 'g', 'G', 'C', 'A', 'T', 'Y', 'C', 'G', 'g', 't',
'G', 'A', 'C', 'G', 'A', 'C', 'T', 'G', 'G', 'G', 'c', 'A', 'T', 'C',
'G', 'G', 'T', 'C', 'A', 'G', 'T', 'C', 'G', 'G', 'T', 'T', 'T']
>>> for x in s.data:
...     print x, 'GG', x == 'GG'
...
(always False)


Something like [len('GG' in s.data)] also won't work, because "'GG' in
s.data" returns a Boolean value:
>>> 'GG' in s.data
True

What about using regular expressions instead?

>>> import re
>>> r = re.compile('GG')
>>> count = len(r.findall(my_seq.data))

The execution times don't seem to be very different:

# for i in $( seq 10); do time python -m re -c '"cdasd".count("cc")';
done 2>&1| grep real
real    0m0.091s
real    0m0.106s
real    0m0.081s
real    0m0.110s
real    0m0.076s
real    0m0.109s
real    0m0.109s
real    0m0.062s
real    0m0.110s
real    0m0.062s


# for i in $(seq 10); do time python -m re -c 'len(re.findall("cc",
"cdasd"))'; done 2>&1|grep real
real    0m0.065s
real    0m0.108s
real    0m0.079s
real    0m0.082s
real    0m0.111s
real    0m0.113s
real    0m0.110s
real    0m0.112s
real    0m0.112s
real    0m0.111s
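
As far as I can tell, timings taken with "time python ..." on such a short string are dominated by interpreter start-up, so they say little about count() versus findall(). A rough sketch of a fairer comparison with the standard timeit module follows. The test sequence and the repeat count are arbitrary choices made for illustration, and the timings themselves will vary from machine to machine.

```python
import re
import timeit

# Arbitrary test data: 10,000 copies of a 10-letter unit with two "GG" each.
sequence = "ACGGTTGGAC" * 10000
pattern = "GG"
compiled = re.compile(re.escape(pattern))


def count_with_str():
    # Plain string method, non-overlapping matches.
    return sequence.count(pattern)


def count_with_re():
    # Regular expression, also non-overlapping matches.
    return len(compiled.findall(sequence))


# Total seconds for 20 calls of each function.
str_seconds = timeit.timeit(count_with_str, number=20)
re_seconds = timeit.timeit(count_with_re, number=20)
```

Both functions return the same count here, so only the two timing values differ.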


Compiling a short pattern with the re module shouldn't take much
time, and in future implementations it would allow us to do more
interesting things. For example, we would be able to add an
'ignorecase' parameter to Seq.count:

>>> Bio.Seq('ACAGtcAGgCATGCGG').count('GG', 'ignorecase')
2
>>> Bio.Seq('ACAGtcAGgCATGCGG').count('GG')
1

What do you think?

Cheers,
Giovanni


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com


From biopython at maubp.freeserve.co.uk  Fri Oct 19 14:50:56 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 19 Oct 2007 15:50:56 +0100
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
	<471686C7.6050305@maubp.freeserve.co.uk>
	<5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>
Message-ID: <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>

> I've found the bug!
>
> The code for Bio.Seq.count is:
>
> def count(self, item):
>         return len([x for x in self.data if x == item])

Yeah - by design this (and the functionally similar version for the
MutableSeq) both expect the count argument to be a single letter.  The
simple fix for the Seq object is to use the string method internally:

def count(self, item):
        return self.data.count(item)

For the MutableSeq things are not so straightforward, but supporting
multiple character arguments can be done.
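
To illustrate what a multi-character count over a mutable container of single letters might look like, here is a minimal sketch. It is not the Biopython implementation. count_non_overlapping is a made-up name, and a plain list of letters stands in for the MutableSeq's internal array.

```python
def count_non_overlapping(letters, sub):
    """Count non-overlapping occurrences of sub in a sequence of single letters."""
    if not sub:
        raise ValueError("empty search string")
    target = list(sub)
    width = len(target)
    count = 0
    i = 0
    while i <= len(letters) - width:
        if list(letters[i:i + width]) == target:
            count += 1
            i += width  # skip past the match, as str.count does
        else:
            i += 1
    return count


letters = list("AAACACACGGTTTT")
single = count_non_overlapping(letters, "G")
double = count_non_overlapping(letters, "GG")
```

Skipping past each match keeps the result in line with the string method, so four A's contain "AA" twice rather than three times.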

> What about using regular expressions instead?
> ...
> What do you think?

I think the Seq object's count method should act just like a normal
python string's count method.  If anyone wants to get fancy with
regular expressions, they can do so.

Peter


From anaryin at gmail.com  Mon Oct 22 12:21:49 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Mon, 22 Oct 2007 13:21:49 +0100
Subject: [BioPython] Scripts cannot connect
Message-ID: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>

Hello all! I solved my problem a few weeks ago on Windows but now that I've
changed to Linux, it is back again.

I have this script:

#!/usr/bin/env python

from SOAPpy import WSDL

wsdl = 'http://soap.genome.jp/KEGG.wsdl'

serv = WSDL.Proxy(wsdl)

genes = ["eco:b1002", "eco:b2388"]

results = serv.mark_pathway_by_objects("path:eco00010", genes)

print results

Every time I try to run it, I get a timeout. I solved the problem on
Windows by setting environment variables. Here, bash can access the web (it
has the http_proxy environment variable set) but my scripts can't. Any help?

Thanks in advance!

João Rodrigues


From biopython at maubp.freeserve.co.uk  Mon Oct 22 12:48:52 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Oct 2007 13:48:52 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
Message-ID: <471C9C34.7000006@maubp.freeserve.co.uk>

João Rodrigues wrote:
> Everytime I try to run it, it gets me a timeout. I solved the problem in
> Windows by setting up env_variables. Here, the bash can access the web (it
> has its env_var http_proxy set) but my scripts can't.. any help?

What does this do if you add it to your script?

import os
print os.environ.keys()
try :
     print os.environ["http_proxy"]
except KeyError :
     print "http_proxy environment variable not setup"

How have you set up the environment variables in Linux? Via your .bashrc 
file?
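
A slightly more reusable form of the same check, as a sketch only: proxy_from_environment is a made-up helper name and the proxy URL in the demo is a placeholder. It looks at both the lower-case and upper-case spellings of the variable, and it takes a dictionary so that it can be tried without touching the real environment.

```python
import os


def proxy_from_environment(environ=None):
    """Return the HTTP proxy URL visible to this process, or None."""
    if environ is None:
        environ = os.environ
    # Check the lowercase name first, then the uppercase variant.
    for name in ("http_proxy", "HTTP_PROXY"):
        value = environ.get(name)
        if value:
            return value
    return None


# Demo with explicit dictionaries instead of the real environment.
with_proxy = proxy_from_environment({"http_proxy": "http://proxy.example:3128"})
without_proxy = proxy_from_environment({})
```

Note that a variable assigned in .bashrc without "export" stays a shell variable and is not passed on to the python process, in which case this returns None even though the shell itself can see the value.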

Peter


From anaryin at gmail.com  Mon Oct 22 13:11:46 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Mon, 22 Oct 2007 14:11:46 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <471C9C34.7000006@maubp.freeserve.co.uk>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
	<471C9C34.7000006@maubp.freeserve.co.uk>
Message-ID: <b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>

Hello again!

It says that the proxy isn't set.. I've added the line to my .bashrc ( I had
to create it). Yet, it doesn't work.

What am I doing wrong? (or not doing)


From tiagoantao at gmail.com  Mon Oct 22 14:01:53 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Mon, 22 Oct 2007 15:01:53 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>	<471C9C34.7000006@maubp.freeserve.co.uk>
	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
Message-ID: <471CAD51.101@gmail.com>

João Rodrigues wrote:
> It says that the proxy isn't set.. I've added the line to my .bashrc ( I had
> to create it). Yet, it doesn't work.
> 
> What am I doing wrong? (or not doing)


Are you doing an export of the variable?
Try doing env at the prompt and check if http_proxy is defined (you will 
get a big list of environment variables, just search or grep for the 
proxy one). Like:
$ env | grep http_proxy


On another front, your .bash_profile should exist and be sourcing 
.bashrc (either that, or you put http_proxy on .bash_profile)

Regards,
Tiago


-- 
tiagoantao at gmail.com
http://tiago.org/ps


From anaryin at gmail.com  Mon Oct 22 15:38:19 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Mon, 22 Oct 2007 16:38:19 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
	<471C9C34.7000006@maubp.freeserve.co.uk>
	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
	<320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
Message-ID: <b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>

Well, the problem is something else then. I've set the environment variables by
hand and it worked. It detects the proxy and works through it.

However, it still doesn't connect to the web. I'm using the example they
gave on the KEGG API reference manual so it *should* work..

I've used a test script to check if other scripts could connect and they do.
I've tried with the urllib to retrieve the kegg page and it does. I guess
the problem is with the webservice... I'll try to figure it out.

Thanks for your help! (Again :) )


From bsantos at biocant.pt  Tue Oct 23 15:57:58 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 23 Oct 2007 16:57:58 +0100
Subject: [BioPython] Problems with NCBIXML.py
Message-ID: <001101c8158d$7d146600$2300a8c0@bsantos>

I am trying to build a simple script that, given a multi-FASTA sequence file,
performs a web BLAST and replaces the name of each sequence with the hit with
the lowest E-value.

But now I'm getting an exception and I don't know why it's happening:

 
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 16, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in parse
    expat_parser.Parse(text, False)
ExpatError: mismatched tag: line 2823, column 362

 
And here is my script:

 
from Bio import SeqIO
from Bio.Blast import NCBIWWW
import cStringIO
from Bio.Blast import NCBIXML

#for file in dir
file_handle = open(r'C:/FASTASeq/Results/Well9/assembled_file_well9_Dt_DIST.fna') #Open file to a handle
records = SeqIO.parse(file_handle, format="fasta") #Store the file in a Seq Object
save_file = open(r'C:/FASTASeq/Results/Well9/D1_Blast.xml', "w")

for record in records:
    sequence = record.seq.data #Converts record to Plain Text
    result_handle = NCBIWWW.qblast("blastn", "nr", sequence) #Performs a Blastn against the database nr
    blast_results = result_handle.read() #Catch the results
    save_file.write(blast_results) #Write all the information to an XML file

result_handle = open(r'C:/FASTASeq/Results/Well9/D1_Blast.xml')
blast_records = NCBIXML.parse(result_handle)

for blast_record in blast_records:
    alignment = blast_record.alignments
    nIdent = (alignment[0].hsps[0].positives/float(alignment[0].hsps[0].align_length))*100.0
    if nIdent >= 97:
        record.name = alignment[0].hit_def

for record in records:
    print('>description_%s length_%d\n' % (record.name, len(record.seq)))
    print('%s\n' % record.seq)

save_file.close()
file_handle.close()

 
Thank you,

Bruno Santos


From bsantos at biocant.pt  Tue Oct 23 17:17:24 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 23 Oct 2007 18:17:24 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <471E1CBC.30601@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
	<471E1CBC.30601@maubp.freeserve.co.uk>
Message-ID: <001b01c81598$95f7b3b0$2300a8c0@bsantos>

I have manually checked the file and I didn't find any problem.
Sorry about sending the message three times. That was my mistake: I sent it
before registering and then thought I had to send it again.
This is getting stranger: every time I run the script it gives me a different
error. Now I get this one on the first run:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: unclosed token: line 2826, column 8

Now if I run the script again without first closing it, I get the following error:
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: no element found: line 2823, column 81

Now if I execute the close operation on both files in the interactive window
and run the script again I get:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: no element found: line 2827, column 0

I have uploaded my script, the FASTA file I'm using and the XML file. Can anyone
take a look?

XML File: http://www.drivehq.com/folder/p2731454.aspx
Script: http://www.drivehq.com/folder/p2731447.aspx
FASTA File: http://www.drivehq.com/folder/p2731426.aspx


Unidade de Bioinformática

3060-197 Cantanhede  
Tel: 231 410 892
http://bioinformatics.biocant.pt



From biopython at maubp.freeserve.co.uk  Tue Oct 23 18:14:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 19:14:43 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001b01c81598$95f7b3b0$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>
	<001b01c81598$95f7b3b0$2300a8c0@bsantos>
Message-ID: <471E3A13.5080505@maubp.freeserve.co.uk>

Bruno Santos wrote:
> I have manually checked the file and I didn't find any problem.
> Sorry about sending it three times; that was my mistake, because I sent the
> message before registering and then thought I had to send it again.
> This is getting stranger: every time I run the script it gives me a different
> error. Now I get this one on the first run:
 >
 > ...
> 
> Now if I run the script without first close it I get the following error:
> Traceback (most recent call last):
> 

Without seeing the XML file I'm having to guess - but this could be 
something to do with trying to read files from disk before the OS has 
finished flushing the data out.  Mismatched tags could certainly be 
explained if the parser was only getting part of the data.

You could try inserting a sleep of a few seconds after writing and 
closing the XML file.  Also try handle.flush() before the handle.close() 
when you save the XML file to disk.
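
A minimal sketch of that write-then-read order, using only the Python
standard library rather than Biopython.  The file names, the sample XML and
both helper functions are made up for illustration; the only point is that
the output handle is closed (which also flushes it) before the file is
opened again for parsing, and that a half-written file fails to parse.

```python
import os
import tempfile
import xml.parsers.expat

# Hypothetical stand-in for the XML text returned by a BLAST search.
SAMPLE_XML = "<BlastOutput><Hit>example</Hit></BlastOutput>"


def save_results(path, text):
    """Write text to path; leaving the with-block flushes and closes the handle."""
    with open(path, "w") as handle:
        handle.write(text)


def is_well_formed(path):
    """Return True if the file at path parses as a complete XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError:
        return False
    return True


workdir = tempfile.mkdtemp()
complete = os.path.join(workdir, "complete.xml")
partial = os.path.join(workdir, "partial.xml")

save_results(complete, SAMPLE_XML)       # fully written, then closed
save_results(partial, SAMPLE_XML[:20])   # simulates data that is only partly on disk

print(is_well_formed(complete))
print(is_well_formed(partial))
```

The truncated file is rejected by expat in the same way as the partial data
in the tracebacks, which is consistent with the guess above but does not
prove it for this particular script.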

> I have uploaded my script, the FASTA file I'm using and the XML file. Can anyone
> take a look?
> 
> XML File: http://www.drivehq.com/folder/p2731454.aspx
> Script: http://www.drivehq.com/folder/p2731447.aspx
> FASTA File: http://www.drivehq.com/folder/p2731426.aspx

That didn't work - the easy solution is to file a bug, and then attach 
the three files:

http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

Peter


From dag23 at duke.edu  Tue Oct 23 21:06:53 2007
From: dag23 at duke.edu (David Garfield)
Date: Tue, 23 Oct 2007 17:06:53 -0400
Subject: [BioPython] Syntax error while parsing Blast output
Message-ID: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>

Hey list,

    I'm having an issue with the BlastParser and Iterator from
NCBIStandalone.  I assume it's because NCBI has gone and changed the
output file (again)...or I'm an idiot....but maybe there's a real
problem here.


I'm trying to parse a blast result using the following code:

def filter_blast_results(blast_results, blast_cut_off):
	b_parser = NCBIStandalone.BlastParser()
	b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
	hit_results = {}
	while 1:
		b_record = b_iterator.next()
		if b_record is None:
			break
		header = b_record.Header.query
		temp = []
		for alignment in b_record.alignments:
			for hsp in alignment.hsps:
				if hsp.expect < blast_cut_off:
					temp.append(alignment.title)
		#we now remove duplicates from the temp list and add that to the hit_results
		hit_results[header] = remove_duplicates(temp)
	return hit_results

And I get the error I've included at the bottom of this message,  
something about "SyntaxError: Line does not start with 'Reference':"
I know that blast is working because I can print out what appears to  
my untrained eye to be a perfectly good XML of the results I see when  
I run blast manually.


Any help would be very much appreciated,

David


Traceback (most recent call last):
   File "test_scripts.py", line 7, in <module>
     single_blast_sequence.run_2way_blast('single_test_in.fasta','/Users/dagarfield/urchins/blastdbs/urchin_2.0','/Users/dagarfield/urchins/blastdbs/urchin_2.0','NA',.001,'/Users/dagarfield/urchins/urchin_bin/blastall')
   File "/private/var/automount/Network/Share2/genomeScans/urchins/alignment_methods/blast/single_blast_sequence.py", line 57, in run_2way_blast
     input_to_other_blast_matches = filter_blast_results(blast_results, blast_cut_off)
   File "/private/var/automount/Network/Share2/genomeScans/urchins/alignment_methods/blast/single_blast_sequence.py", line 39, in filter_blast_results
     b_record = b_iterator.next()
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1403, in next
     return self._parser.parse(File.StringHandle(data))
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 616, in parse
     self._scanner.feed(handle, self._consumer)
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 96, in feed
     self._scan_header(uhandle, consumer)
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 125, in _scan_header
     read_and_call(uhandle, consumer.reference, start='Reference')
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/ParserSupport.py", line 300, in read_and_call
     raise SyntaxError, errmsg
SyntaxError: Line does not start with 'Reference':
   <BlastOutput_db>/Users/dagarfield/urchins/blastdbs/urchin_2.0</BlastOutput_db>


From biopython at maubp.freeserve.co.uk  Tue Oct 23 21:45:38 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 22:45:38 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
Message-ID: <471E6B82.5010700@maubp.freeserve.co.uk>

David Garfield wrote:
> Hey list,
> 
>     I'm having an issue with the BlastParser and Iterator from  
> NCBIStandalone.  I assume its because NCBI has gone and changed the  
> output file (again)...or I'm an idiot....but maybe there's a real  
> problem here.

The code you gave uses the NCBIStandalone parser/iterator, which expects 
plain text output - yet you say later the raw file looks like a 
perfectly good XML file.  If you have an XML file (which we recommend 
over the plain text) then you should use the NCBIXML module instead.
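
As a rough sketch of what the filtering function could look like once it
works on parsed XML records: the version below is written against plain
stand-in objects so that it runs without Biopython installed.  The attribute
names (query, alignments, title, hsps, expect) are assumed to mirror those
of the records that NCBIXML.parse() yields, and make_record is a
hypothetical helper used only to build test data.

```python
from types import SimpleNamespace


def filter_blast_hits(blast_records, expect_cutoff):
    """Map each query name to the titles of hits with an HSP below the cutoff."""
    hits = {}
    for record in blast_records:
        titles = []
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                # Keep each hit title once, even if several HSPs pass.
                if hsp.expect < expect_cutoff and alignment.title not in titles:
                    titles.append(alignment.title)
        hits[record.query] = titles
    return hits


def make_record(query, hits):
    """Hypothetical helper: build a stand-in record from (title, [e-values]) pairs."""
    alignments = [
        SimpleNamespace(title=title, hsps=[SimpleNamespace(expect=e) for e in expects])
        for title, expects in hits
    ]
    return SimpleNamespace(query=query, alignments=alignments)


# Stand-in data; with Biopython the records would instead come from
# something like NCBIXML.parse(open("results.xml")).
fake_records = [
    make_record("query1", [("hitA", [1e-30, 1e-5]), ("hitB", [0.5])]),
    make_record("query2", []),
]

print(filter_blast_hits(fake_records, 0.001))
```

Because the function only iterates over its argument, it should accept the
iterator returned by the XML parser directly, with no explicit while loop.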

Also, a style point - I personally much prefer this:

b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
for b_record in b_iterator :
     #etc

over this:

b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
while 1:
     b_record = b_iterator.next()
     if b_record is None: break
     #etc

Peter


From dag23 at duke.edu  Tue Oct 23 21:59:33 2007
From: dag23 at duke.edu (David Garfield)
Date: Tue, 23 Oct 2007 17:59:33 -0400
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <471E6B82.5010700@maubp.freeserve.co.uk>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
Message-ID: <B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>

Thanks, Peter.  You've found the problem exactly.

Interestingly, the code I presented was taken directly from the  
BioPython cookbook (including the "while 1" bit).

Somewhere in the subsequent versions since that document was  
released, the output of NCBIStandalone has changed from text to XML  
and the NCBIStandalone Iterators and Parser either no longer seem to  
work with the output of NCBIStandalone.blastall or there is an option  
not mentioned in the Cookbook to ensure that the output is in text  
rather than XML.

In any event, the problem is now fixed.  Thanks!

--DG




From biopython at maubp.freeserve.co.uk  Tue Oct 23 22:48:28 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 23:48:28 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
Message-ID: <471E7A3C.5010301@maubp.freeserve.co.uk>

David Garfield wrote:
> Thanks, Peter.  You've found the problem exactly.
> 
> Interestingly, the code I presented was taken directly from the 
> BioPython cookbook (including the "while 1" bit).

So it is.  Michiel - do you fancy tweaking that section of the tutorial?

> Somewhere in the subsequent versions since that document was released, 
> the output of NCBIStandalone has changed from text to XML and the 
> NCBIStandalone Iterators and Parser either no longer seem to work with 
> the output of NCBIStandalone.blastall or there is an option not 
> mentioned in the Cookbook to ensure that the output is in text rather 
> than XML.

Biopython 1.43 switched the default from text to XML, because we really 
wanted to encourage people to use the XML output by default, as 
keeping the text format parser working is such an ongoing maintenance 
effort.  The release notes did mention this, but it was bound to catch 
someone out.

There is an option to override this...

from Bio.Blast import NCBIStandalone
help(NCBIStandalone.blastall)

You need the align_view option (what the NCBI refers to as the alignment 
view), corresponding to the -m command line option of the NCBI blastall 
tool.  Biopython currently defaults to seven to get XML output.

alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = query-anchored no identities and blunt ends,
6 = flat query-anchored, no identities and blunt ends,
7 = XML Blast output,
8 = tabular,
9 = tabular with comment lines,
10 = ASN, text,
11 = ASN, binary [Integer]
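
To make the link between the alignment view number and the command line
concrete, here is a small sketch that only builds the argument list for the
legacy blastall tool (-p, -d, -i, -m and -e are its program, database,
input, alignment view and expectation options).  The helper name and the
paths are hypothetical, and nothing is executed.

```python
def build_blastall_command(blastall_path, program, database, query_file,
                           align_view=7, expect=None):
    """Return the blastall argument list; align_view is passed as -m."""
    command = [
        blastall_path,
        "-p", program,
        "-d", database,
        "-i", query_file,
        "-m", str(align_view),
    ]
    if expect is not None:
        command += ["-e", str(expect)]
    return command


# Default: XML output (-m 7), suitable for the XML parser.
xml_command = build_blastall_command(
    "/path/to/blastall", "blastn", "/path/to/db", "query.fasta")

# Override: pairwise plain text (-m 0), for the plain text parser.
text_command = build_blastall_command(
    "/path/to/blastall", "blastn", "/path/to/db", "query.fasta",
    align_view=0, expect=0.001)

print(" ".join(xml_command))
print(" ".join(text_command))
```

A list like this could be handed to subprocess, but whichever view is
chosen has to match the parser that reads the output afterwards.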

Peter


From biopython at maubp.freeserve.co.uk  Tue Oct 23 16:09:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 17:09:32 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001101c8158d$7d146600$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
Message-ID: <471E1CBC.30601@maubp.freeserve.co.uk>

Bruno Santos wrote:
> I am trying to build a simple script that given a multi FASTA sequence file
> perform a web BLAST and replace the name of the sequence by the hit with the
> lowest E-Value.
> 
> But now I'm getting an exception and I don't know why it's happening:
> 
> Traceback (most recent call last):
> ...
> 
>     for blast_record in blast_records:
> 
>   File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in
> parse
> 
>     expat_parser.Parse(text, False)
> 
> ExpatError: mismatched tag: line 2823, column 362

That sounds like an error in the XML file - have a look at this 
particular XML file by hand in a text editor; maybe it's only a partial 
download, or an HTML error page or something.

Peter


From mdehoon at c2b2.columbia.edu  Wed Oct 24 00:19:47 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 23 Oct 2007 20:19:47 -0400
Subject: [BioPython] Syntax error while parsing Blast output
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk><B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
	<471E7A3C.5010301@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>

> > Interestingly, the code I presented was taken directly from the 
> > BioPython cookbook (including the "while 1" bit).
> 
> So it is.  Michiel - do you fancy tweaking that section of the tutorial?

That part of the tutorial is in the section "Deprecated BLAST parsers", which
will be removed once the plain-text Blast parser is removed from Biopython.
The description of NCBIStandalone.blastall says

"This command will generate BLAST output in XML format, ..."

So this is being described correctly in the documentation.

Nevertheless, it may be a good idea to remove the plain text Blast parser
completely from Biopython in the upcoming release (which will probably be
done this week), to avoid further confusion.

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032




From biopython at maubp.freeserve.co.uk  Wed Oct 24 08:22:45 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 09:22:45 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>

[Sorry you got this twice Michiel, I forgot to set the from/to fields]

> That part of the tutorial is in the section "Deprecated BLAST parsers", which
> will be removed once the plain-text Blast parser is removed from Biopython.
> ...
> Nevertheless, it may be a good idea to remove the plain text Blast parser
> completely from Biopython in the upcoming release (which will probably be
> done this week), to avoid further confusion.

Removing it sounds too drastic - especially as we have had people on
the mailing list using it  deliberately fairly recently.  If you really do want
to remove this code, then adding a deprecation warning to the plain text
parser for the next release would be a more gentle route.

I think there is still some benefit in having the plain text parser, and that
it could be fixed to cope with current multi-query files without too much
pain.  Maybe I should try this weekend...

Anyone want to voice their opinion?

Peter


From mmokrejs at ribosome.natur.cuni.cz  Wed Oct 24 11:01:26 2007
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Wed, 24 Oct 2007 13:01:26 +0200
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk><B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
Message-ID: <471F2606.8080500@ribosome.natur.cuni.cz>

Hi,

Michiel De Hoon wrote:
>>> Interestingly, the code I presented was taken directly from the 
>>> BioPython cookbook (including the "while 1" bit).
>> So it is.  Michiel - do you fancy tweaking that section of the tutorial?
> 
> That part of the tutorial is in the section "Deprecated BLAST parsers", which
> will be removed once the plain-text Blast parser is removed from Biopython.
> The description of NCBIStandalone.blastall says
> 
> "This command will generate BLAST output in XML format, ..."
> 
> So this is being described correctly in the documentation.
> 
> Nevertheless, it may be a good idea to remove the plain text Blast parser
> completely from Biopython in the upcoming release (which will probably be
> done this week), to avoid further confusion.

Although I understand your points, are you sure you want to REMOVE it? What if
people need to parse BLAST text output generated elsewhere, or generated in the
past? If you meant that you will REMOVE the text-based parser because it won't
be maintained anymore and would probably be usable with only one or two NCBI
BLAST versions, then it is more understandable. Otherwise I guess more people
will move to bioperl. ;) BTW, what if some people have older BLAST versions
that generate broken XML files? Or have to parse such old files again?

Martin


From winter at biotec.tu-dresden.de  Wed Oct 24 12:22:09 2007
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 24 Oct 2007 14:22:09 +0200
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk><B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
Message-ID: <471F38F1.1030600@biotec.tu-dresden.de>

Michiel De Hoon wrote:
> Nevertheless, it may be a good idea to remove the plain text Blast parser
> completely from Biopython in the upcoming release (which will probably be
> done this week), to avoid further confusion.

I agree with Peter and Martin that removing the plain text parser is maybe too 
much. Although I further agree that there is benefit in having the plain text 
parser, I am not sure if Biopython should ensure supporting every small format 
change that NCBI might come up with in the future.

I use XML and tabular output only, BTW.

Cheers,
Christof


From cjfields at uiuc.edu  Wed Oct 24 13:49:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 24 Oct 2007 08:49:09 -0500
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
	<320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>
Message-ID: <3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu>


On Oct 24, 2007, at 3:22 AM, Peter wrote:

> [Sorry you got this twice Michiel, I forgot to set the from/to fields]
>
>> That part of the tutorial is in the section "Deprecated BLAST parsers", which
>> will be removed once the plain-text Blast parser is removed from Biopython.
>> ...
>> Nevertheless, it may be a good idea to remove the plain text Blast parser
>> completely from Biopython in the upcoming release (which will probably be
>> done this week), to avoid further confusion.
>
> Removing it sounds too drastic - especially as we have had people on
> the mailing list using it deliberately fairly recently.  If you really do want
> to remove this code, then adding a deprecation warning to the plain text
> parser for the next release would be a more gentle route.
>
> I think there is still some benefit in having the plain text parser, and that
> it could be fixed to cope with current multi-query files without too much
> pain.  Maybe I should try this weekend...
>
> Anyone want to voice their opinion?
>
> Peter

We have a similar issue with the bioperl parsers.  We basically  
promote the BLAST XML parser over the text parser, but we have  
retained both due to demand.  In fact, we have two text parsers, a  
pull and a push parser (we're gluttons for punishment).  As for  
maintenance, we never guarantee how long it will take to fix text  
parsing if it breaks as the text format is fairly unstable by NCBI's  
own admission.

Our deprecation cycle is usually: (1) announce it on list to get  
feedback, (2) if deprecation is planned, add warnings to the module  
in the next release, (3) remove completely in a later release.  It  
gives everyone time to change over.
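
Step (2) of that cycle might look roughly like this in Python, using the
standard warnings module.  The function parse_plain_text is a hypothetical
stand-in, not an actual Biopython or bioperl function, and the message
wording is only an example.

```python
import io
import warnings


def parse_plain_text(handle):
    """Hypothetical stand-in for a parser that is scheduled for removal."""
    warnings.warn(
        "The plain text parser is deprecated; please use the XML parser instead.",
        DeprecationWarning,
        stacklevel=2,  # report the caller's line, not this one
    )
    return handle.read().splitlines()


# Capture the warning so the example shows what a caller would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lines = parse_plain_text(io.StringIO("BLASTN 2.2.17\nQuery= example\n"))

print(lines)
print(caught[0].category.__name__)
print(caught[0].message)
```

The old code keeps working during the warning phase, so users get at least
one release of notice before step (3).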

chris


From bsantos at biocant.pt  Wed Oct 24 16:23:56 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Wed, 24 Oct 2007 17:23:56 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <471E3A13.5080505@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>
	<001b01c81598$95f7b3b0$2300a8c0@bsantos>
	<471E3A13.5080505@maubp.freeserve.co.uk>
Message-ID: <001601c8165a$48248600$2300a8c0@bsantos>

Peter Wrote:
>Without seeing the XML file I'm having to guess - but this could be 
>something to do with trying to read files from disk before the OS has 
>finished flushing the data out.  Mismatched tags could certainly be 
>explained if the parser was only getting part of the data.
>
>You could try inserting a sleep of a few seconds after writing and 
>closing the XML file.  Also try handle.flush() before the handle.close() 
>when you save the XML file to disk.

You were right: I was reading the data before it had been written to the
file. Now it's working perfectly.

But now I have another question: is it possible, instead of making a single
request to NCBI BLAST per sequence, to make one request for all the
sequences in a multi-FASTA file?

I'm trying to use threads to do this, but so far without luck.

Thanks in advance,
Bruno Santos


From biopython at maubp.freeserve.co.uk  Wed Oct 24 17:32:52 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 18:32:52 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001601c8165a$48248600$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
	<471E1CBC.30601@maubp.freeserve.co.uk>
	<001b01c81598$95f7b3b0$2300a8c0@bsantos>
	<471E3A13.5080505@maubp.freeserve.co.uk>
	<001601c8165a$48248600$2300a8c0@bsantos>
Message-ID: <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>

On 10/24/07, Bruno Santos <bsantos at biocant.pt> wrote:
> You were right I was getting the data before it has been written to the
> file. Now it's working perfect.

Great.

> But now I have another question: is it possible, instead of making a single
> request to NCBI BLAST per sequence, to make one request for all the
> sequences in a multi-FASTA file?
>
> I'm trying to use threads to do this, but so far without luck.

I would suggest you install standalone blast, then give it the
multi-record FASTA file as input.  You should then get multiple blast
records back (in the same order).  This works fine with the XML output
(but currently does not work for plain text output on recent versions
of NCBI Blast).
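
Since the records come back in the same order as the input sequences, the
two can be paired up with zip().  Below is a minimal sketch of the renaming
step from the original script, written against stand-in objects so it runs
without Biopython; the attribute names alignments and hit_def are assumed to
match the parsed XML records, and all the data is invented.

```python
from types import SimpleNamespace


def rename_by_top_hit(sequences, blast_records):
    """Pair (name, sequence) tuples with BLAST records by position.

    Sequences with at least one hit take the description of the first
    (top) hit as their new name; the others keep their original name.
    """
    renamed = []
    for (name, seq), record in zip(sequences, blast_records):
        if record.alignments:
            name = record.alignments[0].hit_def
        renamed.append((name, seq))
    return renamed


# Invented stand-ins for a multi-record FASTA file and its BLAST results.
sequences = [("well9_a", "ACGTACGT"), ("well9_b", "GGGTTTAA")]
blast_records = [
    SimpleNamespace(alignments=[SimpleNamespace(hit_def="Example species 16S rRNA")]),
    SimpleNamespace(alignments=[]),  # no hits for the second query
]

for name, seq in rename_by_top_hit(sequences, blast_records):
    print(">%s length_%d" % (name, len(seq)))
    print(seq)
```

This relies entirely on the ordering guarantee mentioned above; if a query
could be dropped from the output, matching on the query name would be safer.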

If you really want to make multiple blast submissions in parallel
online, first check the NCBI's website for any usage restrictions -
they don't want their servers to be abused.

Peter


From biosql at hotmail.com  Wed Oct 24 20:53:19 2007
From: biosql at hotmail.com (Jonathan Boulais)
Date: Wed, 24 Oct 2007 16:53:19 -0400
Subject: [BioPython] Loading SwissProt to BioSQL
Message-ID: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>


Hello, 

I'm a biologist and quite new to Biopython. I'm trying to build the Swissprot database locally with BioSQL and I'm having some problems. 
I have installed the latest version from CVS and I'm using Python 2.5 on Mac OS 10.4. 

First, I get this weird problem. Since I need to connect to MySQL, I started by writing a simple script (Biosql.py) containing only this line: from BioSQL import BioSeqDatabase. When I run this script in the terminal with python Biosql.py, I get this message: **ImportError: cannot import name BioSeqDatabase**. But the weird thing is that if I start a Python session in the terminal by simply invoking python and then manually import BioSeqDatabase, it works! 
Is there any reason for that?

Second, I then decided to continue with the Python session, since I'm able to import BioSeqDatabase there. The connection to MySQL works fine, but when I try to load the flat file I get this: 


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "/sw/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting


Here are the lines I'm using:

from BioSQL import BioSeqDatabase
from Bio.SwissProt import SProt

server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "", passwd = "", host = "localhost", db = "bioseqdb")
s_parser = SProt.SequenceParser()
s_iterator = SProt.Iterator(open("path to/uniprot_sprot.dat", "r"), s_parser)
db = server.new_database("Swiss")
db.load(s_iterator)


Does anybody understand this ?

Many thanks if someone can help !

Jonathan




From biopython at maubp.freeserve.co.uk  Wed Oct 24 21:15:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 22:15:10 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
Message-ID: <471FB5DE.6080506@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
> Hello,
> 
> I'm a biologist and quite new to Biopython. I'm trying to build the
> Swissprot database locally with BioSQL and I'm having some
> problems. I have installed the latest version from CVS and I'm
> using Python 2.5 on Mac OS 10.4.
> 
> First, I get this weird problem. Since I need to connect to MySQL, I
> started by writing a simple script (Biosql.py) containing only this line: from
> BioSQL import BioSeqDatabase. When I run this script in the
> terminal with python Biosql.py, I get this message: **ImportError: cannot
> import name BioSeqDatabase**. But the weird thing is that if I start a
> Python session in the terminal by simply invoking python and then
> manually import BioSeqDatabase, it works! Is there any reason
> for that?

In both cases are you running python from the command prompt?  If so 
then the same environment variables (e.g. paths) should apply.  Odd.

My guess is you shouldn't call your script "Biosql.py", call it 
"Biosql_test.py" or something.  Python thinks the line "from BioSQL 
import BioSeqDatabase" means importing from the script itself because 
that is also called BioSQL.
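
One way to test this guess is to ask Python which file it would load for a
given module name.  The sketch below uses importlib from current Python 3
(it is not available on Python 2.5, where one would instead import the
module and print its __file__ attribute); module_origin is a hypothetical
helper name, and json is used only because it is always installed.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# A standard library package resolves to its __init__.py file.
print(module_origin("json"))

# A name that cannot be found gives None.
print(module_origin("no_such_module_for_this_demo"))
```

If the reported path for BioSQL turned out to be the test script rather
than the installed package, that would confirm the name clash.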

Peter


From biopython at maubp.freeserve.co.uk  Wed Oct 24 21:22:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 22:22:05 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
Message-ID: <471FB77D.5060103@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
> from Bio.SwissProt import SProt
> s_parser = SProt.SequenceParser()
> s_iterator = SProt.Iterator(open("path to/uniprot_sprot.dat", "r"), s_parser)

This won't help with the database issue, but you should also be able to 
load the SwissProt text file with Bio.SeqIO:

from Bio import SeqIO
s_iterator = SeqIO.parse(open("path/to/uniprot_sprot.dat"), "swiss")

This in fact will call the Bio.SwissProt.SProt module internally, and 
get it to return SeqRecord objects.

The Bio.SeqIO interface is meant to make it easy to switch the input 
file format (e.g. GenBank or EMBL).

Peter


From mdehoon at c2b2.columbia.edu  Thu Oct 25 00:40:18 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 24 Oct 2007 20:40:18 -0400
Subject: [BioPython] Syntax error while parsing Blast output
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
	<471E6B82.5010700@maubp.freeserve.co.uk>
	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>
	<471E7A3C.5010301@maubp.freeserve.co.uk>
	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>
	<320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>
	<3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu>

>> Nevertheless, it may be a good idea to remove the plain text Blast  
>> parser
>> completely from Biopython in the upcoming release (which will  
>> probably be
>> done this week), to avoid further confusion.
>
> Removing it sounds too drastic - especially as we have had people on
> the mailing list using it  deliberately fairly recently.  If you  
> really do want
> to remove this code, then adding a deprecation warning to the plain  
> text
> parser for the next release would be a more gentle route.
>

Sorry, I was confused; I was under the impression that the plain text Blast
parser was already deprecated (I was getting confused with the blast and
blasturl functions in Bio.Blast.NCBIWWW, which are already deprecated in
favor of qblast). OK, then let's keep the plain-text Blast parser as is, and
maybe think again about this issue after the upcoming release.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From mmayhew at mcb.mcgill.ca  Thu Oct 25 04:12:06 2007
From: mmayhew at mcb.mcgill.ca (Michael Mayhew)
Date: Thu, 25 Oct 2007 00:12:06 -0400
Subject: [BioPython] Any planned BioPython presence at PyCon 2008?
Message-ID: <47201796.2050902@mcb.mcgill.ca>

Was planning on going to PyCon 2008 anyway, but would have even more 
incentive if there is going to be a big BioPython community turnout.

Would love to pitch in on a development session or something like that.

Michael Mayhew


From biosql at hotmail.com  Thu Oct 25 14:52:02 2007
From: biosql at hotmail.com (Jonathan Boulais)
Date: Thu, 25 Oct 2007 10:52:02 -0400
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <471FB5DE.6080506@maubp.freeserve.co.uk>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
	<471FB5DE.6080506@maubp.freeserve.co.uk>
Message-ID: <BLU109-W2996B11A548AEE3A63AC9C5950@phx.gbl>


> Date: Wed, 24 Oct 2007 22:15:10 +0100
> From: biopython at maubp.freeserve.co.uk
> To: biosql at hotmail.com; biopython at lists.open-bio.org
> Subject: Re: [BioPython] Loading SwissProt to BioSQL
> 
> Jonathan Boulais wrote:
> > Hello,
> > 
> > I'm a biologist and quite new to Biopython. I'm trying to build the 
> > Swissprot database locally with BioSQL and I'm having some problems. 
> > I have installed the latest version from CVS and I'm using Python 2.5 
> > on Mac OS X 10.4.
> > 
> > First, I get this weird problem. Since I need to connect to MySQL, I
> > started to write a simple script (Biosql.py) containing only this line:
> > from BioSQL import BioSeqDatabase. When I run the script in the
> > terminal with "python Biosql.py", I get the message **ImportError:
> > cannot import name BioSeqDatabase**. But the weird thing is that if I
> > start a Python session in the terminal by simply invoking python and
> > then import BioSeqDatabase manually, it works! Is there any reason
> > for that?
> 
> In both cases are you running python from the command prompt?  If so 
> then the same environment variables (e.g. paths) should apply.  Odd.
> 
> My guess is you shouldn't call your script "Biosql.py", call it 
> "Biosql_test.py" or something.  Python thinks the line "from BioSQL 
> import BioSeqDatabase" means importing from the script itself because 
> that is also called BioSQL.
> 
> Peter
> 

Peter you were right about the name of the file. Nice call and thank you !
But I still get the same error as before when I'm running it. 

Traceback (most recent call last):
  File "DB.py", line 14, in <module>
    db.load(s_iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "/sw/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting


Is the problem in the MySQLdb driver, or are bad arguments being passed to MySQLdb?

Again, thank you for your time. 

Jonathan 



From biopython at maubp.freeserve.co.uk  Thu Oct 25 17:22:46 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 18:22:46 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <BLU109-W2996B11A548AEE3A63AC9C5950@phx.gbl>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>	<471FB5DE.6080506@maubp.freeserve.co.uk>
	<BLU109-W2996B11A548AEE3A63AC9C5950@phx.gbl>
Message-ID: <4720D0E6.8000609@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
>> My guess is you shouldn't call your script "Biosql.py", call it 
>> "Biosql_test.py" or something.  Python thinks the line "from BioSQL 
>> import BioSeqDatabase" means importing from the script itself because 
>> that is also called BioSQL.
> 
> Peter you were right about the name of the file. Nice call and thank you !

Great - I wasn't sure if the case would matter or not.

> But I still get the same error as before when I'm running it. 
> ...

I've not used BioSQL myself (yet), but looking at the code you posted 
earlier, you set up the connection like this:

from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="MySQLdb", user="", 
passwd="", host="localhost", db="bioseqdb")

I think the driver="MySQLdb" is fine, but don't you need a database 
username (and perhaps a password)?
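
As for the TypeError itself: the last frame of the traceback shows
MySQLdb building the query with Python's % operator, and that operator
raises exactly this message when it is given more values than the string
has %s placeholders.  A minimal sketch in plain Python, using an
illustrative SQL string and made-up values rather than the real BioSQL
statement:

```python
# Illustrative only: two %s placeholders, but three values supplied.
sql = "INSERT INTO example_table (name, accession) VALUES (%s, %s)"
args = ("EXAMPLE_HUMAN", "P00000", 1)

message = None
try:
    query = sql % args
except TypeError as err:
    # The same text as the last line of the traceback above.
    message = str(err)

print(message)  # not all arguments converted during string formatting
```

So a mismatch between the SQL text and the argument tuple is one
plausible cause, whichever side turns out to be at fault.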

Peter


From biopython at maubp.freeserve.co.uk  Thu Oct 25 09:44:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 10:44:43 +0100
Subject: [BioPython] Any planned BioPython presence at PyCon 2008?
In-Reply-To: <47201796.2050902@mcb.mcgill.ca>
References: <47201796.2050902@mcb.mcgill.ca>
Message-ID: <4720658B.4020103@maubp.freeserve.co.uk>

Michael Mayhew wrote:
> Was planning on going to PyCon 2008 anyway, but would have even more 
> incentive if there is going to be a big BioPython community turnout.
> 
> Would love to pitch in on a development session or something like that.
> 
> Michael Mayhew

http://us.pycon.org/2008/about/
http://pycon.blogspot.com/2007/10/call-for-talk-tutorial-proposals.html
 > Proposals for PyCon 2008 talks & tutorials are now being accepted.
 > The deadline for proposals is November 16.  PyCon 2008 will be held
 > in Chicago, Illinois, USA, from March 13-20.

It is remotely possible that I'll be working in the USA next year, but I 
have to say at this point that it looks unlikely that I'll be able to 
attend.

Peter


From biopython at maubp.freeserve.co.uk  Thu Oct 25 09:57:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 10:57:10 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>	<471E6B82.5010700@maubp.freeserve.co.uk>	<B7CD2416-8E40-40B7-B6F9-4C6A58177B99@duke.edu>	<471E7A3C.5010301@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu>	<320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com>	<3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu>
	<6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu>
Message-ID: <47206876.9040905@maubp.freeserve.co.uk>

Michiel De Hoon wrote:
> 
> Sorry, I was confused; I was under the impression that the plain text Blast
> parser was already deprecated (I was getting confused with the blast and
> blasturl functions in Bio.Blast.NCBIWWW, which are already deprecated in
> favor of qblast). OK, then let's keep the plain-text Blast parser as is, and
> maybe think again about this issue after the upcoming release.
> 

Panic averted - but it was good to hear some passionate defence of the 
plain text BLAST parser; it looks like it still gets quite a bit of use.

Peter


From bsantos at biocant.pt  Fri Oct 26 09:13:58 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Fri, 26 Oct 2007 10:13:58 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
	<471E1CBC.30601@maubp.freeserve.co.uk>
	<001b01c81598$95f7b3b0$2300a8c0@bsantos>
	<471E3A13.5080505@maubp.freeserve.co.uk>
	<001601c8165a$48248600$2300a8c0@bsantos>
	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
Message-ID: <000301c817b0$8c868c10$2300a8c0@bsantos>

Peter Said
>I would suggest you install standalone blast, then give it the
>multi-record FASTA file as input.  You should then get multiple blast
>records back (in the same order).  This works fine with the XML output
>(but currently does not work for plain text output on recent versions
>of NCBI Blast).
>
>If you really want to make multiple blast submissions in parallel
>online, first check the NCBI's website for any usage restrictions -
>they don't want their servers to be abused.
>
>Peter

I have followed your advice and decided to install standalone blast. As I
want to blast against the nt database, I downloaded it pre-compiled from
the NCBI FTP server. I have written a script to do this, but for some
reason I'm not getting any results: the program does not write anything
to the XML file.

Here is my script:
from Bio import SeqIO
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
import time
import math

my_blast_db = (r'e:/nt.00')
my_blast_file = r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna'
my_blast_exe = r'C:/BLAST/bin/'
save_file = open(r'C:/FASTASeq/Results/well9/V6_BLAST.xml', 'w')
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
"blastn",my_blast_db, my_blast_file)
blast_results = result_handle.read() #Catch the results
save_file.write(blast_results) #Write all the information to an XML file
save_file.close()
print time.ctime()

As I downloaded the files from NCBI, I have a lot of files in the database
directory. Is there any way to perform a search against all of them?

Thanks in advance,
Bruno Santos 

Unidade de Bioinformática  

3060-197 Cantanhede  
Tel: 231 410 892
http://bioinformatics.biocant.pt


From biopython at maubp.freeserve.co.uk  Fri Oct 26 09:52:34 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 10:52:34 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000301c817b0$8c868c10$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
	<000301c817b0$8c868c10$2300a8c0@bsantos>
Message-ID: <4721B8E2.2040902@maubp.freeserve.co.uk>

Bruno Santos wrote:
> Peter Said
>> I would suggest you install standalone blast, then give it the
>> multi-record FASTA file as input.  You should then get multiple blast
>> records back (in the same order).  This works fine with the XML output
>> (but currently does not work for plain text output on recent versions
>> of NCBI Blast).
>>
>> If you really want to make multiple blast submissions in parallel
>> online, first check the NCBI's website for any usage restrictions -
>> they don't want their servers to be abused.
>>
>> Peter
> 
> I have followed your advice and I decide to install standalone blast. As I
> want to make blast against the nt databases I have downloaded it pre
> compiled from the ncbi ftp server. And I have created I script to do this but
> for some reason I'm not getting any results, because the programs does not
> write anything to the XML file. 
> 
> Where is my script:
> from Bio import SeqIO
> from Bio.Blast import NCBIStandalone
> from Bio.Blast import NCBIXML
> import time
> import math

You are running on Windows, so the paths should have "\" rather than "/" 
in them.  However, in many cases this isn't essential - and indeed for 
some Unix programs ported to Windows using "/" is sometimes best!

> my_blast_db = (r'e:/nt.00')

I'm not sure if that is correct, but it's difficult to tell without 
seeing your setup.

> my_blast_file =
> r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna'
> my_blast_exe = r'C:/BLAST/bin/'

That is wrong, try something like:
my_blast_exe = r'C:\BLAST\bin\blastall.exe'

I would urge you to try running blastall "by hand" at the command line 
first for a few small examples, to get the hang of it.  Because any 
error messages get printed to the command line, it makes debugging 
simpler.  This will also help you work out how to prepare the arguments 
in Biopython.  Within Python, you would have to check what was written 
to the error_info output handle to see those messages.
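
As a small illustration of the path point, the sketch below builds the
full path to the executable from the directory used in the script.  It
uses the standard library's ntpath module so that it behaves the same on
any platform; the directory and file name are simply the ones from this
thread.

```python
import ntpath

# The directory given in the original script (forward slashes).
blast_dir = "C:/BLAST/bin/"

# Append the executable name, then normalise to Windows-style separators.
blast_exe = ntpath.normpath(ntpath.join(blast_dir, "blastall.exe"))

print(blast_exe)  # C:\BLAST\bin\blastall.exe
```

The point is that the value handed to blastall should name the program
file itself, not just the folder that contains it.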

> As I have download the files from ncbi I have a lot of files in the database
> directory theres is any way of perform a search against all of them?

I'm not sure what exactly you are asking.  BLAST can make databases from 
FASTA files, so you might want to build a database from all your FASTA 
files... check the documentation for the BLAST formatdb program.

Peter


From bsantos at biocant.pt  Fri Oct 26 13:40:40 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Fri, 26 Oct 2007 14:40:40 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <4721B8E2.2040902@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
	<000301c817b0$8c868c10$2300a8c0@bsantos>
	<4721B8E2.2040902@maubp.freeserve.co.uk>
Message-ID: <000701c817d5$d0e8f4e0$2300a8c0@bsantos>

>You are running on Windows, so the paths should have "\" rather than "/" 
>in them.  However, in many cases this isn't essential - and indeed for 
>some Unix programs ported to Windows using "/" is sometimes best!
>
> my_blast_db = (r'e:/nt.00')
>
>I'm not sure if that is correct, but its difficult to tell without 
>seeing your setup.
It's OK to use "/" here: Windows itself generally accepts it as a path
separator, so the paths work unchanged from Python.

> my_blast_file =
> r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna'
> my_blast_exe = r'C:/BLAST/bin/'
>
>That is wrong, try something like:
>my_blast_exe = r'C:\BLAST\bin\blastall.exe'

You were right about that. It's ok now

> As I have download the files from ncbi I have a lot of files in the database
> directory theres is any way of perform a search against all of them?

>I'm not sure what exactly you are asking.  BLAST can make databases from 
>FASTA files, so you might want to build a database from all your FASTA 
>files... check the documentation for the BLAST formatdb program.
I have downloaded the pre-compiled files, which means I have several
volume files (nt.00.nhr, nt.01.nhr, nt.02.nhr...) plus the same files with
all the other extensions. I have found that I can use them all at the same
time by passing them to the command line between "". So now I have
my_blast_db = (r'\"e:/nt.00 e:/nt.01 e:/nt.02 e:/nt.03 e:/nt.04 e:/nt.05 \"').

I now have another question: is it possible to pass the contents of
result_handle to blast_results line by line, or something like that? I'm
getting a memory error in the step shown below:

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
"blastn",my_blast_db, my_blast_file)
blast_results = result_handle.read() #Catch the results 

Maybe if I read one line at a time and write it immediately to the XML file
it will work.

Thanks once more,
Bruno Santos


From biopython at maubp.freeserve.co.uk  Fri Oct 26 14:37:45 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 15:37:45 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000701c817d5$d0e8f4e0$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>	<000301c817b0$8c868c10$2300a8c0@bsantos>	<4721B8E2.2040902@maubp.freeserve.co.uk>
	<000701c817d5$d0e8f4e0$2300a8c0@bsantos>
Message-ID: <4721FBB9.1040408@maubp.freeserve.co.uk>

> But now I'm mailing you with another doubt it is possible to pass the
> result_handle to blast_results line by line or something like that because
> I'm having a memory error in the step described below
> 
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
> "blastn",my_blast_db, my_blast_file)
> blast_results = result_handle.read() #Catch the results 
> 
> Maybe if I pass one line at a time and write it immediately to the xml file
> it will work. 

XML files are big.  Lots of query sequences will also make things 
bigger.  And the default expectation threshold will also give lots of 
results - setting this to something stricter will help by giving fewer 
matches.

Unless you want to keep the XML file for other analysis, it might be 
simpler to parse the output from blast directly with Biopython - 
avoiding having the large XML file on disk.

Keeping the XML intermediate file can be a good idea when working on 
smaller datasets, where you want to tweak your analysis (without 
re-running blast each time).
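
If the XML does need to be kept, one way to avoid holding it all in
memory is to copy the handle to the file in fixed-size chunks instead of
calling read() once.  A minimal sketch, using io.StringIO objects as
stand-ins for the BLAST result handle and the output file;
copy_in_chunks is a hypothetical helper, and the standard library's
shutil.copyfileobj does much the same job:

```python
import io


def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy src to dst one chunk at a time; return the amount copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Stand-ins for result_handle and save_file from the script above.
source = io.StringIO("<Hit>example</Hit>\n" * 10000)
target = io.StringIO()

copied = copy_in_chunks(source, target)
print(copied == len(target.getvalue()))  # True
```

Only one chunk is in memory at any time, so the size of the BLAST output
no longer matters for this step.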

Peter


From bsantos at biocant.pt  Fri Oct 26 15:50:48 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Fri, 26 Oct 2007 16:50:48 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <4721FBB9.1040408@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>	<000301c817b0$8c868c10$2300a8c0@bsantos>	<4721B8E2.2040902@maubp.freeserve.co.uk>
	<000701c817d5$d0e8f4e0$2300a8c0@bsantos>
	<4721FBB9.1040408@maubp.freeserve.co.uk>
Message-ID: <000801c817e7$fd1bc940$2300a8c0@bsantos>

Peter Said:
>XML files are big.  Lots of query sequences will also make things 
>bigger.  And the default expectation threshold will also give lots of 
>results - setting this to something harsher will help by giving less 
>matches.
>
>Unless you want to keep the XML file for other analysis, it might be 
>simpler to parse the output from blast directly with Biopython - 
>avoiding having the large XML file on disk.
>
>Keeping the XML intermediate file can be a good idea when working on 
>smaller datasets, where you want to tweak your analysis (without 
>re-running blast each time).

But even if I don't want to save the results to an XML file, I still have to do
the <blast_results = result_handle.read() #Catch the results> step, right?
My problem is in this step, not in writing to the file.
Or can I use the result_handle directly? I was reading the Biopython
documentation but it's not very clear.


From biopython at maubp.freeserve.co.uk  Fri Oct 26 16:04:40 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 17:04:40 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000801c817e7$fd1bc940$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>	<471E1CBC.30601@maubp.freeserve.co.uk>	<001b01c81598$95f7b3b0$2300a8c0@bsantos>	<471E3A13.5080505@maubp.freeserve.co.uk>	<001601c8165a$48248600$2300a8c0@bsantos>	<320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>	<000301c817b0$8c868c10$2300a8c0@bsantos>	<4721B8E2.2040902@maubp.freeserve.co.uk>	<000701c817d5$d0e8f4e0$2300a8c0@bsantos>	<4721FBB9.1040408@maubp.freeserve.co.uk>
	<000801c817e7$fd1bc940$2300a8c0@bsantos>
Message-ID: <47221018.9090104@maubp.freeserve.co.uk>

Bruno Santos wrote:
> Peter Said:
>> Unless you want to keep the XML file for other analysis, it might be 
>> simpler to parse the output from blast directly with Biopython - 
>> avoiding having the large XML file on disk.
> 
> But if even I don't want to save the results to an XML I still have to do
> the <blast_results = result_handle.read() #Catch the results> step right?
> And my problem is in this step not in writing to the file. 
> Or I can use the result_handle directly, because I was reading the biopython
> documentation but it's not very clear.

The intention is something like this:

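from Bio.Blast import NCBIStandalone, NCBIXML

result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe,
    "blastn", my_blast_db, my_blast_file)

# Parse straight from the handle, without calling result_handle.read()
blast_records = NCBIXML.parse(result_handle)
for record in blast_records:
    pass  # do stuff with each record here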

The bit about saving the results to a file and loading that to give a 
new handle is optional, but very handy if you need to look at the raw 
file by hand.  Perhaps that section of the tutorial could be a little 
clearer ...

Peter


From mdehoon at c2b2.columbia.edu  Sun Oct 28 06:32:40 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 02:32:40 -0400
Subject: [BioPython] Biopython release 1.44 ready
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>

Hi everybody,

Biopython release 1.44 is now available for download from the Biopython
website at http://biopython.org.

This release includes lots of code improvements and fixes in the Blast
interface and parsers, sequence input/output, the SwissProt parser, the
clustering routines, as well as a brand new module for population genetics.
For reasons of compatibility, some radical changes were necessary in some
parts of the code; please let us know if you find some functionality missing.

My thanks to all code contributors who made this new release possible.

--Michiel on behalf of the Biopython developers


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From tiagoantao at gmail.com  Sun Oct 28 21:31:58 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Sun, 28 Oct 2007 21:31:58 +0000
Subject: [BioPython] Biopython citation
Message-ID: <4724FFCE.20103@gmail.com>

Hello,

I am submitting a paper regarding a Jython selection detection program 
that we have done, and I would like to cite biopython. What is really 
the best, most recent, citation?

Tiago
-- 
tiagoantao at gmail.com
http://tiago.org/ps


From biopython at maubp.freeserve.co.uk  Sun Oct 28 20:52:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Oct 2007 20:52:05 +0000
Subject: [BioPython] Biopython citation
In-Reply-To: <4724FFCE.20103@gmail.com>
References: <4724FFCE.20103@gmail.com>
Message-ID: <4724F675.8030902@maubp.freeserve.co.uk>

Tiago Antao wrote:
> I am submitting a paper regarding a Jython selection detection program 
> that we have done, and I would like to cite biopython. What is really 
> the best, most recent, citation?
> 
> Tiago

For a general project reference, I think the most recent is Brad &
Jeff's 2000 newsletter article:

Chapman, B. and Chang, J. (2000) Biopython: python tools for
computational biology. ACM SIG-BIO Newsletter, 20, 15-19.

However, I confess I only cited the www.biopython.org website in my last 
paper.

Peter

P.S. There are specific papers for some modules, e.g. Bio.PDB and 
Bio.Cluster


From skhadar at gmail.com  Mon Oct 29 13:15:30 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Mon, 29 Oct 2007 18:45:30 +0530
Subject: [BioPython] Biopython citation
In-Reply-To: <4724F675.8030902@maubp.freeserve.co.uk>
References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk>
Message-ID: <b6ff81950710290615u4d8fea74le58869d526ddd50c@mail.gmail.com>

Hi Peter,

I would be interested in looking at it, but we don't have access to ACM.
Could you share a copy of the paper, if you have one?

Thanks,
Shameer

On 10/29/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Tiago Antao wrote:
> > I am submitting a paper regarding a Jython selection detection program
> > that we have done, and I would like to cite biopython. What is really
> > the best, most recent, citation?
> >
> > Tiago
>
> For a general project reference, I think the most recent is Brad &
> Jeff's 2000 newsletter article:
>
> Chapman, B. and Chang, J. (2000) Biopython: python tools for
> computational biology. ACM SIG-BIO Newsletter, 20, 15-19.
>
> However, I confess I only cited the www.biopython.org website in my last
> paper.
>
> Peter
>
> P.S. There are specific papers for some modules, e.g. Bio.PDB and
> Bio.Cluster
>


From skhadar at gmail.com  Mon Oct 29 14:11:41 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Mon, 29 Oct 2007 19:41:41 +0530
Subject: [BioPython] Biopython citation
In-Reply-To: <4725E655.8080608@maubp.freeserve.co.uk>
References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk>
	<b6ff81950710290615u4d8fea74le58869d526ddd50c@mail.gmail.com>
	<4725E655.8080608@maubp.freeserve.co.uk>
Message-ID: <b6ff81950710290711w5cedfcc2s85a6a12a05c4034b@mail.gmail.com>

Hi ,

Thanks for that !!!
--
Shameer

On 10/29/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Shameer Khadar wrote:
> > Hi Peter,
> >
> > I am interested to look at it. We dont have access to ACM. If you
> > have a copy of that paper.
> >
> > Thanks, Shameer
>
> It's not actually very informative, especially as the examples are now
> rather dated.  Anyway, I believe the newsletter article was the same as
> the document available on our website:
>
> http://biopython.org/DIST/docs/acm/ACMbiopy.html
> http://biopython.org/DIST/docs/acm/ACMbiopy.pdf
>
> Chapman, B. and Chang, J. (2000) Biopython: python tools for
> computational biology. ACM SIG-BIO Newsletter, 20, 15-19.
>
> Peter
>


From biopython at maubp.freeserve.co.uk  Mon Oct 29 13:55:33 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 13:55:33 +0000
Subject: [BioPython] Biopython citation
In-Reply-To: <b6ff81950710290615u4d8fea74le58869d526ddd50c@mail.gmail.com>
References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk>
	<b6ff81950710290615u4d8fea74le58869d526ddd50c@mail.gmail.com>
Message-ID: <4725E655.8080608@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Hi Peter,
> 
> I am interested to look at it. We dont have access to ACM. If you
> have a copy of that paper.
> 
> Thanks, Shameer

It's not actually very informative, especially as the examples are now
rather dated.  Anyway, I believe the newsletter article was the same as 
the document available on our website:

http://biopython.org/DIST/docs/acm/ACMbiopy.html
http://biopython.org/DIST/docs/acm/ACMbiopy.pdf

Chapman, B. and Chang, J. (2000) Biopython: python tools for
computational biology. ACM SIG-BIO Newsletter, 20, 15-19.

Peter


From biopython at maubp.freeserve.co.uk  Mon Oct 29 19:22:20 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 19:22:20 +0000
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <4720D0E6.8000609@maubp.freeserve.co.uk>
References: <BLU109-W16D416D93604BC3494AECCC5940@phx.gbl>
	<471FB5DE.6080506@maubp.freeserve.co.uk>
	<BLU109-W2996B11A548AEE3A63AC9C5950@phx.gbl>
	<4720D0E6.8000609@maubp.freeserve.co.uk>
Message-ID: <320fb6e00710291222l1a5746e9m3bbc5c4c9fd03921@mail.gmail.com>

Jonathan Boulais wrote:
> But I still get the same error as before when I'm running it.
> ...

For anyone wanting to track this issue, Jonathan has filed
Bug 2390 - Error importing Swiss Prot in BioSQL
http://bugzilla.open-bio.org/show_bug.cgi?id=2390

Peter


From anaryin at gmail.com  Tue Oct 30 01:28:21 2007
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 30 Oct 2007 01:28:21 +0000
Subject: [BioPython] Fwd:  Scripts cannot connect
In-Reply-To: <b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
	<471C9C34.7000006@maubp.freeserve.co.uk>
	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
	<320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
	<b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>
Message-ID: <b537e3710710291828p753e957fy68cba7dc00a07a0b@mail.gmail.com>

I've checked all my connection settings, tested an awful lot of
possibilities and I came to this conclusion. When using a webservice, I
can't connect to the internet. In the same script, I can get for instance,
the google page, but the lines regarding the webservice itself, they won't
connect.

I've tried to set the proxy environment variable (through export
http_proxy='blabla:yyyy') in the script itself, and nothing changed. I've
also set os.environ[blabla] and it doesn't work.

So, does anyone have an idea why this is happening? Shouldn't the
webservice, if it uses the http protocol (as it does), work just like any other
command (let's say, urllib.urlopen)?

I know this falls out of the BioPython theme but I consider it quite
relevant for my BioPython work :)

Thank you all in advance!


From biopython at maubp.freeserve.co.uk  Tue Oct 30 08:53:14 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 08:53:14 +0000
Subject: [BioPython] Fwd:  Scripts cannot connect
In-Reply-To: <b537e3710710291828p753e957fy68cba7dc00a07a0b@mail.gmail.com>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>	<471C9C34.7000006@maubp.freeserve.co.uk>	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>	<320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>	<b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>
	<b537e3710710291828p753e957fy68cba7dc00a07a0b@mail.gmail.com>
Message-ID: <4726F0FA.6000209@maubp.freeserve.co.uk>

João Rodrigues wrote:
> I've checked all my connection settings, tested an awful lot of
> possibilities and I came to this conclusion. When using a webservice, I
> can't connect to the internet. In the same script, I can get for instance,
> the google page, but the lines regarding the webservice itself, they won't
> connect.

Are you still finding things work on Windows, but fail on Linux?
If so, are you running the same version of python (and Biopython) on both?

> I've tried to set environment proxy (through export
> http_proxy='blabla:yyyy') in the script itself and nothing. I've set
> os.environ[blabla] and it's doesn't work.

When you say "it doesn't work", do you mean (a) the environment variable 
isn't set, or (b) the environment variable is set but has no effect?
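
One way to tell (a) from (b) is to set the variable from within Python
and then ask urllib which proxies it sees.  A minimal sketch, assuming
Python 3 naming (urllib.request; under Python 2 the function is
urllib.getproxies) and a placeholder proxy address:

```python
import os
import urllib.request

# Placeholder address: substitute the real proxy host and port.
os.environ["http_proxy"] = "http://proxy.example.org:8080"

# getproxies() reports the proxy settings urllib would use by default.
proxies = urllib.request.getproxies()
print(proxies.get("http"))
```

If the address is reported here but the web service still cannot
connect, the variable is set and the problem is more likely in how that
particular library makes its connections.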

> So, does anyone has an idea of why this is happening? Shouldn't the
> webservice, if using http protocol (as it does), work just like any other
> command (let's say, urllib.urlopen)?

Are you saying there is a difference depending on the URL type (plain 
page versus web service)?

Or are you saying there is a difference depending on which Python 
library you use (e.g. urllib or something else)?

> I know this falls out of the BioPython theme but I consider it quite
> relevant for my BioPython work :)
> 
> Thank you all in advance!

This must be very frustrating for you.  Have you been able to find your 
University's official documentation for the proxy?

Peter


From biopython at maubp.freeserve.co.uk  Tue Oct 30 12:32:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 12:32:10 +0000
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>	
	<471686C7.6050305@maubp.freeserve.co.uk>	
	<5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>
	<320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>
Message-ID: <4727244A.4010705@maubp.freeserve.co.uk>

Peter wrote:
>> I've found the bug!
>>
>> The code for Bio.Seq.count is:
>>
>> def count(self, item):
>>         return len([x for x in self.data if x == item])
> 
> Yeah - by design this (and the functionally similar version for the
> MutableSeq) both expect the count argument to be a single letter.  The
> simple fix for the Seq object is to use the string method internally:
> 
> def count(self, item):
>         return self.data.count(item)
> 
> For the MutableSeq things are not so straight forward, but supporting
> multiple character arguments can be done.

Bug 2386 and proposed patch here:
http://bugzilla.open-bio.org/show_bug.cgi?id=2386

This also lets the count methods take Seq or MutableSeq objects as 
arguments - in addition to plain strings.

Note there is room for improvement in my patch: For the case of the 
MutableSeq, we might want to investigate counting from the array of 
characters directly, rather than taking the lazy option of turning it 
into a string and counting that way.
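
To illustrate the point, here is a rough sketch (not the patch attached to
Bug 2386) comparing the original letter-by-letter count with a count made
directly on a sequence of characters. It is written for Python 3, and a
plain list of one-letter strings stands in for the MutableSeq's character
array. The function names are invented for this example.

```python
def count_by_letter(data, item):
    """The original approach: only ever matches single letters."""
    return len([x for x in data if x == item])


def count_in_chars(chars, sub):
    """Count non-overlapping occurrences of sub in a sequence of characters.

    Mirrors the behaviour of str.count for non-empty search strings,
    without first joining the characters into one string.
    """
    pattern = list(str(sub))
    width = len(pattern)
    if width == 0:
        raise ValueError("empty search string")
    total = 0
    i = 0
    while i + width <= len(chars):
        if list(chars[i:i + width]) == pattern:
            total += 1
            i += width  # jump past the match so matches never overlap
        else:
            i += 1
    return total


if __name__ == "__main__":
    text = "AAAATGAAATG"
    print(count_by_letter(text, "ATG"))       # letter-by-letter comparison
    print(text.count("ATG"))                  # the string method
    print(count_in_chars(list(text), "ATG"))  # direct count on characters
```

The slice comparison keeps the sketch short. Whether it is actually faster
than converting to a string would need to be measured.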

Peter


From anaryin at gmail.com  Tue Oct 30 16:29:00 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Tue, 30 Oct 2007 16:29:00 +0000
Subject: [BioPython] Fwd: Scripts cannot connect
In-Reply-To: <4726F0FA.6000209@maubp.freeserve.co.uk>
References: <b537e3710710220521o7752c81bt538449d1eb388673@mail.gmail.com>
	<471C9C34.7000006@maubp.freeserve.co.uk>
	<b537e3710710220611q4d1a3614vdeff07a4241c8bf6@mail.gmail.com>
	<320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
	<b537e3710710220838t6156a3ccta4c21ce1ed26bdfd@mail.gmail.com>
	<b537e3710710291828p753e957fy68cba7dc00a07a0b@mail.gmail.com>
	<4726F0FA.6000209@maubp.freeserve.co.uk>
Message-ID: <b537e3710710300929u37cee4ecm3aa56e080fe475ac@mail.gmail.com>

> Are you still finding things work on Windows, but fail on Linux?
> If so, are you running the same version of python (and Biopython) on both?

The same version is installed on all operating systems. I'm using XP (one
32-bit, the other 64-bit) on the Windows machines (one at home, another at
"work") and Ubuntu 7.10 on both my laptop and the workstation at the
University (it's dual-booted). Regarding Biopython, it's the same version
on all of them except my laptop, which has the latest upgrade from the
28th of October (but still, it never worked before). But since I'm not
using any Biopython modules, it should not have anything to do with it.

> When you say "it doesn't work", do you mean (a) the environment variable
> isn't set, or (b) the environment variable is set but has no effect?

An example: I start a new session on my laptop and open the console. I type
"export http_proxy='blabla'" to set the variable. I then type "env" and it
returns a list of all environment variables, *including* http_proxy. I run
"aptitude update" and it works. If I do the same in a Python script, it
doesn't work (at least when connecting to a web service). I therefore
believe that the variable is set but somehow has no effect.

> Are you saying there is a difference depending on the URL type (plain
> page versus web-service?)

I *think*, or suppose, that the two "types" of connection, despite using
HTTP and the same proxy environment variable, somehow work differently.
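
One way two HTTP clients can differ here: in the Python standard library,
the high-level urllib layer looks up proxy settings in the environment,
while the low-level HTTP connection class does not. Whether ZSI is
affected by this is an assumption, not something established in this
thread. The sketch below uses Python 3 module names (urllib.request and
http.client, which correspond to urllib and httplib in Python 2.5), and
the proxy and host names are placeholders.

```python
import http.client
import os
import urllib.request

# Placeholder proxy address, set only for the purpose of this demonstration.
os.environ["http_proxy"] = "http://proxy.example:3128"

# High level: urllib.request consults the environment for proxies.
seen_by_urllib = urllib.request.getproxies().get("http")

# Low level: http.client never reads http_proxy. The connection object is
# aimed straight at the host it was given (nothing is sent until request()).
direct = http.client.HTTPConnection("www.example.org", 80)


def get_via_proxy(proxy_host, proxy_port, url):
    """Fetch url over plain HTTP by talking to the proxy explicitly."""
    conn = http.client.HTTPConnection(proxy_host, proxy_port)
    try:
        conn.request("GET", url)  # absolute URL, as an HTTP proxy expects
        return conn.getresponse().read()
    finally:
        conn.close()


if __name__ == "__main__":
    print("urllib would use proxy:", seen_by_urllib)
    print("http.client would connect to: %s:%s" % (direct.host, direct.port))
```

If a library builds its own low-level connections, the proxy has to be
passed to it explicitly, as get_via_proxy does, rather than through the
environment.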


> Or, are you saying there is a difference depending on what python
> library you use (e.g. urllib or something else).

Which other libraries can I try, other than urllib?


> This must be very frustrating for you.  Have you been able to find your
> University's official documentation for the proxy?

It's a dilemma. On the one hand, I have a perfectly configured Windows
system that can access the internet through the scripts I write. However,
there is no ZSI for it (or at least, I can't install it). As such, no SOAP
support and no API I can get to work.
On the other hand, there is GNU/Linux. It works perfectly, the *.deb
packages exist and are quite easy to install, so I have ZSI and SOAP
support to work with the API. However, I can't access the web with the ZSI
module.

I'll try to talk to the University Informatics Service to see if they can
figure it out. Really hope they can, otherwise, I guess I'll just have to
work from home since it works there.. :)

Again, very thankful!

João Rodrigues