From ibdeno at gmail.com  Sun Sep  2 11:52:57 2007
From: ibdeno at gmail.com (Miguel Ortiz-Lombardía)
Date: Sun, 2 Sep 2007 17:52:57 +0200
Subject: [BioPython] problem accessing ncbi through GenBank.NCBIDictionary
Message-ID: <d5dc3ecc0709020852q5b6fdf1pd8c140206ab70cc2@mail.gmail.com>

Hello everyone.

I'm trying to retrieve from NCBI a series of GenBank records from a list
read from a file.
This is the code:

8<-------------------------------------------------------------------------------------------

ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")

output = open(args[0]+'.gb','w')

for gbid in ids:
    gb_record = ncbi_dict[gbid]
    output.write(gb_record)

output.close()

------------------------------------------------------------------------------------------->8

The problem is that at some point the job stops with an error such as:

Traceback (most recent call last):
  File "/Users/mol/bin/getfromGB.py", line 61, in ?
    main()
  File "/Users/mol/bin/getfromGB.py", line 54, in main
    gb_record = ncbi_dict[gbid]
  File "/sw/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 1264, in __getitem__
    handle = self.db[id]
  File "/sw/lib/python2.4/site-packages/Bio/config/DBRegistry.py", line 89, in __getitem__
    return self._get(key)
  File "/sw/lib/python2.4/site-packages/Bio/config/_support.py", line 107, in __call__
    return self.fn(*args, **keywds)
  File "/sw/lib/python2.4/site-packages/Bio/config/DBRegistry.py", line 370, in _get
    handle = eutils_client.efetch(retmode = "text", rettype =
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/DBIdsClient.py", line 150, in efetch
    complexity = complexity)
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/ThinClient.py", line 987, in efetch_using_dbids
    query = {"id": id_string,
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/ThinClient.py", line 644, in _get
    return self.opener.open(url)
  File "/sw/lib/python2.4/urllib2.py", line 364, in open
    response = meth(req, response)
  File "/sw/lib/python2.4/urllib2.py", line 471, in http_response
    response = self.parent.error(
  File "/sw/lib/python2.4/urllib2.py", line 402, in error
    return self._call_chain(*args)
  File "/sw/lib/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/sw/lib/python2.4/urllib2.py", line 480, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Temporarily Unavailable

Sometimes it is a 502 error... Because I can access those entries from my
browser without problems, I'm guessing that there may be a timeout problem
here.

I would appreciate your help!

Cheers,

Miguel
-- 
correo-e: ibdeno at gmail.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Je suis de la mauvaise herbe,
Braves gens, braves gens,
Je pousse en liberté
Dans les jardins mal fréquentés!

Georges Brassens


From sbassi at gmail.com  Sun Sep  2 23:25:22 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 3 Sep 2007 00:25:22 -0300
Subject: [BioPython] Getting the location from a Genbank record
Message-ID: <b43bf2080709022025i691130fetc5d2c11ebbf72fca@mail.gmail.com>

I can get the "location" of the genes I want, but only in "print mode"
(calling __str__). I don't see how to get the start and end positions in
a form I could use to slice the seq. There are private attributes _start
and _end, but I don't know if using them is the "right" way to do it.

from Bio import SeqIO
mr = SeqIO.parse(open("MTtabaco.gbk"), "genbank").next()
targets=(['cox2'],['atp6'],['atp9'],['cob'])
for x in mr.features:
        if x.qualifiers.get('gene') in targets:
            print x.location
            #print mr.seq

Get the slice I am looking for:

>>> mr.seq[x.location._start.position:x.location._end.position]
Seq('ATGAATGTTATAACTCCTAATTCTTTGGTAGCGGACCTCTTTGATAGTTCGACCCTTATCCCCCGTCTAACTCAACTATTCGACTGTACGGCTATTGTGATTGCGAGAGAAAGGAGGGATGGCGCCTTCCTTTACCATCTGGCGGTTGAAAACAAAAGTGCTTCCAGGTACACGGCTGTTAGGCTCATCCAAGGCGTATTTACGGAAGTAGCAGGGAACTTGACCGTCAAGTTTGAAAAAAGCTGGCCAAGCCTGTGTCACTTTCTTACGTCAGGAGAAAGGGAGATCAAAGAAGTATGGGGCCGATACGCGAAGGATCAAATCATAGAGATAGCGGATCTTAAGAGGCGGAAGAAAAGGAACCTCGGCGACCCAGAGATCGCGGAGTCCGCGCCCGTGCCGAAAGTGAAGAAGCTTTCCTCTCCTTTCAGTCGAGCATGCCCGCCCTTTAGCACTTCCCTTCCCGAAGTGGGAGTAGGAGAAAGGAAAGCGCACTCGATCAATTACCATGCCGTGTCGTAA',
IUPACAmbiguousDNA())


-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From biopython at maubp.freeserve.co.uk  Mon Sep  3 06:46:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Sep 2007 11:46:32 +0100
Subject: [BioPython] Getting the location from a Genbank record
In-Reply-To: <b43bf2080709022025i691130fetc5d2c11ebbf72fca@mail.gmail.com>
References: <b43bf2080709022025i691130fetc5d2c11ebbf72fca@mail.gmail.com>
Message-ID: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>

On 9/3/07, Sebastian Bassi <sbassi at gmail.com> wrote:
> I can get the "location" of the genes I want, but I have them in a
> "print mode" (calling __str__), but I don't see how to get the start
> and end position in a way I could use to slice the seq. There are
> private attributes _start and _end but I don't know if using them is
> the "right" way to do it.
>
> from Bio import SeqIO
> mr = SeqIO.parse(open("MTtabaco.gbk"), "genbank").next()
> targets=(['cox2'],['atp6'],['atp9'],['cob'])
> for x in mr.features:
>        if x.qualifiers.get('gene') in targets:
>            print x.location
>            #print mr.seq

I'm not at my own computer right now, but I think you need to do
something like this to get the slice - assuming nothing funny like
joins:

start = x.location.start.position
end = x.location.end.position
print mr.seq[start:end]
print mr.seq[start:end].reverse_complement()

See also: http://www.warwick.ac.uk/go/peter_cock/python/genbank/
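
A note on why the plain slice works: GenBank files write locations as
1-based, inclusive spans (e.g. 100..200), while Biopython stores the start
as a 0-based offset, so start and end can be used directly as Python slice
bounds. The sketch below illustrates that conversion on plain strings. The
helper name genbank_span_to_slice is made up for this example and is not
part of Biopython; it only handles simple "first..last" spans.

```python
def genbank_span_to_slice(span):
    """Turn a simple GenBank span such as '3..6' into a Python slice.

    GenBank counts from 1 and includes the last base; Python counts
    from 0 and excludes the end, so only the start needs shifting.
    """
    first, last = span.split("..")
    return slice(int(first) - 1, int(last))


sequence = "AACCGGTTAA"
piece = sequence[genbank_span_to_slice("3..6")]  # bases 3, 4, 5 and 6
```

Joins, complements and fuzzy positions are deliberately left out here.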

Peter

From sbassi at gmail.com  Mon Sep  3 09:32:28 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 3 Sep 2007 10:32:28 -0300
Subject: [BioPython] Getting the location from a Genbank record
In-Reply-To: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>
References: <b43bf2080709022025i691130fetc5d2c11ebbf72fca@mail.gmail.com>
	<320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>
Message-ID: <b43bf2080709030632s7a5f7d39tf03139e508b4c1a@mail.gmail.com>

On 9/3/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
> start = x.location.start.position
> end = x.location.end.position

Yes, this worked. I tried x.location._start.position
because of this:
>>> dir(x.location)
['__doc__', '__getattr__', '__init__', '__module__', '__str__',
'_end', '_start']

Thank you!



From biopython at maubp.freeserve.co.uk  Mon Sep  3 12:47:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 03 Sep 2007 17:47:06 +0100
Subject: [BioPython] Extracting SeqFeature locations from sequences
Message-ID: <46DC3A8A.1000100@maubp.freeserve.co.uk>

I was prompted to actually write this email based on Sebastian Bassi's 
recent email where he was having trouble getting to grips with this topic.

I had been thinking that Biopython really should have code built in to 
take a SeqFeature's location and extract this from the full record 
sequence. This would particularly apply to SeqRecord objects read from 
GenBank or EMBL files (using Bio.SeqIO or using Bio.GenBank directly).

As far as I am aware, right now it is up to the user to take the 
information stored in a SeqFeature and apply this "by hand" to the 
parent record's sequence.  Adding some more detailed examples to the 
tutorial is probably a good idea - for example based on 
http://www.warwick.ac.uk/go/peter_cock/python/genbank/

In addition to improving the documentation, we could add a new method to 
the Seq and/or SeqRecord object which would return the sub-sequence 
defined by a SeqFeature.

We could even do this via the __getitem__ method, normally used for 
accessing elements of a sequence (as strings) or slicing to get a 
sub-sequence. e.g.
print seq[index]
print seq[start:end]
print seq[feature]
or,
print record[feature]

I think this is quite elegant, but a separate explicitly named method 
might be clearer and more discoverable.

To do this properly covering all cases is actually non-trivial - a good 
reason to have it built into Biopython (with a good test suite) rather 
than having end users reimplement it themselves.

Messy details to take care of include being aware of both joins and 
complements (stored as sub-features and the strand property 
respectively), and fuzzy locations.  Most situations should be resolved 
relatively easily - but in the worst case we could throw a ValueError if 
there really is no sensible solution.
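
To make the messy details concrete, here is a minimal sketch of such an
extraction on plain strings. It is only an illustration under stated
assumptions, not Biopython code: the function name extract, the
representation of a join as a list of (start, end) tuples (0-based,
end-exclusive, in ascending order) and the use of None for an unknown
boundary are all invented for this example.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def extract(parent, parts, strand=1):
    """Return the sub-sequence described by parts of parent.

    parts  -- list of (start, end) tuples, 0-based and end-exclusive;
              more than one tuple plays the role of a join
    strand -- 1 for the forward strand, -1 for the complement
    """
    for start, end in parts:
        # A boundary we cannot resolve is the "no sensible solution" case.
        if start is None or end is None:
            raise ValueError("location has an unknown boundary")
        if not 0 <= start <= end <= len(parent):
            raise ValueError("location falls outside the parent sequence")
    joined = "".join(parent[start:end] for start, end in parts)
    if strand == -1:
        # complement(join(...)): join on the forward strand, then flip.
        return reverse_complement(joined)
    return joined
```

For example, extract("AAACCCGGGTTT", [(0, 3), (6, 9)]) joins the first and
third triplets, and passing strand=-1 returns the reverse complement of
that joined string.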

Peter


From biopython at maubp.freeserve.co.uk  Tue Sep  4 09:05:21 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 04 Sep 2007 14:05:21 +0100
Subject: [BioPython] problem accessing ncbi through
	GenBank.NCBIDictionary
In-Reply-To: <d5dc3ecc0709020852q5b6fdf1pd8c140206ab70cc2@mail.gmail.com>
References: <d5dc3ecc0709020852q5b6fdf1pd8c140206ab70cc2@mail.gmail.com>
Message-ID: <46DD5811.8060209@maubp.freeserve.co.uk>

Miguel Ortiz-Lombardía wrote:
> Hello everyone.
> 
> I'm trying to retrieve from NCBI a series of GenBank records from a list
> read from a file.

How many GenBank identifiers are we talking about? Just trying to get an 
idea of the scale of the problem.  It certainly sounds like either 
network failures or timeouts.  Have you tried something like this?

from Bio import GenBank
from urllib2 import HTTPError
ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")
ids = ['14598510', '16904191']
output = open('saved.gb','w')
for gbid in ids:
     print "Fetching %s" % gbid
     try :
         gb_record = ncbi_dict[gbid]
     except HTTPError, e :
         #Check error code?
         print str(e)
         print "Re-trying %s" % gbid
         gb_record = ncbi_dict[gbid]
     output.write(gb_record)
output.close()
print "Done"
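
The snippet above retries exactly once, and a second failure would still
stop the job. A slightly more general pattern is to retry a few times with
a growing pause between attempts. The sketch below is written for current
Python (urllib.error rather than urllib2) and is independent of Biopython:
fetch_with_retry and its fetch argument are hypothetical names, and fetch
stands for any callable that returns the record text or raises HTTPError.

```python
import time
from urllib.error import HTTPError


def fetch_with_retry(fetch, key, retries=3, delay=2.0, sleep=time.sleep):
    """Call fetch(key), retrying on HTTP 502/503 with a growing pause.

    fetch -- hypothetical callable returning the record text, or raising
             HTTPError as reported earlier in this thread
    sleep -- injectable so the waiting can be skipped in tests
    """
    for attempt in range(retries + 1):
        try:
            return fetch(key)
        except HTTPError as err:
            # Retry only the transient server-side errors; give up on
            # anything else, or once the retries are used up.
            if err.code not in (502, 503) or attempt == retries:
                raise
            sleep(delay * (2 ** attempt))  # 2 s, 4 s, 8 s with defaults
```

With the loop above this would be used as
gb_record = fetch_with_retry(lambda gbid: ncbi_dict[gbid], gbid),
assuming ncbi_dict raises HTTPError on failure as in the traceback.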

Peter


From jimmy.musselwhite at gmail.com  Tue Sep  4 09:23:37 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Tue, 4 Sep 2007 09:23:37 -0400
Subject: [BioPython] Bio.Cluster clarification
Message-ID: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>

Hello all
In the documentation it says the "data" argument is "an array containing the
gene expression data". What exactly does that mean? Ideally all I want to do
is send it an array of lists, each containing 3 floats, aka an array of
vectors in 3d space, and have it cluster those. Is that doable?

This may seem like a beginner question but I'm not sure of this
documentation (cluster.pdf).

Thanks!

Or, less likely, if you know of any python lib that can handle this, let me
know!

From biopython at maubp.freeserve.co.uk  Tue Sep  4 09:42:00 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 04 Sep 2007 14:42:00 +0100
Subject: [BioPython] Bio.Cluster clarification
In-Reply-To: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
References: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
Message-ID: <46DD60A8.7070403@maubp.freeserve.co.uk>

Jimmy Musselwhite wrote:
> Hello all
> In the documentation it says the "data" argument is "an array
> containing the gene expression data". What exactly does that mean?

I suspect that means an array object from the Numeric library. i.e. a
two dimensional dataset of floats. In the context of gene expression,
the rows are usually different genes and the columns different samples
(typically covering two or more experimental conditions), and the data
points are simply floating point numbers (gene expression levels).
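
To illustrate that layout with three-dimensional points instead of
expression levels: each row is one item to be clustered and each column
one coordinate. The sketch below is a plain k-means written with the
standard library only, to show the shape of the data. It is not
Bio.Cluster's code or API, and the naive choice of the first k rows as
starting centers is an assumption made to keep the example short.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on a list of equal-length rows of floats."""
    centers = [list(p) for p in points[:k]]  # naive start: first k rows
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every row to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move every center to the mean of the rows assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Five items (rows), each a point in 3D space (columns x, y, z).
data = [
    [0.0, 0.0, 0.0],
    [0.1, 0.2, 0.0],
    [5.0, 5.0, 5.0],
    [5.1, 4.9, 5.2],
    [0.2, 0.0, 0.1],
]
labels, centers = kmeans(data, k=2)
```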

> Ideally all I want to do is send it an array of lists, each
> containing 3 floats, aka an array of vectors in 3d space, and have it
> cluster those. Is that doable?

When you say you have an array of three-vectors, do you mean you have a
three dimensional dataset? e.g. a vector field

> This may seem like a beginner question but I'm not sure of this 
> documentation (cluster.pdf).

Hopefully Michiel will reply shortly - as the author of Bio.Cluster, he
should be able to give you a more precise answer.  See also his webpage:
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/

Peter


From mdehoon at c2b2.columbia.edu  Tue Sep  4 09:47:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Tue, 04 Sep 2007 22:47:49 +0900
Subject: [BioPython] Bio.Cluster clarification
In-Reply-To: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
References: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
Message-ID: <46DD6205.7070801@c2b2.columbia.edu>

Jimmy Musselwhite wrote:
> Hello all
> In the documentation it says the "data" argument is "an array containing the
> gene expression data". What exactly does that mean? Ideally all I want to do
> is send it an array of lists, each containing 3 floats, aka an array of
> vectors in 3d space, and have it cluster those. Is that doable?

Yes.

--Michiel.


From ibdeno at gmail.com  Tue Sep  4 10:55:27 2007
From: ibdeno at gmail.com (Miguel Ortiz-Lombardía)
Date: Tue, 4 Sep 2007 16:55:27 +0200
Subject: [BioPython] problem accessing ncbi through
	GenBank.NCBIDictionary
In-Reply-To: <46DD5811.8060209@maubp.freeserve.co.uk>
References: <d5dc3ecc0709020852q5b6fdf1pd8c140206ab70cc2@mail.gmail.com>
	<46DD5811.8060209@maubp.freeserve.co.uk>
Message-ID: <d5dc3ecc0709040755t738550b3y31adda74acec6446@mail.gmail.com>

Eventually, I managed to download all of them (21 only...) But thank you
very much for the tip, I will incorporate that error check/try to the
script!

Cheers,


Miguel





From meesters at uni-mainz.de  Wed Sep  5 11:47:07 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Wed, 5 Sep 2007 17:47:07 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
	within a	protein?
Message-ID: <1189007228.27068.31.camel@cmeesters>

Hi,

Does anyone know a way to compute the maximum distance within a protein
(perhaps using Bio.PDB) without calculating distances of all atom
pairs? 

I'm hoping to be just too blind to see an easy solution here ...

TIA
Christian

From idoerg at gmail.com  Wed Sep  5 12:02:04 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed, 5 Sep 2007 09:02:04 -0700
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
	within a protein?
In-Reply-To: <1189007228.27068.31.camel@cmeesters>
References: <1189007228.27068.31.camel@cmeesters>
Message-ID: <b5bbbc970709050902p43065279r1dc880b5d19289b1@mail.gmail.com>

Not sure why you would want to do that. But how about calculating the
diameter of an enclosing sphere?
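
As a rough illustration of the enclosing-sphere idea: a sphere centered on
the centroid with radius R reaching the farthest atom brackets the largest
pairwise distance D as R <= D <= 2R. The upper bound follows from the
triangle inequality; the lower bound holds because the centroid is an
average of the atoms, so at least one atom lies at distance R or more from
the farthest one. This is a sketch on plain coordinate tuples, not the
minimal enclosing sphere, and the function name is made up. The bracket
is cheap (one pass over the atoms) but can be much wider than a few
angstroms.

```python
import math


def centroid_sphere_bounds(coords):
    """Return (lower, upper) bounds on the largest pairwise distance.

    coords -- list of (x, y, z) tuples, one per atom
    """
    n = len(coords)
    centroid = [sum(axis) / n for axis in zip(*coords)]
    radius = max(math.dist(centroid, p) for p in coords)
    return radius, 2.0 * radius
```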



-- 

I. Friedberg

"The only problem with troubleshooting is that
sometimes trouble shoots back."

From biopython at maubp.freeserve.co.uk  Wed Sep  5 12:24:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 05 Sep 2007 17:24:06 +0100
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
 within a protein?
In-Reply-To: <1189007228.27068.31.camel@cmeesters>
References: <1189007228.27068.31.camel@cmeesters>
Message-ID: <46DED826.4050802@maubp.freeserve.co.uk>

Christian Meesters wrote:
> Hi,
> 
> Does anyone know a way to compute the maximum distance within a protein
> (perhaps using Bio.PDB) without calculating distances of all atom
> pairs? 

Are you thinking alpha-carbon to alpha-carbon distances, or using all atoms?

> I'm hoping to be just too blind to see an easy solution here ...

There should be some way to take advantage of the backbone links meaning 
lots of residues are constrained to be close to each other... Is it 
essential to get the largest pairwise distance, or would a local maximum do?

You could probably do some clever sampling, say doing all pairwise 
combination of every third residue, and then for those furthest apart 
including all the local residues... just thinking out loud.

Peter


From ibdeno at gmail.com  Wed Sep  5 13:31:55 2007
From: ibdeno at gmail.com (Miguel Ortiz-Lombardía)
Date: Wed, 5 Sep 2007 19:31:55 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
	within a protein?
In-Reply-To: <46DED826.4050802@maubp.freeserve.co.uk>
References: <1189007228.27068.31.camel@cmeesters>
	<46DED826.4050802@maubp.freeserve.co.uk>
Message-ID: <d5dc3ecc0709051031n715e60fdw9c25fc138d42355e@mail.gmail.com>

Hello,

You can align the protein coordinates against its principal axes of inertia.
This is very fast. One (free) program doing so is 'moleman2' from the
Uppsala Software Factory:

http://alpha2.bmc.uu.se/~gerard/usf/

HTH,


Miguel





From thamelry at binf.ku.dk  Wed Sep  5 14:19:28 2007
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Wed, 5 Sep 2007 20:19:28 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
	within a protein?
In-Reply-To: <d5dc3ecc0709051031n715e60fdw9c25fc138d42355e@mail.gmail.com>
References: <1189007228.27068.31.camel@cmeesters>
	<46DED826.4050802@maubp.freeserve.co.uk>
	<d5dc3ecc0709051031n715e60fdw9c25fc138d42355e@mail.gmail.com>
Message-ID: <2d7c25310709051119r18e278cag70f4750272f3cea@mail.gmail.com>

Hi,

This is one of those problems that computational geometry people love to solve.
See for example:

http://www-sop.inria.fr/epidaure/personnel/malandain/diameter/

Google will give many other algorithms...
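
For orientation, here is a small sketch of the problem itself. It is not
the algorithm from the page above: it shows the all-pairs reference
computation next to a simple "jump to the farthest atom" sweep, which costs
one pass over the atoms per round and returns a lower bound on the true
maximum distance. Both function names are invented for the example, and
coordinates are plain (x, y, z) tuples rather than Bio.PDB objects.

```python
import math
from itertools import combinations


def brute_force_diameter(coords):
    """Exact largest pairwise distance, checking all atom pairs."""
    return max(math.dist(p, q) for p, q in combinations(coords, 2))


def sweep_diameter_estimate(coords, max_rounds=10):
    """Lower bound on the diameter from repeated farthest-point jumps."""
    current = coords[0]
    best = 0.0
    for _ in range(max_rounds):
        farthest = max(coords, key=lambda p: math.dist(current, p))
        distance = math.dist(current, farthest)
        if distance <= best:
            break  # no improvement, stop sweeping
        best, current = distance, farthest
    return best


atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
estimate = sweep_diameter_estimate(atoms)
exact = brute_force_diameter(atoms)
```

The sweep can stop at a pair that is not the farthest one, so it should be
treated as a starting point for pruning rather than as the final answer.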

Cheers,

-Thomas

From meesters at uni-mainz.de  Thu Sep  6 08:37:13 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Thu, 6 Sep 2007 14:37:13 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum	distance
	within a protein?
Message-ID: <1189082233.20772.37.camel@cmeesters>

Hi,

Thanks for the input. 

To clarify what I actually wanted: I need a rather precise (+/- 2 Å)
estimate of the maximum distance within a protein - taking all atoms,
including sugar residues in glycosylated proteins for example, into
account. So, restricting myself to CA-atoms does not really help. The
approach should not rely on symmetry, since not all proteins have
symmetry.

Thinking about the problem once more, I decided to make use of the
Har-Peled approach Thomas pointed me (indirectly) to. 

Again,
Thanks a lot,
Christian

From biopython at maubp.freeserve.co.uk  Sun Sep  9 17:17:04 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 09 Sep 2007 22:17:04 +0100
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46D31C97.1070200@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk><b43bf2080708220841s7ba6bf7cof74a99866e4ef93a@mail.gmail.com>	<46CC5C17.4000709@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu>
	<46D31C97.1070200@maubp.freeserve.co.uk>
Message-ID: <46E462D0.5090207@maubp.freeserve.co.uk>

Peter wrote:
> I think having SeqRecord subclass Seq is nicer than simply adding 
> annotation to the Seq class. Seq objects would (still) just have a 
> sequence and alphabet, the SeqRecord becomes a rich/annotated Seq object.
> 
> I think this would be close to BioPerl's Seq and RichSeq objects.
> 
> I have filed an enhancement on Bugzilla to hold any suggested patches 
> etc (I hope to upload something later tonight):
> 
> Bug 2351 - Make SeqRecord subclass Seq subclass string?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2351

Going back over the mailing list archives, we discussed something 
similar on the dev mailing list back in early 2005.

I would like to make the following "small" change now, ready for the 
next release of Biopython:

(1) Make __str__ give the full sequence as a string for Seq and
     MutableSeq objects, allowing intuitive use of str(myseq) which
     used to give a truncated representation including the alphabet.
(2) tostring() will be documented as deprecated in favour of str(...)
(3) leave __repr__ as is (giving the full string with an alphabet)
     which can be used with eval(repr(myseq))

There will be some fallout to this - in particular we'll need to go over 
the documentation and may need to fix a few things.

The only downside is the loss of a built-in method to get a "short seq 
string representation" (currently available as str(myseq) via __str__). 
Back in 2005, Frédéric Sohm suggested adding a short() method to do 
this. Personally I'd only use this when working at the command line, but 
it might be nice.  One refinement over the current truncation is I would 
personally include the last three letters - this is handy when looking 
at genes as you might want to know if there was a stop codon present.

e.g.

Seq('MLKILLATTMLIPTAFILKPQILHQTMISYTFILTLFSLIFLKQNQYLKPLSNLYLN...LVL', 
SingleLetterAlphabet())

rather than:

Seq('MLKILLATTMLIPTAFILKPQILHQTMISYTFILTLFSLIFLKQNQYLKPLSNLYLNLDQ ...', 
SingleLetterAlphabet())

and similarly for nucleotides (which is why I suggest at least the last 
three trailing letters).
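
A minimal sketch of the truncation described above, written as a
standalone function on plain strings. The name short comes from the
suggestion quoted in this email; the head and tail parameters and their
default values are assumptions chosen to resemble the example output, not
an agreed design.

```python
def short(sequence, head=57, tail=3):
    """Abbreviate a long sequence string, keeping its last few letters.

    Sequences that already fit in head + tail letters are returned
    unchanged, so nothing is hidden unnecessarily.
    """
    if len(sequence) <= head + tail:
        return sequence
    return sequence[:head] + "..." + sequence[-tail:]
```

Keeping three trailing letters means a final codon such as TAG stays
visible in the abbreviated form.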

Peter


From mdehoon at c2b2.columbia.edu  Sun Sep  9 20:04:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 10 Sep 2007 09:04:28 +0900
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E462D0.5090207@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk><b43bf2080708220841s7ba6bf7cof74a99866e4ef93a@mail.gmail.com>	<46CC5C17.4000709@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu>
	<46D31C97.1070200@maubp.freeserve.co.uk>
	<46E462D0.5090207@maubp.freeserve.co.uk>
Message-ID: <46E48A0C.1050403@c2b2.columbia.edu>

Peter wrote:
> I would like to make the following "small" change now, ready for the 
> next release of Biopython:
> 
> (1) Make __str__ give the full sequence as a string for Seq and
>     MutableSeq objects, allowing intuitive use of str(myseq) which
>     used to give a truncated representation including the alphabet.

Note that the __str__ is used to create the output of "print myseq", 
where myseq is a Seq object. So if __str__ returns the full sequence 
string, then "print myseq" will print the full sequence. This is not 
necessarily what you want. In essence, the str() function and the 
.tostring() method have different functions. So I think we should not 
drop .tostring() in favor of str().

Moreover, this problem will go away if and when a Seq object subclasses 
from a string object. Then, we won't need a Seq-to-string function at all.

--Michiel.

From biopython at maubp.freeserve.co.uk  Mon Sep 10 04:27:18 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 10 Sep 2007 09:27:18 +0100
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E48A0C.1050403@c2b2.columbia.edu>
References: <46CC50BB.1090902@maubp.freeserve.co.uk><b43bf2080708220841s7ba6bf7cof74a99866e4ef93a@mail.gmail.com>	<46CC5C17.4000709@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu>	<46D31C97.1070200@maubp.freeserve.co.uk>	<46E462D0.5090207@maubp.freeserve.co.uk>
	<46E48A0C.1050403@c2b2.columbia.edu>
Message-ID: <46E4FFE6.9040608@maubp.freeserve.co.uk>

We seem to be talking at cross purposes.

Michiel de Hoon wrote:
> Peter wrote:
>> I would like to make the following "small" change now, ready for
>> the next release of Biopython:
>> 
>> (1) Make __str__ give the full sequence as a string for Seq and 
>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>> to give a truncated representation including the alphabet.
> 
> Note that the __str__ is used to create the output of "print myseq",
>  where myseq is a Seq object. So if __str__ returns the full sequence
>  string, then "print myseq" will print the full sequence. This is not
>  necessarily what you want.

Getting the full string from both "print my_seq" and str(my_seq) is what
I would expect from a Seq object that acted like a string.

> In essence, the str() function and the .tostring() method have
> different functions. So I think we should not drop .tostring() in
> favor of str().

At the moment str() and .tostring() do serve different purposes.  Currently
with a Seq object called my_seq:
* full sequence as string - my_seq.tostring()
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - str(my_seq)

What I would like:
* full sequence as string - str(my_seq) and retain my_seq.tostring() for 
backwards compatibility.
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - consider adding 
a new method e.g. my_seq.short()

> Moreover, this problem will go away if and when a Seq object
> subclasses from a string object. Then, we won't need a Seq-to-string
> function at all.

What do you mean by the "problem will go away"?  This would be much
easier to discuss in person :(

If/when we make Seq a subclass of string, there would still be __str__
and __repr__ methods, and I would expect str(my_seq) and also "print
my_seq" to give the full sequence.  For backwards compatibility I would
keep the existing .tostring() method as well.

I would find it very strange to have the Seq object subclass string, but 
have str(my_seq) not give me the full sequence.  Isn't making 
str(my_seq) return the full sequence as a string essential for things 
like this?

print my_seq
print "My sequence is %s, length %i" % (my_seq, len(my_seq))

Rather than as currently required:

print my_seq.tostring()
print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))
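
The behaviour argued for here can be shown with a toy class. This is a
hypothetical stand-in written in current Python syntax, not Biopython's
Seq: ToySeq and its attributes are invented for the example. Because
%s formatting and print both go through __str__, returning the full
sequence from __str__ makes the shorter spelling work.

```python
class ToySeq:
    """Hypothetical Seq-like object; not Biopython's class."""

    def __init__(self, data, alphabet="SingleLetterAlphabet()"):
        self.data = data
        self.alphabet = alphabet

    def __len__(self):
        return len(self.data)

    def __str__(self):
        # Full sequence, so print and "%s" behave as for a string.
        return self.data

    def __repr__(self):
        # Full sequence plus alphabet, for debugging.
        return "%s(%r, %s)" % (self.__class__.__name__, self.data, self.alphabet)


my_seq = ToySeq("ATGGCCTAA")
message = "My sequence is %s, length %i" % (my_seq, len(my_seq))
```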


Peter


From mdehoon at c2b2.columbia.edu  Mon Sep 10 05:56:25 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 10 Sep 2007 18:56:25 +0900
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E4FFE6.9040608@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk><b43bf2080708220841s7ba6bf7cof74a99866e4ef93a@mail.gmail.com>	<46CC5C17.4000709@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu>	<46D31C97.1070200@maubp.freeserve.co.uk>	<46E462D0.5090207@maubp.freeserve.co.uk>
	<46E48A0C.1050403@c2b2.columbia.edu>
	<46E4FFE6.9040608@maubp.freeserve.co.uk>
Message-ID: <46E514C9.2010006@c2b2.columbia.edu>

Let's have the Seq/MutableSeq/SeqRecord discussion after the upcoming 
release, which is only five days away. There's not enough time to 
discuss these issues in detail, let alone to test them.

--Michiel.


Peter wrote:
> We seem to be talking at cross purposes.
> 
> Michiel de Hoon wrote:
>> Peter wrote:
>>> I would like to make the following "small" change now, ready for
>>> the next release of Biopython:
>>>
>>> (1) Make __str__ give the full sequence as a string for Seq and 
>>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>>> to give a truncated representation including the alphabet.
>>
>> Note that the __str__ is used to create the output of "print myseq",
>>  where myseq is a Seq object. So if __str__ returns the full sequence
>>  string, then "print myseq" will print the full sequence. This is not
>>  necessarily what you want.
> 
> Getting the full string from both "print my_seq" and str(my_seq) is what
> I would expect from a Seq object that acted like a string.
> 
>> In essence, the str() function and the .tostring() method have
>> different functions. So I think we should not drop .tostring() in
>> favor of str().
> 
> At the moment str() and .tostring() do serve purposes.  Currently with a 
> Seq object called my_seq:
> * full sequence as string - my_seq.tostring()
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - str(my_seq)
> 
> What I would like:
> * full sequence as string - str(my_seq) and retain my_seq.tostring() for 
> backwards compatibility.
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - consider added 
> a new method e.g. my_seq.short()
> 
>> Moreover, this problem will go away if and when a Seq object
>> subclasses from a string object. Then, we won't need a Seq-to-string
>> function at all.
> 
> What do you mean by the "problem will go away"?  This would be much
> easier to discuss in person :(
> 
> If/when we make Seq a subclass of string, there would still be __str__
> and __repr__ methods, and I would expect str(my_seq) and also "print
> my_seq" to give the full sequence.  For backwards compatibility I would
> keep the existing .tostring() method as well.
> 
> I would find it very strange to have the Seq object subclass string, but 
> have str(my_seq) not give me the full sequence.  Isn't making 
> str(my_seq) return the full sequence as a string essential for things 
> like this?:
> 
> print my_seq
> print "My sequence is %s, length %i" % (my_seq, len(my_seq))
> 
> Rather than as currently required:
> 
> print my_seq.tostring()
> print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))
> 
> 
> Peter
> 
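
The behaviour proposed in the quoted message can be sketched roughly as
follows. This is only an illustration under stated assumptions: SketchSeq
is a hypothetical stand-in class, not the Bio.Seq.Seq code, and short() is
the hypothetical method name floated above.

```python
class SketchSeq(object):
    """Hypothetical stand-in for a Seq-like class (not Biopython's code)."""

    def __init__(self, data, alphabet="generic"):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        # Proposed behaviour: the full sequence and nothing else.
        return self.data

    def __repr__(self):
        # Full sequence plus alphabet, intended for debugging.
        return "SketchSeq(%r, %r)" % (self.data, self.alphabet)

    def __len__(self):
        return len(self.data)

    def tostring(self):
        # Retained for backwards compatibility; same result as str().
        return str(self)

    def short(self, width=10):
        # Hypothetical truncated form for long sequences.
        if len(self.data) <= width:
            return self.data
        return self.data[:width] + "..."


my_seq = SketchSeq("ACGTACGTACGTACGT", "DNA")
# %s calls str(), so no explicit .tostring() is needed here.
message = "My sequence is %s, length %i" % (my_seq, len(my_seq))
```

With this arrangement "%s" formatting and str() both give the full
sequence, while repr() remains available when the alphabet matters.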


From mdehoon at c2b2.columbia.edu  Tue Sep 11 10:37:57 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Tue, 11 Sep 2007 23:37:57 +0900
Subject: [BioPython] Bio.MultiProc
Message-ID: <46E6A845.3030601@c2b2.columbia.edu>

Hi everybody,

In preparation for the upcoming release, I was running the Biopython 
test suite and found that test_copen.py hangs on Cygwin. It doesn't 
fail, it just sits there forever. This may be related to the use of 
fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it 
is probably possible to fix this, I'd have to dig fairly deep into the 
code, and I am not sure if it is worth it. It looks like the copen 
functions are used only in Bio/config, which is needed for Bio.db. A 
description of the functionality of this module can be found in the 
tutorial section 4.7.2.

Now, I don't remember users asking about this module on the mailing 
list. From the tutorial documentation, it seems to be a nice piece of 
code, but I doubt that it is being used often in practice.

So I was wondering:
1) Is anybody on this list using this code?
2) If not, can I mark it as deprecated for the upcoming release? 
Hopefully, people who are using this code will notice, and let us know 
that they need it.

--Michiel.

From biopython at maubp.freeserve.co.uk  Wed Sep 12 14:31:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Sep 2007 19:31:43 +0100
Subject: [BioPython] Deprecating Bio.FormatIO ?
Message-ID: <46E8308F.6040709@maubp.freeserve.co.uk>

With the release of Biopython 1.43 and Bio.SeqIO earlier this year, would 
anyone be upset if the older Bio.FormatIO module was marked as 
deprecated for the next Biopython release?

This module isn't mentioned in the tutorial/cookbook, but Brad did write 
this entire document:
http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.pdf
http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html

In addition to marking Bio.FormatIO as deprecated, I would probably add 
a big disclaimer to that document, or re-write it to use Bio.SeqIO instead.

Thanks

Peter


From mdehoon at c2b2.columbia.edu  Thu Sep 13 01:13:29 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 13 Sep 2007 01:13:29 -0400
Subject: [BioPython] Deprecating Fasta.Dictionary, GenBank.Dictionary
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B61E@mail2.exch.c2b2.columbia.edu>

Hi everybody,

In the preparation for the upcoming Biopython release, we noticed some
serious problems when using the latest version (3.0) of mxTextTools. We were
already able to fix several of them, but some Biopython tests still fail with
the new mxTextTools. One of the tests that fails is test_Fasta.py. The part
of the test that fails is related to creating a Fasta Dictionary. This is not
explicitly described in the Tutorial, but it is essentially the same as
creating a Genbank dictionary, which is explained in section 4.3.4 in the
Tutorial.

Quoting from the tutorial:
>>> from Bio import GenBank
>>> dict_file = 'cor6_6.gb'
>>> index_file = 'cor6_6.idx'
>>> GenBank.index_file(dict_file, index_file)
>>> gb_dict = GenBank.Dictionary(index_file, GenBank.FeatureParser())
>>> len(gb_dict)
>>> gb_dict.keys()
['L31939', 'AJ237582', 'X62281', 'AF297471', 'M81224', 'X55053']
>>> gb_dict['AJ237582']
<Bio.SeqRecord.SeqRecord instance at 0x102fdd8c>


The same can also be obtained with the new Bio.SeqIO code:

>>> from Bio import SeqIO
>>> records = SeqIO.parse(open('cor6_6.gb'), 'genbank')
>>> gb_dict = {}
>>> for record in records:
...     key = record.id.split(".")[0]
...     gb_dict[key] = record
...
>>> gb_dict.keys()
['M81224', 'AF297471', 'X62281', 'AJ237582', 'L31939', 'X55053']
>>> # etcetera

(you can also use the to_dict function in Bio.SeqIO). The same can also be
done for Fasta.
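
For Fasta, the idea of building an in-memory dictionary keyed on the
accession (with the ".1"-style version suffix dropped, as in the loop
above) might look roughly like this. It is a sketch using only the
standard library so that it stands alone; fasta_to_dict is a made-up
helper name, the two records are invented, and real code would more
likely use Bio.SeqIO.

```python
def fasta_to_dict(lines):
    """Map each record id (version suffix dropped) to its sequence string."""
    records = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The id is the first word after '>'; drop any ".N" version.
            key = line[1:].split()[0].split(".")[0]
            if key in records:
                raise ValueError("duplicate key: %s" % key)
            records[key] = ""
        elif key is not None:
            # Sequence lines are concatenated onto the current record.
            records[key] += line
    return records


# Two invented records, purely for illustration.
example = """>X55053.1 made-up record
ACGTAC
GTAA
>M81224.1 another made-up record
TTGACC
""".splitlines()

fasta_dict = fasta_to_dict(example)
```

Unlike an index file, this holds every record in memory, which is fine
for small files but may not suit very large ones.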

So, I'd like to deprecate the index_file functions where Bio.SeqIO can be
used instead, in particular for Fasta. Then, we can remove that particular
test from test_Fasta. Would that cause problems for anybody? Given the new
Bio.SeqIO code, does anybody still need to use the index_file functions? 

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From letondal at pasteur.Fr  Fri Sep 14 15:12:28 2007
From: letondal at pasteur.Fr (Catherine Letondal)
Date: Fri, 14 Sep 2007 21:12:28 +0200
Subject: [BioPython] Programming course at Institut Pasteur (winter 2008)
Message-ID: <19581094-71E1-4400-83ED-12EF051BF7CB@pasteur.Fr>

Hi,


************************************************************************

       Course in informatics for biology 2008 at Institut Pasteur
          http://www.pasteur.fr/formation/infobio-en.html
         *** Registration extended to October 15th 2007 ***

************************************************************************


       In the series of courses offered at the Pasteur Institute, a course
will be offered in informatics for biology. The next session will take
place from January to the end of April 2008.

       The main goal of this course is to provide researchers in biology
an initial exposure to informatics. Admission to the course is reserved
for those with a degree in biology or a related discipline.

       With more and more bioinformatics tools available, it becomes
increasingly important for researchers in biology to be able to
manage their data, implement their ideas, and judge for themselves the
usefulness of new algorithms and software.

       This course will emphasize fundamental aspects of computer science
and apply them to biological examples. Theoretical aspects (algorithm
development, logic, problem modeling and design methods), and technical
applications (databases and web technologies) that are relevant for
biologists will be thoroughly discussed.

       Programming is presented through the object-oriented paradigm,
using a modern high-level language, Python, provided with tools for
biology and enabling both prototyping or scripting and the building of
important software systems. Learning of an additional language (C) will
be available for interested students.

       Learning during the course will be reinforced with computing
exercises, and effective training will be provided by a 2 month research
project.

       The working language of the course is French.

For further information, please consult:

       http://www.pasteur.fr/formation/infobio-en.html

    *** Registration will be closed on October 15th 2007. ***

Sincerely,


--
Benno Schwikowski & Catherine Letondal
Institut Pasteur -- Course in Informatics for Biology
www.pasteur.fr/formation/infobio


From dalloliogm at gmail.com  Mon Sep 17 05:39:38 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 17 Sep 2007 11:39:38 +0200
Subject: [BioPython] sequence logo with biopython
Message-ID: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>

Hi,
is there any way to produce sequence logos[1] with biopython?

I have a set of sequences of the same length, which represent the 5'
donor site in a set of introns.
I wonder if there is a way to create and display a .png logo
representation of them, like with this program:
- http://weblogo.berkeley.edu/


Thanks!!


[1] http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html
-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From bartek at rezolwenta.eu.org  Mon Sep 17 05:49:12 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Mon, 17 Sep 2007 11:49:12 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
Message-ID: <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>

Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:

> Hi,
> is there any way to produce sequence logos[1] with biopython?
> 
> I have a set of sequences of the same length, which represent the 5'
> donor site in a set of introns.
> I wonder if there is a way to create and display a .png logo
> representation of them, like with this program:
> - http://weblogo.berkeley.edu/
> 

Unfortunately, there is currently no solution for this in biopython. You can
however take a look at TAMO, a python library designed for working with
motifs:
http://fraenkel.mit.edu/TAMO/
I'm not sure if you can make png files with it,
but there are ways to obtain at least a text version of the logo.

-- 
regards
   Bartek
--
For every complex problem there is an answer that is clear, simple, and wrong. 
                   H. L. Mencken


From bartek at rezolwenta.eu.org  Mon Sep 17 09:35:22 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Mon, 17 Sep 2007 15:35:22 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
Message-ID: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>

bartek wilczynski <bartek at rezolwenta.eu.org> wrote:

> Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> 
> > Hi,
> > is there any way to produce sequence logos[1] with biopython?
> > 
> > I have a set of sequences of the same length, which represent the 5'
> > donor site in a set of introns.
> > I wonder if there is a way to create and display a .png logo
> > representation of them, like with this program:
> > - http://weblogo.berkeley.edu/
> > 
> 
> Unfortunately, currently there is no solution in biopython to this. You can
> however take a look at TAMO, a python library designed for working with
> motifs.
> http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with
> it,
> but there are ways to at least obtain text version of the logo.
> 

I've looked into it, and found a way to add this functionality to biopython.

The diff file attached introduces a method .weblogo("filename.png") to the
Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a
standalone function which takes a fasta file as input.

Is this the right time to submit things like this to CVS? I can do that, but I do
not want to mess up the (soon to be available) new release. 

-- 
regards
   Bartek
--
For every complex problem there is an answer that is clear, simple, and wrong. 
                   H. L. Mencken

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Motif.py
Type: text/x-python
Size: 8378 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biopython/attachments/20070917/0a15e1de/attachment.py 

From dalloliogm at gmail.com  Mon Sep 17 10:23:54 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 17 Sep 2007 16:23:54 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
Message-ID: <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com>

Thank you: this is very good.

I see that it uses the berkeley weblogo website and urllib.

just one newbie question: why do you put it in the Bio.AlignAce.Motif class?
Thanks

Giovanni

2007/9/17, bartek wilczynski <bartek at rezolwenta.eu.org>:
> bartek wilczynski <bartek at rezolwenta.eu.org> wrote:
>
> > Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> >
> > > Hi,
> > > is there any way to produce sequence logos[1] with biopython?
> > >
> > > I have a set of sequences of the same length, which represent the 5'
> > > donor site in a set of introns.
> > > I wonder if there is a way to create and display a .png logo
> > > representation of them, like with this program:
> > > - http://weblogo.berkeley.edu/
> > >
> >
> > Unfortunately, currently there is no solution in biopython to this. You can
> > however take a look at TAMO, a python library designed for working with
> > motifs.
> > http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with
> > it,
> > but there are ways to at least obtain text version of the logo.
> >
>
> I've looked into it, and found a way to add this functionality to biopython.
>
> The diff file attached introduces a method .weblogo("filename.png") to the
> Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a
> standalone function which takes a fasta file as input.
>
> Is it a right time to submit things like this to cvs? I can do that, but I do
> not want to mess up the (soon to be available) new release.
>
> --
> regards
>    Bartek
> --
> For every complex problem there is an answer that is clear, simple, and wrong.
>                    H. L. Mencken
>
>
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From biopython at maubp.freeserve.co.uk  Mon Sep 17 10:24:42 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Sep 2007 15:24:42 +0100
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
Message-ID: <46EE8E2A.3080809@maubp.freeserve.co.uk>

bartek wilczynski wrote:
> I've looked into it, and found a way to add this functionality to biopython.
> 
> The diff file attached introduces a method .weblogo("filename.png") to the
> Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a
> standalone function which takes a fasta file as input.
> 
> Is it a right time to submit things like this to cvs? I can do that, but I do
> not want to mess up the (soon to be available) new release. 

It's a very small change, but let's see what Michiel says about the timing.

It might be nice to expose all the options to the end user, possibly as 
handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds 
as in Bio/Blast/NCBIStandalone.py  blastall() etc.

Peter


From bartek at rezolwenta.eu.org  Mon Sep 17 10:57:52 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Mon, 17 Sep 2007 16:57:52 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com>
Message-ID: <1190041072.46ee95f03d51d@imp.rezolwenta.eu.org>

Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:

> Thank you: this is very good.
> 
> I see that it uses the berkeley weblogo website and urllib.
> 
> just one newbie question: why do you put it in the Bio.AlignAce.Motif class?
> Thanks

Well, the quick answer is that it is the most convenient place for me to put it.
Since there is a Motif class for sequence motif objects, it is not a bad one. 

A longer answer is that biopython does not have a good infrastructure for
dealing with motifs. I've contributed the AlignAce lib, Jason Hackney
contributed the MEME library, which includes another Motif class, very similar,
but not exactly compatible with AlignAce code. 

I once planned to do some refactoring work to unify these two modules, but so far
I have not found the time to do it. Now that the TAMO library is available,
there is even less incentive to do so (even though I do not use TAMO myself).

cheers bartek

From bartek at rezolwenta.eu.org  Mon Sep 17 18:09:30 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Tue, 18 Sep 2007 00:09:30 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <46EE8E2A.3080809@maubp.freeserve.co.uk>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
Message-ID: <1190066970.46eefb1a93134@imp.rezolwenta.eu.org>

Peter <biopython at maubp.freeserve.co.uk> wrote:

> > Is it a right time to submit things like this to cvs? I can do that, but I
> > do not want to mess up the (soon to be available) new release. 
> 
> It's a very small change, but let's see what Michiel says for the timing.
> 
It is indeed a very small change; however, it seems to have at least one
prospective user ;). Also, it is almost impossible to break anything by
including it in the new release. 

> It might be nice to expose all the options to the end user, possibly as 
> handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds 
> as in Bio/Blast/NCBIStandalone.py  blastall() etc.

Good idea, I've included a new diff, which allows for passing any keys directly
from function call to the weblogo server such as:

m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image

BTW. It would be interesting to know if there are more people interested in
using a better module for sequence motifs. I have some code lying around and
some ideas on how it could be put together, but since there were no documented
cases of anyone using Bio.AlignAce or Bio.MEME, I'm not sure if it's worth the
extra work.

-- 
cheers
   Bartek

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Motif.py.diff
Type: text/x-patch
Size: 2450 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biopython/attachments/20070918/20c9d15e/attachment.bin 

From biopython at maubp.freeserve.co.uk  Tue Sep 18 04:12:02 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 18 Sep 2007 09:12:02 +0100
Subject: [BioPython] Removing Bio.FormatIO ?
In-Reply-To: <46E8308F.6040709@maubp.freeserve.co.uk>
References: <46E8308F.6040709@maubp.freeserve.co.uk>
Message-ID: <46EF8852.6090000@maubp.freeserve.co.uk>

Having looked at the Bio.FormatIO code in more detail, a simple 
deprecation warning isn't an option - it would get triggered whenever 
anyone used Bio.SeqRecord
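
To illustrate why an import-time warning is awkward here, consider this
rough sketch. The two functions are hypothetical stand-ins that play the
role of module bodies; they are not the actual Bio.FormatIO or
Bio.SeqRecord code.

```python
import warnings


def import_formatio_sketch():
    """Stand-in for the body of a module marked as deprecated."""
    warnings.warn("this module is deprecated", DeprecationWarning)


def import_seqrecord_sketch():
    """Stand-in for a widely used module that imports the deprecated one."""
    import_formatio_sketch()


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The user only touches the widely used module...
    import_seqrecord_sketch()

# ...yet the deprecation warning still fires.
triggered = [w for w in caught if issubclass(w.category, DeprecationWarning)]
```

The user never asked for the deprecated module directly, but the warning
is raised anyway because of the hook in the module they did use.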

Would anyone object if we removed Bio.FormatIO (and its hooks in 
Bio/SeqRecord.py and Bio/Search.py) entirely for the next release?

Speak now or forever hold your peace! ;)

Peter

Peter wrote:
> With the release of Biopython 1.43 and Bio.SeqIO earlier this year, would 
> anyone be upset if the older Bio.FormatIO module was marked as 
> deprecated for the next Biopython release?
> 
> This module isn't mentioned in the tutorial/cookbook, but Brad did write 
> this entire document:
> http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.pdf
> http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html
> 
> In addition to marking Bio.FormatIO as deprecated, I would probably add 
> a big disclaimer to that document, or re-write it to use Bio.SeqIO instead.
> 
> Thanks
> 
> Peter


From biopython at maubp.freeserve.co.uk  Tue Sep 18 04:51:26 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 18 Sep 2007 09:51:26 +0100
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
Message-ID: <46EF918E.90107@maubp.freeserve.co.uk>

>> It might be nice to expose all the options to the end user, possibly as 
>> handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds 
>> as in Bio/Blast/NCBIStandalone.py  blastall() etc.
> 
> Good idea, I've included a new diff, which allows for passing any keys directly
> from function call to the weblogo server such as:
> 
> m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image

Does this let you do things like:

m.weblogo("x.png", res=300)

i.e. an integer, or do you have to use a string:

m.weblogo("x.png", res="300")

One way to "fix" this (if it is a problem) would be to do this:

for k,v in kwds.items():
     values[k]=str(v)

rather than:

for k,v in kwds.items():
     values[k]=v

Anyway, given we have at least ten days until the release (Michiel will 
be away - see his email on the developers list), and this is a little 
change, I would be happy for this to go into CVS now.
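
As a rough sketch of the str() coercion suggested above (not the code in
the submitted diff; build_form_values is a made-up helper name, and only
the res and colorscheme keys come from this thread):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_form_values(**kwds):
    """Collect keyword arguments as form values, coercing each to a string."""
    values = {}
    for k, v in kwds.items():
        values[k] = str(v)
    return values


# An integer and a string argument end up identical after coercion.
as_int = build_form_values(res=300, colorscheme="BW")
as_str = build_form_values(res="300", colorscheme="BW")

# Sorted only to make the resulting query string predictable.
query = urlencode(sorted(as_int.items()))
```

With the coercion in place the caller does not need to remember whether a
given option expects a number or a string.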

Peter


From bartek at rezolwenta.eu.org  Tue Sep 18 09:12:41 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Tue, 18 Sep 2007 15:12:41 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <46EF918E.90107@maubp.freeserve.co.uk>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
	<46EF918E.90107@maubp.freeserve.co.uk>
Message-ID: <1190121161.46efcec961c0e@imp.rezolwenta.eu.org>

Peter <biopython at maubp.freeserve.co.uk> wrote:
> > 
> > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image
> 
> Does this let you do things like:
> 
> m.weblogo("x.png", res=300)
> 
> i.e. an integer, or do you have to use a string:
> 
> m.weblogo("x.png", res="300")
> 
> One way to "fix" this (if it is a problem) would be to do this:
> 
> for k,v in kwds.items():
>      values[k]=str(v)
> 
> rather than:
> 
> for k,v in kwds.items():
>      values[k]=v
> 
> Anyway, given we have at least ten days until the release (Michiel will 
> be away - see his email on the developers list), and this is a little 
> change, I would be happy for this to go into CVS now.

Thanks for another good idea. I submitted the code to CVS. 

-- 
cheers
   Bartek

From dalloliogm at gmail.com  Tue Sep 18 10:36:59 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 18 Sep 2007 16:36:59 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190121161.46efcec961c0e@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
	<46EF918E.90107@maubp.freeserve.co.uk>
	<1190121161.46efcec961c0e@imp.rezolwenta.eu.org>
Message-ID: <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com>

ok, thank you.

so, let's see if I understand how to use it:

from Bio.Seq import Seq
from Bio.AlignAce.Motif import Motif

m = Motif()
m.add_instance(Seq('ACTG'))
m.add_instance(Seq('ACCG'))
m.add_instance(Seq('ACTC'))

m.search_instance(Seq('ACACGACAACGTGTCGAT'))

m.weblogo('/home/user/logo.png')


Well, about refactoring... honestly, I think it would be a good idea.
The problem is that, for example, I have never used AlignAce and I
don't know what kind of program it is... so I find it a bit confusing to
import a module with that name.
Anyway, the Motif class seems useful, and I will use it in my program.
I will probably have to ask a few questions about it in the next few days!
:)

2007/9/18, bartek wilczynski <bartek at rezolwenta.eu.org>:
> Peter <biopython at maubp.freeserve.co.uk> wrote:
> > >
> > > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image
> >
> > Does this let you do things like:
> >
> > m.weblogo("x.png", res=300)
> >
> > i.e. an integer, or do you have to use a string:
> >
> > m.weblogo("x.png", res="300")
> >
> > One way to "fix" this (if it is a problem) would be to do this:
> >
> > for k,v in kwds.items():
> >      values[k]=str(v)
> >
> > rather than:
> >
> > for k,v in kwds.items():
> >      values[k]=v
> >
> > Anyway, given we have at least ten days until the release (Michiel will
> > be away - see his email on the developers list), and this is a little
> > change, I would be happy for this to go into CVS now.
>
> Thanks for another good idea. I submitted the code to CVS.
>
> --
> cheers
>    Bartek
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From bartek at rezolwenta.eu.org  Tue Sep 18 19:09:29 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Wed, 19 Sep 2007 01:09:29 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
	<46EF918E.90107@maubp.freeserve.co.uk>
	<1190121161.46efcec961c0e@imp.rezolwenta.eu.org>
	<5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com>
Message-ID: <1190156969.46f05aa950c96@imp.rezolwenta.eu.org>

Giovanni Marco Dall'Olio <dalloliogm at gmail.com>:

> ok, thank you.
> 
> so, let's see if I understand how to use it:
> 
> from Bio.Seq import Seq
> from Bio.AlignAce.Motif import Motif
> 
> m = Motif()
> m.add_instance(Seq('ACTG'))
> m.add_instance(Seq('ACCG'))
> m.add_instance(Seq('ACTC'))
> 
> m.search_instance(Seq('ACACGACAACGTGTCGAT'))
> 
> m.weblogo('/home/user/logo.png')
> 

You got it mostly right. However, the .search_instance() and .search_pwm()
methods return generators, so you should rather use:

for pos,instance in m.search_instance(sequence):
    print "found %s at %d"%(instance,pos)

> 
> Well, about refactoring.... honestly I think it would be a good idea.
> The problem is that for example, I have never used AlignAce and I
> don't know which kind of program is it... so I feel a bit confusing to
> import a module called like this.

The basic idea is to create a new Motif class aggregating the good parts of the
AlignAce and MEME versions and modify these modules so they would use the new
class. I'll try to look into that next week. I also have some code for reading
modules from the JASPAR database and motif comparisons. I'll try to clean it up
and submit it as well. Then we could try to come up with a section in the tutorial
devoted to motif analysis. If you have anything you would consider useful in the
Motif library, let me know.

> Anyway the Motif class seems useful, and I will use it in my program..
> problably I will have to ask a few questions on it in the next days!
> :)

No problem, I'll do my best to answer your questions. However I'm leaving
tomorrow for the CMSB conference, so I may be slow at responding to email this
week.

-- 
cheers
   Bartek

From robert.campbell at queensu.ca  Wed Sep 19 09:32:14 2007
From: robert.campbell at queensu.ca (Robert Campbell)
Date: Wed, 19 Sep 2007 09:32:14 -0400
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190156969.46f05aa950c96@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
	<46EF918E.90107@maubp.freeserve.co.uk>
	<1190121161.46efcec961c0e@imp.rezolwenta.eu.org>
	<5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com>
	<1190156969.46f05aa950c96@imp.rezolwenta.eu.org>
Message-ID: <20070919093214.2c7567da@adelie.biochem.queensu.ca>

On Wed, 19 Sep 2007 01:09:29 +0200, bartek wilczynski
<bartek at rezolwenta.eu.org> wrote:

> Giovanni Marco Dall'Olio <dalloliogm at gmail.com>:
> 
> > ok, thank you.
> > 
> > so, let's see if I understand how to use it:
> > 
> > from Bio.Seq import Seq
> > from Bio.AlignAce.Motif import Motif
> > 
> > m = Motif()
> > m.add_instance(Seq('ACTG'))
> > m.add_instance(Seq('ACCG'))
> > m.add_instance(Seq('ACTC'))
> > 
> > m.search_instance(Seq('ACACGACAACGTGTCGAT'))
> > 
> > m.weblogo('/home/user/logo.png')
> > 
> 
> You got it mostly right. However, the .search_instance() and .search_pwm()
> methods return generators, so you should rather use:
> 
> for pos,instance in m.search_instance(sequence):
>     print "found %s at %d"%(instance,pos)

I believe that should be "m.search_instances(sequence)" not
"m.search_instance(sequence)"  (i.e. "instances", plural).
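
For anyone unfamiliar with generators, here is a rough sketch of what
"returns a generator" means in practice. search_instances_sketch is a
made-up plain function doing exact string matching; it is not the Motif
method itself, whose details may differ.

```python
def search_instances_sketch(instances, sequence):
    """Yield (position, instance) for every exact occurrence of an instance."""
    for pos in range(len(sequence)):
        for instance in instances:
            if sequence.startswith(instance, pos):
                yield pos, instance


instances = ["ACTG", "ACCG", "ACTC"]

# Calling the function does no searching yet; it only creates the generator.
hits = search_instances_sketch(instances, "TTACTGAACCGT")

# The search happens while looping over the generator.
found = ["found %s at %d" % (instance, pos) for pos, instance in hits]

# A generator can only be consumed once, so a second pass yields nothing.
second_pass = list(hits)
```

This is why calling the method on its own line appears to do nothing: the
results only exist once something iterates over them.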

Cheers,
Rob
-- 
Robert L. Campbell, Ph.D.
Senior Research Associate/Adjunct Assistant Professor 
Botterell Hall Rm 644
Department of Biochemistry, Queen's University, 
Kingston, ON K7L 3N6  Canada
Tel: 613-533-6821            Fax: 613-533-2497
<robert.campbell at queensu.ca>    http://pldserver1.biochem.queensu.ca/~rlc

From meesters at uni-mainz.de  Thu Sep 20 08:23:54 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Thu, 20 Sep 2007 14:23:54 +0200
Subject: [BioPython] feature request for Bio.PDB
Message-ID: <1190291034.9570.28.camel@cmeesters>

Hi,

I think it would be good to have the option to retrieve the kind of atom
via a method of the atom class, e.g.:
x = atom.get_kind()
and x would then be 'H' or 'N', for instance. It is of course possible to
derive this information from the atom id, but this requires employing a
dictionary if one wants to know which type of atom this is. So, such a
method would only be for convenience.
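
The dictionary workaround mentioned above might look roughly like this.
It is only a sketch: the table is a hypothetical, deliberately incomplete
example, and get_kind_sketch is a made-up name rather than anything in
Bio.PDB.

```python
# Hypothetical, incomplete lookup table: atom name -> element symbol.
ATOM_NAME_TO_ELEMENT = {
    "N": "N",
    "CA": "C",   # alpha carbon in a residue, not calcium
    "C": "C",
    "O": "O",
    "CB": "C",
    "SG": "S",
    "H": "H",
    "HA": "H",
}


def get_kind_sketch(atom_name):
    """Return the element symbol for a known atom name (KeyError otherwise)."""
    return ATOM_NAME_TO_ELEMENT[atom_name.strip()]


kind = get_kind_sketch("CA")
```

An explicit table is used here because atom names can be ambiguous: "CA"
is the alpha carbon in an amino acid but could also denote a calcium ion,
so guessing from the first letters alone is not reliable.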

It would be nice to see this in the upcoming release, but I fear it's
too late for that, so it would be great if this idea could at least be
considered for some future release.

Christian

From anaryin at gmail.com  Fri Sep 21 15:40:07 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Fri, 21 Sep 2007 20:40:07 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>
Message-ID: <b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>

Hello all!

I'm writing a small script to fetch results from an NCBI database search
using BioPython modules. However, I'd like to broaden my search and have
each page of results display 500 results instead of the usual 20.
Does anyone have any idea how to do this?

Thanks !

João Rodrigues


From anaryin at gmail.com  Fri Sep 21 16:33:55 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Fri, 21 Sep 2007 21:33:55 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <46F41FEA.5020205@maubp.freeserve.co.uk>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>
	<b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>
	<46F41FEA.5020205@maubp.freeserve.co.uk>
Message-ID: <b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>

Sure I can :) Must warn though, that I have 2 weeks of "python-ing" so the
code *could* be clearer! Oh, and some of it is in Portuguese because it's
for personal use..


# -*- coding: utf-8 -*-
# NCBI Retriever

import os
import sys

# What should I look for?
# (prompt: "Which expression do you want to search for?")

query = raw_input('Qual a expressao que deseja procurar?\n..: ')

# Where should I look for it?
# (prompt: "Which database do you want to search?")

print 'Em qual das bases de dados deseja procurar?'

databases = {1: 'PubMed', 2: 'Nucleotide', 3: 'Protein',
             4: 'Genome', 5: 'Structure'}

choice = raw_input('[1] PubMed\n[2] Nucleotide\n[3] Protein\n'
                   '[4] Genome\n[5] Structure\n..: ')

# Reject anything that is not one of the menu numbers
if not choice.isdigit() or int(choice) not in databases:
    print 'Escolha Inválida'  # "Invalid choice"
    sys.exit()

search_database = databases[int(choice)]

# Quit playing around, let's search!

from Bio.WWW import NCBI

search_command = 'Search'

results = NCBI.query(search_command, search_database, term=query,
                     doptcmdl='FASTA')

# Where should I save the results?

import time
# Zero-padded YYYYMMDD, so that e.g. 1 November and 11 January differ
actual_date = time.strftime('%Y%m%d')
results_file_name = os.path.join(os.getcwd(),
                                 str(query) + '_' + actual_date + '.txt')

results_file = open(results_file_name, 'w')

results_file.write(results.read())
results_file.close()


From biopython at maubp.freeserve.co.uk  Fri Sep 21 17:40:15 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Sep 2007 22:40:15 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>	<b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>	<46F41FEA.5020205@maubp.freeserve.co.uk>
	<b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
Message-ID: <46F43A3F.9090008@maubp.freeserve.co.uk>

João Rodrigues wrote:
> Sure I can :) Must warn though, that I have 2 weeks of "python-ing" so the
> code *could* be clearer! Oh, and some of it is in Portuguese because it's
> for personal use..

That's fine - the code and comments were in English, so it was easy to follow.

I see you are using Bio.WWW.NCBI as an interface to the Entrez query 
system.  Somewhere on the NCBI website they have an answer to your 
question (how to specify the number of results per page):

results = NCBI.query('Search', 'Protein', term='orchid', dispmax=23)

Some pages mentioned retstart and retmax but that doesn't seem to work.

You might also consider using Bio.EUtils instead - a python wrapper for 
the NCBI's E-Utils interface.
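
For illustration, a minimal sketch of what a request to the E-Utils search service looks like when built by hand (Python 3, standard library only; it only constructs the URL and does not contact NCBI). The helper name build_esearch_url() is made up for this example. To the best of my knowledge retmax and retstart are the E-Utils parameters for page size and offset, unlike dispmax, which belongs to the Entrez query pages.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-Utils "esearch" service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, retmax=500, retstart=0):
    """Build (but do not send) an esearch request URL."""
    params = {
        "db": database,        # e.g. "protein" or "pubmed"
        "term": term,          # the search expression
        "retmax": retmax,      # how many results to return
        "retstart": retstart,  # index of the first result to return
    }
    return ESEARCH_URL + "?" + urlencode(params)


url = build_esearch_url("protein", "orchid")
print(url)
```

Fetching the next block of results would then be a matter of calling build_esearch_url() again with retstart increased by retmax.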

Peter


From biopython at maubp.freeserve.co.uk  Fri Sep 21 18:00:58 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Sep 2007 23:00:58 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>	<b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>	<46F41FEA.5020205@maubp.freeserve.co.uk>
	<b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
Message-ID: <46F43F1A.9040309@maubp.freeserve.co.uk>

Hi again João,

I was thinking about your example code, and while I'm not sure exactly 
what you want to be able to do in python:

You might want to look at the search_for() function in Bio.PubMed and 
Bio.GenBank (which uses EUtils internally), and then the download_many() 
or dictionary interfaces.  This is covered in the Biopython tutorial.

I'm not sure if we have a front end for the structure database at the 
moment.

This may be more helpful than working with Entrez directly.

Peter


From anaryin at gmail.com  Fri Sep 21 18:57:20 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Fri, 21 Sep 2007 23:57:20 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <46F43F1A.9040309@maubp.freeserve.co.uk>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>
	<b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>
	<46F41FEA.5020205@maubp.freeserve.co.uk>
	<b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
	<46F43F1A.9040309@maubp.freeserve.co.uk>
Message-ID: <b537e3710709211557v6bb98c1ao85aca451e7306be@mail.gmail.com>

Thank you for the tip, it worked perfectly.

Well, to be honest, I'm just practicing BioPython and Python skills. What
I'm trying to do is a simple script that searches for *something* in PubMed,
gets the results page and parses that page so that I can give the user, that
is, myself at the moment :) , a txt file with this format:

----
TITLE:
AUTHOR:
YEAR:
JOURNAL: (optional actually)
ABSTRACT:
LINK:
RELATED LINKS:
----

It is probably already made and in a more useful way than mine but, as I do
need to practice, it's a start!

Again, thanks for the tips. I'll look into those Bio.PubMed and Bio.GenBank.
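
A minimal sketch of writing one record in the layout described above (Python 3, standard library only). The function format_record() and the dictionary keys are assumptions made for this example, and the sample values are placeholders rather than real search results.

```python
# Field order taken from the layout in the message above.
FIELDS = ["TITLE", "AUTHOR", "YEAR", "JOURNAL", "ABSTRACT", "LINK",
          "RELATED LINKS"]


def format_record(record):
    """Format a dict of field values as one '----' delimited text block."""
    lines = ["----"]
    for field in FIELDS:
        value = record.get(field, "")  # missing fields are left empty
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        lines.append(("%s: %s" % (field, value)).rstrip())
    lines.append("----")
    return "\n".join(lines)


# Placeholder values only, to show the shape of the output.
example = {"TITLE": "Example title", "AUTHOR": ["A. Author", "B. Author"],
           "YEAR": 2007}
print(format_record(example))
```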

From anaryin at gmail.com  Mon Sep 24 12:13:33 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Mon, 24 Sep 2007 17:13:33 +0100
Subject: [BioPython] Configuring Proxy for certain Modules
Message-ID: <b537e3710709240913r1c01eee3w75e871f67d5fcab8@mail.gmail.com>

Hello!

I am working in a University whose network is proxied. I can't work with any
of the BioPython modules that require access to the Internet (e.g. Bio.WWW).
How can I configure them manually to override the proxy? I already read
about configuring the urllib to use a proxy, but I can't figure out where to
find the string that handles the connection.

João Rodrigues


From biopython at maubp.freeserve.co.uk  Mon Sep 24 12:58:56 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Sep 2007 17:58:56 +0100
Subject: [BioPython] Configuring Proxy for certain Modules
In-Reply-To: <b537e3710709240913r1c01eee3w75e871f67d5fcab8@mail.gmail.com>
References: <b537e3710709240913r1c01eee3w75e871f67d5fcab8@mail.gmail.com>
Message-ID: <46F7ECD0.8020001@maubp.freeserve.co.uk>

João Rodrigues wrote:
> Hello!
> 
> I am working in a University whose network is proxied. I can't work
> with any of the BioPython modules that require access to the Internet
> (e.g. Bio.WWW). How can I configure them manually to override the
> proxy? I already read about configuring the urllib to use a proxy,
> but I can't figure out where to find the string that handles the
> connection.

Bio.WWW uses urllib, so the simplest answer is to follow the advice in 
http://docs.python.org/lib/module-urllib.html

Specifically on Windows you probably just need to set the http_proxy 
environment variable before starting Python, or configure the proxy in 
the internet settings (via Internet Explorer I assume).  I think it would 
be easiest to set this environment variable once by hand, but you could 
set it at run time as part of your python script.

You'll have to consult your University's network documentation to 
determine the string to use for the http_proxy environment variable, but 
it would look something like "http://www.someproxy.com:3128" (i.e. 
address:port number).

The alternative is to pass the "proxies" option to urllib.urlopen(), but 
this would require multiple changes in Bio.WWW to support.  Note that 
urllib does not currently support proxies which require authentication.
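
A minimal sketch of the two approaches described above, written for Python 3, where the relevant functions live in urllib.request (in the Python 2 used in this thread the equivalents are in urllib and urllib2). The proxy address is the placeholder from the text above, not a real server, and nothing here opens a network connection.

```python
import os
import urllib.request

# Placeholder address from the example above; substitute your own proxy.
PROXY = "http://www.someproxy.com:3128"

# Option 1: set the environment variable at run time, before any request.
# (By hand: "set http_proxy=..." on Windows, "export http_proxy=..." on Unix.)
os.environ["http_proxy"] = PROXY
print(urllib.request.getproxies().get("http"))

# Option 2: build an opener that is told about the proxy explicitly.
handler = urllib.request.ProxyHandler({"http": PROXY})
opener = urllib.request.build_opener(handler)
# opener.open("http://www.ncbi.nlm.nih.gov/") would now go via the proxy.
```

Option 1 affects every library that uses urllib in the same process, which is why it needs no changes to Bio.WWW itself.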

Peter


From biopython at maubp.freeserve.co.uk  Mon Sep 24 17:47:13 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Sep 2007 22:47:13 +0100
Subject: [BioPython] poor man's databases for large sequence files
Message-ID: <46F83061.3090207@maubp.freeserve.co.uk>

I've been thinking about extending Bio.SeqIO to support a (read only) 
dictionary like interface for large sequence files (WITHOUT having 
everything in memory).

Some of the older Biopython sequence format specific modules have an 
index_file function and matching Dictionary class to do this (based 
internally on either Martel/Mindy or a DIY Biopython indexer based on 
pickle).

When thinking about a format agnostic SeqRecord dictionary, the "Shelf" 
object from python's built in "shelve" library looks like a good choice.  
I could add a Bio.SeqIO.to_shelf() function similar to the existing 
Bio.SeqIO.to_dict() function.

The only downside I've thought of so far is updating a shelf database, 
something supported by shelve but with a few gotchas when dealing with 
non-trivial datatypes (like dictionaries).  The need I am thinking about 
addressing is a little less flexible - read only low-memory access to a 
large collection of SeqRecords (typically from a large sequence file).

Does anyone already use python's shelve library with sequence data?
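
To make the idea concrete, here is a minimal sketch using only the standard library (Python 3). It stores plain sequence strings keyed by identifier; a real version would presumably store pickled SeqRecord objects instead. Both parse_fasta() and to_shelf() are hypothetical stand-ins written for this example, and to_shelf() is not an existing Bio.SeqIO function.

```python
import os
import shelve
import tempfile


def parse_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    identifier, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            identifier, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def to_shelf(records, filename):
    """Write records to a new shelf one at a time (low memory use)."""
    with shelve.open(filename, flag="n") as shelf:
        for identifier, sequence in records:
            shelf[identifier] = sequence


fasta = [">seq1 first", "ACGT", "ACGT", ">seq2 second", "GGCC"]
path = os.path.join(tempfile.mkdtemp(), "records.shelf")
to_shelf(parse_fasta(fasta), path)

# Read-only, dictionary-like access; only the requested record is loaded.
with shelve.open(path, flag="r") as shelf:
    print(shelf["seq2"])
```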

Peter


From anaryin at gmail.com  Mon Sep 24 19:11:57 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Tue, 25 Sep 2007 00:11:57 +0100
Subject: [BioPython] Configuring Proxy for certain Modules
In-Reply-To: <46F7ECD0.8020001@maubp.freeserve.co.uk>
References: <b537e3710709240913r1c01eee3w75e871f67d5fcab8@mail.gmail.com>
	<46F7ECD0.8020001@maubp.freeserve.co.uk>
Message-ID: <b537e3710709241611w150b5c9ev5360ca7ee60d1efd@mail.gmail.com>

Again, thank you for the kind answer!

I had in fact read about the urllib module, and that was how I "discovered"
that I could configure the proxy "by hand". If I set the proxy in IE or
Firefox, it works in the browser but not in Python. As for the http_proxy
environment variable, how do I set it?

From sdavis2 at mail.nih.gov  Mon Sep 24 21:40:21 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 24 Sep 2007 21:40:21 -0400
Subject: [BioPython] poor man's databases for large sequence files
In-Reply-To: <46F83061.3090207@maubp.freeserve.co.uk>
References: <46F83061.3090207@maubp.freeserve.co.uk>
Message-ID: <46F86705.1090109@mail.nih.gov>

Peter wrote:
> I've been thinking about extending Bio.SeqIO to support a (read only) 
> dictionary like interface for large sequence files (WITHOUT having 
> everything in memory).
>
> Some of the older Biopython sequence format specific modules have an 
> index_file function and matching Dictionary class to do this (based 
> internally on either Martel/Mindy or a DIY Biopython indexer based on 
> pickle).
>
> When thinking about a format agnostic SeqRecord dictionary, the built in 
> python "Shelf" object from python's built in "shelve library" looks like 
> a good choice.  I could add a Bio.SeqIO.to_shelf() function similar to 
> the existing Bio.SeqIO.to_dict() function.
>
> The only downside I've thought of so far is updating a shelf database, 
> something supported by shelve but with a few gotchas when dealing with 
> non-trivial datatypes (like dictionaries).  The need I am thinking about 
> addressing is a little less flexible - read only low-memory access to a 
> large collection of SeqRecords (typically from a large sequence file).
>
> Does anyone already use python's shelve library with sequence data?
>   

Just a curiosity, Peter, but would this extension deal with small 
collections of large sequences (finished genomes, for example)? 

Sean

From biopython at maubp.freeserve.co.uk  Tue Sep 25 04:14:50 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 25 Sep 2007 09:14:50 +0100
Subject: [BioPython] poor man's databases for large sequence files
In-Reply-To: <46F86705.1090109@mail.nih.gov>
References: <46F83061.3090207@maubp.freeserve.co.uk>
	<46F86705.1090109@mail.nih.gov>
Message-ID: <46F8C37A.1000005@maubp.freeserve.co.uk>

Sean Davis wrote:
> Peter wrote:
>> I've been thinking about extending Bio.SeqIO to support a (read only) 
>> dictionary like interface for large sequence files (WITHOUT having 
>> everything in memory).
>>
>> ...
>>
>> Does anyone already use python's shelve library with sequence data?
>>   
> 
> Just a curiosity, Peter, but would this extension deal with small 
> collections of large sequences (finished genomes, for example)? 
> 

Hi Sean,

What I had in mind was say indexing all of UniProt which is currently 
1.1 GB in the SwissProt flat file format, but each record is pretty small.

However, in theory this (largely unwritten) code could be used on any 
number of records of any size - but you would need enough RAM to hold any 
one record in memory at once, plus some more RAM for the hopefully 
modest database overhead, python, your script etc.

I suppose having all the chromosomes for a given Eukaryote (e.g. mouse 
or fruit fly) would also be a sensible example; having tens of records 
where each is tens of MB in size. Is that the sort of thing you had in 
mind Sean?

Peter


From sdavis2 at mail.nih.gov  Tue Sep 25 07:41:25 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 25 Sep 2007 07:41:25 -0400
Subject: [BioPython] poor man's databases for large sequence files
In-Reply-To: <46F8C37A.1000005@maubp.freeserve.co.uk>
References: <46F83061.3090207@maubp.freeserve.co.uk>
	<46F86705.1090109@mail.nih.gov>
	<46F8C37A.1000005@maubp.freeserve.co.uk>
Message-ID: <46F8F3E5.5020802@mail.nih.gov>

Peter wrote:
> Sean Davis wrote:
>> Peter wrote:
>>> I've been thinking about extending Bio.SeqIO to support a (read only)
>>> dictionary like interface for large sequence files (WITHOUT having
>>> everything in memory).
>>>
>>> ...
>>>
>>> Does anyone already use python's shelve library with sequence data?
>>>   
>>
>> Just a curiosity, Peter, but would this extension deal with small
>> collections of large sequences (finished genomes, for example)?
> 
> Hi Sean,
> 
> What I had in mind was say indexing all of UniProt which is currently
> 1.1 GB in the SwissProt flat file format, but each record is pretty small.
> 
> However, in theory this (largely unwritten) code could be used on any
> number of records of any size - but you would need enough RAM to hold any
> one record in memory at once, plus some more RAM for the hopefully
> modest database overhead, python, your script etc.
> 
> I suppose having all the chromosomes for a given Eukaryote (e.g. mouse
> or fruit fly) would also be a sensible example; having tens of records
> where each is tens of MB in size. Is that the sort of thing you had in
> mind Sean?

Yes.  Lincoln Stein wrote some indexing stuff in perl that allows
essentially random access to sequence records as well as subsets of
individual records.  It makes it possible to do range queries on
individual sequences with very modest memory; with a larger memory
machine, one might imagine that this would result in very fast queries
as the files get cached.
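
A rough sketch of the general idea (Python 3, standard library only): index the byte offset of each record once, then seek straight to a record on demand. This is not Lincoln Stein's code and the function names are invented for this example. For simplicity it reads the whole requested record before slicing; a true range query that avoids this would also need to know the line length, which is left out here.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each FASTA record identifier to the byte offset of its header."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                record_id = line[1:].split()[0].decode("ascii")
                index[record_id] = offset
    return index


def fetch(path, index, record_id, start=None, end=None):
    """Seek to one record and return its sequence, optionally sliced."""
    chunks = []
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")[start:end]


# Tiny made-up file, just to exercise the two functions.
path = os.path.join(tempfile.mkdtemp(), "example.fasta")
with open(path, "wb") as handle:
    handle.write(b">chr1 demo\nACGTACGT\nTTTT\n>chr2 demo\nGGGGCCCC\n")

index = build_offset_index(path)
print(fetch(path, index, "chr1", 4, 10))
```

Only the index (one integer per record) has to stay in memory, which is what keeps the memory footprint modest.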

Sean

From ytu888 at hotmail.com  Fri Sep 28 07:40:09 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 06:40:09 -0500
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
Message-ID: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>


I'm new to Biopython and want to install it on my Mac OS X computer. I got similar error messages on the command line when installing Python 2.5, but eventually installed it using python-2.5.1-macosx.dmg. When I tried to install mxTextTools, the build failed with the messages below, which say that mxDateTime.c is missing. Where can I find this file? Please help me solve the problem. Thank you very much.

LeesComputer:/Users/Python_Bio/egenix-mx-base-3.0.0.macosx-10.3-fat-py2.5_ucs4.prebuilt Lee$ sudo python setup.py build
running build
running mx_autoconf
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -D_GNU_SOURCE=1 -I/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3 -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.3/include -c _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
macros to define: []
macros to undefine: []
running build_ext

building extension "mx.DateTime.mxDateTime.mxDateTime" (required)
building 'mx.DateTime.mxDateTime.mxDateTime' extension
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -DUSE_FAST_GETCURRENTTIME -Imx/DateTime/mxDateTime -I/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3 -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.3/include -c mx/DateTime/mxDateTime/mxDateTime.c -o build/temp.darwin-8.10.1-i386-2.3_ucs2/mx-DateTime-mxDateTime-mxDateTime/mx/DateTime/mxDateTime/mxDateTime.o
i686-apple-darwin8-gcc-4.0.1: mx/DateTime/mxDateTime/mxDateTime.c: No such file or directory
i686-apple-darwin8-gcc-4.0.1: no input files
error: command 'gcc' failed with exit status 1


From biopython at maubp.freeserve.co.uk  Fri Sep 28 08:27:17 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 13:27:17 +0100
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
In-Reply-To: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
Message-ID: <46FCF325.4040002@maubp.freeserve.co.uk>

Y Tu wrote:
> I'm a newbie for the Biopython, and want to install it on my Mac OS X
> computer. I got the similar error messages on command line when
> install Python2.5, but finally I did that using the
> python-2.5.1-macosx.dmg. When I tried to install mxTextTools and got
> the following messages: mxDateTime.c is missing. Where to find the
> file? Please help me to solve the problem and thank you very much.

It sounds like you don't want to use the default Apple provided python - 
I have the impression that this can make life more complicated.  I'm not 
a Mac user, but Michiel is, and he may be able to help.  He has been 
away recently but should be back soon.

In terms of installing mxTextTools, you may get more support on the 
egenix mailing list.  However, there are currently some issues with 
Biopython and egenix mxTextTools 3.0, so if you can find it I would 
suggest using version 2.0 instead.

We hope to release Biopython 1.44 in October, which will address most of 
the mxTextTools issues.  That said, the majority of Biopython 1.43 
will still work even with mxTextTools 3.0.

Peter


From ytu888 at hotmail.com  Fri Sep 28 09:22:43 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 08:22:43 -0500
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
In-Reply-To: <46FCF325.4040002@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
Message-ID: <BAY119-W12BF9C22D8A72CBCCD943C8FB20@phx.gbl>


The Python that comes with Mac OS X is an old version, so I installed the newer 2.5.1, which succeeded. The problem with mxTextTools came after that. I have just installed Numerical as well, and that worked.



From ytu888 at hotmail.com  Fri Sep 28 11:26:11 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 10:26:11 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FCF325.4040002@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
Message-ID: <BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>


I just installed ReportLab on Mac OS X, and the test command "from reportlab.graphics import renderPDF" succeeded. However, when I run the test script (reportlab/test/test_pdfgen_general.py), I get the following error. How can I fix the problem? My other question is how to run the script from the Python prompt (>>>) after importing it with "import test_pdfgen_general.py". Thank you very much.

nypivs-lee:/Applications/MacPython 2.5/reportlab/test lee$ python test_pdfgen_general.py
E
======================================================================
ERROR: Make a PDFgen document with most graphics features
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_pdfgen_general.py", line 833, in test0
    run(outputfile('test_pdfgen_general.pdf'))
  File "test_pdfgen_general.py", line 796, in run
    c = makeDocument(filename)
  File "test_pdfgen_general.py", line 725, in makeDocument
    c.drawImage(tgif, 4*inch, 9.25*inch, w, h, mask='auto')
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfgen/canvas.py", line 629, in drawImage
    imgObj = pdfdoc.PDFImageXObject(name, image, mask=mask)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfdoc.py", line 1840, in __init__
    self.loadImageFromA85(src)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfdoc.py", line 1846, in loadImageFromA85
    imagedata = map(string.strip,pdfutils.makeA85Image(source,IMG=IMG))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfutils.py", line 35, in makeA85Image
    raw = img.getRGBData()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/lib/utils.py", line 612, in getRGBData
    self._data = im.tostring()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 513, in tostring
    self.load()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
    d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
    raise IOError("decoder %s not available" % decoder_name)
IOError: decoder jpeg not available

----------------------------------------------------------------------
Ran 1 test in 0.321s

FAILED (errors=1)



From biopython at maubp.freeserve.co.uk  Fri Sep 28 12:28:28 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 17:28:28 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
Message-ID: <46FD2BAC.80401@maubp.freeserve.co.uk>

Y Tu wrote:
> I just installed ReportLab on Mac OS X and the test with command
> "from reportlab.graphics import renderPDF" succeeded. However, when I
> run the test script (reportlab/test/test_pdfgen_general.py), I got the
> following error. How to fix the problem.

I would guess you have not installed PIL, the Python Imaging Library, 
which ReportLab uses.

> Another question is how to
> run the script under the python prompt (>>>) after importing the
> script by "import test_pdfgen_general.py". Thank you very much.

To run a python script, like "test_pdfgen_general.py", at the command 
line type:

python test_pdfgen_general.py

(assuming python is on the path, and test_pdfgen_general.py is in the
current directory)

In general there are two sorts of python files, scripts which you run 
(like test_pdfgen_general.py) and library modules you import.
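
The difference between running and importing can be sketched as follows (Python 3, standard library only). The script written to a temporary file is a made-up stand-in, not the ReportLab test, and runpy.run_path() is used here simply as one way to execute a script file from inside a running Python session. Note also that an import statement takes the module name without the ".py" extension.

```python
import os
import runpy
import tempfile

# Write a tiny stand-in script (hypothetical; any script file would do).
script = os.path.join(tempfile.mkdtemp(), "example_script.py")
with open(script, "w") as handle:
    handle.write(
        "def main():\n"
        "    return 'script was run'\n"
        "\n"
        "RESULT = None\n"
        "if __name__ == '__main__':\n"
        "    RESULT = main()\n"
    )

# Executed as a script: __name__ is '__main__', so main() is called.
as_script = runpy.run_path(script, run_name="__main__")
print(as_script["RESULT"])

# Executed like an import: the guarded block is skipped.
as_module = runpy.run_path(script, run_name="example_script")
print(as_module["RESULT"])
```

Many scripts put their work behind such an "if __name__ == '__main__':" guard, in which case importing them defines their functions but runs nothing until one of those functions is called.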

Peter


From ytu888 at hotmail.com  Fri Sep 28 15:18:06 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 14:18:06 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FD2BAC.80401@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
Message-ID: <BAY119-W29B426228830B4921806608FB20@phx.gbl>


Thank you, Peter for the prompt answer.

I did install PIL already, and tested it with the commands "from PIL import Image" and then "import _imaging". Both commands succeeded, which is why I don't understand why the test fails. I ran "python test_pdfgen_general.py" at the shell prompt, and that generated the error. Since PIL is installed and imports successfully, I thought I might solve the problem by running the test from within Python. However, after importing the test into Python, I don't know how to launch it from the Python prompt (>>>). That's why I asked the second question.

Once again, thank you very much for help.


From biopython at maubp.freeserve.co.uk  Fri Sep 28 15:42:31 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 20:42:31 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <BAY119-W29B426228830B4921806608FB20@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>	<46FCF325.4040002@maubp.freeserve.co.uk>	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
Message-ID: <46FD5927.3000207@maubp.freeserve.co.uk>

Y Tu wrote:
> Thank you, Peter for the prompt answer.
> 
> I did install the PIL already and tested with the commands "from PIL
> import Image", then "import _imaging". Both commands succeeded.
> That's why I don't understand why the test won't work. I used the
> command "python test_pdfgen_general.py" under the shell prompt, which
> generated the error. Since I installed PIL and succeeded in importing
> the module of PIL, I thought maybe I can solve the problem by running
> the test under Python.

Looking in more detail at the original stack trace,

>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
>     d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
>     raise IOError("decoder %s not available" % decoder_name)
> IOError: decoder jpeg not available

It's possible that PIL needs some optional JPEG library, which ReportLab 
wants to use.  I suggest you search the ReportLab website and users' 
mailing list, and if you can't work out what is wrong, sign up to their 
mailing list and ask them: http://www.reportlab.org/
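
One way to check for JPEG support from Python is sketched below. It assumes a present-day Pillow installation (the maintained fork of PIL), whose PIL.features module did not exist in the PIL of this thread, so treat it as an illustration rather than something that would have worked in 2007. The usual cause of "decoder jpeg not available" is that the imaging library was built without libjpeg being present.

```python
def jpeg_decoder_available():
    """Return True/False for JPEG support, or None if it cannot be checked."""
    try:
        # PIL.features is provided by Pillow, not by the original PIL.
        from PIL import features
    except ImportError:
        return None
    return features.check("jpg")


status = jpeg_decoder_available()
if status is None:
    print("Pillow (with PIL.features) is not installed; cannot check.")
elif status:
    print("JPEG decoder is available.")
else:
    print("JPEG decoder is missing; the library was built without libjpeg.")
```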

Very little of Biopython needs ReportLab, you should be able to install 
Biopython without it.

Peter


From ibdeno at gmail.com  Sun Sep  2 15:52:57 2007
From: ibdeno at gmail.com (=?ISO-8859-1?Q?Miguel_Ortiz-Lombard=EDa?=)
Date: Sun, 2 Sep 2007 17:52:57 +0200
Subject: [BioPython] problem accessing ncbi through GenBank.NCBIDictionary
Message-ID: <d5dc3ecc0709020852q5b6fdf1pd8c140206ab70cc2@mail.gmail.com>

Hello everyone.

I'm trying to retrieve a series of GenBank records from NCBI, using a list
of IDs read from a file.
This is the code:

8<-------------------------------------------------------------------------------------------

ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")

output = open(args[0]+'.gb','w')

for gbid in ids:
    gb_record = ncbi_dict[gbid]
    output.write(gb_record)

output.close()

------------------------------------------------------------------------------------------->8

The problem is that at some point the job stops with an error such as:

Traceback (most recent call last):
  File "/Users/mol/bin/getfromGB.py", line 61, in ?
    main()
  File "/Users/mol/bin/getfromGB.py", line 54, in main
    gb_record = ncbi_dict[gbid]
  File "/sw/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 1264,
in __getitem__
    handle = self.db[id]
  File "/sw/lib/python2.4/site-packages/Bio/config/DBRegistry.py", line 89,
in __getitem__
    return self._get(key)
  File "/sw/lib/python2.4/site-packages/Bio/config/_support.py", line 107,
in __call__
    return self.fn(*args, **keywds)
  File "/sw/lib/python2.4/site-packages/Bio/config/DBRegistry.py", line 370,
in _get
    handle = eutils_client.efetch(retmode = "text", rettype =
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/DBIdsClient.py", line
150, in efetch
    complexity = complexity)
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/ThinClient.py", line 987,
in efetch_using_dbids
    query = {"id": id_string,
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/ThinClient.py", line 644,
in _get
    return self.opener.open(url)
  File "/sw/lib/python2.4/urllib2.py", line 364, in open
    response = meth(req, response)
  File "/sw/lib/python2.4/urllib2.py", line 471, in http_response
    response = self.parent.error(
  File "/sw/lib/python2.4/urllib2.py", line 402, in error
    return self._call_chain(*args)
  File "/sw/lib/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/sw/lib/python2.4/urllib2.py", line 480, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Temporarily Unavailable

Sometimes it is a 502 error instead. Because I can access those entries from
my browser without problems, I'm guessing that there may be a timeout problem
here.

I would appreciate your help!

Cheers,

Miguel
-- 
correo-e: ibdeno at gmail.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Je suis de la mauvaise herbe,
Braves gens, braves gens,
Je pousse en liberté
Dans les jardins mal fréquentés!

Georges Brassens


From sbassi at gmail.com  Mon Sep  3 03:25:22 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 3 Sep 2007 00:25:22 -0300
Subject: [BioPython] Getting the location from a Genbank record
Message-ID: <b43bf2080709022025i691130fetc5d2c11ebbf72fca@mail.gmail.com>

I can get the "location" of the genes I want, but only in "print mode"
(by calling __str__). I don't see how to get the start and end positions
in a form I could use to slice the seq. There are private attributes
_start and _end, but I don't know if using them is the "right" way to
do it.

from Bio import SeqIO
mr = SeqIO.parse(open("MTtabaco.gbk"), "genbank").next()
targets=(['cox2'],['atp6'],['atp9'],['cob'])
for x in mr.features:
        if x.qualifiers.get('gene') in targets:
            print x.location
            #print mr.seq

Get the slice I am looking for:

>>> mr.seq[x.location._start.position:x.location._end.position]
Seq('ATGAATGTTATAACTCCTAATTCTTTGGTAGCGGACCTCTTTGATAGTTCGACCCTTATCCCCCGTCTAACTCAACTATTCGACTGTACGGCTATTGTGATTGCGAGAGAAAGGAGGGATGGCGCCTTCCTTTACCATCTGGCGGTTGAAAACAAAAGTGCTTCCAGGTACACGGCTGTTAGGCTCATCCAAGGCGTATTTACGGAAGTAGCAGGGAACTTGACCGTCAAGTTTGAAAAAAGCTGGCCAAGCCTGTGTCACTTTCTTACGTCAGGAGAAAGGGAGATCAAAGAAGTATGGGGCCGATACGCGAAGGATCAAATCATAGAGATAGCGGATCTTAAGAGGCGGAAGAAAAGGAACCTCGGCGACCCAGAGATCGCGGAGTCCGCGCCCGTGCCGAAAGTGAAGAAGCTTTCCTCTCCTTTCAGTCGAGCATGCCCGCCCTTTAGCACTTCCCTTCCCGAAGTGGGAGTAGGAGAAAGGAAAGCGCACTCGATCAATTACCATGCCGTGTCGTAA',
IUPACAmbiguousDNA())


-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


From biopython at maubp.freeserve.co.uk  Mon Sep  3 10:46:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Sep 2007 11:46:32 +0100
Subject: [BioPython] Getting the location from a Genbank record
In-Reply-To: <b43bf2080709022025i691130fetc5d2c11ebbf72fca@mail.gmail.com>
References: <b43bf2080709022025i691130fetc5d2c11ebbf72fca@mail.gmail.com>
Message-ID: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>

On 9/3/07, Sebastian Bassi <sbassi at gmail.com> wrote:
> I can get the "location" of the genes I want, but I have them in a
> "print mode" (calling __str__), but I don't see how to get the start
> and end position in a way I could use to slice the seq. There are
> private attributes _start and _end but I don't know if using them is
> the "right" way to do it.
>
> from Bio import SeqIO
> mr = SeqIO.parse(open("MTtabaco.gbk"), "genbank").next()
> targets=(['cox2'],['atp6'],['atp9'],['cob'])
> for x in mr.features:
>        if x.qualifiers.get('gene') in targets:
>            print x.location
>            #print mr.seq

I'm not at my own computer right now, but I think you need to do
something like this to get the slice - assuming nothing funny like
joins:

start = x.location.start.position
end = x.location.end.position
print mr.seq[start:end]
# and if the feature is on the reverse strand:
print mr.seq[start:end].reverse_complement()

See also: http://www.warwick.ac.uk/go/peter_cock/python/genbank/

Peter


From sbassi at gmail.com  Mon Sep  3 13:32:28 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 3 Sep 2007 10:32:28 -0300
Subject: [BioPython] Getting the location from a Genbank record
In-Reply-To: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>
References: <b43bf2080709022025i691130fetc5d2c11ebbf72fca@mail.gmail.com>
	<320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>
Message-ID: <b43bf2080709030632s7a5f7d39tf03139e508b4c1a@mail.gmail.com>

On 9/3/07, Peter <biopython at maubp.freeserve.co.uk> wrote:
> start = x.location.start.position
> end = x.location.end.position

Yes, this worked. I tried x.location._start.position
because of this:
>>> dir(x.location)
['__doc__', '__getattr__', '__init__', '__module__', '__str__',
'_end', '_start']

Thank you!


-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318


From biopython at maubp.freeserve.co.uk  Mon Sep  3 16:47:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 03 Sep 2007 17:47:06 +0100
Subject: [BioPython] Extracting SeqFeature locations from sequences
Message-ID: <46DC3A8A.1000100@maubp.freeserve.co.uk>

I was prompted to actually write this email based on Sebastian Bassi's 
recent email where he was having trouble getting to grips with this topic.

I had been thinking that Biopython really should have code built in to 
take a SeqFeature's location and extract this from the full record 
sequence. This would particularly apply to SeqRecord objects read from 
GenBank or EMBL files (using Bio.SeqIO or using Bio.GenBank directly).

As far as I am aware, right now it is up to the user to take the 
information stored in a SeqFeature and apply this "by hand" to the 
parent record's sequence.  Adding some more detailed examples to the 
tutorial is probably a good idea - for example based on 
http://www.warwick.ac.uk/go/peter_cock/python/genbank/

In addition to improving the documentation, we could add a new method to 
the Seq and/or SeqRecord object which would return the sub-sequence 
defined by a SeqFeature.

We could even do this via the __getitem__ method, normally used for 
accessing elements of a sequence (as strings) or slicing to get a 
sub-sequence. e.g.
print seq[index]
print seq[start:end]
print seq[feature]
or,
print record[feature]

I think this is quite elegant, but a separate explicitly named method 
might be clearer and more discoverable.

To do this properly covering all cases is actually non-trivial - a good 
reason to have it built into Biopython (with a good test suite) rather 
than having end users reimplement it themselves.

Messy details to take care of include being aware of both joins and 
complements (stored as sub-features and the strand property 
respectively), and fuzzy locations.  Most situations should be resolved 
relatively easily - but in the worst case we could throw a ValueError if 
there really is no sensible solution.
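
To make the messy details concrete, here is a rough sketch of the logic
in plain Python (modern Python 3 syntax, plain strings instead of Seq
objects). The names extract_feature and reverse_complement, the
(start, end) tuples and the use of None for a fuzzy position are all
made up for illustration; this is not existing Biopython code. It also
assumes the whole feature sits on one strand and that the parts of a
join are listed in forward-strand order.

```python
# Complement table for unambiguous DNA letters only (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a plain DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_feature(parent, parts, strand=1):
    """Return the sub-sequence described by a (possibly joined) location.

    parent -- the full record sequence as a string
    parts  -- list of (start, end) pairs, 0-based and half-open like Python
              slices; a simple location is a list with one pair
    strand -- 1 for the forward strand, -1 for the complement
    """
    pieces = []
    for start, end in parts:
        # A fuzzy or unknown position is represented here as None.
        if start is None or end is None:
            raise ValueError("fuzzy position, no sensible sub-sequence")
        if not 0 <= start <= end <= len(parent):
            raise ValueError("location %r outside the parent sequence" % ((start, end),))
        pieces.append(parent[start:end])
    joined = "".join(pieces)
    if strand == -1:
        # Join first, then reverse complement the whole thing.
        return reverse_complement(joined)
    return joined
```

With this, a simple feature is extract_feature(seq, [(start, end)]) and
a two-part join on the complement strand is
extract_feature(seq, [(s1, e1), (s2, e2)], strand=-1).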

Peter


From biopython at maubp.freeserve.co.uk  Tue Sep  4 13:05:21 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 04 Sep 2007 14:05:21 +0100
Subject: [BioPython] problem accessing ncbi through
	GenBank.NCBIDictionary
In-Reply-To: <d5dc3ecc0709020852q5b6fdf1pd8c140206ab70cc2@mail.gmail.com>
References: <d5dc3ecc0709020852q5b6fdf1pd8c140206ab70cc2@mail.gmail.com>
Message-ID: <46DD5811.8060209@maubp.freeserve.co.uk>

Miguel Ortiz-Lombardía wrote:
> Hello everyone.
> 
> I'm trying to retrieve from NCBI a series of GenBank records from a list
> read from a file.

How many GenBank identifiers are we talking about? Just trying to get an 
idea of the scale of the problem.  It certainly sounds like either 
network failures or timeouts.  Have you tried something like this?

from Bio import GenBank
from urllib2 import HTTPError
ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")
ids = ['14598510', '16904191']
output = open('saved.gb','w')
for gbid in ids:
    print "Fetching %s" % gbid
    try:
        gb_record = ncbi_dict[gbid]
    except HTTPError, e:
        # Check error code?
        print str(e)
        print "Re-trying %s" % gbid
        # Single retry; a second failure will still raise
        gb_record = ncbi_dict[gbid]
    output.write(gb_record)
output.close()
print "Done"
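
If a single retry is not enough, the same idea can be pushed a bit
further: retry a few times, pause between attempts, and only retry on
the error codes that look temporary (the 502 and 503 reported). A
minimal sketch in present-day Python 3 syntax, so it uses urllib.error
rather than urllib2; fetch_with_retry and its parameters are made-up
names, not part of Biopython.

```python
import time
from urllib.error import HTTPError


def fetch_with_retry(fetch, key, max_tries=3, delay=1.0, retry_codes=(502, 503)):
    """Call fetch(key), retrying when a temporary HTTP error is raised.

    fetch       -- any callable taking one identifier and returning a record
    max_tries   -- total number of attempts, including the first
    delay       -- base pause in seconds; the pause grows with each attempt
    retry_codes -- HTTP status codes treated as temporary (an assumption)
    """
    for attempt in range(1, max_tries + 1):
        try:
            return fetch(key)
        except HTTPError as err:
            # Give up on other error codes, or when out of attempts.
            if err.code not in retry_codes or attempt == max_tries:
                raise
            time.sleep(delay * attempt)
```

In the loop above this would presumably be called as
fetch_with_retry(ncbi_dict.__getitem__, gbid), though I have not tried
that against NCBI.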

Peter


From jimmy.musselwhite at gmail.com  Tue Sep  4 13:23:37 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Tue, 4 Sep 2007 09:23:37 -0400
Subject: [BioPython] Bio.Cluster clarification
Message-ID: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>

Hello all
In the documentation it says the "data" argument is "an array containing the
gene expression data". What exactly does that mean? Ideally all I want to do
is send it an array of lists, each containing 3 floats, aka an array of
vectors in 3d space, and have it cluster those. Is that doable?

This may seem like a beginner question but I'm not sure of this
documentation (cluster.pdf).

Thanks!

Or, less likely, if you know of any python lib that can handle this, let me
know!


From biopython at maubp.freeserve.co.uk  Tue Sep  4 13:42:00 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 04 Sep 2007 14:42:00 +0100
Subject: [BioPython] Bio.Cluster clarification
In-Reply-To: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
References: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
Message-ID: <46DD60A8.7070403@maubp.freeserve.co.uk>

Jimmy Musselwhite wrote:
> Hello all
> In the documentation it says the "data" argument is "an array
> containing the gene expression data". What exactly does that mean?

I suspect that means an array object from the Numeric library. i.e. a
two dimensional dataset of floats. In the context of gene expression,
the rows are usually different genes and the columns different samples
(typically covering two or more experimental conditions), and the data
points are simply floating point numbers (gene expression levels).
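
For points in 3d space the layout would then be one row per point and
one column per coordinate. To illustrate just that layout, here is a
tiny k-means written in plain Python 3 (it needs math.dist, so Python
3.8 or later). It is a stand-in to show the shape of the data, not
Bio.Cluster's code; the function name kmeans and the choice of the
first k rows as starting centroids are assumptions of this sketch.

```python
import math


def kmeans(points, k, iterations=20):
    """Very small k-means: rows are items, columns are coordinates.

    Uses the first k rows as starting centroids, which keeps the result
    deterministic but is a poor choice for real data.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each row to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the rows assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Four points in 3d space: one row each, three floats per row.
data = [
    [0.0, 0.1, 0.0],
    [5.0, 5.1, 4.9],
    [0.2, 0.0, 0.1],
    [5.2, 4.8, 5.0],
]
labels, centroids = kmeans(data, k=2)
```

Here rows 0 and 2 end up in one cluster and rows 1 and 3 in the other.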

> Ideally all I want to do is send it an array of lists, each
> containing 3 floats, aka an array of vectors in 3d space, and have it
> cluster those. Is that doable?

When you say you have an array of three-vectors, do you mean you have a
three dimensional dataset, e.g. a vector field?

> This may seem like a beginner question but I'm not sure of this 
> documentation (cluster.pdf).

Hopefully Michiel will reply shortly - as the author of Bio.Cluster, he
should be able to give you a more precise answer.  See also his webpage:
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/

Peter


From mdehoon at c2b2.columbia.edu  Tue Sep  4 13:47:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Tue, 04 Sep 2007 22:47:49 +0900
Subject: [BioPython] Bio.Cluster clarification
In-Reply-To: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
References: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
Message-ID: <46DD6205.7070801@c2b2.columbia.edu>

Jimmy Musselwhite wrote:
> Hello all
> In the documentation it says the "data" argument is "an array containing the
> gene expression data". What exactly does that mean? Ideally all I want to do
> is send it an array of lists, each containing 3 floats, aka an array of
> vectors in 3d space, and have it cluster those. Is that doable?

Yes.

--Michiel.
> 
> This may seem like a beginner question but I'm not sure of this
> documentation (cluster.pdf).
> 
> Thanks!
> 
> Or, less likely, if you know of any python lib that can handle this, let me
> know!
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython


From ibdeno at gmail.com  Tue Sep  4 14:55:27 2007
From: ibdeno at gmail.com (=?ISO-8859-1?Q?Miguel_Ortiz-Lombard=EDa?=)
Date: Tue, 4 Sep 2007 16:55:27 +0200
Subject: [BioPython] problem accessing ncbi through
	GenBank.NCBIDictionary
In-Reply-To: <46DD5811.8060209@maubp.freeserve.co.uk>
References: <d5dc3ecc0709020852q5b6fdf1pd8c140206ab70cc2@mail.gmail.com>
	<46DD5811.8060209@maubp.freeserve.co.uk>
Message-ID: <d5dc3ecc0709040755t738550b3y31adda74acec6446@mail.gmail.com>

Eventually, I managed to download all of them (21 only...) But thank you
very much for the tip, I will incorporate that error check/try to the
script!

Cheers,


Miguel

2007/9/4, Peter <biopython at maubp.freeserve.co.uk>:
>
> Miguel Ortiz-Lombardía wrote:
> > Hello everyone.
> >
> > I'm trying to retrieve from NCBI a series of GenBank records from a
> > list read from a file.
>
> How many GenBank identifiers are we talking about? Just trying to get an
> idea of the scale of the problem.  It certainly sounds like either
> network failures or timeouts.  Have you tried something like this?
>
> from Bio import GenBank
> from urllib2 import HTTPError
> ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")
> ids = ['14598510', '16904191']
> output = open('saved.gb','w')
> for gbid in ids:
>      print "Fetching %s" % gbid
>      try :
>          gb_record = ncbi_dict[gbid]
>      except HTTPError, e :
>          #Check error code?
>          print str(e)
>          print "Re-trying %s" % gbid
>          gb_record = ncbi_dict[gbid]
>      output.write(gb_record)
> output.close()
> print "Done"
>
> Peter
>
>


-- 
correo-e: ibdeno at gmail.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Je suis de la mauvaise herbe,
Braves gens, braves gens,
Je pousse en liberté
Dans les jardins mal fréquentés!

Georges Brassens


From meesters at uni-mainz.de  Wed Sep  5 15:47:07 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Wed, 5 Sep 2007 17:47:07 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
	within a	protein?
Message-ID: <1189007228.27068.31.camel@cmeesters>

Hi,

Does anyone know a way to compute the maximum distance within a protein
(perhaps using Bio.PDB) without calculating distances of all atom
pairs? 

I'm hoping to be just too blind to see an easy solution here ...

TIA
Christian


From idoerg at gmail.com  Wed Sep  5 16:02:04 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed, 5 Sep 2007 09:02:04 -0700
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
	within a protein?
In-Reply-To: <1189007228.27068.31.camel@cmeesters>
References: <1189007228.27068.31.camel@cmeesters>
Message-ID: <b5bbbc970709050902p43065279r1dc880b5d19289b1@mail.gmail.com>

Not sure why you would want to do that. But how about calculating the
diameter of an enclosing sphere?

On 9/5/07, Christian Meesters <meesters at uni-mainz.de> wrote:
>
> Hi,
>
> Does anyone know a way to compute the maximum distance within a protein
> (perhaps using Bio.PDB) without calculating distances of all atom
> pairs?
>
> I'm hoping to be just too blind to see an easy solution here ...
>
> TIA
> Christian
>


-- 

I. Friedberg

"The only problem with troubleshooting is that
sometimes trouble shoots back."


From biopython at maubp.freeserve.co.uk  Wed Sep  5 16:24:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 05 Sep 2007 17:24:06 +0100
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
 within a protein?
In-Reply-To: <1189007228.27068.31.camel@cmeesters>
References: <1189007228.27068.31.camel@cmeesters>
Message-ID: <46DED826.4050802@maubp.freeserve.co.uk>

Christian Meesters wrote:
> Hi,
> 
> Does anyone know a way to compute the maximum distance within a protein
> (perhaps using Bio.PDB) without calculating distances of all atom
> pairs? 

Are you thinking alpha-carbon to alpha-carbon distances, or using all atoms?

> I'm hoping to be just too blind to see an easy solution here ...

There should be some way to take advantage of the backbone links meaning 
lots of residues are constrained to be close to each other... Is it 
essential to get the largest pairwise distance, or would a local maximum do?

You could probably do some clever sampling, say doing all pairwise 
combination of every third residue, and then for those furthest apart 
including all the local residues... just thinking out loud.

Peter


From ibdeno at gmail.com  Wed Sep  5 17:31:55 2007
From: ibdeno at gmail.com (=?ISO-8859-1?Q?Miguel_Ortiz-Lombard=EDa?=)
Date: Wed, 5 Sep 2007 19:31:55 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
	within a protein?
In-Reply-To: <46DED826.4050802@maubp.freeserve.co.uk>
References: <1189007228.27068.31.camel@cmeesters>
	<46DED826.4050802@maubp.freeserve.co.uk>
Message-ID: <d5dc3ecc0709051031n715e60fdw9c25fc138d42355e@mail.gmail.com>

Hello,

You can align the protein coordinates against its principal axes of inertia.
This is very fast. One (free) program doing so is 'moleman2' from the
Uppsala Software Factory:

http://alpha2.bmc.uu.se/~gerard/usf/

HTH,


Miguel

2007/9/5, Peter <biopython at maubp.freeserve.co.uk>:
>
> Christian Meesters wrote:
> > Hi,
> >
> > Does anyone know a way to compute the maximum distance within a protein
> > (perhaps using Bio.PDB) without calculating distances of all atom
> > pairs?
>
> Are you thinking alpha-carbon to alpha-carbon distances, or using all
> atoms?
>
> > I'm hoping to be just too blind to see an easy solution here ...
>
> There should be some way to take advantage of the backbone links meaning
> lots of residues are constrained to be close to each other... Is it
> essential to get the largest pairwise distance, or would a local maximum
> do?
>
> You could probably do some clever sampling, say doing all pairwise
> combination of every third residue, and then for those furthest apart
> including all the local residues... just thinking out loud.
>
> Peter
>
>


-- 
correo-e: ibdeno at gmail.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Je suis de la mauvaise herbe,
Braves gens, braves gens,
Je pousse en liberté
Dans les jardins mal fréquentés!

Georges Brassens


From thamelry at binf.ku.dk  Wed Sep  5 18:19:28 2007
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Wed, 5 Sep 2007 20:19:28 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance
	within a protein?
In-Reply-To: <d5dc3ecc0709051031n715e60fdw9c25fc138d42355e@mail.gmail.com>
References: <1189007228.27068.31.camel@cmeesters>
	<46DED826.4050802@maubp.freeserve.co.uk>
	<d5dc3ecc0709051031n715e60fdw9c25fc138d42355e@mail.gmail.com>
Message-ID: <2d7c25310709051119r18e278cag70f4750272f3cea@mail.gmail.com>

Hi,

This is one of those problems that computational geometry people love to solve.
See for example:

http://www-sop.inria.fr/epidaure/personnel/malandain/diameter/

Google will give many other algorithms...
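
As a very simple example of avoiding the all-pairs loop, a "double
sweep" gives cheap bounds on the diameter: start anywhere, jump to the
farthest point, and repeat. Every distance found belongs to a real pair
of points, so it is a lower bound, and twice the largest distance from
any one point is an upper bound by the triangle inequality. This is a
plain Python 3 sketch (Python 3.8 or later for math.dist) with made-up
function names; it is not the algorithm from the page above, and it
gives bounds only, not the exact value.

```python
import math


def farthest_from(points, origin):
    """Return the point farthest from origin and its distance (one linear pass)."""
    best = max(points, key=lambda p: math.dist(origin, p))
    return best, math.dist(origin, best)


def diameter_bounds(points, sweeps=3):
    """Return (lower, upper) bounds on the largest pairwise distance.

    Each sweep costs one pass over the points instead of all pairs.
    """
    current = points[0]
    lower = 0.0
    upper = math.inf
    for _ in range(sweeps):
        far, dist = farthest_from(points, current)
        lower = max(lower, dist)        # distance of a real pair of points
        upper = min(upper, 2.0 * dist)  # triangle inequality through current
        current = far
    return lower, upper
```

The points would be the atom coordinates as (x, y, z) tuples. If the
two bounds are too far apart for the precision needed, an exact method
is still required.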

Cheers,

-Thomas


From meesters at uni-mainz.de  Thu Sep  6 12:37:13 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Thu, 6 Sep 2007 14:37:13 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum	distance
	within a protein?
Message-ID: <1189082233.20772.37.camel@cmeesters>

Hi,

Thanks for the input. 

To clarify what I actually wanted: I need a rather precise (+/- 2 Å)
estimate of the maximum distance within a protein - taking all atoms,
including sugar residues in glycosylated proteins for example, into
account. So, restricting myself to CA-atoms does not really help. The
approach should not rely on symmetry, since not all proteins have
symmetry.

Thinking about the problem once more, I decided to make use of the
Har-Peled approach Thomas pointed me (indirectly) to. 

Again,
Thanks a lot,
Christian


From biopython at maubp.freeserve.co.uk  Sun Sep  9 21:17:04 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 09 Sep 2007 22:17:04 +0100
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46D31C97.1070200@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk><b43bf2080708220841s7ba6bf7cof74a99866e4ef93a@mail.gmail.com>	<46CC5C17.4000709@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu>
	<46D31C97.1070200@maubp.freeserve.co.uk>
Message-ID: <46E462D0.5090207@maubp.freeserve.co.uk>

Peter wrote:
> I think having SeqRecord subclass Seq is nicer than simply adding 
> annotation to the Seq class. Seq objects would (still) just have a 
> sequence and alphabet, the SeqRecord becomes a rich/annotated Seq object.
> 
> I think this would be close to BioPerl's Seq and RichSeq objects.
> 
> I have filed an enhancement on Bugzilla to hold any suggested patches 
> etc (I hope to upload something later tonight):
> 
> Bug 2351 - Make SeqRecord subclass Seq subclass string?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2351

Going back over the mailing list archives, we discussed something 
similar on the dev mailing list back in early 2005.

I would like to make the following "small" change now, ready for the 
next release of Biopython:

(1) Make __str__ give the full sequence as a string for Seq and
     MutableSeq objects, allowing intuitive use of str(myseq) which
     used to give a truncated representation including the alphabet.
(2) tostring() will be documented as deprecated in favour of str(...)
(3) leave __repr__ as is (giving the full string with an alphabet)
     which can be used with eval(repr(myseq))

There will be some fallout to this - in particular we'll need to go over 
the documentation and may need to fix a few things.

The only downside is the loss of a built in method to get a "short seq 
string representation" (currently available as str(myseq) via __str__). 
Back in 2005, Frédéric Sohm suggested adding a short() method to do 
this. Personally I'd only use this when working at the command line, but 
it might be nice.  One refinement over the current truncation is that I 
would personally include the last three letters - this is handy when 
looking at genes as you might want to know if there was a stop codon 
present.

e.g.

Seq('MLKILLATTMLIPTAFILKPQILHQTMISYTFILTLFSLIFLKQNQYLKPLSNLYLN...LVL', 
SingleLetterAlphabet())

rather than:

Seq('MLKILLATTMLIPTAFILKPQILHQTMISYTFILTLFSLIFLKQNQYLKPLSNLYLNLDQ ...', 
SingleLetterAlphabet())

and similarly for nucleotides (which is why I suggest at least the last 
three trailing letters).
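
To show how the three pieces would fit together, here is a toy class in
Python 3 syntax. ToySeq is a stand-in, not Biopython's Seq; the head
and tail defaults of 57 and 3 are simply read off the example above,
and the alphabet is kept as a plain string to keep the sketch
self-contained.

```python
class ToySeq:
    """Toy stand-in for a sequence object; not Biopython's Seq."""

    def __init__(self, data, alphabet="SingleLetterAlphabet()"):
        self.data = data
        self.alphabet = alphabet

    def __len__(self):
        return len(self.data)

    def __str__(self):
        # Point (1): str() and print give the full sequence.
        return self.data

    def __repr__(self):
        # Point (3): full sequence together with the alphabet.
        return "Seq(%r, %s)" % (self.data, self.alphabet)

    def short(self, head=57, tail=3):
        # Truncated form that keeps the last few letters visible,
        # e.g. to see whether a stop codon is present.
        if len(self.data) <= head + tail:
            shown = self.data
        else:
            shown = self.data[:head] + "..." + self.data[-tail:]
        return "Seq(%r, %s)" % (shown, self.alphabet)
```

With this, "%s" % my_seq gives the whole sequence, while
my_seq.short() gives the abbreviated form for use at the command line.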

Peter


From mdehoon at c2b2.columbia.edu  Mon Sep 10 00:04:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 10 Sep 2007 09:04:28 +0900
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E462D0.5090207@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk><b43bf2080708220841s7ba6bf7cof74a99866e4ef93a@mail.gmail.com>	<46CC5C17.4000709@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu>
	<46D31C97.1070200@maubp.freeserve.co.uk>
	<46E462D0.5090207@maubp.freeserve.co.uk>
Message-ID: <46E48A0C.1050403@c2b2.columbia.edu>

Peter wrote:
> I would like to make the following "small" change now, ready for the 
> next release of Biopython:
> 
> (1) Make __str__ give the full sequence as a string for Seq and
>     MutableSeq objects, allowing intuitive use of str(myseq) which
>     used to give a truncated representation including the alphabet.

Note that the __str__ is used to create the output of "print myseq", 
where myseq is a Seq object. So if __str__ returns the full sequence 
string, then "print myseq" will print the full sequence. This is not 
necessarily what you want. In essence, the str() function and the 
.tostring() method have different functions. So I think we should not 
drop .tostring() in favor of str().

Moreover, this problem will go away if and when a Seq object subclasses 
from a string object. Then, we won't need a Seq-to-string function at all.

--Michiel.


From biopython at maubp.freeserve.co.uk  Mon Sep 10 08:27:18 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 10 Sep 2007 09:27:18 +0100
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E48A0C.1050403@c2b2.columbia.edu>
References: <46CC50BB.1090902@maubp.freeserve.co.uk><b43bf2080708220841s7ba6bf7cof74a99866e4ef93a@mail.gmail.com>	<46CC5C17.4000709@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu>	<46D31C97.1070200@maubp.freeserve.co.uk>	<46E462D0.5090207@maubp.freeserve.co.uk>
	<46E48A0C.1050403@c2b2.columbia.edu>
Message-ID: <46E4FFE6.9040608@maubp.freeserve.co.uk>

We seem to be talking at cross purposes.

Michiel de Hoon wrote:
> Peter wrote:
>> I would like to make the following "small" change now, ready for
>> the next release of Biopython:
>> 
>> (1) Make __str__ give the full sequence as a string for Seq and 
>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>> to give a truncated representation including the alphabet.
> 
> Note that the __str__ is used to create the output of "print myseq",
>  where myseq is a Seq object. So if __str__ returns the full sequence
>  string, then "print myseq" will print the full sequence. This is not
>  necessarily what you want.

Getting the full string from both "print my_seq" and str(my_seq) is what
I would expect from a Seq object that acted like a string.

> In essence, the str() function and the .tostring() method have
> different functions. So I think we should not drop .tostring() in
> favor of str().

At the moment str() and .tostring() do serve different purposes.  
Currently with a Seq object called my_seq:
* full sequence as string - my_seq.tostring()
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - str(my_seq)

What I would like:
* full sequence as string - str(my_seq) and retain my_seq.tostring() for 
backwards compatibility.
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - consider adding 
a new method e.g. my_seq.short()

> Moreover, this problem will go away if and when a Seq object
> subclasses from a string object. Then, we won't need a Seq-to-string
> function at all.

What do you mean by the "problem will go away"?  This would be much
easier to discuss in person :(

If/when we make Seq a subclass of string, there would still be __str__
and __repr__ methods, and I would expect str(my_seq) and also "print
my_seq" to give the full sequence.  For backwards compatibility I would
keep the existing .tostring() method as well.

I would find it very strange to have the Seq object subclass string, but 
have str(my_seq) not give me the full sequence.  Isn't making 
str(my_seq) return the full sequence as a string essential for things 
like this?

print my_seq
print "My sequence is %s, length %i" % (my_seq, len(my_seq))

Rather than as currently required:

print my_seq.tostring()
print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))


Peter


From mdehoon at c2b2.columbia.edu  Mon Sep 10 09:56:25 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 10 Sep 2007 18:56:25 +0900
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E4FFE6.9040608@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk><b43bf2080708220841s7ba6bf7cof74a99866e4ef93a@mail.gmail.com>	<46CC5C17.4000709@maubp.freeserve.co.uk>	<6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu>	<46D31C97.1070200@maubp.freeserve.co.uk>	<46E462D0.5090207@maubp.freeserve.co.uk>
	<46E48A0C.1050403@c2b2.columbia.edu>
	<46E4FFE6.9040608@maubp.freeserve.co.uk>
Message-ID: <46E514C9.2010006@c2b2.columbia.edu>

Let's have the Seq/MutableSeq/SeqRecord discussion after the upcoming 
release, which is only five days away. There's not enough time to 
discuss these issues in detail, let alone to test them.

--Michiel.


Peter wrote:
> We seem to be talking at cross purposes.
> 
> Michiel de Hoon wrote:
>> Peter wrote:
>>> I would like to make the following "small" change now, ready for
>>> the next release of Biopython:
>>>
>>> (1) Make __str__ give the full sequence as a string for Seq and 
>>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>>> to give a truncated representation including the alphabet.
>>
>> Note that the __str__ is used to create the output of "print myseq",
>>  where myseq is a Seq object. So if __str__ returns the full sequence
>>  string, then "print myseq" will print the full sequence. This is not
>>  necessarily what you want.
> 
> Getting the full string from both "print my_seq" and str(my_seq) is what
> I would expect from a Seq object that acted like a string.
> 
>> In essence, the str() function and the .tostring() method have
>> different functions. So I think we should not drop .tostring() in
>> favor of str().
> 
> At the moment str() and .tostring() do serve different purposes.  
> Currently with a Seq object called my_seq:
> * full sequence as string - my_seq.tostring()
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - str(my_seq)
> 
> What I would like:
> * full sequence as string - str(my_seq) and retain my_seq.tostring() for 
> backwards compatibility.
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - consider adding 
> a new method e.g. my_seq.short()
> 
>> Moreover, this problem will go away if and when a Seq object
>> subclasses from a string object. Then, we won't need a Seq-to-string
>> function at all.
> 
> What do you mean by the "problem will go away"?  This would be much
> easier to discuss in person :(
> 
> If/when we make Seq a subclass of string, there would still be __str__
> and __repr__ methods, and I would expect str(my_seq) and also "print
> my_seq" to give the full sequence.  For backwards compatibility I would
> keep the existing .tostring() method as well.
> 
> I would find it very strange to have the Seq object subclass string, but 
> have str(my_seq) not give me the full sequence.  Isn't making 
> str(my_seq) return the full sequence as a string essential for things 
> like this?
> 
> print my_seq
> print "My sequence is %s, length %i" % (my_seq, len(my_seq))
> 
> Rather than as currently required:
> 
> print my_seq.tostring()
> print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))
> 
> 
> Peter
> 


From mdehoon at c2b2.columbia.edu  Tue Sep 11 14:37:57 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Tue, 11 Sep 2007 23:37:57 +0900
Subject: [BioPython] Bio.MultiProc
Message-ID: <46E6A845.3030601@c2b2.columbia.edu>

Hi everybody,

In preparation for the upcoming release, I was running the Biopython 
test suite and found that test_copen.py hangs on Cygwin. It doesn't 
fail, it just sits there forever. This may be related to the use of 
fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it 
is probably possible to fix this, I'd have to dig fairly deep into the 
code, and I am not sure if it is worth it. It looks like the copen 
functions are used only in Bio/config, which is needed for Bio.db. A 
description of the functionality of this module can be found in the 
tutorial section 4.7.2.

Now, I don't remember users asking about this module on the mailing 
list. From the tutorial documentation, it seems to be a nice piece of 
code, but I doubt that it is being used often in practice.

So I was wondering:
1) Is anybody on this list using this code?
2) If not, can I mark it as deprecated for the upcoming release? 
Hopefully, people who are using this code will notice, and let us know 
that they need it.

--Michiel.


From biopython at maubp.freeserve.co.uk  Wed Sep 12 18:31:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Sep 2007 19:31:43 +0100
Subject: [BioPython] Deprecating Bio.FormatIO ?
Message-ID: <46E8308F.6040709@maubp.freeserve.co.uk>

With the release of Biopython 1.43 and Bio.SeqIO earlier this year, would 
anyone be upset if the older Bio.FormatIO module was marked as 
deprecated for the next Biopython release?

This module isn't mentioned in the tutorial/cookbook, but Brad did write 
this entire document:
http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.pdf
http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html

In addition to marking Bio.FormatIO as deprecated, I would probably add 
a big disclaimer to that document, or re-write it to use Bio.SeqIO instead.

Thanks

Peter


From mdehoon at c2b2.columbia.edu  Thu Sep 13 05:13:29 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 13 Sep 2007 01:13:29 -0400
Subject: [BioPython] Deprecating Fasta.Dictionary, GenBank.Dictionary
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B61E@mail2.exch.c2b2.columbia.edu>

Hi everybody,

In the preparation for the upcoming Biopython release, we noticed some
serious problems when using the latest version (3.0) of mxTextTools. We were
already able to fix several of them, but some Biopython tests still fail with
the new mxTextTools. One of the tests that fails is test_Fasta.py. The part
of the test that fails is related to creating a Fasta Dictionary. This is not
explicitly described in the Tutorial, but it is essentially the same as
creating a Genbank dictionary, which is explained in section 4.3.4 in the
Tutorial.

Quoting from the tutorial:
>>> from Bio import GenBank
>>> dict_file = 'cor6_6.gb'
>>> index_file = 'cor6_6.idx'
>>> GenBank.index_file(dict_file, index_file)
>>> gb_dict = GenBank.Dictionary(index_file, GenBank.FeatureParser())
>>> len(gb_dict)
6
>>> gb_dict.keys()
['L31939', 'AJ237582', 'X62281', 'AF297471', 'M81224', 'X55053']
>>> gb_dict['AJ237582']
<Bio.SeqRecord.SeqRecord instance at 0x102fdd8c>


The same can also be obtained with the new Bio.SeqIO code:

>>> from Bio import SeqIO
>>> records = SeqIO.parse(open('cor6_6.gb'), 'genbank')
>>> gb_dict = {}
>>> for record in records:
...     key = record.id.split(".")[0]
...     gb_dict[key] = record
...
>>> gb_dict.keys()
['M81224', 'AF297471', 'X62281', 'AJ237582', 'L31939', 'X55053']
>>> # etcetera

(you can also use the to_dict function in Bio.SeqIO). The same can also be
done for Fasta.
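
As a rough sketch of the same idea without any Biopython dependency, the loop
above can be wrapped in a small function that also refuses duplicate keys.
The Record type and the index_by_accession name are assumptions made for this
illustration; real code would pass the records from SeqIO.parse instead.

```python
from collections import namedtuple

# Stand-in for a SeqRecord: only the .id attribute matters here.
Record = namedtuple("Record", ["id", "seq"])


def index_by_accession(records):
    """Map accession (version suffix removed) to record; hypothetical helper."""
    index = {}
    for record in records:
        key = record.id.split(".")[0]
        if key in index:
            # Silently overwriting would hide a problem in the input file.
            raise ValueError("Duplicate key %r" % key)
        index[key] = record
    return index


records = [Record("AJ237582.1", "ACGT"), Record("X62281.1", "GGCC")]
gb_dict = index_by_accession(records)
print(sorted(gb_dict.keys()))
```

Unlike the index_file approach, this keeps every record in memory, so it
suits files of modest size.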

So, I'd like to deprecate the index_file functions where Bio.SeqIO can be
used instead, in particular for Fasta. Then, we can remove that particular
test from test_Fasta. Would that cause problems for anybody? Given the new
Bio.SeqIO code, does anybody still need to use the index_file functions? 

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From letondal at pasteur.Fr  Fri Sep 14 19:12:28 2007
From: letondal at pasteur.Fr (Catherine Letondal)
Date: Fri, 14 Sep 2007 21:12:28 +0200
Subject: [BioPython] Programming course at Institut Pasteur (winter 2008)
Message-ID: <19581094-71E1-4400-83ED-12EF051BF7CB@pasteur.Fr>

Hi,


************************************************************************

       Course in informatics for biology 2008 at Institut Pasteur
          http://www.pasteur.fr/formation/infobio-en.html
         *** Registration extended to October 15th 2007 ***

************************************************************************


       In the series of courses offered at the Pasteur Institute, a course
will be offered in informatics in biology. The next session will take
place from January to the end of April 2008.

       The main goal of this course is to provide researchers in biology
an initial exposure to informatics. Admission to the course is reserved
for those with a degree in biology or a related discipline.

       With more and more bioinformatics tools available, it becomes
increasingly important for researchers in biology to be able to
manage their data, implement their ideas, and judge for themselves the
usefulness of new algorithms and software.

       This course will emphasize fundamental aspects of computer science
and apply them to biological examples. Theoretical aspects (algorithm
development, logic, problem modeling and design methods), and technical
applications (databases and web technologies) that are relevant for
biologists will be thoroughly discussed.

       Programming is presented through the object-oriented paradigm,
using a modern high-level language, Python, provided with tools for
biology and enabling both prototyping or scripting and the building of
important software systems. Learning of an additional language (C) will
be available for interested students.

       Learning during the course will be reinforced with computing
exercises, and effective training will be provided by a 2 month research
project.

       The working language of the course is French.

For further information, please consult:

       http://www.pasteur.fr/formation/infobio-en.html

    *** Registration will be closed on October 15th 2007. ***

Sincerely,


--
Benno Schwikowski & Catherine Letondal
Institut Pasteur -- Course in Informatics for Biology
www.pasteur.fr/formation/infobio


From dalloliogm at gmail.com  Mon Sep 17 09:39:38 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 17 Sep 2007 11:39:38 +0200
Subject: [BioPython] sequence logo with biopython
Message-ID: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>

Hi,
is there any way to produce sequence logos[1] with biopython?

I have a set of sequences of the same length, which represent the 5'
donorsite in a set of introns.
I wonder if there is a way to create and display a .png logo
representation of them, like with this program:
- http://weblogo.berkeley.edu/


Thanks!!


[1] http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html
-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com


From bartek at rezolwenta.eu.org  Mon Sep 17 09:49:12 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Mon, 17 Sep 2007 11:49:12 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
Message-ID: <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>

Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:

> Hi,
> is there any way to produce sequence logos[1] with biopython?
> 
> I have a set of sequences of the same length, which represent the 5'
> donorsite in a set of introns.
> I wonder if there is a way to create and display a .png logo
> representation of them, like with this program:
> - http://weblogo.berkeley.edu/
> 

Unfortunately, there is currently no solution for this in biopython. You can,
however, take a look at TAMO, a python library designed for working with
motifs:
http://fraenkel.mit.edu/TAMO/
I'm not sure if you can make png files with it, but there are ways to at least
obtain a text version of the logo.

-- 
regards
   Bartek
--
For every complex problem there is an answer that is clear, simple, and wrong. 
                   H. L. Mencken


From bartek at rezolwenta.eu.org  Mon Sep 17 13:35:22 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Mon, 17 Sep 2007 15:35:22 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
Message-ID: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>

bartek wilczynski <bartek at rezolwenta.eu.org> wrote:

> Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> 
> > Hi,
> > is there any way to produce sequence logos[1] with biopython?
> > 
> > I have a set of sequences of the same length, which represent the 5'
> > donorsite in a set of introns.
> > I wonder if there is a way to create and display a .png logo
> > representation of them, like with this program:
> > - http://weblogo.berkeley.edu/
> > 
> 
> Unfortunately, currently there is no solution in biopython to this. You can
> however take a look at TAMO, a python library designed for working with
> motifs.
> http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with
> it,
> but there are ways to at least obtain text version of the logo.
> 

I've looked into it, and found a way to add this functionality to biopython.

The diff file attached introduces a method .weblogo("filename.png") to the
Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a
standalone function which takes a fasta file as input.

Is this the right time to submit things like this to cvs? I can do that, but I do
not want to mess up the (soon to be available) new release. 

-- 
regards
   Bartek
--
For every complex problem there is an answer that is clear, simple, and wrong. 
                   H. L. Mencken

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Motif.py
Type: text/x-python
Size: 8378 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20070917/0a15e1de/attachment-0002.py>

From dalloliogm at gmail.com  Mon Sep 17 14:23:54 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 17 Sep 2007 16:23:54 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
Message-ID: <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com>

Thank you: this is very good.

I see that it uses the berkeley weblogo website and urllib.

just one newbie question: why do you put it in the Bio.AlignAce.Motif class?
Thanks

Giovanni

2007/9/17, bartek wilczynski <bartek at rezolwenta.eu.org>:
> bartek wilczynski <bartek at rezolwenta.eu.org> wrote:
>
> > Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> >
> > > Hi,
> > > is there any way to produce sequence logos[1] with biopython?
> > >
> > > I have a set of sequences of the same length, which represent the 5'
> > > donorsite in a set of introns.
> > > I wonder if there is a way to create and display a .png logo
> > > representation of them, like with this program:
> > > - http://weblogo.berkeley.edu/
> > >
> >
> > Unfortunately, currently there is no solution in biopython to this. You can
> > however take a look at TAMO, a python library designed for working with
> > motifs.
> > http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with
> > it,
> > but there are ways to at least obtain text version of the logo.
> >
>
> I've looked into it, and found a way to add this functionality to biopython.
>
> The diff file attached introduces a method .weblogo("filename.png") to the
> Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a
> standalone function which takes a fasta file as input.
>
> Is it a right time to submit things like this to cvs? I can do that, but I do
> not want to mess up the (soon to be available) new release.
>
> --
> regards
>    Bartek
> --
> For every complex problem there is an answer that is clear, simple, and wrong.
>                    H. L. Mencken
>
>
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com


From biopython at maubp.freeserve.co.uk  Mon Sep 17 14:24:42 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Sep 2007 15:24:42 +0100
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
Message-ID: <46EE8E2A.3080809@maubp.freeserve.co.uk>

bartek wilczynski wrote:
> I've looked into it, and found a way to add this functionality to biopython.
> 
> The diff file attached introduces a method .weblogo("filename.png") to the
> Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a
> standalone function which takes a fasta file as input.
> 
> Is it a right time to submit things like this to cvs? I can do that, but I do
> not want to mess up the (soon to be available) new release. 

It's a very small change, but let's see what Michiel says about the timing.

It might be nice to expose all the options to the end user, possibly as 
handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds 
as in Bio/Blast/NCBIStandalone.py  blastall() etc.

Peter


From bartek at rezolwenta.eu.org  Mon Sep 17 14:57:52 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Mon, 17 Sep 2007 16:57:52 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com>
Message-ID: <1190041072.46ee95f03d51d@imp.rezolwenta.eu.org>

Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:

> Thank you: this is very good.
> 
> I see that it uses the berkeley weblogo website and urllib.
> 
> just one newbie question: why do you put it in the Bio.AlignAce.Motif class?
> Thanks

Well, the quick answer is that it is the most convenient place for me to put it.
Since there is a Motif class for sequence motif objects, it is not a bad one. 

A longer answer is that biopython does not have a good infrastructure for
dealing with motifs. I've contributed the AlignAce lib, Jason Hackney
contributed the MEME library, which includes another Motif class, very similar,
but not exactly compatible with AlignAce code. 

I once planned to do some refactoring work to unify these two modules, but so far
I have not found the time to do it. Now that the TAMO library is available,
there is even less incentive to do so (even though I do not use TAMO myself).

cheers bartek


From bartek at rezolwenta.eu.org  Mon Sep 17 22:09:30 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Tue, 18 Sep 2007 00:09:30 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <46EE8E2A.3080809@maubp.freeserve.co.uk>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
Message-ID: <1190066970.46eefb1a93134@imp.rezolwenta.eu.org>

Peter <biopython at maubp.freeserve.co.uk> wrote:

> > Is it a right time to submit things like this to cvs? I can do that, but I
> > do not want to mess up the (soon to be available) new release. 
> 
> It's a very small change, but let's see what Michiel says about the timing.
> 
It is indeed a very small change, however it seems to have at least one
prospective user ;). Also it is almost impossible to break anything by
including it in the new release. 

> It might be nice to expose all the options to the end user, possibly as 
> handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds 
> as in Bio/Blast/NCBIStandalone.py  blastall() etc.

Good idea, I've included a new diff, which allows for passing any keys directly
from function call to the weblogo server such as:

m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image

BTW, it would be interesting to know if there are more people interested in
using a better module for sequence motifs. I have some code lying around and
some ideas on how it could be put together, but since there are no documented
cases of anyone using Bio.AlignAce or Bio.MEME, I'm not sure if it's worth the
extra work.

-- 
cheers
   Bartek

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Motif.py.diff
Type: text/x-patch
Size: 2450 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20070918/20c9d15e/attachment-0002.bin>

From biopython at maubp.freeserve.co.uk  Tue Sep 18 08:12:02 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 18 Sep 2007 09:12:02 +0100
Subject: [BioPython] Removing Bio.FormatIO ?
In-Reply-To: <46E8308F.6040709@maubp.freeserve.co.uk>
References: <46E8308F.6040709@maubp.freeserve.co.uk>
Message-ID: <46EF8852.6090000@maubp.freeserve.co.uk>

Having looked at the Bio.FormatIO code in more detail, a simple 
deprecation warning isn't an option - it would get triggered whenever 
anyone used Bio.SeqRecord
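
A minimal sketch of why a warning placed in the deprecated module would fire
for everyone: the two functions below are hypothetical stand-ins for module
bodies (they are not the real Bio.FormatIO or Bio.SeqRecord code), with one
"module" loading the other.

```python
import warnings


def load_formatio():
    """Stand-in for the top-level code of the deprecated module."""
    warnings.warn("Bio.FormatIO is deprecated", DeprecationWarning, stacklevel=2)


def load_seqrecord():
    """Stand-in for a module that pulls in the deprecated one when loaded."""
    load_formatio()
    return "SeqRecord ready"


with warnings.catch_warnings(record=True) as caught:
    # Show every warning, regardless of the interpreter's default filters.
    warnings.simplefilter("always")
    result = load_seqrecord()

print(result, len(caught), caught[0].category.__name__)
```

The caller only asked for the SeqRecord stand-in, yet still receives the
deprecation warning, which is the situation described above.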

Would anyone object if we removed Bio.FormatIO (and its hooks in 
Bio/SeqRecord.py and Bio/Search.py) entirely for the next release?

Speak now or forever hold your peace! ;)

Peter

Peter wrote:
> With the release of Biopython 1.43 and Bio.SeqIO earlier this year, would 
> anyone be upset if the older Bio.FormatIO module was marked as 
> deprecated for the next Biopython release?
> 
> This module isn't mentioned in the tutorial/cookbook, but Brad did write 
> this entire document:
> http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.pdf
> http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html
> 
> In addition to marking Bio.FormatIO as deprecated, I would probably add 
> a big disclaimer to that document, or re-write it to use Bio.SeqIO instead.
> 
> Thanks
> 
> Peter


From biopython at maubp.freeserve.co.uk  Tue Sep 18 08:51:26 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 18 Sep 2007 09:51:26 +0100
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
Message-ID: <46EF918E.90107@maubp.freeserve.co.uk>

>> It might be nice to expose all the options to the end user, possibly as 
>> handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds 
>> as in Bio/Blast/NCBIStandalone.py  blastall() etc.
> 
> Good idea, I've included a new diff, which allows for passing any keys directly
> from function call to the weblogo server such as:
> 
> m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image

Does this let you do things like:

m.weblogo("x.png", res=300)

i.e. an integer, or do you have to use a string:

m.weblogo("x.png", res="300")

One way to "fix" this (if it is a problem) would be to do this:

for k,v in kwds.items():
     values[k]=str(v)

rather than:

for k,v in kwds.items():
     values[k]=v
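
To make the two loops concrete, here is a small sketch (written for Python 3,
so it uses urllib.parse rather than the urllib module of the time). The
build_form_values name and the "DEFAULT" starting value are assumptions for
illustration only; the actual form fields expected by the weblogo server are
not shown in this thread.

```python
from urllib.parse import urlencode


def build_form_values(defaults, **kwds):
    """Merge user keywords into the defaults, storing every value as text."""
    values = dict(defaults)
    for k, v in kwds.items():
        # str() lets callers pass res=300 as well as res="300".
        values[k] = str(v)
    return values


# "DEFAULT" is a made-up placeholder for whatever the method uses initially.
values = build_form_values({"colorscheme": "DEFAULT"}, res=300, colorscheme="BW")
query = urlencode(sorted(values.items()))
print(query)
```

Converting early means the stored values have one type, whatever the caller
passed in.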

Anyway, given we have at least ten days until the release (Michiel will 
be away - see his email on the developers list), and this is a little 
change, I would be happy for this to go into CVS now.

Peter


From bartek at rezolwenta.eu.org  Tue Sep 18 13:12:41 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Tue, 18 Sep 2007 15:12:41 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <46EF918E.90107@maubp.freeserve.co.uk>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
	<46EF918E.90107@maubp.freeserve.co.uk>
Message-ID: <1190121161.46efcec961c0e@imp.rezolwenta.eu.org>

Peter <biopython at maubp.freeserve.co.uk> wrote:
> > 
> > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image
> 
> Does this let you do things like:
> 
> m.weblogo("x.png", res=300)
> 
> i.e. an integer, or do you have to use a string:
> 
> m.weblogo("x.png", res="300")
> 
> One way to "fix" this (if it is a problem) would be to do this:
> 
> for k,v in kwds.items():
>      values[k]=str(v)
> 
> rather than:
> 
> for k,v in kwds.items():
>      values[k]=v
> 
> Anyway, given we have at least ten days until the release (Michiel will 
> be away - see his email on the developers list), and this is a little 
> change, I would be happy for this to go into CVS now.

Thanks for another good idea. I submitted the code to CVS. 

-- 
cheers
   Bartek


From dalloliogm at gmail.com  Tue Sep 18 14:36:59 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 18 Sep 2007 16:36:59 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190121161.46efcec961c0e@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
	<46EF918E.90107@maubp.freeserve.co.uk>
	<1190121161.46efcec961c0e@imp.rezolwenta.eu.org>
Message-ID: <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com>

ok, thank you.

so, let's see if I understand how to use it:

from Bio.Seq import Seq
from Bio.AlignAce.Motif import Motif

m = Motif()
m.add_instance(Seq('ACTG'))
m.add_instance(Seq('ACCG'))
m.add_instance(Seq('ACTC'))

m.search_instance(Seq('ACACGACAACGTGTCGAT'))

m.weblogo('/home/user/logo.png')


Well, about refactoring... honestly, I think it would be a good idea.
The problem is that, for example, I have never used AlignAce and I
don't know what kind of program it is, so I find it a bit confusing to
import a module with that name.
Anyway, the Motif class seems useful, and I will use it in my program.
I will probably have to ask a few questions about it in the next few days!
:)

2007/9/18, bartek wilczynski <bartek at rezolwenta.eu.org>:
> Peter <biopython at maubp.freeserve.co.uk> wrote:
> > >
> > > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image
> >
> > Does this let you do things like:
> >
> > m.weblogo("x.png", res=300)
> >
> > i.e. an integer, or do you have to use a string:
> >
> > m.weblogo("x.png", res="300")
> >
> > One way to "fix" this (if it is a problem) would be to do this:
> >
> > for k,v in kwds.items():
> >      values[k]=str(v)
> >
> > rather than:
> >
> > for k,v in kwds.items():
> >      values[k]=v
> >
> > Anyway, given we have at least ten days until the release (Michiel will
> > be away - see his email on the developers list), and this is a little
> > change, I would be happy for this to go into CVS now.
>
> Thanks for another good idea. I submitted the code to CVS.
>
> --
> cheers
>    Bartek
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com


From bartek at rezolwenta.eu.org  Tue Sep 18 23:09:29 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Wed, 19 Sep 2007 01:09:29 +0200
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
	<46EF918E.90107@maubp.freeserve.co.uk>
	<1190121161.46efcec961c0e@imp.rezolwenta.eu.org>
	<5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com>
Message-ID: <1190156969.46f05aa950c96@imp.rezolwenta.eu.org>

Giovanni Marco Dall'Olio <dalloliogm at gmail.com>:

> ok, thank you.
> 
> so, let's see if I understand how to use it:
> 
> from Bio.Seq import Seq
> from Bio.AlignAce.Motif import Motif
> 
> m = Motif()
> m.add_instance(Seq('ACTG'))
> m.add_instance(Seq('ACCG'))
> m.add_instance(Seq('ACTC'))
> 
> m.search_instance(Seq('ACACGACAACGTGTCGAT'))
> 
> m.weblogo('/home/user/logo.png')
> 

You got it mostly right. However, the .search_instance() and .search_pwm()
methods return generators, so you should rather use:

for pos,instance in m.search_instance(sequence):
    print "found %s at %d"%(instance,pos)
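
The generator behaviour can be seen in a stand-alone sketch. find_instances
below is a hypothetical pure-Python function written for this illustration,
not the Motif method itself; it only assumes the same (position, instance)
result shape.

```python
import types


def find_instances(instances, sequence):
    """Yield (position, instance) for each exact match; illustrative only."""
    for pos in range(len(sequence)):
        for instance in instances:
            if sequence.startswith(instance, pos):
                yield pos, instance


hits = find_instances(["ACTG", "ACCG", "ACTC"], "TTACTGAACCG")
# Calling the function performs no search yet; it only creates a generator.
is_generator = isinstance(hits, types.GeneratorType)

# The search actually runs while the generator is being consumed.
found = list(hits)
for pos, instance in found:
    print("found %s at %d" % (instance, pos))
```

This is why a bare call with no loop around it appears to do nothing.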

> 
> Well, about refactoring.... honestly I think it would be a good idea.
> The problem is that for example, I have never used AlignAce and I
> don't know which kind of program is it... so I feel a bit confusing to
> import a module called like this.

The basic idea is to create a new Motif class aggregating the good parts of the
AlignAce and MEME versions and modify these modules so they would use the new
class. I'll try to look into that next week. I also have some code for reading
modules from the JASPAR database and motif comparisons. I'll try to clean it up
and submit it as well. Then we could try to come up with a section in the tutorial
devoted to motif analysis. If you have anything you would consider useful in the
Motif library, let me know.

> Anyway the Motif class seems useful, and I will use it in my program..
> problably I will have to ask a few questions on it in the next days!
> :)

No problem, I'll do my best to answer your questions. However I'm leaving
tomorrow for the CMSB conference, so I may be slow at responding to email this
week.

-- 
cheers
   Bartek


From robert.campbell at queensu.ca  Wed Sep 19 13:32:14 2007
From: robert.campbell at queensu.ca (Robert Campbell)
Date: Wed, 19 Sep 2007 09:32:14 -0400
Subject: [BioPython] sequence logo with biopython
In-Reply-To: <1190156969.46f05aa950c96@imp.rezolwenta.eu.org>
References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com>
	<1190022552.46ee4d98a071d@imp.rezolwenta.eu.org>
	<1190036122.46ee829a34e8a@imp.rezolwenta.eu.org>
	<46EE8E2A.3080809@maubp.freeserve.co.uk>
	<1190066970.46eefb1a93134@imp.rezolwenta.eu.org>
	<46EF918E.90107@maubp.freeserve.co.uk>
	<1190121161.46efcec961c0e@imp.rezolwenta.eu.org>
	<5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com>
	<1190156969.46f05aa950c96@imp.rezolwenta.eu.org>
Message-ID: <20070919093214.2c7567da@adelie.biochem.queensu.ca>

On Wed, 19 Sep 2007 01:09:29 +0200, bartek wilczynski
<bartek at rezolwenta.eu.org> wrote:

> Giovanni Marco Dall'Olio <dalloliogm at gmail.com>:
> 
> > ok, thank you.
> > 
> > so, let's see if I understand how to use it:
> > 
> > from Bio.Seq import Seq
> > from Bio.AlignAce.Motif import Motif
> > 
> > m = Motif()
> > m.add_instance(Seq('ACTG'))
> > m.add_instance(Seq('ACCG'))
> > m.add_instance(Seq('ACTC'))
> > 
> > m.search_instance(Seq('ACACGACAACGTGTCGAT'))
> > 
> > m.weblogo('/home/user/logo.png')
> > 
> 
> You got it mostly right. However, the .search_instance() and .search_pwm()
> methods return generators, so you should rather use:
> 
> for pos,instance in m.search_instance(sequence):
>     print "found %s at %d"%(instance,pos)

I believe that should be "m.search_instances(sequence)" not
"m.search_instance(sequence)"  (i.e. "instances", plural).

Cheers,
Rob
-- 
Robert L. Campbell, Ph.D.
Senior Research Associate/Adjunct Assistant Professor 
Botterell Hall Rm 644
Department of Biochemistry, Queen's University, 
Kingston, ON K7L 3N6  Canada
Tel: 613-533-6821            Fax: 613-533-2497
<robert.campbell at queensu.ca>    http://pldserver1.biochem.queensu.ca/~rlc


From meesters at uni-mainz.de  Thu Sep 20 12:23:54 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Thu, 20 Sep 2007 14:23:54 +0200
Subject: [BioPython] feature request for Bio.PDB
Message-ID: <1190291034.9570.28.camel@cmeesters>

Hi,

I think it would be good to have the option to retrieve the kind of atom
added using a method of the atom-class, e.g. like:
x = atom.get_kind()
and x would then be 'H' or 'N' for instance. It is of course possible to
retrieve this information via the atom id, but this requires employing a
dictionary if one wants to know which type of atom this is. So, such a
method would only be for convenience.
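
For illustration, the dictionary workaround mentioned above might look like
the sketch below. The get_kind function and the BACKBONE_KINDS table are
hypothetical names, not part of Bio.PDB, and the table only covers the
backbone atoms of standard protein residues.

```python
# Backbone atom names in standard protein residues, mapped to their element.
BACKBONE_KINDS = {"N": "N", "CA": "C", "C": "C", "O": "O"}


def get_kind(atom_name, table=BACKBONE_KINDS):
    """Return the element for a PDB atom name, or None if it is not listed."""
    return table.get(atom_name.strip())


kinds = [get_kind(name) for name in ["N", "CA", "C", "O"]]
print(kinds)
```

A lookup by name is fragile: "CA" is the alpha carbon in a protein residue
but calcium in a hetero record, which is one argument for having the atom
object report its own kind.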

It would be nice to see this in the upcoming release, but I fear it's
too late for that, so it would be great if this idea were at least
considered for some future release.

Christian


From anaryin at gmail.com  Fri Sep 21 19:40:07 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Fri, 21 Sep 2007 20:40:07 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>
Message-ID: <b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>

Hello all!

I'm writing a small script to fetch results from an NCBI database search
using BioPython modules. However, I'd like to broaden my search and to have
each page of the results display 500 results instead of the usual 20.
Does anyone have any idea how to do this?

Thanks !

João Rodrigues


From anaryin at gmail.com  Fri Sep 21 20:33:55 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Fri, 21 Sep 2007 21:33:55 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <46F41FEA.5020205@maubp.freeserve.co.uk>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>
	<b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>
	<46F41FEA.5020205@maubp.freeserve.co.uk>
Message-ID: <b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>

Sure I can :) Must warn though, that I have 2 weeks of "python-ing" so the
code *could* be clearer! Oh, and some of it is in Portuguese because it's
for personal use..


# NCBI Retriever

import os
import sys

# What should I look for?

query = raw_input('Qual a expressao que deseja procurar?\n..: ')

# Where should I look for?

print 'Em qual das bases de dados deseja procurar?'

databases = {1: 'PubMed', 2: 'Nucleotide', 3: 'Protein', 4: 'Genome', 5: 'Structure'}

choice = raw_input('[1] PubMed\n[2] Nucleotide\n[3] Protein\n[4] Genome\n[5] Structure\n..: ')

if int(choice) not in databases.keys():
    print 'Escolha Inválida'
    sys.exit()

search_database = databases[int(choice)]

# Quit playing around, let's search!

from Bio.WWW import NCBI

search_command = 'Search'

results = NCBI.query(search_command , search_database, term = query,
doptcmdl = 'FASTA')

# Where should I save the results?

import time
actual_date = str(time.localtime()[0])+str(time.localtime()[1])+str(
time.localtime()[2])
results_file_name = os.path.join(os.getcwd(),
str(query)+'_'+str(actual_date)+".txt")

results_file = open(results_file_name, 'w')

results_file.write(results.read())
results_file.close()
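
One detail of the script above: joining the year, month and day without
padding gives ambiguous stamps (12 January and 2 November both become
"2007112"). A possible alternative, sketched for Python 3 with a hypothetical
results_filename helper, is to let time.strftime produce a zero-padded date.

```python
import os
import time


def results_filename(query, when=None, directory="."):
    """Build '<query>_<YYYYMMDD>.txt' inside directory (illustrative helper)."""
    if when is None:
        when = time.localtime()
    # %m and %d are zero-padded, so every date maps to a distinct stamp.
    stamp = time.strftime("%Y%m%d", when)
    return os.path.join(directory, "%s_%s.txt" % (query, stamp))


example_day = time.strptime("2007-09-05", "%Y-%m-%d")
name = results_filename("orchid", example_day, "results")
print(name)
```

The padded form also sorts chronologically when the files are listed by name.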


From biopython at maubp.freeserve.co.uk  Fri Sep 21 21:40:15 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Sep 2007 22:40:15 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>	<b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>	<46F41FEA.5020205@maubp.freeserve.co.uk>
	<b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
Message-ID: <46F43A3F.9090008@maubp.freeserve.co.uk>

João Rodrigues wrote:
> Sure I can :) Must warn though, that I have 2 weeks of "python-ing" so the
> code *could* be clearer! Oh, and some of it is in Portuguese because it's
> for personal use..

That's fine - the code and comments were in English.

I see you are using Bio.WWW.NCBI as an interface to the Entrez query 
system.  Somewhere on the NCBI website they have an answer to your 
question (how to specify the number of results per page):

results = NCBI.query('Search', 'Protein', term='orchid', dispmax=23)

Some pages mentioned retstart and retmax but that doesn't seem to work.

You might also consider using Bio.EUtils instead - a python wrapper for 
the NCBI's E-Utils interface.
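
For the E-Utils route, here is a minimal sketch (not Bio.EUtils itself) of
how a larger page size could be requested.  It assumes the public esearch
endpoint and its db, term, retmax and retstart parameters, and it is
written for Python 3's standard library; the helper name
build_esearch_url is made up for illustration.  Note that retmax here
applies to esearch, not to the Entrez query page used by Bio.WWW.NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI E-utilities search service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, retmax=500, retstart=0):
    """Build (but do not fetch) an esearch URL asking for `retmax` hits."""
    params = {
        "db": database,
        "term": term,
        "retmax": retmax,      # number of identifiers per request
        "retstart": retstart,  # offset of the first identifier returned
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


url = build_esearch_url("protein", "orchid", retmax=500)
print(url)
```

Nothing is downloaded here; the URL would still need to be opened (for
example with urllib) and the returned identifiers parsed.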

Peter


From biopython at maubp.freeserve.co.uk  Fri Sep 21 22:00:58 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 21 Sep 2007 23:00:58 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>	<b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>	<46F41FEA.5020205@maubp.freeserve.co.uk>
	<b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
Message-ID: <46F43F1A.9040309@maubp.freeserve.co.uk>

Hi again João,

I was thinking about your example code, and while I'm not sure exactly 
what you want to be able to do in python:

You might want to look at the search_for() function in Bio.PubMed and 
Bio.GenBank (which uses EUtils internally), and then the download_many() 
or dictionary interfaces.  This is covered in the Biopython tutorial.

I'm not sure if we have a front end for the structure database at the 
moment.

This may be more helpful than working with Entrez directly.

Peter


From anaryin at gmail.com  Fri Sep 21 22:57:20 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Fri, 21 Sep 2007 23:57:20 +0100
Subject: [BioPython] More results at NCBI Search
In-Reply-To: <46F43F1A.9040309@maubp.freeserve.co.uk>
References: <mailman.27868.1190395383.2686.biopython@lists.open-bio.org>
	<b537e3710709211240w28c0f6cbs79e4c0883c75e096@mail.gmail.com>
	<46F41FEA.5020205@maubp.freeserve.co.uk>
	<b537e3710709211333r164b4b4h319f0459c5c5917c@mail.gmail.com>
	<46F43F1A.9040309@maubp.freeserve.co.uk>
Message-ID: <b537e3710709211557v6bb98c1ao85aca451e7306be@mail.gmail.com>

Thank you for the tip, it worked perfectly.

Well, to be honest, I'm just practicing BioPython and Python skills. What
I'm trying to do is a simple script that searches for *something* in PubMed,
gets the results page and parses that page so that I can give the user, that
is, myself at the moment :) , a txt file with this format:

----
TITLE:
AUTHOR:
YEAR:
JOURNAL: (optional actually)
ABSTRACT:
LINK:
RELATED LINKS:
----

It is probably already made and in a more useful way than mine but, as I do
need to practice, it's a start!

Again, thanks for the tips. I'll look into those Bio.PubMed and Bio.GenBank.
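
The record layout above could be written out along these lines.  This is
only a sketch in Python 3: the function name format_record, the
dictionary-based record and the sample values are placeholders, not
anything taken from Bio.PubMed.

```python
# Field order follows the layout listed above; JOURNAL is optional.
FIELDS = ["TITLE", "AUTHOR", "YEAR", "JOURNAL", "ABSTRACT", "LINK",
          "RELATED LINKS"]
OPTIONAL_FIELDS = {"JOURNAL"}


def format_record(record):
    """Return one record as text, delimited by '----' lines."""
    lines = ["----"]
    for field in FIELDS:
        if field in OPTIONAL_FIELDS and field not in record:
            continue  # leave out optional fields that were not supplied
        value = record.get(field, "")
        lines.append(("%s: %s" % (field, value)).rstrip())
    lines.append("----")
    return "\n".join(lines)


# Placeholder values, purely for illustration.
example = {"TITLE": "Example title", "AUTHOR": "A. Author", "YEAR": "2007"}
print(format_record(example))
```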


From anaryin at gmail.com  Mon Sep 24 16:13:33 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Mon, 24 Sep 2007 17:13:33 +0100
Subject: [BioPython] Configuring Proxy for certain Modules
Message-ID: <b537e3710709240913r1c01eee3w75e871f67d5fcab8@mail.gmail.com>

Hello!

I am working in a University whose network is proxied. I can't work with any
of the BioPython modules that require access to the Internet (e.g. Bio.WWW).
How can I configure them manually to override the proxy? I already read
about configuring the urllib to use a proxy, but I can't figure out where to
find the string that handles the connection.

João Rodrigues


From biopython at maubp.freeserve.co.uk  Mon Sep 24 16:58:56 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Sep 2007 17:58:56 +0100
Subject: [BioPython] Configuring Proxy for certain Modules
In-Reply-To: <b537e3710709240913r1c01eee3w75e871f67d5fcab8@mail.gmail.com>
References: <b537e3710709240913r1c01eee3w75e871f67d5fcab8@mail.gmail.com>
Message-ID: <46F7ECD0.8020001@maubp.freeserve.co.uk>

João Rodrigues wrote:
> Hello!
> 
> I am working in a University whose network is proxied. I can't work
> with any of the BioPython modules that require access to the Internet
> (e.g. Bio.WWW). How can I configure them manually to override the
> proxy? I already read about configuring the urllib to use a proxy,
> but I can't figure out where to find the string that handles the
> connection.

Bio.WWW uses urllib, so the simplest answer is to follow the advice in 
http://docs.python.org/lib/module-urllib.html

Specifically on Windows you probably just need to set the http_proxy 
environment variable before starting Python, or configure the proxy in 
the internet settings (via Internet Explorer I assume).  I think it 
would be easiest to set this environment variable once by hand, but you 
could set it at run time as part of your python script.

You'll have to consult your University's network documentation to 
determine the string to use for the http_proxy environment variable, but 
it would look something like "http://www.someproxy.com:3128" (i.e. 
address:port number).

The alternative is to pass the "proxies" option to urllib.urlopen(), but 
this would require multiple changes in Bio.WWW to support.  Note that 
urllib does not currently support proxies which require authentication.
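
As a rough illustration of setting the variable at run time, the sketch
below uses Python 3's urllib.request (the successor to the urllib module
discussed here, so this is an assumption about the reader's Python).  The
proxy address is a made-up placeholder; the real one has to come from the
network administrators.

```python
import os
import urllib.request

# Placeholder address; replace with the proxy given by your network admins.
PROXY = "http://proxy.example.edu:3128"

# Option 1: set the environment variable before any request is made.
os.environ["http_proxy"] = PROXY
proxies = urllib.request.getproxies()
print(proxies.get("http"))

# Option 2: pass the proxy explicitly instead of relying on the environment.
handler = urllib.request.ProxyHandler({"http": PROXY})
opener = urllib.request.build_opener(handler)
# opener.open(some_url) would now route plain-HTTP requests via the proxy.
```

Setting the variable inside the script only affects that Python process;
setting it in the shell or system settings makes it apply to every run.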

Peter


From biopython at maubp.freeserve.co.uk  Mon Sep 24 21:47:13 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Sep 2007 22:47:13 +0100
Subject: [BioPython] poor man's databases for large sequence files
Message-ID: <46F83061.3090207@maubp.freeserve.co.uk>

I've been thinking about extending Bio.SeqIO to support a (read only) 
dictionary like interface for large sequence files (WITHOUT having 
everything in memory).

Some of the older Biopython sequence format specific modules have an 
index_file function and matching Dictionary class to do this (based 
internally on either Martel/Mindy or a DIY Biopython indexer based on 
pickle).

When thinking about a format agnostic SeqRecord dictionary, the "Shelf" 
object from python's built in "shelve" library looks like a good 
choice.  I could add a Bio.SeqIO.to_shelf() function similar to the 
existing Bio.SeqIO.to_dict() function.
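
A rough sketch of the idea, assuming nothing about the eventual Bio.SeqIO
API: to_shelf below is a hypothetical stand-in, the tiny FASTA reader
replaces real Bio.SeqIO parsing, and plain sequence strings are stored
instead of SeqRecord objects.  It is written for Python 3.

```python
import os
import shelve
import tempfile


def iter_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def to_shelf(lines, shelf_path):
    """Hypothetical helper: store every record in a new on-disk shelf."""
    with shelve.open(shelf_path, flag="n") as shelf:
        for name, sequence in iter_fasta(lines):
            shelf[name] = sequence


# Made-up demo data.
fasta_lines = ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n".splitlines()
shelf_path = os.path.join(tempfile.mkdtemp(), "records")
to_shelf(fasta_lines, shelf_path)

# Reopen read-only; only the requested record is loaded into memory.
with shelve.open(shelf_path, flag="r") as shelf:
    print(shelf["seq2"])
```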

The only downside I've thought of so far is updating a shelf database, 
something supported by shelve but with a few gotchas when dealing with 
non-trivial datatypes (like dictionaries).  The need I am thinking about 
addressing is a little less flexible - read only low-memory access to a 
large collection of SeqRecords (typically from a large sequence file).

Does anyone already use python's shelve library with sequence data?

Peter


From anaryin at gmail.com  Mon Sep 24 23:11:57 2007
From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=)
Date: Tue, 25 Sep 2007 00:11:57 +0100
Subject: [BioPython] Configuring Proxy for certain Modules
In-Reply-To: <46F7ECD0.8020001@maubp.freeserve.co.uk>
References: <b537e3710709240913r1c01eee3w75e871f67d5fcab8@mail.gmail.com>
	<46F7ECD0.8020001@maubp.freeserve.co.uk>
Message-ID: <b537e3710709241611w150b5c9ev5360ca7ee60d1efd@mail.gmail.com>

 Again, thank you for the kind answer!

I had in fact read about the urllib module, and that was how I "discovered"
that I could configure the proxy "by hand". If I set it automatically in
IE or Firefox, it works in the browser but not in Python. As for the
http_proxy environment variable, how do I set it?


From sdavis2 at mail.nih.gov  Tue Sep 25 01:40:21 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 24 Sep 2007 21:40:21 -0400
Subject: [BioPython] poor man's databases for large sequence files
In-Reply-To: <46F83061.3090207@maubp.freeserve.co.uk>
References: <46F83061.3090207@maubp.freeserve.co.uk>
Message-ID: <46F86705.1090109@mail.nih.gov>

Peter wrote:
> I've been thinking about extending Bio.SeqIO to support a (read only) 
> dictionary like interface for large sequence files (WITHOUT having 
> everything in memory).
>
> Some of the older Biopython sequence format specific modules have an 
> index_file function and matching Dictionary class to do this (based 
> internally on either Martel/Mindy or a DIY Biopython indexer based on 
> pickle).
>
> When thinking about a format agnostic SeqRecord dictionary, the built in 
> python "Shelf" object from python's built in "shelve library" looks like 
> a good choice.  I could add a Bio.SeqIO.to_shelf() function similar to 
> the existing Bio.SeqIO.to_dict() function.
>
> The only downside I've thought of so far is updating a shelf database, 
> something supported by shelve but with a few gotchas when dealing with 
> non-trivial datatypes (like dictionaries).  The need I am thinking about 
> addressing is a little less flexible - read only low-memory access to a 
> large collection of SeqRecords (typically from a large sequence file).
>
> Does anyone already use python's shelve library with sequence data?
>   

Just a curiosity, Peter, but would this extension deal with small 
collections of large sequences (finished genomes, for example)? 

Sean


From biopython at maubp.freeserve.co.uk  Tue Sep 25 08:14:50 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 25 Sep 2007 09:14:50 +0100
Subject: [BioPython] poor man's databases for large sequence files
In-Reply-To: <46F86705.1090109@mail.nih.gov>
References: <46F83061.3090207@maubp.freeserve.co.uk>
	<46F86705.1090109@mail.nih.gov>
Message-ID: <46F8C37A.1000005@maubp.freeserve.co.uk>

Sean Davis wrote:
> Peter wrote:
>> I've been thinking about extending Bio.SeqIO to support a (read only) 
>> dictionary like interface for large sequence files (WITHOUT having 
>> everything in memory).
>>
>> ...
>>
>> Does anyone already use python's shelve library with sequence data?
>>   
> 
> Just a curiosity, Peter, but would this extension deal with small 
> collections of large sequences (finished genomes, for example)? 
> 

Hi Sean,

What I had in mind was say indexing all of UniProt which is currently 
1.1 GB in the SwissProt flat file format, but each record is pretty small.

However, in theory this (largely unwritten) code could be used on any 
number of any sized records - but you would need enough ram to hold any 
one record in memory at once, plus some more RAM for the hopefully 
modest database overhead, python, your script etc.

I suppose having all the chromosomes for a given Eukaryote (e.g. mouse 
or fruit fly) would also be a sensible example; having tens of records 
where each is tens of MB in size. Is that the sort of thing you had in 
mind Sean?

Peter


From sdavis2 at mail.nih.gov  Tue Sep 25 11:41:25 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 25 Sep 2007 07:41:25 -0400
Subject: [BioPython] poor man's databases for large sequence files
In-Reply-To: <46F8C37A.1000005@maubp.freeserve.co.uk>
References: <46F83061.3090207@maubp.freeserve.co.uk>
	<46F86705.1090109@mail.nih.gov>
	<46F8C37A.1000005@maubp.freeserve.co.uk>
Message-ID: <46F8F3E5.5020802@mail.nih.gov>

Peter wrote:
> Sean Davis wrote:
>> Peter wrote:
>>> I've been thinking about extending Bio.SeqIO to support a (read only)
>>> dictionary like interface for large sequence files (WITHOUT having
>>> everything in memory).
>>>
>>> ...
>>>
>>> Does anyone already use python's shelve library with sequence data?
>>>   
>>
>> Just a curiosity, Peter, but would this extension deal with small
>> collections of large sequences (finished genomes, for example)?
> 
> Hi Sean,
> 
> What I had in mind was say indexing all of UniProt which is currently
> 1.1 GB in the SwissProt flat file format, but each record is pretty small.
> 
> However, in theory this (largely unwritten) code could be used on any
> number of any sized records - but you would need enough ram to hold any
> one record in memory at once, plus some more RAM for the hopefully
> modest database overhead, python, your script etc.
> 
> I suppose having all the chromosomes for a given Eukaryote (e.g. mouse
> or fruit fly) would also be a sensible example; having tens of records
> where each is tens of MB in size. Is that the sort of thing you had in
> mind Sean?

Yes.  Lincoln Stein wrote some indexing stuff in perl that allows
essentially random access to sequence records as well as subsets of
individual records.  It makes it possible to do range queries on
individual sequences with very modest memory; with a larger memory
machine, one might imagine that this would result in very fast queries
as the files get cached.
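
To make the random-access idea concrete, here is a small Python 3 sketch
of a byte-offset index over a FASTA file.  It is not Lincoln Stein's
code and not an existing Biopython function; the function names and the
demo file are invented for illustration.  Only the offsets are kept in
memory, and one record is read at a time.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each record identifier to the byte offset of its '>' line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode("ascii")
                index[name] = offset
    return index


def fetch_sequence(path, index, name):
    """Seek straight to one record and read only that record's lines."""
    chunks = []
    with open(path, "rb") as handle:
        handle.seek(index[name])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")


# Demo with a tiny made-up file.
fasta_path = os.path.join(tempfile.mkdtemp(), "demo.fasta")
with open(fasta_path, "w") as handle:
    handle.write(">chr1 demo\nACGTAC\nGT\n>chr2 demo\nGGCCGG\n")

offsets = build_offset_index(fasta_path)
# A range query on one record: slice after fetching just that record.
print(fetch_sequence(fasta_path, offsets, "chr2")[1:4])
```

This still loads a whole record before slicing; true sub-record access
would need extra bookkeeping such as the line length of each record.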

Sean


From ytu888 at hotmail.com  Fri Sep 28 11:40:09 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 06:40:09 -0500
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
Message-ID: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>


I'm new to Biopython and want to install it on my Mac OS X computer. I got similar error messages on the command line when installing Python 2.5, but I eventually installed it using python-2.5.1-macosx.dmg. When I tried to install mxTextTools, I got the following messages saying that mxDateTime.c is missing. Where can I find this file? Please help me solve the problem. Thank you very much.

LeesComputer:/Users/Python_Bio/egenix-mx-base-3.0.0.macosx-10.3-fat-py2.5_ucs4.prebuilt Lee$ sudo python setup.py build
running build
running mx_autoconf
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -D_GNU_SOURCE=1 -I/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3 -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.3/include -c _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
macros to define: []
macros to undefine: []
running build_ext

building extension "mx.DateTime.mxDateTime.mxDateTime" (required)
building 'mx.DateTime.mxDateTime.mxDateTime' extension
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -DUSE_FAST_GETCURRENTTIME -Imx/DateTime/mxDateTime -I/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3 -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.3/include -c mx/DateTime/mxDateTime/mxDateTime.c -o build/temp.darwin-8.10.1-i386-2.3_ucs2/mx-DateTime-mxDateTime-mxDateTime/mx/DateTime/mxDateTime/mxDateTime.o
i686-apple-darwin8-gcc-4.0.1: mx/DateTime/mxDateTime/mxDateTime.c: No such file or directory
i686-apple-darwin8-gcc-4.0.1: no input files
error: command 'gcc' failed with exit status 1



From biopython at maubp.freeserve.co.uk  Fri Sep 28 12:27:17 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 13:27:17 +0100
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
In-Reply-To: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
Message-ID: <46FCF325.4040002@maubp.freeserve.co.uk>

Y Tu wrote:
> I'm a newbie for the Biopython, and want to install it on my Mac OS X
> computer. I got the similar error messages on command line when
> install Python2.5, but finally I did that using the
> python-2.5.1-macosx.dmg. When I tried to install mxTextTools and got
> the following messages: mxDateTime.c is missing. Where to find the
> file? Please help me to solve the problem and thank you very much.

It sounds like you don't want to use the default Apple provided python - 
I have the impression that this can make life more complicated.  I'm not 
a Mac user, but Michiel is, and he may be able to help.  He has been 
away recently but should be back soon.
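
The gcc lines in the build log refer to
.../Python.framework/Versions/2.3/include/python2.3, which suggests that
"sudo python" started the Apple supplied Python 2.3 rather than the newly
installed 2.5.1.  A quick way to see which interpreter a given command
starts is a snippet like this sketch (the function name is only
illustrative):

```python
import sys


def describe_interpreter():
    """Return the path and (major, minor) version of the running Python."""
    return sys.executable, tuple(sys.version_info[:2])


path, version = describe_interpreter()
print("Running %s (Python %d.%d)" % ((path,) + version))
```

Running it once with "python" and once with "sudo python" shows whether
both commands resolve to the same installation.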

In terms of installing mxTextTools, you may get more support on the 
egenix mailing list.  However, there are currently some issues with 
Biopython and egenix mxTextTools 3.0, so if you can find it I would 
suggest using version 2.0 instead.

We hope to release Biopython 1.44 in October, which will address most of 
the mxTextTools issues.  That said, the majority of Biopython 1.43 
will still work even with mxTextTools 3.0.

Peter


From ytu888 at hotmail.com  Fri Sep 28 13:22:43 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 08:22:43 -0500
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
In-Reply-To: <46FCF325.4040002@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
Message-ID: <BAY119-W12BF9C22D8A72CBCCD943C8FB20@phx.gbl>


The Python that comes with Mac OS X is an old version, so I installed the newer 2.5.1, which succeeded. Then the problem with mxTextTools came up. I have also just installed Numerical, and that worked.





From ytu888 at hotmail.com  Fri Sep 28 15:26:11 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 10:26:11 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FCF325.4040002@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
Message-ID: <BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>


I just installed ReportLab on Mac OS X and the test with the command "from reportlab.graphics import renderPDF" succeeded. However, when I ran the test script (reportlab/test/test_pdfgen_general.py), I got the following error. How can I fix the problem? Another question is how to run the script under the python prompt (>>>) after importing it with "import test_pdfgen_general.py". Thank you very much.

nypivs-lee:/Applications/MacPython 2.5/reportlab/test lee$ python test_pdfgen_general.py
E
======================================================================
ERROR: Make a PDFgen document with most graphics features
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_pdfgen_general.py", line 833, in test0
    run(outputfile('test_pdfgen_general.pdf'))
  File "test_pdfgen_general.py", line 796, in run
    c = makeDocument(filename)
  File "test_pdfgen_general.py", line 725, in makeDocument
    c.drawImage(tgif, 4*inch, 9.25*inch, w, h, mask='auto')
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfgen/canvas.py", line 629, in drawImage
    imgObj = pdfdoc.PDFImageXObject(name, image, mask=mask)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfdoc.py", line 1840, in __init__
    self.loadImageFromA85(src)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfdoc.py", line 1846, in loadImageFromA85
    imagedata = map(string.strip,pdfutils.makeA85Image(source,IMG=IMG))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfutils.py", line 35, in makeA85Image
    raw = img.getRGBData()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/lib/utils.py", line 612, in getRGBData
    self._data = im.tostring()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 513, in tostring
    self.load()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
    d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
    raise IOError("decoder %s not available" % decoder_name)
IOError: decoder jpeg not available

----------------------------------------------------------------------
Ran 1 test in 0.321s

FAILED (errors=1)




From biopython at maubp.freeserve.co.uk  Fri Sep 28 16:28:28 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 17:28:28 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
Message-ID: <46FD2BAC.80401@maubp.freeserve.co.uk>

Y Tu wrote:
> I just installed ReportLab on Mac OS X and the test with command
> "from reportlab.graphics import renderPDF" succeeded. However, when I
> run the test script (reportlab/test/test_pdfgen_general.py), I got the
> following error. How to fix the problem.

I would guess you have not installed PIL, the Python Imaging Library, 
which ReportLab uses.

> Another question is how to
> run the script under the python prompt (>>>) after importing the
> script by "import test_pdfgen_general.py". Thank you very much.

To run a python script, like "test_pdfgen_general.py", at the command 
line type:

python test_pdfgen_general.py

(assuming python is on the path, and test_pdfgen_general.py is in the
current directory)

In general there are two sorts of python files, scripts which you run 
(like test_pdfgen_general.py) and library modules you import.
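
For running a script from inside Python rather than from the shell, one
option in current Python versions is the standard runpy module.  The
sketch below is Python 3 and uses a throwaway demo script, not the
ReportLab test itself; it shows why a plain import does not trigger the
code under the usual "__main__" guard, while running the file as a
script does.

```python
import os
import runpy
import tempfile

# A throwaway script that only does its work when run as a script.
script_source = '''
def main():
    return "ran as a script"

if __name__ == "__main__":
    result = main()
'''

script_path = os.path.join(tempfile.mkdtemp(), "demo_script.py")
with open(script_path, "w") as handle:
    handle.write(script_source)

# Run the file as if it had been started with "python demo_script.py".
as_script = runpy.run_path(script_path, run_name="__main__")
# Run it under another name, which is what a plain import resembles.
as_module = runpy.run_path(script_path, run_name="demo_script")

print("result" in as_script)
print("result" in as_module)
```

Note also that the import statement takes the module name without the
".py" extension.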

Peter


From ytu888 at hotmail.com  Fri Sep 28 19:18:06 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 14:18:06 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FD2BAC.80401@maubp.freeserve.co.uk>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>
	<46FCF325.4040002@maubp.freeserve.co.uk>
	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>
	<46FD2BAC.80401@maubp.freeserve.co.uk>
Message-ID: <BAY119-W29B426228830B4921806608FB20@phx.gbl>


Thank you, Peter for the prompt answer.

I did install PIL already and tested it with the commands "from PIL import Image",
then "import _imaging". Both commands succeeded. That's why I don't understand why the test won't work. I used the command "python test_pdfgen_general.py" under the shell prompt, which generated the error. Since I installed PIL and succeeded in importing the PIL module, I thought maybe I could solve the problem by running the test under Python. However, after importing the test into Python, I don't know how to launch the test under the python prompt (>>>). That's why I asked the second question.

Once again, thank you very much for help.




From biopython at maubp.freeserve.co.uk  Fri Sep 28 19:42:31 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 20:42:31 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <BAY119-W29B426228830B4921806608FB20@phx.gbl>
References: <BAY119-W200F8DDE4FCF9D7C69ADDE8FB20@phx.gbl>	<46FCF325.4040002@maubp.freeserve.co.uk>	<BAY119-W29236B5B13CDFB7B81465C8FB20@phx.gbl>	<46FD2BAC.80401@maubp.freeserve.co.uk>
	<BAY119-W29B426228830B4921806608FB20@phx.gbl>
Message-ID: <46FD5927.3000207@maubp.freeserve.co.uk>

Y Tu wrote:
> Thank you, Peter for the prompt answer.
> 
> I did install the PIL already and tested with the commands "from PIL
> import Image", then "import _imaging". Both commands succeeded.
> That's why I don't understand why the test won't work. I used the
> command "python test_pdfgen_general.py" under the shell prompt, which
> generated the error. Since I installed PIL and succeeded in importing
> the module of PIL, I thought maybe I can solve the problem by running
> the test under Python.

Looking in more detail at the original stack trace,

>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
>     d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
>     raise IOError("decoder %s not available" % decoder_name)
> IOError: decoder jpeg not available

It's possible that PIL needs some optional JPEG library, which ReportLab 
wants to use.  I suggest you search the ReportLab website & user's 
mailing list, and if you can't work out what is wrong sign up to their 
mailing list and ask them, http://www.reportlab.org/
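
With a current Pillow install (the maintained successor to PIL, so this
is an assumption relative to the PIL version in the traceback), whether
JPEG support was compiled in can be checked roughly as follows.  The
helper name is illustrative, and it returns None when Pillow's features
module cannot be imported at all.

```python
try:
    # Pillow ships a small module for probing compiled-in support.
    from PIL import features
except ImportError:
    features = None


def jpeg_decoder_available():
    """Return True/False for JPEG support, or None if Pillow is missing."""
    if features is None:
        return None
    return bool(features.check_codec("jpg"))


print(jpeg_decoder_available())
```

A False result would match the "decoder jpeg not available" error, in
which case the imaging library was probably built without a JPEG library
present.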

Very little of Biopython needs ReportLab, you should be able to install 
Biopython without it.

Peter