From ibdeno at gmail.com Sun Sep 2 11:52:57 2007
From: ibdeno at gmail.com (Miguel Ortiz-Lombardía)
Date: Sun, 2 Sep 2007 17:52:57 +0200
Subject: [BioPython] problem accessing ncbi through GenBank.NCBIDictionary
Message-ID:

Hello everyone.

I'm trying to retrieve from NCBI a series of GenBank records from a list
read from a file. This is the code:

8<-------------------------------------------------------------------------------------------
ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")
output = open(args[0]+'.gb','w')
for gbid in ids:
    gb_record = ncbi_dict[gbid]
    output.write(gb_record)
output.close()
------------------------------------------------------------------------------------------->8

The problem is that at some point the job stops with an error such as:

Traceback (most recent call last):
  File "/Users/mol/bin/getfromGB.py", line 61, in ?
    main()
  File "/Users/mol/bin/getfromGB.py", line 54, in main
    gb_record = ncbi_dict[gbid]
  File "/sw/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 1264, in __getitem__
    handle = self.db[id]
  File "/sw/lib/python2.4/site-packages/Bio/config/DBRegistry.py", line 89, in __getitem__
    return self._get(key)
  File "/sw/lib/python2.4/site-packages/Bio/config/_support.py", line 107, in __call__
    return self.fn(*args, **keywds)
  File "/sw/lib/python2.4/site-packages/Bio/config/DBRegistry.py", line 370, in _get
    handle = eutils_client.efetch(retmode = "text", rettype =
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/DBIdsClient.py", line 150, in efetch
    complexity = complexity)
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/ThinClient.py", line 987, in efetch_using_dbids
    query = {"id": id_string,
  File "/sw/lib/python2.4/site-packages/Bio/EUtils/ThinClient.py", line 644, in _get
    return self.opener.open(url)
  File "/sw/lib/python2.4/urllib2.py", line 364, in open
    response = meth(req, response)
  File "/sw/lib/python2.4/urllib2.py", line 471, in http_response
    response = self.parent.error(
  File "/sw/lib/python2.4/urllib2.py", line 402, in error
    return self._call_chain(*args)
  File "/sw/lib/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/sw/lib/python2.4/urllib2.py", line 480, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 503: Service Temporarily Unavailable

Sometimes it is a 502 error...

Because I can access those entries from my browser without problem, I'm
guessing that there may be a timeout problem here.

I would appreciate your help!

Cheers,

Miguel

--
correo-e: ibdeno at gmail.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Je suis de la mauvaise herbe,
Braves gens, braves gens,
Je pousse en liberté
Dans les jardins mal fréquentés!
Georges Brassens

From sbassi at gmail.com Sun Sep 2 23:25:22 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 3 Sep 2007 00:25:22 -0300
Subject: [BioPython] Getting the location from a Genbank record
Message-ID:

I can get the "location" of the genes I want, but I have them in a
"print mode" (calling __str__), and I don't see how to get the start
and end position in a way I could use to slice the seq. There are
private attributes _start and _end but I don't know if using them is
the "right" way to do it.

from Bio import SeqIO
mr = SeqIO.parse(open("MTtabaco.gbk"), "genbank").next()
targets=(['cox2'],['atp6'],['atp9'],['cob'])
for x in mr.features:
    if x.qualifiers.get('gene') in targets:
        print x.location
        #print mr.seq

Get the slice I am looking for:

>>> mr.seq[x.location._start.position:x.location._end.position]
Seq('ATGAATGTTATAACTCCTAATTCTTTGGTAGCGGACCTCTTTGATAGTTCGACCCTTATCCCCCGTCTAACTCAACTATTCGACTGTACGGCTATTGTGATTGCGAGAGAAAGGAGGGATGGCGCCTTCCTTTACCATCTGGCGGTTGAAAACAAAAGTGCTTCCAGGTACACGGCTGTTAGGCTCATCCAAGGCGTATTTACGGAAGTAGCAGGGAACTTGACCGTCAAGTTTGAAAAAAGCTGGCCAAGCCTGTGTCACTTTCTTACGTCAGGAGAAAGGGAGATCAAAGAAGTATGGGGCCGATACGCGAAGGATCAAATCATAGAGATAGCGGATCTTAAGAGGCGGAAGAAAAGGAACCTCGGCGACCCAGAGATCGCGGAGTCCGCGCCCGTGCCGAAAGTGAAGAAGCTTTCCTCTCCTTTCAGTCGAGCATGCCCGCCCTTTAGCACTTCCCTTCCCGAAGTGGGAGTAGGAGAAAGGAAAGCGCACTCGATCAATTACCATGCCGTGTCGTAA', IUPACAmbiguousDNA())

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From biopython at maubp.freeserve.co.uk Mon Sep 3 06:46:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Sep 2007 11:46:32 +0100
Subject: [BioPython] Getting the location from a Genbank record
In-Reply-To:
References:
Message-ID: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>

On 9/3/07, Sebastian Bassi wrote:
> I can get the "location" of the genes I want, but I have them in a
> "print mode" (calling __str__), and I don't see how to get the start
> and end position in a way I could use to slice the seq. There are
> private attributes _start and _end but I don't know if using them is
> the "right" way to do it.
>
> from Bio import SeqIO
> mr = SeqIO.parse(open("MTtabaco.gbk"), "genbank").next()
> targets=(['cox2'],['atp6'],['atp9'],['cob'])
> for x in mr.features:
>     if x.qualifiers.get('gene') in targets:
>         print x.location
>         #print mr.seq

I'm not at my own computer right now, but I think you need to do
something like this to get the slice - assuming nothing funny like
joins:

start = x.location.start.position
end = x.location.end.position
print mr.seq[start:end]
print mr.seq[start:end].reverse_complement()

See also:
http://www.warwick.ac.uk/go/peter_cock/python/genbank/

Peter

From sbassi at gmail.com Mon Sep 3 09:32:28 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 3 Sep 2007 10:32:28 -0300
Subject: [BioPython] Getting the location from a Genbank record
In-Reply-To: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>
References: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>
Message-ID:

On 9/3/07, Peter wrote:
> start = x.location.start.position
> end = x.location.end.position

Yes, this worked. I tried x.location._start.position because of this:

>>> dir(x.location)
['__doc__', '__getattr__', '__init__', '__module__', '__str__', '_end', '_start']

Thank you!

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From biopython at maubp.freeserve.co.uk Mon Sep 3 12:47:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 03 Sep 2007 17:47:06 +0100
Subject: [BioPython] Extracting SeqFeature locations from sequences
Message-ID: <46DC3A8A.1000100@maubp.freeserve.co.uk>

I was prompted to actually write this email based on Sebastian Bassi's
recent email where he was having trouble getting to grips with this
topic.

I had been thinking that Biopython really should have code built in to
take a SeqFeature's location and extract this from the full record
sequence.
This would particularly apply to SeqRecord objects read from
GenBank or EMBL files (using Bio.SeqIO or using Bio.GenBank directly).

As far as I am aware, right now it is up to the user to take the
information stored in a SeqFeature and apply this "by hand" to the
parent record's sequence. Adding some more detailed examples to the
tutorial is probably a good idea - for example based on
http://www.warwick.ac.uk/go/peter_cock/python/genbank/

In addition to improving the documentation, we could add a new method to
the Seq and/or SeqRecord object which would return the sub-sequence
defined by a SeqFeature.

We could even do this via the __getitem__ method, normally used for
accessing elements of a sequence (as strings) or slicing to get a
sub-sequence. e.g.

print seq[index]
print seq[start:end]
print seq[feature]

or,

print record[feature]

I think this is quite elegant, but a separate explicitly named method
might be clearer and more discoverable.

To do this properly covering all cases is actually non-trivial - a good
reason to have it built into Biopython (with a good test suite) rather
than having end users reimplement it themselves.

Messy details to take care of include being aware of both joins and
complements (stored as sub-features and the strand property
respectively), and fuzzy locations. Most situations should be resolved
relatively easily - but in the worst case we could throw a ValueError if
there really is no sensible solution.

Peter

From biopython at maubp.freeserve.co.uk Tue Sep 4 09:05:21 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 04 Sep 2007 14:05:21 +0100
Subject: [BioPython] problem accessing ncbi through GenBank.NCBIDictionary
In-Reply-To:
References:
Message-ID: <46DD5811.8060209@maubp.freeserve.co.uk>

Miguel Ortiz-Lombardía wrote:
> Hello everyone.
>
> I'm trying to retrieve from NCBI a series of GenBank records from a list
> read from a file.

How many GenBank identifiers are we talking about?
Just trying to get an idea of the scale of the problem. It certainly
sounds like either network failures or timeouts. Have you tried
something like this?

from Bio import GenBank
from urllib2 import HTTPError
ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")
ids = ['14598510', '16904191']
output = open('saved.gb','w')
for gbid in ids:
    print "Fetching %s" % gbid
    try :
        gb_record = ncbi_dict[gbid]
    except HTTPError, e :
        #Check error code?
        print str(e)
        print "Re-trying %s" % gbid
        gb_record = ncbi_dict[gbid]
    output.write(gb_record)
output.close()
print "Done"

Peter

From jimmy.musselwhite at gmail.com Tue Sep 4 09:23:37 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Tue, 4 Sep 2007 09:23:37 -0400
Subject: [BioPython] Bio.Cluster clarification
Message-ID: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>

Hello all

In the documentation it says the "data" argument is "an array containing
the gene expression data". What exactly does that mean? Ideally all I
want to do is send it an array of lists, each containing 3 floats, aka an
array of vectors in 3d space, and have it cluster those. Is that doable?

This may seem like a beginner question but I'm not sure of this
documentation (cluster.pdf).

Thanks!

Or, less likely, if you know of any python lib that can handle this, let
me know!

From biopython at maubp.freeserve.co.uk Tue Sep 4 09:42:00 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 04 Sep 2007 14:42:00 +0100
Subject: [BioPython] Bio.Cluster clarification
In-Reply-To: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
References: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
Message-ID: <46DD60A8.7070403@maubp.freeserve.co.uk>

Jimmy Musselwhite wrote:
> Hello all
> In the documentation it says the "data" argument is "an array
> containing the gene expression data". What exactly does that mean?

I suspect that means an array object from the Numeric library. i.e.
a two dimensional dataset of floats.

In the context of gene expression, the rows are usually different genes
and the columns different samples (typically covering two or more
experimental conditions), and the data points are simply floating point
numbers (gene expression levels).

> Ideally all I want to do is send it an array of lists, each
> containing 3 floats, aka an array of vectors in 3d space, and have it
> cluster those. Is that doable?

When you say you have an array of three-vectors, do you mean you have a
three dimensional dataset? e.g. a vector field

> This may seem like a beginner question but I'm not sure of this
> documentation (cluster.pdf).

Hopefully Michiel will reply shortly - as the author of Bio.Cluster, he
should be able to give you a more precise answer. See also his webpage:
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/

Peter

From mdehoon at c2b2.columbia.edu Tue Sep 4 09:47:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Tue, 04 Sep 2007 22:47:49 +0900
Subject: [BioPython] Bio.Cluster clarification
In-Reply-To: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
References: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com>
Message-ID: <46DD6205.7070801@c2b2.columbia.edu>

Jimmy Musselwhite wrote:
> Hello all
> In the documentation it says the "data" argument is "an array containing the
> gene expression data". What exactly does that mean? Ideally all I want to do
> is send it an array of lists, each containing 3 floats, aka an array of
> vectors in 3d space, and have it cluster those. Is that doable?

Yes.

--Michiel.

>
> This may seem like a beginner question but I'm not sure of this
> documentation (cluster.pdf).
>
> Thanks!
>
> Or, less likely, if you know of any python lib that can handle this, let me
> know!
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From ibdeno at gmail.com Tue Sep 4 10:55:27 2007
From: ibdeno at gmail.com (Miguel Ortiz-Lombardía)
Date: Tue, 4 Sep 2007 16:55:27 +0200
Subject: [BioPython] problem accessing ncbi through GenBank.NCBIDictionary
In-Reply-To: <46DD5811.8060209@maubp.freeserve.co.uk>
References: <46DD5811.8060209@maubp.freeserve.co.uk>
Message-ID:

Eventually, I managed to download all of them (21 only...)

But thank you very much for the tip, I will add that try/except error
check to the script!

Cheers,

Miguel

2007/9/4, Peter :
>
> Miguel Ortiz-Lombardía wrote:
> > Hello everyone.
> >
> > I'm trying to retrieve from NCBI a series of GenBank records from a list
> > read from a file.
>
> How many GenBank identifiers are we talking about? Just trying to get an
> idea of the scale of the problem. It certainly sounds like either
> network failures or timeouts. Have you tried something like this?
>
> from Bio import GenBank
> from urllib2 import HTTPError
> ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")
> ids = ['14598510', '16904191']
> output = open('saved.gb','w')
> for gbid in ids:
>     print "Fetching %s" % gbid
>     try :
>         gb_record = ncbi_dict[gbid]
>     except HTTPError, e :
>         #Check error code?
>         print str(e)
>         print "Re-trying %s" % gbid
>         gb_record = ncbi_dict[gbid]
>     output.write(gb_record)
> output.close()
> print "Done"
>
> Peter
>
>

--
correo-e: ibdeno at gmail.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Je suis de la mauvaise herbe,
Braves gens, braves gens,
Je pousse en liberté
Dans les jardins mal fréquentés!
Georges Brassens

From meesters at uni-mainz.de Wed Sep 5 11:47:07 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Wed, 5 Sep 2007 17:47:07 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein?
Message-ID: <1189007228.27068.31.camel@cmeesters>

Hi,

Does anyone know a way to compute the maximum distance within a protein
(perhaps using Bio.PDB) without calculating distances of all atom
pairs?

I'm hoping to be just too blind to see an easy solution here ...

TIA
Christian

From idoerg at gmail.com Wed Sep 5 12:02:04 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed, 5 Sep 2007 09:02:04 -0700
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein?
In-Reply-To: <1189007228.27068.31.camel@cmeesters>
References: <1189007228.27068.31.camel@cmeesters>
Message-ID:

Not sure why you would want to do that. But how about calculating the
diameter of an enclosing sphere?

On 9/5/07, Christian Meesters wrote:
>
> Hi,
>
> Does anyone know a way to compute the maximum distance within a protein
> (perhaps using Bio.PDB) without calculating distances of all atom
> pairs?
>
> I'm hoping to be just too blind to see an easy solution here ...
>
> TIA
> Christian
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

--
I. Friedberg
"The only problem with troubleshooting is that sometimes trouble shoots back."

From biopython at maubp.freeserve.co.uk Wed Sep 5 12:24:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 05 Sep 2007 17:24:06 +0100
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein?
In-Reply-To: <1189007228.27068.31.camel@cmeesters>
References: <1189007228.27068.31.camel@cmeesters>
Message-ID: <46DED826.4050802@maubp.freeserve.co.uk>

Christian Meesters wrote:
> Hi,
>
> Does anyone know a way to compute the maximum distance within a protein
> (perhaps using Bio.PDB) without calculating distances of all atom
> pairs?

Are you thinking alpha-carbon to alpha-carbon distances, or using all atoms?

> I'm hoping to be just too blind to see an easy solution here ...

There should be some way to take advantage of the backbone links meaning
lots of residues are constrained to be close to each other... Is it
essential to get the largest pairwise distance, or would a local maximum
do?

You could probably do some clever sampling, say doing all pairwise
combination of every third residue, and then for those furthest apart
including all the local residues... just thinking out loud.

Peter

From ibdeno at gmail.com Wed Sep 5 13:31:55 2007
From: ibdeno at gmail.com (Miguel Ortiz-Lombardía)
Date: Wed, 5 Sep 2007 19:31:55 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein?
In-Reply-To: <46DED826.4050802@maubp.freeserve.co.uk>
References: <1189007228.27068.31.camel@cmeesters> <46DED826.4050802@maubp.freeserve.co.uk>
Message-ID:

Hello,

You can align the protein coordinates against its principal axes of
inertia. This is very fast. One (free) program doing so is 'moleman2'
from the Uppsala Software Factory:

http://alpha2.bmc.uu.se/~gerard/usf/

HTH,

Miguel

2007/9/5, Peter :
>
> Christian Meesters wrote:
> > Hi,
> >
> > Does anyone know a way to compute the maximum distance within a protein
> > (perhaps using Bio.PDB) without calculating distances of all atom
> > pairs?
>
> Are you thinking alpha-carbon to alpha-carbon distances, or using all
> atoms?
>
> > I'm hoping to be just too blind to see an easy solution here ...
>
> There should be some way to take advantage of the backbone links meaning
> lots of residues are constrained to be close to each other... Is it
> essential to get the largest pairwise distance, or would a local maximum
> do?
>
> You could probably do some clever sampling, say doing all pairwise
> combination of every third residue, and then for those furthest apart
> including all the local residues... just thinking out loud.
>
> Peter
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

--
correo-e: ibdeno at gmail.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Je suis de la mauvaise herbe,
Braves gens, braves gens,
Je pousse en liberté
Dans les jardins mal fréquentés!
Georges Brassens

From thamelry at binf.ku.dk Wed Sep 5 14:19:28 2007
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Wed, 5 Sep 2007 20:19:28 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein?
In-Reply-To:
References: <1189007228.27068.31.camel@cmeesters> <46DED826.4050802@maubp.freeserve.co.uk>
Message-ID: <2d7c25310709051119r18e278cag70f4750272f3cea@mail.gmail.com>

Hi,

This is one of those problems that computational geometry people love to
solve. See for example:

http://www-sop.inria.fr/epidaure/personnel/malandain/diameter/

Google will give many other algorithms...

Cheers,

-Thomas

From meesters at uni-mainz.de Thu Sep 6 08:37:13 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Thu, 6 Sep 2007 14:37:13 +0200
Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein?
Message-ID: <1189082233.20772.37.camel@cmeesters>

Hi,

Thanks for the input.

To clarify what I actually wanted: I need a rather precise (+/- 2 Å)
estimate of the maximum distance within a protein - taking all atoms,
including sugar residues in glycosylated proteins for example, into
account. So, restricting myself to CA-atoms does not really help. The
approach should not rely on symmetry, since not all proteins have
symmetry.

Thinking about the problem once more, I decided to make use of the
Har-Peled approach Thomas pointed me (indirectly) to.

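For comparison, a much cruder estimate than the Har-Peled algorithm can be had from two simple bounds. The sketch below is an illustration only: it is not the Har-Peled method, not code from Bio.PDB, and not taken from the page linked above. It assumes the atom coordinates have already been collected into a list of (x, y, z) tuples, and the function names `farthest_from` and `diameter_bounds` are made up for this example. The lower bound comes from repeatedly jumping to the farthest atom; the upper bound is twice the largest distance from the centroid (in the spirit of the enclosing-sphere suggestion). Whether the two bounds land within 2 Å of each other depends on the shape of the molecule.

```python
import math


def farthest_from(point, coords):
    """Return (distance, index) of the coordinate farthest from point."""
    best_d, best_i = -1.0, -1
    for i, c in enumerate(coords):
        d = math.dist(point, c)
        if d > best_d:
            best_d, best_i = d, i
    return best_d, best_i


def diameter_bounds(coords, max_sweeps=10):
    """Return (lower, upper) bounds on the largest pairwise distance.

    Each sweep is one linear pass over the atoms, so the cost is
    O(n * sweeps) rather than the O(n**2) of checking every pair.
    """
    n = len(coords)
    if n < 2:
        return 0.0, 0.0

    # Upper bound: every atom lies within `radius` of the centroid, so by
    # the triangle inequality no two atoms are more than 2 * radius apart.
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    radius, idx = farthest_from(centroid, coords)
    upper = 2.0 * radius

    # Lower bound: hop to the farthest atom until the distance stops
    # growing. Any distance found this way is a real pairwise distance.
    lower = 0.0
    for _ in range(max_sweeps):
        d, nxt = farthest_from(coords[idx], coords)
        if d <= lower:
            break
        lower, idx = d, nxt
    return lower, upper
```

If the gap between the two numbers is too wide for the precision needed, that is the point at which an exact diameter algorithm (or a brute-force check restricted to atoms far from the centroid) becomes worth the effort.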
Again, thanks a lot,
Christian

From biopython at maubp.freeserve.co.uk Sun Sep 9 17:17:04 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 09 Sep 2007 22:17:04 +0100
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46D31C97.1070200@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> <46D31C97.1070200@maubp.freeserve.co.uk>
Message-ID: <46E462D0.5090207@maubp.freeserve.co.uk>

Peter wrote:
> I think having SeqRecord subclass Seq is nicer than simply adding
> annotation to the Seq class. Seq objects would (still) just have a
> sequence and alphabet, the SeqRecord becomes a rich/annotated Seq object.
>
> I think this would be close to BioPerl's Seq and RichSeq objects.
>
> I have filed an enhancement on Bugzilla to hold any suggested patches
> etc (I hope to upload something later tonight):
>
> Bug 2351 - Make SeqRecord subclass Seq subclass string?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2351

Going back over the mailing list archives, we discussed something
similar on the dev mailing list back in early 2005.

I would like to make the following "small" change now, ready for the
next release of Biopython:

(1) Make __str__ give the full sequence as a string for Seq and
    MutableSeq objects, allowing intuitive use of str(myseq) which
    used to give a truncated representation including the alphabet.

(2) tostring() will be documented as deprecated in favour of str(...)

(3) leave __repr__ as is (giving the full string with an alphabet)
    which can be used with eval(repr(myseq))

There will be some fallout to this - in particular we'll need to go over
the documentation and may need to fix a few things.

The only downside is the loss of a built in method to get a "short seq
string representation" (currently available as str(myseq) via __str__).
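
To make points (1) to (3) concrete, here is a toy sketch in present-day Python. `ToySeq` is a made-up stand-in for illustration, not the real Bio.Seq.Seq class, and the alphabet is held as a plain string purely so that the example is self-contained and `eval(repr(...))` works without Biopython installed.

```python
class ToySeq(object):
    """Hypothetical stand-in for a Seq object (illustration only)."""

    def __init__(self, data, alphabet="SingleLetterAlphabet()"):
        self.data = data
        self.alphabet = alphabet

    def __len__(self):
        return len(self.data)

    def __str__(self):
        # Point (1): str() gives the full sequence and nothing else.
        return self.data

    def tostring(self):
        # Point (2): kept so that existing scripts continue to work.
        return str(self)

    def __repr__(self):
        # Point (3): full sequence plus alphabet, usable with eval().
        return "ToySeq(%r, %r)" % (self.data, self.alphabet)


my_seq = ToySeq("ATGAAATAG")
message = "My sequence is %s, length %i" % (my_seq, len(my_seq))
round_trip = eval(repr(my_seq))
```

With `__str__` defined this way, `%s` formatting and `print` pick up the full sequence without an explicit `.tostring()` call, while `repr()` still carries the alphabet.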

Back in 2005, Frédéric Sohm suggested adding a short() method to do
this. Personally I'd only use this when working at the command line,
but it might be nice.

One refinement over the current truncation is I would personally include
the last three letters - this is handy when looking at genes as you
might want to know if there was a stop codon present.

e.g.

Seq('MLKILLATTMLIPTAFILKPQILHQTMISYTFILTLFSLIFLKQNQYLKPLSNLYLN...LVL', SingleLetterAlphabet())

rather than:

Seq('MLKILLATTMLIPTAFILKPQILHQTMISYTFILTLFSLIFLKQNQYLKPLSNLYLNLDQ ...', SingleLetterAlphabet())

and similarly for nucleotides (which is why I suggest at least the last
three trailing letters).

Peter

From mdehoon at c2b2.columbia.edu Sun Sep 9 20:04:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 10 Sep 2007 09:04:28 +0900
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E462D0.5090207@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> <46D31C97.1070200@maubp.freeserve.co.uk> <46E462D0.5090207@maubp.freeserve.co.uk>
Message-ID: <46E48A0C.1050403@c2b2.columbia.edu>

Peter wrote:
> I would like to make the following "small" change now, ready for the
> next release of Biopython:
>
> (1) Make __str__ give the full sequence as a string for Seq and
> MutableSeq objects, allowing intuitive use of str(myseq) which
> used to give a truncated representation including the alphabet.

Note that the __str__ is used to create the output of "print myseq",
where myseq is a Seq object. So if __str__ returns the full sequence
string, then "print myseq" will print the full sequence. This is not
necessarily what you want.

In essence, the str() function and the .tostring() method have different
functions. So I think we should not drop .tostring() in favor of str().
Moreover, this problem will go away if and when a Seq object subclasses
from a string object. Then, we won't need a Seq-to-string function at
all.

--Michiel.

From biopython at maubp.freeserve.co.uk Mon Sep 10 04:27:18 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 10 Sep 2007 09:27:18 +0100
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E48A0C.1050403@c2b2.columbia.edu>
References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> <46D31C97.1070200@maubp.freeserve.co.uk> <46E462D0.5090207@maubp.freeserve.co.uk> <46E48A0C.1050403@c2b2.columbia.edu>
Message-ID: <46E4FFE6.9040608@maubp.freeserve.co.uk>

We seem to be talking at cross purposes.

Michiel de Hoon wrote:
> Peter wrote:
>> I would like to make the following "small" change now, ready for
>> the next release of Biopython:
>>
>> (1) Make __str__ give the full sequence as a string for Seq and
>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>> to give a truncated representation including the alphabet.
>
> Note that the __str__ is used to create the output of "print myseq",
> where myseq is a Seq object. So if __str__ returns the full sequence
> string, then "print myseq" will print the full sequence. This is not
> necessarily what you want.

Getting the full string from both "print my_seq" and str(my_seq) is what
I would expect from a Seq object that acted like a string.

> In essence, the str() function and the .tostring() method have
> different functions. So I think we should not drop .tostring() in
> favor of str().

At the moment str() and .tostring() do serve purposes.
Currently with a Seq object called my_seq:
* full sequence as string - my_seq.tostring()
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - str(my_seq)

What I would like:
* full sequence as string - str(my_seq) and retain my_seq.tostring() for
  backwards compatibility.
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - consider adding
  a new method e.g. my_seq.short()

> Moreover, this problem will go away if and when a Seq object
> subclasses from a string object. Then, we won't need a Seq-to-string
> function at all.

What do you mean by the "problem will go away"? This would be much
easier to discuss in person :(

If/when we make Seq a subclass of string, there would still be __str__
and __repr__ methods, and I would expect str(my_seq) and also "print
my_seq" to give the full sequence. For backwards compatibility I would
keep the existing .tostring() method as well.

I would find it very strange to have the Seq object subclass string, but
doing str(my_seq) not give me the full sequence.
Isn't making str(my_seq) return the full sequence as a string essential
for things like this?:

print my_seq
print "My sequence is %s, length %i" % (my_seq, len(my_seq))

Rather than as currently required:

print my_seq.tostring()
print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))

Peter

From mdehoon at c2b2.columbia.edu Mon Sep 10 05:56:25 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 10 Sep 2007 18:56:25 +0900
Subject: [BioPython] Making the Seq object act more like a string
In-Reply-To: <46E4FFE6.9040608@maubp.freeserve.co.uk>
References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> <46D31C97.1070200@maubp.freeserve.co.uk> <46E462D0.5090207@maubp.freeserve.co.uk> <46E48A0C.1050403@c2b2.columbia.edu> <46E4FFE6.9040608@maubp.freeserve.co.uk>
Message-ID: <46E514C9.2010006@c2b2.columbia.edu>

Let's have the Seq/MutableSeq/SeqRecord discussion after the upcoming
release, which is only five days away. There's not enough time to
discuss these issues in detail, let alone to test them.

--Michiel.

Peter wrote:
> We seem to be talking at cross purposes.
>
> Michiel de Hoon wrote:
>> Peter wrote:
>>> I would like to make the following "small" change now, ready for
>>> the next release of Biopython:
>>>
>>> (1) Make __str__ give the full sequence as a string for Seq and
>>> MutableSeq objects, allowing intuitive use of str(myseq) which used
>>> to give a truncated representation including the alphabet.
>>
>> Note that the __str__ is used to create the output of "print myseq",
>> where myseq is a Seq object. So if __str__ returns the full sequence
>> string, then "print myseq" will print the full sequence. This is not
>> necessarily what you want.
>
> Getting the full string from both "print my_seq" and str(my_seq) is what
> I would expect from a Seq object that acted like a string.
>
>> In essence, the str() function and the .tostring() method have
>> different functions. So I think we should not drop .tostring() in
>> favor of str().
>
> At the moment str() and .tostring() do serve purposes. Currently with a
> Seq object called my_seq:
> * full sequence as string - my_seq.tostring()
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - str(my_seq)
>
> What I would like:
> * full sequence as string - str(my_seq) and retain my_seq.tostring() for
> backwards compatibility.
> * representation with full sequence with alphabet - repr(my_seq)
> * truncated sequence as string - not built in
> * representation with truncated sequence with alphabet - consider adding
> a new method e.g. my_seq.short()
>
>> Moreover, this problem will go away if and when a Seq object
>> subclasses from a string object. Then, we won't need a Seq-to-string
>> function at all.
>
> What do you mean by the "problem will go away"? This would be much
> easier to discuss in person :(
>
> If/when we make Seq a subclass of string, there would still be __str__
> and __repr__ methods, and I would expect str(my_seq) and also "print
> my_seq" to give the full sequence. For backwards compatibility I would
> keep the existing .tostring() method as well.
>
> I would find it very strange to have the Seq object subclass string, but
> doing str(my_seq) not give me the full sequence.
Isn't making
> str(my_seq) return the full sequence as a string essential for things
> like this?:
>
> print my_seq
> print "My sequence is %s, length %i" % (my_seq, len(my_seq))
>
> Rather than as currently required:
>
> print my_seq.tostring()
> print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))
>
>
> Peter
>

From mdehoon at c2b2.columbia.edu Tue Sep 11 10:37:57 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Tue, 11 Sep 2007 23:37:57 +0900
Subject: [BioPython] Bio.MultiProc
Message-ID: <46E6A845.3030601@c2b2.columbia.edu>

Hi everybody,

In preparation for the upcoming release, I was running the Biopython
test suite and found that test_copen.py hangs on Cygwin. It doesn't
fail, it just sits there forever. This may be related to the use of
fork() instead of select() in Bio/MultiProc/copen.py.

Anyway, while it is probably possible to fix this, I'd have to dig
fairly deep into the code, and I am not sure if it is worth it. It looks
like the copen functions are used only in Bio/config, which is needed
for Bio.db. A description of the functionality of this module can be
found in the tutorial section 4.7.2.

Now, I don't remember users asking about this module on the mailing
list. From the tutorial documentation, it seems to be a nice piece of
code, but I doubt that it is being used often in practice. So I was
wondering:

1) Is anybody on this list using this code?
2) If not, can I mark it as deprecated for the upcoming release?

Hopefully, people who are using this code will notice, and let us know
that they need it.

--Michiel.

From biopython at maubp.freeserve.co.uk Wed Sep 12 14:31:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Sep 2007 19:31:43 +0100
Subject: [BioPython] Deprecating Bio.FormatIO ?
Message-ID: <46E8308F.6040709@maubp.freeserve.co.uk> With the release Biopython 1.43 and Bio.SeqIO earlier this year, would anyone be upset if the older Bio.FormatIO module was marked as deprecated for the next Biopython release? This module isn't mentioned in the tutorial/cookbook, but Brad did write this entire document: http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.pdf http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html In addition to marking Bio.FormatIO as deprecated, I would probably add a big disclaimer to that document, or re-write it to use Bio.SeqIO instead. Thanks Peter From mdehoon at c2b2.columbia.edu Thu Sep 13 01:13:29 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 13 Sep 2007 01:13:29 -0400 Subject: [BioPython] Deprecating Fasta.Dictionary, GenBank.Dictionary Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B61E@mail2.exch.c2b2.columbia.edu> Hi everybody, In the preparation for the upcoming Biopython release, we noticed some serious problems when using the latest version (3.0) of mxTextTools. We were already able to fix several of them, but some Biopython tests still fail with the new mxTextTools. One of the tests that fails is test_Fasta.py. The part of the test that fails is related to creating a Fasta Dictionary. This is not explicitly described in the Tutorial, but it is essentially the same as creating a Genbank dictionary, which is explained in section 4.3.4 in the Tutorial. 
Quoting from the tutorial: >>> from Bio import GenBank >>> dict_file = 'cor6_6.gb' >>> index_file = 'cor6_6.idx' >>> GenBank.index_file(dict_file, index_file) >>> gb_dict = GenBank.Dictionary(index_file, GenBank.FeatureParser()) >>> len(gb_dict) >>> gb_dict.keys() ['L31939', 'AJ237582', 'X62281', 'AF297471', 'M81224', 'X55053'] >>> gb_dict['AJ237582'] The same can also be obtained with the new Bio.SeqIO code: >>> from Bio import SeqIO >>> records = SeqIO.parse(open('cor6_6.gb'), 'genbank') >>> gb_dict = {} >>> for record in records: ... key = record.id.split(".")[0] ... gb_dict[key] = record ... >>> gb_dict.keys() ['M81224', 'AF297471', 'X62281', 'AJ237582', 'L31939', 'X55053'] >>> # etcetera (you can also use the to_dict function in Bio.SeqIO). The same can also be done for Fasta. So, I'd like to deprecate the index_file functions where Bio.SeqIO can be used instead, in particular for Fasta. Then, we can remove that particular test from test_Fasta. Would that cause problems for anybody? Given the new Bio.SeqIO code, does anybody still need to use the index_file functions? --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From letondal at pasteur.Fr Fri Sep 14 15:12:28 2007 From: letondal at pasteur.Fr (Catherine Letondal) Date: Fri, 14 Sep 2007 21:12:28 +0200 Subject: [BioPython] Programming course at Institut Pasteur (winter 2008) Message-ID: <19581094-71E1-4400-83ED-12EF051BF7CB@pasteur.Fr> Hi, ************************************************************************ Course in informatics for biology 2008 at Institut Pasteur http://www.pasteur.fr/formation/infobio-en.html *** Registration extended to October 15th 2007 *** ************************************************************************ In the series of courses offered at the Pasteur Institute, a course will be offered in informatics in biology. The next session will take place from January to end of April 2008. 
The main goal of this course is to provide researchers in biology an initial exposure to informatics. Admitance in the course is reserved for those with a degree in biology or a related discipline. With more and more bioinformatics tools available, it becomes increasingly important for researchers in biology to be able both to manage their data, implement their ideas, and judge for themselves the usefulness of new algorithms and software. This course will emphasize fundamental aspects of computer science and apply them to biological examples. Theoretical aspects (algorithm development, logic, problem modeling and design methods), and technical applications (databases and web technologies) that are relevant for biologists will be thoroughly discussed. Programming is presented through the object-oriented paradigm, using a modern high-level language, Python, provided with tools for biology and enabling both prototyping or scripting and the building of important software systems. Learning of an additional language (C) will be available for interested students. Learning during the course will be reinforced with computing exercises, and effective training will be provided by a 2 month research project. The working language of the course is French. For further information, please consult: http://www.pasteur.fr/formation/infobio-en.html *** Registration will be closed on October 15th 2007. *** Sincerely, -- Benno Schwikowski & Catherine Letondal Institut Pasteur -- Course in Informatics for Biology www.pasteur.fr/formation/infobio From dalloliogm at gmail.com Mon Sep 17 05:39:38 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 17 Sep 2007 11:39:38 +0200 Subject: [BioPython] sequence logo with biopython Message-ID: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> Hi, is there any way to produce sequence logos[1] with biopython? I have a set of sequences of the same length, which represent the 5' donorsite in a set of introns. 
I wonder if there is a way to to create and display a .png logo representation of them, like with this program: - http://weblogo.berkeley.edu/ Thanks!! [1] http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From bartek at rezolwenta.eu.org Mon Sep 17 05:49:12 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Mon, 17 Sep 2007 11:49:12 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> Message-ID: <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> Giovanni Marco Dall'Olio wrote: > Hi, > is there any way to produce sequence logos[1] with biopython? > > I have a set of sequences of the same length, which represent the 5' > donorsite in a set of introns. > I wonder if there is a way to to create and display a .png logo > representation of them, like with this program: > - http://weblogo.berkeley.edu/ > Unfortunately, currently there is no solution in biopython to this.
You can however take a look at TAMO, a python library designed for working with motifs. http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with it, but there are ways to at least obtain text version of the logo. -- regards Bartek -- For every complex problem there is an answer that is clear, simple, and wrong. H. L. Mencken From bartek at rezolwenta.eu.org Mon Sep 17 09:35:22 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Mon, 17 Sep 2007 15:35:22 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> Message-ID: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> bartek wilczynski wrote: > Giovanni Marco Dall'Olio wrote: > > > Hi, > > is there any way to produce sequence logos[1] with biopython? > > > > I have a set of sequences of the same length, which represent the 5' > > donorsite in a set of introns. > > I wonder if there is a way to to create and display a .png logo > > representation of them, like with this program: > > - http://weblogo.berkeley.edu/ > > > > Unfortunately, currently there is no solution in biopython to this. You can > however take a look at TAMO, a python library designed for working with > motifs. > http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with > it, > but there are ways to at least obtain text version of the logo. > I've looked into it, and found a way to add this functionality to biopython. The diff file attached introduces a method .weblogo("filename.png") to the Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a standalone function which takes a fasta file as input. Is it a right time to submit things like this to cvs? I can do that, but I do not want to mess up the (soon to be available) new release. 
-- regards Bartek -- For every complex problem there is an answer that is clear, simple, and wrong. H. L. Mencken -------------- next part -------------- A non-text attachment was scrubbed... Name: Motif.py Type: text/x-python Size: 8378 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biopython/attachments/20070917/0a15e1de/attachment.py From dalloliogm at gmail.com Mon Sep 17 10:23:54 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 17 Sep 2007 16:23:54 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> Message-ID: <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com> Thank you: this is very good. I see that it uses the berkeley weblogo website and urllib. just one newbie question: why do you put it in the Bio.AlignAce.Motif class? Thanks Giovanni 2007/9/17, bartek wilczynski : > bartek wilczynski wrote: > > > Giovanni Marco Dall'Olio wrote: > > > > > Hi, > > > is there any way to produce sequence logos[1] with biopython? > > > > > > I have a set of sequences of the same length, which represent the 5' > > > donorsite in a set of introns. > > > I wonder if there is a way to to create and display a .png logo > > > representation of them, like with this program: > > > - http://weblogo.berkeley.edu/ > > > > > > > Unfortunately, currently there is no solution in biopython to this. You can > > however take a look at TAMO, a python library designed for working with > > motifs. > > http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with > > it, > > but there are ways to at least obtain text version of the logo. > > > > I've looked into it, and found a way to add this functionality to biopython. 
> > The diff file attached introduces a method .weblogo("filename.png") to the > Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a > standalone function which takes a fasta file as input. > > Is it a right time to submit things like this to cvs? I can do that, but I do > not want to mess up the (soon to be available) new release. > > -- > regards > Bartek > -- > For every complex problem there is an answer that is clear, simple, and wrong. > H. L. Mencken > > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From biopython at maubp.freeserve.co.uk Mon Sep 17 10:24:42 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 17 Sep 2007 15:24:42 +0100 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> Message-ID: <46EE8E2A.3080809@maubp.freeserve.co.uk> bartek wilczynski wrote: > I've looked into it, and found a way to add this functionality to biopython. > > The diff file attached introduces a method .weblogo("filename.png") to the > Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a > standalone function which takes a fasta file as input. > > Is it a right time to submit things like this to cvs? I can do that, but I do > not want to mess up the (soon to be available) new release. Its a very small change, but lets see what Michiel says for the timing. It might be nice to expose all the options to the end user, possibly as handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds as in Bio/Blast/NCBIStandalone.py blastall() etc. 
Peter From bartek at rezolwenta.eu.org Mon Sep 17 10:57:52 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Mon, 17 Sep 2007 16:57:52 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com> Message-ID: <1190041072.46ee95f03d51d@imp.rezolwenta.eu.org> Giovanni Marco Dall'Olio wrote: > Thank you: this is very good. > > I see that it uses the berkeley weblogo website and urllib. > > just one newbie question: why do you put it in the Bio.AlignAce.Motif class? > Thanks Well, the quick answer is that it is the most convenient place for me to put it. Since there is a Motif class for sequence motif objects, it is not a bad one. A longer answer is that biopython does not have a good infrastructure for dealing with motifs. I contributed the AlignAce lib, and Jason Hackney contributed the MEME library, which includes another Motif class that is very similar to, but not exactly compatible with, the AlignAce code. I once planned to do some refactoring work to unify these two modules, but so far have not found the time to do it. Now that the TAMO library is available, there is even less incentive to do so (even though I do not use TAMO myself).
cheers bartek From bartek at rezolwenta.eu.org Mon Sep 17 18:09:30 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 18 Sep 2007 00:09:30 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <46EE8E2A.3080809@maubp.freeserve.co.uk> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> Message-ID: <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> Peter wrote: > > Is it a right time to submit things like this to cvs? I can do that, but I > > do not want to mess up the (soon to be available) new release. > > Its a very small change, but lets see what Michiel says for the timing. > It is indeed a very small change, however it seems to have at least one prospective user ;). Also it is almost impossible to break anything by including it in the new release. > It might be nice to expose all the options to the end user, possibly as > handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds > as in Bio/Blast/NCBIStandalone.py blastall() etc. Good idea, I've included a new diff, which allows for passing any keys directly from function call to the weblogo server such as: m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image BTW. It would be interesting to know if there are more people interested in using a better module for sequence motifs. I have some code lying arround and some ideas on how it could be put together, but since there were no documented cases of anyone using Bio.AlignAce or Bio.MEME, I'm not sure if it's worth the extra work. -- cheers Bartek -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Motif.py.diff Type: text/x-patch Size: 2450 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biopython/attachments/20070918/20c9d15e/attachment.bin From biopython at maubp.freeserve.co.uk Tue Sep 18 04:12:02 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 Sep 2007 09:12:02 +0100 Subject: [BioPython] Removing Bio.FormatIO ? In-Reply-To: <46E8308F.6040709@maubp.freeserve.co.uk> References: <46E8308F.6040709@maubp.freeserve.co.uk> Message-ID: <46EF8852.6090000@maubp.freeserve.co.uk> Having looked at the Bio.FormatIO code in more detail, a simple deprecation warning isn't an option - it would get triggered whenever anyone used Bio.SeqRecord Would anyone object if we removed Bio.FormatIO (and its hooks in Bio/SeqRecord.py and Bio/Search.py) entirely for the next release? Speak now or forever hold your peace! ;) Peter Peter wrote: > With the release Biopython 1.43 and Bio.SeqIO earlier this year, would > anyone be upset if the older Bio.FormatIO module was marked as > deprecated for the next Biopython release? > > This module isn't mentioned in the tutorial/cookbook, but Brad did write > this entire document: > http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.pdf > http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html > > In addition to marking Bio.FormatIO as deprecated, I would probably add > a big disclaimer to that document, or re-write it to use Bio.SeqIO instead. 
> > Thanks > > Peter From biopython at maubp.freeserve.co.uk Tue Sep 18 04:51:26 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 Sep 2007 09:51:26 +0100 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> Message-ID: <46EF918E.90107@maubp.freeserve.co.uk> >> It might be nice to expose all the options to the end user, possibly as >> handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds >> as in Bio/Blast/NCBIStandalone.py blastall() etc. > > Good idea, I've included a new diff, which allows for passing any keys directly > from function call to the weblogo server such as: > > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image Does this let you do things like: m.weblogo("x.png", res=300) i.e. an integer, or do you have to use a string: m.weblogo("x.png", res="300") One way to "fix" this (if it is a problem) would be to do this: for k,v in kwds.items(): values[k]=str(v) rather than: for k,v in kwds.items(): values[k]=v Anyway, given we have at least ten days until the release (Michiel will be away - see his email on the developers list), and this is a little change, I would be happy for this to go into CVS now. 
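A minimal sketch of the str() coercion suggested above, written for present-day Python 3 rather than the Python 2.4 of this thread. The helper name build_form_values and the "format" default field are made up for illustration; this is not the code from the attached diff, and the real weblogo form fields may differ.

```python
from urllib.parse import urlencode


def build_form_values(defaults, **kwds):
    """Merge caller options into the form values, coercing each to str.

    Hypothetical helper: it only illustrates the "values[k] = str(v)" idea.
    """
    values = dict(defaults)
    for k, v in kwds.items():
        values[k] = str(v)  # res=300 and res="300" now end up identical
    return values


# "format" is an assumed default field, not a documented weblogo option.
values = build_form_values({"format": "png"}, res=300, colorscheme="BW")
print(values)
print(urlencode(values))
```

With the coercion in place the caller does not need to remember whether an option is expected as an integer or as a string.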
Peter From bartek at rezolwenta.eu.org Tue Sep 18 09:12:41 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 18 Sep 2007 15:12:41 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <46EF918E.90107@maubp.freeserve.co.uk> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> <46EF918E.90107@maubp.freeserve.co.uk> Message-ID: <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> Peter wrote: > > > > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image > > Does this let you do things like: > > m.weblogo("x.png", res=300) > > i.e. an integer, or do you have to use a string: > > m.weblogo("x.png", res="300") > > One way to "fix" this (if it is a problem) would be to do this: > > for k,v in kwds.items(): > values[k]=str(v) > > rather than: > > for k,v in kwds.items(): > values[k]=v > > Anyway, given we have at least ten days until the release (Michiel will > be away - see his email on the developers list), and this is a little > change, I would be happy for this to go into CVS now. Thanks for another good idea. I submitted the code to CVS. 
-- cheers Bartek From dalloliogm at gmail.com Tue Sep 18 10:36:59 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 18 Sep 2007 16:36:59 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> <46EF918E.90107@maubp.freeserve.co.uk> <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> Message-ID: <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com> ok, thank you. so, let's see if I understand how to use it: from Bio.Seq import Seq from Bio.AlignAce.Motif import Motif m = Motif() m.add_instance(Seq('ACTG')) m.add_instance(Seq('ACCG')) m.add_instance(Seq('ACTC')) m.search_instance(Seq('ACACGACAACGTGTCGAT')) m.weblogo('/home/user/logo.png') Well, about refactoring.... honestly I think it would be a good idea. The problem is that for example, I have never used AlignAce and I don't know which kind of program is it... so I feel a bit confusing to import a module called like this. Anyway the Motif class seems useful, and I will use it in my program.. problably I will have to ask a few questions on it in the next days! :) 2007/9/18, bartek wilczynski : > Peter wrote: > > > > > > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image > > > > Does this let you do things like: > > > > m.weblogo("x.png", res=300) > > > > i.e. 
an integer, or do you have to use a string: > > > > m.weblogo("x.png", res="300") > > > > One way to "fix" this (if it is a problem) would be to do this: > > > > for k,v in kwds.items(): > > values[k]=str(v) > > > > rather than: > > > > for k,v in kwds.items(): > > values[k]=v > > > > Anyway, given we have at least ten days until the release (Michiel will > > be away - see his email on the developers list), and this is a little > > change, I would be happy for this to go into CVS now. > > Thanks for another good idea. I submitted the code to CVS. > > -- > cheers > Bartek > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From bartek at rezolwenta.eu.org Tue Sep 18 19:09:29 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Wed, 19 Sep 2007 01:09:29 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> <46EF918E.90107@maubp.freeserve.co.uk> <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com> Message-ID: <1190156969.46f05aa950c96@imp.rezolwenta.eu.org> Giovanni Marco Dall'Olio : > ok, thank you. 
> > so, let's see if I understand how to use it: > > from Bio.Seq import Seq > from Bio.AlignAce.Motif import Motif > > m = Motif() > m.add_instance(Seq('ACTG')) > m.add_instance(Seq('ACCG')) > m.add_instance(Seq('ACTC')) > > m.search_instance(Seq('ACACGACAACGTGTCGAT')) > > m.weblogo('/home/user/logo.png') > You got it mostly right. However, the .search_instance() and .search_pwm() methods return generators, so you should rather use: for pos,instance in m.search_instance(sequence): print "found %s at %d"%(instance,pos) > > Well, about refactoring.... honestly I think it would be a good idea. > The problem is that for example, I have never used AlignAce and I > don't know which kind of program is it... so I feel a bit confusing to > import a module called like this. The basic idea is to create a new Motif class aggregating the good parts of the AlignAce and MEME versions and modify these modules so they would use the new class. I'll try to look into that next week. I also have some code for reading modules from the JASPAR database and motif comparisons. I'll try to clean it up ands submit as well. Then we could try to come up with a section in the tutorial devoted to motif analysis. If you have anything you would consider useful in the Motif library, let me know. > Anyway the Motif class seems useful, and I will use it in my program.. > problably I will have to ask a few questions on it in the next days! > :) No problem, I'll do my best to answer your questions. However I'm leaving tomorrow for the CMSB conference, so I may be slow at responding to email this week. 
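The point about generators can be sketched in plain Python 3, without Biopython. The function find_instances below is a hypothetical stand-in, not the Bio.AlignAce.Motif implementation: it only shows why calling a generator-returning search on its own appears to do nothing, and why the hits show up once you loop over the result. The second test sequence is made up for the example.

```python
def find_instances(instances, sequence):
    """Yield (position, instance) for each exact occurrence of a motif instance.

    Hypothetical stand-in for a generator-based search method; it assumes
    all instances have the same length.
    """
    instances = set(instances)
    if not instances:
        return
    width = len(next(iter(instances)))
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if window in instances:
            yield pos, window


motif = ["ACTG", "ACCG", "ACTC"]

# Calling the function only creates a generator; nothing is searched yet.
hits = find_instances(motif, "TTACTGGGACCGA")

# The search runs lazily as the loop consumes the generator.
for pos, instance in hits:
    print("found %s at %d" % (instance, pos))
```

A generator keeps memory use flat on long sequences, at the cost that the result can only be iterated once unless it is first collected with list().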
-- cheers Bartek From robert.campbell at queensu.ca Wed Sep 19 09:32:14 2007 From: robert.campbell at queensu.ca (Robert Campbell) Date: Wed, 19 Sep 2007 09:32:14 -0400 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190156969.46f05aa950c96@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> <46EF918E.90107@maubp.freeserve.co.uk> <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com> <1190156969.46f05aa950c96@imp.rezolwenta.eu.org> Message-ID: <20070919093214.2c7567da@adelie.biochem.queensu.ca> On Wed, 19 Sep 2007 01:09:29 +0200, bartek wilczynski wrote: > Giovanni Marco Dall'Olio : > > > ok, thank you. > > > > so, let's see if I understand how to use it: > > > > from Bio.Seq import Seq > > from Bio.AlignAce.Motif import Motif > > > > m = Motif() > > m.add_instance(Seq('ACTG')) > > m.add_instance(Seq('ACCG')) > > m.add_instance(Seq('ACTC')) > > > > m.search_instance(Seq('ACACGACAACGTGTCGAT')) > > > > m.weblogo('/home/user/logo.png') > > > > You got it mostly right. However, the .search_instance() and .search_pwm() > methods return generators, so you should rather use: > > for pos,instance in m.search_instance(sequence): > print "found %s at %d"%(instance,pos) I believe that should be "m.search_instances(sequence)" not "m.search_instance(sequence)" (i.e. "instances", plural). Cheers, Rob -- Robert L. Campbell, Ph.D. 
Senior Research Associate/Adjunct Assistant Professor Botterell Hall Rm 644 Department of Biochemistry, Queen's University, Kingston, ON K7L 3N6 Canada Tel: 613-533-6821 Fax: 613-533-2497 http://pldserver1.biochem.queensu.ca/~rlc From meesters at uni-mainz.de Thu Sep 20 08:23:54 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Thu, 20 Sep 2007 14:23:54 +0200 Subject: [BioPython] feature request for Bio.PDB Message-ID: <1190291034.9570.28.camel@cmeesters> Hi, I think it would be good to have the option to retrieve the kind of atom added using a method of the atom-class, e.g. like: x = atom.get_kind() and x would then be 'H' or 'N' for instance. It is of course possible to retrieve this information via the atom id, but this requires to employ a dictionary if one wants to know which type of atom this is. So, such a method would only be for convenience. It would be nice to see this in the upcoming release, but I fear it's too late for this and it would be great if this idea would only be considered for some other future release. Christian From anaryin at gmail.com Fri Sep 21 15:40:07 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Fri, 21 Sep 2007 20:40:07 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: References: Message-ID: Hello all! I'm writing a small script to fetch results from a NCBI database search using BioPython modules. However, I'd like to broaden my search and to have each page of the results displaying 500 results instead of the usual 20. Does anyone has any idea on how to do this? Thanks ! 
João Rodrigues From anaryin at gmail.com Fri Sep 21 16:33:55 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Fri, 21 Sep 2007 21:33:55 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: <46F41FEA.5020205@maubp.freeserve.co.uk> References: <46F41FEA.5020205@maubp.freeserve.co.uk> Message-ID: Sure I can :) Must warn though, that I have 2 weeks of "python-ing" so the code *could* be clearer! Oh, and some of it is in Portuguese because it's for personal use..

# NCBI Retriever
import os
import sys
import time

from Bio.WWW import NCBI

# What should I look for?
# (prompt: "Which expression do you want to search for?")
query = raw_input('Qual a expressao que deseja procurar?\n..: ')

# Where should I look for?
# (prompt: "Which of the databases do you want to search?")
print 'Em qual das bases de dados deseja procurar?'
databases = {1: 'PubMed', 2: 'Nucleotide', 3: 'Protein', 4: 'Genome', 5: 'Structure'}
choice = raw_input('[1] PubMed\n[2] Nucleotide\n[3] Protein\n[4] Genome\n[5] Structure\n..: ')
if int(choice) not in databases.keys():
    # (message: "Invalid choice")
    print 'Escolha Inválida'
    sys.exit()
search_database = databases[int(choice)]

# Quit playing around, let's search!
search_command = 'Search'
results = NCBI.query(search_command, search_database, term = query, doptcmdl = 'FASTA')

# Where should I save the results?
# The file name is the query plus today's date (year, month, day).
actual_date = str(time.localtime()[0])+str(time.localtime()[1])+str(time.localtime()[2])
results_file_name = os.path.join(os.getcwd(), str(query)+'_'+str(actual_date)+".txt")
results_file = open(results_file_name, 'w')
results_file.write(results.read())
results_file.close()

From biopython at maubp.freeserve.co.uk Fri Sep 21 17:40:15 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Sep 2007 22:40:15 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: References: <46F41FEA.5020205@maubp.freeserve.co.uk> Message-ID: <46F43A3F.9090008@maubp.freeserve.co.uk> João Rodrigues wrote: > Sure I can :) Must warn though, that I have 2 weeks of "python-ing" so the > code *could* be clearer!
Oh, and some of it is in Portuguese because it's > for personal use.. That's fine - the code and comments were in English. I see you are using Bio.WWW.NCBI as an interface to the Entrez query system. Somewhere on the NCBI website they have an answer to your question (how to specify the number of results per page): results = NCBI.query('Search', 'Protein', term='orchid', dispmax=23) Some pages mentioned retstart and retmax but that doesn't seem to work. You might also consider using Bio.EUtils instead - a python wrapper for the NCBI's E-Utils interface. Peter From biopython at maubp.freeserve.co.uk Fri Sep 21 18:00:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Sep 2007 23:00:58 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: References: <46F41FEA.5020205@maubp.freeserve.co.uk> Message-ID: <46F43F1A.9040309@maubp.freeserve.co.uk> Hi again João, I was thinking about your example code, and while I'm not sure exactly what you want to be able to do in python: You might want to look at the search_for() function in Bio.PubMed and Bio.GenBank (which uses EUtils internally), and then the download_many() or dictionary interfaces. This is covered in the Biopython tutorial. I'm not sure if we have a front end for the structure database at the moment. This may be more helpful than working with Entrez directly. Peter From anaryin at gmail.com Fri Sep 21 18:57:20 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Fri, 21 Sep 2007 23:57:20 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: <46F43F1A.9040309@maubp.freeserve.co.uk> References: <46F41FEA.5020205@maubp.freeserve.co.uk> <46F43F1A.9040309@maubp.freeserve.co.uk> Message-ID: Thank you for the tip, it worked perfectly. Well, to be honest, I'm just practicing BioPython and Python skills.
What I'm trying to do is a simple script that searches for *something* in PubMed, gets the results page and parses that page so that I can give the user, that is, myself at the moment :) , a txt file with this format: ---- TITLE: AUTHOR: YEAR: JOURNAL: (optional actually) ABSTRACT: LINK: RELATED LINKS: ---- It is probably already made and in a more useful way than mine but, as I do need to practice, it's a start! Again, thanks for the tips. I'll look into those Bio.PubMed and Bio.GenBank. From anaryin at gmail.com Mon Sep 24 12:13:33 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Mon, 24 Sep 2007 17:13:33 +0100 Subject: [BioPython] Configuring Proxy for certain Modules Message-ID: Hello! I am working in a University whose network is proxied. I can't work with any of the BioPython modules that require access to the Internet (e.g. Bio.WWW). How can I configure them manually to override the proxy? I already read about configuring the urllib to use a proxy, but I can't figure out where to find the string that handles the connection. Jo?o Rodrigues From biopython at maubp.freeserve.co.uk Mon Sep 24 12:58:56 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 24 Sep 2007 17:58:56 +0100 Subject: [BioPython] Configuring Proxy for certain Modules In-Reply-To: References: Message-ID: <46F7ECD0.8020001@maubp.freeserve.co.uk> Jo?o Rodrigues wrote: > Hello! > > I am working in a University whose network is proxied. I can't work > with any of the BioPython modules that require access to the Internet > (e.g. Bio.WWW). How can I configure them manually to override the > proxy? I already read about configuring the urllib to use a proxy, > but I can't figure out where to find the string that handles the > connection. 
Bio.WWW uses urllib, so the simplest answer is to follow the advice in
http://docs.python.org/lib/module-urllib.html

Specifically on Windows you probably just need to set the http_proxy environment variable before starting Python, or configure the proxy in the internet settings (via Internet Explorer I assume). I think it would be easiest to set this environment variable once by hand, but you could set it at run time as part of your python script.

You'll have to consult your University's network documentation to determine the string to use for the http_proxy environment variable, but it would look something like "http://www.someproxy.com:3128" (i.e. address:port number).

The alternative is to pass the "proxies" option to urllib.urlopen(), but this would require multiple changes in Bio.WWW to support.

Note that urllib does not currently support proxies which require authentication.

Peter

From biopython at maubp.freeserve.co.uk Mon Sep 24 17:47:13 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 24 Sep 2007 22:47:13 +0100
Subject: [BioPython] poor man's databases for large sequence files
Message-ID: <46F83061.3090207@maubp.freeserve.co.uk>

I've been thinking about extending Bio.SeqIO to support a (read only) dictionary like interface for large sequence files (WITHOUT having everything in memory).

Some of the older Biopython sequence format specific modules have an index_file function and matching Dictionary class to do this (based internally on either Martel/Mindy or a DIY Biopython indexer based on pickle).

When thinking about a format agnostic SeqRecord dictionary, the built in python "Shelf" object from python's built in "shelve" library looks like a good choice. I could add a Bio.SeqIO.to_shelf() function similar to the existing Bio.SeqIO.to_dict() function.

The only downside I've thought of so far is updating a shelf database, something supported by shelve but with a few gotchas when dealing with non-trivial datatypes (like dictionaries).
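
To make the shelve idea concrete, here is a minimal sketch of what a to_shelf() style helper might look like. It is only an illustration under several assumptions: it is written for current Python 3 rather than the Python 2.4/2.5 used in this thread, it stores plain sequence strings rather than SeqRecord objects, and the names parse_fasta and to_shelf are hypothetical helpers, not existing Bio.SeqIO functions.

```python
import os
import shelve
import tempfile


def parse_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the title line as the dictionary key.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def to_shelf(records, path):
    """Hypothetical helper: store each record on disk under its identifier."""
    with shelve.open(path, flag="n") as shelf:
        for name, sequence in records:
            shelf[name] = sequence


fasta = [">seq1 first example", "ACGT", "ACGT", ">seq2 second example", "GGCC"]
shelf_path = os.path.join(tempfile.mkdtemp(), "sequences")
to_shelf(parse_fasta(fasta), shelf_path)

# Reopen read-only: each lookup unpickles one record, not the whole file.
with shelve.open(shelf_path, flag="r") as shelf:
    seq1 = shelf["seq1"]
    known_ids = sorted(shelf.keys())
```

Opening the shelf with flag="r" matches the read-only use case and sidesteps the update gotchas mentioned above, since nothing is ever modified in place.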
The need I am thinking about addressing is a little less flexible - read only low-memory access to a large collection of SeqRecords (typically from a large sequence file). Does anyone already use python's shelve library with sequence data? Peter From anaryin at gmail.com Mon Sep 24 19:11:57 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Tue, 25 Sep 2007 00:11:57 +0100 Subject: [BioPython] Configuring Proxy for certain Modules In-Reply-To: <46F7ECD0.8020001@maubp.freeserve.co.uk> References: <46F7ECD0.8020001@maubp.freeserve.co.uk> Message-ID: Again, thank you for the kind answer! I had in fact read about the urllib module and that was how I "discovered" that I could configure the proxy "by hand". If I set it automatically at the IE, or firefox, it won't work on Python, but it will on the browser. As for the http_proxy env variable, how do I set them? From sdavis2 at mail.nih.gov Mon Sep 24 21:40:21 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 24 Sep 2007 21:40:21 -0400 Subject: [BioPython] poor man's databases for large sequence files In-Reply-To: <46F83061.3090207@maubp.freeserve.co.uk> References: <46F83061.3090207@maubp.freeserve.co.uk> Message-ID: <46F86705.1090109@mail.nih.gov> Peter wrote: > I've been thinking about extending Bio.SeqIO to support a (read only) > dictionary like interface for large sequence files (WITHOUT having > everything in memory). > > Some of the older Biopython sequence format specific modules have an > index_file function and matching Dictionary class to do this (based > internally on either Martel/Mindy or a DIY Biopython indexer based on > pickle). > > When thinking about a format agnostic SeqRecord dictionary, the built in > python "Shelf" object from python's built in "shelve library" looks like > a good choice. I could add a Bio.SeqIO.to_shelf() function similar to > the existing Bio.SeqIO.to_dict() function. 
> > The only downside I've thought of so far is updating a shelf database, > something supported by shelve but with a few gotchas when dealing with > non-trivial datatypes (like dictionaries). The need I am thinking about > addressing is a little less flexible - read only low-memory access to a > large collection of SeqRecords (typically from a large sequence file). > > Does anyone already use python's shelve library with sequence data? > Just a curiosity, Peter, but would this extension deal with small collections of large sequences (finished genomes, for example)? Sean From biopython at maubp.freeserve.co.uk Tue Sep 25 04:14:50 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 25 Sep 2007 09:14:50 +0100 Subject: [BioPython] poor man's databases for large sequence files In-Reply-To: <46F86705.1090109@mail.nih.gov> References: <46F83061.3090207@maubp.freeserve.co.uk> <46F86705.1090109@mail.nih.gov> Message-ID: <46F8C37A.1000005@maubp.freeserve.co.uk> Sean Davis wrote: > Peter wrote: >> I've been thinking about extending Bio.SeqIO to support a (read only) >> dictionary like interface for large sequence files (WITHOUT having >> everything in memory). >> >> ... >> >> Does anyone already use python's shelve library with sequence data? >> > > Just a curiosity, Peter, but would this extension deal with small > collections of large sequences (finished genomes, for example)? > Hi Sean, What I had in mind was say indexing all of UniProt which is currently 1.1 GB in the SwissProt flat file format, but each record is pretty small. However, in theory this (largely unwritten) code could be used on any number of any sized records - but you would need enough ram to hold any one record in memory at once, plus some more RAM for the hopefully modest database overhead, python, your script etc. I suppose having all the chromosomes for a given Eukaryote (e.g. mouse or fruit fly) would also be a sensible examples; having tens of records where each is tens of MB in size. 
Is that the sort of thing you had in mind Sean? Peter From sdavis2 at mail.nih.gov Tue Sep 25 07:41:25 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 25 Sep 2007 07:41:25 -0400 Subject: [BioPython] poor man's databases for large sequence files In-Reply-To: <46F8C37A.1000005@maubp.freeserve.co.uk> References: <46F83061.3090207@maubp.freeserve.co.uk> <46F86705.1090109@mail.nih.gov> <46F8C37A.1000005@maubp.freeserve.co.uk> Message-ID: <46F8F3E5.5020802@mail.nih.gov> Peter wrote: > Sean Davis wrote: >> Peter wrote: >>> I've been thinking about extending Bio.SeqIO to support a (read only) >>> dictionary like interface for large sequence files (WITHOUT having >>> everything in memory). >>> >>> ... >>> >>> Does anyone already use python's shelve library with sequence data? >>> >> >> Just a curiosity, Peter, but would this extension deal with small >> collections of large sequences (finished genomes, for example)? > > Hi Sean, > > What I had in mind was say indexing all of UniProt which is currently > 1.1 GB in the SwissProt flat file format, but each record is pretty small. > > However, in theory this (largely unwritten) code could be used on any > number of any sized records - but you would need enough ram to hold any > one record in memory at once, plus some more RAM for the hopefully > modest database overhead, python, your script etc. > > I suppose having all the chromosomes for a given Eukaryote (e.g. mouse > or fruit fly) would also be a sensible examples; having tens of records > where each is tens of MB in size. Is that the sort of thing you had in > mind Sean? Yes. Lincoln Stein wrote some indexing stuff in perl that allows essentially random access to sequence records as well as subsets of individual records. It makes it possible to do range queries on individual sequences with very modest memory; with a larger memory machine, one might imagine that this would result in very fast queries as the files get cached. 
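
As a rough illustration of the offset-indexing idea described above, here is a minimal sketch in Python 3. It is not Lincoln Stein's Perl code and not an existing Biopython function; build_offset_index and fetch_sequence are hypothetical names, and the file is assumed to be plain FASTA. Only the table of byte offsets stays in memory, and a lookup seeks straight to one record.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each FASTA identifier to the byte offset of its '>' line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                index[line[1:].split()[0].decode("ascii")] = offset
    return index


def fetch_sequence(path, index, name):
    """Seek to one record and read only that record's sequence lines."""
    chunks = []
    with open(path, "rb") as handle:
        handle.seek(index[name])
        handle.readline()  # skip the title line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")


# Small demonstration file with two records.
fasta_path = os.path.join(tempfile.mkdtemp(), "example.fasta")
with open(fasta_path, "wb") as out:
    out.write(b">chr1\nACGTAC\nGT\n>chr2\nTTGGCC\n")

offsets = build_offset_index(fasta_path)
chr2 = fetch_sequence(fasta_path, offsets, "chr2")
chr1_slice = fetch_sequence(fasta_path, offsets, "chr1")[2:6]
```

This sketch still reads a whole record before slicing it. A true range query within one very large record would also need to know the line width, so that the byte position of any base can be computed without reading the bases before it.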
Sean From ytu888 at hotmail.com Fri Sep 28 07:40:09 2007 From: ytu888 at hotmail.com (Y Tu) Date: Fri, 28 Sep 2007 06:40:09 -0500 Subject: [BioPython] Error for installation of mxTextTools on Mac OS X Message-ID: I'm a newbie for the Biopython, and want to install it on my Mac OS X computer. I got the similar error messages on command line when install Python2.5, but finally I did that using the python-2.5.1-macosx.dmg. When I tried to install mxTextTools and got the following messages: mxDateTime.c is missing. Where to find the file? Please help me to solve the problem and thank you very much. LeesComputer:/Users/Python_Bio/egenix-mx-base-3.0.0.macosx-10.3-fat-py2.5_ucs4.prebuilt Lee$ sudo python setup.py build running build running mx_autoconf gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -D_GNU_SOURCE=1 -I/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3 -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.3/include -c _configtest.c -o _configtest.o success! 
removing: _configtest.c _configtest.o macros to define: [] macros to undefine: [] running build_ext building extension "mx.DateTime.mxDateTime.mxDateTime" (required) building 'mx.DateTime.mxDateTime.mxDateTime' extension gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -DUSE_FAST_GETCURRENTTIME -Imx/DateTime/mxDateTime -I/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3 -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.3/include -c mx/DateTime/mxDateTime/mxDateTime.c -o build/temp.darwin-8.10.1-i386-2.3_ucs2/mx-DateTime-mxDateTime-mxDateTime/mx/DateTime/mxDateTime/mxDateTime.o i686-apple-darwin8-gcc-4.0.1: mx/DateTime/mxDateTime/mxDateTime.c: No such file or directory i686-apple-darwin8-gcc-4.0.1: no input files error: command 'gcc' failed with exit status 1 _________________________________________________________________ Connect to the next generation of MSN Messenger? http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline From biopython at maubp.freeserve.co.uk Fri Sep 28 08:27:17 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Sep 2007 13:27:17 +0100 Subject: [BioPython] Error for installation of mxTextTools on Mac OS X In-Reply-To: References: Message-ID: <46FCF325.4040002@maubp.freeserve.co.uk> Y Tu wrote: > I'm a newbie for the Biopython, and want to install it on my Mac OS X > computer. I got the similar error messages on command line when > install Python2.5, but finally I did that using the > python-2.5.1-macosx.dmg. When I tried to install mxTextTools and got > the following messages: mxDateTime.c is missing. Where to find the > file? Please help me to solve the problem and thank you very much. It sounds like you don't want to use the default Apple provided python - I have the impression that this can make life more complicated. 
I'm not a Mac user, but Michiel, is and he may be able to help. He has been away recently but should be back soon. In terms of installing mxTextTools, you may get more support on the egenix mailing list. However, there are currently some issues with Biopython and egenix mxTextTools 3.0, so if you can find it I would suggest using version 2.0 instead. We hope to release Biopython 1.44 in October, which will address most of the mxTextText tools issues. That said, the majority of Biopython 1.43 will still work even with mxTextTools 3.0 Peter From ytu888 at hotmail.com Fri Sep 28 09:22:43 2007 From: ytu888 at hotmail.com (Y Tu) Date: Fri, 28 Sep 2007 08:22:43 -0500 Subject: [BioPython] Error for installation of mxTextTools on Mac OS X In-Reply-To: <46FCF325.4040002@maubp.freeserve.co.uk> References: <46FCF325.4040002@maubp.freeserve.co.uk> Message-ID: The one coming with Mac OS X is an old version. Therefore I installed the new one 2.5.1 and it succeeded. Then, it came the problem with mxTextTools. I just did the installation of Numerical and it worked. > Date: Fri, 28 Sep 2007 13:27:17 +0100 > From: biopython at maubp.freeserve.co.uk > To: ytu888 at hotmail.com; biopython at lists.open-bio.org > Subject: Re: [BioPython] Error for installation of mxTextTools on Mac OS X > > Y Tu wrote: > > I'm a newbie for the Biopython, and want to install it on my Mac OS X > > computer. I got the similar error messages on command line when > > install Python2.5, but finally I did that using the > > python-2.5.1-macosx.dmg. When I tried to install mxTextTools and got > > the following messages: mxDateTime.c is missing. Where to find the > > file? Please help me to solve the problem and thank you very much. > > It sounds like you don't want to use the default Apple provided python - > I have the impression that this can make life more complicated. I'm not > a Mac user, but Michiel, is and he may be able to help. He has been > away recently but should be back soon. 
> > In terms of installing mxTextTools, you may get more support on the > egenix mailing list. However, there are currently some issues with > Biopython and egenix mxTextTools 3.0, so if you can find it I would > suggest using version 2.0 instead. > > We hope to release Biopython 1.44 in October, which will address most of > the mxTextText tools issues. That said, the majority of Biopython 1.43 > will still work even with mxTextTools 3.0 > > Peter > _________________________________________________________________ Discover the new Windows Vista http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE From ytu888 at hotmail.com Fri Sep 28 11:26:11 2007 From: ytu888 at hotmail.com (Y Tu) Date: Fri, 28 Sep 2007 10:26:11 -0500 Subject: [BioPython] Error for running of ReportLab test on Mac OS X In-Reply-To: <46FCF325.4040002@maubp.freeserve.co.uk> References: <46FCF325.4040002@maubp.freeserve.co.uk> Message-ID: I just installed ReportLab on Mac OS X and the test with command "from reportlab.graphics import renderPDF" succeeded. However, when I run the test script (eportlab/test/test_pdfgen_general.py), I got the following error. How to fix the problem. Another question is how to run the script under the python prompt (>>>) after importing the script by "import test_pdfgen_general.py". Thank you very much. 
nypivs-lee:/Applications/MacPython 2.5/reportlab/test lee$ python test_pdfgen_general.py E ====================================================================== ERROR: Make a PDFgen document with most graphics features ---------------------------------------------------------------------- Traceback (most recent call last): File "test_pdfgen_general.py", line 833, in test0 run(outputfile('test_pdfgen_general.pdf')) File "test_pdfgen_general.py", line 796, in run c = makeDocument(filename) File "test_pdfgen_general.py", line 725, in makeDocument c.drawImage(tgif, 4*inch, 9.25*inch, w, h, mask='auto') File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfgen/canvas.py", line 629, in drawImage imgObj = pdfdoc.PDFImageXObject(name, image, mask=mask) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfdoc.py", line 1840, in __init__ self.loadImageFromA85(src) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfdoc.py", line 1846, in loadImageFromA85 imagedata = map(string.strip,pdfutils.makeA85Image(source,IMG=IMG)) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfutils.py", line 35, in makeA85Image raw = img.getRGBData() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/lib/utils.py", line 612, in getRGBData self._data = im.tostring() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 513, in tostring self.load() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load d = Image._getdecoder(self.mode, d, a, self.decoderconfig) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder raise IOError("decoder %s not available" % decoder_name) 
IOError: decoder jpeg not available

----------------------------------------------------------------------
Ran 1 test in 0.321s

FAILED (errors=1)
_________________________________________________________________
Invite your mail contacts to join your friends list with Windows Live Spaces. It's easy!
http://spaces.live.com/spacesapi.aspx?wx_action=create&wx_url=/friends.aspx&mkt=en-us

From biopython at maubp.freeserve.co.uk Fri Sep 28 12:28:28 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 17:28:28 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
Message-ID: <46FD2BAC.80401@maubp.freeserve.co.uk>

Y Tu wrote:
> I just installed ReportLab on Mac OS X and the test with command
> "from reportlab.graphics import renderPDF" succeeded. However, when I
> run the test script (reportlab/test/test_pdfgen_general.py), I got the
> following error. How to fix the problem.

I would guess you have not installed PIL, the Python Imaging Library, which ReportLab uses.

> Another question is how to
> run the script under the python prompt (>>>) after importing the
> script by "import test_pdfgen_general.py". Thank you very much.

To run a python script, like "test_pdfgen_general.py", at the command line type:

python test_pdfgen_general.py

(assuming python is on the path, and test_pdfgen_general.py is in the current directory)

In general there are two sorts of python files, scripts which you run (like test_pdfgen_general.py) and library modules you import.

Peter

From ytu888 at hotmail.com Fri Sep 28 15:18:06 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 14:18:06 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FD2BAC.80401@maubp.freeserve.co.uk>
References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk>
Message-ID:

Thank you, Peter for the prompt answer.
I did install the PIL already and tested with the commands "from PIL import Image", then "import _imaging". Both commands succeeded. That's why I don't understand why the test won't work. I used the command "python test_pdfgen_general.py" under the shell prompt, which generated the error. Since I installed PIL and succeeded in importing the module of PIL, I thought maybe I can solve the problem by running the test under Python. However, after importing the test into Python. I do't know how to launch the test under the python prompt (>>>). That's why I asked the second question. Once again, thank you very much for help. > Date: Fri, 28 Sep 2007 17:28:28 +0100 > From: biopython at maubp.freeserve.co.uk > To: ytu888 at hotmail.com; biopython at lists.open-bio.org > Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X > > Y Tu wrote: > > I just installed ReportLab on Mac OS X and the test with command > > "from reportlab.graphics import renderPDF" succeeded. However, when I > > run the test script (reportlab/test/test_pdfgen_general.py), I got the > > following error. How to fix the problem. > > I would guess you have not installed PIL, the Python Imaging Library, > which ReportLab uses. > > > Another question is how to > > run the script under the python prompt (>>>) after importing the > > script by "import test_pdfgen_general.py". Thank you very much. > > To run a python script, like "test_pdfgen_general.py", at the command > line type: > > python test_pdfgen_general.py > > (assuming python is on the path, and example.py is in the current directory) > > In general there are two sorts of python files, scripts which you run > (like test_pdfgen_general.py) and library modules you import. > > Peter > _________________________________________________________________ Invite your mail contacts to join your friends list with Windows Live Spaces. It's easy! 
http://spaces.live.com/spacesapi.aspx?wx_action=create&wx_url=/friends.aspx&mkt=en-us From biopython at maubp.freeserve.co.uk Fri Sep 28 15:42:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Sep 2007 20:42:31 +0100 Subject: [BioPython] Error for running of ReportLab test on Mac OS X In-Reply-To: References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> Message-ID: <46FD5927.3000207@maubp.freeserve.co.uk> Y Tu wrote: > Thank you, Peter for the prompt answer. > > I did install the PIL already and tested with the commands "from PIL > import Image", then "import _imaging". Both commands succeeded. > That's why I don't understand why the test won't work. I used the > command "python test_pdfgen_general.py" under the shell prompt, which > generated the error. Since I installed PIL and succeeded in importing > the module of PIL, I thought maybe I can solve the problem by running > the test under Python. Looking in more detail at the original stack trace, > File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load > d = Image._getdecoder(self.mode, d, a, self.decoderconfig) > File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder > raise IOError("decoder %s not available" % decoder_name) > IOError: decoder jpeg not available Its possible that PIL needs some optional JPEG library, which ReportLab wants to use. I suggest you search the ReportLab website & user's mailing list, and if you can't work out what is wrong sign up to their mailing list and ask them, http://www.reportlab.org/ Very little of Biopython needs ReportLab, you should be able to install Biopython without it. 
Peter From ibdeno at gmail.com Sun Sep 2 15:52:57 2007 From: ibdeno at gmail.com (=?ISO-8859-1?Q?Miguel_Ortiz-Lombard=EDa?=) Date: Sun, 2 Sep 2007 17:52:57 +0200 Subject: [BioPython] problem accessing ncbi through GenBank.NCBIDictionary Message-ID: Hello everyone. I'm trying to retrieve from NCBI a series of GeneBank records from a list read from a file. This is the code: 8<------------------------------------------------------------------------------------------- ncbi_dict = GenBank.NCBIDictionary("protein", "genbank") output = open(args[0]+'.gb','w') for gbid in ids: gb_record = ncbi_dict[gbid] output.write(gb_record) output.close() ------------------------------------------------------------------------------------------->8 The problem is that at some point the job stops with an error such as: Traceback (most recent call last): File "/Users/mol/bin/getfromGB.py", line 61, in ? main() File "/Users/mol/bin/getfromGB.py", line 54, in main gb_record = ncbi_dict[gbid] File "/sw/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 1264, in __getitem__ handle = self.db[id] File "/sw/lib/python2.4/site-packages/Bio/config/DBRegistry.py", line 89, in __getitem__ return self._get(key) File "/sw/lib/python2.4/site-packages/Bio/config/_support.py", line 107, in __call__ return self.fn(*args, **keywds) File "/sw/lib/python2.4/site-packages/Bio/config/DBRegistry.py", line 370, in _get handle = eutils_client.efetch(retmode = "text", rettype = File "/sw/lib/python2.4/site-packages/Bio/EUtils/DBIdsClient.py", line 150, in efetch complexity = complexity) File "/sw/lib/python2.4/site-packages/Bio/EUtils/ThinClient.py", line 987, in efetch_using_dbids query = {"id": id_string, File "/sw/lib/python2.4/site-packages/Bio/EUtils/ThinClient.py", line 644, in _get return self.opener.open(url) File "/sw/lib/python2.4/urllib2.py", line 364, in open response = meth(req, response) File "/sw/lib/python2.4/urllib2.py", line 471, in http_response response = self.parent.error( File 
"/sw/lib/python2.4/urllib2.py", line 402, in error return self._call_chain(*args) File "/sw/lib/python2.4/urllib2.py", line 337, in _call_chain result = func(*args) File "/sw/lib/python2.4/urllib2.py", line 480, in http_error_default raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) urllib2.HTTPError: HTTP Error 503: Service Temporarily Unavailable Sometimes is a 502 Error... Because I can access those entries from my browser without problem, I'm guessing that there may be a timeout problem here. I would appreciate your help! Cheers, Miguel -- correo-e: ibdeno at gmail.com ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Je suis de la mauvaise herbe, Braves gens, braves gens, Je pousse en libert? Dans les jardins mal fr?quent?s! Georges Brassens From sbassi at gmail.com Mon Sep 3 03:25:22 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Mon, 3 Sep 2007 00:25:22 -0300 Subject: [BioPython] Getting the location from a Genbank record Message-ID: I can get the "location" of the genes I want, but I have them in a "print mode" (calling __str__), but I don't see how to get the start and end position in a way I could use to slice the seq. There are private attributes _start and _end but I don't know if using them if the "right" way to do it. 
from Bio import SeqIO mr = SeqIO.parse(open("MTtabaco.gbk"), "genbank").next() targets=(['cox2'],['atp6'],['atp9'],['cob']) for x in mr.features: if x.qualifiers.get('gene') in targets: print x.location #print mr.seq Get the slice I am looking for: >>> mr.seq[x.location._start.position:x.location._end.position] Seq('ATGAATGTTATAACTCCTAATTCTTTGGTAGCGGACCTCTTTGATAGTTCGACCCTTATCCCCCGTCTAACTCAACTATTCGACTGTACGGCTATTGTGATTGCGAGAGAAAGGAGGGATGGCGCCTTCCTTTACCATCTGGCGGTTGAAAACAAAAGTGCTTCCAGGTACACGGCTGTTAGGCTCATCCAAGGCGTATTTACGGAAGTAGCAGGGAACTTGACCGTCAAGTTTGAAAAAAGCTGGCCAAGCCTGTGTCACTTTCTTACGTCAGGAGAAAGGGAGATCAAAGAAGTATGGGGCCGATACGCGAAGGATCAAATCATAGAGATAGCGGATCTTAAGAGGCGGAAGAAAAGGAACCTCGGCGACCCAGAGATCGCGGAGTCCGCGCCCGTGCCGAAAGTGAAGAAGCTTTCCTCTCCTTTCAGTCGAGCATGCCCGCCCTTTAGCACTTCCCTTCCCGAAGTGGGAGTAGGAGAAAGGAAAGCGCACTCGATCAATTACCATGCCGTGTCGTAA', IUPACAmbiguousDNA()) -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython at maubp.freeserve.co.uk Mon Sep 3 10:46:32 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 3 Sep 2007 11:46:32 +0100 Subject: [BioPython] Getting the location from a Genbank record In-Reply-To: References: Message-ID: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com> On 9/3/07, Sebastian Bassi wrote: > I can get the "location" of the genes I want, but I have them in a > "print mode" (calling __str__), but I don't see how to get the start > and end position in a way I could use to slice the seq. There are > private attributes _start and _end but I don't know if using them if > the "right" way to do it. 
>
> from Bio import SeqIO
> mr = SeqIO.parse(open("MTtabaco.gbk"), "genbank").next()
> targets=(['cox2'],['atp6'],['atp9'],['cob'])
> for x in mr.features:
>     if x.qualifiers.get('gene') in targets:
>         print x.location
>         #print mr.seq

I'm not at my own computer right now, but I think you need to do something like this to get the slice - assuming nothing funny like joins:

start = x.location.start.position
end = x.location.end.position
print mr.seq[start:end]
print mr.seq[start:end].reverse_complement()

See also:
http://www.warwick.ac.uk/go/peter_cock/python/genbank/

Peter

From sbassi at gmail.com Mon Sep 3 13:32:28 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 3 Sep 2007 10:32:28 -0300
Subject: [BioPython] Getting the location from a Genbank record
In-Reply-To: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>
References: <320fb6e00709030346s73852184u70fc3b8f44ba7ebe@mail.gmail.com>
Message-ID:

On 9/3/07, Peter wrote:
> start = x.location.start.position
> end = x.location.end.position

Yes, this worked. I tried x.location._start.position because of this:

>>> dir(x.location)
['__doc__', '__getattr__', '__init__', '__module__', '__str__', '_end', '_start']

Thank you!

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From biopython at maubp.freeserve.co.uk Mon Sep 3 16:47:06 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 03 Sep 2007 17:47:06 +0100
Subject: [BioPython] Extracting SeqFeature locations from sequences
Message-ID: <46DC3A8A.1000100@maubp.freeserve.co.uk>

I was prompted to actually write this email based on Sebastian Bassi's recent email where he was having trouble getting to grips with this topic.

I had been thinking that Biopython really should have code built in to take a SeqFeature's location and extract this from the full record sequence.
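
To show roughly what such an extraction has to do, here is a minimal sketch in Python 3 that works on plain strings rather than Seq and SeqFeature objects. Everything in it is an assumption made for illustration: extract_feature is a hypothetical helper, a location is given as a list of 0-based, end-exclusive (start, end) parts in ascending order, the strand applies to the feature as a whole, and fuzzy positions are not handled at all.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(sequence):
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def extract_feature(sequence, parts, strand=1):
    """Extract a feature described by (start, end) parts from a parent string.

    A simple feature has one part; a join has several. With strand=-1 the
    joined sequence is reverse complemented as a whole.
    """
    for start, end in parts:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError("part (%d, %d) is outside the sequence" % (start, end))
    joined = "".join(sequence[start:end] for start, end in parts)
    return reverse_complement(joined) if strand == -1 else joined


parent = "AAACCCGGGTTT"
simple = extract_feature(parent, [(3, 6)])
joined = extract_feature(parent, [(0, 3), (6, 9)])
reverse = extract_feature(parent, [(0, 3), (6, 9)], strand=-1)
```

Raising ValueError for a part that falls outside the parent sequence is one way to handle the cases where there really is no sensible answer.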
This would particularly apply to SeqRecord objects read from GenBank or EMBL files (using Bio.SeqIO or using Bio.GenBank directly).

As far as I am aware, right now it is up to the user to take the information stored in a SeqFeature and apply this "by hand" to the parent record's sequence. Adding some more detailed examples to the tutorial is probably a good idea - for example based on http://www.warwick.ac.uk/go/peter_cock/python/genbank/

In addition to improving the documentation, we could add a new method to the Seq and/or SeqRecord object which would return the sub-sequence defined by a SeqFeature. We could even do this via the __getitem__ method, normally used for accessing elements of a sequence (as strings) or slicing to get a sub-sequence.

e.g.

print seq[index]
print seq[start:end]
print seq[feature]

or,

print record[feature]

I think this is quite elegant, but a separate explicitly named method might be clearer and more discoverable.

To do this properly covering all cases is actually non-trivial - a good reason to have it built into Biopython (with a good test suite) rather than having end users reimplement it themselves.

Messy details to take care of include being aware of both joins and complements (stored as sub-features and the strand property respectively), and fuzzy locations. Most situations should be resolved relatively easily - but in the worst case we could throw a ValueError if there really is no sensible solution.

Peter

From biopython at maubp.freeserve.co.uk Tue Sep 4 13:05:21 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 04 Sep 2007 14:05:21 +0100
Subject: [BioPython] problem accessing ncbi through GenBank.NCBIDictionary
In-Reply-To:
References:
Message-ID: <46DD5811.8060209@maubp.freeserve.co.uk>

Miguel Ortiz-Lombardía wrote:
> Hello everyone.
>
> I'm trying to retrieve from NCBI a series of GeneBank records from a list
> read from a file.

How many GenBank identifiers are we talking about?
Just trying to get an idea of the scale of the problem. It certainly sounds like either network failures or timeouts. Have you tried something like this?

    from Bio import GenBank
    from urllib2 import HTTPError
    ncbi_dict = GenBank.NCBIDictionary("protein", "genbank")
    ids = ['14598510', '16904191']
    output = open('saved.gb','w')
    for gbid in ids:
        print "Fetching %s" % gbid
        try :
            gb_record = ncbi_dict[gbid]
        except HTTPError, e :
            #Check error code?
            print str(e)
            #Single retry; a second failure will still raise
            print "Re-trying %s" % gbid
            gb_record = ncbi_dict[gbid]
        output.write(gb_record)
    output.close()
    print "Done"

Peter From jimmy.musselwhite at gmail.com Tue Sep 4 13:23:37 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Tue, 4 Sep 2007 09:23:37 -0400 Subject: [BioPython] Bio.Cluster clarification Message-ID: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com> Hello all In the documentation it says the "data" argument is "an array containing the gene expression data". What exactly does that mean? Ideally all I want to do is send it an array of lists, each containing 3 floats, aka an array of vectors in 3d space, and have it cluster those. Is that doable? This may seem like a beginner question but I'm not sure of this documentation (cluster.pdf). Thanks! Or, less likely, if you know of any python lib that can handle this, let me know! From biopython at maubp.freeserve.co.uk Tue Sep 4 13:42:00 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 04 Sep 2007 14:42:00 +0100 Subject: [BioPython] Bio.Cluster clarification In-Reply-To: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com> References: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com> Message-ID: <46DD60A8.7070403@maubp.freeserve.co.uk> Jimmy Musselwhite wrote: > Hello all > In the documentation it says the "data" argument is "an array > containing the gene expression data". What exactly does that mean? I suspect that means an array object from the Numeric library. i.e.
a two dimensional dataset of floats. In the context of gene expression, the rows are usually different genes and the columns different samples (typically covering two or more experimental conditions), and the data points are simply floating point numbers (gene expression levels). > Ideally all I want to do is send it an array of lists, each > containing 3 floats, aka an array of vectors in 3d space, and have it > cluster those. Is that doable? When you say you have an array of three-vectors, do you mean you have a three dimensional dataset? e.g. a vector field > This may seem like a beginner question but I'm not sure of this > documentation (cluster.pdf). Hopefully Michiel will reply shortly - as the author of Bio.Cluster, he should be able to give you a more precise answer. See also his webpage: http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ Peter From mdehoon at c2b2.columbia.edu Tue Sep 4 13:47:49 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Tue, 04 Sep 2007 22:47:49 +0900 Subject: [BioPython] Bio.Cluster clarification In-Reply-To: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com> References: <86e5e8970709040623n66dcb850sfc3fc74c5c2e3e19@mail.gmail.com> Message-ID: <46DD6205.7070801@c2b2.columbia.edu> Jimmy Musselwhite wrote: > Hello all > In the documentation it says the "data" argument is "an array containing the > gene expression data". What exactly does that mean? Ideally all I want to do > is send it an array of lists, each containing 3 floats, aka an array of > vectors in 3d space, and have it cluster those. Is that doable? Yes. --Michiel. > > This may seem like a beginner question but I'm not sure of this > documentation (cluster.pdf). > > Thanks! > > Or, less likely, if you know of any python lib that can handle this, let me > know! 
> _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From ibdeno at gmail.com Tue Sep 4 14:55:27 2007 From: ibdeno at gmail.com (=?ISO-8859-1?Q?Miguel_Ortiz-Lombard=EDa?=) Date: Tue, 4 Sep 2007 16:55:27 +0200 Subject: [BioPython] problem accessing ncbi through GenBank.NCBIDictionary In-Reply-To: <46DD5811.8060209@maubp.freeserve.co.uk> References: <46DD5811.8060209@maubp.freeserve.co.uk> Message-ID: Eventually, I managed to download all of them (21 only...) But thank you very much for the tip, I will incorporate that error check/try to the script! Cheers, Miguel 2007/9/4, Peter : > > Miguel Ortiz-Lombardía wrote: > > Hello everyone. > > > > I'm trying to retrieve from NCBI a series of GeneBank records from a > list > > read from a file. > > How many GenBank identifiers are we talking about? Just trying to get an > idea of the scale of the problem. It certainly sounds like either > network failures or timeouts. Have you tried something like this? > > from Bio import GenBank > from urllib2 import HTTPError > ncbi_dict = GenBank.NCBIDictionary("protein", "genbank") > ids = ['14598510', '16904191'] > output = open('saved.gb','w') > for gbid in ids: > print "Fetching %s" % gbid > try : > gb_record = ncbi_dict[gbid] > except HTTPError, e : > #Check error code? > print str(e) > print "Re-trying %s" % gbid > gb_record = ncbi_dict[gbid] > output.write(gb_record) > output.close() > print "Done" > > Peter > > -- correo-e: ibdeno at gmail.com ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Je suis de la mauvaise herbe, Braves gens, braves gens, Je pousse en liberté Dans les jardins mal fréquentés! Georges Brassens From meesters at uni-mainz.de Wed Sep 5 15:47:07 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Wed, 5 Sep 2007 17:47:07 +0200 Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein?
Message-ID: <1189007228.27068.31.camel@cmeesters> Hi, Does anyone know a way to compute the maximum distance within a protein (perhaps using Bio.PDB) without calculating distances of all atom pairs? I'm hoping to be just too blind to see an easy solution here ... TIA Christian From idoerg at gmail.com Wed Sep 5 16:02:04 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Wed, 5 Sep 2007 09:02:04 -0700 Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein? In-Reply-To: <1189007228.27068.31.camel@cmeesters> References: <1189007228.27068.31.camel@cmeesters> Message-ID: Not sure why you would want to do that. But how about calculating the diameter of an enclosing sphere? On 9/5/07, Christian Meesters wrote: > > Hi, > > Does anyone know a way to compute the maximum distance within a protein > (perhaps using Bio.PDB) without calculating distances of all atom > pairs? > > I'm hoping to be just too blind to see an easy solution here ... > > TIA > Christian > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- I. Friedberg "The only problem with troubleshooting is that sometimes trouble shoots back." From biopython at maubp.freeserve.co.uk Wed Sep 5 16:24:06 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 05 Sep 2007 17:24:06 +0100 Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein? In-Reply-To: <1189007228.27068.31.camel@cmeesters> References: <1189007228.27068.31.camel@cmeesters> Message-ID: <46DED826.4050802@maubp.freeserve.co.uk> Christian Meesters wrote: > Hi, > > Does anyone know a way to compute the maximum distance within a protein > (perhaps using Bio.PDB) without calculating distances of all atom > pairs? Are you thinking alpha-carbon to alpha-carbon distances, or using all atoms? > I'm hoping to be just too blind to see an easy solution here ... 
There should be some way to take advantage of the backbone links meaning lots of residues are constrained to be close to each other... Is it essential to get the largest pairwise distance, or would a local maximum do? You could probably do some clever sampling, say doing all pairwise combination of every third residue, and then for those furthest apart including all the local residues... just thinking out loud. Peter From ibdeno at gmail.com Wed Sep 5 17:31:55 2007 From: ibdeno at gmail.com (=?ISO-8859-1?Q?Miguel_Ortiz-Lombard=EDa?=) Date: Wed, 5 Sep 2007 19:31:55 +0200 Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein? In-Reply-To: <46DED826.4050802@maubp.freeserve.co.uk> References: <1189007228.27068.31.camel@cmeesters> <46DED826.4050802@maubp.freeserve.co.uk> Message-ID: Hello, You can align the protein coordinates against its principal axes of inertia. This is very fast. One (free) program doing so is 'moleman2' from the Uppsala Software Factory: http://alpha2.bmc.uu.se/~gerard/usf/ HTH, Miguel 2007/9/5, Peter : > > Christian Meesters wrote: > > Hi, > > > > Does anyone know a way to compute the maximum distance within a protein > > (perhaps using Bio.PDB) without calculating distances of all atom > > pairs? > > Are you thinking alpha-carbon to alpha-carbon distances, or using all > atoms? > > > I'm hoping to be just too blind to see an easy solution here ... > > There should be some way to take advantage of the backbone links meaning > lots of residues are constrained to be close to each other... Is it > essential to get the largest pairwise distance, or would a local maximum > do? > > You could probably do some clever sampling, say doing all pairwise > combination of every third residue, and then for those furthest apart > including all the local residues... just thinking out loud. 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- correo-e: ibdeno at gmail.com ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Je suis de la mauvaise herbe, Braves gens, braves gens, Je pousse en liberté Dans les jardins mal fréquentés! Georges Brassens From thamelry at binf.ku.dk Wed Sep 5 18:19:28 2007 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Wed, 5 Sep 2007 20:19:28 +0200 Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein? In-Reply-To: References: <1189007228.27068.31.camel@cmeesters> <46DED826.4050802@maubp.freeserve.co.uk> Message-ID: <2d7c25310709051119r18e278cag70f4750272f3cea@mail.gmail.com> Hi, This is one of those problems that computational geometry people love to solve. See for example: http://www-sop.inria.fr/epidaure/personnel/malandain/diameter/ Google will give many other algorithms... Cheers, -Thomas From meesters at uni-mainz.de Thu Sep 6 12:37:13 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Thu, 6 Sep 2007 14:37:13 +0200 Subject: [BioPython] using Bio.PDB: fast way to get the maximum distance within a protein? Message-ID: <1189082233.20772.37.camel@cmeesters> Hi, Thanks for the input. To clarify what I actually wanted: I need a rather precise (+/- 2 Å) estimate of the maximum distance within a protein - taking all atoms, including sugar residues in glycosylated proteins for example, into account. So, restricting myself to CA-atoms does not really help. The approach should not rely on symmetry, since not all proteins have symmetry. Thinking about the problem once more, I decided to make use of the Har-Peled approach Thomas pointed me (indirectly) to.
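
For readers following the thread, here is a minimal sketch of the cheapest variant of this idea, written in present-day Python 3 rather than the Python 2 used elsewhere in this archive. It is not the Har-Peled algorithm mentioned above: it is a plain repeated farthest-point sweep that returns a lower bound on the maximum distance, next to the all-pairs computation for comparison. The function names and the toy coordinates are made up for illustration; with Bio.PDB one would presumably pass in the atom coordinates instead.

```python
import math
from itertools import combinations


def farthest_from(point, points):
    """Return (distance, other_point) for the point farthest from `point`."""
    return max((math.dist(point, other), other) for other in points)


def sweep_diameter(points, max_rounds=10):
    """Lower bound on the largest pairwise distance via repeated sweeps.

    Each round costs O(n). The result never exceeds the true maximum,
    but this is a heuristic and it can miss the true maximum.
    """
    best, current = farthest_from(points[0], points)
    for _ in range(max_rounds):
        dist, nxt = farthest_from(current, points)
        if dist <= best:
            break  # no improvement, stop sweeping
        best, current = dist, nxt
    return best


def brute_force_diameter(points):
    """Exact O(n^2) reference: the all-pairs computation."""
    return max(math.dist(a, b) for a, b in combinations(points, 2))


# Toy data: the corners of a unit cube plus its centre (not real atoms).
points = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
points.append((0.5, 0.5, 0.5))

lower = sweep_diameter(points)
exact = brute_force_diameter(points)
print(lower, exact)
```

By the triangle inequality the true maximum lies between the value returned by the sweep and twice that value, which is far too loose for a +/- 2 Å requirement on its own. That is why an exact method, such as the one referenced above, is still needed when the precision matters.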
Again, Thanks a lot, Christian From biopython at maubp.freeserve.co.uk Sun Sep 9 21:17:04 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 09 Sep 2007 22:17:04 +0100 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: <46D31C97.1070200@maubp.freeserve.co.uk> References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> <46D31C97.1070200@maubp.freeserve.co.uk> Message-ID: <46E462D0.5090207@maubp.freeserve.co.uk> Peter wrote: > I think having SeqRecord subclass Seq is nicer than simply adding > annotation to the Seq class. Seq objects would (still) just have a > sequence and alphabet, the SeqRecord becomes a rich/annotated Seq object. > > I think this would be close to BioPerl's Seq and RichSeq objects. > > I have filed an enhancement on Bugzilla to hold any suggested patches > etc (I hope to upload something later tonight): > > Bug 2351 - Make SeqRecord subclass Seq subclass string? > http://bugzilla.open-bio.org/show_bug.cgi?id=2351 Going back over the mailing list archives, we discussed something similar on the dev mailing list back in early 2005. I would like to make the following "small" change now, ready for the next release of Biopython: (1) Make __str__ give the full sequence as a string for Seq and MutableSeq objects, allowing intuitive use of str(myseq) which used to give a truncated representation including the alphabet. (2) tostring() will be documented as deprecated in favour of str(...) (3) leave __repr__ as is (giving the full string with an alphabet) which can be used with eval(repr(myseq))) There will be some fallout to this - in particular we'll need to go over the documentation and may need to fix a few things. The only downside is the loss of a built in method to get a "short seq string representation" (currently available as str(myseq) via __str__). 
Back in 2005, Frédéric Sohm suggested adding a short() method to do this. Personally I'd only use this when working at the command line, but it might be nice. One refinement over the current truncation is I would personally include the last three letters - this is handy when looking at genes as you might want to know if there was a stop codon present. e.g. Seq('MLKILLATTMLIPTAFILKPQILHQTMISYTFILTLFSLIFLKQNQYLKPLSNLYLN...LVL', SingleLetterAlphabet()) rather than: Seq('MLKILLATTMLIPTAFILKPQILHQTMISYTFILTLFSLIFLKQNQYLKPLSNLYLNLDQ ...', SingleLetterAlphabet()) and similarly for nucleotides (which is why I suggest at least the last three trailing letters). Peter From mdehoon at c2b2.columbia.edu Mon Sep 10 00:04:28 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Mon, 10 Sep 2007 09:04:28 +0900 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: <46E462D0.5090207@maubp.freeserve.co.uk> References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> <46D31C97.1070200@maubp.freeserve.co.uk> <46E462D0.5090207@maubp.freeserve.co.uk> Message-ID: <46E48A0C.1050403@c2b2.columbia.edu> Peter wrote: > I would like to make the following "small" change now, ready for the > next release of Biopython: > > (1) Make __str__ give the full sequence as a string for Seq and > MutableSeq objects, allowing intuitive use of str(myseq) which > used to give a truncated representation including the alphabet. Note that the __str__ is used to create the output of "print myseq", where myseq is a Seq object. So if __str__ returns the full sequence string, then "print myseq" will print the full sequence. This is not necessarily what you want. In essence, the str() function and the .tostring() method have different functions. So I think we should not drop .tostring() in favor of str().
Moreover, this problem will go away if and when a Seq object subclasses from a string object. Then, we won't need a Seq-to-string function at all. --Michiel. From biopython at maubp.freeserve.co.uk Mon Sep 10 08:27:18 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 10 Sep 2007 09:27:18 +0100 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: <46E48A0C.1050403@c2b2.columbia.edu> References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> <46D31C97.1070200@maubp.freeserve.co.uk> <46E462D0.5090207@maubp.freeserve.co.uk> <46E48A0C.1050403@c2b2.columbia.edu> Message-ID: <46E4FFE6.9040608@maubp.freeserve.co.uk> We seem to be talking at cross purposes. Michiel de Hoon wrote: > Peter wrote: >> I would like to make the following "small" change now, ready for >> the next release of Biopython: >> >> (1) Make __str__ give the full sequence as a string for Seq and >> MutableSeq objects, allowing intuitive use of str(myseq) which used >> to give a truncated representation including the alphabet. > > Note that the __str__ is used to create the output of "print myseq", > where myseq is a Seq object. So if __str__ returns the full sequence > string, then "print myseq" will print the full sequence. This is not > necessarily what you want. Getting the full string from both "print my_seq" and str(my_seq) is what I would expect from a Seq object that acted like a string. > In essence, the str() function and the .tostring() method have > different functions. So I think we should not drop .tostring() in > favor of str(). At the moment str() and .tostring() do serve purposes. 
Currently with a Seq object called my_seq:
* full sequence as string - my_seq.tostring()
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - str(my_seq)

What I would like:
* full sequence as string - str(my_seq) and retain my_seq.tostring() for backwards compatibility.
* representation with full sequence with alphabet - repr(my_seq)
* truncated sequence as string - not built in
* representation with truncated sequence with alphabet - consider adding a new method e.g. my_seq.short()

> Moreover, this problem will go away if and when a Seq object > subclasses from a string object. Then, we won't need a Seq-to-string > function at all.

What do you mean by the "problem will go away"? This would be much easier to discuss in person :( If/when we make Seq a subclass of string, there would still be __str__ and __repr__ methods, and I would expect str(my_seq) and also "print my_seq" to give the full sequence. For backwards compatibility I would keep the existing .tostring() method as well. I would find it very strange to have the Seq object subclass string, but doing str(my_seq) not give me the full sequence.
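
To make the "What I would like" list concrete, here is a small sketch in present-day Python 3. The class `SketchSeq` is a toy stand-in invented for illustration, not Biopython's Seq class, and the `head`/`tail` parameters of `short()` are assumptions; only the method names `tostring()` and `short()` come from the discussion.

```python
class SketchSeq:
    """Toy stand-in for a Seq-like object; not Biopython's Seq class."""

    def __init__(self, data, alphabet="SingleLetterAlphabet()"):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        # Proposed behaviour: str() and print give the full sequence.
        return self.data

    def __repr__(self):
        # Full sequence plus alphabet, as repr() is described in the thread.
        return "Seq(%r, %s)" % (self.data, self.alphabet)

    def tostring(self):
        # Kept only for backwards compatibility.
        return str(self)

    def short(self, head=57, tail=3):
        # Truncated view that keeps the last few letters visible,
        # e.g. to see whether a stop codon is present.
        if len(self.data) <= head + tail:
            return repr(self)
        shown = "%s...%s" % (self.data[:head], self.data[-tail:])
        return "Seq(%r, %s)" % (shown, self.alphabet)


# Synthetic nucleotide string, used purely as example data.
data = "ATG" + "GCT" * 30 + "TAA"
my_seq = SketchSeq(data)

print(my_seq)
print("My sequence is %s, length %i" % (my_seq, len(data)))
print(my_seq.short())
```

With `__str__` returning the full sequence, string formatting works without an explicit `.tostring()` call, while the truncated view stays available under its own name.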
Isn't making str(my_seq) return the full sequence as a string essential for things like this?:

    print my_seq
    print "My sequence is %s, length %i" % (my_seq, len(my_seq))

Rather than as currently required:

    print my_seq.tostring()
    print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq))

Peter From mdehoon at c2b2.columbia.edu Mon Sep 10 09:56:25 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Mon, 10 Sep 2007 18:56:25 +0900 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: <46E4FFE6.9040608@maubp.freeserve.co.uk> References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> <46D31C97.1070200@maubp.freeserve.co.uk> <46E462D0.5090207@maubp.freeserve.co.uk> <46E48A0C.1050403@c2b2.columbia.edu> <46E4FFE6.9040608@maubp.freeserve.co.uk> Message-ID: <46E514C9.2010006@c2b2.columbia.edu> Let's have the Seq/MutableSeq/SeqRecord discussion after the upcoming release, which is only five days away. There's not enough time to discuss these issues in detail, let alone to test them. --Michiel. Peter wrote: > We seem to be talking at cross purposes. > > Michiel de Hoon wrote: >> Peter wrote: >>> I would like to make the following "small" change now, ready for >>> the next release of Biopython: >>> >>> (1) Make __str__ give the full sequence as a string for Seq and >>> MutableSeq objects, allowing intuitive use of str(myseq) which used >>> to give a truncated representation including the alphabet. >> >> Note that the __str__ is used to create the output of "print myseq", >> where myseq is a Seq object. So if __str__ returns the full sequence >> string, then "print myseq" will print the full sequence. This is not >> necessarily what you want. > > Getting the full string from both "print my_seq" and str(my_seq) is what > I would expect from a Seq object that acted like a string.
> >> In essence, the str() function and the .tostring() method have >> different functions. So I think we should not drop .tostring() in >> favor of str(). > > At the moment str() and .tostring() do serve purposes. Currently with a > Seq object called my_seq: > * full sequence as string - my_seq.tostring() > * representation with full sequence with alphabet - repr(my_seq) > * truncated sequence as string - not built in > * representation with truncated sequence with alphabet - str(my_seq) > > What I would like: > * full sequence as string - str(my_seq) and retain my_seq.tostring() for > backwards compatibility. > * representation with full sequence with alphabet - repr(my_seq) > * truncated sequence as string - not built in > * representation with truncated sequence with alphabet - consider added > a new method e.g. my_seq.short() > >> Moreover, this problem will go away if and when a Seq object >> subclasses from a string object. Then, we won't need a Seq-to-string >> function at all. > > What do you mean by the "problem will go away"? This would be much > easier to discuss in person :( > > If/when we make Seq a subclass of string, there would still be __str__ > and __repr__ methods, and I would expect str(my_seq) and also "print > my_seq" to give the full sequence. For backwards compatibility I would > keep the existing .tostring() method as well. > > I would find it very strange to have the Seq object subclass string, but > doing str(my_seq) not give me the full sequence. 
Isn't making > str(my_seq) return the full sequence as a string essential for things > like this?: > > print my_seq > print "My sequence is %s, length %i" % (my_seq, len(my_seq)) > > Rather than as currently required: > > print my_seq.tostring() > print "My sequence is %s, length %i" % (my_seq.tostring(), len(my_seq)) > > > Peter > From mdehoon at c2b2.columbia.edu Tue Sep 11 14:37:57 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Tue, 11 Sep 2007 23:37:57 +0900 Subject: [BioPython] Bio.MultiProc Message-ID: <46E6A845.3030601@c2b2.columbia.edu> Hi everybody, In preparation for the upcoming release, I was running the Biopython test suite and found that test_copen.py hangs on Cygwin. It doesn't fail, it just sits there forever. This may be related to the use of fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it is probably possible to fix this, I'd have to dig fairly deep into the code, and I am not sure if it is worth it. It looks like the copen functions are used only in Bio/config, which is needed for Bio.db. A description of the functionality of this module can be found in the tutorial section 4.7.2. Now, I don't remember users asking about this module on the mailing list. From the tutorial documentation, it seems to be a nice piece of code, but I doubt that it is being used often in practice. So I was wondering: 1) Is anybody on this list using this code? 2) If not, can I mark it as deprecated for the upcoming release? Hopefully, people who are using this code will notice, and let us know that they need it. --Michiel. From biopython at maubp.freeserve.co.uk Wed Sep 12 18:31:43 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Sep 2007 19:31:43 +0100 Subject: [BioPython] Deprecating Bio.FormatIO ?
Message-ID: <46E8308F.6040709@maubp.freeserve.co.uk> With the release Biopython 1.43 and Bio.SeqIO earlier this year, would anyone be upset if the older Bio.FormatIO module was marked as deprecated for the next Biopython release? This module isn't mentioned in the tutorial/cookbook, but Brad did write this entire document: http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.pdf http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html In addition to marking Bio.FormatIO as deprecated, I would probably add a big disclaimer to that document, or re-write it to use Bio.SeqIO instead. Thanks Peter From mdehoon at c2b2.columbia.edu Thu Sep 13 05:13:29 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 13 Sep 2007 01:13:29 -0400 Subject: [BioPython] Deprecating Fasta.Dictionary, GenBank.Dictionary Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B61E@mail2.exch.c2b2.columbia.edu> Hi everybody, In the preparation for the upcoming Biopython release, we noticed some serious problems when using the latest version (3.0) of mxTextTools. We were already able to fix several of them, but some Biopython tests still fail with the new mxTextTools. One of the tests that fails is test_Fasta.py. The part of the test that fails is related to creating a Fasta Dictionary. This is not explicitly described in the Tutorial, but it is essentially the same as creating a Genbank dictionary, which is explained in section 4.3.4 in the Tutorial. 
Quoting from the tutorial:

>>> from Bio import GenBank
>>> dict_file = 'cor6_6.gb'
>>> index_file = 'cor6_6.idx'
>>> GenBank.index_file(dict_file, index_file)
>>> gb_dict = GenBank.Dictionary(index_file, GenBank.FeatureParser())
>>> len(gb_dict)
>>> gb_dict.keys()
['L31939', 'AJ237582', 'X62281', 'AF297471', 'M81224', 'X55053']
>>> gb_dict['AJ237582']

The same can also be obtained with the new Bio.SeqIO code:

>>> from Bio import SeqIO
>>> records = SeqIO.parse(open('cor6_6.gb'), 'genbank')
>>> gb_dict = {}
>>> for record in records:
...     key = record.id.split(".")[0]
...     gb_dict[key] = record
...
>>> gb_dict.keys()
['M81224', 'AF297471', 'X62281', 'AJ237582', 'L31939', 'X55053']
>>> # etcetera

(you can also use the to_dict function in Bio.SeqIO). The same can also be done for Fasta. So, I'd like to deprecate the index_file functions where Bio.SeqIO can be used instead, in particular for Fasta. Then, we can remove that particular test from test_Fasta. Would that cause problems for anybody? Given the new Bio.SeqIO code, does anybody still need to use the index_file functions? --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From letondal at pasteur.Fr Fri Sep 14 19:12:28 2007 From: letondal at pasteur.Fr (Catherine Letondal) Date: Fri, 14 Sep 2007 21:12:28 +0200 Subject: [BioPython] Programming course at Institut Pasteur (winter 2008) Message-ID: <19581094-71E1-4400-83ED-12EF051BF7CB@pasteur.Fr> Hi, ************************************************************************ Course in informatics for biology 2008 at Institut Pasteur http://www.pasteur.fr/formation/infobio-en.html *** Registration extended to October 15th 2007 *** ************************************************************************ In the series of courses offered at the Pasteur Institute, a course will be offered in informatics in biology. The next session will take place from January to end of April 2008.
The main goal of this course is to provide researchers in biology an initial exposure to informatics. Admittance to the course is reserved for those with a degree in biology or a related discipline. With more and more bioinformatics tools available, it becomes increasingly important for researchers in biology to be able to manage their data, implement their ideas, and judge for themselves the usefulness of new algorithms and software. This course will emphasize fundamental aspects of computer science and apply them to biological examples. Theoretical aspects (algorithm development, logic, problem modeling and design methods), and technical applications (databases and web technologies) that are relevant for biologists will be thoroughly discussed. Programming is presented through the object-oriented paradigm, using a modern high-level language, Python, provided with tools for biology and enabling both prototyping or scripting and the building of important software systems. Learning of an additional language (C) will be available for interested students. Learning during the course will be reinforced with computing exercises, and effective training will be provided by a 2 month research project. The working language of the course is French. For further information, please consult: http://www.pasteur.fr/formation/infobio-en.html *** Registration will be closed on October 15th 2007. *** Sincerely, -- Benno Schwikowski & Catherine Letondal Institut Pasteur -- Course in Informatics for Biology www.pasteur.fr/formation/infobio From dalloliogm at gmail.com Mon Sep 17 09:39:38 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 17 Sep 2007 11:39:38 +0200 Subject: [BioPython] sequence logo with biopython Message-ID: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> Hi, is there any way to produce sequence logos[1] with biopython? I have a set of sequences of the same length, which represent the 5' donorsite in a set of introns.
I wonder if there is a way to create and display a .png logo representation of them, like with this program: - http://weblogo.berkeley.edu/ Thanks!! [1] http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From bartek at rezolwenta.eu.org Mon Sep 17 09:49:12 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Mon, 17 Sep 2007 11:49:12 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> Message-ID: <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> Giovanni Marco Dall'Olio wrote: > Hi, > is there any way to produce sequence logos[1] with biopython? > > I have a set of sequences of the same length, which represent the 5' > donorsite in a set of introns. > I wonder if there is a way to create and display a .png logo > representation of them, like with this program: > - http://weblogo.berkeley.edu/ > Unfortunately, currently there is no solution in biopython to this.
You can however take a look at TAMO, a python library designed for working with motifs. http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with it, but there are ways to at least obtain text version of the logo. -- regards Bartek -- For every complex problem there is an answer that is clear, simple, and wrong. H. L. Mencken From bartek at rezolwenta.eu.org Mon Sep 17 13:35:22 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Mon, 17 Sep 2007 15:35:22 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> Message-ID: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> bartek wilczynski wrote: > Giovanni Marco Dall'Olio wrote: > > > Hi, > > is there any way to produce sequence logos[1] with biopython? > > > > I have a set of sequences of the same length, which represent the 5' > > donorsite in a set of introns. > > I wonder if there is a way to to create and display a .png logo > > representation of them, like with this program: > > - http://weblogo.berkeley.edu/ > > > > Unfortunately, currently there is no solution in biopython to this. You can > however take a look at TAMO, a python library designed for working with > motifs. > http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with > it, > but there are ways to at least obtain text version of the logo. > I've looked into it, and found a way to add this functionality to biopython. The diff file attached introduces a method .weblogo("filename.png") to the Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a standalone function which takes a fasta file as input. Is it a right time to submit things like this to cvs? I can do that, but I do not want to mess up the (soon to be available) new release. 
-- regards Bartek -- For every complex problem there is an answer that is clear, simple, and wrong. H. L. Mencken -------------- next part -------------- A non-text attachment was scrubbed... Name: Motif.py Type: text/x-python Size: 8378 bytes Desc: not available URL: From dalloliogm at gmail.com Mon Sep 17 14:23:54 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 17 Sep 2007 16:23:54 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> Message-ID: <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com> Thank you: this is very good. I see that it uses the berkeley weblogo website and urllib. just one newbie question: why do you put it in the Bio.AlignAce.Motif class? Thanks Giovanni 2007/9/17, bartek wilczynski : > bartek wilczynski wrote: > > > Giovanni Marco Dall'Olio wrote: > > > > > Hi, > > > is there any way to produce sequence logos[1] with biopython? > > > > > > I have a set of sequences of the same length, which represent the 5' > > > donorsite in a set of introns. > > > I wonder if there is a way to to create and display a .png logo > > > representation of them, like with this program: > > > - http://weblogo.berkeley.edu/ > > > > > > > Unfortunately, currently there is no solution in biopython to this. You can > > however take a look at TAMO, a python library designed for working with > > motifs. > > http://fraenkel.mit.edu/TAMO/ I'm not sure if you can make png files with > > it, > > but there are ways to at least obtain text version of the logo. > > > > I've looked into it, and found a way to add this functionality to biopython. > > The diff file attached introduces a method .weblogo("filename.png") to the > Bio.AlignAce.Motif class. 
It is relatively easy to modify the method to be a > standalone function which takes a fasta file as input. > > Is it a right time to submit things like this to cvs? I can do that, but I do > not want to mess up the (soon to be available) new release. > > -- > regards > Bartek > -- > For every complex problem there is an answer that is clear, simple, and wrong. > H. L. Mencken > > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From biopython at maubp.freeserve.co.uk Mon Sep 17 14:24:42 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 17 Sep 2007 15:24:42 +0100 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> Message-ID: <46EE8E2A.3080809@maubp.freeserve.co.uk> bartek wilczynski wrote: > I've looked into it, and found a way to add this functionality to biopython. > > The diff file attached introduces a method .weblogo("filename.png") to the > Bio.AlignAce.Motif class. It is relatively easy to modify the method to be a > standalone function which takes a fasta file as input. > > Is it a right time to submit things like this to cvs? I can do that, but I do > not want to mess up the (soon to be available) new release. It's a very small change, but let's see what Michiel says for the timing. It might be nice to expose all the options to the end user, possibly as handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds as in Bio/Blast/NCBIStandalone.py blastall() etc.
Peter From bartek at rezolwenta.eu.org Mon Sep 17 14:57:52 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Mon, 17 Sep 2007 16:57:52 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <5aa3b3570709170723w19574b98x4d974b025a9d4622@mail.gmail.com> Message-ID: <1190041072.46ee95f03d51d@imp.rezolwenta.eu.org> Giovanni Marco Dall'Olio wrote: > Thank you: this is very good. > > I see that it uses the berkeley weblogo website and urllib. > > just one newbie question: why do you put it in the Bio.AlignAce.Motif class? > Thanks Well, the quick answer is that it is the most convenient place for me to put it. Since there is a Motif class for sequence motif objects, it is not a bad one. A longer answer is that biopython does not have a good infrastructure for dealing with motifs. I've contributed the AlignAce lib, Jason Hackney contributed the MEME library, which includes another Motif class, very similar, but not exactly compatible with AlignAce code. I planned once to do some refactoring work to unify these two modules, but so far have not found the time to do it. Now, since the TAMO library is available, there is even less incentive to do so (even though I do not use TAMO myself).
cheers bartek From bartek at rezolwenta.eu.org Mon Sep 17 22:09:30 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 18 Sep 2007 00:09:30 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <46EE8E2A.3080809@maubp.freeserve.co.uk> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> Message-ID: <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> Peter wrote: > > Is it a right time to submit things like this to cvs? I can do that, but I > > do not want to mess up the (soon to be available) new release. > > It's a very small change, but let's see what Michiel says for the timing. > It is indeed a very small change, however it seems to have at least one prospective user ;). Also it is almost impossible to break anything by including it in the new release. > It might be nice to expose all the options to the end user, possibly as > handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds > as in Bio/Blast/NCBIStandalone.py blastall() etc. Good idea, I've included a new diff, which allows for passing any keys directly from the function call to the weblogo server such as: m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image BTW, it would be interesting to know if there are more people interested in using a better module for sequence motifs. I have some code lying around and some ideas on how it could be put together, but since there were no documented cases of anyone using Bio.AlignAce or Bio.MEME, I'm not sure if it's worth the extra work. -- cheers Bartek -------------- next part -------------- A non-text attachment was scrubbed...
Name: Motif.py.diff Type: text/x-patch Size: 2450 bytes Desc: not available URL: From biopython at maubp.freeserve.co.uk Tue Sep 18 08:12:02 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 Sep 2007 09:12:02 +0100 Subject: [BioPython] Removing Bio.FormatIO ? In-Reply-To: <46E8308F.6040709@maubp.freeserve.co.uk> References: <46E8308F.6040709@maubp.freeserve.co.uk> Message-ID: <46EF8852.6090000@maubp.freeserve.co.uk> Having looked at the Bio.FormatIO code in more detail, a simple deprecation warning isn't an option - it would get triggered whenever anyone used Bio.SeqRecord Would anyone object if we removed Bio.FormatIO (and its hooks in Bio/SeqRecord.py and Bio/Search.py) entirely for the next release? Speak now or forever hold your peace! ;) Peter Peter wrote: > With the release Biopython 1.43 and Bio.SeqIO earlier this year, would > anyone be upset if the older Bio.FormatIO module was marked as > deprecated for the next Biopython release? > > This module isn't mentioned in the tutorial/cookbook, but Brad did write > this entire document: > http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.pdf > http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html > > In addition to marking Bio.FormatIO as deprecated, I would probably add > a big disclaimer to that document, or re-write it to use Bio.SeqIO instead. 
> > Thanks > > Peter From biopython at maubp.freeserve.co.uk Tue Sep 18 08:51:26 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 18 Sep 2007 09:51:26 +0100 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> Message-ID: <46EF918E.90107@maubp.freeserve.co.uk> >> It might be nice to expose all the options to the end user, possibly as >> handled in the Bio/Blast/NCBIWWW.py qblast() function, or using **keywds >> as in Bio/Blast/NCBIStandalone.py blastall() etc. > > Good idea, I've included a new diff, which allows for passing any keys directly > from function call to the weblogo server such as: > > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image Does this let you do things like: m.weblogo("x.png", res=300) i.e. an integer, or do you have to use a string: m.weblogo("x.png", res="300") One way to "fix" this (if it is a problem) would be to do this: for k,v in kwds.items(): values[k]=str(v) rather than: for k,v in kwds.items(): values[k]=v Anyway, given we have at least ten days until the release (Michiel will be away - see his email on the developers list), and this is a little change, I would be happy for this to go into CVS now. 
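A minimal sketch of the str() suggestion above, written for current Python 3 rather than the Python 2 of this thread. It is not the code from the patch: build_form_values is a hypothetical helper, the "format" field is an invented default, and whether the weblogo server accepts any of these field names is not checked here.

```python
from urllib.parse import urlencode


def build_form_values(defaults, **kwds):
    """Merge caller options over defaults, storing every option as a string."""
    values = dict(defaults)
    for k, v in kwds.items():
        # str() lets callers pass res=300 or res="300" interchangeably.
        values[k] = str(v)
    return values


# "res" and "colorscheme" come from the thread; "format" is an assumption.
values = build_form_values({"format": "png"}, res=300, colorscheme="BW")
query = urlencode(values)
print(values)
print(query)
```

Python 3's urlencode would convert an integer by itself, so the explicit str() mainly keeps the stored values uniform for any code that inspects or compares them before the request is sent.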
Peter From bartek at rezolwenta.eu.org Tue Sep 18 13:12:41 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 18 Sep 2007 15:12:41 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <46EF918E.90107@maubp.freeserve.co.uk> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> <46EF918E.90107@maubp.freeserve.co.uk> Message-ID: <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> Peter wrote: > > > > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image > > Does this let you do things like: > > m.weblogo("x.png", res=300) > > i.e. an integer, or do you have to use a string: > > m.weblogo("x.png", res="300") > > One way to "fix" this (if it is a problem) would be to do this: > > for k,v in kwds.items(): > values[k]=str(v) > > rather than: > > for k,v in kwds.items(): > values[k]=v > > Anyway, given we have at least ten days until the release (Michiel will > be away - see his email on the developers list), and this is a little > change, I would be happy for this to go into CVS now. Thanks for another good idea. I submitted the code to CVS. 
-- cheers Bartek From dalloliogm at gmail.com Tue Sep 18 14:36:59 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 18 Sep 2007 16:36:59 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> <46EF918E.90107@maubp.freeserve.co.uk> <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> Message-ID: <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com> ok, thank you. so, let's see if I understand how to use it: from Bio.Seq import Seq from Bio.AlignAce.Motif import Motif m = Motif() m.add_instance(Seq('ACTG')) m.add_instance(Seq('ACCG')) m.add_instance(Seq('ACTC')) m.search_instance(Seq('ACACGACAACGTGTCGAT')) m.weblogo('/home/user/logo.png') Well, about refactoring.... honestly I think it would be a good idea. The problem is that for example, I have never used AlignAce and I don't know which kind of program is it... so I feel a bit confusing to import a module called like this. Anyway the Motif class seems useful, and I will use it in my program.. problably I will have to ask a few questions on it in the next days! :) 2007/9/18, bartek wilczynski : > Peter wrote: > > > > > > m.weblogo("x.png",colorscheme="BW") # brings you a monochrome logo image > > > > Does this let you do things like: > > > > m.weblogo("x.png", res=300) > > > > i.e. 
an integer, or do you have to use a string: > > > > m.weblogo("x.png", res="300") > > > > One way to "fix" this (if it is a problem) would be to do this: > > > > for k,v in kwds.items(): > > values[k]=str(v) > > > > rather than: > > > > for k,v in kwds.items(): > > values[k]=v > > > > Anyway, given we have at least ten days until the release (Michiel will > > be away - see his email on the developers list), and this is a little > > change, I would be happy for this to go into CVS now. > > Thanks for another good idea. I submitted the code to CVS. > > -- > cheers > Bartek > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From bartek at rezolwenta.eu.org Tue Sep 18 23:09:29 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Wed, 19 Sep 2007 01:09:29 +0200 Subject: [BioPython] sequence logo with biopython In-Reply-To: <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> <46EF918E.90107@maubp.freeserve.co.uk> <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com> Message-ID: <1190156969.46f05aa950c96@imp.rezolwenta.eu.org> Giovanni Marco Dall'Olio : > ok, thank you. 
> > so, let's see if I understand how to use it: > > from Bio.Seq import Seq > from Bio.AlignAce.Motif import Motif > > m = Motif() > m.add_instance(Seq('ACTG')) > m.add_instance(Seq('ACCG')) > m.add_instance(Seq('ACTC')) > > m.search_instance(Seq('ACACGACAACGTGTCGAT')) > > m.weblogo('/home/user/logo.png') > You got it mostly right. However, the .search_instance() and .search_pwm() methods return generators, so you should rather use: for pos,instance in m.search_instance(sequence): print "found %s at %d"%(instance,pos) > > Well, about refactoring.... honestly I think it would be a good idea. > The problem is that for example, I have never used AlignAce and I > don't know which kind of program is it... so I feel a bit confusing to > import a module called like this. The basic idea is to create a new Motif class aggregating the good parts of the AlignAce and MEME versions and modify these modules so they would use the new class. I'll try to look into that next week. I also have some code for reading modules from the JASPAR database and motif comparisons. I'll try to clean it up and submit it as well. Then we could try to come up with a section in the tutorial devoted to motif analysis. If you have anything you would consider useful in the Motif library, let me know. > Anyway the Motif class seems useful, and I will use it in my program.. > problably I will have to ask a few questions on it in the next days! > :) No problem, I'll do my best to answer your questions. However I'm leaving tomorrow for the CMSB conference, so I may be slow at responding to email this week.
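To illustrate the point above about generators: calling a generator function does no searching by itself, the work only happens when the result is iterated. The sketch below uses a hypothetical stand-in, find_instances, written in current Python 3; it is not the Motif class's own implementation, and the target sequence is invented so that it contains two hits.

```python
def find_instances(instances, sequence):
    """Yield (position, instance) for every exact occurrence in sequence."""
    width = len(instances[0])
    known = set(instances)
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if window in known:
            yield pos, window


motif_instances = ["ACTG", "ACCG", "ACTC"]

# Calling the function only creates a generator object; nothing is searched yet.
hits = find_instances(motif_instances, "TTACTGAAACCG")

# Iterating is what actually runs the search.
found = list(hits)
for pos, instance in found:
    print("found %s at %d" % (instance, pos))
```

A generator can be consumed only once, so wrap it in list() as above if the hits are needed more than one time.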
-- cheers Bartek From robert.campbell at queensu.ca Wed Sep 19 13:32:14 2007 From: robert.campbell at queensu.ca (Robert Campbell) Date: Wed, 19 Sep 2007 09:32:14 -0400 Subject: [BioPython] sequence logo with biopython In-Reply-To: <1190156969.46f05aa950c96@imp.rezolwenta.eu.org> References: <5aa3b3570709170239p360b0842y77406416c450e9fa@mail.gmail.com> <1190022552.46ee4d98a071d@imp.rezolwenta.eu.org> <1190036122.46ee829a34e8a@imp.rezolwenta.eu.org> <46EE8E2A.3080809@maubp.freeserve.co.uk> <1190066970.46eefb1a93134@imp.rezolwenta.eu.org> <46EF918E.90107@maubp.freeserve.co.uk> <1190121161.46efcec961c0e@imp.rezolwenta.eu.org> <5aa3b3570709180736h3ea93267p198c9b33de62ffa2@mail.gmail.com> <1190156969.46f05aa950c96@imp.rezolwenta.eu.org> Message-ID: <20070919093214.2c7567da@adelie.biochem.queensu.ca> On Wed, 19 Sep 2007 01:09:29 +0200, bartek wilczynski wrote: > Giovanni Marco Dall'Olio : > > > ok, thank you. > > > > so, let's see if I understand how to use it: > > > > from Bio.Seq import Seq > > from Bio.AlignAce.Motif import Motif > > > > m = Motif() > > m.add_instance(Seq('ACTG')) > > m.add_instance(Seq('ACCG')) > > m.add_instance(Seq('ACTC')) > > > > m.search_instance(Seq('ACACGACAACGTGTCGAT')) > > > > m.weblogo('/home/user/logo.png') > > > > You got it mostly right. However, the .search_instance() and .search_pwm() > methods return generators, so you should rather use: > > for pos,instance in m.search_instance(sequence): > print "found %s at %d"%(instance,pos) I believe that should be "m.search_instances(sequence)" not "m.search_instance(sequence)" (i.e. "instances", plural). Cheers, Rob -- Robert L. Campbell, Ph.D. 
Senior Research Associate/Adjunct Assistant Professor Botterell Hall Rm 644 Department of Biochemistry, Queen's University, Kingston, ON K7L 3N6 Canada Tel: 613-533-6821 Fax: 613-533-2497 http://pldserver1.biochem.queensu.ca/~rlc From meesters at uni-mainz.de Thu Sep 20 12:23:54 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Thu, 20 Sep 2007 14:23:54 +0200 Subject: [BioPython] feature request for Bio.PDB Message-ID: <1190291034.9570.28.camel@cmeesters> Hi, I think it would be good to have the option to retrieve the kind of atom added using a method of the atom-class, e.g. like: x = atom.get_kind() and x would then be 'H' or 'N' for instance. It is of course possible to retrieve this information via the atom id, but this requires employing a dictionary if one wants to know which type of atom this is. So, such a method would only be for convenience. It would be nice to see this in the upcoming release, but I fear it's too late for that; it would still be great if this idea were considered for some future release. Christian From anaryin at gmail.com Fri Sep 21 19:40:07 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Fri, 21 Sep 2007 20:40:07 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: References: Message-ID: Hello all! I'm writing a small script to fetch results from an NCBI database search using BioPython modules. However, I'd like to broaden my search and to have each page of the results displaying 500 results instead of the usual 20. Does anyone have any idea how to do this? Thanks!
João Rodrigues From anaryin at gmail.com Fri Sep 21 20:33:55 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Fri, 21 Sep 2007 21:33:55 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: <46F41FEA.5020205@maubp.freeserve.co.uk> References: <46F41FEA.5020205@maubp.freeserve.co.uk> Message-ID: Sure I can :) Must warn though, that I have 2 weeks of "python-ing" so the code *could* be clearer! Oh, and some of it is in Portuguese because it's for personal use..

# NCBI Retriever
import os
import sys

# What should I look for?
query = raw_input('Qual a expressao que deseja procurar?\n..: ')

# Where should I look for?
print 'Em qual das bases de dados deseja procurar?'
databases = {1: 'PubMed', 2: 'Nucleotide', 3: 'Protein',4:'Genome',5:'Structure'}
choice = raw_input('[1] PubMed\n[2] Nucleotide\n[3] Protein\n[4] Genome\n[5] Structure\n..: ')
if int(choice) not in databases.keys():
    print 'Escolha Inválida'
    sys.exit()
search_database = databases[int(choice)]

# Quit playing around, let's search!
from Bio.WWW import NCBI
search_command = 'Search'
results = NCBI.query(search_command , search_database, term = query, doptcmdl = 'FASTA')

# Where should I save the results?
import time
actual_date = str(time.localtime()[0])+str(time.localtime()[1])+str(time.localtime()[2])
results_file_name = os.path.join(os.getcwd(), str(query)+'_'+str(actual_date)+".txt")
results_file = open(results_file_name, 'w')
results_file.write(results.read())
results_file.close()

From biopython at maubp.freeserve.co.uk Fri Sep 21 21:40:15 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Sep 2007 22:40:15 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: References: <46F41FEA.5020205@maubp.freeserve.co.uk> Message-ID: <46F43A3F.9090008@maubp.freeserve.co.uk> João Rodrigues wrote: > Sure I can :) Must warn though, that I have 2 weeks of "python-ing" so the > code *could* be clearer!
Oh, and some of it is in Portuguese because it's > for personal use.. That's fine - the code and comments were in English. I see you are using Bio.WWW.NCBI as an interface to the Entrez query system. Somewhere on the NCBI website they have an answer to your question (how to specify the number of results per page): results = NCBI.query('Search', 'Protein', term='orchid', dispmax=23) Some pages mentioned retstart and retmax but that doesn't seem to work. You might also consider using Bio.EUtils instead - a python wrapper for the NCBI's E-Utils interface. Peter From biopython at maubp.freeserve.co.uk Fri Sep 21 22:00:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Sep 2007 23:00:58 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: References: <46F41FEA.5020205@maubp.freeserve.co.uk> Message-ID: <46F43F1A.9040309@maubp.freeserve.co.uk> Hi again João, I was thinking about your example code, and while I'm not sure exactly what you want to be able to do in python, you might want to look at the search_for() function in Bio.PubMed and Bio.GenBank (which uses EUtils internally), and then the download_many() or dictionary interfaces. This is covered in the Biopython tutorial. I'm not sure if we have a front end for the structure database at the moment. This may be more helpful than working with Entrez directly. Peter From anaryin at gmail.com Fri Sep 21 22:57:20 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Fri, 21 Sep 2007 23:57:20 +0100 Subject: [BioPython] More results at NCBI Search In-Reply-To: <46F43F1A.9040309@maubp.freeserve.co.uk> References: <46F41FEA.5020205@maubp.freeserve.co.uk> <46F43F1A.9040309@maubp.freeserve.co.uk> Message-ID: Thank you for the tip, it worked perfectly. Well, to be honest, I'm just practicing BioPython and Python skills.
What I'm trying to do is a simple script that searches for *something* in PubMed, gets the results page and parses that page so that I can give the user, that is, myself at the moment :) , a txt file with this format:
----
TITLE:
AUTHOR:
YEAR:
JOURNAL: (optional actually)
ABSTRACT:
LINK:
RELATED LINKS:
----
It is probably already made and in a more useful way than mine but, as I do need to practice, it's a start! Again, thanks for the tips. I'll look into those Bio.PubMed and Bio.GenBank. From anaryin at gmail.com Mon Sep 24 16:13:33 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Mon, 24 Sep 2007 17:13:33 +0100 Subject: [BioPython] Configuring Proxy for certain Modules Message-ID: Hello! I am working in a University whose network is proxied. I can't work with any of the BioPython modules that require access to the Internet (e.g. Bio.WWW). How can I configure them manually to override the proxy? I already read about configuring the urllib to use a proxy, but I can't figure out where to find the string that handles the connection. João Rodrigues From biopython at maubp.freeserve.co.uk Mon Sep 24 16:58:56 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 24 Sep 2007 17:58:56 +0100 Subject: [BioPython] Configuring Proxy for certain Modules In-Reply-To: References: Message-ID: <46F7ECD0.8020001@maubp.freeserve.co.uk> João Rodrigues wrote: > Hello! > > I am working in a University whose network is proxied. I can't work > with any of the BioPython modules that require access to the Internet > (e.g. Bio.WWW). How can I configure them manually to override the > proxy? I already read about configuring the urllib to use a proxy, > but I can't figure out where to find the string that handles the > connection.
Bio.WWW uses urllib, so the simplest answer is to follow the advice in http://docs.python.org/lib/module-urllib.html Specifically on Windows you probably just need to set the http_proxy environment variable before starting Python, or configure the proxy in the internet settings (via Internet Explorer I assume). I think it would be easiest to set this environment variable once by hand, but you could set it at run time as part of your python script. You'll have to consult your University's network documentation to determine the string to use for the http_proxy environment variable, but it would look something like "http://www.someproxy.com:3128" (i.e. address:port number). The alternative is to pass the "proxies" option to urllib.urlopen(), but this would require multiple changes in Bio.WWW to support. Note that urllib does not currently support proxies which require authentication. Peter From biopython at maubp.freeserve.co.uk Mon Sep 24 21:47:13 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 24 Sep 2007 22:47:13 +0100 Subject: [BioPython] poor man's databases for large sequence files Message-ID: <46F83061.3090207@maubp.freeserve.co.uk> I've been thinking about extending Bio.SeqIO to support a (read only) dictionary like interface for large sequence files (WITHOUT having everything in memory). Some of the older Biopython sequence format specific modules have an index_file function and matching Dictionary class to do this (based internally on either Martel/Mindy or a DIY Biopython indexer based on pickle). When thinking about a format agnostic SeqRecord dictionary, the "Shelf" object from python's built in "shelve" library looks like a good choice. I could add a Bio.SeqIO.to_shelf() function similar to the existing Bio.SeqIO.to_dict() function. The only downside I've thought of so far is updating a shelf database, something supported by shelve but with a few gotchas when dealing with non-trivial datatypes (like dictionaries).
The need I am thinking about addressing is a little less flexible - read only low-memory access to a large collection of SeqRecords (typically from a large sequence file). Does anyone already use python's shelve library with sequence data? Peter From anaryin at gmail.com Mon Sep 24 23:11:57 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Tue, 25 Sep 2007 00:11:57 +0100 Subject: [BioPython] Configuring Proxy for certain Modules In-Reply-To: <46F7ECD0.8020001@maubp.freeserve.co.uk> References: <46F7ECD0.8020001@maubp.freeserve.co.uk> Message-ID: Again, thank you for the kind answer! I had in fact read about the urllib module and that was how I "discovered" that I could configure the proxy "by hand". If I set it automatically at the IE, or firefox, it won't work on Python, but it will on the browser. As for the http_proxy env variable, how do I set them? From sdavis2 at mail.nih.gov Tue Sep 25 01:40:21 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 24 Sep 2007 21:40:21 -0400 Subject: [BioPython] poor man's databases for large sequence files In-Reply-To: <46F83061.3090207@maubp.freeserve.co.uk> References: <46F83061.3090207@maubp.freeserve.co.uk> Message-ID: <46F86705.1090109@mail.nih.gov> Peter wrote: > I've been thinking about extending Bio.SeqIO to support a (read only) > dictionary like interface for large sequence files (WITHOUT having > everything in memory). > > Some of the older Biopython sequence format specific modules have an > index_file function and matching Dictionary class to do this (based > internally on either Martel/Mindy or a DIY Biopython indexer based on > pickle). > > When thinking about a format agnostic SeqRecord dictionary, the built in > python "Shelf" object from python's built in "shelve library" looks like > a good choice. I could add a Bio.SeqIO.to_shelf() function similar to > the existing Bio.SeqIO.to_dict() function. 
> > The only downside I've thought of so far is updating a shelf database, > something supported by shelve but with a few gotchas when dealing with > non-trivial datatypes (like dictionaries). The need I am thinking about > addressing is a little less flexible - read only low-memory access to a > large collection of SeqRecords (typically from a large sequence file). > > Does anyone already use python's shelve library with sequence data? > Just a curiosity, Peter, but would this extension deal with small collections of large sequences (finished genomes, for example)? Sean From biopython at maubp.freeserve.co.uk Tue Sep 25 08:14:50 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 25 Sep 2007 09:14:50 +0100 Subject: [BioPython] poor man's databases for large sequence files In-Reply-To: <46F86705.1090109@mail.nih.gov> References: <46F83061.3090207@maubp.freeserve.co.uk> <46F86705.1090109@mail.nih.gov> Message-ID: <46F8C37A.1000005@maubp.freeserve.co.uk> Sean Davis wrote: > Peter wrote: >> I've been thinking about extending Bio.SeqIO to support a (read only) >> dictionary like interface for large sequence files (WITHOUT having >> everything in memory). >> >> ... >> >> Does anyone already use python's shelve library with sequence data? >> > > Just a curiosity, Peter, but would this extension deal with small > collections of large sequences (finished genomes, for example)? > Hi Sean, What I had in mind was say indexing all of UniProt which is currently 1.1 GB in the SwissProt flat file format, but each record is pretty small. However, in theory this (largely unwritten) code could be used on any number of any sized records - but you would need enough ram to hold any one record in memory at once, plus some more RAM for the hopefully modest database overhead, python, your script etc. I suppose having all the chromosomes for a given Eukaryote (e.g. mouse or fruit fly) would also be a sensible examples; having tens of records where each is tens of MB in size. 
Is that the sort of thing you had in mind Sean? Peter From sdavis2 at mail.nih.gov Tue Sep 25 11:41:25 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 25 Sep 2007 07:41:25 -0400 Subject: [BioPython] poor man's databases for large sequence files In-Reply-To: <46F8C37A.1000005@maubp.freeserve.co.uk> References: <46F83061.3090207@maubp.freeserve.co.uk> <46F86705.1090109@mail.nih.gov> <46F8C37A.1000005@maubp.freeserve.co.uk> Message-ID: <46F8F3E5.5020802@mail.nih.gov> Peter wrote: > Sean Davis wrote: >> Peter wrote: >>> I've been thinking about extending Bio.SeqIO to support a (read only) >>> dictionary like interface for large sequence files (WITHOUT having >>> everything in memory). >>> >>> ... >>> >>> Does anyone already use python's shelve library with sequence data? >>> >> >> Just a curiosity, Peter, but would this extension deal with small >> collections of large sequences (finished genomes, for example)? > > Hi Sean, > > What I had in mind was say indexing all of UniProt which is currently > 1.1 GB in the SwissProt flat file format, but each record is pretty small. > > However, in theory this (largely unwritten) code could be used on any > number of any sized records - but you would need enough ram to hold any > one record in memory at once, plus some more RAM for the hopefully > modest database overhead, python, your script etc. > > I suppose having all the chromosomes for a given Eukaryote (e.g. mouse > or fruit fly) would also be a sensible examples; having tens of records > where each is tens of MB in size. Is that the sort of thing you had in > mind Sean? Yes. Lincoln Stein wrote some indexing stuff in perl that allows essentially random access to sequence records as well as subsets of individual records. It makes it possible to do range queries on individual sequences with very modest memory; with a larger memory machine, one might imagine that this would result in very fast queries as the files get cached. 
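The low-memory random access discussed in this thread can be sketched with a plain byte-offset index: scan the file once, remember where each record starts, then seek straight to a record when it is requested. This is only an illustration in current Python 3 for FASTA-style files; it is neither the proposed Bio.SeqIO code nor the perl indexer mentioned above, and build_offset_index and fetch_record are hypothetical names.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each record name to the byte offset of its '>' header line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Use the first word after '>' as the record name.
                name = line[1:].split()[0].decode("ascii")
                index[name] = offset
    return index


def fetch_record(path, index, name):
    """Read one record from disk by seeking to its stored offset."""
    lines = []
    with open(path, "rb") as handle:
        handle.seek(index[name])
        lines.append(handle.readline())  # the header line
        for line in handle:
            if line.startswith(b">"):    # start of the next record
                break
            lines.append(line)
    return b"".join(lines).decode("ascii")


# Small demonstration on a temporary file with two made-up records.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = os.path.join(tmp, "demo.fasta")
    with open(fasta_path, "wb") as out:
        out.write(b">seq1 first\nACGT\nACGT\n>seq2 second\nTTGA\n")
    index = build_offset_index(fasta_path)
    record = fetch_record(fasta_path, index, "seq2")

print(index)
print(record)
```

Only the name-to-offset dictionary stays in memory, so the cost grows with the number of records rather than their size. Range queries inside a single large record would additionally need the line length, which this sketch does not store.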

Sean

From ytu888 at hotmail.com Fri Sep 28 11:40:09 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 06:40:09 -0500
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
Message-ID: 

I'm a newbie for the Biopython, and want to install it on my Mac OS X
computer. I got the similar error messages on command line when install
Python2.5, but finally I did that using the python-2.5.1-macosx.dmg.
When I tried to install mxTextTools and got the following messages:
mxDateTime.c is missing. Where to find the file? Please help me to solve
the problem and thank you very much.

LeesComputer:/Users/Python_Bio/egenix-mx-base-3.0.0.macosx-10.3-fat-py2.5_ucs4.prebuilt Lee$ sudo python setup.py build
running build
running mx_autoconf
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -D_GNU_SOURCE=1 -I/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3 -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.3/include -c _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
macros to define: []
macros to undefine: []
running build_ext
building extension "mx.DateTime.mxDateTime.mxDateTime" (required)
building 'mx.DateTime.mxDateTime.mxDateTime' extension
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -DUSE_FAST_GETCURRENTTIME -Imx/DateTime/mxDateTime -I/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3 -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.3/include -c mx/DateTime/mxDateTime/mxDateTime.c -o build/temp.darwin-8.10.1-i386-2.3_ucs2/mx-DateTime-mxDateTime-mxDateTime/mx/DateTime/mxDateTime/mxDateTime.o
i686-apple-darwin8-gcc-4.0.1: mx/DateTime/mxDateTime/mxDateTime.c: No such file or directory
i686-apple-darwin8-gcc-4.0.1: no input files
error: command 'gcc' failed with exit status 1

From biopython at maubp.freeserve.co.uk Fri Sep 28 12:27:17 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 13:27:17 +0100
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
In-Reply-To: 
References: 
Message-ID: <46FCF325.4040002@maubp.freeserve.co.uk>

Y Tu wrote:
> I'm a newbie for the Biopython, and want to install it on my Mac OS X
> computer. I got the similar error messages on command line when
> install Python2.5, but finally I did that using the
> python-2.5.1-macosx.dmg. When I tried to install mxTextTools and got
> the following messages: mxDateTime.c is missing. Where to find the
> file? Please help me to solve the problem and thank you very much.

It sounds like you don't want to use the default Apple provided python -
I have the impression that this can make life more complicated.
I'm not a Mac user, but Michiel is, and he may be able to help. He has
been away recently but should be back soon.

In terms of installing mxTextTools, you may get more support on the
egenix mailing list. However, there are currently some issues with
Biopython and egenix mxTextTools 3.0, so if you can find it I would
suggest using version 2.0 instead.

We hope to release Biopython 1.44 in October, which will address most of
the mxTextTools issues. That said, the majority of Biopython 1.43
will still work even with mxTextTools 3.0.

Peter

From ytu888 at hotmail.com Fri Sep 28 13:22:43 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 08:22:43 -0500
Subject: [BioPython] Error for installation of mxTextTools on Mac OS X
In-Reply-To: <46FCF325.4040002@maubp.freeserve.co.uk>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
Message-ID: 

The one coming with Mac OS X is an old version. Therefore I installed
the new one 2.5.1 and it succeeded. Then came the problem with
mxTextTools. I just did the installation of Numerical and it worked.

> Date: Fri, 28 Sep 2007 13:27:17 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com; biopython at lists.open-bio.org
> Subject: Re: [BioPython] Error for installation of mxTextTools on Mac OS X
>
> Y Tu wrote:
> > I'm a newbie for the Biopython, and want to install it on my Mac OS X
> > computer. I got the similar error messages on command line when
> > install Python2.5, but finally I did that using the
> > python-2.5.1-macosx.dmg. When I tried to install mxTextTools and got
> > the following messages: mxDateTime.c is missing. Where to find the
> > file? Please help me to solve the problem and thank you very much.
>
> It sounds like you don't want to use the default Apple provided python -
> I have the impression that this can make life more complicated. I'm not
> a Mac user, but Michiel is, and he may be able to help. He has been
> away recently but should be back soon.
>
> In terms of installing mxTextTools, you may get more support on the
> egenix mailing list. However, there are currently some issues with
> Biopython and egenix mxTextTools 3.0, so if you can find it I would
> suggest using version 2.0 instead.
>
> We hope to release Biopython 1.44 in October, which will address most of
> the mxTextTools issues. That said, the majority of Biopython 1.43
> will still work even with mxTextTools 3.0
>
> Peter
>

From ytu888 at hotmail.com Fri Sep 28 15:26:11 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 10:26:11 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FCF325.4040002@maubp.freeserve.co.uk>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
Message-ID: 

I just installed ReportLab on Mac OS X and the test with command "from
reportlab.graphics import renderPDF" succeeded. However, when I run the
test script (reportlab/test/test_pdfgen_general.py), I got the following
error. How to fix the problem. Another question is how to run the script
under the python prompt (>>>) after importing the script by "import
test_pdfgen_general.py". Thank you very much.

nypivs-lee:/Applications/MacPython 2.5/reportlab/test lee$ python test_pdfgen_general.py
E
======================================================================
ERROR: Make a PDFgen document with most graphics features
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_pdfgen_general.py", line 833, in test0
    run(outputfile('test_pdfgen_general.pdf'))
  File "test_pdfgen_general.py", line 796, in run
    c = makeDocument(filename)
  File "test_pdfgen_general.py", line 725, in makeDocument
    c.drawImage(tgif, 4*inch, 9.25*inch, w, h, mask='auto')
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfgen/canvas.py", line 629, in drawImage
    imgObj = pdfdoc.PDFImageXObject(name, image, mask=mask)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfdoc.py", line 1840, in __init__
    self.loadImageFromA85(src)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfdoc.py", line 1846, in loadImageFromA85
    imagedata = map(string.strip,pdfutils.makeA85Image(source,IMG=IMG))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/pdfbase/pdfutils.py", line 35, in makeA85Image
    raw = img.getRGBData()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/reportlab/lib/utils.py", line 612, in getRGBData
    self._data = im.tostring()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 513, in tostring
    self.load()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
    d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
    raise IOError("decoder %s not available" % decoder_name)
IOError: decoder jpeg not available

----------------------------------------------------------------------
Ran 1 test in 0.321s

FAILED (errors=1)

From biopython at maubp.freeserve.co.uk Fri Sep 28 16:28:28 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 17:28:28 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: 
References: <46FCF325.4040002@maubp.freeserve.co.uk>
Message-ID: <46FD2BAC.80401@maubp.freeserve.co.uk>

Y Tu wrote:
> I just installed ReportLab on Mac OS X and the test with command
> "from reportlab.graphics import renderPDF" succeeded. However, when I
> run the test script (reportlab/test/test_pdfgen_general.py), I got the
> following error. How to fix the problem.

I would guess you have not installed PIL, the Python Imaging Library,
which ReportLab uses.

> Another question is how to
> run the script under the python prompt (>>>) after importing the
> script by "import test_pdfgen_general.py". Thank you very much.

To run a python script, like "test_pdfgen_general.py", at the command
line type:

python test_pdfgen_general.py

(assuming python is on the path, and test_pdfgen_general.py is in the
current directory)

In general there are two sorts of python files, scripts which you run
(like test_pdfgen_general.py) and library modules you import.

Peter

From ytu888 at hotmail.com Fri Sep 28 19:18:06 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 28 Sep 2007 14:18:06 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FD2BAC.80401@maubp.freeserve.co.uk>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
Message-ID: 

Thank you, Peter for the prompt answer.

I did install the PIL already and tested with the commands "from PIL
import Image", then "import _imaging". Both commands succeeded. That's
why I don't understand why the test won't work. I used the command
"python test_pdfgen_general.py" under the shell prompt, which generated
the error. Since I installed PIL and succeeded in importing the module
of PIL, I thought maybe I can solve the problem by running the test
under Python. However, after importing the test into Python, I don't
know how to launch the test under the python prompt (>>>). That's why I
asked the second question.

Once again, thank you very much for help.

> Date: Fri, 28 Sep 2007 17:28:28 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com; biopython at lists.open-bio.org
> Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X
>
> Y Tu wrote:
> > I just installed ReportLab on Mac OS X and the test with command
> > "from reportlab.graphics import renderPDF" succeeded. However, when I
> > run the test script (reportlab/test/test_pdfgen_general.py), I got the
> > following error. How to fix the problem.
>
> I would guess you have not installed PIL, the Python Imaging Library,
> which ReportLab uses.
>
> > Another question is how to
> > run the script under the python prompt (>>>) after importing the
> > script by "import test_pdfgen_general.py". Thank you very much.
>
> To run a python script, like "test_pdfgen_general.py", at the command
> line type:
>
> python test_pdfgen_general.py
>
> (assuming python is on the path, and test_pdfgen_general.py is in the
> current directory)
>
> In general there are two sorts of python files, scripts which you run
> (like test_pdfgen_general.py) and library modules you import.
>
> Peter
>

From biopython at maubp.freeserve.co.uk Fri Sep 28 19:42:31 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Sep 2007 20:42:31 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: 
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
Message-ID: <46FD5927.3000207@maubp.freeserve.co.uk>

Y Tu wrote:
> Thank you, Peter for the prompt answer.
>
> I did install the PIL already and tested with the commands "from PIL
> import Image", then "import _imaging". Both commands succeeded.
> That's why I don't understand why the test won't work. I used the
> command "python test_pdfgen_general.py" under the shell prompt, which
> generated the error. Since I installed PIL and succeeded in importing
> the module of PIL, I thought maybe I can solve the problem by running
> the test under Python.

Looking in more detail at the original stack trace,

> File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
>   d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
> File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
>   raise IOError("decoder %s not available" % decoder_name)
> IOError: decoder jpeg not available

It's possible that PIL needs some optional JPEG library, which ReportLab
wants to use. I suggest you search the ReportLab website & user's
mailing list, and if you can't work out what is wrong sign up to their
mailing list and ask them, http://www.reportlab.org/

Very little of Biopython needs ReportLab; you should be able to install
Biopython without it.

Peter
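
A minimal sketch of how the "decoder jpeg not available" diagnosis in
this thread could be confirmed from Python. It assumes the modern Pillow
fork of PIL, whose PIL.features module was not part of the 2007 PIL
release discussed here; jpeg_support is a hypothetical helper name and
belongs to neither PIL nor ReportLab.

```python
def jpeg_support():
    """Report whether the installed Pillow build can decode JPEG files.

    Returns True or False when Pillow is importable, and None when it is
    not installed (or is too old to provide PIL.features).
    """
    try:
        # PIL.features is an assumption: it exists in Pillow, not in
        # the original PIL 1.1.x series.
        from PIL import features
    except ImportError:
        return None
    # "jpg" is the codec name Pillow uses for its libjpeg bindings.
    return features.check_codec("jpg")


if __name__ == "__main__":
    status = jpeg_support()
    if status is None:
        print("Pillow (PIL.features) is not available")
    elif status:
        print("JPEG decoder is available")
    else:
        print("JPEG decoder is missing")
```

If the helper reports False, PIL was most likely built without the
libjpeg library, which would explain why "from PIL import Image"
succeeds while loading a JPEG image still fails.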