From mike.thon at gmail.com Mon Nov 1 01:41:41 2010
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 1 Nov 2010 06:41:41 +0100
Subject: [Biopython] getting the parent of a Clade
In-Reply-To:
References:
Message-ID:

On Oct 31, 2010, at 8:23 PM, Eric Talevich wrote:

>
> On Sun, Oct 31, 2010 at 1:57 PM, Eric Talevich wrote:
> On Sun, Oct 31, 2010 at 12:03 PM, Michael Thon wrote:
> I have a Clade object and I need to access its parent clade. I thought that clade.root should do this but this seems to contain a reference to itself:
>
> (Pdb) main_clade == main_clade.root
> True
>
> Is there some other way?
> Thanks
> Mike
>
> Hi Mike,
>
> You can do this, assuming you have the original tree object (call it "tree"):
>
> parent = tree.get_path(main_clade)[-2]
>
> This is an O(n) operation on the tree, so if you need to do it repeatedly on a large tree, it's faster to call tree.get_path(clade) once outside the loop and then reuse the resulting list.
>
> Is the operation you're doing here part of something you'd like to see implemented as a tree method?
>

Maybe - it seems to me that if I can access children of a clade from the clade, then I should also be able to go the other way and access the parent. I don't know how often people would need this functionality though.

Does a Clade contain a reference to its tree? I have a recursive function that does some crunching on a Clade and then recursively processes the child clades. I could pass in the tree object as well, but I figure that a Clade must know about its tree so there should be some way to access it.
Thanks
Mike

> I added a cookbook entry on the Biopython wiki for this problem:
> http://biopython.org/wiki/Phylo_cookbook#Get_the_parent_of_a_clade
>
> Cheers,
> Eric

From biopython at maubp.freeserve.co.uk Mon Nov 1 07:34:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 1 Nov 2010 11:34:32 +0000
Subject: [Biopython] Entrez.efetch problem when querying pccompound database
In-Reply-To:
References:
Message-ID:

On Mon, Nov 1, 2010 at 11:24 AM, saikari keitele wrote:
> Many thanks for your reply.
> Does that mean that pccompound and pcassay databases can not be queried
> programmatically, they just have to be queried manually?
> Thanks again

Please ask the NCBI about this, and let us know what they say.

Thank you,

Peter

From saikari78 at gmail.com Mon Nov 1 07:24:48 2010
From: saikari78 at gmail.com (saikari keitele)
Date: Mon, 1 Nov 2010 11:24:48 +0000
Subject: [Biopython] Entrez.efetch problem when querying pccompound database
In-Reply-To:
References:
Message-ID:

Many thanks for your reply.
Does that mean that pccompound and pcassay databases can not be queried
programmatically, they just have to be queried manually?
Thanks again

On Fri, Oct 29, 2010 at 1:13 PM, Peter wrote:

> On Fri, Oct 29, 2010 at 12:26 PM, saikari keitele
> wrote:
> > Hi,
> >
> > I'm using BioPython to query the NCBI pccompound database.
> > I'm trying to retrieve the molecular weight of a compound given its
> > InChIKey.
> > Getting the ID of the compound with esearch works fine. For instance:
> >
> > Entrez.esearch(db="pccompound",
> > term='"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]')
> >
> > However, when I try to retrieve the record's content with efetch from the ID
> > returned by esearch, like this:
> >
> > Entrez.efetch(db="pcassay", id="2244")
> >
> > I get the following response:
> > ...
> > Error occurred: Report 'ASN1' not found in 'pccompound' presentation
> > ...
> >
> > I've tried to use other return types and return modes, like for instance
> >
> > Entrez.efetch(db="pcassay", id="2244", rettype="abstract", retmode="text")
> >
> > but I have not succeeded in retrieving this compound's record's content.
> > Many thanks for any help on how to retrieve information on a compound from
> > pccompound.
> >
> > Best wishes
> >
> > Saikari
>
> If you go to the webpage for this,
> http://www.ncbi.nlm.nih.gov/pcassay?term=2244
> then you don't actually get any download links - rather it connects to the
> BioAssay server to retrieve data. My guess is the NCBI don't support
> efetch for the pcassay database - you'll have to email them and ask.
>
> Peter
>

From cjfields at illinois.edu Mon Nov 1 10:50:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Nov 2010 09:50:59 -0500
Subject: [Biopython] Entrez.efetch problem when querying pccompound database
In-Reply-To:
References:
Message-ID: <9B3549C1-7BC8-4D2D-BB6C-751C282AA8A1@illinois.edu>

Try using esummary instead of efetch to get that information programmatically. Some database information can't be retrieved via efetch (I think pcassay/pccompound are two of those), but the summary of the information for any database is retrievable.

Using the BioPerl eutil interface, one does this to just dump the returned information. One can also get at the various bits of that data programmatically as well using generic constructs, but you have to know the tag names for the data you are looking for. There should be an analogous Biopython way to do this.
===================================

use Bio::DB::EUtilities;

my $term = '"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]';

my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -term       => $term,
                                     -email      => 'cjfields at bioperl.org',
                                     -db         => 'pccompound',
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die "Can't retrieve history data";

$eutil->set_parameters(-eutil   => 'esummary',
                       -history => $hist);

$eutil->print_all;

===================================

chris

On Nov 1, 2010, at 6:24 AM, saikari keitele wrote:

> Many thanks for your reply.
> Does that mean that pccompound and pcassay databases can not be queried
> programmatically, they just have to be queried manually?
> Thanks again
>
> On Fri, Oct 29, 2010 at 1:13 PM, Peter wrote:
>
>> On Fri, Oct 29, 2010 at 12:26 PM, saikari keitele
>> wrote:
>>> Hi,
>>>
>>> I'm using BioPython to query the NCBI pccompound database.
>>> I'm trying to retrieve the molecular weight of a compound given its
>>> InChIKey.
>>> Getting the ID of the compound with esearch works fine. For instance:
>>>
>>> Entrez.esearch(db="pccompound",
>>> term='"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]')
>>>
>>> However, when I try to retrieve the record's content with efetch from the ID
>>> returned by esearch, like this:
>>>
>>> Entrez.efetch(db="pcassay", id="2244")
>>>
>>> I get the following response:
>>> ...
>>> Error occurred: Report 'ASN1' not found in 'pccompound' presentation
>>> ...
>>>
>>> I've tried to use other return types and return modes, like for instance
>>>
>>> Entrez.efetch(db="pcassay", id="2244", rettype="abstract", retmode="text")
>>>
>>> but I have not succeeded in retrieving this compound's record's content.
>>> Many thanks for any help on how to retrieve information on a compound from
>>> pccompound.
>>>
>>> Best wishes
>>>
>>> Saikari
>>
>> If you go to the webpage for this,
>> http://www.ncbi.nlm.nih.gov/pcassay?term=2244
>> then you don't actually get any download links - rather it connects to the
>> BioAssay server to retrieve data. My guess is the NCBI don't support
>> efetch for the pcassay database - you'll have to email them and ask.
>>
>> Peter
>>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From saikari78 at gmail.com Mon Nov 1 11:53:45 2010
From: saikari78 at gmail.com (saikari keitele)
Date: Mon, 1 Nov 2010 15:53:45 +0000
Subject: [Biopython] Entrez.efetch problem when querying pccompound database
In-Reply-To: <9B3549C1-7BC8-4D2D-BB6C-751C282AA8A1@illinois.edu>
References: <9B3549C1-7BC8-4D2D-BB6C-751C282AA8A1@illinois.edu>
Message-ID:

Many thanks!
By using esummary as you suggest I can retrieve all the information from pccompound and pcassay with BioPython.
For instance, for retrieving the molecular weight of a compound given its InChIKey:

from Bio import Entrez

# Find the compound ID from its InChIKey, then read its summary record.
handle = Entrez.esearch(db="pccompound", term='"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]')
records = Entrez.read(handle)
molWeight = Entrez.read(Entrez.esummary(db="pccompound", id=records["IdList"][0]))[0]['MolecularWeight']

Thanks again.

On Mon, Nov 1, 2010 at 2:50 PM, Chris Fields wrote:

> Try using esummary instead of efetch to get that information
> programmatically. Some database information can't be retrieved via efetch
> (I think pcassay/pccompound are two of those), but the summary of the
> information for any database is retrievable.
>
> Using the BioPerl eutil interface, one does this to just dump the returned
> information. One can also get at the various bits of that data
> programmatically as well using generic constructs, but you have to know the
> tag names for the data you are looking for. There should be an analogous
> Biopython way to do this.
>
> ===================================
>
> use Bio::DB::EUtilities;
>
> my $term = '"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]';
>
> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
>     -term => $term,
>     -email => 'cjfields at bioperl.org',
>     -db => 'pccompound',
>     -usehistory => 'y');
>
> my $hist = $eutil->next_History || die "Can't retrieve history data";
>
> $eutil->set_parameters(-eutil => 'esummary',
>     -history => $hist);
>
> $eutil->print_all;
>
> ===================================
>
> chris
>
> On Nov 1, 2010, at 6:24 AM, saikari keitele wrote:
>
> > Many thanks for your reply.
> > Does that mean that pccompound and pcassay databases can not be queried
> > programmatically, they just have to be queried manually?
> > Thanks again
> >
> > On Fri, Oct 29, 2010 at 1:13 PM, Peter wrote:
> >
> >> On Fri, Oct 29, 2010 at 12:26 PM, saikari keitele
> >> wrote:
> >>> Hi,
> >>>
> >>> I'm using BioPython to query the NCBI pccompound database.
> >>> I'm trying to retrieve the molecular weight of a compound given its
> >>> InChIKey.
> >>> Getting the ID of the compound with esearch works fine. For instance:
> >>>
> >>> Entrez.esearch(db="pccompound",
> >>> term='"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]')
> >>>
> >>> However, when I try to retrieve the record's content with efetch from the ID
> >>> returned by esearch, like this:
> >>>
> >>> Entrez.efetch(db="pcassay", id="2244")
> >>>
> >>> I get the following response:
> >>> ...
> >>> Error occurred: Report 'ASN1' not found in 'pccompound' presentation
> >>> ...
> >>>
> >>> I've tried to use other return types and return modes, like for instance
> >>>
> >>> Entrez.efetch(db="pcassay", id="2244", rettype="abstract", retmode="text")
> >>>
> >>> but I have not succeeded in retrieving this compound's record's content.
> >>> Many thanks for any help on how to retrieve information on a compound from
> >>> pccompound.
> >>>
> >>> Best wishes
> >>>
> >>> Saikari
> >>
> >> If you go to the webpage for this,
> >> http://www.ncbi.nlm.nih.gov/pcassay?term=2244
> >> then you don't actually get any download links - rather it connects to the
> >> BioAssay server to retrieve data. My guess is the NCBI don't support
> >> efetch for the pcassay database - you'll have to email them and ask.
> >>
> >> Peter
> >>
> > _______________________________________________
> > Biopython mailing list - Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
> >

From eric.talevich at gmail.com Mon Nov 1 21:20:30 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 1 Nov 2010 21:20:30 -0400
Subject: [Biopython] getting the parent of a Clade
In-Reply-To:
References:
Message-ID:

On Mon, Nov 1, 2010 at 1:41 AM, Michael Thon wrote:

>
> On Oct 31, 2010, at 8:23 PM, Eric Talevich wrote:
> >
> > Is the operation you're doing here part of something you'd like to see
> > implemented as a tree method?
> >
>
> Maybe - it seems to me that if I can access children of a clade from the
> clade, then I should also be able to go the other way and access the parent.
> I don't know how often people would need this functionality though.
>
> Does a Clade contain a reference to its tree? I have a recursive function
> that does some crunching on a Clade and then recursively processes the child
> clades. I could pass in the tree object as well, but I figure that a Clade
> must know about its tree so there should be some way to access it.
>

PyCogent does work that way, but Bio.Phylo's data structure is simpler -- a Tree has a single root Clade (tree.root or tree.clade), and each Clade has a plain Python list of child Clades (clade.clades), all the way down. It doesn't track any references to the parent or the original tree, so the tree can never have an inconsistent internal state... because there is no internal state.
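
[Editorial sketch] Because each clade only holds a list of its children, a child-to-parent lookup has to be built by walking the tree once. The following is a minimal sketch of that idea, in the spirit of the cookbook entry linked earlier in this thread. The Clade class here is a hypothetical stand-in that only mimics the clades list described above, so the example runs without Biopython; with a real Bio.Phylo tree one would presumably iterate with tree.find_clades() instead of the hand-written stack.

```python
class Clade:
    """Hypothetical stand-in for a Bio.Phylo clade: a name plus child clades."""

    def __init__(self, name, clades=None):
        self.name = name
        self.clades = clades or []


def all_parents(root):
    """Map every non-root clade to its parent in a single O(n) traversal."""
    parents = {}
    stack = [root]
    while stack:
        clade = stack.pop()
        for child in clade.clades:
            parents[child] = clade
            stack.append(child)
    return parents


# Tiny example tree: root -> (A, inner -> (B, C))
b, c = Clade("B"), Clade("C")
inner = Clade("inner", [b, c])
a = Clade("A")
root = Clade("root", [a, inner])

parents = all_parents(root)
for clade in (a, inner, b, c):
    print(clade.name, "->", parents[clade].name)
```

Note that the root never appears as a key, so code that looks up every clade should use parents.get(clade) (or skip the root) rather than parents[clade].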
I haven't needed the parent references so far for the Tree/Clade methods or my own scripts, surprisingly. Calling get_path once or twice has been enough. (I could probably speed up common_ancestor by using the all_parents dictionary approach in the cookbook, at the expense of memory.)

Do you or anyone else want to try plugging that all_parents function into your code to see if it helps significantly? If it does, I could add it as a Tree/Clade method in the next Biopython release.

Best,
Eric

From mike.thon at gmail.com Tue Nov 2 05:58:46 2010
From: mike.thon at gmail.com (Michael Thon)
Date: Tue, 2 Nov 2010 10:58:46 +0100
Subject: [Biopython] getting the parent of a Clade
In-Reply-To:
References:
Message-ID: <4E425445-5A7E-4FFB-B7FF-82ED243AACB4@gmail.com>

Hi Eric

>
> Do you or anyone else want to try plugging that all_parents function into your code to see if it helps significantly? If it does, I could add it as a Tree/Clade method in the next Biopython release.
>

I can try it - I have a few 1000 trees to parse so any differences in
performance should be more obvious.

But first, I realized that I should have explained the problem I'm solving
in more detail, to see if I'm approaching it the right way. I need to visit
every node in the tree, and then compare the node to its parent and do some
calculations. I'm doing this by writing a recursion that starts with
tree.clade and then calls itself twice with clade.clades[0] and
clade.clades[1]. Then within the function I need to get the parent clade
and do the calculations.

def crunch_clade(tree, clade):
    compute_data(clade, get_parent(tree, clade))
    crunch_clade(tree, clade.clades[0])
    crunch_clade(tree, clade.clades[1])

Is there a better way to do it? Like maybe starting with the terminal clades?
Mike

From eric.talevich at gmail.com Tue Nov 2 11:44:36 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 2 Nov 2010 11:44:36 -0400
Subject: [Biopython] getting the parent of a Clade
In-Reply-To: <4E425445-5A7E-4FFB-B7FF-82ED243AACB4@gmail.com>
References: <4E425445-5A7E-4FFB-B7FF-82ED243AACB4@gmail.com>
Message-ID:

On Tue, Nov 2, 2010 at 5:58 AM, Michael Thon wrote:

> Hi Eric
> >
> > Do you or anyone else want to try plugging that all_parents function into
> > your code to see if it helps significantly? If it does, I could add it as a
> > Tree/Clade method in the next Biopython release.
> >
>
> I can try it - I have a few 1000 trees to parse so any differences in
> performance should be more obvious.
>
> But first, I realized that I should have explained the problem I'm solving
> in more detail, to see if I'm approaching it the right way. I need to visit
> every node in the tree, and then compare the node to its parent and do some
> calculations. I'm doing this by writing a recursion that starts with
> tree.clade and then calls itself twice with clade.clades[0] and
> clade.clades[1]. Then within the function I need to get the parent clade
> and do the calculations.
>
> def crunch_clade(tree, clade):
>     compute_data(clade, get_parent(tree, clade))
>     crunch_clade(tree, clade.clades[0])
>     crunch_clade(tree, clade.clades[1])
>
> Is there a better way to do it? Like maybe starting with the terminal
> clades?
>
> Mike
>

The tree traversal functions in Bio.Phylo are fairly efficient and flexible. I'm not sure if the traversal order matters for your function, but you could try something like:

parent_lookup = all_parents(tree)  # from the cookbook
for clade in tree.find_clades():
    compute_data(clade, parent_lookup[clade])

Or, possibly:

for parent in tree.get_nonterminals():
    for child in parent:
        compute_data(child, parent)

Notice that get_terminals() and get_nonterminals() are simplified versions of find_clades().
They return plain lists instead of being generator functions, but the filtering arguments aren't as flexible.

Also, see the tutorial section 12.4.1 on traversal:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc167

In particular, if you need to do level-order search, it looks like this:

tree.find_clades(order='level')

Hope that helps,
Eric

From matsen at fhcrc.org Wed Nov 3 08:06:05 2010
From: matsen at fhcrc.org (Erick Matsen)
Date: Wed, 3 Nov 2010 05:06:05 -0700
Subject: [Biopython] Lightweight version of Biopython?
Message-ID:

Hello there Biopython community--

We're writing some python code to use SCons for reproducible
bioinformatics research with intelligent dependencies. As part of the
project, we often need to do very simple bioinformatics tasks, such as
reading in various formats and spitting out others. We could use
Biopython for such things, but it's a very heavy dependency for such
trivial tasks.

I'm curious if there exists a Biopython "lite". The ideal situation
would be a tiny module that we could include directly in our project.
I have searched the Biopython mailing list and have yet to find
anything.

Thanks in advance,

Erick

From biopython at maubp.freeserve.co.uk Wed Nov 3 08:10:58 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 12:10:58 +0000
Subject: [Biopython] Lightweight version of Biopython?
In-Reply-To:
References:
Message-ID:

On Wed, Nov 3, 2010 at 12:06 PM, Erick Matsen wrote:
> Hello there Biopython community--
>
> We're writing some python code to use SCons for reproducible
> bioinformatics research with intelligent dependencies. As part of the
> project, we often need to do very simple bioinformatics tasks, such as
> reading in various formats and spitting out others. We could use
> Biopython for such things, but it's a very heavy dependency for such
> trivial tasks.
>
> I'm curious if there exists a Biopython "lite".
> The ideal situation
> would be a tiny module that we could include directly in our project.
> I have searched the Biopython mailing list and have yet to find
> anything.
>
> Thanks in advance,
>
> Erick

Hi Erick,

Why do you consider Biopython a heavy dependency? It can
be installed with no 3rd party libraries (although we do strongly
recommend NumPy, if you are not using anything numerical
you don't need it).

Peter

From matsen at fhcrc.org Wed Nov 3 08:16:07 2010
From: matsen at fhcrc.org (Erick Matsen)
Date: Wed, 3 Nov 2010 05:16:07 -0700
Subject: [Biopython] Lightweight version of Biopython?
In-Reply-To: <5626_1288786261_4CD15155_5626_778761_1_AANLkTimp3rEVchwPr35NR=e3CXhk+0MyqMLo=CrUWawp@mail.gmail.com>
References: <5626_1288786261_4CD15155_5626_778761_1_AANLkTimp3rEVchwPr35NR=e3CXhk+0MyqMLo=CrUWawp@mail.gmail.com>
Message-ID:

Hello Peter--

Thanks for the quick response.

Perhaps "heavy" isn't the right word, but it definitely seems overkill
when all we want to do is convert between three file formats (FASTA,
phylip, and Stockholm). Furthermore, something little we can include
directly in the project (with credit given where it is due) would be
ideal. An installation script could look for Biopython and use it if
it's there, or use the lite version if not.

Thanks,

Erick

On Wed, Nov 3, 2010 at 5:10 AM, Peter wrote:
> On Wed, Nov 3, 2010 at 12:06 PM, Erick Matsen wrote:
>> Hello there Biopython community--
>>
>> We're writing some python code to use SCons for reproducible
>> bioinformatics research with intelligent dependencies. As part of the
>> project, we often need to do very simple bioinformatics tasks, such as
>> reading in various formats and spitting out others. We could use
>> Biopython for such things, but it's a very heavy dependency for such
>> trivial tasks.
>>
>> I'm curious if there exists a Biopython "lite". The ideal situation
>> would be a tiny module that we could include directly in our project.
>> I have searched the Biopython mailing list and have yet to find
>> anything.
>>
>> Thanks in advance,
>>
>> Erick
>
> Hi Erick,
>
> Why do you consider Biopython a heavy dependency? It can
> be installed with no 3rd party libraries (although we do strongly
> recommend NumPy, if you are not using anything numerical
> you don't need it).
>
> Peter
>

From biopython at maubp.freeserve.co.uk Wed Nov 3 08:38:29 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 12:38:29 +0000
Subject: [Biopython] Lightweight version of Biopython?
In-Reply-To:
References: <5626_1288786261_4CD15155_5626_778761_1_AANLkTimp3rEVchwPr35NR=e3CXhk+0MyqMLo=CrUWawp@mail.gmail.com>
Message-ID:

On Wed, Nov 3, 2010 at 12:16 PM, Erick Matsen wrote:
> Hello Peter--
>
> Thanks for the quick response.
>
> Perhaps "heavy" isn't the right word, but it definitely seems overkill
> when all we want to do is convert between three file formats (FASTA,
> phylip, and Stockholm). Furthermore, something little we can include
> directly in the project (with credit given where it is due) would be
> ideal. An installation script could look for Biopython and use it if
> it's there, or use the lite version if not.
>

Hi Erick,

Under our licence you could take a snapshot copy of the bits you need
and include them in your application. However, the bits you'd need for
Bio.AlignIO would basically be the core sequence stuff in Biopython
(at very least Bio.Alphabet, Bio.Seq, Bio.SeqRecord, Bio.SeqFeature,
Bio.SeqIO, Bio.Align, and Bio.AlignIO). This adds up to a lot of files,
but would be possible... How are you distributing your application? If
using py2exe or a source code bundle you could in theory bundle in
(part of) Biopython.

If using pip or easy_install (i.e. PyPi packages) then it *should* take
care of a dependency on Biopython.
Suppose hypothetically we had a "lite" or "seq-core" version of
Biopython containing Bio.SeqIO, Bio.AlignIO and their dependencies
(but excluding Bio.PDB, Bio.Phylo, Bio.Cluster, Bio.Blast, BioSQL etc).
The download would be smaller (say a third of the size), but it isn't that
big to start with. There wouldn't be the (soft) dependency on NumPy.
Neither of these benefits seems compelling to me.

Regards,

Peter

From matsen at fhcrc.org Wed Nov 3 08:53:55 2010
From: matsen at fhcrc.org (Erick Matsen)
Date: Wed, 3 Nov 2010 05:53:55 -0700
Subject: [Biopython] Lightweight version of Biopython?
In-Reply-To: <9068_1288787911_4CD157C7_9068_879472_1_AANLkTikAPuppz=kU6HYM+-TgARWdvKHPtVGyJXXt00UF@mail.gmail.com>
References: <5626_1288786261_4CD15155_5626_778761_1_AANLkTimp3rEVchwPr35NR=e3CXhk+0MyqMLo=CrUWawp@mail.gmail.com> <9068_1288787911_4CD157C7_9068_879472_1_AANLkTikAPuppz=kU6HYM+-TgARWdvKHPtVGyJXXt00UF@mail.gmail.com>
Message-ID:

Peter--

Thank you for the complete response. We will have to have a run-off between a biopython install and our homegrown file format converters.

Erick

On Wed, Nov 3, 2010 at 5:38 AM, Peter wrote:
> On Wed, Nov 3, 2010 at 12:16 PM, Erick Matsen wrote:
>> Hello Peter--
>>
>> Thanks for the quick response.
>>
>> Perhaps "heavy" isn't the right word, but it definitely seems overkill
>> when all we want to do is convert between three file formats (FASTA,
>> phylip, and Stockholm). Furthermore, something little we can include
>> directly in the project (with credit given where it is due) would be
>> ideal. An installation script could look for Biopython and use it if
>> it's there, or use the lite version if not.
>>
>
> Hi Erick,
>
> Under our licence you could take a snapshot copy of the bits you need
> and include them in your application.
> However, the bits you'd need for
> Bio.AlignIO would basically be the core sequence stuff in Biopython
> (at very least Bio.Alphabet, Bio.Seq, Bio.SeqRecord, Bio.SeqFeature,
> Bio.SeqIO, Bio.Align, and Bio.AlignIO). This adds up to a lot of files,
> but would be possible... How are you distributing your application? If
> using py2exe or a source code bundle you could in theory bundle in
> (part of) Biopython.
>
> If using pip or easy_install (i.e. PyPi packages) then it *should* take
> care of a dependency on Biopython.
>
> Suppose hypothetically we had a "lite" or "seq-core" version of
> Biopython containing Bio.SeqIO, Bio.AlignIO and their dependencies
> (but excluding Bio.PDB, Bio.Phylo, Bio.Cluster, Bio.Blast, BioSQL etc).
> The download would be smaller (say a third of the size), but it isn't that
> big to start with. There wouldn't be the (soft) dependency on NumPy.
> Neither of these benefits seems compelling to me.
>
> Regards,
>
> Peter
>

From mcasfrox at gmail.com Wed Nov 3 09:24:23 2010
From: mcasfrox at gmail.com (Merche Castillo)
Date: Wed, 3 Nov 2010 14:24:23 +0100
Subject: [Biopython] biopython in eclipse
Message-ID:

Hi there,

I'm working in bioinformatics and I just began using python. I have
installed a pydev in eclipse and I'd like to know if I can use biopython
there too, and how I should install it.

Could anyone help me?

Thanks

Best Regards,
Merche

From anaryin at gmail.com Wed Nov 3 09:46:08 2010
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 3 Nov 2010 14:46:08 +0100
Subject: [Biopython] biopython in eclipse
In-Reply-To:
References:
Message-ID:

Hello Merche,

Once you install Biopython, it should be available just like any other library in PyDev. If you install it somewhere else (e.g. /home/merche/biopython) make sure that that directory is in your PYTHONPATH. Check this thread for details on how to generally configure the PyDev Interpreter.
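
[Editorial sketch] A quick way to confirm that the interpreter an IDE is using can actually see Biopython is to ask Python where it would import the package from. The snippet below is a small sketch using only the standard library; module_location is a hypothetical helper name, and "Bio" is the top-level package name that Biopython installs.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Run this with the same interpreter that PyDev is configured to use.
print("Interpreter:", sys.executable)
print("Bio found at:", module_location("Bio"))
```

If the second line prints None, the directory containing the Bio package is not on that interpreter's search path (sys.path), which is what adding it to PYTHONPATH would address.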
Also, please have a look at this thread for an older discussion on the subject:
http://lists.open-bio.org/pipermail/biopython/2009-February/004910.html

It should be easy enough! :)

João [...] Rodrigues
http://doeidoei.wordpress.org

On Wed, Nov 3, 2010 at 2:24 PM, Merche Castillo wrote:

> Hi there,
>
>
> I'm working in bioinformatics and I just began using python. I have
> installed a pydev in eclipse and I'd like to know if I can use biopython
> there too, and how i should install it.
>
> Could anyone help me?
>
> Thanks
>
> Best Regards,
> Merche
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From jp.verta at gmail.com Wed Nov 3 14:58:18 2010
From: jp.verta at gmail.com (Jukka-Pekka Verta)
Date: Wed, 3 Nov 2010 14:58:18 -0400
Subject: [Biopython] Primer3Commandline rookie question
Message-ID:

Hi all,

I'm just starting with Python so mine's a rookie question.

I'm trying to write a little program that would do primer design with
Primer3Commandline, yet I can't figure out how to access the output.
For example, when I try the same example as in the Application.py file under Primer3Commandline function ###### from Bio.Emboss.Applications import Primer3Commandline from Bio.Emboss import Primer3 sequence_handle = open("sequence", "w").write(">test\nCGGTCGCCGGACTATCAAATGGCCAGCGCTGAATTGTAGTACATTTCTCTGTCCTTTAATTCACAGGATTCTGTCATGTAAGATGCGCGATTACAGGTTTAACTGATCCAAACTTCTGCAAAACCGATAAAGAACCTAAATAAAGGAAAGGGACACTGCATAGATACAAACAAACGAAACGTTATCTCAGCAGGATTCATGGCTTCTGAAATTCAACTCATCACTATTGGACCTTCTCATTTTTGTGAAAAGGCACGATGGGGTTTGGACCGTGCTGGAATTTCTTATAAGGAGTCAAAGCATGCTCCTGTATTTCACATGTTATACACAAGGGGTTTGGGACAGGGAACCAGCTGTCCCAAGCTTGTACTAGGTGAGGGCAAAAATAAAGTGGTGTTGCATGAATCATCAGATATCTTGAAGTTTGCTGATGAGAACATCATGTCTGAAGAGGATCGCTTGTATCCATCGAACCTTGAGCAGTCTGTGCAAGAGTGGGGGCTGCATTCAAGGGAAAGGTTGTTAACTTGGTGTTTAATGCCATTGTTAAAGCCTCTCGTATATAGAACAGTTGGTTGCAACATGCCTGCAGCAAAAGATCATTCTCTTGATAAGATTCGTTGTATATTTTCAAGAGGTAGATAATGTGCTTGCTGATGGCCGGCCTTTCCTTTGTGGCAACGAGTTTACTGCAGCTGACTTGACATTTGCTTCACTTGCTGGTCCTGTACTATGCCCAGTGGGATATGGCACCTATCTAGTACCAACAGAGGAACTTCCATCAGAGATGTCTGAAGTTACCCTTTCTTTAAGAGAGACACTAGCAGGGAAGCATGCATTGAGGATGTACGAAACTGAAAGACACAGGAAAGTGGCTTCGAAGGTGCTAGTGACCTGACTCAGTGCATTTACACCGTGGCCTTTGTGTTCTGTCATATTATTGGCAACTAATATTGATTTACGTAAACTGCATACAGCTTTTCTTTCTTGGGAGTGAATCTTTTGAGCAATTTGCTGTCTTCATATAGCAATAATTTGAGAAAGGACCAAGATTTATTAAACTTCGTATTTTGTTTTTTGAATGTTGATATCTTTGTTGTTATTTGAGCAAGCTTAAGCTGCCTAGAACTATAATTCGGTTATGGCATGGGCATGTATCATTCAACCTTGATCTTTGTGCCTTTTTAACTAGAACCCATTTTTTTCAAAAAA\n") test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True) test.explainflag = True test.oligosize=20 test.psizeopt=200 test.outfile = "output.pr3" print test ####### from here on I've understood that I have to parse the output with Primer3.read(), but an error message tells me that the output.pr3 file doesn't exist. 
#######

test1 = Primer3.read(open("output.pr3", "r"))

Traceback (most recent call last):
  File "", line 1, in
IOError: [Errno 2] No such file or directory: 'output.pr3'

#######

What would be the proper way to proceed?

Thanks in advance!

JP Verta

From biopython at maubp.freeserve.co.uk Wed Nov 3 15:13:36 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 19:13:36 +0000
Subject: [Biopython] Primer3Commandline rookie question
In-Reply-To:
References:
Message-ID:

On Wed, Nov 3, 2010 at 6:58 PM, Jukka-Pekka Verta wrote:
> Hi all,
>
> I'm just starting with Python so mine's a rookie question.
>
> I'm trying to write a little program that would do primer design with
> Primer3Commandline, yet I can't figure out how to access the output.
> For example, when I try the same example as in the Application.py
> file under Primer3Commandline function
>
> ######
>
> from Bio.Emboss.Applications import Primer3Commandline
> from Bio.Emboss import Primer3
>
> sequence_handle = open("sequence", "w").write("...\n")

Here you've created a file called "sequence" (with no extension). I'd
have gone for something like "sequence.fasta" but it doesn't matter.

> test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True)

Here you should be passing the filename, in your case "sequence".
Maybe we should add a check here for people passing non-strings
to arguments expecting filenames.

Does that help?

Peter

[Note It is possible to avoid the temporary file by writing to the tool's
stdin but this is fiddly and requires you to work with the subprocess
module directly.]

From jp.verta at gmail.com Wed Nov 3 15:27:30 2010
From: jp.verta at gmail.com (Jukka-Pekka Verta)
Date: Wed, 3 Nov 2010 15:27:30 -0400
Subject: [Biopython] Primer3Commandline rookie question
In-Reply-To:
References:
Message-ID: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com>

Thanks Peter for the response.
Though doing

test = Primer3Commandline(sequence="sequence", auto=True, hybridprobe=True)

doesn't fix the problem (no output is created). Other suggestions?

JP

On 2010-11-03, at 3:13 PM, Peter wrote:

> On Wed, Nov 3, 2010 at 6:58 PM, Jukka-Pekka Verta wrote:
>> Hi all,
>>
>> I'm just starting with Python so mine's a rookie question.
>>
>> I'm trying to write a little program that would do primer design with
>> Primer3Commandline, yet I can't figure out how to access the output.
>> For example, when I try the same example as in the Application.py
>> file under Primer3Commandline function
>>
>> ######
>>
>> from Bio.Emboss.Applications import Primer3Commandline
>> from Bio.Emboss import Primer3
>>
>> sequence_handle = open("sequence", "w").write("...\n")
>
> Here you've created a file called "sequence" (with no extension). I'd
> have gone for something like "sequence.fasta" but it doesn't matter.
>
>> test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True)
>
> Here you should be passing the filename, in your case "sequence".
> Maybe we should add a check here for people passing non-strings
> to arguments expecting filenames.
>
> Does that help?
>
> Peter
>
> [Note It is possible to avoid the temporary file by writing to the tool's
> stdin but this is fiddly and requires you to work with the subprocess
> module directly.]

From chapmanb at 50mail.com Wed Nov 3 15:38:08 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 3 Nov 2010 15:38:08 -0400
Subject: [Biopython] Primer3Commandline rookie question
In-Reply-To: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com>
References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com>
Message-ID: <20101103193808.GQ2726@sobchak.mgh.harvard.edu>

JP;

> Thanks Peter for the response.
>
> Though doing
>
> test = Primer3Commandline(sequence="sequence", auto=True, hybridprobe=True)
>
> doesn't fix the problem (no output is created). Other suggestions?
In addition to what Peter mentioned, your code here creates a commandline object, but never actually runs it: > test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True) > test.explainflag = True > test.oligosize=20 > test.psizeopt=200 > test.outfile = "output.pr3" > print test This is probably why you weren't seeing any error messages from passing in a file handle, and explains why there is no output file. You can either call your command line directly: test() or use the Python subprocess module: subprocess.check_call(str(test)) [As an aside, it's good practice to use meaningful variable names. At least a 'cl' for commandline would be nice here and may help you spot these types of issues.] Hope this helps, Brad From jp.verta at gmail.com Wed Nov 3 16:30:19 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Wed, 3 Nov 2010 16:30:19 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: <20101103193808.GQ2726@sobchak.mgh.harvard.edu> References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> Message-ID: <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> Thanks Brad, calling test() results in error Traceback (most recent call last): File "", line 1, in File "/Library/Python/2.6/site-packages/biopython-1.55-py2.6-macosx-10.6-universal.egg/Bio/Application/__init__.py", line 487, in __call__ % (str(self), return_code)) RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned non-zero exit status 127 as does calling import subprocess subprocess.check_call(str(test)) Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 499, in check_call retcode = call(*popenargs, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 486, in call return 
Popen(*popenargs, **kwargs).wait() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 672, in __init__ errread, errwrite) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1201, in _execute_child raise child_exception OSError: [Errno 2] No such file or directory Ummm.... ? JP On 2010-11-03, at 3:38 PM, Brad Chapman wrote: > JP; > >> Thanks Peter for the response. >> >> Though doing >> >> test = Primer3Commandline(sequence="sequence", auto=True, hybridprobe=True) >> >> doesn't fix the problem (no output is created). Other suggestions? > > In addition to what Peter mentioned, your code here creates a > commandline object, but never actually runs it: > >> test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True) >> test.explainflag = True >> test.oligosize=20 >> test.psizeopt=200 >> test.outfile = "output.pr3" >> print test > > This is probably why you weren't seeing any error messages from passing in > a file handle, and explains why there is no output file. You can > either call your command line directly: > > test() > > or use the Python subprocess module: > > subprocess.check_call(str(test)) > > [As an aside, it's good practice to use meaningful variable names. > At least a 'cl' for commandline would be nice here and may help > you spot these types of issues.] 
> > Hope this helps, > Brad From biopython at maubp.freeserve.co.uk Wed Nov 3 17:04:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Nov 2010 21:04:21 +0000 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> Message-ID: On Wed, Nov 3, 2010 at 8:30 PM, Jukka-Pekka Verta wrote: > > > Thanks Brad, > calling > test() > results in error > > Traceback (most recent call last): > ... > RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence > -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned > non-zero exit status 127 On Linux error 127 means the command was not found. > as does calling > import subprocess > subprocess.check_call(str(test)) > > Traceback (most recent call last): > ... > OSError: [Errno 2] No such file or directory > Ummm.... ? > JP Have you installed the EMBOSS command line tools? Are they on the system path? What happens at the command line if you type eprimer3 (or the longer version Biopython is trying to use)? Peter From jp.verta at gmail.com Thu Nov 4 12:55:31 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Thu, 4 Nov 2010 12:55:31 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> Message-ID: <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> Thanks Peter, indeed I hadn't. I'm a rookie as I mentioned.... I installed EMBOSS tools (works from command line) and the primer3 program, set the path for primer3_core and I am able to run eprimer3 from the command line. 
When I tried to call my function test() from Python though, I got an error message RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned non-zero exit status 1 Thanks for your patient help! JP On 2010-11-03, at 5:04 PM, Peter wrote: > On Wed, Nov 3, 2010 at 8:30 PM, Jukka-Pekka Verta wrote: >> >> >> Thanks Brad, >> calling >> test() >> results in error >> >> Traceback (most recent call last): >> ... >> RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence >> -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned >> non-zero exit status 127 > > On Linux error 127 means the command was not found. > >> as does calling >> import subprocess >> subprocess.check_call(str(test)) >> >> Traceback (most recent call last): >> ... >> OSError: [Errno 2] No such file or directory >> Ummm.... ? >> JP > > Have you installed the EMBOSS command line tools? Are they on > the system path? What happens at the command line if you type > eprimer3 (or the longer version Biopython is trying to use)? > > Peter From biopython at maubp.freeserve.co.uk Thu Nov 4 13:10:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Nov 2010 17:10:03 +0000 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> Message-ID: On Thu, Nov 4, 2010 at 4:55 PM, Jukka-Pekka Verta wrote: > > Thanks Peter, > > indeed I hadn't. I'm a rookie as I mentioned.... > > I installed EMBOSS tools (works from command line) and the primer3 program, > set the path for primer3_core and I am able to run eprimer3 from the command > line. 
When I tried to call my function test() from Python though, I got an error message > > RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence > -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned non-zero > exit status 1 > > Thanks for your patient help! > > JP Excellent - you are making progress. That RuntimeError is Biopython saying that eprimer3 returned with an error. Command line tools give an integer return code or error level -- zero means success by convention. What happens if you try this command at the command line prompt (outside python), in the directory where your sequence file is: eprimer3 -auto -outfile=output.pr3 -sequence=sequence -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True My guess is you are using a recent version of EMBOSS, in which case you'll see this: Died: Unknown qualifier -oligosize=20 If so you should be using -osizeopt rather than -oligosize, which the Biopython documentation does try to explain - I've made a small clarification here: https://github.com/biopython/biopython/commit/065e235ca76c6656a7f7d73de57c8bf8743ce25f Peter From jp.verta at gmail.com Thu Nov 4 13:33:42 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Thu, 4 Nov 2010 13:33:42 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> Message-ID: <494A9140-3170-4E49-B2C4-14FA5FB70969@gmail.com> You're right, thats the case. 
Still, with -oligosize changed to -osizeopt Python returns me an error and running the eprimer3 on the command line gives me EMBOSS An error in eprimer3.c at line 317: eprimer3: Could not exec primer3_core Segmentation fault JP On 2010-11-04, at 1:10 PM, Peter wrote: > On Thu, Nov 4, 2010 at 4:55 PM, Jukka-Pekka Verta wrote: >> >> Thanks Peter, >> >> indeed I hadn't. I'm a rookie as I mentioned.... >> >> I installed EMBOSS tools (works from command line) and the primer3 program, >> set the path for primer3_core and I am able to run eprimer3 from the command >> line. When I tried to call my function test() from Python though, I got an error message >> >> RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence >> -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned non-zero >> exit status 1 >> >> Thanks for your patient help! >> >> JP > > Excellent - you are making progress. > > That RuntimeError is Biopython saying that eprimer3 returned with an error. > Command line tools give an integer return code or error level -- zero means > success by convention. 
>
> What happens if you try this command at the command line prompt (outside
> python), in the directory where your sequence file is:
>
> eprimer3 -auto -outfile=output.pr3 -sequence=sequence
> -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True
>
> My guess is you are using a recent version of EMBOSS, in which case
> you'll see this: Died: Unknown qualifier -oligosize=20
>
> If so you should be using -osizeopt rather than -oligosize, which the
> Biopython documentation does try to explain - I've made a small
> clarification here:
>
> https://github.com/biopython/biopython/commit/065e235ca76c6656a7f7d73de57c8bf8743ce25f
>
> Peter

From biopython at maubp.freeserve.co.uk Thu Nov 4 13:53:54 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Nov 2010 17:53:54 +0000
Subject: [Biopython] Primer3Commandline rookie question
In-Reply-To: <494A9140-3170-4E49-B2C4-14FA5FB70969@gmail.com>
References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> <494A9140-3170-4E49-B2C4-14FA5FB70969@gmail.com>
Message-ID:

On Thu, Nov 4, 2010 at 5:33 PM, Jukka-Pekka Verta wrote:
>
> You're right, that's the case. Still, with -oligosize changed to -osizeopt
> Python returns me an error and running the eprimer3 on the command
> line gives me
>
>    EMBOSS An error in eprimer3.c at line 317:
> eprimer3: Could not exec primer3_core
> Segmentation fault
>
> JP

Wow - it looks like you may have found a bug in EMBOSS itself;
getting a "Segmentation fault" is never a good sign.

Have you properly installed the Whitehead Institute primer3
package? The EMBOSS tool acts as a wrapper, and from
the error perhaps it can't find the primer3 tool?

You may want to report this to EMBOSS directly...
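
Before filing a report, it may be worth confirming which executables the Python process can actually see, since its PATH can differ from the one in an interactive shell. The following is only a minimal sketch, assuming Python 3.3 or later (for shutil.which); find_tools is a hypothetical helper name, not part of Biopython or EMBOSS.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH.

    Hypothetical helper for illustration only.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # eprimer3 is the EMBOSS wrapper; primer3_core is the program it launches.
    for tool, path in find_tools(["eprimer3", "primer3_core"]).items():
        print("%-14s %s" % (tool, path or "NOT FOUND on PATH"))
```

If primer3_core shows up as not found here but runs fine from a terminal, the PATH seen by Python is a likely suspect.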
See also: http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/eprimer3.html Peter From jp.verta at gmail.com Thu Nov 4 14:51:41 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Thu, 4 Nov 2010 14:51:41 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> <494A9140-3170-4E49-B2C4-14FA5FB70969@gmail.com> Message-ID: OK, the primer3 installation should be fine as I am able to run the primer3_core program from the command line with the example that comes with the distribution. I reported this to EMBOSS, let's see what they have to say... Thank you for all the help! JP On 2010-11-04, at 1:53 PM, Peter wrote: > On Thu, Nov 4, 2010 at 5:33 PM, Jukka-Pekka Verta wrote: >> >> You're right, thats the case. Still, with -oligosize changed to -osizeopt >> Python returns me an error and running the eprimer3 on the command >> line gives me >> >> EMBOSS An error in eprimer3.c at line 317: >> eprimer3: Could not exec primer3_core >> Segmentation fault >> >> JP > > Wow - it looks like you may have found a bug in EMBOSS itself, > getting a "Segmentation fault" is never a good sign. > > Have you properly installed the Whitehead institute primer3 > package? The EMBOSS tool acts as a wrapper, and from > the error perhaps it can't find the primer3 tool? > > You may want to report this to EMBOSS directly... > See also: http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/eprimer3.html > > Peter From yvan.strahm at uni.no Wed Nov 10 07:21:25 2010 From: yvan.strahm at uni.no (Yvan) Date: Wed, 10 Nov 2010 13:21:25 +0100 Subject: [Biopython] editing xml blast report Message-ID: <4CDA8E45.8000206@uni.no> Hello All, I am wondering if it is possible with biopython to correctly edit a xml blast report? 
Here my problem; I need to filter/exclude some hits according to their length but the xml consistency should be conserved. Should I parse with it with NCBIXML and recreate a xml file with hard coded tags? I try to use the tabular output of blast+ and reformat it with blast_formatter but with no success. Any one has an idea or some pointer I could use? Thanks a lot for your help cheers yvan From biopython at maubp.freeserve.co.uk Wed Nov 10 07:45:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 12:45:53 +0000 Subject: [Biopython] editing xml blast report In-Reply-To: <4CDA8E45.8000206@uni.no> References: <4CDA8E45.8000206@uni.no> Message-ID: On Wed, Nov 10, 2010 at 12:21 PM, Yvan wrote: > Hello All, > > I am wondering if it is possible with biopython to correctly edit a xml > blast report? > > Here my problem; > I need to filter/exclude some hits according to their length but the xml > consistency should be conserved. Biopython has a BLAST XML parser, but nothing to help you write out BLAST XML. > Should I parse with it with NCBIXML and recreate a xml file with hard > coded tags? That might be the best option. > I try to use the tabular output of blast+ and reformat it with > blast_formatter but with no success. I'm pretty sure the blast_formatter tools in BLAST+ will only convert from ASN.1 to other BLAST output formats. So that won't work, unless you can edit the ASN.1 file (but that would probably be harder than editing XML). Peter From lpritc at scri.ac.uk Thu Nov 11 09:12:21 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 11 Nov 2010 14:12:21 +0000 Subject: [Biopython] Bioinformatics position Message-ID: We have a bioinformatics post available at SCRI, and would be grateful if you could please bring it to the attention of any colleagues who may be interested in applying. 
It is advertised at http://www.jobs.ac.uk/job/ABS904/bioinformatics/ and
some details are included below:

"""
Bioinformatics
Scottish Crop Research Institute - SCRI

SCRI is Scotland's leading Institute for research on plants and their
interactions with the environment, particularly in managed ecosystems. Our
mission is to conduct excellent research in plant and environmental
sciences. Our vision is to deliver innovative products, knowledge and
services that enrich the life of the community and address the public goods
of environmental sustainability, high quality and healthy food.

Post Reference SMB/1/10

Research in the Plant Pathology Programme at SCRI is founded on pathogen
genomics, and scientists in the Programme have a strong track record of
contributing to whole genome sequencing and genetic analysis of economically
important pests and pathogens. The successful candidate will collaborate
with other groups in the Programme working on plant-pathogen interactions
developing innovative approaches to understand disease processes. This post
provides an opportunity to influence biological research of direct impact to
agriculture.

The ideal candidate would be experienced in manipulating and curating large
biological datasets with a record of collaboration and integration with
biologists. The successful applicant is expected to have an interest in
plant-pathogen interactions and to develop their own research profile. The
candidate should have a PhD or equivalent in bioinformatics, biostatistics
or a related field.

Informal enquiries from: Leighton.Pritchard at scri.ac.uk or
Lesley.Torrance at scri.ac.uk

Salary Scale For All Posts:

* Band D/E, £26,610 - £37,534 (commensurate with experience)
* Appointments to Band F, £42,769 - £47,521 available for exceptional
  candidates.

Candidates willing to apply for a research fellowship to further help
establish their own laboratory are encouraged to apply and will, if
successful, benefit from generous Institute support throughout the tenure of
their fellowship.

Further information on the above posts, including how to apply, is available
on the SCRI website at http://www.scri.ac.uk/careers/vacancies

Closing date - Friday 19th November 2010.

The Institute is an equal opportunities employer.
"""

Many thanks,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk
w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C
tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by
guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland
Revenue as a Scottish Charity No: SC 006662.

DISCLAIMER: This email is from the Scottish Crop Research Institute, but the
views expressed by the sender are not necessarily the views of SCRI and its
subsidiaries. This email and any files transmitted with it are confidential
to the intended recipient at the e-mail address to which it has been
addressed. It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this
confidentiality and you must not use, disclose, copy, print or rely on this
e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name
of the sender and delete the email from your system. Although SCRI has taken
reasonable precautions to ensure no viruses are present in this email,
neither the Institute nor the sender accepts any responsibility for any
viruses, and it is your responsibility to scan the email and the attachments
(if any).
______________________________________________________ From yvan.strahm at uni.no Mon Nov 15 05:38:48 2010 From: yvan.strahm at uni.no (Yvan) Date: Mon, 15 Nov 2010 11:38:48 +0100 Subject: [Biopython] editing xml blast report In-Reply-To: References: <4CDA8E45.8000206@uni.no> Message-ID: <4CE10DB8.8040908@uni.no> On 10/11/10 13:45, Peter wrote: > On Wed, Nov 10, 2010 at 12:21 PM, Yvan wrote: >> Hello All, >> >> I am wondering if it is possible with biopython to correctly edit a xml >> blast report? >> >> Here my problem; >> I need to filter/exclude some hits according to their length but the xml >> consistency should be conserved. > Biopython has a BLAST XML parser, but nothing to help you write > out BLAST XML. > >> Should I parse with it with NCBIXML and recreate a xml file with hard >> coded tags? > That might be the best option. > >> I try to use the tabular output of blast+ and reformat it with >> blast_formatter but with no success. > I'm pretty sure the blast_formatter tools in BLAST+ will only convert > from ASN.1 to other BLAST output formats. So that won't work, > unless you can edit the ASN.1 file (but that would probably be > harder than editing XML). > > Peter Ok thanks for the answer. Cheers, yvan From jgrant at smith.edu Wed Nov 17 16:22:08 2010 From: jgrant at smith.edu (Jessica Grant) Date: Wed, 17 Nov 2010 16:22:08 -0500 Subject: [Biopython] problem blasting on line Message-ID: Hello, I am trying to use blast to extract contaminating sequences from a set of 454 sequence data. 
My script uses NCBIWWW.qblast as follows:

result_handle = NCBIWWW.qblast("blastx", "nr", record.format("fasta"),
ncbi_gi=False, descriptions= "1", alignments="1", format_type="XML",
hitlist_size="1", entrez_query='(Bacteria[ORGN])')

It works...for a while but it stops, eventually, with the following error:

Traceback (most recent call last):
  File "compBACTEUK.py", line 81, in
    main()
  File "compBACTEUK.py", line 70, in main
    blastBACTEUK(arg)
  File "compBACTEUK.py", line 19, in blastBACTEUK
    result_handleE = NCBIWWW.qblast("blastx", "nr", record.format("fasta"), ncbi_gi=False, descriptions= "1", alignments="1", format_type="XML", hitlist_size="1", entrez_query='(Eukaryota[ORGN])')
  File "/Users/katzlab/Downloads/biopython-biopython-74d3c2a/build/lib.macosx-10.5-i386-2.5/Bio/Blast/NCBIWWW.py", line 114, in qblast
    handle = urllib2.urlopen(request)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 380, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

I suppose that the problem is a communication problem with ncbi.
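
If the failure is transient, one option is to retry each query a few times before giving up, so that records are not silently dropped. This is only a minimal sketch using the standard library; with_retries is a hypothetical helper, and the attempt count and delay are arbitrary assumptions rather than NCBI recommendations.

```python
import time


def with_retries(func, attempts=3, delay=5.0, exceptions=(Exception,)):
    """Call func() until it succeeds, sleeping `delay` seconds between tries.

    Hypothetical helper for illustration. If every attempt fails, the
    exception from the last attempt is re-raised so nothing is lost silently.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Under these assumptions the qblast call from the script could be wrapped as with_retries(lambda: NCBIWWW.qblast(...), exceptions=(IOError,)), since the HTTPError in the traceback is a subclass of IOError. Records that still fail after the last attempt raise instead of being skipped, so they can be logged and rerun.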
I have written a try/except statement into my script, but I seem to be
losing quite a few records as they get skipped over if the error occurs.

I thought about downloading nr and using the standalone blast, but it seems
the downloadable nr database comes in several parts, already formatted for
blast. Can I concatenate these?

Any thoughts on the problem with the qblast or other ways to circumvent this
problem would be greatly appreciated!

Jessica

From biopython at maubp.freeserve.co.uk Wed Nov 17 16:37:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Nov 2010 21:37:47 +0000
Subject: [Biopython] problem blasting on line
In-Reply-To:
References:
Message-ID:

Hi Jessica,

On Wed, Nov 17, 2010 at 9:22 PM, Jessica Grant wrote:
> Hello,
>
> I am trying to use blast to extract contaminating sequences from a set of
> 454 sequence data. My script uses NCBIWWW.qblast as follows:
>
> result_handle = NCBIWWW.qblast("blastx", "nr", record.format("fasta"),
> ncbi_gi=False, descriptions= "1", alignments="1", format_type="XML",
> hitlist_size="1", entrez_query='(Bacteria[ORGN])')
>
>
> It works...for a while but it stops, eventually, with the following error:
>
>
> Traceback (most recent call last):
> ...
> urllib2.HTTPError: HTTP Error 404: Not Found
>
>
> I suppose that the problem is a communication problem with ncbi.

Probably some kind of network problem, yes.

> I have
> written a try except statement into my script, but I seem to be losing quite
> a few records as they get skipped over if the error occurs.
>
> I thought about downloading nr and using the standalone blast, but it seems
> the downloadable nr database comes in several parts, already formatted for
> blast. Can I concatenate these?
>
> Any thoughts on the problem with the qblast or other ways to circumvent this
> problem would be greatly appreciated!
>
> Jessica

How many sequences are you trying to BLAST?
If it is more than a few dozen I would definitely recommend installing and running BLAST locally. Regarding the NR database, yes, it comes in parts, but this is by design. There is a main index file which tells the BLAST command line tools about all the subparts - very easy. Just download all the nr.*.tar.gz files into your BLAST database folder and uncompress them. Peter From eric.talevich at gmail.com Wed Nov 17 16:54:40 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 17 Nov 2010 16:54:40 -0500 Subject: [Biopython] problem blasting on line In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 4:22 PM, Jessica Grant wrote: > Hello, > > I am trying to use blast to extract contaminating sequences from a set of > 454 sequence data. My script uses NCBIWWW.qblast as follows: > > [...] > > I thought about downloading nr and using the standalone blast, but it seems > the downloadable nr database comes in several parts, already formatted for > blast. Can I concatenate these? > > Any thoughts on the problem with the qblast or other ways to circumvent > this problem would be greatly appreciated! > > Hi Jessica, If the problem boils down to grouping all the related sequences together, or isolating the unrelated sequences, you might also have some luck with CD-HIT: http://weizhong-lab.ucsd.edu/cd-hit/ Best, Eric From peter at maubp.freeserve.co.uk Thu Nov 18 10:47:08 2010 From: peter at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Nov 2010 15:47:08 +0000 Subject: [Biopython] Dropping Python 2.4 Support? Message-ID: Dear Biopythoneers, Are any of you still using Biopython on Python 2.4? http://news.open-bio.org/news/2010/11/dropping-python24-support/ Please get in touch if dropping support for Python 2.4 would be a problem. Otherwise we plan for Biopython 1.56 (expected by the end of this month) to be our last release to work with Python 2.4. 
Thanks,

Peter

From srikrishnamohan at gmail.com Fri Nov 19 13:43:23 2010
From: srikrishnamohan at gmail.com (km)
Date: Sat, 20 Nov 2010 00:13:23 +0530
Subject: [Biopython] polypeptide residue ids
Message-ID:

Hi all,
I would like to derive amino acid ids (number) corresponding to the
residues in polypeptide chains extracted from PDB file using biopython
PDB module.
Is it possible to obtain this information with PDB module of biopython ?

regards,
KM

From anaryin at gmail.com Fri Nov 19 13:54:17 2010
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 19 Nov 2010 19:54:17 +0100
Subject: [Biopython] polypeptide residue ids
In-Reply-To:
References:
Message-ID:

Hello KM,

You can iterate over the chains as described in the manual and then for each
residue call get_id().

for residue in structure.get_residues():
    print residue.get_id()[1]

Best

João [...] Rodrigues
http://doeidoei.wordpress.com

On Fri, Nov 19, 2010 at 7:43 PM, km wrote:
> Hi all,
> I would like to derive amino acid ids (number) corresponding to the
> residues in polypeptide chains extracted from PDB file using biopython
> PDB module.
> Is it possible to obtain this information with PDB module of biopython ?
>
> regards,
> KM
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From srikrishnamohan at gmail.com Fri Nov 19 15:31:32 2010
From: srikrishnamohan at gmail.com (km)
Date: Sat, 20 Nov 2010 02:01:32 +0530
Subject: [Biopython] polypeptide residue ids
In-Reply-To:
References:
Message-ID:

Thanks. that works well.
but why have polypeptide builder for generating the sequence ?
ppb.get_sequence() ?
when we can do the same by reading the residues under chain of the
structure hierarchy as u suggested?

On Sat, Nov 20, 2010 at 12:24 AM, João Rodrigues wrote:
> Hello KM,
>
> You can iterate over the chains as described in the manual and then for each
> residue call get_id().
>
> for residue in structure.get_residues():
>     print residue.get_id()[1]
>
> Best
>
> João [...] Rodrigues
> http://doeidoei.wordpress.com
>
>
>
> On Fri, Nov 19, 2010 at 7:43 PM, km wrote:
>>
>> Hi all,
>> I would like to derive amino acid ids (number) corresponding to the
>> residues in polypeptide chains extracted from PDB file using biopython
>> PDB module.
>> Is it possible to obtain this information with PDB module of biopython ?
>>
>> regards,
>> KM
>> _______________________________________________
>> Biopython mailing list - Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>
>

From anaryin at gmail.com Fri Nov 19 15:35:25 2010
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 19 Nov 2010 21:35:25 +0100
Subject: [Biopython] polypeptide residue ids
In-Reply-To:
References:
Message-ID:

The Peptide Builder serves for much more than generating sequences. However,
having the SMCRA representation of the model is much handier (IMO) to get
this kind of information.

I'm not really sure if I answered your question though..

João [...] Rodrigues
http://doeidoei.wordpress.com

On Fri, Nov 19, 2010 at 9:31 PM, km wrote:
> Thanks. that works well.
> but why have polypeptide builder for generating the sequence ?
> ppb.get_sequence() ?
> when we can do the same by reading the residues under chain of the
> structure hierarchy as u suggested?
>
>
> On Sat, Nov 20, 2010 at 12:24 AM, João Rodrigues wrote:
> > Hello KM,
> >
> > You can iterate over the chains as described in the manual and then for each
> > residue call get_id().
> >
> > for residue in structure.get_residues():
> >     print residue.get_id()[1]
> >
> > Best
> >
> > João [...]
Rodrigues > > http://doeidoei.wordpress.com > > > > > > > > On Fri, Nov 19, 2010 at 7:43 PM, km wrote: > >> > >> Hi all, > >> I would like to derive amino acid ids (number) corresponding to the > >> residues in polypeptide chains extracted from PDB file using biopython > >> PDB module. > >> Is it possible to obtain this information with PDB module of biopython ? > >> > >> regards, > >> KM > >> _______________________________________________ > >> Biopython mailing list - Biopython at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biopython > > > > > From rodrigo_faccioli at uol.com.br Sun Nov 21 11:20:15 2010 From: rodrigo_faccioli at uol.com.br (Rodrigo Faccioli) Date: Sun, 21 Nov 2010 14:20:15 -0200 Subject: [Biopython] polypeptide residue ids In-Reply-To: References: Message-ID: Hi, I'm not sure if my answer will be correct. Anyway, I'll try to write it. I believe the Peptide Builder works with atoms section to build the sequences. Therefore, if your atoms section has not all atoms of all residues they will not appear correctly. I have been worked with seqres section on pdb. More specific I have studied the biopython pdb module and I tried to add it in SMCRA representation. However, this moment I don't have time to talk with Biopython project because I have to work with my Ph.D project. Anyway, if you want to test this project, please see [1]. In this project, there is a script to split pdb in chains. [1] https://github.com/rodrigofaccioli/ContributeToBioPython I hope you enjoy it. 
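
To compare what the SEQRES section declares with what is actually present in the ATOM records, the SEQRES lines can be read straight from the PDB text. The following is only a minimal sketch, assuming the standard fixed-column PDB layout (chain ID in column 12, residue names from column 20 onward); read_seqres is a hypothetical helper, not part of Bio.PDB or of the project linked above.

```python
def read_seqres(lines):
    """Collect three-letter residue names per chain from SEQRES records.

    Hypothetical helper for illustration. `lines` is any iterable of PDB
    text lines, for example an open file handle.
    """
    chains = {}
    for line in lines:
        if line.startswith("SEQRES"):
            chain_id = line[11]              # column 12 of the record
            residues = line[19:70].split()   # columns 20-70 hold the names
            chains.setdefault(chain_id, []).extend(residues)
    return chains
```

Comparing the length of each list with the number of residues found by iterating over the parsed structure gives a quick indication of residues that are declared in SEQRES but have no coordinates.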
Best regards,

--
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218
Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5

On Fri, Nov 19, 2010 at 6:35 PM, João Rodrigues wrote:
> The Peptide Builder serves for much more than generating sequences. However,
> having the SMCRA representation of the model is much handier (IMO) to get
> this kind of information.
>
> I'm not really sure if I answered your question though..
>
> João [...] Rodrigues
> http://doeidoei.wordpress.com
>
>
>
> On Fri, Nov 19, 2010 at 9:31 PM, km wrote:
>
> > Thanks. that works well.
> > but why have polypeptide builder for generating the sequence ?
> > ppb.get_sequence() ?
> > when we can do the same by reading the residues under chain of the
> > structure hierarchy as u suggested?
> >
> >
> > On Sat, Nov 20, 2010 at 12:24 AM, João Rodrigues wrote:
> > > Hello KM,
> > >
> > > You can iterate over the chains as described in the manual and then for each
> > > residue call get_id().
> > >
> > > for residue in structure.get_residues():
> > >     print residue.get_id()[1]
> > >
> > > Best
> > >
> > > João [...]
Rodrigues
> > > http://doeidoei.wordpress.com
> > >
> > >
> > >
> > > On Fri, Nov 19, 2010 at 7:43 PM, km wrote:
> > >>
> > >> Hi all,
> > >> I would like to derive amino acid ids (number) corresponding to the
> > >> residues in polypeptide chains extracted from PDB file using biopython
> > >> PDB module.
> > >> Is it possible to obtain this information with PDB module of biopython ?
> > >>
> > >> regards,
> > >> KM
> > >> _______________________________________________
> > >> Biopython mailing list - Biopython at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/biopython
> > >
> > >
> >
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From ming.xue at boehringer-ingelheim.com Mon Nov 22 09:58:27 2010
From: ming.xue at boehringer-ingelheim.com (ming.xue at boehringer-ingelheim.com)
Date: Mon, 22 Nov 2010 09:58:27 -0500
Subject: [Biopython] ERROR: Bio.Entrez.efetch(genome, X52960, ...) and ERROR: Bio.ExPASy.get_sprot_raw("O23729")
References:
Message-ID: <5C105629EFD02E42B2C3E3113755F121126C6B66@RDGEXM01.am.boehringer.com>

Hi All,

I was installing biopython 1.55 over python 2.6 and testing generated the error:

======================================================================
ERROR: Bio.Entrez.efetch(genome, X52960, ...)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_SeqIO_online.py", line 85, in <lambda>
    method = lambda x : x.simple(d, f, e, l, c)
  File "test_SeqIO_online.py", line 63, in simple
    record = SeqIO.read(handle, f)
  File "/src/biopython-1.55/build/lib.linux-x86_64-2.6/Bio/SeqIO/__init__.py", line 585, in read
    raise ValueError("No records found in handle")
ValueError: No records found in handle

======================================================================
ERROR: Bio.ExPASy.get_sprot_raw("O23729")
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_SeqIO_online.py", line 53, in test_get_sprot_raw
    "internet (or maybe just ExPASy) not available")
MissingExternalDependencyError: internet (or maybe just ExPASy) not available

----------------------------------------------------------------------
Ran 144 tests in 93.664 seconds

FAILED (failures = 1)

Since I am behind a proxy with NTLM authentication, my guess is that the issue is related to the firewall. I tried some solutions from the internet such as ntlm-map and python-ntlm, but no luck. Any suggestions?

Best and thanks,

MX

From biopython at maubp.freeserve.co.uk Mon Nov 22 10:17:44 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Nov 2010 15:17:44 +0000
Subject: [Biopython] ERROR: Bio.Entrez.efetch(genome, X52960, ...) and ERROR: Bio.ExPASy.get_sprot_raw("O23729")
In-Reply-To: <5C105629EFD02E42B2C3E3113755F121126C6B66@RDGEXM01.am.boehringer.com>
References: <5C105629EFD02E42B2C3E3113755F121126C6B66@RDGEXM01.am.boehringer.com>
Message-ID:

On Mon, Nov 22, 2010 at 2:58 PM, wrote:
> Hi All,
>
> I was installing biopython 1.55 over python 2.6 and testing generated the
> error:
> ======================================================================
> ERROR: Bio.Entrez.efetch(genome, X52960, ...)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> ...
> ValueError: No records found in handle
>
> ======================================================================
> ERROR: Bio.ExPASy.get_sprot_raw("O23729")
> ----------------------------------------------------------------------
> ...
> MissingExternalDependencyError: internet (or maybe just ExPASy) not available
>
> ----------------------------------------------------------------------
> Ran 144 tests in 93.664 seconds
>
> FAILED (failures = 1)
>
> Since I am behind proxy with ntlm authentication, my guess the issue is
> related to firewall. I tried some solution from internet such as ntlm-map and
> python-ntlm but no luck. Any suggestions?
>
> Best and thanks,
>
> MX

Hi,

I see similar errors on my Windows machine, and I think this is down to needing to use the institute proxy. Currently Bio.Entrez uses urllib (version 1), and I don't know if this can be modified to work with python-ntlm or not. What have you tried?

If you are happy to try experimenting, you could modify the Bio.Entrez _open function to use urllib2 with python-ntlm.

If you had a simple proxy (without authentication) you'd do something like this:

import os
os.environ["http_proxy"] = "http://proxyhost.example.com:8080"

See the "Using a proxy" section in the Entrez chapter of the tutorial for more details.

Regards,

Peter

From akooser at unm.edu Mon Nov 22 22:52:15 2010
From: akooser at unm.edu (Ara Kooser)
Date: Mon, 22 Nov 2010 20:52:15 -0700
Subject: [Biopython] Extracting data genpept files
Message-ID: <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu>

Hello all,

I think Peter pointed me to part of this code (shown below) for extracting data out of a genpept file. I am trying to get a handle on the formatting end of things.
My question is: when there is missing taxonomic data grabbed by

tax_records = gb_record.annotations["taxonomy"]

instead of leaving the space blank the program fills it in with the next piece of data, usually the date. This throws off the whole spreadsheet when I import it as a CSV file. Is there a way to have the program write in white space when it encounters missing data instead of the date?

Thanks,

Ara

PS As soon as the formatting is sorted out and folks created for input and such I will post the code up here.

from Bio import SeqIO

def index_genbank_features(gb_record, feature_type, qualifier) :
    # Map each qualifier value to the index of the feature that carries it
    answer = dict()
    for (index, feature) in enumerate(gb_record.features) :
        if feature.type==feature_type :
            if qualifier in feature.qualifiers :
                for value in feature.qualifiers[qualifier] :
                    if value in answer :
                        print("WARNING - Duplicate key %s for %s features %i and %i" \
                              % (value, feature_type, answer[value], index))
                    else :
                        answer[value] = index
    return answer

gg = open("raw_genbank.txt","w")
gb_file = "sequence.gp.txt"
for gb_record in SeqIO.parse(open(gb_file,"r"), "genbank"):
    gb_feature = gb_record.features[2]
    locus_tag_cds_index = index_genbank_features(gb_record,"CDS","locus_tag")
    coded_by_cds_index = index_genbank_features(gb_record,"CDS","coded_by")
    name_by_source_index = index_genbank_features(gb_record,"source","organism")
    protein_id_cds_index = index_genbank_features(gb_record,"CDS","protein_id")
    gb_annotations = gb_record.annotations
    tax_records = gb_record.annotations["taxonomy"]
    accession = gb_record.annotations["accessions"]
    date = gb_record.annotations["date"]
    function = gb_record.description
    # One line per record
    gg.write(str([accession, locus_tag_cds_index, coded_by_cds_index,
                  name_by_source_index, tax_records, date, function]))
    gg.write("\n")
gg.close()

From biopython at maubp.freeserve.co.uk Tue Nov 23 03:53:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Nov 2010 08:53:52 +0000
Subject: [Biopython] Extracting data genpept files
In-Reply-To:
 <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu>
References: <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu>
Message-ID:

On Tue, Nov 23, 2010 at 3:52 AM, Ara Kooser wrote:
> Hello all,
>
> I think Peter pointed me to part of this code (shown below) for extracting
> data out of a genpept file. I am trying to get a handle on the formating end
> of things. My questions is when there is missing taxonomic data grabbed by
> tax_records = gb_record.annotations["taxonomy"] instead of leaving the space
> blank the program fills it in with the next piece of data, usually the date.
> This throws off the whole spreadsheet when I import as a CSV file.
>

If I understood your aim, try using this if the taxonomy isn't in the annotations dictionary (which would give a KeyError),

tax_records = gb_record.annotations.get("taxonomy", [])

Perhaps you could clarify if you want the taxonomy (a list of variable length) to go in one column of your CSV file?

Peter

P.S. I prefer using tab separated variables (tsv) over csv, as I find commas in descriptions quite often - and although this can be dealt with it is fiddly.

From akooser at unm.edu Tue Nov 23 08:27:21 2010
From: akooser at unm.edu (Ara Kooser)
Date: Tue, 23 Nov 2010 06:27:21 -0700
Subject: [Biopython] Extracting data genpept files
In-Reply-To:
References: <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu>
Message-ID: <8D9CF8DC-3489-41E1-87CA-101FB4925520@unm.edu>

Peter,

Thank you for your response.

>
> If I understood your aim, try using this if the taxonomy isn't in the
> annotations dictionary (which would give a KeyError),
>
> tax_records = gb_record.annotations.get("taxonomy", [])

I will give this a try later in the morning.

>
> Perhaps you could clarify if you want the taxonomy (a list of variable
> length) to go in one column of your CSV file?
>

I would like the whole taxonomy to go into one column but couldn't figure out how to do that.

> Peter
>
> P.S.
> I prefer using tab separated variables (tsv) over csv, as I find commas
> in descriptions quite often - and although this can be dealt with it
> is fiddly.

I wanted to use tabs but here again wasn't sure of how to code that. Commas have been my bane while trying to write this code.

Ara

From biopython at maubp.freeserve.co.uk Tue Nov 23 09:09:57 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Nov 2010 14:09:57 +0000
Subject: [Biopython] Extracting data genpept files
In-Reply-To: <8D9CF8DC-3489-41E1-87CA-101FB4925520@unm.edu>
References: <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu> <8D9CF8DC-3489-41E1-87CA-101FB4925520@unm.edu>
Message-ID:

On Tue, Nov 23, 2010 at 1:27 PM, Ara Kooser wrote:
> Peter,
>
> I would like the whole taxonomy to go into one column but couldn't figure
> out how to do that.
>
> ...
>
> I wanted to use tabs but here again wasn't sure of how to code that. Commas
> have been my bane while trying to write this code.

I'd suggest writing the taxonomy in a single column, space separated. You can do that with the Python string join method. For example, if taxonomy_list is a list of strings, use:

taxonomy_str = " ".join(taxonomy_list)

Regarding tabs, use backslash t to get a tab in Python, e.g. "\t", or:

name = "Test"
x = 10
y = -5
handle = open("example.tsv", "w")
handle.write("%s\t%i\t%i\n" % (name, x, y))
handle.close()

Peter

From ming.xue at boehringer-ingelheim.com Wed Nov 24 12:47:16 2010
From: ming.xue at boehringer-ingelheim.com (ming.xue at boehringer-ingelheim.com)
Date: Wed, 24 Nov 2010 12:47:16 -0500
Subject: [Biopython] ERROR: Bio.Entrez.efetch(genome, X52960, ...)
 and ERROR: Bio.ExPASy.get_sprot_raw("O23729")
In-Reply-To:
References: <5C105629EFD02E42B2C3E3113755F121126C6B66@RDGEXM01.am.boehringer.com>
Message-ID: <5C105629EFD02E42B2C3E3113755F1211273AB86@RDGEXM01.am.boehringer.com>

Hi,

I installed python-ntlm from http://code.google.com/p/python-ntlm/ and tried their example using urllib2 to retrieve the Google home page, but I still get a "407: Proxy Authentication Required" error. I tried ntlmaps-0.9.9.0.1 and it didn't solve the problem either. I am not going to try to fix the issue as I only need to parse behind the firewall. Thanks a lot.

Cheers,

MX

-----Original Message-----
From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On Behalf Of Peter
Sent: Monday, November 22, 2010 10:18 AM
To: Xue,Ming IT BIP-US-R
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] ERROR: Bio.Entrez.efetch(genome, X52960, ...) and ERROR: Bio.ExPASy.get_sprot_raw("O23729")

On Mon, Nov 22, 2010 at 2:58 PM, wrote:
> Hi All,
>
> I was installing biopython 1.55 over python 2.6 and testing generated the
> error:
> ======================================================================
> ERROR: Bio.Entrez.efetch(genome, X52960, ...)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> ...
> ValueError: No records found in handle
>
> ======================================================================
> ERROR: Bio.ExPASy.get_sprot_raw("O23729")
> ----------------------------------------------------------------------
> ...
> MissingExternalDependencyError: internet (or maybe just ExPASy) not available
>
> ----------------------------------------------------------------------
> Ran 144 tests in 93.664 seconds
>
> FAILED (failures = 1)
>
> Since I am behind proxy with ntlm authentication, my guess the issue is
> related to firewall. I tried some solution from internet such as ntlm-map and
> python-ntlm but no luck. Any suggestions?
>
> Best and thanks,
>
> MX

Hi,

I see similar errors on my Windows machine, and I think this is down to needing to use the institute proxy. Currently Bio.Entrez uses urllib (version 1), and I don't know if this can be modified to work with python-ntlm or not. What have you tried?

If you are happy to try experimenting, you could modify the Bio.Entrez _open function to use urllib2 with python-ntlm.

If you had a simple proxy (without authentication) you'd do something like this:

import os
os.environ["http_proxy"] = "http://proxyhost.example.com:8080"

See the "Using a proxy" section in the Entrez chapter of the tutorial for more details.

Regards,

Peter

From biopython at maubp.freeserve.co.uk Fri Nov 26 18:12:14 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 23:12:14 +0000
Subject: [Biopython] Biopython 1.56
Message-ID:

Dear Biopythoneers,

On behalf of the developers, I'm pleased to announce we released Biopython 1.56 earlier today. For more details please see:
http://news.open-bio.org/news/2010/11/biopython-1-56-released/

Please note this will probably be the last release to support Python 2.4, see:
http://news.open-bio.org/news/2010/11/dropping-python24-support/

(At least) 13 people have contributed to this release, including 6 new people - thank you all:

* Andrea Pierleoni (first contribution)
* Bart de Koning (first contribution)
* Bartek Wilczynski
* Bartosz Telenczuk (first contribution)
* Cymon Cox
* Eric Talevich
* Frank Kauff
* Michiel de Hoon
* Peter Cock
* Phillip Garland (first contribution)
* Siong Kong (first contribution)
* Tiago Antao
* Uri Laserson (first contribution)

Source distributions and Windows installers are available from the downloads page on the Biopython website:
http://www.biopython.org/wiki/Download

As usual, feedback is most welcome on the mailing lists (or bugzilla).
Regards,

Peter

From arkanion at gmail.com Sun Nov 28 13:06:15 2010
From: arkanion at gmail.com (Ömer)
Date: Sun, 28 Nov 2010 20:06:15 +0200
Subject: [Biopython] Using ClustalW Webtool
Message-ID:

To make a multiple alignment based on WebTools, I followed the instructions in the Tutorial, but I have a problem with creating the output file for ClustalW where it says:

"By default ClustalW will generate an alignment and guide tree file with names based on the input FASTA file, in this case opuntia.aln and opuntia.dnd, but you can override this or make it explicit:"

However, it does not generate these files. What could be the problem and how can I solve it?

Thanks,

Greetings

From biopython at maubp.freeserve.co.uk Sun Nov 28 16:07:02 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Nov 2010 21:07:02 +0000
Subject: [Biopython] Using ClustalW Webtool
In-Reply-To:
References:
Message-ID:

On Sun, Nov 28, 2010 at 6:06 PM, Ömer wrote:
> To make a multiple alignment based on WebTools, I followed the
> instructions in Tutorial, but I have a problem with creating outfile for
> Clustalw where it says:
>
> "By default ClustalW will generate an alignment and guide tree file with
> names based on the input FASTA file, in this case opuntia.aln and
> opuntia.dnd, but you can override this or make it explicit:"
>
> However, it does not generate these files. What could be the
> problem and how can I solve it?

Hello Ömer,

I'm puzzled by the title of your email, "Using ClustalW Webtool". Are you actually running ClustalW on your machine, or using a website with ClustalW on it?

The Biopython tutorial is talking about the command line tool ClustalW which you must first install on your own computer. Have you installed it? If so how?

How are you trying to run ClustalW?
Regards,

Peter

From biopython at maubp.freeserve.co.uk Sun Nov 28 17:59:19 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Nov 2010 22:59:19 +0000
Subject: [Biopython] Using ClustalW Webtool
In-Reply-To: <4FDBC381BC454AFC98FCB32A8930B901@MYWORLD>
References: <4FDBC381BC454AFC98FCB32A8930B901@MYWORLD>
Message-ID:

On Sun, Nov 28, 2010 at 9:55 PM, Ömer AN wrote:
>
> Hi Peter,
>
> Actually I solved the problem on my own by installing ClustalW
> to my local machine and setting the directory.

Good.

> But is it possible to run it over internet i.e. telling the url to the
> python ? If so, how?

In theory that is possible - for instance Biopython has code to talk to the NCBI servers to run BLAST remotely. In this case, why bother? For most datasets ClustalW is fast enough to run on an ordinary computer.

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 03:48:17 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 08:48:17 +0000
Subject: [Biopython] Using ClustalW Webtool
In-Reply-To: <5ABA7C6C9DD5440FAED2AEC5FEF4D29B@MYWORLD>
References: <4FDBC381BC454AFC98FCB32A8930B901@MYWORLD> <5ABA7C6C9DD5440FAED2AEC5FEF4D29B@MYWORLD>
Message-ID:

On Mon, Nov 29, 2010 at 12:40 AM, Ömer AN wrote:
>
> One last vital question I have. I installed mafft.bat file to run an
> alignment similar to clustalw.exe. It is better for me because it has an
> option to choose the matrix such as Blosum 45 , 80 etc. However, it does not
> generate an output file by default for the alignment ! How can I create an
> output file? Which option should i switch on or what value should i assign?
> So that i can read the alignment file via AlignIO? I checked everywhere but
> couldn't find !
>

The Tutorial doesn't (currently) show any MAFFT examples, but you would use the MafftCommandline wrapper class from the module Bio.Align.Applications very similarly to how you use ClustalW with ClustalwCommandline.

Peter

P.S.
Please CC the mailing list so that other people can benefit from the discussion, or help. Thanks!

From arkanion at gmail.com Mon Nov 29 05:26:26 2010
From: arkanion at gmail.com (Ömer)
Date: Mon, 29 Nov 2010 12:26:26 +0200
Subject: [Biopython] Mafft Alignment Output file
Message-ID:

I installed the mafft.bat file to run an alignment similar to the clustalw.exe one shown in the Tutorial. I prefer MAFFT because it has an option to choose the matrix, such as BLOSUM 45, 80, etc.

However, it does not generate an output file for the alignment by default! How can I create an output file? Which option should I switch on, or what value should I assign, so that I can read the alignment file via AlignIO? It has parameters for the output file, but the file itself is not created! It is simply generated by default when using ClustalwCommandline. I checked everywhere but couldn't find an answer! Any idea?

From biopython at maubp.freeserve.co.uk Mon Nov 29 06:05:01 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 11:05:01 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To:
References:
Message-ID:

On Mon, Nov 29, 2010 at 10:26 AM, Ömer wrote:
> I installed mafft.bat file to run an alignment similar to clustalw.exe
> shown in the Tutorial. I prefer Mafft because it has an option to
> choose the matrix such as Blosum 45 , 80 etc.

I think MAFFT is also faster, especially with large alignments.

> However, it does not generate an output file by default for the alignment !
> How can I create an output file? Which option should i switch on or what
> value should i assign? So that i can read the alignment file via AlignIO?
> It has parameters for the output file but the file cannot be created itself
> ! It is simply generated by default when used ClustalWCommandline.
> I checked everywhere but couldn't find ! Any idea?

What are you doing right now? Can you show us your code?
Are you using the MafftCommandline wrapper as I suggested here?:
http://lists.open-bio.org/pipermail/biopython/2010-November/006894.html

I'm looking at the code right now, and it could do with a basic example.

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 06:32:11 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 11:32:11 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To:
References:
Message-ID:

On Mon, Nov 29, 2010 at 11:05 AM, Peter wrote:
> On Mon, Nov 29, 2010 at 10:26 AM, Ömer wrote:
>> I installed mafft.bat file to run an alignment similar to clustalw.exe
>> shown in the Tutorial. I prefer Mafft because it has an option to
>> choose the matrix such as Blosum 45 , 80 etc.
>
> I think MAFFT is also faster, especially with large alignments.
>
>> However, it does not generate an output file by default for the alignment !
>> How can I create an output file? Which option should i switch on or what
>> value should i assign? So that i can read the alignment file via AlignIO?
>> It has parameters for the output file but the file cannot be created itself
>> ! It is simply generated by default when used ClustalWCommandline.
>> I checked everywhere but couldn't find ! Any idea?
>
> What are you doing right now? Can you show us your code?
>
> Are you using the MafftCommandline wrapper as I suggested here?:
> http://lists.open-bio.org/pipermail/biopython/2010-November/006894.html
>
> I'm looking at the code right now, and it could do with a basic example.
>
> Peter
>

How about something like this:

from Bio.Align.Applications import MuscleCommandline
muscle_exe = r"C:\Program Files\Aligments\muscle3.8.31_i86win32.exe"
in_file = r"C:\My Documents\unaligned.fasta"
out_file = r"C:\My Documents\aligned.fasta"
muscle_cline = MuscleCommandline(muscle_exe, input=in_file, out=out_file)
print(muscle_cline)
stdout, stderr = muscle_cline()

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 07:06:18 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 12:06:18 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To:
References:
Message-ID:

On Mon, Nov 29, 2010 at 11:05 AM, Peter wrote:
> On Mon, Nov 29, 2010 at 10:26 AM, Ömer wrote:
>> I installed mafft.bat file to run an alignment similar to clustalw.exe
>> shown in the Tutorial. I prefer Mafft because it has an option to
>> choose the matrix such as Blosum 45 , 80 etc.
>
> I think MAFFT is also faster, especially with large alignments.

Sorry, I meant MUSCLE, not MAFFT. I haven't compared the speed of MAFFT.

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 07:11:49 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 12:11:49 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To: <12DE657F8EB44E7B8007C25E607E6731@MYWORLD>
References: <12DE657F8EB44E7B8007C25E607E6731@MYWORLD>
Message-ID:

On Mon, Nov 29, 2010 at 11:37 AM, Ömer AN wrote:
> Here is my code in the attachment. My problem is mafft.bat does not create
> an output file.

What are the contents of your mafft.bat file? Batch files are just plain text.

How did you install MAFFT on Windows? Last time I tried it wasn't obvious how to do this, and I gave up.
From your example,

import os
from Bio.Align.Applications import MafftCommandline
mafft_bat = r"mafft.bat"
assert os.path.isfile(mafft_bat), "mafft.bat file is missing"
mafft_cline = MafftCommandline(mafft_bat, input="sequences.fasta", clustalout="on", treeout="on", bl="45")
stdout, stderr = mafft_cline()

What happens if you add:

print(stdout)
print(stderr)

Are there any error messages?

Peter

P.S. The clustalout and treeout switches should really be booleans, the wrapper documentation is unclear here. e.g.

mafft_cline = MafftCommandline(mafft_bat, input="sequences.fasta", clustalout=True, treeout=True, bl="45")

From biopython at maubp.freeserve.co.uk Mon Nov 29 16:35:19 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 21:35:19 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To: <3E8CE55194194B02AA8A5DD2170E8DD3@MYWORLD>
References: <3E8CE55194194B02AA8A5DD2170E8DD3@MYWORLD>
Message-ID:

On Mon, Nov 29, 2010 at 8:51 PM, Ömer AN wrote:
>
> How can i write alignment output to a file?
> I can actually see the alignment when i use the command
>
> print stdout
>
> on the shell, but how can i make it write on a file?
>
> Thanks in advance

If you have the alignment in the string stdout, you can save it to a file just like saving any data to a text file in Python:

handle = open("output.txt", "w")
handle.write(stdout)
handle.close()

Most command line tools will write the file themselves rather than printing the output to screen (stdout), or give you the choice. It looks like MAFFT only writes to stdout - which could be redirected to a file, but I think the above is clearer.
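On the redirect option mentioned above: the same capture-and-save step works for any tool that only prints to stdout, without going through the Biopython wrapper. The following is a minimal sketch using only the standard library. The names run_and_save and fake_aligner are invented for this example, and the MAFFT command in the comment is an assumption about a local install, not something taken from the thread.

```python
import subprocess
import sys


def run_and_save(cmd, out_path):
    """Run cmd (a list of arguments), save its stdout to out_path, return its stderr."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    with open(out_path, "w") as handle:
        handle.write(result.stdout)
    return result.stderr


# Stand-in for an aligner that only prints to stdout. With MAFFT installed, the
# command might instead be ["mafft.bat", "sequences.fasta"] (an assumption here).
fake_aligner = [sys.executable, "-c", "print('>seq1'); print('ACGT-')"]
```

Calling run_and_save(fake_aligner, "aligned.fasta") would leave the captured text in aligned.fasta, which could then be read back like any other alignment file. check=True makes a failing command raise an exception instead of silently writing an empty file.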
Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 17:31:10 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 22:31:10 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To: <97485881576F49C4A17750F34098CA81@MYWORLD>
References: <3E8CE55194194B02AA8A5DD2170E8DD3@MYWORLD> <97485881576F49C4A17750F34098CA81@MYWORLD>
Message-ID:

On Mon, Nov 29, 2010 at 9:44 PM, Ömer AN wrote:
>
> Now everything is clear, this is a nice alternative to replace with !
>
> Thanks a lot for your assistance !
>

I'm glad we solved it - I haven't used MAFFT recently so this was not fresh in my mind. I have tried to make some clarifications to the MafftCommandline documentation to improve this for the next release of Biopython. So thank you for your feedback.

Regards,

Peter

From cjfields at illinois.edu Mon Nov 1 14:50:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Nov 2010 09:50:59 -0500
Subject: [Biopython] Entrez.efetch problem when querying pccompound database
In-Reply-To:
References:
Message-ID: <9B3549C1-7BC8-4D2D-BB6C-751C282AA8A1@illinois.edu>

Try using esummary instead of efetch to get that information programmatically. Some database information can't be retrieved via efetch (I think pcassay/pccompound are two of those), but the summary of the information for any database is retrievable.

Using the BioPerl eutil interface, one does this to just dump the returned information. One can also get at the various bits of that data programmatically as well using generic constructs, but you have to know the tag names for the data you are looking for.
There should be an analogous Biopython way to do this.

===================================

use Bio::DB::EUtilities;

my $term = '"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]';

my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                     -term => $term,
                                     -email => 'cjfields at bioperl.org',
                                     -db => 'pccompound',
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die "Can't retrieve history data";

$eutil->set_parameters(-eutil => 'esummary',
                       -history => $hist);

$eutil->print_all;

===================================

chris

On Nov 1, 2010, at 6:24 AM, saikari keitele wrote:

> Many thanks for your reply.
> Does that mean that pccompound and pcassay databases can not be queried
> programmatically, they just have to be queried manually?
> Thanks again
>
> On Fri, Oct 29, 2010 at 1:13 PM, Peter wrote:
>
>> On Fri, Oct 29, 2010 at 12:26 PM, saikari keitele
>> wrote:
>>> Hi,
>>>
>>> I'm using BioPython to query the NCBI pccompound database.
>>> I'm trying to retrieve the molecular weight of a compound given its
>>> InChIKey.
>>> Gettting the ID of the compound with esearch works fine. For instance:
>>>
>>> Entrez.esearch(db="pccompound",
>>> term='"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]')
>>>
>>> However, when I try to retrieve the record's content with efetch from the ID
>>> returned by esearch, like this:
>>>
>>> Entrez.efetch(db="pcassay", id="2244")
>>>
>>> I get the following response:
>>> ...
>>> Error occurred: Report 'ASN1' not found in 'pccompound' presentation
>>> ...
>>>
>>> I've tried to use other return types and return modes, like for instance
>>>
>>> Entrez.efetch(db="pcassay", id="2244", rettype="abstract", retmode="text")
>>>
>>> but I have not succeeded in retrieveing this compound's record's content.
>>> Many thanks for any help on how to retrieve information on a compound from
>>> pccompound.
>>>
>>> Best wishes
>>>
>>> Saikari
>>
>> If you go to the webpage for this,
>> http://www.ncbi.nlm.nih.gov/pcassay?term=2244
>> then you don't actually get any download links - rather it connects to the
>> BioAssay server to retrieve data. My guess is the NCBI don't support
>> efetch for the pcassay database - you'll have to email them and ask.
>>
>> Peter
>>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From saikari78 at gmail.com Mon Nov 1 15:53:45 2010
From: saikari78 at gmail.com (saikari keitele)
Date: Mon, 1 Nov 2010 15:53:45 +0000
Subject: [Biopython] Entrez.efetch problem when querying pccompound database
In-Reply-To: <9B3549C1-7BC8-4D2D-BB6C-751C282AA8A1@illinois.edu>
References: <9B3549C1-7BC8-4D2D-BB6C-751C282AA8A1@illinois.edu>
Message-ID:

Many thanks!
By using esummary as you suggest I can retrieve all the information from pccompound and pcassay with BioPython.
For instance, to retrieve the molecular weight of a compound given its InChIKey:

    from Bio import Entrez

    # Look up the compound ID from its InChIKey
    handle = Entrez.esearch(db="pccompound",
                            term='"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey]')
    records = Entrez.read(handle)

    # Fetch the summary for the first hit and pull out the molecular weight
    summary = Entrez.read(Entrez.esummary(db="pccompound", id=records["IdList"][0]))
    molWeight = summary[0]['MolecularWeight']

Thanks again.

On Mon, Nov 1, 2010 at 2:50 PM, Chris Fields wrote:

> Try using esummary instead of efetch to get that information
> programmatically. Some database information can't be retrieved via efetch
> (I think pcassay/pccompound are two of those), but the summary of the
> information for any database is retrievable.
>
> Using the BioPerl eutil interface, one does this to just dump the returned
> information. One can also get at the various bits of that data
> programmatically as well using generic constructs, but you have to know the
> tag names for the data you are looking for. There should be an analogous
> Biopython way to do this.
> >>> > >>> Best wishes > >>> > >>> Saikari > >> > >> If you go to the webpage for this, > >> http://www.ncbi.nlm.nih.gov/pcassay?term=2244 > >> then you don't actually get any download links - rather it connects to > the > >> BioAssay server to retrieve data. My guess is the NCBI don't support > >> efetch for the pcassay database - you'll have to email them and ask. > >> > >> Peter > >> > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > From eric.talevich at gmail.com Tue Nov 2 01:20:30 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 1 Nov 2010 21:20:30 -0400 Subject: [Biopython] getting the parent of a Clade In-Reply-To: References: Message-ID: On Mon, Nov 1, 2010 at 1:41 AM, Michael Thon wrote: > > On Oct 31, 2010, at 8:23 PM, Eric Talevich wrote: > > > > Is the operation you're doing here part of something you'd like to see > implemented as a tree method? > > > > > Maybe - it seems to me that if I can access children of a clade from the > clade, then I should also be able to go the other way and access the parent. > I don't know how often people would need this functionality though. > > Does a Clade contain a reference to its tree? I have a recursive function > that does some crunching on a Clade and then recursively processes the child > clades. I could pass in the tree object as well, but I figure that a Clade > must know about its tree so there should be some way to access it. > PyCogent does work that way, but Bio.Phylo's data structure is simpler -- a Tree has a single root Clade (tree.root or tree.clade), and each Clade has a plain Python list of child Clades (clade.clades), all the way down. It doesn't track any references to the parent or the original tree, so the tree can never have an inconsistent internal state... because there is no internal state. 
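To make the structure described above concrete, here is a minimal sketch of a parent lookup built in one pass over a tree, in the spirit of the all_parents cookbook entry mentioned earlier in the thread. The Clade class below is a hypothetical stand-in with only a name and a clades list, not Bio.Phylo's own class, so the example runs without Biopython; with a real tree one would presumably walk it with tree.find_clades() rather than the hand-written stack used here.

```python
class Clade:
    """Hypothetical stand-in for a Bio.Phylo clade: a name plus a list of children."""

    def __init__(self, name, clades=None):
        self.name = name
        self.clades = clades or []


def all_parents(root):
    """Map every non-root clade to its parent in a single O(n) traversal."""
    parents = {}
    stack = [root]
    while stack:
        clade = stack.pop()
        for child in clade.clades:
            parents[child] = clade
            stack.append(child)
    return parents


leaf_a, leaf_b, leaf_c = Clade("A"), Clade("B"), Clade("C")
inner = Clade("AB", [leaf_a, leaf_b])
root = Clade("root", [inner, leaf_c])

parent_of = all_parents(root)
print(parent_of[leaf_a].name)   # the parent of A is the AB clade
print(root in parent_of)        # the root has no entry, since it has no parent
```

Because the dictionary lives outside the clades, the tree itself still holds no parent references; the cost is the extra memory for the lookup table, and any code using it has to allow for the root being absent.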
I haven't needed the parent references so far for the Tree/Clade methods or my own scripts, surprisingly. Calling get_path once or twice has been enough. (I could probably speed up common_ancestor by using the all_parents dictionary approach in the cookbook, at the expense of memory.)

Do you or anyone else want to try plugging that all_parents function into your code to see if it helps significantly? If it does, I could add it as a Tree/Clade method in the next Biopython release.

Best,
Eric

From mike.thon at gmail.com Tue Nov 2 09:58:46 2010
From: mike.thon at gmail.com (Michael Thon)
Date: Tue, 2 Nov 2010 10:58:46 +0100
Subject: [Biopython] getting the parent of a Clade
In-Reply-To:
References:
Message-ID: <4E425445-5A7E-4FFB-B7FF-82ED243AACB4@gmail.com>

Hi Eric
>
> Do you or anyone else want to try plugging that all_parents function into your code to see if it helps significantly? If it does, I could add it as a Tree/Clade method in the next Biopython release.
>

I can try it - I have a few 1000 trees to parse so any differences in performance should be more obvious.

But first, I realized that I should have explained the problem I'm solving in more detail, to see if I'm approaching it the right way. I need to visit every node in the tree, and then compare the node to its parent and do some calculations. I'm doing this by writing a recursion that starts with tree.clade and then calls itself twice with clade.clades[0] and clade.clades[1]. Then within the function I need to get the parent clade and do the calculations.

    def crunch_clade(tree, clade):
        compute_data(clade, get_parent(tree, clade))
        crunch_clade(tree, clade.clades[0])
        crunch_clade(tree, clade.clades[1])

Is there a better way to do it? Like maybe starting with the terminal clades?
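One possible restructuring of a recursion like the one above is to hand each call its parent as an argument, so that no lookup is needed, and to let the loop over children act as the base case for terminal clades. This is only a sketch: Clade is a hypothetical stand-in for the Bio.Phylo class, and compute_data is a placeholder that simply records (child, parent) name pairs.

```python
class Clade:
    """Hypothetical stand-in: like a Bio.Phylo clade, it keeps its children in .clades."""

    def __init__(self, name, clades=None):
        self.name = name
        self.clades = clades or []


visited = []


def compute_data(clade, parent):
    """Placeholder for the real calculation; records which pair was compared."""
    visited.append((clade.name, parent.name))


def crunch_clade(clade, parent=None):
    """Visit clade, then recurse into its children with clade as their parent."""
    if parent is not None:          # the root has nothing to be compared with
        compute_data(clade, parent)
    for child in clade.clades:      # empty for terminal clades, so the recursion stops
        crunch_clade(child, clade)


root = Clade("root", [Clade("AB", [Clade("A"), Clade("B")]), Clade("C")])
crunch_clade(root)
print(visited)
```

Looping over clade.clades rather than indexing [0] and [1] also means the function does not assume a strictly bifurcating tree.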
Mike

From eric.talevich at gmail.com Tue Nov 2 15:44:36 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 2 Nov 2010 11:44:36 -0400
Subject: [Biopython] getting the parent of a Clade
In-Reply-To: <4E425445-5A7E-4FFB-B7FF-82ED243AACB4@gmail.com>
References: <4E425445-5A7E-4FFB-B7FF-82ED243AACB4@gmail.com>
Message-ID:

On Tue, Nov 2, 2010 at 5:58 AM, Michael Thon wrote:

> Hi Eric
> >
> > Do you or anyone else want to try plugging that all_parents function into
> > your code to see if it helps significantly? If it does, I could add it as a
> > Tree/Clade method in the next Biopython release.
> >
>
> I can try it - I have a few 1000 trees to parse so any differences in
> performance should be more obvious.
>
> But first, I realized that I should have explained the problem I'm solving
> in more detail, to see if I'm approaching it the right way. I need to visit
> every node in the tree, and then compare the node to its parent and do some
> calculations. I'm doing this by writing a recursion that starts with
> tree.clade and then calls itself twice with clade.clades[0] and
> clade.clades[1]. Then within the function I need to get the parent clade
> and do the calculations.
>
> def crunch_clade(tree, clade):
>     compute_data(clade, get_parent(tree, clade))
>     crunch_clade(tree, clade.clades[0])
>     crunch_clade(tree, clade.clades[1])
>
> Is there a better way to do it? Like maybe starting with the terminal
> clades?
>
> Mike
>

The tree traversal functions in Bio.Phylo are fairly efficient and flexible. I'm not sure if the traversal order matters for your function, but you could try something like:

    parent_lookup = all_parents(tree)  # from the cookbook
    for clade in tree.find_clades():
        compute_data(clade, parent_lookup[clade])

Or, possibly:

    for parent in tree.get_nonterminals():
        for child in parent:
            compute_data(child, parent)

Notice that get_terminals() and get_nonterminals() are simplified versions of find_clades().
They return plain lists instead of being generator functions, but the filtering arguments aren't as flexible. Also, see the tutorial section 12.4.1 on traversal: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc167 In particular, if you need to do level-order search, it looks like this: tree.find_clades(order='level') Hope that helps, Eric From matsen at fhcrc.org Wed Nov 3 12:06:05 2010 From: matsen at fhcrc.org (Erick Matsen) Date: Wed, 3 Nov 2010 05:06:05 -0700 Subject: [Biopython] Lightweight version of Biopython? Message-ID: Hello there Biopython community-- We're writing some python code to use SCons for reproducible bioinformatics research with intelligent dependencies. As part of the project, we often need to do very simple bioinformatics tasks, such as reading in various formats and spitting out others. We could use Biopython for such things, but it's a very heavy dependency for such trivial tasks. I'm curious if there exists a Biopython "lite". The ideal situation would be a tiny module that we could include directly in our project. I have searched the Biopython mailing list and have yet to find anything. Thanks in advance, Erick From biopython at maubp.freeserve.co.uk Wed Nov 3 12:10:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Nov 2010 12:10:58 +0000 Subject: [Biopython] Lightweight version of Biopython? In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 12:06 PM, Erick Matsen wrote: > Hello there Biopython community-- > > We're writing some python code to use SCons for reproducible > bioinformatics research with intelligent dependencies. As part of the > project, we often need to do very simple bioinformatics tasks, such as > reading in various formats and spitting out others. We could use > Biopython for such things, but it's a very heavy dependency for such > trivial tasks. > > I'm curious if there exists a Biopython "lite". 
The ideal situation > would be a tiny module that we could include directly in our project. > I have searched the Biopython mailing list and have yet to find > anything. > > Thanks in advance, > > Erick Hi Erick, Why do you consider Biopython a heavy dependency? It can be installed with no 3rd party libraries (although we do strongly recommend NumPy, if you are not using anything numerical you don't need it). Peter From matsen at fhcrc.org Wed Nov 3 12:16:07 2010 From: matsen at fhcrc.org (Erick Matsen) Date: Wed, 3 Nov 2010 05:16:07 -0700 Subject: [Biopython] Lightweight version of Biopython? In-Reply-To: <5626_1288786261_4CD15155_5626_778761_1_AANLkTimp3rEVchwPr35NR=e3CXhk+0MyqMLo=CrUWawp@mail.gmail.com> References: <5626_1288786261_4CD15155_5626_778761_1_AANLkTimp3rEVchwPr35NR=e3CXhk+0MyqMLo=CrUWawp@mail.gmail.com> Message-ID: Hello Peter-- Thanks for the quick response. Perhaps "heavy" isn't the right word, but it definitely seems overkill when all we want to do is convert between three file formats (FASTA, phylip, and Stockholm). Furthermore, something little we can include directly in the project (with credit given where it is due) would be ideal. An installation script could look for Biopython and use it if it's there, or use the lite version if not. Thanks, Erick On Wed, Nov 3, 2010 at 5:10 AM, Peter wrote: > On Wed, Nov 3, 2010 at 12:06 PM, Erick Matsen wrote: >> Hello there Biopython community-- >> >> We're writing some python code to use SCons for reproducible >> bioinformatics research with intelligent dependencies. As part of the >> project, we often need to do very simple bioinformatics tasks, such as >> reading in various formats and spitting out others. We could use >> Biopython for such things, but it's a very heavy dependency for such >> trivial tasks. >> >> I'm curious if there exists a Biopython "lite". The ideal situation >> would be a tiny module that we could include directly in our project. 
>> I have searched the Biopython mailing list and have yet to find >> anything. >> >> Thanks in advance, >> >> Erick > > Hi Erick, > > Why do you consider Biopython a heavy dependency? It can > be installed with no 3rd party libraries (although we do strongly > recommend NumPy, if you are not using anything numerical > you don't need it). > > Peter > From biopython at maubp.freeserve.co.uk Wed Nov 3 12:38:29 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Nov 2010 12:38:29 +0000 Subject: [Biopython] Lightweight version of Biopython? In-Reply-To: References: <5626_1288786261_4CD15155_5626_778761_1_AANLkTimp3rEVchwPr35NR=e3CXhk+0MyqMLo=CrUWawp@mail.gmail.com> Message-ID: On Wed, Nov 3, 2010 at 12:16 PM, Erick Matsen wrote: > Hello Peter-- > > Thanks for the quick response. > > Perhaps "heavy" isn't the right word, but it definitely seems overkill > when all we want to do is convert between three file formats (FASTA, > phylip, and Stockholm). Furthermore, something little we can include > directly in the project (with credit given where it is due) would be > ideal. An installation script could look for Biopython and use it if > it's there, or use the lite version if not. > Hi Erick, Under our licence you could take a snapshot copy of the bits you need and include them in your application. However, the bits you'd need for Bio.AlignIO would basically be the core sequence stuff in Biopython (at very least Bio.Alphabet, Bio.Seq, Bio.SeqRecord, Bio.SeqFeature, Bio.SeqIO, Bio.Align, and Bio.AlignIO). This adds up to a lot of files, but would be possible... How are you distributing your application? If using py2exe or a source code bundle you could in theory bundle in (part of) Biopython. If using pip or easy_install (i.e. PyPi packages) then it *should* take care of a dependency on Biopython. 
Suppose hypothetically we had a "lite" or "seq-core" version of Biopython containing Bio.SeqIO, Bio.AlignIO and their dependencies (but excluding Bio.PDB, Bio.Phylo, Bio.Cluster, Bio.Blast, BioSQL etc). The download would be smaller (say a third of the size), but it isn't that big to start with. There wouldn't be the (soft) dependency on NumPy. Neither of these benefits seems compelling to me.

Regards,

Peter

From matsen at fhcrc.org Wed Nov 3 12:53:55 2010
From: matsen at fhcrc.org (Erick Matsen)
Date: Wed, 3 Nov 2010 05:53:55 -0700
Subject: [Biopython] Lightweight version of Biopython?
In-Reply-To: <9068_1288787911_4CD157C7_9068_879472_1_AANLkTikAPuppz=kU6HYM+-TgARWdvKHPtVGyJXXt00UF@mail.gmail.com>
References: <5626_1288786261_4CD15155_5626_778761_1_AANLkTimp3rEVchwPr35NR=e3CXhk+0MyqMLo=CrUWawp@mail.gmail.com> <9068_1288787911_4CD157C7_9068_879472_1_AANLkTikAPuppz=kU6HYM+-TgARWdvKHPtVGyJXXt00UF@mail.gmail.com>
Message-ID:

Peter--

Thank you for the complete response. We will have to have a run-off between a biopython install and our homegrown file format converters.

Erick

On Wed, Nov 3, 2010 at 5:38 AM, Peter wrote:
> On Wed, Nov 3, 2010 at 12:16 PM, Erick Matsen wrote:
>> Hello Peter--
>>
>> Thanks for the quick response.
>>
>> Perhaps "heavy" isn't the right word, but it definitely seems overkill
>> when all we want to do is convert between three file formats (FASTA,
>> phylip, and Stockholm). Furthermore, something little we can include
>> directly in the project (with credit given where it is due) would be
>> ideal. An installation script could look for Biopython and use it if
>> it's there, or use the lite version if not.
>>
>
> Hi Erick,
>
> Under our licence you could take a snapshot copy of the bits you need
> and include them in your application.
However, the bits you'd need for > Bio.AlignIO would basically be the core sequence stuff in Biopython > (at very least Bio.Alphabet, Bio.Seq, Bio.SeqRecord, Bio.SeqFeature, > Bio.SeqIO, Bio.Align, and Bio.AlignIO). This adds up to a lot of files, > but would be possible... How are you distributing your application? If > using py2exe or a source code bundle you could in theory bundle in > (part of) Biopython. > > If using pip or easy_install (i.e. PyPi packages) then it *should* take > care of a dependency on Biopython. > > Suppose hypothetically with had a "lite" or "seq-core" version of > Biopython containing Bio.SeqIO, Bio.AlignIO and their dependencies > (but excluding Bio.PDB, Bio.Phylo, Bio.Cluster, Bio.Blast, BioSQL etc). > The download would be smaller (say a third of the size), but it isn't that > big to start with. There wouldn't be the (soft) dependency on NumPy. > Neither of these benefits seems compelling to me. > > Regards, > > Peter > From mcasfrox at gmail.com Wed Nov 3 13:24:23 2010 From: mcasfrox at gmail.com (Merche Castillo) Date: Wed, 3 Nov 2010 14:24:23 +0100 Subject: [Biopython] biopython in eclipse Message-ID: Hi there, I'm working in bioinformatics and I just began using python. I have installed a pydev in eclipse and I'd like to know if I can use biopython there too, and how i should install it. Could anyone help me? Thanks Best Regards, Merche From anaryin at gmail.com Wed Nov 3 13:46:08 2010 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Wed, 3 Nov 2010 14:46:08 +0100 Subject: [Biopython] biopython in eclipse In-Reply-To: References: Message-ID: Hello Merche, Once you install Biopython, it should be available just like any other library in PyDev. If you install it somewhere else (e.g. /home/merche/biopython) make sure that that directory is in your PYTHONPATH. Check this thread for details on how to generally configure the PyDev Interpreter. 
Also, please have a look at this thread for an older discussion on the subject: http://lists.open-bio.org/pipermail/biopython/2009-February/004910.html It should be easy enough! :) Jo?o [...] Rodrigues http://doeidoei.wordpress.org On Wed, Nov 3, 2010 at 2:24 PM, Merche Castillo wrote: > Hi there, > > > I'm working in bioinformatics and I just began using python. I have > installed a pydev in eclipse and I'd like to know if I can use biopython > there too, and how i should install it. > > Could anyone help me? > > Thanks > > Best Regards, > Merche > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From jp.verta at gmail.com Wed Nov 3 18:58:18 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Wed, 3 Nov 2010 14:58:18 -0400 Subject: [Biopython] Primer3Commandline rookie question Message-ID: Hi all, I'm just starting with Python so mine's a rookie question. I'm trying to write a little program that would do primer design with Primer3Commandline, yet I can't figure out how to access the output. 
For example, when I try the same example as in the Application.py file under Primer3Commandline function ###### from Bio.Emboss.Applications import Primer3Commandline from Bio.Emboss import Primer3 sequence_handle = open("sequence", "w").write(">test\nCGGTCGCCGGACTATCAAATGGCCAGCGCTGAATTGTAGTACATTTCTCTGTCCTTTAATTCACAGGATTCTGTCATGTAAGATGCGCGATTACAGGTTTAACTGATCCAAACTTCTGCAAAACCGATAAAGAACCTAAATAAAGGAAAGGGACACTGCATAGATACAAACAAACGAAACGTTATCTCAGCAGGATTCATGGCTTCTGAAATTCAACTCATCACTATTGGACCTTCTCATTTTTGTGAAAAGGCACGATGGGGTTTGGACCGTGCTGGAATTTCTTATAAGGAGTCAAAGCATGCTCCTGTATTTCACATGTTATACACAAGGGGTTTGGGACAGGGAACCAGCTGTCCCAAGCTTGTACTAGGTGAGGGCAAAAATAAAGTGGTGTTGCATGAATCATCAGATATCTTGAAGTTTGCTGATGAGAACATCATGTCTGAAGAGGATCGCTTGTATCCATCGAACCTTGAGCAGTCTGTGCAAGAGTGGGGGCTGCATTCAAGGGAAAGGTTGTTAACTTGGTGTTTAATGCCATTGTTAAAGCCTCTCGTATATAGAACAGTTGGTTGCAACATGCCTGCAGCAAAAGATCATTCTCTTGATAAGATTCGTTGTATATTTTCAAGAGGTAGATAATGTGCTTGCTGATGGCCGGCCTTTCCTTTGTGGCAACGAGTTTACTGCAGCTGACTTGACATTTGCTTCACTTGCTGGTCCTGTACTATGCCCAGTGGGATATGGCACCTATCTAGTACCAACAGAGGAACTTCCATCAGAGATGTCTGAAGTTACCCTTTCTTTAAGAGAGACACTAGCAGGGAAGCATGCATTGAGGATGTACGAAACTGAAAGACACAGGAAAGTGGCTTCGAAGGTGCTAGTGACCTGACTCAGTGCATTTACACCGTGGCCTTTGTGTTCTGTCATATTATTGGCAACTAATATTGATTTACGTAAACTGCATACAGCTTTTCTTTCTTGGGAGTGAATCTTTTGAGCAATTTGCTGTCTTCATATAGCAATAATTTGAGAAAGGACCAAGATTTATTAAACTTCGTATTTTGTTTTTTGAATGTTGATATCTTTGTTGTTATTTGAGCAAGCTTAAGCTGCCTAGAACTATAATTCGGTTATGGCATGGGCATGTATCATTCAACCTTGATCTTTGTGCCTTTTTAACTAGAACCCATTTTTTTCAAAAAA\n") test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True) test.explainflag = True test.oligosize=20 test.psizeopt=200 test.outfile = "output.pr3" print test ####### from here on I've understood that I have to parse the output with Primer3.read(), but an error message tells me that the output.pr3 file doesn't exist. 
####### test1 = Primer3.read(open("output.pr3", "r")) Traceback (most recent call last): File "", line 1, in IOError: [Errno 2] No such file or directory: 'output.pr3' ####### What would be the proper way to proceed? Thanks in advance! JP Verta From biopython at maubp.freeserve.co.uk Wed Nov 3 19:13:36 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Nov 2010 19:13:36 +0000 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 6:58 PM, Jukka-Pekka Verta wrote: > Hi all, > > I'm just starting with Python so mine's a rookie question. > > I'm trying to write a little program that would do primer design with > Primer3Commandline, yet I can't figure out how to access the output. > For example, when I try the same example as in the Application.py > file under Primer3Commandline function > > ###### > > from Bio.Emboss.Applications import Primer3Commandline > from Bio.Emboss import Primer3 > > sequence_handle = open("sequence", "w").write("...\n") Here you've created a file called "sequence" (with no extension). I'd have gone for something like "sequence.fasta" but it doesn't mater. > test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True) Here you should be passing the filename, in your case "sequence". Maybe we should add a check here for people passing non-strings to arguments expecting filenames. Does that help? Peter [Note It is possible to avoid the temporary file by writing to the tool's stdin but this is fiddly and requires you to work with the subprocess module directly.] From jp.verta at gmail.com Wed Nov 3 19:27:30 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Wed, 3 Nov 2010 15:27:30 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: References: Message-ID: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> Thanks Peter for the response. 
Though doing test = Primer3Commandline(sequence="sequence", auto=True, hybridprobe=True) doesn't fix the problem (no output is created). Other suggestions? JP On 2010-11-03, at 3:13 PM, Peter wrote: > On Wed, Nov 3, 2010 at 6:58 PM, Jukka-Pekka Verta wrote: >> Hi all, >> >> I'm just starting with Python so mine's a rookie question. >> >> I'm trying to write a little program that would do primer design with >> Primer3Commandline, yet I can't figure out how to access the output. >> For example, when I try the same example as in the Application.py >> file under Primer3Commandline function >> >> ###### >> >> from Bio.Emboss.Applications import Primer3Commandline >> from Bio.Emboss import Primer3 >> >> sequence_handle = open("sequence", "w").write("...\n") > > Here you've created a file called "sequence" (with no extension). I'd > have gone for something like "sequence.fasta" but it doesn't mater. > >> test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True) > > Here you should be passing the filename, in your case "sequence". > Maybe we should add a check here for people passing non-strings > to arguments expecting filenames. > > Does that help? > > Peter > > [Note It is possible to avoid the temporary file by writing to the tool's > stdin but this is fiddly and requires you to work with the subprocess > module directly.] From chapmanb at 50mail.com Wed Nov 3 19:38:08 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 3 Nov 2010 15:38:08 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> Message-ID: <20101103193808.GQ2726@sobchak.mgh.harvard.edu> JP; > Thanks Peter for the response. > > Though doing > > test = Primer3Commandline(sequence="sequence", auto=True, hybridprobe=True) > > doesn't fix the problem (no output is created). Other suggestions? 
In addition to what Peter mentioned, your code here creates a commandline object, but never actually runs it: > test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True) > test.explainflag = True > test.oligosize=20 > test.psizeopt=200 > test.outfile = "output.pr3" > print test This is probably why you weren't seeing any error messages from passing in a file handle, and explains why there is no output file. You can either call your command line directly: test() or use the Python subprocess module: subprocess.check_call(str(test)) [As an aside, it's good practice to use meaningful variable names. At least a 'cl' for commandline would be nice here and may help you spot these types of issues.] Hope this helps, Brad From jp.verta at gmail.com Wed Nov 3 20:30:19 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Wed, 3 Nov 2010 16:30:19 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: <20101103193808.GQ2726@sobchak.mgh.harvard.edu> References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> Message-ID: <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> Thanks Brad, calling test() results in error Traceback (most recent call last): File "", line 1, in File "/Library/Python/2.6/site-packages/biopython-1.55-py2.6-macosx-10.6-universal.egg/Bio/Application/__init__.py", line 487, in __call__ % (str(self), return_code)) RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned non-zero exit status 127 as does calling import subprocess subprocess.check_call(str(test)) Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 499, in check_call retcode = call(*popenargs, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 486, in call return 
Popen(*popenargs, **kwargs).wait() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 672, in __init__ errread, errwrite) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1201, in _execute_child raise child_exception OSError: [Errno 2] No such file or directory Ummm.... ? JP On 2010-11-03, at 3:38 PM, Brad Chapman wrote: > JP; > >> Thanks Peter for the response. >> >> Though doing >> >> test = Primer3Commandline(sequence="sequence", auto=True, hybridprobe=True) >> >> doesn't fix the problem (no output is created). Other suggestions? > > In addition to what Peter mentioned, your code here creates a > commandline object, but never actually runs it: > >> test = Primer3Commandline(sequence = sequence_handle, auto=True, hybridprobe=True) >> test.explainflag = True >> test.oligosize=20 >> test.psizeopt=200 >> test.outfile = "output.pr3" >> print test > > This is probably why you weren't seeing any error messages from passing in > a file handle, and explains why there is no output file. You can > either call your command line directly: > > test() > > or use the Python subprocess module: > > subprocess.check_call(str(test)) > > [As an aside, it's good practice to use meaningful variable names. > At least a 'cl' for commandline would be nice here and may help > you spot these types of issues.] 
> > Hope this helps, > Brad From biopython at maubp.freeserve.co.uk Wed Nov 3 21:04:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Nov 2010 21:04:21 +0000 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> Message-ID: On Wed, Nov 3, 2010 at 8:30 PM, Jukka-Pekka Verta wrote: > > > Thanks Brad, > calling > test() > results in error > > Traceback (most recent call last): > ... > RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence > -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned > non-zero exit status 127 On Linux error 127 means the command was not found. > as does calling > import subprocess > subprocess.check_call(str(test)) > > Traceback (most recent call last): > ... > OSError: [Errno 2] No such file or directory > Ummm.... ? > JP Have you installed the EMBOSS command line tools? Are they on the system path? What happens at the command line if you type eprimer3 (or the longer version Biopython is trying to use)? Peter From jp.verta at gmail.com Thu Nov 4 16:55:31 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Thu, 4 Nov 2010 12:55:31 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> Message-ID: <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> Thanks Peter, indeed I hadn't. I'm a rookie as I mentioned.... I installed EMBOSS tools (works from command line) and the primer3 program, set the path for primer3_core and I am able to run eprimer3 from the command line. 
When I tried to call my function test() from Python though, I got an error message RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned non-zero exit status 1 Thanks for your patient help! JP On 2010-11-03, at 5:04 PM, Peter wrote: > On Wed, Nov 3, 2010 at 8:30 PM, Jukka-Pekka Verta wrote: >> >> >> Thanks Brad, >> calling >> test() >> results in error >> >> Traceback (most recent call last): >> ... >> RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence >> -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned >> non-zero exit status 127 > > On Linux error 127 means the command was not found. > >> as does calling >> import subprocess >> subprocess.check_call(str(test)) >> >> Traceback (most recent call last): >> ... >> OSError: [Errno 2] No such file or directory >> Ummm.... ? >> JP > > Have you installed the EMBOSS command line tools? Are they on > the system path? What happens at the command line if you type > eprimer3 (or the longer version Biopython is trying to use)? > > Peter From biopython at maubp.freeserve.co.uk Thu Nov 4 17:10:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Nov 2010 17:10:03 +0000 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> Message-ID: On Thu, Nov 4, 2010 at 4:55 PM, Jukka-Pekka Verta wrote: > > Thanks Peter, > > indeed I hadn't. I'm a rookie as I mentioned.... > > I installed EMBOSS tools (works from command line) and the primer3 program, > set the path for primer3_core and I am able to run eprimer3 from the command > line. 
When I tried to call my function test() from Python though, I got an error message > > RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence > -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned non-zero > exit status 1 > > Thanks for your patient help! > > JP Excellent - you are making progress. That RuntimeError is Biopython saying that eprimer3 returned with an error. Command line tools give an integer return code or error level -- zero means success by convention. What happens if you try this command at the command line prompt (outside python), in the directory where your sequence file is: eprimer3 -auto -outfile=output.pr3 -sequence=sequence -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True My guess is you are using a recent version of EMBOSS, in which case you'll see this: Died: Unknown qualifier -oligosize=20 If so you should be using -osizeopt rather than -oligosize, which the Biopython documentation does try to explain - I've made a small clarification here: https://github.com/biopython/biopython/commit/065e235ca76c6656a7f7d73de57c8bf8743ce25f Peter From jp.verta at gmail.com Thu Nov 4 17:33:42 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Thu, 4 Nov 2010 13:33:42 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> Message-ID: <494A9140-3170-4E49-B2C4-14FA5FB70969@gmail.com> You're right, thats the case. 
Still, with -oligosize changed to -osizeopt Python returns me an error and running the eprimer3 on the command line gives me EMBOSS An error in eprimer3.c at line 317: eprimer3: Could not exec primer3_core Segmentation fault JP On 2010-11-04, at 1:10 PM, Peter wrote: > On Thu, Nov 4, 2010 at 4:55 PM, Jukka-Pekka Verta wrote: >> >> Thanks Peter, >> >> indeed I hadn't. I'm a rookie as I mentioned.... >> >> I installed EMBOSS tools (works from command line) and the primer3 program, >> set the path for primer3_core and I am able to run eprimer3 from the command >> line. When I tried to call my function test() from Python though, I got an error message >> >> RuntimeError: Command eprimer3 -auto -outfile=output.pr3 -sequence=sequence >> -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True returned non-zero >> exit status 1 >> >> Thanks for your patient help! >> >> JP > > Excellent - you are making progress. > > That RuntimeError is Biopython saying that eprimer3 returned with an error. > Command line tools give an integer return code or error level -- zero means > success by convention. 
> > What happens if you try this command at the command line prompt (outside > python), in the directory where your sequence file is: > > eprimer3 -auto -outfile=output.pr3 -sequence=sequence > -hybridprobe=True -psizeopt=200 -oligosize=20 -explainflag=True > > My guess is you are using a recent version of EMBOSS, in which case > you'll see this: Died: Unknown qualifier -oligosize=20 > > If so you should be using -osizeopt rather than -oligosize, which the > Biopython documentation does try to explain - I've made a small > clarification here: > > https://github.com/biopython/biopython/commit/065e235ca76c6656a7f7d73de57c8bf8743ce25f > > Peter From biopython at maubp.freeserve.co.uk Thu Nov 4 17:53:54 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Nov 2010 17:53:54 +0000 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: <494A9140-3170-4E49-B2C4-14FA5FB70969@gmail.com> References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> <494A9140-3170-4E49-B2C4-14FA5FB70969@gmail.com> Message-ID: On Thu, Nov 4, 2010 at 5:33 PM, Jukka-Pekka Verta wrote: > > You're right, thats the case. Still, with -oligosize changed to -osizeopt > Python returns me an error and running the eprimer3 on the command > line gives me > > ? EMBOSS An error in eprimer3.c at line 317: > eprimer3: Could not exec primer3_core > Segmentation fault > > JP Wow - it looks like you may have found a bug in EMBOSS itself, getting a "Segmentation fault" is never a good sign. Have you properly installed the Whitehead institute primer3 package? The EMBOSS tool acts as a wrapper, and from the error perhaps it can't find the primer3 tool? You may want to report this to EMBOSS directly... 
See also: http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/eprimer3.html Peter From jp.verta at gmail.com Thu Nov 4 18:51:41 2010 From: jp.verta at gmail.com (Jukka-Pekka Verta) Date: Thu, 4 Nov 2010 14:51:41 -0400 Subject: [Biopython] Primer3Commandline rookie question In-Reply-To: References: <1B72F431-E1AF-4EA6-8133-183593D47D8D@gmail.com> <20101103193808.GQ2726@sobchak.mgh.harvard.edu> <458C8AC0-01BB-4707-B412-53A096C1A238@gmail.com> <40AFFBB6-E89D-4C9C-BD89-0F5BDC1D60D9@gmail.com> <494A9140-3170-4E49-B2C4-14FA5FB70969@gmail.com> Message-ID: OK, the primer3 installation should be fine as I am able to run the primer3_core program from the command line with the example that comes with the distribution. I reported this to EMBOSS, let's see what they have to say... Thank you for all the help! JP On 2010-11-04, at 1:53 PM, Peter wrote: > On Thu, Nov 4, 2010 at 5:33 PM, Jukka-Pekka Verta wrote: >> >> You're right, thats the case. Still, with -oligosize changed to -osizeopt >> Python returns me an error and running the eprimer3 on the command >> line gives me >> >> EMBOSS An error in eprimer3.c at line 317: >> eprimer3: Could not exec primer3_core >> Segmentation fault >> >> JP > > Wow - it looks like you may have found a bug in EMBOSS itself, > getting a "Segmentation fault" is never a good sign. > > Have you properly installed the Whitehead institute primer3 > package? The EMBOSS tool acts as a wrapper, and from > the error perhaps it can't find the primer3 tool? > > You may want to report this to EMBOSS directly... > See also: http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/eprimer3.html > > Peter From yvan.strahm at uni.no Wed Nov 10 12:21:25 2010 From: yvan.strahm at uni.no (Yvan) Date: Wed, 10 Nov 2010 13:21:25 +0100 Subject: [Biopython] editing xml blast report Message-ID: <4CDA8E45.8000206@uni.no> Hello All, I am wondering if it is possible with biopython to correctly edit a xml blast report? 
Here my problem; I need to filter/exclude some hits according to their length but the xml consistency should be conserved. Should I parse with it with NCBIXML and recreate a xml file with hard coded tags? I try to use the tabular output of blast+ and reformat it with blast_formatter but with no success. Any one has an idea or some pointer I could use? Thanks a lot for your help cheers yvan From biopython at maubp.freeserve.co.uk Wed Nov 10 12:45:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 12:45:53 +0000 Subject: [Biopython] editing xml blast report In-Reply-To: <4CDA8E45.8000206@uni.no> References: <4CDA8E45.8000206@uni.no> Message-ID: On Wed, Nov 10, 2010 at 12:21 PM, Yvan wrote: > Hello All, > > I am wondering if it is possible with biopython to correctly edit a xml > blast report? > > Here my problem; > I need to filter/exclude some hits according to their length but the xml > consistency should be conserved. Biopython has a BLAST XML parser, but nothing to help you write out BLAST XML. > Should I parse with it with NCBIXML and recreate a xml file with hard > coded tags? That might be the best option. > I try to use the tabular output of blast+ and reformat it with > blast_formatter but with no success. I'm pretty sure the blast_formatter tools in BLAST+ will only convert from ASN.1 to other BLAST output formats. So that won't work, unless you can edit the ASN.1 file (but that would probably be harder than editing XML). Peter From lpritc at scri.ac.uk Thu Nov 11 14:12:21 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 11 Nov 2010 14:12:21 +0000 Subject: [Biopython] Bioinformatics position Message-ID: We have a bioinformatics post available at SCRI, and would be grateful if you could please bring it to the attention of any colleagues who may be interested in applying. 
It is advertised at http://www.jobs.ac.uk/job/ABS904/bioinformatics/ and some details are included below:

"""
Bioinformatics
Scottish Crop Research Institute - SCRI

SCRI is Scotland's leading Institute for research on plants and their interactions with the environment, particularly in managed ecosystems. Our mission is to conduct excellent research in plant and environmental sciences. Our vision is to deliver innovative products, knowledge and services that enrich the life of the community and address the public goods of environmental sustainability, high quality and healthy food.

Post Reference SMB/1/10

Research in the Plant Pathology Programme at SCRI is founded on pathogen genomics, and scientists in the Programme have a strong track record of contributing to whole genome sequencing and genetic analysis of economically important pests and pathogens. The successful candidate will collaborate with other groups in the Programme working on plant-pathogen interactions developing innovative approaches to understand disease processes. This post provides an opportunity to influence biological research of direct impact to agriculture.

The ideal candidate would be experienced in manipulating and curating large biological datasets with a record of collaboration and integration with biologists. The successful applicant is expected to have an interest in plant-pathogen interactions and to develop their own research profile. The candidate should have a PhD or equivalent in bioinformatics, biostatistics or a related field.

Informal enquiries from: Leighton.Pritchard at scri.ac.uk or Lesley.Torrance at scri.ac.uk

Salary Scale For All Posts:
*Band D/E, £26,610 - £37,534 (commensurate with experience)
*Appointments to Band F, £42,769 - £47,521 available for exceptional candidates.
Candidates willing to apply for a research fellowship to further help establish their own laboratory are encouraged to apply and will, if successful, benefit from generous Institute support throughout the tenure of their fellowship.

Further information on the above posts, including how to apply, is available on the SCRI website at http://www.scri.ac.uk/careers/vacancies

Closing date - Friday 19th November 2010.

The Institute is an equal opportunities employer.
"""

Many thanks,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.

DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________ From yvan.strahm at uni.no Mon Nov 15 10:38:48 2010 From: yvan.strahm at uni.no (Yvan) Date: Mon, 15 Nov 2010 11:38:48 +0100 Subject: [Biopython] editing xml blast report In-Reply-To: References: <4CDA8E45.8000206@uni.no> Message-ID: <4CE10DB8.8040908@uni.no> On 10/11/10 13:45, Peter wrote: > On Wed, Nov 10, 2010 at 12:21 PM, Yvan wrote: >> Hello All, >> >> I am wondering if it is possible with biopython to correctly edit a xml >> blast report? >> >> Here my problem; >> I need to filter/exclude some hits according to their length but the xml >> consistency should be conserved. > Biopython has a BLAST XML parser, but nothing to help you write > out BLAST XML. > >> Should I parse with it with NCBIXML and recreate a xml file with hard >> coded tags? > That might be the best option. > >> I try to use the tabular output of blast+ and reformat it with >> blast_formatter but with no success. > I'm pretty sure the blast_formatter tools in BLAST+ will only convert > from ASN.1 to other BLAST output formats. So that won't work, > unless you can edit the ASN.1 file (but that would probably be > harder than editing XML). > > Peter Ok thanks for the answer. Cheers, yvan From jgrant at smith.edu Wed Nov 17 21:22:08 2010 From: jgrant at smith.edu (Jessica Grant) Date: Wed, 17 Nov 2010 16:22:08 -0500 Subject: [Biopython] problem blasting on line Message-ID: Hello, I am trying to use blast to extract contaminating sequences from a set of 454 sequence data. 
My script uses NCBIWWW.qblast as follows:

result_handle = NCBIWWW.qblast("blastx", "nr", record.format("fasta"), ncbi_gi=False, descriptions= "1", alignments="1", format_type="XML", hitlist_size="1", entrez_query='(Bacteria[ORGN])')

It works...for a while but it stops, eventually, with the following error:

Traceback (most recent call last):
  File "compBACTEUK.py", line 81, in <module>
    main()
  File "compBACTEUK.py", line 70, in main
    blastBACTEUK(arg)
  File "compBACTEUK.py", line 19, in blastBACTEUK
    result_handleE = NCBIWWW.qblast("blastx", "nr", record.format("fasta"), ncbi_gi=False, descriptions= "1", alignments="1", format_type="XML", hitlist_size="1", entrez_query='(Eukaryota[ORGN])')
  File "/Users/katzlab/Downloads/biopython-biopython-74d3c2a/build/lib.macosx-10.5-i386-2.5/Bio/Blast/NCBIWWW.py", line 114, in qblast
    handle = urllib2.urlopen(request)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 380, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

I suppose that the problem is a communication problem with ncbi.
I have written a try except statement into my script, but I seem to be losing quite a few records as they get skipped over if the error occurs. I thought about downloading nr and using the standalone blast, but it seems the downloadable nr database comes in several parts, already formatted for blast. Can I concatenate these? Any thoughts on the problem with the qblast or other ways to circumvent this problem would be greatly appreciated! Jessica From biopython at maubp.freeserve.co.uk Wed Nov 17 21:37:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Nov 2010 21:37:47 +0000 Subject: [Biopython] problem blasting on line In-Reply-To: References: Message-ID: Hi Jessica, On Wed, Nov 17, 2010 at 9:22 PM, Jessica Grant wrote: > Hello, > > I am trying to use blast to extract contaminating sequences from a set of > 454 sequence data. ?My script uses NCBIWWW.qblast as follows: > > result_handle = NCBIWWW.qblast("blastx", "nr", record.format("fasta"), > ncbi_gi=False, descriptions= "1", alignments="1", format_type="XML", > hitlist_size="1", entrez_query='(Bacteria[ORGN])') > > > It works...for a while but it stops, eventually, with the following error: > > > traceback (most recent call last): > ... > urllib2.HTTPError: HTTP Error 404: Not Found > > > I suppose that the problem is a communication problem with ncbi. Probably some kind of network problem, yes. > I have > written a try except statement into my script, but I seem to be losing quite > a few records as they get skipped over if the error occurs. > > I thought about downloading nr and using the standalone blast, but it seems > the downloadable nr database comes in several parts, already formatted for > blast. ?Can I concatenate these? > > Any thoughts on the problem with the qblast or other ways to circumvent this > problem would be greatly appreciated! > > Jessica How many sequences are you trying to BLAST? 
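For a small number of queries, one option is to retry a failed request a few times instead of skipping the record. The following is only a minimal sketch, assuming the failures are transient network errors; call_with_retries and its parameters are hypothetical names for illustration, not part of Biopython.

```python
import time


def call_with_retries(func, attempts=3, delay=5.0, exceptions=(Exception,),
                      sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries have failed.

    Waits `delay` seconds after a failure and doubles the wait each time.
    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of tries: let the caller see the real error
            sleep(delay)
            delay *= 2


# Possible use with the call from this thread (needs Biopython and network
# access, so it is left as a comment here):
#
#   from Bio.Blast import NCBIWWW
#   handle = call_with_retries(
#       lambda: NCBIWWW.qblast("blastx", "nr", record.format("fasta")))
```

Passing a narrower exception tuple (for example only the HTTP error class raised by your Python version) avoids retrying on genuine programming errors, and generous delays are kinder to the NCBI servers.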
If it is more than a few dozen I would definitely recommend installing and running BLAST locally. Regarding the NR database, yes, it comes in parts, but this is by design. There is a main index file which tells the BLAST command line tools about all the subparts - very easy. Just download all the nr.*.tar.gz files into your BLAST database folder and uncompress them. Peter From eric.talevich at gmail.com Wed Nov 17 21:54:40 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 17 Nov 2010 16:54:40 -0500 Subject: [Biopython] problem blasting on line In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 4:22 PM, Jessica Grant wrote: > Hello, > > I am trying to use blast to extract contaminating sequences from a set of > 454 sequence data. My script uses NCBIWWW.qblast as follows: > > [...] > > I thought about downloading nr and using the standalone blast, but it seems > the downloadable nr database comes in several parts, already formatted for > blast. Can I concatenate these? > > Any thoughts on the problem with the qblast or other ways to circumvent > this problem would be greatly appreciated! > > Hi Jessica, If the problem boils down to grouping all the related sequences together, or isolating the unrelated sequences, you might also have some luck with CD-HIT: http://weizhong-lab.ucsd.edu/cd-hit/ Best, Eric From peter at maubp.freeserve.co.uk Thu Nov 18 15:47:08 2010 From: peter at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Nov 2010 15:47:08 +0000 Subject: [Biopython] Dropping Python 2.4 Support? Message-ID: Dear Biopythoneers, Are any of you still using Biopython on Python 2.4? http://news.open-bio.org/news/2010/11/dropping-python24-support/ Please get in touch if dropping support for Python 2.4 would be a problem. Otherwise we plan for Biopython 1.56 (expected by the end of this month) to be our last release to work with Python 2.4. 
Thanks,

Peter

From srikrishnamohan at gmail.com Fri Nov 19 18:43:23 2010
From: srikrishnamohan at gmail.com (km)
Date: Sat, 20 Nov 2010 00:13:23 +0530
Subject: [Biopython] polypeptide residue ids
Message-ID:

Hi all,
I would like to derive amino acid ids (number) corresponding to the residues in polypeptide chains extracted from PDB file using biopython PDB module.
Is it possible to obtain this information with PDB module of biopython ?

regards,
KM

From anaryin at gmail.com Fri Nov 19 18:54:17 2010
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 19 Nov 2010 19:54:17 +0100
Subject: [Biopython] polypeptide residue ids
In-Reply-To:
References:
Message-ID:

Hello KM,

You can iterate over the chains as described in the manual and then for each residue call get_id().

for residue in structure.get_residues():
    print residue.get_id()[1]

Best

João [...] Rodrigues
http://doeidoei.wordpress.com

On Fri, Nov 19, 2010 at 7:43 PM, km wrote:
> Hi all,
> I would like to derive amino acid ids (number) corresponding to the
> residues in polypeptide chains extracted from PDB file using biopython
> PDB module.
> Is it possible to obtain this information with PDB module of biopython ?
>
> regards,
> KM
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From srikrishnamohan at gmail.com Fri Nov 19 20:31:32 2010
From: srikrishnamohan at gmail.com (km)
Date: Sat, 20 Nov 2010 02:01:32 +0530
Subject: [Biopython] polypeptide residue ids
In-Reply-To:
References:
Message-ID:

Thanks. that works well.
but why have polypeptide builder for generating the sequence ? ppb.get_sequence() ?
when we can do the same by reading the residues under chain of the structure hierarchy as u suggested?

On Sat, Nov 20, 2010 at 12:24 AM, João Rodrigues wrote:
> Hello KM,
>
> You can iterate over the chains as described in the manual and then for each
> residue call get_id().
> > for residue in structure.get_residues(): > ?? print residue.get_id()[1] > > Best > > Jo?o [...] Rodrigues > http://doeidoei.wordpress.com > > > > On Fri, Nov 19, 2010 at 7:43 PM, km wrote: >> >> Hi all, >> I would like to derive amino acid ids (number) corresponding to the >> residues in polypeptide chains extracted from PDB file using biopython >> PDB module. >> Is it possible to obtain this information with PDB module of biopython ? >> >> regards, >> KM >> _______________________________________________ >> Biopython mailing list ?- ?Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython > > From anaryin at gmail.com Fri Nov 19 20:35:25 2010 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Fri, 19 Nov 2010 21:35:25 +0100 Subject: [Biopython] polypeptide residue ids In-Reply-To: References: Message-ID: The Peptide Builder serves for much more than generating sequenes. However, having the SMCRA representation of the model is much handier (IMO) to get this kind of information. I'm not really sure if I answered your question though.. Jo?o [...] Rodrigues http://doeidoei.wordpress.com On Fri, Nov 19, 2010 at 9:31 PM, km wrote: > Thanks. that works well. > but why have polypeptide builder for generating the sequence ? > ppb.get_sequence() ? > when we can do the same by reading the residues under chain of the > structure hierarchy as u suggested? > > > On Sat, Nov 20, 2010 at 12:24 AM, Jo?o Rodrigues > wrote: > > Hello KM, > > > > You can iterate over the chains as described in the manual and then for > each > > residue call get_id(). > > > > for residue in structure.get_residues(): > > print residue.get_id()[1] > > > > Best > > > > Jo?o [...] 
Rodrigues > > http://doeidoei.wordpress.com > > > > > > > > On Fri, Nov 19, 2010 at 7:43 PM, km wrote: > >> > >> Hi all, > >> I would like to derive amino acid ids (number) corresponding to the > >> residues in polypeptide chains extracted from PDB file using biopython > >> PDB module. > >> Is it possible to obtain this information with PDB module of biopython ? > >> > >> regards, > >> KM > >> _______________________________________________ > >> Biopython mailing list - Biopython at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biopython > > > > > From rodrigo_faccioli at uol.com.br Sun Nov 21 16:20:15 2010 From: rodrigo_faccioli at uol.com.br (Rodrigo Faccioli) Date: Sun, 21 Nov 2010 14:20:15 -0200 Subject: [Biopython] polypeptide residue ids In-Reply-To: References: Message-ID: Hi, I'm not sure if my answer will be correct. Anyway, I'll try to write it. I believe the Peptide Builder works with atoms section to build the sequences. Therefore, if your atoms section has not all atoms of all residues they will not appear correctly. I have been worked with seqres section on pdb. More specific I have studied the biopython pdb module and I tried to add it in SMCRA representation. However, this moment I don't have time to talk with Biopython project because I have to work with my Ph.D project. Anyway, if you want to test this project, please see [1]. In this project, there is a script to split pdb in chains. [1] https://github.com/rodrigofaccioli/ContributeToBioPython I hope you enjoy it. 
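To illustrate the difference between the SEQRES section and the ATOM section mentioned above, here is a minimal plain-Python sketch. It does not use Bio.PDB; the helper names and the tiny example records are made up for illustration. It relies only on the fixed column layout of PDB records (chain ID in column 12 of SEQRES and column 22 of ATOM, residue name in columns 18-20 and residue number in columns 23-26 of ATOM).

```python
def seqres_residues(pdb_lines, chain_id):
    """Residue names declared for one chain in the SEQRES records."""
    names = []
    for line in pdb_lines:
        # Column 12 (index 11) is the chain ID; residue names start at column 20.
        if line.startswith("SEQRES") and line[11:12] == chain_id:
            names.extend(line[19:].split())
    return names


def atom_residues(pdb_lines, chain_id):
    """(number, name) pairs for residues of one chain that have ATOM records."""
    found = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21:22] == chain_id:
            # Columns 18-20: residue name; columns 23-26: residue number.
            residue = (int(line[22:26]), line[17:20].strip())
            if residue not in found:
                found.append(residue)
    return found


# Hypothetical example: four residues declared, coordinates for only two.
SEQRES_FMT = "{:6s} {:3d} {} {:4d}{:2s}{}"
ATOM_FMT = "{:6s}{:5d} {:4s} {:3s} {}{:4d}"
example = [
    SEQRES_FMT.format("SEQRES", 1, "A", 4, "", "MET ALA GLY SER"),
    ATOM_FMT.format("ATOM", 1, "N", "ALA", "A", 2),
    ATOM_FMT.format("ATOM", 2, "CA", "ALA", "A", 2),
    ATOM_FMT.format("ATOM", 3, "N", "GLY", "A", 3),
]

if __name__ == "__main__":
    print(seqres_residues(example, "A"))  # residues the chain should contain
    print(atom_residues(example, "A"))    # residues that actually have atoms
```

Residues that appear in the first list but not in the second are the ones a sequence built from the atoms alone would miss.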
Best regards, -- Rodrigo Antonio Faccioli Ph.D Student in Electrical Engineering University of Sao Paulo - USP Engineering School of Sao Carlos - EESC Department of Electrical Engineering - SEL Intelligent System in Structure Bioinformatics http://laips.sel.eesc.usp.br Phone: 55 (16) 3373-9366 Ext 229 Curriculum Lattes - http://lattes.cnpq.br/1025157978990218 Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5 -- Rodrigo Antonio Faccioli Ph.D Student in Electrical Engineering University of Sao Paulo - USP Engineering School of Sao Carlos - EESC Department of Electrical Engineering - SEL Intelligent System in Structure Bioinformatics http://laips.sel.eesc.usp.br Phone: 55 (16) 3373-9366 Ext 229 Curriculum Lattes - http://lattes.cnpq.br/1025157978990218 Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5 On Fri, Nov 19, 2010 at 6:35 PM, Jo?o Rodrigues wrote: > The Peptide Builder serves for much more than generating sequenes. However, > having the SMCRA representation of the model is much handier (IMO) to get > this kind of information. > > I'm not really sure if I answered your question though.. > > Jo?o [...] Rodrigues > http://doeidoei.wordpress.com > > > > On Fri, Nov 19, 2010 at 9:31 PM, km wrote: > > > Thanks. that works well. > > but why have polypeptide builder for generating the sequence ? > > ppb.get_sequence() ? > > when we can do the same by reading the residues under chain of the > > structure hierarchy as u suggested? > > > > > > On Sat, Nov 20, 2010 at 12:24 AM, Jo?o Rodrigues > > wrote: > > > Hello KM, > > > > > > You can iterate over the chains as described in the manual and then for > > each > > > residue call get_id(). > > > > > > for residue in structure.get_residues(): > > > print residue.get_id()[1] > > > > > > Best > > > > > > Jo?o [...] 
Rodrigues > > > http://doeidoei.wordpress.com > > > > > > > > > > > > On Fri, Nov 19, 2010 at 7:43 PM, km wrote: > > >> > > >> Hi all, > > >> I would like to derive amino acid ids (number) corresponding to the > > >> residues in polypeptide chains extracted from PDB file using biopython > > >> PDB module. > > >> Is it possible to obtain this information with PDB module of biopython > ? > > >> > > >> regards, > > >> KM > > >> _______________________________________________ > > >> Biopython mailing list - Biopython at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/biopython > > > > > > > > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From ming.xue at boehringer-ingelheim.com Mon Nov 22 14:58:27 2010 From: ming.xue at boehringer-ingelheim.com (ming.xue at boehringer-ingelheim.com) Date: Mon, 22 Nov 2010 09:58:27 -0500 Subject: [Biopython] ERROR: Bio.Entrez.efetch(genome, X52960, ...) and ERROR: Bio.ExPASy.get_sprot_raw("O23729") References: Message-ID: <5C105629EFD02E42B2C3E3113755F121126C6B66@RDGEXM01.am.boehringer.com> Hi All, I was installing biopython 1.55 over python 2.6 and testing generated the error: ====================================================================== ERROR: Bio.Entrez.efetch(genome, X52960, ...) 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_SeqIO_online.py", line 85, in method = lambda x : x.simple(d, f, e, l, c) File "test_SeqIO_online.py", line 63, in simple record = SeqIO.read(handle, f) File "/src/biopython-1.55/build/lib.linux-x86_64-2.6/Bio/SeqIO/__init__.py", line 585, in read raise ValueError("No records found in handle") ValueError: No records found in handle ====================================================================== ERROR: Bio.ExPASy.get_sprot_raw("O23729") ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SeqIO_online.py", line 53, in test_get_sprot_raw "internet (or maybe just ExPASy) not available") MissingExternalDependencyError: internet (or maybe just ExPASy) not available ---------------------------------------------------------------------- Ran 144 tests in 93.664 seconds FAILED (failures = 1) Since I am behind proxy with ntlm authentication, my guess the issue is related to firewall. I tried some solution from internet such as ntlm-map and python-ntlm but no luck. Any suggestions? Best and thanks, MX From biopython at maubp.freeserve.co.uk Mon Nov 22 15:17:44 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Nov 2010 15:17:44 +0000 Subject: [Biopython] ERROR: Bio.Entrez.efetch(genome, X52960, ...) and ERROR: Bio.ExPASy.get_sprot_raw("O23729") In-Reply-To: <5C105629EFD02E42B2C3E3113755F121126C6B66@RDGEXM01.am.boehringer.com> References: <5C105629EFD02E42B2C3E3113755F121126C6B66@RDGEXM01.am.boehringer.com> Message-ID: On Mon, Nov 22, 2010 at 2:58 PM, wrote: > Hi All, > > I was installing biopython 1.55 over python 2.6 and testing generated the > error: > ====================================================================== > ERROR: Bio.Entrez.efetch(genome, X52960, ...) 
> ---------------------------------------------------------------------- > Traceback (most recent call last): > ... > ValueError: No records found in handle > > ====================================================================== > ERROR: Bio.ExPASy.get_sprot_raw("O23729") > ---------------------------------------------------------------------- > ... > MissingExternalDependencyError: internet (or maybe just ExPASy) not available > > ---------------------------------------------------------------------- > Ran 144 tests in 93.664 seconds > > FAILED (failures = 1) > > Since I am behind proxy with ntlm authentication, my guess the issue is > related to firewall. I tried some solution from internet such as ntlm-map and > python-ntlm but no luck. Any suggestions? > > Best and thanks, > > MX Hi, I see similar errors on my Windows machine, and I think this is down to needing to use the institute proxy. Currently Bio.Entrez uses urllib (version 1), and I don't know if this can be modified to work with python-ntlm or not. What have you tried? If you are happy to try experimenting, you could modify the Bio.Entrez _open function to use urllib2 with python-ntlm. If you had a simple proxy (without authentication) you'd do something like this: import os os.environ["http_proxy"] = "http://proxyhost.example.com:8080" See the "Using a proxy" section in the Entrez chapter of the tutorial for more details. Regards, Peter From akooser at unm.edu Tue Nov 23 03:52:15 2010 From: akooser at unm.edu (Ara Kooser) Date: Mon, 22 Nov 2010 20:52:15 -0700 Subject: [Biopython] Extracting data genpept files Message-ID: <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu> Hello all, I think Peter pointed me to part of this code (shown below) for extracting data out of a genpept file. I am trying to get a handle on the formating end of things. 
My question is: when there is missing taxonomic data grabbed by tax_records = gb_record.annotations["taxonomy"], instead of leaving the space blank the program fills it in with the next piece of data, usually the date. This throws off the whole spreadsheet when I import as a CSV file.

Is there a way to have the program write in white space when it encounters missing data instead of the date?

Thanks,
Ara

PS as soon as the formatting is sorted out and folks created for input and such I will post the code up here.

from Bio import SeqIO

gg = open("raw_genbank.txt","w")
gb_file = "sequence.gp.txt"

for gb_record in SeqIO.parse(open(gb_file,"r"), "genbank"):
    gb_feature = gb_record.features[2]

    # Map each value of the given qualifier to the index of its feature
    def index_genbank_features(gb_record, feature_type, qualifier) :
        answer = dict()
        for (index, feature) in enumerate(gb_record.features) :
            if feature.type==feature_type :
                if qualifier in feature.qualifiers :
                    for value in feature.qualifiers[qualifier] :
                        if value in answer :
                            print("WARNING - Duplicate key %s for %s features %i and %i" \
                                  % (value, feature_type, answer[value], index))
                        else :
                            answer[value] = index
        return answer

    locus_tag_cds_index = index_genbank_features(gb_record,"CDS","locus_tag")
    coded_by_cds_index = index_genbank_features(gb_record,"CDS","coded_by")
    name_by_source_index = index_genbank_features(gb_record,"source","organism")
    protein_id_cds_index = index_genbank_features(gb_record,"CDS","protein_id")

    gb_annotations = gb_record.annotations
    tax_records = gb_record.annotations["taxonomy"]
    accession = gb_record.annotations["accessions"]
    date = gb_record.annotations["date"]
    function = gb_record.description

    # One line per record: a Python list written out as text
    gg.write(str([accession, locus_tag_cds_index, coded_by_cds_index, name_by_source_index, tax_records, date, function]))
    gg.write("\n")

gg.close()

From biopython at maubp.freeserve.co.uk Tue Nov 23 08:53:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Nov 2010 08:53:52 +0000
Subject: [Biopython] Extracting data genpept files
In-Reply-To:
<2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu>
References: <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu>
Message-ID:

On Tue, Nov 23, 2010 at 3:52 AM, Ara Kooser wrote:
> Hello all,
>
> I think Peter pointed me to part of this code (shown below) for extracting
> data out of a genpept file. I am trying to get a handle on the formatting end
> of things. My question is: when there is missing taxonomic data grabbed by
> tax_records = gb_record.annotations["taxonomy"], instead of leaving the space
> blank the program fills it in with the next piece of data, usually the date.
> This throws off the whole spreadsheet when I import it as a CSV file.
>

If I understood your aim, try using this if the taxonomy isn't in the
annotations dictionary (which would give a KeyError),

tax_records = gb_record.annotations.get("taxonomy", [])

Perhaps you could clarify if you want the taxonomy (a list of variable
length) to go in one column of your CSV file?

Peter

P.S. I prefer using tab separated variables (tsv) over csv, as I find
commas in descriptions quite often - and although this can be dealt with
it is fiddly.

From akooser at unm.edu Tue Nov 23 13:27:21 2010
From: akooser at unm.edu (Ara Kooser)
Date: Tue, 23 Nov 2010 06:27:21 -0700
Subject: [Biopython] Extracting data genpept files
In-Reply-To:
References: <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu>
Message-ID: <8D9CF8DC-3489-41E1-87CA-101FB4925520@unm.edu>

Peter,

Thank you for your response.

>
> If I understood your aim, try using this if the taxonomy isn't in the
> annotations dictionary (which would give a KeyError),
>
> tax_records = gb_record.annotations.get("taxonomy", [])

I will give this a try later in the morning.

>
> Perhaps you could clarify if you want the taxonomy (a list of
> variable length) to go in one column of your CSV file?
>

I would like the whole taxonomy to go into one column but couldn't
figure out how to do that.

> Peter
>
> P.S.
I prefer using tab separated variables (tsv) over csv, as I
> find commas in descriptions quite often - and although this can be dealt
> with it is fiddly.

I wanted to use tabs but here again wasn't sure of how to code that.
Commas have been my bane while trying to write this code.

Ara

From biopython at maubp.freeserve.co.uk Tue Nov 23 14:09:57 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Nov 2010 14:09:57 +0000
Subject: [Biopython] Extracting data genpept files
In-Reply-To: <8D9CF8DC-3489-41E1-87CA-101FB4925520@unm.edu>
References: <2FCC04DE-9619-4225-9114-311A186FE70D@unm.edu> <8D9CF8DC-3489-41E1-87CA-101FB4925520@unm.edu>
Message-ID:

On Tue, Nov 23, 2010 at 1:27 PM, Ara Kooser wrote:
> Peter,
>
> I would like the whole taxonomy to go into one column but couldn't figure
> out how to do that.
>
> ...
>
> I wanted to use tabs but here again wasn't sure of how to code that. Commas
> have been my bane while trying to write this code.

I'd suggest writing the taxonomy in a single column, space separated.
You can do that with the Python string join method. For example, if
taxonomy_list is a list of strings, use:

taxonomy_str = " ".join(taxonomy_list)

Regarding tabs, use backslash t to get a tab in Python, e.g. "\t", or:

name = "Test"
x = 10
y = -5
handle = open("example.tsv", "w")
handle.write("%s\t%i\t%i\n" % (name, x, y))
handle.close()

Peter

From ming.xue at boehringer-ingelheim.com Wed Nov 24 17:47:16 2010
From: ming.xue at boehringer-ingelheim.com (ming.xue at boehringer-ingelheim.com)
Date: Wed, 24 Nov 2010 12:47:16 -0500
Subject: [Biopython] ERROR: Bio.Entrez.efetch(genome, X52960, ...)
and ERROR: Bio.ExPASy.get_sprot_raw("O23729")
In-Reply-To:
References: <5C105629EFD02E42B2C3E3113755F121126C6B66@RDGEXM01.am.boehringer.com>
Message-ID: <5C105629EFD02E42B2C3E3113755F1211273AB86@RDGEXM01.am.boehringer.com>

Hi,

I installed python-ntlm from http://code.google.com/p/python-ntlm/ and
tried their example using urllib2 to retrieve the Google home page, but I
still get a "407: Proxy Authentication Required" error. I tried
ntlmaps-0.9.9.0.1 and it didn't solve the problem either.

I am not going to try to fix the issue as I only need to parse behind the
firewall.

Thanks a lot.

Cheers,

MX

-----Original Message-----
From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On Behalf Of Peter
Sent: Monday, November 22, 2010 10:18 AM
To: Xue,Ming IT BIP-US-R
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] ERROR: Bio.Entrez.efetch(genome, X52960, ...) and ERROR: Bio.ExPASy.get_sprot_raw("O23729")

On Mon, Nov 22, 2010 at 2:58 PM, wrote:
> Hi All,
>
> I was installing biopython 1.55 over python 2.6 and testing generated the
> error:
> ======================================================================
> ERROR: Bio.Entrez.efetch(genome, X52960, ...)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> ...
> ValueError: No records found in handle
>
> ======================================================================
> ERROR: Bio.ExPASy.get_sprot_raw("O23729")
> ----------------------------------------------------------------------
> ...
> MissingExternalDependencyError: internet (or maybe just ExPASy) not available
>
> ----------------------------------------------------------------------
> Ran 144 tests in 93.664 seconds
>
> FAILED (failures = 1)
>
> Since I am behind a proxy with NTLM authentication, my guess is the issue is
> related to the firewall. I tried some solutions from the internet such as
> ntlm-map and python-ntlm but no luck. Any suggestions?
>
> Best and thanks,
>
> MX

Hi,

I see similar errors on my Windows machine, and I think this is down to
needing to use the institute proxy. Currently Bio.Entrez uses urllib
(version 1), and I don't know if this can be modified to work with
python-ntlm or not. What have you tried?

If you are happy to try experimenting, you could modify the Bio.Entrez
_open function to use urllib2 with python-ntlm.

If you had a simple proxy (without authentication) you'd do something
like this:

import os
os.environ["http_proxy"] = "http://proxyhost.example.com:8080"

See the "Using a proxy" section in the Entrez chapter of the tutorial
for more details.

Regards,

Peter

From biopython at maubp.freeserve.co.uk Fri Nov 26 23:12:14 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 23:12:14 +0000
Subject: [Biopython] Biopython 1.56
Message-ID:

Dear Biopythoneers,

On behalf of the developers, I'm pleased to announce we released
Biopython 1.56 earlier today. For more details please see:

http://news.open-bio.org/news/2010/11/biopython-1-56-released/

Please note this will probably be the last release to support Python 2.4,
see:

http://news.open-bio.org/news/2010/11/dropping-python24-support/

(At least) 13 people have contributed to this release, including 6 new
people - thank you all:

* Andrea Pierleoni (first contribution)
* Bart de Koning (first contribution)
* Bartek Wilczynski
* Bartosz Telenczuk (first contribution)
* Cymon Cox
* Eric Talevich
* Frank Kauff
* Michiel de Hoon
* Peter Cock
* Phillip Garland (first contribution)
* Siong Kong (first contribution)
* Tiago Antao
* Uri Laserson (first contribution)

Source distributions and Windows installers are available from the
downloads page on the Biopython website:

http://www.biopython.org/wiki/Download

As usual, feedback is most welcome on the mailing lists (or bugzilla).
Regards,

Peter

From arkanion at gmail.com Sun Nov 28 18:06:15 2010
From: arkanion at gmail.com (Ömer)
Date: Sun, 28 Nov 2010 20:06:15 +0200
Subject: [Biopython] Using ClustalW Webtool
Message-ID:

To make a multiple alignment based on WebTools, I followed the
instructions in the Tutorial, but I have a problem creating the output
file for ClustalW where it says:

"By default ClustalW will generate an alignment and guide tree file with
names based on the input FASTA file, in this case opuntia.aln and
opuntia.dnd, but you can override this or make it explicit:"

However, it does not generate these files. What could be the problem and
how can I solve it?

Thanks,
Greetings

From biopython at maubp.freeserve.co.uk Sun Nov 28 21:07:02 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Nov 2010 21:07:02 +0000
Subject: [Biopython] Using ClustalW Webtool
In-Reply-To:
References:
Message-ID:

On Sun, Nov 28, 2010 at 6:06 PM, Ömer wrote:
> To make a multiple alignment based on WebTools, I followed the
> instructions in the Tutorial, but I have a problem creating the output
> file for ClustalW where it says:
>
> "By default ClustalW will generate an alignment and guide tree file with
> names based on the input FASTA file, in this case opuntia.aln and
> opuntia.dnd, but you can override this or make it explicit:"
>
> However, it does not generate these files. What could be the
> problem and how can I solve it?

Hello Ömer,

I'm puzzled by the title of your email, "Using ClustalW Webtool". Are you
actually running ClustalW on your machine, or using a website with
ClustalW on it?

The Biopython tutorial is talking about the command line tool ClustalW
which you must first install on your own computer. Have you installed it?
If so how?

How are you trying to run ClustalW?
Regards,

Peter

From biopython at maubp.freeserve.co.uk Sun Nov 28 22:59:19 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Nov 2010 22:59:19 +0000
Subject: [Biopython] Using ClustalW Webtool
In-Reply-To: <4FDBC381BC454AFC98FCB32A8930B901@MYWORLD>
References: <4FDBC381BC454AFC98FCB32A8930B901@MYWORLD>
Message-ID:

On Sun, Nov 28, 2010 at 9:55 PM, Ömer AN wrote:
>
> Hi Peter,
>
> Actually I solved the problem on my own by installing ClustalW
> to my local machine and setting the directory.

Good.

> But is it possible to run it over the internet, i.e. giving the URL to
> Python? If so, how?

In theory that is possible - for instance Biopython has code to talk to
the NCBI servers to run BLAST remotely. In this case, why bother? For
most datasets ClustalW is fast enough to run on an ordinary computer.

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 08:48:17 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 08:48:17 +0000
Subject: [Biopython] Using ClustalW Webtool
In-Reply-To: <5ABA7C6C9DD5440FAED2AEC5FEF4D29B@MYWORLD>
References: <4FDBC381BC454AFC98FCB32A8930B901@MYWORLD> <5ABA7C6C9DD5440FAED2AEC5FEF4D29B@MYWORLD>
Message-ID:

On Mon, Nov 29, 2010 at 12:40 AM, Ömer AN wrote:
>
> One last vital question I have. I installed the mafft.bat file to run an
> alignment similar to clustalw.exe. It is better for me because it has an
> option to choose the matrix such as BLOSUM 45, 80 etc. However, it does not
> generate an output file by default for the alignment! How can I create an
> output file? Which option should I switch on or what value should I assign,
> so that I can read the alignment file via AlignIO? I checked everywhere but
> couldn't find an answer!
>

The Tutorial doesn't (currently) show any MAFFT examples, but you would
use the MafftCommandline wrapper class from the module
Bio.Align.Applications very similarly to how you use ClustalW with
ClustalwCommandline.

Peter

P.S.
Please CC the mailing list so that other people can benefit from the
discussion, or help. Thanks!

From arkanion at gmail.com Mon Nov 29 10:26:26 2010
From: arkanion at gmail.com (Ömer)
Date: Mon, 29 Nov 2010 12:26:26 +0200
Subject: [Biopython] Mafft Alignment Output file
Message-ID:

I installed the mafft.bat file to run an alignment similar to
clustalw.exe shown in the Tutorial. I prefer MAFFT because it has an
option to choose the matrix such as BLOSUM 45, 80 etc.

However, it does not generate an output file by default for the
alignment! How can I create an output file? Which option should I switch
on or what value should I assign, so that I can read the alignment file
via AlignIO? It has parameters for the output file but the file itself is
not created! It is simply generated by default when using
ClustalwCommandline. I checked everywhere but couldn't find an answer!
Any idea?

From biopython at maubp.freeserve.co.uk Mon Nov 29 11:05:01 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 11:05:01 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To:
References:
Message-ID:

On Mon, Nov 29, 2010 at 10:26 AM, Ömer wrote:
> I installed the mafft.bat file to run an alignment similar to clustalw.exe
> shown in the Tutorial. I prefer MAFFT because it has an option to
> choose the matrix such as BLOSUM 45, 80 etc.

I think MAFFT is also faster, especially with large alignments.

> However, it does not generate an output file by default for the alignment!
> How can I create an output file? Which option should I switch on or what
> value should I assign, so that I can read the alignment file via AlignIO?
> It has parameters for the output file but the file itself is not created!
> It is simply generated by default when using ClustalwCommandline.
> I checked everywhere but couldn't find an answer! Any idea?

What are you doing right now? Can you show us your code?
Are you using the MafftCommandline wrapper as I suggested here?:
http://lists.open-bio.org/pipermail/biopython/2010-November/006894.html

I'm looking at the code right now, and it could do with a basic example.

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 11:32:11 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 11:32:11 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To:
References:
Message-ID:

On Mon, Nov 29, 2010 at 11:05 AM, Peter wrote:
> On Mon, Nov 29, 2010 at 10:26 AM, Ömer wrote:
>> I installed mafft.bat file to run an alignment similar to clustalw.exe
>> shown in the Tutorial. I prefer Mafft because it has an option to
>> choose the matrix such as Blosum 45 , 80 etc.
>
> I think MAFFT is also faster, especially with large alignments.
>
>> However, it does not generate an output file by default for the alignment !
>> How can I create an output file? Which option should i switch on or what
>> value should i assign? So that i can read the alignment file via AlignIO?
>> It has parameters for the output file but the file cannot be created itself
>> ! It is simply generated by default when used ClustalWCommandline.
>> I checked everywhere but couldn't find ! Any idea?
>
> What are you doing right now? Can you show us your code?
>
> Are you using the MafftCommandline wrapper as I suggested here?:
> http://lists.open-bio.org/pipermail/biopython/2010-November/006894.html
>
> I'm looking at the code right now, and it could do with a basic example.
>
> Peter
>

How about something like this:

from Bio.Align.Applications import MuscleCommandline
muscle_exe = r"C:\Program Files\Aligments\muscle3.8.31_i86win32.exe"
in_file = r"C:\My Documents\unaligned.fasta"
out_file = r"C:\My Documents\aligned.fasta"
muscle_cline = MuscleCommandline(muscle_exe, input=in_file, out=out_file)
print(muscle_cline)
stdout, stderr = muscle_cline()

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 12:06:18 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 12:06:18 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To:
References:
Message-ID:

On Mon, Nov 29, 2010 at 11:05 AM, Peter wrote:
> On Mon, Nov 29, 2010 at 10:26 AM, Ömer wrote:
>> I installed mafft.bat file to run an alignment similar to clustalw.exe
>> shown in the Tutorial. I prefer Mafft because it has an option to
>> choose the matrix such as Blosum 45 , 80 etc.
>
> I think MAFFT is also faster, especially with large alignments.

Sorry, I meant MUSCLE, not MAFFT. I haven't compared the speed of MAFFT.

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 12:11:49 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 12:11:49 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To: <12DE657F8EB44E7B8007C25E607E6731@MYWORLD>
References: <12DE657F8EB44E7B8007C25E607E6731@MYWORLD>
Message-ID:

On Mon, Nov 29, 2010 at 11:37 AM, Ömer AN wrote:
> Here is my code in the attachment. My problem is mafft.bat does not create
> an output file.

What are the contents of your mafft.bat file? Batch files are just plain
text.

How did you install MAFFT on Windows? Last time I tried it wasn't obvious
how to do this, and I gave up.
From your example,

import os
from Bio.Align.Applications import MafftCommandline
mafft_bat = r"mafft.bat"
assert os.path.isfile(mafft_bat), "mafft.bat file is missing"
mafft_cline = MafftCommandline(mafft_bat, input="sequences.fasta",
                               clustalout="on", treeout="on", bl="45")
stdout, stderr = mafft_cline()

What happens if you add:

print(stdout)
print(stderr)

Are there any error messages?

Peter

P.S. The clustalout and treeout switches should really be booleans; the
wrapper documentation is unclear here. E.g.

mafft_cline = MafftCommandline(mafft_bat, input="sequences.fasta",
                               clustalout=True, treeout=True, bl="45")

From biopython at maubp.freeserve.co.uk Mon Nov 29 21:35:19 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 21:35:19 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To: <3E8CE55194194B02AA8A5DD2170E8DD3@MYWORLD>
References: <3E8CE55194194B02AA8A5DD2170E8DD3@MYWORLD>
Message-ID:

On Mon, Nov 29, 2010 at 8:51 PM, Ömer AN wrote:
>
> How can I write alignment output to a file?
> I can actually see the alignment when I use the command
>
> print stdout
>
> on the shell, but how can I make it write to a file?
>
> Thanks in advance

If you have the alignment in the string stdout, you can save it to a file
just like saving any data to a text file in Python:

handle = open("output.txt", "w")
handle.write(stdout)
handle.close()

Most command line tools will write the file themselves rather than
printing the output to screen (stdout), or give you the choice. It looks
like MAFFT only writes to stdout - which could be redirected to a file,
but I think the above is clearer.
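
As a rough sketch of the same file-saving step, assuming the alignment
text is already held in a Python string: the helper name
save_alignment_text, the variable example_stdout and the tiny FASTA text
are all made up for illustration (this is not real MAFFT output). Using a
with-block means the handle is closed even if the write fails:

```python
import os
import tempfile


def save_alignment_text(alignment_text, out_path):
    """Write alignment text (e.g. a tool's captured stdout) to out_path."""
    # The with-block closes the file handle even if write() raises.
    with open(out_path, "w") as handle:
        handle.write(alignment_text)
    return out_path


# Stand-in for the stdout string returned by the command line wrapper;
# a hypothetical two-sequence FASTA alignment, not real MAFFT output.
example_stdout = ">seq1\nACG-T\n>seq2\nACGGT\n"

# Write into the system temporary directory to avoid cluttering the
# working directory (the file name is arbitrary).
out_path = os.path.join(tempfile.gettempdir(), "mafft_aligned_example.fasta")
save_alignment_text(example_stdout, out_path)

# Read the file back to confirm that the text was saved unchanged.
with open(out_path) as handle:
    saved_text = handle.read()
```

With Biopython installed, a file saved this way could then probably be
loaded with AlignIO.read(out_path, "fasta"), assuming the tool was left
producing FASTA output rather than the Clustal format.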
Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 22:31:10 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 22:31:10 +0000
Subject: [Biopython] Mafft Alignment Output file
In-Reply-To: <97485881576F49C4A17750F34098CA81@MYWORLD>
References: <3E8CE55194194B02AA8A5DD2170E8DD3@MYWORLD> <97485881576F49C4A17750F34098CA81@MYWORLD>
Message-ID:

On Mon, Nov 29, 2010 at 9:44 PM, Ömer AN wrote:
>
> Now everything is clear, this is a nice alternative to replace with!
>
> Thanks a lot for your assistance!
>

I'm glad we solved it - I haven't used MAFFT recently so this was not
fresh in my mind. I have tried to make some clarifications to the
MafftCommandline documentation to improve this for the next release of
Biopython. So thank you for your feedback.

Regards,

Peter