From p.j.a.cock at googlemail.com Wed Feb 1 12:22:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Feb 2012 17:22:18 +0000
Subject: [Biopython] regarding retrieving antigen information of specific gene using Biopython

On Tue, Jan 31, 2012 at 7:00 AM, shweta dubey wrote:
> hello everyone,
>
> I am new to Biopython. I have a set of genes and I want information on
> antigens specific to these genes from a database (suppose, Antigen
> Database).
>
> How can I do the same using Biopython?
>
> Thanks in advance
>
> Shweta Dubey

Hi,

Which antigen database are you trying to use? If it is one of the
NCBI ones you can probably use their Entrez API via Biopython.

Peter

From bpkth2012 at gmail.com Thu Feb 2 10:45:17 2012
From: bpkth2012 at gmail.com (Sarttu Bourvir)
Date: Thu, 2 Feb 2012 16:45:17 +0100
Subject: [Biopython] parsing Blast results (xml)

Hi,
I am new to biopython and having problems parsing a blast result file (xml
format).
I can get out alignments, alignment length, title etc.
But I would additionally need to print the query title, percent
similarity, e-value. How does one do that? Is there anywhere else than the
Biopython cookbook and help(Bio.Blast.NCBIXML.Record) to look for
information? I feel like I don't really understand the Blast.Record and
where in there things can be found. Is the sequence query title in the
header?

Example code would be greatly appreciated!

Thank you,

From p.j.a.cock at googlemail.com Thu Feb 2 11:09:54 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 2 Feb 2012 16:09:54 +0000
Subject: [Biopython] parsing Blast results (xml)

On Thu, Feb 2, 2012 at 3:45 PM, Sarttu Bourvir wrote:
> Hi,
> I am new to biopython and having problems parsing a blast result file (xml
> format).
> I can get out alignments, alignment length, title etc.
> But I would additionally need to print the query title, percent
> similarity, e-value.
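Both percentages asked about here can be derived from per-HSP counts. Percent identity is the number of identities divided by the alignment length, and percent similarity is the number of positives divided by the same length. The following is a minimal sketch. It assumes HSP objects that expose integer identities, positives and align_length attributes, as Bio.Blast.Record.HSP does after XML parsing. The namedtuple below is a hypothetical stand-in so the example runs without a BLAST XML file.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.Blast.Record.HSP, holding only the fields used here.
FakeHSP = namedtuple("FakeHSP", ["identities", "positives", "align_length", "expect"])


def hsp_percentages(hsp):
    """Return (percent identity, percent similarity) for one HSP."""
    identity = 100.0 * hsp.identities / hsp.align_length
    similarity = 100.0 * hsp.positives / hsp.align_length
    return identity, similarity


hsp = FakeHSP(identities=45, positives=60, align_length=75, expect=1e-20)
identity, similarity = hsp_percentages(hsp)
print("e value:", hsp.expect)
print("identity: %.1f%%, similarity: %.1f%%" % (identity, similarity))
# identity: 60.0%, similarity: 80.0%
```

With a real parsed record, the query title should be available as blast_record.query, and hsp_percentages(hsp) can be called inside the loop over alignment.hsps.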
Well e-value is easy, and covered in the tutorial - e.g.

    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print '****Alignment****'
            print 'sequence:', alignment.title
            print 'length:', alignment.length
            print 'e value:', hsp.expect
            print hsp.query[0:75] + '...'
            print hsp.match[0:75] + '...'
            print hsp.sbjct[0:75] + '...'

For percentage similarity I think you must use hsp.positives and the
alignment length. Likewise hsp.identities can be used to get the
percentage identity.

> How does one do that? Is there anywhere else than the Biopython
> cookbook and help(Bio.Blast.NCBIXML.Record) to look for information?

I assume you also know about dir(...) as well? e.g. try dir(hsp) after
the above example or dir(alignment) to see what attributes these
objects have.

> I feel like I don't really understand the
> Blast.Record and where in there things can be found.
> Is the sequence query title in the header?

Yes, the query details should be captured. Try dir(blast_record)
where blast_record is a Bio.Blast.Record from the parser.

Peter

From drobukow at UTMB.EDU Tue Feb 7 09:41:50 2012
From: drobukow at UTMB.EDU (Obukowicz, Dennis R.)
Date: Tue, 7 Feb 2012 14:41:50 +0000
Subject: [Biopython] Problems Installing: Can't find modules Seq and Alphabet plus many others
Message-ID: <43C6C371D341DE44A81477FD432BA65854CB0B1F@GRMBX4.utmb.edu>

I am new to Biopython and have tried installing Biopython according to
the instructions. When I run the tests after installing I get many
errors: 96 errors in all out of 154 test runs (see some examples below).
Two errors that keep popping up are not being able to find module Seq
and module Alphabet.

ImportError: No module named Seq
ImportError: No module named Alphabet
NameError: name 'Seq' is not defined
NameError: name 'record' is not defined
NameError: name 'protein_rec' is not defined
NameError: name 'protein_rec' is not defined

Dennis

From drobukow at UTMB.EDU Tue Feb 7 11:01:21 2012
From: drobukow at UTMB.EDU (Obukowicz, Dennis R.)
Date: Tue, 7 Feb 2012 16:01:21 +0000
Subject: [Biopython] Problems Installing: Can't find modules Seq and Alphabet plus many others
In-Reply-To: <43C6C371D341DE44A81477FD432BA65854CB0B1F@GRMBX4.utmb.edu>
References: <43C6C371D341DE44A81477FD432BA65854CB0B1F@GRMBX4.utmb.edu>
Message-ID: <43C6C371D341DE44A81477FD432BA65854CB0B94@GRMBX4.utmb.edu>

I solved some of my earlier problems by adjusting the path with
sys.path.append, adding the directories where packages are located.
However, now I keep getting the error below. I've searched for this
error but can't find any mention of it. Can anyone help?

ERROR: Bio.Wise
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 327, in runTest
    module = __import__(name, None, None, name.split("."))
  File "/usr/local/biopython/biopython-1.58/build/lib.linux-x86_64-2.7/Bio/Wise/__init__.py", line 20, in <module>
    from Bio import SeqIO
  File "/usr/local/biopython/biopython-1.58/build/lib.linux-x86_64-2.7/Bio/SeqIO/__init__.py", line 308, in <module>
    import Seq
  File "/usr/local/biopython/biopython-1.58/Bio/Seq.py", line 31, in <module>
    import ambiguous_dna_complement, ambiguous_rna_complement
ImportError: No module named ambiguous_dna_complement

From devaniranjan at gmail.com Tue Feb 7 20:01:31 2012
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Tue, 7 Feb 2012 20:01:31 -0500
Subject: [Biopython] comparing sequences.qustion

Hi,

I have a list of > 200,000 UNIQUE short EQUAL length sequences.
I do the following:

I am comparing ALL sequences against ALL sequences, so there will be
(200000 * 199999)/2 comparisons.
Once a pair is compared, if they differ from one another by ONE letter
only, then I do another, more detailed alignment using a BLOSUM matrix.

Currently I use the pairwise sequence comparison code found in BIOPYTHON
for both comparisons. For the simple comparison I set
match = 0
mismatch = -1
If the total alignment score is equal to -1 (meaning only one mismatch)
then I go a further step and do a BLOSUM alignment.

This works but it's taking a long, long time. I suspect it's because I am
using TWO alignments, and I think that doing the first simple comparison
WITHOUT the pairwise alignment code would speed up this calculation.
Unfortunately I don't have much more than a desktop to do this, so if
someone can suggest a quicker way to do this, I would appreciate it.

Thank you,
George

From eric.talevich at gmail.com Tue Feb 7 20:50:28 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 7 Feb 2012 20:50:28 -0500
Subject: [Biopython] comparing sequences.qustion

Hi George,

If your sequences are all equal length, and you're interested in the ones
that differ by 1 character, then the difference between any two of those
sequences of interest will be a single mismatched character. You don't need
to do an alignment at all.

Without Python: try clustering at whatever identity threshold corresponds
to edit distance 1 in your sequences. UCLUST/USEARCH and other programs can
do this quickly.
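Following Eric's point, the first pass needs only a mismatch count and no alignment. The following is a minimal sketch with a hypothetical function name. It stops scanning as soon as it finds a second mismatch, so most unrelated pairs are rejected after only a few characters. This does not change the quadratic number of pairs. It only lowers the cost of each comparison.

```python
def differs_by_one(seq1, seq2):
    """True if two equal-length sequences differ at exactly one position."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    mismatches = 0
    for a, b in zip(seq1, seq2):
        if a != b:
            mismatches += 1
            if mismatches > 1:
                # a second mismatch already rules this pair out
                return False
    return mismatches == 1


print(differs_by_one("ACDEFG", "ACDEFH"))  # True: one substitution
print(differs_by_one("ACDEFG", "ACDEFG"))  # False: identical
print(differs_by_one("ACDEFG", "TCDEFH"))  # False: two mismatches
```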
With Python: try an expression like:

    seq_pairs_of_interest = []
    for i, aseq in enumerate(input_seq_list[:-1]):
        for bseq in input_seq_list[i+1:]:
            if sum(a != b for a, b in zip(aseq, bseq)) == 1:
                seq_pairs_of_interest.append((aseq, bseq))

Hope that helps,
Eric

From nje5 at georgetown.edu Wed Feb 8 10:50:15 2012
From: nje5 at georgetown.edu (Nathan Edwards)
Date: Wed, 08 Feb 2012 10:50:15 -0500
Subject: [Biopython] comparing sequences.qustion
Message-ID: <4F3299B7.4090305@georgetown.edu>

Classical method (essentially BYP, obligatory reference to Goldberg):

* for each sequence, divide it in two, giving s1 and s2.
* place the sequence (or a reference/index to it) in a dictionary with
  list values, under key s1 and under key s2.

This is linear time.

Any pair of sequences that differ in only one position _must_ have at
least one of their halves in common, so do the detailed alignment on all
pairs of sequences with a common key. You specified unique, so each pair
must be considered at most once. If you had duplicates, these would be
aligned for each of their halves (and you'd have to normalize these out,
somehow). This will be a small fraction of all pairs, assuming these are
not pathological sequences.

This works well as long as the halves have enough specificity - for DNA,
length-10 halves should work. Note that this doesn't distinguish between
left halves and right halves, which might have the same key values but
obviously won't differ by one. Fixing this is an easy modification. BTW,
this works even for edit distance. The only concern is the in-memory
dictionary data structure, which can get big.

Untested pseudocode:

    from collections import defaultdict
    from itertools import combinations

    # sequences: the list of equal-length input strings
    # align(): the detailed BLOSUM alignment step
    n = 20  # length of every sequence
    halves = defaultdict(list)
    for s in sequences:
        s1 = s[:n // 2]
        s2 = s[n // 2:]
        halves[s1].append(s)
        halves[s2].append(s)

    for group in halves.values():
        for seq1, seq2 in combinations(group, 2):
            # check for one change before the expensive alignment
            if sum(a != b for a, b in zip(seq1, seq2)) == 1:
                align(seq1, seq2)

- n

--
Dr. Nathan Edwards                      nje5 at georgetown.edu
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center
Room 1215, Harris Building              Room 347, Basic Science
3300 Whitehaven St, NW                  3900 Reservoir Road, NW
Washington DC 20007                     Washington DC 20007
Phone: 202-687-7042                     Phone: 202-687-1618
Fax: 202-687-0057                       Fax: 202-687-7186

From nje5 at georgetown.edu Wed Feb 8 11:08:30 2012
From: nje5 at georgetown.edu (Nathan Edwards)
Date: Wed, 08 Feb 2012 11:08:30 -0500
Subject: [Biopython] comparing sequences.qustion
In-Reply-To: <4F3299B7.4090305@georgetown.edu>
References: <4F3299B7.4090305@georgetown.edu>
Message-ID: <4F329DFE.6010607@georgetown.edu>

Argh, Gusfield "Algorithms on Strings, Trees, and Sequences" is the
obligatory string matching reference...

- n

From d.m.a.martin at dundee.ac.uk Thu Feb 9 09:36:42 2012
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Thu, 9 Feb 2012 14:36:42 +0000
Subject: [Biopython] Proteomics tools in BioPython
Message-ID: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com>

We are planning to develop some proteomics tools in Python, with a view
to submitting them as part of Biopython.
Primarily we will be writing wrappers/parsers for the OpenMS tools/output
formats and analytic tools on top of that.
If anyone else is working on Python wrappers for OpenMS then I'd be
happy to share expertise.

..d

Dr David Martin
College of Life Sciences
University of Dundee

The University of Dundee is a registered Scottish Charity, No: SC015096

From p.j.a.cock at googlemail.com Thu Feb 9 13:10:39 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 9 Feb 2012 18:10:39 +0000
Subject: [Biopython] Proteomics tools in BioPython
In-Reply-To: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com>
References: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com>

Thanks David - that was quick, you must have sent this almost straight
after our chat this afternoon at the Dundee NextGenBUG meeting :)

Peter

From eric.talevich at gmail.com Thu Feb 9 15:41:02 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 9 Feb 2012 15:41:02 -0500
Subject: [Biopython] Proteomics tools in BioPython
In-Reply-To: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com>
References: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com>

Sounds great to me!
Since Google Summer of Code is coming up soon, do you see an opportunity
to take on a student to help out with this work or build something on
top of it?

-Eric

From d.m.a.martin at dundee.ac.uk Fri Feb 10 12:03:04 2012
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Fri, 10 Feb 2012 17:03:04 +0000
Subject: [Biopython] Proteomics tools in biopython
Message-ID: <959CFF5060375249824CC633DDDF896F08709A95@AMSPRD0402MB113.eurprd04.prod.outlook.com>

There may be potential but I don't have the time to get organized for
this year. We shall see how things progress.

..d

From rbuels at gmail.com Fri Feb 10 12:51:12 2012
From: rbuels at gmail.com (Robert Buels)
Date: Fri, 10 Feb 2012 12:51:12 -0500
Subject: [Biopython] Google Summer of Code project ideas
Message-ID: <4F355910.4060203@gmail.com>

Hi all,

I'm going to be OBF project admin again this year for Google Summer of
Code. OBF's application is due in a couple of weeks, and we need to
update our project ideas on the OBF wiki page and on each project's
individual wiki pages.
So, for each of the OBF projects that wants to do GSoC again this year,
please:

a.) Update the list of project ideas on your project's GSoC page
    (BioPython, BioPerl, BioRuby, etc). Add new ones, and remove ones
    that have already been done or are no longer relevant.
b.) Update the list of project ideas on the main OBF GSoC page
    (http://www.open-bio.org/wiki/Google_Summer_of_Code) to match.
c.) Let me know via email that you have done so and it's ready for
    Google to peruse.

Please have the updates done, if possible, by this Friday (March 11).
The number and quality of the project ideas are part of the evaluation
process for whether OBF is accepted as a Summer of Code organization
again this year, so let's come up with some good ones. :-)

Rob

----
Robert Buels
(prospective) 2012 OBF GSoC Organization Admin

From mjldehoon at yahoo.com Sat Feb 11 22:35:44 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 11 Feb 2012 19:35:44 -0800 (PST)
Subject: [Biopython] Digital gene expression
Message-ID: <1329017744.74960.YahooMailClassic@web161203.mail.bf1.yahoo.com>

Hi everybody,

EdgeR and DESeq are popular R packages to analyze differential
expression in digital gene expression methodologies such as RNAseq
and CAGE. Is something similar to these R packages available in Python
(or is anybody working on such a module for Biopython)? Not that I don't
like EdgeR / DESeq, but I'd prefer something in Python so that I can
understand what I am doing.

Thanks,
-Michiel.
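One piece of DESeq, its library-size normalization, is short enough to sketch in plain Python. The following is a minimal, hypothetical sketch. It is not part of Biopython or DESeq. It computes median-of-ratios size factors: for each sample, the median ratio of that sample's count to each gene's geometric mean across all samples. It skips genes that have a zero count in any sample. It does not attempt dispersion estimation or the differential-expression test itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts is a list of rows, one per gene, each holding the raw read
    counts for every sample. Genes with a zero count in any sample are
    skipped because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        # log of this gene's geometric mean across all samples
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # each sample's factor is its median ratio to the per-gene geometric means
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [50, 100], [5, 10], [0, 7]]
factors = size_factors(counts)
print(factors)  # roughly [0.707, 1.414]
print(normalize(counts, factors))
```

Working in log space avoids overflow when multiplying many counts for the geometric mean. The sketch raises an error if no gene has nonzero counts in every sample.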
From p.j.a.cock at googlemail.com Sun Feb 12 07:27:12 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 12 Feb 2012 12:27:12 +0000
Subject: [Biopython] Digital gene expression
In-Reply-To: <1329017744.74960.YahooMailClassic@web161203.mail.bf1.yahoo.com>
References: <1329017744.74960.YahooMailClassic@web161203.mail.bf1.yahoo.com>

I'm not sure, but try rpy or rpy2 for calling these R libraries
from Python. If you know both languages it is very powerful.

Peter

From tiagoantao at gmail.com Sun Feb 12 07:39:20 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 12 Feb 2012 12:39:20 +0000
Subject: [Biopython] Digital gene expression
References: <1329017744.74960.YahooMailClassic@web161203.mail.bf1.yahoo.com>

Hi,

On Sun, Feb 12, 2012 at 12:27 PM, Peter Cock wrote:
> I'm not sure, but try rpy or rpy2 for calling these R

rpy2 is an extremely declarative library. One almost forgets one is
writing R inside Python. It seems to be brilliantly well done. I have
only used it a couple of times myself, but I can only offer praise for it.

From dan.bolser at gmail.com Mon Feb 13 20:32:39 2012
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 14 Feb 2012 01:32:39 +0000
Subject: [Biopython] Fwd: Interested in Variation?

Job at the EBI:

"... the primary responsibility of the post-holder will be the
development of pipelines and storage solutions for variation data
deriving from whole genome re-sequencing."

http://goo.gl/eQrRu
or
http://ig14.i-grasp.com/fe/tpl_embl01.asp?s=LktVsYDaNlCOtQqCli&jobid=47627,4187528723&key=45520467&c=126152212583&pagestamp=dbnjwyufpgvrmyvkmt

Cheers,
Dan.

From chris.mit7 at gmail.com Tue Feb 14 15:30:41 2012
From: chris.mit7 at gmail.com (Chris Mitchell)
Date: Tue, 14 Feb 2012 15:30:41 -0500
Subject: [Biopython] Proteomics tools in BioPython

Hey David,

What sort of tools do you have in mind for proteomics? I have quite a few
stashed away (3/6 frame translations, GFF files->proteins, X!Tandem
parsers/FDR calculators, GFF parsers, etc.)

Chris

From d.m.a.martin at dundee.ac.uk Wed Feb 15 12:22:48 2012
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Wed, 15 Feb 2012 17:22:48 +0000
Subject: [Biopython] Proteomics tools for biopython
Message-ID: <959CFF5060375249824CC633DDDF896F08710546@AMSPRD0402MB113.eurprd04.prod.outlook.com>

At present we are wrapping the OpenMS outputs (featureML etc) so that we
can interrogate the detail of how the runs behave. It is insightful to
see (for example) how many of the ms/ms are on overlapping peptides, and
the distribution of ms/ms selections per feature (vs intensity).

This is just the first stage. Having these data (which up till now have
been difficult to access) allows for building of smarter tools (custom
delta mass thresholds for each ms/ms, second peptide searching, seeing
whether all the peptide ID for a feature agree, correlating ID from
different search engines to the same spectra).
There are outstanding questions from our users for things like 'is it
really necessary to do duplicate runs?' or in other words, can we get
the machine to treat duplicate runs differently to optimise ID (under
the principle that madness is doing the same thing repeatedly but
expecting different results).

Parsers for X!Tandem would be really useful as that is something we'd
like to have in our tool chain. A Mascot one would be good - I am looking
into that (it is on my list of things to do, just not near the top right
now).

I very much favour a modular approach where each class/object does one
thing really well and can feed output to another class, and all can be
represented using open formats.

It might be a good idea to arrange a telecon or Skype group chat for
people who are interested in contributing to this and building a
comprehensive set of tools into Biopython. I can't promise too much from
our end but we are making good progress and we have a strong commitment
to open software and algorithms, with a heavy Python development
presence.

..d

From Achim.Treumann at NEPAF.com Wed Feb 15 13:49:30 2012
From: Achim.Treumann at NEPAF.com (Achim Treumann)
Date: Wed, 15 Feb 2012 18:49:30 -0000
Subject: [Biopython] Proteomics tools for biopython
In-Reply-To: <959CFF5060375249824CC633DDDF896F08710546@AMSPRD0402MB113.eurprd04.prod.outlook.com>
References: <959CFF5060375249824CC633DDDF896F08710546@AMSPRD0402MB113.eurprd04.prod.outlook.com>
Message-ID: <01798D2396253A449511F31F1CDE835519D5ED@srv1.NEPAF.local>

Hi,

I could possibly contribute a few snippets regarding an X!Tandem parser
and a few other little tools - at the moment I am very busy, but I can
take part in the discussion more in March.
Another interesting toolset, released today by Mike Gorshkov's research
group, is pyteomics 1.0.0:
http://pypi.python.org/pypi/pyteomics/1.0.0

Best wishes,
Achim

From schnoes at gmail.com Thu Feb 16 20:18:30 2012
From: schnoes at gmail.com (Alexandra Schnoes)
Date: Thu, 16 Feb 2012 17:18:30 -0800
Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500

Hi,

I have some Python code (using Biopython 1.58) that uses Bio.Entrez to
pull out information on 50 papers from PubMed. I had no problem with
this code until yesterday, when I started getting HTTP Error 500
messages, which have continued into today. For example...

Traceback (most recent call last):
  File "", line 1, in <module>
  File "sp_tools.py", line 421, in top_papers_dict
    rettype="medline", retmode="text")
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 113, in efetch
    return _open(cgi, variables)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 360, in _open
    raise exception
urllib2.HTTPError: HTTP Error 500: Internal server error

The parameters I'm using are
db = pubmed
rettype = medline
retmode = text

Does anyone have an idea why this might be happening now?

Thanks!
Alexandra

--
Alexandra Schnoes, Ph.D.
Scientist, Babbitt Laboratory Program Coordinator, Graduate Student Internships for Career Exploration University of California San Francisco Tel: 415-502-1248 Fax: 415-514-9656 Email: schnoes at gmail.com From jgrant at smith.edu Thu Feb 16 21:39:54 2012 From: jgrant at smith.edu (Jessica Grant) Date: Thu, 16 Feb 2012 21:39:54 -0500 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References: Message-ID: I have a script I have used successfully in the past that uses Entrez. Yesterday, a lab-mate was using it but about halfway through the files she was processing she got an error and it wouldn't work after that. I went over the script from top to bottom to see what went wrong and couldn't find a problem. Your question gives me hope that it is something happening at NCBI. Our error was not identical. Our script seemed to work until it tried to read the output of the Entrez.efetch and then found no record in the handle (or something like that; I don't have it in front of me.) I suggested my lab-mate contact the NCBI help desk to see if anything was wrong on their end, but I don't know if she did or if she heard back. Jessica On Thu, Feb 16, 2012 at 8:18 PM, Alexandra Schnoes wrote: > Hi, > > I have some python code (using BioPython 1.58) that uses Bio.Entrez to pull > out information on 50 papers from pubmed. I have had no problem with this > code until yesterday when I started getting HTTP Error 500 messages that > continue until today. For example...
> > Traceback (most recent call last): > File "", line 1, in > File "sp_tools.py", line 421, in top_papers_dict > rettype="medline", retmode="text") > File > > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", > line 113, in efetch > return _open(cgi, variables) > File > > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", > line 360, in _open > raise exception > urllib2.HTTPError: HTTP Error 500: Internal server error > > The parameters I'm using are > db = pubmed > rettype = medline > retmode = text > > Anyone have an idea why this might be happening now? > > Thanks! > Alexandra > > > -- > Alexandra Schnoes, Ph.D. > Scientist, Babbitt Laboratory > Program Coordinator, Graduate Student Internships for Career Exploration > University of California San Francisco > Tel: 415-502-1248 > Fax: 415-514-9656 > Email: schnoes at gmail.com From mictadlo at gmail.com Fri Feb 17 01:39:27 2012 From: mictadlo at gmail.com (Mic) Date: Fri, 17 Feb 2012 16:39:27 +1000 Subject: [Biopython] histogram plot of insert size Message-ID: Hi all, How is it possible to create a histogram plot of insert sizes with pysam/Biopython? Thank you in advance. Cheers From schnoes at gmail.com Fri Feb 17 02:31:10 2012 From: schnoes at gmail.com (Alexandra Schnoes) Date: Thu, 16 Feb 2012 23:31:10 -0800 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References:

Message-ID: Thanks, that is somewhat encouraging to hear. I have also emailed NCBI (that's actually where the multi-repeat subject line came from. My computer sometimes has weird lags and apparently I hit ctrl-v a couple of times when copying over the email I sent to NCBI, and didn't notice it. My apologies). Hopefully one of us will hear something soon! Alex On Thu, Feb 16, 2012 at 6:39 PM, Jessica Grant wrote: > I have a script I have used successfully in the past that uses Entrez. > Yesterday, a lab-mate was using it but about halfway through the files > she was processing she got an error and it wouldn't work after that. I > went over the script from top to bottom to see what went wrong and couldn't > find a problem. Your question gives me hope that it is something happening > at NCBI. Our error was not identical. Our script seemed to work until it > tried to read the output of the Entrez.efetch and then found no record in > the handle (or something like that; I don't have it in front of me.) > > I suggested my lab-mate contact the NCBI help desk to see if anything was > wrong on their end, but I don't know if she did or if she heard back. > > Jessica > > > > > > On Thu, Feb 16, 2012 at 8:18 PM, Alexandra Schnoes wrote: > >> Hi, >> >> I have some python code (using BioPython 1.58) that uses Bio.Entrez to >> pull >> out information on 50 papers from pubmed. I have had no problem with this >> code until yesterday when I started getting HTTP Error 500 messages that >> continue until today. For example...
>> >> Traceback (most recent call last): >> File "", line 1, in >> File "sp_tools.py", line 421, in top_papers_dict >> rettype="medline", retmode="text") >> File >> >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >> line 113, in efetch >> return _open(cgi, variables) >> File >> >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >> line 360, in _open >> raise exception >> urllib2.HTTPError: HTTP Error 500: Internal server error >> >> The parameters I'm using are >> db = pubmed >> rettype = medline >> retmode = text >> >> Anyone have an idea why this might be happening now? >> >> Thanks! >> Alexandra >> >> >> From p.j.a.cock at googlemail.com Fri Feb 17 04:56:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 09:56:05 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References:

Message-ID: On Fri, Feb 17, 2012 at 7:31 AM, Alexandra Schnoes wrote: > Thanks, that is somewhat encouraging to hear. I have also emailed NCBI > (that's actually where the multi-repeat subject line came from. My computer > sometimes has weird lags and apparently I hit ctrl-v a couple of times when > copying over the email I sent to NCBI, and didn't notice it. My apologies). > Hopefully one of us will hear something soon! > Alex This is probably due to a recent NCBI change (on Wednesday 15 Feb 2012) with the release of EFetch 2.0, see: http://www.ncbi.nlm.nih.gov/mailman/pipermail/utilities-announce/2012-February/000085.html http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.Release_Notes They have changed things with the default retmode, although looking at their Table 1 from the above link, using rettype="medline", retmode="text" still looks OK with both the db="pmc" and db="pubmed" databases. Peter From mictadlo at gmail.com Fri Feb 17 05:13:31 2012 From: mictadlo at gmail.com (Mic) Date: Fri, 17 Feb 2012 20:13:31 +1000 Subject: [Biopython] histogram plot of insert size In-Reply-To: <84771354-DF77-49FE-B1EC-F3EAA1FF21F6@hsr.it> References: <84771354-DF77-49FE-B1EC-F3EAA1FF21F6@hsr.it> Message-ID: Hi Cittaro, Thank you for your solution. I ran fixmate from samtools on a BAM file: HWI-ST226_0154:4:1206:12773:170407#CTTGTA 73 A_01a 1046 30 100M * 0 0 AGTATAAAACTAAGCAAACTGTTAGAACTTTGATTACGTTTTGTTTATCAGTGATACGCAAAAGTTTAAGATCCTTGAGTACCTCTTTCGATGGCGGATT fdfffdfffbddaec_dSc^ddd^dQc^`Udddad_`c^^ac`R_NV\\]T^c`T_Mc]aV[V\R`Xa^^EKIIVcccc`YNY]UV[U`BBBBBBBBBBB NM:i:1 MD:Z:73T26 Which column should I use to fill isize_array? Thank you in advance. On Fri, Feb 17, 2012 at 7:07 PM, Cittaro Davide wrote: > > On Feb 17, 2012, at 7:39 AM, Mic wrote: > > > Hi all, > > How is it possible to create a histogram plot of insert sizes with > pysam/Biopython? > > As long as you have some plotting library, yes.
> Take a look at matplotlib and try this: > > import matplotlib.pyplot as plt > > f = plt.figure() > h = f.add_subplot(111) > h.hist(isize_array, bins=50, normed=True) > f.savefig('myhist.pdf', format='pdf') > > assuming isize_array is your array/list of insert sizes > > d > /* > Davide Cittaro, PhD > > Head of Bioinformatics Core > Center for Translational Genomics and Bioinformatics > San Raffaele Scientific Institute > Via Olgettina 58 > 20132 Milano > Italy > > Office: +39 02 26439140 > Mail: cittaro.davide at hsr.it > Skype: daweonline > */ From Matej.Repic at ki.si Fri Feb 17 08:14:10 2012 From: Matej.Repic at ki.si (Matej Repič) Date: Fri, 17 Feb 2012 13:14:10 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Message-ID: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si> Fortunately, the fix is quite simple: Substitute the id=idlist in your fetch line with id=",".join(idlist). Explanation: This has something to do with how PubMed accepts a Python list. If you enter PMIDs by hand it works OK, but if you feed it a Python list you get an internal server error. Instead of using the procedure from the Cookbook, where you feed the list: >>> fetch_handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463) >>> record = Entrez.read(fetch_handle) >>> idlist = record["IdList"] >>> record_handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text") Use this slightly modified line for record_handle. With ",".join(idlist) you convert the idlist Python list to a comma-separated string, which works as expected.
>>> fetch_handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463) >>> record = Entrez.read(fetch_handle) >>> idlist = record["IdList"] >>> record_handle = Entrez.efetch(db="pubmed", id=",".join(idlist), rettype="medline", retmode="text") Kind regards, Matej ---------------------------------------------------------- Matej Repič Junior Researcher Laboratory for Biocomputing and Bioinformatics National Institute of Chemistry Hajdrihova 19 SI-1001 Ljubljana POB 660 Slovenia tel: +386-1-4760457 e-mail: matej.repic at ki.si ---------------------------------------------------------- From p.j.a.cock at googlemail.com Fri Feb 17 08:36:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 13:36:13 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si> References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si> Message-ID: On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: > Fortunately, the fix is quite simple: > > Substitute the id=idlist in your fetch line with id=",".join(idlist). > Hi Matej, Well spotted.
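A minimal sketch of the two query strings at issue in this thread, using only Python 3's standard `urllib.parse` (this is not Bio.Entrez's own code, and `query_repeated`/`query_joined` are hypothetical helper names). Sending a list as repeated `id=` keys gives the server several separate values, while joining the list first gives one `id` value whose commas are percent-encoded as `%2C`, the same shape as the two EFetch URLs compared in this thread.

```python
from urllib.parse import parse_qs, urlencode


def query_repeated(db, ids):
    """Repeat the key once per ID: db=...&id=A&id=B (the form reported here to fail)."""
    return urlencode([("db", db)] + [("id", i) for i in ids])


def query_joined(db, ids):
    """Send all IDs as one comma-separated value; urlencode escapes ',' as %2C."""
    return urlencode([("db", db), ("id", ",".join(ids))])


if __name__ == "__main__":
    ids = ["22307645", "22303114", "22301129"]  # PMIDs taken from the example URL in this thread
    repeated = query_repeated("pubmed", ids)
    joined = query_joined("pubmed", ids)
    print(repeated)  # db=pubmed&id=22307645&id=22303114&id=22301129
    print(joined)    # db=pubmed&id=22307645%2C22303114%2C22301129
    # A server parsing these sees three separate "id" values versus a single one.
    print(parse_qs(repeated)["id"])  # ['22307645', '22303114', '22301129']
    print(parse_qs(joined)["id"])    # ['22307645,22303114,22301129']
```

With an unpatched Bio.Entrez, passing `",".join(idlist)` yourself is what produces the second form.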
The idea is that this call, Entrez.efetch(db="pubmed", id=['22307645', '22303114', '22301129', '22299544', '22298842'], rettype="medline", retmode="text") accesses this URL (where I have removed the email entry) which used to work but isn't following the letter of the specification: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&tool=biopython&db=pubmed&id=22307645&id=22303114&id=22301129&id=22299544&id=22298842&rettype=medline Whereas this call: h = Entrez.efetch(db="pubmed", id="22307645,22303114,22301129,22299544,22298842", rettype="medline", retmode="text") actually uses a different URL (which the NCBI would approve of): http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&tool=biopython&db=pubmed&id=22307645%2C22303114%2C22301129%2C22299544%2C22298842&rettype=medline It is possible the NCBI may opt to "fix" this, but it looks like it was only working in the past by accident. However, we can do the conversion inside the Bio.Entrez.efetch function in future - we're due for another Biopython release now anyway so you shouldn't have to wait too long. I'm having general errors from the Entrez server right now - so I can't confirm the problem or test the potential fix yet. Peter From p.j.a.cock at googlemail.com Fri Feb 17 09:41:26 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 14:41:26 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si> Message-ID: On Fri, Feb 17, 2012 at 1:36 PM, Peter Cock wrote: > On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: >> Fortunately, the fix is quite simple: >> >> Substitute the id=idlist in your fetch line with id=",".join(idlist). >> > > Hi Matej, > > Well spotted. ... > > It is possible the NCBI may opt to "fix" this, but it looks like it was only > working in the past by accident.
However, we can do the conversion inside > the Bio.Entrez.efetch function in future - we're due for another Biopython > release now anyway so you shouldn't have to wait too long. > > I'm having general errors from the Entrez server right now - so I can't confirm > the problem or test the potential fix yet. I guess they kicked the server or something - it is working again now, and I could confirm Matej Repič's findings and test my fix based on them: https://github.com/biopython/biopython/commit/01b091cd4679b58d7e478734324528dd9d52f3ed If anyone needs the fix right now, you must install Biopython from source, or at least update the Bio/Entrez/__init__.py file by hand. Some testing would be appreciated - and then we'll try to expedite the release of Biopython 1.59 by the end of the month. Peter P.S. If anyone wants a small challenge for contributing to Biopython, an online unit test for this and other things in Bio.Entrez would be great. Please ask for more information on the biopython-dev list if you're interested in helping out. From jgrant at smith.edu Fri Feb 17 09:52:21 2012 From: jgrant at smith.edu (Jessica Grant) Date: Fri, 17 Feb 2012 09:52:21 -0500 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si>

Message-ID: <937A6C1B-3C8D-4820-AD11-FA2BA730B047@smith.edu> My problem was fixed by changing the db to "nuccore" and the rettype to "fasta". Up until the other day, I was successfully using "nucleotide" and "gb". Jessica On Feb 17, 2012, at 9:41 AM, Peter Cock wrote: > On Fri, Feb 17, 2012 at 1:36 PM, Peter Cock wrote: >> On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: >>> Fortunately, the fix is quite simple: >>> >>> Substitute the id=idlist in your fetch line with id=",".join(idlist). >>> >> >> Hi Matej, >> >> Well spotted. ... >> >> It is possible the NCBI may opt to "fix" this, but it looks like it was only >> working in the past by accident. However, we can do the conversion inside >> the Bio.Entrez.efetch function in future - we're due for another Biopython >> release now anyway so you shouldn't have to wait too long. >> >> I'm having general errors from the Entrez server right now - so I can't confirm >> the problem or test the potential fix yet. > > I guess they kicked the server or something - it is working again now, and > I could confirm Matej Repič's findings and test my fix based on them: > https://github.com/biopython/biopython/commit/01b091cd4679b58d7e478734324528dd9d52f3ed > > If anyone needs the fix right now, you must install Biopython from source, > or at least update the Bio/Entrez/__init__.py file by hand. Some testing > would be appreciated - and then we'll try to expedite the release of > Biopython 1.59 by the end of the month. > > Peter > > P.S. If anyone wants a small challenge for contributing to Biopython, an > online unit test for this and other things in Bio.Entrez would be great. > Please ask for more information on the biopython-dev list if you're > interested in helping out.
From schnoes at gmail.com Fri Feb 17 12:33:59 2012 From: schnoes at gmail.com (Alexandra Schnoes) Date: Fri, 17 Feb 2012 09:33:59 -0800 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: <937A6C1B-3C8D-4820-AD11-FA2BA730B047@smith.edu> References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si>

<937A6C1B-3C8D-4820-AD11-FA2BA730B047@smith.edu> Message-ID: Wow. That was quick! Thanks guys! Alex On Fri, Feb 17, 2012 at 6:52 AM, Jessica Grant wrote: > My problem was fixed by changing the db to "nuccore" and the rettype to > "fasta". Up until the other day, I was successfully using "nucleotide" and > "gb". > > Jessica > > > > > > > > On Feb 17, 2012, at 9:41 AM, Peter Cock wrote: > > > On Fri, Feb 17, 2012 at 1:36 PM, Peter Cock > wrote: > >> On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: > >>> Fortunately, the fix is quite simple: > >>> > >>> Substitute the id=idlist in your fetch line with id=",".join(idlist). > >>> > >> > >> Hi Matej, > >> > >> Well spotted. ... > >> > >> It is possible the NCBI may opt to "fix" this, but it looks like it was > only > >> working in the past by accident. However, we can do the conversion > inside > >> the Bio.Entrez.efetch function in future - we're due for another > Biopython > >> release now anyway so you shouldn't have to wait too long. > >> > >> I'm having general errors from the Entrez server right now - so I can't > confirm > >> the problem or test the potential fix yet. > > > > I guess they kicked the server or something - it is working again now, > and > > I could confirm Matej Repič's findings and test my fix based on them: > > > https://github.com/biopython/biopython/commit/01b091cd4679b58d7e478734324528dd9d52f3ed > > > > If anyone needs the fix right now, you must install Biopython from > source, > > or at least update the Bio/Entrez/__init__.py file by hand. Some testing > > would be appreciated - and then we'll try to expedite the release of > > Biopython 1.59 by the end of the month. > > > > Peter > > > > P.S. If anyone wants a small challenge for contributing to Biopython, an > > online unit test for this and other things in Bio.Entrez would be great. > > Please ask for more information on the biopython-dev list if you're > > interested in helping out.
From Matej.Repic at ki.si Fri Feb 17 16:21:24 2012 From: Matej.Repic at ki.si (Matej Repič) Date: Fri, 17 Feb 2012 21:21:24 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si>

Message-ID: <7E558ACD-59B7-41B0-BE87-998704A3940C@ki.si> The fix is working for me. I updated the __init__.py by hand and now both approaches work. By both I mean id=idlist or id=",".join(idlist). Great and quick fix! Regards, Matej On 17. feb. 2012, at 15:41, Peter Cock wrote: > On Fri, Feb 17, 2012 at 1:36 PM, Peter Cock wrote: >> On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: >>> Fortunately, the fix is quite simple: >>> >>> Substitute the id=idlist in your fetch line with id=",".join(idlist). >>> >> >> Hi Matej, >> >> Well spotted. ... >> >> It is possible the NCBI may opt to "fix" this, but it looks like it was only >> working in the past by accident. However, we can do the conversion inside >> the Bio.Entrez.efetch function in future - we're due for another Biopython >> release now anyway so you shouldn't have to wait too long. >> >> I'm having general errors from the Entrez server right now - so I can't confirm >> the problem or test the potential fix yet. > > I guess they kicked the server or something - it is working again now, and > I could confirm Matej Repič's findings and test my fix based on them: > https://github.com/biopython/biopython/commit/01b091cd4679b58d7e478734324528dd9d52f3ed > > If anyone needs the fix right now, you must install Biopython from source, > or at least update the Bio/Entrez/__init__.py file by hand. Some testing > would be appreciated - and then we'll try to expedite the release of > Biopython 1.59 by the end of the month. > > Peter > > P.S. If anyone wants a small challenge for contributing to Biopython, an > online unit test for this and other things in Bio.Entrez would be great. > Please ask for more information on the biopython-dev list if you're > interested in helping out.
From p.j.a.cock at googlemail.com Fri Feb 17 17:33:23 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 22:33:23 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: <7E558ACD-59B7-41B0-BE87-998704A3940C@ki.si> References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si>

<7E558ACD-59B7-41B0-BE87-998704A3940C@ki.si> Message-ID: On Fri, Feb 17, 2012 at 9:21 PM, Matej Repič wrote: > The fix is working for me. I updated the __init__.py by hand > and now both approaches work. By both I mean id=idlist or > id=",".join(idlist). > > Great and quick fix! > > Regards, > Matej Great - independent testing is always good, thanks. The NCBI have now announced this change was a deliberate tightening up of the Entrez API, so we do need this fix in Biopython to avoid existing scripts breaking: http://www.ncbi.nlm.nih.gov/mailman/pipermail/utilities-announce/2012-February/000086.html Peter From mictadlo at gmail.com Sat Feb 18 02:53:32 2012 From: mictadlo at gmail.com (Mic) Date: Sat, 18 Feb 2012 17:53:32 +1000 Subject: [Biopython] Google Summer of Code 2012 Message-ID: Hi all, would it be possible to propose the following idea for Google Summer of Code 2012: * something like http://www.biogems.info/ for Biopython, using pip (http://pypi.python.org/pypi/pip). It could include pysam, GFF3, OBO/OWL (see below) and so on. Cheers, On Tue, Dec 27, 2011 at 3:55 AM, Martin Mokrejs wrote: > Hi, > I would like to parse some OBO/OWL files in python. I searched > for some existing code and found http://biopython.org/wiki/Gene_Ontology > pointing to some OWL parser from Ed Cannon (follow a link on the page > listed above). Unfortunately, the code is gone? :(( > > I also discovered an OBO parser at http://hal.elte.hu/~nepusz/development > , > the sources can be fetched from > > http://bazaar.launchpad.net/~ntamas/+junk/go-parser/tarball/7?start_revid=7 > > It can open the .obo files for me although I do not see many methods > available. > > Finally, I found > https://github.com/gotgenes/biopython/tree/a4824ceb71f3a687b3eb5e1fefd0ad3c278bf185/Bio/GO so > my question is when will this be available in released biopython > and what are your opinions/suggestions now. Does it offer more than the > go-parser from ~ntamas?
> > I want to cluster some sequences based on anatomical terms, so > I think what I want is to be able to easily look up all parents > (probably except the very root node or so) and compare whether > they overlap with any parent of another sequence. > > Thank you for your comments, > Martin > P.S.: I want to parse OBO from http://www.evocontology.org/ From eric.talevich at gmail.com Sat Feb 18 11:34:18 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:34:18 -0500 Subject: [Biopython] Bio.Phylo bugs & pain points Message-ID: Folks, Since we're coming up on another release of Biopython, I'd like to identify any remaining bugs, pain points, aesthetic flaws, and minor missing features in Bio.Phylo. (And hopefully, fix them before the release.) In particular, the Phylo.draw() function, which plots a rooted phylogram with matplotlib, appeared in the last Biopython release unannounced. There are already many tree-drawing programs that produce beautiful publication-quality graphics, and we're not trying to compete with those. But we do want it to be useful for quickly visualizing a tree as you develop a script or modify a tree interactively in IPython, for example. So -- do the trees drawn by Phylo.draw() look right? Thanks, Eric From mjldehoon at yahoo.com Sat Feb 18 11:46:33 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 18 Feb 2012 08:46:33 -0800 (PST) Subject: [Biopython] Bio.Phylo bugs & pain points In-Reply-To: Message-ID: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> Hi Eric, > But we do want it to be useful for quickly visualizing a > tree as you develop a script or modify a tree > interactively in IPython, for example. Do you need IPython or is regular Python sufficient? -Michiel.
--- On Sat, 2/18/12, Eric Talevich wrote: > From: Eric Talevich > Subject: [Biopython] Bio.Phylo bugs & pain points > To: "BioPython Mailing List" , "BioPython-Dev Mailing List" > Date: Saturday, February 18, 2012, 11:34 AM > Folks, > > Since we're coming up on another release of Biopython, I'd > like to identify > any remaining bugs, pain points, aesthetic flaws, and minor > missing features > in Bio.Phylo. (And hopefully, fix them before the release.) > > In particular, the Phylo.draw() function, which plots a > rooted phylogram > with matplotlib, appeared in the last Biopython release > unannounced. There > are already many tree-drawing programs that produce > beautiful > publication-quality graphics, and we're not trying to > compete with those. > But we do want it to be useful for quickly visualizing a > tree as you > develop a script or modify a tree interactively in IPython, > for example. So > -- do the trees drawn by Phylo.draw() look right? > > Thanks, > Eric From eric.talevich at gmail.com Sat Feb 18 11:52:16 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:52:16 -0500 Subject: [Biopython] Have you used Bio.Phylo in a published study? Message-ID: Folks, Now that Bio.Phylo has reached a somewhat stable point, we're preparing a journal article on it. I'd like to mention and cite some published studies in which Bio.Phylo was used for some part of the analysis. Has anyone here published a study that relied on the Phylo module of Biopython?
I know of two so far: http://www.biology-direct.com/content/6/1/34/ http://www.biomedcentral.com/1471-2148/11/321 Thanks, Eric From eric.talevich at gmail.com Sat Feb 18 11:54:03 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:54:03 -0500 Subject: [Biopython] Bio.Phylo bugs & pain points In-Reply-To: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> References: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> Message-ID: On Sat, Feb 18, 2012 at 11:46 AM, Michiel de Hoon wrote: > Hi Eric, > > > But we do want it to be useful for quickly visualizing a > > tree as you develop a script or modify a tree > > interactively in IPython, for example. > > Do you need IPython or is regular Python sufficient? > > -Michiel. > Regular Python plus matplotlib is sufficient. IPython has convenient integration with pylab, that's all. -E > > --- On Sat, 2/18/12, Eric Talevich wrote: > > > From: Eric Talevich > > Subject: [Biopython] Bio.Phylo bugs & pain points > > To: "BioPython Mailing List" , > "BioPython-Dev Mailing List" > > Date: Saturday, February 18, 2012, 11:34 AM > > Folks, > > > > Since we're coming up on another release of Biopython, I'd > > like to identify > > any remaining bugs, pain points, aesthetic flaws, and minor > > missing features > > in Bio.Phylo. (And hopefully, fix them before the release.) > > > > In particular, the Phylo.draw() function, which plots a > > rooted phylogram > > with matplotlib, appeared in the last Biopython release > > unannounced. There > > are already many tree-drawing programs that produce > > beautiful > > publication-quality graphics, and we're not trying to > > compete with those. > > But we do want it to be useful for quickly visualizing a > > tree as you > > develop a script or modify a tree interactively in IPython, > > for example. So > > -- do the trees drawn by Phylo.draw() look right?
> > > > Thanks, > > Eric From eric.talevich at gmail.com Sat Feb 18 12:11:27 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 12:11:27 -0500 Subject: [Biopython] Bio.Phylo bugs & pain points In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 11:34 AM, Eric Talevich wrote: > So -- do the trees drawn by Phylo.draw() look right? > > Here's how to get a quick tree, using a test file from the Biopython source distribution:
>>> from Bio import Phylo
>>> tree = Phylo.read("Tests/PhyloXML/apaf.xml", "phyloxml")
>>> Phylo.draw(tree)
If you don't have the Tests/ directory, you can use any other Newick, Nexus or PhyloXML tree; just change the file name and format name in the call to Phylo.read(). Thanks, Eric From mictadlo at gmail.com Mon Feb 20 01:37:59 2012 From: mictadlo at gmail.com (Mic) Date: Mon, 20 Feb 2012 16:37:59 +1000 Subject: [Biopython] retrieving paired sequences from BAM Message-ID: Hello, I am trying to retrieve paired-end sequences from a BAM file. However, I am only able to retrieve the A sequence with the following code:
''''''''''''''''''''''''''''''''''''''''
import pysam

samfile = pysam.Samfile("b.sorted.bam", "rb")
for read in samfile.fetch():
    if read.is_paired:
        print(read.qname)
        print(read.seq)
samfile.close()

$ python a.py | head -n 2
HWI-ST226_0154:4:1206:12773:170407#CTTGTA
AGTATAAAACTAAGCAAACTGTTAGAACTTTGATTACGTTTTGTTTATCAGTGATACGCAAAAGTTTAAGATCCTTGAGTACCTCTTTCGATGGCGGATT
''''''''''''''''''''''''''''''''''''''''
How can I also retrieve the B sequence? What is the difference between is_proper_pair and is_paired?
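A minimal stdlib sketch (plain Python, not pysam) of how the SAM FLAG column (column 2) separates the two properties asked about: bit 0x1 only says the read comes from a paired-end template, while bit 0x2 says the aligner placed both mates as a proper pair. pysam's `is_paired` and `is_proper_pair` report these two bits; the short labels below are my own names for the values in the SAM specification, and `decode_flag`, `parse_sam_line` and `insert_sizes` are hypothetical helpers. Applied to the record posted earlier in this archive for the same read name (FLAG 73 = 0x1 + 0x8 + 0x40), it suggests that this /1 read's mate is unmapped, which would explain why only the A sequence comes back and why TLEN (column 9, the insert size) is 0.

```python
# SAM FLAG bits from the SAM format specification; the short names are my own labels.
SAM_FLAG_BITS = [
    (0x1, "PAIRED"),         # read comes from a paired-end template (pysam: is_paired)
    (0x2, "PROPER_PAIR"),    # aligner marked the mates as a proper pair (pysam: is_proper_pair)
    (0x4, "UNMAPPED"),
    (0x8, "MATE_UNMAPPED"),
    (0x10, "REVERSE"),
    (0x20, "MATE_REVERSE"),
    (0x40, "READ1"),         # first read of the pair (e.g. the /1 or "A" file)
    (0x80, "READ2"),         # second read of the pair (e.g. the /2 or "B" file)
    (0x100, "SECONDARY"),
    (0x200, "QC_FAIL"),
    (0x400, "DUPLICATE"),
    (0x800, "SUPPLEMENTARY"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def parse_sam_line(line):
    """Pick a few named fields out of one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "tlen": int(fields[8]),  # column 9: observed template length (insert size)
    }


def insert_sizes(sam_lines):
    """Collect |TLEN| from the first read of each proper pair, skipping header lines."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        rec = parse_sam_line(line)
        if rec["flag"] & 0x2 and rec["flag"] & 0x40 and rec["tlen"] != 0:
            sizes.append(abs(rec["tlen"]))
    return sizes


# The record quoted earlier in this archive; SEQ and QUAL are truncated for brevity.
EXAMPLE_LINE = "\t".join([
    "HWI-ST226_0154:4:1206:12773:170407#CTTGTA", "73", "A_01a", "1046", "30",
    "100M", "*", "0", "0", "AGTATAAAAC", "fdfffdfffb",
])

if __name__ == "__main__":
    rec = parse_sam_line(EXAMPLE_LINE)
    print(rec["flag"], decode_flag(rec["flag"]), rec["tlen"])
    # 73 ['PAIRED', 'MATE_UNMAPPED', 'READ1'] 0
```

`insert_sizes` is one possible way to fill the `isize_array` from the histogram thread. Keeping only READ1 counts each template once, because both mates carry the same TLEN with opposite signs.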
-------------
$ grep -A 1 'HWI-ST226_0154:4:1206:12773:170407#CTTGTA' X_A.fastq
@HWI-ST226_0154:4:1206:12773:170407#CTTGTA/1
AGTATAAAACTAAGCAAACTGTTAGAACTTTGATTACGTTTTGTTTATCAGTGATACGCAAAAGTTTAAGATCCTTGAGTACCTCTTTCGATGGCGGATT

$ grep -A 1 'HWI-ST226_0154:4:1206:12773:170407#CTTGTA' X_B.fastq
@HWI-ST226_0154:4:1206:12773:170407#CTTGTA/2
GGCGCTGTGACTGAAGTCCTCAGATTTCGGTACGGTTTTGTCTATTTCTGGGTTCCTGCGGAAACACCTTCTCGATTATTTTCTAATCTCAATTAGGTTT
-------------

Thank you in advance.

Cheers

From MatatTHC at gmx.de  Wed Feb 22 10:06:02 2012
From: MatatTHC at gmx.de (Matthias Bernt)
Date: Wed, 22 Feb 2012 16:06:02 +0100
Subject: [Biopython] Degenerated Codons
Message-ID: <20120222150602.313750@gmx.net>

Hi,

Is there some functionality in Biopython:
- to list X-fold degenerate codons?
- to test if a codon is X-fold degenerate?
- to return X for a given codon?

If not, then we might forward this to the dev-list. I would try to implement this.

Matthias

From p.j.a.cock at googlemail.com  Wed Feb 22 10:58:36 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 22 Feb 2012 15:58:36 +0000
Subject: [Biopython] Degenerated Codons
In-Reply-To: <20120222150602.313750@gmx.net>
References: <20120222150602.313750@gmx.net>
Message-ID:

On Wed, Feb 22, 2012 at 3:06 PM, Matthias Bernt wrote:
> Hi,
>
> Is there some functionality in Biopython:
> - to list X-fold degenerate codons?
> - to test if a codon is X-fold degenerate?
> - to return X for a given codon?
>
> If not, then we might forward this to the dev-list. I would try to implement this.
>
> Matthias

Hi Matthias,

You can probably do that with the codon dictionaries
provided in Bio.Data.CodonTable (used for translations).
If you could give some specific examples of input and
desired output I'm sure we can give you more hints.
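The lookups discussed in this thread can be sketched without Biopython. Here a forward table (codon to amino acid, stop codons omitted) is built by hand for the standard code, NCBI table 1; it stands in for table.forward_table from Bio.Data.CodonTable, and all function names are hypothetical. codons_by_amino_acid() inverts the table once, which is one way to get an "all codons per amino acid" table without scanning the forward table on every call.

```python
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order;
# "*" marks the three stop codons.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def make_forward_table(amino_string=STANDARD_AA):
    """Build a codon -> amino acid dict, leaving out stop codons."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return {codon: aa for codon, aa in zip(codons, amino_string) if aa != "*"}


def codons_by_amino_acid(forward_table):
    """Invert the table once: amino acid -> sorted list of all its codons."""
    table = defaultdict(list)
    for codon, aa in forward_table.items():
        table[aa].append(codon)
    return {aa: sorted(codons) for aa, codons in table.items()}


def synonymous_codons(codon, forward_table):
    """All codons for the same amino acid as `codon`, itself included.

    Raises KeyError for stop codons, since they are not in the table.
    """
    amino = forward_table[codon.upper()]
    return sorted(c for c, aa in forward_table.items() if aa == amino)


def degeneracy(codon, forward_table):
    """X: the number of codons coding for the same amino acid as `codon`."""
    return len(synonymous_codons(codon, forward_table))


def is_x_fold(codon, forward_table, x):
    """True if exactly `x` codons code for the same amino acid as `codon`."""
    return degeneracy(codon, forward_table) == x
```

With standard = make_forward_table(), degeneracy("AAA", standard) gives 2 (K) and degeneracy("ATT", standard) gives 3 (I), in line with the table 1 output quoted in this thread.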
Peter

From MatatTHC at gmx.de  Thu Feb 23 06:00:42 2012
From: MatatTHC at gmx.de (Matthias Bernt)
Date: Thu, 23 Feb 2012 12:00:42 +0100
Subject: [Biopython] Degenerated Codons
In-Reply-To:
References: <20120222150602.313750@gmx.net>
Message-ID:

Hi,

> You can probably do that with the codon dictionaries
> provided in Bio.Data.CodonTable (used for translations).
> If you could give some specific examples of input and
> desired output I'm sure we can give you more hints.
>

You mean the forward_table?

The most important function for me is:
- Input: Codon (sequence of length 3), code table
- Output: X, the number of codons coding for the same
  amino acid as the given codon

Building on this it would be easy to implement:
- Input: Codon, code table, X
- Output: true iff there are X codons coding for the same amino acid

Also important (and maybe the core of the other two functions):
- Input: codon
- Output: list of codons coding for the same amino acid

I think this is it. I need to implement these functions anyway, so maybe
you can give me hints on how I should implement them. For my application
it would be acceptable to iterate over the forward_table for each call.
But maybe it's possible to modify the backward_table such that it stores
all codons (as a list) for an amino acid. For the other functions building
on it (back translation...) it would be possible to take the first element
of the list.

Matthias

From p.j.a.cock at googlemail.com  Thu Feb 23 06:29:04 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 23 Feb 2012 11:29:04 +0000
Subject: [Biopython] Degenerated Codons
In-Reply-To:
References: <20120222150602.313750@gmx.net>
Message-ID:

On Thu, Feb 23, 2012 at 11:00 AM, Matthias Bernt wrote:
> Hi,
>
>> You can probably do that with the codon dictionaries
>> provided in Bio.Data.CodonTable (used for translations).
>> If you could give some specific examples of input and
>> desired output I'm sure we can give you more hints.
>
>
> You mean the forward_table?
>
> The most important function for me is:
> - Input: Codon (sequence of length 3), code table
> - Output: X, the number of codons coding for the same
>   amino acid as the given codon

Where table could be ambiguous or unambiguous,
DNA or RNA? Something like this works nicely for
unambiguous tables:

from Bio.Data.CodonTable import unambiguous_dna_by_id

for table in [unambiguous_dna_by_id[1], unambiguous_dna_by_id[2]]:
    print table.id
    for codon in ["AAA", "ATT"]:
        amino = table.forward_table[codon]
        alt_codons = [k for (k, v) in table.forward_table.iteritems()
                      if v == amino]
        print "%s -> %s, %i codons for this amino acid" % (codon,
              amino, len(alt_codons))

Giving:

1
AAA -> K, 2 codons for this amino acid
ATT -> I, 3 codons for this amino acid
2
AAA -> K, 2 codons for this amino acid
ATT -> I, 2 codons for this amino acid

> But maybe it's possible to modify the backward_table such
> that it stores all codons (as a list) for an amino acid. For the other
> functions building on it (back translation...) it would be possible
> to take the first element of the list.

We can't change that for backwards compatibility, but we
could add an alt_backward_table or something instead?

Peter

From marco.galardini at unifi.it  Thu Feb 23 11:05:53 2012
From: marco.galardini at unifi.it (Marco Galardini)
Date: Thu, 23 Feb 2012 17:05:53 +0100
Subject: [Biopython] SeqIO fasta "fakes" recognition
Message-ID: <4F4663E1.5010206@unifi.it>

Hi all,

I was wondering if you are aware of a method to distinguish between "real"
fasta files and files that just happen to have a ">" character.
I would like to scan a directory and return only the "real" fasta files.
I tried to open a .png file and surprisingly it gave me the following results:

SeqIO.parse(open('Screenshot.png'),'fasta').next()
SeqRecord(seq=Seq('?;9r$????8?n??????7M?4???\?r???0????$It??I...q+', SingleLetterAlphabet()), id='>>DEE\xd1\xaaU+\x8e\x1f?Nxx8g\xce\x9c1\xb8]``', name='>>DEE\xd1\xaaU+\x8e\x1f?Nxx8g\xce\x9c1\xb8]``', description='>>DEE\xd1\xaaU+\x8e\x1f?Nxx8g\xce\x9c1\xb8]`` \x81\x81\x81\xec\xdb\xb7Ok\xf9\xd5\xabW\xf1\xf0\xf0`\xe2\xc4\x89\x8c\x181\x82\x9e={j\x95+\x14', dbxrefs=[])

I tried to use some Alphabets but I experienced the same results.

Thanks in advance,
Marco

--
-------------------------------------------------
Marco Galardini
DBE - Department of Evolutionary Biology
University of Florence - Italy
e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone: +39 055 2288249
mobile: +39 340 2808041
-------------------------------------------------

From p.j.a.cock at googlemail.com  Thu Feb 23 11:21:36 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 23 Feb 2012 16:21:36 +0000
Subject: [Biopython] SeqIO fasta "fakes" recognition
In-Reply-To: <4F4663E1.5010206@unifi.it>
References: <4F4663E1.5010206@unifi.it>
Message-ID:

On Thu, Feb 23, 2012 at 4:05 PM, Marco Galardini wrote:
> Hi all,
>
> I was wondering if you are aware of a method to distinguish between "real"
> fasta files and files that just happen to have a ">" character.
> I would like to scan a directory and return only the "real" fasta files.
> I tried to open a .png file and surprisingly it gave me the following
> results:

The FASTA parser doesn't attempt to restrict the sequence alphabet;
indeed some FASTA-like files do use all sorts of weird characters
(e.g. RNA secondary structure). Also it allows for 'free text' before
the first record (useful in several situations including FASTA records
embedded at the end of a GFF file).
As a side effect of this need for
tolerance, the code does its best to read any file you give it - but
this is clearly a case of garbage in, garbage out (GIGO).

Guessing bioinformatics file types is non-trivial, and not something
that Bio.SeqIO attempts to do (unlike BioPerl). We take the Python
approach that you, the user, need to be explicit, and if you say it is
a FASTA file we'll try to treat it as such.

Detecting image files (or indeed most binary file types) on the other
hand is much easier - so do that instead?

Peter

From eric.talevich at gmail.com  Thu Feb 23 11:35:29 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 23 Feb 2012 11:35:29 -0500
Subject: [Biopython] SeqIO fasta "fakes" recognition
In-Reply-To:
References: <4F4663E1.5010206@unifi.it>
Message-ID:

On Thu, Feb 23, 2012 at 11:21 AM, Peter Cock wrote:

> On Thu, Feb 23, 2012 at 4:05 PM, Marco Galardini
> wrote:
> > Hi all,
> >
> > I was wondering if you are aware of a method to distinguish between
> "real"
> > fasta files and files that just happen to have a ">" character.
> > I would like to scan a directory and return only the "real" fasta files.
> > I tried to open a .png file and surprisingly it gave me the following
> > results:
>
> The FASTA parser doesn't attempt to restrict the sequence alphabet;
> indeed some FASTA-like files do use all sorts of weird characters
> (e.g. RNA secondary structure). Also it allows for 'free text' before
> the first record (useful in several situations including FASTA records
> embedded at the end of a GFF file). As a side effect of this need for
> tolerance, the code does its best to read any file you give it - but
> this is clearly a case of garbage in, garbage out (GIGO).
>
> Guessing bioinformatics file types is non-trivial, and not something
> that Bio.SeqIO attempts to do (unlike BioPerl). We take the Python
> approach that you, the user, need to be explicit, and if you say it is
> a FASTA file we'll try to treat it as such.
>
>
I suppose there's always:

from Bio import SeqIO

try:
    # SeqIO.read() raises ValueError unless there is exactly one record
    record = SeqIO.read("gigo.png", "fasta")
    # Reject sequences containing anything other than letters
    assert str(record.seq).isalpha()
except Exception:
    # complain...
    pass

At some point, didn't we discuss adding optional alphabet validation, e.g.
a validate() method or something more automatic?

-Eric

From p.j.a.cock at googlemail.com  Thu Feb 23 11:38:59 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 23 Feb 2012 16:38:59 +0000
Subject: [Biopython] SeqIO fasta "fakes" recognition
In-Reply-To:
References: <4F4663E1.5010206@unifi.it>
Message-ID:

On Thu, Feb 23, 2012 at 4:35 PM, Eric Talevich wrote:
>
> At some point, didn't we discuss adding optional alphabet validation, e.g. a
> validate() method or something more automatic?
>

Yes, but with no clear best way forward agreed:
http://lists.open-bio.org/pipermail/biopython-dev/2011-December/009343.html
https://redmine.open-bio.org/issues/2597

Peter

From marco.galardini at unifi.it  Thu Feb 23 11:40:25 2012
From: marco.galardini at unifi.it (Marco Galardini)
Date: Thu, 23 Feb 2012 17:40:25 +0100
Subject: [Biopython] SeqIO fasta "fakes" recognition
In-Reply-To:
References: <4F4663E1.5010206@unifi.it>
Message-ID: <4F466BF9.6080805@unifi.it>

On 02/23/2012 05:21 PM, Peter Cock wrote:
> On Thu, Feb 23, 2012 at 4:05 PM, Marco Galardini
> wrote:
>> Hi all,
>>
>> I was wondering if you are aware of a method to distinguish between "real"
>> fasta files and files that just happen to have a ">" character.
>> I would like to scan a directory and return only the "real" fasta files.
>> I tried to open a .png file and surprisingly it gave me the following
>> results:
> Guessing bioinformatics file types is non-trivial, and not something
> that Bio.SeqIO attempts to do (unlike BioPerl). We take the Python
> approach that you, the user, need to be explicit, and if you say it is
> a FASTA file we'll try to treat it as such.
You're right: probably the best thing to do will be to trust users and
hope they won't push garbage as inputs.
> Detecting image files (or indeed most binary file types) on the other
> hand is much easier - so do that instead?
>
In principle this is true, but the fact is that I think it won't be easy
or straightforward to account for all possible file formats that can be
found in a given directory.
I'll stick to good Python principles and hope to have smart-enough users :)
Thanks for your instant reply.
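One format-agnostic way to weed out binary files, rather than recognising each image type in turn, is to sniff the first block of bytes: NUL bytes or a high share of non-printable bytes suggest binary data. The sketch below combines that with a rough FASTA check in the spirit of the isalpha() test earlier in this thread. The function names, the 30% threshold and the accepted character set (letters plus "-" and "*") are all assumptions. As Peter points out, some legitimate FASTA-like files (RNA secondary structure, free text before the first record) would be rejected by such a strict check.

```python
import string

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Printable ASCII plus the whitespace/control characters common in text files
TEXT_BYTES = set(bytes(range(32, 127)) + b"\n\r\t\f\b")
# Characters accepted in sequence lines by this (deliberately strict) check
ALLOWED_SEQ_CHARS = set(string.ascii_letters + "-*")


def looks_binary(data, threshold=0.30):
    """Guess whether a block of bytes is binary rather than text."""
    if not data:
        return False
    if data.startswith(PNG_SIGNATURE) or b"\x00" in data:
        return True
    non_text = sum(1 for byte in data if byte not in TEXT_BYTES)
    return non_text / float(len(data)) > threshold


def looks_like_fasta(text, max_records=10):
    """Rough check of the first few records of FASTA-formatted text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    records = 0
    for line in lines:
        if line.startswith(">"):
            records += 1
            if records > max_records:
                break  # enough evidence; skip the rest
        elif not set(line) <= ALLOWED_SEQ_CHARS:
            return False
    return True


def sniff_fasta_file(path, block_size=4096):
    """Apply both checks to the first block of a file on disk."""
    with open(path, "rb") as handle:
        block = handle.read(block_size)
    if looks_binary(block):
        return False
    return looks_like_fasta(block.decode("ascii", "replace"))
```

Only the first block of each file is read, so scanning a whole directory stays cheap even when it contains large images.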
Marco

From marco.galardini at unifi.it  Thu Feb 23 12:38:26 2012
From: marco.galardini at unifi.it (Marco Galardini)
Date: Thu, 23 Feb 2012 18:38:26 +0100
Subject: [Biopython] SeqIO fasta "fakes" recognition
In-Reply-To:
References: <4F4663E1.5010206@unifi.it>
Message-ID: <4F467992.9060205@unifi.it>

On 02/23/2012 05:35 PM, Eric Talevich wrote:
>
> I suppose there's always:
>
> from Bio import SeqIO
>
> try:
>     # SeqIO.read() raises ValueError unless there is exactly one record
>     record = SeqIO.read("gigo.png", "fasta")
>     # Reject sequences containing anything other than letters
>     assert str(record.seq).isalpha()
> except Exception:
>     # complain...
>     pass
>
>
Thanks for the hint. I've implemented this (using the parse method) and
I'll see how it performs (I guess it will add some overhead).

Marco

From MatatTHC at gmx.de  Thu Feb 23 16:09:18 2012
From: MatatTHC at gmx.de (Matthias Bernt)
Date: Thu, 23 Feb 2012 22:09:18 +0100
Subject: [Biopython] Degenerated Codons
In-Reply-To:
References: <20120222150602.313750@gmx.net>
Message-ID:

Hi Peter,

Thank you for the suggestions. I will try to create the functions as
suggested. Should I post them here?

> Where table could be ambiguous or unambiguous,
> DNA or RNA? Something like this works nicely for
> unambiguous tables:
>
> from Bio.Data.CodonTable import unambiguous_dna_by_id
> for table in [unambiguous_dna_by_id[1], unambiguous_dna_by_id[2]]:
>     print table.id
>     for codon in ["AAA", "ATT"]:
>         amino = table.forward_table[codon]
>         alt_codons = [k for (k, v) in table.forward_table.iteritems()
>                       if v == amino]
>         print "%s -> %s, %i codons for this amino acid" % (codon,
>               amino, len(alt_codons))
>
> Giving:
>
> 1
> AAA -> K, 2 codons for this amino acid
> ATT -> I, 3 codons for this amino acid
> 2
> AAA -> K, 2 codons for this amino acid
> ATT -> I, 2 codons for this amino acid
>
>
> > But maybe it's possible to modify the backward_table such
> > that it stores all codons (as a list) for an amino acid. For the other
> > functions building on it (back translation...) it would be possible
> > to take the first element of the list.
>
> We can't change that for backwards compatibility, but we
> could add an alt_backward_table or something instead
>

I think we should keep it as it is for the moment. Performance is not so
important for me... so far. Optimisation can still be done later.

Matthias

From fenggao0907 at yahoo.com.cn  Thu Feb 23 16:35:23 2012
From: fenggao0907 at yahoo.com.cn (高凤(Feng GAO))
Date: Fri, 24 Feb 2012 05:35:23 +0800 (CST)
Subject: [Biopython] Entrez and SeqIO "no records found in handle"
Message-ID: <1330032923.65491.YahooMailNeo@web15106.mail.cnb.yahoo.com>

Hi all,
We have some Python code that uses a GI number to get records from GenBank.
Part of the code is:

handle = Entrez.efetch(db="protein", id=ID, rettype="gb")
record = SeqIO.read(handle, "genbank")

We had no problem with this code until this week, when we started getting
"ValueError: No records found in handle".
Anyone have an idea how to fix it now? Thanks!
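Per Peter's reply in this thread, the fix on the Biopython side is to pass retmode="text" explicitly, i.e. Entrez.efetch(db="protein", id=ID, rettype="gb", retmode="text"). To see what is being requested and what actually came back, here is a standard-library sketch. It builds an EFetch URL with both parameters spelled out and classifies the start of a response. A GenBank flat file begins with a LOCUS line, so an XML reply is presumably why SeqIO.read(handle, "genbank") finds no records. The endpoint below is NCBI's E-utilities address, given as an assumption (Biopython builds its own URL internally), and the function names are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities EFetch endpoint (assumed here; Biopython uses its own copy)
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, ids, rettype="gb", retmode="text"):
    """Build an EFetch URL that spells out both rettype and retmode."""
    params = [
        ("db", db),
        ("id", ",".join(str(i) for i in ids)),
        ("rettype", rettype),
        ("retmode", retmode),  # leave this out and the server default applies
    ]
    return EFETCH_URL + "?" + urlencode(params)


def response_format(text):
    """Classify the start of an EFetch response as 'genbank', 'xml' or 'unknown'."""
    start = text.lstrip()
    if start.startswith("LOCUS"):
        return "genbank"  # GenBank flat files begin with a LOCUS line
    if start.startswith("<"):
        return "xml"  # e.g. an XML declaration such as '<?xml ...?>'
    return "unknown"
```

Peeking at the first few hundred characters of a handle's text with response_format() before handing it to the GenBank parser makes this kind of default change easy to spot.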
Feng

--------------------------------------
Ms. Feng Gao
Department of Biological Sciences
Smith College
Northampton, MA 01063
USA
&
Laboratory of Protozoology
Institute of Evolution and Marine Biodiversity
Ocean University of China
266003 Qingdao
China
E-mail: fenggao0907 at yahoo.com.cn
--------------------------------------

From p.j.a.cock at googlemail.com  Thu Feb 23 18:00:17 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 23 Feb 2012 23:00:17 +0000
Subject: [Biopython] Degenerated Codons
In-Reply-To:
References: <20120222150602.313750@gmx.net>
Message-ID:

On Thu, Feb 23, 2012 at 9:09 PM, Matthias Bernt wrote:
> Hi Peter,
>
> Thank you for the suggestions. I will try to create the functions as
> suggested.
> Should I post them here?

Sure - or on the wiki under a new 'Cookbook' entry?
http://biopython.org/wiki/Category:Cookbook

> I think we should keep it as it is for the moment. Performance is not so
> important for me... so far.
> Optimisation can still be done later.

Of course :)

Do you know the quote "premature optimization is the root of all evil"?

Peter

From p.j.a.cock at googlemail.com  Thu Feb 23 18:03:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 23 Feb 2012 23:03:01 +0000
Subject: [Biopython] Entrez and SeqIO "no records found in handle"
In-Reply-To: <1330032923.65491.YahooMailNeo@web15106.mail.cnb.yahoo.com>
References: <1330032923.65491.YahooMailNeo@web15106.mail.cnb.yahoo.com>
Message-ID:

2012/2/23 高凤(Feng GAO) :
> Hi all,
> We have some Python code that uses a GI number to get records from GenBank.
> Part of the code is:
>
> handle = Entrez.efetch(db="protein", id=ID, rettype="gb")
> record = SeqIO.read(handle, "genbank")
>
> We had no problem with this code until this week, when we started getting
> "ValueError: No records found in handle".
> Anyone have an idea how to fix it now? Thanks!
> Feng

Try using an explicit retmode="text" in the efetch call.
The NCBI changed the defaults with EFetch 2.0, which went
live earlier this month. You're probably getting XML back instead.

Note to self: I wonder if the Biopython tutorial examples
need to be updated as well...

Peter

From p.j.a.cock at googlemail.com  Fri Feb 24 11:56:11 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 24 Feb 2012 16:56:11 +0000
Subject: [Biopython] Biopython 1.59 released
Message-ID:

Dear Biopythoneers,

Biopython 1.59 is out:
http://news.open-bio.org/news/2012/02/biopython-1-59-released/

Thank you to everyone who has contributed.

Peter

P.S. We're on Twitter as @Biopython

From idoerg at gmail.com  Fri Feb 24 12:02:19 2012
From: idoerg at gmail.com (Iddo Friedberg)
Date: Fri, 24 Feb 2012 12:02:19 -0500
Subject: [Biopython] Biopython 1.59 released
In-Reply-To:
References:
Message-ID:

As an ex- (and hopefully future?) contributor and a strong cheerleader and
user: thanks Peter and all developers for all the hard work!

On Fri, Feb 24, 2012 at 11:56 AM, Peter Cock wrote:

> Dear Biopythoneers,
>
> Biopython 1.59 is out:
> http://news.open-bio.org/news/2012/02/biopython-1-59-released/
>
> Thank you to everyone who has contributed.
>
> Peter
>
> P.S. We're on Twitter as @Biopython
>

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.

From idoerg at gmail.com  Fri Feb 24 12:24:41 2012
From: idoerg at gmail.com (Iddo Friedberg)
Date: Fri, 24 Feb 2012 12:24:41 -0500
Subject: [Biopython] Biopython 1.59 released
In-Reply-To:
References:
Message-ID:

As an ex- (and hopefully future?) contributor and a strong cheerleader and
user: thanks Peter and all developers for all the hard work!

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html

From rbuels at gmail.com  Mon Feb 27 11:24:03 2012
From: rbuels at gmail.com (Robert Buels)
Date: Mon, 27 Feb 2012 11:24:03 -0500
Subject: [Biopython] Update: call for Google Summer of Code project ideas
Message-ID: <4F4BAE23.7070402@gmail.com>

Hi all,

As kindly pointed out by Reece Hart, the previous email I sent out calling
for Google Summer of Code project ideas had the wrong due date for project
ideas in it. I actually want them all to be in place by Friday, March 2,
which is this coming Friday.

== Instructions for Wiki Editing ==

For each of the OBF projects that wants to do GSoC again this year, please:

a.) Update the list of project ideas on your project's GSoC page (BioPython,
BioPerl, BioRuby, etc). Add new ones, remove ones that have already been
done or are no longer relevant, etc.

b.) Update the list of project ideas on the main OBF GSoC page
(http://www.open-bio.org/wiki/Google_Summer_of_Code) to match.

c.) Let me know via email that you have done so and it's ready for Google
to peruse.

== end instructions ==

Again, please have the updates done by this Friday (March 2). The number
and quality of the project ideas are part of the evaluation process for
whether OBF is accepted as a Summer of Code organization again this year,
so let's come up with some good ones. :-)

Rob

----
Robert Buels
(prospective) 2012 OBF GSoC Organization Admin

From fedyukina at gmail.com  Mon Feb 27 16:00:11 2012
From: fedyukina at gmail.com (Daria Fedyukina)
Date: Mon, 27 Feb 2012 15:00:11 -0600
Subject: [Biopython] question on installation of BioPython on Mac OS 10.7
Message-ID: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com>

Hello,

I am trying to install BioPython 1.58. My BioPython and Python are both in
the Applications folder.
My configuration is:
OS: Mac OS 10.7.3
Python: 2.7.2
Apple Xcode is installed.
NumPy is not installed (could not install for 2.7, only available for 2.6;
decided to skip)

When I try to install it, I go to X11 -> go inside the biopython-1.58 folder ->
type "python setup.py build" -> choose to proceed without NumPy -> get an
error.

Here is how it looks:
bash-3.2$ cd biopython-1.58
bash-3.2$ ls
Bio         DEPRECATED  MANIFEST.in README      build
BioSQL      Doc         NEWS        Scripts     do2to3.py
CONTRIB     LICENSE     PKG-INFO    Tests       setup.py
bash-3.2$ python setup.py build
running build
running build_py

Numerical Python (NumPy) is not installed.

This package is required for many Biopython features.  Please install
it before you install Biopython. You can install Biopython anyway, but
anything dependent on NumPy will not work. If you do this, and later
install NumPy, you should then re-install Biopython.
You can find NumPy at http://numpy.scipy.org

Do you want to continue this installation? (y/N): y
running build_ext
building 'Bio.cpairwise2' extension
gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -g -O2 -DNDEBUG -g -O3 -IBio -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.6-intel-2.7/Bio/cpairwise2module.o
In file included from /Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/unicodeobject.h:4,
                 from /Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/Python.h:85,
                 from Bio/cpairwise2module.c:12:
/Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory
In file included from /Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/unicodeobject.h:4,
                 from /Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/Python.h:85,
                 from Bio/cpairwise2module.c:12:
/Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory
lipo: can't figure out the architecture type of: /var/folders/nt/1ppm7j953zv2qylmtwydqm0000gn/T//ccODIZmY.out
error: command 'gcc-4.2' failed with exit status 1
bash-3.2$

I googled this error, and it turned out that gcc-4.2 needs to be installed.
I installed it, but the situation is exactly the same when I repeat the
Biopython installation.

Does anyone know what could have happened? My only suspicion now is that
Lion is messing things up. It looks like Snow Leopard works better.

Thank you very much for any help or advice!
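setup.py can only use a NumPy that the very same interpreter can import, so one useful first check is to run the snippet below with exactly the python command used for "python setup.py build". It prints that interpreter's path and the NumPy version it sees, if any. If the NumPy installer targeted a different Python installation than the one found on your PATH (a guess about this case, not something established in the thread), the two will disagree. The function name is hypothetical.

```python
import sys


def numpy_status():
    """Return (interpreter path, NumPy version or None) for this Python."""
    try:
        import numpy
    except ImportError:
        return sys.executable, None
    return sys.executable, numpy.__version__


if __name__ == "__main__":
    executable, version = numpy_status()
    print("Python interpreter: %s" % executable)
    if version is None:
        print("NumPy is NOT importable from this interpreter")
    else:
        print("NumPy %s is importable" % version)
```

Saved as, for example, check_numpy.py and run with "python check_numpy.py" from the same Terminal/X11 session, it shows which Python that session actually picks up.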
-Daria

From idoerg at gmail.com  Mon Feb 27 16:27:31 2012
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 27 Feb 2012 16:27:31 -0500
Subject: [Biopython] question on installation of BioPython on Mac OS 10.7
In-Reply-To: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com>
References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com>
Message-ID:

Daria,

Seems like there is a NumPy for Python 2.7 on Mac:
http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/

In any case, you should not skip the NumPy installation. Even if for the
combination of your OS and Python version there is only NumPy for Python
2.6, you should install that. There is very little difference between
Python 2.6 and 2.7, and NumPy should not break (and, in any case, it seems
like your desired version is in the link above).

HTH,

Iddo

On Mon, Feb 27, 2012 at 4:00 PM, Daria Fedyukina wrote:

> Hello,
>
> I am trying to install BioPython 1.58. My BioPython and Python are both in
> the Applications folder.
> My configuration is:
> OS: Mac OS 10.7.3
> Python: 2.7.2
> Apple Xcode is installed.
> NumPy is not installed (could not install for 2.7, only available for 2.6;
> decided to skip)
>

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html

From idoerg at gmail.com  Mon Feb 27 17:37:01 2012
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 27 Feb 2012 17:37:01 -0500
Subject: [Biopython] Fwd: question on installation of BioPython on Mac OS 10.7
In-Reply-To:
References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com>
Message-ID: ---------- Forwarded message ---------- From: Daria Fedyukina Date: Mon, Feb 27, 2012 at 5:15 PM Subject: Re: [Biopython] question on installation of BioPython on Mac OS 10.7 To: Iddo Friedberg Hi Iddo, I have installed NumPy. Thank you very much for finding the right version for me. I cannot understand how I missed it. I thought I installed NumPy correctly: I double clicked on it, I agreed on terms and conditions, I saw "installation is successful" After that I did not see any new folder or package in my Applications folder except previous numpy-1.6.1-py2.7-python.org-macosx10.3.dmg file. I drag and dropped numpy-1.6.1-py2.7.mpkg file in Applications just in case. I do not know what to do with this .mpkg file because if I double click on it, it gives me hte same installation wizard as before, all steps are the same, nothing chages. I went to my X11 -> went inside biopython-1.58 older -> typed "python setup.py build" -> it again asked me if I want to proceed without NumPy. It did not see that I installed NumPy. It means I did not install it right. I looked online, the instructions talk about some kind of binary here http://docs.scipy.org/doc/numpy/user/install.html for Mac OS X, but it does not look any different from what I have already done. Would you be so kind as to provide some insight on what they want? THank you again. -Daria P.S. I got Mac 3 months ago, I always thought that it is more convenient for programming in general. Well..... It was easier to install biopython on windows. This is a very interesting case. On Feb 27, 2012, at 3:27 PM, Iddo Friedberg wrote: Daria, Seems like there is a NumPy for Python 2.7 on Mac: http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/ In any case, you should not skip the NumPy installation. Even if for teh combination of your OS and Python version there is only NumPy for Python 2.6, you should install that. 
There is very little difference between Python 2.6 and 2.7, and NumPy should not break (and, in any case, seems like your desired version is in the link above). HTH, Iddo On Mon, Feb 27, 2012 at 4:00 PM, Daria Fedyukina wrote: > Hello, > > I am trying to install BioPython 1.58. My BioPython and python are both in > the Applications folder. > My configuration is: > OS: Mac OS 10.7.3 > Python: 2.7.2 > Apple Xcode is installed. > NumPy is not installed (could not install for 2.7, only available for 2.6; > decided to skip) > > When I try to install it, I go to X11 -> go inside biopython-1.58 older -> > type "python setup.py build" -> choose to proceed without NumPy -> get an > error. > > Here how it looks: > bash-3.2$ cd biopython-1.58 > bash-3.2$ ls > Bio DEPRECATED MANIFEST.in README build > BioSQL Doc NEWS Scripts do2to3.py > CONTRIB LICENSE PKG-INFO Tests setup.py > bash-3.2$ python setup.py build > running build > running build_py > > Numerical Python (NumPy) is not installed. > > This package is required for many Biopython features. Please install > it before you install Biopython. You can install Biopython anyway, but > anything dependent on NumPy will not work. If you do this, and later > install NumPy, you should then re-install Biopython. > > You can find NumPy at http://numpy.scipy.org > > Do you want to continue this installation? 
(y/N): > y > running build_ext > building 'Bio.cpairwise2' extension > gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -isysroot > /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -g -O2 -DNDEBUG -g -O3 > -IBio -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.6-intel-2.7/Bio/cpairwise2module.o > In file included from > /Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/unicodeobject.h:4, > from > /Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/Python.h:85, > from Bio/cpairwise2module.c:12: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: > No such file or directory > In file included from > /Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/unicodeobject.h:4, > from > /Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/Python.h:85, > from Bio/cpairwise2module.c:12: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: > No such file or directory > lipo: can't figure out the architecture type of: > /var/folders/nt/1ppm7j953zv2qylmtwydqm0000gn/T//ccODIZmY.out > error: command 'gcc-4.2' failed with exit status 1 > bash-3.2$ > > I googled this error, it turned out that gcc-4.2 needs to be installed. I > did. Then, the situation is exactly the same when I repeat biopython > installation. > > Does anyone know what could've happened? My only suspicion now is that > Lion is messing stuff up. Looks like Snow Leopard works better. > > Thank you very much for any help or advice! > -Daria > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Iddo Friedberg http://iddo-friedberg.net/contact.html
From p.j.a.cock at googlemail.com Mon Feb 27 17:37:08 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 27 Feb 2012 22:37:08 +0000 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> Message-ID: On Monday, February 27, 2012, Daria Fedyukina wrote: > Hello, > > I am trying to install BioPython 1.58. My BioPython and python are both in > the Applications folder. > My configuration is: > OS: Mac OS 10.7.3 > Python: 2.7.2 > Apple Xcode is installed. > NumPy is not installed (could not install for 2.7, only available for 2.6; > decided to skip) > > What version of Xcode do you have, and were there any options when you installed about SDK installation and command line tool installation? It worked for me on one of my machines which has Mac OS X 10.7 Lion, but I am not in front of it right now to check the details. Peter From schnoes at gmail.com Mon Feb 27 17:50:36 2012 From: schnoes at gmail.com (Alexandra Schnoes) Date: Mon, 27 Feb 2012 14:50:36 -0800 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> Message-ID: Hi Daria, It sounds like you might be new to Mac/OS 10... an easy way to ease into that is to use a package manager (or if you're like me and just like to keep everything organized with a manager and don't always feel like self compiling).
I have used a couple of different ones. I am currently using MacPorts (http://www.macports.org/) and they keep very up to date (just upgraded my biopython to 1.59 today). Fink (http://www.finkproject.org/) and Homebrew (http://mxcl.github.com/homebrew/) are two other commonly used, well-known managers. Best, Alex On Mon, Feb 27, 2012 at 2:37 PM, Peter Cock wrote: > On Monday, February 27, 2012, Daria Fedyukina wrote: > > > Hello, > > > > I am trying to install BioPython 1.58. My BioPython and python are both > in > > the Applications folder. > > My configuration is: > > OS: Mac OS 10.7.3 > > Python: 2.7.2 > > Apple Xcode is installed. > > NumPy is not installed (could not install for 2.7, only available for > 2.6; > > decided to skip) > > > > > What version of Xcode do you have, and were there any options when you > installed about SDK installation and command line tool installation? It > worked for me on one of my machines which has Mac OS X 10.7 Lion, but I am > not in front of it right now to check the details. > > Peter > -- Alexandra Schnoes, Ph.D. Scientist, Babbitt Laboratory Program Coordinator, Graduate Student Internships for Career Exploration University of California San Francisco Tel: 415-502-1248 Fax: 415-514-9656 Email: schnoes at gmail.com From anna.kostikova at gmail.com Tue Feb 28 08:28:50 2012 From: anna.kostikova at gmail.com (Anna Kostikova) Date: Tue, 28 Feb 2012 14:28:50 +0100 Subject: [Biopython] Entrez.efetch issue with the returning data type Message-ID: Dear list, Since today I've started to have an issue with the Entrez.efetch utility, and in particular with the rettype parameter. Essentially, the format of the data returned when I specify Entrez.efetch(db='nucleotide',id='JQ042818.1',rettype='gb') no longer corresponds to a GenBank datatype. A few days ago all was fine.
What is going on? The script is:

from Bio import Entrez
Entrez.email = 'my.email at domain.com'

local_file = open("test.gb", 'w')
try:
    handle = Entrez.efetch(db='nucleotide', id='JQ042818.1', rettype='gb')  # download record with Entrez.efetch
    local_file.write(handle.read())  # write to a file
    handle.close()
except:
    print "Accession id is not found"
local_file.close()

Thanks a lot for your ideas, Anna From p.j.a.cock at googlemail.com Tue Feb 28 09:16:39 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 28 Feb 2012 14:16:39 +0000 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: References: Message-ID: On Tue, Feb 28, 2012 at 1:28 PM, Anna Kostikova wrote: > Dear list, > > Since today I've started to have an issue with Entrez.efetch utility, > and in particular with the rettype parameter. Essentially, the format > of the data returned when I specify > Entrez.efetch(db='nucleotide',id='JQ042818.1',rettype='gb') does not > correspond anymore to a genbank datatype. Few days ago all was fine. > What is going on? The NCBI changed the defaults in mid Feb 2012 with the release of EFetch v2.0 - you must now add retmode="text" to the Entrez fetch call, otherwise you'll probably get XML back. We mentioned this in the release notes for Biopython 1.59, but I should probably add this link as well: http://www.ncbi.nlm.nih.gov/mailman/pipermail/utilities-announce/2012-February/000085.html Peter From Matej.Repic at ki.si Tue Feb 28 09:12:31 2012 From: Matej.Repic at ki.si (Matej Repič) Date: Tue, 28 Feb 2012 14:12:31 +0000 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: Message-ID: For me the script works without a problem (Biopython 1.58, Python 2.6). On the other hand, some things were changed on the Pubmed side this month, so maybe it would work for you if you edit your script according to this table: http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly The full documentation on Efetch 2.0 is available at: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.Efetch Regards, ---------------------------------------------------------- Matej Repič Junior Researcher Laboratory for Biocomputing and Bioinformatics National Institute of Chemistry Hajdrihova 19 SI-1001 Ljubljana POB 660 Slovenia tel: +386-1-4760457 e-mail: matej.repic at ki.si ---------------------------------------------------------- On 28.2.12 14:28, "Anna Kostikova" wrote: >Dear list, > >Since today I've started to have an issue with Entrez.efetch utility, >and in particular with the rettype parameter. Essentially, the format >of the data returned when I specify >Entrez.efetch(db='nucleotide',id='JQ042818.1',rettype='gb') does not >correspond anymore to a genbank datatype. Few days ago all was fine. >What is going on? > >the script is: >from Bio import Entrez >Entrez.email = 'my.email at domain.com' > >local_file=open("test.gb", 'w') >try: > handle = >Entrez.efetch(db='nucleotide',id='JQ042818.1',rettype='gb') #download >record with Entrez.efetch > local_file.write(handle.read()) #write to a file > handle.close() >except: > print "Accsession id is not found" >local_file.close() > >Thanks a lot for your ideas, >Anna From p.j.a.cock at googlemail.com Tue Feb 28 09:35:03 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 28 Feb 2012 14:35:03 +0000 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> Message-ID: On Mon, Feb 27, 2012 at 11:14 PM, Daria Fedyukina wrote: > Hi Peter, > > I have version 4.2.1 of Xcode.
I think that is what my Mac OS X 10.7 Lion machine has - not checked yet. > I went to Macintosh HD-> Developer and I see SDKs and Tools folders. But the > content of both is daunting. I have no idea what I can do with it, probably > nothing. When I click on Xcode in my Dock, there is no command line, it > looks like a piece of GUI software with, it reminds me Eclipse. Xcode covers a *lot* of stuff, including a GUI for editing source code, system library files for compiling code, and compilers themselves (for C, C++, etc). > The command line that I used before is called X11 and I found it in > /Applications/Utilities/X11. > > I tried to figure out what the difference between X11 and Xcode (if any) and > whether X11 was in Utilities prior Xcode installation. But I do not know the > answers to these questions. Normally I would use the Mac OS X command line via the "Terminal" application, also under /Applications/Utilities, but getting to a command line prompt via X11 instead should work. Peter From anna.kostikova at gmail.com Tue Feb 28 10:15:33 2012 From: anna.kostikova at gmail.com (Anna Kostikova) Date: Tue, 28 Feb 2012 16:15:33 +0100 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: References:

Message-ID: Dear Peter, Thanks a lot for the prompt response. Yes, exactly, it was retmode="text" thing. Thanks a lot again! Anna 2012/2/28 Peter Cock : > On Tue, Feb 28, 2012 at 1:28 PM, Anna Kostikova > wrote: >> Dear list, >> >> Since today I've started to have an issue with Entrez.efetch utility, >> and in particular with the rettype parameter. Essentially, the format >> of the data returned when I specify >> Entrez.efetch(db='nucleotide',id='JQ042818.1',rettype='gb') does not >> correspond anymore to a genbank datatype. Few days ago all was fine. >> What is going on? > > The NCBI changed the defaults in mid Feb 2012 with the release of > EFetch v2.0 - you must now add retmode="text" to the Entrez fetch > call otherwise you'll probably get XML back. We mentioned this on > the release notes for Biopython 1.59, but I should probably add this > link as well: > > http://www.ncbi.nlm.nih.gov/mailman/pipermail/utilities-announce/2012-February/000085.html > > Peter From p.j.a.cock at googlemail.com Tue Feb 28 12:18:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 28 Feb 2012 17:18:28 +0000 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com> References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com> Message-ID: On Tue, Feb 28, 2012 at 5:00 PM, Daria Fedyukina wrote: > > Hi Peter, > > Thank you. I tried Terminal, it is indeed exactly the same. > NumPy is not installed somehow. Here is my screenshot. Do you have any > suggestions on how to install NumPy in such a way that this installation is > seen when I do "python setup.py build" in the terminal? > > -Daria Hi Daria, Personally I use Apple's provided Python, which from memory comes with a (slightly out of date) copy of NumPy. A more detailed reply will have to wait until I'm at my Mac OS X 10.7 "Lion" machine. 
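One way to find out why `python setup.py build` does not see NumPy is to ask the interpreter itself which Python it is and whether it can import NumPy. The sketch below uses only the standard library and should be run with exactly the same `python` command used for the build. The `bits` field reports the pointer size of the running process (32 or 64), which may also be relevant to the `lipo: can't figure out the architecture type` error in the build log above. The function name `describe_interpreter` is hypothetical and is not part of Biopython or NumPy.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Report which Python is running, its pointer size, and whether NumPy imports."""
    info = {
        "executable": sys.executable,          # full path of this interpreter
        "version": platform.python_version(),  # e.g. "2.7.2"
        "bits": struct.calcsize("P") * 8,      # size of a C pointer: 32 or 64
    }
    try:
        import numpy
        info["numpy"] = numpy.__version__
    except ImportError:
        info["numpy"] = None  # this interpreter cannot see any NumPy install
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```

If `executable` is not the Python that the NumPy installer targeted, or `numpy` comes back as `None`, that would be consistent with `setup.py` offering to continue without NumPy.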
If anyone else online right now has Biopython installed and working on Mac OS X 10.7 "Lion" and can help now, please speak up. One important question: was this a clean install of Lion, or an update from Snow Leopard? I'm a little confused now even with the screenshots. It looks like you have downloaded python-2.7.2-macosx10.6.dmg and numpy-1.6.1-py2.7-python.org-macosx10.3.dmg, which should work fine together, BUT you will have two copies of Python (the Apple provided Python, and Python.org's Python) and it can be confusing which is in use and which has a given library installed. Peter From fedyukina at gmail.com Tue Feb 28 12:22:50 2012 From: fedyukina at gmail.com (Daria Fedyukina) Date: Tue, 28 Feb 2012 11:22:50 -0600 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com> Message-ID: <6A4268ED-0E86-49B2-87C9-D5095EDE5654@gmail.com> Hi all, Answering Peter's question: I got a new laptop with Lion already installed. -Daria On Feb 28, 2012, at 11:18 AM, Peter Cock wrote: > On Tue, Feb 28, 2012 at 5:00 PM, Daria Fedyukina wrote: >> >> Hi Peter, >> >> Thank you. I tried Terminal, it is indeed exactly the same. >> NumPy is not installed somehow. Here is my screenshot. Do you have any >> suggestions on how to install NumPy in such a way that this installation is >> seen when I do "python setup.py build" in the terminal? >> >> -Daria > > Hi Daria, > > Personally I use Apple's provided Python, which from memory > comes with a (slightly out of date) copy of NumPy. A more detailed > reply will have to wait until I'm at my Mac OS X 10.7 "Lion" machine.
> > One important question could be was this a clean install of Lion, > or an update from Snow Leopard? > > I'm a little confused now even with the screenshots. It looks > like you have download python-2.7.2-macosx10.6.dmg and > numpy-1.6.1-py2.7-python.org-macosx10.3.dmg which should > work fine together BUT you will have two copies of Python (the > Apple provided Python, and Python.org's Python) and it can be > confusing about which is in use and which has a given library > installed > > Peter From anaryin at gmail.com Tue Feb 28 12:23:49 2012 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 28 Feb 2012 18:23:49 +0100 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com> Message-ID: Hi Daria, Which Python architecture did you install? 32 bits or 64? This thread in SO solves a similar problem: http://stackoverflow.com/questions/6839795/cant-figure-out-the-architecture-type-of-problem-when-compiling-python Maybe it solves yours? Cheers, Jo?o [...] Rodrigues http://nmr.chem.uu.nl/~joao No dia 28 de Fevereiro de 2012 18:18, Peter Cock escreveu: > On Tue, Feb 28, 2012 at 5:00 PM, Daria Fedyukina > wrote: > > > > Hi Peter, > > > > Thank you. I tried Terminal, it is indeed exactly the same. > > NumPy is not installed somehow. Here is my screenshot. Do you have any > > suggestions on how to install NumPy in such a way that this installation > is > > seen when I do "python setup.py build" in the terminal? > > > > -Daria > > Hi Daria, > > Personally I use Apple's provided Python, which from memory > comes with a (slightly out of date) copy of NumPy. A more detailed > reply will have to wait until I'm at my Mac OS X 10.7 "Lion" machine. 
> > If anyone else online right now has Biopython installed and working > on Mac OS X 10.7 "Lion" and can help now, please speak up. > > One important question could be was this a clean install of Lion, > or an update from Snow Leopard? > > I'm a little confused now even with the screenshots. It looks > like you have download python-2.7.2-macosx10.6.dmg and > numpy-1.6.1-py2.7-python.org-macosx10.3.dmg which should > work fine together BUT you will have two copies of Python (the > Apple provided Python, and Python.org's Python) and it can be > confusing about which is in use and which has a given library > installed > > Peter > From fedyukina at gmail.com Tue Feb 28 12:28:29 2012 From: fedyukina at gmail.com (Daria Fedyukina) Date: Tue, 28 Feb 2012 11:28:29 -0600 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com>

Message-ID: <8CC22492-718A-451D-A8B5-D216E0A37932@gmail.com> I downloaded and installed this one: Python 2.7.2 Mac OS X 64-bit/32-bit x86-64/i386 Installer, because the web site says that this version is for 10.6 and 10.7. But it did not help me to avoid the problem. Also my exit status is 1, and here it is 255. Do these numbers tell me anything useful? I could not find a definite answer by googling. On Feb 28, 2012, at 11:23 AM, João Rodrigues wrote: > Hi Daria, > > Which Python architecture did you install? 32 bits or 64? > > This thread in SO solves a similar problem: http://stackoverflow.com/questions/6839795/cant-figure-out-the-architecture-type-of-problem-when-compiling-python > > Maybe it solves yours? > > Cheers, > > João [...] Rodrigues > http://nmr.chem.uu.nl/~joao > > > > On 28 February 2012 18:18, Peter Cock wrote: > On Tue, Feb 28, 2012 at 5:00 PM, Daria Fedyukina wrote: > > > > Hi Peter, > > > > Thank you. I tried Terminal, it is indeed exactly the same. > > NumPy is not installed somehow. Here is my screenshot. Do you have any > > suggestions on how to install NumPy in such a way that this installation is > > seen when I do "python setup.py build" in the terminal? > > > > -Daria > > Hi Daria, > > Personally I use Apple's provided Python, which from memory > comes with a (slightly out of date) copy of NumPy. A more detailed > reply will have to wait until I'm at my Mac OS X 10.7 "Lion" machine. > > If anyone else online right now has Biopython installed and working > on Mac OS X 10.7 "Lion" and can help now, please speak up. > > One important question could be was this a clean install of Lion, > or an update from Snow Leopard? > > I'm a little confused now even with the screenshots.
It looks > like you have download python-2.7.2-macosx10.6.dmg and > numpy-1.6.1-py2.7-python.org-macosx10.3.dmg which should > work fine together BUT you will have two copies of Python (the > Apple provided Python, and Python.org's Python) and it can be > confusing about which is in use and which has a given library > installed > > Peter > From p.j.a.cock at googlemail.com Tue Feb 28 13:46:15 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 28 Feb 2012 18:46:15 +0000 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: <8CC22492-718A-451D-A8B5-D216E0A37932@gmail.com> References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com>

<8CC22492-718A-451D-A8B5-D216E0A37932@gmail.com> Message-ID: On my Lion laptop, Apple provides Python 2.6.7 and 2.7.1:

$ which python2.6
/usr/bin/python2.6
$ which python2.7
/usr/bin/python2.7
$ which python
/usr/bin/python

The default on my machine is Python 2.7, and both this and Python 2.6 have NumPy 1.5.1 installed - however I'm not 100% sure if I installed this (and if so how), or if it came on the machine:

$ python
Python 2.7.1 (r271:86832, Jun 25 2011, 05:09:01)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.5.1'
>>> quit()
$ python2.6
Python 2.6.7 (r267:88850, Jul 31 2011, 19:30:54)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.5.1'
>>> quit()

That probably doesn't help much :( Peter From anaryin at gmail.com Tue Feb 28 14:29:45 2012 From: anaryin at gmail.com (João Rodrigues) Date: Tue, 28 Feb 2012 20:29:45 +0100 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com>

<8CC22492-718A-451D-A8B5-D216E0A37932@gmail.com> Message-ID: I googled a bit and it is likely you have architecture mismatches everywhere. The numpy error is clearly from that, as 1) I've seen it before and 2) so did someone else. Therefore, I would try to compile NumPy from source on your computer and then install Biopython. I'm guessing that these might help a bit. Also, from your traceback, which Xcode did you install? It says 10.6 in a few places but Lion should be 10.7. Dunno if this might cause some mismatches when compiling stuff. From Jared.Sampson at nyumc.org Tue Feb 28 16:05:10 2012 From: Jared.Sampson at nyumc.org (Sampson, Jared) Date: Tue, 28 Feb 2012 16:05:10 -0500 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com> Message-ID: Hi Daria et al. - What about using pip or easy_install? I'm running a Mac with Lion and I had no problem installing Biopython via:

sudo pip install numpy
sudo pip install biopython

I got a "you need numpy first" warning when I tried installing biopython alone, but running the commands above worked just fine to install numpy first, then Bio. It may be useful to note I've done this mainly within a virtualenv, but I think it should work using the system Python as well. Jared -- Jared Sampson Xiangpeng Kong Lab NYU Langone Medical Center 550 First Ave MSB 329/398 New York, NY 10016 212-263-7898 http://kong.med.nyu.edu/ On Feb 28, 2012, at 12:18 PM, Peter Cock wrote: On Tue, Feb 28, 2012 at 5:00 PM, Daria Fedyukina > wrote: Hi Peter, Thank you. I tried Terminal, it is indeed exactly the same. NumPy is not installed somehow. Here is my screenshot. Do you have any suggestions on how to install NumPy in such a way that this installation is seen when I do "python setup.py build" in the terminal?
-Daria Hi Daria, Personally I use Apple's provided Python, which from memory comes with a (slightly out of date) copy of NumPy. A more detailed reply will have to wait until I'm at my Mac OS X 10.7 "Lion" machine. If anyone else online right now has Biopython installed and working on Mac OS X 10.7 "Lion" and can help now, please speak up. One important question could be was this a clean install of Lion, or an update from Snow Leopard? I'm a little confused now even with the screenshots. It looks like you have download python-2.7.2-macosx10.6.dmg and numpy-1.6.1-py2.7-python.org-macosx10.3.dmg which should work fine together BUT you will have two copies of Python (the Apple provided Python, and Python.org's Python) and it can be confusing about which is in use and which has a given library installed Peter From karolisr at gmail.com Tue Feb 28 23:01:30 2012 From: karolisr at gmail.com (Karolis Ramanauskas) Date: Tue, 28 Feb 2012 22:01:30 -0600 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: References:

Message-ID: Good day, I think it would be a good idea to not rely on the NCBI defaults in Biopython and hardcode retmode and other defaults directly into the Biopython methods. Basically having defaults that Biopython guarantees. That way the defaults will not change haphazardly when NCBI decides to change things. Of course users will still be able to set their own parameters if they do not use default behavior, but this would reduce unexpected changes and the need to go over long stretches of code and change things. Karolis On Tue, Feb 28, 2012 at 09:15, Anna Kostikova wrote: > Dear Peter, > > Thanks a lot for the prompt response. Yes, exactly, it was retmode="text" thing. > Thanks a lot again! > > Anna > > 2012/2/28 Peter Cock : >> On Tue, Feb 28, 2012 at 1:28 PM, Anna Kostikova >> wrote: >>> Dear list, >>> >>> Since today I've started to have an issue with Entrez.efetch utility, >>> and in particular with the rettype parameter. Essentially, the format >>> of the data returned when I specify >>> Entrez.efetch(db='nucleotide',id='JQ042818.1',rettype='gb') does not >>> correspond anymore to a genbank datatype. Few days ago all was fine. >>> What is going on? >> >> The NCBI changed the defaults in mid Feb 2012 with the release of >> EFetch v2.0 - you must now add retmode="text" to the Entrez fetch >> call otherwise you'll probably get XML back. We mentioned this on >> the release notes for Biopython 1.59, but I should probably add this >> link as well: >> >> http://www.ncbi.nlm.nih.gov/mailman/pipermail/utilities-announce/2012-February/000085.html >> >> Peter From p.j.a.cock at googlemail.com Wed Feb 29 05:28:03 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 Feb 2012 10:28:03 +0000 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: References:

Message-ID: On Wed, Feb 29, 2012 at 4:01 AM, Karolis Ramanauskas wrote: > Good day, > > I think it would be a good idea to not rely on the NCBI defaults in > Biopython and hardcode retmode and other defaults directly into the > Biopython methods. Basically having defaults that Biopython > guarantees. That way the defaults will not change haphazardly when > NCBI decides to change things. Of course users will still be able to > set their own parameters if they do not use default behavior, but this > would reduce unexpected changes and the need to go over long stretches > of code and change things. > > Karolis Hi Karolis, That has downsides too - take the NCBI QBLAST API for calling an online BLAST search at the NCBI. Biopython has default values which (I presume) once matched the NCBI, but the NCBI has changed things. As a result, the defaults have diverged and we often see user queries about why their BLAST results via Biopython are different. You could write to the NCBI Entrez team and urge them to consider backwards compatibility more strongly. Peter P.S. I've added an FAQ entry about EFetch defaults to the tutorial: https://github.com/biopython/biopython/commit/2a2ef7ab7b5a9c3ed3891922df7e2d47e2701faf From karolisr at gmail.com Wed Feb 29 08:57:22 2012 From: karolisr at gmail.com (Karolis Ramanauskas) Date: Wed, 29 Feb 2012 07:57:22 -0600 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: References:

Message-ID: I see, you can never make everyone happy. Thanks. On Wed, Feb 29, 2012 at 04:28, Peter Cock wrote: > On Wed, Feb 29, 2012 at 4:01 AM, Karolis Ramanauskas wrote: >> Good day, >> >> I think it would be a good idea to not rely on the NCBI defaults in >> Biopython and hardcode retmode and other defaults directly into the >> Biopython methods. Basically having defaults that Biopython >> guarantees. That way the defaults will not change haphazardly when >> NCBI decides to change things. Of course users will still be able to >> set their own parameters if they do not use default behavior, but this >> would reduce unexpected changes and the need to go over long stretches >> of code and change things. >> >> Karolis > > Hi Karolis, > > That has downsides too - take the NCBI QBLAST API for calling an > online BLAST search at the NCBI. Biopython has default values > which (I presume) once matched the NCBI, but the NCBI has changed > things. As a result, the defaults have diverged and we often see user > queries about why their BLAST results via Biopython are different. > > You could write to the NCBI Entrez team and urge them to consider > backwards compatibility more strongly. > > Peter > > P.S. I've added an FAQ entry about EFetch defaults to the tutorial: > https://github.com/biopython/biopython/commit/2a2ef7ab7b5a9c3ed3891922df7e2d47e2701faf From p.j.a.cock at googlemail.com Wed Feb 29 12:34:56 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 Feb 2012 17:34:56 +0000 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: References:

Message-ID: On Wed, Feb 29, 2012 at 1:57 PM, Karolis Ramanauskas wrote: > I see, you can never make everyone happy. Thanks. Sadly true of many things in life. However, in this particular case, encouraging people to be explicit and provide their desired EFetch retmode/rettype is not only a practical solution, but also very Pythonic: Zen Of Python: Explicit is better than implicit. http://www.python.org/dev/peps/pep-0020/ Peter From karolisr at gmail.com Wed Feb 29 12:40:39 2012 From: karolisr at gmail.com (Karolis Ramanauskas) Date: Wed, 29 Feb 2012 11:40:39 -0600 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: References:

Message-ID: I agree, reading their code after a few weeks of writing it, most people would not remember what implicit settings they relied on. On Feb 29, 2012 11:34 AM, "Peter Cock" wrote: > > On Wed, Feb 29, 2012 at 1:57 PM, Karolis Ramanauskas wrote: > > I see, you can never make everyone happy. Thanks. > > Sadly true of many things in life. > > However, in this particular case, encouraging people to be > explicit and provide their desired EFetch retmode/rettype is > not only a practical solution, but also very Pythonic: > > Zen Of Python: Explicit is better than implicit. > http://www.python.org/dev/peps/pep-0020/ > > Peter From p.j.a.cock at googlemail.com Wed Feb 29 12:45:14 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 Feb 2012 17:45:14 +0000 Subject: [Biopython] Entrez.efetch issue with the returning data type In-Reply-To: References:

Message-ID: On Wed, Feb 29, 2012 at 5:40 PM, Karolis wrote: > On Feb 29, 2012 11:34 AM, Peter wrote: >> On Wed, Feb 29, 2012 at 1:57 PM, Karolis wrote >> > I see, you can never make everyone happy. Thanks. >> >> Sadly true of many things in life. >> >> However, in this particular case, encouraging people to be >> explicit and provide their desired EFetch retmode/rettype is >> not only a practical solution, but also very Pythonic: >> >> Zen Of Python: Explicit is better than implicit. >> http://www.python.org/dev/peps/pep-0020/ >> >> Peter > > I agree, reading their code after a few weeks of writing it, most people > would not remember what implicit settings they relied on. I suppose we could add a warning if the retmode or rettype is not set explicitly. Maybe we should have done it for Biopython 1.59? It would have got people's attention and given them a big clue about how their EFetch script might have been broken by the recent NCBI change. Peter From Jared.Sampson at nyumc.org Wed Feb 29 15:56:53 2012 From: Jared.Sampson at nyumc.org (Sampson, Jared) Date: Wed, 29 Feb 2012 15:56:53 -0500 Subject: [Biopython] question on installation of BioPython on Mac OS 10.7 In-Reply-To: References: <55428E6B-EC5B-42EA-B91C-2C45499D6091@gmail.com> <1D2B139E-E1A7-4682-B2B8-B24186DEEA2D@gmail.com> <76AE15B7-EA83-4DDF-83D3-BAC426E74CE2@gmail.com>

Message-ID: <78E520BD-F3A5-49D4-B83C-60170FC4AE16@nyumc.org> Hi Daria - If the output of typing:

    which easy_install

in a terminal window gives a path, then you have easy_install on your machine already. So you could just type:

    sudo easy_install pip

When that finishes, run the commands from before to install numpy and biopython. Pip and easy_install are both Python package managers, and the choice of one over the other is largely a matter of preference. So if you'd rather not install pip, you could replace "pip install" with "easy_install" in those previous commands. If that doesn't work (i.e. you don't have easy_install either), you can install pip or setuptools (which includes easy_install) by following the instructions on this website I gave the link to before, under 'installation instructions.' Hope that helps, Jared -- Jared Sampson Xiangpeng Kong Lab NYU Langone Medical Center 550 First Ave MSB 329/398 New York, NY 10016 212-263-7898 http://kong.med.nyu.edu/ On Feb 29, 2012, at 12:30 PM, Daria Fedyukina wrote: Thank you Jared, I tried to use these commands but it does not know what "pip" is. I used with no "pip", but it did not help. I will keep searching for the cure :) -Daria On Feb 28, 2012, at 3:05 PM, Sampson, Jared wrote: Hi Daria et al. - What about using pip or easy_install? I'm running a Mac with Lion and I had no problem installing Biopython via:

    sudo pip install numpy
    sudo pip install biopython

I got a "you need numpy first" warning when I tried installing biopython alone, but running the commands above worked just fine to install numpy first, then Bio. It may be useful to note I've done this mainly within a virtualenv, but I think it should work using the system Python as well.
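With several Python installations on one Mac (Apple's system Python, a Python.org build, or a virtualenv), `pip` or `easy_install` can install NumPy and Biopython into a different interpreter from the one that later runs your scripts. A quick diagnostic sketch: run it with the same `python` you use for Biopython to see which interpreter that is and where, if anywhere, each module is imported from. The helper name `describe_modules` is hypothetical.

```python
import importlib
import sys


def describe_modules(names):
    """Map each module name to the file it is imported from.

    The value is None when *this* interpreter cannot import the module, and
    "<built-in>" for modules compiled into the interpreter (no __file__).
    """
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__file__", "<built-in>")
    return found


if __name__ == "__main__":
    # The interpreter actually running this script.
    print("Python executable:", sys.executable)
    for name, path in describe_modules(["numpy", "Bio"]).items():
        print(name, "->", path or "NOT importable by this interpreter")
```

If `numpy` or `Bio` shows up as not importable here even though the installer reported success, the installer most likely targeted a different Python.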
Jared -- Jared Sampson Xiangpeng Kong Lab NYU Langone Medical Center 550 First Ave MSB 329/398 New York, NY 10016 212-263-7898 http://kong.med.nyu.edu/ On Feb 28, 2012, at 12:18 PM, Peter Cock wrote: On Tue, Feb 28, 2012 at 5:00 PM, Daria Fedyukina > wrote: Hi Peter, Thank you. I tried Terminal, it is indeed exactly the same. NumPy is not installed somehow. Here is my screenshot. Do you have any suggestions on how to install NumPy in such a way that this installation is seen when I do "python setup.py build" in the terminal? -Daria Hi Daria, Personally I use Apple's provided Python, which from memory comes with a (slightly out of date) copy of NumPy. A more detailed reply will have to wait until I'm at my Mac OS X 10.7 "Lion" machine. If anyone else online right now has Biopython installed and working on Mac OS X 10.7 "Lion" and can help now, please speak up. One important question could be: was this a clean install of Lion, or an update from Snow Leopard? I'm a little confused now even with the screenshots. It looks like you have downloaded python-2.7.2-macosx10.6.dmg and numpy-1.6.1-py2.7-python.org-macosx10.3.dmg which should work fine together BUT you will have two copies of Python (the Apple provided Python, and Python.org's Python) and it can be confusing about which is in use and which has a given library installed. Peter _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython ------------------------------------------------------------ This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain information that is proprietary, confidential, and exempt from disclosure under applicable law. Any unauthorized review, use, disclosure, or distribution is prohibited. If you have received this email in error please notify the sender by return email and delete the original message.
Please note, the recipient should check this email and any attachments for the presence of viruses. The organization accepts no liability for any damage caused by any virus transmitted by this email. ================================= From mictadlo at gmail.com Wed Feb 29 20:40:02 2012 From: mictadlo at gmail.com (Mic) Date: Thu, 1 Mar 2012 11:40:02 +1000 Subject: [Biopython] samtools does not return correct exit code Message-ID: Hello, Samtools does not return the correct exit code:

    import subprocess
    import logging
    import sys

    def run_cmd(args):
        if subprocess.call(args,shell=True) != 0:
            print 'hello'
            logging.error("Error copying sequence file args='%s'" % str(args))
            return 1
        print 'e', sys.stderr
        print 'o', sys.stdout
        return 0

    def runSamtools( cmd ):
        '''run a samtools command'''
        try:
            retcode = subprocess.call(cmd, shell=True)
            print retcode
            if retcode < 0:
                print >>sys.stderr, "Child was terminated by signal", -retcode
        except OSError, e:
            print >>sys.stderr, "Execution failed:", e

    print run_cmd("samtools faidx ex1.fa")
    print runSamtools("samtools faidx ex1.fa")
    print 'Hello still alive'

and as output I got:

    $ python p3.py
    open: No such file or directory
    [_razf_open] fail to open ex1.fa
    [fai_build] fail to open the FASTA file ex1.fa
    e <open file '<stderr>', mode 'w' at 0x7ffa4658d270>
    o <open file '<stdout>', mode 'w' at 0x7ffa4658d1e0>
    0
    open: No such file or directory
    [_razf_open] fail to open ex1.fa
    [fai_build] fail to open the FASTA file ex1.fa
    0
    None
    Hello still alive

How can I make sure that all samtools commands were executed successfully? Thank you in advance. From p.j.a.cock at googlemail.com Wed Feb 1 17:22:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 1 Feb 2012 17:22:18 +0000 Subject: [Biopython] regarding retrieving antigen information of specific gene using Biopython In-Reply-To: References: Message-ID: On Tue, Jan 31, 2012 at 7:00 AM, shweta dubey wrote: > hello everyone, > > I am new to Biopython.I have a set of genes and i want information of > antigens specific to these genes from a database(suppose, Antigen > Database). > > How can i do the same using Biopython?? > > Thanks in advance > > Shweta Dubey Hi, Which antigen database are you trying to use? If it is one of the NCBI ones you can probably use their Entrez API via Biopython. Peter From bpkth2012 at gmail.com Thu Feb 2 15:45:17 2012 From: bpkth2012 at gmail.com (Sarttu Bourvir) Date: Thu, 2 Feb 2012 16:45:17 +0100 Subject: [Biopython] parsing Blast results (xml) Message-ID: Hi, I am new to biopython and having problems parsing a blast result file (xml format). I can get out alignments, alignment length, title etc. But I would additionally need to print the query title , percent similarity, e-value. How does one do that? Is there anywhere else than Biopython cookbook and help(Bio.Blast.NCBIXML.Record) to look for information. I feel like I don't really understand the Blast.Record and where in there things can be found. Is the sequence query title in the header? Example code would be greatly appreciated!
Thank you, From p.j.a.cock at googlemail.com Thu Feb 2 16:09:54 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 2 Feb 2012 16:09:54 +0000 Subject: [Biopython] parsing Blast results (xml) In-Reply-To: References: Message-ID: On Thu, Feb 2, 2012 at 3:45 PM, Sarttu Bourvir wrote: > Hi, > I am new to biopython and having problems parsing a blast result file (xml > format). > I can get out alignments, alignment length, title etc. > But I would additionally need to print the query title , percent > similarity, e-value. Well e-value is easy, and covered in the tutorial - e.g.

    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print '****Alignment****'
            print 'sequence:', alignment.title
            print 'length:', alignment.length
            print 'e value:', hsp.expect
            print hsp.query[0:75] + '...'
            print hsp.match[0:75] + '...'
            print hsp.sbjct[0:75] + '...'

For percentage similarity I think you must use hsp.positives and the alignment length. Likewise hsp.identities can be used to get the percentage identity. > How does one do that? Is there anywhere else than Biopython > cookbook and help(Bio.Blast.NCBIXML.Record) to look for information. I assume you also know about dir(...) as well? e.g. try dir(hsp) after the above example or dir(alignment) to see what attributes these objects have. > I feel like I don't really understand the > Blast.Record and where in there things can be found. > Is the sequence query title in the header? Yes, the query details should be captured. Try dir(blast_record) where blast_record is a Bio.Blast.Record from the parser. Peter From drobukow at UTMB.EDU Tue Feb 7 14:41:50 2012 From: drobukow at UTMB.EDU (Obukowicz, Dennis R.) Date: Tue, 7 Feb 2012 14:41:50 +0000 Subject: [Biopython] Problems Installing: Can't find modules Seq and Alphabet plus many others Message-ID: <43C6C371D341DE44A81477FD432BA65854CB0B1F@GRMBX4.utmb.edu> I am new to Biopython and have tried installing Biopython according to instructions.
When I run the test after installing I get many errors, 96 errors (see below some examples) in all out of 154 test runs. Two errors that keep popping up are not being able to find module Seq and module Alphabet.

    ImportError: No module named Seq
    ImportError: No module named Alphabet
    NameError: name 'Seq' is not defined
    NameError: name 'record' is not defined
    NameError: name 'protein_rec' is not defined
    NameError: name 'protein_rec' is not defined

Dennis From drobukow at UTMB.EDU Tue Feb 7 16:01:21 2012 From: drobukow at UTMB.EDU (Obukowicz, Dennis R.) Date: Tue, 7 Feb 2012 16:01:21 +0000 Subject: [Biopython] Problems Installing: Can't find modules Seq and Alphabet plus many others In-Reply-To: <43C6C371D341DE44A81477FD432BA65854CB0B1F@GRMBX4.utmb.edu> References: <43C6C371D341DE44A81477FD432BA65854CB0B1F@GRMBX4.utmb.edu> Message-ID: <43C6C371D341DE44A81477FD432BA65854CB0B94@GRMBX4.utmb.edu> I solved some of my earlier problems with adjusting the path with the sys.path.append, adding directories where packages are located. However, now I keep getting this error below. I've searched for this error but can't find any mention of it. Can anyone help?

    ERROR: Bio.Wise
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "run_tests.py", line 327, in runTest
        module = __import__(name, None, None, name.split("."))
      File "/usr/local/biopython/biopython-1.58/build/lib.linux-x86_64-2.7/Bio/Wise/__init__.py", line 20, in <module>
        from Bio import SeqIO
      File "/usr/local/biopython/biopython-1.58/build/lib.linux-x86_64-2.7/Bio/SeqIO/__init__.py", line 308, in <module>
        import Seq
      File "/usr/local/biopython/biopython-1.58/Bio/Seq.py", line 31, in <module>
        import ambiguous_dna_complement, ambiguous_rna_complement
    ImportError: No module named ambiguous_dna_complement

From: Obukowicz, Dennis R.
Sent: Tuesday, February 07, 2012 8:42 AM To: 'biopython at lists.open-bio.org' Subject: Problems Installing: Can't find modules Seq and Alphabet plus many others I am new to Biopython and have tried installing Biopython according to instructions. When I run the test after installing I get many errors, 96 errors (see below some examples) in all out of 154 test runs. Two errors that keep popping up are not being able to find module Seq and module Alphabet.

    ImportError: No module named Seq
    ImportError: No module named Alphabet
    NameError: name 'Seq' is not defined
    NameError: name 'record' is not defined
    NameError: name 'protein_rec' is not defined
    NameError: name 'protein_rec' is not defined

Dennis From devaniranjan at gmail.com Wed Feb 8 01:01:31 2012 From: devaniranjan at gmail.com (George Devaniranjan) Date: Tue, 7 Feb 2012 20:01:31 -0500 Subject: [Biopython] comparing sequences.qustion Message-ID: Hi, I have a list of > 200, 000 UNIQUE short EQUAL length sequences. I do the following I am comparing ALL sequences against ALL sequences so there will be (200000 * 199999 )/2 comparisons Once a sequence is compared, if they differ from one another by ONE letter only . then I do another more detailed alignment using a BLOSUM matrix. Currently I use the pairwise sequence comparison code found in BIOPYTHON for both comparison, simple comparison where I set

    match = 0
    mismatch = -1

If the total alignment score is equal to -1 (meaning only one mismatch) then I go a further step and do a BLOSUM alignment. This works but it's taking a long long time, I suspect it's because I am using TWO alignments but I think there could be a way to do the first simple alignment WITHOUT using the pairwise alignment code for the first part will speed up this calculation. Unfortunately I don't have much more than a desktop to do this, so if someone can suggest a quicker way to do this, I would appreciate it.
Thank you, George From eric.talevich at gmail.com Wed Feb 8 01:50:28 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 7 Feb 2012 20:50:28 -0500 Subject: [Biopython] comparing sequences.qustion In-Reply-To: References: Message-ID: On Tue, Feb 7, 2012 at 8:01 PM, George Devaniranjan wrote: > Hi, > > I have a list of > 200, 000 UNIQUE short EQUAL length sequences. > I do the following > > I am comparing ALL sequences against ALL sequences so there will be (200000 > * 199999 )/2 comparisons > Once a sequence is compared, if they differ from one another by ONE letter > only . then I do another more detailed alignment using a BLOSUM matrix. > > Currently I use the pairwise sequence comparison code found in BIOPYTHON > for both comparison, simple comparison where I set > match = 0 > mismatch = -1 > If the total alignment score is equal to -1 (meaning only one mismatch) > then I go a further step and do a BLOSUM alignment. > > This works but it's taking a long long time, I suspect it's because I am > using TWO alignments but I think there could be a way to do the first > simple alignment WITHOUT using the pairwise alignment code for the first > part will speed up this calculation. > Unfortunately I don't have much more than a desktop to do this, so if > someone can suggest a quicker way to do this, I would appreciate it. > > Thank you, > George > > Hi George, If your sequences are all equal length, and you're interested in the ones that differ by 1 character, then the difference between any two of those sequences of interest will be a single mismatched character. You don't need to do an alignment at all. Without Python: try clustering at whatever identity threshold corresponds to edit distance 1 in your sequences. UCLUST/USEARCH and other programs can do this quickly.
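For George's case (unique, equal-length sequences, looking for pairs that differ at exactly one position), counting mismatches is enough, as Eric says, and the all-against-all loop can be avoided too. The sketch below combines a mismatch count with the bucket-by-halves (pigeonhole) idea Nathan Edwards describes later in this thread: a pair that differs at one position must share either its left or its right half, so only sequences sharing a half are ever compared. Buckets are keyed by side as well as by half, which is the left/right fix Nathan mentions. The function names are hypothetical; the pairs it returns could then be passed to the more expensive BLOSUM alignment.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def one_mismatch_pairs(seqs):
    """Return the set of sequence pairs that differ at exactly one position.

    Pigeonhole argument: with one mismatch, either the left halves or the
    right halves are identical, so only sequences sharing a half need to be
    compared. Keys carry the side ("L"/"R") so a left half and a right half
    that happen to be equal strings never share a bucket.
    """
    if not seqs:
        return set()
    length = len(seqs[0])
    mid = length // 2
    buckets = defaultdict(list)
    for s in seqs:
        if len(s) != length:
            raise ValueError("all sequences must have the same length")
        buckets[("L", s[:mid])].append(s)
        buckets[("R", s[mid:])].append(s)

    pairs = set()
    for group in buckets.values():
        for a, b in combinations(group, 2):
            if hamming(a, b) == 1:
                # Sort each pair so the same pair is stored only once.
                pairs.add(tuple(sorted((a, b))))
    return pairs
```

The work now depends on bucket sizes rather than on all N(N-1)/2 pairs. With very short sequences or a small alphabet the buckets can still be large, so this is a heuristic speed-up, not a guaranteed bound.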
With Python: try an expression like:

    seq_pairs_of_interest = []
    for i, aseq in enumerate(input_seq_list[:-1]):
        for bseq in input_seq_list[i+1:]:
            if sum(a != b for a, b in zip(aseq, bseq)) == 1:
                seq_pairs_of_interest.append((aseq, bseq))

Hope that helps, Eric From nje5 at georgetown.edu Wed Feb 8 15:50:15 2012 From: nje5 at georgetown.edu (Nathan Edwards) Date: Wed, 08 Feb 2012 10:50:15 -0500 Subject: [Biopython] comparing sequences.qustion In-Reply-To: References: Message-ID: <4F3299B7.4090305@georgetown.edu> Classical method (essentially BYP, obligatory reference to Goldberg):

* for each sequence, divide in two, get s1 and s2.
* place the sequences (or a reference/index) in a dictionary with list values at key s1 and s2.

This is linear time. Any pair of sequences that differ in only one position _must_ have at least one of their halves in common, so do detailed alignment on all pairs of sequences with a common key. You specified unique, so each pair must be considered at most once. If you had duplicates, these would be aligned for each of their halves (and you'd have to normalize these out, somehow). This will be a small fraction of all pairs, assuming these are not pathological sequences. This works well as long as the halves have enough specificity - for DNA length 10 halves should work. Note that this doesn't distinguish between left-halves and right-halves, which might have the same key values, but obviously won't differ by one. Fixing this is an easy modification. BTW, this works even for edit-distance. Only concern is the use of the in-memory dictionary data-structure, which can get big. Untested pseudocode:

    from collections import defaultdict
    from itertools import combinations

    n = 20
    halves = defaultdict(list)
    for s in sequences:
        s1 = s[:n/2]
        s2 = s[n/2:]
        halves[s1].append(s)
        halves[s2].append(s)

    for k in halves.iterkeys():
        for seq1,seq2 in combinations(halves[k],2):
            # check for one-change before expensive alignment?
            align(seq1,seq2)

- n On 2/7/2012 8:50 PM, Eric Talevich wrote: > On Tue, Feb 7, 2012 at 8:01 PM, George Devaniranjan > wrote: > >> Hi, >> >> I have a list of> 200, 000 UNIQUE short EQUAL length sequences. >> I do the following >> >> I am comparing ALL sequences against ALL sequences so there will be (200000 >> * 199999 )/2 comparisons >> Once a sequence is compared, if they differ from one another by ONE letter >> only . then I do another more detailed alignment using a BLOSUM matrix. >> >> Currently I use the pairwise sequence comparison code found in BIOPYTHON >> for both comparison, simple comparison where I set >> match = 0 >> mismatch = -1 >> If the total alignment score is equal to -1 (meaning only one mismatch) >> then I go a further step and do a BLOSUM alignment. >> >> This works but it's taking a long long time, I suspect it's because I am >> using TWO alignments but I think there could be a way to do the first >> simple alignment WITHOUT using the pairwise alignment code for the first >> part will speed up this calculation. >> Unfortunately I don't have much more than a desktop to do this, so if >> someone can suggest a quicker way to do this, I would appreciate it. >> >> Thank you, >> George >> >> > Hi George, > > If your sequences are all equal length, and you're interested in the ones > that differ by 1 character, then the difference between any two of those > sequences of interest will be a single mismatched character. You don't need > to do an alignment at all. > > Without Python: try clustering at whatever identity threshold corresponds > to edit distance 1 in your sequences. UCLUST/USEARCH and other programs can > do this quickly.
>
> With Python: try an expression like:
>
> seq_pairs_of_interest = []
> for i, aseq in enumerate(input_seq_list[:-1]):
>     for bseq in input_seq_list[i+1:]:
>         if sum(a != b for a, b in zip(aseq, bseq)) == 1:
>             seq_pairs_of_interest.append((aseq, bseq))
>
>
> Hope that helps,
> Eric

-- Dr. Nathan Edwards nje5 at georgetown.edu Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center Room 1215, Harris Building Room 347, Basic Science 3300 Whitehaven St, NW 3900 Reservoir Road, NW Washington DC 20007 Washington DC 20007 Phone: 202-687-7042 Phone: 202-687-1618 Fax: 202-687-0057 Fax: 202-687-7186 From nje5 at georgetown.edu Wed Feb 8 16:08:30 2012 From: nje5 at georgetown.edu (Nathan Edwards) Date: Wed, 08 Feb 2012 11:08:30 -0500 Subject: [Biopython] comparing sequences.qustion In-Reply-To: <4F3299B7.4090305@georgetown.edu> References: <4F3299B7.4090305@georgetown.edu> Message-ID: <4F329DFE.6010607@georgetown.edu> Argh, Gusfield "Algorithms on Strings, Trees, and Sequences" is the obligatory string matching reference... - n On 2/8/2012 10:50 AM, Nathan Edwards wrote: > > Classical method (essentially BYP, obligatory reference to Goldberg): > > * for each sequence, divide in two, get s1 and s2. > * place the sequences (or a reference/index) in a dictionary with list > values at key s1 and s2. > > This is linear time. > > Any pair of sequences that differ in only one position _must_ have at > least one of their halves in common, so do detailed alignment on all > pairs of sequences with a common key. You specified unique, so each pair > must be considered at most once. If you had duplicates, these would be > aligned for each of their halves (and you'd have to normalize these out, > somehow).
This will be a small fraction of all pairs, assuming these are > not pathological sequences. > > This works well as long as the halves have enough specificity - for DNA > length 10 halves should work. Note that this doesn't distinguish between > left-halves and right-halves, which might have the same key values, but > obviously won't differ by one. Fixing this is an easy modification. BTW, > this works even for edit-distance. Only concern is the use of the > in-memory dictionary data-structure, which can get big. > > Untested pseudocode:
>
> from collections import defaultdict
> from itertools import combinations
>
> n = 20
> halves = defaultdict(list)
> for s in sequences:
>     s1 = s[:n/2]
>     s2 = s[n/2:]
>     halves[s1].append(s)
>     halves[s2].append(s)
>
> for k in halves.iterkeys():
>     for seq1,seq2 in combinations(halves[k],2):
>         # check for one-change before expensive alignment?
>         align(seq1,seq2)
>
> - n
>
> On 2/7/2012 8:50 PM, Eric Talevich wrote: >> On Tue, Feb 7, 2012 at 8:01 PM, George Devaniranjan >> wrote: >> >>> Hi, >>> >>> I have a list of> 200, 000 UNIQUE short EQUAL length sequences. >>> I do the following >>> >>> I am comparing ALL sequences against ALL sequences so there will be >>> (200000 >>> * 199999 )/2 comparisons >>> Once a sequence is compared, if they differ from one another by ONE >>> letter >>> only . then I do another more detailed alignment using a BLOSUM matrix. >>> >>> Currently I use the pairwise sequence comparison code found in BIOPYTHON >>> for both comparison, simple comparison where I set >>> match = 0 >>> mismatch = -1 >>> If the total alignment score is equal to -1 (meaning only one mismatch) >>> then I go a further step and do a BLOSUM alignment. >>> >>> This works but it's taking a long long time, I suspect it's because I am >>> using TWO alignments but I think there could be a way to do the first >>> simple alignment WITHOUT using the pairwise alignment code for the first >>> part will speed up this calculation.
>>> Unfortunately I don't have much more than a desktop to do this, so if >>> someone can suggest a quicker way to do this, I would appreciate it. >>> >>> Thank you, >>> George >>> >>> >> Hi George, >> >> If your sequences are all equal length, and you're interested in the ones >> that differ by 1 character, then the difference between any two of those >> sequences of interest will be a single mismatched character. You don't >> need >> to do an alignment at all. >> >> Without Python: try clustering at whatever identity threshold corresponds >> to edit distance 1 in your sequences. UCLUST/USEARCH and other >> programs can >> do this quickly. >> >> With Python: try an expression like:
>>
>> seq_pairs_of_interest = []
>> for i, aseq in enumerate(input_seq_list[:-1]):
>>     for bseq in input_seq_list[i+1:]:
>>         if sum(a != b for a, b in zip(aseq, bseq)) == 1:
>>             seq_pairs_of_interest.append((aseq, bseq))
>>
>>
>> Hope that helps,
>> Eric
> > -- Dr. Nathan Edwards nje5 at georgetown.edu Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center Room 1215, Harris Building Room 347, Basic Science 3300 Whitehaven St, NW 3900 Reservoir Road, NW Washington DC 20007 Washington DC 20007 Phone: 202-687-7042 Phone: 202-687-1618 Fax: 202-687-0057 Fax: 202-687-7186 From d.m.a.martin at dundee.ac.uk Thu Feb 9 14:36:42 2012 From: d.m.a.martin at dundee.ac.uk (David Martin) Date: Thu, 9 Feb 2012 14:36:42 +0000 Subject: [Biopython] Proteomics tools in BioPython Message-ID: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com> We are planning to develop some proteomics tools in python and have a view to submit them as part of Biopython. Primarily we will be writing wrappers/parsers for the OpenMS tools/output formats and analytic tools on top of that.
If anyone else is working on python wrappers for openms then I'd be happy to share expertise. ..d Dr David Martin College of Life Sciences University of Dundee The University of Dundee is a registered Scottish Charity, No: SC015096 From p.j.a.cock at googlemail.com Thu Feb 9 18:10:39 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 9 Feb 2012 18:10:39 +0000 Subject: [Biopython] Proteomics tools in BioPython In-Reply-To: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com> References: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com> Message-ID: On Thu, Feb 9, 2012 at 2:36 PM, David Martin wrote: > We are planning to develop some proteomics tools in python and > have a view to submit them as part of Biopython. > Primarily we will be writing wrappers/parsers for the OpenMS > tools/output formats and analytic tools on top of that. If anyone > else is working on python wrappers for openms then I'd be happy > to share expertise. > > ..d Thanks David - that was quick, you must have sent this almost straight after our chat this afternoon at the Dundee NextGenBUG meeting :) Peter From eric.talevich at gmail.com Thu Feb 9 20:41:02 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 9 Feb 2012 15:41:02 -0500 Subject: [Biopython] Proteomics tools in BioPython In-Reply-To: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com> References: <959CFF5060375249824CC633DDDF896F086CB9AD@AMSPRD0402MB109.eurprd04.prod.outlook.com> Message-ID: On Thu, Feb 9, 2012 at 9:36 AM, David Martin wrote: > We are planning to develop some proteomics tools in python and have a view > to submit them as part of Biopython. > Primarily we will be writing wrappers/parsers for the OpenMS tools/output > formats and analytic tools on top of that. If anyone else is working on > python wrappers for openms then I'd be happy to share expertise. > > ..d > > Sounds great to me! 
Since Google Summer of Code is coming up soon, do you see an opportunity to take on a student to help out with this work or build something on top of it? -Eric From d.m.a.martin at dundee.ac.uk Fri Feb 10 17:03:04 2012 From: d.m.a.martin at dundee.ac.uk (David Martin) Date: Fri, 10 Feb 2012 17:03:04 +0000 Subject: [Biopython] Proteomics tools in biopython Message-ID: <959CFF5060375249824CC633DDDF896F08709A95@AMSPRD0402MB113.eurprd04.prod.outlook.com> There may be potential but I don't have the time to get organized for this year. We shall see how things progress. ..d On Thu, Feb 9, 2012 at 9:36 AM, David Martin wrote: > We are planning to develop some proteomics tools in python and have a > view to submit them as part of Biopython. > Primarily we will be writing wrappers/parsers for the OpenMS > tools/output formats and analytic tools on top of that. If anyone else > is working on python wrappers for openms then I'd be happy to share expertise. > > ..d > > Sounds great to me! Since Google Summer of Code is coming up soon, do you see an opportunity to take on a student to help out with this work or build something on top of it? -Eric From rbuels at gmail.com Fri Feb 10 17:51:12 2012 From: rbuels at gmail.com (Robert Buels) Date: Fri, 10 Feb 2012 12:51:12 -0500 Subject: [Biopython] Google Summer of Code project ideas Message-ID: <4F355910.4060203@gmail.com> Hi all, I'm going to be OBF project admin again this year for Google Summer of code. OBF's application is due in a couple of weeks, and we need to update our project ideas on the OBF wiki page and on each project's individual wiki pages.
So, for each of the OBF projects that wants to do GSoC again this year, please: a.) Update the list of project ideas on your project's GSoC page (BioPython, BioPerl, BioRuby, etc). Add new ones, remove ones that have already been done or no longer relevant, etc. b.) Update the list of project ideas on the main OBF GSoC page (http://www.open-bio.org/wiki/Google_Summer_of_Code) to match. c.) Let me know via email that you have done so and it's ready for Google to peruse. Please have the updates done, if possible, by this Friday (March 11). The number and quality of the project ideas are part of the evaluation process for whether OBF is accepted as a Summer of Code organization again this year, so let's come up with some good ones. :-) Rob ---- Robert Buels (prospective) 2012 OBF GSoC Organization Admin From mjldehoon at yahoo.com Sun Feb 12 03:35:44 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 11 Feb 2012 19:35:44 -0800 (PST) Subject: [Biopython] Digital gene expression Message-ID: <1329017744.74960.YahooMailClassic@web161203.mail.bf1.yahoo.com> Hi everybody, EdgeR and DESeq are popular R packages to analyze differential expression in digital gene expression methodologies such as RNAseq and CAGE. Is something similar to these R packages available in Python (or is anybody working on such a module for Biopython)? Not that I don't like EdgeR / DESeq, but I'd prefer something in Python so that I can understand what I am doing. Thanks, -Michiel. 
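To give a flavour of one step that a package like DESeq performs, the sketch below computes median-of-ratios size factors, the per-sample normalization DESeq uses to account for differing library sizes before testing for differential expression. It is a minimal pure-Python illustration under simplifying assumptions (a dense genes-by-samples count table as nested lists, with genes that have a zero in any sample skipped). It is not a replacement for edgeR or DESeq, which also model dispersion and carry out the statistical tests, and the function name is hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (DESeq-style library normalization).

    counts: list of genes, each a list of raw read counts (one per sample).
    Each count is divided by its gene's geometric mean across samples; a
    sample's size factor is the median of those ratios. Genes with a zero
    in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        # Work in log space to avoid overflow when multiplying many counts.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    if not ratios[0]:
        raise ValueError("no gene has a non-zero count in every sample")
    return [median(r) for r in ratios]
```

For example, if every gene in the second sample has exactly twice the counts of the first, the two size factors come out as 1/√2 and √2, a 1:2 ratio; dividing each sample's counts by its size factor puts both on a common scale.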
From p.j.a.cock at googlemail.com Sun Feb 12 12:27:12 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 12 Feb 2012 12:27:12 +0000 Subject: [Biopython] Digital gene expression In-Reply-To: <1329017744.74960.YahooMailClassic@web161203.mail.bf1.yahoo.com> References: <1329017744.74960.YahooMailClassic@web161203.mail.bf1.yahoo.com> Message-ID: On Sunday, February 12, 2012, Michiel de Hoon wrote: > Hi everybody, > > EdgeR and DESeq are popular R packages to analyze differential > expression in digital gene expression methodologies such as > RNAseq and CAGE. Is something similar to these R packages > available in Python (or is anybody working on such a module > for Biopython)? Not that I don't like EdgeR / DESeq, but I'd > prefer something in Python so that I can understand what I > am doing. > > Thanks, > -Michiel. I'm not sure, but try rpy or rpy2 for calling these R libraries from Python. If you know both languages it is very powerful. Peter From tiagoantao at gmail.com Sun Feb 12 12:39:20 2012 From: tiagoantao at gmail.com (Tiago Antão) Date: Sun, 12 Feb 2012 12:39:20 +0000 Subject: [Biopython] Digital gene expression In-Reply-To: References: <1329017744.74960.YahooMailClassic@web161203.mail.bf1.yahoo.com> Message-ID: Hi, On Sun, Feb 12, 2012 at 12:27 PM, Peter Cock > I'm not sure, but try rpy or rpy2 for calling these R rpy2 is an extremely declarative library. One almost forgets it is writing R inside Python. It seems to be brilliantly well done. I have only used it a couple of times myself, but I can only offer praise for it. From dan.bolser at gmail.com Tue Feb 14 01:32:39 2012 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 14 Feb 2012 01:32:39 +0000 Subject: [Biopython] Fwd: Interested in Variation? In-Reply-To: References:

Message-ID: Job at the EBI: "... the primary responsibility of the post-holder will be the development of pipelines and storage solutions for variation data deriving from whole genome re-sequencing." http://goo.gl/eQrRu or http://ig14.i-grasp.com/fe/tpl_embl01.asp?s=LktVsYDaNlCOtQqCli&jobid=47627,4187528723&key=45520467&c=126152212583&pagestamp=dbnjwyufpgvrmyvkmt Cheers, Dan. From chris.mit7 at gmail.com Tue Feb 14 20:30:41 2012 From: chris.mit7 at gmail.com (Chris Mitchell) Date: Tue, 14 Feb 2012 15:30:41 -0500 Subject: [Biopython] Proteomics tools in BioPython Message-ID: Hey David, What sort of tools do you have in mind for proteomics? I have quite a few stashed away (3/6 frame translations, GFF files->proteins, X!Tandem parsers/FDR calculators, GFF parsers, etc.) Chris From d.m.a.martin at dundee.ac.uk Wed Feb 15 17:22:48 2012 From: d.m.a.martin at dundee.ac.uk (David Martin) Date: Wed, 15 Feb 2012 17:22:48 +0000 Subject: [Biopython] Proteomics tools for biopython Message-ID: <959CFF5060375249824CC633DDDF896F08710546@AMSPRD0402MB113.eurprd04.prod.outlook.com> > Hey David, > > What sort of tools do you have in mind for proteomics? I have quite a few stashed away (3/6 frame translations, GFF files->proteins, X!Tandem parsers/FDR calculators, GFF parsers, etc.) > > Chris At present we are wrapping the OpenMS outputs (featureML etc) so that we can interrogate the detail of how the runs behave. It is insightful to see (for example) how many of the ms/ms are on overlapping peptides, and the distribution of ms/ms selections per feature (vs intensity). This is just the first stage. Having these data (which up till now have been difficult to access) allows for building of smarter tools (custom delta mass thresholds for each ms/ms, second peptide searching, seeing whether all the peptide ID for a feature agree, correlating ID from different search engines to the same spectra). 
There are outstanding questions from our users for things like 'is it really necessary to do duplicate runs?' or in other words, can we get the machine to treat duplicate runs differently to optimise ID. (under the principle that madness is doing the same thing repeatedly but expecting different results.) Parsers for X!Tandem would be really useful as that is something we'd like to have in our tool chain. A Mascot one would be good - I am looking into that (it is on my list of things to do, just not near the top right now.) I very much favour a modular approach where each class/object does one thing really well and can feed output to another class, and all can be represented using open formats. It might be a good idea to arrange a telecon or Skype group chat for people who are interested in contributing to this and building a comprehensive set of tools into Biopython. I can't promise too much from our end but we are making good progress and we have a strong commitment to open software and algorithms, with a heavy python development presence. ..d The University of Dundee is a registered Scottish Charity, No: SC015096 From Achim.Treumann at NEPAF.com Wed Feb 15 18:49:30 2012 From: Achim.Treumann at NEPAF.com (Achim Treumann) Date: Wed, 15 Feb 2012 18:49:30 -0000 Subject: [Biopython] Proteomics tools for biopython In-Reply-To: <959CFF5060375249824CC633DDDF896F08710546@AMSPRD0402MB113.eurprd04.prod.outlook.com> References: <959CFF5060375249824CC633DDDF896F08710546@AMSPRD0402MB113.eurprd04.prod.outlook.com> Message-ID: <01798D2396253A449511F31F1CDE835519D5ED@srv1.NEPAF.local> Hi, I could possibly contribute a few snippets regarding an X!Tandem parser and a few other little tools - at the moment I am very busy, but can take part in the discussion more in March.
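Since an X!Tandem parser comes up repeatedly in this thread, here is a minimal sketch of what one might look like using only the standard library's ElementTree. It assumes X!Tandem's XML layout of `group` elements (with `type="model"` and charge `z`) containing `protein` elements, whose nested `domain` elements carry the peptide sequence (`seq`), `expect` and `hyperscore` attributes. The sample document, the names `iter_domains` and `filter_by_expect`, and the 0.01 cut-off are all illustrative assumptions; real output files carry many more attributes and should be checked against the X!Tandem documentation.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up document in the assumed X!Tandem layout (real files carry
# many more attributes, plus non-model groups such as type="parameters").
SAMPLE_XML = """<bioml label="example search">
  <group id="1" mh="1234.56" z="2" expect="1.2e-05" type="model">
    <protein id="1.1" label="sp|P12345|EXAMPLE">
      <peptide start="1" end="300">
        <domain id="1.1.1" start="10" end="17" expect="1.2e-05"
                mh="1234.56" delta="0.01" hyperscore="45.3" seq="PEPTIDEK"/>
      </peptide>
    </protein>
  </group>
  <group id="2" mh="987.65" z="3" expect="0.5" type="model">
    <protein id="2.1" label="sp|Q99999|OTHER">
      <peptide start="1" end="150">
        <domain id="2.1.1" start="5" end="12" expect="0.5"
                mh="987.65" delta="-0.02" hyperscore="20.1" seq="ANOTHERK"/>
      </peptide>
    </protein>
  </group>
  <group label="input parameters" type="parameters"/>
</bioml>"""


def iter_domains(root):
    """Yield one dict per peptide match (<domain>) inside model groups."""
    for group in root.iter("group"):
        if group.get("type") != "model":
            continue  # skip parameter/support groups, which have no matches
        charge = int(group.get("z"))
        for protein in group.iter("protein"):
            for domain in protein.iter("domain"):
                yield {
                    "group_id": group.get("id"),
                    "charge": charge,
                    "protein": protein.get("label"),
                    "seq": domain.get("seq"),
                    "expect": float(domain.get("expect")),
                    "hyperscore": float(domain.get("hyperscore")),
                }


def filter_by_expect(matches, max_expect=0.01):
    """Keep matches whose expectation value is at or below max_expect."""
    return [m for m in matches if m["expect"] <= max_expect]


if __name__ == "__main__":
    # For a real file one would use: root = ET.parse("output.xml").getroot()
    root = ET.fromstring(SAMPLE_XML)
    for m in filter_by_expect(iter_domains(root)):
        print(m["protein"], m["seq"], m["charge"], m["expect"])
```

Yielding plain dicts keeps the parser independent of any downstream class, in the spirit of the modular, one-job-per-component approach described above.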
Another interesting toolset that has been released today by Mike Gorshkov's research group is pyteomics 1.0.0 http://pypi.python.org/pypi/pyteomics/1.0.0 Best wishes, Achim -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of David Martin Sent: 15 February 2012 17:23 To: 'biopython at lists.open-bio.org' Subject: [Biopython] Proteomics tools for biopython > Hey David, > > What sort of tools do you have in mind for proteomics? I have quite a few stashed away (3/6 frame translations, GFF files->proteins, X!Tandem parsers/FDR calculators, GFF parsers, etc.) > > Chris At present we are wrapping the OpenMS outputs (featureML etc) so that we can interrogate the detail of how the runs behave. It is insightful to see (for example) how many of the ms/ms are on overlapping peptides, and the distribution of ms/ms selections per feature (vs intensity). This is just the first stage. Having these data (which up till now have been difficult to access) allows for building of smarter tools (custom delta mass thresholds for each ms/ms, second peptide searching, seeing whether all the peptide ID for a feature agree, correlating ID from different search engines to the same spectra). There are outstanding questions from our users for things like 'is it really necessary to do duplicate runs?' or in other words, can we get the machine to treat duplicate runs differently to optimise ID. (under the principle that madness is doing the same thing repeatedly but expecting different results.) Parsers for X!Tandem would be really useful as that is something we'd like to have in our tool chain. A Mascot one would be good - I am looking into that (it is on my list of things to do, just not near the top right now.) I very much favour a modular approach where each class/object does one thing really well and can feed output to another class, and all can be represented using open formats.
It might be a good idea to arrange a telecon or Skype group chat for people who are interested in contributing to this and building a comprehensive set of tools into Biopython. I can't promise too much from our end but we are making good progress and we have a strong commitment to open software and algorithms, with a heavy python development presence. ..d The University of Dundee is a registered Scottish Charity, No: SC015096 _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From schnoes at gmail.com Fri Feb 17 01:18:30 2012 From: schnoes at gmail.com (Alexandra Schnoes) Date: Thu, 16 Feb 2012 17:18:30 -0800 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Message-ID: Hi, I have some python code (using BioPython 1.58) that uses Bio.Entrez to pull out information on 50 papers from pubmed. I have had no problem with this code until yesterday when I started getting HTTP Error 500 messages that continue until today. For example... Traceback (most recent call last): File "<stdin>", line 1, in <module> File "sp_tools.py", line 421, in top_papers_dict rettype="medline", retmode="text") File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 113, in efetch return _open(cgi, variables) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 360, in _open raise exception urllib2.HTTPError: HTTP Error 500: Internal server error The parameters I'm using are db = pubmed rettype = medline retmode = text Anyone have an idea why this might be happening now? Thanks! Alexandra -- Alexandra Schnoes, Ph.D.
Scientist, Babbitt Laboratory Program Coordinator, Graduate Student Internships for Career Exploration University of California San Francisco Tel: 415-502-1248 Fax: 415-514-9656 Email: schnoes at gmail.com From jgrant at smith.edu Fri Feb 17 02:39:54 2012 From: jgrant at smith.edu (Jessica Grant) Date: Thu, 16 Feb 2012 21:39:54 -0500 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References: Message-ID: I have a script I have used successfully in the past that uses Entrez. Yesterday, a lab-mate was using it but about half way through the files she was processing she got an error and it wouldn't work after that. I went over the script from top to bottom to see what went wrong and couldn't find a problem. Your question gives me hope that it is something happening at ncbi. Our error was not identical. Our script seemed to work until it tried to read the output of the Entrez.efetch and then found no record in the handle (or something like that...I don't have it in front of me.) I suggested my lab mate contact the ncbi help desk to see if anything was wrong on their end, but I don't know if she did or if she heard back. Jessica On Thu, Feb 16, 2012 at 8:18 PM, Alexandra Schnoes wrote: > Hi, > > I have some python code (using BioPython 1.58) that uses Bio.Entrez to pull > out information on 50 papers from pubmed. I have had no problem with this > code until yesterday when I started getting HTTP Error 500 messages that > continue until today. For example...
> > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "sp_tools.py", line 421, in top_papers_dict > rettype="medline", retmode="text") > File > > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", > line 113, in efetch > return _open(cgi, variables) > File > > "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", > line 360, in _open > raise exception > urllib2.HTTPError: HTTP Error 500: Internal server error > > The parameters I'm using are > db = pubmed > rettype = medline > retmode = text > > Anyone have an idea why this might be happening now? > > Thanks! > Alexandra > > > -- > Alexandra Schnoes, Ph.D. > Scientist, Babbitt Laboratory > Program Coordinator, Graduate Student Internships for Career Exploration > University of California San Francisco > Tel: 415-502-1248 > Fax: 415-514-9656 > Email: schnoes at gmail.com From mictadlo at gmail.com Fri Feb 17 06:39:27 2012 From: mictadlo at gmail.com (Mic) Date: Fri, 17 Feb 2012 16:39:27 +1000 Subject: [Biopython] histogram plot of insert size Message-ID: Hi all, How is it possible to create histogram plot of insert size with pysam/Biopython? Thank you in advance. Cheers From schnoes at gmail.com Fri Feb 17 07:31:10 2012 From: schnoes at gmail.com (Alexandra Schnoes) Date: Thu, 16 Feb 2012 23:31:10 -0800 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References:

Message-ID: Thanks, that is somewhat encouraging to hear. I have also emailed NCBI (that's actually where the multi-repeat subject line came from. My computer sometimes has weird lags and apparently I hit ctrl-v a couple of times when copying over the email I sent to NCBI, and didn't notice it. My apologies). Hopefully one of us will hear something soon! Alex On Thu, Feb 16, 2012 at 6:39 PM, Jessica Grant wrote: > I have a script I have used successfully in the past that uses Entrez. > Yesterday, a lab-mate was using it but about half way through the files > she was processing she got an error and it wouldn't work after that. I > went over the script from top to bottom to see what went wrong and couldn't > find a problem. Your question gives me hope that it is something happening > at ncbi. Our error was not identical. Our script seemed to work until it > tried to read the output of the Entrez.efetch and then found no record in > the handle (or something like that...I don't have it in front of me.) > > I suggested my lab mate contact the ncbi help desk to see if anything was > wrong on their end, but I don't know if she did or if she heard back. > > Jessica > > > > > > On Thu, Feb 16, 2012 at 8:18 PM, Alexandra Schnoes wrote: > >> Hi, >> >> I have some python code (using BioPython 1.58) that uses Bio.Entrez to >> pull >> out information on 50 papers from pubmed. I have had no problem with this >> code until yesterday when I started getting HTTP Error 500 messages that >> continue until today. For example...
>> >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> File "sp_tools.py", line 421, in top_papers_dict >> rettype="medline", retmode="text") >> File >> >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >> line 113, in efetch >> return _open(cgi, variables) >> File >> >> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Entrez/__init__.py", >> line 360, in _open >> raise exception >> urllib2.HTTPError: HTTP Error 500: Internal server error >> >> The parameters I'm using are >> db = pubmed >> rettype = medline >> retmode = text >> >> Anyone have an idea why this might be happening now? >> >> Thanks! >> Alexandra >> >> >> From p.j.a.cock at googlemail.com Fri Feb 17 09:56:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 09:56:05 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References:

Message-ID: On Fri, Feb 17, 2012 at 7:31 AM, Alexandra Schnoes wrote: > Thanks, that is somewhat encouraging to hear. I have also emailed NCBI > (that's actually where the multi-repeat subject line came from. My computer > sometimes has weird lags and apparently I hit ctrl-v a couple of times when > copying over the email I sent to NCBI, and didn't notice it. My apologies). > Hopefully one of us will hear something soon! > Alex This is probably due to a recent NCBI change (on Wednesday 15 Feb 2012) with the release of EFetch 2.0, see: http://www.ncbi.nlm.nih.gov/mailman/pipermail/utilities-announce/2012-February/000085.html http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.Release_Notes They have changed things with the default retmode, although looking at their Table 1 from the above link, using rettype="medline", retmode="text" looks OK still with both db="pmc" and db="pubmed" databases. Peter From mictadlo at gmail.com Fri Feb 17 10:13:31 2012 From: mictadlo at gmail.com (Mic) Date: Fri, 17 Feb 2012 20:13:31 +1000 Subject: [Biopython] histogram plot of insert size In-Reply-To: <84771354-DF77-49FE-B1EC-F3EAA1FF21F6@hsr.it> References: <84771354-DF77-49FE-B1EC-F3EAA1FF21F6@hsr.it> Message-ID: Hi Cittaro, Thank you for your solution. I run fixmate from samtools on a BAM file: HWI-ST226_0154:4:1206:12773:170407#CTTGTA 73 A_01a 1046 30 100M * 0 0 AGTATAAAACTAAGCAAACTGTTAGAACTTTGATTACGTTTTGTTTATCAGTGATACGCAAAAGTTTAAGATCCTTGAGTACCTCTTTCGATGGCGGATT fdfffdfffbddaec_dSc^ddd^dQc^`Udddad_`c^^ac`R_NV\\]T^c`T_Mc]aV[V\R`Xa^^EKIIVcccc`YNY]UV[U`BBBBBBBBBBB NM:i:1 MD:Z:73T26 I just wonder which column I have to take to fill isize_array? Thank you in advance. On Fri, Feb 17, 2012 at 7:07 PM, Cittaro Davide wrote: > > On Feb 17, 2012, at 7:39 AM, Mic wrote: > > > Hi all, > > How is it possible to create histogram plot of insert size with > pysam/Biopython? > > As far as you have some plotting library yes.
> Take a look at matplotlib and try this: > > import matplotlib.pyplot as plt > > f = plt.figure() > h = f.add_subplot(111) > h.hist(isize_array, bins=50, density=True) > f.savefig('myhist.pdf', format='pdf') > > assuming isize_array is your array/list of insert sizes > > d > /* > Davide Cittaro, PhD > > Head of Bioinformatics Core > Center for Translational Genomics and Bioinformatics > San Raffaele Scientific Institute > Via Olgettina 58 > 20132 Milano > Italy > > Office: +39 02 26439140 > Mail: cittaro.davide at hsr.it > Skype: daweonline > */ From Matej.Repic at ki.si Fri Feb 17 13:20:13 2012 From: Matej.Repic at ki.si (Matej Repič) Date: Fri, 17 Feb 2012 13:20:13 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Message-ID: <47063F60-DC26-4901-9D12-8503F285F3C9@ki.si> Fortunately, the fix is quite simple: Substitute the id=idlist in your fetch line with id=",".join(idlist). Explanation: This has something to do with how Pubmed accepts a python list. If you enter PMIDs by hand it works ok, but if you feed it a python list you get an internal server error. Instead of using the procedure from the Cookbook, where you feed the list: >>> fetch_handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463) >>> record = Entrez.read(fetch_handle) >>> idlist = record["IdList"] >>> record_handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text") Use this slightly modified line for record_handle. With the ",".join(idlist) you convert the idlist python list to a comma-separated string, which works as expected.
>>> fetch_handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463) >>> record = Entrez.read(fetch_handle) >>> idlist = record["IdList"] >>> record_handle = Entrez.efetch(db="pubmed", id=",".join(idlist), rettype="medline", retmode="text") Kind regards, Matej ---------------------------------------------------------- Matej Repič Junior Researcher Laboratory for Biocomputing and Bioinformatics National Institute of Chemistry Hajdrihova 19 SI-1001 Ljubljana POB 660 Slovenia tel: +386-1-4760457 e-mail: matej.repic at ki.si ---------------------------------------------------------- From Matej.Repic at ki.si Fri Feb 17 13:14:10 2012 From: Matej.Repic at ki.si (Matej Repič) Date: Fri, 17 Feb 2012 13:14:10 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Message-ID: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si> Fortunately, the fix is quite simple: Substitute the id=idlist in your fetch line with id=",".join(idlist). Explanation: This has something to do with how Pubmed accepts a python list. If you enter PMIDs by hand it works ok, but if you feed it a python list you get an internal server error. Instead of using the procedure from the Cookbook, where you feed the list: >>> fetch_handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463) >>> record = Entrez.read(fetch_handle) >>> idlist = record["IdList"] >>> record_handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text") Use this slightly modified line for record_handle. With the ",".join(idlist) you convert the idlist python list to a comma-separated string, which works as expected.
>>> fetch_handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463) >>> record = Entrez.read(fetch_handle) >>> idlist = record["IdList"] >>> record_handle = Entrez.efetch(db="pubmed", id=",".join(idlist), rettype="medline", retmode="text") Kind regards, Matej ---------------------------------------------------------- Matej Repič Junior Researcher Laboratory for Biocomputing and Bioinformatics National Institute of Chemistry Hajdrihova 19 SI-1001 Ljubljana POB 660 Slovenia tel: +386-1-4760457 e-mail: matej.repic at ki.si ---------------------------------------------------------- From p.j.a.cock at googlemail.com Fri Feb 17 13:36:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 13:36:13 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si> References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si> Message-ID: On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: > Fortunately, the fix is quite simple: > > Substitute the id=idlist in your fetch line with id=",".join(idlist). > Hi Matej, Well spotted.
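The difference Matej found comes down to how the `id` parameter ends up in the query string. The following is a minimal sketch using only Python 3's standard-library `urllib.parse.urlencode` (not Biopython's own code, and no request is sent to the NCBI). It builds both forms of query string for a few of the PubMed IDs quoted in this thread. `as_id_string` is a hypothetical helper illustrating a list-to-string conversion of the kind discussed here; it is not the actual `Bio.Entrez` implementation.

```python
from urllib.parse import urlencode

# A few PubMed IDs quoted in this thread.
pmids = ["22307645", "22303114", "22301129"]

# Passing a list with doseq=True repeats the key: id=...&id=...&id=...
repeated = urlencode({"db": "pubmed", "id": pmids}, doseq=True)

# Joining first sends a single id parameter; urlencode escapes each comma as %2C.
joined = urlencode({"db": "pubmed", "id": ",".join(pmids)})


def as_id_string(ids):
    """Hypothetical helper: accept a str or an iterable of IDs, return 'a,b,c'."""
    if isinstance(ids, str):
        return ids  # already comma-separated (or a single ID)
    return ",".join(str(i) for i in ids)


if __name__ == "__main__":
    print(repeated)  # db=pubmed&id=22307645&id=22303114&id=22301129
    print(joined)    # db=pubmed&id=22307645%2C22303114%2C22301129
    print(as_id_string(pmids))
```

The second form matches the `id=...%2C...` style of URL that EFetch documents, so joining the IDs yourself before calling `Entrez.efetch` is a safe choice whichever side ends up doing the conversion.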
The idea is that this call, Entrez.efetch(db="pubmed", id=['22307645', '22303114', '22301129', '22299544', '22298842'], rettype="medline", retmode="text") accesses this URL (where I have removed the email entry) which used to work but isn't following the letter of the specification: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&tool=biopython&db=pubmed&id=22307645&id=22303114&id=22301129&id=22299544&id=22298842&rettype=medline Whereas this call: h = Entrez.efetch(db="pubmed", id="22307645,22303114,22301129,22299544,22298842", rettype="medline", retmode="text") actually uses a different URL (which the NCBI would approve of): http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&tool=biopython&db=pubmed&id=22307645%2C22303114%2C22301129%2C22299544%2C22298842&rettype=medline It is possible the NCBI may opt to "fix" this, but it looks like it was only working in the past by accident. However, we can do the conversion inside the Bio.Entrez.efetch function in future - we're due for another Biopython release now anyway so you shouldn't have to wait too long. I'm having general errors from the Entrez server right now - so I can't confirm the problem or test the potential fix yet. Peter From p.j.a.cock at googlemail.com Fri Feb 17 14:41:26 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 14:41:26 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si> Message-ID: On Fri, Feb 17, 2012 at 1:36 PM, Peter Cock wrote: > On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: >> Fortunately, the fix is quite simple: >> >> Substitute the id=idlist in your fetch line with id=",".join(idlist). >> > > Hi Matej, > > Well spotted. ... > > It is possible the NCBI may opt to "fix" this, but it looks like it was only > working in the past by accident.
However, we can do the conversion inside > the Bio.Entrez.efetch function in future - we're due for another Biopython > release now anyway so you shouldn't have to wait too long. > > I'm having general errors from the Entrez server right now - so I can't confirm > the problem or test the potential fix yet. I guess they kicked the server or something - it is working again now, and I could confirm Matej Repič's findings and test my fix based on them: https://github.com/biopython/biopython/commit/01b091cd4679b58d7e478734324528dd9d52f3ed If anyone needs the fix right now, you must install Biopython from source, or at least update the Bio/Entrez/__init__.py file by hand. Some testing would be appreciated - and then we'll try to expedite the release of Biopython 1.59 by the end of the month. Peter P.S. If anyone wants a small challenge for contributing to Biopython, an online unit test for this and other things in Bio.Entrez would be great. Please ask for more information on the biopython-dev list if you're interested in helping out. From jgrant at smith.edu Fri Feb 17 14:52:21 2012 From: jgrant at smith.edu (Jessica Grant) Date: Fri, 17 Feb 2012 09:52:21 -0500 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si>

Message-ID: <937A6C1B-3C8D-4820-AD11-FA2BA730B047@smith.edu> My problem was fixed by changing the db to "nuccore" and the rettype to "fasta". Up until the other day, I was successfully using "nucleotide" and "gb". Jessica On Feb 17, 2012, at 9:41 AM, Peter Cock wrote: > On Fri, Feb 17, 2012 at 1:36 PM, Peter Cock wrote: >> On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: >>> Fortunately, the fix is quite simple: >>> >>> Substitute the id=idlist in your fetch line with id=",".join(idlist). >>> >> >> Hi Matej, >> >> Well spotted. ... >> >> It is possible the NCBI may opt to "fix" this, but it looks like it was only >> working in the past by accident. However, we can do the conversion inside >> the Bio.Entrez.efetch function in future - we're due for another Biopython >> release now anyway so you shouldn't have to wait too long. >> >> I'm having general errors from the Entrez server right now - so I can't confirm >> the problem or test the potential fix yet. > > I guess they kicked the server or something - it is working again now, and > I could confirm Matej Repič's findings and test my fix based on them: > https://github.com/biopython/biopython/commit/01b091cd4679b58d7e478734324528dd9d52f3ed > > If anyone needs the fix right now, you must install Biopython from source, > or at least update the Bio/Entrez/__init__.py file by hand. Some testing > would be appreciated - and then we'll try to expedite the release of > Biopython 1.59 by the end of the month. > > Peter > > P.S. If anyone wants a small challenge for contributing to Biopython, an > online unit test for this and other things in Bio.Entrez would be great. > Please ask for more information on the biopython-dev list if you're > interested in helping out.
From schnoes at gmail.com Fri Feb 17 17:33:59 2012 From: schnoes at gmail.com (Alexandra Schnoes) Date: Fri, 17 Feb 2012 09:33:59 -0800 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: <937A6C1B-3C8D-4820-AD11-FA2BA730B047@smith.edu> References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si>

<937A6C1B-3C8D-4820-AD11-FA2BA730B047@smith.edu> Message-ID: Wow. That was quick! Thanks guys! Alex On Fri, Feb 17, 2012 at 6:52 AM, Jessica Grant wrote: > My problem was fixed by changing the db to "nuccore" and the rettype to > "fasta". Up until the other day, I was successfully using "nucleotide" and > "gb". > > Jessica > > On Feb 17, 2012, at 9:41 AM, Peter Cock wrote: > > > On Fri, Feb 17, 2012 at 1:36 PM, Peter Cock > wrote: > >> On Fri, Feb 17, 2012 at 1:14 PM, Matej Repič wrote: > >>> Fortunately, the fix is quite simple: > >>> > >>> Substitute the id=idlist in your fetch line with id=",".join(idlist). > >>> > >> > >> Hi Matej, > >> > >> Well spotted. ... > >> > >> It is possible the NCBI may opt to "fix" this, but it looks like it was > only > >> working in the past by accident. However, we can do the conversion > inside > >> the Bio.Entrez.efetch function in future - we're due for another > Biopython > >> release now anyway so you shouldn't have to wait too long. > >> > >> I'm having general errors from the Entrez server right now - so I can't > confirm > >> the problem or test the potential fix yet. > > > > I guess they kicked the server or something - it is working again now, > and > > I could confirm Matej Repič's findings and test my fix based on them: > > > https://github.com/biopython/biopython/commit/01b091cd4679b58d7e478734324528dd9d52f3ed > > > > If anyone needs the fix right now, you must install Biopython from > source, > > or at least update the Bio/Entrez/__init__.py file by hand. Some testing > > would be appreciated - and then we'll try to expedite the release of > > Biopython 1.59 by the end of the month. > > > > Peter > > > > P.S. If anyone wants a small challenge for contributing to Biopython, an > > online unit test for this and other things in Bio.Entrez would be great. > > Please ask for more information on the biopython-dev list if you're > > interested in helping out.
From Matej.Repic at ki.si Fri Feb 17 21:21:24 2012 From: Matej.Repic at ki.si (Matej Repič) Date: Fri, 17 Feb 2012 21:21:24 +0000 Subject: [Biopython] Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 Bio/Entrez/efetch: Getting HTTP Error 500 In-Reply-To: References: <985E73BE-35D3-422D-8CC4-470FF9040C78@ki.si>