From biopython at maubp.freeserve.co.uk Wed Sep 1 12:52:51 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 17:52:51 +0100 Subject: [Biopython] Deprecating Bio.GenBank.LocationParser and Bio.Parsers? Message-ID: Hello all, One of the improvements in Biopython 1.55 was a re-written location parser for Bio.GenBank (which also covers EMBL parsing). This made parsing much faster, and also meant Bio.GenBank.LocationParser and the underlying Bio.Parsers and Bio.Parsers.spark modules were obsolete. I'd like to mark these as deprecated in the next release: * Bio.GenBank.LocationParser * Bio.Parsers (including Bio.Parsers.spark) Would this cause anyone a problem? Thanks, Peter From j.reid at mail.cryst.bbk.ac.uk Fri Sep 3 09:11:28 2010 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Fri, 03 Sep 2010 14:11:28 +0100 Subject: [Biopython] Wrong instance length bug in MEME parser Message-ID: Hi, The MEME parser in biopython 1.55 seems to incorrectly set the length of the first instance of a motif to 0. 
Here is an example: #Sequence, start, length, site Motif: E-value: 0.000010 seq_3, 213, 0, AGGTGACAGAG seq_1, 146, 11, AGGTGACAGAG seq_0, 490, 11, AGGTGACAGAG seq_0, 83, 11, AGGTGACAGAG seq_0, 388, 11, AGGAAACAGAG seq_1, 422, 11, AGGGGACAGAG seq_1, 79, 11, TGGAGACAGAG seq_0, 281, 11, TGGGGACAGAG seq_0, 16, 11, TAGAGACAGAG seq_1, 228, 11, TTGTGACAGAG seq_4, 156, 11, AGGGGACAGGG seq_0, 348, 11, AGGAAAGAGAA seq_0, 374, 11, AGGAATGAGAG seq_5, 22, 11, GGGAAACTGAG seq_3, 486, 11, AAGGGAGTGAG Here's the code that generated the above: from Bio.Motif.Parsers.MEME import MEMEParser import cStringIO meme_output = cStringIO.StringIO(""" ******************************************************************************** MEME - Motif discovery tool ******************************************************************************** MEME version 4.3.0 (Release date: Sat Sep 26 01:51:56 PDT 2009) For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.nbcr.net. This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.nbcr.net. ******************************************************************************** ******************************************************************************** REFERENCE ******************************************************************************** If you use this program in your research, please cite: Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994. 
******************************************************************************** ******************************************************************************** TRAINING SET ******************************************************************************** DATAFILE= /home/john/Data/Tompa-data-set/Real/hm22r.fasta ALPHABET= ACGT Sequence name Weight Length Sequence name Weight Length ------------- ------ ------ ------------- ------ ------ seq_0 1.0000 500 seq_1 1.0000 500 seq_2 1.0000 500 seq_3 1.0000 500 seq_4 1.0000 500 seq_5 1.0000 500 ******************************************************************************** ******************************************************************************** COMMAND LINE SUMMARY ******************************************************************************** This information can also be useful in the event you wish to report a problem with the MEME software. command: meme /home/john/Data/Tompa-data-set/Real/hm22r.fasta -maxsize 1000000 -oc output/run_dataset/Tompa/hm22r/Real -dna -mod anr -revcomp -print_starts -maxiter 1000 -minw 8 -maxw 20 -minsites 2 -nmotifs 1 model: mod= anr nmotifs= 1 evt= inf object function= E-value of product of p-values width: minw= 8 maxw= 20 minic= 0.00 width: wg= 11 ws= 1 endgaps= yes nsites: minsites= 2 maxsites= 30 wnsites= 0.8 theta: prob= 1 spmap= uni spfuzz= 0.5 global: substring= yes branching= no wbranch= no em: prior= dirichlet b= 0.01 maxiter= 1000 distance= 1e-05 data: n= 3000 N= 6 strands: + - sample: seed= 0 seqfrac= 1 Letter frequencies in dataset: A 0.195 C 0.305 G 0.305 T 0.195 Background letter frequencies (from dataset with add-one prior applied): A 0.195 C 0.305 G 0.305 T 0.195 ******************************************************************************** ******************************************************************************** MOTIF 1 width = 11 sites = 15 llr = 159 E-value = 9.8e-006 ******************************************************************************** 
-------------------------------------------------------------------------------- Motif 1 Description -------------------------------------------------------------------------------- Simplified A 71:439:9:91 pos.-specific C ::::::8:::: probability G 18a37:2:a19 matrix T 31:3:1:1::: bits 2.4 2.1 * 1.9 * * * 1.6 * * *** Relative 1.4 * * **** Entropy 1.2 * * * **** (15.3 bits) 0.9 *** ******* 0.7 *********** 0.5 *********** 0.2 *********** 0.0 ----------- Multilevel AGGAGACAGAG consensus T TA G sequence G -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Strand Start P-value Site ------------- ------ ----- --------- ----------- seq_3 - 213 4.54e-07 GGCCTTTGGA AGGTGACAGAG GCGCGGCCAC seq_1 - 146 4.54e-07 CCCAACAGGA AGGTGACAGAG GTGGCTCTGG seq_0 + 490 4.54e-07 AAAACAGCAG AGGTGACAGAG seq_0 - 83 4.54e-07 CCCAGCAGGA AGGTGACAGAG GTGGCTCTGG seq_0 + 388 5.99e-07 ATGAGAGGAG AGGAAACAGAG CTTCCTGGAC seq_1 + 422 1.10e-06 ATGAGAGGGG AGGGGACAGAG GACACCTGAA seq_1 + 79 1.33e-06 TTGGTGGTAC TGGAGACAGAG GGCTGGTCCC seq_0 + 281 3.17e-06 CCTCCCCTGA TGGGGACAGAG GTCTCATCAG seq_0 + 16 5.72e-06 CTGGTGACAC TAGAGACAGAG GGCTGGTCCC seq_1 - 228 1.18e-05 TTATTTTCCT TTGTGACAGAG AAACCCAGCA seq_4 + 156 2.07e-05 TCAAGTCCCA AGGGGACAGGG AGCAGAAGGG seq_0 + 348 2.47e-05 GTAGACAGAA AGGAAAGAGAA AGTAAGGACA seq_0 + 374 3.14e-05 GGACAAAGGT AGGAATGAGAG GAGAGGAAAC seq_5 - 22 4.53e-05 CTCTTGTGTA GGGAAACTGAG CACGGGGAAC seq_3 + 486 5.02e-05 CGCCAATGGG AAGGGAGTGAG TGCC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM 
------------- ---------------- ------------- seq_3 5e-05 212_[-1]_262_[+1]_4 seq_1 1.2e-05 78_[+1]_56_[-1]_71_[-1]_183_[+1]_68 seq_0 3.2e-06 15_[+1]_56_[-1]_187_[+1]_56_[+1]_ 15_[+1]_3_[+1]_91_[+1] seq_4 2.1e-05 155_[+1]_334 seq_5 4.5e-05 21_[-1]_468 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 1 width=11 seqs=15 seq_3 ( 213) AGGTGACAGAG 1 seq_1 ( 146) AGGTGACAGAG 1 seq_0 ( 490) AGGTGACAGAG 1 seq_0 ( 83) AGGTGACAGAG 1 seq_0 ( 388) AGGAAACAGAG 1 seq_1 ( 422) AGGGGACAGAG 1 seq_1 ( 79) TGGAGACAGAG 1 seq_0 ( 281) TGGGGACAGAG 1 seq_0 ( 16) TAGAGACAGAG 1 seq_1 ( 228) TTGTGACAGAG 1 seq_4 ( 156) AGGGGACAGGG 1 seq_0 ( 348) AGGAAAGAGAA 1 seq_0 ( 374) AGGAATGAGAG 1 seq_5 ( 22) GGGAAACTGAG 1 seq_3 ( 486) AAGGGAGTGAG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 2940 bayes= 6.7534 E= 9.8e-006 177 -1055 -219 45 -55 -1055 139 -155 -1055 -1055 171 -1055 103 -1055 -19 77 45 -1055 127 -1055 226 -1055 -1055 -155 -1055 139 -61 -1055 215 -1055 -1055 -55 -1055 -1055 171 -1055 226 -1055 -219 -1055 -155 -1055 161 -1055 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 nsites= 15 E= 9.8e-006 0.666667 0.000000 0.066667 0.266667 0.133333 0.000000 0.800000 0.066667 0.000000 0.000000 1.000000 0.000000 0.400000 
0.000000 0.266667 0.333333 0.266667 0.000000 0.733333 0.000000 0.933333 0.000000 0.000000 0.066667 0.000000 0.800000 0.200000 0.000000 0.866667 0.000000 0.000000 0.133333 0.000000 0.000000 1.000000 0.000000 0.933333 0.000000 0.066667 0.000000 0.066667 0.000000 0.933333 0.000000 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 regular expression -------------------------------------------------------------------------------- [AT]GG[ATG][GA]A[CG]AGAG -------------------------------------------------------------------------------- Time 3.78 secs. ******************************************************************************** ******************************************************************************** SUMMARY OF MOTIFS ******************************************************************************** -------------------------------------------------------------------------------- Combined block diagrams: non-overlapping sites with p-value < 0.0001 -------------------------------------------------------------------------------- SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- seq_0 4.45e-04 15_[+1(5.72e-06)]_56_[-1(4.54e-07)]_187_[+1(3.17e-06)]_56_[+1(2.47e-05)]_15_[+1(3.14e-05)]_3_[+1(5.99e-07)]_91_[+1(4.54e-07)] seq_1 4.45e-04 78_[+1(1.33e-06)]_56_[-1(4.54e-07)]_71_[-1(1.18e-05)]_183_[+1(1.10e-06)]_68 seq_2 2.03e-01 500 seq_3 4.45e-04 212_[-1(4.54e-07)]_262_[+1(5.02e-05)]_4 seq_4 2.01e-02 155_[+1(2.07e-05)]_334 seq_5 4.34e-02 21_[-1(4.53e-05)]_468 -------------------------------------------------------------------------------- ******************************************************************************** ******************************************************************************** Stopped because nmotifs = 1 reached. 
******************************************************************************** CPU: john-dell ******************************************************************************** """) parser = MEMEParser() parsed = parser.parse(meme_output) print '#Sequence, start, length, site' for motif in parsed.motifs: print 'Motif: E-value: %f' % motif.evalue for instance in motif.instances: print "%10s, %5d, %5d, %s" % ( instance.sequence_name, instance.start, instance.length, str(instance), ) #assert instance.length == motif.length From biopython at maubp.freeserve.co.uk Fri Sep 3 09:44:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 3 Sep 2010 14:44:27 +0100 Subject: [Biopython] Wrong instance length bug in MEME parser In-Reply-To: References: Message-ID: On Fri, Sep 3, 2010 at 2:11 PM, John Reid wrote: > Hi, > > The MEME parser in biopython 1.55 seems to incorrectly set the length of the > first instance of a motif to 0. Here is an example: > ... Could you file a bug with all that useful information? http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython Thanks, Peter From bartek at rezolwenta.eu.org Fri Sep 3 10:52:32 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Fri, 3 Sep 2010 16:52:32 +0200 Subject: [Biopython] Wrong instance length bug in MEME parser In-Reply-To: References: Message-ID: On Fri, Sep 3, 2010 at 3:11 PM, John Reid wrote: > Hi, > > The MEME parser in biopython 1.55 seems to incorrectly set the length of > the first instance of a motif to 0. Here is an example: > > /cut/ Hi, Thanks for reporting the bug. It is fixed now in the main branch (small change, you can see the diff here : http://github.com/biopython/biopython/commit/102ad30a8c5d8bd87847000b33f771b40143e743 I'm closing the bug now, if you find anything else, please let us know. 
thanks
Bartek

From mitlox at op.pl Mon Sep 6 09:24:14 2010
From: mitlox at op.pl (xyz)
Date: Mon, 06 Sep 2010 23:24:14 +1000
Subject: [Biopython] reading two fastq files at the same time
Message-ID: <4C84EB7E.90200@op.pl>

Hi,
How is it possible to read two fastq files at the same time in BioPython? I have the following BioRuby example:

require 'bio'

begin
  fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
  fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])

  while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)

    fastq_A1 = entry1.entry_id
    fastq_A2 = entry1.seq

    fastq_B1 = entry2.entry_id
    fastq_B2 = entry2.seq
  end

rescue => err
  raise "Exception: #{err}"
end

Thank you in advance.

From biopython at maubp.freeserve.co.uk Mon Sep 6 09:51:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Sep 2010 14:51:13 +0100
Subject: [Biopython] reading two fastq files at the same time
In-Reply-To: <4C84EB7E.90200@op.pl>
References: <4C84EB7E.90200@op.pl>
Message-ID:

On Mon, Sep 6, 2010 at 2:24 PM, xyz wrote:
> Hi,
> How is it possible to read two fastq files at the same time in BioPython? I
> have the following BioRuby example:
>
> require 'bio'
>
> begin
>   fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
>   fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])
>
>   while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)
>
>     fastq_A1 = entry1.entry_id
>     fastq_A2 = entry1.seq
>
>     fastq_B1 = entry2.entry_id
>     fastq_B2 = entry2.seq
>   end
>
> rescue => err
>   raise "Exception: #{err}"
> end
>
> Thank you in advance.

Hi,

If you are using Python 2.6+ then probably itertools.izip_longest would do what you want. You could use itertools.izip but this won't catch the error condition when one file has more records than the other.
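A minimal sketch of the izip_longest idea, assuming the two inputs are ordinary Python iterators. The helper name `paired` and the sentinel `_MISSING` are made up for illustration, and plain lists stand in for the FASTQ iterators so the snippet runs without Biopython. Note that izip_longest is called zip_longest in Python 3.

```python
try:
    from itertools import zip_longest  # Python 3 name
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2.6+ name

# Hypothetical sentinel: marks "this iterator ran out" without
# being confused with a legitimate None value in the data.
_MISSING = object()


def paired(iter1, iter2):
    """Yield (rec1, rec2) pairs; raise ValueError if the lengths differ."""
    for rec1, rec2 in zip_longest(iter1, iter2, fillvalue=_MISSING):
        if rec1 is _MISSING or rec2 is _MISSING:
            raise ValueError("Different record counts")
        yield rec1, rec2


# Plain lists stand in for the two FASTQ record iterators here.
pairs = list(paired(["read1/1", "read2/1"], ["read1/2", "read2/2"]))
print(pairs)
```

With Biopython installed, the two arguments would presumably be SeqIO.parse(filename1, "fastq") and SeqIO.parse(filename2, "fastq"); that combination is untested here.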
Alternatively you could use something like this,

from Bio import SeqIO
iter1 = SeqIO.parse(filename1, "fastq")
iter2 = SeqIO.parse(filename2, "fastq")
while True:
    try:
        rec1 = iter1.next()
    except StopIteration:
        rec1 = None
    try:
        rec2 = iter2.next()
    except StopIteration:
        rec2 = None
    if rec1 is None and rec2 is None:
        break #end of both files
    elif rec1 is None or rec2 is None:
        raise ValueError("Diff record count")
    else:
        print rec1.seq, rec1.id
        print rec2.seq, rec2.id

I haven't tested that but it is based on a similar example in Bio.SeqIO.QualityIO.PairedFastaQualIterator for a paired FASTA and QUAL file.

Peter

From biopython at maubp.freeserve.co.uk Thu Sep 9 13:13:34 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Sep 2010 18:13:34 +0100
Subject: [Biopython] Deprecating Bio.GenBank.LocationParser and Bio.Parsers?
In-Reply-To:
References:
Message-ID:

On Wed, Sep 1, 2010 at 5:52 PM, Peter wrote:
> Hello all,
>
> One of the improvements in Biopython 1.55 was a re-written location
> parser for Bio.GenBank (which also covers EMBL parsing). This made
> parsing much faster, and also meant Bio.GenBank.LocationParser
> and the underlying Bio.Parsers and Bio.Parsers.spark modules were
> obsolete. I'd like to mark these as deprecated in the next release:
>
> * Bio.GenBank.LocationParser
> * Bio.Parsers (including Bio.Parsers.spark)
>
> Would this cause anyone a problem?
>
> Thanks,

I've just added the deprecation warnings to the code, ready for Biopython 1.56 - it is not too late to undo this if anyone is using this code, but you need to tell us.

Peter

From margeemail at gmail.com Fri Sep 10 00:10:23 2010
From: margeemail at gmail.com (mailing list)
Date: Fri, 10 Sep 2010 00:10:23 -0400
Subject: [Biopython] Added Biopython to my web tool
Message-ID:

I made a web application (http://utilitymill.com) that lets people make online utilities with Python. I thought you guys might appreciate that I added Biopython as a built-in library users can use in their utilities.
Here's an example of a utility using Biopython: http://utilitymill.com/utility/RNA_Transcription (It's very simple, I just wanted to try it out.) I'm curious to know if it's useful to you guys. And I'm also hoping I installed everything correctly, so let me know if anything doesn't work. -Greg From mjldehoon at yahoo.com Sat Sep 11 01:29:32 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 10 Sep 2010 22:29:32 -0700 (PDT) Subject: [Biopython] Parsing XML returned by efetch from the Journals database Message-ID: <825067.71139.qm@web62404.mail.re1.yahoo.com> Dear users, The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities as long as the corresponding DTD is available (which are included with each release of Biopython). One corner case is efetch results from the Journals database. Officially, efetch from the Journals database does not generate output in the XML format, but only plain text or HTML. However, when requesting XML explicitly from Entrez, in practice it does return an XML-like output. Our parser in Bio.Entrez is able to parse this XML, but it requires several hacks in the parser code. To make the parser more stable for other XML documents, I'd like to remove these hacks. Currently is anybody using Bio.Entrez to parse XML returned by efetch from the Journals database? --Michiel. From natassa_g_2000 at yahoo.com Mon Sep 13 12:22:26 2010 From: natassa_g_2000 at yahoo.com (natassa) Date: Mon, 13 Sep 2010 09:22:26 -0700 (PDT) Subject: [Biopython] Codeml parser in Biopython? Message-ID: <533513.93597.qm@web52005.mail.re2.yahoo.com> Hello, I was wondering if there is a Biopython solution to parsing codeml results from paml. the output files are pretty standard, so such a parser should be quite straightforward to write up. I d volunteer for this, but thought I might check first if somebody else has done this. 
Actually, I found a read-only pypaml interface on Google Code, tried it out and realized I had to edit several things to even import it (in python 2.5), which is quite strange: it was mainly built-in methods that threw errors. Anyway, I 'corrected' this and then realized that the output files assumed by this code may not be the same as mine, although again, the outputs of codeml are pretty standard. I am not sure how much this code is used and was not sure what the developer's email is to ask him some questions.

I am interested in parsing outputs from Branch, Site and BranchSite models, so everything that codeml can do. Any information by experienced users is welcome!
Thanks,
Anastasia Gioti

From biopython at maubp.freeserve.co.uk Mon Sep 13 12:45:28 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 13 Sep 2010 17:45:28 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <533513.93597.qm@web52005.mail.re2.yahoo.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
Message-ID:

On Mon, Sep 13, 2010 at 5:22 PM, natassa wrote:
> Hello,
> I was wondering if there is a Biopython solution to parsing codeml results from
> paml. The output files are pretty standard, so such a parser should be quite
> straightforward to write up. I'd volunteer for this, but thought I might check
> first if somebody else has done this. Actually, I found a read-only pypaml
> interface on Google Code, tried it out and realized I had to edit several
> things to even import it (in python 2.5), which is quite strange: it was mainly
> built-in methods that threw errors. Anyway, I 'corrected' this and then
> realized that the output files assumed by this code may not be the same as mine,
> although again, the outputs of codeml are pretty standard. I am not sure how
> much this code is used and was not sure what the developer's email is to ask
> him some questions.
>
> I am interested in parsing outputs from Branch, Site and BranchSite models, so
> everything that codeml can do. Any information by experienced users is welcome!
> Thanks,
> Anastasia Gioti

Hi Anastasia,

Could you post a short example of the kind of output you are looking at?

Can you get codeml to output what you need in another format, such as NEXUS?

Peter

From p.j.a.cock at googlemail.com Mon Sep 13 16:40:30 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 13 Sep 2010 21:40:30 +0100
Subject: [Biopython] Fwd: problems searching swiss prot
In-Reply-To:
References:
Message-ID:

Forwarding a query from Jessica Grant since she appears to have had trouble posting to the mailing list.

Jessica wrote:

> Hello,
>
> I am running a few scripts to try to extract sequence information
> out of uniprot.  One program called AutoFACT gives me ID numbers
> associated with that database.  Most of these look like this:
>
> D2V5S4_NAEGR
> Q48KU2_PSE14
> Q22B72_TETTH
>
>
> and my downstream scripts, which are written in biopython, are
> fine with this.  Then, every once in a while, a sequence will come
> back with a name that looks like this:
>
> UPI00006CC162
>
> and everything goes bad.  My script can't handle these names,
> apparently, although if I go to uniprot.org and search for it, the
> sequence comes up.
>
> My script uses the following, where RepID is the number
> extracted from AutoFACT:
>
>        handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>        seq_record = SeqIO.read(handle, "swiss")
>
> Any thoughts?
>
> Thank you,
>
> Jessica

Hi Jessica,

I think the problem is that these unusual identifiers are not UniProt/SwissProt accession identifiers. The URL this Biopython function uses was originally from www.expasy.ch but is now on www.uniprot.org as described here:

http://www.expasy.ch/expasy_urls.html

I think the ID UPI00006CC162 is a UniProt ID of some kind, so it may be possible to access the information you want somehow.
See for example: http://www.uniprot.org/uniparc/UPI00006CC162 However, it is not clear to me right away if you can get this record back as a plain text "swiss" format entry... Peter From natassa_g_2000 at yahoo.com Tue Sep 14 04:02:18 2010 From: natassa_g_2000 at yahoo.com (natassa) Date: Tue, 14 Sep 2010 01:02:18 -0700 (PDT) Subject: [Biopython] Codeml parser in Biopython? In-Reply-To: References: <533513.93597.qm@web52005.mail.re2.yahoo.com> Message-ID: <937371.74873.qm@web52006.mail.re2.yahoo.com> Hi Peter, Could you post a short example of the kind of output you are looking at? Here is an example output, but this caan differ depending on the model used (there are several models for Branch, Site, BranchSite, but all are pretty standard) -------------------------------------------------------------------------------OUTPUT------------------------- seed used = 808671289 CODONML (in paml version 4.4, January 2010) align.phy Model: One dN/dS ratio for branches Codon frequency model: F3x4 Site-class models: NearlyNeutral ns = 7 ls = 861 Codon usage in sequences -------------------------------------------------------------------------------------------------------------- Phe TTT 12 14 15 14 14 12 | Ser TCT 6 11 12 8 10 6 | Tyr TAT 5 5 4 7 9 5 | Cys TGT 11 8 10 9 11 8 TTC 23 18 18 20 20 20 | TCC 16 13 16 19 16 18 | TAC 11 12 13 17 11 13 | TGC 6 2 6 6 4 6 Leu TTA 8 5 6 5 4 2 | TCA 17 16 18 20 21 15 | *** TAA 0 0 0 0 0 0 | *** TGA 0 0 0 0 0 0 TTG 13 11 11 15 15 17 | TCG 17 14 14 17 17 18 | TAG 0 0 0 0 0 0 | Trp TGG 9 8 8 11 8 7 -------------------------------------------------------------------------------------------------------------- Leu CTT 13 15 16 11 12 16 | Pro CCT 7 7 10 6 10 8 | His CAT 8 7 8 4 6 5 | Arg CGT 6 4 5 4 5 5 CTC 14 14 13 19 14 15 | CCC 20 13 16 24 19 20 | CAC 23 18 22 20 24 17 | CGC 14 13 15 14 14 15 CTA 6 4 8 7 6 9 | CCA 19 18 19 11 17 16 | Gln CAA 20 16 20 21 18 13 | CGA 8 4 6 5 6 6 CTG 17 17 14 14 16 10 | CCG 7 8 8 9 6 8 | CAG 18 14 15 14 14 13 | 
CGG 7 7 8 9 9 8 -------------------------------------------------------------------------------------------------------------- Ile ATT 6 7 9 5 7 6 | Thr ACT 5 7 7 7 5 4 | Asn AAT 3 3 4 2 5 2 | Ser AGT 7 7 9 8 7 7 ATC 16 13 15 23 14 16 | ACC 21 14 17 20 20 16 | AAC 12 14 14 21 14 11 | AGC 14 13 14 15 11 10 ATA 13 9 10 11 11 10 | ACA 19 17 22 22 28 18 | Lys AAA 17 8 13 9 13 12 | Arg AGA 11 5 8 4 6 5 Met ATG 23 21 23 22 23 20 | ACG 11 12 12 12 14 13 | AAG 18 15 19 19 18 18 | AGG 9 10 13 14 12 13 -------------------------------------------------------------------------------------------------------------- Val GTT 8 13 10 10 10 6 | Ala GCT 13 10 12 12 14 13 | Asp GAT 18 18 17 15 15 17 | Gly GGT 13 7 12 10 11 10 GTC 18 13 18 20 19 21 | GCC 28 26 28 28 28 23 | GAC 29 21 26 33 29 30 | GGC 9 9 8 7 12 8 GTA 8 8 9 7 6 7 | GCA 22 22 24 17 23 19 | Glu GAA 27 24 24 27 21 22 | GGA 7 7 10 9 7 9 GTG 13 11 14 13 13 9 | GCG 11 10 10 10 7 7 | GAG 14 14 17 13 19 17 | GGG 7 6 9 8 7 9 -------------------------------------------------------------------------------------------------------------- -------------------------------------------------- Phe TTT 12 | Ser TCT 8 | Tyr TAT 6 | Cys TGT 8 TTC 22 | TCC 18 | TAC 15 | TGC 6 Leu TTA 5 | TCA 22 | *** TAA 0 | *** TGA 0 TTG 17 | TCG 17 | TAG 0 | Trp TGG 9 -------------------------------------------------- Leu CTT 14 | Pro CCT 12 | His CAT 5 | Arg CGT 6 CTC 19 | CCC 20 | CAC 20 | CGC 13 CTA 10 | CCA 16 | Gln CAA 17 | CGA 5 CTG 8 | CCG 11 | CAG 15 | CGG 8 -------------------------------------------------- Ile ATT 5 | Thr ACT 4 | Asn AAT 4 | Ser AGT 7 ATC 20 | ACC 21 | AAC 14 | AGC 12 ATA 11 | ACA 29 | Lys AAA 11 | Arg AGA 4 Met ATG 25 | ACG 15 | AAG 23 | AGG 13 -------------------------------------------------- Val GTT 10 | Ala GCT 13 | Asp GAT 16 | Gly GGT 7 GTC 18 | GCC 26 | GAC 33 | GGC 11 GTA 7 | GCA 24 | Glu GAA 23 | GGA 11 GTG 10 | GCG 8 | GAG 15 | GGG 11 -------------------------------------------------- Codon position x base (3x4) table 
for each sequence. #1: species1 position 1: T:0.18989 C:0.25524 A:0.25277 G:0.30210 position 2: T:0.26017 C:0.29470 A:0.27497 G:0.17016 position 3: T:0.17386 C:0.33785 A:0.24908 G:0.23921 Average T:0.20797 C:0.29593 A:0.25894 G:0.23716 #2: species2 position 1: T:0.19296 C:0.25211 A:0.24648 G:0.30845 position 2: T:0.27183 C:0.30704 A:0.26620 G:0.15493 position 3: T:0.20141 C:0.31831 A:0.22958 G:0.25070 Average T:0.22207 C:0.29249 A:0.24742 G:0.23803 #3: species3 position 1: T:0.18619 C:0.25031 A:0.25771 G:0.30580 position 2: T:0.25771 C:0.30210 A:0.26634 G:0.17386 position 3: T:0.19729 C:0.31936 A:0.24291 G:0.24044 Average T:0.21373 C:0.29059 A:0.25565 G:0.24003 #4: species4 position 1: T:0.20664 C:0.23616 A:0.26322 G:0.29397 position 2: T:0.26568 C:0.29766 A:0.27306 G:0.16359 position 3: T:0.16236 C:0.37638 A:0.21525 G:0.24600 Average T:0.21156 C:0.30340 A:0.25051 G:0.23452 #5: species5 position 1: T:0.19876 C:0.24348 A:0.25839 G:0.29938 position 2: T:0.25342 C:0.31677 A:0.26832 G:0.16149 position 3: T:0.18758 C:0.33416 A:0.23230 G:0.24596 Average T:0.21325 C:0.29814 A:0.25300 G:0.23561 #6: species6 position 1: T:0.19892 C:0.24899 A:0.24493 G:0.30717 position 2: T:0.26522 C:0.30041 A:0.26387 G:0.17050 position 3: T:0.17591 C:0.35047 A:0.22057 G:0.25304 Average T:0.21335 C:0.29995 A:0.24312 G:0.24357 #7: species7 position 1: T:0.20000 C:0.24121 A:0.26424 G:0.29455 position 2: T:0.25818 C:0.32000 A:0.26303 G:0.15879 position 3: T:0.16606 C:0.34909 A:0.23636 G:0.24848 Average T:0.20808 C:0.30343 A:0.25455 G:0.23394 Sums of codon usage counts ------------------------------------------------------------------------------ Phe F TTT 93 | Ser S TCT 61 | Tyr Y TAT 41 | Cys C TGT 65 TTC 141 | TCC 116 | TAC 92 | TGC 36 Leu L TTA 35 | TCA 129 | *** * TAA 0 | *** * TGA 0 TTG 99 | TCG 114 | TAG 0 | Trp W TGG 60 ------------------------------------------------------------------------------ Leu L CTT 97 | Pro P CCT 60 | His H CAT 43 | Arg R CGT 35 CTC 108 | CCC 132 | CAC 144 | CGC 
98 CTA 50 | CCA 116 | Gln Q CAA 125 | CGA 40 CTG 96 | CCG 57 | CAG 103 | CGG 56 ------------------------------------------------------------------------------ Ile I ATT 45 | Thr T ACT 39 | Asn N AAT 23 | Ser S AGT 52 ATC 117 | ACC 129 | AAC 100 | AGC 89 ATA 75 | ACA 155 | Lys K AAA 83 | Arg R AGA 43 Met M ATG 157 | ACG 89 | AAG 130 | AGG 84 ------------------------------------------------------------------------------ Val V GTT 67 | Ala A GCT 87 | Asp D GAT 116 | Gly G GGT 70 GTC 127 | GCC 187 | GAC 201 | GGC 64 GTA 52 | GCA 151 | Glu E GAA 168 | GGA 60 GTG 83 | GCG 63 | GAG 109 | GGG 57 ------------------------------------------------------------------------------ (Ambiguity data are not used in the counts.) Codon position x base (3x4) table, overall position 1: T:0.19623 C:0.24664 A:0.25571 G:0.30141 position 2: T:0.26152 C:0.30559 A:0.26804 G:0.16485 position 3: T:0.18027 C:0.34113 A:0.23250 G:0.24610 Average T:0.21267 C:0.29779 A:0.25209 G:0.23746 Nei & Gojobori 1986. dN/dS (dN, dS) (Pairwise deletion) (Note: This matrix is not used in later ML. analysis. Use runmode = -2 for ML pairwise comparison.) 
species1 species2 0.2598 (0.0599 0.2306) species3 0.2532 (0.0528 0.2085) 0.2778 (0.0189 0.0680) species4 0.2815 (0.1116 0.3966) 0.1905 (0.0738 0.3873) 0.2555 (0.0981 0.3838) species5 0.2780 (0.0654 0.2351) 0.2611 (0.0631 0.2419) 0.2487 (0.0552 0.2221) 0.2993 (0.0908 0.3034) species6 0.2041 (0.0693 0.3396) 0.1785 (0.0613 0.3437) 0.2147 (0.0644 0.2997) 0.2510 (0.0598 0.2384) 0.2261 (0.0511 0.2260) species7 0.2374 (0.0890 0.3748) 0.2080 (0.0819 0.3935) 0.2272 (0.0787 0.3465) 0.2415 (0.0676 0.2797) 0.2646 (0.0731 0.2764) 0.1821 (0.0176 0.0967) TREE # 1: (((1, (2, 3)), 5), (6, 4), 7); MP score: -1 lnL(ntime: 11 np: 14): -7469.732728 +0.000000 8..9 9..10 10..1 10..11 11..2 11..3 9..5 8..12 12..6 12..4 8..7 0.179837 0.082919 0.172587 0.087525 0.067422 0.032013 0.124010 0.001030 0.062291 0.297695 0.117429 2.800021 0.731929 0.083728 Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site). tree length = 1.22476 (((1: 0.172587, (2: 0.067422, 3: 0.032013): 0.087525): 0.082919, 5: 0.124010): 0.179837, (6: 0.062291, 4: 0.297695): 0.001030, 7: 0.117429); (((species1: 0.172587, (species2: 0.067422, species3: 0.032013): 0.087525): 0.082919, species5: 0.124010): 0.179837, (species6: 0.062291, species4: 0.297695): 0.001030, species7: 0.117429); Detailed output identifying parameters kappa (ts/tv) = 2.80002 dN/dS (w) for site classes (K=2) p: 0.73193 0.26807 w: 0.08373 1.00000 dN & dS for each branch branch t N S dN/dS dN dS N*dN S*dS 8..9 0.180 1857.3 725.7 0.3294 0.0381 0.1158 70.8 84.0 9..10 0.083 1857.3 725.7 0.3294 0.0176 0.0534 32.7 38.7 10..1 0.173 1857.3 725.7 0.3294 0.0366 0.1111 68.0 80.6 10..11 0.088 1857.3 725.7 0.3294 0.0186 0.0563 34.5 40.9 11..2 0.067 1857.3 725.7 0.3294 0.0143 0.0434 26.6 31.5 11..3 0.032 1857.3 725.7 0.3294 0.0068 0.0206 12.6 15.0 9..5 0.124 1857.3 725.7 0.3294 0.0263 0.0798 48.8 57.9 8..12 0.001 1857.3 725.7 0.3294 0.0002 0.0007 0.4 0.5 12..6 0.062 1857.3 725.7 0.3294 0.0132 0.0401 24.5 29.1 12..4 
0.298 1857.3 725.7 0.3294 0.0631 0.1917 117.2 139.1
8..7 0.117 1857.3 725.7 0.3294 0.0249 0.0756 46.2 54.9

Time used: 0:10
--------------------------------------------------------------------------------------------------------------

Can you get codeml to output what you need in another format, such as NEXUS?

Haven't tried that, but as you can see, this is a very verbose output and NEXUS does not seem an option.
Ultimately, I want to parse this to get all the information I need in a tabulated file. I am still working out what exactly I need (there are standard values to get out, such as lnL, branch length, dN/dS, but it also depends on the type of downstream analysis). I will now work on the pypaml class and modify the original code to make it more generic (it seems that it only works for Site Models).
Will let you know, was just wondering if there was already a solution. There is one in Bioperl, but I heard it is very slow and in any case, I don't understand much of Perl....

Thanks,
Anastasia

From biopython at maubp.freeserve.co.uk Tue Sep 14 05:04:56 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Sep 2010 10:04:56 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <937371.74873.qm@web52006.mail.re2.yahoo.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com> <937371.74873.qm@web52006.mail.re2.yahoo.com>
Message-ID:

Hi Anastasia,

On Tue, Sep 14, 2010 at 9:02 AM, natassa wrote:
> Hi Peter,
>
>>
>> Could you post a short example of the kind of output you are looking at?
>>
>
> Here is an example output, but this can differ depending on the model used
> (there are several models for Branch, Site, BranchSite, but all are pretty
> standard)
>

Thanks - that looks possible to parse, but not very easy (especially if the codeml output changes slightly between versions).

>>
>> Can you get codeml to output what you need in another format, such as NEXUS?
>> > > Haven't tried that, but as you can see, this is a very verbose output and > NEXUS does not seem an option. At first glance, the NEXUS format could hold a lot of that information. Another possibility might be phyloXML. However, you are at the mercy of the codeml tool and what it supports. I might be worth politely asking the author(s) about supporting one of these more standard formats as a optional output. > Ultimately, I want to parse this to get all the information I need in a > tabulated file. I am still working out what exactly I need (there are standard > values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type > of downstram analysis). I will now work on the pypaml class and modify the > original code to make it more generic (it seems that it only works for Site > Models). Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so unless he agrees to re-license it we cannot include it in Biopython. > Will let you know, was just wondering if there was already a solution.There is > one in Bioperl, but heard it is very slow and in any case, I don't understand > much of perl.... I don't know much Perl either ;) Peter From p.j.a.cock at googlemail.com Tue Sep 14 05:13:04 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 14 Sep 2010 10:13:04 +0100 Subject: [Biopython] problems searching swiss prot In-Reply-To: References: Message-ID: On Mon, Sep 13, 2010 at 9:40 PM, Peter Cock wrote: > Forwarding a query from Jessica Grant since she appears > to have had trouble posting to the mailing list. > > Jessica wrote: > >> Hello, >> >> I am running a few scripts to try to extract sequence information >> out of uniprot. ?One program called AutoFACT gives me ID numbers >> associated with that database. ?Most of these look like this: >> >> D2V5S4_NAEGR >> Q48KU2_PSE14 >> Q22B72_TETTH >> >> >> and my downstream scripts, which are written in biopython, are >> fine with this. 
>> Then, every once in a while, a sequence will come
>> back with a name that looks like this:
>>
>> UPI00006CC162
>>
>> and everything goes bad.  My script can't handle these names,
>> apparently, although if I go to uniprot.org and search for it, the
>> sequence comes up.
>>
>> My script uses the following, where RepID is the number
>> extracted from AutoFACT:
>>
>>        handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>>        seq_record = SeqIO.read(handle, "swiss")
>>
>> Any thoughts?
>>
>> Thank you,
>>
>> Jessica
>
> Hi Jessica,
>
> I think the problem is that these unusual identifiers are
> not UniProt/SwissProt accession identifiers. The URL
> this Biopython function uses was originally from
> www.expasy.ch but is now on www.uniprot.org as
> described here:
>
> http://www.expasy.ch/expasy_urls.html
>
> I think the ID UPI00006CC162 is a UniProt ID of some
> kind, so it may be possible to access the information
> you want somehow. See for example:
>
> http://www.uniprot.org/uniparc/UPI00006CC162
>
> However, it is not clear to me right away if you can get
> this record back as a plain text "swiss" format entry...
>
> Peter

Jessica replied (off list), to say:

>> Oh, and I got a great help from someone at Uniprot for my
>> previous question...turns out you can get the sequences
>> downloaded as fasta files:
>>
>> http://www.uniprot.org/uniparc/UPI00006CC162.fasta
>>
>> and I could then read them into SeqIO as a fasta and
>> manipulate them that way.

I guess the UPI at the start stands for Uni Parc Identifier.
Note that the page I linked to earlier has links to several file formats including FASTA, but not plain text "SwissProt" format: http://www.uniprot.org/uniparc/UPI00006CC162 Peter From p.j.a.cock at googlemail.com Tue Sep 14 05:49:56 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 14 Sep 2010 10:49:56 +0100 Subject: [Biopython] unusual genetic code In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 3:47 AM, Jessica Grant wrote: >On Mon, Sep 13, 2010 at 9:49 PM, Peter Cock wrote: >> On Mon, Sep 13, 2010 at 7:43 PM, Jessica Grant wrote: >>> >>> Hello, >>> >>> I am working with an organism that has an unusual genetic code. Is there >>> a way I can use biopython's translate() but import my own codon table >>> instead of using the standard ncbi tables that are coded in the CodonTable >>> module? >>> >>> Thanks! >>> >>> Jessica >> >> Hi Jessica, >> >> Good question - this is something I had thought about but not done anything >> since no one had ever asked about using a non-standard table. After all, the >> NCBI do have a pretty comprehensive list. I'm curious which organism(s) you >> are using. >> >> In answer to your query, right now, not easily. However, it would be simple to >> tweak the Bio.Seq module to allow the table argument to be a string or integer >> as now (for referring to a built in NCBI table) or a CodonTable object which you >> would have to supply. These are defined in the Bio.Data.CodonTable module. >> If this sounds useful and you could help with testing, it could be done ready >> for the next release of Biopython. >> >> Peter > > Thanks Peter, > > We are doing some work on a ciliate called Chilodonella uncinata. ?It > apparently has only one stop codon and the others are recoded in an > unusual way so it doesn't quite fit any of the ncbi tables. > > I did try to play around with the CodonTable module, but couldnt' quite > figure out how to do it. 
?Just making a new table similar to the tables that > are in the module didn't do it, and I didn't feel comfortable messing around > in the depths of biopython. ?:) > > I would be happy to help with testing and I guess in the meantime I will be > putting lots of if statements in my script. > > Jessica Hi Jessica, Do you have the information for the CodonTable handy? e.g. a list of the start codons, and how to translate the 64 codons (including stop codons). Given that I could show you how to make the CodonTable object. Peter From p.j.a.cock at googlemail.com Tue Sep 14 06:39:34 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 14 Sep 2010 11:39:34 +0100 Subject: [Biopython] unusual genetic code In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 10:49 AM, Peter Cock wrote: > > Hi Jessica, > > Do you have the information for the CodonTable handy? e.g. a list of the > start codons, and how to translate the 64 codons (including stop codons). > Given that I could show you how to make the CodonTable object. > > Peter > I've done a proof of principle change to Bio.Seq on this branch: http://github.com/peterjc/biopython/tree/trans-table specifically this commit: http://github.com/peterjc/biopython/commit/56a2fd5f92098e9be892eb51f27b08aaa46a19a6 I'm not expecting you to try this code out yet (unless you happen to know your way round git already). The basic idea is that the Bio.Seq translate function and the Seq object translate method are extended so that the table argument can now also be a CodonTable object. Once we know what your table should look like, I can write a complete example. Probably Bio.Data.CodonTable will need some more documentation added... Peter From zaricdragoslav at gmail.com Tue Sep 14 07:08:55 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 14 Sep 2010 15:08:55 +0400 Subject: [Biopython] Intro Message-ID: Dear All, I am new member and I like to send welcome greet to everyone. 
I have few newbie questions so please be cooperative :) 1. How can I see biopython version and is there connection between python version and biopython version ? 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04 (at least I think I did :) I have same installation on windows machine and everything works fine. But for example when I want to use something like this: from Bio import SeqIO orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta") Two problems happens in ubuntu environment: first is that SeqIO complains that there is no index method second is that everywhere I should put string location of file biopython wants handle to file The first thing I can think of is maybe I am using old version of biopython, which points to question 1. 3. Does somebody have experience with using biopython in django web site ? Do I install biopython on web server or I can keep libraries in some folder and load them dynamically in code ? Kind regards, Dragoslav Zaric Programmer Msc. in Astrophysics From biopython at maubp.freeserve.co.uk Tue Sep 14 07:44:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Sep 2010 12:44:47 +0100 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 12:08 PM, Dragoslav Zaric wrote: > Dear All, > > I am new member and I like to send welcome greet to everyone. > > I have few newbie questions so please be cooperative :) > Hello and welcome :) > 1. How can I see biopython version and is there connection between python > version and biopython version ? Biopython currently supports Python 2.4, 2.5, 2.6 and 2.7. Older versions of Biopython may not have worked 100% on Python 2.7, but we did previously support Python 2.3 and even older versions of Python. There is a FAQ (frequently asked questions) section in the Tutorial for how to determine the version of Biopython installed. 
Try:

    import Bio
    print Bio.__version__

The Tutorial for the latest release is online as PDF or HTML,
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/tutorial/Tutorial.html

> 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04
> (at least I think I did :)

Based on the problems below, I don't think it worked.

>    I have same installation on windows machine and everything works fine.
>    But for example when I want to use something like this:
>
>    from Bio import SeqIO
>    orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")
>
>    Two problems happens in ubuntu environment:
>    first is that SeqIO complains that there is no index method

That does suggest you have an old version of Biopython. The index
function was added in Biopython 1.52, see:

http://biopython.open-bio.org/SRC/biopython/NEWS
http://news.open-bio.org/news/2009/09/biopython-release-152/

>    second is that everywhere I should put string location of file
>    biopython wants handle to file

Things like Bio.SeqIO will accept filenames in recent versions of
Biopython (since release 1.54), but older versions only accepted
file handles. This is discussed in an FAQ in recent versions of the
Tutorial which point to this section on handles:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:appendix-handles

>    The first thing I can think of is maybe I am using old version of
>    biopython, which points to question 1.

That does seem to be the problem.

> 3. Does somebody have experience with using biopython in django
> web site ?  Do I install biopython on web server or I can keep libraries
> in some folder and load them dynamically in code ?

I've used Biopython within TurboGears, but I haven't used django.
You should probably consult the django documentation for how they
recommend installing 3rd party libraries (e.g. they may recommend
using a virtual environment).
Peter From zaricdragoslav at gmail.com Tue Sep 14 08:04:47 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 14 Sep 2010 16:04:47 +0400 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: Peter, Thank you so very much for detailed explanations. I will try to upgrade biopython version under linux. Kind regards, Dragoslav Zaric On Tue, Sep 14, 2010 at 3:44 PM, Peter wrote: > On Tue, Sep 14, 2010 at 12:08 PM, Dragoslav Zaric > wrote: > > Dear All, > > > > I am new member and I like to send welcome greet to everyone. > > > > I have few newbie questions so please be cooperative :) > > > > Hello and welcome :) > > > 1. How can I see biopython version and is there connection between python > > version and biopython version ? > > Biopython currently supports Python 2.4, 2.5, 2.6 and 2.7. Older versions > of Biopython may not have worked 100% on Python 2.7, but we did > previously support Python 2.3 and even older versions of Python. > > There is a FAQ (frequently asked questions) section in the Tutorial for > how to determine the version of Biopython installed. Try: > > import Bio > print Bio.__version__ > > The Tutorial for the latest release is online as PDF or HTML, > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > http://biopython.org/DIST/docs/tutorial/Tutorial.html > > > 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04 > > (at least I think I did :) > > Based on the problems below, I don't think it worked. > > > I have same installation on windows machine and everything works fine. > > But for example when I want to use something like this: > > > > from Bio import SeqIO > > orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta") > > > > Two problems happens in ubuntu environment: > > first is that SeqIO complains that there is no index method > > That does suggest you have an old version of Biopython. 
The index > function was added in Biopython 1.52, see: > > http://biopython.open-bio.org/SRC/biopython/NEWS > http://news.open-bio.org/news/2009/09/biopython-release-152/ > > > second is that everywhere I should put string location of file > > biopython wants handle to file > > Things like Bio.SeqIO will accept filenames in recent versions of > Biopython (since release 1.54), but older versions only accepted > file handles. This is discussed in an FAQ in recent versions of the > Tutorial which point to this section on handles: > http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:appendix-handles > > > The first thing I can think of is maybe I am using old version of > > biopython, which points to question 1. > > That does seem to be the problem. > > > 3. Does somebody have experience with using biopython in django > > web site ? Do I install biopython on web server or I can keep libraries > > in some folder and load them dynamically in code ? > > I've used Biopython within TurboGears, but I haven't used django. > You should probably consult the django documentation for how they > recommend installing 3rd party libraries (e.g. they may recommend > using a virtual environment). > > Peter > From bartek at rezolwenta.eu.org Tue Sep 14 08:20:44 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 14 Sep 2010 14:20:44 +0200 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 2:04 PM, Dragoslav Zaric wrote: > Peter, > > Thank you so very much for detailed explanations. > > I will try to upgrade biopython version under linux. > > Hi, Since you mentioned that you are working on ubuntu, I wanted to add that you should be careful when upgrading the python/biopython versions on your machine. 
You are most probably now running both python and biopython installed from ubuntu packages, but if you want to upgrade, you have to choose between taking newer packages from a newer ubuntu (currently the newest ubuntu 10.10 beta contains biopython 1.53 http://packages.ubuntu.com/lucid/python-biopython) or compiling from source. If you choose to install from source, be sure to first uninstall the old version from the package: sudo apt-get remove python-biopython if you want to install from source, you will need some extra packages: sudo apt-get install python-dev python-reportlab python-numpy good luck Bartek From bartek at rezolwenta.eu.org Tue Sep 14 09:13:59 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 14 Sep 2010 15:13:59 +0200 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 2:20 PM, Bartek Wilczynski wrote: > (currently the newest ubuntu 10.10 beta contains biopython 1.53 > http://packages.ubuntu.com/lucid/python-biopython) Just a small correction: 1.53 is a version from 10.4 (lucid), while 10.10 beta (maverick) contains 1.54 (http://packages.ubuntu.com/maverick/python-biopython) sorry for the error Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From zaricdragoslav at gmail.com Tue Sep 14 10:22:55 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 14 Sep 2010 18:22:55 +0400 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: Thanks for answers and help ! Actually I do not prefer to use ubuntu above 9.04 and there is no reason to change distribution because one program. I just did sudo apt-get remove python-biopython and after this 1.55 was automatically activated. I did install 1.55 but it looks like older version of biopython from default package was masking new biopython version. Thanks again ! 
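The masking problem described above, where a distribution package hides a newer source install, can be confirmed from inside Python by asking which copy of a package actually gets imported. The following is only a minimal sketch using the standard library; the helper name `describe_module` is made up for illustration and is not part of Biopython.

```python
import importlib


def describe_module(name):
    """Return (version, path) for the copy of `name` that Python imports.

    `describe_module` is a hypothetical helper, not a Biopython function.
    """
    module = importlib.import_module(name)
    # Not every module defines __version__ or __file__, so fall back safely.
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "built-in")
    return version, path


if __name__ == "__main__":
    # Replace "json" with "Bio" to inspect a Biopython installation.
    print(describe_module("json"))
```

If the reported path points into the distribution's package directory rather than the location used by the source install, the older package is still the one being picked up.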
On Tue, Sep 14, 2010 at 5:13 PM, Bartek Wilczynski wrote:

>
>
> On Tue, Sep 14, 2010 at 2:20 PM, Bartek Wilczynski <
> bartek at rezolwenta.eu.org> wrote:
>
>> (currently the newest ubuntu 10.10 beta contains biopython 1.53
>> http://packages.ubuntu.com/lucid/python-biopython)
>
>
> Just a small correction:
> 1.53 is a version from 10.4 (lucid), while 10.10 beta (maverick) contains
> 1.54 (http://packages.ubuntu.com/maverick/python-biopython)
>
> sorry for the error
> Bartek
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
>

--
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From zaricdragoslav at gmail.com Tue Sep 14 10:29:27 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 18:29:27 +0400
Subject: [Biopython] Some books
Message-ID:

Dear friends,

I do not come from bioinformatics background, so can anybody
recommend some introducing book about bioinformatics so I can
cover the basics.

Of course there are a lot of python programming in biopython that
is out of biology (like parsing of database files, connect to
databases), but to get clear picture it is good to read some
introducing book.

Is book "Introduction to Bioinformatics" by Arthur Lesk good one ?

Kind regards

--
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From p.j.a.cock at googlemail.com Tue Sep 14 10:58:05 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 15:58:05 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To:
References:
Message-ID:

On Tue, Sep 14, 2010 at 2:44 PM, Jessica Grant wrote:
> Hi Peter,
>
> Here is the codon table, in the format I found in CodonTable.py.
>
> I will look at the links you sent, but I don't know if I will be able to
> follow it all.  Thanks,
>
> Jessica
>
>                    table = {
>     'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S',
>     'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y',
>     'TGT': 'C', 'TGC': 'C', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L',
>     'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P',
>     'CCG': 'P', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
>     'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'ATT': 'I',
>     'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T',
>     'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 'AAA': 'K',
>     'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
>     'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A',
>     'GCC': 'A', 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D',
>     'GAA': 'E', 'GAG': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G',
>     'GGG': 'G', 'TAG': 'Q', 'TGA': 'W',},
>                    stop_codons = ['TAA' ],
>                    start_codons = [ 'ATG']
>                    )

OK, don't worry about the git branch stuff - I've just merged this
to the main repository.

Are you happy with installing Biopython from source?
If so grab the latest source code as described here:

http://www.biopython.org/wiki/SourceCode

Alternatively all you need to update is the Bio/Seq.py file to the
latest version:

http://github.com/biopython/biopython/raw/master/Bio/Seq.py

To use the new functionality, first you need to create a
CodonTable object with your special table, and assuming
you are just working with unambiguous DNA that means:

from Bio.Data.CodonTable import CodonTable
c_uncinata_table = CodonTable(forward_table={
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
    'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
    'TAT': 'Y', 'TAC': 'Y',             'TAG': 'Q',
    'TGT': 'C', 'TGC': 'C', 'TGA': 'W', 'TGG': 'W',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
    'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
    'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
    'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
    'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'},
    start_codons = ['ATG'],
    stop_codons = ['TAA'])

Note that order of the forward table dictionary entries
does not actually matter, however, I have moved the
TAG and TGA entries from the end to keep the whole
table in a standard order - I found this easier to check.

If you have the updated Bio.Seq module, then you can
do this:

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> seq = Seq("AAATAGTGATAA", generic_dna)
>>> print seq.translate()
K***
>>> print seq.translate(table=c_uncinata_table)
KQW*

Or using strings,

>>> from Bio.Seq import translate
>>> print translate("AAATAGTGATAA")
K***
>>> print translate("AAATAGTGATAA", table=c_uncinata_table)
KQW*

Does that make sense? Does it do what you expect?
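To make clear what the table argument changes, the same translation can be sketched in plain Python with no Biopython installed. This is an illustration only and not how Bio.Seq implements translation: the function `translate_with_table` and both table names are hypothetical, and unlike a CodonTable forward table, the dictionaries here also hold the stop codons, mapped to "*".

```python
from itertools import product

# Standard genetic code written in the usual TCAG codon order.
BASES = "TCAG"
STANDARD_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to its amino acid ("*" marks a stop codon).
STANDARD_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
}

# Variant from the thread: TAG codes for Q and TGA for W, leaving TAA
# as the only stop codon.
C_UNCINATA_TABLE = dict(STANDARD_TABLE, TAG="Q", TGA="W")


def translate_with_table(dna, table):
    """Translate unambiguous DNA codon by codon, ignoring a trailing partial codon."""
    usable_length = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable_length, 3))


if __name__ == "__main__":
    print(translate_with_table("AAATAGTGATAA", STANDARD_TABLE))    # K***
    print(translate_with_table("AAATAGTGATAA", C_UNCINATA_TABLE))  # KQW*
```

Only the two reassigned codons differ between the dictionaries, which is why the two translations of the test sequence differ in exactly the second and third positions.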
Don't hesitate to ask for clarification. Peter From Achim.Treumann at NEPAF.com Tue Sep 14 10:51:29 2010 From: Achim.Treumann at NEPAF.com (Achim Treumann) Date: Tue, 14 Sep 2010 15:51:29 +0100 Subject: [Biopython] Some books In-Reply-To: References: Message-ID: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local> Dear Dragoslav, I cannot comment on Arthur Lesk's book - haven't read it. I can really recommend two freely available tutorials on Katja Schuerer's website: One of them is an introduction to programming using Python: http://www.pasteur.fr/formation/infobio/python/ The other one is a Python course in Bioinformatics: http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html Both of them provide you with numerous examples and take you through tips and tricks on how to address bioinformatic problems using Python and Biopython. I presume that you are familiar with the Biopython manual that is part of the Biopython distribution: http://www.biopython.org/DIST/docs/tutorial/Tutorial.html Hope this helps, Best wishes, Achim -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Dragoslav Zaric Sent: 14 September 2010 15:29 To: biopython at lists.open-bio.org Subject: [Biopython] Some books Dear friends, I do not come from bioinformatics background, so can anybody recommend some introducing book about bioinformatics so I can cover the basics. Of course there are a lot of python programming in biopython that is out of biology (like parsing of database files, connect to databases), but to get clear picture it is good to read some introducing book. Is book "Introduction to Bioinformatics" by Arthur Lesk good one ? 
Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From cjfields at illinois.edu Tue Sep 14 10:59:49 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 14 Sep 2010 09:59:49 -0500 Subject: [Biopython] Codeml parser in Biopython? In-Reply-To: References: <533513.93597.qm@web52005.mail.re2.yahoo.com> <937371.74873.qm@web52006.mail.re2.yahoo.com> Message-ID: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> On Sep 14, 2010, at 4:04 AM, Peter wrote: > Hi Anastasia, > > On Tue, Sep 14, 2010 at 9:02 AM, natassa wrote: >> Hi Peter, >> >>> >>> Could you post a short example of the kind of output you are looking at? >>> >> >> Here is an example output, but this caan differ depending on the model used >> (there are several models for Branch, Site, BranchSite, but all are pretty >> standard) >> > > Thanks - that looks possible to parse, but not very easy (especially if the > codeml output changes slightly between versions). > >>> >>> Can you get codeml to output what you need in another format, such as NEXUS? >>> >> >> Haven't tried that, but as you can see, this is a very verbose output and >> NEXUS does not seem an option. > > At first glance, the NEXUS format could hold a lot of that information. > Another possibility might be phyloXML. However, you are at the mercy > of the codeml tool and what it supports. I might be worth politely asking > the author(s) about supporting one of these more standard formats as > a optional output. > >> Ultimately, I want to parse this to get all the information I need in a >> tabulated file. I am still working out what exactly I need (there are standard >> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type >> of downstram analysis). 
I will now work on the pypaml class and modify the >> original code to make it more generic (it seems that it only works for Site >> Models). > > Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so > unless he agrees to re-license it we cannot include it in Biopython. > >> Will let you know, was just wondering if there was already a solution.There is >> one in Bioperl, but heard it is very slow and in any case, I don't understand >> much of perl.... > > I don't know much Perl either ;) > > Peter Just a warning from those experienced with paml parsers (bioperl): the output is notoriously shifty even between minor releases (sections get reordered, etc), so pretty much any parse needs to accommodate that. It's extremely frustrating. chris From p.j.a.cock at googlemail.com Tue Sep 14 12:04:52 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 14 Sep 2010 17:04:52 +0100 Subject: [Biopython] Some books In-Reply-To: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local> References: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local> Message-ID: On Tue, Sep 14, 2010 at 3:51 PM, Achim Treumann wrote: > Dear Dragoslav, > > ... > > The other one is a Python course in Bioinformatics: > http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html The above Pasteur Institute course using Biopython is sadly very out of date in places, and I have been unable to get in touch with the authors to revise it or at least add some warning text to it. Peter From biopython at maubp.freeserve.co.uk Tue Sep 14 12:07:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Sep 2010 17:07:52 +0100 Subject: [Biopython] Codeml parser in Biopython? 
In-Reply-To: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> References: <533513.93597.qm@web52005.mail.re2.yahoo.com> <937371.74873.qm@web52006.mail.re2.yahoo.com> <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> Message-ID: On Tue, Sep 14, 2010 at 3:59 PM, Chris Fields wrote: > On Sep 14, 2010, at 4:04 AM, Peter wrote: >> On Tue, Sep 14, 2010 at 9:02 AM, natassa wrote: >>> >>> Here is an example output, but this caan differ depending on the model used >>> (there are several models for Branch, Site, BranchSite, but all are pretty >>> standard) >> >> Thanks - that looks possible to parse, but not very easy (especially if the >> codeml output changes slightly between versions). > > Just a warning from those experienced with paml parsers (bioperl): the > output is notoriously shifty even between minor releases (sections get > reordered, etc), so pretty much any parse needs to accommodate that. > It's extremely frustrating. Thanks Chris - I was afraid of that. It sounds like parsing plain text NCBI BLAST output, but worse. Do you know if anyone has asked about codeml outputting something nicer to parse instead? e.g. Nexus or any kind of XML? Peter From Achim.Treumann at NEPAF.com Tue Sep 14 12:20:36 2010 From: Achim.Treumann at NEPAF.com (Achim Treumann) Date: Tue, 14 Sep 2010 17:20:36 +0100 Subject: [Biopython] Some books In-Reply-To: References: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local> Message-ID: <01798D2396253A449511F31F1CDE83550FBAF2@srv1.NEPAF.local> Hiya, I agree about this warning (and have come across a few bits where this has caused problems) - despite that I found them very useful. Best wishes, Achim -----Original Message----- From: Peter Cock [mailto:p.j.a.cock at googlemail.com] Sent: 14 September 2010 17:05 To: Achim Treumann Cc: Dragoslav Zaric; biopython at lists.open-bio.org Subject: Re: [Biopython] Some books On Tue, Sep 14, 2010 at 3:51 PM, Achim Treumann wrote: > Dear Dragoslav, > > ... 
> > The other one is a Python course in Bioinformatics: > http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html The above Pasteur Institute course using Biopython is sadly very out of date in places, and I have been unable to get in touch with the authors to revise it or at least add some warning text to it. Peter From natassa_g_2000 at yahoo.com Tue Sep 14 12:15:18 2010 From: natassa_g_2000 at yahoo.com (natassa) Date: Tue, 14 Sep 2010 09:15:18 -0700 (PDT) Subject: [Biopython] Codeml parser in Biopython? In-Reply-To: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> References: <533513.93597.qm@web52005.mail.re2.yahoo.com> <937371.74873.qm@web52006.mail.re2.yahoo.com> <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> Message-ID: <884978.85315.qm@web52004.mail.re2.yahoo.com> Thanks Chris, Good to know.. I am dealing with paml results for the first time, but somehow thought that outputs were standard. Apparently not... Now that I started writing my own python parser, I see that even among models of the same run, the text changes without any obvious reason (from 'omega' to 'w' etc). Indeed frustrating! Does the Bioperl solution include different parsers for different types of analysis ex The Branch analysis models, another for the Site Analysis models etc? It would be good o have one for all, but I am not sure this is feasible...I start with separate parsers and will see how it can be generalized. Thanks, Anastasia ________________________________ From: Chris Fields To: Peter Cc: natassa ; biopython at biopython.org Sent: Tue, September 14, 2010 4:59:49 PM Subject: Re: [Biopython] Codeml parser in Biopython? On Sep 14, 2010, at 4:04 AM, Peter wrote: > Hi Anastasia, > > On Tue, Sep 14, 2010 at 9:02 AM, natassa wrote: >> Hi Peter, >> >>> >>> Could you post a short example of the kind of output you are looking at? 
>>> >> >> Here is an example output, but this caan differ depending on the model used >> (there are several models for Branch, Site, BranchSite, but all are pretty >> standard) >> > > Thanks - that looks possible to parse, but not very easy (especially if the > codeml output changes slightly between versions). > >>> >>> Can you get codeml to output what you need in another format, such as NEXUS? >>> >> >> Haven't tried that, but as you can see, this is a very verbose output and >> NEXUS does not seem an option. > > At first glance, the NEXUS format could hold a lot of that information. > Another possibility might be phyloXML. However, you are at the mercy > of the codeml tool and what it supports. I might be worth politely asking > the author(s) about supporting one of these more standard formats as > a optional output. > >> Ultimately, I want to parse this to get all the information I need in a >> tabulated file. I am still working out what exactly I need (there are standard >> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the >>type >> of downstram analysis). I will now work on the pypaml class and modify the >> original code to make it more generic (it seems that it only works for Site >> Models). > > Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so > unless he agrees to re-license it we cannot include it in Biopython. > >> Will let you know, was just wondering if there was already a solution.There is >> one in Bioperl, but heard it is very slow and in any case, I don't understand >> much of perl.... > > I don't know much Perl either ;) > > Peter Just a warning from those experienced with paml parsers (bioperl): the output is notoriously shifty even between minor releases (sections get reordered, etc), so pretty much any parse needs to accommodate that. It's extremely frustrating. 
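One way to cope with sections being reordered between releases is to search the whole file for labelled values rather than reading it line by line in a fixed order. The sketch below is only an assumption about how that could look: the function name `extract_values` is invented, and the three labels are copied from the sample codeml output posted earlier in this thread, so other versions or models may word them differently.

```python
import re

# One pattern per value. Each is searched over the whole text, so the order
# in which codeml prints the sections does not matter.
PATTERNS = {
    "lnL": re.compile(r"lnL\(.*?\):\s*(-?\d+\.\d+)"),
    "kappa": re.compile(r"kappa \(ts/tv\)\s*=\s*(-?\d+\.\d+)"),
    "tree_length": re.compile(r"tree length\s*=\s*(-?\d+\.\d+)"),
}


def extract_values(text):
    """Return a dict of the labelled values found in `text`.

    Labels that are missing are simply left out instead of raising an
    error, so the caller can decide how to handle an unexpected layout.
    """
    results = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            results[name] = float(match.group(1))
    return results


if __name__ == "__main__":
    sample = (
        "tree length = 1.22476\n"
        "kappa (ts/tv) = 2.80002\n"
        "lnL(ntime: 11 np: 14): -7469.732728 +0.000000\n"
    )
    print(extract_values(sample))
```

Returning only the labels that matched makes a changed layout show up as a missing key, which is easier to detect than a value silently read from the wrong line.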
chris From p.j.a.cock at googlemail.com Wed Sep 15 13:10:46 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 15 Sep 2010 18:10:46 +0100 Subject: [Biopython] unusual genetic code In-Reply-To: References: Message-ID: On Wed, Sep 15, 2010 at 5:50 PM, Jessica Grant wrote: >Peter wrote: >> ... >> To use the new functionality, first you need to create a >> CodonData object with your special table, and assuming >> you are just working with unambiguous DNA that means: >> ... >> Does that make sense? Does it do what you expect? >> Don't hesitate to ask for clarification. >> >> Peter > > It works! Thanks so much!! > > Jessica Great - thanks for letting us know. Peter From cwalentas at gmail.com Thu Sep 16 00:36:13 2010 From: cwalentas at gmail.com (Christopher Walentas) Date: Thu, 16 Sep 2010 00:36:13 -0400 Subject: [Biopython] Parsing Pubmed-Entrez searches into a normalized relational resource Message-ID: <4C919EBD.3080802@gmail.com> Apologies in advance- all of this is very new to me- and I hope that this is the proper forum for this query. What I would like to do is parse the returns of an entrez pubmed search into their smallest, unique useful bits and create a relational database (sqlite, dee?). Ideally this would not only be of returned fields, but also drilling further down into say affiliation, addresses, etc... I believe that I've mastered the search and download functions and individual citations exist as a stacked dictionary of the xml outputs. Where I am falling down is understanding how to extract the structure of these outputs and create a persistent relational resource that's been normalized such that these fields can be mapped to used to "correct" values in an uncurated dataset with highly analogous fields. 
I've been struggling to bridge the gap between python and sqlite/dee, however have recently been informed that it might be possible to do everything within python itself and again apologies for any navieties- they are indeed sincere, however I'm well aware that a little knowledge can be dangerous- hence reaching out. From what I've already read, it would seem that all of this is ideally suited to bio-/python and am looking forward to learning- I'm just looking for that swift shove in the right direction and to benefit from your collective informed guidance. Cheers in advance, christopher From mjldehoon at yahoo.com Thu Sep 16 06:53:55 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 16 Sep 2010 03:53:55 -0700 (PDT) Subject: [Biopython] Fwd: NETTAB 2010: Submission deadline is approaching: Sep 24, 2010 Message-ID: <841162.95892.qm@web62405.mail.re1.yahoo.com> --- On Thu, 9/16/10, Paolo Romano wrote: > From: Paolo Romano > Subject: Fwd: NETTAB 2010: Submission deadline is approaching: Sep 24, 2010 > To: biopython-owner at lists.open-bio.org > Date: Thursday, September 16, 2010, 3:25 AM > Dear list owner, > > I would be glad if you would forward thsi message to the > list. > > Many thanks in adavnce. > > Ciao. Paolo > > >Date: Wed, 15 Sep 2010 17:50:48 +0200 > >To: biopython at lists.open-bio.org > >From: Paolo Romano > >Subject: NETTAB 2010: Submission deadline is > approaching: Sep 24, 2010 > > > >I hope this announcement can be of interest for this > list. > > > >Forgive me if I'm wrong! > > > >Ciao. 
Paolo
> >
> >==========
> >NETTAB 2010 on "Biological Wikis"
> >joint with the BBCC 2010 workshop on Bioinformatics and
> >Computational Biology in Campania
> >
> >November 29 - December 1, 2010, Naples, Italy
> >http://www.nettab.org/2010/
> >http://bioinformatica.isa.cnr.it./BBCC/BBCC2010/
> >
> >
> >The deadline for the submission of oral communications is quickly
> >approaching; submit your contribution by next
> >Friday, September 24, 2010 through the EasyChair site
> >( http://www.easychair.org/conferences/?conf=nettab2010 ).
> >
> >The length of contributions for oral communications should be
> >between 3 and 5 pages, including tables and figures.
> >See more instructions below.
> >
> >
> >The NETTAB 2010 workshop promises to be a great meeting for all
> >researchers involved in the exploitation of wikis in biology.
> >
> >Don't miss this opportunity to discuss your ideas and doubts with
> >such scientists as
> >- Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge,
> >United Kingdom
> >- Alexander Pico, Gladstone Institute of Cardiovascular Disease, San
> >Francisco, USA
> >- Andrew Su, Bioinformatics and Computational Biology, Genomics
> >Institute of the Novartis Research Foundation (GNF), San Diego, USA
> >- Dan Bolser, College of Life Sciences, University of Dundee,
> >Scotland, United Kingdom
> >- Robert Hoffmann, Computational Biology Center, cBIO, Memorial
> >Sloan-Kettering Cancer Center, MSKCC, New York, USA
> >- Thomas Kelder, Department of Bioinformatics (BiGCaT), Maastricht
> >University, the Netherlands
> >- Jaime Prilusky, Bioinformatics, Weizmann Institute of Science,
> >Rehovot, Israel
> >- and many others who, we hope, will join the workshop.
> >
> >Here below, please find a summary of the Call. The complete Call is
> >available on-line at http://www.nettab.org/2010/call.html .
> >
> >Further information is available at http://www.nettab.org/2010/ .
> >
> >============
> >CALL FOR PAPERS
> >
> >TOPICS
> >The following list is not meant to be exclusive of any further
> >topics as stated above.
> >Submitted contributions should address one or more of the following topics:
> >    * Wiki development tools
> >        o Wikimedia
> >        o Wikimedia extensions
> >        o Semantic Wikis
> >        o Wiki-coupled CMSs
> >        o Other wikis
> >    * Arising issues for the biomedical domain:
> >        o Authoritativeness of contributions and sites
> >        o Quality assessment
> >        o Users acknowledgement
> >        o Stimulation of quality contributions
> >        o Authorships management and reward
> >        o 'Scientific production' value for contributions
> >        o Management of bioinformatics data types
> >    * Wikis and collaborative systems for:
> >        o Genomics, proteomics, metabolomics, any -omics
> >        o Proteins analysis and visualization
> >        o gene and proteins interactions
> >        o metabolic pathways
> >        o oncology research
> >    * Issues to be tackled by wiki and collaborative research for:
> >        o Genomics, proteomics, metabolomics, any -omics
> >        o Proteins analysis and visualization
> >        o gene and proteins interactions
> >        o metabolic pathways
> >        o oncology research
> >
> >The NETTAB 2010 workshop is a joint event with the BBCC 2010 workshop on
> >Bioinformatics and Computational Biology in Campania.
> >This deadline also applies to the BBCC 2010 workshop.
> >Submit for BBCC through the same EasyChair site and select 'BBCC
> >session' topic.
> >
> >
> >TYPE OF CONTRIBUTIONS
> >
> >The following possible contributions are sought:
> >    * Oral communications
> >    * Posters
> >    * Software demos
> >All accepted contributions will be published in the proceedings of
> >the workshop.
> >
> >
> >DEADLINES
> >
> >* September 24, 2010: Oral communications submission
> >        o Decisions announced: October 24, 2010
> >
> >* October 29, 2010: Early registration ends
> >
> >* November 29 - December 1, 2010: Workshop and Tutorials
> >
> >
> >INSTRUCTIONS
> >Kindly follow the instructions carefully when preparing your
> >contribution and submit your contribution through the EasyChair
> >system at http://www.easychair.org/conferences/?conf=nettab2010.
> >
> >All contributions should follow the same format, as specified here:
> >font type: Times New Roman, font size: 12 pt, page size: A4, left
> >and right margins: 2.0 cm, upper margin: 2.5 cm, lower margin: 2.0 cm.
> >
> >The length of contributions for oral communications should be
> >between 3 and 5 pages, including tables and figures.
> >They should include: Abstract, Introduction, Methods, Results and
> >Discussion, References.
> >All contributions for oral communications will be evaluated by at
> >least three referees.
> >
> >For any further information or clarification, please contact the
> >organization by email at info at nettab.org.
> >
> >
> >ORGANIZATION (see http://www.nettab.org/2010/organization.html for
> >the Scientific Committee and more information)
> >
> >Co-chairs
> >    * Angelo Facchiano, CNR-ISA, Avellino, Italy
> >    * Paolo Romano, National Cancer Research Institute, Genoa, Italy
> >
> >We look forward to meeting you in Naples!
> >
> >Paolo Romano and Angelo Facchiano
> >   on behalf of the Scientific Committee
> >
> >
> >Paolo Romano (paolo.romano at istge.it)
> >Bioinformatics
> >National Cancer Research Institute (IST)
> >Largo Rosanna Benzi, 10, I-16132, Genova, Italy
> >Tel: +39-010-5737-288  Fax: +39-010-5737-295
> Skype: p.romano > >Web: http://www.nettab.org/promano/ > > > > > > > > > > > Paolo Romano (paolo.romano at istge.it) > Bioinformatics > National Cancer Research Institute (IST) > > > > From zaricdragoslav at gmail.com Sun Sep 19 02:33:06 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Sun, 19 Sep 2010 10:33:06 +0400 Subject: [Biopython] Trhird party library Message-ID: Did anybody used biopython as third part library, like for example in python web project ? I ask this because probably you can not expect to find or install biopython in provider server environment. For example, after installing biopython in windows environment, you can see that biopython is installed inside python 2.6 installation: C:\Python26\Lib\site-packages\Bio C:\Python26\Lib\site-packages\BioSQL C:\Python26\Lib\site-packages\numpy So can you copy these folders to, for example, \Lib\ folder of web project, and reference them somehow from code ? Of course I can test this by myself, and I will do this, but maybe somebody have experience with this problem, and it would be probably good info for others in this forum. Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From chapmanb at 50mail.com Sun Sep 19 06:51:19 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 19 Sep 2010 06:51:19 -0400 Subject: [Biopython] Parsing Pubmed-Entrez searches into a normalized relational resource In-Reply-To: <4C919EBD.3080802@gmail.com> References: <4C919EBD.3080802@gmail.com> Message-ID: <20100919105119.GC2030@kunkel> Christopher; > What I would like to do is parse the returns of an entrez pubmed > search into their smallest, unique useful bits and create a > relational database (sqlite, dee?). Ideally this would not only be > of returned fields, but also drilling further down into say > affiliation, addresses, etc... [...] 
> Where I am falling down is understanding how to extract the > structure of these outputs and create a persistent relational > resource that's been normalized such that these fields can be mapped > to used to "correct" values in an uncurated dataset with highly > analogous fields. This is the standard problem of represent object style data in a flat relational database. It's tough to answer succinctly on a mailing list, as there are entire textbooks and courses devoted to the problem. The wikipedia entry on normalization and first normal form is a good place to get started: http://en.wikipedia.org/wiki/Database_normalization As far as accessing relational databases, Python is great for this. An object relational mapper like SQLAlchemy: http://www.sqlalchemy.org/ is a great place to get started. This allows you to deal more directly with objects, and also generalizes database access so you can quickly switch from SQLite to MySQL to whatever. Another suggestion is to use a document oriented database like MongoDB for storing your data: http://www.mongodb.org/ This allows you to store objects without flattening them, which may be more intuitive for the XML/dictionary results you get back from Entrez searches. Hope this helps, Brad From chapmanb at 50mail.com Sun Sep 19 06:44:40 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 19 Sep 2010 06:44:40 -0400 Subject: [Biopython] Third party library In-Reply-To: References: Message-ID: <20100919104440.GB2030@kunkel> Dragoslav; > Did anybody used biopython as third part library, like for example in python > web project ? Yes, absolutely. Biopython doesn't behave any different than other Python third party libraries, so there wouldn't be any special instructions outside the documentation for the library you are using. > I ask this because probably you can not expect to find or install biopython > in provider server environment. 
It's tough to answer this generally without knowing what framework you are planning to use. For an example, Google App Engine has a restricted environment where only pure Python libraries work. As an install procedure you can most simply do: python setup.py build and then copy the libraries from build/lib.your_platform to the site-libraries location in your application. More formally, virtualenv is also very useful for building an isolated Python environment with only the libraries for a project: http://pypi.python.org/pypi/virtualenv > For example, after installing biopython in windows environment, you can see > that biopython is > installed inside python 2.6 installation: > > C:\Python26\Lib\site-packages\Bio > C:\Python26\Lib\site-packages\BioSQL > C:\Python26\Lib\site-packages\numpy > > So can you copy these folders to, for example, \Lib\ folder of web project, > and reference them somehow from code ? Sure, that all seems fine but it's hard to offer specific advise without knowing exactly what you are doing. The best place for questions is probably in the community of the web framework you are using. Everything that applies to other third party libraries will apply to Biopython. Hope this helps, Brad From sdavis2 at mail.nih.gov Sun Sep 19 07:02:45 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sun, 19 Sep 2010 07:02:45 -0400 Subject: [Biopython] Trhird party library In-Reply-To: References: Message-ID: On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric wrote: > Did anybody used biopython as third part library, like for example in > python > web project ? > I ask this because probably you can not expect to find or install biopython > in provider server > environment. 
> > For example, after installing biopython in windows environment, you can see > that biopython is > installed inside python 2.6 installation: > > C:\Python26\Lib\site-packages\Bio > C:\Python26\Lib\site-packages\BioSQL > C:\Python26\Lib\site-packages\numpy > > So can you copy these folders to, for example, \Lib\ folder of web project, > and reference them > somehow from code ? > > Of course I can test this by myself, and I will do this, but maybe somebody > have experience > with this problem, and it would be probably good info for others in this > forum. > > Hi, Dragoslav. The python developers thought of this problem. http://docs.python.org/install/#alternate-installation-the-home-scheme Sean From zaricdragoslav at gmail.com Sun Sep 19 08:11:48 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Sun, 19 Sep 2010 16:11:48 +0400 Subject: [Biopython] Trhird party library In-Reply-To: References: Message-ID: Anyway, I will try simplest thing, to copy folder with biopython modules in some folder of web app and access modules trough absolute path of web server, this must work. At first I planned to use django web framework, but I recently discovered there are many python web frameworks. So i prefer most simplistic and effective frameworks, I will check out web.py it looks nice at first glance. Kind regards On Sun, Sep 19, 2010 at 3:02 PM, Sean Davis wrote: > > > On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric > wrote: > >> Did anybody used biopython as third part library, like for example in >> python >> web project ? >> I ask this because probably you can not expect to find or install >> biopython >> in provider server >> environment. 
>> >> For example, after installing biopython in windows environment, you can >> see >> that biopython is >> installed inside python 2.6 installation: >> >> C:\Python26\Lib\site-packages\Bio >> C:\Python26\Lib\site-packages\BioSQL >> C:\Python26\Lib\site-packages\numpy >> >> So can you copy these folders to, for example, \Lib\ folder of web >> project, >> and reference them >> somehow from code ? >> >> Of course I can test this by myself, and I will do this, but maybe >> somebody >> have experience >> with this problem, and it would be probably good info for others in this >> forum. >> >> > Hi, Dragoslav. The python developers thought of this problem. > > http://docs.python.org/install/#alternate-installation-the-home-scheme > > Sean > > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From rodrigo_faccioli at uol.com.br Sun Sep 19 09:59:47 2010 From: rodrigo_faccioli at uol.com.br (Rodrigo Faccioli) Date: Sun, 19 Sep 2010 10:59:47 -0300 Subject: [Biopython] Trhird party library In-Reply-To: References: Message-ID: Hi, I've worked with BioPython in web project. I've installed BioPython normally in our ubuntu server. My web project was developed its front-end in jsp. But I ran my scripts with BioPython. You can find this project in http://glu.fcfrp.usp.br:8180/prometheus/ About the python frameworks, I've read Django is an excellent framework. 
Thanks in advance, -- Rodrigo Antonio Faccioli Ph.D Student in Electrical Engineering University of Sao Paulo - USP Engineering School of Sao Carlos - EESC Department of Electrical Engineering - SEL Intelligent System in Structure Bioinformatics http://laips.sel.eesc.usp.br Phone: 55 (16) 3373-9366 Ext 229 Curriculum Lattes - http://lattes.cnpq.br/1025157978990218 Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5 On Sun, Sep 19, 2010 at 9:11 AM, Dragoslav Zaric wrote: > Anyway, > > I will try simplest thing, to copy folder with biopython modules in some > folder of web app and access modules trough absolute path of web server, > this must work. > > At first I planned to use django web framework, but I recently discovered > there are > many python web frameworks. So i prefer most simplistic and effective > frameworks, > I will check out > > web.py > > it looks nice at first glance. > > Kind regards > > On Sun, Sep 19, 2010 at 3:02 PM, Sean Davis wrote: > > > > > > > On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric < > zaricdragoslav at gmail.com > > > wrote: > > > >> Did anybody used biopython as third part library, like for example in > >> python > >> web project ? > >> I ask this because probably you can not expect to find or install > >> biopython > >> in provider server > >> environment. > >> > >> For example, after installing biopython in windows environment, you can > >> see > >> that biopython is > >> installed inside python 2.6 installation: > >> > >> C:\Python26\Lib\site-packages\Bio > >> C:\Python26\Lib\site-packages\BioSQL > >> C:\Python26\Lib\site-packages\numpy > >> > >> So can you copy these folders to, for example, \Lib\ folder of web > >> project, > >> and reference them > >> somehow from code ? > >> > >> Of course I can test this by myself, and I will do this, but maybe > >> somebody > >> have experience > >> with this problem, and it would be probably good info for others in this > >> forum. > >> > >> > > Hi, Dragoslav. 
The python developers thought of this problem. > > > > http://docs.python.org/install/#alternate-installation-the-home-scheme > > > > Sean > > > > > > > > -- > Dragoslav Zaric > > Professional Programmer > MSc Astrophysics > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From zaricdragoslav at gmail.com Sun Sep 19 10:34:37 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Sun, 19 Sep 2010 18:34:37 +0400 Subject: [Biopython] Trhird party library In-Reply-To: References: Message-ID: Thanks Rodrigo, I have come to same conclusion after little searching. Also hosting for django is very common. kind regards On Sun, Sep 19, 2010 at 5:59 PM, Rodrigo Faccioli < rodrigo_faccioli at uol.com.br> wrote: > Hi, > > I've worked with BioPython in web project. I've installed BioPython > normally > in our ubuntu server. > > My web project was developed its front-end in jsp. But I ran my scripts > with > BioPython. You can find this project in > http://glu.fcfrp.usp.br:8180/prometheus/ > > About the python frameworks, I've read Django is an excellent framework. > > Thanks in advance, > > -- > Rodrigo Antonio Faccioli > Ph.D Student in Electrical Engineering > University of Sao Paulo - USP > Engineering School of Sao Carlos - EESC > Department of Electrical Engineering - SEL > Intelligent System in Structure Bioinformatics > http://laips.sel.eesc.usp.br > Phone: 55 (16) 3373-9366 Ext 229 > Curriculum Lattes - http://lattes.cnpq.br/1025157978990218 > Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5 > > > On Sun, Sep 19, 2010 at 9:11 AM, Dragoslav Zaric > wrote: > > > Anyway, > > > > I will try simplest thing, to copy folder with biopython modules in some > > folder of web app and access modules trough absolute path of web server, > > this must work. 
> > > > At first I planned to use django web framework, but I recently discovered > > there are > > many python web frameworks. So i prefer most simplistic and effective > > frameworks, > > I will check out > > > > web.py > > > > it looks nice at first glance. > > > > Kind regards > > > > On Sun, Sep 19, 2010 at 3:02 PM, Sean Davis > wrote: > > > > > > > > > > > On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric < > > zaricdragoslav at gmail.com > > > > wrote: > > > > > >> Did anybody used biopython as third part library, like for example in > > >> python > > >> web project ? > > >> I ask this because probably you can not expect to find or install > > >> biopython > > >> in provider server > > >> environment. > > >> > > >> For example, after installing biopython in windows environment, you > can > > >> see > > >> that biopython is > > >> installed inside python 2.6 installation: > > >> > > >> C:\Python26\Lib\site-packages\Bio > > >> C:\Python26\Lib\site-packages\BioSQL > > >> C:\Python26\Lib\site-packages\numpy > > >> > > >> So can you copy these folders to, for example, \Lib\ folder of web > > >> project, > > >> and reference them > > >> somehow from code ? > > >> > > >> Of course I can test this by myself, and I will do this, but maybe > > >> somebody > > >> have experience > > >> with this problem, and it would be probably good info for others in > this > > >> forum. > > >> > > >> > > > Hi, Dragoslav. The python developers thought of this problem. 
> > > > > > http://docs.python.org/install/#alternate-installation-the-home-scheme > > > > > > Sean > > > > > > > > > > > > > > -- > > Dragoslav Zaric > > > > Professional Programmer > > MSc Astrophysics > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Wed Sep 1 16:52:51 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Sep 2010 17:52:51 +0100 Subject: [Biopython] Deprecating Bio.GenBank.LocationParser and Bio.Parsers? Message-ID: Hello all, One of the improvements in Biopython 1.55 was a re-written location parser for Bio.GenBank (which also covers EMBL parsing). This made parsing much faster, and also meant Bio.GenBank.LocationParser and the underlying Bio.Parsers and Bio.Parsers.spark modules were obsolete. I'd like to mark these as deprecated in the next release: * Bio.GenBank.LocationParser * Bio.Parsers (including Bio.Parsers.spark) Would this cause anyone a problem? Thanks, Peter From j.reid at mail.cryst.bbk.ac.uk Fri Sep 3 13:11:28 2010 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Fri, 03 Sep 2010 14:11:28 +0100 Subject: [Biopython] Wrong instance length bug in MEME parser Message-ID: Hi, The MEME parser in biopython 1.55 seems to incorrectly set the length of the first instance of a motif to 0. 
Here is an example: #Sequence, start, length, site Motif: E-value: 0.000010 seq_3, 213, 0, AGGTGACAGAG seq_1, 146, 11, AGGTGACAGAG seq_0, 490, 11, AGGTGACAGAG seq_0, 83, 11, AGGTGACAGAG seq_0, 388, 11, AGGAAACAGAG seq_1, 422, 11, AGGGGACAGAG seq_1, 79, 11, TGGAGACAGAG seq_0, 281, 11, TGGGGACAGAG seq_0, 16, 11, TAGAGACAGAG seq_1, 228, 11, TTGTGACAGAG seq_4, 156, 11, AGGGGACAGGG seq_0, 348, 11, AGGAAAGAGAA seq_0, 374, 11, AGGAATGAGAG seq_5, 22, 11, GGGAAACTGAG seq_3, 486, 11, AAGGGAGTGAG Here's the code that generated the above: from Bio.Motif.Parsers.MEME import MEMEParser import cStringIO meme_output = cStringIO.StringIO(""" ******************************************************************************** MEME - Motif discovery tool ******************************************************************************** MEME version 4.3.0 (Release date: Sat Sep 26 01:51:56 PDT 2009) For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.nbcr.net. This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.nbcr.net. ******************************************************************************** ******************************************************************************** REFERENCE ******************************************************************************** If you use this program in your research, please cite: Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994. 
******************************************************************************** ******************************************************************************** TRAINING SET ******************************************************************************** DATAFILE= /home/john/Data/Tompa-data-set/Real/hm22r.fasta ALPHABET= ACGT Sequence name Weight Length Sequence name Weight Length ------------- ------ ------ ------------- ------ ------ seq_0 1.0000 500 seq_1 1.0000 500 seq_2 1.0000 500 seq_3 1.0000 500 seq_4 1.0000 500 seq_5 1.0000 500 ******************************************************************************** ******************************************************************************** COMMAND LINE SUMMARY ******************************************************************************** This information can also be useful in the event you wish to report a problem with the MEME software. command: meme /home/john/Data/Tompa-data-set/Real/hm22r.fasta -maxsize 1000000 -oc output/run_dataset/Tompa/hm22r/Real -dna -mod anr -revcomp -print_starts -maxiter 1000 -minw 8 -maxw 20 -minsites 2 -nmotifs 1 model: mod= anr nmotifs= 1 evt= inf object function= E-value of product of p-values width: minw= 8 maxw= 20 minic= 0.00 width: wg= 11 ws= 1 endgaps= yes nsites: minsites= 2 maxsites= 30 wnsites= 0.8 theta: prob= 1 spmap= uni spfuzz= 0.5 global: substring= yes branching= no wbranch= no em: prior= dirichlet b= 0.01 maxiter= 1000 distance= 1e-05 data: n= 3000 N= 6 strands: + - sample: seed= 0 seqfrac= 1 Letter frequencies in dataset: A 0.195 C 0.305 G 0.305 T 0.195 Background letter frequencies (from dataset with add-one prior applied): A 0.195 C 0.305 G 0.305 T 0.195 ******************************************************************************** ******************************************************************************** MOTIF 1 width = 11 sites = 15 llr = 159 E-value = 9.8e-006 ******************************************************************************** 
-------------------------------------------------------------------------------- Motif 1 Description -------------------------------------------------------------------------------- Simplified A 71:439:9:91 pos.-specific C ::::::8:::: probability G 18a37:2:a19 matrix T 31:3:1:1::: bits 2.4 2.1 * 1.9 * * * 1.6 * * *** Relative 1.4 * * **** Entropy 1.2 * * * **** (15.3 bits) 0.9 *** ******* 0.7 *********** 0.5 *********** 0.2 *********** 0.0 ----------- Multilevel AGGAGACAGAG consensus T TA G sequence G -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Strand Start P-value Site ------------- ------ ----- --------- ----------- seq_3 - 213 4.54e-07 GGCCTTTGGA AGGTGACAGAG GCGCGGCCAC seq_1 - 146 4.54e-07 CCCAACAGGA AGGTGACAGAG GTGGCTCTGG seq_0 + 490 4.54e-07 AAAACAGCAG AGGTGACAGAG seq_0 - 83 4.54e-07 CCCAGCAGGA AGGTGACAGAG GTGGCTCTGG seq_0 + 388 5.99e-07 ATGAGAGGAG AGGAAACAGAG CTTCCTGGAC seq_1 + 422 1.10e-06 ATGAGAGGGG AGGGGACAGAG GACACCTGAA seq_1 + 79 1.33e-06 TTGGTGGTAC TGGAGACAGAG GGCTGGTCCC seq_0 + 281 3.17e-06 CCTCCCCTGA TGGGGACAGAG GTCTCATCAG seq_0 + 16 5.72e-06 CTGGTGACAC TAGAGACAGAG GGCTGGTCCC seq_1 - 228 1.18e-05 TTATTTTCCT TTGTGACAGAG AAACCCAGCA seq_4 + 156 2.07e-05 TCAAGTCCCA AGGGGACAGGG AGCAGAAGGG seq_0 + 348 2.47e-05 GTAGACAGAA AGGAAAGAGAA AGTAAGGACA seq_0 + 374 3.14e-05 GGACAAAGGT AGGAATGAGAG GAGAGGAAAC seq_5 - 22 4.53e-05 CTCTTGTGTA GGGAAACTGAG CACGGGGAAC seq_3 + 486 5.02e-05 CGCCAATGGG AAGGGAGTGAG TGCC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM 
------------- ---------------- ------------- seq_3 5e-05 212_[-1]_262_[+1]_4 seq_1 1.2e-05 78_[+1]_56_[-1]_71_[-1]_183_[+1]_68 seq_0 3.2e-06 15_[+1]_56_[-1]_187_[+1]_56_[+1]_ 15_[+1]_3_[+1]_91_[+1] seq_4 2.1e-05 155_[+1]_334 seq_5 4.5e-05 21_[-1]_468 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 1 width=11 seqs=15 seq_3 ( 213) AGGTGACAGAG 1 seq_1 ( 146) AGGTGACAGAG 1 seq_0 ( 490) AGGTGACAGAG 1 seq_0 ( 83) AGGTGACAGAG 1 seq_0 ( 388) AGGAAACAGAG 1 seq_1 ( 422) AGGGGACAGAG 1 seq_1 ( 79) TGGAGACAGAG 1 seq_0 ( 281) TGGGGACAGAG 1 seq_0 ( 16) TAGAGACAGAG 1 seq_1 ( 228) TTGTGACAGAG 1 seq_4 ( 156) AGGGGACAGGG 1 seq_0 ( 348) AGGAAAGAGAA 1 seq_0 ( 374) AGGAATGAGAG 1 seq_5 ( 22) GGGAAACTGAG 1 seq_3 ( 486) AAGGGAGTGAG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 11 n= 2940 bayes= 6.7534 E= 9.8e-006 177 -1055 -219 45 -55 -1055 139 -155 -1055 -1055 171 -1055 103 -1055 -19 77 45 -1055 127 -1055 226 -1055 -1055 -155 -1055 139 -61 -1055 215 -1055 -1055 -55 -1055 -1055 171 -1055 226 -1055 -219 -1055 -155 -1055 161 -1055 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 11 nsites= 15 E= 9.8e-006 0.666667 0.000000 0.066667 0.266667 0.133333 0.000000 0.800000 0.066667 0.000000 0.000000 1.000000 0.000000 0.400000 
0.000000 0.266667 0.333333 0.266667 0.000000 0.733333 0.000000 0.933333 0.000000 0.000000 0.066667 0.000000 0.800000 0.200000 0.000000 0.866667 0.000000 0.000000 0.133333 0.000000 0.000000 1.000000 0.000000 0.933333 0.000000 0.066667 0.000000 0.066667 0.000000 0.933333 0.000000 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 regular expression -------------------------------------------------------------------------------- [AT]GG[ATG][GA]A[CG]AGAG -------------------------------------------------------------------------------- Time 3.78 secs. ******************************************************************************** ******************************************************************************** SUMMARY OF MOTIFS ******************************************************************************** -------------------------------------------------------------------------------- Combined block diagrams: non-overlapping sites with p-value < 0.0001 -------------------------------------------------------------------------------- SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- seq_0 4.45e-04 15_[+1(5.72e-06)]_56_[-1(4.54e-07)]_187_[+1(3.17e-06)]_56_[+1(2.47e-05)]_15_[+1(3.14e-05)]_3_[+1(5.99e-07)]_91_[+1(4.54e-07)] seq_1 4.45e-04 78_[+1(1.33e-06)]_56_[-1(4.54e-07)]_71_[-1(1.18e-05)]_183_[+1(1.10e-06)]_68 seq_2 2.03e-01 500 seq_3 4.45e-04 212_[-1(4.54e-07)]_262_[+1(5.02e-05)]_4 seq_4 2.01e-02 155_[+1(2.07e-05)]_334 seq_5 4.34e-02 21_[-1(4.53e-05)]_468 -------------------------------------------------------------------------------- ******************************************************************************** ******************************************************************************** Stopped because nmotifs = 1 reached. 
******************************************************************************** CPU: john-dell ******************************************************************************** """) parser = MEMEParser() parsed = parser.parse(meme_output) print '#Sequence, start, length, site' for motif in parsed.motifs: print 'Motif: E-value: %f' % motif.evalue for instance in motif.instances: print "%10s, %5d, %5d, %s" % ( instance.sequence_name, instance.start, instance.length, str(instance), ) #assert instance.length == motif.length From biopython at maubp.freeserve.co.uk Fri Sep 3 13:44:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 3 Sep 2010 14:44:27 +0100 Subject: [Biopython] Wrong instance length bug in MEME parser In-Reply-To: References: Message-ID: On Fri, Sep 3, 2010 at 2:11 PM, John Reid wrote: > Hi, > > The MEME parser in biopython 1.55 seems to incorrectly set the length of the > first instance of a motif to 0. Here is an example: > ... Could you file a bug with all that useful information? http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython Thanks, Peter From bartek at rezolwenta.eu.org Fri Sep 3 14:52:32 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Fri, 3 Sep 2010 16:52:32 +0200 Subject: [Biopython] Wrong instance length bug in MEME parser In-Reply-To: References: Message-ID: On Fri, Sep 3, 2010 at 3:11 PM, John Reid wrote: > Hi, > > The MEME parser in biopython 1.55 seems to incorrectly set the length of > the first instance of a motif to 0. Here is an example: > > /cut/ Hi, Thanks for reporting the bug. It is fixed now in the main branch (small change, you can see the diff here : http://github.com/biopython/biopython/commit/102ad30a8c5d8bd87847000b33f771b40143e743 I'm closing the bug now, if you find anything else, please let us know. 
thanks
Bartek

From mitlox at op.pl Mon Sep 6 13:24:14 2010
From: mitlox at op.pl (xyz)
Date: Mon, 06 Sep 2010 23:24:14 +1000
Subject: [Biopython] reading two fastq files at the same time
Message-ID: <4C84EB7E.90200@op.pl>

Hi,

How is it possible to read two fastq files at the same time in BioPython? I have the following BioRuby example:

require 'bio'

begin
  fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
  fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])

  while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)

    fastq_A1 = entry1.entry_id
    fastq_A2 = entry1.seq

    fastq_B1 = entry2.entry_id
    fastq_B2 = entry2.seq
  end

rescue => err
  raise "Exception: #{err}"
end

Thank you in advance.

From biopython at maubp.freeserve.co.uk Mon Sep 6 13:51:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Sep 2010 14:51:13 +0100
Subject: [Biopython] reading two fastq files at the same time
In-Reply-To: <4C84EB7E.90200@op.pl>
References: <4C84EB7E.90200@op.pl>
Message-ID:

On Mon, Sep 6, 2010 at 2:24 PM, xyz wrote:
> Hi,
> How is it possible to read two fastq files at the same time in BioPython? I
> have the following BioRuby example:
>
> require 'bio'
>
> begin
>   fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
>   fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])
>
>   while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)
>
>     fastq_A1 = entry1.entry_id
>     fastq_A2 = entry1.seq
>
>     fastq_B1 = entry2.entry_id
>     fastq_B2 = entry2.seq
>   end
>
> rescue => err
>   raise "Exception: #{err}"
> end
>
> Thank you in advance.

Hi,

If you are using Python 2.6+ then probably itertools.izip_longest would do what you want. You could use itertools.izip, but it stops at the end of the shorter file, so it won't catch the error condition when one file has more records than the other.
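A minimal sketch of the izip_longest approach, assuming Python 2.6+ (the same function is called zip_longest in Python 3, hence the fallback import). The helper name paired_records and the sentinel object are illustrative only and are not part of Biopython; the helper accepts any two iterators, such as those returned by SeqIO.parse.

```python
try:
    from itertools import izip_longest  # Python 2.6+
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3 name


def paired_records(iter1, iter2):
    """Yield (rec1, rec2) pairs; raise ValueError if one input ends first.

    Hypothetical helper for illustration, not a Biopython function.
    """
    missing = object()  # unique sentinel marking an exhausted iterator
    for rec1, rec2 in izip_longest(iter1, iter2, fillvalue=missing):
        if rec1 is missing or rec2 is missing:
            raise ValueError("Diff record count")
        yield rec1, rec2
```

With Biopython this might be called as paired_records(SeqIO.parse(filename1, "fastq"), SeqIO.parse(filename2, "fastq")), where filename1 and filename2 are placeholders for the two FASTQ files.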
Alternatively you could use something like this,

from Bio import SeqIO

iter1 = SeqIO.parse(filename1, "fastq")
iter2 = SeqIO.parse(filename2, "fastq")
while True:
    # Fetch the next record from each file, using None once a file is exhausted
    try:
        rec1 = iter1.next()
    except StopIteration:
        rec1 = None
    try:
        rec2 = iter2.next()
    except StopIteration:
        rec2 = None
    if rec1 is None and rec2 is None:
        break  # end of both files
    elif rec1 is None or rec2 is None:
        raise ValueError("Diff record count")
    else:
        print rec1.seq, rec1.id
        print rec2.seq, rec2.id

I haven't tested that but it is based on a similar example in Bio.SeqIO.QualityIO.PairedFastaQualIterator for a paired FASTA and QUAL file.

Peter

From biopython at maubp.freeserve.co.uk Thu Sep 9 17:13:34 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Sep 2010 18:13:34 +0100
Subject: [Biopython] Deprecating Bio.GenBank.LocationParser and Bio.Parsers?
In-Reply-To:
References:
Message-ID:

On Wed, Sep 1, 2010 at 5:52 PM, Peter wrote:
> Hello all,
>
> One of the improvements in Biopython 1.55 was a re-written location
> parser for Bio.GenBank (which also covers EMBL parsing). This made
> parsing much faster, and also meant Bio.GenBank.LocationParser
> and the underlying Bio.Parsers and Bio.Parsers.spark modules were
> obsolete. I'd like to mark these as deprecated in the next release:
>
> * Bio.GenBank.LocationParser
> * Bio.Parsers (including Bio.Parsers.spark)
>
> Would this cause anyone a problem?
>
> Thanks,

I've just added the deprecation warnings to the code, ready for Biopython 1.56 - it is not too late to undo this if anyone is using this code, but you need to tell us.

Peter

From margeemail at gmail.com Fri Sep 10 04:10:23 2010
From: margeemail at gmail.com (mailing list)
Date: Fri, 10 Sep 2010 00:10:23 -0400
Subject: [Biopython] Added Biopython to my web tool
Message-ID:

I made a web application (http://utilitymill.com) that lets people make online utilities with Python. I thought you guys might appreciate that I added Biopython as a built-in library users can use in their utilities.
Here's an example of a utility using Biopython: http://utilitymill.com/utility/RNA_Transcription (It's very simple, I just wanted to try it out.)

I'm curious to know if it's useful to you guys. And I'm also hoping I installed everything correctly, so let me know if anything doesn't work.

-Greg

From mjldehoon at yahoo.com Sat Sep 11 05:29:32 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 10 Sep 2010 22:29:32 -0700 (PDT)
Subject: [Biopython] Parsing XML returned by efetch from the Journals database
Message-ID: <825067.71139.qm@web62404.mail.re1.yahoo.com>

Dear users,

The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities as long as the corresponding DTDs are available (these are included with each release of Biopython). One corner case is efetch results from the Journals database. Officially, efetch from the Journals database does not generate output in the XML format, but only plain text or HTML. However, when requesting XML explicitly from Entrez, in practice it does return an XML-like output. Our parser in Bio.Entrez is able to parse this XML, but it requires several hacks in the parser code. To make the parser more stable for other XML documents, I'd like to remove these hacks. Is anybody currently using Bio.Entrez to parse XML returned by efetch from the Journals database?

--Michiel.

From natassa_g_2000 at yahoo.com Mon Sep 13 16:22:26 2010
From: natassa_g_2000 at yahoo.com (natassa)
Date: Mon, 13 Sep 2010 09:22:26 -0700 (PDT)
Subject: [Biopython] Codeml parser in Biopython?
Message-ID: <533513.93597.qm@web52005.mail.re2.yahoo.com>

Hello,

I was wondering if there is a Biopython solution to parsing codeml results from PAML. The output files are pretty standard, so such a parser should be quite straightforward to write up. I'd volunteer for this, but thought I might check first if somebody else has done this.
Actually, I found a read-only pypaml interface on Google Code, tried it out and realized I had to edit several things to even import it (in Python 2.5), which is quite strange: it was mainly built-in methods that threw errors. Anyway, I 'corrected' this and then realized that the output files assumed by this code may not be the same as mine, although again, the outputs of codeml are pretty standard. I am not sure how much this code is used, and I could not find the developer's email to ask him some questions.

I am interested in parsing outputs from Branch, Site and BranchSite models, so everything that codeml can do. Any information from experienced users is welcome!

Thanks,
Anastasia Gioti

From biopython at maubp.freeserve.co.uk Mon Sep 13 16:45:28 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 13 Sep 2010 17:45:28 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <533513.93597.qm@web52005.mail.re2.yahoo.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
Message-ID:

On Mon, Sep 13, 2010 at 5:22 PM, natassa wrote:
> Hello,
> I was wondering if there is a Biopython solution to parsing codeml results from
> PAML. The output files are pretty standard, so such a parser should be quite
> straightforward to write up. I'd volunteer for this, but thought I might check
> first if somebody else has done this. Actually, I found a read-only pypaml
> interface on Google Code, tried it out and realized I had to edit several
> things to even import it (in Python 2.5), which is quite strange: it was mainly
> built-in methods that threw errors. Anyway, I 'corrected' this and then
> realized that the output files assumed by this code may not be the same as mine,
> although again, the outputs of codeml are pretty standard. I am not sure how
> much this code is used, and I could not find the developer's email to ask
> him some questions.
>
> I am interested in parsing outputs from Branch, Site and BranchSite models, so
> everything that codeml can do. Any information from experienced users is welcome!
> Thanks,
> Anastasia Gioti

Hi Anastasia,

Could you post a short example of the kind of output you are looking at?

Can you get codeml to output what you need in another format, such as NEXUS?

Peter

From p.j.a.cock at googlemail.com Mon Sep 13 20:40:30 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 13 Sep 2010 21:40:30 +0100
Subject: [Biopython] Fwd: problems searching swiss prot
In-Reply-To:
References:
Message-ID:

Forwarding a query from Jessica Grant since she appears to have had trouble posting to the mailing list.

Jessica wrote:

> Hello,
>
> I am running a few scripts to try to extract sequence information
> out of UniProt. One program called AutoFACT gives me ID numbers
> associated with that database. Most of these look like this:
>
> D2V5S4_NAEGR
> Q48KU2_PSE14
> Q22B72_TETTH
>
>
> and my downstream scripts, which are written in Biopython, are
> fine with this. Then, every once in a while, a sequence will come
> back with a name that looks like this:
>
> UPI00006CC162
>
> and everything goes bad. My script can't handle these names,
> apparently, although if I go to uniprot.org and search for it, the
> sequence comes up.
>
> My script uses the following, where RepID is the number
> extracted from AutoFACT:
>
>        handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>        seq_record = SeqIO.read(handle, "swiss")
>
> Any thoughts?
>
> Thank you,
>
> Jessica

Hi Jessica,

I think the problem is that these unusual identifiers are not UniProt/SwissProt accession identifiers. The URL this Biopython function uses was originally from www.expasy.ch but is now on www.uniprot.org as described here:

http://www.expasy.ch/expasy_urls.html

I think the ID UPI00006CC162 is a UniProt ID of some kind, so it may be possible to access the information you want somehow.
See for example: http://www.uniprot.org/uniparc/UPI00006CC162 However, it is not clear to me right away if you can get this record back as a plain text "swiss" format entry... Peter From natassa_g_2000 at yahoo.com Tue Sep 14 08:02:18 2010 From: natassa_g_2000 at yahoo.com (natassa) Date: Tue, 14 Sep 2010 01:02:18 -0700 (PDT) Subject: [Biopython] Codeml parser in Biopython? In-Reply-To: References: <533513.93597.qm@web52005.mail.re2.yahoo.com> Message-ID: <937371.74873.qm@web52006.mail.re2.yahoo.com> Hi Peter, Could you post a short example of the kind of output you are looking at? Here is an example output, but this caan differ depending on the model used (there are several models for Branch, Site, BranchSite, but all are pretty standard) -------------------------------------------------------------------------------OUTPUT------------------------- seed used = 808671289 CODONML (in paml version 4.4, January 2010) align.phy Model: One dN/dS ratio for branches Codon frequency model: F3x4 Site-class models: NearlyNeutral ns = 7 ls = 861 Codon usage in sequences -------------------------------------------------------------------------------------------------------------- Phe TTT 12 14 15 14 14 12 | Ser TCT 6 11 12 8 10 6 | Tyr TAT 5 5 4 7 9 5 | Cys TGT 11 8 10 9 11 8 TTC 23 18 18 20 20 20 | TCC 16 13 16 19 16 18 | TAC 11 12 13 17 11 13 | TGC 6 2 6 6 4 6 Leu TTA 8 5 6 5 4 2 | TCA 17 16 18 20 21 15 | *** TAA 0 0 0 0 0 0 | *** TGA 0 0 0 0 0 0 TTG 13 11 11 15 15 17 | TCG 17 14 14 17 17 18 | TAG 0 0 0 0 0 0 | Trp TGG 9 8 8 11 8 7 -------------------------------------------------------------------------------------------------------------- Leu CTT 13 15 16 11 12 16 | Pro CCT 7 7 10 6 10 8 | His CAT 8 7 8 4 6 5 | Arg CGT 6 4 5 4 5 5 CTC 14 14 13 19 14 15 | CCC 20 13 16 24 19 20 | CAC 23 18 22 20 24 17 | CGC 14 13 15 14 14 15 CTA 6 4 8 7 6 9 | CCA 19 18 19 11 17 16 | Gln CAA 20 16 20 21 18 13 | CGA 8 4 6 5 6 6 CTG 17 17 14 14 16 10 | CCG 7 8 8 9 6 8 | CAG 18 14 15 14 14 13 | 
CGG 7 7 8 9 9 8 -------------------------------------------------------------------------------------------------------------- Ile ATT 6 7 9 5 7 6 | Thr ACT 5 7 7 7 5 4 | Asn AAT 3 3 4 2 5 2 | Ser AGT 7 7 9 8 7 7 ATC 16 13 15 23 14 16 | ACC 21 14 17 20 20 16 | AAC 12 14 14 21 14 11 | AGC 14 13 14 15 11 10 ATA 13 9 10 11 11 10 | ACA 19 17 22 22 28 18 | Lys AAA 17 8 13 9 13 12 | Arg AGA 11 5 8 4 6 5 Met ATG 23 21 23 22 23 20 | ACG 11 12 12 12 14 13 | AAG 18 15 19 19 18 18 | AGG 9 10 13 14 12 13 -------------------------------------------------------------------------------------------------------------- Val GTT 8 13 10 10 10 6 | Ala GCT 13 10 12 12 14 13 | Asp GAT 18 18 17 15 15 17 | Gly GGT 13 7 12 10 11 10 GTC 18 13 18 20 19 21 | GCC 28 26 28 28 28 23 | GAC 29 21 26 33 29 30 | GGC 9 9 8 7 12 8 GTA 8 8 9 7 6 7 | GCA 22 22 24 17 23 19 | Glu GAA 27 24 24 27 21 22 | GGA 7 7 10 9 7 9 GTG 13 11 14 13 13 9 | GCG 11 10 10 10 7 7 | GAG 14 14 17 13 19 17 | GGG 7 6 9 8 7 9 -------------------------------------------------------------------------------------------------------------- -------------------------------------------------- Phe TTT 12 | Ser TCT 8 | Tyr TAT 6 | Cys TGT 8 TTC 22 | TCC 18 | TAC 15 | TGC 6 Leu TTA 5 | TCA 22 | *** TAA 0 | *** TGA 0 TTG 17 | TCG 17 | TAG 0 | Trp TGG 9 -------------------------------------------------- Leu CTT 14 | Pro CCT 12 | His CAT 5 | Arg CGT 6 CTC 19 | CCC 20 | CAC 20 | CGC 13 CTA 10 | CCA 16 | Gln CAA 17 | CGA 5 CTG 8 | CCG 11 | CAG 15 | CGG 8 -------------------------------------------------- Ile ATT 5 | Thr ACT 4 | Asn AAT 4 | Ser AGT 7 ATC 20 | ACC 21 | AAC 14 | AGC 12 ATA 11 | ACA 29 | Lys AAA 11 | Arg AGA 4 Met ATG 25 | ACG 15 | AAG 23 | AGG 13 -------------------------------------------------- Val GTT 10 | Ala GCT 13 | Asp GAT 16 | Gly GGT 7 GTC 18 | GCC 26 | GAC 33 | GGC 11 GTA 7 | GCA 24 | Glu GAA 23 | GGA 11 GTG 10 | GCG 8 | GAG 15 | GGG 11 -------------------------------------------------- Codon position x base (3x4) table 
for each sequence. #1: species1 position 1: T:0.18989 C:0.25524 A:0.25277 G:0.30210 position 2: T:0.26017 C:0.29470 A:0.27497 G:0.17016 position 3: T:0.17386 C:0.33785 A:0.24908 G:0.23921 Average T:0.20797 C:0.29593 A:0.25894 G:0.23716 #2: species2 position 1: T:0.19296 C:0.25211 A:0.24648 G:0.30845 position 2: T:0.27183 C:0.30704 A:0.26620 G:0.15493 position 3: T:0.20141 C:0.31831 A:0.22958 G:0.25070 Average T:0.22207 C:0.29249 A:0.24742 G:0.23803 #3: species3 position 1: T:0.18619 C:0.25031 A:0.25771 G:0.30580 position 2: T:0.25771 C:0.30210 A:0.26634 G:0.17386 position 3: T:0.19729 C:0.31936 A:0.24291 G:0.24044 Average T:0.21373 C:0.29059 A:0.25565 G:0.24003 #4: species4 position 1: T:0.20664 C:0.23616 A:0.26322 G:0.29397 position 2: T:0.26568 C:0.29766 A:0.27306 G:0.16359 position 3: T:0.16236 C:0.37638 A:0.21525 G:0.24600 Average T:0.21156 C:0.30340 A:0.25051 G:0.23452 #5: species5 position 1: T:0.19876 C:0.24348 A:0.25839 G:0.29938 position 2: T:0.25342 C:0.31677 A:0.26832 G:0.16149 position 3: T:0.18758 C:0.33416 A:0.23230 G:0.24596 Average T:0.21325 C:0.29814 A:0.25300 G:0.23561 #6: species6 position 1: T:0.19892 C:0.24899 A:0.24493 G:0.30717 position 2: T:0.26522 C:0.30041 A:0.26387 G:0.17050 position 3: T:0.17591 C:0.35047 A:0.22057 G:0.25304 Average T:0.21335 C:0.29995 A:0.24312 G:0.24357 #7: species7 position 1: T:0.20000 C:0.24121 A:0.26424 G:0.29455 position 2: T:0.25818 C:0.32000 A:0.26303 G:0.15879 position 3: T:0.16606 C:0.34909 A:0.23636 G:0.24848 Average T:0.20808 C:0.30343 A:0.25455 G:0.23394 Sums of codon usage counts ------------------------------------------------------------------------------ Phe F TTT 93 | Ser S TCT 61 | Tyr Y TAT 41 | Cys C TGT 65 TTC 141 | TCC 116 | TAC 92 | TGC 36 Leu L TTA 35 | TCA 129 | *** * TAA 0 | *** * TGA 0 TTG 99 | TCG 114 | TAG 0 | Trp W TGG 60 ------------------------------------------------------------------------------ Leu L CTT 97 | Pro P CCT 60 | His H CAT 43 | Arg R CGT 35 CTC 108 | CCC 132 | CAC 144 | CGC 
98 CTA 50 | CCA 116 | Gln Q CAA 125 | CGA 40 CTG 96 | CCG 57 | CAG 103 | CGG 56 ------------------------------------------------------------------------------ Ile I ATT 45 | Thr T ACT 39 | Asn N AAT 23 | Ser S AGT 52 ATC 117 | ACC 129 | AAC 100 | AGC 89 ATA 75 | ACA 155 | Lys K AAA 83 | Arg R AGA 43 Met M ATG 157 | ACG 89 | AAG 130 | AGG 84 ------------------------------------------------------------------------------ Val V GTT 67 | Ala A GCT 87 | Asp D GAT 116 | Gly G GGT 70 GTC 127 | GCC 187 | GAC 201 | GGC 64 GTA 52 | GCA 151 | Glu E GAA 168 | GGA 60 GTG 83 | GCG 63 | GAG 109 | GGG 57 ------------------------------------------------------------------------------ (Ambiguity data are not used in the counts.) Codon position x base (3x4) table, overall position 1: T:0.19623 C:0.24664 A:0.25571 G:0.30141 position 2: T:0.26152 C:0.30559 A:0.26804 G:0.16485 position 3: T:0.18027 C:0.34113 A:0.23250 G:0.24610 Average T:0.21267 C:0.29779 A:0.25209 G:0.23746 Nei & Gojobori 1986. dN/dS (dN, dS) (Pairwise deletion) (Note: This matrix is not used in later ML. analysis. Use runmode = -2 for ML pairwise comparison.) 
species1 species2 0.2598 (0.0599 0.2306) species3 0.2532 (0.0528 0.2085) 0.2778 (0.0189 0.0680) species4 0.2815 (0.1116 0.3966) 0.1905 (0.0738 0.3873) 0.2555 (0.0981 0.3838) species5 0.2780 (0.0654 0.2351) 0.2611 (0.0631 0.2419) 0.2487 (0.0552 0.2221) 0.2993 (0.0908 0.3034) species6 0.2041 (0.0693 0.3396) 0.1785 (0.0613 0.3437) 0.2147 (0.0644 0.2997) 0.2510 (0.0598 0.2384) 0.2261 (0.0511 0.2260) species7 0.2374 (0.0890 0.3748) 0.2080 (0.0819 0.3935) 0.2272 (0.0787 0.3465) 0.2415 (0.0676 0.2797) 0.2646 (0.0731 0.2764) 0.1821 (0.0176 0.0967) TREE # 1: (((1, (2, 3)), 5), (6, 4), 7); MP score: -1 lnL(ntime: 11 np: 14): -7469.732728 +0.000000 8..9 9..10 10..1 10..11 11..2 11..3 9..5 8..12 12..6 12..4 8..7 0.179837 0.082919 0.172587 0.087525 0.067422 0.032013 0.124010 0.001030 0.062291 0.297695 0.117429 2.800021 0.731929 0.083728 Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site). tree length = 1.22476 (((1: 0.172587, (2: 0.067422, 3: 0.032013): 0.087525): 0.082919, 5: 0.124010): 0.179837, (6: 0.062291, 4: 0.297695): 0.001030, 7: 0.117429); (((species1: 0.172587, (species2: 0.067422, species3: 0.032013): 0.087525): 0.082919, species5: 0.124010): 0.179837, (species6: 0.062291, species4: 0.297695): 0.001030, species7: 0.117429); Detailed output identifying parameters kappa (ts/tv) = 2.80002 dN/dS (w) for site classes (K=2) p: 0.73193 0.26807 w: 0.08373 1.00000 dN & dS for each branch branch t N S dN/dS dN dS N*dN S*dS 8..9 0.180 1857.3 725.7 0.3294 0.0381 0.1158 70.8 84.0 9..10 0.083 1857.3 725.7 0.3294 0.0176 0.0534 32.7 38.7 10..1 0.173 1857.3 725.7 0.3294 0.0366 0.1111 68.0 80.6 10..11 0.088 1857.3 725.7 0.3294 0.0186 0.0563 34.5 40.9 11..2 0.067 1857.3 725.7 0.3294 0.0143 0.0434 26.6 31.5 11..3 0.032 1857.3 725.7 0.3294 0.0068 0.0206 12.6 15.0 9..5 0.124 1857.3 725.7 0.3294 0.0263 0.0798 48.8 57.9 8..12 0.001 1857.3 725.7 0.3294 0.0002 0.0007 0.4 0.5 12..6 0.062 1857.3 725.7 0.3294 0.0132 0.0401 24.5 29.1 12..4 
0.298 1857.3 725.7 0.3294 0.0631 0.1917 117.2 139.1 8..7 0.117 1857.3 725.7 0.3294 0.0249 0.0756 46.2 54.9 Time used: 0:10 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Can you get codeml to output what you need in another format, such as NEXUS? Haven't tried that, but as you can see, this is a very verbose output and NEXUS does not seem an option. Ultimately, I want to parse this to get all the information I need in a tabulated file. I am still working out what exactly I need (there are standard values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type of downstram analysis). I will now work on the pypaml class and modify the original code to make it more generic (it seems that it only works for Site Models). Will let you know, was just wondering if there was already a solution.There is one in Bioperl, but heard it is very slow and in any case, I don't understand much of perl.... Thanks, Anastasia From biopython at maubp.freeserve.co.uk Tue Sep 14 09:04:56 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Sep 2010 10:04:56 +0100 Subject: [Biopython] Codeml parser in Biopython? In-Reply-To: <937371.74873.qm@web52006.mail.re2.yahoo.com> References: <533513.93597.qm@web52005.mail.re2.yahoo.com> <937371.74873.qm@web52006.mail.re2.yahoo.com> Message-ID: Hi Anastasia, On Tue, Sep 14, 2010 at 9:02 AM, natassa wrote: > Hi Peter, > >> >> Could you post a short example of the kind of output you are looking at? >> > > Here is an example output, but this caan differ depending on the model used > (there are several models for Branch, Site, BranchSite, but all are pretty > standard) > Thanks - that looks possible to parse, but not very easy (especially if the codeml output changes slightly between versions). >> >> Can you get codeml to output what you need in another format, such as NEXUS? 
>>
>
> Haven't tried that, but as you can see, this is a very verbose output and
> NEXUS does not seem an option.

At first glance, the NEXUS format could hold a lot of that information. Another possibility might be phyloXML. However, you are at the mercy of the codeml tool and what it supports. It might be worth politely asking the author(s) about supporting one of these more standard formats as an optional output.

> Ultimately, I want to parse this to get all the information I need in a
> tabulated file. I am still working out what exactly I need (there are standard
> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type
> of downstream analysis). I will now work on the pypaml class and modify the
> original code to make it more generic (it seems that it only works for Site
> Models).

Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so unless he agrees to re-license it we cannot include it in Biopython.

> Will let you know, was just wondering if there was already a solution. There is
> one in Bioperl, but heard it is very slow and in any case, I don't understand
> much of perl....

I don't know much Perl either ;)

Peter

From p.j.a.cock at googlemail.com Tue Sep 14 09:13:04 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 10:13:04 +0100
Subject: [Biopython] problems searching swiss prot
In-Reply-To:
References:
Message-ID:

On Mon, Sep 13, 2010 at 9:40 PM, Peter Cock wrote:
> Forwarding a query from Jessica Grant since she appears
> to have had trouble posting to the mailing list.
>
> Jessica wrote:
>
>> Hello,
>>
>> I am running a few scripts to try to extract sequence information
>> out of UniProt. One program called AutoFACT gives me ID numbers
>> associated with that database. Most of these look like this:
>>
>> D2V5S4_NAEGR
>> Q48KU2_PSE14
>> Q22B72_TETTH
>>
>>
>> and my downstream scripts, which are written in Biopython, are
>> fine with this.
>> Then, every once in a while, a sequence will come
>> back with a name that looks like this:
>>
>> UPI00006CC162
>>
>> and everything goes bad. My script can't handle these names,
>> apparently, although if I go to uniprot.org and search for it, the
>> sequence comes up.
>>
>> My script uses the following, where RepID is the number
>> extracted from AutoFACT:
>>
>>        handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>>        seq_record = SeqIO.read(handle, "swiss")
>>
>> Any thoughts?
>>
>> Thank you,
>>
>> Jessica
>
> Hi Jessica,
>
> I think the problem is that these unusual identifiers are
> not UniProt/SwissProt accession identifiers. The URL
> this Biopython function uses was originally from
> www.expasy.ch but is now on www.uniprot.org as
> described here:
>
> http://www.expasy.ch/expasy_urls.html
>
> I think the ID UPI00006CC162 is a UniProt ID of some
> kind, so it may be possible to access the information
> you want somehow. See for example:
>
> http://www.uniprot.org/uniparc/UPI00006CC162
>
> However, it is not clear to me right away if you can get
> this record back as a plain text "swiss" format entry...
>
> Peter

Jessica replied (off list), to say:

>> Oh, and I got great help from someone at UniProt for my
>> previous question...turns out you can get the sequences
>> downloaded as fasta files:
>>
>> http://www.uniprot.org/uniparc/UPI00006CC162.fasta
>>
>> and I could then read them into SeqIO as a fasta and
>> manipulate them that way.

I guess the UPI at the start stands for UniParc Identifier.
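The workaround Jessica describes could be sketched roughly as below. This is only an illustration: the function names are hypothetical, the "UPI" prefix test is an assumption based on the single identifier shown in this thread, and the URL pattern is copied from her message, so whether it still resolves is also an assumption. The download helper needs network access and Biopython, and is not exercised here.

```python
def is_uniparc_id(identifier):
    """Guess whether an identifier is a UniParc one (assumed: starts with 'UPI')."""
    return identifier.startswith("UPI")


def uniparc_fasta_url(upi):
    """Build the FASTA download URL, following the pattern quoted in this thread."""
    if not is_uniparc_id(upi):
        raise ValueError("Not a UniParc identifier: %r" % upi)
    return "http://www.uniprot.org/uniparc/%s.fasta" % upi


def fetch_uniparc_record(upi):
    """Download one UniParc entry and parse it as FASTA (needs network access)."""
    # Local imports so the URL helpers above work without Biopython installed.
    try:
        from urllib2 import urlopen  # Python 2
    except ImportError:
        from urllib.request import urlopen  # Python 3
    from io import StringIO
    from Bio import SeqIO

    handle = urlopen(uniparc_fasta_url(upi))
    try:
        text = handle.read().decode("ascii")  # SeqIO expects text, not bytes
    finally:
        handle.close()
    return SeqIO.read(StringIO(text), "fasta")
```

A script could then call fetch_uniparc_record(RepID) when is_uniparc_id(RepID) is true, and fall back to ExPASy.get_sprot_raw for the usual identifiers.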
Note that the page I linked to earlier has links to several file formats including FASTA, but not plain text "SwissProt" format:

http://www.uniprot.org/uniparc/UPI00006CC162

Peter

From p.j.a.cock at googlemail.com Tue Sep 14 09:49:56 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 10:49:56 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To:
References:
Message-ID:

On Tue, Sep 14, 2010 at 3:47 AM, Jessica Grant wrote:
> On Mon, Sep 13, 2010 at 9:49 PM, Peter Cock wrote:
>> On Mon, Sep 13, 2010 at 7:43 PM, Jessica Grant wrote:
>>>
>>> Hello,
>>>
>>> I am working with an organism that has an unusual genetic code. Is there
>>> a way I can use Biopython's translate() but import my own codon table
>>> instead of using the standard NCBI tables that are coded in the CodonTable
>>> module?
>>>
>>> Thanks!
>>>
>>> Jessica
>>
>> Hi Jessica,
>>
>> Good question - this is something I had thought about but not done anything
>> about, since no one had ever asked about using a non-standard table. After all,
>> the NCBI do have a pretty comprehensive list. I'm curious which organism(s) you
>> are using.
>>
>> In answer to your query, right now, not easily. However, it would be simple to
>> tweak the Bio.Seq module to allow the table argument to be a string or integer
>> as now (for referring to a built in NCBI table) or a CodonTable object which you
>> would have to supply. These are defined in the Bio.Data.CodonTable module.
>> If this sounds useful and you could help with testing, it could be done ready
>> for the next release of Biopython.
>>
>> Peter
>
> Thanks Peter,
>
> We are doing some work on a ciliate called Chilodonella uncinata. It
> apparently has only one stop codon and the others are recoded in an
> unusual way so it doesn't quite fit any of the NCBI tables.
>
> I did try to play around with the CodonTable module, but couldn't quite
> figure out how to do it.
> Just making a new table similar to the tables that
> are in the module didn't do it, and I didn't feel comfortable messing around
> in the depths of Biopython. :)
>
> I would be happy to help with testing and I guess in the meantime I will be
> putting lots of if statements in my script.
>
> Jessica

Hi Jessica,

Do you have the information for the CodonTable handy? e.g. a list of the start codons, and how to translate the 64 codons (including stop codons). Given that I could show you how to make the CodonTable object.

Peter

From p.j.a.cock at googlemail.com Tue Sep 14 10:39:34 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 11:39:34 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To:
References:
Message-ID:

On Tue, Sep 14, 2010 at 10:49 AM, Peter Cock wrote:
>
> Hi Jessica,
>
> Do you have the information for the CodonTable handy? e.g. a list of the
> start codons, and how to translate the 64 codons (including stop codons).
> Given that I could show you how to make the CodonTable object.
>
> Peter
>

I've done a proof of principle change to Bio.Seq on this branch:
http://github.com/peterjc/biopython/tree/trans-table

specifically this commit:
http://github.com/peterjc/biopython/commit/56a2fd5f92098e9be892eb51f27b08aaa46a19a6

I'm not expecting you to try this code out yet (unless you happen to know your way round git already).

The basic idea is that the Bio.Seq translate function and the Seq object translate method are extended so that the table argument can now also be a CodonTable object. Once we know what your table should look like, I can write a complete example. Probably Bio.Data.CodonTable will need some more documentation added...

Peter

From zaricdragoslav at gmail.com Tue Sep 14 11:08:55 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 15:08:55 +0400
Subject: [Biopython] Intro
Message-ID:

Dear All,

I am a new member and I would like to send a welcome greeting to everyone.
I have a few newbie questions so please be cooperative :)

1. How can I see the Biopython version, and is there a connection between the Python version and the Biopython version?

2. I have installed Python 2.6 and Biopython 1.55 on Ubuntu 9.04 (at least I think I did :)

   I have the same installation on a Windows machine and everything works fine. But for example when I want to use something like this:

   from Bio import SeqIO
   orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")

   Two problems happen in the Ubuntu environment:
   first, SeqIO complains that there is no index method;
   second, everywhere I would pass the file location as a string, Biopython wants a handle to the file.

   The first thing I can think of is that maybe I am using an old version of Biopython, which points back to question 1.

3. Does somebody have experience with using Biopython in a Django web site? Do I install Biopython on the web server, or can I keep the libraries in some folder and load them dynamically in code?

Kind regards,

Dragoslav Zaric
Programmer
MSc in Astrophysics

From biopython at maubp.freeserve.co.uk Tue Sep 14 11:44:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Sep 2010 12:44:47 +0100
Subject: [Biopython] Intro
In-Reply-To:
References:
Message-ID:

On Tue, Sep 14, 2010 at 12:08 PM, Dragoslav Zaric wrote:
> Dear All,
>
> I am a new member and I would like to send a welcome greeting to everyone.
>
> I have a few newbie questions so please be cooperative :)
>

Hello and welcome :)

> 1. How can I see the Biopython version, and is there a connection between the
> Python version and the Biopython version?

Biopython currently supports Python 2.4, 2.5, 2.6 and 2.7. Older versions of Biopython may not have worked 100% on Python 2.7, but we did previously support Python 2.3 and even older versions of Python.

There is a FAQ (frequently asked questions) section in the Tutorial on how to determine the version of Biopython installed.
Try:

import Bio
print Bio.__version__

The Tutorial for the latest release is online as PDF or HTML,
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/tutorial/Tutorial.html

> 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04
> (at least I think I did :)

Based on the problems below, I don't think it worked.

>    I have same installation on windows machine and everything works fine.
>    But for example when I want to use something like this:
>
>    from Bio import SeqIO
>    orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")
>
>    Two problems happens in ubuntu environment:
>    first is that SeqIO complains that there is no index method

That does suggest you have an old version of Biopython. The index
function was added in Biopython 1.52, see:

http://biopython.open-bio.org/SRC/biopython/NEWS
http://news.open-bio.org/news/2009/09/biopython-release-152/

>    second is that everywhere I should put string location of file
>    biopython wants handle to file

Things like Bio.SeqIO will accept filenames in recent versions of
Biopython (since release 1.54), but older versions only accepted
file handles. This is discussed in an FAQ in recent versions of the
Tutorial which point to this section on handles:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:appendix-handles

>    The first thing I can think of is maybe I am using old version of
>    biopython, which points to question 1.

That does seem to be the problem.

> 3. Does somebody have experience with using biopython in django
> web site ?  Do I install biopython on web server or I can keep libraries
> in some folder and load them dynamically in code ?

I've used Biopython within TurboGears, but I haven't used django.
You should probably consult the django documentation for how they
recommend installing 3rd party libraries (e.g. they may recommend
using a virtual environment).
Peter From zaricdragoslav at gmail.com Tue Sep 14 12:04:47 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 14 Sep 2010 16:04:47 +0400 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: Peter, Thank you so very much for detailed explanations. I will try to upgrade biopython version under linux. Kind regards, Dragoslav Zaric On Tue, Sep 14, 2010 at 3:44 PM, Peter wrote: > On Tue, Sep 14, 2010 at 12:08 PM, Dragoslav Zaric > wrote: > > Dear All, > > > > I am new member and I like to send welcome greet to everyone. > > > > I have few newbie questions so please be cooperative :) > > > > Hello and welcome :) > > > 1. How can I see biopython version and is there connection between python > > version and biopython version ? > > Biopython currently supports Python 2.4, 2.5, 2.6 and 2.7. Older versions > of Biopython may not have worked 100% on Python 2.7, but we did > previously support Python 2.3 and even older versions of Python. > > There is a FAQ (frequently asked questions) section in the Tutorial for > how to determine the version of Biopython installed. Try: > > import Bio > print Bio.__version__ > > The Tutorial for the latest release is online as PDF or HTML, > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > http://biopython.org/DIST/docs/tutorial/Tutorial.html > > > 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04 > > (at least I think I did :) > > Based on the problems below, I don't think it worked. > > > I have same installation on windows machine and everything works fine. > > But for example when I want to use something like this: > > > > from Bio import SeqIO > > orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta") > > > > Two problems happens in ubuntu environment: > > first is that SeqIO complains that there is no index method > > That does suggest you have an old version of Biopython. 
The index > function was added in Biopython 1.52, see: > > http://biopython.open-bio.org/SRC/biopython/NEWS > http://news.open-bio.org/news/2009/09/biopython-release-152/ > > > second is that everywhere I should put string location of file > > biopython wants handle to file > > Things like Bio.SeqIO will accept filenames in recent versions of > Biopython (since release 1.54), but older versions only accepted > file handles. This is discussed in an FAQ in recent versions of the > Tutorial which point to this section on handles: > http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:appendix-handles > > > The first thing I can think of is maybe I am using old version of > > biopython, which points to question 1. > > That does seem to be the problem. > > > 3. Does somebody have experience with using biopython in django > > web site ? Do I install biopython on web server or I can keep libraries > > in some folder and load them dynamically in code ? > > I've used Biopython within TurboGears, but I haven't used django. > You should probably consult the django documentation for how they > recommend installing 3rd party libraries (e.g. they may recommend > using a virtual environment). > > Peter > From bartek at rezolwenta.eu.org Tue Sep 14 12:20:44 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 14 Sep 2010 14:20:44 +0200 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 2:04 PM, Dragoslav Zaric wrote: > Peter, > > Thank you so very much for detailed explanations. > > I will try to upgrade biopython version under linux. > > Hi, Since you mentioned that you are working on ubuntu, I wanted to add that you should be careful when upgrading the python/biopython versions on your machine. 
You are most probably now running both python and biopython installed from ubuntu packages, but if you want to upgrade, you have to choose between taking newer packages from a newer ubuntu (currently the newest ubuntu 10.10 beta contains biopython 1.53 http://packages.ubuntu.com/lucid/python-biopython) or compiling from source. If you choose to install from source, be sure to first uninstall the old version from the package: sudo apt-get remove python-biopython if you want to install from source, you will need some extra packages: sudo apt-get install python-dev python-reportlab python-numpy good luck Bartek From bartek at rezolwenta.eu.org Tue Sep 14 13:13:59 2010 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 14 Sep 2010 15:13:59 +0200 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: On Tue, Sep 14, 2010 at 2:20 PM, Bartek Wilczynski wrote: > (currently the newest ubuntu 10.10 beta contains biopython 1.53 > http://packages.ubuntu.com/lucid/python-biopython) Just a small correction: 1.53 is a version from 10.4 (lucid), while 10.10 beta (maverick) contains 1.54 (http://packages.ubuntu.com/maverick/python-biopython) sorry for the error Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From zaricdragoslav at gmail.com Tue Sep 14 14:22:55 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 14 Sep 2010 18:22:55 +0400 Subject: [Biopython] Intro In-Reply-To: References: Message-ID: Thanks for answers and help ! Actually I do not prefer to use ubuntu above 9.04 and there is no reason to change distribution because one program. I just did sudo apt-get remove python-biopython and after this 1.55 was automatically activated. I did install 1.55 but it looks like older version of biopython from default package was masking new biopython version. Thanks again ! 
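A note on the "masking" situation described above: a minimal sketch, assuming a current Python 3 interpreter (the thread itself used Python 2.6), of how one might check which installed copy of a package the interpreter actually picks up. The helper name describe_package is made up for this illustration and is not part of Biopython.

```python
import importlib
import importlib.util


def describe_package(name):
    """Return (location, version) for an importable package, or None if absent.

    describe_package is a hypothetical helper written for this note.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing by that name is importable from the current sys.path.
        return None
    module = importlib.import_module(name)
    # spec.origin is the file that gets loaded; a distribution package and a
    # source install normally live in different directories, so this shows
    # which of the two is winning.
    return spec.origin, getattr(module, "__version__", "unknown")


# Prints None when Biopython is not installed at all.
print(describe_package("Bio"))
```

If the reported location is the distribution's package directory rather than the directory the source install wrote to, the packaged copy is probably the one shadowing the newer release.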
On Tue, Sep 14, 2010 at 5:13 PM, Bartek Wilczynski wrote:

>
>
> On Tue, Sep 14, 2010 at 2:20 PM, Bartek Wilczynski <
> bartek at rezolwenta.eu.org> wrote:
>
>> (currently the newest ubuntu 10.10 beta contains biopython 1.53
>> http://packages.ubuntu.com/lucid/python-biopython)
>
>
> Just a small correction:
> 1.53 is a version from 10.4 (lucid), while 10.10 beta (maverick) contains
> 1.54 (http://packages.ubuntu.com/maverick/python-biopython)
>
> sorry for the error
> Bartek
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
>

--
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From zaricdragoslav at gmail.com Tue Sep 14 14:29:27 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 18:29:27 +0400
Subject: [Biopython] Some books
Message-ID:

Dear friends,

I do not come from bioinformatics background, so can anybody recommend
some introducing book about bioinformatics so I can cover the basics.
Of course there are a lot of python programming in biopython that is out
of biology (like parsing of database files, connect to databases), but to
get clear picture it is good to read some introducing book.

Is book "Introduction to Bioinformatics" by Arthur Lesk good one ?

Kind regards

--
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From p.j.a.cock at googlemail.com Tue Sep 14 14:58:05 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 15:58:05 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To:
References:
Message-ID:

On Tue, Sep 14, 2010 at 2:44 PM, Jessica Grant wrote:
> Hi Peter,
>
> Here is the codon table, in the format I found in CodonTable.py.
>
> I will look at the links you sent, but I don't know if I will be able to
> follow it all. Thanks,
>
> Jessica
>
>                    table = {
>     'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S',
>     'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y',
>     'TGT': 'C', 'TGC': 'C', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L',
>     'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P',
>     'CCG': 'P', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
>     'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'ATT': 'I',
>     'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T',
>     'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 'AAA': 'K',
>     'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
>     'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A',
>     'GCC': 'A', 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D',
>     'GAA': 'E', 'GAG': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G',
>     'GGG': 'G', 'TAG': 'Q', 'TGA': 'W',},
>                    stop_codons = ['TAA' ],
>                    start_codons = [ 'ATG']
>                    )

OK, don't worry about the git branch stuff - I've just merged this
to the main repository. Are you happy with installing Biopython
from source?
If so grab the latest source code as described here:
http://www.biopython.org/wiki/SourceCode

Alternatively all you need to update is the Bio/Seq.py file to
the latest version:
http://github.com/biopython/biopython/raw/master/Bio/Seq.py

To use the new functionality, first you need to create a
CodonTable object with your special table, and assuming
you are just working with unambiguous DNA that means:

from Bio.Data.CodonTable import CodonTable
c_uncinata_table = CodonTable(forward_table={
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S',
    'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y',
    'TAG': 'Q', 'TGT': 'C', 'TGC': 'C', 'TGA': 'W', 'TGG': 'W',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L', 'CCT': 'P',
    'CCC': 'P', 'CCA': 'P', 'CCG': 'P', 'CAT': 'H', 'CAC': 'H',
    'CAA': 'Q', 'CAG': 'Q', 'CGT': 'R', 'CGC': 'R', 'CGA': 'R',
    'CGG': 'R', 'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'AAT': 'N',
    'AAC': 'N', 'AAA': 'K', 'AAG': 'K', 'AGT': 'S', 'AGC': 'S',
    'AGA': 'R', 'AGG': 'R', 'GTT': 'V', 'GTC': 'V', 'GTA': 'V',
    'GTG': 'V', 'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E', 'GGT': 'G',
    'GGC': 'G', 'GGA': 'G', 'GGG': 'G'},
    start_codons = [ 'ATG'],
    stop_codons = ['TAA' ])

Note that order of the forward table dictionary entries does
not actually matter, however, I have moved the TAG and TGA
entries from the end to keep the whole table in a standard
order - I found this easier to check.

If you have the updated Bio.Seq module, then you can do
this:

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> seq = Seq("AAATAGTGATAA", generic_dna)
>>> print seq.translate()
K***
>>> print seq.translate(table=c_uncinata_table)
KQW*

Or using strings,

>>> from Bio.Seq import translate
>>> print translate("AAATAGTGATAA")
K***
>>> print translate("AAATAGTGATAA", table=c_uncinata_table)
KQW*

Does that make sense? Does it do what you expect?
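For anyone who wants to sanity-check the expected translations without the updated Bio.Seq, here is a minimal dependency-free sketch. It is not Biopython code: it assumes a current Python 3 interpreter, and the names STANDARD_TABLE, VARIANT_TABLE and translate are made up for this illustration. It builds the standard code from the usual TCAG ordering and then overrides TAG and TGA, which is the only difference from the standard code in the table posted above.

```python
from itertools import product

# Standard genetic code written out in TCAG order (first, second, third base).
BASES = "TCAG"
STANDARD_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
}

# Variant from the thread: TAG codes for Q and TGA for W, so TAA is the only stop.
VARIANT_TABLE = dict(STANDARD_TABLE, TAG="Q", TGA="W")


def translate(dna, table):
    """Translate unambiguous DNA codon by codon; '*' marks a stop codon."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(table[dna[i:i + 3].upper()] for i in range(0, len(dna), 3))


print(translate("AAATAGTGATAA", STANDARD_TABLE))  # K***
print(translate("AAATAGTGATAA", VARIANT_TABLE))   # KQW*
```

The two printed strings match the K*** and KQW* results quoted above, which is a quick way to confirm that a hand-typed table says what you meant before passing it to Biopython.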
Don't hesitate to ask for clarification. Peter From Achim.Treumann at NEPAF.com Tue Sep 14 14:51:29 2010 From: Achim.Treumann at NEPAF.com (Achim Treumann) Date: Tue, 14 Sep 2010 15:51:29 +0100 Subject: [Biopython] Some books In-Reply-To: References: Message-ID: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local> Dear Dragoslav, I cannot comment on Arthur Lesk's book - haven't read it. I can really recommend two freely available tutorials on Katja Schuerer's website: One of them is an introduction to programming using Python: http://www.pasteur.fr/formation/infobio/python/ The other one is a Python course in Bioinformatics: http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html Both of them provide you with numerous examples and take you through tips and tricks on how to address bioinformatic problems using Python and Biopython. I presume that you are familiar with the Biopython manual that is part of the Biopython distribution: http://www.biopython.org/DIST/docs/tutorial/Tutorial.html Hope this helps, Best wishes, Achim -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Dragoslav Zaric Sent: 14 September 2010 15:29 To: biopython at lists.open-bio.org Subject: [Biopython] Some books Dear friends, I do not come from bioinformatics background, so can anybody recommend some introducing book about bioinformatics so I can cover the basics. Of course there are a lot of python programming in biopython that is out of biology (like parsing of database files, connect to databases), but to get clear picture it is good to read some introducing book. Is book "Introduction to Bioinformatics" by Arthur Lesk good one ? 
Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From cjfields at illinois.edu Tue Sep 14 14:59:49 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 14 Sep 2010 09:59:49 -0500 Subject: [Biopython] Codeml parser in Biopython? In-Reply-To: References: <533513.93597.qm@web52005.mail.re2.yahoo.com> <937371.74873.qm@web52006.mail.re2.yahoo.com> Message-ID: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> On Sep 14, 2010, at 4:04 AM, Peter wrote: > Hi Anastasia, > > On Tue, Sep 14, 2010 at 9:02 AM, natassa wrote: >> Hi Peter, >> >>> >>> Could you post a short example of the kind of output you are looking at? >>> >> >> Here is an example output, but this caan differ depending on the model used >> (there are several models for Branch, Site, BranchSite, but all are pretty >> standard) >> > > Thanks - that looks possible to parse, but not very easy (especially if the > codeml output changes slightly between versions). > >>> >>> Can you get codeml to output what you need in another format, such as NEXUS? >>> >> >> Haven't tried that, but as you can see, this is a very verbose output and >> NEXUS does not seem an option. > > At first glance, the NEXUS format could hold a lot of that information. > Another possibility might be phyloXML. However, you are at the mercy > of the codeml tool and what it supports. I might be worth politely asking > the author(s) about supporting one of these more standard formats as > a optional output. > >> Ultimately, I want to parse this to get all the information I need in a >> tabulated file. I am still working out what exactly I need (there are standard >> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type >> of downstram analysis). 
I will now work on the pypaml class and modify the >> original code to make it more generic (it seems that it only works for Site >> Models). > > Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so > unless he agrees to re-license it we cannot include it in Biopython. > >> Will let you know, was just wondering if there was already a solution.There is >> one in Bioperl, but heard it is very slow and in any case, I don't understand >> much of perl.... > > I don't know much Perl either ;) > > Peter Just a warning from those experienced with paml parsers (bioperl): the output is notoriously shifty even between minor releases (sections get reordered, etc), so pretty much any parse needs to accommodate that. It's extremely frustrating. chris From p.j.a.cock at googlemail.com Tue Sep 14 16:04:52 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 14 Sep 2010 17:04:52 +0100 Subject: [Biopython] Some books In-Reply-To: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local> References: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local> Message-ID: On Tue, Sep 14, 2010 at 3:51 PM, Achim Treumann wrote: > Dear Dragoslav, > > ... > > The other one is a Python course in Bioinformatics: > http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html The above Pasteur Institute course using Biopython is sadly very out of date in places, and I have been unable to get in touch with the authors to revise it or at least add some warning text to it. Peter From biopython at maubp.freeserve.co.uk Tue Sep 14 16:07:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Sep 2010 17:07:52 +0100 Subject: [Biopython] Codeml parser in Biopython? 
In-Reply-To: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> References: <533513.93597.qm@web52005.mail.re2.yahoo.com> <937371.74873.qm@web52006.mail.re2.yahoo.com> <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> Message-ID: On Tue, Sep 14, 2010 at 3:59 PM, Chris Fields wrote: > On Sep 14, 2010, at 4:04 AM, Peter wrote: >> On Tue, Sep 14, 2010 at 9:02 AM, natassa wrote: >>> >>> Here is an example output, but this caan differ depending on the model used >>> (there are several models for Branch, Site, BranchSite, but all are pretty >>> standard) >> >> Thanks - that looks possible to parse, but not very easy (especially if the >> codeml output changes slightly between versions). > > Just a warning from those experienced with paml parsers (bioperl): the > output is notoriously shifty even between minor releases (sections get > reordered, etc), so pretty much any parse needs to accommodate that. > It's extremely frustrating. Thanks Chris - I was afraid of that. It sounds like parsing plain text NCBI BLAST output, but worse. Do you know if anyone has asked about codeml outputting something nicer to parse instead? e.g. Nexus or any kind of XML? Peter From Achim.Treumann at NEPAF.com Tue Sep 14 16:20:36 2010 From: Achim.Treumann at NEPAF.com (Achim Treumann) Date: Tue, 14 Sep 2010 17:20:36 +0100 Subject: [Biopython] Some books In-Reply-To: References: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local> Message-ID: <01798D2396253A449511F31F1CDE83550FBAF2@srv1.NEPAF.local> Hiya, I agree about this warning (and have come across a few bits where this has caused problems) - despite that I found them very useful. Best wishes, Achim -----Original Message----- From: Peter Cock [mailto:p.j.a.cock at googlemail.com] Sent: 14 September 2010 17:05 To: Achim Treumann Cc: Dragoslav Zaric; biopython at lists.open-bio.org Subject: Re: [Biopython] Some books On Tue, Sep 14, 2010 at 3:51 PM, Achim Treumann wrote: > Dear Dragoslav, > > ... 
> > The other one is a Python course in Bioinformatics: > http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html The above Pasteur Institute course using Biopython is sadly very out of date in places, and I have been unable to get in touch with the authors to revise it or at least add some warning text to it. Peter From natassa_g_2000 at yahoo.com Tue Sep 14 16:15:18 2010 From: natassa_g_2000 at yahoo.com (natassa) Date: Tue, 14 Sep 2010 09:15:18 -0700 (PDT) Subject: [Biopython] Codeml parser in Biopython? In-Reply-To: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> References: <533513.93597.qm@web52005.mail.re2.yahoo.com> <937371.74873.qm@web52006.mail.re2.yahoo.com> <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu> Message-ID: <884978.85315.qm@web52004.mail.re2.yahoo.com> Thanks Chris, Good to know.. I am dealing with paml results for the first time, but somehow thought that outputs were standard. Apparently not... Now that I started writing my own python parser, I see that even among models of the same run, the text changes without any obvious reason (from 'omega' to 'w' etc). Indeed frustrating! Does the Bioperl solution include different parsers for different types of analysis ex The Branch analysis models, another for the Site Analysis models etc? It would be good o have one for all, but I am not sure this is feasible...I start with separate parsers and will see how it can be generalized. Thanks, Anastasia ________________________________ From: Chris Fields To: Peter Cc: natassa ; biopython at biopython.org Sent: Tue, September 14, 2010 4:59:49 PM Subject: Re: [Biopython] Codeml parser in Biopython? On Sep 14, 2010, at 4:04 AM, Peter wrote: > Hi Anastasia, > > On Tue, Sep 14, 2010 at 9:02 AM, natassa wrote: >> Hi Peter, >> >>> >>> Could you post a short example of the kind of output you are looking at? 
>>> >> >> Here is an example output, but this caan differ depending on the model used >> (there are several models for Branch, Site, BranchSite, but all are pretty >> standard) >> > > Thanks - that looks possible to parse, but not very easy (especially if the > codeml output changes slightly between versions). > >>> >>> Can you get codeml to output what you need in another format, such as NEXUS? >>> >> >> Haven't tried that, but as you can see, this is a very verbose output and >> NEXUS does not seem an option. > > At first glance, the NEXUS format could hold a lot of that information. > Another possibility might be phyloXML. However, you are at the mercy > of the codeml tool and what it supports. I might be worth politely asking > the author(s) about supporting one of these more standard formats as > a optional output. > >> Ultimately, I want to parse this to get all the information I need in a >> tabulated file. I am still working out what exactly I need (there are standard >> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the >>type >> of downstram analysis). I will now work on the pypaml class and modify the >> original code to make it more generic (it seems that it only works for Site >> Models). > > Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so > unless he agrees to re-license it we cannot include it in Biopython. > >> Will let you know, was just wondering if there was already a solution.There is >> one in Bioperl, but heard it is very slow and in any case, I don't understand >> much of perl.... > > I don't know much Perl either ;) > > Peter Just a warning from those experienced with paml parsers (bioperl): the output is notoriously shifty even between minor releases (sections get reordered, etc), so pretty much any parse needs to accommodate that. It's extremely frustrating. 
chris From p.j.a.cock at googlemail.com Wed Sep 15 17:10:46 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 15 Sep 2010 18:10:46 +0100 Subject: [Biopython] unusual genetic code In-Reply-To: References: Message-ID: On Wed, Sep 15, 2010 at 5:50 PM, Jessica Grant wrote: >Peter wrote: >> ... >> To use the new functionality, first you need to create a >> CodonData object with your special table, and assuming >> you are just working with unambiguous DNA that means: >> ... >> Does that make sense? Does it do what you expect? >> Don't hesitate to ask for clarification. >> >> Peter > > It works! Thanks so much!! > > Jessica Great - thanks for letting us know. Peter From cwalentas at gmail.com Thu Sep 16 04:36:13 2010 From: cwalentas at gmail.com (Christopher Walentas) Date: Thu, 16 Sep 2010 00:36:13 -0400 Subject: [Biopython] Parsing Pubmed-Entrez searches into a normalized relational resource Message-ID: <4C919EBD.3080802@gmail.com> Apologies in advance- all of this is very new to me- and I hope that this is the proper forum for this query. What I would like to do is parse the returns of an entrez pubmed search into their smallest, unique useful bits and create a relational database (sqlite, dee?). Ideally this would not only be of returned fields, but also drilling further down into say affiliation, addresses, etc... I believe that I've mastered the search and download functions and individual citations exist as a stacked dictionary of the xml outputs. Where I am falling down is understanding how to extract the structure of these outputs and create a persistent relational resource that's been normalized such that these fields can be mapped to used to "correct" values in an uncurated dataset with highly analogous fields. 
I've been struggling to bridge the gap between python and sqlite/dee, however have recently been informed that it might be possible to do everything within python itself and again apologies for any navieties- they are indeed sincere, however I'm well aware that a little knowledge can be dangerous- hence reaching out. From what I've already read, it would seem that all of this is ideally suited to bio-/python and am looking forward to learning- I'm just looking for that swift shove in the right direction and to benefit from your collective informed guidance. Cheers in advance, christopher From mjldehoon at yahoo.com Thu Sep 16 10:53:55 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 16 Sep 2010 03:53:55 -0700 (PDT) Subject: [Biopython] Fwd: NETTAB 2010: Submission deadline is approaching: Sep 24, 2010 Message-ID: <841162.95892.qm@web62405.mail.re1.yahoo.com> --- On Thu, 9/16/10, Paolo Romano wrote: > From: Paolo Romano > Subject: Fwd: NETTAB 2010: Submission deadline is approaching: Sep 24, 2010 > To: biopython-owner at lists.open-bio.org > Date: Thursday, September 16, 2010, 3:25 AM > Dear list owner, > > I would be glad if you would forward thsi message to the > list. > > Many thanks in adavnce. > > Ciao. Paolo > > >Date: Wed, 15 Sep 2010 17:50:48 +0200 > >To: biopython at lists.open-bio.org > >From: Paolo Romano > >Subject: NETTAB 2010: Submission deadline is > approaching: Sep 24, 2010 > > > >I hope this announcement can be of interest for this > list. > > > >Forgive me if I'm wrong! > > > >Ciao. 
Paolo > > > >========== > >NETTAB 2010 on "Biological Wikis" > >joint with the BBCC 2010 workshop on Bioinformatics and > > >Computational Biology in Campania > > > >November 29 - December 1, 2010, Naples, Italy > >http://www.nettab.org/2010/ > >http://bioinformatica.isa.cnr.it./BBCC/BBCC2010/ > > > > > >The deadline for the submission of oral communications > is quickly > >approaching, submit you contribution within next > >Friday September 24, 2010 through the EasyChair site ( > > >http://www.easychair.org/conferences/?conf=nettab2010 > ). > > > >The lenght of contributions for oral communications > should be > >between 3 and 5 pages, including tables and figures. > >See more instructions below. > > > > > >NETTAB 2010 workshop promises to be a great meeting for > all > >researchers involved in the exploitation of wikis in > biology. > > > >Don't miss this opportunity to discuss your ideas and > doubts with > >such scientists as > >- Alex Bateman, Wellcome Trust Sanger Institute, > Hinxton, Cambridge, > >United Kingdom > >- Alexander Pico, Gladstone Institute of Cardiovascular > Disease, San > >Francisco, USA > >- Andrew Su, Bioinformatics and Computational Biology, > Genomics > >Institute of the Novartis Research Foundation (GNF), > San Diego, USA > >- Dan Bolser, College of Life Sciences, University of > Dundee, > >Scotland, United Kingdom > >- Robert Hoffmann, Computational Biology Center, cBIO, > Memorial > >Sloan-Kettering Cancer Center, MSKCC, New York, USA > >- Thomas Kelder, Department of Bioinformatics (BiGCaT), > Maastricht > >University, the Netherlands > >- Jaime Prilusky, Bioinformatics, Weizmann Institute of > Science, > >Rehovot, Israel > >- and many other who, we hope, will join the workshop. > > > >Here below, please find a summary of the Call. The > complete Call is > >available on-line at http://www.nettab.org/2010/call.html . > > > >Further information is availble at http://www.nettab.org/2010/ . 
> >
> >============
> >CALL FOR PAPERS
> >
> >TOPICS
> >The following list is not meant to be exclusive of any further
> >topics as stated above.
> >Submitted contributions should address one or more of the following topics:
> >    * Wiki development tools
> >        o Wikimedia
> >        o Wikimedia extensions
> >        o Semantic Wikis
> >        o Wiki-coupled CMSs
> >        o Other wikis
> >    * Arising issues for the biomedical domain:
> >        o Authoritativeness of contributions and sites
> >        o Quality assessment
> >        o Users acknowledgement
> >        o Stimulation of quality contributions
> >        o Authorships management and reward
> >        o 'Scientific production' value for contributions
> >        o Management of bioinformatics data types
> >    * Wikis and collaborative systems for:
> >        o Genomics, proteomics, metabolomics, any -omics
> >        o Proteins analysis and visualization
> >        o gene and proteins interactions
> >        o metabolic pathways
> >        o oncology research
> >    * Issues to be tackled by wiki and collaborative research for:
> >        o Genomics, proteomics, metabolomics, any -omics
> >        o Proteins analysis and visualization
> >        o gene and proteins interactions
> >        o metabolic pathways
> >        o oncology research
> >
> >The NETTAB 2010 workshop is a joint event with the BBCC 2010 workshop on
> >This deadline also applies to the BBCC 2010 workshop.
> >Submit for BBCC through the same EasyChair site and select 'BBCC
> >session' topic.
> >
> >
> >TYPE OF CONTRIBUTIONS
> >
> >The following possible contributions are sought:
> >    * Oral communications
> >    * Posters
> >    * Software demos
> >All accepted contributions will be published in the proceedings of
> >the workshop.
> >
> >
> >DEADLINES
> >
> >* September 24, 2010: Oral communications submission
> >        o Decisions announced: October 24, 2010
> >
> >* October 29, 2010: Early registration ends
> >
> >* November 29 - December 1, 2010: Workshop and Tutorials
> >
> >
> >INSTRUCTIONS
> >Kindly follow the instructions carefully when preparing your
> >contribution and submit your contribution through the EasyChair
> >system at http://www.easychair.org/conferences/?conf=nettab2010.
> >
> >All contributions should follow the same format, as specified here:
> >font type: Times New Roman, font size: 12 pt, page size: A4, left
> >and right margins: 2.0 cm, upper margin: 2.5 cm, lower margin: 2.0 cm.
> >
> >The length of contributions for oral communications should be
> >between 3 and 5 pages, including tables and figures.
> >They should include: Abstract, Introduction, Methods, Results and
> >Discussion, References.
> >All contributions for oral communications will be evaluated by at
> >least three referees.
> >
> >For any further information or clarification, please contact the
> >organization by email at info at nettab.org.
> >
> >
> >ORGANIZATION (see http://www.nettab.org/2010/organization.html for
> >the Scientific Committee and more information)
> >
> >Co-chairs
> >    * Angelo Facchiano, CNR-ISA, Avellino, Italy
> >    * Paolo Romano, National Cancer Research Institute, Genoa, Italy
> >
> >We look forward to meeting you in Naples!
> >
> >Paolo Romano and Angelo Facchiano
> >   on behalf of the Scientific Committee
> >
> >
> >Paolo Romano (paolo.romano at istge.it)
> >Bioinformatics
> >National Cancer Research Institute (IST)
> >Largo Rosanna Benzi, 10, I-16132, Genova, Italy
> >Tel: +39-010-5737-288  Fax: +39-010-5737-295
> >Skype: p.romano
> >Web: http://www.nettab.org/promano/
> >
>
> Paolo Romano (paolo.romano at istge.it)
> Bioinformatics
> National Cancer Research Institute (IST)
>

From zaricdragoslav at gmail.com Sun Sep 19 06:33:06 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 10:33:06 +0400
Subject: [Biopython] Trhird party library
Message-ID:

Has anybody used Biopython as a third-party library, for example in a
Python web project?
I ask because you probably cannot expect to find Biopython already
installed, or to be able to install it, in a hosting provider's server
environment.

For example, after installing Biopython on Windows, you can see that it
is installed inside the Python 2.6 installation:

C:\Python26\Lib\site-packages\Bio
C:\Python26\Lib\site-packages\BioSQL
C:\Python26\Lib\site-packages\numpy

So can you copy these folders to, for example, the \Lib\ folder of a web
project, and reference them somehow from code?

Of course I can test this myself, and I will, but maybe somebody has
experience with this problem, and it would probably be good information
for others on this list.

Kind regards

--
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From chapmanb at 50mail.com Sun Sep 19 10:51:19 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 19 Sep 2010 06:51:19 -0400
Subject: [Biopython] Parsing Pubmed-Entrez searches into a normalized relational resource
In-Reply-To: <4C919EBD.3080802@gmail.com>
References: <4C919EBD.3080802@gmail.com>
Message-ID: <20100919105119.GC2030@kunkel>

Christopher;

> What I would like to do is parse the returns of an entrez pubmed
> search into their smallest, unique useful bits and create a
> relational database (sqlite, dee?). Ideally this would not only be
> of returned fields, but also drilling further down into say
> affiliation, addresses, etc...
[...]
> Where I am falling down is understanding how to extract the
> structure of these outputs and create a persistent relational
> resource that's been normalized such that these fields can be mapped
> to used to "correct" values in an uncurated dataset with highly
> analogous fields.

This is the standard problem of representing object-style data in a
flat relational database. It's tough to answer succinctly on a mailing
list, as there are entire textbooks and courses devoted to the problem.
The Wikipedia entry on normalization and first normal form is a good
place to get started:

http://en.wikipedia.org/wiki/Database_normalization

As far as accessing relational databases goes, Python is great for
this. An object relational mapper like SQLAlchemy:

http://www.sqlalchemy.org/

is a great place to get started. This allows you to deal more directly
with objects, and also generalizes database access so you can quickly
switch from SQLite to MySQL to whatever.

Another suggestion is to use a document-oriented database like MongoDB
for storing your data:

http://www.mongodb.org/

This allows you to store objects without flattening them, which may be
more intuitive for the XML/dictionary results you get back from Entrez
searches.

Hope this helps,
Brad

From chapmanb at 50mail.com Sun Sep 19 10:44:40 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 19 Sep 2010 06:44:40 -0400
Subject: [Biopython] Third party library
In-Reply-To:
References:
Message-ID: <20100919104440.GB2030@kunkel>

Dragoslav;

> Did anybody used biopython as third part library, like for example in python
> web project ?

Yes, absolutely. Biopython doesn't behave any differently from other
Python third-party libraries, so there wouldn't be any special
instructions outside the documentation for the library you are using.

> I ask this because probably you can not expect to find or install biopython
> in provider server environment.

It's tough to answer this generally without knowing what framework you
are planning to use. For example, Google App Engine has a restricted
environment where only pure Python libraries work.

As an install procedure you can most simply do:

    python setup.py build

and then copy the libraries from build/lib.your_platform to the
site-libraries location in your application. More formally, virtualenv
is also very useful for building an isolated Python environment with
only the libraries for a project:

http://pypi.python.org/pypi/virtualenv

> For example, after installing biopython in windows environment, you can see
> that biopython is
> installed inside python 2.6 installation:
>
> C:\Python26\Lib\site-packages\Bio
> C:\Python26\Lib\site-packages\BioSQL
> C:\Python26\Lib\site-packages\numpy
>
> So can you copy these folders to, for example, \Lib\ folder of web project,
> and reference them somehow from code ?

Sure, that all seems fine, but it's hard to offer specific advice
without knowing exactly what you are doing. The best place for
questions is probably the community of the web framework you are
using. Everything that applies to other third-party libraries will
apply to Biopython.

Hope this helps,
Brad

From sdavis2 at mail.nih.gov Sun Sep 19 11:02:45 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 19 Sep 2010 07:02:45 -0400
Subject: [Biopython] Trhird party library
In-Reply-To:
References:
Message-ID:

On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric wrote:

> Did anybody used biopython as third part library, like for example in
> python
> web project ?
> I ask this because probably you can not expect to find or install biopython
> in provider server
> environment.
>
> For example, after installing biopython in windows environment, you can see
> that biopython is
> installed inside python 2.6 installation:
>
> C:\Python26\Lib\site-packages\Bio
> C:\Python26\Lib\site-packages\BioSQL
> C:\Python26\Lib\site-packages\numpy
>
> So can you copy these folders to, for example, \Lib\ folder of web project,
> and reference them
> somehow from code ?
>
> Of course I can test this by myself, and I will do this, but maybe somebody
> have experience
> with this problem, and it would be probably good info for others in this
> forum.
>
>

Hi, Dragoslav. The Python developers thought of this problem.

http://docs.python.org/install/#alternate-installation-the-home-scheme

Sean

From zaricdragoslav at gmail.com Sun Sep 19 12:11:48 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 16:11:48 +0400
Subject: [Biopython] Trhird party library
In-Reply-To:
References:
Message-ID:

Anyway,

I will try the simplest thing: copy the folder with the Biopython
modules into some folder of the web app and access the modules through
an absolute path on the web server. This must work.

At first I planned to use the Django web framework, but I recently
discovered that there are many Python web frameworks. Since I prefer
the simplest effective frameworks, I will check out

web.py

It looks nice at first glance.

Kind regards

On Sun, Sep 19, 2010 at 3:02 PM, Sean Davis wrote:

>
>
> On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric
> wrote:
>
>> Did anybody used biopython as third part library, like for example in
>> python
>> web project ?
>> I ask this because probably you can not expect to find or install
>> biopython
>> in provider server
>> environment.
>>
>> For example, after installing biopython in windows environment, you can
>> see
>> that biopython is
>> installed inside python 2.6 installation:
>>
>> C:\Python26\Lib\site-packages\Bio
>> C:\Python26\Lib\site-packages\BioSQL
>> C:\Python26\Lib\site-packages\numpy
>>
>> So can you copy these folders to, for example, \Lib\ folder of web
>> project,
>> and reference them
>> somehow from code ?
>>
>> Of course I can test this by myself, and I will do this, but maybe
>> somebody
>> have experience
>> with this problem, and it would be probably good info for others in this
>> forum.
>>
>>
> Hi, Dragoslav. The python developers thought of this problem.
>
> http://docs.python.org/install/#alternate-installation-the-home-scheme
>
> Sean
>
>

--
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From rodrigo_faccioli at uol.com.br Sun Sep 19 13:59:47 2010
From: rodrigo_faccioli at uol.com.br (Rodrigo Faccioli)
Date: Sun, 19 Sep 2010 10:59:47 -0300
Subject: [Biopython] Trhird party library
In-Reply-To:
References:
Message-ID:

Hi,

I've used Biopython in a web project. I installed Biopython in the
normal way on our Ubuntu server.

The front end of my web project was developed in JSP, but I ran my
scripts with Biopython. You can find this project at
http://glu.fcfrp.usp.br:8180/prometheus/

As for Python frameworks, I've read that Django is an excellent
framework.

Thanks in advance,

--
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218
Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5


On Sun, Sep 19, 2010 at 9:11 AM, Dragoslav Zaric wrote:

> Anyway,
>
> I will try simplest thing, to copy folder with biopython modules in some
> folder of web app and access modules trough absolute path of web server,
> this must work.
>
> At first I planned to use django web framework, but I recently discovered
> there are
> many python web frameworks. So i prefer most simplistic and effective
> frameworks,
> I will check out
>
> web.py
>
> it looks nice at first glance.
>
> Kind regards
>
> On Sun, Sep 19, 2010 at 3:02 PM, Sean Davis wrote:
>
> >
> >
> > On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric <zaricdragoslav at gmail.com> wrote:
> >
> >> Did anybody used biopython as third part library, like for example in
> >> python
> >> web project ?
> >> I ask this because probably you can not expect to find or install
> >> biopython
> >> in provider server
> >> environment.
> >>
> >> For example, after installing biopython in windows environment, you can
> >> see
> >> that biopython is
> >> installed inside python 2.6 installation:
> >>
> >> C:\Python26\Lib\site-packages\Bio
> >> C:\Python26\Lib\site-packages\BioSQL
> >> C:\Python26\Lib\site-packages\numpy
> >>
> >> So can you copy these folders to, for example, \Lib\ folder of web
> >> project,
> >> and reference them
> >> somehow from code ?
> >>
> >> Of course I can test this by myself, and I will do this, but maybe
> >> somebody
> >> have experience
> >> with this problem, and it would be probably good info for others in this
> >> forum.
> >>
> >>
> > Hi, Dragoslav.
> > The python developers thought of this problem.
> >
> > http://docs.python.org/install/#alternate-installation-the-home-scheme
> >
> > Sean
> >
> >
>
>
> --
> Dragoslav Zaric
>
> Professional Programmer
> MSc Astrophysics
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From zaricdragoslav at gmail.com Sun Sep 19 14:34:37 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 18:34:37 +0400
Subject: [Biopython] Trhird party library
In-Reply-To:
References:
Message-ID:

Thanks Rodrigo,

I have come to the same conclusion after a little searching. Also,
hosting for Django is very common.

Kind regards

On Sun, Sep 19, 2010 at 5:59 PM, Rodrigo Faccioli <rodrigo_faccioli at uol.com.br> wrote:

> Hi,
>
> I've worked with BioPython in web project. I've installed BioPython
> normally
> in our ubuntu server.
>
> My web project was developed its front-end in jsp. But I ran my scripts
> with
> BioPython. You can find this project in
> http://glu.fcfrp.usp.br:8180/prometheus/
>
> About the python frameworks, I've read Django is an excellent framework.
>
> Thanks in advance,
>
> --
> Rodrigo Antonio Faccioli
> Ph.D Student in Electrical Engineering
> University of Sao Paulo - USP
> Engineering School of Sao Carlos - EESC
> Department of Electrical Engineering - SEL
> Intelligent System in Structure Bioinformatics
> http://laips.sel.eesc.usp.br
> Phone: 55 (16) 3373-9366 Ext 229
> Curriculum Lattes - http://lattes.cnpq.br/1025157978990218
> Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5
>
>
> On Sun, Sep 19, 2010 at 9:11 AM, Dragoslav Zaric
> wrote:
>
> > Anyway,
> >
> > I will try simplest thing, to copy folder with biopython modules in some
> > folder of web app and access modules trough absolute path of web server,
> > this must work.
> >
> > At first I planned to use django web framework, but I recently discovered
> > there are
> > many python web frameworks. So i prefer most simplistic and effective
> > frameworks,
> > I will check out
> >
> > web.py
> >
> > it looks nice at first glance.
> >
> > Kind regards
> >
> > On Sun, Sep 19, 2010 at 3:02 PM, Sean Davis wrote:
> >
> > >
> > >
> > > On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric <zaricdragoslav at gmail.com> wrote:
> > >
> > >> Did anybody used biopython as third part library, like for example in
> > >> python
> > >> web project ?
> > >> I ask this because probably you can not expect to find or install
> > >> biopython
> > >> in provider server
> > >> environment.
> > >>
> > >> For example, after installing biopython in windows environment, you can
> > >> see
> > >> that biopython is
> > >> installed inside python 2.6 installation:
> > >>
> > >> C:\Python26\Lib\site-packages\Bio
> > >> C:\Python26\Lib\site-packages\BioSQL
> > >> C:\Python26\Lib\site-packages\numpy
> > >>
> > >> So can you copy these folders to, for example, \Lib\ folder of web
> > >> project,
> > >> and reference them
> > >> somehow from code ?
> > >>
> > >> Of course I can test this by myself, and I will do this, but maybe
> > >> somebody
> > >> have experience
> > >> with this problem, and it would be probably good info for others in this
> > >> forum.
> > >>
> > >>
> > > Hi, Dragoslav. The python developers thought of this problem.
> > >
> > > http://docs.python.org/install/#alternate-installation-the-home-scheme
> > >
> > > Sean
> > >
> > >
> >
> >
> > --
> > Dragoslav Zaric
> >
> > Professional Programmer
> > MSc Astrophysics
> >
>

--
Dragoslav Zaric

Professional Programmer
MSc Astrophysics
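
The approach discussed in this thread (copying the built Bio package into a folder of the web project and making it importable from there) could look roughly like the sketch below. This is only an illustration, not something posted by the participants: the helper name add_vendor_dir and the "lib" folder name are hypothetical. Note also that Biopython includes compiled C extensions, so a copied folder would presumably need to come from a build for the same platform and Python version as the server, which matches the build/lib.your_platform advice above.

```python
import os
import sys


def add_vendor_dir(vendor_dir):
    """Make packages copied into vendor_dir importable.

    vendor_dir is a hypothetical project folder (for example "lib")
    that holds copies of Bio, BioSQL, and so on.
    """
    vendor_dir = os.path.abspath(vendor_dir)
    # Put the folder first so the bundled copy wins over any
    # system-wide installation; avoid adding it twice.
    if vendor_dir not in sys.path:
        sys.path.insert(0, vendor_dir)
    return vendor_dir
```

In the web application's entry script one would call add_vendor_dir("lib") (or pass an absolute path) before the first "import Bio". A virtualenv or the home-scheme install linked above achieves the same result without editing sys.path by hand.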