From p.j.a.cock at googlemail.com Tue Apr 1 11:44:14 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Apr 2014 16:44:14 +0100
Subject: [Biopython] SciPy 2014 (July 6-12, Austin, Texas, USA)
In-Reply-To:
References:
Message-ID:

Dear Biopythoneers,

It looks like there was an extension to the SciPy deadline,
so *today* is your last chance to submit talk or poster abstracts!

Thanks,

Peter

P.S. You have until this *Friday* to submit your BOSC abstracts.

On Wed, Mar 12, 2014 at 3:48 PM, Peter Cock wrote:
> Hi all,
>
> It is a bit short notice, but some of you may be interested in attending
> SciPy 2014, which will again have a bioinformatics session. There
> is still time to submit an abstract (deadline 14 March):
>
> https://conference.scipy.org/scipy2014/participate/presentations/
>
> "SciPy 2014, the thirteenth annual Scientific Computing with Python
> conference, will be held this July 6th-12th in Austin, Texas. SciPy is
> a community dedicated to the advancement of scientific computing
> through open source Python software for mathematics, science, and
> engineering. The annual SciPy Conference allows participants from
> academic, commercial, and governmental organizations to showcase
> their latest projects, learn from skilled users and developers, and
> collaborate on code development."
>
> Unfortunately SciPy 2014 clashes with BOSC 2014 in Boston,
> which you may prefer to attend, which is also currently accepting
> abstracts:
>
> http://www.open-bio.org/wiki/BOSC_2014
> http://www.open-bio.org/wiki/Codefest_2014
>
> *Disclaimer*: I am co-chairing BOSC this year.
>
> Regards,
>
> Peter

From ivangreg at gmail.com Wed Apr 2 12:33:50 2014
From: ivangreg at gmail.com (Ivan Gregoretti)
Date: Wed, 2 Apr 2014 12:33:50 -0400
Subject: [Biopython] Back translation from Protein to RNA sequence
Message-ID:

The documentation of the Seq object nicely shows how to

1) transcribe DNA -> RNA,
2) back transcribe RNA -> DNA, and
3) translate RNA -> protein.

If priorities allow, I would appreciate the expansion of the documentation
with one example of

4) back translation protein -> most_probable_RNA.

The result of that operation is species-dependent and worth documenting if
the functionality already exists.

Thank you,

Ivan

Ivan Gregoretti, PhD
Bioinformatics

From p.j.a.cock at googlemail.com Mon Apr 7 08:58:26 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 7 Apr 2014 13:58:26 +0100
Subject: [Biopython] Back translation from Protein to RNA sequence
In-Reply-To:
References:
Message-ID:

On Wed, Apr 2, 2014 at 5:33 PM, Ivan Gregoretti wrote:
> The documentation of the Seq object nicely shows how to
>
> 1) transcribe DNA -> RNA,
> 2) back transcribe RNA -> DNA, and
> 3) translate RNA -> protein.
>
> If priorities allow, I would appreciate the expansion of the documentation
> with one example of
>
> 4) back translation protein -> most_probable_RNA.
>
> The result of that operation is species-dependent and worth documenting if
> the functionality already exists.
>
> Thank you,
>
> Ivan

Hello Ivan,

Biopython currently deliberately does not have any
back-translation functionality.

Why do you want this, and how would you define it?

I think 'most probable' would require a codon usage table
for the organism, and would need a tie breaker for when
two codons are equally frequent - or would you be happy
with non-deterministic output?

There is a whole set of details which would need to
be settled, such as what you would do with ambiguous
amino acids (e.g. X or J), making a general purpose
back-translate rather complex.

Last time this was discussed on the mailing list, the real
use case was back-translation as used with protein to
nucleotide alignment, where the sequence is known
and just the gaps need inserting appropriately. e.g.
https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans

Regards,

Peter

From ivangreg at gmail.com Mon Apr 7 10:20:44 2014
From: ivangreg at gmail.com (Ivan Gregoretti)
Date: Mon, 7 Apr 2014 10:20:44 -0400
Subject: [Biopython] Back translation from Protein to RNA sequence
In-Reply-To:
References:
Message-ID:

Hello Peter,

I would benefit from the availability of a back-translation tool for
practical reasons.

In our case, part of my team is designing peptides. They asked me if I
had a Python tool to create the corresponding DNA so that they could
design and order expression vectors.

As simple as that. I would not intend to use this tool out of context
and I fully understand that a codon bias table would be necessary for
each species.

I will leave it as an open question, then, in case somebody has
written a programme within the scope of our need. I'll explore the
pico_galaxy link you sent me nonetheless.

Thank you,

Ivan

Ivan Gregoretti, PhD
Bioinformatics

On Mon, Apr 7, 2014 at 8:58 AM, Peter Cock wrote:
>
> On Wed, Apr 2, 2014 at 5:33 PM, Ivan Gregoretti wrote:
> > The documentation of the Seq object nicely shows how to
> >
> > 1) transcribe DNA -> RNA,
> > 2) back transcribe RNA -> DNA, and
> > 3) translate RNA -> protein.
> >
> > If priorities allow, I would appreciate the expansion of the documentation
> > with one example of
> >
> > 4) back translation protein -> most_probable_RNA.
> >
> > The result of that operation is species-dependent and worth documenting if
> > the functionality already exists.
> >
> > Thank you,
> >
> > Ivan
>
> Hello Ivan,
>
> Biopython currently deliberately does not have any
> back-translation functionality.
>
> Why do you want this, and how would you define it?
>
> I think 'most probable' would require a codon usage table
> for the organism, and would need a tie breaker for when
> two codons are equally frequent - or would you be happy
> with non-deterministic output?
>
> There is a whole set of details which would need to
> be settled, such as what you would do with ambiguous
> amino acids (e.g. X or J), making a general purpose
> back-translate rather complex.
>
> Last time this was discussed on the mailing list, the real
> use case was back-translation as used with protein to
> nucleotide alignment, where the sequence is known
> and just the gaps need inserting appropriately. e.g.
> https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans
>
> Regards,
>
> Peter

From p.j.a.cock at googlemail.com Mon Apr 7 11:12:29 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 7 Apr 2014 16:12:29 +0100
Subject: [Biopython] Back translation from Protein to RNA sequence
In-Reply-To:
References:
Message-ID:

On Mon, Apr 7, 2014 at 3:20 PM, Ivan Gregoretti wrote:
> Hello Peter,
>
> I would benefit from the availability of a back-translation tool for
> practical reasons.
>
> In our case, part of my team is designing peptides. They asked me if I
> had a Python tool to create the corresponding DNA so that they could
> design and order expression vectors.
>
> As simple as that. I would not intend to use this tool out of context
> and I fully understand that a codon bias table would be necessary for
> each species.

OK, so in your case you specifically want non-ambiguous
codons (just using A, C, G, T and not trying to capture the
wobble codon or other many-to-one mapping possibilities
with IUPAC codes like N), and when picking each codon
you'd probably like this to be based on tRNA levels in your
target organism to maximise efficiency.

It should be simple to write a special case function for
your needs with a dictionary mapping each amino acid to
a preferred codon (for each target organism).

> I will leave it as an open question, then, in case somebody has
> written a programme within the scope of our need. I'll explore the
> pico_galaxy link you sent me nonetheless.

It wouldn't be relevant to your use-case.
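
The special case function described above (a dictionary mapping each
amino acid to one preferred codon) might look like the sketch below.
This is only a minimal illustration, not Biopython functionality: the
names PREFERRED_CODON and back_translate are hypothetical, and the codon
listed for each residue is a placeholder that should be replaced with the
preferred codon from a real codon usage table for the target organism.
Ambiguous residues such as X or J are rejected rather than guessed.

```python
# Hypothetical preferred-codon table (DNA alphabet, standard genetic code).
# ASSUMPTION: the codon chosen for each residue is an illustrative
# placeholder, not measured usage data for any particular organism.
PREFERRED_CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
    "*": "TAA",  # stop
}


def back_translate(protein, table=PREFERRED_CODON):
    """Return one unambiguous DNA sequence encoding the given protein.

    Each residue is replaced by its single preferred codon from `table`.
    Residues missing from the table (for example X, B, Z or J) raise
    ValueError instead of being silently guessed.
    """
    codons = []
    for position, residue in enumerate(str(protein).upper(), start=1):
        if residue not in table:
            raise ValueError(
                "No preferred codon for %r at position %i" % (residue, position)
            )
        codons.append(table[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MKW"))  # ATGAAATGG
```

Because each amino acid maps to exactly one codon, the output is
deterministic, which sidesteps the tie-breaker question raised earlier
in this thread. Supporting several organisms would mean one such
dictionary per organism.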

Peter

From tra at popgen.net Tue Apr 8 11:22:21 2014
From: tra at popgen.net (Tiago Antao)
Date: Tue, 8 Apr 2014 16:22:21 +0100
Subject: [Biopython] Job: Bioinformatics/PopGen of Disease vectors (Tropical Medicine)
Message-ID: <20140408162221.41a107e3@lnx>

Dear all,

We currently have a position here for a bioinformatician with a good
population genetics background.

The Liverpool School of Tropical Medicine works with neglected tropical
diseases and in our department we are mainly focused on disease vectors
(mosquitoes, flies, ...). The postdoc accepting this position would be
the first in the group (and the second in the department) doing
bioinformatics, thus with a lot of freedom to choose their preferred
tools. So, if you are a Biopythoneer you would most probably be able to
use (Bio)python.

Liverpool is one of the cheapest cities in the UK and has been ranked as
one of the top 5 big cities in the UK to live in:
http://www.telegraph.co.uk/news/uknews/10386993/Bristol-is-best-city-to-live-in-the-UK.html

As to the position, you can find more details here:
http://www.jobs.ac.uk/job/AIL535/post-doctoral-research-assistant-in-bioinformatics-and-population-genetics/

The page of Charles (the PI) can be found here:
http://www.lstmliverpool.ac.uk/research/departments/staff-profiles/charles-wondji

For questions, please do not hesitate to contact Charles directly (or me
if you prefer, but I am not directly related to the position).

Tiago

From michael.shaffer at ucdenver.edu Tue Apr 8 17:05:01 2014
From: michael.shaffer at ucdenver.edu (Mike Shaffer)
Date: Tue, 8 Apr 2014 15:05:01 -0600
Subject: [Biopython] Installation Problems in Mavericks
Message-ID:

Hello,

I am attempting to install Biopython and I am getting this error. I have
installed the Xcode command line tools and verified that they are
installed using pkgutil. I found some people saying that this problem
was solved by installing the Xcode command line tools, but this didn't work.
Some seemed to say that this was just because of the new version of
Apple's compiler. Any workarounds or tips would be greatly appreciated.
Full output from running setup.py build is below:

python setup.py build
running build
running build_py
running build_ext
building 'Bio.cpairwise2' extension
cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o
clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]
clang: note: this will be a hard error (cannot be downgraded to a warning) in the future
error: command 'cc' failed with exit status 1

From arklenna at gmail.com Tue Apr 8 17:12:25 2014
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 8 Apr 2014 17:12:25 -0400
Subject: [Biopython] Installation Problems in Mavericks
In-Reply-To:
References:
Message-ID:

This has been discussed on the dev list:
http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011131.html

There are several possible workarounds, which are enumerated in various
answers on this stackoverflow question:
http://stackoverflow.com/questions/22313407/

Cheers,

Lenna

From tc9 at sanger.ac.uk Tue Apr 8 17:24:27 2014
From: tc9 at sanger.ac.uk (Tommy Carstensen)
Date: Tue, 8 Apr 2014 21:24:27 +0000
Subject: [Biopython] random access to bgz file
Message-ID: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>

I read the Biopython tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html

It does not explain how to do random access to a bgz file. Can someone point me to a tutorial on how to do this? Thank you.

Best wishes,
Tommy

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From p.j.a.cock at googlemail.com Wed Apr 9 04:54:55 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Apr 2014 09:54:55 +0100
Subject: [Biopython] random access to bgz file
In-Reply-To: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>
References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>
Message-ID:

Hi Tommy,

This isn't covered in the tutorial, but the module's built in
help is quite extensive (the docstrings).
Try:

from Bio import bgzf
help(bgzf)

Or, the HTML rendered version:
http://biopython.org/DIST/docs/api/Bio.bgzf-module.html

(Note to self - that could be made prettier by checking
the markup works, rather than treating it as plain text)

Or, read the source on GitHub etc:
https://github.com/biopython/biopython/blob/master/Bio/bgzf.py

Essentially, like any other Python handle use the seek
and tell methods - however the offsets are BGZF virtual
offsets which are ordered but you CANNOT do offset
arithmetic on them. See also:
http://blastedbio.blogspot.co.uk/2011/11/bgzf-blocked-bigger-better-gzip.html

Peter

On Tue, Apr 8, 2014 at 10:24 PM, Tommy Carstensen wrote:
> I read the Biopython tutorial:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
>
> It does not explain how to do random access to a bgz file. Can someone point me to a tutorial on how to do this? Thank you.
>
> Best wishes,
> Tommy
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From tc9 at sanger.ac.uk Wed Apr 9 13:35:23 2014
From: tc9 at sanger.ac.uk (tc9)
Date: Wed, 09 Apr 2014 18:35:23 +0100
Subject: [Biopython] random access to bgz file
In-Reply-To:
References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>
Message-ID:

Peter, thanks for the link to the HTML version of the bgzf documentation.
Here are some additional details.

I am trying to do random access on a bgzipped haplotype/HAPS file.
Here is the file format description:
https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#hapsample

I compressed the haps file with bgzip:

zcat file.haps.gz | bgzip > file.haps.bgz

I know the byte position of each newline after decompression, but I need
the block offsets to go from a decompressed position to a virtual offset.
Trying to get the block offsets like this fails:

import Bio
handle = Bio.bgzf.open('file.haps.bgz')
for values in Bio.bgzf.BgzfBlocks(handle):
    print("Raw start %i, raw length %i; data start %i, data length %i" % values)

I get this error message:

    for values in Bio.bgzf.BgzfBlocks(handle):
  File "/software/team149/lib/python3.3/site-packages/Bio/bgzf.py", line 392, in BgzfBlocks
    block_length, data = _load_bgzf_block(handle)
  File "/software/team149/lib/python3.3/site-packages/Bio/bgzf.py", line 407, in _load_bgzf_block
    % (_bgzf_magic, magic, handle.tell()))
ValueError: A BGZF (e.g. a BAM file) block should start with b'\x1f\x8b\x08\x04', not b'1:10'; handle.tell() now says 4

How can I get the block offsets, so I can access a random byte/line of my choice?

On 2014-04-09 09:54, Peter Cock wrote:
> Hi Tommy,
>
> This isn't covered in the tutorial, but the module's built in
> help is quite extensive (the docstrings). Try:
>
> from Bio import bgzf
> help(bgzf)
>
> Or, the HTML rendered version:
> http://biopython.org/DIST/docs/api/Bio.bgzf-module.html [3]
>
> (Note to self - that could be made prettier by checking
> the markup works, rather than treating it as plain text)
>
> Or, read the source on GitHub etc:
> https://github.com/biopython/biopython/blob/master/Bio/bgzf.py [4]
>
> Essentially, like any other Python handle use the seek
> and tell methods - however the offsets are BGZF virtual
> offsets which are ordered but you CANNOT do offset
> arithmetic on them.
> See also:
> http://blastedbio.blogspot.co.uk/2011/11/bgzf-blocked-bigger-better-gzip.html [5]
>
> Peter
>
> On Tue, Apr 8, 2014 at 10:24 PM, Tommy Carstensen wrote:
>
>> I read the Biopython tutorial:
>> http://biopython.org/DIST/docs/tutorial/Tutorial.html [1]
>>
>> It does not explain how to do random access to a bgz file. Can someone point me to a tutorial on how to do this? Thank you.
>>
>> Best wishes,
>> Tommy
>>
>> _______________________________________________
>> Biopython mailing list - Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython [2]

Links:
------
[1] http://biopython.org/DIST/docs/tutorial/Tutorial.html
[2] http://lists.open-bio.org/mailman/listinfo/biopython
[3] http://biopython.org/DIST/docs/api/Bio.bgzf-module.html
[4] https://github.com/biopython/biopython/blob/master/Bio/bgzf.py
[5] http://blastedbio.blogspot.co.uk/2011/11/bgzf-blocked-bigger-better-gzip.html

From p.j.a.cock at googlemail.com Wed Apr 9 17:00:38 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Apr 2014 22:00:38 +0100
Subject: [Biopython] random access to bgz file
In-Reply-To:
References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>
Message-ID:

On Wed, Apr 9, 2014 at 6:35 PM, tc9 wrote:
>
> Peter, thanks for link to html version of the bgzf documentation. Here
> some additional details.
>
> I am trying to do random access on a bgzipped haplotype/HAPS file.
> Here file format description:
>
> https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#hapsample
>
> I compressed the haps file with bgzip:
>
> zcat file.haps.gz | bgzip > file.haps.bgz
>
> I know the byte position of each newline after decompression,
> but I need the block offsets to go from a decompressed position
> to a virtual offset.

Not necessarily - all you need is the virtual offset which
handle.tell() would give you.

How did you get the positions in the decompressed file?
Can you not repeat that indexing but using the virtual offsets
via the BGZF handle? The big advantage is you just use the
virtual offsets without having to know how they are calculated.

If you really want to map from decompressed offsets to
virtual offsets, you will need not only the raw start offset of
each block, but also the decompressed size of each block
(often 64 KB, but it can be less).

> Trying to get the block offsets like this fails:
>
> import Bio
> handle = Bio.bgzf.open('file.haps.bgz')
> for values in Bio.bgzf.BgzfBlocks(handle):
>     print("Raw start %i, raw length %i; data start %i, data length %i" %
> values)

The BgzfBlocks function (which was intended for
low level debugging originally) wants a raw handle
(which should be opened in binary mode). I concede
its docstring doesn't say that (yet) but its example
shows this.
Try:

from Bio import bgzf
for values in bgzf.BgzfBlocks(open('file.haps.bgz', 'rb')):
    print("Raw start %i, raw length %i; data start %i, data length %i" % values)

Peter

From p.j.a.cock at googlemail.com Wed Apr 9 17:11:46 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Apr 2014 22:11:46 +0100
Subject: [Biopython] random access to bgz file
In-Reply-To:
References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>
Message-ID:

On Wed, Apr 9, 2014 at 10:00 PM, Peter Cock wrote:
> On Wed, Apr 9, 2014 at 6:35 PM, tc9 wrote:
>>
>> Trying to get the block offsets like this fails:
>>
>> import Bio
>> handle = Bio.bgzf.open('file.haps.bgz')
>> for values in Bio.bgzf.BgzfBlocks(handle):
>>     print("Raw start %i, raw length %i; data start %i, data length %i" %
>> values)
>
> The BgzfBlocks function (which was intended for
> low level debugging originally) wants a raw handle
> (which should be opened in binary mode). I concede
> its docstring doesn't say that (yet) but its example
> shows this. Try:
>
> from Bio import bgzf
> for values in bgzf.BgzfBlocks(open('file.haps.bgz', 'rb')):
>     print("Raw start %i, raw length %i; data start %i, data length %i" % values)
>
> Peter

Hi again Tommy,

I have tried to clarify the BgzfBlocks docstring for the next release:
https://github.com/biopython/biopython/commit/44e943fd5c1e1a2ee6d8520eb01ab5e8114b1b56

Please keep the questions coming - your feedback is very
useful - e.g. the context manager omission you reported earlier
(off list):
https://github.com/biopython/biopython/commit/a669757305962202516a192d16166eb0870d8ebe

Thanks,

Peter

From asmariyaz23 at gmail.com Thu Apr 10 14:29:24 2014
From: asmariyaz23 at gmail.com (Asma Riyaz)
Date: Thu, 10 Apr 2014 14:29:24 -0400
Subject: [Biopython] Phylo Tree: Need to align Taxa for visual representation
Message-ID:

Hi,

I am using the Bio.Phylo package to display a tree, and I am having problems
representing it the way I want it to be.

Here is my code:

gs=gridspec.GridSpec(1, 2,height_ratios=[1,1,-2,2]
,width_ratios=[1,1,-2,2],hspace=0,wspace=0)
phyl_ax=plt.subplot(gs[0])
Phylo.draw(tree, axes=phyl_ax, do_show=False,show_confidence=False)

With the above code I am able to produce wrong.png.
I would like the tree to be displayed similarly to correct.png (I got this
from third-party software, MEGA) and want to automate the process (hence Phylo).

I have tried several rcParams settings, the line settings specifically,
but with no success.

Appreciate any help provided.
Asma
-------------- next part --------------
A non-text attachment was scrubbed...
Name: correct.png
Type: image/png
Size: 9950 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wrong.png
Type: image/png
Size: 11667 bytes
Desc: not available
URL:

From eric.talevich at gmail.com Thu Apr 10 15:45:58 2014
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 10 Apr 2014 12:45:58 -0700
Subject: [Biopython] Phylo Tree: Need to align Taxa for visual representation
In-Reply-To:
References:
Message-ID:

Asma,

The tree style you want is called an ultrametric tree. The difference is
that all branch lengths are 1 (the default value) in "wrong.png", but in
"correct.png" every tip is at the same total distance from the root.

You could set these branch lengths programmatically to make the sums work
out right; Bio.Phylo doesn't currently implement it. Or you could use
another tree visualization program, like Archaeopteryx (
https://sites.google.com/site/cmzmasek/home/software/archaeopteryx).

-Eric

On Thu, Apr 10, 2014 at 11:29 AM, Asma Riyaz wrote:
> Hi,
>
> I am using Bio.Phylo package to display a tree, and I am having problems
> representing it the way I want it to be.
>
> Here is my code:
>
> gs=gridspec.GridSpec(1, 2,height_ratios=[1,1,-2,2]
> ,width_ratios=[1,1,-2,2],hspace=0,wspace=0)
> phyl_ax=plt.subplot(gs[0])
> Phylo.draw(tree, axes=phyl_ax, do_show=False,show_confidence=False)
>
>
> With the above code I am able to produce, wrong.png
> I would like the tree to be displayed similar to correct.png (I got this of
> a 3rd party software MEGA) and want to automate the process (hence Phylo)
>
>
> I have tried several of rcParams settings with line specifically but no
> success.
>
> Appreciate any help provided.
> Asma
>

From mathera at gmail.com Fri Apr 11 04:28:57 2014
From: mathera at gmail.com (Andrew Mather)
Date: Fri, 11 Apr 2014 18:28:57 +1000
Subject: [Biopython] Install problems (Numpy problem ?)
Message-ID:

Hi,

I'm attempting to install 1.63 from a git cloned directory into Python 2.7.

Numpy 1.8.0 appears to have installed correctly and can be imported at
the prompt.

However the Biopython build fails with the message below:

gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python2.7/site-packages/numpy/core/include -I/usr/local/include/python2.7 -c Bio/Cluster/clustermodule.c -o build/temp.linux-x86_64-2.7/Bio/Cluster/clustermodule.o
In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:4,
                 from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17,
                 from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from Bio/Cluster/clustermodule.c:3:
/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_common.h:114:10: error: #error Unsupported size for type off_t
error: command 'gcc' failed with exit status 1

Any advice would be gratefully received, as we're in the middle of
racing to commission a new system and running out of time.

Thanks,
Andrew

--
-
http://surfcoast.redbubble.com | https://picasaweb.google.com/107747436224613508618
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
"Unless someone like you, cares a whole awful lot, nothing is going to
get better...It's not !" - The Lorax
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
A committee is a cul-de-sac, down which ideas are lured and then
quietly strangled.
Sir Barnett Cocks
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
"A mind is like a parachute. It doesnt work if it's not open." :- Frank Zappa
-

From anaryin at gmail.com Fri Apr 11 04:57:48 2014
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 11 Apr 2014 10:57:48 +0200
Subject: [Biopython] Install problems (Numpy problem ?)
In-Reply-To:
References:
Message-ID:

Hi Andrew,

My experience with numpy is that even when it installs and imports
without errors, it is sometimes still not working properly. Try using
something within numpy, like ndarray, and see if it works.

import numpy
na = numpy.ndarray(shape=(2,2), dtype=float, order='F')

I would debug this first and then move on to a biopython issue.

Cheers,

João

2014-04-11 10:28 GMT+02:00 Andrew Mather :
> Hi,
>
> I'm attempting to install 1.63 from a git cloned directory into Python 2.7.
>
> Numpy 1.8.0 appears to have installed correctly and can be imported at
> the prompt.
>
> However the BioPython build fails with the message below:
>
> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes -fPIC
> -I/usr/local/lib/python2.7/site-packages/numpy/core/include
> -I/usr/local/include/python2.7 -c Bio/Cluster/clustermodule.c -o
> build/temp.linux-x86_64-2.7/Bio/Cluster/clustermodule.o
> In file included from
>
> /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:4,
> from
>
> /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17,
> from
>
> /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
> from Bio/Cluster/clustermodule.c:3:
>
> /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_common.h:114:10:
> error: #error Unsupported size for type off_t
> error: command 'gcc' failed with exit status 1
>
> Any advice would be gratefully received, as we're in the middle of
> racing to commission a new system and running out of time.
>
> Thanks,
> Andrew
>
>
> --
> -
> http://surfcoast.redbubble.com |
> https://picasaweb.google.com/107747436224613508618
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> "Unless someone like you, cares a whole awful lot, nothing is going to
> get better...It's not !" - The Lorax
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> A committee is a cul-de-sac, down which ideas are lured and then
> quietly strangled.
> Sir Barnett Cocks
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> "A mind is like a parachute. It doesnt work if it's not open."
> :- Frank Zappa
> -

From devaniranjan at gmail.com Fri Apr 11 11:58:52 2014
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Fri, 11 Apr 2014 11:58:52 -0400
Subject: [Biopython] SVDSuperimposer()
Message-ID:

I was wondering if there is a faster way to do the following.

I am minimizing a protein structure and one of the "measurements" is that
the minimized structure be as close to the starting structure as possible.

Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD
difference.

When I checked the various "energy terms" that are used to evaluate the
structure, I found that the bottleneck is indeed
SVDSuperimposer.SVDSuperimposer().

Is there a way to do this more efficiently?

Thank you

From anaryin at gmail.com Fri Apr 11 12:11:15 2014
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 11 Apr 2014 18:11:15 +0200
Subject: [Biopython] SVDSuperimposer()
In-Reply-To:
References:
Message-ID:

Hey George,

What do you mean by bottleneck? In terms of speed?

You can always use ProFit, for example, to calculate RMSDs between the
models. It's a bit faster than our module.

Cheers,

João

2014-04-11 17:58 GMT+02:00 George Devaniranjan :
> I was wondering if there is a faster way to do the following.
>
>
> I am minimizing a protein structure and one of the 'measurements" is that
> the minimized structure be as close to the starting value as possible.
>
>
> Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD
> difference.
>
>
> When I checked the various "energy terms" that are used to evaluate the
> structure I find that the bottleneck is
> indeed SVDSuperimposer.SVDSuperimposer().
>
>
> Is there a way to do this more efficiently ?
>
>
> Thank you
>

From devaniranjan at gmail.com Fri Apr 11 12:16:36 2014
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Fri, 11 Apr 2014 12:16:36 -0400
Subject: [Biopython] SVDSuperimposer()
In-Reply-To:
References:
Message-ID:

Hi João,

OK, this is what I do: I use a conjugate gradient minimizer that adjusts
the phi and psi angles of the residues to move the residues around.
Of course, small changes in these angles can result in large displacements
further down the chain.

I know that the "bad" structure I start with is "close" to what is
expected, so as I continue to minimize (using the minimizer in cycles) I
use the RMSD difference as a test to ensure that the "better" structure,
while energetically better than the starting one, doesn't look totally
different from where I started.

I will look at ProFit; I have never tried that. Thank you very much for
the suggestion.

George

On Fri, Apr 11, 2014 at 12:11 PM, João Rodrigues wrote:
> Hey George,
>
> What do you mean by bottleneck? In terms of speed?
>
> You can always use Profit for example to calculate RMSDs between the
> models. It's a bit faster than our module.
>
> Cheers,
>
> João
>
>
> 2014-04-11 17:58 GMT+02:00 George Devaniranjan :
>
>> I was wondering if there is a faster way to do the following.
>>
>>
>> I am minimizing a protein structure and one of the 'measurements" is that
>> the minimized structure be as close to the starting value as possible.
>>
>>
>> Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD
>> difference.
>>
>>
>> When I checked the various "energy terms" that are used to evaluate the
>> structure I find that the bottleneck is
>> indeed SVDSuperimposer.SVDSuperimposer().
>>
>>
>> Is there a way to do this more efficiently ?
>> >> >> Thank you >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > From devaniranjan at gmail.com Fri Apr 11 12:22:06 2014 From: devaniranjan at gmail.com (George Devaniranjan) Date: Fri, 11 Apr 2014 12:22:06 -0400 Subject: [Biopython] SVDSuperimposer() In-Reply-To: References: Message-ID: Oh, sorry-yes I meant the speed. On Fri, Apr 11, 2014 at 12:11 PM, Jo?o Rodrigues wrote: > Hey George, > > What do you mean by bottleneck? In terms of speed? > > You can always use Profit for example to calculate RMSDs between the > models. It's a bit faster than our module. > > Cheers, > > Jo?o > > > 2014-04-11 17:58 GMT+02:00 George Devaniranjan : > >> I was wondering if there is a faster way to do the following. >> >> >> I am minimizing a protein structure and one of the 'measurements" is that >> the minimized structure be as close to the starting value as possible. >> >> >> Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD >> difference. >> >> >> When I checked the various "energy terms" that are used to evaluate the >> structure I find that the bottleneck is >> indeed SVDSuperimposer.SVDSuperimposer(). >> >> >> Is there a way to do this more efficiently ? >> >> >> Thank you >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > From anaryin at gmail.com Fri Apr 11 16:37:32 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Fri, 11 Apr 2014 22:37:32 +0200 Subject: [Biopython] SVDSuperimposer() In-Reply-To: References: Message-ID: Hi George, Sorry for the delay in the answer.. Are you doing the minimization using Biopython? That's the only way I see in which the SVDSuperimposer is a bottleneck. In any case, the SVD code is written in C, so it should be pretty fast. 
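One way to check where the time actually goes is the standard-library profiler. The following is only a minimal sketch: superimpose_and_score, other_energy_terms and evaluate_structure are hypothetical stand-ins for the in-house energy code discussed in this thread (they do no real superposition), and in practice the real energy evaluation would be passed to profile_call instead.

```python
import cProfile
import io
import pstats


def superimpose_and_score(n):
    """Hypothetical stand-in for the expensive fitting/RMSD step."""
    return sum(i * i for i in range(n)) ** 0.5


def other_energy_terms(n):
    """Hypothetical stand-in for the cheaper energy terms."""
    return sum(range(n))


def evaluate_structure():
    """One evaluation of the made-up energy function."""
    return superimpose_and_score(200000) + other_energy_terms(1000)


def profile_call(func):
    """Run func under cProfile and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_call(evaluate_structure)
print(report)
```

The report lists each function with its call count and cumulative time, which should make it clearer whether the fitting itself or the surrounding bookkeeping (atom selection, array building) dominates.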
Can you identify precisely where the bottleneck is (atom selection,
fitting, calculation, etc)?

Anyway, I would suggest looking into some weak position restraints on the
heavy atoms of the backbone to keep things sort of in place. This would
avoid the RMSD calculations at every step (I guess?), instead just a simple
harmonic potential calculation added to the energy function.

Cheers,

João

From devaniranjan at gmail.com  Fri Apr 11 16:43:23 2014
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Fri, 11 Apr 2014 16:43:23 -0400
Subject: [Biopython] SVDSuperimposer()
In-Reply-To:
References:
Message-ID:

Thank you João,

No, I am using an in-house code written in python but in that code I use
SVDSuperimposer() as well.
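The weak position restraint suggested in this thread could look roughly like the sketch below. It is an illustration under stated assumptions, not anyone's actual implementation: harmonic_restraint_energy is a hypothetical helper, coordinates are plain lists of (x, y, z) tuples, the force constant k is an arbitrary value, and both coordinate sets are assumed to be in the same frame, since no superposition is done. The energy used here is E = k * sum(|x_i - x0_i|^2).

```python
def harmonic_restraint_energy(coords, reference, k=1.0):
    """Harmonic penalty for atoms drifting away from reference positions.

    coords and reference are equal-length lists of (x, y, z) tuples that
    are assumed to share one coordinate frame; k is an illustrative
    force constant, not a recommended value.
    """
    if len(coords) != len(reference):
        raise ValueError("coords and reference must have the same length")
    energy = 0.0
    for (x, y, z), (x0, y0, z0) in zip(coords, reference):
        # Squared displacement of this atom from its reference position.
        energy += k * ((x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2)
    return energy


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 2.0, 0.0)]
print(harmonic_restraint_energy(moved, reference, k=0.5))
```

Because the term is a simple sum over atoms, it can be added to an energy function without fitting the two structures at every step; the trade-off is that it penalizes rigid-body motion as well as internal changes.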
The "energy function" that is minimized has various terms, e.g. a steric
clash is highly disfavored, hydrogen bonds are favored, etc.

Let me try your second suggestion and see if that helps.

From arklenna at gmail.com  Fri Apr 11 23:41:36 2014
From: arklenna at gmail.com (Lenna Peterson)
Date: Fri, 11 Apr 2014 23:41:36 -0400
Subject: [Biopython] Installation Problems in Mavericks
In-Reply-To:
References:
Message-ID:

Hi Mike,

Perhaps the XCode update has fixed this?

Cheers,

Lenna

On Tue, Apr 8, 2014 at 5:12 PM, Lenna Peterson wrote:

> This has been discussed on the dev list:
> http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011131.html
>
> There are several possible workarounds, which are enumerated in various
> answers on this stackoverflow question:
> http://stackoverflow.com/questions/22313407/
>
> Cheers,
>
> Lenna

From tc9 at sanger.ac.uk  Mon Apr 14 13:29:35 2014
From: tc9 at sanger.ac.uk (tc9)
Date: Mon, 14 Apr 2014 18:29:35 +0100
Subject: [Biopython] random access to bgz file
In-Reply-To:
References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>
Message-ID: <9ac19a0cde00f3cbbcca33c6179a24ca@sanger.ac.uk>

On 2014-04-09 22:00, Peter Cock wrote:

> On Wed, Apr 9, 2014 at 6:35 PM, tc9 wrote:
>
>> Peter, thanks for link to html version of the bgzf documentation.
>> Here some additional details. I am trying to do random access on a
>> bgzipped haplotype/HAPS file. Here file format description:
>> https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#hapsample [1]
>> I compressed the haps file with bgzip:
>>
>>     zcat file.haps.gz | bgzip > file.haps.bgz
>>
>> I know the byte position of each newline after decompression, but I
>> need the block offsets to go from a decompressed position to a
>> virtual offset.
>
> Not necessarily - all you need is the virtual offset which
> handle.tell() would give you. How did you get the positions
> in the decompressed file?
> Can you not repeat that indexing
> but using the virtual offsets via the BGZF handle? The
> big advantage is you just use the virtual offsets without
> having to know how they are calculated.
>
> If you really want to map from decompressed offsets to
> virtual offsets, you will need both the raw start offset of
> each block, but also the decompressed size of each
> block (often 64kb, but it can be less).

Initially I got the byte positions in the decompressed stream by reading
the entire thing once with gzip.open(). I re-read the compressed file with
BgzfReader and got the virtual offset of line number 1 million and was able
to seek that line with BgzfReader much faster than I could have done with
gzip.open(). See solution below, which I will post to a question of mine on
stackoverflow.com.

    from Bio import bgzf

    # Open the BGZF file written by bgzip above, not the plain gzip file.
    filename = 'file.haps.bgz'

    # First pass: skip one million lines, then record the virtual offset
    # of the line that follows.
    handle = bgzf.BgzfReader(filename)
    for i in range(10**6):
        handle.readline()
    virtual_offset = handle.tell()
    line1 = handle.readline()
    handle.close()

    # Second pass: jump straight to that line using the virtual offset.
    handle = bgzf.BgzfReader(filename)
    handle.seek(virtual_offset)
    line2 = handle.readline()
    handle.close()

    assert line1 == line2

For completeness I want to mention that one can do:

    block_start_offset, within_block_offset = bgzf.split_virtual_offset(virtual_offset)
    virtual_offset = bgzf.make_virtual_offset(block_start_offset, within_block_offset)

P.S. I was without a stable internet connection for a few days. Hence the
slow reply. Thanks for the help!

Links:
------
[1] https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#hapsample

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From csaba.kiss at lanl.gov  Tue Apr 15 10:58:22 2014
From: csaba.kiss at lanl.gov (Csaba Kiss)
Date: Tue, 15 Apr 2014 08:58:22 -0600
Subject: [Biopython] python advice needed
Message-ID: <534D490E.9040604@lanl.gov>

Hi!
I need some advice on how to get better at Python. I have written a
software package to analyze antibody deep sequencing data. This was my
first experience with Python and I am not a programmer. The end result
works; however, if a professional coder looks at the scripts, it is obvious
that they were written by an amateur. I am planning to rewrite the code
into a better format that is extendable and more user and coder friendly.
At the moment the script only relies on Biopython to get the sequences and
quality values out of sff and fastq files; the rest is custom written. I
would like to rely more on Biopython and also perhaps extend Biopython with
new features.

The problem I am having is with object-oriented Python and classes. I
understand the concept of both, but it's completely different to actually
use them. I would like to ask for help from scientists who are in a similar
situation to mine. I am a molecular biologist with an interest in coding,
but little background. Do you know any good tutorials or books about Python
classes and OOP? For example, when I learned Python I found the Google
Python class extremely valuable. I practically looked at the videos and
solved the problems and that sent me on my way to Python:
https://developers.google.com/edu/python/?csw=1

Any help would be appreciated:
Csaba

-- 
Best Regards:
Csaba Kiss PhD, MSc, BSc
TA-43, HRL-1, MS888
Los Alamos National Laboratory
Work: 1-505-667-9898
Cell: 1-505-920-5774

From devaniranjan at gmail.com  Tue Apr 15 12:16:02 2014
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Tue, 15 Apr 2014 12:16:02 -0400
Subject: [Biopython] python advice needed
In-Reply-To: <534D490E.9040604@lanl.gov>
References: <534D490E.9040604@lanl.gov>
Message-ID:

I wouldn't worry about it Csaba - it will come in time. I started in Python
from C and at the beginning wrote "function style" code. After a "long"
time "need" made it necessary to start with classes and I use both.
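To make the "function style" versus classes distinction a little more concrete, here is a small sketch of the same task written both ways. Everything in it is hypothetical and chosen only for illustration: mean_quality, the Read class, and the example read are not taken from Biopython or from the package described in this thread.

```python
# Function style: the data is passed around explicitly.
def mean_quality(qualities):
    """Average of a list of per-base quality scores."""
    return sum(qualities) / len(qualities)


# Class style: the data and the operations on it travel together.
class Read:
    """Hypothetical container for one sequencing read."""

    def __init__(self, name, sequence, qualities):
        self.name = name
        self.sequence = sequence
        self.qualities = qualities

    def mean_quality(self):
        """Average quality of this read."""
        return sum(self.qualities) / len(self.qualities)

    def passes(self, threshold):
        """True if the mean quality reaches the given threshold."""
        return self.mean_quality() >= threshold


read = Read("read_1", "ACGT", [30, 20, 40, 30])
print(mean_quality(read.qualities))
print(read.mean_quality())
print(read.passes(25))
```

Neither style is wrong. A class tends to pay off once several functions keep taking the same group of arguments, which is often the point at which "need" shows up.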
As for the code looking "good" to a programmer, sorry if I sound cynical,
but I would rather use "amateur" code than "professional" code, as I often
find the latter harder to decipher (even with comments) than an amateur's
attempt.

Good luck.

From kevin.rue at ucdconnect.ie  Tue Apr 15 12:27:22 2014
From: kevin.rue at ucdconnect.ie (Kevin Rue)
Date: Tue, 15 Apr 2014 17:27:22 +0100
Subject: [Biopython] python advice needed
In-Reply-To: <534D490E.9040604@lanl.gov>
References: <534D490E.9040604@lanl.gov>
Message-ID:

Hi Csaba,

Well done! I witness every day in my research group that the transition
from fundamental biology to bioinformatics is not a straightforward
process. Congratulations on your first successful experience.

To give some context to my answer, let me tell you that I am a 3rd year
PhD student trained in bioinformatics for the past 6 years (since my
Master's Degree). Python is the first programming language I was taught
during my Master's Degree (a tiny amount of Matlab in practicals of math
before that), and I was taught the object-oriented programming aspect
through classes of the Java programming language.

I am glad that you managed to teach yourself how to program in Python
through online resources. However, I think that going to actual classes can
ease the learning curve a lot, particularly at the beginning, and for new
topics such as object-oriented programming. The interactive Q&A with the
demonstrator, and the questions of other classmates, can help you rapidly
come across some common mistakes and tricks.
For instance, a post-doc in my lab is learning Python just like you, and I
have seen him rack his head for hours until I came along and pointed him in
the right direction (avoid giving a student an answer: "give someone food
and he'll eat for the day, teach them how to cook and they'll eat for the
rest of their life").

Meanwhile, it is always useful to have a book around; I have heard a lot of
good things about the O'Reilly books for that matter. They have Python
books for beginners, intermediate and high-performance programming
(http://shop.oreilly.com/category/browse-subjects/programming/python.do).

Now, if you allow me a few personal pieces of advice about programming
(valid for Python and most languages):

- "Always write pseudo-code first"
   - Pseudo-code is "an informal high-level description of the operating
     principle of a computer program or other algorithm" (Thanks
     Wikipedia, you just saved me 10 minutes to find my words)
   - In other words, before you even approach your "file.py" script, turn
     off the screen of your computer, take a piece of paper, and write
     down what your script is supposed to do, what input it will accept,
     what outputs it will generate. First in one sentence of plain
     English. Then break the sentence into subtasks. Then continue
     breaking each of these subtasks into smaller ones until you recognise
     small tasks that you feel confident to code in a reasonable number of
     lines.
   - The pseudo-code is extremely valuable for two reasons:
      - Avoid losing focus of what the script was originally intended to
        do. (once coding, it is quite easy to lose sight of the greater
        scheme)
      - It will help document your script, if you write a wiki or simply
        to comment your code (if you share it with someone else, they
        won't need to read the entire code to understand its purpose)
- "Draw your objects/classes"
   - Essentially, an object/class has a number of attributes (=variables)
     and methods (=functions). For each I typically draw a box entitled
     with the name of the class. Then in the box, I list the names of the
     attributes and the names of the methods. The names of the attributes
     and methods should clearly represent what they are meant to contain
     (attributes) or do (methods).
      - I still apply a rule that one of my earliest programming teachers
        taught us: "functions are meant to do stuff, therefore their name
        should always start with a verb of action"
- "Google is your friend"
   - That's a tricky one, but every time you know what you want to do but
     you don't know how on earth you can do it: Google your problem. You
     may have to browse a while, or try different search words, but in my
     experience "Any problem you find to write working and efficient code,
     someone else likely had the same problem before you". If you can
     clearly explain your problem, StackOverflow and other such websites
     may have the answer.
- Use a code versioning tool
   - All the changes you have done for the past week have made your script
     worse and you don't have a copy of last week's script? Version
     control tools such as git/GitHub and svn will help you keep track of
     what your code looked like along the way. This way, you can edit a
     script that is working to try and enhance it without the fear of
     messing it up. If it goes sour, you can just go back to the working
     script without having to keep a separate backup.
- Use a friendly (but still powerful) development environment
   - IDEs (integrated development environments) are software meant to
     make programming easier. A (silly?) example is a feature I cannot
     work without: auto-completion. Tired of typing the same long variable
     name over and over again? Once you have defined "variable=5" in your
     script, a decent IDE will allow you to type only "var" and open a
     friendly pop-up window suggesting all existing variables and methods
     starting with "var". Select the one you need with the arrow keys and
     hit TAB: you don't have to type the rest of the variable.
     An amusing side-effect of this is that your variable names will grow
     longer (and therefore be more explicit about what they contain). IDEs
     come with many more features including code checking, spell
     checking, ...
   - For Python I am very happy with PyCharm

This email ended up being much longer than I intended, but I hope you will
find it useful!
The learning curve to Python programming can be rough. Learning additional
tricks like version control, IDE, and object-oriented programming can make
it even steeper, but the end result is a very rewarding skillset that can
be helpful in many circumstances and appeal to many research group leaders
too!

Best of luck in your learning of Python!

Kevin

-- 
Kévin RUE-ALBRECHT
Wellcome Trust Computational Infection Biology PhD Programme
University College Dublin
Ireland
http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en

From ferreirafm at usp.br  Tue Apr 15 14:56:36 2014
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Tue, 15 Apr 2014 15:56:36 -0300
Subject: [Biopython] python advice needed
In-Reply-To: <534D490E.9040604@lanl.gov>
References: <534D490E.9040604@lanl.gov>
Message-ID: <534D80E4.4070100@usp.br>

Hi Csaba,

Have a look at these free books.
https://drive.google.com/file/d/0B9eRIc-w3cjVV0ZZSmxSUWZDdVU/edit?usp=sharing
They certainly will help you a lot.

Best,

-- 
Dr. Frederico Moraes Ferreira
University of Sao Paulo
Heart Institute, School of Medicine
Laboratory of Immunology
Av. Dr. Enéas de Carvalho Aguiar, 44
05403-900 Sao Paulo - SP
Brasil

From csaba.kiss at lanl.gov  Tue Apr 15 15:19:19 2014
From: csaba.kiss at lanl.gov (Csaba Kiss)
Date: Tue, 15 Apr 2014 13:19:19 -0600
Subject: [Biopython] python advice needed
In-Reply-To:
References: <534D490E.9040604@lanl.gov>
Message-ID: <534D8637.1040507@lanl.gov>

Thanks for the advice Kevin. If this was a forum, they should make your
post a sticky :). I use pycharm and really like it. However, using it
efficiently is also challenging.

Csaba

From kevin.rue at ucdconnect.ie  Tue Apr 15 15:40:44 2014
From: kevin.rue at ucdconnect.ie (Kevin Rue)
Date: Tue, 15 Apr 2014 20:40:44 +0100
Subject: [Biopython] python advice needed
In-Reply-To: <534D8637.1040507@lanl.gov>
References: <534D490E.9040604@lanl.gov> <534D8637.1040507@lanl.gov>
Message-ID:

Thanks, much appreciated!

Best of luck,
Kevin

On 15 April 2014 20:19, Csaba Kiss wrote:

> Thanks for the advice Kevin.
If this was a forum, they should make your > post a sticky :). I use pycharm and really like it. However, using it > efficiently is also challenging. > > Csaba > > On 4/15/2014 10:27 AM, Kevin Rue wrote: > > Hi Csaba, > > Well done! I witness everyday in my research group that the transition > from fundamental biology to bioinformatics is not a straightforward > process. Congratulations on your first successful experience. > > To give some context to my answer, let me tell you that I am a 3rd year > PhD student trained in bioinformatics for the past 6 years (since my > Master's Degree). Python is the first programming language I was taught > during my Master's Degree (a tiny amount of Matlab in practicals of math > before that), and I was taught the object-oriented programming aspect > through classes of the Java programming language. > > I am glad that you managed to teach yourself how to program in Python > through online resources. However, I think that going to actual classes can > ease the learning curve a lot, particularly at the beginning, and for new > topics such as object-oriented programming. The interactive Q&A with the > demonstrator, and the questions of other classmates can help rapidly come > across some common mistakes and tricks. For instance, a post-doc in my lab > is learning Python just like you, and I have seen him rack his head for > hours until I came along and pointed him in the right direction (avoid > giving a student an answer: "give someone food and he'll eat for the day, > teach them how to cook and they'll eat for the rest of their life"). > > Meanwhile, it is always useful to have a book around, I heard a lot of > good about the O'Reilly books for that matter. They have Python books for > beginners, intermediate and high-performance programming ( > http://shop.oreilly.com/category/browse-subjects/programming/python.do). 
> > > > Now, if you allow me a few personal pieces of advice about programming > (valid for Python and most languages): > > - "Always write pseudo-code first" > - Pseudo-code is "an informal high-level description > of the operating principle of a computer program or other > algorithm" (Thanks Wikipedia, you just saved me 10 minutes to find my wo > rds) > - In other words, before you even approach you "file.py" script, > turn off the screen of your computer, take a piece of paper, and write down > what your script is supposed to do, what input it will accept, what outputs > it will generate. First in one sentence of plain English. Then break the > sentence in subtasks. Then continue breaking each of these subtasks into > smaller ones until you recognise small tasks that you feel confident to > code in a reasonable number of lines. > - The pseudo-code is extremely valuable for two reasons: > - Avoid losing focus of what the script was originally intended > to do. (once coding, it is quite easy to lose sight of the greater scheme) > - It will help document your script, if you write a wiki or > simply to comment you code (if you share it with someone else, they won't > need to read the entire code to understand its purpose) > - "Draw your objects/classes" > - Essentially, an object/class has a number of attributes > (=variables) and methods (=functions). For each I typically draw a box > entitled with the name of the class. Then in the box, I list the names of > the attributes and the names of the methods. The names of the attributes > and methods should clearly represent what they are meant to contain > (attributes) or do (methods). 
> - I still apply a rule that one of my earliest programming > teacher taught us: "functions are meant to do stuff, therefore their name > should always start with a verb of action" > - "Google is your friend" > - That's a tricky one, but every time you know what you want to do > but you don't know how on earth you can do it: Google your problem. You may > have to browse a while, or try different search words, but in my experience > "Any problem you find to write working and efficient code, someone else > likely had the same problem before you". If you can clearly explain your > problem, StackOverflow and other such websites may have the answer. > - Use a code versioning tool > - All the changes you have done for the past week have made your > script worse and you don't have a copy of last week's script? Version > control tools such as git/GitHub and svn will help you keep track of what > your code looked like along the way. This way, you can edit a script that > is working to try and enhance it without the fear of messing it up. If it > goes sour, you can just go back to the working script without having to > keep a separate backup. > - Use a friendly (but still powerful) development environement > - IDE (Integrated development environement) are software which are > meant to make programming easier. A (silly?) example is a feature I cannot > work without: auto-completion. Tired of typing the same long variable name > over and over again? Once you have defined "variable=5" in your script, a > decent IDE will allow you to type only "var" and opens you a friendly > pop-up window suggesting you all existing variables and methods starting > with "var". Select the one you need with the arrow keys and hit TAB: you > don't have to type the rest of the variable. An amusing side-effect of this > is that your variable names will grow longer (and therefore be more > explicit about what they contain). 
IDE come with many more features > including code checking, spell checking, ... > - For Python I am very happy with PyCharm > > > > This email ended up to be much longer than I intended it, but I hope you > will find it useful ! > The learning curve to Python progamming can be rough. Learning additional > tricks like version control, IDE, and object-oriented programming can make > it even steeper, but the end result is a very rewarding skillset that can > be helpful in many circumstances and appeal to many research group leaders > too! > > Best of luck in your learning of Python ! > > Kevin > > > > > On 15 April 2014 15:58, Csaba Kiss wrote: > >> Hi! >> I need some advice how to get better in python. I have written a software >> package to analyze antibody deep sequencing data. This was my first >> experience with python and I am not a programmer. The end result works, >> however, if a professional coder looks at the scripts, it is obvious that >> it was written by an amateur. I am planning to re-write the code into a >> better format that is extendable and more user and coder friendly. At the >> moment the script only relies on biopython to get the sequences and quality >> values out of sff and fastq files, the rest is custom written. I would like >> to rely more on biopython and also perhaps extend biopython with new >> features. >> The problem I am having is object oriented python and classes. I >> understand the concept of both, but it's completely different to actually >> use it. I would like to ask help from scientist who are in a similar >> situation, as myself. I am a molecular biologist with interest in coding, >> but little background. Do you have any good tutorials books about python >> classes and OOP? For example, when I learned python I found the Google >> python class, extremely valuable. 
I practically looked at the videos and >> solved the problems and that sent me on my way to python: >> https://developers.google.com/edu/python/?csw=1 >> >> Any help would be appreciated: >> Csaba >> >> -- >> Best Regards: >> Csaba Kiss PhD, MSc, BSc >> TA-43, HRL-1, MS888 >> Los Alamos National Laboratory >> Work: 1-505-667-9898 >> Cell: 1-505-920-5774 >> >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > > > -- > K?vin RUE-ALBRECHT > Wellcome Trust Computational Infection Biology PhD Programme > University College Dublin > Ireland > http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en > > > -- > Best Regards: > Csaba Kiss PhD, MSc, BSc > TA-43, HRL-1, MS888 > Los Alamos National Laboratory > Work: 1-505-667-9898 > Cell: 1-505-920-5774 > > -- K?vin RUE-ALBRECHT Wellcome Trust Computational Infection Biology PhD Programme University College Dublin Ireland http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en From catfish at austin.utexas.edu Tue Apr 15 18:37:37 2014 From: catfish at austin.utexas.edu (Cannatella, David) Date: Tue, 15 Apr 2014 22:37:37 +0000 Subject: [Biopython] Biopython and OSX Mavericks problem. Message-ID: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> I've had the same problem as Mike Shaffer (8 April) on the discussion list (see my errors below). I had tried the possible solutions mentioned (including the export commands below), including those on the stackoverflow site, but none has worked so far. Some posted solutions have included installing other versions of python, etc., but this is not practical for several reasons. Is re-installing an earlier version of XCode a viable option for me? Or is it likely that there might be a fix in BioPython in the next month? Thanks in advance, Dave ========= export CPPFLAGS=-Qunused-arguments export CFLAGS=-Qunused-arguments =============== ... 
building 'Bio.cpairwise2' extension cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 ============= From arklenna at gmail.com Wed Apr 16 01:40:21 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Wed, 16 Apr 2014 01:40:21 -0400 Subject: [Biopython] Biopython and OSX Mavericks problem. In-Reply-To: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> References: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> Message-ID: On Tue, Apr 15, 2014 at 6:37 PM, Cannatella, David < catfish at austin.utexas.edu> wrote: > I've had the same problem as Mike Shaffer (8 April) on the discussion list > (see my errors below). > > I had tried the possible solutions mentioned (including the export > commands below), including those on the stackoverflow site, but none has > worked so far. > Have you tried the ARCHFLAGS option? http://stackoverflow.com/a/22372751 Are you installing with sudo? > > Some posted solutions have included installing other versions of python, > etc., but this is not practical for several reasons. > > Is re-installing an earlier version of XCode a viable option for me? > Apple is not particularly supportive of downgrading, so I imagine this could have unforeseen side effects. 
clang will have this behavior indefinitely; the problem is an incompatibility between clang (part of XCode) and the python version distributed with the OS. Hopefully there will be an OS update soon fixing the python version. > Or is it likely that there might be a fix in BioPython in the next month? > > Thanks in advance, > Dave > > ========= > export CPPFLAGS=-Qunused-arguments > export CFLAGS=-Qunused-arguments > > =============== > ... > building 'Bio.cpairwise2' extension > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g > -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a warning) > in the future > error: command 'cc' failed with exit status 1 > ============= > > > > > > > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From stephane.teletchea at inserm.fr Wed Apr 16 08:38:40 2014 From: stephane.teletchea at inserm.fr (=?ISO-8859-1?Q?T=E9letch=E9a_St=E9phane?=) Date: Wed, 16 Apr 2014 14:38:40 +0200 Subject: [Biopython] python advice needed In-Reply-To: <534D490E.9040604@lanl.gov> References: <534D490E.9040604@lanl.gov> Message-ID: <534E79D0.1080008@inserm.fr> Le 15/04/2014 16:58, Csaba Kiss a ?crit : > Hi! > I need some advice how to get better in python. I have written a > software package to analyze antibody deep sequencing data. 
This was my > first experience with python and I am not a programmer. The end result > works, however, if a professional coder looks at the scripts, it is > obvious that it was written by an amateur. I am planning to re-write > the code into a better format that is extendable and more user and > coder friendly. At the moment the script only relies on biopython to > get the sequences and quality values out of sff and fastq files, the > rest is custom written. I would like to rely more on biopython and > also perhaps extend biopython with new features. > The problem I am having is object oriented python and classes. I > understand the concept of both, but it's completely different to > actually use it. I would like to ask help from scientist who are in a > similar situation, as myself. I am a molecular biologist with interest > in coding, but little background. Do you have any good tutorials books > about python classes and OOP? For example, when I learned python I > found the Google python class, extremely valuable. I practically > looked at the videos and solved the problems and that sent me on my > way to python: > https://developers.google.com/edu/python/?csw=1 > > Any help would be appreciated: > Csaba > Dear Csaba, Being a bioinformatics teacher, I would first say that your code should first work :-) Second, in order to get another version of your code, as mentioned earlier, you should use a git-like versioning control tool (git or any other, git tends to be popular). 
Third, concerning python itself, I would recommend following the "PEP8"
recommendations:
http://legacy.python.org/dev/peps/pep-0008/

(I also found this page while searching for PEP8 ->
http://docs.python-guide.org/en/latest/writing/style/)

And last, since we are using biopython, check how biopython is implemented
(for example):
https://github.com/biopython/biopython/blob/master/Bio/AlignIO/Interfaces.py

Best,
Stéphane

--
Equipe DSIMB - Dynamique des Structures et
des Interactions des Macromolécules Biologiques
INTS, INSERM-Paris-Diderot UMR_S 1134
6 rue Alexandre Cabanel - 75739 Paris cedex 15 - France
Tél : +33 144 493 057
Fax : +33 147 347 431
http://www.dsimb.inserm.fr - http://www.steletch.org

From kevin.rue at ucdconnect.ie Wed Apr 16 11:09:53 2014
From: kevin.rue at ucdconnect.ie (Kevin Rue)
Date: Wed, 16 Apr 2014 16:09:53 +0100
Subject: [Biopython] python advice needed
In-Reply-To: <534E79D0.1080008@inserm.fr>
References: <534D490E.9040604@lanl.gov> <534E79D0.1080008@inserm.fr>
Message-ID:

Stephane,

"I would first say that your code should first work"

With all due respect, I would be careful how you phrased that, especially
when talking to a beginner in programming. The way I understand it (not
necessarily the way you meant it), this could be one of the worst pieces of
advice I have heard.
I would very much rather have a script that does not work is well commented
and documented (making it easier to debug), than a script copied from
StackOverflow that works with obscure syntax and no comment to guide you in
understanding it. Programming 101 in my opinion.

To me, a script that is "pseudo-coded first", is much more likely to "work
second" (i.e. first after the pseudo-code, excuse my play on words).
Still in my humble opinion, writing code that "works first" is one of the
best ways to write something that works fine in your particular little
ultra-specific scenario.
At best, it will need to be copy-pasted and edited for another scenario, more often you will start from scratch another script to "work first" in the second scenario. The PEP8 is a very good advice I forgot to mention. PyCharm is very useful in that regard as it checks the code for the PEP8 rules while it is typed. Regards, Kevin On 16 April 2014 13:38, T?letch?a St?phane wrote: > Le 15/04/2014 16:58, Csaba Kiss a ?crit : > > Hi! >> I need some advice how to get better in python. I have written a software >> package to analyze antibody deep sequencing data. This was my first >> experience with python and I am not a programmer. The end result works, >> however, if a professional coder looks at the scripts, it is obvious that >> it was written by an amateur. I am planning to re-write the code into a >> better format that is extendable and more user and coder friendly. At the >> moment the script only relies on biopython to get the sequences and quality >> values out of sff and fastq files, the rest is custom written. I would like >> to rely more on biopython and also perhaps extend biopython with new >> features. >> The problem I am having is object oriented python and classes. I >> understand the concept of both, but it's completely different to actually >> use it. I would like to ask help from scientist who are in a similar >> situation, as myself. I am a molecular biologist with interest in coding, >> but little background. Do you have any good tutorials books about python >> classes and OOP? For example, when I learned python I found the Google >> python class, extremely valuable. 
I practically looked at the videos and >> solved the problems and that sent me on my way to python: >> https://developers.google.com/edu/python/?csw=1 >> >> Any help would be appreciated: >> Csaba >> >> > Dear Csaba, > > Being a bioinformatics teacher, I would first say that your code should > first work :-) > > Second, in order to get another version of your code, as mentioned > earlier, you should use > a git-like versioning control tool (git or any other, git tends to be > popular). > > Third, concerning python itself, I would recommend following the "PEP8" > recommandations: > http://legacy.python.org/dev/peps/pep-0008/ > > (I also found this page while searching for PEP8 -> > http://docs.python-guide.org/en/latest/writing/style/) > > And last, since we are using biopython, check how biopython is implemented > (for example): > https://github.com/biopython/biopython/blob/master/Bio/ > AlignIO/Interfaces.py > > Best, > St?phane > > -- > Equipe DSIMB - Dynamique des Structures et > des Interactions des Macromol?cules Biologiques > INTS, INSERM-Paris-Diderot UMR_S 1134 > 6 rue Alexandre Cabanel - 75739 Paris cedex 15 - France > T?l : +33 144 493 057 > Fax : +33 147 347 431 > http://www.dsimb.inserm.fr - http://www.steletch.org > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- K?vin RUE-ALBRECHT Wellcome Trust Computational Infection Biology PhD Programme University College Dublin Ireland http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en From kevin.rue at ucdconnect.ie Wed Apr 16 11:11:08 2014 From: kevin.rue at ucdconnect.ie (Kevin Rue) Date: Wed, 16 Apr 2014 16:11:08 +0100 Subject: [Biopython] python advice needed In-Reply-To: References: <534D490E.9040604@lanl.gov> <534E79D0.1080008@inserm.fr> Message-ID: Just adding a missing "but" in one of my sentences below: On 16 April 2014 16:09, Kevin Rue wrote: > Stephane, > > "I would first say 
that your code should first work" > > With all due respect, I would be careful how you phrased that, especially > when talking to a beginner in programming. The way I understand it (not > necessarily the way you meant it), this could be one of the worst advice I > have heard. > I would very much rather have a script that does not work but is well > commented and documented (making it easier to debug), than a script copied > from StackOverflow that works with obscure syntax and no comment to guide > you in understanding it. Programming 1.0.1 in my opinion. > > To me, a script that is "pseudo-coded first", is much more likely to "work > second" (i.e. first after the pseudo-code, excuse my play on words). > Still in my humble opinion, creating a code that "works first" is one of > the best way to write something that works fine in your particular little > ultra-specific scenario. At best, it will need to be copy-pasted and edited > for another scenario, more often you will start from scratch another script > to "work first" in the second scenario. > > The PEP8 is a very good advice I forgot to mention. PyCharm is very useful > in that regard as it checks the code for the PEP8 rules while it is typed. > > Regards, > Kevin > > > On 16 April 2014 13:38, T?letch?a St?phane wrote: > >> Le 15/04/2014 16:58, Csaba Kiss a ?crit : >> >> Hi! >>> I need some advice how to get better in python. I have written a >>> software package to analyze antibody deep sequencing data. This was my >>> first experience with python and I am not a programmer. The end result >>> works, however, if a professional coder looks at the scripts, it is obvious >>> that it was written by an amateur. I am planning to re-write the code into >>> a better format that is extendable and more user and coder friendly. At the >>> moment the script only relies on biopython to get the sequences and quality >>> values out of sff and fastq files, the rest is custom written. 
I would like >>> to rely more on biopython and also perhaps extend biopython with new >>> features. >>> The problem I am having is object oriented python and classes. I >>> understand the concept of both, but it's completely different to actually >>> use it. I would like to ask help from scientist who are in a similar >>> situation, as myself. I am a molecular biologist with interest in coding, >>> but little background. Do you have any good tutorials books about python >>> classes and OOP? For example, when I learned python I found the Google >>> python class, extremely valuable. I practically looked at the videos and >>> solved the problems and that sent me on my way to python: >>> https://developers.google.com/edu/python/?csw=1 >>> >>> Any help would be appreciated: >>> Csaba >>> >>> >> Dear Csaba, >> >> Being a bioinformatics teacher, I would first say that your code should >> first work :-) >> >> Second, in order to get another version of your code, as mentioned >> earlier, you should use >> a git-like versioning control tool (git or any other, git tends to be >> popular). 
>> >> Third, concerning python itself, I would recommend following the "PEP8" >> recommandations: >> http://legacy.python.org/dev/peps/pep-0008/ >> >> (I also found this page while searching for PEP8 -> >> http://docs.python-guide.org/en/latest/writing/style/) >> >> And last, since we are using biopython, check how biopython is >> implemented (for example): >> https://github.com/biopython/biopython/blob/master/Bio/ >> AlignIO/Interfaces.py >> >> Best, >> St?phane >> >> -- >> Equipe DSIMB - Dynamique des Structures et >> des Interactions des Macromol?cules Biologiques >> INTS, INSERM-Paris-Diderot UMR_S 1134 >> 6 rue Alexandre Cabanel - 75739 Paris cedex 15 - France >> T?l : +33 144 493 057 >> Fax : +33 147 347 431 >> http://www.dsimb.inserm.fr - http://www.steletch.org >> >> >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > > > -- > K?vin RUE-ALBRECHT > Wellcome Trust Computational Infection Biology PhD Programme > University College Dublin > Ireland > http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en > -- K?vin RUE-ALBRECHT Wellcome Trust Computational Infection Biology PhD Programme University College Dublin Ireland http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en From zhigang.wu at email.ucr.edu Wed Apr 16 16:31:56 2014 From: zhigang.wu at email.ucr.edu (Zhigang Wu) Date: Wed, 16 Apr 2014 13:31:56 -0700 Subject: [Biopython] Biopython and OSX Mavericks problem. In-Reply-To: References: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> Message-ID: If you cannot get pip working, I recommend you trying Macport, which often involves less hassle. 
In case you have not used port before, follow the instruction here: http://www.macports.org/install.php to install port first then you can install biopython by typing `sudo port install 'py27-biopython' ` (I assume your OSX running python2.7) Zhigang On Tue, Apr 15, 2014 at 10:40 PM, Lenna Peterson wrote: > On Tue, Apr 15, 2014 at 6:37 PM, Cannatella, David < > catfish at austin.utexas.edu> wrote: > > > I've had the same problem as Mike Shaffer (8 April) on the discussion > list > > (see my errors below). > > > > I had tried the possible solutions mentioned (including the export > > commands below), including those on the stackoverflow site, but none has > > worked so far. > > > > Have you tried the ARCHFLAGS option? http://stackoverflow.com/a/22372751 > > Are you installing with sudo? > > > > > > Some posted solutions have included installing other versions of python, > > etc., but this is not practical for several reasons. > > > > Is re-installing an earlier version of XCode a viable option for me? > > > > Apple is not particularly supportive of downgrading, so I imagine this > could have unforeseen side effects. clang will have this behavior > indefinitely; the problem is an incompatibility between clang (part of > XCode) and the python version distributed with the OS. Hopefully there will > be an OS update soon fixing the python version. > > > > Or is it likely that there might be a fix in BioPython in the next month? > > > > Thanks in advance, > > Dave > > > > ========= > > export CPPFLAGS=-Qunused-arguments > > export CFLAGS=-Qunused-arguments > > > > =============== > > ... 
> > building 'Bio.cpairwise2' extension > > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g > > -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > warning) > > in the future > > error: command 'cc' failed with exit status 1 > > ============= > > > > > > > > > > > > > > > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Apr 17 05:33:48 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Apr 2014 10:33:48 +0100 Subject: [Biopython] Biopython and OSX Mavericks problem. In-Reply-To: References: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> Message-ID: On Wed, Apr 16, 2014 at 6:40 AM, Lenna Peterson wrote: > On Tue, Apr 15, 2014 at 6:37 PM, Cannatella, David < > catfish at austin.utexas.edu> wrote: > >> I've had the same problem as Mike Shaffer (8 April) on the discussion list >> (see my errors below). >> >> I had tried the possible solutions mentioned (including the export >> commands below), including those on the stackoverflow site, but none has >> worked so far. >> > > Have you tried the ARCHFLAGS option? 
http://stackoverflow.com/a/22372751
>
> Are you installing with sudo?
>
>>
>> Some posted solutions have included installing other versions of python,
>> etc., but this is not practical for several reasons.
>>
>> Is re-installing an earlier version of XCode a viable option for me?
>>
>
> Apple is not particularly supportive of downgrading, so I imagine this
> could have unforeseen side effects. clang will have this behavior
> indefinitely; the problem is an incompatibility between clang (part of
> XCode) and the python version distributed with the OS. Hopefully there will
> be an OS update soon fixing the python version.

The other option (which is actually the recommended route according to
the NumPy/SciPy folk) is to ignore the Apple-provided Python (don't try
to remove it!), and install your own direct from python.org.

They do provide precompiled binaries for the Mac, but I like to do this
myself under $HOME in order to test with the newer releases like
Python 3.3 and 3.4 etc. Try something like:

$ cd ~/Downloads
$ wget http://www.python.org/ftp/python/3.3.3/Python-3.3.3.tgz
$ tar -zxvf Python-3.3.3.tgz
$ cd Python-3.3.3
$ ./configure --prefix=$HOME
$ make
$ make test
$ make install

Then modify your $HOME/.bash_profile to add $HOME/bin to your path:

export PATH=$HOME/bin:$PATH

Then install NumPy, and try installing Biopython from source.

Peter

From vikthirtyfive at gmail.com Sat Apr 19 10:31:03 2014
From: vikthirtyfive at gmail.com (Vikram K)
Date: Sat, 19 Apr 2014 10:31:03 -0400
Subject: [Biopython] cosmic data
Message-ID:

Dear Biopython users,
is there any biopython module which can be used to help analyze cosmic
data?

Regards
Vikram

From biologyguy at gmail.com Sat Apr 19 17:08:17 2014
From: biologyguy at gmail.com (Steve Bond)
Date: Sat, 19 Apr 2014 17:08:17 -0400
Subject: [Biopython] cosmic data
Message-ID:

Hi Vikram,
I'm not sure if Biopython has a dedicated module, but the cosmic database
is actually quite simple to work with on its own.
It's only a single table, and can be loaded into your favourite SQL server in a snap. What sort of analysis are you trying to do? Here's the link to the database download if you don't already have a copy ftp://ftp.sanger.ac.uk/pub/CGP/cosmic/data_export/CosmicCompleteExportIncFus_v68.tsv.gz -Steve On Sat, Apr 19, 2014 at 12:00 PM, wrote: > > > Message: 1 > Date: Sat, 19 Apr 2014 10:31:03 -0400 > From: Vikram K > Subject: [Biopython] cosmic data > To: biopython at lists.open-bio.org > Message-ID: > XDfXsfzdN2n27CxX4FwcJQXbG8n+hX1OZA at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > Dear Biopython users, > is there any biopython module which can be used to help analyze cosmic > data? > > Regards > Vikram > > From rpathmanaban1 at gmail.com Sat Apr 19 18:04:36 2014 From: rpathmanaban1 at gmail.com (Pathmanaban Ramasamy) Date: Sun, 20 Apr 2014 00:04:36 +0200 Subject: [Biopython] Graph representation Message-ID: Dear Biopython users, Can anyone help me how to represent residue contact map of pdb files in adjacency list? Also need some good references/articles related. Thanks in advance, Pathmanaban. From p.j.a.cock at googlemail.com Tue Apr 22 12:11:33 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 22 Apr 2014 17:11:33 +0100 Subject: [Biopython] Fwd: [GSoC] Welcome aboard, GSoC 2014 students! In-Reply-To: References: Message-ID: Dear Biopythoneers, Please join me in congratulating this year's accepted students for Google Summer of Code under the Open Bioinformatics Foundation (OBF), and in particular welcome Evan Parker who will be working on Biopython with Bow and myself as co-mentors. As always the scheme has been very competitive, so our sympathies and commiserations to those students who were not accepted. Please do stay involved in Biopython or other open source projects - this would be a positive factor if you are eligible to apply for next year's GSoC. 
Thank you, Peter ---------- Forwarded message ---------- From: Eric Talevich Date: Tue, Apr 22, 2014 at 4:41 AM Subject: [GSoC] Welcome aboard, GSoC 2014 students! To: OBF GSoC Hi all, I'm pleased to announce the acceptance of OBF's 2014 Google Summer of Code students:

Sarah Berkemer - "Open source high-performance BioHaskell" (Mentors: Christian Höner zu Siederdissen, Ketil Malde)
Loris Cro - "An ultra-fast scalable RESTful API to query large numbers of VCF datapoints" (Mentors: Francesco Strozzi, Raoul Bonnal & the BioRuby team)
Victor Kofia - "JSBML: Redesign the implementation of mathematical formulas" (Mentors: Alex Thomas, Sarah Keating & the JSBML team)
Evan Parker - "Addition of a lazy loading sequence parser to Biopython's SeqIO package" (Mentors: Wibowo Arindrarto, Peter Cock & the Biopython team)
Ibrahim Vairabad - "Improving the Plug-in interface for CellDesigner" (Mentors: Andreas Dräger, Alex Thomas & the JSBML team)
Leandro Watanabe - "Dynamic Modeling of Cellular Populations within JSBML" (Mentors: Nicolas Rodriguez, Chris Meyers & the JSBML team)

Congratulations to our accepted students! Thanks very much to all the students who applied, we very much appreciate your hard work. Today marks the start of the Community Bonding Period. Official work starts on May 23rd, and until then, students should prepare for their projects: get on the project mailing lists, solidify your plans, figure out where all the version control repositories are and which branch or fork you'll be working on, and start doing preparatory work. Students: if you have not done so already, make sure you have subscribed to the OBF GSoC email list at: http://lists.open-bio.org/mailman/listinfo/gsoc This list is for discussions among students and mentors, and for administrative announcements from me or my co-administrators.
Here's to a great 2014 Summer of Code, Eric & Raoul OBF GSoC 2014 Organization Administrators _______________________________________________ GSoC mailing list GSoC at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/gsoc From mfethe1 at gmail.com Tue Apr 22 17:59:14 2014 From: mfethe1 at gmail.com (Michael Fethe) Date: Tue, 22 Apr 2014 17:59:14 -0400 Subject: [Biopython] Virus alert during qblast() Message-ID: Hi, I am submitting sequences to blast via biopython. My script runs over multiple hours and can take quite some time (working with hundreds of sequences). Is it possible for my computer or someone to mistake this script running as a virus since it writes my blast results to an output file and then submits my next sequence? Thanks, Michael Fethe From p.j.a.cock at googlemail.com Tue Apr 22 18:02:53 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 22 Apr 2014 23:02:53 +0100 Subject: [Biopython] Virus alert during qblast() In-Reply-To: References: Message-ID: Hi Michael, That seems unlikely - but if you are doing hundreds of automated BLAST queries, the NCBI might not be very happy. For big BLAST jobs, I would always use standalone BLAST running locally (on your cluster if possible). This is generally faster as well :) Regards, Peter On Tue, Apr 22, 2014 at 10:59 PM, Michael Fethe wrote: > Hi, > > I am submitting sequences to blast via biopython. My script > runs over multiple hours and can take quite some time > (working with hundreds of sequences). Is it possible for > my computer or someone to mistake this script running > as a virus since it writes my blast results to an output file > and then submits my next sequence? 
> > Thanks, > > Michael Fethe > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at googlemail.com Wed Apr 23 01:44:59 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 23 Apr 2014 06:44:59 +0100 Subject: [Biopython] Virus alert during qblast() In-Reply-To: <502F4F7C-09E1-4229-AD4D-FFB2275C5E17@gmail.com> References: <502F4F7C-09E1-4229-AD4D-FFB2275C5E17@gmail.com> Message-ID: Hi again, Using standalone BLAST+ at the command line with -remote you can specify an Entrez filter option -entrez_query on the organism. Another option which may be better is to make a target database (e.g all fully sequenced bacteria). Peter On Wed, Apr 23, 2014 at 12:45 AM, Michael Fethe wrote: > Hi Peter, > > I am blasting unknowns, however, can I limit biopython to bacteria in my qblast command? > > Michael Fethe > >> On Apr 22, 2014, at 6:02 PM, Peter Cock wrote: >> >> Hi Michael, >> >> That seems unlikely - but if you are doing hundreds of >> automated BLAST queries, the NCBI might not be very >> happy. >> >> For big BLAST jobs, I would always use standalone >> BLAST running locally (on your cluster if possible). >> This is generally faster as well :) >> >> Regards, >> >> Peter >> >>> On Tue, Apr 22, 2014 at 10:59 PM, Michael Fethe wrote: >>> Hi, >>> >>> I am submitting sequences to blast via biopython. My script >>> runs over multiple hours and can take quite some time >>> (working with hundreds of sequences). Is it possible for >>> my computer or someone to mistake this script running >>> as a virus since it writes my blast results to an output file >>> and then submits my next sequence? 
>>> >>> Thanks, >>> >>> Michael Fethe >>> _______________________________________________ >>> Biopython mailing list - Biopython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython From davidsshin at lbl.gov Wed Apr 23 03:38:55 2014 From: davidsshin at lbl.gov (David Shin) Date: Wed, 23 Apr 2014 00:38:55 -0700 Subject: [Biopython] Virus alert during qblast() In-Reply-To: References: <502F4F7C-09E1-4229-AD4D-FFB2275C5E17@gmail.com> Message-ID: For standalone, which yes, will run way way way faster, this is what I did to make a few filtered databases. Tried to give examples of nucleotide if that's what you are looking for... Go to the nucleotide or protein (whichever you are working on) BLAST page Nucleotide BLAST: Search nucleotide databases using a nucleotide query and start typing in the organism or species in the text field corresponding to "Organism" "optional" Get the taxid i.e if you typed in: bacteria, you would get taxid:2 if you put zea mays you would get taxid:4577 then go to the NCBI nucleotide or page Home - Nucleotide - NCBI Use the following syntax for your search (will use the zea mays example) txid4577[ORGN] Then, from the "send to:" pulldown on the webpage: click "file" button a dropdown will appear under format, select "gi list" save the file.... but change name to something you remember like sequence.gi.4577.txt in case you will want different filters later Then in your database directory where you have downloaded the all nr nucleotide database run: blastdb_aliastool -gilist sequence.gi.4577.txt -db nr -out nr_gi.4577 -title nr_gi.4577 to give a filter called nr_gi.4577 then when you blast from your script, it would look something like: blastn -query mysequence.fs -num_threads 4 -db nr_gi.4577 -out test-4577.out In my case, I made a filter that had just "plants", using taxid 3193, but also a subset that had ~15 selected species, by combining the "gi list" output from separate searches.. ie. 
like if I wanted a "bacteria + zea mays" filter because I was psychotic, I would cat together the gi lists files from txid2 and txid4577. Anyway, that's how you can run everything locally after you have it set up, and reduce time by a significant amount. At least, that's how I did it, if anyone has a better way, let me know. D On Tue, Apr 22, 2014 at 10:44 PM, Peter Cock wrote: > Hi again, > > Using standalone BLAST+ at the command line with -remote > you can specify an Entrez filter option -entrez_query on the > organism. > > Another option which may be better is to make a target > database (e.g all fully sequenced bacteria). > > Peter > > > On Wed, Apr 23, 2014 at 12:45 AM, Michael Fethe wrote: > > Hi Peter, > > > > I am blasting unknowns, however, can I limit biopython to bacteria in my > qblast command? > > > > Michael Fethe > > > >> On Apr 22, 2014, at 6:02 PM, Peter Cock > wrote: > >> > >> Hi Michael, > >> > >> That seems unlikely - but if you are doing hundreds of > >> automated BLAST queries, the NCBI might not be very > >> happy. > >> > >> For big BLAST jobs, I would always use standalone > >> BLAST running locally (on your cluster if possible). > >> This is generally faster as well :) > >> > >> Regards, > >> > >> Peter > >> > >>> On Tue, Apr 22, 2014 at 10:59 PM, Michael Fethe > wrote: > >>> Hi, > >>> > >>> I am submitting sequences to blast via biopython. My script > >>> runs over multiple hours and can take quite some time > >>> (working with hundreds of sequences). Is it possible for > >>> my computer or someone to mistake this script running > >>> as a virus since it writes my blast results to an output file > >>> and then submits my next sequence? 
> >>> > >>> Thanks, > >>> > >>> Michael Fethe > >>> _______________________________________________ > >>> Biopython mailing list - Biopython at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biopython > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- David Shin, Ph.D Lawrence Berkeley National Labs 1 Cyclotron Road MS 83-R0101 Berkeley, CA 94720 USA From p.j.a.cock at googlemail.com Thu Apr 24 15:53:34 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 Apr 2014 20:53:34 +0100 Subject: [Biopython] Versions of PyPy to support? Message-ID: Hello all, PyPy 2.3 is due out shortly, which prompts me to ask which versions of PyPy are people using Biopython with? PyPy is an alternative implementation of Python, which can often be much faster - see http://pypy.org/ We're currently testing with PyPy 1.8, 1.9, 2.0, 2.1 and 2.2 but I would like to suggest we drop at least PyPy 1.8 and 1.9. Is that OK? Thanks! Peter From manlio.calvi at gmail.com Thu Apr 24 17:04:47 2014 From: manlio.calvi at gmail.com (Manlio Calvi) Date: Thu, 24 Apr 2014 23:04:47 +0200 Subject: [Biopython] [Biopython-dev] Versions of PyPy to support? In-Reply-To: References: Message-ID: I'm thinking about run a PyPy test on my machine, something specific to watch out? I've seen they have a beta on pypy for 3, probably a bit experimental at the moment but they saying it mostly works (as they say of PyPy in general) Manlio On Thu, Apr 24, 2014 at 9:53 PM, Peter Cock wrote: > Hello all, > > PyPy 2.3 is due out shortly, which prompts me to ask which > versions of PyPy are people using Biopython with? > > PyPy is an alternative implementation of Python, which > can often be much faster - see http://pypy.org/ > > We're currently testing with PyPy 1.8, 1.9, 2.0, 2.1 and 2.2 > but I would like to suggest we drop at least PyPy 1.8 and 1.9. > > Is that OK? 
> > Thanks! > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Thu Apr 24 17:07:54 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 Apr 2014 22:07:54 +0100 Subject: [Biopython] Versions of PyPy to support? In-Reply-To: References: Message-ID: Try PyPy 2.2 first, and if that works you can try their experimental Python 3 support? Peter On Thursday, April 24, 2014, Manlio Calvi wrote: > I'm thinking about run a PyPy test on my machine, something specific > to watch out? > > I've seen they have a beta on pypy for 3, probably a bit experimental > at the moment but they saying it mostly works (as they say of PyPy in > general) > > Manlio > > On Thu, Apr 24, 2014 at 9:53 PM, Peter Cock > > wrote: > > Hello all, > > > > PyPy 2.3 is due out shortly, which prompts me to ask which > > versions of PyPy are people using Biopython with? > > > > PyPy is an alternative implementation of Python, which > > can often be much faster - see http://pypy.org/ > > > > We're currently testing with PyPy 1.8, 1.9, 2.0, 2.1 and 2.2 > > but I would like to suggest we drop at least PyPy 1.8 and 1.9. > > > > Is that OK? > > > > Thanks! > > > > Peter > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From smortaz at gmail.com Fri Apr 25 02:18:45 2014 From: smortaz at gmail.com (Sean Mortazavi) Date: Thu, 24 Apr 2014 23:18:45 -0700 Subject: [Biopython] The PTVS gang (from microsoft) would like to hear from BioPython users! Message-ID: Hi - we're a few engineers (not marketers!) that work on Python Tools for Visual Studio (http://pytools.codeplex.com). It's a free & OSS plug-in that turns VS into a decent Python IDE. 
It has some nice features like mixed-mode Python/C++ debugging, debugging on Linux from Visual Studio and IPython integration. Some of our users (including BioPython users) have been encouraging us to enhance PTVS and add support for some "Data Science" focused features & scenarios. We'd *love* to get your input regarding your current stack, workflow and pain points before taking the next steps. It does not matter if you use Windows, Visual Studio, love/hate Microsoft - we'd just love to understand your environment a bit better especially if you use tools like Excel, R, Matlab, Mathematica, numpy, scipy, Pandas, ... As a thank you, 50 people will be randomly selected to receive a $5 Starbucks Coffee card! Here is a link to the survey which should take about 2 minutes to complete. https://www.surveymonkey.com/s/VSForDataScience If you know others that might be interested in taking this survey, *please* forward it to them - much appreciated. Thanks for your participation! From mictadlo at gmail.com Tue Apr 29 06:28:39 2014 From: mictadlo at gmail.com (Mic) Date: Tue, 29 Apr 2014 20:28:39 +1000 Subject: [Biopython] Parsing SnpEff's VCF file Message-ID: Hello, SnpEff created a new VCF file which looks like this line DA_v3.0 1252 DA0000001 G T 3.0 . DP=12;EFF=DOWNSTREAM(MODIFIER|||||Q3TPR7|||Transcript_DA_0011r.4||1),DOWNSTREAM(MODIFIER|||||Q8GYX9|||Transcript_DA_0011r.2||1),INTERGENIC(MODIFIER||||||||||1) GT:DP 0/0:3 ./.:0 ./.:0 1/1:3 0/0:3 0/0:1 0/0:2 ./.:0 I found Gemini project which contains a SnpEff class ( https://github.com/arq5x/gemini/blob/master/gemini/snpEff.py ). However, I am not quite sure how to use snpEff.py outside Gemini project in order to parse the whole SnpEff's VCF file. Or does BioPython provide a parser? 
Thank you in advance, From p.j.a.cock at googlemail.com Wed Apr 30 09:24:42 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 30 Apr 2014 14:24:42 +0100 Subject: [Biopython] Bug#746484: file not distributable In-Reply-To: <20140430131703.GD758@an3as.eu> References: <20140430131703.GD758@an3as.eu> Message-ID: Hi Andreas, Are you specifically asking about Biopython 1.63 here? I think you can reasonable exclude this DTD file (and any others under the Bio/Entrez/DTD file). Biopython 1.63 will warn if they are missing but attempt to download them automatically. We're looking at dropping all the NCBI Entrez related DTD files, since the next Biopython release (v1.64) will automatically download AND cache them - see recent discussion, e.g. http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011205.html We haven't actually removed the files on GitHub yet, but this might be an incentive to do so. Thanks, Peter On Wed, Apr 30, 2014 at 2:17 PM, Andreas Tille wrote: > Hello, > > our ftpmaster has detected an issue with one of the DTDs which are > distributed with BioPython source. > > I can confirm that after applying the following patch > > > --- a/Bio/Entrez/DTDs/modules.ent > +++ b/Bio/Entrez/DTDs/modules.ent > @@ -350,13 +350,6 @@ Version Reason/Occasion > "mathmlsetup.ent" > > > > - > - - PUBLIC > -"-//W3C//ENTITIES MathML 2.0 Qualified Names 1.0//EN" > -"mathml/mathml2-qname-1.mod" > > - > - > > "-//W3C//DTD MathML 2.0//EN" > > > the file in question can be removed from the archive without breaking > the build (including the test suite). I would like to suggest to drop > the file in question from your distribution tarball in case my analysis > that it is not needed is correct. > > Kind regards > > Andreas. 
> > On Wed, Apr 30, 2014 at 02:25:09PM +0200, Thorsten Alteholz wrote: >> Package: python-biopython >> Version: 1.63-2 >> Severity: serious >> User: alteholz at debian.org >> Usertags: ftp >> X-Debbugs-CC: ftpmaster at ftp-master.debian.org >> thanks >> >> Dear Maintainer, >> >> according to: >> http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231 >> the file >> biopython-1.63\Bio\Entrez\DTDs\mathml2-qname-1.mod >> may not be modified and such this file is not distributable in main. >> >> Thorsten >> >> _______________________________________________ >> Debian-med-packaging mailing list >> Debian-med-packaging at lists.alioth.debian.org >> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-packaging >> > > -- > http://fam-tille.de From p.j.a.cock at googlemail.com Wed Apr 30 09:38:11 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 30 Apr 2014 14:38:11 +0100 Subject: [Biopython] Bug#746484: file not distributable In-Reply-To: <20140430133016.GE758@an3as.eu> References: <20140430131703.GD758@an3as.eu> <20140430133016.GE758@an3as.eu> Message-ID: On Wed, Apr 30, 2014 at 2:30 PM, Andreas Tille wrote: > Hi Peter, > > thanks for your super-fast response. > > On Wed, Apr 30, 2014 at 02:24:42PM +0100, Peter Cock wrote: >> Are you specifically asking about Biopython 1.63 here? > > Yes. Since I have added python3 binary packages Biopython 1.63 went > through manual inspection by ftpmaster and this issue was noticed. Very thorough of them - thanks! Also thank you for doing the Debian Python3 packaging of Biopython :) >> I think you >> can reasonable exclude this DTD file (and any others under the >> Bio/Entrez/DTD file). Biopython 1.63 will warn if they are missing >> but attempt to download them automatically. > > OK. > >> We're looking at dropping all the NCBI Entrez related DTD files, >> since the next Biopython release (v1.64) will automatically download >> AND cache them - see recent discussion, e.g.
>> >> http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011205.html > > That's fine. > >> We haven't actually removed the files on GitHub yet, but this >> might be an incentive to do so. > > OK, meanwhile (as long as 1.64 is not yet released) I will remove the > file from the Debian archive. > > Thanks for the clarification > > Andreas. Great, Peter P.S. I'm skimming over the Debian patches to see what we can fix: http://anonscm.debian.org/viewvc/debian-med/trunk/packages/python-biopython/trunk/debian/patches/ e.g. https://github.com/biopython/biopython/commit/2f098ac5311e0eec3d6737f4fff60e18c50b9481 From p.j.a.cock at googlemail.com Tue Apr 1 15:44:14 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 1 Apr 2014 16:44:14 +0100 Subject: [Biopython] SciPy 2014 (July 6-12, Austin, Texas, USA) In-Reply-To: References: Message-ID: Dear Biopythoneers, It looks like there was an extension to the SciPy deadline, so *today* is your last chance to submit talk or poster abstracts! Thanks, Peter P.S. You have until this *Friday* to submit your BOSC abstracts. On Wed, Mar 12, 2014 at 3:48 PM, Peter Cock wrote: > Hi all, > > It is a bit short notice, but some of you may be interested in attending > SciPy 2014, which will again have a bioinformatics session. There > is still time to submit an abstract (deadline 14 March): > > https://conference.scipy.org/scipy2014/participate/presentations/ > > "SciPy 2014, the thirteenth annual Scientific Computing with Python > conference, will be held this July 6th-12th in Austin, Texas. SciPy is > a community dedicated to the advancement of scientific computing > through open source Python software for mathematics, science, and > engineering. The annual SciPy Conference allows participants from > academic, commercial, and governmental organizations to showcase > their latest projects, learn from skilled users and developers, and > collaborate on code development." 
> > Unfortunately SciPy 2014 clashes with BOSC 2014 in Boston, > which you may prefer to attend, which is also currently accepting > abstracts: > > http://www.open-bio.org/wiki/BOSC_2014 > http://www.open-bio.org/wiki/Codefest_2014 > > *Disclaimer*: I am co-chairing BOSC this year. > > Regards, > > Peter From ivangreg at gmail.com Wed Apr 2 16:33:50 2014 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Wed, 2 Apr 2014 12:33:50 -0400 Subject: [Biopython] Back translation from Protein to RNA sequence Message-ID: The documentation of the Seq object nicely shows how to 1) transcribe DNA -> RNA, 2) back transcribe RNA -> DNA, and 3) translate RNA -> protein. If priorities allow, I would appreciate the expansion of the documentation with one example of 4) back translation protein -> most_probable_RNA. The result of that operation is species-dependent and worth documenting if the functionality already exists. Thank you, Ivan Ivan Gregoretti, PhD Bioinformatics From p.j.a.cock at googlemail.com Mon Apr 7 12:58:26 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 7 Apr 2014 13:58:26 +0100 Subject: [Biopython] Back translation from Protein to RNA sequence In-Reply-To: References: Message-ID: On Wed, Apr 2, 2014 at 5:33 PM, Ivan Gregoretti wrote: > The documentation of the Seq object nicely shows how to > > 1) transcribe DNA -> RNA, > 2) back transcribe RNA -> DNA, and > 3) translate RNA -> protein. > > If priorities allow, I would appreciate the expansion of the documentation > with one example of > > 4) back translation protein -> most_probable_RNA. > > The result of that operation is species-dependent and worth documenting if > the functionality already exists. > > Thank you, > > Ivan Hello Ivan, Biopython currently deliberately does not have any back-translation functionality. Why do you want this, and how would you define it? 
I think 'most probable' would require a codon usage table for the organism, and would need a tie breaker for when two codons are equally frequent - or would you be happy with non-deterministic output? There are a whole set of details which would need to be settled, such as what would you do with ambiguous amino acids (e.g. X or J), making a general purpose back-translate rather complex. Last time this was discussed on the mailing list, the real use case was back-translation as used with protein to nucleotide alignment, where the sequence is known and just the gaps need inserting appropriately. e.g. https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans Regards, Peter From ivangreg at gmail.com Mon Apr 7 14:20:44 2014 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Mon, 7 Apr 2014 10:20:44 -0400 Subject: [Biopython] Back translation from Protein to RNA sequence In-Reply-To: References: Message-ID: Hello Peter, I would benefit from the availability of a back-translation tool for practical reasons. In our case, part of my team is designing peptides. They asked me if I had a python tool to create the corresponding DNA so that they could design and order expression vectors. As simple as that. I would not intend to use this tool out of context and I fully understand that a codon bias table would be necessary for each species. I'll just leave it as an open question then, in case somebody has written a programme within the scope of our need. I'll explore the pico_galaxy link you sent me nonetheless. Thank you, Ivan Ivan Gregoretti, PhD Bioinformatics On Mon, Apr 7, 2014 at 8:58 AM, Peter Cock wrote: > > On Wed, Apr 2, 2014 at 5:33 PM, Ivan Gregoretti wrote: > > The documentation of the Seq object nicely shows how to > > > > 1) transcribe DNA -> RNA, > > 2) back transcribe RNA -> DNA, and > > 3) translate RNA -> protein.
> > > > If priorities allow, I would appreciate the expansion of the documentation > > with one example of > > > > 4) back translation protein -> most_probable_RNA. > > > > The result of that operation is species-dependent and worth documenting if > > the functionality already exists. > > > > Thank you, > > > > Ivan > > Hello Ivan, > > Biopython currently deliberately does not have any > back-translation functionality. > > Why do you want this, and how would you define it? > > I think 'most probable' would require a codon usage table > for the organism, and would need a tie breaker for when > two codons are equally frequent - or would you be happy > with non-deterministic output? > > There are a whole set of details which would need to > be settled, such as what would you do with ambiguous > amino acids (e.g. X or J), making a general purpose > back-translate rather complex. > > Last time this was discussed on the mailing list, the real > use case was back-translation as used with protein to > nucleotide alignment, where the sequence is known > and just the gaps need inserting appropriately. e.g. > https://github.com/peterjc/pico_galaxy/tree/master/tools/align_back_trans > > Regards, > > Peter From p.j.a.cock at googlemail.com Mon Apr 7 15:12:29 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 7 Apr 2014 16:12:29 +0100 Subject: [Biopython] Back translation from Protein to RNA sequence In-Reply-To: References: Message-ID: On Mon, Apr 7, 2014 at 3:20 PM, Ivan Gregoretti wrote: > Hello Peter, > > I would benefit from the availability of a back-translation tool for > practical reasons. > > In our case, part of my team is designing peptides. They asked my if I > had a python tool to create the corresponding DNA so that they could > design and order expression vectors. > > As simple as that. I would not intend to use this tool out of context > and I fully understand that a codon bias table would be necessary for > each species. 
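For the use case quoted above (turning a designed peptide into one unambiguous DNA sequence using a per-species codon table), a minimal sketch could look like the following. It is not part of Biopython. The function name `back_translate` and the table `PREFERRED_CODON` are hypothetical, and the codons listed are simply valid codons from the standard genetic code chosen arbitrarily for illustration; they are NOT real codon-usage preferences for any organism and would need replacing with a table for the target species.

```python
# Hypothetical one-codon-per-residue table. Every entry is a valid codon
# in the standard genetic code, but the choice among synonymous codons is
# arbitrary here (an assumption for illustration, not organism data).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein, codon_table=PREFERRED_CODON):
    """Return one unambiguous DNA string encoding the given protein.

    Ambiguous or unknown residues (e.g. X, J, B, Z) are rejected rather
    than guessed, since there is no single sensible codon for them.
    """
    codons = []
    for position, residue in enumerate(protein.upper(), start=1):
        try:
            codons.append(codon_table[residue])
        except KeyError:
            raise ValueError(
                "No codon defined for residue %r at position %i"
                % (residue, position)
            )
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MKWVT*"))
```

Because the table holds exactly one codon per residue, the output is deterministic and no tie breaker is needed. With Biopython installed, the result could be sanity-checked by translating it back with the Seq object's translate method.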
OK, so in your case you specifically want non-ambiguous codons (just using A, C, G, T and not trying to capture the wobble codon or other many-to-one mapping possibilities with IUPAC codes like N), and when picking each codon you'd probably like this to be based on tRNA levels on your target organism to maximise efficiency. It should be simple to write a special case function for your needs with a dictionary mapping each amino acid to a preferred codon (for each target organism). > I just leave it then as an open question then in case somebody has > written a programme within the scope or our need. I'll explore the > pico_galaxy link you sent me nonetheless. It wouldn't be relevant to your use-case. Peter From tra at popgen.net Tue Apr 8 15:22:21 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 8 Apr 2014 16:22:21 +0100 Subject: [Biopython] Job: Bioinformatics/PopGen of Disease vectors (Tropical Medicine) Message-ID: <20140408162221.41a107e3@lnx> Dear all, We currently have a position here for a bioinformatician with a good population genetics background. The Liverpool School of Tropical Medicine works with neglected tropical diseases and in our department we are mainly focused on disease vectors (mosquitoes, flies, ...). The postdoc accepting this position would be the first in the group (and the second in the department) doing bioinformatics, thus with a lot of freedom to choose their preferred tools. So, if you are a Biopythoneer you would most probably be able to use (Bio)python. 
Liverpool is one of the cheapest cities in the UK and has been ranked as one of the top 5 big cities to live in the UK: http://www.telegraph.co.uk/news/uknews/10386993/Bristol-is-best-city-to-live-in-the-UK.html As to the position, you can find more details here: http://www.jobs.ac.uk/job/AIL535/post-doctoral-research-assistant-in-bioinformatics-and-population-genetics/ Charles' page (the PI), can be found here: http://www.lstmliverpool.ac.uk/research/departments/staff-profiles/charles-wondji For questions, please do not hesitate to contact Charles directly (or me if you prefer, but I am not directly related to the position). Tiago From michael.shaffer at ucdenver.edu Tue Apr 8 21:05:01 2014 From: michael.shaffer at ucdenver.edu (Mike Shaffer) Date: Tue, 8 Apr 2014 15:05:01 -0600 Subject: [Biopython] Installation Problems in Mavericks Message-ID: Hello, I am attempting to install biopython and I am getting this error. I have install xcode command line tools and verified that it is installed using pkgutil. I found some people saying that this problem was solved by installing xcode command line tools but this didn't work. Some seemed to say that this was just because of the new version of Apple's complier. Any work arounds or tips would be greatly appreciated. 
Full read out from running setup.py build is below: python setup.py build running build running build_py running build_ext building 'Bio.cpairwise2' extension cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 From arklenna at gmail.com Tue Apr 8 21:12:25 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Tue, 8 Apr 2014 17:12:25 -0400 Subject: [Biopython] Installation Problems in Mavericks In-Reply-To: References: Message-ID: This has been discussed on the dev list: http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011131.html There are several possible workarounds, which are enumerated in various answers on this stackoverflow question: http://stackoverflow.com/questions/22313407/ Cheers, Lenna From tc9 at sanger.ac.uk Tue Apr 8 21:24:27 2014 From: tc9 at sanger.ac.uk (Tommy Carstensen) Date: Tue, 8 Apr 2014 21:24:27 +0000 Subject: [Biopython] random access to bgz file Message-ID: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk> I read the Biopython tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html It does not explain how to do random access to a bgz file. Can someone point me to a tutorial on how to do this? Thank you. 
Best wishes, Tommy -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From tc9 at sanger.ac.uk Tue Apr 8 21:23:06 2014 From: tc9 at sanger.ac.uk (Tommy Carstensen) Date: Tue, 8 Apr 2014 21:23:06 +0000 Subject: [Biopython] random access to bgz file Message-ID: <093A736015FA4E44A43E043ABDFCBF78032A85@exch-mbx2.internal.sanger.ac.uk> I read the Biopython tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html It does not explain how to do random access to a bgz file. Can someone point me to a tutorial on how to do this? Thank you. Best wishes, Tommy -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From p.j.a.cock at googlemail.com Wed Apr 9 08:54:55 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 9 Apr 2014 09:54:55 +0100 Subject: [Biopython] random access to bgz file In-Reply-To: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk> References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk> Message-ID: Hi Tommy, This isn't covered in the tutorial, but the module's built in help is quite extensive (the docstrings). 
Try:

    from Bio import bgzf
    help(bgzf)

Or, the HTML rendered version:
http://biopython.org/DIST/docs/api/Bio.bgzf-module.html

(Note to self - that could be made prettier by checking
the markup works, rather than treating it as plain text)

Or, read the source on GitHub etc:
https://github.com/biopython/biopython/blob/master/Bio/bgzf.py

Essentially, as with any other Python handle, use the seek
and tell methods - however the offsets are BGZF virtual
offsets, which are ordered but you CANNOT do offset
arithmetic on them.

See also:
http://blastedbio.blogspot.co.uk/2011/11/bgzf-blocked-bigger-better-gzip.html

Peter

On Tue, Apr 8, 2014 at 10:24 PM, Tommy Carstensen wrote:
> I read the Biopython tutorial:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
>
> It does not explain how to do random access to a bgz file. Can someone point me to a tutorial on how to do this? Thank you.
>
> Best wishes,
> Tommy
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From tc9 at sanger.ac.uk Wed Apr 9 17:35:23 2014
From: tc9 at sanger.ac.uk (tc9)
Date: Wed, 09 Apr 2014 18:35:23 +0100
Subject: [Biopython] random access to bgz file
In-Reply-To:
References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>
Message-ID:

Peter, thanks for the link to the HTML version of the bgzf documentation. Here are some additional details. I am trying to do random access on a bgzipped haplotype/HAPS file.
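
As a rough illustration of why arithmetic on BGZF virtual offsets is not meaningful: a virtual offset packs two numbers into one 64-bit integer, namely the start of the compressed block within the file (upper 48 bits) and the position inside that block once it is decompressed (lower 16 bits). The sketch below is plain Python and does not call Bio.bgzf; the helper names pack_virtual_offset and unpack_virtual_offset and the block start values are made up for this example. For real work, Bio.bgzf offers make_virtual_offset and split_virtual_offset.

```python
def pack_virtual_offset(block_start, within_block):
    """Combine a compressed block start and an in-block offset.

    Hypothetical helper for illustration only.
    """
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start must fit in 48 bits")
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    return (block_start << 16) | within_block


def unpack_virtual_offset(virtual_offset):
    """Split a virtual offset back into (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# Two positions in consecutive blocks; the block starts are invented numbers.
first = pack_virtual_offset(0, 65000)    # near the end of the first block
second = pack_virtual_offset(18239, 10)  # just inside the second block

# Ordering is preserved, so sorting and comparing virtual offsets is fine.
assert first < second

# The difference mixes compressed and decompressed coordinates, so it is
# not a number of bytes in the decompressed data.
difference = second - first
print(unpack_virtual_offset(second), difference)
```

The two positions here could be only a few hundred decompressed bytes apart, yet the subtraction gives a very large number, which is why the offsets should only be stored, compared, and passed back to seek.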
Here is the file format description:
https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#hapsample

I compressed the haps file with bgzip:

    zcat file.haps.gz | bgzip > file.haps.bgz

I know the byte position of each newline after decompression, but I need the block offsets to go from a decompressed position to a virtual offset. Trying to get the block offsets like this fails:

    import Bio
    handle = Bio.bgzf.open('file.haps.bgz')
    for values in Bio.bgzf.BgzfBlocks(handle):
        print("Raw start %i, raw length %i; data start %i, data length %i" % values)

I get this error message:

    for values in Bio.bgzf.BgzfBlocks(handle):
      File "/software/team149/lib/python3.3/site-packages/Bio/bgzf.py", line 392, in BgzfBlocks
        block_length, data = _load_bgzf_block(handle)
      File "/software/team149/lib/python3.3/site-packages/Bio/bgzf.py", line 407, in _load_bgzf_block
        % (_bgzf_magic, magic, handle.tell()))
    ValueError: A BGZF (e.g. a BAM file) block should start with b'\x1f\x8b\x08\x04', not b'1:10'; handle.tell() now says 4

How can I get the block offsets, so I can access a random byte/line of my choice?

On 2014-04-09 09:54, Peter Cock wrote:
> Hi Tommy,
>
> This isn't covered in the tutorial, but the module's built in
> help is quite extensive (the docstrings). Try:
>
> from Bio import bgzf
> help(bgzf)
>
> Or, the HTML rendered version:
> http://biopython.org/DIST/docs/api/Bio.bgzf-module.html [3]
>
> (Note to self - that could be made prettier by checking
> the markup works, rather than treating it as plain text)
>
> Or, read the source on GitHub etc:
> https://github.com/biopython/biopython/blob/master/Bio/bgzf.py [4]
>
> Essentially, like any other Python handle use the seek
> and tell methods - however the offsets are BGZF virtual
> offsets which are ordered but you CANNOT do offset
> arithmetic on them.
See also: > http://blastedbio.blogspot.co.uk/2011/11/bgzf-blocked-bigger-better-gzip.html [5] > > Peter > > On Tue, Apr 8, 2014 at 10:24 PM, Tommy Carstensen wrote: > >> I read the Biopython tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html [1] It does not explain how to do random access to a bgz file. Can someone point me to a tutorial on how to do this? Thank you. Best wishes, Tommy -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython [2] Links: ------ [1] http://biopython.org/DIST/docs/tutorial/Tutorial.html [2] http://lists.open-bio.org/mailman/listinfo/biopython [3] http://biopython.org/DIST/docs/api/Bio.bgzf-module.html [4] https://github.com/biopython/biopython/blob/master/Bio/bgzf.py [5] http://blastedbio.blogspot.co.uk/2011/11/bgzf-blocked-bigger-better-gzip.html -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From p.j.a.cock at googlemail.com Wed Apr 9 21:00:38 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 9 Apr 2014 22:00:38 +0100 Subject: [Biopython] random access to bgz file In-Reply-To: References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk> Message-ID: On Wed, Apr 9, 2014 at 6:35 PM, tc9 wrote: > > Peter, thanks for link to html version of the bgzf documentation. Here > some additional details. > > I am trying to do random access on a bgzipped haplotype/HAPS file. 
> Here file format description: > > https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#hapsample > > I compressed the haps file with bgzip: > > zcat file.haps.gz | bgzip > file.haps.bgz > > I know the byte position of each newline after decompression, > but I need the block offsets to go from a decompressed position > to a virtual offset. Not necessarily - all you need is the virtual offset which handle.tell() would give you. How did you get the positions in the decompressed file? Can you not repeat that indexing but using the virtual offsets via the BGZF handle? The big advantage is you just use the virtual offsets without having to know how they are calculated. If you really want to map from decompressed offsets to virtual offsets, you will need both the raw start offset of each block, but also the decompressed size of each block (often 64kb, but it can be less). > Trying to get the block offsets like this fails: > > import Bio > handle = Bio.bgzf.open('file.haps.bgz') > for values in Bio.bgzf.BgzfBlocks(handle): > print("Raw start %i, raw length %i; data start %i, data length %i" % > values) The BgzfBlocks function (which was intended for low level debugging originally) wants a raw handle (which should be opened in binary mode). I concede its docstring doesn't say that (yet) but its example show this. 
Try:

    from Bio import bgzf
    for values in bgzf.BgzfBlocks(open('file.haps.bgz', 'rb')):
        print("Raw start %i, raw length %i; data start %i, data length %i" % values)

Peter

From p.j.a.cock at googlemail.com Wed Apr 9 21:11:46 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Apr 2014 22:11:46 +0100
Subject: [Biopython] random access to bgz file
In-Reply-To:
References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk>
Message-ID:

On Wed, Apr 9, 2014 at 10:00 PM, Peter Cock wrote:
> On Wed, Apr 9, 2014 at 6:35 PM, tc9 wrote:
>>
>> Trying to get the block offsets like this fails:
>>
>> import Bio
>> handle = Bio.bgzf.open('file.haps.bgz')
>> for values in Bio.bgzf.BgzfBlocks(handle):
>>     print("Raw start %i, raw length %i; data start %i, data length %i" %
>> values)
>
> The BgzfBlocks function (which was intended for
> low level debugging originally) wants a raw handle
> (which should be opened in binary mode). I concede
> its docstring doesn't say that (yet) but its example
> show this. Try:
>
> from Bio import bgzf
> for values in bgzf.BgzfBlocks(open('file.haps.bgz', 'rb')):
>     print("Raw start %i, raw length %i; data start %i, data length %i" % values)
>
> Peter

Hi again Tommy,

I have tried to clarify the BgzfBlocks docstring for the next release,
https://github.com/biopython/biopython/commit/44e943fd5c1e1a2ee6d8520eb01ab5e8114b1b56

Please keep the questions coming - your feedback is being very useful - e.g. the context manager omission you reported earlier (off list):
https://github.com/biopython/biopython/commit/a669757305962202516a192d16166eb0870d8ebe

Thanks,

Peter

From asmariyaz23 at gmail.com Thu Apr 10 18:29:24 2014
From: asmariyaz23 at gmail.com (Asma Riyaz)
Date: Thu, 10 Apr 2014 14:29:24 -0400
Subject: [Biopython] Phylo Tree: Need to align Taxa for visual representation
Message-ID:

Hi,

I am using Bio.Phylo package to display a tree, and I am having problems representing it the way I want it to be.
Here is my code:

    gs=gridspec.GridSpec(1, 2,height_ratios=[1,1,-2,2] ,width_ratios=[1,1,-2,2],hspace=0,wspace=0)
    phyl_ax=plt.subplot(gs[0])
    Phylo.draw(tree, axes=phyl_ax, do_show=False,show_confidence=False)

With the above code I am able to produce wrong.png. I would like the tree to be displayed similarly to correct.png (which I got from a third-party program, MEGA), and I want to automate the process (hence Phylo).

I have tried several rcParams settings, specifically those for lines, but without success.

Appreciate any help provided.
Asma
-------------- next part --------------
A non-text attachment was scrubbed...
Name: correct.png
Type: image/png
Size: 9950 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wrong.png
Type: image/png
Size: 11667 bytes
Desc: not available
URL:

From eric.talevich at gmail.com Thu Apr 10 19:45:58 2014
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 10 Apr 2014 12:45:58 -0700
Subject: [Biopython] Phylo Tree: Need to align Taxa for visual representation
In-Reply-To:
References:
Message-ID:

Asma,

The tree style you want is called an ultrametric tree. The difference is that all branch lengths are 1 (the default value) in "wrong.png", whereas in "correct.png" every tip is the same total distance from the root. You could set these branch lengths programmatically to make the sums work out right; Bio.Phylo doesn't currently implement it. Or you could use another tree visualization program, like Archaeopteryx (https://sites.google.com/site/cmzmasek/home/software/archaeopteryx).

-Eric

On Thu, Apr 10, 2014 at 11:29 AM, Asma Riyaz wrote:
> Hi,
>
> I am using Bio.Phylo package to display a tree, and I am having problems
> representing it the way I want it to be.
> > Here is my code: > > gs=gridspec.GridSpec(1, 2,height_ratios=[1,1,-2,2] > ,width_ratios=[1,1,-2,2],hspace=0,wspace=0) > phyl_ax=plt.subplot(gs[0]) > Phylo.draw(tree, axes=phyl_ax, do_show=False,show_confidence=False) > > > With the above code I am able to produce, wrong.png > I would like the tree to be displayed similar to correct.png (I got this of > a 3rd party software MEGA) and want to automate the process (hence Phylo) > > > I have tried several of rcParams settings with line specifically but no > success. > > Appreciate any help provided. > Asma > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > From mathera at gmail.com Fri Apr 11 08:28:57 2014 From: mathera at gmail.com (Andrew Mather) Date: Fri, 11 Apr 2014 18:28:57 +1000 Subject: [Biopython] Install problems (Numpy problem ?) Message-ID: Hi, I'm attempting to install 1.63 from a git cloned directory into Python 2.7. Numpy 1.8.0 appears to have installed correctly and can be imported at the prompt. 
However the BioPython build fails with the message below: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python2.7/site-packages/numpy/core/include -I/usr/local/include/python2.7 -c Bio/Cluster/clustermodule.c -o build/temp.linux-x86_64-2.7/Bio/Cluster/clustermodule.o In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:4, from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17, from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4, from Bio/Cluster/clustermodule.c:3: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_common.h:114:10: error: #error Unsupported size for type off_t error: command 'gcc' failed with exit status 1 Any advice would be gratefully received, as we're in the middle of racing to commission a new system and running out of time. Thanks, Andrew -- - http://surfcoast.redbubble.com | https://picasaweb.google.com/107747436224613508618 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- "Unless someone like you, cares a whole awful lot, nothing is going to get better...It's not !" - The Lorax -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- A committee is a cul-de-sac, down which ideas are lured and then quietly strangled. Sir Barnett Cocks -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- "A mind is like a parachute. It doesnt work if it's not open." :- Frank Zappa - From anaryin at gmail.com Fri Apr 11 08:57:48 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Fri, 11 Apr 2014 10:57:48 +0200 Subject: [Biopython] Install problems (Numpy problem ?) In-Reply-To: References: Message-ID: Hi Andrew, My experience with numpy is that even though you install it and import correctly, sometimes, it's still not proper. Try importing something within numpy, like ndarray, and see if you can do it. 

    import numpy
    na = numpy.ndarray(shape=(2,2), dtype=float, order='F')

I would debug this first and then move on to a biopython issue.

Cheers,

João

2014-04-11 10:28 GMT+02:00 Andrew Mather :
> Hi,
>
> I'm attempting to install 1.63 from a git cloned directory into Python 2.7.
>
> Numpy 1.8.0 appears to have installed correctly and can be imported at
> the prompt.
>
> However the BioPython build fails with the message below:
>
> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes -fPIC
> -I/usr/local/lib/python2.7/site-packages/numpy/core/include
> -I/usr/local/include/python2.7 -c Bio/Cluster/clustermodule.c -o
> build/temp.linux-x86_64-2.7/Bio/Cluster/clustermodule.o
> In file included from
>
> /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:4,
> from
>
> /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17,
> from
>
> /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
> from Bio/Cluster/clustermodule.c:3:
>
> /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_common.h:114:10:
> error: #error Unsupported size for type off_t
> error: command 'gcc' failed with exit status 1
>
> Any advice would be gratefully received, as we're in the middle of
> racing to commission a new system and running out of time.
>
> Thanks,
> Andrew
>
>
> --
> -
> http://surfcoast.redbubble.com |
> https://picasaweb.google.com/107747436224613508618
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> "Unless someone like you, cares a whole awful lot, nothing is going to
> get better...It's not !" - The Lorax
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> A committee is a cul-de-sac, down which ideas are lured and then
> quietly strangled.
> Sir Barnett Cocks
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> "A mind is like a parachute. It doesnt work if it's not open."
:- Frank > Zappa > - > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From devaniranjan at gmail.com Fri Apr 11 15:58:52 2014 From: devaniranjan at gmail.com (George Devaniranjan) Date: Fri, 11 Apr 2014 11:58:52 -0400 Subject: [Biopython] SVDSuperimposer() Message-ID: I was wondering if there is a faster way to do the following. I am minimizing a protein structure and one of the 'measurements" is that the minimized structure be as close to the starting value as possible. Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD difference. When I checked the various "energy terms" that are used to evaluate the structure I find that the bottleneck is indeed SVDSuperimposer.SVDSuperimposer(). Is there a way to do this more efficiently ? Thank you From anaryin at gmail.com Fri Apr 11 16:11:15 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Fri, 11 Apr 2014 18:11:15 +0200 Subject: [Biopython] SVDSuperimposer() In-Reply-To: References: Message-ID: Hey George, What do you mean by bottleneck? In terms of speed? You can always use Profit for example to calculate RMSDs between the models. It's a bit faster than our module. Cheers, Jo?o 2014-04-11 17:58 GMT+02:00 George Devaniranjan : > I was wondering if there is a faster way to do the following. > > > I am minimizing a protein structure and one of the 'measurements" is that > the minimized structure be as close to the starting value as possible. > > > Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD > difference. > > > When I checked the various "energy terms" that are used to evaluate the > structure I find that the bottleneck is > indeed SVDSuperimposer.SVDSuperimposer(). > > > Is there a way to do this more efficiently ? 
> > > Thank you > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From devaniranjan at gmail.com Fri Apr 11 16:16:36 2014 From: devaniranjan at gmail.com (George Devaniranjan) Date: Fri, 11 Apr 2014 12:16:36 -0400 Subject: [Biopython] SVDSuperimposer() In-Reply-To: References: Message-ID: Hi Jo?o, Ok this is what I do, I use a conjugate gradient minimizer that adjusts the phi and psi angles of the residues to move the residues around. Of course small changes in these angles can result in large displacements further down the chain. I know that the "bad" structure I start with is "close" to what is expected so as I continue to minimize (using the minimizer in cycles) I use the RMSD difference as a test to ensure that the "better" structure while energetically better than the starting one doesn't look totally different from where I started. I will look at profit, I have never tried that-thank you very much for the suggestion. George On Fri, Apr 11, 2014 at 12:11 PM, Jo?o Rodrigues wrote: > Hey George, > > What do you mean by bottleneck? In terms of speed? > > You can always use Profit for example to calculate RMSDs between the > models. It's a bit faster than our module. > > Cheers, > > Jo?o > > > 2014-04-11 17:58 GMT+02:00 George Devaniranjan : > >> I was wondering if there is a faster way to do the following. >> >> >> I am minimizing a protein structure and one of the 'measurements" is that >> the minimized structure be as close to the starting value as possible. >> >> >> Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD >> difference. >> >> >> When I checked the various "energy terms" that are used to evaluate the >> structure I find that the bottleneck is >> indeed SVDSuperimposer.SVDSuperimposer(). >> >> >> Is there a way to do this more efficiently ? 
>> >> >> Thank you >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > From devaniranjan at gmail.com Fri Apr 11 16:22:06 2014 From: devaniranjan at gmail.com (George Devaniranjan) Date: Fri, 11 Apr 2014 12:22:06 -0400 Subject: [Biopython] SVDSuperimposer() In-Reply-To: References: Message-ID: Oh, sorry-yes I meant the speed. On Fri, Apr 11, 2014 at 12:11 PM, Jo?o Rodrigues wrote: > Hey George, > > What do you mean by bottleneck? In terms of speed? > > You can always use Profit for example to calculate RMSDs between the > models. It's a bit faster than our module. > > Cheers, > > Jo?o > > > 2014-04-11 17:58 GMT+02:00 George Devaniranjan : > >> I was wondering if there is a faster way to do the following. >> >> >> I am minimizing a protein structure and one of the 'measurements" is that >> the minimized structure be as close to the starting value as possible. >> >> >> Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD >> difference. >> >> >> When I checked the various "energy terms" that are used to evaluate the >> structure I find that the bottleneck is >> indeed SVDSuperimposer.SVDSuperimposer(). >> >> >> Is there a way to do this more efficiently ? >> >> >> Thank you >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > From anaryin at gmail.com Fri Apr 11 20:37:32 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Fri, 11 Apr 2014 22:37:32 +0200 Subject: [Biopython] SVDSuperimposer() In-Reply-To: References: Message-ID: Hi George, Sorry for the delay in the answer.. Are you doing the minimization using Biopython? That's the only way I see in which the SVDSuperimposer is a bottleneck. In any case, the SVD code is written in C, so it should be pretty fast. 
Can you identify precisely where the bottleneck is (atom selection, fitting, calculation, etc)?

Anyway, I would suggest looking into some weak position restraints on the heavy atoms of the backbone to keep things sort of in place. This would avoid the RMSD calculations at every step (I guess?), replacing them with a simple harmonic potential term added to the energy function.

Cheers,

João

2014-04-11 18:22 GMT+02:00 George Devaniranjan :
> Oh, sorry-yes I meant the speed.
>
>
> On Fri, Apr 11, 2014 at 12:11 PM, João Rodrigues wrote:
>
>> Hey George,
>>
>> What do you mean by bottleneck? In terms of speed?
>>
>> You can always use Profit for example to calculate RMSDs between the
>> models. It's a bit faster than our module.
>>
>> Cheers,
>>
>> João
>>
>>
>> 2014-04-11 17:58 GMT+02:00 George Devaniranjan :
>>
>>> I was wondering if there is a faster way to do the following.
>>>
>>>
>>> I am minimizing a protein structure and one of the 'measurements" is that
>>> the minimized structure be as close to the starting value as possible.
>>>
>>>
>>> Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD
>>> difference.
>>>
>>>
>>> When I checked the various "energy terms" that are used to evaluate the
>>> structure I find that the bottleneck is
>>> indeed SVDSuperimposer.SVDSuperimposer().
>>>
>>>
>>> Is there a way to do this more efficiently ?
>>>
>>>
>>> Thank you
>>> _______________________________________________
>>> Biopython mailing list - Biopython at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biopython
>>>
>>
>>
>

From devaniranjan at gmail.com Fri Apr 11 20:43:23 2014
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Fri, 11 Apr 2014 16:43:23 -0400
Subject: [Biopython] SVDSuperimposer()
In-Reply-To:
References:
Message-ID:

Thank you João,

No, I am using an in-house code written in python but in that code I use SVDSuperimposer() as well.
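
The weak position restraint suggested in this thread could look roughly like the sketch below. It is plain Python and independent of Biopython; the function name harmonic_restraint_energy, the force constant, and the coordinates are all made up for illustration, and it is not the in-house energy function discussed here. It assumes the restrained atoms are compared in a fixed frame, with no superposition step, which is what makes it cheaper than an RMSD fit at every iteration.

```python
def harmonic_restraint_energy(coords, reference, force_constant=1.0):
    """Sum of 0.5 * k * |r - r0|**2 over all restrained atoms.

    coords and reference are equal-length sequences of (x, y, z) tuples.
    Hypothetical helper; units and force constant are arbitrary here.
    """
    if len(coords) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    energy = 0.0
    for (x, y, z), (x0, y0, z0) in zip(coords, reference):
        squared_distance = (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        energy += 0.5 * force_constant * squared_distance
    return energy


# Made-up backbone positions: the second atom has moved 1.0 along y.
start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
moved = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0)]

penalty = harmonic_restraint_energy(moved, start, force_constant=1.0)
print(penalty)
```

A small force constant lets the structure relax locally while still penalising large drifts from the starting coordinates; the term would simply be added to the other energy terms.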
The "energy function" that is minimized has various terms such as a steric clash is highly disfavored, Hydrogen bonds are favored...etc Let me try your second suggestion and see if that helps. On Fri, Apr 11, 2014 at 4:37 PM, Jo?o Rodrigues wrote: > Hi George, > > Sorry for the delay in the answer.. > > Are you doing the minimization using Biopython? That's the only way I see > in which the SVDSuperimposer is a bottleneck. In any case, the SVD code is > written in C, so it should be pretty fast. Can you identify precisely where > the bottleneck is (atom selection, fitting, calculation, etc)? > > Anyway, I would suggest looking into some weak position restraints on the > heavy atoms of the backbone to keep things sort of in place. This would > avoid the RMSD calculations at every step (I guess?), instead just a simple > harmonic potential calculation added to the energy function. > > Cheers, > > Jo?o > > > > > 2014-04-11 18:22 GMT+02:00 George Devaniranjan : > > Oh, sorry-yes I meant the speed. >> >> >> On Fri, Apr 11, 2014 at 12:11 PM, Jo?o Rodrigues wrote: >> >>> Hey George, >>> >>> What do you mean by bottleneck? In terms of speed? >>> >>> You can always use Profit for example to calculate RMSDs between the >>> models. It's a bit faster than our module. >>> >>> Cheers, >>> >>> Jo?o >>> >>> >>> 2014-04-11 17:58 GMT+02:00 George Devaniranjan : >>> >>>> I was wondering if there is a faster way to do the following. >>>> >>>> >>>> I am minimizing a protein structure and one of the 'measurements" is >>>> that >>>> the minimized structure be as close to the starting value as possible. >>>> >>>> >>>> Currently I use SVDSuperimposer.SVDSuperimposer() to calculate the RMSD >>>> difference. >>>> >>>> >>>> When I checked the various "energy terms" that are used to evaluate the >>>> structure I find that the bottleneck is >>>> indeed SVDSuperimposer.SVDSuperimposer(). >>>> >>>> >>>> Is there a way to do this more efficiently ? 
>>>> >>>> >>>> Thank you >>>> _______________________________________________ >>>> Biopython mailing list - Biopython at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython >>>> >>> >>> >> > From arklenna at gmail.com Sat Apr 12 03:41:36 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Fri, 11 Apr 2014 23:41:36 -0400 Subject: [Biopython] Installation Problems in Mavericks In-Reply-To: References: Message-ID: Hi Mike, Perhaps the XCode update has fixed this? Cheers, Lenna On Tue, Apr 8, 2014 at 5:12 PM, Lenna Peterson wrote: > This has been discussed on the dev list: > http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011131.html > > There are several possible workarounds, which are enumerated in various > answers on this stackoverflow question: > http://stackoverflow.com/questions/22313407/ > > Cheers, > > Lenna > > From tc9 at sanger.ac.uk Mon Apr 14 17:29:35 2014 From: tc9 at sanger.ac.uk (tc9) Date: Mon, 14 Apr 2014 18:29:35 +0100 Subject: [Biopython] random access to bgz file In-Reply-To: References: <093A736015FA4E44A43E043ABDFCBF78032AA3@exch-mbx2.internal.sanger.ac.uk> Message-ID: <9ac19a0cde00f3cbbcca33c6179a24ca@sanger.ac.uk> On 2014-04-09 22:00, Peter Cock wrote: > On Wed, Apr 9, 2014 at 6:35 PM, tc9 wrote: > >> Peter, thanks for link to html version of the bgzf documentation. Here some additional details. I am trying to do random access on a bgzipped haplotype/HAPS file. Here file format description: https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#hapsample [1] I compressed the haps file with bgzip: zcat file.haps.gz | bgzip > file.haps.bgz I know the byte position of each newline after decompression, but I need the block offsets to go from a decompressed position to a virtual offset. > > Not necessarily - all you need is the virtual offset which > handle.tell() would give you. How did you get the positions > in the decompressed file? 
Can you not repeat that indexing
> but using the virtual offsets via the BGZF handle? The
> big advantage is you just use the virtual offsets without
> having to know how they are calculated.
>
> If you really want to map from decompressed offsets to
> virtual offsets, you will need both the raw start offset of
> each block, but also the decompressed size of each
> block (often 64kb, but it can be less).

Initially I got the byte positions in the decompressed stream by reading the entire thing once with gzip.open(). I re-read the compressed file with BgzfReader and got the virtual offset of line number 1 million and was able to seek that line with BgzfReader much faster than I could have done with gzip.open(). See solution below, which I will post to a question of mine on stackoverflow.com.

    from Bio import bgzf
    file='file.haps.gz'
    handle = bgzf.BgzfReader(file)
    for i in range(10**6):
        handle.readline()
    virtual_offset = handle.tell()
    line1 = handle.readline()
    handle.close()
    handle = bgzf.BgzfReader(file)
    handle.seek(virtual_offset)
    line2 = handle.readline()
    handle.close()
    assert line1==line2

For completeness I want to mention that one can do:

    block_start_offset, within_block_offset = bgzf.split_virtual_offset(virtual_offset)
    virtual_offset = bgzf.make_virtual_offset(block_start_offset, within_block_offset)

P.S. I was without a stable internet connection for a few days. Hence the slow reply. Thanks for the help!

Links:
------
[1] https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#hapsample

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From csaba.kiss at lanl.gov Tue Apr 15 14:58:22 2014
From: csaba.kiss at lanl.gov (Csaba Kiss)
Date: Tue, 15 Apr 2014 08:58:22 -0600
Subject: [Biopython] python advice needed
Message-ID: <534D490E.9040604@lanl.gov>

Hi!
I need some advice how to get better in python. I have written a software package to analyze antibody deep sequencing data. This was my first experience with python and I am not a programmer. The end result works, however, if a professional coder looks at the scripts, it is obvious that it was written by an amateur. I am planning to re-write the code into a better format that is extendable and more user and coder friendly. At the moment the script only relies on biopython to get the sequences and quality values out of sff and fastq files, the rest is custom written. I would like to rely more on biopython and also perhaps extend biopython with new features. The problem I am having is object oriented python and classes. I understand the concept of both, but it's completely different to actually use it. I would like to ask help from scientist who are in a similar situation, as myself. I am a molecular biologist with interest in coding, but little background. Do you have any good tutorials books about python classes and OOP? For example, when I learned python I found the Google python class, extremely valuable. I practically looked at the videos and solved the problems and that sent me on my way to python: https://developers.google.com/edu/python/?csw=1 Any help would be appreciated: Csaba -- Best Regards: Csaba Kiss PhD, MSc, BSc TA-43, HRL-1, MS888 Los Alamos National Laboratory Work: 1-505-667-9898 Cell: 1-505-920-5774 From devaniranjan at gmail.com Tue Apr 15 16:16:02 2014 From: devaniranjan at gmail.com (George Devaniranjan) Date: Tue, 15 Apr 2014 12:16:02 -0400 Subject: [Biopython] python advice needed In-Reply-To: <534D490E.9040604@lanl.gov> References: <534D490E.9040604@lanl.gov> Message-ID: I wouldn't worry about it Csaba -it will come in time. I started in Python from C and at the beginning wrote "function style" code. After a "long" time "need" made it necessary to start with classes and I use both. 
As for the code looking "good" to a programmer, sorry if I sound cynical but I would use an "amateur" code than a "professional" as I often find the latter's efforts harder to decipher (even with comments) than an "amateur's" attempt. Good luck. On Tue, Apr 15, 2014 at 10:58 AM, Csaba Kiss wrote: > Hi! > I need some advice how to get better in python. I have written a software > package to analyze antibody deep sequencing data. This was my first > experience with python and I am not a programmer. The end result works, > however, if a professional coder looks at the scripts, it is obvious that > it was written by an amateur. I am planning to re-write the code into a > better format that is extendable and more user and coder friendly. At the > moment the script only relies on biopython to get the sequences and quality > values out of sff and fastq files, the rest is custom written. I would like > to rely more on biopython and also perhaps extend biopython with new > features. > The problem I am having is object oriented python and classes. I > understand the concept of both, but it's completely different to actually > use it. I would like to ask help from scientist who are in a similar > situation, as myself. I am a molecular biologist with interest in coding, > but little background. Do you have any good tutorials books about python > classes and OOP? For example, when I learned python I found the Google > python class, extremely valuable. 
I practically looked at the videos and > solved the problems and that sent me on my way to python: > https://developers.google.com/edu/python/?csw=1 > > Any help would be appreciated: > Csaba > > -- > Best Regards: > Csaba Kiss PhD, MSc, BSc > TA-43, HRL-1, MS888 > Los Alamos National Laboratory > Work: 1-505-667-9898 > Cell: 1-505-920-5774 > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython >

From kevin.rue at ucdconnect.ie Tue Apr 15 16:27:22 2014
From: kevin.rue at ucdconnect.ie (Kevin Rue)
Date: Tue, 15 Apr 2014 17:27:22 +0100
Subject: [Biopython] python advice needed
In-Reply-To: <534D490E.9040604@lanl.gov>
References: <534D490E.9040604@lanl.gov>
Message-ID:

Hi Csaba,

Well done! I witness every day in my research group that the transition from fundamental biology to bioinformatics is not a straightforward process. Congratulations on your first successful experience.

To give some context to my answer: I am a 3rd-year PhD student and have been trained in bioinformatics for the past 6 years (since my Master's degree). Python is the first programming language I was taught during my Master's degree (apart from a tiny amount of Matlab in maths practicals before that), and I was taught object-oriented programming through classes on the Java programming language.

I am glad that you managed to teach yourself how to program in Python through online resources. However, I think that going to actual classes can ease the learning curve a lot, particularly at the beginning and for new topics such as object-oriented programming. The interactive Q&A with the demonstrator and the questions of other classmates can help you rapidly come across some common mistakes and tricks. For instance, a post-doc in my lab is learning Python just like you, and I have seen him rack his brain for hours until I came along and pointed him in the right direction (I avoid giving a student the answer outright: "give someone food and he'll eat for the day, teach them how to cook and they'll eat for the rest of their life").

Meanwhile, it is always useful to have a book around; I have heard a lot of good things about the O'Reilly books. They have Python books for beginners, intermediate users and high-performance programming (http://shop.oreilly.com/category/browse-subjects/programming/python.do).

Now, if you allow me a few personal pieces of advice about programming (valid for Python and most languages):

- "Always write pseudo-code first"
  - Pseudo-code is "an informal high-level description of the operating principle of a computer program or other algorithm" (thanks Wikipedia, you just saved me 10 minutes finding my words).
  - In other words, before you even approach your "file.py" script, turn off the screen of your computer, take a piece of paper, and write down what your script is supposed to do, what input it will accept, and what outputs it will generate. First in one sentence of plain English. Then break the sentence into subtasks. Then continue breaking each of these subtasks into smaller ones until you recognise small tasks that you feel confident you can code in a reasonable number of lines.
  - The pseudo-code is extremely valuable for two reasons:
    - It keeps you from losing focus on what the script was originally intended to do (once coding, it is quite easy to lose sight of the greater scheme).
    - It will help you document your script, whether you write a wiki or simply comment your code (if you share it with someone else, they won't need to read the entire code to understand its purpose).
- "Draw your objects/classes"
  - Essentially, an object/class has a number of attributes (=variables) and methods (=functions). For each class I typically draw a box titled with the name of the class. Then in the box, I list the names of the attributes and the names of the methods. The names of the attributes and methods should clearly represent what they are meant to contain (attributes) or do (methods).
    - I still apply a rule that one of my earliest programming teachers taught us: "functions are meant to do stuff, therefore their name should always start with a verb of action".
- "Google is your friend"
  - That's a tricky one, but every time you know what you want to do but don't know how on earth to do it: Google your problem. You may have to browse for a while, or try different search words, but in my experience "whatever problem you run into while writing working and efficient code, someone else has likely had the same problem before you". If you can clearly explain your problem, StackOverflow and other such websites may have the answer.
- Use a code versioning tool
  - All the changes you have made over the past week have made your script worse and you don't have a copy of last week's script? Version control tools such as git/GitHub and svn will help you keep track of what your code looked like along the way. This way, you can edit a working script to try and enhance it without the fear of messing it up. If it goes sour, you can just go back to the working script without having to keep a separate backup.
- Use a friendly (but still powerful) development environment
  - IDEs (integrated development environments) are software meant to make programming easier. A (silly?) example is a feature I cannot work without: auto-completion. Tired of typing the same long variable name over and over again? Once you have defined "variable=5" in your script, a decent IDE will let you type only "var" and will open a friendly pop-up window suggesting all existing variables and methods starting with "var". Select the one you need with the arrow keys and hit TAB: you don't have to type the rest of the variable name. An amusing side-effect of this is that your variable names will grow longer (and therefore be more explicit about what they contain). IDEs come with many more features, including code checking, spell checking, ...
  - For Python I am very happy with PyCharm.

This email ended up being much longer than I intended, but I hope you will find it useful! The learning curve for Python programming can be rough. Learning additional tricks like version control, IDEs and object-oriented programming can make it even steeper, but the end result is a very rewarding skillset that can be helpful in many circumstances and appeal to many research group leaders too!

Best of luck in your learning of Python!

Kevin

On 15 April 2014 15:58, Csaba Kiss wrote: > Hi! > I need some advice how to get better in python. I have written a software > package to analyze antibody deep sequencing data. This was my first > experience with python and I am not a programmer. The end result works, > however, if a professional coder looks at the scripts, it is obvious that > it was written by an amateur. I am planning to re-write the code into a > better format that is extendable and more user and coder friendly. At the > moment the script only relies on biopython to get the sequences and quality > values out of sff and fastq files, the rest is custom written. I would like > to rely more on biopython and also perhaps extend biopython with new > features. > The problem I am having is object oriented python and classes. I > understand the concept of both, but it's completely different to actually > use it. I would like to ask help from scientist who are in a similar > situation, as myself. I am a molecular biologist with interest in coding, > but little background. Do you have any good tutorials books about python > classes and OOP? For example, when I learned python I found the Google > python class, extremely valuable.
I practically looked at the videos and > solved the problems and that sent me on my way to python: > https://developers.google.com/edu/python/?csw=1 > > Any help would be appreciated: > Csaba > > -- > Best Regards: > Csaba Kiss PhD, MSc, BSc > TA-43, HRL-1, MS888 > Los Alamos National Laboratory > Work: 1-505-667-9898 > Cell: 1-505-920-5774 >

--
Kévin RUE-ALBRECHT
Wellcome Trust Computational Infection Biology PhD Programme
University College Dublin
Ireland
http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en

From ferreirafm at usp.br Tue Apr 15 18:56:36 2014
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Tue, 15 Apr 2014 15:56:36 -0300
Subject: [Biopython] python advice needed
In-Reply-To: <534D490E.9040604@lanl.gov>
References: <534D490E.9040604@lanl.gov>
Message-ID: <534D80E4.4070100@usp.br>

Hi Csaba,

Have a look at these free books: https://drive.google.com/file/d/0B9eRIc-w3cjVV0ZZSmxSUWZDdVU/edit?usp=sharing
They will certainly help you a lot.

Best,

--
Dr. Frederico Moraes Ferreira
University of Sao Paulo
Heart Institute, School of Medicine
Laboratory of Immunology
Av. Dr. Enéas de Carvalho Aguiar, 44
05403-900 Sao Paulo - SP
Brasil

On 15-04-2014 11:58, Csaba Kiss wrote: > Hi! > I need some advice how to get better in python. I have written a > software package to analyze antibody deep sequencing data. This was my > first experience with python and I am not a programmer. The end result > works, however, if a professional coder looks at the scripts, it is > obvious that it was written by an amateur. I am planning to re-write > the code into a better format that is extendable and more user and > coder friendly. At the moment the script only relies on biopython to > get the sequences and quality values out of sff and fastq files, the > rest is custom written.
I would like to rely more on biopython and > also perhaps extend biopython with new features. > The problem I am having is object oriented python and classes. I > understand the concept of both, but it's completely different to > actually use it. I would like to ask help from scientist who are in a > similar situation, as myself. I am a molecular biologist with interest > in coding, but little background. Do you have any good tutorials books > about python classes and OOP? For example, when I learned python I > found the Google python class, extremely valuable. I practically > looked at the videos and solved the problems and that sent me on my > way to python: > https://developers.google.com/edu/python/?csw=1 > > Any help would be appreciated: > Csaba >

From csaba.kiss at lanl.gov Tue Apr 15 19:19:19 2014
From: csaba.kiss at lanl.gov (Csaba Kiss)
Date: Tue, 15 Apr 2014 13:19:19 -0600
Subject: [Biopython] python advice needed
In-Reply-To:
References: <534D490E.9040604@lanl.gov>
Message-ID: <534D8637.1040507@lanl.gov>

Thanks for the advice Kevin. If this was a forum, they should make your post a sticky :). I use pycharm and really like it. However, using it efficiently is also challenging.

Csaba

--
Best Regards:
Csaba Kiss PhD, MSc, BSc
TA-43, HRL-1, MS888
Los Alamos National Laboratory
Work: 1-505-667-9898
Cell: 1-505-920-5774

From kevin.rue at ucdconnect.ie Tue Apr 15 19:40:44 2014
From: kevin.rue at ucdconnect.ie (Kevin Rue)
Date: Tue, 15 Apr 2014 20:40:44 +0100
Subject: [Biopython] python advice needed
In-Reply-To: <534D8637.1040507@lanl.gov>
References: <534D490E.9040604@lanl.gov> <534D8637.1040507@lanl.gov>
Message-ID:

Thanks, much appreciated!

Best of luck,
Kevin

On 15 April 2014 20:19, Csaba Kiss wrote: > Thanks for the advice Kevin.
If this was a forum, they should make your > post a sticky :). I use pycharm and really like it. However, using it > efficiently is also challenging. > > Csaba

--
Kévin RUE-ALBRECHT
Wellcome Trust Computational Infection Biology PhD Programme
University College Dublin
Ireland
http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en

From catfish at austin.utexas.edu Tue Apr 15 22:37:37 2014
From: catfish at austin.utexas.edu (Cannatella, David)
Date: Tue, 15 Apr 2014 22:37:37 +0000
Subject: [Biopython] Biopython and OSX Mavericks problem.
Message-ID: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu>

I've had the same problem as Mike Shaffer (8 April) on the discussion list (see my errors below).

I have tried the possible solutions mentioned (including the export commands below), as well as those on the stackoverflow site, but none has worked so far.

Some posted solutions have included installing other versions of python, etc., but this is not practical for several reasons.

Is re-installing an earlier version of XCode a viable option for me? Or is it likely that there might be a fix in BioPython in the next month?

Thanks in advance,
Dave

=========
export CPPFLAGS=-Qunused-arguments
export CFLAGS=-Qunused-arguments

===============
...
building 'Bio.cpairwise2' extension cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 ============= From arklenna at gmail.com Wed Apr 16 05:40:21 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Wed, 16 Apr 2014 01:40:21 -0400 Subject: [Biopython] Biopython and OSX Mavericks problem. In-Reply-To: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> References: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> Message-ID: On Tue, Apr 15, 2014 at 6:37 PM, Cannatella, David < catfish at austin.utexas.edu> wrote: > I've had the same problem as Mike Shaffer (8 April) on the discussion list > (see my errors below). > > I had tried the possible solutions mentioned (including the export > commands below), including those on the stackoverflow site, but none has > worked so far. > Have you tried the ARCHFLAGS option? http://stackoverflow.com/a/22372751 Are you installing with sudo? > > Some posted solutions have included installing other versions of python, > etc., but this is not practical for several reasons. > > Is re-installing an earlier version of XCode a viable option for me? > Apple is not particularly supportive of downgrading, so I imagine this could have unforeseen side effects. 
clang will have this behavior indefinitely; the problem is an incompatibility between clang (part of XCode) and the python version distributed with the OS. Hopefully there will be an OS update soon fixing the python version. > Or is it likely that there might be a fix in BioPython in the next month? > > Thanks in advance, > Dave > > ========= > export CPPFLAGS=-Qunused-arguments > export CFLAGS=-Qunused-arguments > > =============== > ... > building 'Bio.cpairwise2' extension > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g > -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a warning) > in the future > error: command 'cc' failed with exit status 1 > ============= >

From stephane.teletchea at inserm.fr Wed Apr 16 12:38:40 2014
From: stephane.teletchea at inserm.fr (Téletchéa Stéphane)
Date: Wed, 16 Apr 2014 14:38:40 +0200
Subject: [Biopython] python advice needed
In-Reply-To: <534D490E.9040604@lanl.gov>
References: <534D490E.9040604@lanl.gov>
Message-ID: <534E79D0.1080008@inserm.fr>

On 15/04/2014 16:58, Csaba Kiss wrote: > Hi! > I need some advice how to get better in python. I have written a > software package to analyze antibody deep sequencing data. This was my
This was my > first experience with python and I am not a programmer. The end result > works, however, if a professional coder looks at the scripts, it is > obvious that it was written by an amateur. I am planning to re-write > the code into a better format that is extendable and more user and > coder friendly. At the moment the script only relies on biopython to > get the sequences and quality values out of sff and fastq files, the > rest is custom written. I would like to rely more on biopython and > also perhaps extend biopython with new features. > The problem I am having is object oriented python and classes. I > understand the concept of both, but it's completely different to > actually use it. I would like to ask help from scientist who are in a > similar situation, as myself. I am a molecular biologist with interest > in coding, but little background. Do you have any good tutorials books > about python classes and OOP? For example, when I learned python I > found the Google python class, extremely valuable. I practically > looked at the videos and solved the problems and that sent me on my > way to python: > https://developers.google.com/edu/python/?csw=1 > > Any help would be appreciated: > Csaba > Dear Csaba, Being a bioinformatics teacher, I would first say that your code should first work :-) Second, in order to get another version of your code, as mentioned earlier, you should use a git-like versioning control tool (git or any other, git tends to be popular). 
Third, concerning python itself, I would recommend following the "PEP8" recommendations: http://legacy.python.org/dev/peps/pep-0008/ (I also found this page while searching for PEP8 -> http://docs.python-guide.org/en/latest/writing/style/) And last, since we are using biopython, check how biopython is implemented (for example): https://github.com/biopython/biopython/blob/master/Bio/AlignIO/Interfaces.py Best, Stéphane -- Equipe DSIMB - Dynamique des Structures et des Interactions des Macromolécules Biologiques INTS, INSERM-Paris-Diderot UMR_S 1134 6 rue Alexandre Cabanel - 75739 Paris cedex 15 - France Tél : +33 144 493 057 Fax : +33 147 347 431 http://www.dsimb.inserm.fr - http://www.steletch.org From kevin.rue at ucdconnect.ie Wed Apr 16 15:09:53 2014 From: kevin.rue at ucdconnect.ie (Kevin Rue) Date: Wed, 16 Apr 2014 16:09:53 +0100 Subject: [Biopython] python advice needed In-Reply-To: <534E79D0.1080008@inserm.fr> References: <534D490E.9040604@lanl.gov> <534E79D0.1080008@inserm.fr> Message-ID: Stephane, "I would first say that your code should first work" With all due respect, I would be careful how you phrased that, especially when talking to a beginner in programming. The way I understand it (not necessarily the way you meant it), this could be one of the worst pieces of advice I have heard. I would very much rather have a script that does not work is well commented and documented (making it easier to debug), than a script copied from StackOverflow that works with obscure syntax and no comment to guide you in understanding it. Programming 1.0.1 in my opinion. To me, a script that is "pseudo-coded first", is much more likely to "work second" (i.e. first after the pseudo-code, excuse my play on words). Still in my humble opinion, creating code that "works first" is one of the best ways to write something that works fine in your particular little ultra-specific scenario.
At best, it will need to be copy-pasted and edited for another scenario; more often you will start another script from scratch to "work first" in the second scenario. PEP8 is very good advice that I forgot to mention. PyCharm is very useful in that regard as it checks the code for the PEP8 rules while it is typed. Regards, Kevin On 16 April 2014 13:38, Téletchéa Stéphane wrote: > On 15/04/2014 16:58, Csaba Kiss wrote: > > Hi! >> I need some advice how to get better in python. I have written a software >> package to analyze antibody deep sequencing data. This was my first >> experience with python and I am not a programmer. The end result works, >> however, if a professional coder looks at the scripts, it is obvious that >> it was written by an amateur. I am planning to re-write the code into a >> better format that is extendable and more user and coder friendly. At the >> moment the script only relies on biopython to get the sequences and quality >> values out of sff and fastq files, the rest is custom written. I would like >> to rely more on biopython and also perhaps extend biopython with new >> features. >> The problem I am having is object oriented python and classes. I >> understand the concept of both, but it's completely different to actually >> use it. I would like to ask help from scientist who are in a similar >> situation, as myself. I am a molecular biologist with interest in coding, >> but little background. Do you have any good tutorials books about python >> classes and OOP? For example, when I learned python I found the Google >> python class, extremely valuable.
I practically looked at the videos and >> solved the problems and that sent me on my way to python: >> https://developers.google.com/edu/python/?csw=1 >> >> Any help would be appreciated: >> Csaba >> >> > Dear Csaba, > > Being a bioinformatics teacher, I would first say that your code should > first work :-) > > Second, in order to get another version of your code, as mentioned > earlier, you should use > a git-like versioning control tool (git or any other, git tends to be > popular). > > Third, concerning python itself, I would recommend following the "PEP8" > recommendations: > http://legacy.python.org/dev/peps/pep-0008/ > > (I also found this page while searching for PEP8 -> > http://docs.python-guide.org/en/latest/writing/style/) > > And last, since we are using biopython, check how biopython is implemented > (for example): > https://github.com/biopython/biopython/blob/master/Bio/AlignIO/Interfaces.py > > Best, > Stéphane > > -- > Equipe DSIMB - Dynamique des Structures et > des Interactions des Macromolécules Biologiques > INTS, INSERM-Paris-Diderot UMR_S 1134 > 6 rue Alexandre Cabanel - 75739 Paris cedex 15 - France > Tél : +33 144 493 057 > Fax : +33 147 347 431 > http://www.dsimb.inserm.fr - http://www.steletch.org > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Kévin RUE-ALBRECHT Wellcome Trust Computational Infection Biology PhD Programme University College Dublin Ireland http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en From kevin.rue at ucdconnect.ie Wed Apr 16 15:11:08 2014 From: kevin.rue at ucdconnect.ie (Kevin Rue) Date: Wed, 16 Apr 2014 16:11:08 +0100 Subject: [Biopython] python advice needed In-Reply-To: References: <534D490E.9040604@lanl.gov> <534E79D0.1080008@inserm.fr> Message-ID: Just adding a missing "but" in one of my sentences below: On 16 April 2014 16:09, Kevin Rue wrote: > Stephane, > > "I would first say
that your code should first work" > > With all due respect, I would be careful how you phrased that, especially > when talking to a beginner in programming. The way I understand it (not > necessarily the way you meant it), this could be one of the worst advice I > have heard. > I would very much rather have a script that does not work but is well > commented and documented (making it easier to debug), than a script copied > from StackOverflow that works with obscure syntax and no comment to guide > you in understanding it. Programming 1.0.1 in my opinion. > > To me, a script that is "pseudo-coded first", is much more likely to "work > second" (i.e. first after the pseudo-code, excuse my play on words). > Still in my humble opinion, creating a code that "works first" is one of > the best way to write something that works fine in your particular little > ultra-specific scenario. At best, it will need to be copy-pasted and edited > for another scenario, more often you will start from scratch another script > to "work first" in the second scenario. > > The PEP8 is a very good advice I forgot to mention. PyCharm is very useful > in that regard as it checks the code for the PEP8 rules while it is typed. > > Regards, > Kevin > > > On 16 April 2014 13:38, Téletchéa Stéphane wrote: > >> On 15/04/2014 16:58, Csaba Kiss wrote: >> >> Hi! >>> I need some advice how to get better in python. I have written a >>> software package to analyze antibody deep sequencing data. This was my >>> first experience with python and I am not a programmer. The end result >>> works, however, if a professional coder looks at the scripts, it is obvious >>> that it was written by an amateur. I am planning to re-write the code into >>> a better format that is extendable and more user and coder friendly. At the >>> moment the script only relies on biopython to get the sequences and quality >>> values out of sff and fastq files, the rest is custom written.
I would like >>> to rely more on biopython and also perhaps extend biopython with new >>> features. >>> The problem I am having is object oriented python and classes. I >>> understand the concept of both, but it's completely different to actually >>> use it. I would like to ask help from scientist who are in a similar >>> situation, as myself. I am a molecular biologist with interest in coding, >>> but little background. Do you have any good tutorials books about python >>> classes and OOP? For example, when I learned python I found the Google >>> python class, extremely valuable. I practically looked at the videos and >>> solved the problems and that sent me on my way to python: >>> https://developers.google.com/edu/python/?csw=1 >>> >>> Any help would be appreciated: >>> Csaba >>> >>> >> Dear Csaba, >> >> Being a bioinformatics teacher, I would first say that your code should >> first work :-) >> >> Second, in order to get another version of your code, as mentioned >> earlier, you should use >> a git-like versioning control tool (git or any other, git tends to be >> popular). 
>> >> Third, concerning python itself, I would recommend following the "PEP8" >> recommendations: >> http://legacy.python.org/dev/peps/pep-0008/ >> >> (I also found this page while searching for PEP8 -> >> http://docs.python-guide.org/en/latest/writing/style/) >> >> And last, since we are using biopython, check how biopython is >> implemented (for example): >> https://github.com/biopython/biopython/blob/master/Bio/AlignIO/Interfaces.py >> >> Best, >> Stéphane >> >> -- >> Equipe DSIMB - Dynamique des Structures et >> des Interactions des Macromolécules Biologiques >> INTS, INSERM-Paris-Diderot UMR_S 1134 >> 6 rue Alexandre Cabanel - 75739 Paris cedex 15 - France >> Tél : +33 144 493 057 >> Fax : +33 147 347 431 >> http://www.dsimb.inserm.fr - http://www.steletch.org >> >> >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > > > -- > Kévin RUE-ALBRECHT > Wellcome Trust Computational Infection Biology PhD Programme > University College Dublin > Ireland > http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en > -- Kévin RUE-ALBRECHT Wellcome Trust Computational Infection Biology PhD Programme University College Dublin Ireland http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en From zhigang.wu at email.ucr.edu Wed Apr 16 20:31:56 2014 From: zhigang.wu at email.ucr.edu (Zhigang Wu) Date: Wed, 16 Apr 2014 13:31:56 -0700 Subject: [Biopython] Biopython and OSX Mavericks problem. In-Reply-To: References: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> Message-ID: If you cannot get pip working, I recommend trying MacPorts, which often involves less hassle.
In case you have not used port before, follow the instructions at http://www.macports.org/install.php to install it first; then you can install Biopython by typing `sudo port install 'py27-biopython'` (I assume your OS X is running Python 2.7). Zhigang On Tue, Apr 15, 2014 at 10:40 PM, Lenna Peterson wrote: > On Tue, Apr 15, 2014 at 6:37 PM, Cannatella, David < > catfish at austin.utexas.edu> wrote: > > > I've had the same problem as Mike Shaffer (8 April) on the discussion > list > > (see my errors below). > > > > I had tried the possible solutions mentioned (including the export > > commands below), including those on the stackoverflow site, but none has > > worked so far. > > > > Have you tried the ARCHFLAGS option? http://stackoverflow.com/a/22372751 > > Are you installing with sudo? > > > > > > Some posted solutions have included installing other versions of python, > > etc., but this is not practical for several reasons. > > > > Is re-installing an earlier version of XCode a viable option for me? > > > > Apple is not particularly supportive of downgrading, so I imagine this > could have unforeseen side effects. clang will have this behavior > indefinitely; the problem is an incompatibility between clang (part of > XCode) and the python version distributed with the OS. Hopefully there will > be an OS update soon fixing the python version. > > > > Or is it likely that there might be a fix in BioPython in the next month? > > > > Thanks in advance, > > Dave > > > > ========= > > export CPPFLAGS=-Qunused-arguments > > export CFLAGS=-Qunused-arguments > > > > =============== > > ... 
> > building 'Bio.cpairwise2' extension > > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g > > -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > warning) > > in the future > > error: command 'cc' failed with exit status 1 > > ============= > > > > > > > > > > > > > > > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Apr 17 09:33:48 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Apr 2014 10:33:48 +0100 Subject: [Biopython] Biopython and OSX Mavericks problem. In-Reply-To: References: <48EB8916-B06A-4182-B67A-BD6BA25C90F3@austin.utexas.edu> Message-ID: On Wed, Apr 16, 2014 at 6:40 AM, Lenna Peterson wrote: > On Tue, Apr 15, 2014 at 6:37 PM, Cannatella, David < > catfish at austin.utexas.edu> wrote: > >> I've had the same problem as Mike Shaffer (8 April) on the discussion list >> (see my errors below). >> >> I had tried the possible solutions mentioned (including the export >> commands below), including those on the stackoverflow site, but none has >> worked so far. >> > > Have you tried the ARCHFLAGS option? 
http://stackoverflow.com/a/22372751 > > Are you installing with sudo? > >> >> Some posted solutions have included installing other versions of python, >> etc., but this is not practical for several reasons. >> >> Is re-installing an earlier version of XCode a viable option for me? >> > > Apple is not particularly supportive of downgrading, so I imagine this > could have unforeseen side effects. clang will have this behavior > indefinitely; the problem is an incompatibility between clang (part of > XCode) and the python version distributed with the OS. Hopefully there will > be an OS update soon fixing the python version.
The other option (which is actually the recommended route according to the NumPy/SciPy folk) is to ignore the Apple-provided Python (don't try to remove it!), and install your own direct from python.org. They do provide precompiled binaries for the Mac, but I like to do this myself under $HOME in order to test with the newer releases like Python 3.3 and 3.4 etc. Try something like:
$ cd ~/Downloads
$ wget http://www.python.org/ftp/python/3.3.3/Python-3.3.3.tgz
$ tar -zxvf Python-3.3.3.tgz
$ cd Python-3.3.3
$ ./configure --prefix=$HOME
$ make
$ make test
$ make install
Then modify your $HOME/.bash_profile to add $HOME/bin to your path:
export PATH=$HOME/bin:$PATH
Then install NumPy, and try installing Biopython from source. Peter From vikthirtyfive at gmail.com Sat Apr 19 14:31:03 2014 From: vikthirtyfive at gmail.com (Vikram K) Date: Sat, 19 Apr 2014 10:31:03 -0400 Subject: [Biopython] cosmic data Message-ID: Dear Biopython users, is there any biopython module which can be used to help analyze cosmic data? Regards Vikram From biologyguy at gmail.com Sat Apr 19 21:08:17 2014 From: biologyguy at gmail.com (Steve Bond) Date: Sat, 19 Apr 2014 17:08:17 -0400 Subject: [Biopython] cosmic data Message-ID: Hi Vikram, I'm not sure if Biopython has a dedicated module, but the cosmic database is actually quite simple to work with on its own.
It's only a single table, and can be loaded into your favourite SQL server in a snap. What sort of analysis are you trying to do? Here's the link to the database download if you don't already have a copy ftp://ftp.sanger.ac.uk/pub/CGP/cosmic/data_export/CosmicCompleteExportIncFus_v68.tsv.gz -Steve On Sat, Apr 19, 2014 at 12:00 PM, wrote: > > > Message: 1 > Date: Sat, 19 Apr 2014 10:31:03 -0400 > From: Vikram K > Subject: [Biopython] cosmic data > To: biopython at lists.open-bio.org > Message-ID: > XDfXsfzdN2n27CxX4FwcJQXbG8n+hX1OZA at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > Dear Biopython users, > is there any biopython module which can be used to help analyze cosmic > data? > > Regards > Vikram > > From rpathmanaban1 at gmail.com Sat Apr 19 22:04:36 2014 From: rpathmanaban1 at gmail.com (Pathmanaban Ramasamy) Date: Sun, 20 Apr 2014 00:04:36 +0200 Subject: [Biopython] Graph representation Message-ID: Dear Biopython users, Can anyone help me how to represent residue contact map of pdb files in adjacency list? Also need some good references/articles related. Thanks in advance, Pathmanaban. From p.j.a.cock at googlemail.com Tue Apr 22 16:11:33 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 22 Apr 2014 17:11:33 +0100 Subject: [Biopython] Fwd: [GSoC] Welcome aboard, GSoC 2014 students! In-Reply-To: References: Message-ID: Dear Biopythoneers, Please join me in congratulating this year's accepted students for Google Summer of Code under the Open Bioinformatics Foundation (OBF), and in particular welcome Evan Parker who will be working on Biopython with Bow and myself as co-mentors. As always the scheme has been very competitive, so our sympathies and commiserations to those students who were not accepted. Please do stay involved in Biopython or other open source projects - this would be a positive factor if you are eligible to apply for next year's GSoC. 
Thank you, Peter ---------- Forwarded message ---------- From: Eric Talevich Date: Tue, Apr 22, 2014 at 4:41 AM Subject: [GSoC] Welcome aboard, GSoC 2014 students! To: OBF GSoC Hi all, I'm pleased to announce the acceptance of OBF's 2014 Google Summer of Code students:
Sarah Berkemer - "Open source high-performance BioHaskell" (Mentors: Christian Höner zu Siederdissen, Ketil Malde)
Loris Cro - "An ultra-fast scalable RESTful API to query large numbers of VCF datapoints" (Mentors: Francesco Strozzi, Raoul Bonnal & the BioRuby team)
Victor Kofia - "JSBML: Redesign the implementation of mathematical formulas" (Mentors: Alex Thomas, Sarah Keating & the JSBML team)
Evan Parker - "Addition of a lazy loading sequence parser to Biopython's SeqIO package" (Mentors: Wibowo Arindrarto, Peter Cock & the Biopython team)
Ibrahim Vairabad - "Improving the Plug-in interface for CellDesigner" (Mentors: Andreas Dräger, Alex Thomas & the JSBML team)
Leandro Watanabe - "Dynamic Modeling of Cellular Populations within JSBML" (Mentors: Nicolas Rodriguez, Chris Meyers & the JSBML team)
Congratulations to our accepted students! Thanks very much to all the students who applied, we very much appreciate your hard work. Today marks the start of the Community Bonding Period. Official work starts on May 23rd, and until then, students should prepare for their projects: get on the project mailing lists, solidify your plans, figure out where all the version control repositories are and which branch or fork you'll be working on, and start doing preparatory work. Students: if you have not done so already, make sure you have subscribed to the OBF GSoC email list at: http://lists.open-bio.org/mailman/listinfo/gsoc This list is for discussions among students and mentors, and for administrative announcements from me or my co-administrators.
Here's to a great 2014 Summer of Code, Eric & Raoul OBF GSoC 2014 Organization Administrators _______________________________________________ GSoC mailing list GSoC at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/gsoc From mfethe1 at gmail.com Tue Apr 22 21:59:14 2014 From: mfethe1 at gmail.com (Michael Fethe) Date: Tue, 22 Apr 2014 17:59:14 -0400 Subject: [Biopython] Virus alert during qblast() Message-ID: Hi, I am submitting sequences to blast via biopython. My script runs over multiple hours and can take quite some time (working with hundreds of sequences). Is it possible for my computer or someone to mistake this script running as a virus since it writes my blast results to an output file and then submits my next sequence? Thanks, Michael Fethe From p.j.a.cock at googlemail.com Tue Apr 22 22:02:53 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 22 Apr 2014 23:02:53 +0100 Subject: [Biopython] Virus alert during qblast() In-Reply-To: References: Message-ID: Hi Michael, That seems unlikely - but if you are doing hundreds of automated BLAST queries, the NCBI might not be very happy. For big BLAST jobs, I would always use standalone BLAST running locally (on your cluster if possible). This is generally faster as well :) Regards, Peter On Tue, Apr 22, 2014 at 10:59 PM, Michael Fethe wrote: > Hi, > > I am submitting sequences to blast via biopython. My script > runs over multiple hours and can take quite some time > (working with hundreds of sequences). Is it possible for > my computer or someone to mistake this script running > as a virus since it writes my blast results to an output file > and then submits my next sequence? 
> > Thanks, > > Michael Fethe > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at googlemail.com Wed Apr 23 05:44:59 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 23 Apr 2014 06:44:59 +0100 Subject: [Biopython] Virus alert during qblast() In-Reply-To: <502F4F7C-09E1-4229-AD4D-FFB2275C5E17@gmail.com> References: <502F4F7C-09E1-4229-AD4D-FFB2275C5E17@gmail.com> Message-ID: Hi again, Using standalone BLAST+ at the command line with -remote you can specify an Entrez filter option -entrez_query on the organism. Another option which may be better is to make a target database (e.g all fully sequenced bacteria). Peter On Wed, Apr 23, 2014 at 12:45 AM, Michael Fethe wrote: > Hi Peter, > > I am blasting unknowns, however, can I limit biopython to bacteria in my qblast command? > > Michael Fethe > >> On Apr 22, 2014, at 6:02 PM, Peter Cock wrote: >> >> Hi Michael, >> >> That seems unlikely - but if you are doing hundreds of >> automated BLAST queries, the NCBI might not be very >> happy. >> >> For big BLAST jobs, I would always use standalone >> BLAST running locally (on your cluster if possible). >> This is generally faster as well :) >> >> Regards, >> >> Peter >> >>> On Tue, Apr 22, 2014 at 10:59 PM, Michael Fethe wrote: >>> Hi, >>> >>> I am submitting sequences to blast via biopython. My script >>> runs over multiple hours and can take quite some time >>> (working with hundreds of sequences). Is it possible for >>> my computer or someone to mistake this script running >>> as a virus since it writes my blast results to an output file >>> and then submits my next sequence? 
>>> >>> Thanks, >>> >>> Michael Fethe >>> _______________________________________________ >>> Biopython mailing list - Biopython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython From davidsshin at lbl.gov Wed Apr 23 07:38:55 2014 From: davidsshin at lbl.gov (David Shin) Date: Wed, 23 Apr 2014 00:38:55 -0700 Subject: [Biopython] Virus alert during qblast() In-Reply-To: References: <502F4F7C-09E1-4229-AD4D-FFB2275C5E17@gmail.com> Message-ID:
For standalone, which yes, will run way way way faster, this is what I did to make a few filtered databases. I tried to give nucleotide examples, in case that's what you are looking for...
Go to the nucleotide or protein (whichever you are working on) BLAST page, e.g. "Nucleotide BLAST: Search nucleotide databases using a nucleotide query", and start typing the organism or species in the optional "Organism" text field.
Get the taxid, i.e. if you typed in bacteria, you would get taxid:2; if you put zea mays, you would get taxid:4577.
Then go to the NCBI nucleotide (or protein) page, "Home - Nucleotide - NCBI".
Use the following syntax for your search (will use the zea mays example):
txid4577[ORGN]
Then, from the "send to:" pulldown on the webpage, click the "file" button; a dropdown will appear under format, select "gi list".
Save the file, but change the name to something you will remember, like sequence.gi.4577.txt, in case you want different filters later.
Then, in your database directory where you have downloaded the all nr nucleotide database, run:
blastdb_aliastool -gilist sequence.gi.4577.txt -db nr -out nr_gi.4577 -title nr_gi.4577
to give a filter called nr_gi.4577.
Then when you blast from your script, it would look something like:
blastn -query mysequence.fs -num_threads 4 -db nr_gi.4577 -out test-4577.out
In my case, I made a filter that had just "plants", using taxid 3193, but also a subset that had ~15 selected species, by combining the "gi list" output from separate searches.. ie.
like if I wanted a "bacteria + zea mays" filter because I was psychotic, I would cat together the gi lists files from txid2 and txid4577. Anyway, that's how you can run everything locally after you have it set up, and reduce time by a significant amount. At least, that's how I did it, if anyone has a better way, let me know. D On Tue, Apr 22, 2014 at 10:44 PM, Peter Cock wrote: > Hi again, > > Using standalone BLAST+ at the command line with -remote > you can specify an Entrez filter option -entrez_query on the > organism. > > Another option which may be better is to make a target > database (e.g all fully sequenced bacteria). > > Peter > > > On Wed, Apr 23, 2014 at 12:45 AM, Michael Fethe wrote: > > Hi Peter, > > > > I am blasting unknowns, however, can I limit biopython to bacteria in my > qblast command? > > > > Michael Fethe > > > >> On Apr 22, 2014, at 6:02 PM, Peter Cock > wrote: > >> > >> Hi Michael, > >> > >> That seems unlikely - but if you are doing hundreds of > >> automated BLAST queries, the NCBI might not be very > >> happy. > >> > >> For big BLAST jobs, I would always use standalone > >> BLAST running locally (on your cluster if possible). > >> This is generally faster as well :) > >> > >> Regards, > >> > >> Peter > >> > >>> On Tue, Apr 22, 2014 at 10:59 PM, Michael Fethe > wrote: > >>> Hi, > >>> > >>> I am submitting sequences to blast via biopython. My script > >>> runs over multiple hours and can take quite some time > >>> (working with hundreds of sequences). Is it possible for > >>> my computer or someone to mistake this script running > >>> as a virus since it writes my blast results to an output file > >>> and then submits my next sequence? 
> >>> > >>> Thanks, > >>> > >>> Michael Fethe > >>> _______________________________________________ > >>> Biopython mailing list - Biopython at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biopython > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- David Shin, Ph.D Lawrence Berkeley National Labs 1 Cyclotron Road MS 83-R0101 Berkeley, CA 94720 USA From p.j.a.cock at googlemail.com Thu Apr 24 19:53:34 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 Apr 2014 20:53:34 +0100 Subject: [Biopython] Versions of PyPy to support? Message-ID: Hello all, PyPy 2.3 is due out shortly, which prompts me to ask which versions of PyPy are people using Biopython with? PyPy is an alternative implementation of Python, which can often be much faster - see http://pypy.org/ We're currently testing with PyPy 1.8, 1.9, 2.0, 2.1 and 2.2 but I would like to suggest we drop at least PyPy 1.8 and 1.9. Is that OK? Thanks! Peter From manlio.calvi at gmail.com Thu Apr 24 21:04:47 2014 From: manlio.calvi at gmail.com (Manlio Calvi) Date: Thu, 24 Apr 2014 23:04:47 +0200 Subject: [Biopython] [Biopython-dev] Versions of PyPy to support? In-Reply-To: References: Message-ID: I'm thinking about running a PyPy test on my machine; is there anything specific to watch out for? I've seen they have a beta of PyPy for Python 3, probably a bit experimental at the moment, but they say it mostly works (as they say of PyPy in general). Manlio On Thu, Apr 24, 2014 at 9:53 PM, Peter Cock wrote: > Hello all, > > PyPy 2.3 is due out shortly, which prompts me to ask which > versions of PyPy are people using Biopython with? > > PyPy is an alternative implementation of Python, which > can often be much faster - see http://pypy.org/ > > We're currently testing with PyPy 1.8, 1.9, 2.0, 2.1 and 2.2 > but I would like to suggest we drop at least PyPy 1.8 and 1.9. > > Is that OK?
> > Thanks! > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Thu Apr 24 21:07:54 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 Apr 2014 22:07:54 +0100 Subject: [Biopython] Versions of PyPy to support? In-Reply-To: References: Message-ID: Try PyPy 2.2 first, and if that works you can try their experimental Python 3 support? Peter On Thursday, April 24, 2014, Manlio Calvi wrote: > I'm thinking about run a PyPy test on my machine, something specific > to watch out? > > I've seen they have a beta on pypy for 3, probably a bit experimental > at the moment but they saying it mostly works (as they say of PyPy in > general) > > Manlio > > On Thu, Apr 24, 2014 at 9:53 PM, Peter Cock > > wrote: > > Hello all, > > > > PyPy 2.3 is due out shortly, which prompts me to ask which > > versions of PyPy are people using Biopython with? > > > > PyPy is an alternative implementation of Python, which > > can often be much faster - see http://pypy.org/ > > > > We're currently testing with PyPy 1.8, 1.9, 2.0, 2.1 and 2.2 > > but I would like to suggest we drop at least PyPy 1.8 and 1.9. > > > > Is that OK? > > > > Thanks! > > > > Peter > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From smortaz at gmail.com Fri Apr 25 06:18:45 2014 From: smortaz at gmail.com (Sean Mortazavi) Date: Thu, 24 Apr 2014 23:18:45 -0700 Subject: [Biopython] The PTVS gang (from microsoft) would like to hear from BioPython users! Message-ID: Hi - we're a few engineers (not marketers!) that work on Python Tools for Visual Studio (http://pytools.codeplex.com). It's a free & OSS plug-in that turns VS into a decent Python IDE. 
It has some nice features like mixed-mode Python/C++ debugging, debugging on Linux from Visual Studio and IPython integration. Some of our users (including BioPython users) have been encouraging us to enhance PTVS and add support for some "Data Science" focused features & scenarios. We'd *love* to get your input regarding your current stack, workflow and pain points before taking the next steps. It does not matter if you use Windows, Visual Studio, love/hate Microsoft - we'd just love to understand your environment a bit better especially if you use tools like Excel, R, Matlab, Mathematica, numpy, scipy, Pandas, ... As a thank you, 50 people will be randomly selected to receive a $5 Starbucks Coffee card! Here is a link to the survey which should take about 2 minutes to complete. https://www.surveymonkey.com/s/VSForDataScience If you know others that might be interested in taking this survey, *please* forward it to them - much appreciated. Thanks for your participation! From mictadlo at gmail.com Tue Apr 29 10:28:39 2014 From: mictadlo at gmail.com (Mic) Date: Tue, 29 Apr 2014 20:28:39 +1000 Subject: [Biopython] Parsing SnpEff's VCF file Message-ID: Hello, SnpEff created a new VCF file which looks like this line DA_v3.0 1252 DA0000001 G T 3.0 . DP=12;EFF=DOWNSTREAM(MODIFIER|||||Q3TPR7|||Transcript_DA_0011r.4||1),DOWNSTREAM(MODIFIER|||||Q8GYX9|||Transcript_DA_0011r.2||1),INTERGENIC(MODIFIER||||||||||1) GT:DP 0/0:3 ./.:0 ./.:0 1/1:3 0/0:3 0/0:1 0/0:2 ./.:0 I found Gemini project which contains a SnpEff class ( https://github.com/arq5x/gemini/blob/master/gemini/snpEff.py ). However, I am not quite sure how to use snpEff.py outside Gemini project in order to parse the whole SnpEff's VCF file. Or does BioPython provide a parser? 
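For the EFF annotation shown in the example line above, plain Python is enough to pull the effects apart. What follows is only a minimal sketch, assuming the "EFF=Effect(field|field|...),Effect(...)" layout visible in that line; the names EFF_PATTERN and parse_eff are hypothetical helpers, not part of Biopython, SnpEff or Gemini, and the meaning of each "|"-separated position should be checked against the ##INFO header line for EFF in the VCF file itself.

```python
import re

# Hypothetical helper names; this is not a Biopython, SnpEff or Gemini API.
# Matches one "EFFECT_NAME(field|field|...)" entry inside the EFF value.
EFF_PATTERN = re.compile(r"(\w+)\(([^)]*)\)")


def parse_eff(info):
    """Return a list of (effect_name, fields) tuples from a VCF INFO string.

    Assumes the INFO column is a ';'-separated list of KEY=VALUE entries
    and that SnpEff wrote its annotation under the key EFF.
    """
    for entry in info.split(";"):
        if entry.startswith("EFF="):
            value = entry[len("EFF="):]
            return [(name, body.split("|"))
                    for name, body in EFF_PATTERN.findall(value)]
    return []  # no EFF annotation on this record


# INFO column copied from the example line in the message above.
sample_info = (
    "DP=12;EFF=DOWNSTREAM(MODIFIER|||||Q3TPR7|||Transcript_DA_0011r.4||1),"
    "DOWNSTREAM(MODIFIER|||||Q8GYX9|||Transcript_DA_0011r.2||1),"
    "INTERGENIC(MODIFIER||||||||||1)"
)

effects = parse_eff(sample_info)
for effect_name, fields in effects:
    print(effect_name, fields)
```

To process a whole file, one could skip the lines starting with "#", split each remaining line on tabs, and pass the eighth column (INFO) to parse_eff.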
Thank you in advance, From p.j.a.cock at googlemail.com Wed Apr 30 13:24:42 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 30 Apr 2014 14:24:42 +0100 Subject: [Biopython] Bug#746484: file not distributable In-Reply-To: <20140430131703.GD758@an3as.eu> References: <20140430131703.GD758@an3as.eu> Message-ID: Hi Andreas, Are you specifically asking about Biopython 1.63 here? I think you can reasonably exclude this DTD file (and any others under the Bio/Entrez/DTDs folder). Biopython 1.63 will warn if they are missing but attempt to download them automatically. We're looking at dropping all the NCBI Entrez related DTD files, since the next Biopython release (v1.64) will automatically download AND cache them - see recent discussion, e.g. http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011205.html We haven't actually removed the files on GitHub yet, but this might be an incentive to do so. Thanks, Peter On Wed, Apr 30, 2014 at 2:17 PM, Andreas Tille wrote: > Hello, > > our ftpmaster has detected an issue with one of the DTDs which are > distributed with BioPython source. > > I can confirm that after applying the following patch > > > --- a/Bio/Entrez/DTDs/modules.ent > +++ b/Bio/Entrez/DTDs/modules.ent > @@ -350,13 +350,6 @@ Version Reason/Occasion > "mathmlsetup.ent" > > > > - > - - PUBLIC > -"-//W3C//ENTITIES MathML 2.0 Qualified Names 1.0//EN" > -"mathml/mathml2-qname-1.mod" > > - > - > > "-//W3C//DTD MathML 2.0//EN" > > > the file in question can be removed from the archive without breaking > the build (including the test suite). I would like to suggest to drop > the file in question from your distribution tarball in case my analysis > that it is not needed is correct. > > Kind regards > > Andreas.
> > On Wed, Apr 30, 2014 at 02:25:09PM +0200, Thorsten Alteholz wrote: >> Package: python-biopython >> Version: 1.63-2 >> Severity: serious >> User: alteholz at debian.org >> Usertags: ftp >> X-Debbugs-CC: ftpmaster at ftp-master.debian.org >> thanks >> >> Dear Maintainer, >> >> according to: >> http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231 >> the file >> biopython-1.63\Bio\Entrez\DTDs\mathml2-qname-1.mod >> may not be modified and as such this file is not distributable in main. >> >> Thorsten >> >> _______________________________________________ >> Debian-med-packaging mailing list >> Debian-med-packaging at lists.alioth.debian.org >> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-packaging >> > > -- > http://fam-tille.de From p.j.a.cock at googlemail.com Wed Apr 30 13:38:11 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 30 Apr 2014 14:38:11 +0100 Subject: [Biopython] Bug#746484: file not distributable In-Reply-To: <20140430133016.GE758@an3as.eu> References: <20140430131703.GD758@an3as.eu> <20140430133016.GE758@an3as.eu> Message-ID: On Wed, Apr 30, 2014 at 2:30 PM, Andreas Tille wrote: > Hi Peter, > > thanks for your super-fast response. > > On Wed, Apr 30, 2014 at 02:24:42PM +0100, Peter Cock wrote: >> Are you specifically asking about Biopython 1.63 here? > > Yes. Since I have added python3 binary packages Biopython 1.63 went > through manual inspection by ftpmaster and this issue was noticed. Very thorough of them - thanks! Also thank you for doing the Debian Python3 packaging of Biopython :) >> I think you >> can reasonably exclude this DTD file (and any others under the >> Bio/Entrez/DTDs folder). Biopython 1.63 will warn if they are missing >> but attempt to download them automatically. > > OK. > >> We're looking at dropping all the NCBI Entrez related DTD files, >> since the next Biopython release (v1.64) will automatically download >> AND cache them - see recent discussion, e.g.
>> >> http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011205.html > > That's fine. > >> We haven't actually removed the files on GitHub yet, but this >> might be an incentive to do so. > > OK, meanwhile (as long as 1.64 is not yet released) I will remove the > file from the Debian archive. > > Thanks for the clarification > > Andreas. Great, Peter P.S. I'm skimming over the Debian patches to see what we can fix: http://anonscm.debian.org/viewvc/debian-med/trunk/packages/python-biopython/trunk/debian/patches/ e.g. https://github.com/biopython/biopython/commit/2f098ac5311e0eec3d6737f4fff60e18c50b9481