From ziemys at ecr6.ohio-state.edu Fri Mar 3 15:09:02 2006
From: ziemys at ecr6.ohio-state.edu (ziemys@ecr6.ohio-state.edu)
Date: Fri Mar 3 16:00:22 2006
Subject: [BioPython] Bio.PDB.ResidueDepth
Message-ID:

Hi,

What sphere radius is used to calculate the surface in
Bio.PDB.ResidueDepth? 1.4? Can the radius be modified, and if so, how?

Do I need to install just 'msms.exe', or 'pdb_to_xyzr.exe' as well?

With best
Arturas

From biopython at maubp.freeserve.co.uk Sat Mar 4 05:08:45 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython))
Date: Sat Mar 4 05:31:20 2006
Subject: [BioPython] Homology Modeling question
In-Reply-To:
References:
Message-ID: <4409672D.4090602@maubp.freeserve.co.uk>

Omid Khalouei wrote:
> Hello,
>
> My question is not specifically related to Biopython, I wanted to know
> if homology modeling can be used reliably to see the effects of single
> amino acid substitutions. I mean is homology modeling useful for
> predicting the structure of those sequences for which there is no known
> structure or can it also be used for a more "fine tuning" analysis such
> as changing one amino acid on a PDB structure and then performing
> homology modeling using that same PDB structure as template?

I was under the impression that for homology modelling you provide an
alignment of your sequence with several other sequences with associated
known structures. Have you looked at the Sali Lab's program MODELLER:

http://salilab.org/modeller/

For the simple case of "fine tuning" analysis with a single amino acid
substitution, homology modelling might be overkill. A simple substitution
using the same backbone positions and initial direction for the side
chain, followed by a molecular dynamics energy minimization of the side
chain, may be enough.
This particular question has cropped up before on the MMTK mailing
lists - MMTK is a Python molecular dynamics library:

http://starship.python.net/crew/hinsen/MMTK/index.html
http://starship.python.net/pipermail/mmtk/

> Also is there any up-to-date forum for homology modeling? I looked it up
> on Google but postings were from back in the 1990s.

I don't know.

Peter

From mdehoon at c2b2.columbia.edu Sat Mar 4 15:33:36 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sat Mar 4 15:29:45 2006
Subject: [BioPython] qblast fails on parsing XML results
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE9A@cgcmail.cgc.cpmc.columbia.edu>

Fixed in CVS using urllib2. Thanks to Alexander Morgan for providing the
code. Please let us know if there is still a problem with qblast.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces@portal.open-bio.org on behalf of Ilya Soifer
Sent: Mon 2/27/2006 10:38 AM
To: biopython@biopython.org
Subject: [BioPython] qblast fails on parsing XML results

Hi,
I hope I am sending this to the correct list.
When I run qblast I get

>>> res1 = NCBIWWW.qblast("blastn", "nr", seq1)

Traceback (most recent call last):
  File "", line 1, in -toplevel-
    res1 = NCBIWWW.qblast("blastn", "nr", seq1)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 1130, in qblast
    i = results.index("Connection: close")
ValueError: substring not found

This happens because the results that Blast returns no longer have this
header

# HTTP/1.1 200 OK
# Date: Wed, 05 Oct 2005 02:13:33 GMT
# Server: Nde
# Content-Type: text/plain
# Connection: close
#

but this one

HTTP/1.0 200 OK
Date: Mon, 27 Feb 2006 11:54:40 GMT
Content-Type: application/xml
Server: Nde
Via: 1.1 proxy7 (NetCache NetApp/6.0.2)

I guess it might be better to look for something like "

References:
Message-ID:

That snippet is usable if you move it down a few lines, right after:

    consumer.reference_num(data[:data.find(' ')])

and before:

    consumer.reference_bases(data[data.find(' ')+1:])

-Kael

From ziemys at ecr6.ohio-state.edu Tue Mar 7 16:38:36 2006
From: ziemys at ecr6.ohio-state.edu (ziemys@ecr6.ohio-state.edu)
Date: Tue Mar 7 16:34:46 2006
Subject: [BioPython] HSE (half-sphere exposure)
Message-ID:

Hi

Can anybody give more details about how to use HSE in BioPython?

(BioPython is very nice, but at the same time it suffers from a lack of
documentation...)

With best
Arturas

From thamelry at binf.ku.dk Tue Mar 7 16:50:46 2006
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Tue Mar 7 17:02:46 2006
Subject: [BioPython] HSE (half-sphere exposure)
In-Reply-To:
References:
Message-ID: <33099.87.72.27.226.1141768246.squirrel@www.binf.ku.dk>

On Tue, March 7, 2006 10:38 pm, ziemys@ecr6.ohio-state.edu wrote:
> Hi
>
>
> Can anybody give more details about how to use HSE in BioPython ?
>
>
> (BioPython is very nice, but at the same it suffers from the lack of
> documentations...)
>
> With best
> Arturas
>

Hi Arturas,

Below is an example. Note that HSE-alpha is undefined for the first and
last residues of a polypeptide.
Best regards,

-Thomas

----

    from Bio.PDB import *
    import sys

    p=PDBParser()
    s=p.get_structure('X', sys.argv[1])
    model=s[0]

    RADIUS=12.0
    hse=HSExposureCA(model, radius=RADIUS)
    hse=HSExposureCB(model, radius=RADIUS)
    hse=ExposureCN(model, radius=RADIUS)

    for r in model.get_residues():
        if is_aa(r):
            print r
            try:
                # Contact number
                print r.xtra["EXP_CN"]
                # HSE alpha up
                print r.xtra["EXP_HSE_A_U"]
                # HSE alpha down
                print r.xtra["EXP_HSE_A_D"]
                # HSE beta up
                print r.xtra["EXP_HSE_B_U"]
                # HSE beta down
                print r.xtra["EXP_HSE_B_D"]
                print
            except:
                pass

From kael at sonic.net Tue Mar 7 16:55:53 2006
From: kael at sonic.net (Kael Fischer)
Date: Tue Mar 7 17:06:42 2006
Subject: [BioPython] GenBank RecordParser Failure (split REFERENCE line)
Message-ID:

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gb__init__v2.diff
Type: application/octet-stream
Size: 747 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20060307/e66f9bff/gb__init__v2.obj

From biopython at maubp.freeserve.co.uk Fri Mar 10 09:07:56 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Fri Mar 10 09:21:06 2006
Subject: [BioPython] Translating ambiguous stop codons
Message-ID: <4411883C.2020200@maubp.freeserve.co.uk>

I've been working on simple gene finding within sequence contigs from
unfinished genomes.

Very simply I have used biopython to scan each of the six frames looking
for a start codon, translating until the next stop codon - and
repeating. This is a pretty simple way of generating a list of possible
open reading frames for further analysis.

Unfortunately (as is probably the case for many unfinished genomes)
there are some ambiguous codons which could code for an amino acid or a
stop codon: e.g.
"NAG" could be E, K, Q or a stop codon "YAG" could be either Q or a stop codon (as Y = C or T) For example, If I have the ambiguous sequence "CAAGGCGTCGAAYAGCTTCAGGAACAGGAC" and try and translate it I get an exception, "TranslationError: YAG" from Bio.Seq import Seq from Bio import Translate my_translator = Translate.ambiguous_dna_by_id[11] my_dna = Seq('CAAGGCGTCGAAYAGCTTCAGGAACAGGAC', \ my_translator.table.nucleotide_alphabet) #print my_translator.translate_to_stop(my_dna) print my_translator.translate(my_dna) The possible translations are 'QGVEQLQEQD', and 'QGVE*LQEQD' Is this situation something many other BioPython users have had to deal with? I could write my own translate method for this particular application, but was wondering how best to support this within the basic BioPython setup. Suggestion One - Fairly Simple ============================== The translate_to_stop method could be enhanced with an option to control how it copes with ambiguous codons that could be either a stop or an amino acid: (i) Treat as a stop codon "*", and stop translating there (ii) Treat as amino acid, and continue translating (iii) Treat as ambiguous (see suggestion two) and continue translating As this is an unusual case, the additional code would only be triggered rarely so should not have much impact on the speed of the typical translation. This could also be done to the translate method giving: (i) Treat as stop codon, e.g. 'QGVE*LQEQD' (ii) Treat as amino acid, e.g. 'QGVEXLQEQD' or better 'QGVEQLQEQD' (iii) Treat as ambiguous (see suggestion two) In this case (codon = "YAG") if we assume it is an amino acid (and not a stop codon) it must be "Q". In other examples (e.g. "NAG") then the result would be E, K or Q and thus result in translation "X". Suggestion Two - Complex ======================== Biopython uses "*" for a stop codon, and "X" for any amino acid. There does not seem to be a symbol for either a stop codon or an amino acid, e.g. "?". 
As far as I can tell, there is no IUPAC standard for this... If this existed (maybe in a variant of the IUPACAmbiguousDNA alphabet) then we could expect to get back 'QGVE?LQEQD' from translate. Old, but fairly relevant, email from Andrew Dalke http://biopython.org/pipermail/biopython-dev/2000-August/000072.html Peter From mdehoon at c2b2.columbia.edu Sat Mar 18 15:40:51 2006 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sat, 18 Mar 2006 15:40:51 -0500 Subject: [BioPython] Test - please ignore Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEC4@cgcmail.cgc.cpmc.columbia.edu> Just testing if I can send to this mailing list. One of our users complained that his messages were getting bounced, although he is a member of this mailing list. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Tue Mar 21 07:30:16 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython)) Date: Tue, 21 Mar 2006 12:30:16 +0000 Subject: [BioPython] EMBOSS programs and their alignment formats Message-ID: <441FF1D8.2060904@maubp.freeserve.co.uk> I've been having a look at BioPython's Emboss support and it looks like a (partial) set of command line interfaces to the tools, with additional code for some of the primer tools and their formats. As far as I can tell, there is no support for any of the Emboss alignment output formats: http://emboss.sourceforge.net/docs/themes/AlignFormats.html Some (all?) of the alignment programs will happily produce gapped FASTA output, but this excludes other information like the alignment score etc. The alignments themselves could be analysed to extract the alignment length, identity, similarity and gap counts. However, the FASTA format does not include the algorithm specific score, nor other program parameters which might be of interest (like the matrix and gap penalties). e.g. 
########################################
# Program: demoalign
# Rundate: Thu Jan 17 09:30:08 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 9
# Extend_penalty: -1
#
# Length: 131
# Identity: 95/131 (72.5%)
# Similarity: 127/131 (96.9%)
# Gaps: 25/131 (19.1%)
#
#
#=======================================

(followed by the aligned sequences)

Has anyone tackled supporting these files in BioPython?

Thanks

Peter

From biopython at maubp.freeserve.co.uk Fri Mar 24 09:56:14 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Fri, 24 Mar 2006 14:56:14 +0000
Subject: [BioPython] Tweaking GenomeDiagram
Message-ID: <4424088E.9090004@maubp.freeserve.co.uk>

This email is mainly aimed at Leighton Pritchard (who I have spotted
posting on the list in the past) as it concerns his (bio)python add-on,
GenomeDiagram:

http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

First Query
-----------
I would like to attach labels to (selected) features.

For example, I am drawing a circular genome diagram with a selection of
colour coded genes - some of which I would like to have individually
labelled. This might be done in a similar way to the genome size tick
captions (i.e. horizontal text) or perhaps rotated text (radially aligned).

However, as far as I can tell from the documentation and the source
code, this is not built in.

Second Query
-------------
When drawing circular genomes following the examples, the major tick
marks seem to be at 1, 10001, 20001, ... (depending on the tick interval
size).

It would look much better to display 10000 rather than 10001 (or even,
leading to my third question, 10 Kb).

Third Query
-----------
I would like to have genome size "tick labels" in terms of kilo-bases or
mega-bases (i.e. 3 Kb rather than 3000, or 2 Mb rather than 2000000).
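
To make the second and third queries concrete, here is a small
standalone sketch in current Python. It is not GenomeDiagram code; the
function names tick_positions and format_tick_label are invented for
illustration only.

```python
def tick_positions(genome_length, interval):
    """Major tick positions at exact multiples of the interval
    (10000, 20000, ...) rather than 1, 10001, 20001, ..."""
    return list(range(interval, genome_length + 1, interval))


def format_tick_label(position):
    """Return a base position as a short caption, e.g. 3000 -> '3 Kb'."""
    for size, suffix in ((1_000_000, "Mb"), (1_000, "Kb")):
        if position >= size:
            # ':g' drops trailing zeros, so 3000 gives '3' and 1500 gives '1.5'
            return f"{position / size:g} {suffix}"
    return str(position)


for position in tick_positions(35000, 10000):
    print(position, format_tick_label(position))
```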
I have done this myself by "hacking the source code" but my implementation is rather special case. So, has anyone tried to tackle these issues before? Thanks Peter From jchang at smi.stanford.edu Fri Mar 24 09:06:11 2006 From: jchang at smi.stanford.edu (jchang at smi.stanford.edu) Date: Fri, 24 Mar 2006 09:06:11 -0500 Subject: [BioPython] Lecturer needed for "Advanced Python" In-Reply-To: <4423E03E.20604@nbn.ac.za> References: <4423E03E.20604@nbn.ac.za> Message-ID: <20060324140609.GA266@sophie.local> Hello, Ruediger Braeuning has asked me to forward this to the list. Jeff On Fri, Mar 24, 2006 at 02:04:14PM +0200, Ruediger Braeuning wrote: > Hi, > > I'm writing to you from the South African National Bioinformatics > Network (NBN). We need a lecturer for our course in "Advanced Python" as > Andrew Dalke (the lecturer of last year's "Advanced Python") is not > available this year. > > So if you are qualified and want to spend some time in Cape Town drop me > a line. > > Time > ---- > We allocated 36 days for this module (Thursday, August 10th till Friday, > September 29th, 2006). I know that this is a long time but as you can > see from the daily schedule it's just 3 hours per day. For the rest of > the day you are free and we can provide you with an office. > > Expenses > --------- > We arrange and cover your flight, local transport, accommodation, meals. > There is also a small honorarium of ZAR 300 per day of teaching. > > Please find more details below: > > The National Bioinformatics Network (NBN) > ------------------------------------------ > The NBN was established to stimulate and support growth and development > of Bioinformatics as a scientific and applied discipline in South Africa > at an internationally competitive level. > > The Course > ----------- > We run national courses on an annual basis. 
Details of the courses > (content, lecture material) that were run in 2004 & 2005 can be found > under "Bioinformatics Workshop Modules" at > http://www.nbn.ac.za/Education/course.html > > The course content is aimed at covering as much of our Bioinformatics > core curriculum as possible. > > Your module > ------------ > We have the following suggestions for your module: > > - Data structures (object oriented design) > - Libraries, BioPython > - How to write larger pieces of code > - Interface Design > - Usability Testing Methodologies > > Note: These are just suggestions. You are the expert and we would > welcome your ideas. The students already got 35 days of introduction to > Python (Feb - Apr 2006). I'd be more than happy to hook you up with the > lecturer of that module. > > IMPORTANT: Please let us know of any prerequisites you require your > students to have. We can then make certain course modules a prerequisite > for your module. > - module on "Introductory Python" > > Also let us know of any required reading your students have to for > preparation. > > As your module is part of the bigger course on Bioinformatics we would > like to encourage you to show the relevance of your module for the whole > discipline and use Bioinformatics examples. The courses should start > easy, get everybody on board and then go into detail. Lectures should be > complemented by hands on sessions, which we believe is absolutely > crucial to the success of teaching and training. Problem based teaching > approaches worked best for our students. The NBN strongly emphasizes > open source solutions. It would be great if you could support this by > choosing your software accordingly. 
>
> Daily Schedule
> --------------
> 08:00-10:45 Python
> 10:45-11:00 Break
> 11:00-13:00 Lecture/Practical for another module
> 13:00-14:00 Lunch
> 14:00-15:30 Lecture/Practical for another module
> 15:30-15:45 Break
> 15:45-17:00 Lecture/Practical for another module
>
> Students will be taught Python every day of the course. We encourage all
> our lecturers to exchange ideas for little tasks that are relevant for
> their module and can be tackled in Python with the Python lecturer.
>
> Background of students
> -----------------------
> Course participants will come from a range of different backgrounds
> (Biology, Computer Science) and comprise people who study Bioinformatics
> (attendance is mandatory for students with a NBN bursary) and people who
> are "just" interested in particular aspects of Bioinformatics. Students
> will also vary in terms of seniority from undergraduates to postdocs
> with the majority being postgraduates. You should therefore expect a
> higher degree of heterogeneity of knowledge amongst the course
> participants than you would normally expect. This means you should be
> prepared to have some flexibility in adjusting your course schedule to
> the participants.
>
> Number of students
> -------------------
> We expect a maximum of 25 students.
>
> Assistants
> ----------
> Please let me know if you need some student assistants.
>
> Facilities
> -----------
> Every participant will have her/his own PC to work on. A video projector
> and stable internet access is in place.
>
> Evaluation and Assignments
> ---------------------------
> To give you and us feedback on the success of the course we will hand
> out evaluation questionnaires after each module. Students also have to
> take one assignment per course (our pass mark is 65%). In that respect
> we would like to ask you to provide and mark your assignment.
>
> Should you require further information please don't hesitate to contact
> me at ruediger at nbn.ac.za or + 27 21 959 2991.
> I'm also more than happy
> to give you a call.
>
> I'm looking forward to working with you.
>
> Should you not be available I would be grateful if you could recommend
> another lecturer to me.
>
> Ruediger
> --
> Ruediger Braeuning    /     National Bioinformatics Network
>                      (=)    University of the Western Cape
> Ph. +27 21 959 2991   /     Private Bag X17
> Fax +27 21 959 3573  (=)    Bellville, 7535
> www.nbn.ac.za         /     South Africa

From Teemu.Kuulasmaa at uku.fi Sat Mar 25 11:56:48 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Sat, 25 Mar 2006 18:56:48 +0200
Subject: [BioPython] biopython and dbSNP
Message-ID: <44257650.6090602@uku.fi>

Hi,

I am an absolute beginner in python and biopython. I am trying to
familiarize myself with biopython. I have a java background but I think
that python (and biopython) would be a better tool to automate my daily
routines. Python seems to be a superior language for quick (and sometimes
dirty) scripting compared to java.

I would like to write some python scripts that help me to work with SNPs
and DNA/RNA sequences. I work with SNPs on a daily basis. However, I was
disappointed because I didn't find any mention of dbSNP in the biopython
documentation.

To begin with, I would like to be able to retrieve some SNP records
from NCBI's dbSNP and parse them. Are there any ready-made classes for
that purpose? The GenBank.search_for('id', 'database=xxx',...) function
doesn't seem to support the 'database=snp' parameter.

To put it simply: Am I able to work with dbSNP by using biopython?

Best regards,
Teemu Kuulasmaa

From dag at sonsorol.org Sat Mar 25 18:50:57 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Sat, 25 Mar 2006 18:50:57 -0500
Subject: [BioPython] Important news for developers on open-bio machines
Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org>

Hi, apologies for the massive cross-post. I'll keep it short!
This message is a last-ditch attempt to contact people with developer accounts on pub.open-bio.org who may have not received the individual mails we've been sending via the obf-developers at lists.open-bio.org mailing list. We suspect that there are a number of devs out there for whom we don't have up to date email addresses. All open-bio services have been migrated to new hardware and a new datacenter. Part of this migration process involved moving all developer accounts and all source-code repositories to a new server. The developer migration was completed a few minutes ago. An unavoidable side effect of the move is that all developers are now locked out of their accounts until they contact us for a password reset. If you are a developer and this news comes as a surprise to you, it means we don't have your contact info. Your best way to get up to speed on the history and technical details behind the migration is to point your browser here: http://lists.open-bio.org/mailman/private/obf-developers/2006-March/ thread.html ... and read the various messages we've posted this month. Included in the first message is the information on how to request an account reset. Regards, Chris Dagdigian open-bio.org From biopython at maubp.freeserve.co.uk Mon Mar 27 09:48:52 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython List)) Date: Mon, 27 Mar 2006 15:48:52 +0100 Subject: [BioPython] Tweaking GenomeDiagram In-Reply-To: <4424088E.9090004@maubp.freeserve.co.uk> References: <4424088E.9090004@maubp.freeserve.co.uk> Message-ID: <4427FB54.1040408@maubp.freeserve.co.uk> Thanks Leighton', I've included most of your reply for the benefit of the BioPython mailing list and its archive... >>> First Query >>> ----------- >>> I would like to attach labels to (selected) features. >>> >>> For example, I am drawing a circular genome diagram with a selection of >>> colour coded genes - some of which I would like to have individually >>> labelled. 
>>> This might be done in a similar way to the genome size tick
>>> captions (i.e. horizontal text) or perhaps rotated text (radially aligned).
>>>
>>> However, as far as I can tell from the documentation and the source
>>> code, this is not built in.

I clearly didn't read the right bit of the source code.

Leighton wrote:
> Each individual GDFeature has a label attribute, taking a Boolean, that
> allows you to set whether its label is displayed or not. You could set
> this on feature creation, ...

e.g.

    for feature in genbank_entry.features:
        if feature.type == 'CDS':
            gdfs.add_feature(feature, label=False,
                             colour=colors.lightgreen)
        elif feature.type == 'tRNA' :
            gdfs.add_feature(feature, label=True,
                             colour=colors.red)

(This can easily be used with the examples in the documentation)

Leighton wrote:
> ... or at some later stage with a filter. If you're working with
> N.equitans, and your GDFeatureSet is called `gdfs1', for example, this
> code:
>
> gdfs1.set_all_features('label', 0)
> for feature in gdfs1.features.values():
>     print feature.name, feature.label
>     if feature.name.startswith('NEQ016'):
>         feature.label = 1

Maybe that should be gdfs._features rather than gdfs.features?

> will label only features whose names begin with NEQ016. You'll probably
> already see how flexible this can be if you add your own attributes to
> GDFeature objects when they're created (BLAST scores, expression
> levels, membership of functional classes, etc.). You can set the
> label_font, label_size, label_colour and label_angle attributes in the
> same kind of way.

By default, when using the add_feature method with a SeqFeature
extracted from a GenBank file, GenomeDiagram will look at the 'gene',
'label', 'locus_tag' and 'product' qualifiers for potential labels (in
that order). (See code in GDFeature.py class GDFeature.)

It might be useful to be able to supply the preferred label caption as
part of the add_feature command.
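
A rough standalone sketch of that lookup order with an optional
caller-supplied caption, in current Python. This is not the code in
GDFeature.py; choose_label and its preferred argument are hypothetical
names, and the qualifiers are assumed to be a plain dict mapping each
key to a list of strings, as on a SeqFeature parsed from GenBank.

```python
# Qualifier keys in the order described above.
LABEL_QUALIFIERS = ("gene", "label", "locus_tag", "product")


def choose_label(qualifiers, preferred=None, default=""):
    """Pick a caption for a feature.

    preferred (hypothetical): a caption supplied by the caller, which
    wins over anything found in the qualifiers.
    """
    if preferred is not None:
        return preferred
    for key in LABEL_QUALIFIERS:
        values = qualifiers.get(key)
        if values:
            return values[0]  # first value of the first qualifier present
    return default


qualifiers = {"locus_tag": ["NEQ016"], "product": ["hypothetical protein"]}
print(choose_label(qualifiers))                     # falls back to locus_tag
print(choose_label(qualifiers, preferred="my gene"))
```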
Thanks Peter From lpritc at scri.sari.ac.uk Mon Mar 27 04:17:01 2006 From: lpritc at scri.sari.ac.uk (Leighton Pritchard) Date: Mon, 27 Mar 2006 10:17:01 +0100 Subject: [BioPython] Tweaking GenomeDiagram Message-ID: <1143451021.18558.228.camel@lplinuxdev> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://lists.open-bio.org/pipermail/biopython/attachments/20060327/9aaca755/attachment.ksh -------------- next part -------------- An embedded message was scrubbed... From: Leighton Pritchard Subject: Re: Tweaking GenomeDiagram Date: Mon, 27 Mar 2006 10:17:01 +0100 Size: 5030 Url: http://lists.open-bio.org/pipermail/biopython/attachments/20060327/9aaca755/attachment.mht From lpritc at scri.sari.ac.uk Mon Mar 27 10:42:10 2006 From: lpritc at scri.sari.ac.uk (Leighton Pritchard) Date: Mon, 27 Mar 2006 16:42:10 +0100 Subject: [BioPython] Tweaking GenomeDiagram In-Reply-To: <4427FB54.1040408@maubp.freeserve.co.uk> References: <4424088E.9090004@maubp.freeserve.co.uk> <4427FB54.1040408@maubp.freeserve.co.uk> Message-ID: <1143474132.18558.260.camel@lplinuxdev> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://lists.open-bio.org/pipermail/biopython/attachments/20060327/ad206104/attachment.ksh -------------- next part -------------- An embedded message was scrubbed... 
From: Leighton Pritchard Subject: Re: [BioPython] Tweaking GenomeDiagram Date: Mon, 27 Mar 2006 16:42:10 +0100 Size: 5015 Url: http://lists.open-bio.org/pipermail/biopython/attachments/20060327/ad206104/attachment.mht From biopython at maubp.freeserve.co.uk Mon Mar 27 13:03:14 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython List)) Date: Mon, 27 Mar 2006 19:03:14 +0100 Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram 0.2 In-Reply-To: <4424088E.9090004@maubp.freeserve.co.uk> References: <4424088E.9090004@maubp.freeserve.co.uk> Message-ID: <442828E2.3040703@maubp.freeserve.co.uk> Peter wrote: > > This email is mainly aimed at Leighton Pritchard (who I have spotted > posting on the list in the past) as it concerns his (bio)python > add-on, GenomeDiagram: > > http://bioinf.scri.ac.uk/lp/programs.html#genomediagram > > First Query > ----------- > I would like to attach labels to (selected) features. See earlier email with example - I hadn't looked closely enough. http://www.biopython.org/pipermail/biopython/2006-March/002967.html > Second Query > ------------- > When drawing circular genomes following the examples, the major tick > marks seem to be at 1, 10001, 20001, ... (depending on the tick > interval size). > > It would look much better to display 10000 rather than 10001 (or even, > leading to my third question, 10 Kb). Fixed in GenomeDiagram release 0.2 > Third Query > ----------- > I would like to have genome size "tick labels" in terms of kilo-bases > or mega-bases (i.e. 3 Kb rather than 3000, or 2 Mb rather than > 2000000). This is a new option in GenomeDiagram release 0.2, see below. Leighton has also improved the positioning of the size captions on the lower half of circular diagrams, and probably other things as well. The following email was sent to me, and I am forwarding it to the mailing list because Leighton PGP signature was confusing the server. 
Thanks again,

Peter

--------------------------------------------------------------------

Hi Peter,

I think I've implemented everything you asked for, and the new source
and Windows installer are located at:

http://bioinf.scri.sari.ac.uk/lp/programs.php
and
http://bioinf.scri.sari.ac.uk/lp/programs.html#genomediagram

(take your pick). To use the new features, you need to do the following
sort of thing:

    parser = GenBank.FeatureParser()
    fhandle = open('/data/genomes/Bacteria/Nanoarchaeum_equitans/NC_005213.gbk', 'r')
    genbank_entry = parser.parse(fhandle)
    fhandle.close()

    gdd = GDDiagram('Test Diagram')
    gdfs1 = GDFeatureSet(name='CDS features')
    for feature in genbank_entry.features:
        if feature.type == 'CDS':
            # This is how you can override any attribute of the
            # GDFeature as you add it to the GDFeatureSet, just by
            # passing the appropriate keyword and argument
            gdfs1.add_feature(feature, name="Some feature or other")

    # By passing the scale_format = "SInt" argument, you can use SI-like
    # suffixes for scale markers. So far we only have Kbp and Mbp
    # suffixes, and the default goes to just a string of the marker
    # base position.
    gdt1 = GDTrack('CDS features', greytrack=1,
                   scale_largetick_interval=1e4,
                   scale_smalltick_interval=1e3,
                   scale_format = "SInt")
    gdt1.add_set(gdfs1)

    # You can now do regular expression comparisons, startswith
    # comparisons, exclusions or just plain matches to any GDFeature
    # attribute, just by passing the appropriate attribute, value and
    # comparator mode
    mod_features = gdfs1.get_features('name', 'NEQ0[2-4]', 'like')
    #mod_features = gdfs1.get_features('name', 'NEQ02', 'startswith')
    #mod_features = gdfs1.get_features('name', 'NEQ05', 'not')
    #mod_features = gdfs1.get_features('name', 'NEQ016')
    for feature in mod_features:
        feature.label = 1

And, finally, the marker labels in the lower halves of GenomeDiagram
images have been lowered so that they hit the marker line at the top of
the string, rather than the bottom.

Phew!

L.
--
Dr Leighton Pritchard AMRSC
D131, Plant-Pathogen Interactions, Scottish Crop Research Institute
Invergowrie, Dundee, Scotland, DD2 5DA, UK
T: +44 (0)1382 562731 x2405 F: +44 (0)1382 568578
E: lpritc at scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/lp

From Teemu.Kuulasmaa at uku.fi Tue Mar 28 01:59:27 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Tue, 28 Mar 2006 09:59:27 +0300
Subject: [BioPython] biopython and dbSNP (2)
Message-ID: <4428DECF.4070406@uku.fi>

Hi,

I did some experiments and got GenBank.search_for() and
GenBank.download_many() to work with dbSNP. However, I didn't succeed in
getting GenBank.NCBIDictionary() to work. I do not know if this is the
right way to do it. It would be nice if someone (biopython-dev) could
speak out on the matter.

Here are two very small diffs (against biopython version 1.41) that were
required to get dbSNP sequence retrieval to work:

----------------------------------------------------------
ubuntu at ubuntu:~/src/biopython$ diff /usr/lib/python2.4/site-packages/Bio/EUtils/Config.py Config.py
58c58
< databases.SNP = _add_db(DatabaseInfo("snp", 1))

ubuntu at ubuntu:~/src/biopython$ diff /usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py __init__.py
1422,1423d1421
< elif database in ['snp']:
< format = 'fasta'

This is an example of how it works after these modifications:

----------------------------------------------------------
ubuntu at ubuntu:~$ python
Python 2.4.2 (#2, Mar 5 2006, 00:03:25)
[GCC 4.0.3 20060212 (prerelease) (Ubuntu 4.0.2-9ubuntu1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import GenBank
>>> snps = GenBank.search_for('rs8192602', 'snp')
>>> snps
['8192602']
>>> seqs = GenBank.download_many(snps, 'snp')
>>> print seqs.read()
1: rs8192602 [Homo sapiens]
>gnl|dbSNP|rs8192602 rs=8192602|pos=272|len=397|taxid=9606|mol="genomic"|class=1|alleles="A/G"|build=117
TGGCAGAGTG GGGAGTAGGA GGGTAGTGCC AGTGAGTAAA CCAGACTCCA TACCTTAAGC
TCAACTCCTA TCCCTTTGTC GCCTCCCAAC CCCAGTCATG GCTGAGTACG GGACCCTCCT
GCAAGACCTG ACCAACAACA TCACCCTTGA AGATCTAGAA CAGCTCAAGT CGGCCTGCAA
GGAAGACATC CCCAGCGAAA AGAGTGAGGA GATCACTACT GGCAGTGCCT GGTTTAGCTT
CCTGGAGAGC CACAACAAGC TGGACAAAGG T
R
GGGGAGGGGA GCACAGGGGT CCTGTCATCA GTCATTCAGG CTCAGTTCAT TCAGCAAATA
GAGATGAGCT CAAAGCTTTT ACATCCACAA TGTGTACCCC TCTATAGCAA GGCAGAAGAG
AGGTG

Best regards,
Teemu Kuulasmaa

From biopython at maubp.freeserve.co.uk Tue Mar 28 12:11:12 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 28 Mar 2006 18:11:12 +0100
Subject: [BioPython] biopython and dbSNP (2)
In-Reply-To: <4428DECF.4070406@uku.fi>
References: <4428DECF.4070406@uku.fi>
Message-ID: <44296E30.1070809@maubp.freeserve.co.uk>

Teemu Kuulasmaa wrote:
> Hi,
>
> I made some experimentations and got GenBank.search_for() and
> GenBank.download_many() to work with dbSNP. However, I didn't succeed to
> get GenBank.NCBIDictionary() to work. I do not know if this is right way
> to do it. It would by nice if someone (biopython-dev) could speak out on
> the matter.
>
> Here are two very small diffs (against biopython version 1.41) that were
> required to get dbSNP sequence retrieval to work:

I'm not familiar with this aspect of the GenBank support, but your code
looks OK to me.

I tried your two changes on the CVS version of EUtils and GenBank and it
works for me (the GenBank file has had significant changes to the file
parser).

One question is are the GenBank.search_for() and GenBank.download_many()
functions intended just for "GenBank" (officially just the nucleotides?)
or other sequence based EUtils databases like proteins, snp, ..., or even genomes. Unless anyone else cares to comment, I'll commit Teemu's two small changes in the next few days. As to getting GenBank.NCBIDictionary() to work with the snp database, its not as easy as it looks. Peter From biopython at maubp.freeserve.co.uk Tue Mar 28 13:06:08 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython List)) Date: Tue, 28 Mar 2006 19:06:08 +0100 Subject: [BioPython] biopython and dbSNP (2) In-Reply-To: <44296E30.1070809@maubp.freeserve.co.uk> References: <4428DECF.4070406@uku.fi> <44296E30.1070809@maubp.freeserve.co.uk> Message-ID: <44297B10.9070000@maubp.freeserve.co.uk> Peter (BioPython List) wrote: > Teemu Kuulasmaa wrote: > >>Hi, >> >>I made some experimentations and got GenBank.search_for() and >>GenBank.download_many() to work with dbSNP. However, I didn't succeed to >>get GenBank.NCBIDictionary() to work. I do not know if this is right way >>to do it. It would by nice if someone (biopython-dev) could speak out on >>the matter. >> >>Here are two very small diffs (against biopython version 1.41) that were >>required to get dbSNP sequence retrieval to work: > > > I'm not familiar with this aspect of the GenBank support, but your code > looks OK to me. > > I tried your two changes on the CVS version of EUtils and GenBank and it > works for me (the GenBank file has had significant changes to the file > parser). > > One question is are the GenBank.search_for() and GenBank.download_many() > functions intended just for "GenBank" (officially just the nucleotides?) > or other sequence based EUtils databases like proteins, snp, ..., or > even genomes. > > Unless anyone else cares to comment, I'll commit Teemu's two small > changes in the next few days. > > As to getting GenBank.NCBIDictionary() to work with the snp database, > its not as easy as it looks. 
Trying this with SNP's (having applied Teemu Kuulasmaa's changes) we get
back "mangled FASTA entries" with additional headers and blank lines.
Ignoring the spaces in the sequences (which appear mostly in ten
nucleotide blocks with a space in between) we get:

>>> seqs = GenBank.download_many(['8192602','8192603'], 'snp')
>>> print seqs.read()
1: rs8192602 [Homo sapiens]
>gnl|dbSNP|rs8192602 ...
TGGCAGAGTG...

2: rs8192603 [Homo sapiens]
>gnl|dbSNP|rs8192603 ...
TGGTGGGCAG...

The blank lines shouldn't be a problem for the BioPython's FASTA parser.

However, due to the extra lines look like "{Result Number}: {Identifier}
[{Species}]" this is NOT a valid FASTA format file.

This may be an NCBI EUtils problem... following their FAQ, I tested this
URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602&report=FASTA

and this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602,8192603&report=FASTA

And it does the same sort of thing :(

I have emailed the NCBI...

Peter

From Teemu.Kuulasmaa at uku.fi Wed Mar 29 04:07:00 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Wed, 29 Mar 2006 12:07:00 +0300
Subject: [BioPython] biopython and dbSNP (2)
References: 44296E30.1070809@maubp.freeserve.co.uk
Message-ID: <442A4E34.8050206@uku.fi>

> Peter (BioPython List) wrote:
>
> The blank lines shouldn't be a problem for the BioPython's FASTA parser.
>
> However, due to the extra lines look like "{Result Number}: {Identifier}
> [{Species}]" this is NOT a valid FASTA format file.
>
> This may be an NCBI EUtils problem... following their FAQ, I tested this
> URL:
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602&report=FASTA
>
> and this:
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602,8192603&report=FASTA
>
> And it does the same sort of thing :(
>
> I have emailed the NCBI...
>
> Peter

Thank you for your response Peter!
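
(A minimal sketch, in present-day Python 3 rather than the Python 2.4 used
in this thread, of how the extra "{Result Number}: {Identifier} [{Species}]"
lines described above might be filtered out before handing the text to a
FASTA parser. The function name clean_snp_fasta and the regular expression
are assumptions made for illustration; they are not part of Biopython or of
the NCBI EUtils, and the pattern is based only on the two example records
shown in this thread.)

```python
import re

# Assumed shape of the extra lines, e.g. "1: rs8192602 [Homo sapiens]".
EXTRA_HEADER = re.compile(r"^\d+:\s+\S+\s+\[.*\]\s*$")


def clean_snp_fasta(text):
    """Drop the numbered header lines and blank lines, and remove the
    spaces that split the sequence into ten-nucleotide blocks."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        if not line or EXTRA_HEADER.match(line):
            continue  # skip blank lines and the non-FASTA header lines
        if line.startswith(">"):
            cleaned.append(line)  # real FASTA title line, kept as-is
        else:
            cleaned.append(line.replace(" ", ""))  # sequence line
    return "\n".join(cleaned) + "\n"


raw = (
    "1: rs8192602 [Homo sapiens]\n"
    ">gnl|dbSNP|rs8192602 rs=8192602\n"
    "TGGCAGAGTG GGGAGTAGGA\n"
    "\n"
    "2: rs8192603 [Homo sapiens]\n"
    ">gnl|dbSNP|rs8192603\n"
    "TGGTGGGCAG\n"
)
print(clean_snp_fasta(raw))
```

Filtering on the line pattern rather than on line position keeps the sketch
independent of where the blank lines happen to fall in the download.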
Like you said the NCBI EUtils result is not valid Fasta formated file. I hope that NCBI will fix this issue soon. Let us know if you get any kind of feedback from NCBI! Teemu -- Teemu Kuulasmaa, M.Sc. University of Kuopio Laboratory of Internal Medicine P.O.Box 1627 70211 Kuopio FINLAND Tel +358 1716 3498 Fax +358 1716 2445 From biopython at maubp.freeserve.co.uk Wed Mar 29 08:18:30 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython List)) Date: Wed, 29 Mar 2006 14:18:30 +0100 Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram 0.21 In-Reply-To: <442828E2.3040703@maubp.freeserve.co.uk> References: <4424088E.9090004@maubp.freeserve.co.uk> <442828E2.3040703@maubp.freeserve.co.uk> Message-ID: <442A8926.8040101@maubp.freeserve.co.uk> There was a packing problem with GenomeDiagram 0.2 (missing new module Observer), which Leighton has fixed with the release of GenomeDiagram 0.21, available here: http://bioinf.scri.ac.uk/lp/programs.html#genomediagram This also adds a dpi option to GDDiagram.write() for raster output - which is handy for generating high resolution PNG files. Peter From lpritc at scri.sari.ac.uk Wed Mar 29 09:05:21 2006 From: lpritc at scri.sari.ac.uk (Leighton Pritchard) Date: Wed, 29 Mar 2006 15:05:21 +0100 Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram 0.21 In-Reply-To: <442A8926.8040101@maubp.freeserve.co.uk> References: <4424088E.9090004@maubp.freeserve.co.uk> <442828E2.3040703@maubp.freeserve.co.uk> <442A8926.8040101@maubp.freeserve.co.uk> Message-ID: <1143641121.4788.9.camel@lplinuxdev> On Wed, 2006-03-29 at 14:18 +0100, Peter (BioPython List) wrote: > There was a packing problem with GenomeDiagram 0.2 (missing new module > Observer), The problem was me - the packaging did exactly what I told it to, more's the pity ;) > which Leighton has fixed with the release of GenomeDiagram > 0.21, available here: > > http://bioinf.scri.ac.uk/lp/programs.html#genomediagram I'll leave the advertising in... 
-- Dr Leighton Pritchard AMRSC D131, Plant-Pathogen Interactions, Scottish Crop Research Institute Invergowrie, Dundee, Scotland, DD2 5DA, UK T: +44 (0)1382 562731 x2405 F: +44 (0)1382 568578 E: lpritc at scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/lp GPG/PGP: FEFC205C E58BA41B http://www.keyserver.net (If the signature does not verify, please remove the SCRI disclaimer) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.sari.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). From halima at mancala.cbio.uct.ac.za Thu Mar 30 02:17:10 2006 From: halima at mancala.cbio.uct.ac.za (Halima Rabiu) Date: Thu, 30 Mar 2006 09:17:10 +0200 (SAST) Subject: [BioPython] Need help on NCBIStandaloneblast Message-ID: Hi everyboby ; I am new to biopython having problems with the "NCBIStandalone.blastall". After launching the Blast with "doBlast" it look like runs and end and then I check the output it empty and I try same thing using comand line it work and get result. I attch my code. 
I also try to go though the previous posts on biopython mailing list fund
similar problem post by Andreas but no solution to the problem .

Please can somebody help

Thanks
Nike
-------------- next part --------------
#! /usr/local/bin/python2.4
#halimah
#20-03-2006

from Bio.Blast import NCBIStandalone
import os

# path to my database
data=os.path.join(os.getcwd(),"Newprotein.db","Nprotein.Fdb")
# input file (protein sequence in fasta )
infile=os.path.join(os.getcwd(),"Newprotein.db","mytest.txt",'r')
# path to Blastall executable
blast_exe=os.path.join("/","usr","local","blast","bin","blastall")
output,error_info =NCBIStandalone.blastall(blast_exe,"blastp", data, infile)
#print output.readline()
save_file =open("blast.out","w")
blast_result=output.read()
save_file.write(blast_result)
save_file.close()

blastfile = open('blast.out', 'r')
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(blastfile, b_parser)
while 1:
    b_record = b_iterator.next()
    if b_record is None:
        break
    #This will parse the BLAST report into a Blast Record class (either a
    #Blast or a PSIBlast record, depending on what you are parsing) so that
    #you can extract the information from it. In our case, let's just use
    #print out a quick summary of all of the alignments greater than some
    #threshold value.
    E_VALUE_THRESH = 1.00
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print '****Alignment****'
                print 'sequence:', alignment.title
                print 'length:', alignment.length
                print 'e value:', hsp.expect
                print hsp.query[0:75] + '...'
                print hsp.match[0:75] + '...'
                print hsp.sbjct[0:75] + '...'
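
(One detail in the script above may be worth a look: the infile line passes
'r' as a fourth argument to os.path.join, which treats every argument as a
path component rather than as a file mode. The sketch below, in present-day
Python 3, shows the effect using a hypothetical base directory. It is only
an observation about how os.path.join behaves, not a confirmed diagnosis of
the empty BLAST output.)

```python
import posixpath

# posixpath is used so the result is the same on every platform;
# on Linux, os.path.join gives the same answer.
base = "/home/user"  # hypothetical stand-in for os.getcwd()

# As written in the script: 'r' becomes the last path component.
with_mode = posixpath.join(base, "Newprotein.db", "mytest.txt", "r")

# Without the stray argument the path ends at the FASTA file.
without_mode = posixpath.join(base, "Newprotein.db", "mytest.txt")

print(with_mode)
print(without_mode)
```

If the first form is what reaches blastall, the input path would point at a
file named "r" below mytest.txt, which is unlikely to exist.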

From biopython at maubp.freeserve.co.uk Thu Mar 30 10:56:29 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Thu, 30 Mar 2006 16:56:29 +0100
Subject: [BioPython] Need help on NCBIStandaloneblast
In-Reply-To:
References:
Message-ID: <442BFFAD.10103@maubp.freeserve.co.uk>

Halima Rabiu wrote:
> Hi everyboby ;
> I am new to biopython having problems with the "NCBIStandalone.blastall".
> After launching the Blast with "doBlast" it look like runs and end
> and then I check the output it empty and I try same thing using comand
> line it work and get result.
> I attch my code.

Have you checked the paths are correct, e.g.

assert os.path.isfile(data), "Missing database file " + data
assert os.path.isfile(infile), "Missing input file " + infile

You don't need to check blast_exe yourself, as the blastall command does
this for you.

If I understood you correctly, the "blast.out" file is empty. Did blast
return any error message? Try:

print error_info.read()

or:

save_file =open("blast.error","w")
blast_result=error_info.read()
save_file.write(blast_result)
save_file.close()

Next question, could you tell us what you typed at the command line
which does work?

> I also try to go though the previous posts on biopython mailing list fund
> similar problem post by Andreas but no solution to the problem .

It was worth checking anyway :)

Peter

From mdehoon at c2b2.columbia.edu Fri Mar 31 12:22:13 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 31 Mar 2006 12:22:13 -0500
Subject: [BioPython] BOSC announcement
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEDC@cgcmail.cgc.cpmc.columbia.edu>

MEETING ANNOUNCEMENT & CALL FOR SPEAKERS

The 7th annual Bioinformatics Open Source Conference (BOSC 2006) is
organized by the not-for-profit Open Bioinformatics Foundation.
The meeting will take place Aug 4,5th in Fortaleza, Brasil, and is one of several Special Interest Group (SIG) meetings occurring in conjunction with the 14th International Conference on Intelligent Systems for Molecular Biology. Please consult The Official BOSC 2006 Website at http://www.open-bio.org/wiki/BOSC_2006 for details and information. In addition, a BOSC weblog has been setup to make it easier to desiminate all BOSC related announcements: http://wiki.open-bio.org/boscblog/ And if you have an ICAL compatible Calendar, there is an EventDB calendar set up with all BOSC related deadlines. http://eventful.com/groups/G0-001-000014747-0 More information about ISMB can be found at the Official ISMB 2006 Website: http://ismb2006.cbi.cnptia.embrapa.br/ Thank You, and we look forward to seeing you all, The BOSC Organizing Committee. From mdehoon at c2b2.columbia.edu Sat Mar 18 20:40:51 2006 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sat, 18 Mar 2006 15:40:51 -0500 Subject: [BioPython] Test - please ignore Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEC4@cgcmail.cgc.cpmc.columbia.edu> Just testing if I can send to this mailing list. One of our users complained that his messages were getting bounced, although he is a member of this mailing list. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Tue Mar 21 12:30:16 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython)) Date: Tue, 21 Mar 2006 12:30:16 +0000 Subject: [BioPython] EMBOSS programs and their alignment formats Message-ID: <441FF1D8.2060904@maubp.freeserve.co.uk> I've been having a look at BioPython's Emboss support and it looks like a (partial) set of command line interfaces to the tools, with additional code for some of the primer tools and their formats. 

As far as I can tell, there is no support for any of the Emboss
alignment output formats:

http://emboss.sourceforge.net/docs/themes/AlignFormats.html

Some (all?) of the alignment programs will happily produce gapped FASTA
output, but this excludes other information like the alignment score etc.

The alignments themselves could be analysed to extract the alignment
length, identity, similarity and gap counts. However, the FASTA format
does not include the algorithm specific score, nor other program
parameters which might be of interest (like the matrix and gap
penalties).

e.g.

########################################
# Program: demoalign
# Rundate: Thu Jan 17 09:30:08 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 9
# Extend_penalty: -1
#
# Length: 131
# Identity: 95/131 (72.5%)
# Similarity: 127/131 (96.9%)
# Gaps: 25/131 (19.1%)
#
#
#=======================================

(followed by the aligned sequences)

Has anyone tackled supporting these files in BioPython?

Thanks

Peter

From biopython at maubp.freeserve.co.uk Fri Mar 24 14:56:14 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Fri, 24 Mar 2006 14:56:14 +0000
Subject: [BioPython] Tweaking GenomeDiagram
Message-ID: <4424088E.9090004@maubp.freeserve.co.uk>

This email is mainly aimed at Leighton Pritchard (who I have spotted
posting on the list in the past) as it concerns his (bio)python add-on,
GenomeDiagram:

http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

First Query
-----------
I would like to attach labels to (selected) features.

For example, I am drawing a circular genome diagram with a selection of
colour coded genes - some of which I would like to have individually
labelled. This might be done in a similar way to the genome size tick
captions (i.e.
horizontal text) or perhaps rotated text (radially aligned). However, as far as I can tell from the documentation and the source code, this is not built in. Second Query ------------- When drawing circular genomes following the examples, the major tick marks seem to be at 1, 10001, 20001, ... (depending on the tick interval size). It would look much better to display 10000 rather than 10001 (or even, leading to my third question, 10 Kb). Third Query ----------- I would like to have genome size "tick labels" in terms of kilo-bases or mega-bases (i.e. 3 Kb rather than 3000, or 2 Mb rather than 2000000). I have done this myself by "hacking the source code" but my implementation is rather special case. So, has anyone tried to tackle these issues before? Thanks Peter From jchang at smi.stanford.edu Fri Mar 24 14:06:11 2006 From: jchang at smi.stanford.edu (jchang at smi.stanford.edu) Date: Fri, 24 Mar 2006 09:06:11 -0500 Subject: [BioPython] Lecturer needed for "Advanced Python" In-Reply-To: <4423E03E.20604@nbn.ac.za> References: <4423E03E.20604@nbn.ac.za> Message-ID: <20060324140609.GA266@sophie.local> Hello, Ruediger Braeuning has asked me to forward this to the list. Jeff On Fri, Mar 24, 2006 at 02:04:14PM +0200, Ruediger Braeuning wrote: > Hi, > > I'm writing to you from the South African National Bioinformatics > Network (NBN). We need a lecturer for our course in "Advanced Python" as > Andrew Dalke (the lecturer of last year's "Advanced Python") is not > available this year. > > So if you are qualified and want to spend some time in Cape Town drop me > a line. > > Time > ---- > We allocated 36 days for this module (Thursday, August 10th till Friday, > September 29th, 2006). I know that this is a long time but as you can > see from the daily schedule it's just 3 hours per day. For the rest of > the day you are free and we can provide you with an office. > > Expenses > --------- > We arrange and cover your flight, local transport, accommodation, meals. 
> There is also a small honorarium of ZAR 300 per day of teaching. > > Please find more details below: > > The National Bioinformatics Network (NBN) > ------------------------------------------ > The NBN was established to stimulate and support growth and development > of Bioinformatics as a scientific and applied discipline in South Africa > at an internationally competitive level. > > The Course > ----------- > We run national courses on an annual basis. Details of the courses > (content, lecture material) that were run in 2004 & 2005 can be found > under "Bioinformatics Workshop Modules" at > http://www.nbn.ac.za/Education/course.html > > The course content is aimed at covering as much of our Bioinformatics > core curriculum as possible. > > Your module > ------------ > We have the following suggestions for your module: > > - Data structures (object oriented design) > - Libraries, BioPython > - How to write larger pieces of code > - Interface Design > - Usability Testing Methodologies > > Note: These are just suggestions. You are the expert and we would > welcome your ideas. The students already got 35 days of introduction to > Python (Feb - Apr 2006). I'd be more than happy to hook you up with the > lecturer of that module. > > IMPORTANT: Please let us know of any prerequisites you require your > students to have. We can then make certain course modules a prerequisite > for your module. > - module on "Introductory Python" > > Also let us know of any required reading your students have to for > preparation. > > As your module is part of the bigger course on Bioinformatics we would > like to encourage you to show the relevance of your module for the whole > discipline and use Bioinformatics examples. The courses should start > easy, get everybody on board and then go into detail. Lectures should be > complemented by hands on sessions, which we believe is absolutely > crucial to the success of teaching and training. 
Problem based teaching > approaches worked best for our students. The NBN strongly emphasizes > open source solutions. It would be great if you could support this by > choosing your software accordingly. > > Daily Schedule > -------------- > 08:00-10:45 Python > 10:45-11:00 Break > 11:00-13:00 Lecture/Practical for another module > 13:00-14:00 Lunch > 14:00-15:30 Lecture/Practical for another module > 15:30-15:45 Break > 15:45-17:00 Lecture/Practical for another module > > Students will be taught Python every day of the course. We encourage all > our lecturers to exchange ideas for little tasks that are relevant for > their module and can be tackled in Python with the Python lecturer. > > Background of students > ----------------------- > Course participants will come from a range of different backgrounds > (Biology, Computer Science) and comprise people who study Bioinformatics > (attendance is mandatory for students with a NBN bursary) and people who > are "just" interested in particular aspects of Bioinformatics. Students > will also vary in terms of seniority from undergraduates to postdocs > with the majority being postgraduates. You should therefore expect a > higher degree of heterogeneity of knowledge amongst the course > participants than you would normally expect. This means you should be > prepared to have some flexibility in adjusting your course schedule to > the participants. > > Number of students > ------------------- > We expect a maximum of 25 students. > > Assistants > ---------- > Please let me know if you need some student assistants. > > Facilities > ----------- > Every participant will have her/his own PC to work on. A video projector > and stable internet access is in place. > > Evaluation and Assignments > --------------------------- > To give you and us feedback on the success of the course we will hand > out evaluation questionnaires after each module. Students also have to > take one assignment per course (our pass mark is 65%). 
In that respect > we would like to ask you to provide and mark your assignment. > > Should you require further information please don?t hesitate to contact > me at ruediger at nbn.ac.za or + 27 21 959 2991. I?m also more than happy > to give you a call. > > I?m looking forward to working with you. > > Should you not be available I would be grateful if you could recommend > another lecturer to me. > > Ruediger > -- > Ruediger Braeuning / National Bioinformatics Network > (=) University of the Western Cape > Ph. +27 21 959 2991 / Private Bag X17 > Fax +27 21 959 3573 (=) Bellville, 7535 > www.nbn.ac.za / South Africa From Teemu.Kuulasmaa at uku.fi Sat Mar 25 16:56:48 2006 From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa) Date: Sat, 25 Mar 2006 18:56:48 +0200 Subject: [BioPython] biopython and dbSNP Message-ID: <44257650.6090602@uku.fi> Hi, I am absolute beginner in python and biopython. I am trying to familiarize myself with biopython. I have java background but I think that python (and biopython) would be better tool to automate my daily routines. Python seems to be superior language for quick (and sometimes dirty) scripting compared to java. I would like to write some python scripts that help me to work with SNPs and DNA/RNA sequences. I work with SNPs daily basis. However,I was disappointed because I didn't find any notice about dbSNP from biopython documentation. At the beginnig I would like to be able to retrieve some SNP records from NCBI's dbSNP and parse them. Is there any ready made classes for that purpose? GenBank.search_for('id', 'database=xxx',...) function doesn't seem to support 'database=snp' parameter. To put it simple: Am I able to work with dbSNP by using biopython? 
Best regards, Teemu Kuulasmaa From dag at sonsorol.org Sat Mar 25 23:50:57 2006 From: dag at sonsorol.org (Chris Dagdigian) Date: Sat, 25 Mar 2006 18:50:57 -0500 Subject: [BioPython] Important news for developers on open-bio machines Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org> Hi, apologies for the massive cross-post. I'll keep it short! This message is a last-ditch attempt to contact people with developer accounts on pub.open-bio.org who may have not received the individual mails we've been sending via the obf-developers at lists.open-bio.org mailing list. We suspect that there are a number of devs out there for whom we don't have up to date email addresses. All open-bio services have been migrated to new hardware and a new datacenter. Part of this migration process involved moving all developer accounts and all source-code repositories to a new server. The developer migration was completed a few minutes ago. An unavoidable side effect of the move is that all developers are now locked out of their accounts until they contact us for a password reset. If you are a developer and this news comes as a surprise to you, it means we don't have your contact info. Your best way to get up to speed on the history and technical details behind the migration is to point your browser here: http://lists.open-bio.org/mailman/private/obf-developers/2006-March/ thread.html ... and read the various messages we've posted this month. Included in the first message is the information on how to request an account reset. 

Regards,
Chris Dagdigian
open-bio.org

From biopython at maubp.freeserve.co.uk Mon Mar 27 14:48:52 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Mon, 27 Mar 2006 15:48:52 +0100
Subject: [BioPython] Tweaking GenomeDiagram
In-Reply-To: <4424088E.9090004@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
Message-ID: <4427FB54.1040408@maubp.freeserve.co.uk>

Thanks Leighton,

I've included most of your reply for the benefit of the BioPython
mailing list and its archive...

>>> First Query
>>> -----------
>>> I would like to attach labels to (selected) features.
>>>
>>> For example, I am drawing a circular genome diagram with a selection of
>>> colour coded genes - some of which I would like to have individually
>>> labelled. This might be done in a similar way to the genome size tick
>>> captions (i.e. horizontal text) or perhaps rotated text (radially aligned).
>>>
>>> However, as far as I can tell from the documentation and the source
>>> code, this is not built in.

I clearly didn't read the right bit of the source code.

Leighton wrote:
> Each individual GDFeature has a label attribute, taking a Boolean, that
> allows you to set whether its label is displayed or not. You could set
> this on feature creation, ...

e.g.

for feature in genbank_entry.features:
    if feature.type == 'CDS':
        gdfs.add_feature(feature, label=False, colour=colors.lightgreen)
    elif feature.type == 'tRNA' :
        gdfs.add_feature(feature, label=True, colour=colors.red)

(This can easily be used with the examples in the documentation)

Leighton wrote:
> ... or at some later stage with a filter. If you're working with
> N.equitans, and your GDFeatureSet is called `gdfs1', for example, this
> code:
>
> gdfs1.set_all_features('label', 0)
> for feature in gdfs1.features.values():
>     print feature.name, feature.label
>     if feature.name.startswith('NEQ016'):
>         feature.label = 1

Maybe that should be gdfs._features rather than gdfs.features?

> will label only features whose names begin with NEQ016. You'll probably
> already see how flexible this can be if you add your own attributes to
> GDFeature objects when they're created (BLAST scores, expression
> levels, membership of functional classes, etc.). You can set the
> label_font, label_size, label_colour and label_angle attributes in the
> same kind of way.

By default, when using the add_feature method with a SeqFeature
extracted from a GenBank file, GenomeDiagram will look at the 'gene',
'label', 'locus_tag' and 'product' qualifiers for potential labels (in
that order). (See code in GDFeature.py class GDFeature.)

It might be useful to be able to supply the preferred label caption as
part of the add_feature command.

Thanks

Peter

From lpritc at scri.sari.ac.uk Mon Mar 27 09:17:01 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Mon, 27 Mar 2006 10:17:01 +0100
Subject: [BioPython] Tweaking GenomeDiagram
Message-ID: <1143451021.18558.228.camel@lplinuxdev>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:
-------------- next part --------------
An embedded message was scrubbed...
From: Leighton Pritchard
Subject: Re: Tweaking GenomeDiagram
Date: Mon, 27 Mar 2006 10:17:01 +0100
Size: 5030
URL:

From lpritc at scri.sari.ac.uk Mon Mar 27 15:42:10 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Mon, 27 Mar 2006 16:42:10 +0100
Subject: [BioPython] Tweaking GenomeDiagram
In-Reply-To: <4427FB54.1040408@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk> <4427FB54.1040408@maubp.freeserve.co.uk>
Message-ID: <1143474132.18558.260.camel@lplinuxdev>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:
-------------- next part --------------
An embedded message was scrubbed...
From: Leighton Pritchard Subject: Re: [BioPython] Tweaking GenomeDiagram Date: Mon, 27 Mar 2006 16:42:10 +0100 Size: 5015 URL: From biopython at maubp.freeserve.co.uk Mon Mar 27 18:03:14 2006 From: biopython at maubp.freeserve.co.uk (Peter (BioPython List)) Date: Mon, 27 Mar 2006 19:03:14 +0100 Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram 0.2 In-Reply-To: <4424088E.9090004@maubp.freeserve.co.uk> References: <4424088E.9090004@maubp.freeserve.co.uk> Message-ID: <442828E2.3040703@maubp.freeserve.co.uk> Peter wrote: > > This email is mainly aimed at Leighton Pritchard (who I have spotted > posting on the list in the past) as it concerns his (bio)python > add-on, GenomeDiagram: > > http://bioinf.scri.ac.uk/lp/programs.html#genomediagram > > First Query > ----------- > I would like to attach labels to (selected) features. See earlier email with example - I hadn't looked closely enough. http://www.biopython.org/pipermail/biopython/2006-March/002967.html > Second Query > ------------- > When drawing circular genomes following the examples, the major tick > marks seem to be at 1, 10001, 20001, ... (depending on the tick > interval size). > > It would look much better to display 10000 rather than 10001 (or even, > leading to my third question, 10 Kb). Fixed in GenomeDiagram release 0.2 > Third Query > ----------- > I would like to have genome size "tick labels" in terms of kilo-bases > or mega-bases (i.e. 3 Kb rather than 3000, or 2 Mb rather than > 2000000). This is a new option in GenomeDiagram release 0.2, see below. Leighton has also improved the positioning of the size captions on the lower half of circular diagrams, and probably other things as well. The following email was sent to me, and I am forwarding it to the mailing list because Leighton PGP signature was confusing the server. 

Thanks again,

Peter

--------------------------------------------------------------------

Hi Peter,

I think I've implemented everything you asked for, and the new source
and Windows installer are located at:

http://bioinf.scri.sari.ac.uk/lp/programs.php

and

http://bioinf.scri.sari.ac.uk/lp/programs.html#genomediagram

(take your pick).

To use the new features, you need to do the following sort of thing:

parser = GenBank.FeatureParser()
fhandle = open ('/data/genomes/Bacteria/Nanoarchaeum_equitans/NC_005213.gbk','r')
genbank_entry = parser.parse(fhandle)
fhandle.close()

gdd = GDDiagram('Test Diagram')
gdfs1 = GDFeatureSet(name='CDS features')
for feature in genbank_entry.features:
    if feature.type == 'CDS':
        # This is how you can override any attribute of the
        # GDFeature as you add it to the GDFeatureSet, just by
        # passing the appropriate keyword and argument
        gdfs1.add_feature(feature, name="Some feature or other")

# By passing the scale_format = "SInt" argument, you can use SI-like
# suffixes for scale markers. So far we only have Kbp and Mbp
# suffixes, and the default goes to just a string of the marker
# base position.
gdt1 = GDTrack('CDS features', greytrack=1,
               scale_largetick_interval=1e4,
               scale_smalltick_interval=1e3,
               scale_format = "SInt")
gdt1.add_set(gdfs1)

# You can now do regular expression comparisons, startswith
# comparisons, exclusions or just plain matches to any GDFeature
# attribute, just by passing the appropriate attribute, value and
# comparator mode
mod_features = gdfs1.get_features('name', 'NEQ0[2-4]', 'like')
#mod_features = gdfs1.get_features('name', 'NEQ02', 'startswith')
#mod_features = gdfs1.get_features('name', 'NEQ05', 'not')
#mod_features = gdfs1.get_features('name', 'NEQ016')
for feature in mod_features:
    feature.label = 1

And, finally, the marker labels in the lower halves of GenomeDiagram
images have been lowered so that they hit the marker line at the top of
the string, rather than the bottom.

Phew!

L.
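
(As an aside on the SI-like scale suffixes described above: the sketch
below, in present-day Python 3, shows one way a base position could be
turned into a Kbp or Mbp caption. The function name format_si_bp and its
thresholds are hypothetical and chosen for illustration only; this is not
the code GenomeDiagram uses for scale_format = "SInt".)

```python
def format_si_bp(position):
    """Return a tick caption such as '3 Kbp' or '2 Mbp' for a base position.

    Hypothetical helper: positions below 1000 fall back to the plain
    number, mirroring the default described in the message above.
    """
    if position >= 1000000:
        # %g drops a trailing ".0", so 2000000 becomes "2" rather than "2.0"
        return "%g Mbp" % (position / 1000000)
    if position >= 1000:
        return "%g Kbp" % (position / 1000)
    return str(position)


for tick in (500, 3000, 10000, 2000000):
    print(tick, "->", format_si_bp(tick))
```

Using %g keeps round values short (10000 gives "10 Kbp") while still
showing a fraction when the tick interval is not a whole multiple.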
-- Dr Leighton Pritchard AMRSC D131, Plant-Pathogen Interactions, Scottish Crop Research Institute Invergowrie, Dundee, Scotland, DD2 5DA, UK T: +44 (0)1382 562731 x2405 F: +44 (0)1382 568578 E: lpritc at scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/lp From Teemu.Kuulasmaa at uku.fi Tue Mar 28 06:59:27 2006 From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa) Date: Tue, 28 Mar 2006 09:59:27 +0300 Subject: [BioPython] biopython and dbSNP (2) Message-ID: <4428DECF.4070406@uku.fi> Hi, I made some experimentations and got GenBank.search_for() and GenBank.download_many() to work with dbSNP. However, I didn't succeed to get GenBank.NCBIDictionary() to work. I do not know if this is right way to do it. It would by nice if someone (biopython-dev) could speak out on the matter. Here are two very small diffs (against biopython version 1.41) that were required to get dbSNP sequence retrieval to work: ---------------------------------------------------------- ubuntu at ubuntu:~/src/biopython$ diff /usr/lib/python2.4/site-packages/Bio/EUtils/Config.py Config.py 58c58 < databases.SNP = _add_db(DatabaseInfo("snp", 1)) ubuntu at ubuntu:~/src/biopython$ diff /usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py __init__.py 1422,1423d1421 < elif database in ['snp']: < format = 'fasta' This is example how it works after these modifications: ---------------------------------------------------------- ubuntu at ubuntu:~$ python Python 2.4.2 (#2, Mar 5 2006, 00:03:25) [GCC 4.0.3 20060212 (prerelease) (Ubuntu 4.0.2-9ubuntu1)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> from Bio import GenBank
>>> snps = GenBank.search_for('rs8192602', 'snp')
>>> snps
['8192602']
>>> seqs = GenBank.download_many(snps, 'snp')
>>> print seqs.read()
1: rs8192602 [Homo sapiens]
>gnl|dbSNP|rs8192602 rs=8192602|pos=272|len=397|taxid=9606|mol="genomic"|class=1|alleles="A/G"|build=117
TGGCAGAGTG GGGAGTAGGA GGGTAGTGCC AGTGAGTAAA CCAGACTCCA TACCTTAAGC
TCAACTCCTA TCCCTTTGTC GCCTCCCAAC CCCAGTCATG GCTGAGTACG GGACCCTCCT
GCAAGACCTG ACCAACAACA TCACCCTTGA AGATCTAGAA CAGCTCAAGT CGGCCTGCAA
GGAAGACATC CCCAGCGAAA AGAGTGAGGA GATCACTACT GGCAGTGCCT GGTTTAGCTT
CCTGGAGAGC CACAACAAGC TGGACAAAGG T
R
GGGGAGGGGA GCACAGGGGT CCTGTCATCA GTCATTCAGG CTCAGTTCAT TCAGCAAATA
GAGATGAGCT CAAAGCTTTT ACATCCACAA TGTGTACCCC TCTATAGCAA GGCAGAAGAG
AGGTG

Best regards,
Teemu Kuulasmaa

From biopython at maubp.freeserve.co.uk Tue Mar 28 17:11:12 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 28 Mar 2006 18:11:12 +0100
Subject: [BioPython] biopython and dbSNP (2)
In-Reply-To: <4428DECF.4070406@uku.fi>
References: <4428DECF.4070406@uku.fi>
Message-ID: <44296E30.1070809@maubp.freeserve.co.uk>

Teemu Kuulasmaa wrote:
> Hi,
>
> I made some experiments and got GenBank.search_for() and
> GenBank.download_many() to work with dbSNP. However, I didn't succeed in
> getting GenBank.NCBIDictionary() to work. I do not know if this is the right
> way to do it. It would be nice if someone (biopython-dev) could speak out on
> the matter.
>
> Here are two very small diffs (against biopython version 1.41) that were
> required to get dbSNP sequence retrieval to work:

I'm not familiar with this aspect of the GenBank support, but your code looks OK to me.

I tried your two changes on the CVS version of EUtils and GenBank and it works for me (the GenBank file has had significant changes to the file parser).

One question is whether the GenBank.search_for() and GenBank.download_many() functions are intended just for "GenBank" (officially just the nucleotides?)
or for other sequence-based EUtils databases like proteins, snp, ..., or even genomes.

Unless anyone else cares to comment, I'll commit Teemu's two small changes in the next few days.

As to getting GenBank.NCBIDictionary() to work with the snp database, it's not as easy as it looks.

Peter

From biopython at maubp.freeserve.co.uk Tue Mar 28 18:06:08 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 28 Mar 2006 19:06:08 +0100
Subject: [BioPython] biopython and dbSNP (2)
In-Reply-To: <44296E30.1070809@maubp.freeserve.co.uk>
References: <4428DECF.4070406@uku.fi> <44296E30.1070809@maubp.freeserve.co.uk>
Message-ID: <44297B10.9070000@maubp.freeserve.co.uk>

Peter (BioPython List) wrote:
> Teemu Kuulasmaa wrote:
>
>>Hi,
>>
>>I made some experiments and got GenBank.search_for() and
>>GenBank.download_many() to work with dbSNP. However, I didn't succeed in
>>getting GenBank.NCBIDictionary() to work. I do not know if this is the right
>>way to do it. It would be nice if someone (biopython-dev) could speak out on
>>the matter.
>>
>>Here are two very small diffs (against biopython version 1.41) that were
>>required to get dbSNP sequence retrieval to work:
>
>
> I'm not familiar with this aspect of the GenBank support, but your code
> looks OK to me.
>
> I tried your two changes on the CVS version of EUtils and GenBank and it
> works for me (the GenBank file has had significant changes to the file
> parser).
>
> One question is whether the GenBank.search_for() and GenBank.download_many()
> functions are intended just for "GenBank" (officially just the nucleotides?)
> or for other sequence-based EUtils databases like proteins, snp, ..., or
> even genomes.
>
> Unless anyone else cares to comment, I'll commit Teemu's two small
> changes in the next few days.
>
> As to getting GenBank.NCBIDictionary() to work with the snp database,
> it's not as easy as it looks.
Trying this with SNPs (having applied Teemu Kuulasmaa's changes) we get back "mangled FASTA entries" with additional headers and blank lines. Ignoring the spaces in the sequences (which appear mostly in ten-nucleotide blocks with a space in between) we get:

>>> seqs = GenBank.download_many(['8192602','8192603'], 'snp')
>>> print seqs.read()
1: rs8192602 [Homo sapiens]
>gnl|dbSNP|rs8192602 ...
TGGCAGAGTG...

2: rs8192603 [Homo sapiens]
>gnl|dbSNP|rs8192603 ...
TGGTGGGCAG...

The blank lines shouldn't be a problem for BioPython's FASTA parser.

However, because of the extra lines that look like "{Result Number}: {Identifier} [{Species}]", this is NOT a valid FASTA format file.

This may be an NCBI EUtils problem... following their FAQ, I tested this URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602&report=FASTA

and this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602,8192603&report=FASTA

And it does the same sort of thing :(

I have emailed the NCBI...

Peter

From Teemu.Kuulasmaa at uku.fi Wed Mar 29 09:07:00 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Wed, 29 Mar 2006 12:07:00 +0300
Subject: [BioPython] biopython and dbSNP (2)
References: 44296E30.1070809@maubp.freeserve.co.uk
Message-ID: <442A4E34.8050206@uku.fi>

> Peter (BioPython List) wrote:
>
> The blank lines shouldn't be a problem for BioPython's FASTA parser.
>
> However, because of the extra lines that look like "{Result Number}: {Identifier}
> [{Species}]", this is NOT a valid FASTA format file.
>
> This may be an NCBI EUtils problem... following their FAQ, I tested this
> URL:
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602&report=FASTA
>
> and this:
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602,8192603&report=FASTA
>
> And it does the same sort of thing :(
>
> I have emailed the NCBI...
>
> Peter

Thank you for your response Peter!
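
Until the output changes, the extra "{Result Number}: {Identifier} [{Species}]" lines discussed in this thread could be filtered out before the text reaches a FASTA parser. What follows is a minimal sketch under stated assumptions, not part of Biopython: `strip_efetch_headers` is a hypothetical helper, the regular expression assumes the extra lines always look like the two examples quoted in the thread, and the sample text is abbreviated from those examples.

```python
import re

# Matches lines such as "1: rs8192602 [Homo sapiens]" (assumed layout).
EXTRA_HEADER = re.compile(r"^\d+:\s+\S+\s+\[.*\]\s*$")


def strip_efetch_headers(text):
    """Drop numbered result lines and blank lines, and remove the
    spaces between the ten-nucleotide blocks of the sequence lines."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        if not line or EXTRA_HEADER.match(line):
            continue
        if line.startswith(">"):
            cleaned.append(line)  # FASTA title line, kept unchanged
        else:
            cleaned.append(line.replace(" ", ""))
    return "".join(line + "\n" for line in cleaned)


# Abbreviated sample in the shape reported in the thread.
mangled = """1: rs8192602 [Homo sapiens]
>gnl|dbSNP|rs8192602 rs=8192602|pos=272
TGGCAGAGTG GGGAGTAGGA

2: rs8192603 [Homo sapiens]
>gnl|dbSNP|rs8192603
TGGTGGGCAG
"""

print(strip_efetch_headers(mangled))
```

Filtering on the text before parsing leaves the FASTA parser itself untouched, so the workaround can be dropped as soon as the upstream format is fixed.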
Like you said, the NCBI EUtils result is not a valid FASTA-formatted file. I hope that NCBI will fix this issue soon. Let us know if you get any kind of feedback from NCBI!

Teemu

--
Teemu Kuulasmaa, M.Sc.
University of Kuopio
Laboratory of Internal Medicine
P.O.Box 1627
70211 Kuopio
FINLAND
Tel +358 1716 3498
Fax +358 1716 2445

From biopython at maubp.freeserve.co.uk Wed Mar 29 13:18:30 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Wed, 29 Mar 2006 14:18:30 +0100
Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram 0.21
In-Reply-To: <442828E2.3040703@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk> <442828E2.3040703@maubp.freeserve.co.uk>
Message-ID: <442A8926.8040101@maubp.freeserve.co.uk>

There was a packaging problem with GenomeDiagram 0.2 (missing new module Observer), which Leighton has fixed with the release of GenomeDiagram 0.21, available here:

http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

This also adds a dpi option to GDDiagram.write() for raster output - which is handy for generating high resolution PNG files.

Peter

From lpritc at scri.sari.ac.uk Wed Mar 29 14:05:21 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Wed, 29 Mar 2006 15:05:21 +0100
Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram 0.21
In-Reply-To: <442A8926.8040101@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk> <442828E2.3040703@maubp.freeserve.co.uk> <442A8926.8040101@maubp.freeserve.co.uk>
Message-ID: <1143641121.4788.9.camel@lplinuxdev>

On Wed, 2006-03-29 at 14:18 +0100, Peter (BioPython List) wrote:
> There was a packaging problem with GenomeDiagram 0.2 (missing new module
> Observer),

The problem was me - the packaging did exactly what I told it to, more's the pity ;)

> which Leighton has fixed with the release of GenomeDiagram
> 0.21, available here:
>
> http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

I'll leave the advertising in...
--
Dr Leighton Pritchard AMRSC
D131, Plant-Pathogen Interactions, Scottish Crop Research Institute
Invergowrie, Dundee, Scotland, DD2 5DA, UK
T: +44 (0)1382 562731 x2405 F: +44 (0)1382 568578
E: lpritc at scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/lp
GPG/PGP: FEFC205C E58BA41B http://www.keyserver.net
(If the signature does not verify, please remove the SCRI disclaimer)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.sari.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).

From halima at mancala.cbio.uct.ac.za Thu Mar 30 07:17:10 2006
From: halima at mancala.cbio.uct.ac.za (Halima Rabiu)
Date: Thu, 30 Mar 2006 09:17:10 +0200 (SAST)
Subject: [BioPython] Need help on NCBIStandaloneblast
Message-ID:

Hi everybody;

I am new to biopython and am having problems with "NCBIStandalone.blastall". After launching BLAST with "doBlast", it looks like it runs and finishes, but when I check the output it is empty. When I try the same thing from the command line, it works and I get results. I attach my code.
I also tried to go through the previous posts on the biopython mailing list and found a similar problem posted by Andreas, but no solution to the problem.

Please can somebody help?

Thanks
Nike

-------------- next part --------------
#! /usr/local/bin/python2.4
#halimah
#20-03-2006
from Bio.Blast import NCBIStandalone
import os

# path to my database
data=os.path.join(os.getcwd(),"Newprotein.db","Nprotein.Fdb")
# input file (protein sequence in fasta )
infile=os.path.join(os.getcwd(),"Newprotein.db","mytest.txt",'r')
# path to Blastall executable
blast_exe=os.path.join("/","usr","local","blast","bin","blastall")
output,error_info =NCBIStandalone.blastall(blast_exe,"blastp", data, infile)
#print output.readline()
save_file =open("blast.out","w")
blast_result=output.read()
save_file.write(blast_result)
save_file.close()

blastfile = open('blast.out', 'r')
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(blastfile, b_parser)
while 1:
    b_record = b_iterator.next()
    if b_record is None:
        break
    #This will parse the BLAST report into a Blast Record class (either a
    #Blast or a PSIBlast record, depending on what you are parsing) so that
    #you can extract the information from it. In our case, let's just
    #print out a quick summary of all of the alignments greater than some
    #threshold value.
    E_VALUE_THRESH = 1.00
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print '****Alignment****'
                print 'sequence:', alignment.title
                print 'length:', alignment.length
                print 'e value:', hsp.expect
                print hsp.query[0:75] + '...'
                print hsp.match[0:75] + '...'
                print hsp.sbjct[0:75] + '...'
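
One detail in the script above is worth checking: os.path.join() treats every argument as a path component, so the trailing 'r' in the `infile` line becomes part of the path rather than a file mode. The sketch below shows the effect and a simple pre-flight check using only the standard library. `check_blast_inputs` is a hypothetical helper name, the relative paths are shortened from the script, and whether this is what causes the empty output in this case is an assumption, not something confirmed in the thread.

```python
import os


def check_blast_inputs(query_path):
    """Return a list of human-readable problems found with the query file."""
    problems = []
    if not os.path.isfile(query_path):
        problems.append("query file not found: " + query_path)
    elif os.path.getsize(query_path) == 0:
        problems.append("query file is empty: " + query_path)
    return problems


# A file mode such as 'r' belongs in open(), not in os.path.join(): here it
# ends up as the last path component, below "mytest.txt".
wrong = os.path.join("Newprotein.db", "mytest.txt", "r")
right = os.path.join("Newprotein.db", "mytest.txt")

print(wrong)
print(right)
for problem in check_blast_inputs(wrong):
    print(problem)
```

Running a check like this before calling blastall turns a silent empty result into an explicit message about which file is missing.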
From biopython at maubp.freeserve.co.uk Thu Mar 30 15:56:29 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Thu, 30 Mar 2006 16:56:29 +0100
Subject: [BioPython] Need help on NCBIStandaloneblast
In-Reply-To:
References:
Message-ID: <442BFFAD.10103@maubp.freeserve.co.uk>

Halima Rabiu wrote:
> Hi everybody;
> I am new to biopython and am having problems with "NCBIStandalone.blastall".
> After launching BLAST with "doBlast", it looks like it runs and finishes,
> but when I check the output it is empty. When I try the same thing from
> the command line, it works and I get results.
> I attach my code.

Have you checked that the paths are correct, e.g.

assert os.path.isfile(data), "Missing database file " + data
assert os.path.isfile(infile), "Missing input file " + infile

You don't need to check blast_exe yourself, as the blastall command does this for you.

If I understood you correctly, the "blast.out" file is empty. Did blast return any error message? Try:

print error_info.read()

or:

save_file =open("blast.error","w")
blast_result=error_info.read()
save_file.write(blast_result)
save_file.close()

Next question: could you tell us what you typed at the command line which does work?

> I also tried to go through the previous posts on the biopython mailing list
> and found a similar problem posted by Andreas, but no solution to the problem.

It was worth checking anyway :)

Peter

From mdehoon at c2b2.columbia.edu Fri Mar 31 17:22:13 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 31 Mar 2006 12:22:13 -0500
Subject: [BioPython] BOSC announcement
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEDC@cgcmail.cgc.cpmc.columbia.edu>

MEETING ANNOUNCEMENT & CALL FOR SPEAKERS

The 7th annual Bioinformatics Open Source Conference (BOSC 2006) is organized by the not-for-profit Open Bioinformatics Foundation.
The meeting will take place Aug 4-5 in Fortaleza, Brazil, and is one of several Special Interest Group (SIG) meetings occurring in conjunction with the 14th International Conference on Intelligent Systems for Molecular Biology.

Please consult the official BOSC 2006 website at http://www.open-bio.org/wiki/BOSC_2006 for details and information.

In addition, a BOSC weblog has been set up to make it easier to disseminate all BOSC-related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an iCal-compatible calendar, there is an EventDB calendar set up with all BOSC-related deadlines:

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the official ISMB 2006 website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank you, and we look forward to seeing you all,

The BOSC Organizing Committee.