From ziemys at ecr6.ohio-state.edu  Fri Mar  3 15:09:02 2006
From: ziemys at ecr6.ohio-state.edu (ziemys@ecr6.ohio-state.edu)
Date: Fri Mar  3 16:00:22 2006
Subject: [BioPython] Bio.PDB.ResidueDepth
Message-ID: <W16111467684521141416542@ER6S3>

Hi,

What sphere radius is used to calculate the surface in Bio.PDB.ResidueDepth? Is it 1.4? Can the radius be modified, and if so, how?

Do I need to install just 'msms.exe', or 'pdb_to_xyzr.exe' as well?

With best
Arturas


From biopython at maubp.freeserve.co.uk  Sat Mar  4 05:08:45 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython))
Date: Sat Mar  4 05:31:20 2006
Subject: [BioPython] Homology Modeling question
In-Reply-To: <BAY103-F151C140A1ABC724D63C526E6F40@phx.gbl>
References: <BAY103-F151C140A1ABC724D63C526E6F40@phx.gbl>
Message-ID: <4409672D.4090602@maubp.freeserve.co.uk>

Omid Khalouei wrote:
> Hello,
> 
> My question is not specifically related to Biopython. I wanted to know 
> if homology modeling can be used reliably to see the effects of single 
> amino acid substitutions. I mean, is homology modeling only useful for 
> predicting the structure of sequences for which there is no known 
> structure, or can it also be used for a more "fine tuning" analysis such 
> as changing one amino acid on a PDB structure and then performing 
> homology modeling using that same PDB structure as template?

I was under the impression that for homology modelling you provide an 
alignment of your sequence with several other sequences with associated 
known structures.  Have you looked at the Sali Lab's program MODELLER:

http://salilab.org/modeller/

For the simple case of "fine tuning" analysis with a single amino acid 
substitution, homology modelling might be overkill.  A simple substitution 
using the same backbone positions and initial direction for the side 
chain, followed by a molecular dynamics energy minimization of the side 
chain, may be enough.  This particular question has cropped up before on the 
MMTK mailing lists - MMTK is a Python molecular dynamics library:

http://starship.python.net/crew/hinsen/MMTK/index.html
http://starship.python.net/pipermail/mmtk/

> Also is there any up-to-date forum for homology modeling? I looked it up 
> on Google but the postings were from back in the 1990s.

I don't know.

Peter

From mdehoon at c2b2.columbia.edu  Sat Mar  4 15:33:36 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sat Mar  4 15:29:45 2006
Subject: [BioPython] qblast fails on parsing XML results
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECE9A@cgcmail.cgc.cpmc.columbia.edu>

Fixed in CVS using urllib2. Thanks to Alexander Morgan for providing the
code. Please let us know if there is still a problem with qblast.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


-----Original Message-----
From: biopython-bounces@portal.open-bio.org on behalf of Ilya Soifer
Sent: Mon 2/27/2006 10:38 AM
To: biopython@biopython.org
Subject: [BioPython] qblast fails on parsing XML results
 
Hi,
I hope I am sending this to the correct list.
When I run qblast I get:

>>> res1 = NCBIWWW.qblast("blastn", "nr", seq1)

Traceback (most recent call last):
 File "<pyshell#24>", line 1, in -toplevel-
   res1 = NCBIWWW.qblast("blastn", "nr", seq1)
 File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 1130, in qblast
   i = results.index("Connection: close")
ValueError: substring not found

This happens because the results that BLAST returns no longer have this header:

   # HTTP/1.1 200 OK
   # Date: Wed, 05 Oct 2005 02:13:33 GMT
   # Server: Nde
   # Content-Type: text/plain
   # Connection: close
   #


but this one

HTTP/1.0 200 OK
Date: Mon, 27 Feb 2006 11:54:40 GMT
Content-Type: application/xml
Server: Nde
Via: 1.1 proxy7 (NetCache NetApp/6.0.2)

I guess it might be better to look for something like "<?xml" etc. in
order to remove the annoying header.
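
A minimal sketch of that idea, assuming the whole raw reply is held in one string. The helper name `strip_http_header` is hypothetical and is not part of Bio.Blast.NCBIWWW:

```python
def strip_http_header(raw):
    """Return the part of a raw reply that starts at the XML declaration.

    Hypothetical helper: it does not depend on any particular header
    line (such as "Connection: close") being present in the reply.
    """
    start = raw.find("<?xml")
    if start == -1:
        raise ValueError("no XML declaration found in reply")
    return raw[start:]


# Example reply modelled on the newer header shown above
reply = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: application/xml\r\n"
    "Server: Nde\r\n"
    "\r\n"
    '<?xml version="1.0"?>\n<BlastOutput></BlastOutput>\n'
)
body = strip_http_header(reply)
```

Searching for the start of the payload rather than the end of the header means a proxy adding or dropping header lines should not matter.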

Ilya

_______________________________________________
BioPython mailing list  -  BioPython@biopython.org
http://biopython.org/mailman/listinfo/biopython


From kael.fischer at gmail.com  Tue Mar  7 15:07:53 2006
From: kael.fischer at gmail.com (Kael Fischer)
Date: Tue Mar  7 15:02:51 2006
Subject: [BioPython] Re: GenBank RecordParser Failure
In-Reply-To: <cd02cc220603071159l3ddfa93fn59ab18d04f35cd19@mail.gmail.com>
References: <cd02cc220603071144p16135c64hd4c9b8844d649eca@mail.gmail.com>
	<cd02cc220603071159l3ddfa93fn59ab18d04f35cd19@mail.gmail.com>
Message-ID: <cd02cc220603071207q2b0435c2k6582a5c3e6b83c00@mail.gmail.com>

That snippet is usable if you move it down a few lines -

right after:

consumer.reference_num(data[:data.find(' ')])

and before:
consumer.reference_bases(data[data.find(' ')+1:])


-Kael

From ziemys at ecr6.ohio-state.edu  Tue Mar  7 16:38:36 2006
From: ziemys at ecr6.ohio-state.edu (ziemys@ecr6.ohio-state.edu)
Date: Tue Mar  7 16:34:46 2006
Subject: [BioPython] HSE (half-sphere exposure)
Message-ID: <W67307925643041141767516@ER6S3>

Hi

Can anybody give more details about how to use HSE in BioPython?

(BioPython is very nice, but at the same time it suffers from a lack of documentation...)

With best
Arturas


From thamelry at binf.ku.dk  Tue Mar  7 16:50:46 2006
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Tue Mar  7 17:02:46 2006
Subject: [BioPython] HSE (half-sphere exposure)
In-Reply-To: <W67307925643041141767516@ER6S3>
References: <W67307925643041141767516@ER6S3>
Message-ID: <33099.87.72.27.226.1141768246.squirrel@www.binf.ku.dk>


On Tue, March 7, 2006 10:38 pm, ziemys@ecr6.ohio-state.edu wrote:
> Hi
>
>
> Can anybody give more details about how to use HSE in BioPython ?
>
>
> (BioPython is very nice, but at the same time it suffers from a lack of
> documentation...)
>
> With best
> Arturas
>

Hi Arturas,

Below is an example.
Note that HSE-alpha is undefined for the first and last
residues of a polypeptide.

Best regards,

-Thomas

----

from Bio.PDB import *
import sys

p=PDBParser()
s=p.get_structure('X', sys.argv[1])
model=s[0]

RADIUS=12.0

hse=HSExposureCA(model, radius=RADIUS)
hse=HSExposureCB(model, radius=RADIUS)
hse=ExposureCN(model, radius=RADIUS)

for r in model.get_residues():
    if is_aa(r):
        print r
        try:
            # Contact number
            print r.xtra["EXP_CN"]
            # HSE alpha up
            print r.xtra["EXP_HSE_A_U"]
            # HSE alpha down
            print r.xtra["EXP_HSE_A_D"]
            # HSE beta up
            print r.xtra["EXP_HSE_B_U"]
            # HSE beta down
            print r.xtra["EXP_HSE_B_D"]
            print
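        except KeyError:
            # This measure is undefined for this residue
            # (e.g. HSE-alpha at the ends of a polypeptide)
            pass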
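
For readers wondering why HSE-alpha is undefined at the chain ends: as I understand it, the CA-based variant estimates each residue's side-chain direction from the previous and next CA atoms, so terminal residues have nothing to build that direction from. Below is a minimal pure-Python sketch of that idea. The function name `hse_ca` and the helpers are hypothetical, and this is not the Bio.PDB implementation, which may differ in details.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(x / norm for x in v)


def hse_ca(ca_coords, radius=12.0):
    """Return {index: (up, down)} neighbour counts for interior residues.

    ca_coords is a list of (x, y, z) CA positions along one chain.
    The first and last residues are skipped because the approximate
    side-chain direction needs both neighbouring CA atoms.
    """
    result = {}
    for i in range(1, len(ca_coords) - 1):
        centre = ca_coords[i]
        to_prev = _unit(_sub(ca_coords[i - 1], centre))
        to_next = _unit(_sub(ca_coords[i + 1], centre))
        # Assumed approximation: the side chain points away from the
        # bisector of the two backbone directions
        up_dir = tuple(-(p + n) for p, n in zip(to_prev, to_next))
        up = down = 0
        for j, other in enumerate(ca_coords):
            if j == i:
                continue
            d = _sub(other, centre)
            if math.sqrt(_dot(d, d)) > radius:
                continue
            if _dot(d, up_dir) > 0:
                up += 1
            else:
                down += 1
        result[i] = (up, down)
    return result


coords = [(-1, -1, 0), (0, 0, 0), (1, -1, 0), (0, 5, 0), (0, -5, 0)]
counts = hse_ca(coords)
```

In this toy input only residue 3 lies in the "up" half-sphere of residue 1, and the first and last residues get no entry at all.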


From kael at sonic.net  Tue Mar  7 16:55:53 2006
From: kael at sonic.net (Kael Fischer)
Date: Tue Mar  7 17:06:42 2006
Subject: [BioPython] GenBank RecordParser Failure (split REFERENCE line)
Message-ID: <cd02cc220603071355i241005b4jfeaf9a522313b8bc@mail.gmail.com>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gb__init__v2.diff
Type: application/octet-stream
Size: 747 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20060307/e66f9bff/gb__init__v2.obj

From biopython at maubp.freeserve.co.uk  Fri Mar 10 09:07:56 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Fri Mar 10 09:21:06 2006
Subject: [BioPython] Translating ambiguous stop codons
Message-ID: <4411883C.2020200@maubp.freeserve.co.uk>

I've been working on simple gene finding within sequence contigs from 
unfinished genomes.  Very simply I have used biopython to scan each of 
the six frames looking for a start codon, translating until the next 
stop codon - and repeating.  This is a pretty simple way of generating a 
list of possible open reading frames for further analysis.

Unfortunately (as is probably the case for many unfinished genomes) 
there are some ambiguous codons which could code for an amino acid or a 
stop codon:

e.g.
"NAG" could be E, K, Q or a stop codon
"YAG" could be either Q or a stop codon (as Y = C or T)

For example, if I have the ambiguous sequence 
"CAAGGCGTCGAAYAGCTTCAGGAACAGGAC" and try to translate it, I get an 
exception, "TranslationError: YAG"

from Bio.Seq import Seq
from Bio import Translate
my_translator = Translate.ambiguous_dna_by_id[11]
my_dna = Seq('CAAGGCGTCGAAYAGCTTCAGGAACAGGAC', \
              my_translator.table.nucleotide_alphabet)

#print my_translator.translate_to_stop(my_dna)
print my_translator.translate(my_dna)

The possible translations are 'QGVEQLQEQD', and 'QGVE*LQEQD'

Is this situation something many other BioPython users have had to deal 
with?  I could write my own translate method for this particular 
application, but was wondering how best to support this within the basic 
BioPython setup.

Suggestion One - Fairly Simple
==============================
The translate_to_stop method could be enhanced with an option to control 
how it copes with ambiguous codons that could be either a stop or an 
amino acid:
(i) Treat as a stop codon "*", and stop translating there
(ii) Treat as amino acid, and continue translating
(iii) Treat as ambiguous (see suggestion two) and continue translating

As this is an unusual case, the additional code would only be triggered 
rarely so should not have much impact on the speed of the typical 
translation.

This could also be done to the translate method giving:
(i) Treat as stop codon, e.g. 'QGVE*LQEQD'
(ii) Treat as amino acid, e.g. 'QGVEXLQEQD' or better 'QGVEQLQEQD'
(iii) Treat as ambiguous (see suggestion two)

In this case (codon = "YAG") if we assume it is an amino acid (and not a 
stop codon) it must be "Q".  In other examples (e.g. "NAG") then the 
result would be E, K or Q and thus result in translation "X".

Suggestion Two - Complex
========================
Biopython uses "*" for a stop codon, and "X" for any amino acid.  There 
does not seem to be a symbol for either a stop codon or an amino acid, 
e.g. "?".  As far as I can tell, there is no IUPAC standard for this...

If this existed (maybe in a variant of the IUPACAmbiguousDNA alphabet) 
then we could expect to get back 'QGVE?LQEQD' from translate.
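
The three options could be prototyped outside Biopython roughly as follows. This is only a sketch under stated assumptions: it uses the standard codon assignments (which, as far as I know, table 11 shares, differing only in start codons), and the names `translate`, `translate_codon` and the `on_maybe_stop` argument are hypothetical, not existing Bio.Translate API.

```python
from itertools import product

BASES = "TCAG"
# Standard amino acid assignments in NCBI order (codons TTT, TTC, TTA, ...)
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC ambiguous nucleotide codes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def possible_residues(codon):
    """Return the set of translations an ambiguous codon could have."""
    return {CODON_TABLE[a + b + c]
            for a in IUPAC[codon[0]]
            for b in IUPAC[codon[1]]
            for c in IUPAC[codon[2]]}


def translate_codon(codon, on_maybe_stop="ambiguous"):
    """Translate one codon; on_maybe_stop is 'stop', 'amino' or 'ambiguous'."""
    residues = possible_residues(codon)
    if residues == {"*"}:
        return "*"                      # unambiguously a stop codon
    if "*" in residues:
        if on_maybe_stop == "stop":
            return "*"                  # option (i)
        if on_maybe_stop == "ambiguous":
            return "?"                  # option (iii), hypothetical symbol
        if on_maybe_stop != "amino":
            raise ValueError("unknown policy: %r" % on_maybe_stop)
        residues = residues - {"*"}     # option (ii)
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna, on_maybe_stop="ambiguous"):
    """Translate whole codons of dna, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3], on_maybe_stop)
                   for i in range(0, usable, 3))


seq = "CAAGGCGTCGAAYAGCTTCAGGAACAGGAC"
as_stop = translate(seq, "stop")
as_amino = translate(seq, "amino")
as_ambiguous = translate(seq, "ambiguous")
```

Because the extra work only happens when a codon contains an ambiguity code that includes a stop, a real implementation could keep the plain dictionary lookup as the fast path.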

Old, but fairly relevant, email from Andrew Dalke

http://biopython.org/pipermail/biopython-dev/2000-August/000072.html

Peter

From mdehoon at c2b2.columbia.edu  Sat Mar 18 15:40:51 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sat, 18 Mar 2006 15:40:51 -0500
Subject: [BioPython] Test - please ignore
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEC4@cgcmail.cgc.cpmc.columbia.edu>

Just testing if I can send to this mailing list. One of our users complained
that his messages were getting bounced, although he is a member of this
mailing list.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From biopython at maubp.freeserve.co.uk  Tue Mar 21 07:30:16 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython))
Date: Tue, 21 Mar 2006 12:30:16 +0000
Subject: [BioPython] EMBOSS programs and their alignment formats
Message-ID: <441FF1D8.2060904@maubp.freeserve.co.uk>

I've been having a look at BioPython's Emboss support and it looks like 
a (partial) set of command line interfaces to the tools, with additional 
code for some of the primer tools and their formats.

As far as I can tell, there is no support for any of the Emboss 
alignment output formats:

http://emboss.sourceforge.net/docs/themes/AlignFormats.html

Some (all?) of the alignment programs will happily produce gapped FASTA 
output, but this excludes other information like the alignment score 
etc.  The alignments themselves could be analysed to extract the 
alignment length, identity, similarity and gap counts.
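
As a rough illustration of that kind of analysis, here is a sketch for two gapped sequences of equal length. The function name `pairwise_stats` is hypothetical; similarity is left out because it needs the scoring matrix, and the way gaps are counted here (columns with a gap in either sequence) is an assumption that may not match what each EMBOSS program reports.

```python
def pairwise_stats(seq_a, seq_b, gap="-"):
    """Return alignment length, identity and gap counts for two gapped rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(seq_a, seq_b))
    return {
        "length": len(pairs),
        # identical, non-gap columns
        "identity": sum(1 for a, b in pairs if a == b and a != gap),
        # columns where either row has a gap
        "gaps": sum(1 for a, b in pairs if a == gap or b == gap),
    }


stats = pairwise_stats("ACGT-ACGT", "ACGTTAC-T")
```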

However, the FASTA format does not include the algorithm specific score, 
nor other program parameters which might be of interest (like the matrix 
and gap penalties).

e.g.

########################################
# Program:  demoalign
# Rundate:  Thu Jan 17 09:30:08 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 9
# Extend_penalty: -1
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
#
#
#=======================================

(followed by the aligned sequences)
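
The header fields at least look easy to pull out. A minimal sketch, assuming every field of interest is a "# key: value" comment line as in the example above; `parse_emboss_header` is a hypothetical name, not an existing Biopython function:

```python
def parse_emboss_header(text):
    """Collect '# key: value' comment lines into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            continue
        content = line.lstrip("#").strip()
        if ":" not in content:
            continue  # rulers such as "#=====" or empty "#" lines
        key, value = content.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


header = """\
########################################
# Program:  demoalign
# Rundate:  Thu Jan 17 09:30:08 2002
########################################
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# Matrix: EBLOSUM62
# Gap_penalty: 9
# Length: 131
# Identity:      95/131 (72.5%)
#=======================================
"""
fields = parse_emboss_header(header)
```

Values are kept as strings here; converting "95/131 (72.5%)" into numbers would be a separate step.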

Has anyone tackled supporting these files in BioPython?

Thanks

Peter


From biopython at maubp.freeserve.co.uk  Fri Mar 24 09:56:14 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Fri, 24 Mar 2006 14:56:14 +0000
Subject: [BioPython] Tweaking GenomeDiagram
Message-ID: <4424088E.9090004@maubp.freeserve.co.uk>

This email is mainly aimed at Leighton Pritchard (who I have spotted 
posting on the list in the past) as it concerns his (bio)python add-on, 
GenomeDiagram:

http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

First Query
-----------
I would like to attach labels to (selected) features.

For example, I am drawing a circular genome diagram with a selection of 
colour coded genes - some of which I would like to have individually 
labelled.  This might be done in a similar way to the genome size tick 
captions (i.e. horizontal text) or perhaps rotated text (radially aligned).

However, as far as I can tell from the documentation and the source 
code, this is not built in.

Second Query
-------------
When drawing circular genomes following the examples, the major tick 
marks seem to be at 1, 10001, 20001, ... (depending on the tick interval 
size).

It would look much better to display 10000 rather than 10001 (or even, 
leading to my third question, 10 Kb).
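
To illustrate what I mean by round tick positions, here is a small sketch that places ticks on multiples of the interval instead of offsetting from the first base. `major_ticks` is a hypothetical helper and is not GenomeDiagram code.

```python
def major_ticks(start, end, interval):
    """Return tick positions that are exact multiples of interval."""
    # Round start up to the next multiple of interval
    first = ((start + interval - 1) // interval) * interval
    return list(range(first, end + 1, interval))


ticks = major_ticks(1, 35000, 10000)
```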

Third Query
-----------
I would like to have genome size "tick labels" in terms of kilo-bases or 
mega-bases (i.e. 3 Kb rather than 3000, or 2 Mb rather than 2000000).  I 
have done this myself by "hacking the source code" but my implementation 
is rather a special case.
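
A more general version of the label formatting might look like the sketch below. The name `format_tick` and the "Kb"/"Mb" suffixes are my own choices for illustration, not anything built into GenomeDiagram.

```python
def format_tick(position):
    """Format a base position using Kb or Mb where that is shorter."""
    for factor, suffix in ((10**6, "Mb"), (10**3, "Kb")):
        if position >= factor:
            # The "g" format drops trailing zeros, so 3.0 becomes "3"
            return f"{position / factor:g} {suffix}"
    return str(position)


labels = [format_tick(p) for p in (500, 1500, 3000, 2000000)]
```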

So, has anyone tried to tackle these issues before?

Thanks

Peter


From jchang at smi.stanford.edu  Fri Mar 24 09:06:11 2006
From: jchang at smi.stanford.edu (jchang at smi.stanford.edu)
Date: Fri, 24 Mar 2006 09:06:11 -0500
Subject: [BioPython] Lecturer needed for "Advanced Python"
In-Reply-To: <4423E03E.20604@nbn.ac.za>
References: <4423E03E.20604@nbn.ac.za>
Message-ID: <20060324140609.GA266@sophie.local>

Hello,

Ruediger Braeuning has asked me to forward this to the list.

Jeff


On Fri, Mar 24, 2006 at 02:04:14PM +0200, Ruediger Braeuning wrote:
> Hi,
> 
> I'm writing to you from the South African National Bioinformatics
> Network (NBN). We need a lecturer for our course in "Advanced Python" as
> Andrew Dalke (the lecturer of last year's "Advanced Python") is not
> available this year.
> 
> So if you are qualified and want to spend some time in Cape Town drop me
> a line.
> 
> Time
> ----
> We allocated 36 days for this module (Thursday, August 10th till Friday,
> September 29th, 2006). I know that this is a long time but as you can
> see from the daily schedule it's just 3 hours per day. For the rest of
> the day you are free and we can provide you with an office.
> 
> Expenses
> ---------
> We arrange and cover your flight, local transport, accommodation, meals.
> There is also a small honorarium of ZAR 300 per day of teaching.
> 
> Please find more details below:
> 
> The National Bioinformatics Network (NBN)
> ------------------------------------------
> The NBN was established to stimulate and support growth and development
> of Bioinformatics as a scientific and applied discipline in South Africa
> at an internationally competitive level.
> 
> The Course
> -----------
> We run national courses on an annual basis. Details of the courses
> (content, lecture material) that were run in 2004 & 2005 can be found
> under "Bioinformatics Workshop Modules" at
> http://www.nbn.ac.za/Education/course.html
> 
> The course content is aimed at covering as much of our Bioinformatics
> core curriculum as possible.
> 
> Your module
> ------------
> We have the following suggestions for your module:
> 
> - Data structures (object oriented design)
> - Libraries, BioPython
> - How to write larger pieces of code
> - Interface Design
> - Usability Testing Methodologies
> 
> Note: These are just suggestions. You are the expert and we would
> welcome your ideas. The students already got 35 days of introduction to
> Python (Feb - Apr 2006). I'd be more than happy to hook you up with the
> lecturer of that module.
> 
> IMPORTANT: Please let us know of any prerequisites you require your
> students to have. We can then make certain course modules a prerequisite
> for your module.
> - module on "Introductory Python"
> 
> Also let us know of any required reading your students have to do in
> preparation.
> 
> As your module is part of the bigger course on Bioinformatics we would
> like to encourage you to show the relevance of your module for the whole
> discipline and use Bioinformatics examples. The courses should start
> easy, get everybody on board and then go into detail. Lectures should be
> complemented by hands on sessions, which we believe is absolutely
> crucial to the success of teaching and training. Problem based teaching
> approaches worked best for our students. The NBN strongly emphasizes
> open source solutions. It would be great if you could support this by
> choosing your software accordingly.
> 
> Daily Schedule
> --------------
> 08:00-10:45    Python
> 10:45-11:00    Break
> 11:00-13:00    Lecture/Practical for another module
> 13:00-14:00    Lunch
> 14:00-15:30    Lecture/Practical for another module
> 15:30-15:45    Break
> 15:45-17:00    Lecture/Practical for another module
> 
> Students will be taught Python every day of the course. We encourage all
> our lecturers to exchange ideas for little tasks that are relevant for
> their module and can be tackled in Python with the Python lecturer.
> 
> Background of students
> -----------------------
> Course participants will come from a range of different backgrounds
> (Biology, Computer Science) and comprise people who study Bioinformatics
> (attendance is mandatory for students with a NBN bursary) and people who
> are "just" interested in particular aspects of Bioinformatics. Students
> will also vary in terms of seniority from undergraduates to postdocs
> with the majority being postgraduates. You should therefore expect a
> higher degree of heterogeneity of knowledge amongst the course
> participants than you would normally expect. This means you should be
> prepared to have some flexibility in adjusting your course schedule to
> the participants.
> 
> Number of students
> -------------------
> We expect a maximum of 25 students.
> 
> Assistants
> ----------
> Please let me know if you need some student assistants.
> 
> Facilities
> -----------
> Every participant will have her/his own PC to work on. A video projector
> and stable internet access is in place.
> 
> Evaluation and Assignments
> ---------------------------
> To give you and us feedback on the success of the course we will hand
> out evaluation questionnaires after each module. Students also have to
> take one assignment per course (our pass mark is 65%). In that respect
> we would like to ask you to provide and mark your assignment.
> 
> Should you require further information please don't hesitate to contact
> me at ruediger at nbn.ac.za or + 27 21 959 2991. I'm also more than happy
> to give you a call.
> 
> I'm looking forward to working with you.
> 
> Should you not be available I would be grateful if you could recommend
> another lecturer to me.
> 
> Ruediger
> -- 
> Ruediger Braeuning    /   National Bioinformatics Network
>              (=)  University of the Western Cape
> Ph. +27 21 959 2991   /   Private Bag X17
> Fax +27 21 959 3573  (=)  Bellville, 7535
> www.nbn.ac.za         /   South Africa

From Teemu.Kuulasmaa at uku.fi  Sat Mar 25 11:56:48 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Sat, 25 Mar 2006 18:56:48 +0200
Subject: [BioPython] biopython and dbSNP
Message-ID: <44257650.6090602@uku.fi>

Hi,

I am an absolute beginner in Python and Biopython, and I am trying to
familiarize myself with Biopython. I have a Java background, but I think
that Python (and Biopython) would be a better tool to automate my daily
routines. Python seems to be a superior language for quick (and sometimes
dirty) scripting compared to Java. I would like to write some Python
scripts that help me to work with SNPs and DNA/RNA sequences. I work
with SNPs on a daily basis.

However, I was disappointed because I didn't find any mention of dbSNP
in the Biopython documentation. To begin with, I would like to be able to
retrieve some SNP records from NCBI's dbSNP and parse them. Are there any
ready-made classes for that purpose? The GenBank.search_for('id',
'database=xxx',...) function doesn't seem to support a 'database=snp'
parameter.

To put it simply: am I able to work with dbSNP using Biopython?

Best regards,

Teemu Kuulasmaa


From dag at sonsorol.org  Sat Mar 25 18:50:57 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Sat, 25 Mar 2006 18:50:57 -0500
Subject: [BioPython] Important news for developers on open-bio machines
Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org>


Hi, apologies for the massive cross-post. I'll keep it short!

This message is a last-ditch attempt to contact people with developer  
accounts on pub.open-bio.org who may have not received the individual  
mails we've been sending via the obf-developers at lists.open-bio.org  
mailing list. We suspect that there are a number of devs out there  
for whom we don't have up to date email addresses.

All open-bio services have been migrated to new hardware and a new  
datacenter. Part of this migration process involved moving all  
developer accounts and all source-code repositories to a new server.  
The developer migration was completed a few minutes ago. An  
unavoidable side effect of the move is that all developers are now  
locked out of their accounts until they contact us for a password reset.

If you are a developer and this news comes as a surprise to you, it  
means we don't have your contact info. Your best way to get up to  
speed on the history and technical details behind the migration is to  
point your browser here:

http://lists.open-bio.org/mailman/private/obf-developers/2006-March/thread.html

... and read the various messages we've posted this month. Included  
in the first message is the information on how to request an account  
reset.


Regards,
Chris Dagdigian
open-bio.org


From biopython at maubp.freeserve.co.uk  Mon Mar 27 09:48:52 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Mon, 27 Mar 2006 15:48:52 +0100
Subject: [BioPython] Tweaking GenomeDiagram
In-Reply-To: <4424088E.9090004@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
Message-ID: <4427FB54.1040408@maubp.freeserve.co.uk>

Thanks Leighton,

I've included most of your reply for the benefit of the BioPython 
mailing list and its archive...

>>> First Query
>>> -----------
>>> I would like to attach labels to (selected) features.
>>> 
>>> For example, I am drawing a circular genome diagram with a selection of 
>>> colour coded genes - some of which I would like to have individually 
>>> labelled.  This might be done in a similar way to the genome size tick 
>>> captions (i.e. horizontal text) or perhaps rotated text (radially aligned).
>>> 
>>> However, as far as I can tell from the documentation and the source 
>>> code, this is not built in.

I clearly didn't read the right bit of the source code.

Leighton wrote:
> Each individual GDFeature has a label attribute, taking a Boolean, that
> allows you to set whether its label is displayed or not.  You could set
> this on feature creation, ...

e.g.

for feature in genbank_entry.features:
     if feature.type == 'CDS':
         gdfs.add_feature(feature, label=False, colour=colors.lightgreen)
     elif feature.type == 'tRNA' :
         gdfs.add_feature(feature, label=True, colour=colors.red)

(This can easily be used with the examples in the documentation)

Leighton wrote:
 > ... or at some later stage with a filter.  If you're working with
 > N.equitans, and your GDFeatureSet is called `gdfs1', for example, this
 > code:
> 
>     gdfs1.set_all_features('label', 0)
>     for feature in gdfs1.features.values():
>         print feature.name, feature.label
>         if feature.name.startswith('NEQ016'):
>             feature.label = 1

Maybe that should be gdfs._features rather than gdfs.features?

> will label only features whose names begin with NEQ016.  You'll probably
> already see how flexible this can be if you add your own attributes to
> GDFeature objects when they're created (BLAST scores, expression
> levels, membership of functional classes, etc.).  You can set the
> label_font, label_size, label_colour and label_angle attributes in the
> same kind of way.

By default, when using the add_feature method with a SeqFeature 
extracted from a GenBank file, GenomeDiagram will look at the 'gene', 
'label', 'locus_tag' and 'product' qualifiers for potential labels (in 
that order).  (See code in GDFeature.py class GDFeature.)

It might be useful to be able to supply the preferred label caption as 
part of the add_feature command.

Thanks

Peter


From lpritc at scri.sari.ac.uk  Mon Mar 27 04:17:01 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Mon, 27 Mar 2006 10:17:01 +0100
Subject: [BioPython] Tweaking GenomeDiagram
Message-ID: <1143451021.18558.228.camel@lplinuxdev>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/biopython/attachments/20060327/9aaca755/attachment.ksh 
-------------- next part --------------
An embedded message was scrubbed...
From: Leighton Pritchard <lpritc at scri.sari.ac.uk>
Subject: Re: Tweaking GenomeDiagram
Date: Mon, 27 Mar 2006 10:17:01 +0100
Size: 5030
Url: http://lists.open-bio.org/pipermail/biopython/attachments/20060327/9aaca755/attachment.mht 

From lpritc at scri.sari.ac.uk  Mon Mar 27 10:42:10 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Mon, 27 Mar 2006 16:42:10 +0100
Subject: [BioPython] Tweaking GenomeDiagram
In-Reply-To: <4427FB54.1040408@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
	<4427FB54.1040408@maubp.freeserve.co.uk>
Message-ID: <1143474132.18558.260.camel@lplinuxdev>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/biopython/attachments/20060327/ad206104/attachment.ksh 
-------------- next part --------------
An embedded message was scrubbed...
From: Leighton Pritchard <lpritc at scri.sari.ac.uk>
Subject: Re: [BioPython] Tweaking GenomeDiagram
Date: Mon, 27 Mar 2006 16:42:10 +0100
Size: 5015
Url: http://lists.open-bio.org/pipermail/biopython/attachments/20060327/ad206104/attachment.mht 

From biopython at maubp.freeserve.co.uk  Mon Mar 27 13:03:14 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Mon, 27 Mar 2006 19:03:14 +0100
Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram
 0.2
In-Reply-To: <4424088E.9090004@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
Message-ID: <442828E2.3040703@maubp.freeserve.co.uk>

Peter wrote:
 >
 > This email is mainly aimed at Leighton Pritchard (who I have spotted
 > posting on the list in the past) as it concerns his (bio)python
 > add-on, GenomeDiagram:
 >
 > http://bioinf.scri.ac.uk/lp/programs.html#genomediagram
 >
 > First Query
 > -----------
 > I would like to attach labels to (selected) features.

See earlier email with example - I hadn't looked closely enough.

http://www.biopython.org/pipermail/biopython/2006-March/002967.html

 > Second Query
 > -------------
 > When drawing circular genomes following the examples, the major tick
 > marks seem to be at 1, 10001, 20001, ... (depending on the tick
 > interval size).
 >
 > It would look much better to display 10000 rather than 10001 (or even,
 > leading to my third question, 10 Kb).

Fixed in GenomeDiagram release 0.2

 > Third Query
 > -----------
 > I would like to have genome size "tick labels" in terms of kilo-bases
 > or mega-bases (i.e. 3 Kb rather than 3000, or 2 Mb rather than
 > 2000000).

This is a new option in GenomeDiagram release 0.2, see below.

Leighton has also improved the positioning of the size captions on the 
lower half of circular diagrams, and probably other things as well.

The following email was sent to me, and I am forwarding it to the 
mailing list because Leighton's PGP signature was confusing the server.

Thanks again,

Peter

--------------------------------------------------------------------

Hi Peter,

I think I've implemented everything you asked for, and the new source
and Windows installer are located at:

http://bioinf.scri.sari.ac.uk/lp/programs.php

and

http://bioinf.scri.sari.ac.uk/lp/programs.html#genomediagram

(take your pick).

To use the new features, you need to do the following sort of thing:


     parser = GenBank.FeatureParser()
     fhandle = open('/data/genomes/Bacteria/Nanoarchaeum_equitans/NC_005213.gbk','r')
     genbank_entry = parser.parse(fhandle)
     fhandle.close()
     gdd = GDDiagram('Test Diagram')

     gdfs1 = GDFeatureSet(name='CDS features')
     for feature in genbank_entry.features:
         if feature.type == 'CDS':
             # This is how you can override any attribute of the
             # GDFeature as you add it to the GDFeatureSet, just by
             # passing the appropriate keyword and argument
             gdfs1.add_feature(feature, name="Some feature or other")
     # By passing the scale_format = "SInt" argument, you can use SI-like
     # suffixes for scale markers.  So far we only have Kbp and Mbp
     # suffixes, and the default goes to just a string of the marker
     # base position.
     gdt1 = GDTrack('CDS features', greytrack=1,
                    scale_largetick_interval=1e4,
                    scale_smalltick_interval=1e3,
                    scale_format = "SInt")
     gdt1.add_set(gdfs1)
     # You can now do regular expression comparisons, startswith
     # comparisons, exclusions or just plain matches to any GDFeature
     # attribute, just by passing the appropriate attribute, value and
     # comparator mode
     mod_features = gdfs1.get_features('name', 'NEQ0[2-4]', 'like')
     #mod_features = gdfs1.get_features('name', 'NEQ02', 'startswith')
     #mod_features = gdfs1.get_features('name', 'NEQ05', 'not')
     #mod_features = gdfs1.get_features('name', 'NEQ016')
     for feature in mod_features:
         feature.label = 1

And, finally, the marker labels in the lower halves of GenomeDiagram
images have been lowered so that they hit the marker line at the top of
the string, rather than the bottom.

Phew!

L.

-- 
Dr Leighton Pritchard AMRSC
D131, Plant-Pathogen Interactions, Scottish Crop Research Institute
Invergowrie, Dundee, Scotland, DD2 5DA, UK
T: +44 (0)1382 562731 x2405 F: +44 (0)1382 568578
E: lpritc at scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/lp


From Teemu.Kuulasmaa at uku.fi  Tue Mar 28 01:59:27 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Tue, 28 Mar 2006 09:59:27 +0300
Subject: [BioPython] biopython and dbSNP (2)
Message-ID: <4428DECF.4070406@uku.fi>

Hi,

I did some experiments and got GenBank.search_for() and 
GenBank.download_many() to work with dbSNP. However, I didn't succeed in 
getting GenBank.NCBIDictionary() to work. I do not know if this is the right 
way to do it. It would be nice if someone (biopython-dev) could comment on 
the matter.

Here are two very small diffs (against biopython version 1.41) that were 
required to get dbSNP sequence retrieval to work:
----------------------------------------------------------
ubuntu at ubuntu:~/src/biopython$ diff 
/usr/lib/python2.4/site-packages/Bio/EUtils/Config.py Config.py
58c58
< databases.SNP = _add_db(DatabaseInfo("snp", 1))

ubuntu at ubuntu:~/src/biopython$ diff 
/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py __init__.py
1422,1423d1421
<     elif database in ['snp']:
<         format = 'fasta'


Here is an example of how it works after these modifications:
----------------------------------------------------------
ubuntu at ubuntu:~$ python
Python 2.4.2 (#2, Mar  5 2006, 00:03:25)
[GCC 4.0.3 20060212 (prerelease) (Ubuntu 4.0.2-9ubuntu1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> from Bio import GenBank
 >>> snps = GenBank.search_for('rs8192602', 'snp')
 >>> snps
['8192602']
 >>> seqs = GenBank.download_many(snps, 'snp')
 >>> print seqs.read()

1: rs8192602 [Homo sapiens]
 >gnl|dbSNP|rs8192602 
rs=8192602|pos=272|len=397|taxid=9606|mol="genomic"|class=1|alleles="A/G"|build=117
TGGCAGAGTG GGGAGTAGGA GGGTAGTGCC AGTGAGTAAA CCAGACTCCA TACCTTAAGC 
TCAACTCCTA TCCCTTTGTC
GCCTCCCAAC CCCAGTCATG GCTGAGTACG GGACCCTCCT GCAAGACCTG ACCAACAACA 
TCACCCTTGA AGATCTAGAA
CAGCTCAAGT CGGCCTGCAA GGAAGACATC CCCAGCGAAA AGAGTGAGGA GATCACTACT 
GGCAGTGCCT GGTTTAGCTT
CCTGGAGAGC CACAACAAGC TGGACAAAGG T
R
GGGGAGGGGA GCACAGGGGT CCTGTCATCA GTCATTCAGG CTCAGTTCAT TCAGCAAATA 
GAGATGAGCT CAAAGCTTTT
ACATCCACAA TGTGTACCCC TCTATAGCAA GGCAGAAGAG AGGTG


Best regards,

Teemu Kuulasmaa


From biopython at maubp.freeserve.co.uk  Tue Mar 28 12:11:12 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 28 Mar 2006 18:11:12 +0100
Subject: [BioPython] biopython and dbSNP (2)
In-Reply-To: <4428DECF.4070406@uku.fi>
References: <4428DECF.4070406@uku.fi>
Message-ID: <44296E30.1070809@maubp.freeserve.co.uk>

Teemu Kuulasmaa wrote:
> Hi,
> 
> I made some experimentations and got GenBank.search_for() and 
> GenBank.download_many() to work with dbSNP. However, I didn't succeed to 
> get GenBank.NCBIDictionary() to work. I do not know if this is right way 
> to do it. It would by nice if someone (biopython-dev) could speak out on 
> the matter.
> 
> Here are two very small diffs (against biopython version 1.41) that were 
> required to get dbSNP sequence retrieval to work:

I'm not familiar with this aspect of the GenBank support, but your code 
looks OK to me.

I tried your two changes on the CVS version of EUtils and GenBank and it 
works for me (the GenBank file has had significant changes to the file 
parser).

One question is whether the GenBank.search_for() and 
GenBank.download_many() functions are intended just for "GenBank" 
(officially just the nucleotides?) or also for other sequence-based 
EUtils databases like proteins, snp, ..., or even genomes.

Unless anyone else cares to comment, I'll commit Teemu's two small 
changes in the next few days.

As to getting GenBank.NCBIDictionary() to work with the snp database, 
it's not as easy as it looks.

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 28 13:06:08 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 28 Mar 2006 19:06:08 +0100
Subject: [BioPython] biopython and dbSNP (2)
In-Reply-To: <44296E30.1070809@maubp.freeserve.co.uk>
References: <4428DECF.4070406@uku.fi> <44296E30.1070809@maubp.freeserve.co.uk>
Message-ID: <44297B10.9070000@maubp.freeserve.co.uk>

Peter (BioPython List) wrote:
> Teemu Kuulasmaa wrote:
> 
>>Hi,
>>
>>I made some experimentations and got GenBank.search_for() and 
>>GenBank.download_many() to work with dbSNP. However, I didn't succeed to 
>>get GenBank.NCBIDictionary() to work. I do not know if this is right way 
>>to do it. It would by nice if someone (biopython-dev) could speak out on 
>>the matter.
>>
>>Here are two very small diffs (against biopython version 1.41) that were 
>>required to get dbSNP sequence retrieval to work:
> 
> 
> I'm not familiar with this aspect of the GenBank support, but your code 
> looks OK to me.
> 
> I tried your two changes on the CVS version of EUtils and GenBank and it 
> works for me (the GenBank file has had significant changes to the file 
> parser).
> 
> One question is are the GenBank.search_for() and GenBank.download_many() 
> functions intended just for "GenBank" (officially just the nucleotides?) 
> or other sequence based EUtils databases like proteins, snp, ..., or 
> even genomes.
> 
> Unless anyone else cares to comment, I'll commit Teemu's two small 
> changes in the next few days.
> 
> As to getting GenBank.NCBIDictionary() to work with the snp database, 
> its not as easy as it looks.

Trying this with SNP's (having applied Teemu Kuulasmaa's changes) we get 
back "mangled FASTA entries" with additional headers and blank lines.

Ignoring the spaces in the sequences (which appear mostly in ten 
nucleotide blocks with a space in between) we get:

 >>> seqs = GenBank.download_many(['8192602','8192603'], 'snp')
 >>> print seqs.read()

1: rs8192602 [Homo sapiens]
 >gnl|dbSNP|rs8192602 ...
TGGCAGAGTG...

2: rs8192603 [Homo sapiens]
 >gnl|dbSNP|rs8192603 ...
TGGTGGGCAG...

The blank lines shouldn't be a problem for the BioPython's FASTA parser.

However, because of the extra lines, which look like "{Result Number}: 
{Identifier} [{Species}]", this is NOT a valid FASTA format file.
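
As a stop-gap, the extra lines could be filtered out before parsing. The following is only a sketch under stated assumptions: clean_snp_fasta() is a hypothetical helper (not part of Biopython), the regular expression assumes the extra lines always look like "1: rs8192602 [Homo sapiens]", and the sample text is an abbreviated version of the output shown above.

```python
import re

# Assumed shape of the extra lines: "1: rs8192602 [Homo sapiens]"
RESULT_HEADER = re.compile(r"^\d+: \S+ \[.*\]\s*$")


def clean_snp_fasta(text):
    """Drop result-number and blank lines; strip spaces from sequence lines."""
    cleaned = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or RESULT_HEADER.match(stripped):
            continue  # blank line or "N: identifier [species]" line
        if stripped.startswith(">"):
            cleaned.append(stripped)  # FASTA title line, kept as-is
        else:
            cleaned.append(stripped.replace(" ", ""))  # sequence line
    return "\n".join(cleaned) + "\n"


# Abbreviated sample in the style of the EUtils output above.
mangled = """
1: rs8192602 [Homo sapiens]
 >gnl|dbSNP|rs8192602 rs=8192602|pos=272
TGGCAGAGTG GGGAGTAGGA
R
GGGGAGGGGA

2: rs8192603 [Homo sapiens]
 >gnl|dbSNP|rs8192603
TGGTGGGCAG
"""
print(clean_snp_fasta(mangled))
```

Note that the single-letter ambiguity line (R) is kept as part of the sequence, which may or may not be what a downstream parser expects.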

This may be an NCBI EUtils problem... following their FAQ, I tested this 
URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602&report=FASTA

and this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602,8192603&report=FASTA

And it does the same sort of thing :(

I have emailed the NCBI...

Peter


From Teemu.Kuulasmaa at uku.fi  Wed Mar 29 04:07:00 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Wed, 29 Mar 2006 12:07:00 +0300
Subject: [BioPython]  biopython and dbSNP (2)
References: 44296E30.1070809@maubp.freeserve.co.uk
Message-ID: <442A4E34.8050206@uku.fi>

> Peter (BioPython List) wrote:
> 
> The blank lines shouldn't be a problem for the BioPython's FASTA parser.
> 
> However, due to the extra lines look like "{Result Number}: {Identifier} 
> [{Species}]" this is NOT a valid FASTA format file.
> 
> This may be an NCBI EUtils problem... following their FAQ, I tested this 
> URL:
> 
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602&report=FASTA
> 
> and this:
> 
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602,8192603&report=FASTA
> 
> And it does the same sort of thing :(
> 
> I have emailed the NCBI...
> 
> Peter

Thank you for your response Peter!

As you said, the NCBI EUtils result is not a valid FASTA formatted file. I 
hope that NCBI will fix this issue soon.

Let us know if you get any kind of feedback from NCBI!

Teemu

-- 
  Teemu Kuulasmaa, M.Sc.

  University of Kuopio
  Laboratory of Internal Medicine
  P.O.Box 1627
  70211 Kuopio
  FINLAND

  Tel	+358 1716 3498
  Fax	+358 1716 2445


From biopython at maubp.freeserve.co.uk  Wed Mar 29 08:18:30 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Wed, 29 Mar 2006 14:18:30 +0100
Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram
 0.21
In-Reply-To: <442828E2.3040703@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
	<442828E2.3040703@maubp.freeserve.co.uk>
Message-ID: <442A8926.8040101@maubp.freeserve.co.uk>

There was a packaging problem with GenomeDiagram 0.2 (missing new module 
Observer), which Leighton has fixed with the release of GenomeDiagram 
0.21, available here:

http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

This also adds a dpi option to GDDiagram.write() for raster output - 
which is handy for generating high resolution PNG files.

Peter


From lpritc at scri.sari.ac.uk  Wed Mar 29 09:05:21 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Wed, 29 Mar 2006 15:05:21 +0100
Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome
	Diagram	0.21
In-Reply-To: <442A8926.8040101@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
	<442828E2.3040703@maubp.freeserve.co.uk>
	<442A8926.8040101@maubp.freeserve.co.uk>
Message-ID: <1143641121.4788.9.camel@lplinuxdev>

On Wed, 2006-03-29 at 14:18 +0100, Peter (BioPython List) wrote:
> There was a packing problem with GenomeDiagram 0.2 (missing new module 
> Observer), 

The problem was me - the packaging did exactly what I told it to, more's
the pity ;)

> which Leighton has fixed with the release of GenomeDiagram 
> 0.21, available here:
> 
> http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

I'll leave the advertising in...

-- 
Dr Leighton Pritchard AMRSC
D131, Plant-Pathogen Interactions, Scottish Crop Research Institute
Invergowrie, Dundee, Scotland, DD2 5DA, UK
T: +44 (0)1382 562731 x2405 F: +44 (0)1382 568578
E: lpritc at scri.sari.ac.uk   W: http://bioinf.scri.sari.ac.uk/lp
GPG/PGP: FEFC205C E58BA41B  http://www.keyserver.net             
(If the signature does not verify, please remove the SCRI disclaimer)


From halima at mancala.cbio.uct.ac.za  Thu Mar 30 02:17:10 2006
From: halima at mancala.cbio.uct.ac.za (Halima Rabiu)
Date: Thu, 30 Mar 2006 09:17:10 +0200 (SAST)
Subject: [BioPython] Need help on NCBIStandaloneblast
Message-ID: <Pine.LNX.4.58.0603300915090.7802@mancala.cbio.uct.ac.za>

Hi everybody,
I am new to Biopython and am having problems with "NCBIStandalone.blastall".
After launching BLAST with "doBlast" it looks like it runs and finishes,
but when I check the output it is empty. When I try the same thing on the
command line it works and I get results.
I attach my code.

I also went through the previous posts on the Biopython mailing list and
found a similar problem posted by Andreas, but no solution to it.
Can somebody please help?
Thanks
Nike
-------------- next part --------------
#! /usr/local/bin/python2.4
#halimah
#20-03-2006
from Bio.Blast import NCBIStandalone
import os
# path to my database
data=os.path.join(os.getcwd(),"Newprotein.db","Nprotein.Fdb")
# input file (protein sequence in fasta )
infile=os.path.join(os.getcwd(),"Newprotein.db","mytest.txt",'r')

# path to Blastall executable
blast_exe=os.path.join("/","usr","local","blast","bin","blastall")

output,error_info =NCBIStandalone.blastall(blast_exe,"blastp", data, infile)
#print output.readline()

save_file =open("blast.out","w")
blast_result=output.read()
save_file.write(blast_result)
save_file.close()
	

blastfile = open('blast.out', 'r')
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(blastfile, b_parser)
while 1:
    b_record = b_iterator.next()
    if b_record is None:
        break
    # This will parse the BLAST report into a Blast Record class (either a
    # Blast or a PSIBlast record, depending on what you are parsing) so that
    # you can extract the information from it. In our case, let's just print
    # out a quick summary of all of the alignments greater than some
    # threshold value.

    E_VALUE_THRESH = 1.00
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print '****Alignment****'
                print 'sequence:', alignment.title
                print 'length:', alignment.length
                print 'e value:', hsp.expect
                print hsp.query[0:75] + '...'
                print hsp.match[0:75] + '...'
                print hsp.sbjct[0:75] + '...'


From biopython at maubp.freeserve.co.uk  Thu Mar 30 10:56:29 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Thu, 30 Mar 2006 16:56:29 +0100
Subject: [BioPython] Need help on NCBIStandaloneblast
In-Reply-To: <Pine.LNX.4.58.0603300915090.7802@mancala.cbio.uct.ac.za>
References: <Pine.LNX.4.58.0603300915090.7802@mancala.cbio.uct.ac.za>
Message-ID: <442BFFAD.10103@maubp.freeserve.co.uk>

Halima Rabiu wrote:
> Hi everyboby ;
> I am new to biopython having problems with the "NCBIStandalone.blastall".
> After launching the Blast with "doBlast" it look like runs and end
> and then I check the output it empty and I try same thing using comand
> line it work and get result.
> I attch my code.

Have you checked the paths are correct, e.g.

assert os.path.isfile(data), "Missing database file " + data
assert os.path.isfile(infile), "Missing input file " + infile

You don't need to check blast_exe yourself, as the blastall command does 
this for you.
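
A small stand-alone sketch of that path check follows. One thing it illustrates: in the posted script, infile is built with a trailing 'r' argument to os.path.join(), and os.path.join() treats every argument as a path component, so the resulting path ends in "mytest.txt/r" (on Unix). That may be why BLAST finds no input, though this is a guess rather than a confirmed diagnosis. The missing_paths() helper is a hypothetical name.

```python
import os


def missing_paths(paths):
    """Return the subset of paths that are not existing regular files."""
    return [path for path in paths if not os.path.isfile(path)]


base = os.path.join(os.getcwd(), "Newprotein.db")

# As posted: the trailing 'r' becomes an extra path component.
infile_as_posted = os.path.join(base, "mytest.txt", "r")

# Probably intended: just the file (an open mode belongs in open(), not here).
infile_intended = os.path.join(base, "mytest.txt")

print(infile_as_posted)
print(infile_intended)
print("Missing:", missing_paths([infile_as_posted, infile_intended]))
```

Printing the paths before calling blastall makes this kind of slip easy to spot.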

If I understood you correctly, the "blast.out" file is empty.

Did blast return any error message?  Try:

print error_info.read()

or:

save_file =open("blast.error","w")
blast_result=error_info.read()
save_file.write(blast_result)
save_file.close()

Next question, could you tell us what you typed at the command line 
which does work?

> I also try to go though the previous posts on biopython mailing list fund
> similar problem post by Andreas but no solution to the problem .

It was worth checking anyway :)

Peter


From mdehoon at c2b2.columbia.edu  Fri Mar 31 12:22:13 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 31 Mar 2006 12:22:13 -0500
Subject: [BioPython] BOSC announcement
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEDC@cgcmail.cgc.cpmc.columbia.edu>

MEETING ANNOUNCEMENT & CALL FOR SPEAKERS

The 7th annual Bioinformatics Open Source Conference (BOSC 2006) is
organized by the not-for-profit Open Bioinformatics Foundation. The meeting
will take place August 4th and 5th in Fortaleza, Brazil, and is one of
several Special Interest Group (SIG) meetings occurring in conjunction with
the 14th International Conference on Intelligent Systems for Molecular
Biology.  Please consult the official BOSC 2006 website at

http://www.open-bio.org/wiki/BOSC_2006

for details and information.

In addition, a BOSC weblog has been set up to make it easier to disseminate
all BOSC-related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an iCal-compatible calendar, there is an EventDB calendar set
up with all BOSC-related deadlines:

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/


Thank You, and we look forward to seeing you all,
The BOSC Organizing Committee.


From mdehoon at c2b2.columbia.edu  Sat Mar 18 20:40:51 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sat, 18 Mar 2006 15:40:51 -0500
Subject: [BioPython] Test - please ignore
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEC4@cgcmail.cgc.cpmc.columbia.edu>

Just testing if I can send to this mailing list. One of our users complained
that his messages were getting bounced, although he is a member of this
mailing list.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From biopython at maubp.freeserve.co.uk  Tue Mar 21 12:30:16 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython))
Date: Tue, 21 Mar 2006 12:30:16 +0000
Subject: [BioPython] EMBOSS programs and their alignment formats
Message-ID: <441FF1D8.2060904@maubp.freeserve.co.uk>

I've been having a look at BioPython's Emboss support and it looks like 
a (partial) set of command line interfaces to the tools, with additional 
code for some of the primer tools and their formats.

As far as I can tell, there is no support for any of the Emboss 
alignment output formats:

http://emboss.sourceforge.net/docs/themes/AlignFormats.html

Some (all?) of the alignment programs will happily produce gapped FASTA 
output, but this excludes other information like the alignment score 
etc.  The alignments themselves could be analysed to extract the 
alignment length, identity, similarity and gap counts.
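
As a rough illustration of that idea, the sketch below counts the alignment length, identical columns and gap columns for two gapped sequences. The function name pairwise_stats() is hypothetical, and similarity is left out on purpose because it depends on the scoring matrix, which the gapped FASTA output does not record.

```python
def pairwise_stats(seq_a, seq_b, gap="-"):
    """Count length, identical columns and gap columns of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(seq_a, seq_b))
    # A column is identical when both residues match and neither is a gap.
    identity = sum(1 for a, b in columns if a == b and a != gap)
    # A column counts as a gap when either sequence has the gap character.
    gaps = sum(1 for a, b in columns if a == gap or b == gap)
    return {"length": len(columns), "identity": identity, "gaps": gaps}


# Two short made-up gapped sequences.
stats = pairwise_stats("ACGT-ACGT", "ACGTTAC-A")
print(stats)
```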

However, the FASTA format does not include the algorithm specific score, 
nor other program parameters which might be of interest (like the matrix 
and gap penalties).

e.g.

########################################
# Program:  demoalign
# Rundate:  Thu Jan 17 09:30:08 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 9
# Extend_penalty: -1
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
#
#
#=======================================

(followed by the aligned sequences)
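
Pulling the header values out of such a file is not hard in itself. The sketch below is not an existing Biopython parser; parse_emboss_header() is a hypothetical function that assumes every header line starts with "#" and that values of interest are written as "# Key: value", as in the example above.

```python
def parse_emboss_header(lines):
    """Collect '# Key: value' comment lines into a dictionary of strings."""
    info = {}
    for line in lines:
        if not line.startswith("#"):
            break  # first non-comment line: the header is over
        content = line.lstrip("#").strip()
        if ":" not in content:
            continue  # separator or empty comment line
        key, value = content.split(":", 1)
        info[key.strip()] = value.strip()
    return info


# Shortened copy of the example header above.
example = """\
########################################
# Program:  demoalign
# Rundate:  Thu Jan 17 09:30:08 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 4
# Matrix: EBLOSUM62
# Gap_penalty: 9
# Extend_penalty: -1
#
# Length: 131
# Identity:      95/131 (72.5%)
#=======================================
""".splitlines()

header = parse_emboss_header(example)
print(header["Matrix"], header["Identity"])
```

All values are kept as strings; converting "95/131 (72.5%)" into numbers is left to the caller.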

Has anyone tackled supporting these files in BioPython?

Thanks

Peter


From biopython at maubp.freeserve.co.uk  Fri Mar 24 14:56:14 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Fri, 24 Mar 2006 14:56:14 +0000
Subject: [BioPython] Tweaking GenomeDiagram
Message-ID: <4424088E.9090004@maubp.freeserve.co.uk>

This email is mainly aimed at Leighton Pritchard (who I have spotted 
posting on the list in the past) as it concerns his (bio)python add-on, 
GenomeDiagram:

http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

First Query
-----------
I would like to attach labels to (selected) features.

For example, I am drawing a circular genome diagram with a selection of 
colour coded genes - some of which I would like to have individually 
labelled.  This might be done in a similar way to the genome size tick 
captions (i.e. horizontal text) or perhaps rotated text (radially aligned).

However, as far as I can tell from the documentation and the source 
code, this is not built in.

Second Query
-------------
When drawing circular genomes following the examples, the major tick 
marks seem to be at 1, 10001, 20001, ... (depending on the tick interval 
size).

It would look much better to display 10000 rather than 10001 (or even, 
leading to my third question, 10 Kb).
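
To make the request concrete, here is one way tick positions could be snapped to exact multiples of the interval. This is only a sketch: tick_positions() is a hypothetical helper, not GenomeDiagram code, and it assumes integer arguments.

```python
def tick_positions(start, end, interval):
    """Return tick positions at exact multiples of interval within [start, end]."""
    # Round start up to the next multiple of interval.
    first = ((start + interval - 1) // interval) * interval
    return list(range(first, end + 1, interval))


# A 45 kb genome numbered from 1, with a tick every 10000 bases.
print(tick_positions(1, 45000, 10000))
```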

Third Query
-----------
I would like to have genome size "tick labels" in terms of kilo-bases or 
mega-bases (i.e. 3 Kb rather than 3000, or 2 Mb rather than 2000000).  I 
have done this myself by "hacking the source code" but my implementation 
is rather special case.
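
For illustration, a general version of such a label formatter might look like the sketch below. The function name format_position() and the choice to add a suffix only when the position divides evenly are assumptions made here, not a description of GenomeDiagram.

```python
def format_position(position):
    """Format a base position with a Kb or Mb suffix when it divides evenly."""
    if position >= 1000000 and position % 1000000 == 0:
        return "%d Mb" % (position // 1000000)
    if position >= 1000 and position % 1000 == 0:
        return "%d Kb" % (position // 1000)
    # Anything else is shown as the plain number.
    return str(position)


for position in (3000, 2000000, 10001):
    print(position, "->", format_position(position))
```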

So, has anyone tried to tackle these issues before?

Thanks

Peter


From jchang at smi.stanford.edu  Fri Mar 24 14:06:11 2006
From: jchang at smi.stanford.edu (jchang at smi.stanford.edu)
Date: Fri, 24 Mar 2006 09:06:11 -0500
Subject: [BioPython] Lecturer needed for "Advanced Python"
In-Reply-To: <4423E03E.20604@nbn.ac.za>
References: <4423E03E.20604@nbn.ac.za>
Message-ID: <20060324140609.GA266@sophie.local>

Hello,

Ruediger Braeuning has asked me to forward this to the list.

Jeff


On Fri, Mar 24, 2006 at 02:04:14PM +0200, Ruediger Braeuning wrote:
> Hi,
> 
> I'm writing to you from the South African National Bioinformatics
> Network (NBN). We need a lecturer for our course in "Advanced Python" as
> Andrew Dalke (the lecturer of last year's "Advanced Python") is not
> available this year.
> 
> So if you are qualified and want to spend some time in Cape Town drop me
> a line.
> 
> Time
> ----
> We allocated 36 days for this module (Thursday, August 10th till Friday,
> September 29th, 2006). I know that this is a long time but as you can
> see from the daily schedule it's just 3 hours per day. For the rest of
> the day you are free and we can provide you with an office.
> 
> Expenses
> ---------
> We arrange and cover your flight, local transport, accommodation, meals.
> There is also a small honorarium of ZAR 300 per day of teaching.
> 
> Please find more details below:
> 
> The National Bioinformatics Network (NBN)
> ------------------------------------------
> The NBN was established to stimulate and support growth and development
> of Bioinformatics as a scientific and applied discipline in South Africa
> at an internationally competitive level.
> 
> The Course
> -----------
> We run national courses on an annual basis. Details of the courses
> (content, lecture material) that were run in 2004 & 2005 can be found
> under "Bioinformatics Workshop Modules" at
> http://www.nbn.ac.za/Education/course.html
> 
> The course content is aimed at covering as much of our Bioinformatics
> core curriculum as possible.
> 
> Your module
> ------------
> We have the following suggestions for your module:
> 
> - Data structures (object oriented design)
> - Libraries, BioPython
> - How to write larger pieces of code
> - Interface Design
> - Usability Testing Methodologies
> 
> Note: These are just suggestions. You are the expert and we would
> welcome your ideas. The students already got 35 days of introduction to
> Python (Feb - Apr 2006). I'd be more than happy to hook you up with the
> lecturer of that module.
> 
> IMPORTANT: Please let us know of any prerequisites you require your
> students to have. We can then make certain course modules a prerequisite
> for your module.
> - module on "Introductory Python"
> 
> Also let us know of any required reading your students have to do in
> preparation.
> 
> As your module is part of the bigger course on Bioinformatics we would
> like to encourage you to show the relevance of your module for the whole
> discipline and use Bioinformatics examples. The courses should start
> easy, get everybody on board and then go into detail. Lectures should be
> complemented by hands on sessions, which we believe is absolutely
> crucial to the success of teaching and training. Problem based teaching
> approaches worked best for our students. The NBN strongly emphasizes
> open source solutions. It would be great if you could support this by
> choosing your software accordingly.
> 
> Daily Schedule
> --------------
> 08:00-10:45    Python
> 10:45-11:00    Break
> 11:00-13:00    Lecture/Practical for another module
> 13:00-14:00    Lunch
> 14:00-15:30    Lecture/Practical for another module
> 15:30-15:45    Break
> 15:45-17:00    Lecture/Practical for another module
> 
> Students will be taught Python every day of the course. We encourage all
> our lecturers to exchange ideas for little tasks that are relevant for
> their module and can be tackled in Python with the Python lecturer.
> 
> Background of students
> -----------------------
> Course participants will come from a range of different backgrounds
> (Biology, Computer Science) and comprise people who study Bioinformatics
> (attendance is mandatory for students with a NBN bursary) and people who
> are "just" interested in particular aspects of Bioinformatics. Students
> will also vary in terms of seniority from undergraduates to postdocs
> with the majority being postgraduates. You should therefore expect a
> higher degree of heterogeneity of knowledge amongst the course
> participants than you would normally expect. This means you should be
> prepared to have some flexibility in adjusting your course schedule to
> the participants.
> 
> Number of students
> -------------------
> We expect a maximum of 25 students.
> 
> Assistants
> ----------
> Please let me know if you need some student assistants.
> 
> Facilities
> -----------
> Every participant will have her/his own PC to work on. A video projector
> and stable internet access is in place.
> 
> Evaluation and Assignments
> ---------------------------
> To give you and us feedback on the success of the course we will hand
> out evaluation questionnaires after each module. Students also have to
> take one assignment per course (our pass mark is 65%). In that respect
> we would like to ask you to provide and mark your assignment.
> 
> Should you require further information please don't hesitate to contact
> me at ruediger at nbn.ac.za or + 27 21 959 2991. I'm also more than happy
> to give you a call.
> 
> I'm looking forward to working with you.
> 
> Should you not be available I would be grateful if you could recommend
> another lecturer to me.
> 
> Ruediger
> -- 
> Ruediger Braeuning    /   National Bioinformatics Network
>              (=)  University of the Western Cape
> Ph. +27 21 959 2991   /   Private Bag X17
> Fax +27 21 959 3573  (=)  Bellville, 7535
> www.nbn.ac.za         /   South Africa


From Teemu.Kuulasmaa at uku.fi  Sat Mar 25 16:56:48 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Sat, 25 Mar 2006 18:56:48 +0200
Subject: [BioPython] biopython and dbSNP
Message-ID: <44257650.6090602@uku.fi>

Hi,

I am an absolute beginner in Python and Biopython, and I am trying to
familiarize myself with Biopython. I have a Java background, but I think
that Python (and Biopython) would be a better tool for automating my daily
routines. Python seems to be a superior language for quick (and sometimes
dirty) scripting compared to Java. I would like to write some Python
scripts that help me work with SNPs and DNA/RNA sequences. I work
with SNPs on a daily basis.

However, I was disappointed because I didn't find any mention of dbSNP
in the Biopython documentation. To begin with, I would like to be able to
retrieve some SNP records from NCBI's dbSNP and parse them. Are there any
ready-made classes for that purpose? The GenBank.search_for('id',
'database=xxx',...) function doesn't seem to support a 'database=snp'
parameter.

To put it simply: can I work with dbSNP using Biopython?

Best regards,

Teemu Kuulasmaa


From dag at sonsorol.org  Sat Mar 25 23:50:57 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Sat, 25 Mar 2006 18:50:57 -0500
Subject: [BioPython] Important news for developers on open-bio machines
Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org>


Hi, apologies for the massive cross-post. I'll keep it short!

This message is a last-ditch attempt to contact people with developer  
accounts on pub.open-bio.org who may not have received the individual  
mails we've been sending via the obf-developers at lists.open-bio.org  
mailing list. We suspect that there are a number of devs out there  
for whom we don't have up-to-date email addresses.

All open-bio services have been migrated to new hardware and a new  
datacenter. Part of this migration process involved moving all  
developer accounts and all source-code repositories to a new server.  
The developer migration was completed a few minutes ago. An  
unavoidable side effect of the move is that all developers are now  
locked out of their accounts until they contact us for a password reset.

If you are a developer and this news comes as a surprise to you, it  
means we don't have your contact info. Your best way to get up to  
speed on the history and technical details behind the migration is to  
point your browser here:

http://lists.open-bio.org/mailman/private/obf-developers/2006-March/thread.html

... and read the various messages we've posted this month. Included  
in the first message is the information on how to request an account  
reset.


Regards,
Chris Dagdigian
open-bio.org


From biopython at maubp.freeserve.co.uk  Mon Mar 27 14:48:52 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Mon, 27 Mar 2006 15:48:52 +0100
Subject: [BioPython] Tweaking GenomeDiagram
In-Reply-To: <4424088E.9090004@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
Message-ID: <4427FB54.1040408@maubp.freeserve.co.uk>

Thanks Leighton,

I've included most of your reply for the benefit of the BioPython 
mailing list and its archive...

>>> First Query
>>> -----------
>>> I would like to attach labels to (selected) features.
>>> 
>>> For example, I am drawing a circular genome diagram with a selection of 
>>> colour coded genes - some of which I would like to have individually 
>>> labelled.  This might be done in a similar way to the genome size tick 
>>> captions (i.e. horizontal text) or perhaps rotated text (radially aligned).
>>> 
>>> However, as far as I can tell from the documentation and the source 
>>> code, this is not built in.

I clearly didn't read the right bit of the source code.

Leighton wrote:
> Each individual GDFeature has a label attribute, taking a Boolean, that
> allows you to set whether its label is displayed or not.  You could set
> this on feature creation, ...

e.g.

for feature in genbank_entry.features:
     if feature.type == 'CDS':
         gdfs.add_feature(feature, label=False, colour=colors.lightgreen)
     elif feature.type == 'tRNA' :
         gdfs.add_feature(feature, label=True, colour=colors.red)

(This can easily be used with the examples in the documentation)

Leighton wrote:
 > ... or at some later stage with a filter.  If you're working with
 > N.equitans, and your GDFeatureSet is called `gdfs1', for example, this
 > code:
> 
>     gdfs1.set_all_features('label', 0)
>     for feature in gdfs1.features.values():
>         print feature.name, feature.label
>         if feature.name.startswith('NEQ016'):
>             feature.label = 1

Maybe that should be gdfs._features rather than gdfs.features?

> will label only features whose names begin with NEQ016.  You'll probably
> already see how flexible this can be if you add your own attributes to
> GDFeature objects when they're created (BLAST scores, expression
> levels, membership of functional classes, etc.).  You can set the
> label_font, label_size, label_colour and label_angle attributes in the
> same kind of way.

By default, when using the add_feature method with a SeqFeature 
extracted from a GenBank file, GenomeDiagram will look at the 'gene', 
'label', 'locus_tag' and 'product' qualifiers for potential labels (in 
that order).  (See code in GDFeature.py class GDFeature.)
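
The lookup order described above can be sketched as follows. This is not the code from GDFeature.py: pick_label() and LABEL_QUALIFIERS are hypothetical names, and the qualifiers are assumed to be a dictionary mapping each qualifier name to a list of strings, which is how SeqFeature qualifiers from a GenBank file are usually stored.

```python
# Qualifier names in the order of preference described above.
LABEL_QUALIFIERS = ("gene", "label", "locus_tag", "product")


def pick_label(qualifiers, default=""):
    """Return the first value found among the preferred qualifier keys."""
    for key in LABEL_QUALIFIERS:
        values = qualifiers.get(key)
        if values:
            return values[0]
    return default


# A made-up feature with no 'gene' or 'label' qualifier.
qualifiers = {"locus_tag": ["NEQ016"], "product": ["hypothetical protein"]}
print(pick_label(qualifiers))
```

Accepting an explicit label argument and falling back to this lookup only when none is given would be one way to support a preferred caption.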

It might be useful to be able to supply the preferred label caption as 
part of the add_feature command.

Thanks

Peter


From lpritc at scri.sari.ac.uk  Mon Mar 27 09:17:01 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Mon, 27 Mar 2006 10:17:01 +0100
Subject: [BioPython] Tweaking GenomeDiagram
Message-ID: <1143451021.18558.228.camel@lplinuxdev>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20060327/9aaca755/attachment-0001.ksh>
-------------- next part --------------
An embedded message was scrubbed...
From: Leighton Pritchard <lpritc at scri.sari.ac.uk>
Subject: Re: Tweaking GenomeDiagram
Date: Mon, 27 Mar 2006 10:17:01 +0100
Size: 5030
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20060327/9aaca755/attachment.eml>

From lpritc at scri.sari.ac.uk  Mon Mar 27 15:42:10 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Mon, 27 Mar 2006 16:42:10 +0100
Subject: [BioPython] Tweaking GenomeDiagram
In-Reply-To: <4427FB54.1040408@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
	<4427FB54.1040408@maubp.freeserve.co.uk>
Message-ID: <1143474132.18558.260.camel@lplinuxdev>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20060327/ad206104/attachment-0001.ksh>
-------------- next part --------------
An embedded message was scrubbed...
From: Leighton Pritchard <lpritc at scri.sari.ac.uk>
Subject: Re: [BioPython] Tweaking GenomeDiagram
Date: Mon, 27 Mar 2006 16:42:10 +0100
Size: 5015
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20060327/ad206104/attachment.eml>

From biopython at maubp.freeserve.co.uk  Mon Mar 27 18:03:14 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Mon, 27 Mar 2006 19:03:14 +0100
Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram
 0.2
In-Reply-To: <4424088E.9090004@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
Message-ID: <442828E2.3040703@maubp.freeserve.co.uk>

Peter wrote:
 >
 > This email is mainly aimed at Leighton Pritchard (who I have spotted
 > posting on the list in the past) as it concerns his (bio)python
 > add-on, GenomeDiagram:
 >
 > http://bioinf.scri.ac.uk/lp/programs.html#genomediagram
 >
 > First Query
 > -----------
 > I would like to attach labels to (selected) features.

See earlier email with example - I hadn't looked closely enough.

http://www.biopython.org/pipermail/biopython/2006-March/002967.html

 > Second Query
 > -------------
 > When drawing circular genomes following the examples, the major tick
 > marks seem to be at 1, 10001, 20001, ... (depending on the tick
 > interval size).
 >
 > It would look much better to display 10000 rather than 10001 (or even,
 > leading to my third question, 10 Kb).

Fixed in GenomeDiagram release 0.2

 > Third Query
 > -----------
 > I would like to have genome size "tick labels" in terms of kilo-bases
 > or mega-bases (i.e. 3 Kb rather than 3000, or 2 Mb rather than
 > 2000000).

This is a new option in GenomeDiagram release 0.2, see below.
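
To make the idea of SI-like tick labels concrete, here is a minimal sketch. It is not GenomeDiagram's implementation: the function name format_tick is hypothetical, only the "SInt" value and the Kbp/Mbp suffixes come from Leighton's description further down, and the truncating division is an assumption.

```python
def format_tick(position, scale_format=None):
    """Return a label for a tick at the given base position.

    Illustrative only; rounding behaviour is an assumption.
    """
    if scale_format == "SInt":
        if position >= 1000000:
            # Whole megabases, truncated.
            return "%d Mbp" % (position // 1000000)
        if position >= 1000:
            # Whole kilobases, truncated.
            return "%d Kbp" % (position // 1000)
    # Default: just the base position as a string.
    return str(position)


for pos in (500, 3000, 2000000):
    print(format_tick(pos, "SInt"))
```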

Leighton has also improved the positioning of the size captions on the 
lower half of circular diagrams, and probably other things as well.

The following email was sent to me, and I am forwarding it to the 
mailing list because Leighton's PGP signature was confusing the server.

Thanks again,

Peter

--------------------------------------------------------------------

Hi Peter,

I think I've implemented everything you asked for, and the new source
and Windows installer are located at:

http://bioinf.scri.sari.ac.uk/lp/programs.php

and

http://bioinf.scri.sari.ac.uk/lp/programs.html#genomediagram

(take your pick).

To use the new features, you need to do the following sort of thing:


     parser = GenBank.FeatureParser()
     fhandle = open('/data/genomes/Bacteria/Nanoarchaeum_equitans/NC_005213.gbk','r')
     genbank_entry = parser.parse(fhandle)
     fhandle.close()
     gdd = GDDiagram('Test Diagram')

     gdfs1 = GDFeatureSet(name='CDS features')
     for feature in genbank_entry.features:
         if feature.type == 'CDS':
             # This is how you can override any attribute of the
             # GDFeature as you add it to the GDFeatureSet, just by
             # passing the appropriate keyword and argument
             gdfs1.add_feature(feature, name="Some feature or other")
     # By passing the scale_format = "SInt" argument, you can use SI-like
     # suffixes for scale markers.  So far we only have Kbp and Mbp
     # suffixes, and the default goes to just a string of the marker
     # base position.
     gdt1 = GDTrack('CDS features', greytrack=1,
                    scale_largetick_interval=1e4,
                    scale_smalltick_interval=1e3,
                    scale_format = "SInt")
     gdt1.add_set(gdfs1)
     # You can now do regular expression comparisons, startswith
     # comparisons, exclusions or just plain matches to any GDFeature
     # attribute, just by passing the appropriate attribute, value and
     # comparator mode
     mod_features = gdfs1.get_features('name', 'NEQ0[2-4]', 'like')
     #mod_features = gdfs1.get_features('name', 'NEQ02', 'startswith')
     #mod_features = gdfs1.get_features('name', 'NEQ05', 'not')
     #mod_features = gdfs1.get_features('name', 'NEQ016')
     for feature in mod_features:
         feature.label = 1
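
The four comparator modes used with get_features() above can be illustrated with a stand-alone sketch. This is not GenomeDiagram's code: select_features is a hypothetical name, the feature objects are simple stand-ins, and treating 'not' as "exclude exact matches" is an assumption based on the comment in the example.

```python
import re
from types import SimpleNamespace


def select_features(features, attribute, value, comparator=None):
    """Return the features whose attribute matches value (illustrative only)."""
    selected = []
    for feature in features:
        actual = str(getattr(feature, attribute))
        if comparator == "like":
            # Regular expression comparison.
            matched = re.search(value, actual) is not None
        elif comparator == "startswith":
            matched = actual.startswith(value)
        elif comparator == "not":
            # Assumed meaning: exclude exact matches.
            matched = actual != value
        else:
            # Plain match.
            matched = actual == value
        if matched:
            selected.append(feature)
    return selected


names = ("NEQ016", "NEQ023", "NEQ041", "NEQ051")
features = [SimpleNamespace(name=n, label=0) for n in names]
for feature in select_features(features, "name", "NEQ0[2-4]", "like"):
    feature.label = 1
print([f.name for f in features if f.label])
```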

And, finally, the marker labels in the lower halves of GenomeDiagram
images have been lowered so that they hit the marker line at the top of
the string, rather than the bottom.

Phew!

L.

-- 
Dr Leighton Pritchard AMRSC
D131, Plant-Pathogen Interactions, Scottish Crop Research Institute
Invergowrie, Dundee, Scotland, DD2 5DA, UK
T: +44 (0)1382 562731 x2405 F: +44 (0)1382 568578
E: lpritc at scri.sari.ac.uk   W: http://bioinf.scri.sari.ac.uk/lp


From Teemu.Kuulasmaa at uku.fi  Tue Mar 28 06:59:27 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Tue, 28 Mar 2006 09:59:27 +0300
Subject: [BioPython] biopython and dbSNP (2)
Message-ID: <4428DECF.4070406@uku.fi>

Hi,

I did some experimenting and got GenBank.search_for() and 
GenBank.download_many() to work with dbSNP. However, I did not manage to 
get GenBank.NCBIDictionary() to work. I do not know if this is the right way 
to do it. It would be nice if someone (biopython-dev) could comment on 
the matter.

Here are two very small diffs (against biopython version 1.41) that were 
required to get dbSNP sequence retrieval to work:
----------------------------------------------------------
ubuntu@ubuntu:~/src/biopython$ diff /usr/lib/python2.4/site-packages/Bio/EUtils/Config.py Config.py
58c58
< databases.SNP = _add_db(DatabaseInfo("snp", 1))

ubuntu@ubuntu:~/src/biopython$ diff /usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py __init__.py
1422,1423d1421
<     elif database in ['snp']:
<         format = 'fasta'


Here is an example of how it works after these modifications:
----------------------------------------------------------
ubuntu@ubuntu:~$ python
Python 2.4.2 (#2, Mar  5 2006, 00:03:25)
[GCC 4.0.3 20060212 (prerelease) (Ubuntu 4.0.2-9ubuntu1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> from Bio import GenBank
 >>> snps = GenBank.search_for('rs8192602', 'snp')
 >>> snps
['8192602']
 >>> seqs = GenBank.download_many(snps, 'snp')
 >>> print seqs.read()

1: rs8192602 [Homo sapiens]
 >gnl|dbSNP|rs8192602 
rs=8192602|pos=272|len=397|taxid=9606|mol="genomic"|class=1|alleles="A/G"|build=117
TGGCAGAGTG GGGAGTAGGA GGGTAGTGCC AGTGAGTAAA CCAGACTCCA TACCTTAAGC 
TCAACTCCTA TCCCTTTGTC
GCCTCCCAAC CCCAGTCATG GCTGAGTACG GGACCCTCCT GCAAGACCTG ACCAACAACA 
TCACCCTTGA AGATCTAGAA
CAGCTCAAGT CGGCCTGCAA GGAAGACATC CCCAGCGAAA AGAGTGAGGA GATCACTACT 
GGCAGTGCCT GGTTTAGCTT
CCTGGAGAGC CACAACAAGC TGGACAAAGG T
R
GGGGAGGGGA GCACAGGGGT CCTGTCATCA GTCATTCAGG CTCAGTTCAT TCAGCAAATA 
GAGATGAGCT CAAAGCTTTT
ACATCCACAA TGTGTACCCC TCTATAGCAA GGCAGAAGAG AGGTG


Best regards,

Teemu Kuulasmaa


From biopython at maubp.freeserve.co.uk  Tue Mar 28 17:11:12 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 28 Mar 2006 18:11:12 +0100
Subject: [BioPython] biopython and dbSNP (2)
In-Reply-To: <4428DECF.4070406@uku.fi>
References: <4428DECF.4070406@uku.fi>
Message-ID: <44296E30.1070809@maubp.freeserve.co.uk>

Teemu Kuulasmaa wrote:
> Hi,
> 
> I did some experimenting and got GenBank.search_for() and 
> GenBank.download_many() to work with dbSNP. However, I did not manage to 
> get GenBank.NCBIDictionary() to work. I do not know if this is the right way 
> to do it. It would be nice if someone (biopython-dev) could comment on 
> the matter.
> 
> Here are two very small diffs (against biopython version 1.41) that were 
> required to get dbSNP sequence retrieval to work:

I'm not familiar with this aspect of the GenBank support, but your code 
looks OK to me.

I tried your two changes on the CVS version of EUtils and GenBank and it 
works for me (the GenBank file has had significant changes to the file 
parser).

One question is whether the GenBank.search_for() and GenBank.download_many() 
functions are intended just for "GenBank" (officially just the nucleotides?) 
or also for other sequence-based EUtils databases like proteins, snp, ..., or 
even genomes.

Unless anyone else cares to comment, I'll commit Teemu's two small 
changes in the next few days.

As to getting GenBank.NCBIDictionary() to work with the snp database, 
it's not as easy as it looks.

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 28 18:06:08 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Tue, 28 Mar 2006 19:06:08 +0100
Subject: [BioPython] biopython and dbSNP (2)
In-Reply-To: <44296E30.1070809@maubp.freeserve.co.uk>
References: <4428DECF.4070406@uku.fi> <44296E30.1070809@maubp.freeserve.co.uk>
Message-ID: <44297B10.9070000@maubp.freeserve.co.uk>

Peter (BioPython List) wrote:
> Teemu Kuulasmaa wrote:
> 
>>Hi,
>>
>>I did some experimenting and got GenBank.search_for() and 
>>GenBank.download_many() to work with dbSNP. However, I did not manage to 
>>get GenBank.NCBIDictionary() to work. I do not know if this is the right way 
>>to do it. It would be nice if someone (biopython-dev) could comment on 
>>the matter.
>>
>>Here are two very small diffs (against biopython version 1.41) that were 
>>required to get dbSNP sequence retrieval to work:
> 
> 
> I'm not familiar with this aspect of the GenBank support, but your code 
> looks OK to me.
> 
> I tried your two changes on the CVS version of EUtils and GenBank and it 
> works for me (the GenBank file has had significant changes to the file 
> parser).
> 
> One question is whether the GenBank.search_for() and GenBank.download_many() 
> functions are intended just for "GenBank" (officially just the nucleotides?) 
> or also for other sequence-based EUtils databases like proteins, snp, ..., or 
> even genomes.
> 
> Unless anyone else cares to comment, I'll commit Teemu's two small 
> changes in the next few days.
> 
> As to getting GenBank.NCBIDictionary() to work with the snp database, 
> it's not as easy as it looks.

Trying this with SNPs (having applied Teemu Kuulasmaa's changes), we get 
back "mangled FASTA entries" with additional headers and blank lines.

Ignoring the spaces in the sequences (which appear mostly in ten 
nucleotide blocks with a space in between) we get:

 >>> seqs = GenBank.download_many(['8192602','8192603'], 'snp')
 >>> print seqs.read()

1: rs8192602 [Homo sapiens]
 >gnl|dbSNP|rs8192602 ...
TGGCAGAGTG...

2: rs8192603 [Homo sapiens]
 >gnl|dbSNP|rs8192603 ...
TGGTGGGCAG...

The blank lines shouldn't be a problem for BioPython's FASTA parser.

However, because of the extra lines that look like "{Result Number}: 
{Identifier} [{Species}]", this is NOT a valid FASTA format file.
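
As a possible workaround until the format is fixed, the extra result-number lines could be filtered out before parsing. The following is a minimal stand-alone sketch, not Biopython code: clean_snp_fasta and RESULT_LINE are hypothetical names, and the pattern for the extra lines is an assumption based only on the two examples shown above.

```python
import re

# Assumed shape of the extra lines: "1: rs8192602 [Homo sapiens]"
RESULT_LINE = re.compile(r"^\d+:\s+\S+\s+\[[^\]]*\]$")


def clean_snp_fasta(text):
    """Return a list of (header, sequence) tuples from the mangled output."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "{Result Number}: ..." lines.
        if not line or RESULT_LINE.match(line):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Drop the spaces between the ten-nucleotide blocks.
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Each tuple could then be written back out as an ordinary FASTA record or handed to whatever sequence object is in use.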

This may be an NCBI EUtils problem... following their FAQ, I tested this 
URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602&report=FASTA

and this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602,8192603&report=FASTA

And it does the same sort of thing :(
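
For anyone wanting to repeat the test, the two URLs above can be built programmatically. This is just a sketch using the standard library: efetch_fasta_url is a hypothetical helper, the base URL and parameter names are copied from the URLs above, and it only constructs the string (it does not contact the NCBI).

```python
from urllib.parse import urlencode

EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(ids, db="snp"):
    """Build an EFetch URL like the ones tested above (no network access)."""
    params = {"db": db, "id": ",".join(ids), "report": "FASTA"}
    # Keep the comma between identifiers unescaped, as in the URLs above.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


print(efetch_fasta_url(["8192602", "8192603"]))
```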

I have emailed the NCBI...

Peter


From Teemu.Kuulasmaa at uku.fi  Wed Mar 29 09:07:00 2006
From: Teemu.Kuulasmaa at uku.fi (Teemu Kuulasmaa)
Date: Wed, 29 Mar 2006 12:07:00 +0300
Subject: [BioPython]  biopython and dbSNP (2)
References: 44296E30.1070809@maubp.freeserve.co.uk
Message-ID: <442A4E34.8050206@uku.fi>

> Peter (BioPython List) wrote:
> 
> The blank lines shouldn't be a problem for BioPython's FASTA parser.
> 
> However, because of the extra lines that look like "{Result Number}: 
> {Identifier} [{Species}]", this is NOT a valid FASTA format file.
> 
> This may be an NCBI EUtils problem... following their FAQ, I tested this 
> URL:
> 
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602&report=FASTA
> 
> and this:
> 
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=8192602,8192603&report=FASTA
> 
> And it does the same sort of thing :(
> 
> I have emailed the NCBI...
> 
> Peter

Thank you for your response Peter!

As you said, the NCBI EUtils result is not a valid FASTA-formatted file. I 
hope that NCBI will fix this issue soon.

Let us know if you get any kind of feedback from NCBI!

Teemu

-- 
  Teemu Kuulasmaa, M.Sc.

  University of Kuopio
  Laboratory of Internal Medicine
  P.O.Box 1627
  70211 Kuopio
  FINLAND

  Tel	+358 1716 3498
  Fax	+358 1716 2445


From biopython at maubp.freeserve.co.uk  Wed Mar 29 13:18:30 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Wed, 29 Mar 2006 14:18:30 +0100
Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome Diagram
 0.21
In-Reply-To: <442828E2.3040703@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
	<442828E2.3040703@maubp.freeserve.co.uk>
Message-ID: <442A8926.8040101@maubp.freeserve.co.uk>

There was a packaging problem with GenomeDiagram 0.2 (missing new module 
Observer), which Leighton has fixed with the release of GenomeDiagram 
0.21, available here:

http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

This also adds a dpi option to GDDiagram.write() for raster output - 
which is handy for generating high resolution PNG files.

Peter


From lpritc at scri.sari.ac.uk  Wed Mar 29 14:05:21 2006
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Wed, 29 Mar 2006 15:05:21 +0100
Subject: [BioPython] Tweaking GenomeDiagram / Release of Genome
	Diagram	0.21
In-Reply-To: <442A8926.8040101@maubp.freeserve.co.uk>
References: <4424088E.9090004@maubp.freeserve.co.uk>
	<442828E2.3040703@maubp.freeserve.co.uk>
	<442A8926.8040101@maubp.freeserve.co.uk>
Message-ID: <1143641121.4788.9.camel@lplinuxdev>

On Wed, 2006-03-29 at 14:18 +0100, Peter (BioPython List) wrote:
> There was a packaging problem with GenomeDiagram 0.2 (missing new module 
> Observer), 

The problem was me - the packaging did exactly what I told it to, more's
the pity ;)

> which Leighton has fixed with the release of GenomeDiagram 
> 0.21, available here:
> 
> http://bioinf.scri.ac.uk/lp/programs.html#genomediagram

I'll leave the advertising in...

-- 
Dr Leighton Pritchard AMRSC
D131, Plant-Pathogen Interactions, Scottish Crop Research Institute
Invergowrie, Dundee, Scotland, DD2 5DA, UK
T: +44 (0)1382 562731 x2405 F: +44 (0)1382 568578
E: lpritc at scri.sari.ac.uk   W: http://bioinf.scri.sari.ac.uk/lp
GPG/PGP: FEFC205C E58BA41B  http://www.keyserver.net             
(If the signature does not verify, please remove the SCRI disclaimer)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views 
expressed by the sender are not necessarily the views of SCRI and its 
subsidiaries.  This email and any files transmitted with it are confidential 
to the intended recipient at the e-mail address to which it has been 
addressed.  It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this 
confidentiality and you must not use, disclose, copy, print or rely on this 
e-mail in any way. Please notify postmaster at scri.sari.ac.uk quoting the 
name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are 
present in this email, neither the Institute nor the sender accepts any 
responsibility for any viruses, and it is your responsibility to scan the email 
and the attachments (if any).


From halima at mancala.cbio.uct.ac.za  Thu Mar 30 07:17:10 2006
From: halima at mancala.cbio.uct.ac.za (Halima Rabiu)
Date: Thu, 30 Mar 2006 09:17:10 +0200 (SAST)
Subject: [BioPython] Need help on NCBIStandaloneblast
Message-ID: <Pine.LNX.4.58.0603300915090.7802@mancala.cbio.uct.ac.za>

Hi everybody,
I am new to Biopython and am having problems with "NCBIStandalone.blastall".
After launching BLAST with "doBlast" it looks like it runs and finishes,
but when I check the output it is empty. When I try the same thing from the
command line it works and I get results.
I attach my code.

I also went through the previous posts on the biopython mailing list and found
a similar problem posted by Andreas, but no solution to it.
Please can somebody help?
Thanks
Nike
-------------- next part --------------
#! /usr/local/bin/python2.4
#halimah
#20-03-2006
from Bio.Blast import NCBIStandalone
import os
# path to my database
data=os.path.join(os.getcwd(),"Newprotein.db","Nprotein.Fdb")
# input file (protein sequence in fasta )
infile=os.path.join(os.getcwd(),"Newprotein.db","mytest.txt",'r')

# path to Blastall executable
blast_exe=os.path.join("/","usr","local","blast","bin","blastall")

output,error_info =NCBIStandalone.blastall(blast_exe,"blastp", data, infile)
#print output.readline()

save_file =open("blast.out","w")
blast_result=output.read()
save_file.write(blast_result)
save_file.close()
	

blastfile = open('blast.out', 'r')
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(blastfile, b_parser)
while 1:
    b_record = b_iterator.next()
    if b_record is None:
        break
    # This will parse the BLAST report into a Blast Record class (either a
    # Blast or a PSIBlast record, depending on what you are parsing) so that
    # you can extract the information from it. In our case, let's just
    # print out a quick summary of all of the alignments with an E-value
    # below some threshold value.

    E_VALUE_THRESH = 1.00
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print '****Alignment****'
                print 'sequence:', alignment.title
                print 'length:', alignment.length
                print 'e value:', hsp.expect
                print hsp.query[0:75] + '...'
                print hsp.match[0:75] + '...'
                print hsp.sbjct[0:75] + '...'


From biopython at maubp.freeserve.co.uk  Thu Mar 30 15:56:29 2006
From: biopython at maubp.freeserve.co.uk (Peter (BioPython List))
Date: Thu, 30 Mar 2006 16:56:29 +0100
Subject: [BioPython] Need help on NCBIStandaloneblast
In-Reply-To: <Pine.LNX.4.58.0603300915090.7802@mancala.cbio.uct.ac.za>
References: <Pine.LNX.4.58.0603300915090.7802@mancala.cbio.uct.ac.za>
Message-ID: <442BFFAD.10103@maubp.freeserve.co.uk>

Halima Rabiu wrote:
> Hi everybody,
> I am new to Biopython and am having problems with "NCBIStandalone.blastall".
> After launching BLAST with "doBlast" it looks like it runs and finishes,
> but when I check the output it is empty. When I try the same thing from the
> command line it works and I get results.
> I attach my code.

Have you checked the paths are correct, e.g.

assert os.path.isfile(data), "Missing database file " + data
assert os.path.isfile(infile), "Missing input file " + infile

You don't need to check blast_exe yourself, as the blastall command does 
this for you.
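
A small stand-alone sketch of what such a path check might reveal (written for current Python 3; missing_files is a hypothetical helper). In the attached script, 'r' is passed to os.path.join() as a fourth path component rather than as a file mode, so the resulting input path ends in a component named "r" after "mytest.txt". That may well be why BLAST finds no input, although this is only a guess from reading the code.

```python
import os


def missing_files(paths):
    """Return the paths that do not point at an existing regular file."""
    return [path for path in paths if not os.path.isfile(path)]


# Same call as in the attached script: 'r' becomes part of the path.
infile = os.path.join(os.getcwd(), "Newprotein.db", "mytest.txt", "r")
print(infile)
print(os.path.basename(infile))
print(missing_files([infile]))
```

If that is the problem, dropping the 'r' argument from the os.path.join() call should give the intended file name.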

If I understood you correctly, the "blast.out" file is empty.

Did blast return any error message?  Try:

print error_info.read()

or:

save_file =open("blast.error","w")
blast_result=error_info.read()
save_file.write(blast_result)
save_file.close()

Next question, could you tell us what you typed at the command line 
which does work?

> I also went through the previous posts on the biopython mailing list and found
> a similar problem posted by Andreas, but no solution to it.

It was worth checking anyway :)

Peter


From mdehoon at c2b2.columbia.edu  Fri Mar 31 17:22:13 2006
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 31 Mar 2006 12:22:13 -0500
Subject: [BioPython] BOSC announcement
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECEDC@cgcmail.cgc.cpmc.columbia.edu>

MEETING ANNOUNCEMENT & CALL FOR SPEAKERS

The 7th annual Bioinformatics Open Source Conference (BOSC 2006) is
organized by the not-for-profit Open Bioinformatics Foundation. The meeting
will take place August 4-5 in Fortaleza, Brazil, and is one of several Special
Interest Group (SIG) meetings occurring in conjunction with the 14th
International Conference on Intelligent Systems for Molecular Biology.  Please
consult the official BOSC 2006 website at

http://www.open-bio.org/wiki/BOSC_2006

for details and information.

In addition, a BOSC weblog has been set up to make it easier to disseminate all
BOSC-related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an iCal-compatible calendar, there is an EventDB calendar set
up with all BOSC-related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/


Thank You, and we look forward to seeing you all,
The BOSC Organizing Committee.