From rhf22 at mole.bio.cam.ac.uk Mon Dec 1 08:17:06 2003
From: rhf22 at mole.bio.cam.ac.uk (Rasmus Fogh)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] ScriptCentral
Message-ID: 

Hi,

I have written to you before about possibly joining up to the BioPython
project. I am from the CCPN project (http://www.ccpn.ac.uk/index.html).

We do not think we could do that at the moment, but we would be
interested in putting a link to our web page in ScriptCentral, if
possible. Clearly I need to join up, somehow, to get a userID and
password. Do you think this would be possible? How should I proceed?

Thanks,

Rasmus

---------------------------------------------------------------------------
Dr. Rasmus H. Fogh                  Email: r.h.fogh@bioc.cam.ac.uk
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK.     FAX (01223)766002

From chapmanb at uga.edu Mon Dec 1 09:10:37 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] ScriptCentral
In-Reply-To: 
References: 
Message-ID: <20031201141037.GA95612@evostick.agtec.uga.edu>

Hi Rasmus;

> I have written to you before about possibly joining up to the BioPython
> project. I am from the CCPN project (http://www.ccpn.ac.uk/index.html).
>
> We do not think we could do that at the moment, but we would be interested
> in putting a link to our web page in ScriptCentral, if possible.
> Clearly I need to join up, somehow, to get a userID and password.

That would be great. The ScriptCentral page is editable from the web
with the username 'biopython' and the password 'user' (no quotes on
either). From there you can click 'Edit this page' and then 'Add New',
and you can go forward and enter the contact information about the
page.

If you have any problems at all, feel free to send your information
(Name, Author, URL and Description) to me and I'd be happy to add it.
Thanks!

Hope this helps.
Brad

From jpaint at u.washington.edu Mon Dec 1 13:42:54 2003
From: jpaint at u.washington.edu (Jay Painter)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] mmLib 0.5 Released
Message-ID: <1070304174.3736.6.camel@d-128-95-235-174.dhcp4.washington.edu>

Hello,

I have just released a new version of mmLib (full description below).
This version includes a new monomer library based on the RCSB's
standard component library, classes for unit cell calculations, a full
space group library, and an enhanced GUI mmCIF editor written with the
PyGTK toolkit bindings.

Regards,
Jay Painter

The Python Macromolecular Library (mmLib) is a software toolkit and
library of routines for the analysis and manipulation of macromolecular
structural models, implemented in the Python programming language. It
is accessed via a layered, object-oriented application programming
interface, and provides a range of useful software components for
parsing mmCIF, PDB, and MTZ files, a library of atomic elements and
monomers, an object-oriented data structure describing biological
macromolecules, and an OpenGL molecular viewer.

The mmLib data model is designed to provide easy access to the various
levels of detail needed to implement high-level application programs
for macromolecular crystallography, NMR, modeling, and visualization.
This includes specialized classes for proteins, DNA, amino acids, and
nucleic acids. Also included are an extensive monomer library, an
element library, and specialized classes for performing unit cell
calculations combined with a full space group library.

From idoerg at burnham.org Fri Dec 12 14:28:59 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Error parsing a GenBank file
Message-ID: <3FDA16FB.2000209@burnham.org>

Hi,

I am getting an error on parsing a GenBank file. It's the E. coli K12
genome, downloaded from:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000913

Egads!
How do I turn Martel's debug feature on? Setting it to level 2 in
FeatureParser didn't seem to do much...

Thanks,

Iddo

>>> gb_iter = GenBank.Iterator(open('ecoli_k12.gb'),fp)
>>> fp = GenBank.FeatureParser()
>>> cur_rec = gb_iter.next()
Traceback (most recent call last):
  File "", line 1, in ?
  File "/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 142, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 229, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 1251, in feed
  File "/usr/home/iddo/biopy_cvs/biopython/Martel/Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "/usr/home/iddo/biopy_cvs/biopython/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 1414769

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From idoerg at burnham.org Fri Dec 12 14:51:04 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] SORRY!
Message-ID: <3FDA1C28.4030200@burnham.org>

Re: my previous message. Not a bug. Needed the bleeding-edge CVS
version. Everything's hunky-dory.

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From chapmanb at uga.edu Fri Dec 12 14:53:45 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Error parsing a GenBank file
In-Reply-To: <3FDA16FB.2000209@burnham.org>
References: <3FDA16FB.2000209@burnham.org>
Message-ID: <20031212195345.GF6895@evostick.agtec.uga.edu>

Hi Iddo;

> I am getting an error on parsing a GenBank file. It's the E. coli K12
> Genome, downloaded from:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000913

It parses for me with the current CVS. Peter added the /selenocysteine
tag just a week ago, and yup, it looks like your file has this tag. So
yeah, just update to CVS and all should be good. Ah, GenBank and their
continuously expanding tag list.

> Egads! How do I turn Martel's debug feature on? Setting it to level 2 in
> FeatureParser didn't seem to do much...

Huh? Now you are really confusing me. The following code:

from Bio import GenBank

fp = GenBank.FeatureParser(debug_level = 2)
gb_iter = GenBank.Iterator(open('ecoli_k12.gb'), fp)
gb_iter.next()

spits out tons of debug messages a la:

Match ' /EC_numb' (x=8978): ' '
Match ' /EC_numbe' (x=8979): '\\/'
Match ' /EC_number' (x=8980): '[^"]'
Match ' /EC_number=' (x=8981): '[^"]'

It does take a few moments before the debug info gets printed, though.
Just wait and you'll see all the magic.

Hope this helps.
Why-can't-anyone-parse-small-little-files-anymore-ly yr's,
Brad

> >>> gb_iter = GenBank.Iterator(open('ecoli_k12.gb'),fp)
> >>> fp = GenBank.FeatureParser()
> >>> cur_rec = gb_iter.next()
> Traceback (most recent call last):
>   File "", line 1, in ?
>   File "/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 142, in next
>     return self._parser.parse(File.StringHandle(data))
>   File "/usr/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 229, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/home/iddo/biopy_cvs/biopython/Bio/GenBank/__init__.py", line 1251, in feed
>   File "/usr/home/iddo/biopy_cvs/biopython/Martel/Parser.py", line 328, in parseFile
>     self.parseString(fileobj.read())
>   File "/usr/home/iddo/biopy_cvs/biopython/Martel/Parser.py", line 356, in parseString
>     self._err_handler.fatalError(result)
>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>     raise exception
> Martel.Parser.ParserPositionException: error parsing at or beyond
> character 1414769
>
> --
> Iddo Friedberg, Ph.D.
> The Burnham Institute
> 10901 N. Torrey Pines Rd.
> La Jolla, CA 92037
> USA
> Tel: +1 (858) 646 3100 x3516
> Fax: +1 (858) 646 3171
> http://ffas.ljcrf.edu/~iddo
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev

From rgb2003 at med.cornell.edu Tue Dec 16 22:59:34 2003
From: rgb2003 at med.cornell.edu (Robert G. Bussell)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Contribution -- NMR xpk files
Message-ID: 

Hello,

I would like to contribute some tools that I have developed to the
biopython project. Among them are programs for analyzing NMR data as
well as modules suited for more general problems of handling resonance
assignment data.

The tool set that I would like to contribute deals with structural NMR
resonance assignment data read in from a standard, easily generated
file format (xpk or peaklist files) recognized by nmrview, a freely
distributed, open source program with a large user base.
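(For readers unfamiliar with the format: an xpk peaklist is essentially a
whitespace-delimited table behind a short header, six lines, with the column
labels on the last header line, as the code later in this message assumes. A
minimal, hypothetical reader along those lines, not part of Robert's actual
tool set:)

```python
def read_peaklist(path, headerlen=6):
    """Minimal reader for an nmrview-style .xpk peaklist.

    Assumes, as the predictnoe2.py script below does, that the first
    `headerlen` lines are header, with the column labels on the last
    header line.  Returns (labels, rows), where each row is the list
    of whitespace-separated fields of one data line.
    """
    with open(path) as infile:
        header = [infile.readline() for _ in range(headerlen)]
        labels = header[-1].split()
        rows = [line.split() for line in infile if line.strip()]
    return labels, rows
```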
The code that I have written facilitates the process of extracting
scientifically relevant information from the data. Of additional
utility is the module upon which the main program depends, which can be
used as a building block for constructing code to deal with nmrview
peaklists and other resonance assignment data read in from some other
file format.

I'd be thrilled to introduce these tools into the public domain through
biopython and to provide more, related tools as feedback warrants them
and as I understand more about where they fit into the biopython
project.

Thanks to everybody who supports the biopython and other open source
projects, and I look forward to learning from your comments, testing
and suggestions.

Sincerely,
Robert Bussell, Jr.
rgb2003@med.cornell.edu

-------------- Program and instructions -------------

I would appreciate your feedback and testing. Here are some
instructions for setting things up on your computer. Please let me know
if you use these tools, even if you don't have a specific comment.

NOTE: At the moment the program and modules do not depend on biopython
in any way. It should only be necessary to have python installed to run
them.

(1) Create or locate an existing directory for the modules I provide
below. Copy all three of the modules (readtools, writetools and
xpktools) into that directory.

(2) Edit the line in predictnoe2.py that reads

    sys.path=[sys.path,"/usr/people/robert/bin/python/lib"]

so that it points to your local directory that contains the modules
listed in step 1.

(3) The first line of the program

    #! /usr/bin/env python

may have to be modified to point to the python interpreter on your
computer.

(4) Cut and paste the input peaklist (noed.xpk) from below (or use your
own data).
You will need to specify this file location as an input parameter to
predictnoe2.py.

(5) Make predictnoe2.py executable and invoke it with the command:

    predictnoe2.py --inxpk noed.xpk --outxpk noex.xpk --increment 1
        --detectatom H1 --fromatom 15N2 --relayatom N15

NOTE: This mode sets up an i->i+1 and i->i-1 prediction where the
directly detected proton is attached to the N15 nitrogen and the 15N2
atom is attached to the proton from which the coherence originates.

(6) Inspect the output in the --outxpk file (noex.xpk if following step
5 literally) and/or load it into nmrview and overlay it on a spectrum.

CAVEAT: Be careful in cutting and pasting the code, especially if you
are unfamiliar with the xpk data format. It could easily be scrambled
if it acquires false line feeds.

-------------BEGIN PROGRAM: predictnoe2.py------------------
#! /usr/bin/env python

# predictnoe2.py: A python script that predicts neighbor NOE locations
# Generates a peaklist of predicted i->i+n and i->i-n NOE crosspeaks
# from a peaklist of self peaks.

# **Input arguments:**
#   --inxpk       input peaklist
#   --outxpk      output peaklist
#   --fromatom    atom in xpk head corresponding to i of i->i+n noe
#   --relayatom   label for the relay atom
#   --detectatom  label for detected atom
#   --increment   n, where noes are between i and i+n

# Example input:
#   predictnoe2.py --inxpk dpeak.xpk --outxpk xpeak.xpk
#       --increment 1 --detectatom H1 --fromatom 15N2 --relayatom N15
#
# *******************
# Known input file assumptions: Header should be six lines long and
# contain the list of data labels as the last line
#
# ****TO DO LIST****
# Add automatic prediction of forward and reverse noes
# Test the endpoint predictions (are all possible crosspeaks predicted?)
# ***** LOAD MODULES *****
import types
import getopt
import string
import sys

sys.path=[sys.path,"/usr/people/robert/bin/python/lib"]

import xpktools

# ***** FUNCTION DEFINITIONS *****

def get_res(infile,match,dict,headerlen):
    n=0
    i=0
    line=infile.readline()
    while (line):
        # Read past the header
        i=i+1
        if (i>=headerlen):
            res=string.split(string.split(line)[1],".")[0]
            if (res==match):
                n=n+1
                key=res+" "+str(n)
                dict[key]=line
        line=infile.readline()

def find_label_cols(labels, labelswanted):
    # Find the column number for to, from and relay atoms
    # using the xpk header and user inputs fromlabel and tolabel
    # Input -- the label line from the xpk
    # (like this for ex. H1.L H1.P H1.W...more stuff...int stat)
    # Return values: col number for fromlabel and tolabel

    # ** LOCAL INITS **
    datamap={}
    fromlabel   = labelswanted[0]
    relaylabel  = labelswanted[1]
    detectlabel = labelswanted[2]
    labellist=string.splitfields(string.split(labels,"\012")[0])

    # Make a data map of the label and ppm values of atoms of interest
    for i in range(len(labellist)):
        if (fromlabel+".P"==labellist[i]):   datamap["fromppm"]=i
        if (detectlabel+".P"==labellist[i]): datamap["detectppm"]=i
        if (relaylabel+".P"==labellist[i]):  datamap["relayppm"]=i
        if (fromlabel+".L"==labellist[i]):   datamap["fromassign"]=i
        if (detectlabel+".L"==labellist[i]): datamap["detectassign"]=i
        if (relaylabel+".L"==labellist[i]):  datamap["relayassign"]=i
    return datamap

def get_ave_cs(list,col):
    # Get the average of the chemical shift
    sum=0
    n=0
    for element in list:
        sum=sum+string.atof(string.split(element)[col+1])
        n=n+1
    return sum/n

def read_xpk(infile,dict,headerlen):
    # Read xpk files into a dictionary of lists
    # The dictionary entries are indexed by the "to" atom of the noe
    # (i.e. the detected amide proton for a nh-nh noe experiment)
    #
    # Each list contains lines from the xpk file with a common first
    # dimension residue assignment.
    # The peaklist header is also returned but not included in the dict
    #
    # Special dictionary elements:
    #   "maxres" the maximum residue number
    #   "minres" the minimum residue number

    header=[]     # This will hold the header lines
    i=0           # line counter
    maxres=-1     # maximum residue number
    minres=-1     # minimum residue number

    line=infile.readline()
    while (line):
        if (i<headerlen):           # Store the header lines
            header.append(line)
        i=i+1
        if (i>=headerlen+1):        # Read past the header
            res=string.split(string.split(line)[1],".")[0]

            # Check min and max and update values as necessary
            [maxres,minres]=update_min_max(res,minres,maxres)

            if dict.has_key(str(res)):
                # Append the additional data about this residue to a list
                templst=dict[str(res)]
                templst.append(line)
                dict[str(res)]=templst
            else:
                # This is a new residue, start a new list
                dict[str(res)]=[line]   # Use [] for list type
        line=infile.readline()

    # Add the max and min statistics to the dictionary
    dict["maxres"]=maxres
    dict["minres"]=minres

    return header

def predict_xpeaks(dict,fromres,tores,datamap,count):
    # Predict the position of the fromres->tores NOE crosspeak
    # Nomenclature:
    #   "fromres" --> "tores"  ==  "i --> i+inc"

    # *LOCAL INITS*
    predict=[]
    fromreslist=dict[str(fromres)]   # Residue 1 data line
    toreslist=dict[str(tores)]       # Residue 2 data line

    # ** Get averages of ppm values for coordinates of each residue **
    # Only the "relay" and "detect" atom data need to be calculated
    # since the from characteristics will be left in place
    avefromppm = get_ave_cs(fromreslist,datamap["fromppm"])

    # ** Change the hn and n assignments and chem shifts to other res **
    # Base the new line on the "to" residue data line, substituting the
    # from data.  Also change the first element (line count)
    for line in toreslist:
        fromlabel = str(fromres) + ".n"   # ABSTRACT THIS
        line = xpktools.replace_entry(line,datamap["fromppm"]+2,avefromppm)
        line = xpktools.replace_entry(line,datamap["fromassign"]+2,fromlabel)
        line = xpktools.replace_entry(line,1,count)
        predict.append(line)

    return predict

def parse_args():
    opts=getopt.getopt(sys.argv[1:],'',
        ['inxpk=','outxpk=','fromatom=','detectatom=',
         'increment=','relayatom='])

    # Start each argument out empty so missing ones can be caught below
    inxpk=outxpk=fromatom=detectatom=increment=relayatom=''

    for elem in opts[0]:
        if (elem[0]=="--inxpk"):      inxpk=elem[1]
        if (elem[0]=="--outxpk"):     outxpk=elem[1]
        if (elem[0]=="--fromatom"):   fromatom=elem[1]
        if (elem[0]=="--detectatom"): detectatom=elem[1]
        if (elem[0]=="--increment"):  increment=elem[1]
        if (elem[0]=="--relayatom"):  relayatom=elem[1]

    if (inxpk=='' or outxpk=='' or increment=='' or detectatom==''
            or fromatom=='' or relayatom==''):
        input_args_needed_error()
        sys.exit(0)

    return string.atoi(increment), inxpk, outxpk, detectatom, relayatom, fromatom

def input_args_needed_error():
    print "These input arguments are needed for program execution:"
    print "--inxpk"
    print "--outxpk"
    print "--increment"
    print "--detectatom"
    print "--fromatom"
    print "--relayatom"

def input_args_warning(progname):
    print progname, "Error -- please check your input arguments."
    print progname, "Quitting."

def cols_not_found_warning(progname):
    print progname, "Error -- One or more data columns not found."
    print progname, "         Try checking your from, to and relay atoms."
    print progname, "Quitting."
def update_min_max(res,minres,maxres):
    # This function takes care of updating the values of maxres and
    # minres so that they reflect the global max and min residue
    # values in the peaklist
    res=string.atoi(res)
    if (res>0):
        if (minres<0):     # takes care of initialization where minres=-1
            minres=res
        if (maxres<0):     # takes care of initial value maxres=-1
            maxres=res
        if (minres>res):   # found a smaller min, replace
            minres=res
        if (maxres<res):   # found a larger max, replace
            maxres=res
    return [maxres,minres]

# ***** MAIN PROGRAM *****
# (The setup lines below were lost in the archived posting; they are
# reconstructed from the functions defined above.)

[inc, inxpk, outxpk, detectatom, relayatom, fromatom] = parse_args()

infile=open(inxpk,'r')
outfile=open(outxpk,'w')

dict={}
header=read_xpk(infile,dict,6)   # Six header lines assumed (see above)
infile.close()

MAXRES=dict["maxres"]
MINRES=dict["minres"]

# Locate the data columns for the from, relay and detect atoms
datamap=find_label_cols(header[len(header)-1],
                        [fromatom,relayatom,detectatom])

# Copy the header to the output peaklist
xpktools.write_list(outfile,header)

# Predict the i->i+inc and i->i-inc noe positions if possible
# Write each one to the output file as they are calculated
count=0        # A counter for numbering the output data lines
res=MINRES     # minimum should be the lowest i value
while (res<=MAXRES):
    if ( dict.has_key(str(res)) and dict.has_key(str(res+inc)) ):
        xpktools.write_list(outfile,predict_xpeaks(dict,res,res+inc,datamap,count))
    if ( dict.has_key(str(res)) and dict.has_key(str(res-inc)) ):
        xpktools.write_list(outfile,predict_xpeaks(dict,res,res-inc,datamap,count))
    count=count+1
    res=res+1
outfile.close()

------------- END PROGRAM: predictnoe2.py --------------

------------- BEGIN MODULE: xpktools.py -----------------
# xpktools.py: A python module containing function definitions and
# classes useful for manipulating data from nmrview .xpk peaklist files.
#
# ********** INDEX of functions and classes **********
#
# xpkentry:  Handles xpk data one line at a time, generating attributes
#            that are often sought after, such as the chemical shifts
#            and assignments of the first three dimensions.
#            Using this class to define a variable requires both the
#            data line from the xpk file itself and the label line
#            (seventh line) of the xpk header file.  To get that line,
#            use the function get_header_line(infile) in this module.
import string

class xpkentry:
    # Usage: xpkentry(xpkentry,xpkheadline) where xpkentry is the line
    # from an nmrview .xpk file and xpkheadline is the line from the
    # header file that gives the names of the entries, which is
    # typically the sixth line of the header (counting fm 1)
    #
    # Variables are accessed by their name in the header line, as in
    # self.field["H1.P"] will return the H1.P entry for example.
    # self.field["linenum"] returns the line number (1st field of line)
    # self.d1l=first dimension atom label
    # self.d1p=first dimension chemical shift
    #   -------- same for dim up to 4 -------

    def __init__(self,entry,headline):
        self.field={}   # Holds all fields from input line in a
                        # dictionary keyed to the label line in the
                        # xpk file

        datlist  = string.split(entry)
        headlist = string.split(headline)

        # Parse the entry into a field dictionary
        self.field["linenum"]=datlist[0]
        i=1
        # (reconstructed: the loop body and first-dimension assignments
        # below were lost in the archive)
        while i<len(datlist):
            self.field[headlist[i-1]]=datlist[i]
            i=i+1

        # Assign the dimension labels and chemical shifts to special
        # variables
        self.d1l=datlist[1]
        self.d1p=datlist[2]
        if (len(datlist)>=9):
            self.d2l=datlist[7]
            self.d2p=datlist[8]
        if (len(datlist)>=15):
            self.d3l=datlist[13]
            self.d3p=datlist[14]

        # Assign the general peak properties to special variables
        self.stat = datlist[len(datlist)-1]
        self.int  = datlist[len(datlist)-2]
        self.vol  = datlist[len(datlist)-3]

def get_header_line(infile):
    i=1
    while (i<7):
        line=infile.readline()
        i=i+1
    return line

def replace_entry(line,fieldn,newentry):
    # Replace an entry in a string by the field number
    # No padding is implemented currently.  Spacing will change if
    # the original field entry and the new field entry are of
    # different lengths.
    # This method depends on xpktools.find_start_entry
    start=find_start_entry(line,fieldn)
    leng=len(string.splitfields(line[start:])[0])
    newline=line[:start]+str(newentry)+line[(start+leng):]
    return newline

def write_list(outfile,list):
    for line in list:
        outfile.write(line)

def find_start_entry(line,n):
    # find the starting point character for the n'th entry in
    # a space delimited line.
    # n is counted starting with 1
    # The n=1 field by definition begins at the first character

    infield=0      # A flag that indicates that the counter is in a field

    if (n==1):
        return 0   # Special case

    # Count the number of fields by counting spaces
    c=1
    leng=len(line)

    # Initialize variables according to whether the first character
    # is a space or a character
    if (line[0]==" "):
        infield=0
        field=0
    else:
        infield=1
        field=1

    # (reconstructed: the loop below was lost in the archive)
    # Walk along the line, counting field starts, until the n'th is found
    while (c<leng and field<n):
        if (infield):
            if (line[c]==" "):
                infield=0
        else:
            if (line[c]!=" "):
                infield=1
                field=field+1
        c=c+1
    return c-1

------------- END MODULE: xpktools.py -----------------

From chapmanb at uga.edu Wed Dec 17 13:29:32 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Contribution -- NMR xpk files
In-Reply-To: 
References: 
Message-ID: <20031217182932.GD53012@evostick.agtec.uga.edu>

Hi Robert;

> I would like to contribute some tools that I have developed to the
> biopython project.

Great! We definitely always welcome contributions.

> Among them are programs for analyzing NMR data as well
> as modules suited for more general problems of handling resonance
> assignment data.

I will admit straightaway that I know next to nothing about structural
data, so I won't be able to make any comments about the actual work the
code is doing (Heh. Predict NOEs -- I don't even know what an NOE that
I got predicted is. Heh.). But I can make comments at least from the
style and usability aspects, and others who know about structural data
can help me out.

The first major point is that most things in Biopython are organized as
modules that can be called from other functions. From looking at your
code (and not really understanding structural things) it looks like the
two major things the code does are deal with the xpk files and then do
the NOE prediction. If I am on target, then it might be best to
organize your code as a couple of modules under Bio.NMR, called
something like xpktools.py and NOEPredict.py or something similar. This
way the predictnoe.py script can call the useful functions from these
modules, and they can also be reused from other people's scripts. If
structural people have other ideas about where this functionality
should be located, please chime in.
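(As a concrete illustration of the reuse being described here: the
field-replacement idea at the heart of xpktools.replace_entry above can be
written as a small stand-alone helper. This is a simplified sketch, not the
actual Biopython module, and unlike the original it does not preserve the
line's column spacing:)

```python
def replace_field(line, fieldn, newentry):
    """Replace the fieldn'th whitespace-delimited field (counting from
    1, as xpktools.replace_entry does) with str(newentry).

    Simplification: the line is re-joined with single spaces, so the
    original column spacing is not preserved.
    """
    fields = line.split()
    fields[fieldn - 1] = str(newentry)
    return " ".join(fields)

# Renumber the first field of an xpk-style data line:
print(replace_field("0 32.HN 8.653 0.021", 1, 99))
# -> 99 32.HN 8.653 0.021
```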
The second point is that I noticed, just on rapid examination, that
some class and function names don't conform to the Biopython style
guide that we use. Specifically, classes are normally named in
AllFirstLetterUppercase style, and internal functions (those that
aren't meant to be called from other scripts using the modules) are
differentiated with _underscores_in_front. Jeff wrote up a nice guide
about contributing to Biopython which has these points and additional
info:

http://www.biopython.org/docs/developer/contrib.html

But yeah, after all that -- we definitely would like to have your code,
as it doesn't (to my knowledge) duplicate anything we already have in
Biopython. To sum up, my major suggestions would be to:

1. Read over the contribution and style guide for the code.
2. Organize the functionality as modules and make it clear, by
   underscores or some other method, which functions are meant to be
   called by other modules.
3. Have the example script use the modules as an example.
4. Make sure you are willing to put your code under the Biopython
   license.

Thanks for your mail and the code!
Brad

From Richard.Christen at unice.fr Fri Dec 19 12:19:59 2003
From: Richard.Christen at unice.fr (christen)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Blast Parser error
Message-ID: <033e01c3c654$549354c0$2b113b86@christen2002>

Hi there,

I got a problem with the Blast parser.

##############################
The biology thing:

I have been using the blast parser in some kind of a loop to blast n
sequences against themselves, and then parse the output to build a
distance matrix. I use a sliding window to extract different parts of
the n sequences and -- how stupid not to check it, but I did not expect
it -- one sequence was much shorter, so I sent blast a sequence of zero
length :-(

This is confirmed by the log of formatdb; blast thus provides only a
warning. (Note that formatdb does not return the proper lcl|id of the
sequence!
I will send a mail to ncbi about that.)

========================[ Dec 19, 2003 4:42 PM ]========================
Version 2.2.2 [Dec-14-2001]
Started database file "D:\Bases\Bac16S\BLAST\Chapon_4133-P"
WARNING: [000.000] lcl|50 has zero-length sequence
Formatted 90 sequences

As a result I got an error in the parser.

##############################
Error messages:

Traceback (most recent call last):
  File "test.py", line 24, in ?
    b_record = b_iter.next()   # fetch the next query
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1331, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 556, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 98, in feed
    self._scan_database_report(uhandle, consumer)
  File "C:\Python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 422, in _scan_database_report
    line = safe_readline(uhandle)
  File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."

##############################
test.py sample:

...
b_parser=NCBIStandalone.BlastParser()                # create the parser
b_iter=NCBIStandalone.Iterator(blast_out, b_parser)  # create the iterator
...
23  while 1:
24      b_record = b_iter.next()   # fetch the next query
25
26      if b_record is None:
27          break                  # no more Query= responses to read...

##############################
blast output, section concerned:

BLASTN 2.2.2 [Dec-14-2001]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
(1997), "Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs", Nucleic Acids Res. 25:3389-3402.
Query= lcl|97633|sp=CYB 296
         (0 letters)

Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
           90 sequences; 14,728 total letters

 ***** No hits found ******

  Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
    Posted date:  Dec 19, 2003  4:42 PM
  Number of letters in database: 14,728
  Number of sequences in database: 90

BLASTN 2.2.2 [Dec-14-2001]

##############################
useful pieces of code:

def safe_readline(handle):
    """safe_readline(handle) -> line

    Read a line from an UndoHandle and return it.  If there are no more
    lines to read, I will raise a SyntaxError.

    """
    line = handle.readline()
    if not line:
        # ParserSupport.py, line 411, in safe_readline (from traceback)
        raise SyntaxError, "Unexpected end of stream."
    return line

    consumer.start_database_report()
    while 1:
        read_and_call(uhandle, consumer.database, start='  Database')
        # Database can span multiple lines.
        read_and_call_until(uhandle, consumer.database, start='    Posted')
        read_and_call(uhandle, consumer.posted_date, start='    Posted')
        read_and_call(uhandle, consumer.num_letters_in_database,
                      start='  Number of letters')
        read_and_call(uhandle, consumer.num_sequences_in_database,
                      start='  Number of sequences')
        read_and_call(uhandle, consumer.noevent, start=' ')

        # NCBIStandalone.py, line 422, in _scan_database_report:
        line = safe_readline(uhandle)
        uhandle.saveline(line)
        if line.find('Lambda') != -1:
            break

def feed(self, handle, consumer):
    """S.feed(handle, consumer)

    Feed in a BLAST report for scanning.  handle is a file-like object
    that contains the BLAST report.  consumer is a Consumer object that
    will receive events as the report is scanned.

    """
    if isinstance(handle, File.UndoHandle):
        uhandle = handle
    else:
        uhandle = File.UndoHandle(handle)

    # Try to fast-forward to the beginning of the blast report.
    read_and_call_until(uhandle, consumer.noevent, contains='BLAST')

    # Now scan the BLAST report.
    self._scan_header(uhandle, consumer)
    self._scan_rounds(uhandle, consumer)
    # NCBIStandalone.py, line 98, in feed:
    self._scan_database_report(uhandle, consumer)
    self._scan_parameters(uhandle, consumer)

#######################################

Thanks in advance,

Richard CHRISTEN
Champion de saut en epaisseur
UMR6543 CNRS - Université de Nice Sophia Antipolis
Centre de Biochimie
Parc Valrose
06108 Nice cedex2
tel 33 - 492 076 947
fax 33 - 492 076 408

From benita at cshl.edu Tue Dec 23 09:44:40 2003
From: benita at cshl.edu (Yair Benita)
Date: Sat Mar 5 14:43:29 2005
Subject: [Biopython-dev] Interpro parser
Message-ID: 

Hi all,

I found some code to work with the online version of Interpro in
biopython. However, I couldn't find any code to parse the results. I
run Interpro locally and am able to produce xml output. Did anyone make
a parser for such an output?

I attach an example of the Interpro output. It's clear this should be
easy to parse, but I am not sure what would be best in that case. Any
suggestions?

Yair
--
Yair Benita
Pharmaceutical Proteomics
Utrecht University

-------------- next part --------------
Skipped content of type multipart/appledouble
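(The attached example was stripped by the list software, but the general
approach can be sketched with the standard library's XML tools. The tag and
attribute names below, "protein", "interpro", "id" and "name", are
illustrative placeholders to be checked against the real InterProScan
output, not its actual schema:)

```python
import xml.etree.ElementTree as ET

def parse_interpro(path):
    """Collect (protein id, match id, match name) tuples from an
    InterProScan-style XML file.

    NOTE: the tag names ("protein", "interpro") and the attributes
    ("id", "name") are illustrative guesses -- adjust them to match
    the XML that your local InterProScan actually emits.
    """
    hits = []
    for protein in ET.parse(path).getroot().iter("protein"):
        for match in protein.iter("interpro"):
            hits.append((protein.get("id"), match.get("id"), match.get("name")))
    return hits
```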