[BioPython] byte packing and binary files

Jonathan M. Gilligan jonathan.gilligan@vanderbilt.edu
Mon, 08 Jan 2001 13:24:22 -0600


If you were dealing with homogeneous data (say, an array of data all of the 
same type) I would recommend looking in the Python manual under 
array.array(), array.tofile(), and array.tostring(). This is the Python 
standard library array package, not the Numerical Python array package.

From the Python documentation:

>tofile (f)
>Write all items (as machine values) to the file object f.
>tolist ()
>Convert the array to an ordinary list with the same items.
>tostring ()
>Convert the array to an array of machine values and return the string 
>representation (the same sequence of bytes that would be written to a file 
>by the tofile() method.)
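
To make the relationship between these methods concrete, here is a small sketch. It assumes a Python 3 interpreter, where tostring() is spelled tobytes() (tostring() was removed in Python 3.9); the sample values and the temporary file are only for illustration.

```python
import array
import tempfile

# A homogeneous array of 8-byte doubles (typecode 'd').
a = array.array('d', [2.0, 5.0, 8.9035])

# tobytes() is the Python 3 name for the tostring() method quoted above.
raw = a.tobytes()

# tofile() writes the same machine values to a file opened in binary mode.
with tempfile.TemporaryFile() as f:
    a.tofile(f)
    f.seek(0)
    written = f.read()

same_bytes = (written == raw)          # both paths yield identical bytes
item_count = len(raw) // a.itemsize    # number of values in the byte string
```

Note that both methods use the machine's native byte order, so the output is only portable between machines that share the same convention.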

For what you want, mixing integer and float data, I think struct.pack() is 
the way to go. Then you just open an output stream in binary mode and write 
the string emitted by struct.pack. You may need to worry about endian-ness 
if you and your colleague work on machines with different endian 
conventions. The struct.pack() documentation describes how to specify 
endian-ness (BTW, the Mac is big-endian). Probably you want to do something 
like

import struct
outfile = open('out.dat', 'wb')
# Imagine that datalist is a list of 3-tuples (i,j,f):
# two integers and a float
for (i, j, f) in datalist:
     # '>' in the format string indicates to output
     # as big-endian; '3d' packs all three values as 8-byte doubles
     outfile.write(struct.pack('>3d', i, j, f))
outfile.close()
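
If you want to check the file before handing it over, one option is to read it back and unpack it with the same format string. The following is only a sketch under some assumptions: it uses Python 3 (struct.Struct.iter_unpack), the two sample rows from the original question, and a temporary directory in place of a real output location.

```python
import os
import struct
import tempfile

# Sample rows taken from the question: two integers and a float each.
datalist = [(2, 5, 8.9035), (2, 7, 10.988)]

# '>3d' = big-endian, three 8-byte doubles per record (24 bytes).
record = struct.Struct('>3d')

# Illustrative location only; substitute the real output path.
path = os.path.join(tempfile.mkdtemp(), 'out.dat')

with open(path, 'wb') as outfile:
    for i, j, f in datalist:
        # Integers are converted to doubles by the 'd' format code.
        outfile.write(record.pack(i, j, f))

# Read the whole file back and split it into fixed-size records.
with open(path, 'rb') as infile:
    payload = infile.read()

rows = list(record.iter_unpack(payload))
```

Each element of rows is a tuple of three floats, so the two integer columns come back as 2.0, 5.0 and so on, which is what a reader expecting 8-byte doubles would see.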

Alternatively, you could create an array:

import array
import sys
a = array.array('d')
# imagine that datalist is a list of 3-tuples (i,j,f), as above
for item in datalist:
     for element in item:
         a.append(element)
# handle endian-correction:
# Mac is big-endian, so swap bytes if this machine is little-endian
if sys.byteorder == 'little':
     a.byteswap()
outfile = open('out.dat', 'wb')
a.tofile(outfile)
outfile.close()
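
As a sanity check, the array approach should produce exactly the same bytes as struct.pack with a '>' format. Here is a minimal sketch of that comparison, assuming Python 3 (tobytes() and frombytes() in place of tostring() and fromstring()) and the sample rows from the question; it works in memory rather than writing a file.

```python
import array
import struct
import sys

# Sample rows taken from the question.
datalist = [(2, 5, 8.9035), (2, 7, 10.988)]

# Flatten the tuples into one array of doubles.
a = array.array('d')
for item in datalist:
    a.extend(item)

# Convert to big-endian if this machine is little-endian.
if sys.byteorder == 'little':
    a.byteswap()
big_endian_bytes = a.tobytes()

# The same data packed record by record with struct.
expected = b''.join(struct.pack('>3d', *item) for item in datalist)
matches_struct = (big_endian_bytes == expected)

# Reading back: load the bytes, then undo the swap on little-endian hosts.
b = array.array('d')
b.frombytes(big_endian_bytes)
if sys.byteorder == 'little':
    b.byteswap()
values = b.tolist()
```

After byteswap() the array no longer holds meaningful native values on a little-endian machine, so it should only be written out, not used for further arithmetic.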

Hope this helps,
Jonathan

At 12:23 PM 1/8/2001, Scott T. Kelley wrote:
>Dear Biopythoneers,
>
>I have an unusual question about writing binary files. I have some data that
>I want to make available to someone else's program written in C++ on a Mac (he
>has a nice GUI all set up) and the easiest way for his program to read the
>input would be if all the numbers were read in as 8 byte doubles.
>
>I know you can use the struct library to pack numbers as doubles: pack('d',
>number) but can you then write these to a binary file easily? Is there
>another way to do this that I don't know about?
>
>The file I want to write is just a list of numbers (two ints and a float)
>that looks like this:
>
>2    5    8.9035
>2    7    10.988
>
>...etc.
>
>Thanks for any pointers you can send my way. -Scott

=============================================================================
Jonathan M. Gilligan                         jonathan.gilligan@vanderbilt.edu
The Robert T. Lagemann Assistant Professor    www.vanderbilt.edu/lsp/gilligan
                    of Living State Physics            Office:    615 343-6252
Dept. of Physics and Astronomy, Box 1807-B            Lab (FEL)      343-7580
6823 Stevenson Center                                 Fax:           343-7263
Vanderbilt University, Nashville, TN 37235            Dep't Office:  322-2828