[Biopython] affy CEL and CDF reader

Vincent Davis vincent at vincentdavis.net
Thu Apr 8 18:33:41 UTC 2010


I ended up writing my own modules for reading both Affy CEL and CDF files.
It is a long story as to why I did not just use what was available in Biopython.
I plan on making what I have done available to Biopython and will upload
it as a fork. I outline below how it differs from the current code.
My question is: are there any improvements (features) others would like to
see beyond what is available in the current CelFile.py?
I saw some posts a month or so ago about checking for consistency in CEL
files; I think it was about making sure the stated number of probes
was consistent with the intensity measurements.
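
To make the consistency check concrete, here is a minimal sketch. It assumes the
text (version 3) CEL layout, where each data section such as [INTENSITY] declares
NumberCells=N and is followed by N data rows. The function names and the sample
data are hypothetical; this is not code from CelFile.py or from my modules.

```python
def count_section_rows(lines):
    """Map section name -> (declared NumberCells, data rows actually found)."""
    declared = {}
    found = {}
    section = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Start of a new section, e.g. [INTENSITY]
            section = line[1:-1]
        elif "=" in line:
            # key=value line; only NumberCells matters here
            key, _, value = line.partition("=")
            if key.strip() == "NumberCells" and section is not None:
                declared[section] = int(value)
                found[section] = 0
        elif section in declared:
            # Any other non-blank line in a counted section is a data row
            found[section] += 1
    return {name: (declared[name], found[name]) for name in declared}


def inconsistent_sections(lines):
    """Return the names of sections whose row count disagrees with NumberCells."""
    counts = count_section_rows(lines)
    return [name for name, (want, got) in counts.items() if want != got]


# Made-up sample: [INTENSITY] claims 3 cells but only has 2 rows.
SAMPLE = """\
[CEL]
Version=3

[INTENSITY]
NumberCells=3
CellHeader=X\tY\tMEAN\tSTDV\tNPIXELS
0\t0\t120.0\t15.2\t25
1\t0\t98.5\t11.0\t25

[MASKS]
NumberCells=1
CellHeader=X\tY
1\t0
""".splitlines()

print(inconsistent_sections(SAMPLE))
```

A reader could either raise an error or just warn when the returned list is not empty.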

What is different:
when a file is read with Affycel.read('file'), many attributes are set. For
example:
a = affcel()
a.read('testfile')
a.filename,
a.version,
a.header.items()  # a dictionary of all header items
a.num_intensity
a.intensity
a.num_masks
a.masks
a.num_outliers
a.outliers
a.numb_modified
a.modified

I plan to add the ability to return intensity values with or without
outlier or masked values.
All data is currently stored in numpy structured arrays.
Currently a.intensity returns the structured array, but I plan on making it
easy to choose how this is returned.
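
As a rough sketch of what returning intensities with or without masked and
outlier cells could look like on numpy structured arrays: the field names
("x", "y", "mean"), the select_intensity function and the sample values are all
assumptions for illustration, not the actual layout in my code.

```python
import numpy as np

# Hypothetical layouts, modelled on the CEL intensity/mask/outlier columns.
intensity = np.array(
    [(0, 0, 120.0), (1, 0, 98.5), (0, 1, 4000.0), (1, 1, 110.2)],
    dtype=[("x", "i4"), ("y", "i4"), ("mean", "f8")],
)
masks = np.array([(1, 0)], dtype=[("x", "i4"), ("y", "i4")])
outliers = np.array([(0, 1)], dtype=[("x", "i4"), ("y", "i4")])


def select_intensity(intensity, masks, outliers,
                     drop_masks=False, drop_outliers=False):
    """Return intensity rows, optionally without masked and/or outlier cells."""
    keep = np.ones(len(intensity), dtype=bool)
    for flagged, drop in ((masks, drop_masks), (outliers, drop_outliers)):
        if not drop:
            continue
        # Clear the keep flag for every cell whose (x, y) is flagged.
        for x, y in zip(flagged["x"], flagged["y"]):
            keep &= ~((intensity["x"] == x) & (intensity["y"] == y))
    return intensity[keep]


clean = select_intensity(intensity, masks, outliers,
                         drop_masks=True, drop_outliers=True)
print(clean["mean"])
```

The loop over flagged cells is fine for short mask and outlier lists; for large
ones a lookup keyed on (x, y) would be a better fit.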
I also want to add an optional normalized intensity array, so that if the data
is normalized it can be stored with the affycel instance. My use case was
that I was opening about 80 CEL files and reading them in was slow. Keeping
everything on the instance allowed me to read each file as an instance of
affycel, store the instances in a list, and pickle that list. It was then much
faster to open them.
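
The caching step looks roughly like the sketch below. Plain dicts stand in for
the parsed objects so the example is self-contained; instances of a class
defined at module level pickle the same way. The file names, values and helper
names are made up.

```python
import os
import pickle
import tempfile


def save_cache(objects, path):
    """Write the list of parsed objects to disk in one pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(objects, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_cache(path):
    """Read the list of parsed objects back without re-parsing the CEL files."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-ins for parsed CEL files (hypothetical names and values).
parsed = [
    {"filename": "chip%02d.CEL" % i, "intensity": [100.0 + i, 200.0 + i]}
    for i in range(3)
]

cache_path = os.path.join(tempfile.mkdtemp(), "cel_cache.pkl")
save_cache(parsed, cache_path)
restored = load_cache(cache_path)
print(len(restored), restored[0]["filename"])
```

Pickle files should only be loaded from a source you trust, so this suits a
local cache better than a format for sharing data.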

Are improvements to CelFile.py of value to Biopython?

I hope to have the code pushed up to my fork on GitHub late tonight. I just
thought I would ask if there were any suggestions before I did.

I also have a CDF file reader, but have only done some basic testing. I don't
have much use for it myself; do other Biopython users?

I am kinda working in a vacuum and am trying to get more involved in
projects to improve my skills and knowledge. Any suggestions would be
appreciated.

Vincent Davis
720-301-3003
vincent at vincentdavis.net
my blog <http://vincentdavis.net> |
LinkedIn <http://www.linkedin.com/in/vincentdavis>


