[Biopython] affy CEL and CDF reader

Vincent Davis vincent at vincentdavis.net
Thu Apr 8 19:03:38 UTC 2010


Parsing it myself, but based directly on the Affy documentation found here:
http://www.stat.lsa.umich.edu/~kshedden/Courses/Stat545/Notes/AffxFileFormats/

  *Vincent Davis
720-301-3003 *
vincent at vincentdavis.net
 my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>


On Thu, Apr 8, 2010 at 12:56 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:

> On Thu, Apr 8, 2010 at 2:33 PM, Vincent Davis <vincent at vincentdavis.net>
> wrote:
> > I ended up writing my own modules for reading both Affy CEL and CDF
> > files. Long story as to why I did not just use what was available in
> > Biopython. I plan on making what I have done available to Biopython and
> > will upload it as a fork. I will outline below how what I have is
> > different.
> > My question is: are there any improvements (features) others would like
> > to see beyond what is available in the current CelFile.py?
> > I saw some posts a month or so ago about checking for consistency in CEL
> > files. I think it was something about making sure the stated number of
> > probes was consistent with the intensity measurements.
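
A minimal sketch of that kind of consistency check, for illustration only.
It assumes the plain-text (version 3) CEL layout, where the [INTENSITY]
section carries a NumberCells= line, a CellHeader= line, and then one row
per cell. The function name count_intensity_rows is hypothetical and is not
part of CelFile.py or of the module described in this post.

```python
def count_intensity_rows(text):
    """Return (stated, found) cell counts for the [INTENSITY] section.

    Assumes a text-format CEL file: 'NumberCells=N' gives the stated
    count, and every other non-blank line in the section except the
    'CellHeader=' line is one intensity row.
    """
    stated = None
    found = 0
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        # A bracketed line starts a new section.
        if line.startswith("[") and line.endswith("]"):
            in_section = (line == "[INTENSITY]")
            continue
        if not in_section or not line:
            continue
        if line.startswith("NumberCells="):
            stated = int(line.split("=", 1)[1])
        elif line.startswith("CellHeader="):
            continue
        else:
            found += 1
    return stated, found


def is_consistent(text):
    """True when the stated cell count matches the rows actually present."""
    stated, found = count_intensity_rows(text)
    return stated is not None and stated == found
```

A reader could call is_consistent() right after loading the file and raise
or warn when it returns False.
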
> >
> > What is different:
> > When a file is read with Affycel.read('file'), many attributes are set.
> > For example:
> > a = affcel()
> > a.read('testfile')
> > a.filename,
> > a.version,
> > a.header.items()  # a dictionary of all header items
> > a.num_intensity
> > a.intensity
> > a.num_masks
> > a.masks
> > a.num_outliers
> > a.outliers
> > a.numb_modified
> > a.modified
> >
> > I plan to add the ability to return intensity values with or without
> > the outlier or masked values.
> > All data is currently stored in numpy structured arrays.
> > Currently a.intensity returns the structured array, but I plan on making
> > it an option to easily choose how this is returned.
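
A rough sketch of how returning intensities with or without the outlier and
masked cells could look on numpy structured arrays. The field names (x, y,
mean, stdv, npixels) and the helper select_means are assumptions made for
this example; the post does not say how the arrays are actually laid out.

```python
import numpy as np

# Assumed layouts, for illustration only.
intensity_dtype = np.dtype([("x", "i4"), ("y", "i4"), ("mean", "f8"),
                            ("stdv", "f8"), ("npixels", "i4")])
coord_dtype = np.dtype([("x", "i4"), ("y", "i4")])


def select_means(intensity, exclude=()):
    """Return the mean intensities, dropping cells listed in `exclude`.

    `exclude` is a sequence of structured arrays with 'x' and 'y'
    fields, for example the masks and outliers arrays.
    """
    keep = np.ones(len(intensity), dtype=bool)
    for coords in exclude:
        for x, y in zip(coords["x"], coords["y"]):
            # Clear the flag for any cell at this (x, y) position.
            keep &= ~((intensity["x"] == x) & (intensity["y"] == y))
    return intensity["mean"][keep]


intensity = np.array([(0, 0, 10.0, 1.0, 9), (1, 0, 20.0, 2.0, 9),
                      (0, 1, 30.0, 3.0, 9), (1, 1, 40.0, 4.0, 9)],
                     dtype=intensity_dtype)
masks = np.array([(1, 0)], dtype=coord_dtype)
outliers = np.array([(1, 1)], dtype=coord_dtype)

all_means = select_means(intensity)
clean_means = select_means(intensity, exclude=(masks, outliers))
```

The loop over coordinates is simple rather than fast; for full-size arrays
a vectorized lookup would be a better fit.
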
> > I also want to make an optional normalized intensity array so that if
> > the data is normalized it can be stored with the affycel instance. My use
> > case was that I was opening about 80 CEL files and reading them in was
> > slow. This allowed me to read each file as an instance of affycel stored
> > in a list that I then pickled. It was then much faster to open them.
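
The read-once-then-pickle workflow can be sketched as below. The function
load_all and its read_one argument are hypothetical names for this example;
read_one stands in for whatever callable parses a single CEL file and
returns a picklable object.

```python
import os
import pickle


def load_all(paths, cache_path, read_one):
    """Parse every file in `paths`, caching the resulting list as a pickle.

    On the first call each file is parsed with `read_one` and the list
    is written to `cache_path`. Later calls load the pickle instead of
    parsing again. The cache is not checked against the input files, so
    delete it when the CEL files change.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    parsed = [read_one(p) for p in paths]
    with open(cache_path, "wb") as fh:
        pickle.dump(parsed, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return parsed
```

Only unpickle cache files you created yourself, since loading a pickle can
execute arbitrary code.
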
> >
> > Are improvements to CelFile.py of value to Biopython?
> >
> > I hope to have the code pushed up to my fork on github late tonight. Just
> > thought I would ask if there was any suggestion before I did.
> >
> > I also have a CDF file reader, but have only done some basic testing. I
> > don't have a lot of use for this; do other Biopython users?
> >
> > I am kinda working in a vacuum and am trying to get more involved in
> > projects to improve my skills and knowledge. Any suggestions would be
> > appreciated.
>
> Just out of curiosity, is your work based on the affy sdk, or are you
> parsing stuff yourself?
>
> Sean
>


