[Biopython] affy CEL and CDF reader

Vincent Davis vincent at vincentdavis.net
Thu Apr 8 20:21:32 UTC 2010


Maybe I should have started this discussion differently.
Is there any need for improvements to the ability to read CEL files or CDF
files, and if so, what are they? I am interested in contributing.

  *Vincent Davis
720-301-3003 *
vincent at vincentdavis.net
 my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>


On Thu, Apr 8, 2010 at 12:33 PM, Vincent Davis <vincent at vincentdavis.net> wrote:

> I ended up writing my own modules for reading both Affy CEL and CDF files.
> Long story as to why I did not just use what was available in Biopython.
> I plan on making what I have done available to Biopython and will
> upload it as a fork. I will outline below how what I have is different.
> My question is: are there any improvements (features) others would like to
> see beyond what is available in the current CelFile.py?
> I saw some posts a month or so ago about checking for consistency in CEL
> files; I think it was something about making sure the stated number of probes
> was consistent with the intensity measurements.
>
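
A minimal sketch of the kind of consistency check mentioned above, assuming the text (version 3) CEL layout in which the `[INTENSITY]` section starts with a `NumberCells=` line and a `CellHeader=` line followed by one row per cell. The function name `count_intensity_rows` and the sample data are made up for illustration; this is not how CelFile.py or the module described in this thread does it.

```python
import io


def count_intensity_rows(handle):
    """Return (declared, actual) cell counts for the [INTENSITY] section."""
    declared = None
    actual = 0
    in_section = False
    for line in handle:
        line = line.strip()
        if line.startswith("["):
            # Any section header either opens or closes [INTENSITY].
            in_section = line == "[INTENSITY]"
            continue
        if not in_section or not line:
            continue
        if line.startswith("NumberCells="):
            declared = int(line.split("=", 1)[1])
        elif line.startswith("CellHeader="):
            continue  # column names, not a data row
        else:
            actual += 1
    return declared, actual


# Invented sample: the header claims 3 cells but only 2 rows follow.
sample = io.StringIO(
    "[CEL]\nVersion=3\n\n"
    "[INTENSITY]\nNumberCells=3\n"
    "CellHeader=X\tY\tMEAN\tSTDV\tNPIXELS\n"
    "0\t0\t120.0\t10.5\t16\n"
    "1\t0\t98.5\t8.1\t16\n\n"
    "[MASKS]\nNumberCells=0\nCellHeader=X\tY\n"
)
declared, actual = count_intensity_rows(sample)
if declared != actual:
    print("Inconsistent file: declared %s cells, found %d" % (declared, actual))
```

A reader could raise an exception instead of printing, depending on how strict it wants to be about truncated files.
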
> What is different:
> when a file is read with Affycel.read('file'), many attributes are set. For
> example:
> a = affcel()
> a.read('testfile')
> a.filename,
> a.version,
> a.header.items()  # a dictionary of all header items
> a.num_intensity
> a.intensity
> a.num_masks
> a.masks
> a.num_outliers
> a.outliers
> a.numb_modified
> a.modified
>
> I plan to add the ability to return intensity values with or without
> outliers or masked values.
> All data is currently stored in numpy structured arrays.
> Currently a.intensity returns the structured array, but I plan on making it
> an option to easily choose how this is returned.
> I also want to add an optional normalized intensity array so that if the
> data is normalized it can be stored with the affycel instance. My use case
> was that I was opening about 80 CEL files and reading them in was slow. This
> allowed me to read each file as an instance of affycel stored in a list that
> I then pickled. It was then much faster to open them.
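
To make the structured-array storage and the planned "with or without masks" access concrete, here is a rough sketch. The field names (`x`, `y`, `mean`), the sample values, and the helper `without_cells` are assumptions for illustration only; the actual layout used by the module described above is not shown in this thread.

```python
import numpy as np

# Hypothetical layout: one record per cell (x, y, mean intensity).
intensity = np.array(
    [(0, 0, 120.0), (1, 0, 98.5), (0, 1, 143.2), (1, 1, 77.0)],
    dtype=[("x", "i4"), ("y", "i4"), ("mean", "f4")],
)
# Hypothetical mask table: coordinates of cells flagged as masked.
masks = np.array([(1, 0)], dtype=[("x", "i4"), ("y", "i4")])


def without_cells(intensity, excluded):
    """Return the intensity records whose (x, y) is not listed in excluded."""
    excluded_xy = {(int(x), int(y)) for x, y in zip(excluded["x"], excluded["y"])}
    keep = np.array(
        [
            (int(x), int(y)) not in excluded_xy
            for x, y in zip(intensity["x"], intensity["y"])
        ],
        dtype=bool,
    )
    return intensity[keep]


unmasked = without_cells(intensity, masks)
print(unmasked["mean"])  # mean intensities with the masked cell dropped
```

The same helper would work for an outliers table, since it only relies on the `x` and `y` fields.
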
>
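
The pickling workflow described above can be sketched as follows. Plain dictionaries stand in for the parsed affycel instances, and the file names and values are invented; the point is only the parse-once, reload-from-cache pattern.

```python
import os
import pickle
import tempfile

# Stand-ins for ~80 parsed CEL files (here just 3 small dictionaries).
records = [
    {"filename": "chip%02d.CEL" % i, "intensity": [100.0 + i, 200.0 + i]}
    for i in range(3)
]

# Write the whole list to a single cache file.
cache_path = os.path.join(tempfile.mkdtemp(), "cel_cache.pkl")
with open(cache_path, "wb") as handle:
    pickle.dump(records, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Later sessions reload the cache instead of re-parsing every file.
with open(cache_path, "rb") as handle:
    restored = pickle.load(handle)

print(len(restored), restored[0]["filename"])
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.
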
> Are improvements to CelFile.py of value to Biopython?
>
> I hope to have the code pushed up to my fork on GitHub late tonight. I just
> thought I would ask if there were any suggestions before I did.
>
> I also have a CDF file reader, but have only done some basic testing. I
> don't have a lot of use for it; do other Biopython users?
>
> I am kinda working in a vacuum and am trying to get more involved in
> projects to improve my skills and knowledge. Any suggestions would be
> appreciated.
>
