[Biopython] affy CEL and CDF reader

Sean Davis sdavis2 at mail.nih.gov
Thu Apr 8 22:31:43 UTC 2010


On Thu, Apr 8, 2010 at 3:43 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> No, I was not reading the binary files. That said, I am interested in pursuing
> that if there is interest.
> Do you have a link to the SDK?

I believe this will get you close:

http://www.affymetrix.com/partners_programs/programs/developer/fusion/index.affx?terms=no

I hope my questions are not taken the wrong way, but I have learned
from the bioconductor project that dealing with vendor file formats is
often a non-trivial pursuit.  It isn't always easy to think of all the
edge cases.

Sean


>  *Vincent Davis
> 720-301-3003 *
> vincent at vincentdavis.net
>  my blog <http://vincentdavis.net> |
> LinkedIn<http://www.linkedin.com/in/vincentdavis>
>
>
> On Thu, Apr 8, 2010 at 1:40 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>> On Thu, Apr 8, 2010 at 3:03 PM, Vincent Davis <vincent at vincentdavis.net>
>> wrote:
>> > Parsing it myself, but based directly on the Affy documentation found here:
>> > http://www.stat.lsa.umich.edu/~kshedden/Courses/Stat545/Notes/AffxFileFormats/
>>
>> So, are you covering both binary and text formats for .CEL files?  I
>> think that modern .CEL files (those produced by GCOS) are binary and
>> represent the majority of .CEL files produced today.  Some of the I/O
>> issues that you discuss are almost definitely dealt with by using the
>> binary .CEL files.
>>
>> I'm certainly not an expert on Affy, so take all these
>> questions/comments with a grain of salt.
>>
>> Sean
>>
>>
>> > On Thu, Apr 8, 2010 at 12:56 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> >
>> >> On Thu, Apr 8, 2010 at 2:33 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
>> >> > I ended up writing my own modules for reading both Affy CEL and CDF files.
>> >> > It is a long story as to why I did not just use what was available in
>> >> > Biopython. I plan on making what I have done available to Biopython and
>> >> > will upload it as a fork. I will outline below the ways in which what I
>> >> > have is different.
>> >> > My question is: are there any improvements (features) others would like
>> >> > to see beyond what is available in the current CelFile.py?
>> >> > I saw some posts a month or so ago about checking for consistency in CEL
>> >> > files; I think it was something about making sure the stated number of
>> >> > probes was consistent with the intensity measurements.
>> >> >
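
The consistency check mentioned above could be sketched roughly as follows. This is only a minimal illustration, assuming the text CEL layout in which an [INTENSITY] section declares NumberCells= and then lists one row per cell. The function name check_intensity_count and the sample data are hypothetical; this is not how CelFile.py or the module described in this thread does it.

```python
def check_intensity_count(lines):
    """Return (declared, found) cell counts for the [INTENSITY] section.

    Assumes a text CEL layout: an '[INTENSITY]' header line, a
    'NumberCells=N' line, a 'CellHeader=...' line, then one row per cell.
    """
    declared = None
    found = 0
    in_section = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new section header; track whether it is [INTENSITY].
            in_section = (line == "[INTENSITY]")
            continue
        if not in_section or not line:
            continue
        if line.startswith("NumberCells="):
            declared = int(line.split("=", 1)[1])
        elif line.startswith("CellHeader="):
            continue
        else:
            # Anything else inside the section is counted as a data row.
            found += 1
    return declared, found


# Made-up sample: the header claims 3 cells but only 2 rows follow.
sample = """[INTENSITY]
NumberCells=3
CellHeader=X\tY\tMEAN\tSTDV\tNPIXELS
0\t0\t120.0\t15.2\t25
1\t0\t98.5\t11.0\t25

[MASKS]
NumberCells=0
""".splitlines()

declared, found = check_intensity_count(sample)
if declared != found:
    print("inconsistent: header says %s cells, found %s rows" % (declared, found))
```

The same comparison would apply to the other sections that carry their own NumberCells= line.
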
>> >> > What is different:
>> >> > When a file is read with Affycel.read('file'), many attributes are set.
>> >> > For example:
>> >> > a = affcel()
>> >> > a.read('testfile')
>> >> > a.filename,
>> >> > a.version,
>> >> > a.header.items()  # a dictionary of all header items
>> >> > a.num_intensity
>> >> > a.intensity
>> >> > a.num_masks
>> >> > a.masks
>> >> > a.num_outliers
>> >> > a.outliers
>> >> > a.numb_modified
>> >> > a.modified
>> >> >
>> >> > I plan to add the ability to return intensity values with or without
>> >> > outliers or masked values.
>> >> > All data is currently stored in numpy structured arrays.
>> >> > Currently a.intensity returns the structured array, but I plan on making
>> >> > it an option to easily choose how this is returned.
>> >> > I also want to make an optional normalized intensity array so that if the
>> >> > data is normalized it can be stored with the affycel instance. My use case
>> >> > was that I was opening about 80 CEL files and reading them in was slow.
>> >> > This allowed me to read each file as an instance of affycel stored in a
>> >> > list that I then pickled. It was then much faster to open them.
>> >> >
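
The parse-once-then-pickle approach described above might look something like this. It is a sketch only: slow_parse and load_all are hypothetical names, and plain dicts stand in for the affycel instances, whose real structure is not shown in this thread.

```python
import os
import pickle
import tempfile


def slow_parse(filename):
    # Placeholder for the real (slow) CEL parsing step; returns made-up data.
    return {"filename": filename, "intensity": [(0, 0, 120.0), (1, 0, 98.5)]}


def load_all(filenames, cache_path):
    """Parse every file once, then reuse the pickled list on later calls."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    parsed = [slow_parse(name) for name in filenames]
    with open(cache_path, "wb") as handle:
        pickle.dump(parsed, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return parsed


cache = os.path.join(tempfile.mkdtemp(), "cels.pickle")
first = load_all(["a.CEL", "b.CEL"], cache)   # parses and writes the cache
second = load_all(["a.CEL", "b.CEL"], cache)  # reads the cache only
```

Note that this sketch never invalidates the cache when the source files change, and a pickle should only be loaded from a trusted source.
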
>> >> > Are improvements to CelFile.py of value to Biopython?
>> >> >
>> >> > I hope to have the code pushed up to my fork on GitHub late tonight. I
>> >> > just thought I would ask if there were any suggestions before I did.
>> >> >
>> >> > I also have a CDF file reader, but have only done some basic testing. I
>> >> > don't have a lot of use for this; do other Biopython users?
>> >> >
>> >> > I am kinda working in a vacuum and am trying to get more involved in
>> >> > projects to improve my skills and knowledge. Any suggestions would be
>> >> > appreciated.
>> >>
>> >> Just out of curiosity, is your work based on the affy sdk, or are you
>> >> parsing stuff yourself?
>> >>
>> >> Sean
>> >>
>> > _______________________________________________
>> > Biopython mailing list  -  Biopython at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biopython
>> >
>>



