[Biopython-dev] Interested in a Phenotype Microarray parser?

Marco Galardini marco.galardini at unifi.it
Tue Mar 25 23:40:44 UTC 2014


Hi all,

Following your suggestions (and the conventions of the other modules), 
I've just pushed a couple of commits to my Biopython fork that add the 
Bio.Phenomics module.
For now the module is limited to reading/writing Phenotype 
Microarray files and basic operations on the PlateRecord/WellRecord 
objects. It requires numpy to interpolate the signal when the 
user requests a time point that wasn't in the input file (this way the 
WellRecord object can be queried with slices).
I'm still thinking about how to implement parameter extraction from 
WellRecord objects without using scipy.
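
To illustrate the interpolation idea, here is a minimal sketch. It is not the 
code in the branch: the WellSketch class, its attributes and the default step 
of 1.0 are assumptions made up for this example. Only numpy.interp (linear 
interpolation) is a real numpy call.

```python
import numpy as np


class WellSketch:
    """Hypothetical stand-in for a WellRecord (not the Bio.Phenomics class)."""

    def __init__(self, times, signals):
        # Keep the measurements sorted by time, as numpy.interp expects.
        order = np.argsort(times)
        self.times = np.asarray(times, dtype=float)[order]
        self.signals = np.asarray(signals, dtype=float)[order]

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Missing slice bounds fall back to the measured time range;
            # the default step of 1.0 is an arbitrary choice for this sketch.
            start = self.times[0] if key.start is None else key.start
            stop = self.times[-1] if key.stop is None else key.stop
            step = 1.0 if key.step is None else key.step
            query = np.arange(start, stop, step)  # stop is exclusive
            return np.interp(query, self.times, self.signals).tolist()
        # Single time point: linear interpolation between the two neighbours.
        # Outside the measured range numpy.interp returns the end values.
        return float(np.interp(key, self.times, self.signals))


well = WellSketch([0.0, 1.0, 2.0], [10.0, 20.0, 40.0])
halfway = well[0.5]       # time point not present in the input
window = well[0:2:0.5]    # slice query with a 0.5 step
```

With this toy data, well[0.5] falls halfway between the first two 
measurements, and the slice returns one interpolated value per step.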

Here's the link to my branch: 
https://github.com/mgalardini/biopython/tree/phenomics
The module and its functions are documented following the style of 
the other modules; I hope they are clear enough for you to try it out.
Some example files can be found in Tests/Phenomics.
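
On the reading side, the parse()/read() split discussed earlier in this 
thread (quoted below) follows the Bio.SeqIO convention: parse() iterates over 
all records, while read() expects exactly one and raises ValueError otherwise. 
The sketch below shows only that pattern. The "PLATE <id>" text layout and the 
tuple records are invented for the example; they are neither the real PM csv 
layout nor the branch's API.

```python
import io


def parse(handle):
    """Yield (plate_id, rows) for each plate in a toy multi-plate file.

    Toy layout (an assumption for this sketch): every plate starts with a
    line 'PLATE <id>', followed by its data lines.
    """
    plate_id, rows = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("PLATE "):
            # A new header closes the previous plate, if there was one.
            if plate_id is not None:
                yield plate_id, rows
            plate_id, rows = line[len("PLATE "):], []
        elif plate_id is not None:
            rows.append(line)
    if plate_id is not None:
        yield plate_id, rows


def read(handle):
    """Return the only plate in the file; raise ValueError otherwise."""
    records = parse(handle)
    try:
        record = next(records)
    except StopIteration:
        raise ValueError("No records found in handle") from None
    try:
        next(records)
    except StopIteration:
        return record
    raise ValueError("More than one record found in handle")


single = io.StringIO("PLATE PM01\n0,10\n1,20\n")
plate = read(single)  # exactly one plate, so this succeeds
```

Calling read() on a handle with two "PLATE" headers raises ValueError, 
while parse() on the same handle yields both plates in order.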

Marco

On 08/01/2014 10:32, Marco Galardini wrote:
> Hi,
>
> On 01/08/2014 06:53 AM, Michiel de Hoon wrote:
>>> any specification on the style guide for the biopython parsers?
>> There is no strict set of rules, but to get you started, many modules
>> follow this format:
>> - Assuming a PM data file contains only a single data set, the module
>> should contain a function "read" that takes either a file name or a file
>> handle as the argument.
> Unfortunately, the situation is a bit mixed up: there are basically 
> three file formats for PM data: csv files (which can contain one or 
> more data sets, or 'plates') and yaml/json, which can also contain 
> some metadata. I would therefore take a similar approach to the SeqIO 
> module, with a parse() function and a read() function that raises an 
> exception if the file contains more than one record.
>
>> - The module should contain a class (typically called "Record") that
>> can store the data in the data file. The "read" function returns an
>> object of this class.
>> - Try to avoid third-party dependencies if at all possible.
> So far the dependencies would be PyYAML (for the yaml/json parsing, 
> but maybe I could use the stdlib json module) and numpy/scipy for the 
> extraction of curve parameters. Does this sound OK?
>>
>> Would it make sense to have a single Bio.Microarray module that can
>> house the various microarray parsers (PM, Affy, others)?
> I don't know if that would be a good strategy: the Phenotype 
> Microarrays are very different from the other proper microarrays; how 
> about a "phenomics" module?
>
>>
>> Best,
>> -Michiel.
> Kind regards,
> Marco
>

-- 
-------------------------------------------------
Marco Galardini, PhD
Dipartimento di Biologia
Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)

e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone:  +39 055 4574737
mobile: +39 340 2808041
-------------------------------------------------



