[BioPython] calculate F-Statistics from SNP data

Thu Oct 23 05:41:04 EDT 2008

On Wed, Oct 22, 2008 at 7:26 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Wed, Oct 22, 2008 at 6:10 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
> >
> > Iterators are more difficult to implement for Ped files, because in this
> > format every line of the file is an individual, so to write an iterator
> > which iterates by population we will need to read at least the first
> > field of every line of the whole file.
>
> It sounds like for Ped files it would make more sense to iterate over
> the individuals.  The mental picture I have in mind is a big
> spreadsheet, individuals as rows (lines), populations (and other
> information) as columns.  By having the parser iterate over the
> individuals one by one, the user could then "simplify" each individual
> as they are read in, recording in memory just the interesting data.
> This way the whole dataset need not be kept in memory.

This makes sense.
Basically, we should write a (Ped/GenePop)Iterator function, which should
read the file one line at a time, check that it has correct syntax and is
not a comment, and then use 'yield' to return a Record object. Am I right?
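
As a rough sketch of such an iterator for Ped files: the names PedRecord
and ped_iterator are hypothetical (not existing Biopython code), and the
layout assumed here (six leading columns followed by two alleles per
marker, with '#' starting a comment line) is an assumption about the
usual Ped format. The last lines show the "simplify while reading" idea
by keeping only a count of individuals per population in memory.

```python
import io
from collections import Counter, namedtuple

# Hypothetical record type, not an existing Biopython class. The field
# names follow the six leading columns assumed for a Ped file.
PedRecord = namedtuple(
    "PedRecord",
    "family individual father mother sex phenotype genotypes",
)


def ped_iterator(handle):
    """Yield one PedRecord per individual (one per non-comment line)."""
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Syntax check: six leading columns, then two alleles per marker.
        if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
            raise ValueError("Malformed Ped line %d" % line_number)
        alleles = fields[6:]
        genotypes = tuple(zip(alleles[0::2], alleles[1::2]))
        yield PedRecord(*fields[:6], genotypes=genotypes)


# Made-up example data: three individuals, two markers each.
sample = io.StringIO(
    "# family individual father mother sex phenotype alleles...\n"
    "POP1 ind1 0 0 1 1 A A G T\n"
    "POP1 ind2 0 0 2 1 A C G G\n"
    "POP2 ind3 0 0 1 2 C C T T\n"
)

# Only the per-population counts are kept, not the whole dataset.
counts = Counter(record.family for record in ped_iterator(sample))
```

Because the function is a generator, each line is parsed only when the
caller asks for the next record, so the full file never has to be held
in memory.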

>
> > I was also thinking of starting to use a database to store data, instead
> > of files. This would probably solve the problem of running out of memory
> > when parsing those long files.
> > I would probably use sqlalchemy to interface with this database: this is
> > why I would like to implement Population and Individual objects, as they
> > will fit better with relational mapping.
>
> That would mean adding sqlalchemy as another (optional) dependency for
> Biopython.  If you could use MySQLdb instead that would be better as
> several existing modules use this.  However, I would encourage you to
> avoid any database if possible because this makes the installation
> much more complicated for the end user, and imposes your own arbitrary
> schema as well.  It also makes setting up suitable unit tests a
> pain.
>

Don't worry, I am not going to do that.
I will probably use sqlalchemy only in my scripts; I will use it to retrieve
data from the database, and then create Population/Marker/Individual objects
using the code I am writing now, or adapt the objects created by
sqlalchemy to be compatible with the functions I will have to use.
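
To illustrate the first option (building the objects from database
rows), here is a minimal sketch. It uses the standard library sqlite3
module purely as a stand-in for the real database and for sqlalchemy,
and the table layout, the class attributes and the function name
populations_from_rows are all made up for the example.

```python
import sqlite3


class Individual(object):
    """Hypothetical container for one individual's genotypes."""

    def __init__(self, name):
        self.name = name
        self.genotypes = {}  # marker name -> (allele1, allele2)


class Population(object):
    """Hypothetical container for the individuals of one population."""

    def __init__(self, name):
        self.name = name
        self.individuals = {}  # individual name -> Individual


def populations_from_rows(rows):
    """Group (population, individual, marker, allele1, allele2) rows."""
    populations = {}
    for pop_name, ind_name, marker, allele1, allele2 in rows:
        if pop_name not in populations:
            populations[pop_name] = Population(pop_name)
        individuals = populations[pop_name].individuals
        if ind_name not in individuals:
            individuals[ind_name] = Individual(ind_name)
        individuals[ind_name].genotypes[marker] = (allele1, allele2)
    return populations


# Stand-in in-memory database with a made-up table layout and data.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE genotype (population TEXT, individual TEXT, "
    "marker TEXT, allele1 TEXT, allele2 TEXT)"
)
connection.executemany(
    "INSERT INTO genotype VALUES (?, ?, ?, ?, ?)",
    [
        ("POP1", "ind1", "rs1", "A", "A"),
        ("POP1", "ind1", "rs2", "G", "T"),
        ("POP2", "ind2", "rs1", "A", "C"),
    ],
)
cursor = connection.execute(
    "SELECT population, individual, marker, allele1, allele2 FROM genotype"
)
populations = populations_from_rows(cursor)
connection.close()
```

Since populations_from_rows only needs an iterable of tuples, the
database access stays in the user's own script and the object code does
not depend on any database library.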

>
> Peter
>

-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it