[BioPython] calculate F-Statistics from SNP data

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Wed Oct 22 17:10:45 UTC 2008


On Wed, Oct 22, 2008 at 6:03 PM, Tiago Antão <tiagoantao at gmail.com> wrote:

> On Wed, Oct 22, 2008 at 11:34 AM, Peter <biopython at maubp.freeserve.co.uk>
> wrote:
> > I have not looked at the specifics here, but adopting an iterator
> > approach might make sense - returning the entries one by one as parsed
> > from the file.  This is the idea for the Bio.SeqIO and Bio.AlignIO
> > parsers.  The user can then turn the entries into a list (if they have
> > enough memory), filter them as they arrive, etc.  For example, you
> > could compile a list of only those desired population entries,
> > discarding the others on the fly.
>
> I will have a look at iterators in Python. This idea from Giovanni is
> actually also floating around among current users of GenePop data, who
> have exactly the same problem (loooong records).
>
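A minimal sketch of the iterator approach Peter describes, applied to PED-style lines. The parser is a generator that yields one record per line, and the population filter discards unwanted individuals on the fly, so only the kept records ever need to be held in a list. `Individual`, `parse_ped` and `filter_populations` are hypothetical names (not Biopython API). The column layout assumes the standard PLINK PED order, with the first column used as the population label.

```python
import io
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Set, TextIO


@dataclass
class Individual:
    """One PED line (hypothetical container, not a Biopython class)."""
    population: str    # first column; the family ID in the PLINK PED layout
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    genotypes: List[str] = field(default_factory=list)  # remaining allele columns


def parse_ped(handle: TextIO) -> Iterator[Individual]:
    """Yield one Individual per non-blank line; the file is never loaded whole."""
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        yield Individual(*fields[:6], genotypes=fields[6:])


def filter_populations(records: Iterable[Individual], wanted: Set[str]) -> Iterator[Individual]:
    """Lazily keep only individuals whose population label is in `wanted`."""
    return (rec for rec in records if rec.population in wanted)


# Usage on a tiny in-memory example (made-up data).
SAMPLE_PED = (
    "POP1 ind1 0 0 1 -9 A A G T\n"
    "POP2 ind2 0 0 2 -9 A C G G\n"
    "POP1 ind3 0 0 1 -9 C C T T\n"
)
kept = list(filter_populations(parse_ped(io.StringIO(SAMPLE_PED)), {"POP1"}))
```

Passing an open file handle instead of `io.StringIO` works the same way, since both are iterated line by line.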


Iterators are more difficult to implement for Ped files, because in this
format every line of the file is an individual, so to write an iterator
which iterates by population we would need to read at least the first column
of every line in the whole file.
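Because the individuals of one population can be scattered anywhere in a PED file, a per-population iterator has to scan the whole file once first. A minimal sketch, assuming the first whitespace-separated column is the population label: one pass stores only that label and the byte offset of each line, after which a single population can be streamed by seeking to its lines. `index_populations` and `iter_population` are hypothetical helpers; the index still grows with the number of individuals, but not with the number of SNPs.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, Iterator, List


def index_populations(path: str) -> Dict[str, List[int]]:
    """Single pass: record the byte offset of each line under its first column."""
    index: Dict[str, List[int]] = defaultdict(list)
    with open(path, "rb") as handle:  # binary mode so tell()/seek() are plain byte offsets
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break  # end of file
            head = line.split(maxsplit=1)  # only the first column is decoded
            if head:
                index[head[0].decode("utf-8")].append(offset)
    return dict(index)


def iter_population(path: str, offsets: List[int]) -> Iterator[List[str]]:
    """Yield the fields of each individual in one population, one line at a time."""
    with open(path, "rb") as handle:
        for offset in offsets:
            handle.seek(offset)
            yield handle.readline().decode("utf-8").split()


# Usage with a small temporary file (made-up data).
with tempfile.NamedTemporaryFile("w", suffix=".ped", delete=False) as tmp:
    tmp.write("POP1 ind1 0 0 1 -9 A A\nPOP2 ind2 0 0 2 -9 A C\nPOP1 ind3 0 0 1 -9 C C\n")
    ped_path = tmp.name
idx = index_populations(ped_path)
pop1_ids = [fields[1] for fields in iter_population(ped_path, idx["POP1"])]
os.remove(ped_path)
```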
I was also thinking of starting to use a database to store the data, instead
of files. This would probably solve the out-of-memory problem when parsing
those long files.
I would probably use sqlalchemy to interface with this database: this is why
I would like to implement Population and Individual objects, since they would
fit better with object-relational mapping.
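A minimal sketch of the database idea. The post proposes sqlalchemy for the mapping; to keep the example dependency-free it uses the standard-library `sqlite3` module instead, with the row-to-object mapping written by hand. The two-table schema, the `Population`/`Individual` dataclasses and the helper names are hypothetical, and the schema deliberately keeps only the population label, individual ID and genotypes from each PED line. Lines are inserted one at a time, so an open file handle could be passed straight to `load_ped` without reading the whole file into memory.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Individual:
    individual_id: str
    genotypes: List[str]


@dataclass
class Population:
    name: str
    individuals: List[Individual] = field(default_factory=list)


# Hypothetical schema: one row per population, one row per individual.
SCHEMA = """
CREATE TABLE population (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE individual (
    id            INTEGER PRIMARY KEY,
    population_id INTEGER NOT NULL REFERENCES population(id),
    individual_id TEXT NOT NULL,
    genotypes     TEXT NOT NULL  -- alleles stored space-separated
);
"""


def load_ped(conn: sqlite3.Connection, lines: Iterable[str]) -> None:
    """Insert PED lines one at a time; pedigree, sex and phenotype columns are dropped."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        conn.execute("INSERT OR IGNORE INTO population (name) VALUES (?)", (fields[0],))
        conn.execute(
            "INSERT INTO individual (population_id, individual_id, genotypes) "
            "SELECT id, ?, ? FROM population WHERE name = ?",
            (fields[1], " ".join(fields[6:]), fields[0]),
        )
    conn.commit()


def get_population(conn: sqlite3.Connection, name: str) -> Population:
    """Map the rows of one population back onto Population/Individual objects."""
    rows = conn.execute(
        "SELECT i.individual_id, i.genotypes FROM individual AS i "
        "JOIN population AS p ON p.id = i.population_id "
        "WHERE p.name = ? ORDER BY i.id",
        (name,),
    )
    return Population(name, [Individual(ind, geno.split()) for ind, geno in rows])


# Usage with an in-memory database and made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_ped(conn, [
    "POP1 ind1 0 0 1 -9 A A G T",
    "POP2 ind2 0 0 2 -9 A C G G",
    "POP1 ind3 0 0 1 -9 C C T T",
])
pop1 = get_population(conn, "POP1")
```

With sqlalchemy, the two dataclasses would instead be declared as mapped classes linked by a relationship, and the hand-written SELECT in `get_population` would be replaced by the ORM's query machinery.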

-- 
-----------------------------------------------------------

My Blog on Bioinformatics (Italian): http://bioinfoblog.it



