[BioPython] calculate F-Statistics from SNP data

Peter biopython at maubp.freeserve.co.uk
Thu Oct 23 09:57:38 UTC 2008


On Thu, Oct 23, 2008, Giovanni Marco Dall'Olio wrote:
> On Wed, Oct 22, Peter wrote:
>> On Wed, Oct 22, Giovanni Marco Dall'Olio wrote:
>> >
>> > Iterators are more difficult to implement for Ped files, because in this
>> > format every line of the file is an individual, so to write an iterator
>> > which iterates by population we will need to read at least the first
>> > column of every line in the whole file.
>>
>> It sounds like for Ped files it would make more sense to iterate over
>> the individuals.  The mental picture I have in mind is a big
>> spreadsheet, individuals as rows (lines), populations (and other
>> information) as columns.  By having the parser iterate over the
>> individuals one by one, the user could then "simplify" each individual
>> as they are read in, recording in memory just the interesting data.
>> This way the whole dataset need not be kept in memory.
>
> This makes sense.
> Basically, we should write a (Ped/GenePop)Iterator function, which should
> read the file one line at a time, check that it has the correct syntax and is
> not a comment, and then use 'yield' to create a Record object. Am I right?

Yes :)

Python functions written with "yield" are called "generator functions", see:
http://www.python.org/dev/peps/pep-0255/
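
Something along these lines might do as a starting point. This is only a
rough sketch, not existing Biopython code: the names ped_iterator and
PedRecord are made up for illustration, and I am assuming the usual Ped
layout of six leading columns (family, individual, father, mother, sex,
phenotype) followed by two alleles per marker, with '#' marking comment
lines and the family column standing in for the population.

```python
import io
from collections import namedtuple

# Hypothetical record type for illustration; not part of Biopython.
PedRecord = namedtuple(
    "PedRecord",
    "family individual father mother sex phenotype genotypes",
)


def ped_iterator(handle):
    """Yield one PedRecord per individual, i.e. per non-comment line."""
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Assumed layout: six leading columns, then two alleles per marker.
        if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
            raise ValueError("Malformed Ped line %d" % line_number)
        alleles = fields[6:]
        genotypes = tuple(zip(alleles[0::2], alleles[1::2]))
        yield PedRecord(*fields[:6], genotypes=genotypes)


# Made-up example data, using the family column as the population.
example = io.StringIO(
    "# family individual father mother sex phenotype alleles\n"
    "pop1 ind1 0 0 1 1 A A G T\n"
    "pop1 ind2 0 0 2 1 A C G G\n"
    "pop2 ind3 0 0 1 2 C C T T\n"
)

# "Simplify" each individual as it is read: keep only a count per
# population, so the whole dataset never has to sit in memory.
counts = {}
for record in ped_iterator(example):
    counts[record.family] = counts.get(record.family, 0) + 1
print(counts)
```

Because the function uses "yield", each line is only read and parsed when
the loop asks for the next record.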

Peter


