[Biopython] Memory use - alignment formats

Peter Cock p.j.a.cock at googlemail.com
Thu Sep 5 12:16:19 EDT 2013


Hi Tanya,

For any alignment-based format, SeqIO calls AlignIO,
which loads all the records into memory at once to build
an MSA object holding a list of all the SeqRecord
objects. SeqIO handles FASTA files itself, so it doesn't
do this.

There is no simple answer for your specific need with
PHYLIP format - potentially something more memory
efficient could be done for the non-interlaced PHYLIP
formats...
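
As a rough illustration of the non-interlaced idea only (this is
not Biopython code, and write_sequential_phylip is a hypothetical
helper name), a sequential PHYLIP writer could stream records one
at a time, provided the number of sequences is known up front for
the header line. The sketch assumes strict 10-character names and
writes each sequence on a single line, which not every PHYLIP
reader will necessarily accept:

```python
import io
from itertools import chain


def write_sequential_phylip(records, n_records, handle):
    """Stream (name, sequence) pairs to handle as sequential PHYLIP.

    Hypothetical helper, not part of Biopython. n_records must be
    known in advance because the header comes before any sequence.
    """
    records = iter(records)
    try:
        first = next(records)
    except StopIteration:
        raise ValueError("need at least one record")
    length = len(first[1])
    # Header line: number of sequences, then alignment length.
    handle.write(" %d %d\n" % (n_records, length))
    written = 0
    for name, seq in chain([first], records):
        if len(seq) != length:
            raise ValueError("%s has length %d, expected %d"
                             % (name, len(seq), length))
        # Strict PHYLIP: name truncated or padded to 10 characters.
        handle.write(name[:10].ljust(10) + seq + "\n")
        written += 1
    if written != n_records:
        raise ValueError("header promised %d records, wrote %d"
                         % (n_records, written))
    return written


def demo_records():
    # Stand-in generator; real sequences would be several Mb each.
    yield "alpha", "ACGTACGT"
    yield "beta", "ACGTTCGT"


out = io.StringIO()
write_sequential_phylip(demo_records(), 2, out)
phylip_text = out.getvalue()
```

Only one record is held at a time, at the cost of having to count
the records (or know the count) before writing starts.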

If you can work with FASTA and SeqIO instead, that
would be best.
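
A minimal sketch of that approach, assuming Biopython is installed.
The make_records generator and its toy sequences are placeholders
for however the real records are produced, and an in-memory
StringIO stands in for a real output file:

```python
import io

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def make_records(n):
    # Placeholder generator: yields one SeqRecord at a time, so the
    # full set of records never needs to exist as a list.
    for i in range(n):
        yield SeqRecord(Seq("ACGT" * 5), id="seq%d" % i, description="")


handle = io.StringIO()  # in practice, an open file handle
count = SeqIO.write(make_records(3), handle, "fasta")
fasta_text = handle.getvalue()
```

SeqIO.write returns the number of records written, which is handy
as a sanity check.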

Peter

On Wed, Sep 4, 2013 at 11:41 AM, Tanya Golubchik
<golubchi at stats.ox.ac.uk> wrote:
> Hello,
>
> I'm looking for the most memory-efficient way to write a large number of
> very long sequences (several Mb each) to a file. This works easily with a
> generator passed to SeqIO.write if I'm writing in a sequential format like
> multifasta, but what about, say, phylip?
>
> Is it better/equivalent to convert the alignment to a list first (obviously
> using a lot of memory in the process), or to write to a multifasta file,
> then use SeqIO.convert?
>
> Thanks,
> Tanya

