[BioPython] SeqIO.write() Multiple Calls for fasta

Matthew Strand stran104 at chapman.edu
Sun Dec 14 09:38:34 UTC 2008


Hello,
I have been using SeqIO.write() on fasta files, based on some information in
the API documentation. It says that SeqIO.write() should "probably" be safe
to call multiple times, but in my experience each call overwrites the whole
file, even when the file is opened and closed immediately before and after
the write. Has anyone else had this experience?



I will be rewriting my code to build up large lists of records before writing
each file, which is easy for the example provided below. However, it will
take some work to change the part of the application that runs against our
local BLAST databases for a few days, periodically adding sequences to files.
I'd like to make sure that I'm not the only one with this issue before
rewriting it.
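
A rough sketch of the "collect first, write once" approach, assuming records
can be held in memory as (id, sequence) pairs. It uses plain file writes
rather than SeqIO so that it runs without Biopython. The helper names
species_of, group_by_species and write_groups are hypothetical, and the
species rule is the one from the code sample in this message.

```python
import os
from collections import defaultdict


def species_of(record_id):
    # Same rule as the code sample: the text before the first '|',
    # then the text before the first '-'.
    return record_id.split("|")[0].split("-")[0]


def group_by_species(records):
    """Collect (record_id, sequence) pairs into one list per species."""
    groups = defaultdict(list)
    for record_id, sequence in records:
        groups[species_of(record_id)].append((record_id, sequence))
    return dict(groups)


def write_groups(groups, directory=".", suffix=".seq"):
    """Write each species file exactly once, so 'w' mode truncates it only once."""
    for species, items in groups.items():
        path = os.path.join(directory, species + suffix)
        with open(path, "w") as out:
            for record_id, sequence in items:
                out.write(">%s\n%s\n" % (record_id, sequence))
```

With real SeqRecord objects, the inner loop could presumably be replaced by
one SeqIO.write() call per species with the whole list of records.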

---------BEGIN API Documentation Quote
Output - Advanced
=================
The effect of calling write() multiple times on a single file will vary
depending on the file format, and is best avoided unless you have a strong
reason to do so.

Trying this for certain alignment formats (e.g. phylip, clustal, stockholm)
would have the effect of concatenating several multiple sequence alignments
together. Such files are created by the PHYLIP suite of programs for
bootstrap analysis.

For sequential files formats (e.g. fasta, genbank) each "record block" holds
a single sequence. For these files it would probably be safe to call
write() multiple times.
---------END API Documentation Quote


---------BEGIN Code Sample to take a bunch of fasta files with multiple
species and generate individual files for each species.
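import os
from Bio import SeqIO

# kogid (a list of ID strings) is defined elsewhere in the application.
for j in range(1, len(kogid)):
    name = "EXT-CLB-" + kogid[j] + ".seq"
    if os.path.exists(name):
        handle = open(name, "r")
        records = list(SeqIO.parse(handle, "fasta"))
        handle.close()
        for record in records:
            # Species ID: text before the first '|', then before the first '-'.
            speciesID = record.id.split('|')[0].split('-')[0]
            # The output file is reopened in 'w' mode for every record.
            outFile = open(speciesID + ".seq", 'w')
            SeqIO.write([record], outFile, "fasta")
            outFile.close()
            print("Added a record for " + speciesID)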
--------END Code Sample
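
One thing that may be worth ruling out is the file mode rather than
SeqIO.write() itself: in Python, opening a file with 'w' truncates it, while
'a' appends to the end. The sketch below shows only that difference, using
plain file writes and made-up record IDs. It does not call Biopython, so
whether the same holds when SeqIO.write() is given an append-mode handle is
an assumption to check.

```python
import os
import tempfile


def write_record(path, record_id, sequence, mode):
    """Open path in the given mode, write one FASTA record, and close it."""
    with open(path, mode) as out:
        out.write(">%s\n%s\n" % (record_id, sequence))


def count_records(path):
    """Count the FASTA header lines in a file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


workdir = tempfile.mkdtemp()
truncated = os.path.join(workdir, "w_mode.seq")
appended = os.path.join(workdir, "a_mode.seq")

# Two made-up records for the same species, written one call at a time.
for record_id, sequence in [("sp1-a|x", "ACGT"), ("sp1-b|y", "GGCC")]:
    write_record(truncated, record_id, sequence, "w")  # truncates on every open
    write_record(appended, record_id, sequence, "a")   # adds to the end

print(count_records(truncated))  # 1: only the last record survives
print(count_records(appended))   # 2: both records are kept
```

If that is the cause, append mode will also keep adding to files left over
from an earlier run, so stale output files would need removing first.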



Thank you for your responses,
-Matthew J


