[Biopython-dev] Wrapping sequences in Fasta output
Peter
biopython-dev at maubp.freeserve.co.uk
Thu Aug 9 08:10:22 UTC 2007
Michiel De Hoon wrote:
> Sebastian Bassi wrote:
>> On 8/7/07, Michiel De Hoon <mdehoon at c2b2.columbia.edu> wrote:
>>> I was wondering why we go through the Fasta reader/writer instead of
>>> reading/writing the file contents directly, as in
>>> for filename, input_file in zip(pair, input_files):
>>>     input_file.close()
>>>     file(input_file.name, "w").write(file(filename).read())
>> The old Fasta writer used to write a 70-column formatted fasta file.
>> Your method (and I think also the new seq.io) writes the fasta data as
>> one big line.
Maybe Wise doesn't like its input as one long line?
> Peter, can we change the behavior of SeqIO.write so that it writes the fasta
> data in some fixed column format? For comparison, Bioperl appears to use a
> column width of 60 characters:
>
> http://www.bioperl.org/wiki/FASTA_sequence_format
>
> --Michiel.
That would be easy, and might improve compatibility with some tools
that recommend lines be at most 80 letters long. 60 does seem to be
a common default.
My personal preference is for no line breaks, partly because I tend to
work more with domain sequences (usually less than 100 characters). With
no wrapping, every record takes exactly two lines (title and sequence),
so when viewing a file in a text editor I can simply halve the line
number to get the record number.
Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files
with a max sequence line length of 60.
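
For illustration, wrapping at a fixed width could look roughly like the
sketch below. This is not the Bio.SeqIO code; the function name
wrap_fasta_record and its arguments are hypothetical, and the default of
60 simply follows the width discussed above.

```python
def wrap_fasta_record(title, sequence, width=60):
    """Return one FASTA record as text, wrapping the sequence at `width` columns.

    Hypothetical helper for illustration only; not part of Bio.SeqIO.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    lines = [">" + title]
    # Slice the sequence into consecutive chunks of at most `width` letters.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# A 160-letter sequence yields a title line plus lines of 60, 60 and 40 letters.
record = wrap_fasta_record("example", "ACGT" * 40)
print(record, end="")
```

A sequence shorter than the width still comes out on a single line, so
short domain sequences would keep the two-lines-per-record layout.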
Peter