[Biopython-dev] Relaxing SeqIO, AlignIO, etc write functions?

Peter biopython at maubp.freeserve.co.uk
Fri Mar 19 06:45:55 EDT 2010


Hi Sebastian,

On Thu, Mar 18, 2010 at 7:39 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> On Thu, Mar 18, 2010 at 4:01 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> There was another +1 vote from Marshall Hampton, and no
>> comments against (so far). Let's leave it a few days, but unless
>> anyone speaks out in favour of the status quo (keeping the
>> current strict check in the write function), we make the change.
>
> If we are going to change this, why not set "fasta" as the default
> input/output format? This would also result in less typing when
> processing fasta files (most of the time in my workflow at least).

Give an inch and they'll take a mile ;)

I agree that FASTA is likely to be the most common file format
for most users, but I don't think we should make it the default.
One specific reason is that the FASTA parser allows and ignores
a header comment, so you will get confusing results if the
file is not actually a FASTA file (typically it will parse other
text files like GenBank, EMBL or FASTQ with no errors, but
will return no records). I am worried that people will assume
that if they don't specify the format, Biopython will
determine it automatically - which it won't.
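
To illustrate the failure mode, here is a minimal sketch. It uses a
toy stand-in parser (toy_fasta_parse is a hypothetical name, not
Biopython's actual code) that, like the behaviour described above,
silently skips anything before the first ">" line. Fed a
GenBank-style file, it raises no error and simply yields nothing:

```python
from io import StringIO


def toy_fasta_parse(handle):
    """Yield (title, sequence) pairs from FASTA-like text.

    Toy stand-in for illustration only: any lines before the first
    '>' line are treated as a header comment and silently ignored.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            chunks.append(line.strip())
        # Otherwise we are still before the first '>' line: skip it.
    if title is not None:
        yield title, "".join(chunks)


# A made-up GenBank-style snippet (no line starts with '>').
genbank_text = (
    "LOCUS       EXAMPLE   12 bp    DNA\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)
fasta_text = ">seq1 demo\nACGT\nACGT\n"

genbank_records = list(toy_fasta_parse(StringIO(genbank_text)))
fasta_records = list(toy_fasta_parse(StringIO(fasta_text)))

print(genbank_records)  # prints []
print(fasta_records)    # prints [('seq1 demo', 'ACGTACGT')]
```

With a silent default of "fasta", a user who passed the wrong file
would see an empty result rather than an error.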

[Yes, I'm talking about the read/parse functions here, but it
would be odd if the write function defaulted to FASTA when they
did not.]

Also, could you clarify if you are in favour of relaxing the
requirement that the write function takes a list/iterator of
records/alignments to allow a single SeqRecord or alignment?
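
For concreteness, the relaxation could look roughly like the sketch
below. ToyRecord and relaxed_write are hypothetical names used only
for illustration (they are not the Biopython API), and the output
is a bare-bones FASTA-like format; the point is just the
"single record or iterable of records" check:

```python
from dataclasses import dataclass
from io import StringIO


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a SeqRecord (illustration only)."""
    id: str
    seq: str


def relaxed_write(records, handle):
    """Write one record, or an iterable of records, to handle.

    Returns the number of records written.
    """
    # The relaxation: wrap a lone record in a list instead of
    # rejecting it with a strict type check.
    if isinstance(records, ToyRecord):
        records = [records]
    count = 0
    for record in records:
        handle.write(f">{record.id}\n{record.seq}\n")
        count += 1
    return count


single_out = StringIO()
single_count = relaxed_write(ToyRecord("a", "ACGT"), single_out)

many_out = StringIO()
many_count = relaxed_write(
    iter([ToyRecord("a", "ACGT"), ToyRecord("b", "GGCC")]), many_out
)
```

Here single_count is 1 and many_count is 2, so existing callers
that pass a list or iterator are unaffected.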

Thanks,

Peter

