[Biopython-dev] Bio.SeqIO.convert function?

Peter biopython at maubp.freeserve.co.uk
Wed Jul 29 03:43:58 EDT 2009


On Tue, Jul 28, 2009 at 11:09 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
>
>> As a possible enhancement to Bio.SeqIO, I've been toying with
>> the idea of introducing another function, essentially to provide
>> the following functionality:
>>
>> def convert(in_handle, in_format, out_handle, out_format, alphabet=None) :
>>     """Converts between two file formats, returns number of records."""
>>     records = parse(in_handle, in_format, alphabet)
>>     return write(records, out_handle, out_format)
> [...]
>> However, that isn't the real aim here. Having a function like this
>> would allow a number of file-format-specific optimisations,
>> instead of always using SeqIO.parse to create SeqRecord objects
>> which then get written out by SeqIO.write as shown above.
>
> I like this idea. To the extent that we can optimize popular
> conversions, this gives us a standard place to put the code. There
> are going to be lots of FASTQ to FASTA conversions, and being as fast
> as possible is good (notice my avoidance of any more potentially
> misconstrued jokes).
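
To make the optimisation idea concrete, here is a minimal sketch of a FASTQ to FASTA fast path that never builds SeqRecord objects and simply skips the quality lines. The function name `fastq_to_fasta` is hypothetical. The sketch assumes the simple four-line FASTQ layout (no wrapped sequence or quality lines) and writes each sequence on a single line, whereas the SeqIO FASTA writer wraps long sequences.

```python
import io


def fastq_to_fasta(in_handle, out_handle):
    """Hypothetical fast FASTQ -> FASTA converter, returns the record count.

    Assumes four lines per record (title, sequence, '+', quality), so the
    quality line is located by position and never parsed or decoded.
    """
    count = 0
    while True:
        title = in_handle.readline()
        if not title:
            break  # end of file
        if not title.strip():
            continue  # tolerate blank lines between records
        if not title.startswith("@"):
            raise ValueError("Expected '@' at start of record, got %r" % title)
        seq = in_handle.readline().rstrip("\r\n")
        plus = in_handle.readline()
        if not plus.startswith("+"):
            raise ValueError("Expected '+' line, got %r" % plus)
        in_handle.readline()  # quality line, not needed for FASTA output
        # Reuse the full FASTQ title line (minus the '@') as the FASTA title
        out_handle.write(">%s\n%s\n" % (title[1:].rstrip("\r\n"), seq))
        count += 1
    return count
```

For example, `fastq_to_fasta(io.StringIO("@r1\nACGT\n+\nIIII\n"), out)` writes `>r1\nACGT\n` to `out` and returns 1.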

OK, assuming we press ahead with this, the Bio.SeqIO.convert()
function would be the only public API addition, the internals
would all be private. What I had in mind was Bio.SeqIO.convert()
using a dictionary of functions (all with the same arguments),
keyed on a tuple of (in_format, out_format). I was thinking of
using Bio/SeqIO/_convert.py for the individual functions (like
GenBank/EMBL to FASTA/tab, or any FASTQ to FASTA/tab).
Note that I expect in many cases it will be quite simple
to handle several related conversions in one function, and this
should avoid some code duplication. By marking these details
as private, we can of course refine this scheme later.

> Conversion lately seems to be getting worse, not better, with
> all of the alignment and annotation formats springing up.
> Extending this to AlignIO and TreeIO as Eric suggested is
> also great.

Whatever we do for Bio.SeqIO, we can follow the same pattern
for Bio.AlignIO etc.

> So +1 from me,
> Brad

We also basically had a +0 from Michiel and a +1 from Eric.
I like the idea myself, but am not convinced we need it. Maybe
we should put the suggestion forward on the main discussion
list for debate?

Peter


