[Biopython-dev] Bio.SeqIO
Peter
biopython-dev at maubp.freeserve.co.uk
Mon Jan 15 20:04:34 UTC 2007
Michiel de Hoon wrote:
> In my opinion, the new Bio.SeqIO code is a huge improvement to
> Biopython, so I'd be happy to make a new release for it.
>
> ...
>
> For Bio.SeqIO, we're also in pretty good shape, as far as I can tell.
> From what I remember, the remaining issues were
> 1) Which functionality to include, in particular
> a) if functions should accept file names in addition to file handles;
I have decided to follow Michiel's stance on this issue: handles only.
> b) if functions should infer the file format from the file extension,
> the file content, or otherwise.
Right now the file format string is optional, and if it is omitted the
file extension (via handle.name) is used to try to guess the format.
It would be trivial to remove this functionality and make format a
required argument.
We could at a later date choose to add limited support for format
guessing based on file contents without altering the function parameters
(i.e. the API).
Both these features would be nice to have (speaking as a user) but then
again, am I prepared to support the headaches they may cause later on?
I'm wavering on this issue (having previously been in favour of
including the format guessing).
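To make the extension-based guessing concrete, here is a minimal sketch of the idea. It is not the code in Bio.SeqIO: the function name guess_format and the extension table are made up for illustration, and only the use of handle.name comes from the description above.

```python
import os

# Hypothetical extension-to-format table; the entries are illustrative only
# and do not claim to match what Bio.SeqIO actually recognises.
EXTENSION_TO_FORMAT = {
    ".fasta": "fasta",
    ".fa": "fasta",
    ".gbk": "genbank",
    ".gb": "genbank",
}


def guess_format(handle, format=None):
    """Return the explicit format if given, else guess from handle.name."""
    if format is not None:
        return format
    # Not every handle has a usable name (e.g. an in-memory handle),
    # so fall back to an empty string in that case.
    name = getattr(handle, "name", "")
    if not isinstance(name, str):
        name = ""
    extension = os.path.splitext(name)[1].lower()
    if extension not in EXTENSION_TO_FORMAT:
        raise ValueError("Could not guess the file format; please specify it")
    return EXTENSION_TO_FORMAT[extension]
```

The sketch also shows where the headaches come from: a handle with no name, or a file with an unexpected extension, can only be handled by raising an error and asking for an explicit format anyway.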
Item 1(c) on Michiel's list could have been whether we need the three
"helper functions" which turned a file into a SeqRecord list, dictionary
or alignment.
Again, I have come round to Michiel's view and removed these as they
were just simple wrappers for list, SequencesToDictionary and
SequencesToAlignment.
> 2) What are the best names for the functions that the user will see.
The good news is that after that little spring clean there are fewer
functions to name - just these four really:
SequenceIterator, once known as FileToSequenceIterator and before that
File2SequenceIterator. Now takes just an input file handle and an
optional file format. Returns a SeqRecord iterator.
SequencesToDictionary - takes SeqRecord iterator or list, plus an
optional function to define the keys, and returns a dictionary.
SequencesToAlignment - takes SeqRecord iterator or list, and returns an
alignment object. Perhaps this functionality should be included in the
alignment class itself...
WriteSequences, once known as SequencesToFile - takes a SeqRecord
iterator or list, an output handle, and a format string. Intended for
use on a whole file at once (i.e. the general case where there may be
headers/footers etc). This does not let you do incremental writes, one
for each record (which would be possible for some formats like GenBank
or fasta).
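As a rough illustration of the shape of two of these functions, here is a self-contained sketch. It deliberately avoids Biopython: Record is a stand-in for SeqRecord, the names records_to_dict and write_records are hypothetical, and the duplicate-key check and the returned record count are my assumptions rather than described behaviour.

```python
import io
from collections import namedtuple

# Stand-in for a SeqRecord: just an identifier and a sequence string.
Record = namedtuple("Record", ["id", "seq"])


def records_to_dict(records, key_function=None):
    """Build a dictionary from an iterator or list of records.

    key_function is optional; by default the record's id is the key.
    """
    result = {}
    for record in records:
        key = record.id if key_function is None else key_function(record)
        # Assumption: refuse duplicate keys rather than silently overwrite.
        if key in result:
            raise ValueError("Duplicate key %r" % (key,))
        result[key] = record
    return result


def write_records(records, handle, format):
    """Write all the records to the handle in one call.

    Only a bare-bones FASTA layout is handled in this sketch.
    Returns the number of records written (an assumption).
    """
    if format != "fasta":
        raise ValueError("Only 'fasta' is handled in this sketch")
    count = 0
    for record in records:
        handle.write(">%s\n%s\n" % (record.id, record.seq))
        count += 1
    return count


records = [Record("alpha", "ACGT"), Record("beta", "GGCC")]
by_id = records_to_dict(iter(records))
output = io.StringIO()
written = write_records(records, output, "fasta")
```

Both functions accept either a list or an iterator, and the writer sees the whole collection in a single call, which is what leaves room for formats that need a header or footer.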
Peter