[Biopython-dev] Reading sequences: FormatIO, SeqIO, etc

Wed Aug 2 11:00:46 UTC 2006

> Question One
> ============
> Is reading sequence files an important function to you, and if so which
> file formats in particular (e.g. Fasta, GenBank, ...)?

Yes. FASTA.

> Question Two - Reading Fasta Files
> ==================================
> Which of the following do you currently use (and why)?:
>
> (f) Other (Could you tell us more?)

I have written my own short iterator so that my code is portable
without requiring Biopython to be installed.
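
A minimal sketch of what such a standalone iterator might look like,
assuming plain FASTA input where each record starts with a ">" line.
The name read_fasta is hypothetical and this is an illustration only,
not the actual code referred to above.

```python
import io


def read_fasta(handle):
    """Yield (defline, sequence) tuples from an open FASTA file handle.

    Hypothetical helper: the definition line is returned as plain text
    (without the leading ">") and is not parsed any further.
    """
    defline = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record begins; emit the previous one, if any.
            if defline is not None:
                yield defline, "".join(chunks)
            defline = line[1:]
            chunks = []
        elif defline is not None:
            # Sequence lines are concatenated; text before the first
            # ">" line is ignored.
            chunks.append(line.strip())
    # Emit the final record at end of file.
    if defline is not None:
        yield defline, "".join(chunks)


# Usage with an in-memory file; a real file handle works the same way.
example = io.StringIO(">seq1 first example\nACGT\nAC\n>seq2\nGG\n")
for defline, sequence in read_fasta(example):
    print(defline, len(sequence))
```

Because it is a generator, records are produced one at a time in file
order, so large files need not be held in memory.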

> Question Three - index_file based dictionaries
> ==============================================
> Do you use any of the following:

No.

> Question Four - Record Access...
> ================================
> When loading a file with multiple sequences do you use:
>
> (a) An iterator interface (e.g. Bio.Fasta.Iterator) to give you the
> records one by one in the order from the file.

Yes.

> Question Five - Fasta files: FastaRecord or SeqRecord
> =====================================================
> If you use Fasta files, do you want to get records returned as FastaRecords
> or as SeqRecords?  If SeqRecords, do you use your own title2ids mapping?

SeqRecords. I hate it when an interface tries to parse the definition
line for me. Perhaps a set of standard definition line parsers should
be provided so that one can choose, but usually I would rather have
plain text and parse it myself.
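
One way the "choose your own definition line parser" idea could be
sketched is below. All names here (keep_plain, split_id_description,
apply_defline_parser) are hypothetical and are not part of Biopython;
the default leaves the definition line as plain text.

```python
def keep_plain(defline):
    """Default parser: return the definition line untouched."""
    return defline


def split_id_description(defline):
    """Split at the first space into (identifier, description)."""
    identifier, _, description = defline.partition(" ")
    return identifier, description


def apply_defline_parser(records, parse_defline=keep_plain):
    """Apply the chosen parser to each (defline, sequence) pair.

    `records` is any iterable of (defline, sequence) tuples; the
    caller decides which parser, if any, touches the definition line.
    """
    for defline, sequence in records:
        yield parse_defline(defline), sequence


records = [("seq1 Homo sapiens", "ACGT")]
# Plain text by default, parsed only when explicitly requested.
print(list(apply_defline_parser(records)))
print(list(apply_defline_parser(records, split_id_description)))
```

The point of the design is that parsing is opt-in: nothing is done to
the definition line unless the caller passes a parser.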

> Question Six - Martel, Scanners and Consumers
> ==============================================
> Some of BioPython's existing parsers (e.g. those using Martel) use an
> event/callback model, where the scanner component generates parsing
> events which are dealt with by the consumer component.
>
> Do any of you use this system to modify existing parser behaviour, or
> use it as part of your own personal file parser?

No.
-- 
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute