[Biopython-dev] format autodetection

Andrew Dalke adalke at mindspring.com
Mon Dec 31 05:50:08 EST 2001

>I was confused by what was going on in the code until I realized
>that there are actually two slightly different uses of the word

After some early attempts, I decided that "recognize" just
wasn't the right word to use.  I now use "identify" instead,
and to keep the two words from being confused, identify
returns a 'Format'.

format = Bioformat.identify(open("file.dat"))
if format is not None:
    print format.name

>  In the first use,
>> def _recognizeFile(parser, infile):
>recognize is used as a predicate for whether the parser can handle the
>format of the data in infile.

I've kept that usage internally.

>In the second,
>> class RecognizeFormats:
>> [...]
>>   def recognizeFile(self, infile):
>recognize selects between multiple formats and returns the appropriate
>one for the data.

This form is now known as 'identify'.
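
To make the two uses concrete, here is a minimal sketch.  The
names (FormatSketch, recognizes, the prefix test) are made up
for illustration and are not the actual Bioformat code; it
also works on a string instead of a file to keep it short.

```python
class FormatSketch(object):
    """Hypothetical stand-in for a 'Format'; not the Bioformat class."""

    def __init__(self, name, prefix):
        self.name = name
        # Assumption for this sketch only: a format can be spotted
        # by the text its records start with.
        self._prefix = prefix

    def recognizes(self, text):
        # Predicate: can this one format handle the data?
        return text.startswith(self._prefix)


def identify(text, formats):
    """Pick among several formats; return the first match or None."""
    for fmt in formats:
        if fmt.recognizes(text):
            return fmt
    return None


# Two toy formats, distinguished only by their leading text.
FORMATS = [FormatSketch("fasta", ">"), FormatSketch("genbank", "LOCUS")]
```

So "recognizes" answers yes/no for a single format, while
"identify" hands back a format object (or None) chosen from
many.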

I wasn't explicitly aware of the distinction; I just found
that the names didn't scan well in English.  I wrote some sample
code and tried to make the names fit the way I described what
was going on.  I ended up with:
   "I want to identify the format used"
   "First, we see if this recognizes the format"

>It would clear things up if one of them were renamed something else,
>e.g. the first use is renamed as "handlesFile" or "acceptsFile".


>The file-like handle in File.py is incomplete for this purpose.  It
>can only push back stuff as lines, and not as other blocks of data.
>It should not be hard to add that capability, though.

Yeah, I saw that.  I've included a 'ReseekFile' which buffers
everything read, and allows reseeking to the original position
(and only the original position).  It only supports the 'read'
method, since that's all Martel needs.  It only allows tell()
at the beginning, and only allows seeks to that position.

It has a new method called 'nobuffer', which clears the buffer
after it's all been (re)read.  This prevents the ReseekFile
from storing everything even after the file has been parsed.
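
Roughly, the idea can be sketched as below.  This is not the
actual ReseekFile code: the class name, the choice to report
the original position as 0, and the use of IOError are all
assumptions made for the sketch.

```python
import io


class ReseekFileSketch(object):
    """Illustrative only: buffer reads so we can rewind to the start."""

    def __init__(self, infile):
        self._infile = infile
        self._buffer = infile.read(0)  # empty str or bytes, matching infile
        self._offset = 0        # index in _buffer of the next saved item
        self._count = 0         # how far we are past the original position
        self._buffering = True  # False once nobuffer() has been called

    def read(self, size=-1):
        if size is None or size < 0:
            saved = self._buffer[self._offset:]
            fresh = self._infile.read()
        else:
            saved = self._buffer[self._offset:self._offset + size]
            missing = size - len(saved)
            fresh = self._infile.read(missing) if missing > 0 else saved[:0]
        self._offset += len(saved)
        if self._buffering:
            # Remember new data so a later seek(0) can replay it.
            self._buffer += fresh
            self._offset += len(fresh)
        elif self._offset >= len(self._buffer):
            # Replay is finished; stop holding on to the saved data.
            self._buffer = self._buffer[:0]
            self._offset = 0
        self._count += len(saved) + len(fresh)
        return saved + fresh

    def tell(self):
        if self._count != 0:
            raise IOError("tell() only works at the original position")
        return 0  # assumption: the original position is reported as 0

    def seek(self, pos, whence=0):
        if pos != 0 or whence != 0:
            raise IOError("can only seek to the original position")
        if self._offset != self._count:
            raise IOError("buffer was discarded; cannot reseek")
        self._offset = 0
        self._count = 0

    def nobuffer(self):
        # Stop saving new data; drop saved data once it has been replayed.
        self._buffering = False
        if self._offset >= len(self._buffer):
            self._buffer = self._buffer[:0]
            self._offset = 0
```

The intended use is: read a little to identify the format,
seek back to the start, call nobuffer(), and then hand the
file to the parser, which rereads the saved part and carries
on with the underlying file.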

>> 8) Does this idea make sense to others?
>Yes!  And it's sorely needed!  :)

Thanks!  Now, take a look at the code to see what the result
looks like :)

