[Biopython] newbie question: sequence parsing

Fields, Christopher J cjfields at illinois.edu
Tue Oct 18 19:11:56 UTC 2011


On Oct 18, 2011, at 2:04 PM, Peter Cock wrote:

> On Tue, Oct 18, 2011 at 7:08 PM, Nat Echols <nathaniel.echols at gmail.com> wrote:
>> ...
>> 2) Is there a single function that will take a file (and/or string) of
>> unknown format and try the different parsers until it finds one that works?
>>  We currently use several different formats (raw string, FASTA, PIR, and
>> possibly others), and we try not to rely on the file extension alone to
>> determine the type.  We already have something that does this using our
>> parsers, which could be refactored to use Bio.SeqIO instead, but if
>> BioPython has something similar I'd rather use that.
> 
> No, we don't have such a function. There are many difficulties
> with format guessing - both from the file contents and even the
> filename. I usually cite the Zen of Python, Explicit is Better Than
> Implicit.
> 
> Peter


Some implicitness is fine, but speaking from experience (BioPerl's GuessSeqFormat), trying to guess the format from among the dozens that litter the bioinformatics landscape is a hornets' nest no one wants to maintain.
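
For a short, known list of formats, the "try each parser until one works" idea from the question can be kept explicit rather than guessed. The following is only a minimal sketch, not anything that ships with Biopython: the function name parse_trying_formats and its parameters are hypothetical, and the only Biopython call it relies on is Bio.SeqIO.parse(handle, format). It assumes that a wrong format either raises ValueError or yields no records, which depends on the parser and the Biopython version.

```python
from io import StringIO


def parse_trying_formats(text, formats=("fasta", "pir"), parse=None):
    """Return (format_name, records) for the first format that yields records.

    Hypothetical helper: `parse` defaults to Bio.SeqIO.parse and is
    injectable so the fallback logic can be tested without Biopython.
    """
    if parse is None:
        from Bio import SeqIO  # imported lazily; requires Biopython
        parse = SeqIO.parse
    for name in formats:
        try:
            # Fresh handle per attempt so each parser starts at the beginning.
            records = list(parse(StringIO(text), name))
        except ValueError:
            # Assumption: a parser that rejects the input raises ValueError.
            continue
        if records:
            return name, records
    raise ValueError("no candidate format produced any records")
```

The caller still names the candidate formats and their order, so nothing is inferred from the file extension. Note that a parser returning records is not proof that the format is right; a permissive parser can accept input meant for another one, which is the maintenance problem described above.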

chris


