[Biopython] SeqIO fasta "fakes" recognition

Marco Galardini marco.galardini at unifi.it
Thu Feb 23 16:40:25 UTC 2012


On 02/23/2012 05:21 PM, Peter Cock wrote:
> On Thu, Feb 23, 2012 at 4:05 PM, Marco Galardini
> <marco.galardini at unifi.it>  wrote:
>> Hi all,
>>
>> I was wondering if you are aware of a method to distinguish between "real"
>> fasta files and files that just happen to have a ">" character.
>> I would like to scan a directory and return only the "real" fasta files.
>> I tried to open a .png file and surprisingly it gave me the following
>> results:
> Guessing bioinformatics file types is non-trivial, and not something
> that Bio.SeqIO attempts to do (unlike BioPerl). We take the Python
> approach that you the user need to be explicit, and if you say it is
> a FASTA file we'll try to treat it as such.
You're right: the best thing to do is probably to trust users and
hope they won't pass garbage as input.
> Detecting image files (or indeed most binary file types) on the other
> hand is much easier - so do that instead?
>
In principle this is true, but I don't think it will be easy or
straightforward to account for every file format that might turn up
in a given directory. I'll stick to good Python principles and
hope to have smart-enough users :)
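
In case it is useful, here is a minimal sketch of a middle ground: rather
than enumerating every binary format, it rejects anything containing NUL
bytes and then checks that the content is shaped like FASTA. The helper
names (looks_like_fasta, find_fasta_files), the 64 KiB read limit and the
set of allowed sequence characters are all assumptions made for
illustration; none of this is part of Bio.SeqIO, and it is only a
heuristic.

```python
import os
import string

# Characters accepted on sequence lines: ASCII letters plus gap and stop
# symbols. This alphabet is an assumption of the sketch, not a Biopython rule.
ALLOWED = set(string.ascii_letters.encode("ascii") + b"*-")


def looks_like_fasta(path, max_bytes=65536):
    """Heuristically decide whether a file is FASTA (hypothetical helper)."""
    with open(path, "rb") as handle:
        chunk = handle.read(max_bytes)
    # Empty files and files with NUL bytes (typical of binary data) are rejected.
    if not chunk or b"\x00" in chunk:
        return False
    seen_header = False
    seen_sequence = False
    for line in chunk.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(b">"):
            seen_header = True
            continue
        if not seen_header:
            return False  # the first non-blank line must be a ">" header
        if not set(line) <= ALLOWED:
            return False  # unexpected character on a sequence line
        seen_sequence = True
    return seen_header and seen_sequence


def find_fasta_files(directory):
    """Return the sorted names of regular files that pass the check."""
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path) and looks_like_fasta(full_path):
            names.append(name)
    return sorted(names)
```

Files that pass could then be handed to SeqIO.parse(handle, "fasta") as
usual. The check is deliberately strict (a stray digit in a sequence line
fails it), so it may need loosening for files that carry such characters.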

Thanks for your instant reply.
Marco

-- 
-------------------------------------------------
Marco Galardini
DBE - Department of Evolutionary Biology
University of Florence - Italy

e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone:  +39 055 2288249
mobile: +39 340 2808041
-------------------------------------------------



