[Biopython] SeqIO fasta "fakes" recognition

Marco Galardini marco.galardini at unifi.it
Thu Feb 23 16:05:53 UTC 2012


Hi all,

I was wondering whether you know of a way to distinguish "real" FASTA 
files from files that just happen to contain a ">" character.
I would like to scan a directory and return only the "real" FASTA files.
I tried parsing a .png file and, surprisingly, SeqIO returned a record 
instead of raising an error:

from Bio import SeqIO
next(SeqIO.parse(open('Screenshot.png'), 'fasta'))
SeqRecord(seq=Seq('Ȏ;9r$?���8�n���˗�ݻ7M�4��ɓ\�r���0����$It��I...q+', 
SingleLetterAlphabet()), 
id='>>DEE\xd1\xaaU+\x8e\x1f?Nxx8g\xce\x9c1\xb8]``', 
name='>>DEE\xd1\xaaU+\x8e\x1f?Nxx8g\xce\x9c1\xb8]``', 
description='>>DEE\xd1\xaaU+\x8e\x1f?Nxx8g\xce\x9c1\xb8]`` 
\x81\x81\x81\xec\xdb\xb7Ok\xf9\xd5\xabW\xf1\xf0\xf0`\xe2\xc4\x89\x8c\x181\x82\x9e={j\x95+\x14', 
dbxrefs=[])

I also tried specifying some Alphabets, but I got the same result.
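
To make the goal concrete, here is a rough sketch of the kind of pre-check 
I have in mind. It is not a Biopython feature: the function name 
looks_like_fasta, the allowed character set, and the sample size are all 
my own assumptions, and it only inspects the beginning of the file.

```python
import string

# Assumed residue alphabet: ASCII letters plus gap/stop symbols.
# This is a guess at what "real" sequence lines contain, not a standard.
ALLOWED = frozenset((string.ascii_letters + "*-.").encode("ascii"))


def looks_like_fasta(path, sample_size=65536):
    """Heuristic: return True if the start of the file resembles FASTA."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if b"\0" in sample:
        return False  # NUL bytes strongly suggest a binary file
    lines = sample.splitlines()
    # The last line may be cut off if we stopped at the sample limit.
    if len(sample) == sample_size and lines:
        lines = lines[:-1]
    lines = [line.strip() for line in lines if line.strip()]
    # The first non-blank line must be a ">" header.
    if not lines or not lines[0].startswith(b">"):
        return False
    saw_sequence = False
    for line in lines[1:]:
        if line.startswith(b">"):
            continue  # another header; free text, so not checked
        if not ALLOWED.issuperset(line):
            return False  # sequence line contains unexpected bytes
        saw_sequence = True
    return saw_sequence
```

The idea would be to call this on every file in the directory and hand 
only the files that pass to SeqIO.parse. It is a heuristic, so an oddly 
formatted but valid file could still be rejected.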
Thanks in advance,
Marco

-- 
-------------------------------------------------
Marco Galardini
DBE - Department of Evolutionary Biology
University of Florence - Italy

e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone:  +39 055 2288249
mobile: +39 340 2808041
-------------------------------------------------



