[Biopython] SeqIO.parse Question
João Rodrigues
anaryin at gmail.com
Mon Nov 23 09:02:06 UTC 2009
Dear all,
This is merely a suggestion. I've been using SeqIO.parse on some user input
I receive from a server.
I'm using the following code:
from io import StringIO
from Bio import SeqIO

for num, record in enumerate(SeqIO.parse(StringIO(FASTA_sequence), 'fasta')):
    req_seq = str(record.seq)  # str() replaces the older record.seq.tostring()
    req_name = record.id
Since I have no idea how many sequences the user might submit, I have to use
parse instead of read. If I submit only one sequence and it is valid FASTA,
it works flawlessly. If I submit several FASTA sequences and one of them is
wrongly formatted, it doesn't complain at all. If I submit a single malformed
sequence, it doesn't complain either.
Is there a convenient way to check that the input is valid FASTA? The usual
startswith('>') check only looks at the start of the input, so it doesn't
work for multiple sequences. And since the user might have spaces in the
sequence, using split('\n') to separate the sequences is also ruled out.
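One possible approach, sketched here under the assumption that a plain-Python pre-check is acceptable before handing the text to SeqIO: iterating over lines is still fine as long as records are grouped by their '>' header lines rather than treated as one record per line. The function name `split_fasta_records` is hypothetical (it is not part of Biopython). It strips whitespace inside sequence lines and rejects non-blank text before the first header, but it does not check which characters the sequence contains.

```python
def split_fasta_records(text):
    """Group raw FASTA text into (header, sequence) pairs.

    Hypothetical helper, not a Biopython API. Any line starting with '>'
    opens a new record; every other line is sequence data with all
    internal whitespace (spaces, tabs) removed. Blank lines before the
    first header are ignored; any other text there raises ValueError.
    """
    records = []
    header = None
    chunks = []
    # splitlines() also copes with Windows-style '\r\n' line endings.
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            if line.strip():
                raise ValueError(
                    f"line {line_no}: sequence data before the first '>' header"
                )
        else:
            # Drop any spaces the user pasted inside the sequence.
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, `split_fasta_records(">a desc\nAC GT\nTT\n>b\nGG\n")` gives `[("a desc", "ACGTTT"), ("b", "GG")]`, while input that starts with bare sequence data raises `ValueError` instead of being silently accepted.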
At the moment, I check whether the input starts with '>'. If it does, the
parser kicks in, and for every req_seq I check for any invalid character (a
digit or some other unexpected character). A misformatted sequence then gets
rejected, because spaces, newlines, and digits (often found in sequence
names) are not in my allowed list.
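The per-record character check described above could look roughly like the sketch below. It assumes DNA/RNA input with IUPAC ambiguity codes plus '-' for gaps; that alphabet, and the helper names `invalid_characters` and `check_records`, are assumptions for illustration and should be adjusted (for example, swapped for a protein alphabet) to match the server's expected input.

```python
# Assumed alphabet: IUPAC nucleotide codes plus '-' for gaps.
# Replace with a protein alphabet if users submit protein sequences.
DNA_ALLOWED = frozenset("ACGTURYSWKMBDHVN-")


def invalid_characters(seq, allowed=DNA_ALLOWED):
    """Return the sorted list of characters in `seq` that are not in `allowed`.

    The check is case-insensitive; an empty list means the sequence passed.
    """
    return sorted(set(seq.upper()) - allowed)


def check_records(records, allowed=DNA_ALLOWED):
    """Yield one error message per (name, sequence) pair that fails the check."""
    for name, seq in records:
        if not seq:
            yield f"{name}: empty sequence"
            continue
        bad = invalid_characters(seq, allowed)
        if bad:
            yield f"{name}: invalid characters {bad!r}"
```

Fed with the `(req_name, req_seq)` pairs from the parsing loop, an empty result means every record passed; otherwise each message names the offending record, which is easier to report back to the user than a single pass/fail flag.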
However, if there's an easier way, it would save me some if checks and for
loops :) Suggestions?
Best regards to all,
João [...] Rodrigues
@ http://stanford.edu/~joaor/