[Biopython] SeqIO.parse Question

Peter biopython at maubp.freeserve.co.uk
Mon Nov 23 10:18:24 UTC 2009


On Mon, Nov 23, 2009 at 9:02 AM, João Rodrigues <anaryin at gmail.com> wrote:
> Dear all,
>
> This is merely a suggestion. I've been using SeqIO.parse on some user input
> I receive from a server.
>
> I'm using the following code:
>
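> from io import StringIO
> from Bio import SeqIO
>
> # FASTA_sequence holds the raw text submitted by the user
> for num, record in enumerate(
>         SeqIO.parse(StringIO(FASTA_sequence), 'fasta')):
>     req_seq = str(record.seq)  # replaces the older record.seq.tostring()
>     req_name = record.id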
>
> Since I have no idea how many sequences the user might submit, I have to
> use parse instead of read. If I submit a single, valid FASTA sequence, it
> works flawlessly. If I submit several FASTA sequences and one of them is
> wrongly formatted, it won't complain at all. If I submit a single malformed
> sequence, it doesn't complain either.

Can you give us an example?

> Is there a convenient way for me to check FASTA formats? The usual
> startswith('>') check doesn't work for multiple sequences. And the user
> might have spaces in the sequence, so splitting on '\n' to separate the
> sequences is also ruled out.

You could do something like ("\n"+FASTA_sequence).count("\n>") to
get the number of records.
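
The counting trick above could be wrapped in a small helper. This is only a
sketch: the function name count_fasta_records and the sample text are made
up for illustration, and it counts lines starting with '>' without checking
anything else about the record.

```python
def count_fasta_records(text):
    """Count lines that begin with '>' (one per FASTA record)."""
    # Prepending a newline lets a '>' on the very first line match "\n>" too.
    return ("\n" + text).count("\n>")


# Hypothetical input, standing in for the user-submitted FASTA_sequence string.
sample = ">seq1 first\nACGT\n>seq2 second\nGGCC\n"
print(count_fasta_records(sample))          # two header lines
print(count_fasta_records("ACGT\nGGCC\n"))  # no header lines at all
```

Comparing this count with the number of records SeqIO.parse yields is one
way to notice input where a '>' marker went missing.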

> At the moment, I'm checking if the first sequence of the input starts with
> '>', and if it does, the parser kicks in and for every req_seq object I
> check if there is any character that is not valid (a number or an otherwise
> weird character). If I get a mis-formatted sequence in there it will
> complain because spaces, newlines, and numbers (often found in sequence
> names) are not in my allowed list.
>
> However, if there's an easier way, it would save me some if checks and for
> loops :) Suggestions?
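
The checks described above (a leading '>' plus a whitelist of allowed
characters) could be done in one pass before handing the text to
SeqIO.parse. The following is a minimal sketch using only the standard
library, not a Biopython feature. The name check_fasta, the ALLOWED set and
the error messages are all assumptions for illustration; adjust the alphabet
to whatever the server should accept.

```python
# Assumption: nucleotide letters plus gap and stop symbols. Replace as needed.
ALLOWED = set("ACGTUNacgtun-*")


def check_fasta(text, allowed=ALLOWED):
    """Return a list of (line_number, message) problems; empty means OK."""
    problems = []
    have_header = False    # has any '>' line been seen yet?
    residues_seen = True   # did the current record get any sequence lines?
    header_line = 0        # line number of the current record's header
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if have_header and not residues_seen:
                problems.append((header_line, "record has no sequence"))
            have_header = True
            residues_seen = False
            header_line = lineno
        elif not have_header:
            problems.append((lineno, "sequence data before first '>' line"))
        else:
            bad = sorted(set(line) - allowed)
            if bad:
                problems.append(
                    (lineno, "unexpected characters: " + "".join(bad)))
            residues_seen = True
    if have_header and not residues_seen:
        problems.append((header_line, "record has no sequence"))
    if not have_header and not problems:
        problems.append((0, "no FASTA records found"))
    return problems


# Hypothetical inputs: the second one lost the '>' in front of "seq2".
good = ">seq1\nACGT\nGGCC\n>seq2\nTTAA\n"
bad = ">seq1\nACGT\nseq2 missing marker\nTTAA\n"
print(check_fasta(good))
print(check_fasta(bad))
```

A header line that lost its '>' shows up here as a sequence line with
unexpected characters, which is the case that otherwise ends up silently
merged into the previous record.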

I'm not 100% sure what you are trying to do - some examples should help.

Peter



