[Bioperl-l] fasta format

Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
Mon, 26 Aug 2002 13:06:42 -0500


> 
> > >\s+(.*)
> > is valid, as described by Bill Pearson.  Should have null ID, 
> > then description.
> > 
> 
> I'm concerned about making this change. It radically changes 
> the behaviour of the parser, even if this interpretation is 
> the correct one (Bill, could you clarify?)
> 
> The reason I'm concerned is that I have seen many people 
> putting a space between '>' and the ID when they 
> copy-and-paste sequences, believe it or not,

I believe it, but I consider it an error: either it should be flagged, or the user should expect unexpected results.

> correct or not. My point is that every web-server written 
> using bioperl will break after this change when users enter a 
> space between ID and '>', whereas it handled the situation 
> fine before.

I guess allowing a space after '>' was possibly an error, but if everyone relies on it, I wouldn't want to change it.  

> 
> So that makes my second concern: I don't want bioperl to behave 
> much differently from other sequence analysis toolkits.

I agree.  Even though I think it is stranger to have the first word of a description become an ID, I must be in the minority, since no one complained before and the fix was made to allow the space.

The simplest solution, I think, might be just to put a comment in the code (or other docs) saying that this is the way it is, right or wrong, and then wait for the biojava methods for event-based parsing to be ported to bioperl.  We could get more complex and pass a parameter to the fasta parser, but I think it is not worth going down that road.
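
For reference, the two readings of the header line, and what a parameter might look like, can be sketched roughly as below. This is Python rather than Perl, purely for illustration. The function name parse_defline and the strict flag are hypothetical and are not part of bioperl's fasta parser; the strict branch is only my reading of the `>\s+(.*)` rule quoted above.

```python
import re

# Hypothetical helper for illustration only; not bioperl's API.
def parse_defline(line, strict=False):
    """Split a FASTA header line into (id, description)."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    body = line[1:].rstrip("\r\n")
    if strict and re.match(r"\s+", body):
        # Strict reading: whitespace right after '>' means a null ID,
        # and everything that follows is description.
        return "", body.strip()
    # Lenient reading: ignore whitespace after '>' and take the
    # first word as the ID, the rest as the description.
    parts = body.strip().split(None, 1)
    seq_id = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else ""
    return seq_id, desc
```

With the lenient default, "> gi|123 some protein" gives the ID "gi|123"; with strict=True the same line gives an empty ID and the whole text as the description. A header with no space after '>' parses the same way under both settings.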

-Mat