[Bioperl-l] fasta format

Matthew Pocock matthew_pocock@yahoo.co.uk
Sat, 24 Aug 2002 17:25:34 +0100


Hi. There is no one-size-fits-all solution for FASTA description lines.
Perhaps an optional callback on the FASTA parser object that takes all
text following ">", including all whitespace, and returns a two-element
list (id, description)? You could write a handful of default callbacks with
obvious names, and in really mad situations (SCOP FASTA may be a candidate
for this) the user can provide their own. Apologies if the BioPerl
FASTA parser already has this functionality.
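A minimal sketch of that callback idea in plain Perl. It is not BioPerl code: `parse_fasta`, `first_word_is_id` and `whole_line_is_desc` are hypothetical names made up for illustration, and the sketch assumes the callback receives everything after ">" and returns (id, description).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical default callback: the first whitespace-delimited word is the
# id and the rest is the description (same regex as discussed in this thread).
sub first_word_is_id {
    my ($header) = @_;
    my ($id, $desc) = $header =~ /^\s*(\S+)\s*(.*)/;
    return ($id, $desc);
}

# Hypothetical default callback for files whose headers carry only a
# description: no id, the trimmed line is the description.
sub whole_line_is_desc {
    my ($header) = @_;
    (my $desc = $header) =~ s/^\s+|\s+$//g;
    return (undef, $desc);
}

# Hypothetical parser: hands all text after ">" (whitespace included) to the
# callback and accumulates sequence lines under the current record.
sub parse_fasta {
    my ($text, $callback) = @_;
    $callback ||= \&first_word_is_id;
    my @records;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(.*)$/) {
            my ($id, $desc) = $callback->($1);
            push @records, { id => $id, desc => $desc, seq => '' };
        } elsif (@records) {
            (my $s = $line) =~ s/\s+//g;   # strip whitespace from sequence
            $records[-1]{seq} .= $s;
        }
    }
    return @records;
}

my $fasta = ">seq1 first test sequence\nACGT\nGGCC\n>seq2\nTTAA\n";
for my $rec (parse_fasta($fasta)) {
    printf "%s\t%s\t%s\n", $rec->{id}, $rec->{desc}, $rec->{seq};
}
```

A user with an unusual format would pass their own code reference instead, e.g. `parse_fasta($text, \&whole_line_is_desc)` or an anonymous `sub { ... }` tailored to SCOP headers.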

Matthew

Paul Gordon wrote:
>>my ($id,$fulldesc) = $top =~ /^\s*(\S+)\s*(.*)/
> 
> 
> I guess the tradeoffs are between:
> 
> 1. people who put a description but no identifier at all, for whom the
> current code does not work well
> 
> 2. people who have a space between the > and the identifier.  
> 
> So, which is more likely to occur?  If you wanted to get really fancy, you
> might check, when there is a leading space, whether the next word looks
> like an identifier (e.g. /[^A-Z\-]/i, i.e. it contains something other
> than letters and hyphens).  Even SwissProt ids usually have numbers or
> underscores.  It may not work all the time (e.g. descriptors like 16S),
> but perhaps it's better than assuming the user isn't providing an
> identifier at all?  And it would be mostly backward compatible?
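Paul's leading-space heuristic could itself be one of the default callbacks. A rough sketch, assuming a word counts as an identifier when it contains any character other than a letter or hyphen; `guess_id_and_desc` is a hypothetical name, not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical callback implementing the leading-space heuristic.
# Returns (id, description); id is undef when the header seems to hold
# only a description.
sub guess_id_and_desc {
    my ($header) = @_;
    # No leading space: first word is the id, as with the current regex.
    if ($header !~ /^\s/) {
        my ($id, $desc) = $header =~ /^(\S+)\s*(.*)/;
        return defined $id ? ($id, $desc) : (undef, '');
    }
    # Leading space: accept the first word as an id only if it contains a
    # character other than a letter or hyphen (digit, underscore, "|", ...).
    my ($word, $rest) = $header =~ /^\s+(\S+)\s*(.*)/;
    return (undef, '') unless defined $word;
    return ($word, $rest) if $word =~ /[^A-Z\-]/i;
    # Otherwise treat the whole trimmed line as a description.
    (my $desc = $header) =~ s/^\s+|\s+$//g;
    return (undef, $desc);
}

for my $h ('sp|P12345|ABC_HUMAN some protein', ' P12345 some protein',
           ' hypothetical protein', ' 16S ribosomal RNA') {
    my ($id, $desc) = guess_id_and_desc($h);
    printf "[%s] [%s]\n", defined $id ? $id : '(none)', $desc;
}
```

The last example shows the failure mode Paul mentions: a description that starts with something like "16S" is misread as an identifier.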


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk
