[Bioperl-l] fasta format

Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
Mon, 26 Aug 2002 08:46:57 -0500


All good points.  The patterns below are not exact; they are only meant to illustrate (I hope) the use cases.  My take is:

>\n (or whatever the OS says a newline is)
is valid

>\s+\n
is valid

>\s+(.*)
is valid, as described by Bill Pearson.  It should yield a null ID, then the description.

^>\S+\s+(.*)
is valid, and already works
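
To make the four cases concrete, here is a minimal Perl sketch of how they could be told apart. The sub name split_fasta_header is made up for illustration and is not part of Bioperl; it only assumes the header arrives as a single line.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl API): split one FASTA header line
# into (id, description) following the four cases listed above.
sub split_fasta_header {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;             # drop the line ending, Unix or DOS style
    return unless $line =~ s/^>//;    # not a header line: return an empty list

    # Cases 1 and 2: nothing, or only whitespace, after ">"
    return ('', '') if $line =~ /^\s*$/;

    # Case 3: whitespace right after ">" means null ID, the rest is description
    return ('', $1) if $line =~ /^\s+(.*)$/;

    # Case 4: an ID directly after ">", then an optional description
    my ($id, $desc) = $line =~ /^(\S+)\s*(.*)$/;
    return ($id, $desc);
}

my ($id, $desc) = split_fasta_header("> some description\n");
print "id='$id' desc='$desc'\n";
```

Whether case 3 should really give a null ID is exactly the open question in this thread, so treat that branch as one possible reading rather than settled behaviour.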


Is that all agreeable?

-----Original Message-----
From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
Sent: Saturday, August 24, 2002 11:26 AM
To: Paul Gordon
Cc: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] fasta format


Hi. There is no one-size-fits-all solution for fasta description lines. 
Perhaps an optional callback on the fasta parser object that takes all 
text following ">" including all whitespace, and returns an array - 
(id,description)? You could write a handful of default callbacks with 
obvious names and in really mad situations (SCOP fasta may be a candidate 
for this), the user can provide their own. Apologies if the bioperl 
fasta already has this functionality.
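
A rough sketch of what I mean, in Perl. All names here (%header_callbacks, id_first, null_id_on_space, parse_header) are invented for illustration; this is not how the bioperl fasta parser is actually written.

```perl
use strict;
use warnings;

# Hypothetical default callbacks. Each one receives all text following ">"
# (whitespace included) and returns the list (id, description).
my %header_callbacks = (
    # The first word is the ID, even if whitespace precedes it.
    id_first => sub {
        my ($text) = @_;
        my ($id, $desc) = $text =~ /^\s*(\S*)\s*(.*)$/;
        return ($id, $desc);
    },
    # Leading whitespace means there is no ID at all.
    null_id_on_space => sub {
        my ($text) = @_;
        return ('', $1) if $text =~ /^\s+(.*)$/;
        my ($id, $desc) = $text =~ /^(\S*)\s*(.*)$/;
        return ($id, $desc);
    },
);

# Hypothetical entry point: strip ">" and hand the rest to the callback.
sub parse_header {
    my ($line, $callback) = @_;
    $callback ||= $header_callbacks{id_first};
    chomp $line;
    (my $text = $line) =~ s/^>//;
    return $callback->($text);
}

# A user-supplied callback for a made-up "id|description" layout.
my $pipe_style = sub {
    my ($text) = @_;
    my ($id, $desc) = split /\|/, $text, 2;
    return ($id, defined $desc ? $desc : '');
};

my ($id, $desc) = parse_header(">abc123|some protein\n", $pipe_style);
print "id='$id' desc='$desc'\n";
```

The point of the design is that the parser never has to guess: the guess lives in one small sub that the user can swap out.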

Matthew

Paul Gordon wrote:
>>my ($id,$fulldesc) = $top =~ /^\s*(\S+)\s*(.*)/
> 
> 
> I guess the tradeoffs are between:
> 
> 1. people who put a description, but no identifier at all, for whom the
> current code does not work nicely
> 
> 2. people who have a space between the > and the identifier.  
> 
> So, which is more likely to occur?  If you wanted to get really fancy, you
> might check, if there is a leading space, if the next word looks like an
> identifier (e.g. /[^A-Z\-]/i).  Even swissprot ids usually have
> numbers or underscores.  It may not work all the time (e.g. 16S kind of
> descriptors), but perhaps it's better than assuming the user isn't
> providing an identifier at all?  And it would be mostly backward
> compatible?
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 
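
Paul's "looks like an identifier" check could be sketched roughly as below. The sub names are invented, the input is assumed to be the already-chomped text after ">", and the test used is simply "contains something other than a letter or hyphen", which is my reading of his suggestion.

```perl
use strict;
use warnings;

# Hypothetical heuristic: a word looks like an identifier if it contains
# any character that is not a letter or a hyphen (digit, underscore, ...).
sub looks_like_id {
    my ($word) = @_;
    return $word =~ /[^A-Z\-]/i ? 1 : 0;
}

# Hypothetical splitter: takes the text after ">" and returns (id, description).
sub split_header_guessing {
    my ($text) = @_;
    my ($lead, $first, $rest) = $text =~ /^(\s*)(\S*)\s*(.*)$/;
    if ($lead ne '' && !looks_like_id($first)) {
        # Leading space and a plain word: assume no ID, everything is description.
        (my $desc = $text) =~ s/^\s+//;
        return ('', $desc);
    }
    # No leading space, or the first word looks like an ID.
    return ($first, $rest);
}

for my $text (' P12345 kinase', ' hypothetical protein', ' 16S ribosomal RNA') {
    my ($id, $desc) = split_header_guessing($text);
    print "id='$id' desc='$desc'\n";
}
```

As Paul says, the "16S" kind of description is the known failure: it contains a digit, so it is taken as an ID.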


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk
