[Bioperl-l] fasta format

Aaron J. Mackey <amackey@virginia.edu>
Sat, 24 Aug 2002 15:29:00 -0400 (EDT)


I think this idea has a lot of merit. We often find ourselves parsing lots
of embedded information out of a fasta header line. Perhaps some of the
more common callbacks (e.g. for parsing NCBI-prepared nr or swissprot fasta
flatfiles) could make their way back into the distribution without the
creation of an entirely new SeqIO::fasta::NCBI object.
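
A minimal sketch of what such callbacks might look like, in plain Perl and
independent of the actual SeqIO code. Everything named here (%header_parser,
parse_header, the 'first_word' and 'ncbi' keys) is hypothetical and not part
of bioperl, and the 'ncbi' callback assumes headers of the form
"gi|NUMBER|db|ACCESSION|LOCUS description":

```perl
use strict;
use warnings;

# Hypothetical callbacks: each takes all text after ">" and returns
# (id, description). None of these names exist in bioperl.
my %header_parser = (
    # First whitespace-delimited token is the id (the regex quoted below).
    first_word => sub {
        my ($top) = @_;
        my ($id, $desc) = $top =~ /^\s*(\S+)\s*(.*)/;
        return ($id, $desc);
    },
    # Assumes "gi|NUMBER|db|ACCESSION|LOCUS description"; uses the
    # accession as the id and falls back to the whole first token.
    ncbi => sub {
        my ($top) = @_;
        my ($token, $desc) = $top =~ /^\s*(\S+)\s*(.*)/;
        return (undef, '') unless defined $token;
        my @field = split /\|/, $token;
        my $id = (@field >= 4 && $field[0] eq 'gi') ? $field[3] : $token;
        return ($id, $desc);
    },
);

# Hypothetical entry point: $parser is either a key of %header_parser
# or a user-supplied code ref; the default is 'first_word'.
sub parse_header {
    my ($line, $parser) = @_;
    unless (ref($parser) eq 'CODE') {
        my $name = $parser || 'first_word';
        $parser = $header_parser{$name}
            or die "unknown header parser '$name'\n";
    }
    (my $top = $line) =~ s/^>//;    # strip the leading ">"
    return $parser->($top);
}
```

Usage would then be something like
parse_header($line, 'ncbi') for nr-style files, or
parse_header($line, sub { (undef, $_[0]) }) for files whose headers carry a
description but no identifier at all.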

-Aaron

On Sat, 24 Aug 2002, Matthew Pocock wrote:

> Hi. There is no one-size-fits-all solution for fasta description lines.
> Perhaps an optional callback on the fasta parser object that takes all
> text following ">" including all whitespace, and returns an array -
> (id,description)? You could write a handful of default callbacks with
> obvious names and in really mad situations (SCOP fasta may be a candidate
> for this), the user can provide their own. Apologies if the bioperl
> fasta already has this functionality.
>
> Matthew
>
> Paul Gordon wrote:
> >>my ($id,$fulldesc) = $top =~ /^\s*(\S+)\s*(.*)/
> >
> >
> > I guess the tradeoffs are between:
> >
> > 1. people who put a description, but no identifier at all, for whom the
> > current code does not work nicely
> >
> > 2. people who have a space between the > and the identifier.
> >
> > So, which is more likely to occur?  If you wanted to get really fancy, you
> > might check, if there is a leading space, if the next word looks like an
> > identifier (e.g. /[^A-Z\-]/i).  Even swissprot ids usually have
> > numbers or underscores.  It may not work all the time (e.g. 16S kind of
> > descriptors), but perhaps it's better than assuming the user isn't
> > providing an identifier at all?  And it would be mostly backward
> > compatible?

-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey@virginia.edu