New Bio::Seq and Bio::Seq::Parse (.025 BETA)

Steven E. Brenner brenner@akamail.com
Mon, 17 Mar 1997 12:12:10 +0900 (JST)


Hi Chris,

> The location is: http://www.ayf.org/~c_raffi/bioperl/top.html

  Nifty logo!  Only, I'm confused about what the code is supposed [sic] to
be doing.  (Also, the name of the project is 'bioperl', not 'bio::perl'.
That would imply a bio/perl.pm module, whereas bioperl doesn't imply
anything at all :) 



> I wrote a crude Parse.pm that serves as an interface to ReadSeq and made
> the appropriate changes to Seq.pm.

  Great!

  Had a quick look at it; it seems quite reasonable and the changes in
Seq.pm are also appropriate.  One comment is that it would be much more
efficient to pass around references than potentially huge strings.
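
  To illustrate the point about references, here is a minimal sketch.
The count_residues function is a hypothetical helper made up for this
example (it is not part of Bio::Seq or Parse.pm); it takes a reference to
the sequence string, so the string itself is never copied on the call.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Seq: counts the letters in a
# sequence that is passed by reference rather than by value.
sub count_residues {
    my ($seq_ref) = @_;
    die "expected a scalar reference" unless ref($seq_ref) eq 'SCALAR';
    # tr/// with an empty replacement list only counts; it changes nothing.
    return ($$seq_ref =~ tr/A-Za-z//);
}

my $sequence = "ACGT" x 1000;       # stand-in for a potentially huge string
my $n = count_residues(\$sequence); # pass \$sequence, not $sequence
print "$n\n";                       # prints 4000
```

  The caller only has to write \$sequence instead of $sequence; inside the
function the string is reached through $$seq_ref.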

  However, these modifications don't deal with the bigger issue of what
to do about the strings v. files problem, which I mention in the 5th
paragraph of:

http://www.hrz.uni-bielefeld.de/mailinglists/BCD/vsns-bcd-perl/9702/0003.html

  Is the parse function in Bio::Seq supposed to take 1 or 2 parameters (as
documented) or 4 params (as coded)?  The problem arises because of some of
my inefficient legacy design at the very outset, but I think there's a
solution. 

-=-

  A few other nits from a _very_ cursory look-through:

@SeqForm appears never to be created

I would change [@%]SeqForm to [@%]SeqFmt, or even [@%]seq_fmt (to be
consistent with the rest of the naming). 

The names of formats in SeqForm, etc., should be all lower-case for the
reasons discussed earlier on this list.  (Is it FastA or fasta or Fasta?
GenBank, Genbank, or genbank?  If it is always lowercase, there's no
ambiguity.)
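
One way this could look, as a sketch only: the %seq_fmt table and the
normalize_format function below are hypothetical names and contents chosen
for illustration, not what Parse.pm currently has.  Keys are stored in
lower case and every user-supplied name is folded with lc before lookup.

```perl
use strict;
use warnings;

# Hypothetical format table; keys are always lower case.
my %seq_fmt = (
    fasta   => 'Pearson/FASTA',
    genbank => 'GenBank flat file',
);

# Fold a user-supplied format name to lower case and look it up.
# Returns the canonical key, or undef if the format is unknown.
sub normalize_format {
    my ($name) = @_;
    return undef unless defined $name;
    my $key = lc $name;
    return exists $seq_fmt{$key} ? $key : undef;
}

print normalize_format($_), "\n" for qw(FastA Fasta fasta);  # fasta, three times
```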

There's no 'valid' field to indicate whether or not the object is indeed
valid for any operation.  For example, nothing records it if setseq is
used to set an invalid sequence.

to-do: more validity checking, such as in setseq
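
A rough sketch of what a 'valid' field plus a check in setseq might look
like.  My::Seq is a hypothetical stand-in class, not the real Bio::Seq, and
the accepted alphabet (letters, '-' and '*') is only an assumption made for
the example.

```perl
package My::Seq;    # hypothetical stand-in, not the real Bio::Seq
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { seq => '', valid => 0 }, $class;
}

# Store the sequence and record whether it passed the check.
sub setseq {
    my ($self, $seq) = @_;
    $self->{seq} = $seq;
    # Assumed alphabet for this sketch: letters, gap '-' and stop '*'.
    $self->{valid} = (defined $seq && $seq =~ /\A[A-Za-z*-]+\z/) ? 1 : 0;
    return $self->{valid};
}

sub is_valid { return $_[0]{valid} }

package main;

my $s = My::Seq->new;
$s->setseq("ACGT");
print $s->is_valid ? "valid\n" : "invalid\n";    # valid
$s->setseq("AC GT 123");
print $s->is_valid ? "valid\n" : "invalid\n";    # invalid
```

Other methods could then test the flag before doing any work on the
sequence.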

A "_undef" parameter (or something like it) needs to be available to unset
various options

Functions which can return an invalid result (such as parse_bad) should
return undef rather