New Bio::Seq and Bio::Seq::Parse (.025 BETA)

Steven E. Brenner brenner@akamail.com
Fri, 21 Mar 1997 15:30:08 +0900 (JST)


I'm only online for about an hour before I disappear again for over a
week.  Here are some very quick thoughts.


On Tue, 18 Mar 1997, Georg Fuellen wrote:
> Steve wrote,

> I would carp and then the user knows that warnings and even fatal errors are 
> possible later on. 

Sounds reasonable.  I can go with this.  


> I strongly believe we should support ids with ``\S''; I've seen them in 
> Fasta-files, Nexus files, etc, etc.

I would prefer we document it as restricted to \w, but not enforce that.
We might want to plan to allow \S eventually.
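
A minimal Perl sketch of that policy follows. It assumes a hypothetical helper named check_id, which is not part of the current Bio::Seq code. Ids are documented as \w-only, and any id outside \w draws a carp warning but is never rejected. This tells the user that later warnings or errors are possible and leaves room to allow \S later.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical helper, NOT the actual Bio::Seq API: ids are documented
# as \w-only, but other characters are only flagged, never rejected.
sub check_id {
    my ($id) = @_;
    if ($id !~ /^\w+$/) {
        carp "id '$id' contains characters outside \\w; "
           . "later operations may warn or fail";
    }
    return $id;    # always returned unchanged
}

print check_id("HSU12345"), "\n";   # no warning
print check_id("gi|1234"), "\n";    # warns, but the id is still accepted
```

Warning instead of dying keeps existing \S-style ids from FASTA or Nexus files usable while still steering people toward \w.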

> > > > whitespace is a legal component of most filesystems.  (It is on Unix,
> > > > Macintosh, and Windows, for example). 
> > > 
> > > Space (`` '') may be OK, but newline (``\n'') certainly not ?!
> > 
> > On unix, at least, \n certainly is valid.  Typically the only illegal
> > characters are "\0" and "/".  Some filesystems even allow those.
> 
> I guess I failed to say that I'm talking about filenames. In a lot of
> cases, the filename will give rise to the (default) id and vice versa.

Yes.  *Filenames* on Macintosh, Unix, and Windows allow spaces, and on Unix
all the other \s characters are allowed as well.



> > > Hm. What about ids that we inherit from somewhere ? E.g. from a file ?
> > > On a parallel machine, this won't work either I think. What about other
> > > distributed computation; CORBA may offer solutions, but it's another
> > > big can of worms although I feel that we'll have to open it at some time -
> > > does anyone know more about CORBA ? (I've just heard rumors! :)
> > 
> > Why would you inherit ids?  These ids are ONLY for setting names of
> > bio::seq's.  I don't see how parallel programs and/or CORBA have anything
> > to do with it.  We only need to guarantee that id's are unique within a
> > given program.
> 
> For me (and in the current code, e.g. parse_fasta), the ids are the identifiers 
> you find in files, etc, etc. It seems that you're introducing a new notion of
> id, the merit of which is rather unclear to me.

I think we're talking at cross-purposes.  The original question was about
what to do when the user fails to give any id whatsoever.  These uniq_ids
are generated by the code only when the user has failed to provide an
id.  Currently, it just sets the id to "_no_id_given" or something to
that effect.  I suggested that it instead be set to something like
"_no_id_xxxxxx", where xxxxxx is some number unique to the program, so
that different sequences get different names.
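
Here is a rough Perl sketch of that default-id idea. It assumes a hypothetical function named default_id, which is not the actual Bio::Seq code, and it zero-pads the counter to six digits purely for illustration. A counter private to the function gives each id-less sequence a distinct name. As noted above, the name is unique only within one program run, not across processes or machines.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the actual Bio::Seq code: a counter visible
# only to default_id() yields names unique within this program run.
{
    my $counter = 0;
    sub default_id {
        $counter++;
        # Six-digit zero padding is an illustrative assumption.
        return sprintf("_no_id_%06d", $counter);
    }
}

my @ids = map { default_id() } 1 .. 3;
print "@ids\n";   # _no_id_000001 _no_id_000002 _no_id_000003
```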