thoughts/questions on Seq.pm and Parse.pm

Steven E. Brenner brenner@akamail.com
Tue, 18 Mar 1997 11:05:13 +0900 (JST)


> 
> It's getting tough keeping track of the 'open' issues that should be
> resolved, I've tried to distill most of them into the "ToDo" file with the
> Seq.pm distribution. I'm limited both by time and by programming skill
> (working on Seq.pm has been like a trial by fire-- learning by doing) which
> limits the amount of 'real' contributions I can make.

Maybe it was trial by fire, but you really produced some great code in the
end!  I suspect that anything else you do from now will seem trivial.


> I've got some questions/comments about several of the issues so here goes:
> 
> Parse.pm
> --------
> I wrote 2 basic methods that were necessary to get things in Seq.pm working
> without thinking much about the overall interface scheme. Any
> guidance/code/observations on method names, interface or implementation
> would be appreciated.

From my quick look-through, your coding seemed like a good extension of the
current scheme.  However, I'm concerned that the current scheme (which is
really my fault since it evolved from my sample code) is fundamentally
broken -- and since it is exported as part of the interface, this could
really haunt us unless we fix it RSN. 


> Seq.pm
> -------
> 
> o One major interface change that needs to happen SOON is changing
> Dna_to_Rna(), Rna_to_Dna() and translate() so that they return biosequence
> objects instead of strings. I tried for a little while to do this, using
> the Perl-OO-tutorial as a guide but kept running into problems with
> scoping. I'm also not sure if the "Right Way" involves returning an object
> or a ref. to an object. I don't want to waste any more time doing this if
> the answer involves a tiny piece of code that is immediately obvious to
> someone on this list. So- if someone knows the "Right Way" to do this,
> please let me know!

Return the object itself.  (The object is, itself, just a blessed
reference.)

A simple thing to do right now is just to make a new Bio::Seq the way you
would anywhere else:

 
 # $reversed_seq, $my_desc and $my_id are assumed to be set earlier in the method
 my $seq = Bio::Seq->new(-seq       => $reversed_seq,
                         -desc      => "Reverse complement of $my_desc",
                         -numbering => '1',
                         -type      => 'dna',
                         -ffmt      => 'fasta',
                         -id        => $my_id."_revcom");

 return $seq;




> o Non-fatal use of Parse.pm if ReadSeq does not exist or not configured
> I wrapped some code around an eval{} statement in Seq.pm that tries to
> politely figure out if Parse.pm is available -- it checks for the presence
> of an exported "OK" variable in Parse.pm. Is this the right approach?
> Seq.pm should be able to use/not-use Parse.pm without any obvious error
> messages.

I think that there may be better approaches, but this seems eminently
reasonable.  The one thing that perhaps should be done is to put the
eval() in a BEGIN{} block, though this may require that $OK also be set
in a BEGIN{}.
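
Here is a minimal sketch of the BEGIN{} idea.  The flag name $ParseOK and
the helper parse_available() are made up for illustration; the real check
in Seq.pm looks at the "OK" variable exported by Parse.pm, which this
sketch does not try to reproduce.

```perl
use strict;
use warnings;

our $ParseOK;   # hypothetical flag name, not taken from Seq.pm

BEGIN {
    # Runs at compile time.  If Parse.pm is missing or dies while
    # loading, the eval returns undef and the flag ends up false,
    # so no error message reaches the user.
    $ParseOK = eval { require Parse; 1 } ? 1 : 0;
}

# Callers ask this instead of touching Parse.pm directly.
sub parse_available { return $ParseOK }
```

Code that needs Parse.pm can then test parse_available() and quietly skip
the ReadSeq-based features when it returns false.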


> o Site-specific configuration issues.
> Right now, Seq.pm does not have to be edited by users but Parse.pm and the
> test scripts do. I'm going to hit the POD docs for MakeMaker, etc. and try
> to figure out how to set up a system where users edit a ".config" file or
> somesuch and the resulting info is used to automatically tweak Parse.pm and
> Seq.pm during the 'make' process. Again, any help/suggestions on this would
> be appreciated.

Again, I'm not sure of the right thing to do here; I haven't worked with
MakeMaker much before.

Probably the right thing to do is to have a real make, which runs a
program which spits out a Parse.pm.  (i.e., there's no Parse.pm in the
distribution; it is the output of a ParseMaker Perl script which
queries users for file locations, etc.)  One place to look for
guidance is packages like PGPLOT, which require external programs and
libraries.

If you are really pressed, I think it would be ok to simply have $OK
default to false and force people to edit things (before installation)
to set them right.

> o Proposed validity markers
>   - A marker that would be set to 'false' whenever Seq.pm makes a call to carp()
>   - A marker to specify valid/invalid biosequence object
> Are these permutations of the same idea or two different things? I'm also
> not sure about how to implement.

Yes.  These are the same thing.  Basically, there should be a 'valid'
flag, and the code should carp() or croak() on any operation if the valid
flag is not set.

Alternatively (as mentioned in the previous mail), croak()-ing on any
failure would ensure that an existing object is always valid.  It could,
however, cause programs to die often.
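
A rough sketch of the 'valid' flag approach follows.  The class name
My::SeqSketch, the methods is_valid() and seq_length(), and the "letters
only" check are all invented for illustration; none of this is the actual
Seq.pm code.

```perl
package My::SeqSketch;   # hypothetical class, not the real Bio::Seq
use strict;
use warnings;
use Carp;

sub new {
    my ($class, %args) = @_;
    my $self = bless { seq => $args{-seq}, valid => 1 }, $class;
    # Warn and mark the object invalid instead of dying on bad data.
    unless (defined $self->{seq} && $self->{seq} =~ /^[A-Za-z]+$/) {
        carp "sequence is missing or contains non-letters";
        $self->{valid} = 0;
    }
    return $self;
}

sub is_valid { return $_[0]{valid} }

sub seq_length {
    my $self = shift;
    # Every operation checks the flag first and refuses bad objects.
    croak "operation on invalid sequence object" unless $self->{valid};
    return length $self->{seq};
}

package main;
```

With the croak()-everywhere alternative, new() itself would die and the
flag would not be needed at all.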


> o Default constructor ID
> Steve commented that the default constructor ID should be changed from
> "No_Id_Given" to "No_Id" plus a unique number. Assigning a number is easy
> enough but how would you keep track of "unique" numbers assigned? Is there
> a way to save state or remember these numbers each time new() is called? I
> think I see the potential problems that objects with the same 'ID' field
> could cause but I'm unsure how a 'unique' naming process would work.

In the package, have a file-level variable, something like

my $UniqNum = 1234;

and also have a function something like

sub uniq_num {
  return $UniqNum++;
}
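
To show how the counter could feed the default ID, here is a small sketch.
The class My::IdSketch and its id() accessor are hypothetical, and I start
the counter at 1 rather than 1234 just to keep the example short.

```perl
package My::IdSketch;    # hypothetical class, not the real Bio::Seq
use strict;
use warnings;

my $UniqNum = 1;         # file-scoped counter shared by all objects

sub uniq_num { return $UniqNum++ }

sub new {
    my ($class, %args) = @_;
    # Only fall back to a generated ID when the caller gave none.
    my $id = defined $args{-id} ? $args{-id} : 'No_Id_' . uniq_num();
    return bless { id => $id }, $class;
}

sub id { return $_[0]{id} }

package main;
```

Note that the numbers are only unique within one run of the program; the
counter starts over each time the module is loaded.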



> o translate() treats ambiguity inconsistently
> Steve mentioned this, but I want to be sure that I understand the problem
> -- it looks like the code deals with "N" unknown bases but does not deal
> with any of the other IUPAC symbols for ambiguity. Is this what you were
> pointing out Steve?

Yes.  I was suggesting, for example that code such as /^UU[UC]/ be changed
to /^UU[UCY]/
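
For instance (a sketch only; is_phe_codon() is a made-up helper, not a
function in Seq.pm), since Y is the IUPAC code for "C or T/U", the codon
UUY can still be called phenylalanine:

```perl
use strict;
use warnings;

# Hypothetical helper covering only the phenylalanine codons.
sub is_phe_codon {
    my ($codon) = @_;
    # UUU and UUC both code for Phe, so the ambiguous UUY does too.
    return $codon =~ /^UU[UCY]/ ? 1 : 0;
}
```

UUN, on the other hand, still cannot be resolved, since UUA and UUG code
for leucine.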

A couple lines above, the check for 
       $self->_monomer eq "1"           should be
       $self->_monomer == 1              which should in turn be
       $self->_monomer == $SeqType{dna}
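
Spelled out, the idea is something like the sketch below.  The helper
is_dna() is made up, and the codes for 'rna' and 'amino' are my
assumptions; only dna being 1 follows from the existing check.

```perl
use strict;
use warnings;

# Assumed numeric codes; the real table in Seq.pm may differ.
my %SeqType = (dna => 1, rna => 2, amino => 3);

# Hypothetical helper: compare numerically against the named code,
# so the magic number 1 appears in exactly one place.
sub is_dna {
    my ($monomer) = @_;
    return $monomer == $SeqType{dna} ? 1 : 0;
}
```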



As an aside, the function name layout() was a small stroke of genius.
Much much much better than the output() or format() functions I suggested!

As another aside, do we ever specify whether the sequence in Bio::Seq
is upper or lower case?  Some of the code such as translate() seems to
depend on it being uppercase, but this is never explicitly specified. 

If possible, I would like to permit both cases as people sometimes use
them to mean different things.  We may want to add upcase() and downcase()  
[or something like that, maybe toupper() and tolower()].
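
One possible shape for these, as a sketch: the class My::CaseSketch and
its one-argument constructor are invented here, and returning a new
object (rather than changing the sequence in place) is just my
assumption, in line with the advice above about returning objects.

```perl
package My::CaseSketch;   # hypothetical class, not the real Bio::Seq
use strict;
use warnings;

sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq }, $class;
}

sub seq { return $_[0]{seq} }

# Each method returns a new object and leaves the original case untouched.
sub upcase   { my $self = shift; return ref($self)->new(uc $self->{seq}) }
sub downcase { my $self = shift; return ref($self)->new(lc $self->{seq}) }

package main;
```

Code such as translate() could then work on $obj->upcase->seq internally
while the stored sequence keeps whatever case the user supplied.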




Hey.. you can't be *that* busy -- you changed the bio::perl to bioperl! :)