thoughts/questions on Seq.pm and Parse.pm

Georg Fuellen fuellen@dali.Mathematik.Uni-Bielefeld.DE
Mon, 17 Mar 1997 18:36:34 +0000 (GMT)


Hi Chris,

> It's getting tough keeping track of the 'open' issues that should be
> resolved, I've tried to distill most of them into the "ToDo" file with the
> Seq.pm distribution. I'm limited both by time and by programming skill
> (working on Seq.pm has been like a trial by fire-- learning by doing) which
> limits the amount of 'real' contributions I can make.
>
> I've got some questions/comments about several of the issues so here goes:
> 
> Parse.pm
> --------
> I wrote 2 basic methods that were necessary to get things in Seq.pm working
> without thinking much about the overall interface scheme. Any
> guidance/code/observations on method names, interface or implementation
> would be appreciated.

I really don't feel qualified to say much about this; I hope Steve can
mail soon...!

Do you see a way to integrate convert_from_raw() into convert() ?
Should the case that we want to convert but don't know the format yet
be handled by Parse.pm or by Seq.pm ? Do we wanna move the whole parsing 
system into Parse.pm  ? Make it an object, then ? Can UnivAln.pm use
Parse.pm, too ?

I'd very much prefer ffmt (fileformat) to fmt; why did you change this ?
I'd also prefer ``-input'' to ``-sequence'' since ReadSeq reads multiple
alignment formats as well.
``-location'' is a little long - how about ``-loc'', which we already
use for $seq->{names} ? Or call it ``-file'' ?

I just note that there's FileHandle::new_tmpfile() and the Camel
seems to prefer this (p.485). 

> Seq.pm
> -------
> 
> o One major interface change that needs to happen SOON is changing
> Dna_to_Rna(), Rna_to_Dna() and translate() so that they return biosequence
> objects instead of strings. I tried for a little while to do this, using
> the Perl-OO-tutorial as a guide but kept running into problems with
> scoping. I'm also not sure if the "Right Way" involves returning an object
> or a ref. to an object. I don't want to waste any more time doing this if
> the answer involves a tiny piece of code that is immediately obvious to
> someone on this list. So- if someone knows the "Right Way" to do this,
> please let me know!

In UnivAln.pm, there's a method aln() which returns an alignment; I hope
it can serve as a model (Steve?)...
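
A minimal sketch of the usual idiom, assuming a hash-based object. In Perl
an object is simply a blessed reference, so "returning an object" and
"returning a ref. to an object" come to the same thing. The package name
MySeq and its fields are made up for illustration; they are not the Seq.pm
internals.

```perl
package MySeq;   # hypothetical stand-in for Seq.pm
use strict;

sub new {
    my ($class, %args) = @_;
    my $self = { seq => $args{-seq}, type => $args{-type} };
    return bless $self, $class;
}

sub seq  { return $_[0]->{seq} }
sub type { return $_[0]->{type} }

# Return a new object rather than a plain string.
sub Dna_to_Rna {
    my $self = shift;
    (my $rna = $self->{seq}) =~ tr/Tt/Uu/;   # copy, then T -> U on the copy
    # ref($self) gives the caller's class, so subclasses get their own type back
    return ref($self)->new(-seq => $rna, -type => 'Rna');
}

package main;

my $dna = MySeq->new(-seq => 'ATGTTT', -type => 'Dna');
my $rna = $dna->Dna_to_Rna;
print ref($rna), " ", $rna->seq, "\n";   # prints "MySeq AUGUUU"
```

The original object is left untouched, and the caller can chain further
method calls on the result.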
 
> o Non-fatal use of Parse.pm if ReadSeq does not exist or is not configured
> I wrapped some code in an eval{} block in Seq.pm that tries to
> politely figure out if Parse.pm is available -- it checks for the presence
> of an exported "OK" variable in Parse.pm. Is this the right approach?

I'd check that $Parse::VERSION >= release_number_which_you_need.
Maybe there's something better though.
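
A rough sketch of such a check, assuming Parse.pm sets $VERSION. The helper
name module_available() and the version number 0.01 are placeholders of
mine, not anything in the current code. The VERSION method is the standard
one inherited from UNIVERSAL.

```perl
use strict;

# Hypothetical helper: true if $module loads and is at least $min_version.
sub module_available {
    my ($module, $min_version) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    my $ok = eval {
        require $file;                   # dies if the module is missing
        $module->VERSION($min_version);  # dies if the version is too old
        1;
    };
    return $ok ? 1 : 0;
}

# 0.01 is a placeholder for the release number actually needed.
my $have_parse = module_available('Parse', 0.01);
print "Parse.pm is ", ($have_parse ? "available" : "not available"), "\n";
```

Since any failure stays inside the eval, Seq.pm can simply test $have_parse
and skip the parsing features without printing an error.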

> Seq.pm should be able to use/not-use Parse.pm without any obvious error
> messages.
> 
> 
> o Site-specific configuration issues.
> Right now, Seq.pm does not have to be edited by users but Parse.pm and the
> test scripts do. I'm going to hit the POD docs for MakeMaker, etc. and try
> to figure out how to set up a system where users edit a ".config" file or
> somesuch and the resulting info is used to automatically tweak Parse.pm and
> Seq.pm during the 'make' process. Again, any help/suggestions on this would
> be appreciated.

Let's postpone this until Steve replies.
I thought MakeMaker is so complex b/c it can take care of _all_ configuration
issues automatically ?! ;-)

> o Proposed validity markers
>   - A marker that would be set to 'false' whenever Seq.pm makes a call to carp()
>   - A marker to specify valid/invalid biosequence object
> Are these permutations of the same idea or two different things? I'm also

They are both ways of defining what ``valid'' is.
For me a valid object conforms to some requirements, like (for UnivAln)
that $self->{type} is correct (especially that it reflects the fact that the
alignment is just a sequence bag, i.e. the rows are of different lengths),
$self->{id} has no whitespace, $self->{desc} conforms to $self->{descffmt},
and $self->{row_ids}, etc., have the correct size.
This is something I don't have time for right now, but it's needed eventually.
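
Roughly the kind of check I have in mind, as a sketch only. The function
name validity_problems() and the field name 'rows' are assumptions for
illustration; only 'id' and 'row_ids' are taken from the list above, and
the type and desc checks are left out.

```perl
use strict;

# Hypothetical checker: returns a list of problems; an empty list means valid.
sub validity_problems {
    my ($self) = @_;
    my @problems;

    push @problems, "id is missing or contains whitespace"
        unless defined $self->{id} && $self->{id} =~ /^\S+$/;

    my $n_rows = @{ $self->{rows}    || [] };   # 'rows' is an assumed field name
    my $n_ids  = @{ $self->{row_ids} || [] };
    push @problems, "row_ids has $n_ids entries for $n_rows rows"
        unless $n_ids == $n_rows;

    return @problems;
}

my %aln = (id => 'aln 1', rows => ['ACGT', 'AC'], row_ids => ['r1']);
print "$_\n" for validity_problems(\%aln);
```

Returning the list of problems rather than a bare true/false would let a
single ``valid'' marker be derived from it while still saying what is wrong.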

> not sure about how to implement.
>
> o Default constructor ID
> Steve commented that the default constructor ID should be changed from
> "No_Id_Given" to "No_Id" plus a unique number. Assigning a number is easy
> enough but how would you keep track of "unique" numbers assigned? Is there
> a way to save state or remember these numbers each time new() is called? I
> think I see the potential problems that objects with the same 'ID' field
> could cause but I'm unsure how a 'unique' naming process would work.

Same here..
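
One thing that might work, as a sketch: a file-scoped lexical counter in
the package keeps its value between calls to new(). The package name MySeq
and the '-id' argument are assumptions for illustration. Note that this
only gives IDs that are unique within one running program, not across runs.

```perl
package MySeq;   # hypothetical stand-in for Seq.pm
use strict;

my $id_counter = 0;   # file-scoped: keeps its value between calls to new()

sub new {
    my ($class, %args) = @_;
    # Use the caller's ID if one was given, otherwise number a default one.
    my $id = defined $args{-id} ? $args{-id} : 'No_Id_' . ++$id_counter;
    return bless { id => $id }, $class;
}

sub id { return $_[0]->{id} }

package main;

my $first  = MySeq->new;
my $second = MySeq->new;
my $named  = MySeq->new(-id => 'HSU1234');
print join(" ", $first->id, $second->id, $named->id), "\n";
# prints "No_Id_1 No_Id_2 HSU1234"
```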

best wishes,
georg

> o translate() treats ambiguity inconsistently
> Steve mentioned this, but I want to be sure that I understand the problem
> -- it looks like the code deals with "N" unknown bases but does not deal
> with any of the other IUPAC symbols for ambiguity. Is this what you were
> pointing out Steve?
> 
> 
> 
> Sorry for the length!
> 
> Regards,
> Chris Dagdigian
> cdagdigian@genetics.com
> 
> 
>