[bioperl] Multiple sequences & ReadSeq

Chris Dagdigian cdagdigian@genetics.com
Thu, 13 Feb 1997 16:24:38 -0400


Ok, here are my thoughts about ReadSeq... [thanks Will for your comments!]

Probably the best course of action to take with Seq.pm is for it to be able
to robustly handle input/output of many formats as well as contain some
simple manipulation methods. The more complicated stuff can be done by
things that inherit the BioSeq object properties.

My preferences for solving the parsing problem, in order of attraction:

        1. Perl code to parse and format everything inside Seq.pm
        2. Modify the readseq source code to add an "unbuffered output" option
            that should allow proper bi-directional piping
        3. Write sequences to temp files and open a one-way pipe to readseq

Of these options, #3 can be accomplished right now. I'm willing to tackle
this ASAP if people would give some guidance on where these temp files
should be written ("/tmp", the script's working directory, etc.).
Maybe there should be a global $TEMP_DIR config option, and Seq.pm can
write temp files to $TEMP_DIR/[PID]/xxx.tmp, where [PID] is the process ID
of the running script?
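
To make the $TEMP_DIR/[PID]/xxx.tmp idea concrete, here is a minimal sketch.
The sub names temp_seq_path and write_temp_seq are hypothetical (nothing like
them exists in Seq.pm), and defaulting $TEMP_DIR to the system temp directory
is only an assumption about what a sensible default might be.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(mkpath);

# Hypothetical global config option; assumed to default to the system
# temp directory when the user has not set it.
our $TEMP_DIR = File::Spec->tmpdir();

# Return a path of the form $TEMP_DIR/<PID>/<name>.tmp, creating the
# per-process directory the first time it is needed.
sub temp_seq_path {
    my ($name) = @_;
    my $dir = File::Spec->catdir($TEMP_DIR, $$);   # $$ is the current PID
    mkpath($dir) unless -d $dir;
    return File::Spec->catfile($dir, "$name.tmp");
}

# Write a raw sequence string to its temp file and return the path,
# which could then be handed to an external converter.
sub write_temp_seq {
    my ($name, $seq) = @_;
    my $path = temp_seq_path($name);
    open(my $fh, '>', $path) or die "Cannot write $path: $!";
    print {$fh} $seq, "\n";
    close($fh) or die "Cannot close $path: $!";
    return $path;
}
```

Keying the directory on the PID keeps two scripts running at once from
clobbering each other's files, but something would still have to remove the
directory when the script exits.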

Option #1 is, I think, very possible if people share their knowledge (and
regular expressions!). I'm hampered by the fact that I only work regularly
with single-sequence GCG and Fasta formatted files. I just don't know enough
about other sequence formats to write proper parse/output code.
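
As an example of the kind of pure-Perl parsing Option #1 would need, here is
a rough sketch for Fasta that also copes with several sequences in one file.
The sub name parse_fasta and the id/desc/seq record layout are hypothetical,
not part of Seq.pm, and it assumes the usual convention that a line starting
with ">" begins a new record.

```perl
use strict;
use warnings;

# Parse Fasta-formatted text into a list of records.
# Each record is a hash ref with the keys id, desc and seq.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^>\s*(\S*)\s*(.*)$/) {
            # Header line: first word is the ID, the rest is a description.
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            # Sequence line: strip whitespace, append to the current record.
            (my $residues = $line) =~ s/\s+//g;
            $records[-1]{seq} .= $residues;
        }
    }
    return @records;
}
```

Called as `my @seqs = parse_fasta($file_contents);`, it returns one record per
">" header, so a single-sequence file simply yields a one-element list.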


Multiple Sequences:

I'm willing to help write whatever the 'powers that be' decide is the
proper behavior for dealing with files containing multiple sequences. I
honestly don't feel qualified enough to dictate the big picture of Bio::*.
My first experience with OO programming was working on Seq.pm.

I will definitely change the POD in Seq.pm to better reflect which formats
allow for multiple sequences. Thanks again, Will, for pointing this out.


Regards,
Chris Dagdigian
cdagdigian@genetics.com