[Bioperl-l] SeqIO (stress) testing

Kris Boulez krbou@pgsgent.be
Wed, 20 Dec 2000 09:04:16 +0100


I played around a bit with SeqIO (starting from one format and
writing/reading it back in different formats) and found some interesting
things. I haven't had time to look into all of them yet; I hope to do so
this evening.

One thing I have already tracked down (and was able to document properly):
starting from t/test.genbank and writing it out in PIR format produces a
file that BioPerl cannot read back in. As I have little or no knowledge
of the PIR format, I submitted a bug report (#876).
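
The kind of round trip described above can be sketched roughly as
follows. This is not the script attached to the bug report; the helper
name round_trip and the scratch file roundtrip.pir are made up for
illustration, and only the usual Bio::SeqIO calls (new, next_seq,
write_seq) are assumed.

```perl
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# Hypothetical helper (not part of BioPerl): read every sequence from
# $in_file, write it to $tmp_file in $out_fmt, then try to read that
# file back. Returns the number of sequences read back.
sub round_trip {
    my ($in_file, $in_fmt, $out_fmt, $tmp_file) = @_;

    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => $in_fmt);
    my $out = Bio::SeqIO->new(-file => ">$tmp_file", -format => $out_fmt);
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);
    }
    undef $out;    # drop the writer so the file is flushed and closed

    my $back  = Bio::SeqIO->new(-file => $tmp_file, -format => $out_fmt);
    my $count = 0;
    $count++ while $back->next_seq;
    return $count;
}

unless (caller) {
    # roundtrip.pir is a placeholder name for the scratch file.
    my $n = eval {
        round_trip('t/test.genbank', 'genbank', 'pir', 'roundtrip.pir');
    };
    if (defined $n) {
        print "read back $n sequence(s)\n";
    } else {
        print "round trip failed: $@\n";
    }
}
```

Wrapping the call in eval means a die() or a thrown exception on either
the write or the read side is reported instead of aborting the script.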

For the following I don't have a demo script ready yet (I will write one
this evening):

- starting from t/test.genbank, writing a swiss-prot file gives (we die,
  no error is thrown):
Programming error - cannot called write_line_swissprot_regex with
different length pre1 and pre2 tags! at
/usr/lib/perl5/site_perl/5.005/Bio/SeqIO/swiss.pm line 949, <GEN2> chunk
176.

(when adding $pre1 and $pre2 to the die() message)
Programming error - cannot called write_line_swissprot_regex with
different length pre1 and pre2 tags (FT   sig_peptide     76    123
) (FT                                )! at
/usr/lib/perl5/site_perl/5.005/Bio/SeqIO/swiss.pm line 949, <GEN2> chunk
176.


- starting from t/test.genbank, writing a gcg file, reading this gcg
  file gives
-------------------- EXCEPTION --------------------
MSG: Looks like start of another sequence. See documentation. 
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: seqtest.pl
STACK: 
Bio::SeqIO::gcg::next_seq(123)
main::seqtest.pl(14)
---------------------------------------------------



- starting from t/test.embl, SeqIO has a problem reading a gcg file it
  wrote itself (it just loops forever). I will investigate this one
  further, as it's not clear what happens or when.



Looking at the tests (and test sequences) we have now, I saw that we
only try to read the first sequence from our test sequence files (apart
from GCG, which reads more than one file). test.embl even contains
only one sequence. I think we should also test reading/writing
multiple sequences from one file.
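
A minimal sketch of what such a multi-sequence test could look like,
assuming a test file with several entries exists. The file names
multi.fasta and multi.embl and the helpers count_seqs and copy_all are
hypothetical; nothing like them is in t/ at the moment.

```perl
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# Hypothetical helper: count all sequences in a file, not just the first.
sub count_seqs {
    my ($file, $format) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => $format);
    my $count = 0;
    $count++ while $in->next_seq;
    return $count;
}

# Hypothetical helper: copy every sequence from $src to $dst,
# converting from $src_fmt to $dst_fmt along the way.
sub copy_all {
    my ($src, $src_fmt, $dst, $dst_fmt) = @_;
    my $in  = Bio::SeqIO->new(-file => $src,    -format => $src_fmt);
    my $out = Bio::SeqIO->new(-file => ">$dst", -format => $dst_fmt);
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);
    }
    undef $out;    # drop the writer so the file is flushed and closed
}

unless (caller) {
    # Placeholder file names; substitute a real multi-sequence test file.
    my ($src, $dst) = ('multi.fasta', 'multi.embl');
    my $ok = eval {
        my $before = count_seqs($src, 'fasta');
        copy_all($src, 'fasta', $dst, 'embl');
        my $after = count_seqs($dst, 'embl');
        $before > 1 && $before == $after;
    };
    print $ok ? "ok 1\n" : "not ok 1\n";
    print "# $@\n" if $@;
}
```

The check only compares sequence counts before and after the
conversion; comparing ids or sequence strings as well would make the
test stricter.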


Kris,