[Bioperl-l] guessing sequence format

Heikki Lehvaslaiho heikki at nildram.co.uk
Tue Dec 2 15:25:37 EST 2003


Andreas Kähäri has written a module that gives SeqIO and AlignIO the ability
to look into input files and guess the format of the sequence:
Bio::Tools::GuessSeqFormat. See the POD docs in the module for the supported
formats and details.

Initial modifications have been made to Bio::SeqIO::new() and
Bio::AlignIO::new() so that they try to determine the format in this order:

1. as given in the argument (-format)
2. based on the file name extension
3. by looking into the file, calling Bio::Tools::GuessSeqFormat

No verification of the format is done if conditions 1 or 2 are met. I think it 
would be neat to have an option to do that. It could, for example, be linked 
to verbosity. Suggestions or implementations are welcome.
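
To make the order above concrete, here is a minimal sketch in core Perl only.
It is not the code in Bio::SeqIO: the names resolve_format(),
guess_from_content() and %format_for_ext are made up for illustration, the
extension table is a small assumed sample, and the content check is a toy
stand-in for Bio::Tools::GuessSeqFormat that only recognises FASTA.

```perl
use strict;
use warnings;

# Assumed sample mapping; the real extension table lives inside BioPerl.
my %format_for_ext = (
    fasta => 'fasta', fa => 'fasta', fas => 'fasta',
    gb    => 'genbank',
    embl  => 'embl',
);

# Toy stand-in for Bio::Tools::GuessSeqFormat (hypothetical helper):
# it only knows that FASTA data starts with a '>' header line.
sub guess_from_content {
    my ($text) = @_;
    return 'fasta' if defined $text && $text =~ /\A\s*>/;
    return;    # format could not be guessed
}

# Hypothetical resolver following the three steps in order.
sub resolve_format {
    my (%args) = @_;

    # 1. an explicit -format style argument wins, without verification
    return lc $args{format} if defined $args{format};

    # 2. otherwise use the file name extension, if it is a known one
    if ( defined $args{file} && $args{file} =~ /\.([^.\/\\]+)\z/ ) {
        my $format = $format_for_ext{ lc $1 };
        return $format if defined $format;
    }

    # 3. otherwise look into the data itself
    return guess_from_content( $args{text} );
}

my $explicit = resolve_format( format => 'GenBank', file => 'seqs.fa' );
my $by_ext   = resolve_format( file => 'seqs.fa' );
my $by_guess = resolve_format( file => 'data.txt', text => ">test1\nagtgct\n" );

print "$explicit\n";    # genbank
print "$by_ext\n";      # fasta
print "$by_guess\n";    # fasta
```

Note that in this sketch, as in the description above, an explicit format or
a known extension is trusted as is; the content is only inspected when both
are missing.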

Tests have been written for reading all formats from files, and even reading
from a file handle works, which is really cool:

----------------- snip --------------------
use IO::String;
use Bio::SeqIO;

my $string = ">test1 no comment
agtgctagctagctagctagct
>test2 no comment
gtagttatgc
";

my $stringfh = IO::String->new($string);

# No -format is given, so SeqIO has to guess it from the data itself
my $seqio = Bio::SeqIO->new(-fh => $stringfh);
while( my $seq = $seqio->next_seq ) {
    print $seq->id, "\n";
}

----------------- snip --------------------

It would be really good if people could try this out now so that any bugs
can be ironed out before the 1.4 release.

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________



