[Bioperl-l] GenBank Flat File Parser
John Kloss
jkloss@sapiens.wustl.edu
Mon, 9 Sep 2002 16:50:17 -0700
I don't know if this is the right place to announce this.
I've created a generic GenBank Flat File parser. It's modular, easily
modified (to keep up with the changing flat file format), and pretty
fast. Hopefully it's easy to use. I have to parse flat files a lot so
the code is useful to me. Hopefully, it's useful to you.
It's available at
sapiens.wustl.edu/~jkloss/GenBankParser.tar.gz
It's a module. To install it, just run

    gzip -cd GenBankParser.tar.gz | tar xvf -
    cd GenBankParser
    perl Makefile.PL
    make
    make install
I have two sample programs at
sapiens.wustl.edu/~jkloss/gb2fasta.txt
sapiens.wustl.edu/~jkloss/gt2fasta.txt
One parses the flat file and writes nucleotide sequences in FASTA
format; the other writes protein sequences.
As a quick example of its use (though the perldoc info gives a lot of
examples), if I want to parse out the DEFINITION field, the accession
and version from the VERSION field, and the date from the LOCUS field
for each entry, I would write

    use GenBankParser qw( DEFINITION LOCUS VERSION );

    my $Parser = GenBankParser->new;

    # The callback runs for each entry and receives the parser object.
    $Parser->parse_file( \*STDIN, sub {
        my $parser = shift;
        print $parser->VERSION->accession, "\n";
        print $parser->VERSION->version,   "\n";
        print $parser->LOCUS->date,        "\n";
        print $parser->DEFINITION,         "\n";
    });

That's about it.
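
For readers without the module at hand, the callback-per-entry pattern
used above can be approximated in plain Perl. What follows is only a
minimal sketch and not GenBankParser's implementation: the each_entry
helper, its regular expressions, and the toy entry are all assumptions
made up for illustration, and the sketch ignores most of the flat file
format.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of GenBankParser): read entries terminated
# by a "//" line from a filehandle and hand selected fields to a callback.
sub each_entry {
    my ( $fh, $callback ) = @_;
    local $/ = "\n//";    # each read returns one entry, up to its "//" line
    while ( my $entry = <$fh> ) {
        next unless $entry =~ /^LOCUS\s/m;    # skip the remainder after the last "//"
        my %field;
        # Assumes the date is the last token on the LOCUS line (DD-MON-YYYY).
        ( $field{date} ) = $entry =~ /^LOCUS\s.*?(\d{2}-[A-Z]{3}-\d{4})[ \t]*$/m;
        # VERSION holds accession.version, for example "TOY00001.2".
        @field{qw( accession version )} = $entry =~ /^VERSION\s+([^\s.]+)\.(\d+)/m;
        # DEFINITION may wrap onto indented continuation lines.
        if ( $entry =~ /^DEFINITION\s+(.*(?:\n[ \t]+.*)*)/m ) {
            ( $field{definition} = $1 ) =~ s/\s*\n\s+/ /g;
        }
        $callback->( \%field );
    }
}

# Invented toy entry, for illustration only.
my $toy_entry = <<'END';
LOCUS       TOY00001     12 bp    DNA             SYN       01-JAN-2000
DEFINITION  Made-up toy entry for illustration,
            wrapped onto a second line.
ACCESSION   TOY00001
VERSION     TOY00001.2
ORIGIN
        1 acgtacgtac gt
//
END

open my $fh, '<', \$toy_entry or die "cannot open in-memory handle: $!";
each_entry( $fh, sub {
    my $field = shift;
    print "$field->{accession}\n$field->{version}\n";
    print "$field->{date}\n$field->{definition}\n";
} );
close $fh;
```

Setting the input record separator to "\n//" lets Perl do the entry
splitting, so only one entry is held in memory at a time. That is probably
the main reason to prefer a callback interface for large flat files.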
The parser parses the FEATURES table, too. So if I wanted to parse out
the translation, accession, version, and note fields for every CDS in an
entry, except those that are pseudogenes, I would write

    use GenBankParser           qw( FEATURES );
    use GenBankParser::FEATURES qw( CDS );

    GenBankParser->new->parse_file( \*STDIN, sub {
        my $parser = shift;

        foreach my $cds ( @{ $parser->FEATURES->CDS } ) {
            next if $cds->pseudo;    # skip pseudogenes

            print $cds->accession,   "\n";
            print $cds->version,     "\n";
            print $cds->note,        "\n";
            print $cds->translation, "\n";
        }
    });

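
To make the filtering step concrete without the module, here is a rough
plain-Perl sketch of pulling qualifiers out of CDS features and skipping
those flagged /pseudo. It is not how GenBankParser::FEATURES works
internally: the cds_features helper, its regular expressions, and the toy
feature table are assumptions for illustration only. The post does not say
how the module derives a CDS accession and version, so the sketch just
reports the raw protein_id qualifier. It also does not handle quotes
embedded in qualifier values.

```perl
use strict;
use warnings;

# Hypothetical helper (not GenBankParser code): return one hash reference
# per CDS feature, mapping qualifier names to their values.
sub cds_features {
    my ($table) = @_;
    my @cds;
    # A feature starts with a key indented five spaces; its qualifiers
    # follow on lines indented by at least six spaces.
    while ( $table =~ /^ {5}(\S+)[ \t]+\S.*\n((?:^ {6,}.*\n)*)/mg ) {
        my ( $key, $body ) = ( $1, $2 );
        next unless $key eq 'CDS';
        my %qual;
        # Quoted qualifiers: /name="value", where the value may wrap.
        while ( $body =~ m{^[ \t]+/(\w+)="([^"]*)"}mg ) {
            my ( $name, $value ) = ( $1, $2 );
            if ( $name eq 'translation' ) { $value =~ s/\s+//g }          # join residues
            else                          { $value =~ s/\s*\n\s*/ /g }    # join words
            $qual{$name} = $value;
        }
        # Flag qualifiers without a value, such as /pseudo.
        while ( $body =~ m{^[ \t]+/(\w+)[ \t]*\n}mg ) {
            $qual{$1} = 1;
        }
        push @cds, \%qual;
    }
    return @cds;
}

# Invented toy feature table, for illustration only.
my $toy_table = <<'END';
FEATURES             Location/Qualifiers
     source          1..18
                     /organism="synthetic construct"
     CDS             1..18
                     /note="made-up toy feature,
                     wrapped onto a second line"
                     /protein_id="TOY00001.1"
                     /translation="MKT
                     AY"
     CDS             complement(1..9)
                     /pseudo
                     /note="skipped because it is flagged pseudo"
END

for my $cds ( grep { !$_->{pseudo} } cds_features($toy_table) ) {
    print "$cds->{protein_id}\n$cds->{note}\n$cds->{translation}\n";
}
```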
Anyway, if you find it useful, let me know.
John Kloss <jkloss@sapiens.wustl.edu>
Systems Admin., Database Admin., Programmer.
Gish Lab, Genome Sequencing Center
Washington University Medical School ... in St. Louis