[Bioperl-l] GenBank Flat File Parser
Hilmar Lapp
hlapp@gnf.org
Mon, 9 Sep 2002 17:18:18 -0700
Hi John,
Sounds interesting. Do you have a rough comparison with the Bioperl
GenBank parser in terms of both features and speed? If it's much
faster, or if it's event-based, would you consider integrating your
parser into the Bioperl framework?
-hilmar
On Monday, September 9, 2002, at 04:50 PM, John Kloss wrote:
> I don't know if this is the right place to announce this.
>
> I've created a generic GenBank Flat File parser. It's modular, easily
> modified (to keep up with the changing flat file format), and pretty
> fast. Hopefully it's easy to use. I have to parse flat files a lot so
> the code is useful to me. Hopefully, it's useful to you.
>
> It's available at
>
> sapiens.wustl.edu/~jkloss/GenBankParser.tar.gz
>
> It's a module. Just
>
> gzip -cd GenBankParser.tar.gz | tar xvf -
> cd GenBankParser
> perl Makefile.PL
> make
> make install
>
> I have two sample programs at
>
> sapiens.wustl.edu/~jkloss/gb2fasta.txt
> sapiens.wustl.edu/~jkloss/gt2fasta.txt
>
> One parses the flat file and spits out nucleic acid FASTA format, the
> other protein.
>
> As a quick example of its use (though the perldoc info gives a lot of
> examples), if I want to parse out the DEFINITION field, the VERSION
> field's accession and version, and the LOCUS date field for each entry,
> I would code
>
> use GenBankParser qw( DEFINITION LOCUS VERSION );
>
> my $Parser = new GenBankParser;
>
> $Parser->parse_file( \*STDIN, sub {
>     my $parser = shift;
>
>     print $parser->VERSION->accession, "\n";
>     print $parser->VERSION->version, "\n";
>     print $parser->LOCUS->date, "\n";
>     print $parser->DEFINITION, "\n";
> });
>
> That's about it.
>
> The parser parses the FEATURES table, too. So if I wanted to parse out
> the translation, accession, version, and note fields for every CDS in an
> entry, except those that are pseudogenes, I would code
>
> use GenBankParser qw( FEATURES );
> use GenBankParser::FEATURES qw( CDS );
>
> (new GenBankParser)->parse_file( \*STDIN, sub {
>     my $parser = shift;
>
>     foreach my $cds ( @{ $parser->FEATURES->CDS } ) {
>
>         next if $cds->pseudo;
>
>         print $cds->accession, "\n";
>         print $cds->version, "\n";
>         print $cds->note, "\n";
>         print $cds->translation, "\n";
>     }
> });
>
> Anyway, if you find it useful, let me know.
>
> John Kloss <jkloss@sapiens.wustl.edu>
> Systems Admin., Database Admin., Programmer.
>
> Gish Lab, Genome Sequencing Center
> Washington University Medical School ... in St. Louis
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
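
For readers who want to see what the per-entry callback style in the
quoted examples might look like underneath, here is a minimal sketch in
plain Perl. It is not GenBankParser's implementation and not the Bioperl
parser: parse_entries, the hash keys, and the sample entry are all
invented for illustration, and only the LOCUS date, DEFINITION, and
VERSION lines are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of GenBankParser or Bioperl): read GenBank
# flat-file entries from a filehandle and call $callback once per entry.
# Returns the number of entries seen.
sub parse_entries {
    my ($fh, $callback) = @_;
    my (%entry, $in_definition);
    my $count = 0;

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m{^//}) {
            # "//" terminates an entry: hand a copy to the callback, reset.
            if (%entry) {
                $callback->({%entry});
                $count++;
            }
            %entry = ();
            $in_definition = 0;
        }
        elsif ($line =~ /^LOCUS\s+(\S+)/) {
            $entry{locus} = $1;
            # Assumes the date is the final DD-MON-YYYY token on the line.
            ($entry{date}) = $line =~ /(\d{2}-[A-Z]{3}-\d{4})\s*$/;
            $in_definition = 0;
        }
        elsif ($line =~ /^DEFINITION\s+(.*)/) {
            $entry{definition} = $1;
            $in_definition = 1;
        }
        elsif ($in_definition && $line =~ /^\s+(\S.*)/) {
            # Indented continuation line of a wrapped DEFINITION.
            $entry{definition} .= " $1";
        }
        elsif ($line =~ /^VERSION\s+(\S+?)\.(\d+)/) {
            @entry{qw(accession version)} = ($1, $2);
            $in_definition = 0;
        }
        else {
            $in_definition = 0;
        }
    }
    return $count;
}

# Invented sample entry, for illustration only.
my $sample = <<'END';
LOCUS       XX000001     120 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Invented example entry whose definition line
            wraps onto a second line.
ACCESSION   XX000001
VERSION     XX000001.2
//
END

# Read from an in-memory filehandle (needs Perl 5.8 or later).
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
parse_entries($fh, sub {
    my $e = shift;
    print join("\t", @$e{qw(accession version date)}), "\n";
    print $e->{definition}, "\n";
});
close $fh;
```

Firing the callback at each "//" terminator means only one entry is held
in memory at a time, which is one reason this style suits large flat
files.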