[Bioperl-l] bp_genbank2gff3.pl vs. EMBL2GFF ?
Scott Cain
scott at scottcain.net
Thu Jan 22 22:17:54 UTC 2009
Hi Don,
Thanks for this--I committed it today.
Scott
On Wed, Jan 21, 2009 at 3:35 PM, Don Gilbert
<gilbertd at cricket.bio.indiana.edu> wrote:
>
> Dan Bolser <dan.bolser at gmail.com> spotted a problem in bp_genbank2gff3.pl
> and asked whether the script was worth the effort to fix and use, rather than
> simply calling Bio::SeqIO methods.
>
> Here is a patch that should fix the species->binomial problem you found in
> bp_genbank2gff3, along with an update for changes in how BioPerl handles
> annotations.
> As to whether the script is worth it: bp_genbank2gff3 parses more of the
> GenBank/EMBL/SwissProt annotations and tries to put more of them into GFF3
> hierarchical gene-model structures. If you don't need that level of detail,
> the simpler Bio::SeqIO processing is good enough, and it is less fragile to
> changes in your data source or to BioPerl updates.
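A minimal, pure-Perl sketch of the species guard that the second hunk of the patch introduces. The MockSeq and MockSpecies classes and the species_name helper are hypothetical stand-ins, used so the example runs without BioPerl; in the real script the objects come from Bio::SeqIO. The sketch also shows one way the pre-patch expression can fail: it calls binomial() on whatever species() returns, so it dies when that value is undef.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a species object and a rich sequence object,
# so the sketch runs without BioPerl installed.
package MockSpecies;
sub new      { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class }
sub binomial { return $_[0]{binomial} }

package MockSeq;
sub new     { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class }
sub species { return $_[0]{species} }    # may legitimately be undef

package main;

# Hypothetical helper mirroring the patched logic: keep the "species"
# annotation if there is one; otherwise fall back to species()->binomial(),
# but only when a species object exists and implements binomial().
sub species_name {
    my ($seq, $annotated) = @_;          # $annotated stands in for get_Annotations("species")
    my $species = $annotated;
    if (!$species && $seq->can('species') && defined $seq->species()
        && $seq->species()->can('binomial')) {
        $species = $seq->species()->binomial();
    }
    return $species;
}

my $with    = MockSeq->new(species => MockSpecies->new(binomial => 'Homo sapiens'));
my $without = MockSeq->new(species => undef);

print species_name($with), "\n";         # prints "Homo sapiens"
my $none = species_name($without);
print defined $none ? "defined\n" : "undef\n";    # prints "undef"

# The pre-patch expression calls binomial() unconditionally and dies here.
my $old_ok = eval {
    my $s = $without->can('species') ? $without->species()->binomial() : undef;
    1;
};
print $old_ok ? "pre-patch: ok\n" : "pre-patch: died: $@";
```

Checking both `defined` and `can('binomial')` keeps the lookup safe even if species() returns an undefined value or an object of some other class.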
>
> - Don Gilbert
>
> BioPerl-1.5.9/scripts/Bio-DB-GFF/genbank2gff3.PLS
> #$Id: genbank2gff3.PLS 15088 2008-12-04 02:49:09Z bosborne $;
>
>
> diff -bwrc scripts/Bio-DB-GFF/genbank2gff3.PLS scripts/Bio-DB-GFF/genbank2gff3.fixed.pl
> *** scripts/Bio-DB-GFF/genbank2gff3.PLS Fri Jan 16 13:33:47 2009
> --- scripts/Bio-DB-GFF/genbank2gff3.fixed.pl Wed Jan 21 15:23:08 2009
> ***************
> *** 671,678 ****
> 'product' => 'product',
> 'Reference' => 'reference',
> 'OntologyTerm' => 'Ontology_term',
> ! 'comment' => 'Note',
> ! 'comment1' => 'Note',
> # various map-type locations
> # gene accession tag is named per source db !??
> # 'Index terms' => keywords ??
> --- 671,678 ----
> 'product' => 'product',
> 'Reference' => 'reference',
> 'OntologyTerm' => 'Ontology_term',
> ! #? 'comment' => 'Note',
> ! #? 'comment1' => 'Note',
> # various map-type locations
> # gene accession tag is named per source db !??
> # 'Index terms' => keywords ??
> ***************
> *** 684,691 ****
> || $seq->annotation->get_Annotations("update-date")
> || $is_rich ? $seq->get_dates() : ();
> my ($comment)= $seq->annotation->get_Annotations("comment");
> ! my ($species)= $seq->annotation->get_Annotations("species")
> ! || ( $seq->can('species') ? $seq->species()->binomial() : undef );
>
> # update source feature with main GB fields
> $sf->add_tag_value( ID => $seq_name ) unless $sf->has_tag('ID');
> --- 684,694 ----
> || $seq->annotation->get_Annotations("update-date")
> || $is_rich ? $seq->get_dates() : ();
> my ($comment)= $seq->annotation->get_Annotations("comment");
> ! my ($species)= $seq->annotation->get_Annotations("species");
> ! if( ! $species && $seq->can('species') && defined $seq->species() && $seq->species()->can('binomial') )
> ! {
> ! $species= $seq->species()->binomial();
> ! }
>
> # update source feature with main GB fields
> $sf->add_tag_value( ID => $seq_name ) unless $sf->has_tag('ID');
> ***************
> *** 699,707 ****
> foreach my $atag (sort keys %AnnotTagMap) {
> my $gtag= $AnnotTagMap{$atag}; next unless($gtag);
> my @anno = map{
> ! ref $_
> ! ? split( /[,;] */, $_->value)
> ! : split( /[,;] */, "$_") if($_);
> } $seq->annotation->get_Annotations($atag);
> foreach(@anno) { $sf->add_tag_value( $gtag => $_ ); }
> }
> --- 702,713 ----
> foreach my $atag (sort keys %AnnotTagMap) {
> my $gtag= $AnnotTagMap{$atag}; next unless($gtag);
> my @anno = map{
> ! # dgg; handle Bio::Annotation::TagTree as get_all_values
> ! if(ref $_ && $_->can('get_all_values')) { split( /[,;] */, join ";", $_->get_all_values) }
> ! elsif(ref $_ && $_->can('display_text')) { split( /[,;] */, $_->display_text) }
> ! elsif(ref $_ && $_->can('value')) { split( /[,;] */, $_->value) }
> ! #bad.gets hashes# elsif($_) { split( /[,;] */, "$_") }
> ! else { (); }
> } $seq->annotation->get_Annotations($atag);
> foreach(@anno) { $sf->add_tag_value( $gtag => $_ ); }
> }
>
> ...........
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
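The third hunk of the patch replaces a simple value-or-stringify branch with a dispatch on whichever accessor the annotation object provides: first get_all_values, then display_text, then value. Each result is split on commas and semicolons. Below is a minimal sketch of that ordering. The Mock* classes and flatten_annotation are hypothetical and stand in for BioPerl annotation objects such as Bio::Annotation::TagTree. One deliberate difference from the patch is that this version tests Scalar::Util::blessed rather than `ref`, because calling `can` on an unblessed reference dies in Perl.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical mocks; in BioPerl, Bio::Annotation::TagTree offers get_all_values.
package MockTagTree;
sub new            { my ($class, @vals) = @_; my $self = { vals => [@vals] }; return bless $self, $class }
sub get_all_values { return @{ $_[0]{vals} } }

package MockText;                         # something offering display_text
sub new          { my ($class, $text) = @_; my $self = { text => $text }; return bless $self, $class }
sub display_text { return $_[0]{text} }

package MockValue;                        # something offering value
sub new   { my ($class, $val) = @_; my $self = { val => $val }; return bless $self, $class }
sub value { return $_[0]{val} }

package main;

# Hypothetical helper mirroring the patched map block: use the richest accessor
# the object offers, then split on a comma or semicolon plus any trailing spaces.
sub flatten_annotation {
    my ($ann) = @_;
    return () unless blessed $ann;        # plain strings and unblessed refs are skipped
    if    ($ann->can('get_all_values')) { return split /[,;] */, join(';', $ann->get_all_values) }
    elsif ($ann->can('display_text'))   { return split /[,;] */, $ann->display_text }
    elsif ($ann->can('value'))          { return split /[,;] */, $ann->value }
    return ();
}

my @annotations = (
    MockTagTree->new('kinase', 'membrane'),
    MockText->new('note one; note two'),
    MockValue->new('GO:0005524,GO:0004672'),
    { raw => 'unblessed' },               # unblessed ref: skipped here
    'plain string',                       # plain scalar: skipped, as in the patch
);
my @tags = map { flatten_annotation($_) } @annotations;
print join('|', @tags), "\n";   # kinase|membrane|note one|note two|GO:0005524|GO:0004672
```

Trying get_all_values first means that tree-structured annotations contribute all of their leaf values, not a single stringified summary.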
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research