[Bioperl-l] bp_genbank2gff3.pl vs. EMBL2GFF ?
Don Gilbert
gilbertd at cricket.bio.indiana.edu
Wed Jan 21 20:35:41 UTC 2009
Dan Bolser <dan.bolser at gmail.com> spotted a problem in bp_genbank2gff3.pl,
and asked whether the script is worth fixing and using, rather than making a
simpler call to Bio::SeqIO methods.
Here is a patch that should fix the problem you found with the
species->binomial call in bp_genbank2gff3, and that also updates the script
for changes in how BioPerl Annotation objects are used.
As to the question of value: bp_genbank2gff3 does more parsing of
genbank/embl/swissprot annotations than plain Bio::SeqIO, and tries to put more
of them into GFF v3 hierarchical gene model structures. If you don't need that
level of detail, the simpler Bio::SeqIO processing is good enough, and it is
less fragile when your data source or BioPerl itself changes.
- Don Gilbert
BioPerl-1.5.9/scripts/Bio-DB-GFF/genbank2gff3.PLS
#$Id: genbank2gff3.PLS 15088 2008-12-04 02:49:09Z bosborne $;
diff -bwrc scripts/Bio-DB-GFF/genbank2gff3.PLS scripts/Bio-DB-GFF/genbank2gff3.fixed.pl
*** scripts/Bio-DB-GFF/genbank2gff3.PLS Fri Jan 16 13:33:47 2009
--- scripts/Bio-DB-GFF/genbank2gff3.fixed.pl Wed Jan 21 15:23:08 2009
***************
*** 671,678 ****
'product' => 'product',
'Reference' => 'reference',
'OntologyTerm' => 'Ontology_term',
! 'comment' => 'Note',
! 'comment1' => 'Note',
# various map-type locations
# gene accession tag is named per source db !??
# 'Index terms' => keywords ??
--- 671,678 ----
'product' => 'product',
'Reference' => 'reference',
'OntologyTerm' => 'Ontology_term',
! #? 'comment' => 'Note',
! #? 'comment1' => 'Note',
# various map-type locations
# gene accession tag is named per source db !??
# 'Index terms' => keywords ??
***************
*** 684,691 ****
|| $seq->annotation->get_Annotations("update-date")
|| $is_rich ? $seq->get_dates() : ();
my ($comment)= $seq->annotation->get_Annotations("comment");
! my ($species)= $seq->annotation->get_Annotations("species")
! || ( $seq->can('species') ? $seq->species()->binomial() : undef );
# update source feature with main GB fields
$sf->add_tag_value( ID => $seq_name ) unless $sf->has_tag('ID');
--- 684,694 ----
|| $seq->annotation->get_Annotations("update-date")
|| $is_rich ? $seq->get_dates() : ();
my ($comment)= $seq->annotation->get_Annotations("comment");
! my ($species)= $seq->annotation->get_Annotations("species");
! if( ! $species && $seq->can('species') && defined $seq->species() && $seq->species()->can('binomial') )
! {
! $species= $seq->species()->binomial();
! }
# update source feature with main GB fields
$sf->add_tag_value( ID => $seq_name ) unless $sf->has_tag('ID');
***************
*** 699,707 ****
foreach my $atag (sort keys %AnnotTagMap) {
my $gtag= $AnnotTagMap{$atag}; next unless($gtag);
my @anno = map{
! ref $_
! ? split( /[,;] */, $_->value)
! : split( /[,;] */, "$_") if($_);
} $seq->annotation->get_Annotations($atag);
foreach(@anno) { $sf->add_tag_value( $gtag => $_ ); }
}
--- 702,713 ----
foreach my $atag (sort keys %AnnotTagMap) {
my $gtag= $AnnotTagMap{$atag}; next unless($gtag);
my @anno = map{
! # dgg; handle Bio::Annotation::TagTree as get_all_values
! if(ref $_ && $_->can('get_all_values')) { split( /[,;] */, join ";", $_->get_all_values) }
! elsif(ref $_ && $_->can('display_text')) { split( /[,;] */, $_->display_text) }
! elsif(ref $_ && $_->can('value')) { split( /[,;] */, $_->value) }
! #bad.gets hashes# elsif($_) { split( /[,;] */, "$_") }
! else { (); }
} $seq->annotation->get_Annotations($atag);
foreach(@anno) { $sf->add_tag_value( $gtag => $_ ); }
}
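
For anyone who wants to try the two guards from the patch outside the script,
here is a minimal stand-alone sketch. It is not part of the patch. The Demo::*
packages and the helper names species_name and annotation_values are
hypothetical stand-ins invented for this example, so that it runs without
BioPerl installed. It also uses Scalar::Util::blessed where the patch uses
ref, a small departure that avoids calling can() on an unblessed reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-ins for BioPerl objects; only the methods the
# guards look for are implemented.
{
    package Demo::TagTree;
    sub new            { my ($class, @v) = @_; return bless { v => [@v] }, $class }
    sub get_all_values { return @{ $_[0]{v} } }
}
{
    package Demo::Comment;
    sub new          { my ($class, $t) = @_; return bless { t => $t }, $class }
    sub display_text { return $_[0]{t} }
}
{
    package Demo::Species;
    sub new      { return bless {}, $_[0] }
    sub binomial { return 'Homo sapiens' }
}
{
    package Demo::Seq;
    sub new     { my ($class, $sp) = @_; return bless { sp => $sp }, $class }
    sub species { return $_[0]{sp} }
}

package main;

# Return the binomial name, or undef when the sequence has no usable
# species object (the case that made the unguarded call die).
sub species_name {
    my ($seq) = @_;
    return undef unless blessed($seq) && $seq->can('species');
    my $sp = $seq->species;
    return undef unless blessed($sp) && $sp->can('binomial');
    return $sp->binomial;
}

# Flatten one annotation into a list of values, trying the accessors in
# the same order as the patch; anything else (plain hashes, strings)
# yields an empty list instead of a stringified reference.
sub annotation_values {
    my ($anno) = @_;
    return () unless blessed($anno);
    my $text;
    if    ($anno->can('get_all_values')) { $text = join ';', $anno->get_all_values }
    elsif ($anno->can('display_text'))   { $text = $anno->display_text }
    elsif ($anno->can('value'))          { $text = $anno->value }
    return () unless defined $text;
    return split /[,;] */, $text;
}

my $with_species    = Demo::Seq->new( Demo::Species->new );
my $without_species = Demo::Seq->new(undef);

print "species: ", ( species_name($with_species)    // '(none)' ), "\n";
print "species: ", ( species_name($without_species) // '(none)' ), "\n";

for my $anno ( Demo::TagTree->new( 'kinase', 'ATP-binding, membrane' ),
               Demo::Comment->new('first note; second note'),
               { not => 'an object' } ) {
    print "values: ", join( ' | ', annotation_values($anno) ), "\n";
}
```

The point of the ordering is that an object offering get_all_values is
handled before the more generic accessors, and that nothing is ever
stringified as a last resort.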
...........
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/