[Bioperl-l] Genbank2gff3 script update

Don Gilbert gilbertd at cricket.bio.indiana.edu
Tue Mar 27 23:42:30 UTC 2007


Dear Bioperl developers,

Here is an improved bp_Genbank2gff3.pl script, with bug fixes
and enhancements.  Changes that alter existing behavior are enabled
only through non-default command flags. I've updated the files against current
Bioperl CVS. Would one of you care to add this to your CVS repository?  

Thanks, Don Gilbert

Find it at http://eugenes.org/gmod/genbank2chado/

=item Bioperl bp_genbank2gff3.pl 

  bin/genbank2gff3.PLS   (Bioperl CVS scripts/Bio-GFF-DB/genbank2gff3.PLS)
  lib/Bio-new/SeqFeature/Tools/TypeMapper.pm      (required for genbank2gff3 update)
  lib/Bio-new/SeqFeature/Tools/Unflattener.pm     (minor change suggested for genbank2gff3)
    (put into your Bioperl lib/Bio/... directories)

There is also this unrelated patch:
  lib/Bio-new/Graphics/Glyph/processed_transcript.pm  
      -- new flag to ignore excess subfeatures from Chado's gene-mrna-polypeptide-exon model.
  
=item Genbank2gff3 changes

  * Polypeptide alternate gene model added (--noCDS option)
    Standard gene model:  gene > mRNA > (UTR,CDS,exon)
    G-R-P-E alternate model:   gene > mRNA > polypeptide > exon
    Polypeptide contains all the important protein info (IDs, translation, GO terms)

  * IO pipes: curl ftp://ncbigenomes/... | genbank2gff3 --in stdin --out stdout | gff2chado ...
  
  * GenBank main record fields are added to source feature
    and the sourcetype, commonly chromosome for genomes, is used.
      
  * Gene model handling for ncRNA and pseudogenes is added.

  * GFF header is cleaner and more informative, and there is a new GFF_VERSION option.
    
  * GFF ##FASTA inclusion is improved, and the translation sequence is stored there.
     
  * FT -> GFF attribute mapping is improved.
  
  * --format choice of SeqIO input formats (GenBank default). 
    Uniprot/Swissprot and EMBL produce useful GFF.
    
  * SeqFeature::Tools::TypeMapper has a few FT -> SOFA additions, more flexible usage.
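
To make the two gene models in the first item concrete, here is a small Python sketch (not part of the script, and not its actual output) that writes each hierarchy as GFF3 lines linked by ID/Parent attributes. The sequence name, source tag, coordinates, and feature IDs are all made up for illustration. The alternate model is read literally from "gene > mRNA > polypeptide > exon", so each feature is taken as the Parent of the next; that nesting is an assumption, and the script may attach exons differently. UTR features are left out for brevity.

```python
def gff3_line(ftype, start, end, fid, parent=None, phase="."):
    """Build one tab-separated GFF3 line with ID and optional Parent."""
    attrs = f"ID={fid}"
    if parent:
        attrs += f";Parent={parent}"
    # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
    # "chr1" and "example" are placeholder values, not from the script.
    cols = ["chr1", "example", ftype, str(start), str(end), ".", "+", phase, attrs]
    return "\t".join(cols)


def gene_model(no_cds=False):
    """Return GFF3 lines for one hypothetical gene in either model."""
    lines = [
        gff3_line("gene", 100, 900, "gene1"),
        gff3_line("mRNA", 100, 900, "mrna1", parent="gene1"),
    ]
    if no_cds:
        # Alternate model, read literally: gene > mRNA > polypeptide > exon
        lines.append(gff3_line("polypeptide", 150, 850, "pep1", parent="mrna1"))
        lines.append(gff3_line("exon", 100, 900, "exon1", parent="pep1"))
    else:
        # Standard model: exon and CDS are siblings under the mRNA
        lines.append(gff3_line("exon", 100, 900, "exon1", parent="mrna1"))
        lines.append(gff3_line("CDS", 150, 850, "cds1", parent="mrna1", phase="0"))
    return lines


if __name__ == "__main__":
    print("\n".join(gene_model(no_cds=False)))
    print()
    print("\n".join(gene_model(no_cds=True)))
```

Calling gene_model(no_cds=True) mirrors what the --noCDS flag selects: the CDS feature is replaced by a polypeptide feature, which per the description above is where the protein IDs, translation, and GO terms would be attached.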

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
