[Bioperl-l] Thoughts on Bio::Tools::Glimmer

Andrew Stewart stewarta at nmrc.navy.mil
Wed Apr 11 18:40:18 UTC 2007


First of all, mucho kudos to those who revamped this module.  It
works really nicely.  I have a couple of thoughts:

* The .predict file from Glimmer provides frame and score information  
which could be parsed and included in the generated feature prediction

* It'd be nice to include the orfID somewhere on the feature  
prediction..  maybe the seqID ? (these could be post-processed into  
locus_tags for those using Glimmer as a preliminary annotation tool)

* Options to set the source and primary tags to something other than
the defaults (i.e. 'Glimmer_3.X' and 'transcript').  This could always
be done after Bio::Tools::Glimmer has run, though, of course.

* This section..

         elsif (
                # Glimmer 2.X prediction
                (/^\s+(\d+)\s+      # gene num
                 (\d+)\s+(\d+)\s+   # start, end
                 \[([\+\-])\d{1}\s+ # strand
                 /ox ) ||
                # Glimmer 3.X prediction
                (/\w+(\d+)\s+       # orf (numeric portion)
                 (\d+)\s+(\d+)\s+   # start, end
                 ([\+\-])\d{1}\s+   # strand
                /ox)) {
	    my ($genenum,$start,$end,$strand) =
		( $1,$2,$3,$4 );

...isn't picking up more than the last digit in the orf-number.  Not
sure if that's intentional.  A sample of the feature output using
->gff_string shows up as ...

test-pseudocontig       Glimmer_3.X     transcript      1018    8       .       -       .       Group GenePrediction_1
test-pseudocontig       Glimmer_3.X     transcript      1134    1736    .       +       .       Group GenePrediction_2
test-pseudocontig       Glimmer_3.X     transcript      1832    2596    .       +       .       Group GenePrediction_4
test-pseudocontig       Glimmer_3.X     transcript      2710    3225    .       +       .       Group GenePrediction_5
test-pseudocontig       Glimmer_3.X     transcript      3246    4016    .       +       .       Group GenePrediction_6
test-pseudocontig       Glimmer_3.X     transcript      4177    5064    .       +       .       Group GenePrediction_7
test-pseudocontig       Glimmer_3.X     transcript      5083    5673    .       +       .       Group GenePrediction_8
test-pseudocontig       Glimmer_3.X     transcript      6001    7275    .       +       .       Group GenePrediction_9
test-pseudocontig       Glimmer_3.X     transcript      7530    8081    .       +       .       Group GenePrediction_0
test-pseudocontig       Glimmer_3.X     transcript      8785    8117    .       -       .       Group GenePrediction_1
test-pseudocontig       Glimmer_3.X     transcript      9423    8788    .       -       .       Group GenePrediction_2
test-pseudocontig       Glimmer_3.X     transcript      10088   9549    .       -       .       Group GenePrediction_3

...which was parsed originally from...

orf00001     1018        8  -2     2.95
orf00002     1134     1736  +3     2.91
orf00004     1832     2596  +2     2.93
orf00005     2710     3225  +1     2.90
orf00006     3246     4016  +3     2.93
orf00007     4177     5064  +1     2.94
orf00008     5083     5673  +1     2.91
orf00009     6001     7275  +1     2.96
orf00010     7530     8081  +3     2.58
orf00011     8785     8117  -2     2.92
orf00012     9423     8788  -1     2.81
orf00013    10088     9549  -3     2.90
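
The single-digit gene numbers look like a side effect of the greedy
\w+ in the Glimmer 3.X pattern: \w also matches digits, so it swallows
the whole ID and backtracks only far enough to leave one digit for
(\d+).  Below is a minimal sketch of one possible alternative, which
also picks up the frame and score columns.  The pattern $proposed, the
sub parse_predict_line and the hash keys are my own illustrative names,
not anything in Bio::Tools::Glimmer, and the pattern assumes ORF IDs of
the form prefix-plus-digits as in the sample above.

```perl
use strict;
use warnings;

# Glimmer 3.X .predict lines, copied from the sample above.
my @lines = (
    'orf00001     1018        8  -2     2.95',
    'orf00010     7530     8081  +3     2.58',
    'orf00013    10088     9549  -3     2.90',
);

# Glimmer 3.X branch of the pattern as quoted from the module.
# Greedy \w+ matches the digits too, leaving one digit for (\d+).
my $current = qr/\w+(\d+)\s+(\d+)\s+(\d+)\s+([\+\-])\d{1}\s+/x;

# Hypothetical replacement (an assumption, not the module's code):
# a non-greedy prefix, the full numeric part, then frame and score.
my $proposed = qr/^\s*(\w+?)(\d+)\s+   # orf prefix, orf number
                  (\d+)\s+(\d+)\s+     # start, end
                  ([+-])([123])\s+     # strand, frame
                  (\S+)                # score
                 /x;

# Returns a hash ref describing one prediction, or nothing on no match.
sub parse_predict_line {
    my ($line) = @_;
    my ($prefix, $num, $start, $end, $strand, $frame, $score)
        = $line =~ $proposed
        or return;
    return {
        orf_id  => "$prefix$num",    # e.g. orf00010
        genenum => $num,             # full numeric portion, e.g. 00010
        start   => $start,
        end     => $end,
        strand  => $strand,
        frame   => "$strand$frame",  # signed frame as Glimmer reports it
        score   => $score,
    };
}

for my $line (@lines) {
    my ($old) = $line =~ $current;
    my $new   = parse_predict_line($line);
    printf "current: %s  proposed: %s (frame %s, score %s)\n",
        $old, $new->{genenum}, $new->{frame}, $new->{score};
}
```

With the full orf ID captured, it could be stored on the feature and
later turned into a locus_tag, as suggested above.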

* It'd also be nice if you could somehow set the string that is  
placed in front of the orf-number in the line...

                  '-tag'         => { 'Group' => "GenePrediction_$genenum"},

...seeing as how these tag/values can't seem to be changed manually  
anymore without getting into AnnotationCollection stuff, which is no  
longer a simple matter of changing a tag/value string.  (By the way,  
where can I find a list of AnnotationCollectionI compliant objects?)
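
To make the source tag, primary tag and Group prefix suggestions
concrete, here is a rough sketch of how caller-supplied options might
be merged with the current defaults before the feature is built.  The
option names (source_tag, primary_tag, group_prefix) and the sub
feature_args are hypothetical and do not exist in Bio::Tools::Glimmer;
the returned list just mimics the constructor-argument style of the
'-tag' line quoted above.

```perl
use strict;
use warnings;

# Defaults matching the values seen in the gff_string output above.
my %DEFAULTS = (
    source_tag   => 'Glimmer_3.X',
    primary_tag  => 'transcript',
    group_prefix => 'GenePrediction_',
);

# Hypothetical helper: builds a feature argument list, letting the
# caller override any of the defaults.
sub feature_args {
    my ($genenum, $start, $end, $strand, %opt) = @_;
    my %o = (%DEFAULTS, %opt);    # caller-supplied values win
    return (
        '-start'       => $start,
        '-end'         => $end,
        '-strand'      => $strand,
        '-source_tag'  => $o{source_tag},
        '-primary_tag' => $o{primary_tag},
        '-tag'         => { 'Group' => $o{group_prefix} . $genenum },
    );
}

# Example: override the primary tag and prefix, keep the default source.
my %args = feature_args('00013', 9549, 10088, '-',
                        primary_tag  => 'CDS',
                        group_prefix => 'NMRC_');
printf "%s %s %s\n",
    $args{'-source_tag'}, $args{'-primary_tag'}, $args{'-tag'}{'Group'};
```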


Any thoughts on the suggestions?  (I don't mind taking a stab at  
incorporating them into the code.. I've never submitted anything to  
BioPerl before)


-Andrew


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270




