[Bioperl-l] Input for Bio::CodonUsage::IO

Fri Apr 21 13:26:07 UTC 2006

Hi Brian
Thanks for the reply.
I might be overlooking something, but I downloaded this last week. The
tarball contained *.codon and *.spsum files, which did not look at all
like a codon usage table (more a kind of pseudo-FASTA). For that reason
I used EMBOSS cutgextract, which produced *.cut files from the CUTG
*.codon files.

I finally managed to parse these *.cut files.
To do that I created a Bio::CodonUsage::IO::emboss module that
contains only the private _parse() method. The setup is copied from the
way Bio::SeqIO handles formats.
That means you can now do:
my $io = Bio::CodonUsage::IO->new( -file => shift, -format => 'emboss' );
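
For illustration only, here is a minimal sketch of the kind of line
parsing a *.cut file needs. It is not the code of the emboss module
described above; the sub name parse_cut and the returned hash layout are
assumptions made for this example, and the column order is taken from
the sample table quoted further down.

```perl
use strict;
use warnings;

# Hypothetical helper: read cutgextract-style text from a filehandle and
# return (\%meta, \%codons). Assumes five whitespace-separated columns:
# Codon, AA, Fraction, Frequency, Number.
sub parse_cut {
    my ($fh) = @_;
    my ( %meta, %codons );
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;                 # skip blank lines
        if ( $line =~ /^#(\w+):\s*(.*\S)/ ) {     # e.g. "#Species: Oryza sativa"
            $meta{$1} = $2;
            next;
        }
        next if $line =~ /^#/;                    # GC statistics, column header
        my ( $codon, $aa, $fraction, $frequency, $number ) = split ' ', $line;
        next unless defined $number;              # ignore incomplete rows
        $codons{ uc $codon } = {
            aa        => $aa,
            fraction  => $fraction,
            frequency => $frequency,
            number    => $number,
        };
    }
    return ( \%meta, \%codons );
}

# Usage on an in-memory copy of the sample lines.
my $sample = <<'END';
#Species: Oryza sativa
#Division: gbpln
#Release: CUTG
#CdsCount: 70050

#Coding GC 55.34%

#Codon AA Fraction Frequency Number
GCA    A     0.185    17.382 431151
TGA    *     0.435     1.228  30463
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my ( $meta, $codons ) = parse_cut($fh);
close $fh;

printf "%s: GCA codes for %s, seen %d times\n",
    $meta->{Species}, $codons->{GCA}{aa}, $codons->{GCA}{number};
```

A real _parse() would feed these values into whatever table object
Bio::CodonUsage::IO hands back, rather than returning plain hashes.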

If no format option is given, it defaults to the
Bio::CodonUsage::IO::default module, which contains the _parse() method
from the original Bio::CodonUsage::IO module. This should really be
renamed to something more meaningful, but I do not know what this
default format looks like or where it comes from. My guess is that it
comes from http://www.kazusa.or.jp, but the site seems to be broken, at
least today.
For now I will continue with this setup in house, but if you think it
is useful to commit, just let me know.
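
For readers unfamiliar with this kind of setup, the format selection can
be sketched roughly as follows. This is a simplified stand-alone
illustration, not the actual Bio::SeqIO or Bio::CodonUsage::IO code: the
My::CodonIO package names are hypothetical, and the real modules load a
per-format file instead of keeping everything in one script.

```perl
use strict;
use warnings;

# Hypothetical format modules; each one only has to provide _parse().
package My::CodonIO::default;
sub _parse { return 'parsed as default' }

package My::CodonIO::emboss;
sub _parse { return 'parsed as emboss' }

# Hypothetical front end that picks the format class from -format.
package My::CodonIO;

sub new {
    my ( $class, %args ) = @_;
    my $format = lc( $args{-format} || 'default' );   # fall back to 'default'
    die "Invalid format name '$format'\n" unless $format =~ /^\w+$/;
    my $module = "My::CodonIO::$format";
    # A module-per-file layout would 'require' the format module here.
    # In this single-file sketch we only check that _parse() exists.
    die "Unknown format '$format'\n" unless $module->can('_parse');
    return bless { file => $args{-file} }, $module;
}

package main;

my $io       = My::CodonIO->new( -file => 'rice.cut', -format => 'emboss' );
my $fallback = My::CodonIO->new( -file => 'rice.cut' );

print ref($io),       ': ', $io->_parse,       "\n";
print ref($fallback), ': ', $fallback->_parse, "\n";
```

The point of the design is that supporting another file format only
means adding one more module with its own _parse(); the calling code
stays the same apart from the -format value.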
Cheers,
Marc

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net] 
> Sent: Friday, April 21, 2006 3:09 PM
> To: Marc Logghe; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Input for Bio::CodonUsage::IO
> 
> Marc,
> 
> It wants a file from the database CUTG. You can ftp them from this mirror:
> 
> ftp://ftp.ebi.ac.uk/pub/databases/cutg
> 
> 
> Brian O.
> 
> 
> On 4/21/06 4:56 AM, "Marc Logghe" <Marc.Logghe at DEVGEN.com> wrote:
> 
> > Hi,
> > I was wondering what format Bio::CodonUsage::IO expects as input for
> > the -file option.
> > I tried to pass it a *.cut file generated by EMBOSS' cutgextract that
> > looks like this:
> > #Species: Oryza sativa
> > #Division: gbpln
> > #Release: CUTG
> > #CdsCount: 70050
> > 
> > #Coding GC 55.34%
> > #1st letter GC 58.41%
> > #2nd letter GC 46.34%
> > #3rd letter GC 61.29%
> > 
> > #Codon AA Fraction Frequency Number
> > GCA    A     0.185    17.382 431151
> > <skipped>
> > TGA    *     0.435     1.228  30463
> > 
> > Looking into the _parse() method of Bio::CodonUsage::IO, it appears
> > that the table resembles this kind of format but is not exactly what
> > it expects. My question is: what should it really look like? I could
> > not find an example in t/data.
> > Any clues?
> > Thanks,
> > Marc