[Bioperl-l] Input for Bio::CodonUsage::IO
Marc Logghe
Marc.Logghe at DEVGEN.com
Fri Apr 21 13:26:07 UTC 2006
Hi Brian
Thanks for the reply.
I might be overlooking something, but I downloaded this last week. The
tarball contained *.codon and *.spsum files that did not look like a
codon usage table at all (more like a kind of pseudo-FASTA). For that
reason, I used EMBOSS cutgextract to produce *.cut files from the CUTG
*.codon files.
I finally managed to parse these *.cut files.
To do that, I created a Bio::CodonUsage::IO::emboss module that
contains only the private _parse() method. The setup is copied from
Bio::SeqIO.
This means you can now do:
my $io = Bio::CodonUsage::IO->new( -file => shift, -format => 'emboss' );
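The emboss module itself is not included in this post. As a rough illustration only, here is a minimal standalone Perl sketch of parsing the *.cut layout quoted further down in this thread: `#Key: value` header lines, other `#` lines that are skipped, and whitespace-separated rows of codon, amino acid, fraction, frequency and count. The `parse_cut` helper is hypothetical; it is not Marc's code and not BioPerl API, and it returns plain hashes rather than a BioPerl codon table object.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): parse the text of an EMBOSS
# cutgextract *.cut file into a header hash and a per-codon hash.
sub parse_cut {
    my ($text) = @_;
    my (%header, %codons);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                    # skip blank lines
        if ($line =~ /^#(\w+):\s*(.+?)\s*$/) {       # e.g. "#Species: Oryza sativa"
            $header{$1} = $2;
        }
        elsif ($line =~ /^#/) {                      # GC summary lines, column header
            next;
        }
        elsif ($line =~ /^([ACGTU]{3})\s+(\S)\s+([\d.]+)\s+([\d.]+)\s+(\d+)\s*$/i) {
            # Row layout assumed from the sample: Codon AA Fraction Frequency Number
            $codons{ uc $1 } = {
                aa        => $2,                     # '*' marks a stop codon
                fraction  => $3,
                frequency => $4,                     # value as written in the file
                number    => $5,
            };
        }
    }
    return (\%header, \%codons);
}

package main;
```

To use it on a real file, slurp the file first (for example `my $text = do { local $/; <$fh> };` on an opened handle) and pass the string to `parse_cut`.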
If no -format option is given, it defaults to the
Bio::CodonUsage::IO::default module, which contains the _parse() method
from the original Bio::CodonUsage::IO module. This should really be
renamed to something more meaningful, but I did not know what this
default format looks like or where it comes from. My guess is that it
comes from http://www.kazusa.or.jp, but the site seems to be broken, at
least today.
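The format dispatch described above follows the Bio::SeqIO idea of mapping a -format name to a subclass, with a fallback when no format is given. A minimal sketch of that pattern, using hypothetical package names (My::CodonIO, My::CodonIO::default, My::CodonIO::emboss) instead of the real BioPerl classes, whose internals are not shown in this thread:

```perl
use strict;
use warnings;

# Hypothetical base class illustrating a Bio::SeqIO-style factory.
package My::CodonIO;

sub new {
    my ($class, %args) = @_;
    my $format = lc($args{-format} || 'default');    # fall back to 'default'
    die "Invalid format '$format'\n" unless $format =~ /^\w+$/;
    my $module = __PACKAGE__ . "::$format";
    # Load the format module from disk unless it is already defined.
    unless ($module->can('_parse')) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        eval { require $file; 1 } or die "Cannot load format '$format': $@";
    }
    # The object is blessed into the format-specific subclass.
    return bless { %args }, $module;
}

# Hypothetical format subclasses; each supplies its own private _parse().
package My::CodonIO::default;
our @ISA = ('My::CodonIO');
sub _parse { return 'default' }

package My::CodonIO::emboss;
our @ISA = ('My::CodonIO');
sub _parse { return 'emboss' }

package main;
```

In this sketch a real parser would live in each subclass's `_parse()`, so adding a format means adding one module, which matches the "module with only a _parse() method" approach described in the post.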
For now I will continue with this setup in house, but if you think it
is useful to commit, just let me know.
Cheers,
Marc
> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Friday, April 21, 2006 3:09 PM
> To: Marc Logghe; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Input for Bio::CodonUsage::IO
>
> Marc,
>
> It wants a file from the CUTG database. You can ftp them from
> this mirror:
>
> ftp://ftp.ebi.ac.uk/pub/databases/cutg
>
>
> Brian O.
>
>
> On 4/21/06 4:56 AM, "Marc Logghe" <Marc.Logghe at DEVGEN.com> wrote:
>
> > Hi,
> > I was wondering what format Bio::CodonUsage::IO expects as input for
> > the -file option.
> > I tried to pass it a *.cut file generated by EMBOSS' cutgextract that
> > looks like this:
> > #Species: Oryza sativa
> > #Division: gbpln
> > #Release: CUTG
> > #CdsCount: 70050
> >
> > #Coding GC 55.34%
> > #1st letter GC 58.41%
> > #2nd letter GC 46.34%
> > #3rd letter GC 61.29%
> >
> > #Codon AA Fraction Frequency Number
> > GCA A 0.185 17.382 431151
> > <skipped>
> > TGA * 0.435 1.228 30463
> >
> > Looking into the _parse() method of Bio::CodonUsage::IO, it appears
> > that the table resembles this kind of format but is not exactly what
> > it expects. My question is: what should it really look like? I could
> > not find an example in t/data.
> > Any clues?
> > Thanks,
> > Marc
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>