[Bioperl-l] Input for Bio::CodonUsage::IO

Fri Apr 21 18:00:11 EDT 2006

Marc,

I spoke too soon. Looking at IO.pm's _parse method shows clearly that it's
meant to parse the same format it writes, something like what's shown below.
As you noted, this doesn't look anything like the *.codon files, which have a
fasta-style header followed by 64 numbers; clearly these are the counts of
codons in the sequence that's referenced.
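
To make that layout concrete, here is a rough Python sketch (not BioPerl code) of a reader for records shaped like the description above: a fasta-style header followed by 64 numbers. It assumes the header starts with ">" and that the counts are whitespace-separated integers, possibly spread over several lines. The function names are hypothetical, and the sketch does not map counts to codons, since the codon order is not given here.

```python
from typing import Iterable, Iterator, List, Tuple


def _finish(header: str, counts: List[int]) -> Tuple[str, List[int]]:
    """Check that a record carries exactly 64 counts before handing it back."""
    if len(counts) != 64:
        raise ValueError(
            f"expected 64 counts for {header!r}, got {len(counts)}"
        )
    return header, counts


def read_codon_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[int]]]:
    """Yield (header, counts) pairs from fasta-style header + 64 numbers.

    Assumption: header lines begin with '>' and every other non-blank line
    holds whitespace-separated integers belonging to the preceding header.
    """
    header = None
    counts: List[int] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _finish(header, counts)
            header, counts = line[1:], []
        elif header is not None:
            counts.extend(int(token) for token in line.split())
    if header is not None:
        yield _finish(header, counts)


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from a real CUTG file.
    sample = [">hypothetical header", " ".join(str(n) for n in range(64))]
    for name, numbers in read_codon_records(sample):
        print(name, len(numbers))
```
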

In addition, IO.pm wouldn't have worked anyway: it was missing a basic "use"
statement, which is fixed now. Like you, I can't connect to Codon at Kazusa;
when I can, I'll see if it can provide us with a file formatted something like
the text below.

Brian O.

WDW311031#4\AJ311031\complement(1717..2511)\795\CAC84661.1\Wheat dwarf virus
- [unknown]  1  CDS's

AmAcid   Codon             Number       /1000    Fraction

Gly      GGG                 0.00        0.00        0.00
Gly      GGA                 0.00        0.00        0.00
Gly      GGT                 0.00        0.00        0.00
Gly      GGC                 0.00        0.00        0.00

Glu      GAG                 0.00        0.00        0.00
Glu      GAA                 0.00        0.00        0.00
Asp      GAT                 0.00        0.00        0.00
Asp      GAC                 0.00        0.00        0.00

Val      GTG                 0.00        0.00        0.00
Val      GTA                 0.00        0.00        0.00
Val      GTT                 0.00        0.00        0.00
Val      GTC                 0.00        0.00        0.00

Ala      GCG                 0.00        0.00        0.00
Ala      GCA                 0.00        0.00        0.00
Ala      GCT                 0.00        0.00        0.00
Ala      GCC                 0.00        0.00        0.00

Arg      AGG                 0.00        0.00        0.00
Arg      AGA                 0.00        0.00        0.00
Ser      AGT                 0.00        0.00        0.00
Ser      AGC                 0.00        0.00        0.00

Lys      AAG                 0.00        0.00        0.00
Lys      AAA                 0.00        0.00        0.00
Asn      AAT                 0.00        0.00        0.00
Asn      AAC                 0.00        0.00        0.00

Met      ATG                 0.00        0.00        0.00
Ile      ATA                 0.00        0.00        0.00
Ile      ATT                 0.00        0.00        0.00
Ile      ATC                 0.00        0.00        0.00

Thr      ACG                 0.00        0.00        0.00
Thr      ACA                 0.00        0.00        0.00
Thr      ACT                 0.00        0.00        0.00
Thr      ACC                 0.00        0.00        0.00

Trp      TGG                 0.00        0.00        0.00
Ter      TGA                 0.00        0.00        0.00
Cys      TGT                 0.00        0.00        0.00
Cys      TGC                 0.00        0.00        0.00

Ter      TAG                 0.00        0.00        0.00
Ter      TAA                 0.00        0.00        0.00
Tyr      TAT                 0.00        0.00        0.00
Tyr      TAC                 0.00        0.00        0.00

Leu      TTG                 0.00        0.00        0.00
Leu      TTA                 0.00        0.00        0.00
Phe      TTT                 0.00        0.00        0.00
Phe      TTC                 0.00        0.00        0.00

Ser      TCG                 0.00        0.00        0.00
Ser      TCA                 0.00        0.00        0.00
Ser      TCT                 0.00        0.00        0.00
Ser      TCC                 0.00        0.00        0.00

Arg      CGG                 0.00        0.00        0.00
Arg      CGA                 0.00        0.00        0.00
Arg      CGT                 0.00        0.00        0.00
Arg      CGC                 0.00        0.00        0.00

Gln      CAG                 0.00        0.00        0.00
Gln      CAA                 0.00        0.00        0.00
His      CAT                 0.00        0.00        0.00
His      CAC                 0.00        0.00        0.00

Leu      CTG                 0.00        0.00        0.00
Leu      CTA                 0.00        0.00        0.00
Leu      CTT                 0.00        0.00        0.00
Leu      CTC                 0.00        0.00        0.00

Pro      CCG                 0.00        0.00        0.00
Pro      CCA                 0.00        0.00        0.00
Pro      CCT                 0.00        0.00        0.00
Pro      CCC                 0.00        0.00        0.00

Coding GC 0%
1st letter GC 0%
2nd letter GC 0%
3rd letter GC 0%
Genetic code 1
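
For anyone who wants to experiment with the text above outside BioPerl, here is a minimal Python sketch of a parser for it. It is not the logic in IO.pm's _parse method. It assumes only what is visible in the sample: rows of amino acid, codon and three numbers, then the "GC" percentage lines and a "Genetic code" line. The function and key names are hypothetical, and any line that fits none of these patterns (such as the header) is ignored.

```python
import re
from typing import Any, Dict

NUM = r"(\d+(?:\.\d+)?)"
# e.g. "Gly      GGG                 0.00        0.00        0.00"
ROW = re.compile(rf"^([A-Za-z]{{3}})\s+([ACGTU]{{3}})\s+{NUM}\s+{NUM}\s+{NUM}$")
# e.g. "1st letter GC 0%"
GC = re.compile(rf"^(Coding|1st letter|2nd letter|3rd letter) GC\s+{NUM}%$")
# e.g. "Genetic code 1"
CODE = re.compile(r"^Genetic code\s+(\d+)$")


def parse_usage_table(text: str) -> Dict[str, Any]:
    """Collect codon rows, GC percentages and the genetic code id."""
    codons: Dict[str, Dict[str, Any]] = {}
    gc: Dict[str, float] = {}
    genetic_code = None
    for raw in text.splitlines():
        line = raw.strip()
        row = ROW.match(line)
        if row:
            aa, codon, number, per_thousand, fraction = row.groups()
            codons[codon] = {
                "aa": aa,
                "number": float(number),
                "per_1000": float(per_thousand),
                "fraction": float(fraction),
            }
            continue
        gc_line = GC.match(line)
        if gc_line:
            gc[gc_line.group(1)] = float(gc_line.group(2))
            continue
        code_line = CODE.match(line)
        if code_line:
            genetic_code = int(code_line.group(1))
    return {"codons": codons, "gc": gc, "genetic_code": genetic_code}
```

The result is keyed by codon, so a lookup such as `parse_usage_table(text)["codons"]["ATG"]["aa"]` returns the three-letter amino acid name from the first column.
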

On 4/21/06 9:26 AM, "Marc Logghe" <Marc.Logghe at DEVGEN.com> wrote:

> Hi Brian
> Thanks for the reply.
> I might be overlooking something but I downloaded this last week. The
> tarball contained *.codon and *.spsum files, which did not look at all
> like a codon usage table (more a kind of pseudo-fasta). For that reason, I
> used EMBOSS cutgextract, which produced *.cut files starting from the CUTG
> *.codon files.
> 
> I finally managed to parse these *.cut files.
> In order to do that I created a Bio::CodonUsage::IO::emboss module that
> only contains the private _parse() method. The setup I used is modeled on
> Bio::SeqIO.
> This means you can now do:
> my $io = Bio::CodonUsage::IO->new( -file => shift, -format => 'emboss' );
> 
> If no format option is given it defaults to the
> Bio::CodonUsage::IO::default module, which contains the _parse() method
> from the original Bio::CodonUsage::IO module. Actually, this should be
> changed to a name that makes more sense, but I did not know what this
> default format looks like and/or where it comes from. My guess is that it
> comes from http://www.kazusa.or.jp but the site seems to be broken, at
> least today.
> For now I will continue with this setup in house, but if you think it
> is useful to commit, just let me know.
> Cheers,
> Marc
> 
>  
> 
>> -----Original Message-----
>> From: Brian Osborne [mailto:osborne1 at optonline.net]
>> Sent: Friday, April 21, 2006 3:09 PM
>> To: Marc Logghe; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Input for Bio::CodonUsage::IO
>> 
>> Marc,
>> 
>> It wants a file from the database CUTG. You can ftp them from
>> this mirror:
>> 
>> ftp://ftp.ebi.ac.uk/pub/databases/cutg
>> 
>> 
>> Brian O.
>> 
>> 
>> On 4/21/06 4:56 AM, "Marc Logghe" <Marc.Logghe at DEVGEN.com> wrote:
>> 
>>> Hi,
>>> I was wondering what format Bio::CodonUsage::IO expects as input for
>>> the -file option.
>>> I tried to pass it a *.cut file generated by EMBOSS' cutgextract that
>>> looks like this:
>>> #Species: Oryza sativa
>>> #Division: gbpln
>>> #Release: CUTG
>>> #CdsCount: 70050
>>> 
>>> #Coding GC 55.34%
>>> #1st letter GC 58.41%
>>> #2nd letter GC 46.34%
>>> #3rd letter GC 61.29%
>>> 
>>> #Codon AA Fraction Frequency Number
>>> GCA    A     0.185    17.382 431151
>>> <skipped>
>>> TGA    *     0.435     1.228  30463
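
For reference, the *.cut layout quoted above can be read with a few lines of Python. This is only a sketch under assumptions drawn from that sample: "#Key: value" metadata lines, "#... GC nn%" lines, a "#Codon ..." column header, and five whitespace-separated fields per data row. It is neither the Bio::CodonUsage::IO::emboss module mentioned in this thread nor EMBOSS code, and the function and key names are hypothetical.

```python
from typing import Any, Dict


def parse_cut(text: str) -> Dict[str, Any]:
    """Parse text laid out like the *.cut sample into three dictionaries."""
    meta: Dict[str, str] = {}
    gc: Dict[str, float] = {}
    codons: Dict[str, Dict[str, Any]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line[1:].strip()
            if ":" in body:
                # e.g. "#Species: Oryza sativa"
                key, value = body.split(":", 1)
                meta[key.strip()] = value.strip()
            elif " GC " in body and body.endswith("%"):
                # e.g. "#1st letter GC 58.41%"
                label, value = body.rsplit(" GC ", 1)
                gc[label] = float(value.rstrip("%"))
            # Any other comment line, such as the column header, is skipped.
            continue
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"unexpected data row: {line!r}")
        codon, aa, fraction, frequency, number = fields
        codons[codon] = {
            "aa": aa,
            "fraction": float(fraction),
            "frequency": float(frequency),
            "number": int(number),
        }
    return {"meta": meta, "gc": gc, "codons": codons}
```

Rows that do not have exactly five fields raise an error rather than being silently dropped, so an elision marker like the "<skipped>" line in the sample has to be removed before parsing.
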
>>> 
>>> Looking into the _parse() method of Bio::CodonUsage::IO, it appears
>>> that the table resembles this kind of format but is not exactly
>>> what it expects. My question is: what should it really look
>>> like? I could not find an example in t/data.
>>> Any clues?
>>> Thanks,
>>> Marc
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> 
> 