[Bioperl-l] gcg.pm, another one

Derek Gatherer d.gatherer at vir.gla.ac.uk
Fri Oct 31 10:40:25 EST 2003


Hello again

The last bug has been submitted to Bugzilla.  The following may be another
bug, but I wonder whether the problem is with my GCG file instead.

Try this script:

#!/usr/bin/perl -w

use lib "/usr/local/lib/site_perl/5.8.0/";
use strict;
use Bio::Seq;
use Bio::SeqIO;

my $dnain = Bio::SeqIO->new( '-format' => 'GCG' , -file => "cds.gcg");

while ( my $seqobj = $dnain->next_seq() )
{
     # print each sequence ID as it is read
     print $seqobj->display_id, "\n";
}

Run on the file cds.gcg (shown below), it gives:

------------- EXCEPTION  -------------
MSG: Looks like start of another sequence. See documentation.
STACK Bio::SeqIO::gcg::next_seq /usr/local/lib/site_perl/5.8.0//Bio/SeqIO/gcg.pm:124
STACK toplevel gcgtest.pl:10

The problem, I think, is that the SeqIO stream does not recognise the
changeover from one sequence to the next.  Or do I need some record
separator between my sequences?  If this really is a bug as well, I'll
submit it too.
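In the meantime, one possible workaround is to cut the file into one chunk
per sequence before parsing.  The sketch below assumes that every record
begins with a "!!NA_SEQUENCE" (or "!!AA_SEQUENCE") line, as in cds.gcg.
The helper name split_gcg_records and the tiny stand-in data are made up
for illustration, and nothing here calls BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the text of a multi-record GCG file into one string per record.
# Assumption: each record starts with a "!!NA_SEQUENCE" or "!!AA_SEQUENCE"
# line.  split_gcg_records is a hypothetical helper, not part of BioPerl.
sub split_gcg_records {
    my ($text) = @_;
    # Zero-width lookahead: cut just before each header line, keeping it.
    my @records = split /^(?=!![AN]A_SEQUENCE)/m, $text;
    # Discard chunks that are empty or whitespace only.
    return grep { /\S/ } @records;
}

# Tiny stand-in for cds.gcg, just to show the split.
my $demo = join "\n",
    '!!NA_SEQUENCE 1.0',
    'first.cds  Length: 4  Type: N  Check: 1  ..',
    '        1  ATGG',
    '',
    '!!NA_SEQUENCE 1.0',
    'second.cds  Length: 4  Type: N  Check: 2  ..',
    '        1  CCTA',
    '';

my @records = split_gcg_records($demo);
print scalar(@records), " records\n";
```

Each chunk could then be opened as an in-memory filehandle
(open my $fh, '<', \$record) and passed to Bio::SeqIO->new with -fh and
-format => 'GCG'.  That last step is untested against gcg.pm.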

The offending line in gcg.pm (line 124) is:

     if( /\.\.$/ ) {
         $self->throw("Looks like start of another sequence. See documentation. ");
     }
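To illustrate why that test fires on cds.gcg: the pattern matches any line
ending in two dots, which is how the header line of each GCG record ends.
A minimal sketch, using lines copied from cds.gcg around the boundary
between the two sequences (this only exercises the regex; it does not call
BioPerl, and how the rest of gcg.pm reaches this test is not shown):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lines taken from cds.gcg: the end of the first sequence, then parts of
# the second record's header and its first sequence line.
my @lines = (
    '     1251  GGGGCTCTTT GCTCGAAGG',
    '!!NA_SEQUENCE 1.0',
    'SOURCE      Human herpesvirus 5 . . .',
    'merlin_ul45.cds  Length: 2718  October 27, 2003 15:32  Type: N  Check: 4998  ..',
    '        1  ATGAATCCGG CTGACGCGGA CGAGGAACAG CGGGTGTCCT CGGTGCCCGC',
);

# Same pattern as the test quoted from gcg.pm: two literal dots at end of line.
my @hits = grep { $lines[$_] =~ /\.\.$/ } 0 .. $#lines;

print "matched: $lines[$_]\n" for @hits;
```

Only the "merlin_ul45.cds ... Check: 4998  .." line matches.  The SOURCE
line does not, because its dots are separated by spaces.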

And here is the file cds.gcg, containing two sequences in GCG format:

!!NA_SEQUENCE 1.0
  ASSEMBLE    October 27, 2003 15:32

Symbols:     1 to: 1269  from: merlin.seq /rev   ck: 8363, 55186 to: 56454

LOCUS       MERLIN                235645 bp    DNA     linear   VRL 14-AUG-2003
DEFINITION  Human herpesvirus 5 strain Merlin, complete genome.
ACCESSION   MERLIN
VERSION
KEYWORDS    .
SOURCE      Human herpesvirus 5 . . .

merlin_ul43.cds  Length: 1269  October 27, 2003 15:32  Type: N  Check: 9039  ..

        1  ATGGAGAAAA CGCCGGCGGA GACGACGGCG GTTTCAGCTG GCAACGTGCC

       51  ACGTGACTCA ATTCCGTGTA TAACTAACGT GTCCGCGGAC ACCCGCGGCC

      101  GTACCCGCCC CAGCAGACCA GCCACCGTCC CTCAGCGACG TCCCGCGCGG

      151  ATCGGACACT TTAGGCGGCG CAGCGCCAGC CTTAGCTTTC TTGACTGGCC

      201  GGACGACAGC GTCACAGAGG GCGTTCGGAC GACCTCCGCG TCGGTCGCCG

      251  CCTCCGCGGC CCGTTTCGAC GAAATCCGGC GGCGCCGCCA GAGCATCAAC

      301  GACGAGATGA AGGAACGCAC GCTGGAGGAC GCGCTGGCTG TCGAGCTGGT

      351  CAACGAGACC TTCCGCTGCT CTGTCACCTC CGACGCCCGC AAGGACTTGC

      401  AGAAGCTGGT TCGTCGCGTC AGCGGCACGG TGCTGCGTCT CAGCTGGCCA

      451  AACGGTTGGT TCTTCACCTA CTGCGACCTG TTACGCGTCG GCTACTTTGG

      501  ACATCTCAAT ATTAAAGGTT TGGAGAAGAC CTTCCTGTGC TGCGACAAGT

      551  TCTTGCTGCC GGTGGGCACT GTGAGTCGTT GCGAAGCCAT CGGCCGCCCA

      601  CCGCTACCCG TACTCATCGG CGAGGGCGGT CGCGTCTACG TCTACTCGCC

      651  TGTGGTGGAA TCGCTGTACC TGGTGTCGCG GTCCGGTTTC CGCGGCTTCG

      701  TGCAGGAGGG CCTGCGCAAC TACGCGCCGC TGCGCGAAGA ACTGGGCTAT

      751  GTCCGCTTCG AGACCGGCGG CGACGTGGGT CGCGAGTTCA TGTTGGCGCG

      801  CGACCTGCTG GCCCTGTGGC GCCTGTGCAT GAAGCGCGAG GGTTCTATCT

      851  TCAGCTGGCG AGACGGTAAC GAGGCGCTGA CGACGGTCGT CTTGAACGGG

      901  AGCCAGACTT ACGAGGATCC GGCCCACGGC AACTGGTTAA AAGAGACGTG

      951  CTCGCTGAAC GTGCTGCAGG TATTTGTGGT GCGGGCCGTG CCGGTGGAGT

     1001  CGCAGCAGCG CCTGGACATC TCCATACTGG TGAACGAGAG CGGCGCCGTC

     1051  TTCGGCGTGC ATCCCGATAC GCGGCAGGCG CACTTTCTGG CGCGCGGACT

     1101  CCTGGGCTTC TTTCGCGTCG GGTTCTTGCG GTTCTGCAAC AACTACTGCT

     1151  TCGCCCGCGA CTGTTTTACC CACCCTGAAA GCGTGGCACC CGCTTACCGC

     1201  GCCACCGGCT GTCCCAGAGA ACTGTTTTGT CGTCGTTTGC GCAAAAAGAA

     1251  GGGGCTCTTT GCTCGAAGG

!!NA_SEQUENCE 1.0
  ASSEMBLE    October 27, 2003 15:32

Symbols:     1 to: 2718  from: merlin.seq /rev   ck: 8363, 57946 to: 60663

LOCUS       MERLIN                235645 bp    DNA     linear   VRL 14-AUG-2003
DEFINITION  Human herpesvirus 5 strain Merlin, complete genome.
ACCESSION   MERLIN
VERSION
KEYWORDS    .
SOURCE      Human herpesvirus 5 . . .

merlin_ul45.cds  Length: 2718  October 27, 2003 15:32  Type: N  Check: 4998  ..

        1  ATGAATCCGG CTGACGCGGA CGAGGAACAG CGGGTGTCCT CGGTGCCCGC

       51  ACATCGGTGC CGGCCAGGTA GGATTCCAAG CCGCAGCGCG GAAACCGAGA

      101  CGGAGGAATC GTCGGCAGAG GTCGCCGCTG ATACTATCGG GGGAGATGAC

      151  AGCGAGCTCG AGGAGGGGCC GCTGCCCGGG GGTGACAAGG AAGCGTCCGC

      201  TGGAAATACC AACGTATCGA GCGGTGTAGC ATGTGTAGCG GGTTTTACGA

      251  GTGGTGGCGG CGTCGTCAGT TGGCGTCCCG AGTCGCCGTC TCCCGACGGC

      301  ACGCCGTCTG TGCTGTCGTT GACGCGTGAC AGCGGTCCCG CCGTGCCCAG

      351  TCGCGGTGGA CGCGTGAGTA GCGGTCTGAG CACCTTTAAT CCGGCCGGCG

      401  CGACCAGGAT GGAGCTGGAC AGTGTCGAGG AGGAGGACGA TTTCGGGGCT

      451  TCGCTCTGCA AAGTATCGCC GCCGATACAA GCTATGCGCA TGTTGATGGG

      501  CAAAAAGTGT CATTGTCACG GCTACTGGGG CAAGTTTCGC TTTTGCGGCG

      551  TACAGGAGCC GGCGCGGGAG CTGCCGTCCG ACAGGAACGC GCTGTGGCGC

      601  GAGATGGACA CCGTGTCGCG GCACAGTGCC GGTTTGGGCA GTTTCAGGCT

      651  ATTTCAGCTC ATTATGCGCC ACGGTCCCTG TCTGATTCGT CACTCGCCGC

      701  GTTGCGACCT GCTGTTGGGT CGCTTTTATT TCAAAGCCAA CTGGGCGCGT

      751  GAAAGCCGCA CGCCACTGTG TTACGCTTCG GAGCTGTGCG ATGAGTCGGT

      801  GCGCCGTTTT GTGCTGCGTC ACATGGAGGA TCTACCCAAG CTGGCCGAGG

      851  AGACGGCGCG TTTTGTGGAA TTGGCCGGTT GCTGGGGCTT GTACGCGGCC

      901  ATTTTGTGTT TGGATAAGGT GTGTCGCCAA CTGCACGGAC AGGACGAGAG

      951  CCCGGGCGGC GTGTTTTTGC GCATCGCCGT GGCGTTGACG GCCGCTATCG

     1001  AGAACAGTAG GCACTCGCGC ATCTATCGTT TCCATCTGGA TGCGCGTTTC

     1051  GAGGGCGAGG TGTTGGAATC GGTGTTGAAG CGCTGTCGCG ATGGGCAGCT

     1101  GTCGCTGTCC ACCTTCACCA TGTCTACCGT GGGTTTCGAT CGCGTGCCGC

     1151  AGTACGACTT TCTGATCTCG GCCGACCCTT TCTCGCGTGA CGCCAGTTGG

     1201  GCGGCCATGT GCAAGTGGAT GAGTACCTTG AGTTGCGGCG TTTCTGTGTC

     1251  GGTGAACGTA ACGCGACTTA ACGCCGATGT GAACAGCGTG ATTCGTTGCC

     1301  TGGGGGGATA CTGCGATTTG ATACGCGAGA AGGAGGTGCA TCGACCCGTG

     1351  GTACGTGTGT TTGTGGACAT GTGGGACGTG GCCGCTATCC GCGTGATTAA

     1401  CTTTATTCTC AAAGAAAGCA CGTCGGAGTT GACGGGGGTT TGCTACGCTT

     1451  TCAACGTGCC TAGCGTGTTA ATGAAGCGCT ACCGTGCGCG TGAGCAGCGC

     1501  TACTCGCTGT TTGGGCGGCC TGTCTCCCGG CGGCTCTCGG ACCTGGGTCA

     1551  GGAGTCGGCT TTCGAGAAAG AGTATTCGCG CTGCGAGCAA TCGTGCCCCA

     1601  AGGTGGTCGT GAACACGGAC GATTTTCTGA AAAAGATGTT GCTGTGCGCG

     1651  CTCAAGGGCC GTGCCTCGGT GGTCTTTGTC CATCACGTAG TCAAGTACTC

     1701  GATTATGGCC GACAGCGTGT GCCTGCCGCC GTGCTTGAGT CCCGATATGG

     1751  CGTCGTGCCA CTTTGGCGAG TGTGACATGC CGGTGCAGCG GCTGACGGTG

     1801  AACGTGGCTC GCTGCGTGTT TGCGCGTAGC GACGAGCAGA AGCTGCATCT

     1851  ACCCGACGTG GTTTTGGGGA ACACGCGACG TTACTTTGAT TTGAGCGTGC

     1901  TGCGCGAGTT GGTGACCGAG GCGGTGGTTT GGGGCAACGC GCGCTTGGAC

     1951  GCGCTAATGT CGGCGTCCGA ATGGTGGGTA GAGAGCGCGC TGGAAAAACT

     2001  GCGTCCGCTG CACATCGGCG TGGCTGGCTT GCACACGGCG CTCATGCGGT

     2051  TAGGGTTCAC GTACTTTGCC TCTTGGGACT TGATCGAGCG CATCTTTGAG

     2101  CACATGTACT TTGCCGCGGT GCGCGCTAGC GTCGATTTGT GCAAGTCGGG

     2151  TTTGCCGCGC TGCGAGTGGT TCGAACGCAC CATCTATCAA GAGGGCAAAT

     2201  TCATTTTCGA ATTGTATCGG TTGCCGCGGC TCTCCATCGC CAGCGCGCGC

     2251  TGGGAAGCGC TGCGCGCCGA CATGCTCGAG TTCGGATTGC GCAACTGTCA

     2301  GTTTCTGGCG GTGGGTCCCG ACGACGAGGT GGCGCATCTG TGGGGCGTGA

     2351  CGCCGTCAGT GTGGGCTTCG CGCGGCACCG TGTTCGAGGA GGAGACGGTG

     2401  TGGTCATTGT GCCCGCCCAA CCGTGAGTGT TACTTCCCCA CCGTGGTGCG

     2451  GAGGCCGCTG CGCGTGCCCG TGGTGAATTA CGCGTGGTTG GAGCAGCACC

     2501  AGGAGGAGGG CAAGGCGACG CAGTGTCTGT TCCAGGCGGC ACCGGCGATC

     2551  CAAAACGACG TGGAAATGGC GGCCGTGAAC CTGAGCGTGT TTGTGGACCA

     2601  GTGCGTGGCC CTGGTTTTCT ACTATGACTC GGGGATGACG CCCGACGTGC

     2651  TTCTGGCCAG GATGCTCAAG TGGTACCACT GGCGCTTTAA GGTCGGAGTA

     2701  TATAAGTACT GTGCCTCT


