[Bioperl-l] gcg.pm, another one

Hilmar Lapp hlapp at gnf.org
Fri Oct 31 14:10:08 EST 2003


The SeqIO gcg parser was written for single-sequence gcg files.

In fact, I don't think the parser currently has a maintainer, so anyone
willing to take that on is welcome. GCG has been a commercial package for
many years, so you won't easily find people among the core developers who
actively use it, which explains why there is no maintainer.
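Until the parser handles multi-record files, one possible workaround is to split such a file into single-record chunks before parsing. The sketch below is a hypothetical helper, not part of BioPerl. It assumes the layout shown in the message below: each record's header line ends in `..` (the same test gcg.pm applies), and sequence lines are a position number followed by residues.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split the text of a
# multi-record GCG file into one string per record.
#
# Heuristic, assuming the usual GCG layout:
#   * a header line ending in '..' closes a record's description block
#     and starts its sequence block;
#   * sequence lines look like "   51  ACGTGACTCA ATTCCGTGTA ..."
#     (a position number followed by residues), or are blank;
#   * the first other line after a sequence block starts the next record.
sub split_gcg_records {
    my ($text) = @_;
    my @records;
    my $current = '';
    my $in_seq  = 0;    # true once the current record's '..' line was seen

    for my $line (split /^/, $text) {
        my $is_seq_line = $line =~ /^\s*\d+\s+[A-Za-z.*~\s-]+$/;
        if ($in_seq && $line =~ /\S/ && !$is_seq_line) {
            # Non-blank, non-sequence line after sequence data:
            # close the current record and start a new one.
            push @records, $current;
            $current = '';
            $in_seq  = 0;
        }
        $current .= $line;
        $in_seq = 1 if $line =~ /\.\.\s*$/;
    }
    push @records, $current if $current =~ /\S/;
    return @records;
}

# Toy two-record input. Real GCG headers also carry "Check: nnnn";
# it is omitted here because the splitter does not look at it.
my $sample = <<'GCG';
!!NA_SEQUENCE 1.0
first.cds  Length: 20  Type: N  ..

       1  ATGGAGAAAA CGCCGGCGGA

!!NA_SEQUENCE 1.0
second.cds  Length: 10  Type: N  ..

       1  ATGAATCCGG
GCG

my @records = split_gcg_records($sample);
printf "%d records\n", scalar @records;    # prints "2 records"
```

Each chunk could then be opened as an in-memory filehandle (`open my $fh, '<', \$chunk or die $!;`) and passed to `Bio::SeqIO->new(-fh => $fh, -format => 'gcg')`, so the parser only ever sees one `..` header per stream. The split heuristic is an assumption and may need adjusting for files whose description lines begin with a number.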

    -hilmar

On 10/31/03 7:40 AM, "Derek Gatherer" <d.gatherer at vir.gla.ac.uk> wrote:

> Hello again
> 
> Last bug submitted to bugzilla.  The following may be a bug, but I wonder
> if there is a problem with my GCG format.
> 
> Try this script:
> 
> #!/usr/bin/perl -w
> 
> use lib "/usr/local/lib/site_perl/5.8.0/";
> use strict;
> use Bio::Seq;
> use Bio::SeqIO;
> 
> my $dnain = Bio::SeqIO->new( '-format' => 'GCG' , -file => "cds.gcg");
> 
> while((my $seqobj = $dnain->next_seq()))
> {
>    print $seqobj->display_id, "\n";
> }
> 
> on the file cds.gcg below.
> 
> to get:
> 
> ------------- EXCEPTION  -------------
> MSG: Looks like start of another sequence. See documentation.
> STACK Bio::SeqIO::gcg::next_seq /usr/local/lib/site_perl/5.8.0//Bio/SeqIO/gcg.pm:124
> STACK toplevel gcgtest.pl:10
> 
> The problem, I think, is that the SeqIO stream doesn't seem to recognise
> the changeover from one sequence to the next. Or do I need some record
> separator between my sequences? If this really is also a bug, I'll
> submit it too.
> 
> Offending line in gcg.pm is:
> 
> 124      if( /\.\.$/ ) {
>        $self->throw("Looks like start of another sequence. See documentation. ");
>       }
> 
> And here's the file cds.gcg, containing two sequences in GCG format:
> 
> !!NA_SEQUENCE 1.0
> ASSEMBLE    October 27, 2003 15:32
> 
> Symbols:     1 to: 1269  from: merlin.seq /rev   ck: 8363, 55186 to: 56454
> 
> LOCUS       MERLIN                235645 bp    DNA     linear   VRL 14-AUG-2003
> DEFINITION  Human herpesvirus 5 strain Merlin, complete genome.
> ACCESSION   MERLIN
> VERSION
> KEYWORDS    .
> SOURCE      Human herpesvirus 5 . . .
> 
> merlin_ul43.cds  Length: 1269  October 27, 2003 15:32  Type: N  Check: 9039  ..
> 
>       1  ATGGAGAAAA CGCCGGCGGA GACGACGGCG GTTTCAGCTG GCAACGTGCC
> 
>      51  ACGTGACTCA ATTCCGTGTA TAACTAACGT GTCCGCGGAC ACCCGCGGCC
> 
>     101  GTACCCGCCC CAGCAGACCA GCCACCGTCC CTCAGCGACG TCCCGCGCGG
> 
>     151  ATCGGACACT TTAGGCGGCG CAGCGCCAGC CTTAGCTTTC TTGACTGGCC
> 
>     201  GGACGACAGC GTCACAGAGG GCGTTCGGAC GACCTCCGCG TCGGTCGCCG
> 
>     251  CCTCCGCGGC CCGTTTCGAC GAAATCCGGC GGCGCCGCCA GAGCATCAAC
> 
>     301  GACGAGATGA AGGAACGCAC GCTGGAGGAC GCGCTGGCTG TCGAGCTGGT
> 
>     351  CAACGAGACC TTCCGCTGCT CTGTCACCTC CGACGCCCGC AAGGACTTGC
> 
>     401  AGAAGCTGGT TCGTCGCGTC AGCGGCACGG TGCTGCGTCT CAGCTGGCCA
> 
>     451  AACGGTTGGT TCTTCACCTA CTGCGACCTG TTACGCGTCG GCTACTTTGG
> 
>     501  ACATCTCAAT ATTAAAGGTT TGGAGAAGAC CTTCCTGTGC TGCGACAAGT
> 
>     551  TCTTGCTGCC GGTGGGCACT GTGAGTCGTT GCGAAGCCAT CGGCCGCCCA
> 
>     601  CCGCTACCCG TACTCATCGG CGAGGGCGGT CGCGTCTACG TCTACTCGCC
> 
>     651  TGTGGTGGAA TCGCTGTACC TGGTGTCGCG GTCCGGTTTC CGCGGCTTCG
> 
>     701  TGCAGGAGGG CCTGCGCAAC TACGCGCCGC TGCGCGAAGA ACTGGGCTAT
> 
>     751  GTCCGCTTCG AGACCGGCGG CGACGTGGGT CGCGAGTTCA TGTTGGCGCG
> 
>     801  CGACCTGCTG GCCCTGTGGC GCCTGTGCAT GAAGCGCGAG GGTTCTATCT
> 
>     851  TCAGCTGGCG AGACGGTAAC GAGGCGCTGA CGACGGTCGT CTTGAACGGG
> 
>     901  AGCCAGACTT ACGAGGATCC GGCCCACGGC AACTGGTTAA AAGAGACGTG
> 
>     951  CTCGCTGAAC GTGCTGCAGG TATTTGTGGT GCGGGCCGTG CCGGTGGAGT
> 
>    1001  CGCAGCAGCG CCTGGACATC TCCATACTGG TGAACGAGAG CGGCGCCGTC
> 
>    1051  TTCGGCGTGC ATCCCGATAC GCGGCAGGCG CACTTTCTGG CGCGCGGACT
> 
>    1101  CCTGGGCTTC TTTCGCGTCG GGTTCTTGCG GTTCTGCAAC AACTACTGCT
> 
>    1151  TCGCCCGCGA CTGTTTTACC CACCCTGAAA GCGTGGCACC CGCTTACCGC
> 
>    1201  GCCACCGGCT GTCCCAGAGA ACTGTTTTGT CGTCGTTTGC GCAAAAAGAA
> 
>    1251  GGGGCTCTTT GCTCGAAGG
> 
> !!NA_SEQUENCE 1.0
> ASSEMBLE    October 27, 2003 15:32
> 
> Symbols:     1 to: 2718  from: merlin.seq /rev   ck: 8363, 57946 to: 60663
> 
> LOCUS       MERLIN                235645 bp    DNA     linear   VRL 14-AUG-2003
> DEFINITION  Human herpesvirus 5 strain Merlin, complete genome.
> ACCESSION   MERLIN
> VERSION
> KEYWORDS    .
> SOURCE      Human herpesvirus 5 . . .
> 
> merlin_ul45.cds  Length: 2718  October 27, 2003 15:32  Type: N  Check: 4998  ..
> 
>       1  ATGAATCCGG CTGACGCGGA CGAGGAACAG CGGGTGTCCT CGGTGCCCGC
> 
>      51  ACATCGGTGC CGGCCAGGTA GGATTCCAAG CCGCAGCGCG GAAACCGAGA
> 
>     101  CGGAGGAATC GTCGGCAGAG GTCGCCGCTG ATACTATCGG GGGAGATGAC
> 
>     151  AGCGAGCTCG AGGAGGGGCC GCTGCCCGGG GGTGACAAGG AAGCGTCCGC
> 
>     201  TGGAAATACC AACGTATCGA GCGGTGTAGC ATGTGTAGCG GGTTTTACGA
> 
>     251  GTGGTGGCGG CGTCGTCAGT TGGCGTCCCG AGTCGCCGTC TCCCGACGGC
> 
>     301  ACGCCGTCTG TGCTGTCGTT GACGCGTGAC AGCGGTCCCG CCGTGCCCAG
> 
>     351  TCGCGGTGGA CGCGTGAGTA GCGGTCTGAG CACCTTTAAT CCGGCCGGCG
> 
>     401  CGACCAGGAT GGAGCTGGAC AGTGTCGAGG AGGAGGACGA TTTCGGGGCT
> 
>     451  TCGCTCTGCA AAGTATCGCC GCCGATACAA GCTATGCGCA TGTTGATGGG
> 
>     501  CAAAAAGTGT CATTGTCACG GCTACTGGGG CAAGTTTCGC TTTTGCGGCG
> 
>     551  TACAGGAGCC GGCGCGGGAG CTGCCGTCCG ACAGGAACGC GCTGTGGCGC
> 
>     601  GAGATGGACA CCGTGTCGCG GCACAGTGCC GGTTTGGGCA GTTTCAGGCT
> 
>     651  ATTTCAGCTC ATTATGCGCC ACGGTCCCTG TCTGATTCGT CACTCGCCGC
> 
>     701  GTTGCGACCT GCTGTTGGGT CGCTTTTATT TCAAAGCCAA CTGGGCGCGT
> 
>     751  GAAAGCCGCA CGCCACTGTG TTACGCTTCG GAGCTGTGCG ATGAGTCGGT
> 
>     801  GCGCCGTTTT GTGCTGCGTC ACATGGAGGA TCTACCCAAG CTGGCCGAGG
> 
>     851  AGACGGCGCG TTTTGTGGAA TTGGCCGGTT GCTGGGGCTT GTACGCGGCC
> 
>     901  ATTTTGTGTT TGGATAAGGT GTGTCGCCAA CTGCACGGAC AGGACGAGAG
> 
>     951  CCCGGGCGGC GTGTTTTTGC GCATCGCCGT GGCGTTGACG GCCGCTATCG
> 
>    1001  AGAACAGTAG GCACTCGCGC ATCTATCGTT TCCATCTGGA TGCGCGTTTC
> 
>    1051  GAGGGCGAGG TGTTGGAATC GGTGTTGAAG CGCTGTCGCG ATGGGCAGCT
> 
>    1101  GTCGCTGTCC ACCTTCACCA TGTCTACCGT GGGTTTCGAT CGCGTGCCGC
> 
>    1151  AGTACGACTT TCTGATCTCG GCCGACCCTT TCTCGCGTGA CGCCAGTTGG
> 
>    1201  GCGGCCATGT GCAAGTGGAT GAGTACCTTG AGTTGCGGCG TTTCTGTGTC
> 
>    1251  GGTGAACGTA ACGCGACTTA ACGCCGATGT GAACAGCGTG ATTCGTTGCC
> 
>    1301  TGGGGGGATA CTGCGATTTG ATACGCGAGA AGGAGGTGCA TCGACCCGTG
> 
>    1351  GTACGTGTGT TTGTGGACAT GTGGGACGTG GCCGCTATCC GCGTGATTAA
> 
>    1401  CTTTATTCTC AAAGAAAGCA CGTCGGAGTT GACGGGGGTT TGCTACGCTT
> 
>    1451  TCAACGTGCC TAGCGTGTTA ATGAAGCGCT ACCGTGCGCG TGAGCAGCGC
> 
>    1501  TACTCGCTGT TTGGGCGGCC TGTCTCCCGG CGGCTCTCGG ACCTGGGTCA
> 
>    1551  GGAGTCGGCT TTCGAGAAAG AGTATTCGCG CTGCGAGCAA TCGTGCCCCA
> 
>    1601  AGGTGGTCGT GAACACGGAC GATTTTCTGA AAAAGATGTT GCTGTGCGCG
> 
>    1651  CTCAAGGGCC GTGCCTCGGT GGTCTTTGTC CATCACGTAG TCAAGTACTC
> 
>    1701  GATTATGGCC GACAGCGTGT GCCTGCCGCC GTGCTTGAGT CCCGATATGG
> 
>    1751  CGTCGTGCCA CTTTGGCGAG TGTGACATGC CGGTGCAGCG GCTGACGGTG
> 
>    1801  AACGTGGCTC GCTGCGTGTT TGCGCGTAGC GACGAGCAGA AGCTGCATCT
> 
>    1851  ACCCGACGTG GTTTTGGGGA ACACGCGACG TTACTTTGAT TTGAGCGTGC
> 
>    1901  TGCGCGAGTT GGTGACCGAG GCGGTGGTTT GGGGCAACGC GCGCTTGGAC
> 
>    1951  GCGCTAATGT CGGCGTCCGA ATGGTGGGTA GAGAGCGCGC TGGAAAAACT
> 
>    2001  GCGTCCGCTG CACATCGGCG TGGCTGGCTT GCACACGGCG CTCATGCGGT
> 
>    2051  TAGGGTTCAC GTACTTTGCC TCTTGGGACT TGATCGAGCG CATCTTTGAG
> 
>    2101  CACATGTACT TTGCCGCGGT GCGCGCTAGC GTCGATTTGT GCAAGTCGGG
> 
>    2151  TTTGCCGCGC TGCGAGTGGT TCGAACGCAC CATCTATCAA GAGGGCAAAT
> 
>    2201  TCATTTTCGA ATTGTATCGG TTGCCGCGGC TCTCCATCGC CAGCGCGCGC
> 
>    2251  TGGGAAGCGC TGCGCGCCGA CATGCTCGAG TTCGGATTGC GCAACTGTCA
> 
>    2301  GTTTCTGGCG GTGGGTCCCG ACGACGAGGT GGCGCATCTG TGGGGCGTGA
> 
>    2351  CGCCGTCAGT GTGGGCTTCG CGCGGCACCG TGTTCGAGGA GGAGACGGTG
> 
>    2401  TGGTCATTGT GCCCGCCCAA CCGTGAGTGT TACTTCCCCA CCGTGGTGCG
> 
>    2451  GAGGCCGCTG CGCGTGCCCG TGGTGAATTA CGCGTGGTTG GAGCAGCACC
> 
>    2501  AGGAGGAGGG CAAGGCGACG CAGTGTCTGT TCCAGGCGGC ACCGGCGATC
> 
>    2551  CAAAACGACG TGGAAATGGC GGCCGTGAAC CTGAGCGTGT TTGTGGACCA
> 
>    2601  GTGCGTGGCC CTGGTTTTCT ACTATGACTC GGGGATGACG CCCGACGTGC
> 
>    2651  TTCTGGCCAG GATGCTCAAG TGGTACCACT GGCGCTTTAA GGTCGGAGTA
> 
>    2701  TATAAGTACT GTGCCTCT
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------




