[Bioperl-l] Genbank files with CONTIG lines in them.

Govind Chandra govind.chandra at jic.ac.uk
Tue Dec 17 11:52:07 UTC 2013


Hi Chris,

As you suggested, I upgraded to BioPerl version 1.6.922, but the
problem remains. The output of my test script after the upgrade is below.

withContigLine.gbk

Bioperl reports length as 19314.
Length of the sequence string is .

=========================================

withoutContigLine.gbk

Bioperl reports length as 19314.
Length of the sequence string is 19314.

=========================================

Perl version is: 5.018000
Bioperl version is: 1.006922
Bioperl version again: 49.46.48.48.54.57.50.50

Thanks

Govind


On Mon, Dec 16, 2013 at 12:00:04PM -0500, bioperl-l-request at lists.open-bio.org wrote:
> Message: 4
> Date: Mon, 16 Dec 2013 04:28:22 +0000
> From: "Fields, Christopher J" <cjfields at illinois.edu>
> Subject: Re: [Bioperl-l] Genbank files with CONTIG lines in them.
> To: Govind Chandra <govind.chandra at jic.ac.uk>
> Cc: BioPerl List <bioperl-l at lists.open-bio.org>
> 
> Govind,
> 
> Can you try this with the latest CPAN release (v 1.6.922)?  
> 
> chris
> 
> On Dec 11, 2013, at 6:15 AM, Govind Chandra <govind.chandra at jic.ac.uk> wrote:
> 
> > Hi,
> > 
> > Some Genbank files have a line beginning with "CONTIG" as shown below.
> > 
> > .
> > .
> > .
> >                     /protein_id="YP_008390690.1"
> >                     /db_xref="GI:529229870"
> >                     /db_xref="GeneID:16501453"
> >                     /translation="MSAEATPNTGEVQRYVKGLGRAASFVAGLVVLAFAADCIPPWPF
> >                     VTEDGSPAKLRRLGMLRCPACGLMSNREHRRLCRGPWRAGEDVST"
> > CONTIG      join(CP006261.1:1..19314)
> > ORIGIN      
> >        1 ggggggcaga ggccatgcgg ctacgccgcg tcacctccgg gcctgcggcc ctcacggacg
> >       61 gtgacggtca ctctccgcgg tcgtgcctac ggcacatccc cgccgccgtg tcaacccccg
> >      121 cgcgcaactt ttccccgaca acctgcggtt gtcgtccgcc gtcccgggac cgcacccccc
> >      181 acccgatcac cccccaccgg ccgggctacg cccacggccg gcccctcggc cgtctgtggc
> >      241 ccacaggttc cccccgccgc ctacggcgtc tcgtccgggc ataccccccc ctgctacgcc
> >      301 accccaccga acgcgccgag cccgcaaagg ccggcggcgc gtcggccgac acactccgtc
> >      361 tgtccccgtg aggctgcggg tatcggccat gcctggcctg ccctgcttcg ccgctcggcc
> > .
> > .
> > .
> > 
> > 
> > 
> > If the CONTIG line is present in a Genbank file then the string
> > returned by the Bio::Seq->seq() method is zero-length or undefined (I
> > haven't checked which).
> > 
> > I made two versions of the same genbank file, one with the CONTIG line
> > and one without. Then I ran the script pasted below.
> > 
> > 
> > ### Code begins ###
> > 
> > use strict;
> > use Bio::SeqIO;
> > 
> > 
> > for my $gbkfile (qw(withContigLine.gbk withoutContigLine.gbk)) {
> > 
> > my $seqin = Bio::SeqIO->new(-file => $gbkfile);
> > my $seqobj = $seqin->next_seq();
> > my $ntseq = $seqobj->seq();
> > my $strlen = length($ntseq);
> > my $bplen = $seqobj->length();
> > 
> > print <<"REPORT";
> > $gbkfile
> > 
> > Bioperl reports length as $bplen.
> > Length of the sequence string is $strlen.
> > 
> > =========================================
> > 
> > REPORT
> > 
> > }
> > 
> > print("Perl version is: $]\n");
> > print("Bioperl version is: ", $Bio::SeqIO::VERSION, "\n");
> > printf "Bioperl version again: %vd\n", $Bio::SeqIO::VERSION;
> > 
> > exit;
> > 
> > ### Code Ends ###
> > 
> > The output from the above script is pasted below.
> > 
> > 
> > ### Output begins ###
> > 
> > withContigLine.gbk
> > 
> > Bioperl reports length as 19314.
> > Length of the sequence string is .
> > 
> > =========================================
> > 
> > withoutContigLine.gbk
> > 
> > Bioperl reports length as 19314.
> > Length of the sequence string is 19314.
> > 
> > =========================================
> > 
> > Perl version is: 5.018000
> > Bioperl version is: 1.006001
> > Bioperl version again: 49.46.48.48.54.48.48.49
> > 
> > ### Output ends ###
> > 
> > 
> > Do I have to do something different to get the sequence string from
> > Genbank files which have the CONTIG line in them?
> > 
> > Any suggestions will be most gratefully received.
> > 
> > Thanks
> > 
> > Govind
> > 
> > Govind Chandra
> > Molecular Microbiology
> > John Innes Centre
> > Norwich UK.
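
The experiment described in the thread (the same file parses correctly once the CONTIG line is removed) suggests a possible stopgap: strip the CONTIG block from the GenBank text before handing it to the parser. The following is only a minimal sketch under that assumption, not a fix for BioPerl itself. It uses core Perl only, and `strip_contig` is a hypothetical helper name that does not appear in the thread or in BioPerl. It also assumes that a CONTIG entry may wrap onto indented continuation lines, which are dropped together with it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the GenBank text
# with every CONTIG block removed. A CONTIG block is taken to be the
# line starting with "CONTIG" plus any following lines that begin with
# whitespace (assumed to be continuation lines of the join(...) value).
sub strip_contig {
    my ($text) = @_;
    my @kept;
    my $in_contig = 0;
    for my $line (split /^/m, $text) {      # keeps line endings
        if ($line =~ /^CONTIG\b/) {
            $in_contig = 1;                 # start of a CONTIG block
            next;
        }
        if ($in_contig && $line =~ /^\s/) {
            next;                           # continuation line, drop it
        }
        $in_contig = 0;                     # next keyword (e.g. ORIGIN) ends the block
        push @kept, $line;
    }
    return join '', @kept;
}
```

The cleaned text could then be written to a new file, or, assuming in-memory filehandles behave like ordinary ones for the parser, opened with `open my $fh, '<', \$clean` and passed to `Bio::SeqIO->new(-fh => $fh, -format => 'genbank')`. Note that this discards the CONTIG information, so it is only suitable when the ORIGIN block already holds the full sequence, as in the example file above.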


