[Bioperl-l] Genbank files with CONTIG lines in them.

Fields, Christopher J cjfields at illinois.edu
Mon Dec 16 04:28:22 UTC 2013


Govind,

Can you try this with the latest CPAN release (v 1.6.922)?  
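
Before retesting, it may help to confirm two things from the quoted output below. This is only a sketch, and the helper names describe_seq and dotted_version are made up for illustration; they are not part of BioPerl. First, since Perl 5.12 length(undef) returns undef rather than 0, so the blank in "Length of the sequence string is ." suggests seq() returned undef rather than an empty string. Second, "%vd" applied to a plain string prints the character codes of that string, which is why the last version line reads 49.46.48...; the core version module can turn 1.006001 into dotted form instead.

```perl
use strict;
use warnings;
use version;    # core module, no BioPerl needed for this sketch

# Hypothetical helper: say whether a value is undef, empty, or a real string.
sub describe_seq {
    my ($seq) = @_;
    return 'undefined'    unless defined $seq;
    return 'empty string' unless length $seq;
    return length($seq) . ' characters';
}

# Hypothetical helper: convert a decimal version string such as "1.006001"
# into dotted form using the core version module.
sub dotted_version {
    my ($decimal) = @_;
    return version->parse($decimal)->normal;
}

print describe_seq(undef),        "\n";    # undefined
print describe_seq(''),           "\n";    # empty string
print describe_seq('ggggggcaga'), "\n";    # 10 characters
print dotted_version('1.006001'), "\n";    # v1.6.1

# "%vd" joins the ordinal of each character with dots, so on a plain
# string it shows ASCII codes, matching the line in the quoted output.
printf "%vd\n", '1.006001';                # 49.46.48.48.54.48.48.49
```

In the quoted script, passing $seqobj->seq() to describe_seq and $Bio::SeqIO::VERSION to dotted_version should show which case applies and which release is actually being loaded.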

chris

On Dec 11, 2013, at 6:15 AM, Govind Chandra <govind.chandra at jic.ac.uk> wrote:

> Hi,
> 
> Some Genbank files have a line beginning with "CONTIG" as shown below.
> 
> .
> .
> .
>                     /protein_id="YP_008390690.1"
>                     /db_xref="GI:529229870"
>                     /db_xref="GeneID:16501453"
>                     /translation="MSAEATPNTGEVQRYVKGLGRAASFVAGLVVLAFAADCIPPWPF
>                     VTEDGSPAKLRRLGMLRCPACGLMSNREHRRLCRGPWRAGEDVST"
> CONTIG      join(CP006261.1:1..19314)
> ORIGIN      
>        1 ggggggcaga ggccatgcgg ctacgccgcg tcacctccgg gcctgcggcc ctcacggacg
>       61 gtgacggtca ctctccgcgg tcgtgcctac ggcacatccc cgccgccgtg tcaacccccg
>      121 cgcgcaactt ttccccgaca acctgcggtt gtcgtccgcc gtcccgggac cgcacccccc
>      181 acccgatcac cccccaccgg ccgggctacg cccacggccg gcccctcggc cgtctgtggc
>      241 ccacaggttc cccccgccgc ctacggcgtc tcgtccgggc ataccccccc ctgctacgcc
>      301 accccaccga acgcgccgag cccgcaaagg ccggcggcgc gtcggccgac acactccgtc
>      361 tgtccccgtg aggctgcggg tatcggccat gcctggcctg ccctgcttcg ccgctcggcc
> .
> .
> .
> 
> 
> 
> If the CONTIG line is present in a Genbank file then the string
> returned by the Bio::Seq->seq() method is zero-length or undefined (I
> haven't checked which).
> 
> I made two versions of the same genbank file, one with the CONTIG line
> and one without. Then I ran the script pasted below.
> 
> 
> ### Code begins ###
> 
> use strict;
> use Bio::SeqIO;
> 
> 
> for my $gbkfile (qw(withContigLine.gbk withoutContigLine.gbk)) {
> 
>     my $seqin  = Bio::SeqIO->new(-file => $gbkfile);
>     my $seqobj = $seqin->next_seq();
>     my $ntseq  = $seqobj->seq();
>     my $strlen = length($ntseq);
>     my $bplen  = $seqobj->length();
> 
>     print <<"REPORT";
> $gbkfile
> 
> Bioperl reports length as $bplen.
> Length of the sequence string is $strlen.
> 
> =========================================
> 
> REPORT
> 
> }
> 
> print("Perl version is: $]\n");
> print("Bioperl version is: ", $Bio::SeqIO::VERSION, "\n");
> printf "Bioperl version again: %vd\n", $Bio::SeqIO::VERSION;
> 
> exit;
> 
> ### Code Ends ###
> 
> The output from the above script is pasted below.
> 
> 
> ### Output begins ###
> 
> withContigLine.gbk
> 
> Bioperl reports length as 19314.
> Length of the sequence string is .
> 
> =========================================
> 
> withoutContigLine.gbk
> 
> Bioperl reports length as 19314.
> Length of the sequence string is 19314.
> 
> =========================================
> 
> Perl version is: 5.018000
> Bioperl version is: 1.006001
> Bioperl version again: 49.46.48.48.54.48.48.49
> 
> ### Output ends ###
> 
> 
> Do I have to do something different to get the sequence string from
> Genbank files which have the CONTIG line in them?
> 
> Any suggestions will be most gratefully received.
> 
> Thanks
> 
> Govind
> 
> Govind Chandra
> Molecular Microbiology
> John Innes Centre
> Norwich UK.