[Bioperl-l] GenBank files CONTIG line

Brian Osborne bosborne11 at verizon.net
Wed Sep 17 14:24:48 UTC 2014


Matthew,

What's an easy way for me to reproduce this performance problem when making the *ptt file?

Brian O.

On Sep 16, 2014, at 5:16 PM, Matthew Laird <lairdm at sfu.ca> wrote:

> Good afternoon,
> 
> I wanted to report what I think is an issue, but I'm not positive yet.  I found this old mailing list posting from May (http://lists.open-bio.org/pipermail/bioperl-l/2014-May/071583.html) about the changes to NCBI's GenBank files, and I just grabbed the latest bioperl-live with August's patch, hoping it would solve the problem.  That part worked great: instead of spewing a few GB of warnings and printing the whole sequence multiple times, it read the GenBank file and wrote out an EMBL file perfectly fine.
> 
> However, the current bioperl-live created a new issue.  I have a mirror of NCBI's bacterial genomes directory (yes, I know, I need to move to the new directory structure in the next 6 months), and this pipeline takes the GenBank file and makes the embl, ptt, faa, and fna files as needed.  This usually takes seconds.  Whatever changed in bioperl-live compared to BioPerl 1.6.922 causes the script to spin, doing something very intensive for tens of minutes while slowly writing out the ptt file.
> 
> Simply copying genbank.pm from bioperl-live to my install directory both solved the CONTIG issue and kept the whole conversion process speedy.  So I'm happy for now, but I wanted to mention this in case it rings a bell with anyone as to what could have changed to make parsing a gbk into a ptt so much less efficient now.
> 
> Thanks.
> 
> -- 
> Matthew Laird
> Lead Software Developer, Bioinformatics
> Brinkman Laboratory
> Simon Fraser University, Burnaby, BC, Canada



