[Bioperl-l] GenBank files CONTIG line

Matthew Laird lairdm at sfu.ca
Tue Sep 16 21:16:59 UTC 2014


Good afternoon,

I wanted to report what I think is an issue, but I'm not positive yet.  I 
found this old mailing list posting from May 
(http://lists.open-bio.org/pipermail/bioperl-l/2014-May/071583.html) 
about the changes to NCBI's GenBank files, and I just grabbed the latest 
bioperl-live with August's patch to hopefully solve it.  That part 
worked great: instead of spewing a few GB of warnings and the whole 
sequence multiple times, it read the GenBank file and wrote out an EMBL 
file perfectly fine.

However, the current bioperl-live created a new issue.  I have a mirror 
of NCBI's bacterial genomes directory (yes, I know, I need to move to 
the new directory structure in the next 6 months), and this pipeline 
takes the GenBank file and makes the embl, ptt, faa, and fna files as 
needed.  This usually takes seconds.  Whatever changed in bioperl-live 
compared to BioPerl 1.6.922 causes the script to spin, doing something 
very intensive for tens of minutes while slowly writing out the ptt file.

Simply copying genbank.pm from bioperl-live to my install directory 
both solved the CONTIG issue and kept the whole conversion process 
speedy.  So I'm happy for now, but I wanted to mention this in case it 
rings a bell with anyone as to what could have changed to make 
converting a gbk into a ptt so much less efficient now.

Thanks.

-- 
Matthew Laird
Lead Software Developer, Bioinformatics
Brinkman Laboratory
Simon Fraser University, Burnaby, BC, Canada

