[Bioperl-l] Oddness in Bio::SeqIO

Chris Fields cjfields at uiuc.edu
Tue May 9 21:13:43 UTC 2006


I noticed something odd in how SeqIO parses species lines (those
problematic bacterial tax names again).  I have a simple script that prints
a list of hits to STDOUT.  Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
         hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
         hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
         hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
         hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
         hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216  <--
         hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
         hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--

...

Most (but not all) of the strain designations get repeated (marked with
arrows).  The repetition is already present in the GenBank file itself, as
downloaded via Bio::DB::GenBank (and thus passed through Bio::SeqIO).  Has
anyone seen this before?
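
For anyone who wants to flag or clean up such names after parsing, here is a
minimal sketch in Python (not BioPerl, and not a fix for the parser itself).
It assumes the only symptom is a trailing run of words that repeats the words
immediately before it, as in the arrowed lines above. The function name
collapse_repeated_suffix is made up for illustration.

```python
def collapse_repeated_suffix(name: str) -> str:
    """Drop a trailing run of words that duplicates the run just before it.

    Hypothetical helper: it only post-processes the printed name and does not
    touch how the species line is parsed.
    """
    tokens = name.split()
    # Try the longest possible repeated run first, then shorter ones.
    for k in range(len(tokens) // 2, 0, -1):
        if tokens[-k:] == tokens[-2 * k:-k]:
            return " ".join(tokens[:-k])
    # No repeated trailing run found: return the name with whitespace normalized.
    return " ".join(tokens)


names = [
    "Corynebacterium glutamicum ATCC 13032",
    "Corynebacterium jeikeium K411 K411",
    "Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07",
]
cleaned = [collapse_repeated_suffix(n) for n in names]
```

Note that this would also shorten a legitimate name whose last word repeats
the one before it, so it is better used to flag suspect entries than to
rewrite them blindly.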

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 
