[Bioperl-l] Problems parsing scientific name from a Genbank file

Cesar Arze carze at som.umaryland.edu
Thu Jun 18 17:51:43 UTC 2009


Hi all,
   I've searched the mailing list and bug tracker for any mention of what I
presume to be a bug that I keep encountering when parsing certain GenBank
files with SeqIO::GenBank, but have yet to find anything. I apologize in
advance if this has already been addressed.

When parsing these files and extracting the scientific name it seems that
line breaks are causing the lineage info found in the ORGANISM section to be
captured as part of the scientific name. An example of this is accession
NC_005945:

  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.

The lineage entry "Bacillus cereus group" is split across a line break, which
causes the scientific name to capture "Bacteria; Firmicutes; Bacillales;
Bacillaceae; Bacillus; Bacillus". The final scientific name ends up as
"Bacillus anthracis str. Sterne Bacteria; Firmicutes; Bacillales;
Bacillaceae; Bacillus; Bacillus".
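
To make the expected result concrete, here is a minimal Python sketch of the
split I would expect for this record. It is not BioPerl code and says nothing
about how SeqIO::GenBank is implemented; the function name
split_organism_block is made up for illustration. It assumes the lineage
starts at the first continuation line containing a semicolon, so a lineage
consisting of a single taxon would be misread as part of the name.

```python
def split_organism_block(lines):
    """Split a GenBank ORGANISM block into (scientific_name, lineage list).

    Hypothetical helper for illustration only, not part of BioPerl.
    """
    # Strip indentation and ignore blank lines.
    stripped = [ln.strip() for ln in lines if ln.strip()]
    if not stripped or not stripped[0].startswith("ORGANISM"):
        raise ValueError("expected a block starting with ORGANISM")

    # The text after the ORGANISM keyword begins the scientific name.
    name_parts = [stripped[0][len("ORGANISM"):].strip()]
    lineage_parts = []
    in_lineage = False

    for ln in stripped[1:]:
        # Assumption: the first continuation line with a ';' starts the
        # lineage; every later line (wrapped or not) belongs to it too.
        if not in_lineage and ";" in ln:
            in_lineage = True
        (lineage_parts if in_lineage else name_parts).append(ln)

    name = " ".join(name_parts)
    # Rejoin wrapped lineage lines before splitting, so a taxon broken
    # across two lines ("Bacillus" / "cereus group.") stays in one piece.
    lineage_text = " ".join(lineage_parts).rstrip(".")
    lineage = [t.strip() for t in lineage_text.split(";") if t.strip()]
    return name, lineage


block = [
    "  ORGANISM  Bacillus anthracis str. Sterne",
    "            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus",
    "            cereus group.",
]
name, lineage = split_organism_block(block)
print(name)
print(lineage)
```

For NC_005945 this yields "Bacillus anthracis str. Sterne" as the name and
"Bacillus cereus group" as the last lineage entry, which is the result I
expected from the parser.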

I'm not sure if anyone has ever run into this problem, but I would very much
appreciate any help or direction.
-- 
View this message in context: http://www.nabble.com/Problems-parsing-scientific-name-from-a-Genbank-file-tp24095355p24095355.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



