[Bioperl-l] Problems parsing scientific name from a Genbank file
Roy Chaudhuri
roy.chaudhuri at gmail.com
Fri Jun 19 10:34:24 UTC 2009
Hi Cesar,
I can replicate this using an old Bioperl (version 1.5.2), but it
appears to be fixed in version 1.6 and bioperl-live - the
scientific_name method returns "Bacillus anthracis str. Sterne".
Hope this helps.
Roy.
Cesar Arze wrote:
> Hi all,
> I've searched through the mailing list and bug-tracker looking for any
> indication of this (what I presume to be) bug I have been encountering when
> parsing certain Genbank files using SeqIO::GenBank but have yet to find
> anything. I apologize in advance if this is something that has already been
> addressed.
>
> When parsing these files and extracting the scientific name it seems that
> line breaks are causing the lineage info found in the ORGANISM section to be
> captured as part of the scientific name. An example of this is accession
> NC_005945:
>
>   ORGANISM  Bacillus anthracis str. Sterne
>             Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
>             cereus group.
>
> The lineage entry "Bacillus cereus group" is split by a line break, which
> causes the scientific name to capture "Bacteria; Firmicutes; Bacillales;
> Bacillaceae; Bacillus; Bacillus", ending up with "Bacillus anthracis str.
> Sterne Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus"
> as the final scientific name.
>
> Not sure if anyone has ever run into this problem but I would very much
> appreciate any help or direction.
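
For anyone stuck on an older release, the behaviour described above can be worked around by splitting the ORGANISM block by hand. What follows is only a rough sketch in Python, not the BioPerl parser and not the fix that went into 1.6. The function name parse_organism_block is made up for illustration, and it assumes the organism name fits on the ORGANISM line itself, which is not true for every record. The idea is to join the continuation lines before splitting on ";", so that a taxon wrapped across two lines is put back together.

```python
def parse_organism_block(lines):
    """Split a GenBank ORGANISM block into (scientific_name, lineage).

    Hypothetical helper for illustration only. Assumes the organism
    name sits entirely on the ORGANISM line and every following line
    belongs to the lineage.
    """
    first = lines[0].strip()
    if not first.startswith("ORGANISM"):
        raise ValueError("block does not start with ORGANISM")
    # Everything after the keyword on the first line is the name.
    name = first[len("ORGANISM"):].strip()

    # Join the continuation lines first, then split on ';', so a taxon
    # that wraps across lines ("Bacillus" / "cereus group.") is rejoined.
    lineage_text = " ".join(l.strip() for l in lines[1:] if l.strip())
    lineage_text = lineage_text.rstrip(".")
    lineage = [t.strip() for t in lineage_text.split(";") if t.strip()]
    return name, lineage


# The ORGANISM block quoted above for NC_005945.
record = [
    "  ORGANISM  Bacillus anthracis str. Sterne",
    "            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus",
    "            cereus group.",
]

name, lineage = parse_organism_block(record)
print(name)
print(lineage)
```

With the block from NC_005945 this keeps "Bacillus anthracis str. Sterne" as the name and leaves "Bacillus cereus group" as the last lineage entry.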