[Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names

bugzilla-daemon at portal.open-bio.org
Mon Dec 15 20:33:51 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2591





------- Comment #4 from joelb at lanl.gov  2008-12-15 15:33 EST -------
I heard back from GenBank, and it seems they are saying the problem isn't
theirs:
>On Tue, December 9, 2008 10:30 am, gb-admin at ncbi.nlm.nih.gov wrote:
>> Hi Joel,
>>
>> I heard back from our database folks on this one.  Essentially we do
>> allow the source line to line-wrap, but we never publicly announced
>> it.  We apologize for this oversight and will be putting something
>> in the release notes regarding this.  Hopefully BioPython and other
>> companies will be able to pick up this change and adapt once it is
>> announced in the release notes.
>>
>> thanks for pointing it out
>>
>> Linda

I just wrote back with a follow-up question:

>OK, but then a follow-up question.  How does one distinguish, then, a
>line-wrapped organism line from the multiline phylogeny that follows? 
>According to my reading of the specs (and most Bio* GenBank parsers'
>implementations), it seems that an equally valid parsing of the following
>ORGANISM record is that it belongs to the "AKU_12601 Bacteria" kingdom.
>That is, there is no official way of signalling "this is the end of the
>multiline organism name" or "this begins the multiline phylogeny record."
>
>  ORGANISM  Salmonella enterica subsp. enterica serovar Paratyphi A str.
>            AKU_12601
>            Bacteria; Proteobacteria; Gammaproteobacteria;Enterobacteriales;
>            Enterobacteriaceae; Salmonella.
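
Until there is an official answer, one fallback a parser could use is a
heuristic: treat the first continuation line that contains a semicolon as
the start of the phylogeny, and everything before it as the wrapped
organism name. The sketch below is only an illustration of that idea. The
function name split_organism_block is hypothetical, it is not what
Biopython or GenBank actually does, and it will guess wrong if an organism
name ever contains a semicolon.

```python
def split_organism_block(lines):
    """Split the text of an ORGANISM block into (name, lineage).

    `lines` is assumed to hold the block's text with the "ORGANISM"
    keyword and the leading indentation already removed.
    Heuristic only: the format gives no explicit end-of-name marker.
    """
    lines = [line.strip() for line in lines if line.strip()]
    if not lines:
        return "", []

    # The first line always belongs to the organism name.
    split_at = len(lines)
    for i in range(1, len(lines)):
        if ";" in lines[i]:
            # First continuation line that looks like a taxon list.
            split_at = i
            break
    else:
        # No semicolon anywhere: assume a single-taxon lineage such as
        # "Viruses." sitting on the last line, if it ends with a period.
        if len(lines) > 1 and lines[-1].endswith("."):
            split_at = len(lines) - 1

    name = " ".join(lines[:split_at])
    lineage_text = " ".join(lines[split_at:]).rstrip(".")
    lineage = [t.strip() for t in lineage_text.split(";") if t.strip()]
    return name, lineage


record = [
    "Salmonella enterica subsp. enterica serovar Paratyphi A str.",
    "AKU_12601",
    "Bacteria; Proteobacteria; Gammaproteobacteria;Enterobacteriales;",
    "Enterobacteriaceae; Salmonella.",
]
name, lineage = split_organism_block(record)
print(name)
print(lineage)
```

On the record quoted above this keeps "AKU_12601" with the organism name
and starts the lineage at "Bacteria", but that is a guess based on
punctuation, not something the flat-file format guarantees.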

