[Biopython-dev] [Bug 3026] Bio.SeqIO.InsdcIO._split_multi_line(): Your description cannot be broken into nice lines!

Sun Mar 14 19:31:51 EDT 2010

http://bugzilla.open-bio.org/show_bug.cgi?id=3026

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2010-03-14 19:31 EST -------
I just used the Entrez web interface, and the record comes with the URL
already split to meet the 80 column limit. The same is true when fetching it
via the API:

>>> from Bio import Entrez
>>> data = Entrez.efetch("nucest", id="BF378302", rettype="gb").read()
>>> print data[1095:1800]
   PUBMED   10737800
COMMENT     Contact: Simpson A.J.G.
            Laboratory of Cancer Genetics
            Ludwig Institute for Cancer Research
            Rua Prof. Antonio Prudente 109, 4 andar, 01509-010, Sao Paulo-SP,
            Brazil
            Tel: +55-11-2704922
            Fax: +55-11-2707001
            Email: asimpson at ludwig.org.br
            This sequence was derived from the FAPESP/LICR Human Cancer Genome
            Project. This entry can be seen in the following URL
            (http://www.ludwig.org.br/scripts/gethtml2.pl?t1=CM0&t2=CM0-UM0001-
            060300-270-g07&t3=2000-03-06&t4=1)
            Seq primer: puc 18 forward.
FEATURES             Location/Qualifiers

In this particular case, it looks like splitting the string on a hyphen would
be a reasonable option (i.e. copy what the NCBI seems to be doing).
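
As a rough illustration of that idea (a sketch only, not the actual
Bio.SeqIO.InsdcIO._split_multi_line() code), the helper below prefers to
break at a space and falls back to breaking just after a hyphen when no space
fits. The function name split_on_space_or_hyphen, its arguments, and the
68 column width (80 minus the 12 column COMMENT indent) are assumptions made
for this example:

```python
def split_on_space_or_hyphen(text, max_len):
    """Split text into pieces of at most max_len characters.

    Hypothetical helper for illustration only. Break preference:
    1. at the last space that fits (the space is dropped),
    2. just after the last hyphen that fits (the hyphen is kept),
    3. a hard break at max_len if neither is available.
    """
    lines = []
    text = text.strip()
    while len(text) > max_len:
        # A space at index max_len still leaves a line of max_len characters.
        cut = text.rfind(" ", 0, max_len + 1)
        if cut > 0:
            lines.append(text[:cut].rstrip())
            text = text[cut + 1:].lstrip()
            continue
        # No usable space: keep the hyphen at the end of the line.
        cut = text.rfind("-", 0, max_len)
        if cut > 0:
            lines.append(text[:cut + 1])
            text = text[cut + 1:]
            continue
        # Nothing sensible to break on, so cut at the column limit.
        lines.append(text[:max_len])
        text = text[max_len:]
    if text:
        lines.append(text)
    return lines


# The URL from the BF378302 COMMENT block, rejoined into a single string.
url = ("(http://www.ludwig.org.br/scripts/gethtml2.pl?t1=CM0&t2=CM0-UM0001-"
       "060300-270-g07&t3=2000-03-06&t4=1)")
for line in split_on_space_or_hyphen(url, 68):
    print(line)
```

With a width of 68 this breaks the URL after "UM0001-", matching the NCBI
output above. Note that a reader of such a file cannot tell from the text
alone whether a line ending in a hyphen should be rejoined with or without a
space, so this only addresses the writing side.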

Did you just cut and paste it from the NCBI's HTML page, where the URL does
seem to be shown unbroken (giving a line of more than 80 characters)? Or can
we download a "broken" GenBank file from the NCBI somewhere (maybe the FTP
site)?
