[Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed
bugzilla-daemon at portal.open-bio.org
Fri Feb 6 07:27:49 EST 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2750
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 07:27 EST -------
Confirmed the title problem. Example code, using your EMBL record saved to a file:
>>> from Bio import SeqIO
>>> record = SeqIO.read(open("long_ref.embl"),"embl")
>>> print record.annotations["references"][0]
authors: Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG;
title: Caenorhabditis elegans";
journal: Science. 294:858-862(2001).
medline id:
pubmed id:
comment:
This is due to a subtle difference between the GenBank and EMBL scanner code:
the GenBank scanner pre-combines the title lines before passing them to the
consumer, while the EMBL scanner passes the title in chunks. I have fixed the
consumer to cope with either, and also fixed it for multi-line author lists etc.
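For illustration only, here is a minimal sketch of a consumer that copes with both calling styles by appending each chunk to what it already holds instead of overwriting it. The class names, method names, and placeholder title are hypothetical; this is not the actual code in Bio/GenBank/__init__.py.

```python
class Reference(object):
    """Hypothetical stand-in for a reference record (not Biopython's class)."""

    def __init__(self):
        self.title = ""
        self.authors = ""


class ChunkTolerantConsumer(object):
    """Hypothetical consumer: each callback appends rather than overwrites."""

    def __init__(self):
        self.references = []
        self._current = None

    def reference_num(self, content):
        # A new reference block starts; later callbacks fill it in.
        self._current = Reference()
        self.references.append(self._current)

    def _append(self, attr, content):
        # Join the new chunk to any text already stored, separated by a space.
        existing = getattr(self._current, attr)
        content = content.strip()
        if existing:
            content = existing + " " + content
        setattr(self._current, attr, content)

    def title(self, content):
        self._append("title", content)

    def authors(self, content):
        self._append("authors", content)


# GenBank style: the scanner joins the lines first, so title() is called once.
pre_combined = ChunkTolerantConsumer()
pre_combined.reference_num("1")
pre_combined.title("A long placeholder title that wraps onto a second line")

# EMBL style: the scanner passes each line separately, so title() is called
# once per line.
chunked = ChunkTolerantConsumer()
chunked.reference_num("1")
chunked.title("A long placeholder title that wraps")
chunked.title("onto a second line")
```

With this approach both consumers end up holding the same title string. A consumer that simply assigned the value on each call would keep only the last chunk in the EMBL case, which matches the truncated title shown above.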
Could you update your Bio/GenBank/__init__.py file to CVS revision 102 and
retest? You will be able to download that revision here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython
Or update the full installation to CVS if you would find that easier.
Thanks,
Peter