[Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed

bugzilla-daemon at portal.open-bio.org
Fri Feb 6 12:27:49 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2750





------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-02-06 07:27 EST -------
Confirmed the title problem. Here is example code using your EMBL record saved to a file:

>>> from Bio import SeqIO
>>> record = SeqIO.read(open("long_ref.embl"), "embl")
>>> print(record.annotations["references"][0])
authors: Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG;
title: Caenorhabditis elegans";
journal: Science. 294:858-862(2001).
medline id: 
pubmed id: 
comment:

This is due to a subtle difference between the GenBank and EMBL scanner code:
the GenBank scanner pre-combines the title lines before passing them to the
consumer, while the EMBL scanner passes the title in chunks. I've fixed the
consumer to cope with either, and fixed the same problem for multi-line author
lists etc.
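To make the scanner/consumer difference concrete, here is a minimal sketch of a consumer that accumulates title text rather than overwriting it, so the result is the same whether the title arrives pre-combined (GenBank style) or in several chunks (EMBL style). The class `TitleAccumulator` and its methods are hypothetical illustrations, not Biopython's actual consumer API. The quote and semicolon stripping assumes the EMBL RT convention of wrapping a title as `"...";`, which is consistent with the stray `";` in the output above.

```python
class TitleAccumulator:
    """Hypothetical consumer: collects a reference title passed in one or more chunks."""

    def __init__(self):
        self._parts = []

    def add(self, content):
        # Append each chunk instead of replacing the previous one, so a
        # title split over several lines is not truncated to its last line.
        self._parts.append(content.strip())

    def value(self):
        # Join the chunks with single spaces, skipping empty ones.
        title = " ".join(part for part in self._parts if part)
        # Assumed EMBL convention: title wrapped as "...";
        if title.endswith(";"):
            title = title[:-1].rstrip()
        if len(title) >= 2 and title.startswith('"') and title.endswith('"'):
            title = title[1:-1]
        return title


# EMBL style: the scanner hands over the title one line at a time.
embl = TitleAccumulator()
embl.add('"An example title split')
embl.add('across two RT lines";')

# GenBank style: the scanner has already combined the lines.
genbank = TitleAccumulator()
genbank.add("An example title on one line")
```

A consumer that simply assigned each chunk would keep only the last one (here `across two RT lines";`), which is the same kind of truncated title as the `Caenorhabditis elegans";` shown above.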

Could you update your Bio/GenBank/__init__.py file to CVS revision 102 (you
can download it here) and retest?
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython

Or update the full installation to CVS if you would find that easier.

Thanks,

Peter

