[Biopython] Moving from Bio.PubMed to Bio.Entrez

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Fri Jul 17 09:58:13 UTC 2009


Hi Peter and others,
  I am finally moving my code from Bio.PubMed to Bio.Entrez. I think something
is wrong with my biopython-1.49 installation:

$ python
Python 2.6.2 (r262:71600, Jun 10 2009, 00:54:18) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Entrez, Medline, GenBank
>>> Entrez.email = "mmokrejs at iresite.org"
>>> _handle = Entrez.efetch(db="pubmed", id=10851087, retmode="text")
>>> _records = Entrez.read(_handle)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 286, in read
    record = handler.run(handle)
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 95, in run
    self.parser.ParseFile(handle)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0
>>> _handle = Entrez.efetch(db="pubmed", id=10851087, retmode="XML")
>>> _records = Entrez.read(_handle)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 286, in read
    record = handler.run(handle)
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 283, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 280, in external_entity_ref_handler
    handle = urllib.urlopen(systemId)
  File "/usr/lib/python2.6/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.6/urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.6/urllib.py", line 465, in open_file
    return self.open_local_file(url)
  File "/usr/lib/python2.6/urllib.py", line 479, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'nlmmedline_090101.dtd'
>>> 





After upgrading to 1.51b the results are slightly better: retmode="XML" now
parses, but retmode="text" still fails:

$ python
Python 2.5.4 (r254:67916, Jul 15 2009, 19:40:01) 
[GCC 4.2.2 (Gentoo 4.2.2 p1.0)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Entrez, Medline, GenBank
>>> Entrez.email = "mmokrejs at iresite.org"
>>> _handle = Entrez.efetch(db="pubmed", id=10851087, retmode="text")
>>> _records = Entrez.read(_handle)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/Bio/Entrez/__init__.py", line 297, in read
    record = handler.run(handle)
  File "/usr/lib/python2.5/site-packages/Bio/Entrez/Parser.py", line 90, in run
    self.parser.ParseFile(handle)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0
>>> _handle = Entrez.efetch(db="pubmed", id=10851087, retmode="XML")
>>> _records = Entrez.read(_handle)
>>> _records
[{u'MedlineCitation': {u'DateCompleted': {u'Month': '06', u'Day': '29', u'Year': '2000'}, u'OtherID': [], u'DateRevised': {u'Month': '11', u'Day': '14', u'Year': '2007'}, u'MeshHeadingList': [{u'QualifierName': [], u'DescriptorName': '3T3 Cells'}, {u'QualifierName': ['chemistry', 'physiology'], u'DescriptorName': "5' Untranslated Regions"}, {u'QualifierName': [], u'DescriptorName': 'Animals'}, {u'QualifierName': [], u'DescriptorName': 'Base Sequence'}, {u'QualifierName': [], u'DescriptorName': 'Chick Embryo'}, {u'QualifierName': [], u'DescriptorName': 'Mice'}, {u'QualifierName': [], u'DescriptorName': 'Molecular Sequence Data'}, {u'QualifierName': [], u'DescriptorName': 'Protein Biosynthesis'}, {u'QualifierName': ['genetics'], u'DescriptorName': 'Proto-Oncogene Proteins c-jun'}, {u'QualifierName': ['chemistry'], u'DescriptorName': 'RNA, Messenger'}, {u'QualifierName': [], u'DescriptorName': 'Rabbits'}], u'OtherAbstract': [], u'CitationSubset': ['IM'], u'ChemicalList': [{u'Nam
eOfSubstance': "5' Untranslated Regions", u'RegistryNumber': '0'}, {u'NameOfSubstance': 'Proto-Oncogene Proteins c-jun', u'RegistryNumber': '0'}, {u'NameOfSubstance': 'RNA, Messenger', u'RegistryNumber': '0'}], u'KeywordList': [], u'DateCreated': {u'Month': '06', u'Day': '29', u'Year': '2000'}, u'SpaceFlightMission': [], u'GeneralNote': [], u'Article': {u'ArticleDate': [], u'Pagination': {u'MedlinePgn': '2836-45'}, u'AuthorList': [{u'LastName': 'Sehgal', u'Initials': 'A', u'ForeName': 'A'}, {u'LastName': 'Briggs', u'Initials': 'J', u'ForeName': 'J'}, {u'LastName': 'Rinehart-Kim', u'Initials': 'J', u'ForeName': 'J'}, {u'LastName': 'Basso', u'Initials': 'J', u'ForeName': 'J'}, {u'LastName': 'Bos', u'Initials': 'TJ', u'ForeName': 'T J'}], u'Language': ['eng'], u'PublicationTypeList': ['Journal Article', "Research Support, Non-U.S. Gov't", "Research Support, U.S. Gov't, P.H.S."], u'Journal': {u'ISSN': '0950-9232', u'ISOAbbreviation': 'Oncogene', u'JournalIssue': {u'Volume': '19',
 u'Issue': '24', u'PubDate': {u'Month': 'Jun', u'Day': '1', u'Year': '2000'}}, u'Title': 'Oncogene'}, u'Affiliation': 'Department of Microbiology and Molecular Cell Biology, Eastern Virginia Medical School, PO Box 1980, Norfolk, Virginia, VA 23501, USA.', u'ArticleTitle': "The chicken c-Jun 5' untranslated region directs translation by internal initiation.", u'ELocationID': [], u'Abstract': {u'AbstractText': "The 5' untranslated region (UTR) of the chicken c-jun message is exceptionally GC rich and has the potential to form a complex and extremely stable secondary structure. Because stable RNA secondary structures can serve as obstacles to scanning ribosomes, their presence suggests inefficient translation or initiation through alternate mechanisms. We have examined the role of the c-jun 5' UTR with respect to its ability to influence translation both in vitro and in vivo. We find, using rabbit reticulocyte lysates, that the presence of the c-jun 5' UTR severely inhibits tran
slation of both homologous and heterologous genes in vitro. Furthermore, translational inhibition correlates with the degree of secondary structure exhibited by the 5' UTR. Thus, in the rabbit reticulocyte lysate system, the c-jun 5' UTR likely impedes ribosome scanning resulting in inefficient translation. In contrast to our results in vitro, the c-jun 5' UTR does not inhibit translation in a variety of different cell lines suggesting that it may direct an alternate mechanism of translational initiation in vivo. To distinguish among the alternate mechanisms, we generated a series of bicistronic expression plasmids. Our results demonstrate that the downstream cistron, in the bicistronic gene, is expressed to a much higher level when directly preceded by the c-jun 5' UTR. In addition, inhibition of ribosome scanning on the bicistronic message, through insertion of a synthetic stable hairpin, inhibits translation of the first cistron but does not inhibit translation of the cist
ron downstream of the c-jun 5' UTR. These results are consistent with a model by which the c-jun message is translated through cap independent internal initiation. Oncogene (2000) 19, 2836 - 2845"}, u'GrantList': [{u'Acronym': 'CA', u'Country': 'United States', u'Agency': 'NCI NIH HHS', u'GrantID': 'R01 CA51982'}]}, u'PMID': '10851087', u'MedlineJournalInfo': {u'MedlineTA': 'Oncogene', u'Country': 'ENGLAND', u'NlmUniqueID': '8711562'}}, u'PubmedData': {u'ArticleIdList': ['10851087', '10.1038/sj.onc.1203601'], u'PublicationStatus': 'ppublish', u'History': [[{u'Minute': '0', u'Month': '6', u'Day': '13', u'Hour': '9', u'Year': '2000'}, {u'Minute': '0', u'Month': '7', u'Day': '6', u'Hour': '11', u'Year': '2000'}, {u'Minute': '0', u'Month': '6', u'Day': '13', u'Hour': '9', u'Year': '2000'}]]}}]
>>> _handle = Entrez.efetch(db="pubmed", id=10851087, retmode="text")
>>> _records = Entrez.read(_handle)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/Bio/Entrez/__init__.py", line 297, in read
    record = handler.run(handle)
  File "/usr/lib/python2.5/site-packages/Bio/Entrez/Parser.py", line 90, in run
    self.parser.ParseFile(handle)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0
>>> 
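
One guess on my side, not confirmed: Entrez.read() seems to hand the data to an
XML (expat) parser, and with retmode="text" the reply is probably plain text
rather than XML, so the parser gives up on the very first character. Below is a
minimal sketch that shows the same kind of failure without Biopython or network
access, using only the standard library. The helper try_parse() and both sample
strings are made up for illustration; they are not real efetch output.

```python
# Standalone sketch (no Biopython, no network): feed non-XML text to expat.
import xml.parsers.expat


def try_parse(data):
    """Return None if data parses as XML, otherwise the ExpatError raised."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return err
    return None


# Hypothetical stand-ins for the two kinds of reply; not real efetch output.
plain_text = "PMID- 10851087\nTI  - some title\n"
xml_text = "<PubmedArticleSet><PMID>10851087</PMID></PubmedArticleSet>"

text_error = try_parse(plain_text)  # expected to be an ExpatError
xml_error = try_parse(xml_text)     # expected to be None

print(text_error)
print(xml_error)
```

If that guess is right, the retmode="text" failure would not be an installation
problem at all, and only the missing nlmmedline_090101.dtd in 1.49 would be.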


  Any clues as to what this means? Thanks in advance,
martin


