[Bioperl-l] PubMed records (was: MeSH terms)
Brian Osborne
bosborne11 at verizon.net
Sat Oct 24 19:18:20 UTC 2009
Robert,
Not sure what "robust" means; would "working" suffice? Also, you
suggested starting with a GenBank id, but what I'm about to show you
starts at the other end, with PubMed ids. I will take some of this and
make a little script for BioPerl's examples/ directory. In the
meantime, here is some code:
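#!/usr/bin/perl
use strict;
use warnings;
use Bio::Biblio;
my $pmid = 52;
# Bio::Biblio's "eutils" access queries PubMed via NCBI's E-utilities
my $biblio = Bio::Biblio->new(-access => "eutils");
my $ref = $biblio->get_by_id($pmid);
# $ref contains the raw PubMed XML for the record
print $ref, "\n";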
And what it prints is below.
Brian O.
<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2009//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_090101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID>52</PMID>
<DateCreated>
<Year>1976</Year>
<Month>02</Month>
<Day>09</Day>
</DateCreated>
<DateCompleted>
<Year>1976</Year>
<Month>02</Month>
<Day>09</Day>
</DateCompleted>
<DateRevised>
<Year>2006</Year>
<Month>11</Month>
<Day>15</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">0006-2960</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>14</Volume>
<Issue>24</Issue>
<PubDate>
<Year>1975</Year>
<Month>Dec</Month>
<Day>2</Day>
</PubDate>
</JournalIssue>
<Title>Biochemistry</Title>
<ISOAbbreviation>Biochemistry</ISOAbbreviation>
</Journal>
<ArticleTitle>Evidence of the involvement of a 50S ribosomal protein in several active sites.</ArticleTitle>
<Pagination>
<MedlinePgn>5321-7</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>The functional role of the Bacillus
stearothermophilus 50S ribosomal protein B-L3 (probably homologous to
the Escherichia coli protein L2) was examined by chemical
modification. The complex [B-L3-23S RNA] was photooxidized in the
presence of rose bengal and the modified protein incorporated by
reconstitution into 50S ribosomal subunits containing all other
unmodified components. Particles containing photooxidized B-L3 are
defective in several functional assays, including (1) poly(U)-directed
poly(Phe) synthesis, (2) peptidyltransferase activity, (3) ability to
associate with a [30S-poly(U)-Phe-tRNA] complex, and (4) binding of
elongation factor G and GTP. The rates of loss of the partial
functional activities during photooxidation of B-L3 indicate that at
least two independent inactivating events are occurring, a faster one,
involving oxidation of one or more histidine residues, affecting
peptidyltransferase and subunit association activities and a slower
one affecting EF-G binding. Therefore the protein B-L3 has one or more
histidine residues which are essential for peptidyltransferase and
subunit association, and another residue which is essential for
EF-G-GTP binding. B-L3 may be the ribosomal peptidyltransferase protein, or
a part of the active site, and may contribute functional groups to the
other active sites as well.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Fahnestock</LastName>
<ForeName>S R</ForeName>
<Initials>SR</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
<PublicationType>Research Support, U.S. Gov't, P.H.S.</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>UNITED STATES</Country>
<MedlineTA>Biochemistry</MedlineTA>
<NlmUniqueID>0370623</NlmUniqueID>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Macromolecular Substances</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance>Ribosomal Proteins</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Bacillus stearothermophilus</DescriptorName>
<QualifierName MajorTopicYN="Y">metabolism</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Binding Sites</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Hydrogen-Ion Concentration</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Kinetics</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Macromolecular Substances</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Oxidation-Reduction</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Photochemistry</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Protein Binding</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Ribosomal Proteins</DescriptorName>
<QualifierName MajorTopicYN="Y">metabolism</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Ribosomes</DescriptorName>
<QualifierName MajorTopicYN="N">metabolism</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>1975</Year>
<Month>12</Month>
<Day>2</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>1975</Year>
<Month>12</Month>
<Day>2</Day>
<Hour>0</Hour>
<Minute>1</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>1975</Year>
<Month>12</Month>
<Day>2</Day>
<Hour>0</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">52</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>
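Since this thread started with MeSH terms, here is a minimal sketch of pulling the MeSH headings out of XML like the above, assuming the string returned by get_by_id() is in $ref as in the script earlier. The mesh_headings() helper is hypothetical (not part of BioPerl) and uses regular expressions, which is only reasonable for NCBI's regular, machine-generated output; for anything serious a real XML parser such as XML::LibXML or XML::Twig from CPAN is the better choice. Entities such as &amp; are left undecoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return "Descriptor" or "Descriptor/qualifier" strings
# from the <MeshHeadingList> of a PubMed XML record. Regex-based, so it
# assumes NCBI's regular output rather than arbitrary XML.
sub mesh_headings {
    my ($xml) = @_;
    my @terms;
    # <MeshHeading> (with ">") does not match <MeshHeadingList>
    while ($xml =~ m{<MeshHeading>(.*?)</MeshHeading>}gs) {
        my $heading = $1;
        my ($descriptor) = $heading =~ m{<DescriptorName[^>]*>(.*?)</DescriptorName>}s;
        next unless defined $descriptor;
        my @qualifiers = $heading =~ m{<QualifierName[^>]*>(.*?)</QualifierName>}gs;
        if (@qualifiers) {
            # one entry per descriptor/qualifier pair
            push @terms, map { "$descriptor/$_" } @qualifiers;
        }
        else {
            push @terms, $descriptor;
        }
    }
    # collapse line breaks and runs of whitespace inside element text
    s/\s+/ /g for @terms;
    return @terms;
}
```

Usage, after the fetch shown earlier: print "$_\n" for mesh_headings($ref);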
On Oct 24, 2009, at 2:45 PM, Robert Bradbury wrote:
> I'm not sure if this is related to the MeSH question or not, but
> I've googled the documentation several times and never managed to find
> "robust" examples for how to manipulate PubMed records.
>
> It would seem that there ought to be code lying around which does:
> Given a GenBank ID,
> fetch all PubMed records linked to that ID,
> fetch all related records (via NCBI's "related" record IDs).
>
> Purge the list of duplicates, then do things like fetch all of the
> abstracts or fetch all of the MeSH headings, etc. for all of those
> records.
>
> Another example would be fetching all related records out to some
> depth (i.e. a PubMed tree of depth N, or a cloud of some maximum N).
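The depth-N idea can be sketched as a breadth-first walk over "related" links, with a %seen hash doing the duplicate purging. This is a minimal sketch, not existing BioPerl code: related_cloud() is hypothetical, and the $related code ref stands in for whatever actually fetches NCBI's related-article links (for instance a wrapper around the ELink E-utility). The toy link table at the bottom is made up purely to exercise the walk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Breadth-first walk of "related" links out to $max_depth hops.
# $related: code ref returning the IDs related to a single ID (in real use,
# a hypothetical wrapper around NCBI's ELink; here a toy lookup table).
# %seen purges duplicates, so each ID appears once, in discovery order.
sub related_cloud {
    my ($related, $max_depth, @start) = @_;
    my %seen;
    my @frontier = grep { !$seen{$_}++ } @start;
    my @order    = @frontier;
    for my $depth (1 .. $max_depth) {
        my @next;
        for my $id (@frontier) {
            for my $rel ($related->($id)) {
                push @next, $rel unless $seen{$rel}++;
            }
        }
        last unless @next;    # nothing new at this depth: the cloud is closed
        push @order, @next;
        @frontier = @next;
    }
    return @order;
}

# Made-up link table for illustration only (not real PubMed data)
my %links = (A => ['B', 'C'], B => ['A', 'D'], C => ['D'], D => ['E']);
my $toy   = sub { @{ $links{ $_[0] } || [] } };
print join(',', related_cloud($toy, 2, 'A')), "\n";    # prints A,B,C,D
```

Fetching abstracts or MeSH headings for the whole cloud then means one batched fetch over the returned ID list.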
>
> I think that one can use NCBI's fetch interface to do this (one could do
> it by having NCBI email you all of the PubMed results and having an email
> harvester collect those results, parse them and set up a new set of
> queries). Of course this seems like an overhead-intensive way to do this.
> Given that increasing amounts of information are becoming open to the
> public, one could even consider parsing the published papers and
> supplemental files (e.g. XLS tables) for genes of interest, as it seems
> that most authors, as well as the PubMed record processors, fail to
> provide or research the gene name information that is supposed to be in
> the PubMed records.
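As for NCBI's fetch interface, the E-utilities can be queried directly over HTTP, so no email harvesting is needed: EFetch returns full PubMed XML (abstracts and MeSH headings included) for a batch of PMIDs, and ELink maps IDs between Entrez databases, e.g. nuccore (GenBank) to pubmed, or pubmed to related pubmed records. The helper names below are hypothetical and only build request URLs; actually fetching them (e.g. with LWP::Simple's get()) and parsing the XML replies is left out. NCBI's usage guidelines also ask clients to identify themselves with tool and email parameters and to limit their request rate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL of NCBI's E-utilities
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: EFetch request for full PubMed XML records
# for one or more PMIDs in a single call
sub efetch_url {
    my @pmids = @_;
    return "$EUTILS/efetch.fcgi?db=pubmed&retmode=xml&id=" . join(',', @pmids);
}

# Hypothetical helper: ELink request mapping IDs from one Entrez database
# to another, e.g. ('nuccore', 'pubmed', ...) for GenBank -> PubMed, or
# ('pubmed', 'pubmed', ...) for related articles
sub elink_url {
    my ($dbfrom, $db, @ids) = @_;
    return "$EUTILS/elink.fcgi?dbfrom=$dbfrom&db=$db&id=" . join(',', @ids);
}

print efetch_url(52), "\n";
print elink_url('pubmed', 'pubmed', 52), "\n";
```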
>
> Now it may simply be that I lack sufficient experience with the BioPerl
> documentation and so am unaware of the functions/tools which do this
> type of thing. So if anyone has any hints/pointers they would be
> appreciated.
>
> Thanks,
> Robert Bradbury
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l