[Bioperl-l] PubMed records (was: MeSH terms)

Sat Oct 24 15:18:20 EDT 2009

Robert,

Not sure what "robust" means; would "working" suffice? Also, you
suggested starting with a GenBank ID, but what I'm about to show you
starts at the other end, with PubMed IDs. I plan to turn some of this
into a small script for BioPerl's examples/ directory. In the
meantime, here is some code:

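#!/usr/bin/perl
use strict;
use warnings;

use Bio::Biblio;

my $pmid = 52;

# the "eutils" access method queries NCBI's E-utilities
my $biblio = Bio::Biblio->new(-access => "eutils");

# fetch the PubMed record for this ID
my $ref = $biblio->get_by_id($pmid);

# $ref contains the raw PubMed XML
print $ref, "\n";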

And what it prints is below.

Brian O.

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2009//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_090101.dtd">
<PubmedArticleSet>
<PubmedArticle>
     <MedlineCitation Owner="NLM" Status="MEDLINE">
         <PMID>52</PMID>
         <DateCreated>
             <Year>1976</Year>
             <Month>02</Month>
             <Day>09</Day>
         </DateCreated>
         <DateCompleted>
             <Year>1976</Year>
             <Month>02</Month>
             <Day>09</Day>
         </DateCompleted>
         <DateRevised>
             <Year>2006</Year>
             <Month>11</Month>
             <Day>15</Day>
         </DateRevised>
         <Article PubModel="Print">
             <Journal>
                 <ISSN IssnType="Print">0006-2960</ISSN>
                 <JournalIssue CitedMedium="Print">
                     <Volume>14</Volume>
                     <Issue>24</Issue>
                     <PubDate>
                         <Year>1975</Year>
                         <Month>Dec</Month>
                         <Day>2</Day>
                     </PubDate>
                 </JournalIssue>
                 <Title>Biochemistry</Title>
                 <ISOAbbreviation>Biochemistry</ISOAbbreviation>
             </Journal>
             <ArticleTitle>Evidence of the involvement of a 50S ribosomal protein in several active sites.</ArticleTitle>
             <Pagination>
                 <MedlinePgn>5321-7</MedlinePgn>
             </Pagination>
             <Abstract>
                 <AbstractText>The functional role of the Bacillus stearothermophilus 50S ribosomal protein B-L3 (probably homologous to the Escherichia coli protein L2) was examined by chemical modification. The complex [B-L3-23S RNA] was photooxidized in the presence of rose bengal and the modified protein incorporated by reconstitution into 50S ribosomal subunits containing all other unmodified components. Particles containing photooxidized B-L3 are defective in several functional assays, including (1) poly(U)-directed poly(Phe) synthesis, (2) peptidyltransferase activity, (3) ability to associate with a [30S-poly(U)-Phe-tRNA] complex, and (4) binding of elongation factor G and GTP. The rates of loss of the partial functional activities during photooxidation of B-L3 indicate that at least two independent inactivating events are occurring, a faster one, involving oxidation of one or more histidine residues, affecting peptidyltransferase and subunit association activities and a slower one affecting EF-G binding. Therefore the protein B-L3 has one or more histidine residues which are essential for peptidyltransferase and subunit association, and another residue which is essential for EF-G-GTP binding. B-L3 may be the ribosomal peptidyltransferase protein, or a part of the active site, and may contribute functional groups to the other active sites as well.</AbstractText>
             </Abstract>
             <AuthorList CompleteYN="Y">
                 <Author ValidYN="Y">
                     <LastName>Fahnestock</LastName>
                     <ForeName>S R</ForeName>
                     <Initials>SR</Initials>
                 </Author>
             </AuthorList>
             <Language>eng</Language>
             <PublicationTypeList>
                 <PublicationType>Journal Article</PublicationType>
                 <PublicationType>Research Support, U.S. Gov't, P.H.S.</PublicationType>
             </PublicationTypeList>
         </Article>
         <MedlineJournalInfo>
             <Country>UNITED STATES</Country>
             <MedlineTA>Biochemistry</MedlineTA>
             <NlmUniqueID>0370623</NlmUniqueID>
         </MedlineJournalInfo>
         <ChemicalList>
             <Chemical>
                 <RegistryNumber>0</RegistryNumber>
                 <NameOfSubstance>Macromolecular Substances</NameOfSubstance>
             </Chemical>
             <Chemical>
                 <RegistryNumber>0</RegistryNumber>
                 <NameOfSubstance>Ribosomal Proteins</NameOfSubstance>
             </Chemical>
         </ChemicalList>
         <CitationSubset>IM</CitationSubset>
         <MeshHeadingList>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Bacillus stearothermophilus</DescriptorName>
                 <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Binding Sites</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Hydrogen-Ion Concentration</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Kinetics</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Macromolecular Substances</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Oxidation-Reduction</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Photochemistry</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Protein Binding</DescriptorName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Ribosomal Proteins</DescriptorName>
                 <QualifierName MajorTopicYN="Y">metabolism</QualifierName>
             </MeshHeading>
             <MeshHeading>
                 <DescriptorName MajorTopicYN="N">Ribosomes</DescriptorName>
                 <QualifierName MajorTopicYN="N">metabolism</QualifierName>
             </MeshHeading>
         </MeshHeadingList>
     </MedlineCitation>
     <PubmedData>
         <History>
             <PubMedPubDate PubStatus="pubmed">
                 <Year>1975</Year>
                 <Month>12</Month>
                 <Day>2</Day>
             </PubMedPubDate>
             <PubMedPubDate PubStatus="medline">
                 <Year>1975</Year>
                 <Month>12</Month>
                 <Day>2</Day>
                 <Hour>0</Hour>
                 <Minute>1</Minute>
             </PubMedPubDate>
             <PubMedPubDate PubStatus="entrez">
                 <Year>1975</Year>
                 <Month>12</Month>
                 <Day>2</Day>
                 <Hour>0</Hour>
                 <Minute>0</Minute>
             </PubMedPubDate>
         </History>
         <PublicationStatus>ppublish</PublicationStatus>
         <ArticleIdList>
             <ArticleId IdType="pubmed">52</ArticleId>
         </ArticleIdList>
     </PubmedData>
</PubmedArticle>

</PubmedArticleSet>
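Since this thread started with MeSH terms, here is a rough sketch of pulling the MeSH headings and the abstract out of XML like the record above. It assumes XML::LibXML is installed (the script above does not use it), and parse_pubmed_xml is a hypothetical helper name, not part of Bio::Biblio. Qualifiers are attached to their descriptor as "Descriptor/Qualifier"; that format is just a choice made for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Parse PubMed XML (e.g. the string returned by $biblio->get_by_id)
# and return one hashref per PubmedArticle with its PMID, abstract text
# and MeSH headings written as "Descriptor" or "Descriptor/Qualifier".
# parse_pubmed_xml is a hypothetical name used only in this sketch.
sub parse_pubmed_xml {
    my ($xml) = @_;
    # load_ext_dtd => 0 stops libxml2 from fetching the NLM DTD
    # named in the DOCTYPE line over the network
    my $doc = XML::LibXML->load_xml(string => $xml, load_ext_dtd => 0);

    my @records;
    for my $art ($doc->findnodes('//PubmedArticle')) {
        my %rec = (
            pmid     => $art->findvalue('MedlineCitation/PMID'),
            # structured abstracts can have several AbstractText elements
            abstract => join(' ', map { $_->textContent }
                                  $art->findnodes('.//Abstract/AbstractText')),
            mesh     => [],
        );
        for my $mh ($art->findnodes('.//MeshHeadingList/MeshHeading')) {
            my $desc  = $mh->findvalue('DescriptorName');
            my @quals = map { $_->textContent } $mh->findnodes('QualifierName');
            if (@quals) { push @{ $rec{mesh} }, map { "$desc/$_" } @quals }
            else        { push @{ $rec{mesh} }, $desc }
        }
        push @records, \%rec;
    }
    return @records;
}

# Example: save the XML printed above to a file and run
#   perl pubmed_mesh.pl record.xml
if (@ARGV) {
    my $xml = do {
        local $/;
        open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
        <$fh>;
    };
    for my $rec (parse_pubmed_xml($xml)) {
        print "PMID $rec->{pmid}\n";
        print "  MeSH: $_\n" for @{ $rec->{mesh} };
    }
}
```

Called as parse_pubmed_xml($ref) on the string from get_by_id, the record for PMID 52 should yield MeSH entries such as "Bacillus stearothermophilus/metabolism" and "Binding Sites".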

On Oct 24, 2009, at 2:45 PM, Robert Bradbury wrote:

> I'm not sure if this is related to the MeSH question or not, but I've
> googled the documentation several times and never managed to find
> "robust" examples of how to manipulate PubMed records.
>
> It would seem that there ought to be code lying around which does:
>  Given a GenBank ID,
>     Fetch all PubMed records linked to that ID
>         Fetch all related records (via NCBI's "related" record IDs)
>
>     Purge the list of duplicates, then do things like fetch all of the
>     abstracts or all of the MeSH headings, etc., for all of those records.
>
> Another example would include fetching all records of relatedness
> (i.e. a PubMed tree of depth N, or a cloud of some maximum size N).
>
> I think that one can use NCBI's fetch interface to do this (one could
> do it by having NCBI email you all of the PubMed results and having an
> email harvester collect those results, parse them and set up a new set
> of queries). Of course this seems like an overhead-intensive way to do
> it. Given that increasing amounts of information are becoming open to
> the public, one could even consider parsing the published papers and
> supplemental files (e.g. XLS tables) for genes of interest, since it
> seems that the authors of most work, as well as the PubMed record
> processors, fail to provide or research the gene name information
> that is supposed to be in the PubMed records.
>
> Now it may simply be that I lack sufficient experience with the
> BioPerl documentation and so am unaware of the functions/tools which
> do this type of thing. So if anyone has any hints/pointers, they would
> be appreciated.
>
> Thanks,
> Robert Bradbury
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
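For the workflow Robert describes (GenBank ID, then linked PubMed records, then NCBI's related records, then removing duplicates), one possible route is an ELink query through BioPerl's Bio::DB::EUtilities. This is an untested sketch under several assumptions: the inputs are nucleotide GI numbers, the link names nuccore_pubmed and pubmed_pubmed select the links wanted, and the e-mail address is a placeholder to replace with your own. The helpers uniq_ids and linked_pubmed_ids are hypothetical names; check the Bio::DB::EUtilities documentation before relying on the details.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop duplicate IDs, keeping first occurrences in order.
sub uniq_ids {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical helper: run one ELink step from $dbfrom to PubMed and return
# the de-duplicated PubMed IDs found in LinkSets named $linkname.
# Assumes Bio::DB::EUtilities is installed and NCBI is reachable.
sub linked_pubmed_ids {
    my ($dbfrom, $linkname, @ids) = @_;
    return () unless @ids;            # nothing to link from
    require Bio::DB::EUtilities;      # loaded only when a query is made
    my $factory = Bio::DB::EUtilities->new(
        -eutil  => 'elink',
        -email  => 'you@example.org', # placeholder: use your own address
        -dbfrom => $dbfrom,
        -db     => 'pubmed',
        -id     => \@ids,
    );
    my @pmids;
    while (my $ds = $factory->next_LinkSet) {
        next unless ($ds->get_link_name // '') eq $linkname;
        push @pmids, $ds->get_ids;
    }
    return uniq_ids(@pmids);
}

# Usage (assumed): perl genbank2pubmed.pl GI [GI ...]
if (@ARGV) {
    my @papers  = linked_pubmed_ids('nuccore', 'nuccore_pubmed', @ARGV);
    my @related = linked_pubmed_ids('pubmed',  'pubmed_pubmed',  @papers);
    print "$_\n" for uniq_ids(@papers, @related);
}
```

The PMIDs it prints could then be passed to get_by_id, as in the Bio::Biblio script above, to fetch abstracts or MeSH headings. For a related-records tree of depth N, the pubmed_pubmed step could be repeated on each newly found set of IDs, though the list can grow large quickly.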