[Bioperl-l] problems parsing EBI interposscan.xml

Hilmar Lapp hlapp at gnf.org
Wed Dec 8 19:18:34 EST 2004


It looks like you're trying to parse an InterProScan match file with the 
InterPro ontology file parser (Bio::OntologyIO). Maybe what you need is 
the interpro parser in Bio::SeqIO?
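
One quick way to tell the two kinds of file apart is to look at the root 
element: the test file shipped with Bioperl starts with <interprodb>, 
while the EBI report starts with <EBIInterProScanResults>. Below is a 
minimal sketch of that check, written in Python with only the standard 
library, so it is independent of Bioperl. The function names 
(suggested_parser, list_matches) and the trimmed SAMPLE report are made 
up for illustration, and the parser names it returns only restate the 
suggestion above; they are not a guarantee that a given Bioperl release 
handles the file.

```python
import xml.etree.ElementTree as ET

# Root element names taken from the two file excerpts quoted in this thread.
ONTOLOGY_ROOT = "interprodb"           # InterPro database dump (ontology-style file)
SCAN_ROOT = "EBIInterProScanResults"   # InterProScan match report returned by EBI

# Hypothetical, heavily trimmed report modelled on the quoted EBI output.
SAMPLE = """<EBIInterProScanResults>
  <interpro_matches>
    <protein id="SAM" length="393" crc64="847CBC4BD0EAA1BC">
      <interpro id="IPR002133" name="S-adenosylmethionine synthetase" type="Family">
        <match id="PIRSF000497" name="Methionine adenosyltransferase" dbname="PIR">
          <location start="2" end="387" score="2.6e-224" status="T" evidence="HMMPIR"/>
        </match>
        <match id="PF00438.9" name="S-adenosylmethionine synthetase, N-te" dbname="PFAM">
          <location start="2" end="102" score="2.7e-63" status="T" evidence="HMMPfam"/>
        </match>
      </interpro>
    </protein>
  </interpro_matches>
</EBIInterProScanResults>"""


def suggested_parser(xml_text):
    """Guess which Bioperl parser family fits, based only on the root element."""
    tag = ET.fromstring(xml_text).tag
    if tag == ONTOLOGY_ROOT:
        return "Bio::OntologyIO"
    if tag == SCAN_ROOT:
        return "Bio::SeqIO"
    return "unknown"


def list_matches(xml_text):
    """Yield (protein id, InterPro id, member database, start, end) per match location."""
    root = ET.fromstring(xml_text)
    for protein in root.iter("protein"):
        for ipr in protein.iter("interpro"):
            for match in ipr.iter("match"):
                for loc in match.iter("location"):
                    yield (
                        protein.get("id"),
                        ipr.get("id"),
                        match.get("dbname"),
                        int(loc.get("start")),
                        int(loc.get("end")),
                    )


if __name__ == "__main__":
    print(suggested_parser(SAMPLE))
    for row in list_matches(SAMPLE):
        print(row)
```

For a real report on disk you would read the file contents first (or use 
ET.parse on the path) instead of the inline SAMPLE string.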

	-hilmar

On Dec 3, 2004, at 10:17 AM, Mariano Latorre A wrote:

> I installed Bioperl 1.4 (and also the dependencies Heap and Graph). I
> need to parse InterProScan XML reports.
>
> When I run "make test" it passes the Interproscan_parser test OK. But
> when I run an InterProScan at EBI I get an XML file that cannot be
> parsed.
> Bioperl says:
>
> Can't call method "identifier" on an undefined value at
> /usr/lib/perl5/site_perl/5.8.3/Bio/Ontology/SimpleOntologyEngine.pm line 410.
>
>
> So I went to the test directory inside the Bioperl installation and
> compared the XML generated by EBI with the one provided in the
> Bioperl installation package, and noticed that they are totally
> different!
>
> I paste the beginnings of both files (as you'll see, they use different
> tags...):
>
> Thanks!
> Mariano
>
>
> 1.- the one provided for bioperl testing:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!-- edited with XML Spy v4.4 U (http://www.xmlspy.com) by LYNN WHITE
> (EMBL OUTSTATION THE EBI) -->
> <!DOCTYPE interprodb SYSTEM "interpro.dtd">
> <interprodb>
>   <release>
>     <dbinfo dbname="INTERPRO" version="5.1" entry_count="5630"
> file_date="12-JUL-2002 00:00:00"/>
>     <dbinfo dbname="SWISS" version="40.22" entry_count="110823"
> file_date="24-JUN-2002 00:00:00"/>
>     <dbinfo dbname="TREMBL" version="21.2" entry_count="671586"
> file_date="05-JUL-2002 00:00:00"/>
>     <dbinfo dbname="PRINTS" version="33.0" entry_count="1650"
> file_date="24-JAN-2002 00:00:00"/>
>     <dbinfo dbname="PREFILE" version="N/A" entry_count="252"
> file_date="18-JUL-2001 00:00:00"/>
>     <dbinfo dbname="PROSITE" version="17.5" entry_count="1565"
> file_date="21-JUN-2002 00:00:00"/>
>     <dbinfo dbname="PFAM" version="7.3" entry_count="3865"
> file_date="17-MAY-2002 00:00:00"/>
>     <dbinfo dbname="PRODOM" version="2001.3" entry_count="1346"
> file_date="28-JAN-2002 00:00:00"/>
>     <dbinfo dbname="SMART" version="3.1" entry_count="509"
> file_date="16-NOV-2000 00:00:00"/>
>     <dbinfo dbname="TIGRFAMs" version="1.2" entry_count="814"
> file_date="03-AUG-2001 00:00:00"/>
>   </release>
>   <interpro id="IPR000001" type="Domain" short_name="Kringle"
> protein_count="129">
>     <name>Kringle</name>
>     <abstract>
> Kringles are autonomous structural domains, found throughout the blood
>                clotting and fibrinolytic proteins.
> Kringle domains are believed to play a role in binding mediators (e.g.,
> membranes,
> other proteins or phospholipids), and in the regulation of proteolytic
> activity
> <cite idref="PUB00002414"/>, <cite idref="PUB00001541"/>, <cite
> idref="PUB00003257"/>.
> Kringle domains <cite idref="PUB00003400"/>, <cite
> idref="PUB00000803"/>, <cite idref="PUB00001620"/> are characterised by
> a triple loop, 3-disulphide bridge structure, whose  conformation is
> defined by a number of hydrogen bonds and small pieces of  
> anti-parallel
> beta-sheet. They are found in a varying number  of  copies,  in some
> serine proteases and
> plasma proteins.</abstract>
>     <example_list>
>       <example><db_xref dbkey="P00748" db="SWISS"/>Blood coagulation
> factor XII (Hageman factor) (1 copy)</example>
>       <example><db_xref dbkey="P00749" db="SWISS"/>Urokinase-type
> plasminogen activator (1 copy)</example>
>       <example><db_xref dbkey="Q08048" db="SWISS"/>Hepatocyte growth
> factor (HGF) (4 copies)</example>
>       <example><db_xref dbkey="Q04756" db="SWISS"/>Hepatocyte growth
> factor activator <cite idref="PUB00003400"/> (1 copy) <cite
> idref="PUB00002776"/></example>
>       <example>
>                                 <db_xref dbkey="P06867" db="SWISS"/>
> Plasminogen (5 copies)
>       </example>
>       <example>
>                                 <db_xref dbkey="P26927" db="SWISS"/>
> Hepatocyte growth factor like protein (4 copies) <cite
> idref="PUB00000355"/>
>
> 2.- The output from EBI INTERPRO:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <EBIInterProScanResults>
>         <Header>
>                 <program name="InterProScan" version="4.0"
> citation="PMID:11590104" />
>                 <parameters>
>                         <sequences total="1" />
>                         <databases total="11">
>                                 <database number="1" name="PRODOM"
> type="sequences" />
>                                 <database number="2" name="PRINTS"
> type="matrix" />
>                                 <database number="3" name="PIR"
> type="model" />
>                                 <database number="4" name="PFAM"
> type="model" />
>                                 <database number="5" name="SMART"
> type="model" />
>                                 <database number="6" name="TIGRFAMs"
> type="model" />
>                                 <database number="7" name="PROFILE"
> type="strings" />
>                                 <database number="8" name="PROSITE"
> type="strings" />
>                                 <database number="9" name="SUPERFAMILY"
> type="model" />
>                                 <database number="10" name="SIGNALP"
> type="model" />
>                                 <database number="11" name="TMHMM"
> type="model" />
>                         </databases>
>                 </parameters>
>         </Header>
> <interpro_matches>
>
>    <protein id="SAM" length="393" crc64="847CBC4BD0EAA1BC" >
>         <interpro id="IPR002133" name="S-adenosylmethionine synthetase"
> type="Family">
>           <classification id="GO:0004478" class_type="GO">
>             <category>Molecular Function</category>
>             <description>methionine adenosyltransferase
> activity</description>
>           </classification>
>           <classification id="GO:0005524" class_type="GO">
>             <category>Molecular Function</category>
>             <description>ATP binding</description>
>           </classification>
>           <classification id="GO:0006730" class_type="GO">
>             <category>Biological Process</category>
>             <description>one-carbon compound metabolism</description>
>           </classification>
>           <match id="PIRSF000497" name="Methionine adenosyltransferase"
> dbname="PIR">
>             <location start="2" end="387" score="2.6e-224" status="T"
> evidence="HMMPIR" />
>           </match>
>           <match id="PF00438.9" name="S-adenosylmethionine synthetase,
> N-te" dbname="PFAM">
>             <location start="2" end="102" score="2.7e-63" status="T"
> evidence="HMMPfam" />
>           </match>
>           <match id="PF02772.5" name="S-adenosylmethionine synthetase,
> cent" dbname="PFAM">
>             <location start="116" end="238" score="5.1e-97" status="T"
> evidence="HMMPfam" />
>           </match>
>           <match id="PF02773.5" name="S-adenosylmethionine synthetase,
> C-te" dbname="PFAM">
>             <location start="240" end="382" score="2e-83" status="T"
> evidence="HMMPfam" />
>           </match>
>           <match id="TIGR01034" name="metK: S-adenosylmethionine
> synthetase" dbname="TIGRFAMs">
>             <location start="5" end="393" score="6.2e-232" status="T"
> evidence="HMMTigr" />
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


