[Biojava-l] UniprotParser
Saif Ur-Rehman
su24 at st-andrews.ac.uk
Mon Sep 19 10:09:46 UTC 2011
Dear all,
I am having issues with the BioJava UniProt parser as detailed below:
Code:
BufferedReader br = new BufferedReader(new FileReader(files[index]));
Namespace ns = RichObjectFactory.getDefaultNamespace();
RichSequenceIterator iterator = RichSequence.IOTools.readUniProt(br, ns);
while (iterator.hasNext())
{
    try
    {
        // Parse the next UniProt entry from the file
        RichSequence rs = iterator.nextRichSequence();
    }
    catch (NoSuchElementException e)
    {
        // ignored
    }
    catch (BioException e)
    {
        e.printStackTrace();
    }
}
The file I am using was downloaded from:
ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_fungi.dat.gz
The parser handles some of the entries in the file correctly but throws an
exception on others.
Sample Exception stack trace:
*** Start of trace *************************
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
at uniprot.mp.main(mp.java:161)
Caused by: org.biojava.bio.seq.io.ParseException:
A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a bug
report to http://bugzilla.open-bio.org/
Format_object=org.biojavax.bio.seq.io.UniProtFormat
Accession=P53031
Id=
Comments=
Parse_block=RN [1]RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].RC STRAIN=NCYC
2512;RX MEDLINE=97082501; PubMed=8923737;
DOI=10.1002/(SICI)1097-0061(199610)12:13<1321::AID-YEA27>3.0.CO;2-6;RA
Rodriguez P.L., Ali R., Serrano R.;RT "CtCdc55p and CtHa13p: two putative
regulatory proteins from Candida
tropicalis with long acidic domains.";RL Yeast 12:1321-1329(1996).
Stack trace follows ....
at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:615)
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
... 1 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:486)
... 2 more
org.biojava.bio.BioException: Could not read sequence
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
at uniprot.mp.main(mp.java:161)
Caused by: org.biojava.bio.seq.io.ParseException: Name has not been supplied
********End of trace**********************************
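One possible reading of this trace, offered as an assumption rather than a
confirmed diagnosis: the DOI in the RX line of the Parse_block contains a
semicolon ("3.0.CO;2-6"), so a parser that splits the RX line on every ';' and
then on '=' would be left with a token "2-6" that has no second element. The
sketch below does not use BioJava and is not copied from UniProtFormat; the
class and method names are hypothetical. It only shows how such a split could
raise an ArrayIndexOutOfBoundsException on this input, and one tolerant
alternative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RxSplitDemo {
    // RX line content taken from the Parse_block in the trace above.
    static final String RX = "MEDLINE=97082501; PubMed=8923737; "
            + "DOI=10.1002/(SICI)1097-0061(199610)12:13<1321::AID-YEA27>3.0.CO;2-6;";

    // Assumed naive strategy: split on every ';', then on '='.
    static Map<String, String> parseNaive(String rx) {
        Map<String, String> refs = new LinkedHashMap<>();
        for (String token : rx.split(";")) {
            String[] parts = token.trim().split("=");
            // For the token "2-6" there is no '=', so parts has length 1
            // and parts[1] throws ArrayIndexOutOfBoundsException.
            refs.put(parts[0], parts[1]);
        }
        return refs;
    }

    // Tolerant strategy: split only on ';' followed by whitespace or end of
    // input, then split each token at its first '='.
    static Map<String, String> parseTolerant(String rx) {
        Map<String, String> refs = new LinkedHashMap<>();
        for (String token : rx.trim().split(";(?=\\s|$)")) {
            String t = token.trim();
            int eq = t.indexOf('=');
            if (t.isEmpty() || eq < 0) {
                continue; // skip empty or malformed tokens instead of failing
            }
            refs.put(t.substring(0, eq), t.substring(eq + 1));
        }
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(parseTolerant(RX));
        try {
            parseNaive(RX);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("naive split failed: " + e);
        }
    }
}
```

If this guess is right, any entry whose RX line carries a DOI with an embedded
semicolon would fail in the same way, while entries without one would parse.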
An example of an ID that worked is:
ZYM1_SCHPO
while an ID that didn't work is:
ZUO1_YEAST
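To narrow down which records fail, one option is to cut the flat file into
individual entries and hand them to the parser one at a time. The following is
a minimal sketch, assuming the usual UniProtKB flat-file layout in which each
entry starts with an ID line and ends with a line containing only "//". The
class and method names are hypothetical, BioJava is not used, and everything is
held in memory, so it is meant for small excerpts rather than the full file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class UniProtEntrySplitter {
    // Collects the text of each entry, using the "//" line as terminator.
    // Trailing lines without a closing "//" are dropped.
    static List<String> splitEntries(BufferedReader reader) throws IOException {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            current.append(line).append('\n');
            if (line.equals("//")) {
                entries.add(current.toString());
                current.setLength(0);
            }
        }
        return entries;
    }

    // Returns the entry name from the ID line (e.g. ZUO1_YEAST),
    // or an empty string if the entry has no ID line.
    static String entryName(String entry) {
        for (String line : entry.split("\n")) {
            if (line.startsWith("ID ")) {
                String[] fields = line.trim().split("\\s+");
                return fields.length > 1 ? fields[1] : "";
            }
        }
        return "";
    }
}
```

Each entry string could then be wrapped in a BufferedReader over a
StringReader and passed to RichSequence.IOTools.readUniProt(br, ns) as in the
code above, logging entryName(entry) whenever a BioException is caught.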
Thanks a lot in advance.
Cheers,
Saif
--
Saif Ur-Rehman
Centre for Evolution, Genes and Genomics
Harold Mitchell Building
University of St Andrews
St Andrews
Fife
KY16 9TH
UK
Tel: +44 131 5572556
Fax: +44 1334 463366