[Biojava-dev] Re: Uniprot Format

Andreas Prlic ap3 at sanger.ac.uk
Sat Jun 7 10:35:05 UTC 2008


Hi Ekta,

You can parse through the whole file if you add a try/catch around
seqs.nextRichSequence() (see below).

Still, a few records did in fact cause problems for the parser.
I committed a quick catch for these to SVN.

The two main problems were:
* Some DOI reference lines contain extra ; characters, which confuse
  the parser. For the moment such lines are ignored.
* Some locations have a ? character in front of them. I added a
  catch for these to the UniProtLocationParser.
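
To illustrate the second point, here is a minimal, self-contained sketch
of how a position token with a leading ? could be read tolerantly. It is
not the code committed to UniProtLocationParser: the class FuzzyPosition
and its parse method are hypothetical names made up for this example, and
it assumes the ? simply marks the coordinate that follows as uncertain.

```java
/** Hypothetical value type, not part of BioJava: a coordinate plus an "uncertain" flag. */
final class FuzzyPosition {
    final int value;          // numeric coordinate
    final boolean uncertain;  // true when the token was prefixed with '?'

    FuzzyPosition(int value, boolean uncertain) {
        this.value = value;
        this.uncertain = uncertain;
    }

    /** Parses tokens such as "42" or "?31"; a bare "?" carries no coordinate and is rejected. */
    static FuzzyPosition parse(String token) {
        String t = token.trim();
        boolean uncertain = t.startsWith("?");
        if (uncertain) {
            t = t.substring(1); // drop the marker, keep the digits
        }
        if (t.isEmpty()) {
            throw new IllegalArgumentException("No coordinate in token: " + token);
        }
        return new FuzzyPosition(Integer.parseInt(t), uncertain);
    }
}
```

With this sketch, FuzzyPosition.parse("?31") yields the value 31 with the
uncertain flag set, while a token that is only "?" is reported as an
error rather than guessed at.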

Andreas


while (seqs.hasNext()) {
	RichSequence seq = null;
	try {
		seq = seqs.nextRichSequence();
	} catch (Exception e) {
		// report the record that failed to parse, then move on to the next one
		e.printStackTrace();
		continue;
	}
	// your code here
}
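
The same skip-and-continue idea can be written as a self-contained sketch
that also counts the skipped records, so they do not disappear silently.
TolerantParse, Result and parseAll are hypothetical names and not part of
BioJava. The sketch assumes the source advances to the next record before
parsing is attempted, so a bad record cannot stall the loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper, not part of BioJava: parse every record, skipping the ones that fail. */
final class TolerantParse {

    /** Holds the records that parsed and the number that were skipped. */
    static final class Result<T> {
        final List<T> parsed = new ArrayList<>();
        int skipped = 0;
    }

    static <R, T> Result<T> parseAll(Iterator<R> records, Function<R, T> parser) {
        Result<T> result = new Result<>();
        while (records.hasNext()) {
            R raw = records.next(); // advance first, then try to parse
            try {
                result.parsed.add(parser.apply(raw));
            } catch (RuntimeException e) {
                result.skipped++;   // keep a count instead of dropping silently
                System.err.println("Skipping record: " + e.getMessage());
            }
        }
        return result;
    }
}
```

For example, feeding it the strings "1", "x" and "3" with
Integer::parseInt as the parser gives two parsed values and one skipped
record.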





> Hi Andreas,
>
> The file I'm trying to parse is uniprot_sprot_human.dat.gz, available
> from
> ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/
>
>
> Sorry, it's too big to attach. I have attached my entire code (the one
> that is doing the parsing).
>
> The error message is :
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.biojava.bio.seq.io.ParseException.newMessage(Ljava/lang/Class;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
> 	at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:614)
> 	at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> 	at fileReader.ParseUniprot.UniprotFromFile(ParseUniprot.java:62)
> 	at fileReader.WriteToFile.main(WriteToFile.java:50)
>
>
> The code gets stuck after the uniprotID '5HT2B_HUMAN'  and gives the
> above exception.
>
> Thanks a ton, appreciate the help.
>
> Ekta
>
>>>> Andreas Prlic <ap3 at sanger.ac.uk> 06/06/08 1:00 PM >>>
> Hi Ekta,
>
> We need more info in order to help. Can you provide the file you want
> to parse, the code you use for parsing, and also the exception trace?
>
> Andreas
>
>
>
> On 6 Jun 2008, at 11:40, Ekta Jain wrote:
>
>> Hello there,
>> There seems to be a bug in UniProtFormat. It gives me an error at
>> line 614. When I remove the records from the UniProt text data file
>> where my parser gets stuck, it all works okay.
>>
>> How can this be fixed?
>>
>> Ekta
>>
>> The Institute of Cancer Research: Royal Cancer Hospital, a
>> charitable Company Limited by Guarantee, Registered in England
>> under Company No. 534147 with its Registered Office at 123 Old
>> Brompton Road, London SW7 3RP.
>>
>> This e-mail message is confidential and for use by the addressee
>> only.  If the message is received by anyone other than the
>> addressee, please return the message to the sender by replying to
>> it and then delete the message from your computer and network.
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> -----------------------------------------------------------------------
>
> Andreas Prlic      Wellcome Trust Sanger Institute
>                                Hinxton, Cambridge CB10 1SA, UK
>                                +44 (0) 1223 49 6891
>
> -----------------------------------------------------------------------
>
>
>
>
> -- 
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
>
>
>
> <ParseUniprot.java>



