[Biojava-dev] Re: Uniprot Format
Andreas Prlic
ap3 at sanger.ac.uk
Sat Jun 7 10:35:05 UTC 2008
Hi Ekta,
You can parse the whole file if you wrap seqs.nextRichSequence() in a
try/catch (see below).
That said, a few records did cause problems for the parser. I have
committed a quick workaround for these to SVN.
The two main problems were:
* Some DOI reference lines contain extra ';' characters, which
confuse the parser. For the moment such lines are ignored.
* Some locations have a '?' character in front of them. I added a
catch for these to the UniProtLocationParser.
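To illustrate the two cases above, here is a rough, self-contained sketch. It is not the code committed to SVN and does not use the BioJava classes; the class name, both method names, and the sample strings (including the DOI) are made up for the example. It assumes a reference line is a list of key=value pairs separated by ';', and that a location is a number optionally preceded by '?'.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of BioJava).
final class LenientUniProtSketch {

    /** Drops one leading '?' so the rest of the location can be parsed as usual. */
    static String stripUncertainPrefix(String rawLocation) {
        String trimmed = rawLocation.trim();
        return trimmed.startsWith("?") ? trimmed.substring(1).trim() : trimmed;
    }

    /**
     * Splits "KEY=value; KEY=value;" into a map. If a stray ';' has cut a
     * value in two (a token without '='), the whole line is ignored and an
     * empty map is returned.
     */
    static Map<String, String> parseCrossRefsOrIgnore(String rxValue) {
        Map<String, String> refs = new LinkedHashMap<>();
        for (String token : rxValue.split(";")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue; // nothing between two separators
            }
            int eq = t.indexOf('=');
            if (eq <= 0) {
                return Collections.emptyMap(); // malformed: ignore the line
            }
            refs.put(t.substring(0, eq), t.substring(eq + 1));
        }
        return refs;
    }

    public static void main(String[] args) {
        // Invented sample inputs.
        System.out.println(stripUncertainPrefix("?123"));
        System.out.println(parseCrossRefsOrIgnore("PubMed=1; DOI=10.1000/example;"));
        System.out.println(parseCrossRefsOrIgnore("PubMed=1; DOI=10.1000/example.CO;2-X;"));
    }
}
```

Ignoring the whole line loses the other cross-references on it, which is the trade-off of a quick fix; a more careful parser would split only on ';' followed by a known key.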
Andreas
while (seqs.hasNext()) {
    RichSequence seq = null;
    try {
        seq = seqs.nextRichSequence();
    } catch (Exception e) {
        // log the failing record and move on to the next one
        e.printStackTrace();
        continue;
    }
    // your code here
}
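If you want to try the skip-and-continue pattern without BioJava on the classpath, the following is a minimal sketch. RecordSource is a hypothetical stand-in for the sequence iterator, and the rule that records starting with "BAD" fail to parse is invented for the demo. It also collects the failures so you can report how many records were skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustration only: a stand-in for the BioJava iterator loop.
final class SkipBadRecordsSketch {

    /** Hypothetical source whose nextRecord() may throw on a bad record. */
    interface RecordSource {
        boolean hasNext();
        String nextRecord() throws Exception;
    }

    /** Wraps a list; any record starting with "BAD" fails to parse (demo rule). */
    static RecordSource fromList(List<String> raw) {
        final Iterator<String> it = raw.iterator();
        return new RecordSource() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public String nextRecord() throws Exception {
                String record = it.next(); // advance first, so a failure cannot repeat forever
                if (record.startsWith("BAD")) {
                    throw new Exception("cannot parse: " + record);
                }
                return record;
            }
        };
    }

    /** Reads every record, skipping failures and noting them in 'failures'. */
    static List<String> readAll(RecordSource source, List<String> failures) {
        List<String> parsed = new ArrayList<>();
        while (source.hasNext()) {
            String record;
            try {
                record = source.nextRecord();
            } catch (Exception e) {
                failures.add(e.getMessage());
                continue;
            }
            parsed.add(record);
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> failures = new ArrayList<>();
        List<String> ok = readAll(fromList(Arrays.asList("A", "BAD1", "B")), failures);
        System.out.println(ok.size() + " parsed, " + failures.size() + " skipped");
    }
}
```

Note that catch (Exception e) does not catch subclasses of java.lang.Error. A NoSuchMethodError is one of those, and it typically means the classes on the classpath come from mismatched versions of a library, so it is better fixed at the classpath than caught in the loop.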
> Hi Andreas,
>
> The file I'm trying to parse is uniprot_sprot_human.dat.gz, available
> from
> ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/
>
>
> Sorry, it's too big to attach. I have attached my entire code (the one
> that is doing the parsing).
>
> The error message is :
>
> Exception in thread "main" java.lang.NoSuchMethodError: org.biojava.bio.seq.io.ParseException.newMessage(Ljava/lang/Class;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
>     at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:614)
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>     at fileReader.ParseUniprot.UniprotFromFile(ParseUniprot.java:62)
>     at fileReader.WriteToFile.main(WriteToFile.java:50)
>
>
> The code gets stuck after the uniprotID '5HT2B_HUMAN' and gives the
> above exception.
>
> Thanks a ton, appreciate the help.
>
> Ekta
>
>>>> Andreas Prlic <ap3 at sanger.ac.uk> 06/06/08 1:00 PM >>>
> Hi Ekta,
>
> We need more info in order to help. Can you provide the file you want
> to parse, the code you use for parsing, and also the exception trace?
>
> Andreas
>
>
>
> On 6 Jun 2008, at 11:40, Ekta Jain wrote:
>
>> Hello there,
>> there seems to be a bug in the UniprotFormat. It gives me an error at
>> line 614. In the Uniprot text data file, when I remove the records
>> (where my parser gets stuck), it all works okay.
>>
>> How can this be fixed?
>>
>> Ekta
>>
>> The Institute of Cancer Research: Royal Cancer Hospital, a
>> charitable Company Limited by Guarantee, Registered in England
>> under Company No. 534147 with its Registered Office at 123 Old
>> Brompton Road, London SW7 3RP.
>>
>> This e-mail message is confidential and for use by the addressee
>> only. If the message is received by anyone other than the
>> addressee, please return the message to the sender by replying to
>> it and then delete the message from your computer and network.
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> -----------------------------------------------------------------------
>
> Andreas Prlic Wellcome Trust Sanger Institute
> Hinxton, Cambridge CB10 1SA, UK
> +44 (0) 1223 49 6891
>
> -----------------------------------------------------------------------
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
>
>
>
> <ParseUniprot.java>
-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------