[Biojava-l] Parsing exception reading sequence from GenbankRichSequenceDB
George Waldon
gwaldon at geneinfinity.org
Fri Feb 17 17:56:06 UTC 2012
Hi Scott,
Yes, well done. You need to fix rettype too. So, if I have it correct,
we should uncomment and have:
rettype = "gb"
retmode = "text"
and existing code should not be broken. What do you think? I can
commit if you do not have a developer account.
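
To make the idea concrete, here is a minimal sketch of a URL builder that always states rettype and retmode (here "gb" and "text", the values from your workaround). This is not the FetchURL code itself: the class name, the efetch endpoint and the "nucleotide" db value are my assumptions for illustration, and the real FetchURL may build its base URL differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical stand-in for FetchURL; not the BioJava class itself.
final class EfetchUrlSketch {

    // Assumption: the public E-utilities efetch endpoint.
    static final String BASE_URL =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";

    // Builds an efetch URL that always states rettype and retmode,
    // so the request does not depend on per-database server defaults.
    static String build(String db, String id, String rettype, String retmode) {
        return BASE_URL
                + "db=" + encode(db)
                + "&id=" + encode(id)
                + "&rettype=" + encode(rettype)
                + "&retmode=" + encode(retmode);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "nucleotide" as the db value is an assumption for illustration.
        System.out.println(build("nucleotide", "NM_001110.2", "gb", "text"));
    }
}
```

The point of the design is only that both parameters are always present in the request, so a future change of defaults on the Entrez side should not change what the parser receives.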
Thanks,
- George
Quoting Scott Frees <sfrees at ramapo.edu>:
> George - Thanks for your response.
>
> I think I tracked down the problem. When building the FetchURL,
> GenbankRichSequenceDB uses "genbank" as the db. In the
> org.biojava.bio.seq.db.FetchURL constructor, rettype and retmode are
> specifically not set when given "genbank" - see lines 54-55 commented
> out.
>
> // rettype = format;
> // retmode = format;
>
> Entrez updated their API on Wednesday
> (http://www.ncbi.nlm.nih.gov/books/NBK25501/), and the release notes
> say they have set a default retmode for each database. I'm new to
> biojava and entrez, but I can only assume that the "genbank" db used
> to return sequences as text by default, which is why FetchURL doesn't
> include the parameter in the URL it builds. It looks like the default
> is now XML, which breaks the GenbankRichSequenceDB parser.
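
A quick way to confirm this kind of diagnosis is to peek at the start of the response before it reaches the flat-file parser. The sketch below is a hypothetical helper, not part of BioJava: it marks the reader, looks at the first characters, and resets, so the parser would still see the whole stream. The sample inputs in main are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper; the class and method names are made up for this sketch.
final class ResponseSniffSketch {

    // Returns true if the response appears to start with XML markup
    // rather than a flat-file record. The reader position is restored.
    static boolean looksLikeXml(BufferedReader reader) {
        try {
            reader.mark(512);                      // allow reset after the peek
            char[] head = new char[256];
            int n = reader.read(head, 0, head.length);
            reader.reset();                        // rewind for the real parser
            return n > 0 && new String(head, 0, n).trim().startsWith("<");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative inputs: the parse block from the trace, and a flat-file style line.
        BufferedReader xml = new BufferedReader(new StringReader("<?xml version=\"1.0\"?>"));
        BufferedReader flat = new BufferedReader(new StringReader("LOCUS       NM_001110"));
        System.out.println(looksLikeXml(xml));
        System.out.println(looksLikeXml(flat));
    }
}
```

A check like this would let the caller fail with a clear "got XML, expected GenBank text" message instead of the substring error buried in the parser.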
>
> I proved it out by subclassing GenbankRichSequenceDB to set the
> retmode parameter as text, and the problem is resolved.
>
> @Override
> protected URL getAddress(String id) throws MalformedURLException {
>     FetchURL seqURL = new FetchURL("Genbank", "text");
>     String baseurl = seqURL.getbaseURL();
>     String db = seqURL.getDB();
>     // added retmode=text
>     String url = baseurl + db + "&id=" + id + "&rettype=gb&retmode=text&tool="
>             + getTool() + "&email=" + getEmail();
>     return new URL(url);
> }
>
> I think a more elegant solution would be to simply fix FetchURL to use
> the retmode parameter.
>
> Regards -
> Scott
>
> On Thu, Feb 16, 2012 at 8:53 PM, George Waldon
> <gwaldon at geneinfinity.org> wrote:
>> Hello Scott,
>>
>> This appears to be an exception thrown by the parser. Is there a way you can
>> fetch the sequence(s) as a text file before the exception occurs? It would
>> be interesting to see if you can reproduce the exception; you can send me
>> the file if you want.
>>
>> Regards,
>> George
>>
>> Quoting Scott Frees <sfrees at ramapo.edu>:
>>
>>> Hello -
>>>
>>> I have developed an application that searches and compares
>>> g-quadruplexes within mRNA. The web application has been running
>>> without any problems on several different web servers for over a year.
>>> Suddenly, just this week, it is unable to download sequence data
>>> using GenbankRichSequenceDB - has anyone else had this problem?
>>>
>>> We are using BioJava 1.8.1
>>>
>>> Below is the exception trace, and the code that follows is a small
>>> test app that generates the exception. This code worked without any
>>> problems prior to Tuesday this week, and we haven't made any
>>> modification to our application.
>>> ------------------------------------------------------
>>> org.biojava.bio.BioException: Failed to read Genbank sequence
>>>     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:163)
>>>     at Tester.main(Tester.java:11)
>>> Caused by: org.biojava.bio.BioException: Could not read sequence
>>>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>>>     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:159)
>>>     ... 1 more
>>> Caused by: org.biojava.bio.seq.io.ParseException:
>>>
>>> A Exception Has Occurred During Parsing.
>>> Please submit the details that follow to biojava-l at biojava.org or post
>>> a bug report to http://bugzilla.open-bio.org/
>>>
>>> Format_object=org.biojavax.bio.seq.io.GenbankFormat
>>> Accession=null
>>> Id=null
>>> Comments=Bad section
>>> Parse_block=<?xml version="1.0"?>
>>> Stack trace follows ....
>>>
>>>     at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:620)
>>>     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:279)
>>>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>>     ... 2 more
>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -4
>>>     at java.lang.String.substring(Unknown Source)
>>>     at java.lang.String.substring(Unknown Source)
>>>     at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:610)
>>>     ... 4 more
>>> -----------------------------
>>>
>>>
>>> import org.biojava.bio.BioException;
>>> import org.biojava.bio.seq.db.IllegalIDException;
>>> import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;
>>> import org.biojavax.bio.seq.RichSequence;
>>>
>>> public class Tester {
>>>     public static void main(String args[]) {
>>>         String id = "NM_001110.2"; // Issue occurs with any ID
>>>
>>>         GenbankRichSequenceDB ncbi = new GenbankRichSequenceDB();
>>>         try {
>>>             RichSequence rs = ncbi.getRichSequence(id);
>>>             System.out.println(rs.seqString());
>>>         } catch (IllegalIDException e) {
>>>             e.printStackTrace();
>>>         } catch (BioException e) {
>>>             e.printStackTrace();
>>>         }
>>>     }
>>> }
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>>
>>
>> --------------------------------
>> George Waldon
>>
>>
>
--------------------------------
George Waldon