[Biojava-l] Parsing Genbank/EMBL/XML Sequences from binary NCBI ASN.1 daily update files
Seth Johnson
johnson.biotech at gmail.com
Fri Jun 2 18:46:26 UTC 2006
Hi Mark,
Thank you for your suggestions. I've followed them, and doing so seems
to have uncovered a bug that causes an exception in the readINSDseqDNA
parser.
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=94481355
The problem with the above sequence in INSDseq format is caused by the
presence of <INSDQualifier_name> tags without the corresponding
<INSDQualifier_value> tags:
<INSDQualifier>
<INSDQualifier_name>environmental_sample</INSDQualifier_name>
</INSDQualifier>
I have not checked whether it's handled correctly by the other parsers
when it is converted from the original NCBI ASN.1 format.
Could the code be adjusted so that, when there is no
<INSDQualifier_value> tag, it assumes the value to be 'null'?
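
To illustrate what I mean, here is a minimal stand-alone sketch using the
JDK's SAX parser. It is not the BioJavaX code (I have not looked at how the
INSDseq parser is structured internally), and the class name QualifierSketch
and its parse method are made up for this example. The only point is that the
value is reset for every <INSDQualifier> and simply stays null when no
<INSDQualifier_value> element follows the name.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of BioJava(X).
final class QualifierSketch {

    /** Collects qualifier name/value pairs; the value is null when the tag is absent. */
    static Map<String, String> parse(String xml) throws Exception {
        final Map<String, String> qualifiers = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String name;
            private String value;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0);
                if (qName.equals("INSDQualifier")) {
                    // Start every qualifier with no name and no value.
                    name = null;
                    value = null;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("INSDQualifier_name")) {
                    name = text.toString().trim();
                } else if (qName.equals("INSDQualifier_value")) {
                    value = text.toString().trim();
                } else if (qName.equals("INSDQualifier") && name != null) {
                    // value is still null if no <INSDQualifier_value> was seen.
                    qualifiers.put(name, value);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return qualifiers;
    }
}
```

With the fragment above wrapped in a single root element, parse(...) returns a
map that contains environmental_sample with a null value. A real parser would
need a list rather than a map, since qualifiers such as db_xref can repeat
within one feature.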
Regards,
Seth
On 6/1/06, mark.schreiber at novartis.com <mark.schreiber at novartis.com> wrote:
> Hi Seth -
>
> The BioJavaX parsers are still quite new and have not been heavily tested
> so your experiences can help us quite a lot. The parsers were initially
> designed to be quite strict and follow the GenBank etc specifications.
> However, there are often minor variations to those specs which cause
> things to break.
>
> To help us find the bugs, can you make sure you are using the very latest
> version of biojava from CVS? For example, I was under the impression that
> the author = null problem had been solved. In each case an example file
> and the full stack trace are very useful as well. In some cases you have
> provided these, so we have a starting point.
>
> Also, if you have ideas on ways to fix the problems, your suggestions would
> be greatly appreciated. We only have a very small team of active
> developers, many of whom are unfortunately very busy just now.
>
> Hopefully we can get to this soon.
>
> - Mark
>
> "Seth Johnson" <johnson.biotech at gmail.com>
> Sent by: biojava-l-bounces at lists.open-bio.org
> 06/02/2006 06:03 AM
>
>
> To: biojava-l at lists.open-bio.org
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-l] Parsing Genbank/EMBL/XML Sequences from binary NCBI ASN.1
> daily update files
>
>
> Hi All,
>
> I'm a newbie to the whole BioJava(X) API and was hoping to get some
> clarification on several issues that I'm having.
> I am developing a parser that would take as input "NCBI Incremental
> ASN.1 Sequence Updates to Genbank" files (
> ftp://ftp.ncbi.nih.gov/ncbi-asn1/daily-nc ) , gunzip them, and use the
> ASN2GB converter (
> ftp://ftp.ncbi.nih.gov/asn1-converters/by_program/asn2gb ) to convert
> resulting sequences to a format parsable by BioJava(X) (
> http://www.penguin-soft.com/penguin/man/1/asn2gb.html ). This is where
> my problems start.
>
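
For reference, the gunzip step of that pipeline needs nothing beyond the JDK.
The sketch below is only that one step; the class name GunzipSketch and the
file names in the usage note are placeholders I made up, and the asn2gb
conversion itself is left out because it is an external program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for the "gunzip them" step only.
final class GunzipSketch {

    /** Decompresses gzFile into target, replacing target if it already exists. */
    static void gunzip(Path gzFile, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Usage would look like GunzipSketch.gunzip(Paths.get("update.aso.gz"),
Paths.get("update.aso")), after which the decompressed file is handed to
ASN2GB.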
> ISSUE 1:
> I've tried to parse all of the formats that ASN2GB outputs ( GenBank
> (default) , EMBL, nucleotide GBSet (XML), nucleotide INSDSet (XML),
> tiny seq (XML) ) using either BioJava or BioJavaX API. Only GenBank
> format is recognized by the
> "RichSequence.IOTools.readGenbankDNA(inBuf,gbNspace)" function with
> some exceptions that I'll describe in issue #2. This is the code that
> I'm using to parse, for example, the EMBL output:
>
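> import java.io.BufferedReader;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
> import org.biojavax.Namespace;
> import org.biojavax.RichObjectFactory;
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
>
> // Open the EMBL file produced by ASN2GB
> BufferedReader inBuf = new BufferedReader(new FileReader("embl_output.emb"));
> // Namespace that the parsed sequences are assigned to
> Namespace gbNspace = (Namespace) RichObjectFactory.getObject(
>         SimpleNamespace.class, new Object[]{"gbSpace"});
> RichSequenceIterator gbSeqs = RichSequence.IOTools.readEMBLDNA(inBuf, gbNspace);
> while (gbSeqs.hasNext()) {
>     try {
>         RichSequence rs = gbSeqs.nextRichSequence();
>         // Further processing of the RichSequence object from here
>     } catch (BioException be) {
>         be.printStackTrace();
>     }
> }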
>
> The multi-sequence EMBL file looks like this:
> ---------------------------------------------------------------------------------
> ID DQ472184 standard; DNA; INV; 546 BP.
> XX
> AC DQ472184;
> XX
> SV DQ472184.1
> DT 15-MAY-2006
> XX
> DE Trypanosoma cruzi strain CL Brener actin-related protein 3 (ARC21) gene,
> DE complete cds.
> XX
> KW .
> XX
> OS Trypanosoma cruzi strain CL Brener
> OC Eukaryota; Euglenozoa; Kinetoplastida; Trypanosomatidae; Trypanosoma;
> OC Schizotrypanum.
> XX
> RN [1]
> RP 1-546
> RA De Melo L.D.B.;
> RT "Actin of Trypanosoma cruzi: ubiquitous actin-binding proteins";
> RL Unpublished.
> XX
> RN [2]
> RP 1-546
> RA De Melo L.D.B.;
> RT ;
> RL Submitted (03-APR-2006) to the EMBL/GenBank/DDBJ databases.
> RL Instituto de Biofisica Carlos Chagas Filho, Universidade Federal do Rio
> RL de Janeiro, Cidade Universitaria, CCS, Bl.G, Sl.G157, Rio de Janeiro, RJ
> RL 21949-900, Brazil
> XX
> FH Key Location/Qualifiers
> FH
> FT source 1..546
> FT /organism="Trypanosoma cruzi strain CL Brener"
> FT /mol_type="genomic DNA"
> FT /strain="CL Brener"
> FT /db_xref="taxon:353153"
> FT gene <1..>546
> FT /gene="ARC21"
> FT /note="TcARC21"
> FT mRNA <1..>546
> FT /gene="ARC21"
> FT /product="actin-related protein 3"
> FT CDS 1..546
> FT /gene="ARC21"
> FT /note="actin-binding protein; ARPC3 21 kDa; putative
> FT member of Arp2/3 complex"
> FT /codon_start=1
> FT /product="actin-related protein 3"
> FT /protein_id="ABF13401.1"
> FT /db_xref="GI:93360014"
> FT /translation="MHSRWNGYEESSLLGCGVYPLRRTSRLTPPGPAPRMDEMIEEG
> FT EEEPQDIVDEAFYFFKPHMFFRNFPIKGAGDRVILYLTMYLHECLKKIVQLKREEAH
> FT SVLLNYATMPFASPGEKDFPFNAFFPAGNEEEQEKWREYAKQLRLEANARLIEKVFL
> FT FPEKDGTGNKFWMAFAKRPFLASS"
> atgcacagca ggtggaatgg gtatgaagaa agtagtcttt tgggctgcgg tgtttatccg 60
> cttcgccgca cgtcacggct cactccaccc ggccctgcac cgcggatgga tgaaatgatt 120
> gaggagggcg aagaggagcc acaagacatt gttgacgagg cattttactt ttttaagccc 180
> cacatgtttt ttcgtaattt tcccattaag ggtgctggtg atcgtgtcat tctgtacttg 240
> acgatgtacc ttcatgagtg tttgaagaaa attgtccagt tgaagcgtga agaggcccat 300
> tctgtgcttc ttaactacgc tacgatgccg tttgcatcac caggggaaaa ggactttccg 360
> tttaacgcgt ttttccctgc tgggaatgag gaggaacaag aaaaatggcg agagtatgca 420
> aaacagcttc gattggaggc caacgcacgt ctcattgaga aggtttttct ttttccagag 480
> aaggacggca ccggaaacaa gttctggatg gcgtttgcga agaggccttt cttggcttct 540
> agttag 546
> //
> ID DQ472185 standard; DNA; INV; 543 BP.
> XX
> AC DQ472185;
> XX
> SV DQ472185.1
> DT 15-MAY-2006
> XX
> DE Trypanosoma cruzi strain CL Brener actin-related protein 4 (ARC20) gene,
> DE complete cds.
> XX
> KW .
> XX
> OS Trypanosoma cruzi strain CL Brener
> OC Eukaryota; Euglenozoa; Kinetoplastida; Trypanosomatidae; Trypanosoma;
> OC Schizotrypanum.
> XX
> RN [1]
> RP 1-543
> RA De Melo L.D.B.;
> RT "Actin of Trypanosoma cruzi: ubiquitous actin-binding proteins";
> RL Unpublished.
> XX
> RN [2]
> RP 1-543
> RA De Melo L.D.B.;
> RT ;
> RL Submitted (03-APR-2006) to the EMBL/GenBank/DDBJ databases.
> RL Instituto de Biofisica Carlos Chagas Filho, Universidade Federal do Rio
> RL de Janeiro, Cidade Universitaria, CCS, Bl.G, Sl.G157, Rio de Janeiro, RJ
> RL 21949-900, Brazil
> XX
> FH Key Location/Qualifiers
> FH
> FT source 1..543
> FT /organism="Trypanosoma cruzi strain CL Brener"
> FT /mol_type="genomic DNA"
> FT /strain="CL Brener"
> FT /db_xref="taxon:353153"
> FT gene <1..>543
> FT /gene="ARC20"
> FT /note="TcARC20"
> FT mRNA <1..>543
> FT /gene="ARC20"
> FT /product="actin-related protein 4"
> FT CDS 1..543
> FT /gene="ARC20"
> FT /note="actin-binding protein; ARPC4 20 kDa; putative
> FT member of Arp2/3 complex"
> FT /codon_start=1
> FT /product="actin-related protein 4"
> FT /protein_id="ABF13402.1"
> FT /db_xref="GI:93360016"
> FT /translation="MATAYLPYYDCIKCTLHAALCIGNYPSCTVERHNKPEVEVADH
> FT LENNGEIKVQDFLLNPIRIVRSEQESCLIEPSINSTRISVSFLKSDAIAEIIARKYV
> FT GFLAQRAKQFHILRKKPIPGYDISFLISHEEVETMHRNRIIQFIITFLMDIDADIAA
> FT MKLNVNQRARRAAMEFFLALNFT"
> atggcaaccg cctatttgcc ttactacgac tgcatcaagt gcacgttgca cgcggctttg 60
> tgcatcggga attatccttc atgtaccgtg gagcgtcata ataaaccaga agttgaggtt 120
> gcagaccatc tggagaataa tggtgaaata aaagtacaag atttccttct taaccccata 180
> cgcattgtgc gttcagaaca ggaaagttgt cttattgaac ctagtataaa cagcacacgc 240
> atatctgtat cgtttctcaa gagcgacgct attgcagaga ttattgcccg aaagtacgtt 300
> ggatttttag ctcagcgagc caaacagttt cacatcttga gaaaaaagcc tattccggga 360
> tatgatataa gttttttgat ttctcacgag gaagtagaaa caatgcatag gaataggatt 420
> attcaattta taattacttt cttgatggat attgatgctg acattgctgc aatgaagttg 480
> aatgtgaatc aacgtgcacg tcgagcagcg atggaattct ttcttgcatt gaatttcaca 540
> tga 543
> //
> -----------------------------------------------------------------------
> I get the exception message "Could Not Read Sequence". The same thing
> happens if I use the readINSDSetDNA reader instead of the readEMBLDNA one
> with the following INSDset file (beginning of the file):
>
> <?xml version="1.0"?>
> <!DOCTYPE INSDSeq PUBLIC "-//NCBI//INSD INSDSeq/EN" "INSD_INSDSeq.dtd">
> <INSDSeq>
> <INSDSeq_locus>DQ022078</INSDSeq_locus>
> <INSDSeq_length>16729</INSDSeq_length>
> <INSDSeq_moltype>DNA</INSDSeq_moltype>
> <INSDSeq_topology>linear</INSDSeq_topology>
> <INSDSeq_division>ENV</INSDSeq_division>
> <INSDSeq_update-date>15-MAY-2006</INSDSeq_update-date>
> <INSDSeq_create-date>15-MAY-2006</INSDSeq_create-date>
> <INSDSeq_definition>Uncultured bacterium WWRS-2005 putative
> aminoglycoside phosphotransferase (a3.001), putative oxidoreductase
> (a3.002), putative oxidoreductase (a3.003), putative beta-lactamase
> class C (estA3), putative permease (a3.005), putative transmembrane
> signal peptide (a3.006), thiol-disulfide isomerase (a3.007), histone
> acetyltransferase HPA2 (a3.008), putative enzyme (a3.009), putative
> asparaginase (a3.010), hypothetical protein (a3.011), hypothetical
> protein (a3.012), putative membrane protease subunit (a3.013),
> putative haloalkane dehalogenase (a3.014), putative transcriptional
> regulator (a3.015), putative peptidyl-dipeptidase Dcp (a3.016), and
> hypothetical protein (a3.017) genes, complete cds</INSDSeq_definition>
> <INSDSeq_primary-accession>DQ022078</INSDSeq_primary-accession>
> <INSDSeq_other-seqids>
> <INSDSeqid>gb|DQ022078.1|</INSDSeqid>
> <INSDSeqid>gi|71842722</INSDSeqid>
> </INSDSeq_other-seqids>
> <INSDSeq_keywords>
> <INSDKeyword>ENV</INSDKeyword>
> </INSDSeq_keywords>
> <INSDSeq_references>
> <INSDReference>
> <INSDReference_reference>?</INSDReference_reference>
> <INSDReference_position>1..16729</INSDReference_position>
> <INSDReference_authors>
> <INSDAuthor>Schmeisser,C.</INSDAuthor>
> <INSDAuthor>Elend,C.</INSDAuthor>
> <INSDAuthor>Streit,W.R.</INSDAuthor>
> </INSDReference_authors>
> <INSDReference_title>Isolation and biochemical characterization
> of two novel metagenome derived esterases</INSDReference_title>
> <INSDReference_journal>Appl. Environ. Microbiol. 0:0-0
> (2006)</INSDReference_journal>
> </INSDReference>
> <INSDReference>
> <INSDReference_reference>?</INSDReference_reference>
> <INSDReference_position>1..16729</INSDReference_position>
> <INSDReference_authors>
> <INSDAuthor>Schmeisser,C.</INSDAuthor>
> <INSDAuthor>Elend,C.</INSDAuthor>
> <INSDAuthor>Streit,W.R.</INSDAuthor>
> </INSDReference_authors>
> <INSDReference_journal>Submitted (29-APR-2005) to the
> EMBL/GenBank/DDBJ databases. Molekulare Enzymtechnologie, University
> Duisburg-Essen, Lotharstrasse 1, Duisburg D-47057,
> Germany</INSDReference_journal>
> </INSDReference>
> </INSDSeq_references>
>
> So my question is whether ASN2GB produces output that's incompatible
> with the BioJava parsers, whether there is a problem with the sequences
> themselves, or whether the problem lies with the majority of the
> parsers. Or could it be that I'm using the API wrongly for the above
> formats? The GenBank parser works as advertised, with the exceptions
> described below:
>
> ISSUE #2:
> When I try to parse GenBank files using the following code:
>
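> import java.io.BufferedReader;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
> import org.biojavax.Namespace;
> import org.biojavax.RichObjectFactory;
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
>
> // Open the GenBank file produced by ASN2GB
> BufferedReader inBuf = new BufferedReader(new FileReader("genbank_output.gb"));
> // Namespace that the parsed sequences are assigned to
> Namespace gbNspace = (Namespace) RichObjectFactory.getObject(
>         SimpleNamespace.class, new Object[]{"gbSpace"});
> RichSequenceIterator gbSeqs = RichSequence.IOTools.readGenbankDNA(inBuf, gbNspace);
> while (gbSeqs.hasNext()) {
>     try {
>         RichSequence rs = gbSeqs.nextRichSequence();
>         // Further processing of the RichSequence object from here
>     } catch (BioException be) {
>         be.printStackTrace();
>     }
> }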
>
> Genbank file in question:
>
> LOCUS BC074905 838 bp mRNA linear PRI 15-APR-2006
> DEFINITION Homo sapiens kallikrein 14, mRNA (cDNA clone MGC:104038
> IMAGE:30915482), complete cds.
> ACCESSION BC074905
> VERSION BC074905.2 GI:50959825
> KEYWORDS MGC.
> SOURCE Homo sapiens (human)
> ORGANISM Homo sapiens
> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> Catarrhini; Hominidae; Homo.
> REFERENCE 1 (bases 1 to 838)
> AUTHORS Strausberg,R.L., Feingold,E.A., Grouse,L.H., Derge,J.G.,
> Klausner,R.D., Collins,F.S., Wagner,L., Shenmen,C.M., Schuler,G.D.,
> Altschul,S.F., Zeeberg,B., Buetow,K.H., Schaefer,C.F., Bhat,N.K.,
> Hopkins,R.F., Jordan,H., Moore,T., Max,S.I., Wang,J., Hsieh,F.,
> Diatchenko,L., Marusina,K., Farmer,A.A., Rubin,G.M., Hong,L.,
> Stapleton,M., Soares,M.B., Bonaldo,M.F., Casavant,T.L.,
> Scheetz,T.E., Brownstein,M.J., Usdin,T.B., Toshiyuki,S.,
> Carninci,P., Prange,C., Raha,S.S., Loquellano,N.A., Peters,G.J.,
> Abramson,R.D., Mullahy,S.J., Bosak,S.A., McEwan,P.J.,
> McKernan,K.J., Malek,J.A., Gunaratne,P.H., Richards,S.,
> Worley,K.C., Hale,S., Garcia,A.M., Gay,L.J., Hulyk,S.W.,
> Villalon,D.K., Muzny,D.M., Sodergren,E.J., Lu,X., Gibbs,R.A.,
> Fahey,J., Helton,E., Ketteman,M., Madan,A., Rodrigues,S.,
> Sanchez,A., Whiting,M., Madan,A., Young,A.C., Shevchenko,Y.,
> Bouffard,G.G., Blakesley,R.W., Touchman,J.W., Green,E.D.,
> Dickson,M.C., Rodriguez,A.C., Grimwood,J., Schmutz,J., Myers,R.M.,
> Butterfield,Y.S., Krzywinski,M.I., Skalska,U., Smailus,D.E.,
> Schnerch,A., Schein,J.E., Jones,S.J. and Marra,M.A.
> CONSRTM Mammalian Gene Collection Program Team
> TITLE Generation and initial analysis of more than 15,000 full-length
> human and mouse cDNA sequences
> JOURNAL Proc. Natl. Acad. Sci. U.S.A. 99 (26), 16899-16903 (2002)
> PUBMED 12477932
> REFERENCE 2 (bases 1 to 838)
> CONSRTM NIH MGC Project
> TITLE Direct Submission
> JOURNAL Submitted (25-JUN-2004) National Institutes of Health, Mammalian
> Gene Collection (MGC), Bethesda, MD 20892-2590, USA
> REMARK NIH-MGC Project URL: http://mgc.nci.nih.gov
> COMMENT On Aug 4, 2004 this sequence version replaced gi:49901832.
> Contact: MGC help desk
> Email: cgapbs-r at mail.nih.gov
> Tissue Procurement: Genome Sequence Centre, British Columbia Cancer
> Center
> cDNA Library Preparation: British Columbia Cancer Research
> Center
> cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
> DNA Sequencing by: Genome Sequence Centre,
> BC Cancer Agency, Vancouver, BC, Canada
> info at bcgsc.bc.ca
> Martin Hirst, Thomas Zeng, Ryan Morin, Michelle Moksa, Johnson
> Pang, Diana Mah, Jing Wang, Kieth Fichter, Eric Chuah, Allen
> Delaney, Rob Kirkpatrick, Agnes Baross, Sarah Barber, Mabel
> Brown-John, Steve S. Chand, William Chow, Ryan Babakaiff, Dave
> Wong, Corey Matsuo, Jaclyn Beland, Susan Gibson, Luis delRio, Ruth
> Featherstone, Malachi Griffith, Obi Griffith, Ran Guin, Nancy Liao,
> Kim MacDonald, Mike R. Mayo, Josh Moran, Diana Palmquist, JR
> Santos, Duane Smailus, Jeff Stott, Miranda Tsai, George Yang,
> Jacquie Schein, Asim Siddiqui,Steven Jones, Rob Holt, Marco Marra.
>
> Clone distribution: MGC clone distribution information can be found
> through the I.M.A.G.E. Consortium/LLNL at: http://image.llnl.gov
> Series: IRBU Plate: 4 Row: C Column: 3.
>
> Differences found between this sequence and the human reference
> genome (build 36) are described in misc_difference features below.
> FEATURES Location/Qualifiers
> source 1..838
> /organism="Homo sapiens"
> /mol_type="mRNA"
> /db_xref="taxon:9606"
> /clone="MGC:104038 IMAGE:30915482"
> /tissue_type="Lung, PCR rescued clones"
> /clone_lib="NIH_MGC_273"
> /lab_host="DH10B"
> /note="Vector: pCR4 Topo TA with reversed insert"
> gene 1..838
> /gene="KLK14"
> /note="synonym: KLK-L6"
> /db_xref="GeneID:43847"
> /db_xref="HGNC:6362"
> /db_xref="IMGT/GENE-DB:6362"
> /db_xref="MIM:606135"
> CDS 49..804
> /gene="KLK14"
> /codon_start=1
> /product="KLK14 protein"
> /protein_id="AAH74905.1"
> /db_xref="GI:50959826"
> /db_xref="GeneID:43847"
> /db_xref="HGNC:6362"
> /db_xref="IMGT/GENE-DB:6362"
> /db_xref="MIM:606135"
> /translation="MFLLLTALQVLAIAMTRSQEDENKIIGGYTCTRSSQPWQAALLA
> GPRRRFLCGGALLSGQWVITAAHCGRPILQVALGKHNLRRWEATQQVLRVVRQVTHPN
> YNSRTHDNDLMLLQLQQPARIGRAVRPIEVTQACASPGTSCRVSGWGTISSPIARYPA
> SLQCVNINISPDEVCQKAYPRTITPGMVCAGVPQGGKDSCQGDSGGPLVCRGQLQGLV
> SWGMERCALPGYPGVYTNLCKYRSWIEETMRDK"
> misc_difference 98
> /gene="KLK14"
> /note="'G' in cDNA is 'A' in the human genome; amino acid
> difference: 'R' in cDNA, 'Q' in the human genome."
> misc_difference 133
> /gene="KLK14"
> /note="'T' in cDNA is 'C' in the human genome; amino acid
> difference: 'Y' in cDNA, 'H' in the human genome."
> ORIGIN
> 1 atgtccctga gggtcttggg ctctgggacc tggccctcag cccctaaaat gttcctcctg
> 61 ctgacagcac ttcaagtcct ggctatagcc atgacacgga gccaagagga tgagaacaag
> 121 ataattggtg gctatacgtg cacccggagc tcccagccgt ggcaggcggc cctgctggcg
> 181 ggtcccaggc gccgcttcct ctgcggaggc gccctgcttt caggccagtg ggtcatcact
> 241 gctgctcact gcggccgccc gatccttcag gttgccctgg gcaagcacaa cctgaggagg
> 301 tgggaggcca cccagcaggt gctgcgcgtg gttcgtcagg tgacgcaccc caactacaac
> 361 tcccggaccc acgacaacga cctcatgctg ctgcagctac agcagcccgc acggatcggg
> 421 agggcagtca ggcccattga ggtcacccag gcctgtgcca gccccgggac ctcctgccga
> 481 gtgtcaggct ggggaactat atccagcccc atcgccaggt accccgcctc tctgcaatgc
> 541 gtgaacatca acatctcccc ggatgaggtg tgccagaagg cctatcctag aaccatcacg
> 601 cctggcatgg tctgtgcagg agttccccag ggcgggaagg actcttgtca gggtgactct
> 661 gggggacccc tggtgtgcag aggacagctc cagggcctcg tgtcttgggg aatggagcgc
> 721 tgcgccctgc ctggctaccc cggtgtctac accaacctgt gcaagtacag aagctggatt
> 781 gaggaaacga tgcgggacaa atgatggtct tcacggtggg atggacctcg tcagctgc
> //
>
> I get the following exception:
>
> java.lang.IllegalArgumentException: Authors string cannot be null
> org.biojava.bio.BioException: Could not read sequence
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
>     at exonhit.parsers.GenBankParser.getSequences(GenBankParser.java:107)
>     at exonhit.parsers.GenBankParser.runGBparser(GenBankParser.java:258)
>     at exonhit.parsers.GenBankParser.main(GenBankParser.java:341)
> Caused by: java.lang.IllegalArgumentException: Authors string cannot be null
>     at org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:76)
>     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:356)
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
>
> -----------------------------------------------------------------------
>
> I'm trying to see what could be the problem with this particular
> sequence. Looks to me like the AUTHORS portion is not getting parsed
> correctly. Any ideas would be greatly appreciated!
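
Looking at the record again, REFERENCE 2 carries only a CONSRTM line and no
AUTHORS line, so that may be where the null comes from. This is a guess on my
part; I have not confirmed it against the BioJavaX source. To find other
records of this kind before parsing, a rough pre-check like the one below can
list the references that lack AUTHORS. The class and method names are made up
for this example, and it assumes the record has already been read into a list
of lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of BioJava(X).
final class ReferenceCheck {

    /** Returns the REFERENCE header lines of references that have no AUTHORS line. */
    static List<String> referencesWithoutAuthors(List<String> lines) {
        List<String> missing = new ArrayList<>();
        String current = null;       // header of the reference being scanned
        boolean hasAuthors = false;
        for (String raw : lines) {
            String line = raw.trim();
            boolean startsRef = line.startsWith("REFERENCE");
            // These keywords (or the record terminator) end the reference section.
            boolean endsRefs = line.startsWith("COMMENT") || line.startsWith("FEATURES")
                    || line.startsWith("ORIGIN") || line.equals("//");
            if (startsRef || endsRefs) {
                if (current != null && !hasAuthors) {
                    missing.add(current);
                }
                current = startsRef ? line : null;
                hasAuthors = false;
            } else if (current != null && line.startsWith("AUTHORS")) {
                hasAuthors = true;
            }
        }
        if (current != null && !hasAuthors) {
            missing.add(current);
        }
        return missing;
    }
}
```

Run over the BC074905 record above, this should report only
"REFERENCE 2 (bases 1 to 838)".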
>
> --
> Best Regards,
>
>
> Seth Johnson
> Senior Bioinformatics Associate
>
> Ph: (202) 470-0900
> Fx: (775) 251-0358
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Best Regards,
Seth Johnson
Senior Bioinformatics Associate
Ph: (202) 470-0900
Fx: (775) 251-0358