[Biojava-l] RefSeq bioJava parser problem
wanner.de@pg.com
Tue, 14 May 2002 11:40:33 -0400
Hi,
Appreciate the responses to the RefSeq question. We've been able to put together
a reliable parser using the example in TestRefSeqPrt.
I have an additional question now. Are there any utility methods within bioJava
that can be used to handle parsed values that bioJava returns in list
form?
For example the following value was returned from bioJava for a sequence
annotation with key MEDLINE:
[98127055, 99357812]
Another example is the value that was returned from bioJava for a feature annotation with key db_xref:
[LocusID:946, MIM:604405]
bioJava does good work in accumulating the information and placing it under a specific annotation. Does
anyone know whether methods to extract list members or parameter/value pairs are already available in bioJava?
thx,
Dave
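The printed values `[98127055, 99357812]` and `[LocusID:946, MIM:604405]` have the shape of `java.util.List.toString()`. Assuming that is what the annotation holds when a key occurs more than once (and that a single occurrence comes back as a plain `String`), ordinary `java.util` code can normalize the value and split `db_xref` entries into database/identifier pairs. This is a minimal sketch under that assumption; `AnnotationValues`, `asList` and `splitXrefs` are hypothetical helpers, not BioJava API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class (not part of BioJava).
public class AnnotationValues {

    /** Normalizes an annotation value to a list of strings.
     *  Assumption: repeated keys arrive as a Collection, single values as any other object. */
    public static List<String> asList(Object value) {
        if (value == null) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        if (value instanceof Collection) {
            for (Object item : (Collection<?>) value) {
                result.add(String.valueOf(item));
            }
        } else {
            result.add(String.valueOf(value));
        }
        return result;
    }

    /** Splits db_xref-style entries ("LocusID:946") at the first colon into
     *  database -> identifiers. Entries without a colon go under the key "". */
    public static Map<String, List<String>> splitXrefs(Object value) {
        Map<String, List<String>> byDb = new LinkedHashMap<>();
        for (String entry : asList(value)) {
            int colon = entry.indexOf(':');
            String db = colon < 0 ? "" : entry.substring(0, colon);
            String id = colon < 0 ? entry : entry.substring(colon + 1);
            byDb.computeIfAbsent(db, k -> new ArrayList<>()).add(id);
        }
        return byDb;
    }

    public static void main(String[] args) {
        // Stand-ins for values read from the annotation (e.g. the MEDLINE and db_xref keys above).
        Object medline = List.of("98127055", "99357812");
        Object dbXref = List.of("LocusID:946", "MIM:604405");

        System.out.println(asList(medline));     // [98127055, 99357812]
        System.out.println(splitXrefs(dbXref));  // {LocusID=[946], MIM=[604405]}
    }
}
```

Because `asList` accepts either shape, calling code does not need to know in advance whether a key appeared once or several times in the record.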
With one catch, the GenBank reader is the right thing to use. The issue is
that the GenBank parser only reads nucleotide sequences, and you've got an
amino acid sequence here. So the BioJava sequence is being built with the
wrong alphabet, and parsing breaks when it reaches the sequence data.
TestRefSeqPrt will handle files like these; the sample code should point you
in the right direction.
Greg
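The alphabet mismatch Greg describes is visible in the record's LOCUS line, which says `167 aa` rather than `... bp`. The following is a minimal sketch, assuming the unit token directly follows the length on that line, that peeks at the LOCUS line so a caller could route protein records to the TestRefSeqPrt-style code and nucleotide records to the GenBank reader. `LocusSniffer`, `fromLocusLine` and `sniff` are hypothetical names, and the routing itself is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper: classifies a GenBank-style record by the unit on its LOCUS line.
public class LocusSniffer {

    public enum Kind { PROTEIN, NUCLEOTIDE, UNKNOWN }

    /** Assumed shape: LOCUS <name> <length> <unit> ...; "aa" = protein, "bp" = nucleotide. */
    public static Kind fromLocusLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length == 0 || !tokens[0].equals("LOCUS")) {
            return Kind.UNKNOWN;
        }
        // Start after the locus name; look for a numeric length followed by its unit.
        for (int i = 2; i + 1 < tokens.length; i++) {
            if (tokens[i].matches("\\d+")) {
                if (tokens[i + 1].equals("aa")) return Kind.PROTEIN;
                if (tokens[i + 1].equals("bp")) return Kind.NUCLEOTIDE;
            }
        }
        return Kind.UNKNOWN;
    }

    /** Reads until the first LOCUS line and classifies it. */
    public static Kind sniff(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("LOCUS")) {
                return fromLocusLine(line);
            }
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        String record = "LOCUS       NP_000221                167 aa            linear   PRI 29-JAN-2002\n";
        System.out.println(sniff(new BufferedReader(new StringReader(record)))); // PROTEIN
    }
}
```

In practice the stream would need to be buffered and reset (or re-opened) after sniffing so the chosen reader still sees the LOCUS line.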
> -----Original Message-----
> From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
> Sent: Thursday, May 09, 2002 8:58 AM
> To: wanner.de@pg.com
> Cc: biojava-l@biojava.org
> Subject: Re: [Biojava-l] RefSeq bioJava parser problem
>
>
> Hi,
>
> The genbank parser may be fairly paranoid about the exact format.
> I don't think that these parsers were written in a modular manner, so
> it may not be easy to plug together your own customized version
> without access to the source code. The source code can be obtained
> from anonymous CVS, or as a download from the biojava web site.
> Alternatively, the org.biojava.bio.program.tagvalue package provides
> a modularly extensible, but poorly documented, API for processing
> tag-value files such as these. You may be able to knock up a complete
> parser in a couple of hours that way, depending on what you want to
> turn these files into.
>
> Could someone with genbank parsing expertise say what the differences
> between the two formats are, and how easy it would be to get the
> genbank parser to accept refseq documents?
>
> Matthew
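Matthew's suggestion rests on these headers being tag-value data: each keyword sits in the first 12 columns, and a line whose first 12 columns are blank continues the previous value. The sketch below illustrates that structure in plain Java only; it does not use the org.biojava.bio.program.tagvalue API (none of its classes appear in this thread), the names are hypothetical, and it deliberately stops at FEATURES/ORIGIN because the feature table and sequence use a different layout.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of tag-value parsing for GenBank-style header lines.
public class TagValueSketch {

    /** Groups header lines into (tag, value) pairs; continuation lines are joined with a space. */
    public static List<Map.Entry<String, String>> parseHeader(List<String> lines) {
        List<Map.Entry<String, String>> entries = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("FEATURES") || line.startsWith("ORIGIN")) {
                break; // different layout from here on; out of scope for this sketch
            }
            String tag = (line.length() >= 12 ? line.substring(0, 12) : line).trim();
            String value = line.length() > 12 ? line.substring(12).trim() : "";
            if (tag.isEmpty() && value.isEmpty()) {
                continue; // blank line
            }
            if (tag.isEmpty()) {
                // Continuation: append to the most recent entry, if there is one.
                if (!entries.isEmpty()) {
                    Map.Entry<String, String> last = entries.get(entries.size() - 1);
                    last.setValue(last.getValue() + " " + value);
                }
            } else {
                entries.add(new SimpleEntry<>(tag, value));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> header = List.of(
            "ACCESSION   NP_000221",
            "  ORGANISM  Homo sapiens",
            "            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;",
            "FEATURES             Location/Qualifiers");
        for (Map.Entry<String, String> e : parseHeader(header)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Joining ORGANISM with its lineage lines is a simplification; a real parser would treat the species name and the taxonomy lines as separate values.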
>
> wanner.de@pg.com wrote:
> > bioJava members,
> >
> > Our Genbank parser using bioJava has been working great. We've now
> > been asked to parse RefSeq accession numbers, which seem to have only
> > minor differences from the Genbank format; however, bioJava cannot
> > read the sequence. I get the
> > "org.biojava.bio.BioException: Could not read sequence" exception.
> > Below is the sequence I am trying to parse (downloaded from NCBI). Do
> > you have any ideas? Should we be using something other than a Genbank
> > reader to parse this?
> >
> > Thanks Much,
> >
> > LOCUS       NP_000221                167 aa            linear   PRI 29-JAN-2002
> > DEFINITION  leptin precursor; leptin (murine obesity homolog); obesity; obesity
> >             (murine homolog, leptin) [Homo sapiens].
> > ACCESSION   NP_000221
> > PID         g4557715
> > VERSION     NP_000221.1  GI:4557715
> > DBSOURCE    REFSEQ: accession NM_000230.1
> > KEYWORDS    .
> > SOURCE      human.
> >   ORGANISM  Homo sapiens
> >             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> >             Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
> > REFERENCE   1  (residues 1 to 167)
> >   AUTHORS   Friedman JM, Leibel RL, Siegel DS, Walsh J and Bahary N.
> >   TITLE     Molecular mapping of the mouse ob mutation
> >   JOURNAL   Genomics 11 (4), 1054-1062 (1991)
> >   MEDLINE   92147101
> >    PUBMED   1686014
> > REFERENCE   2  (residues 1 to 167)
> >   AUTHORS   Zhang Y, Proenca R, Maffei M, Barone M, Leopold L and Friedman JM.
> >   TITLE     Positional cloning of the mouse obese gene and its human homologue
> >   JOURNAL   Nature 372 (6505), 425-432 (1994)
> >   MEDLINE   95075453
> >    PUBMED   7984236
> >   REMARK    Erratum:[[published erratum appears in Nature 1995 Mar
> >             30;374(6521):479]]
> > REFERENCE   3  (residues 1 to 167)
> >   AUTHORS   Masuzaki H, Ogawa Y, Isse N, Satoh N, Okazaki T, Shigemoto M, Mori
> >             K, Tamura N, Hosoda K, Yoshimasa Y et al.
> >   TITLE     Human obese gene expression. Adipocyte-specific expression and
> >             regional differences in the adipose tissue
> >   JOURNAL   Diabetes 44 (7), 855-858 (1995)
> >   MEDLINE   95309556
> >    PUBMED   7789654
> > REFERENCE   4  (residues 1 to 167)
> >   AUTHORS   Green ED, Maffei M, Braden VV, Proenca R, DeSilva U, Zhang Y, Chua
> >             SC Jr, Leibel RL, Weissenbach J and Friedman JM.
> >   TITLE     The human obese (OB) gene: RNA expression pattern and mapping on
> >             the physical, cytogenetic, and genetic maps of chromosome 7
> >   JOURNAL   Genome Res. 5 (1), 5-12 (1995)
> >   MEDLINE   96352898
> >    PUBMED   8717050
> > REFERENCE   5  (residues 1 to 167)
> >   AUTHORS   Isse N, Ogawa Y, Tamura N, Masuzaki H, Mori K, Okazaki T, Satoh N,
> >             Shigemoto M, Yoshimasa Y, Nishi S et al.
> >   TITLE     Structural organization and chromosomal assignment of the human
> >             obese gene
> >   JOURNAL   J. Biol. Chem. 270 (46), 27728-27733 (1995)
> >   MEDLINE   96070903
> >    PUBMED   7499240
> > REFERENCE   6  (residues 1 to 167)
> >   AUTHORS   Gong,D.W., Bi,S., Pratley,R.E. and Weintraub,B.D.
> >   TITLE     Genomic structure and promoter analysis of the human obese gene
> >   JOURNAL   J. Biol. Chem. 271 (8), 3971-3974 (1996)
> >   MEDLINE   96223958
> > REFERENCE   7  (residues 1 to 167)
> >   AUTHORS   Niki T, Mori H, Tamori Y, Kishimoto-Hashirmoto M, Ueno H, Araki S,
> >             Masugi J, Sawant N, Majithia HR, Rais N et al.
> >   TITLE     Human obese gene: molecular screening in Japanese and Asian Indian
> >             NIDDM patients associated with obesity
> >   JOURNAL   Diabetes 45 (5), 675-678 (1996)
> >   MEDLINE   96198511
> >    PUBMED   8621021
> > REFERENCE   8  (residues 1 to 167)
> >   AUTHORS   Comuzzie,A.G., Hixson,J.E., Almasy,L., Mitchell,B.D., Mahaney,M.C.,
> >             Dyer,T.D., Stern,M.P., MacCluer,J.W. and Blangero,J.
> >   TITLE     A major quantitative trait locus determining serum leptin levels
> >             and fat mass is located on human chromosome 2
> >   JOURNAL   Nat. Genet. 15 (3), 273-276 (1997)
> >   MEDLINE   97207647
> >    PUBMED   9054940
> > REFERENCE   9  (residues 1 to 167)
> >   AUTHORS   Clement,K., Vaisse,C., Lahlou,N., Cabrol,S., Pelloux,V.,
> >             Cassuto,D., Gourmelen,M., Dina,C., Chambaz,J., Lacorte,J.M.,
> >             Basdevant,A., Bougneres,P., Lebouc,Y., Froguel,P. and Guy-Grand,B.
> >   TITLE     A mutation in the human leptin receptor gene causes obesity and
> >             pituitary dysfunction
> >   JOURNAL   Nature 392 (6674), 398-401 (1998)
> >   MEDLINE   98196670
> >    PUBMED   9537324
> > REFERENCE   10  (residues 1 to 167)
> >   AUTHORS   Friedman,J.M. and Halaas,J.L.
> >   TITLE     Leptin and the regulation of body weight in mammals
> >   JOURNAL   Nature 395 (6704), 763-770 (1998)
> >   MEDLINE   99010835
> > COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
> >             reference sequence was derived from U43653.1.
> >             Summary: This gene is similar to the mouse obesity gene (ob). The
> >             protein encoded by this gene is secreted by white adipocytes. In
> >             the mouse study, mutations in this gene are linked to severe and
> >             morbid obesity.
> > FEATURES             Location/Qualifiers
> >      source          1..167
> >                      /organism="Homo sapiens"
> >                      /db_xref="taxon:9606"
> >                      /chromosome="7"
> >                      /map="7q31.3"
> >      Protein         1..167
> >                      /product="leptin precursor"
> >                      /note="leptin (murine obesity homolog); obesity (murine
> >                      homolog, leptin)"
> >      sig_peptide     1..21
> >      Region          22..167
> >                      /region_name="Leptin"
> >                      /note="Leptin"
> >                      /db_xref="CDD:pfam02024"
> >      mat_peptide     22..167
> >                      /product="leptin"
> >      CDS             1..167
> >                      /gene="LEP"
> >                      /coded_by="NM_000230.1:57..560"
> >                      /db_xref="LocusID:3952"
> >                      /db_xref="MIM:164160"
> > ORIGIN
> >         1 mhwgtlcgfl wlwpylfyvq avpiqkvqdd tktliktivt rindishtqs vsskqkvtgl
> >        61 dfipglhpil tlskmdqtla vyqqiltsmp srnviqisnd lenlrdllhv lafskschlp
> >       121 wasgletlds lggvleasgy stevvalsrl qgslqdmlwq ldlspgc
> > //
> >
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l@biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l
> >