[BioPython] Problems parsing xml with sax
BONIS SANZ, JULIO
JBonis at imim.es
Mon May 9 10:22:47 EDT 2005
Solved ;)
Just after creating the parser
>>> from xml.sax import make_parser
>>> parser = make_parser()
add:
>>> parser.setFeature('http://xml.org/sax/features/external-general-entities', False)
This disables resolution of external general entities, so the parser no longer tries to fetch the DTD named in the DOCTYPE declaration.
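A minimal, self-contained sketch of the fix above, written for modern Python 3 rather than the Python 2.3 of this thread. It uses `xml.sax.handler.feature_external_ges`, the standard-library constant for the same feature URL. The DOCTYPE line is the one NCBI returned for SNP records; the element body and the `ElementCollector` handler are hypothetical stand-ins, not real NCBI output. Since Python 3.7.1 this feature is already off by default, so the explicit `setFeature` call mainly matters on older interpreters.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# DOCTYPE copied from the NCBI SNP output above; the body is a made-up stand-in.
SNP_XML = b"""<?xml version="1.0"?>
<!DOCTYPE NSE-rs PUBLIC "-//NCBI//NSE/EN" "/entrez/query/DTD/NSE.dtd">
<NSE-rs><example>6313</example></NSE-rs>
"""


class ElementCollector(ContentHandler):
    """Hypothetical handler: records element names and non-blank text."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        # Text may arrive in several chunks; keep only non-blank pieces.
        if content.strip():
            self.text.append(content.strip())


def parse_without_dtd(data):
    parser = xml.sax.make_parser()
    # Turn off external general entity resolution. In CPython's expat-based
    # reader this also makes it skip fetching the external DTD subset.
    parser.setFeature(feature_external_ges, False)
    handler = ElementCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler


handler = parse_without_dtd(SNP_XML)
print(handler.elements)  # ['NSE-rs', 'example']
print("".join(handler.text))  # 6313
```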
Regards,
Julio Bonis Sanz MD
www.juliobonis.com
-----Original Message-----
From: biopython-bounces at portal.open-bio.org
[mailto:biopython-bounces at portal.open-bio.org] On behalf of BONIS SANZ, JULIO
Sent: Monday, 9 May 2005 15:34
To: Biopython mailing list
Subject: [BioPython] Problems parsing xml with sax
Maybe this is not closely related to Biopython, maybe it is... anyway:
I am using Biopython's GenBank.EUtils.ThinClient.ThinClient() to fetch some XML from NCBI.
I have then built some SAX parsers to extract information from it.
My problem is that SAX does not accept the way NCBI refers to its DTD in SNP records.
I did:
snpdbi = GenBank.DBIds("snp", ['6313'])
eutils = GenBank.EUtils.ThinClient.ThinClient()
snp_xml = eutils.efetch_using_dbids(snpdbi, rettype='flt', retmode='xml')
The problem is that snp_xml starts with:
### <?xml version="1.0"?>
### <!DOCTYPE NSE-rs PUBLIC "-//NCBI//NSE/EN" "/entrez/query/DTD/NSE.dtd">
and SAX raises this error:
ValueError: unknown url type: /entrez/query/DTD/NSE.dtd
I do not have this problem when retrieving from ELink. For example:
>>> dbid = GenBank.DBIds("nucleotide",['55956922'])
>>> xmlFileWithSNPsStream = eutils.elink_using_dbids(dbid,db="snp")
and I can parse the result with SAX, because the file starts with:
### <?xml version="1.0"?>
### <!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD eLinkResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd">
That one is a well-formed, absolute URL....
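The difference between the two DOCTYPE lines can be checked with the standard library: the NSE system identifier has no URL scheme, so when the SAX reader (with external entity resolution on) hands it to urllib without a usable base URL, urllib cannot tell how to fetch it. The sketch below only constructs `urllib.request.Request` objects and never touches the network; it approximates, rather than reproduces, the code path inside the SAX reader. `has_scheme` and `try_request` are hypothetical helper names.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# System identifiers copied from the two DOCTYPE declarations above.
NSE_SYSTEM_ID = "/entrez/query/DTD/NSE.dtd"
ELINK_SYSTEM_ID = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd"


def has_scheme(system_id):
    """True if the identifier is an absolute URL with a scheme such as http."""
    return urlsplit(system_id).scheme != ""


def try_request(system_id):
    """Build (but do not send) a urllib request; return the error text, if any."""
    try:
        Request(system_id)
    except ValueError as exc:
        return str(exc)
    return None


print(has_scheme(ELINK_SYSTEM_ID))  # True
print(has_scheme(NSE_SYSTEM_ID))    # False
print(try_request(NSE_SYSTEM_ID))   # unknown url type: '/entrez/query/DTD/NSE.dtd'
```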
Any idea?
Regards,
Julio Bonis Sanz MD
www.juliobonis.com
-----Original Message-----
From: biopython-bounces at portal.open-bio.org
[mailto:biopython-bounces at portal.open-bio.org] On behalf of Michiel Jan Laurens de Hoon
Sent: Saturday, 7 May 2005 7:25
To: bneron at pasteur.fr
CC: Biopython mailing list
Subject: Re: [BioPython] Rethinking Seq objects
bneron at pasteur.fr wrote:
> just an example:
>
> Python 2.3.4 (#1, Mar 11 2005, 17:34:27)
> [GCC 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> from Bio.Seq import Seq
> >>> from Bio import Translate
> >>> from Bio.Alphabet import IUPAC
> >>> my_alpha = IUPAC.unambiguous_dna
> >>> my_seq_upper = Seq('GATCGATGGGCCTATTAGGATCGAAAATCGC', my_alpha)
> >>> my_seq_lower = Seq('gatcgatgggcctattaggatcgaaaatcgc', my_alpha)
> >>> standard_translator = Translate.unambiguous_dna_by_id[1]
> >>> standard_translator.translate(my_seq_upper)
>
> Seq('DRWAY*DRKS', HasStopCodon(IUPACProtein(), '*'))
>
> >>> standard_translator.translate(my_seq_lower)
>
> Seq('**********', HasStopCodon(IUPACProtein(), '*'))
>
>
> Obviously, lowercase sequences do not work with the Seq object.
I agree, this should be corrected. The translate and transcribe methods should
work with both uppercase and lowercase.
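As a rough illustration of the behaviour being asked for (not Biopython's implementation, and not the `Bio.Translate` API used above), a plain-Python translator can normalise case once before the codon lookup. The table string is NCBI translation table 1 (the standard code) in TCAG codon order; `translate_any_case` and `STANDARD_CODONS` are hypothetical names.

```python
from itertools import product

# NCBI translation table 1 (standard code), amino acids in TCAG codon order.
_STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODONS = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), _STANDARD_AA)
}


def translate_any_case(dna):
    """Translate DNA with the standard code, ignoring letter case.

    Uppercasing up front means the table only needs uppercase codons.
    A trailing partial codon is dropped; ambiguous bases such as N
    are not handled and raise KeyError.
    """
    dna = dna.upper()
    return "".join(
        STANDARD_CODONS[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )


print(translate_any_case("GATCGATGGGCCTATTAGGATCGAAAATCGC"))  # DRWAY*DRKS
print(translate_any_case("gatcgatgggcctattaggatcgaaaatcgc"))  # DRWAY*DRKS
```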
--Michiel.
--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
_______________________________________________
BioPython mailing list - BioPython at biopython.org
http://biopython.org/mailman/listinfo/biopython