[BioPython] Problems parsing xml with sax

BONIS SANZ, JULIO JBonis at imim.es
Mon May 9 09:34:17 EDT 2005


Maybe this is not closely related to Biopython, maybe it is... anyway:


I am using Biopython's GenBank.EUtils.ThinClient.ThinClient() to fetch some XML from NCBI.

After that, I built some SAX parsers to extract information from it.

My problem is that SAX cannot handle the DTD reference that NCBI puts in the DOCTYPE of SNP records.

I did:

snpdbi = GenBank.DBIds("snp",['6313'])
eutils = GenBank.EUtils.ThinClient.ThinClient()
file = eutils.efetch_using_dbids(snpdbi,rettype = 'flt',retmode = 'xml')

The problem is that the file starts with:

### <?xml version="1.0"?>
### <!DOCTYPE NSE-rs PUBLIC "-//NCBI//NSE/EN" "/entrez/query/DTD/NSE.dtd">

And SAX raises this error:

ValueError: unknown url type: /entrez/query/DTD/NSE.dtd


When retrieving from elink I do not have that problem. For example:

>>> dbid = GenBank.DBIds("nucleotide",['55956922'])
>>> xmlFileWithSNPsStream = eutils.elink_using_dbids(dbid,db="snp")

And I can parse it with SAX, because the file starts with:

###  <?xml version="1.0"?>
###  <!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD eLinkResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd">

Here the system identifier is a well-formed absolute URL, whereas the SNP record only gives a relative path.
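
One possible workaround, sketched here under assumptions rather than tested against NCBI: if the DTD is not actually needed, the expat-based SAX parser from the standard library can be told not to load external entities, so the relative system identifier is never opened. The names RootNameHandler, parse_ignoring_external_dtd and SAMPLE are hypothetical, and SAMPLE is only a stand-in for the real efetch output.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Stand-in for the efetch result: same prolog as in the post, with an
# empty root element (the real record content is not reproduced here).
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE NSE-rs PUBLIC "-//NCBI//NSE/EN" "/entrez/query/DTD/NSE.dtd">\n'
    b'<NSE-rs/>\n'
)


class RootNameHandler(ContentHandler):
    """Hypothetical handler that only remembers the root element name."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.root = None

    def startElement(self, name, attrs):
        if self.root is None:
            self.root = name


def parse_ignoring_external_dtd(xml_bytes, handler):
    """Parse xml_bytes with SAX without fetching the external DTD."""
    parser = xml.sax.make_parser()
    # With this feature off, the parser skips external entities,
    # including the external DTD subset named in the DOCTYPE.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler


if __name__ == "__main__":
    result = parse_ignoring_external_dtd(SAMPLE, RootNameHandler())
    print(result.root)
```

If entities defined in the DTD are needed, an alternative would be a custom EntityResolver that maps the relative path onto an absolute URL; that variant is not shown here.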


Any ideas?

Regards, 

Julio Bonis Sanz MD
www.juliobonis.com


-----Original Message-----
From: biopython-bounces at portal.open-bio.org
[mailto:biopython-bounces at portal.open-bio.org] On behalf of Michiel Jan
Laurens de Hoon
Sent: Saturday, 7 May 2005 7:25
To: bneron at pasteur.fr
CC: Biopython mailing list
Subject: Re: [BioPython] Rethinking Seq objects


bneron at pasteur.fr wrote:
> just an example:
> 
> Python 2.3.4 (#1, Mar 11 2005, 17:34:27) 
> [GCC 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> 
> >>> from Bio.Seq import Seq
> >>> from Bio import Translate
> >>> from Bio.Alphabet import IUPAC
> >>> my_alpha = IUPAC.unambiguous_dna
> >>> my_seq_upper = Seq('GATCGATGGGCCTATTAGGATCGAAAATCGC', my_alpha)
> >>> my_seq_lower = Seq('gatcgatgggcctattaggatcgaaaatcgc', my_alpha)
> >>> standard_translator = Translate.unambiguous_dna_by_id[1]
> >>> standard_translator.translate(my_seq_upper)
> 
> Seq('DRWAY*DRKS', HasStopCodon(IUPACProtein(), '*'))
> 
> >>> standard_translator.translate(my_seq_lower)
> 
> Seq('**********', HasStopCodon(IUPACProtein(), '*'))
> 
> 
> obviously lower-case sequences don't work with the Seq object.

I agree, this should be corrected. The translate and transcribe methods should 
work with both uppercase and lowercase.
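
As an illustration of the idea only, and not Biopython's actual implementation, a case-insensitive translation can be sketched by upper-casing the sequence before the codon lookup. The function name translate_any_case is hypothetical; the table is the standard genetic code (NCBI table 1), and ambiguous letters are not handled.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons
# enumerated in T, C, A, G order for each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_any_case(dna):
    """Translate unambiguous DNA, accepting upper or lower case.

    Trailing bases that do not fill a codon are ignored; letters
    outside A, C, G, T raise KeyError in this sketch.
    """
    dna = dna.upper()  # normalise case before looking up codons
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate_any_case("GATCGATGGGCCTATTAGGATCGAAAATCGC"))
    print(translate_any_case("gatcgatgggcctattaggatcgaaaatcgc"))
```

Both calls give the same protein string, since the case is normalised before any codon is looked up.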

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon
_______________________________________________
BioPython mailing list  -  BioPython at biopython.org
http://biopython.org/mailman/listinfo/biopython


