[Biopython-dev] GenBank parser -- first go

Brad Chapman chapmanb at arches.uga.edu
Wed Dec 6 02:31:36 EST 2000


Hi Cayte;
Thanks for trying this out!

[GenBank parser]
>    Does it strip html tags?  When I ran checkoutput.py, it produced this
> output:

[the parser doesn't like html]

>   The problem with conversions to text is that Netscape and Explorer and
> probably others use different algorithms and produce different text output.

GenBank is a flat-file format, like FASTA, so any html markup that
NCBI or anyone else adds is not part of the format; it is only there
to "beautify" the record for the web.

You should be able to get the text GenBank version of any record
without having to do a "save as text" on an html page. On the NCBI
page, there is a Text button at the top of a list of records that 
will give you the flat-file text version of a record you searched 
for using Entrez. You can then save this as text, and it'll be 
consistent between browsers. 

Once you have the flat-file version, the parser should be happier with
the file :-).
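
If it helps, here is a rough sketch of a sanity check you could run on a
file before handing it to the parser. The function names are made up for
illustration and are not part of the parser or of Biopython. It only
assumes that a flat-file record starts with a LOCUS line and that a
saved web page will contain something that looks like an html tag
somewhere.

```python
import re

# Hypothetical helpers for illustration only; not part of Biopython.

# Matches things like <html>, </pre>, <a href="...">. A '<' followed by a
# digit (as in a partial feature location such as <1..>200) is not matched.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def looks_like_html(text):
    """Return True if the text contains anything resembling an html tag."""
    return HTML_TAG.search(text) is not None


def looks_like_genbank_flat_file(text):
    """Rough check: no html tags, and the first non-blank line is a LOCUS line."""
    if looks_like_html(text):
        return False
    for line in text.splitlines():
        if line.strip():
            return line.startswith("LOCUS")
    return False  # empty input
```

This is a heuristic, not a validator: it will not catch every broken
file, but it should flag a record that was saved from the html view.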

Let me know if this doesn't help.

Brad



