[Biopython-dev] GenBank parser -- first go
Brad Chapman
chapmanb at arches.uga.edu
Wed Dec 6 02:31:36 EST 2000
Hi Cayte;
Thanks for trying this out!
[GenBank parser]
> Does it strip html tags? When I ran checkoutput.py, it produced this
> output:
[the parser doesn't like html]
> The problem with conversions to text is that Netscape and Explorer and
> probably others use different algorithms and produce different text output.
GenBank is a flat-file format, like FASTA, so any HTML markup that NCBI
(or whoever is serving the record) adds is just decoration to
"beautify" it for the web; it isn't part of the record itself.
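Because the record itself is plain text, you can quickly check whether a saved file is the real flat file or an HTML page. This is a minimal sketch. It assumes a record starts with a `LOCUS` line and ends with a `//` line. The function name and the list of tags are hypothetical, and the HTML check is only a heuristic.

```python
import re

# Heuristic: tags that show up in web pages but never in a GenBank flat file.
# (Feature locations such as "<1..100" are not matched, since a digit follows "<".)
_HTML_TAG = re.compile(r"<\s*/?\s*(html|head|body|pre|a|br|p|font)\b", re.IGNORECASE)


def looks_like_genbank_flatfile(text: str) -> bool:
    """Return True if text looks like plain GenBank flat-file output (hypothetical helper)."""
    lines = [line for line in text.splitlines() if line.strip()]
    # A flat-file record begins with its LOCUS line.
    if not lines or not lines[0].startswith("LOCUS"):
        return False
    # Any leftover HTML markup means this came from a web page.
    if _HTML_TAG.search(text):
        return False
    # Each record is terminated by a line containing only "//".
    return lines[-1].strip() == "//"
```

If this returns False for a file the parser chokes on, the file most likely still carries browser markup.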
You should be able to get the plain-text GenBank version of any record
without having to do a "save as text" on an HTML page. On the NCBI
page, there is a Text button at the top of a list of records that
will give you the flat-file text version of the records you searched
for using Entrez. You can then save this as text, and it'll be
consistent between browsers.
Once you have this, the parser should be happier with the file :-).
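The same plain-text record can also be fetched without a browser. This is a sketch that assumes NCBI's E-utilities `efetch` endpoint, where `db=nucleotide`, `rettype=gb` and `retmode=text` request the GenBank flat file. The helper names are hypothetical, and the download function needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_text_url(accession: str) -> str:
    """Build an efetch URL asking for the plain-text GenBank flat file."""
    params = {"db": "nucleotide", "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_genbank_text(accession: str) -> str:
    """Download the flat-file record as text (requires network access)."""
    with urlopen(genbank_text_url(accession)) as handle:
        return handle.read().decode("utf-8", errors="replace")
```

Writing the result of `fetch_genbank_text(...)` straight to a file gives the same text the Text button returns, with no browser-specific rendering involved.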
Let me know if this doesn't help.
Brad