[Biopython-dev] does something already do this?

Joh Johannsen johj at pacbell.net
Tue Jul 22 20:26:06 EDT 2003


Hi,

I'm new to Biopython. I've been working through the tutorial and trying out the examples, and I wrote something that made sense to me, but I'm wondering whether such a thing already exists and I just haven't found it.

There are some cases, such as the NCBI.query method, where parsable results are embedded in HTML. For example, on page 17 of the tutorial there is an NCBI.query example that retrieves results in FASTA format and then displays the HTML in a browser. On page 13 there is an example that iterates over FASTA records from a "FASTA-only" formatted file.

So I made something that merges the two: a file-handle-like object whose "readline" method returns only the FASTA-formatted lines extracted from the HTML. It looks like this:

from Bio.WWW import NCBI
from Bio import Fasta

result_handle = NCBI.query(...)
# <pre> ... </pre> is how the FASTA text is delimited in this case
xtractor = TextExtractor(result_handle, "<pre>", "</pre>")   # my own class, not part of Biopython
parser = Fasta.RecordParser()
iterator = Fasta.Iterator(xtractor, parser)
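A minimal sketch of what such a `TextExtractor` might look like, written in modern Python 3. Everything here is an assumption: the class is the hypothetical wrapper described above, not part of Biopython, and it implements only an argument-less `readline()` (plus iteration), on the assumption that this is all a line-oriented parser asks of its handle. The sample HTML is made up for the demonstration.

```python
import io
from collections import deque


class TextExtractor:
    """File-handle-like wrapper whose readline() returns only the text found
    between a start and an end delimiter, e.g. "<pre>" and "</pre>".

    Hypothetical sketch, not a Biopython class. Delimiters are matched
    case-sensitively and may share a line with other text; whitespace-only
    fragments are dropped.
    """

    def __init__(self, handle, start, end):
        self.handle = handle
        self.start = start
        self.end = end
        self.inside = False      # currently between start and end?
        self.pending = deque()   # extracted lines not yet returned

    def _emit(self, fragment):
        # Queue a non-blank fragment as one newline-terminated line.
        if fragment.strip():
            self.pending.append(fragment if fragment.endswith("\n") else fragment + "\n")

    def _fill(self):
        # Read raw lines until something is queued; False means end of input.
        while not self.pending:
            text = self.handle.readline()
            if not text:
                return False
            while text:
                if not self.inside:
                    pos = text.find(self.start)
                    if pos == -1:
                        break                      # outside: discard the rest
                    text = text[pos + len(self.start):]
                    self.inside = True
                else:
                    pos = text.find(self.end)
                    if pos == -1:
                        self._emit(text)           # whole remainder is content
                        break
                    self._emit(text[:pos])         # content before the end tag
                    text = text[pos + len(self.end):]
                    self.inside = False
        return True

    def readline(self):
        """Return the next extracted line, or "" at end of input."""
        return self.pending.popleft() if self._fill() else ""

    def __iter__(self):
        # Also allow "for line in extractor", stopping at the "" sentinel.
        return iter(self.readline, "")


SAMPLE_HTML = (
    "<html><body><p>Results</p>\n"
    "<pre>>seq1 first\n"
    "ACGT\n"
    "</pre>\n"
    "<pre>\n"
    ">seq2\n"
    "GGCC</pre></body></html>\n"
)

for line in TextExtractor(io.StringIO(SAMPLE_HTML), "<pre>", "</pre>"):
    print(line, end="")
```

If a page HTML-escapes characters (for instance `>` written as `&gt;`), each returned line would also need `html.unescape()`; this sketch leaves the text untouched.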

Anyway, the whole point of doing this is to avoid seeing HTML. I'm not at all sure whether this makes sense as a general approach, but it is fairly easy whenever the desired format is embedded in HTML between identifiable delimiters.
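The approach works as long as the parser asks nothing more of its handle than `readline()`. The sketch below illustrates that side of the idea: a FASTA record reader that accepts any object with a `readline()` method. It is hypothetical and is neither Biopython's `Fasta.Iterator` nor `Bio.SeqIO`; `fasta_records` and `ReadlineOnly` are illustrative names, and the reader does no validation.

```python
def fasta_records(handle):
    """Yield (title, sequence) tuples from any object with readline().

    Minimal sketch: blank lines and any text before the first '>' are
    skipped; sequence lines are concatenated without their newlines.
    """
    title = None
    seq_parts = []
    while True:
        line = handle.readline()
        if not line:              # "" signals end of input
            break
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:     # finish the previous record
                yield title, "".join(seq_parts)
            title = line[1:]
            seq_parts = []
        elif title is not None:
            seq_parts.append(line)
    if title is not None:             # finish the last record
        yield title, "".join(seq_parts)


class ReadlineOnly:
    """Hypothetical wrapper exposing nothing but readline(), to show that
    the reader relies on no other file method."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        return self._lines.pop(0) if self._lines else ""


records = list(fasta_records(ReadlineOnly([">seq1 first\n", "ACGT\n", "TT\n", ">seq2\n", "GGCC\n"])))
print(records)  # [('seq1 first', 'ACGTTT'), ('seq2', 'GGCC')]
```

Because the reader never touches anything but `readline()`, an HTML-stripping wrapper can be passed in place of a plain file without changing the parser.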

So my question is: if I want to work only with Python objects and never see the HTML, is this problem already solved? Or are there areas, like the one above, where Biopython methods return only HTML?

Regards,

JJ


