[Biopython-dev] Blast parsers and records
Michiel de Hoon
mjldehoon at yahoo.com
Fri Jun 4 15:55:27 UTC 2010
Michael, Peter, Sebastian, Laurent, Jose, and others,
Thanks for your comments. It looks like there are lots of things to discuss, so let's start with the easiest ones.
About converting a record to a string (point 5): I agree that using __str__ is probably not the best choice, so let's use __format__ instead, or add a "write" method. The added advantage of these is that we can print out a record in different formats (xml, text, table) by specifying the requested format as an argument.
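To make the __format__ idea concrete, here is a minimal sketch. The Record class, its keys, and the "text" and "table" format names are all made up for illustration; none of this is existing Bio.Blast code, and the real output formats would of course be much richer.

```python
class Record(dict):
    """Hypothetical Blast record; NOT the current Bio.Blast.Record API."""

    def __format__(self, format_spec):
        # format(record) and "{0}".format(record) pass an empty spec,
        # so treat that the same as the plain-text layout.
        if format_spec in ("", "text"):
            return "\n".join("%s: %s" % (key, self[key]) for key in sorted(self))
        if format_spec == "table":
            # One tab-separated line, values ordered by key name.
            return "\t".join(str(self[key]) for key in sorted(self))
        raise ValueError("unknown format %r" % format_spec)


# Made-up example values, only to show the calling convention.
record = Record(query="gi|12345", program="blastp")
print(format(record, "text"))
print("{0:table}".format(record))
```

A "write" method taking a handle and a format name could presumably be a thin wrapper around the same code.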
For point 3), maybe my wording was confusing; actually what I had in mind is the case where a given Blast program can produce different output formats (xml, text, table, etc.). This was inspired by this bug report:
http://bugzilla.open-bio.org/show_bug.cgi?id=2176
In my mind, the different output formats are just different intermediates; in essence they describe the same result and should therefore be stored in the same class. So, if I run blastp, save the result as XML, and parse it, I'd expect to get the same class as when I run blastp and save and parse the output in table format. The only difference is that in the latter case some information may be missing, because the table format does not contain it. Does that sound acceptable?
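As a rough sketch of what I mean, assuming a single read() function with a format argument as in point 1: the function name, the Record class, the keys, and both toy inputs below are hypothetical. In particular, the XML here is a simplified stand-in and does not follow the NCBI DTD.

```python
import xml.etree.ElementTree as ET


class Record(dict):
    """Hypothetical format-independent Blast record."""


def read(text, fmt):
    """Parse toy Blast output in the given format into a Record."""
    record = Record()
    if fmt == "xml":
        # Toy XML: every child element becomes one key of the record.
        for child in ET.fromstring(text):
            record[child.tag] = child.text
    elif fmt == "table":
        # Toy table: a single line with only query and hit identifiers.
        record["query"], record["hit"] = text.strip().split("\t")[:2]
    else:
        raise ValueError("unknown format %r" % fmt)
    return record


xml_text = "<record><program>blastp</program><query>q1</query><hit>s1</hit></record>"
table_text = "q1\ts1\n"

from_xml = read(xml_text, "xml")
from_table = read(table_text, "table")

# Same class either way; the table-derived record just has fewer keys.
print(type(from_xml) is type(from_table))
print(sorted(set(from_xml) - set(from_table)))
```

Code using the record would then check for a key (or catch a KeyError) instead of having to know which parser produced it.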
--Michiel.
--- On Fri, 5/28/10, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> From: Michiel de Hoon <mjldehoon at yahoo.com>
> Subject: [Biopython-dev] Blast parsers and records
> To: biopython-dev at biopython.org
> Date: Friday, May 28, 2010, 11:23 PM
> Hi everybody,
>
> With Biopython 1.54 out (thanks Peter!), and NCBI
> encouraging the use of its new Blast+ suite of Blast programs,
> maybe this is a good time to tackle some older bugs related
> to Blast output parsing in Biopython:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2176
> (inconsistencies in the output of different Blast parsers)
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2929
> (inconsistencies between Psi-blast parsers)
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2319
> (parsing Blast table output)
>
> and more generally think about the design of the Blast
> record class and Blast parsing. In my opinion, these are the
> major issues:
>
> 1) Blast parsers are located in several modules
> (Bio.Blast.NCBIXML, Bio.Blast.NCBIStandalone,
> Bio.Blast.ParseBlastTable). I think we should have one
> read() function and one parse() function under Bio.Blast,
> with arguments specifying which format the Blast output is
> in.
>
> 2) Blast records produced by any of the parsers should be
> consistent with each other. As XML output by blast and
> psi-blast follow the same DTD, we should be able to
> represent both by a single Record class.
>
> 3) Different parsers should store information in this
> Record class in the same way.
>
> 4) The current Blast record stores its information in
> attributes. If you use Bio.Entrez to parse Blast XML output
> (Biopython 1.54 contains the necessary DTDs to do so), the
> information is stored in dictionaries. This has some
> advantages. For example, it allows you to use record.keys()
> to find out what the record contains. Ideally, I think that
> a Blast Record class should inherit from a dictionary.
>
> 5) We should be able to print a Blast record object to
> generate output that is close to the plain-text output
> generated by blast. This would allow us to generate and
> store Blast output as XML, and to convert the output to
> plain-text to make it more human-readable.
>
> 6) The current Blast record inherits from
> Bio.Blast.Record.Header, Bio.Blast.Record.DatabaseReport,
> and Bio.Blast.Record.Parameters. I don't see the rationale
> for this inheritance, and I think we should remove it.
>
> Any comments, suggestions (in particular about my proposal
> to have a Blast Record class that inherits from a
> dictionary)? Btw, to avoid breaking scripts, I propose that
> any changes to the Blast record and parser are implemented
> separately from the existing parsers and record, and to
> leave those untouched.
>
> --Michiel.