[Biopython-dev] [biopython] TAIR (Arabidopsis) sequence retreival module (#132)

Tue Dec 18 13:46:06 UTC 2012

Hi Peter, Chris and the mailing list,

Thanks very much for the feedback!

> Query: It isn't clear to me (from a first read) what MultipartPostHandler
is needed for.
The arabidopsis.org server form requires the content-type to be a multipart
form, not a urlencoded form, which the standard urllib2 does not handle. I
could write a custom handler, however when writing the module I found
MultipartPostHandler, and figured I should use that. I may be wrong, but
couldn't figure out any other way of doing it.

>Minor: The module's docstring should start with a one line summary then a
blank line (see PEP8 style guide).
>Note: Since your unit test requires internet access, it should include
these lines to work nicely in our testing framework (which allows the tests
needing network access to be skipped)
I'll fix the module docstring and requires_internet check tomorrow.

>Why does the NCBI code exist given it is such a thin wrapper round the
Bio.Entrez code - the module would be a lot simpler if it was just a
wrapper for www.arabidopsis.org alone.
The NCBI functions exist to get genbank files for AGIs, as TAIR's
sequence retrieval only gives fasta files, so if users need/want the extra
metadata a genbank file gives, they can use this module. As you've said,
this is a *very* thin wrapper, so would it be better to just provide the
mapping dicts in Bio.TAIR._ncbi for people to use however they see fit?

>Query: Why do your methods return SeqRecord objects? Is this because the
handle might return FASTA with a non-FASTA header which must be stripped
off?
SeqRecord handles were returned for two reasons, the first being as you
said that the raw return text is not always a valid fasta file, despite my
efforts to trim extraneous text. The latter is simply that is what i
required when writing it, and i could not think of a better way of
returning it. (and I thought that the return of a SeqRecord allowed
"pythonic" processing of results, a la the test suite). Again happy for any
suggestions

>Why does classes TAIRDirect and TAIRNCBI exist? Wouldn't module level
functions be simpler (or at least, consistent with other modules like
Bio.Entrez)
>Style: Why introduce the mode argument and two magic values NCBI_RNA and
NCBI_PROTEIN?
The honest answer to both of these is personal choice. If consistency is an
issue i will reimplement as module-level functions and textual arguments
respectively.

Regarding the placement of modules, i'm happy for it to go wherever. I
would imagine that there are other niche web interface "getters" such as
this, and think your suggestion sounds great, although i can't think what
we could call it. Perhaps Bio.Web.TAIR?

Regards
Kevin Murray

On 18 December 2012 10:34, Peter Cock <notifications at github.com> wrote:

> Hi Kevin,
>
> Thanks for your code submission. I've not had a chance to play with it,
> but I do have some comments/queries - some of which are perhaps just style
> issues.
>
> Note: Since your unit test requires internet access, it should include
> these lines to work nicely in our testing framework (which allows the tests
> needing network access to be skipped):
>
> import requires_internet
> requires_internet.check()
>
> Query: It isn't clear to me (from a first read) what MultipartPostHandler
> is needed for.
>
> Minor: The module's docstring should start with a one line summary then a
> blank line (see PEP8 style guide).
>
> Query: Why does classes TAIRDirect and TAIRNCBI exist? Wouldn't module
> level functions be simpler (or at least, consistent with other modules like
> Bio.Entrez)?
>
> Query: Why do your methods return SeqRecord objects? Is this because the
> handle might return FASTA with a non-FASTA header which must be stripped
> off?
>
> Style: Why introduce the mode argument and two magic values NCBI_RNA and
> NCBI_PROTEIN?
>
> In fact I would go further and ask why does the NCBI code exist given it
> is such a thin wrapper round the Bio.Entrez code - the module would be a
> lot simpler if it was just a wrapper for www.arabidopsis.org alone.
>
> I'm also not sure about the namespace Bio.TAIR, the old Bio.www namespace
> might have been better but that was deprecated a while back, and the other
> semi-natural fit under Biopython's old OBDA effort is also defunct
> (attempting to catalogue a collection of sequence resources, see
> http://obda.open-bio.org for background if curious). The namespace issue
> at least would be worth bringing up on the dev mailing list... especially
> if you can think of many other examples like this for specialised resources.
>
> Regards,
>
> Peter
>
> —
> Reply to this email directly or view it on GitHub<https://github.com/biopython/biopython/pull/132#issuecomment-11467257>.
>
>