[Biopython-dev] test_SeqIO_online failure

Peter Cock p.j.a.cock at googlemail.com
Mon May 27 22:43:19 UTC 2013


On Mon, May 27, 2013 at 10:38 PM, Phillip Garland <pgarland at gmail.com> wrote:
> Hi Peter,
>
>> I'd noticed this on Friday but hadn't looked into why the sequence was
>> different (and sometimes Entrez errors are transient). Thanks for
>> exploring this :)
>>
>> Would you like to submit a pull request to update test_SeqIO_online.py
>> or should I just go ahead and change the rettype?
>>
>> It would be sensible to review all the Entrez examples in the Tutorial,
>> to perhaps make more use of 'gbwithparts' rather than 'gb'?
>>
>> Thanks,
>>
>> Peter
>
> The slight problem with just replacing "gb" with "gbwithparts" is that
> SeqIO doesn't take "gbwithparts" as an option for the file format. So
> in test_SeqIO_online.py, you have this code:
>
>             handle = Entrez.efetch(db=database, id=entry, rettype=f, retmode="text")
>             record = SeqIO.read(handle, f)
>
> which is a natural way to write the test (because it tests fasta and
> genbank files), but will currently fail if f is "gbwithparts", b/c
> SeqIO doesn't accept "gbwithparts" as a file format specifier. My
> guess is that most existing code hardcodes the rettype and SeqIO file
> format specifier, so we could just test for gbwithparts prior to
> calling SeqIO.read:
>
>             handle = Entrez.efetch(db=database, id=entry, rettype=f, retmode="text")
>             if f == "gbwithparts":
>                 f = "gb"
>             record = SeqIO.read(handle, f)
>
> I submitted a pull request with a minimal patch that does this.

That's good for now :)
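
For scripts outside the test suite, the same translation could be kept
in one place rather than repeated before each SeqIO.read call. This is
only a rough sketch: RETTYPE_TO_SEQIO_FORMAT, seqio_format and
fetch_record are hypothetical names, not part of Biopython, and the
mapping covers only the one case discussed in this thread.

```python
# Hypothetical helper, not part of Biopython: translate an Entrez rettype
# into the format name that SeqIO accepts.
RETTYPE_TO_SEQIO_FORMAT = {
    "gbwithparts": "gb",  # full GenBank record, parsed as plain GenBank
}


def seqio_format(rettype):
    """Return the SeqIO format name to use for a given Entrez rettype."""
    # Anything not listed (e.g. "fasta", "gb") is passed through unchanged.
    return RETTYPE_TO_SEQIO_FORMAT.get(rettype, rettype)


def fetch_record(database, entry, rettype):
    """Fetch one record from Entrez and parse it with SeqIO."""
    # Imported here so the mapping above can be used without Biopython.
    # Entrez.email should be set by the caller before fetching.
    from Bio import Entrez, SeqIO

    handle = Entrez.efetch(db=database, id=entry, rettype=rettype,
                           retmode="text")
    try:
        return SeqIO.read(handle, seqio_format(rettype))
    finally:
        handle.close()
```

The rettype is passed to Entrez unchanged and only the SeqIO side is
translated, so no new alias is needed in SeqIO itself.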

> For code like this, it would be cleaner if SeqIO accepted
> "gbwithparts" as an alias for "genbank", just like "gb" is, but I
> don't know if it's a common enough pattern to bother.

That makes some sense for parsing files, but all those aliases
would cause confusion with writing GenBank files.

> If records like this are becoming more common, then "gbwithparts"
> should be clearly documented in the biopython tutorial, though
> "gbwithparts" isn't clearly explained in NCBI's Entrez docs AFAICT. It
> seems safer to always use "gbwithparts" at this point, at least when
> you want the sequence.

Definitely - if the NCBI moves to using 'gb' as the light style
without the sequence then many people will just want to use
'gbwithparts' as their default when scripting this sort of thing.

Thanks,

Peter


