[Biopython-dev] test_SeqIO_online failure

Peter Cock p.j.a.cock at googlemail.com
Mon May 27 05:05:44 EDT 2013


Hi Phillip,

On Mon, May 27, 2013 at 3:27 AM, Phillip Garland <pgarland at gmail.com> wrote:
> The fasta formatted record is fine, the problem seems to come after
> requesting and reading the genbank-formatted record for the protein
> with GI:16130152.
>
> It looks like the record was modified a few days ago:
>
> LOCUS       NP_416719                367 aa            linear   CON 24-MAY-2013
>
> and ends with
>
> CONTIG      join(WP_000865568.1:1..367)\n//\n\n'
>
> instead of
>
> ORIGIN and the sequence data.
>
> Is this a problem with the genbank record that should be reported to
> NCBI, or is SeqIO supposed to handle the record as it is by fetching
> the sequence from the linked contig, or is the test doing the wrong
> thing by using rettype="gb" instead of rettype="gbwithparts"?

Interesting. It looks like the NCBI made a change to Entrez:
previously this record included the sequence with rettype="gb",
but now we have to ask for it explicitly with the longer
rettype="gbwithparts". My guess is that this is now happening on
more records.
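
A quick way to spot an affected record without parsing it is to look
for a CONTIG line with no ORIGIN section, as in the output you quoted.
This is only a rough sketch: the function names are made up for
illustration, and the sample text is a trimmed record built from the
two lines quoted above, not a real download.

```python
def has_inline_sequence(genbank_text):
    """Return True if the GenBank text contains an ORIGIN section."""
    return any(line.startswith("ORIGIN")
               for line in genbank_text.splitlines())


def is_contig_only(genbank_text):
    """Return True if the record has a CONTIG line but no ORIGIN section."""
    has_contig = any(line.startswith("CONTIG")
                     for line in genbank_text.splitlines())
    return has_contig and not has_inline_sequence(genbank_text)


# Trimmed, illustrative record (hypothetical: only the quoted lines)
sample = (
    "LOCUS       NP_416719                367 aa            linear"
    "   CON 24-MAY-2013\n"
    "CONTIG      join(WP_000865568.1:1..367)\n"
    "//\n"
)

if is_contig_only(sample):
    print("No sequence data; try rettype='gbwithparts'")
```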

Note that it does not affect all records. Consider this example from
our Tutorial, which seems unchanged:

  from Bio import Entrez
  Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
  handle = Entrez.efetch(db="nucleotide", id="186972394",
                         rettype="gb", retmode="text")
  print(handle.read())

Curious.

> Here's the test output:
>
> pgarland at cradle:~/Hacking/Source/Biology/biopython/Tests$ python
> run_tests.py test_SeqIO_online.py
> Python version: 2.7.5 (default, May 20 2013, 11:51:12)
> [GCC 4.7.3]
> Operating system: posix linux2
> test_SeqIO_online ... FAIL
> ======================================================================
> FAIL: test_protein_16130152 (test_SeqIO_online.EntrezTests)
> Bio.Entrez.efetch(protein, 16130152, ...)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pgarland/Hacking/Source/Biology/biopython/Tests/test_SeqIO_online.py",
> line 77, in <lambda>
>     method = lambda x : x.simple(d, f, e, l, c)
>   File "/home/pgarland/Hacking/Source/Biology/biopython/Tests/test_SeqIO_online.py",
> line 65, in simple
>     self.assertEqual(seguid(record.seq), checksum)
> AssertionError: 'NT/aFiTXyD/7KixizZ9sq2FcniU' != 'fCjcjMFeGIrilHAn6h+yju267lg'
>
> ----------------------------------------------------------------------
> Ran 1 test in 10.010 seconds
>
> FAILED (failures = 1)

I'd noticed this on Friday but hadn't looked into why the sequence was
different (and sometimes Entrez errors are transient). Thanks for
exploring this :)
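
For anyone following along, the assertion that fails compares a SEGUID
checksum of the parsed sequence against a stored value. Biopython has
a seguid function in Bio.SeqUtils.CheckSum; the version below is a
standard-library-only sketch of the same idea (SHA-1 of the upper-cased
sequence, base64 encoded, trailing "=" stripped). The helper names are
hypothetical.

```python
import base64
import hashlib


def seguid_sketch(sequence):
    """Return a SEGUID-style checksum for a sequence string."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    # A 20-byte SHA-1 digest gives 28 base64 characters with one "=" pad
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def sequence_matches(sequence, expected_checksum):
    """Return True if the sequence hashes to the expected checksum."""
    return seguid_sketch(sequence) == expected_checksum
```

If the record comes back without its sequence data, whatever the
parser puts in record.seq will hash to a different value, which is
consistent with the mismatch in the test output.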

Would you like to submit a pull request to update test_SeqIO_online.py
or should I just go ahead and change the rettype?

It would also be sensible to review all the Entrez examples in the
Tutorial, and perhaps use 'gbwithparts' rather than 'gb' where the
sequence itself is needed.
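
As a rough sketch of what a revised example might look like (untested
against the live service; the helper names, the want_sequence flag and
the placeholder email are assumptions for illustration, and the fetch
itself needs Biopython and network access):

```python
def efetch_kwargs(db, identifier, want_sequence=True):
    """Build keyword arguments for Entrez.efetch (hypothetical helper)."""
    rettype = "gbwithparts" if want_sequence else "gb"
    return dict(db=db, id=identifier, rettype=rettype, retmode="text")


def fetch_record(db, identifier, email):
    """Fetch one GenBank record from Entrez and parse it with SeqIO."""
    from Bio import Entrez, SeqIO  # imported here; requires Biopython

    Entrez.email = email  # Always tell NCBI who you are
    handle = Entrez.efetch(**efetch_kwargs(db, identifier))
    try:
        return SeqIO.read(handle, "gb")
    finally:
        handle.close()


# Example call (not run here, since it contacts the NCBI):
# record = fetch_record("protein", "16130152", "A.N.Other@example.com")
```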

Thanks,

Peter

