Hi Phillip,

On Mon, May 27, 2013 at 3:27 AM, Phillip Garland <pgarland at gmail.com> wrote:
> The fasta formatted record is fine, the problem seems to come after
> requesting and reading the genbank-formatted record for the protein
> with GI:16130152.
> It looks like the record was modified a few days ago:
> LOCUS       NP_416719                367 aa            linear   CON 24-MAY-2013
> and ends with
> CONTIG      join(WP_000865568.1:1..367)\n//\n\n'
> instead of
> ORIGIN and the sequence data.
> Is this a problem with the genbank record that should be reported to
> NCBI, or is SeqIO supposed to handle the record as it is by fetching
> the sequence from the linked contig, or is the test doing the wrong
> thing by using rettype="gb" instead of rettype="gbwithparts"?

Interesting. It looks like the NCBI made a change to Entrez: where
previously this record included the sequence with rettype="gb",
we now have to ask for it explicitly with the longer
rettype="gbwithparts". My guess is that this is now happening on
more records.
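
One way to cope with this on the client side would be to check whether
the returned text has an ORIGIN section and, if it only has a CONTIG
line, ask again with rettype="gbwithparts". The following is only a
rough sketch: the helper names and the `fetch` callable are made up for
illustration and are not part of Biopython.

```python
def has_sequence_data(genbank_text):
    """Return True if the GenBank flat-file text has an ORIGIN section."""
    return any(line.startswith("ORIGIN") for line in genbank_text.splitlines())


def is_contig_only(genbank_text):
    """Return True if the text has a CONTIG line but no ORIGIN section."""
    has_contig = any(
        line.startswith("CONTIG") for line in genbank_text.splitlines()
    )
    return has_contig and not has_sequence_data(genbank_text)


def fetch_with_fallback(fetch, db, uid):
    """Fetch with rettype 'gb', retrying with 'gbwithparts' if needed.

    `fetch` is a hypothetical callable taking (db, uid, rettype) and
    returning the record as a string; it is an assumption of this sketch.
    """
    text = fetch(db, uid, "gb")
    if is_contig_only(text):
        # The short form only points at a contig, so request the full record.
        text = fetch(db, uid, "gbwithparts")
    return text
```

With Bio.Entrez, `fetch` could be a small wrapper that calls
Entrez.efetch(db=db, id=uid, rettype=rettype, retmode="text") and
returns handle.read().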

Note that it does not affect all records. Consider this example from our
Tutorial, which seems unchanged:

  from Bio import Entrez
  Entrez.email = "A.N.Other@example.com"     # Always tell NCBI who you are
  handle = Entrez.efetch(db="nucleotide", id="186972394",
                         rettype="gb", retmode="text")
  print(handle.read())


> Here's the test output:
> pgarland at cradle:~/Hacking/Source/Biology/biopython/Tests$ python
> run_tests.py test_SeqIO_online.py
> Python version: 2.7.5 (default, May 20 2013, 11:51:12)
> [GCC 4.7.3]
> Operating system: posix linux2
> test_SeqIO_online ... FAIL
> ======================================================================
> FAIL: test_protein_16130152 (test_SeqIO_online.EntrezTests)
> Bio.Entrez.efetch(protein, 16130152, ...)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pgarland/Hacking/Source/Biology/biopython/Tests/test_SeqIO_online.py",
> line 77, in <lambda>
>     method = lambda x : x.simple(d, f, e, l, c)
>   File "/home/pgarland/Hacking/Source/Biology/biopython/Tests/test_SeqIO_online.py",
> line 65, in simple
>     self.assertEqual(seguid(record.seq), checksum)
> AssertionError: 'NT/aFiTXyD/7KixizZ9sq2FcniU' != 'fCjcjMFeGIrilHAn6h+yju267lg'
> ----------------------------------------------------------------------
> Ran 1 test in 10.010 seconds
> FAILED (failures = 1)

I'd noticed this on Friday but hadn't looked into why the sequence was
different (and sometimes Entrez errors are transient). Thanks for
exploring this :)

Would you like to submit a pull request to update test_SeqIO_online.py
or should I just go ahead and change the rettype?

It would be sensible to review all the Entrez examples in the Tutorial.
Perhaps they should make more use of 'gbwithparts' rather than 'gb'?
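
As a starting point for that review, something like this could list the
lines that ask for plain 'gb'. It is just a sketch: find_plain_gb is a
made-up helper name, and it works on whatever text you pass it rather
than on any particular file.

```python
import re

# Matches rettype="gb" or rettype='gb', but not rettype="gbwithparts".
RETTYPE_GB = re.compile(r"""rettype\s*=\s*(["'])gb\1""")


def find_plain_gb(text):
    """Return (line_number, stripped_line) pairs that use rettype 'gb'."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if RETTYPE_GB.search(line)
    ]
```

Each hit would still need checking by hand, since some examples may be
fine with the shorter record.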



