[emboss-dev] Regression in GenBank/GenPept parsing?

Peter biopython at maubp.freeserve.co.uk
Tue Jul 21 13:18:01 UTC 2009


On Tue, Jul 21, 2009 at 11:40 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>>> My next task (once I've made sure your bugs are fixed) is to
>>> regenerate all the tables of formats.
>>
>> Great. This may save you having to answer my next question,
>> which was: could you expand on what EMBOSS considers to be
>> the differences between "genbank", "genpept" and "refseqp" as
>> file formats? Of course, I may come up with further questions ;)
>
> Oh, further questions please! We love answering them.
>
> GenPept format expects to find 9 fields on the LOCUS line.
> RefseqP format expects only 8.
>
> The difference is that GenPept format includes the original GenPept locus name.

Which 8 or 9 fields?
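
To make the question concrete, here is a rough Python sketch that simply
counts the whitespace-separated tokens on a LOCUS line. The helper name
count_locus_fields and the sample line are made up for illustration, and
EMBOSS may well count its fields differently (for instance by fixed
column positions), so this is not a description of what EMBOSS does:

```python
def count_locus_fields(line):
    """Return the number of whitespace-separated tokens on a LOCUS line.

    Assumption: a "field" here is any run of non-whitespace characters,
    with the LOCUS keyword itself counted as the first field.
    """
    if not line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line: %r" % line)
    return len(line.split())


# Hypothetical LOCUS line, invented for illustration only.
sample = "LOCUS       EXAMPLE_1                452 aa            linear   BCT 01-JAN-2009"
print(count_locus_fields(sample))
```

Running this over the LOCUS lines of a few GenPept and RefSeq protein
files would show whether a naive token count matches the 9 versus 8
described above.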

> We may try to merge them one day. If we do, we would keep the format
> names but use one parser.

That makes sense.

> Your GenPept (refseqp) format problem will be fixed in a patch. It was
> fine for one sequence but needed to rebuffer the input file to work with
> multiple input sequences.

Grand. Will there be an EMBOSS 6.1.1 in a week or so then (addressing
this, the FASTQ @ problem, and any other minor issues)?

> Meanwhile, could you tar up the biopython test data and scripts
> http://biopython.open-bio.org/SRC/biopython/Tests/ and I will try
> running the same data through EMBOSS to see what issues we
> can find.

http://biopython.open-bio.org/SRC/biopython/ is just a dump from
our repository (hourly or something). If you just download the latest
Biopython source code, this will have all the unit test files etc:
http://biopython.org/DIST/biopython-1.51b.tar.gz

You could also grab the latest code from CVS or github - further
details on request.

Ask if you need clarification on what any of the test data files are
for. In some cases the Tests/test_*.py files have informative
comments, so searching them for the data file name may help.
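
Something along these lines would do that search. It is a rough sketch
only: find_mentions is a helper name I have made up, "example.gb" is a
placeholder rather than a real test file, and it assumes you run it
from the top of the unpacked Biopython source tree:

```python
import glob
import os


def find_mentions(tests_dir, needle):
    """Yield (script, line_number, text) for lines that mention needle.

    Only files matching test_*.py directly inside tests_dir are searched.
    """
    pattern = os.path.join(tests_dir, "test_*.py")
    for path in sorted(glob.glob(pattern)):
        # errors="replace" avoids failing on odd bytes in old scripts.
        with open(path, errors="replace") as handle:
            for number, text in enumerate(handle, start=1):
                if needle in text:
                    yield os.path.basename(path), number, text.rstrip()


# "example.gb" is a placeholder; substitute the data file you care about.
for script, number, text in find_mentions("Tests", "example.gb"):
    print("%s:%d: %s" % (script, number, text))
```

A plain grep over Tests/test_*.py does the same job if you prefer the
command line.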

Peter C.


