[Bioperl-l] Test related Suggestions
Chris Fields
cjfields at uiuc.edu
Thu Jul 5 11:54:42 EDT 2007
On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote:
> Quoting Chris Fields <cjfields at uiuc.edu>:
>
>>
>> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>>
>>>
>>> One more suggestion:
>>>
>>> It would be extremely useful if we had a standard way of testing
>>> that when a file is read into a bioperl object and then written
>>> out again in the same format, the input and output files are
>>> identical. If not, the test should show where the differences
>>> start (showing all the differences would just clutter the screen).
>>>
>>> This standard method/subroutine should be used to test all sequence
>>> and other text file IO.
>>>
>>> Any takers?
>>>
>>> -Heikki
>> ...
>>
>> I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t
>> that do some checking, I think, but something like this would be of
>> use. However, what if the test file is old (as many in t/data are)
>> and the format has changed? GenBank and EMBL, for instance, have
>> gone through several changes to format.
>>
>> chris
>>
>>
>
> Is there any way to distinguish the variants other than by
> layout, e.g. a version number or the like?
>
> Nath

I don't think so; this veers back into the whole validation issue
(i.e. does the record fit certain specifications). There are
examples of seq records from different sources which bioperl is
expected to parse, for example Ensembl GenBank records. Some of
those have feature tags or annotation fields which may not appear in
the output when using write_seq().

I think it's less important to reproduce the input exactly in the
output than to have the data represented in a Bio::Seq object (or
any other Bio* instance) in a consistent manner, and to be able to
incorporate new fields (such as the recent addition of genome
projects) transparently. The latter is hard to do with the current
genbank parser (you have to specifically code for it), but it is a
bit easier to do with the driver-handler model I'm working on.

chris
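
For illustration only, here is a minimal sketch of the "show where the differences start" check that Heikki describes. It is written in Python rather than Perl, does not use any BioPerl API, and the function name first_difference is hypothetical. It assumes the caller has already read the original file and the re-written output into two strings.

```python
from itertools import zip_longest


def first_difference(expected_text, actual_text):
    """Return (line_number, expected_line, actual_line) for the first line
    that differs, or None if the two texts are identical.

    Hypothetical helper, not part of any existing test suite.
    A side that has run out of lines is reported as None.
    """
    # keepends=True so that line-ending differences are reported too
    expected_lines = expected_text.splitlines(keepends=True)
    actual_lines = actual_text.splitlines(keepends=True)

    pairs = zip_longest(expected_lines, actual_lines)
    for number, (expected, actual) in enumerate(pairs, start=1):
        if expected != actual:
            # Stop at the first mismatch instead of listing every difference
            return number, expected, actual
    return None
```

A round-trip test could then fail with just the returned line number and the two offending lines, which keeps the output short. It would still report a difference for records whose format has changed since the test file was created, so it only fits files that are expected to round-trip exactly.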