[Biopython-dev] SeqRecord id behavior
Peter Cock
p.j.a.cock at googlemail.com
Tue May 29 18:02:20 EDT 2012
On Tue, May 29, 2012 at 10:32 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> Hi all,
>
> I have some questions/comments regarding how SeqRecord handles various
> arguments.
>
>>>> print SeqRecord(seq="G")
> ID: <unknown id>
> Name: <unknown name>
> Description: <unknown description>
> Number of features: 0
> 'G'
>>>> print SeqRecord(seq="G", id=2)
> TypeError: id argument should be a string
>>>> print SeqRecord(seq="G", id=None)
> Name: <unknown name>
> Description: <unknown description>
> Number of features: 0
> 'G'
>
> 1. Couldn't a sequence id hypothetically be an integer? In which
> case, it could be converted to a string.
We want to be able to assume a string for things like the
string formatting operators used in SeqRecord output
(dealing with None as a special case is annoying enough).
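For anyone holding integer identifiers, the conversion could be done on the calling side before the record is built. A minimal sketch, assuming Python 3; the helper name coerce_id is hypothetical and not part of Biopython:

```python
def coerce_id(value):
    """Return an ID suitable for passing as a string ID.

    Hypothetical helper (not part of Biopython). Strings and None
    pass through unchanged, integers are converted to their decimal
    string form, and anything else is rejected.
    """
    if value is None or isinstance(value, str):
        return value
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    raise TypeError("id should be a string or an integer")
```

The result could then be passed along as id=coerce_id(2), which keeps the string assumption inside SeqRecord intact.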
> 2. Regarding this comment on line 180:
> https://github.com/biopython/biopython/blob/master/Bio/SeqRecord.py#L180
>
> if id is not None and not isinstance(id, basestring):
> #Lots of existing code uses id=None... this may be a bad idea.
> raise TypeError("id argument should be a string")
>
> Why might that be a bad idea? id=None will currently set self.id to
> None, so it doesn't affect the type checking.
Using None for the ID prevents code from assuming it is a string
(but see below).
> 3. Is it desirable to be able to remove the id from the __str__
> representation,
No - the sequence and the ID are the two most important
bits of a SeqRecord.
> or would it be more consistent to do this:
>
> if id == "<unknown id>" or id is None:
> self.id = "<unknown id>"
> else:
> (typecheck here)
>
> Lenna
I never liked the fact that "<unknown id>" has a space in it.
This breaks the assumption of loads of file formats. Many
file formats don't like an empty ID, so maybe "<unknown-id>"
is better. On the other hand, it is fairly common in Python
to use None as a missing data representation... which
currently the SeqRecord allows you to do.
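To make the two options concrete, here is a rough sketch of the normalization Lenna proposed, combined with the hyphenated placeholder. It assumes Python 3; the names UNKNOWN_ID, normalize_id and has_whitespace are hypothetical, and none of this is what SeqRecord currently does:

```python
# Hyphenated placeholder floated above; an assumption for this sketch,
# not Biopython's actual default.
UNKNOWN_ID = "<unknown-id>"


def normalize_id(id_value):
    """Map a missing ID to the placeholder, else require a string."""
    if id_value is None or id_value == "<unknown id>":
        return UNKNOWN_ID
    if not isinstance(id_value, str):
        raise TypeError("id argument should be a string")
    return id_value


def has_whitespace(id_value):
    """Return True if the ID contains any whitespace character."""
    return any(ch.isspace() for ch in id_value)
```

With this, has_whitespace("<unknown id>") is true while has_whitespace(UNKNOWN_ID) is false, which is the difference that matters for formats that do not accept whitespace in an identifier. The cost is that code relying on id=None as a missing-data marker would no longer see None.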
Note these SeqRecord defaults predate Bio.SeqIO - if we
didn't have to worry about breaking existing code I would
much rather make the ID a mandatory SeqRecord argument.
Peter