[Biopython-dev] syntax of indices for future Alignment object
Peter
biopython-dev at maubp.freeserve.co.uk
Sat Jul 28 10:04:04 UTC 2007
Michiel de Hoon wrote:
> Peter wrote:
>> I've got a working bit of code put together now which I'll attach to
>> bug 1944 soon.
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
> For the most part, I agree with the functionality in this patch. I have
> three suggestions though:
>
> >>> aln = Alignment(alphabet)
> # Suggestion 1: We should allow creating an Alignment without specifying
> # an alphabet
That would mean changing the existing __init__ from:

    def __init__(self, alphabet):

to something like:

    def __init__(self, alphabet=single_letter_alphabet):

with this import statement added:

    from Bio.Alphabet import single_letter_alphabet

This seems like a good idea, and shouldn't break any existing code either.
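
To illustrate why a default argument is backwards compatible, here is a
minimal sketch. It is not the patch from bug 1944: the Alignment class is
a toy, and SingleLetterAlphabet is a hypothetical stand-in for the object
imported from Bio.Alphabet, so that the example runs without Biopython.

```python
# Hypothetical stand-in for Bio.Alphabet.single_letter_alphabet, so the
# sketch runs without Biopython installed.
class SingleLetterAlphabet:
    def __repr__(self):
        return "SingleLetterAlphabet()"


single_letter_alphabet = SingleLetterAlphabet()


class Alignment:
    """Toy alignment; only the constructor signature matters here."""

    def __init__(self, alphabet=single_letter_alphabet):
        self._alphabet = alphabet
        self._records = []


# Existing callers that pass an alphabet explicitly keep working...
old_style = Alignment(single_letter_alphabet)
# ...while new callers may omit it and get the generic default.
new_style = Alignment()
```

Both calls end up with the same alphabet object, so code written against
the old one-argument signature is unaffected.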
> >>> aln.add_sequence("seq1", "ATCGTTGC")
> >>> aln.add_sequence("seq2", "ATCCTTGC")
> >>> aln.add_sequence("seq3", "ATCCGTGC")
> >>> aln[0]
> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>',
> name='<unknown name>', description='seq1', dbxrefs=[])
> # Suggestion 2: I would expect "seq1" as the id rather than the description
I agree with you here. This is the historic behaviour of the
add_sequence method, which actually creates a SeqRecord from the strings
it is given. I would suggest it populate record.id, but for backwards
compatibility also keep populating record.description in case anyone is
still relying on that.
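
A rough sketch of that behaviour, under stated assumptions: Record is a
hypothetical stand-in for SeqRecord, and add_sequence is reduced to its
first two arguments (any optional arguments are left out here).

```python
class Record:
    """Hypothetical stand-in for SeqRecord with just three attributes."""

    def __init__(self, seq, id="<unknown id>",
                 description="<unknown description>"):
        self.seq = seq
        self.id = id
        self.description = description


class Alignment:
    """Toy alignment used only to show the id/description handling."""

    def __init__(self):
        self._records = []

    def add_sequence(self, descriptor, sequence):
        # Populate the id (the suggested behaviour) and keep filling in
        # the description as well, for code that still reads it there.
        record = Record(sequence, id=descriptor, description=descriptor)
        self._records.append(record)


aln = Alignment()
aln.add_sequence("seq1", "ATCGTTGC")
first = aln._records[0]
```

Here first.id and first.description both hold "seq1", so old code reading
the description and new code reading the id see the same value.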
We could also add an add_record method to the alignment object which
takes a SeqRecord, plus an optional weight (and start and end?). Marc
Colosimo also made this point on bug 1944 (although I don't like his
mixed case method name).
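
One possible shape for such a method is sketched below. Everything in it
is an assumption for illustration: Record stands in for SeqRecord, the
signature add_record(record, weight=1.0) is hypothetical, and storing the
weight in the record's annotations dictionary is just one option.

```python
class Record:
    """Hypothetical stand-in for SeqRecord (only what the sketch needs)."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id
        self.annotations = {}


class Alignment:
    """Toy alignment with a hypothetical add_record method."""

    def __init__(self):
        self._records = []

    def add_record(self, record, weight=1.0):
        # Assumption: the weight is kept in the record's annotations.
        record.annotations["weight"] = weight
        self._records.append(record)


aln = Alignment()
aln.add_record(Record("ATCGTTGC", id="seq1"))
aln.add_record(Record("ATCCTTGC", id="seq2"), weight=0.5)
```

The caller builds the record, so its id, name and description arrive
intact instead of being reconstructed from a single descriptor string.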
> >>> aln[:2]
> <Bio.Align.Generic.Alignment instance at 0x10aaeb8>
> # OK
> >>> aln[:,4]
> 'TTG'
> # OK
> >>> aln[2,:]
> <Bio.Align.Generic.Alignment instance at 0x105efd0>
> # Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment
> # consisting of a single sequence doesn't make much sense.
I'll have a closer look, but as aln[2] returns a single SeqRecord maybe
aln[2,:] should do that too - rather than returning a string.
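
The indexing rules discussed in this thread can be sketched with a toy
__getitem__. This is not the code attached to bug 1944: Record is a
hypothetical stand-in for SeqRecord, sequences are plain strings, and the
integer-row case follows the idea above of returning a record rather than
a string.

```python
class Record:
    """Hypothetical stand-in for SeqRecord."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class Alignment:
    """Toy alignment showing one possible set of indexing rules."""

    def __init__(self, records=None):
        self._records = list(records) if records is not None else []

    def add_sequence(self, descriptor, sequence):
        self._records.append(Record(descriptor, sequence))

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self._records[index]             # aln[0] -> record
        if isinstance(index, slice):
            return Alignment(self._records[index])  # aln[:2] -> alignment
        if isinstance(index, tuple) and len(index) == 2:
            row, col = index
            if isinstance(row, int):
                rec = self._records[row]
                if isinstance(col, int):
                    return rec.seq[col]             # aln[2,4] -> letter
                return Record(rec.id, rec.seq[col])  # aln[2,:] -> record
            rows = self._records[row]
            if isinstance(col, int):
                # aln[:,4] -> one column as a string
                return "".join(r.seq[col] for r in rows)
            # aln[:2,1:3] -> sub-alignment
            return Alignment([Record(r.id, r.seq[col]) for r in rows])
        raise TypeError("Invalid index type.")


aln = Alignment()
aln.add_sequence("seq1", "ATCGTTGC")
aln.add_sequence("seq2", "ATCCTTGC")
aln.add_sequence("seq3", "ATCCGTGC")

column = aln[:, 4]   # 'TTG', as in the session quoted above
row = aln[2, :]      # a record whose seq is 'ATCCGTGC'
```

With these rules an integer row index always yields a single record,
whether or not a column slice follows it, which keeps aln[2] and aln[2,:]
consistent.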
Peter