[Biopython-dev] syntax of indices for future Alignment object

Peter biopython-dev at maubp.freeserve.co.uk
Sat Jul 28 10:04:04 UTC 2007


Michiel de Hoon wrote:
> Peter wrote:
>> I've got a working bit of code put together now which I'll attach to 
>> bug 1944 soon.
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
> For the most part, I agree with the functionality in this patch. I have 
> three suggestions though:
> 
>  >>> aln = Alignment(alphabet)
> # Suggestion 1: We should allow creating an Alignment without specifying 
> an alphabet

That would mean changing the existing __init__ from:

def __init__(self, alphabet):

to something like:

def __init__(self, alphabet=single_letter_alphabet):

with this import statement added:

from Bio.Alphabet import single_letter_alphabet

This seems like a good idea, and shouldn't break any existing code either.
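
To show why existing callers are unaffected, here is a minimal sketch.
It does not import Biopython: the alphabet classes and the module-level
single_letter_alphabet object below are stand-ins I made up for the
ones in Bio.Alphabet, and the Alignment class is cut down to just its
constructor.

```python
class SingleLetterAlphabet:
    """Stand-in for a generic alphabet (an assumption, not Bio.Alphabet)."""


class DNAAlphabet(SingleLetterAlphabet):
    """Stand-in for a more specific alphabet."""


# Shared default instance, mirroring the name used in the proposal.
single_letter_alphabet = SingleLetterAlphabet()


class Alignment:
    def __init__(self, alphabet=single_letter_alphabet):
        # Callers that pass an alphabet explicitly behave as before;
        # callers that omit it get the generic default.
        self._alphabet = alphabet
        self._records = []


old_style = Alignment(DNAAlphabet())  # existing calling convention
new_style = Alignment()               # newly allowed
```

The default is an immutable marker object shared by all alignments,
so the usual mutable-default-argument pitfall should not apply here.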

>  >>> aln.add_sequence("seq1", "ATCGTTGC")
>  >>> aln.add_sequence("seq2", "ATCCTTGC")
>  >>> aln.add_sequence("seq3", "ATCCGTGC")
>  >>> aln[0]
> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
> name='<unknown name>', description='seq1', dbxrefs=[])
> # Suggestion 2: I would expect "seq1" as the id rather than the description

I agree with you here. This is the historic behaviour of the
add_sequence method, which creates a SeqRecord from the strings it is
given. I would suggest it populate record.id, but for backwards
compatibility also keep populating record.description in case anyone
still relies on that.
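
Roughly, the backwards-compatible version could look like the sketch
below. Record is a hypothetical stand-in for SeqRecord (plain string
sequence, only the two fields of interest), and the real add_sequence
does more than this.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for SeqRecord."""
    seq: str
    id: str = "<unknown id>"
    description: str = "<unknown description>"


class Alignment:
    def __init__(self):
        self._records = []

    def add_sequence(self, descriptor, sequence):
        # Populate the id (the suggested behaviour) and keep filling the
        # description (the historic behaviour) so old code still works.
        record = Record(seq=sequence, id=descriptor, description=descriptor)
        self._records.append(record)


aln = Alignment()
aln.add_sequence("seq1", "ATCGTTGC")
first = aln._records[0]
```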

We could also add an add_record method to the alignment object which
takes a SeqRecord, plus an optional weight (and start and end?). Marc
Colosimo also made this point on bug 1944 (although I don't like his
mixed case method name).
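
As a rough idea of what add_record might look like, here is a sketch.
The signature, the default weight of 1.0, and storing the weight in an
annotations dictionary are all assumptions on my part, and Record is
again a hypothetical stand-in for SeqRecord. I have left start and end
out since it is not yet clear whether they belong here.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for SeqRecord."""
    seq: str
    id: str = "<unknown id>"
    annotations: dict = field(default_factory=dict)


class Alignment:
    def __init__(self):
        self._records = []

    def add_record(self, record, weight=1.0):
        # Take a ready-made record instead of building one from strings.
        # Where the weight should live is an open question; an
        # annotation is just one possibility.
        record.annotations["weight"] = weight
        self._records.append(record)


aln = Alignment()
aln.add_record(Record("ATCGTTGC", id="seq1"))
aln.add_record(Record("ATCCTTGC", id="seq2"), weight=0.5)
weights = [r.annotations["weight"] for r in aln._records]
```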

>  >>> aln[:2]
> <Bio.Align.Generic.Alignment instance at 0x10aaeb8>
> # OK
>  >>> aln[:,4]
> 'TTG'
> # OK
>  >>> aln[2,:]
> <Bio.Align.Generic.Alignment instance at 0x105efd0>
> # Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment 
> consisting of a single sequence doesn't make much sense.

I'll have a closer look, but as aln[2] returns a single SeqRecord maybe
aln[2,:] should do that too - rather than returning a string.
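
To make the indexing cases concrete, here is a sketch of a __getitem__
that follows the rule "an integer row gives a record, an integer column
gives a string, two slices give a sub-alignment". This is not the code
attached to bug 1944; Record is a hypothetical stand-in for SeqRecord
with a plain string sequence, and the toy Alignment takes its records
directly in the constructor.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for SeqRecord."""
    seq: str
    id: str


class Alignment:
    def __init__(self, records=None):
        self._records = list(records or [])

    def __getitem__(self, index):
        if isinstance(index, int):
            return self._records[index]             # aln[2] -> record
        if isinstance(index, slice):
            return Alignment(self._records[index])  # aln[:2] -> alignment
        if not (isinstance(index, tuple) and len(index) == 2):
            raise TypeError("expected an int, a slice, or a (row, column) pair")
        rows, cols = index
        if isinstance(rows, int):
            record = self._records[rows]
            if isinstance(cols, int):
                return record.seq[cols]             # aln[2,4] -> one letter
            # aln[2,:] -> a record, consistent with aln[2]
            return Record(record.seq[cols], record.id)
        if isinstance(cols, int):
            # aln[:,4] -> the column as a string
            return "".join(r.seq[cols] for r in self._records[rows])
        # aln[:2,1:5] -> a smaller alignment
        return Alignment([Record(r.seq[cols], r.id) for r in self._records[rows]])


aln = Alignment([
    Record("ATCGTTGC", "seq1"),
    Record("ATCCTTGC", "seq2"),
    Record("ATCCGTGC", "seq3"),
])
column = aln[:, 4]   # 'TTG', as in the session quoted above
row = aln[2, :]      # a record, equal to aln[2] in this sketch
```

Returning a record keeps the id attached to the sequence, which a bare
string would lose.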

Peter




