[Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Tue Feb 21 21:36:30 UTC 2012

Issue #3326 has been updated by Fabio Zanini.

Right, neither Python lists nor NumPy arrays accept an iterator as an index, for different reasons, AFAIK.

# Python lists actually do support it, kind of; that is the idea behind *list comprehensions*:
 new_list = [rec for rec in iterator]
does exactly this!
# NumPy probably avoids it because of the problems that arise when extending to many dimensions, as you mentioned.
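
To make the first point concrete, here is a small sketch in plain Python 3. The names `letters`, `wanted`, `direct_ok` and `picked` are made up for illustration and have nothing to do with Biopython. It shows that a list rejects an iterator used as an index, while a list comprehension over the same iterator does the selection.

```python
import itertools

letters = ["A", "C", "G", "T", "N"]

# An iterator over the positions 0, 2, 4 (illustrative choice).
wanted = itertools.islice(itertools.count(0, 2), 3)

# Direct indexing with an iterator is rejected by the list type.
try:
    letters[wanted]
    direct_ok = True
except TypeError:
    direct_ok = False

# The list comprehension consumes the iterator and picks the items.
picked = [letters[i] for i in wanted]
```

After this runs, `direct_ok` is False and `picked` holds the items at positions 0, 2 and 4.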

Multiple Sequence Alignments, however, are intrinsically two-dimensional, and have no easy list comprehension. Your compromise is what I am proposing as well. This needs two steps:
# we check that the index object supports _for_ loops, i.e. has an __iter__ method (see http://docs.python.org/library/stdtypes.html#iterator-types):
 if hasattr(index, '__iter__'):
# we generate the new MSA with a for loop:
 return MultipleSeqAlignment((self._records[i] for i in index), self._alphabet)

Note that double slicing is not really an issue, since in that case *we are already using that method*! In fact, we now have:
 #Handle double indexing
     #e.g. sub_align = align[1:4, 5:7], gives another alignment
     return MultipleSeqAlignment((rec[col_index] for rec in self._records[row_index]), self._alphabet)
We would only need to extend this to:
 if hasattr(row_index, '__iter__'):
     return MultipleSeqAlignment((self._records[i][col_index] for i in row_index), self._alphabet)
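
As a rough, self-contained sketch of how the two checks could fit together, here is a toy class. `ToyAlignment` is hypothetical: it stores plain strings instead of SeqRecord objects, ignores the alphabet, and is not the Biopython implementation. One assumption worth stating: the tuple test has to come before the `__iter__` test, because the `(row, column)` index is itself a tuple and tuples have `__iter__`.

```python
class ToyAlignment:
    """Hypothetical stand-in for MultipleSeqAlignment; rows are plain strings."""

    def __init__(self, records):
        self._records = list(records)

    def __getitem__(self, index):
        # Double indexing, e.g. align[rows, cols]; must be tested first,
        # since a tuple would also pass the __iter__ check below.
        if isinstance(index, tuple):
            row_index, col_index = index
            if isinstance(row_index, slice):
                rows = self._records[row_index]
            elif hasattr(row_index, "__iter__"):
                rows = (self._records[i] for i in row_index)
            else:
                # Single row: return its column(s) directly.
                return self._records[row_index][col_index]
            return ToyAlignment(rec[col_index] for rec in rows)
        # Single indexing with a slice, e.g. align[1:3].
        if isinstance(index, slice):
            return ToyAlignment(self._records[index])
        # Single indexing with any iterable of row numbers.
        if hasattr(index, "__iter__"):
            return ToyAlignment(self._records[i] for i in index)
        # Plain integer: return one row.
        return self._records[index]


align = ToyAlignment(["ACGTAC", "ACGTTC", "ACCTAC", "TCGTAC"])
by_iterator = align[iter([0, 2]), 1:4]   # rows 0 and 2, columns 1..3
by_list = align[[3, 0]]                  # rows 3 and 0, in that order
by_slices = align[1:3, 0:2]              # the existing double-slice case
```

In this sketch the iterable also controls the order of the rows, which a slice cannot do.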

Finally, I would gladly post to the mailing list. You mean the Biopython-Dev Mailing List <biopython-dev at biopython.org>, right?

Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects

Author: Fabio Zanini
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 

Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.:

- alignment[4,6]
- alignment[2:4,3:6]
- alignment[3:4:5]

In the last case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case.

However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects. An extension that includes iterators looks easy:

# Check whether the index is an iterator
if hasattr(index, 'next') and hasattr(index, '__iter__'):
    return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet)
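
The check above tests for an iterator in the strict sense. The sketch below, written for Python 3 (where the method is spelled `__next__` rather than `next`), compares that strict test with the looser "has `__iter__`" test on a few index-like objects. The helper names `is_iterator` and `is_iterable` and the `candidates` dictionary are illustrative only.

```python
import itertools


def is_iterator(obj):
    # Strict test: the object can be advanced and iterated.
    return hasattr(obj, "__next__") and hasattr(obj, "__iter__")


def is_iterable(obj):
    # Looser test: anything a for loop can walk over.
    return hasattr(obj, "__iter__")


candidates = {
    "islice": itertools.islice(range(10), 2, 8, 3),
    "list": [0, 2],
    "tuple": (1, 2),
    "slice": slice(3, 4, 5),
    "int": 4,
}

# Map each name to (is_iterator, is_iterable).
report = {name: (is_iterator(obj), is_iterable(obj))
          for name, obj in candidates.items()}
```

Only the `islice` object passes the strict test; lists and tuples pass the looser one, and slice objects and integers pass neither, so the existing slice and integer branches would be unaffected by either check.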

Would you think this is useful?
