[BioPython] How to check codon usage for specific amino acid positions in a given set of CDS sequences

Thu Jan 15 13:11:42 EST 2009

On Thu, Jan 15, 2009 at 6:02 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Jan 15, 2009 at 2:26 PM, Giovanni Marco Dall'Olio
> <dalloliogm at fastwebnet.it> wrote:
>> Let's see how you can do this with Biopython (hey, Peter, please
>> correct me if I say something wrong!! :)).
>> ...
>> You will be able to access all the sequences in your alignment by the
>> _records property of AlignIO:
>
> In Python anything starting with a single underscore is considered to
> be a private variable, and you should avoid using it.  So you
> shouldn't be doing alignment._records, and if you do, don't complain
> if this implementation detail changes in a future version of
> Biopython.
>
> For the Alignment object, if you really want a list of SeqRecord
> objects you should use alignment.get_all_seqs() instead. ...

On a related note, if you just want a list of SeqRecord objects from
an alignment file, you can do this:

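from Bio import AlignIO
# Read the single alignment stored in the file (PHYLIP format here)
alignment = AlignIO.read(open("my_example.phy"), "phylip")
# Get the rows of the alignment as a list of SeqRecord objects
records = alignment.get_all_seqs()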

However, any input alignment format supported by Bio.AlignIO (like the
PHYLIP format used in this example) can also be used via Bio.SeqIO, so
you might prefer to do this:

from Bio import SeqIO
# Parse the same file record by record and collect the SeqRecord objects
records = list(SeqIO.parse(open("my_example.phy"), "phylip"))

Up to you.  It rather depends on what you are trying to do with the
sequences - sometimes working with the SeqRecord objects directly is
preferable.
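
As a rough illustration of working with the sequences directly for the
question in the subject line, here is a minimal sketch that counts which
codons occur at a given amino acid position. It uses only the standard
library and takes plain strings (for example str(record.seq) for each
SeqRecord). The function name codon_usage_at and the example sequences
are made up for illustration. It assumes every CDS is in frame, starts
at its first codon, and that any alignment gaps come in whole codons.

```python
from collections import Counter


def codon_usage_at(cds_sequences, aa_position):
    """Count the codons found at a 1-based amino acid position.

    cds_sequences is an iterable of in-frame CDS strings (assumed
    to start at the first codon); codons that are incomplete or
    contain gap characters are skipped.
    """
    start = (aa_position - 1) * 3
    counts = Counter()
    for cds in cds_sequences:
        codon = cds[start:start + 3].upper()
        if len(codon) == 3 and "-" not in codon:
            counts[codon] += 1
    return counts


# Hypothetical sequences standing in for str(record.seq) of each record
example = ["ATGGCTAAA", "ATGGCCAAA", "ATGGCTAAG"]
usage = codon_usage_at(example, 2)
for codon, count in sorted(usage.items()):
    print(codon, count)
```

With real data you would build the list of strings from the records
loaded above rather than typing the sequences in by hand.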

Peter