[Biopython] Bio.Sequencing.Ace

Peter biopython at maubp.freeserve.co.uk
Sun Jun 28 03:31:30 EDT 2009


On Sun, Jun 28, 2009 at 4:27 AM, David
Winter<winda002 at student.otago.ac.nz> wrote:
>> I am trying to parse a large Ace file produced by newbler from a 454 cDNA
>> assembly. I followed the Bio.Sequencing.Ace cookbook here:
>> http://biopython.org/wiki/ACE_contig_to_alignment
>> and indeed, I can now fetch several properties of my contigs
>> (alignment of reads to consensus, contigs name, reads name).
>
> Good.
>
>> Yet, I would like to know if and how to perform the following tasks:
>> * retrieving the quality of specific nucleotides in the read.
>> * getting the consensus sequence.
>
> The cookbook example isn't meant to be complete documentation for the Ace
> module - just an example of something you might want to do with it. At the
> moment there is no tutorial chapter on the module but you can read the doc
> strings here:
>
> http://www.biopython.org/DIST/docs/api/Bio.Sequencing.Ace-pysrc.html
>
> Most of the tags you want to play with are in the Contig and Reads classes
> in that module, and have the same names as in the ACE format specification:
>
> http://bozeman.mbt.washington.edu/consed/distributions/README.14.0.txt

Specifically you asked for the consensus sequence - which is simple
to get (as are its associated quality scores):

from Bio.Sequencing import Ace

# "contigs.ace" is a placeholder; use the path to your own ACE file
with open("contigs.ace") as handle:
    for ace_contig in Ace.parse(handle):
        print(ace_contig.name)  # just a string
        print(ace_contig.sequence)  # string, with "*" chars for insertions
        print(ace_contig.quality)  # list of scores (none for the insertions)
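
Because the quality list has no entries for the "*" insertions, it is
shorter than the padded consensus string, so the two cannot be indexed
together directly. A minimal sketch of one way to line them up, assuming
the pad character is "*" and every other character has exactly one score.
The helper name padded_qualities and the sample data are made up for
illustration and are not part of Biopython:

```python
def padded_qualities(sequence, quality, pad="*"):
    """Align a pad-free quality list to a padded consensus string.

    Returns one entry per character of `sequence`: the score for a real
    base, or None where the consensus holds a pad character.
    """
    scores = iter(quality)
    aligned = []
    for base in sequence:
        if base == pad:
            aligned.append(None)  # pads carry no score
        else:
            aligned.append(next(scores))  # consume the next real-base score
    return aligned


# Made-up consensus and scores, for illustration only.
consensus = "AC*GT"
scores = [30, 25, 40, 35]
print(padded_qualities(consensus, scores))
```

With a real contig you would pass ace_contig.sequence and
ace_contig.quality instead of the made-up values.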

These top-level properties are simple enough - but I find drilling down
into the reads a bit more tricky. In general the Ace parser is a bit
non-obvious without knowing the Ace format. Having some __str__
and __repr__ methods defined on the objects returned would be
very nice - I may get time to work on this later this year. If anyone
else is interested in this, drop us an email.
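
As a rough sketch of what drilling down into the reads can look like:
each contig carries parallel lists named after the ACE tags, with
contig.af holding the read name, strand (coru) and padded start, and
contig.reads holding objects with rd (name, sequence) and qa (clipping
ranges) attributes. Those attribute names are given here from memory of
the module, so check them against the doc strings. The function name
summarise_reads and the stand-in objects below are hypothetical, used so
the example runs without an ACE file:

```python
from types import SimpleNamespace


def summarise_reads(contig):
    """Return (name, strand, padded_start, clipped_sequence) per read."""
    rows = []
    for af, read in zip(contig.af, contig.reads):
        start = read.qa.qual_clipping_start  # 1-based, inclusive
        end = read.qa.qual_clipping_end  # 1-based, inclusive
        if start < 1 or end < 1:
            clipped = ""  # ACE uses -1 -1 when the whole read is low quality
        else:
            clipped = read.rd.sequence[start - 1:end]
        rows.append((af.name, af.coru, af.padded_start, clipped))
    return rows


# Stand-in for a parsed contig (made-up data, not a real assembly).
demo_contig = SimpleNamespace(
    af=[SimpleNamespace(name="read1", coru="U", padded_start=1)],
    reads=[
        SimpleNamespace(
            rd=SimpleNamespace(name="read1", sequence="ACGT*ACGT"),
            qa=SimpleNamespace(qual_clipping_start=2, qual_clipping_end=5),
        )
    ],
)
print(summarise_reads(demo_contig))
```

With Biopython installed, the same function should accept each contig
yielded by Ace.parse(handle) in place of demo_contig.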

Peter

