[Biopython] Bio.Sequencing.Ace

Sun Jun 28 12:53:00 UTC 2009

Thanks Peter and David,

The contig.sequence and contig.quality attributes are more or less the solution I wanted.

Any additional tips are more than welcome. For example: getting the qualities of specific reads (I think this requires parsing the Phd file that is used as part of the assembly process), and getting the read strand.

Thanks,
Avi

--- On Sun, 6/28/09, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [Biopython] Bio.Sequencing.Ace
> To: "David Winter" <winda002 at student.otago.ac.nz>
> Cc: "Fungazid" <fungazid at yahoo.com>, biopython at lists.open-bio.org
> Date: Sunday, June 28, 2009, 10:31 AM
> On Sun, Jun 28, 2009 at 4:27 AM, David Winter <winda002 at student.otago.ac.nz> wrote:
> >> I am trying to parse a large Ace file produced by newbler on a 454 cDNA
> >> assembly. I followed the Bio.Sequencing.Ace cookbook here:
> >> http://biopython.org/wiki/ACE_contig_to_alignment
> >> and indeed, I can now fetch several properties of my contigs
> >> (alignment of reads to consensus, contig names, read names).
> >
> > Good.
> >
> >> Yet, I would like to know if and how to perform the following tasks:
> >> * retrieving the quality of specific nucleotides in the read.
> >> * getting the consensus sequence.
> >
> > The cookbook example isn't meant to be complete documentation for the Ace
> > module - just an example of something you might want to do with it. At the
> > moment there is no tutorial chapter on the module but you can read the
> > docstrings here:
> >
> > http://www.biopython.org/DIST/docs/api/Bio.Sequencing.Ace-pysrc.html
> >
> > Most of the tags you want to play with are in the Contig and Reads classes
> > in that module, and have the same names as in the ACE format specification:
> >
> > http://bozeman.mbt.washington.edu/consed/distributions/README.14.0.txt
> 
> Specifically you asked for the consensus sequence - which is simple
> to get (as are its associated quality scores):
> 
> from Bio.Sequencing import Ace
> # handle is an already opened ACE file
> for ace_contig in Ace.parse(handle):
>     print(ace_contig.name)  # just a string
>     print(ace_contig.sequence)  # a string with "*" chars for insertions
>     print(ace_contig.quality)  # list of scores (but not for the insertions)
> 
> These top-level properties are simple enough - but I find drilling down
> into the reads a bit more tricky. In general the Ace parser is a bit
> non-obvious without knowing the Ace format. Having some __str__
> and __repr__ methods defined on the objects returned would be
> very nice - I may get time to work on this later this year. If anyone
> else is interested in this, drop us an email.
> 
> Peter
>
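
On drilling down into the reads (read strand, per-read clipping), here is a minimal sketch rather than a definitive recipe. It assumes the attribute names the Ace parser takes from the ACE tags: contig.af[i] for the AF line (name, coru, padded_start) and contig.reads[i] with its rd and qa records. It also assumes, as the cookbook example does, that af and reads are listed in the same order. ReadSummary, summarize_reads and summarize_ace are hypothetical names used only for illustration. Note that the ACE file itself holds base qualities only for the consensus; per-base qualities of individual reads would have to come from the PHD files.

```python
from collections import namedtuple

# Hypothetical container for the per-read details collected below.
ReadSummary = namedtuple("ReadSummary", "name strand padded_start qual_clip")


def summarize_reads(contig):
    """Collect name, strand and quality clipping for each read of a contig.

    Assumes `contig` looks like a Bio.Sequencing.Ace contig: contig.af[i]
    holds the AF line and contig.reads[i] the RD/QA records of the same
    read (same order in both lists, which is an assumption).
    """
    summaries = []
    for af, read in zip(contig.af, contig.reads):
        # AF line: "U" means uncomplemented, "C" means complemented.
        strand = "-" if af.coru == "C" else "+"
        qa = read.qa  # may be None if the read has no QA line
        clip = (qa.qual_clipping_start, qa.qual_clipping_end) if qa else None
        summaries.append(
            ReadSummary(read.rd.name, strand, af.padded_start, clip)
        )
    return summaries


def summarize_ace(path):
    """Yield (contig name, list of ReadSummary) for every contig in a file."""
    from Bio.Sequencing import Ace  # requires Biopython to be installed

    with open(path) as handle:
        for contig in Ace.parse(handle):
            yield contig.name, summarize_reads(contig)
```

Calling summarize_ace("my_assembly.ace") (the file name is a placeholder) would then give one list of per-read tuples per contig. Keeping summarize_reads separate from the file handling means it can be tried on a single contig object at the interactive prompt.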