[Biopython-dev] Iterating over Ace contig files
Peter
biopython at maubp.freeserve.co.uk
Tue Jun 17 09:16:29 UTC 2008
Hello Frank,
I wanted to get your opinion on iterating over an Ace file contig by
contig, and on what we lose from the WA, CT, RT and WR tags at the end
of the file by doing so. As large sequencing runs become more common,
iterating over the file in a single pass WITHOUT keeping everything in
memory does seem to be desirable.
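To make the trade-off concrete, here is a minimal single-pass sketch in plain Python (no Biopython needed). The `iter_contigs` generator and the tiny Ace snippet are hypothetical, made up for illustration; this is not the Bio.Sequencing.Ace Iterator. It handles only CO records (the consensus) and WA{/CT{/RT{/WR{ tag blocks, skips everything else, and shows that each contig is handed back before any trailing tag has been read.

```python
from typing import Iterable, Iterator, List, Tuple

TAG_OPENERS = ("WA{", "CT{", "RT{", "WR{")


def iter_contigs(
    lines: Iterable[str], footer_tags: List[Tuple[str, List[str]]]
) -> Iterator[Tuple[str, str]]:
    """Yield (contig_name, consensus) pairs from Ace-format text in one pass.

    Hypothetical helper for illustration only. Tag blocks are appended to
    footer_tags as (tag_type, body_lines) when they are reached, which in a
    typical Ace file is only after every contig has already been yielded.
    """
    name = None
    seq_parts: List[str] = []
    current_tag = None
    tag_body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if current_tag is not None:
            # Inside a WA{/CT{/RT{/WR{ block: collect lines until "}".
            if line == "}":
                footer_tags.append((current_tag, tag_body))
                current_tag = None
            else:
                tag_body.append(line)
            continue
        if line in TAG_OPENERS:
            current_tag = line[:2]
            tag_body = []
            continue
        if line.startswith("CO "):
            # CO <name> <bases> <reads> <segments> <U|C>
            name = line.split()[1]
            seq_parts = []
            continue
        if name is not None:
            if line:
                seq_parts.append(line)
            else:
                # A blank line ends the consensus, so yield this contig now
                # and forget it; nothing is kept in memory afterwards.
                yield name, "".join(seq_parts)
                name = None
    if name is not None:
        yield name, "".join(seq_parts)


# A made-up, heavily abbreviated Ace file (no RD/QA/DS read records).
ACE_TEXT = """AS 2 3

CO Contig1 8 2 1 U
ACGT*ACG

BQ
20 20 20 20 20 20 20

AF read1 U 1
AF read2 C 2

CO Contig2 4 1 1 U
TTGA

BQ
30 30 30 30

AF read3 U 1

CT{
Contig1 repeat consed 2 5 080617:091629
}
"""

if __name__ == "__main__":
    tags: List[Tuple[str, List[str]]] = []
    for contig_name, consensus in iter_contigs(ACE_TEXT.splitlines(), tags):
        print(contig_name, consensus, "tags seen so far:", len(tags))
    print(tags)
```

Each contig is yielded while `tags` is still empty; the CT tag about Contig1 only turns up once the iterator is exhausted. That is why a one-pass iterator cannot attach footer tags to records it has already returned, unless the caller is willing to wait for the end of the file.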
Similar past discussions:
http://portal.open-bio.org/pipermail/biopython/2004-February/001825.html
http://portal.open-bio.org/pipermail/biopython/2005-May/002661.html
Would you object to me rewording your module's header comment so that
it no longer says the Ace Iterator is deprecated, but rather that it
has certain drawbacks?
[The context for this is my recent thread on the Biopython dev mailing
list about integrating your Bio.Sequencing.Ace parser into Bio.SeqIO
and/or Bio.AlignIO - I've included a little context below.]
Thanks,
Peter
--
Peter wrote:
>> So integrating the "ace" format into Bio.SeqIO representing the
>> consensus sequence of each contig as a SeqRecord would be useful.
>> Initially I would try and represent the aligned reads as SeqFeature
>> objects (much like when reading a genome from a GenBank file you get
>> CDS features with their amino acid translation).
>>
>> Note that for memory reasons, I would be inclined to scan over the Ace
>> file in one pass (using the existing Iterator in the
>> Bio.Sequencing.Ace parser) returning SeqRecords as we go. As Frank
>> points out in the code comments, this means we can't easily include
>> the WA, CT, RT and WR tags found in the Ace file footer. Do you use
>> this information Jose?
Jose replied,
> I haven't used the iterator because of the deprecation warning in the code. I
> tried it with about 40000 alignments and it worked on a computer with 8 GB of RAM.
> If there are more sequences, and there will be with the 454 sequencer, we will
> have trouble reading them all at once. I vote for the iterator approach. I have not
> used the information in these tags, and I also don't know what they mean. I've
> been looking for documentation about this format, but I've found none. Do you
> have any good Ace documentation?