[Biopython-dev] Bio.Sequencing

Peter biopython at maubp.freeserve.co.uk
Tue Jun 30 09:33:00 UTC 2009


Cymon wrote:
>
>Peter wrote:
>> The thing about "gaps" in contigs is that the consensus is
>> really the ungapped sequence.
>
> Yes, but... there is still some ambiguity over the consensus sequence which
> is lost in the ungapped sequence. OK, so this isn't such a big deal with the
> massive coverages achieved by 454 tech but I can imagine cases of hybrid
> Sanger/454 where this might be an issue (might be scraping the bottom of the
> barrel a bit here...).
>
>Peter wrote:
>> I'd have to check but I think
>> Newbler and CAP3 will output both FASTA and ACE files,
>> and in the FASTA files there are no insertions/gaps in the
>> contig sequences.
>
> For comparison, Mira outputs ACE, plus X.gapped.fasta, and X.ungapped.fasta

That is nice and explicit. :)

>> What I am thinking is Bio.SeqIO could return the ungapped
>> consensus sequences as SeqRecord objects (which can then
>> be saved as FASTA, FASTQ, QUAL) while Bio.AlignIO
>> could return contig-alignment objects (with the gaps, like
>> David's cookbook but in the long run with a contig class).
>
> Yeah, I like this.

Cool. I will try and look into this later in July.

> Although, I'm not sure how intuitive it is that SeqIO would
> necessarily return the ungapped rather than gapped
> sequences - but it kinda makes sense...

Yeah - I'm a bit on the fence myself about Ace to SeqRecord,
and whether gapped or ungapped makes most sense. Given
that the current Bio.SeqIO behaviour gives the gapped
sequence, I guess we should just leave it like that.
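
To make the gapped versus ungapped distinction concrete, here is a
rough sketch in plain Python of how one form can be derived from the
other. It assumes pads are written as "*" (as in ACE files) or "-";
the helper names ungap and gapped_to_ungapped_index are made up for
illustration and are not part of Biopython.

```python
# Illustrative sketch only: these helpers are hypothetical, not Biopython API.
PAD_CHARS = "*-"  # assumed pad symbols: "*" (ACE style) and "-" (alignment gap)


def ungap(gapped, pads=PAD_CHARS):
    """Return the consensus with every pad/gap character removed."""
    return "".join(base for base in gapped if base not in pads)


def gapped_to_ungapped_index(gapped, pads=PAD_CHARS):
    """Map each gapped column to its ungapped position (None for a pad)."""
    mapping = []
    position = 0
    for base in gapped:
        if base in pads:
            mapping.append(None)  # pad columns have no ungapped coordinate
        else:
            mapping.append(position)
            position += 1
    return mapping


gapped_consensus = "ACG*TT-A"
print(ungap(gapped_consensus))
print(gapped_to_ungapped_index(gapped_consensus))
```

Going from gapped to ungapped is cheap, but the reverse is not possible
without the alignment, which is one argument for keeping the gapped
form available somewhere.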

Peter


