[Biopython] cleaning sequences
Brad Chapman
chapmanb at 50mail.com
Tue Jul 14 08:45:21 EDT 2009
Hi Liam;
I don't believe there is built-in functionality for doing this. The
problem itself is hard because it is a bit underspecified: what
should be done when encountering ambiguous characters? Depending on
your situation, there are a couple of options:
- Trim the sequence to remove the bases. This might be a
  post-sequencing step, and there was some discussion between Peter
  and Giles about the parameters of doing this earlier this month:
  http://lists.open-bio.org/pipermail/biopython/2009-July/005342.html
- Replace the bases with an accepted ambiguity character (say, N or
  x)
So it's a bit hard to generalize. That said, we'd be happy to hear
thoughts on an implementation that would tackle these sorts of
issues.
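
As a rough starting point, here is a minimal sketch of the check you
describe plus the two options above. It uses plain Python strings and
no Biopython call. The function names (is_unambiguous,
replace_ambiguous, trim_ambiguous_ends) are made up for illustration,
and treating only A, C, G and T as unambiguous is an assumption you
may need to change. Trimming here only strips the ends, which is just
one of several possible readings of "trim". For a record read with
Bio.SeqIO, str(record.seq) should give you a string to pass in.

```python
import re

# Assumption: only these four letters count as unambiguous DNA.
UNAMBIGUOUS_DNA = set("ACGT")


def is_unambiguous(seq, alphabet=UNAMBIGUOUS_DNA):
    """Return True if every letter of seq is in alphabet (case-insensitive)."""
    return set(str(seq).upper()) <= alphabet


def replace_ambiguous(seq, replacement="N"):
    """Upper-case seq and replace every non-ACGT letter with replacement."""
    return re.sub(r"[^ACGT]", replacement, str(seq).upper())


def trim_ambiguous_ends(seq):
    """Upper-case seq and strip runs of non-ACGT letters from both ends.

    Ambiguous letters in the middle of the sequence are left in place.
    """
    return re.sub(r"^[^ACGT]+|[^ACGT]+$", "", str(seq).upper())
```

Which of these is appropriate depends on what the subtyping algorithm
can tolerate, so treat them as options to try rather than a
recommendation.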
Brad
> I was wondering if there was a built in method for determining whether a
> sequence (Genbank or FASTA) is an Ambiguous or Unambiguous sequence. The
> reason I ask is I am trying to subtype a couple hundred viral DNA sequences,
> and due to bad sequencing, the sequences often have ambiguous characters in
> them, which the algorithm used to subtype doesn't like. I realise I can
> compare each letter of each genome in a loop with GATC to determine
> ambiguity, but it might be easier if there was a built in function.
>
> Thanks
> Liam
>
>
>
> --
> -----------------------------------------------------------
> Antiviral Gene Therapy Research Unit
> University of the Witwatersrand
> Faculty of Health Sciences, Room 7Q07
> 7 York Road, Parktown
> 2193
>
> Tel: 2711 717 2465/7
> Fax: 2711 717 2395
> Email: liam.thompson at students.wits.ac.za / dejmail at gmail.com