[Biopython] cleaning sequences

Liam Thompson dejmail at gmail.com
Tue Jul 14 18:21:29 UTC 2009


Hi Brad

Yes, I remember the posts now that I'm rereading them. I think my problem is a
little less complicated than raw sequencing data, seeing as my sequences are
GenBank entries: they just need to be read, even if they're of bad quality. I
suppose replacing the ambiguous letters would be the better option for me,
especially as the reading frame is important for aligning based on peptide
sequence.
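Replacing rather than deleting keeps every position in place, so codon boundaries are unaffected. Below is a minimal sketch in plain Python. It assumes the input is a DNA string and that anything other than A, C, G or T (either case) should become N. `mask_ambiguous` is a hypothetical helper, not a Biopython function.

```python
import re

# Matches any character that is not an unambiguous DNA base.
# Assumption: lowercase acgt are valid too and are left unchanged.
_NON_ACGT = re.compile(r"[^ACGTacgt]")


def mask_ambiguous(seq: str, replacement: str = "N") -> str:
    """Replace every non-ACGT character in seq with `replacement`.

    The replacement must be exactly one character, so the output has the
    same length as the input and codon positions (the reading frame) stay put.
    """
    if len(replacement) != 1:
        raise ValueError("replacement must be a single character")
    # A function replacement avoids re's backslash/group template handling.
    return _NON_ACGT.sub(lambda _match: replacement, seq)


raw = "ATGRCTYAAGN"
masked = mask_ambiguous(raw)
print(masked)  # ATGNCTNAAGN
```

With Biopython, the GenBank records can be read with `SeqIO.parse(filename, "genbank")` (from `Bio import SeqIO`) and each `str(record.seq)` passed to the helper. Note that gap characters such as `-` are replaced as well.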

As for implementation, I am a complete greenhorn at Python, never mind
programming, so I wouldn't even know where to start with suggestions. Sorry
about that.

Regards
Liam




On Tue, Jul 14, 2009 at 2:45 PM, Brad Chapman <chapmanb at 50mail.com> wrote:

> Hi Liam;
> I don't believe there is built-in functionality for doing this. The
> problem itself is hard because it is a bit underspecified: what
> should be done when encountering ambiguous characters? Depending on
> your situation, this could be one of a couple of different things:
>
> - Trim the sequence to remove the ambiguous bases. This might be a
>  post-sequencing step, and there was some discussion between Peter
>  and Giles about the parameters for doing this earlier this month:
>
>  http://lists.open-bio.org/pipermail/biopython/2009-July/005342.html
>
> - Replace the ambiguous bases with an accepted ambiguity character
>  (say, N or x)
>
> So it's a bit hard to generalize. That said, we'd be happy to hear
> thoughts on an implementation that would tackle these sorts of
> issues.
>
> Brad
>
> > I was wondering if there was a built-in method for determining whether a
> > sequence (GenBank or FASTA) is ambiguous or unambiguous. The
> > reason I ask is I am trying to subtype a couple hundred viral DNA sequences,
> > and due to bad sequencing, the sequences often have ambiguous characters in
> > them, which the algorithm used to subtype doesn't like. I realise I can
> > compare each letter of each genome with GATC in a loop to determine
> > ambiguity, but it might be easier if there was a built-in function.
> >
> > Thanks
> > Liam
> >
> >
> >
> > --
> > -----------------------------------------------------------
> > Antiviral Gene Therapy Research Unit
> > University of the Witwatersrand
> > Faculty of Health Sciences, Room 7Q07
> > 7 York Road, Parktown
> > 2193
> >
> > Tel: 2711 717 2465/7
> > Fax: 2711 717 2395
> > Email: liam.thompson at students.wits.ac.za / dejmail at gmail.com
> > _______________________________________________
> > Biopython mailing list  -  Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
>
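The original question asks for a check instead of hand-comparing every letter against GATC. A minimal sketch in plain Python follows. It assumes strict A, C, G, T (either case) is unambiguous and that everything else, including N, IUPAC codes such as R or Y, and gap characters, counts as ambiguous. It also covers end-trimming, the first of Brad's two options. The function names are hypothetical, not part of Biopython.

```python
from typing import List

# Assumption: only A, C, G, T (any case) count as unambiguous;
# N, IUPAC codes such as R or Y, and gap characters count as ambiguous.
UNAMBIGUOUS_DNA = frozenset("ACGT")


def is_unambiguous(seq: str) -> bool:
    """Return True if every character of seq is A, C, G or T."""
    return set(seq.upper()) <= UNAMBIGUOUS_DNA


def ambiguous_positions(seq: str) -> List[int]:
    """Return the 0-based indices of characters outside ACGT."""
    return [i for i, base in enumerate(seq) if base.upper() not in UNAMBIGUOUS_DNA]


def trim_ambiguous_ends(seq: str) -> str:
    """Strip ambiguous characters from both ends; internal ones are kept."""
    start, end = 0, len(seq)
    while start < end and seq[start].upper() not in UNAMBIGUOUS_DNA:
        start += 1
    while end > start and seq[end - 1].upper() not in UNAMBIGUOUS_DNA:
        end -= 1
    return seq[start:end]


sample = "NNATGRCTAN"
print(is_unambiguous(sample))       # False
print(ambiguous_positions(sample))  # [0, 1, 5, 9]
print(trim_ambiguous_ends(sample))  # ATGRCTA
```

Trimming the 5' end by a number of bases that is not a multiple of three shifts the reading frame. That is why replacing the letters suits Liam's case better when aligning on the peptide sequence.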





