[Biopython] cleaning sequences

Chris Fields cjfields at illinois.edu
Tue Jul 14 14:48:04 UTC 2009


If you do come up with something, let us BioPerl guys know. We have a
preliminary trimming/cleaning version that we're thinking of adding,
but it would be nice to coalesce around a similar implementation.

chris

On Jul 14, 2009, at 7:45 AM, Brad Chapman wrote:

> Hi Liam;
> I don't believe there is built-in functionality for doing this. The
> problem itself is hard because it is a bit underspecified: what
> should be done when encountering ambiguous characters? Depending on
> your situation this can be a couple of different things:
>
> - Trim the sequence to remove the bases. This might be a
>  post-sequencing step, and there was some discussion between Peter
>  and Giles about the parameters of doing this earlier this month:
>
>  http://lists.open-bio.org/pipermail/biopython/2009-July/005342.html
>
> - Replace the bases with an accepted ambiguity character (say, N or
>  x)
>
> So it's a bit hard to generalize. That said, we'd be happy for
> thoughts on an implementation that would tackle these sorts of
> issues.
>
> Brad
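
A minimal sketch of the two options Brad lists above, in plain Python.
The function names, the GATC default and the N replacement are
assumptions made for illustration, not Biopython API. The trimming
here only strips ambiguous characters from the ends of a sequence; it
is not the parameterised trimming discussed in the linked thread.

```python
def trim_ambiguous_ends(sequence, allowed="GATC"):
    """Strip leading and trailing characters that are not in `allowed`.

    Hypothetical helper: interior ambiguous characters are left in place.
    """
    seq = str(sequence).upper()
    start, end = 0, len(seq)
    # Walk in from the left until an allowed base is found.
    while start < end and seq[start] not in allowed:
        start += 1
    # Walk in from the right until an allowed base is found.
    while end > start and seq[end - 1] not in allowed:
        end -= 1
    return seq[start:end]


def mask_ambiguous(sequence, allowed="GATC", replacement="N"):
    """Replace every character not in `allowed` with `replacement`."""
    seq = str(sequence).upper()
    return "".join(ch if ch in allowed else replacement for ch in seq)


if __name__ == "__main__":
    raw = "NNGATRACANN"
    trimmed = trim_ambiguous_ends(raw)
    print(trimmed)
    print(mask_ambiguous(trimmed))
```

Both helpers call str() on their input first, so a Biopython Seq object
should work as well as a plain string.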
>
>> I was wondering if there was a built-in method for determining whether a
>> sequence (GenBank or FASTA) is an ambiguous or unambiguous sequence. The
>> reason I ask is I am trying to subtype a couple hundred viral DNA sequences,
>> and due to bad sequencing, the sequences often have ambiguous characters in
>> them, which the algorithm used to subtype doesn't like. I realise I can
>> compare each letter of each genome in a loop with GATC to determine
>> ambiguity, but it might be easier if there was a built-in function.
>>
>> Thanks
>> Liam
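
The letter-by-letter comparison with GATC that Liam describes can be
written without an explicit loop by using a set. This is only a
sketch: the function names and the UNAMBIGUOUS_DNA constant are
assumptions for illustration, not built-in Biopython functions.

```python
# Assumed alphabet: only the four unambiguous DNA bases count as clean.
UNAMBIGUOUS_DNA = frozenset("GATC")


def is_unambiguous(sequence, allowed=UNAMBIGUOUS_DNA):
    """Return True if every character of the sequence is in `allowed`."""
    return set(str(sequence).upper()) <= allowed


def ambiguous_positions(sequence, allowed=UNAMBIGUOUS_DNA):
    """Return (index, character) pairs for characters not in `allowed`."""
    seq = str(sequence).upper()
    return [(i, ch) for i, ch in enumerate(seq) if ch not in allowed]


if __name__ == "__main__":
    for seq in ("GATTACA", "GATNRCA"):
        print(seq, is_unambiguous(seq), ambiguous_positions(seq))
```

Reporting the positions as well as the yes/no answer makes it easier
to decide afterwards whether to trim or to replace the offending
characters.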
>>
>>
>>
>> -- 
>> -----------------------------------------------------------
>> Antiviral Gene Therapy Research Unit
>> University of the Witwatersrand
>> Faculty of Health Sciences, Room 7Q07
>> 7 York Road, Parktown
>> 2193
>>
>> Tel: 2711 717 2465/7
>> Fax: 2711 717 2395
>> Email: liam.thompson at students.wits.ac.za / dejmail at gmail.com
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython



