[Biopython] still more questions about NGS sequence trimming
Sebastian Schmeier
s.schmeier at gmail.com
Wed Oct 24 13:12:46 EDT 2012
A very quick and dirty approach for your reject function (I hope I
understood correctly) in script form:
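#!/usr/bin/env python
import re
import sys

from Bio import SeqIO

# A run of 9 or more identical bases, i.e. a stretch of more than 8 bp.
HOMOPOLYMER = re.compile(r'(A{9,}|C{9,}|G{9,}|T{9,}|N{9,})')

def main():
    # Pass the filename straight to SeqIO.parse; the "rU" open mode
    # was removed in Python 3.11.
    for record in SeqIO.parse(sys.argv[1], "fasta"):
        if not discard(str(record.seq)):
            SeqIO.write(record, sys.stdout, 'fasta')

def discard(seq):
    """Return True if seq contains a homopolymer run longer than 8 bp."""
    return HOMOPOLYMER.search(seq) is not None

if __name__ == '__main__':
    sys.exit(main())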
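
For the sliding window over the qualities, here is a rough sketch of one possible approach, assuming FASTQ input and assuming you want to cut the read at the start of the first window whose mean quality falls below a threshold. The names first_low_quality_window and trim_fastq, and the defaults window=10 and min_mean=20, are made up for illustration and are not Biopython functions. The only Biopython pieces used are SeqIO.parse, SeqIO.write, record slicing, and record.letter_annotations["phred_quality"].

```python
import sys


def first_low_quality_window(quals, window=10, min_mean=20):
    """Return the start index of the first window whose mean quality is
    below min_mean, or None if no full window falls below it.

    quals is a list of integer quality scores. Hypothetical helper, not
    part of Biopython.
    """
    if len(quals) < window:
        return None
    threshold = min_mean * window      # compare sums, avoids a division per step
    total = sum(quals[:window])
    if total < threshold:
        return 0
    for start in range(1, len(quals) - window + 1):
        # Slide by one: add the score entering the window, drop the one leaving.
        total += quals[start + window - 1] - quals[start - 1]
        if total < threshold:
            return start
    return None


def trim_fastq(in_path, out_handle=sys.stdout, window=10, min_mean=20):
    """Write each read truncated at its first low-quality window (a sketch)."""
    from Bio import SeqIO  # imported here so the helper above works without Biopython

    for record in SeqIO.parse(in_path, "fastq"):
        quals = record.letter_annotations["phred_quality"]
        cut = first_low_quality_window(quals, window, min_mean)
        trimmed = record if cut is None else record[:cut]
        if len(trimmed) > 0:           # skip reads trimmed down to nothing
            SeqIO.write(trimmed, out_handle, "fastq")
```

Slicing a record keeps the per-letter qualities in step with the sequence, so the trimmed read can be written back out as FASTQ. Whether to cut at the window start or somewhere inside the window is a design choice; adjust to match what mothur was doing for you.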
Best,
Seb
On Wed, Oct 24, 2012 at 5:49 PM, Kiss, Csaba <csaba.kiss at lanl.gov> wrote:
> Hi All!
> Thanks for all your help to extract DNA sequences from sff files. Using
> biopython I managed to improve the sequence extraction from 3 hours to 10
> minutes.
> Now that I am hooked, I would like to replace mothur with some simple
> python functions.
> Is there any function in biopython that would look for homopolymers in DNA
> sequences? In particular, I want to reject a sequence if it contains a
> stretch of more than 8 bp of any single nucleotide.
>
> Another function I am looking for is a sliding window function along the
> quality file. I could either use the fastq file or the fasta/qual file pair.
>
> I could write these functions myself but if they are available, then it
> would make my life easier.
> Thanks
>
> Csaba
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>