[Biopython] Occurrence of Sequence in fasta file
Peter Cock
p.j.a.cock at googlemail.com
Fri Jul 12 09:52:21 UTC 2013
On Fri, Jul 12, 2013 at 10:00 AM, Jurgens de Bruin <debruinjj at gmail.com> wrote:
> Hi,
>
> Does Biopython have a method of calculating the occurrence of a sequence in
> a fasta file. The actual sequence will have to be used and not the id/title
> of each sequence?
>
> Thanks
Depending on exactly what you mean (and whether you care about overlapping
matches, which count does not report), the Seq object's count method (which
behaves like the Python string's count method) might be enough, for example:
from Bio import SeqIO

my_fasta_file = "example.fasta"
my_sequence = "ACGTACGT"
# Sum the non-overlapping matches over every record in the file
print(sum(record.seq.count(my_sequence)
          for record in SeqIO.parse(my_fasta_file, "fasta")))
That is a compact way of writing the equivalent for loop:
from Bio import SeqIO

my_fasta_file = "example.fasta"
my_sequence = "ACGTACGT"
total = 0
for record in SeqIO.parse(my_fasta_file, "fasta"):
    # Add this record's non-overlapping match count to the running total
    total += record.seq.count(my_sequence)
print(total)
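If overlapping matches do matter, the count method will undercount, since it skips past each hit before looking for the next one. Here is a rough sketch of one way around that, using repeated calls to the plain string find method. The names count_overlapping and count_in_fasta are made up for this example and are not part of Biopython, and the second helper assumes the records convert cleanly with str(record.seq):

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    total = 0
    start = text.find(pattern)
    while start != -1:
        total += 1
        # Step forward by one character only, so overlapping hits are seen.
        start = text.find(pattern, start + 1)
    return total


def count_in_fasta(fasta_path, pattern):
    """Sum overlapping hits over every record in a FASTA file (hypothetical helper)."""
    # Imported here so count_overlapping itself does not need Biopython.
    from Bio import SeqIO
    return sum(count_overlapping(str(record.seq), pattern)
               for record in SeqIO.parse(fasta_path, "fasta"))


print(count_overlapping("AAAA", "AA"))  # 3, whereas "AAAA".count("AA") gives 2
```

Calling count_in_fasta(my_fasta_file, my_sequence) would then give the file-wide total. Note this still only looks at the strand as written in the file.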
Something like that?
Peter