[Biopython] Occurrence of Sequence in fasta file

Peter Cock p.j.a.cock at googlemail.com
Fri Jul 12 09:52:21 UTC 2013


On Fri, Jul 12, 2013 at 10:00 AM, Jurgens de Bruin <debruinjj at gmail.com> wrote:
> Hi,
>
> Does Biopython have a method of calculating the occurrence of a sequence in
> a fasta file. The actual sequence will have to be used and not the id/title
> of each sequence?
>
> Thanks

Depending on exactly what you mean (and whether you care about
overlapping matches), the Seq object's count method might be enough.
Like the Python string's count method, it counts non-overlapping
occurrences. For example:

from Bio import SeqIO

my_fasta_file = "example.fasta"
my_sequence = "ACGTACGT"
# Sum the per-record counts over every sequence in the file
print(sum(record.seq.count(my_sequence) for record in
          SeqIO.parse(my_fasta_file, "fasta")))

That's a compact way of writing the equivalent for loop:

from Bio import SeqIO

my_fasta_file = "example.fasta"
my_sequence = "ACGTACGT"
total = 0
for record in SeqIO.parse(my_fasta_file, "fasta"):
    # Add this record's count to the running total
    total += record.seq.count(my_sequence)
print(total)
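
If overlapping matches matter, the count method will under-count them,
because it skips past each match before looking for the next one. Here is
a minimal sketch of an overlapping count on plain Python strings. The
helper name count_overlapping and the example sequences are made up for
illustration; with Biopython you would pass str(record.seq) for each
record instead. I believe recent Biopython releases also offer a
count_overlap method on Seq, but check that against your installed version.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Step forward by one character only, so overlapping matches are found
        start = text.find(pattern, start + 1)
    return count


# Hypothetical example data standing in for the sequences in a FASTA file
sequences = ["AAAA", "ACGTAACGAA"]
non_overlapping = sum(s.count("AA") for s in sequences)
overlapping = sum(count_overlapping(s, "AA") for s in sequences)
print(non_overlapping, overlapping)
```

The two totals differ only where the pattern can overlap itself, as with
"AA" inside "AAAA".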

Something like that?

Peter


