[Bioperl-l] Script

Tue Sep 23 03:59:11 EDT 2003

This isn't a pure Bioperl implementation, but it should do the trick:

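# Assumes your sequences are in a FASTA file.
use strict;
use warnings;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file => 'my_file.fasta', -format => 'fasta');
my $count = 0;
while (my $seq = $seqio->next_seq) {
     # Copy the sequence into a variable first: matching /g against a fresh
     # method return value each time would never advance past the first hit.
     my $residues = $seq->seq;
     while ($residues =~ /NCCC/g) {
         $count++;
     }
}
print "Found motif $count times\n";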

If you need to allow 2 or more amino acids at one position, then 
use a character class ([]) in your regex.

e.g. to match NCCC and NDCC, use /N[CD]CC/g
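
The character-class idea can be tried out on a plain string without Bioperl. 
Here is a minimal sketch; count_motif and the test sequence are made up for 
illustration and are not part of Bioperl:

```perl
use strict;
use warnings;

# count_motif is a hypothetical helper, not a Bioperl function.
# It returns the number of non-overlapping matches of $motif
# (a regex given as a string) in $sequence.
sub count_motif {
    my ($sequence, $motif) = @_;
    # Assigning the list of matches to an empty list, and that to a
    # scalar, gives the number of matches found.
    my $count = () = $sequence =~ /$motif/g;
    return $count;
}

# Invented test sequence containing one NCCC and one NDCC.
my $protein = 'MKNCCCAGTNDCCLL';
print count_motif($protein, 'NCCC'), "\n";     # prints 1
print count_motif($protein, 'N[CD]CC'), "\n";  # prints 2
```

Note that /g counts non-overlapping matches. That is fine for these motifs, 
since N only appears in the first position and two hits can never overlap.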

Maybe someone else out there knows of a Bioperl module that would also 
do this.

Cheers,

Andrew

Lobvi Matamoros wrote:
> 
> 
> Hi to everyone:
> 
> I am trying to find out how many times a particular amino acid motif 
> occurs in a protein database, for instance NCCC; in other words, I want 
> to count that particular motif. Does anyone have a script to perform 
> that task, or something close that I can change a little bit?
> 
> Thanks for your help in advance
> 
> Lobvi Matamoros Fernández, Ph.D
> Post-doctoral fellow
> 
> Centre de Recherche du CHUL
> 2705 Boul. Laurier, T3-80
> Sainte-Foy (Québec)
> G1V 4G2 CANADA
> Tel: 418-6542261
> FAX:418-654-2279
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Pfotenhauerstr. 108
01307 Dresden, Germany
Tel. +49(351)210-2699
Fax  +49(351)210-1309

public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------