[Bioperl-l] Count amino acid frequency

Stephan Bour sbour at niaid.nih.gov
Fri Jun 27 15:37:21 EDT 2003


Hi Brian,
The combination of Bioperl modules you describe sounds perfect. Being new to
the list, I don't know who Jason is (yet?) but I'm sure you will all hear
from me again as I try to put this together!
Thanks,
Stephan.

> Stephan,
> 
> OK, simple. If the sequences weren't of equal length and it was essential to
> account for each sequence, then I'd say you'd have to make an alignment using
> your file as input (Bio::Tools::Run::Alignment::Clustalw), and then you
> could slice the alignment into columns with Bio::SimpleAlign::slice and
> analyze each column with Bio::Tools::SeqStats. In fact, it still may be easier
> to do this than to take each sequence, split it into an array, and so on.
> There may be other approaches of course, and I'm not sure about the details;
> this is just what my first try would be. But you should probably just wait
> one minute, Jason will probably write this application for you...
> 
> ;-)
> 
> Brian O.
> 
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Stephan Bour
> Sent: Friday, June 27, 2003 12:30 PM
> To: bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] Count amino acid frequency
> 
> Good question. There should be an array length test to eliminate any
> sequence that's not full length (81 aa) or doesn't start with a methionine.
> Stephan.
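
The length and start-residue test described above could be sketched in core Perl roughly as follows. This is only an illustration, not code from the thread: the sub name full_length_only is hypothetical, and it assumes the sequences are already in memory as uppercase one-letter strings.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only sequences that are exactly $full_length
# residues long and begin with methionine (M). Assumes uppercase input.
sub full_length_only {
    my ($full_length, @seqs) = @_;
    return grep { length($_) == $full_length && /^M/ } @seqs;
}

# 81 is the full length quoted in the thread; the three test sequences are
# made up: one valid, one too short, one of the right length but without M.
my @kept = full_length_only(81, 'M' . 'A' x 80, 'MAAA', 'A' x 81);
print scalar(@kept), " of 3 sequences kept\n";   # prints "1 of 3 sequences kept"
```

Filtering before counting keeps every remaining sequence in the same coordinate frame, so position N means the same thing in each of them.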
> 
>> Stephan,
>> 
>> Are all of your protein variants guaranteed to be of the same length?
>> 
>> Brian O.
>> 
>> -----Original Message-----
>> From: bioperl-l-bounces at portal.open-bio.org
>> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Stephan Bour
>> Sent: Friday, June 27, 2003 11:35 AM
>> To: bioperl-l at portal.open-bio.org
>> Subject: [Bioperl-l] Count amino acid frequency
>> 
>> I'm new to the list and bioperl so I hope this is not too stupid a
>> question.
>> 
>> I need to write a perl script that does the following:
>> - Take a file with about 1000 sequences of the same protein in FASTA
>> format
>> - For each position on all sequences count the number of occurrences of
>> each possible residue
>> - Return only the count of the residues actually present at each position
>> (in other words, residues present 0 times are not returned).
>> - Present the data in tab delimited format that could be imported into
>> Excel for graphing
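
A minimal core-Perl sketch of the four steps listed above, for illustration only. It is not the BioPerl-based approach suggested elsewhere in the thread, and the sub names (read_fasta, count_by_position, format_counts) are hypothetical. It assumes the sequences are unaligned but share the same numbering, and it writes one line per position: the 1-based position followed by residue/count pairs for the residues actually seen there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle; returns a list of sequence strings.
sub read_fasta {
    my ($fh) = @_;
    my @seqs;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;            # strip newline and trailing whitespace
        next if $line eq '';
        if ($line =~ /^>/) {
            push @seqs, '';            # a header line starts a new record
        }
        elsif (@seqs) {
            $seqs[-1] .= uc $line;     # a sequence may span several lines
        }
    }
    return @seqs;
}

# Tally residues per position: $counts->[$i]{$residue} = number of sequences.
sub count_by_position {
    my @seqs = @_;
    my @counts;
    for my $seq (@seqs) {
        my @res = split //, $seq;
        $counts[$_]{ $res[$_] }++ for 0 .. $#res;
    }
    return \@counts;
}

# One tab-delimited line per position; absent residues never appear because
# only keys that were actually counted exist in each hash.
sub format_counts {
    my ($counts) = @_;
    my @lines;
    for my $i (0 .. $#$counts) {
        my $col = $counts->[$i] || {};
        push @lines, join "\t", $i + 1, map { ($_, $col->{$_}) } sort keys %$col;
    }
    return @lines;
}

# Run as a script only when a FASTA file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for format_counts(count_by_position(read_fasta($fh)));
    close $fh;
}
```

Saved under a hypothetical name such as count_residues.pl, it could be run as `perl count_residues.pl variants.fasta > counts.txt`, and the resulting file opened in Excel as tab-delimited text.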
>> 
>> It is a fairly simple script to write but I try to apply the
>> do-not-reinvent-the-wheel dogma.
>> 
>> Is there a bioperl module or an existing script that would fit the bill?
>> 
>> Thanks,
>> Stephan.
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> 