[EMBOSS] EMBOSS for protein alignment stats
Anandkumar Surendrarao
aksrao at ucdavis.edu
Thu May 30 17:47:04 UTC 2019
Greetings EMBOSS users!
I have ~18,000 files, each containing Clustal-formatted protein alignments
derived from Pfam-A.full.
Some of these files are larger than 500 MB; the largest alignment is 3 GB!
I need to calculate the following alignment statistics:
A. average aligned length
B. std. dev. of aligned length
C. average of pairwise sequence ID %
D. std. dev. of pairwise sequence ID %
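For concreteness, A-D could be computed along the following lines. This is
only a minimal Python sketch under assumptions that are not taken from alistat
or EMBOSS: "aligned length" is read as each sequence's count of non-gap
residues, % identity is the number of identical columns divided by the number
of columns where both sequences have a residue, and the std. dev. is the
sample (n-1) form. All function names are illustrative. It visits every pair,
so it only suits small alignments.

```python
import math
from itertools import combinations

GAPS = "-."  # assumption: both characters denote gaps


def ungapped_length(seq):
    """Number of non-gap residues in one aligned sequence."""
    return sum(1 for c in seq if c not in GAPS)


def pairwise_identity(a, b):
    """Percent identity over columns where both sequences have a residue."""
    both = same = 0
    for x, y in zip(a, b):
        if x in GAPS or y in GAPS:
            continue
        both += 1
        if x.upper() == y.upper():
            same += 1
    return 100.0 * same / both if both else 0.0


def mean_sd(values):
    """Mean and sample standard deviation (0.0 if fewer than two values)."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, 0.0
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)


def alignment_stats(seqs):
    """Return ((A, B), (C, D)) for a list of equal-length aligned strings."""
    lengths = [ungapped_length(s) for s in seqs]
    idents = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    return mean_sd(lengths), mean_sd(idents)


if __name__ == "__main__":
    (a, b), (c, d) = alignment_stats(["ACDE-", "ACDF-", "AC--G"])
    print(f"A={a:.2f} B={b:.2f} C={c:.2f} D={d:.2f}")
```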
Here are the two problems I am seeking help with:
1. I can calculate A and C using the alistat program packaged with Ubuntu,
but not B or D.
2. For the really large alignments, an exact calculation is not an option
because of the RAM requirements, so I have used alistat's -f (fast) option,
which estimates the average % identity by "sampling".
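A generic way to get both a mean and a std. dev. from sampled pairs, without
storing every pairwise value, is Welford's online algorithm. The sketch below
is Python and is not a description of how alistat's -f option samples, which I
do not know; the class and function names, the default of 10,000 pairs, and
the identity definition (identical columns over columns where both sequences
have a residue) are all assumptions. Note that the aligned sequences
themselves still have to be in memory (or otherwise randomly accessible).

```python
import math
import random


class RunningStats:
    """Welford's online mean/variance: constant memory per statistic."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def sd(self):
        """Sample standard deviation (0.0 if fewer than two values)."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def percent_identity(a, b, gaps="-."):
    """Percent identity over columns where both sequences have a residue."""
    both = same = 0
    for x, y in zip(a, b):
        if x in gaps or y in gaps:
            continue
        both += 1
        same += x.upper() == y.upper()
    return 100.0 * same / both if both else 0.0


def sampled_identity_stats(seqs, n_pairs=10000, seed=0):
    """Estimate mean and std. dev. of % identity from random sequence pairs."""
    rng = random.Random(seed)
    stats = RunningStats()
    if len(seqs) < 2:
        return stats
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(seqs)), 2)  # two distinct sequences
        stats.add(percent_identity(seqs[i], seqs[j]))
    return stats


if __name__ == "__main__":
    st = sampled_identity_stats(["ACDE-", "ACDF-", "AC--G"], n_pairs=1000)
    print(f"pairs={st.n} mean={st.mean:.2f} sd={st.sd():.2f}")
```

The same RunningStats object can be fed the per-sequence lengths one at a
time, which gives A and B in a single pass.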
If EMBOSS has tools or tricks to report A-D with reasonable RAM and
disk-usage footprints and quick processing times, please let me know.
I am open to suggestions regarding other tools as well.
I look forward to your replies. Thanks in advance.
Sincerely,
Anand