[Biopython-dev] COMPASS parsing code
James Casbon
j.a.casbon at qmul.ac.uk
Tue Apr 27 07:45:20 EDT 2004
Hi,
I have written some code for parsing COMPASS results. COMPASS implements
profile/profile alignment and is available by FTP. See:
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12547212
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14500884
for more details.
I have attached the code, which you might like to include in the Biopython
distribution.
There are a couple of known issues that, if fixed, would improve the code:
* The unit tests use two sample input files, comtest1 and comtest2. These are
just read using open. I have seen someone use test.locate or something like
that, but I'm not sure how that works. If you want to enlighten me, I'll
change it.
* I have used regular expressions inefficiently, as I'm not sure how you're
supposed to cache them using the _Scanner/_Consumer framework. At the moment
each subroutine compiles an re every time it is called, which can't be good.
Again, please point me to a better way and I will change it.
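One common way to avoid recompiling on every call is to compile each pattern
once at module level and reuse it. The following is only a minimal sketch of
that idea, not the attached Compass.py and not the _Scanner/_Consumer API. The
names _RE_NAMES, _RE_LENGTHS, _RE_SCORES and parse_header are hypothetical,
and the patterns are guessed from the header lines in the sample output
attached to this message.

```python
import re

# Hypothetical module-level patterns: compiled once at import time and
# shared by every call, instead of being recompiled inside each routine.
_RE_NAMES = re.compile(r"Ali1:\s+(\S+)\s+Ali2:\s+(\S+)")
_RE_LENGTHS = re.compile(
    r"length1=(\d+)\s+filtered_length1=(\d+)\s+"
    r"length2=(\d+)\s+filtered_length2=(\d+)"
)
_RE_SCORES = re.compile(r"Smith-Waterman score\s*=\s*(\d+)\s+Evalue\s*=\s*(\S+)")


def parse_header(lines):
    """Collect a few header fields from one record into a dict.

    Assumes the layout seen in the attached sample output; lines that
    match none of the patterns are ignored.
    """
    record = {}
    for line in lines:
        m = _RE_NAMES.search(line)
        if m:
            record["query"], record["hit"] = m.group(1), m.group(2)
            continue
        m = _RE_LENGTHS.search(line)
        if m:
            record["query_length"] = int(m.group(1))
            record["hit_length"] = int(m.group(3))
            continue
        m = _RE_SCORES.search(line)
        if m:
            record["sw_score"] = int(m.group(1))
            record["evalue"] = float(m.group(2))
    return record


if __name__ == "__main__":
    sample = [
        "Ali1: 60456.blo.gz.aln Ali2: allscop//14982.blo.gz.aln",
        "length1=388 filtered_length1=386 length2=116 filtered_length2=115",
        "Smith-Waterman score = 35 Evalue = 1.01e+03",
    ]
    print(parse_header(sample))
```

The same effect could be had by storing the compiled patterns as class
attributes on whatever object does the scanning; the point is only that
re.compile runs once rather than once per call.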
Regards,
James
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Compass.py
Type: application/x-python
Size: 12778 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20040427/96a80e10/Compass.bin
-------------- next part --------------
Ali1: 60456.blo.gz.aln Ali2: allscop//14982.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388 filtered_length1=386 length2=116 filtered_length2=115
Nseqs1=399 Neff1=12.972 Nseqs2=1 Neff2=11.313
Smith-Waterman score = 35 Evalue = 1.01e+03
QUERY 178 KKDLEEIAD
++ ++++++
QUERY 9 QAAVQAVTA
Ali1: 60456.blo.gz.aln Ali2: allscop//14983.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388 filtered_length1=386 length2=121 filtered_length2=119
Nseqs1=399 Neff1=12.972 Nseqs2=1 Neff2=11.168
Smith-Waterman score = 35 Evalue = 1.01e+03
QUERY 178 KKDLEEIAD
++ ++++++
QUERY 9 REAVEAAVD
Ali1: 60456.blo.gz.aln Ali2: allscop//14984.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388 filtered_length1=386 length2=145 filtered_length2=137
Nseqs1=399 Neff1=12.972 Nseqs2=1 Neff2=5.869
Smith-Waterman score = 37 Evalue = 5.75e+02
QUERY 371 LEEAMDRMER~~~V
+ ++++ + + +
QUERY 76 LQNFIDQLDNpddL
Ali1: 60456.blo.gz.aln Ali2: allscop//15010.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388 filtered_length1=386 length2=141 filtered_length2=141
Nseqs1=399 Neff1=12.972 Nseqs2=1 Neff2=6.099
Smith-Waterman score = 37 Evalue = 5.75e+02
QUERY 163 LIINSP
++++++
QUERY 32 LFDAHD
-------------- next part --------------
....Ali1: 60456.blo.gz.aln Ali2: 60456.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388 filtered_length1=386 length2=388 filtered_length2=386
Nseqs1=399 Neff1=12.972 Nseqs2=399 Neff2=12.972
Smith-Waterman score = 2759 Evalue = 0.00e+00
QUERY 2 LSDRLELVSASEIRKLFDIAAGMKDVISLGIGEPDFDTPQHIKEYAKEALDKGLTHYGPN
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY 2 LSDRLELVSASEIRKLFDIAAGMKDVISLGIGEPDFDTPQHIKEYAKEALDKGLTHYGPN
QUERY IGLLELREAIAEKLKKQNGIEADPKTEIMVLLGANQAFLMGLSAFLKDGEEVLIPTPAFV
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY IGLLELREAIAEKLKKQNGIEADPKTEIMVLLGANQAFLMGLSAFLKDGEEVLIPTPAFV
QUERY SYAPAVILAGGKPVEVPTYEEDEFRLNVDELKKYVTDKTRALIINSPCNPTGAVLTKKDL
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY SYAPAVILAGGKPVEVPTYEEDEFRLNVDELKKYVTDKTRALIINSPCNPTGAVLTKKDL
QUERY EEIADFVVEHDLIVISDEVYEHFIYDDARHYSIASLDGMFERTITVNGFSKTFAMTGWRL
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY EEIADFVVEHDLIVISDEVYEHFIYDDARHYSIASLDGMFERTITVNGFSKTFAMTGWRL
QUERY GFVAAPSWIIERMVKFQMYNATCPVTFIQYAAAKALKDERSWKAVEEMRKEYDRRRKLVW
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY GFVAAPSWIIERMVKFQMYNATCPVTFIQYAAAKALKDERSWKAVEEMRKEYDRRRKLVW
QUERY KRLNEMGLPTVKPKGAFYIFPRIRDTGLTSKKFSELMLKEARVAVVPGSAFGKAGEGYVR
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY KRLNEMGLPTVKPKGAFYIFPRIRDTGLTSKKFSELMLKEARVAVVPGSAFGKAGEGYVR
QUERY ISYATAYEKLEEAMDRMERVLKERKL
++++++++++++++++++++++++++
QUERY ISYATAYEKLEEAMDRMERVLKERKL