PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter hates me)

Tue Nov 2 18:46:57 EST 2004

Thanks for all the quick and detailed responses!  Quite a motivation to 
stick with (Bio)Python!

I was unaware of the ASTRAL database -- that might be useful in the 
future (I was looking to do this particular conversion on an 
experimentally determined PDB structure that has not yet been 
submitted).  Can BioPython interact directly with the (remote) ASTRAL 
database?  Or is it something I would need to have locally?

Thanks again,
Doug

On Nov 2, 2004, at 6:34 PM, Iddo wrote:

> Trouble is, the conversion here might not be good for some 
> purposes, as structure -> sequence conversion applications usually 
> want (1) a unique mapping and (2) a 20-letter alphabet + 'X' for 
> everything else.
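
One way to read the two requirements above, as a minimal sketch: keep 
only the 20 standard residues and send every other code to 'X', so that 
no modified residue is silently folded onto its parent.  The names 
STANDARD_3TO1, strict_one_letter and strict_sequence are made up for 
illustration; they are not part of Biopython.

```python
# Hypothetical helper: strict three-letter -> one-letter conversion.
# Only the 20 standard amino acids are recognised.
STANDARD_3TO1 = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def strict_one_letter(resname):
    """Return the one-letter code for a standard residue, else 'X'."""
    return STANDARD_3TO1.get(resname.strip().upper(), 'X')


def strict_sequence(resnames):
    """Join the strict one-letter codes of a list of residue names."""
    return ''.join(strict_one_letter(name) for name in resnames)


# MSE and TPO are modified residues, so they come out as 'X' here.
print(strict_sequence(['MET', 'MSE', 'ALA', 'TPO']))  # MXAX
```

The trade-off is that modified residues lose their identity entirely, 
which may or may not suit a given application.
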
>
>
> Gavin Crooks wrote:
>
>> There is also a longer three letter code conversion table in 
>> Bio/SCOP/Raf.py
>> The PDB contains a whole bunch of weird 3 letter codes for different
>> chemically modified amino acids.
>>
>> Another possibility is to get the fasta sequences directly from the
>> ASTRAL database, since they have already grubbed around and done the
>> conversion.
>>
>> Gavin
>>
>>
>> # This table is taken from the RAF release notes, and includes the
>> # undocumented mapping "UNK" -> "X"
>> to_one_letter_code = {
>>     'ALA':'A', 'VAL':'V', 'PHE':'F', 'PRO':'P', 'MET':'M',
>>     'ILE':'I', 'LEU':'L', 'ASP':'D', 'GLU':'E', 'LYS':'K',
>>     'ARG':'R', 'SER':'S', 'THR':'T', 'TYR':'Y', 'HIS':'H',
>>     'CYS':'C', 'ASN':'N', 'GLN':'Q', 'TRP':'W', 'GLY':'G',
>>     '2AS':'D', '3AH':'H', '5HP':'E', 'ACL':'R', 'AIB':'A',
>>     'ALM':'A', 'ALO':'T', 'ALY':'K', 'ARM':'R', 'ASA':'D',
>>     'ASB':'D', 'ASK':'D', 'ASL':'D', 'ASQ':'D', 'AYA':'A',
>>     'BCS':'C', 'BHD':'D', 'BMT':'T', 'BNN':'A', 'BUC':'C',
>>     'BUG':'L', 'C5C':'C', 'C6C':'C', 'CCS':'C', 'CEA':'C',
>>     'CHG':'A', 'CLE':'L', 'CME':'C', 'CSD':'A', 'CSO':'C',
>>     'CSP':'C', 'CSS':'C', 'CSW':'C', 'CXM':'M', 'CY1':'C',
>>     'CY3':'C', 'CYG':'C', 'CYM':'C', 'CYQ':'C', 'DAH':'F',
>>     'DAL':'A', 'DAR':'R', 'DAS':'D', 'DCY':'C', 'DGL':'E',
>>     'DGN':'Q', 'DHA':'A', 'DHI':'H', 'DIL':'I', 'DIV':'V',
>>     'DLE':'L', 'DLY':'K', 'DNP':'A', 'DPN':'F', 'DPR':'P',
>>     'DSN':'S', 'DSP':'D', 'DTH':'T', 'DTR':'W', 'DTY':'Y',
>>     'DVA':'V', 'EFC':'C', 'FLA':'A', 'FME':'M', 'GGL':'E',
>>     'GLZ':'G', 'GMA':'E', 'GSC':'G', 'HAC':'A', 'HAR':'R',
>>     'HIC':'H', 'HIP':'H', 'HMR':'R', 'HPQ':'F', 'HTR':'W',
>>     'HYP':'P', 'IIL':'I', 'IYR':'Y', 'KCX':'K', 'LLP':'K',
>>     'LLY':'K', 'LTR':'W', 'LYM':'K', 'LYZ':'K', 'MAA':'A',
>>     'MEN':'N', 'MHS':'H', 'MIS':'S', 'MLE':'L', 'MPQ':'G',
>>     'MSA':'G', 'MSE':'M', 'MVA':'V', 'NEM':'H', 'NEP':'H',
>>     'NLE':'L', 'NLN':'L', 'NLP':'L', 'NMC':'G', 'OAS':'S',
>>     'OCS':'C', 'OMT':'M', 'PAQ':'Y', 'PCA':'E', 'PEC':'C',
>>     'PHI':'F', 'PHL':'F', 'PR3':'C', 'PRR':'A', 'PTR':'Y',
>>     'SAC':'S', 'SAR':'G', 'SCH':'C', 'SCS':'C', 'SCY':'C',
>>     'SEL':'S', 'SEP':'S', 'SET':'S', 'SHC':'C', 'SHR':'K',
>>     'SOC':'C', 'STY':'Y', 'SVA':'S', 'TIH':'A', 'TPL':'W',
>>     'TPO':'T', 'TPQ':'A', 'TRG':'K', 'TRO':'W', 'TYB':'Y',
>>     'TYQ':'Y', 'TYS':'Y', 'TYY':'Y', 'AGM':'R', 'GL3':'G',
>>     'SMC':'C', 'ASX':'B', 'CGU':'E', 'CSX':'C', 'GLX':'Z',
>>     'UNK':'X'
>>     }
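
A sketch of how a table like the one above might be used.  Only a few 
entries are copied here so that the snippet runs by itself; the helper 
name to_sequence, and the choice to hand back the codes that were not 
found, are assumptions made for illustration.

```python
# A small excerpt of the table above, so this snippet is self-contained.
to_one_letter_code = {
    'ALA': 'A', 'GLY': 'G', 'MET': 'M', 'SER': 'S',
    'MSE': 'M',   # selenomethionine is folded onto methionine
    'SEP': 'S',   # phosphoserine is folded onto serine
    'UNK': 'X',
}


def to_sequence(resnames, table=to_one_letter_code, unknown='X'):
    """Convert three-letter codes to a one-letter string.

    Codes missing from the table become `unknown`; they are also
    returned so the caller can see what was not recognised.
    """
    letters, missing = [], []
    for name in resnames:
        key = name.strip().upper()
        if key in table:
            letters.append(table[key])
        else:
            letters.append(unknown)
            missing.append(key)
    return ''.join(letters), missing


seq, missing = to_sequence(['MET', 'MSE', 'SEP', 'HOH', 'GLY'])
print(seq)      # MMSXG
print(missing)  # ['HOH']
```
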
>>
>> On Nov 2, 2004, at 13:51, Iddo wrote:
>>
>>> Welcome aboard, and I am glad we managed to save one Jedi from the 
>>> dark side of bioinformatics...;)
>>>
>>> As to your question:  see the attached class. I should get that in 
>>> Biopython, but I keep not doing that....
>>>
>>> ./I
>>>
>>>
>>> Douglas Kojetin wrote:
>>>
>>>> Hi All-
>>>>
>>>> I'm a beginner @ biopython (and I'm 'switching' from perl to python 
>>>> ...).  First off, many thanks for the structural biopython FAQ ... 
>>>> very helpful!  My question:  can anyone help me with some ideas on 
>>>> how to whip up a quick PDB->FASTA (sequence) script?
>>>>
>>>> From the structural biopython faq, I've been able to extract 
>>>> residue information in the form of:
>>>>
>>>>  <Residue MET het=  resseq=1 icode= >
>>>>
>>>> I take it I would just need to grab the MET (split the residue 
>>>> object and grab the r[1] index?) and convert into M, then append to 
>>>> a sequence string ....
>>>>
>>>> but I didn't know if biopython had something that did an 
>>>> autoconversion of MET->M, or vice versa (M->MET).
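
For the PDB -> FASTA step itself, here is a rough, standard-library-only 
sketch.  It reads the fixed-column ATOM records directly instead of going 
through Bio.PDB (where the three-letter name of a residue object is 
available via residue.get_resname()).  The function names 
pdb_lines_to_sequences and to_fasta, the header layout, and the sample 
coordinates are all made up for illustration.  It assumes the atoms of 
each residue are listed contiguously, skips HETATM records, and only sees 
residues that are actually present in the coordinates.

```python
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def pdb_lines_to_sequences(lines):
    """Return {chain_id: sequence} built from ATOM records."""
    chains = {}
    last_seen = {}  # chain id -> residue id of the last residue counted
    for line in lines:
        if line.startswith('ENDMDL'):
            break  # only read the first model of a multi-model file
        if not line.startswith('ATOM'):
            continue
        resname = line[17:20].strip()   # columns 18-20
        chain = line[21]                # column 22
        res_id = line[22:27]            # columns 23-27: number + icode
        if last_seen.get(chain) == res_id:
            continue  # another atom of a residue already counted
        last_seen[chain] = res_id
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(resname, 'X'))
    return {chain: ''.join(letters) for chain, letters in chains.items()}


def to_fasta(name, sequences, width=60):
    """Format {chain_id: sequence} as FASTA text, one record per chain."""
    out = []
    for chain, seq in sequences.items():
        out.append(f'>{name}:{chain}')
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return '\n'.join(out)


# Made-up records, truncated after the coordinates.
sample = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "ATOM      3  N   GLN A   2      26.913  26.639   3.531",
    "ATOM      4  N   ILE B   1      10.000  10.000  10.000",
    "HETATM    5  O   HOH A 101      12.000  12.000  12.000",
]
print(to_fasta('model', pdb_lines_to_sequences(sample)))
```

With a real file, the lines would come from something like 
open('model.pdb'), where the file name is again only a placeholder.
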
>>>>
>>>> Thanks for the input,
>>>> Doug
>>>>
>>>
>>
>> Gavin E. Crooks
>> Postdoctoral Fellow                  tel:  (510) 642-9614
>> 461 Koshland Hall                    aim:notastring
>> University of California             http://threeplusone.com/
>> Berkeley, CA 94720-3102, USA         gec at compbio.berkeley.edu
>>
>>
>>
>
>
> -- 
> Iddo Friedberg, Ph.D.
> The Burnham Institute
> 10901 North Torrey Pines Road
> La Jolla, CA 92037 USA
> T: (858) 646 3100 x3516
> F: (858) 713 9930
> http://ffas.ljcrf.edu/~iddo
>