PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter hates me)
Douglas Kojetin
djkojeti at unity.ncsu.edu
Tue Nov 2 18:46:57 EST 2004
Thanks for all the quick and detailed responses! Quite a motivation to
stick with (Bio)Python!
I was unaware of the ASTRAL database -- that might be useful in the
future (I was looking to do this particular conversion on an
experimentally determined PDB that has not yet been submitted). Can
BioPython interact directly with the (remote) ASTRAL database, or is it
something I would need to have locally?
Thanks again,
Doug
On Nov 2, 2004, at 6:34 PM, Iddo wrote:
> The trouble is that the conversion here might not be suitable for some
> purposes, as structure -> sequence conversion applications usually want
> (1) a unique mapping and (2) a 20-letter alphabet + 'X' for everything
> else.
>
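The two requirements named above (exactly one letter per residue, and only the 20 standard letters plus 'X') could be sketched as a dictionary lookup with a fallback. This is only an illustration: the names STANDARD_20, residue_to_letter and residues_to_sequence are made up here and are not part of Biopython. Whether a modified residue such as MSE should map to its parent amino acid or to 'X' depends on the application; this sketch takes the strict route and sends everything non-standard to 'X'.

```python
# Hypothetical helper names, not a Biopython API.
# Strict alphabet: the 20 standard amino acids, everything else -> 'X'.
STANDARD_20 = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def residue_to_letter(resname):
    """Return exactly one letter for a three-letter residue name."""
    # Normalise case and padding, then fall back to 'X' for unknown names.
    return STANDARD_20.get(resname.strip().upper(), 'X')


def residues_to_sequence(resnames):
    """Join the one-letter codes of a list of residue names into a string."""
    return ''.join(residue_to_letter(name) for name in resnames)
```

Because every residue name yields a single character, the resulting string has the same length as the residue list, which keeps sequence positions aligned with structure positions.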
>
> Gavin Crooks wrote:
>
>> There is also a longer three letter code conversion table in
>> Bio/SCOP/Raf.py
>> The PDB contains a whole bunch of weird 3 letter codes for different
>> chemically modified amino acids.
>>
>> Another possibility is to get the fasta sequences directly from the
>> ASTRAL database, since they have already grubbed around and done the
>> conversion.
>>
>> Gavin
>>
>>
>> # This table is taken from the RAF release notes, and includes the
>> # undocumented mapping "UNK" -> "X"
>> to_one_letter_code = {
>> 'ALA':'A', 'VAL':'V', 'PHE':'F', 'PRO':'P', 'MET':'M',
>> 'ILE':'I', 'LEU':'L', 'ASP':'D', 'GLU':'E', 'LYS':'K',
>> 'ARG':'R', 'SER':'S', 'THR':'T', 'TYR':'Y', 'HIS':'H',
>> 'CYS':'C', 'ASN':'N', 'GLN':'Q', 'TRP':'W', 'GLY':'G',
>> '2AS':'D', '3AH':'H', '5HP':'E', 'ACL':'R', 'AIB':'A',
>> 'ALM':'A', 'ALO':'T', 'ALY':'K', 'ARM':'R', 'ASA':'D',
>> 'ASB':'D', 'ASK':'D', 'ASL':'D', 'ASQ':'D', 'AYA':'A',
>> 'BCS':'C', 'BHD':'D', 'BMT':'T', 'BNN':'A', 'BUC':'C',
>> 'BUG':'L', 'C5C':'C', 'C6C':'C', 'CCS':'C', 'CEA':'C',
>> 'CHG':'A', 'CLE':'L', 'CME':'C', 'CSD':'A', 'CSO':'C',
>> 'CSP':'C', 'CSS':'C', 'CSW':'C', 'CXM':'M', 'CY1':'C',
>> 'CY3':'C', 'CYG':'C', 'CYM':'C', 'CYQ':'C', 'DAH':'F',
>> 'DAL':'A', 'DAR':'R', 'DAS':'D', 'DCY':'C', 'DGL':'E',
>> 'DGN':'Q', 'DHA':'A', 'DHI':'H', 'DIL':'I', 'DIV':'V',
>> 'DLE':'L', 'DLY':'K', 'DNP':'A', 'DPN':'F', 'DPR':'P',
>> 'DSN':'S', 'DSP':'D', 'DTH':'T', 'DTR':'W', 'DTY':'Y',
>> 'DVA':'V', 'EFC':'C', 'FLA':'A', 'FME':'M', 'GGL':'E',
>> 'GLZ':'G', 'GMA':'E', 'GSC':'G', 'HAC':'A', 'HAR':'R',
>> 'HIC':'H', 'HIP':'H', 'HMR':'R', 'HPQ':'F', 'HTR':'W',
>> 'HYP':'P', 'IIL':'I', 'IYR':'Y', 'KCX':'K', 'LLP':'K',
>> 'LLY':'K', 'LTR':'W', 'LYM':'K', 'LYZ':'K', 'MAA':'A',
>> 'MEN':'N', 'MHS':'H', 'MIS':'S', 'MLE':'L', 'MPQ':'G',
>> 'MSA':'G', 'MSE':'M', 'MVA':'V', 'NEM':'H', 'NEP':'H',
>> 'NLE':'L', 'NLN':'L', 'NLP':'L', 'NMC':'G', 'OAS':'S',
>> 'OCS':'C', 'OMT':'M', 'PAQ':'Y', 'PCA':'E', 'PEC':'C',
>> 'PHI':'F', 'PHL':'F', 'PR3':'C', 'PRR':'A', 'PTR':'Y',
>> 'SAC':'S', 'SAR':'G', 'SCH':'C', 'SCS':'C', 'SCY':'C',
>> 'SEL':'S', 'SEP':'S', 'SET':'S', 'SHC':'C', 'SHR':'K',
>> 'SOC':'C', 'STY':'Y', 'SVA':'S', 'TIH':'A', 'TPL':'W',
>> 'TPO':'T', 'TPQ':'A', 'TRG':'K', 'TRO':'W', 'TYB':'Y',
>> 'TYQ':'Y', 'TYS':'Y', 'TYY':'Y', 'AGM':'R', 'GL3':'G',
>> 'SMC':'C', 'ASX':'B', 'CGU':'E', 'CSX':'C', 'GLX':'Z',
>> 'UNK':'X'
>> }
>>
>> On Nov 2, 2004, at 13:51, Iddo wrote:
>>
>>> Welcome aboard, and I am glad we managed to save one Jedi from the
>>> dark side of bioinformatics...;)
>>>
>>> As to your question: see the attached class. I should get that in
>>> Biopython, but I keep not doing that....
>>>
>>> ./I
>>>
>>>
>>> Douglas Kojetin wrote:
>>>
>>>> Hi All-
>>>>
>>>> I'm a beginner at Biopython (and I'm 'switching' from Perl to Python
>>>> ...). First off, many thanks for the structural Biopython FAQ ...
>>>> very helpful! My question: can anyone help me with some ideas on
>>>> how to whip up a quick PDB->FASTA (sequence) script?
>>>>
>>>> From the structural biopython faq, I've been able to extract
>>>> residue information in the form of:
>>>>
>>>> <Residue MET het= resseq=1 icode= >
>>>>
>>>> I take it I would just need to grab the MET (split the residue
>>>> object and grab the r[1] index?) and convert into M, then append to
>>>> a sequence string ....
>>>>
>>>> but I didn't know if biopython had something that did an
>>>> autoconversion of MET->M, or vice versa (M->MET).
>>>>
>>>> Thanks for the input,
>>>> Doug
>>>>
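The approach described in the question (take each residue's three-letter name, convert it to one letter, append it to a sequence string) could be sketched without Biopython by reading the fixed columns of the ATOM records directly. This is a simplification and not what Bio.PDB does internally. It assumes a standard-format PDB file, looks only at ATOM records (so modified residues stored as HETATM are skipped), counts one residue per (chain, residue number, insertion code), and uses a 20-residue table with 'X' as the fallback. The names THREE_TO_ONE and pdb_to_fasta are made up for illustration. With Bio.PDB itself, the three-letter name of a residue object is available from residue.get_resname().

```python
# Hypothetical sketch, not a Biopython API: PDB ATOM records -> FASTA text.
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def pdb_to_fasta(pdb_lines, name='structure'):
    """Build one FASTA record per chain from an iterable of PDB lines."""
    chains = {}   # chain ID -> list of one-letter codes, in file order
    seen = set()  # (chain, residue number, insertion code) already counted
    for line in pdb_lines:
        if not line.startswith('ATOM'):
            continue
        resname = line[17:20].strip()   # PDB columns 18-20
        chain = line[21:22]             # PDB column 22
        res_id = (chain, line[22:26], line[26:27])  # columns 23-26 and 27
        if res_id in seen:
            continue  # another atom of a residue that was already counted
        seen.add(res_id)
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(resname, 'X'))
    records = []
    for chain, letters in chains.items():
        records.append('>%s_%s\n%s' % (name, chain.strip(), ''.join(letters)))
    return '\n'.join(records) + '\n'
```

A call such as pdb_to_fasta(open('model.pdb'), 'model') would return the FASTA text as one string, where 'model.pdb' stands in for whatever local file is being converted. Swapping THREE_TO_ONE for a longer table, such as the one in Bio/SCOP/Raf.py, changes how modified residues are reported without touching the parsing.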
>>>>
>>>>
>>>
>>
>> Gavin E. Crooks
>> Postdoctoral Fellow tel: (510) 642-9614
>> 461 Koshland Hall aim:notastring
>> University of California http://threeplusone.com/
>> Berkeley, CA 94720-3102, USA gec at compbio.berkeley.edu
>>
>>
>>
>
>
> --
> Iddo Friedberg, Ph.D.
> The Burnham Institute
> 10901 North Torrey Pines Road
> La Jolla, CA 92037 USA
> T: (858) 646 3100 x3516
> F: (858) 713 9930
> http://ffas.ljcrf.edu/~iddo
>