[Biopython] Generating a fasta file from atomic coordinate file

Wed Mar 21 10:56:35 UTC 2018

Good point - we should be able to access mmCIF files
via SeqIO as well:

https://github.com/biopython/biopython/issues/1573
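
In the meantime, a rough sketch of pulling the sequences John mentions
out of the _entity_poly category with Bio.PDB.MMCIF2Dict might look like
the code below. The helper names (entity_poly_to_fasta, mmcif_file_to_fasta)
and the FASTA header layout are made up for illustration, and it assumes
the file actually carries _entity_poly.pdbx_seq_one_letter_code_can:

```python
def entity_poly_to_fasta(cif_dict, entry_id="entry"):
    """Build FASTA text from the _entity_poly items of a parsed mmCIF dict.

    Hypothetical helper: cif_dict is any mapping of mmCIF item names to
    values, such as the one returned by Bio.PDB.MMCIF2Dict.MMCIF2Dict.
    """
    ids = cif_dict.get("_entity_poly.entity_id", [])
    seqs = cif_dict.get("_entity_poly.pdbx_seq_one_letter_code_can", [])
    # Some parser versions return a bare string for single-row categories.
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(seqs, str):
        seqs = [seqs]
    lines = []
    for entity_id, seq in zip(ids, seqs):
        # Multi-line mmCIF values keep their line breaks; drop all whitespace.
        seq = "".join(seq.split())
        lines.append(">%s_entity_%s" % (entry_id, entity_id))
        lines.append(seq)
    return "\n".join(lines) + "\n" if lines else ""


def mmcif_file_to_fasta(cif_path, fasta_path, entry_id="entry"):
    """Read an mmCIF file and write one FASTA record per polymer entity."""
    # Imported here so the helper above works without Biopython installed.
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict

    cif_dict = MMCIF2Dict(cif_path)
    with open(fasta_path, "w") as handle:
        handle.write(entity_poly_to_fasta(cif_dict, entry_id))
```

Calling mmcif_file_to_fasta("1xyz.cif", "1xyz.fasta", "1xyz") would then
give one record per polymer entity (not per chain), using the canonical
one-letter codes.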

Peter

On Wed, Mar 21, 2018 at 7:27 AM, John Berrisford <jmb at ebi.ac.uk> wrote:
> Hi
>
> The mmCIF coordinate file has this information readily available in the
> _entity_poly category, either with non-standard residues in brackets
> (pdbx_seq_one_letter_code) or with non-standard residues given the
> one-letter code of their parent, e.g. MSE -> MET
> (pdbx_seq_one_letter_code_can).
>
> I do not know if this is available in Biopython's mmCIF parser.
>
> Please note that PDB format files are not available for every entry in the
> PDB due to limitations of the format. However, mmCIF files are available for
> every entry.
>
> Regards
>
> John
>
>
>
> On 20/03/2018 23:23, Peter Cock wrote:
>>
>> That uses the 3D structure to get the protein sequence
>> (via the PDB parser, which has NumPy as a dependency), and
>> the code to call it can be shortened to just:
>>
>> from Bio import SeqIO
>> SeqIO.convert("input.pdb", "pdb-atom", "output.fasta", "fasta")
>>
>> Or, if you just want the sequence in the SEQRES header:
>>
>> from Bio import SeqIO
>> SeqIO.convert("input.pdb", "pdb-seqres", "output.fasta", "fasta")
>>
>> See:
>>
>> http://biopython.org/wiki/SeqIO
>>
>> Peter
>>
>> On Tue, Mar 20, 2018 at 10:05 PM, João Rodrigues
>> <j.p.g.l.m.rodrigues at gmail.com> wrote:
>>>
>>> Hi Ahmad,
>>>
>>> You can use Bio.SeqIO directly on the PDB file:
>>>
>>> from Bio import SeqIO
>>> records = SeqIO.parse('1xyz.pdb', 'pdb-atom')
>>> with open('1xyz.fasta', 'w') as handle:
>>>     SeqIO.write(records, handle, "fasta")
>>>
>>> Not sure if there is a way to couple SeqIO directly to the Bio.PDB code
>>> (a method that reads the sequence from the SMCRA object); that would
>>> be cool to add.
>>>
>>> Cheers,
>>>
>>> João
>>>