[Biopython-dev] Primary Sequence of all protein (help)

Tue Mar 16 19:42:43 UTC 2010

On Tue, Mar 16, 2010 at 7:24 PM, Rodrigo Faccioli
<rodrigo_faccioli at uol.com.br> wrote:
>
> Hi all,
>
> I want to know the primary sequence (fasta file) of all proteins. In other
> words, I would like a database which contains the fasta files of all
> proteins.
>
> I'm a computer scientist and I don't know how hard it is. However, we have
> worked with SEQRES section of PDB files and BioPython. So, we want to work
> with fasta files and BioPython to check our results.

A single FASTA file of all known proteins would be enormous. Even the
non-redundant ("nr") dataset used by the NCBI for their hugely popular
BLAST search is pretty big.

It sounds like maybe all you need is a FASTA file containing all the
sequences with structures in the PDB - something you may be
able to download directly from the PDB FTP site.
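
As a rough sketch of the kind of check described above, the following
reads FASTA records into a dictionary and compares a sequence taken from
a SEQRES section against it. It uses only the standard library so that it
runs anywhere; with Biopython installed, Bio.SeqIO.parse(handle, "fasta")
would be the usual way to read the records. The names read_fasta and
matches_reference, the record identifiers and the sequences are all
made up for illustration, and the assumption that the first token of each
header line is a usable identifier may not hold for a real download.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Assumption: the first whitespace-delimited token is the ID.
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def matches_reference(identifier, seqres_sequence, reference):
    """Return True if the SEQRES-derived sequence equals the FASTA one."""
    return reference.get(identifier) == seqres_sequence


# Hypothetical two-record FASTA text standing in for a downloaded file.
example = io.StringIO(
    ">chainA hypothetical record\n"
    "MKTAYIAK\n"
    "QRQISFVK\n"
    ">chainB hypothetical record\n"
    "GSHMLE\n"
)
reference = dict(read_fasta(example))
```

To check your own results you would replace the in-memory example with
an open file handle and call matches_reference() once per chain, passing
the one-letter sequence you derived from SEQRES.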

Peter