[Biopython-dev] Primary Sequence of all protein (help)
Peter
biopython at maubp.freeserve.co.uk
Tue Mar 16 19:42:43 UTC 2010
On Tue, Mar 16, 2010 at 7:24 PM, Rodrigo Faccioli
<rodrigo_faccioli at uol.com.br> wrote:
>
> Hi all,
>
> I want to know the primary sequence (fasta file) of all proteins. In other
> words, I would like a database which contains the fasta files of all
> proteins.
>
> I'm a computer scientist and I don't know how hard it is. However, we have
> worked with SEQRES section of PDB files and BioPython. So, we want to work
> with fasta files and BioPython to check our results.
A single FASTA file of all known proteins would be enormous. Even the
non-redundant ("nr") dataset used by the NCBI for their hugely popular
BLAST search is pretty big.
It sounds like maybe all you need is a FASTA file containing all the
sequences with structures in the PDB - something you may be
able to download directly from the PDB FTP site.
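To check your SEQRES results against such a file, something along these
lines might be a starting point. It is only a rough sketch: the toy records,
the "1abc_A" style identifiers, and the helper names read_fasta and
mismatches are made up for illustration, and I am assuming your own SEQRES
sequences end up in a dict keyed the same way as the FASTA record ids. It
uses Bio.SeqIO.parse(handle, "fasta") when Biopython is available and falls
back to a very simple parser otherwise.

```python
from io import StringIO

try:
    from Bio import SeqIO  # Biopython, if it is installed
except ImportError:
    SeqIO = None


def read_fasta(handle):
    """Map each record id (first word of the '>' header) to its sequence string."""
    if SeqIO is not None:
        return {rec.id: str(rec.seq) for rec in SeqIO.parse(handle, "fasta")}
    # Plain-Python fallback: join the sequence lines that follow each header.
    seqs, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line
    return seqs


def mismatches(own, reference):
    """Return the ids in `own` whose sequence differs from, or is missing in, `reference`."""
    return sorted(key for key, seq in own.items() if reference.get(key) != seq)


# Toy data only: invented ids and sequences, not real PDB entries.
EXAMPLE_FASTA = """>1abc_A mol:protein length:12  EXAMPLE PROTEIN
MKTAYIAK
QRQI
>1abc_B mol:protein length:5  EXAMPLE PEPTIDE
GSHMA
"""

reference = read_fasta(StringIO(EXAMPLE_FASTA))

# Hypothetical result of your own SEQRES parsing, keyed like the FASTA ids.
own_seqres = {"1abc_A": "MKTAYIAKQRQI", "1abc_B": "GSHMX"}

print(mismatches(own_seqres, reference))
```

With the toy data above, only 1abc_B is reported, since its last residue
differs. For a real file you would pass an open file handle instead of the
StringIO object.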
Peter