[BioPython] Read Astral file

Brad Chapman chapmanb@arches.uga.edu
Thu, 5 Sep 2002 14:56:18 -0400


Hi Pablo;

> I have the astral-scopdom-seqres-all-1.59.fa file.
> 
> Here's an example:
> >d1a3oa_ a.1.1.2 (A:) Hemoglobin, alpha-chain {Human (Homo sapiens)}
> vlspadktnvkaawgkvgahageygaealermflsfpttkthfphfdlshgsaqvkghgk
> kvadaltnavahvddmpnalsalsdlhahklrvdpvnfkllshcllvtlaahlpaeftpa
> vhasldkflasvstvltskyr
> 
> I would like to read it and work with the sequence, but I don't know if 
> there's a class in Biopython for doing this or whether I should write it myself.

I don't really know anything about this type of file, but from your
example it looks just like a Fasta-formatted file, in which case you can
use the Fasta parser in Biopython.
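To show why a Fasta parser fits, here is a minimal stdlib-only sketch of what such a reader does: a line starting with ">" opens a new record (its title), and the lines that follow are joined into that record's sequence. `read_fasta` is a hypothetical helper for illustration only, not part of Biopython; for real work the Biopython parser is the better choice.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: the title is the header line without the
    leading '>', and the sequence lines after it are concatenated.
    """
    title = None
    chunks = []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None and line:
            chunks.append(line)
    # Emit the last record in the file.
    if title is not None:
        yield title, "".join(chunks)


# The example record from the question.
example = """>d1a3oa_ a.1.1.2 (A:) Hemoglobin, alpha-chain {Human (Homo sapiens)}
vlspadktnvkaawgkvgahageygaealermflsfpttkthfphfdlshgsaqvkghgk
kvadaltnavahvddmpnalsalsdlhahklrvdpvnfkllshcllvtlaahlpaeftpa
vhasldkflasvstvltskyr
"""

for title, seq in read_fasta(example.splitlines()):
    print(title.split()[1], len(seq))  # prints: a.1.1.2 141
```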

> I need the SCOP classification (the "a.1.1.2" part) and the 
> sequence in order to be able to cluster the sequences with the same 
> superfamily.

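This should be pretty easy to do. Note that "a.1.1.2" is the full SCOP
classification string (class.fold.superfamily.family), so the superfamily
is just the first three fields, "a.1.1". All you'd need to do is something
like: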

# Bio.Fasta has since been removed from Biopython; Bio.SeqIO reads Fasta.
from Bio import SeqIO

# work through all of the records in the file
with open("astral-scopdom-seqres-all-1.59.fa") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        # description is the whole header line (without the ">"), e.g.
        # "d1a3oa_ a.1.1.2 (A:) Hemoglobin, alpha-chain {Human (Homo sapiens)}"
        title_parts = record.description.split()
        sccs = title_parts[1]  # e.g. "a.1.1.2"
        # class.fold.superfamily.family: keep the first three fields
        superfamily = ".".join(sccs.split(".")[:3])  # e.g. "a.1.1"

        # do whatever you want for sorting them. For instance, if you wanted
        # to write the Fasta records to separate files based on superfamily,
        # then you could do:
        with open(superfamily + ".fasta", "a") as output_handle:
            SeqIO.write(record, output_handle, "fasta")
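If you'd rather cluster in memory than write one file per group, here is a minimal stdlib-only sketch. It assumes every header follows the layout of the example (sid, then sccs, then chain, then description). `parse_astral_title` and `group_by_superfamily` are hypothetical helper names, and the last two records are made-up placeholders (the first sequence is truncated). It takes (title, sequence) pairs, so it should work with any Fasta reader, e.g. `(record.description, str(record.seq))` from Biopython's SeqIO.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_astral_title(title: str) -> Tuple[str, str, str]:
    """Split a header into (sid, sccs, superfamily).

    Hypothetical helper; assumes the layout
    "<sid> <sccs> (<chain>) <description>".
    """
    sid, sccs = title.split()[:2]
    # sccs is class.fold.superfamily.family, so the first three
    # dot-separated fields identify the superfamily.
    superfamily = ".".join(sccs.split(".")[:3])
    return sid, sccs, superfamily


def group_by_superfamily(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each superfamily to a list of (sid, sequence) pairs."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for title, seq in records:
        sid, _sccs, superfamily = parse_astral_title(title)
        groups[superfamily].append((sid, seq))
    return dict(groups)


# First title is from the question (sequence truncated); the other two
# are placeholders for illustration only.
records = [
    ("d1a3oa_ a.1.1.2 (A:) Hemoglobin, alpha-chain {Human (Homo sapiens)}", "vlspadktnv"),
    ("dtoy01_ a.1.1.9 (A:) placeholder", "mkv"),
    ("dtoy02_ b.9.9.9 (A:) placeholder", "gsh"),
]

groups = group_by_superfamily(records)
for superfamily, members in sorted(groups.items()):
    print(superfamily, [sid for sid, _ in members])
# prints:
# a.1.1 ['d1a3oa_', 'dtoy01_']
# b.9.9 ['dtoy02_']
```

Grouping on the first three fields puts records from different families of the same superfamily (a.1.1.2 and a.1.1.9 here) into one cluster, which is what the question asks for.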

Hopefully this makes some sense and answers your question. Good luck!
Brad