[BioPython] Read Astral file
Brad Chapman
chapmanb@arches.uga.edu
Thu, 5 Sep 2002 14:56:18 -0400
Hi Pablo;
> I have the astral-scopdom-seqres-all-1.59.fa file.
>
> Here it's an example:
> >d1a3oa_ a.1.1.2 (A:) Hemoglobin, alpha-chain {Human (Homo sapiens)}
> vlspadktnvkaawgkvgahageygaealermflsfpttkthfphfdlshgsaqvkghgk
> kvadaltnavahvddmpnalsalsdlhahklrvdpvnfkllshcllvtlaahlpaeftpa
> vhasldkflasvstvltskyr
>
> I would like to read it, and work with the sequence, but I don't know if
> there's any class in biopython for doing this or I should write it myself.
I don't really know anything about this type of file, but from your
example they look just like Fasta formatted files in which case you can
use the Fasta parser in Biopython.
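For reference, "Fasta formatted" just means a '>' header line followed by one or more sequence lines. Below is a minimal stdlib-only reader, run on the example record from your message, to show what the parser is doing. The name `read_fasta` is hypothetical and not part of Biopython. In practice the Biopython parser does this for you, and newer Biopython releases expose it as `Bio.SeqIO.parse(handle, "fasta")`.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs from a FASTA-formatted text handle.

    The title is the header line without the leading '>'; sequence lines
    are joined with their line breaks and surrounding whitespace removed.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None and line.strip():
            chunks.append(line.strip())
    # Emit the final record once the input is exhausted.
    if title is not None:
        yield title, "".join(chunks)


example = io.StringIO(
    ">d1a3oa_ a.1.1.2 (A:) Hemoglobin, alpha-chain {Human (Homo sapiens)}\n"
    "vlspadktnvkaawgkvgahageygaealermflsfpttkthfphfdlshgsaqvkghgk\n"
    "kvadaltnavahvddmpnalsalsdlhahklrvdpvnfkllshcllvtlaahlpaeftpa\n"
    "vhasldkflasvstvltskyr\n"
)
records = list(read_fasta(example))
```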
> I need the info of the SCOP classification (the "a.1.1.2" part) and the
> sequence in order to be able to cluster the sequences with the same
> superfamily.
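An ASTRAL header packs several fields into the Fasta title: the domain identifier, the SCOP classification string (sccs), the chain region, and a free-text description. The sccs is written class.fold.superfamily.family, so the superfamily is just its first three dot-separated fields ("a.1.1" for "a.1.1.2"). A small sketch follows; `AstralHeader` and `parse_astral_header` are hypothetical helpers, and the four-field layout is assumed from the single example in your message rather than taken from the ASTRAL documentation.

```python
from typing import NamedTuple


class AstralHeader(NamedTuple):
    sid: str          # ASTRAL domain identifier, e.g. "d1a3oa_"
    sccs: str         # SCOP classification string, e.g. "a.1.1.2"
    region: str       # chain/residue region, e.g. "(A:)"
    description: str  # free-text protein name and species

    @property
    def superfamily(self) -> str:
        # sccs is class.fold.superfamily.family; keep the first three fields.
        return ".".join(self.sccs.split(".")[:3])


def parse_astral_header(title: str) -> AstralHeader:
    """Split a Fasta title (without '>') into its ASTRAL fields.

    Assumes four whitespace-separated fields, where only the last one
    (the description) may itself contain spaces.
    """
    sid, sccs, region, description = title.split(None, 3)
    return AstralHeader(sid, sccs, region, description)
```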
This should be pretty easy to do. All you'd need to do is something
like:
from Bio import Fasta

# set up the file
handle = open("astral-scopdom-seqres-all-1.59.fa", "r")
parser = Fasta.RecordParser()
iterator = Fasta.Iterator(handle, parser)

# work through all of the records in the file
while 1:
    record = iterator.next()
    if not record: # when we're out of records, stop
        break
    # split the information you want out of the title
    title_parts = record.title.split(" ")
    scop_class = title_parts[1]
    # do whatever you want for sorting them. For instance, if you wanted
    # to write the Fasta records to separate files based on their
    # classifications, then you could do:
    output_handle = open(scop_class + ".fasta", "a")
    output_handle.write(str(record) + "\n")
    output_handle.close()

handle.close()
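The loop above files each record under its full classification string such as "a.1.1.2", which is the family level. Since you want to cluster by superfamily, one variation is to drop the last field before grouping. Here is a minimal in-memory sketch, assuming the records arrive as (title, sequence) pairs whose titles lack the leading '>'; `group_by_superfamily` is a hypothetical name, not a Biopython function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_superfamily(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (title, sequence) pairs by SCOP superfamily.

    Assumes ASTRAL-style titles where the second whitespace-separated
    field is the sccs string (class.fold.superfamily.family).
    """
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for title, sequence in records:
        sccs = title.split()[1]
        # Keep class.fold.superfamily, dropping the family number.
        superfamily = ".".join(sccs.split(".")[:3])
        groups[superfamily].append((title, sequence))
    return dict(groups)
```

Collecting everything in a dict avoids reopening an output file for every record, at the cost of holding all the sequences in memory at once.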
Hopefully this makes some sense and answers your question. Good luck!
Brad