[BioPython] Looking for functions

Brad Chapman chapmanb at uga.edu
Wed May 5 07:58:58 EDT 2004


Hi Myriam;

> >- convert a swissprot file to a fasta file,
> >I try this one :
> >
> >from Bio.SeqIO import FASTA
> >from Bio.SwissProt import SProt
> >from sys import *
> >
> >def convert_sp_fasta(infile,outfile):
> >   """
> >   convert a SwissProt file into a Fasta formatted file
> >   """
> >   in_h = open(infile)
> >   sp = SProt.Iterator(in_h, SProt.SequenceParser())
> >   out_h = FASTA.FastaWriter(outfile)
> >   sequence = sp.next()
> >   out_h.write(sequence)
> >   in_h.close()
> >   out_h.close()

The code is fine as far as the use of Biopython goes. The only
problem is that FastaWriter expects an open file handle rather than
a file name, hence the error:

AttributeError: 'str' object has no attribute 'write'

when the code tries to write to the name of the file (outfile)
instead of an open handle. You can fix this by changing:

out_h = FASTA.FastaWriter(outfile)

to:

out_h = FASTA.FastaWriter(open(outfile, "w"))
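
To see where the error comes from, here is a small sketch. TinyWriter
is a made-up stand-in, not the real FastaWriter; I am only assuming
that the writer calls .write() on whatever object it was given:

```python
import io


class TinyWriter:
    """Hypothetical stand-in for a writer that expects an open handle."""

    def __init__(self, handle):
        self.handle = handle

    def write(self, title, sequence):
        # This fails when self.handle is a str, because str has no .write()
        self.handle.write(">%s\n%s\n" % (title, sequence))


def try_write(target):
    """Return the text written, or the error message if writing fails."""
    try:
        TinyWriter(target).write("demo", "GAATTC")
    except AttributeError as err:
        return "AttributeError: %s" % err
    return target.getvalue()


bad = try_write("out.fasta")     # a file name, not a handle
good = try_write(io.StringIO())  # an in-memory handle works
```

Passing the file name reproduces the AttributeError, while passing
anything with a .write() method (an open file, a StringIO) goes through.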

Converting files is also covered in the documentation on the
Biopython website:

http://www.biopython.org/docs/cookbook/genbank_to_fasta.html

Here is some equivalent code that would do your job using the system
described there:

from Bio import formats
from Bio.FormatIO import FormatIO
import sys

def convert_sp_fasta(infile, outfile):
   """
   convert a SwissProt file into a Fasta formatted file
   """
   in_h = open(infile)
   out_h = open(outfile, "w")
   formatter = FormatIO("SeqRecord", formats["swissprot"],
                        formats["fasta"])
   formatter.convert(in_h, out_h)
   in_h.close()
   out_h.close()

if __name__ == "__main__":
    convert_sp_fasta(sys.argv[1], sys.argv[2])

> >- find EcoRI restriction sites in a fasta sequence.

In the simplest case, you can just use the find method of Python
strings to locate occurrences of "GAATTC" in the sequence:

>>> # seq is assumed to hold the sequence as a plain Python string
>>> cur_pos = -1
>>> all_sites = []
>>> while True:
...     pos = seq.find("GAATTC", cur_pos + 1)
...     if pos == -1:
...         break
...     cur_pos = pos
...     all_sites.append(pos)
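
If you want this as something reusable, the same loop can be wrapped
in a function. This is only a sketch: the name find_sites and the
example sequence are made up for illustration, and it assumes the
sequence is a plain string (it is upper-cased before searching). Since
GAATTC reads the same on both strands, searching one strand is enough
for EcoRI:

```python
def find_sites(seq, site="GAATTC"):
    """Return the 0-based start positions of every occurrence of site."""
    seq = seq.upper()
    site = site.upper()
    positions = []
    pos = seq.find(site)
    while pos != -1:
        positions.append(pos)
        # Restart one base further on so overlapping hits are not missed
        pos = seq.find(site, pos + 1)
    return positions


example = "AAGAATTCGGGAATTCTT"  # made-up sequence with two EcoRI sites
print(find_sites(example))  # [2, 10]
```

The positions are where the recognition site starts, counting from 0,
not where the enzyme actually cuts.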

Hope this helps!
Brad

