[BioPython] error with Fasta.Record?

Brad Chapman chapmanb at uga.edu
Fri Apr 2 11:20:52 EST 2004


Hi Karin;

> I use the following code to read in a fasta file:
[...]
> I do this with a test file:
> 
> adenine:18:38> cat /med/adenine/u2/projects/locator/gard/testfile
> >1_dapB_to_carA_29196_29650
> gtctataagtgccaaaaattacatgttttgtcttctgtttttgttgttttaatgtaaatt
> ttgaccatttggtccacttttttctgctcgtttttatttcatgcaatc
[...]
> And the files I get look like this:
> 
> adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10001
> >1_dapB_to_carA_29196_29650
> GTCTATAAGTGCCAAAAATTACATGTTTTGTCTTCTGTTTTTGTTGTTTTAATGTAAATT

Unfortunately, I'm not able to reproduce this error. I've attached a
test script which uses the quick_FASTA_reader and works with the
f002 file from Tests/Fasta (so you can check it yourself on the same
file and make sure everything works on your platform). If you run
this script on your test file, do you see the same problem?

Without knowing more, I have a couple of guesses about the problem:

1. There is some kind of newline problem. The quick_FASTA_reader is
a pretty simple implementation, which probably won't work properly if
fed a file with mixed newline conventions (or with newlines that
differ from those of the platform it is being run on). The best
solution here is to use the full Fasta.RecordParser() for parsing.

2. Something in your code is modifying the sequences. It seems like
you have at least a bit of other code in there which does things with
the entries. Perhaps they are being modified somewhere there.
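To illustrate guess 1, here is a minimal Python 3 sketch. It is not Biopython code, and the function names `newline_styles` and `read_fasta_tolerant` are hypothetical. It counts which newline conventions occur in a file and reads FASTA records whether lines end in `\n`, `\r\n` or a lone `\r`. The file should be opened in binary mode (`open(path, "rb")`), because Python 3's text mode already translates newlines and would hide the problem.

```python
import re
from collections import Counter


def newline_styles(data: bytes) -> Counter:
    """Count each newline convention (CRLF, lone CR, lone LF) in raw bytes."""
    # CRLF is listed first so it is not also counted as a lone CR and a lone LF.
    return Counter(m.group() for m in re.finditer(rb"\r\n|\r|\n", data))


def read_fasta_tolerant(data: bytes):
    """Yield (title, sequence) pairs from FASTA bytes with any mix of newlines.

    Hypothetical helper for illustration; not Biopython's quick_FASTA_reader.
    """
    title, chunks = None, []
    # bytes.splitlines() breaks on b"\n", b"\r\n" and b"\r" alike.
    for raw in data.splitlines():
        line = raw.decode("ascii").strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if title is not None:
        yield title, "".join(chunks)


if __name__ == "__main__":
    # Made-up records mixing CRLF, LF and CR line endings.
    mixed = b">seq1\r\ngtct\r\nataa\n>seq2\rttga\rccat\r"
    print(dict(newline_styles(mixed)))
    print(list(read_fasta_tolerant(mixed)))
    # [('seq1', 'gtctataa'), ('seq2', 'ttgaccat')]
```

If `newline_styles` reports more than one key for a file, that file mixes conventions, which is the situation a simple reader is most likely to mishandle.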

Just guesses though. I'd like to fix the problem but need to distill
this down to a test case so that I can reproduce it. Hopefully my
attached test code helps do this.
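One way to distill the problem into a test case is to compare the file going in with the file coming out, record by record. The sketch below is hypothetical Python 3 (`parse_fasta` and `diff_fasta` are not Biopython functions). It reports whether a sequence went missing, changed case, got shorter, or differs in some other way. It keys records by title, so it assumes titles are unique.

```python
def parse_fasta(text: str) -> dict:
    """Return {title: sequence} from FASTA text.

    Minimal hypothetical helper: a repeated title overwrites the earlier record.
    """
    records, title = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            title = line[1:]
            records[title] = ""
        elif line and title is not None:
            records[title] += line
    return records


def diff_fasta(input_text: str, output_text: str) -> dict:
    """Describe how each input record differs in the output (empty dict = no changes)."""
    before, after = parse_fasta(input_text), parse_fasta(output_text)
    report = {}
    for title, seq in before.items():
        if title not in after:
            report[title] = "missing"
            continue
        out = after[title]
        if out == seq:
            continue  # identical, nothing to report
        if out.upper() == seq.upper():
            report[title] = "case changed"
        elif seq.upper().startswith(out.upper()):
            report[title] = f"truncated ({len(out)} of {len(seq)} residues)"
        else:
            report[title] = "sequence differs"
    return report


if __name__ == "__main__":
    # Made-up example: record "a" comes back upper-cased, "b" comes back short.
    original = ">a\nacgt\nacgt\n>b\nttttgg\n"
    written = ">a\nACGTACGT\n>b\nTTTT\n"
    print(diff_fasta(original, written))
    # {'a': 'case changed', 'b': 'truncated (4 of 6 residues)'}
```

When an output file holds only a single record, the remaining input records will show up as "missing"; those entries can be ignored.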

Thanks for the report and for checking into this.
Brad
-------------- next part --------------
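from Bio.SeqUtils import quick_FASTA_reader
from Bio import Fasta

# Input: the f002 file from the Tests/Fasta directory
fasta_file = "f002"

# Every record is written back out here so it can be compared with the input
outfile = "test-writing.fasta"
outhandle = open(outfile, "w")

# quick_FASTA_reader gives back (title, sequence) pairs
entries = quick_FASTA_reader(fasta_file)

for name, seq in entries:
    # Wrap each pair in a Fasta.Record so str(rec) produces FASTA text
    rec = Fasta.Record()
    rec.title = name
    rec.sequence = seq
    print rec
    outhandle.write(str(rec) + "\n")

outhandle.close()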

