[BioPython] FASTA parsing errors

Brad Chapman chapmanb at uga.edu
Tue Aug 3 18:26:36 EDT 2004


Hi Aaron;

Aaron:
> > This is the file that is being read. I know it worked in 1.24 just fine  
> > but maybe something changed in the versions that make it not like this  
> > format
> >
> > LOCUS       XM_414447               2107 bp    mRNA    linear   VRT  
[....]

Jon:
> I don't think that file conforms to the fasta format:
> see http://ngfnblast.gbf.de/docs/fasta.html
> I could be wrong though.

Right. That's a GenBank file, which is why the Fasta parser is
choking on it (the error message should be a lot nicer, for sure).
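A quick way to catch this before handing a file to a parser is to look at its first non-blank line. The sketch below is a rough heuristic of my own, not anything in Biopython: it assumes FASTA records start with '>' and GenBank flat files start with a 'LOCUS' line, as your file does.

```python
def sniff_sequence_format(text):
    """Guess whether text looks like FASTA or GenBank from its first non-blank line.

    Rough heuristic: FASTA records start with '>', GenBank flat files
    start with a 'LOCUS' line. Anything else is reported as 'unknown'.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("LOCUS"):
            return "genbank"
        return "unknown"
    return "unknown"  # empty or all-blank input


sample = "LOCUS       XM_414447               2107 bp    mRNA    linear   VRT\n"
print(sniff_sequence_format(sample))  # genbank
```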

You have two options:

1. Use a GenBank parser.

2. Retrieve Fasta sequences. Going from the code you posted
previously, you could retrieve your sequences in FASTA format with
the following:

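from Bio import GenBank
from Bio import Fasta

# Ask NCBI for FASTA explicitly (the second argument) and parse
# the result with the Fasta record parser.
ncbi_dict = GenBank.NCBIDictionary("nucleotide", "fasta",
                                   Fasta.RecordParser())

seqrecord = ncbi_dict["6273291"]

# data_path_prefix and file_unique_id come from your earlier code.
fasta_file = open(data_path_prefix + file_unique_id + 'fasta', 'w')
# The parser returns a Record object, so convert it to FASTA text
# before concatenating it with a string.
fasta_file.write(str(seqrecord) + "\n")
fasta_file.close()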
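For option 1, the usual route is Biopython's GenBank parser; its exact interface depends on the release you have, so rather than guess at it, here is a minimal hand-rolled sketch (not Biopython code) for the common case where you just want FASTA out of a GenBank file you already have. It assumes a simple single-record file and only reads the locus name from the LOCUS line and the bases between ORIGIN and '//'; features and multi-record files are ignored.

```python
def genbank_to_fasta(text, width=60):
    """Convert a single-record GenBank flat file into FASTA text.

    Minimal sketch: takes the record name from the LOCUS line and the
    sequence from the lines between ORIGIN and '//'. Features,
    multi-record files and malformed input are not handled.
    """
    name = None
    seq_chunks = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # The second whitespace-separated field is the locus name.
            name = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of the record
        elif in_origin:
            # Sequence lines look like '        1 acgtacgtac gt':
            # drop the leading position number and keep the bases.
            seq_chunks.extend(line.split()[1:])
    if name is None:
        raise ValueError("no LOCUS line found; is this really GenBank?")
    # Uppercasing is a stylistic choice; GenBank stores bases in lowercase.
    seq = "".join(seq_chunks).upper()
    lines = [">" + name]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


record = (
    "LOCUS       TEST1                     12 bp    DNA     linear   SYN\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)
print(genbank_to_fasta(record))
```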

This probably changed with the most recent release: the default
format for GenBank retrieval used to be fasta. Because of changes
at NCBI this had to be updated, and I believe it now defaults to
GenBank. So, if you didn't specify "fasta" as the second argument,
that's probably why you are now getting GenBank data. Hopefully this
small change in your code will fix everything.
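The general lesson is to name the format you want rather than rely on whatever the default happens to be. As an illustration only, here is a sketch that builds a request URL for NCBI's E-utilities efetch service with the return type spelled out; the endpoint and the db/id/rettype/retmode parameters are efetch's, but this is not necessarily how NCBIDictionary builds its requests, and the helper name efetch_url is hypothetical. It only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed: the current NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(seq_id, rettype="fasta", db="nucleotide"):
    """Build an efetch URL that names the return format explicitly.

    Hypothetical helper: passing rettype every time means a change in
    the server-side default cannot silently switch the format.
    """
    params = {"db": db, "id": seq_id, "rettype": rettype, "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


print(efetch_url("6273291"))
```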

Hope this helps.
Brad

