[BioPython] FASTA parsing errors
Brad Chapman
chapmanb at uga.edu
Tue Aug 3 18:26:36 EDT 2004
Hi Aaron;
Aaron:
> > This is the file that is being read. I know it worked in 1.24 just fine
> > but maybe something changed in the versions that make it not like this
> > format
> >
> > LOCUS XM_414447 2107 bp mRNA linear VRT
[....]
Jon:
> I don't think that file conforms to the fasta format:
> see http://ngfnblast.gbf.de/docs/fasta.html
> I could be wrong though.
Right. That's a GenBank file, which is why the Fasta parser is
choking on it (the error message should be a lot nicer, for sure).
You have two options:
1. Use a GenBank parser.
2. Retrieve Fasta sequences. Going from the code you posted
previously, you could retrieve your search in FASTA format with the
following:
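from Bio import GenBank
from Bio import Fasta

# Ask NCBI for FASTA explicitly (the second argument) and parse each record
ncbi_dict = GenBank.NCBIDictionary("nucleotide", "fasta",
                                   Fasta.RecordParser())
seqrecord = ncbi_dict["6273291"]
# data_path_prefix and file_unique_id are the variables from your earlier code
genbank_file = open(data_path_prefix + file_unique_id + 'fasta', 'w')
# seqrecord is a parsed record object, so convert it to text before writing
genbank_file.write(str(seqrecord) + "\n")
genbank_file.close()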
This may have changed with the most recent release because the
default for GenBank retrieval used to be fasta. Because of changes
at NCBI this had to be updated, and I believe now defaults to
GenBank. So, if you didn't specify "fasta" as the second argument,
that's probably why you are now getting GenBank data. Hopefully this
small change in your code will fix everything.
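If you want to check which format you actually got back before handing
the data to a parser, a rough sketch along these lines may help. It
only looks at the first non-blank line: FASTA records start with ">",
and GenBank flat files start with "LOCUS". The function name
sniff_format is made up for illustration and is not part of Biopython.

```python
def sniff_format(text):
    """Guess 'fasta', 'genbank' or 'unknown' from the first non-blank line.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"    # FASTA title lines begin with '>'
        if stripped.startswith("LOCUS"):
            return "genbank"  # GenBank flat files begin with a LOCUS line
        return "unknown"      # first real line matches neither format
    return "unknown"          # empty input
```

Calling sniff_format on the text of the file you posted would report
"genbank", which would tell you to switch parsers or to re-fetch the
record as FASTA.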
Hope this helps.
Brad