[BioPython] FASTA parsing errors
Aaron Zschau
aaron at ocelot-atroxen.dyndns.org
Tue Aug 3 19:15:19 EDT 2004
I still seem to have a problem with the results I'm getting. I need a
protein sequence in order to do a BLAST search with the data from my
GenBank lookup, but the FASTA file created now contains only the
nucleotide sequence. I tried the following line:
ncbi_dict = GenBank.NCBIDictionary("protein", "fasta",
Fasta.RecordParser())
thinking that changing "nucleotide" to "protein" in your
original recommendation might help, but I still get the following
result, which is not a protein sequence:
>gi|6273291|gb|AF191665.1|AF191665 Opuntia marenae rpl16 gene;
chloroplast gene for chloroplast product, partial intron sequence
TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAAT
CTAAATGATATAGGATTCCACTATGTAAGGTCTTTGAATCATATCATAAAAGACAATGTA
ATAAAGCATGAATACAGATTCACACATAATTATCTGATATGAATCTATTCATAGAAAAAA
GAAAAAAGTAAGAGCCTCCGGCCAATAAAGACTAAGAGGGTTGGCTCAAGAACAAAGTTC
ATTAAGAGCTCCATTGTAGAATTCAGACCTAATCATTAATCAAGAAGCGATGGGAACGAT
GTAATCCATGAATACAGAAGATTCAATTGAAAAAGATCCTATGNTCATTGGAAGGATGGC
GGAACGAACCAGAGACCAATTCATCTATTCTGAAAAGTGATAAACTAATCCTATAAAACT
AAAATAGATATTGAAAGAGTAAATATTCGCCCGCGAAAATTCCTTTTTTATTAAATTGCT
CATATTTTCTTTTAGCAATGCAATCTAATAAAATATATCTATACAAAAAAACATAGACAA
ACTATATATATATATATATATAATATATTTCAAATTCCCTTATATATCCAAATATAAAAA
TATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGTATTATTAAATGTAT
ATATTAATTCAATATTATTATTCTATTCATTTTTATTCATTTTCAAATTTATAATATATT
AATCTATATATTAATTTAGAATTCTATTCTAATTCGAATTCAATTTTTAAATATTCATAT
TCAATTAAAATTGAAATTTTTTCATTCGCGAGGAGCCGGATGAGAAGAAACTCTCATGTC
CGGTTCTGTAGTAGAGATGGAATTAAGAAAAAACCATCAACTATAACCCCAAAAGAACCA
GA
thanks,
Aaron Zschau
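
A quick way to confirm that a retrieved FASTA record is nucleotide rather
than protein is to look at the letters it contains. The sketch below is
only a heuristic and uses plain Python, not any BioPython call; the
function name looks_like_nucleotide and the 0.9 threshold are assumptions
made for illustration.

```python
def looks_like_nucleotide(sequence, threshold=0.9):
    """Heuristic check: True if most letters are nucleotide codes.

    Hypothetical helper, not part of BioPython. A short protein made up
    only of A, C, G, T or N residues would be misclassified.
    """
    # Ignore whitespace, digits and gap characters; compare letters only.
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        return False
    nucleotide_count = sum(1 for ch in letters if ch in "ACGTUN")
    return nucleotide_count / len(letters) >= threshold


# First sequence line of the record quoted above.
first_line = "TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAAT"
print(looks_like_nucleotide(first_line))
```

A check like this only reports what was returned; it does not turn a
nucleotide record into a protein one.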
On Aug 3, 2004, at 6:26 PM, Brad Chapman wrote:
> Hi Aaron;
>
> Aaron:
>>> This is the file that is being read. I know it worked in 1.24 just
>>> fine
>>> but maybe something changed in the versions that make it not like
>>> this
>>> format
>>>
>>> LOCUS XM_414447 2107 bp mRNA linear VRT
> [....]
>
> Jon:
>> I don't think that file conforms to the fasta format:
>> see http://ngfnblast.gbf.de/docs/fasta.html
>> I could be wrong though.
>
> Right. That's a GenBank file, which is why the Fasta parser is
> choking on it (the error message should be a lot nicer, for sure).
>
> You have two options:
>
> 1. Use a GenBank parser.
>
> 2. Retrieve Fasta sequences. Going from the code you posted
> previously, you could retrieve your search in FASTA format with the
> following:
>
> from Bio import GenBank
> from Bio import Fasta
>
> ncbi_dict = GenBank.NCBIDictionary("nucleotide", "fasta",
> Fasta.RecordParser())
>
> seqrecord = ncbi_dict["6273291"]
>
> genbank_file = open(data_path_prefix + file_unique_id + 'fasta',
> 'w')
> genbank_file.write(str(seqrecord) + "\n")
> genbank_file.close()
>
> This may have changed with the most recent release because the
> default for GenBank retrieval used to be fasta. Because of changes
> at NCBI this had to be updated, and I believe now defaults to
> GenBank. So, if you didn't specify "fasta" as the second argument,
> that's probably why you are now getting GenBank data. Hopefully this
> small change in your code will fix everything.
>
> Hope this helps.
> Brad
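
The point made in the quoted reply, that a GenBank file was being handed
to a FASTA parser, can be checked before parsing by looking at the first
non-blank line: FASTA records start with ">", and GenBank flat files start
with "LOCUS". The following is a minimal sketch in plain Python; the
function name sniff_sequence_format is hypothetical and is not a BioPython
call.

```python
def sniff_sequence_format(text):
    """Guess the format from the first non-blank line.

    Hypothetical helper for illustration. Returns "fasta", "genbank",
    or "unknown".
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("LOCUS"):
            return "genbank"
        return "unknown"
    return "unknown"


# Header lines taken from the two records quoted in this thread.
print(sniff_sequence_format(">gi|6273291|gb|AF191665.1|AF191665 Opuntia marenae rpl16 gene"))
print(sniff_sequence_format("LOCUS XM_414447 2107 bp mRNA linear VRT"))
```

Running a check like this on the saved file before choosing a parser would
give a clearer message than the parser's own error.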