[BioPython] FASTA parsing errors
Jonathan Taylor
jonathan.taylor at utoronto.ca
Tue Aug 3 17:01:50 EDT 2004
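Hi,
I don't think that file is in FASTA format. It starts with a LOCUS
line rather than a ">" header line, so it looks like a GenBank flat-file
record. For a description of FASTA, see
http://ngfnblast.gbf.de/docs/fasta.html
I could be wrong, though.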
Jon Taylor.
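
A quick check along these lines would tell the two formats apart before the file goes to the parser. This is only a sketch: `sniff_format` is a made-up name, not anything in Biopython, and it assumes that the first non-blank line of a FASTA file starts with `>` and that of a GenBank flat file starts with `LOCUS`.

```python
def sniff_format(text):
    """Guess a sequence file's format from its first non-blank line.

    Hypothetical helper for illustration; not part of Biopython.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("LOCUS"):
            return "genbank"
        return "unknown"  # first real line matches neither format
    return "unknown"  # empty or all-blank input


# Made-up FASTA text, and the first lines of the file quoted below.
fasta_text = ">demo_id demo description\natggcgccgc\n"
genbank_text = "\n\nLOCUS XM_414447 2107 bp mRNA linear VRT\n"

print(sniff_format(fasta_text))
print(sniff_format(genbank_text))
```

Only the first non-blank line is inspected, so this says nothing about whether the rest of the file is well formed.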
On Tue, 2004-08-03 at 16:48, Aaron Zschau wrote:
> This is the file that is being read. I know it worked just fine in
> 1.24, but maybe something changed between versions that makes the
> parser reject this format.
>
>
> thanks,
>
> Aaron Zschau
>
> ------a12345.fasta----------
>
>
> LOCUS XM_414447 2107 bp mRNA linear VRT 28-JUL-2004
> DEFINITION PREDICTED: Gallus gallus similar to von Hippel-Lindau protein
> (LOC416117), mRNA.
> ACCESSION XM_414447
> VERSION XM_414447.1 GI:50754623
> KEYWORDS .
> SOURCE Gallus gallus (red jungle fowl)
> ORGANISM Gallus gallus
> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> Archosauria; Aves; Neognathae; Galliformes; Phasianidae;
> Phasianinae; Gallus.
> COMMENT MODEL REFSEQ: This record is predicted by automated computational
> analysis. This record is derived from an annotated genomic sequence
> (NW_060494) using gene prediction method: GNOMON, supported by EST
> evidence.
> Also see:
> Documentation of NCBI's Annotation Process
>
> FEATURES Location/Qualifiers
> source 1..2107
> /organism="Gallus gallus"
> /mol_type="mRNA"
> /strain="inbred line UCD001"
> /isolate="#256"
> /db_xref="taxon:9031"
> /chromosome="12"
> /sex="female"
> /note="inbred line derived from a wild population of red
> jungle fowl in Malaysia in the late 1930s, with the
> possible introgression of a limited amount of White
> Leghorn genome during its captive breeding history
> common: red jungle fowl"
> gene 1..2107
> /gene="LOC416117"
> /note="Derived by automated computational analysis using
> gene prediction method: GNOMON."
> /db_xref="GeneID:416117"
> /db_xref="InterimID:416117"
> CDS 1..486
> /gene="LOC416117"
> /codon_start=1
> /product="similar to von Hippel-Lindau protein"
> /protein_id="XP_414447.1"
> /db_xref="GI:50754624"
> /db_xref="GeneID:416117"
> /db_xref="InterimID:416117"
> /translation="MAPPGPGPAGPCLRSANTRELSEVVFNNRSPRAVLPIWVDFEGR
> PRYYPVLRPRTGRIMHSYRGHLWLFRDAGTHDGLLVNRQELFVAAPDVNKADITLPVF
> TLKERCLQVVRSLVRPGDYRKLDIVRSLYEELEDHPDVKKDLQRLSMERSKTLQEEIL
> H"
> misc_feature 37..453
> /gene="LOC416117"
> /note="VHL; Region: von Hippel-Lindau disease tumour
> suppressor protein. VHL forms a ternary complex with the
> elonginB and elonginC proteins. This complex binds Cul2,
> which then is involved in regulation of vascular
> endothelial growth factor mRNA"
> /db_xref="CDD:pfam01847"
> ORIGIN
> 1 atggcgccgc cgggtccggg tcccgccggg ccgtgcctgc gctccgccaa cacgcgcgaa
> 61 ctctccgagg tcgtcttcaa caaccgcagc ccgcgcgccg tgctccccat ctgggtggac
> 121 ttcgagggcc ggccgcgcta ctaccccgtg ctgcggccgc gcaccgggcg gatcatgcac
> 181 agctaccgcg ggcacctgtg gctgttccgc gacgcgggca cgcacgacgg gctgctcgtc
> 241 aaccggcagg agctgttcgt ggccgcgccg gacgtcaaca aggccgacat cacgctgcca
> 301 gtgttcacgc tgaaggagcg gtgcctgcag gtggtgcgca gcctggtccg gccgggggac
> 361 taccggaagc tggacatcgt gcgctcgctg tacgaggagc tggaggacca ccccgacgtc
> 421 aagaaggacc tgcagcggct ctccatggag aggagcaaaa cgttacagga ggaaatcctc
> 481 cactaacagg gctgtgcgtc ccgagccgtg tagatagcaa agcaccgagc ttaggagggg
> 541 cagctgccgt gcagcgtgcc gggagctaac gtctgcatcg acgttctgga acgaactcag
> 601 tcatgctgta gaacatttgc tatgctggta ggtcagattc caaagagcaa acagtgtgca
> 661 ggaacgtact gctttgtgag ggctctgctc ccggtctcat gcactggtga gcagtgaccc
> 721 cagtggcctg gcacagacgg ggctcagaga agcttgcttc cgactgtttc agaacattcc
> 781 atagtaacac aagatttatc cgtctggagg aaatacatgc agctcagctt cctctgagtt
> 841 agaaagaaaa ctacatcaag ggttcactta atccagacta taaaatcagt ggcagagcag
> 901 caccaggttt gcttgaatga tttggttttg gcagaaattc gctctcacat gctaaattta
> 961 cttttgaatc acaaagcgtg gagcgtgttc atgtgagagc ttccacggtt gccttctgag
> 1021 ggctcggccc aaaacttctg tgctggcgga aagatgtccg taagcatttc tgtgttagcc
> 1081 tctgtctgtg cgttcataaa ccctcattgt agcaactctg aagctgacaa attcttacac
> 1141 agaacatgcc ttgaatgcct taatttgtct ttcattcctg aattcctgct tagtttatct
> 1201 ctagatgatg gaaccttgtc agccatatgg actgcatctt ggttttagga cccctttctg
> 1261 ctttgcacct ctgtgcccac accctcagct cccatagtgg tataccaagg gagcgttccc
> 1321 agaaggtggg tgctctgagc ctcatctttc ccttgtccca gggattggcc ttggggagca
> 1381 cagtccgccc aggccgctgg tgccccctga ggcacagaag ctgccccagc tgcaggcgtg
> 1441 gctcccccaa gcagagctgt gcttttcagc aggccagctg cacagagaga aatcatagaa
> 1501 tcacagaatc atacaatggc ctgggctgaa aaggaccaca atgcccatcc agttccaacc
> 1561 ccctgctatg tgcagggtca ccaaccagca gaccaggctg cccagagcca catccagcct
> 1621 ggccttgaat gcctccaggg atggggcctc cttgggcgac ctgttccaat gcatcaacac
> 1681 cctccaagtg aaaaacttcc tcctgatata cctgaacatc ccctgtctta tttaagatca
> 1741 ttcccccttg tcctgtcact atccaccctc gtgaacagct gttccccttc ctgtttatat
> 1801 gcttcctaaa atcaagaaag gttctaggcc tatatgttct cttcccccat acatcaaata
> 1861 cacaggtgtg tgtctgtatg tctctgtgca taactcaaag cagcgttgtt tttagcagat
> 1921 aggtgaattg ttccccaagt tgcaggcagg cgcagtgctg ctcagcatgc agagcagcag
> 1981 gttgctaaca gatagcagca ggctgttctg tggtgtaagg ttcttaagta tgcaatgtgt
> 2041 gcccttctcg tggacttttt ttttcttaaa tgtttgtgta tgaactgatc tttgtttctc
> 2101 ataaaaa
> //
>
>
> ------end file----------
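
Since the goal was a protein string to feed into BLAST, one way around the problem might be to pull the `/translation` qualifier out of the GenBank text and write it out as a FASTA record by hand. A rough sketch, assuming the `> ` quote prefixes have been stripped and the record has a single CDS; `translation_to_fasta` is a made-up name, not a Biopython call:

```python
import re


def translation_to_fasta(genbank_text, width=60):
    """Return the first /translation in a GenBank record as FASTA text.

    Hypothetical helper for illustration; returns None when the record
    has no /translation qualifier.
    """
    match = re.search(r'/translation="([^"]*)"', genbank_text)
    if match is None:
        return None
    # The qualifier value wraps over several lines; drop all whitespace.
    protein = re.sub(r"\s+", "", match.group(1))
    # Use the protein_id as the FASTA title when one is present.
    id_match = re.search(r'/protein_id="([^"]+)"', genbank_text)
    title = id_match.group(1) if id_match else "unknown"
    # Split the sequence into fixed-width lines.
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    return ">%s\n%s\n" % (title, "\n".join(lines))


# CDS excerpt taken from the record quoted above.
record = '''CDS 1..486
/gene="LOC416117"
/protein_id="XP_414447.1"
/translation="MAPPGPGPAGPCLRSANTRELSEVVFNNRSPRAVLPIWVDFEGR
PRYYPVLRPRTGRIMHSYRGHLWLFRDAGTHDGLLVNRQELFVAAPDVNKADITLPVF
TLKERCLQVVRSLVRPGDYRKLDIVRSLYEELEDHPDVKKDLQRLSMERSKTLQEEIL
H"
'''

print(translation_to_fasta(record))
```

The regular expressions only look for the first matching qualifier, so a record with several CDS features would need a real GenBank parser instead.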
> On Aug 3, 2004, at 4:23 PM, Jeffrey Chang wrote:
>
> > Hi Aaron,
> >
> > Can you send the file that is generating the error? I believe it is
> > called /var/www/html/data/a12345.fasta. In general, the fasta parser
> > should be well-tested. It works on a test file in fasta format that I
> > have here. It would help most if someone could look at your file to
> > see what's going on.
> >
> > Thanks,
> > Jeff
> >
> >
> > On Aug 3, 2004, at 3:42 PM, Aaron Zschau wrote:
> >
> >> I've sent a couple of messages to the list about this, but I'm not
> >> sure they're going through, as I haven't seen any replies. I am
> >> trying to get a section of my code, based on the cookbook
> >> tutorials, working again; it worked before the 1.30 revision of
> >> Biopython. My code looks up a gene by name in GenBank and saves the
> >> FASTA version of that data so that the protein string can be fed
> >> into a BLAST search. The lookup works fine and a FASTA file is
> >> saved, but I then get an error at the parse stage, at character 0
> >> of the file.
> >>
> >> Any help would be greatly appreciated.
> >>
> >> thanks
> >>
> >> Aaron Zschau
> >>
> >>
> >>
> >>
> >>
> >>
> >> import sys
> >> from Bio import Fasta
> >>
> >> # file_for_blast = open(data_path_prefix + file_unique_id + 'fasta', 'r')
> >> file_for_blast = open('/var/www/html/data/a12345.fasta', 'r')
> >>
> >> # Wrap the open file in the Bio.Fasta record iterator.
> >> f_iterator = Fasta.Iterator(file_for_blast)
> >> print "iterator created"
> >> sys.stdout.flush()
> >>
> >> # Fetch the first record; this is the call that fails below.
> >> f_record = f_iterator.next()
> >> print "f_record created"
> >> sys.stdout.flush()
> >>
> >> -----------------------
> >>
> >> iterator created
> >> Traceback (most recent call last):
> >> File "cluster-debug.py", line 119, in ?
> >> f_record = f_iterator.next()
> >> File "/root/biopython-1.30/build/lib.linux-i586-2.2/Bio/Fasta/__init__.py", line 72, in next
> >> result = self._iterator.next()
> >> File "/root/biopython-1.30/build/lib.linux-i586-2.2/Martel/IterParser.py", line 152, in iterateFile
> >> self.header_parser.parseString(rec)
> >> File "/root/biopython-1.30/build/lib.linux-i586-2.2/Martel/Parser.py", line 361, in parseString
> >> self._err_handler.fatalError(ParserIncompleteException(pos))
> >> File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
> >> raise exception
> >> Martel.Parser.ParserIncompleteException: error parsing at or beyond character 0 (unparsed text remains)
> >>
> >> _______________________________________________
> >> BioPython mailing list - BioPython at biopython.org
> >> http://biopython.org/mailman/listinfo/biopython
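
The traceback above reports a failure at character 0, which is what one would expect if the parser wants a `>` header line first and finds something else. The toy reader below is not Martel or `Bio.Fasta`; `read_fasta` is a hypothetical stand-in, written only to show the same kind of failure with a more readable message.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (title, sequence) tuples.

    Hypothetical stand-in for illustration. Raises ValueError when
    sequence data appears before any '>' header line.
    """
    records = []
    title = None
    chunks = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:].strip()
            chunks = []
        elif title is None:
            raise ValueError(
                "line %d: expected '>' header, got %r" % (number, line[:20])
            )
        else:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


print(read_fasta(">seq1 demo\natgg\ncgcc\n"))

try:
    read_fasta("\n\nLOCUS XM_414447 2107 bp mRNA linear VRT\n")
except ValueError as err:
    print("not FASTA:", err)
```

Reporting the line number and the offending text makes it easier to tell a wrong-format file from a parser regression.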