[BioPython] FASTA parsing errors
Aaron Zschau
aaron at ocelot-atroxen.dyndns.org
Tue Aug 3 16:48:36 EDT 2004
This is the file that is being read. I know it worked fine in 1.24, but
maybe something changed between versions so that the parser no longer
accepts this format.
thanks,
Aaron Zschau
------a12345.fasta----------
LOCUS XM_414447 2107 bp mRNA linear VRT 28-JUL-2004
DEFINITION PREDICTED: Gallus gallus similar to von Hippel-Lindau protein
(LOC416117), mRNA.
ACCESSION XM_414447
VERSION XM_414447.1 GI:50754623
KEYWORDS .
SOURCE Gallus gallus (red jungle fowl)
ORGANISM Gallus gallus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Archosauria; Aves; Neognathae; Galliformes; Phasianidae;
Phasianinae; Gallus.
COMMENT MODEL REFSEQ: This record is predicted by automated computational
analysis. This record is derived from an annotated genomic sequence
(NW_060494) using gene prediction method: GNOMON, supported by EST
evidence.
Also see:
Documentation of NCBI's Annotation Process
FEATURES Location/Qualifiers
source 1..2107
/organism="Gallus gallus"
/mol_type="mRNA"
/strain="inbred line UCD001"
/isolate="#256"
/db_xref="taxon:9031"
/chromosome="12"
/sex="female"
/note="inbred line derived from a wild population of red
jungle fowl in Malaysia in the late 1930s, with the
possible introgression of a limited amount of White
Leghorn genome during its captive breeding history
common: red jungle fowl"
gene 1..2107
/gene="LOC416117"
/note="Derived by automated computational analysis using
gene prediction method: GNOMON."
/db_xref="GeneID:416117"
/db_xref="InterimID:416117"
CDS 1..486
/gene="LOC416117"
/codon_start=1
/product="similar to von Hippel-Lindau protein"
/protein_id="XP_414447.1"
/db_xref="GI:50754624"
/db_xref="GeneID:416117"
/db_xref="InterimID:416117"
/translation="MAPPGPGPAGPCLRSANTRELSEVVFNNRSPRAVLPIWVDFEGR
PRYYPVLRPRTGRIMHSYRGHLWLFRDAGTHDGLLVNRQELFVAAPDVNKADITLPVF
TLKERCLQVVRSLVRPGDYRKLDIVRSLYEELEDHPDVKKDLQRLSMERSKTLQEEIL
H"
misc_feature 37..453
/gene="LOC416117"
/note="VHL; Region: von Hippel-Lindau disease tumour
suppressor protein. VHL forms a ternary complex with the
elonginB and elonginC proteins. This complex binds Cul2,
which then is involved in regulation of vascular
endothelial growth factor mRNA"
/db_xref="CDD:pfam01847"
ORIGIN
1 atggcgccgc cgggtccggg tcccgccggg ccgtgcctgc gctccgccaa cacgcgcgaa
61 ctctccgagg tcgtcttcaa caaccgcagc ccgcgcgccg tgctccccat ctgggtggac
121 ttcgagggcc ggccgcgcta ctaccccgtg ctgcggccgc gcaccgggcg gatcatgcac
181 agctaccgcg ggcacctgtg gctgttccgc gacgcgggca cgcacgacgg gctgctcgtc
241 aaccggcagg agctgttcgt ggccgcgccg gacgtcaaca aggccgacat cacgctgcca
301 gtgttcacgc tgaaggagcg gtgcctgcag gtggtgcgca gcctggtccg gccgggggac
361 taccggaagc tggacatcgt gcgctcgctg tacgaggagc tggaggacca ccccgacgtc
421 aagaaggacc tgcagcggct ctccatggag aggagcaaaa cgttacagga ggaaatcctc
481 cactaacagg gctgtgcgtc ccgagccgtg tagatagcaa agcaccgagc ttaggagggg
541 cagctgccgt gcagcgtgcc gggagctaac gtctgcatcg acgttctgga acgaactcag
601 tcatgctgta gaacatttgc tatgctggta ggtcagattc caaagagcaa acagtgtgca
661 ggaacgtact gctttgtgag ggctctgctc ccggtctcat gcactggtga gcagtgaccc
721 cagtggcctg gcacagacgg ggctcagaga agcttgcttc cgactgtttc agaacattcc
781 atagtaacac aagatttatc cgtctggagg aaatacatgc agctcagctt cctctgagtt
841 agaaagaaaa ctacatcaag ggttcactta atccagacta taaaatcagt ggcagagcag
901 caccaggttt gcttgaatga tttggttttg gcagaaattc gctctcacat gctaaattta
961 cttttgaatc acaaagcgtg gagcgtgttc atgtgagagc ttccacggtt gccttctgag
1021 ggctcggccc aaaacttctg tgctggcgga aagatgtccg taagcatttc tgtgttagcc
1081 tctgtctgtg cgttcataaa ccctcattgt agcaactctg aagctgacaa attcttacac
1141 agaacatgcc ttgaatgcct taatttgtct ttcattcctg aattcctgct tagtttatct
1201 ctagatgatg gaaccttgtc agccatatgg actgcatctt ggttttagga cccctttctg
1261 ctttgcacct ctgtgcccac accctcagct cccatagtgg tataccaagg gagcgttccc
1321 agaaggtggg tgctctgagc ctcatctttc ccttgtccca gggattggcc ttggggagca
1381 cagtccgccc aggccgctgg tgccccctga ggcacagaag ctgccccagc tgcaggcgtg
1441 gctcccccaa gcagagctgt gcttttcagc aggccagctg cacagagaga aatcatagaa
1501 tcacagaatc atacaatggc ctgggctgaa aaggaccaca atgcccatcc agttccaacc
1561 ccctgctatg tgcagggtca ccaaccagca gaccaggctg cccagagcca catccagcct
1621 ggccttgaat gcctccaggg atggggcctc cttgggcgac ctgttccaat gcatcaacac
1681 cctccaagtg aaaaacttcc tcctgatata cctgaacatc ccctgtctta tttaagatca
1741 ttcccccttg tcctgtcact atccaccctc gtgaacagct gttccccttc ctgtttatat
1801 gcttcctaaa atcaagaaag gttctaggcc tatatgttct cttcccccat acatcaaata
1861 cacaggtgtg tgtctgtatg tctctgtgca taactcaaag cagcgttgtt tttagcagat
1921 aggtgaattg ttccccaagt tgcaggcagg cgcagtgctg ctcagcatgc agagcagcag
1981 gttgctaaca gatagcagca ggctgttctg tggtgtaagg ttcttaagta tgcaatgtgt
2041 gcccttctcg tggacttttt ttttcttaaa tgtttgtgta tgaactgatc tttgtttctc
2101 ataaaaa
//
------end file----------
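A note on the record above: its first line starts with LOCUS, which is how GenBank flat-file records begin, whereas a FASTA record begins with a ">" title line. That mismatch would be consistent with a FASTA parser giving up at character 0. Below is a minimal sketch for checking this before parsing, in plain Python with no Biopython calls. The function name sniff_format and the labels it returns are hypothetical, chosen only for illustration.

```python
def sniff_format(text):
    """Guess the flat-file format from the first non-blank line.

    Hypothetical helper for illustration; not a Biopython function.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"    # FASTA title lines start with '>'
        if stripped.startswith("LOCUS"):
            return "genbank"  # GenBank records start with a LOCUS line
        return "unknown"
    return "empty"


# Small usage example with made-up snippets of each format.
genbank_head = "LOCUS XM_414447 2107 bp mRNA linear VRT 28-JUL-2004\n"
fasta_head = ">XM_414447.1 example title\natggcgccgc\n"
print(sniff_format(genbank_head))
print(sniff_format(fasta_head))
```

Running a check like this on the saved file before creating the iterator would show whether the download step wrote GenBank text under a .fasta name.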
On Aug 3, 2004, at 4:23 PM, Jeffrey Chang wrote:
> Hi Aaron,
>
> Can you send the file that is generating the error? I believe it is
> called /var/www/html/data/a12345.fasta. In general, the fasta parser
> should be well-tested. It works on a test file in fasta format that I
> have here. It would help most if someone could look at your file to
> see what's going on.
>
> Thanks,
> Jeff
>
>
> On Aug 3, 2004, at 3:42 PM, Aaron Zschau wrote:
>
>> I've sent a couple of messages to the list about this, but I'm not sure
>> if they're going through, as I haven't seen any replies. I am trying
>> to get a section of my code, based on the cookbook tutorials, working
>> again; it worked before the 1.30 revision of biopython. My code looks up
>> a gene by name in GenBank and saves the FASTA version of that data so
>> that the protein string can be fed into a BLAST search. The lookup works
>> fine and a FASTA file is saved, but I then get an error at the parse
>> stage, at character 0 of the file.
>>
>> Any help would be greatly appreciated
>>
>> thanks
>>
>> Aaron Zschau
>>
>> #file_for_blast = open(data_path_prefix + file_unique_id + 'fasta', 'r')
>> file_for_blast = open('/var/www/html/data/a12345.fasta','r')
>>
>> f_iterator = Fasta.Iterator(file_for_blast)
>> print "iterator created"
>> sys.stdout.flush()
>>
>> f_record = f_iterator.next()
>> print "f_record created"
>> sys.stdout.flush()
>>
>> -----------------------
>>
>> iterator created
>> Traceback (most recent call last):
>>   File "cluster-debug.py", line 119, in ?
>>     f_record = f_iterator.next()
>>   File "/root/biopython-1.30/build/lib.linux-i586-2.2/Bio/Fasta/__init__.py", line 72, in next
>>     result = self._iterator.next()
>>   File "/root/biopython-1.30/build/lib.linux-i586-2.2/Martel/IterParser.py", line 152, in iterateFile
>>     self.header_parser.parseString(rec)
>>   File "/root/biopython-1.30/build/lib.linux-i586-2.2/Martel/Parser.py", line 361, in parseString
>>     self._err_handler.fatalError(ParserIncompleteException(pos))
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>>     raise exception
>> Martel.Parser.ParserIncompleteException: error parsing at or beyond character 0 (unparsed text remains)
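An error at character 0 suggests the parser rejected the very first line of the file. If the saved file turns out to be a GenBank flat file rather than FASTA, one possible workaround is to build FASTA text from the record's ORIGIN block before handing it to a FASTA parser. The following is a rough sketch in plain Python, assuming a single record whose sequence lies between an ORIGIN line and a "//" terminator. The names genbank_origin_to_fasta, title and width are hypothetical, and this is not how Biopython itself performs the conversion.

```python
import re


def genbank_origin_to_fasta(text, title, width=60):
    """Build FASTA text from the ORIGIN block of one GenBank record.

    Hypothetical helper for illustration; assumes a single record whose
    sequence lies between an 'ORIGIN' line and a '//' terminator.
    """
    in_origin = False
    pieces = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("ORIGIN"):
            in_origin = True
            continue
        if stripped == "//":
            break
        if in_origin:
            # Drop position numbers and spaces, keep only the letters.
            pieces.append(re.sub(r"[^A-Za-z]", "", stripped))
    sequence = "".join(pieces)
    if not sequence:
        raise ValueError("no ORIGIN sequence found")
    # Re-wrap the sequence into fixed-width rows under a '>' title line.
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + title + "\n" + "\n".join(rows) + "\n"


# Small usage example with a made-up two-line ORIGIN block.
record = "LOCUS X 17 bp\nORIGIN\n1 atggcgccgc\ncgggtcc\n//\n"
print(genbank_origin_to_fasta(record, "X example"))
```

Because every non-letter is stripped, the sketch also tolerates sequence rows that a mail client has wrapped onto a second line without a position number.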
>>
>> _______________________________________________
>> BioPython mailing list - BioPython at biopython.org
>> http://biopython.org/mailman/listinfo/biopython