[BioPython] FASTA parsing errors

Aaron Zschau aaron at ocelot-atroxen.dyndns.org
Tue Aug 3 16:48:36 EDT 2004


This is the file that is being read. I know it worked just fine in 1.24,
but maybe something changed between versions that makes the parser reject
this format.


thanks,

Aaron Zschau
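Aaron suspects a change between 1.24 and 1.30. However, an error at character 0 is also consistent with the input not being FASTA at all. FASTA records begin with a '>' header line, whereas the file below begins, after two blank lines, with a GenBank 'LOCUS' line. The following is a minimal sketch of a pre-parse check. It assumes those are the only two formats of interest. sniff_sequence_format is a hypothetical helper, not a Biopython function, and it looks only at the first non-blank line.

```python
def sniff_sequence_format(text):
    """Guess whether text looks like FASTA or a GenBank flat file.

    Heuristic sketch: only the first non-blank line is inspected.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate leading blank lines, as in the file below
        if line.startswith(">"):
            return "fasta"  # FASTA records begin with a '>' header line
        if line.startswith("LOCUS"):
            return "genbank"  # GenBank flat files begin with a LOCUS line
        return "unknown"
    return "empty"


if __name__ == "__main__":
    sample = "\n\nLOCUS       XM_414447               2107 bp    mRNA\n"
    print(sniff_sequence_format(sample))  # prints: genbank
```

Running a check like this on the file contents before creating the FASTA iterator would report a format mismatch directly, rather than surfacing an opaque parser exception.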

------a12345.fasta----------


LOCUS       XM_414447               2107 bp    mRNA    linear   VRT 28-JUL-2004
DEFINITION  PREDICTED: Gallus gallus similar to von Hippel-Lindau protein
            (LOC416117), mRNA.
ACCESSION   XM_414447
VERSION     XM_414447.1  GI:50754623
KEYWORDS    .
SOURCE      Gallus gallus (red jungle fowl)
  ORGANISM  Gallus gallus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Archosauria; Aves; Neognathae; Galliformes; Phasianidae;
            Phasianinae; Gallus.
COMMENT     MODEL REFSEQ:  This record is predicted by automated computational
            analysis. This record is derived from an annotated genomic sequence
            (NW_060494) using gene prediction method: GNOMON, supported by EST
            evidence.
            Also see:
                Documentation of NCBI's Annotation Process

FEATURES             Location/Qualifiers
     source          1..2107
                     /organism="Gallus gallus"
                     /mol_type="mRNA"
                     /strain="inbred line UCD001"
                     /isolate="#256"
                     /db_xref="taxon:9031"
                     /chromosome="12"
                     /sex="female"
                     /note="inbred line derived from a wild population of red
                     jungle fowl in Malaysia in the late 1930s, with the
                     possible introgression of a limited amount of White
                     Leghorn genome during its captive breeding history
                     common: red jungle fowl"
     gene            1..2107
                     /gene="LOC416117"
                     /note="Derived by automated computational analysis using
                     gene prediction method: GNOMON."
                     /db_xref="GeneID:416117"
                     /db_xref="InterimID:416117"
     CDS             1..486
                     /gene="LOC416117"
                     /codon_start=1
                     /product="similar to von Hippel-Lindau protein"
                     /protein_id="XP_414447.1"
                     /db_xref="GI:50754624"
                     /db_xref="GeneID:416117"
                     /db_xref="InterimID:416117"
                     /translation="MAPPGPGPAGPCLRSANTRELSEVVFNNRSPRAVLPIWVDFEGR
                     PRYYPVLRPRTGRIMHSYRGHLWLFRDAGTHDGLLVNRQELFVAAPDVNKADITLPVF
                     TLKERCLQVVRSLVRPGDYRKLDIVRSLYEELEDHPDVKKDLQRLSMERSKTLQEEIL
                     H"
     misc_feature    37..453
                     /gene="LOC416117"
                     /note="VHL; Region: von Hippel-Lindau disease tumour
                     suppressor protein. VHL forms a ternary complex with the
                     elonginB and elonginC proteins. This complex binds Cul2,
                     which then is involved in regulation of vascular
                     endothelial growth factor mRNA"
                     /db_xref="CDD:pfam01847"
ORIGIN
        1 atggcgccgc cgggtccggg tcccgccggg ccgtgcctgc gctccgccaa cacgcgcgaa
       61 ctctccgagg tcgtcttcaa caaccgcagc ccgcgcgccg tgctccccat ctgggtggac
      121 ttcgagggcc ggccgcgcta ctaccccgtg ctgcggccgc gcaccgggcg gatcatgcac
      181 agctaccgcg ggcacctgtg gctgttccgc gacgcgggca cgcacgacgg gctgctcgtc
      241 aaccggcagg agctgttcgt ggccgcgccg gacgtcaaca aggccgacat cacgctgcca
      301 gtgttcacgc tgaaggagcg gtgcctgcag gtggtgcgca gcctggtccg gccgggggac
      361 taccggaagc tggacatcgt gcgctcgctg tacgaggagc tggaggacca ccccgacgtc
      421 aagaaggacc tgcagcggct ctccatggag aggagcaaaa cgttacagga ggaaatcctc
      481 cactaacagg gctgtgcgtc ccgagccgtg tagatagcaa agcaccgagc ttaggagggg
      541 cagctgccgt gcagcgtgcc gggagctaac gtctgcatcg acgttctgga acgaactcag
      601 tcatgctgta gaacatttgc tatgctggta ggtcagattc caaagagcaa acagtgtgca
      661 ggaacgtact gctttgtgag ggctctgctc ccggtctcat gcactggtga gcagtgaccc
      721 cagtggcctg gcacagacgg ggctcagaga agcttgcttc cgactgtttc agaacattcc
      781 atagtaacac aagatttatc cgtctggagg aaatacatgc agctcagctt cctctgagtt
      841 agaaagaaaa ctacatcaag ggttcactta atccagacta taaaatcagt ggcagagcag
      901 caccaggttt gcttgaatga tttggttttg gcagaaattc gctctcacat gctaaattta
      961 cttttgaatc acaaagcgtg gagcgtgttc atgtgagagc ttccacggtt gccttctgag
     1021 ggctcggccc aaaacttctg tgctggcgga aagatgtccg taagcatttc tgtgttagcc
     1081 tctgtctgtg cgttcataaa ccctcattgt agcaactctg aagctgacaa attcttacac
     1141 agaacatgcc ttgaatgcct taatttgtct ttcattcctg aattcctgct tagtttatct
     1201 ctagatgatg gaaccttgtc agccatatgg actgcatctt ggttttagga cccctttctg
     1261 ctttgcacct ctgtgcccac accctcagct cccatagtgg tataccaagg gagcgttccc
     1321 agaaggtggg tgctctgagc ctcatctttc ccttgtccca gggattggcc ttggggagca
     1381 cagtccgccc aggccgctgg tgccccctga ggcacagaag ctgccccagc tgcaggcgtg
     1441 gctcccccaa gcagagctgt gcttttcagc aggccagctg cacagagaga aatcatagaa
     1501 tcacagaatc atacaatggc ctgggctgaa aaggaccaca atgcccatcc agttccaacc
     1561 ccctgctatg tgcagggtca ccaaccagca gaccaggctg cccagagcca catccagcct
     1621 ggccttgaat gcctccaggg atggggcctc cttgggcgac ctgttccaat gcatcaacac
     1681 cctccaagtg aaaaacttcc tcctgatata cctgaacatc ccctgtctta tttaagatca
     1741 ttcccccttg tcctgtcact atccaccctc gtgaacagct gttccccttc ctgtttatat
     1801 gcttcctaaa atcaagaaag gttctaggcc tatatgttct cttcccccat acatcaaata
     1861 cacaggtgtg tgtctgtatg tctctgtgca taactcaaag cagcgttgtt tttagcagat
     1921 aggtgaattg ttccccaagt tgcaggcagg cgcagtgctg ctcagcatgc agagcagcag
     1981 gttgctaaca gatagcagca ggctgttctg tggtgtaagg ttcttaagta tgcaatgtgt
     2041 gcccttctcg tggacttttt ttttcttaaa tgtttgtgta tgaactgatc tttgtttctc
     2101 ataaaaa
//


------end file----------
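The stated goal, in the quoted message further down, is to feed the protein string into BLAST. The record above already carries that string in the CDS /translation qualifier. The sketch below extracts it with regular expressions and writes it out as FASTA. It is a shortcut under stated assumptions, not a GenBank parser. translations_to_fasta is a hypothetical name. The sketch expects each qualifier value to be enclosed in double quotes, and it labels each record with the /protein_id that appears before it.

```python
import re
import sys

# Qualifier patterns; [^"]* also matches the newlines inside wrapped values.
TRANSLATION_RE = re.compile(r'/translation="([^"]*)"')
PROTEIN_ID_RE = re.compile(r'/protein_id="([^"]+)"')


def translations_to_fasta(genbank_text, width=60):
    """Return every /translation in genbank_text as FASTA text.

    Regex shortcut, not a full GenBank parser: each translation is labelled
    with a /protein_id found between it and the previous translation, or a
    numbered placeholder if there is none.
    """
    records = []
    previous_end = 0
    for match in TRANSLATION_RE.finditer(genbank_text):
        # Wrapped qualifier values contain newlines and indentation; drop them.
        protein = "".join(match.group(1).split())
        ids = PROTEIN_ID_RE.findall(genbank_text, previous_end, match.start())
        header = ids[-1] if ids else "translation_%d" % (len(records) + 1)
        # Re-wrap the sequence into fixed-width FASTA lines.
        lines = [protein[i:i + width] for i in range(0, len(protein), width)]
        records.append(">%s\n%s\n" % (header, "\n".join(lines)))
        previous_end = match.end()
    return "".join(records)


if __name__ == "__main__":
    sample = (
        '     CDS             1..486\n'
        '                     /protein_id="XP_414447.1"\n'
        '                     /translation="MAPPGPGPAGPCLRSANTRE\n'
        '                     PRYYPVLRPR"\n'
    )
    sys.stdout.write(translations_to_fasta(sample))
```

For anything beyond a quick check, a real GenBank parser is the sturdier option. Biopython includes one in its Bio.GenBank module.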
On Aug 3, 2004, at 4:23 PM, Jeffrey Chang wrote:

> Hi Aaron,
>
> Can you send the file that is generating the error?  I believe it is  
> called /var/www/html/data/a12345.fasta.  In general, the fasta parser  
> should be well-tested.  It works on a test file in fasta format that I  
> have here.  It would help most if someone could look at your file to  
> see what's going on.
>
> Thanks,
> Jeff
>
>
> On Aug 3, 2004, at 3:42 PM, Aaron Zschau wrote:
>
>> I've sent a couple of messages to the list about this, but I'm not sure
>> they're going through, as I haven't seen any replies. I am trying to get
>> a section of my code working again that worked before the 1.30 revision
>> of Biopython; it is based on the cookbook tutorials. My code looks up a
>> gene by name in GenBank and saves the FASTA version of that data so that
>> the protein string can be fed into a BLAST search. The lookup works fine
>> and the FASTA file is saved, but I then get an error at the parse stage,
>> at character 0 of the file.
>>
>> Any help would be greatly appreciated
>>
>> thanks
>>
>> Aaron Zschau
>>
>>
>>
>>
>>
>>
>> import sys
>> from Bio import Fasta
>>
>> #file_for_blast = open(data_path_prefix + file_unique_id + 'fasta', 'r')
>> file_for_blast = open('/var/www/html/data/a12345.fasta','r')
>>
>> f_iterator = Fasta.Iterator(file_for_blast)
>> print "iterator created"
>> sys.stdout.flush()
>>
>> f_record = f_iterator.next()
>> print "f_record created"
>> sys.stdout.flush()
>>
>> -----------------------
>>
>> iterator created
>> Traceback (most recent call last):
>>   File "cluster-debug.py", line 119, in ?
>>     f_record = f_iterator.next()
>>   File "/root/biopython-1.30/build/lib.linux-i586-2.2/Bio/Fasta/__init__.py", line 72, in next
>>     result = self._iterator.next()
>>   File "/root/biopython-1.30/build/lib.linux-i586-2.2/Martel/IterParser.py", line 152, in iterateFile
>>     self.header_parser.parseString(rec)
>>   File "/root/biopython-1.30/build/lib.linux-i586-2.2/Martel/Parser.py", line 361, in parseString
>>     self._err_handler.fatalError(ParserIncompleteException(pos))
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>>     raise exception
>> Martel.Parser.ParserIncompleteException: error parsing at or beyond character 0 (unparsed text remains)
>>
>> _______________________________________________
>> BioPython mailing list  -  BioPython at biopython.org
>> http://biopython.org/mailman/listinfo/biopython