[BioPython] GenBank.FeatureParser()

Ravinder Singh Ravinder.Singh at colorado.edu
Thu May 1 10:33:48 EDT 2003


Hi,
(A) We are having a problem with GenBank.FeatureParser() when using the 
code below (1). It gives error (2). When we delete the word "linear" 
from the LOCUS line, error (2) goes away, but then we get a second 
error (3). Has there been a fix for this bug, which is likely caused by 
the change in the GenBank format? If not, could you please fix it? 
Many thanks.
The GenBank file that we are using is from the URL 
http://www.ncbi.nih.gov/entrez/viewer.fcgi?val=NC_004353.1&from=1&to=1237870&txt=on&view=gb
A small top portion of the file is pasted below (4).
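
The manual edit described in (A) can be automated while waiting for a parser fix. The following is a minimal sketch, assuming the only change needed for error (2) is removing the topology word ("linear" or "circular") from the LOCUS line. `strip_topology` is a hypothetical helper, not part of Biopython, and it does nothing about error (3). Whether the parser accepts the resulting spacing has not been checked here.

```python
import re

# Hypothetical helper, not part of Biopython: it only automates the manual
# edit of deleting the topology word from the LOCUS line.
TOPOLOGY = re.compile(r"\s+(linear|circular)(?=\s)")


def strip_topology(line):
    """Return the line without its topology word if it is a LOCUS line."""
    if not line.startswith("LOCUS"):
        return line  # every other line is passed through unchanged
    return TOPOLOGY.sub("", line, count=1)


locus = ("LOCUS       NC_004353            1237870 bp    DNA     "
         "linear   INV 29-APR-2003")
print(strip_topology(locus))
```

Applied line by line to the downloaded file, this would write a copy without the topology word while leaving the original untouched.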

(B) Also, if I want to get the location of the fuzzy start or end, what 
do I need to do?
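
For context on (B): in the GenBank feature table, a boundary that extends beyond the given base is written with `<` before the start or `>` before the end (for example `<1..>200`, a made-up span that does not occur in the excerpt in (4)). Below is a minimal, library-independent sketch for reading those markers from a simple `start..end` span. `parse_span` is a hypothetical name, and this is not the Biopython API.

```python
import re

# Matches a simple span such as "638..1719" or "<1..>200".
SPAN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def parse_span(text):
    """Split a simple GenBank span into its positions and fuzzy flags."""
    match = SPAN.match(text)
    if match is None:
        raise ValueError("not a simple start..end span: %r" % text)
    start_mark, start, end_mark, end = match.groups()
    return {
        "start": int(start),
        "start_is_fuzzy": start_mark == "<",
        "end": int(end),
        "end_is_fuzzy": end_mark == ">",
    }


print(parse_span("<1..>200"))
```

Compound locations such as join(...) or complement(...) are outside the scope of this sketch and raise ValueError.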
Again, many thanks.
Ravinder
****************************
Dr. Ravinder Singh

Assistant Professor
MCD Biology
347 UCB
University of Colorado
Boulder, CO 80309-0347

(303)492-8886 (voice)
(303)492-7744 (fax)

-- 
********************************************************************************


(1) Code:
from Bio import GenBank

def get_mRNA(genbank_file_name):

    gb_handle = open(genbank_file_name)
    feature_parser = GenBank.FeatureParser()
    iterator = GenBank.Iterator(gb_handle, feature_parser)

    while 1:
        cur_entry = iterator.next()
                       
        if cur_entry is None:
            break
                       
        for feature in cur_entry.features:
               
            if feature.type == "mRNA":
                for sub_feature in feature.sub_features:
                    length = (sub_feature.location.nofuzzy_end -
                              sub_feature.location.nofuzzy_start)
                    print length
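
As a cross-check on the lengths the code above is meant to print, the exon lengths can also be computed directly from a location string in the feature table. This is a minimal sketch that does not use Biopython. `exon_lengths` is a hypothetical helper, and it assumes GenBank's 1-based, inclusive coordinates, so each length is end - start + 1.

```python
import re

# Each "start..end" pair inside a location string; optional "<" and ">"
# fuzzy markers are tolerated and ignored.
PAIR = re.compile(r"<?(\d+)\.\.>?(\d+)")


def exon_lengths(location):
    """Return the length of every start..end segment in a location string."""
    return [int(end) - int(start) + 1 for start, end in PAIR.findall(location)]


# The CG17923 mRNA location from the record excerpt in (4).
print(exon_lengths("join(24068..24477,24979..25153,25218..25450,25501..25621)"))
# -> [410, 175, 233, 121]
```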

(2) Error:
Traceback (most recent call last):
  File "gb_exon_length_erin.py", line 31, in ?
    gb_parser.get_mRNA(filename)
  File "GenBank_Parser_erin.py", line 34, in get_mRNA
    cur_entry = iterator.next()
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 183, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 268, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1250, in feed
    self._parser.parseFile(handle)
  File "/usr/local/lib/python2.2/site-packages/Martel/Parser.py", line 230, in parseFile
    self.parseString(fileobj.read())
  File "/usr/local/lib/python2.2/site-packages/Martel/Parser.py", line 258, in parseString
    self._err_handler.fatalError(result)
  File "/usr/local/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 55

(3) Error:
Traceback (most recent call last):
  File "gb_exon_length_erin.py", line 31, in ?
    get_mRNA(filename)
  File "gb_test.py", line 10, in get_mRNA
    cur_entry = iterator.next()
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 183, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 268, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1250, in feed
    self._parser.parseFile(handle)
  File "/usr/local/lib/python2.2/site-packages/Martel/Parser.py", line 230, in parseFile
    self.parseString(fileobj.read())
  File "/usr/local/lib/python2.2/site-packages/Martel/Parser.py", line 258, in parseString
    self._err_handler.fatalError(result)
  File "/usr/local/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 6099

(4) GenBank file to be parsed (top portion only):

LOCUS       NC_004353            1237870 bp    DNA     linear   INV 29-APR-2003
DEFINITION  Drosophila melanogaster chromosome 4 complete sequence.
ACCESSION   NC_004353
VERSION     NC_004353.1  GI:24638835
KEYWORDS    .
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 1237870)
  AUTHORS   Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D.,
            Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F.,
            George,R.A., Lewis,S.E., Richards,S., Ashburner,M., Henderson,S.N.,
            Sutton,G.G., Wortman,J.R., Yandell,M.D., Zhang,Q., Chen,L.X.,
            Brandon,R.C., Rogers,Y.H., Blazej,R.G., Champe,M., Pfeiffer,B.D.,
            Wan,K.H., Doyle,C., Baxter,E.G., Helt,G., Nelson,C.R., Gabor,G.L.,
            Abril,J.F., Agbayani,A., An,H.J., Andrews-Pfannkoch,C., Baldwin,D.,
            Ballew,R.M., Basu,A., Baxendale,J., Bayraktaroglu,L., Beasley,E.M.,
            Beeson,K.Y., Benos,P.V., Berman,B.P., Bhandari,D., Bolshakov,S.,
            Borkova,D., Botchan,M.R., Bouck,J., Brokstein,P., Brottier,P.,
            Burtis,K.C., Busam,D.A., Butler,H., Cadieu,E., Center,A.,
            Chandra,I., Cherry,J.M., Cawley,S., Dahlke,C., Davenport,L.B.,
            Davies,P., de Pablos,B., Delcher,A., Deng,Z., Mays,A.D., Dew,I.,
            Dietz,S.M., Dodson,K., Doup,L.E., Downes,M., Dugan-Rocha,S.,
            Dunkov,B.C., Dunn,P., Durbin,K.J., Evangelista,C.C., Ferraz,C.,
            Ferriera,S., Fleischmann,W., Fosler,C., Gabrielian,A.E., Garg,N.S.,
            Gelbart,W.M., Glasser,K., Glodek,A., Gong,F., Gorrell,J.H., Gu,Z.,
            Guan,P., Harris,M., Harris,N.L., Harvey,D., Heiman,T.J.,
            Hernandez,J.R., Houck,J., Hostin,D., Houston,K.A., Howland,T.J.,
            Wei,M.H., Ibegwam,C., Jalali,M., Kalush,F., Karpen,G.H., Ke,Z.,
            Kennison,J.A., Ketchum,K.A., Kimmel,B.E., Kodira,C.D., Kraft,C.,
            Kravitz,S., Kulp,D., Lai,Z., Lasko,P., Lei,Y., Levitsky,A.A.,
            Li,J., Li,Z., Liang,Y., Lin,X., Liu,X., Mattei,B., McIntosh,T.C.,
            McLeod,M.P., McPherson,D., Merkulov,G., Milshina,N.V., Mobarry,C.,
            Morris,J., Moshrefi,A., Mount,S.M., Moy,M., Murphy,B., Murphy,L.,
            Muzny,D.M., Nelson,D.L., Nelson,D.R., Nelson,K.A., Nixon,K.,
            Nusskern,D.R., Pacleb,J.M., Palazzolo,M., Pittman,G.S., Pan,S.,
            Pollard,J., Puri,V., Reese,M.G., Reinert,K., Remington,K.,
            Saunders,R.D., Scheeler,F., Shen,H., Shue,B.C., Siden-Kiamos,I.,
            Simpson,M., Skupski,M.P., Smith,T., Spier,E., Spradling,A.C.,
            Stapleton,M., Strong,R., Sun,E., Svirskas,R., Tector,C., Turner,R.,
            Venter,E., Wang,A.H., Wang,X., Wang,Z.Y., Wassarman,D.A.,
            Weinstock,G.M., Weissenbach,J., Williams,S.M., WoodageT,
            Worley,K.C., Wu,D., Yang,S., Yao,Q.A., Ye,J., Yeh,R.F.,
            Zaveri,J.S., Zhan,M., Zhang,G., Zhao,Q., Zheng,L., Zheng,X.H.,
            Zhong,F.N., Zhong,W., Zhou,X., Zhu,S., Zhu,X., Smith,H.O.,
            Gibbs,R.A., Myers,E.W., Rubin,G.M. and Venter,J.C.
  TITLE     The genome sequence of Drosophila melanogaster
  JOURNAL   Science 287 (5461), 2185-2195 (2000)
  MEDLINE   20196006
   PUBMED   10731132
REFERENCE   2  (bases 1 to 1237870)
  AUTHORS   Misra,S., Crosby,M.A., Matthews,B.B., Bayraktaroglu,L.,
            Campbell,K., Hradecky,P., Huang,Y., Kaminker,J.S., Prochnik,S.E.,
            Smith,C.D., Tupy,J.L., Bergman,C.M., Berman,B.P., Carlson,J.W.,
            Celniker,S.E., Clamp,M.E., Drysdale,R.A., Emmert,D., Frise,E., de
            Grey,A.D.N.J., Harris,N.L., Kronmiller,B., Marshall,B.,
            Millburn,G.H., Richter,J., Russo,S., Searle,S.M.J., Smith,E.,
            Shu,S., Smutniak,F., Whitfield,E.J., Ashburner,M., Gelbart,W.M.,
            Rubin,G.M., Mungall,C.J. and Lewis,S.E.
  TITLE     Annotation of Drosophila melanogaster genome
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 1237870)
  AUTHORS   Celniker,S.E., Adams,M.D., Kronmiller,B., Wan,K.H., Holt,R.A.,
            Evans,C.A., Gocayne,J.D., Amanatides,P.G., Brandon,R.C., Rogers,Y.,
            Banzon,J., An,H., Baldwin,D., Banzon,J., Beeson,K.Y., Busam,D.A.,
            Carlson,J.W., Center,A., Champe,M., Davenport,L.B., Dietz,S.M.,
            Dodson,K., Dorsett,V., Doup,L.E., Doyle,C., Dresnek,D., Farfan,D.,
            Ferriera,S., Frise,E., Galle,R.F., Garg,N.S., George,R.A.,
            Gonzalez,M., Houck,J., Hoskins,R.A., Hostin,D., Howland,T.J.,
            Ibegwam,C., Jalali,M., Kruse,D., Li,P., Mattei,B., Moshrefi,A.,
            McIntosh,T.C., Moy,M., Murphy,B., Nelson,C., Nelson,K.A., Nunoo,J.,
            Pacleb,J., Paragas,V., Park,S., Patel,S., Pfeiffer,B.,
            Phouanenavong,S., Pittman,G.S., Puri,V., Richards,S., Scheeler,F.,
            Stapleton,M., Strong,R., Svirskas,R., Tector,C., Tyler,D.,
            Williams,S.M., Zaveri,J.S., Smith,H.O., Venter,J.C. and Rubin,G.M.
  TITLE     Sequencing of Drosophila melanogaster genome
  JOURNAL   Unpublished
REFERENCE   4  (bases 1 to 1237870)
  AUTHORS   FlyBase.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-SEP-2002) University of California Berkeley, 539 Life
            Sciences Addition, Berkeley, CA 94720, USA
REFERENCE   5  (bases 1 to 1237870)
  AUTHORS   Adams,M.D., Celniker,S.E., Gibbs,R.A., Rubin,G.M. and Venter,C.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-MAR-2000) Celera Genomics, 45 West Gude Drive,
            Rockville, MD 20850, USA
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AE014135.
FEATURES             Location/Qualifiers
     source          1..1237870
                     /organism="Drosophila melanogaster"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:7227"
                     /chromosome="4"
                     /note="genotype: y[1]; cn[1] bw[1] sp[1]; Rh6[1]"
     repeat_region   complement(638..1719)
                     /locus_tag="TE20395"
                     /map="102A1-102A1"
                     /transposon="baggins{}1471"
                     /db_xref="FLYBASE:FBti0020395"
     repeat_region   complement(2554..4264)
                     /locus_tag="TE20396"
                     /map="102A1-102A1"
                     /transposon="Rt1c{}1472"
                     /db_xref="FLYBASE:FBti0020396"
     repeat_region   4886..11664
                     /locus_tag="TE20397"
                     /map="102A1-102A1"
                     /transposon="GATE{}1473"
                     /db_xref="FLYBASE:FBti0020397"
     repeat_region   complement(11691..12255)
                     /locus_tag="TE20398"
                     /map="102A1-102A1"
                     /transposon="GATE{}1474"
                     /db_xref="FLYBASE:FBti0020398"
     repeat_region   12291..13244
                     /locus_tag="TE20399"
                     /map="102A1-102A1"
                     /transposon="GATE{}1475"
                     /db_xref="FLYBASE:FBti0020399"
     repeat_region   complement(13288..13761)
                     /locus_tag="TE20400"
                     /map="102A1-102A1"
                     /transposon="1360{}1476"
                     /db_xref="FLYBASE:FBti0020400"
     repeat_region   complement(17702..18272)
                     /locus_tag="TE20401"
                     /map="102A1-102A1"
                     /transposon="Rt1b{}1477"
                     /db_xref="FLYBASE:FBti0020401"
     gene            complement(22335..23205)
                     /locus_tag="CG32013"
                     /map="102A1-102A1"
                     /db_xref="FLYBASE:FBgn0052013"
                     /db_xref="LocusID:317821"
     mRNA            complement(join(22335..22528,22617..23205))
                     /locus_tag="CG32013"
                     /product="CG32013-RA"
                     /transcript_id="NM_166710.1"
                     /db_xref="GI:24638483"
                     /db_xref="FLYBASE:FBgn0052013"
                     /db_xref="LocusID:317821"
     CDS             complement(join(22335..22528,22617..23205))
                     /locus_tag="CG32013"
                     /codon_start=1
                     /protein_id="NP_726514.1"
                     /db_xref="GI:24638484"
                     /db_xref="FLYBASE:FBgn0052013"
                     /db_xref="LocusID:317821"
     gene            24068..25621
                     /locus_tag="CG17923"
                     /note="synonym: JYalpha"
                     /map="102A1-102A1"
                     /db_xref="FLYBASE:FBgn0040037"
                     /db_xref="LocusID:49962"
     mRNA            join(24068..24477,24979..25153,25218..25450,25501..25621)
                     /locus_tag="CG17923"
                     /product="CG17923-RA"
                     /transcript_id="NM_143896.2"
                     /db_xref="GI:24638485"
                     /db_xref="FLYBASE:FBgn0040037"
                     /db_xref="LocusID:49962"
     CDS             join(24134..24477,24979..25153,25218..25450,25501..25621)
                     /locus_tag="CG17923"
                     /codon_start=1
                     /protein_id="NP_652153.2"
                     /db_xref="GI:24638486"
                     /db_xref="FLYBASE:FBgn0040037"
                     /db_xref="LocusID:49962"
     gene            complement(26482..34110)
                     /locus_tag="CG32011"
                     /map="102A1-102A1"
                     /db_xref="FLYBASE:FBgn0052011"
                     /db_xref="LocusID:317820"
     mRNA            complement(join(26482..26667,27167..27349,28371..28609,
                     28966..29301,29356..30391,30551..31625,31703..32391,
                     33949..34110))
                     /locus_tag="CG32011"
                     /product="CG32011-RA"
                     /transcript_id="NM_166711.2"
                     /db_xref="GI:28558763"
                     /db_xref="FLYBASE:FBgn0052011"
                     /db_xref="LocusID:317820"
     CDS             complement(join(26482..26667,27167..27349,28371..28609,
                     28966..29301,29356..30391,30551..31625,31703..32391,
                     33949..34110))
                     /locus_tag="CG32011"
                     /codon_start=1
                     /protein_id="NP_726515.2"
                     /db_xref="GI:28558764"
                     /db_xref="FLYBASE:FBgn0052011"
                     /db_xref="LocusID:317820"
     repeat_region   34275..40713
                     /locus_tag="TE20402"
                     /map="102A1-102A2"
                     /transposon="McClintock{}1478"
                     /db_xref="FLYBASE:FBti0020402"





