[BioPython] GenBank.FeatureParser()
Ravinder Singh
Ravinder.Singh at colorado.edu
Thu May 1 10:33:48 EDT 2003
Hi,
(A) We are having a problem with GenBank.FeatureParser() when using the
code below (1). It gives the error shown in (2). When we delete the word
"linear" from the LOCUS line, problem 1 goes away, but then we get the
second error (3). Has this bug, which is likely caused by the change in
the GenBank format, already been fixed, or could you please fix it? Many
thanks.
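Error (2) stops "at or beyond character 55", and removing "linear" makes it go away. Assuming the file follows the revised fixed-column LOCUS layout described in the NCBI GenBank release notes (topology in columns 56-63), "linear" begins at 0-based offset 55, which is consistent with where the parser gives up. A minimal stdlib sketch of that layout; `parse_locus_line` and `LOCUS_COLUMNS` are hypothetical names, not Biopython functions, and the padded LOCUS line is a reconstruction because the archive collapsed the spacing.

```python
# Hypothetical helper, not part of Biopython: split a revised-format GenBank
# LOCUS line into fields using fixed column ranges (1-based, inclusive),
# as laid out in the NCBI GenBank release notes.
LOCUS_COLUMNS = [
    ("name", 13, 28),
    ("length", 30, 40),
    ("strand", 45, 47),
    ("mol_type", 48, 53),
    ("topology", 56, 63),
    ("division", 65, 67),
    ("date", 69, 79),
]


def parse_locus_line(line):
    """Return a dict of LOCUS fields, each stripped of its padding."""
    if not line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line: %r" % line)
    # 1-based inclusive columns map to the 0-based slice [first - 1:last].
    return {field: line[first - 1:last].strip() for field, first, last in LOCUS_COLUMNS}


# LOCUS line of NC_004353, padded to the column layout above (assumed to
# match the downloaded file).
LOCUS = ("LOCUS".ljust(12) + "NC_004353".ljust(16) + " " + "1237870".rjust(11)
         + " bp " + "".ljust(3) + "DNA".ljust(6) + "  " + "linear".ljust(8)
         + " " + "INV" + " " + "29-APR-2003")

fields = parse_locus_line(LOCUS)
print(fields["topology"])     # linear
print(LOCUS.index("linear"))  # 55, the 0-based offset of the topology field
```

A parser written for the older LOCUS layout would not expect a word at that offset, which fits the symptom, though the traceback alone does not prove it.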
The GenBank file that we are using is from this URL:
http://www.ncbi.nih.gov/entrez/viewer.fcgi?val=NC_004353.1&from=1&to=1237870&txt=on&view=gb
A short portion from the top of the file is pasted below (4).
(B) Also, if I want to get the location of the fuzzy start or end, what
do I need to do?
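In Biopython the fuzziness is kept on position objects rather than on plain integers: `location.start` and `location.end` should be instances of classes in `Bio.SeqFeature` such as `ExactPosition`, `BeforePosition` or `AfterPosition`, whereas `nofuzzy_start` and `nofuzzy_end` (used in the code in (1)) are numbers with the fuzziness stripped. The stdlib sketch below only illustrates what that information is, by reading the `<` and `>` markers straight from a simple location string; `parse_range`, `FUZZ` and `RANGE_RE` are hypothetical names, not Biopython API.

```python
import re

# Hypothetical helper, not Biopython's API: map a GenBank fuzziness marker to a
# label. "<" means the true position lies before the number given, ">" means it
# lies after it, and no marker means the position is exact.
FUZZ = {"<": "before", ">": "after", "": "exact"}

# One simple range such as "<1..206" or "24068..>25621" (no join/complement).
RANGE_RE = re.compile(r"^([<>]?)(\d+)\.\.([<>]?)(\d+)$")


def parse_range(text):
    """Return ((start, start_fuzz), (end, end_fuzz)) using GenBank 1-based numbers."""
    match = RANGE_RE.match(text)
    if match is None:
        raise ValueError("unsupported location: %r" % text)
    start_mark, start, end_mark, end = match.groups()
    return (int(start), FUZZ[start_mark]), (int(end), FUZZ[end_mark])


# Made-up sample ranges; the record excerpt in (4) contains no fuzzy ends.
print(parse_range("<1..206"))        # ((1, 'before'), (206, 'exact'))
print(parse_range("24068..>25621"))  # ((24068, 'exact'), (25621, 'after'))
```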
Again, many thanks.
Ravinder
****************************
Dr. Ravinder Singh
Assistant Professor
MCD Biology
347 UCB
University of Colorado
Boulder, CO 80309-0347
(303)492-8886 (voice)
(303)492-7744 (fax)
--
********************************************************************************
(1) Code:
from Bio import GenBank

def get_mRNA(genbank_file_name):
    """Print the length of each exon (sub-feature) of every mRNA feature."""
    gb_handle = open(genbank_file_name)
    feature_parser = GenBank.FeatureParser()
    iterator = GenBank.Iterator(gb_handle, feature_parser)
    while 1:
        cur_entry = iterator.next()
        if cur_entry is None:  # no more records in the file
            break
        for feature in cur_entry.features:
            if feature.type == "mRNA":
                # Each sub-feature holds one part of the join(), i.e. one exon.
                for sub_feature in feature.sub_features:
                    length = (sub_feature.location.nofuzzy_end -
                              sub_feature.location.nofuzzy_start)
                    print length
    gb_handle.close()
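The loop in (1) computes `nofuzzy_end - nofuzzy_start` for each sub-feature. Assuming Biopython stores locations with a 0-based start and an exclusive end (Python slice style), that difference equals the GenBank length `end - start + 1`. A stdlib sketch that does the same arithmetic directly on a location string from (4), which may help cross-check the parser's numbers; `exon_lengths` and `SPAN_RE` are hypothetical names.

```python
import re

# Hypothetical helper, independent of Biopython. Each "a..b" span in a GenBank
# location is one exon; an optional "<" or ">" marker before either number is
# ignored here.
SPAN_RE = re.compile(r"[<>]?(\d+)\.\.[<>]?(\d+)")


def exon_lengths(location):
    """Return exon lengths in file order (reverse of transcript order for complement())."""
    lengths = []
    for start, end in SPAN_RE.findall(location):
        # GenBank "a..b" is 1-based and inclusive. As a 0-based, end-exclusive
        # interval it is (a - 1, b), so end - start gives b - a + 1.
        zero_based_start, exclusive_end = int(start) - 1, int(end)
        lengths.append(exclusive_end - zero_based_start)
    return lengths


print(exon_lengths("complement(join(22335..22528,22617..23205))"))  # [194, 589]
print(exon_lengths("join(24068..24477,24979..25153,25218..25450,25501..25621)"))
# [410, 175, 233, 121]
```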
(2) Error:
Traceback (most recent call last):
  File "gb_exon_length_erin.py", line 31, in ?
    gb_parser.get_mRNA(filename)
  File "GenBank_Parser_erin.py", line 34, in get_mRNA
    cur_entry = iterator.next()
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 183, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 268, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1250, in feed
    self._parser.parseFile(handle)
  File "/usr/local/lib/python2.2/site-packages/Martel/Parser.py", line 230, in parseFile
    self.parseString(fileobj.read())
  File "/usr/local/lib/python2.2/site-packages/Martel/Parser.py", line 258, in parseString
    self._err_handler.fatalError(result)
  File "/usr/local/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 55
(3) Error:
Traceback (most recent call last):
  File "gb_exon_length_erin.py", line 31, in ?
    get_mRNA(filename)
  File "gb_test.py", line 10, in get_mRNA
    cur_entry = iterator.next()
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 183, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 268, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1250, in feed
    self._parser.parseFile(handle)
  File "/usr/local/lib/python2.2/site-packages/Martel/Parser.py", line 230, in parseFile
    self.parseString(fileobj.read())
  File "/usr/local/lib/python2.2/site-packages/Martel/Parser.py", line 258, in parseString
    self._err_handler.fatalError(result)
  File "/usr/local/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 6099
(4) GenBank file to be parsed:
LOCUS       NC_004353            1237870 bp    DNA     linear   INV 29-APR-2003
DEFINITION  Drosophila melanogaster chromosome 4 complete sequence.
ACCESSION   NC_004353
VERSION     NC_004353.1  GI:24638835
KEYWORDS    .
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 1237870)
  AUTHORS   Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D.,
            Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F.,
            George,R.A., Lewis,S.E., Richards,S., Ashburner,M., Henderson,S.N.,
            Sutton,G.G., Wortman,J.R., Yandell,M.D., Zhang,Q., Chen,L.X.,
            Brandon,R.C., Rogers,Y.H., Blazej,R.G., Champe,M., Pfeiffer,B.D.,
            Wan,K.H., Doyle,C., Baxter,E.G., Helt,G., Nelson,C.R., Gabor,G.L.,
            Abril,J.F., Agbayani,A., An,H.J., Andrews-Pfannkoch,C., Baldwin,D.,
            Ballew,R.M., Basu,A., Baxendale,J., Bayraktaroglu,L., Beasley,E.M.,
            Beeson,K.Y., Benos,P.V., Berman,B.P., Bhandari,D., Bolshakov,S.,
            Borkova,D., Botchan,M.R., Bouck,J., Brokstein,P., Brottier,P.,
            Burtis,K.C., Busam,D.A., Butler,H., Cadieu,E., Center,A.,
            Chandra,I., Cherry,J.M., Cawley,S., Dahlke,C., Davenport,L.B.,
            Davies,P., de Pablos,B., Delcher,A., Deng,Z., Mays,A.D., Dew,I.,
            Dietz,S.M., Dodson,K., Doup,L.E., Downes,M., Dugan-Rocha,S.,
            Dunkov,B.C., Dunn,P., Durbin,K.J., Evangelista,C.C., Ferraz,C.,
            Ferriera,S., Fleischmann,W., Fosler,C., Gabrielian,A.E., Garg,N.S.,
            Gelbart,W.M., Glasser,K., Glodek,A., Gong,F., Gorrell,J.H., Gu,Z.,
            Guan,P., Harris,M., Harris,N.L., Harvey,D., Heiman,T.J.,
            Hernandez,J.R., Houck,J., Hostin,D., Houston,K.A., Howland,T.J.,
            Wei,M.H., Ibegwam,C., Jalali,M., Kalush,F., Karpen,G.H., Ke,Z.,
            Kennison,J.A., Ketchum,K.A., Kimmel,B.E., Kodira,C.D., Kraft,C.,
            Kravitz,S., Kulp,D., Lai,Z., Lasko,P., Lei,Y., Levitsky,A.A.,
            Li,J., Li,Z., Liang,Y., Lin,X., Liu,X., Mattei,B., McIntosh,T.C.,
            McLeod,M.P., McPherson,D., Merkulov,G., Milshina,N.V., Mobarry,C.,
            Morris,J., Moshrefi,A., Mount,S.M., Moy,M., Murphy,B., Murphy,L.,
            Muzny,D.M., Nelson,D.L., Nelson,D.R., Nelson,K.A., Nixon,K.,
            Nusskern,D.R., Pacleb,J.M., Palazzolo,M., Pittman,G.S., Pan,S.,
            Pollard,J., Puri,V., Reese,M.G., Reinert,K., Remington,K.,
            Saunders,R.D., Scheeler,F., Shen,H., Shue,B.C., Siden-Kiamos,I.,
            Simpson,M., Skupski,M.P., Smith,T., Spier,E., Spradling,A.C.,
            Stapleton,M., Strong,R., Sun,E., Svirskas,R., Tector,C., Turner,R.,
            Venter,E., Wang,A.H., Wang,X., Wang,Z.Y., Wassarman,D.A.,
            Weinstock,G.M., Weissenbach,J., Williams,S.M., WoodageT,
            Worley,K.C., Wu,D., Yang,S., Yao,Q.A., Ye,J., Yeh,R.F.,
            Zaveri,J.S., Zhan,M., Zhang,G., Zhao,Q., Zheng,L., Zheng,X.H.,
            Zhong,F.N., Zhong,W., Zhou,X., Zhu,S., Zhu,X., Smith,H.O.,
            Gibbs,R.A., Myers,E.W., Rubin,G.M. and Venter,J.C.
  TITLE     The genome sequence of Drosophila melanogaster
  JOURNAL   Science 287 (5461), 2185-2195 (2000)
  MEDLINE   20196006
   PUBMED   10731132
REFERENCE   2  (bases 1 to 1237870)
  AUTHORS   Misra,S., Crosby,M.A., Matthews,B.B., Bayraktaroglu,L.,
            Campbell,K., Hradecky,P., Huang,Y., Kaminker,J.S., Prochnik,S.E.,
            Smith,C.D., Tupy,J.L., Bergman,C.M., Berman,B.P., Carlson,J.W.,
            Celniker,S.E., Clamp,M.E., Drysdale,R.A., Emmert,D., Frise,E., de
            Grey,A.D.N.J., Harris,N.L., Kronmiller,B., Marshall,B.,
            Millburn,G.H., Richter,J., Russo,S., Searle,S.M.J., Smith,E.,
            Shu,S., Smutniak,F., Whitfield,E.J., Ashburner,M., Gelbart,W.M.,
            Rubin,G.M., Mungall,C.J. and Lewis,S.E.
  TITLE     Annotation of Drosophila melanogaster genome
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 1237870)
  AUTHORS   Celniker,S.E., Adams,M.D., Kronmiller,B., Wan,K.H., Holt,R.A.,
            Evans,C.A., Gocayne,J.D., Amanatides,P.G., Brandon,R.C., Rogers,Y.,
            Banzon,J., An,H., Baldwin,D., Banzon,J., Beeson,K.Y., Busam,D.A.,
            Carlson,J.W., Center,A., Champe,M., Davenport,L.B., Dietz,S.M.,
            Dodson,K., Dorsett,V., Doup,L.E., Doyle,C., Dresnek,D., Farfan,D.,
            Ferriera,S., Frise,E., Galle,R.F., Garg,N.S., George,R.A.,
            Gonzalez,M., Houck,J., Hoskins,R.A., Hostin,D., Howland,T.J.,
            Ibegwam,C., Jalali,M., Kruse,D., Li,P., Mattei,B., Moshrefi,A.,
            McIntosh,T.C., Moy,M., Murphy,B., Nelson,C., Nelson,K.A., Nunoo,J.,
            Pacleb,J., Paragas,V., Park,S., Patel,S., Pfeiffer,B.,
            Phouanenavong,S., Pittman,G.S., Puri,V., Richards,S., Scheeler,F.,
            Stapleton,M., Strong,R., Svirskas,R., Tector,C., Tyler,D.,
            Williams,S.M., Zaveri,J.S., Smith,H.O., Venter,J.C. and Rubin,G.M.
  TITLE     Sequencing of Drosophila melanogaster genome
  JOURNAL   Unpublished
REFERENCE   4  (bases 1 to 1237870)
  AUTHORS   FlyBase.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-SEP-2002) University of California Berkeley, 539 Life
            Sciences Addition, Berkeley, CA 94720, USA
REFERENCE   5  (bases 1 to 1237870)
  AUTHORS   Adams,M.D., Celniker,S.E., Gibbs,R.A., Rubin,G.M. and Venter,C.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-MAR-2000) Celera Genomics, 45 West Gude Drive,
            Rockville, MD 20850, USA
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AE014135.
FEATURES             Location/Qualifiers
     source          1..1237870
                     /organism="Drosophila melanogaster"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:7227"
                     /chromosome="4"
                     /note="genotype: y[1]; cn[1] bw[1] sp[1]; Rh6[1]"
     repeat_region   complement(638..1719)
                     /locus_tag="TE20395"
                     /map="102A1-102A1"
                     /transposon="baggins{}1471"
                     /db_xref="FLYBASE:FBti0020395"
     repeat_region   complement(2554..4264)
                     /locus_tag="TE20396"
                     /map="102A1-102A1"
                     /transposon="Rt1c{}1472"
                     /db_xref="FLYBASE:FBti0020396"
     repeat_region   4886..11664
                     /locus_tag="TE20397"
                     /map="102A1-102A1"
                     /transposon="GATE{}1473"
                     /db_xref="FLYBASE:FBti0020397"
     repeat_region   complement(11691..12255)
                     /locus_tag="TE20398"
                     /map="102A1-102A1"
                     /transposon="GATE{}1474"
                     /db_xref="FLYBASE:FBti0020398"
     repeat_region   12291..13244
                     /locus_tag="TE20399"
                     /map="102A1-102A1"
                     /transposon="GATE{}1475"
                     /db_xref="FLYBASE:FBti0020399"
     repeat_region   complement(13288..13761)
                     /locus_tag="TE20400"
                     /map="102A1-102A1"
                     /transposon="1360{}1476"
                     /db_xref="FLYBASE:FBti0020400"
     repeat_region   complement(17702..18272)
                     /locus_tag="TE20401"
                     /map="102A1-102A1"
                     /transposon="Rt1b{}1477"
                     /db_xref="FLYBASE:FBti0020401"
     gene            complement(22335..23205)
                     /locus_tag="CG32013"
                     /map="102A1-102A1"
                     /db_xref="FLYBASE:FBgn0052013"
                     /db_xref="LocusID:317821"
     mRNA            complement(join(22335..22528,22617..23205))
                     /locus_tag="CG32013"
                     /product="CG32013-RA"
                     /transcript_id="NM_166710.1"
                     /db_xref="GI:24638483"
                     /db_xref="FLYBASE:FBgn0052013"
                     /db_xref="LocusID:317821"
     CDS             complement(join(22335..22528,22617..23205))
                     /locus_tag="CG32013"
                     /codon_start=1
                     /protein_id="NP_726514.1"
                     /db_xref="GI:24638484"
                     /db_xref="FLYBASE:FBgn0052013"
                     /db_xref="LocusID:317821"
     gene            24068..25621
                     /locus_tag="CG17923"
                     /note="synonym: JYalpha"
                     /map="102A1-102A1"
                     /db_xref="FLYBASE:FBgn0040037"
                     /db_xref="LocusID:49962"
     mRNA            join(24068..24477,24979..25153,25218..25450,25501..25621)
                     /locus_tag="CG17923"
                     /product="CG17923-RA"
                     /transcript_id="NM_143896.2"
                     /db_xref="GI:24638485"
                     /db_xref="FLYBASE:FBgn0040037"
                     /db_xref="LocusID:49962"
     CDS             join(24134..24477,24979..25153,25218..25450,25501..25621)
                     /locus_tag="CG17923"
                     /codon_start=1
                     /protein_id="NP_652153.2"
                     /db_xref="GI:24638486"
                     /db_xref="FLYBASE:FBgn0040037"
                     /db_xref="LocusID:49962"
     gene            complement(26482..34110)
                     /locus_tag="CG32011"
                     /map="102A1-102A1"
                     /db_xref="FLYBASE:FBgn0052011"
                     /db_xref="LocusID:317820"
     mRNA            complement(join(26482..26667,27167..27349,28371..28609,
                     28966..29301,29356..30391,30551..31625,31703..32391,
                     33949..34110))
                     /locus_tag="CG32011"
                     /product="CG32011-RA"
                     /transcript_id="NM_166711.2"
                     /db_xref="GI:28558763"
                     /db_xref="FLYBASE:FBgn0052011"
                     /db_xref="LocusID:317820"
     CDS             complement(join(26482..26667,27167..27349,28371..28609,
                     28966..29301,29356..30391,30551..31625,31703..32391,
                     33949..34110))
                     /locus_tag="CG32011"
                     /codon_start=1
                     /protein_id="NP_726515.2"
                     /db_xref="GI:28558764"
                     /db_xref="FLYBASE:FBgn0052011"
                     /db_xref="LocusID:317820"
     repeat_region   34275..40713
                     /locus_tag="TE20402"
                     /map="102A1-102A2"
                     /transposon="McClintock{}1478"
                     /db_xref="FLYBASE:FBti0020402"
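The feature table depends on fixed columns: a feature key starts in column 6, its location and qualifiers start in column 22, and a long location such as the CG32011 mRNA in (4) wraps onto continuation lines. A stdlib sketch that rejoins wrapped locations, assuming that standard column layout; `row` and `feature_locations` are hypothetical helpers, not Biopython's parser.

```python
# Hypothetical sketch, not Biopython's parser: pull (key, location) pairs out of
# a GenBank feature table that uses the standard columns (key from column 6,
# location and qualifiers from column 22).


def row(key, text):
    """Lay out one feature-table line: key in columns 6-21, text from column 22."""
    return " " * 5 + key.ljust(16) + text


def feature_locations(lines):
    """Return [key, location] pairs, rejoining locations that wrap onto new lines."""
    features = []
    in_location = False
    for line in lines:
        if line.startswith(" " * 5) and line[5:6].strip():
            # A non-blank column 6 starts a new feature.
            features.append([line[5:21].strip(), line[21:].strip()])
            in_location = True
        elif line.startswith(" " * 21) and features:
            content = line[21:].strip()
            if content.startswith("/"):
                in_location = False  # qualifiers always follow the location
            elif in_location:
                features[-1][1] += content  # continuation of a wrapped location
    return features


# Lines taken from the CG32011 entries in (4), laid out in standard columns.
SAMPLE = [
    "FEATURES             Location/Qualifiers",
    row("gene", "complement(26482..34110)"),
    row("", '/locus_tag="CG32011"'),
    row("mRNA", "complement(join(26482..26667,27167..27349,28371..28609,"),
    row("", "28966..29301,29356..30391,30551..31625,31703..32391,"),
    row("", "33949..34110))"),
    row("", '/locus_tag="CG32011"'),
]

for key, location in feature_locations(SAMPLE):
    print(key, location)
```

Wrapped qualifier values (for example a long /note) are skipped by this sketch, since only the location is of interest here.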