[Bioperl-l] found the error trap in load_seqdatabase.pl

Hilmar Lapp hlapp at gmx.net
Thu Jan 31 20:10:35 UTC 2008


I see. Note that the sequence below is really a UniProt sequence that
has been reformatted into GenBank format, and hence isn't in your
typical GenBank sequence format (which usually lacks DBSOURCE, for
example). (The joys of data integration.)

If you load the same sequence from UniProt, does it still fail to  
parse or to load?

Also, does this mean that the sequences at the link you sent load
without error? I.e., can I close that issue report, or is there a bug
in bioperl-db?
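
To narrow down which records carry the suspect line, here is a minimal
sketch that scans a GenBank flat file and reports every record containing
"xrefs (non-sequence databases):". It is plain Python with no BioPerl
involved, chosen only to keep it dependency-free; the function name
records_with_marker and the second sample record (EXAMPLE1) are made up
for illustration, and it assumes records start at LOCUS and end at "//".

```python
import io

# The DBSOURCE sub-line reported to make load_seqdatabase.pl fail.
XREF_MARKER = "xrefs (non-sequence databases):"


def records_with_marker(handle, marker=XREF_MARKER):
    """Yield (locus_name, line_number) for each line containing marker.

    Hypothetical helper, not part of BioPerl or bioperl-db. Assumes each
    record starts with a LOCUS line and ends with a '//' line.
    """
    locus = None
    for lineno, line in enumerate(handle, start=1):
        if line.startswith("LOCUS"):
            fields = line.split()
            locus = fields[1] if len(fields) > 1 else None
        elif line.startswith("//"):
            locus = None
        elif marker in line:
            yield locus, lineno


# Two abbreviated records: the first is cut down from the entry quoted in
# this thread, the second (EXAMPLE1) is invented and has no DBSOURCE block.
sample = """\
LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
            xrefs: D00502.1, BAA00394.1, B32401
            xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
//
LOCUS       EXAMPLE1                  10 aa            linear   VRL 15-JAN-2008
ORIGIN
        1 mnnqrkktgn
//
"""

hits = list(records_with_marker(io.StringIO(sample)))
print(hits)
```

For a real file, pass an open file handle instead of the io.StringIO
wrapper. Records it reports can then be loaded one at a time to see
whether the failure is in parsing or in loading.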

	-hilmar

On Jan 31, 2008, at 1:46 PM, snoze pa wrote:

> The link I sent was related to my tutorial; I was following that
> website. A typical example is the following record, which has an
> "xrefs (non-sequence databases):" line.
> thanks
> s
>
> LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
> DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
>             protein); prM; Peptide pr; Small envelope protein M (Matrix
>             protein); Envelope protein E; Non-structural protein 1 (NS1)].
> ACCESSION   P27912
> VERSION     P27912.1  GI:130422
> DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
>             class: standard.
>             created: Aug 1, 1992.
>             sequence updated: Aug 1, 1992.
>             annotation updated: Jan 15, 2008.
>             xrefs: D00502.1, BAA00394.1, B32401
>             xrefs (non-sequence databases): HSSP:Q88653, SMR:P27912,
>             GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
>             InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
>             InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
>             Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
>             Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
> KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
>             reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
>             Transmembrane; Viral nucleoprotein; Virion.
> SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
>   ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
>             Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
>             Flavivirus; Dengue virus group.
> REFERENCE   1  (residues 1 to 792)
>   AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
>   TITLE     Genetic relatedness among structural protein genes of dengue 1
>             virus strains
>   JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
>    PUBMED   2738579
>   REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
> COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
>             [FUNCTION] Protein C packages viral RNA to form a viral
>             nucleocapsid, and promotes virion budding (By similarity).
>             [FUNCTION] prM acts as a chaperone for envelope protein E during
>             intracellular virion assembly by masking and inactivating envelope
>             protein E fusion peptide. prM is matured in the last step of virion
>             assembly, presumably to avoid catastrophic activation of the viral
>             fusion peptide induced by the acidic pH of the trans-Golgi network.
>             After cleavage by host furin, the pr peptide is released in the
>             extracellular medium and small envelope protein M and envelope
>             protein E homodimers are dissociated (By similarity).
>             [FUNCTION] Envelope protein E binds cell surface receptor and is
>             involved in membrane fusion between virion and target cell.
>             Synthesized as an homodimer with prM which acts as a chaperone for
>             envelope protein E. After cleavage of prM, envelope protein E
>             dissociate from small envelope protein M and homodimerizes (By
>             similarity).
>             [FUNCTION] Non-structural protein 1 is slowly secreted from
>             mammalian cells, but not from mosquito cells. Secreted form elicits
>             protective immune response and plays an essential role in RNA
>             replication. Soluble and membrane-associated NS1 may activate human
>             complement and induce host vascular leakage. This effect might
>             explain the clinical manifestations of dengue hemorrhagic fever and
>             dengue shock syndrome (By similarity).
>             [SUBUNIT] prM and envelope protein E form heterodimers in the
>             endoplasmic reticulum and Golgi. Envelope protein E forms
>             homodimers. NS1 forms homodimers as well as homohexamers when
>             secreted. NS1 may interact with NS4A (By similarity).
>             [SUBCELLULAR LOCATION] Note=The virion is assembled in the
>             endoplasmic reticulum lumen, transported by vesicles to the Golgi,
>             then transported again to the cell membrane where it is released
>             outside the cell.
>             [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
>             [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
>             [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
>             Endoplasmic reticulum membrane; Peripheral membrane protein;
>             Lumenal side (By similarity).
>             [DOMAIN] Transmembrane domains of the small envelope protein M and
>             envelope protein E contains an endoplasmic reticulum retention
>             signals (By similarity).
>             [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
>             The nascent protein C contains a C-terminal hydrophobic domain that
>             act as a signal sequence for translocation of prM into the lumen of
>             the ER. Mature protein C is cleaved at a site upstream of this
>             hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
>             a host furin, releasing the mature small envelope protein M, and
>             peptide pr (By similarity).
>             [PTM] Envelope protein E and non-structural protein 1 are
>             N-glycosylated (By similarity).
> FEATURES             Location/Qualifiers
>      source          1..792
>                      /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
>                      /specific_host="Aedes aegypti (Yellowfever mosquito)"
>                      /specific_host="Homo sapiens (Human)"
>                      /db_xref="taxon:11057"
>      Protein         1..>792
>                      /product="Genome polyprotein [Contains: Protein C"
>      Region          1..101
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          1..100
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Protein C. /FTId=PRO_0000037884."
>      Region          5..114
>                      /region_name="Flavi_capsid"
>                      /note="Flavivirus capsid protein C. Flaviviruses are small
>                      enveloped viruses with virions comprised of 3 proteins
>                      called C, M and E. Multiple copies of the C protein form
>                      the nucleocapsid, which contains the ssRNA molecule;
>                      pfam01003"
>                      /db_xref="CDD:85176"
>      Site            100..101
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by serine protease NS3 (By similarity)."
>      Region          101..114
>                      /region_name="Propeptide"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="ER anchor for the protein C, removed in mature form
>                      by serine protease NS3. /FTId=PRO_0000037885."
>      Region          102..122
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            114..115
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          115..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="prM. /FTId=PRO_0000264649."
>      Region          115..205
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Peptide pr. /FTId=PRO_0000264650."
>      Region          119..204
>                      /region_name="Flavi_propep"
>                      /note="Flavivirus polyprotein propeptide. The flaviviruses
>                      are small enveloped animal viruses containing a single
>                      positive strand genomic RNA. The genome encodes one large
>                      ORF a polyprotein which undergos proteolytic processing
>                      into mature viral peptide chains; pfam01570"
>                      /db_xref="CDD:65376"
>      Region          123..238
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            183
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Site            205..206
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host furin (By similarity)."
>      Region          206..280
>                      /region_name="Flavi_M"
>                      /note="Flavivirus envelope glycoprotein M. Flaviviruses
>                      are small enveloped viruses with virions comprised of 3
>                      proteins called C, M and E. The envelope glycoprotein M is
>                      made as a precursor, called prM; pfam01004"
>                      /db_xref="CDD:85177"
>      Region          206..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Small envelope protein M. /FTId=PRO_0000037886."
>      Region          239..259
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          260..265
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          266..286
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            280..281
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          281..775
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Envelope protein E. /FTId=PRO_0000037887."
>      Region          281..576
>                      /region_name="Flavi_glycoprot"
>                      /note="Flavivirus glycoprotein, central and dimerisation
>                      domains; pfam00869"
>                      /db_xref="CDD:85082"
>      Bond            bond(283,310)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          287..725
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Bond            bond(340,401)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            347
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(354,385)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Bond            bond(372,396)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            433
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(465,565)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          578..673
>                      /region_name="Flavi_glycop_C"
>                      /note="Flavivirus glycoprotein, immunoglobulin-like
>                      domain; pfam02832"
>                      /db_xref="CDD:66513"
>      Bond            bond(582,613)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          726..746
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          747..752
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          753..773
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          774..>792
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            775..776
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          776..>792
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Non-structural protein 1. /FTId=PRO_0000037888."
> ORIGIN
>         1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
>        61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
>       121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
>       181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
>       241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
>       301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
>       361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
>       421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
>       481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
>       541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
>       601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
>       661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
>       721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
>       781 inwkgkelkc gs
> //
>
>
> On Jan 31, 2008 7:12 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
>
> > Hi Hilmar,
> >
> > After spending a lot of time I figured out the error. I am able to
> > load sequences if they do not have the following entry:
> >
> > xrefs (non-sequence databases):
>
> Is this the literal value? I am asking because I can't find this in
> the file at
>
> http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
>
> which you said was giving you grief. So does the genbank file above
> now load, or how can I identify the critical line in there?
>
>        -hilmar
> >
> > If a GenBank sequence has this entry, the script load_seqdatabase.pl
> > crashes. I tried it on a couple of sequences and found that this is
> > the culprit line in the GenBank format. But this line is important,
> > as it contains a lot of information, so I am wondering how to solve
> > this problem.
> >
> > Any help?
> >
> > Thanks in advance
> > s
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================