[Bioperl-l] found the error tarp in load_seqdatabase.pl

snoze pa snoze.pa at gmail.com
Thu Jan 31 18:46:24 UTC 2008


The link I sent was for the tutorial I was following on that website.
A typical example of a record that fails is the following, which has an
*xrefs (non-sequence databases):* line in its DBSOURCE block.
Thanks
s
LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
            protein); prM; Peptide pr; Small envelope protein M (Matrix
            protein); Envelope protein E; Non-structural protein 1 (NS1)].
ACCESSION   P27912
VERSION     P27912.1  GI:130422
DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
            class: standard.
            created: Aug 1, 1992.
            sequence updated: Aug 1, 1992.
            annotation updated: Jan 15, 2008.
            xrefs: D00502.1, BAA00394.1, B32401
            *xrefs (non-sequence databases):* HSSP:Q88653, SMR:P27912,
            GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
            InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
            InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
            Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
            Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
            reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
            Transmembrane; Viral nucleoprotein; Virion.
SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
  ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
            Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
            Flavivirus; Dengue virus group.
REFERENCE   1  (residues 1 to 792)
  AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
  TITLE     Genetic relatedness among structural protein genes of dengue 1
            virus strains
  JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
   PUBMED   2738579
  REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
            [FUNCTION] Protein C packages viral RNA to form a viral
            nucleocapsid, and promotes virion budding (By similarity).
            [FUNCTION] prM acts as a chaperone for envelope protein E during
            intracellular virion assembly by masking and inactivating envelope
            protein E fusion peptide. prM is matured in the last step of virion
            assembly, presumably to avoid catastrophic activation of the viral
            fusion peptide induced by the acidic pH of the trans-Golgi network.
            After cleavage by host furin, the pr peptide is released in the
            extracellular medium and small envelope protein M and envelope
            protein E homodimers are dissociated (By similarity).
            [FUNCTION] Envelope protein E binds cell surface receptor and is
            involved in membrane fusion between virion and target cell.
            Synthesized as an homodimer with prM which acts as a chaperone for
            envelope protein E. After cleavage of prM, envelope protein E
            dissociate from small envelope protein M and homodimerizes (By
            similarity).
            [FUNCTION] Non-structural protein 1 is slowly secreted from
            mammalian cells, but not from mosquito cells. Secreted form elicits
            protective immune response and plays an essential role in RNA
            replication. Soluble and membrane-associated NS1 may activate human
            complement and induce host vascular leakage. This effect might
            explain the clinical manifestations of dengue hemorrhagic fever and
            dengue shock syndrome (By similarity).
            [SUBUNIT] prM and envelope protein E form heterodimers in the
            endoplasmic reticulum and Golgi. Envelope protein E forms
            homodimers. NS1 forms homodimers as well as homohexamers when
            secreted. NS1 may interact with NS4A (By similarity).
            [SUBCELLULAR LOCATION] Note=The virion is assembled in the
            endoplasmic reticulum lumen, transported by vesicles to the Golgi,
            then transported again to the cell membrane where it is released
            outside the cell.
            [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
            [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
            [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
            Single-pass type I membrane protein (By similarity).
            [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
            Endoplasmic reticulum membrane; Peripheral membrane protein;
            Lumenal side (By similarity).
            [DOMAIN] Transmembrane domains of the small envelope protein M and
            envelope protein E contains an endoplasmic reticulum retention
            signals (By similarity).
            [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
            The nascent protein C contains a C-terminal hydrophobic domain that
            act as a signal sequence for translocation of prM into the lumen of
            the ER. Mature protein C is cleaved at a site upstream of this
            hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
            a host furin, releasing the mature small envelope protein M, and
            peptide pr (By similarity).
            [PTM] Envelope protein E and non-structural protein 1 are
            N-glycosylated (By similarity).
FEATURES             Location/Qualifiers
     source          1..792
                     /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
                     /specific_host="Aedes aegypti (Yellowfever mosquito)"
                     /specific_host="Homo sapiens (Human)"
                     /db_xref="taxon:11057"
     Protein         1..>792
                     /product="Genome polyprotein [Contains: Protein C"
     Region          1..101
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          1..100
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Protein C. /FTId=PRO_0000037884."
     Region          5..114
                     /region_name="Flavi_capsid"
                     /note="Flavivirus capsid protein C. Flaviviruses are small
                     enveloped viruses with virions comprised of 3 proteins
                     called C, M and E. Multiple copies of the C protein form
                     the nucleocapsid, which contains the ssRNA molecule;
                     pfam01003"
                     /db_xref="CDD:85176"
     Site            100..101
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by serine protease NS3 (By similarity)."
     Region          101..114
                     /region_name="Propeptide"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="ER anchor for the protein C, removed in mature form
                     by serine protease NS3. /FTId=PRO_0000037885."
     Region          102..122
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            114..115
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          115..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="prM. /FTId=PRO_0000264649."
     Region          115..205
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Peptide pr. /FTId=PRO_0000264650."
     Region          119..204
                     /region_name="Flavi_propep"
                     /note="Flavivirus polyprotein propeptide. The flaviviruses
                     are small enveloped animal viruses containing a single
                     positive strand genomic RNA. The genome encodes one large
                     ORF a polyprotein which undergos proteolytic processing
                     into mature viral peptide chains; pfam01570"
                     /db_xref="CDD:65376"
     Region          123..238
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            183
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Site            205..206
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host furin (By similarity)."
     Region          206..280
                     /region_name="Flavi_M"
                     /note="Flavivirus envelope glycoprotein M. Flaviviruses
                     are small enveloped viruses with virions comprised of 3
                     proteins called C, M and E. The envelope glycoprotein M is
                     made as a precursor, called prM; pfam01004"
                     /db_xref="CDD:85177"
     Region          206..280
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Small envelope protein M. /FTId=PRO_0000037886."
     Region          239..259
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          260..265
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          266..286
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Site            280..281
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          281..775
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Envelope protein E. /FTId=PRO_0000037887."
     Region          281..576
                     /region_name="Flavi_glycoprot"
                     /note="Flavivirus glycoprotein, central and dimerisation
                     domains; pfam00869"
                     /db_xref="CDD:85082"
     Bond            bond(283,310)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          287..725
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Bond            bond(340,401)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            347
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(354,385)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Bond            bond(372,396)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Site            433
                     /site_type="glycosylation"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="N-linked (GlcNAc...) (Potential)."
     Bond            bond(465,565)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          578..673
                     /region_name="Flavi_glycop_C"
                     /note="Flavivirus glycoprotein, immunoglobulin-like
                     domain; pfam02832"
                     /db_xref="CDD:66513"
     Bond            bond(582,613)
                     /bond_type="disulfide"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="By similarity."
     Region          726..746
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          747..752
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cytoplasmic (Potential)."
     Region          753..773
                     /region_name="Transmembrane region"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Potential."
     Region          774..>792
                     /region_name="Topological domain"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Extracellular (Potential)."
     Site            775..776
                     /site_type="cleavage"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="Cleavage; by host signal peptidase (By
                     similarity)."
     Region          776..>792
                     /region_name="Mature chain"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Non-structural protein 1. /FTId=PRO_0000037888."
ORIGIN
        1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
       61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
      121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
      181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
      241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
      301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
      361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
      421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
      481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
      541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
      601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
      661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
      721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
      781 inwkgkelkc gs
//
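
One way to check which records in a multi-record GenBank file carry the
suspect DBSOURCE line is a plain-text scan that needs no sequence parser.
The sketch below is written in Python only because it is self-contained;
the function names (split_records, locus_name, partition) are made up for
this illustration and are not part of BioPerl or BioSQL. It assumes every
record ends with a "//" line and that the literal text
"xrefs (non-sequence databases):" is what distinguishes the failing
records. It does not fix load_seqdatabase.pl; it only separates the two
groups so each can be loaded and tested on its own.

```python
import re
from typing import Iterator, List, Tuple

# Literal text reported in this thread as being present in the failing records.
MARKER = "xrefs (non-sequence databases):"


def split_records(text: str) -> Iterator[str]:
    """Yield each GenBank record, including its terminating '//' line."""
    current: List[str] = []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":
            yield "\n".join(current) + "\n"
            current = []
    # Trailing text without a closing '//' is ignored.


def locus_name(record: str) -> str:
    """Return the name field of the LOCUS line, or '' if there is none."""
    match = re.search(r"^LOCUS\s+(\S+)", record, flags=re.MULTILINE)
    return match.group(1) if match else ""


def partition(text: str) -> Tuple[List[str], List[str]]:
    """Split records into (with_marker, without_marker) lists."""
    with_marker: List[str] = []
    without_marker: List[str] = []
    for record in split_records(text):
        if MARKER in record:
            with_marker.append(record)
        else:
            without_marker.append(record)
    return with_marker, without_marker
```

Reading a file into a string, calling partition() on it, and writing the
two lists to separate files gives one file that should load cleanly and
one that should reproduce the crash, if the assumption about the marker
line holds.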


On Jan 31, 2008 7:12 AM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
>
> > Hi Hilmar,
> >
> >  After spending a lot of time I figured out the error. I am able to load
> > sequences if they do not have the following entry
> >
> > xrefs (non-sequence databases):
>
> Is this the literal value? I am asking because I can't find this in
> the file at
>
> http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
>
> which you said was giving you grief. So does the genbank file above
> now load, or how can I identify the critical line in there?
>
>        -hilmar
> >
> > If the GenBank sequence has this entry, the script load_seqdatabase.pl
> > crashes. I tried it on a couple of sequences and found that this line of
> > the GenBank format is the culprit. But this line is important, as it
> > contains a lot of information, so I am wondering how to solve this problem.
> >
> > Any help?
> >
> > Thanks in advance
> > s
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


