[Bioperl-l] found the error trap in load_seqdatabase.pl

snoze pa snoze.pa at gmail.com
Thu Jan 31 20:21:18 UTC 2008


Thanks Hilmar,

I also thought that they had been translated into GenBank format. My problem
is that I have downloaded tons of sequences from NCBI in gb format, and many
of the sequences in my flat file are in this format, so I am unable to load
them into my local database using the load_seqdatabase.pl script. So far I
get nothing but warnings and errors. Is there any solution to this problem?
Otherwise I will try to write some code to load all the sequences into the
local database. But it seems it should be easy to modify the parsing code so
that we can load these sequences.
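
As a stopgap until the parser is fixed, the flat file could be pre-processed
before running load_seqdatabase.pl. The following is only a minimal sketch in
Python (standard library only), not something confirmed in this thread. It
assumes that records end with a "//" line, that top-level GenBank keywords
start in column 1, and that the "xrefs (non-sequence databases):" entry is the
last sub-entry of DBSOURCE. All function names are made up for illustration.
Note that it throws away the cross-references on that line, so it is a
workaround rather than a fix.

```python
from typing import Iterable, Iterator, List

# The DBSOURCE sub-entry reported in this thread as triggering the failure.
XREF_MARKER = "xrefs (non-sequence databases):"


def split_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each GenBank record as a list of lines, ending with '//'."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.strip() == "//":
            yield record
            record = []
    # Keep a trailing, unterminated record only if it has real content.
    if any(l.strip() for l in record):
        yield record


def has_xref_line(record: List[str]) -> bool:
    """Return True if the record contains the problematic entry."""
    return any(XREF_MARKER in line for line in record)


def strip_xref_block(record: List[str]) -> List[str]:
    """Drop the marker line and its indented continuation lines.

    Assumption: the entry runs until the next line that starts in
    column 1 (the next top-level keyword, e.g. KEYWORDS, or '//').
    """
    cleaned: List[str] = []
    skipping = False
    for line in record:
        if skipping:
            if line[:1].strip():  # non-blank character in column 1
                skipping = False
            else:
                continue
        if XREF_MARKER in line:
            skipping = True
            continue
        cleaned.append(line)
    return cleaned


def clean_file(src_path: str, dst_path: str) -> int:
    """Write a copy of src_path without the entry; return records changed."""
    changed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for record in split_records(src):
            if has_xref_line(record):
                record = strip_xref_block(record)
                changed += 1
            dst.writelines(record)
    return changed
```

Calling clean_file("in.gb", "out.gb") (both file names hypothetical) would
write a copy with the entry removed, which can then be tried with
load_seqdatabase.pl. Whether the cleaned records actually load still needs to
be checked.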


> format (which usually lacks DBSOURCE, for example)

I think that if the three-dimensional structure of the protein is known, then
DBSOURCE is common in the NCBI gb format. I agree with you about the joys of
integration.

The link was related to the tutorial I was using. You can close it.

Thanks for looking into the matter.
 s

On Jan 31, 2008 2:10 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

> I see. Note that the sequence below is really a UniProt sequence that has
> been reformatted into GenBank format, and hence isn't in your typical
> GenBank sequence format (which usually lacks DBSOURCE, for example). (The
> joys of data integration.)
> If you load the same sequence from UniProt, does it still fail to parse or
> to load?
>
> Also, does this mean that the sequences at the link you sent load w/o
> error, or not? I.e., can I close that issue report, or is there a bug in
> bioperl-db?
>
> -hilmar
>
> On Jan 31, 2008, at 1:46 PM, snoze pa wrote:
>
> The link I sent was related to my tutorial; I was following that website.
> A typical example is the following record, which has an
> "xrefs (non-sequence databases):" line (marked with asterisks below).
> Thanks,
> s
> LOCUS       P27912                   792 aa            linear   VRL 15-JAN-2008
> DEFINITION  Genome polyprotein [Contains: Protein C (Core protein) (Capsid
>             protein); prM; Peptide pr; Small envelope protein M (Matrix
>             protein); Envelope protein E; Non-structural protein 1 (NS1)].
> ACCESSION   P27912
> VERSION     P27912.1  GI:130422
> DBSOURCE    swissprot: locus POLG_DEN1A, accession P27912;
>             class: standard.
>             created: Aug 1, 1992.
>             sequence updated: Aug 1, 1992.
>             annotation updated: Jan 15, 2008.
>             xrefs: D00502.1, BAA00394.1, B32401
>             *xrefs (non-sequence databases):* HSSP:Q88653, SMR:P27912,
>             GO:0005789, InterPro:IPR011999, InterPro:IPR013754,
>             InterPro:IPR001122, InterPro:IPR000069, InterPro:IPR001157,
>             InterPro:IPR002535, InterPro:IPR000336, Gene3D:G3DSA:2.60.98.10,
>             Gene3D:G3DSA:2.60.40.350, Pfam:PF01003, Pfam:PF02832, Pfam:PF00869,
>             Pfam:PF01004, Pfam:PF00948, Pfam:PF01570
> KEYWORDS    Capsid protein; Cleavage on pair of basic residues; Endoplasmic
>             reticulum; Envelope protein; Glycoprotein; Membrane; Secreted;
>             Transmembrane; Viral nucleoprotein; Virion.
> SOURCE      Dengue virus 1 Thailand/AHF 82-80/1980
>   ORGANISM  Dengue virus 1 Thailand/AHF 82-80/1980
>             Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
>             Flavivirus; Dengue virus group.
> REFERENCE   1  (residues 1 to 792)
>   AUTHORS   Chu,M.C., O'Rourke,E.J. and Trent,D.W.
>   TITLE     Genetic relatedness among structural protein genes of dengue 1
>             virus strains
>   JOURNAL   J. Gen. Virol. 70 (PT 7), 1701-1712 (1989)
>    PUBMED   2738579
>   REMARK    NUCLEOTIDE SEQUENCE [GENOMIC RNA].
> COMMENT     On May 27, 2005 this sequence version replaced gi:418950.
>             [FUNCTION] Protein C packages viral RNA to form a viral
>             nucleocapsid, and promotes virion budding (By similarity).
>             [FUNCTION] prM acts as a chaperone for envelope protein E during
>             intracellular virion assembly by masking and inactivating envelope
>             protein E fusion peptide. prM is matured in the last step of virion
>             assembly, presumably to avoid catastrophic activation of the viral
>             fusion peptide induced by the acidic pH of the trans-Golgi network.
>             After cleavage by host furin, the pr peptide is released in the
>             extracellular medium and small envelope protein M and envelope
>             protein E homodimers are dissociated (By similarity).
>             [FUNCTION] Envelope protein E binds cell surface receptor and is
>             involved in membrane fusion between virion and target cell.
>             Synthesized as an homodimer with prM which acts as a chaperone for
>             envelope protein E. After cleavage of prM, envelope protein E
>             dissociate from small envelope protein M and homodimerizes (By
>             similarity).
>             [FUNCTION] Non-structural protein 1 is slowly secreted from
>             mammalian cells, but not from mosquito cells. Secreted form elicits
>             protective immune response and plays an essential role in RNA
>             replication. Soluble and membrane-associated NS1 may activate human
>             complement and induce host vascular leakage. This effect might
>             explain the clinical manifestations of dengue hemorrhagic fever and
>             dengue shock syndrome (By similarity).
>             [SUBUNIT] prM and envelope protein E form heterodimers in the
>             endoplasmic reticulum and Golgi. Envelope protein E forms
>             homodimers. NS1 forms homodimers as well as homohexamers when
>             secreted. NS1 may interact with NS4A (By similarity).
>             [SUBCELLULAR LOCATION] Note=The virion is assembled in the
>             endoplasmic reticulum lumen, transported by vesicles to the Golgi,
>             then transported again to the cell membrane where it is released
>             outside the cell.
>             [SUBCELLULAR LOCATION] Protein C: Virion (By similarity).
>             [SUBCELLULAR LOCATION] Peptide pr: Secreted (By similarity).
>             [SUBCELLULAR LOCATION] Small envelope protein M: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Envelope protein E: Virion membrane;
>             Single-pass type I membrane protein (By similarity).
>             [SUBCELLULAR LOCATION] Non-structural protein 1: Secreted.
>             Endoplasmic reticulum membrane; Peripheral membrane protein;
>             Lumenal side (By similarity).
>             [DOMAIN] Transmembrane domains of the small envelope protein M and
>             envelope protein E contains an endoplasmic reticulum retention
>             signals (By similarity).
>             [PTM] Specific enzymatic cleavages in vivo yield mature proteins.
>             The nascent protein C contains a C-terminal hydrophobic domain that
>             act as a signal sequence for translocation of prM into the lumen of
>             the ER. Mature protein C is cleaved at a site upstream of this
>             hydrophobic domain by NS3. prM is cleaved in post-Golgi vesicles by
>             a host furin, releasing the mature small envelope protein M, and
>             peptide pr (By similarity).
>             [PTM] Envelope protein E and non-structural protein 1 are
>             N-glycosylated (By similarity).
> FEATURES             Location/Qualifiers
>      source          1..792
>                      /organism="Dengue virus 1 Thailand/AHF 82-80/1980"
>                      /specific_host="Aedes aegypti (Yellowfever mosquito)"
>                      /specific_host="Homo sapiens (Human)"
>                      /db_xref="taxon:11057"
>      Protein         1..>792
>                      /product="Genome polyprotein [Contains: Protein C"
>      Region          1..101
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          1..100
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Protein C. /FTId=PRO_0000037884."
>      Region          5..114
>                      /region_name="Flavi_capsid"
>                      /note="Flavivirus capsid protein C. Flaviviruses are small
>                      enveloped viruses with virions comprised of 3 proteins
>                      called C, M and E. Multiple copies of the C protein form
>                      the nucleocapsid, which contains the ssRNA molecule;
>                      pfam01003"
>                      /db_xref="CDD:85176"
>      Site            100..101
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by serine protease NS3 (By similarity)."
>      Region          101..114
>                      /region_name="Propeptide"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="ER anchor for the protein C, removed in mature form
>                      by serine protease NS3. /FTId=PRO_0000037885."
>      Region          102..122
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            114..115
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          115..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="prM. /FTId=PRO_0000264649."
>      Region          115..205
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Peptide pr. /FTId=PRO_0000264650."
>      Region          119..204
>                      /region_name="Flavi_propep"
>                      /note="Flavivirus polyprotein propeptide. The flaviviruses
>                      are small enveloped animal viruses containing a single
>                      positive strand genomic RNA. The genome encodes one large
>                      ORF a polyprotein which undergos proteolytic processing
>                      into mature viral peptide chains; pfam01570"
>                      /db_xref="CDD:65376"
>      Region          123..238
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            183
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Site            205..206
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host furin (By similarity)."
>      Region          206..280
>                      /region_name="Flavi_M"
>                      /note="Flavivirus envelope glycoprotein M. Flaviviruses
>                      are small enveloped viruses with virions comprised of 3
>                      proteins called C, M and E. The envelope glycoprotein M is
>                      made as a precursor, called prM; pfam01004"
>                      /db_xref="CDD:85177"
>      Region          206..280
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Small envelope protein M. /FTId=PRO_0000037886."
>      Region          239..259
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          260..265
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          266..286
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Site            280..281
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          281..775
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Envelope protein E. /FTId=PRO_0000037887."
>      Region          281..576
>                      /region_name="Flavi_glycoprot"
>                      /note="Flavivirus glycoprotein, central and dimerisation
>                      domains; pfam00869"
>                      /db_xref="CDD:85082"
>      Bond            bond(283,310)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          287..725
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Bond            bond(340,401)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            347
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(354,385)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Bond            bond(372,396)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Site            433
>                      /site_type="glycosylation"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="N-linked (GlcNAc...) (Potential)."
>      Bond            bond(465,565)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          578..673
>                      /region_name="Flavi_glycop_C"
>                      /note="Flavivirus glycoprotein, immunoglobulin-like
>                      domain; pfam02832"
>                      /db_xref="CDD:66513"
>      Bond            bond(582,613)
>                      /bond_type="disulfide"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="By similarity."
>      Region          726..746
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          747..752
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cytoplasmic (Potential)."
>      Region          753..773
>                      /region_name="Transmembrane region"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Potential."
>      Region          774..>792
>                      /region_name="Topological domain"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Extracellular (Potential)."
>      Site            775..776
>                      /site_type="cleavage"
>                      /inference="non-experimental evidence, no additional
>                      details recorded"
>                      /note="Cleavage; by host signal peptidase (By
>                      similarity)."
>      Region          776..>792
>                      /region_name="Mature chain"
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>                      /note="Non-structural protein 1. /FTId=PRO_0000037888."
> ORIGIN
>         1 mnnqrkktgn psfnmlkrar nrvstgsqla krfskgllsg qgpmklvmaf vaflrflaip
>        61 ptagilkrwg sfkkngainv lrgfrkeisn mlnimnrrrr svtmilmllp talafhlttr
>       121 ggeptlivsk qergksllfk tsagvnmctl iamdlgelce dtmtykcprm teaepddvdc
>       181 wcnatdtwvt ygtcsqtgeh rrdkrsvald phvglgletr tetwmssega wkqiqkvetw
>       241 alrhpgftvi glflahaigt sitqkgiifi llmlvtpsma mrcvgignrd fveglsgatw
>       301 vdvvlehgsc vttmaknkpt ldiellktev tnpavlrklc ieakisnttt dsrcptqgea
>       361 tlveeqdtnf vcrrtfvdrg wgngcglfgk gslitcakfk cvtklegkiv qyenlkysvi
>       421 vtvhtgdqhq vgnettehgt iatitpqapt seiqltdyga ltldcsprtg ldfnrvvllt
>       481 mkkkswlvhk qwfldlplpw tsgastsqet wnrqdllvtf ktahakkqev vvlgsqegam
>       541 htaltgatei qtsgtttifa ghlkcrlkmd kltlkgvsyv mctgsfklek evaetqhgtv
>       601 lvqvkyegtd apckipfssq dekgvtqngr litanpivid kekpvnieae ppfgesyivv
>       661 gagekalkls wfkkgssigk mfeatargar rmailgdtaw dfgsiggvft svgklihqif
>       721 gtaygvlfsg vswtmkigig illtwlglns rstslsmtci avgmvtlylg vmvqadsgcv
>       781 inwkgkelkc gs
> //
>
>
> On Jan 31, 2008 7:12 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> >
> > On Jan 30, 2008, at 2:30 PM, snoze pa wrote:
> >
> > > Hi Hilmar,
> > >
> > >  After spending a lot of time, I figured out the error. I am able to
> > > load sequences if they do not have the following entry:
> > >
> > > xrefs (non-sequence databases):
> >
> > Is this the literal value? I am asking because I can't find this in
> > the file at
> >
> > http://biopython.open-bio.org/SRC/biopython/Tests/GenBank/cor6_6.gb
> >
> > which you said was giving you grief. So does the genbank file above
> > now load, or how can I identify the critical line in there?
> >
> >        -hilmar
> > >
> > > If a GenBank sequence has this entry, then the load_seqdatabase.pl
> > > script crashes. I tried it on a couple of sequences and found that this
> > > line in the GenBank format is the culprit. But the line is important, as
> > > it contains lots of information, so I am wondering how to solve this
> > > problem.
> > >
> > > Any help?
> > >
> > > Thanks in advance
> > > s
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
