[BioRuby] Parsing a file in Swissprot format

Urban Hafner urban at bettong.net
Fri Dec 2 08:00:01 EST 2005


Hi everybody,
I'm new to BioRuby, and I think I'm doing something wrong when parsing a
file in Swissprot format. I'm trying to extract the sequence from it,
like this:

require 'bio'

entry = Bio::SPTR.new(File.read(f))  # f holds the path to the entry file
p entry.sq

But that doesn't work; it gives me this error message:

/home/users/hafner/lib/site_ruby/1.8/bio/db/embl/sptr.rb:706:in `sq':
Invalid SQ Line:  (RuntimeError)
'AAGCTTAATGTATATAATCTTTTAGAGGTAAAATCTACAGCCAGCAAAAGTCATGGTAAA
TATTCTTTGACTGAACTCTCACTAAACTCCTCTAAATTATATGTCATATTAACTGGTTAA
ATTAATATAAATTTGTGACATGACCTTAACTGGTTAGGTAGGATATTTTTCTTCATGCAA
AAATATGACTAATAATAATTTAGCACAAAAATATTTCCCAATACTTTAATTCTGTGATAG
AAAAATGTTTAACTCAGCTACTATAATCCCATAATTTTGAAAACTATTTATTAGCTTTTG
TGTTTGACCCTTCCCTAGCCAAAGGCAACTATTTAAGGACCCTTTAAAACTCTTGAAACT
ACTTTAGAGTC'
        from diplomarbeit/tools/smartdb-entries-without-sequence.rb:10

I'm not sure whether this is a bug in BioRuby (I'm using the version from
CVS) or whether the input file is faulty.
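For comparison: in a Swiss-Prot entry the SQ tag starts a single header line giving the length, molecular weight and CRC64 checksum, and the sequence follows on untagged lines. In the input file quoted below, every SQ line carries sequence data directly. The sketch checks the first SQ line of a text against a simplified header pattern. That pattern is an approximation for illustration, not the regular expression BioRuby's sptr.rb actually uses. The helper name swissprot_sq_header? and the sample values are made up.

```ruby
# Simplified layout of a Swiss-Prot SQ header line (an approximation for
# illustration, not the exact pattern used inside BioRuby's sptr.rb).
SWISSPROT_SQ_HEADER = /\ASQ   SEQUENCE\s+(\d+) AA;\s+(\d+) MW;\s+(\h{16}) CRC64;/

# Hypothetical helper: true if the first line tagged "SQ" looks like a
# Swiss-Prot sequence header, false otherwise (also when no SQ line exists).
def swissprot_sq_header?(text)
  first_sq = text.each_line.find { |line| line.start_with?("SQ") }
  !first_sq.nil? && SWISSPROT_SQ_HEADER.match?(first_sq)
end

# Made-up Swiss-Prot-style sample: header line, then an untagged sequence line.
swissprot = "SQ   SEQUENCE   256 AA;  29735 MW;  B4840739BF7D4121 CRC64;\n" \
            "     MAFSAEDVLK EYDRRRRMEA LLLSLYYPND\n"
# First SQ line of the entry quoted in this post: sequence right after the tag.
smartdb   = "SQ   AAGCTTAATGTATATAATCTTTTAGAGGTAAAATCTACAGCCAGCAAAAGTCATGGTAAA\n"

p swissprot_sq_header?(swissprot)  # => true
p swissprot_sq_header?(smartdb)    # => false
```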

Does anybody have a clue what I'm doing wrong here?

Cheers, Urban

Here's my input file:

AC   SM0000001
XX   
DT   1.1.1999 00:00:00 (created); ili
DT   8.12.2004 12:49:00 (updated); ili2
XX   
NA   MOUSE$kappa-MAR
XX   
OS   mouse, Mus spec.
OC   eukaryota; animalia; metazoa; chordata; vertebrata;
OC   tetrapoda; mammalia; eutheria; rodentia; myomorpha; muridae;
OC   murinae
XX   
HO   human, rabbit [2]
XX   
SZ   371 bp
XX   
DE   G000538; immunoglobulin kappa light chain
DP   Direction: 3'; Pos 1: ATG
DN   Internal: y; 
DC   between joining and constant regions [1]; ~200 bp
DC   upstream of the kappa enhancer [1]
XX   
SQ   AAGCTTAATGTATATAATCTTTTAGAGGTAAAATCTACAGCCAGCAAAAGTCATGGTAAA
SQ   TATTCTTTGACTGAACTCTCACTAAACTCCTCTAAATTATATGTCATATTAACTGGTTAA
SQ   ATTAATATAAATTTGTGACATGACCTTAACTGGTTAGGTAGGATATTTTTCTTCATGCAA
SQ   AAATATGACTAATAATAATTTAGCACAAAAATATTTCCCAATACTTTAATTCTGTGATAG
SQ   AAAAATGTTTAACTCAGCTACTATAATCCCATAATTTTGAAAACTATTTATTAGCTTTTG
SQ   TGTTTGACCCTTCCCTAGCCAAAGGCAACTATTTAAGGACCCTTTAAAACTCTTGAAACT
SQ   ACTTTAGAGTC
SC   [7]
XX   
FT   2 - 11: cleavage by topoisomerase II [3]
FT   2 - 15: deleted in plasmacytoma PC 7183 [3]
FT   5 - 14: cleavage by topoisomerase II [3]
FT   5 - 14: 5'-recombination junction [3]
FT   8 - 17: cleavage by topoisomerase II [3]
FT   10 - 19: cleavage by topoisomerase II [3]
FT   32 - 41: cleavage by topoisomerase II [3]
FT   53 - 62: cleavage by Drosophila topoisomerase II only
FT   [3]
FT   68 - 77: cleavage by topoisomerase II [3]
FT   69 - 78: cleavage by topoisomerase II [3]
FT   73 - 82: cleavage by topoisomerase II [3]
FT   98 - 107: cleavage by Drosophila topoisomerase II only
FT   [3]
FT   147 - 156: cleavage by topoisomerase II [3]
FT   163 - 284: confers MAR-like features upon any DNA when
FT   contiguously reiterated in the same molecule
FT   [7]
FT   164 - 170: similar motif found in human PARP MAR
FT   SM0000116 [8]
FT   182 - 191: cleavage by topoisomerase II [3]
FT   189 - 198: cleavage by topoisomerase II [3]
FT   219 - 228: cleavage by topoisomerase II [3]
FT   242 - 251: cleavage by topoisomerase II [3]
FT   248 - 257: cleavage by topoisomerase II [3]
FT   253 - 253: G in [3]
FT   256 - 265: cleavage by topoisomerase II [3]
XX   
SF   topoisomerase II sites [1]; AT-rich sites [1];
SF   contains a breakpoint for chromosomal translocation [3];
SF   several short stretches of homopolymeric adenine or
SF   thymine [7]
XX   
BP   75% [J. Bode, direct submission]; 20% [7]
TP   constitutive [1]
XX   
FF   prototype of a S/MAR; contributes to maximal expression of
FF   the kappa gene [2]; contributes to hypermutation [9];
FF   contributes to kappa expression as shown by flow cytometric
FF   assay, but has little effect on accumulation of the
FF   respective mRNA [9]
XX   
CP   liver, kidney, spleen, thymus, MPC-11, P-815, L-cell [1]
XX   
EV   in vitro selection of S/MAR 
EC   [J. Bode, direct submission]
XX   
BF   SB000002; lamin A [6]
MM   nitrocellulose filter binding; 
SO   rl; rat
QA   6
BF   SB000003; lamin B1 [6]
MM   nitrocellulose filter binding; 
SO   rl; rat
QA   6
BF   SB000004; lamin C [6]
MM   nitrocellulose filter binding; 
SO   rl; rat
QA   6
BF   SB000018; SP120 [4]
MM   nitrocellulose filter binding; 
SO   brain; rat
QA   6
BF   SB000018; SP120 [4]
MM   southwestern blotting; 
SO   brain; rat
QA   6
BF   SB000022; topoisomerase II [3]
MM   gel retardation; 
SO   Drosophila; Drosophila melanogaster
QA   6
BF   SB000022; topoisomerase II [3]
MM   topoisomerase II cleavage assay; 
SO   Drosophila; Drosophila melanogaster
QA   6
BF   SB000043; topoisomerase II [3]
MM   topoisomerase II cleavage assay; 
SO   calf; calf
QA   6
BF   SB000045; SMI1 [5]
MM   functional analysis; 
PR   254 bp fragment
SO   yeast, extract; baker's yeast, Saccharomyces cerevisiae
QA   6
BF   SB000052; topoisomerase II [3]
MM   topoisomerase II cleavage assay; 
SO   mouse; mouse
QA   6
BF   SB000053; topoisomerase II [3]
MM   nitrocellulose filter binding; 
SO   HeLa; human
QA   6
BF   SB000067; SMAR1 [10]
MM   gel shift competition; 
SO   rec(mouse-E.coli); mouse
QA   6
BF   SB000077; SAF-A [12]
MM   supershift (antibody binding); 
SO   liver; mouse
QA   6
BF   SB000077; SAF-A [12]
MM   southwestern blotting; 
SO   liver; mouse
QA   6
XX   
RN   [1]
RX   MEDLINE; 86106203 PubMed; 3002631
RA   Cockerill, P. N., Garrard, W. T.
RT   Chromosomal loop anchorage of the kappa immunoglobulin gene
RT   occurs next to the enhancer in a region containing
RT   topoisomerase II sites
RL   Cell 44:273-282 (1986)
RN   [2]
RX   MEDLINE; 90078219 PubMed; 2512290
RA   Blasquez, V. C., Xu, M., Moses, S. C., Garrard, W. T.
RT   Immunoglobulin kappa gene expression after stable
RT   integration. I. Role of the intronic MAR and enhancer in
RT   plasmacytoma cells
RL   J. Biol. Chem. 264:21183-21189 (1989)
RN   [3]
RX   MEDLINE; 89315824 PubMed; 2546156
RA   Sperry, A. O., Blasquez, V. C., Garrard, W. T.
RT   Dysfunction of chromosomal loop attachment sites:
RT   Illegitimate recombination linked to matrix association
RT   regions and topoisomerase II
RL   Proc. Natl. Acad. Sci. USA 86:5497-5501 (1989)
RN   [4]
RX   MEDLINE; 93286136 PubMed; 8509422
RA   Tsutsui, K., Tsutsui, K., Okada, S., Watarai, S., Seki, S.,
RA   Yasuda, T., Shohmori, T.
RT   Identification and characterization of a nuclear scaffold
RT   protein that binds the matrix attachment region DNA
RL   J. Biol. Chem. 268:12886-12894 (1993)
RN   [5]
RX   MEDLINE; 93296190 PubMed; 8516310
RA   Fishel, B. R., Sperry, A. O., Garrard, W. T.
RT   Yeast calmodulin and a conserved nuclear protein
RT   participate in the in vivo binding of a matrix associated
RT   region
RL   Proc. Natl. Acad. Sci. USA 90:5623-5627 (1993)
RN   [6]
RX   MEDLINE; 94344140 PubMed; 8065361
RA   Luderus, M. E. E., den Blaauwen, J. L., de Smit, O. J. B.,
RA   Compton, D. A., van Driel, R.
RT   Binding of matrix attachment regions to lamin polymers
RT   involves single-stranded regions and the minor groove
RL   Mol. Cell. Biol. 14:6297-6305 (1994)
RN   [7]
RX   MEDLINE; 96222527 PubMed; 8670229
RA   Okada, S., Tsutsui, K., Tsutsui, K., Seki, S., Shohmori, T.
RT   Subdomain structure of the matrix attachment region located
RT   within the mouse immunoglobulin kappa gene intron
RL   Biochem. Biophys. Res. Commun. 222:472-477 (1996)
RN   [8]
RA   Boulikas, T., Kong, C. F., Brooks, D., Hsie, L.
RT   The 3' untranslated region of the human
RT   poly(ADP-ribose)polymerase gene is a nuclear matrix
RT   anchoring site
RL   Int. J. Oncol. 9:1287-1294 (1996)
RN   [9]
RX   MEDLINE; 97377037 PubMed; 9233808
RA   Goyenechea, B., Klix, N., Williams, G. T., Riddell, A.,
RA   Neuberger, M. S., Milstein, C.
RT   Cells strongly expressing Ig kappa transgenes show clonal
RT   recruitment of hypermutation: a role for both MAR and the
RT   enhancers
RL   EMBO J. 16:3987-3994 (1997)
RN   [10]
RX   MEDLINE; 20408892 PubMed; 10950932
RA   Chattopadhyay, S., Kaul, R., Charest, A., Housman, D.,
RA   Chen, J.
RT   SMAR1, a novel, alternatively spliced gene product, binds
RT   the scaffold/matrix-associated region at the T cell
RT   receptor beta locus
RL   Genomics 68:93-96 (2000)
RN   [11]
RX   MEDLINE; 20496822 PubMed; 11041885
RA   Morisawa, G., Han-yama, A., Moda, I., Tamai, A., Iwabuchi,
RA   M., Meshi, T.
RT   AHM1, a novel type of nuclear matrix-localized, MAR binding
RT   protein with a single AT hook and a J
RT   domain-homologous region
RL   Plant Cell 12:1903-1916 (2000)
RN   [12]
RX   MEDLINE; 21456956 PubMed; 11573239
RA   Lobov, I. B., Tsutsui, K., Mitchell, A. R., Podgornaya, O.
RA   I.
RT   Specificity of SAF-A and lamin B binding in vitro
RT   correlates with the satellite DNA bending state
RL   J. Cell. Biochem. 83:218-229 (2001)
//
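One possible workaround is sketched here in plain Ruby. It assumes that every line tagged SQ holds sequence data right after the tag, as in the entry above. Judging by the script name, that entry comes from SMARt DB rather than Swiss-Prot. Under this assumption, you can collect those lines and join them instead of calling Bio::SPTR#sq. The helper name sq_sequence is hypothetical and not part of BioRuby. The sample entry is a trimmed-down excerpt, and the helper expects one entry per string.

```ruby
# Hypothetical helper: join the residues from every line tagged "SQ",
# assuming each such line holds sequence data directly after the tag
# (SMARt DB style), not a Swiss-Prot-style header line.
def sq_sequence(text)
  text.each_line
      .select { |line| line.start_with?("SQ ") }  # keep only SQ-tagged lines
      .map { |line| line.sub(/\ASQ\s+/, "").strip } # drop tag, spacing, newline
      .join
end

# Trimmed-down sample entry in the same layout as the file above.
entry = <<~EOS
  AC   SM0000001
  XX
  SQ   AAGCTTAATG
  SQ   TATTCTTTGA
  SC   [7]
  //
EOS

seq = sq_sequence(entry)
p seq         # => "AAGCTTAATGTATTCTTTGA"
p seq.length  # => 20

# With the real file (f is the path, as in the snippet above):
#   p sq_sequence(File.read(f))
```

For the full entry this should return the whole sequence, which the SZ field gives as 371 bp. If needed, you could then wrap the string in a BioRuby object such as Bio::Sequence::NA.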


