[Biojava-l] differences between read-in sequence and stored sequence in database
Gabrielle Doan
gabrielle_doan at gmx.net
Tue Oct 28 14:26:47 UTC 2008
Hi all,
concerning the problem described below, I have found out that the same
problem also occurred in BioRuby and was fixed there in 2004.
See:
http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb?cvsroot=bioruby
Unfortunately, I'm clueless about BioRuby. Does anybody recognize this
problem or understand how it was solved in BioRuby?
I would be grateful for any hints.
Cheers,
Gabrielle
-------- Original Message --------
Subject: [Biojava-l] differences between read-in sequence and stored sequence in database
Date: Mon, 27 Oct 2008 13:57:03 +0100
From: Gabrielle Doan <gabrielle_doan at gmx.net>
To: biojava-l at biojava.org
Hi all,
I have a BioSQL database that contains all human chromosomes. For my
current project I need to query part of a sequence.
As far as I know, the whole sequence is stored in the column
biosequence.seq of the BioSQL schema, so I wrote this query:
SELECT SUBSTRING(bs.seq, 131615042, 131626262 - 131615042 + 1) FROM biosequence bs;
But this query did not return the desired string, because the stored
biosequence is only 100,000,020 bp long. I am very confused about this
discrepancy. I added all chromosomes to the database with BioJava's
built-in method addRichSequence(RichSequence seq).
From my raw data I know that this sequence should be 140,279,252 bp
long. So where is the rest of my sequence? I have observed the same
discrepancy for all chromosomes longer than 100,000,020 bp.
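A side note on the query itself: in SQL, the third argument of SUBSTRING(str, pos, len) is a length, not an end coordinate. Assuming 131626262 was meant as the end of a 1-based, inclusive region, the call needs a length of 131626262 - 131615042 + 1 = 11,221. Even then, the region starts beyond the 100,000,020 bp that are actually stored, so no query can return it until the full sequence is in the database. The minimal Java sketch below converts a region into SUBSTRING arguments and checks it against a stored length before querying; RegionQuery and its methods are hypothetical helpers, not part of BioJava.

```java
/**
 * Hypothetical helper (not a BioJava API): turns a 1-based, inclusive
 * [start, end] region into the (pos, len) arguments of SQL SUBSTRING and
 * checks whether the region fits into a sequence of a given length.
 */
public class RegionQuery {

    /** Number of residues in the 1-based, inclusive region [start, end]. */
    static long regionLength(long start, long end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("need 1 <= start <= end");
        }
        return end - start + 1;
    }

    /** True if [start, end] lies entirely within a sequence of storedLength residues. */
    static boolean fits(long start, long end, long storedLength) {
        return start >= 1 && end >= start && end <= storedLength;
    }

    /** Builds the SUBSTRING expression, reusing the bs.seq alias from the query. */
    static String substringExpr(long start, long end) {
        return "SUBSTRING(bs.seq, " + start + ", " + regionLength(start, end) + ")";
    }

    public static void main(String[] args) {
        long start = 131_615_042L, end = 131_626_262L;
        // prints SUBSTRING(bs.seq, 131615042, 11221)
        System.out.println(substringExpr(start, end));
        // prints false: the stored copy (100,000,020 bp) is too short
        System.out.println(fits(start, end, 100_000_020L));
        // prints true: the expected full length (140,279,252 bp) would cover it
        System.out.println(fits(start, end, 140_279_252L));
    }
}
```

Checking fits() first makes a failed lookup distinguishable from a region that is genuinely empty, instead of silently getting an empty string back from the database.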
Here is an excerpt from my database:
bioentry_id  description                                                         length (bp)
2            Homo sapiens mitochondrion, complete genome.                        16571
3            Homo sapiens chromosome Y, reference assembly, complete sequence.   57772954
4            Homo sapiens chromosome X, reference assembly, complete sequence.   100000020
5            Homo sapiens chromosome 22, reference assembly, complete sequence.  49691432
6            Homo sapiens chromosome 21, reference assembly, complete sequence.  46944323
7            Homo sapiens chromosome 20, reference assembly, complete sequence.  25960004
8            Homo sapiens chromosome 9, reference assembly, complete sequence.   100000020
9            Homo sapiens chromosome 7, reference assembly, complete sequence.   100000020
Sequences shorter than 100,000,020 bp are stored correctly in
biosequence.seq.
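One way to spot this kind of truncation without knowing every chromosome's true length: several unrelated entries sharing exactly the same maximum length (here 100,000,020 bp for chromosomes X, 9 and 7) is unlikely to be biological and suggests a cap somewhere in the loading path. The sketch below is a minimal heuristic in Java using the lengths from the excerpt; LengthCeiling is a hypothetical class, and it only flags suspects without explaining the cause. It would also miss a truncated entry whose length does not happen to hit the shared maximum.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical diagnostic: if the largest stored length occurs more than once,
 * that value is more likely an artificial cap than a real sequence length.
 */
public class LengthCeiling {

    /** Returns the bioentry_ids at the maximum length if that maximum is shared, else an empty list. */
    static List<Integer> suspectedTruncated(Map<Integer, Long> lengths) {
        long max = Long.MIN_VALUE;
        for (long len : lengths.values()) {
            max = Math.max(max, len);
        }
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lengths.entrySet()) {
            if (e.getValue() == max) {
                hits.add(e.getKey());
            }
        }
        // A single longest entry is normal; a tie at the top is the warning sign.
        return hits.size() > 1 ? hits : new ArrayList<>();
    }

    public static void main(String[] args) {
        // bioentry_id -> stored length, copied from the excerpt above
        Map<Integer, Long> lengths = new LinkedHashMap<>();
        lengths.put(2, 16_571L);
        lengths.put(3, 57_772_954L);
        lengths.put(4, 100_000_020L);
        lengths.put(5, 49_691_432L);
        lengths.put(6, 46_944_323L);
        lengths.put(7, 25_960_004L);
        lengths.put(8, 100_000_020L);
        lengths.put(9, 100_000_020L);
        System.out.println(suspectedTruncated(lengths)); // prints [4, 8, 9]
    }
}
```

Comparing each flagged entry against the length reported by the raw input file (for example the 140,279,252 bp mentioned above) would then confirm or rule out truncation.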
I would be grateful for any hints that explain the behaviour of my database.
Cheers,
Gabrielle
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l