From chapmanb at 50mail.com Mon Nov 1 06:53:03 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 1 Nov 2010 06:53:03 -0400 Subject: [BioSQL-l] SQLite support In-Reply-To: References: <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net> <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu> Message-ID: <20101101105303.GA2726@sobchak.mgh.harvard.edu> Peter, Christopher, Chris and Hilmar; > Since you sounded keen Chris, and Brad wasn't replying, I went > ahead and checked it in: > > http://github.com/biosql/biosql/tree/4315be111d7d9eaa47bb3674eeed89e045d2c07a Awesome, thanks for getting this in. Happy to have it living in an official place. Christopher, thanks for the autoincrement fixes on top of what we had. If you find any additional things that need modification, feel free to check them in and keep it rolling. Brad From cjfields at illinois.edu Mon Nov 1 10:53:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 1 Nov 2010 09:53:14 -0500 Subject: [BioSQL-l] SQLite support In-Reply-To: <20101101105303.GA2726@sobchak.mgh.harvard.edu> References: <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net> <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu> <20101101105303.GA2726@sobchak.mgh.harvard.edu> Message-ID: <6834C6D9-080C-49E6-B14C-322CD5010C2E@illinois.edu> On Nov 1, 2010, at 5:53 AM, Brad Chapman wrote: > Peter, Christopher, Chris and Hilmar; > >> Since you sounded keen Chris, and Brad wasn't replying, I went >> ahead and checked it in: >> >> http://github.com/biosql/biosql/tree/4315be111d7d9eaa47bb3674eeed89e045d2c07a > > Awesome, thanks for getting this in. Happy to have it living in an > official place. > > Christopher, thanks for the autoincrement fixes on top of what we had. > If you find any additional things that need modification, feel free to > check them in and keep it rolling. > > Brad If we need to set up Christopher as a biosql github collaborator/developer, we'll need a github user name. 
chris From cjfields at illinois.edu Mon Nov 1 11:09:10 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 1 Nov 2010 10:09:10 -0500 Subject: [BioSQL-l] SQLite support In-Reply-To: References: <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net> <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu> <20101101105303.GA2726@sobchak.mgh.harvard.edu> <6834C6D9-080C-49E6-B14C-322CD5010C2E@illinois.edu> Message-ID: <3FA886C2-85AC-413E-8D1D-E7948D2A7F90@illinois.edu> Done! chris On Nov 1, 2010, at 10:07 AM, Christopher Bottoms wrote: > "molecules" is my user name for github. > > Thanks! > > On Mon, Nov 1, 2010 at 9:53 AM, Chris Fields wrote: >> On Nov 1, 2010, at 5:53 AM, Brad Chapman wrote: >> >>> Peter, Christopher, Chris and Hilmar; >>> >>>> Since you sounded keen Chris, and Brad wasn't replying, I went >>>> ahead and checked it in: >>>> >>>> http://github.com/biosql/biosql/tree/4315be111d7d9eaa47bb3674eeed89e045d2c07a >>> >>> Awesome, thanks for getting this in. Happy to have it living in an >>> official place. >>> >>> Christopher, thanks for the autoincrement fixes on top of what we had. >>> If you find any additional things that need modification, feel free to >>> check them in and keep it rolling. >>> >>> Brad >> >> If we need to set up Christopher as a biosql github collaborator/developer, we'll need a github user name. >> >> chris >> >> From hlapp at gmx.net Mon Nov 1 11:29:44 2010 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 1 Nov 2010 11:29:44 -0400 Subject: [BioSQL-l] SQLite support In-Reply-To: <3FA886C2-85AC-413E-8D1D-E7948D2A7F90@illinois.edu> References: <8AE63B77-6E24-4A3A-B13E-A5C26043F271@gmx.net> <64F1C223-8E4E-4C71-B3AA-2ECABCC75166@illinois.edu> <20101101105303.GA2726@sobchak.mgh.harvard.edu> <6834C6D9-080C-49E6-B14C-322CD5010C2E@illinois.edu> <3FA886C2-85AC-413E-8D1D-E7948D2A7F90@illinois.edu> Message-ID: <286FC849-A4BE-493C-89A0-964DDAECEFD7@gmx.net> Cool - thanks so much guys for stepping in! 
-hilmar On Nov 1, 2010, at 11:09 AM, Chris Fields wrote: > Done! > > chris > > On Nov 1, 2010, at 10:07 AM, Christopher Bottoms wrote: > >> "molecules" is my user name for github. >> >> Thanks! >> >> On Mon, Nov 1, 2010 at 9:53 AM, Chris Fields >> wrote: >>> On Nov 1, 2010, at 5:53 AM, Brad Chapman wrote: >>> >>>> Peter, Christopher, Chris and Hilmar; >>>> >>>>> Since you sounded keen Chris, and Brad wasn't replying, I went >>>>> ahead and checked it in: >>>>> >>>>> http://github.com/biosql/biosql/tree/4315be111d7d9eaa47bb3674eeed89e045d2c07a >>>> >>>> Awesome, thanks for getting this in. Happy to have it living in an >>>> official place. >>>> >>>> Christopher, thanks for the autoincrement fixes on top of what we >>>> had. >>>> If you find any additional things that need modification, feel >>>> free to >>>> check them in and keep it rolling. >>>> >>>> Brad >>> >>> If we need to set up Christopher as a biosql github collaborator/ >>> developer, we'll need a github user name. >>> >>> chris >>> >>> > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From throwern at msu.edu Wed Nov 3 11:49:55 2010 From: throwern at msu.edu (Nicholas Thrower) Date: Wed, 03 Nov 2010 11:49:55 -0400 Subject: [BioSQL-l] Relationship between Gene CDS and mRNA Seqfeatures Message-ID: Hello all, I'm new to this list so I hope this question hasn't been asked before. I couldn't find an archive to search. I have a biosql schema in oracle loaded with several genbank entries including TAIR9. I would like some help determining the relationship between mrna and cds features that belong to a single Gene entry. My seqfeature table has the following entries.
OID     RANK   DISPLAY_NAME  ENT_OID  TYPE_TRM_OID  SOURCE_TRM_OID
666269  22747  Gene          643516   626447        626431
666270  22748  Mrna          643516   626451        626431
666271  22749  Mrna          643516   626451        626431
666272  22750  Cds           643516   626455        626431
666273  22751  Cds           643516   626455        626431
I'm trying to correlate CDS to mRNA so that I can display them. All of the seqfeatures have the same exact locus_tag qualifier and I can't find any other qualifiers noting this relationship. Do I need to make an assumption based on seqfeature id, or rank in order to determine the relationship between mRNA and CDS. Here is an example from TAIR of what this data could look like. http://arabidopsis.org/servlets/TairObject?id=134549&type=locus How do I know which CDS and mRNA are paired together? -Nick From biopython at maubp.freeserve.co.uk Wed Nov 3 12:20:05 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Nov 2010 16:20:05 +0000 Subject: [BioSQL-l] Relationship between Gene CDS and mRNA Seqfeatures In-Reply-To: References: Message-ID: On Wed, Nov 3, 2010 at 3:49 PM, Nicholas Thrower wrote: > > Hello all, > > I'm new to this list so I hope this question hasn't been asked before. I > couldn't find an archive to search. > > I have a biosql schema in oracle loaded with several genbank entries > including TAIR9. I would like some help determining the relationship between > mrna and cds features that belong to a single Gene entry. > > My seqfeature table has the following entries. >
> OID     RANK   DISPLAY_NAME  ENT_OID  TYPE_TRM_OID  SOURCE_TRM_OID
> 666269  22747  Gene          643516   626447        626431
> 666270  22748  Mrna          643516   626451        626431
> 666271  22749  Mrna          643516   626451        626431
> 666272  22750  Cds           643516   626455        626431
> 666273  22751  Cds           643516   626455        626431
>
> I'm trying to correlate CDS to mRNA so that I can display them.
All of the > seqfeatures have the same exact locus_tag qualifier and I can't find any > other qualifiers noting this relationship. Do I need to make an assumption > based on seqfeature id, or rank in order to determine the relationship > between mRNA and CDS. > > Here is an example from TAIR of what this data could look like. > http://arabidopsis.org/servlets/TairObject?id=134549&type=locus > > How do I know which CDS and mRNA are paired together? > > -Nick In a GenBank file this relationship between CDS and mRNA or gene features is implicit from the order in the file. I think the rank is used in BioSQL to record the feature order, so you'd have to look at that. (This is assuming the Bio* wrapper you used to load the GenBank file into BioSQL isn't making any inferences like this for you). P.S. Which Bio* wrapper are you using? BioPerl? Peter From throwern at msu.edu Thu Nov 4 08:34:01 2010 From: throwern at msu.edu (Nicholas Thrower) Date: Thu, 04 Nov 2010 08:34:01 -0400 Subject: [BioSQL-l] Relationship between Gene CDS and mRNA Seqfeatures In-Reply-To: Message-ID: Peter, Thank you for the quick answer. Yes, I used the BioPerl wrapper to load the data. I was interested in using BioRuby but was not able to find ruby scripts comparable to load_ncbi_taxonomy.pl and load_seqdatabase.pl. -Nick On 11/3/10 12:20 PM, "Peter" wrote: > On Wed, Nov 3, 2010 at 3:49 PM, Nicholas Thrower wrote: >> >> Hello all, >> >> I'm new to this list so I hope this question hasn't been asked before. I >> couldn't find an archive to search. >> >> I have a biosql schema in oracle loaded with several genbank entries >> including TAIR9. I would like some help determining the relationship between >> mrna and cds features that belong to a single Gene entry. >> >> My seqfeature table has the following entries. >>
>> OID     RANK   DISPLAY_NAME  ENT_OID  TYPE_TRM_OID  SOURCE_TRM_OID
>> 666269  22747  Gene          643516   626447        626431
>> 666270  22748  Mrna          643516   626451        626431
>> 666271  22749  Mrna          643516   626451        626431
>> 666272  22750  Cds           643516   626455        626431
>> 666273  22751  Cds           643516   626455        626431
>>
>> I'm trying to correlate CDS to mRNA so that I can display them. All of the >> seqfeatures have the same exact locus_tag qualifier and I can't find any >> other qualifiers noting this relationship. Do I need to make an assumption >> based on seqfeature id, or rank in order to determine the relationship >> between mRNA and CDS. >> >> Here is an example from TAIR of what this data could look like. >> http://arabidopsis.org/servlets/TairObject?id=134549&type=locus >> >> How do I know which CDS and mRNA are paired together? >> >> -Nick > > In a GenBank file this relationship between CDS and mRNA or gene > features is implicit from the order in the file. I think the rank is used in > BioSQL to record the feature order, so you'd have to look at that. > (This is assuming the Bio* wrapper you used to load the GenBank > file into BioSQL isn't making any inferences like this for you). > > P.S. Which Bio* wrapper are you using? BioPerl? > > Peter From biopython at maubp.freeserve.co.uk Thu Nov 4 09:46:51 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Nov 2010 13:46:51 +0000 Subject: [BioSQL-l] Relationship between Gene CDS and mRNA Seqfeatures In-Reply-To: References: Message-ID: On Thu, Nov 4, 2010 at 12:34 PM, Nicholas Thrower wrote: > > Peter, > > Thank you for the quick answer. > > Yes, I used the BioPerl wrapper to load the data. > > I was interested in using BioRuby but was not able to find ruby scripts > comparable to load_ncbi_taxonomy.pl and load_seqdatabase.pl. > > -Nick Hi Nick, Note that while the load_ncbi_taxonomy.pl script is written in perl it is part of BioSQL, not BioPerl. To me there doesn't seem to be much benefit in rewriting it in Ruby (or Python).
I don't know about BioRuby, but the BioPerl script load_seqdatabase.pl doesn't have a direct equivalent in Biopython. Including a selection of scripts with Biopython is something we've considered but haven't really tackled yet. However, it is just a few lines to do an import in BioSQL using our SeqIO parsing framework. See this wiki page for more details: http://www.biopython.org/wiki/BioSQL You may get more help on the BioRuby mailing list... Regards, Peter From dan.kortschak at adelaide.edu.au Tue Nov 23 01:07:18 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 23 Nov 2010 16:37:18 +1030 Subject: [BioSQL-l] BioSQL seqeunce quality tables Message-ID: <1290492438.21375.18.camel@zoidberg.mbs.adelaide.edu.au> Hi, What is the consensus about storing sequence qualities in the BioSQL schema? There is no specific table for this, so I was wondering what others do. thanks Dan From biopython at maubp.freeserve.co.uk Tue Nov 23 04:00:39 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Nov 2010 09:00:39 +0000 Subject: [BioSQL-l] BioSQL seqeunce quality tables In-Reply-To: <1290492438.21375.18.camel@zoidberg.mbs.adelaide.edu.au> References: <1290492438.21375.18.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: On Tue, Nov 23, 2010 at 6:07 AM, Dan Kortschak wrote: > > Hi, > > What is the consensus about storing sequence qualities in the BioSQL > schema? There is no specific table for this, so I was wondering what > others do. > > thanks > Dan For Biopython we decided not to store the quality, and document this as a known limitation. As I recall there was some discussion about using the existing BioSQL feature annotations and using a (Sanger) FASTQ encoded string was suggested, but there was no consensus. Is there actually a need for this? You can't be thinking of storing raw reads in BioSQL (are you? I think you'll be disappointed with the performance), but perhaps it is reasonable for contigs.
I was also interested in other per-letter-annotation, like secondary structure predictions (which can be stored as a string with the same length as the sequence) or more general things like atomic coords. In principle new tables could be introduced to BioSQL just for per-letter-annotation, designed to work well with extracting a subsequence with the relevant sub-set of per-letter-annotation. Peter From dan.kortschak at adelaide.edu.au Tue Nov 23 04:09:23 2010 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 23 Nov 2010 19:39:23 +1030 Subject: [BioSQL-l] BioSQL seqeunce quality tables In-Reply-To: References: <1290492438.21375.18.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <1290503363.2688.0.camel@sol> I was mainly thinking of contigs, but it was more an exploratory think. cheers Dan On Tue, 2010-11-23 at 09:00 +0000, Peter wrote: > For Biopython we decided not to store the quality, and document this > as a known limitation. As I recall there was some discussion about > using the existing BioSQL feature annotations and using a (Sanger) > FASTQ encoded string was suggested, but there was no consensus. > > Is there actually a need for this? You can't be thinking of storing > raw > reads in BioSQL (are you? I think you'll be disappointed with the > performance), but perhaps it is reasonable for contigs. > > I was also interested in other per-letter-annotation, like secondary > structure predictions (which can be stored as a string with the same > length as the sequence) or more general things like atomic coords. > In principle new tables could be introduced to BioSQL just for > per-letter-annotation, designed to work well with extracting a > subsequence with the relevant sub-set of per-letter-annotation. > > Peter
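The per-letter-annotation idea raised in this thread (a string the same length as the sequence, cut down together with it when a subsequence is extracted) might look roughly like the sketch below. It is plain Python and only an illustration under stated assumptions: the helper names encode_quality, decode_quality and subsequence are hypothetical, nothing here is part of the BioSQL schema or of Biopython, and the only outside fact relied on is that Sanger FASTQ writes a Phred score q as the character with code q + 33.

```python
from typing import List, Tuple

# Sanger FASTQ convention: Phred score q is written as chr(q + 33).
SANGER_OFFSET = 33
MAX_PHRED = 93  # keeps the encoded character within printable ASCII ('~')


def encode_quality(phred: List[int]) -> str:
    """Encode Phred scores as a Sanger FASTQ quality string (hypothetical helper)."""
    for q in phred:
        if not 0 <= q <= MAX_PHRED:
            raise ValueError(f"Phred score out of range: {q}")
    return "".join(chr(q + SANGER_OFFSET) for q in phred)


def decode_quality(encoded: str) -> List[int]:
    """Decode a Sanger FASTQ quality string back into Phred scores."""
    return [ord(c) - SANGER_OFFSET for c in encoded]


def subsequence(seq: str, encoded: str, start: int, end: int) -> Tuple[str, str]:
    """Slice the sequence and its per-letter annotation with the same coordinates."""
    if len(seq) != len(encoded):
        raise ValueError("per-letter annotation must match the sequence length")
    return seq[start:end], encoded[start:end]


# Made-up example data, not taken from any real read or contig.
seq = "ACGTACGT"
phred = [40, 38, 30, 20, 10, 2, 0, 41]
encoded = encode_quality(phred)
sub_seq, sub_qual = subsequence(seq, encoded, 2, 6)
print(sub_seq, decode_quality(sub_qual))
```

Because the annotation is one character per letter, the same slice coordinates apply to both strings, which is the property Peter describes for secondary structure strings as well. Whether such a string would live in an existing annotation table or in a new one is exactly the open question in the thread; the sketch takes no position on that.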