From Yan.Bai at UTSouthwestern.edu Tue Apr 4 11:37:00 2006
From: Yan.Bai at UTSouthwestern.edu (Yan Bai)
Date: Tue, 04 Apr 2006 10:37:00 -0500
Subject: [Biojava-dev] ABI file parser
Message-ID:

I need to parse information about the sequences (e.g., sample name, comment, instrument model, run date/time) plus the quality calls (sample scores), and save it all into a database. I guess I will have to modify some library files to do what I need, but I don't know where to start or which files I need to look into. Your input is highly appreciated.

While reading the source code, I wondered about a variable named MacJunk, which, according to its description, is junk prepended to the real data. Does it exist in Mac files only? It looks like a general cross-platform offset for all the data, not limited to Macintosh. Am I missing something here?

Thanks,
Yan

>>> Richard Holland 03/23/06 3:30 AM >>>
I've used the BioJava ABI parser to parse 3730 ABI files without any problems, and it successfully reads both base calls and quality scores. You should use the ABIFChromatogram method getBaseCalls(), which returns an alignment of two sequences: the first sequence is the sequence data, and the second is a sequence made up of Integer scores.

cheers,
Richard

On Wed, 2006-03-22 at 14:25 -0700, Russ Kepler wrote:
> On Wednesday 22 March 2006 02:05 pm, Yan Bai wrote:
> >
> > Another question is about the ABI file parser, the class
> > org.biojava.bio.program.abi.ABIFParser. The comments in this file indicate
> > that it parses files from the 377 DNA sequencer, while our sequence files
> > are generated by a 3730 XL. Are there any mismatches between these two
> > formats? Is there a parser specific to the 3730? I couldn't find anything
> > describing the 3730 XL format like the document Clark Tibbetts wrote.
>
> The differences that I can really see are the addition of the quality calls
> and (maybe) the caller name. I'm sure that there are others, but since I
> wasn't looking for them I never really noticed their absence. I've got a
> parser that keeps the quality call values if you need it.
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416

From russ at kepler-eng.com Thu Apr 6 17:29:24 2006
From: russ at kepler-eng.com (Russ Kepler)
Date: Thu, 6 Apr 2006 15:29:24 -0600
Subject: [Biojava-dev] ABI file parser
In-Reply-To:
References:
Message-ID: <200604061529.24300.russ@kepler-eng.com>

On Tuesday 04 April 2006 09:37 am, Yan Bai wrote:
> While reading the source code, I wondered about a variable named MacJunk,
> which, according to its description, is junk prepended to the real data.
> Does it exist in Mac files only? It looks like a general cross-platform
> offset for all the data, not limited to Macintosh. Am I missing something
> here?

I meant to reply to this: the code is an artifact from a bad FTP program that had a tendency to drop 128 bytes of crap at the start of a file when transferring it in. Funny how those sorts of things get stuck in code (like I have any right to complain; I still have coding habits developed 25 years ago).
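Russ's explanation implies that a reader only has to check two places for the start of the real data. The Java sketch below illustrates that idea under two assumptions not stated in the thread: that an intact ABI file begins with the four ASCII bytes "ABIF", and that a damaged file carries exactly the 128 extra bytes Russ describes in front of it. The class and method names are hypothetical; this is not BioJava's ABIFParser code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: locates the "ABIF" magic at offset 0 or after a 128-byte junk prefix. */
public class AbifOffsetSketch {
    // Assumption: an intact ABI file starts with the ASCII bytes "ABIF".
    static final byte[] MAGIC = "ABIF".getBytes(StandardCharsets.US_ASCII);
    // Size of the prefix the faulty FTP program prepended, per the thread.
    static final int JUNK_PREFIX = 128;

    /**
     * Returns 0 if the header starts with "ABIF", 128 if "ABIF" follows a
     * 128-byte prefix, or -1 if neither is the case. Pass at least the first
     * 132 bytes of the file.
     */
    public static int dataOffset(byte[] header) {
        if (hasMagicAt(header, 0)) {
            return 0;
        }
        if (hasMagicAt(header, JUNK_PREFIX)) {
            return JUNK_PREFIX;
        }
        return -1;
    }

    // True if MAGIC appears in buf starting at position off.
    private static boolean hasMagicAt(byte[] buf, int off) {
        if (buf.length < off + MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (buf[off + i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] clean = "ABIF".getBytes(StandardCharsets.US_ASCII);
        byte[] prefixed = new byte[JUNK_PREFIX + MAGIC.length]; // 128 zero bytes, then "ABIF"
        System.arraycopy(MAGIC, 0, prefixed, JUNK_PREFIX, MAGIC.length);
        System.out.println(dataOffset(clean));    // prints 0
        System.out.println(dataOffset(prefixed)); // prints 128
    }
}
```

If the magic turns up at offset 128, every offset read from the file's directory would presumably need the same 128 added to it, which would explain why MacJunk looks like a general offset for all of the data rather than something Macintosh-specific.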
From kowalchukm at AGR.GC.CA Fri Apr 7 15:34:05 2006
From: kowalchukm at AGR.GC.CA (Michael Kowalchuk)
Date: Fri, 07 Apr 2006 14:34:05 -0500
Subject: [Biojava-dev] TranslatedSequencePanel / LabelledSequenceRenderer fix
Message-ID: <1144438445.21762.39.camel@mbwinnr52786.agr.gc.ca>

Hi,

I'm developing a J2SE application using BioJava 1.4, and I've found a bug where a LabelledSequenceRenderer will not be drawn correctly on a TranslatedSequencePanel. Specifically, the label will not be visible at all, and the graphics will be cut off on the left.

I've fixed this by translating the Graphics2D context by the minimum leading space specified by the renderer. This is equivalent to what is done in SequencePanel.

There is still a placement problem with LabelledSequenceRenderer when the TranslatedSequencePanel is vertically aligned, but it works perfectly when aligned horizontally. I'm unsure if this is the fault of TranslatedSequencePanel or LabelledSequenceRenderer.

I've attached my patch and the source code for a demonstration of this bug to this message.

Michael Kowalchuk,
Cereal Research Centre
Agriculture and Agri-Food Canada
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-patch
Size: 930 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20060407/e20da954/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo.java
Type: text/x-java
Size: 2046 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-dev/attachments/20060407/e20da954/attachment-0001.bin

From richard.holland at ebi.ac.uk Thu Apr 20 08:10:45 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Thu, 20 Apr 2006 13:10:45 +0100
Subject: [Biojava-dev] Wiki sidebar
Message-ID: <1145535045.4188.32.camel@texas.ebi.ac.uk>

Could someone with admin privileges on the Wiki add BioJava:BioJavaXDocs to the navigation sidebar?
cheers,
Richard

--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From mark.schreiber at novartis.com Tue Apr 25 02:20:38 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 25 Apr 2006 14:20:38 +0800
Subject: [Biojava-dev] Wiki sidebar
Message-ID:

Done.

Richard Holland
Sent by: biojava-dev-bounces at lists.open-bio.org
04/20/2006 08:10 PM

To: biojava-dev
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] Wiki sidebar

Could someone with admin privileges on the Wiki add BioJava:BioJavaXDocs to the navigation sidebar?

cheers,
Richard
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From mark.schreiber at novartis.com Fri Apr 28 04:08:00 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 28 Apr 2006 16:08:00 +0800
Subject: [Biojava-dev] Docbook
Message-ID:

Hi -

Now that the BioJava DocBook has been wikified, should we remove it from CVS? It seems redundant to have two copies, both editable by the user community.

On the same subject, if we copy the examples from the demos directory to the wiki, should we remove them too? We could probably update them at the same time to use BioJava 1.4 or biojavax functions.
- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From richard.holland at ebi.ac.uk Fri Apr 28 04:43:32 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 28 Apr 2006 09:43:32 +0100
Subject: [Biojava-dev] Docbook
In-Reply-To:
References:
Message-ID: <1146213813.3955.33.camel@texas.ebi.ac.uk>

> Now that the BioJava DocBook has been wikified, should we remove it from
> CVS? It seems redundant to have two copies, both editable by the user
> community.

Good plan.

> On the same subject, if we copy the examples from the demos directory to
> the wiki, should we remove them too? We could probably update them at the
> same time to use BioJava 1.4 or biojavax functions.
>
> - Mark

Not sure here - having the demos in the directory saves some people a lot of typing, unless the examples in the Wiki are cut-and-pasteable and small enough to make this manageable. We should definitely update them to 1.4 at least, and preferably to biojavax once it is released.

--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From ckanz at ebi.ac.uk Wed Apr 26 10:07:25 2006
From: ckanz at ebi.ac.uk (Carola Kanz)
Date: Wed, 26 Apr 2006 14:07:25 -0000
Subject: [Biojava-dev] Forthcoming change in the EMBL database
Message-ID:

Dear colleagues,

We would like to announce the following important change to the EMBL database in June this year.

With release 87 (available from JUN-2006), the format of the EMBL flat file will change: the ID line will have a different structure (see below), and the SV line will be removed.

The changes affecting the ID line structure are:

* All tokens will be separated by a semicolon.
* The entry name will no longer be displayed; the primary accession number will appear in its place.
* The sequence version will be indicated.
* The topology will be a separate token and will be indicated for both circular and linear molecules.
* Both the data class and the taxonomic division will be displayed.

This is an example of the new ID line:

ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
     (1)       (2)   (3)     (4)          (5)  (6)  (7)

The tokens represent:

1. Primary accession number.
2. 'SV' + sequence version number.
3. Topology: 'circular' or 'linear'.
4. Molecule type.
5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA, STS, STD; "normal" entries will have STD, for standard).
6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV, INV, SYN, UNC, VRL, PHG).
7. Sequence length + 'BP.'.

The entry name will no longer be displayed in the ID line. Since EMBL release 3 (Dec 1983), the stable identifier of an entry has been the primary accession number. A mapping file (entry name to accession number) will be provided with the next release for those entries whose entry name does not coincide with the accession number.

To give users a test dataset, a file with new-style ID lines, new_id_line.test.gz, was provided together with the March release of the EMBL database:

ftp://ftp.ebi.ac.uk/pub/databases/embl/release/new_id_line.test.gz

Feedback from users is sought; please use the "Contact us" link at the bottom of the EBI home page and specify "EMBL" in the feedback form.

Note: this information was first made available on our "Forthcoming changes" page (http://www.ebi.ac.uk/embl/Documentation/forthcomingchanges.html#0606) and in the EMBL database release notes.
Regards,
Carola Kanz
EMBL database
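Code that reads EMBL flat files will need to cope with the new ID line described in the announcement. The Java sketch below is a hypothetical, minimal parser written only from the announcement's token list: it splits on the semicolons, checks the 'SV' and 'BP' tokens, and rejects data-class and division codes outside the lists given there. The class and field names are invented for illustration, it is not part of BioJava, and a real parser would probably be more lenient about codes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical parser for the new-style EMBL ID line (release 87 format). */
public class EmblIdLineSketch {
    // Codes copied from the announcement; later releases may add more.
    static final Set<String> DATA_CLASSES = new HashSet<>(Arrays.asList(
        "ANN", "CON", "PAT", "EST", "GSS", "HTC", "HTG", "MGA", "WGS", "TPA", "STS", "STD"));
    static final Set<String> DIVISIONS = new HashSet<>(Arrays.asList(
        "HUM", "MUS", "ROD", "PRO", "MAM", "VRT", "FUN", "PLN", "ENV", "INV", "SYN", "UNC", "VRL", "PHG"));

    /** The seven tokens of an ID line, in order. */
    public static final class IdLine {
        public final String accession;     // (1) primary accession number
        public final int version;          // (2) number after "SV"
        public final String topology;      // (3) "linear" or "circular"
        public final String moleculeType;  // (4) e.g. "genomic DNA"
        public final String dataClass;     // (5) e.g. "HTG"
        public final String division;      // (6) e.g. "MAM"
        public final long length;          // (7) number before "BP."

        IdLine(String accession, int version, String topology, String moleculeType,
               String dataClass, String division, long length) {
            this.accession = accession;
            this.version = version;
            this.topology = topology;
            this.moleculeType = moleculeType;
            this.dataClass = dataClass;
            this.division = division;
            this.length = length;
        }
    }

    /** Parses one ID line; throws IllegalArgumentException if it does not fit the new layout. */
    public static IdLine parse(String line) {
        String s = line.trim();
        if (!s.startsWith("ID") || !s.endsWith(".")) {
            throw new IllegalArgumentException("Not a new-style ID line: " + line);
        }
        // Drop the "ID" line code and the final '.', then split on the semicolons.
        String[] t = s.substring(2, s.length() - 1).split(";");
        if (t.length != 7) {
            throw new IllegalArgumentException("Expected 7 tokens, got " + t.length);
        }
        for (int i = 0; i < t.length; i++) {
            t[i] = t[i].trim();
        }
        if (!t[1].startsWith("SV ")) {
            throw new IllegalArgumentException("Bad version token: " + t[1]);
        }
        int version = Integer.parseInt(t[1].substring(3).trim());
        if (!t[2].equals("linear") && !t[2].equals("circular")) {
            throw new IllegalArgumentException("Bad topology: " + t[2]);
        }
        if (!DATA_CLASSES.contains(t[4])) {
            throw new IllegalArgumentException("Unknown data class: " + t[4]);
        }
        if (!DIVISIONS.contains(t[5])) {
            throw new IllegalArgumentException("Unknown division: " + t[5]);
        }
        if (!t[6].endsWith("BP")) {
            throw new IllegalArgumentException("Bad length token: " + t[6]);
        }
        long length = Long.parseLong(t[6].substring(0, t[6].length() - 2).trim());
        return new IdLine(t[0], version, t[2], t[3], t[4], t[5], length);
    }

    public static void main(String[] args) {
        IdLine id = parse("ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.");
        System.out.println(id.accession + " v" + id.version + ", " + id.length + " bp");
        // prints: CD789012 v4, 500 bp
    }
}
```

Splitting on ';' rather than on whitespace keeps multi-word molecule types such as "genomic DNA" intact. A parser that must also accept files from before release 87 would need separate handling for the old layout, which this sketch does not attempt.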