From holland at eaglegenomics.com Sat May 1 03:41:40 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 1 May 2010 08:41:40 +0100
Subject: [Biojava-l] problems with intallation of biojava in windows 7
In-Reply-To: <20100430212950.M75279@cnpaf.embrapa.br>
References: <20100430184758.M13673@cnpaf.embrapa.br> <20100430212950.M75279@cnpaf.embrapa.br>
Message-ID: <8F8E0D1C-73B7-42BA-A979-3204BFD47AD4@eaglegenomics.com>

BioJava is not an executable program, and it does not have an 'installer'. To 'install' it, just download the JAR files and make sure they're included in your classpath when you run a program that makes use of it.

cheers,
Richard

On 30 Apr 2010, at 22:32, Marcelo Goncalves Narciso (Pesquisador) wrote:

> Hi, people,
>
> I need your help.
>
> When I try to install biojava in windows 7, it happens:
>
>> C:\Users\narciso\biojava>java -jar biojava-1.7.1-all.jar
>> Failed to load Main-Class manifest attribute from
>> biojava-1.7.1-all.jar
>
> How can I fix it?
>
> Thanks a lot
>
> Marcelo
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From thomascramera at dnastar.com Thu May 6 12:50:15 2010
From: thomascramera at dnastar.com (Andy Thomas-Cramer)
Date: Thu, 6 May 2010 11:50:15 -0500
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
Message-ID:

From a PDB file, I can identify which atoms are in ligands, and which are in residues in the chain. The chain atoms end with the TER record.

From the BioJava API, I can distinguish as well -- if it's an amino acid sequence and the automatic alignment between SEQRES and ATOM sequences is successful.

Is there a way through the API to identify atoms in ligands when the chain is not an amino acid sequence or the alignment fails? It looks like the TER record is ignored by PDBFileParser.

From andreas at sdsc.edu Thu May 6 16:51:40 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 6 May 2010 13:51:40 -0700
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hi Andy,

You don't need to process TERs to build up the representation of a structure. The BioJava data model will work fine even if the file does not contain any amino acids (e.g. check 2KQO).

Ligands will get represented as Hetatom groups in the data model. Check the Hetatom or Group javadocs for how to access their atoms.

For your last question: check out the Chain.getAtomGroups() and Chain.getSeqResGroups() methods...

If it does not work the way you expect for a particular PDB ID, please let me know the ID so I can take a look at the details.

Andreas

From thomascramera at dnastar.com Fri May 7 13:55:52 2010
From: thomascramera at dnastar.com (Andy Thomas-Cramer)
Date: Fri, 7 May 2010 12:55:52 -0500
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hetatom groups are also used to represent modified residues in chains. I would like to obtain either the ligand atoms/groups without the sequence, or the sequence atoms/groups without the ligands.

Chain.getSeqResGroups() reliably returns an empty list when alignment fails and for non-amino-acid sequences.

Examples of the former include 193D (chain C) and 7EST (chain I). Both of these contain HETATMs both as modified residues and as ligands. Alignment fails in both.

Interestingly, 193D's chain D is identical to chain C -- but its alignment succeeds. One difference is that C has an associated ligand and D does not. Are the ligand atom groups associated with a chain considered during alignment?

From andreas at sdsc.edu Fri May 7 18:27:39 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 7 May 2010 15:27:39 -0700
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hi Andy,

I see what you intend to do. If you want just the HETATM groups, you can request them with

Chain.getAtomGroups(GroupType.HETATM);

or, for just the amino acid groups, you can request them with

Chain.getAtomGroups(GroupType.AMINOACID);

The same would work for getSeqResGroups(...) as well, but then your two examples are quite specific:

193D is an antibiotic/DNA complex.
7EST chain I is a TRIFLUOROACETYL-*L-*LEUCYL-*L-*ALANYL-P-TRIFLUOROMETHYLPHENYLANILIDE

Hetatoms are represented as Xs during the sequence alignments. I can easily fix the "failing" alignment in this case by ignoring the wrongly aligned Hetatom Xs (patch just committed to SVN...). Not sure if it makes any biological difference in your two examples.

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From jake at researchtogether.com Mon May 10 18:36:52 2010
From: jake at researchtogether.com (jake at researchtogether.com)
Date: Mon, 10 May 2010 23:36:52 +0100
Subject: [Biojava-l] Idle Developer Volunteering
Message-ID: <20100510223652.GB30216@researchtogether.com>

Hi All,

I've got some free time and I'd like to help out on this project, but I'm not sure how active the community is. Is the bug list accurate? The entries seem a bit old.

Also, if anyone wants any coding done, I'd be glad to help out.

Cheers,
Jake

From andreas at sdsc.edu Mon May 10 20:49:01 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 10 May 2010 17:49:01 -0700
Subject: [Biojava-l] Idle Developer Volunteering
In-Reply-To: <20100510223652.GB30216@researchtogether.com>
References: <20100510223652.GB30216@researchtogether.com>
Message-ID:

Hi Jake,

Thanks for your interest. We are always looking for contributors of code or documentation, and for people to help out with questions on the mailing lists.

The bug tracking system is currently a bit quiet, since we are working on a major new version: BioJava 3. The current status is documented here:

http://www.biojava.org/wiki/BioJava:Modules

Perhaps you can take a look there and see how this aligns with your research interests...

Andreas

From wangh5 at muohio.edu Tue May 11 15:21:54 2010
From: wangh5 at muohio.edu (Wang, Han)
Date: Tue, 11 May 2010 15:21:54 -0400
Subject: [Biojava-l] Getting numeric quality values from fastq
Message-ID:

Dear someone concerned:

I am new to BioJava and have run into a problem with the numeric quality values in FASTQ files. I read the BioJava API and found how to read in a FASTQ file and get the quality from each read. Unfortunately, it only returns the original quality string rather than the numeric quality values. Can someone give me some help with this problem? I would really appreciate it.

Sincerely,
Han

From idoerg at gmail.com Wed May 12 17:22:20 2010
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed, 12 May 2010 17:22:20 -0400
Subject: [Biojava-l] FASTQ in biojava
Message-ID:

Hi,

We're trying to get the numeric values from the biojava fastq reader. As we understand the API, getQuality only supplies the ASCII value of the quality string. How do we get the actual Q numeric values?

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/Fastq.html#getQuality%28%29

Thanks,

Iddo

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html

From heuermh at acm.org Thu May 13 01:12:44 2010
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 13 May 2010 01:12:44 -0400 (EDT)
Subject: [Biojava-l] FASTQ in biojava
In-Reply-To:
Message-ID:

On Tue, 11 May 2010, Wang, Han wrote:

> I am new to biojava. I faced some problems on the numeric values of the
> fastq file.
> I read the biojava API and found how to read in the fastq
> file and get the quality from each fastq read. Unfortunately, it just
> reads the sequence of the original quality sequence rather than numeric
> quality values. Can someone give me some help on this problem? I will
> really appreciate it.

On Wed, 12 May 2010, Iddo Friedberg wrote:

> We're trying to get the numeric values from the biojava fastq reader. As we
> understand the API, getQuality only supplies the ASCII value of the quality
> string. How do we get the actual Q numeric values?

That is correct: the current fastq package just handles IO to/from the Fastq memento class and conversion between different variants of the FASTQ format.

The next step would be to go from a Fastq record to a Sequence with quality scores. I haven't written that part yet, guess I should get on it. :)

   michael

From jeedward at yahoo.com Fri May 14 17:41:40 2010
From: jeedward at yahoo.com (John Edward)
Date: Fri, 14 May 2010 14:41:40 -0700 (PDT)
Subject: [Biojava-l] Call for papers: BCBGC-10, USA, July 2010
Message-ID: <933238.27005.qm@web45913.mail.sp1.yahoo.com>

It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas.

Call for papers: BCBGC-10, USA, July 2010

The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org ) will be held during 12-14 of July 2010 in Orlando, FL, USA. BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference.

The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF is to promote research and developmental activities in computer science, information technology, control engineering, and related fields. Another goal is to promote the dissemination of research to a multidisciplinary audience and to facilitate communication among researchers, developers, practitioners in different fields. The following conferences are planned to be organized as part of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
* International Conference on Software Engineering Theory and Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a full-service resort that puts you in the middle of the fun! Located 1/2 block south of the famed International Drive, the hotel is just minutes from great entertainment like Walt Disney World® Resort, Universal Studios and Sea World Orlando. Guests can enjoy free scheduled transportation to these theme parks, as well as spacious accommodations, outdoor pools and on-site dining, all situated on 10 tropically landscaped acres. Here, guests can experience a full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions. Please see the website http://www.PromoteResearch.org for more details.

Sincerely
John Edward

From holland at eaglegenomics.com Mon May 17 14:47:10 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 17 May 2010 19:47:10 +0100
Subject: [Biojava-l] Call for papers: BCBGC-10, USA, July 2010
In-Reply-To: <933238.27005.qm@web45913.mail.sp1.yahoo.com>
References: <933238.27005.qm@web45913.mail.sp1.yahoo.com>
Message-ID: <4C7C7F85-FEB3-43D8-BD44-27201E00DA4B@eaglegenomics.com>

Who is doing the BioJava talk at BOSC this year? I'll be at BCBGC-10 so I can present the same talk there if you like.

cheers,
Richard

From thomascramera at dnastar.com Mon May 17 18:46:54 2010
From: thomascramera at dnastar.com (Andy Thomas-Cramer)
Date: Mon, 17 May 2010 17:46:54 -0500
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hi Andreas.

I tried the new code. Although it allows the alignment to complete, it provides a result different from that for an identical sequence without an associated ligand. See results below.

I have tried Chain.getAtomGroups(GroupType.HETATM). However, it provides the set of het atom groups -- which includes both modified residues in the chain and ligands outside the chain. I need either the latter only, or the chain only.

For example, 193D includes these two identical sequences:

SEQRES   1 C    5  HQU DSN ALA NCY CPC
SEQRES   1 D    5  HQU DSN ALA NCY CPC

And chain C has an associated ligand, NBU, which is not part of the sequence.

Let:
* "BioJava SEQRES" = chain.getSeqResGroups()
* "BioJava HETATM" = chain.getAtomGroups(GroupType.HETATM)

Then I get the following results with the new code:

Chain: C
  Actual ligands:  NBU
  Actual SEQRES:   HQU DSN ALA NCY CPC
  BioJava SEQRES:  HQU DSN ALA CPC NBU <--
                               ^^^
  BioJava HETATM:  HQU DSN NCY CPC NBU

Chain: D
  Actual ligands:  None
  Actual SEQRES:   HQU DSN ALA NCY CPC
  BioJava SEQRES:  HQU DSN ALA NCY CPC
  BioJava HETATM:  HQU DSN NCY CPC

I'm looking for the "Actual" lines above.

Issues:
* In chain C only, BioJava omits the actual residue NCY in getSeqResGroups().
* In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups().
* Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results.
* There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method Chain.getAtomGroups(GroupType.HETATM) provides neither.

From andreas at sdsc.edu Mon May 17 22:04:00 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 17 May 2010 19:04:00 -0700
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hi Andy,

- There are a few things to discuss about the 193D example. This is a special case. If you investigate the details, it appears that the NBU is actually covalently bound to chain C and is not a free ligand. It is one of the cases where it is difficult to draw the line between what is a ligand and what is a chemically modified peptide (oh joy).

> * There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method
> Chain.getAtomGroups(GroupType.HETATM) provides neither.

- The best way to determine ligands is to use the Chemical Component Dictionary. Currently the BioJava PDB parser does not use this yet. It contains a lot of additional info for modified residues and ligands (e.g. http://www.rcsb.org/pdb/files/ligand/NBU.cif to get the data for the NBU group). I will add support for this to the parser in the next couple of days (e.g. the group type can be used to distinguish chemically modified residues from other ligands). I did some initial work on this already in the past, but it is not hooked up to the PDB parser at present.

- Another way to determine ligands is to investigate the various bonds within the protein. BioJava currently can't do that either, but we would like to add this at some point in the future...

- Just to repeat myself: TER is not a good criterion for determining ligands. I have seen cases in the past where authors used it to indicate an interruption in the main chain, since they could not experimentally observe the position of a loop region.
The main chain did continue after the TER... > * In chain C only, BioJava omits the actual residue NCY in getSeqResGroups(). > * In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups(). > * Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results. - All these points are actually caused by the same issue: the attempt to match up ATOM and SEQRES sequences. The chains contain mostly hetatoms which are represented as "X" in the alignment. This makes it difficult to align them correctly. I will investigate if using the chem. comp. dictionary one_letter_code or mmcif group parent->one_letter_code will make the alignment more useful here... Andreas On Mon, May 17, 2010 at 3:46 PM, Andy Thomas-Cramer wrote: > > Hi Andreas. > > I tried the new code. Although it allows the alignment to complete, it provides a result different than for an identical sequence without an associated ligand. See results below. > > I have tried Chain.getAtomGroups(GroupType.HETATM). However, it provides the set of het atom groups -- which includes both modified residues in the chain and ligands outside the chain. I need either the latter only, or the chain only. > > For example, 193D includes these two identical sequences: > > SEQRES ? 1 C ? ?5 ?HQU DSN ALA NCY CPC > SEQRES ? 1 D ? ?5 ?HQU DSN ALA NCY CPC > > And chain C has an associated ligand, NBU, which is not part of the sequence. > > Let: > * "BioJava SEQRES" = chain.getSeqResGroups() > * "BioJava HETATM" = chain.getAtomGroups(GroupType.HETATM): > > Then I get the following results with the new code: > > Chain: C > ?Actual ligands: ?NBU > ?Actual SEQRES: ? HQU DSN ALA NCY CPC > ?BioJava SEQRES: ?HQU DSN ALA CPC NBU <-- > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ^^^ > ?BioJava HETATM: ?HQU DSN NCY CPC NBU > > Chain: D > ? Actual ligands: ?None > ? Actual SEQRES: ? HQU DSN ALA NCY CPC > ? BioJava SEQRES: ?HQU DSN ALA NCY CPC > ? 
BioJava HETATM: ?HQU DSN NCY CPC > > I'm looking for the "Actual" lines above. > > Issues: > * In chain C only, BioJava omits the actual residue NCY in getSeqResGroups(). > * In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups(). > * Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results. > * There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method Chain.getAtomGroups(GroupType.HETATM) provides neither. > > > -----Original Message----- > From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic > Sent: Friday, May 07, 2010 5:28 PM > To: Andy Thomas-Cramer > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] PDBFileParser and identifying atoms in ligands > > Hi Andy, > > I see what you intend to do. ?If you want just the HETATOM groups, you > can request them with > > Chain.getAtomGroups(GroupType.HETATM); > > or just amino acids groups you can requrest them ?with > > Chain.getAtomGroups(GroupType.AMINOACID); > > same would work for getSeqresGroups(...) as well, but then your two > examples are quite specific: > > 193D is an antibiotic/DNA complex. > 7EST chain I is a > TRIFLUOROACETYL-*L-*LEUCYL-*L-*ALANYL-P-TRIFLUOROMETHYLPHENYLANILIDE > > Hetatoms are represented as Xs during the sequence alignments. I can > easily fix the "failing" alignment in this case, by ignoring the > wrongly aligned Hetatom Xs ?(patch just committed to SVN...). Not sure > if it makes any biological difference in your two examples. > > Andreas > > > > > On Fri, May 7, 2010 at 10:55 AM, Andy Thomas-Cramer > wrote: >> >> Hetatom groups are also used to represent modified residues in chains. I would like to obtain either the ligand atoms/groups without the sequence, or the sequence atoms/groups without the ligands. 
>> >> Chain.getSeqResGroups() reliably returns an empty list when alignment fails and for non-amino sequences. >> >> Examples of the former include 193D (chain C) and 7EST (chain I). Both of these contain HETATMs both as modified residues and as ligands. Alignment fails in both. >> >> Interestingly, 193D's chain D is identical to chain C -- but its alignment succeeds. One difference is that C has an associated ligand and D does not. Are the ligand atom groups associated with a chain considered during alignment? >> >> >> -----Original Message----- >> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic >> Sent: Thursday, May 06, 2010 3:52 PM >> To: Andy Thomas-Cramer >> Cc: biojava-l at lists.open-bio.org >> Subject: Re: [Biojava-l] PDBFileParser and identifying atoms in ligands >> >> Hi Andy, >> >> You don't need to process TERs to build up the representation of a >> structure. The BioJava data model will work fine even if the file >> does not contain any amino acids. (e.g. check 2KQO ) >> >> Ligands will get represented as Hetatom groups in the datamodel. >> Check the Hetatom or Group javadocs for how to access their atoms. >> >> For your last question: Check out the Chain.getAtomGroups() and >> Chain.getSeqResGroups() methods... >> >> If it does not work the way you expect for a particular PDB ID, please >> let me know the ID, so I can take a look at the details. >> >> Andreas >> >> >> On Thu, May 6, 2010 at 9:50 AM, Andy Thomas-Cramer >> wrote: >>> >From a PDB file, I can identify which atoms are in ligands, and which >>> are in residues in the chain. The chain atoms end with the TER record. >>> >>> >>> >>> >From the BioJava API, I can distinguish as well -- if it's an amino >>> sequence and the automatic alignment between SEQRES and ATOM sequences >>> is successful. >>> >>> >>> >>> Is there a way through the API to identify atoms in ligands, when the >>> chain is not an amino sequence or alignment fails?
It looks like the TER >>> record is ignored by PDBFileParser. >>> >> > > From kstillou at gmail.com Tue May 18 15:07:53 2010 From: kstillou at gmail.com (Katerina Stillou) Date: Tue, 18 May 2010 22:07:53 +0300 Subject: [Biojava-l] DNA sequence alignment - Percent Identity Message-ID: Hello, I am fairly new to BioJava and I have recently encountered a problem concerning the results of the method pairwiseAlignment. It is my impression, and please do correct me if I am wrong, that the only results I can get from this class are: Time (ms): Length: Score: Query: query, Length: Target: target, Length: followed by the alignment itself. What is more, this result is in String format, so I have to use some string manipulation methods in Java to extract each value, apart from the score, which is the value returned from the call of the pairwiseAlignment method. However, what I am really interested in is the percent identity of the two sequences. Therefore, I would be grateful to anyone who could point out a way to compute this percentage by using the data returned from the alignment. From what I have gathered by searching the internet, I need at least one of these: # of identical positions, # of aligned positions. Is it possible that the number of identical positions is the total number of " | " in the result of getAlignmentString()? Yes, I am really confused. Some more information on my code: I am using the exact code presented in the BioJava Cookbook for global alignment with the NUC.4.4 substitution matrix.
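A minimal sketch of the calculation asked about above, assuming the two gapped rows of the alignment have already been extracted as equal-length strings and that gaps are written as '-'. The class and method names are illustrative only and are not part of the BioJava API; whether the "|" characters in getAlignmentString() mark exactly the identical columns is not confirmed in this thread, so the sketch compares the two rows directly instead.

```java
// Hypothetical helper, not BioJava code: percent identity from two aligned rows.
final class PercentIdentity {

    /**
     * Percent identity over the columns where neither row has a gap.
     * Assumes both rows have equal length and use '-' as the gap character.
     */
    static double percentIdentity(String alignedQuery, String alignedTarget) {
        return percentIdentity(alignedQuery, alignedTarget, false);
    }

    /**
     * If countGapColumns is true, the denominator is the full alignment length;
     * otherwise only columns without a gap are counted.
     */
    static double percentIdentity(String alignedQuery, String alignedTarget,
                                  boolean countGapColumns) {
        if (alignedQuery.length() != alignedTarget.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int identical = 0;
        int columns = 0;
        for (int i = 0; i < alignedQuery.length(); i++) {
            char q = Character.toUpperCase(alignedQuery.charAt(i));
            char t = Character.toUpperCase(alignedTarget.charAt(i));
            boolean gap = (q == '-' || t == '-');
            if (gap && !countGapColumns) {
                continue; // skip gap columns entirely
            }
            columns++;
            if (!gap && q == t) {
                identical++; // same symbol in both rows
            }
        }
        return columns == 0 ? 0.0 : 100.0 * identical / columns;
    }

    public static void main(String[] args) {
        System.out.println(percentIdentity("-ACT-", "AACTA"));
        System.out.println(percentIdentity("-ACT-", "AACTA", true));
    }
}
```

Note that the two denominators give different answers for the same alignment, so it is worth stating which convention is used when reporting a percentage. Ambiguity symbols such as N are treated as ordinary characters here.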
Thanks in advance, Katerina From andreas.draeger at uni-tuebingen.de Tue May 18 19:49:58 2010 From: andreas.draeger at uni-tuebingen.de (Andreas Dräger) Date: Wed, 19 May 2010 08:49:58 +0900 Subject: [Biojava-l] DNA sequence alignment - Percent Identity In-Reply-To: References: Message-ID: <4BF327A6.20405@uni-tuebingen.de> Hi Katerina, > Time (ms): > Length: > Score: > Query: query, Length: > Target: target, Length: > followed by the alignment itself. > > What is more, this result is in a String format so I have to use some string > manipulation methods in Java to extract each value, apart from the score > which is the value returned from the call of the pairwiseAlignment method. I have good and bad news for you. The bad news: So far, you are right. The current release of BioJava provides this information only. But the good news: A new version has already been implemented that provides several get methods. With the help of these you don't even have to calculate the percent identity yourself, because it is also included. How to obtain the new implementation? Please do not use the JAR file of BioJava you just downloaded anymore, but anonymously check out the latest code from the SVN repository. For instructions on how to do that, please see http://biojava.org/wiki/CVS_to_SVN_Migration. I hope this helps. Cheers Andreas -- Dipl.-Bioinform. Andreas Dräger Eberhard Karls University Tübingen Center for Bioinformatics (ZBIT) Sand 1 72076 Tübingen Germany Phone: +49-7071-29-70436 Fax: +49-7071-29-5091 From er.indupandey at gmail.com Thu May 20 04:00:07 2010 From: er.indupandey at gmail.com (indu pandey) Date: Thu, 20 May 2010 01:00:07 -0700 Subject: [Biojava-l] how to get secondry structure of protein Message-ID: hi, Is there any program in BioJava to get the secondary structure of a protein from its amino acid sequence?
thanx and regards indu From Wim.DeSmet at UGent.be Thu May 20 10:58:55 2010 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Thu, 20 May 2010 16:58:55 +0200 Subject: [Biojava-l] handling gap symbols Message-ID: <4BF54E2F.80508@UGent.be> Hello all, I've been trying to figure out how to determine the location of gap symbols in an alignment, but I keep running into trouble determining what is a gap symbol. Apparently there are two different possible gap symbols and they can both appear in the same alignment? An example might make it clearer. Suppose I perform the following alignment (matrix is the EDNA matrix): SequenceAlignment aligner = new NeedlemanWunsch((short) 0, (short) 3, (short) 10, (short) 10, (short) 1, matrix); Sequence first = DNATools.createDNASequence("ACT", "query"); Sequence second = DNATools.createDNASequence("AACTA", "target"); Alignment alignment = aligner.getAlignment(first, second); and obtain the SymbolList for "query", which should look like "-ACT-". I get the following Symbols: AlphabetManager$GapSymbol AlphabetManager$WellKnownAtomicSymbol AlphabetManager$WellKnownAtomicSymbol AlphabetManager$WellKnownAtomicSymbol AlphabetManager$WellKnownGapSymbol AlphabetManager.getGapSymbol() returns AlphabetManager$GapSymbol, while symbolList.getAlphabet().getGapSymbol() returns AlphabetManager$WellKnownGapSymbol. Am I supposed to test against both or is there a bug here somewhere? I'm using BioJava 1.7.1. regards, Wim -- Wim De Smet http://www.straininfo.net/ From andreas at sdsc.edu Thu May 20 18:43:50 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 20 May 2010 15:43:50 -0700 Subject: [Biojava-l] how to get secondry structure of protein In-Reply-To: References: Message-ID: Hi Indu, BioJava currently can't do secondary structure prediction.... Andreas On Thu, May 20, 2010 at 1:00 AM, indu pandey wrote: > hi, > Is there any program in biojava to get the secondry structure of a protein > from amino acid sequence.
> thanx and regards > indu > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas.draeger at uni-tuebingen.de Thu May 20 21:56:16 2010 From: andreas.draeger at uni-tuebingen.de (Andreas Dräger) Date: Fri, 21 May 2010 03:56:16 +0200 Subject: [Biojava-l] handling gap symbols In-Reply-To: <4BF54E2F.80508@UGent.be> References: <4BF54E2F.80508@UGent.be> Message-ID: <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de> Hi Wim, Yes, you are absolutely right. The alignment used two different Gap Symbols. I do not remember the details on this exactly, because the implementation has been massively changed in the meantime. So, if you can check out the latest code from the repository, you will find a version of the alignment algorithms that uses only one kind of Gap Symbol. The old version cannot be changed or further developed anymore, sorry. Many changes were necessary to finally ensure that the Alignment will be gathered in a useful data structure. I strongly recommend not using the Alignment from the currently available release of BioJava, but using the latest version from the SVN repository instead. You can do an anonymous checkout by following the instructions on this web site: http://biojava.org/wiki/CVS_to_SVN_Migration I hope this helps! Best wishes Andreas Dipl.-Bioinform.
Andreas Dräger Eberhard Karls University Tübingen Center for Bioinformatics (ZBIT) Sand 1 72076 Tübingen Germany Phone: +49-7071-29-70436 Fax: +49-7071-29-5091 From phidias51 at gmail.com Fri May 21 18:05:16 2010 From: phidias51 at gmail.com (Mark Fortner) Date: Fri, 21 May 2010 15:05:16 -0700 Subject: [Biojava-l] how to get secondry structure of protein In-Reply-To: References:

Message-ID: I seem to recall that EMBOSS has a secondary structure prediction program. I'm not sure to what extent it would meet your needs, but this might help: http://emboss.sourceforge.net/docs/emboss_tutorial/node4.html#SECTION00430000000000000000 http://emboss.sourceforge.net/apps/release/6.1/emboss/apps/garnier.html I also found this site which listed a number of potential solutions: http://molbiol-tools.ca/Protein_secondary_structure.htm Hope this helps, Mark Fortner blog: http://feeds.feedburner.com/jroller/ideafactory On Thu, May 20, 2010 at 3:43 PM, Andreas Prlic wrote: > Hi Indu, > > BioJava current can't do secondary structure prediction.... > > Andreas > > On Thu, May 20, 2010 at 1:00 AM, indu pandey > wrote: > > hi, > > Is there any program in biojava to get the secondry structure of a > protein > > from amino acid sequence. > > thanx and regards > > indu > > > > -- > ----------------------------------------------------------------------- > Dr. Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > From rick-biojava at rbyers.net Sun May 23 01:31:47 2010 From: rick-biojava at rbyers.net (Rick Byers) Date: Sat, 22 May 2010 22:31:47 -0700 Subject: [Biojava-l] BioJava for SNP analysis? Message-ID: Hi, I'm writing some Java code for analyzing SNP microarray data. I'd like to move to an open-source framework so I can leverage a community. To what extent is BioJava appropriate for SNP analysis?
I don't see any code, documentation or samples that use microarray data (although I do see it listed as a potential 'other' module for BioJava 3). Is it reasonable to just use a Sequence for all the data points and define features for each SNP? Or is there really nothing I'd want to re-use from BioJava beyond the basic DNA alphabet? If BioJava isn't a great framework to use for SNP microarray analysis, does anyone know of an open-source project (preferably Java) that is? Thanks! Rick From buiduyminh at gmail.com Sun May 23 10:31:10 2010 From: buiduyminh at gmail.com (Minh Bui) Date: Sun, 23 May 2010 10:31:10 -0400 Subject: [Biojava-l] Web-based Biojava? Message-ID: Hi everyone, I am pretty new to Java and BioJava. I am doing research for my college, where I have to create a website that allows users to compare the similarities of metagenomic sequences. I am using JavaScript to do that, but I am wondering if I can do this with BioJava? Thank you, From rick-biojava at rbyers.net Sun May 23 17:06:38 2010 From: rick-biojava at rbyers.net (Rick Byers) Date: Sun, 23 May 2010 14:06:38 -0700 Subject: [Biojava-l] Web-based Biojava? In-Reply-To: References: Message-ID: I've been wondering about this myself. In particular, has anyone used BioJava from a JavaFX app or a classic Java applet? I know the corresponding bio platform for .NET (Microsoft Biology Framework) is in the process of being factored to have a core subset that can be used from within a Silverlight web application - seems many people were asking for that. Having a BioJava offering for RIA apps would be great. Rick On Sun, May 23, 2010 at 7:31 AM, Minh Bui wrote: > Hi everyone, > I am pretty new to Java and Biojava. I am doing a research for my college > where i have to create a website that allows users to compare the > similarities of metagenomic sequences. I am using javascript to do that but > I am wondering if I can do this with Biojava?
> > Thank you, > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas at sdsc.edu Sun May 23 18:09:54 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 23 May 2010 15:09:54 -0700 Subject: [Biojava-l] Web-based Biojava? In-Reply-To: References: Message-ID: Hi, It depends what you want to do and how you intend to build up the user interface. E.g. we are using BioJava on the server side to calculate structure alignments. For the html front end the Jmol applet is being used for visualization: http://betastaging.rcsb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_ce_cp&pdb1=1VHR&chain1=A&pdb2=2IHB&chain2=A Andreas From chris at compbio.dundee.ac.uk Mon May 24 06:58:00 2010 From: chris at compbio.dundee.ac.uk (Chris Cole) Date: Mon, 24 May 2010 11:58:00 +0100 Subject: [Biojava-l] how to get secondry structure of protein In-Reply-To: References:

Message-ID: <4BFA5BB8.9010604@compbio.dundee.ac.uk> It depends on what Indu means. No mention is made of a prediction, so all that is required may be a mapping to the PDB repository for structural data. Or, if a prediction is really what is required, then I can suggest the Java-based Jalview (www.jalview.org) tool, which can do predictions via Jpred/Jnet. http://www.compbio.dundee.ac.uk/www-jpred/index.html Jalview is being developed within our lab and I was involved in the development of the latest version of Jpred. Regards, Chris On 21/05/10 23:05, Mark Fortner wrote: > I seem to recall that EMBOSS has a secondary structure prediction program. > I'm not sure to what extent it would meet your needs, but this might help: > > http://emboss.sourceforge.net/docs/emboss_tutorial/node4.html#SECTION00430000000000000000 > > http://emboss.sourceforge.net/apps/release/6.1/emboss/apps/garnier.html > > > I also found this site which listed a number of potential solutions: > http://molbiol-tools.ca/Protein_secondary_structure.htm > > Hope this helps, > > Mark Fortner > > blog: http://feeds.feedburner.com/jroller/ideafactory > > > On Thu, May 20, 2010 at 3:43 PM, Andreas Prlic wrote: > >> Hi Indu, >> >> BioJava current can't do secondary structure prediction.... >> >> Andreas >> >> On Thu, May 20, 2010 at 1:00 AM, indu pandey >> wrote: >>> hi, >>> Is there any program in biojava to get the secondry structure of a >> protein >>> from amino acid sequence. >>> thanx and regards >>> indu From Wim.DeSmet at UGent.be Tue May 25 03:22:09 2010 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Tue, 25 May 2010 09:22:09 +0200 Subject: [Biojava-l] handling gap symbols In-Reply-To: <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de> References: <4BF54E2F.80508@UGent.be> <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de> Message-ID: <4BFB7AA1.8020205@UGent.be> Hi Andreas, Thank you, I'll use that code then.
I don't suppose there's a Maven repository that tracks recent dev versions? regards, Wim On 21-05-10 03:56, Andreas Dräger wrote: > Hi Wim, > > Yes, you are absolutely right. The alignment used two different Gap > Symbols. I do not remember the details on this exactly, because the > implementation has been massively changed in the mean time. So, if you > can check out the latest code from the repository, you will find a > version of the alignment algorithms that does use only one kind of Gap > Symbol. The old version cannot be changed or further developed anymore, > sorry. Many changes were necessary to finally ensure that the Alignment > will be gathered in a useful data structure. I strongly recomment not to > use the Alignment from the currently available release of BioJava but to > use the latest version from the SVN repository. You can do an anonymeous > check out by following the instructions of this web site: > http://biojava.org/wiki/CVS_to_SVN_Migration > > I hope this helps! > > Best wishes > Andreas > > > Dipl.-Bioinform. Andreas Dräger > Eberhard Karls University Tübingen > Center for Bioinformatics (ZBIT) > Sand 1 > 72076 Tübingen > Germany > > Phone: +49-7071-29-70436 > Fax: +49-7071-29-5091 -- Wim De Smet http://www.straininfo.net/ From andreas.draeger at uni-tuebingen.de Tue May 25 04:56:18 2010 From: andreas.draeger at uni-tuebingen.de (Andreas Dräger) Date: Tue, 25 May 2010 17:56:18 +0900 Subject: [Biojava-l] handling gap symbols In-Reply-To: <4BFB7AA1.8020205@UGent.be> References: <4BF54E2F.80508@UGent.be> <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de> <4BFB7AA1.8020205@UGent.be> Message-ID: <4BFB90B2.3020707@uni-tuebingen.de> Wim De Smet wrote: > Hi Andreas, > > Thank you, I'll use that code then. I don't suppose there's a maven > repository that tracks recent dev versions? > > regards, > Wim Dear Wim, I am sorry, I am not sure about that. Maybe somebody else on this list can answer.
Best regards Andreas -- Dipl.-Bioinform. Andreas Dräger Eberhard Karls University Tübingen Center for Bioinformatics (ZBIT) Sand 1 72076 Tübingen Germany Phone: +49-7071-29-70436 Fax: +49-7071-29-5091 From sylvain.foisy at diploide.net Tue May 25 09:10:35 2010 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Tue, 25 May 2010 09:10:35 -0400 Subject: [Biojava-l] BioJava for SNP analysis? Message-ID: Hi, What do you have in mind? Genetic data representation? Database connections, like fetching SNPs from dbSNP? Specific analysis methods? My lab is also into SNPs big time and I have been looking for some like-minded people to start some work on this in BJ3. Best regards Sylvain =================================================================== Sylvain Foisy, Ph. D. Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== From andreas at sdsc.edu Tue May 25 12:53:31 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 25 May 2010 09:53:31 -0700 Subject: [Biojava-l] handling gap symbols In-Reply-To: <4BFB7AA1.8020205@UGent.be> References: <4BF54E2F.80508@UGent.be> <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de> <4BFB7AA1.8020205@UGent.be> Message-ID: Hi Wim, I started to play around with snapshot builds for BioJava. So far, some of the modules from SVN are available from here: http://www.biojava.org/download/maven/ Andreas On Tue, May 25, 2010 at 12:22 AM, Wim De Smet wrote: > Hi Andreas, > > Thank you, I'll use that code then. I don't suppose there's a maven > repository that tracks recent dev versions? > > regards, > Wim > > On 21-05-10 03:56, Andreas Dräger wrote: >> >> Hi Wim, >> >> Yes, you are absolutely right. The alignment used two different Gap >> Symbols.
I do not remember the details on this exactly, because the >> implementation has been massively changed in the mean time. So, if you >> can check out the latest code from the repository, you will find a >> version of the alignment algorithms that does use only one kind of Gap >> Symbol. The old version cannot be changed or further developed anymore, >> sorry. Many changes were necessary to finally ensure that the Alignment >> will be gathered in a useful data structure. I strongly recomment not to >> use the Alignment from the currently available release of BioJava but to >> use the latest version from the SVN repository. You can do an anonymeous >> check out by following the instructions of this web site: >> http://biojava.org/wiki/CVS_to_SVN_Migration >> >> I hope this helps! >> >> Best wishes >> Andreas >> >> >> Dipl.-Bioinform. Andreas Dräger >> Eberhard Karls University Tübingen >> Center for Bioinformatics (ZBIT) >> Sand 1 >> 72076 Tübingen >> Germany >> >> Phone: +49-7071-29-70436 >> Fax: +49-7071-29-5091 > > > -- > Wim De Smet > http://www.straininfo.net/ > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From rick-biojava at rbyers.net Tue May 25 23:50:06 2010 From: rick-biojava at rbyers.net (Rick Byers) Date: Tue, 25 May 2010 20:50:06 -0700 Subject: [Biojava-l] BioJava for SNP analysis? In-Reply-To: References: Message-ID: Hi, This probably seems silly compared to the important work professionals like yourself do, but genetics is currently just a hobby of mine.
I'm mainly interested in analyzing my 23andMe results (and those of my friends and family) as a way to further my education in genetics and make it personally concrete. For example, there are some interesting population genetics results in the literature around population structure which I'd like to apply (and perhaps make available to the personal genetics community where it makes sense). This would include analysis methods like comparing individuals to identify regions half identical-by-descent, finding runs of extended homozygosity, and perhaps machine learning methods (HMM, etc.) for classifying genome regions into population sub-groups. But yes, I'm also interested in dbSNP lookup and connection to other data sources (like HapMap) in order to provide analysis of specific regions beyond what the standard websites give you (e.g. using LD to predict alleles relevant to specific traits/disease from strongly correlated genotyped SNPs). Since BJ doesn't really have anything for this yet, I think I'll continue building some tools on my own to better understand the constraints, and then try to figure out if there is something I can contribute back to BJ. I'm definitely interested to hear if you start something in BJ3 though - I'm happy to try to use it and help where I can. Thanks! Rick On Tue, May 25, 2010 at 6:10 AM, Sylvain Foisy wrote: > Hi, > > What do you have in mind? Genetic data representation? Databse connections > , > like fetching SNP from dbSNP? Specific analysis methods? My lab is also > into > SNP big time and I have been looking for some like-minded people to start > some work on this in BJ3. > > Best regards > > Sylvain > > =================================================================== > > Sylvain Foisy, Ph. D.
> Consultant Bio-informatique / Bioinformatics > Diploide.net - TI pour la vie / IT for Life > > Courriel: sylvain.foisy at diploide.net > Web: http://www.diploide.net > Tel: (514) 893-4363 > =================================================================== > > > From raphael.andre.bauer at gmail.com Wed May 26 04:45:08 2010 From: raphael.andre.bauer at gmail.com (Raphael André Bauer) Date: Wed, 26 May 2010 10:45:08 +0200 Subject: [Biojava-l] SVN repository In-Reply-To: <4BACE08F.8020604@wur.nl> References: <4BA082EC.8010908@wur.nl> <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> <776506315DB04C3EBF2A7FDA610390AB@zillumina> <4BACE08F.8020604@wur.nl> Message-ID: On Fri, Mar 26, 2010 at 6:27 PM, Richard Finkers wrote: > > The repository has been back for two days. But it appears to be down again. Seems to be down again... I tried it yesterday and today... ra From andreas at sdsc.edu Wed May 26 10:54:25 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 26 May 2010 07:54:25 -0700 Subject: [Biojava-l] SVN repository In-Reply-To: References: <4BA082EC.8010908@wur.nl> <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> <776506315DB04C3EBF2A7FDA610390AB@zillumina> <4BACE08F.8020604@wur.nl> Message-ID: Hi Raphael, Thanks for letting us know. I will ping the OBF helpdesk again to give the SVN server a kick. We actually have 2 SVN servers: 1) The server that is used by people with developer accounts is different from the one that is 2) used for anonymous SVN. This second server gets a copy of the developer SVN every hour (or so). Luckily, the developer SVN is stable, but the anonymous copy is hosted on a virtual machine which unfortunately has been quite unreliable in the last few months.
Since the developers are on a more stable and independent system, we usually don't notice if the anonymous server is down. Because the anonymous SVN is having issues, we are now also providing a mirror of the (developer) SVN on GitHub. This means you can * get the code using Git from http://github.com/biojava/biojava * as an alternative there is also SVN access: svn list http://svn.github.com/biojava/biojava.git I will add the two GitHub links to our download documentation Andreas On Wed, May 26, 2010 at 1:45 AM, Raphael André Bauer wrote: > On Fri, Mar 26, 2010 at 6:27 PM, Richard Finkers wrote: >> >> The repository has been back for two days. But it appears to be down again. > > Seems to be down again... I tried it yesterday and today... > > ra > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Wed May 26 13:17:04 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 26 May 2010 10:17:04 -0700 Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands In-Reply-To: References: Message-ID: Hi Andy, I have updated the BioJava structure data model to support the PDB chemical component dictionary. This has the benefit that now * Chemically modified amino acids can be detected (and treated as amino acids, rather than Hetatom groups) * It is possible to get a component type for each Group, which makes it possible to identify ligands. As a consequence, the number of amino acids in a chain can change compared to the previous data representation. For that reason, the loading of chem. comps is set to "false" by default.
It can be configured with the "loadChemCompInfo" flag in the PDB/mmCIF file parsers. You can get the code either from SVN, or I also uploaded it to the (slightly experimental) Maven repository at http://www.biojava.org/download/maven/ . Andreas From kstillou at gmail.com Wed May 26 20:31:32 2010 From: kstillou at gmail.com (Katerina Stillou) Date: Thu, 27 May 2010 03:31:32 +0300 Subject: [Biojava-l] Build.xml Message-ID: Hello, first of all excuse me if this question is really silly. I downloaded the latest version of BioJava from SVN, but when I tried to build the library using Ant I got this message saying "Buildfile: build.xml does not exist! Build failed". So my question is where can I find this build.xml file. The folder biojava-live I downloaded from SVN certainly does not contain it. Have I overlooked something here? Thanks in advance, Katerina From andreas at sdsc.edu Thu May 27 12:14:49 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 27 May 2010 09:14:49 -0700 Subject: [Biojava-l] Build.xml In-Reply-To: References: Message-ID: Hi Katerina, The upcoming BioJava 3 uses Maven rather than Ant as its build tool. In order to compile the code you will have to run "mvn" on the command line, or even better, install an IDE of your choice and let it do the work for you. Having said that, at present not all of the BioJava modules from SVN will compile, so you will need to take a look at whatever functionality you are mainly interested in... Andreas On Wed, May 26, 2010 at 5:31 PM, Katerina Stillou wrote: > Hello, > first of all excuse me if this question is really silly. I downloaded from > SVN the latest version of Biojava, but when I tried to build the library > using ant I got this message saying "Buildfile: build.xml does not exist! > Build failed". So my question is where can I find this build.xml file. The > folder biojava-live I downloaded from SVN certainly does not contain it. > Have I overlooked something here?
> Thank in advance, > Katerina > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From raphael.andre.bauer at gmail.com Fri May 28 06:33:45 2010 From: raphael.andre.bauer at gmail.com (Raphael André Bauer) Date: Fri, 28 May 2010 12:33:45 +0200 Subject: [Biojava-l] SVN repository In-Reply-To: References: <4BA082EC.8010908@wur.nl> <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> <776506315DB04C3EBF2A7FDA610390AB@zillumina> <4BACE08F.8020604@wur.nl> Message-ID: On Wed, May 26, 2010 at 4:54 PM, Andreas Prlic wrote: > Hi Raphael, > > Thanks for letting us know. I will ping the OBF helpdesk again to give > the SVN server a kick. > > We actually have 2 SVN servers: > 1) The server that is used by people with developer accounts is > different from the one that is > 2) used for anonymous SVN. This second server gets a copy of the > developer SVN every hour (or so). > > Luckily, the developer SVN is stable, but the anonymous copy is hosted > on a virtual machine which unfortunately has been quite unreliable in > the last few months. Since the developers are on a more stable and > independent system, we usually don't notice if the anonymous server is > down. > > Because the anonymous SVN is having issues we are now providing also a > mirror of the (developer) SVN on github.
This means you can > > * get the code using GIT from http://github.com/biojava/biojava > * as an alternative there is also SVN access: svn list > http://svn.github.com/biojava/biojava.git > > I will add the two github links to our download documentation Seems the server at open-bio is up and running again :) Thanks a lot! raphael From Wim.DeSmet at UGent.be Tue May 25 07:47:35 2010 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Tue, 25 May 2010 13:47:35 +0200 Subject: [Biojava-l] handling gap symbols In-Reply-To: <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de> References: <4BF54E2F.80508@UGent.be> <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de> Message-ID: <4BFBB8D7.2080208@UGent.be> Hi Andreas, List, On 21-05-10 03:56, Andreas Dräger wrote: > Hi Wim, > > Yes, you are absolutely right. The alignment used two different Gap > Symbols. I do not remember the details on this exactly, because the > implementation has been massively changed in the mean time. So, if you > can check out the latest code from the repository, you will find a > version of the alignment algorithms that does use only one kind of Gap > Symbol. The old version cannot be changed or further developed anymore, > sorry. Many changes were necessary to finally ensure that the Alignment > will be gathered in a useful data structure. I strongly recomment not to > use the Alignment from the currently available release of BioJava but to > use the latest version from the SVN repository. You can do an anonymeous > check out by following the instructions of this web site: > http://biojava.org/wiki/CVS_to_SVN_Migration I've checked out the latest code from the Git repository, but I'm having some trouble getting it to compile properly. I hope this is the right place to ask more questions. My first question is whether I really need all of the modules for this simple alignment, or whether I can just stick to one of them (sequence? alignment?).

The first problem I'm having building the full package is a failing test in
structure (testFilterDuplicateAFPs) and in the das module
(testUniProtServer, testParseSourcesResponse). Skipping those, I also found a
small file encoding bug (a source file is encoded as ISO-8859-1 and the
compiler is using either the platform encoding or default UTF-8); I've
attached a small patch for this.

The second problem (after skipping the tests) is apparently a reference
problem to the core module from DNATools. I keep getting:
.../sequence/sequence-dna/src/main/java/org/biojava3/seq/dna/DNATools.java:[29,31]
package org.biojava3.core.symbol does not exist

Indeed, the biojava-3 module doesn't contain that package.

thanks for any help,
Wim
--
Wim De Smet
http://www.straininfo.net/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: das_utf8.patch
Type: text/x-patch
Size: 663 bytes
Desc: not available
URL:

From andreas at sdsc.edu Fri May 28 20:08:43 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 28 May 2010 17:08:43 -0700
Subject: [Biojava-l] handling gap symbols
In-Reply-To: <4BFBB8D7.2080208@UGent.be>
References: <4BF54E2F.80508@UGent.be>
 <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de>
 <4BFBB8D7.2080208@UGent.be>
Message-ID:

Hi Wim,

> I've checked out the latest code from the git repository, but I'm having
> some trouble getting it to compile properly. I hope this is the right place
> to ask more questions.

Yes. Unfortunately, not all modules compile at present. I will
delete one or two of the broken modules over the next few days; they are
dead code that remained after refactoring...

The good news is that due to the modularization you don't need all of
the modules to compile. Just to take a look at the alignment module,
you will need core and alignment, which both compile fine....

You could also get the snapshot of the alignment module (and some
others that compile) via Maven from
http://www.biojava.org/download/maven/

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From jdlkem at gmail.com Sat May 29 15:37:35 2010
From: jdlkem at gmail.com (jdlkem)
Date: Sat, 29 May 2010 15:37:35 -0400
Subject: [Biojava-l] Summer Coding Project
Message-ID:

Hello,

My name is Jonathan, and I'm a rising junior at Harvard University. My
interests lie primarily in all of the sciences, with a slight emphasis on
chemistry and computer science.
This summer I am working on a project with the support of the Molgenis
team to extend XGAP (http://xgap.org/) as a DAS service. I will be
blogging my progress at http://blog.jdl-kem.com/.

Best,
-Jonathan

From thomascramera at dnastar.com Fri May 7 17:55:52 2010
From: thomascramera at dnastar.com (Andy Thomas-Cramer)
Date: Fri, 7 May 2010 12:55:52 -0500
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hetatom groups are also used to represent modified residues in chains. I
would like to obtain either the ligand atoms/groups without the sequence,
or the sequence atoms/groups without the ligands.

Chain.getSeqResGroups() reliably returns an empty list when alignment
fails and for non-amino sequences.

Examples of the former include 193D (chain C) and 7EST (chain I). Both of
these contain HETATMs both as modified residues and as ligands. Alignment
fails in both.

Interestingly, 193D's chain D is identical to chain C -- but its alignment
succeeds. One difference is that C has an associated ligand and D does
not. Are the ligand atom groups associated with a chain considered during
alignment?

From andreas at sdsc.edu Fri May 7 22:27:39 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 7 May 2010 15:27:39 -0700
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hi Andy,

I see what you intend to do. If you want just the HETATOM groups, you
can request them with

Chain.getAtomGroups(GroupType.HETATM);

or just amino acids groups you can request them with

Chain.getAtomGroups(GroupType.AMINOACID);

same would work for getSeqResGroups(...) as well, but then your two
examples are quite specific:

193D is an antibiotic/DNA complex.
7EST chain I is a
TRIFLUOROACETYL-*L-*LEUCYL-*L-*ALANYL-P-TRIFLUOROMETHYLPHENYLANILIDE

Hetatoms are represented as Xs during the sequence alignments. I can
easily fix the "failing" alignment in this case, by ignoring the
wrongly aligned Hetatom Xs (patch just committed to SVN...). Not sure
if it makes any biological difference in your two examples.

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From jake at researchtogether.com Mon May 10 22:36:52 2010
From: jake at researchtogether.com (jake at researchtogether.com)
Date: Mon, 10 May 2010 23:36:52 +0100
Subject: [Biojava-l] Idle Developer Volunteering
Message-ID: <20100510223652.GB30216@researchtogether.com>

Hi All,

I've got some free time and I'd like to help out on this project, but I'm not sure how active the community is. Is the bug list accurate as they seem a bit old?

Also, if anyone wants any coding doing I'd be glad to help out.

Cheers,
Jake

From andreas at sdsc.edu Tue May 11 00:49:01 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 10 May 2010 17:49:01 -0700
Subject: [Biojava-l] Idle Developer Volunteering
In-Reply-To: <20100510223652.GB30216@researchtogether.com>
References: <20100510223652.GB30216@researchtogether.com>
Message-ID:

Hi Jake,

thanks for your interest. We are always looking for contributors of
code, documentation or helping out with questions on the mailing
lists. The bug tracking system is currently a bit quiet, since we are
working on a major new version: BioJava 3. The current status is
documented here:

http://www.biojava.org/wiki/BioJava:Modules

Perhaps you can take a look there and see how this is aligned with
your research interests...

Andreas

On Mon, May 10, 2010 at 3:36 PM, wrote:
> Hi All,
>
> I've got some free time and I'd like to help out on this project, but I'm not sure how active the community is. Is the bug list accurate as they seem a bit old?
>
> Also, if anyone wants any coding doing I'd be glad to help out.
>
> Cheers,
> Jake

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From wangh5 at muohio.edu Tue May 11 19:21:54 2010
From: wangh5 at muohio.edu (Wang, Han)
Date: Tue, 11 May 2010 15:21:54 -0400
Subject: [Biojava-l] Getting numeric quality values from fastq
Message-ID:

Dear someone concerned:

I am new to biojava. I faced some problems on the numeric values of the
fastq file. I read the biojava API and found how to read in the fastq
file and get the quality from each fastq read. Unfortunately, it just
reads the original quality string rather than numeric quality values.
Can someone give me some help on this problem? I would really
appreciate it.

Sincerely
Han

From idoerg at gmail.com Wed May 12 21:22:20 2010
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed, 12 May 2010 17:22:20 -0400
Subject: [Biojava-l] FASTQ in biojava
Message-ID:

Hi,

We're trying to get the numeric values from the biojava fastq reader. As we
understand the API, getQuality only supplies the ASCII value of the quality
string. How do we get the actual Q numeric values?

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/Fastq.html#getQuality%28%29

Thanks,

Iddo

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html

From heuermh at acm.org Thu May 13 05:12:44 2010
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 13 May 2010 01:12:44 -0400 (EDT)
Subject: [Biojava-l] FASTQ in biojava
In-Reply-To:
Message-ID:

On Tue, 11 May 2010, Wang, Han wrote:

> I am new to biojava. I faced some problems on the numeric values of the
> fastq file.
> I read the biojava API and found how to read in the fastq
> file and get the quality from each fastq read. Unfortunately, it just
> reads the original quality string rather than numeric
> quality values. Can someone give me some help on this problem? I would
> really appreciate it.

On Wed, 12 May 2010, Iddo Friedberg wrote:

> We're trying to get the numeric values from the biojava fastq reader. As we
> understand the API, getQuality only supplies the ASCII value of the quality
> string. How do we get the actual Q numeric values?

That is correct, the current fastq package just handles IO to/from the
Fastq memento class and conversion between different variants of the
FASTQ format.

The next step would be to go from a Fastq record to a Sequence with
quality scores. I haven't written that part yet, guess I should get on
it. :)

   michael

From jeedward at yahoo.com Fri May 14 21:41:40 2010
From: jeedward at yahoo.com (John Edward)
Date: Fri, 14 May 2010 14:41:40 -0700 (PDT)
Subject: [Biojava-l] Call for papers: BCBGC-10, USA, July 2010
Message-ID: <933238.27005.qm@web45913.mail.sp1.yahoo.com>

It would be highly appreciated if you could share this announcement with
your colleagues, students and individuals whose research is in
bioinformatics, computational biology, genomics, data-mining, and related
areas.

Call for papers: BCBGC-10, USA, July 2010

The 2010 International Conference on Bioinformatics, Computational
Biology, Genomics and Chemoinformatics (BCBGC-10) (website:
http://www.PromoteResearch.org) will be held during 12-14 of July 2010 in
Orlando, FL, USA. BCBGC is an important event in the areas of
bioinformatics, computational biology, genomics and chemoinformatics and
focuses on all areas related to the conference.

The conference will be held at the same time and location where several
other major international conferences will be taking place. The
conference will be held as part of 2010 multi-conference (MULTICONF-10).
MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida,
USA. The primary goal of MULTICONF is to promote research and
developmental activities in computer science, information technology,
control engineering, and related fields. Another goal is to promote the
dissemination of research to a multidisciplinary audience and to
facilitate communication among researchers, developers, practitioners in
different fields. The following conferences are planned to be organized as
part of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
* International Conference on Software Engineering Theory and Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a
full-service resort that puts you in the middle of the fun! Located 1/2
block south of the famed International Drive, the hotel is just minutes
from great entertainment like Walt Disney World® Resort, Universal Studios
and Sea World Orlando. Guests can enjoy free scheduled transportation to
these theme parks, as well as spacious accommodations, outdoor pools and
on-site dining - all situated on 10 tropically landscaped acres. Here,
guests can experience a full-service resort with discount hotel pricing in
Orlando.

We invite draft paper submissions.
Please see the website http://www.PromoteResearch.org for more details.

Sincerely
John Edward

From holland at eaglegenomics.com Mon May 17 18:47:10 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 17 May 2010 19:47:10 +0100
Subject: [Biojava-l] Call for papers: BCBGC-10, USA, July 2010
In-Reply-To: <933238.27005.qm@web45913.mail.sp1.yahoo.com>
References: <933238.27005.qm@web45913.mail.sp1.yahoo.com>
Message-ID: <4C7C7F85-FEB3-43D8-BD44-27201E00DA4B@eaglegenomics.com>

Who is doing the BioJava talk at BOSC this year? I'll be at BCBGC-10 so I can present the same talk there if you like.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From thomascramera at dnastar.com Mon May 17 22:46:54 2010
From: thomascramera at dnastar.com (Andy Thomas-Cramer)
Date: Mon, 17 May 2010 17:46:54 -0500
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hi Andreas.

I tried the new code. Although it allows the alignment to complete, it provides a result different than for an identical sequence without an associated ligand. See results below.

I have tried Chain.getAtomGroups(GroupType.HETATM). However, it provides the set of het atom groups -- which includes both modified residues in the chain and ligands outside the chain. I need either the latter only, or the chain only.

For example, 193D includes these two identical sequences:

SEQRES   1 C    5  HQU DSN ALA NCY CPC
SEQRES   1 D    5  HQU DSN ALA NCY CPC

And chain C has an associated ligand, NBU, which is not part of the sequence.

Let:
* "BioJava SEQRES" = chain.getSeqResGroups()
* "BioJava HETATM" = chain.getAtomGroups(GroupType.HETATM)

Then I get the following results with the new code:

Chain: C
  Actual ligands:  NBU
  Actual SEQRES:   HQU DSN ALA NCY CPC
  BioJava SEQRES:  HQU DSN ALA CPC NBU <--
                               ^^^
  BioJava HETATM:  HQU DSN NCY CPC NBU

Chain: D
  Actual ligands:  None
  Actual SEQRES:   HQU DSN ALA NCY CPC
  BioJava SEQRES:  HQU DSN ALA NCY CPC
  BioJava HETATM:  HQU DSN NCY CPC

I'm looking for the "Actual" lines above.

Issues:
* In chain C only, BioJava omits the actual residue NCY in getSeqResGroups().
* In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups().
* Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results.
* There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method Chain.getAtomGroups(GroupType.HETATM) provides neither.

From andreas at sdsc.edu Tue May 18 02:04:00 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 17 May 2010 19:04:00 -0700
Subject: [Biojava-l] PDBFileParser and identifying atoms in ligands
In-Reply-To:
References:
Message-ID:

Hi Andy,

- There are a few things to discuss about the 193D example. This is a
special case. If you investigate the details it appears that the NBU
is actually covalently bound to chain C and not a free ligand. It is
one of the cases where it is difficult to draw the line between what
is a ligand and what is a chemically modified peptide (oh joy)

> * There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method
> Chain.getAtomGroups(GroupType.HETATM) provides neither.

- The best way to determine ligands is using the Chemical Component
Dictionary. Currently the BioJava PDB parser is not using this yet.
It contains a lot of additional info for modified residues and ligands
(e.g. http://www.rcsb.org/pdb/files/ligand/NBU.cif to get the data for
the NBU group). I will add support for this to the parser in the
next couple of days (e.g. the group type can be used to distinguish
chemically modified residues from other ligands). I did some initial
work on this already in the past, but it is not hooked up with the PDB
parser at present.

- Another way to determine ligands is to investigate the various bonds
within the protein. BioJava currently can't do that either, but we
would like to add this at some point in the future...

- Just to repeat myself: TER is not a good criterion to determine
ligands. I have seen cases in the past where authors used it to
indicate an interruption in the main chain, since they could not
experimentally observe the position of a loop region.
The main chain did continue after the TER...

> * In chain C only, BioJava omits the actual residue NCY in getSeqResGroups().
> * In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups().
> * Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results.

- All these points are actually caused by the same issue: the attempt to match up ATOM and SEQRES sequences. The chains contain mostly hetatoms, which are represented as "X" in the alignment. This makes it difficult to align them correctly. I will investigate whether using the chem. comp. dictionary one_letter_code or the mmcif group parent->one_letter_code will make the alignment more useful here...

Andreas

On Mon, May 17, 2010 at 3:46 PM, Andy Thomas-Cramer wrote:
>
> Hi Andreas.
>
> I tried the new code. Although it allows the alignment to complete, it provides a result different than for an identical sequence without an associated ligand. See results below.
>
> I have tried Chain.getAtomGroups(GroupType.HETATM). However, it provides the set of het atom groups -- which includes both modified residues in the chain and ligands outside the chain. I need either the latter only, or the chain only.
>
> For example, 193D includes these two identical sequences:
>
> SEQRES   1 C    5  HQU DSN ALA NCY CPC
> SEQRES   1 D    5  HQU DSN ALA NCY CPC
>
> And chain C has an associated ligand, NBU, which is not part of the sequence.
>
> Let:
> * "BioJava SEQRES" = chain.getSeqResGroups()
> * "BioJava HETATM" = chain.getAtomGroups(GroupType.HETATM)
>
> Then I get the following results with the new code:
>
> Chain: C
>   Actual ligands:  NBU
>   Actual SEQRES:   HQU DSN ALA NCY CPC
>   BioJava SEQRES:  HQU DSN ALA CPC NBU <--
>                                ^^^
>   BioJava HETATM:  HQU DSN NCY CPC NBU
>
> Chain: D
>   Actual ligands:  None
>   Actual SEQRES:   HQU DSN ALA NCY CPC
>   BioJava SEQRES:  HQU DSN ALA NCY CPC
>   BioJava HETATM:  HQU DSN NCY CPC
>
> I'm looking for the "Actual" lines above.
>
> Issues:
> * In chain C only, BioJava omits the actual residue NCY in getSeqResGroups().
> * In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups().
> * Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results.
> * There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method Chain.getAtomGroups(GroupType.HETATM) provides neither.
>
>
> -----Original Message-----
> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
> Sent: Friday, May 07, 2010 5:28 PM
> To: Andy Thomas-Cramer
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] PDBFileParser and identifying atoms in ligands
>
> Hi Andy,
>
> I see what you intend to do. If you want just the HETATOM groups, you
> can request them with
>
> Chain.getAtomGroups(GroupType.HETATM);
>
> or, for just the amino acid groups, you can request them with
>
> Chain.getAtomGroups(GroupType.AMINOACID);
>
> The same would work for getSeqResGroups(...) as well, but then your two
> examples are quite specific:
>
> 193D is an antibiotic/DNA complex.
> 7EST chain I is a
> TRIFLUOROACETYL-*L-*LEUCYL-*L-*ALANYL-P-TRIFLUOROMETHYLPHENYLANILIDE
>
> Hetatoms are represented as Xs during the sequence alignments. I can
> easily fix the "failing" alignment in this case, by ignoring the
> wrongly aligned Hetatom Xs (patch just committed to SVN...). Not sure
> if it makes any biological difference in your two examples.
>
> Andreas
>
>
> On Fri, May 7, 2010 at 10:55 AM, Andy Thomas-Cramer wrote:
>>
>> Hetatom groups are also used to represent modified residues in chains. I would like to obtain either the ligand atoms/groups without the sequence, or the sequence atoms/groups without the ligands.
>>
>> Chain.getSeqResGroups() reliably returns an empty list when alignment fails and for non-amino sequences.
>>
>> Examples of the former include 193D (chain C) and 7EST (chain I). Both of these contain HETATMs both as modified residues and as ligands. Alignment fails in both.
>>
>> Interestingly, 193D's chain D is identical to chain C -- but its alignment succeeds. One difference is that C has an associated ligand and D does not. Are the ligand atom groups associated with a chain considered during alignment?
>>

From kstillou at gmail.com Tue May 18 19:07:53 2010
From: kstillou at gmail.com (Katerina Stillou)
Date: Tue, 18 May 2010 22:07:53 +0300
Subject: [Biojava-l] DNA sequence alignment - Percent Identity
Message-ID:

Hello,

I am fairly new to BioJava and I have recently encountered a problem concerning the results of the method pairwiseAlignment. It is my impression, and please do correct me if I am wrong, that the only results I can get from this class are:

Time (ms):
Length:
Score:
Query: query, Length:
Target: target, Length:

followed by the alignment itself.

What is more, this result is in a String format, so I have to use some string manipulation methods in Java to extract each value, apart from the score, which is the value returned from the call of the pairwiseAlignment method.

However, what I am really interested in is finding the percent identity of the two sequences. Therefore, I would be grateful to anyone who could point out a way to compute this percentage using the data returned from the alignment. From what I have gathered by searching the internet, I need at least one of these: # of identical positions, # of aligned positions. Is it possible that the number of identical positions is the total number of " | " in the result of getAlignmentString()? Yep, I am really confused.

Some more information on my code: I am using the exact code presented in the BioJava Cookbook for global alignment with the NUC.4.4 substitution matrix.
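
To make the percent identity question above concrete, here is a minimal sketch that does not rely on any BioJava API: it assumes the two gapped rows of the alignment have already been extracted as plain strings of equal length and counts identical columns. The class and method names, the example rows, and the choice of '-' and '~' as gap characters are assumptions made purely for illustration. Conventions also differ on the denominator; this sketch divides by the alignment length, while other definitions use the length of the shorter sequence.

```java
class PercentIdentity {

    /**
     * Percent identity over all alignment columns:
     * (columns where both rows carry the same non-gap character) / (total columns) * 100.
     * gapChars lists every character to be treated as a gap (an assumption, e.g. "-~").
     */
    public static double percentIdentity(String alignedQuery, String alignedTarget, String gapChars) {
        if (alignedQuery.length() != alignedTarget.length()) {
            throw new IllegalArgumentException("Aligned rows must have the same length");
        }
        if (alignedQuery.isEmpty()) {
            return 0.0;
        }
        int identical = 0;
        for (int i = 0; i < alignedQuery.length(); i++) {
            char q = Character.toUpperCase(alignedQuery.charAt(i));
            char t = Character.toUpperCase(alignedTarget.charAt(i));
            // A column counts only if both rows agree and the shared character is not a gap.
            if (q == t && gapChars.indexOf(q) < 0) {
                identical++;
            }
        }
        return 100.0 * identical / alignedQuery.length();
    }

    public static void main(String[] args) {
        // Hypothetical gapped rows: 3 identical columns out of 5.
        String query = "-ACT-";
        String target = "AACTA";
        System.out.println(percentIdentity(query, target, "-~")); // prints 60.0
    }
}
```

Dividing by the number of columns where neither row has a gap would instead give the identity over aligned positions only; which denominator is appropriate depends on how the figure will be compared with other tools.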
Thanks in advance,
Katerina

From andreas.draeger at uni-tuebingen.de Tue May 18 23:49:58 2010
From: andreas.draeger at uni-tuebingen.de (Andreas Dräger)
Date: Wed, 19 May 2010 08:49:58 +0900
Subject: [Biojava-l] DNA sequence alignment - Percent Identity
In-Reply-To:
References:
Message-ID: <4BF327A6.20405@uni-tuebingen.de>

Hi Katerina,

> Time (ms):
> Length:
> Score:
> Query: query, Length:
> Target: target, Length:
> followed by the alignment itself.
>
> What is more, this result is in a String format so I have to use some string
> manipulation methods in Java to extract each value, apart from the score
> which is the value returned from the call of the pairwiseAlignment method.

I have good and bad news for you. The bad news: so far, you are right. The current release of BioJava provides only this information. The good news: a new version has already been implemented that provides several get methods. With these, you don't even have to calculate the percent identity yourself, because it is also included.

How do you obtain the new implementation? Please stop using the BioJava JAR file you downloaded and instead anonymously check out the latest code from the SVN repository. For instructions on how to do that, please see http://biojava.org/wiki/CVS_to_SVN_Migration.

I hope this helps.

Cheers
Andreas

--
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-70436
Fax: +49-7071-29-5091

From er.indupandey at gmail.com Thu May 20 08:00:07 2010
From: er.indupandey at gmail.com (indu pandey)
Date: Thu, 20 May 2010 01:00:07 -0700
Subject: [Biojava-l] how to get secondry structure of protein
Message-ID:

Hi,

Is there any program in BioJava to get the secondary structure of a protein from its amino acid sequence?
Thanks and regards,
indu

From Wim.DeSmet at UGent.be Thu May 20 14:58:55 2010
From: Wim.DeSmet at UGent.be (Wim De Smet)
Date: Thu, 20 May 2010 16:58:55 +0200
Subject: [Biojava-l] handling gap symbols
Message-ID: <4BF54E2F.80508@UGent.be>

Hello all,

I've been trying to figure out how to determine the location of gap symbols in an alignment, but I keep running into trouble determining what is a gap symbol. Apparently there are two different possible gap symbols, and they can both appear in the same alignment?

An example might make it clearer. Suppose I perform the following alignment (matrix is the EDNA matrix):

    SequenceAlignment aligner = new NeedlemanWunsch((short) 0, (short) 3, (short) 10, (short) 10, (short) 1, matrix);
    Sequence first = DNATools.createDNASequence("ACT", "query");
    Sequence second = DNATools.createDNASequence("AACTA", "target");
    Alignment alignment = aligner.getAlignment(first, second);

When I obtain the symbol list for "query", which should look like "-ACT-", I get the following Symbols:

    AlphabetManager$GapSymbol
    AlphabetManager$WellKnownAtomicSymbol
    AlphabetManager$WellKnownAtomicSymbol
    AlphabetManager$WellKnownAtomicSymbol
    AlphabetManager$WellKnownGapSymbol

AlphabetManager.getGapSymbol() returns AlphabetManager$GapSymbol, while symbolList.getAlphabet().getGapSymbol() returns AlphabetManager$WellKnownGapSymbol. Am I supposed to test against both, or is there a bug here somewhere? I'm using BioJava 1.7.1.

regards,
Wim

--
Wim De Smet
http://www.straininfo.net/

From andreas at sdsc.edu Thu May 20 22:43:50 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 20 May 2010 15:43:50 -0700
Subject: [Biojava-l] how to get secondry structure of protein
In-Reply-To:
References:
Message-ID:

Hi Indu,

BioJava currently can't do secondary structure prediction....

Andreas

On Thu, May 20, 2010 at 1:00 AM, indu pandey wrote:
> hi,
> Is there any program in biojava to get the secondry structure of a protein
> from amino acid sequence.
> thanx and regards
> indu

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From andreas.draeger at uni-tuebingen.de Fri May 21 01:56:16 2010
From: andreas.draeger at uni-tuebingen.de (Andreas Dräger)
Date: Fri, 21 May 2010 03:56:16 +0200
Subject: [Biojava-l] handling gap symbols
In-Reply-To: <4BF54E2F.80508@UGent.be>
References: <4BF54E2F.80508@UGent.be>
Message-ID: <20100521035616.11549el4m75jbby8@webmail.uni-tuebingen.de>

Hi Wim,

Yes, you are absolutely right. The alignment used two different gap symbols. I do not remember the details exactly, because the implementation has changed massively in the meantime. If you check out the latest code from the repository, you will find a version of the alignment algorithms that uses only one kind of gap symbol. The old version cannot be changed or developed further, sorry. Many changes were necessary to finally ensure that the alignment is gathered in a useful data structure.

I strongly recommend not using the Alignment from the currently available release of BioJava, but rather the latest version from the SVN repository. You can do an anonymous checkout by following the instructions on this web site:
http://biojava.org/wiki/CVS_to_SVN_Migration

I hope this helps!

Best wishes
Andreas

Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-70436
Fax: +49-7071-29-5091

From phidias51 at gmail.com Fri May 21 22:05:16 2010
From: phidias51 at gmail.com (Mark Fortner)
Date: Fri, 21 May 2010 15:05:16 -0700
Subject: [Biojava-l] how to get secondry structure of protein
In-Reply-To:
References:
Message-ID:

I seem to recall that EMBOSS has a secondary structure prediction program. I'm not sure to what extent it would meet your needs, but this might help:
http://emboss.sourceforge.net/docs/emboss_tutorial/node4.html#SECTION00430000000000000000
http://emboss.sourceforge.net/apps/release/6.1/emboss/apps/garnier.html

I also found this site, which lists a number of potential solutions:
http://molbiol-tools.ca/Protein_secondary_structure.htm

Hope this helps,

Mark Fortner

blog: http://feeds.feedburner.com/jroller/ideafactory

On Thu, May 20, 2010 at 3:43 PM, Andreas Prlic wrote:
> Hi Indu,
>
> BioJava current can't do secondary structure prediction....
>
> Andreas
>
> On Thu, May 20, 2010 at 1:00 AM, indu pandey wrote:
> > hi,
> > Is there any program in biojava to get the secondry structure of a protein
> > from amino acid sequence.
> > thanx and regards
> > indu

From rick-biojava at rbyers.net Sun May 23 05:31:47 2010
From: rick-biojava at rbyers.net (Rick Byers)
Date: Sat, 22 May 2010 22:31:47 -0700
Subject: [Biojava-l] BioJava for SNP analysis?
Message-ID:

Hi,

I'm writing some Java code for analyzing SNP microarray data. I'd like to move to an open-source framework so I can leverage a community. To what extent is BioJava appropriate for SNP analysis?
I don't see any code, documentation or samples that use microarray data (although I do see it listed as a potential 'other' module for BioJava 3). Is it reasonable to just use a Sequence for all the data points and define features for each SNP? Or is there really nothing I'd want to re-use from BioJava beyond the basic DNA alphabet?

If BioJava isn't a great framework to use for SNP microarray analysis, does anyone know of an open-source project (preferably Java) that is?

Thanks!
Rick

From buiduyminh at gmail.com Sun May 23 14:31:10 2010
From: buiduyminh at gmail.com (Minh Bui)
Date: Sun, 23 May 2010 10:31:10 -0400
Subject: [Biojava-l] Web-based Biojava?
Message-ID:

Hi everyone,

I am pretty new to Java and BioJava. I am doing research for my college in which I have to create a website that allows users to compare the similarities of metagenomic sequences. I am using JavaScript to do that, but I am wondering if I can do this with BioJava?

Thank you,

From rick-biojava at rbyers.net Sun May 23 21:06:38 2010
From: rick-biojava at rbyers.net (Rick Byers)
Date: Sun, 23 May 2010 14:06:38 -0700
Subject: [Biojava-l] Web-based Biojava?
In-Reply-To:
References:
Message-ID:

I've been wondering about this myself. In particular, has anyone used BioJava from a JavaFX app or a classic Java applet? I know the corresponding bio platform for .NET (Microsoft Biology Framework) is in the process of being factored to have a core subset that can be used from within a Silverlight web application - it seems many people were asking for that. Having a BioJava offering for RIA apps would be great.

Rick

On Sun, May 23, 2010 at 7:31 AM, Minh Bui wrote:
> Hi everyone,
> I am pretty new to Java and Biojava. I am doing a research for my college
> where i have to create a website that allows users to compare the
> similarities of metagenomic sequences. I am using javascript to do that but
> I am wondering if I can do this with Biojava?
>
> Thank you,

From andreas at sdsc.edu Sun May 23 22:09:54 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 23 May 2010 15:09:54 -0700
Subject: [Biojava-l] Web-based Biojava?
In-Reply-To:
References:
Message-ID:

Hi,

It depends on what you want to do and how you intend to build the user interface. For example, we are using BioJava on the server side to calculate structure alignments. For the HTML front end, the Jmol applet is used for visualization:

http://betastaging.rcsb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_ce_cp&pdb1=1VHR&chain1=A&pdb2=2IHB&chain2=A

Andreas

From chris at compbio.dundee.ac.uk Mon May 24 10:58:00 2010
From: chris at compbio.dundee.ac.uk (Chris Cole)
Date: Mon, 24 May 2010 11:58:00 +0100
Subject: [Biojava-l] how to get secondry structure of protein
In-Reply-To:
References: