From khalil.elmazouari at gmail.com Sun Jun 9 15:32:32 2013
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Sun, 9 Jun 2013 21:32:32 +0200
Subject: [Biojava-l] Local aln - contig assembly
Message-ID: <9174D0A2-2838-4C5D-9426-8BB195FF8B88@gmail.com>

Hi,

I am trying to assemble overlapping sequences (direct and reverse) via local
alignment. I am only looking for local alignments with 100% identity.

Which parameters, matrix, etc. should I use to get a local alignment with
100% identity?

Any other suggestions for assembling overlapping sequences (in Java) are welcome.

Thanks

khalil

    SubstitutionMatrix matrix = SubstitutionMatrixHelper.getNuc4_2();
    SimpleGapPenalty gapP = new SimpleGapPenalty();
    gapP.setOpenPenalty((short) 5);
    gapP.setExtensionPenalty((short) 1);
    SequencePair psa = Alignments.getPairwiseAlignment(query, target,
            PairwiseSequenceAlignerType.LOCAL, gapP, matrix);

========

Local Alignment Identity: 97.84688995215312%

query   GGGGAAAACACGAAAGGCCCTTGGTGGAGGCGCTTGAGACGGTGACAAGGGTTCCCTGGC 68
        |||||| || |||  ||||||||||||||||||||||||||||||| |||||||||||||
target  GGGGAAGAC-CGATGGGCCCTTGGTGGAGGCGCTTGAGACGGTGACCAGGGTTCCCTGGC 417

query   CCCAGTAGTCAAAGGTCCGTGAGGAGCTCCACTTGTGTGCACAGTAATATGTGGCTGAGT 128
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
target  CCCAGTAGTCAAAGGTCCGTGAGGAGCTCCACTTGTGTGCACAGTAATATGTGGCTGAGT 477

query   CCACAGGGTCCATGTTGGTCATTGTAAGGACCACCTGGTCTTTGGAGGTGTCCTTGGTGA 188
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
target  CCACAGGGTCCATGTTGGTCATTGTAAGGACCACCTGGTCTTTGGAGGTGTCCTTGGTGA 537

query   TGGTGAGCCTGCTCTTCAGAGATGGGCTGTAGCGCTTATCATCATTCCAATAAATGAGTG 248
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
target  TGGTGAGCCTGCTCTTCAGAGATGGGCTGTAGCGCTTATCATCATTCCAATAAATGAGTG 597

query   CAAGCCACTCCAGGGCCTTTCCTGGGGGCTGACGGATCCAGCCCACACCCACTCCACTAG 308
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
target  CAAGCCACTCCAGGGCCTTTCCTGGGGGCTGACGGATCCAGCCCACACCCACTCCACTAG 657

query   TGCTGAGTGAGAACCCAGAGAAGGTGCAGGTCAGCGTGAGGGTCTGTGTGGGTTTCACCA 368
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
target  TGCTGAGTGAGAACCCAGAGAAGGTGCAGGTCAGCGTGAGGGTCTGTGTGGGTTTCACCA 717

query   GCGTAGGACCAGACTCCTTCAAGGTGATCTGGGCCATGGCCGGCTGGGCCGCGAGTAA 426
        |||||||||||||||||||||||||| ||||||||| |||||||||| |||| |||||
target  GCGTAGGACCAGACTCCTTCAAGGTG-TCTGGGCCA-GGCCGGCTGG-CCGCAAGTAA 772

-----

Confidentiality Notice: This e-mail and any files transmitted with it are
private and confidential and are solely for the use of the addressee. It
may contain material which is legally privileged. If you are not the
addressee or the person responsible for delivering to the addressee, please
notify that you have received this e-mail in error and that any use of it
is strictly prohibited. It would be helpful if you could notify the author
by replying to it.

From andreas at sdsc.edu Mon Jun 10 20:47:41 2013
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 10 Jun 2013 17:47:41 -0700
Subject: [Biojava-l] Local aln - contig assembly
In-Reply-To: <9174D0A2-2838-4C5D-9426-8BB195FF8B88@gmail.com>
References: <9174D0A2-2838-4C5D-9426-8BB195FF8B88@gmail.com>
Message-ID:

Hi Khalil,

Whether you can get 100% sequence identity depends on your sequences. You can
try to enforce a stricter alignment by increasing the gap penalties
significantly (try doubling or tripling the gap opening and extension
penalties).

A

On Sun, Jun 9, 2013 at 12:32 PM, Khalil El Mazouari <
khalil.elmazouari at gmail.com> wrote:

> I am trying to assemble overlapping sequence (direct & reverse) via local
> alignment. I am only searching for local aln with 100% identity.
>
> Which parameters, matrix ... should I use in order to get 100% ident.
> local aln.
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From khalil.elmazouari at gmail.com Tue Jun 11 12:10:27 2013
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Tue, 11 Jun 2013 18:10:27 +0200
Subject: [Biojava-l] Biojava-l Digest, Vol 125, Issue 2
In-Reply-To:
References:
Message-ID: <6746FCC6-7427-457B-A448-745E880A9A3C@gmail.com>

Hi Andreas,

Thanks for the feedback.

Increasing the gap open and gap extension penalties (gop and gep) was not
sufficient. I had to use a modified version of the nuc-4_2 scoring matrix in
which I set the mismatch value very low.

This solved the problem.

Best

khalil
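
An editorial aside on the "any other suggestion" part of the question: if only 100% identical local alignments are of interest, such an alignment contains no mismatches and no gaps, so it is simply a substring shared by both reads. Instead of tuning the scoring matrix, one could therefore search for the longest common substring directly. The following is a minimal sketch in plain Java under that assumption. It is not BioJava API and not what Khalil did; the class `ExactOverlap` and its method names are made up for illustration, and the input is assumed to be upper-case A/C/G/T.

```java
/** Hypothetical helper, plain JDK only: exact (100% identity) overlap search. */
final class ExactOverlap {

    /** Returns the longest substring shared by a and b (first one found on ties), or "" if none. */
    static String longestCommonSubstring(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int best = 0;
        int endInA = 0;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // extend the exact match that ended at (i-1, j-1)
                    cur[j] = prev[j - 1] + 1;
                    if (cur[j] > best) {
                        best = cur[j];
                        endInA = i;
                    }
                }
            }
            prev = cur;
        }
        return a.substring(endInA - best, endInA);
    }

    /** Reverse complement of a DNA string; anything other than A/C/G/T becomes N. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    /** Longest exact match of the query against the target, trying both query orientations. */
    static String bestExactOverlap(String query, String target) {
        String forward = longestCommonSubstring(query, target);
        String reverse = longestCommonSubstring(reverseComplement(query), target);
        return forward.length() >= reverse.length() ? forward : reverse;
    }

    public static void main(String[] args) {
        System.out.println(bestExactOverlap("CGTT", "TTAACGAA"));
    }
}
```

The dynamic programme needs time proportional to the product of the two read lengths, the same order as the pairwise aligner, but it can never report a mismatch or a gap. A minimum overlap length would normally be enforced by the caller before merging two reads.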
From pmpangase at gmail.com Thu Jun 13 04:11:26 2013
From: pmpangase at gmail.com (Phelelani Mpangase)
Date: Thu, 13 Jun 2013 10:11:26 +0200
Subject: [Biojava-l] Working with mmCIF files
Message-ID:

Hello

I am new to the Java programming language, and I am currently working on a
project where I have to find information about a protein structure from the
mmCIF file. I would like to extract information about non-polymers in
structures (_pdbx_entity_nonpoly table) using BioJava. How do I go about
achieving this?

I have been able to parse the structure using the SimpleMMcifConsumer, but
I am unclear as to the steps I need to follow from there. Is
PdbxEntityNonPoly the right class to use? How do I use this class to get
the data from the "_pdbx_entity_nonpoly" category?

Regards,
Phele

From andreas at sdsc.edu Thu Jun 13 20:34:10 2013
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 13 Jun 2013 17:34:10 -0700
Subject: [Biojava-l] Working with mmCIF files
In-Reply-To:
References:
Message-ID:

Hi Phele,

You are right that the PdbxEntityNonPoly class is the correct container for
that cif-category. However, at the moment the SimpleMMcifConsumer does not do
much with the PdbxEntityNonPoly data. This is because the same content is
available via the chemical component dictionary, which is used by
biojava-structure as well.

To explain this with an example:

PDB ID 1A4W has the non-poly QWE.

You can access all the related information using the code below.

Hope that makes sense, and let us know if anything is unclear!

Andreas

    String pdbId = "1A4W";

    StructureIOFile pdbreader = new MMCIFFileReader();

    try {
        // download the file if it is not available locally
        pdbreader.setAutoFetch(true);

        Structure s = pdbreader.getStructureById(pdbId);

        Chain h = s.getChainByPDB("H");

        // all ligand groups of chain H
        List<Group> ligands = h.getAtomLigands();

        System.out.println("These ligands have been found in chain " + h.getChainID());

        for (Group l : ligands) {
            System.out.println(l);
        }

        System.out.println("Accessing QWE directly: ");

        Group qwe = h.getGroupByPDB(new ResidueNumber("H", 373, null));

        // chemical component definition of the ligand
        System.out.println(qwe.getChemComp());

        System.out.println(h.getSeqResSequence());
        System.out.println(h.getAtomSequence());
        System.out.println(h.getAtomGroups(GroupType.HETATM));

    } catch (Exception e) {
        e.printStackTrace();
    }

From pmpangase at gmail.com Fri Jun 14 04:41:20 2013
From: pmpangase at gmail.com (Phelelani Mpangase)
Date: Fri, 14 Jun 2013 10:41:20 +0200
Subject: [Biojava-l] Working with mmCIF files
In-Reply-To:
References:
Message-ID:

Hi Andreas

That solved my problem. Thank you so much for the help.

-Phele

From em.alhaweri at gmail.com Mon Jun 17 14:24:46 2013
From: em.alhaweri at gmail.com (Eman Alhaweri)
Date: Mon, 17 Jun 2013 19:24:46 +0100
Subject: [Biojava-l] Problem in Digestion with Symbol 'X'
Message-ID:

Hi

I am using BioJava in my Master's project, but I have a problem that I hope
you can help me solve. The protein database I use in my project contains Xs,
i.e. unknown symbols. When I tried to digest the protein sequences, BioJava
raised an error because X is an unknown symbol. How can I get such sequences
through the digestion without an error?

The error is:

Exception in thread "main" org.biojava.bio.symbol.IllegalSymbolException:
The mass of the symbol [VAL ILE GLY ALA CYS MET SEC GLU HIS THR PYL TYR SER
TRP GLN PRO ASP LEU ARG LYS ASN PHE] is unknown

Thanks in advance.

From andreas at sdsc.edu Wed Jun 19 00:31:20 2013
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 18 Jun 2013 21:31:20 -0700
Subject: [Biojava-l] Problem in Digestion with Symbol 'X'
In-Reply-To:
References:
Message-ID:

Hi Eman,

I think the error message tells you what the problem is: what mass should be
assigned to X? You need to figure out the best way to handle Xs.

Andreas
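
An editorial aside: one possible way to "handle Xs", sketched here in plain Java, is to cleave the sequence first and then discard every peptide that still contains an X, so that no peptide with an undefined mass ever reaches a mass calculation. This is only one policy among several (assigning X an average residue mass would be another), and the thread does not say which one fits this project. The class `UnknownResidueFilter` and its methods are hypothetical and do not use the BioJava digestion classes; the cleavage rule is the common trypsin rule (cut after K or R unless the next residue is P).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, plain JDK only: digest a protein and drop peptides containing X. */
final class UnknownResidueFilter {

    /** Cleaves after K or R, except when the next residue is P. */
    static List<String> trypticPeptides(String protein) {
        List<String> peptides = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < protein.length(); i++) {
            char c = protein.charAt(i);
            boolean last = (i == protein.length() - 1);
            boolean cleave = (c == 'K' || c == 'R')
                    && (last || protein.charAt(i + 1) != 'P');
            if (cleave || last) {
                peptides.add(protein.substring(start, i + 1));
                start = i + 1;
            }
        }
        return peptides;
    }

    /** Keeps only peptides whose residues are all known, i.e. contain no 'X'. */
    static List<String> withoutUnknowns(List<String> peptides) {
        List<String> kept = new ArrayList<>();
        for (String peptide : peptides) {
            if (peptide.indexOf('X') < 0) {
                kept.add(peptide);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> all = trypticPeptides("MKAXRPGKTTR");
        System.out.println("all peptides:   " + all);
        System.out.println("usable for mass: " + withoutUnknowns(all));
    }
}
```

Discarding is the conservative choice because it never invents a mass, at the cost of losing the peptides around each X.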
> _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From khalil.elmazouari at gmail.com Sun Jun 9 19:32:32 2013 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Sun, 9 Jun 2013 21:32:32 +0200 Subject: [Biojava-l] Local aln - contig assembly Message-ID: <9174D0A2-2838-4C5D-9426-8BB195FF8B88@gmail.com> Hi, I am trying to assemble overlapping sequence (direct & reverse) via local alignment. I am only searching for local aln with 100% identity. Which parameters, matrix ... should I use in order to get 100% ident. local aln. Any other suggestion for assembling overlapping seq (in Java) is welcome. Thanks khalil SubstitutionMatrix matrix = SubstitutionMatrixHelper.getNuc4_2(); SimpleGapPenalty gapP = new SimpleGapPenalty(); gapP.setOpenPenalty((short) 5); gapP.setExtensionPenalty((short) 1); SequencePair psa = Alignments.getPairwiseAlignment(query, target, PairwiseSequenceAlignerType.LOCAL, gapP, matrix); ======== Local Alignment Identity: 97.84688995215312% query GGGGAAAACACGAAAGGCCCTTGGTGGAGGCGCTTGAGACGGTGACAAGGGTTCCCTGGC 68 |||||| || ||| ||||||||||||||||||||||||||||||| ||||||||||||| target GGGGAAGAC-CGATGGGCCCTTGGTGGAGGCGCTTGAGACGGTGACCAGGGTTCCCTGGC 417 query CCCAGTAGTCAAAGGTCCGTGAGGAGCTCCACTTGTGTGCACAGTAATATGTGGCTGAGT 128 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| target CCCAGTAGTCAAAGGTCCGTGAGGAGCTCCACTTGTGTGCACAGTAATATGTGGCTGAGT 477 query CCACAGGGTCCATGTTGGTCATTGTAAGGACCACCTGGTCTTTGGAGGTGTCCTTGGTGA 188 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| target CCACAGGGTCCATGTTGGTCATTGTAAGGACCACCTGGTCTTTGGAGGTGTCCTTGGTGA 537 query TGGTGAGCCTGCTCTTCAGAGATGGGCTGTAGCGCTTATCATCATTCCAATAAATGAGTG 248 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| target TGGTGAGCCTGCTCTTCAGAGATGGGCTGTAGCGCTTATCATCATTCCAATAAATGAGTG 597 query CAAGCCACTCCAGGGCCTTTCCTGGGGGCTGACGGATCCAGCCCACACCCACTCCACTAG 308 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| target CAAGCCACTCCAGGGCCTTTCCTGGGGGCTGACGGATCCAGCCCACACCCACTCCACTAG 657 query TGCTGAGTGAGAACCCAGAGAAGGTGCAGGTCAGCGTGAGGGTCTGTGTGGGTTTCACCA 368 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| target TGCTGAGTGAGAACCCAGAGAAGGTGCAGGTCAGCGTGAGGGTCTGTGTGGGTTTCACCA 717 query GCGTAGGACCAGACTCCTTCAAGGTGATCTGGGCCATGGCCGGCTGGGCCGCGAGTAA 426 |||||||||||||||||||||||||| ||||||||| |||||||||| |||| ||||| target GCGTAGGACCAGACTCCTTCAAGGTG-TCTGGGCCA-GGCCGGCTGG-CCGCAAGTAA 772 ----- Confidentiality Notice: This e-mail and any files transmitted with it are private and confidential and are solely for the use of the addressee. It may contain material which is legally privileged. If you are not the addressee or the person responsible for delivering to the addressee, please notify that you have received this e-mail in error and that any use of it is strictly prohibited. It would be helpful if you could notify the author by replying to it. From andreas at sdsc.edu Tue Jun 11 00:47:41 2013 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 10 Jun 2013 17:47:41 -0700 Subject: [Biojava-l] Local aln - contig assembly In-Reply-To: <9174D0A2-2838-4C5D-9426-8BB195FF8B88@gmail.com> References: <9174D0A2-2838-4C5D-9426-8BB195FF8B88@gmail.com> Message-ID: Hi Khalil, if you can get 100% sequence ID depends on your sequences.. you can try to enforce a more strict alignment by increasing the gap penalties significantly (try to double or triple gap opening and extension) . A On Sun, Jun 9, 2013 at 12:32 PM, Khalil El Mazouari < khalil.elmazouari at gmail.com> wrote: > Hi, > > I am trying to assemble overlapping sequence (direct & reverse) via local > alignment. I am only searching for local aln with 100% identity. > > Which parameters, matrix ... should I use in order to get 100% ident. > local aln. > > Any other suggestion for assembling overlapping seq (in Java) is welcome. 
> > Thanks > > khalil > > > > SubstitutionMatrix matrix = > SubstitutionMatrixHelper.getNuc4_2(); > SimpleGapPenalty gapP = new SimpleGapPenalty(); > gapP.setOpenPenalty((short) 5); > gapP.setExtensionPenalty((short) 1); > SequencePair psa = > Alignments.getPairwiseAlignment(query, target, > PairwiseSequenceAlignerType.LOCAL, gapP, matrix); > > > > > ======== > > Local Alignment Identity: 97.84688995215312% > > query GGGGAAAACACGAAAGGCCCTTGGTGGAGGCGCTTGAGACGGTGACAAGGGTTCCCTGGC 68 > |||||| || ||| ||||||||||||||||||||||||||||||| ||||||||||||| > target GGGGAAGAC-CGATGGGCCCTTGGTGGAGGCGCTTGAGACGGTGACCAGGGTTCCCTGGC 417 > > query CCCAGTAGTCAAAGGTCCGTGAGGAGCTCCACTTGTGTGCACAGTAATATGTGGCTGAGT 128 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > target CCCAGTAGTCAAAGGTCCGTGAGGAGCTCCACTTGTGTGCACAGTAATATGTGGCTGAGT 477 > > query CCACAGGGTCCATGTTGGTCATTGTAAGGACCACCTGGTCTTTGGAGGTGTCCTTGGTGA 188 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > target CCACAGGGTCCATGTTGGTCATTGTAAGGACCACCTGGTCTTTGGAGGTGTCCTTGGTGA 537 > > query TGGTGAGCCTGCTCTTCAGAGATGGGCTGTAGCGCTTATCATCATTCCAATAAATGAGTG 248 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > target TGGTGAGCCTGCTCTTCAGAGATGGGCTGTAGCGCTTATCATCATTCCAATAAATGAGTG 597 > > query CAAGCCACTCCAGGGCCTTTCCTGGGGGCTGACGGATCCAGCCCACACCCACTCCACTAG 308 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > target CAAGCCACTCCAGGGCCTTTCCTGGGGGCTGACGGATCCAGCCCACACCCACTCCACTAG 657 > > query TGCTGAGTGAGAACCCAGAGAAGGTGCAGGTCAGCGTGAGGGTCTGTGTGGGTTTCACCA 368 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > target TGCTGAGTGAGAACCCAGAGAAGGTGCAGGTCAGCGTGAGGGTCTGTGTGGGTTTCACCA 717 > > query GCGTAGGACCAGACTCCTTCAAGGTGATCTGGGCCATGGCCGGCTGGGCCGCGAGTAA 426 > |||||||||||||||||||||||||| ||||||||| |||||||||| |||| ||||| > target GCGTAGGACCAGACTCCTTCAAGGTG-TCTGGGCCA-GGCCGGCTGG-CCGCAAGTAA 772 > > > > > > > > > > > ----- > > Confidentiality Notice: This e-mail and any files transmitted with it are > 
private and confidential and are solely for the use of the addressee. It > may contain material which is legally privileged. If you are not the > addressee or the person responsible for delivering to the addressee, please > notify that you have received this e-mail in error and that any use of it > is strictly prohibited. It would be helpful if you could notify the author > by replying to it. > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From khalil.elmazouari at gmail.com Tue Jun 11 16:10:27 2013 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Tue, 11 Jun 2013 18:10:27 +0200 Subject: [Biojava-l] Biojava-l Digest, Vol 125, Issue 2 In-Reply-To: References: Message-ID: <6746FCC6-7427-457B-A448-745E880A9A3C@gmail.com> Hi Andreas, thanks for the feedback. increasing gop and gep was not sufficient. I had to use a modified version of nuc-4_2 scoring matrix where I set the mismatch value very low. This solved the problem, Best khalil ----- Confidentiality Notice: This e-mail and any files transmitted with it are private and confidential and are solely for the use of the addressee. It may contain material which is legally privileged. If you are not the addressee or the person responsible for delivering to the addressee, please notify that you have received this e-mail in error and that any use of it is strictly prohibited. It would be helpful if you could notify the author by replying to it. 
On 11 Jun 2013, at 18:00, biojava-l-request at lists.open-bio.org wrote: > Send Biojava-l mailing list submissions to > biojava-l at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/biojava-l > or, via email, send a message with subject or body 'help' to > biojava-l-request at lists.open-bio.org > > You can reach the person managing the list at > biojava-l-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Biojava-l digest..." > > > Today's Topics: > > 1. Re: Local aln - contig assembly (Andreas Prlic) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 10 Jun 2013 17:47:41 -0700 > From: Andreas Prlic > Subject: Re: [Biojava-l] Local aln - contig assembly > To: Khalil El Mazouari > Cc: "Biojava-l at lists.open-bio.org" > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > Hi Khalil, > > if you can get 100% sequence ID depends on your sequences.. you can try to > enforce a more strict alignment by increasing the gap penalties > significantly (try to double or triple gap opening and extension) . > > A > > > On Sun, Jun 9, 2013 at 12:32 PM, Khalil El Mazouari < > khalil.elmazouari at gmail.com> wrote: > >> Hi, >> >> I am trying to assemble overlapping sequence (direct & reverse) via local >> alignment. I am only searching for local aln with 100% identity. >> >> Which parameters, matrix ... should I use in order to get 100% ident. >> local aln. >> >> Any other suggestion for assembling overlapping seq (in Java) is welcome. 
>> >> Thanks >> >> khalil >> >> >> >> SubstitutionMatrix matrix = >> SubstitutionMatrixHelper.getNuc4_2(); >> SimpleGapPenalty gapP = new SimpleGapPenalty(); >> gapP.setOpenPenalty((short) 5); >> gapP.setExtensionPenalty((short) 1); >> SequencePair psa = >> Alignments.getPairwiseAlignment(query, target, >> PairwiseSequenceAlignerType.LOCAL, gapP, matrix); >> >> >> >> >> ======== >> >> Local Alignment Identity: 97.84688995215312% >> >> query GGGGAAAACACGAAAGGCCCTTGGTGGAGGCGCTTGAGACGGTGACAAGGGTTCCCTGGC 68 >> |||||| || ||| ||||||||||||||||||||||||||||||| ||||||||||||| >> target GGGGAAGAC-CGATGGGCCCTTGGTGGAGGCGCTTGAGACGGTGACCAGGGTTCCCTGGC 417 >> >> query CCCAGTAGTCAAAGGTCCGTGAGGAGCTCCACTTGTGTGCACAGTAATATGTGGCTGAGT 128 >> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >> target CCCAGTAGTCAAAGGTCCGTGAGGAGCTCCACTTGTGTGCACAGTAATATGTGGCTGAGT 477 >> >> query CCACAGGGTCCATGTTGGTCATTGTAAGGACCACCTGGTCTTTGGAGGTGTCCTTGGTGA 188 >> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >> target CCACAGGGTCCATGTTGGTCATTGTAAGGACCACCTGGTCTTTGGAGGTGTCCTTGGTGA 537 >> >> query TGGTGAGCCTGCTCTTCAGAGATGGGCTGTAGCGCTTATCATCATTCCAATAAATGAGTG 248 >> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >> target TGGTGAGCCTGCTCTTCAGAGATGGGCTGTAGCGCTTATCATCATTCCAATAAATGAGTG 597 >> >> query CAAGCCACTCCAGGGCCTTTCCTGGGGGCTGACGGATCCAGCCCACACCCACTCCACTAG 308 >> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >> target CAAGCCACTCCAGGGCCTTTCCTGGGGGCTGACGGATCCAGCCCACACCCACTCCACTAG 657 >> >> query TGCTGAGTGAGAACCCAGAGAAGGTGCAGGTCAGCGTGAGGGTCTGTGTGGGTTTCACCA 368 >> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >> target TGCTGAGTGAGAACCCAGAGAAGGTGCAGGTCAGCGTGAGGGTCTGTGTGGGTTTCACCA 717 >> >> query GCGTAGGACCAGACTCCTTCAAGGTGATCTGGGCCATGGCCGGCTGGGCCGCGAGTAA 426 >> |||||||||||||||||||||||||| ||||||||| |||||||||| |||| ||||| >> target GCGTAGGACCAGACTCCTTCAAGGTG-TCTGGGCCA-GGCCGGCTGG-CCGCAAGTAA 772 >> >> >> >> >> >> >> >> >> >> >> ----- >> >> Confidentiality 
Notice: This e-mail and any files transmitted with it are >> private and confidential and are solely for the use of the addressee. It >> may contain material which is legally privileged. If you are not the >> addressee or the person responsible for delivering to the addressee, please >> notify that you have received this e-mail in error and that any use of it >> is strictly prohibited. It would be helpful if you could notify the author >> by replying to it. >> >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > ------------------------------ > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > End of Biojava-l Digest, Vol 125, Issue 2 > ***************************************** From pmpangase at gmail.com Thu Jun 13 08:11:26 2013 From: pmpangase at gmail.com (Phelelani Mpangase) Date: Thu, 13 Jun 2013 10:11:26 +0200 Subject: [Biojava-l] Working with mmCIF files Message-ID: Hello I am new to the Java progrmming language, and I am currently working on a project where I have to find information about a protein structure from the mmCIF file. I would like to extract information about non-polymers in structures (_pdbx_entity_nonpoly table) using BioJava. How do I go about achieving this? I have been able to parse the structure using the SimpleMMcifConsumer, but I am unclear as to the steps I need to follow from there. Is the PdbxEntityNonPoly the right class to use? How do I use this class to achieve to get the data from the "_pdbx_entity_nonpoly"? 
Regards,
Phele

From andreas at sdsc.edu Fri Jun 14 00:34:10 2013
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 13 Jun 2013 17:34:10 -0700
Subject: [Biojava-l] Working with mmCIF files
In-Reply-To:
References:
Message-ID:

Hi Phele,

You are right that the PdbxEntityNonPoly class is the correct container for that cif-category. However, at the moment the SimpleMMcifConsumer does not do much with the PdbxEntityNonPoly data. This is because the same content is available via the chemical component dictionary, which is used by biojava-structure as well.

To explain this with an example: PDB ID 1A4W has the non-polymer QWE.

You can access all the related information using the code below.

Hope that makes sense, and let us know if anything is unclear!

Andreas

// Imports, assuming the BioJava 3.x package layout:
import java.util.List;
import org.biojava.bio.structure.Chain;
import org.biojava.bio.structure.Group;
import org.biojava.bio.structure.GroupType;
import org.biojava.bio.structure.ResidueNumber;
import org.biojava.bio.structure.Structure;
import org.biojava.bio.structure.io.MMCIFFileReader;
import org.biojava.bio.structure.io.StructureIOFile;

String pdbId = "1A4W";
StructureIOFile pdbreader = new MMCIFFileReader();
try {
    // Fetch the mmCIF file automatically if it is not available locally.
    pdbreader.setAutoFetch(true);
    Structure s = pdbreader.getStructureById(pdbId);
    Chain h = s.getChainByPDB("H");

    // All ligand groups found in the ATOM/HETATM records of chain H.
    List<Group> ligands = h.getAtomLigands();
    System.out.println("These ligands have been found in chain " + h.getChainID());
    for (Group l : ligands) {
        System.out.println(l);
    }

    // QWE is residue 373 of chain H; its ChemComp carries the
    // chemical component dictionary entry.
    System.out.println("Accessing QWE directly: ");
    Group qwe = h.getGroupByPDB(new ResidueNumber("H", 373, null));
    System.out.println(qwe.getChemComp());
    System.out.println(h.getSeqResSequence());
    System.out.println(h.getAtomSequence());
    System.out.println(h.getAtomGroups(GroupType.HETATM));
} catch (Exception e) {
    e.printStackTrace();
}

On Thu, Jun 13, 2013 at 1:11 AM, Phelelani Mpangase wrote:

> Hello
>
> I am new to the Java programming language, and I am currently working on a
> project where I have to find information about a protein structure from the
> mmCIF file. I would like to extract information about non-polymers in
> structures (_pdbx_entity_nonpoly table) using BioJava. How do I go about
> achieving this?
>
> I have been able to parse the structure using the SimpleMMcifConsumer, but
> I am unclear as to the steps I need to follow from there.
> Is the PdbxEntityNonPoly the right class to use? How do I use this class to
> get the data from the "_pdbx_entity_nonpoly" category?
>
> Regards,
> Phele

From pmpangase at gmail.com Fri Jun 14 08:41:20 2013
From: pmpangase at gmail.com (Phelelani Mpangase)
Date: Fri, 14 Jun 2013 10:41:20 +0200
Subject: [Biojava-l] Working with mmCIF files
In-Reply-To:
References:
Message-ID:

Hi Andreas

That solved my problem. Thank you so much for the help.

-Phele

On Fri, Jun 14, 2013 at 2:34 AM, Andreas Prlic wrote:

> Hi Phele,
>
> You are right that the PdbxEntityNonPoly class is the correct container
> for that cif-category. However, at the moment the SimpleMMcifConsumer does
> not do much with the PdbxEntityNonPoly data. This is because the same
> content is available via the chemical component dictionary, which is used by
> biojava-structure as well.
>
> To explain this with an example: PDB ID 1A4W has the non-polymer QWE.
>
> You can access all the related information using the code below.
>
> Hope that makes sense, and let us know if anything is unclear!
>
> Andreas
>
> String pdbId = "1A4W";
> StructureIOFile pdbreader = new MMCIFFileReader();
> try {
>     pdbreader.setAutoFetch(true);
>     Structure s = pdbreader.getStructureById(pdbId);
>     Chain h = s.getChainByPDB("H");
>
>     List<Group> ligands = h.getAtomLigands();
>     System.out.println("These ligands have been found in chain " + h.getChainID());
>     for (Group l : ligands) {
>         System.out.println(l);
>     }
>
>     System.out.println("Accessing QWE directly: ");
>     Group qwe = h.getGroupByPDB(new ResidueNumber("H", 373, null));
>     System.out.println(qwe.getChemComp());
>     System.out.println(h.getSeqResSequence());
>     System.out.println(h.getAtomSequence());
>     System.out.println(h.getAtomGroups(GroupType.HETATM));
> } catch (Exception e) {
>     e.printStackTrace();
> }
>
> On Thu, Jun 13, 2013 at 1:11 AM, Phelelani Mpangase wrote:
>
>> Hello
>>
>> I am new to the Java programming language, and I am currently working on a
>> project where I have to find information about a protein structure from the
>> mmCIF file. I would like to extract information about non-polymers in
>> structures (_pdbx_entity_nonpoly table) using BioJava. How do I go about
>> achieving this?
>>
>> I have been able to parse the structure using the SimpleMMcifConsumer, but
>> I am unclear as to the steps I need to follow from there. Is the
>> PdbxEntityNonPoly the right class to use? How do I use this class to
>> get the data from the "_pdbx_entity_nonpoly" category?
>>
>> Regards,
>> Phele

From em.alhaweri at gmail.com Mon Jun 17 18:24:46 2013
From: em.alhaweri at gmail.com (Eman Alhaweri)
Date: Mon, 17 Jun 2013 19:24:46 +0100
Subject: [Biojava-l] Problem in Digestion with Symbol 'X'
Message-ID:

Hi

I am working with BioJava in my Master's project, but I have a problem and I hope you can help me solve it.
The protein database that I use in my project contains Xs, which are unknown symbols. When I tried to digest the protein sequences, BioJava gave an error saying that X is an unknown symbol. Please tell me how the sequences can pass the digestion without this error.

The error is:

Exception in thread "main" org.biojava.bio.symbol.IllegalSymbolException: The mass of the symbol [VAL ILE GLY ALA CYS MET SEC GLU HIS THR PYL TYR SER TRP GLN PRO ASP LEU ARG LYS ASN PHE] is unknown

Thanks in advance.

From andreas at sdsc.edu Wed Jun 19 04:31:20 2013
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 18 Jun 2013 21:31:20 -0700
Subject: [Biojava-l] Problem in Digestion with Symbol 'X'
In-Reply-To:
References:
Message-ID:

Hi Eman,

I think the error message tells you what the problem is: what mass should be assigned to X? You need to figure out the best way to handle the Xs.

Andreas

On Mon, Jun 17, 2013 at 11:24 AM, Eman Alhaweri wrote:

> Hi
>
> I am working with BioJava in my Master's project, but I have a problem and I
> hope you can help me solve it. The protein database that I use in my
> project contains Xs, which are unknown symbols. When I tried to digest the
> protein sequences, BioJava gave an error saying that X is an unknown symbol.
> Please tell me how the sequences can pass the digestion without this error.
>
> The error is: Exception in thread "main"
> org.biojava.bio.symbol.IllegalSymbolException: The mass of the symbol [VAL
> ILE GLY ALA CYS MET SEC GLU HIS THR PYL TYR SER TRP GLN PRO ASP LEU ARG LYS
> ASN PHE] is unknown
>
> Thanks in advance.
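
One possible way to "handle the Xs", as suggested in the reply above, is to digest the sequence as usual and then skip any peptide whose mass is undefined because it contains X. The sketch below is plain Java and does not use the BioJava digestion or mass classes. The class name XAwareDigest, its method names, the simplified trypsin rule (cleave after K or R unless followed by P) and the choice to skip rather than substitute a mass are all assumptions made for illustration. The residue masses are standard monoisotopic values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava.
class XAwareDigest {
    // Monoisotopic mass of one water molecule, added once per peptide.
    static final double WATER = 18.01056;

    // Monoisotopic residue masses for the 20 standard amino acids.
    // X is deliberately absent: it has no defined mass.
    static final Map<Character, Double> RESIDUE_MASS = new HashMap<>();
    static {
        RESIDUE_MASS.put('G', 57.02146);  RESIDUE_MASS.put('A', 71.03711);
        RESIDUE_MASS.put('S', 87.03203);  RESIDUE_MASS.put('P', 97.05276);
        RESIDUE_MASS.put('V', 99.06841);  RESIDUE_MASS.put('T', 101.04768);
        RESIDUE_MASS.put('C', 103.00919); RESIDUE_MASS.put('L', 113.08406);
        RESIDUE_MASS.put('I', 113.08406); RESIDUE_MASS.put('N', 114.04293);
        RESIDUE_MASS.put('D', 115.02694); RESIDUE_MASS.put('Q', 128.05858);
        RESIDUE_MASS.put('K', 128.09496); RESIDUE_MASS.put('E', 129.04259);
        RESIDUE_MASS.put('M', 131.04049); RESIDUE_MASS.put('H', 137.05891);
        RESIDUE_MASS.put('F', 147.06841); RESIDUE_MASS.put('R', 156.10111);
        RESIDUE_MASS.put('Y', 163.06333); RESIDUE_MASS.put('W', 186.07931);
    }

    /** Splits after K or R unless the next residue is P (simplified trypsin rule). */
    static List<String> trypticPeptides(String protein) {
        List<String> peptides = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < protein.length(); i++) {
            char c = protein.charAt(i);
            boolean last = (i == protein.length() - 1);
            boolean cleave = (c == 'K' || c == 'R') && !last && protein.charAt(i + 1) != 'P';
            if (cleave || last) {
                peptides.add(protein.substring(start, i + 1));
                start = i + 1;
            }
        }
        return peptides;
    }

    /** Returns the peptide mass, or null if any residue (such as X) has no defined mass. */
    static Double peptideMass(String peptide) {
        double mass = WATER;
        for (char c : peptide.toCharArray()) {
            Double residue = RESIDUE_MASS.get(c);
            if (residue == null) {
                return null; // policy choice: skip instead of guessing a mass for X
            }
            mass += residue;
        }
        return mass;
    }

    public static void main(String[] args) {
        for (String peptide : trypticPeptides("MKAXGRPLKTWR")) {
            Double mass = peptideMass(peptide);
            System.out.println(peptide + "\t"
                    + (mass == null ? "skipped (undefined mass)" : String.format("%.5f", mass)));
        }
    }
}
```

For the example sequence, the peptides are MK, AXGRPLK and TWR; masses are printed for MK and TWR, and AXGRPLK is reported as skipped. Other policies are equally defensible, for example removing X-containing entries before digestion or assigning X an average residue mass. Which one is appropriate depends on what the masses are used for downstream.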