From jogoodma at indiana.edu Mon Jan 3 10:13:27 2011 From: jogoodma at indiana.edu (Josh Goodman) Date: Mon, 03 Jan 2011 10:13:27 -0500 Subject: [Biojava-l] Error parsing GFF3 file In-Reply-To: References: <6D0D6C61-35E4-406A-8554-4C8AB82449F0@mytum.de> Message-ID: <4D21E797.1070906@indiana.edu> Negative locations can also occur when the genome sequence is incomplete and other experimental evidence shows that the actual sequence extends beyond the existing start location. Since coordinate systems are almost always anchored to the assembled genome sequence, the other evidence features upstream of the start are assigned negative coordinates. You can see an example of this in Drosophila melanogaster (ftp://ftp.flybase.org/genomes/Drosophila_melanogaster/current/gff/dmel-3L-r5.32.gff.gz). In Dmel 3L you can see aberration breakpoint and chromosome band features, all upstream of the sequenced start site at position 1. Cheers, Josh On 12/31/2010 01:21 PM, Scooter Willis wrote: > Philipp > > I think it is complaining about the negative location (-1864985,746). Is > this a circular genome? That seems to be a rather large sequence segment, and > I think it is correct to complain about the negative location. We tried to > plan ahead for circular genomes and genes that cross the begin/end > boundary, and at the same time not have the programmer's brain explode trying > to handle all the combinations that exist. It gets really fun when you have > a negative strand. > > One of the challenges with a valid gff3 file is that you can make sure the > ontology is correct and the file format is correct, but when you try to > bring it all together to do something with the data (turn it into a protein) > you need to check harder. > > If this is a valid location, can you send me the gff3 segment and the DNA > sequence that describes the features, and I will see what I can do to make it > work without the previously mentioned head exploding.
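Josh's and Scooter's replies both turn on GFF3 records whose start coordinate is below 1. As a minimal sketch, assuming the annotation is available as plain, uncompressed GFF3 text, the hypothetical helper below (Gff3CoordinateCheck is not part of BioJava; its name and messages are invented for illustration) lists records with start < 1 or start > end, using only the 1-based start and end columns (4 and 5) defined by GFF3, so they can be inspected before the file is passed to GFF3Reader.read(). It does not validate anything else in the file, and it does not claim to reproduce the exact condition under which BioJava's Location constructor throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical pre-check (not part of BioJava): reports GFF3 feature lines
 * whose start/end columns are not valid 1-based coordinates, i.e.
 * start < 1 or start > end.
 */
public class Gff3CoordinateCheck {

    /** Returns one message per suspect record, prefixed with its 1-based line number. */
    public static List<String> findSuspectRecords(List<String> lines) {
        List<String> suspects = new ArrayList<String>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            // An embedded ##FASTA section ends the feature records.
            if (line.startsWith("##FASTA")) {
                break;
            }
            // Skip blank lines, comments and other directives.
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length < 9) {
                suspects.add(lineNo + ": expected 9 tab-separated columns, found " + cols.length);
                continue;
            }
            long start;
            long end;
            try {
                start = Long.parseLong(cols[3].trim()); // column 4: start
                end = Long.parseLong(cols[4].trim());   // column 5: end
            } catch (NumberFormatException e) {
                suspects.add(lineNo + ": non-numeric start/end");
                continue;
            }
            if (start < 1 || start > end) {
                suspects.add(lineNo + ": start=" + start + " end=" + end);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Invented sample records, not taken from any real annotation file.
        List<String> gff = Arrays.asList(
            "##gff-version 3",
            "chrX\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1",
            "chrX\tdemo\tchromosome_band\t-5000\t-1\t.\t+\t.\tID=band1");
        for (String msg : findSuspectRecords(gff)) {
            System.out.println(msg); // prints "3: start=-5000 end=-1"
        }
    }
}
```

For a real file, the lines could be read with java.nio.file.Files.readAllLines(path, StandardCharsets.UTF_8) (Java 7 or later) after decompressing the .gz download. Whether flagged records should be dropped, clipped or kept depends on the end goal; as Josh points out, negative coordinates can be deliberate.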
Let me know what the end > goal is in parsing the gff3 file and what is missing when you try to map to a > GeneSequence/ProteinSequence. > > Thanks > > Scooter > > On Fri, Dec 31, 2010 at 12:07 PM, Philipp Comans wrote: > >> Hello everyone, >> >> I am trying to parse the file available here: >> ftp://ftp.jgi-psf.org/pub/JGI_data/Amphimedon_queenslandica/annotation/Aqu1.gff3.gz >> with the following commands:
>>
>> import java.util.Iterator;
>>
>> import org.biojava3.genome.parsers.gff.FeatureI;
>> import org.biojava3.genome.parsers.gff.FeatureList;
>> import org.biojava3.genome.parsers.gff.GFF3Reader;
>>
>> public class GFFReader3 {
>>
>>     public static void main(String[] args) throws Exception {
>>
>>         FeatureList features = (FeatureList)
>>                 GFF3Reader.read("/Users/philipp/Dropbox/IDP/JGI_data/annotation/Smiles.gff3");
>>         Iterator<FeatureI> featureIterator = features.iterator();
>>
>>         FeatureI currentFeature = null;
>>
>>         while (featureIterator.hasNext()) {
>>             currentFeature = featureIterator.next();
>>             System.out.println(currentFeature);
>>         }
>>
>>     }
>>
>> }
>>
>> The error I get is:
>>
>> 31.12.2010 18:05:10 org.biojava3.genome.parsers.gff.GFF3Reader read
>> INFO: Gff.read(): Reading /Users/philipp/Dropbox/IDP/JGI_data/annotation/Aqu1.gff3
>> Exception in thread "main" java.lang.IllegalArgumentException: Improper location parameters: (-1864985,746)
>>     at org.biojava3.genome.parsers.gff.Location.<init>(Location.java:75)
>>     at org.biojava3.genome.parsers.gff.Location.union(Location.java:258)
>>     at org.biojava3.genome.parsers.gff.FeatureList.add(FeatureList.java:49)
>>     at org.biojava3.genome.parsers.gff.GFF3Reader.read(GFF3Reader.java:59)
>>     at GFFReader3.main(GFFReader3.java:11)
>>
>> I find this very strange because the file is a valid GFF document according to
>> http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
>>
>> Is this a bug or am I doing something wrong?
>> Thanks for your help, I wish you a happy New Year!
>> >> Philipp >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas at sdsc.edu Wed Jan 5 10:46:47 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 5 Jan 2011 07:46:47 -0800 Subject: [Biojava-l] how to access biojava source control on svn In-Reply-To: References: Message-ID: Hi Sj, Better to post such questions to the public lists, otherwise they might get lost... SVN access instructions are available from http://www.biojava.org/wiki/CVS_to_SVN_Migration Andreas 2011/1/5 Sj Pookpan : > Dear biojava master > > I'm a Java developer and new to BioJava. I want to know how I can get > access to the BioJava source code from SVN (via Eclipse SVN). Do I need to > register with BioJava to get a user account and password? (I tried to find this on the > BioJava site but found nowhere to register.) Would you please tell me how to > get started? > > SJP. > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From mavcunha at gmail.com Wed Jan 5 12:20:21 2011 From: mavcunha at gmail.com (Marco Valtas) Date: Wed, 5 Jan 2011 15:20:21 -0200 Subject: [Biojava-l] how to access biojava source control on svn In-Reply-To: References: Message-ID: <7C328935-A61C-4B8A-BD2D-126597598F7B@gmail.com> You might check the GitHub mirror too; code.open-bio.org seems to be offline sometimes. Cheers Marco. Sent from my iPhone On 05/01/2011, at 13:46, Andreas Prlic wrote: > Hi Sj, > > Better to post such questions to the public lists, otherwise they > might get lost...
> > SVN access instructions are available from > http://www.biojava.org/wiki/CVS_to_SVN_Migration > > Andreas > > > > > 2011/1/5 Sj Pookpan : >> Dear biojava master >> >> I'm a Java developer and new to BioJava. I want to know how I can get >> access to the BioJava source code from SVN (via Eclipse SVN). Do I need to >> register with BioJava to get a user account and password? (I tried to find this on the >> BioJava site but found nowhere to register.) Would you please tell me how to >> get started? >> >> SJP. >> > > > > -- > ----------------------------------------------------------------------- > Dr. Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas at sdsc.edu Wed Jan 5 23:26:59 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 5 Jan 2011 20:26:59 -0800 Subject: [Biojava-l] how to access biojava source control on svn In-Reply-To: References: <7C328935-A61C-4B8A-BD2D-126597598F7B@gmail.com> Message-ID: try svn co http://svn.github.com/biojava/biojava.git ./biojava A On Wed, Jan 5, 2011 at 8:21 PM, Sj Pookpan wrote: > Thank you for your answer. > I tried the access methods mentioned on > 'http://www.biojava.org/wiki/CVS_to_SVN_Migration' > > - Anonymous SVN access or > - Developer SVN access (ssh account required) or > - BioJava SNAPSHOT builds (anonymous, Maven required) > > In the Eclipse SVN connection user interface, the application always asks for a > 'password' to access 'svn'. How do I get past it? > > However, I already subscribed to the biojava mailing list (yesterday, > 05/01/2011) and am waiting for the confirmation mail, but so far nothing has been sent to > me about what to do next. > > Regards, > > SJP.
> > On 1/6/2011 12:20 AM, Marco Valtas wrote: > > You might check the GitHub mirror too; code.open-bio.org seems to be offline > sometimes. > > Cheers Marco. > > Sent from my iPhone > > On 05/01/2011, at 13:46, Andreas Prlic wrote: > > Hi Sj, > > Better to post such questions to the public lists, otherwise they > might get lost... > > SVN access instructions are available from > http://www.biojava.org/wiki/CVS_to_SVN_Migration > > Andreas > > > > > 2011/1/5 Sj Pookpan : > > Dear biojava master > > I'm a Java developer and new to BioJava. I want to know how I can get > access to the BioJava source code from SVN (via Eclipse SVN). Do I need to > register with BioJava to get a user account and password? (I tried to find this on the > BioJava site but found nowhere to register.) Would you please tell me how to > get started? > > SJP. > > > From darnells at dnastar.com Thu Jan 6 17:56:42 2011 From: darnells at dnastar.com (Steve Darnell) Date: Thu, 6 Jan 2011 16:56:42 -0600 Subject: [Biojava-l] how to access biojava source control on svn In-Reply-To: References: <7C328935-A61C-4B8A-BD2D-126597598F7B@gmail.com> Message-ID: In the past, I have had problems using the Subclipse plug-in in Eclipse to check out BioJava from the svn service at github on both Windows and Mac. This is how I worked around it... http://lists.open-bio.org/pipermail/biojava-dev/2010-August/004356.html The Subclipse svn plug-in from Tigris would fail during checkout with an exception stating "RA layer request failed svn: REPORT of '[...]' 200 OK" for both Windows 7 and OSX 10.6. For Windows, there are reports of the Windows indexing service, antivirus scanners, or the Subclipse JavaHL adapter causing svn problems. Resolving these issues did not help. In the end, I used a git client on Windows to clone the biojava project from github and then imported it into Eclipse as an existing Maven project with the m2eclipse plug-in. -- I did not revisit my search for an integrated Eclipse solution.
Now, I mainly use the jars made by the Maven auto-build system. ~Steve -----Original Message----- From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic Sent: Wednesday, January 05, 2011 10:27 PM To: Sj Pookpan Cc: biojava-l at biojava.org Subject: Re: [Biojava-l] how to access biojava source control on svn try svn co http://svn.github.com/biojava/biojava.git ./biojava A On Wed, Jan 5, 2011 at 8:21 PM, Sj Pookpan wrote: > Thank you for your answer. > I tried the access methods mentioned on > 'http://www.biojava.org/wiki/CVS_to_SVN_Migration' > > - Anonymous SVN access or > - Developer SVN access (ssh account required) or > - BioJava SNAPSHOT builds (anonymous, Maven required) > > In the Eclipse SVN connection user interface, the application always asks for a > 'password' to access 'svn'. How do I get past it? > > However, I already subscribed to the biojava mailing list (yesterday, > 05/01/2011) and am waiting for the confirmation mail, but so far nothing has been sent to > me about what to do next. > > Regards, > > SJP. > > On 1/6/2011 12:20 AM, Marco Valtas wrote: > > You might check the GitHub mirror too; code.open-bio.org seems to be offline > sometimes. > > Cheers Marco. > > Sent from my iPhone > > On 05/01/2011, at 13:46, Andreas Prlic wrote: > > Hi Sj, > > Better to post such questions to the public lists, otherwise they > might get lost... > > SVN access instructions are available from > http://www.biojava.org/wiki/CVS_to_SVN_Migration > > Andreas > > > > > 2011/1/5 Sj Pookpan : > > Dear biojava master > > I'm a Java developer and new to BioJava. I want to know how I can get > access to the BioJava source code from SVN (via Eclipse SVN). Do I need to > register with BioJava to get a user account and password? (I tried to find this on the > BioJava site but found nowhere to register.) Would you please tell me how to > get started? > > SJP.
> > > _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From hdilley at catbio.com Thu Jan 6 19:09:29 2011 From: hdilley at catbio.com (Hara Dilley) Date: Thu, 6 Jan 2011 16:09:29 -0800 Subject: [Biojava-l] biojava3 getting the features from alignedsequence Message-ID: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> Hi, I would like to align a set of sequences against a scaffold and get the list of the modifications for each aligned sequence. I am using biojava3. I have tried to create a profile, thinking that I can get the AlignedSequences from it, but it appears to be null. Here is part of my code: Profile profile = Alignments.getMultipleSequenceAlignment(lst); profile.getAlignedSequence(0); Can someone please point me to an example for this or to the classes I have to use? Thank you, Hara From science.translator at gmail.com Thu Jan 6 19:45:13 2011 From: science.translator at gmail.com (Matthew Busse) Date: Thu, 6 Jan 2011 16:45:13 -0800 Subject: [Biojava-l] Checking out code Message-ID: Hello all, I've tried several times over the past couple of days to check out code from both of the sites, and I either receive a "Connection timed out" error from the code.open-bio.org site or a "PROPFIND of '/biojava': 502 Bad Gateway (http://svn.github.com)" error from the github site. I'm using the Maven and Subclipse plug-ins for Eclipse, and this is the first time I've tried to use an SVN repository, so it's possible I'm doing something wrong, but these errors sound like a connection problem. Thanks! Matthew From andreas at sdsc.edu Thu Jan 6 20:04:43 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 6 Jan 2011 17:04:43 -0800 Subject: [Biojava-l] Checking out code In-Reply-To: References: Message-ID: I just tried to reproduce a fresh checkout from github and I can confirm that it does not work correctly any more. ... investigating what is going on...
A On Thu, Jan 6, 2011 at 4:45 PM, Matthew Busse wrote: > Hello all, > > I've tried several times over the past couple of days to check out code from > both of the sites, and I either receive a "Connection timed out" error from > the code.open-bio.org site or a "PROPFIND of '/biojava': 502 Bad Gateway > (http://svn.github.com)" error from the github site. > > I'm using the Maven and Subclipse plug-ins for Eclipse, and this is the > first time I've tried to use an SVN repository, so it's possible I'm doing > something wrong, but these errors sound like a connection problem. > > Thanks! > > Matthew > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas at sdsc.edu Thu Jan 6 20:06:46 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 6 Jan 2011 17:06:46 -0800 Subject: [Biojava-l] Checking out code In-Reply-To: References: Message-ID: And to follow up, I believe we are seeing this problem: http://support.github.com/discussions/repos/5188-fresh-checkout-of-svn-repo-shows-dozens-of-empty-files-where-they-shouldnt-be On Thu, Jan 6, 2011 at 5:04 PM, Andreas Prlic wrote: > I just tried to reproduce a fresh checkout from github and I can > confirm that it does not work correctly any more. > > ... investigating what is going on... > > A > > > > On Thu, Jan 6, 2011 at 4:45 PM, Matthew Busse > wrote: >> Hello all, >> >> I've tried several times over the past couple of days to check out code from >> both of the sites, and I either receive a "Connection timed out" error from >> the code.open-bio.org site or a "PROPFIND of '/biojava': 502 Bad Gateway >> (http://svn.github.com)" error from the github site. >> >> I'm using the Maven and Subclipse plug-ins for Eclipse, and this is the >> first time I've tried to use an SVN repository, so it's possible I'm doing >> something wrong, but these errors sound like a connection problem. >> >> Thanks!
>> >> Matthew >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Thu Jan 6 21:24:48 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 6 Jan 2011 18:24:48 -0800 Subject: [Biojava-l] github svn access issues Message-ID: Hi, It seems github is currently having issues with its svn interface. So, if you want to get hold of a copy of the latest source for the moment, there are the following two options:
- use git rather than svn to fetch the code from github
- use Maven and get the biojava3.0.1-SNAPSHOT builds from the http://www.biojava.org/download/maven/ repository
Andreas From willishf at ufl.edu Thu Jan 6 21:42:50 2011 From: willishf at ufl.edu (Scooter Willis) Date: Thu, 6 Jan 2011 21:42:50 -0500 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> Message-ID: Hara Can you provide more of the code you are using that shows how you are loading the initial sequences. Thanks Scooter On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: > Hi, > > I would like to align a set of sequences against a scaffold and get the > list of the modifications for each aligned sequence. > I am using biojava3. > I have tried to create a profile, thinking that I can get the > AlignedSequences from it, but it appears to be null.
> Here is part of my code: > > Profile profile = > Alignments.getMultipleSequenceAlignment(lst); > profile.getAlignedSequence(0); > > Can someone please point me to an example for this or to the classes I have to > use? > Thank you, > Hara > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From hdilley at catbio.com Fri Jan 7 12:20:13 2011 From: hdilley at catbio.com (Hara Dilley) Date: Fri, 7 Jan 2011 09:20:13 -0800 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> Message-ID: <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> Thanks Scooter, Below is the code showing how I populate lst. Of course my real sequences are different, but for this example it doesn't matter. List lst = new ArrayList(); ProteinSequence s1 = new ProteinSequence("SHALG"); ProteinSequence s2 = new ProteinSequence("SWQVLG"); lst.add(s1); lst.add(s2); From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of Scooter Willis Sent: Thursday, January 06, 2011 6:43 PM To: Hara Dilley Cc: biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] biojava3 getting the features from alignedsequence Hara Can you provide more of the code you are using that shows how you are loading the initial sequences. Thanks Scooter On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: Hi, I would like to align a set of sequences against a scaffold and get the list of the modifications for each aligned sequence. I am using biojava3. I have tried to create a profile, thinking that I can get the AlignedSequences from it, but it appears to be null. Here is part of my code: Profile profile = Alignments.getMultipleSequenceAlignment(lst); profile.getAlignedSequence(0); Can someone please point me to an example for this or to the classes I have to use?
Thank you, Hara _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From willishf at ufl.edu Fri Jan 7 14:51:40 2011 From: willishf at ufl.edu (Scooter Willis) Date: Fri, 7 Jan 2011 14:51:40 -0500 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> Message-ID: Hara Figured out the problem. Welcome to the world of biology, where indexes start at 1; in computer science they start at 0. If you use 1 as your first index it will work. In the core module we tried to make that clear by using BioIndex in the method name. I will see what I can do about getting that added/changed in the alignment module. Thanks Scooter On Fri, Jan 7, 2011 at 12:20 PM, Hara Dilley wrote: > Thanks Scooter, > > Below is the code showing how I populate lst. Of course my real sequences are > different, but for this example it doesn't matter. > > List lst = new ArrayList(); > > ProteinSequence s1 = new ProteinSequence("SHALG"); > > ProteinSequence s2 = new ProteinSequence("SWQVLG"); > > lst.add(s1); > > lst.add(s2); > > From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of Scooter > Willis > Sent: Thursday, January 06, 2011 6:43 PM > To: Hara Dilley > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] biojava3 getting the features from > alignedsequence > > Hara > > Can you provide more of the code you are using that shows how you are > loading the initial sequences. > > Thanks > > Scooter > > On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: > > Hi, > > I would like to align a set of sequences against a scaffold and get the > list of the modifications for each aligned sequence.
> I am using biojava3. > I have tried to create a profile, thinking that I can get the > AlignedSequences from it, but it appears to be null. > Here is part of my code: > > Profile profile = > Alignments.getMultipleSequenceAlignment(lst); > profile.getAlignedSequence(0); > > Can someone please point me to an example for this or to the classes I have to > use? > Thank you, > Hara > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > From mavcunha at gmail.com Fri Jan 7 17:02:46 2011 From: mavcunha at gmail.com (Marco Valtas) Date: Fri, 7 Jan 2011 20:02:46 -0200 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> Message-ID: I think in such cases we could throw an exception. What I mean is that, since sequence positions are counted from 1, if someone tries to fetch position 0 an exception could be thrown saying that such indices start at 1. For arrays and lists that do not model a sequence, it would be better to keep the computer science convention. Any thoughts? Marco Valtas Developer at ThoughtWorks Sent from my iPhone On 07/01/2011, at 17:51, Scooter Willis wrote: > Hara > > Figured out the problem. Welcome to the world of biology, where indexes start at 1; > in computer science they start at 0. > > If you use 1 as your first index it will work. In the core module we tried > to make that clear by using BioIndex in the method name. I will see what I > can do about getting that added/changed in the alignment module. > > Thanks > > Scooter > > On Fri, Jan 7, 2011 at 12:20 PM, Hara Dilley wrote: > >> Thanks Scooter, >> >> Below is the code showing how I populate lst. Of course my real sequences are >> different, but for this example it doesn't matter.
>> >> List lst = new ArrayList(); >> >> ProteinSequence s1 = new ProteinSequence("SHALG"); >> >> ProteinSequence s2 = new ProteinSequence("SWQVLG"); >> >> lst.add(s1); >> >> lst.add(s2); >> >> From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of Scooter >> Willis >> Sent: Thursday, January 06, 2011 6:43 PM >> To: Hara Dilley >> Cc: biojava-l at lists.open-bio.org >> Subject: Re: [Biojava-l] biojava3 getting the features from >> alignedsequence >> >> Hara >> >> Can you provide more of the code you are using that shows how you are >> loading the initial sequences. >> >> Thanks >> >> Scooter >> >> On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: >> >> Hi, >> >> I would like to align a set of sequences against a scaffold and get the >> list of the modifications for each aligned sequence. >> I am using biojava3. >> I have tried to create a profile, thinking that I can get the >> AlignedSequences from it, but it appears to be null. >> Here is part of my code: >> >> Profile profile = >> Alignments.getMultipleSequenceAlignment(lst); >> profile.getAlignedSequence(0); >> >> Can someone please point me to an example for this or to the classes I have to >> use?
>> Thank you, >> Hara >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From hdilley at catbio.com Fri Jan 7 17:04:58 2011 From: hdilley at catbio.com (Hara Dilley) Date: Fri, 7 Jan 2011 14:04:58 -0800 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> Message-ID: <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> That would be very helpful! -----Original Message----- From: Marco Valtas [mailto:mavcunha at gmail.com] Sent: Friday, January 07, 2011 2:03 PM To: Scooter Willis Cc: Hara Dilley; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] biojava3 getting the features from alignedsequence I think in such cases we could throw an exception. What I mean is that, since sequence positions are counted from 1, if someone tries to fetch position 0 an exception could be thrown saying that such indices start at 1. For arrays and lists that do not model a sequence, it would be better to keep the computer science convention. Any thoughts? Marco Valtas Developer at ThoughtWorks Sent from my iPhone On 07/01/2011, at 17:51, Scooter Willis wrote: > Hara > > Figured out the problem. Welcome to the world of biology, where indexes start at 1; > in computer science they start at 0. > > If you use 1 as your first index it will work. In the core module we tried > to make that clear by using BioIndex in the method name. I will see what I > can do about getting that added/changed in the alignment module.
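Marco's proposal, rejecting index 0 with an explicit message wherever positions are counted from 1, could look roughly like the sketch below. OneBasedList is a hypothetical wrapper written only for illustration; it is not BioJava's Profile API and does not reproduce its BioIndex method names. It assumes, as Scooter reports for getAlignedSequence, that valid indices run from 1 to size().

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical wrapper (not a BioJava class) sketching the proposal:
 * elements are addressed with biological, 1-based indices, and index 0
 * is rejected with an explanatory message.
 */
public class OneBasedList<T> {

    private final List<T> items; // stored with Java's usual 0-based indexing

    public OneBasedList(List<T> items) {
        this.items = items;
    }

    /** Returns the element at a 1-based position, valid range 1..size(). */
    public T get(int bioIndex) {
        if (bioIndex == 0) {
            throw new IndexOutOfBoundsException(
                "Index 0 requested, but sequence indices start at 1");
        }
        if (bioIndex < 0 || bioIndex > items.size()) {
            throw new IndexOutOfBoundsException(
                "Index " + bioIndex + " is outside the range 1.." + items.size());
        }
        return items.get(bioIndex - 1); // convert to the 0-based list index
    }

    public int size() {
        return items.size();
    }

    public static void main(String[] args) {
        OneBasedList<String> seqs =
            new OneBasedList<String>(Arrays.asList("SHALG", "SWQVLG"));
        System.out.println(seqs.get(1)); // prints SHALG
        try {
            seqs.get(0);
        } catch (IndexOutOfBoundsException e) {
            // prints: Index 0 requested, but sequence indices start at 1
            System.out.println(e.getMessage());
        }
    }
}
```

Internally the wrapper keeps an ordinary 0-based java.util.List and converts only at the boundary, so code that does not model sequence positions can keep the usual Java convention, as Marco suggests.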
> > Thanks > > Scooter > > On Fri, Jan 7, 2011 at 12:20 PM, Hara Dilley wrote: > >> Thanks Scooter, >> >> Below is the code showing how I populate lst. Of course my real sequences are >> different, but for this example it doesn't matter. >> >> List lst = new ArrayList(); >> >> ProteinSequence s1 = new ProteinSequence("SHALG"); >> >> ProteinSequence s2 = new ProteinSequence("SWQVLG"); >> >> lst.add(s1); >> >> lst.add(s2); >> >> From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of Scooter >> Willis >> Sent: Thursday, January 06, 2011 6:43 PM >> To: Hara Dilley >> Cc: biojava-l at lists.open-bio.org >> Subject: Re: [Biojava-l] biojava3 getting the features from >> alignedsequence >> >> Hara >> >> Can you provide more of the code you are using that shows how you are >> loading the initial sequences. >> >> Thanks >> >> Scooter >> >> On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: >> >> Hi, >> >> I would like to align a set of sequences against a scaffold and get the >> list of the modifications for each aligned sequence. >> I am using biojava3. >> I have tried to create a profile, thinking that I can get the >> AlignedSequences from it, but it appears to be null. >> Here is part of my code: >> >> Profile profile = >> Alignments.getMultipleSequenceAlignment(lst); >> profile.getAlignedSequence(0); >> >> Can someone please point me to an example for this or to the classes I have to >> use?
>> Thank you, >> Hara >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas at sdsc.edu Fri Jan 7 17:25:28 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 7 Jan 2011 14:25:28 -0800 Subject: [Biojava-l] github svn access issues ... resolved Message-ID: Hi, Quick follow up: it seems that github pushed out an update and this works again: svn co http://svn.github.com/biojava/biojava.git ./biojava Andreas From science.translator at gmail.com Fri Jan 7 19:58:12 2011 From: science.translator at gmail.com (Matthew Busse) Date: Fri, 7 Jan 2011 16:58:12 -0800 Subject: [Biojava-l] Biojava-l Digest, Vol 96, Issue 5 In-Reply-To: References: Message-ID: Thanks for your input, Steve, I was able to clone the source code from the github using a git client (SmartGit 2.02). As a note to others who may be attempting to do the same, it took a really long time, probably about an hour, to clone all the files, just let it keep running. Cheers, Matthew > Message: 1 > Date: Thu, 6 Jan 2011 16:56:42 -0600 > From: "Steve Darnell" > Subject: Re: [Biojava-l] how to access biojava source control on svn > To: "Sj Pookpan" > Cc: biojava-l at biojava.org > Message-ID: > Content-Type: text/plain; charset="iso-8859-1" > > In the past, I have had problems using the Subclipse plug-in in Eclipse to > checkout BioJava from the svn service at github on both Windows and Mac. > This is how I worked around it... > > http://lists.open-bio.org/pipermail/biojava-dev/2010-August/004356.html > > The Subclipse svn plug-in from Tigris would fail during checkout with an > exception stating "RA layer request failed svn: REPORT of '[...]' 200 > OK" for both Windows 7 and OSX 10.6. 
For Windows, there are reports of > the Windows indexing service, antivirus scanners, or the Subclipse > JavaHL adapter causing svn problems. Resolving these issues did not > help. > > In the end, I used a git client on Windows to clone the biojava project > from github and then imported it into Eclipse as an existing Maven > project with the m2eclipse plug-in. > > -- > > I did not revisit my search for an integrated Eclipse solution. Now, I > mainly use the jars made by the maven auto-build system. > > ~Steve > > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto: > biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > Sent: Wednesday, January 05, 2011 10:27 PM > To: Sj Pookpan > Cc: biojava-l at biojava.org > Subject: Re: [Biojava-l] how to access biojava source control on svn > > try > > svn co http://svn.github.com/biojava/biojava.git ./biojava > > > A > > On Wed, Jan 5, 2011 at 8:21 PM, Sj Pookpan wrote: > > Thank you for your answer > > ??? I try the account mention on > > 'http://www.biojava.org/wiki/CVS_to_SVN_Migration' > > > > - Anonymous SVN access or > > - Developer SVN access (ssh account required) or > > - BioJava SNAPSHOT builds (anonymous, Maven required) > > > > on eclipse SVN connection user interface, the application always ask for > > 'password' to access 'svn', how do I get it pass? > > > > How ever, I already subscibe my mail to biojava mailing list (yesterday > > 05/01/2011) and waiting for mail confirmation, but until now nothing send > to > > me what the next action for me. > > > > Regards, > > > > SJP. > > > > On 1/6/2011 12:20 AM, Marco Valtas wrote: > > > > You might check the Github mirror too, code.open-bio.org seem offline > > sometimes. > > > > Cheers Marco. > > > > Sent from my iPhone > > > > On 05/01/2011, at 13:46, Andreas Prlic wrote: > > > > Hi Sj, > > > > Better to post such questions to the public lists, otherwise they > > might get lost... 
> > > > SVN access instructions are available from > > http://www.biojava.org/wiki/CVS_to_SVN_Migration > > > > Andreas > > > > > > > > > > 2011/1/5 Sj Pookpan : > > > > Dear biojava master > > > > I'm java developer and new to biojava, I want to know that how can get > > access to biojava source code from SVN (by eclipse SVN). Do I need to > > register to biojava to get user account and password (I try to find on > > biojava site but no where to register). Would you please give me the way > to > > startup this. > > > > SJP. > > > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > ------------------------------ > > Message: 2 > Date: Thu, 6 Jan 2011 16:09:29 -0800 > From: Hara Dilley > Subject: [Biojava-l] biojava3 getting the features from > alignedsequence > To: "biojava-l at lists.open-bio.org" > Message-ID: > <097A86B3EF965D47895332A88D4223CF12AB11203B at mail2.CATBIO.local> > Content-Type: text/plain; charset="us-ascii" > > Hi, > > I would like to align a set of sequences against a scaffold and get the > list of the modifications for each aligned sequence. > I am using biojava3 > I have tried to create a profile thinking that I can get the > AlignedSequences from it but that it appears to be null. > Here is part of my code: > > Profile profile = > Alignments.getMultipleSequenceAlignment(lst); > Profile.getAlignedSequence(0); > > Can someone please point to an example for this or to the classes I have to > use. 
> Thank you, > Hara > > > > > ------------------------------ > > Message: 3 > Date: Thu, 6 Jan 2011 16:45:13 -0800 > From: Matthew Busse > Subject: [Biojava-l] Checking out code > To: biojava-l at lists.open-bio.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > Hello all, > > I've tried several times over the past couple days to check out code from > both of the sites, and I either receive a "Connection timed out" error from > the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( > http://svn.github.com)" error from the github site. > > I'm using the maven and subclipse plug-ins for eclipse, and this is the > first time I've tried to use an SVN repository, so it's possible I'm doing > something wrong, but these errors sound like it's a connection problem. > > Thanks! > > Matthew > > > ------------------------------ > > Message: 4 > Date: Thu, 6 Jan 2011 17:04:43 -0800 > From: Andreas Prlic > Subject: Re: [Biojava-l] Checking out code > To: Matthew Busse > Cc: biojava-l at lists.open-bio.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > I just tried to reproduce a fresh checkout from github and I can > confirm, it does not work correctly any more. > > ... investigating what is going on... > > A > > > > On Thu, Jan 6, 2011 at 4:45 PM, Matthew Busse > wrote: > > Hello all, > > > > I've tried several times over the past couple days to check out code from > > both of the sites, and I either receive a "Connection timed out" error > from > > the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( > > http://svn.github.com)" error from the github site. > > > > I'm using the maven and subclipse plug-ins for eclipse, and this is the > > first time I've tried to use an SVN repository, so it's possible I'm doing > > something wrong, but these errors sound like it's a connection problem. > > > > Thanks!
> > > > Matthew > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > > ------------------------------ > > Message: 5 > Date: Thu, 6 Jan 2011 17:06:46 -0800 > From: Andreas Prlic > Subject: Re: [Biojava-l] Checking out code > To: biojava-l at biojava.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > and to follow up, I believe we are seeing this problem: > > > http://support.github.com/discussions/repos/5188-fresh-checkout-of-svn-repo-shows-dozens-of-empty-files-where-they-shouldnt-be > > > > > On Thu, Jan 6, 2011 at 5:04 PM, Andreas Prlic wrote: > > I just tried to reproduce a fresh checkout from github and I can > > confirm, it does not work correctly any more. > > > > ... investigating what is going on... > > > > A > > > > > > > > On Thu, Jan 6, 2011 at 4:45 PM, Matthew Busse > > wrote: > >> Hello all, > >> > >> I've tried several times over the past couple days to check out code > from > >> both of the sites, and I either receive a "Connection timed out" error > from > >> the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( > >> http://svn.github.com)" error from the github site. > >> > >> I'm using the maven and subclipse plug-ins for eclipse, and this is the > >> first time I've tried to use an SVN repository, so it's possible I'm > doing > >> something wrong, but these errors sound like it's a connection problem. > >> > >> Thanks! > >> > >> Matthew > >> _______________________________________________ > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > > > > > > -- > ----------------------------------------------------------------------- > Dr.
Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > > > > ------------------------------ > > Message: 6 > Date: Thu, 6 Jan 2011 18:24:48 -0800 > From: Andreas Prlic > Subject: [Biojava-l] github svn access issues > To: biojava-dev , > biojava-l at biojava.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > Hi, > > Seems github is currently having issues with their svn interface. As > such if you want to get hold of a copy of the latest source for the > moment there are the following two options: > > - use git rather than svn to fetch the code from github > - use Maven and get the biojava3.0.1-SNAPSHOT builds from the > http://www.biojava.org/download/maven/ repository > > Andreas > > > ------------------------------ > > Message: 7 > Date: Thu, 6 Jan 2011 21:42:50 -0500 > From: Scooter Willis > Subject: Re: [Biojava-l] biojava3 getting the features from > alignedsequence > To: Hara Dilley > Cc: "biojava-l at lists.open-bio.org" > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > Hara > > Can you provide more of the code you are using that shows how you are > loading the initial sequences. > > Thanks > > Scooter > > On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: > > > Hi, > > > > I would like to align a set of sequences against a scaffold and get the > > list of the modifications for each aligned sequence. > > I am using biojava3 > > I have tried to create a profile thinking that I can get the > > AlignedSequences from it but that it appears to be null. > > Here is part of my code: > > > > Profile profile = > > Alignments.getMultipleSequenceAlignment(lst); > > Profile.getAlignedSequence(0); > > > > Can someone please point to an example for this or to the classes I have > to > > use. 
> > Thank you, > > Hara > > > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > > > ------------------------------ > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > End of Biojava-l Digest, Vol 96, Issue 5 > **************************************** > From willishf at ufl.edu Fri Jan 7 20:31:26 2011 From: willishf at ufl.edu (Scooter Willis) Date: Fri, 7 Jan 2011 20:31:26 -0500 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> Message-ID: The javadoc on the interface says that an IndexOutOfBoundsException is thrown and that indexing starts at 1. This just needs to be defined better as a general rule in the method name, for example getAlignedSequenceBioIndex(int index). BioIndex should be used in all methods where the intent is to start at 1 instead of 0. We should also provide the equivalent getAlignedSequence(int index) starting at index 0; if not, this mistake will be made often. Thanks Scooter On Fri, Jan 7, 2011 at 5:04 PM, Hara Dilley wrote: > That would be very helpful! > > -----Original Message----- > From: Marco Valtas [mailto:mavcunha at gmail.com] > Sent: Friday, January 07, 2011 2:03 PM > To: Scooter Willis > Cc: Hara Dilley; biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] biojava3 getting the features from alignedsequence > > I think in such cases we could throw an exception.
What I mean is that, since > sequence positions are counted from 1, if someone tries to fetch > position 0 an exception could be thrown saying that such an index starts > at 1. For arrays and lists that do not model a sequence, it would be better to keep the > computer science convention. Any thoughts? > > Marco Valtas > Developer at ThoughtWorks > > Sent from my iPhone > > On 07/01/2011, at 17:51, Scooter Willis wrote: > > > Hara > > > > Figured out the problem. Welcome to the world where biology indexes start at > 1 > > and computer science indexes start at 0. > > > > If you use 1 as your first index it will work. In the core module we > tried > > to make that clear by using BioIndex in the method name. I will see what > I > > can do about getting that added/changed in the alignment module. > > > > Thanks > > > > Scooter > > > > On Fri, Jan 7, 2011 at 12:20 PM, Hara Dilley wrote: > > > >> Thanks Scooter, > >> > >> Below is the code of how I populate lst. Of course my real sequences are > >> different, but for this example it doesn't matter. > >> > >> > >> > >> List lst = new ArrayList(); > >> > >> ProteinSequence s1 = new ProteinSequence("SHALG"); > >> > >> ProteinSequence s2 = new ProteinSequence("SWQVLG"); > >> > >> lst.add(s1); > >> > >> lst.add(s2); > >> > >> > >> > >> > >> > >> > >> > >> From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of > Scooter > >> Willis > >> Sent: Thursday, January 06, 2011 6:43 PM > >> To: Hara Dilley > >> Cc: biojava-l at lists.open-bio.org > >> Subject: Re: [Biojava-l] biojava3 getting the features from > >> alignedsequence > >> > >> > >> > >> Hara > >> > >> > >> > >> Can you provide more of the code you are using that shows how you are > >> loading the initial sequences.
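The 1-based versus 0-based convention that Scooter and Marco describe can be sketched as a small helper. It converts a biological position into a Java list index and rejects 0 with an explicit message, as Marco suggests. This is a minimal illustration only: BioIndexSketch, toZeroBased and getBioIndex are hypothetical names, not BioJava API, and the plain strings stand in for aligned sequences.

```java
import java.util.List;

/**
 * Sketch of 1-based ("BioIndex") access on top of a 0-based Java list.
 * Class and method names are hypothetical, not part of BioJava.
 */
public final class BioIndexSketch {

    private BioIndexSketch() {}

    /** Converts a 1-based biological position into a 0-based Java index. */
    static int toZeroBased(int bioIndex, int size) {
        if (bioIndex == 0) {
            // The likely mistake: a caller passing a 0-based index.
            throw new IndexOutOfBoundsException("Biological indexes start at 1, got 0");
        }
        if (bioIndex < 1 || bioIndex > size) {
            throw new IndexOutOfBoundsException(
                    "Index " + bioIndex + " is outside the range 1.." + size);
        }
        return bioIndex - 1;
    }

    /** Hypothetical accessor in the spirit of getAlignedSequenceBioIndex(int). */
    static <T> T getBioIndex(List<T> items, int bioIndex) {
        return items.get(toZeroBased(bioIndex, items.size()));
    }

    public static void main(String[] args) {
        List<String> rows = List.of("SHALG", "SWQVLG");
        System.out.println(getBioIndex(rows, 1)); // first row, "SHALG"
        try {
            getBioIndex(rows, 0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the 1-based variant explicit in the method name, as with the BioIndex naming in the core module, leaves ordinary 0-based access available for collections that do not model a sequence.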
> >> > >> > >> > >> Thanks > >> > >> > >> Scooter > >> > >> > >> > >> On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: > >> > >> Hi, > >> > >> I would like to align a set of sequences against a scaffold and get the > >> list of the modifications for each aligned sequence. > >> I am using biojava3 > >> I have tried to create a profile thinking that I can get the > >> AlignedSequences from it but that it appears to be null. > >> Here is part of my code: > >> > >> Profile profile = > >> Alignments.getMultipleSequenceAlignment(lst); > >> Profile.getAlignedSequence(0); > >> > >> Can someone please point to an example for this or to the classes I have > to > >> use. > >> Thank you, > >> Hara > >> > >> > >> _______________________________________________ > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > >> > >> > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > From hdilley at catbio.com Mon Jan 10 13:02:03 2011 From: hdilley at catbio.com (Hara Dilley) Date: Mon, 10 Jan 2011 10:02:03 -0800 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> Message-ID: <097A86B3EF965D47895332A88D4223CF12AB1121B1@mail2.CATBIO.local> Reading the javadoc, I don't see a direct way of getting the features out of the alignedSequences. I would assume that I have to write my own compare method that compares the 2 sequences, and figures out the features. Is that correct? 
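If a hand-written compare method is the route taken, one minimal way to express it is to walk two rows of an alignment column by column and record substitutions, insertions and deletions. The sketch assumes each aligned row is available as a plain gapped string with '-' for gaps. The class, method and output format are hypothetical, and the example rows are made up rather than produced by BioJava.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Lists column-by-column differences between two rows of an alignment.
 * Rows are plain gapped strings ('-' = gap); names are illustrative only.
 */
public class AlignedDiffSketch {

    static List<String> differences(String reference, String query) {
        if (reference.length() != query.length()) {
            throw new IllegalArgumentException("Aligned rows must have equal length");
        }
        List<String> diffs = new ArrayList<>();
        for (int i = 0; i < reference.length(); i++) {
            char r = reference.charAt(i);
            char q = query.charAt(i);
            if (r == q) {
                continue; // identical residue, or a gap in both rows
            }
            String kind;
            if (r == '-') {
                kind = "insertion";    // residue present only in the query
            } else if (q == '-') {
                kind = "deletion";     // residue missing from the query
            } else {
                kind = "substitution";
            }
            // Report 1-based alignment columns, following the biological convention.
            diffs.add((i + 1) + " " + kind + " " + r + ">" + q);
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Hand-made example rows, not output from a BioJava aligner.
        for (String d : differences("SHA-LG", "SWQVLG")) {
            System.out.println(d);
        }
    }
}
```

The positions reported are alignment columns; converting them to residue numbers in either original sequence would mean counting only the non-gap characters of that row up to the column.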
thanks From willishf at ufl.edu Mon Jan 10 13:39:34 2011 From: willishf at ufl.edu (Scooter Willis) Date: Mon, 10 Jan 2011 13:39:34 -0500 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: <097A86B3EF965D47895332A88D4223CF12AB1121B1@mail2.CATBIO.local> References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB1121B1@mail2.CATBIO.local> Message-ID: Hara "Features" is rather abstract, so I'm not sure what data you are trying to extract. Are you looking for the amino acids in each of the aligned columns? Scooter On Mon, Jan 10, 2011 at 1:02 PM, Hara Dilley wrote: > Reading the javadoc, I don't see a direct way of getting the features out > of the alignedSequences. I would assume that I have to write my own compare > method that compares the 2 sequences, and figures out the features. Is that > correct? > > > > thanks > From khalil.elmazouari at gmail.com Fri Jan 14 05:32:47 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Fri, 14 Jan 2011 11:32:47 +0100 Subject: [Biojava-l] unwanted gap in alignments Message-ID: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Hi All, I am testing the PSA and MSA examples from Cookbook3. Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex.
below: EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R expected PSA was: EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- the same for MSA DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- expected MSA DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! How can I force or avoid the gap creation at specific positions? Many thanks. 
Khalil From willishf at ufl.edu Fri Jan 14 07:36:34 2011 From: willishf at ufl.edu (Scooter Willis) Date: Fri, 14 Jan 2011 07:36:34 -0500 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: Khalil You can change the GAP penalty and see what happens. I think there is also a way to specify a pre-alignment of sequence positions but haven't used it. Thanks Scooter On Fri, Jan 14, 2011 at 5:32 AM, Khalil El Mazouari < khalil.elmazouari at gmail.com> wrote: > Hi All, > > I am testing the PSA and MSA examples from Cookbook3. > > Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex. > below: > > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R > > expected PSA was: > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- > > > the same for MSA > > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > expected MSA > > 
DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > > I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! > > How can I force or avoid the gap creation at specific positions? > > Many thanks. > > Khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From khalil.elmazouari at gmail.com Fri Jan 14 08:51:10 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Fri, 14 Jan 2011 14:51:10 +0100 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: <21D9291E-DE12-4496-8E0F-5428A4760CA7@gmail.com> Hi Scooter, I've tested different gop and gep values. No success!. Regards, Khalil On 14 Jan 2011, at 13:36, Scooter Willis wrote: > Khalil > > You can change the GAP penalty and see what happens. I think there is also a way to specify a pre-alignment of sequence positions but haven't used it. > > Thanks > > Scooter > > On Fri, Jan 14, 2011 at 5:32 AM, Khalil El Mazouari wrote: > Hi All, > > I am testing the PSA and MSA examples from Cookbook3. > > Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex. 
below: > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R > > expected PSA was: > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- > > > the same for MSA > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > expected MSA > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > > I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! > > How can I force or avoid the gap creation at specific positions? > > Many thanks. 
> > Khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From andreas at sdsc.edu Fri Jan 14 10:45:01 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 14 Jan 2011 07:45:01 -0800 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: looks a bit like an end-gap issue to me. I think the global alignment algorithm does not penalize end gaps. Try a local alignment (smith waterman) instead. Andreas On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari wrote: > Hi All, > > I am testing the PSA and MSA examples from Cookbook3. > > Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex. below: > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R > > expected PSA was: > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- > > > the same for MSA > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > expected 
MSA > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > > I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! > > How can I force or avoid the gap creation at specific positions? > > Many thanks. > > Khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From drandrewwalsh at gmail.com Fri Jan 14 11:22:55 2011 From: drandrewwalsh at gmail.com (Andrew Walsh) Date: Fri, 14 Jan 2011 11:22:55 -0500 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: <4D30785F.6070502@gmail.com> Changing the gap penalty isn't making a difference because both versions have the same number of gaps and gaps of the same length. Penalizing end gaps might address the first example, but not the second. Since the gaps are the same (from the point of view of how gaps are scored by the algorithms), what actually drives the output are the substitution penalties. In the PSA example, the preferred alignment has an 'R' substituted for a 'G', whereas the unwanted output has 'R' substituted for 'S'. The latter is the more common substitution since it is more conservative from the point of view of amino acid chemistry and may also require fewer mutations (although that depends on the codon usage for both 'R' and 'S').
Thus it will get a lower penalty, so most algorithms will prefer the unwanted PSA over your expected output. A similar reasoning applies to the MSA example. In the unwanted version, it is matching 'G' to 'G', which is not a substitution at all and thus gets a higher score than the 'V' to 'G' substitution required for the expected output. Now, I can understand why, in the PSA example an end gap seems more likely than an internal gap, and in the MSA example one deletion event seems more likely than two similar but slightly different deletion events. But the math of the traditional alignment algorithms just won't support those outputs. Unfortunately, I don't have a good answer for how to make BioJava output your desired result. But it is my hope that clarifying the problem might be a useful step in arriving at a solution. Incidentally, does your desired output come directly from a particular alignment algorithm, or have they been hand-adjusted? -Andy Walsh On 1/14/2011 10:45 AM, Andreas Prlic wrote: > looks a bit like an end-gap issue to me. I think the global alignment > algorithm does not penalize end gaps. Try a local alignment (smith > waterman) instead. > > Andreas > > > > On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari > wrote: >> Hi All, >> >> I am testing the PSA and MSA examples from Cookbook3. >> >> Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex. 
below: >> >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R >> >> expected PSA was: >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- >> >> >> the same for MSA >> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT >> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- >> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> >> expected MSA >> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT >> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- >> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> >> >> I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! >> >> How can I force or avoid the gap creation at specific positions? >> >> Many thanks. 
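Andrew's argument can be checked with a little arithmetic. The sketch below scores the tail of the PSA example and a slice of two MSA rows with an affine gap penalty and a handful of BLOSUM62 entries. The gap open/extension values (10/1) are assumptions rather than BioJava defaults, the MSA case is reduced to a pairwise comparison of two rows, and the class name is hypothetical; String.repeat needs Java 11 or later.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Scores two candidate pairwise alignments with a substitution table and
 * affine gap penalties, to show why the "unwanted" alignment can win.
 */
public class GapPlacementScoreSketch {

    // A handful of BLOSUM62 entries, just enough for this example.
    private static final Map<String, Integer> SUBST = new HashMap<>();

    static {
        pair('C', 'C', 9);
        pair('A', 'A', 4);
        pair('E', 'E', 5);
        pair('W', 'W', 11);
        pair('V', 'V', 4);
        pair('G', 'G', 6);
        pair('R', 'R', 5);
        pair('R', 'S', -1);
        pair('R', 'G', -2);
        pair('V', 'G', -3);
    }

    private static void pair(char a, char b, int score) {
        SUBST.put("" + a + b, score);
        SUBST.put("" + b + a, score);
    }

    private static int substitution(char a, char b) {
        Integer s = SUBST.get("" + a + b);
        if (s == null) {
            throw new IllegalArgumentException("No score for " + a + "/" + b + " in this sketch");
        }
        return s;
    }

    /**
     * A run of k consecutive gap columns costs gapOpen + (k - 1) * gapExtend.
     * If freeEndGaps is true, runs touching either end of the alignment cost nothing.
     * Simplification: adjacent gap columns count as one run even if the gap switches rows.
     */
    static int score(String top, String bottom, int gapOpen, int gapExtend, boolean freeEndGaps) {
        if (top.length() != bottom.length()) {
            throw new IllegalArgumentException("Aligned rows must have equal length");
        }
        int n = top.length();
        int total = 0;
        int i = 0;
        while (i < n) {
            if (top.charAt(i) != '-' && bottom.charAt(i) != '-') {
                total += substitution(top.charAt(i), bottom.charAt(i));
                i++;
                continue;
            }
            int start = i;
            while (i < n && (top.charAt(i) == '-' || bottom.charAt(i) == '-')) {
                i++;
            }
            boolean endGap = start == 0 || i == n;
            if (!(freeEndGaps && endGap)) {
                total -= gapOpen + (i - start - 1) * gapExtend;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int gop = 10, gep = 1; // assumed penalties, not BioJava defaults
        // Tail of the PSA example: final R against S (unwanted) or against G (expected).
        String psaTop = "CAGYDYGNFDYWGQGTTLTVSS";
        String psaUnwanted = "CA" + "-".repeat(19) + "R";
        String psaExpected = "CAR" + "-".repeat(19);
        // Slice of two MSA rows: G against G (unwanted) or G against V (expected).
        String msaTop = "EWVVWRVEQVVEKAFANSVNGR";
        String msaUnwanted = "EWV" + "-".repeat(17) + "GR";
        String msaExpected = "EWVG" + "-".repeat(17) + "R";

        System.out.println("PSA unwanted " + score(psaTop, psaUnwanted, gop, gep, false)
                + ", expected " + score(psaTop, psaExpected, gop, gep, false));
        System.out.println("MSA unwanted " + score(msaTop, msaUnwanted, gop, gep, false)
                + ", expected " + score(msaTop, msaExpected, gop, gep, false));
        System.out.println("PSA, free end gaps: unwanted " + score(psaTop, psaUnwanted, gop, gep, true)
                + ", expected " + score(psaTop, psaExpected, gop, gep, true));
    }
}
```

With end gaps scored like internal gaps, each pair of candidates carries one gap of identical length, so only the substitution terms differ: the unwanted PSA wins by 1 (R/S scores -1, R/G scores -2) and the unwanted MSA slice wins by 9 (G/G scores 6, V/G scores -3). Scoring end gaps as free flips the PSA result (expected 11 versus unwanted -16), while the MSA slice is unaffected because its gap is internal.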
>> >> Khalil >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From khalil.elmazouari at gmail.com Fri Jan 14 12:31:06 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Fri, 14 Jan 2011 18:31:06 +0100 Subject: Re: [Biojava-l] unwanted gap in alignments In-Reply-To: References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: Hi Andreas, local alignment doesn't help: GLOBAL: EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R LOCAL: EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCA EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA The last R is gone. This is not what I expect. Regards, Khalil On 14 Jan 2011, at 16:45, Andreas Prlic wrote: > looks a bit like an end-gap issue to me. I think the global alignment > algorithm does not penalize end gaps. Try a local alignment (smith > waterman) instead. > > Andreas > > > > On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari > wrote: >> Hi All, >> >> I am testing the PSA and MSA examples from Cookbook3. >> >> Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex.
below: >> >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R >> >> expected PSA was: >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- >> >> >> the same for MSA >> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT >> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- >> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> >> expected MSA >> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT >> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- >> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> >> >> I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! >> >> How can I force or avoid the gap creation at specific positions? >> >> Many thanks. 
>> >> Khalil >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> From andrew.mcsweeny at rockets.utoledo.edu Sun Jan 16 19:31:59 2011 From: andrew.mcsweeny at rockets.utoledo.edu (McSweeny, Andrew J) Date: Mon, 17 Jan 2011 00:31:59 +0000 Subject: [Biojava-l] Purpose of BioJava mailing list Message-ID: <469B4CD3D7690A418E8F96B7BA4585F81406F7B0@BL2PRD0103MB052.prod.exchangelabs.com> Hi all, Is the purpose of the BioJava group and mailing list solely for discussing the BioJava framework, or may we post about unrelated bioinformatics tools we are working on in Java that will be released as open-source projects? Andrew McSweeny, MS From markjschreiber at gmail.com Sun Jan 16 21:10:53 2011 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 17 Jan 2011 10:10:53 +0800 Subject: Re: [Biojava-l] Purpose of BioJava mailing list In-Reply-To: <469B4CD3D7690A418E8F96B7BA4585F81406F7B0@BL2PRD0103MB052.prod.exchangelabs.com> References: <469B4CD3D7690A418E8F96B7BA4585F81406F7B0@BL2PRD0103MB052.prod.exchangelabs.com> Message-ID: I think low-volume stuff that is relevant to bioinformatics and Java is OK. Maybe mark it with an [off topic] tag in the subject. The BioJava LinkedIn site also discusses Java bioinformatics things that are not strictly BioJava. - Mark On Mon, Jan 17, 2011 at 8:31 AM, McSweeny, Andrew J < andrew.mcsweeny at rockets.utoledo.edu> wrote: > Hi all, > > Is the purpose of the BioJava group and mailing list solely for discussing > the BioJava framework, or may we post about unrelated bioinformatics tools > we are working on in Java that will be released as open-source projects?
> > Andrew McSweeny, MS > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From gwaldon at geneinfinity.org Mon Jan 17 12:49:27 2011 From: gwaldon at geneinfinity.org (George Waldon) Date: Mon, 17 Jan 2011 11:49:27 -0600 Subject: [Biojava-l] new problem: serializable Message-ID: <20110117114927.63994vvte2wvtlic@gator1273.hostgator.com> Hi Bernd: I need this too. Have you been able to serialize SimpleRichSequence properly? Is it possible for you to share your code? We could add it to BioJava. Let me know. Thanks, George ------------------------------------------------------------------------------ On Mon, Sep 27, 2010 at 6:17 AM, Bernd Jagla wrote: Yes, that's what I am doing. I have subclassed SimpleRichSequence as SimpleSerializableRichSequence (couldn't think of something nicer...) and am working my way through the bits... I just haven't found the tools that make this a "jiffy". I am sweating here, more or less at least... ;) Thanks again B On 9/27/2010 2:04 PM, Richard Holland wrote: I think you can follow James' advice and subclass SimpleRichSequence, and then annotate it such that the awkward bits are not serialised. Or, you can just extract the parameters of interest out of the original object and put them into some holding class (e.g. a simple HashMap) as I suggested, and serialise that instead. cheers, Richard On 27 Sep 2010, at 13:01, Bernd Jagla wrote: Thanks everyone. I got the biojavax working. Unfortunately the serialization process is not completely done yet... It turns out the following information is more difficult than expected to serialize... I haven't found a tool in Eclipse that can help me there. Generally the problems arise when dealing with Sets like annotation, features, notes, RankedDocRef. But I also have problems with SimpleNCBITaxon.
At least I was able to create a SimpleRichSequence object. Please let me know if you can think of something that would ease the work a bit.... Thanks a lot, Bernd On 9/23/2010 6:01 PM, James Swetnam wrote: How about subclassing SimpleRichSequence and implementing Serializable yourself? It doesn't seem to be final. Eclipse can do it in a jiffy. Hacky, but it will get you over the bump. James Swetnam On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland wrote: The RichSequence interface doesn't extend Serializable, so you can't serialize BioJavaX sequence objects. :( I can't remember the logic behind that one, but it seemed like there was a good reason at the time... If you're passing sequences around by serialisation, do you really need to pass the complete object, or could you just pass the bits you're interested in in some kind of basic data structure? On 23 Sep 2010, at 16:27, Bernd Jagla wrote: Sorry, me again... I now get the following error: Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) ... 9 more It seems that SimpleRichSequence is not serializable....
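Richard's suggestion in this thread is to copy only the fields you need into a plain holder and serialise that instead. This avoids the annotation, feature and taxon objects that block serialization. The stack trace shows KNIME writing the cell through `java.io.ObjectOutputStream`. That means any class built only from serializable JDK types will pass. The sketch below is a minimal illustration under that assumption. `SequenceSnapshot` and its fields are hypothetical, not BioJava API. In real code you would fill them from your RichSequence before handing the object on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical holder: only plain serializable fields, no BioJavaX annotation/feature sets or taxon objects.
public class SequenceSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;

    final String accession;
    final String description;
    final String residues;
    final HashMap<String, String> notes = new HashMap<>(); // flattened key/value annotations

    SequenceSnapshot(String accession, String description, String residues) {
        this.accession = accession;
        this.description = description;
        this.residues = residues;
    }

    // Write with standard Java serialization, the same mechanism used in the stack trace above.
    static byte[] toBytes(SequenceSnapshot s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        return bytes.toByteArray();
    }

    // Read it back; a consumer can rebuild a full sequence object from these fields if needed.
    static SequenceSnapshot fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SequenceSnapshot) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SequenceSnapshot s = new SequenceSnapshot("seq1", "example entry", "QVQLQQPGAELVKPGASVKLSCKAS");
        s.notes.put("source", "example");
        SequenceSnapshot copy = fromBytes(toBytes(s));
        System.out.println(copy.accession + " " + copy.residues.length() + " " + copy.notes.get("source"));
    }
}
```

This prints `seq1 25 example`. There is a trade-off against James's approach (subclassing SimpleRichSequence and implementing Serializable). The subclass keeps the full object but has to cope with every non-serializable member. The holder loses whatever you choose not to copy.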
Is there a way to make use of a serializable object? Thanks, Bernd _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l -- James Swetnam Lead Scientific Programmer Department of Pharmacology NYU Langone Medical Center -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From jw12 at sanger.ac.uk Tue Jan 18 05:28:29 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Tue, 18 Jan 2011 10:28:29 +0000 Subject: [Biojava-l] Registrations open for DAS Workshop 2011 Message-ID: DAS is currently being used to share annotations on genomes, protein alignments, structural and interaction information. If you are interested in sharing biological information the DAS workshop below may be of interest to you. Registration is open for the 2011 DAS workshop (2,3,4th March) at the Genome Campus, Hinxton UK. If you are interested in attending, please find out more by going to http://www.ebi.ac.uk//training/onsite/110302DAS.html and register via the web link at the bottom of the page. This workshop will cater for novice to expert DAS users as each day is optional. Please register early as places will be limited. Registration closes 18 February 2011 (17:00). 
If you are interested in giving a 15-minute talk on the second day, please email Jonathan Warren at jonathan.warren at sanger.ac.uk Many thanks The Sanger/EBI DAS team. Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From khalil.elmazouari at gmail.com Wed Jan 19 05:39:44 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Wed, 19 Jan 2011 11:39:44 +0100 Subject: [Biojava-l] SimpleGapPenalty defaults Message-ID: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> Hi all, while doing PSA or MSA with the default gop and gep values I obtained the following alignment! QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R The expected PSA should be at least QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- This expected alignment was obtained with gop=1 and gep=100. I can't understand why the PSA algorithm with default values always adds many gaps at the end of the alignment and ends up with S:R, when it is obvious that with fewer gaps we could obtain a better SequencePair with R:R. Finally, how can I get a score for a PSA that reflects the number of identical residues, similar residues and gaps? Many thanks.
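One possible explanation for the end gap, assuming an affine gap model, comes from comparing the two alignments. The default alignment uses one 21-residue gap and pays for a single S:R mismatch. The "expected" one gains the R:R match but has to open a second gap. Both alignments agree up to `...YYCA`, so the JDK-only sketch below scores only the differing tails. Its assumptions are:

- A gap run of length L costs `gop + L*gep`. BioJava's exact convention may differ by one extension.
- The substitution scores are BLOSUM62's R/R = 5 and S/R = -1.
- `gop=10, gep=1` are illustrative values, not necessarily BioJava's defaults.

```java
import java.util.Map;

public class AffineGapTail {
    // Score two aligned rows of equal length; '-' marks a gap.
    // Assumed convention: a gap run of length L costs gop + L * gep.
    static int score(String a, String b, Map<String, Integer> sub, int gop, int gep) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int total = 0;
        boolean gapRunInA = false; // currently inside a gap run in row a
        boolean gapRunInB = false; // currently inside a gap run in row b
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x == '-' && y == '-') {
                continue; // a column of gaps in both rows carries no score
            }
            if (x == '-') {
                total -= (gapRunInA ? 0 : gop) + gep;
                gapRunInA = true;
                gapRunInB = false;
            } else if (y == '-') {
                total -= (gapRunInB ? 0 : gop) + gep;
                gapRunInB = true;
                gapRunInA = false;
            } else {
                total += lookup(sub, x, y);
                gapRunInA = false;
                gapRunInB = false;
            }
        }
        return total;
    }

    // Symmetric lookup in a sparse substitution table such as {"SR": -1}.
    static int lookup(Map<String, Integer> sub, char x, char y) {
        Integer s = sub.get("" + x + y);
        if (s == null) s = sub.get("" + y + x);
        if (s == null) throw new IllegalArgumentException("no score for " + x + "/" + y);
        return s;
    }

    public static void main(String[] args) {
        Map<String, Integer> blosum62Subset = Map.of("RR", 5, "SR", -1);
        String tail = "TYYFGRSFFDFWGQGTTLTVSS";                   // seq1 after ...YYCA
        String oneGap = "-".repeat(21) + "R";                      // default result
        String twoGaps = "-".repeat(5) + "R" + "-".repeat(16);     // "expected" result
        for (int[] p : new int[][] {{10, 1}, {1, 100}}) {
            System.out.printf("gop=%d gep=%d: one gap %d, two gaps %d%n", p[0], p[1],
                score(tail, oneGap, blosum62Subset, p[0], p[1]),
                score(tail, twoGaps, blosum62Subset, p[0], p[1]));
        }
    }
}
```

The program prints `gop=10 gep=1: one gap -32, two gaps -36` and `gop=1 gep=100: one gap -2102, two gaps -2097`. Under these assumptions, a second gap pays off only when the R:R gain (6 points) exceeds the extra opening cost (`gop`). That matches the behaviour reported with gop=1.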
Khalil From sterg at teikav.edu.gr Wed Jan 19 06:19:51 2011 From: sterg at teikav.edu.gr (sterg) Date: Wed, 19 Jan 2011 13:19:51 +0200 Subject: [Biojava-l] ScalaLab, a system that can offers high-level scripting access to BioJava Message-ID: <1295435991.6592.0.camel@sterg.teikav.edu.gr> Hi guys, I'd like to announce that I am developing ScalaLab, a scientific programming environment based on Scala with a Matlab-like feel (http://code.google.com/p/scalalab/). It can execute BioJava code (BioJava is included as a toolbox), and there is also a lot of potential in using Scala for higher-level, scriptable access to BioJava tasks. I plan to develop a BioJava Scala wrapper library for more compact BioJava code, and of course I'm open to cooperation with the community. I have also opened a ScalaLab discussion mailing list for anyone interested in participating. Regards Stergios From andreas at sdsc.edu Wed Jan 19 10:07:44 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 19 Jan 2011 07:07:44 -0800 Subject: [Biojava-l] SimpleGapPenalty defaults In-Reply-To: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> References: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> Message-ID: Hi Khalil, can you send the code snippet that you are running? I just re-ran the cookbook example and it works for me.
Also this behaves fine: ProteinSequence s1 = new ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); ProteinSequence s2 = new ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); SubstitutionMatrix matrix = new SimpleSubstitutionMatrix(); SequencePair pair = Alignments.getPairwiseAlignment(s1, s2, PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix); System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), pair.getTarget().getAccession(), pair); System.out.println("Identicals:" + pair.getNumIdenticals()); System.out.println("Similars:" + pair.getNumSimilars()); Andreas On Wed, Jan 19, 2011 at 2:39 AM, Khalil El Mazouari wrote: > Hi all, > > while doing PSA or MSA with default gop and gep values I obtained the following alignment! > > QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS > QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R > > Expected PSA should be at least > QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS > QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- > > this expected alignment was obtained with gop=1 and gep=100 > > I can't understand while the PSA algorithm with default values always adds many gaps at the end of alignment to end up with a S:R while it is obvious that with less gaps we could obtain better SequencePair with R:R? 
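Khalil's second question asks for a score reflecting identical residues and gaps. The BioJava3 snippet in Andreas's reply already prints `getNumIdenticals()` and `getNumSimilars()` for a `SequencePair`. If you only have the two aligned rows as strings, a JDK-only helper like the hypothetical `AlignmentStats` below counts identical columns and gap columns. It reports percent identity over the full alignment length, which is one common definition among several. Counting "similar" residues would also need a substitution matrix, so it is left out here.

```java
public class AlignmentStats {
    final int columns;
    final int identical;
    final int gapColumns;

    private AlignmentStats(int columns, int identical, int gapColumns) {
        this.columns = columns;
        this.identical = identical;
        this.gapColumns = gapColumns;
    }

    // Compare two aligned rows of equal length column by column; '-' marks a gap.
    static AlignmentStats of(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int identical = 0, gaps = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x == '-' || y == '-') {
                gaps++;
            } else if (x == y) {
                identical++;
            }
        }
        return new AlignmentStats(a.length(), identical, gaps);
    }

    // Percent identity over all alignment columns (other tools divide by the shorter sequence).
    double percentIdentity() {
        return columns == 0 ? 0.0 : 100.0 * identical / columns;
    }

    public static void main(String[] args) {
        AlignmentStats s = AlignmentStats.of("ACDEFG-HIK", "ACDQFGWH--");
        System.out.printf("identical=%d gaps=%d identity=%.1f%%%n",
            s.identical, s.gapColumns, s.percentIdentity());
    }
}
```

The toy rows give `identical=6 gaps=3 identity=60.0%`. Applied to the two alignments in this thread, the R:R version gains exactly one identical column over the S:R version.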
> > Finally, how to get a score for PSA, that reflects the number of identical, similar residues and gaps? > > Many thanks. > > Khalil > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From khalil.elmazouari at gmail.com Wed Jan 19 16:04:30 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Wed, 19 Jan 2011 22:04:30 +0100 Subject: [Biojava-l] SimpleGapPenalty defaults In-Reply-To: References: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> Message-ID: <6656537C-6B43-47AD-83D1-6B6E1D09E706@gmail.com> Thanks Andreas, these two sequences (s1 and s2) are exactly the same; indeed, it works for 100% identical sequences. I used the same code as below, except that I used .GLOBAL. I am not interested in local alignment. Regards, Khalil On 19 Jan 2011, at 16:07, Andreas Prlic wrote: > Hi Khalil, > > can you send your code snippet that you are running? I just re-ran > the cookbook example and it works for me.
Also this behaves fine: > > ProteinSequence s1 = new > ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); > ProteinSequence s2 = new > ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); > > SubstitutionMatrix matrix = new > SimpleSubstitutionMatrix(); > SequencePair pair = > Alignments.getPairwiseAlignment(s1, s2, > PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix); > System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), > pair.getTarget().getAccession(), pair); > > System.out.println("Identicals:" + pair.getNumIdenticals()); > System.out.println("Similars:" + pair.getNumSimilars()); > > Andreas > > > > On Wed, Jan 19, 2011 at 2:39 AM, Khalil El Mazouari > wrote: >> Hi all, >> >> while doing PSA or MSA with default gop and gep values I obtained the following alignment! 
>> >> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R >> >> Expected PSA should be at least >> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- >> >> this expected alignment was obtained with gop=1 and gep=100 >> >> I can't understand why the PSA algorithm with default values always adds many gaps at the end of alignment to end up with a S:R while it is obvious that with fewer gaps we could obtain better SequencePair with R:R? >> >> Finally, how to get a score for PSA, that reflects the number of identical, similar residues and gaps? >> >> Many thanks. >> >> Khalil >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> From andreas at sdsc.edu Wed Jan 19 16:35:57 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 19 Jan 2011 13:35:57 -0800 Subject: [Biojava-l] SimpleGapPenalty defaults In-Reply-To: <6656537C-6B43-47AD-83D1-6B6E1D09E706@gmail.com> References: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> <6656537C-6B43-47AD-83D1-6B6E1D09E706@gmail.com> Message-ID: Even if I use the global alignment to align this sequence against itself, it aligns 100% and I don't see the strange gap. What are the two sequences you are aligning? Otherwise I can't reproduce the behaviour that you describe. Andreas On Wed, Jan 19, 2011 at 1:04 PM, Khalil El Mazouari wrote: > Thank Andreas, > > these 2 seq (s1 and s2) are exactly the same. Indeed, it works for 100% identical seq. > > I have used the same code as below except, I used .GLOBAL.
I am not interested in local alignment. > > Regards, > > Khalil > > > On 19 Jan 2011, at 16:07, Andreas Prlic wrote: > >> Hi Khalil, >> >> can you send your code snippet that you are running? I just re-ran >> the cookbook example and it works for me. Also this behaves fine: >> >> ProteinSequence s1 = new >> ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); >> ProteinSequence s2 = new >> ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); >> >> SubstitutionMatrix matrix = new >> SimpleSubstitutionMatrix(); >> SequencePair pair = >> Alignments.getPairwiseAlignment(s1, s2, >> PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix); >> System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), >> pair.getTarget().getAccession(), pair); >> >> System.out.println("Identicals:" + pair.getNumIdenticals()); >> System.out.println("Similars:" + pair.getNumSimilars()); >> >> Andreas >> >> >> >> On Wed, Jan 19, 2011 at 2:39 AM, Khalil El Mazouari >> wrote: >>> Hi all, >>> >>> while doing PSA or MSA with default gop and gep values I obtained the following alignment!
>>> >>> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >>> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R >>> >>> Expected PSA should be at least >>> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >>> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- >>> >>> this expected alignment was obtained with gop=1 and gep=100 >>> >>> I can't understand why the PSA algorithm with default values always adds many gaps at the end of alignment to end up with a S:R while it is obvious that with fewer gaps we could obtain better SequencePair with R:R? >>> >>> Finally, how to get a score for PSA, that reflects the number of identical, similar residues and gaps? >>> >>> Many thanks. >>> >>> Khalil >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > > From jayunit100 at gmail.com Wed Jan 19 17:49:36 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Wed, 19 Jan 2011 17:49:36 -0500 Subject: [Biojava-l] structure alignment seems slow ... Message-ID: Hi guys: The following code snippet, for two 150-amino-acid proteins, is taking a rather long time: well over 4 minutes.... long t=(System.currentTimeMillis()); StructurePairAligner aligner = new StructurePairAligner(); aligner.align(s1, s2); aligner.setDebug(false); System.out.println("time = " + (System.currentTimeMillis()-t)/1000+" sec"); Strangely, I'm running BioJava on a 2x28 GHz Quad Core Intel Xeon chip, Mac OS 10.5.8, with aggressiveheap enabled, and 6 GB of RAM. This surprises me, since the tests on the BioJava wiki page for structural alignments seem to be quite efficient.
-- Jay Vyas MMSB/UCHC From andreas at sdsc.edu Wed Jan 19 17:55:56 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 19 Jan 2011 14:55:56 -0800 Subject: [Biojava-l] structure alignment seems slow ... In-Reply-To: References: Message-ID: The old StructurePairAligner is unpublished and is probably slower than the CE and FATCAT implementations. Try e.g. CE: http://biojava.org/wiki/BioJava:CookBook:PDB:CE_Algorithm Andreas On Wed, Jan 19, 2011 at 2:49 PM, Jay Vyas wrote: > Hi guys : > > The following code snippet, for two 150 amino acid proteins, is taking a > somewhat long time, that is, well over 4 minutes, .... > > long t=(System.currentTimeMillis()); > StructurePairAligner aligner = new StructurePairAligner(); > aligner.align(s1, s2); > aligner.setDebug(false); > System.out.println("time = " + (System.currentTimeMillis()-t)/1000+" > sec"); > > Strangely, I'm running biojava on a 2x28 GHz Quad Core Intel Xeon chip, Mac > OS 10.5.8, with aggressiveheap enabled, and 6 Gigs of RAM. > This surprises me since the tests on the biojava wiki page for structural > alignments seem to be quite efficient. > > -- > Jay Vyas > MMSB/UCHC > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From jayunit100 at gmail.com Wed Jan 19 18:12:50 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Wed, 19 Jan 2011 18:12:50 -0500 Subject: [Biojava-l] structure alignment seems slow ... In-Reply-To: References: Message-ID: Fixed my problem: the issue was that I had multiple models in my structures, and I assumed BioJava was only using the first one.
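The problem Jay found was a structure containing several MODEL blocks (typical of NMR entries). In that case a CA array built over the whole structure includes every model. A JDK-only illustration follows, not how BioJava's `StructureTools` works internally. The hypothetical helper reads PDB-format text and keeps only CA atoms from the first model, stopping at the first ENDMDL record. It relies on the fixed PDB columns: record name in 1-6, atom name in 13-16, and x/y/z in 31-54. It ignores HETATM records and alternate locations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FirstModelCA {
    // Returns {x, y, z} for every CA atom of the first model in PDB-format text.
    static List<double[]> caOfFirstModel(String pdbText) {
        List<double[]> coords = new ArrayList<>();
        for (String line : pdbText.split("\\R")) {
            if (line.startsWith("ENDMDL")) {
                break; // everything after this belongs to later models
            }
            if (line.startsWith("ATOM  ") && line.length() >= 54
                    && line.substring(12, 16).trim().equals("CA")) {
                coords.add(new double[] {
                    Double.parseDouble(line.substring(30, 38).trim()),
                    Double.parseDouble(line.substring(38, 46).trim()),
                    Double.parseDouble(line.substring(46, 54).trim())});
            }
        }
        return coords;
    }

    // Builds a fixed-column ATOM record (glycine, chain A) for the demo below.
    static String atom(int serial, String name, int resSeq, double x, double y, double z) {
        return String.format(Locale.ROOT, "ATOM  %5d  %-3s %3s %1s%4d    %8.3f%8.3f%8.3f",
            serial, name, "GLY", "A", resSeq, x, y, z);
    }

    public static void main(String[] args) {
        String pdb = String.join("\n",
            "MODEL        1",
            atom(1, "N", 1, 0.0, 0.0, 0.0),
            atom(2, "CA", 1, 1.458, 0.0, 0.0),
            atom(3, "CA", 2, 3.8, 0.0, 0.0),
            "ENDMDL",
            "MODEL        2",
            atom(4, "CA", 1, 1.5, 0.1, 0.0),
            atom(5, "CA", 2, 3.9, 0.1, 0.0),
            "ENDMDL");
        List<double[]> ca = caOfFirstModel(pdb);
        System.out.println(ca.size() + " CA atoms, first x = " + ca.get(0)[0]);
    }
}
```

This prints `2 CA atoms, first x = 1.458`. Without the ENDMDL check it would return four atoms, one set per model.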
To fix it, replace Atom[] ca2 = StructureTools.getAtomCAArray(s2); with Atom[] ca2 = StructureTools.getAtomCAArray(s2.getModel(0).get(0)); and the CE alignment is blazing fast: ~0 seconds. -- Jay Vyas MMSB/UCHC From andreas at sdsc.edu Wed Jan 19 18:52:19 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 19 Jan 2011 15:52:19 -0800 Subject: [Biojava-l] structure alignment seems slow ... In-Reply-To: References: Message-ID: Glad to hear that it works now. I agree with your initial assumption about the StructureTools class: getAtomCAArray should only consider model 0. I will change it to behave accordingly... Andreas On Wed, Jan 19, 2011 at 3:12 PM, Jay Vyas wrote: > Fixed my problem, the issue was that i had multiple models in my structures, > and I assumed biojava was only using the first one. > > To fix it, if you replace > > > Atom[] ca2 = StructureTools.getAtomCAArray(s2); > > with > > Atom[] ca2 = StructureTools.getAtomCAArray(s2.getModel(0).get(0)); > > The CE alignment is blazing fast. ~0 seconds > > -- > Jay Vyas > MMSB/UCHC > From khalil.elmazouari at gmail.com Thu Jan 20 03:42:11 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Thu, 20 Jan 2011 09:42:11 +0100 Subject: [Biojava-l] SimpleGapPenalty defaults In-Reply-To: References: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> <6656537C-6B43-47AD-83D1-6B6E1D09E706@gmail.com> Message-ID: <285E1045-845F-4899-A1D5-2B4316337EE2@gmail.com> Please try with the following sequences: >seq1 QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >seq2 QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR Thanks, Khalil On 19 Jan 2011, at 22:35, Andreas Prlic wrote: > even if I use the global alignment for aligning this sequence against > itself, it aligns 100% and I don't see the strange gap. What are the > two sequences you are aligning?
Otherwise I can't reproduce the > behaviour that you describe. > > Andreas > > > > On Wed, Jan 19, 2011 at 1:04 PM, Khalil El Mazouari > wrote: >> Thank Andreas, >> >> these 2 seq (s1 and s2) are exactly the same. Indeed, it works for 100% identical seq. >> >> I have used the same code as below except, I used .GLOBAL. I am not interested in local alignment. >> >> Regards, >> >> Khalil >> >> >> On 19 Jan 2011, at 16:07, Andreas Prlic wrote: >> >>> Hi Khalil, >>> >>> can you send your code snippet that you are running? I just re-ran >>> the cookbook example and it works for me. Also this behaves fine: >>> >>> ProteinSequence s1 = new >>> ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); >>> ProteinSequence s2 = new >>> ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); >>> >>> SubstitutionMatrix matrix = new >>> SimpleSubstitutionMatrix(); >>> SequencePair pair = >>> Alignments.getPairwiseAlignment(s1, s2, >>> PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix); >>> System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), >>> pair.getTarget().getAccession(), pair); >>> >>> System.out.println("Identicals:" + pair.getNumIdenticals()); >>> System.out.println("Similars:" + pair.getNumSimilars()); >>> >>> Andreas >>> >>> >>> >>> On Wed, Jan 19, 2011 at 2:39 AM, Khalil El Mazouari >>> wrote: >>>> Hi all, >>>> >>>> while doing PSA or MSA with default gop and gep values I obtained the following alignment!
>>>> >>>> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >>>> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R >>>> >>>> Expected PSA should be at least >>>> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >>>> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- >>>> >>>> this expected alignment was obtained with gop=1 and gep=100 >>>> >>>> I can't understand while the PSA algorithm with default values always adds many gaps at the end of alignment to end up with a S:R while it is obvious that with less gaps we could obtain better SequencePair with R:R? >>>> >>>> Finally, how to get a score for PSA, that reflects the number of identical, similar residues and gaps? >>>> >>>> Many thanks. >>>> >>>> Khalil >>>> >>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >> >> From khalil.elmazouari at gmail.com Fri Jan 21 13:48:24 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Fri, 21 Jan 2011 19:48:24 +0100 Subject: [Biojava-l] Write seq in Genbank format Message-ID: <285EC26C-C2FD-4AEE-8CD1-B870325F702D@gmail.com> Hi All, how to output annotated sequences in Genbank format? 
Thanks, Khalil From holland at eaglegenomics.com Fri Jan 21 14:01:26 2011 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 21 Jan 2011 19:01:26 +0000 Subject: [Biojava-l] Write seq in Genbank format In-Reply-To: <285EC26C-C2FD-4AEE-8CD1-B870325F702D@gmail.com> References: <285EC26C-C2FD-4AEE-8CD1-B870325F702D@gmail.com> Message-ID: I don't think BJ3 has a Genbank parser yet (unless I missed something?), so under the older version of BJ you can do it like this: RichSequence rs = ....; // this is your annotated sequence object RichSequence.IOTools.writeGenbank(System.out, rs, null); // this writes it to STDOUT in Genbank format cheers, Richard On 21 Jan 2011, at 18:48, Khalil El Mazouari wrote: > Hi All, > > how to output annotated sequences in Genbank format? > > Thanks, > > khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From jayunit100 at gmail.com Mon Jan 24 16:38:48 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Mon, 24 Jan 2011 16:38:48 -0500 Subject: [Biojava-l] advice on rmsd algorithms.... Message-ID: Hi guys. I noticed that I get a different RMSD using the BioJava alignCE method (CeMain) compared to MolMol, another popular molecular visualization tool. Any idea why? The RMSD appears to be 2.54 (BioJava CEMain) compared to 13.5 (MolMol). I'm using the gap size of -1 as in the BioJava examples..... -- Jay Vyas MMSB/UCHC From andreas at sdsc.edu Mon Jan 24 16:44:08 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 24 Jan 2011 13:44:08 -0800 Subject: [Biojava-l] advice on rmsd algorithms.... In-Reply-To: References: Message-ID: Hi Jay, probably the alignments are not the same.
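Andreas's point that the two programs probably produced different alignments matters because RMSD is only defined over a chosen set of residue pairs. The minimal JDK-only sketch below makes two assumptions. First, the two coordinate sets are already superposed; real tools such as CE also search for the optimal rotation and translation. Second, atoms are paired index by index. Under those assumptions, RMSD = sqrt((1/N) * sum of |a_i - b_i|^2). Changing the pairing (for example, shifting it by one residue) changes the value even for the same two structures.

```java
public class PairRmsd {
    // RMSD over already-superposed, index-paired coordinates: sqrt(mean squared distance).
    static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need the same non-zero number of paired atoms");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = a[i][k] - b[i][k];
                sum += d * d;
            }
        }
        return Math.sqrt(sum / a.length);
    }

    public static void main(String[] args) {
        // Two toy CA traces along the x axis, residues 3.8 A apart, offset by 1 A in y.
        double[][] s1 = {{0, 0, 0}, {3.8, 0, 0}, {7.6, 0, 0}, {11.4, 0, 0}};
        double[][] s2 = {{0, 1, 0}, {3.8, 1, 0}, {7.6, 1, 0}, {11.4, 1, 0}};
        // Pairing i <-> i: every pair is 1.0 apart.
        System.out.println(rmsd(s1, s2));
        // Pairing i <-> i+1 over the three overlapping residues.
        double[][] s1Part = {s1[0], s1[1], s1[2]};
        double[][] s2Shift = {s2[1], s2[2], s2[3]};
        System.out.println(rmsd(s1Part, s2Shift));
    }
}
```

The first pairing prints `1.0`. The shifted pairing prints roughly 3.93 (the square root of 3.8^2 + 1) for the same coordinates. This is why comparing RMSD values from two programs only makes sense once you know which residues each one aligned.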
Did you look at the results in 3D, and does the MolMol alignment make any sense? Andreas On Mon, Jan 24, 2011 at 1:38 PM, Jay Vyas wrote: > Hi guys . I noticed that I get a different RMSD using the biojava alignCE > method, (CeMain), as compared to MolMol, another popular molecular > visualization tool. > > Any idea why ? The rmsd appears to be 2.54 (biojava CEMain) as compared to > 13.5 (molmol). I'm using the gap size of -1 as in the biojava > examples..... > > > > -- > Jay Vyas > MMSB/UCHC > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From jayunit100 at gmail.com Mon Jan 24 17:39:37 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Mon, 24 Jan 2011 17:39:37 -0500 Subject: [Biojava-l] advice on rmsd algorithms.... In-Reply-To: References: Message-ID: You're right that the alignments are different.... Any advice on the difference between FatCat and CE? From andreas at sdsc.edu Mon Jan 24 17:48:53 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 24 Jan 2011 14:48:53 -0800 Subject: [Biojava-l] advice on rmsd algorithms.... In-Reply-To: References: Message-ID: Hm. For details I recommend reading the original papers describing the algorithms. Is there any aspect in particular that you are interested in? Andreas On Mon, Jan 24, 2011 at 2:39 PM, Jay Vyas wrote: > You're right that the alignments different.... Any advice on the difference > between FatCat and CE ?
> _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From khalil.elmazouari at gmail.com Tue Jan 25 06:16:58 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Tue, 25 Jan 2011 12:16:58 +0100 Subject: [Biojava-l] From biojava to biojava3 Message-ID: Hi All, I have legacy code that uses RichSequence and other VERY useful classes from biojavax. I am interested in using MSA and some classes from BioJava3. Is it possible to use RichSequence in BioJava3? How can I convert a RichSequence to a ProteinSequence? If not, how should I deal with legacy code in a BioJava3 context? Many thanks ;) Khalil From willishf at ufl.edu Tue Jan 25 07:20:20 2011 From: willishf at ufl.edu (Scooter Willis) Date: Tue, 25 Jan 2011 07:20:20 -0500 Subject: Re: [Biojava-l] From biojava to biojava3 In-Reply-To: References: Message-ID: Khalil, you can use both. You will need to write some code (easy: get the sequence as a string from the RichSequence and then create a new ProteinSequence) to go from RichSequence -> ProteinSequence -> RichSequence. Scooter On Tue, Jan 25, 2011 at 6:16 AM, Khalil El Mazouari wrote: > Hi All, > > I have a legacy code that uses RichSequence and other VERY useful classes from biojavax. I am interested in using MSA and some classes from Biojava3. > Is it possible to use RichSequence in Biojava3? How to Convert RichSequence to ProteinSequence? > If not, how to deal with legacy code in Biojava3 context?
> > many thanks;) > > Khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From martin.jones at ed.ac.uk Tue Jan 25 12:35:29 2011 From: martin.jones at ed.ac.uk (Martin Jones) Date: Tue, 25 Jan 2011 17:35:29 +0000 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes Message-ID: Hi, I am using some of the code from here: http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/ to parse nexus phylogenetic tree data. My nexus file is output from MrBayes: ------------------ snip----------------- #NEXUS [ID: 7166567671] begin trees; [Note: This tree contains information on the topology, branch lengths (if present), and the probability of the partition indicated by the branch.] tree con_50_majrule = (27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,(61985:0.217735,6657:0.275071)0.89:0.089314); [Note: This tree contains information only on the topology and branch lengths (mean of the posterior probability density).] tree con_50_majrule = (27563:0.194008,(6843:0.233188,6960:0.229043):0.117649,(61985:0.217735,6657:0.275071):0.089314); end; -------------------------- snip ------------- the file parses without any error, but when I get a list of vertices, the first one (27563) is missing: def tree = getTree(nexus, 'con 50 majrule') tree.vertexSet().each{ println it } output: 6843 p1 6960 61985 p3 6657 p2 ______________snip ___________ The above code is Groovy, but hopefully is self-explanatory. Is the first node being treated specially because it is the root? If so, can I get it using some other method?
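On Martin's question: in the Newick string above, 27563 does not look like the root. The tree is written as a basal trichotomy `(27563, (6843, 6960), (61985, 6657))`. The outermost parentheses are therefore an unnamed root node, and 27563 is one of its three leaf children. This is consistent with the printed vertex set containing three internal nodes (p1, p2, p3). The JDK-only check below is a hypothetical leaf-label scanner, not a full Newick parser: it handles no quoted labels, no comments and no whitespace beyond trimming. It lists all five leaves, which suggests the missing vertex comes from the parser rather than from intended root handling.

```java
import java.util.ArrayList;
import java.util.List;

public class NewickLeaves {
    // Leaf labels are names that directly follow '(' or ','. Names after ')' label internal
    // nodes (here MrBayes posterior probabilities), and ':' introduces a branch length.
    static List<String> leaves(String newick) {
        List<String> out = new ArrayList<>();
        char prev = 0;
        int i = 0;
        while (i < newick.length()) {
            char c = newick.charAt(i);
            if ("(),;".indexOf(c) >= 0) {
                prev = c;
                i++;
            } else if (c == ':') {
                i++;
                while (i < newick.length() && "(),;".indexOf(newick.charAt(i)) < 0) i++; // skip length
            } else {
                int start = i;
                while (i < newick.length() && "(),;:".indexOf(newick.charAt(i)) < 0) i++;
                String label = newick.substring(start, i).trim();
                if (!label.isEmpty() && (prev == '(' || prev == ',')) out.add(label);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String tree = "(27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,"
            + "(61985:0.217735,6657:0.275071)0.89:0.089314);";
        System.out.println(leaves(tree));
    }
}
```

This prints `[27563, 6843, 6960, 61985, 6657]`.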
Thanks, Martin From tiagoantao at gmail.com Tue Jan 25 12:42:29 2011 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 25 Jan 2011 17:42:29 +0000 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes In-Reply-To: References: Message-ID: I think the patch that I submitted to the old newick module of biojava never really was released by the biojava team. So that example will not work with biojava (unless you track the svn release with the patch and use that, arguably not a good solution). More info can be supplied by the biojava maintainers. I just made the patch and the example. On Tue, Jan 25, 2011 at 5:35 PM, Martin Jones wrote: > Hi, > > I am using some of the code from here: > > http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/ > > to parse nexus phylogenetic tree data. My nexus file is output from MrBayes: > > ------------------ snip----------------- > #NEXUS > > [ID: 7166567671] > begin trees; > [Note: This tree contains information on the topology, > branch lengths (if present), and the probability > of the partition indicated by the branch.] > tree con_50_majrule = > (27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,(61985:0.217735,6657:0.275071)0.89:0.089314); > > [Note: This tree contains information only on the topology > and branch lengths (mean of the posterior probability density).] > tree con_50_majrule = > (27563:0.194008,(6843:0.233188,6960:0.229043):0.117649,(61985:0.217735,6657:0.275071):0.089314); > end; > > -------------------------- snip ------------- > > the file parses without any error, but when I get a list of vertices, > the first one (27563) is missing: > > def tree = getTree(nexus, 'con 50 majrule') > tree.vertexSet().each{ > println it > } > > output: > > 6843 > p1 > 6960 > 61985 > p3 > 6657 > p2 > > ______________snip ___________ > > The above code is Groovy, but hopefully is self-explanatory.
Is the > first node being treated specially because it is the root? If so, can > I get it using some other method? > > Thanks, > > Martin > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From andreas at sdsc.edu Tue Jan 25 12:54:02 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 25 Jan 2011 09:54:02 -0800 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes In-Reply-To: References: Message-ID: Hi, Tiago, I found a mail from 2009/11/16 which says that your patch got committed. Is this the one you are referring to? Martin, this would mean that the patch is available through the biojava-legacy project (biojava v. 1.8) Andreas 2011/1/25 Tiago Antão : > I think the patch that I submitted to the old newick module of biojava > never really was released by the biojava team. So that example will > not work with biojava (unless you track the svn release with the patch > and use that, arguably not a good solution). > > More info can be supplied by the biojava maintainers. I just made the > patch and the example. > > On Tue, Jan 25, 2011 at 5:35 PM, Martin Jones wrote: >> Hi, >> >> I am using some of the code from here: >> >> http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/ >> >> to parse nexus phylogenetic tree data. My nexus file is output from MrBayes: >> >> ------------------ snip----------------- >> #NEXUS >> >> [ID: 7166567671] >> begin trees; >> [Note: This tree contains information on the topology, >> branch lengths (if present), and the probability >> of the partition indicated by the branch.] >> tree con_50_majrule = >> (27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,(61985:0.217735,6657:0.275071)0.89:0.089314); >> >>
[Note: This tree contains information only on the topology >> and branch lengths (mean of the posterior probability density).] >> tree con_50_majrule = >> (27563:0.194008,(6843:0.233188,6960:0.229043):0.117649,(61985:0.217735,6657:0.275071):0.089314); >> end; >> >> -------------------------- snip ------------- >> >> the file parses without any error, but when I get a list of vertices, >> the first one (27563) is missing: >> >> def tree = getTree(nexus, 'con 50 majrule') >> tree.vertexSet().each{ >> println it >> } >> >> output: >> >> 6843 >> p1 >> 6960 >> 61985 >> p3 >> 6657 >> p2 >> >> ______________snip ___________ >> >> The above code is Groovy, but hopefully is self-explanatory. Is the >> first node being treated specially because it is the root? If so, can >> I get it using some other method? >> >> Thanks, >> >> Martin >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- > "If you want to get laid, go to college. If you want an education, go > to the library." - Frank Zappa > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From tiagoantao at gmail.com Tue Jan 25 12:57:05 2011 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 25 Jan 2011 17:57:05 +0000 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes In-Reply-To: References: Message-ID: 2011/1/25 Andreas Prlic : > Tiago, I found a mail from 2009/11/16 which says that your patch got > committed. Is this the one you are referring to? Martin, this would > mean that the patch is available through the biojava-legacy project > (biojava v. 1.8) I do not follow the biojava releases closely, but I think (I might be wrong) that it never got released.
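As a cross-check of what a complete parse of the MrBayes consensus tree quoted above should contain, here is a minimal recursive-descent Newick parser sketch in plain Java. It is not the BioJava newick/NEXUS code, and the names NewickSketch, leaves and support are hypothetical. It assumes the tree string contains no whitespace, quoted labels or [comments]. The first taxon (27563) is kept as an ordinary child of the root, and the label written after each closing parenthesis (0.98, 0.89), which MrBayes uses for the partition probability, is kept separately from the branch length.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal recursive-descent Newick parser (illustrative sketch, not BioJava code).
 * Assumes no whitespace, quoted labels or [comments] inside the tree string.
 */
public class NewickSketch {

    /** A tree node: a taxon name on leaves, an optional support value on internal nodes. */
    public static final class Node {
        public final List<Node> children = new ArrayList<>();
        public String label = "";
        public Double branchLength; // null when no ":length" follows the node

        public boolean isLeaf() { return children.isEmpty(); }
    }

    private final String s;
    private int pos = 0;

    private NewickSketch(String s) { this.s = s.trim(); }

    /** Parses one tree terminated by ';'. */
    public static Node parse(String newick) {
        NewickSketch p = new NewickSketch(newick);
        Node root = p.parseNode();
        p.expect(';');
        return root;
    }

    // node := [ '(' node (',' node)* ')' ] label [ ':' length ]
    private Node parseNode() {
        Node node = new Node();
        if (peek() == '(') {
            pos++;
            node.children.add(parseNode()); // the first child is kept like any other
            while (peek() == ',') {
                pos++;
                node.children.add(parseNode());
            }
            expect(')');
        }
        node.label = readToken();
        if (peek() == ':') {
            pos++;
            node.branchLength = Double.parseDouble(readToken());
        }
        return node;
    }

    /** Reads characters up to the next Newick delimiter. */
    private String readToken() {
        int start = pos;
        while (pos < s.length() && "(),:;".indexOf(s.charAt(pos)) < 0) {
            pos++;
        }
        return s.substring(start, pos);
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    /** Leaf labels in file order (depth-first). */
    public static List<String> leaves(Node n) {
        List<String> out = new ArrayList<>();
        if (n.isLeaf()) out.add(n.label);
        for (Node c : n.children) out.addAll(leaves(c));
        return out;
    }

    /** Numeric labels of internal nodes, e.g. MrBayes partition probabilities. */
    public static List<Double> support(Node n) {
        List<Double> out = new ArrayList<>();
        if (!n.isLeaf() && !n.label.isEmpty()) out.add(Double.parseDouble(n.label));
        for (Node c : n.children) out.addAll(support(c));
        return out;
    }

    public static void main(String[] args) {
        String tree = "(27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,"
                + "(61985:0.217735,6657:0.275071)0.89:0.089314);";
        Node root = parse(tree);
        System.out.println("leaves  = " + leaves(root));  // [27563, 6843, 6960, 61985, 6657]
        System.out.println("support = " + support(root)); // [0.98, 0.89]
    }
}
```

Storing the internal-node label in its own field, rather than folding it into the edge weight, is what keeps the support values available alongside the branch lengths.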
From martin.jones at ed.ac.uk Wed Jan 26 05:36:56 2011 From: martin.jones at ed.ac.uk (Martin Jones) Date: Wed, 26 Jan 2011 10:36:56 +0000 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes In-Reply-To: References: Message-ID: Thanks for the tip, switching to biojava 1.8 fixed the problem. 2011/1/25 Tiago Antão : > 2011/1/25 Andreas Prlic : >> Tiago, I found a mail from 2009/11/16 which says that your patch got >> committed. Is this the one you are referring to? Martin, this would >> mean that the patch is available through the biojava-legacy project >> (biojava v. 1.8) > > > I do not follow the biojava releases closely, but I think (I might be > wrong) that it never got released. > > From leapingfrog at yahoo.com Sun Jan 30 02:04:32 2011 From: leapingfrog at yahoo.com (David Scott) Date: Sat, 29 Jan 2011 23:04:32 -0800 (PST) Subject: [Biojava-l] Attempt at Cookbook Message-ID: <384530.37939.qm@web113212.mail.gq1.yahoo.com> Copied code from Alignment section of biojava3 Cookbook on "how can I profile the time and memory requirements of a Multiple Sequence Alignment?" Created a series of FASTA proteins (sod) from NIH. It compiles fine on a Windows machine and I print out the sequences to make sure they look like they are read okay. I added a snippet of code to see it read okay: Iterator it = list.iterator(); System.out.println(); while ( it.hasNext() ) { System.out.println( it.next() ); } System.out.println(); The following error occurs when I run it: C:\biojava1\w11>java org/biojava3/alignment/CookbookMSAProfiler sod.fasta Loading sequences from sod.fasta...
3 HINHSIFWTNLCKDGGEPSGKLLQAINRDFGSLQGLQARLNAIAIAVQGSGWGWLGYNKIDKRLEVACCPNQDPLEPTTG LVPLFGIDVWEHAYYLQYK HINHSIFWTNLCKDGGEPSGKLLQAINRDFGSLQVLQARLNAIAIAVQGSGWGWLGYNKIDKRLEVACCPNQDPLEPTTG LVPLFGIDVWEHAYYLQYK KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK sequences in 39 ms using 15872 kB Stage 1: pairwise similarity calculation... Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.biojava3.alignment.Alignments.getAllPairsScorers(Ljava/util/List;Lorg/biojava3/alignment/Alignments$PairwiseSequenceScorerType;Lorg/biojava3/alignment/template/GapPenalty;Lorg/biojava3/alignment/template/SubstitutionMatrix;)Ljava/util/List; from class org.biojava3.alignment.CookbookMSAProfiler at org.biojava3.alignment.CookbookMSAProfiler.main(CookbookMSAProfiler.java:86) Sincerely, David Scott From andreas at sdsc.edu Mon Jan 31 01:05:42 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 30 Jan 2011 22:05:42 -0800 Subject: [Biojava-l] Attempt at Cookbook In-Reply-To: <384530.37939.qm@web113212.mail.gq1.yahoo.com> References: <384530.37939.qm@web113212.mail.gq1.yahoo.com> Message-ID: Hi David, Some of the access methods in Alignments.java were not public... now fixed in SVN. Andreas On Sat, Jan 29, 2011 at 11:04 PM, David Scott wrote: > Copied code from Alignment section of biojava3 Cookbook on > "how can I profile the time and memory requirements of a Multiple Sequence > Alignment?" > > Created a series of FASTA proteins (sod) from NIH. > > It compiles fine on a Windows machine and I print out the sequences to make sure > they look like they are read okay. > > I added a snippet of code to see it read okay: > > Iterator it = list.iterator(); > System.out.println(); > while ( it.hasNext() ) { > System.out.println( it.next() ); > } >
System.out.println(); > > The following error occurs when I run it: > > C:\biojava1\w11>java org/biojava3/alignment/CookbookMSAProfiler sod.fasta > Loading sequences from sod.fasta... 3 > HINHSIFWTNLCKDGGEPSGKLLQAINRDFGSLQGLQARLNAIAIAVQGSGWGWLGYNKIDKRLEVACCPNQDPLEPTTG > LVPLFGIDVWEHAYYLQYK > HINHSIFWTNLCKDGGEPSGKLLQAINRDFGSLQVLQARLNAIAIAVQGSGWGWLGYNKIDKRLEVACCPNQDPLEPTTG > LVPLFGIDVWEHAYYLQYK > KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN > LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV > WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK > > sequences in 39 ms using 15872 kB > > Stage 1: pairwise similarity calculation... Exception in thread "main" > java.lang.IllegalAccessError: tried to access method > org.biojava3.alignment.Alignments.getAllPairsScorers(Ljava/util/List;Lorg/biojava3/alignment/Alignments$PairwiseSequenceScorerType;Lorg/biojava3/alignment/template/GapPenalty;Lorg/biojava3/alignment/template/SubstitutionMatrix;)Ljava/util/List; > from class org.biojava3.alignment.CookbookMSAProfiler
at > org.biojava3.alignment.CookbookMSAProfiler.main(CookbookMSAProfiler.java:86) > > Sincerely, > > David Scott > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From martin.jones at ed.ac.uk Mon Jan 31 05:40:51 2011 From: martin.jones at ed.ac.uk (Martin Jones) Date: Mon, 31 Jan 2011 10:40:51 +0000 Subject: [Biojava-l] It is possible to parse phylogenetic trees with bootstrap/posterior probabilities Message-ID: Hi all, I am using BioJava to parse MrBayes NEXUS files containing consensus trees, using code that looks like this: NexusFileBuilder builder = new NexusFileBuilder(); NexusFileFormat.parseFile(builder, new File('2fin3-4.nex.con')); NexusFile nexus = builder.getNexusFile(); def tree = getTree(nexus, 'con 50 majrule') The parser constructs a weighted graph where the edge weights are the branch lengths. It is possible to parse the tree so that posterior probabilities are included in the resulting object? Martin
From andreas at sdsc.edu Wed Jan 5 15:46:47 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 5 Jan 2011 07:46:47 -0800 Subject: [Biojava-l] how to access biojava source control on svn In-Reply-To: References: Message-ID: Hi Sj, Better to post such questions to the public lists, otherwise they might get lost... SVN access instructions are available from http://www.biojava.org/wiki/CVS_to_SVN_Migration Andreas 2011/1/5 Sj Pookpan : > Dear biojava master > > I'm java developer and new to biojava, I want to know that how can get > access to biojava source code from SVN (by eclipse SVN). Do I need to > register to biojava to get user account and password (I try to find on > biojava site but no where to register). Would you please give me the way to > startup this. > > SJP. > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From mavcunha at gmail.com Wed Jan 5 17:20:21 2011 From: mavcunha at gmail.com (Marco Valtas) Date: Wed, 5 Jan 2011 15:20:21 -0200 Subject: [Biojava-l] how to access biojava source control on svn In-Reply-To: References: Message-ID: <7C328935-A61C-4B8A-BD2D-126597598F7B@gmail.com> You might check the Github mirror too, code.open-bio.org seem offline sometimes. Cheers Marco. Sent from my iPhone On 05/01/2011, at 13:46, Andreas Prlic wrote: > Hi Sj, > > Better to post such questions to the public lists, otherwise they > might get lost...
> > SVN access instructions are available from > http://www.biojava.org/wiki/CVS_to_SVN_Migration > > Andreas > > > > > 2011/1/5 Sj Pookpan : >> Dear biojava master >> >> I'm java developer and new to biojava, I want to know that how can get >> access to biojava source code from SVN (by eclipse SVN). Do I need to >> register to biojava to get user account and password (I try to find on >> biojava site but no where to register). Would you please give me the way to >> startup this. >> >> SJP. >> > > > > -- > ----------------------------------------------------------------------- > Dr. Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas at sdsc.edu Thu Jan 6 04:26:59 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 5 Jan 2011 20:26:59 -0800 Subject: [Biojava-l] how to access biojava source control on svn In-Reply-To: References: <7C328935-A61C-4B8A-BD2D-126597598F7B@gmail.com> Message-ID: try svn co http://svn.github.com/biojava/biojava.git ./biojava A On Wed, Jan 5, 2011 at 8:21 PM, Sj Pookpan wrote: > Thank you for your answer > I try the account mention on > 'http://www.biojava.org/wiki/CVS_to_SVN_Migration' > > - Anonymous SVN access or > - Developer SVN access (ssh account required) or > - BioJava SNAPSHOT builds (anonymous, Maven required) > > on eclipse SVN connection user interface, the application always ask for > 'password' to access 'svn', how do I get it pass? > > However, I already subscribe my mail to biojava mailing list (yesterday > 05/01/2011) and waiting for mail confirmation, but until now nothing send to > me what the next action for me. > > Regards, > > SJP.
> > On 1/6/2011 12:20 AM, Marco Valtas wrote: > > You might check the Github mirror too, code.open-bio.org seem offline > sometimes. > > Cheers Marco. > > Sent from my iPhone > > On 05/01/2011, at 13:46, Andreas Prlic wrote: > > Hi Sj, > > Better to post such questions to the public lists, otherwise they > might get lost... > > SVN access instructions are available from > http://www.biojava.org/wiki/CVS_to_SVN_Migration > > Andreas > > > > > 2011/1/5 Sj Pookpan : > > Dear biojava master > > I'm java developer and new to biojava, I want to know that how can get > access to biojava source code from SVN (by eclipse SVN). Do I need to > register to biojava to get user account and password (I try to find on > biojava site but no where to register). Would you please give me the way to > startup this. > > SJP. > > > From darnells at dnastar.com Thu Jan 6 22:56:42 2011 From: darnells at dnastar.com (Steve Darnell) Date: Thu, 6 Jan 2011 16:56:42 -0600 Subject: [Biojava-l] how to access biojava source control on svn In-Reply-To: References: <7C328935-A61C-4B8A-BD2D-126597598F7B@gmail.com> Message-ID: In the past, I have had problems using the Subclipse plug-in in Eclipse to checkout BioJava from the svn service at github on both Windows and Mac. This is how I worked around it... http://lists.open-bio.org/pipermail/biojava-dev/2010-August/004356.html The Subclipse svn plug-in from Tigris would fail during checkout with an exception stating "RA layer request failed svn: REPORT of '[...]' 200 OK" for both Windows 7 and OSX 10.6. For Windows, there are reports of the Windows indexing service, antivirus scanners, or the Subclipse JavaHL adapter causing svn problems. Resolving these issues did not help. In the end, I used a git client on Windows to clone the biojava project from github and then imported it into Eclipse as an existing Maven project with the m2eclipse plug-in. -- I did not revisit my search for an integrated Eclipse solution. 
Now, I mainly use the jars made by the maven auto-build system. ~Steve -----Original Message----- From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic Sent: Wednesday, January 05, 2011 10:27 PM To: Sj Pookpan Cc: biojava-l at biojava.org Subject: Re: [Biojava-l] how to access biojava source control on svn try svn co http://svn.github.com/biojava/biojava.git ./biojava A On Wed, Jan 5, 2011 at 8:21 PM, Sj Pookpan wrote: > Thank you for your answer > I try the account mention on > 'http://www.biojava.org/wiki/CVS_to_SVN_Migration' > > - Anonymous SVN access or > - Developer SVN access (ssh account required) or > - BioJava SNAPSHOT builds (anonymous, Maven required) > > on eclipse SVN connection user interface, the application always ask for > 'password' to access 'svn', how do I get it pass? > > However, I already subscribe my mail to biojava mailing list (yesterday > 05/01/2011) and waiting for mail confirmation, but until now nothing send to > me what the next action for me. > > Regards, > > SJP. > > On 1/6/2011 12:20 AM, Marco Valtas wrote: > > You might check the Github mirror too, code.open-bio.org seem offline > sometimes. > > Cheers Marco. > > Sent from my iPhone > > On 05/01/2011, at 13:46, Andreas Prlic wrote: > > Hi Sj, > > Better to post such questions to the public lists, otherwise they > might get lost... > > SVN access instructions are available from > http://www.biojava.org/wiki/CVS_to_SVN_Migration > > Andreas > > > > > 2011/1/5 Sj Pookpan : > > Dear biojava master > > I'm java developer and new to biojava, I want to know that how can get > access to biojava source code from SVN (by eclipse SVN). Do I need to > register to biojava to get user account and password (I try to find on > biojava site but no where to register). Would you please give me the way to > startup this. > > SJP.
> > > _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From hdilley at catbio.com Fri Jan 7 00:09:29 2011 From: hdilley at catbio.com (Hara Dilley) Date: Thu, 6 Jan 2011 16:09:29 -0800 Subject: [Biojava-l] biojava3 getting the features from alignedsequence Message-ID: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> Hi, I would like to align a set of sequences against a scaffold and get the list of the modifications for each aligned sequence. I am using biojava3 I have tried to create a profile thinking that I can get the AlignedSequences from it but that it appears to be null. Here is part of my code: Profile profile = Alignments.getMultipleSequenceAlignment(lst); Profile.getAlignedSequence(0); Can someone please point to an example for this or to the classes I have to use. Thank you, Hara From science.translator at gmail.com Fri Jan 7 00:45:13 2011 From: science.translator at gmail.com (Matthew Busse) Date: Thu, 6 Jan 2011 16:45:13 -0800 Subject: [Biojava-l] Checking out code Message-ID: Hello all, I've tried several times over the past couple days to check out code from both of the sites, and I either receive a "Connection timed out" error from the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( http://svn.github.com)" error from the github site. I'm using the maven and subclipse plug-ins for eclipse, and this is the first time Ive tried to use an SVN repository, so it's possible I'm doing something wrong, but these errors sound like it's a connection problem. Thanks! Matthew From andreas at sdsc.edu Fri Jan 7 01:04:43 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 6 Jan 2011 17:04:43 -0800 Subject: [Biojava-l] Checking out code In-Reply-To: References: Message-ID: I just tried to reproduce a fresh checkout from github and I can confirm, it does not work correctly any more. ... investigating what is going on... 
A On Thu, Jan 6, 2011 at 4:45 PM, Matthew Busse wrote: > Hello all, > > I've tried several times over the past couple days to check out code from > both of the sites, and I either receive a "Connection timed out" error from > the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( > http://svn.github.com)" error from the github site. > > I'm using the maven and subclipse plug-ins for eclipse, and this is the > first time Ive tried to use an SVN repository, so it's possible I'm doing > something wrong, but these errors sound like it's a connection problem. > > Thanks! > > Matthew > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas at sdsc.edu Fri Jan 7 01:06:46 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 6 Jan 2011 17:06:46 -0800 Subject: [Biojava-l] Checking out code In-Reply-To: References: Message-ID: and to follow up, I believe we are seeing this problem: http://support.github.com/discussions/repos/5188-fresh-checkout-of-svn-repo-shows-dozens-of-empty-files-where-they-shouldnt-be On Thu, Jan 6, 2011 at 5:04 PM, Andreas Prlic wrote: > I just tried to reproduce a fresh checkout from github and I can > confirm, it does not work correctly any more. > > ... investigating what is going on... > > A > > > > On Thu, Jan 6, 2011 at 4:45 PM, Matthew Busse > wrote: >> Hello all, >> >> I've tried several times over the past couple days to check out code from >> both of the sites, and I either receive a "Connection timed out" error from >> the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( >> http://svn.github.com)" error from the github site. >> >> I'm using the maven and subclipse plug-ins for eclipse, and this is the >> first time Ive tried to use an SVN repository, so it's possible I'm doing >> something wrong, but these errors sound like it's a connection problem. >> >> Thanks!
>> >> Matthew >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Fri Jan 7 02:24:48 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 6 Jan 2011 18:24:48 -0800 Subject: [Biojava-l] github svn access issues Message-ID: Hi, Seems github is currently having issues with their svn interface. As such if you want to get hold of a copy of the latest source for the moment there are the following two options: - use git rather than svn to fetch the code from github - use Maven and get the biojava3.0.1-SNAPSHOT builds from the http://www.biojava.org/download/maven/ repository Andreas From willishf at ufl.edu Fri Jan 7 02:42:50 2011 From: willishf at ufl.edu (Scooter Willis) Date: Thu, 6 Jan 2011 21:42:50 -0500 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> Message-ID: Hara Can you provide more of the code you are using that shows how you are loading the initial sequences. Thanks Scooter On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: > Hi, > > I would like to align a set of sequences against a scaffold and get the > list of the modifications for each aligned sequence. > I am using biojava3 > I have tried to create a profile thinking that I can get the > AlignedSequences from it but that it appears to be null.
> Here is part of my code: > > Profile profile = > Alignments.getMultipleSequenceAlignment(lst); > Profile.getAlignedSequence(0); > > Can someone please point to an example for this or to the classes I have to > use. > Thank you, > Hara > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From hdilley at catbio.com Fri Jan 7 17:20:13 2011 From: hdilley at catbio.com (Hara Dilley) Date: Fri, 7 Jan 2011 09:20:13 -0800 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> Message-ID: <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> Thanks Scooter, Below is the code of how I populate lst. Of course my real sequences are different, but for this example it doesn't matter. List lst = new ArrayList(); ProteinSequence s1 = new ProteinSequence("SHALG"); ProteinSequence s2 = new ProteinSequence("SWQVLG"); lst.add(s1); lst.add(s2); From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of Scooter Willis Sent: Thursday, January 06, 2011 6:43 PM To: Hara Dilley Cc: biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] biojava3 getting the features from alignedsequence Hara Can you provide more of the code you are using that shows how you are loading the initial sequences. Thanks Scooter On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley > wrote: Hi, I would like to align a set of sequences against a scaffold and get the list of the modifications for each aligned sequence. I am using biojava3 I have tried to create a profile thinking that I can get the AlignedSequences from it but that it appears to be null. Here is part of my code: Profile profile = Alignments.getMultipleSequenceAlignment(lst); Profile.getAlignedSequence(0); Can someone please point to an example for this or to the classes I have to use. 
Thank you, Hara _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From willishf at ufl.edu Fri Jan 7 19:51:40 2011 From: willishf at ufl.edu (Scooter Willis) Date: Fri, 7 Jan 2011 14:51:40 -0500 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> Message-ID: Hara Figured out the problem. Welcome to the world of biology, where indexes start at 1, while in computer science they start at 0. If you use 1 as your first index it will work. In the core module we tried to make that clear by using BioIndex in the method name. I will see what I can do about getting that added/changed in the alignment module. Thanks Scooter On Fri, Jan 7, 2011 at 12:20 PM, Hara Dilley wrote: > Thanks Scooter, > > Below is the code of how I populate lst. Of course my real sequences are > different, but for this example it doesn't matter. > > > > List lst = new ArrayList(); > > ProteinSequence s1 = new ProteinSequence("SHALG"); > > ProteinSequence s2 = new ProteinSequence("SWQVLG"); > > lst.add(s1); > > lst.add(s2); > > > > > > > > From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of Scooter > Willis > Sent: Thursday, January 06, 2011 6:43 PM > To: Hara Dilley > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] biojava3 getting the features from > alignedsequence > > > > Hara > > > > Can you provide more of the code you are using that shows how you are > loading the initial sequences. > > > > Thanks > > > Scooter > > > > On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: > > Hi, > > I would like to align a set of sequences against a scaffold and get the > list of the modifications for each aligned sequence.
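Hara's goal, listing how each aligned sequence differs from the scaffold, can be sketched independently of the BioJava API once an alignment is available as two gapped strings of equal length. The sketch below assumes '-' marks a gap and reports positions in 1-based scaffold coordinates, following the convention Scooter describes. The class name AlignmentDiffSketch, the notation (sub H2Q, ins W after 1, del C2) and the example alignment are hypothetical illustrations, not BioJava output; how to obtain the gapped strings from a Profile is left open here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (not BioJava code): compare an aligned query with an aligned scaffold,
 * both the same length with '-' for gaps, and list substitutions, insertions
 * and deletions using 1-based scaffold positions.
 */
public class AlignmentDiffSketch {

    public static List<String> modifications(String alignedScaffold, String alignedQuery) {
        if (alignedScaffold.length() != alignedQuery.length()) {
            throw new IllegalArgumentException("Aligned strings must have equal length");
        }
        List<String> mods = new ArrayList<>();
        int scaffoldPos = 0; // 1-based position of the last scaffold residue seen
        for (int col = 0; col < alignedScaffold.length(); col++) {
            char ref = alignedScaffold.charAt(col);
            char qry = alignedQuery.charAt(col);
            if (ref != '-') {
                scaffoldPos++;
            }
            if (ref == '-' && qry != '-') {
                // query residue with no scaffold counterpart
                mods.add("ins " + qry + " after " + scaffoldPos);
            } else if (ref != '-' && qry == '-') {
                // scaffold residue missing from the query
                mods.add("del " + ref + scaffoldPos);
            } else if (ref != qry) {
                // both residues present but different (gap/gap columns fall through)
                mods.add("sub " + ref + scaffoldPos + qry);
            }
        }
        return mods;
    }

    public static void main(String[] args) {
        // Hypothetical alignment of Hara's example sequences, written by hand.
        System.out.println(modifications("S-HALG", "SWQVLG"));
        // prints [ins W after 1, sub H2Q, sub A3V]
    }
}
```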
> I am using biojava3 > I have tried to create a profile thinking that I can get the > AlignedSequences from it but that it appears to be null. > Here is part of my code: > > Profile profile = > Alignments.getMultipleSequenceAlignment(lst); > Profile.getAlignedSequence(0); > > Can someone please point to an example for this or to the classes I have to > use. > Thank you, > Hara > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > From mavcunha at gmail.com Fri Jan 7 22:02:46 2011 From: mavcunha at gmail.com (Marco Valtas) Date: Fri, 7 Jan 2011 20:02:46 -0200 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> Message-ID: I think in such cases we could throw an exception. What I mean is that when sequence positions are counted from 1 and someone tries to fetch position 0, an exception could be thrown saying that the index starts at 1. For arrays and lists that do not model a sequence, it would be better to keep the computer science convention. Any thoughts? Marco Valtas Developer at ThoughtWorks Sent from my iPhone On 07/01/2011, at 17:51, Scooter Willis wrote: > Hara > > Figured out the problem. Welcome to the world of biology indexes start at 1 > and computer science starts at 0. > > If you use 1 as your first index it will work. In the core module we tried > to make that clear by using BioIndex in the method name. I will see what I > can do about getting that added/changed in the alignment module. > > Thanks > > Scooter > > On Fri, Jan 7, 2011 at 12:20 PM, Hara Dilley wrote: > >> Thanks Scooter, >> >> Below is the code of how I populate lst. Of course my real sequences are >> different, but for this example it doesn't matter.
>> >> >> >> List lst = new ArrayList(); >> >> ProteinSequence s1 = new ProteinSequence("SHALG"); >> >> ProteinSequence s2 = new ProteinSequence("SWQVLG"); >> >> lst.add(s1); >> >> lst.add(s2); >> >> >> >> >> >> >> >> From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of Scooter >> Willis >> Sent: Thursday, January 06, 2011 6:43 PM >> To: Hara Dilley >> Cc: biojava-l at lists.open-bio.org >> Subject: Re: [Biojava-l] biojava3 getting the features from >> alignedsequence >> >> >> >> Hara >> >> >> >> Can you provide more of the code you are using that shows how you are >> loading the initial sequences. >> >> >> >> Thanks >> >> >> Scooter >> >> >> >> On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: >> >> Hi, >> >> I would like to align a set of sequences against a scaffold and get the >> list of the modifications for each aligned sequence. >> I am using biojava3 >> I have tried to create a profile thinking that I can get the >> AlignedSequences from it but that it appears to be null. >> Here is part of my code: >> >> Profile profile = >> Alignments.getMultipleSequenceAlignment(lst); >> Profile.getAlignedSequence(0); >> >> Can someone please point to an example for this or to the classes I have to >> use.
>> Thank you, >> Hara >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From hdilley at catbio.com Fri Jan 7 22:04:58 2011 From: hdilley at catbio.com (Hara Dilley) Date: Fri, 7 Jan 2011 14:04:58 -0800 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> Message-ID: <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> That would be very helpful! -----Original Message----- From: Marco Valtas [mailto:mavcunha at gmail.com] Sent: Friday, January 07, 2011 2:03 PM To: Scooter Willis Cc: Hara Dilley; biojava-l at lists.open-bio.org Subject: Re: [Biojava-l] biojava3 getting the features from alignedsequence I think in such cases we could throw an exception. what I mean is that having sequences positions counted from 1 and someone tries to fetch a position from 0 a exception could be thrown telling that such index starts at 1. For arrays and lists that not model a sequence will be better keep the computer science convention. Any thoughts? Marco Valtas Developer at ThoughtWorks Sent from my iPhone On 07/01/2011, at 17:51, Scooter Willis wrote: > Hara > > Figured out the problem. Welcome to the world of biology indexes start at 1 > and computer science starts at 0. > > If you use 1 as your first index it will work. In the core module we tried > to make that clear by using BioIndex in the method name. I will see what I > can do about getting that added/changed in the alignment module. 
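A minimal sketch of the convention being discussed (a clear exception for index 0, plus a "BioIndex" name for 1-based access), written as a small stand-alone class rather than against BioJava itself. The class name IndexedAlignment is hypothetical, and the method names getAlignedSequence and getAlignedSequenceBioIndex follow the naming suggested in this thread; they are not a description of the released BioJava API. The 0-based accessor behaves like java.util.List, while the BioIndex accessor counts from 1 and rejects 0 with a message that explains the convention.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical container illustrating 0-based vs 1-based ("BioIndex") accessors. */
public class IndexedAlignment {

    private final List<String> rows = new ArrayList<>();

    public void add(String alignedRow) {
        rows.add(alignedRow);
    }

    /** 0-based, like java.util.List: the first row is at index 0. */
    public String getAlignedSequence(int index) {
        return rows.get(index); // List.get throws IndexOutOfBoundsException if out of range
    }

    /** 1-based ("BioIndex"): the first row is at index 1; 0 is rejected explicitly. */
    public String getAlignedSequenceBioIndex(int bioIndex) {
        if (bioIndex < 1 || bioIndex > rows.size()) {
            throw new IndexOutOfBoundsException("BioIndex " + bioIndex
                + " is out of range 1.." + rows.size()
                + " (biological indexes start at 1; use getAlignedSequence for 0-based access)");
        }
        return rows.get(bioIndex - 1);
    }

    public static void main(String[] args) {
        IndexedAlignment aln = new IndexedAlignment();
        aln.add("first");  // placeholder rows, not real aligner output
        aln.add("second");
        System.out.println(aln.getAlignedSequence(0));         // first
        System.out.println(aln.getAlignedSequenceBioIndex(1)); // first
        try {
            aln.getAlignedSequenceBioIndex(0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this split, a caller who passes 0 to the 1-based method gets an immediate, descriptive failure. A caller who really wants 0-based access has a separately named method for it.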
> > Thanks > > Scooter > > On Fri, Jan 7, 2011 at 12:20 PM, Hara Dilley wrote: > >> Thanks Scooter, >> >> Below is the code of how I populate lst. Of course my real sequences are >> different, but for this example it doesn't matter. >> >> >> >> List lst = new ArrayList(); >> >> ProteinSequence s1 = new ProteinSequence("SHALG"); >> >> ProteinSequence s2 = new ProteinSequence("SWQVLG"); >> >> lst.add(s1); >> >> lst.add(s2); >> >> >> >> >> >> >> >> From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of Scooter >> Willis >> Sent: Thursday, January 06, 2011 6:43 PM >> To: Hara Dilley >> Cc: biojava-l at lists.open-bio.org >> Subject: Re: [Biojava-l] biojava3 getting the features from >> alignedsequence >> >> >> >> Hara >> >> >> >> Can you provide more of the code you are using that shows how you are >> loading the initial sequences. >> >> >> >> Thanks >> >> >> Scooter >> >> >> >> On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: >> >> Hi, >> >> I would like to align a set of sequences against a scaffold and get the >> list of the modifications for each aligned sequence. >> I am using biojava3 >> I have tried to create a profile thinking that I can get the >> AlignedSequences from it but that it appears to be null. >> Here is part of my code: >> >> Profile profile = >> Alignments.getMultipleSequenceAlignment(lst); >> Profile.getAlignedSequence(0); >> >> Can someone please point to an example for this or to the classes I have to >> use.
>> Thank you, >> Hara >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas at sdsc.edu Fri Jan 7 22:25:28 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 7 Jan 2011 14:25:28 -0800 Subject: [Biojava-l] github svn access issues ... resolved Message-ID: Hi, Quick follow up: it seems that github pushed out an update and this works again: svn co http://svn.github.com/biojava/biojava.git ./biojava Andreas From science.translator at gmail.com Sat Jan 8 00:58:12 2011 From: science.translator at gmail.com (Matthew Busse) Date: Fri, 7 Jan 2011 16:58:12 -0800 Subject: [Biojava-l] Biojava-l Digest, Vol 96, Issue 5 In-Reply-To: References: Message-ID: Thanks for your input, Steve. I was able to clone the source code from github using a git client (SmartGit 2.02). As a note to others who may be attempting the same: it took a really long time, probably about an hour, to clone all the files, so just let it keep running. Cheers, Matthew > Message: 1 > Date: Thu, 6 Jan 2011 16:56:42 -0600 > From: "Steve Darnell" > Subject: Re: [Biojava-l] how to access biojava source control on svn > To: "Sj Pookpan" > Cc: biojava-l at biojava.org > Message-ID: > Content-Type: text/plain; charset="iso-8859-1" > > In the past, I have had problems using the Subclipse plug-in in Eclipse to > checkout BioJava from the svn service at github on both Windows and Mac. > This is how I worked around it... > > http://lists.open-bio.org/pipermail/biojava-dev/2010-August/004356.html > > The Subclipse svn plug-in from Tigris would fail during checkout with an > exception stating "RA layer request failed svn: REPORT of '[...]' 200 > OK" for both Windows 7 and OSX 10.6.
For Windows, there are reports of > the Windows indexing service, antivirus scanners, or the Subclipse > JavaHL adapter causing svn problems. Resolving these issues did not > help. > > In the end, I used a git client on Windows to clone the biojava project > from github and then imported it into Eclipse as an existing Maven > project with the m2eclipse plug-in. > > -- > > I did not revisit my search for an integrated Eclipse solution. Now, I > mainly use the jars made by the maven auto-build system. > > ~Steve > > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto: > biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > Sent: Wednesday, January 05, 2011 10:27 PM > To: Sj Pookpan > Cc: biojava-l at biojava.org > Subject: Re: [Biojava-l] how to access biojava source control on svn > > try > > svn co http://svn.github.com/biojava/biojava.git ./biojava > > > A > > On Wed, Jan 5, 2011 at 8:21 PM, Sj Pookpan wrote: > > Thank you for your answer > > I try the account mention on > > 'http://www.biojava.org/wiki/CVS_to_SVN_Migration' > > > > - Anonymous SVN access or > > - Developer SVN access (ssh account required) or > > - BioJava SNAPSHOT builds (anonymous, Maven required) > > > > on eclipse SVN connection user interface, the application always ask for > > 'password' to access 'svn', how do I get it pass? > > > > How ever, I already subscibe my mail to biojava mailing list (yesterday > > 05/01/2011) and waiting for mail confirmation, but until now nothing send > to > > me what the next action for me. > > > > Regards, > > > > SJP. > > > > On 1/6/2011 12:20 AM, Marco Valtas wrote: > > > > You might check the Github mirror too, code.open-bio.org seem offline > > sometimes. > > > > Cheers Marco. > > > > Sent from my iPhone > > > > On 05/01/2011, at 13:46, Andreas Prlic wrote: > > > > Hi Sj, > > > > Better to post such questions to the public lists, otherwise they > > might get lost...
> > > > SVN access instructions are available from > > http://www.biojava.org/wiki/CVS_to_SVN_Migration > > > > Andreas > > > > > > > > > > 2011/1/5 Sj Pookpan : > > > > Dear biojava master > > > > I'm java developer and new to biojava, I want to know that how can get > > access to biojava source code from SVN (by eclipse SVN). Do I need to > > register to biojava to get user account and password (I try to find on > > biojava site but no where to register). Would you please give me the way > to > > startup this. > > > > SJP. > > > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > ------------------------------ > > Message: 2 > Date: Thu, 6 Jan 2011 16:09:29 -0800 > From: Hara Dilley > Subject: [Biojava-l] biojava3 getting the features from > alignedsequence > To: "biojava-l at lists.open-bio.org" > Message-ID: > <097A86B3EF965D47895332A88D4223CF12AB11203B at mail2.CATBIO.local> > Content-Type: text/plain; charset="us-ascii" > > Hi, > > I would like to align a set of sequences against a scaffold and get the > list of the modifications for each aligned sequence. > I am using biojava3 > I have tried to create a profile thinking that I can get the > AlignedSequences from it but that it appears to be null. > Here is part of my code: > > Profile profile = > Alignments.getMultipleSequenceAlignment(lst); > Profile.getAlignedSequence(0); > > Can someone please point to an example for this or to the classes I have to > use. 
> Thank you, > Hara > > > > > ------------------------------ > > Message: 3 > Date: Thu, 6 Jan 2011 16:45:13 -0800 > From: Matthew Busse > Subject: [Biojava-l] Checking out code > To: biojava-l at lists.open-bio.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > Hello all, > > I've tried several times over the past couple days to check out code from > both of the sites, and I either receive a "Connection timed out" error from > the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( > http://svn.github.com)" error from the github site. > > I'm using the maven and subclipse plug-ins for eclipse, and this is the > first time Ive tried to use an SVN repository, so it's possible I'm doing > something wrong, but these errors sound like it's a connection problem. > > Thanks! > > Matthew > > > ------------------------------ > > Message: 4 > Date: Thu, 6 Jan 2011 17:04:43 -0800 > From: Andreas Prlic > Subject: Re: [Biojava-l] Checking out code > To: Matthew Busse > Cc: biojava-l at lists.open-bio.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > I just tried to reproduce a fresh checkout from github and I can > confirm, it does not work correctly any more. > > ... investigating what is going on... > > A > > > > On Thu, Jan 6, 2011 at 4:45 PM, Matthew Busse > wrote: > > Hello all, > > > > I've tried several times over the past couple days to check out code from > > both of the sites, and I either receive a "Connection timed out" error > from > > the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( > > http://svn.github.com)" error from the github site. > > > > I'm using the maven and subclipse plug-ins for eclipse, and this is the > > first time Ive tried to use an SVN repository, so it's possible I'm doing > > something wrong, but these errors sound like it's a connection problem. > > > > Thanks! 
> > > > Matthew > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > > ------------------------------ > > Message: 5 > Date: Thu, 6 Jan 2011 17:06:46 -0800 > From: Andreas Prlic > Subject: Re: [Biojava-l] Checking out code > To: biojava-l at biojava.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > and to follow up, I believe we are seeing this problem: > > > http://support.github.com/discussions/repos/5188-fresh-checkout-of-svn-repo-shows-dozens-of-empty-files-where-they-shouldnt-be > > > > > On Thu, Jan 6, 2011 at 5:04 PM, Andreas Prlic wrote: > > I just tried to reproduce a fresh checkout from github and I can > > confirm, it does not work correctly any more. > > > > ... investigating what is going on... > > > > A > > > > > > > > On Thu, Jan 6, 2011 at 4:45 PM, Matthew Busse > > wrote: > >> Hello all, > >> > >> I've tried several times over the past couple days to check out code > from > >> both of the sites, and I either receive a "Connection timed out" error > from > >> the code.open site or a "PROPFIND of '/biojava': 502 Bad Gateway ( > >> http://svn.github.com)" error from the github site. > >> > >> I'm using the maven and subclipse plug-ins for eclipse, and this is the > >> first time Ive tried to use an SVN repository, so it's possible I'm > doing > >> something wrong, but these errors sound like it's a connection problem. > >> > >> Thanks! > >> > >> Matthew > >> _______________________________________________ > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > > > > > > -- > ----------------------------------------------------------------------- > Dr.
Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > > > > ------------------------------ > > Message: 6 > Date: Thu, 6 Jan 2011 18:24:48 -0800 > From: Andreas Prlic > Subject: [Biojava-l] github svn access issues > To: biojava-dev , > biojava-l at biojava.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > Hi, > > Seems github is currently having issues with their svn interface. As > such if you want to get hold of a copy of the latest source for the > moment there are the following two options: > > - use git rather than svn to fetch the code from github > - use Maven and get the biojava3.0.1-SNAPSHOT builds from the > http://www.biojava.org/download/maven/ repository > > Andreas > > > ------------------------------ > > Message: 7 > Date: Thu, 6 Jan 2011 21:42:50 -0500 > From: Scooter Willis > Subject: Re: [Biojava-l] biojava3 getting the features from > alignedsequence > To: Hara Dilley > Cc: "biojava-l at lists.open-bio.org" > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > Hara > > Can you provide more of the code you are using that shows how you are > loading the initial sequences. > > Thanks > > Scooter > > On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: > > > Hi, > > > > I would like to align a set of sequences against a scaffold and get the > > list of the modifications for each aligned sequence. > > I am using biojava3 > > I have tried to create a profile thinking that I can get the > > AlignedSequences from it but that it appears to be null. > > Here is part of my code: > > > > Profile profile = > > Alignments.getMultipleSequenceAlignment(lst); > > Profile.getAlignedSequence(0); > > > > Can someone please point to an example for this or to the classes I have > to > > use. 
> > Thank you, > > Hara > > > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > > > ------------------------------ > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > End of Biojava-l Digest, Vol 96, Issue 5 > **************************************** > From willishf at ufl.edu Sat Jan 8 01:31:26 2011 From: willishf at ufl.edu (Scooter Willis) Date: Fri, 7 Jan 2011 20:31:26 -0500 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> Message-ID: The javadoc on the interface states the intent to throw an IndexOutOfBoundsException and says that indexing starts at 1. This just needs to be made explicit as a general rule in the method name, for example getAlignedSequenceBioIndex(int index). BioIndex should be used in all methods where the intent is to start at 1 instead of 0. We should also provide the equivalent getAlignedSequence(int index) where the first index is 0; otherwise this mistake will be made often. Thanks Scooter On Fri, Jan 7, 2011 at 5:04 PM, Hara Dilley wrote: > That would be very helpful! > > -----Original Message----- > From: Marco Valtas [mailto:mavcunha at gmail.com] > Sent: Friday, January 07, 2011 2:03 PM > To: Scooter Willis > Cc: Hara Dilley; biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] biojava3 getting the features from alignedsequence > > I think in such cases we could throw an exception.
what I mean is that > having sequences positions counted from 1 and someone tries to fetch a > position from 0 a exception could be thrown telling that such index starts > at 1. For arrays and lists that not model a sequence will be better keep the > computer science convention. Any thoughts? > > Marco Valtas > Developer at ThoughtWorks > > Sent from my iPhone > > On 07/01/2011, at 17:51, Scooter Willis wrote: > > > Hara > > > > Figured out the problem. Welcome to the world of biology indexes start at > 1 > > and computer science starts at 0. > > > > If you use 1 as your first index it will work. In the core module we > tried > > to make that clear by using BioIndex in the method name. I will see what > I > > can do about getting that added/changed in the alignment module. > > > > Thanks > > > > Scooter > > > > On Fri, Jan 7, 2011 at 12:20 PM, Hara Dilley wrote: > > > >> Thanks Scooter, > >> > >> Below is the code of how I populate lst. Of course my real sequences are > >> different, but for this example it doesn't matter. > >> > >> > >> > >> List lst = new ArrayList(); > >> > >> ProteinSequence s1 = new ProteinSequence("SHALG"); > >> > >> ProteinSequence s2 = new ProteinSequence("SWQVLG"); > >> > >> lst.add(s1); > >> > >> lst.add(s2); > >> > >> > >> > >> > >> > >> > >> > >> From: willishf at gmail.com [mailto:willishf at gmail.com] On Behalf Of > Scooter > >> Willis > >> Sent: Thursday, January 06, 2011 6:43 PM > >> To: Hara Dilley > >> Cc: biojava-l at lists.open-bio.org > >> Subject: Re: [Biojava-l] biojava3 getting the features from > >> alignedsequence > >> > >> > >> > >> Hara > >> > >> > >> > >> Can you provide more of the code you are using that shows how you are > >> loading the initial sequences.
> >> > >> > >> > >> Thanks > >> > >> > >> Scooter > >> > >> > >> > >> On Thu, Jan 6, 2011 at 7:09 PM, Hara Dilley wrote: > >> > >> Hi, > >> > >> I would like to align a set of sequences against a scaffold and get the > >> list of the modifications for each aligned sequence. > >> I am using biojava3 > >> I have tried to create a profile thinking that I can get the > >> AlignedSequences from it but that it appears to be null. > >> Here is part of my code: > >> > >> Profile profile = > >> Alignments.getMultipleSequenceAlignment(lst); > >> Profile.getAlignedSequence(0); > >> > >> Can someone please point to an example for this or to the classes I have > to > >> use. > >> Thank you, > >> Hara > >> > >> > >> _______________________________________________ > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > >> > >> > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > From hdilley at catbio.com Mon Jan 10 18:02:03 2011 From: hdilley at catbio.com (Hara Dilley) Date: Mon, 10 Jan 2011 10:02:03 -0800 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> Message-ID: <097A86B3EF965D47895332A88D4223CF12AB1121B1@mail2.CATBIO.local> Reading the javadoc, I don't see a direct way of getting the features out of the alignedSequences. I would assume that I have to write my own compare method that compares the 2 sequences, and figures out the features. Is that correct? 
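If a hand-written comparison does turn out to be necessary, its core is a column-by-column walk over two aligned rows. This is a minimal sketch under one assumption: each aligned sequence can be rendered as a gapped string of equal length, with '-' for gaps. Check how the aligned-sequence objects in your BioJava version print themselves before relying on that. AlignedDiff and Change are hypothetical names. The sketch reports substitutions, insertions (gap in the reference) and deletions (gap in the query), each tagged with a 1-based position in the ungapped reference.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists differences between two gapped, equal-length aligned rows. */
public class AlignedDiff {

    /** One difference between a reference row and a query row. */
    public static final class Change {
        public final String kind;     // "substitution", "insertion" or "deletion"
        public final int refPosition; // 1-based position in the ungapped reference
        public final char from;
        public final char to;

        Change(String kind, int refPosition, char from, char to) {
            this.kind = kind;
            this.refPosition = refPosition;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return kind + " at " + refPosition + ": " + from + " -> " + to;
        }
    }

    /**
     * Compares two gapped rows of equal length column by column.
     * For an insertion (gap in the reference), refPosition is the last reference
     * residue before it, or 0 if the insertion precedes the first residue.
     */
    public static List<Change> diff(String reference, String query) {
        if (reference.length() != query.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        List<Change> changes = new ArrayList<>();
        int refPos = 0;
        for (int col = 0; col < reference.length(); col++) {
            char r = reference.charAt(col);
            char q = query.charAt(col);
            if (r != '-') {
                refPos++; // advance only over real reference residues
            }
            if (r == '-' && q == '-') {
                continue; // all-gap column (e.g. contributed by other rows of an MSA)
            } else if (r == '-') {
                changes.add(new Change("insertion", refPos, '-', q));
            } else if (q == '-') {
                changes.add(new Change("deletion", refPos, r, '-'));
            } else if (r != q) {
                changes.add(new Change("substitution", refPos, r, q));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Placeholder rows, not output from any aligner.
        for (Change c : diff("AC-GT", "ATCG-")) {
            System.out.println(c);
        }
    }
}
```

For the placeholder rows, this prints one substitution at reference position 2 (C to T), one insertion after position 2 (C), and one deletion at position 4 (T). Reporting positions against the ungapped reference keeps the numbering stable no matter how many gap columns the aligner introduces.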
thanks From willishf at ufl.edu Mon Jan 10 18:39:34 2011 From: willishf at ufl.edu (Scooter Willis) Date: Mon, 10 Jan 2011 13:39:34 -0500 Subject: [Biojava-l] biojava3 getting the features from alignedsequence In-Reply-To: <097A86B3EF965D47895332A88D4223CF12AB1121B1@mail2.CATBIO.local> References: <097A86B3EF965D47895332A88D4223CF12AB11203B@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112090@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB112121@mail2.CATBIO.local> <097A86B3EF965D47895332A88D4223CF12AB1121B1@mail2.CATBIO.local> Message-ID: Hara "Features" is rather abstract, so I'm not sure what data you are trying to extract. Are you looking for the amino acids in each of the aligned columns? Scooter On Mon, Jan 10, 2011 at 1:02 PM, Hara Dilley wrote: > Reading the javadoc, I don't see a direct way of getting the features out > of the alignedSequences. I would assume that I have to write my own compare > method that compares the 2 sequences, and figures out the features. Is that > correct? > > > > thanks > From khalil.elmazouari at gmail.com Fri Jan 14 10:32:47 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Fri, 14 Jan 2011 11:32:47 +0100 Subject: [Biojava-l] unwanted gap in alignments Message-ID: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Hi All, I am testing the PSA and MSA examples from Cookbook3. Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex.
below: EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R expected PSA was: EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- the same for MSA DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- expected MSA DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! How can I force or avoid the gap creation at specific positions? Many thanks. 
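One way to see why an aligner might prefer the first layout is to score both candidates for the tail of the pairwise example by hand. This sketch scores only the last 24 columns. Everything before YYCA is identical in both layouts, so it adds the same amount to each. The sketch makes three assumptions:

- It uses only the five BLOSUM62 entries this tail needs.
- It uses an illustrative affine gap model: 10 to open a gap and 1 per extra gap position. These are not BioJava's defaults.
- It has a flag that makes terminal gaps free.

TailScore is a hypothetical name, and String.repeat needs Java 11 or later. This is not BioJava's scoring code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scorer for two gapped, equal-length rows (illustration only). */
public class TailScore {

    // Only the BLOSUM62 entries this example needs; a real scorer needs the full matrix.
    private static final Map<String, Integer> BLOSUM62_SUBSET = new HashMap<>();

    static {
        put('Y', 'Y', 7);
        put('C', 'C', 9);
        put('A', 'A', 4);
        put('R', 'S', -1);
        put('R', 'G', -2);
    }

    private static void put(char x, char y, int score) {
        BLOSUM62_SUBSET.put("" + x + y, score);
        BLOSUM62_SUBSET.put("" + y + x, score);
    }

    private static int substitution(char x, char y) {
        Integer s = BLOSUM62_SUBSET.get("" + x + y);
        if (s == null) {
            throw new IllegalArgumentException("pair " + x + y + " not in this subset");
        }
        return s;
    }

    /** Sum of affine penalties, open + (length - 1) * extend, over the gap runs in one row. */
    static int gapPenalty(String row, int open, int extend, boolean freeEndGaps) {
        int penalty = 0;
        int i = 0;
        while (i < row.length()) {
            if (row.charAt(i) != '-') {
                i++;
                continue;
            }
            int start = i;
            while (i < row.length() && row.charAt(i) == '-') {
                i++;
            }
            boolean terminal = start == 0 || i == row.length();
            if (!(freeEndGaps && terminal)) {
                penalty += open + (i - start - 1) * extend;
            }
        }
        return penalty;
    }

    /** Substitution scores for residue/residue columns minus the gap penalties of both rows. */
    public static int score(String a, String b, int open, int extend, boolean freeEndGaps) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int total = 0;
        for (int k = 0; k < a.length(); k++) {
            char x = a.charAt(k);
            char y = b.charAt(k);
            if (x != '-' && y != '-') {
                total += substitution(x, y);
            }
        }
        return total - gapPenalty(a, open, extend, freeEndGaps)
                     - gapPenalty(b, open, extend, freeEndGaps);
    }

    public static void main(String[] args) {
        String top = "YYCA" + "GYDYGNFDYWGQGTTLTVSS";
        String internalGap = "YYCA" + "-".repeat(19) + "R"; // layout the aligner returned
        String endGap = "YYCA" + "R" + "-".repeat(19);      // layout that was expected
        for (boolean free : new boolean[] {false, true}) {
            System.out.println("free end gaps = " + free
                + ": internal gap " + score(top, internalGap, 10, 1, free)
                + ", end gap " + score(top, endGap, 10, 1, free));
        }
    }
}
```

Run as written, it prints -2 for the internal-gap layout and -3 for the end-gap layout when end gaps are penalized like internal ones. The 19-residue gap costs the same wherever it sits, and R/S is a milder BLOSUM62 mismatch than R/G. When terminal gaps are free, it prints -2 and 25. Under these assumptions, whether the aligner charges for end gaps decides which layout wins.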
Khalil From willishf at ufl.edu Fri Jan 14 12:36:34 2011 From: willishf at ufl.edu (Scooter Willis) Date: Fri, 14 Jan 2011 07:36:34 -0500 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: Khalil You can change the GAP penalty and see what happens. I think there is also a way to specify a pre-alignment of sequence positions but haven't used it. Thanks Scooter On Fri, Jan 14, 2011 at 5:32 AM, Khalil El Mazouari < khalil.elmazouari at gmail.com> wrote: > Hi All, > > I am testing the PSA and MSA examples from Cookbook3. > > Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex. > below: > > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R > > expected PSA was: > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- > > > the same for MSA > > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > expected MSA > > 
DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > > I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! > > How can I force or avoid the gap creation at specific positions? > > Many thanks. > > Khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From khalil.elmazouari at gmail.com Fri Jan 14 13:51:10 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Fri, 14 Jan 2011 14:51:10 +0100 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: <21D9291E-DE12-4496-8E0F-5428A4760CA7@gmail.com> Hi Scooter, I've tested different gop and gep values. No success!. Regards, Khalil On 14 Jan 2011, at 13:36, Scooter Willis wrote: > Khalil > > You can change the GAP penalty and see what happens. I think there is also a way to specify a pre-alignment of sequence positions but haven't used it. > > Thanks > > Scooter > > On Fri, Jan 14, 2011 at 5:32 AM, Khalil El Mazouari wrote: > Hi All, > > I am testing the PSA and MSA examples from Cookbook3. > > Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex. 
below: > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R > > expected PSA was: > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- > > > the same for MSA > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > expected MSA > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > > I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! > > How can I force or avoid the gap creation at specific positions? > > Many thanks. 
> > Khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From andreas at sdsc.edu Fri Jan 14 15:45:01 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 14 Jan 2011 07:45:01 -0800 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: looks a bit like an end-gap issue to me. I think the global alignment algorithm does not penalize end gaps. Try a local alignment (smith waterman) instead. Andreas On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari wrote: > Hi All, > > I am testing the PSA and MSA examples from Cookbook3. > > Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex. below: > > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R > > expected PSA was: > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS > EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- > > > the same for MSA > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > expected 
MSA > DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT > EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- > EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- > > > I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! > > How can I force or avoid the gap creation at specific positions? > > Many thanks. > > Khalil > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From drandrewwalsh at gmail.com Fri Jan 14 16:22:55 2011 From: drandrewwalsh at gmail.com (Andrew Walsh) Date: Fri, 14 Jan 2011 11:22:55 -0500 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: <4D30785F.6070502@gmail.com> Changing the gap penalty isn't making a difference because both versions have the same number of gaps and gaps of the same length. Penalizing end gaps might address the first example, but not the second. Since the gaps are the same (from the point of view of how gaps are scored by the algorithms), what is actually driving the output are the substitution penalties. In the PSA example, the preferred alignment has an 'R' substituted for a 'G', whereas the unwanted output has 'R' substituted for 'S'. The latter is the more common substitution, since it is more conservative from the point of view of amino acid chemistry and may also require fewer mutations (although that depends on the codon usage for both 'R' and 'S').
Thus it will get a lower penalty, so most algorithms will prefer the unwanted PSA over your expected output. Similar reasoning applies to the MSA example. In the unwanted version, it is matching 'G' to 'G', which is not a substitution at all and thus gets a higher score than the 'V' to 'G' substitution required for the expected output. Now, I can understand why, in the PSA example, an end gap seems more likely than an internal gap, and in the MSA example one deletion event seems more likely than two similar but slightly different deletion events. But the math of the traditional alignment algorithms just won't support those outputs. Unfortunately, I don't have a good answer for how to make BioJava output your desired result. But it is my hope that clarifying the problem might be a useful step in arriving at a solution. Incidentally, does your desired output come directly from a particular alignment algorithm, or has it been hand-adjusted? -Andy Walsh On 1/14/2011 10:45 AM, Andreas Prlic wrote: > looks a bit like an end-gap issue to me. I think the global alignment > algorithm does not penalize end gaps. Try a local alignment (smith > waterman) instead. > > Andreas > > > > On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari > wrote: >> Hi All, >> >> I am testing the PSA and MSA examples from Cookbook3. >> >> Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex.
below: >> >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R >> >> expected PSA was: >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- >> >> >> the same for MSA >> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT >> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- >> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> >> expected MSA >> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT >> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- >> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> >> >> I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! >> >> How can I force or avoid the gap creation at specific positions? >> >> Many thanks. 
>> >> Khalil >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From khalil.elmazouari at gmail.com Fri Jan 14 17:31:06 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Fri, 14 Jan 2011 18:31:06 +0100 Subject: [Biojava-l] unwanted gap in alignments In-Reply-To: References: <8CCDC4FD-8052-4CCA-93F6-A4DE8ED9DC60@gmail.com> Message-ID: Hi Andreas, local alignment doesn't help: GLOBAL: EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R LOCAL: EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCA EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA The last R is gone. This is not what I expected. Regards, Khalil On 14 Jan 2011, at 16:45, Andreas Prlic wrote: > looks a bit like an end-gap issue to me. I think the global alignment > algorithm does not penalize end gaps. Try a local alignment (smith > waterman) instead. > > Andreas > > > > On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari > wrote: >> Hi All, >> >> I am testing the PSA and MSA examples from Cookbook3. >> >> Sometimes, gaps were introduced in "unwanted" places in the alignments. Ex.
below: >> >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R >> >> expected PSA was: >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS >> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR------------------- >> >> >> the same for MSA >> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT >> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- >> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> >> expected MSA >> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT >> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR----------------- >> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR----------------- >> >> >> I have tested different gop/gep and LOCAL/GLOBAL PSA . No success! >> >> How can I force or avoid the gap creation at specific positions? >> >> Many thanks. 
>> >> Khalil >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> From andrew.mcsweeny at rockets.utoledo.edu Mon Jan 17 00:31:59 2011 From: andrew.mcsweeny at rockets.utoledo.edu (McSweeny, Andrew J) Date: Mon, 17 Jan 2011 00:31:59 +0000 Subject: [Biojava-l] Purpose of BioJava mailing list Message-ID: <469B4CD3D7690A418E8F96B7BA4585F81406F7B0@BL2PRD0103MB052.prod.exchangelabs.com> Hi all, Is the purpose of the BioJava group and mailing list solely for discussing the BioJava framework, or may we post about unrelated bioinformatics tools we are working on in Java that will be released as open-source projects? Andrew McSweeny, MS From markjschreiber at gmail.com Mon Jan 17 02:10:53 2011 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 17 Jan 2011 10:10:53 +0800 Subject: [Biojava-l] Purpose of BioJava mailing list In-Reply-To: <469B4CD3D7690A418E8F96B7BA4585F81406F7B0@BL2PRD0103MB052.prod.exchangelabs.com> References: <469B4CD3D7690A418E8F96B7BA4585F81406F7B0@BL2PRD0103MB052.prod.exchangelabs.com> Message-ID: I think low-volume stuff that is relevant to bioinformatics and Java is OK. Maybe mark it with an [off topic] tag in the subject. The BioJava LinkedIn site also discusses Java bioinformatics things that are not strictly BioJava. - Mark On Mon, Jan 17, 2011 at 8:31 AM, McSweeny, Andrew J < andrew.mcsweeny at rockets.utoledo.edu> wrote: > Hi all, > > Is the purpose of the BioJava group and mailing list solely for discussing > the BioJava framework, or may we post about unrelated bioinformatics tools > we are working on in Java that will be released as open-source projects?
> > Andrew McSweeny, MS > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From gwaldon at geneinfinity.org Mon Jan 17 17:49:27 2011 From: gwaldon at geneinfinity.org (George Waldon) Date: Mon, 17 Jan 2011 11:49:27 -0600 Subject: [Biojava-l] new problem: serializable Message-ID: <20110117114927.63994vvte2wvtlic@gator1273.hostgator.com> Hi Bernd: I have this need too. Have you been able to properly serialize SimpleRichSequence? Is it possible for you to share your code? We could add it to BioJava. Let me know. Thanks, George ------------------------------------------------------------------------------ On Mon, Sep 27, 2010 at 6:17 AM, Bernd Jagla wrote: Yes, that's what I am doing. I have subclassed SimpleRichSequence as SimpleSerializableRichSequence (couldn't think of something nicer...) and am working my way through the bits... I just haven't found the tools that make this a "jiffy". I am sweating here; more or less, at least. ;) Thanks again B On 9/27/2010 2:04 PM, Richard Holland wrote: I think you can follow James' advice and subclass SimpleRichSequence, and then annotate it such that the awkward bits are not serialised. Or, you can just extract the parameters of interest out of the original object and put them into some holding class (e.g. a simple HashMap) as I suggested and serialise that instead. cheers, Richard On 27 Sep 2010, at 13:01, Bernd Jagla wrote: Thanks everyone. I got the biojavax working. Unfortunately the serialization process is not completely done yet... It turns out the following information is more difficult than expected to serialize... I haven't found a tool in Eclipse that can help me there. Generally the problems arise when dealing with Sets like annotation, features, notes, RankedDocRef. But I also have problems with SimpleNCBITaxon.
At least I was able to create a SimpleRichSequence object. Please let me know if you can think of something that would ease the work a bit... Thanks a lot, Bernd On 9/23/2010 6:01 PM, James Swetnam wrote: How about subclassing SimpleRichSequence and implementing Serializable yourself? It doesn't seem to be final. Eclipse can do it in a jiffy. Hacky, but it will get you over the bump. James Swetnam On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland wrote: The RichSequence interface doesn't extend Serializable, so you can't serialize BioJavaX sequence objects. :( I can't remember the logic behind that one, but it seemed like there was a good reason at the time... If you're passing sequences around by serialisation, do you really need to pass the complete object, or could you just pass the bits you're interested in, in some kind of basic data structure? On 23 Sep 2010, at 16:27, Bernd Jagla wrote: Sorry, me again... I now get the following error: Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) ... 9 more It seems that the SimpleRichSequence is not serializable...
Is there a way to make use of a serializable object? Thanks, Bernd _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l -- James Swetnam Lead Scientific Programmer Department of Pharmacology NYU Langone Medical Center -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From jw12 at sanger.ac.uk Tue Jan 18 10:28:29 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Tue, 18 Jan 2011 10:28:29 +0000 Subject: [Biojava-l] Registrations open for DAS Workshop 2011 Message-ID: DAS is currently being used to share annotations on genomes, protein alignments, and structural and interaction information. If you are interested in sharing biological information, the DAS workshop below may be of interest to you. Registration is open for the 2011 DAS workshop (2, 3 and 4 March) at the Genome Campus, Hinxton, UK. If you are interested in attending, please find out more at http://www.ebi.ac.uk//training/onsite/110302DAS.html and register via the web link at the bottom of the page. This workshop will cater for novice to expert DAS users, as each day is optional. Please register early, as places will be limited. Registration closes 18 February 2011 (17:00).
If you are interested in giving a 15-minute talk on the second day, please email Jonathan Warren using jonathan.warren at sanger.ac.uk. Many thanks, The Sanger/EBI DAS team. Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From khalil.elmazouari at gmail.com Wed Jan 19 10:39:44 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Wed, 19 Jan 2011 11:39:44 +0100 Subject: [Biojava-l] SimpleGapPenalty defaults Message-ID: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> Hi all, while doing PSA or MSA with default gop and gep values I obtained the following alignment! QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R Expected PSA should be at least QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- this expected alignment was obtained with gop=1 and gep=100 I can't understand why the PSA algorithm with default values always adds many gaps at the end of the alignment to end up with an S:R, while it is obvious that with fewer gaps we could obtain a better SequencePair with R:R. Finally, how can I get a score for the PSA that reflects the number of identical residues, similar residues, and gaps? Many thanks.
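A minimal, self-contained sketch of the scoring argument made earlier in this thread. This is not BioJava code and not how BioJava's aligners work internally: it only scores an already-gapped pairwise alignment with an affine gap penalty, optionally leaving end gaps free, and counts identical residues, positive-scoring (similar) pairs and gap columns. The class name, the gap convention (a run of L gaps costs open + (L - 1) * extend), the penalties 10/1 and the toy rows are assumptions chosen for illustration; only a handful of BLOSUM62 entries are hard-coded.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: scores an already-gapped pairwise alignment. Not BioJava code. */
public class AlignmentScoreSketch {

    // A few hand-copied BLOSUM62 entries (symmetric); NOT a complete matrix.
    private static final Map<String, Integer> BLOSUM62_SUBSET = new HashMap<>();

    static {
        put('A', 'A', 4);
        put('G', 'G', 6);
        put('R', 'R', 5);
        put('S', 'S', 4);
        put('T', 'T', 5);
        put('V', 'V', 4);
        put('R', 'G', -2);
        put('R', 'S', -1);
        put('V', 'G', -3);
    }

    private static void put(char a, char b, int score) {
        BLOSUM62_SUBSET.put("" + a + b, score);
        BLOSUM62_SUBSET.put("" + b + a, score);
    }

    static int substitution(char a, char b) {
        Integer s = BLOSUM62_SUBSET.get("" + a + b);
        if (s == null) {
            throw new IllegalArgumentException("pair not in subset: " + a + b);
        }
        return s;
    }

    /**
     * Scores two equal-length rows ('-' marks a gap). A gap run of length L
     * costs open + (L - 1) * extend (assumed convention). If penalizeEndGaps
     * is false, runs touching either end of the alignment cost nothing.
     */
    static int score(String q, String t, int open, int extend, boolean penalizeEndGaps) {
        if (q.length() != t.length()) {
            throw new IllegalArgumentException("rows must have equal length");
        }
        int n = q.length();
        int total = 0;
        int i = 0;
        while (i < n) {
            char a = q.charAt(i), b = t.charAt(i);
            if (a != '-' && b != '-') {
                total += substitution(a, b);
                i++;
                continue;
            }
            // Walk to the end of this gap run (gaps in the same row).
            boolean gapInQuery = a == '-';
            int start = i;
            while (i < n && (gapInQuery ? q.charAt(i) == '-' : t.charAt(i) == '-')) {
                i++;
            }
            int len = i - start;
            boolean isEndGap = start == 0 || i == n;
            if (penalizeEndGaps || !isEndGap) {
                total -= open + (len - 1) * extend;
            }
        }
        return total;
    }

    /** Returns {identical, positives, gapColumns}; positives include identities. */
    static int[] counts(String q, String t) {
        if (q.length() != t.length()) {
            throw new IllegalArgumentException("rows must have equal length");
        }
        int identical = 0, positives = 0, gapColumns = 0;
        for (int i = 0; i < q.length(); i++) {
            char a = q.charAt(i), b = t.charAt(i);
            if (a == '-' || b == '-') {
                gapColumns++;
            } else {
                if (a == b) identical++;
                if (substitution(a, b) > 0) positives++;
            }
        }
        return new int[] {identical, positives, gapColumns};
    }

    public static void main(String[] args) {
        // Toy rows mimicking the thread: internal gap + S:R versus G:R + end gap.
        String q = "AGTS";
        String unwanted = "A--R";
        String expected = "AR--";
        int open = 10, extend = 1; // assumed penalties for illustration only
        System.out.println(score(q, unwanted, open, extend, true));  // -8
        System.out.println(score(q, expected, open, extend, true));  // -9
        System.out.println(score(q, unwanted, open, extend, false)); // -8
        System.out.println(score(q, expected, open, extend, false)); // 2
        int[] c = counts(q, expected);
        System.out.printf("identical=%d positives=%d gaps=%d%n", c[0], c[1], c[2]);
    }
}
```

With end gaps penalized, both toy rows carry one gap run of length 2, so the S:R pair (-1 in BLOSUM62) beats the G:R pair (-2) by one point, which is the substitution-driven effect described earlier in the thread. Leaving end gaps free flips the ranking in this toy, so how an aligner treats end gaps can matter as much as the gap open/extend values. The counts helper illustrates one way to tally identical and similar residues and gap columns for a finished alignment.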
Khalil From sterg at teikav.edu.gr Wed Jan 19 11:19:51 2011 From: sterg at teikav.edu.gr (sterg) Date: Wed, 19 Jan 2011 13:19:51 +0200 Subject: [Biojava-l] ScalaLab, a system that can offers high-level scripting access to BioJava Message-ID: <1295435991.6592.0.camel@sterg.teikav.edu.gr> Hi guys, I would like to announce that I am developing ScalaLab, http://code.google.com/p/scalalab/, a scientific programming environment based on Scala with a Matlab-like feel. It can execute BioJava code (BioJava is included as a toolbox), and there is also a lot of potential in using Scala for more high-level, scriptable access to BioJava tasks. I am considering developing a BioJava Scala wrapper library for more compact BioJava code; of course, I'm open to possible cooperation with the community. I have also opened a ScalaLab discussion mailing list for anyone interested in participating. Regards Stergios From andreas at sdsc.edu Wed Jan 19 15:07:44 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 19 Jan 2011 07:07:44 -0800 Subject: [Biojava-l] SimpleGapPenalty defaults In-Reply-To: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> References: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> Message-ID: Hi Khalil, can you send the code snippet you are running? I just re-ran the cookbook example and it works for me.
Also, this behaves fine: ProteinSequence s1 = new ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); ProteinSequence s2 = new ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); SubstitutionMatrix<AminoAcidCompound> matrix = new SimpleSubstitutionMatrix<AminoAcidCompound>(); SequencePair<ProteinSequence, AminoAcidCompound> pair = Alignments.getPairwiseAlignment(s1, s2, PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix); System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), pair.getTarget().getAccession(), pair); System.out.println("Identicals:" + pair.getNumIdenticals()); System.out.println("Similars:" + pair.getNumSimilars()); Andreas On Wed, Jan 19, 2011 at 2:39 AM, Khalil El Mazouari wrote: > Hi all, > > while doing PSA or MSA with default gop and gep values I obtained the following alignment! > > QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS > QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R > > Expected PSA should be at least > QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS > QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- > > this expected alignment was obtained with gop=1 and gep=100 > > I can't understand why the PSA algorithm with default values always adds many gaps at the end of the alignment to end up with an S:R, while it is obvious that with fewer gaps we could obtain a better SequencePair with R:R.
> > Finally, how can I get a score for the PSA that reflects the number of identical residues, similar residues, and gaps? > > Many thanks. > > Khalil > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From khalil.elmazouari at gmail.com Wed Jan 19 21:04:30 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Wed, 19 Jan 2011 22:04:30 +0100 Subject: [Biojava-l] SimpleGapPenalty defaults In-Reply-To: References: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> Message-ID: <6656537C-6B43-47AD-83D1-6B6E1D09E706@gmail.com> Thanks, Andreas, these two sequences (s1 and s2) are exactly the same. Indeed, it works for 100% identical sequences. I have used the same code as below, except that I used PairwiseSequenceAlignerType.GLOBAL. I am not interested in local alignment. Regards, Khalil On 19 Jan 2011, at 16:07, Andreas Prlic wrote: > Hi Khalil, > > can you send the code snippet you are running? I just re-ran > the cookbook example and it works for me.
Also, this behaves fine: > > ProteinSequence s1 = new > ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); > ProteinSequence s2 = new > ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); > > SubstitutionMatrix<AminoAcidCompound> matrix = new > SimpleSubstitutionMatrix<AminoAcidCompound>(); > SequencePair<ProteinSequence, AminoAcidCompound> pair = > Alignments.getPairwiseAlignment(s1, s2, > PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix); > System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), > pair.getTarget().getAccession(), pair); > > System.out.println("Identicals:" + pair.getNumIdenticals()); > System.out.println("Similars:" + pair.getNumSimilars()); > > Andreas > > > > On Wed, Jan 19, 2011 at 2:39 AM, Khalil El Mazouari > wrote: >> Hi all, >> >> while doing PSA or MSA with default gop and gep values I obtained the following alignment!
>> >> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R >> >> Expected PSA should be at least >> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- >> >> this expected alignment was obtained with gop=1 and gep=100 >> >> I can't understand why the PSA algorithm with default values always adds many gaps at the end of the alignment to end up with an S:R, while it is obvious that with fewer gaps we could obtain a better SequencePair with R:R. >> >> Finally, how can I get a score for the PSA that reflects the number of identical residues, similar residues, and gaps? >> >> Many thanks. >> >> Khalil >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> From andreas at sdsc.edu Wed Jan 19 21:35:57 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 19 Jan 2011 13:35:57 -0800 Subject: [Biojava-l] SimpleGapPenalty defaults In-Reply-To: <6656537C-6B43-47AD-83D1-6B6E1D09E706@gmail.com> References: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> <6656537C-6B43-47AD-83D1-6B6E1D09E706@gmail.com> Message-ID: Even if I use the global alignment to align this sequence against itself, it aligns 100% and I don't see the strange gap. What are the two sequences you are aligning? Otherwise I can't reproduce the behaviour that you describe. Andreas On Wed, Jan 19, 2011 at 1:04 PM, Khalil El Mazouari wrote: > Thanks, Andreas, > > these two sequences (s1 and s2) are exactly the same. Indeed, it works for 100% identical sequences. > > I have used the same code as below, except that I used PairwiseSequenceAlignerType.GLOBAL.
I am not interested in local alignment. > > Regards, > > Khalil > > > On 19 Jan 2011, at 16:07, Andreas Prlic wrote: > >> Hi Khalil, >> >> can you send the code snippet you are running? I just re-ran >> the cookbook example and it works for me. Also, this behaves fine: >> >> ProteinSequence s1 = new >> ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); >> ProteinSequence s2 = new >> ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); >> >> SubstitutionMatrix<AminoAcidCompound> matrix = new >> SimpleSubstitutionMatrix<AminoAcidCompound>(); >> SequencePair<ProteinSequence, AminoAcidCompound> pair = >> Alignments.getPairwiseAlignment(s1, s2, >> PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix); >> System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), >> pair.getTarget().getAccession(), pair); >> >> System.out.println("Identicals:" + pair.getNumIdenticals()); >> System.out.println("Similars:" + pair.getNumSimilars()); >> >> Andreas >> >> >> >> On Wed, Jan 19, 2011 at 2:39 AM, Khalil El Mazouari >> wrote: >>> Hi all, >>> >>> while doing PSA or MSA with default gop and gep values I obtained the following alignment!
>>> >>> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >>> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R >>> >>> Expected PSA should be at least >>> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >>> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- >>> >>> this expected alignment was obtained with gop=1 and gep=100 >>> >>> I can't understand why the PSA algorithm with default values always adds many gaps at the end of the alignment to end up with an S:R, while it is obvious that with fewer gaps we could obtain a better SequencePair with R:R. >>> >>> Finally, how can I get a score for the PSA that reflects the number of identical residues, similar residues, and gaps? >>> >>> Many thanks. >>> >>> Khalil >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > > From jayunit100 at gmail.com Wed Jan 19 22:49:36 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Wed, 19 Jan 2011 17:49:36 -0500 Subject: [Biojava-l] structure alignment seems slow ... Message-ID: Hi guys: The following code snippet, for two 150-amino-acid proteins, is taking a long time: well over 4 minutes. long t=(System.currentTimeMillis()); StructurePairAligner aligner = new StructurePairAligner(); aligner.align(s1, s2); aligner.setDebug(false); System.out.println("time = " + (System.currentTimeMillis()-t)/1000+" sec"); Strangely, I'm running BioJava on a 2 x 2.8 GHz Quad-Core Intel Xeon machine, Mac OS X 10.5.8, with AggressiveHeap enabled and 6 GB of RAM. This surprises me, since the tests on the BioJava wiki page for structural alignments seem to be quite efficient.
-- Jay Vyas MMSB/UCHC From andreas at sdsc.edu Wed Jan 19 22:55:56 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 19 Jan 2011 14:55:56 -0800 Subject: [Biojava-l] structure alignment seems slow ... In-Reply-To: References: Message-ID: The old StructurePairAligner is unpublished and is probably slower than the CE and FATCAT implementations. Try, e.g., CE: http://biojava.org/wiki/BioJava:CookBook:PDB:CE_Algorithm Andreas On Wed, Jan 19, 2011 at 2:49 PM, Jay Vyas wrote: > Hi guys: > > The following code snippet, for two 150-amino-acid proteins, is taking a > long time: well over 4 minutes. > > long t=(System.currentTimeMillis()); > StructurePairAligner aligner = new StructurePairAligner(); > aligner.align(s1, s2); > aligner.setDebug(false); > System.out.println("time = " + (System.currentTimeMillis()-t)/1000+" > sec"); > > Strangely, I'm running BioJava on a 2 x 2.8 GHz Quad-Core Intel Xeon machine, Mac > OS X 10.5.8, with AggressiveHeap enabled and 6 GB of RAM. > This surprises me, since the tests on the BioJava wiki page for structural > alignments seem to be quite efficient. > > -- > Jay Vyas > MMSB/UCHC > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From jayunit100 at gmail.com Wed Jan 19 23:12:50 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Wed, 19 Jan 2011 18:12:50 -0500 Subject: [Biojava-l] structure alignment seems slow ... In-Reply-To: References: Message-ID: Fixed my problem: the issue was that I had multiple models in my structures, and I assumed BioJava was only using the first one.
To fix it, replace Atom[] ca2 = StructureTools.getAtomCAArray(s2); with Atom[] ca2 = StructureTools.getAtomCAArray(s2.getModel(0).get(0)); The CE alignment is blazing fast: ~0 seconds. -- Jay Vyas MMSB/UCHC From andreas at sdsc.edu Wed Jan 19 23:52:19 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 19 Jan 2011 15:52:19 -0800 Subject: [Biojava-l] structure alignment seems slow ... In-Reply-To: References: Message-ID: Glad to hear that it works now. I agree with your initial assumption about the StructureTools class: getAtomCAArray should only consider model 0. I will change it to behave accordingly... Andreas On Wed, Jan 19, 2011 at 3:12 PM, Jay Vyas wrote: > Fixed my problem: the issue was that I had multiple models in my structures, > and I assumed BioJava was only using the first one. > > To fix it, replace > > > Atom[] ca2 = StructureTools.getAtomCAArray(s2); > > with > > Atom[] ca2 = StructureTools.getAtomCAArray(s2.getModel(0).get(0)); > > The CE alignment is blazing fast: ~0 seconds. > > -- > Jay Vyas > MMSB/UCHC > From khalil.elmazouari at gmail.com Thu Jan 20 08:42:11 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Thu, 20 Jan 2011 09:42:11 +0100 Subject: [Biojava-l] SimpleGapPenalty defaults In-Reply-To: References: <15185872-D7BC-4789-B853-D3463C14E5C8@gmail.com> <6656537C-6B43-47AD-83D1-6B6E1D09E706@gmail.com> Message-ID: <285E1045-845F-4899-A1D5-2B4316337EE2@gmail.com> Please try with the following sequences: >seq1 QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >seq2 QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR Thanks, Khalil On 19 Jan 2011, at 22:35, Andreas Prlic wrote: > Even if I use the global alignment to align this sequence against > itself, it aligns 100% and I don't see the strange gap. What are the > two sequences you are aligning?
Otherwise I can't reproduce the > behaviour that you describe. > > Andreas > > > > On Wed, Jan 19, 2011 at 1:04 PM, Khalil El Mazouari > wrote: >> Thanks Andreas, >> >> these 2 seq (s1 and s2) are exactly the same. Indeed, it works for 100% identical seq. >> >> I have used the same code as below, except that I used .GLOBAL. I am not interested in local alignment. >> >> Regards, >> >> Khalil >> >> >> On 19 Jan 2011, at 16:07, Andreas Prlic wrote: >> >>> Hi Kalil, >>> >>> can you send the code snippet that you are running? I just re-ran >>> the cookbook example and it works for me. Also this behaves fine: >>> >>> ProteinSequence s1 = new >>> ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); >>> ProteinSequence s2 = new >>> ProteinSequence("QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSSQVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCAR"); >>> >>> SubstitutionMatrix<AminoAcidCompound> matrix = new >>> SimpleSubstitutionMatrix<AminoAcidCompound>(); >>> SequencePair<ProteinSequence, AminoAcidCompound> pair = >>> Alignments.getPairwiseAlignment(s1, s2, >>> PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix); >>> System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), >>> pair.getTarget().getAccession(), pair); >>> >>> System.out.println("Identicals:" + pair.getNumIdenticals()); >>> System.out.println("Similars:" + pair.getNumSimilars()); >>> >>> Andreas >>> >>> >>> >>> On Wed, Jan 19, 2011 at 2:39 AM, Khalil El Mazouari >>> wrote: >>>> Hi all, >>>> >>>> while doing PSA or MSA with default gop and gep values I obtained the following alignment!
>>>> >>>> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >>>> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA---------------------R >>>> >>>> Expected PSA should be at least >>>> QVQLQQPGSELVKPGASVKLSCKASGYTFTNYLIHWVRQRPGRGLEWIGRIDPNSGGTKYSEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCATYYFGRSFFDFWGQGTTLTVSS >>>> QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCA-----R---------------- >>>> >>>> this expected alignment was obtained with gop=1 and gep=100 >>>> >>>> I can't understand why the PSA algorithm with default values always adds many gaps at the end of the alignment to end up with an S:R pair, when it is obvious that with fewer gaps we could obtain a better SequencePair with R:R. >>>> >>>> Finally, how can I get a score for a PSA that reflects the number of identical residues, similar residues and gaps? >>>> >>>> Many thanks. >>>> >>>> Khalil >> >> From khalil.elmazouari at gmail.com Fri Jan 21 18:48:24 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Fri, 21 Jan 2011 19:48:24 +0100 Subject: [Biojava-l] Write seq in Genbank format Message-ID: <285EC26C-C2FD-4AEE-8CD1-B870325F702D@gmail.com> Hi All, how to output annotated sequences in Genbank format?
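Khalil's first question in the SimpleGapPenalty thread comes down to affine-gap arithmetic. The two alignments he quotes agree up to ...YYCA. They differ only in where seq2's final R sits against seq1's last 22 residues, TYYFGRSFFDFWGQGTTLTVSS:

- one 21-residue gap, with R under the final S; or
- a 5-residue gap, R under R, then a 16-residue gap.

The minimal Java sketch below scores just those two tails under a generic affine model, in which a gap run of length L costs open + L × extend. It also counts identities and gapped columns, which are the raw ingredients behind the summary that pair.getNumIdenticals() and pair.getNumSimilars() report.

Several things here are assumptions, not BioJava's implementation:

- the open + L × extend convention itself;
- terminal gaps being penalized like internal ones;
- open = 10 and extend = 1 as SimpleGapPenalty's defaults in this release (I believe this is right, but check your version);
- the two-entry BLOSUM62 excerpt (R/R = 5, R/S = -1).

```java
import java.util.Map;

/** Sketch: score fixed, already-gapped alignment rows with an affine gap model. */
final class AffineTailScore {
    // Seq1 residues after the shared prefix ...YYCA, and the two competing seq2 tails.
    static final String SEQ1_TAIL = "TYYFGRSFFDFWGQGTTLTVSS";
    static final String TAIL_A = "-".repeat(21) + "R";                 // one gap, R under the final S
    static final String TAIL_B = "-".repeat(5) + "R" + "-".repeat(16);  // two gaps, R under R

    // Assumed BLOSUM62 excerpt: only the residue pairs that occur in these tails.
    static final Map<String, Integer> SUBST = Map.of("RR", 5, "RS", -1, "SR", -1);

    /** Sum of pair scores minus (open + length * extend) for every maximal gap run. */
    static int score(String a, String b, int open, int extend) {
        if (a.length() != b.length()) throw new IllegalArgumentException("rows differ in length");
        int score = 0;
        boolean gapInA = false, gapInB = false;  // currently inside a gap run in row a / row b?
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x == '-') {
                score -= gapInA ? extend : open + extend;
                gapInA = true;
                gapInB = false;
            } else if (y == '-') {
                score -= gapInB ? extend : open + extend;
                gapInB = true;
                gapInA = false;
            } else {
                Integer s = SUBST.get("" + x + y);
                if (s == null) throw new IllegalArgumentException("no score for " + x + "/" + y);
                score += s;
                gapInA = false;
                gapInB = false;
            }
        }
        return score;
    }

    /** Columns where both rows carry the same residue. */
    static int identities(String a, String b) {
        int n = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != '-' && a.charAt(i) == b.charAt(i)) n++;
        }
        return n;
    }

    /** Columns where either row has a gap. */
    static int gapColumns(String a, String b) {
        int n = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) == '-' || b.charAt(i) == '-') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        for (int open : new int[] {10, 1}) {
            System.out.printf("open=%d extend=1: A=%d B=%d%n", open,
                    score(SEQ1_TAIL, TAIL_A, open, 1), score(SEQ1_TAIL, TAIL_B, open, 1));
        }
        System.out.println("identities A=" + identities(SEQ1_TAIL, TAIL_A)
                + " B=" + identities(SEQ1_TAIL, TAIL_B));
    }
}
```

With open = 10, the single 21-residue gap wins (-32 against -36). The second gap costs another opening penalty, and that outweighs the 6-point gain from turning S:R (-1) into R:R (+5). With open = 1, the split placement wins (-18 against -23). Under this model the split alignment scores higher only when open < 6, whatever the extension penalty.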
Thanks, khalil From holland at eaglegenomics.com Fri Jan 21 19:01:26 2011 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 21 Jan 2011 19:01:26 +0000 Subject: [Biojava-l] Write seq in Genbank format In-Reply-To: <285EC26C-C2FD-4AEE-8CD1-B870325F702D@gmail.com> References: <285EC26C-C2FD-4AEE-8CD1-B870325F702D@gmail.com> Message-ID: I don't think BJ3 has a Genbank parser yet (unless I missed something?), so under the older version of BJ you can do it like this: RichSequence rs = ....; // this is your annotated sequence object RichSequence.IOTools.writeGenbank(System.out, rs, null); // this writes it to STDOUT in Genbank format cheers, Richard On 21 Jan 2011, at 18:48, Khalil El Mazouari wrote: > Hi All, > > how to output annotated sequences in Genbank format? > > Thanks, > > khalil -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From jayunit100 at gmail.com Mon Jan 24 21:38:48 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Mon, 24 Jan 2011 16:38:48 -0500 Subject: [Biojava-l] advice on rmsd algorithms.... Message-ID: Hi guys. I noticed that I get a different RMSD using the biojava alignCE method (CeMain), as compared to MolMol, another popular molecular visualization tool. Any idea why? The rmsd appears to be 2.54 (biojava CeMain) as compared to 13.5 (molmol). I'm using the gap size of -1 as in the biojava examples..... -- Jay Vyas MMSB/UCHC From andreas at sdsc.edu Mon Jan 24 21:44:08 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 24 Jan 2011 13:44:08 -0800 Subject: [Biojava-l] advice on rmsd algorithms.... In-Reply-To: References: Message-ID: Hi Jay, probably the alignments are not the same.
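Andreas's point can be made concrete. RMSD is computed only over the residue pairs that an alignment declares equivalent. Two programs that place the same structures in the same frame, but equivalence different residues, can therefore report very different values.

The minimal Java sketch below assumes the two coordinate sets are already superimposed. CE and FATCAT also optimize a superposition over the aligned pairs, and that step is omitted here. The toy coordinates are invented for illustration, and the class and method names are hypothetical.

```java
/** Sketch: RMSD over an explicit list of equivalent atom pairs (no superposition step). */
final class RmsdOverEquivalences {
    /** pairs[k] = {i, j} declares atom i of a equivalent to atom j of b. */
    static double rmsd(double[][] a, double[][] b, int[][] pairs) {
        if (pairs.length == 0) throw new IllegalArgumentException("no equivalent pairs");
        double sum = 0.0;
        for (int[] p : pairs) {
            double[] u = a[p[0]], v = b[p[1]];
            double dx = u[0] - v[0], dy = u[1] - v[1], dz = u[2] - v[2];
            sum += dx * dx + dy * dy + dz * dz;  // squared distance of this equivalent pair
        }
        return Math.sqrt(sum / pairs.length);
    }

    // Toy CA traces spaced 3.8 A along x, the second offset by 1 A in z; not real protein data.
    static final double[][] A = {{0, 0, 0}, {3.8, 0, 0}, {7.6, 0, 0}};
    static final double[][] B = {{0, 0, 1}, {3.8, 0, 1}, {7.6, 0, 1}};
    static final int[][] IN_REGISTER = {{0, 0}, {1, 1}, {2, 2}};
    static final int[][] SHIFTED_BY_ONE = {{0, 1}, {1, 2}};

    public static void main(String[] args) {
        System.out.printf("in register:    %.3f%n", rmsd(A, B, IN_REGISTER));
        System.out.printf("shifted by one: %.3f%n", rmsd(A, B, SHIFTED_BY_ONE));
    }
}
```

The coordinates are identical in both runs, but the RMSD changes from 1.0 Å to about 3.93 Å. The only thing that changes is which residues are paired.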
Did you look at the results in 3D, and does the MolMol alignment make any sense? Andreas On Mon, Jan 24, 2011 at 1:38 PM, Jay Vyas wrote: > Hi guys. I noticed that I get a different RMSD using the biojava alignCE > method (CeMain), as compared to MolMol, another popular molecular > visualization tool. > > Any idea why? The rmsd appears to be 2.54 (biojava CeMain) as compared to > 13.5 (molmol). I'm using the gap size of -1 as in the biojava > examples..... > > > > -- > Jay Vyas > MMSB/UCHC From jayunit100 at gmail.com Mon Jan 24 22:39:37 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Mon, 24 Jan 2011 17:39:37 -0500 Subject: [Biojava-l] advice on rmsd algorithms.... In-Reply-To: References: Message-ID: You're right that the alignments are different.... Any advice on the difference between FatCat and CE? From andreas at sdsc.edu Mon Jan 24 22:48:53 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 24 Jan 2011 14:48:53 -0800 Subject: [Biojava-l] advice on rmsd algorithms.... In-Reply-To: References: Message-ID: hm. for details I recommend reading the original papers describing the algorithms. Is there any aspect in particular that you are interested in? Andreas On Mon, Jan 24, 2011 at 2:39 PM, Jay Vyas wrote: > You're right that the alignments are different.... Any advice on the difference > between FatCat and CE?
From khalil.elmazouari at gmail.com Tue Jan 25 11:16:58 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Tue, 25 Jan 2011 12:16:58 +0100 Subject: [Biojava-l] From biojava to biojava3 Message-ID: Hi All, I have legacy code that uses RichSequence and other VERY useful classes from biojavax. I am interested in using MSA and some classes from Biojava3. Is it possible to use RichSequence in Biojava3? How to convert RichSequence to ProteinSequence? If not, how to deal with legacy code in Biojava3 context? many thanks;) Khalil From willishf at ufl.edu Tue Jan 25 12:20:20 2011 From: willishf at ufl.edu (Scooter Willis) Date: Tue, 25 Jan 2011 07:20:20 -0500 Subject: [Biojava-l] From biojava to biojava3 In-Reply-To: References: Message-ID: Khalil You can use both. You will need to write some code (easy: get the sequence as a string from the RichSequence and then create a new ProteinSequence) to go from RichSequence->ProteinSequence->RichSequence. Scooter On Tue, Jan 25, 2011 at 6:16 AM, Khalil El Mazouari wrote: > Hi All, > > I have legacy code that uses RichSequence and other VERY useful classes from biojavax. I am interested in using MSA and some classes from Biojava3. > Is it possible to use RichSequence in Biojava3? How to convert RichSequence to ProteinSequence? > If not, how to deal with legacy code in Biojava3 context?
> > many thanks;) > > Khalil From martin.jones at ed.ac.uk Tue Jan 25 17:35:29 2011 From: martin.jones at ed.ac.uk (Martin Jones) Date: Tue, 25 Jan 2011 17:35:29 +0000 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes Message-ID: Hi, I am using some of the code from here: http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/ to parse nexus phylogenetic tree data. My nexus file is output from MrBayes: ------------------ snip----------------- #NEXUS [ID: 7166567671] begin trees; [Note: This tree contains information on the topology, branch lengths (if present), and the probability of the partition indicated by the branch.] tree con_50_majrule = (27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,(61985:0.217735,6657:0.275071)0.89:0.089314); [Note: This tree contains information only on the topology and branch lengths (mean of the posterior probability density).] tree con_50_majrule = (27563:0.194008,(6843:0.233188,6960:0.229043):0.117649,(61985:0.217735,6657:0.275071):0.089314); end; -------------------------- snip ------------- the file parses without any error, but when I get a list of vertices, the first one (27563) is missing: def tree = getTree(nexus, 'con 50 majrule') tree.vertexSet().each{ println it } output: 6843 p1 6960 61985 p3 6657 p2 ______________snip ___________ The above code is Groovy, but hopefully is self-explanatory. Is the first node being treated specially because it is the root? If so, can I get it using some other method?
Thanks, Martin From tiagoantao at gmail.com Tue Jan 25 17:42:29 2011 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 25 Jan 2011 17:42:29 +0000 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes In-Reply-To: References: Message-ID: I think the patch that I submitted to the old newick module of biojava never really was released by the biojava team. So that example will not work with biojava (unless you track the svn release with the patch and use that, arguably not a good solution). More info can be supplied by the biojava maintainers. I just made the patch and the example. On Tue, Jan 25, 2011 at 5:35 PM, Martin Jones wrote: > Hi, > > I am using some of the code from here: > > http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/ > > to parse nexus phylogenetic tree data. My nexus file is output from MrBayes: > > ------------------ snip----------------- > #NEXUS > > [ID: 7166567671] > begin trees; > [Note: This tree contains information on the topology, > branch lengths (if present), and the probability > of the partition indicated by the branch.] > tree con_50_majrule = > (27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,(61985:0.217735,6657:0.275071)0.89:0.089314); > > [Note: This tree contains information only on the topology > and branch lengths (mean of the posterior probability density).] > tree con_50_majrule = > (27563:0.194008,(6843:0.233188,6960:0.229043):0.117649,(61985:0.217735,6657:0.275071):0.089314); > end; > > -------------------------- snip ------------- > > the file parses without any error, but when I get a list of vertices, > the first one (27563) is missing: > > def tree = getTree(nexus, 'con 50 majrule') > tree.vertexSet().each{ > println it > } > > output: > > 6843 > p1 > 6960 > 61985 > p3 > 6657 > p2 > > ______________snip ___________ > > The above code is Groovy, but hopefully is self-explanatory. Is the
Is the > first node being treated specially because it is the root? If so, can > I get it using some other method? > > Thanks, > > Martin > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- "If you want to get laid, go to college.? If you want an education, go to the library." - Frank Zappa From andreas at sdsc.edu Tue Jan 25 17:54:02 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 25 Jan 2011 09:54:02 -0800 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes In-Reply-To: References: Message-ID: Hi, Tiago, I found a mail from 2009/11/16 which says that your patch got committed. Is this the one you are referring to? Martin, this would mean that the patch is available through the biojava-legacy project (biojava v. 1.8) Andreas 2011/1/25 Tiago Ant?o : > I think the patch that I submitted to the old newick module of biojava > never really was released by the biojava team. So that example will > not work with biojava (unless you track the svn release with the patch > and use that, arguably not a good solution). > > More info can be supplied by the biojava maintainers. I just made the > patch and the example. > > On Tue, Jan 25, 2011 at 5:35 PM, Martin Jones wrote: >> Hi, >> >> I am using some of the code from here: >> >> http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/ >> >> to parse nexus phylogenetic tree data. My nexus file is output from MrBayes: >> >> ------------------ snip----------------- >> #NEXUS >> >> [ID: 7166567671] >> begin trees; >> ? [Note: This tree contains information on the topology, >> ? ? ? ? ?branch lengths (if present), and the probability >> ? ? ? ? ?of the partition indicated by the branch.] >> ? tree con_50_majrule = >> (27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,(61985:0.217735,6657:0.275071)0.89:0.089314); >> >> ? 
[Note: This tree contains information only on the topology >> and branch lengths (mean of the posterior probability density).] >> tree con_50_majrule = >> (27563:0.194008,(6843:0.233188,6960:0.229043):0.117649,(61985:0.217735,6657:0.275071):0.089314); >> end; >> >> -------------------------- snip ------------- >> >> the file parses without any error, but when I get a list of vertices, >> the first one (27563) is missing: >> >> def tree = getTree(nexus, 'con 50 majrule') >> tree.vertexSet().each{ >> println it >> } >> >> output: >> >> 6843 >> p1 >> 6960 >> 61985 >> p3 >> 6657 >> p2 >> >> ______________snip ___________ >> >> The above code is Groovy, but hopefully is self-explanatory. Is the >> first node being treated specially because it is the root? If so, can >> I get it using some other method? >> >> Thanks, >> >> Martin From tiagoantao at gmail.com Tue Jan 25 17:57:05 2011 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 25 Jan 2011 17:57:05 +0000 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes In-Reply-To: References: Message-ID: 2011/1/25 Andreas Prlic : > Tiago, I found a mail from 2009/11/16 which says that your patch got > committed. Is this the one you are referring to? Martin, this would > mean that the patch is available through the biojava-legacy project > (biojava v. 1.8) I do not follow the biojava releases closely, but I think (I might be wrong) that it never got released.
From martin.jones at ed.ac.uk Wed Jan 26 10:36:56 2011 From: martin.jones at ed.ac.uk (Martin Jones) Date: Wed, 26 Jan 2011 10:36:56 +0000 Subject: [Biojava-l] The root node is missing when parsing nexus tree files from MrBayes In-Reply-To: References: Message-ID: Thanks for the tip; switching to biojava 1.8 fixed the problem. 2011/1/25 Tiago Antão : > 2011/1/25 Andreas Prlic : >> Tiago, I found a mail from 2009/11/16 which says that your patch got >> committed. Is this the one you are referring to? Martin, this would >> mean that the patch is available through the biojava-legacy project >> (biojava v. 1.8) > > > I do not follow the biojava releases closely, but I think (I might be > wrong) that it never got released. > > From leapingfrog at yahoo.com Sun Jan 30 07:04:32 2011 From: leapingfrog at yahoo.com (David Scott) Date: Sat, 29 Jan 2011 23:04:32 -0800 (PST) Subject: [Biojava-l] Attempt at Cookbook Message-ID: <384530.37939.qm@web113212.mail.gq1.yahoo.com> Copied code from Alignment section of biojava3 Cookbook on "how can I profile the time and memory requirements of a Multiple Sequence Alignment?" Created a series of FASTA proteins (sod) from NIH. It compiles fine on a Windows machine and I print out the sequences to make sure they look like they are read okay. I added a snippet of code to see it read okay: Iterator it = list.iterator(); System.out.println(); while ( it.hasNext() ) { System.out.println( it.next() ); } System.out.println(); The following error occurs when I run it: C:\biojava1\w11>java org/biojava3/alignment/CookbookMSAProfiler sod.fasta Loading sequences from sod.fasta...
3 HINHSIFWTNLCKDGGEPSGKLLQAINRDFGSLQGLQARLNAIAIAVQGSGWGWLGYNKIDKRLEVACCPNQDPLEPTTG LVPLFGIDVWEHAYYLQYK HINHSIFWTNLCKDGGEPSGKLLQAINRDFGSLQVLQARLNAIAIAVQGSGWGWLGYNKIDKRLEVACCPNQDPLEPTTG LVPLFGIDVWEHAYYLQYK KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK sequences in 39 ms using 15872 kB Stage 1: pairwise similarity calculation... Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.biojava3.alignment.Alignments.getAllPairsScorers(Ljava/util/List;Lorg/biojava3/alignment/Alignments$PairwiseSequenceScorerType;Lorg/biojava3/alignment/template/GapPenalty;Lorg/biojava3/alignment/template/SubstitutionMatrix;)Ljava/util/List; from class org.biojava3.alignment.CookbookMSAProfiler at org.biojava3.alignment.CookbookMSAProfiler.main(CookbookMSAProfiler.java:86) Sincerely, David Scott From andreas at sdsc.edu Mon Jan 31 06:05:42 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 30 Jan 2011 22:05:42 -0800 Subject: [Biojava-l] Attempt at Cookbook In-Reply-To: <384530.37939.qm@web113212.mail.gq1.yahoo.com> References: <384530.37939.qm@web113212.mail.gq1.yahoo.com> Message-ID: Hi David, Some of the access methods in Alignments.java were not public... now fixed in SVN. Andreas On Sat, Jan 29, 2011 at 11:04 PM, David Scott wrote: > Copied code from Alignment section of biojava3 Cookbook on > "how can I profile the time and memory requirements of a Multiple Sequence > Alignment?" > > Created a series of FASTA proteins (sod) from NIH. > > It compiles fine on a Windows machine and I print out the sequences to make sure > they look like they are read okay. > > I added a snippet of code to see it read okay: > > Iterator it = list.iterator(); > System.out.println(); > while ( it.hasNext() ) { > System.out.println( it.next() ); > } >
System.out.println(); > > The following error occurs when I run it: > > C:\biojava1\w11>java org/biojava3/alignment/CookbookMSAProfiler sod.fasta > Loading sequences from sod.fasta... 3 > HINHSIFWTNLCKDGGEPSGKLLQAINRDFGSLQGLQARLNAIAIAVQGSGWGWLGYNKIDKRLEVACCPNQDPLEPTTG > LVPLFGIDVWEHAYYLQYK > HINHSIFWTNLCKDGGEPSGKLLQAINRDFGSLQVLQARLNAIAIAVQGSGWGWLGYNKIDKRLEVACCPNQDPLEPTTG > LVPLFGIDVWEHAYYLQYK > KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN > LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV > WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK > > sequences in 39 ms using 15872 kB > > Stage 1: pairwise similarity calculation... Exception in thread "main" > java.lang.IllegalAccessError: tried to access method > org.biojava3.alignment.Alignments.getAllPairsScorers(Ljava/util/List;Lorg/biojava3/alignment/Alignments$PairwiseSequenceScorerType;Lorg/biojava3/alignment/template/GapPenalty;Lorg/biojava3/alignment/template/SubstitutionMatrix;)Ljava/util/List; > from class org.biojava3.alignment.CookbookMSAProfiler
at > org.biojava3.alignment.CookbookMSAProfiler.main(CookbookMSAProfiler.java:86) > > Sincerely, > > David Scott From martin.jones at ed.ac.uk Mon Jan 31 10:40:51 2011 From: martin.jones at ed.ac.uk (Martin Jones) Date: Mon, 31 Jan 2011 10:40:51 +0000 Subject: [Biojava-l] It is possible to parse phylogenetic trees with bootstrap/posterior probabilities Message-ID: Hi all, I am using BioJava to parse MrBayes NEXUS files containing consensus trees, using code that looks like this: NexusFileBuilder builder = new NexusFileBuilder(); NexusFileFormat.parseFile(builder, new File('2fin3-4.nex.con')); NexusFile nexus = builder.getNexusFile(); def tree = getTree(nexus, 'con 50 majrule') The parser constructs a weighted graph where the edge weights are the branch lengths. Is it possible to parse the tree so that posterior probabilities are included in the resulting object? Martin
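MrBayes writes the posterior probability of each clade as an internal node label, placed ahead of the branch length. In the first con_50_majrule tree quoted earlier in this digest, these are the 0.98 and 0.89 that follow the closing parentheses. I am not sure whether BioJava's NexusFileBuilder keeps these labels.

The following is therefore a stdlib-only Java sketch, not a BioJava API. It is a small recursive-descent Newick reader that keeps leaf names, internal labels and branch lengths. It also lists 27563, the leaf directly under the root that went missing in the earlier thread. It assumes there is no whitespace inside the tree string, no quoted labels and no [comments]. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: minimal Newick reader that keeps internal node labels (e.g. MrBayes posteriors). */
final class NewickSupport {
    static final class Node {
        final List<Node> children = new ArrayList<>();
        String label = "";                 // taxon for leaves; support value for internal nodes
        double branchLength = Double.NaN;  // NaN when no ":length" is given
        boolean isLeaf() { return children.isEmpty(); }
    }

    private final String s;
    private int pos;

    private NewickSupport(String s) { this.s = s; }

    static Node parse(String newick) {
        NewickSupport p = new NewickSupport(newick.trim());
        Node root = p.node();
        p.expect(';');
        return root;
    }

    private Node node() {
        Node n = new Node();
        if (peek() == '(') {
            pos++;                                   // consume '('
            n.children.add(node());
            while (peek() == ',') {
                pos++;                               // consume ','
                n.children.add(node());
            }
            expect(')');
        }
        n.label = token();                           // leaf name, or label written after ')'
        if (peek() == ':') {
            pos++;
            n.branchLength = Double.parseDouble(token());
        }
        return n;
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at offset " + pos);
        pos++;
    }

    /** Reads characters up to the next Newick delimiter. */
    private String token() {
        int start = pos;
        while (pos < s.length() && "(),:;".indexOf(s.charAt(pos)) < 0) pos++;
        return s.substring(start, pos);
    }

    static List<String> leaves(Node n) {
        List<String> out = new ArrayList<>();
        collect(n, out, true);
        return out;
    }

    static List<String> supports(Node n) {
        List<String> out = new ArrayList<>();
        collect(n, out, false);
        return out;
    }

    /** Pre-order walk collecting either leaf names or non-empty internal labels. */
    private static void collect(Node n, List<String> out, boolean wantLeaves) {
        if (n.isLeaf() == wantLeaves && !n.label.isEmpty()) out.add(n.label);
        for (Node c : n.children) collect(c, out, wantLeaves);
    }

    public static void main(String[] args) {
        String tree = "(27563:0.194008,(6843:0.233188,6960:0.229043)0.98:0.117649,"
                + "(61985:0.217735,6657:0.275071)0.89:0.089314);";
        Node root = parse(tree);
        System.out.println("leaves:   " + leaves(root));
        System.out.println("supports: " + supports(root));
    }
}
```

On the second MrBayes tree, which has no support labels, supports() simply returns an empty list. Keeping each posterior on the node whose subtree it describes also keeps it separate from the branch length, which the graph representation Martin describes stores as an edge weight.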