From alex at coolest.com Thu Nov 1 04:20:26 2007
From: alex at coolest.com (dasoudesu)
Date: Thu, 1 Nov 2007 01:20:26 -0700 (PDT)
Subject: [Biojava-l] [ann] Informal Text-mining & Java Meetup in Tokyo
Message-ID: <13524848.post@talk.nabble.com>

Just wanted to announce a mini-event:

Informal Text-mining & Java Meetup in Tokyo
http://curehunter.com/public/events.do

Come have a casual drink with some similarly minded devs interested in new tech.
(We like: Text-mining, Natural Language Processing, Java, C#, Python, Flex, Dojo, Lucene...)

Time/location:
November 29th 2007, Thursday 8pm-10pm
Amarcord in Hatsudai (near Shinjuku), Tokyo
http://way.sub.jp/amarcord/access.php
2000-3000yen for food/drinks

If you can attend, please confirm by emailing:
events at curehunter com

We will do a short demo of CureHunter and talk about some of the tech we used.
After that we will have a projector available if anyone else would like to
present for 5-15 min on stuff they are working on.
(the location is best equipped for drinking, however)

Hope to meet a few Java people from around Tokyo.

Best Regards,
Alex

---
http://curehunter.com - http://popjisyo.com - http://winstone.sf.net
--
View this message in context: http://www.nabble.com/-ann--Informal-Text-mining---Java-Meetup-in-Tokyo-tf4729944.html#a13524848
Sent from the BioJava mailing list archive at Nabble.com.

From ap3 at sanger.ac.uk Thu Nov 1 12:59:35 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Thu, 1 Nov 2007 16:59:35 +0000
Subject: [Biojava-l] Biojava migrating to Subversion
Message-ID: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk>

Hi all,

Over the next weeks (until Christmas) BioJava will finally move the
version control system from CVS to Subversion (svn). This is happening
in parallel to the other open-bio projects.

We will ensure that nothing gets lost during this migration. This means
that all Biojava modules, branches, tags and the history of the files
will be imported into the new repository.

Over the next weeks we will

A) Test the migration procedure to ensure nothing gets lost

B) We will declare a CVS freeze at some point, giving all developers
enough time to commit the latest code to CVS.

C) After the freeze the final svn migration will happen. At this point
we will also do a quick BioJava release (version 1.5.1)

D) From that moment on all future Biojava development will happen via
svn, CVS will remain frozen.

Detailed instructions for how to check out and commit code using svn
will be announced closer to the migration date. We will keep you
informed about the details of these ongoings. There is also a wiki page
which provides documentation for this:

http://biojava.org/wiki/CVS_to_SVN_Migration

Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From abhi232 at cc.gatech.edu Mon Nov 5 12:59:15 2007
From: abhi232 at cc.gatech.edu (abhi232 at cc.gatech.edu)
Date: Mon, 5 Nov 2007 12:59:15 -0500 (EST)
Subject: [Biojava-l] Error while reading byte data for creating a Trace.
In-Reply-To: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk>
References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk>
Message-ID: <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu>

Hi all,

I have a byte array containing the data from an .ab1 file. The BioJava
library provides a class called ABITrace which takes as input either a
byte[] array, a File or a URL. If I use the latter two (the File or the
URL) the program works, but if I pass the byte array to the constructor I
get a java.lang.ArrayIndexOutOfBoundsException. Is there a problem with
the ABITrace class, or how can I get around this particular error?
I am printing the length of the byte array and it comes to 144930. Can
that cause a problem in my code?

Thanks in advance.
Abhinav

From holland at ebi.ac.uk Tue Nov 6 05:15:43 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 06 Nov 2007 10:15:43 +0000
Subject: [Biojava-l] Error while reading byte data for creating a Trace.
In-Reply-To: <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu>
References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk>
	<2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu>
Message-ID: <47303ECF.4020806@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I suspect the byte array itself may contain inaccurate data.

Internally, both the URL and File constructors read the data into a byte
array and then pass it to the same method as is used by the byte[]
constructor.

So, something must be different between the byte array you have, and the
byte array obtained by reading the file in.
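
One way to act on this suggestion is to load the same .ab1 file as raw bytes and look for the first position where the two arrays disagree. The following is only a minimal sketch under stated assumptions: `TraceBytes`, `readAll` and `firstDifference` are hypothetical names introduced for illustration and are not part of BioJava; only standard `java.io` classes are used, and the sample bytes in `main` are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration only; not part of BioJava. */
final class TraceBytes {

    /** Reads a stream to the end as raw bytes; no Reader or String is involved. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            baos.write(buffer, 0, n);
        }
        return baos.toByteArray();
    }

    /** Returns the index of the first differing byte, or -1 if the arrays are identical. */
    static int firstDifference(byte[] expected, byte[] actual) {
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        // Same prefix: either identical, or one array is a shorter copy of the other.
        return expected.length == actual.length ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        // Made-up stand-in data; a real check would read the .ab1 file instead.
        byte[] expected = readAll(new ByteArrayInputStream(new byte[] {'A', 'B', 'I', 'F', 0, 101}));
        byte[] actual = {'A', 'B', 'I', 'F', 0, 63};
        System.out.println("lengths: " + expected.length + " vs " + actual.length);
        System.out.println("first difference at index: " + firstDifference(expected, actual));
    }
}
```

In practice `expected` would come from `readAll(new FileInputStream(...))` on the original file, and `actual` would be the array handed to the byte[] constructor. A mismatch at a very early index, or a different length, suggests that bytes were altered or dropped before the array reached the constructor.
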

The File constructor uses the following code to read the file:

    byte[] bytes = null;
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    FileInputStream fis = new FileInputStream(ABIFile);
    BufferedInputStream bis = new BufferedInputStream(fis);
    int b;
    while ((b = bis.read()) >= 0)
    {
        baos.write(b);
    }
    bis.close(); fis.close(); baos.close();
    bytes = baos.toByteArray();

If the above code produces different results to your byte array when
reading data from the same file as your code, then something has gone
wrong with the construction of your byte array.

Lastly, a full stack trace would help us pinpoint the line that is
breaking, and hopefully provide a hint as to what is wrong with the
contents of the byte array. If you could provide one that would be very
helpful.

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHMD7P4C5LeMEKA/QRAmGIAJ9a/V6nZqMROz3H4u69ECQ+9iTgMgCeNZvr
oe52S3khmTvi5BFCL1W4KHM=
=5JAO
-----END PGP SIGNATURE-----

From holland at ebi.ac.uk Tue Nov 6 11:53:54 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 06 Nov 2007 16:53:54 +0000
Subject: [Biojava-l] Error while reading byte data for creating a Trace.
In-Reply-To: <4730A6F1.9050407@cc.gatech.edu>
References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk>
	<2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu>
	<47303ECF.4020806@ebi.ac.uk>
	<4730A6F1.9050407@cc.gatech.edu>
Message-ID: <47309C22.10803@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I think that either the file is at fault, or the method you are using to
read the file into Java is at fault.

Could you provide us with the complete piece of code you are using from
the point where you read the file into the array through to the point
where you generate the output you quoted? (Not as an attachment as the
mailing list will strip those - simply paste it into the message body
instead).

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHMJwi4C5LeMEKA/QRAhAOAJ0ZjIWk1CXSLYlU2CUCp7xodAfFeACgjtFG
T1Z8W0JhCe7+hx5rbKLGqVk=
=qNcr
-----END PGP SIGNATURE-----

From abhi232 at cc.gatech.edu Tue Nov 6 13:03:02 2007
From: abhi232 at cc.gatech.edu (abhinav)
Date: Tue, 06 Nov 2007 12:03:02 -0600
Subject: [Biojava-l] Error while reading byte data for creating a Trace.
In-Reply-To: <47309C22.10803@ebi.ac.uk>
References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk>
	<2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu>
	<47303ECF.4020806@ebi.ac.uk>
	<4730A6F1.9050407@cc.gatech.edu>
	<47309C22.10803@ebi.ac.uk>
Message-ID: <4730AC56.9060808@cc.gatech.edu>

OK, here is the code that I am using. I establish a connection with a
PHP page, which in turn reads the file and prints the content back to
me. I am using DataOutputStream for sending data and BufferedReader for
reading the response. Then I read the data into a String and convert it
to a byte[] array. This is the code where the connection is established
and the data is received and displayed.

    private HttpURLConnection httpConn;
    private DataOutputStream out;
    private DataInputStream temp_stream;
    private BufferedReader in;
    private BufferedInputStream in_buff_stream;
    private String str ;
    private byte[] bytearray;
    Chromatogram abif_chromatogram;

    /** Creates a new instance of testPost */
    public testPost()
    {
        httpConn = null;
        str = new String("");
        bytearray = new byte[144930];
    }

    public byte[] create_and_write_Connection(String url,String data_request)
    {
        try
        {
            URL conn_url = new URL(url);
            httpConn = (HttpURLConnection)conn_url.openConnection();
            httpConn.setDoOutput(true);
            httpConn.setDoInput(true);
            httpConn.setRequestMethod("POST");
            out=new DataOutputStream(httpConn.getOutputStream());
            out.writeBytes(data_request);
            out.flush();
            System.out.println("Connection established successfully and data written");
            InputStreamReader in_stream = new InputStreamReader(httpConn.getInputStream());
            System.out.println("The character encoding used is:"+ in_stream.getEncoding());
            in = new BufferedReader(in_stream);
            System.out.println("Data acceptance started");
            while(in.readLine()!=null)
            {
                str += in.readLine();
            }
            System.out.println("The string to be returned is:"+str);
            bytearray = str.getBytes("ISO8859-1");
            String temp_string = new String(bytearray,"windows-1252");
            System.out.println("The encoded string is as follows:"+ temp_string);
            System.out.println("The size of byte array inside testpost is:"+ Array.getLength(bytearray));
            for(int i = 0 ; i < 3 ; i ++)
                System.out.println("The bytes that i want are:"+ bytearray[i]);
            return bytearray;
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
        return bytearray;
    }

Please guide me on this point.

Thanks
Abhinav

From holland at ebi.ac.uk Tue Nov 6 12:05:12 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 06 Nov 2007 17:05:12 +0000
Subject: [Biojava-l] Error while reading byte data for creating a Trace.
In-Reply-To: <4730AC56.9060808@cc.gatech.edu>
References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk>
	<2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu>
	<47303ECF.4020806@ebi.ac.uk>
	<4730A6F1.9050407@cc.gatech.edu>
	<47309C22.10803@ebi.ac.uk>
	<4730AC56.9060808@cc.gatech.edu>
Message-ID: <47309EC8.2070904@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The String is where you're going wrong. ABI files are not Stringifyable
- they are binary data. Converting them to a String will corrupt them.

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHMJ7I4C5LeMEKA/QRAupLAJ9YDoGohk5uZSNYZnRRMJ5WeNDpGgCfdCyg
+Z/gXBbPmrG3SuQlfeHuD3A=
=akSf
-----END PGP SIGNATURE-----

From abhi232 at cc.gatech.edu Tue Nov 6 12:40:01 2007
From: abhi232 at cc.gatech.edu (abhinav)
Date: Tue, 06 Nov 2007 11:40:01 -0600
Subject: [Biojava-l] Error while reading byte data for creating a Trace.
In-Reply-To: <47303ECF.4020806@ebi.ac.uk>
References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk>
	<2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu>
	<47303ECF.4020806@ebi.ac.uk>
Message-ID: <4730A6F1.9050407@cc.gatech.edu>

Yes, I looked at the ABITrace source and found that either the first three
characters or characters 128-130 must be ABI. But I cannot find that in
the file that I have. Also, if this is not the case then there should be
an illegal format exception, whereas I am getting an
ArrayIndexOutOfBoundsException, which is also weird.
I am getting the following stack trace.
The bytes that i want are:0 The bytes that i want are:11 The bytes that i want are:0 The size of the byte array generated is:144930 Byte array also recieved java.lang.ArrayIndexOutOfBoundsException: 128 at org.biojava.bio.program.abi.ABITrace.isABI(ABITrace.java:552) at org.biojava.bio.program.abi.ABITrace.initData(ABITrace.java:289) at org.biojava.bio.program.abi.ABITrace.(ABITrace.java:136) at Trace.init(Trace.java:138) at sun.applet.AppletPanel.run(Unknown Source) at java.lang.Thread.run(Unknown Source) The bytes I want are the first three bytes that I want to check if my file is ABI or not.I checked the isABI function as well it returns true or false value and not arrayIndexOutOfBouond . Also the number 128 does it hve any significance in this case? Thanks in advance Abhinav From walsh at andrew.cmu.edu Tue Nov 6 12:23:36 2007 From: walsh at andrew.cmu.edu (Andrew Walsh) Date: Tue, 06 Nov 2007 12:23:36 -0500 Subject: [Biojava-l] Error while reading byte data for creating a Trace. In-Reply-To: <4730AC56.9060808@cc.gatech.edu> References: <6EDA8DA0-39B2-40A3-B3B3-DB5F3463DB51@sanger.ac.uk> <2839.130.207.66.142.1194285555.squirrel@webmail.cc.gatech.edu> <47303ECF.4020806@ebi.ac.uk> <4730A6F1.9050407@cc.gatech.edu> <47309C22.10803@ebi.ac.uk> <4730AC56.9060808@cc.gatech.edu> Message-ID: <4730A318.8010406@andrew.cmu.edu> You also appear to be losing every other line with the following code: while(in.readLine()!=null) { str += in.readLine(); } Every time the while statement checks its condition, a line is read from the inputstream. That line is never stored. Then, if the condition is met, another line is read and that line is added to your String. -Andy abhinav wrote: > Richard Holland wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> I think that either the file is at fault, or the method you are using to >> read the file into Java is at fault. 
>> >> Could you provide us with the complete piece of code you are using from >> the point where you read the file into the array through to the point >> where you generate the output you quoted? (Not as an attachment as the >> mailing list will strip those - simply paste it into the message body >> instead). >> >> cheers, >> Richard >> >> >> abhinav wrote: >> >> >>> Richard Holland wrote: >>> I suspect the byte array itself may contain inaccurate data. >>> >>> Internally, both the URL and File constructors read the data into a byte >>> array and then pass it to the same method as is used by the byte[] >>> constructor. >>> >>> So, something must be different between the byte array you have, and the >>> byte array obtained by reading the file in. >>> >>> The File constructor uses the following code to read the file: >>> >>> byte[] bytes = null; >>> ByteArrayOutputStream baos = new ByteArrayOutputStream(); >>> FileInputStream fis = new FileInputStream(ABIFile); >>> BufferedInputStream bis = new BufferedInputStream(fis); >>> int b; >>> while ((b = bis.read()) >= 0) >>> { >>> baos.write(b); >>> } >>> bis.close(); fis.close(); baos.close(); >>> bytes = baos.toByteArray(); >>> >>> If the above code produces different results to your byte array when >>> reading data from the same file as your code, then something has gone >>> wrong with the construction of your byte array. >>> >>> Lastly, a full stack trace would help us pinpoint the line that is >>> breaking, and hopefully provide a hint as to what is wrong with the >>> contents of the byte array. If you could provide one that would be very >>> helpful. 
>>>
>>> cheers,
>>> Richard
>>>
>>>
>>> abhi232 at cc.gatech.edu wrote:
>>>
>>>
>>>>>> Hi all,
>>>>>> I have a byte array which holds the data from an .ab1 file. The
>>>>>> BioJava library provides a class called ABITrace which takes as input
>>>>>> either a byte[] array, a file or a URL. If I use the latter parameters
>>>>>> (the file or the URL) the program works, but if I pass the byte array
>>>>>> to the constructor I get java.lang.ArrayIndexOutOfBoundsException.
>>>>>> Is there a problem with the ABITrace class, or how can I bypass this
>>>>>> particular error?
>>>>>> I am printing the length of the byte array and it comes to 144930. Can
>>>>>> that cause a problem in my code?
>>>>>>
>>>>>> Thanks in advance.
>>>>>> Abhinav
>>>>>> _______________________________________________
>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>>
>>
>>
>>> Yes, I looked at the file ABITrace and found out that the first three
>>> characters must be ABI or characters 128-130 must be ABI. But I
>>> cannot find that in the file that I have. Also, if this is not the
>>> case then there should be an illegal format exception, whereas I am
>>> getting an ArrayIndexOutOfBoundsException, which is also weird.
>>> I am getting the following stack trace.
>>> The bytes that i want are:0
>>> The bytes that i want are:11
>>> The bytes that i want are:0
>>> The size of the byte array generated is:144930
>>> Byte array also recieved
>>> java.lang.ArrayIndexOutOfBoundsException: 128
>>>     at org.biojava.bio.program.abi.ABITrace.isABI(ABITrace.java:552)
>>>     at org.biojava.bio.program.abi.ABITrace.initData(ABITrace.java:289)
>>>     at org.biojava.bio.program.abi.ABITrace.<init>(ABITrace.java:136)
>>>     at Trace.init(Trace.java:138)
>>>     at sun.applet.AppletPanel.run(Unknown Source)
>>>     at java.lang.Thread.run(Unknown Source)
>>> The bytes I want are the first three bytes, which I check to see whether
>>> my file is ABI or not. I checked the isABI function as well; it returns
>>> a true or false value, not an ArrayIndexOutOfBoundsException. Also, does
>>> the number 128 have any significance in this case?
>>> Thanks in advance
>>> Abhinav
>>>
>>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFHMJwi4C5LeMEKA/QRAhAOAJ0ZjIWk1CXSLYlU2CUCp7xodAfFeACgjtFG
>> T1Z8W0JhCe7+hx5rbKLGqVk=
>> =qNcr
>> -----END PGP SIGNATURE-----
>>
>>
> OK, yes, here is the code that I am using. I establish a connection with a
> PHP page which in turn reads the file and prints the content back to
> me. I am using DataOutputStream for sending data and BufferedReader for
> taking in the data. Then I am reading the data into a string and
> converting it to a byte[] array. This is the code where the connection is
> established and the data is taken and displayed.
>
>
>
>     private HttpURLConnection httpConn;
>     private DataOutputStream out;
>     private DataInputStream temp_stream;
>     private BufferedReader in;
>     private BufferedInputStream in_buff_stream;
>     private String str ;
>     private byte[] bytearray;
>     Chromatogram abif_chromatogram;
>
>     /** Creates a new instance of testPost */
>     public testPost()
>     {
>
>         httpConn = null;
>         str = new String("");
>         bytearray = new byte[144930];
>
>     }
>     public byte[] create_and_write_Connection(String url,String data_request)
>     {
>         try
>         {
>             URL conn_url = new URL(url);
>             httpConn = (HttpURLConnection)conn_url.openConnection();
>             httpConn.setDoOutput(true);
>             httpConn.setDoInput(true);
>             httpConn.setRequestMethod("POST");
>             out=new DataOutputStream(httpConn.getOutputStream());
>             out.writeBytes(data_request);
>             out.flush();
>             System.out.println("Connection established successfully and data written");
>             InputStreamReader in_stream = new InputStreamReader(httpConn.getInputStream());
>
>             System.out.println("The character encoding used is:"+ in_stream.getEncoding());
>             in = new BufferedReader(in_stream);
>
>
>             System.out.println("Data acceptance started");
>
>
>             while(in.readLine()!=null)
>             {
>                 str += in.readLine();
>             }
>             System.out.println("The string to be returned is:"+str);
>             bytearray = str.getBytes("ISO8859-1");
>             String temp_string = new String(bytearray,"windows-1252");
>             System.out.println("The encoded string is as follows:"+ temp_string);
>             System.out.println("The size of byte array inside testpost is:"+ Array.getLength(bytearray));
>             for(int i = 0 ; i < 3 ; i ++)
>                 System.out.println("The bytes that i want are:"+ bytearray[i]);
>             return bytearray;
>         }
>         catch(Exception e)
>         {
>             e.printStackTrace();
>         }
>         return bytearray;
>     }
> Please guide me on this point
> Thanks
> Abhinav
>

From holland at ebi.ac.uk Thu
Nov 8 08:53:09 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 08 Nov 2007 13:53:09 +0000
Subject: [Biojava-l] BioJava 3 Proposals
Message-ID: <473314C5.8070207@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dear BioJava users,

The BioJava developers are considering options for the future development of the BioJava toolkit. We consider that it needs improvement in a few major areas to make it easier to use and understand, and also faster and more scalable.

The options are to either rewrite large parts of the existing code, working within the existing interfaces and paradigms, or to develop a new set of BioJava packages from the ground up in order to take advantage of lessons learned from the design patterns of the existing code.

The BioJava developers have spent the last couple of months discussing ideas and proposals related to these options on a Wiki page, and would now like to open this discussion to all users of BioJava and the bioinformatics community in general. We would like to invite anyone who has any ideas or suggestions to contribute these to the Wiki page, and/or to comment on the ideas and suggestions that have already been posted there.

Here is a link to the Wiki page, and also a link to the associated Talk page where much of the discussion has taken place so far:

http://biojava.org/wiki/BioJava3_Proposal
http://biojava.org/wiki/Talk:BioJava3_Proposal

It is our intention to leave the discussion open until mid-January 2008 when we will summarise it and use it as the basis of a plan of action. We will then distribute the summary and the action plan via the BioJava website.

We look forward to hearing your comments and ideas. Please do remember to make them directly to the Wiki page so that they are preserved in context, making it easier for us to summarise them later!

cheers,
Richard
(on behalf of all BioJava developers)

PS. Just to reassure you, this is NOT a plan to drop the existing codebase.
It will continue to exist, but the outcome of these discussions will determine whether we will continue to develop and support it or start afresh with a clean slate and a new codebase.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHMxTE4C5LeMEKA/QRAlGSAJwKzO0oAe3T2e8ibcG8uRReOVfh7wCdGlwn
JkcVzA55Ye32o8Ry48LO+04=
=oaaC
-----END PGP SIGNATURE-----

From holland at ebi.ac.uk Thu Nov 8 08:58:23 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 08 Nov 2007 13:58:23 +0000
Subject: [Biojava-l] Biojava wiki
Message-ID: <473315FF.70506@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

What's happened to the BioJava wiki today? I get errors from all pages, including the front page, indicating zero-sized replies.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHMxX/4C5LeMEKA/QRAmBPAJ9hx450OqBsD8s4DPgL8LsvpD4aRwCfZA62
6KkoyXhahrWkZo2OWyCL+Uk=
=1jK7
-----END PGP SIGNATURE-----

From phidias51 at gmail.com Thu Nov 8 10:39:29 2007
From: phidias51 at gmail.com (Mark Fortner)
Date: Thu, 8 Nov 2007 07:39:29 -0800
Subject: [Biojava-l] Biojava wiki
In-Reply-To: <473315FF.70506@ebi.ac.uk>
References: <473315FF.70506@ebi.ac.uk>
Message-ID: <6e1d61f50711080739t6df72848se87e6001f97d01ce@mail.gmail.com>

Richard,
That's odd. It comes up fine for me.

BTW, in your proposal you mentioned that people had "moved on". I was wondering what types of tasks they had moved on to, and what should be included in the Proposal to ensure that BioJava stays relevant to them?

Regards,

Mark

On Nov 8, 2007 5:58 AM, Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> What's happened to the BioJava wiki today? I get errors from all pages,
> including the front page, indicating zero-sized replies.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHMxX/4C5LeMEKA/QRAmBPAJ9hx450OqBsD8s4DPgL8LsvpD4aRwCfZA62
> 6KkoyXhahrWkZo2OWyCL+Uk=
> =1jK7
> -----END PGP SIGNATURE-----
>

From hlapp at gmx.net Thu Nov 8 10:53:03 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 8 Nov 2007 10:53:03 -0500
Subject: [Biojava-l] small "bug" correction in package BioSql
In-Reply-To: <762277.43372.qm@web26507.mail.ukl.yahoo.com>
References: <762277.43372.qm@web26507.mail.ukl.yahoo.com>
Message-ID:

Indeed Biojava uses uppercase for alphabet. In Bioperl-db, we explicitly lowercase the value found for alphabet, and the comment says why:

    # Note: Biojava uses upper-case terms for alphabet, so we
    # need to change to all-lower in case the sequence was
    # manipulated by Biojava.
    $obj->alphabet(lc($rows->[3])) if $rows->[3];

However, when inserting sequences, we leave the value as is in BioPerl (which is lowercase), leading to a potential problem for Biojava upon retrieval. Do the Biojava folks deal with that? Should this maybe be harmonized across the board?

-hilmar

On Nov 8, 2007, at 6:49 AM, Eric Gibert wrote:

> Dear Peter,
>
> All the alphabet values are "DNA" (upper case) in my database. The
> sequences are taken from NCBI by a BioJava application.
> Thus it should be that BioJava inserts the records with "DNA". Thus
> no potential "hidden bug" in BioPython.
>
> Maybe a point to share with the Open-Bio committee.
>
> Eric
>
> ----- Original message ----
> From: Peter
> To: Eric Gibert
> Cc: biopython at lists.open-bio.org
> Sent: Thursday, 8 November 2007, 19:40:00
> Subject: Re: [BioPython] small "bug" correction in package BioSql
>
> Eric Gibert wrote:
>> Dear all,
>>
>> In BioSeq/BioSeq.py, in the class DBSeq definition, we have the
>> function:
>>
>> ...
>>
>> Please note my correction: force moltype to be turned into lower case, as
>> my database has upper-case values! This raises the "Unknown moltype"
>> error.
>
> Hi Eric, I've made your suggested change in CVS,
> biopython/BioSQL/BioSeq.py revision 1.13, thank you.
>
> I would encourage you to investigate why some of the "alphabet" fields
> in the biosequence table are in upper case. There could be a bug
> elsewhere which is writing these entries with the wrong alphabet. Is
> this affecting all entries, or just some?
>
> Peter
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From holland at ebi.ac.uk Thu Nov 8 11:17:25 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 08 Nov 2007 16:17:25 +0000
Subject: [Biojava-l] Biojava wiki
In-Reply-To: <6e1d61f50711080739t6df72848se87e6001f97d01ce@mail.gmail.com>
References: <473315FF.70506@ebi.ac.uk>
    <6e1d61f50711080739t6df72848se87e6001f97d01ce@mail.gmail.com>
Message-ID: <47333695.40808@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> BTW, in your proposal you mentioned that people had "moved on". I was
> wondering what types of tasks they had moved on to, and what should be
> included in the Proposal to ensure that BioJava stays relevant to them?

Good point.
From what we can tell, people are not so sequence-focused any more but are more interested in features, alignments, population data, etc. - more 'metadata' so to speak. We do need some mechanism to ensure that we are correct in this thinking, and that future shifts in direction are catered for in this design phase.

Could you add a note to the wiki with your points, and/or any ideas you may have about ensuring these requirements are met?

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHMzaV4C5LeMEKA/QRAoPUAJ0TQ+xFF1J3EtZgHmvYj2HH41koCgCeLYm0
D5Z7SJDWjvJ9rbCrS+RTEeI=
=XhE1
-----END PGP SIGNATURE-----

From holland at ebi.ac.uk Thu Nov 8 11:18:46 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 08 Nov 2007 16:18:46 +0000
Subject: [Biojava-l] small "bug" correction in package BioSql
In-Reply-To:
References: <762277.43372.qm@web26507.mail.ukl.yahoo.com>
Message-ID: <473336E6.6000100@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

we do need a consensus here.

I'm happy to go with whatever value is chosen, as the BioJava code can easily be modified to suit.

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHMzbm4C5LeMEKA/QRAtzGAJ98MKWg0uUOafDVVkihSzfSTwtfxACgi6q3
9x+CUHig3GfBCZ56rDb1ZG4=
=OJyB
-----END PGP SIGNATURE-----

From hlapp at gmx.net Thu Nov 8 15:28:19 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 8 Nov 2007 15:28:19 -0500
Subject: [Biojava-l] [BioPython] error on insert new sequences from GenBank: no annotations saved in BioSQL database
In-Reply-To: <499834.44468.qm@web26501.mail.ukl.yahoo.com>
References: <499834.44468.qm@web26501.mail.ukl.yahoo.com>
Message-ID:

Maybe we need to hold some mini-hackathon to make the different toolkits compatible in how they map annotation to the schema.

Obviously I don't know whether you have the latest Biojava setup here, but I'll just comment how BioPerl/Bioperl-db would map this:

'ORIGIN' - if I'm not mistaken this is only a token that introduces the actual sequence. I'm not sure what Biojava is storing as value here.
'DIVISION' - this maps to column division in table bioentry (though I agree that if perfectly following the weak typing principle this should be tag/value association, but at present it's still an actual column)
'genbank_accessions' - secondary accession numbers indeed go into the qualifier value table. The primary accession maps to column accession in table bioentry
'TITLE' - this is part of a publication reference, and should map to column title in table reference (which it does in bioperl-db)
'cross_references' - not sure where these would be coming from in GenBank format; for EMBL this will map to the dbxref table
'data_file_division' - not sure what this is (same as DIVISION?)
'VERSION' - in BioPerl we parse this apart into a version for the accession (which is column version in table bioentry) and the GI number, which maps to column identifier in table bioentry
'references' - these map to table reference (and bioentry_reference for association with the bioentry)
'KEYWORDS' - indeed these map to bioentry_qualifier_value
'GI' - maps to column identifier in table bioentry
'SIZE' - not sure what size that is. If it is the length of the sequence, it should (and in BioPerl/bioperl-db does) map to column length in table biosequence
'DEFINITION' - maps to column description in table bioentry
'REFERENCE' - should be the same as for 'references'
'MDAT' - not sure what this is
'ORGANISM' - this is the organism and maps to the table taxon (and taxon_name), with a foreign key in bioentry pointing to the taxon
'JOURNAL' - this is part of a reference, see 'references'
'ACCESSION' - the primary accession, maps to column accession in table bioentry
'LOCUS' - in the file itself this is an entire line consisting of multiple fields; BioPerl/bioperl-db maps the locus name (the first token after the literal token LOCUS) to column name in table bioentry
'SOURCE' - this is the organism, see 'ORGANISM'
'PUBMED' - this is part of a literature reference, and maps to a foreign key in the reference table (reference.dbxref) to a dbxref entry with PUBMED or PMID as the database and the pubmed ID as the accession
'AUTHORS' - part of a literature reference, maps to column authors in table reference
'TYPE' - not sure what this is. If it's the alphabet, it maps to table biosequence, column alphabet
'CIRCULAR' - this at present indeed maps to bioentry_qualifier_value, though there have been plans to make it a column in table biosequence. Note that this could in fact be the way Biojava stores it too, but upon retrieval represents it in the way you are seeing it.
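The mapping listed above could be written down as a small lookup table that each toolkit checks its annotation keys against. This is only a sketch: the class and method names are hypothetical, the "table.column" strings are shorthand for the BioSQL locations named in the list, and only the keys whose target is stated unambiguously are included.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table; not part of BioJava, BioPerl or Biopython.
final class GenBankToBioSqlSketch {

    // Annotation key -> BioSQL location, as described in the list above.
    static final Map<String, String> TARGETS = new LinkedHashMap<>();
    static {
        TARGETS.put("DIVISION", "bioentry.division");
        TARGETS.put("ACCESSION", "bioentry.accession");
        TARGETS.put("GI", "bioentry.identifier");
        TARGETS.put("DEFINITION", "bioentry.description");
        TARGETS.put("LOCUS", "bioentry.name");
        TARGETS.put("TITLE", "reference.title");
        TARGETS.put("AUTHORS", "reference.authors");
        TARGETS.put("KEYWORDS", "bioentry_qualifier_value");
        TARGETS.put("CIRCULAR", "bioentry_qualifier_value");
    }

    // Returns the agreed location, or "unmapped" for keys with no stated target.
    static String targetFor(String annotationKey) {
        return TARGETS.getOrDefault(annotationKey, "unmapped");
    }

    public static void main(String[] args) {
        for (String key : new String[] {"DEFINITION", "KEYWORDS", "MDAT"}) {
            System.out.println(key + " -> " + targetFor(key));
        }
    }
}
```

Keys such as 'MDAT' or 'SIZE', whose meaning is uncertain in the list, are deliberately left unmapped here rather than guessed.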
Hth,

-hilmar

On Nov 8, 2007, at 12:50 PM, Eric Gibert wrote:

> Dear all,
>
> When I retrieve a BioSQL.BioSeq.DBSeqRecord which was inserted
> previously by my BioJava application, I have:
>
> print "Debug on Seq:", Seq.id, "=", Seq.annotations.keys()
>
> Debug on Seq: AJ459190.1 = ['ORIGIN', 'DIVISION',
> 'genbank_accessions', 'TITLE', 'cross_references',
> 'data_file_division', 'VERSION', 'references', 'KEYWORDS', 'GI',
> 'SIZE', 'DEFINITION', 'REFERENCE', 'MDAT', 'ORGANISM', 'JOURNAL',
> 'ACCESSION', 'LOCUS', 'SOURCE', 'PUBMED', 'AUTHORS', 'TYPE',
> 'CIRCULAR']
>
> but a freshly inserted BioSeq by BioPython 1.44 only gives me:
> Debug on Seq: EF631597.1 = ['cross_references', 'dates',
> 'references', 'gi', 'data_file_division']
>
>
> Once I look in the table bioentry_qualifier_value
>
> * 20 records for a Sequence imported by BioJava
> * 1 only for a Sequence inserted by BioPython: the date which
> should be inserted by "_load_bioentry_date" in BioSQL/Loader.py
>
> Quite a few annotations missing, no?
>
> Any idea?
>
> Eric

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Thu Nov 8 15:30:29 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 8 Nov 2007 15:30:29 -0500
Subject: [Biojava-l] small "bug" correction in package BioSql
In-Reply-To: <473336E6.6000100@ebi.ac.uk>
References: <762277.43372.qm@web26507.mail.ukl.yahoo.com>
    <473336E6.6000100@ebi.ac.uk>
Message-ID: <9FF48B4B-74F1-4371-BBB5-541F1A70D88F@gmx.net>

It seems BioPerl and Biopython both want (and have traditionally used) lowercase - do you mind going with that for Biojava as well, or alternatively, simply map upon insert/update and retrieve?

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From holland at ebi.ac.uk Fri Nov 9 03:39:01 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 09 Nov 2007 08:39:01 +0000
Subject: [Biojava-l] small "bug" correction in package BioSql
In-Reply-To: <9FF48B4B-74F1-4371-BBB5-541F1A70D88F@gmx.net>
References: <762277.43372.qm@web26507.mail.ukl.yahoo.com>
    <473336E6.6000100@ebi.ac.uk>
    <9FF48B4B-74F1-4371-BBB5-541F1A70D88F@gmx.net>
Message-ID: <47341CA5.9080509@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

i'll see what i can do.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHNByl4C5LeMEKA/QRAmCzAJ9fxSm8l5YAEHAUe2hH+Gwc1Xe5IwCfcMf6
c9sy8lASDV069FQJ79Geemw=
=RHM1
-----END PGP SIGNATURE-----

From holland at ebi.ac.uk Fri Nov 9 07:42:38 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 09 Nov 2007 12:42:38 +0000
Subject: [Biojava-l] small "bug" correction in package BioSql
In-Reply-To: <9FF48B4B-74F1-4371-BBB5-541F1A70D88F@gmx.net>
References: <762277.43372.qm@web26507.mail.ukl.yahoo.com>
    <473336E6.6000100@ebi.ac.uk>
    <9FF48B4B-74F1-4371-BBB5-541F1A70D88F@gmx.net>
Message-ID: <473455BE.6040807@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I did a bit of poking around in our code and internally BioJava represents all the default alphabet names (Protein, DNA, etc.) in upper case. It also allows for mixed case alphabet names.

It's not quite as easy as I thought to change these to lower case as they are often referenced by text name, meaning other people's code might break if I change them. Also, as it allows for mixed-case alphabet names, I can't do a toUpper/toLower fudge on persistence to BioSQL, as I wouldn't necessarily get out what I put in!

So, I think I'll add this as a point on the recently announced BioJava 3 proposal, that BioSQL interaction must be compliant with standards laid down by the BioSQL project, and that our code will be able to cope with this internally.

That brings us back to BioSQL standards - the idea of a mini-hackathon to solve this once and for all is a very good one.
Our previous attempts between BioPerl and BioJava in Singapore were good, but still there are niggles as seen in this thread of discussion. It seems that a schema on its own just isn't enough to make the various projects play nicely, and instructions are needed on exactly how to use that schema if they are truly all going to be able to use it without caring who or what wrote the data that is being read. cheers, Richard Hilmar Lapp wrote: > It seems BioPerl and Biopython both want (and have traditionally used) > lowercase - do you mind going with that for Biojava as well, or > alternatively, simply map upon insert/update and retrieve? > > -hilmar > > On Nov 8, 2007, at 11:18 AM, Richard Holland wrote: > > we do need a consensus here. > > I'm happy to go with whatever value is chosen, as the BioJava code can > easily be modified to suit. > > cheers, > Richard > > Hilmar Lapp wrote: >>>> Indeed Biojava uses uppercase for alphabet. In Bioperl-db, we >>>> explicitly lowercase the value found for alphabet, and the comment >>>> says why: >>>> >>>> # Note: Biojava uses upper-case terms for alphabet, so we >>>> # need to change to all-lower in case the sequence was >>>> # manipulated by Biojava. >>>> $obj->alphabet(lc($rows->[3])) if $rows->[3]; >>>> >>>> However, when inserting sequences, we leave the value as is in >>>> BioPerl (which is lowercase), leading to a potential problem for >>>> Biojava upon retrieval. Do the Biojava folks deal with that? Should >>>> this be harmonized across the board? >>>> >>>> -hilmar >>>> >>>> On Nov 8, 2007, at 6:49 AM, Eric Gibert wrote: >>>> >>>>> Dear Peter, >>>>> >>>>> All the alphabets are "DNA" (upper case) in my database. The >>>>> sequences are taken from NCBI by a BioJava application. >>>>> Thus it should be that BioJava inserts the records with "DNA". Thus >>>>> no potential "hidden bug" in BioPython. >>>>> >>>>> Maybe a point to share with the Open-Bio committee.
>>>>> >>>>> Eric >>>>> >>>>> ----- Original Message ---- >>>>> From: Peter >>>>> To: Eric Gibert >>>>> Cc: biopython at lists.open-bio.org >>>>> Sent: Thursday, 8 November 2007, 19:40:00 >>>>> Subject: Re: [BioPython] small "bug" correction in package BioSql >>>>> >>>>> Eric Gibert wrote: >>>>>> Dear all, >>>>>> >>>>>> In BioSeq/BioSeq.py, in the class DBSeq definition, we have the >>>>>> function: >>>>>> >>>>>> ... >>>>>> >>>>>> please note my correction: force moltype to be turned into lower case as >>>>>> my database has upper case values! This raises the "Unknown moltype" >>>>>> error. >>>>> Hi Eric, I've made your suggested change in CVS, >>>>> biopython/BioSQL/BioSeq.py revision 1.13, thank you. >>>>> >>>>> I would encourage you to investigate why some of the "alphabet" fields >>>>> in the biosequence table are in upper case. There could be a bug >>>>> elsewhere which is writing these entries with the wrong alphabet. Is >>>>> this affecting all entries, or just some? >>>>> >>>>> Peter >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ______________________________________________________________________ >>>>> _______ >>>>> Keep just one email address! Copy your emails to >>>>> Yahoo!
Mail >>>>> _______________________________________________ >>>>> BioPython mailing list - BioPython at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biopython >>>> > --=========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHNFW84C5LeMEKA/QRApBiAJ41WqCDKOJhee5NxIsquYaR/ImBRgCfb7zM LX75HHvCUC/v4n3okmUQ+ME= =d6QO -----END PGP SIGNATURE----- From email2ants at gmail.com Fri Nov 9 12:55:36 2007 From: email2ants at gmail.com (Anthony Underwood) Date: Fri, 9 Nov 2007 17:55:36 +0000 Subject: [Biojava-l] Getting a base from an alignment (way to complex?) Message-ID: Hi All, I've generated an alignment and I am retrieving positions within the alignment using Symbol base = alignment.symbolAt(label, i); I am trying to get whether the base at this position is G, A, T or C However when I use base.getName() it returns strings such as "thymine" The documentation states that the method getToken should also be available, but this returns method undefined. http://www.biojava.org/docs/api15/org/biojava/bio/symbol/Symbol.html Is there a simple way of retrieving a one letter textual representation of the symbol? Many thanks Anthony From zagato.gekko at gmail.com Fri Nov 9 13:48:02 2007 From: zagato.gekko at gmail.com (Zagato) Date: Fri, 9 Nov 2007 13:48:02 -0500 Subject: [Biojava-l] Getting a base from an alignment (way to complex?) In-Reply-To: References: Message-ID: <98028b00711091048k26c61fc7qc68b14d8d289c769@mail.gmail.com> Try with: String s = alignment.symbolListForLabel( label ).subStr( i, i+1 ); Bye... 
Alan Jairo Acosta Cali - Colombia On Nov 9, 2007 12:55 PM, Anthony Underwood wrote: > Hi All, > > I've generated an alignment and I am retrieving positions within the > alignment using > > Symbol base = alignment.symbolAt(label, i); > > I am trying to get whether the base at this position is G, A, T or C > > However when I use base.getName() it returns strings such as "thymine" > > The documentation states that the method getToken should also be > available, but this returns method undefined. > http://www.biojava.org/docs/api15/org/biojava/bio/symbol/Symbol.html > > Is there a simple way of retrieving a one letter textual > representation of the symbol? > > > Many thanks > > > Anthony > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Farewell. http://www.youtube.com/zagatogekko ruby << __EOF__ puts [ 111, 116, 97, 103, 97, 90 ].collect{|v| v.chr}.join.reverse __EOF__
From gwaldon at geneinfinity.org Fri Nov 9 13:45:10 2007 From: gwaldon at geneinfinity.org (George Waldon) Date: Fri, 09 Nov 2007 10:45:10 -0800 Subject: [Biojava-l] Getting a base from an alignment (way to complex?) Message-ID: <20071109184510.80580.qmail@mmm1924.dulles19-verio.com> Tokens are associated with alphabets. Get the tokenization from the alphabet using: SymbolTokenization = Alphabet.getTokenization("token"); Get the token from the tokenization using: String = SymbolTokenization.tokenizeSymbol(Symbol); Also, check the tutotial and the cookbook on the biojava web site at www.biojava.org, which are often more informative than the javadoc. Frankly speaking, I agree with you and we should have a method like String = Symbol.getToken(Alphabet,"token"); to do these operations simply and without loosing our hairs! Best luck, George > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- > bounces at lists.open-bio.org] On Behalf Of Anthony Underwood > Sent: Friday, November 09, 2007 9:56 AM > To: BioJava > Subject: [Biojava-l] Getting a base from an alignment (way to complex?)
> > Hi All, > > I've generated an alignment and I am retrieving positions within the > alignment using > > Symbol base = alignment.symbolAt(label, i); > > I am trying to get whether the base at this position is G, A, T or C > > However when I use base.getName() it returns strings such as "thymine" > > The documentation states that the method getToken should also be > available, but this returns method undefined. > http://www.biojava.org/docs/api15/org/biojava/bio/symbol/Symbol.html > > Is there a simple way of retrieving a one letter textual > representation of the symbol? > > > Many thanks > > > Anthony > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists From email2ants at gmail.com Fri Nov 9 18:23:01 2007 From: email2ants at gmail.com (Anthony Underwood) Date: Fri, 9 Nov 2007 23:23:01 +0000 Subject: [Biojava-l] Getting a base from an alignment (way to complex?) In-Reply-To: <98028b00711091048k26c61fc7qc68b14d8d289c769@mail.gmail.com> References: <98028b00711091048k26c61fc7qc68b14d8d289c769@mail.gmail.com> Message-ID: <70FC5536-E1B3-41C7-92BC-0B43A0E11E09@gmail.com> Hi Alan, Thanks for the suggestion. That was my first thought, but then I was thinking for amino acids this wouldn't work. I would have to use a hashmap to convert the amino acid to the appropriate single letter code. Hi George, I'll try your suggestion. As you say I think this is too much for something that should be a one liner. Thanks for your advice. Get the tokenization from the alphabet using: SymbolTokenization = Alphabet.getTokenization("token"); Get the token from the tokenization using: String = SymbolTokenization.tokenizeSymbol(Symbol); Thanks to both of you Anthony On 9 Nov 2007, at 18:48, Zagato wrote: > Try with: > String s = alignment.symbolListForLabel( label ).subStr( i, i+1 ); > > Bye... 
> > Alan Jairo Acosta > Cali - Colombia > > On Nov 9, 2007 12:55 PM, Anthony Underwood < email2ants at gmail.com> > wrote: > Hi All, > > I've generated an alignment and I am retrieving positions within the > alignment using > > Symbol base = alignment.symbolAt(label, i); > > I am trying to get whether the base at this position is G, A, T or C > > However when I use base.getName() it returns strings such as "thymine" > > The documentation states that the method getToken should also be > available, but this returns method undefined. http://www.biojava.org/docs/api15/org/biojava/bio/symbol/Symbol.html > > Is there a simple way of retrieving a one letter textual > representation of the symbol? > > > Many thanks > > > Anthony > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > -- > Farewell. > http://www.youtube.com/zagatogekko > ruby << __EOF__ > puts [ 111, 116, 97, 103, 97, 90 ].collect{|v| v.chr}.join.reverse > __EOF__ From hlapp at gmx.net Sat Nov 10 15:38:17 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 10 Nov 2007 15:38:17 -0500 Subject: [Biojava-l] error on insert new sequences from GenBank: no annotations saved in BioSQL database In-Reply-To: <001c01c8238b$2ec64070$6400a8c0@Gecko> References: <499834.44468.qm@web26501.mail.ukl.yahoo.com> <47336117.2010102@maubp.freeserve.co.uk> <001c01c8238b$2ec64070$6400a8c0@Gecko> Message-ID: <5DDEBCDE-C8DA-4B2C-86F4-47FDB82CADAC@gmx.net> Just a few comments below, specifically where no rows would in fact be what I expect: On Nov 10, 2007, at 6:16 AM, Eric Gibert wrote: > [...] > -------- For you information, I went thru the tables of my BioSQL > database: > [...] > 1) table bioentry: all column populated except for 'taxon_id' which > is NULL > (maybe I need an extra call for populating the 'taxon' table before?) 
Bioperl-db will try to look up (or create if necessary) the taxon from the taxon information attached to the sequence, but for BioPerl we actually recommend pre-loading the database with the NCBI taxonomy, which can be comfortably done with the script load_ncbi_taxonomy.pl that comes with BioSQL. > > 2) table bioentry_dbxref: no data inserted (always empty, even with > BioJava) This would mean that the sequence(s) have no dbxrefs. Note that for GenBank sequences that would be expected, since unfortunately, and unlike EMBL format, GenBank puts the dbxrefs into the feature table. > 3) table bioentry_qualifier_value: > > One entry only, for the 'term_id' = 149, rank = 1, and value = '07- > JUL-2005' > or other 'DD-MMM-YYYY' dates (see my remarks below) Below you say that your term table is empty, so I don't know why you can have a value here at all. > [...] > 5) table bioentry_relationships: no entry found (always empty, even > with > BioJava) If you load sequences, they won't have direct relationships to other sequences (except dbxrefs, but those are rather 'pointers' and are stored in their own table). In Bioperl-db, this table is used only if you load sequence clusters through Bio::Cluster objects (such as UniGene). > [...] > 7) table comment: no entry found (always empty, even with BioJava) Again, this is expected with GenBank. AFAIK GenBank format doesn't allow for comments at the level of the sequence. You would (i.e., should) find entries here if you load UniProt entries. > 8) table dbxref: some records are generated, for dbname 'PUBMED' > and 'Taxon' > with the correct value Taxon obviously isn't really a dbxref, but rather a taxon (and hence should go into that table). > [...] > 9) table dbxref_qualifier_value: (always empty, even with BioJava) That's almost expected. There are rather few cases where dbxrefs have additional attributes that the language can parse out from a source (and then map to the schema). > [...]
> 10) table location: all locations loaded correctly, note that > 'term_id' and > 'dbxref_id' remain NULL for these seq but I have value for other seq. Theoretically, the term_id should point to the term giving the type of the location. If you (or Biopython) are only dealing with simple ('normal') locations, then it's not needed. The dbxref_id gives the reference to the remote sequence if the location for a feature refers to a different sequence than the feature itself does (so-called 'remote locations'). If the sequences you loaded don't have such locations, there this would be expected to be empty (or if Biopython doesn't handle such locations). > 11) table location_qualifier_value: always empty, even with BioJava This is expected if Biopython doesn't support fuzzy locations, or if none of the feature locations that you loaded are fuzzy. > [...] > 13) Table reference: entries correct, note 'dbxref_id' remains NULL > for > these seq but I have value for other seq. It should point to the pubmed ID for the reference but only if there was one. > 14) table seqfeature: entries are there (same as in table 'location'). > FYI:'display_name is always NULL. GenBank doesn't give names to features (and I think EMBL does neither), so this is expected. > 15) table seqfeature_dbxref: always empty, even with BioJava That's likely more to do with your language object model than with anything else. dbxref annotation for features is in tag/value pairs, just as any other, so your language (Biopython in this case) will have to do a lot of interpretation to tease out the semantics behind each tag name and based on that decide what to do with the value. Indeed, by default we don't even do this in BioPerl. > [...] > 17) table seqfeature_relationship: always empty, even with BioJava GenBank (and EMBL) feature tables are flat, not hierarchical, so this is expected. > 18) table taxon: always empty, even with BioJava) This is where the organism should go. 
> 19) table taxon_name: I have one but not from this test (I tried to > tinker a > little bit with taxon but stopped) That's odd that you can have an entry in taxon_name w/o a corresponding one in taxon. Do you have foreign key checks disabled? > 20) table term: always empty, even with BioJava That's strange, since you say you do have rows in bioentry_qualifier_value, which has an enforced foreign key to term. Did you disable the foreign key checks? > 21) table term_dbxref: always empty, even with BioJava That's expected unless you loaded an ontology whose terms have dbxrefs, and your language object model supports that. > [...] > 23) table term_synonym: always empty, even with BioJava Same as for 21). Your terms would have to have synonyms, and your language object model would have to support those, before you could expect to get anything in here. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From shirleyc at cis.upenn.edu Tue Nov 13 13:45:59 2007 From: shirleyc at cis.upenn.edu (Shirley Cohen) Date: Tue, 13 Nov 2007 13:45:59 -0500 Subject: [Biojava-l] maximum parsimony search Message-ID: <3001DEBB-AD61-4089-AE42-910AAC097D99@cis.upenn.edu> Hi BioJava People, I'm looking for existing code that implements a maximum parsimony search in Java. Does BioJava have this functionality? If so, can you point me to the appropriate classes? Thanks, Shirley From bmduggan at yahoo.com Tue Nov 13 19:48:22 2007 From: bmduggan at yahoo.com (Brendan Duggan) Date: Wed, 14 Nov 2007 11:48:22 +1100 (EST) Subject: [Biojava-l] Disulfide information in PDB files Message-ID: <454510.91557.qm@web52705.mail.re2.yahoo.com> Greetings I'm trying to mine some information on disulfides in the PDB and was hoping there might be a way of obtaining this information with the BioJava PDB parser. 
However, I haven't been able to see anything like this mentioned in the API docs. If it is currently not possible to extract disulfide information from PDB files are there any plans to implement this? Thanks! Brendan Make the switch to the world's best email. Get the new Yahoo!7 Mail now. http://au.yahoo.com/worldsbestmail/viagra/index.html From holland at ebi.ac.uk Wed Nov 14 03:50:31 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 14 Nov 2007 08:50:31 +0000 Subject: [Biojava-l] maximum parsimony search In-Reply-To: <3001DEBB-AD61-4089-AE42-910AAC097D99@cis.upenn.edu> References: <3001DEBB-AD61-4089-AE42-910AAC097D99@cis.upenn.edu> Message-ID: <473AB6D7.2010405@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 There is a class currently only available from the head of CVS - ie. it is unreleased yet. To get it you'll need to check out the very latest BioJava source code from CVS. The JavaDoc for the class is here: http://www.spice-3d.org/public-files/javadoc/biojava/org/biojavax/bio/phylo/ParsimonyTreeMethod.html It is designed to take input in the form of blocks of data similar to what you would find in a Nexus file (the Nexus file parsers elsewhere in the org/biojavax/bio/phylo package will provide these). However you could of course create such objects from your own data without needing to read/write any Nexus files. cheers, Richard Shirley Cohen wrote: > Hi BioJava People, > > I'm looking for existing code that implements a maximum parsimony > search in Java. Does BioJava have this functionality? If so, can you > point me to the appropriate classes? 
> > Thanks, > > Shirley > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHOrbW4C5LeMEKA/QRAuswAJ9olIwj7DGszOnKORU255YS3m2ohACfbKTw ihjuQVv0j+nlXb+4SL5pIfw= =ldfM -----END PGP SIGNATURE----- From holland at ebi.ac.uk Wed Nov 14 03:55:24 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 14 Nov 2007 08:55:24 +0000 Subject: [Biojava-l] Disulfide information in PDB files In-Reply-To: <454510.91557.qm@web52705.mail.re2.yahoo.com> References: <454510.91557.qm@web52705.mail.re2.yahoo.com> Message-ID: <473AB7FC.10403@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Currently this is not parsed - the parser does not read all the tags in the most recent PDB specification. Could you open a bug request at http://bugzilla.open-bio.org/ to formally add this to our to-do list? Thanks! cheers, Richard Brendan Duggan wrote: > Greetings > > I'm trying to mine some information on disulfides in > the PDB and was hoping there might be a way of > obtaining this information with the BioJava PDB > parser. However, I haven't been able to see anything > like this mentioned in the API docs. If it is > currently not possible to extract disulfide > information from PDB files are there any plans to > implement this? > > Thanks! > > Brendan > > > Make the switch to the world's best email. Get the new Yahoo!7 Mail now. 
http://au.yahoo.com/worldsbestmail/viagra/index.html > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHOrf84C5LeMEKA/QRArfeAJ9nCViM2jyVfubIpl5w/1EXMYTv/gCgjVEs zDnxHjv8xJsRBw5pfE2NdkA= =tGqm -----END PGP SIGNATURE----- From ap3 at sanger.ac.uk Wed Nov 14 04:32:28 2007 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Wed, 14 Nov 2007 09:32:28 +0000 Subject: [Biojava-l] Disulfide information in PDB files In-Reply-To: <454510.91557.qm@web52705.mail.re2.yahoo.com> References: <454510.91557.qm@web52705.mail.re2.yahoo.com> Message-ID: <9B898ADF-78EB-4B5C-A432-98274190815F@sanger.ac.uk> Hi Brendan, SSBOND lines are currently not parsed. If this is what you need, I can add this over the next couple of days. If you want to compute the bonds yourself, the framework can e.g. calculate distances between the sulphur atoms for you. - Andreas On 14 Nov 2007, at 00:48, Brendan Duggan wrote: > Greetings > > I'm trying to mine some information on disulfides in > the PDB and was hoping there might be a way of > obtaining this information with the BioJava PDB > parser. However, I haven't been able to see anything > like this mentioned in the API docs. If it is > currently not possible to extract disulfide > information from PDB files are there any plans to > implement this? > > Thanks! > > Brendan > > > Make the switch to the world's best email. Get the new Yahoo! > 7 Mail now. 
http://au.yahoo.com/worldsbestmail/viagra/index.html > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From deb at mb.au.dk Thu Nov 15 07:04:02 2007 From: deb at mb.au.dk (Ditlev Egeskov Brodersen) Date: Thu, 15 Nov 2007 13:04:02 +0100 Subject: [Biojava-l] Parsing exising gaps Message-ID: <002701c8277f$9dbdca50$d9395ef0$@au.dk> Dear all, I have managed to read an MSF-formatted alignment from a file selected through FileChooser as follows: BufferedReader br = new BufferedReader(new FileReader(aFileChooser.getSelectedFile())); SimpleAlignment align = (SimpleAlignment)SeqIOTools.fileToBiojava(AlignIOConstants.MSF_AA, br); I can now retrieve the sequence names and sequences through the Alignment object: Iterator aLabels = align.getLabels().iterator(); Iterator aSequences = align.symbolListIterator(); However, I now want to be able to translate between real sequence numbers and the positions within each alignment string, i.e. retrieve positions that remove the gaps first (gaps are represented by hyphens '-' in the MSF format). How can I tell BioJava to parse the gaps into a GappedSequence format?
I have tried the following to check what position 15 (past the first gap) translates into: int n = 0; while(aSequences.hasNext()) { SimpleSymbolList aSym = (SimpleSymbolList)aSequences.next(); SimpleGappedSequence aGapped = new SimpleGappedSequence(new SimpleSequence(aSym, "", aLabels.next().toString(), null)); System.out.println(aGapped.gappedToLocation(new PointLocation(15))); } But I only get 15 back out. I have also studied the constructor of the underlying SimpleGappedSymbolList but it simply copies the SymbolList and creates one big block: public SimpleGappedSymbolList(SymbolList source) { this.source = source; this.alpha = source.getAlphabet(); this.blocks = new ArrayList(); this.length = source.length(); Block b = new Block(1, length, 1, length); blocks.add(b); } Is there a way to tell SimpleGappedSequence to parse itself in terms of the gap characters in the sequence string? How is the sequence represented in this case, if not by gaps? Surely the hyphen cannot be a part of the standard PROTEIN-TERM alphabet, yet I get no complaints for the use of it? Best wishes, Ditlev -- Ditlev E. Brodersen, Ph.D. Lektor, Associate Professor Department of Molecular Biology Office: +45 89425259 University of Aarhus Lab: +45 89425022 Gustav Wieds Vej 10c Fax: +45 86123178 DK-8000 Aarhus C Email: deb at mb.au.dk Denmark Lab WWW: www.bioxray.dk/~deb From holland at ebi.ac.uk Thu Nov 15 08:51:48 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Thu, 15 Nov 2007 13:51:48 +0000 Subject: [Biojava-l] Parsing exising gaps In-Reply-To: <002701c8277f$9dbdca50$d9395ef0$@au.dk> References: <002701c8277f$9dbdca50$d9395ef0$@au.dk> Message-ID: <473C4EF4.5080301@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I think you've uncovered a number of problems here: 1. The PROTEIN-TERM alphabet does define '-' as a valid symbol, as do all the other predefined alphabets. 2.
The MSF parser doesn't bother trying to build GappedSequence instances; instead it just builds solid sequences with the gaps as normal symbols. 3. There is no constructor or method for taking a sequence with embedded gap symbols and turning it into a GappedSequence with separate chunks. Combined, these three problems make it impossible to do what you want easily. I will make a note to fix this on the plans for the next BioJava development cycle. In the meantime, your best bet would be to construct a second alignment block by iterating over the alignment block you already have and parsing the locations of the gap symbols. You would create a SimpleGappedSequence initially over the ungapped sequence, then use the insert gap methods to insert the gaps into this ungapped sequence before putting all the SimpleGappedSequence objects together into a new alignment. cheers, Richard Ditlev Egeskov Brodersen wrote: > Dear all, > > > > I have managed to read an MSF-formatted alignment from a file selected > through FileChooser as follows: > > > > BufferedReader br = new BufferedReader(new > FileReader(aFileChooser.getSelectedFile())); > > SimpleAlignment align = > (SimpleAlignment)SeqIOTools.fileToBiojava(AlignIOConstants.MSF_AA, br); > > > > I can now retrieve the sequence names and sequences through the Alignment > object: > > > > Iterator aLabels = align.getLabels().iterator(); > > Iterator aSequences = align.symbolListIterator(); > > > > However, I now want to be able to translate between real sequence numbers > and the positions within each alignment string, i.e. retrieve positions that > remove the gaps first (gaps are represented by hyphens '-' in the MSF > format). How can I tell BioJava to parse the gaps into a GappedSequence > format?
I have tried the following to check what position 15 (past the the > first gap) translates into: > > > > int n = 0; > > while(aSequences.hasNext()) { > > SimpleSymbolList aSym = (SimpleSymbolList)aSequences.next(); > > SimpleGappedSequence aGapped = new SimpleGappedSequence(new > SimpleSequence(aSym, "", aLabels.next().toString(), null)); > > System.out.println(aGapped.gappedToLocation(new PointLocation(15))); > > } > > > > But I only get 15 back out. I have also studied the constructor of the > underlying SimpleGappedSymbolList but it simply copies the SymbolList and > creates one big block: > > > > public SimpleGappedSymbolList(SymbolList source) { > > this.source = source; > > this.alpha = source.getAlphabet(); > > this.blocks = new ArrayList(); > > this.length = source.length(); > > Block b = new Block(1, length, 1, length); > > blocks.add(b); > > } > > > > Is there a way to tell SimpleGappedSequence to parse itself in terms of the > gap characters in the sequence string? How is the sequence represented in > this case, if not by gaps? Surely the hyphen cannot be a part of the > standard PROTEIN-TERM alphabet, yet I get no complaints for the use of it? > > > > Best wishes, > > > > Ditlev > > > > -- > > > > Ditlev E. Brodersen, Ph.D. 
> Lektor, Associate Professor > > > > Department of Molecular Biology Office: +45 89425259 > University of AarhusLab: +45 89425022 > Gustav Wieds Vej 10cFax: +45 86123178 > DK-8000 Aarhus C Email: deb at mb.au.dk > Denmark Lab WWW: www.bioxray.dk/~deb > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHPE704C5LeMEKA/QRAniIAJsGv+5HIP3mCDxBIUdw0SjDrWu8dgCeNviA EsJK4gv+EVY7wc4r6W2A0+I= =wCQs -----END PGP SIGNATURE----- From holland at ebi.ac.uk Fri Nov 16 03:59:41 2007 From: holland at ebi.ac.uk (Richard Holland) Date: Fri, 16 Nov 2007 08:59:41 +0000 Subject: [Biojava-l] Parsing exising gaps In-Reply-To: <000a01c827c1$8c8e5a50$a5ab0ef0$@dk> References: <002701c8277f$9dbdca50$d9395ef0$@au.dk> <473C4EF4.5080301@ebi.ac.uk> <000a01c827c1$8c8e5a50$a5ab0ef0$@dk> Message-ID: <473D5BFD.8080305@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Ditlev. After some investigation and some helpful hints from Mark, it turns out that there are methods in DNATools/ProteinTools that can construct proper GappedSymbolList objects out of strings. I have managed to modify the MSF parser to use this instead. This means that the MSF parser will now return instances of GappedSymbolList (actually GappedSequences to be accurate) rather than SimpleSymbolList. Thanks to the way the APIs work this will make no difference to existing users (except those who are depending on being able to cast it to a certain type - which they shouldn't, because the API doesn't guarantee it to be of any type!), but it will fix it for you. 
Future releases will modify the API (or include a completely new MSF
parser) which will explicitly return GappedSymbolLists in the API
declarations rather than plain SymbolLists, but I can't do that right
now because it would break existing users' code.

To get the modified parser you will need to check out the very latest
source code from our CVS repository and compile it using ant.
Instructions are on our website at biojava.org if you have not done this
before.

Hope this helps you.

cheers,
Richard

Ditlev Egeskov Brodersen wrote:
> Hi Richard,
>
> thanks for clarifying this and for your useful suggestion, which I've
> managed to implement as shown below. It works nicely, but I was really
> surprised to learn that biojava hasn't yet implemented a proper parsing of
> gap characters from strings into the object structure as this seems central
> to any use of pre-aligned sequences. Also, I find it problematic that the
> API implements the gap characters as part of the alphabets. In my view, this
> breaks the logic of the object model because proteins don't really have gaps
> in their sequences.
>
> Rather, the constructor of the Sequence-derived classes ought to throw an
> exception when non-protein characters are passed and should not allow the
> user to create an object with sequence elements that are non-standard.
> Instead, I think there should be a static method that allows cleaning the
> input sequence before passing it to the Sequence constructor. On the other
> hand, the constructor of the GappedSequence-derived classes should recognise
> the gaps and create an object with blocks of legal protein symbols and gaps
> in the appropriate places.
>
> -- Ditlev
>
> // Read MSF file into Alignment object
> BufferedReader br = new BufferedReader(new
>     FileReader(aFileChooser.getSelectedFile()));
> SimpleAlignment align =
>     (SimpleAlignment)SeqIOTools.fileToBiojava(AlignIOConstants.MSF_AA, br);
>
> // Iterate through sequences in turn
> Iterator aSequences = align.symbolListIterator();
> while(aSequences.hasNext()) {
>
>   // Retrieve SymbolList, the associated gap symbol and sequence string
>   SimpleSymbolList aSym = (SimpleSymbolList)aSequences.next();
>   Symbol aGapSymbol = aSym.getAlphabet().getGapSymbol();
>   String aGappedString = aSym.seqString();
>
>   // Prepare non-gapped string
>   String aPlainString = "";
>
>   // Loop through individual symbols and add non-gap characters to string
>   for(int i=1;i<=aSym.length();i++)
>     if(aSym.symbolAt(i) != aGapSymbol)
>       aPlainString += aGappedString.charAt(i-1);
>
>   // Create a new gapped sequence object with the plain (non-gapped) sequence
>   SimpleGappedSequence aGapped =
>     (SimpleGappedSequence)ProteinTools.createGappedProteinSequence(aPlainString, "");
>
>   // Use separate indices for gapped and plain sequences
>   int n = 1;
>
>   // Loop through individual gapped sequence symbols and insert gap into
>   // object when gap symbol is encountered
>   for(int i=1;i<=aSym.length();i++)
>     if(aSym.symbolAt(i) != aGapSymbol)
>       n++;
>     else
>       aGapped.addGapInSource(n);
> }
>
> --
>
> Ditlev Egeskov Brodersen
> Lektor
> Bakkefaldet 30, Hasle
> 8210 Århus V
>
> www.lindeman-brodersen.dk
>
>> -----Original Message-----
>> From: Richard Holland [mailto:holland at ebi.ac.uk]
>> Sent: 15 November 2007 14:52
>> To: Ditlev Egeskov Brodersen
>> Cc: biojava-l at biojava.org
>> Subject: Re: [Biojava-l] Parsing exising gaps
>>
> I think you've uncovered a number of problems here:
>
> 1. The PROTEIN-TERM alphabet does define '-' as a valid symbol, as do
> all the other predefined alphabets.
>
> 2. The MSF parser doesn't bother trying to build GappedSequence
> instances, instead it just builds solid sequences with the gaps as
> normal symbols.
>
> 3. There is no constructor or method for taking a sequence with embedded
> gap symbols and turning it into a GappedSequence with separate chunks.
>
> Combined, these three problems make it impossible to do what you want
> easily. I will make a note to fix this on the plans for the next BioJava
> development cycle.
>
> In the meantime, your best bet would be to construct a second alignment
> block by iterating over the alignment block you already have and parsing
> the locations of the gap symbols. You would create a
> SimpleGappedSequence initially over the ungapped sequence, then use the
> insert gap methods to insert the gaps into this ungapped sequence before
> putting all the SimpleGappedSequence objects together into a new
> alignment.
>
> cheers,
> Richard
>
> Ditlev Egeskov Brodersen wrote:
>>>> Dear all,
>>>>
>>>> I have managed to read an MSF-formatted alignment from a file selected
>>>> through FileChooser as follows:
>>>>
>>>> BufferedReader br = new BufferedReader(new
>>>>     FileReader(aFileChooser.getSelectedFile()));
>>>> SimpleAlignment align =
>>>>     (SimpleAlignment)SeqIOTools.fileToBiojava(AlignIOConstants.MSF_AA, br);
>>>>
>>>> I can now retrieve the sequence names and sequences through the Alignment
>>>> object:
>>>>
>>>> Iterator aLabels = align.getLabels().iterator();
>>>> Iterator aSequences = align.symbolListIterator();
>>>>
>>>> However, I now want to be able to translate between real sequence numbers
>>>> and the positions within each alignment string, i.e. retrieve positions that
>>>> remove the gaps first (gaps are represented by hyphens '-' in the MSF
>>>> format). How can I tell BioJava to parse the gaps into a GappedSequence
>>>> format?
>>>> I have tried the following to check what position 15 (past the
>>>> first gap) translates into:
>>>>
>>>> int n = 0;
>>>> while(aSequences.hasNext()) {
>>>>   SimpleSymbolList aSym = (SimpleSymbolList)aSequences.next();
>>>>   SimpleGappedSequence aGapped = new SimpleGappedSequence(new
>>>>     SimpleSequence(aSym, "", aLabels.next().toString(), null));
>>>>   System.out.println(aGapped.gappedToLocation(new PointLocation(15)));
>>>> }
>>>>
>>>> But I only get 15 back out. I have also studied the constructor of the
>>>> underlying SimpleGappedSymbolList but it simply copies the SymbolList and
>>>> creates one big block:
>>>>
>>>> public SimpleGappedSymbolList(SymbolList source) {
>>>>   this.source = source;
>>>>   this.alpha = source.getAlphabet();
>>>>   this.blocks = new ArrayList();
>>>>   this.length = source.length();
>>>>   Block b = new Block(1, length, 1, length);
>>>>   blocks.add(b);
>>>> }
>>>>
>>>> Is there a way to tell SimpleGappedSequence to parse itself in terms of the
>>>> gap characters in the sequence string? How is the sequence represented in
>>>> this case, if not by gaps? Surely the hyphen cannot be a part of the
>>>> standard PROTEIN-TERM alphabet, yet I get no complaints for the use of it?
>>>>
>>>> Best wishes,
>>>>
>>>> Ditlev
>>>>
>>>> --
>>>>
>>>> Ditlev E. Brodersen, Ph.D.
>>>> Lektor, Associate Professor
>>>>
>>>> Department of Molecular Biology   Office:  +45 89425259
>>>> University of Aarhus              Lab:     +45 89425022
>>>> Gustav Wieds Vej 10c              Fax:     +45 86123178
>>>> DK-8000 Aarhus C                  Email:   deb at mb.au.dk
>>>> Denmark                           Lab WWW: www.bioxray.dk/~deb
>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHPVv84C5LeMEKA/QRAn0cAJ9jJUaA3bjiEwlzxaAo/bsN5+CT1QCcCLxS
Rv73CVmtYpEz+apJwM1L3sA=
=UPU6
-----END PGP SIGNATURE-----

From deb at mb.au.dk Fri Nov 16 04:28:40 2007
From: deb at mb.au.dk (Ditlev Egeskov Brodersen)
Date: Fri, 16 Nov 2007 10:28:40 +0100
Subject: [Biojava-l] Parsing exising gaps
In-Reply-To: <473D5BFD.8080305@ebi.ac.uk>
References: <002701c8277f$9dbdca50$d9395ef0$@au.dk> <473C4EF4.5080301@ebi.ac.uk> <000a01c827c1$8c8e5a50$a5ab0ef0$@dk> <473D5BFD.8080305@ebi.ac.uk>
Message-ID: <000601c82833$143c5300$3cb4f900$@au.dk>

Hi Richard,

thanks for your super fast reply. I managed to recompile using CVS/ant
and the MSF import now works brilliantly and simply as follows:

  BufferedReader br = new BufferedReader(new
      FileReader(aFileChooser.getSelectedFile()));
  SimpleAlignment align =
      (SimpleAlignment)SeqIOTools.fileToBiojava(AlignIOConstants.MSF_AA, br);

  // Iterate through sequences in turn
  Iterator aSequences = align.symbolListIterator();
  while(aSequences.hasNext()) {
    // Retrieve gapped sequence
    SimpleGappedSequence aGapped = (SimpleGappedSequence)aSequences.next();

    // ...do whatever with each gapped sequence
  }

The returned gapped sequences are all properly set up with gaps, name etc.
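
As a side illustration of the coordinate translation that started this
thread (alignment column to ungapped residue number), here is a minimal
sketch in plain Java. It does not use BioJava at all: the class name
GapCoordinates and the method columnToResidue are hypothetical, and it
simply assumes that a row is available as a string with '-' as the gap
character, as in the MSF files discussed here.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of BioJava. */
public class GapCoordinates {

    /**
     * For a gapped row such as "MK--LV", returns an array in which entry i
     * (0-based) holds the 1-based residue number found at alignment column
     * i+1, or 0 if that column is a gap.
     */
    public static int[] columnToResidue(String gappedRow, char gapChar) {
        int[] map = new int[gappedRow.length()];
        int residue = 0; // number of non-gap characters seen so far
        for (int col = 0; col < gappedRow.length(); col++) {
            if (gappedRow.charAt(col) == gapChar) {
                map[col] = 0;         // gap: no residue at this column
            } else {
                map[col] = ++residue; // next residue in the ungapped sequence
            }
        }
        return map;
    }

    public static void main(String[] args) {
        int[] map = columnToResidue("MK--LV", '-');
        System.out.println(Arrays.toString(map)); // prints [1, 2, 0, 0, 3, 4]
    }
}
```

A properly built gapped sequence object should give the same answers
through its own coordinate methods; the sketch is only meant to make the
expected result concrete.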
But as for other users, I think there may be some problems: since the
SimpleAlignment object only has a general symbol list iterator, the user
has to cast each object extracted from it, and

  SimpleSequence aSimple = (SimpleSequence)aSequences.next();

throws a ClassCastException at run time. So old code might not run with
the update as far as I can see.

Ditlev

--

Ditlev E. Brodersen, Ph.D.
Lektor, Associate Professor

Department of Molecular Biology   Office:  +45 89425259
University of Aarhus              Lab:     +45 89425022
Gustav Wieds Vej 10c              Fax:     +45 86123178
DK-8000 Aarhus C                  Email:   deb at mb.au.dk
Denmark                           Lab WWW: www.bioxray.dk/~deb

From holland at ebi.ac.uk Fri Nov 16 04:49:35 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 16 Nov 2007 09:49:35 +0000
Subject: [Biojava-l] Parsing exising gaps
In-Reply-To: <000601c82833$143c5300$3cb4f900$@au.dk>
References: <002701c8277f$9dbdca50$d9395ef0$@au.dk> <473C4EF4.5080301@ebi.ac.uk> <000a01c827c1$8c8e5a50$a5ab0ef0$@dk> <473D5BFD.8080305@ebi.ac.uk> <000601c82833$143c5300$3cb4f900$@au.dk>
Message-ID: <473D67AF.2020007@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> The returned gapped sequences are all properly set up with gaps, name etc.
> But as for other users, I think there may be some problems: since the
> SimpleAlignment object only has a general symbol list iterator, the user
> has to cast each object extracted from it, and
>
> SimpleSequence aSimple = (SimpleSequence)aSequences.next();
>
> throws a ClassCastException at run time. So old code might not run with
> the update as far as I can see.

This is true. However, such code would be unsupported by us as the API
clearly states that SimpleAlignment returns SymbolList instances, and
does not make any guarantees about the exact implementation details of
the objects it returns.
To attempt to cast it to anything other than SymbolList would be a
mistake! (Although in practice it now always returns a GappedSymbolList,
which is what your code can now take advantage of.) To assume it will
return SimpleSequence is outside the behaviour defined by the API and
therefore should not be relied upon.

A more correct approach would be to test each item returned:

  // The iterator is untyped, so cast only to the type the API guarantees
  SymbolList symlist = (SymbolList)aSequences.next();
  if (symlist instanceof SimpleSequence) {
    SimpleSequence seq = (SimpleSequence)symlist;
    // Do simple-sequence stuff
  } else {
    // Do something else!
  }

In future, I will modify the API to change the SymbolList guarantee to a
GappedSymbolList guarantee, but I can't do this right now as this really
would break everyone's code!

We are currently planning a redesign as you may be aware, so issues like
this will hopefully be resolved as part of that process. For a start, if
we use Java 5 generics in future as we plan, we can strictly specify
what kinds of objects will be returned by things such as the alignment
API, making it easier for us to enforce API-compliant behaviour in
users' code.

cheers,
Richard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHPWev4C5LeMEKA/QRAvTOAJ9tqdBGWangZ9YQPpEDJ4WWBP/vjQCdHlMB
ITj7O/foDly4aOT4SV1Jb+k=
=g7Vs
-----END PGP SIGNATURE-----

From deb at mb.au.dk Fri Nov 16 05:11:15 2007
From: deb at mb.au.dk (Ditlev Egeskov Brodersen)
Date: Fri, 16 Nov 2007 11:11:15 +0100
Subject: [Biojava-l] Wrapping SimpleGappedSequence
In-Reply-To: <473D67AF.2020007@ebi.ac.uk>
References: <002701c8277f$9dbdca50$d9395ef0$@au.dk> <473C4EF4.5080301@ebi.ac.uk> <000a01c827c1$8c8e5a50$a5ab0ef0$@dk> <473D5BFD.8080305@ebi.ac.uk> <000601c82833$143c5300$3cb4f900$@au.dk> <473D67AF.2020007@ebi.ac.uk>
Message-ID: <000f01c82839$06722550$13566ff0$@au.dk>

Hi again,

thanks for the info - will do the check just to be proper.
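
For reference, the check discussed above (test the runtime type first,
cast only afterwards) can be sketched in plain Java as follows. The types
SymbolList, PlainList and GappedList below are hypothetical stand-ins
written for this example, not the BioJava classes of similar name, and the
typed iterator is only an assumption about what a generics-based API might
look like.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TypeCheckSketch {

    // Hypothetical stand-ins for the kinds of types named in the thread.
    public interface SymbolList {
        String seqString();
    }

    public static class PlainList implements SymbolList {
        private final String s;
        public PlainList(String s) { this.s = s; }
        public String seqString() { return s; }
    }

    public static class GappedList extends PlainList {
        public GappedList(String s) { super(s); }

        /** Counts '-' characters in the row. */
        public int gapCount() {
            int n = 0;
            for (char c : seqString().toCharArray()) {
                if (c == '-') n++;
            }
            return n;
        }
    }

    /** Describes each item, casting only after an instanceof test. */
    public static List<String> describe(Iterator<? extends SymbolList> it) {
        List<String> out = new ArrayList<String>();
        while (it.hasNext()) {
            SymbolList sl = it.next();          // the only type the iterator promises
            if (sl instanceof GappedList) {
                GappedList g = (GappedList) sl; // safe: checked on the line above
                out.add("gapped:" + g.gapCount());
            } else {
                out.add("plain:" + sl.seqString().length());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SymbolList> rows = new ArrayList<SymbolList>();
        rows.add(new GappedList("MK--LV"));
        rows.add(new PlainList("MKLV"));
        System.out.println(describe(rows.iterator())); // prints [gapped:2, plain:4]
    }
}
```

Because the iterator is declared with a type parameter, the compiler
rejects code that assumes a more specific element type without a check,
which is the kind of enforcement generics would make possible.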
I have another question: In my application, I would like to wrap the
retrieved SimpleGappedSequence objects inside another object that extends
the functionality with application-specific stuff. Ideally, I would do
this by extending the SimpleGappedSequence object and creating it by
passing the SimpleGappedSequence from the alignment import to the
constructor of the parent, like so:

  class AlignedSequence extends SimpleGappedSequence {
    public AlignedSequence(SimpleGappedSequence aGapped) {
      super(aGapped);
    }

    // ..custom stuff..
  }

However, the problem is that there is only one constructor for the
SimpleGappedSequence, one which takes a simple Sequence object. I can pass
the derived class alright, but all gap information is lost again,
presumably because the SimpleGappedSequence constructor just takes out the
seqString() and puts it into its own sequence object.

Shouldn't the constructor of the SimpleGappedSequence class recognise when
a derived (and gapped) sequence object is passed, and process it
accordingly?

As it stands, I am forced to include the SimpleGappedSequence as a private
member of the AlignedSequence class, which is not nearly as nice since all
statements using the class will have to do something like

  class AlignedSequence extends SimpleGappedSequence {
    private SimpleGappedSequence gapped_sequence;

    public AlignedSequence(SimpleGappedSequence aGapped) {
      gapped_sequence = aGapped;
    }

    public SimpleGappedSequence getGappedSequence() {
      return(gapped_sequence);
    }

    // ..custom stuff..
  }

  ...

  AlignedSequence aAligned = new AlignedSequence(aGapped);
  aAligned.getGappedSequence().seqString();

rather than simply:

  AlignedSequence aAligned = new AlignedSequence(aGapped);
  aAligned.seqString();

In other words, is there any solution with the current setup that would
allow me to extend SimpleGappedSequence and not lose the gap information?

-- Ditlev

--

Ditlev E. Brodersen, Ph.D.
Lektor, Associate Professor

Department of Molecular Biology   Office:  +45 89425259
University of Aarhus              Lab:     +45 89425022
Gustav Wieds Vej 10c              Fax:     +45 86123178
DK-8000 Aarhus C                  Email:   deb at mb.au.dk
Denmark                           Lab WWW: www.bioxray.dk/~deb

From ap3 at sanger.ac.uk Fri Nov 16 04:51:35 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 16 Nov 2007 09:51:35 +0000
Subject: [Biojava-l] Parsing exising gaps
In-Reply-To: <473D5BFD.8080305@ebi.ac.uk>
References: <002701c8277f$9dbdca50$d9395ef0$@au.dk> <473C4EF4.5080301@ebi.ac.uk> <000a01c827c1$8c8e5a50$a5ab0ef0$@dk> <473D5BFD.8080305@ebi.ac.uk>
Message-ID:

> To get the modified parser you will need to check out the very latest
> source code from our CVS repository and compile it using ant.
> Instructions are on our website at biojava.org if you have not done this
> before.
Alternatively you could get the automatically built biojava.jar from
http://www.spice-3d.org/cruise/

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891

-----------------------------------------------------------------------

From holland at ebi.ac.uk Fri Nov 16 05:46:57 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 16 Nov 2007 10:46:57 +0000
Subject: [Biojava-l] Wrapping SimpleGappedSequence
In-Reply-To: <000f01c82839$06722550$13566ff0$@au.dk>
References: <002701c8277f$9dbdca50$d9395ef0$@au.dk> <473C4EF4.5080301@ebi.ac.uk> <000a01c827c1$8c8e5a50$a5ab0ef0$@dk> <473D5BFD.8080305@ebi.ac.uk> <000601c82833$143c5300$3cb4f900$@au.dk> <473D67AF.2020007@ebi.ac.uk> <000f01c82839$06722550$13566ff0$@au.dk>
Message-ID: <473D7521.9070603@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The easiest way is simply for me to alter the constructor of
SimpleGappedSequence (and equivalently of SimpleGappedSymbolList) to
copy all gaps if passed another instance of GappedSymbolList as the
parameter. I've just done this in CVS so you should be able to update
your copy and observe the new behaviour.

cheers,
Richard

Ditlev Egeskov Brodersen wrote:
> Hi again,
>
> thanks for the info - will do the check just to be proper. I have another
> question: In my application, I would like to wrap the retrieved
> SimpleGappedSequence objects inside another object that extends the
> functionality with application-specific stuff.
> Ideally, I would do this by
> extending the SimpleGappedSequence object and create it by passing the
> SimpleGappedSequence from the alignment import to the constructor of the
> parent, like so:
>
> class AlignedSequence extends SimpleGappedSequence {
>   public AlignedSequence(SimpleGappedSequence aGapped) {
>     super(aGapped);
>   }
>
>   ..custom stuff..
> }
>
> However, the problem is that there is only one constructor for the
> SimpleGappedSequence, one which takes a simple Sequence object. I can pass
> the derived class alright, but all gap information is lost again, presumably
> because the SimpleGappedSequence constructor just takes out the seqString()
> and puts it into its own sequence object.
>
> Shouldn't the constructor of the SimpleGappedSequence class recognise when a
> derived (and gapped) sequence object is passed, and process it accordingly?
>
> As it stands, I am forced to include the SimpleGappedSequence as a private
> member of the AlignedSequence class, which is not near as nice since all
> statement using the class will have to do something like
>
> class AlignedSequence extends SimpleGappedSequence {
>   private SimpleGappedSequence gapped_sequence;
>
>   public AlignedSequence(SimpleGappedSequence aGapped) {
>     gapped_sequence = aGapped;
>   }
>
>   public SimpleGappedSequence getGappedSequence() {
>     return(gapped_sequence);
>   }
>
>   ..custom stuff..
> }
>
> ...
>
> AlignedSequence aAligned = new AlignedSequence(aGapped);
> aAligned.getGappedSequence().seqString();
>
> rather than simply:
>
> AlignedSequence aAligned = new AlignedSequence(aGapped);
> aAligned.seqString();
>
> In other words, is there any solution with the current setup that would
> allow me to extend SimpleGappedSequence and not loose the gap information?
>
> -- Ditlev
>
> --
>
> Ditlev E. Brodersen, Ph.D.
> Lektor, Associate Professor
>
> Department of Molecular Biology   Office:  +45 89425259
> University of Aarhus              Lab:     +45 89425022
> Gustav Wieds Vej 10c              Fax:     +45 86123178
> DK-8000 Aarhus C