From salmantahir1 at gmail.com Tue Feb 6 06:21:41 2007
From: salmantahir1 at gmail.com (Salman Tahir)
Date: Tue, 6 Feb 2007 11:21:41 +0000
Subject: [Biojava-l] Variable modifications to Protein sequences
Message-ID: <2049e51a0702060321l349cd04dp6828b30e3ff980b@mail.gmail.com>

Hi,

I am writing a Java program that deals with the in silico manipulation of cross-linked peptides.

I would like to manipulate protein sequences while taking variable modifications into account, using BioJava classes if possible. For example:

If I have the following variable modification:
hydroxo-BS2GD4 (N-term, K): 118.0563805

And the following sequence: FLEKQNKER (assuming it contains the N-term)

I need to generate the following sequences:
FLEKQNKER -> FLEKQNKER, FLEKhydroxo-BS2GD0QNKER, FLEKQNKhydroxo-BS2GD0ER,
FLEKhydroxo-BS2GD0QNKhydroxo-BS2GD0ER, Fhydroxo-BS2GD0LEKQNKER and so on,
generating a total of 8 peptide sequences with different masses.

Is there a way I can generate these additional sequences using BioJava? Any help would be much appreciated.

- Salman

From markjschreiber at gmail.com Wed Feb 7 10:30:47 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 7 Feb 2007 23:30:47 +0800
Subject: [Biojava-l] Variable modifications to Protein sequences
In-Reply-To: <2049e51a0702060321l349cd04dp6828b30e3ff980b@mail.gmail.com>
References: <2049e51a0702060321l349cd04dp6828b30e3ff980b@mail.gmail.com>
Message-ID: <93b45ca50702070730n79d0aea2seefa3f3af5cebd1@mail.gmail.com>

Hi -

I think the best approach would be to define your own custom alphabet (possibly with a custom tokenization). You could then create SymbolLists or Sequences with this new alphabet.

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From su24 at st-andrews.ac.uk Wed Feb 7 11:11:37 2007
From: su24 at st-andrews.ac.uk (Saif Ur-Rehman)
Date: Wed, 7 Feb 2007 16:11:37 +0000
Subject: [Biojava-l] Use Of Blosum62
Message-ID: <1170864697.45c9fa39303a5@webmail.st-andrews.ac.uk>

Dear All,

I am having a problem using the Blosum62 matrix file from the NCBI website for use in Smith-Waterman alignments.
When I run my code, which reads in the file to initialise the SubstitutionMatrix object, the following exception is thrown:

"org.biojava.bio.symbol.IllegalSymbolException: This tokenization doesn't contain character: '*'"

Thanking you in advance,

Saif

--------------------------------------------------------------------------------
Saif Ur-Rehman
Research Student
The Centre for Evolution, Genes & Genomics (CEGG)
Sir Harold Mitchell Building
School of Biology
The University of St Andrews
St Andrews, Fife
Scotland, UK
------------------------------------------------------------------
University of St Andrews Webmail: https://webmail.st-andrews.ac.uk

From su24 at st-andrews.ac.uk Wed Feb 7 11:38:52 2007
From: su24 at st-andrews.ac.uk (Saif Ur-Rehman)
Date: Wed, 7 Feb 2007 16:38:52 +0000
Subject: [Biojava-l] Use Of Blosum62
In-Reply-To: <45C9FD73.5040200@ebi.ac.uk>
References: <1170864697.45c9fa39303a5@webmail.st-andrews.ac.uk> <45C9FD73.5040200@ebi.ac.uk>
Message-ID: <1170866332.45ca009c26757@webmail.st-andrews.ac.uk>

Hi,

Here is the code I am using.

Cheers,
Saif

    public static void main(String args[]) {
        String x = "prot sequence excluded";
        String y = "prot sequence excluded";
        try {
            FiniteAlphabet alphabet = (FiniteAlphabet) AlphabetManager.alphabetForName("PROTEIN");
            SubstitutionMatrix matrix = new SubstitutionMatrix(alphabet, new File("///Users/su24/BLOSUM62"));
            // The above line causes the exception
            SequenceAlignment aligner = new SmithWaterman(-1, 3, 2, 2, 1, matrix);
            Sequence query = ProteinTools.createProteinSequence(x, "query");
            Sequence target = ProteinTools.createProteinSequence(y, "target");
            aligner.pairwiseAlignment(query, target);
            System.out.println("\nlocal alignment with SmithWaterman:\n" + aligner.getAlignmentString());
        } catch (Exception exc) {
            exc.printStackTrace();
        }
    }
    }

Quoting Richard Holland:

> Hi. You need to include your complete program code in your email so that
> we can accurately diagnose the problem.
>
> cheers,
> Richard

From andreas.draeger at uni-tuebingen.de Wed Feb 7 13:15:59 2007
From: andreas.draeger at uni-tuebingen.de (Andreas Dräger)
Date: Wed, 07 Feb 2007 19:15:59 +0100
Subject: [Biojava-l] Use Of Blosum62
In-Reply-To: <1170866332.45ca009c26757@webmail.st-andrews.ac.uk>
References: <1170864697.45c9fa39303a5@webmail.st-andrews.ac.uk> <45C9FD73.5040200@ebi.ac.uk> <1170866332.45ca009c26757@webmail.st-andrews.ac.uk>
Message-ID: <45CA175F.1050105@uni-tuebingen.de>

Hey Saif,

> FiniteAlphabet alphabet = (FiniteAlphabet) AlphabetManager.alphabetForName("PROTEIN");

This cannot work because the symbol "*" is not part of the alphabet "PROTEIN": it is the termination symbol. You will need to use the alphabet "PROTEIN-TERM" instead.

I hope that helps. If I remember correctly, this special character has been discussed in earlier e-mails on this list.

Cheers,
Andreas

--
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany
Phone: +49-7071-29-70436
Fax: +49-7071-29-5091

From Russell.Smithies at agresearch.co.nz Sun Feb 11 19:55:17 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 12 Feb 2007 13:55:17 +1300
Subject: [Biojava-l] New Zealand bioinformatician vacancy
In-Reply-To: <6dce9a0b0702110814v16d50c5dtc1c8c6f21f14c450@mail.gmail.com>
References: <1170950909.6074.30.camel@localhost> <6dce9a0b0702081123j3e8efe89r4e2a93b88d4029d9@mail.gmail.com> <1170967751.6084.18.camel@localhost> <6dce9a0b0702081307s11683737r17e8d22b47ed6fef@mail.gmail.com> <1170970046.6084.36.camel@localhost> <6dce9a0b0702081552n28e0aa79x79a8d4eb659fe119@mail.gmail.com> <1171105304.5853.11.camel@localhost> <6dce9a0b0702110814v16d50c5dtc1c8c6f21f14c450@mail.gmail.com>
Message-ID:

Hi all,

I hope you don't mind me posting an advert, but we have an opening for a bioinformatician. So if you'd like to work in New Zealand, read on...
thanx,

Russell

Russell Smithies
Bioinformatics Software Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
AgResearch
Farming Food and Health. First
Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

==========================================================
Bioinformatician
AgResearch - Applied Biotechnologies

As part of AgResearch's company strategy we are continuing to grow our business in the area of bioinformatics. This capability is essential for our science discovery. In this position you will be part of a national team of 26 bioinformaticians, mathematical biologists and statisticians and be based at our Grasslands campus at Palmerston North. This is a permanent position.

You will be an advocate for bioinformatics within AgResearch; you will work collaboratively on projects and will provide bioinformatics training and advice to science staff working in the biotechnology area.

We are seeking a person who has:
* An excellent tertiary qualification in molecular biology or genetics
* Experience with the use of bioinformatics applications
* Knowledge of life sciences databases and the internet
* Well developed IT technical skills and web based technologies
* Experience in a training environment
* Excellent writing, speaking and interpersonal skills
* Familiarity with Perl, Java or Unix Scripting

If you possess the above skills, we would like to hear from you.

To find out more about this position please contact Peter Johnstone by email peter.johnstone at agresearch.co.nz or alternatively phone +64 3 489 9081. For a job description and to apply online please go to http://www.agresearch.co.nz/recruitment . Vacancy No AGR494.

Please provide contact details for 2 Referees with your application. For general information on AgResearch please visit our website at www.agresearch.co.nz

Applications close 9th March 2007.
Linda Murray
Science Administrator
AgResearch Limited
Invermay Agricultural Centre
Private Bag 50034
Mosgiel, New Zealand
Phone: +64 3 489 9011
Email: linda.murray at agresearch.co.nz

From salmantahir1 at gmail.com Mon Feb 12 05:41:40 2007
From: salmantahir1 at gmail.com (Salman Tahir)
Date: Mon, 12 Feb 2007 10:41:40 +0000
Subject: [Biojava-l] Tokenization
Message-ID: <2049e51a0702120241k135a92edl235910394f1d124c@mail.gmail.com>

Hi,

I've created a custom alphabet and I would like to use it to parse a file. After defining the symbols in the alphabet I do the following:

    SymbolTokenization parser = alpha.getTokenization("token");

('alpha' being the custom alphabet). However, I get the following error:

"BioException: There is no tokenization 'token' defined in alphabet Variable Modifications"

I understand this to mean that, because this is a new alphabet, I have to populate the new tokenization myself. However, I cannot find any examples that show how to do this. Can anyone help?
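
For readers following this thread, here is a minimal, library-independent sketch of the two ideas involved: enumerating every combination of a variable modification (the N-terminal residue and each K, as in the FLEKQNKER example earlier on the list), and converting between symbols and text, which is the job a tokenization performs. It is plain Java and does not use BioJava. The class VariableModificationSketch and its methods are hypothetical names introduced only for illustration, and the sketch assumes the modification label starts with a lower-case letter so it cannot be confused with a one-letter residue code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: hypothetical names, not part of BioJava. */
class VariableModificationSketch {

    /** Candidate sites: the N-terminal residue (index 0) plus every lysine (K). */
    static List<Integer> candidateSites(String peptide) {
        List<Integer> sites = new ArrayList<>();
        for (int i = 0; i < peptide.length(); i++) {
            if (i == 0 || peptide.charAt(i) == 'K') {
                sites.add(i);
            }
        }
        return sites;
    }

    /** Every variant as a token list, one token per residue; a modified residue carries the label. */
    static List<List<String>> enumerate(String peptide, String label) {
        List<Integer> sites = candidateSites(peptide);
        List<List<String>> variants = new ArrayList<>();
        // Each bit of the mask says whether the corresponding site is modified.
        for (int mask = 0; mask < (1 << sites.size()); mask++) {
            List<String> tokens = new ArrayList<>();
            for (int i = 0; i < peptide.length(); i++) {
                int siteIndex = sites.indexOf(i);
                boolean modified = siteIndex >= 0 && (mask & (1 << siteIndex)) != 0;
                tokens.add(peptide.charAt(i) + (modified ? label : ""));
            }
            variants.add(tokens);
        }
        return variants;
    }

    /** Splits text back into tokens: one residue letter, optionally followed by the label. */
    static List<String> tokenize(String text, String label) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = pos + 1;
            if (text.startsWith(label, end)) {
                end += label.length();
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (List<String> variant : enumerate("FLEKQNKER", "hydroxo-BS2GD0")) {
            System.out.println(String.join("", variant));
        }
    }
}
```

Running main prints the eight variants of FLEKQNKER. How such a mapping is registered on a custom BioJava alphabet is not shown here and would need to be checked against the BioJava documentation.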
- Salman

From m.titulaer at erasmusmc.nl Wed Feb 14 10:38:58 2007
From: m.titulaer at erasmusmc.nl (Mark Titulaer)
Date: Wed, 14 Feb 2007 16:38:58 +0100
Subject: [Biojava-l] Conversion Applied Biosystems MALDI Voyager DE-STR .dat file into peak list
Message-ID: <45D32D12.3070800@erasmusmc.nl>

I would like to convert an Applied Biosystems MALDI Voyager DE-STR .dat file into a peak list. Is there any library available?

From paolo.romano at istge.it Tue Feb 20 05:39:30 2007
From: paolo.romano at istge.it (Paolo Romano)
Date: Tue, 20 Feb 2007 11:39:30 +0100
Subject: [Biojava-l] Final Call for Papers NETTAB 2007
Message-ID: <7.0.1.0.0.20070220113922.01e9c968@istge.it>

=================================
Apologies, if you're receiving multiple copies
=================================

Final Call for Papers
7th International Workshop NETTAB 2007
June 12-15, 2007
Department of Computer Science, University of Pisa, Pisa
http://www.nettab.org/2007/

FOCUS THEME
A Semantic Web for Bioinformatics: Goals, Tools, Systems, Applications

ADJUNCT THEMES
Algorithms in bioinformatics
Formal Methods for Systems Biology
Network Tools and Applications in Bioinformatics

AIMS
This workshop aims at getting together biologists, bioinformaticians, computer scientists and linguists to try to answer the following questions:
- Is the Semantic Web of some use for Bioinformatics?
- Which goals should a Semantic Web for Bioinformatics have?
- Which standards, technologies and tools of the Semantic Web can most profitably be used in Bioinformatics?
- Which applications has the Semantic Web already found in Bioinformatics?
- Which current Bioinformatics research problems can be solved by the Semantic Web?
- What are the short, medium and long term perspectives in applying Semantic Web technologies to Bioinformatics?

The workshop also intends to:
- introduce the basic knowledge of related standards and technologies, in a non trivial way through invited lectures and tutorials
- outline the promising features of the Semantic Web in bioinformatics through invited lectures and open discussion
- show some valuable examples in bioinformatics through invited lectures, oral communications and posters
- support as much discussion as possible through open discussions and a panel discussion
- practically demonstrate "how it works" through tutorials

INVITED SPEAKERS
We are extremely glad to inform you that four excellent speakers have accepted to give an invited lecture at next NETTAB workshop.

+ Session on Goals of the Semantic Web in Bioinformatics
  Invited speaker: Eric Neumann, Co-chair of the Health Care and Life Science Interest Group of the Semantic Web Activity at W3C & Teranode Inc., USA
+ Session on Semantic Web Standards and Technologies
  Invited speaker: Guus Schreiber, Chair of Semantic Web Deployment Interest Group at W3C & University of Amsterdam, The Netherlands
+ Session on Semantic Web Tools
  Invited speaker: Olivier Bodenreider, Medical Ontology Research, National Library of Medicine, USA
+ Session on Applications of Semantic Web in Bioinformatics
  Invited speakers: Michael Schroeder, Biotec TU Dresden, Germany, and Albert Burger, Heriot-Watt University, Scotland, UK

Titles of the lectures will soon be announced.

TOPICS
NETTAB workshops also include general sessions on Network Tools and Applications in Biology and adjunct focus topics selected by local organizers.
You are therefore welcome to submit your work on any of the following topics:
Goals of a Semantic Web for Bioinformatics:
Standards, Technologies, Tools for the Semantic Web
Systems for a Semantic Web for Bioinformatics:
Existing and prospective applications of the Semantic Web for Bioinformatics
Algorithms in Bioinformatics
Formal Methods for Systems Biology
Network Tools and Applications in Bioinformatics

DEADLINES
Submission of oral communications: March 16, 2007
Proposals for tutorials: March 16, 2007
Submission of posters: April 20, 2007
Early Registration Deadline: April 27, 2007

SUBMISSION OF CONTRIBUTIONS
You are welcome to submit your contributions through the MyReview system at the following URL: http://www.nettab.org/2007/myreview/ . This is a two-step submission procedure: you will have to submit an abstract first and the paper later. You are invited to prepare your contribution according to the instructions available at http://www.biomedcentral.com/bmcbioinformatics/ifora/ (see especially the "Preparing main manuscript text" section). See the workshop's web site for more details about the preparation of the papers.

PUBLICATION OF PAPERS AND POSTERS
All accepted oral communications and posters will be published in the Workshop's Proceedings that will be distributed to all participants. Selected papers will be published by BMC Bioinformatics as a Supplement Issue. When submitting, you will be asked if your submission is intended for the workshop only or for the publication in BMC Bioinformatics too. The papers will be reviewed by the members of the Programme Committee. They will decide whether the papers are accepted for the workshop only or for the publication in BMC Bioinformatics as well. The workshop will pay part of the cost of the publication, but please be advised that, depending on the final budget of the workshop, you could be requested to contribute to publication costs, which have been set to 600 British pounds per paper.

For any further information or clarification, please contact the organization by email at info at nettab.org .

CHAIRS
P. Romano, Bioinformatics, Natl Cancer Research Inst., Italy
M. Schröder, Biotechnology Centre, TU Dresden, Germany
N. Cannata, Mathematics & Computer Science Dept, Univ. of Camerino, Italy
O. Signore, ISTI, National Research Council, Italy

PROGRAMME COMMITTEE
G. Armano, Electrical and Electronic Engineering Dept, Univ. of Cagliari, IT
C. Baker, Institute for Infocomm Research (I2R), SG
P. Barahona, Department of Informatics, New University of Lisboa, PT
L. Barrio-Alvers, Transinsight GmbH, DE
O. Bodenreider, National Library of Medicine, USA
A. Burger, Department of Computer Science, Heriot-Watt University, UK
M. Cannataro, Experimental and Clinical Medicine Dept, Univ. of Catanzaro "Magna Graecia", IT
W. Ceusters, Bioinformatics and Life Sciences, University at Buffalo, USA
M. Cockerill, BioMed Central, UK
M.-D. Devignes, LORIA, Vandoeuvre les Nancy, FR
R. Dieng, INRIA, Sophia Antipolis, FR
L. Grivell, European Molecular Biology Organisation, DE
M. Harris, European Bioinformatics Institute, UK
M. Helmer-Citterich, Biology Dept, University of Rome "Tor Vergata", IT
C. M. Keet, Computer Science Faculty, Free University of Bozen-Bolzano, IT
J. Koehler, Biomathematics and Bioinformatics, Rothamsted Research, UK
M. Krallinger, Spanish National Cancer Research Center (CNIO), ES
L. Krippahl, Department of Informatics, New University of Lisboa, PT
P. Lambrix, Computer and Information Science Dept, Linköping University, SE
U. Leser, Institute for Computer Science, Humboldt-University of Berlin, DE
J. Luciano, Department of Genetics, Harvard Medical School, USA
R. Marangoni, Computer Science Department, University of Pisa, IT
M. Marchiori, Pure and Applied Mathematics Dept, University of Padua, IT
M. Masseroli, Department of BioEngineering, Polytechnic of Milan, IT
G. Mauri, Informatics Systems and Communication Dept, Univ. Milan "Bicocca", IT
E. Merelli, Mathematics and Computer Science Dept, University of Camerino, IT
S. Moeller, Institute of Neuro- and Bioinformatics, University of Lübeck, DE
S. Philippi, Institute for Software Technology, Univ. of Koblenz-Landau, DE
D. Quann, IBM Software Group, USA
D. Rubin, Stanford Medical Informatics, Stanford University Medical Center, USA
S.-A. Sansone, European Bioinformatics Institute, UK
M. Senger, International Rice Research Institute, PH
D. Turi, School of Computer Science, University of Manchester, UK
G. Vetere, IBM Center for Advanced Studies of Rome, IT
D. Zaccagnini, Language and Computing, USA

Looking forward to seeing you all in Tuscany.

Best regards.

On behalf of the Chairs and of the Organizing Committee
Paolo Romano

Paolo Romano (paolo.romano at istge.it)
Bioinformatics and Structural Proteomics
National Cancer Research Institute (IST)
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288
Fax: +39-010-5737-295

From kumar at neb.com Wed Feb 21 16:33:48 2007
From: kumar at neb.com (Sanjay Kumar)
Date: Wed, 21 Feb 2007 16:33:48 -0500
Subject: [Biojava-l] Job opportunity - Bioinformatics Software Developer
Message-ID:

Hi-

Sorry for the job posting, but NEB has an opening for a bioinformatics software developer that might be just right for someone with BioJava experience. We are located just north of Boston Massachusetts, USA, and the work environment is wonderful.

The posting is listed below and at our web site: http://www.neb.com/nebecomm/career.asp

If interested, please email your C.V. and statement of interest to:
resumes at neb.com
Attn: Job Code 1102SK

Cheers.
-sanjay

======================
Sanjay Kumar, PhD
Research Scientist
New England Biolabs

=================================================================
Bioinformatics Software Developer

Summary:
Bioinformatics software developer who will work under the direction of a principal investigator and collaborate with other scientists in designing, developing, testing and maintaining computational tools, web services, and potentially large relational databases to support the study of diverse research topics in molecular biology relevant to New England Biolabs. At minimum, a BS degree in computer science, biology, physics, or related field plus 2 years experience as a software developer is required. Proven ability to design and implement projects using RDBMS software and web-based user interfaces is essential.

Primary Responsibilities:
Accurately evaluate proposed projects in terms of their feasibility, required resources, and estimated timelines and milestones, and effectively communicate the results to the principal investigator.
Design, implement, test, deploy, and maintain relational databases containing molecular biology/genomic data.
Translate user requirements into a database schema.
Design, implement, test, deploy, and maintain software tools for querying/browsing/updating data in relational databases, with emphasis on web accessible applications.
Design, implement, test, deploy, and maintain software tools to access external data sources, e.g., at NCBI, EBI, and WormBase, with emphasis on web accessible applications.
Assemble bioinformatics pipelines with custom and/or third-party software.
Provide technical support to end users when necessary.

Technical Expertise:
Expertise in at least one of the following languages: Java, Perl, Python, C++, or C is required.
Experience writing SQL (including DDL) preferably for Apache Derby, PostgreSQL, Sybase, or MS SQLServer is required.
Web programming experience using HTML, Javascript, and PHP, and familiarity with Apache/Tomcat, Glassfish, or Sun Web Server is preferred.
Experience with object relational mapping frameworks (Hibernate, JDO) and middleware (J2EE/EJB, Spring) is preferred.
Experience with SOAP web services is preferred.
Familiarity with Mac OS, Windows and UNIX and the ability to move easily between them is beneficial.

If interested, please email your C.V. and statement of interest to:
resumes at neb.com
Attn: Job Code 1102SK

From crackeur at comcast.net Sat Feb 24 23:28:32 2007
From: crackeur at comcast.net (Jimmy Zhang)
Date: Sat, 24 Feb 2007 20:28:32 -0800
Subject: [Biojava-l] [ANN]VTD-XML 2.0
References: <1133171676.2994.75.camel@zidane>
Message-ID: <015e01c75895$86e20800$0d02a8c0@ximpleware>

The VTD-XML project team is proud to announce the release of version 2.0 of VTD-XML, the next generation XML parser/indexer. The new features introduced in this version are:

* VTD+XML version 1.0: the world's first true native XML index that is simple, general-purpose and back-compatible with XML.
* NodeRecorder Class that saves VTDNav's cursor location for later sequential access.
* Overwrite capability
* Lexical comparisons between VTD and strings

To download the software, please go to http://sourceforge.net/project/showfiles.php?group_id=110612

To read the latest benchmark report please go to http://vtd-xml.sf.net/benchmark1.html

To get the latest API overview http://www.ximpleware.com/vtd-xml_intro.pdf

From ilhami.visne at gmail.com Tue Feb 27 09:47:38 2007
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Tue, 27 Feb 2007 15:47:38 +0100
Subject: [Biojava-l] case-sensitive sequences
Message-ID:

My sequence files contain case-sensitive symbols (TAATAACgagagg), and I am currently using RichSequenceIterator to iterate over the sequences.

How can I tell BioJava to parse them case-sensitively?
When I call the seq.seqString() method, it should return the sequence exactly as it was in the file, with upper and lower case preserved.

thanx.

From ilhami.visne at gmail.com Tue Feb 27 12:54:13 2007
From: ilhami.visne at gmail.com (ilhami visne)
Date: Tue, 27 Feb 2007 18:54:13 +0100
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <45E44B6F.9090202@ebi.ac.uk>
References: <45E44B6F.9090202@ebi.ac.uk>
Message-ID: <45E47045.5010400@gmail.com>

Thank you for the quick answer. Here is the relevant part of my code:

    BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
    RichSequenceIterator iter = RichSequence.IOTools.readFastaDNA(br, null);
    RichSequence rs = iter.nextRichSequence();

Richard Holland wrote:
> DNA is not case-sensitive. What I suspect you are parsing is the output
> of some sequencing software which is using case as a rough indicator of
> base calling quality?
>
> The case will have been lost when the file was parsed, not at the moment
> you iterate over the resulting sequences. This means that you have to
> modify your file parsing method to become case-sensitive.
>
> The default DNA alphabet is not case-sensitive. It makes no distinction
> between the two, and will convert everything to one case.
>
> If you need to preserve case, you will need to use a custom alphabet
> which treats the cases differently, and also specify a tokenizer which
> is case-sensitive. See the help pages at http://biojava.org/ for help on
> creating new alphabets. Or, have a look at the ABITools.QUALITY alphabet
> in BioJava, which interprets the case and stores the quality scores
> separately.
>
> Note however that your custom alphabet is NOT the same as the original
> DNA alphabet, and so you may not be able to use it in all the standard
> transforms (RNA etc.). If you do want to use these then you will have to
> make a second copy of each sequence using the normal DNA alphabet and
> pass that copy to the routines.
>
> If you post to this list the code you are using to read the file, then I
> can show you where to insert the reference to this new alphabet.
>
> cheers,
> Richard

From markjschreiber at gmail.com Tue Feb 27 21:54:57 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 28 Feb 2007 10:54:57 +0800
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <45E47045.5010400@gmail.com>
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com>
Message-ID: <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com>

Hi -

There are also the classes SoftMaskedAlphabet, SoftMaskedAlphabet.CaseSensitiveTokenization and SoftMaskedAlphabet.MaskingDetector. Together these classes let you read a sequence that contains case-sensitive information and (if you wish) make use of that information. You can also write out the sequence in the original case-sensitive format.

It was originally designed for reading data that had been 'softmasked' for low-complexity regions (e.g. lower-case regions are low complexity and would be ignored in subsequent analysis), but it could be used for quality or any other distinction.

- Mark

From ilhami.visne at gmail.com Wed Feb 28 00:15:53 2007
From: ilhami.visne at gmail.com (ilhami visne)
Date: Wed, 28 Feb 2007 06:15:53 +0100
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com>
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com> <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com>
Message-ID: <45E51009.2090605@gmail.com>

Thank you, it works now. I should have been able to find it myself, but I am not really a bioinformatician yet.
My code (in case someone else runs into the same problem):

    BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));

    // Wrap the DNA alphabet so that upper and lower case stay distinct
    Alphabet dna = SoftMaskedAlphabet.getInstance(DNATools.getDNA());
    SymbolTokenization dnaParser = dna.getTokenization("token");

    RichSequenceIterator iter =
        RichSequence.IOTools.readFasta(br, dnaParser, null);
    RichSequence rs = iter.nextRichSequence();

Mark Schreiber wrote:
> Hi -
>
> There are also the classes SoftMaskedAlphabet,
> SoftMaskedAlphabet.CaseSensitiveTokenization and
> SoftMaskedAlphabet.MaskingDetector. Together these classes let you
> read a sequence that contains case-sensitive information and (if you
> wish) make use of that information. You can also write out the
> sequence in the original case-sensitive format.
>
> It was originally designed for reading data that had been 'softmasked'
> for low complexity regions (e.g. lower case regions are low complexity
> and would be ignored in subsequent analysis) but it could be used for
> quality or any other distinction.
>
> - Mark

From ilhami.visne at gmail.com Wed Feb 28 02:48:46 2007
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Wed, 28 Feb 2007 08:48:46 +0100
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <45E51009.2090605@gmail.com>
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com>
	<93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com>
	<45E51009.2090605@gmail.com>
Message-ID:

I've changed my code and called the RestrictionSiteFinder with the new
sequence. It threw this exception:

Exception in thread "Thread-25" java.lang.UnsupportedOperationException: Ambiguity should be handled at the level of the wrapped Alphabet
	at org.biojava.bio.symbol.SoftMaskedAlphabet.getAmbiguity(SoftMaskedAlphabet.java:183)
	at org.biojava.bio.symbol.AlphabetManager.getAllSymbols(AlphabetManager.java:223)
	at org.biojava.bio.seq.io.SymbolListCharSequence.<init>(SymbolListCharSequence.java:75)
	at org.biojava.bio.molbio.RestrictionSiteFinder.run(RestrictionSiteFinder.java:73)
	at org.biojava.utils.SimpleThreadPool$PooledThread.run(SimpleThreadPool.java:295)

I understand why it didn't work (lower case symbol 'a' and upper case
symbol 'A'), but I can't find a solution. Any idea?

On 2/28/07, ilhami visne wrote:
> Thank you. it does now. i should able to find it myself, but i am really
> not a bioinformaticians yet.
From markjschreiber at gmail.com Wed Feb 28 06:03:24 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 28 Feb 2007 19:03:24 +0800
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To:
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com>
	<93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com>
	<45E51009.2090605@gmail.com>
Message-ID: <93b45ca50702280303j4480c8d4kd69c9aa41c0221d2@mail.gmail.com>

Hi -

Is there any reason why you need to be running the restriction finder
over the soft masked sequence?

Can you post some example code to replicate the bug/annoyance?

If you think this is a genuine bug then please submit a biojava bug
report to http://bugzilla.open-bio.org/
Please also include the example code that demonstrates the bug.

Thanks.

- Mark

On 2/28/07, Ilhami Visne wrote:
> i've changed my code and called the RestrictionSiteFinder with the new
> sequence. it's throwed this exception.
>
> Exception in thread "Thread-25"
> java.lang.UnsupportedOperationException: Ambiguity should be handled
> at the level of the wrapped Alphabet
>     at org.biojava.bio.symbol.SoftMaskedAlphabet.getAmbiguity(SoftMaskedAlphabet.java:183)
>     at org.biojava.bio.symbol.AlphabetManager.getAllSymbols(AlphabetManager.java:223)
>     at org.biojava.bio.seq.io.SymbolListCharSequence.<init>(SymbolListCharSequence.java:75)
>     at org.biojava.bio.molbio.RestrictionSiteFinder.run(RestrictionSiteFinder.java:73)
>     at org.biojava.utils.SimpleThreadPool$PooledThread.run(SimpleThreadPool.java:295)
>
> i understand why it didn't work (lower case symbol 'a' and upper
> symbol 'A'), but i can't find a solution. Any idea?

From alainfa at bluewin.ch Wed Feb 28 08:54:17 2007
From: alainfa at bluewin.ch (Alain Favre)
Date: Wed, 28 Feb 2007 14:54:17 +0100
Subject: [Biojava-l] biojava.utils.bytecode
Message-ID: <000601c75b3f$f1409fc0$2101a8c0@camillealain>

Hello,
I am looking for this package. Where can I find it?
Thanks for your answer.
A.
Favre

From ilhami.visne at gmail.com Wed Feb 28 14:56:36 2007
From: ilhami.visne at gmail.com (ilhami visne)
Date: Wed, 28 Feb 2007 20:56:36 +0100
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <93b45ca50702280303j4480c8d4kd69c9aa41c0221d2@mail.gmail.com>
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com>
	<93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com>
	<45E51009.2090605@gmail.com>
	<93b45ca50702280303j4480c8d4kd69c9aa41c0221d2@mail.gmail.com>
Message-ID: <45E5DE74.8040008@gmail.com>

Yes, there is one. I am writing a small program that my coworker will
use. First it downloads some repeat-masked sequences, then it finds the
cut sites with RestrictionSiteFinder, and finally it exports all
fragments. These fragments should be repeat masked (case-sensitive)
too. After I find the cut positions, I'll extract the fragments with
SequenceTools.subSequence() and write them out.

Here is the code sample:

    BufferedReader br = new BufferedReader(new FileReader("aSeq.fasta"));

    // Read the FASTA file with a case-preserving (soft-masked) DNA alphabet
    Alphabet maskeddna = SoftMaskedAlphabet.getInstance(DNATools.getDNA());
    SymbolTokenization dnaParser = maskeddna.getTokenization("token");

    RichSequenceIterator iter =
        RichSequence.IOTools.readFasta(br, dnaParser, null);
    RichSequence seq = iter.nextRichSequence();

    // Annotate MseI sites on the soft-masked sequence
    SimpleThreadPool threadPool = new SimpleThreadPool();
    RestrictionEnzyme enzyme = RestrictionEnzymeManager.getEnzyme("MseI");
    RestrictionMapper mapper = new RestrictionMapper(threadPool);
    mapper.addEnzyme(enzyme);
    mapper.annotate(seq);

This throws:

Exception in thread "Thread-3" java.lang.UnsupportedOperationException: Ambiguity should be handled at the level of the wrapped Alphabet
	at org.biojava.bio.symbol.SoftMaskedAlphabet.getAmbiguity(SoftMaskedAlphabet.java:183)
	at org.biojava.bio.symbol.AlphabetManager.getAllSymbols(AlphabetManager.java:223)
	at org.biojava.bio.seq.io.SymbolListCharSequence.<init>(SymbolListCharSequence.java:75)
	at org.biojava.bio.molbio.RestrictionSiteFinder.run(RestrictionSiteFinder.java:73)
	at org.biojava.utils.SimpleThreadPool$PooledThread.run(SimpleThreadPool.java:295)

Mark Schreiber wrote:
> Hi -
>
> Is there any reason why you need to be running the restriction finder
> over the soft masked sequence?
>
> Can you post some example code to replicate the bug/annoyance?
>
> If you think this is a genuine bug then please submit a biojava bug
> report to http://bugzilla.open-bio.org/
> Please also include the example code that demonstrates the bug.
>
> Thanks.
>
> - Mark
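One possible way around the requirement described in this message (find the cut sites, but export fragments that keep their soft-masking) is to follow the advice given earlier in the thread: run the search on a plain, case-insensitive copy and apply the resulting coordinates to the case-preserved original. The sketch below shows only that idea, on plain Java strings rather than BioJava objects. The names `SoftMaskedDigest`, `digest` and `cutOffset` are hypothetical, and the offset of 1 used for MseI assumes the enzyme cleaves TTAA after the first T.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of BioJava: plain-string illustration only.
final class SoftMaskedDigest {

    /**
     * Splits a soft-masked sequence at every occurrence of a recognition site.
     * Matching is done on an upper-cased copy; the returned fragments are cut
     * from the original string, so they keep their upper/lower case.
     *
     * @param sequence  sequence with mixed upper and lower case
     * @param site      recognition sequence, e.g. "TTAA"
     * @param cutOffset number of site bases that stay on the left-hand fragment
     *                  (assumed to be between 0 and site.length())
     */
    static List<String> digest(String sequence, String site, int cutOffset) {
        String folded = sequence.toUpperCase(Locale.ROOT);
        String foldedSite = site.toUpperCase(Locale.ROOT);

        List<String> fragments = new ArrayList<>();
        int fragmentStart = 0;
        int hit = folded.indexOf(foldedSite);
        while (hit >= 0) {
            int cut = hit + cutOffset;
            // Skip cuts that would produce an empty fragment.
            if (cut > fragmentStart) {
                fragments.add(sequence.substring(fragmentStart, cut));
                fragmentStart = cut;
            }
            // Continue one base further so overlapping sites are also found.
            hit = folded.indexOf(foldedSite, hit + 1);
        }
        // Whatever follows the last cut is the final fragment.
        fragments.add(sequence.substring(fragmentStart));
        return fragments;
    }

    public static void main(String[] args) {
        for (String fragment : digest("TAATAACgagaggttaaCCGG", "TTAA", 1)) {
            System.out.println(fragment);
        }
    }
}
```

Because the coordinates of the upper-cased copy and the original are identical, the fragments always concatenate back to the input. Whether the same two-copy approach avoids the exception inside BioJava's RestrictionMapper is not confirmed in this thread.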
From markjschreiber at gmail.com Wed Feb 28 20:32:29 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 1 Mar 2007 09:32:29 +0800
Subject: [Biojava-l] biojava.utils.bytecode
In-Reply-To: <000601c75b3f$f1409fc0$2101a8c0@camillealain>
References: <000601c75b3f$f1409fc0$2101a8c0@camillealain>
Message-ID: <93b45ca50702281732g59b9e8c4k652f58f98c49f225@mail.gmail.com>

Hi -

You need to download the bytecode.jar which can be found at:
http://biojava.org/wiki/BioJava:Download.

- Mark

On 2/28/07, Alain Favre wrote:
> Hello,
> Looking for this package whwre can I find it
> Thanks for your answer
> A.
Favre > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From salmantahir1 at gmail.com Tue Feb 6 11:21:41 2007 From: salmantahir1 at gmail.com (Salman Tahir) Date: Tue, 6 Feb 2007 11:21:41 +0000 Subject: [Biojava-l] Variable modifications to Protein sequences Message-ID: <2049e51a0702060321l349cd04dp6828b30e3ff980b@mail.gmail.com> Hi, I am emailing regarding a java program that I am writing which deals with the in silico manipulation of cross-linked peptides. I would like to manipulate protein sequences by taking into account variable modifications using BioJava classes (if possible). For example: If I have the following variable modification: hydroxo-BS2GD4 (N-term, K): 118.0563805 And the following sequence: FLEKQNKER (assuming it contains the N-term) I need to generate the following sequences FLEKQNKER -> FLEKQNKER, FLEKhydroxo-BS2GD0QNKER, FLEKQNKhydroxo-BS2GD0ER, FLEKhydroxo-BS2GD0QNKhydroxo-BS2GD0 ER, Fhydroxo-BS2GD0LEK QNKER and so on - generating a total of 8 peptide sequences with different masses. Is there a way i can generate these additional sequences using BioJava? Any help would be mostly appreciated. - Salman From markjschreiber at gmail.com Wed Feb 7 15:30:47 2007 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 7 Feb 2007 23:30:47 +0800 Subject: [Biojava-l] Variable modifications to Protein sequences In-Reply-To: <2049e51a0702060321l349cd04dp6828b30e3ff980b@mail.gmail.com> References: <2049e51a0702060321l349cd04dp6828b30e3ff980b@mail.gmail.com> Message-ID: <93b45ca50702070730n79d0aea2seefa3f3af5cebd1@mail.gmail.com> Hi - I think that the best approach would be to define your own custom alphabet (possibly with a custom tokenization). You could then create SymbolLists or sequences with this new alphabet. 
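
However the modified residues end up being represented (a custom alphabet as suggested here, or something else), the enumeration step from the original question can be sketched in plain Java. This is only an illustration and not BioJava API: the class `VariableModifications`, the method `enumerate`, and the bracketed output notation are assumptions. It treats the N-terminal residue and every K as an optional site, so FLEKQNKER has three sites and 2^3 = 8 variants, and it reports only the mass added by the modifications, not the full peptide mass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: enumerates variable modifications.
final class VariableModifications {

    /** One generated variant: annotated sequence plus the mass added by its modifications. */
    static final class Variant {
        final String annotated;
        final double addedMass;

        Variant(String annotated, double addedMass) {
            this.annotated = annotated;
            this.addedMass = addedMass;
        }
    }

    /**
     * Enumerates every modified/unmodified combination of the candidate sites.
     * Sites are the N-terminal residue (index 0) and each occurrence of
     * targetResidue. Simplification: if the first residue is itself the target
     * residue, it counts as a single site.
     */
    static List<Variant> enumerate(String peptide, char targetResidue,
                                   String label, double deltaMass) {
        // Collect the positions that may carry the modification.
        List<Integer> sites = new ArrayList<>();
        for (int i = 0; i < peptide.length(); i++) {
            if (i == 0 || peptide.charAt(i) == targetResidue) {
                sites.add(i);
            }
        }

        // Each bit of the mask switches one site on or off.
        List<Variant> variants = new ArrayList<>();
        int combinations = 1 << sites.size();
        for (int mask = 0; mask < combinations; mask++) {
            StringBuilder sb = new StringBuilder();
            int siteIndex = 0;
            for (int i = 0; i < peptide.length(); i++) {
                sb.append(peptide.charAt(i));
                if (siteIndex < sites.size() && sites.get(siteIndex) == i) {
                    if ((mask & (1 << siteIndex)) != 0) {
                        sb.append('[').append(label).append(']');
                    }
                    siteIndex++;
                }
            }
            variants.add(new Variant(sb.toString(), Integer.bitCount(mask) * deltaMass));
        }
        return variants;
    }

    public static void main(String[] args) {
        for (Variant v : enumerate("FLEKQNKER", 'K', "hydroxo-BS2GD4", 118.0563805)) {
            System.out.printf("%-65s +%.7f%n", v.annotated, v.addedMass);
        }
    }
}
```

The bit-mask loop is fine for a handful of sites; with many candidate sites the number of variants doubles per site, so a cap on the number of simultaneous modifications would probably be needed.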
- Mark On 2/6/07, Salman Tahir wrote: > Hi, > > I am emailing regarding a java program that I am writing which deals with > the in silico manipulation of cross-linked peptides. > > I would like to manipulate protein sequences by taking into account variable > modifications using BioJava classes (if possible). For example: > > If I have the following variable modification: > hydroxo-BS2GD4 (N-term, K): 118.0563805 > > And the following sequence: FLEKQNKER (assuming it contains the N-term) > > I need to generate the following sequences > FLEKQNKER -> FLEKQNKER, FLEKhydroxo-BS2GD0QNKER, FLEKQNKhydroxo-BS2GD0ER, > FLEKhydroxo-BS2GD0QNKhydroxo-BS2GD0 ER, Fhydroxo-BS2GD0LEK QNKER and so on - > generating a total of 8 peptide sequences with different masses. > > Is there a way i can generate these additional sequences using BioJava? Any > help would be mostly appreciated. > > - Salman > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From su24 at st-andrews.ac.uk Wed Feb 7 16:11:37 2007 From: su24 at st-andrews.ac.uk (Saif Ur-Rehman) Date: Wed, 7 Feb 2007 16:11:37 +0000 Subject: [Biojava-l] Use Of Blosum62 Message-ID: <1170864697.45c9fa39303a5@webmail.st-andrews.ac.uk> Dear All, I am having a problem using the Blosum62 matrix file from the NCBI website for use in Smith-Waterman alignments. 
When I attempt to run my code, reading in the file to initialise the
SubstitutionMatrix object, the following exception is thrown:

"org.biojava.bio.symbol.IllegalSymbolException: This tokenization doesn't
contain character: '*'"

Thanking you in advance,

Saif

--------------------------------------------------------------------------------
Saif Ur-Rehman
Research Student
The Centre for Evolution, Genes & Genomics (CEGG)
Sir Harold Mitchell Building
School of Biology
The University of St Andrews
St Andrews,
Fife
Scotland,UK

------------------------------------------------------------------
University of St Andrews Webmail: https://webmail.st-andrews.ac.uk

From su24 at st-andrews.ac.uk Wed Feb 7 16:38:52 2007
From: su24 at st-andrews.ac.uk (Saif Ur-Rehman)
Date: Wed, 7 Feb 2007 16:38:52 +0000
Subject: [Biojava-l] Use Of Blosum62
In-Reply-To: <45C9FD73.5040200@ebi.ac.uk>
References: <1170864697.45c9fa39303a5@webmail.st-andrews.ac.uk>
	<45C9FD73.5040200@ebi.ac.uk>
Message-ID: <1170866332.45ca009c26757@webmail.st-andrews.ac.uk>

Hi,

Here is the code I am using.

Cheers,

Saif

    public static void main(String args[]) {
        String x = "prot sequence excluded";
        String y = "prot sequence excluded";
        try {
            FiniteAlphabet alphabet =
                (FiniteAlphabet) AlphabetManager.alphabetForName("PROTEIN");
            SubstitutionMatrix matrix =
                new SubstitutionMatrix(alphabet, new File("///Users/su24/BLOSUM62"));
            // The above line causes the exception
            SequenceAlignment aligner = new SmithWaterman(-1, 3, 2, 2, 1, matrix);
            Sequence query = ProteinTools.createProteinSequence(x, "query");
            Sequence target = ProteinTools.createProteinSequence(y, "target");
            aligner.pairwiseAlignment(query, target);
            System.out.println("\nlocal alignment with SmithWaterman:\n"
                    + aligner.getAlignmentString());
        } catch (Exception exc) {
            exc.printStackTrace();
        }
    }
    }

Quoting Richard Holland :
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi.
You need to include your complete program code in your email so that > we can accurately diagnose the problem. > > cheers, > Richard > > Saif Ur-Rehman wrote: > > Dear All, > > > > I am having a problem using the Blosum62 matrix file from the NCBI website > for > > use in Smith-Waterman alignments. When I attempt to run my code reading in > the > > file to initialise the Substitution Matrix Object the following exception > is > > thrown > > > > "org.biojava.bio.symbol.IllegalSymbolException: This tokenization doesn't > > contain character: '*'" > > > > > > > > Thanking you in advance, > > > > Saif > > > > > > > > > -------------------------------------------------------------------------------- > > Saif Ur-Rehman > > Research Student > > The Centre for Evolution, Genes & Genomics (CEGG) > > Sir Harold Mitchell Building > > School of Biology > > The University of St Andrews > > St Andrews, > > Fife > > Scotland,UK > > > > ------------------------------------------------------------------ > > University of St Andrews Webmail: https://webmail.st-andrews.ac.uk > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFFyf1y4C5LeMEKA/QRAp7zAJ9ncsgYyBBRDrgSBnANTYv0xYI+PwCgpGv0 > cQQ8vFVB8QtKW9kAt9pRSEQ= > =2rLL > -----END PGP SIGNATURE----- > ------------------------------------------------------------------------------- Saif Ur-Rehman Research Student The Centre for Evolution, Genes & Genomics (CEGG) Sir Harold Mitchell Building School of Biology The University of St Andrews St Andrews, Fife Scotland,UK ------------------------------------------------------------------ University of St Andrews Webmail: https://webmail.st-andrews.ac.uk From andreas.draeger at uni-tuebingen.de Wed Feb 7 18:15:59 2007 From: 
andreas.draeger at uni-tuebingen.de (=?ISO-8859-1?Q?Andreas_Dr=E4ger?=) Date: Wed, 07 Feb 2007 19:15:59 +0100 Subject: [Biojava-l] Use Of Blosum62 In-Reply-To: <1170866332.45ca009c26757@webmail.st-andrews.ac.uk> References: <1170864697.45c9fa39303a5@webmail.st-andrews.ac.uk> <45C9FD73.5040200@ebi.ac.uk> <1170866332.45ca009c26757@webmail.st-andrews.ac.uk> Message-ID: <45CA175F.1050105@uni-tuebingen.de> Hey Saif, > FiniteAlphabet alphabet > (FiniteAlphabetAlphabetManager.alphabetForName("PROTEIN"); > This cannot work because the symbol "*" is indeed not included in the alphabet "PROTEIN" as it marks the termination symbol. You will need to use the alphabet "PROTEIN-TERM". I hope that helps. If I remember correctly, there have been discussions about that special character in previous e-mails on that list. Cheers, Andreas -- Dipl.-Bioinform. Andreas Dr?ger Eberhard Karls University T?bingen Center for Bioinformatics (ZBIT) Sand 1 72076 T?bingen Germany Phone: +49-7071-29-70436 Fax: +49-7071-29-5091 From Russell.Smithies at agresearch.co.nz Mon Feb 12 00:55:17 2007 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 12 Feb 2007 13:55:17 +1300 Subject: [Biojava-l] New Zealand bioinformatician vacancy In-Reply-To: <6dce9a0b0702110814v16d50c5dtc1c8c6f21f14c450@mail.gmail.com> References: <1170950909.6074.30.camel@localhost><6dce9a0b0702081123j3e8efe89r4e2a93b88d4029d9@mail.gmail.com><1170967751.6084.18.camel@localhost><6dce9a0b0702081307s11683737r17e8d22b47ed6fef@mail.gmail.com><1170970046.6084.36.camel@localhost><6dce9a0b0702081552n28e0aa79x79a8d4eb659fe119@mail.gmail.com><1171105304.5853.11.camel@localhost> <6dce9a0b0702110814v16d50c5dtc1c8c6f21f14c450@mail.gmail.com> Message-ID: Hi all, I hope you don't mind me posting an advert but we have an opening for a bioinformatician. So if you'd like to work in New Zealand, read on... 
thanx, Russell Russell Smithies Bioinformatics Software Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz AgResearch Farming Food and Health. First Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz ========================================================== Bioinformatician AgResearch - Applied Biotechnologies As part of AgResearch's company strategy we are continuing to grow our business in the area of bioinformatics. This capability is essential for our science discovery. In this position you will be part of a national team of 26 bioinformaticians, mathematical biologists and statisticians and be based at our Grasslands campus at Palmerston North. This is a permanent position. You will be an advocate for bioinformatics within AgResearch; you will work collaboratively on projects and will provide bioinformatics training and advice to science staff working in the biotechnology area. We are seeking a person who has: * An excellent tertiary qualification in molecular biology or genetics * Experience with the use of bioinformatics applications * Knowledge of life sciences databases and the internet * Well developed IT technical skills and web based technologies * Experience in a training environment * Excellent writing, speaking and interpersonal skills * Familiarity with Perl, Java or Unix Scripting If you possess the above skills, we would like to hear from you. To find out more about this position please contact Peter Johnstone by email peter.johnstone at agresearch.co.nz or alternatively phone +64 3 489 9081. For a job description and to apply on line please go to http://www.agresearch.co.nz/recruitment . Vacancy No AGR494. Please provide contact details for 2 Referees with your application. For general information on AgResearch please visit our website at www.agresearch.co.nz Applications close 9th March 2007. 
Linda Murray Science Administrator AgResearch Limited Invermay Agricultural Centre Private Bag 50034 Mosgiel, New Zealand Phone: +64 3 489 9011 Email: linda.murray at agresearch.co.nz From salmantahir1 at gmail.com Mon Feb 12 10:41:40 2007 From: salmantahir1 at gmail.com (Salman Tahir) Date: Mon, 12 Feb 2007 10:41:40 +0000 Subject: [Biojava-l] Tokenization Message-ID: <2049e51a0702120241k135a92edl235910394f1d124c@mail.gmail.com> Hi, I've created a custom alphabet and I would like to use it to parse a file. After defining the symbols in the alphabet I do the following: SymbolTokenization parser = alpha.getTokenization("token"); ('alpha being the custom alphabet'). However I get the following error: "BioException: There is no tokenization 'token' defined in alphabet Variable Modifications" I understand this to mean that I have to populate the new tokenization appropriately because this is a new alphabet. However, I cannot find any examples that show how to achieve this. Can anyone help?
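
The error quoted above can be pictured with a small plain-Java model. This is only a sketch under stated assumptions: ToyAlphabet and its methods are hypothetical names, not BioJava classes, and the token-to-symbol entries are made up for illustration. It shows that an alphabet only knows the tokenizations that were registered on it under a given name, so a freshly built alphabet has nothing to return for "token" until one is registered. In BioJava 1.x the analogous step appears to be registering a SymbolTokenization on the custom alphabet under that name (for example via SimpleAlphabet.putTokenization); treat that as an assumption to check against the Javadoc.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Plain-Java model only: hypothetical names, NOT the BioJava API.
final class ToyAlphabet {
    private final String name;
    // tokenization name -> (token text -> symbol name)
    private final Map<String, Map<String, String>> tokenizations = new HashMap<>();

    ToyAlphabet(String name) {
        this.name = name;
    }

    /** Register a token-to-symbol mapping under a tokenization name. */
    void putTokenization(String tokenizationName, Map<String, String> tokenToSymbol) {
        tokenizations.put(tokenizationName, new HashMap<>(tokenToSymbol));
    }

    /** Look up a tokenization; fails if nothing was registered under that name. */
    Map<String, String> getTokenization(String tokenizationName) {
        Map<String, String> t = tokenizations.get(tokenizationName);
        if (t == null) {
            throw new NoSuchElementException(
                "There is no tokenization '" + tokenizationName
                    + "' defined in alphabet " + name);
        }
        return t;
    }
}
```

With this model, calling getTokenization("token") on a new ToyAlphabet("Variable Modifications") throws, and succeeds once putTokenization("token", ...) has been called with a mapping for every symbol in the alphabet.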
- Salman

From m.titulaer at erasmusmc.nl Wed Feb 14 15:38:58 2007
From: m.titulaer at erasmusmc.nl (Mark Titulaer)
Date: Wed, 14 Feb 2007 16:38:58 +0100
Subject: [Biojava-l] Conversion Applied Bioisystems MALDI Voyager DE-STR .dat file into peak list
Message-ID: <45D32D12.3070800@erasmusmc.nl>

I would like to convert an Applied Biosystems MALDI Voyager DE-STR .dat file into a peak list. Is there any library available?

From paolo.romano at istge.it Tue Feb 20 10:39:30 2007
From: paolo.romano at istge.it (Paolo Romano)
Date: Tue, 20 Feb 2007 11:39:30 +0100
Subject: [Biojava-l] Final Call for Papers NETTAB 2007
Message-ID: <7.0.1.0.0.20070220113922.01e9c968@istge.it>

=================================
Apologies, if you're receiving multiple copies
=================================

Final Call for Papers

7th International Workshop NETTAB 2007
June 12-15, 2007
Department of Computer Science, University of Pisa, Pisa
http://www.nettab.org/2007/

FOCUS THEME
A Semantic Web for Bioinformatics: Goals, Tools, Systems, Applications

ADJUNCT THEMES
Algorithms in bioinformatics
Formal Methods for Systems Biology
Network Tools and Applications in Bioinformatics

AIMS
This workshop aims at getting together biologists, bioinformaticians, computer scientists and linguists to try to answer the following questions:
- Is the Semantic Web of some use for Bioinformatics?
- Which goals should a Semantic Web for Bioinformatics have?
- Which standards, technologies and tools of the Semantic Web can most profitably be used in Bioinformatics?
- Which applications has the Semantic Web already found in Bioinformatics?
- Which current Bioinformatics research problems can be solved by the Semantic Web?
- What are the short, medium and long term perspectives in applying Semantic Web technologies to Bioinformatics?

The workshop also intends to: - introduce the basic knowledge of related standards and technologies, in a non trivial way through invited lectures and tutorials - outline the promising features of the Semantic Web in bioinformatics through invited lectures and open discussion - show some valuable examples in bioinformatics through invited lectures, oral communications and posters - support as much discussion as possible through open discussions and a panel discussion - practically demonstrate "how it works" through tutorials INVITED SPEAKERS We are extremely glad to inform you that four excellent speakers have accepted to give an invited lecture at next NETTAB workshop. + Session on Goals of the Semantic Web in Bioinformatics Invited speaker: Eric Neumann, Co-chair of the Health Care and Life Science Interest Group of the Semantic Web Activity at W3C & Teranode Inc., USA + Session on Semantic Web Standards and Technologies Invited speaker: Guus Schreiber, Chair of Semantic Web Deployment Interest Group at W3C & University of Amsterdam, The Netherlands + Session on Semantic Web Tools Invited speaker: Olivier Bodenreider, Medical Ontology Research, National Library of Medicine, USA + Session on Applications of Semantic Web in Bioinformatics Invited speakers: Michael Schroeder, Biotec TU Dresden, Germany, and Albert Burger, Heriot-Watt University, Scotland, UK Titles of the lectures will soon be announced. TOPICS NETTAB workshops also include general sessions on Network Tools and Applications in Biology and adjunct focus topics selected by local organizers. 
You are therefore welcome to submit your work on any of the following topics:
Goals of a Semantic Web for Bioinformatics:
Standards, Technologies, Tools for the Semantic Web
Systems for a Semantic Web for Bioinformatics:
Existing and prospective applications of the Semantic Web for Bioinformatics
Algorithms in Bioinformatics
Formal Methods for Systems Biology
Network Tools and Applications in Bioinformatics

DEADLINES
Submission of oral communications: March 16, 2007
Proposals for tutorials: March 16, 2007
Submission of posters: April 20, 2007
Early Registration Deadline: April 27, 2007

SUBMISSION OF CONTRIBUTIONS
You are welcome to submit your contributions through the MyReview system at the following URL: http://www.nettab.org/2007/myreview/ . This is a two-step submission procedure: you will have to submit an abstract first and the paper later. You are invited to prepare your contribution according to the instructions available at http://www.biomedcentral.com/bmcbioinformatics/ifora/ (see especially the "Preparing main manuscript text" section). See the workshop's web site for more details about the preparation of the papers.

PUBLICATION OF PAPERS AND POSTERS
All accepted oral communications and posters will be published in the Workshop's Proceedings that will be distributed to all participants. Selected papers will be published by BMC Bioinformatics as a Supplement Issue. When submitting, you will be asked if your submission is intended for the workshop only or for publication in BMC Bioinformatics too. The papers will be reviewed by the members of the Programme Committee. They will decide whether the papers are accepted for the workshop only or for publication in BMC Bioinformatics as well. The workshop undertakes to pay part of the cost of the publication, but please be advised that, depending on the final budget of the workshop, you could be requested to contribute to publication costs, which have been set to 600 British pounds per paper.

For any further information or clarification, please contact the organization by email at info at nettab.org .

CHAIRS
P. Romano, Bioinformatics, Natl Cancer Research Inst., Italy
M. Schröder, Biotechnology Centre, TU Dresden, Germany
N. Cannata, Mathematics & Computer Science Dept, Univ. of Camerino, Italy
O. Signore, ISTI, National Research Council, Italy

PROGRAMME COMMITTEE
G. Armano, Electrical and Electronic Engineering Dept, Univ. of Cagliari, IT
C. Baker, Institute for Infocomm Research (I2R), SG
P. Barahona, Department of Informatics, New University of Lisboa, PT
L. Barrio-Alvers, Transinsight GmbH, DE
O. Bodenreider, National Library of Medicine, USA
A. Burger, Department of Computer Science, Heriot-Watt University, UK
M. Cannataro, Experimental and Clinical Medicine Dept, Univ. of Catanzaro "Magna Graecia", IT
W. Ceusters, Bioinformatics and Life Sciences, University at Buffalo, USA
M. Cockerill, BioMed Central, UK
M.-D. Devignes, LORIA, Vandoeuvre les Nancy, FR
R. Dieng, INRIA, Sophia Antipolis, FR
L. Grivell, European Molecular Biology Organisation, DE
M. Harris, European Bioinformatics Institute, UK
M. Helmer-Citterich, Biology Dept, University of Rome "Tor Vergata", IT
C. M. Keet, Computer Science Faculty, Free University of Bozen-Bolzano, IT
J. Koehler, Biomathematics and Bioinformatics, Rothamsted Research, UK
M. Krallinger, Spanish National Cancer Research Center (CNIO), ES
L. Krippahl, Department of Informatics, New University of Lisboa, PT
P. Lambrix, Computer and Information Science Dept, Linköping University, SE
U. Leser, Institute for Computer Science, Humboldt-University of Berlin, DE
J. Luciano, Department of Genetics, Harvard Medical School, USA
R. Marangoni, Computer Science Department, University of Pisa, IT
M. Marchiori, Pure and Applied Mathematics Dept, University of Padua, IT
M. Masseroli, Department of BioEngineering, Polytechnic of Milan, IT
G. Mauri, Informatics Systems and Communication Dept, Univ. Milan "Bicocca", IT
E. Merelli, Mathematics and Computer Science Dept, University of Camerino, IT
S. Moeller, Institute of Neuro- and Bioinformatics, University of Lübeck, DE
S. Philippi, Institute for Software Technology, Univ. of Koblenz-Landau, DE
D. Quann, IBM Software Group, USA
D. Rubin, Stanford Medical Informatics, Stanford University Medical Center, USA
S.-A. Sansone, European Bioinformatics Institute, UK
M. Senger, International Rice Research Institute, PH
D. Turi, School of Computer Science, University of Manchester, UK
G. Vetere, IBM Center for Advanced Studies of Rome, IT
D. Zaccagnini, Language and Computing, USA

Looking forward to seeing you all in Tuscany.

Best regards.

On behalf of the Chairs and of the Organizing Committee
Paolo Romano

Paolo Romano (paolo.romano at istge.it)
Bioinformatics and Structural Proteomics
National Cancer Research Institute (IST)
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288 Fax: +39-010-5737-295

From kumar at neb.com Wed Feb 21 21:33:48 2007
From: kumar at neb.com (Sanjay Kumar)
Date: Wed, 21 Feb 2007 16:33:48 -0500
Subject: [Biojava-l] Job opportunity - Bioinformatics Software Developer
Message-ID:

Hi-

Sorry for the job posting, but NEB has an opening for a bioinformatics software developer that might be just right for someone with BioJava experience. We are located just north of Boston Massachusetts, USA, and the work environment is wonderful. The posting is listed below and at our web site: http://www.neb.com/nebecomm/career.asp

If interested, please email your C.V. and statement of interest to: resumes at neb.com Attn: Job Code 1102SK

Cheers.
-sanjay

======================
Sanjay Kumar, PhD
Research Scientist
New England Biolabs
=================================================================

Bioinformatics Software Developer

Summary: Bioinformatics software developer who will work under the direction of a principal investigator and collaborate with other scientists in designing, developing, testing and maintaining computational tools, web services, and potentially large relational databases to support the study of diverse research topics in molecular biology relevant to New England Biolabs. At minimum, a BS degree in computer science, biology, physics, or related field plus 2 years experience as a software developer is required. Proven ability to design and implement projects using RDBMS software and web-based user interfaces is essential.

Primary Responsibilities: Accurately evaluate proposed projects in terms of their feasibility, required resources, and estimated timelines and milestones, and effectively communicate the results to the principal investigator. Design, implement, test, deploy, and maintain relational databases containing molecular biology/genomic data. Translate user requirements into a database schema. Design, implement, test, deploy, and maintain software tools for querying/browsing/updating data in relational databases, with emphasis on web accessible applications. Design, implement, test, deploy, and maintain software tools to access external data sources, e.g., at NCBI, EBI, and WormBase, with emphasis on web accessible applications. Assemble bioinformatics pipelines with custom and/or third-party software. Provide technical support to end users when necessary.

Technical Expertise: Expertise in at least one of the following languages: Java, Perl, Python, C++, or C is required. Experience writing SQL (including DDL) preferably for Apache Derby, PostgreSQL, Sybase, or MS SQLServer is required.
Web programming experience using HTML, Javascript, and PHP, and familiarity with Apache/Tomcat, Glassfish, or Sun Web Server is preferred. Experience with object relational mapping frameworks (Hibernate, JDO) and middleware (J2EE/EJB, Spring) is preferred. Experience with SOAP web services is preferred. Familiarity with Mac OS, Windows and UNIX and the ability to move easily between them is beneficial.

If interested, please email your C.V. and statement of interest to: resumes at neb.com Attn: Job Code 1102SK

From crackeur at comcast.net Sun Feb 25 04:28:32 2007
From: crackeur at comcast.net (Jimmy Zhang)
Date: Sat, 24 Feb 2007 20:28:32 -0800
Subject: [Biojava-l] [ANN]VTD-XML 2.0
References: <1133171676.2994.75.camel@zidane>
Message-ID: <015e01c75895$86e20800$0d02a8c0@ximpleware>

The VTD-XML project team is proud to announce the release of version 2.0 of VTD-XML, the next generation XML parser/indexer. The new features introduced in this version are:

* VTD+XML version 1.0: the world's first true native XML index that is simple, general-purpose and back-compatible with XML.
* NodeRecorder class that saves VTDNav's cursor location for later sequential access.
* Overwrite capability
* Lexical comparisons between VTD and strings

To download the software, please go to http://sourceforge.net/project/showfiles.php?group_id=110612

To read the latest benchmark report, please go to http://vtd-xml.sf.net/benchmark1.html

To get the latest API overview: http://www.ximpleware.com/vtd-xml_intro.pdf

From ilhami.visne at gmail.com Tue Feb 27 14:47:38 2007
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Tue, 27 Feb 2007 15:47:38 +0100
Subject: [Biojava-l] case-sensitive sequences
Message-ID:

My sequence files contain case-sensitive symbols (TAATAACgagagg) and I am currently using RichSequenceIterator to iterate over the sequences.

How can I tell BioJava that it should parse them case-sensitively?
If I call the seq.seqString() method, it should return the sequence exactly as it was in the file, with upper and lower case preserved.

thanx.

From ilhami.visne at gmail.com Tue Feb 27 17:54:13 2007
From: ilhami.visne at gmail.com (ilhami visne)
Date: Tue, 27 Feb 2007 18:54:13 +0100
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <45E44B6F.9090202@ebi.ac.uk>
References: <45E44B6F.9090202@ebi.ac.uk>
Message-ID: <45E47045.5010400@gmail.com>

Thank you for the quick answer. Here is the relevant part of my code:

BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
RichSequenceIterator iter = RichSequence.IOTools.readFastaDNA(br, null);
RichSequence rs = iter.nextRichSequence();

Richard Holland wrote:
> DNA is not case-sensitive. What I suspect you are parsing is the output
> of some sequencing software which is using case as a rough indicator of
> base calling quality?
>
> The case will have been lost when the file was parsed, not at the moment
> you iterate over the resulting sequences. This means that you have to
> modify your file parsing method to become case-sensitive.
>
> The default DNA alphabet is not case-sensitive. It makes no distinction
> between the two, and will convert everything to one case.
>
> If you need to preserve case, you will need to use a custom alphabet
> which treats the cases differently, and also specify a tokenizer which
> is case-sensitive. See the help pages at http://biojava.org/ for help on
> creating new alphabets. Or, have a look at the ABITools.QUALITY alphabet
> in BioJava, which interprets the case and stores the quality scores
> separately.
>
> Note however that your custom alphabet is NOT the same as the original
> DNA alphabet, and so you may not be able to use it in all the standard
> transforms (RNA etc.). If you do want to use these then you will have to
> make a second copy of each sequence using the normal DNA alphabet and
> pass that copy to the routines.
>
> If you post to this list the code you are using to read the file, then I
> can show you where to insert the reference to this new alphabet.
>
> cheers,
> Richard

From markjschreiber at gmail.com Wed Feb 28 02:54:57 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 28 Feb 2007 10:54:57 +0800
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <45E47045.5010400@gmail.com>
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com>
Message-ID: <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com>

Hi -

There are also the classes SoftMaskedAlphabet, SoftMaskedAlphabet.CaseSensitiveTokenization and SoftMaskedAlphabet.MaskingDetector. Together these classes let you read a sequence that contains case-sensitive information and (if you wish) make use of that information. You can also write out the sequence in the original case-sensitive format.

It was originally designed for reading data that had been 'softmasked' for low complexity regions (e.g. lower case regions are low complexity and would be ignored in subsequent analysis) but it could be used for quality or any other distinction.
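
The soft-masking idea described above can be sketched in plain Java. This is an illustration under stated assumptions, not BioJava code: SoftMaskSketch and its methods are hypothetical names, and the sketch works on plain strings rather than SymbolLists. It keeps the case information as a list of lower-case intervals, produces an upper-case copy that case-insensitive routines can consume, and restores the original casing afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only: hypothetical names, NOT BioJava classes.
final class SoftMaskSketch {

    /** Upper-case copy for routines that expect the ordinary DNA alphabet. */
    static String unmasked(String softMasked) {
        return softMasked.toUpperCase(Locale.ROOT);
    }

    /** Half-open [start, end) intervals covering the lower-case (masked) residues. */
    static List<int[]> maskedRegions(String softMasked) {
        List<int[]> regions = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a masked run"
        for (int i = 0; i < softMasked.length(); i++) {
            boolean masked = Character.isLowerCase(softMasked.charAt(i));
            if (masked && start < 0) {
                start = i;
            } else if (!masked && start >= 0) {
                regions.add(new int[] {start, i});
                start = -1;
            }
        }
        if (start >= 0) {
            regions.add(new int[] {start, softMasked.length()});
        }
        return regions;
    }

    /** Re-apply recorded masked intervals to an upper-case sequence. */
    static String remask(String upper, List<int[]> regions) {
        StringBuilder sb = new StringBuilder(upper);
        for (int[] r : regions) {
            for (int i = r[0]; i < r[1]; i++) {
                sb.setCharAt(i, Character.toLowerCase(sb.charAt(i)));
            }
        }
        return sb.toString();
    }
}
```

For the example sequence from this thread, TAATAACgagagg, the masked region is the single interval covering positions 7 to 12 (zero-based), and remasking the upper-case copy gives back the original string.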
- Mark From ilhami.visne at gmail.com Wed Feb 28 05:15:53 2007 From: ilhami.visne at gmail.com (ilhami visne) Date: Wed, 28 Feb 2007 06:15:53 +0100 Subject: [Biojava-l] case-sensitive sequences In-Reply-To: <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com> References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com> <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com> Message-ID: <45E51009.2090605@gmail.com> Thank you. it does now. i should able to find it myself, but i am really not a bioinformaticians yet.
My code (in case someone else has the same problem):

BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));

Alphabet dna = SoftMaskedAlphabet.getInstance(DNATools.getDNA());
SymbolTokenization dnaParser = dna.getTokenization("token");

RichSequenceIterator iter = RichSequence.IOTools.readFasta(br, dnaParser, null);
RichSequence rs = iter.nextRichSequence();

From ilhami.visne at gmail.com Wed Feb 28 07:48:46 2007
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Wed, 28 Feb 2007 08:48:46 +0100
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <45E51009.2090605@gmail.com>
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com> <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com> <45E51009.2090605@gmail.com>
Message-ID:

I've changed my code and called the RestrictionSiteFinder with the new sequence. It threw this exception:

Exception in thread "Thread-25" java.lang.UnsupportedOperationException: Ambiguity should be handled at the level of the wrapped Alphabet
    at org.biojava.bio.symbol.SoftMaskedAlphabet.getAmbiguity(SoftMaskedAlphabet.java:183)
    at org.biojava.bio.symbol.AlphabetManager.getAllSymbols(AlphabetManager.java:223)
    at org.biojava.bio.seq.io.SymbolListCharSequence.<init>(SymbolListCharSequence.java:75)
    at org.biojava.bio.molbio.RestrictionSiteFinder.run(RestrictionSiteFinder.java:73)
    at org.biojava.utils.SimpleThreadPool$PooledThread.run(SimpleThreadPool.java:295)

I understand why it didn't work (lower-case symbol 'a' and upper-case symbol 'A'), but I can't find a solution. Any idea?

On 2/28/07, ilhami visne wrote:
> Thank you. it does now. i should able to find it myself, but i am really not a bioinformaticians yet.
>
> my code (maybe there is someone, who has the same problem like me)
>
> BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
>
> Alphabet dna = SoftMaskedAlphabet.getInstance(DNATools.getDNA());
> SymbolTokenization dnaParser = dna.getTokenization("token");
>
> RichSequenceIterator iter =
>     RichSequence.IOTools.readFasta(br,dnaParser,null);
> RichSequence rs = iter.nextRichSequence();
>
> Mark Schreiber wrote:
> > Hi -
> >
> > There are also the classes: SoftMaskedAlphabet and
> > SoftMaskedAlphabet.CaseSensitiveTokenization and
> > SoftMaskedAlphabet.MaskingDetector. Together these classes let you
> > read a sequence that contains case sensitive information and (if you
> > wish) make use of that information. You can also write out the
> > sequence in the original case sensitive format.
> >
> > It was originally designed for reading data that had been 'softmasked'
> > for low complexity regions (eg lower case regions are low complexity
> > and would be ignored in subsequent analysis) but it would be used for
> > quality or any other distinction.
> >
> > - Mark
> >
> > On 2/28/07, ilhami visne wrote:
> >> Thank you for quick answer. Here is the part of my code:
> >>
> >> BufferedReader br = new BufferedReader(new FileReader("seq.fasta"));
> >> RichSequenceIterator iter = RichSequence.IOTools.readFastaDNA(br,null);
> >> RichSequence rs = iter.nextRichSequence();
> >>
> >> Richard Holland wrote:
> >> > DNA is not case-sensitive. What I suspect you are parsing is the output
> >> > of some sequencing software which is using case as a rough indicator of
> >> > base calling quality?
> >> >
> >> > The case will have been lost when the file was parsed, not at the moment
> >> > you iterate over the resulting sequences. This means that you have to
> >> > modify your file parsing method to become case-sensitive.
> >> >
> >> > The default DNA alphabet is not case-sensitive. It makes no distinction
> >> > between the two, and will convert everything to one case.
> >> >
> >> > If you need to preserve case, you will need to use a custom alphabet
> >> > which treats the cases differently, and also specify a tokenizer which
> >> > is case-sensitive. See the help pages at http://biojava.org/ for help on
> >> > creating new alphabets. Or, have a look at the ABITools.QUALITY alphabet
> >> > in BioJava, which interprets the case and stores the quality scores
> >> > separately.
> >> >
> >> > Note however that your custom alphabet is NOT the same as the original
> >> > DNA alphabet, and so you may not be able to use it in all the standard
> >> > transforms (RNA etc.). If you do want to use these then you will have to
> >> > make a second copy of each sequence using the normal DNA alphabet and
> >> > pass that copy to the routines.
> >> >
> >> > If you post to this list the code you are using to read the file, then I
> >> > can show you where to insert the reference to this new alphabet.
> >> >
> >> > cheers,
> >> > Richard
> >> >
> >> > Ilhami Visne wrote:
> >> >
> >> >> my sequence files contain case-sensitive symbols (TAATAACgagagg) and i am
> >> >> using now RichSequenceIterator to iterate over the sequences.
> >> >>
> >> >> How can i tell biojava that it should parse it case-sensitive? if i call
> >> >> seq.seqString() method, it should return exactly like it was in the file
> >> >> with upper- and lower-case.
> >> >>
> >> >> thanx.

From markjschreiber at gmail.com Wed Feb 28 11:03:24 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 28 Feb 2007 19:03:24 +0800
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To:
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com> <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com> <45E51009.2090605@gmail.com>
Message-ID: <93b45ca50702280303j4480c8d4kd69c9aa41c0221d2@mail.gmail.com>

Hi -

Is there any reason why you need to be running the restriction finder over the soft masked sequence?

Can you post some example code to replicate the bug/annoyance?

If you think this is a genuine bug then please submit a biojava bug report to http://bugzilla.open-bio.org/
Please also include the example code that demonstrates the bug.

Thanks.

- Mark

On 2/28/07, Ilhami Visne wrote:
> i've changed my code and called the RestrictionSiteFinder with the new
> sequence. it's throwed this exception.
>
> Exception in thread "Thread-25"
> java.lang.UnsupportedOperationException: Ambiguity should be handled
> at the level of the wrapped Alphabet
>     at org.biojava.bio.symbol.SoftMaskedAlphabet.getAmbiguity(SoftMaskedAlphabet.java:183)
>     at org.biojava.bio.symbol.AlphabetManager.getAllSymbols(AlphabetManager.java:223)
>     at org.biojava.bio.seq.io.SymbolListCharSequence.<init>(SymbolListCharSequence.java:75)
>     at org.biojava.bio.molbio.RestrictionSiteFinder.run(RestrictionSiteFinder.java:73)
>     at org.biojava.utils.SimpleThreadPool$PooledThread.run(SimpleThreadPool.java:295)
>
> i understand why it didn't work (lower case symbol 'a' and upper
> symbol 'A'), but i can't find a solution. Any idea?

From alainfa at bluewin.ch Wed Feb 28 13:54:17 2007
From: alainfa at bluewin.ch (Alain Favre)
Date: Wed, 28 Feb 2007 14:54:17 +0100
Subject: [Biojava-l] biojava.utils.bytecode
Message-ID: <000601c75b3f$f1409fc0$2101a8c0@camillealain>

Hello,

I am looking for this package. Where can I find it?

Thanks for your answer.

A. Favre

From ilhami.visne at gmail.com Wed Feb 28 19:56:36 2007
From: ilhami.visne at gmail.com (ilhami visne)
Date: Wed, 28 Feb 2007 20:56:36 +0100
Subject: [Biojava-l] case-sensitive sequences
In-Reply-To: <93b45ca50702280303j4480c8d4kd69c9aa41c0221d2@mail.gmail.com>
References: <45E44B6F.9090202@ebi.ac.uk> <45E47045.5010400@gmail.com> <93b45ca50702271854h3253e04bw3e9ba2f98ddc1270@mail.gmail.com> <45E51009.2090605@gmail.com> <93b45ca50702280303j4480c8d4kd69c9aa41c0221d2@mail.gmail.com>
Message-ID: <45E5DE74.8040008@gmail.com>

Yes, there is one. I am writing a small program that my coworker will use. First it downloads some repeat-masked sequences, then it finds the cuts with RestrictionSiteFinder, and then it exports all fragments. These fragments should be repeat-masked (case-sensitive) too. After I find the cut positions, I'll extract the fragments with SequenceTools.subSequence() and write them out.

Here is the code sample:

BufferedReader br = new BufferedReader(new FileReader("aSeq.fasta"));

Alphabet maskeddna = SoftMaskedAlphabet.getInstance(DNATools.getDNA());
SymbolTokenization dnaParser = maskeddna.getTokenization("token");

RichSequenceIterator iter = RichSequence.IOTools.readFasta(br,dnaParser,null);
RichSequence seq = iter.nextRichSequence();

SimpleThreadPool threadPool = new SimpleThreadPool();
RestrictionEnzyme enzyme = RestrictionEnzymeManager.getEnzyme("MseI");
RestrictionMapper mapper = new RestrictionMapper(threadPool);
mapper.addEnzyme(enzyme);
mapper.annotate(seq);

This throws:

Exception in thread "Thread-3" java.lang.UnsupportedOperationException: Ambiguity should be handled at the level of the wrapped Alphabet
    at org.biojava.bio.symbol.SoftMaskedAlphabet.getAmbiguity(SoftMaskedAlphabet.java:183)
    at org.biojava.bio.symbol.AlphabetManager.getAllSymbols(AlphabetManager.java:223)
    at org.biojava.bio.seq.io.SymbolListCharSequence.<init>(SymbolListCharSequence.java:75)
    at org.biojava.bio.molbio.RestrictionSiteFinder.run(RestrictionSiteFinder.java:73)
    at org.biojava.utils.SimpleThreadPool$PooledThread.run(SimpleThreadPool.java:295)

Mark Schreiber wrote:
> Hi -
>
> Is there any reason why you need to be running the restriction finder
> over the soft masked sequence?
>
> Can you post some example code to replicate the bug/annoyance?
>
> If you think this is a genuine bug then please submit a biojava bug
> report to http://bugzilla.open-bio.org/
> Please also include the example code that demonstrates the bug.
>
> Thanks.
>
> - Mark
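
The workflow described in the last message (find the cut sites, then export fragments that keep their soft-masking) can be sketched on plain Java strings. This is only an illustration and does not use any BioJava API: the class name MaskedDigest, the method digest, and the example sequence are made up for this sketch. It assumes the MseI recognition site TTAA with the cut after the first T. The search runs on an upper-cased copy, in the spirit of the earlier suggestion to hand a plain-DNA copy to the routines, and the fragments are sliced out of the original mixed-case string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of BioJava. */
final class MaskedDigest {

    /**
     * Cuts a soft-masked sequence at every occurrence of a recognition site.
     * The site search is case-insensitive (it runs on an upper-cased copy),
     * but fragments are taken from the original string, so lower-case
     * (masked) regions are preserved.
     *
     * @param seq       sequence as read from the file, mixed case
     * @param site      recognition site, e.g. "TTAA"
     * @param cutOffset number of site bases left of the cut (1 for T^TAA)
     */
    static List<String> digest(String seq, String site, int cutOffset) {
        if (site.isEmpty() || cutOffset < 0 || cutOffset > site.length()) {
            throw new IllegalArgumentException("bad site or cut offset");
        }
        String upper = seq.toUpperCase(Locale.ROOT);
        String upperSite = site.toUpperCase(Locale.ROOT);

        List<String> fragments = new ArrayList<>();
        int start = 0;
        int hit = upper.indexOf(upperSite);
        while (hit >= 0) {
            int cut = hit + cutOffset;
            // Slice from the ORIGINAL string so the case survives.
            fragments.add(seq.substring(start, cut));
            start = cut;
            hit = upper.indexOf(upperSite, hit + 1);
        }
        fragments.add(seq.substring(start)); // trailing fragment
        return fragments;
    }

    public static void main(String[] args) {
        // Made-up sequence with one upper-case and one lower-case site.
        for (String fragment : digest("ccTTAAggttaaCC", "TTAA", 1)) {
            System.out.println(fragment);
        }
    }
}
```

In a BioJava program the same idea would presumably mean running the restriction search on a copy built with the normal DNA alphabet and using the resulting positions to cut the soft-masked sequence. That mapping is an assumption here and has not been checked against the library.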