From andreas at sdsc.edu Thu Sep 9 18:24:35 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 9 Sep 2010 15:24:35 -0700
Subject: [Biojava-l] Biojava Post translational modifications
In-Reply-To: <9CE87E39-5DE3-4996-A53F-63C2B5453901@gmail.com>
References: <9CE87E39-5DE3-4996-A53F-63C2B5453901@gmail.com>
Message-ID:

Hi Jay,

Is this from the latest svn-trunk? Sounds like this has been created
using the biojava 1.7. There were several improvements over the last
months regarding chemically modified groups ....

In the current code base if you set
FileParsingParameters.setLoadChemCompInfo(true), you will get the
chemically correct representation for all groups...

I suggest trying out the code below (using a checkout from biojava-svn ...)

Andreas

public void basicLoad(String pdbId){
    try {

        PDBFileReader reader = new PDBFileReader();

        // the path to the local PDB installation
        reader.setPath("/tmp");

        // are all files in one directory, or are the files split,
        // as on the PDB ftp servers?
        reader.setPdbDirectorySplit(true);

        // should a missing PDB id be fetched automatically from the FTP servers?
        reader.setAutoFetch(true);

        // configure the parameters of file parsing
        FileParsingParameters params = new FileParsingParameters();

        // should the ATOM and SEQRES residues be aligned when creating the internal data model?
        params.setAlignSeqRes(true);

        // should secondary structure get parsed from the file
        params.setParseSecStruc(false);

        // This tells the code to fetch the chemical definitions for all groups
        params.setLoadChemCompInfo(true);

        reader.setFileParsingParameters(params);

        Structure structure = reader.getStructureById(pdbId);

        System.out.println(structure);

        for (Chain c: structure.getChains()){
            System.out.println("Chain " + c.getName() + " details:");
            System.out.println("Atom ligands: " + c.getAtomLigands());
            System.out.println(c.getSeqResGroups());
        }

    } catch (Exception e){
        e.printStackTrace();
    }
}

On Thu, Sep 9, 2010 at 3:13 PM, JAX wrote:
> Hi Andreas, some of my collaborators could not get post translational
> modifications from pdb files using biojavas structure API. Do you have any
> thoughts on this?
>
> Jay Vyas
> MMSB
> UCHC
> Begin forwarded message:
>
> From: Patrick Gradie
> Date: September 9, 2010 5:23:10 PM EDT
> To: biotoolkit at googlegroups.com
> Subject: Re: problems with biojava
> Reply-To: biotoolkit at googlegroups.com
>
> The issue that I found with the BioJava PDB utility is as follows:
> BioJava takes a PDB File xxxx.cif.gz and then populates a Structure variable
> in memory that you can pull from.
> You are able to get things like header, dbref, model, chain, residue, and
> atom info. That was good to have, however, I found that when I tried
> searching for motifs I could not find any of the ones that had required
> modifications.
> This is because when biojava would parse (ACE)SKS(MLZ)DRKYTL it would simply
> truncate the (ACE) and (MLZ). However the important thing here is that MLZ
> is an N-METHYL-LYSINE or a K before modification.
> So in the database would be SKSDRKY (there is no atom data for T or L in the
> example string only sequence information)
> The motif [KR][AST]K[DNQK] would not be found in that truncated sequence
> because the K in the center is required to be in the sequence.
> I am not sure why BioJava would just truncate these modified residues.
> ESPECIALLY because in the pdb file itself is the following line in every
> single file except around 15 out of the 64k:
>
> loop_
> _entity_poly.entity_id
> _entity_poly.type
> _entity_poly.nstd_linkage
> _entity_poly.nstd_monomer
> _entity_poly.pdbx_seq_one_letter_code
> _entity_poly.pdbx_seq_one_letter_code_can
> _entity_poly.pdbx_strand_id
> 1 'polypeptide(L)' no no
> ;GAMGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELM
> PGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGN
> TLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKS
> GPEAPEWYQVELKAFQATQQK
> ;
> ;GAMGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELM
> PGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGN
> TLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKS
> GPEAPEWYQVELKAFQATQQK
> ;
> A
> 2 'polypeptide(L)' no yes '(ACE)SKS(MLZ)DRKYTL'
> XSKSKDRKYTL
> B
>
> As you can see above, the sequence XSKSKDRKYTL is given in full. The ACE is
> turned into an X because it doesn't map to a regular amino acid. So the PDB
> files hold both the modified and unmodified version of the sequence in this
> special section. Given that information it is possible to create a database
> that motifs can be searched for within.
> BioJava will throw a bunch of errors "WARNING: unknown group name MLZ" for
> residues it doesn't interpret as regular amino acids.
> I am not sure, though, if the BioJava 3 release fixes this problem.
> -Patrick
>
> On Thu, Sep 9, 2010 at 1:47 PM, Jay Vyas wrote:
>>
>> Hi guys, does anyone want to tell me about the issues regarding the
>> PDB utilities in BioJava ? I am interested in knowing what they were
>> ?
>>
>> --
>> Jay Vyas
>> MMSB/UCHC
>
>

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From asandro1501 at gmail.com Thu Sep 9 21:17:19 2010
From: asandro1501 at gmail.com (Alex Silva)
Date: Thu, 9 Sep 2010 22:17:19 -0300
Subject: [Biojava-l] First project Biojava
Message-ID:

Hello

I'm Brazilian so I apologize for my English. I am starting with BioJava.
I need to build code that works with the file formats .gbk and .fa. To be
more specific, I need to find in the headers of the .gbk files the initial
location of a particular protein and then perform the search on the .fa
file.

Thank you for listening.

--
Alex Silva
G.R.A. Sistemas Corporativos
msn: gra.sistemas at hotmail.com
55-9165-7378

From anantpossible at gmail.com Sat Sep 11 00:42:30 2010
From: anantpossible at gmail.com (Anant Jain)
Date: Sat, 11 Sep 2010 10:12:30 +0530
Subject: [Biojava-l] First project Biojava
In-Reply-To:
References:
Message-ID:

Hi Alex,

I have done some work on PDB files and parsing their headers. I need to
check for the corresponding API for Genbank and Fasta files, and will
let you know once I find one.

As you are new to Bio-java, you can contact me on Yahoo. My IM is
anant_jain86 at yahoo.com

I will be happy to help you.

Regards,
Anant Jain
B.Tech Bioinformatics
RHCE

On Fri, Sep 10, 2010 at 6:47 AM, Alex Silva wrote:
> Hello
>
> I'm Brazilian so I apologize for my English. I am starting with BioJava.
> I need to build code that works with the file formats .gbk and .fa. To be
> more specific, I need to find in the headers of the .gbk files the initial
> location of a particular protein and then perform the search on the .fa
> file.
>
> Thank you for listening.
>
> --
> Alex Silva
> G.R.A. Sistemas Corporativos
> msn: gra.sistemas at hotmail.com
> 55-9165-7378
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
Anant Jain
B.Tech Bioinformatics, RHCE

From nagendravns at gmail.com Mon Sep 13 08:36:46 2010
From: nagendravns at gmail.com (nagendra kumar)
Date: Mon, 13 Sep 2010 18:06:46 +0530
Subject: [Biojava-l] run biojava
Message-ID:

I am running BioJava on Debian. How do I set the classpath and path on
Debian? I get an error about deprecated API use in my program; how do I
resolve this problem?

From nagendravns at gmail.com Mon Sep 13 10:30:51 2010
From: nagendravns at gmail.com (nagendra kumar)
Date: Mon, 13 Sep 2010 20:00:51 +0530
Subject: [Biojava-l] run biojava
Message-ID:

how to run biojava in debian ,

From koen.bruynseels at cropdesign.com Mon Sep 13 12:22:48 2010
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Mon, 13 Sep 2010 18:22:48 +0200
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID:

I will be out of the office starting 09/11/2010 and will not return until
09/15/2010.

I will respond to your message when I return.

From simon.rayner.cn at gmail.com Mon Sep 13 21:18:15 2010
From: simon.rayner.cn at gmail.com (simon rayner)
Date: Mon, 13 Sep 2010 21:18:15 -0400
Subject: [Biojava-l] run biojava
In-Reply-To:
References:
Message-ID:

you can run it the same way you would run a java program.

have a look at the cookbook page, there are many examples.

http://biojava.org/wiki/BioJava:CookBook#How_Do_I.....3F

this is a good first example to try and run.
just copy, paste and compile

http://biojava.org/wiki/BioJava:Cookbook:SeqIO:ReadFasta

On Mon, Sep 13, 2010 at 10:30 AM, nagendra kumar wrote:
> how to run biojava in debian ,
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
Simon Rayner

State Key Laboratory of Virology
Wuhan Institute of Virology
Chinese Academy of Sciences
Wuhan, Hubei 430071
P.R.China

+86 (27) 87199895 (office)
+86 18627113001 (cell)

From paolo.romano at istge.it Wed Sep 15 11:51:34 2010
From: paolo.romano at istge.it (Paolo Romano)
Date: Wed, 15 Sep 2010 17:51:34 +0200
Subject: [Biojava-l] NETTAB 2010: Submission deadline is approaching: Sep 24, 2010
Message-ID: <201009151552.o8FFpaXH000456@clus2.istge.it>

I hope this announcement can be of interest for this list.
Forgive me if I'm wrong! And apologies for any duplication.
Ciao. Paolo

==========

NETTAB 2010 on "Biological Wikis"
joint with the
BBCC 2010 workshop on Bioinformatics and Computational Biology in Campania
November 29 - December 1, 2010, Naples, Italy
http://www.nettab.org/2010/
http://bioinformatica.isa.cnr.it./BBCC/BBCC2010/

The deadline for the submission of oral communications is quickly
approaching: submit your contribution by next Friday, September 24, 2010,
through the EasyChair site
( http://www.easychair.org/conferences/?conf=nettab2010 ).

The length of contributions for oral communications should be between 3
and 5 pages, including tables and figures. See more instructions below.

NETTAB 2010 workshop promises to be a great meeting for all researchers
involved in the exploitation of wikis in biology.
Don't miss this opportunity to discuss your ideas and doubts with such
scientists as

- Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge,
  United Kingdom
- Alexander Pico, Gladstone Institute of Cardiovascular Disease,
  San Francisco, USA
- Andrew Su, Bioinformatics and Computational Biology, Genomics Institute
  of the Novartis Research Foundation (GNF), San Diego, USA
- Dan Bolser, College of Life Sciences, University of Dundee, Scotland,
  United Kingdom
- Robert Hoffmann, Computational Biology Center, cBIO, Memorial
  Sloan-Kettering Cancer Center, MSKCC, New York, USA
- Thomas Kelder, Department of Bioinformatics (BiGCaT), Maastricht
  University, the Netherlands
- Jaime Prilusky, Bioinformatics, Weizmann Institute of Science, Rehovot,
  Israel
- and many others who, we hope, will join the workshop.

Here below, please find a summary of the Call.
The complete Call is available on-line at http://www.nettab.org/2010/call.html .
Further information is available at http://www.nettab.org/2010/ .

============

CALL FOR PAPERS

TOPICS

The following list is not meant to be exclusive of any further topics as
stated above.
Submitted contributions should address one or more of the following topics:

* Wiki development tools
    o Wikimedia
    o Wikimedia extensions
    o Semantic Wikis
    o Wiki-coupled CMSs
    o Other wikis
* Arising issues for the biomedical domain:
    o Authoritativeness of contributions and sites
    o Quality assessment
    o Users acknowledgement
    o Stimulation of quality contributions
    o Authorships management and reward
    o 'Scientific production' value for contributions
    o Management of bioinformatics data types
* Wikis and collaborative systems for:
    o Genomics, proteomics, metabolomics, any -omics
    o Proteins analysis and visualization
    o gene and proteins interactions
    o metabolic pathways
    o oncology research
* Issues to be tackled by wiki and collaborative research for:
    o Genomics, proteomics, metabolomics, any -omics
    o Proteins analysis and visualization
    o gene and proteins interactions
    o metabolic pathways
    o oncology research

The NETTAB 2010 workshop is a joint event with the BBCC 2010 workshop on
Bioinformatics and Computational Biology in Campania.
This deadline also applies to the BBCC 2010 workshop.
Submit for BBCC through the same EasyChair site and select 'BBCC session' topic.

TYPE OF CONTRIBUTIONS

The following possible contributions are sought:

* Oral communications
* Posters
* Software demos

All accepted contributions will be published in the proceedings of the
workshop.

DEADLINES

* September 24, 2010: Oral communications submission
    o Decisions announced: October 24, 2010
* October 29, 2010: Early registration ends
* November 29 - December 1, 2010: Workshop and Tutorials

INSTRUCTIONS

Kindly follow the instructions carefully when preparing your contribution
and submit your contribution through the EasyChair system at
http://www.easychair.org/conferences/?conf=nettab2010.

All contributions should follow the same format, as specified here:
font type: Times New Roman, font size: 12 pt, page size: A4,
left and right margins: 2.0 cm, upper margin: 2.5 cm, lower margin: 2.0 cm.
The length of contributions for oral communications should be between 3
and 5 pages, including tables and figures.
They should include: Abstract, Introduction, Methods, Results and
Discussion, References.
All contributions for oral communications will be evaluated by at least
three referees.

For any further information or clarification, please contact the
organization by email at info at nettab.org.

ORGANIZATION
(see http://www.nettab.org/2010/organization.html for the Scientific
Committee and more information)

Co-chairs

* Angelo Facchiano, CNR-ISA, Avellino, Italy
* Paolo Romano, National Cancer Research Institute, Genoa, Italy

We look forward to meeting you in Naples!

Paolo Romano and Angelo Facchiano
on behalf of the Scientific Committee

Paolo Romano (paolo.romano at istge.it)
Bioinformatics
National Cancer Research Institute (IST)
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288 Fax: +39-010-5737-295
Skype: p.romano
Web: http://www.nettab.org/promano/

From bernd.jagla at pasteur.fr Tue Sep 21 07:46:35 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 21 Sep 2010 13:46:35 +0200
Subject: [Biojava-l] fileToBiojava question
Message-ID: <4C989B1B.7090807@pasteur.fr>

Hello,

I am getting a little frustrated with the wiki page (I guess I don't
spend enough time reading and testing). I have the impression that some
of the documentation relates to version 3 whereas others relate to 1.5
or 1.7.
So sorry if this all sounds a bit confused... ;(

I believe I am using 1.7.1. (I wasn't able to find a readme file that
contains that information) even though I would probably like to use
version 3. But as I am stuck with an older Eclipse version I think it
will be even worse when I try that.

Anyways, I am trying to read in sequence files using
SeqIOTools.fileToBiojava, which seems to be deprecated, with the
following parameters: "genbank", "dna", bufferedReader.

somehow this works with "fasta" but with genbank I get the following
exception:
Execute failed: Unknown file type '524300'
in some cases I get:
Unknown file type '262156'

Does this mean anything to you?

Or how do you read in a sequence file? I am looking for a generic way
that covers many file types (genbank, fasta, swissprot...)

Once I have this I will probably be able to get to the feature
information using the information from the tutorial.

Thanks for your time.

Bernd

From simon.rayner.cn at gmail.com Tue Sep 21 08:07:21 2010
From: simon.rayner.cn at gmail.com (simon rayner)
Date: Tue, 21 Sep 2010 20:07:21 +0800
Subject: [Biojava-l] fileToBiojava question
In-Reply-To: <4C989B1B.7090807@pasteur.fr>
References: <4C989B1B.7090807@pasteur.fr>
Message-ID:

hi,

can you post the code you are trying to run along with the full error, it
will help to figure out what is happening. There are now loaders for
biojavax as well, which work well and are described in the biojavax docs
here
http://biojava.org/wiki/BioJava:BioJavaXDocs#Example

but yeah, it's confusing unless you happen to be a real java guru. i keep
having to refer back to the docs because i keep forgetting which class
does what

On Tue, Sep 21, 2010 at 7:46 PM, Bernd Jagla wrote:
> Hello,
>
> I am getting a little frustrated with the wiki page (I guess I don't spend
> enough time reading and testing). I have the impression that some of the
> documentation relates to version 3 whereas others relate to 1.5 or 1.7.
> So sorry if this all sounds a bit confused... ;(
>
> I believe I am using 1.7.1. (I wasn't able to find a readme file that
> contains that information) even though I would probably like to use version
> 3. But as I am stuck with an older Eclipse version I think it will be even
> worse when I try that.
>
> Anyways, I am trying to read in sequence files using
> SeqIOTools.fileToBiojava, which seems to be deprecated, with the following
> parameters: "genbank", "dna", bufferedReader.
>
> somehow this works with "fasta" but with genbank I get the following
> exception:
> Execute failed: Unknown file type '524300'
> in some cases I get:
> Unknown file type '262156'
>
> Does this mean anything to you?
>
> Or how do you read in a sequence file? I am looking for a generic way that
> covers many file types (genbank, fasta, swissprot...)
>
> Once I have this I will probably be able to get to the feature information
> using the information from the tutorial.
>
> Thanks for your time.
>
> Bernd
>
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
Simon Rayner

State Key Laboratory of Virology
Wuhan Institute of Virology
Chinese Academy of Sciences
Wuhan, Hubei 430071
P.R.China

+86 (27) 87199895 (office)
+86 18627113001 (cell)

From bernd.jagla at pasteur.fr Tue Sep 21 08:39:09 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 21 Sep 2010 14:39:09 +0200
Subject: [Biojava-l] what is a namespace?
Message-ID: <4C98A76D.5060309@pasteur.fr>

Hi,

sorry for the basic question, but I would like to clarify the following:

When you are talking about namespace in your documentation (e.g.
biojavaXDocs) it means that I tag the information that is associated
with that namespace in order to differentiate it from something else.
Is this a fair description?

I can generate a name space using the following code:

Namespace ns = (Namespace)RichObjectFactory.getObject(SimpleNamespace.class,new Object[]{"myNamespace"});

Why is the second parameter an array of objects?
When would I use something other than a SimpleNamespace class? Could you
point me to some examples of their use?

In my code I can have many instances of a given class. Do I need to use
different namespaces each time to avoid conflicts? E.g. I have a class
that reads in sequence and annotation data. Do I have to use different
namespaces for each instance?

Thanks,

Bernd

PS. please let me know if these questions are too basic to ask!!!!
Otherwise I will probably have some more ;)

From bernd.jagla at pasteur.fr Tue Sep 21 08:47:21 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 21 Sep 2010 14:47:21 +0200
Subject: [Biojava-l] fileToBiojava question
In-Reply-To:
References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr>
Message-ID: <4C98A959.2040405@pasteur.fr>

Sorry for the wrong reply...

Here is the FULL code
I marked the passages that are important in red:

Thanks for looking at it!!!!

Bernd

package org.pasteur.pf2.biojava;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.seq.io.SymbolTokenization;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojava.bio.symbol.SymbolList;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.io.RichSequenceFormat;
import org.knime.core.data.DataCell;
import org.knime.core.data.DataColumnSpec;
import org.knime.core.data.DataColumnSpecCreator;
import org.knime.core.data.DataTableSpec;
import org.knime.core.data.RowKey;
import org.knime.core.data.container.BlobDataCell;
import org.knime.core.data.def.DefaultRow;
import org.knime.core.data.def.StringCell;
import org.knime.core.node.BufferedDataContainer;
import org.knime.core.node.BufferedDataTable;
import org.knime.core.node.CanceledExecutionException;
import org.knime.core.node.ExecutionContext;
import org.knime.core.node.ExecutionMonitor;
import org.knime.core.node.InvalidSettingsException;
import org.knime.core.node.NodeLogger;
import org.knime.core.node.NodeModel;
import org.knime.core.node.NodeSettingsRO;
import org.knime.core.node.NodeSettingsWO;
import org.knime.core.node.defaultnodesettings.SettingsModelString;
import org.biojavax.bio.seq.io.EMBLFormat;
import org.biojavax.bio.seq.io.FastaFormat;
import org.biojavax.bio.seq.io.GenbankFormat;
import org.biojavax.bio.seq.io.INSDseqFormat;
import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
import org.biojavax.bio.seq.io.RichSequenceFormat;
import org.biojavax.bio.seq.io.RichStreamReader;
import org.biojavax.bio.seq.io.UniProtFormat;
import org.pasteur.pf2.datatypes.*;

/**
 * This is the model implementation of FastAReader. Reads a FASTA file into two
 * columns: seq_name and sequence
 *
 * @author Bernd Jagla
 */
@SuppressWarnings("deprecation")
public class FastAReaderNodeModel extends NodeModel {

    // the logger instance
    private static final NodeLogger logger = NodeLogger
            .getLogger(FastQReaderNodeModel.class);

    private Alphabet alpha;
    private SequenceIterator iter;

    /**
     * the settings key which is used to retrieve and store the settings (from
     * the dialog or from a settings file) (package visibility to be usable from
     * the dialog).
     */
    private static final String FAR_name = "far_name";
    private static final String FAR_fileFormat = "far_ff";
    private static final String FAR_alphabet = "far_alph";

    private final SettingsModelString m_fpname = createFAR_fpname();
    private final SettingsModelString m_fformat = createFileFormat();
    private final SettingsModelString m_alphabet = createAlphabet();

    /**
     * Constructor for the node model.
     */
    protected FastAReaderNodeModel() {
        super(0, 1);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData,
            final ExecutionContext exec) throws Exception {

        // TODO do something here
        logger.info("Node Model Stub... this is not yet implemented !");

        // the data table spec of the single output table,
        // the table will have three columns:
        DataColumnSpec[] allColSpecs = new DataColumnSpec[1];
        allColSpecs[0] = new DataColumnSpecCreator("sequence",
                SequenceDataCell.TYPE)
                .createSpec();
        DataTableSpec outputSpec = new DataTableSpec(allColSpecs);
        // the execution context will provide us with storage capacity, in this
        // case a data container to which we will add rows sequentially
        // Note, this container can also handle arbitrary big data tables, it
        // will buffer to disc if necessary.
        BufferedDataContainer container = exec.createDataContainer(outputSpec);
        // let's add m_count rows to it
        // once we are done, we close the container and return its table
        FileReader fp = new FileReader(m_fpname.getStringValue());
        exec.checkCanceled();
        //String form = m_fformat.getStringValue();
        //String alphabet = m_alphabet.getStringValue();
        String form = "genbank";
        String alphabet = "DNA";
        BufferedReader br = new BufferedReader(fp);
        // String line = br.readLine();
        int count = 0;
        SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava(
                form, alphabet, br);
        while (iter.hasNext()) {
            exec.checkCanceled();
            RowKey key = new RowKey("Row " + count);
            exec.setProgress("Row " + count);
            // System.out.println(fastq.getSequence());
            Sequence seq = iter.nextSequence();
            String seqName = seq.getName();
            // String seqName = "asdf";
            //String sequence = seq.seqString();
            System.err.println("reading: " + seqName + " " + seq.length());
            SequenceDataCell seqCell = new SequenceDataCell(seqName, seq);
            container.addRowToTable(new DefaultRow(key, seqCell));
            count++;
        }
        System.err.println("finished reading file");
        br.close();
        fp.close();
        container.close();
        return new BufferedDataTable[] { container.getTable() };
    }

    /**
     * Makes a SequenceIterator look like an
     * Iterator {@code <Sequence>}
     *
     * @param iter
     *            The SequenceIterator
     * @return An Iterator that returns only Sequence
     *         objects. You cannot call remove() on this
     *         iterator!
     */
    public Iterator<Sequence> asIterator(SequenceIterator iter) {
        final SequenceIterator it = iter;

        return new Iterator<Sequence>() {
            public boolean hasNext() {
                return it.hasNext();
            }

            public Sequence next() {
                try {
                    return it.nextSequence();
                } catch (BioException e) {
                    NoSuchElementException ex = new NoSuchElementException();
                    ex.initCause(e);
                    throw ex;
                }
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static RichSequenceFormat formatForName(String name)
            throws ClassNotFoundException, InstantiationException,
            IllegalAccessException {
        // determine the format to use
        RichSequenceFormat format;
        if (name.equalsIgnoreCase("fasta")) {
            format = (RichSequenceFormat) new FastaFormat();
        } else if (name.equalsIgnoreCase("genbank")) {
            format = (RichSequenceFormat) new GenbankFormat();
        } else if (name.equalsIgnoreCase("uniprot")) {
            format = new UniProtFormat();
        } else if (name.equalsIgnoreCase("embl")) {
            format = new EMBLFormat();
        } else if (name.equalsIgnoreCase("INSDseq")) {
            format = new INSDseqFormat();
        } else {
            Class formatClass = Class.forName(name);
            format = (RichSequenceFormat) formatClass.newInstance();
        }
        return format;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void reset() {
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
            throws InvalidSettingsException {
        DataColumnSpec[] allColSpecs = new DataColumnSpec[1];
        allColSpecs[0] = new DataColumnSpecCreator("sequence",
                SequenceDataCell.TYPE)
                .createSpec();
        DataTableSpec outputSpec = new DataTableSpec(allColSpecs);
        return new DataTableSpec[] { outputSpec };
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void saveSettingsTo(final NodeSettingsWO settings) {
        m_alphabet.saveSettingsTo(settings);
        m_fformat.saveSettingsTo(settings);
        m_fpname.saveSettingsTo(settings);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void loadValidatedSettingsFrom(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        m_alphabet.loadSettingsFrom(settings);
        m_fformat.loadSettingsFrom(settings);
        m_fpname.loadSettingsFrom(settings);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void validateSettings(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        m_alphabet.validateSettings(settings);
        m_fformat.validateSettings(settings);
        m_fpname.validateSettings(settings);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void loadInternals(final File internDir,
            final ExecutionMonitor exec) throws IOException,
            CanceledExecutionException {
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void saveInternals(final File internDir,
            final ExecutionMonitor exec) throws IOException,
            CanceledExecutionException {
    }

    public static SettingsModelString createFAR_fpname() {
        return new SettingsModelString(FAR_name, "");
    }

    public static SettingsModelString createFileFormat() {
        return new SettingsModelString(FAR_fileFormat, "FASTA");
    }

    public static SettingsModelString createAlphabet() {
        return new SettingsModelString(FAR_alphabet, "RNA");
    }
}

On 9/21/2010 2:40 PM, simon rayner wrote:
> hi,
>
> can you repost to the biojava group along with the full code, (just in
> case there is a missing import or something). you only replied to,
> and not to the biojava mailing list
>
> thanks
>
> simon
>
> On Tue, Sep 21, 2010 at 8:18 PM, Bernd Jagla
> wrote:
>
> Thanks for the quick reply!
>
> Here is some code that should have all the important parts:
>
> String form = "genbank";
> String alphabet = "dna";
> BufferedReader br = new BufferedReader(fp);
> SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava(
> form, alphabet, br);
> while (iter.hasNext()) {
> Sequence seq = iter.nextSequence();
> => Exception thrown
> String seqName = seq.getName();
> }
>
>
> When trying to simplify the code a bit I now get the following error:
> Execute failed: Could not initialize class
> org.biojava.bio.seq.FeatureFilter
>
> I assume that in the previous times I had a spelling error??
> Then the exception got thrown during the initialization of "iter"
>
> Thanks,
>
> Bernd
>
>
> On 9/21/2010 2:07 PM, simon rayner wrote:
>> hi,
>>
>> can you post the code you are trying to run along with the full
>> error, it will help to figure out what is happening. There are
>> now loaders for biojavax as well, which work well and are
>> described in the biojavax docs here
>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Example
>>
>> but yeah, it's confusing unless you happen to be a real java
>> guru. i keep having to refer back to the docs because i keep
>> forgetting which class does what
>>
>> On Tue, Sep 21, 2010 at 7:46 PM, Bernd Jagla
>> > wrote:
>>
>> Hello,
>>
>> I am getting a little frustrated with the wiki page (I guess
>> I don't spend enough time reading and testing). I have the
>> impression that some of the documentation relates to version
>> 3 whereas others relate to 1.5 or 1.7.
>> So sorry if this all sounds a bit confused... ;(
>>
>> I believe I am using 1.7.1. (I wasn't able to find a readme
>> file that contains that information) even though I would
>> probably like to use version 3. But as I am stuck with an
>> older Eclipse version I think it will be even worse when I
>> try that.
>>
>> Anyways, I am trying to read in sequence files using
>> SeqIOTools.fileToBiojava, which seems to be deprecated, with
>> the following parameters: "genbank", "dna", bufferedReader.
>>
>> somehow this works with "fasta" but with genbank I get the
>> following exception:
>> Execute failed: Unknown file type '524300'
>> in some cases I get:
>> Unknown file type '262156'
>>
>> Does this mean anything to you?
>>
>> Or how do you read in a sequence file? I am looking for a
>> generic way that covers many file types (genbank, fasta,
>> swissprot...)
>>
>> Once I have this I will probably be able to get to the
>> feature information using the information from the tutorial.
>>
>> Thanks for your time.
>>
>> Bernd
>>
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>>
>>
>>
>> --
>> Simon Rayner
>>
>> State Key Laboratory of Virology
>> Wuhan Institute of Virology
>> Chinese Academy of Sciences
>> Wuhan, Hubei 430071
>> P.R.China
>>
>> +86 (27) 87199895 (office)
>> +86 18627113001 (cell)
>>
>
>
>
> --
> Simon Rayner
>
> State Key Laboratory of Virology
> Wuhan Institute of Virology
> Chinese Academy of Sciences
> Wuhan, Hubei 430071
> P.R.China
>
> +86 (27) 87199895 (office)
> +86 18627113001 (cell)
>

From simon.rayner.cn at gmail.com Tue Sep 21 21:10:09 2010
From: simon.rayner.cn at gmail.com (simon rayner)
Date: Tue, 21 Sep 2010 21:10:09 -0400
Subject: [Biojava-l] fileToBiojava question
In-Reply-To: <4C98A959.2040405@pasteur.fr>
References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr>
Message-ID:

sorry for the delay in replying due to time difference.

this is a modified version of your code that uses biojavax. i stripped out
the pasteur stuff and added code to the *execute* method (about line 74).
Also marked the imports i added at the top

hope this helps

package cn.cas.wiv.bif.biojava;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/******************* your biojava imports **********************/
import org.biojava.bio.BioException;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.seq.io.SymbolTokenization;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojava.bio.symbol.SymbolList;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.io.RichSequenceFormat;
import org.biojavax.bio.seq.io.EMBLFormat;
import org.biojavax.bio.seq.io.FastaFormat;
import org.biojavax.bio.seq.io.GenbankFormat;
import org.biojavax.bio.seq.io.INSDseqFormat;
import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
import org.biojavax.bio.seq.io.RichSequenceFormat;
import org.biojavax.bio.seq.io.RichStreamReader;
import org.biojavax.bio.seq.io.UniProtFormat;

/********* added these imports to make things work **********/
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;

/**
 * This is the model implementation of FastAReader. Reads a FASTA file into two
 * columns: seq_name and sequence
 *
 * @author Bernd Jagla
 */
@SuppressWarnings("deprecation")
public class FastAReaderNodeModel {
    // the logger instance
    private Alphabet alpha;
    private SequenceIterator iter;

    protected void execute(FileReader fp) throws Exception {

        /**
         * {@inheritDoc}
         */
        //String form = m_fformat.getStringValue();
        //String alphabet = m_alphabet.getStringValue();
        String form = "genbank";
        String alphabet = "DNA";

        /****************** old way *********************/
        int count = 0;
        BufferedReader br = new BufferedReader(fp);
        SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava(
                form, alphabet, br);

        while (iter.hasNext()) {
            // System.out.println(fastq.getSequence());
            Sequence seq = iter.nextSequence();
            String seqName = seq.getName();
            // String seqName = "asdf";
            //String sequence = seq.seqString();
            System.err.println("reading: " + seqName + " " + seq.length());
            count++;
        }
        System.err.println("finished reading file");

        /****************** biojavax way *********************/
        // NOTE: the loop above has already read br to the end; to run this
        // block on its own, open a fresh reader on the file first.
        RichSequence refRSequence;
        SimpleNamespace ns = new SimpleNamespace("MTBGB");
        RichSequenceIterator rsi = RichSequence.IOTools.readGenbankDNA(br, ns);
        while (rsi.hasNext()) {
            refRSequence = rsi.nextRichSequence();
            System.out.println("read " + refRSequence.length() + " bases");
            /** if you want the features, use a FeatureFilter and a FeatureHolder **/
            FeatureFilter ff = new FeatureFilter.ByType("CDS");
            FeatureHolder fhRef = refRSequence.filter(ff);
        }

        br.close();
        fp.close();
    }

    /**
     * Makes a SequenceIterator look like an
     * Iterator {@code <Sequence>}
     *
     * @param iter
     *            The SequenceIterator
     * @return An Iterator that returns only Sequence
     *         objects. You cannot call remove() on this
     *         iterator!
     */
    public Iterator<Sequence> asIterator(SequenceIterator iter) {
        final SequenceIterator it = iter;
        return new Iterator<Sequence>() {
            public boolean hasNext() {
                return it.hasNext();
            }

            public Sequence next() {
                try {
                    return it.nextSequence();
                } catch (BioException e) {
                    NoSuchElementException ex = new NoSuchElementException();
                    ex.initCause(e);
                    throw ex;
                }
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static RichSequenceFormat formatForName(String name)
            throws ClassNotFoundException, InstantiationException,
            IllegalAccessException {
        // determine the format to use
        RichSequenceFormat format;
        if (name.equalsIgnoreCase("fasta")) {
            format = (RichSequenceFormat) new FastaFormat();
        } else if (name.equalsIgnoreCase("genbank")) {
            format = (RichSequenceFormat) new GenbankFormat();
        } else if (name.equalsIgnoreCase("uniprot")) {
            format = new UniProtFormat();
        } else if (name.equalsIgnoreCase("embl")) {
            format = new EMBLFormat();
        } else if (name.equalsIgnoreCase("INSDseq")) {
            format = new INSDseqFormat();
        } else {
            Class formatClass = Class.forName(name);
            format = (RichSequenceFormat) formatClass.newInstance();
        }
        return format;
    }

}

On Tue, Sep 21, 2010 at 8:47 AM, Bernd Jagla wrote:

> Sorry for the wrong reply...
> Here is the FULL code I marked the passages that are important in red:
>
> Thanks for looking at it!!!!
> > Bernd > > > package org.pasteur.pf2.biojava; > > import java.io.BufferedReader; > import java.io.File; > import java.io.FileReader; > import java.io.IOException; > import java.util.Iterator; > import java.util.NoSuchElementException; > > import org.biojava.bio.BioException; > import org.biojava.bio.seq.Sequence; > import org.biojava.bio.seq.SequenceIterator; > import org.biojava.bio.seq.io.SeqIOTools; > import org.biojava.bio.seq.io.SymbolTokenization; > import org.biojava.bio.symbol.Alphabet; > import org.biojava.bio.symbol.AlphabetManager; > import org.biojava.bio.symbol.SymbolList; > import org.biojavax.RichObjectFactory; > import org.biojavax.bio.seq.io.RichSequenceFormat; > import org.knime.core.data.DataCell; > import org.knime.core.data.DataColumnSpec; > import org.knime.core.data.DataColumnSpecCreator; > import org.knime.core.data.DataTableSpec; > import org.knime.core.data.RowKey; > import org.knime.core.data.container.BlobDataCell; > import org.knime.core.data.def.DefaultRow; > import org.knime.core.data.def.StringCell; > import org.knime.core.node.BufferedDataContainer; > import org.knime.core.node.BufferedDataTable; > import org.knime.core.node.CanceledExecutionException; > import org.knime.core.node.ExecutionContext; > import org.knime.core.node.ExecutionMonitor; > import org.knime.core.node.InvalidSettingsException; > import org.knime.core.node.NodeLogger; > import org.knime.core.node.NodeModel; > import org.knime.core.node.NodeSettingsRO; > import org.knime.core.node.NodeSettingsWO; > import org.knime.core.node.defaultnodesettings.SettingsModelString; > import org.biojavax.bio.seq.io.EMBLFormat; > import org.biojavax.bio.seq.io.FastaFormat; > import org.biojavax.bio.seq.io.GenbankFormat; > import org.biojavax.bio.seq.io.INSDseqFormat; > import org.biojavax.bio.seq.io.RichSequenceBuilderFactory; > import org.biojavax.bio.seq.io.RichSequenceFormat; > import org.biojavax.bio.seq.io.RichStreamReader; > import org.biojavax.bio.seq.io.UniProtFormat; > 
import org.pasteur.pf2.datatypes.*; > /** > * This is the model implementation of FastAReader. Reads a FASTA file into > two > * columns: seq_name and sequence > * > * @author Bernd Jagla > */ > @SuppressWarnings("deprecation") > public class FastAReaderNodeModel extends NodeModel { > // the logger instance > private static final NodeLogger logger = NodeLogger > .getLogger(FastQReaderNodeModel.class); > private Alphabet alpha; > private SequenceIterator iter; > > /** > * the settings key which is used to retrieve and store the settings > (from > * the dialog or from a settings file) (package visibility to be usable > from > * the dialog). > */ > private static final String FAR_name = "far_name"; > > private static final String FAR_fileFormat = "far_ff"; > > private static final String FAR_alphabet = "far_alph"; > > private final SettingsModelString m_fpname = createFAR_fpname(); > private final SettingsModelString m_fformat = createFileFormat(); > private final SettingsModelString m_alphabet = createAlphabet(); > > /** > * Constructor for the node model. > */ > protected FastAReaderNodeModel() { > super(0, 1); > } > > /** > * {@inheritDoc} > */ > @Override > protected BufferedDataTable[] execute(final BufferedDataTable[] inData, > final ExecutionContext exec) throws Exception { > > // TODO do something here > logger.info("Node Model Stub... this is not yet implemented !"); > > // the data table spec of the single output table, > // the table will have three columns: > DataColumnSpec[] allColSpecs = new DataColumnSpec[1]; > allColSpecs[0] = new DataColumnSpecCreator("sequence", > SequenceDataCell.TYPE) > .createSpec(); > DataTableSpec outputSpec = new DataTableSpec(allColSpecs); > // the execution context will provide us with storage capacity, in > this > // case a data container to which we will add rows sequentially > // Note, this container can also handle arbitrary big data tables, > it > // will buffer to disc if necessary. 
> BufferedDataContainer container = > exec.createDataContainer(outputSpec); > // let's add m_count rows to it > // once we are done, we close the container and return its table > FileReader fp = new FileReader(m_fpname.getStringValue()); > > exec.checkCanceled(); > //String form = m_fformat.getStringValue(); > //String alphabet = m_alphabet.getStringValue(); > String form = "genbank"; > String alphabet = "DNA"; > > BufferedReader br = new BufferedReader(fp); > // String line = br.readLine(); > int count = 0; > > SequenceIterator iter = (SequenceIterator) > SeqIOTools.fileToBiojava( > form, alphabet, br); > > while (iter.hasNext()) { > exec.checkCanceled(); > RowKey key = new RowKey("Row " + count); > exec.setProgress("Row " + count); > // System.out.println(fastq.getSequence()); > Sequence seq = iter.nextSequence(); > String seqName = seq.getName(); > // String seqName = "asdf"; > //String sequence = seq.seqString(); > System.err.println("reading: " + seqName + " " + seq.length()); > SequenceDataCell seqCell = new SequenceDataCell(seqName, seq); > container.addRowToTable(new DefaultRow(key, seqCell)); > count++; > } > System.err.println("finished reading file"); > br.close(); > fp.close(); > container.close(); > return new BufferedDataTable[] { container.getTable() }; > } > > /** > * Makes a SequenceIterator look like an > * Iterator {@code } > * > * @param iter > * The SequenceIterator > * @return An Iterator that returns only > Sequence > * objects. You cannot call remove() on this > * iterator! 
> */ > public Iterator asIterator(SequenceIterator iter) { > final SequenceIterator it = iter; > return new Iterator() { > public boolean hasNext() { > return it.hasNext(); > } > > public Sequence next() { > try { > return it.nextSequence(); > } catch (BioException e) { > NoSuchElementException ex = new > NoSuchElementException(); > ex.initCause(e); > throw ex; > } > } > > public void remove() { > throw new UnsupportedOperationException(); > } > }; > } > > public static RichSequenceFormat formatForName(String name) > throws ClassNotFoundException, InstantiationException, > IllegalAccessException { > // determine the format to use > RichSequenceFormat format; > if (name.equalsIgnoreCase("fasta")) { > format = (RichSequenceFormat) new FastaFormat(); > } else if (name.equalsIgnoreCase("genbank")) { > format = (RichSequenceFormat) new GenbankFormat(); > } else if (name.equalsIgnoreCase("uniprot")) { > format = new UniProtFormat(); > } else if (name.equalsIgnoreCase("embl")) { > format = new EMBLFormat(); > } else if (name.equalsIgnoreCase("INSDseq")) { > format = new INSDseqFormat(); > } else { > Class formatClass = Class.forName(name); > format = (RichSequenceFormat) formatClass.newInstance(); > } > return format; > } > > /** > * {@inheritDoc} > */ > @Override > protected void reset() { > } > > /** > * {@inheritDoc} > */ > @Override > protected DataTableSpec[] configure(final DataTableSpec[] inSpecs) > throws InvalidSettingsException { > DataColumnSpec[] allColSpecs = new DataColumnSpec[1]; > allColSpecs[0] = new DataColumnSpecCreator("sequence", > SequenceDataCell.TYPE) > .createSpec(); > DataTableSpec outputSpec = new DataTableSpec(allColSpecs); > > return new DataTableSpec[] { outputSpec }; > > } > > /** > * {@inheritDoc} > */ > @Override > protected void saveSettingsTo(final NodeSettingsWO settings) { > m_alphabet.saveSettingsTo(settings); > m_fformat.saveSettingsTo(settings); > m_fpname.saveSettingsTo(settings); > } > > /** > * {@inheritDoc} > */ > @Override > 
protected void loadValidatedSettingsFrom(final NodeSettingsRO settings) > throws InvalidSettingsException { > m_alphabet.loadSettingsFrom(settings); > m_fformat.loadSettingsFrom(settings); > m_fpname.loadSettingsFrom(settings); > } > > /** > * {@inheritDoc} > */ > @Override > protected void validateSettings(final NodeSettingsRO settings) > throws InvalidSettingsException { > m_alphabet.validateSettings(settings); > m_fformat.validateSettings(settings); > m_fpname.validateSettings(settings); > } > > /** > * {@inheritDoc} > */ > @Override > protected void loadInternals(final File internDir, > final ExecutionMonitor exec) throws IOException, > CanceledExecutionException { > } > > /** > * {@inheritDoc} > */ > @Override > protected void saveInternals(final File internDir, > final ExecutionMonitor exec) throws IOException, > CanceledExecutionException { > } > > public static SettingsModelString createFAR_fpname() { > return new SettingsModelString(FAR_name, ""); > } > > public static SettingsModelString createFileFormat() { > return new SettingsModelString(FAR_fileFormat, "FASTA"); > } > > public static SettingsModelString createAlphabet() { > return new SettingsModelString(FAR_alphabet, "RNA"); > > } > > } > > > On 9/21/2010 2:40 PM, simon rayner wrote: > > hi, > > can you repost to the biojava group along with the full code, (just in case > there is a missing import or something). you only replied to, and not to > the biojava mailing list > > thanks > > simon > > On Tue, Sep 21, 2010 at 8:18 PM, Bernd Jagla wrote: > >> Thanks for the quick reply! 
>> >> Here is some code that should have all the important parts: >> >> String form = "genbank"; >> String alphabet = "dna"; >> BufferedReader br = new BufferedReader(fp); >> SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava( >> form, alphabet, br); >> while (iter.hasNext()) { >> Sequence seq = iter.nextSequence(); >> => Exception thrown >> String seqName = seq.getName(); >> } >> >> >> When trying to simplify the code a bit I now get the following error: >> Execute failed: Could not initialize class >> org.biojava.bio.seq.FeatureFilter >> >> I assume that in the previous times I had a spelling error?? >> Then the exception got thrown during the initialization of "iter" >> >> Thanks, >> >> Bernd >> >> >> On 9/21/2010 2:07 PM, simon rayner wrote: >> >> hi, >> >> can you post the code you are trying to run along with the full error, it >> will help to figure out what is happening. There are now loaders for >> biojavax as well, which work well which are available in the biojavax docs >> here http://biojava.org/wiki/BioJava:BioJavaXDocs#Example >> >> but yeah, it's confusing unless you happen to be a real java guru. i keep >> having to refer back to the docs because i keep forgeting which class does >> what >> >> On Tue, Sep 21, 2010 at 7:46 PM, Bernd Jagla wrote: >> >>> Hello, >>> >>> I am getting a little frustrated with the wiki page (I guess I don't >>> spend enough time reading and testing). I have the impression that some of >>> the documentation relates to version 3 whereas others relate to 1.5 or 1.7. >>> So sorry if this all sounds a bit confused... ;( >>> >>> I believe I am using 1.7.1. (I wasn't able to find a readme file that >>> contains that information) even though I would probably like to use version >>> 3. But as I am stuck with an older Eclipse version I think it will be even >>> worse when I try that. 
>>> >>> Anyways, I am trying to read in sequence files using >>> SeqIOTools.fileToBiojava, which seems to be deprecated, with the following >>> parameters: "genbank", "dna", bufferedReader. >>> >>> somehow this works with "fasta" but with genbank I get the following >>> exception: >>> Execute failed: Unknown file type '524300' >>> in some cases I get: >>> Unknown file type '262156' >>> >>> Does this mean anything to you? >>> >>> Or how do you read in a sequence file? I am looking for a generic way >>> that covers many file types (genbank, fasta, swissprot...) >>> >>> Once I have this I will probably be able to get to the feature >>> information using the information from the tutorial. >>> >>> Thanks for your time. >>> >>> Bernd >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> >> >> -- >> Simon Rayner >> >> State Key Laboratory of Virology >> Wuhan Institute of Virology >> Chinese Academy of Sciences >> Wuhan, Hubei 430071 >> P.R.China >> >> +86 (27) 87199895 (office) >> +86 18627113001 (cell) >> >> > > > -- > Simon Rayner > > State Key Laboratory of Virology > Wuhan Institute of Virology > Chinese Academy of Sciences > Wuhan, Hubei 430071 > P.R.China > > +86 (27) 87199895 (office) > +86 18627113001 (cell) > > -- Simon Rayner State Key Laboratory of Virology Wuhan Institute of Virology Chinese Academy of Sciences Wuhan, Hubei 430071 P.R.China +86 (27) 87199895 (office) +86 18627113001 (cell) From bernd.jagla at pasteur.fr Thu Sep 23 07:23:14 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 13:23:14 +0200 Subject: [Biojava-l] fileToBiojava question In-Reply-To: References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> Message-ID: <4C9B38A2.8010006@pasteur.fr> Simon, thanks a lot!!! I implemented your way in a separate class and it works. 
Now I just have to get it work within my framework....

Best,

Bernd

From bernd.jagla at pasteur.fr Thu Sep 23 10:40:02 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 16:40:02 +0200 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C9B38A2.8010006@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> <4C9B38A2.8010006@pasteur.fr>
Message-ID: <4C9B66C2.704@pasteur.fr> Hello again,... I am still struggling with my little problem. I am getting closer I think... I made some minor modifications to some biojava files and would like to compile it under 1.6. Is this possible? When I compare version 1.7.1 with mine the only difference seems to be the java version. And the sample code runs with your but not with mine... ;( I get the following error message: Exception in thread "main" java.lang.NoClassDefFoundError: org/biojava/utils/bytecode/CodeException at org.biojava.bio.seq.FeatureFilter$OnlyChildren.(FeatureFilter.java:1273) at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1817) at org.biojava.bio.seq.SimpleFeatureHolder.(SimpleFeatureHolder.java:54) at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature(RichFeature.java:167) at org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java:61) at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:100) at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:81) at org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder(SimpleRichSequenceBuilderFactory.java:68) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) at org.pasteur.pf2.biojava.biojavaIO.execute(biojavaIO.java:54) at org.pasteur.pf2.biojava.biojavaIO.main(biojavaIO.java:29) Caused by: java.lang.ClassNotFoundException: org.biojava.utils.bytecode.CodeException at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) ... 11 more Do you think this problem can be due to the compiler??? 
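One quick way to test the classpath theory before blaming the compiler is to ask the class loader directly whether it can see the class named in the trace. The following is only a sketch: `ClasspathCheck` and `isOnClasspath` are made-up names for illustration and are not part of BioJava; the only library call used is the standard `Class.forName`.

```java
// Hypothetical helper, not part of BioJava: reports whether a class can be
// located by the class loader that loaded this sketch.
class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class,
            // do not run its static initialisers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            // the class itself is not visible to this class loader
            return false;
        } catch (LinkageError e) {
            // e.g. NoClassDefFoundError when something the class depends on is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "org.biojava.utils.bytecode.CodeException";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Run with the same classpath as the failing program; if the check reports false, the jar containing that class is most likely missing at run time rather than miscompiled.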
From what I have read on the web about NoClassDefFoundError, it sounds as though my class path is messed up or something like that... Thanks for any tip/hint on how to identify and solve this problem Best, Bernd From bernd.jagla at pasteur.fr Thu Sep 23 10:53:43 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 16:53:43 +0200 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C9B66C2.704@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> <4C9B38A2.8010006@pasteur.fr> <4C9B66C2.704@pasteur.fr> Message-ID: <4C9B69F7.8020309@pasteur.fr> Just to make sure that the "small" modification is not the issue: I added the following in FastqReader.java Iterable<Fastq> read(InputStream inputStream) throws IOException; Fastq readNext(InputStream inputStream) throws IOException; Since I needed an iterator for a different project. Best, Bernd On 9/23/2010 4:40 PM, Bernd Jagla wrote: > Hello again,... > > I am still struggling with my little problem. I am getting closer I > think... > I made some minor modifications to some biojava files and would like > to compile it under 1.6. Is this possible? > > When I compare version 1.7.1 with mine the only difference seems to be > the java version. And the sample code runs with your but not with > mine...
> > Thanks for any tip/hint and how to identify and solve this problem > > Best, > > Bernd > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From bernd.jagla at pasteur.fr Thu Sep 23 11:25:52 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 17:25:52 +0200 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C9B69F7.8020309@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> <4C9B38A2.8010006@pasteur.fr> <4C9B66C2.704@pasteur.fr> <4C9B69F7.8020309@pasteur.fr> Message-ID: <4C9B7180.1090702@pasteur.fr> Bingo, I got it. The problem was that I wasn't using ant to build the library but relied on Eclipse to do the job... Now it seems to be working. Actually, when I include bytecode.jar as Richard suggested, it even works with my KNIME application... Sorry for that... But maybe the iterator would be good to include in the next release??? Cheers, Bernd On 9/23/2010 4:53 PM, Bernd Jagla wrote: > Just to make sure that the "small" modification is not the issue: > > I added the follwing in FastqReader.java > Iterable read(InputStream inputStream) throws IOException; > > Fastq readNext(InputStream inputStream) throws IOException; > > Since I needed an iterator for a different project. > > Best, > > Bernd > > > > On 9/23/2010 4:40 PM, Bernd Jagla wrote: >> Hello again,... >> >> I am still struggling with my little problem. I am getting closer I >> think... >> I made some minor modifications to some biojava files and would like >> to compile it under 1.6. Is this possible? >> >> When I compare version 1.7.1 with mine the only difference seems to >> be the java version. And the sample code runs with your but not with >> mine...
>> >> Thanks for any tip/hint and how to identify and solve this problem >> >> Best, >> >> Bernd >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From bernd.jagla at pasteur.fr Thu Sep 23 11:27:42 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 17:27:42 +0200 Subject: [Biojava-l] new problem: serializable Message-ID: <4C9B71EE.2020603@pasteur.fr> Sorry, again me... I now get the following error: Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) ... 9 more It seems that the SimpleRichSequence is not serializable.... Is there a way to make use of a serializable object? 
Thanks, Bernd From holland at eaglegenomics.com Thu Sep 23 11:06:35 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 23 Sep 2010 16:06:35 +0100 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C9B66C2.704@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> <4C9B38A2.8010006@pasteur.fr> <4C9B66C2.704@pasteur.fr> Message-ID: <648EA5EE-CB83-4BD3-AFFC-944C1BB22E09@eaglegenomics.com> Your classpath is missing bytecode.jar. It's available in the same location as the main biojava jars. cheers, Richard On 23 Sep 2010, at 15:40, Bernd Jagla wrote: > Hello again,... > > I am still struggling with my little problem. I am getting closer I think... > I made some minor modifications to some biojava files and would like to compile it under 1.6. Is this possible? > > When I compare version 1.7.1 with mine the only difference seems to be the java version. And the sample code runs with your but not with mine... ;( > > I get the following error message: > > Exception in thread "main" java.lang.NoClassDefFoundError: org/biojava/utils/bytecode/CodeException > at org.biojava.bio.seq.FeatureFilter$OnlyChildren.(FeatureFilter.java:1273) > at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1817) > at org.biojava.bio.seq.SimpleFeatureHolder.(SimpleFeatureHolder.java:54) > at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature(RichFeature.java:167) > at org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java:61) > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:100) > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:81) > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder(SimpleRichSequenceBuilderFactory.java:68) > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) > at org.pasteur.pf2.biojava.biojavaIO.execute(biojavaIO.java:54) > at 
org.pasteur.pf2.biojava.biojavaIO.main(biojavaIO.java:29) > Caused by: java.lang.ClassNotFoundException: org.biojava.utils.bytecode.CodeException > at java.net.URLClassLoader$1.run(URLClassLoader.java:200) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) > at java.lang.ClassLoader.loadClass(ClassLoader.java:307) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:252) > at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) > ... 11 more > > > Do you think this problem can be due to the compiler??? > > Reading through some of the information on the web about NoClassDefFoundError it should be something like or my class path is messed or something like that... > > Thanks for any tip/hint and how to identify and solve this problem > > Best, > > Bernd > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Thu Sep 23 11:34:32 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 23 Sep 2010 16:34:32 +0100 Subject: [Biojava-l] new problem: serializable In-Reply-To: <4C9B71EE.2020603@pasteur.fr> References: <4C9B71EE.2020603@pasteur.fr> Message-ID: <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> The RichSequence interface doesn't extend Serializable, so you can't serialize BioJavaX sequence objects. :( I can't remember the logic behind that one but it seemed like there was a good reason at the time...
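To make the failure mode concrete, here is a minimal sketch that uses plain stand-in classes rather than the BioJavaX types. `PlainSequence`, `Cell`, `canSerialize`, `extract` and `roundTrip` are all hypothetical names invented for this illustration; only the `java.io` serialization classes are real. It shows a Serializable container failing because its payload is not Serializable, and a simple map of extracted values surviving a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

class SerializationSketch {

    // Hypothetical stand-in for a library type that does not implement Serializable.
    static class PlainSequence {
        final String name;
        final String residues;
        PlainSequence(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }
    }

    // Hypothetical stand-in for a container cell: Serializable itself,
    // but its payload field may hold an object that is not.
    static class Cell implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object payload;
        Cell(Object payload) { this.payload = payload; }
    }

    // Returns false if Java serialization rejects the object graph.
    static boolean canSerialize(Object o) {
        try {
            ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
            out.writeObject(o);
            out.close();
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies only the values of interest into a plain, serializable structure.
    static HashMap<String, String> extract(PlainSequence seq) {
        HashMap<String, String> map = new HashMap<String, String>();
        map.put("name", seq.name);
        map.put("residues", seq.residues);
        return map;
    }

    // Writes the map to bytes and reads it back again.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> roundTrip(HashMap<String, String> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(map);
            out.close();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Object copy = in.readObject();
            in.close();
            return (HashMap<String, String>) copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PlainSequence seq = new PlainSequence("seq1", "ACGT");
        System.out.println("cell with raw payload serializable: " + canSerialize(new Cell(seq)));
        System.out.println("extracted map after round trip: " + roundTrip(extract(seq)));
    }
}
```

Which values belong in such a map depends entirely on what the downstream code needs; the two fields here are placeholders.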
If you're passing sequences around by serialisation, do you really need to pass the complete object or could you just pass the bits you're interested in in some kind of basic data structure? On 23 Sep 2010, at 16:27, Bernd Jagla wrote: > Sorry, again me... > > I now get the following error: > > Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) > at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) > at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) > at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) > at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) > at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) > at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) > at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) > ... 9 more > > It seems that the SimpleRichSequence is not serializable.... > > Is there a way to make use of a serializable object? 
> > Thanks, > > Bernd > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From james.swetnam at nyumc.org Thu Sep 23 12:01:51 2010 From: james.swetnam at nyumc.org (James Swetnam) Date: Thu, 23 Sep 2010 12:01:51 -0400 Subject: [Biojava-l] new problem: serializable In-Reply-To: <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> References: <4C9B71EE.2020603@pasteur.fr> <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> Message-ID: How about subclassing SimpleRichSequence and implementing serializable yourself? Doesn't seem to be final. Eclipse can do it in a jiffy. Hacky, but will get you over the bump. James Swetnam On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland wrote: > The RichSequence interface doesn't extend Serializable, so therefore you > can't seralize BioJavaX sequence objects. :( I can't remember the logic > behind that one but it seemed like there was a good reason at the time... > > If you're passing sequences around by serialisation, do you really need to > pass the complete object or could you just pass the bits you're interested > in in some kind of basic data structure? > > > On 23 Sep 2010, at 16:27, Bernd Jagla wrote: > > > Sorry, again me... 
> > > > Thanks, > > > > Bernd > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- James Swetnam Lead Scientific Programmer Department of Pharmacology NYU Langone Medical Center From bernd.jagla at pasteur.fr Mon Sep 27 08:01:47 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 27 Sep 2010 14:01:47 +0200 Subject: [Biojava-l] new problem: serializable In-Reply-To: References: <4C9B71EE.2020603@pasteur.fr> <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> Message-ID: <4CA087AB.2060808@pasteur.fr> Thanks everyone. I got the biojavax working. Unfortunately the serialization process is not completely done yet ... It turns out the following information is more difficult than expected to serialize... I haven't found a tool in Eclipse that can help me there. Generally the problems arise when dealing with Sets like annotation, features, notes, RankedDocRef. But I also have problems with SimpleNCBITaxon. At least I was able to create a SimpleRichSequence object Please let me know if you can think of something that would ease the work a bit.... Thanks a lot, Bernd On 9/23/2010 6:01 PM, James Swetnam wrote: > How about subclassing SimpleRichSequence and implementing serializable > yourself? Doesn't seem to be final. Eclipse can do it in a jiffy. > Hacky, but will get you over the bump. > > James Swetnam > > On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland > > wrote: > > The RichSequence interface doesn't extend Serializable, so > therefore you can't seralize BioJavaX sequence objects. 
:( I can't > remember the logic behind that one but it seemed like there was a > good reason at the time... > > If you're passing sequences around by serialisation, do you really > need to pass the complete object or could you just pass the bits > you're interested in in some kind of basic data structure? > > > On 23 Sep 2010, at 16:27, Bernd Jagla wrote: > > > Sorry, again me... > > > > I now get the following error: > > > > Caused by: java.io.NotSerializableException: > org.biojavax.bio.seq.SimpleRichSequence > > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) > > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) > > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) > > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) > > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) > > at > java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) > > at > org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) > > at > org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) > > at > org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) > > at > org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) > > at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) > > ... 9 more > > > > It seems that the SimpleRichSequence is not serializable.... > > > > Is there a way to make use of a serializable object? 
> > > > Thanks, > > > > Bernd > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > -- > James Swetnam > Lead Scientific Programmer > Department of Pharmacology > NYU Langone Medical Center > From holland at eaglegenomics.com Mon Sep 27 08:04:42 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 27 Sep 2010 13:04:42 +0100 Subject: [Biojava-l] new problem: serializable In-Reply-To: <4CA087AB.2060808@pasteur.fr> References: <4C9B71EE.2020603@pasteur.fr> <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> <4CA087AB.2060808@pasteur.fr> Message-ID: <3A9596A6-4A05-49DF-9F4D-48E439F7E8BA@eaglegenomics.com> I think you can follow James' advice and subclass SimpleRichSequence, and then annotate it such that the awkward bits are not serialised. Or, you can just extract the parameters of interest out of the original object and put them into some holding class (e.g. a simple HashMap) as I suggested and serialise that instead. cheers, Richard On 27 Sep 2010, at 13:01, Bernd Jagla wrote: > Thanks everyone. I got the biojavax working. > Unfortunately the serialization process is not completely done yet ... > It turns out the following information is more difficult than expected to serialize... I haven't found a tool in Eclipse that can help me there. > > Generally the problems arise when dealing with Sets like annotation, features, notes, RankedDocRef. > But I also have problems with SimpleNCBITaxon.
> > At least I was able to create a SimpleRichSequence object > > Please let me know if you can think of something that would ease the work a bit.... > > Thanks a lot, > > Bernd > > > > On 9/23/2010 6:01 PM, James Swetnam wrote: >> How about subclassing SimpleRichSequence and implementing serializable yourself? Doesn't seem to be final. Eclipse can do it in a jiffy. Hacky, but will get you over the bump. >> >> James Swetnam >> >> On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland wrote: >> The RichSequence interface doesn't extend Serializable, so therefore you can't seralize BioJavaX sequence objects. :( I can't remember the logic behind that one but it seemed like there was a good reason at the time... >> >> If you're passing sequences around by serialisation, do you really need to pass the complete object or could you just pass the bits you're interested in in some kind of basic data structure? >> >> >> On 23 Sep 2010, at 16:27, Bernd Jagla wrote: >> >> > Sorry, again me... >> > >> > I now get the following error: >> > >> > Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence >> > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) >> > at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) >> > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) >> > at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) >> > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) >> > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) >> > at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) >> > at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) >> > at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) >> > at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) >> > at 
org.knime.core.data.container.Buffer.addRow(Buffer.java:551) >> > ... 9 more >> > >> > It seems that the SimpleRichSequence is not serializable.... >> > >> > Is there a way to make use of a serializable object? >> > >> > Thanks, >> > >> > Bernd >> > >> > _______________________________________________ >> > Biojava-l mailing list - Biojava-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> >> -- >> James Swetnam >> Lead Scientific Programmer >> Department of Pharmacology >> NYU Langone Medical Center >> -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From bernd.jagla at pasteur.fr Mon Sep 27 09:17:55 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 27 Sep 2010 15:17:55 +0200 Subject: [Biojava-l] new problem: serializable In-Reply-To: <3A9596A6-4A05-49DF-9F4D-48E439F7E8BA@eaglegenomics.com> References: <4C9B71EE.2020603@pasteur.fr> <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> <4CA087AB.2060808@pasteur.fr> <3A9596A6-4A05-49DF-9F4D-48E439F7E8BA@eaglegenomics.com> Message-ID: <4CA09983.9090107@pasteur.fr> Yes, that's what I am doing. I have subclassed from SimpleRichSequence to SimpleSerializableRichSequence (couldn't think of something nicer...) and am working my way through the bits... I just haven't found the tools that make this a "jiffy". I am sweating here; more or less at least.. 
;) Thanks again B On 9/27/2010 2:04 PM, Richard Holland wrote: > I think you can follow James' advice and subclass SimpleRichSequence, and then annotate it such that the awkward bits are not seralised. > > Or, you can just extract the parameters of interest out of the original object and put them into some holding class (e.g. a simple HashMap) as I suggested and serialise that instead. > > cheers, > Richard > > On 27 Sep 2010, at 13:01, Bernd Jagla wrote: > >> Thanks everyone. I got the biojavax working. >> Unfortunately the serialization process is not completely done yet ... >> It turns out the following information is more difficult than expected to serialize... I haven't found a tool in Eclipse that can help me there. >> >> Generally the problems arise when dealing with Sets like annotation, features, notes, RankedDocRef. >> But I also have problems with SimpleNCBITaxon. >> >> At least I was able to create a SimpleRichSequence object >> >> Please let me know if you can think of something that would ease the work a bit.... >> >> Thanks a lot, >> >> Bernd >> >> >> >> On 9/23/2010 6:01 PM, James Swetnam wrote: >>> How about subclassing SimpleRichSequence and implementing serializable yourself? Doesn't seem to be final. Eclipse can do it in a jiffy. Hacky, but will get you over the bump. >>> >>> James Swetnam >>> >>> On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland wrote: >>> The RichSequence interface doesn't extend Serializable, so therefore you can't seralize BioJavaX sequence objects. :( I can't remember the logic behind that one but it seemed like there was a good reason at the time... >>> >>> If you're passing sequences around by serialisation, do you really need to pass the complete object or could you just pass the bits you're interested in in some kind of basic data structure? >>> >>> >>> On 23 Sep 2010, at 16:27, Bernd Jagla wrote: >>> >>>> Sorry, again me... 
>>>> >>>> Thanks, >>>> >>>> Bernd >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >>> >>> -- >>> James Swetnam >>> Lead Scientific Programmer >>> Department of Pharmacology >>> NYU Langone Medical Center >>> > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > From mdmiller53 at comcast.net Mon Sep 27 12:35:45 2010 From: mdmiller53 at comcast.net (mdmiller) Date: Mon, 27 Sep 2010 09:35:45 -0700 Subject: [Biojava-l] new problem: serializable In-Reply-To: References: Message-ID: <3143202B18DD41FE9FA8015A85E09D8C@mmPC> hi bernd, > Please let me know if you can think of something that would ease the > work a bit.... two ideas. if you know a resource you can refetch the record from, then you only have to serialize the identifier at that resource then when you deserialize, simply go to that resource. the other is to use XStream (http://xstream.codehaus.org/), which is open source, and i found does a good job. cheers, michael ----- Original Message ----- > > Message: 1 > Date: Mon, 27 Sep 2010 14:01:47 +0200 > From: Bernd Jagla > Subject: Re: [Biojava-l] new problem: serializable > To: James Swetnam > Cc: biojava-l at lists.open-bio.org > Message-ID: <4CA087AB.2060808 at pasteur.fr> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Thanks everyone. I got the biojavax working. 
> Unfortunately the serialization process is not completely done yet ... > It turns out the following information is more difficult than expected > to serialize... I haven't found a tool in Eclipse that can help me there. > > Generally the problems arise when dealing with Sets like annotation, > features, notes, RankedDocRef. > But I also have problems with SimpleNCBITaxon. > > At least I was able to create a SimpleRichSequence object > > Please let me know if you can think of something that would ease the > work a bit.... > > Thanks a lot, > > Bernd > > > From asandro1501 at gmail.com Mon Sep 27 21:17:11 2010 From: asandro1501 at gmail.com (Alex Silva) Date: Mon, 27 Sep 2010 22:17:11 -0300 Subject: [Biojava-l] headers .gbk Message-ID: Good evening, I need a code to read the headers of a file .Gbk. I need to locate the occurrences of geneid. Can anyone help me? -- Alex Silva G.R.A. Sistemas Corporativos msn: gra.sistemas at hotmail.com 55-9165-7378 From jolyon.holdstock at ogt.co.uk Tue Sep 28 05:49:48 2010 From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock) Date: Tue, 28 Sep 2010 10:49:48 +0100 Subject: [Biojava-l] headers .gbk[Scanned] In-Reply-To: References: Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F0336CE17@EUCLID.internal.ogtip.com> Hi, The db_xref annotations aren't picked up (please correct me if I'm wrong) so depending on how many files you are handling one way is to change all the gene ID annotations from: /db_xref="GeneID:1095448" to /GeneID="1095448" You can then create a ComparableTerm to extract these: ComparableTerm geneIDTerm = RichObjectFactory.getDefaultOntology().getOrCreateTerm("GeneID"); There is an example in the cookbook on how you can use this: http://www.biojava.org/wiki/BioJava:Cookbook:Annotations:List2 hope this helps, J -----Original Message----- From: Alex Silva [mailto:asandro1501 at gmail.com] Sent: 28 September 2010 02:17 To: biojava-l at biojava.org Subject: [Biojava-l] headers .gbk[Scanned] Good evening, I need a code to 
read the headers of a file .Gbk. I need to locate the occurrences of geneid. Can anyone help me? -- Alex Silva G.R.A. Sistemas Corporativos msn: gra.sistemas at hotmail.com 55-9165-7378 _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l This email has been scanned by Oxford Gene Technology Security Systems. This email has been scanned by Oxford Gene Technology Security Systems. From asandro1501 at gmail.com Tue Sep 28 23:54:29 2010 From: asandro1501 at gmail.com (Alex Silva) Date: Wed, 29 Sep 2010 00:54:29 -0300 Subject: [Biojava-l] headers .gbk[Scanned] In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F0336CE17@EUCLID.internal.ogtip.com> References: <588D0DD225D05746B5D8CAE1BE971F3F0336CE17@EUCLID.internal.ogtip.com> Message-ID: Hi Jolyon You can help me use the code, I need only search for the file geneid alt_Celera_chr22.gbk. I saw but I do not use code. Thank you for your attention 2010/9/28 Jolyon Holdstock > Hi, > > The db_xref annotations aren't picked up (please correct me if I'm > wrong) so depending on how many files you are handling one way is to > change all the gene ID annotations from: > > /db_xref="GeneID:1095448" to /GeneID="1095448" > > You can then create a ComparableTerm to extract these: > > ComparableTerm geneIDTerm = > RichObjectFactory.getDefaultOntology().getOrCreateTerm("GeneID"); > > There is an example in the cookbook on how you can use this: > > http://www.biojava.org/wiki/BioJava:Cookbook:Annotations:List2 > > hope this helps, > > J > > -----Original Message----- > From: Alex Silva [mailto:asandro1501 at gmail.com] > Sent: 28 September 2010 02:17 > To: biojava-l at biojava.org > Subject: [Biojava-l] headers .gbk[Scanned] > > Good evening, > > I need a code to read the headers of a file .Gbk. I need to locate the > occurrences of geneid. Can anyone help me? > > -- > Alex Silva > G.R.A. 
Sistemas Corporativos > msn: gra.sistemas at hotmail.com > 55-9165-7378 > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > > > > > > > > This email has been scanned by Oxford Gene Technology Security Systems. > > > > > > > > > > > > This email has been scanned by Oxford Gene Technology Security Systems. > -- Alex Silva G.R.A. Sistemas Corporativos msn: gra.sistemas at hotmail.com 55-9165-7378 From darnells at dnastar.com Wed Sep 29 00:29:00 2010 From: darnells at dnastar.com (Steve Darnell) Date: Tue, 28 Sep 2010 23:29:00 -0500 Subject: [Biojava-l] Chemical component files Message-ID: Greetings, I would like to install a local instance of the Chemical Component Dictionary (CCD) for use with BioJava's chemical component functionality. The wwPDB distributes the CCD as a single file; however, ChemCompGroupFactory.java and related classes expect individual cif.gz files. The individual cif.gz files are downloaded on-the-fly from http://www.rcsb.org/pdb/files/ligand/, but they are not discoverable from a web browser. Where can I find the individual files that form the CCD? Am I overlooking a utility to use the single file version? Best regards, Steve From andreas at sdsc.edu Wed Sep 29 13:43:50 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 29 Sep 2010 10:43:50 -0700 Subject: [Biojava-l] Chemical component files In-Reply-To: References: Message-ID: Hi Steve, BioJava can automatically fetch missing chemical component files from the RCSB web site for you. If you set up the PDB file path correctly the files will get cached in that location for future use. As such there should be no need to worry about getting them installed manually (unless you work with offline computers?). If you want access to the single files, you can access them by their three letter code: http://www.rcsb.org/pdb/files/ligand/TYS.cif Does that work for you? 
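The per-component lookup described above can be sketched as a small helper that builds the download URL from a three-letter code. This is only an illustration and not BioJava's own code: the class and method names are hypothetical, the base URL is the one quoted in this thread, and the check that a code is one to three letters or digits is an assumption.

```java
import java.util.Locale;

public class LigandUrlSketch {
    // Base location quoted in this thread; it may not stay valid.
    public static final String BASE = "http://www.rcsb.org/pdb/files/ligand/";

    // Hypothetical helper: builds the URL of a single chemical component file.
    public static String ligandUrl(String code) {
        String id = code.trim().toUpperCase(Locale.ROOT);
        // Assumption: component ids are 1-3 letters or digits (e.g. TYS, MLZ, ACE).
        if (!id.matches("[A-Z0-9]{1,3}")) {
            throw new IllegalArgumentException("not a component id: " + code);
        }
        return BASE + id + ".cif";
    }

    public static void main(String[] args) {
        // Prints the URL for the TYS component used as the example in this thread.
        System.out.println(ligandUrl("tys"));
    }
}
```

For a machine without network access, the same helper could be used once on a connected machine to list which files need to be copied into the local cache directory.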
Andreas On Tue, Sep 28, 2010 at 9:29 PM, Steve Darnell wrote: > Greetings, > > I would like to install a local instance of the Chemical Component > Dictionary (CCD) for use with BioJava's chemical component > functionality. The wwPDB distributes the CCD as a single file; however, > ChemCompGroupFactory.java and related classes expect individual cif.gz > files. > > The individual cif.gz files are downloaded on-the-fly from > http://www.rcsb.org/pdb/files/ligand/, but they are not discoverable > from a web browser. Where can I find the individual files that form the > CCD? Am I overlooking a utility to use the single file version? > > Best regards, > Steve > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Thu Sep 9 22:24:35 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 9 Sep 2010 15:24:35 -0700 Subject: [Biojava-l] Biojava Post translational modifications In-Reply-To: <9CE87E39-5DE3-4996-A53F-63C2B5453901@gmail.com> References: <9CE87E39-5DE3-4996-A53F-63C2B5453901@gmail.com> Message-ID: Hi Jay, Is this from the latest svn-trunk? Sounds like this has been created using the biojava 1.7. There were several improvements over the last months regarding chemically modified groups .... In the current code base if you set FileParsingParameters.setLoadChemCompInfo(true), you will get the chemically correct representation for all groups... I suggest trying out the code below (using a checkout from biojava-svn ...)
Andreas

public void basicLoad(String pdbId){
    try {
        PDBFileReader reader = new PDBFileReader();

        // the path to the local PDB installation
        reader.setPath("/tmp");

        // are all files in one directory, or are the files split,
        // as on the PDB ftp servers?
        reader.setPdbDirectorySplit(true);

        // should a missing PDB id be fetched automatically from the FTP servers?
        reader.setAutoFetch(true);

        // configure the parameters of file parsing
        FileParsingParameters params = new FileParsingParameters();

        // should the ATOM and SEQRES residues be aligned when creating the internal data model?
        params.setAlignSeqRes(true);

        // should secondary structure get parsed from the file
        params.setParseSecStruc(false);

        // This tells the code to fetch the chemical definitions for all groups
        params.setLoadChemCompInfo(true);

        reader.setFileParsingParameters(params);

        Structure structure = reader.getStructureById(pdbId);

        System.out.println(structure);

        for (Chain c: structure.getChains()){
            System.out.println("Chain " + c.getName() + " details:");
            System.out.println("Atom ligands: " + c.getAtomLigands());
            System.out.println(c.getSeqResGroups());
        }
    } catch (Exception e){
        e.printStackTrace();
    }
}

On Thu, Sep 9, 2010 at 3:13 PM, JAX wrote: > Hi Andreas, some of my collaborators could not get post translational > modifications from pdb files using biojavas structure API. Do you have any > thoughts on this? > > Jay Vyas > MMSB > UCHC > Begin forwarded message: > > From: Patrick Gradie > Date: September 9, 2010 5:23:10 PM EDT > To: biotoolkit at googlegroups.com > Subject: Re: problems with biojava > Reply-To: biotoolkit at googlegroups.com > > The issue that I found with the BioJava PDB utility is as follows: > BioJava takes a PDB File xxxx.cif.gz and then populates a Structure variable > in memory that you can pull from. > You are able to get things like header, dbref, model, chain, residue, and > atom info.
That was good to have, however, I found that when I tried > searching for motifs I could not find any of the ones that had required > modifications. > This is because when biojava would parse (ACE)SKS(MLZ)DRKYTL it would simply > truncate the (ACE) and (MLZ). However the important thing here is that MLZ > is an N-METHYL-LYSINE or a K before modification. > So in the database would be SKSDRKY (there is no atom data for T or L in the > example string only sequence information) > The motif [KR][AST]K[DNQK] would not be found in that truncated sequence > because the K in the center is required to be in the sequence. > I am not sure why BioJava would just truncate these modified residues. > ESPECIALLY because in the pdb file itself is the following line in every > single file except around 15 out of the 64k: > > loop_ > _entity_poly.entity_id > _entity_poly.type > _entity_poly.nstd_linkage > _entity_poly.nstd_monomer > _entity_poly.pdbx_seq_one_letter_code > _entity_poly.pdbx_seq_one_letter_code_can > _entity_poly.pdbx_strand_id > 1 'polypeptide(L)' no no > ;GAMGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELM > PGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGN > TLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKS > GPEAPEWYQVELKAFQATQQK > ; > ;GAMGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELM > PGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGN > TLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKS > GPEAPEWYQVELKAFQATQQK > ; > A > 2 'polypeptide(L)' no yes '(ACE)SKS(MLZ)DRKYTL' > XSKSKDRKYTL > B > > As you can see above, the sequence XSKSKDRKYTL is given in full. the ACE is > turned into an X because it doesn't map to a regular amino acid. So the PDB > files hold both the modified and unmodified version of the sequence in this > special section.
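To make the point concrete, here is a minimal sketch that uses only java.util.regex and no BioJava classes. It matches the motif from this message against the canonical sequence XSKSKDRKYTL and against the truncated SKSDRKY; the class and method names are hypothetical and chosen for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MotifSearchSketch {
    // Returns the 0-based start index of the first motif match, or -1 if there is none.
    public static int firstMatch(String motifRegex, String sequence) {
        Matcher m = Pattern.compile(motifRegex).matcher(sequence);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String motif = "[KR][AST]K[DNQK]";
        // Canonical one-letter sequence from _entity_poly: the motif matches "KSKD".
        System.out.println(firstMatch(motif, "XSKSKDRKYTL")); // prints 2
        // Sequence with the modified residues dropped: the central K is missing.
        System.out.println(firstMatch(motif, "SKSDRKY"));     // prints -1
    }
}
```

The match in the first case starts at the K that precedes the methylated lysine, which is exactly the residue lost when the modified groups are dropped.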
Given that information it is possible to create a database > that motifs can be searched for within. > BioJava will throw a bunch of errors "WARNING: unknown group name MLZ" for > residues it doesn't interpret as regular amino acids. > I am not sure, though, if the BioJava 3 release fixes this problem. > -Patrick > > On Thu, Sep 9, 2010 at 1:47 PM, Jay Vyas wrote: >> >> Hi guys, does anyone want to tell me about the issues regarding the >> PDB utilities in BioJava ? I am interested in knowing what they were >> ? >> >> -- >> Jay Vyas >> MMSB/UCHC > > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From asandro1501 at gmail.com Fri Sep 10 01:17:19 2010 From: asandro1501 at gmail.com (Alex Silva) Date: Thu, 9 Sep 2010 22:17:19 -0300 Subject: [Biojava-l] First project Biojava Message-ID: Hello I'm Brazilian so I apologize for my English. I am starting with BioJava. I need to write code that works with the .gbk and .fa file formats. To be more specific, I need to find, in the headers of the .gbk files, the start location of a particular protein and then search for it in the .fa file. Thank you for listening. -- Alex Silva G.R.A. Sistemas Corporativos msn: gra.sistemas at hotmail.com 55-9165-7378 From anantpossible at gmail.com Sat Sep 11 04:42:30 2010 From: anantpossible at gmail.com (Anant Jain) Date: Sat, 11 Sep 2010 10:12:30 +0530 Subject: [Biojava-l] First project Biojava In-Reply-To: References: Message-ID: Hi Alex, I have done some work on PDB files and parsing their headers. I need to check for the corresponding API for GenBank and FASTA files and will let you know once I find one. As you are new to BioJava, you can contact me on Yahoo. My IM is anant_jain86 at yahoo.com I will be happy to help you.
Regards, Anant Jain B.Tech Bioinformatics RHCE On Fri, Sep 10, 2010 at 6:47 AM, Alex Silva wrote: > Hello > > I'm Brazilian so I apologize for my English. I am starting with BioJava. > I need > to write code that works with the .gbk and .fa file formats. To be more > specific, I need to find, in the headers of the .gbk files, the start > location of a particular protein and then search for it in the .fa > file. > > Thank you for listening. > > -- > Alex Silva > G.R.A. Sistemas Corporativos > msn: gra.sistemas at hotmail.com > 55-9165-7378 > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Anant Jain B.Tech Bioinformatics, RHCE From nagendravns at gmail.com Mon Sep 13 12:36:46 2010 From: nagendravns at gmail.com (nagendra kumar) Date: Mon, 13 Sep 2010 18:06:46 +0530 Subject: [Biojava-l] run biojava Message-ID: I am running BioJava on Debian. How do I set the classpath and path on Debian? I get an error about deprecated API use in my program; how do I resolve this problem? From nagendravns at gmail.com Mon Sep 13 14:30:51 2010 From: nagendravns at gmail.com (nagendra kumar) Date: Mon, 13 Sep 2010 20:00:51 +0530 Subject: [Biojava-l] run biojava Message-ID: How do I run BioJava on Debian? From koen.bruynseels at cropdesign.com Mon Sep 13 16:22:48 2010 From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com) Date: Mon, 13 Sep 2010 18:22:48 +0200 Subject: [Biojava-l] Koen Bruynseels is out of the office. Message-ID: I will be out of the office starting 09/11/2010 and will not return until 09/15/2010. I will respond to your message when I return. From simon.rayner.cn at gmail.com Tue Sep 14 01:18:15 2010 From: simon.rayner.cn at gmail.com (simon rayner) Date: Mon, 13 Sep 2010 21:18:15 -0400 Subject: [Biojava-l] run biojava In-Reply-To: References: Message-ID: you can run it the same way you would run a java program.
have a look at the cookbook page, there are many examples. http://biojava.org/wiki/BioJava:CookBook#How_Do_I.....3F this is a good first example to try and run. just copy, paste and compile http://biojava.org/wiki/BioJava:Cookbook:SeqIO:ReadFasta On Mon, Sep 13, 2010 at 10:30 AM, nagendra kumar wrote: > how to run biojava in debian , > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Simon Rayner State Key Laboratory of Virology Wuhan Institute of Virology Chinese Academy of Sciences Wuhan, Hubei 430071 P.R.China +86 (27) 87199895 (office) +86 18627113001 (cell) From paolo.romano at istge.it Wed Sep 15 15:51:34 2010 From: paolo.romano at istge.it (Paolo Romano) Date: Wed, 15 Sep 2010 17:51:34 +0200 Subject: [Biojava-l] NETTAB 2010: Submission deadline is approaching: Sep 24, 2010 Message-ID: <201009151552.o8FFpaXH000456@clus2.istge.it> I hope this announcement can be of interest for this list. Forgive me if I'm wrong! And apologies for any duplication. Ciao. Paolo ========== NETTAB 2010 on "Biological Wikis" joint with the BBCC 2010 workshop on Bioinformatics and Computational Biology in Campania November 29 - December 1, 2010, Naples, Italy http://www.nettab.org/2010/ http://bioinformatica.isa.cnr.it./BBCC/BBCC2010/ The deadline for the submission of oral communications is quickly approaching, submit you contribution within next Friday September 24, 2010 through the EasyChair site ( http://www.easychair.org/conferences/?conf=nettab2010 ). The lenght of contributions for oral communications should be between 3 and 5 pages, including tables and figures. See more instructions below. NETTAB 2010 workshop promises to be a great meeting for all researchers involved in the exploitation of wikis in biology. 
Don't miss this opportunity to discuss your ideas and doubts with such scientists as - Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom - Alexander Pico, Gladstone Institute of Cardiovascular Disease, San Francisco, USA - Andrew Su, Bioinformatics and Computational Biology, Genomics Institute of the Novartis Research Foundation (GNF), San Diego, USA - Dan Bolser, College of Life Sciences, University of Dundee, Scotland, United Kingdom - Robert Hoffmann, Computational Biology Center, cBIO, Memorial Sloan-Kettering Cancer Center, MSKCC, New York, USA - Thomas Kelder, Department of Bioinformatics (BiGCaT), Maastricht University, the Netherlands - Jaime Prilusky, Bioinformatics, Weizmann Institute of Science, Rehovot, Israel - and many other who, we hope, will join the workshop. Here below, please find a summary of the Call. The complete Call is available on-line at http://www.nettab.org/2010/call.html . Further information is availble at http://www.nettab.org/2010/ . ============ CALL FOR PAPERS TOPICS The following list is not meant to be exclusive of any further topics as stated above. 
Submitted contributions should address one or more of the following topics: * Wiki development tools o Wikimedia o Wikimedia extensions o Semantic Wikis o Wiki-coupled CMSs o Other wikis * Arising issues for the biomedical domain: o Authoritativeness of contributions and sites o Quality assessment o Users acknowledgement o Stimulatation of quality contributions o Authorships management and reward o 'Scientific production' value for contributions o Management of bioinformatics data types * Wikis and collaborative systems for: o Genomics, proteomics, metabolomics, any -omics o Proteins analysis and visualization o gene and proteins interactions o metabolic pathways o oncology research * Issues to be tackled by wiki and collaborative research for: o Genomics, proteomics, metabolomics, any -omics o Proteins analysis and visualization o gene and proteins interactions o metabolic pathways o oncology research The NETTAB 2010 workshop is a joint event with the BBCC 2010 workshop on This deadline also applies to the BBCC 2010 workshop. Submit for BBCC through the same EasyChair site and select 'BBCC session' topic. TYPE OF CONTRIBUTIONS The following possible contributions are sought: * Oral communications * Posters * Software demos All accepted contributions will be published in the proceedings of the workshop. DEADLINES * September 24, 2010: Oral communications submission o Decisions announced: October 24, 2010 * October 29, 2010: Early registration ends * November 29 - December 1, 2010: Workshop and Tutorials INSTRUCTIONS Kindly follow the instructions carefully when preparing your contribution and submit your contribution through the EasyChair system at http://www.easychair.org/conferences/?conf=nettab2010. All contributions should follow the same format, as specified here: font type: Times New Roman, font size: 12 pti, page size: A4, left and right margins: 2.0 cm, upper margin: 2.5 cm, lower margin: 2.0 cm. 
The lenght of contributions for oral communications should be between 3 and 5 pages, including tables and figures. They should include: Abstract, Introduction, Methods, Results and Discussion, References. All contributions for oral communications will be evaluated by at least three referees. For any further information or clarification, please contact the organization by email at info at nettab.org. ORGANIZATION (see http://www.nettab.org/2010/organization.html for the Scientific Committee and more information) Co-chairs * Angelo Facchiano, CNR-ISA, Avellino, Italy * Paolo Romano, National Cancer Research Institute, Genoa, Italy We look forward to meeting you in Naples! Paolo Romano and Angelo Facchiano on behalf of the Scientific Committee Paolo Romano (paolo.romano at istge.it) Bioinformatics National Cancer Research Institute (IST) Largo Rosanna Benzi, 10, I-16132, Genova, Italy Tel: +39-010-5737-288 Fax: +39-010-5737-295 Skype: p.romano Web: http://www.nettab.org/promano/ From bernd.jagla at pasteur.fr Tue Sep 21 11:46:35 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Tue, 21 Sep 2010 13:46:35 +0200 Subject: [Biojava-l] fileToBiojava question Message-ID: <4C989B1B.7090807@pasteur.fr> Hello, I am getting a little frustrated with the wiki page (I guess I don't spend enough time reading and testing). I have the impression that some of the documentation relates to version 3 whereas others relate to 1.5 or 1.7. So sorry if this all sounds a bit confused... ;( I believe I am using 1.7.1. (I wasn't able to find a readme file that contains that information) even though I would probably like to use version 3. But as I am stuck with an older Eclipse version I think it will be even worse when I try that. Anyways, I am trying to read in sequence files using SeqIOTools.fileToBiojava, which seems to be deprecated, with the following parameters: "genbank", "dna", bufferedReader. 
somehow this works with "fasta" but with genbank I get the following exception: Execute failed: Unknown file type '524300' in some cases I get: Unknown file type '262156' Does this mean anything to you? Or how do you read in a sequence file? I am looking for a generic way that covers many file types (genbank, fasta, swissprot...) Once I have this I will probably be able to get to the feature information using the information from the tutorial. Thanks for your time. Bernd From simon.rayner.cn at gmail.com Tue Sep 21 12:07:21 2010 From: simon.rayner.cn at gmail.com (simon rayner) Date: Tue, 21 Sep 2010 20:07:21 +0800 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C989B1B.7090807@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> Message-ID: hi, can you post the code you are trying to run along with the full error, it will help to figure out what is happening. There are now loaders for biojavax as well, which work well which are available in the biojavax docs here http://biojava.org/wiki/BioJava:BioJavaXDocs#Example but yeah, it's confusing unless you happen to be a real java guru. i keep having to refer back to the docs because i keep forgeting which class does what On Tue, Sep 21, 2010 at 7:46 PM, Bernd Jagla wrote: > Hello, > > I am getting a little frustrated with the wiki page (I guess I don't spend > enough time reading and testing). I have the impression that some of the > documentation relates to version 3 whereas others relate to 1.5 or 1.7. > So sorry if this all sounds a bit confused... ;( > > I believe I am using 1.7.1. (I wasn't able to find a readme file that > contains that information) even though I would probably like to use version > 3. But as I am stuck with an older Eclipse version I think it will be even > worse when I try that. > > Anyways, I am trying to read in sequence files using > SeqIOTools.fileToBiojava, which seems to be deprecated, with the following > parameters: "genbank", "dna", bufferedReader. 
> > somehow this works with "fasta" but with genbank I get the following > exception: > Execute failed: Unknown file type '524300' > in some cases I get: > Unknown file type '262156' > > Does this mean anything to you? > > Or how do you read in a sequence file? I am looking for a generic way that > covers many file types (genbank, fasta, swissprot...) > > Once I have this I will probably be able to get to the feature information > using the information from the tutorial. > > Thanks for your time. > > Bernd > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Simon Rayner State Key Laboratory of Virology Wuhan Institute of Virology Chinese Academy of Sciences Wuhan, Hubei 430071 P.R.China +86 (27) 87199895 (office) +86 18627113001 (cell) From bernd.jagla at pasteur.fr Tue Sep 21 12:39:09 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Tue, 21 Sep 2010 14:39:09 +0200 Subject: [Biojava-l] what is a namespace? Message-ID: <4C98A76D.5060309@pasteur.fr> Hi, sorry for the basic question, but I would like to clarify the following: When you are talking about namespace in your documentation (e.g. biojavaXDocs) it means that I tag the information that is associated with that namespace in order to differentiate it from something else. Is this a fair description? I can generate a name space using the following code: Namespace ns = (Namespace)RichObjectFactory.getObject(SimpleNamespace.class,new Object[]{"myNamespace"}); Why is the second parameter an array of objects? When would I use something other than a SimpleNamespace class? Could you point me to some examples of their use? In my code I can have many instances of a given class. Do I need to use different namespaces each time to avoid conflicts? E.g. I have class that reads in sequence and annotation data. Do I have use different namespaces for each instance? Thanks, Bernd PS. 
please let me know if these questions are too basic to ask!!!! Otherwise I will probably have some more ;)

From bernd.jagla at pasteur.fr  Tue Sep 21 12:47:21 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 21 Sep 2010 14:47:21 +0200
Subject: [Biojava-l] fileToBiojava question
In-Reply-To:
References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr>
Message-ID: <4C98A959.2040405@pasteur.fr>

Sorry for the wrong reply...
Here is the FULL code; I marked the passages that are important in red:

Thanks for looking at it!!!!

Bernd

package org.pasteur.pf2.biojava;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.seq.io.SymbolTokenization;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojava.bio.symbol.SymbolList;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.io.RichSequenceFormat;
import org.knime.core.data.DataCell;
import org.knime.core.data.DataColumnSpec;
import org.knime.core.data.DataColumnSpecCreator;
import org.knime.core.data.DataTableSpec;
import org.knime.core.data.RowKey;
import org.knime.core.data.container.BlobDataCell;
import org.knime.core.data.def.DefaultRow;
import org.knime.core.data.def.StringCell;
import org.knime.core.node.BufferedDataContainer;
import org.knime.core.node.BufferedDataTable;
import org.knime.core.node.CanceledExecutionException;
import org.knime.core.node.ExecutionContext;
import org.knime.core.node.ExecutionMonitor;
import org.knime.core.node.InvalidSettingsException;
import org.knime.core.node.NodeLogger;
import org.knime.core.node.NodeModel;
import org.knime.core.node.NodeSettingsRO;
import org.knime.core.node.NodeSettingsWO;
import org.knime.core.node.defaultnodesettings.SettingsModelString;
import org.biojavax.bio.seq.io.EMBLFormat;
import org.biojavax.bio.seq.io.FastaFormat;
import org.biojavax.bio.seq.io.GenbankFormat;
import org.biojavax.bio.seq.io.INSDseqFormat;
import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
import org.biojavax.bio.seq.io.RichStreamReader;
import org.biojavax.bio.seq.io.UniProtFormat;
import org.pasteur.pf2.datatypes.*;

/**
 * This is the model implementation of FastAReader. Reads a FASTA file into two
 * columns: seq_name and sequence
 *
 * @author Bernd Jagla
 */
@SuppressWarnings("deprecation")
public class FastAReaderNodeModel extends NodeModel {
    // the logger instance
    private static final NodeLogger logger = NodeLogger
            .getLogger(FastAReaderNodeModel.class);
    private Alphabet alpha;
    private SequenceIterator iter;

    /**
     * the settings key which is used to retrieve and store the settings (from
     * the dialog or from a settings file) (package visibility to be usable from
     * the dialog).
     */
    private static final String FAR_name = "far_name";

    private static final String FAR_fileFormat = "far_ff";

    private static final String FAR_alphabet = "far_alph";

    private final SettingsModelString m_fpname = createFAR_fpname();
    private final SettingsModelString m_fformat = createFileFormat();
    private final SettingsModelString m_alphabet = createAlphabet();

    /**
     * Constructor for the node model.
     */
    protected FastAReaderNodeModel() {
        super(0, 1);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData,
            final ExecutionContext exec) throws Exception {

        // TODO do something here
        logger.info("Node Model Stub... this is not yet implemented !");

        // the data table spec of the single output table,
        // the table will have one column:
        DataColumnSpec[] allColSpecs = new DataColumnSpec[1];
        allColSpecs[0] = new DataColumnSpecCreator("sequence",
                SequenceDataCell.TYPE).createSpec();
        DataTableSpec outputSpec = new DataTableSpec(allColSpecs);
        // the execution context will provide us with storage capacity, in this
        // case a data container to which we will add rows sequentially
        // Note, this container can also handle arbitrary big data tables, it
        // will buffer to disc if necessary.
        BufferedDataContainer container = exec.createDataContainer(outputSpec);
        // let's add one row per sequence to it
        // once we are done, we close the container and return its table
        FileReader fp = new FileReader(m_fpname.getStringValue());

        exec.checkCanceled();
        //String form = m_fformat.getStringValue();
        //String alphabet = m_alphabet.getStringValue();
        String form = "genbank";
        String alphabet = "DNA";

        BufferedReader br = new BufferedReader(fp);
        // String line = br.readLine();
        int count = 0;

        SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava(
                form, alphabet, br);

        while (iter.hasNext()) {
            exec.checkCanceled();
            RowKey key = new RowKey("Row " + count);
            exec.setProgress("Row " + count);
            // System.out.println(fastq.getSequence());
            Sequence seq = iter.nextSequence();
            String seqName = seq.getName();
            // String seqName = "asdf";
            //String sequence = seq.seqString();
            System.err.println("reading: " + seqName + " " + seq.length());
            SequenceDataCell seqCell = new SequenceDataCell(seqName, seq);
            container.addRowToTable(new DefaultRow(key, seqCell));
            count++;
        }
        System.err.println("finished reading file");
        br.close();
        fp.close();
        container.close();
        return new BufferedDataTable[] { container.getTable() };
    }

    /**
     * Makes a SequenceIterator look like an
     * Iterator{@code <Sequence>}
     *
     * @param iter
     *            The SequenceIterator
     * @return An Iterator that returns only Sequence
     *         objects. You cannot call remove() on this
     *         iterator!
     */
    public Iterator<Sequence> asIterator(SequenceIterator iter) {
        final SequenceIterator it = iter;
        return new Iterator<Sequence>() {
            public boolean hasNext() {
                return it.hasNext();
            }

            public Sequence next() {
                try {
                    return it.nextSequence();
                } catch (BioException e) {
                    NoSuchElementException ex = new NoSuchElementException();
                    ex.initCause(e);
                    throw ex;
                }
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static RichSequenceFormat formatForName(String name)
            throws ClassNotFoundException, InstantiationException,
            IllegalAccessException {
        // determine the format to use
        RichSequenceFormat format;
        if (name.equalsIgnoreCase("fasta")) {
            format = (RichSequenceFormat) new FastaFormat();
        } else if (name.equalsIgnoreCase("genbank")) {
            format = (RichSequenceFormat) new GenbankFormat();
        } else if (name.equalsIgnoreCase("uniprot")) {
            format = new UniProtFormat();
        } else if (name.equalsIgnoreCase("embl")) {
            format = new EMBLFormat();
        } else if (name.equalsIgnoreCase("INSDseq")) {
            format = new INSDseqFormat();
        } else {
            Class<?> formatClass = Class.forName(name);
            format = (RichSequenceFormat) formatClass.newInstance();
        }
        return format;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void reset() {
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
            throws InvalidSettingsException {
        DataColumnSpec[] allColSpecs = new DataColumnSpec[1];
        allColSpecs[0] = new DataColumnSpecCreator("sequence",
                SequenceDataCell.TYPE).createSpec();
        DataTableSpec outputSpec = new DataTableSpec(allColSpecs);

        return new DataTableSpec[] { outputSpec };
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void saveSettingsTo(final NodeSettingsWO settings) {
        m_alphabet.saveSettingsTo(settings);
        m_fformat.saveSettingsTo(settings);
        m_fpname.saveSettingsTo(settings);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void loadValidatedSettingsFrom(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        m_alphabet.loadSettingsFrom(settings);
        m_fformat.loadSettingsFrom(settings);
        m_fpname.loadSettingsFrom(settings);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void validateSettings(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        m_alphabet.validateSettings(settings);
        m_fformat.validateSettings(settings);
        m_fpname.validateSettings(settings);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void loadInternals(final File internDir,
            final ExecutionMonitor exec) throws IOException,
            CanceledExecutionException {
    }

    /**
     * {@inheritDoc}
     */
    @Override
    protected void saveInternals(final File internDir,
            final ExecutionMonitor exec) throws IOException,
            CanceledExecutionException {
    }

    public static SettingsModelString createFAR_fpname() {
        return new SettingsModelString(FAR_name, "");
    }

    public static SettingsModelString createFileFormat() {
        return new SettingsModelString(FAR_fileFormat, "FASTA");
    }

    public static SettingsModelString createAlphabet() {
        return new SettingsModelString(FAR_alphabet, "RNA");
    }
}

On 9/21/2010 2:40 PM, simon rayner wrote:
> hi,
>
> can you repost to the biojava group along with the full code, (just in
> case there is a missing import or something). you only replied to,
> and not to the biojava mailing list
>
> thanks
>
> simon
>
> On Tue, Sep 21, 2010 at 8:18 PM, Bernd Jagla wrote:
>
> Thanks for the quick reply!
>
> Here is some code that should have all the important parts:
>
> String form = "genbank";
> String alphabet = "dna";
> BufferedReader br = new BufferedReader(fp);
> SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava(
>         form, alphabet, br);
> while (iter.hasNext()) {
>     Sequence seq = iter.nextSequence();
>     => Exception thrown
>     String seqName = seq.getName();
> }
>
>
> When trying to simplify the code a bit I now get the following error:
> Execute failed: Could not initialize class
> org.biojava.bio.seq.FeatureFilter
>
> I assume that in the previous times I had a spelling error??
> Then the exception got thrown during the initialization of "iter"
>
> Thanks,
>
> Bernd
>
>
> On 9/21/2010 2:07 PM, simon rayner wrote:
>> hi,
>>
>> can you post the code you are trying to run along with the full
>> error, it will help to figure out what is happening. There are
>> now loaders for biojavax as well, which work well which are
>> available in the biojavax docs here
>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Example
>>
>> but yeah, it's confusing unless you happen to be a real java
>> guru. i keep having to refer back to the docs because i keep
>> forgeting which class does what
>>
>> On Tue, Sep 21, 2010 at 7:46 PM, Bernd Jagla wrote:
>>
>> Hello,
>>
>> I am getting a little frustrated with the wiki page (I guess
>> I don't spend enough time reading and testing). I have the
>> impression that some of the documentation relates to version
>> 3 whereas others relate to 1.5 or 1.7.
>> So sorry if this all sounds a bit confused... ;(
>>
>> I believe I am using 1.7.1. (I wasn't able to find a readme
>> file that contains that information) even though I would
>> probably like to use version 3. But as I am stuck with an
>> older Eclipse version I think it will be even worse when I
>> try that.
>>
>> Anyways, I am trying to read in sequence files using
>> SeqIOTools.fileToBiojava, which seems to be deprecated, with
>> the following parameters: "genbank", "dna", bufferedReader.
>>
>> somehow this works with "fasta" but with genbank I get the
>> following exception:
>> Execute failed: Unknown file type '524300'
>> in some cases I get:
>> Unknown file type '262156'
>>
>> Does this mean anything to you?
>>
>> Or how do you read in a sequence file? I am looking for a
>> generic way that covers many file types (genbank, fasta,
>> swissprot...)
>>
>> Once I have this I will probably be able to get to the
>> feature information using the information from the tutorial.
>>
>> Thanks for your time.
>>
>> Bernd
>>
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>>
>>
>>
>> --
>> Simon Rayner
>>
>> State Key Laboratory of Virology
>> Wuhan Institute of Virology
>> Chinese Academy of Sciences
>> Wuhan, Hubei 430071
>> P.R.China
>>
>> +86 (27) 87199895 (office)
>> +86 18627113001 (cell)
>>
>
>
>
> --
> Simon Rayner
>
> State Key Laboratory of Virology
> Wuhan Institute of Virology
> Chinese Academy of Sciences
> Wuhan, Hubei 430071
> P.R.China
>
> +86 (27) 87199895 (office)
> +86 18627113001 (cell)
>

From simon.rayner.cn at gmail.com  Wed Sep 22 01:10:09 2010
From: simon.rayner.cn at gmail.com (simon rayner)
Date: Tue, 21 Sep 2010 21:10:09 -0400
Subject: [Biojava-l] fileToBiojava question
In-Reply-To: <4C98A959.2040405@pasteur.fr>
References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr>
Message-ID:

sorry for the delay in replying due to time difference.

this is a modified version of your code that uses biojavax. i
stripped out the pasteur stuff and added code to the *execute* method
(about line 74). Also marked the imports i added at the top

hope this helps

package cn.cas.wiv.bif.biojava;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/******************* your biojava imports **********************/
import org.biojava.bio.BioException;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.seq.io.SymbolTokenization;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojava.bio.symbol.SymbolList;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.io.RichSequenceFormat;
import org.biojavax.bio.seq.io.EMBLFormat;
import org.biojavax.bio.seq.io.FastaFormat;
import org.biojavax.bio.seq.io.GenbankFormat;
import org.biojavax.bio.seq.io.INSDseqFormat;
import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
import org.biojavax.bio.seq.io.RichSequenceFormat;
import org.biojavax.bio.seq.io.RichStreamReader;
import org.biojavax.bio.seq.io.UniProtFormat;

/********* added these imports to make things work **********/
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;

/**
 * This is the model implementation of FastAReader.
 * Reads a FASTA file into two
 * columns: seq_name and sequence
 *
 * @author Bernd Jagla
 */
@SuppressWarnings("deprecation")
public class FastAReaderNodeModel {
    // the logger instance
    private Alphabet alpha;
    private SequenceIterator iter;

    // Takes the file itself rather than an open FileReader, because each of
    // the two approaches below has to read the file from the beginning.
    protected void execute(File file) throws Exception {

        //String form = m_fformat.getStringValue();
        //String alphabet = m_alphabet.getStringValue();
        String form = "genbank";
        String alphabet = "DNA";

        /****************** old way *********************/
        int count = 0;
        BufferedReader br = new BufferedReader(new FileReader(file));
        SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava(
                form, alphabet, br);

        while (iter.hasNext()) {
            // System.out.println(fastq.getSequence());
            Sequence seq = iter.nextSequence();
            String seqName = seq.getName();
            // String seqName = "asdf";
            //String sequence = seq.seqString();
            System.err.println("reading: " + seqName + " " + seq.length());
            count++;
        }
        System.err.println("finished reading file");
        br.close();

        /****************** biojavax way *********************/
        // the loop above has consumed the first reader, so open a fresh one
        br = new BufferedReader(new FileReader(file));
        RichSequence refRSequence;
        SimpleNamespace ns = new SimpleNamespace("MTBGB");
        RichSequenceIterator rsi = RichSequence.IOTools.readGenbankDNA(br, ns);
        while (rsi.hasNext()) {
            refRSequence = rsi.nextRichSequence();
            System.out.println("read " + refRSequence.length() + " bases");
            /** if you want the features, use a FeatureFilter and a FeatureHolder **/
            FeatureFilter ff = new FeatureFilter.ByType("CDS");
            FeatureHolder fhRef = refRSequence.filter(ff);
        }

        br.close();
    }

    /**
     * Makes a SequenceIterator look like an
     * Iterator{@code <Sequence>}
     *
     * @param iter
     *            The SequenceIterator
     * @return An Iterator that returns only Sequence
     *         objects. You cannot call remove() on this
     *         iterator!
     */
    public Iterator<Sequence> asIterator(SequenceIterator iter) {
        final SequenceIterator it = iter;
        return new Iterator<Sequence>() {
            public boolean hasNext() {
                return it.hasNext();
            }

            public Sequence next() {
                try {
                    return it.nextSequence();
                } catch (BioException e) {
                    NoSuchElementException ex = new NoSuchElementException();
                    ex.initCause(e);
                    throw ex;
                }
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static RichSequenceFormat formatForName(String name)
            throws ClassNotFoundException, InstantiationException,
            IllegalAccessException {
        // determine the format to use
        RichSequenceFormat format;
        if (name.equalsIgnoreCase("fasta")) {
            format = (RichSequenceFormat) new FastaFormat();
        } else if (name.equalsIgnoreCase("genbank")) {
            format = (RichSequenceFormat) new GenbankFormat();
        } else if (name.equalsIgnoreCase("uniprot")) {
            format = new UniProtFormat();
        } else if (name.equalsIgnoreCase("embl")) {
            format = new EMBLFormat();
        } else if (name.equalsIgnoreCase("INSDseq")) {
            format = new INSDseqFormat();
        } else {
            Class<?> formatClass = Class.forName(name);
            format = (RichSequenceFormat) formatClass.newInstance();
        }
        return format;
    }
}

On Tue, Sep 21, 2010 at 8:47 AM, Bernd Jagla wrote:
> Sorry for the wrong reply...
> Here is the FULL code I marked the passages that are important in red:
>
> Thanks for looking at it!!!!
> > Bernd > > > package org.pasteur.pf2.biojava; > > import java.io.BufferedReader; > import java.io.File; > import java.io.FileReader; > import java.io.IOException; > import java.util.Iterator; > import java.util.NoSuchElementException; > > import org.biojava.bio.BioException; > import org.biojava.bio.seq.Sequence; > import org.biojava.bio.seq.SequenceIterator; > import org.biojava.bio.seq.io.SeqIOTools; > import org.biojava.bio.seq.io.SymbolTokenization; > import org.biojava.bio.symbol.Alphabet; > import org.biojava.bio.symbol.AlphabetManager; > import org.biojava.bio.symbol.SymbolList; > import org.biojavax.RichObjectFactory; > import org.biojavax.bio.seq.io.RichSequenceFormat; > import org.knime.core.data.DataCell; > import org.knime.core.data.DataColumnSpec; > import org.knime.core.data.DataColumnSpecCreator; > import org.knime.core.data.DataTableSpec; > import org.knime.core.data.RowKey; > import org.knime.core.data.container.BlobDataCell; > import org.knime.core.data.def.DefaultRow; > import org.knime.core.data.def.StringCell; > import org.knime.core.node.BufferedDataContainer; > import org.knime.core.node.BufferedDataTable; > import org.knime.core.node.CanceledExecutionException; > import org.knime.core.node.ExecutionContext; > import org.knime.core.node.ExecutionMonitor; > import org.knime.core.node.InvalidSettingsException; > import org.knime.core.node.NodeLogger; > import org.knime.core.node.NodeModel; > import org.knime.core.node.NodeSettingsRO; > import org.knime.core.node.NodeSettingsWO; > import org.knime.core.node.defaultnodesettings.SettingsModelString; > import org.biojavax.bio.seq.io.EMBLFormat; > import org.biojavax.bio.seq.io.FastaFormat; > import org.biojavax.bio.seq.io.GenbankFormat; > import org.biojavax.bio.seq.io.INSDseqFormat; > import org.biojavax.bio.seq.io.RichSequenceBuilderFactory; > import org.biojavax.bio.seq.io.RichSequenceFormat; > import org.biojavax.bio.seq.io.RichStreamReader; > import org.biojavax.bio.seq.io.UniProtFormat; > 
import org.pasteur.pf2.datatypes.*; > /** > * This is the model implementation of FastAReader. Reads a FASTA file into > two > * columns: seq_name and sequence > * > * @author Bernd Jagla > */ > @SuppressWarnings("deprecation") > public class FastAReaderNodeModel extends NodeModel { > // the logger instance > private static final NodeLogger logger = NodeLogger > .getLogger(FastQReaderNodeModel.class); > private Alphabet alpha; > private SequenceIterator iter; > > /** > * the settings key which is used to retrieve and store the settings > (from > * the dialog or from a settings file) (package visibility to be usable > from > * the dialog). > */ > private static final String FAR_name = "far_name"; > > private static final String FAR_fileFormat = "far_ff"; > > private static final String FAR_alphabet = "far_alph"; > > private final SettingsModelString m_fpname = createFAR_fpname(); > private final SettingsModelString m_fformat = createFileFormat(); > private final SettingsModelString m_alphabet = createAlphabet(); > > /** > * Constructor for the node model. > */ > protected FastAReaderNodeModel() { > super(0, 1); > } > > /** > * {@inheritDoc} > */ > @Override > protected BufferedDataTable[] execute(final BufferedDataTable[] inData, > final ExecutionContext exec) throws Exception { > > // TODO do something here > logger.info("Node Model Stub... this is not yet implemented !"); > > // the data table spec of the single output table, > // the table will have three columns: > DataColumnSpec[] allColSpecs = new DataColumnSpec[1]; > allColSpecs[0] = new DataColumnSpecCreator("sequence", > SequenceDataCell.TYPE) > .createSpec(); > DataTableSpec outputSpec = new DataTableSpec(allColSpecs); > // the execution context will provide us with storage capacity, in > this > // case a data container to which we will add rows sequentially > // Note, this container can also handle arbitrary big data tables, > it > // will buffer to disc if necessary. 
> BufferedDataContainer container = > exec.createDataContainer(outputSpec); > // let's add m_count rows to it > // once we are done, we close the container and return its table > FileReader fp = new FileReader(m_fpname.getStringValue()); > > exec.checkCanceled(); > //String form = m_fformat.getStringValue(); > //String alphabet = m_alphabet.getStringValue(); > String form = "genbank"; > String alphabet = "DNA"; > > BufferedReader br = new BufferedReader(fp); > // String line = br.readLine(); > int count = 0; > > SequenceIterator iter = (SequenceIterator) > SeqIOTools.fileToBiojava( > form, alphabet, br); > > while (iter.hasNext()) { > exec.checkCanceled(); > RowKey key = new RowKey("Row " + count); > exec.setProgress("Row " + count); > // System.out.println(fastq.getSequence()); > Sequence seq = iter.nextSequence(); > String seqName = seq.getName(); > // String seqName = "asdf"; > //String sequence = seq.seqString(); > System.err.println("reading: " + seqName + " " + seq.length()); > SequenceDataCell seqCell = new SequenceDataCell(seqName, seq); > container.addRowToTable(new DefaultRow(key, seqCell)); > count++; > } > System.err.println("finished reading file"); > br.close(); > fp.close(); > container.close(); > return new BufferedDataTable[] { container.getTable() }; > } > > /** > * Makes a SequenceIterator look like an > * Iterator {@code } > * > * @param iter > * The SequenceIterator > * @return An Iterator that returns only > Sequence > * objects. You cannot call remove() on this > * iterator! 
> */ > public Iterator asIterator(SequenceIterator iter) { > final SequenceIterator it = iter; > return new Iterator() { > public boolean hasNext() { > return it.hasNext(); > } > > public Sequence next() { > try { > return it.nextSequence(); > } catch (BioException e) { > NoSuchElementException ex = new > NoSuchElementException(); > ex.initCause(e); > throw ex; > } > } > > public void remove() { > throw new UnsupportedOperationException(); > } > }; > } > > public static RichSequenceFormat formatForName(String name) > throws ClassNotFoundException, InstantiationException, > IllegalAccessException { > // determine the format to use > RichSequenceFormat format; > if (name.equalsIgnoreCase("fasta")) { > format = (RichSequenceFormat) new FastaFormat(); > } else if (name.equalsIgnoreCase("genbank")) { > format = (RichSequenceFormat) new GenbankFormat(); > } else if (name.equalsIgnoreCase("uniprot")) { > format = new UniProtFormat(); > } else if (name.equalsIgnoreCase("embl")) { > format = new EMBLFormat(); > } else if (name.equalsIgnoreCase("INSDseq")) { > format = new INSDseqFormat(); > } else { > Class formatClass = Class.forName(name); > format = (RichSequenceFormat) formatClass.newInstance(); > } > return format; > } > > /** > * {@inheritDoc} > */ > @Override > protected void reset() { > } > > /** > * {@inheritDoc} > */ > @Override > protected DataTableSpec[] configure(final DataTableSpec[] inSpecs) > throws InvalidSettingsException { > DataColumnSpec[] allColSpecs = new DataColumnSpec[1]; > allColSpecs[0] = new DataColumnSpecCreator("sequence", > SequenceDataCell.TYPE) > .createSpec(); > DataTableSpec outputSpec = new DataTableSpec(allColSpecs); > > return new DataTableSpec[] { outputSpec }; > > } > > /** > * {@inheritDoc} > */ > @Override > protected void saveSettingsTo(final NodeSettingsWO settings) { > m_alphabet.saveSettingsTo(settings); > m_fformat.saveSettingsTo(settings); > m_fpname.saveSettingsTo(settings); > } > > /** > * {@inheritDoc} > */ > @Override > 
protected void loadValidatedSettingsFrom(final NodeSettingsRO settings) > throws InvalidSettingsException { > m_alphabet.loadSettingsFrom(settings); > m_fformat.loadSettingsFrom(settings); > m_fpname.loadSettingsFrom(settings); > } > > /** > * {@inheritDoc} > */ > @Override > protected void validateSettings(final NodeSettingsRO settings) > throws InvalidSettingsException { > m_alphabet.validateSettings(settings); > m_fformat.validateSettings(settings); > m_fpname.validateSettings(settings); > } > > /** > * {@inheritDoc} > */ > @Override > protected void loadInternals(final File internDir, > final ExecutionMonitor exec) throws IOException, > CanceledExecutionException { > } > > /** > * {@inheritDoc} > */ > @Override > protected void saveInternals(final File internDir, > final ExecutionMonitor exec) throws IOException, > CanceledExecutionException { > } > > public static SettingsModelString createFAR_fpname() { > return new SettingsModelString(FAR_name, ""); > } > > public static SettingsModelString createFileFormat() { > return new SettingsModelString(FAR_fileFormat, "FASTA"); > } > > public static SettingsModelString createAlphabet() { > return new SettingsModelString(FAR_alphabet, "RNA"); > > } > > } > > > On 9/21/2010 2:40 PM, simon rayner wrote: > > hi, > > can you repost to the biojava group along with the full code, (just in case > there is a missing import or something). you only replied to, and not to > the biojava mailing list > > thanks > > simon > > On Tue, Sep 21, 2010 at 8:18 PM, Bernd Jagla wrote: > >> Thanks for the quick reply! 
>> >> Here is some code that should have all the important parts: >> >> String form = "genbank"; >> String alphabet = "dna"; >> BufferedReader br = new BufferedReader(fp); >> SequenceIterator iter = (SequenceIterator) SeqIOTools.fileToBiojava( >> form, alphabet, br); >> while (iter.hasNext()) { >> Sequence seq = iter.nextSequence(); >> => Exception thrown >> String seqName = seq.getName(); >> } >> >> >> When trying to simplify the code a bit I now get the following error: >> Execute failed: Could not initialize class >> org.biojava.bio.seq.FeatureFilter >> >> I assume that in the previous times I had a spelling error?? >> Then the exception got thrown during the initialization of "iter" >> >> Thanks, >> >> Bernd >> >> >> On 9/21/2010 2:07 PM, simon rayner wrote: >> >> hi, >> >> can you post the code you are trying to run along with the full error, it >> will help to figure out what is happening. There are now loaders for >> biojavax as well, which work well which are available in the biojavax docs >> here http://biojava.org/wiki/BioJava:BioJavaXDocs#Example >> >> but yeah, it's confusing unless you happen to be a real java guru. i keep >> having to refer back to the docs because i keep forgeting which class does >> what >> >> On Tue, Sep 21, 2010 at 7:46 PM, Bernd Jagla wrote: >> >>> Hello, >>> >>> I am getting a little frustrated with the wiki page (I guess I don't >>> spend enough time reading and testing). I have the impression that some of >>> the documentation relates to version 3 whereas others relate to 1.5 or 1.7. >>> So sorry if this all sounds a bit confused... ;( >>> >>> I believe I am using 1.7.1. (I wasn't able to find a readme file that >>> contains that information) even though I would probably like to use version >>> 3. But as I am stuck with an older Eclipse version I think it will be even >>> worse when I try that. 
>>> >>> Anyways, I am trying to read in sequence files using >>> SeqIOTools.fileToBiojava, which seems to be deprecated, with the following >>> parameters: "genbank", "dna", bufferedReader. >>> >>> somehow this works with "fasta" but with genbank I get the following >>> exception: >>> Execute failed: Unknown file type '524300' >>> in some cases I get: >>> Unknown file type '262156' >>> >>> Does this mean anything to you? >>> >>> Or how do you read in a sequence file? I am looking for a generic way >>> that covers many file types (genbank, fasta, swissprot...) >>> >>> Once I have this I will probably be able to get to the feature >>> information using the information from the tutorial. >>> >>> Thanks for your time. >>> >>> Bernd >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> >> >> -- >> Simon Rayner >> >> State Key Laboratory of Virology >> Wuhan Institute of Virology >> Chinese Academy of Sciences >> Wuhan, Hubei 430071 >> P.R.China >> >> +86 (27) 87199895 (office) >> +86 18627113001 (cell) >> >> > > > -- > Simon Rayner > > State Key Laboratory of Virology > Wuhan Institute of Virology > Chinese Academy of Sciences > Wuhan, Hubei 430071 > P.R.China > > +86 (27) 87199895 (office) > +86 18627113001 (cell) > > -- Simon Rayner State Key Laboratory of Virology Wuhan Institute of Virology Chinese Academy of Sciences Wuhan, Hubei 430071 P.R.China +86 (27) 87199895 (office) +86 18627113001 (cell) From bernd.jagla at pasteur.fr Thu Sep 23 11:23:14 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 13:23:14 +0200 Subject: [Biojava-l] fileToBiojava question In-Reply-To: References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> Message-ID: <4C9B38A2.8010006@pasteur.fr> Simon, thanks a lot!!! I implemented your way in a separate class and it works. 
Now I just have to get it work within my framework.... Best, Bernd On 9/22/2010 3:10 AM, simon rayner wrote: > sorry for the delay in replying due to time difference. > > this is a modified version of your code that uses biojavax. i > stripped out the pasteur stuff and added code to the *execute* method > (about line 74). Also marked the imports i added at the top > > hope this helps > > package cn.cas.wiv.bif.biojava; > > import java.io.BufferedReader; > import java.io.File; > import java.io.FileReader; > import java.io.IOException; > import java.util.Iterator; > import java.util.NoSuchElementException; > > /******************* your biojava imports **********************/ > import org.biojava.bio.BioException; > import org.biojava.bio.seq.Sequence; > import org.biojava.bio.seq.SequenceIterator; > import org.biojava.bio.seq.io.SeqIOTools; > import org.biojava.bio.seq.io.SymbolTokenization; > import org.biojava.bio.symbol.Alphabet; > import org.biojava.bio.symbol.AlphabetManager; > import org.biojava.bio.symbol.SymbolList; > import org.biojavax.RichObjectFactory; > import org.biojavax.bio.seq.io.RichSequenceFormat; > import org.biojavax.bio.seq.io.EMBLFormat; > import org.biojavax.bio.seq.io.FastaFormat; > import org.biojavax.bio.seq.io.GenbankFormat; > import org.biojavax.bio.seq.io.INSDseqFormat; > import org.biojavax.bio.seq.io.RichSequenceBuilderFactory; > import org.biojavax.bio.seq.io.RichSequenceFormat; > import org.biojavax.bio.seq.io.RichStreamReader; > import org.biojavax.bio.seq.io.UniProtFormat; > > /********* added these imports to make things work **********/ > import org.biojavax.SimpleNamespace; > import org.biojavax.bio.seq.RichSequence; > import org.biojavax.bio.seq.RichSequenceIterator; > import org.biojava.bio.seq.*; > import org.biojava.bio.symbol.*; > > /** > * This is the model implementation of FastAReader. 
Reads a FASTA file > into two > * columns: seq_name and sequence > * > * @author Bernd Jagla > */ > @SuppressWarnings("deprecation") > public class FastAReaderNodeModel { > // the logger instance > private Alphabet alpha; > private SequenceIterator iter; > > protected void execute(FileReader fp) throws Exception { > > /** > * {@inheritDoc} > */ > //String form = m_fformat.getStringValue(); > //String alphabet = m_alphabet.getStringValue(); > String form = "genbank"; > String alphabet = "DNA"; > /****************** old way *********************/ > int count = 0; > BufferedReader br = new BufferedReader(fp); > SequenceIterator iter = (SequenceIterator) > SeqIOTools.fileToBiojava( > form, alphabet, br); > > while (iter.hasNext()) { > // System.out.println(fastq.getSequence()); > Sequence seq = iter.nextSequence(); > String seqName = seq.getName(); > // String seqName = "asdf"; > //String sequence = seq.seqString(); > System.err.println("reading: " + seqName + " " + > seq.length()); > count++; > } > System.err.println("finished reading file"); > > /****************** biojavax way *********************/ > RichSequence refRSequence; > SimpleNamespace ns = new SimpleNamespace("MTBGB"); > RichSequenceIterator rsi > = RichSequence.IOTools.readGenbankDNA(br, ns); > while(rsi.hasNext()) > { > refRSequence = rsi.nextRichSequence(); > System.out.println("read " + refRSequence.length() + " bases"); > /** if you want the features, use a FeatureFilter and a > FeatureHolder **/ > FeatureFilter ff = new FeatureFilter.ByType("CDS"); > FeatureHolder fhRef = refRSequence.filter(ff); > } > > br.close(); > fp.close(); > } > > /** > * Makes a SequenceIterator look like an > * Iterator {@code } > * > * @param iter > * The SequenceIterator > * @return An Iterator that returns only > Sequence > * objects. You cannot call remove() on this > * iterator! 
> */ > public Iterator asIterator(SequenceIterator iter) { > final SequenceIterator it = iter; > return new Iterator() { > public boolean hasNext() { > return it.hasNext(); > } > > public Sequence next() { > try { > return it.nextSequence(); > } catch (BioException e) { > NoSuchElementException ex = new > NoSuchElementException(); > ex.initCause(e); > throw ex; > } > } > > public void remove() { > throw new UnsupportedOperationException(); > } > }; > } > > public static RichSequenceFormat formatForName(String name) > throws ClassNotFoundException, InstantiationException, > IllegalAccessException { > // determine the format to use > RichSequenceFormat format; > if (name.equalsIgnoreCase("fasta")) { > format = (RichSequenceFormat) new FastaFormat(); > } else if (name.equalsIgnoreCase("genbank")) { > format = (RichSequenceFormat) new GenbankFormat(); > } else if (name.equalsIgnoreCase("uniprot")) { > format = new UniProtFormat(); > } else if (name.equalsIgnoreCase("embl")) { > format = new EMBLFormat(); > } else if (name.equalsIgnoreCase("INSDseq")) { > format = new INSDseqFormat(); > } else { > Class formatClass = Class.forName(name); > format = (RichSequenceFormat) formatClass.newInstance(); > } > return format; > } > > } > > > On Tue, Sep 21, 2010 at 8:47 AM, Bernd Jagla > wrote: > > Sorry for the wrong reply... > Here is the FULL code I marked the passages that are important in red: > > Thanks for looking at it!!!! 
> > Bernd > > > package org.pasteur.pf2.biojava; > > import java.io.BufferedReader; > import java.io.File; > import java.io.FileReader; > import java.io.IOException; > import java.util.Iterator; > import java.util.NoSuchElementException; > > import org.biojava.bio.BioException; > import org.biojava.bio.seq.Sequence; > import org.biojava.bio.seq.SequenceIterator; > import org.biojava.bio.seq.io.SeqIOTools; > import org.biojava.bio.seq.io.SymbolTokenization; > import org.biojava.bio.symbol.Alphabet; > import org.biojava.bio.symbol.AlphabetManager; > import org.biojava.bio.symbol.SymbolList; > import org.biojavax.RichObjectFactory; > import org.biojavax.bio.seq.io.RichSequenceFormat; > import org.knime.core.data.DataCell; > import org.knime.core.data.DataColumnSpec; > import org.knime.core.data.DataColumnSpecCreator; > import org.knime.core.data.DataTableSpec; > import org.knime.core.data.RowKey; > import org.knime.core.data.container.BlobDataCell; > import org.knime.core.data.def.DefaultRow; > import org.knime.core.data.def.StringCell; > import org.knime.core.node.BufferedDataContainer; > import org.knime.core.node.BufferedDataTable; > import org.knime.core.node.CanceledExecutionException; > import org.knime.core.node.ExecutionContext; > import org.knime.core.node.ExecutionMonitor; > import org.knime.core.node.InvalidSettingsException; > import org.knime.core.node.NodeLogger; > import org.knime.core.node.NodeModel; > import org.knime.core.node.NodeSettingsRO; > import org.knime.core.node.NodeSettingsWO; > import org.knime.core.node.defaultnodesettings.SettingsModelString; > import org.biojavax.bio.seq.io.EMBLFormat; > import org.biojavax.bio.seq.io.FastaFormat; > import org.biojavax.bio.seq.io.GenbankFormat; > import org.biojavax.bio.seq.io.INSDseqFormat; > import org.biojavax.bio.seq.io.RichSequenceBuilderFactory; > import org.biojavax.bio.seq.io.RichSequenceFormat; > import org.biojavax.bio.seq.io.RichStreamReader; > import org.biojavax.bio.seq.io.UniProtFormat; > 
import org.pasteur.pf2.datatypes.*; > /** > * This is the model implementation of FastAReader. Reads a FASTA > file into two > * columns: seq_name and sequence > * > * @author Bernd Jagla > */ > @SuppressWarnings("deprecation") > public class FastAReaderNodeModel extends NodeModel { > // the logger instance (named after this class) > private static final NodeLogger logger = NodeLogger > .getLogger(FastAReaderNodeModel.class); > private Alphabet alpha; > private SequenceIterator iter; > > /** > * the settings key which is used to retrieve and store the > settings (from > * the dialog or from a settings file) (package visibility to > be usable from > * the dialog). > */ > private static final String FAR_name = "far_name"; > > private static final String FAR_fileFormat = "far_ff"; > > private static final String FAR_alphabet = "far_alph"; > > private final SettingsModelString m_fpname = createFAR_fpname(); > private final SettingsModelString m_fformat = createFileFormat(); > private final SettingsModelString m_alphabet = createAlphabet(); > > /** > * Constructor for the node model. > */ > protected FastAReaderNodeModel() { > super(0, 1); > } > > /** > * {@inheritDoc} > */ > @Override > protected BufferedDataTable[] execute(final > BufferedDataTable[] inData, > final ExecutionContext exec) throws Exception { > > // TODO do something here > logger.info("Node Model Stub... this is not yet implemented !"); > > // the data table spec of the single output table, > // the table will have one column: > DataColumnSpec[] allColSpecs = new DataColumnSpec[1]; > allColSpecs[0] = new DataColumnSpecCreator("sequence", > SequenceDataCell.TYPE) > .createSpec(); > DataTableSpec outputSpec = new DataTableSpec(allColSpecs); > // the execution context will provide us with storage > capacity, in this > // case a data container to which we will add rows > sequentially > // Note, this container can also handle arbitrary big data > tables, it > // will buffer to disc if necessary.
> BufferedDataContainer container = > exec.createDataContainer(outputSpec); > // let's add m_count rows to it > // once we are done, we close the container and return its > table > FileReader fp = new FileReader(m_fpname.getStringValue()); > > exec.checkCanceled(); > //String form = m_fformat.getStringValue(); > //String alphabet = m_alphabet.getStringValue(); > String form = "genbank"; > String alphabet = "DNA"; > > BufferedReader br = new BufferedReader(fp); > // String line = br.readLine(); > int count = 0; > > SequenceIterator iter = (SequenceIterator) > SeqIOTools.fileToBiojava( > form, alphabet, br); > > while (iter.hasNext()) { > exec.checkCanceled(); > RowKey key = new RowKey("Row " + count); > exec.setProgress("Row " + count); > // System.out.println(fastq.getSequence()); > Sequence seq = iter.nextSequence(); > String seqName = seq.getName(); > // String seqName = "asdf"; > //String sequence = seq.seqString(); > System.err.println("reading: " + seqName + " " + > seq.length()); > SequenceDataCell seqCell = new > SequenceDataCell(seqName, seq); > container.addRowToTable(new DefaultRow(key, seqCell)); > count++; > } > System.err.println("finished reading file"); > br.close(); > fp.close(); > container.close(); > return new BufferedDataTable[] { container.getTable() }; > } > > /** > * Makes a SequenceIterator look like an > * Iterator {@code } > * > * @param iter > * The SequenceIterator > * @return An Iterator that returns only > Sequence > * objects. You cannot call remove() > on this > * iterator! 
> */ > public Iterator asIterator(SequenceIterator iter) { > final SequenceIterator it = iter; > return new Iterator() { > public boolean hasNext() { > return it.hasNext(); > } > > public Sequence next() { > try { > return it.nextSequence(); > } catch (BioException e) { > NoSuchElementException ex = new > NoSuchElementException(); > ex.initCause(e); > throw ex; > } > } > > public void remove() { > throw new UnsupportedOperationException(); > } > }; > } > > public static RichSequenceFormat formatForName(String name) > throws ClassNotFoundException, InstantiationException, > IllegalAccessException { > // determine the format to use > RichSequenceFormat format; > if (name.equalsIgnoreCase("fasta")) { > format = (RichSequenceFormat) new FastaFormat(); > } else if (name.equalsIgnoreCase("genbank")) { > format = (RichSequenceFormat) new GenbankFormat(); > } else if (name.equalsIgnoreCase("uniprot")) { > format = new UniProtFormat(); > } else if (name.equalsIgnoreCase("embl")) { > format = new EMBLFormat(); > } else if (name.equalsIgnoreCase("INSDseq")) { > format = new INSDseqFormat(); > } else { > Class formatClass = Class.forName(name); > format = (RichSequenceFormat) formatClass.newInstance(); > } > return format; > } > > /** > * {@inheritDoc} > */ > @Override > protected void reset() { > } > > /** > * {@inheritDoc} > */ > @Override > protected DataTableSpec[] configure(final DataTableSpec[] inSpecs) > throws InvalidSettingsException { > DataColumnSpec[] allColSpecs = new DataColumnSpec[1]; > allColSpecs[0] = new DataColumnSpecCreator("sequence", > SequenceDataCell.TYPE) > .createSpec(); > DataTableSpec outputSpec = new DataTableSpec(allColSpecs); > > return new DataTableSpec[] { outputSpec }; > > } > > /** > * {@inheritDoc} > */ > @Override > protected void saveSettingsTo(final NodeSettingsWO settings) { > m_alphabet.saveSettingsTo(settings); > m_fformat.saveSettingsTo(settings); > m_fpname.saveSettingsTo(settings); > } > > /** > * {@inheritDoc} > */ > @Override > 
protected void loadValidatedSettingsFrom(final NodeSettingsRO > settings) > throws InvalidSettingsException { > m_alphabet.loadSettingsFrom(settings); > m_fformat.loadSettingsFrom(settings); > m_fpname.loadSettingsFrom(settings); > } > > /** > * {@inheritDoc} > */ > @Override > protected void validateSettings(final NodeSettingsRO settings) > throws InvalidSettingsException { > m_alphabet.validateSettings(settings); > m_fformat.validateSettings(settings); > m_fpname.validateSettings(settings); > } > > /** > * {@inheritDoc} > */ > @Override > protected void loadInternals(final File internDir, > final ExecutionMonitor exec) throws IOException, > CanceledExecutionException { > } > > /** > * {@inheritDoc} > */ > @Override > protected void saveInternals(final File internDir, > final ExecutionMonitor exec) throws IOException, > CanceledExecutionException { > } > > public static SettingsModelString createFAR_fpname() { > return new SettingsModelString(FAR_name, ""); > } > > public static SettingsModelString createFileFormat() { > return new SettingsModelString(FAR_fileFormat, "FASTA"); > } > > public static SettingsModelString createAlphabet() { > return new SettingsModelString(FAR_alphabet, "RNA"); > > } > > } > > > On 9/21/2010 2:40 PM, simon rayner wrote: >> hi, >> >> can you repost to the biojava group along with the full code, >> (just in case there is a missing import or something). you only >> replied to, and not to the biojava mailing list >> >> thanks >> >> simon >> >> On Tue, Sep 21, 2010 at 8:18 PM, Bernd Jagla >> > wrote: >> >> Thanks for the quick reply! 
>> >> Here is some code that should have all the important parts: >> >> String form = "genbank"; >> String alphabet = "dna"; >> BufferedReader br = new BufferedReader(fp); >> SequenceIterator iter = (SequenceIterator) >> SeqIOTools.fileToBiojava( >> form, alphabet, br); >> while (iter.hasNext()) { >> Sequence seq = iter.nextSequence(); >> => Exception thrown >> String seqName = seq.getName(); >> } >> >> >> When trying to simplify the code a bit I now get the >> following error: >> Execute failed: Could not initialize class >> org.biojava.bio.seq.FeatureFilter >> >> I assume that in the previous times I had a spelling error?? >> Then the exception got thrown during the initialization of "iter" >> >> Thanks, >> >> Bernd >> >> >> On 9/21/2010 2:07 PM, simon rayner wrote: >>> hi, >>> >>> can you post the code you are trying to run along with the >>> full error, it will help to figure out what is happening. >>> There are now loaders for biojavax as well, which work well >>> which are available in the biojavax docs here >>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Example >>> >>> but yeah, it's confusing unless you happen to be a real java >>> guru. i keep having to refer back to the docs because i >>> keep forgeting which class does what >>> >>> On Tue, Sep 21, 2010 at 7:46 PM, Bernd Jagla >>> > wrote: >>> >>> Hello, >>> >>> I am getting a little frustrated with the wiki page (I >>> guess I don't spend enough time reading and testing). I >>> have the impression that some of the documentation >>> relates to version 3 whereas others relate to 1.5 or 1.7. >>> So sorry if this all sounds a bit confused... ;( >>> >>> I believe I am using 1.7.1. (I wasn't able to find a >>> readme file that contains that information) even though >>> I would probably like to use version 3. But as I am >>> stuck with an older Eclipse version I think it will be >>> even worse when I try that. 
>>> >>> Anyways, I am trying to read in sequence files using >>> SeqIOTools.fileToBiojava, which seems to be deprecated, >>> with the following parameters: "genbank", "dna", >>> bufferedReader. >>> >>> somehow this works with "fasta" but with genbank I get >>> the following exception: >>> Execute failed: Unknown file type '524300' >>> in some cases I get: >>> Unknown file type '262156' >>> >>> Does this mean anything to you? >>> >>> Or how do you read in a sequence file? I am looking for >>> a generic way that covers many file types (genbank, >>> fasta, swissprot...) >>> >>> Once I have this I will probably be able to get to the >>> feature information using the information from the >>> tutorial. >>> >>> Thanks for your time. >>> >>> Bernd >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >>> >>> >>> -- >>> Simon Rayner >>> >>> State Key Laboratory of Virology >>> Wuhan Institute of Virology >>> Chinese Academy of Sciences >>> Wuhan, Hubei 430071 >>> P.R.China >>> >>> +86 (27) 87199895 (office) >>> +86 18627113001 (cell) >>> >> >> >> >> -- >> Simon Rayner >> >> State Key Laboratory of Virology >> Wuhan Institute of Virology >> Chinese Academy of Sciences >> Wuhan, Hubei 430071 >> P.R.China >> >> +86 (27) 87199895 (office) >> +86 18627113001 (cell) >> > > > > -- > Simon Rayner > > State Key Laboratory of Virology > Wuhan Institute of Virology > Chinese Academy of Sciences > Wuhan, Hubei 430071 > P.R.China > > +86 (27) 87199895 (office) > +86 18627113001 (cell) > From bernd.jagla at pasteur.fr Thu Sep 23 14:40:02 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 16:40:02 +0200 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C9B38A2.8010006@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> <4C9B38A2.8010006@pasteur.fr> 
Message-ID: <4C9B66C2.704@pasteur.fr> Hello again,... I am still struggling with my little problem. I am getting closer I think... I made some minor modifications to some biojava files and would like to compile them under Java 1.6. Is this possible? When I compare version 1.7.1 with mine the only difference seems to be the Java version. And the sample code runs with yours but not with mine... ;( I get the following error message: Exception in thread "main" java.lang.NoClassDefFoundError: org/biojava/utils/bytecode/CodeException at org.biojava.bio.seq.FeatureFilter$OnlyChildren.(FeatureFilter.java:1273) at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1817) at org.biojava.bio.seq.SimpleFeatureHolder.(SimpleFeatureHolder.java:54) at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature(RichFeature.java:167) at org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java:61) at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:100) at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:81) at org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder(SimpleRichSequenceBuilderFactory.java:68) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) at org.pasteur.pf2.biojava.biojavaIO.execute(biojavaIO.java:54) at org.pasteur.pf2.biojava.biojavaIO.main(biojavaIO.java:29) Caused by: java.lang.ClassNotFoundException: org.biojava.utils.bytecode.CodeException at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) ... 11 more Do you think this problem can be due to the compiler???
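A NoClassDefFoundError with a ClassNotFoundException as its cause generally means the class was available when the code was compiled but cannot be found at run time, so it points at the class path rather than the compiler. A minimal sketch for checking this, assuming nothing about BioJava itself: ClasspathCheck and locate() are made-up names, and the two class names in main() are simply the ones from the trace above.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of BioJava): reports whether a class can be
// found by the application class loader and, if so, where it was loaded from.
class ClasspathCheck {

    /** Returns the jar/directory of the class, or null if it cannot be found. */
    static String locate(String className) {
        try {
            // initialize=false: only locate the class, do not run static initialisers
            Class<?> c = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap class path)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return null; // not visible on the class path
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.biojava.utils.bytecode.CodeException", // class named in the error
            "org.biojava.bio.seq.FeatureFilter"         // class that failed to initialise
        };
        for (String n : names) {
            String where = locate(n);
            System.out.println(n + " -> " + (where == null ? "NOT FOUND" : where));
        }
        // The effective class path, to compare against the jars you expect
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Running it with the same class path as the failing program should show which jar, if any, provides each class.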
From what I have read on the web about NoClassDefFoundError, the likely cause is that my class path is messed up or something like that... Thanks for any tip/hint on how to identify and solve this problem. Best, Bernd From bernd.jagla at pasteur.fr Thu Sep 23 14:53:43 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 16:53:43 +0200 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C9B66C2.704@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> <4C9B38A2.8010006@pasteur.fr> <4C9B66C2.704@pasteur.fr> Message-ID: <4C9B69F7.8020309@pasteur.fr> Just to make sure that the "small" modification is not the issue: I added the following to FastqReader.java: Iterable<Fastq> read(InputStream inputStream) throws IOException; Fastq readNext(InputStream inputStream) throws IOException; since I needed an iterator for a different project. Best, Bernd On 9/23/2010 4:40 PM, Bernd Jagla wrote: > Hello again,... > > I am still struggling with my little problem. I am getting closer I > think... > I made some minor modifications to some biojava files and would like > to compile it under 1.6. Is this possible? > > When I compare version 1.7.1 with mine the only difference seems to be > the java version. And the sample code runs with your but not with > mine...
;( > > I get the following error message: > > Exception in thread "main" java.lang.NoClassDefFoundError: > org/biojava/utils/bytecode/CodeException > at > org.biojava.bio.seq.FeatureFilter$OnlyChildren.(FeatureFilter.java:1273) > at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1817) > at > org.biojava.bio.seq.SimpleFeatureHolder.(SimpleFeatureHolder.java:54) > at > org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature(RichFeature.java:167) > at > org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java:61) > at > org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:100) > at > org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:81) > at > org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder(SimpleRichSequenceBuilderFactory.java:68) > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) > at org.pasteur.pf2.biojava.biojavaIO.execute(biojavaIO.java:54) > at org.pasteur.pf2.biojava.biojavaIO.main(biojavaIO.java:29) > Caused by: java.lang.ClassNotFoundException: > org.biojava.utils.bytecode.CodeException > at java.net.URLClassLoader$1.run(URLClassLoader.java:200) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) > at java.lang.ClassLoader.loadClass(ClassLoader.java:307) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:252) > at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) > ... 11 more > > > Do you think this problem can be due to the compiler??? > > Reading through some of the information on the web about > NoClassDefFoundError it should be something like or my class path is > messed or something like that... 
> > Thanks for any tip/hint and how to identify and solve this problem > > Best, > > Bernd > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From bernd.jagla at pasteur.fr Thu Sep 23 15:25:52 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 17:25:52 +0200 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C9B69F7.8020309@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> <4C9B38A2.8010006@pasteur.fr> <4C9B66C2.704@pasteur.fr> <4C9B69F7.8020309@pasteur.fr> Message-ID: <4C9B7180.1090702@pasteur.fr> Bingo, I got it. The problem was that I wasn't using ant to build the library but rather relied on Eclipse to do the job... Now it seems to be working. Actually, when including bytecode.jar as Richard suggested, it even works with my KNIME application... Sorry for that... But maybe the iterator would be good to include in the next release??? Cheers, Bernd On 9/23/2010 4:53 PM, Bernd Jagla wrote: > Just to make sure that the "small" modification is not the issue: > > I added the following in FastqReader.java > Iterable read(InputStream inputStream) throws IOException; > > Fastq readNext(InputStream inputStream) throws IOException; > > Since I needed an iterator for a different project. > > Best, > > Bernd > > > > On 9/23/2010 4:40 PM, Bernd Jagla wrote: >> Hello again,... >> >> I am still struggling with my little problem. I am getting closer I >> think... >> I made some minor modifications to some biojava files and would like >> to compile it under 1.6. Is this possible? >> >> When I compare version 1.7.1 with mine the only difference seems to >> be the java version. And the sample code runs with your but not with >> mine...
;( >> >> I get the following error message: >> >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/biojava/utils/bytecode/CodeException >> at >> org.biojava.bio.seq.FeatureFilter$OnlyChildren.(FeatureFilter.java:1273) >> at >> org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1817) >> at >> org.biojava.bio.seq.SimpleFeatureHolder.(SimpleFeatureHolder.java:54) >> at >> org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature(RichFeature.java:167) >> at >> org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java:61) >> at >> org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:100) >> at >> org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:81) >> at >> org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder(SimpleRichSequenceBuilderFactory.java:68) >> at >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) >> at org.pasteur.pf2.biojava.biojavaIO.execute(biojavaIO.java:54) >> at org.pasteur.pf2.biojava.biojavaIO.main(biojavaIO.java:29) >> Caused by: java.lang.ClassNotFoundException: >> org.biojava.utils.bytecode.CodeException >> at java.net.URLClassLoader$1.run(URLClassLoader.java:200) >> at java.security.AccessController.doPrivileged(Native Method) >> at java.net.URLClassLoader.findClass(URLClassLoader.java:188) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:307) >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:252) >> at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) >> ... 11 more >> >> >> Do you think this problem can be due to the compiler??? >> >> Reading through some of the information on the web about >> NoClassDefFoundError it should be something like or my class path is >> messed or something like that... 
>> >> Thanks for any tip/hint and how to identify and solve this problem >> >> Best, >> >> Bernd >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From bernd.jagla at pasteur.fr Thu Sep 23 15:27:42 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Thu, 23 Sep 2010 17:27:42 +0200 Subject: [Biojava-l] new problem: serializable Message-ID: <4C9B71EE.2020603@pasteur.fr> Sorry, again me... I now get the following error: Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) ... 9 more It seems that the SimpleRichSequence is not serializable.... Is there a way to make use of a serializable object? 
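The trace above is what standard Java serialization produces whenever a Serializable container (here apparently the KNIME data cell) holds a field whose class does not implement Serializable. A self-contained sketch of that situation follows; PlainSequence and Cell are hypothetical stand-ins, not BioJava or KNIME classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a sequence class that does NOT implement Serializable.
class PlainSequence {
    final String name;
    PlainSequence(String name) { this.name = name; }
}

// Hypothetical stand-in for a Serializable container such as a data cell.
class Cell implements Serializable {
    private static final long serialVersionUID = 1L;
    final PlainSequence seq; // non-serializable field: writeObject will fail on it
    Cell(PlainSequence seq) { this.seq = seq; }
}

class SerializationDemo {

    /** Returns the name of the class that could not be written, or null on success. */
    static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage(); // the message carries the offending class name
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Cell:   " + trySerialize(new Cell(new PlainSequence("seq1"))));
        System.out.println("String: " + trySerialize("a plain String"));
    }
}
```

The container itself being Serializable is not enough; every non-transient field it reaches has to be Serializable as well.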
Thanks, Bernd From holland at eaglegenomics.com Thu Sep 23 15:06:35 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 23 Sep 2010 16:06:35 +0100 Subject: [Biojava-l] fileToBiojava question In-Reply-To: <4C9B66C2.704@pasteur.fr> References: <4C989B1B.7090807@pasteur.fr> <4C98A2A0.7040803@pasteur.fr> <4C98A959.2040405@pasteur.fr> <4C9B38A2.8010006@pasteur.fr> <4C9B66C2.704@pasteur.fr> Message-ID: <648EA5EE-CB83-4BD3-AFFC-944C1BB22E09@eaglegenomics.com> Your classpath is missing bytecode.jar. It's available in the same location as the main biojava jars. cheers, Richard On 23 Sep 2010, at 15:40, Bernd Jagla wrote: > Hello again,... > > I am still struggling with my little problem. I am getting closer I think... > I made some minor modifications to some biojava files and would like to compile it under 1.6. Is this possible? > > When I compare version 1.7.1 with mine the only difference seems to be the java version. And the sample code runs with your but not with mine... ;( > > I get the following error message: > > Exception in thread "main" java.lang.NoClassDefFoundError: org/biojava/utils/bytecode/CodeException > at org.biojava.bio.seq.FeatureFilter$OnlyChildren.(FeatureFilter.java:1273) > at org.biojava.bio.seq.FeatureFilter.(FeatureFilter.java:1817) > at org.biojava.bio.seq.SimpleFeatureHolder.(SimpleFeatureHolder.java:54) > at org.biojavax.bio.seq.RichFeature$Tools.makeEmptyFeature(RichFeature.java:167) > at org.biojavax.bio.seq.io.RichSeqIOAdapter.(RichSeqIOAdapter.java:61) > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:100) > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.(SimpleRichSequenceBuilder.java:81) > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilderFactory.makeSequenceBuilder(SimpleRichSequenceBuilderFactory.java:68) > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) > at org.pasteur.pf2.biojava.biojavaIO.execute(biojavaIO.java:54) > at 
org.pasteur.pf2.biojava.biojavaIO.main(biojavaIO.java:29) > Caused by: java.lang.ClassNotFoundException: org.biojava.utils.bytecode.CodeException > at java.net.URLClassLoader$1.run(URLClassLoader.java:200) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) > at java.lang.ClassLoader.loadClass(ClassLoader.java:307) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:252) > at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) > ... 11 more > > > Do you think this problem can be due to the compiler??? > > Reading through some of the information on the web about NoClassDefFoundError it should be something like or my class path is messed or something like that... > > Thanks for any tip/hint and how to identify and solve this problem > > Best, > > Bernd > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Thu Sep 23 15:34:32 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 23 Sep 2010 16:34:32 +0100 Subject: [Biojava-l] new problem: serializable In-Reply-To: <4C9B71EE.2020603@pasteur.fr> References: <4C9B71EE.2020603@pasteur.fr> Message-ID: <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> The RichSequence interface doesn't extend Serializable, so you can't serialize BioJavaX sequence objects. :( I can't remember the logic behind that one but it seemed like there was a good reason at the time...
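One way to live with a sequence class that is not Serializable is to copy only the values you need into a small holder class that is. A rough sketch under stated assumptions: SequenceRecord, its fields, and toBytes()/fromBytes() are made-up names rather than BioJava API, and with BioJava 1.7 the values would presumably be filled from calls such as seq.getName() and seq.seqString().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder: a plain Serializable snapshot of the fields of interest.
class SequenceRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;                         // e.g. taken from seq.getName()
    final String residues;                     // e.g. taken from seq.seqString()
    final LinkedHashMap<String, String> notes; // simple key/value annotations

    SequenceRecord(String name, String residues, Map<String, String> notes) {
        this.name = name;
        this.residues = residues;
        this.notes = new LinkedHashMap<String, String>(notes); // defensive copy
    }

    /** Serializes the record with standard Java serialization. */
    static byte[] toBytes(SequenceRecord r) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(r);
            out.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads a record back from the bytes produced by toBytes(). */
    static SequenceRecord fromBytes(byte[] data) {
        try {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
            try {
                return (SequenceRecord) in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> notes = new LinkedHashMap<String, String>();
        notes.put("source", "example");
        SequenceRecord copy = fromBytes(toBytes(new SequenceRecord("seq1", "acgt", notes)));
        System.out.println(copy.name + " " + copy.residues + " " + copy.notes);
    }
}
```

The trade-off is that anything not copied into the holder (features, taxon, references) is lost, so this only fits when a few fields are enough downstream.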
If you're passing sequences around by serialisation, do you really need to pass the complete object or could you just pass the bits you're interested in in some kind of basic data structure? On 23 Sep 2010, at 16:27, Bernd Jagla wrote: > Sorry, again me... > > I now get the following error: > > Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) > at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) > at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) > at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) > at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) > at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) > at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) > at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) > ... 9 more > > It seems that the SimpleRichSequence is not serializable.... > > Is there a way to make use of a serializable object? 
> > Thanks, > > Bernd > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From james.swetnam at nyumc.org Thu Sep 23 16:01:51 2010 From: james.swetnam at nyumc.org (James Swetnam) Date: Thu, 23 Sep 2010 12:01:51 -0400 Subject: [Biojava-l] new problem: serializable In-Reply-To: <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> References: <4C9B71EE.2020603@pasteur.fr> <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> Message-ID: How about subclassing SimpleRichSequence and implementing serializable yourself? Doesn't seem to be final. Eclipse can do it in a jiffy. Hacky, but will get you over the bump. James Swetnam On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland wrote: > The RichSequence interface doesn't extend Serializable, so therefore you > can't seralize BioJavaX sequence objects. :( I can't remember the logic > behind that one but it seemed like there was a good reason at the time... > > If you're passing sequences around by serialisation, do you really need to > pass the complete object or could you just pass the bits you're interested > in in some kind of basic data structure? > > > On 23 Sep 2010, at 16:27, Bernd Jagla wrote: > > > Sorry, again me... 
> > > > I now get the following error: > > > > Caused by: java.io.NotSerializableException: > org.biojavax.bio.seq.SimpleRichSequence > > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) > > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) > > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) > > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) > > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) > > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) > > at > org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) > > at > org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) > > at > org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) > > at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) > > at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) > > ... 9 more > > > > It seems that the SimpleRichSequence is not serializable.... > > > > Is there a way to make use of a serializable object? 
> > > > Thanks, > > > > Bernd > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- James Swetnam Lead Scientific Programmer Department of Pharmacology NYU Langone Medical Center From bernd.jagla at pasteur.fr Mon Sep 27 12:01:47 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Mon, 27 Sep 2010 14:01:47 +0200 Subject: [Biojava-l] new problem: serializable In-Reply-To: References: <4C9B71EE.2020603@pasteur.fr> <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> Message-ID: <4CA087AB.2060808@pasteur.fr> Thanks everyone. I got the biojavax working. Unfortunately the serialization process is not completely done yet ... It turns out the following information is more difficult than expected to serialize... I haven't found a tool in Eclipse that can help me there. Generally the problems arise when dealing with Sets like annotation, features, notes, RankedDocRef. But I also have problems with SimpleNCBITaxon. At least I was able to create a SimpleRichSequence object Please let me know if you can think of something that would ease the work a bit.... Thanks a lot, Bernd On 9/23/2010 6:01 PM, James Swetnam wrote: > How about subclassing SimpleRichSequence and implementing serializable > yourself? Doesn't seem to be final. Eclipse can do it in a jiffy. > Hacky, but will get you over the bump. > > James Swetnam > > On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland > > wrote: > > The RichSequence interface doesn't extend Serializable, so > therefore you can't seralize BioJavaX sequence objects. 
:( I can't > remember the logic behind that one but it seemed like there was a > good reason at the time... > > If you're passing sequences around by serialisation, do you really > need to pass the complete object or could you just pass the bits > you're interested in in some kind of basic data structure? > > > On 23 Sep 2010, at 16:27, Bernd Jagla wrote: > > > Sorry, again me... > > > > I now get the following error: > > > > Caused by: java.io.NotSerializableException: > org.biojavax.bio.seq.SimpleRichSequence > > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156) > > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) > > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) > > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) > > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) > > at > java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) > > at > org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127) > > at > org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253) > > at > org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790) > > at > org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607) > > at org.knime.core.data.container.Buffer.addRow(Buffer.java:551) > > ... 9 more > > > > It seems that the SimpleRichSequence is not serializable.... > > > > Is there a way to make use of a serializable object? 
> > >
> > > Thanks,
> > >
> > > Bernd
> > >
> > > _______________________________________________
> > > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> > --
> > Richard Holland, BSc MBCS
> > Operations and Delivery Director, Eagle Genomics Ltd
> > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
> >
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>
>
>
> --
> James Swetnam
> Lead Scientific Programmer
> Department of Pharmacology
> NYU Langone Medical Center
>

From holland at eaglegenomics.com Mon Sep 27 12:04:42 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 27 Sep 2010 13:04:42 +0100
Subject: [Biojava-l] new problem: serializable
In-Reply-To: <4CA087AB.2060808@pasteur.fr>
References: <4C9B71EE.2020603@pasteur.fr> <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> <4CA087AB.2060808@pasteur.fr>
Message-ID: <3A9596A6-4A05-49DF-9F4D-48E439F7E8BA@eaglegenomics.com>

I think you can follow James' advice and subclass SimpleRichSequence, and then annotate it such that the awkward bits are not serialised.

Or, you can just extract the parameters of interest out of the original object and put them into some holding class (e.g. a simple HashMap) as I suggested and serialise that instead.

cheers,
Richard

On 27 Sep 2010, at 13:01, Bernd Jagla wrote:

> Thanks everyone. I got the biojavax working.
> Unfortunately the serialization process is not completely done yet ...
> It turns out the following information is more difficult than expected to serialize... I haven't found a tool in Eclipse that can help me there.
>
> Generally the problems arise when dealing with Sets like annotation, features, notes, RankedDocRef.
> But I also have problems with SimpleNCBITaxon.
>
> At least I was able to create a SimpleRichSequence object
>
> Please let me know if you can think of something that would ease the work a bit....
>
> Thanks a lot,
>
> Bernd
>
>
>
> On 9/23/2010 6:01 PM, James Swetnam wrote:
>> How about subclassing SimpleRichSequence and implementing serializable yourself? Doesn't seem to be final. Eclipse can do it in a jiffy. Hacky, but will get you over the bump.
>>
>> James Swetnam
>>
>> On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland wrote:
>> The RichSequence interface doesn't extend Serializable, so therefore you can't serialize BioJavaX sequence objects. :( I can't remember the logic behind that one but it seemed like there was a good reason at the time...
>>
>> If you're passing sequences around by serialisation, do you really need to pass the complete object or could you just pass the bits you're interested in in some kind of basic data structure?
>>
>>
>> On 23 Sep 2010, at 16:27, Bernd Jagla wrote:
>>
>> > Sorry, again me...
>> >
>> > I now get the following error:
>> >
>> > Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence
>> > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156)
>> > at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509)
>> > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474)
>> > at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392)
>> > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150)
>> > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326)
>> > at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127)
>> > at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253)
>> > at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790)
>> > at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607)
>> > at org.knime.core.data.container.Buffer.addRow(Buffer.java:551)
>> > ... 9 more
>> >
>> > It seems that the SimpleRichSequence is not serializable....
>> >
>> > Is there a way to make use of a serializable object?
>> >
>> > Thanks,
>> >
>> > Bernd
>> >
>> > _______________________________________________
>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>>
>>
>> --
>> James Swetnam
>> Lead Scientific Programmer
>> Department of Pharmacology
>> NYU Langone Medical Center
>>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From bernd.jagla at pasteur.fr Mon Sep 27 13:17:55 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 27 Sep 2010 15:17:55 +0200
Subject: [Biojava-l] new problem: serializable
In-Reply-To: <3A9596A6-4A05-49DF-9F4D-48E439F7E8BA@eaglegenomics.com>
References: <4C9B71EE.2020603@pasteur.fr> <85C715D2-0053-4C7E-BDB9-C3AE67E460BE@eaglegenomics.com> <4CA087AB.2060808@pasteur.fr> <3A9596A6-4A05-49DF-9F4D-48E439F7E8BA@eaglegenomics.com>
Message-ID: <4CA09983.9090107@pasteur.fr>

Yes, that's what I am doing. I have subclassed from SimpleRichSequence to SimpleSerializableRichSequence (couldn't think of something nicer...) and am working my way through the bits...
I just haven't found the tools that make this a "jiffy". I am sweating here; more or less at least.. ;)

Thanks again

B

On 9/27/2010 2:04 PM, Richard Holland wrote:
> I think you can follow James' advice and subclass SimpleRichSequence, and then annotate it such that the awkward bits are not serialised.
>
> Or, you can just extract the parameters of interest out of the original object and put them into some holding class (e.g. a simple HashMap) as I suggested and serialise that instead.
>
> cheers,
> Richard
>
> On 27 Sep 2010, at 13:01, Bernd Jagla wrote:
>
>> Thanks everyone. I got the biojavax working.
>> Unfortunately the serialization process is not completely done yet ...
>> It turns out the following information is more difficult than expected to serialize... I haven't found a tool in Eclipse that can help me there.
>>
>> Generally the problems arise when dealing with Sets like annotation, features, notes, RankedDocRef.
>> But I also have problems with SimpleNCBITaxon.
>>
>> At least I was able to create a SimpleRichSequence object
>>
>> Please let me know if you can think of something that would ease the work a bit....
>>
>> Thanks a lot,
>>
>> Bernd
>>
>>
>>
>> On 9/23/2010 6:01 PM, James Swetnam wrote:
>>> How about subclassing SimpleRichSequence and implementing serializable yourself? Doesn't seem to be final. Eclipse can do it in a jiffy. Hacky, but will get you over the bump.
>>>
>>> James Swetnam
>>>
>>> On Thu, Sep 23, 2010 at 11:34 AM, Richard Holland wrote:
>>> The RichSequence interface doesn't extend Serializable, so therefore you can't serialize BioJavaX sequence objects. :( I can't remember the logic behind that one but it seemed like there was a good reason at the time...
>>>
>>> If you're passing sequences around by serialisation, do you really need to pass the complete object or could you just pass the bits you're interested in in some kind of basic data structure?
>>>
>>>
>>> On 23 Sep 2010, at 16:27, Bernd Jagla wrote:
>>>
>>>> Sorry, again me...
>>>>
>>>> I now get the following error:
>>>>
>>>> Caused by: java.io.NotSerializableException: org.biojavax.bio.seq.SimpleRichSequence
>>>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1156)
>>>> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509)
>>>> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474)
>>>> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392)
>>>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150)
>>>> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326)
>>>> at org.knime.core.data.container.DCObjectOutputVersion2.writeDataCellPerJavaSerialization(DCObjectOutputVersion2.java:127)
>>>> at org.knime.core.data.container.Buffer.writeBlobDataCell(Buffer.java:1253)
>>>> at org.knime.core.data.container.Buffer.handleIncomingBlob(Buffer.java:790)
>>>> at org.knime.core.data.container.Buffer.saveBlobs(Buffer.java:607)
>>>> at org.knime.core.data.container.Buffer.addRow(Buffer.java:551)
>>>> ... 9 more
>>>>
>>>> It seems that the SimpleRichSequence is not serializable....
>>>>
>>>> Is there a way to make use of a serializable object?
>>>>
>>>> Thanks,
>>>>
>>>> Bernd
>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>>
>>>
>>> --
>>> James Swetnam
>>> Lead Scientific Programmer
>>> Department of Pharmacology
>>> NYU Langone Medical Center
>>>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
>

From mdmiller53 at comcast.net Mon Sep 27 16:35:45 2010
From: mdmiller53 at comcast.net (mdmiller)
Date: Mon, 27 Sep 2010 09:35:45 -0700
Subject: [Biojava-l] new problem: serializable
In-Reply-To:
References:
Message-ID: <3143202B18DD41FE9FA8015A85E09D8C@mmPC>

hi bernd,

> Please let me know if you can think of something that would ease the
> work a bit....

two ideas. if you know a resource you can refetch the record from, then you only have to serialize the identifier at that resource then when you deserialize, simply go to that resource.

the other is to use XStream (http://xstream.codehaus.org/), which is open source, and i found does a good job.

cheers,
michael

----- Original Message -----
>
> Message: 1
> Date: Mon, 27 Sep 2010 14:01:47 +0200
> From: Bernd Jagla
> Subject: Re: [Biojava-l] new problem: serializable
> To: James Swetnam
> Cc: biojava-l at lists.open-bio.org
> Message-ID: <4CA087AB.2060808 at pasteur.fr>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Thanks everyone. I got the biojavax working.
> Unfortunately the serialization process is not completely done yet ...
> It turns out the following information is more difficult than expected
> to serialize... I haven't found a tool in Eclipse that can help me there.
>
> Generally the problems arise when dealing with Sets like annotation,
> features, notes, RankedDocRef.
> But I also have problems with SimpleNCBITaxon.
>
> At least I was able to create a SimpleRichSequence object
>
> Please let me know if you can think of something that would ease the
> work a bit....
>
> Thanks a lot,
>
> Bernd
>
>
>

From asandro1501 at gmail.com Tue Sep 28 01:17:11 2010
From: asandro1501 at gmail.com (Alex Silva)
Date: Mon, 27 Sep 2010 22:17:11 -0300
Subject: [Biojava-l] headers .gbk
Message-ID:

Good evening,

I need code to read the headers of a .gbk file. I need to locate the occurrences of geneid. Can anyone help me?

--
Alex Silva
G.R.A. Sistemas Corporativos
msn: gra.sistemas at hotmail.com
55-9165-7378

From jolyon.holdstock at ogt.co.uk Tue Sep 28 09:49:48 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Tue, 28 Sep 2010 10:49:48 +0100
Subject: [Biojava-l] headers .gbk[Scanned]
In-Reply-To:
References:
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F0336CE17@EUCLID.internal.ogtip.com>

Hi,

The db_xref annotations aren't picked up (please correct me if I'm wrong) so, depending on how many files you are handling, one way is to change all the gene ID annotations from:

/db_xref="GeneID:1095448" to /GeneID="1095448"

You can then create a ComparableTerm to extract these:

ComparableTerm geneIDTerm = RichObjectFactory.getDefaultOntology().getOrCreateTerm("GeneID");

There is an example in the cookbook on how you can use this:

http://www.biojava.org/wiki/BioJava:Cookbook:Annotations:List2

hope this helps,

J

-----Original Message-----
From: Alex Silva [mailto:asandro1501 at gmail.com]
Sent: 28 September 2010 02:17
To: biojava-l at biojava.org
Subject: [Biojava-l] headers .gbk[Scanned]

Good evening,

I need code to read the headers of a .gbk file. I need to locate the occurrences of geneid. Can anyone help me?

--
Alex Silva
G.R.A. Sistemas Corporativos
msn: gra.sistemas at hotmail.com
55-9165-7378
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

This email has been scanned by Oxford Gene Technology Security Systems.

From asandro1501 at gmail.com Wed Sep 29 03:54:29 2010
From: asandro1501 at gmail.com (Alex Silva)
Date: Wed, 29 Sep 2010 00:54:29 -0300
Subject: [Biojava-l] headers .gbk[Scanned]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F0336CE17@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F0336CE17@EUCLID.internal.ogtip.com>
Message-ID:

Hi Jolyon,

Could you help me use the code? I only need to search for geneid in the file alt_Celera_chr22.gbk. I looked at it, but I was not able to use the code.

Thank you for your attention.

2010/9/28 Jolyon Holdstock

> Hi,
>
> The db_xref annotations aren't picked up (please correct me if I'm
> wrong) so, depending on how many files you are handling, one way is to
> change all the gene ID annotations from:
>
> /db_xref="GeneID:1095448" to /GeneID="1095448"
>
> You can then create a ComparableTerm to extract these:
>
> ComparableTerm geneIDTerm =
> RichObjectFactory.getDefaultOntology().getOrCreateTerm("GeneID");
>
> There is an example in the cookbook on how you can use this:
>
> http://www.biojava.org/wiki/BioJava:Cookbook:Annotations:List2
>
> hope this helps,
>
> J
>
> -----Original Message-----
> From: Alex Silva [mailto:asandro1501 at gmail.com]
> Sent: 28 September 2010 02:17
> To: biojava-l at biojava.org
> Subject: [Biojava-l] headers .gbk[Scanned]
>
> Good evening,
>
> I need code to read the headers of a .gbk file. I need to locate the
> occurrences of geneid. Can anyone help me?
>
> --
> Alex Silva
> G.R.A. Sistemas Corporativos
> msn: gra.sistemas at hotmail.com
> 55-9165-7378
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> This email has been scanned by Oxford Gene Technology Security Systems.
>

--
Alex Silva
G.R.A. Sistemas Corporativos
msn: gra.sistemas at hotmail.com
55-9165-7378

From darnells at dnastar.com Wed Sep 29 04:29:00 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Tue, 28 Sep 2010 23:29:00 -0500
Subject: [Biojava-l] Chemical component files
Message-ID:

Greetings,

I would like to install a local instance of the Chemical Component Dictionary (CCD) for use with BioJava's chemical component functionality. The wwPDB distributes the CCD as a single file; however, ChemCompGroupFactory.java and related classes expect individual cif.gz files.

The individual cif.gz files are downloaded on-the-fly from http://www.rcsb.org/pdb/files/ligand/, but they are not discoverable from a web browser. Where can I find the individual files that form the CCD? Am I overlooking a utility to use the single file version?

Best regards,
Steve

From andreas at sdsc.edu Wed Sep 29 17:43:50 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 29 Sep 2010 10:43:50 -0700
Subject: [Biojava-l] Chemical component files
In-Reply-To:
References:
Message-ID:

Hi Steve,

BioJava can automatically fetch missing chemical component files from the RCSB web site for you. If you set up the PDB file path correctly, the files will get cached in that location for future use. As such there should be no need to worry about getting them installed manually (unless you work with offline computers?).

If you want access to the single files, you can access them by their three-letter code:

http://www.rcsb.org/pdb/files/ligand/TYS.cif

Does that work for you?
Andreas

On Tue, Sep 28, 2010 at 9:29 PM, Steve Darnell wrote:
> Greetings,
>
> I would like to install a local instance of the Chemical Component
> Dictionary (CCD) for use with BioJava's chemical component
> functionality. The wwPDB distributes the CCD as a single file; however,
> ChemCompGroupFactory.java and related classes expect individual cif.gz
> files.
>
> The individual cif.gz files are downloaded on-the-fly from
> http://www.rcsb.org/pdb/files/ligand/, but they are not discoverable
> from a web browser. Where can I find the individual files that form the
> CCD? Am I overlooking a utility to use the single file version?
>
> Best regards,
> Steve
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------
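
For anyone who wants to pre-populate a directory for offline machines, the per-code lookup described in the reply above can be sketched in plain Java. This is only an illustration and not BioJava code: the class LigandFetcher and its methods are made up for this example, the base URL is the one quoted in this thread and may well have moved since, and the cache layout (one <CODE>.cif file per component in a directory of your choosing) is an assumption, not necessarily what ChemCompGroupFactory expects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper, not part of BioJava.
public class LigandFetcher {

    // Base URL as quoted in this thread; treat it as an assumption.
    static final String BASE_URL = "http://www.rcsb.org/pdb/files/ligand/";

    // Upper-cases a component code such as "tys" and rejects anything
    // that is not one to three letters or digits.
    static String normalise(String code) {
        String id = code.trim().toUpperCase(Locale.ROOT);
        if (!id.matches("[A-Z0-9]{1,3}")) {
            throw new IllegalArgumentException("Not a component code: " + code);
        }
        return id;
    }

    // URL of the single-component file for the given code.
    static String ligandUrl(String code) {
        return BASE_URL + normalise(code) + ".cif";
    }

    // Location of the cached copy inside cacheDir (assumed layout).
    static Path cachePath(Path cacheDir, String code) {
        return cacheDir.resolve(normalise(code) + ".cif");
    }

    // Returns the cached file, downloading it first if it is missing.
    static Path fetch(Path cacheDir, String code) throws IOException {
        Path target = cachePath(cacheDir, code);
        if (Files.exists(target)) {
            return target;
        }
        Files.createDirectories(cacheDir);
        try (InputStream in = URI.create(ligandUrl(code)).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        String code = args.length > 0 ? args[0] : "TYS";
        System.out.println(ligandUrl(code));
        // Only touch the network when a cache directory is given.
        if (args.length > 1) {
            System.out.println(fetch(Paths.get(args[1]), code));
        }
    }
}
```

Run with a code and a directory (for example "TYS /tmp/ccd") to download one file; looping over the codes you need would fill the directory before moving it to the offline machine. Whether BioJava then picks the files up depends on the directory layout it expects, which this sketch does not try to reproduce.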