From augustovmail-java at yahoo.com.br Mon Sep 1 10:21:25 2008
From: augustovmail-java at yahoo.com.br (Augusto Fernandes Vellozo)
Date: Mon, 1 Sep 2008 16:21:25 +0200
Subject: [Biojava-l] Exception org.hibernate.NonUniqueObjectException
In-Reply-To:
References: <381a3e850808200636n50e8700ap21d54a4554dd2fb5@mail.gmail.com>
Message-ID: <381a3e850809010721h19a99ecfncd88c8f4d9d5f1@mail.gmail.com>

Hi Richard,

Thank you for the suggestion, but it didn't work either.

After a lot of tests, I found the error and implemented a solution.
There is an error in the method RichObjectFactory.clearLRUCache(): it
doesn't clear the static field CONTAINS_TERM in the class
SimpleRichFeatureRelationship. I created a static method in that class to
clear the field CONTAINS_TERM.

Now, finally, the program is working...

Cheers,
Augusto

2008/8/26 Richard Holland
> Hello.
>
> In this function:
>
>     if (countOrfs % stepORF == 0) {
>         System.out.println(countOrfs);
>         session.flush();
>         tx.commit();
>         session.clear();
>         session.close();
>         RichObjectFactory.clearLRUCache();
>         session = sessionFactory.openSession();
>         RichObjectFactory.connectToBioSQL(session);
>         RichObjectFactory.setDefaultNamespaceName(Messages.getString("nameSpaceDefault"));
>         tx = session.beginTransaction();
>     }
>
> I don't think it's necessary to close and reopen the session. I think
> the following should work fine:
>
>     if (countOrfs % stepORF == 0) {
>         System.out.println(countOrfs);
>         session.flush();
>         tx.commit();
>         session.clear();
>         RichObjectFactory.clearLRUCache();
>         tx = session.beginTransaction();
>     }
>
> cheers,
> Richard
>
> 2008/8/20 Augusto Fernandes Vellozo :
> > Hi,
> > I am trying to load a lot of features from a file into MySQL and I am
> > having problems doing this with BioJava/Hibernate.
> > If I don't do the flush/clear in the session, I get an exception like
> > OutOfMemory.
> >
> > But after I do the flush/clear, the second query throws the exception:
> > org.hibernate.NonUniqueObjectException: a different object with the same
> > identifier value was already associated with the session: [Term#23755]
> >
> > I've already tried to clear the RichObjectFactory, but it doesn't work.
> >
> > Does anyone know what could be happening? Any suggestions?
> > The code is below.
> >
> > Thanks,
> > --
> > Augusto F. Vellozo
> >
> > import java.io.BufferedReader;
> > import java.io.File;
> > import java.io.FileReader;
> > import java.util.TreeSet;
> >
> > import org.biojava.bio.BioException;
> > import org.biojavax.RichObjectFactory;
> > import org.biojavax.SimpleRichAnnotation;
> > import org.biojavax.bio.seq.RichFeature;
> > import org.biojavax.bio.seq.SimplePosition;
> > import org.biojavax.bio.seq.SimpleRichFeature;
> > import org.biojavax.bio.seq.SimpleRichLocation;
> > import org.biojavax.bio.seq.RichLocation.Strand;
> > import org.biojavax.bio.taxa.NCBITaxon;
> > import org.biojavax.ontology.SimpleComparableOntology;
> > import org.hibernate.Session;
> > import org.hibernate.SessionFactory;
> > import org.hibernate.Transaction;
> > import org.hibernate.cfg.Configuration;
> >
> > public class LoadORFVRTest
> > {
> >     public static void main(String[] args) {
> >         SessionFactory sessionFactory = new Configuration().configure("hibernate.cfg.xml").buildSessionFactory();
> >         Session session = sessionFactory.openSession();
> >         RichObjectFactory.connectToBioSQL(session);
> >         RichObjectFactory.setDefaultNamespaceName(Messages.getString("nameSpaceDefault"));
> >         Transaction tx = session.beginTransaction();
> >         try {
> >             //file orfs
> >             File fileOrfs;
> >             fileOrfs = new File(args[0]);
> >
> >             String orfName, geneName = "";
> >             BufferedReader br = new BufferedReader(new FileReader(fileOrfs));
> >             String line, line2, line3, lineAmino;
> >             int countOrfs = 0;
> >             int beginPos = -1, endPos = -1, nextPos = -1;
> >             int strand = 0;
> >             int stepORF = Integer.parseInt(Messages.getString("LoadORFVR.printORF"));
> >             while ((line = br.readLine()) != null) {
> >                 if (line.length() > 0) {
> >                     if (line.startsWith(">")) { //ORF heading
> >                         //new ORF
> >                         //save last ORF
> >                         if (strand != 0) {
> >                             saveORF(session, strand, beginPos, endPos, nextPos - 1, geneName, Integer.parseInt(args[1]));
> >                             countOrfs++;
> >                         }
> >                         if (countOrfs % stepORF == 0) {
> >                             System.out.println(countOrfs);
> >                             session.flush();
> >                             tx.commit();
> >                             session.clear();
> >                             session.close();
> >                             RichObjectFactory.clearLRUCache();
> >                             session = sessionFactory.openSession();
> >                             RichObjectFactory.connectToBioSQL(session);
> >                             RichObjectFactory.setDefaultNamespaceName(Messages.getString("nameSpaceDefault"));
> >                             tx = session.beginTransaction();
> >                         }
> >                         orfName = line.substring(1);
> >                         geneName = orfName.substring(0, orfName.indexOf("_"));
> >                         line = br.readLine();
> >
> >                         if (line.startsWith("Reading frame: ")) {
> >                             strand = Integer.parseInt(line.substring(15));
> >                             if (strand == 0) {
> >                                 System.out.println("Format error, strand = 0");
> >                             }
> >                             else {
> >                                 nextPos = 1;
> >                                 beginPos = -1;
> >                                 endPos = -1;
> >                             }
> >                         }
> >                         else {
> >                             System.out.println("Format error in line 'Reading frame':" + line);
> >                             strand = 0;
> >                         }
> >                         br.readLine(); // empty line
> >                     }
> >                     else if (strand != 0) { //ORF sequence
> >                         line2 = br.readLine();
> >                         line3 = br.readLine();
> >                         br.readLine(); // empty line
> >                         if (strand < 0) {
> >                             lineAmino = line3;
> >                         }
> >                         else {
> >                             lineAmino = line;
> >                         }
> >                         lineAmino = lineAmino.substring(3, lineAmino.length() - 1);
> >                         if (lineAmino.trim().length() != 0) {
> >                             if (beginPos < 0) {
> >                                 beginPos = nextPos + firstPosNotSpace(lineAmino) - 1;
> >                             }
> >                             endPos = nextPos + lastPosNotSpace(lineAmino) + 1;
> >                         }
> >                         nextPos += lineAmino.length();
> >                     }
> >                 }
> >             }
> >             if (strand != 0) {
> >                 saveORF(session, strand, beginPos, endPos, nextPos - 1, geneName, Integer.parseInt(args[1]));
> >             }
> >             session.flush();
> >             tx.commit();
> >             session.clear();
> >         }
> >         catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >         finally {
> >             if (tx.isActive()) {
> >                 tx.rollback();
> >             }
> >             session.close();
> >         }
> >
> >     }
> >
> >     public static void saveORF(Session session, int strand, int beginPos, int endPos, int lastPos, String geneName,
> >             int ncbiTaxonId) throws BioException {
> >         SimplePosition beginPosition, endPosition;
> >         if (strand < 0 && beginPos < 4) {
> >             beginPosition = new SimplePosition(true, false, beginPos);
> >         }
> >         else {
> >             beginPosition = new SimplePosition(beginPos);
> >         }
> >         if (strand > 0 && (endPos == lastPos)) {
> >             endPosition = new SimplePosition(false, true, endPos);
> >         }
> >         else {
> >             endPosition = new SimplePosition(endPos);
> >         }
> >         // save;
> >         NCBITaxon taxon = (NCBITaxon) session.createQuery("from Taxon where ncbi_taxon_id=:ncbiTaxonNumber").setInteger(
> >                 "ncbiTaxonNumber", ncbiTaxonId).uniqueResult();
> >
> >         SimpleComparableOntology ontFeatures = (SimpleComparableOntology) RichObjectFactory.getObject(
> >                 SimpleComparableOntology.class, new Object[] {Messages.getString("ontologyFeatures")});
> >         SimpleComparableOntology ontGeneral = ((SimpleComparableOntology) RichObjectFactory.getObject(
> >                 SimpleComparableOntology.class, new Object[] {Messages.getString("ontologyGeneral")}));
> >         SimpleRichFeature featureGene = (SimpleRichFeature) session.createQuery(
> >                 "select f from Feature as f join f.parent as b where "
> >                 + "f.name=:geneName and f.typeTerm=:geneTerm and b.taxon=:taxonId ").setString("geneName", geneName).setParameter(
> >                 "taxonId", taxon).setParameter("geneTerm", ontFeatures.getOrCreateTerm(Messages.getString("termGene"))).uniqueResult();
> >         RichFeature.Template ft = new RichFeature.Template();
> >         ft.location = featureGene.getLocation().translate(0);
> >         ft.sourceTerm = ontGeneral.getOrCreateTerm(Messages.getString("termVR"));
> >         ft.typeTerm = ontFeatures.getOrCreateTerm(Messages.getString("termMRNA"));
> >         ft.annotation = new SimpleRichAnnotation();
> >         ft.featureRelationshipSet = new TreeSet();
> >         ft.rankedCrossRefs = new TreeSet();
> >         SimpleRichFeature featureMRNA = (SimpleRichFeature) featureGene.createFeature(ft);
> >         featureMRNA.setName(geneName);
> >
> >         ft = new RichFeature.Template();
> >         if (strand < 0) {
> >             ft.location = new SimpleRichLocation(beginPosition, endPosition, 0, Strand.NEGATIVE_STRAND);
> >         }
> >         else {
> >             ft.location = new SimpleRichLocation(beginPosition, endPosition, 0, Strand.POSITIVE_STRAND);
> >         }
> >         ft.sourceTerm = ontGeneral.getOrCreateTerm(Messages.getString("termVR"));
> >         ft.typeTerm = ontFeatures.getOrCreateTerm(Messages.getString("termORF"));
> >         ft.annotation = new SimpleRichAnnotation();
> >         ft.featureRelationshipSet = new TreeSet();
> >         ft.rankedCrossRefs = new TreeSet();
> >         SimpleRichFeature featureORF = (SimpleRichFeature) featureMRNA.createFeature(ft);
> >         featureORF.setName(geneName);
> >     }
> >
> >     public static int firstPosNotSpace(String str) {
> >         int i = 0;
> >         while (i < str.length() && str.charAt(i) == ' ') {
> >             i++;
> >         }
> >         return i;
> >     }
> >
> >     public static int lastPosNotSpace(String str) {
> >         int i = str.length() - 1;
> >         while (i >= 0 && str.charAt(i) == ' ') {
> >             i--;
> >         }
> >         return i;
> >     }
> > }
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
>
> --
> Richard Holland
> Finance Director
> Eagle Genomics
> http://www.eaglegenomics.com/
>

--
Augusto F. Vellozo

From holland at eaglegenomics.com Thu Sep 4 06:37:43 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 4 Sep 2008 11:37:43 +0100
Subject: [Biojava-l] org.hibernate.NonUniqueObjectException
Message-ID:

Hi guys.
After several people noticed that they got NonUniqueObjectExceptions after
flushing the BJX cache midway through reading a file into BioSQL, we've
finally got to the bottom of it. Several classes, including the sequence
relationship class and all the file format classes, held static references
to Term objects. Those references survived the cache flush and were
therefore carried over into the new Hibernate session, causing "duplicate
object", "non-unique object", "object is from another session" and similar
exceptions.

I've gone through and removed all the static references, so things should
now behave properly. Committed to SVN just now.

Please let me know if this continues to be a problem and I'll take another
look.

cheers,
Richard

--
Richard Holland, BSc
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From su24 at st-andrews.ac.uk Thu Sep 4 07:55:19 2008
From: su24 at st-andrews.ac.uk (Saif Ur-Rehman)
Date: Thu, 4 Sep 2008 12:55:19 +0100
Subject: [Biojava-l] Alphabet issues
Message-ID: <1220529319.48bfcca785d9c@webmail.st-andrews.ac.uk>

Dear Concerned,

I am attempting to read in a FASTA file of amino acid sequences using the
Alphabet "PROTEIN-TERM", and I am sporadically getting an
IllegalSymbolException which states:

  "This tokenization doesn't contain character: 'M'"

The code causing the exception is as follows:

  BufferedInputStream is = new BufferedInputStream(new FileInputStream(filename));
  Alphabet alpha = ProteinTools.getTAlphabet();
  SequenceDB db = SeqIOTools.readFasta(is, alpha);

Thanking you in advance,
Saif

-------------------------------------------------------------------------------
Saif Ur-Rehman
Research Student
Phone: +44 (0)1334 46 3362
Email: su24 at st-andrews.ac.uk
http://bio.st-andrews.ac.uk/staff/su24.htm
The Centre for Evolution, Genes & Genomics (CEGG), University of St Andrews

------------------------------------------------------------------
University of St Andrews Webmail: https://webmail.st-andrews.ac.uk

From klerissonpaixao at gmail.com Sun Sep 7 22:29:08 2008
From: klerissonpaixao at gmail.com (Klérisson Paixão)
Date: Mon, 8 Sep 2008 00:29:08 -0200
Subject: [Biojava-l] Integer aligner
Message-ID: <9e364a990809071929h3208c4cdyb0d5bf9390faea41@mail.gmail.com>

Hey everyone!

I really need to align integers. My alphabet won't be DNA, RNA or protein;
it must be integers (numbers). How can I do this using BioJava?

Thanks a lot!
Klerisson

From dtoomey at rcsi.ie Tue Sep 9 04:38:52 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Tue, 9 Sep 2008 09:38:52 +0100
Subject: [Biojava-l] Parsing multiple query blast result
Message-ID:

Hi

I have been using the BLAST parser example in the cookbook, and it works
fine for a BLAST result with hits from a single query sequence. However,
when I process a file containing BLAST results from a search with multiple
query sequences, the ArrayList returned from the parse method contains only
a single entry (from the first query sequence). Is there something I need
to do to get the parser to return the results for all the query sequences?
I am using BioJava 1.6.

Thanks
Dave

From augustovmail-java at yahoo.com.br Tue Sep 9 15:03:26 2008
From: augustovmail-java at yahoo.com.br (Augusto Fernandes Vellozo)
Date: Tue, 9 Sep 2008 21:03:26 +0200
Subject: [Biojava-l] Error in SimpleRichLocation.contains()
Message-ID: <381a3e850809091203q1e88e5d5v7ce03b9965560b96@mail.gmail.com>

Hi everyone.

I think there is an error in the method contains() of SimpleRichLocation.

If I have the locations
loc1 = (1..10),
loc2 = join(2..4, 6..8),
then loc1.contains(loc2) should be true, right?

But... the program below prints false.
Does anyone know what's happening?
Thanks a lot,

Augusto

import java.util.ArrayList;

import org.biojavax.bio.seq.CompoundRichLocation;
import org.biojavax.bio.seq.RichLocation;
import org.biojavax.bio.seq.SimplePosition;
import org.biojavax.bio.seq.SimpleRichLocation;

public class Test
{
    public static void main(String[] args) {
        RichLocation loc1 = new SimpleRichLocation(new SimplePosition(1), new SimplePosition(10), 0);
        RichLocation loc2 = new SimpleRichLocation(new SimplePosition(2), new SimplePosition(4), 0);
        RichLocation loc3 = new SimpleRichLocation(new SimplePosition(6), new SimplePosition(8), 0);
        // loc4 = join(2..4, 6..8)
        ArrayList a = new ArrayList();
        a.add(loc2);
        a.add(loc3);
        CompoundRichLocation loc4 = new CompoundRichLocation(a);
        // expected to print true, since 1..10 spans both blocks
        System.out.println(loc1.contains(loc4));
    }
}

--
Augusto F. Vellozo

From holland at eaglegenomics.com Tue Sep 9 19:42:49 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 10 Sep 2008 00:42:49 +0100
Subject: [Biojava-l] Error in SimpleRichLocation.contains()
In-Reply-To: <381a3e850809091203q1e88e5d5v7ce03b9965560b96@mail.gmail.com>
References: <381a3e850809091203q1e88e5d5v7ce03b9965560b96@mail.gmail.com>
Message-ID:

Yes, you're right: loc1 should contain loc2, and contains() should return
true when asked to confirm this.

Can you raise a bug report with the details you gave in your email so that
we don't lose track of it? The place to do this is
http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava

cheers,
Richard

--
Richard Holland, BSc
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Tue Sep 9 21:12:00 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 10 Sep 2008 02:12:00 +0100
Subject: [Biojava-l] Parsing multiple query blast result
In-Reply-To:
References:
Message-ID:

Hmm... looks like a genuine bug. I can't see anything immediately wrong in
the source code, and the BLAST results file looks normal too.

What I think has happened is that the BLASTP output format for multiple
input query sequences has changed recently (I see you're using the March
2008 version) and the parser doesn't recognise it yet. If you have access
to an older version of BLAST, could you try repeating your query with it
and parsing the output, to see whether this theory is correct?

Whatever happens, could you raise a bug at http://bugzilla.open-bio.org/
with the details? Hopefully someone else will then be able to take a look
at this some time this week (I'm away for the next few days now).
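A possible interim workaround, offered here as an untested assumption rather than a confirmed fix, is to split a multi-query report into one single-query report per query and parse each chunk separately. The sketch below assumes that every query's section begins with a line starting with `Query=` and that the lines before the first such line form a shared header worth copying onto each chunk. `SplitBlastReport` and `splitByQuery` are hypothetical names, not BioJava API.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitBlastReport {

    /**
     * Hypothetical helper: splits a plain-text multi-query BLAST report into
     * one report per query. ASSUMPTION: every query section begins with a
     * line starting with "Query=", and everything before the first such line
     * is a shared header that is copied onto each chunk. Returns an empty
     * list if no "Query=" line is found.
     */
    public static List<String> splitByQuery(String report) {
        StringBuilder header = new StringBuilder();
        List<String> chunks = new ArrayList<String>();
        StringBuilder current = null;
        for (String line : report.split("\r?\n")) {
            if (line.startsWith("Query=")) {
                if (current != null) {
                    chunks.add(current.toString()); // close the previous query's chunk
                }
                current = new StringBuilder(header); // start a new chunk with the shared header
            }
            if (current == null) {
                header.append(line).append('\n'); // still before the first "Query=" line
            } else {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String report = "BLASTP 2.2.18\n\nQuery= first\nhit A\nQuery= second\nhit B\n";
        List<String> parts = splitByQuery(report);
        System.out.println(parts.size() + " query chunks");
        for (String chunk : parts) {
            System.out.print(chunk);
            System.out.println("----");
        }
    }
}
```

Each chunk could then be handed in turn to the cookbook parsing code. If that code feeds the parser an `org.xml.sax.InputSource`, one option is `new InputSource(new StringReader(chunk))`. Whether the BioJava 1.6 parser accepts a header-plus-single-query chunk produced this way has not been verified.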
cheers,
Richard

2008/9/9 David Toomey :
> Hi Richard
>
> Thanks very much
>
> I was coding in NetBeans, so I have just included the source. I did split
> the code up into a few separate classes, but it is all taken directly from
> the cookbook. This is in ie.rar and is kicked off from Main.java.
> The BLAST result file is also included in "test query.rar".
>
> The output I get when I run it is:
>
> run:
> Starting
> databaseId : new_prot_target_for_download.fasta 4566 sequences;
> 2,083,676 total letters
> program : ncbi-blastp
> queryLength : 284
> queryId : gi|5524211|gb|AAD44166.1|
> version : 2.2.18
> Hits:
> match: Cytochrome e score: 1.0E-108
> match: Cytochrome e score: 1.0E-13
> match: Cytochrome e score: 1.0E-9
>
> BUILD SUCCESSFUL (total time: 2 seconds)
>
> Cheers
>
> Dave
>
> -----Original Message-----
> From: Richard Holland [mailto:dicknetherlands at gmail.com]
> Sent: 09 September 2008 15:32
> To: David Toomey
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] Parsing multiple query blast result
>
> Would it be possible for you to send the BLAST file and the actual source
> code you are working with?
>
> The list won't accept attachments, but you can send them directly to
> me. If I can't fix it quickly or see anything obviously wrong with it,
> I'll let you know ASAP so that you can post it back to the list for
> someone else to investigate. (I'm going to be out at meetings for most
> of the next 10 days, so you probably don't want to wait for me to get
> back!)
>
> cheers,
> Richard

--
Richard Holland, BSc
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From augustovmail-java at yahoo.com.br Thu Sep 11 05:54:28 2008
From: augustovmail-java at yahoo.com.br (Augusto Fernandes Vellozo)
Date: Thu, 11 Sep 2008 11:54:28 +0200
Subject: [Biojava-l] Getting feature by qualifier_value
In-Reply-To: <381a3e850809110253qb107cffld5b3d2ba5e57c7ca@mail.gmail.com>
References: <381a3e850809110253qb107cffld5b3d2ba5e57c7ca@mail.gmail.com>
Message-ID: <381a3e850809110254x6829eb5fo3a4284edd5deb547@mail.gmail.com>

Hi all.

I don't know Hibernate very well, and I am having problems writing a query
against a BioSQL database using BioJavaX.

Given an organism (NCBITaxon), I need to get the feature that has a
specific value for a given property (the field value of the table
SeqFeature_qualifier_value).

Does anybody know how I can do this?

Thanks a lot,

--
Augusto F. Vellozo

From zagato.gekko at gmail.com Thu Sep 11 18:20:07 2008
From: zagato.gekko at gmail.com (Zagato)
Date: Thu, 11 Sep 2008 17:20:07 -0500
Subject: [Biojava-l] Getting feature by qualifier_value
In-Reply-To: <381a3e850809110254x6829eb5fo3a4284edd5deb547@mail.gmail.com>
References: <381a3e850809110253qb107cffld5b3d2ba5e57c7ca@mail.gmail.com>
	<381a3e850809110254x6829eb5fo3a4284edd5deb547@mail.gmail.com>
Message-ID: <98028b00809111520s3237927drf7770b95bc077cd2@mail.gmail.com>

The process for getting an object using Hibernate is something like this:

TestClass     // this is the class
objTestClass  // this is the object of that class

    TestClass objTestClass = null;
    hbn_session.clear();
    Query query = hbn_session.createQuery("from TestClass where attribute = :attribute");
    query.setParameter("attribute", 55);  // here you set the value for the specific where clause
    List list = query.list();             // you get the results that match
    if (list.size() == 1)
        objTestClass = (TestClass) list.get(0);

Remember, you must have the XML mapping file for Hibernate. I don't
remember it well, but there is a better example in the BioJava
documentation
(http://www.biojava.org/wiki/BioJava:Cookbook#BioSQL_and_Sequence_Databases).

I hope this helps. Bye!!!

Alan Jairo Acosta

--
Farewell.
http://www.youtube.com/zagatogekko
ruby << __EOF__
puts [ 111, 116, 97, 103, 97, 90 ].collect{|v| v.chr}.join.reverse
__EOF__

From holland at eaglegenomics.com Fri Sep 12 11:59:05 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 12 Sep 2008 16:59:05 +0100
Subject: [Biojava-l] Getting feature by qualifier_value
In-Reply-To: <98028b00809111520s3237927drf7770b95bc077cd2@mail.gmail.com>
References: <381a3e850809110253qb107cffld5b3d2ba5e57c7ca@mail.gmail.com>
	<381a3e850809110254x6829eb5fo3a4284edd5deb547@mail.gmail.com>
	<98028b00809111520s3237927drf7770b95bc077cd2@mail.gmail.com>
Message-ID:

Augusto,

If you can adapt Zagato's code to load the sequence object you require,
you will find that the seqfeature_qualifier_value entries are accessible by
iterating over the sequence object's features and then, within each
feature, iterating over its Notes.

cheers,
Richard

--
Richard Holland, BSc
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From augustovmail-java at yahoo.com.br Fri Sep 12 12:13:02 2008
From: augustovmail-java at yahoo.com.br (Augusto Fernandes Vellozo)
Date: Fri, 12 Sep 2008 18:13:02 +0200
Subject: [Biojava-l] Getting feature by qualifier_value
In-Reply-To:
References: <381a3e850809110253qb107cffld5b3d2ba5e57c7ca@mail.gmail.com>
	<381a3e850809110254x6829eb5fo3a4284edd5deb547@mail.gmail.com>
	<98028b00809111520s3237927drf7770b95bc077cd2@mail.gmail.com>
Message-ID: <381a3e850809120913k1e25cean8585a8082e99cbc2@mail.gmail.com>

Hi Richard,

The problem is obtaining the sequence object, because the sequence object
I require depends on the seqfeature_qualifier_value. I don't have the
sequence object; I need to search for and obtain the sequence related to a
value stored in seqfeature_qualifier_value. I know how to do this with an
SQL query directly, but I don't know how to do it using Hibernate.

Thanks,

Augusto

--
Augusto F. Vellozo

From holland at eaglegenomics.com Fri Sep 12 12:45:18 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 12 Sep 2008 17:45:18 +0100
Subject: [Biojava-l] Getting feature by qualifier_value
In-Reply-To: <381a3e850809120913k1e25cean8585a8082e99cbc2@mail.gmail.com>
References: <381a3e850809110253qb107cffld5b3d2ba5e57c7ca@mail.gmail.com>
	<381a3e850809110254x6829eb5fo3a4284edd5deb547@mail.gmail.com>
	<98028b00809111520s3237927drf7770b95bc077cd2@mail.gmail.com>
	<381a3e850809120913k1e25cean8585a8082e99cbc2@mail.gmail.com>
Message-ID:

Ah, in that case, have a read of this previous post on the same subject:
http://www.nabble.com/How-to-query-%27value%27-in-%27seqfeature_qualifier_value%27-using-HQL-td17511584.html

--
Richard Holland, BSc
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From paolo.romano at istge.it Tue Sep 16 09:00:33 2008
From: paolo.romano at istge.it (Paolo Romano)
Date: Tue, 16 Sep 2008 15:00:33 +0200
Subject: [Biojava-l] CFP: Semantic Web Applications and Tools for Life Sciences (SWAT4LS)
Message-ID: <200809161315.m8GDFxuG036232@ibm43p.biotech.ist.unige.it>

Apologies if you're receiving multiple copies.

----------------------------------------------------------------

International Workshop on
Semantic Web Applications and Tools for Life Sciences (SWAT4LS)
28th November 2008, Edinburgh, UK

http://www.swat4ls.org/

CALL FOR PAPERS

Overview
--------

The workshop is organized in sessions and open discussions. Invited
speakers will present state-of-the-art, provocative lectures on the
workshop's main topic, while a number of submissions will be accepted as
oral presentations and posters on all workshop topics.

Workshop Description
--------------------

Semantic Web technologies, tools and applications are starting to emerge
in the Life Sciences. In recent years, systems have been introduced and
interest among researchers is increasing.
This workshop will provide a venue to present and discuss the benefits and limits of adopting these technologies and tools in biomedical informatics and computational biology. It will showcase experiences, information resources, tool development and applications. It will bring together researchers, both developers and users, from the various fields of Biology, Bioinformatics and Computer Science, to discuss goals, current limits and some real use cases for Semantic Web technologies in Life Sciences.

Keynote Speakers
------------------------
+ Semantic web technology in translational cancer research, Michael Krauthammer, Department of Pathology, Yale University School of Medicine, USA
+ Using Ontologies to bring Web Services on to the Semantic Web, Mark Wilkinson, Dept. of Medical Genetics, University of British Columbia, Canada

Workshop Venue and Format
------------------------
The workshop will take place in Edinburgh, Scotland, on 28 November 2008, and is hosted by the e-Science Institute of the UK's National e-Science Centre (NeSC). SWAT4LS will be a one-day workshop and will consist of two invited talks and regular paper and poster presentations. The workshop will conclude with a panel discussion on the strengths and weaknesses of the Semantic Web for the Life Sciences.

Deadlines
---------
* Submission deadline (both papers and posters): 30 September 2008
* Notification of acceptance: 20 October 2008
* Camera-ready submission: 3 November 2008

(Authors who are interested in submitting a paper/poster to the workshop, but may require a short extension to the deadline, should get in touch with the organisers.)

Topics of Interest
------------------
Topics of interest include, but are not limited to:

* Standards, Technologies, Tools for the Semantic Web
  o Semantic Web standards (RDF, OWL, ...)
  o RDF/OWL, SKOS, ... and their applicability to bioinformatics
  o RDF Schemas and Query systems
  o Biomedical Ontologies and related tools
  o Formal approaches to large biomedical controlled terminologies and vocabularies
* Systems for a Semantic Web for Bioinformatics
  o Bio-ontologies, RDF stores, Semantic Web Services
  o RDF repositories and query systems for life sciences
  o Semantically aware biomedical Web Services
  o Semantic Biological Data Integration Systems
* Existing and prospective applications of the Semantic Web for Bioinformatics
  o Semantic browsers, Semantic collaborative research
  o Case studies, use cases, and scenarios
  o Semantic Web applications in life sciences

Type of contributions
----------------------
The following types of original contribution are sought:

* Oral communications (regular papers)
* Posters
* Software demos

All accepted oral communications and posters will be published with the CEUR-WS.org Workshop Proceedings service (see http://ceur-ws.org/). Furthermore, a selection of papers will be published in a special issue of the BMC Bioinformatics journal devoted to the SWAT4LS workshop. To this end, a special Call will be launched shortly after the workshop, for extended and revised versions of contributions submitted to the workshop and accepted either as oral communication or poster.
Workshop Chairs
----------------
+ Albert Burger, School of Mathematical and Computer Sciences, Heriot-Watt University, and Human Genetics Unit, Medical Research Council, Edinburgh, United Kingdom
+ Adrian Paschke, Biotechnology Centre, TU Dresden, Dresden, Germany
+ Paolo Romano, Bioinformatics, National Cancer Research Institute, Genova, Italy
+ Andrea Splendiani, Medical Informatics Department, University of Rennes 1, Rennes, France

=====================================================================
In Co-operation with:
- National Cancer Research Institute, Genova, Italy
- Biotechnology Centre, TU Dresden, Dresden, Germany
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, United Kingdom
- Université de Rennes 1, Rennes, France
- e-Science Institute, Edinburgh, Scotland, United Kingdom
- SeaLife Project, European Commission Information Society and Media
- Laboratory for Interdisciplinary Technologies in Bioinformatics (LITBIO), Italy
=====================================================================

For any information, refer to info at swat4ls.org.

Paolo Romano (paolo.romano at istge.it)
Bioinformatics
National Cancer Research Institute (IST)
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288
Fax: +39-010-5737-295

From big.swish at gmail.com Thu Sep 18 23:37:59 2008
From: big.swish at gmail.com (Doug Swisher)
Date: Thu, 18 Sep 2008 22:37:59 -0500
Subject: [Biojava-l] How to find a sequence within a larger sequence and flip it
Message-ID: <31f1101d0809182037o13459ec5s282258967738aeb2@mail.gmail.com>

Hi,

I'm pretty new to BioJava, and I'm a bit stuck. I'm hoping someone can help out a bit...even if it's just a hint as to where to look next.

I have a long DNA sequence and a shorter sequence that exists within the larger one. I want to find the location of the smaller sequence within the larger one, and then create a new sequence with the small one flipped end-for-end.
That's confusing, so let me give an example.

Long sequence: aaaagacttttt
Short sequence: gact
Goal sequence: aaaatcagtttt

To find the location of the short sequence within the larger one, I could certainly do some string manipulation:

SymbolList bigDNA = DNATools.createDNA("aaaagacttttt");
SymbolList subDNA = DNATools.createDNA("gact");
int start = bigDNA.seqString().indexOf(subDNA.seqString());

While that would work, I'm wondering if there is a more efficient method that avoids the conversion to strings (in my real code, I start with Sequences, not strings; I used SymbolLists here for simplicity).

To "excise" the short sequence, flip it around, and construct a new SymbolList, I could also do some string manipulation, as in the following:

StringBuilder middle = new StringBuilder(subDNA.seqString());
String leftPart = bigDNA.seqString().substring(0, start);
String rightPart = bigDNA.seqString().substring(start + subDNA.length(), bigDNA.length());
SymbolList goalDNA = DNATools.createDNA(leftPart + middle.reverse() + rightPart);

Looking at the documentation, such as ProjectionUtils or SymbolList.edit(), it appears there might be some support for manipulating the sequence directly. Is there a way to do it without again dropping "down" to strings?

Thanks in advance for any assistance.

Cheers,
-Doug

P.S. Yeah, the second code snippet is pretty inefficient; I was trying to be clear rather than efficient.

From holland at eaglegenomics.com Fri Sep 19 04:42:50 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 19 Sep 2008 09:42:50 +0100
Subject: [Biojava-l] How to find a sequence within a larger sequence and flip it
In-Reply-To: <31f1101d0809182037o13459ec5s282258967738aeb2@mail.gmail.com>
References: <31f1101d0809182037o13459ec5s282258967738aeb2@mail.gmail.com>
Message-ID:

Hello.

To be honest, I think you've already got the only way to quickly locate a subsequence within a sequence.
For whatever reason, the Sequence and SymbolList interfaces lack any kind of indexOf() or find() functions, and the SequenceTools class, usually the provider of all things useful, also fails to fill the gap. You're right about there being a SymbolList edit facility. This only works on SymbolLists that have declared themselves editable, which will depend on how your SymbolList objects were created. What you do is create a new Edit object, based on starting position in the original sequence, length of sequence to remove in the original, and the SymbolList you want to use to replace the removed bits. Then you pass this to the edit() method on the SymbolList/Sequence object you want to replace. So, the end result is only a small improvement on your original plan, but here goes: 1. Create your sequence. 2. Create your other sequence. 3. Convert both to strings and use an indexOf in the String object to locate the subsequence in the original sequence. 4. Use string tools to flip the subsequence then create a new SymbolList based on it. 5. If the original sequence is editable, use the Edit method described above to replace a chunk of it with the new flipped subsequence. Otherwise, construct a new string using the String object methods and construct a new original sequence based on that instead. cheers. Richard 2008/9/19 Doug Swisher : > Hi, > > I'm pretty new to BioJava, and I'm a bit stuck. I'm hoping someone can help > out a bit...even if it's just a hint as to where to look next. > > I have a long DNA sequence and a shorter sequence that exists within the > larger one. I want to find the location of the smaller sequence within the > larger one, and then create a new sequence with the small one flipped > end-for-end. That's confusing, so let me give an example. 
> > Long sequence: aaaagacttttt > Short sequence: gact > Goal sequence: aaaatcagtttt > > To find the location of the short sequence within the larger one, I could > certainly do some string manipulation: > > SymbolList bigDNA = DNATools.createDNA("aaaagacttttt"); > SymbolList subDNA = DNATools.createDNA("gact"); > int start = bigDNA.seqString().indexOf(subDNA.seqString()); > > While that would work, I'm wondering if there is a more efficient method > that avoids the conversion to strings (in my real code, I start with > Sequences, not strings; I used SymbolLists here for simplicity). > > To "excise" the short sequence, flip it around, and construct a new > SymbolList, I could also do some string manipulation, as in the following: > > StringBuilder middle = new StringBuilder(subDNA.seqString()); > String leftPart = bigDNA.seqString().substring(0, subDNA.length()); > String rightPart = bigDNA.seqString().substring(start + subDNA.length(), > bigDNA.length()); > SymbolList goalDNA = DNATools.createDNA(leftPart + middle.reverse() + > rightPart); > > Looking at the documentation, such as ProjectionUtils or SymbolList.edit(), > it appears there might be some support for manipulating the sequence > directly. Is there a way to do it, without again dropping "down" to > strings? > > Thanks in advance for any assistance. > > Cheers, > -Doug > > P.S. Yeah, the second code snippet is pretty inefficient; I was trying to be > clear rather than efficient. 
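Richard's steps 3 to 5 can be sketched in plain Java as below. This is a minimal illustration, not BioJava code: it works on the seqString() text, and spliceOneBased is a hypothetical helper whose (pos, length, replacement) arguments only mimic the Edit object Richard describes, using 1-based positions on the assumption that this matches SymbolList indexing. Note that in the quoted snippet the left part is taken as substring(0, subDNA.length()); the prefix should end at start instead, and the snippet only works here because both values happen to be 4.

```java
// Sketch of the string-based plan (find, flip, splice), using only java.lang.
// spliceOneBased is a hypothetical helper that mimics the (pos, length, replacement)
// arguments of an Edit; pos is 1-based here, assumed to match SymbolList positions.
class FlipSubsequence {

    /** Replaces `length` characters starting at 1-based `pos` with `replacement`. */
    static String spliceOneBased(String seq, int pos, int length, String replacement) {
        int start = pos - 1; // convert to a 0-based String index
        if (start < 0 || length < 0 || start + length > seq.length()) {
            throw new IndexOutOfBoundsException("edit lies outside the sequence");
        }
        return seq.substring(0, start) + replacement + seq.substring(start + length);
    }

    /** Finds the first exact occurrence of sub in seq and reverses it in place. */
    static String flipFirstOccurrence(String seq, String sub) {
        int start = seq.indexOf(sub); // step 3: 0-based index, or -1 if absent
        if (start < 0) {
            return seq; // nothing to flip
        }
        String flipped = new StringBuilder(sub).reverse().toString(); // step 4
        return spliceOneBased(seq, start + 1, sub.length(), flipped); // step 5
    }

    public static void main(String[] args) {
        System.out.println(flipFirstOccurrence("aaaagacttttt", "gact")); // prints aaaatcagtttt
    }
}
```

With a genuinely editable SymbolList, the last step would be the edit() call Richard mentions; for a non-editable one, the rebuilt string can be handed back to DNATools.createDNA, as in Doug's snippet.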
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com Fri Sep 19 08:43:44 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 19 Sep 2008 20:43:44 +0800
Subject: [Biojava-l] How to find a sequence within a larger sequence and flip it
In-Reply-To:
References: <31f1101d0809182037o13459ec5s282258967738aeb2@mail.gmail.com>
Message-ID: <93b45ca50809190543g18cb5e93rc51245cfe1904ac@mail.gmail.com>

Hi -

You don't have to go to a String to make a match. There is a class SymbolListCharSequence that wraps a SymbolList as a CharSequence, which lets you use regexes etc. to identify the match. You can also use the KnuthMorrisPrattSearch class to find exact matches.

Finally, to find non-exact matches you can use the SmithWaterman or NeedlemanWunsch classes.

- Mark

On Fri, Sep 19, 2008 at 4:42 PM, Richard Holland wrote:
> Hello.
>
> To be honest, I think you've already got the only way to quickly
> locate a subsequence within a sequence. For whatever reason, the
> Sequence and SymbolList interfaces lack any kind of indexOf() or
> find() functions, and the SequenceTools class, usually the provider of
> all things useful, also fails to fill the gap.
>
> You're right about there being a SymbolList edit facility. This only
> works on SymbolLists that have declared themselves editable, which
> will depend on how your SymbolList objects were created. What you do
> is create a new Edit object, based on starting position in the
> original sequence, length of sequence to remove in the original, and
> the SymbolList you want to use to replace the removed bits. Then you
> pass this to the edit() method on the SymbolList/Sequence object you
> want to replace.
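Mark's point that an exact match does not require String.indexOf can be illustrated with a plain-Java Knuth-Morris-Pratt search. This is a stand-alone sketch of the algorithm, not the API of the KnuthMorrisPrattSearch class he mentions; it runs over a generic List so that any symbol type with a sensible equals() can be searched without first building a String, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone Knuth-Morris-Pratt exact search over a generic list of symbols.
class KmpSearch {

    /** fail[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix of it. */
    static <T> int[] failureTable(List<T> pattern) {
        int[] fail = new int[pattern.size()];
        int k = 0;
        for (int i = 1; i < pattern.size(); i++) {
            while (k > 0 && !pattern.get(i).equals(pattern.get(k))) {
                k = fail[k - 1]; // fall back to the next shorter border
            }
            if (pattern.get(i).equals(pattern.get(k))) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    /** Returns the 0-based start of every exact (possibly overlapping) match of pattern in text. */
    static <T> List<Integer> findAll(List<T> text, List<T> pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits;
        }
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern symbols currently matched
        for (int i = 0; i < text.size(); i++) {
            while (k > 0 && !text.get(i).equals(pattern.get(k))) {
                k = fail[k - 1];
            }
            if (text.get(i).equals(pattern.get(k))) {
                k++;
            }
            if (k == pattern.size()) {
                hits.add(i - k + 1);
                k = fail[k - 1]; // keep going so overlapping matches are found
            }
        }
        return hits;
    }

    /** Convenience: turns a String into a list of Character symbols for the demo. */
    static List<Character> symbols(String s) {
        List<Character> out = new ArrayList<>(s.length());
        for (char c : s.toCharArray()) {
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(symbols("aaaagacttttt"), symbols("gact"))); // prints [4]
    }
}
```

findAll returns 0-based offsets, so one would add 1 before using a hit as a SymbolList position or as the start of an Edit (assuming 1-based SymbolList indexing). The failure table lets the scan move forward without re-reading text symbols, so the cost is proportional to the text length plus the pattern length.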
> > So, the end result is only a small improvement on your original plan, > but here goes: > > 1. Create your sequence. > 2. Create your other sequence. > 3. Convert both to strings and use an indexOf in the String object to > locate the subsequence in the original sequence. > 4. Use string tools to flip the subsequence then create a new > SymbolList based on it. > 5. If the original sequence is editable, use the Edit method > described above to replace a chunk of it with the new flipped > subsequence. Otherwise, construct a new string using the String object > methods and construct a new original sequence based on that instead. > > cheers. > Richard > > 2008/9/19 Doug Swisher : > > Hi, > > > > I'm pretty new to BioJava, and I'm a bit stuck. I'm hoping someone can > help > > out a bit...even if it's just a hint as to where to look next. > > > > I have a long DNA sequence and a shorter sequence that exists within the > > larger one. I want to find the location of the smaller sequence within > the > > larger one, and then create a new sequence with the small one flipped > > end-for-end. That's confusing, so let me give an example. > > > > Long sequence: aaaagacttttt > > Short sequence: gact > > Goal sequence: aaaatcagtttt > > > > To find the location of the short sequence within the larger one, I could > > certainly do some string manipulation: > > > > SymbolList bigDNA = DNATools.createDNA("aaaagacttttt"); > > SymbolList subDNA = DNATools.createDNA("gact"); > > int start = bigDNA.seqString().indexOf(subDNA.seqString()); > > > > While that would work, I'm wondering if there is a more efficient method > > that avoids the conversion to strings (in my real code, I start with > > Sequences, not strings; I used SymbolLists here for simplicity). 
> > > > To "excise" the short sequence, flip it around, and construct a new > > SymbolList, I could also do some string manipulation, as in the > following: > > > > StringBuilder middle = new StringBuilder(subDNA.seqString()); > > String leftPart = bigDNA.seqString().substring(0, subDNA.length()); > > String rightPart = bigDNA.seqString().substring(start + > subDNA.length(), > > bigDNA.length()); > > SymbolList goalDNA = DNATools.createDNA(leftPart + middle.reverse() + > > rightPart); > > > > Looking at the documentation, such as ProjectionUtils or > SymbolList.edit(), > > it appears there might be some support for manipulating the sequence > > directly. Is there a way to do it, without again dropping "down" to > > strings? > > > > Thanks in advance for any assistance. > > > > Cheers, > > -Doug > > > > P.S. Yeah, the second code snippet is pretty inefficient; I was trying to > be > > clear rather than efficient. > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From big.swish at gmail.com Fri Sep 19 10:20:08 2008 From: big.swish at gmail.com (Doug Swisher) Date: Fri, 19 Sep 2008 09:20:08 -0500 Subject: [Biojava-l] How to find a sequence within a larger sequence and flip it In-Reply-To: <93b45ca50809190543g18cb5e93rc51245cfe1904ac@mail.gmail.com> References: <31f1101d0809182037o13459ec5s282258967738aeb2@mail.gmail.com> <93b45ca50809190543g18cb5e93rc51245cfe1904ac@mail.gmail.com> Message-ID: <31f1101d0809190720vb492313h69b5dc29efd38c31@mail.gmail.com> Mark & Richard, Thanks for the quick responses. 
It looks like the combination of KnuthMorrisPrattSearch and edit() will do just what I need. FYI, the SymbolListCharSequence won't work for me, as I'm actually porting the code to .NET, and the .NET Regex engine isn't flexible enough to accept a non-string. (Please don't hate me; I'm working in a Java-averse environment, and I want to take advantage of all the BioJava goodness.)

Cheers,
-Doug

On Fri, Sep 19, 2008 at 7:43 AM, Mark Schreiber wrote: > Hi - > > You don't have to go to a String to make a match. There is a class > SymbolListCharSequence that wraps a SymbolList as a CharSequence that lets > you perform Regexs etc to identify the match. You can also use the > KnuthMorrisPrattSearch to find exact matches. > > Finally to find non-exact matches you can use the SmithWaterman or > Needleman Wunsch. > > - Mark > > On Fri, Sep 19, 2008 at 4:42 PM, Richard Holland < > holland at eaglegenomics.com> wrote: > >> Hello. >> >> To be honest, I think you've already got the only way to quickly >> locate a subsequence within a sequence. For whatever reason, the >> Sequence and SymbolList interfaces lack any kind of indexOf() or >> find() functions, and the SequenceTools class, usually the provider of >> all things useful, also fails to fill the gap. >> >> You're right about there being a SymbolList edit facility. This only >> works on SymbolLists that have declared themselves editable, which >> will depend on how your SymbolList objects were created. What you do >> is create a new Edit object, based on starting position in the >> original sequence, length of sequence to remove in the original, and >> the SymbolList you want to use to replace the removed bits. Then you >> pass this to the edit() method on the SymbolList/Sequence object you >> want to replace. >> >> So, the end result is only a small improvement on your original plan, >> but here goes: >> >> 1. Create your sequence. >> 2. Create your other sequence. >> 3.
Convert both to strings and use an indexOf in the String object to >> locate the subsequence in the original sequence. >> 4. Use string tools to flip the subsequence then create a new >> SymbolList based on it. >> 5. If the original sequence is editable, use the Edit method >> described above to replace a chunk of it with the new flipped >> subsequence. Otherwise, construct a new string using the String object >> methods and construct a new original sequence based on that instead. >> >> cheers. >> Richard >> >> 2008/9/19 Doug Swisher : >> > Hi, >> > >> > I'm pretty new to BioJava, and I'm a bit stuck. I'm hoping someone can >> help >> > out a bit...even if it's just a hint as to where to look next. >> > >> > I have a long DNA sequence and a shorter sequence that exists within the >> > larger one. I want to find the location of the smaller sequence within >> the >> > larger one, and then create a new sequence with the small one flipped >> > end-for-end. That's confusing, so let me give an example. >> > >> > Long sequence: aaaagacttttt >> > Short sequence: gact >> > Goal sequence: aaaatcagtttt >> > >> > To find the location of the short sequence within the larger one, I >> could >> > certainly do some string manipulation: >> > >> > SymbolList bigDNA = DNATools.createDNA("aaaagacttttt"); >> > SymbolList subDNA = DNATools.createDNA("gact"); >> > int start = bigDNA.seqString().indexOf(subDNA.seqString()); >> > >> > While that would work, I'm wondering if there is a more efficient method >> > that avoids the conversion to strings (in my real code, I start with >> > Sequences, not strings; I used SymbolLists here for simplicity). 
>> > >> > To "excise" the short sequence, flip it around, and construct a new >> > SymbolList, I could also do some string manipulation, as in the >> following: >> > >> > StringBuilder middle = new StringBuilder(subDNA.seqString()); >> > String leftPart = bigDNA.seqString().substring(0, subDNA.length()); >> > String rightPart = bigDNA.seqString().substring(start + >> subDNA.length(), >> > bigDNA.length()); >> > SymbolList goalDNA = DNATools.createDNA(leftPart + middle.reverse() + >> > rightPart); >> > >> > Looking at the documentation, such as ProjectionUtils or >> SymbolList.edit(), >> > it appears there might be some support for manipulating the sequence >> > directly. Is there a way to do it, without again dropping "down" to >> > strings? >> > >> > Thanks in advance for any assistance. >> > >> > Cheers, >> > -Doug >> > >> > P.S. Yeah, the second code snippet is pretty inefficient; I was trying >> to be >> > clear rather than efficient. >> > _______________________________________________ >> > Biojava-l mailing list - Biojava-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biojava-l >> > >> >> >> >> -- >> Richard Holland, BSc MBCS >> Finance Director, Eagle Genomics Ltd >> M: +44 7500 438846 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > From dtoomey at rcsi.ie Tue Sep 30 09:26:02 2008 From: dtoomey at rcsi.ie (David Toomey) Date: Tue, 30 Sep 2008 14:26:02 +0100 Subject: [Biojava-l] StringIndexOutOfBoundsException while parsing blast result Message-ID: Hi I am parsing a blast result and I am getting a StringIndexOutOfBoundsException. 
The stack trace is:

    at java.lang.String.substring(String.java:1938)
    at java.lang.String.substring(String.java:1905)
    at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:291)
    at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:116)
    at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:517)
    at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
    at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
    at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:117)
    at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:634)
    at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:341)
    at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:168)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:314)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:163)
    at ie.rcsi.blast.StandardParser.parse(StandardParser.java:65)
    at ie.rcsi.blast.BlastParser.parse(BlastParser.java:44)
    at ie.rcsi.blast.Main.main(Main.java:30)

I have updated BlastLikeAlignmentSAXParser to output some debug info and narrowed the problem down to the following line:

    2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
    GN=ISPF

If I remove the carriage return and put it on a single line, then everything works fine. Strangely, if I copy this entry and put it in a file on its own, it also parses correctly, even with the carriage return!!!

Has anyone seen this before, or does anyone have a suggestion on what I might do to fix it? I can send the complete blast result if it would help.
I have tried using blast 2.2.18 and 2.2.17 and the problem is the same.

Cheers
Dave

From holland at eaglegenomics.com Tue Sep 30 12:31:17 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 30 Sep 2008 17:31:17 +0100
Subject: [Biojava-l] StringIndexOutOfBoundsException while parsing blast result
In-Reply-To:
References:
Message-ID:

Sounds like it _might_ be something to do with the carriage return itself. Is the blast file generated on the same OS that you're running your analysis on? (e.g. you might run Blast on a Linux box, but attempt to parse the file on a Windows box). If the two OSes are different, this might point to it, as Linux won't necessarily understand Windows linebreaks (or vice versa) and might misinterpret them. When you copy the portion of the file to a new file on the OS you're running the analysis on, it will substitute its own local linebreaks and thus mask the problem.

So the first thing I'd check is what the two OSes involved are. If they're different, try running your analysis program on the same OS as the Blast output was generated on. If that does fix it, then try putting your Blast files through dos2unix or something similar to convert the linebreaks before running your analysis program.

If they're the same OS, then we still have a problem!

cheers,
Richard

2008/9/30 David Toomey :
> Hi
>
> I am parsing a blast result and I am getting a
> StringIndexOutOfBoundsException.
> The stack trace is
>
> at java.lang.String.substring(String.java:1938)
> at java.lang.String.substring(String.java:1905)
> at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:291)
> at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:116)
> at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:517)
> at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
> at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
> at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:117)
> at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:634)
> at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:341)
> at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:168)
> at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:314)
> at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
> at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:163)
> at ie.rcsi.blast.StandardParser.parse(StandardParser.java:65)
> at ie.rcsi.blast.BlastParser.parse(BlastParser.java:44)
> at ie.rcsi.blast.Main.main(Main.java:30)
>
> I have updated BlastLikeAlignmentSAXParser to output some debug info and
> narrowed down the line causing the problem to the following line
>
> 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
> GN=ISPF
>
> If I remove the carriage return and put it on a single line then everything
> works fine. Strangely if I copy this entry and put it in a file on it's own
> it also parses correctly, even with the carriage return!!!
> > > > Has anyone seen this before or does anyone have a suggestion on what I might > to do fix it. I send the complete blast result if it would help. I have > tried using blast 2.2.18 and 2.2.17 and the problem is the same. > > > > Cheers > > > > Dave > > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/
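Richard's dos2unix suggestion can also be applied from Java before the report reaches the parser. This is a minimal sketch, assuming the whole BLAST report fits in memory; NormalizeLineEndings and its methods are hypothetical names, and, as Richard says, mismatched line breaks are only a suspected cause of this exception.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that converts a report's line endings to Unix LF before parsing.
class NormalizeLineEndings {

    /** Converts Windows (CRLF) and lone-CR line breaks to Unix LF. */
    static String toUnix(String text) {
        // Replace CRLF first so its CR is not turned into an extra blank line.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Rewrites a file in place with LF line endings, similar in spirit to dos2unix. */
    static void normalizeFile(Path file) throws IOException {
        // ISO-8859-1 maps every byte to one char and back, so no bytes other than
        // the line breaks themselves are altered.
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        Files.write(file, toUnix(text).getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            normalizeFile(Paths.get(arg));
        }
    }
}
```

Running this over the report before handing it to BlastLikeSAXParser would rule line endings in or out: if the exception persists after normalization, the line breaks are probably not the cause.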