From mark.schreiber at group.novartis.com Sun Oct 3 22:03:14 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Sun Oct 3 22:02:51 2004
Subject: [Biojava-l] Per-sequence information from PairwiseAlignment
Message-ID:

Some examples are at:

http://www.biojava.org/docs/bj_in_anger/profileHMM.htm
and
http://www.biojava.org/tutorials/dp-doc.html

Rahul Karnik
Sent by: biojava-l-bounces@portal.open-bio.org
09/20/2004 03:40 AM

        To: biojava-l@biojava.org
        cc: (bcc: Mark Schreiber/GP/Novartis)
        Subject: [Biojava-l] Per-sequence information from PairwiseAlignment

Hello,

[I am a new user of BioJava, so please forgive any obvious errors I may make.]

I was trying to modify the PairwiseAlignment demo to show the traditional output of alignment programs, i.e. the aligned sequences alongside one another. It seems this information is available from:

    StatePath result = aligner.viterbi(seqs, ScoreType.PROBABILITY);

I was able to get result.symbolListForLabel(StatePath.SEQUENCE) and display the symbols, but these seem to have both sequences in each symbol (of class SimpleAtomicSymbol). I could parse out each sequence from the combined SymbolList, but I was wondering if there is a better way to achieve this.

Thanks,
Rahul

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From mark.schreiber at group.novartis.com Mon Oct 4 01:47:30 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Mon Oct 4 01:46:57 2004
Subject: [Biojava-l] BioSQL
Message-ID:

Does anyone have a current BioSQL schema that matches the BioJava bindings? Preferably one for Oracle.

Thanks,

Mark

Mark Schreiber
Principal Scientist (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

From bioinformatica at adinet.com.uy Thu Oct 7 14:59:48 2004
From: bioinformatica at adinet.com.uy (Carlos Montemuiño)
Date: Thu Oct 7 14:56:30 2004
Subject: [Biojava-l] Clustering services
Message-ID: <414D1AF700017EEF@nfs04.in.adinet.com.uy>

Does anyone know whether it is possible to cluster EST (or DNA) sequences with BioJava?

Thanks,
Carlos.

Carlos Montemuiño
Research Scientist (Bioinformatics)
Universidad Católica

From mark.schreiber at group.novartis.com Thu Oct 7 21:44:02 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Thu Oct 7 21:43:20 2004
Subject: [Biojava-l] Clustering services
Message-ID:

You could do it with the DP package but it would be very slow. It would be much better to use one of the specialized algorithms such as GAP3.

- Mark

From lmorris at ebi.ac.uk Thu Oct 21 06:35:40 2004
From: lmorris at ebi.ac.uk (Lorna Morris)
Date: Thu Oct 21 06:34:44 2004
Subject: [Biojava-l] parsing EMBL files
Message-ID: <417790FC.8010100@ebi.ac.uk>

Hi

I'm using BioJava (SeqIOTools.readEmbl()) to parse EMBL flat files. However, I've noticed that after reading EMBL flat files and writing them out again with BioJava, the location data of features can change.

e.g.:

In the original flat file (accession number: AE001273) this location data is present:

FT   CDS             join(complement(536417..536485),complement(535378..536418))

This gets changed by BioJava to:

FT   CDS             complement(535378..536485)

I looked at the BioJava source code and found out that this occurs because BioJava merges overlapping sub-locations in join statements.

If I comment out this code in LocationTools._union():

    if (canMerge(last, cur)) {
        try {
            last = MergeLocation.mergeLocations(last, cur);
        } catch (BioException ex) {
            throw new BioError("Cannot make MergeLocation", ex);
        }
    }

then the location appears as it should in the original EMBL format, with the join describing the overlap. Overlaps between sub-locations in joins are allowed in EMBL format, but are very rare. They may be used to describe frameshifts occurring through sequence errors.

Is there another reason for including this code which perhaps I've missed? I've commented it out and diffed the files produced, and there aren't any other differences with the EMBL file (AE001273.embl) at least.

I just wanted to check whether removing this code might have other side effects.

Many thanks,

Lorna Morris

From mark.schreiber at group.novartis.com Thu Oct 21 21:02:59 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Thu Oct 21 21:02:07 2004
Subject: [Biojava-l] parsing EMBL files
Message-ID:

I guess the logic was: if two sub-locations overlap then they should be merged. Apparently that is not always the case :(

I think that the Location that is formed will be an instance of MergeLocation, in which case it should be possible to recover the sub-locations.

From lmorris at ebi.ac.uk Mon Oct 25 06:14:37 2004
From: lmorris at ebi.ac.uk (Lorna Morris)
Date: Mon Oct 25 06:13:46 2004
Subject: [Biojava-l] parsing EMBL files
In-Reply-To:
References:
Message-ID: <417CD20D.80804@ebi.ac.uk>

Hi Mark

Thanks for your answer. I can see the logic; using an overlap in a join statement is an odd way to represent a frameshift. It seems easier for me not to call mergeLocations in the first place than to recover the sub-locations from the MergeLocation instance.

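A minimal Java sketch of the merge behaviour discussed in this thread, using the join from AE001273. It does not use BioJava: the Span class and the union() and format() helpers are hypothetical stand-ins written only for illustration, and the mergeOverlaps flag plays the role of commenting the merge code in or out. It only shows why collapsing overlapping sub-locations loses the information that the join carried.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class JoinMergeSketch {
    // Hypothetical stand-in for a sub-location; NOT BioJava's Location API.
    static final class Span {
        final int min;
        final int max;

        Span(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean overlaps(Span other) {
            return min <= other.max && other.min <= max;
        }

        @Override
        public String toString() {
            return min + ".." + max;
        }
    }

    // Collect the spans of a join, optionally collapsing overlapping neighbours.
    static List<Span> union(List<Span> parsed, boolean mergeOverlaps) {
        List<Span> out = new ArrayList<>();
        for (Span cur : parsed) {
            Span last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (mergeOverlaps && last != null && last.overlaps(cur)) {
                // One span covering both; the overlap (the frameshift) is no longer visible.
                out.set(out.size() - 1,
                        new Span(Math.min(last.min, cur.min), Math.max(last.max, cur.max)));
            } else {
                out.add(cur);
            }
        }
        return out;
    }

    // Render reverse-strand spans roughly as an EMBL location string.
    static String format(List<Span> spans) {
        if (spans.size() == 1) {
            return "complement(" + spans.get(0) + ")";
        }
        StringBuilder sb = new StringBuilder("join(");
        for (int i = 0; i < spans.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("complement(").append(spans.get(i)).append(')');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<Span> parsed = Arrays.asList(new Span(536417, 536485), new Span(535378, 536418));
        System.out.println(format(union(parsed, true)));   // merged into a single span
        System.out.println(format(union(parsed, false)));  // original join preserved
    }
}
```

With merging switched off, the written location matches the original flat file; with it switched on, the two spans become one.
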
I spotted something else whilst parsing EMBL files. Sorting sub-locations by their natural order (min, max values) doesn't always work if the CompoundLocation overlaps the origin. For example, the following join statement occurs in the EMBL file (AE001273.embl):

join(1041920..1042519,1..1176)

From this BioJava creates the following CompoundLocation:

1176, 1041920,{([1,1176]), [1041920,1042519])}

A call to Collections.sort(locations, Location.naturalOrder) in the constructor of CompoundLocation flips the two sub-locations, so the one with the lower coordinates appears first.

The quickest solution for me is to retain the original order of the Location objects as they are parsed, and not do any sorting. Perhaps a better solution would be to allow CompoundLocation objects to be CircularLocations and, if they pass through the origin, not to sort them according to the naturalOrder.

Do you think this solution would be preferable, or do you think it is sufficient to retain the original order of sub-locations in all CompoundLocations? I will stick with the quick fix for the time being, but may work on a better one in the future if that is more appropriate.

Thanks for your help,
Lorna
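
A minimal Java sketch of the ordering problem for the origin-spanning join join(1041920..1042519,1..1176). It does not use BioJava: the Block class and the order() and format() helpers are hypothetical, and the comparator is only an assumption about what sorting by (min, max) amounts to. The keepParsedOrder flag stands for the quick fix of retaining the order in which the sub-locations were parsed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OriginJoinSketch {
    // Hypothetical stand-in for a sub-location; NOT BioJava's Location API.
    static final class Block {
        final int min;
        final int max;

        Block(int min, int max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return min + ".." + max;
        }
    }

    // Assumed equivalent of a "natural order": compare by min, then by max.
    static final Comparator<Block> BY_MIN_THEN_MAX =
            Comparator.comparingInt((Block b) -> b.min).thenComparingInt(b -> b.max);

    // Return the blocks either in parsed order or sorted by coordinates.
    static List<Block> order(List<Block> parsed, boolean keepParsedOrder) {
        List<Block> out = new ArrayList<>(parsed);
        if (!keepParsedOrder) {
            out.sort(BY_MIN_THEN_MAX);
        }
        return out;
    }

    // Render forward-strand blocks as an EMBL-style join.
    static String format(List<Block> blocks) {
        StringBuilder sb = new StringBuilder("join(");
        for (int i = 0; i < blocks.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(blocks.get(i));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<Block> parsed = Arrays.asList(new Block(1041920, 1042519), new Block(1, 1176));
        System.out.println(format(order(parsed, false))); // sorted: sub-locations flipped
        System.out.println(format(order(parsed, true)));  // parsed order retained
    }
}
```

Sorting puts 1..1176 first, which no longer describes a feature that runs through the origin; keeping the parsed order reproduces the original join.
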