From aumanga at biggjapan.com Thu Jan 8 04:04:38 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 18:04:38 +0900
Subject: [Biojava-l] Genebank Webservices (corrrect result page)
Message-ID: <4965C1A6.1030306@biggjapan.com>

Sorry, the correct result page is:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277

From aumanga at biggjapan.com Thu Jan 8 04:09:22 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 18:09:22 +0900
Subject: [Biojava-l] Genebank Webservices (corrrect result page)
In-Reply-To: <4965C1A6.1030306@biggjapan.com>
References: <4965C1A6.1030306@biggjapan.com>
Message-ID: <4965C2C2.2030801@biggjapan.com>

Greetings all,

Sorry if this is reposted!

I come from a computer science background and have only a little knowledge of bioinformatics. In the application I am developing, I want to search NCBI for a GenBank ID (like 4558277) and retrieve the relevant PDB_ID.

For example, for ID '4558277' I get this result:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277

There I can see the value '1F58_L', which is the only part that is significant to me.

I want to know whether there is any web service that retrieves this information. That is, I send '4558277' as the SOAP input parameter and get the value '1F58_L' back in the result.
I found the following web services and want to know whether I can use the one for 'Gene':

http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

Thanks in advance,
umanga

Ashika Umanga Umagiliya wrote:
> Sorry, correct result page is :
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From aumanga at biggjapan.com Thu Jan 8 03:58:09 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 17:58:09 +0900
Subject: [Biojava-l] Genebank Webservices ?
Message-ID: <4965C021.2060109@biggjapan.com>

Greetings all,

I come from a computer science background and have only a little knowledge of bioinformatics. In the application I am developing, I want to search NCBI for a GenBank ID (like 4558277) and retrieve the relevant PDB_ID.

For example, for ID '4558277' I get this result:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=4558277

There I can see the value '1F58_L', which is the only part that is significant to me.

I want to know whether there is any web service that retrieves this information. That is, I send '4558277' as the SOAP input parameter and get the value '1F58_L' back in the result.

I found the following web services and want to know whether I can use the one for 'Gene':

http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

Thanks in advance,
umanga

From holland at eaglegenomics.com Thu Jan 8 06:07:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 08 Jan 2009 11:07:43 +0000
Subject: [Biojava-l] Genebank Webservices ?
In-Reply-To: <4965C021.2060109@biggjapan.com>
References: <4965C021.2060109@biggjapan.com>
Message-ID: <4965DE7F.8020209@eaglegenomics.com>

There is no generic interface to NCBI eUtils in BioJava, but one is planned.
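For reference, an eUtils request can also be put together by hand without any BioJava code. The following is only a rough sketch under some assumptions: it uses the efetch endpoint of NCBI's E-utilities with rettype=gp to ask for a GenPept flat file, and it assumes that the value of interest ('1F58_L' in the question) shows up as the name on the LOCUS line of that record, which I have not checked for every record. The class and method names are made up for illustration, and the actual HTTP call is left out.

```java
import java.util.Optional;

public class EfetchSketch {
    // Base URL of the NCBI E-utilities efetch service (an assumption, not part of BioJava).
    static final String EFETCH =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    /** Builds an efetch URL asking for the GenPept flat file of one protein ID.
     *  The ID is assumed to be a plain number, so no URL encoding is done here. */
    static String buildUrl(String id) {
        return EFETCH + "?db=protein&id=" + id + "&rettype=gp&retmode=text";
    }

    /** Returns the second whitespace-separated token of the LOCUS line, if there is one. */
    static Optional<String> locusName(String flatFile) {
        for (String line : flatFile.split("\\R")) {
            if (line.startsWith("LOCUS")) {
                String[] tokens = line.trim().split("\\s+");
                if (tokens.length > 1) {
                    return Optional.of(tokens[1]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Prints the URL to fetch; pass the downloaded text to locusName(...) afterwards.
        System.out.println(buildUrl("4558277"));
    }
}
```

The text returned by that URL would be handed to locusName(...). If the identifier turns out to live in a different field of the record, only that method needs to change.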
In the meantime, take a look at this existing BioJava 1.6 package, which will query GenBank for a sequence and return a BioJava RichSequence object containing the result. You can then search through the annotations and features of the sequence to find the result you need.

This is for Gene records:

http://biojava.org/docs/api/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.html

Or the equivalent for Peptide records:

http://biojava.org/docs/api/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.html

cheers,
Richard

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andreas.prlic at gmail.com Mon Jan 12 04:28:22 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Mon, 12 Jan 2009 10:28:22 +0100
Subject: [Biojava-l] BioJava
In-Reply-To: <496ac2075c8cf4.53511495@wp.pl>
References: <496ac2075c8cf4.53511495@wp.pl>
Message-ID: <59a41c430901120128l53f2a5c8le0a122a0a73515@mail.gmail.com>

Hi Michal,

The code you sent looks fine to me. Still, I am not sure I fully understand what you are trying to say. What do you mean by "each hit"?

From our previous discussion I understand that you work with two sets of atoms (residues) where each position in one set corresponds to a position in the other set. This means you know that all atoms are on structurally equivalent positions and that the two sets of atoms are of the same size. If this is the case, then the SVDSuperimposer is the right tool, and you would include all atoms in the two sets in the RMSD calculation.

If you work with two proteins where you do NOT know the structurally equivalent positions at the start, then StructurePairAligner provides an algorithm to align two proteins (of different length) and find pairs of atoms (residues) on structurally equivalent positions. In this case, the RMSD calculation considers the positions that are equivalent and ignores the unaligned regions.

I guess I should create a wiki page explaining this difference between SVDSuperimposer and StructurePairAligner...

Andreas

2009/1/12 Michał Lorenc:
> Dear Andreas,
> I used the SVDSuperimposer class, but after Calc.rotate and Calc.shift I
> would like to know which Atom is close to which other Atom.
>
> SVDSuperimposer.getRMS(caAtoms1, caAtoms2) gives me the RMS value only for
> the whole protein structure, but how could I get an RMS value for each hit?
>
> I have attached my code. Thank you in advance!
>
> Best regards,
>
> Michal

From aumanga at biggjapan.com Thu Jan 15 20:03:24 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Fri, 16 Jan 2009 10:03:24 +0900
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers?
Message-ID: <496FDCDC.5010805@biggjapan.com>

Greetings all,

I come from a computer science background and at the moment I am working on bioinformatics software. I really see the need to learn more about bioinformatics, quickly :)

I hear (and blindly use) all these words, "sequence alignment, epitopes, CDR, homology modeling, docking, amino acids", etc., and at the moment I don't care much about them, since I am told what should happen and I implement it.

Where can I learn about these concepts easily, I mean for someone who comes from a mathematical and IT background?

Best regards,
umanga

From holland at eaglegenomics.com Fri Jan 16 05:50:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 16 Jan 2009 10:50:35 +0000
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers?
In-Reply-To: <496FDCDC.5010805@biggjapan.com>
References: <496FDCDC.5010805@biggjapan.com>
Message-ID: <4970667B.9030601@eaglegenomics.com>

Your best bet is a good old-fashioned book. ;)

A quick search on Amazon threw up this one, which looks like a very helpful intro to cell biology for people like you (and me!)
who have come to bioinformatics from a computer science background:

http://www.amazon.com/Bioinformatics-Genes-Proteins-Computers-Advanced/dp/1859960545/ref=sr_1_4?ie=UTF8&s=books&qid=1232102747&sr=1-4

Hopefully this is a good starting point. I'm sure everyone on this list has their own favourite books which they could recommend to you as well.

cheers,
Richard

From markjschreiber at gmail.com Fri Jan 16 07:27:23 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 16 Jan 2009 20:27:23 +0800
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers?
In-Reply-To: <4970667B.9030601@eaglegenomics.com>
References: <496FDCDC.5010805@biggjapan.com> <4970667B.9030601@eaglegenomics.com>
Message-ID: <93b45ca50901160427l5941f82dy18b68f5000c32722@mail.gmail.com>

Wikipedia is always a good place to get a very rapid overview of some unfamiliar biological term.

- Mark

From koen.bruynseels at cropdesign.com Fri Jan 16 08:09:58 2009
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Fri, 16 Jan 2009 14:09:58 +0100
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID:

I will be out of the office starting 01/14/2009 and will not return until 01/25/2009.
I will respond to your message when I return.

From aumanga at biggjapan.com Sun Jan 18 19:43:31 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Mon, 19 Jan 2009 09:43:31 +0900
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers?
In-Reply-To: <4970667B.9030601@eaglegenomics.com>
References: <496FDCDC.5010805@biggjapan.com> <4970667B.9030601@eaglegenomics.com>
Message-ID: <4973CCB3.4000008@biggjapan.com>

Thanks everyone for the tips. I have started reading "Bioinformatics for Dummies" to get the basics, and then hope to move on to the book Richard recommended.

Thank you again,
Best regards,
umanga

From marcel.huntemann at gmail.com Tue Jan 20 21:42:14 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Tue, 20 Jan 2009 18:42:14 -0800
Subject: [Biojava-l] How to get translated sequence out of blast result
Message-ID: <49768B86.20707@Gmail.com>

Hi!

I have a multi-FASTA file with a lot of nucleotide sequences in it. I ran blastx with this file against a database. Now I want to parse the blast result. To be more precise: I want to get the translated protein query sequence with its start and stop position for each hit.

I am using the example code from the BioJava cookbook (http://biojava.org/wiki/BioJava:CookBook:Blast:Parser). The parsing works fine so far, apart from one problem. I am able to get the start and stop position for the query sequence via hit.getQueryStart() and hit.getQueryEnd(). But I haven't yet figured out how to get the translated protein query sequence out of the blast result. I couldn't find anything like hit.getQuerySequence() or similar. I would guess that something like that already exists somewhere, or am I wrong and have to implement it myself?

Thanks,
Marcel

From markjschreiber at gmail.com Wed Jan 21 21:30:54 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 22 Jan 2009 10:30:54 +0800
Subject: [Biojava-l] Off topic: JDK6 and JAX-WS 2.1
Message-ID: <93b45ca50901211830i5af3e213p9db9a6d10f42fa75@mail.gmail.com>

Sorry for the off-topic post, but this is something that has caused me to lose quite a bit of hair recently.

If you're planning on doing web service development with JAX-WS, don't use JDK6 unless you use a version more recent than update 3.
I'll spare you the gory details, but JDK6u4 and onwards use JAX-WS 2.1, which removes the need to play with endorsed directories etc., something that is very tricky in IDEs and not uncomplicated with Ant.

- Mark

From marcel.huntemann at gmail.com Thu Jan 22 19:48:57 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 22 Jan 2009 16:48:57 -0800
Subject: [Biojava-l] Problem with blast file parser
Message-ID: <497913F9.70009@Gmail.com>

Hi!

I am experiencing a strange problem with the Blast parser. I am using the code from the BioJava CookBook (http://biojava.org/wiki/BioJava:CookBook:Blast:Parser). The blast file contains the results for 20 contigs. The problem is that the parser only gives me the results for every other sequence. So I get the results for contigs 1, 3, 5, 7, 9 and 11, and then it continues with the even ones: 12, 14, 16, 18 and 20.

Has anyone experienced the same problem, or does anyone know what causes it?

Thanks,
Marcel

From charles at imbusch.net Fri Jan 23 11:17:32 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Fri, 23 Jan 2009 17:17:32 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497913F9.70009@Gmail.com>
References: <497913F9.70009@Gmail.com>
Message-ID: <4979ED9C.6040207@imbusch.net>

Hello Marcel,

I also experience the problem that the parser skips the even result numbers. I have not found a satisfactory solution, so I gave up on parsing a blast result file that contains multiple results. Instead I split the big FASTA file into several smaller ones, so that I get just one result per FASTA file. That works, even if it's not the best solution.

Let me know if you find another solution for that problem.
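The splitting step can be sketched roughly as follows. This is only an illustration in plain Java, without BioJava; the class name, the method name and the two sample contigs are made up, and each returned record would still have to be written to its own file before running blast on it.

```java
import java.util.ArrayList;
import java.util.List;

public class FastaSplitSketch {
    /** Splits multi-FASTA text into one string per record
     *  (the '>' header line followed by its sequence lines). */
    static List<String> splitRecords(String multiFasta) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : multiFasta.split("\\R")) {
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder();
            }
            // Lines before the first header and empty lines are ignored.
            if (current != null && !line.isEmpty()) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up input with two records, just to show the call.
        String input = ">contig1\nACGT\nTTGA\n>contig2\nGGCC\n";
        for (String record : splitRecords(input)) {
            System.out.print(record);
            System.out.println("--");
        }
    }
}
```

Running blast once per record gives one report per query, which sidesteps the skipped results at the cost of many small files.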
Cheers,
Charles

From markjschreiber at gmail.com Fri Jan 23 21:20:36 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 24 Jan 2009 10:20:36 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <4979ED9C.6040207@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
Message-ID: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>

Is this XML parsing or blast text output?

- Mark

From marcel.huntemann at gmail.com Fri Jan 23 22:54:02 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Fri, 23 Jan 2009 19:54:02 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
Message-ID: <497A90DA.5070104@Gmail.com>

As I said, I am using the code from http://biojava.org/wiki/BioJava:CookBook:Blast:Parser. I have a normal text file that was created by blast. I thought that the given code converts the input stream from the file into SAX events. Do I have to do another step before I use the code from that example?

Cheers,
Marcel

From charles at imbusch.net Sun Jan 25 06:54:53 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Sun, 25 Jan 2009 12:54:53 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
Message-ID: <497C530D.7090006@imbusch.net>

Hello Mark,

same here. I'm parsing plain text output.

Cheers,
Charles

From markjschreiber at gmail.com Sun Jan 25 21:45:25 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 26 Jan 2009 10:45:25 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497C530D.7090006@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net>
Message-ID: <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>

Have you tried parsing the XML output?
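For a first look at the XML route, the JDK's own SAX parser is enough, without going through BioJava. Below is a rough sketch, not the cookbook code. It assumes the element names of NCBI's BLAST XML report (Iteration_query-def, Hsp_query-from, Hsp_query-to, Hsp_qseq) and that your BLAST version writes Iteration_query-def for each query; the class and method names are made up for illustration.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BlastXmlSketch {
    /** Collects one row "queryDef<TAB>from<TAB>to<TAB>alignedQuery" per HSP. */
    static List<String> queryHsps(InputSource xml) throws Exception {
        final List<String> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String queryDef = "";
            private String from = "";
            private String to = "";

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Hand back an empty DTD so that parsing needs no network access.
                return new InputSource(new StringReader(""));
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                text.setLength(0); // start collecting the text of this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                String value = text.toString().trim();
                switch (qName) {
                    case "Iteration_query-def": queryDef = value; break;
                    case "Hsp_query-from": from = value; break;
                    case "Hsp_query-to": to = value; break;
                    case "Hsp_qseq":
                        rows.add(queryDef + "\t" + from + "\t" + to + "\t" + value);
                        break;
                    default: break;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a BLAST XML report
        String systemId = new File(args[0]).toURI().toASCIIString();
        for (String row : queryHsps(new InputSource(systemId))) {
            System.out.println(row);
        }
    }
}
```

As far as I know, for blastx the Hsp_qseq element holds the aligned query segment already translated to protein (including any gap characters), so this may also cover the earlier question about getting the translated query sequence for each hit.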
While the text parser does 'fake' XML by emitting SAX events, I think that the XML parser may be a lot more stable (the XML output of blast is more stable, at least recently anyway).

I know this isn't the best solution to your problem, but the default text output of BLAST is not the most parseable. In fact, it is probably the least machine-readable of all the blast outputs and definitely the least stable.

- Mark

From ahmed.elmasri at gmail.com Mon Jan 26 01:35:20 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Mon, 26 Jan 2009 01:35:20 -0500
Subject: [Biojava-l] Depreciated methods
Message-ID: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>

Hello list,

I am new to BioJava and I have been trying some of its examples. I came across some deprecated methods and I am not sure whether they will be removed entirely any time soon. I also found some of the examples problematic: they didn't run properly even though I followed the instructions stated in the comment section.

Please let me know if you have answers to my questions.

Best wishes,
Ahmed

--
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
Marine Biological Laboratory
Woods Hole, MA
--
Ph.D. student, Complex Systems
School of Informatics, Indiana University

From holland at eaglegenomics.com Mon Jan 26 03:10:42 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 08:10:42 +0000
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
Message-ID: <497D7002.3010101@eaglegenomics.com>

> I am new to BioJava and I have been trying some of its examples.
I came > across some of the depreciated methods and I am not sure if they will be > removed entirely any time soon? Deprecated = may be removed without notice in any future release. I couldn't say for any individual method, but in general it's a bad idea to use anything that is marked deprecated when writing new code. > I also found some of the examples > problematic and didn't run properly even though I am following the > instructions stated in the comment section. Could you specify exactly which examples didn't work, and the exact problems you had with them? thanks, Richard > Please let me know if you have answers for my questions. > Best wishes, > Ahmed > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From charles at imbusch.net Mon Jan 26 06:04:09 2009 From: charles at imbusch.net (Charles Imbusch) Date: Mon, 26 Jan 2009 12:04:09 +0100 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> Message-ID: <497D98A9.6010904@imbusch.net> Hello Mark, no I haven't tried to parse XML output from Blast. Just because of the fact that plain text output can be viewed with any editor. That's still very convenient. But I'm keen: is there actually an easy to use program for viewing XML output? Another option would be to generate XML and plain text output from Blast at the same time (in one run). But I couldn't find a way to do so. Maybe I missed something? Cheers, Charles Mark Schreiber schrieb: > Have you tried parsing the XML output. 
While the text parser does > 'fake' XML by emitting SAX events I think that the XML parser may be a > lot more stable (the XML output of blast is more stable, at least > recently anyway). > > I know this isn't the best solution to your problem but the default > text output of BLAST is not the most parseable. In fact it is probably > the least machine readable of all the blast outputs and definitely the > least stable. > > - Mark > From holland at eaglegenomics.com Mon Jan 26 06:21:26 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 26 Jan 2009 11:21:26 +0000 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <497D98A9.6010904@imbusch.net> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> Message-ID: <497D9CB6.4010403@eaglegenomics.com> This app looks hopeful for viewing blast output - haven't tried it though...: http://www.korilog.com/index.php/BlastViewer.html Otherwise, no there's no way of making Blast output in more than one format at once. It's either text, or XML, but it won't do both. cheers, Richard Charles Imbusch wrote: > Hello Mark, > > no I haven't tried to parse XML output from Blast. > Just because of the fact that plain text output can be > viewed with any editor. That's still very convenient. > > But I'm keen: is there actually an easy to use program for viewing > XML output? > Another option would be to generate XML and plain text > output from Blast at the same time (in one run). But I couldn't find a > way to do so. Maybe I missed something? > > Cheers, > Charles > > Mark Schreiber schrieb: >> Have you tried parsing the XML output. 
While the text parser does >> 'fake' XML by emitting SAX events I think that the XML parser may be a >> lot more stable (the XML output of blast is more stable, at least >> recently anyway). >> >> I know this isn't the best solution to your problem but the default >> text output of BLAST is not the most parseable. In fact it is probably >> the least machine readable of all the blast outputs and definitely the >> least stable. >> >> - Mark >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ahmed.elmasri at gmail.com Mon Jan 26 12:24:20 2009 From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.) Date: Mon, 26 Jan 2009 12:24:20 -0500 Subject: [Biojava-l] Depreciated methods In-Reply-To: <497D7002.3010101@eaglegenomics.com> References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com> <497D7002.3010101@eaglegenomics.com> Message-ID: <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com> Thanks Richard! For the deprecated methods, is there a reference or a wiki that maps the deprecated ones to ones that should be used instead? As for the examples I am having trouble with: WriteToFasta is one. Here is the error that I am getting: java.lang.IllegalArgumentException: No alphabet was set in the identifier at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:928) at examples.WriteToFasta.main(WriteToFasta.java:43) And here is my parameter value: //get the int constant for the file type int fileType = Integer.parseInt("2"); I would appreciate any help. Best wishes, Ahmed On Mon, Jan 26, 2009 at 3:10 AM, Richard Holland wrote: > > I am new to BioJava and I have been trying some of its examples. 
I came
> > across some of the depreciated methods and I am not sure if they will be
> > removed entirely any time soon?
>
> Deprecated = may be removed without notice in any future release. I
> couldn't say for any individual method, but in general it's a bad idea
> to use anything that is marked deprecated when writing new code.
>
> > I also found some of the examples
> > problematic and didn't run properly even though I am following the
> > instructions stated in the comment section.
>
> Could you specify exactly which examples didn't work, and the exact
> problems you had with them?
>
> thanks,
> Richard
>
> > Please let me know if you have answers for my questions.
> > Best wishes,
> > Ahmed
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>

--
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
Marine Biological Laboratory Woods Hole, MA
--
Ph.D. student, Complex Systems
School of Informatics, Indiana University

From holland at eaglegenomics.com Mon Jan 26 12:30:00 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 17:30:00 +0000
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com> <497D7002.3010101@eaglegenomics.com> <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>
Message-ID: <497DF318.8070703@eaglegenomics.com>

Most methods include a note saying which method should be used instead.
For those that don't, take a look in the org.biojavax packages to see if
there are suitable alternative classes.

In the case of the deprecated SeqIOTools.fileToBiojava, a much better
version of the FASTA parser/writer exists in the org.biojavax packages.
Instructions on how to use it are here:

http://www.biojava.org/wiki/BioJava:BioJavaXDocs#Example

cheers,
Richard

Hamed, Ahmed A. wrote:
> Thanks Richard!
>
> For the deprecated methods, is there a reference or a wiki that maps the
> deprecated ones to ones that should be used instead?
>
> As for the examples I am having trouble with: WriteToFasta is one. Here
> is the error that I am getting:
> java.lang.IllegalArgumentException: No alphabet was set in the identifier
>     at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:928)
>     at examples.WriteToFasta.main(WriteToFasta.java:43)
>
> And here is my parameter value:
> //get the int constant for the file type
> int fileType = Integer.parseInt("2");
>
> I would appreciate any help.
> Best wishes,
> Ahmed
>
> On Mon, Jan 26, 2009 at 3:10 AM, Richard Holland wrote:
>
> > I am new to BioJava and I have been trying some of its examples. I came
> > across some of the depreciated methods and I am not sure if they will be
> > removed entirely any time soon?
>
> Deprecated = may be removed without notice in any future release. I
> couldn't say for any individual method, but in general it's a bad idea
> to use anything that is marked deprecated when writing new code.
>
> > I also found some of the examples
> > problematic and didn't run properly even though I am following the
> > instructions stated in the comment section.
>
> Could you specify exactly which examples didn't work, and the exact
> problems you had with them?
>
> thanks,
> Richard
>
> > Please let me know if you have answers for my questions.
> > Best wishes,
> > Ahmed
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
> --
> Ahmed Abdeen Hamed
> Scientific Informatics Project Leader
> Marine Biological Laboratory Woods Hole, MA
> --
> Ph.D. student, Complex Systems
> School of Informatics, Indiana University
>

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com Mon Jan 26 19:55:32 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 27 Jan 2009 08:55:32 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497D98A9.6010904@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net>
Message-ID: <93b45ca50901261655n727219cdnd1ee27bf3b0d31e6@mail.gmail.com>

You can generate plain text from XML using an XSLT. It probably won't be
identical to the BLAST text output but that format is not so stable anyway.

On Mon, Jan 26, 2009 at 7:04 PM, Charles Imbusch wrote:
> Hello Mark,
>
> no I haven't tried to parse XML output from Blast.
> Just because of the fact that plain text output can be
> viewed with any editor. That's still very convenient.
>
> But I'm keen: is there actually an easy to use program for viewing
> XML output?
> Another option would be to generate XML and plain text
> output from Blast at the same time (in one run). But I couldn't find a
> way to do so. Maybe I missed something?
>
> Cheers,
> Charles
>
> Mark Schreiber wrote:
>>
>> Have you tried parsing the XML output. While the text parser does
>> 'fake' XML by emitting SAX events I think that the XML parser may be a
>> lot more stable (the XML output of blast is more stable, at least
>> recently anyway).
>>
>> I know this isn't the best solution to your problem but the default
>> text output of BLAST is not the most parseable. In fact it is probably
>> the least machine readable of all the blast outputs and definitely the
>> least stable.
>>
>> - Mark
>>
>

From nir at rosettadesigngroup.com Tue Jan 27 08:08:34 2009
From: nir at rosettadesigngroup.com (Nir London)
Date: Tue, 27 Jan 2009 15:08:34 +0200
Subject: [Biojava-l] Rosetta Academic Training Workshop
Message-ID: <2F3C5E9F-418B-489F-A852-3A99248D03AC@rosettadesigngroup.com>

Due to public demand, "Rosetta Design Group" is organizing a "Rosetta"
software training workshop, aimed at academic groups. The format of the
workshop will be a "webinar" - a web seminar, enabling more groups to
attend while avoiding the annoying jet lag and accommodation troubles.

Would you be interested in participating? If so please fill in the form
located at:
http://rosettadesigngroup.com/blog/rosetta-academic-workshop/
and we will contact you when the details are finalized.*

Nir London | Rosetta Design Group
http://rosettadesigngroup.com/

* If you're not from an academic group, don't worry, write us anyway...

From gwu at molbio.mgh.harvard.edu Wed Jan 28 23:51:28 2009
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Wed, 28 Jan 2009 23:51:28 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <497D9CB6.4010403@eaglegenomics.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com>
Message-ID: <498135D0.9060103@molbio.mgh.harvard.edu>

Hi Everyone,

I have a piece of code that parses a Genbank file and retrieves gene
sequences and related information. It works well with sequences such as
Arabidopsis thaliana, C. elegans and Bos taurus, but it fails with Mus
musculus chromosome 2. The contig that the code fails on is the largest
one in my test. Contig NT_039207 has 116366104 bp, but the code shows it
cut to 100000020 bp. That puts some gene coordinates out of range.
Attached is the code. Can anyone give some suggestions?
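One quick sanity check for a report like this is to compare the length declared on each record's LOCUS line with the number of bases actually present in its ORIGIN block; that separates a truncated download from a limit inside the parser. The sketch below uses only the Java standard library. The class name GenbankLengthCheck and its members are made up for illustration (they are not BioJava API), and it assumes each record carries an inline ORIGIN block terminated by a "//" line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of BioJava.
final class GenbankLengthCheck {

    /** Length declared on the LOCUS line versus bases counted in ORIGIN. */
    static final class Lengths {
        final long declared;
        final long observed;

        Lengths(long declared, long observed) {
            this.declared = declared;
            this.observed = observed;
        }
    }

    /** Scans GenBank flat-file text and returns one Lengths entry per record name. */
    static Map<String, Lengths> check(Reader in) {
        Map<String, Lengths> result = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String name = null;
            long declared = -1;   // stays -1 if no "<number> bp" pair is found
            long observed = 0;
            boolean inOrigin = false;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("LOCUS")) {
                    String[] tok = line.trim().split("\\s+");
                    name = tok.length > 1 ? tok[1] : "unknown";
                    declared = -1;
                    // The declared length is the token just before "bp".
                    for (int i = 2; i + 1 < tok.length; i++) {
                        if (tok[i + 1].equals("bp")) {
                            declared = Long.parseLong(tok[i]);
                            break;
                        }
                    }
                    observed = 0;
                    inOrigin = false;
                } else if (line.startsWith("ORIGIN")) {
                    inOrigin = true;
                } else if (line.startsWith("//")) {
                    // End of record: store what was seen.
                    if (name != null) {
                        result.put(name, new Lengths(declared, observed));
                    }
                    name = null;
                    inOrigin = false;
                } else if (inOrigin) {
                    // Sequence lines mix position numbers, spaces and bases;
                    // only letters count as bases.
                    for (int i = 0; i < line.length(); i++) {
                        if (Character.isLetter(line.charAt(i))) {
                            observed++;
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String demo = "LOCUS       DEMO1                     12 bp    DNA     linear   CON 01-JAN-2009\n"
                + "ORIGIN\n"
                + "        1 acgtacgtac gt\n"
                + "//\n";
        for (Map.Entry<String, Lengths> e : check(new StringReader(demo)).entrySet()) {
            System.out.println(e.getKey() + ": declared=" + e.getValue().declared
                    + " observed=" + e.getValue().observed);
        }
    }
}
```

If the two numbers agree for the large contig, the file itself is complete and the truncation is happening on the parsing side.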

The Mus musculus Genbank file can be downloaded at :
ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz

Thanks in advance

Gang
==========================================
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

import org.biojava.bio.Annotation;
import org.biojava.bio.BioException;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichFeature;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class TestMus {
    public void testMusChr2() throws FileNotFoundException,
            NoSuchElementException, BioException {
        String fp = "/tmp/mm_alt_chr2.gbk";
        System.out.println("File: " + fp);
        BufferedReader gReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(new File(fp))));
        Namespace ns = RichObjectFactory.getDefaultNamespace();
        RichSequenceIterator seqI = RichSequence.IOTools.readGenbankDNA(gReader, ns);
        while (seqI.hasNext()) {
            // Print the record-level details of each Genbank entry.
            RichSequence seq = seqI.nextRichSequence();
            String organism = seq.getTaxon().getDisplayName();
            String accession = seq.getAccession();
            String identifier = seq.getIdentifier();
            int taxonID = seq.getTaxon().getNCBITaxID();
            String division = seq.getDivision();
            String seqVersion = "" + seq.getSeqVersion();
            int seqLength = seq.length();
            String description = seq.getDescription();
            System.out.println("Organism: " + organism
                    + "\nAccession: " + accession
                    + "\nIdentifier: " + identifier
                    + "\nTaxonID: " + taxonID
                    + "\nDivision: " + division
                    + "\nSeqVersion: " + seqVersion
                    + "\nLength: " + seqLength);
            System.out.println("2041-2101: " + seq.subStr(2041, 2101));
            // Walk the features and report every "gene" feature.
            for (Iterator i = seq.features(); i.hasNext();) {
                RichFeature f = (RichFeature) i.next();
                int rank = f.getRank();
                String fType = f.getType();
                if (fType.toLowerCase().equals("gene")) {
                    int startPos = f.getLocation().getMin();
                    int endPos = f.getLocation().getMax();
                    int geneLen = endPos - startPos + 1;
                    // This is the call that fails when the sequence is truncated.
                    String sequence = seq.subStr(startPos, endPos);
                    String strand = f.getStrand().getToken() + "";
                    Annotation ann = f.getAnnotation();
                    // Prefer locus_tag, fall back to gene; guard against neither being set.
                    String geneIdentifier = "";
                    if (ann.containsProperty("locus_tag")) {
                        geneIdentifier = ann.getProperty("locus_tag") + "";
                    } else if (ann.containsProperty("gene")) {
                        geneIdentifier = ann.getProperty("gene") + "";
                    }
                    String alternativeIdentifiers = "";
                    try {
                        alternativeIdentifiers = (String) ann.getProperty("gene");
                    } catch (NoSuchElementException e) {
                        // No gene property: leave the alternative identifier empty.
                    }
                    System.out.println(rank + "\t" + geneIdentifier + "\t"
                            + alternativeIdentifiers + "\t"
                            + startPos + "\t" + endPos + "\t" + geneLen + "\t" + strand);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        TestMus tm = new TestMus();
        tm.testMusChr2();
    }
}

From markjschreiber at gmail.com Thu Jan 29 00:43:35 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 29 Jan 2009 13:43:35 +0800
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <498135D0.9060103@molbio.mgh.harvard.edu>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com> <498135D0.9060103@molbio.mgh.harvard.edu>
Message-ID: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>

I assume that the downloaded file has the complete sequence in it? Probably
worth checking that it has the complete sequence block (all 116366104 bp).

- Mark

On Thu, Jan 29, 2009 at 12:51 PM, gang wu wrote:
> Hi Everyone,
>
> I have a piece of code to parse Genbank file and retrieve gene sequence and
> related information. It works well with sequences such as Arabidopsis
> thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome
> 2. The contig that the code failed on is the largest one in my test. Contig
> NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp.
> That causes some gene coordinates out of range. Attached is the code. Can
> anyone give some suggestion?
>
> The Mus musculus Genbank file can be downloaded at :
> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>
> Thanks in advance
>
> Gang

From holland at eaglegenomics.com Thu Jan 29 02:25:10 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 29 Jan 2009 07:25:10 +0000
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com> <498135D0.9060103@molbio.mgh.harvard.edu> <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
Message-ID: <498159D6.8010906@eaglegenomics.com>

Gabrielle Doan posted a solution to this a while back and I believe the
changes have been committed already:

http://www.mail-archive.com/biojava-l at lists.open-bio.org/msg01036.html

How old is the copy of BioJava that you're using? Have you tried
checking out the trunk from Subversion to see if that works?

cheers,
Richard

Mark Schreiber wrote:
> I assume that the downloaded file has the complete sequence in it? Probably
> worth checking that it has the complete sequence block (all 116366104 bp).
>
> - Mark

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From jw12 at sanger.ac.uk Thu Jan 29 06:20:47 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 29 Jan 2009 11:20:47 +0000
Subject: [Biojava-l] closing this week: Registrations for DAS workshop.
Message-ID: <207B56E3-C65C-41A4-800E-AF0B9F158CA6@sanger.ac.uk>

DAS is currently being used to share annotations on genomes, protein
alignments, structural and interaction information. If you are
interested in sharing biological information the DAS workshop below may
be of interest to you.

Registration is open for the 2009 DAS workshop (8,9,10th March) at the
Genome Campus, Hinxton UK. If you are interested in attending, please
find out more by going to http://www.dasregistry.org/course.jsp and
register via the web link at the bottom of the page. This workshop will
cater for novice to expert DAS users as each day is optional.

Closing date for registration is 1st Feb 2009. If you register now you
can change the details of your registration any time up until this
closing date. Please register early as places will be limited.

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From holland at eaglegenomics.com Thu Jan 29 11:10:42 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 29 Jan 2009 16:10:42 +0000
Subject: [Biojava-l] Eagle Genomics is hiring
Message-ID: <4981D502.1000905@eaglegenomics.com>

Hi all,

Apologies if this is inappropriate for the list, but I thought it would
be a good way to reach the kind of people we're looking for.

Richard

=====

Senior Bioinformatics Software Developer
Eagle Genomics Ltd., Cambridge, UK
http://www.eaglegenomics.com/

We are a young and exciting bioinformatics company looking to
revolutionise the way in which industry and academia work together. We
are based at the heart of Europe's largest biotech cluster in Cambridge,
UK. As we expand our client base, we're looking to build a talented and
committed team of experts.

We are currently looking for a software developer to work on a wide
range of complex projects, and who is happy to work face-to-face with
our customers. Ideally you will have had substantial prior experience
working in a life science company or research institute, however we will
also consider graduates with a track record in bioinformatics.

In addition to your superb technical skills, you will also:

 * have the ability to quickly translate scientific problems into real
   software solutions,
 * be able to put technical concepts into simple language for end users
   to understand,
 * be able to pick up new skills and techniques in record time,
 * work well in a collaborative team environment,
 * be creative, innovative, and forward-thinking.

You will have hands-on experience in some of the following:

 * Java,
 * Perl,
 * SQL query design,
 * Relational database schema design,
 * Open-source bioinformatics toolkits such as BioJava, BioPerl,
   BioSQL, etc.,
 * Ensembl,
 * BioMart,
 * DAS,
 * Taverna,
 * Oracle Life Sciences Platform,
 * Oracle database administration,
 * MySQL database administration,
 * VMware virtual machines,
 * Grid computing and parallelisation.

The preferred candidate will be able to work from our offices in
Cambridge, but we would also consider telecommuting arrangements.

We offer a competitive salary and a range of company benefits.

To apply, please send your CV and cover letter as PDF documents to
jobs at eaglegenomics.com. If you have any questions about the position
or would like to discuss it further before applying, please use the same
email address.

We are only able to offer positions to EEA citizens and permanent
residents, or Tier 1 migrants under the new UK points-based immigration
scheme. Individual contracting arrangements could be considered but we
will prefer those candidates who can work with us as employees.

No agencies please.

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From gwu at molbio.mgh.harvard.edu Thu Jan 29 13:40:06 2009
From: gwu at molbio.mgh.harvard.edu (gwu)
Date: Thu, 29 Jan 2009 13:40:06 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com> <498135D0.9060103@molbio.mgh.harvard.edu> <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
Message-ID: <4981F806.7070100@molbio.mgh.harvard.edu>

Thanks Mark. I did parse out the sequence block with sed and the length
agrees with what the Genbank says.

Gang

Mark Schreiber wrote:
> I assume that the downloaded file has the complete sequence in it?
> Probably worth checking that it has the complete sequence block (all
> 116366104 bp).
>
> - Mark

From gwu at molbio.mgh.harvard.edu Thu Jan 29 14:28:42 2009
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Thu, 29 Jan 2009 14:28:42 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <498159D6.8010906@eaglegenomics.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com> <498135D0.9060103@molbio.mgh.harvard.edu> <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com> <498159D6.8010906@eaglegenomics.com>
Message-ID: <4982036A.7070302@molbio.mgh.harvard.edu>

Thanks Richard. That is exactly the same issue. The latest Subversion
trunk fixed the problem. Thanks again for the quick response.

Gang

Richard Holland wrote:
> Gabrielle Doan posted a solution to this a while back and I believe the
> changes have been committed already:
>
> http://www.mail-archive.com/biojava-l at lists.open-bio.org/msg01036.html
>
> How old is the copy of BioJava that you're using? Have you tried
> checking out the trunk from Subversion to see if that works?
>
> cheers,
> Richard
>
> Mark Schreiber wrote:
>
>> I assume that the downloaded file has the complete sequence in it? Probably
>> worth checking that it has the complete sequence block (all 116366104 bp).
>>
>> - Mark

From marcel.huntemann at gmail.com Thu Jan 29 15:55:15 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 29 Jan 2009 12:55:15 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <4979ED9C.6040207@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
Message-ID: <498217B3.4010703@Gmail.com>

Hi Charles!

I've "found" a solution now. After dealing a couple of days with the
terrible xml output of blast and BioJava's BlastXMLParser (which also
wasn't working properly), I decided to have a look at the source code
and try to figure out myself what was wrong with the BlastLikeSAXParser.
So I checked out the present status of the source code via the anonymous
svn checkout (howto here: http://biojava.org/wiki/CVS_to_SVN_Migration).
After a couple of hours and me not finding an error that could cause
this behavior, I thought I'll just give it a try and compiled the
checked out source via ant. Then used the new created biojava.jar and
suddenly everything went perfectly!
So, whatever the error was (unfortunately I don't have the old source
code to make a diff on certain files), it is already corrected in the
up-to-the-minute version in the subversion system.
Try it out!

Cheers,
Marcel

Charles Imbusch wrote:
> Hello Marcel,
>
> I also do experience the problem that the parser is skipping
> the even result numbers. I have not found a sufficient solution
> for that, so I gave up on parsing on a blast result file containing
> multiple results. Instead I splitted up the big fasta file into
> serveral ones, so that I just get one result for one fasta file.
> That works, even it's not the best solution for it.
>
> Let me know if you find another solution for that problem.
>
> Cheers,
> Charles
>

From andreas at sdsc.edu Thu Jan 29 16:11:51 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 29 Jan 2009 13:11:51 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <498217B3.4010703@Gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <498217B3.4010703@Gmail.com>
Message-ID: <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>

Hi,

We had a couple of bug reports recently regarding issues that already
got fixed in the latest biojava builds from SVN. I think it is time to
start preparing the next biojava release ( 1.7 ) to make sure
everybody gets up to the latest status...

Andreas

On Thu, Jan 29, 2009 at 12:55 PM, Marcel Huntemann wrote:
> Hi Charles!
>
> I've "found" a solution now. After dealing a couple of days with the
> terrible xml output of blast and BioJava's BlastXMLParser (which also
> wasn't working properly), I decided to have a look at the source code and
> try to figure out myself what was wrong with the BlastLikeSAXParser. So I
> checked out the present status of the source code via the anonymous svn
> checkout (howto here: http://biojava.org/wiki/CVS_to_SVN_Migration). After
> a couple of hours and me not finding an error that could cause this
> behavior, I thought I'll just give it a try and compiled the checked out
> source via ant. Then used the new created biojava.jar and suddenly
> everything went perfectly!
> So, whatever the error was (unfortunately I don't have the old source code
> to make a diff on certain files), it is already corrected in the
> up-to-the-minute version in the subversion system.
> Try it out!
>
> Cheers,
> Marcel
>
>
> Charles Imbusch wrote:
>> Hello Marcel,
>>
>> I also do experience the problem that the parser is skipping
>> the even result numbers. I have not found a sufficient solution
>> for that, so I gave up on parsing on a blast result file containing
>> multiple results.
Instead I splitted up the big fasta file into >> serveral ones, so that I just get one result for one fasta file. >> That works, even it's not the best solution for it. >> >> Let me know if you find another solution for that problem. >> >> Cheers, >> Charles >> >> > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From marcel.huntemann at gmail.com Thu Jan 29 16:24:55 2009 From: marcel.huntemann at gmail.com (Marcel Huntemann) Date: Thu, 29 Jan 2009 13:24:55 -0800 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <498217B3.4010703@Gmail.com> <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com> Message-ID: <49821EA7.9090603@Gmail.com> That sounds reasonable. I bet a lot of people would appreciate that! Andreas Prlic wrote: > Hi, > > We had a couple of bug reports recently regarding issues that already > got fixed in the latest biojava builds from SVN. I think it is time to > start preparing the next biojava release ( 1.7 ) to make sure > everybody gets up to the latest status... > > Andreas From marcin.swiatek at mail.mcgill.ca Thu Jan 29 16:56:29 2009 From: marcin.swiatek at mail.mcgill.ca (Marcin Swiatek) Date: Thu, 29 Jan 2009 16:56:29 -0500 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <49821EA7.9090603@Gmail.com> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <498217B3.4010703@Gmail.com><59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com> <49821EA7.9090603@Gmail.com> Message-ID: <176A06E658ED0745965C072C5F2C116A02F87314@EXCHANGE2VS2.campus.mcgill.ca> I personally would. Especially that I have just solved the problem myself, unaware that someone did that already. 
BTW: the problem I picked up (which seems similar to the description
given) was that the line starting a new data set (as detected by
checkNewBlastLikeDataSet in BlastSAXParser) wasn't picked up by
HitSectionSAXParser, nor did it percolate up to BlastSAXParser, which
left the parser's state machine in a weird state. It would recover by
skipping everything up to the next data set (hence only every other
item got processed).

BTW2: the XML parser in 1.6 doesn't deal with new BLAST files either
(2.19, was it?). Has this been fixed in the SVN repository?

Cheers,
Marcin

-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Marcel Huntemann
Sent: Thursday, January 29, 2009 4:25 PM
To: Andreas Prlic
Cc: biojava-dev; biojava-l at biojava.org
Subject: Re: [Biojava-l] Problem with blast file parser

That sounds reasonable. I bet a lot of people would appreciate that!

Andreas Prlic wrote:
> Hi,
>
> We had a couple of bug reports recently regarding issues that already
> got fixed in the latest biojava builds from SVN. I think it is time to
> start preparing the next biojava release ( 1.7 ) to make sure
> everybody gets up to the latest status...
>
> Andreas
_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From umanga.bio at gmail.com  Fri Jan 30 07:00:41 2009
From: umanga.bio at gmail.com (Ashika Umanga Umangiliya)
Date: Fri, 30 Jan 2009 21:00:41 +0900
Subject: [Biojava-l] Reading AB1 files and drawing chromatogram ?
Message-ID: 

Greetings all,

In the application I develop, I want to draw a chromatogram from an AB1
file. I come from a computer science background and have little knowledge
of this subject. Where can I find information on this?
Can I draw the graph using the data in the AB1 file? Or is there a function for this?
thanks in advance,

Umanga

From ayates at ebi.ac.uk  Fri Jan 30 07:57:40 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Jan 2009 12:57:40 +0000
Subject: [Biojava-l] Reading AB1 files and drawing chromatogram ?
In-Reply-To: 
References: 
Message-ID: <4982F944.7080905@ebi.ac.uk>

Hi Umanga,

Fortunately BioJava has an API for drawing chromatograms located under
org.biojava.bio.chromatogram & org.biojava.bio.chromatogram.graphic. To
parse an AB1 file you can run the following code:

  import java.awt.Graphics2D;
  import java.io.*;
  import org.biojava.bio.program.abi.ABIFChromatogram;
  import org.biojava.bio.chromatogram.*;
  import org.biojava.bio.chromatogram.graphic.*;

  File file = new File("chr.ab1");
  // Static factory on ABIFChromatogram; check the Javadoc of your BioJava version
  Chromatogram c = ABIFChromatogram.create(file);
  ChromatogramGraphic cg = new ChromatogramGraphic(c);
  // Can't remember how to get this so you'll have to find out;
  // BufferedImage.createGraphics() is one source of a Graphics2D
  Graphics2D context = getContextFromSomewhere();
  cg.drawTo(context);

You can configure the size of the image through the ChromatogramGraphic
object & alter a number of ChromatogramGraphic.Option attributes through
ChromatogramGraphic.setOption(ChromatogramGraphic.Option opt, Object value).

This should be enough to get you going. I will warn you that this class
is quite memory intensive & an application I wrote ages ago had very big
memory problems because of it (the drawing component, not the file
parsing). An alternative library is available from
http://code.google.com/p/bioview2/ (which was developed by an old
colleague). Try the biojava code first and if that serves your purpose
then great; if not then try bioview2.

Regards,

Andy Yates

P.S. The AB1 parser only supports the processed data channels in the AB1
file. If you want the raw data from it then you will have to modify the
source or use another library (probably the C library StadenIO) to
convert the raw data into an SCF file.

Ashika Umanga Umangiliya wrote:
> Greetings all,
>
> In the application I develop ,I want to draw chromatogram from AB1.
I come > from computer science background have little knowledge this subject.Where > can I find information on this? > Can I draw the graph using data in AB1 file? Or is there any function ? > > > thanks in advance, > > Umanga > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From ahmed.elmasri at gmail.com Fri Jan 30 21:41:31 2009 From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.) Date: Fri, 30 Jan 2009 21:41:31 -0500 Subject: [Biojava-l] Sequence start/end location Message-ID: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com> Hello list, I am trying to get find the start and end location of a gene in a gene sequence. I am reading from a gene FASTA database file. Is there a built-in method that I can use? The alternative is really painful since I have to parse a ptt file and not exactly working for me. Thanks very much! Ahmed From markjschreiber at gmail.com Fri Jan 30 23:15:59 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sat, 31 Jan 2009 12:15:59 +0800 Subject: [Biojava-l] Sequence start/end location In-Reply-To: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com> References: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com> Message-ID: <93b45ca50901302015l4134dce3g684b26cd18c14c21@mail.gmail.com> Hi - Unfortunately your FASTA file won't contain any feature information which could tell you the start and end. If you don't want to get the info from the PTT file you might want to look at parsing the Genbank file instead which will have the feature information. A PTT parser might not be a bad thing for BioJava though. If you write one please consider adding it. - Mark On Sat, Jan 31, 2009 at 10:41 AM, Hamed, Ahmed A. wrote: > > Hello list, > I am trying to get find the start and end location of a gene in a gene > sequence. I am reading from a gene FASTA database file. 
Is there a built-in > method that I can use? The alternative is really painful since I have to > parse a ptt file and not exactly working for me. > Thanks very much! > Ahmed > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From markjschreiber at gmail.com Sat Jan 31 05:25:59 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sat, 31 Jan 2009 18:25:59 +0800 Subject: [Biojava-l] Sequence start/end location In-Reply-To: <5cdd31570901302313l12c490bct6768979ce562d40b@mail.gmail.com> References: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com> <93b45ca50901302015l4134dce3g684b26cd18c14c21@mail.gmail.com> <5cdd31570901302313l12c490bct6768979ce562d40b@mail.gmail.com> Message-ID: <93b45ca50901310225p6676c282m203e8c4e13ba37f1@mail.gmail.com> Hi Ahmed - For a first time contribution it would probably be easiest to post something to the list and someone with a development account can check it in for you. Please make sure to add javadoc comments and a basic JUnit test for any classes you make. - Mark On Sat, Jan 31, 2009 at 3:13 PM, Hamed, Ahmed A. wrote: > Dear Mark, > Thank you for your response. I would be happy to contribute my PTTParser if > you point me to where/how to check it in. I am still new to the BioJava > community and there is so much to learn. > Best wishes, > Ahmed > > On Fri, Jan 30, 2009 at 11:15 PM, Mark Schreiber > wrote: >> >> Hi - >> >> Unfortunately your FASTA file won't contain any feature information >> which could tell you the start and end. If you don't want to get the >> info from the PTT file you might want to look at parsing the Genbank >> file instead which will have the feature information. >> >> A PTT parser might not be a bad thing for BioJava though. If you write >> one please consider adding it. >> >> - Mark >> >> On Sat, Jan 31, 2009 at 10:41 AM, Hamed, Ahmed A. 
>> wrote: >> > >> > Hello list, >> > I am trying to get find the start and end location of a gene in a gene >> > sequence. I am reading from a gene FASTA database file. Is there a >> > built-in >> > method that I can use? The alternative is really painful since I have to >> > parse a ptt file and not exactly working for me. >> > Thanks very much! >> > Ahmed >> > _______________________________________________ >> > Biojava-l mailing list - Biojava-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > -- > Ahmed Abdeen Hamed > Scientific Informatics Project Leader > Marine Biological Laboratory Woods Hole, MA > -- > Ph.D. student, Complex Systems > School of Informatics, Indiana University > > > > From aumanga at biggjapan.com Thu Jan 8 09:04:38 2009 From: aumanga at biggjapan.com (Ashika Umanga Umagiliya) Date: Thu, 08 Jan 2009 18:04:38 +0900 Subject: [Biojava-l] Genebank Webservices (corrrect result page) Message-ID: <4965C1A6.1030306@biggjapan.com> Sorry, correct result page is : http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277 From aumanga at biggjapan.com Thu Jan 8 09:09:22 2009 From: aumanga at biggjapan.com (Ashika Umanga Umagiliya) Date: Thu, 08 Jan 2009 18:09:22 +0900 Subject: [Biojava-l] Genebank Webservices (corrrect result page) In-Reply-To: <4965C1A6.1030306@biggjapan.com> References: <4965C1A6.1030306@biggjapan.com> Message-ID: <4965C2C2.2030801@biggjapan.com> Greetings all, Sorry if this is reposted! I come from a computer science background and only have little knowledge in bioinformatics. In the application I develop,I want to search for an genebank id (like 4558277) from ncbi and want to retrieve the relavent PDB_ID. For example : Say for id '4558277', i get the result http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277 I can see the value '1F58_L' which is only significant to me. I want to know where there is any webservice to do retrive this information. 
That means, I send '4558277' in SOAP input parameters and in the result I should get the value '1F58_L' . I found following webservices and want to know whether I can use the one for 'Gene' : http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html thanks in advance, umanga Ashika Umanga Umagiliya wrote: > Sorry, correct result page is : > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277 > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From aumanga at biggjapan.com Thu Jan 8 08:58:09 2009 From: aumanga at biggjapan.com (Ashika Umanga Umagiliya) Date: Thu, 08 Jan 2009 17:58:09 +0900 Subject: [Biojava-l] Genebank Webservices ? Message-ID: <4965C021.2060109@biggjapan.com> Greetings all, I come from a computer science background and only have little knowledge in bioinformatics. In the application I develop,I want to search for an genebank id (like 4558277) from ncbi and want to retrieve the relavent PDB_ID. For example : Say for id '4558277', i get the result http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=4558277 I can see the value '1F58_L' which is only significant to me. I want to know where there is any webservice to do retrive this information. That means, I send '4558277' in SOAP input parameters and in the result I should get the value '1F58_L' . I found following webservices and want to know whether I can use the one for 'Gene' : http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html thanks in advance, umanga From holland at eaglegenomics.com Thu Jan 8 11:07:43 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 08 Jan 2009 11:07:43 +0000 Subject: [Biojava-l] Genebank Webservices ? 
In-Reply-To: <4965C021.2060109@biggjapan.com> References: <4965C021.2060109@biggjapan.com> Message-ID: <4965DE7F.8020209@eaglegenomics.com> There is no generic interface to NCBI eUtils in BioJava, but one is planned. In the meantime take a look at this existing BioJava 1.6 package, which will query Genbank for a sequence and return a BioJava RichSequence object containing the result. You can then search through the annotations and features of the sequence to find the result you need. This is for Gene records: http://biojava.org/docs/api/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.html Or the equivalent for Peptide records: http://biojava.org/docs/api/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.html cheers, Richard Ashika Umanga Umagiliya wrote: > Greetings all, > > I come from a computer science background and only have little knowledge > in bioinformatics. > In the application I develop,I want to search for an genebank id (like > 4558277) from ncbi and want to retrieve the relavent PDB_ID. > > For example : > Say for id '4558277', i get the result > > http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=4558277 > > I can see the value '1F58_L' which is only significant to me. > > I want to know where there is any webservice to do retrive this > information. > > That means, I send '4558277' in SOAP input parameters and in the result > I should get the value '1F58_L' . 
>
> I found following webservices and want to know whether I can use the one
> for 'Gene' :
>
> http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html
>
> thanks in advance,
> umanga
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andreas.prlic at gmail.com  Mon Jan 12 09:28:22 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Mon, 12 Jan 2009 10:28:22 +0100
Subject: [Biojava-l] BioJava
In-Reply-To: <496ac2075c8cf4.53511495@wp.pl>
References: <496ac2075c8cf4.53511495@wp.pl>
Message-ID: <59a41c430901120128l53f2a5c8le0a122a0a73515@mail.gmail.com>

Hi Michal,

the code you sent looks fine to me. Still, I am not sure if I fully
understand what you are trying to say. What do you mean by "each hit"?

From our previous discussion I understand that you work with two sets
of atoms (residues) where each position in one set corresponds to a
position in the other set. This means you know that all atoms are on
structurally equivalent positions and the two sets of atoms are of the
same size. If this is the case, then the SVDSuperimposer is the right
tool and you would include all atoms in the two sets for the RMSD
calculation.

If you work with 2 proteins where you do NOT know the structurally
equivalent positions at the start, then StructurePairAligner provides
an algorithm to align two proteins (of different length) and find
pairs of atoms (residues) on structurally equivalent positions. In
this case, the RMSD calculation considers only the positions that are
equivalent and ignores the unaligned regions.

Guess I should create a wiki page explaining this difference
between SVDSuperimposer and StructurePairAligner...

Andreas

2009/1/12 Michał
Lorenc :
> Dear Andreas,
> I used the SVDSuperimposer class, but after Calc.rotate and Calc.shift I
> would like to know which Atom is close to which other Atom.
>
> SVDSuperimposer.getRMS(caAtoms1, caAtoms2) only gives me the RMS value
> for the whole protein structure, but how could I get an RMS value for
> each hit?
>
> I attached my code. Thank you in advance!
>
> Best regards,
>
> Michal
>

From aumanga at biggjapan.com  Fri Jan 16 01:03:24 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Fri, 16 Jan 2009 10:03:24 +0900
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers?
Message-ID: <496FDCDC.5010805@biggjapan.com>

Greetings all,

I come from a computer science background and at the moment I work on
bioinformatics software. I really see the need to learn more about
bioinformatics, quickly :)
I hear (and blindly use) all these words - "sequence alignment, epitopes,
CDR, homology modeling, docking, amino acids" etc. - and at the moment
I don't care much about them, since I am told what should happen and I
implement it.
Where can I learn about these concepts easily, I mean for someone coming
from a mathematical and IT background?

Best regards,
umanga

From holland at eaglegenomics.com  Fri Jan 16 10:50:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 16 Jan 2009 10:50:35 +0000
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers?
In-Reply-To: <496FDCDC.5010805@biggjapan.com>
References: <496FDCDC.5010805@biggjapan.com>
Message-ID: <4970667B.9030601@eaglegenomics.com>

Your best bet is a good old-fashioned book. ;)

A quick search on Amazon threw up this one which looks like a very
helpful intro to cell biology for people like you (and me!)
who have come to bioinformatics from a computer science background: http://www.amazon.com/Bioinformatics-Genes-Proteins-Computers-Advanced/dp/1859960545/ref=sr_1_4?ie=UTF8&s=books&qid=1232102747&sr=1-4 Hopefully this is a good starting point. I'm sure everyone on this list has their own favourite books which they could recommend to you as well. cheers, Richard Ashika Umanga Umagiliya wrote: > Greetings all, > > I come from a computer science background and at the moment I work on a > Bioinformatics software.I really see the necessity to learn more on > bioinformatics , quickly :) > I hear (and use blindly)all this words - "sequence alignment , epitopes > , CDR , homology modeling ,docking,amino acids"...etc and at the moment > I don't care much about them since I've been told what to happen and I > implement it. > Where can i learn about this concepts easily , I mean for a guy come > from mathematical and IT background ?/ > > Best regards, > umanga > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Fri Jan 16 12:27:23 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Fri, 16 Jan 2009 20:27:23 +0800 Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers? In-Reply-To: <4970667B.9030601@eaglegenomics.com> References: <496FDCDC.5010805@biggjapan.com> <4970667B.9030601@eaglegenomics.com> Message-ID: <93b45ca50901160427l5941f82dy18b68f5000c32722@mail.gmail.com> Wikipedia is always a good place to get a very rapid overview of some unfamiliar biological term. - Mark On Fri, Jan 16, 2009 at 6:50 PM, Richard Holland wrote: > > Your best bet is a good old fashioned book. 
;) > > A quick search on Amazon threw up this one which looks like a very > helpful intro to cell biology for people like you (and me!) who have > come to bioinformatics from a computer science background: > > http://www.amazon.com/Bioinformatics-Genes-Proteins-Computers-Advanced/dp/1859960545/ref=sr_1_4?ie=UTF8&s=books&qid=1232102747&sr=1-4 > > Hopefully this is a good starting point. I'm sure everyone on this list > has their own favourite books which they could recommend to you as well. > > cheers, > Richard > > > Ashika Umanga Umagiliya wrote: > > Greetings all, > > > > I come from a computer science background and at the moment I work on a > > Bioinformatics software.I really see the necessity to learn more on > > bioinformatics , quickly :) > > I hear (and use blindly)all this words - "sequence alignment , epitopes > > , CDR , homology modeling ,docking,amino acids"...etc and at the moment > > I don't care much about them since I've been told what to happen and I > > implement it. > > Where can i learn about this concepts easily , I mean for a guy come > > from mathematical and IT background ?/ > > > > Best regards, > > umanga > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From koen.bruynseels at cropdesign.com Fri Jan 16 13:09:58 2009 From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com) Date: Fri, 16 Jan 2009 14:09:58 +0100 Subject: [Biojava-l] Koen Bruynseels is out of the office. Message-ID: I will be out of the office starting 01/14/2009 and will not return until 01/25/2009. 
I will respond to your message when I return. From aumanga at biggjapan.com Mon Jan 19 00:43:31 2009 From: aumanga at biggjapan.com (Ashika Umanga Umagiliya) Date: Mon, 19 Jan 2009 09:43:31 +0900 Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers? In-Reply-To: <4970667B.9030601@eaglegenomics.com> References: <496FDCDC.5010805@biggjapan.com> <4970667B.9030601@eaglegenomics.com> Message-ID: <4973CCB3.4000008@biggjapan.com> Thanks everyone for the tips.. I started reading "BioInformatics for Dummies" to get the basics..then hoping to move on to the book Richard recommended. Thank you again, Best regards, umanga Richard Holland wrote: > Your best bet is a good old fashioned book. ;) > > A quick search on Amazon threw up this one which looks like a very > helpful intro to cell biology for people like you (and me!) who have > come to bioinformatics from a computer science background: > > http://www.amazon.com/Bioinformatics-Genes-Proteins-Computers-Advanced/dp/1859960545/ref=sr_1_4?ie=UTF8&s=books&qid=1232102747&sr=1-4 > > Hopefully this is a good starting point. I'm sure everyone on this list > has their own favourite books which they could recommend to you as well. > > cheers, > Richard > > > Ashika Umanga Umagiliya wrote: > >> Greetings all, >> >> I come from a computer science background and at the moment I work on a >> Bioinformatics software.I really see the necessity to learn more on >> bioinformatics , quickly :) >> I hear (and use blindly)all this words - "sequence alignment , epitopes >> , CDR , homology modeling ,docking,amino acids"...etc and at the moment >> I don't care much about them since I've been told what to happen and I >> implement it. 
>> Where can i learn about this concepts easily , I mean for a guy come
>> from mathematical and IT background ?/
>>
>> Best regards,
>> umanga
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>>
>
>

From marcel.huntemann at gmail.com  Wed Jan 21 02:42:14 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Tue, 20 Jan 2009 18:42:14 -0800
Subject: [Biojava-l] How to get translated sequence out of blast result
Message-ID: <49768B86.20707@Gmail.com>

Hi!

I have a multiple fasta file with a lot of nucleotide sequences in it. I
ran a blastx with this file against a database. Now I want to parse the
blast result. To be more precise: I want to get the translated protein
query sequence with its start and stop position for each hit.
I am using the example code from the BioJava cookbook
(http://biojava.org/wiki/BioJava:CookBook:Blast:Parser).
The parsing works fine so far, apart from one problem. I am able to get
the start and stop position for the query sequence via
hit.getQueryStart() and hit.getQueryEnd(). But I haven't figured out yet
how to get the translated protein query sequence out of the blast result.
I couldn't find something like hit.getQuerySequence() or similar. I would
guess that something like that already exists somewhere, or am I wrong
and I have to implement it myself?

Thanks,
Marcel

From markjschreiber at gmail.com  Thu Jan 22 02:30:54 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 22 Jan 2009 10:30:54 +0800
Subject: [Biojava-l] Off topic: JDK6 and JAX-WS 2.1
Message-ID: <93b45ca50901211830i5af3e213p9db9a6d10f42fa75@mail.gmail.com>

Sorry for the off topic post but this is something that has caused me
to lose quite a bit of hair recently.

If you're planning on doing webservice development with JAX-WS, don't
use JDK6 unless you use a version more recent than update 3.
I'll spare you the gory details, but JDK6u4 and later versions ship
JAX-WS 2.1, which removes the need for playing with endorsed
directories etc., which is very tricky in IDEs and not uncomplicated
with Ant.

- Mark

From marcel.huntemann at gmail.com  Fri Jan 23 00:48:57 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 22 Jan 2009 16:48:57 -0800
Subject: [Biojava-l] Problem with blast file parser
Message-ID: <497913F9.70009@Gmail.com>

Hi!

I am experiencing a strange problem with the Blast parser. I am using
the code from the BioJava CookBook
(http://biojava.org/wiki/BioJava:CookBook:Blast:Parser). The blast file
contains the results for 20 contigs. The problem is that the parser only
gives me the results of every other sequence. So I get the results for
contig # 1, 3, 5, 7, 9, 11 and then it continues with the even ones 12,
14, 16, 18 and 20.
Has anyone experienced the same problem, or does anyone know what
causes it?

Thanks,
Marcel

From charles at imbusch.net  Fri Jan 23 16:17:32 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Fri, 23 Jan 2009 17:17:32 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497913F9.70009@Gmail.com>
References: <497913F9.70009@Gmail.com>
Message-ID: <4979ED9C.6040207@imbusch.net>

Hello Marcel,

I also experience the problem that the parser is skipping
the even result numbers. I have not found a satisfactory solution
for that, so I gave up on parsing a blast result file containing
multiple results. Instead I split the big fasta file into
several smaller ones, so that I get just one result per fasta file.
That works, even if it's not the best solution.

Let me know if you find another solution for that problem.
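
The splitting workaround above could be sketched in plain Java along
these lines. This is only an illustration and not BioJava code: the
class name FastaSplitter, its methods and the query_N.fasta file naming
are made up for the example, and it assumes the input is small enough
to read into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of BioJava.
class FastaSplitter {

    /**
     * Splits multi-FASTA text into one string per record.
     * Each record starts with its '>' header line; text before the
     * first header and empty lines are dropped.
     */
    static List<String> splitRecords(String multiFasta) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : multiFasta.split("\\r?\\n")) {
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder();
            }
            if (current != null && !line.isEmpty()) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    /** Writes each record to its own file (query_1.fasta, query_2.fasta, ...) in outDir. */
    static void writeRecords(Path input, Path outDir) throws IOException {
        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        Files.createDirectories(outDir);
        int n = 0;
        for (String record : splitRecords(text)) {
            n++;
            Path out = outDir.resolve("query_" + n + ".fasta");
            Files.write(out, record.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Each resulting file can then be run through BLAST on its own, so that
every output file holds a single result.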
Cheers, Charles From markjschreiber at gmail.com Sat Jan 24 02:20:36 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sat, 24 Jan 2009 10:20:36 +0800 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <4979ED9C.6040207@imbusch.net> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> Message-ID: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> Is this XML parsing or blast text output? - Mark On Sat, Jan 24, 2009 at 12:17 AM, Charles Imbusch wrote: > Hello Marcel, > > I also do experience the problem that the parser is skipping > the even result numbers. I have not found a sufficient solution > for that, so I gave up on parsing on a blast result file containing > multiple results. Instead I splitted up the big fasta file into > serveral ones, so that I just get one result for one fasta file. > That works, even it's not the best solution for it. > > Let me know if you find another solution for that problem. > > Cheers, > Charles > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From marcel.huntemann at gmail.com Sat Jan 24 03:54:02 2009 From: marcel.huntemann at gmail.com (Marcel Huntemann) Date: Fri, 23 Jan 2009 19:54:02 -0800 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> Message-ID: <497A90DA.5070104@Gmail.com> As I said, I am using the code from http://biojava.org/wiki/BioJava:CookBook:Blast:Parser. I have a normal text file that was created by blast. I thought that the given code converts the input stream from the file into SAX events. Do I have to do another step, before I use the code of that example? Cheers, Marcel Mark Schreiber wrote: > Is this XML parsing or blast text output? 
> > - Mark > > On Sat, Jan 24, 2009 at 12:17 AM, Charles Imbusch > wrote: > > Hello Marcel, > > I also do experience the problem that the parser is skipping > the even result numbers. I have not found a sufficient solution > for that, so I gave up on parsing on a blast result file containing > multiple results. Instead I splitted up the big fasta file into > serveral ones, so that I just get one result for one fasta file. > That works, even it's not the best solution for it. > > Let me know if you find another solution for that problem. > > Cheers, > Charles > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From charles at imbusch.net Sun Jan 25 11:54:53 2009 From: charles at imbusch.net (Charles Imbusch) Date: Sun, 25 Jan 2009 12:54:53 +0100 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> Message-ID: <497C530D.7090006@imbusch.net> Hello Mark, same here. I'm parsing plain text output. Cheers, Charles Mark Schreiber schrieb: > Is this XML parsing or blast text output? > > - Mark From markjschreiber at gmail.com Mon Jan 26 02:45:25 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 26 Jan 2009 10:45:25 +0800 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <497C530D.7090006@imbusch.net> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> Message-ID: <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> Have you tried parsing the XML output. 
While the text parser does 'fake' XML by emitting SAX events, I think
that the XML parser may be a lot more stable (the XML output of blast
is more stable, at least recently anyway).

I know this isn't the best solution to your problem, but the default
text output of BLAST is not the most parseable. In fact it is probably
the least machine-readable of all the blast outputs and definitely the
least stable.

- Mark

On Sun, Jan 25, 2009 at 7:54 PM, Charles Imbusch wrote:
>
> Hello Mark,
>
> same here. I'm parsing plain text output.
>
> Cheers,
> Charles
>
> Mark Schreiber wrote:
>>
>> Is this XML parsing or blast text output?
>>
>> - Mark

From ahmed.elmasri at gmail.com  Mon Jan 26 06:35:20 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Mon, 26 Jan 2009 01:35:20 -0500
Subject: [Biojava-l] Depreciated methods
Message-ID: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>

Hello list,
I am new to BioJava and I have been trying some of its examples. I came
across some deprecated methods and I am not sure whether they will be
removed entirely any time soon. I also found some of the examples
problematic: they didn't run properly even though I followed the
instructions stated in the comment section.
Please let me know if you have answers to my questions.
Best wishes,
Ahmed

-- 
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
Marine Biological Laboratory Woods Hole, MA
--
Ph.D. student, Complex Systems
School of Informatics, Indiana University

From holland at eaglegenomics.com  Mon Jan 26 08:10:42 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 08:10:42 +0000
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
Message-ID: <497D7002.3010101@eaglegenomics.com>

> I am new to BioJava and I have been trying some of its examples.
I came > across some of the depreciated methods and I am not sure if they will be > removed entirely any time soon? Deprecated = may be removed without notice in any future release. I couldn't say for any individual method, but in general it's a bad idea to use anything that is marked deprecated when writing new code. > I also found some of the examples > problematic and didn't run properly even though I am following the > instructions stated in the comment section. Could you specify exactly which examples didn't work, and the exact problems you had with them? thanks, Richard > Please let me know if you have answers for my questions. > Best wishes, > Ahmed > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From charles at imbusch.net Mon Jan 26 11:04:09 2009 From: charles at imbusch.net (Charles Imbusch) Date: Mon, 26 Jan 2009 12:04:09 +0100 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> Message-ID: <497D98A9.6010904@imbusch.net> Hello Mark, no I haven't tried to parse XML output from Blast. Just because of the fact that plain text output can be viewed with any editor. That's still very convenient. But I'm keen: is there actually an easy to use program for viewing XML output? Another option would be to generate XML and plain text output from Blast at the same time (in one run). But I couldn't find a way to do so. Maybe I missed something? Cheers, Charles Mark Schreiber schrieb: > Have you tried parsing the XML output. 
While the text parser does > 'fake' XML by emitting SAX events I think that the XML parser may be a > lot more stable (the XML output of blast is more stable, at least > recently anyway). > > I know this isn't the best solution to your problem but the default > text output of BLAST is not the most parseable. In fact it is probably > the least machine readable of all the blast outputs and definitely the > least stable. > > - Mark > From holland at eaglegenomics.com Mon Jan 26 11:21:26 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 26 Jan 2009 11:21:26 +0000 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <497D98A9.6010904@imbusch.net> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> Message-ID: <497D9CB6.4010403@eaglegenomics.com> This app looks hopeful for viewing blast output - haven't tried it though...: http://www.korilog.com/index.php/BlastViewer.html Otherwise, no there's no way of making Blast output in more than one format at once. It's either text, or XML, but it won't do both. cheers, Richard Charles Imbusch wrote: > Hello Mark, > > no I haven't tried to parse XML output from Blast. > Just because of the fact that plain text output can be > viewed with any editor. That's still very convenient. > > But I'm keen: is there actually an easy to use program for viewing > XML output? > Another option would be to generate XML and plain text > output from Blast at the same time (in one run). But I couldn't find a > way to do so. Maybe I missed something? > > Cheers, > Charles > > Mark Schreiber schrieb: >> Have you tried parsing the XML output. 
While the text parser does >> 'fake' XML by emitting SAX events I think that the XML parser may be a >> lot more stable (the XML output of blast is more stable, at least >> recently anyway). >> >> I know this isn't the best solution to your problem but the default >> text output of BLAST is not the most parseable. In fact it is probably >> the least machine readable of all the blast outputs and definitely the >> least stable. >> >> - Mark >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ahmed.elmasri at gmail.com Mon Jan 26 17:24:20 2009 From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.) Date: Mon, 26 Jan 2009 12:24:20 -0500 Subject: [Biojava-l] Depreciated methods In-Reply-To: <497D7002.3010101@eaglegenomics.com> References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com> <497D7002.3010101@eaglegenomics.com> Message-ID: <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com> Thanks Richard! For the deprecated methods, is there a reference or a wiki that maps the deprecated ones to ones that should be used instead? As for the examples I am having trouble with: WriteToFasta is one. Here is the error that I am getting: java.lang.IllegalArgumentException: No alphabet was set in the identifier at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:928) at examples.WriteToFasta.main(WriteToFasta.java:43) And here is my parameter value: //get the int constant for the file type int fileType = Integer.parseInt("2"); I would appreciate any help. Best wishes, Ahmed On Mon, Jan 26, 2009 at 3:10 AM, Richard Holland wrote: > > I am new to BioJava and I have been trying some of its examples. 
I came
> > across some of the deprecated methods and I am not sure if they will be
> > removed entirely any time soon?
>
> Deprecated = may be removed without notice in any future release. I
> couldn't say for any individual method, but in general it's a bad idea
> to use anything that is marked deprecated when writing new code.
>
> > I also found some of the examples
> > problematic and didn't run properly even though I am following the
> > instructions stated in the comment section.
>
> Could you specify exactly which examples didn't work, and the exact
> problems you had with them?
>
> thanks,
> Richard
>
> > Please let me know if you have answers for my questions.
> > Best wishes,
> > Ahmed
> >
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>

--
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
Marine Biological Laboratory Woods Hole, MA
--
Ph.D. student, Complex Systems
School of Informatics, Indiana University

From holland at eaglegenomics.com Mon Jan 26 17:30:00 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 17:30:00 +0000
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
 <497D7002.3010101@eaglegenomics.com>
 <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>
Message-ID: <497DF318.8070703@eaglegenomics.com>

Most methods include a note saying which method should be used instead.
For those that don't, take a look in the org.biojavax packages to see if
there are suitable alternative classes.

In the case of the deprecated SeqIOTools.fileToBiojava, a much better
version of the FASTA parser/writer exists in the org.biojavax packages.
Instructions on how to use it are here:

http://www.biojava.org/wiki/BioJava:BioJavaXDocs#Example

cheers,
Richard

Hamed, Ahmed A.
wrote: > Thanks Richard! > > For the deprecated methods, is there a reference or a wiki that maps the > deprecated ones to ones that should be used instead? > > As for the examples I am having trouble with: WriteToFasta is one. Here > is the error that I am getting: > java.lang.IllegalArgumentException: No alphabet was set in the identifier > at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:928) > at examples.WriteToFasta.main(WriteToFasta.java:43) > > And here is my parameter value: > //get the int constant for the file type > int fileType = Integer.parseInt("2"); > > I would appreciate any help. > Best wishes, > Ahmed > > > > > On Mon, Jan 26, 2009 at 3:10 AM, Richard Holland > > wrote: > > > I am new to BioJava and I have been trying some of its examples. I > came > > across some of the depreciated methods and I am not sure if they > will be > > removed entirely any time soon? > > Deprecated = may be removed without notice in any future release. I > couldn't say for any individual method, but in general it's a bad idea > to use anything that is marked deprecated when writing new code. > > > I also found some of the examples > > problematic and didn't run properly even though I am following the > > instructions stated in the comment section. > > Could you specify exactly which examples didn't work, and the exact > problems you had with them? > > thanks, > Richard > > > Please let me know if you have answers for my questions. > > Best wishes, > > Ahmed > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > > > > -- > Ahmed Abdeen Hamed > Scientific Informatics Project Leader > Marine Biological Laboratory Woods Hole, MA > -- > Ph.D. 
student, Complex Systems > School of Informatics, Indiana University > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Tue Jan 27 00:55:32 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 27 Jan 2009 08:55:32 +0800 Subject: [Biojava-l] Problem with blast file parser In-Reply-To: <497D98A9.6010904@imbusch.net> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> Message-ID: <93b45ca50901261655n727219cdnd1ee27bf3b0d31e6@mail.gmail.com> You can generate plain text from XML using an XSLT. It probably won't be identical to the BLAST text output but that format is not so stable anyway. On Mon, Jan 26, 2009 at 7:04 PM, Charles Imbusch wrote: > Hello Mark, > > no I haven't tried to parse XML output from Blast. > Just because of the fact that plain text output can be > viewed with any editor. That's still very convenient. > > But I'm keen: is there actually an easy to use program for viewing > XML output? > Another option would be to generate XML and plain text > output from Blast at the same time (in one run). But I couldn't find a > way to do so. Maybe I missed something? > > Cheers, > Charles > > Mark Schreiber schrieb: >> >> Have you tried parsing the XML output. While the text parser does >> 'fake' XML by emitting SAX events I think that the XML parser may be a >> lot more stable (the XML output of blast is more stable, at least >> recently anyway). >> >> I know this isn't the best solution to your problem but the default >> text output of BLAST is not the most parseable. In fact it is probably >> the least machine readable of all the blast outputs and definitely the >> least stable. 
>>
>> - Mark
>>
>
>

From nir at rosettadesigngroup.com Tue Jan 27 13:08:34 2009
From: nir at rosettadesigngroup.com (Nir London)
Date: Tue, 27 Jan 2009 15:08:34 +0200
Subject: [Biojava-l] Rosetta Academic Training Workshop
Message-ID: <2F3C5E9F-418B-489F-A852-3A99248D03AC@rosettadesigngroup.com>

Due to public demand, "Rosetta Design Group" is organizing a "Rosetta"
software training workshop, aimed at academic groups.

The format of the workshop will be a "webinar" - a web seminar, enabling
more groups to attend while avoiding the annoying jet lag and
accommodation troubles.

Would you be interested in participating? If so, please fill in the form
located at:
http://rosettadesigngroup.com/blog/rosetta-academic-workshop/
and we will contact you when the details are finalized.*

Nir London | Rosetta Design Group
http://rosettadesigngroup.com/

* If you're not from an academic group, don't worry, write to us anyway...

From gwu at molbio.mgh.harvard.edu Thu Jan 29 04:51:28 2009
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Wed, 28 Jan 2009 23:51:28 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <497D9CB6.4010403@eaglegenomics.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
 <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
 <497C530D.7090006@imbusch.net>
 <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
 <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com>
Message-ID: <498135D0.9060103@molbio.mgh.harvard.edu>

Hi Everyone,

I have a piece of code that parses a GenBank file and retrieves gene
sequences and related information. It works well with sequences such as
Arabidopsis thaliana, C. elegans and Bos taurus, but it fails with Mus
musculus chromosome 2. The contig that the code fails on is the largest
one in my test. Contig NT_039207 has 116366104 bp, but the code reports
it as cut to 100000020 bp. That puts some gene coordinates out of range.
The code is attached. Can anyone give some suggestions?
The Mus musculus Genbank file can be downloaded at:
ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz

Thanks in advance

Gang
==========================================
public class TestMus {
    public void testMusChr2() throws FileNotFoundException,
            NoSuchElementException, BioException {
        String fp="/tmp/mm_alt_chr2.gbk";
        System.out.println("File: " + fp);
        BufferedReader gReader = new BufferedReader(new InputStreamReader(new
                FileInputStream(new File(fp))));
        Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
        RichSequenceIterator seqI =
                RichSequence.IOTools.readGenbankDNA(gReader, ns);
        while (seqI.hasNext()) {
            RichSequence seq = seqI.nextRichSequence();
            String organism = seq.getTaxon().getDisplayName();
            String accession = seq.getAccession();
            String identifier = seq.getIdentifier();
            int taxonID = seq.getTaxon().getNCBITaxID();
            String division = seq.getDivision();
            String seqVersion = "" + seq.getSeqVersion();
            int seqLength = seq.length();
            String description = seq.getDescription();
            System.out.println("Organism: " + organism
                    + "\nAccession: " + accession
                    + "\nIdentifier: " + identifier
                    + "\nTaxonID: " + taxonID
                    + "\nDivision: " + division
                    + "\nSeqVersion: " + seqVersion
                    + "\nLength: " + seqLength);
            System.out.println("2041-2101: " + seq.subStr(2041, 2101));
            for (Iterator i = seq.features(); i.hasNext();) {
                RichFeature f = (RichFeature) i.next();
                int rank = f.getRank();
                String fType = f.getType();
                if (fType.toLowerCase().equals("gene")) {
                    int startPos=f.getLocation().getMin();
                    int endPos=f.getLocation().getMax();
                    int geneLen=endPos-startPos+1;
                    String sequence=seq.subStr(startPos, endPos);
                    String strand = f.getStrand().getToken() + "";
                    Annotation ann = (Annotation) f.getAnnotation();
                    String geneIdentifier ="";
                    if (ann.containsProperty("locus_tag")) {
                        geneIdentifier=ann.getProperty("locus_tag") + "";
                    }
                    else geneIdentifier=ann.getProperty("gene") + "";

                    String alternativeIdentifiers="";
                    try {
                        alternativeIdentifiers= (String) ann.getProperty("gene");
                    } catch(NoSuchElementException e) {}
                    String annotation="";
                    System.out.println(rank + "\t" + geneIdentifier + "\t" + alternativeIdentifiers + "\t"
                            + startPos + "\t" + endPos + "\t" + geneLen + "\t" + strand);
                }
            }
        }
    }
    public static void main(String [] args) throws Exception {
        TestMus tm=new TestMus();
        tm.testMusChr2();
    }
}

From markjschreiber at gmail.com Thu Jan 29 05:43:35 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 29 Jan 2009 13:43:35 +0800
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <498135D0.9060103@molbio.mgh.harvard.edu>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
 <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
 <497C530D.7090006@imbusch.net>
 <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
 <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com>
 <498135D0.9060103@molbio.mgh.harvard.edu>
Message-ID: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>

I assume that the downloaded file has the complete sequence in it?
Probably worth checking that it has the complete sequence block (all
116366104 bp).

- Mark

On Thu, Jan 29, 2009 at 12:51 PM, gang wu wrote:

> Hi Everyone,
>
> I have a piece of code to parse Genbank file and retrieve gene sequence and
> related information. It works well with sequences such as Arabidopsis
> thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome
> 2. The contig that the code failed on is the largest one in my test. Contig
> NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp.
> That causes some gene coordinates out of range. Attached is the code. Can
> anyone give some suggestion?
> > The Mus musculus Genbank file can be downloaded at : > ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz > > Thanks in advance > > Gang > ========================================== > public class TestMus { > public void testMusChr2() throws FileNotFoundException, > NoSuchElementException, BioException { > String fp="/tmp/mm_alt_chr2.gbk"; > System.out.println("File: " + fp); > BufferedReader gReader = new BufferedReader(new InputStreamReader(new > FileInputStream(new File(fp)))); > Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace(); > RichSequenceIterator seqI = > RichSequence.IOTools.readGenbankDNA(gReader, ns); > while (seqI.hasNext()) { > RichSequence seq = seqI.nextRichSequence(); > String organism = seq.getTaxon().getDisplayName(); > String accession = seq.getAccession(); > String identifier = seq.getIdentifier(); > int taxonID = seq.getTaxon().getNCBITaxID(); > String division = seq.getDivision(); > String seqVersion = "" + seq.getSeqVersion(); > int seqLength = seq.length(); > String description = seq.getDescription(); > System.out.println("Organism: " + organism > + "\nAccession: " + accession > + "\nIdentifier: " + identifier > + "\nTaxonID: " + taxonID > + "\nDivision: " + division > + "\nSeqVersion: " + seqVersion > + "\nLength: " + seqLength); > System.out.println("2041-2101: " + seq.subStr(2041, 2101)); > for (Iterator i = seq.features(); i.hasNext();) { > RichFeature f = (RichFeature) i.next(); > int rank = f.getRank(); > String fType = f.getType(); > if (fType.toLowerCase().equals("gene")) { > int startPos=f.getLocation().getMin(); > int endPos=f.getLocation().getMax(); > int geneLen=endPos-startPos+1; > String sequence=seq.subStr(startPos, endPos); > String strand = f.getStrand().getToken() + ""; > Annotation ann = (Annotation) f.getAnnotation(); > String geneIdentifier =""; > if (ann.containsProperty("locus_tag")) { > geneIdentifier=ann.getProperty("locus_tag") + ""; > } > else 
geneIdentifier=ann.getProperty("gene") + ""; > > String alternativeIdentifiers=""; > try { > alternativeIdentifiers= (String) > ann.getProperty("gene"); > > } catch(NoSuchElementException e) {} > String annotation=""; > System.out.println(rank + "\t" + geneIdentifier + "\t" + > alternativeIdentifiers + "\t" > + startPos + "\t" + endPos + "\t" + geneLen + > "\t" + strand); > } > } > } > } > public static void main(String [] args) throws Exception { > TestMus tm=new TestMus(); > tm.testMusChr2(); > } > } > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From holland at eaglegenomics.com Thu Jan 29 07:25:10 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 29 Jan 2009 07:25:10 +0000 Subject: [Biojava-l] Genbank file parser error In-Reply-To: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com> <498135D0.9060103@molbio.mgh.harvard.edu> <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com> Message-ID: <498159D6.8010906@eaglegenomics.com> Gabrielle Doan posted a solution to this a while back and I believe the changes have been committed already: http://www.mail-archive.com/biojava-l at lists.open-bio.org/msg01036.html How old is the copy of BioJava that you're using? Have you tried checking out the trunk from Subversion to see if that works? cheers, Richard Mark Schreiber wrote: > I assume that the downloaded file has the complete sequence in it? Probably > worth checking that it has the complete sequence block (all 116366104 bp). 
> > - Mark > > On Thu, Jan 29, 2009 at 12:51 PM, gang wu wrote: > >> Hi Everyone, >> >> I have a piece of code to parse Genbank file and retrieve gene sequence and >> related information. It works well with sequences such as Arabidopsis >> thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome >> 2. The contig that the code failed on is the largest one in my test. Contig >> NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp. >> That causes some gene coordinates out of range. Attached is the code. Can >> anyone give some suggesttion? >> >> The Mus musculus Genbank file can be downloaded at : >> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz >> >> Thanks in advance >> >> Gang >> ========================================== >> public class TestMus { >> public void testMusChr2() throws FileNotFoundException, >> NoSuchElementException, BioException { >> String fp="/tmp/mm_alt_chr2.gbk"; >> System.out.println("File: " + fp); >> BufferedReader gReader = new BufferedReader(new InputStreamReader(new >> FileInputStream(new File(fp)))); >> Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace(); >> RichSequenceIterator seqI = >> RichSequence.IOTools.readGenbankDNA(gReader, ns); >> while (seqI.hasNext()) { >> RichSequence seq = seqI.nextRichSequence(); >> String organism = seq.getTaxon().getDisplayName(); >> String accession = seq.getAccession(); >> String identifier = seq.getIdentifier(); >> int taxonID = seq.getTaxon().getNCBITaxID(); >> String division = seq.getDivision(); >> String seqVersion = "" + seq.getSeqVersion(); >> int seqLength = seq.length(); >> String description = seq.getDescription(); >> System.out.println("Organism: " + organism >> + "\nAccession: " + accession >> + "\nIdentifier: " + identifier >> + "\nTaxonID: " + taxonID >> + "\nDivision: " + division >> + "\nSeqVersion: " + seqVersion >> + "\nLength: " + seqLength); >> System.out.println("2041-2101: " + seq.subStr(2041, 2101)); >> 
for (Iterator i = seq.features(); i.hasNext();) { >> RichFeature f = (RichFeature) i.next(); >> int rank = f.getRank(); >> String fType = f.getType(); >> if (fType.toLowerCase().equals("gene")) { >> int startPos=f.getLocation().getMin(); >> int endPos=f.getLocation().getMax(); >> int geneLen=endPos-startPos+1; >> String sequence=seq.subStr(startPos, endPos); >> String strand = f.getStrand().getToken() + ""; >> Annotation ann = (Annotation) f.getAnnotation(); >> String geneIdentifier =""; >> if (ann.containsProperty("locus_tag")) { >> geneIdentifier=ann.getProperty("locus_tag") + ""; >> } >> else geneIdentifier=ann.getProperty("gene") + ""; >> >> String alternativeIdentifiers=""; >> try { >> alternativeIdentifiers= (String) >> ann.getProperty("gene"); >> >> } catch(NoSuchElementException e) {} >> String annotation=""; >> System.out.println(rank + "\t" + geneIdentifier + "\t" + >> alternativeIdentifiers + "\t" >> + startPos + "\t" + endPos + "\t" + geneLen + >> "\t" + strand); >> } >> } >> } >> } >> public static void main(String [] args) throws Exception { >> TestMus tm=new TestMus(); >> tm.testMusChr2(); >> } >> } >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From jw12 at sanger.ac.uk Thu Jan 29 11:20:47 2009 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Thu, 29 Jan 2009 11:20:47 +0000 Subject: [Biojava-l] closing this week: Registrations for DAS workshop. 
Message-ID: <207B56E3-C65C-41A4-800E-AF0B9F158CA6@sanger.ac.uk>

DAS is currently being used to share annotations on genomes, protein
alignments, structural and interaction information. If you are
interested in sharing biological information the DAS workshop below may
be of interest to you.

Registration is open for the 2009 DAS workshop (8,9,10th March) at the
Genome Campus, Hinxton UK. If you are interested in attending, please
find out more by going to http://www.dasregistry.org/course.jsp and
register via the web link at the bottom of the page.

This workshop will cater for novice to expert DAS users as each day is
optional.

Closing date for registration is 1st Feb 2009. If you register now you
can change the details of your registration any time up until this
closing date. Please register early as places will be limited.

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

From holland at eaglegenomics.com Thu Jan 29 16:10:42 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 29 Jan 2009 16:10:42 +0000
Subject: [Biojava-l] Eagle Genomics is hiring
Message-ID: <4981D502.1000905@eaglegenomics.com>

Hi all,

Apologies if this is inappropriate for the list, but I thought it would
be a good way to reach the kind of people we're looking for.

Richard

=====

Senior Bioinformatics Software Developer
Eagle Genomics Ltd., Cambridge, UK
http://www.eaglegenomics.com/

We are a young and exciting bioinformatics company looking to
revolutionise the way in which industry and academia work together. We
are based at the heart of Europe's largest biotech cluster in Cambridge,
UK. As we expand our client base, we're looking to build a talented and
committed team of experts.

We are currently looking for a software developer to work on a wide
range of complex projects, and who is happy to work face-to-face with
our customers. Ideally you will have had substantial prior experience
working in a life science company or research institute, however we will
also consider graduates with a track record in bioinformatics.

In addition to your superb technical skills, you will also:

* have the ability to quickly translate scientific problems into real
  software solutions,
* be able to put technical concepts into simple language for end users
  to understand,
* be able to pick up new skills and techniques in record time,
* work well in a collaborative team environment,
* be creative, innovative, and forward-thinking.

You will have hands-on experience in some of the following:

* Java,
* Perl,
* SQL query design,
* Relational database schema design,
* Open-source bioinformatics toolkits such as BioJava, BioPerl,
  BioSQL, etc.,
* Ensembl,
* BioMart,
* DAS,
* Taverna,
* Oracle Life Sciences Platform,
* Oracle database administration,
* MySQL database administration,
* VMware virtual machines,
* Grid computing and parallelisation.

The preferred candidate will be able to work from our offices in
Cambridge, but we would also consider telecommuting arrangements. We
offer a competitive salary and a range of company benefits.

To apply, please send your CV and cover letter as PDF documents to
jobs at eaglegenomics.com. If you have any questions about the position
or would like to discuss it further before applying, please use the same
email address.

We are only able to offer positions to EEA citizens and permanent
residents, or Tier 1 migrants under the new UK points-based immigration
scheme. Individual contracting arrangements could be considered but we
will prefer those candidates who can work with us as employees.

No agencies please.
-- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From gwu at molbio.mgh.harvard.edu Thu Jan 29 18:40:06 2009 From: gwu at molbio.mgh.harvard.edu (gwu) Date: Thu, 29 Jan 2009 13:40:06 -0500 Subject: [Biojava-l] Genbank file parser error In-Reply-To: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com> References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net> <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com> <497C530D.7090006@imbusch.net> <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com> <497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com> <498135D0.9060103@molbio.mgh.harvard.edu> <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com> Message-ID: <4981F806.7070100@molbio.mgh.harvard.edu> Thanks Mark. I did parse out the sequence block with sed and the length agrees with what the Genbank says. Gang Mark Schreiber wrote: > I assume that the downloaded file has the complete sequence in it? > Probably worth checking that it has the complete sequence block (all > 116366104 bp). > > - Mark > > On Thu, Jan 29, 2009 at 12:51 PM, gang wu > wrote: > > Hi Everyone, > > I have a piece of code to parse Genbank file and retrieve gene > sequence and related information. It works well with sequences > such as Arabidopsis thaliana, C. elegans, Bos taurus. But it > failed with Mus musculus chromosome 2. The contig that the code > failed on is the largest one in my test. Contig NT_039207 has > 116366104 bp, but the code shows it's cut to 100000020 bp. That > causes some gene coordinates out of range. Attached is the code. > Can anyone give some suggesttion? 
>
>     The Mus musculus Genbank file can be downloaded at:
>     ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>
>     Thanks in advance
>
>     Gang
>     ==========================================
>     import java.io.BufferedReader;
>     import java.io.File;
>     import java.io.FileInputStream;
>     import java.io.FileNotFoundException;
>     import java.io.InputStreamReader;
>     import java.util.Iterator;
>     import java.util.NoSuchElementException;
>
>     import org.biojava.bio.Annotation;
>     import org.biojava.bio.BioException;
>     import org.biojavax.Namespace;
>     import org.biojavax.RichObjectFactory;
>     import org.biojavax.bio.seq.RichFeature;
>     import org.biojavax.bio.seq.RichSequence;
>     import org.biojavax.bio.seq.RichSequenceIterator;
>
>     public class TestMus {
>         public void testMusChr2() throws FileNotFoundException,
>                 NoSuchElementException, BioException {
>             String fp = "/tmp/mm_alt_chr2.gbk";
>             System.out.println("File: " + fp);
>             BufferedReader gReader = new BufferedReader(
>                     new InputStreamReader(new FileInputStream(new File(fp))));
>             Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
>             RichSequenceIterator seqI =
>                     RichSequence.IOTools.readGenbankDNA(gReader, ns);
>             while (seqI.hasNext()) {
>                 // Per-contig summary
>                 RichSequence seq = seqI.nextRichSequence();
>                 String organism = seq.getTaxon().getDisplayName();
>                 String accession = seq.getAccession();
>                 String identifier = seq.getIdentifier();
>                 int taxonID = seq.getTaxon().getNCBITaxID();
>                 String division = seq.getDivision();
>                 String seqVersion = "" + seq.getSeqVersion();
>                 int seqLength = seq.length();
>                 String description = seq.getDescription();
>                 System.out.println("Organism: " + organism
>                         + "\nAccession: " + accession
>                         + "\nIdentifier: " + identifier
>                         + "\nTaxonID: " + taxonID
>                         + "\nDivision: " + division
>                         + "\nSeqVersion: " + seqVersion
>                         + "\nLength: " + seqLength);
>                 System.out.println("2041-2101: " + seq.subStr(2041, 2101));
>                 // One output row per "gene" feature
>                 for (Iterator i = seq.features(); i.hasNext();) {
>                     RichFeature f = (RichFeature) i.next();
>                     int rank = f.getRank();
>                     String fType = f.getType();
>                     if (fType.toLowerCase().equals("gene")) {
>                         int startPos = f.getLocation().getMin();
>                         int endPos = f.getLocation().getMax();
>                         int geneLen = endPos - startPos + 1;
>                         String sequence = seq.subStr(startPos, endPos);
>                         String strand = f.getStrand().getToken() + "";
>                         Annotation ann = (Annotation) f.getAnnotation();
>                         String geneIdentifier = "";
>                         if (ann.containsProperty("locus_tag")) {
>                             geneIdentifier = ann.getProperty("locus_tag") + "";
>                         } else {
>                             geneIdentifier = ann.getProperty("gene") + "";
>                         }
>                         String alternativeIdentifiers = "";
>                         try {
>                             alternativeIdentifiers =
>                                     (String) ann.getProperty("gene");
>                         } catch (NoSuchElementException e) {}
>                         String annotation = "";
>                         System.out.println(rank + "\t" + geneIdentifier
>                                 + "\t" + alternativeIdentifiers + "\t"
>                                 + startPos + "\t" + endPos + "\t"
>                                 + geneLen + "\t" + strand);
>                     }
>                 }
>             }
>         }
>
>         public static void main(String[] args) throws Exception {
>             TestMus tm = new TestMus();
>             tm.testMusChr2();
>         }
>     }
>

From gwu at molbio.mgh.harvard.edu Thu Jan 29 19:28:42 2009
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Thu, 29 Jan 2009 14:28:42 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <498159D6.8010906@eaglegenomics.com>
References: <497913F9.70009@Gmail.com>
 <4979ED9C.6040207@imbusch.net>
 <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
 <497C530D.7090006@imbusch.net>
 <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
 <497D98A9.6010904@imbusch.net>
 <497D9CB6.4010403@eaglegenomics.com>
 <498135D0.9060103@molbio.mgh.harvard.edu>
 <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
 <498159D6.8010906@eaglegenomics.com>
Message-ID: <4982036A.7070302@molbio.mgh.harvard.edu>

Thanks Richard. That is exactly the same issue. The latest Subversion trunk
fixed the problem. Thanks again for the quick response.

Gang

Richard Holland wrote:
> Gabrielle Doan posted a solution to this a while back and I believe the
> changes have been committed already:
>
> http://www.mail-archive.com/biojava-l at lists.open-bio.org/msg01036.html
>
> How old is the copy of BioJava that you're using? Have you tried
> checking out the trunk from Subversion to see if that works?
>
> cheers,
> Richard
>
> Mark Schreiber wrote:
>
>> I assume that the downloaded file has the complete sequence in it? Probably
>> worth checking that it has the complete sequence block (all 116366104 bp).
>>
>> - Mark
>>

From marcel.huntemann at gmail.com Thu Jan 29 20:55:15 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 29 Jan 2009 12:55:15 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <4979ED9C.6040207@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
Message-ID: <498217B3.4010703@Gmail.com>

Hi Charles!

I've "found" a solution now. After spending a couple of days with the
terrible XML output of blast and BioJava's BlastXMLParser (which also
wasn't working properly), I decided to have a look at the source code and
try to figure out myself what was wrong with the BlastLikeSAXParser. So I
checked out the current source code via the anonymous svn checkout (howto
here: http://biojava.org/wiki/CVS_to_SVN_Migration). After a couple of
hours without finding an error that could cause this behavior, I thought
I'd just give it a try and compiled the checked-out source via ant. I then
used the newly created biojava.jar and suddenly everything went perfectly!
So, whatever the error was (unfortunately I don't have the old source code
to make a diff on certain files), it is already corrected in the
up-to-the-minute version in the subversion system.
Try it out!

Cheers,
Marcel

Charles Imbusch wrote:
> Hello Marcel,
>
> I also experience the problem that the parser is skipping
> the even result numbers. I have not found a sufficient solution
> for that, so I gave up on parsing a blast result file containing
> multiple results. Instead I split up the big fasta file into
> several ones, so that I just get one result for one fasta file.
> That works, even if it's not the best solution for it.
>
> Let me know if you find another solution for that problem.
>
> Cheers,
> Charles
>

From andreas at sdsc.edu Thu Jan 29 21:11:51 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 29 Jan 2009 13:11:51 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <498217B3.4010703@Gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
 <498217B3.4010703@Gmail.com>
Message-ID: <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>

Hi,

We had a couple of bug reports recently regarding issues that already
got fixed in the latest biojava builds from SVN. I think it is time to
start preparing the next biojava release (1.7) to make sure
everybody gets up to the latest status...

Andreas

On Thu, Jan 29, 2009 at 12:55 PM, Marcel Huntemann wrote:
> Hi Charles!
>
> I've "found" a solution now. After spending a couple of days with the
> terrible XML output of blast and BioJava's BlastXMLParser (which also
> wasn't working properly), I decided to have a look at the source code and
> try to figure out myself what was wrong with the BlastLikeSAXParser. So I
> checked out the current source code via the anonymous svn checkout (howto
> here: http://biojava.org/wiki/CVS_to_SVN_Migration). After a couple of
> hours without finding an error that could cause this behavior, I thought
> I'd just give it a try and compiled the checked-out source via ant. I then
> used the newly created biojava.jar and suddenly everything went perfectly!
> So, whatever the error was (unfortunately I don't have the old source code
> to make a diff on certain files), it is already corrected in the
> up-to-the-minute version in the subversion system.
> Try it out!
>
> Cheers,
> Marcel
>
>
> Charles Imbusch wrote:
>> Hello Marcel,
>>
>> I also experience the problem that the parser is skipping
>> the even result numbers. I have not found a sufficient solution
>> for that, so I gave up on parsing a blast result file containing
>> multiple results.
>> Instead I split up the big fasta file into
>> several ones, so that I just get one result for one fasta file.
>> That works, even if it's not the best solution for it.
>>
>> Let me know if you find another solution for that problem.
>>
>> Cheers,
>> Charles
>>

From marcel.huntemann at gmail.com Thu Jan 29 21:24:55 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 29 Jan 2009 13:24:55 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
 <498217B3.4010703@Gmail.com>
 <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
Message-ID: <49821EA7.9090603@Gmail.com>

That sounds reasonable. I bet a lot of people would appreciate that!

Andreas Prlic wrote:
> Hi,
>
> We had a couple of bug reports recently regarding issues that already
> got fixed in the latest biojava builds from SVN. I think it is time to
> start preparing the next biojava release (1.7) to make sure
> everybody gets up to the latest status...
>
> Andreas

From marcin.swiatek at mail.mcgill.ca Thu Jan 29 21:56:29 2009
From: marcin.swiatek at mail.mcgill.ca (Marcin Swiatek)
Date: Thu, 29 Jan 2009 16:56:29 -0500
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <49821EA7.9090603@Gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
 <498217B3.4010703@Gmail.com>
 <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
 <49821EA7.9090603@Gmail.com>
Message-ID: <176A06E658ED0745965C072C5F2C116A02F87314@EXCHANGE2VS2.campus.mcgill.ca>

I personally would. Especially since I have just solved the problem myself,
unaware that someone had done that already.

BTW: the problem I picked up (which seems similar to the description given)
was that the new data set line (as evaluated by checkNewBlastLikeDataSet in
BlastSAXParser) wasn't picked up by HitSectionSAXParser, nor did it
percolate up to BlastSAXParser, thus leaving the state machine of the parser
in a weird state. It would recover by skipping everything up to the next
data set (hence only every other item being processed).

BTW2: The XML parser in 1.6 doesn't deal with new BLAST files either (2.19,
was it?). Has this been fixed in the SVN repository?

Cheers,
Marcin

-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Marcel Huntemann
Sent: Thursday, January 29, 2009 4:25 PM
To: Andreas Prlic
Cc: biojava-dev; biojava-l at biojava.org
Subject: Re: [Biojava-l] Problem with blast file parser

That sounds reasonable. I bet a lot of people would appreciate that!

Andreas Prlic wrote:
> Hi,
>
> We had a couple of bug reports recently regarding issues that already
> got fixed in the latest biojava builds from SVN. I think it is time to
> start preparing the next biojava release (1.7) to make sure
> everybody gets up to the latest status...
>
> Andreas

From umanga.bio at gmail.com Fri Jan 30 12:00:41 2009
From: umanga.bio at gmail.com (Ashika Umanga Umangiliya)
Date: Fri, 30 Jan 2009 21:00:41 +0900
Subject: [Biojava-l] Reading AB1 files and drawing chromatogram ?
Message-ID:

Greetings all,

In the application I am developing, I want to draw a chromatogram from an
AB1 file. I come from a computer science background and have little
knowledge of this subject. Where can I find information on this?
Can I draw the graph using the data in the AB1 file? Or is there a function
for this?

thanks in advance,

Umanga

From ayates at ebi.ac.uk Fri Jan 30 12:57:40 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Jan 2009 12:57:40 +0000
Subject: [Biojava-l] Reading AB1 files and drawing chromatogram ?
In-Reply-To:
References:
Message-ID: <4982F944.7080905@ebi.ac.uk>

Hi Umanga,

Fortunately BioJava has an API for drawing chromatograms, located under
org.biojava.bio.chromatogram & org.biojava.bio.chromatogram.graphic. To
parse an AB1 file you can run the following code:

    import java.awt.Graphics2D;
    import java.io.*;
    import org.biojava.bio.program.abi.ABIFChromatogram;
    import org.biojava.bio.chromatogram.*;
    import org.biojava.bio.chromatogram.graphic.*;

    File file = new File("chr.ab1");
    Chromatogram c = ABIFChromatogram.create(file);
    ChromatogramGraphic cg = new ChromatogramGraphic(c);
    // Can't remember how to get this so you'll have to find out
    Graphics2D context = getContextFromSomewhere();
    cg.drawTo(context);

You can configure the size of the image through the ChromatogramGraphic
object & alter a number of ChromatogramGraphic.Option attributes through
ChromatogramGraphic.setOption(ChromatogramGraphic.Option opt, Object value).

This should be enough to get you going. I will warn you that this class is
quite memory intensive & an application I wrote ages ago had very big
memory problems because of it (the drawing component, not the file
parsing). An alternative library is available from
http://code.google.com/p/bioview2/ (which was developed by an old
colleague). Try the biojava code first and if that serves your purpose then
great; if not then try bioview2.

Regards,

Andy Yates

P.S. The AB1 parser only supports the processed data channels in the AB1
file. If you want the raw data from it then you will have to modify the
source or use another library (probably the C library StadenIO) to convert
the raw data into an SCF file.

Ashika Umanga Umangiliya wrote:
> Greetings all,
>
> In the application I am developing, I want to draw a chromatogram from an
> AB1 file.
> I come from a computer science background and have little knowledge of
> this subject. Where can I find information on this?
> Can I draw the graph using the data in the AB1 file? Or is there a
> function for this?
>
>
> thanks in advance,
>
> Umanga

From ahmed.elmasri at gmail.com Sat Jan 31 02:41:31 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Fri, 30 Jan 2009 21:41:31 -0500
Subject: [Biojava-l] Sequence start/end location
Message-ID: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>

Hello list,
I am trying to find the start and end location of a gene in a gene
sequence. I am reading from a gene FASTA database file. Is there a built-in
method that I can use? The alternative is really painful, since I have to
parse a ptt file, and that is not exactly working for me.
Thanks very much!
Ahmed

From markjschreiber at gmail.com Sat Jan 31 04:15:59 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 31 Jan 2009 12:15:59 +0800
Subject: [Biojava-l] Sequence start/end location
In-Reply-To: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
References: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
Message-ID: <93b45ca50901302015l4134dce3g684b26cd18c14c21@mail.gmail.com>

Hi -

Unfortunately your FASTA file won't contain any feature information
which could tell you the start and end. If you don't want to get the
info from the PTT file you might want to look at parsing the Genbank
file instead, which will have the feature information.

A PTT parser might not be a bad thing for BioJava though. If you write
one please consider adding it.

- Mark

On Sat, Jan 31, 2009 at 10:41 AM, Hamed, Ahmed A. wrote:
>
> Hello list,
> I am trying to find the start and end location of a gene in a gene
> sequence. I am reading from a gene FASTA database file.
> Is there a built-in
> method that I can use? The alternative is really painful, since I have to
> parse a ptt file, and that is not exactly working for me.
> Thanks very much!
> Ahmed

From markjschreiber at gmail.com Sat Jan 31 10:25:59 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 31 Jan 2009 18:25:59 +0800
Subject: [Biojava-l] Sequence start/end location
In-Reply-To: <5cdd31570901302313l12c490bct6768979ce562d40b@mail.gmail.com>
References: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
 <93b45ca50901302015l4134dce3g684b26cd18c14c21@mail.gmail.com>
 <5cdd31570901302313l12c490bct6768979ce562d40b@mail.gmail.com>
Message-ID: <93b45ca50901310225p6676c282m203e8c4e13ba37f1@mail.gmail.com>

Hi Ahmed -

For a first-time contribution it would probably be easiest to post
something to the list, and someone with a development account can check it
in for you. Please make sure to add javadoc comments and a basic JUnit
test for any classes you make.

- Mark

On Sat, Jan 31, 2009 at 3:13 PM, Hamed, Ahmed A. wrote:
> Dear Mark,
> Thank you for your response. I would be happy to contribute my PTTParser if
> you point me to where/how to check it in. I am still new to the BioJava
> community and there is so much to learn.
> Best wishes,
> Ahmed
>
> On Fri, Jan 30, 2009 at 11:15 PM, Mark Schreiber
> wrote:
>>
>> Hi -
>>
>> Unfortunately your FASTA file won't contain any feature information
>> which could tell you the start and end. If you don't want to get the
>> info from the PTT file you might want to look at parsing the Genbank
>> file instead, which will have the feature information.
>>
>> A PTT parser might not be a bad thing for BioJava though. If you write
>> one please consider adding it.
>>
>> - Mark
>>
>> On Sat, Jan 31, 2009 at 10:41 AM, Hamed, Ahmed A.
>> wrote:
>> >
>> > Hello list,
>> > I am trying to find the start and end location of a gene in a gene
>> > sequence. I am reading from a gene FASTA database file. Is there a
>> > built-in method that I can use? The alternative is really painful,
>> > since I have to parse a ptt file, and that is not exactly working
>> > for me.
>> > Thanks very much!
>> > Ahmed
>
>
>
> --
> Ahmed Abdeen Hamed
> Scientific Informatics Project Leader
> Marine Biological Laboratory Woods Hole, MA
> --
> Ph.D. student, Complex Systems
> School of Informatics, Indiana University
>
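
A minimal sketch of the PTT route discussed in this thread, using only the
JDK. It assumes the usual NCBI .ptt layout: tab-separated rows with a
Location column of the form start..end first, the strand second and the
gene name fifth. PttSketch, GeneLocation, parseLine and readAll are
hypothetical names chosen for illustration; they are not BioJava classes,
and the sample rows are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava.
final class PttSketch {

    // One gene row: 1-based inclusive coordinates, strand and gene name.
    static final class GeneLocation {
        final int start;
        final int end;
        final char strand;
        final String gene;

        GeneLocation(int start, int end, char strand, String gene) {
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.gene = gene;
        }
    }

    // Returns null for anything that is not a data row (title, count, header).
    // Assumed columns: 0 = Location, 1 = Strand, 4 = Gene.
    static GeneLocation parseLine(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 5) {
            return null;
        }
        String location = cols[0].trim();
        String strand = cols[1].trim();
        int dots = location.indexOf("..");
        if (dots < 0 || strand.isEmpty()) {
            return null;
        }
        try {
            int start = Integer.parseInt(location.substring(0, dots));
            int end = Integer.parseInt(location.substring(dots + 2));
            return new GeneLocation(start, end, strand.charAt(0), cols[4].trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Collects every parsable data row from the reader.
    static List<GeneLocation> readAll(BufferedReader in) throws IOException {
        List<GeneLocation> genes = new ArrayList<GeneLocation>();
        String line;
        while ((line = in.readLine()) != null) {
            GeneLocation g = parseLine(line);
            if (g != null) {
                genes.add(g);
            }
        }
        return genes;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample in the assumed layout.
        String sample = "Example genome - 1..5000\n"
                + "2 proteins\n"
                + "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
                + "190..255\t+\t21\t1\tgeneA\tx0001\t-\t-\texample product\n"
                + "337..2799\t-\t820\t2\tgeneB\tx0002\t-\t-\tanother product\n";
        for (GeneLocation g : readAll(new BufferedReader(new StringReader(sample)))) {
            System.out.println(g.gene + "\t" + g.start + "\t" + g.end + "\t" + g.strand);
        }
    }
}
```

Mark's alternative, parsing the Genbank file, would give the same kind of
coordinates through f.getLocation().getMin() and getMax() on each "gene"
feature, as in the TestMus code earlier in this archive.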