From jchavird at gmail.com  Tue Jun  1 12:55:34 2010
From: jchavird at gmail.com (Justin Havird)
Date: Tue, 1 Jun 2010 11:55:34 -0500
Subject: [EMBOSS] Tranalign relaxation?
In-Reply-To:
References:
Message-ID:

Thanks,

Yes, the problem is that some taxa don't follow the standard genetic codes
available for some codons, usually start codons. I may try to pass the
sequences to tranalign without the first codon and see if that will clear up
a majority of the troublemakers. I think the issue with "unambiguous
ambiguous codons" is what is happening in cases where there is an ambiguous
nucleotide position, even though the amino acid should be able to be
determined.

Thanks again and let me know if anything else comes to mind,

Justin

On Mon, May 31, 2010 at 6:08 PM, Peter wrote:

> On Mon, May 31, 2010 at 11:38 PM, Justin Havird wrote:
> > Hi Peter,
> >
> > Yes, I have been using the correct mitochondrial translation codes.
> > Depending on the taxonomic group, there are about 4 different
> > mitochondrial translation codes that I have been using. The problem is
> > that some individual species use alternate start codons, etc. These are
> > generally case-by-case problems.
> >
> > Someone suggested I alter the tranalign program itself and create a new
> > translation table that would take these alternate start codons into
> > consideration. However, I haven't had much luck with this.
> >
> > Any other ideas?
> >
> > Thanks!
> >
> > Justin
>
> If I understand correctly, part of the problem is your organisms
> don't all fully follow the standard genetic codes available in
> EMBOSS. Therefore would a new tranalign option to give the
> start codon(s) to use be helpful?
>
> Personally I'd probably solve this kind of thing by writing a script
> in (Bio)python. I might still call tranalign to do the hard work but
> might try passing it the sequences with the first codon/amino
> acid removed.
>
> You didn't CC the mailing list - perhaps an accident? Please
> feel free to forward my reply back to the list.
>
> Peter C.
>
> P.S.
> There were some issues with using EMBOSS transeq for
> the translation of "unambiguous ambiguous codons" which I
> reported back in July 2009:
> http://lists.open-bio.org/pipermail/emboss/2009-July/003667.html
>
> I haven't been back to test if the latest EMBOSS release
> addressed it (EMBOSS 6.1.0 is still broken in this regard
> I think), but your problem with ambiguous IUPAC codes
> may be linked to this.
>

From asidhu at biomap.org  Thu Jun  3 12:38:08 2010
From: asidhu at biomap.org (Amandeep Sidhu)
Date: Fri, 4 Jun 2010 00:38:08 +0800
Subject: [EMBOSS] CFP: 23rd IEEE International Symposium on Computer-Based Medical Systems 2010 (IEEE CBMS 2010)
Message-ID:

IEEE CBMS 2010
23rd IEEE International Symposium on Computer-Based Medical Systems 2010
Perth, Australia, 12-15 October 2010
http://www.cbms2010.curtin.edu.au/

The 23rd IEEE International Symposium on Computer-Based Medical Systems
(CBMS 2010) is intended to provide an international forum for discussing the
latest results in the field of computational medicine. The scientific
program of CBMS 2010 will consist of invited keynote talks given by leading
scientists in the field, and regular and special track sessions that cover a
broad array of issues which relate computing to medicine.

RELEVANT TOPICS

Network and Telemedicine Systems
Medical Databases & Information Systems
Computer-Aided Diagnosis
Medical Devices with Embedded Computers
Bioinformatics in Medicine
Software Systems in Medicine
Pervasive Health Systems and Services
Web-based Delivery of Medical Information
Medical Image Segmentation & Compression
Content Analysis of Biomedical Image Data
Knowledge-Based & Decision Support Systems
Hand-held Computing Applications in Medicine
Knowledge Discovery & Data Mining
Signal and Image Processing in Medicine
Multimedia Biomedical Databases

CBMS 2010 invites original previously unpublished contributions that are not
submitted concurrently to a journal or another conference.
Many of the above listed topics are represented by corresponding Special
Tracks, while others are solely covered by the general CBMS track.
Prospective authors are expected to submit their contributions to one of the
corresponding Special Tracks or to the general track if none of the special
tracks is relevant.

SPECIAL TRACKS

ST1: Computational Proteomics and Genomics
ST2: Knowledge Discovery and Decision Systems in Biomedicine
ST3: Ontologies for Biomedical Systems
ST4: HealthGrid & Cloud Computing
ST5: Technology Enhanced Learning in Medical Education
ST6: Intelligent Patient Management
ST7: Data Streams in Healthcare
ST8: Supporting Collaboration among Healthcare Workers
ST9: Telemedicine
ST10: Computer-Based Systems for Mental Health
ST11: Image Informatics in Biomedical Research and Clinical Medicine
ST12: e-Health

SUBMISSION GUIDELINES

Papers should be submitted electronically using the EasyChair online
submission system. The papers must be prepared following the IEEE two-column
format and should not exceed the length of 6 (six) Letter-sized pages. LaTeX
or Microsoft Word templates can be used when preparing the papers. Please
note that submissions are accepted in PDF format only.

Submission web site: http://www.easychair.org/conferences/?conf=cbms2010

All submissions will be peer-reviewed by at least three reviewers. The
proceedings will be published by the IEEE Computer Society Press. At least
one of the authors of each accepted paper is required to register and
present the work at the conference; otherwise the paper will be removed from
the digital library after the conference.

IMPORTANT DATES

Submission deadline for regular papers: 24 June 2010
Deadline for tutorial submission: 24 June 2010
Notification of acceptance for papers and tutorials: 2 Aug 2010
Final camera ready due: 2 Sep 2010
Author registration: 2 Sep 2010

INTENDED AUDIENCE

Engineers, scientists, clinicians and managers involved in medical computing
projects are encouraged to submit papers to the symposium and/or attend the
symposium. The symposium provides its attendees with an opportunity to
experience state-of-the-art research and development in a variety of topics
directly and indirectly related to their own work. In addition to research
papers, keynote speakers and tutorial sessions it provides participants with
an opportunity to come up-to-date on important technological issues. The
symposium encourages the participation of students engaged in
research/development in computer-based medical systems.

Organizing Committee

GENERAL CHAIRS
Tharam Dillon, Curtin University of Technology, Australia
Daniel Rubin, National Center for Biomedical Ontologies, USA
William Gallagher, University College Dublin, Ireland

PROGRAM CHAIRS
Amandeep Sidhu, Curtin University of Technology, Australia
Alexey Tsymbal, Siemens, Germany

PUBLICATION CHAIRS
Mykola Pechenizkiy, Eindhoven University of Technology, Netherlands
Tony Hu, Drexel University, USA

SPECIAL TRACK CHAIRS
Maja Hadzic, Curtin University of Technology, Australia
Jake Chen, Indiana University, USA

TUTORIAL CHAIRS
Phoebe Chen, La Trobe University, Australia
Xiaofang Zhou, University of Queensland, Australia

PUBLICITY CHAIRS
Carolyn McGregor, University of Ontario Institute of Technology, Canada
Meifania Chen, Curtin University of Technology, Australia

From charles-listes-emboss at plessy.org  Mon Jun  7 07:59:51 2010
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Mon, 7 Jun 2010 20:59:51 +0900
Subject: [EMBOSS] SAM as a sequence input and output format?
Message-ID: <20100607115951.GD4213@kunpuu.plessy.org>

Dear EMBOSS developers,

I often look for a sequence format that would put all the information on a
single line that is easy to parse with a Perl or sed filter, and was
wondering if, in the absence of a better alternative, it would be possible
for EMBOSS to include the SAM format as a sequence format.

http://samtools.sourceforge.net/samtools.shtml

It may look heretical at first sight, but I see that other projects are also
using SAM as a format to store sequences with qualities:

http://picard.sourceforge.net/command-line-overview.shtml#FastqToSam

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan

From pmr at ebi.ac.uk  Mon Jun  7 08:41:08 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 07 Jun 2010 13:41:08 +0100
Subject: [EMBOSS] SAM as a sequence input and output format?
In-Reply-To: <20100607115951.GD4213@kunpuu.plessy.org>
References: <20100607115951.GD4213@kunpuu.plessy.org>
Message-ID: <4C0CE8E4.90409@ebi.ac.uk>

On 07/06/2010 12:59, Charles Plessy wrote:
> Dear EMBOSS developers,
>
> I often look for a sequence format that would put all the information on a
> single line easy to parse with a Perl or sed filter, and was wondering if,
> in the absence of a better alternative, it would be possible for EMBOSS to
> include the SAM format as a sequence format.

Yes ... already included as an input format.

We are working on it as an output format, perhaps in time for the next
release.

For large numbers of sequences, we are also working on the binary (BAM)
version of the format, which is popular because of the high level of
compression.

regards,

Peter Rice

From roi.brodo at hotmail.com  Wed Jun 16 03:13:05 2010
From: roi.brodo at hotmail.com (Roi Brodo)
Date: Wed, 16 Jun 2010 07:13:05 +0000
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
Message-ID:

Hello,

Suppose I have some circular DNA RefSeq in a Genbank format.
I would like to create a new (smaller) Genbank file (both features and
sequence) containing only a range of nucleotides from the original file, but
a range that overlaps the "end". For example, if my original Genbank spans
1,000,000 bases (1-1,000,000) I would like to get
join(800,000-1,000,000, 1-100,000), so the new Genbank file will actually
contain 300,000 bp from the circular DNA.

How can I do that? Again, my input is a Genbank file and the output should
also be a Genbank file.

Thanks!

- Roi

From jison at ebi.ac.uk  Wed Jun 16 04:21:42 2010
From: jison at ebi.ac.uk (Jon Ison)
Date: Wed, 16 Jun 2010 09:21:42 +0100 (BST)
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To:
References:
Message-ID: <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>

Hi Roi

You could try cutseq to remove the middle bit of the sequence:
http://emboss.sourceforge.net/apps/cvs/emboss/apps/cutseq.html

extractseq might also be worth a look:
http://emboss.sourceforge.net/apps/cvs/emboss/apps/extractseq.html

Have a play and let us know how you get on.

Cheers

Jon

From roi.brodo at hotmail.com  Wed Jun 16 08:13:20 2010
From: roi.brodo at hotmail.com (Roi Brodo)
Date: Wed, 16 Jun 2010 12:13:20 +0000
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To: <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>
References: , <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>
Message-ID:

Hi Jon and thanks for the reply.

I'm not sure how cutseq can create a Genbank file (a single file with both
the features and the sequence). I also didn't find how extractseq can be
used to achieve this task.

Thanks,
Roi

From roi.brodo at hotmail.com  Wed Jun 16 08:55:25 2010
From: roi.brodo at hotmail.com (Roi Brodo)
Date: Wed, 16 Jun 2010 12:55:25 +0000
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To:
References: , , <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>,
Message-ID:

After some more reading I think I can do it using union. The problem is that
after I create the list (of the two ranges) using yank, union dies with
"union terminated: Bad value for '-sequence' and no prompt". Why is that?
Shouldn't I use a yank file?
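
The wrap-around range asked for in this thread, join(800,000-1,000,000, 1-100,000), comes down to simple index arithmetic on the sequence. The following is only a rough Python sketch of that arithmetic, not what yank or union do internally: the function name circular_range is made up for illustration, it works on a bare sequence string, and it does not carry over any GenBank features.

```python
def circular_range(sequence, start, end):
    """Return the bases from start to end on a circular sequence.

    Coordinates are 1-based and inclusive, as in GenBank locations.
    If start > end, the range is taken to wrap past the origin, i.e.
    join(start..len, 1..end). Hypothetical helper, sequence only.
    """
    length = len(sequence)
    if not (1 <= start <= length and 1 <= end <= length):
        raise ValueError("coordinates must lie within 1..len(sequence)")
    if start <= end:
        # Ordinary range that does not cross the origin.
        return sequence[start - 1:end]
    # Wrap-around range: tail of the sequence followed by its head.
    return sequence[start - 1:] + sequence[:end]


if __name__ == "__main__":
    toy = "ABCDEFGHIJ"  # stand-in for a circular genome of length 10
    print(circular_range(toy, 9, 2))  # bases 9..10 followed by 1..2
```

With inclusive coordinates the two pieces 800,000-1,000,000 and 1-100,000 add up to 300,001 bases, since position 800,000 itself is included.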

From pmr at ebi.ac.uk  Thu Jun 17 04:38:21 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 17 Jun 2010 09:38:21 +0100
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To:
References: , , <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>,
Message-ID: <4C19DEFD.2080601@ebi.ac.uk>

On 16/06/2010 13:55, Roi Brodo wrote:
>
> After some more reading I think I can do it using union. The problem is
> that after I create the list (of the two ranges) using yank, union dies on
> "union terminated: Bad value for '-sequence' and no prompt". Why is that?
> Shouldn't i use a yank file?

Yes, yank and union is the correct approach.

The output of yank is a list file, so the input to union should be @filename
to read a list of sequence addresses from the file. If you just give the
filename, it assumes the file contains sequences (perhaps a fasta file of
sequences to be joined).

We will add this to our feature requests - it should be possible to make
seqret handle ranges from circular sequences. This will be after the next
release as it requires rewriting the way several library functions work to
allow the circular range.

regards,

Peter Rice

From roi.brodo at hotmail.com  Thu Jun 17 04:45:17 2010
From: roi.brodo at hotmail.com (Roi Brodo)
Date: Thu, 17 Jun 2010 08:45:17 +0000
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To: <4C19DEFD.2080601@ebi.ac.uk>
References: , , <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>, , <4C19DEFD.2080601@ebi.ac.uk>
Message-ID:

Thank you. May you keep up the great work!

From korndoerfer at crelux.com  Fri Jun 18 03:58:27 2010
From: korndoerfer at crelux.com (Ingo P. Korndoerfer)
Date: Fri, 18 Jun 2010 09:58:27 +0200
Subject: [EMBOSS] pepstats vs protparam
Message-ID: <4C1B2723.3010806@crelux.com>

dear fellow emboss users,

we have in the past used the well-known expasy protparam tool to calculate
our extinction coefficients for proteins. easy for most users and that is
what they got used to.

we are now switching to an increased use of emboss, since it can be so
nicely incorporated into our database programs and pipelines.

we now noted that pepstats does give different values for absorption
coefficients for the same sequence. i have looked at the pepstats docu but
could not find anything on the algorithm used. really, the math seems simple

E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine)

with

Ext(Tyr) = 1490, Ext(Trp) = 5500, Ext(Cystine) = 125;

(from the protparam docu).

would anybody know whether this is what pepstats uses (and help me avoid
having to dig through the source code).

and second, it seems that the expasy colleagues have tweaked their algorithm
further, see

http://expasy.org/tools/protparam-doc.html

(see the important note around the middle of the page).

would anybody have an opinion whether it would be adequate (even allowed) to
emulate protparam behaviour in pepstats or to give the user a choice to do
so ? it seems this might be easy to code. for sure it would result in more
consistency in our work and would also allow us to be more consistent in
communication with our clients without having to move away from emboss.
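
for reference, the formula quoted above can be written out as a few lines of python. this is only a sketch of the protparam-style arithmetic using the three coefficients given in the message; it makes no claim about what pepstats does internally. the function name and the cystines_formed switch are made up for illustration, and pairing every two cysteines into one cystine is an assumption.

```python
# Coefficients as quoted in the message above (from the protparam docu).
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(sequence, cystines_formed=True):
    """E(Prot) = n(Tyr)*Ext(Tyr) + n(Trp)*Ext(Trp) + n(Cystine)*Ext(Cystine).

    Assumption: when cystines_formed is True, every pair of Cys residues
    is counted as one cystine; when False, cystine contributes nothing.
    """
    seq = sequence.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2 if cystines_formed else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


if __name__ == "__main__":
    # One Tyr, one Trp, two Cys (one cystine when paired).
    print(extinction_coefficient("YWCC"))
    print(extinction_coefficient("YWCC", cystines_formed=False))
```

comparing the two calls on the same sequence shows how much of a difference between tools could come from the cystine term alone, as opposed to different per-residue coefficients.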

thanks

cheers

ingo

From pmr at ebi.ac.uk  Fri Jun 18 05:29:18 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 18 Jun 2010 10:29:18 +0100
Subject: [EMBOSS] pepstats vs protparam
In-Reply-To: <4C1B2723.3010806@crelux.com>
References: <4C1B2723.3010806@crelux.com>
Message-ID: <4C1B3C6E.8040709@ebi.ac.uk>

On 18/06/2010 08:58, Ingo P. Korndoerfer wrote:
> we now noted that pepstats does give different values for absorption
> coefficients for the same sequence.
> i have looked at the pepstats docu but could not find anything on the
> algorithm used. really, the math seems simple
>
> E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine)
>
> with
>
> Ext(Tyr) = 1490, Ext(Trp) = 5500, Ext(Cystine) = 125;
>
> (from the protparam docu).
>
> would anybody know whether this is what pepstats uses (and help me avoid
> having to dig through the source code).

EMBOSS takes its values from the local data file emboss/data/Eamino.dat,
which uses slightly different values.

That file credits Gill and von Hippel (1989) _Anal_Biochem_ 182 319-326 for
the extinction coefficient values.

Protparam updates these values from Pace et al. (1995).

We will update EMBOSS to use these values in the next release.

Thanks for letting us know

Peter

From btiwari at ceh.ac.uk  Fri Jun 18 08:07:09 2010
From: btiwari at ceh.ac.uk (Tiwari, Bela)
Date: Fri, 18 Jun 2010 13:07:09 +0100
Subject: [EMBOSS] getting organism from accession
Message-ID: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>

Dear all,

I have a set of accession numbers and I want to retrieve the organism that
the sequence is associated with - i.e. the content of the OS line in an embl
file. I don't need the taxonomic id, and I don't need to start traversing
taxonomy trees. I want to do this by accessing remote databases (via srs, as
configured in my emboss.defaults file), rather than indexing databases
locally. So the output I want would be a text mapping like:

accession : species

where species is taken from the OS line of a database entry.

The closest I've come using Emboss is to get the gff output file containing
feature information using a command along the lines of:

seqret -feature embl:XXXX -oufo2 myfeat.txt

(embl is a database I can search using srs as configured in my
emboss.defaults file.) The first non-hashed line in the file myfeat.txt
contains the term

"organism="Whateverus thingus"

so I could parse that out. However, this file still contains a lot of extra
(unwanted) information and requires parsing.

Does anyone know if I'm missing something obvious in Emboss that I could use
for this?

(I have tried the BioPerl route to get this info from the NCBI, and apart
from being unwieldy, I'm managing to get the wrong organism returned for the
type of identifier I have. No, I haven't spent time tracking down the
problem - frankly, I'd rather resolve it using Emboss and/or srs calls.)

If there isn't anything that will do the job in Emboss at the moment, is
there any chance I can put in a development request for an extra flag for
seqret, or an extra utility tool that might accomplish this task?

cheers,

Bela

*************************
Dr. Bela Tiwari
Lead Bioinformatician
NERC Environmental
Bioinformatics Centre
http://nebc.nerc.ac.uk
tel: 01491 69 2705

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB
*************************

-- 
This message (and any attachments) is for the recipient only. NERC is
subject to the Freedom of Information Act 2000 and the contents of this
email and any reply you make may be disclosed by NERC unless it is exempt
from release under the Act. Any material supplied to NERC may be stored in
an electronic records management system.
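
The "accession : species" mapping described above can be sketched in a few lines of Python once the EMBL-format entry text is in hand. This is only an illustration under stated assumptions: it presumes the entries have already been retrieved as plain EMBL flat-file text by some other means, it does no database access itself, the function name organism_by_accession is made up, and the sample entry is a placeholder rather than a real record.

```python
def organism_by_accession(embl_text):
    """Map the primary accession of each entry to its OS line text.

    Assumes EMBL flat-file conventions: two-letter line codes, 'AC' lines
    listing accessions separated by ';', one or more 'OS' lines, and '//'
    terminating each entry. Hypothetical helper for illustration only.
    """
    mapping = {}
    accession = None
    os_parts = []
    for line in embl_text.splitlines():
        if line.startswith("AC "):
            if accession is None:
                # Keep only the first (primary) accession.
                accession = line[2:].split(";")[0].strip()
        elif line.startswith("OS "):
            os_parts.append(line[2:].strip())
        elif line.startswith("//"):
            if accession is not None:
                mapping[accession] = " ".join(os_parts)
            accession, os_parts = None, []
    if accession is not None:
        # Last entry had no '//' terminator.
        mapping[accession] = " ".join(os_parts)
    return mapping


# Placeholder entry, not a real database record.
SAMPLE_ENTRY = """ID   XX000001; SV 1; linear; genomic DNA; STD; UNC; 12 BP.
AC   XX000001; XX000002;
OS   Whateverus thingus
SQ   Sequence 12 BP;
     acgtacgtac gt                                                      12
//
"""

if __name__ == "__main__":
    for acc, species in organism_by_accession(SAMPLE_ENTRY).items():
        print(f"{acc} : {species}")
```

Keeping the full OS text, rather than trying to split out a single species name, avoids guessing at how multi-organism entries are laid out.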

From simon.andrews at bbsrc.ac.uk  Fri Jun 18 09:48:23 2010
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 18 Jun 2010 14:48:23 +0100
Subject: [EMBOSS] getting organism from accession
In-Reply-To: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>
References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>
Message-ID: <78D900EA-65A4-453B-94EB-675E51FAC7AD@bbsrc.ac.uk>

On 18 Jun 2010, at 13:07, Tiwari, Bela wrote:

> Dear all,
>
> I have a set of accession numbers and I want to retrieve the organism that
> the sequence is associated with - i.e. the content of the OS line in an
> embl file. I don't need the taxonomic id, and I don't need to start
> traversing taxonomy trees. I want to do this by accessing remote databases
> (via srs, as configured in my emboss.defaults file), rather than indexing
> databases locally. So the output I want would be a text mapping like:
>
> accession : species
>
> where species is taken from the OS line of a database entry.

how about:

entret embl:XXXX stdout | grep ^OS

Is that close enough?

Simon.

From pmr at ebi.ac.uk  Fri Jun 18 10:25:12 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 18 Jun 2010 15:25:12 +0100
Subject: [EMBOSS] getting organism from accession
In-Reply-To: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>
References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>
Message-ID: <4C1B81C8.7030704@ebi.ac.uk>

On 18/06/2010 13:07, Tiwari, Bela wrote:
> Dear all,
>
> I have a set of accession numbers and I want to retrieve the organism
> that the sequence is associated with - i.e. the content of the OS line
> in an embl file. I don't need the taxonomic id, and I don't need to
> start traversing taxonomy trees. I want to do this by accessing remote
> databases (via srs, as configured in my emboss.defaults file), rather
> than indexing databases locally.

Coming soon. EMBOSS 6.3.0 will include the capability to pick up the taxid
and to parse the NCBI taxonomy. This opens up a lot of possibilities.

You could also write a little EMBOSS utility to report the Tax string from
the sequence object.

Suggestions for ways to use this information in a future release are
welcome.

regards,

Peter

From btiwari at ceh.ac.uk  Fri Jun 18 10:49:50 2010
From: btiwari at ceh.ac.uk (Tiwari, Bela)
Date: Fri, 18 Jun 2010 15:49:50 +0100
Subject: [EMBOSS] getting organism from accession
In-Reply-To: <4C1B81C8.7030704@ebi.ac.uk>
References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>, <4C1B81C8.7030704@ebi.ac.uk>
Message-ID: <1DCCED50D0696A498958BA6B254456E2088DF30BDF@nerckwmb1.ad.nerc.ac.uk>

Hello,

Thanks Simon for your suggestion. Being so stuck on seqret, I hadn't thought
much about entret. Cheers.

Peter: great news about the new feature for EMBOSS 6.3.0.

Bela

From btiwari at ceh.ac.uk  Fri Jun 18 11:05:07 2010
From: btiwari at ceh.ac.uk (Tiwari, Bela)
Date: Fri, 18 Jun 2010 16:05:07 +0100
Subject: [EMBOSS] getting organism from accession
In-Reply-To: <1276872877.8505.2.camel@rrothery-desktop>
References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>, <1276872877.8505.2.camel@rrothery-desktop>
Message-ID: <1DCCED50D0696A498958BA6B254456E2088DF30BE0@nerckwmb1.ad.nerc.ac.uk>

Thanks Richard.

The script looks handy and at minimum may help me deal with searching for my
identifiers in more than one database.

cheers,

Bela

________________________________________
From: Richard Rothery [rrothery at ualberta.ca]
Sent: 18 June 2010 15:54
To: Tiwari, Bela
Cc: emboss at lists.open-bio.org
Subject: Re: [EMBOSS] getting organism from accession

I have a perl script written by Craig Knox of the University of Alberta
Bioinformatics help desk that does this. I have attached it FYI. It is slow,
but gets the job done. Output can be fed to gnumeric etc.

Richard

From pmr at ebi.ac.uk  Fri Jun 18 11:53:26 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 18 Jun 2010 16:53:26 +0100
Subject: [EMBOSS] getting organism from accession
In-Reply-To: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>
References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>
Message-ID: <4C1B9676.10106@ebi.ac.uk>

On 18/06/2010 13:07, Tiwari, Bela wrote:
> Dear all,
>
> I have a set of accession numbers and I want to retrieve the organism that
> the sequence is associated with - i.e. the content of the OS line in an
> embl file. I don't need the taxonomic id, and I don't need to start
> traversing taxonomy trees. I want to do this by accessing remote databases
> (via srs, as configured in my emboss.defaults file), rather than indexing
> databases locally. So the output I want would be a text mapping like:
>
> accession : species
>
> where species is taken from the OS line of a database entry.

This will be in EMBOSS 6.3.0 as:

infoseq -only -accession -organism

(with no ':' in the output)

For now it has the full text from the OS line; soon we may be able to select
one species name instead of a list.
Hope this will help Peter From R.MAULEON at CGIAR.ORG Tue Jun 22 05:53:49 2010 From: R.MAULEON at CGIAR.ORG (Mauleon, Ramil (IRRI)) Date: Tue, 22 Jun 2010 17:53:49 +0800 Subject: [EMBOSS] tfextract does not work properly with newer transfac site.dat file Message-ID: Hello, I used tfextract on the Transfac 6.4 file to be able to use this on tfscan, but it does not parse the file properly. Part of the problem that I saw with the Transfac site.dat 6.4 file were: 1 - many entries had more that 1 motif sequences (the SQ line); these subsequently weren't included in the parsed output AC R00018 XX ID MOUSE$ACRD_01 XX DT 20.06.1990 (created); ewi. DT 24.08.1995 (updated); hiwi. CO Copyright (C), Biobase GmbH. XX TY D XX DE AChR delta (acetylcholine receptor, delta-subunit); Gene: G000457. XX SQ TGCCTGG. SQ TGCCCTTG. SQ TGCCCTAA. SQ TGGCAAAC. XX SF -148 . . . 2 - Some motif sequences were broken up to 2 lines, for example.. AC R00709 XX ID HA$HMGCR_02 XX DT 20.06.1990 (created); ewi. DT 06.09.1995 (updated); ewi. CO Copyright (C), Biobase GmbH. XX TY D XX DE HMGCOAR (HMG-CoA reductase); Gene: G000157. XX SQ TGCTGGAACTCGACCAGCTATTGGTTGGCTCGGCCGTGGTGAGAGATGGTGCGGTGCCCG SQ TTCTCC. Thanks in advance for fixing tfextract Ramil --------------------------------- Ramil P. Mauleon Bioinformatics Specialist International Rice Research Institute DAPO Box 7777, Metro Manila, Philippines email: r.mauleon at cgiar.org phone: 632-580-5600 ext 2508 ; fax: 632-580-5699 --------------------------------- From korndoerfer at crelux.com Thu Jun 24 06:20:31 2010 From: korndoerfer at crelux.com (Ingo P. Korndoerfer) Date: Thu, 24 Jun 2010 12:20:31 +0200 Subject: [EMBOSS] pepstats pI values (vs protparam) Message-ID: <4C23316F.9000402@crelux.com> oh no ... i need some help again ... my colleagues tell me (similar to the issue resolved last week) peptstats and protparam calculate different pIs. i could not find in the pepstats docu, how it calculates the pI. 
the file eamino.dat does not seem to contain any pKas, and i could not identify any other file by name that would contain pkas. i any case, my feeling is i would have to manipulate some input to feed pepstats the values from *Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions* Electrophoresis Volume 15, Issue 1, Date: 1994, Pages: 529-539 Bengt Bjellqvist, Bodil Basse, Eydfinnur Olsen, Julio E. Celis since this seems to be what protparam uses. can anybody help ? alternatively, of course, if i can convince my team pepstats values are better then protparam values, that would be fine, too ... ingo -------------- next part -------------- A non-text attachment was scrubbed... Name: korndoerfer.vcf Type: text/x-vcard Size: 318 bytes Desc: not available URL: From pmr at ebi.ac.uk Thu Jun 24 06:57:41 2010 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 24 Jun 2010 11:57:41 +0100 Subject: [EMBOSS] pepstats pI values (vs protparam) In-Reply-To: <4C23316F.9000402@crelux.com> References: <4C23316F.9000402@crelux.com> Message-ID: <4C233A25.3070305@ebi.ac.uk> On 24/06/10 11:20, Ingo P. Korndoerfer wrote: > oh no ... i need some help again ... > > my colleagues tell me (similar to the issue resolved last week) > peptstats and protparam calculate different pIs. > > i could not find in the pepstats docu, how it calculates the pI. the > file eamino.dat does not seem to contain any pKas, > and i could not identify any other file by name that would contain pkas. Ah, fixed file not currently available as a command line option. 
Epk.dat is the file

> in any case, my feeling is i would have to manipulate some input to feed
> pepstats the values from
>
> *Reference points for comparisons of two-dimensional maps of proteins
> from different human cell types defined in a pH scale where isoelectric
> points correlate with polypeptide compositions*
> Electrophoresis
> Volume 15, Issue 1, Date: 1994, Pages: 529-539
> Bengt Bjellqvist, Bodil Basse, Eydfinnur Olsen, Julio E. Celis
>
> since this seems to be what protparam uses.

Ah, they have a few extra tricks, adjusting the N-terminal values for some amino acid residues. So some coding changes (not much, but we are very close to the release on July 15th) will be needed to implement these values.

Thanks for letting us know.

regards,

Peter Rice

From pmr at ebi.ac.uk Mon Jun 28 12:03:03 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 28 Jun 2010 17:03:03 +0100
Subject: [EMBOSS] pepstats pI values (vs protparam)
In-Reply-To: <4C23316F.9000402@crelux.com>
References: <4C23316F.9000402@crelux.com>
Message-ID: <4C28C7B7.9080400@ebi.ac.uk>

On 24/06/2010 11:20, Ingo P. Korndoerfer wrote:
> oh no ... i need some help again ...
>
> my colleagues tell me (similar to the issue resolved last week)
> pepstats and protparam calculate different pIs.
>
> alternatively, of course, if i can convince my team pepstats values are
> better than protparam values, that would be fine, too ...

That may be the easiest option for now...

I checked on what protparam does ... and it is very specific to running 2D gels under high urea conditions. It also looks as though the publication does not have all the values (only a few N-terminal residue values were given, for the proteins they analysed) so I have to dig deeper to calculate the others. The values are defined only for acidic proteins (and for acidic residues, values are given to 2 decimal places; for basic residues they are to the nearest integer)! Are there any more recent calculations for basic residues we can use?
It does lead to some interesting options, especially for eukaryote proteins, but we will not be able to change the calculations in time for this release. Meanwhile we would like to hear more from users who calculate pIs what conditions they are interested in (and why). regards, Peter Rice From korndoerfer at crelux.com Tue Jun 29 04:05:02 2010 From: korndoerfer at crelux.com (Ingo P. Korndoerfer) Date: Tue, 29 Jun 2010 10:05:02 +0200 Subject: [EMBOSS] pepstats pI values (vs protparam) In-Reply-To: <4C28C7B7.9080400@ebi.ac.uk> References: <4C23316F.9000402@crelux.com> <4C28C7B7.9080400@ebi.ac.uk> Message-ID: <4C29A92E.9020508@crelux.com> o.k. ... so i will contribute why we need to know the pI. our core activity is the purification of proteins for crystallization. a good pI prediction helps us making better guesses at what ion exchange media and buffer conditions might make a good starting point in the purification. so, i guess, what we need is the pI under conditions typically applicable to a purification. which is likely somewhat off from "physiological", but might be diluted saline solutions, typically, e.g., 100 mM NaCl, 10 mM Tris ... (just to give an idea about the ionic strength, if that matters). and while we are at it: it might be interesting for us to also be able to guess beforehand what difference a serine or threonine phosphorylation etc might make ... ingo On 28/06/2010 18:03, Peter Rice wrote: > On 24/06/2010 11:20, Ingo P. Korndoerfer wrote: >> oh no ... i need some help again ... >> >> my colleagues tell me (similar to the issue resolved last week) >> peptstats and protparam calculate different pIs. >> >> alternatively, of course, if i can convince my team pepstats values >> are better then protparam values, that would be fine, too ... > > That may be the easiest option for now... > > I checked on what protparam does ... and it is very specific to > running 2D gels under high urea conditions. 
It also looks as though > the publication does not have all the values (only a few N-terminal > resudie values were given, for the proteins they analysed) so I have > to dig deeper to calculate the others. The values are defined only > for acidic proteins (and for acidic residues, values are given to 2 > decimal places, for basic residues they are to the nearest integer! > Are there any more recent calculatioins for basic resudues we can > use? > > It does lead to some interesting options, especially for eukaryote > proteins, but we will not be able to change the calculations in time > for this release. > > Meanwhile we would like to hear more from users who calculate pIs > what conditions they are interested in (and why). > > regards, > > Peter Rice > > -------------- next part -------------- A non-text attachment was scrubbed... Name: korndoerfer.vcf Type: text/x-vcard Size: 318 bytes Desc: not available URL: From hpm at bioinfo-user.org.uk Wed Jun 30 10:18:03 2010 From: hpm at bioinfo-user.org.uk (Hamish McWilliam) Date: Wed, 30 Jun 2010 15:18:03 +0100 Subject: [EMBOSS] accessing emboss ftp site In-Reply-To: References: <6F57C2D1-8927-420C-940C-C6EC0C62AABE@gmail.com> <4BD0DB9B.5050005@sonsorol.org> <4BD1C05D.5010109@sonsorol.org> Message-ID: Hi Chris, I'm also seeing problems with the FTP site, but using Mirror rather than Transmit, it looks like the server does not like options being specified to the LIST command: Scanning remote directory /pub/EMBOSS ---> CWD /pub/EMBOSS 250 Directory successfully changed. ---> TYPE A 200 Switching to ASCII mode. ---> PORT 172,21,22,1,171,245 200 PORT command successful. Consider using PASV. 
---> PASV 227 Entering Passive Mode (208,94,50,58,104,178) ---> LIST -lat timed out Cannot get remote directory listing because: timed out Cannot get remote directory details (/pub/EMBOSS) disconnecting from emboss.open-bio.org Trying it with a command line client I get the response to hang if I try using any options to ls or dir, without options they are fine. All the best, Hamish On 29 May 2010 03:44, Koen van der Drift wrote: > Hi Chris, > > Did you have a chance to look at this? Just tried again, and Transmit still > won't let me access the emboss ftp site. > > Thanks, > > - Koen. > > > On Apr 23, 2010, at 11:44 AM, Chris Dagdigian wrote: > >> >> In the last few months the open-bio.org servers switched datacenters, IP >> addresses and firewall/IDS appliances. Lots of juicy things to look at and >> debug. >> >> Koen - if you have a chance can you send me the IP address that you are >> using to connect from? I might be able to find some relevant log entries >> with that info. >> >> -Chris >> >> >> >> Koen van der Drift wrote: >>> >>> Just for the record, it used to work with Transmit, this is only from >>> the last few months. >>> >>> - Koen. >>> >>> On Thu, Apr 22, 2010 at 7:28 PM, Chris Dagdigian >>> ?wrote: >>>> >>>> Might be an issue with the Juniper Netscreen firewall/IDS security >>>> appliance >>>> that sits upstream of the EMBOSS FTP server. I'll take a look at the >>>> security logs and alerts. >>>> >>>> -Chris >>>> >>>> >>>> Koen van der Drift wrote: >>>>> >>>>> Hi, >>>>> >>>>> For a while now I am unable to access the emboss ftp site using the OS >>>>> X >>>>> client Transmit. Loggin in works fine, but it chokes on the LIST >>>>> command. I have no problems accessing it from the command line. I have >>>>> added the output from Transmit below. I don't know if this is a >>>>> Transmit >>>>> or emboss issue, but just wanted to let you know. >>>>> >>>>> Thanks, >>>>> >>>>> - Koen. 
>>>>>
>>>>>
>>>>> Transmit 3.6.9 Session Transcript
>>>>> LibNcFTP 3.2.1 (August 13, 2007) compiled for UNIX
>>>>> Uname: Darwin|exile.local|9.8.0|Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC|Power Macintosh
>>>>> 220: (vsFTPd 2.0.1)
>>>>> Connected to emboss.open-bio.org.
>>>>> Cmd: USER anonymous
>>>>> 331: Please specify the password.
>>>>> Cmd: PASS NcFTP@
>>>>> 230: Login successful.
>>>>> Cmd: TYPE A
>>>>> 200: Switching to ASCII mode.
>>>>> Logged in to emboss.open-bio.org as anonymous.
>>>>> Cmd: SYST
>>>>> 215: UNIX Type: L8
>>>>> Cmd: PWD
>>>>> 257: "/"
>>>>> Cmd: CWD /pub/EMBOSS/fixes
>>>>> 250: Directory successfully changed.
>>>>> Cmd: PWD
>>>>> 257: "/pub/EMBOSS/fixes"
>>>>> Cmd: PASV
>>>>> 227: Entering Passive Mode (208,94,50,58,83,232)
>>>>> Cmd: LIST -a
>>>>> Could not read reply from control connection -- timed out. (SReadline 1)
>>>>> 220: (vsFTPd 2.0.1)
>>>>> Connected to emboss.open-bio.org.
>>>>> Cmd: USER anonymous
>>>>> 331: Please specify the password.
>>>>> Cmd: PASS NcFTP@
>>>>> 230: Login successful.
>>>>> Logged in to emboss.open-bio.org as anonymous.
>>>>> Cmd: SYST
>>>>> 215: UNIX Type: L8
>>>>> Cmd: PWD
>>>>> 257: "/"
>>>>> Cmd: CWD /pub/EMBOSS/fixes
>>>>> 250: Directory successfully changed.
>>>>> Cmd: PWD
>>>>> 257: "/pub/EMBOSS/fixes"
>>>>> Cmd: PASV
>>>>> 227: Entering Passive Mode (208,94,50,58,222,100)
>>>>> Cmd: LIST -a
>>>>> Could not read reply from control connection -- timed out. (SReadline 1)
>>>>>
>>>>> _______________________________________________
>>>>> EMBOSS mailing list
>>>>> EMBOSS at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/emboss
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

--
----
"Saying the internet has changed dramatically over the last five years is cliché - the internet is always changing dramatically" - Craig Labovitz, Arbor Networks.

From dag at sonsorol.org Wed Jun 30 12:30:41 2010
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 30 Jun 2010 12:30:41 -0400
Subject: [EMBOSS] accessing emboss ftp site
In-Reply-To:
References: <6F57C2D1-8927-420C-940C-C6EC0C62AABE@gmail.com> <4BD0DB9B.5050005@sonsorol.org> <4BD1C05D.5010109@sonsorol.org>

Message-ID: <4C2B7131.8070909@sonsorol.org> I'll check into this. We run the standard VS-FTPD server on the box but it has not been updated in a while. We also use an upstream Juniper SSG-series firewall and intrusion detection system and sometimes it gets messed up and affects individual TCP sessions. A reboot of the firewall usually clears things up. Given the problems with options maybe it is the vs-ftpd configuration though. I'll give it a look. -Chris Hamish McWilliam wrote: > Hi Chris, > > I'm also seeing problems with the FTP site, but using Mirror rather > than Transmit, it looks like the server does not like options being > specified to the LIST command: > > Scanning remote directory /pub/EMBOSS > ---> CWD /pub/EMBOSS > 250 Directory successfully changed. > ---> TYPE A > 200 Switching to ASCII mode. > ---> PORT 172,21,22,1,171,245 > 200 PORT command successful. Consider using PASV. > ---> PASV > 227 Entering Passive Mode (208,94,50,58,104,178) > ---> LIST -lat > timed out > Cannot get remote directory listing because: timed out > Cannot get remote directory details (/pub/EMBOSS) > disconnecting from emboss.open-bio.org > > Trying it with a command line client I get the response to hang if I > try using any options to ls or dir, without options they are fine. > > All the best, > > Hamish > > On 29 May 2010 03:44, Koen van der Drift wrote: >> Hi Chris, >> >> Did you have a chance to look at this? Just tried again, and Transmit still >> won't let me access the emboss ftp site. >> >> Thanks, >> >> - Koen. >> >> >> On Apr 23, 2010, at 11:44 AM, Chris Dagdigian wrote: >> >>> In the last few months the open-bio.org servers switched datacenters, IP >>> addresses and firewall/IDS appliances. Lots of juicy things to look at and >>> debug. >>> >>> Koen - if you have a chance can you send me the IP address that you are >>> using to connect from? I might be able to find some relevant log entries >>> with that info. 
>>> >>> -Chris >>> >>> >>> >>> Koen van der Drift wrote: >>>> Just for the record, it used to work with Transmit, this is only from >>>> the last few months. >>>> >>>> - Koen. >>>> >>>> On Thu, Apr 22, 2010 at 7:28 PM, Chris Dagdigian >>>> wrote: >>>>> Might be an issue with the Juniper Netscreen firewall/IDS security >>>>> appliance >>>>> that sits upstream of the EMBOSS FTP server. I'll take a look at the >>>>> security logs and alerts. >>>>> >>>>> -Chris >>>>> >>>>> >>>>> Koen van der Drift wrote: >>>>>> Hi, >>>>>> >>>>>> For a while now I am unable to access the emboss ftp site using the OS >>>>>> X >>>>>> client Transmit. Loggin in works fine, but it chokes on the LIST >>>>>> command. I have no problems accessing it from the command line. I have >>>>>> added the output from Transmit below. I don't know if this is a >>>>>> Transmit >>>>>> or emboss issue, but just wanted to let you know. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> - Koen. >>>>>> >>>>>> >>>>>> Transmit 3.6.9 Session Transcript >>>>>> LibNcFTP 3.2.1 (August 13, 2007) compiled for UNIX >>>>>> Uname: Darwin|exile.local|9.8.0|Darwin Kernel Version 9.8.0: Wed Jul 15 >>>>>> 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC|Power Macintosh >>>>>> 220: (vsFTPd 2.0.1) >>>>>> Connected to emboss.open-bio.org. >>>>>> Cmd: USER anonymous >>>>>> 331: Please specify the password. >>>>>> Cmd: PASS NcFTP@ >>>>>> 230: Login successful. >>>>>> Cmd: TYPE A >>>>>> 200: Switching to ASCII mode. >>>>>> Logged in to emboss.open-bio.org as anonymous. >>>>>> Cmd: SYST >>>>>> 215: UNIX Type: L8 >>>>>> Cmd: PWD >>>>>> 257: "/" >>>>>> Cmd: CWD /pub/EMBOSS/fixes >>>>>> 250: Directory successfully changed. >>>>>> Cmd: PWD >>>>>> 257: "/pub/EMBOSS/fixes" >>>>>> Cmd: PASV >>>>>> 227: Entering Passive Mode (208,94,50,58,83,232) >>>>>> Cmd: LIST -a >>>>>> Could not read reply from control connection -- timed out. (SReadline >>>>>> 1) >>>>>> 220: (vsFTPd 2.0.1) >>>>>> Connected to emboss.open-bio.org. 
>>>>>> Cmd: USER anonymous >>>>>> 331: Please specify the password. >>>>>> Cmd: PASS NcFTP@ >>>>>> 230: Login successful. >>>>>> Logged in to emboss.open-bio.org as anonymous. >>>>>> Cmd: SYST >>>>>> 215: UNIX Type: L8 >>>>>> Cmd: PWD >>>>>> 257: "/" >>>>>> Cmd: CWD /pub/EMBOSS/fixes >>>>>> 250: Directory successfully changed. >>>>>> Cmd: PWD >>>>>> 257: "/pub/EMBOSS/fixes" >>>>>> Cmd: PASV >>>>>> 227: Entering Passive Mode (208,94,50,58,222,100) >>>>>> Cmd: LIST -a >>>>>> Could not read reply from control connection -- timed out. (SReadline >>>>>> 1) >>>>>> >>>>>> _______________________________________________ >>>>>> EMBOSS mailing list >>>>>> EMBOSS at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/emboss >> _______________________________________________ >> EMBOSS mailing list >> EMBOSS at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/emboss >> > > > From jchavird at gmail.com Tue Jun 1 16:55:34 2010 From: jchavird at gmail.com (Justin Havird) Date: Tue, 1 Jun 2010 11:55:34 -0500 Subject: [EMBOSS] Tranalign relaxation? In-Reply-To: References:

Message-ID:

Thanks,

Yes, the problem is that some taxa don't follow the standard genetic codes available for some codons, usually start codons. I may try to pass the sequences to tranalign without the first codon and see if that will clear up a majority of the troublemakers.

I think the issue with "unambigouous ambiguous codons" is what is happening in cases where there is an ambiguous nucleotide position, even though the amino acid should be able to be determined.

Thanks again and let me know if anything else comes to mind,

Justin

On Mon, May 31, 2010 at 6:08 PM, Peter wrote:
> On Mon, May 31, 2010 at 11:38 PM, Justin Havird wrote:
> > Hi Peter,
> >
> > Yes, I have been using the correct mitochondrial translation codes.
> > Depending on the taxonomic group, there are about 4 different mitochondrial
> > translation codes that I have been using. The problem is that some
> > individual species use alternate start codons, etc. These are generally
> > case-by-case problems.
> >
> > Someone suggested I alter the tranalign program itself and create a new
> > translation table that would take these alternate start codons into
> > consideration. However, I haven't had much luck with this.
> >
> > Any other ideas?
> >
> > Thanks!
> >
> > Justin
>
> If I understand correctly, part of the problem is your organisms
> don't all fully follow the standard genetic codes available in
> EMBOSS. Therefore would a new tranalign option to give the
> start codon(s) to use be helpful?
>
> Personally I'd probably solve this kind of thing by writing a script
> in (Bio)python. I might still call tranalign to do the hard work but
> might try passing it the sequences with the first codon/amino
> acid removed.
>
> You didn't CC the mailing list - perhaps an accident? Please
> feel free to forward my reply back to the list.
>
> Peter C.
>
> P.S.
There were some issues with using EMBOSS transeq for > the translation of "unambigouous ambiguous codons" which I > reported back in July 2009: > http://lists.open-bio.org/pipermail/emboss/2009-July/003667.html > > I haven't been back to test if the latest EMBOSS release > addressed it (EMBOSS 6.1.0 is still broken in this regard > I think), but your problem with ambiguous IUPAC codes > may be linked to this. > From asidhu at biomap.org Thu Jun 3 16:38:08 2010 From: asidhu at biomap.org (Amandeep Sidhu) Date: Fri, 4 Jun 2010 00:38:08 +0800 Subject: [EMBOSS] CFP: 23rd IEEE International Symposium on Computer-Based Medical Systems 2010 (IEEE CBMS 2010) Message-ID: IEEE CBMS 2010 23rd IEEE International Symposium on Computer-Based Medical Systems 2010 Perth, Australia, 12-15 October 2010 http://www.cbms2010.curtin.edu.au/ The 23rd IEEE International Symposium on Computer-Based Medical Systems (CBMS 2010) is intended to provide an international forum for discussing the latest results in the field of computational medicine. The scientific program of CBMS 2010 will consist of invited keynote talks given by leading scientists in the field, and regular and special track sessions that cover a broad array of issues which relate computing to medicine. RELEVANT TOPICS Network and Telemedicine Systems Medical Databases & Information Systems Computer-Aided Diagnosis Medical Devices with Embedded Computers Bioinformatics in Medicine Software Systems in Medicine Pervasive Health Systems and Services Web-based Delivery of Medical Information Medical Image Segmentation & Compression Content Analysis of Biomedical Image Data Knowledge-Based & Decision Support Systems Hand-held Computing Applications in Medicine Knowledge Discovery & Data Mining Signal and Image Processing in Medicine Multimedia Biomedical Databases CBMS 2010 invites original previously unpublished contributions that are not submitted concurrently to a journal or another conference. 
Many of the above listed topics are represented by corresponding Special Tracks, while others are solely covered by the general CBMS track. Prospective authors are expected to submit their contributions to one of the corresponding Special Tracks or to the general track if none of the special tracks is relevant.

SPECIAL TRACKS

ST1: Computational Proteomics and Genomics
ST2: Knowledge Discovery and Decision Systems in Biomedicine
ST3: Ontologies for Biomedical Systems
ST4: HealthGrid & Cloud Computing
ST5: Technology Enhanced Learning in Medical Education
ST6: Intelligent Patient Management
ST7: Data Streams in Healthcare
ST8: Supporting Collaboration among Healthcare Workers
ST9: Telemedicine
ST10: Computer-Based Systems for Mental Health
ST11: Image Informatics in Biomedical Research and Clinical Medicine
ST12: e-Health

SUBMISSION GUIDELINES

Papers should be submitted electronically using EasyChair online submission system. The papers must be prepared following the IEEE two-column format and should not exceed the length of 6 (six) Letter-sized pages. LaTeX or Microsoft Word templates can be used when preparing the papers. Please, note that only PDF format of submissions is allowed.

Submission web site: http://www.easychair.org/conferences/?conf=cbms2010

All submissions will be peer-reviewed by at least three reviewers. The proceedings will be published by the IEEE Computer Society Press. At least one of the authors of accepted papers is required to register and present the work at the conference; otherwise their papers will be removed from the digital library after the conference.
IMPORTANT DATES

Submission deadline for regular papers: 24 June 2010
Deadline for tutorial submission: 24 June 2010
Notification of acceptance for papers and tutorials: 2 Aug 2010
Final camera ready due: 2 Sep 2010
Author registration: 2 Sep 2010

INTENDED AUDIENCE

Engineers, scientists, clinicians and managers involved in medical computing projects are encouraged to submit papers to the symposium and/or attend the symposium. The symposium provides its attendees with an opportunity to experience state-of-the-art research and development in a variety of topics directly and indirectly related to their own work. In addition to research papers, keynote speakers and tutorial sessions it provides participants with an opportunity to come up-to-date on important technological issues. The symposium encourages the participation of students engaged in research/development in computer-based medical systems.

Organizing Committee

GENERAL CHAIRS
Tharam Dillon, Curtin University of Technology, Australia
Daniel Rubin, National Center for Biomedical Ontologies, USA
William Gallagher, University College Dublin, Ireland

PROGRAM CHAIRS
Amandeep Sidhu, Curtin University of Technology, Australia
Alexey Tsymbal, Siemens, Germany

PUBLICATION CHAIRS
Mykola Pechenizkiy, Eindhoven University of Technology, Netherlands
Tony Hu, Drexel University, USA

SPECIAL TRACK CHAIRS
Maja Hadzic, Curtin University of Technology, Australia
Jake Chen, Indiana University, USA

TUTORIAL CHAIRS
Phoebe Chen, La Trobe University, Australia
Xiaofang Zhou, University of Queensland, Australia

PUBLICITY CHAIRS
Carolyn McGregor, University of Ontario Institute of Technology, Canada
Meifania Chen, Curtin University of Technology, Australia

From charles-listes-emboss at plessy.org Mon Jun 7 11:59:51 2010
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Mon, 7 Jun 2010 20:59:51 +0900
Subject: [EMBOSS] SAM as a sequence input and output format?
Message-ID: <20100607115951.GD4213@kunpuu.plessy.org>

Dear EMBOSS developers,

I often look for a sequence format that would put all the information on a single line, easy to parse with a Perl or sed filter, and was wondering if, in the absence of a better alternative, it would be possible for EMBOSS to include the SAM format as a sequence format.

http://samtools.sourceforge.net/samtools.shtml

It may look heretical at first sight, but I see that other projects are also using SAM as a format to store sequences with qualities:

http://picard.sourceforge.net/command-line-overview.shtml#FastqToSam

Have a nice day,

--
Charles Plessy
Tsurumi, Kanagawa, Japan

From pmr at ebi.ac.uk Mon Jun 7 12:41:08 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 07 Jun 2010 13:41:08 +0100
Subject: [EMBOSS] SAM as a sequence input and output format?
In-Reply-To: <20100607115951.GD4213@kunpuu.plessy.org>
References: <20100607115951.GD4213@kunpuu.plessy.org>
Message-ID: <4C0CE8E4.90409@ebi.ac.uk>

On 07/06/2010 12:59, Charles Plessy wrote:
> Dear EMBOSS developers,
>
> I often look for a sequence format that would put all the information on a
> single line, easy to parse with a Perl or sed filter, and was wondering if, in
> the absence of a better alternative, it would be possible for EMBOSS to
> include the SAM format as a sequence format.

Yes ... already included as an input format. We are working on it as an output format, perhaps in time for the next release.

For large numbers of sequences, we are also working on the binary (BAM) version of the format, which is popular because of the high level of compression.

regards,

Peter Rice

From roi.brodo at hotmail.com Wed Jun 16 07:13:05 2010
From: roi.brodo at hotmail.com (Roi Brodo)
Date: Wed, 16 Jun 2010 07:13:05 +0000
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
Message-ID:

Hello,

Suppose I have some circular DNA RefSeq in GenBank format.
I would like to create a new (smaller) GenBank file (both features and sequence) containing only a range of nucleotides from the original file, but a range that overlaps the "end". For example, if my original GenBank file spans 1,000,000 bases (1-1,000,000) I would like to get join(800,000-1,000,000, 1-100,000), so the new GenBank file will actually contain 300,000 bp from the circular DNA.

How can I do that? Again, my input is a GenBank file and the output should also be a GenBank file.

Thanks!
- Roi

From jison at ebi.ac.uk Wed Jun 16 08:21:42 2010
From: jison at ebi.ac.uk (Jon Ison)
Date: Wed, 16 Jun 2010 09:21:42 +0100 (BST)
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To:
References:
Message-ID: <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>

Hi Roi

You could try cutseq to remove the middle bit of the sequence:
http://emboss.sourceforge.net/apps/cvs/emboss/apps/cutseq.html

extractseq might also be worth a look:
http://emboss.sourceforge.net/apps/cvs/emboss/apps/extractseq.html

Have a play and let us know how you get on.

Cheers

Jon

> > Hello,
> >
> > Suppose I have some circular DNA RefSeq in GenBank format. I would
> > like to create a new (smaller) GenBank file (both features and sequence)
> > containing only a range of nucleotides from the original file, but a
> > range that overlaps the "end". For example, if my original GenBank file spans
> > 1,000,000 bases (1-1,000,000) I would like to get join(800,000-1,000,000,
> > 1-100,000), so the new GenBank file will actually contain 300,000 bp
> > from the circular DNA.
> >
> > How can I do that? Again, my input is a GenBank file and the output should
> > also be a GenBank file.
> >
> > Thanks!
> > - Roi
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

From roi.brodo at hotmail.com Wed Jun 16 12:13:20 2010
From: roi.brodo at hotmail.com (Roi Brodo)
Date: Wed, 16 Jun 2010 12:13:20 +0000
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To: <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>
References: , <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>
Message-ID:

Hi Jon and thanks for the reply.

I'm not sure how cutseq can create a GenBank file (a single file with both the features and the sequence). I also couldn't find out how extractseq can be used to achieve this task.

Thanks,
Roi

From roi.brodo at hotmail.com Wed Jun 16 12:55:25 2010
From: roi.brodo at hotmail.com (Roi Brodo)
Date: Wed, 16 Jun 2010 12:55:25 +0000
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To:
References: , , <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>,
Message-ID:

After some more reading I think I can do it using union. The problem is that after I create the list (of the two ranges) using yank, union dies with "union terminated: Bad value for '-sequence' and no prompt". Why is that? Shouldn't I use a yank file?
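
For anyone following this thread, the wrap-around range itself (join(800,000-1,000,000, 1-100,000) in the example above) can be illustrated with a minimal Python sketch. It works on a plain sequence string only and does not carry GenBank features along; the function name circular_slice is made up for illustration and is not part of EMBOSS or any library.

```python
def circular_slice(seq: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) of a circular sequence.

    If start > end, the range wraps past the end of the sequence, which is
    the same as join(start..len(seq), 1..end).
    Hypothetical helper for illustration only.
    """
    n = len(seq)
    if not (1 <= start <= n and 1 <= end <= n):
        raise ValueError("start and end must lie within 1..len(seq)")
    if start <= end:
        # Ordinary range that does not cross the origin.
        return seq[start - 1:end]
    # Wrapped range: tail of the sequence followed by its head.
    return seq[start - 1:] + seq[:end]


if __name__ == "__main__":
    genome = "ACGTACGTAC"  # toy 10-base "circular" sequence
    print(circular_slice(genome, 8, 3))
```

Note that with inclusive coordinates the range in the original question comes to 300,001 bases rather than exactly 300,000, since both position 800,000 and position 1,000,000 are counted.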

From pmr at ebi.ac.uk Thu Jun 17 08:38:21 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 17 Jun 2010 09:38:21 +0100
Subject: [EMBOSS] Get a partial GenBank of a circular DNA
In-Reply-To:
References: , , <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>,

Message-ID: <4C19DEFD.2080601@ebi.ac.uk> On 16/06/2010 13:55, Roi Brodo wrote: > > After some more reading I think I can do it using union. The problem is that after I create the list (of the two ranges) using yank, union dies on "union terminated: Bad value for '-sequence' and no prompt". Why is that? Shouldn't i use a yank file? Yes, yank and union is the correct approach. The output of yank is a list file, so the input to union should be @filename to read a list of sequence addresses from the file. If you just give the filename it assumes it is sequences (perhaps a fasta file of sequences to be joined). We will add this to our feature requests - it should be possible to make seqret handle ranges from circular sequences. This will be after the next release as it requires rewriting the way several library functions work to allow the circular range. regards, Peter Rice From roi.brodo at hotmail.com Thu Jun 17 08:45:17 2010 From: roi.brodo at hotmail.com (Roi Brodo) Date: Thu, 17 Jun 2010 08:45:17 +0000 Subject: [EMBOSS] =?iso-8859-1?q?Get_a_partial_GenBank_of_a_circular______?= =?iso-8859-1?q?DNA=FE?= In-Reply-To: <4C19DEFD.2080601@ebi.ac.uk> References: , , <34280.172.22.100.208.1276676502.squirrel@webmail.ebi.ac.uk>,

, <4C19DEFD.2080601@ebi.ac.uk>
Message-ID:

Thank you. May you keep up the great work!

From korndoerfer at crelux.com Fri Jun 18 07:58:27 2010
From: korndoerfer at crelux.com (Ingo P. Korndoerfer)
Date: Fri, 18 Jun 2010 09:58:27 +0200
Subject: [EMBOSS] pepstats vs protparam
Message-ID: <4C1B2723.3010806@crelux.com>

dear fellow emboss users,

we have in the past used the well-known expasy protparam tool to calculate our extinction coefficients for proteins. easy for most users and that is what they got used to.

we are now switching to an increased use of emboss, since it can be so nicely incorporated into our database programs and pipelines.

we now noted that pepstats does give different values for absorption coefficients for the same sequence. i have looked at the pepstats docu but could not find anything on the algorithm used. really, the math seems simple

E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine)

with

Ext(Tyr) = 1490, Ext(Trp) = 5500, Ext(Cystine) = 125;

(from the protparam docu).

would anybody know whether this is what pepstats uses (and help me avoid having to dig through the source code).

and second, it seems that the expasy colleagues have tweaked their algorithm further, see http://expasy.org/tools/protparam-doc.html (see the important note around the middle of the page).

would anybody have an opinion whether it would be adequate (even allowed) to emulate protparam behaviour in pepstats or to give the user a choice to do so ? it seems this might be easy to code. for sure it would result in more consistency in our work and would also allow us to be more consistent in communication with our clients without having to move away from emboss.
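
The formula quoted in this message is simple enough to sketch in a few lines of Python. This is only an illustration of the arithmetic with the three coefficients given above; it is not the pepstats source, and the function name and the assumption that every pair of Cys residues forms one cystine (when cystines=True) are choices made for this example.

```python
# Molar extinction coefficients as quoted in the message above
# (taken there from the protparam documentation).
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(sequence: str, cystines: bool = True) -> int:
    """E(Prot) = n(Tyr)*Ext(Tyr) + n(Trp)*Ext(Trp) + n(Cystine)*Ext(Cystine).

    Assumption for illustration: with cystines=True, all Cys residues are
    paired into cystines (floor(n_Cys / 2)); with cystines=False, none are.
    """
    seq = sequence.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2 if cystines else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


if __name__ == "__main__":
    # One Trp, one Tyr, two Cys (one cystine): 5500 + 1490 + 125
    print(extinction_coefficient("WYCC"))
    print(extinction_coefficient("WYCC", cystines=False))
```

Running the same sequence through both settings gives the two figures that differ only by the cystine term, which makes it easy to check whether a discrepancy between tools comes from the coefficients or from the cystine assumption.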
thanks cheers ingo -------------- next part -------------- A non-text attachment was scrubbed... Name: korndoerfer.vcf Type: text/x-vcard Size: 318 bytes Desc: not available URL: From pmr at ebi.ac.uk Fri Jun 18 09:29:18 2010 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 18 Jun 2010 10:29:18 +0100 Subject: [EMBOSS] pepstats vs protparam In-Reply-To: <4C1B2723.3010806@crelux.com> References: <4C1B2723.3010806@crelux.com> Message-ID: <4C1B3C6E.8040709@ebi.ac.uk> On 18/06/2010 08:58, Ingo P. Korndoerfer wrote: > we now noted that pepstats does give different values for absorption > coefficients for the same sequence. > i have looked at the pepstats docu but could not find anything on the > algorithm used. really, the math seems simple > > E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine) > > with > > Ext(Tyr) = 1490, Ext(Trp) = 5500, Ext(Cystine) = 125; > > (from the protparam docu). > > would anybody know whether this is what pepstats uses (and help me avoid having to dig through the source code). EMBOSS takes values from local data file emboss/data/Eamino.dat which is using slightly different values. That file credits Gill and von Hippel(1989) _Anal_Biochem_ 182 319-326 for the extinction coefficient values. Protparam updates these values from Pace et al. (1995). We will update EMBOSS to use these values in the next release. Thanks for letting us know Peter From btiwari at ceh.ac.uk Fri Jun 18 12:07:09 2010 From: btiwari at ceh.ac.uk (Tiwari, Bela) Date: Fri, 18 Jun 2010 13:07:09 +0100 Subject: [EMBOSS] getting organism from accession Message-ID: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk> Dear all, I have a set of accession numbers and I want to retrieve the organism that the sequence is associated with - i.e. the content of the OS line in an embl file. I don't need the taxonomic id, and I don't need to start traversing taxonomy trees. 
I want to do this by accessing remote databases (via srs, as configured in my emboss.defaults file), rather than indexing databases locally. So the output I want would be a text mapping like:

accession : species

where species is taken from the OS line of a database entry.

The closest I've come using Emboss is to get the gff output file containing feature information using a command along the lines of:

seqret -feature embl:XXXX -oufo2 myfeat.txt

(embl is a database I can search using srs as configured in my emboss.defaults file.) The first non-hashed line in the file myfeat.txt contains the term

organism="Whateverus thingus"

so I could parse that out. However, this file still contains a lot of extra (unwanted) information and requires parsing.

Does anyone know if I'm missing something obvious in Emboss that I could use for this?

(I have tried the BioPerl route to get this info from the NCBI, and apart from being unwieldy, I'm managing to get the wrong organism returned for the type of identifier I have. No, I haven't spent time tracking down the problem - frankly, I'd rather resolve it using Emboss and/or srs calls.)

If there isn't anything that will do the job in Emboss at the moment, is there any chance I can put in a development request for an extra flag for seqret, or an extra utility tool that might accomplish this task?

cheers,

Bela

*************************
Dr. Bela Tiwari
Lead Bioinformatician
NERC Environmental
Bioinformatics Centre
http://nebc.nerc.ac.uk
tel: 01491 69 2705

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB
*************************

--
This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system.
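The "accession : species" mapping asked for above can be sketched in a few lines of Python, assuming the full EMBL-format entries are already available as text (how they are fetched is left open here). The function name is hypothetical, and the sketch relies only on the standard EMBL flat-file layout: a two-letter line code, the data starting at column 6, and "//" terminating each entry.

```python
from typing import Dict, List, Optional


def accession_to_organism(embl_text: str) -> Dict[str, str]:
    """Map the primary accession of each EMBL entry to its OS line text.

    Hypothetical helper for illustration; multi-line OS values are joined
    with a single space.
    """
    mapping: Dict[str, str] = {}
    accession: Optional[str] = None
    organism_parts: List[str] = []
    for line in embl_text.splitlines():
        code = line[:2]
        value = line[5:].strip()
        if code == "AC" and accession is None:
            # The first accession on the first AC line is the primary one.
            accession = value.split(";")[0].strip()
        elif code == "OS":
            organism_parts.append(value)
        elif line.startswith("//"):
            # End of entry: store the result and reset for the next entry.
            if accession is not None:
                mapping[accession] = " ".join(organism_parts)
            accession, organism_parts = None, []
    return mapping


if __name__ == "__main__":
    # Placeholder entry reusing the made-up names from the message above.
    sample = "ID   XXXX; SV 1\nAC   XXXX;\nOS   Whateverus thingus\n//\n"
    for acc, species in accession_to_organism(sample).items():
        print(f"{acc} : {species}")
```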
From simon.andrews at bbsrc.ac.uk Fri Jun 18 13:48:23 2010 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Fri, 18 Jun 2010 14:48:23 +0100 Subject: [EMBOSS] getting organism from accession In-Reply-To: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk> References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk> Message-ID: <78D900EA-65A4-453B-94EB-675E51FAC7AD@bbsrc.ac.uk> On 18 Jun 2010, at 13:07, Tiwari, Bela wrote: > Dear all, > > I have a set of accession numbers and I want to retrieve the organism that the sequence is associated with - i.e. the content of the OS line in an embl file. I don't need the taxonomic id, and I don't need to start traversing taxonomy trees. I want to do this by accessing remote databases (via srs, as configured in my emboss.defaults file), rather than indexing databases locally. So the output I want would be a text mapping like: > > accession : species > > where species is taken from the OS line of a database entry. how about: entret embl:XXXX stdout | grep ^OS Is that close enough? Simon. From pmr at ebi.ac.uk Fri Jun 18 14:25:12 2010 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 18 Jun 2010 15:25:12 +0100 Subject: [EMBOSS] getting organism from accession In-Reply-To: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk> References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk> Message-ID: <4C1B81C8.7030704@ebi.ac.uk> On 18/06/2010 13:07, Tiwari, Bela wrote: > Dear all, > > I have a set of accession numbers and I want to retrieve the organism > that the sequence is associated with - i.e. the content of the OS line > in an embl file. I don't need the taxonomic id, and I don't need to > start traversing taxonomy trees. I want to do this by accessing remote > databases (via srs, as configured in my emboss.defaults file), rather > than indexing databases locally. Coming soon. 
EMBOSS 6.3.0 will include the capability to pick up the taxid and to parse the NCBI taxonomy. This opens up a lot of possibilities. You could also write a little EMBOSS utility to report the Tax string from the sequence object. Suggestions for ways to use this information in a future release are welcome.

regards,

Peter

From btiwari at ceh.ac.uk Fri Jun 18 14:49:50 2010
From: btiwari at ceh.ac.uk (Tiwari, Bela)
Date: Fri, 18 Jun 2010 15:49:50 +0100
Subject: [EMBOSS] getting organism from accession
In-Reply-To: <4C1B81C8.7030704@ebi.ac.uk>
References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>, <4C1B81C8.7030704@ebi.ac.uk>
Message-ID: <1DCCED50D0696A498958BA6B254456E2088DF30BDF@nerckwmb1.ad.nerc.ac.uk>

Hello,

Thanks Simon for your suggestion. Being so stuck on seqret, I hadn't thought much about entret. Cheers.

Peter: great news about the new feature for EMBOSS 6.3.0.

Bela

From btiwari at ceh.ac.uk Fri Jun 18 15:05:07 2010
From: btiwari at ceh.ac.uk (Tiwari, Bela)
Date: Fri, 18 Jun 2010 16:05:07 +0100
Subject: [EMBOSS] getting organism from accession
In-Reply-To: <1276872877.8505.2.camel@rrothery-desktop>
References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>, <1276872877.8505.2.camel@rrothery-desktop>
Message-ID: <1DCCED50D0696A498958BA6B254456E2088DF30BE0@nerckwmb1.ad.nerc.ac.uk>

Thanks Richard. The script looks handy and at minimum may help me deal with searching for my identifiers in more than one database.

cheers,

Bela

________________________________________
From: Richard Rothery [rrothery at ualberta.ca]
Sent: 18 June 2010 15:54
To: Tiwari, Bela
Cc: emboss at lists.open-bio.org
Subject: Re: [EMBOSS] getting organism from accession

I have a perl script written by Craig Knox of the University of Alberta Bioinformatics help desk that does this. I have attached it FYI. It is slow, but gets the job done. Output can be fed to gnumeric etc.
Richard

From pmr at ebi.ac.uk Fri Jun 18 15:53:26 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 18 Jun 2010 16:53:26 +0100
Subject: [EMBOSS] getting organism from accession
In-Reply-To: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>
References: <1DCCED50D0696A498958BA6B254456E2088DF30BDD@nerckwmb1.ad.nerc.ac.uk>
Message-ID: <4C1B9676.10106@ebi.ac.uk>

On 18/06/2010 13:07, Tiwari, Bela wrote:
> Dear all,
>
> I have a set of accession numbers and I want to retrieve the organism that the sequence is associated with - i.e. the content of the OS line in an embl file. I don't need the taxonomic id, and I don't need to start traversing taxonomy trees. I want to do this by accessing remote databases (via srs, as configured in my emboss.defaults file), rather than indexing databases locally. So the output I want would be a text mapping like:
>
> accession : species
>
> where species is taken from the OS line of a database entry.

This will be in EMBOSS 6.3.0 as:

infoseq -only -accession -organism

(with no ':' in the output)

For now it reports the full text from the OS line; soon we may be able to select one species name instead of a list.
Hope this will help

Peter

From R.MAULEON at CGIAR.ORG Tue Jun 22 09:53:49 2010
From: R.MAULEON at CGIAR.ORG (Mauleon, Ramil (IRRI))
Date: Tue, 22 Jun 2010 17:53:49 +0800
Subject: [EMBOSS] tfextract does not work properly with newer transfac site.dat file
Message-ID:

Hello,

I ran tfextract on the Transfac 6.4 file so that I could use the result with tfscan, but it does not parse the file properly. The problems I saw with the Transfac 6.4 site.dat file were:

1 - Many entries had more than one motif sequence (the SQ line); these subsequently weren't included in the parsed output.

AC  R00018
XX
ID  MOUSE$ACRD_01
XX
DT  20.06.1990 (created); ewi.
DT  24.08.1995 (updated); hiwi.
CO  Copyright (C), Biobase GmbH.
XX
TY  D
XX
DE  AChR delta (acetylcholine receptor, delta-subunit); Gene: G000457.
XX
SQ  TGCCTGG.
SQ  TGCCCTTG.
SQ  TGCCCTAA.
SQ  TGGCAAAC.
XX
SF  -148
. . .

2 - Some motif sequences were broken up over 2 lines, for example:

AC  R00709
XX
ID  HA$HMGCR_02
XX
DT  20.06.1990 (created); ewi.
DT  06.09.1995 (updated); ewi.
CO  Copyright (C), Biobase GmbH.
XX
TY  D
XX
DE  HMGCOAR (HMG-CoA reductase); Gene: G000157.
XX
SQ  TGCTGGAACTCGACCAGCTATTGGTTGGCTCGGCCGTGGTGAGAGATGGTGCGGTGCCCG
SQ  TTCTCC.

Thanks in advance for fixing tfextract

Ramil

---------------------------------
Ramil P. Mauleon
Bioinformatics Specialist
International Rice Research Institute
DAPO Box 7777, Metro Manila, Philippines
email: r.mauleon at cgiar.org
phone: 632-580-5600 ext 2508 ; fax: 632-580-5699
---------------------------------

From korndoerfer at crelux.com Thu Jun 24 10:20:31 2010
From: korndoerfer at crelux.com (Ingo P. Korndoerfer)
Date: Thu, 24 Jun 2010 12:20:31 +0200
Subject: [EMBOSS] pepstats pI values (vs protparam)
Message-ID: <4C23316F.9000402@crelux.com>

oh no ... i need some help again ...

my colleagues tell me (similar to the issue resolved last week) pepstats and protparam calculate different pIs.

i could not find in the pepstats documentation how it calculates the pI.
the file Eamino.dat does not seem to contain any pKas, and i could not identify any other file by name that would contain pKas.

in any case, my feeling is i would have to manipulate some input to feed pepstats the values from

*Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions*
Electrophoresis Volume 15, Issue 1, Date: 1994, Pages: 529-539
Bengt Bjellqvist, Bodil Basse, Eydfinnur Olsen, Julio E. Celis

since this seems to be what protparam uses. can anybody help?

alternatively, of course, if i can convince my team pepstats values are better than protparam values, that would be fine, too ...

ingo

From pmr at ebi.ac.uk Thu Jun 24 10:57:41 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 24 Jun 2010 11:57:41 +0100
Subject: [EMBOSS] pepstats pI values (vs protparam)
In-Reply-To: <4C23316F.9000402@crelux.com>
References: <4C23316F.9000402@crelux.com>
Message-ID: <4C233A25.3070305@ebi.ac.uk>

On 24/06/10 11:20, Ingo P. Korndoerfer wrote:
> oh no ... i need some help again ...
>
> my colleagues tell me (similar to the issue resolved last week)
> pepstats and protparam calculate different pIs.
>
> i could not find in the pepstats docu, how it calculates the pI. the
> file Eamino.dat does not seem to contain any pKas,
> and i could not identify any other file by name that would contain pKas.

Ah, it is a fixed file, not currently available as a command line option.
Epk.dat is the file.

> in any case, my feeling is i would have to manipulate some input to feed
> pepstats the values from
>
> *Reference points for comparisons of two-dimensional maps of proteins
> from different human cell types defined in a pH scale where isoelectric
> points correlate with polypeptide compositions*
> Electrophoresis
> Volume 15, Issue 1, Date: 1994, Pages: 529-539
> Bengt Bjellqvist, Bodil Basse, Eydfinnur Olsen, Julio E. Celis
>
> since this seems to be what protparam uses.

Ah, they have a few extra tricks, adjusting the N terminal values for some amino acid residues. So some coding changes (not much, but we are very close to the release on July 15th) will be needed to implement these values.

Thanks for letting us know.

regards,

Peter Rice

From pmr at ebi.ac.uk Mon Jun 28 16:03:03 2010
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 28 Jun 2010 17:03:03 +0100
Subject: [EMBOSS] pepstats pI values (vs protparam)
In-Reply-To: <4C23316F.9000402@crelux.com>
References: <4C23316F.9000402@crelux.com>
Message-ID: <4C28C7B7.9080400@ebi.ac.uk>

On 24/06/2010 11:20, Ingo P. Korndoerfer wrote:
> oh no ... i need some help again ...
>
> my colleagues tell me (similar to the issue resolved last week)
> pepstats and protparam calculate different pIs.
>
> alternatively, of course, if i can convince my team pepstats values are
> better than protparam values, that would be fine, too ...

That may be the easiest option for now...

I checked on what protparam does ... and it is very specific to running 2D gels under high urea conditions. It also looks as though the publication does not have all the values (only a few N-terminal residue values were given, for the proteins they analysed) so I have to dig deeper to calculate the others. The values are defined only for acidic proteins (and for acidic residues, values are given to 2 decimal places; for basic residues they are to the nearest integer!). Are there any more recent calculations for basic residues we can use?
It does lead to some interesting options, especially for eukaryote proteins, but we will not be able to change the calculations in time for this release.

Meanwhile we would like to hear more from users who calculate pIs about what conditions they are interested in (and why).

regards,

Peter Rice

From korndoerfer at crelux.com Tue Jun 29 08:05:02 2010
From: korndoerfer at crelux.com (Ingo P. Korndoerfer)
Date: Tue, 29 Jun 2010 10:05:02 +0200
Subject: [EMBOSS] pepstats pI values (vs protparam)
In-Reply-To: <4C28C7B7.9080400@ebi.ac.uk>
References: <4C23316F.9000402@crelux.com> <4C28C7B7.9080400@ebi.ac.uk>
Message-ID: <4C29A92E.9020508@crelux.com>

o.k. ... so i will contribute why we need to know the pI.

our core activity is the purification of proteins for crystallization. a good pI prediction helps us make better guesses at what ion exchange media and buffer conditions might make a good starting point in the purification.

so, i guess, what we need is the pI under conditions typically applicable to a purification. these are likely somewhat off from "physiological", but might be dilute saline solutions, typically, e.g., 100 mM NaCl, 10 mM Tris ... (just to give an idea about the ionic strength, if that matters).

and while we are at it: it might be interesting for us to also be able to guess beforehand what difference a serine or threonine phosphorylation etc might make ...

ingo

From hpm at bioinfo-user.org.uk Wed Jun 30 14:18:03 2010
From: hpm at bioinfo-user.org.uk (Hamish McWilliam)
Date: Wed, 30 Jun 2010 15:18:03 +0100
Subject: [EMBOSS] accessing emboss ftp site
In-Reply-To:
References: <6F57C2D1-8927-420C-940C-C6EC0C62AABE@gmail.com> <4BD0DB9B.5050005@sonsorol.org> <4BD1C05D.5010109@sonsorol.org>
Message-ID:

Hi Chris,

I'm also seeing problems with the FTP site, but using Mirror rather than Transmit. It looks like the server does not like options being specified to the LIST command:

Scanning remote directory /pub/EMBOSS
---> CWD /pub/EMBOSS
250 Directory successfully changed.
---> TYPE A
200 Switching to ASCII mode.
---> PORT 172,21,22,1,171,245
200 PORT command successful. Consider using PASV.
---> PASV
227 Entering Passive Mode (208,94,50,58,104,178)
---> LIST -lat
timed out
Cannot get remote directory listing because: timed out
Cannot get remote directory details (/pub/EMBOSS)
disconnecting from emboss.open-bio.org

Trying it with a command line client, the response hangs if I use any options to ls or dir; without options they are fine.

All the best,

Hamish

On 29 May 2010 03:44, Koen van der Drift wrote:
> Hi Chris,
>
> Did you have a chance to look at this? Just tried again, and Transmit still
> won't let me access the emboss ftp site.
>
> Thanks,
>
> - Koen.
>
>
> On Apr 23, 2010, at 11:44 AM, Chris Dagdigian wrote:
>
>>
>> In the last few months the open-bio.org servers switched datacenters, IP
>> addresses and firewall/IDS appliances. Lots of juicy things to look at and
>> debug.
>>
>> Koen - if you have a chance can you send me the IP address that you are
>> using to connect from? I might be able to find some relevant log entries
>> with that info.
>>
>> -Chris
>>
>>
>>
>> Koen van der Drift wrote:
>>>
>>> Just for the record, it used to work with Transmit, this is only from
>>> the last few months.
>>>
>>> - Koen.
>>>
>>> On Thu, Apr 22, 2010 at 7:28 PM, Chris Dagdigian wrote:
>>>>
>>>> Might be an issue with the Juniper Netscreen firewall/IDS security
>>>> appliance that sits upstream of the EMBOSS FTP server. I'll take a
>>>> look at the security logs and alerts.
>>>>
>>>> -Chris
>>>>
>>>>
>>>> Koen van der Drift wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> For a while now I am unable to access the emboss ftp site using the
>>>>> OS X client Transmit. Logging in works fine, but it chokes on the LIST
>>>>> command. I have no problems accessing it from the command line. I have
>>>>> added the output from Transmit below. I don't know if this is a
>>>>> Transmit or emboss issue, but just wanted to let you know.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> - Koen.
>>>>>
>>>>>
>>>>> Transmit 3.6.9 Session Transcript
>>>>> LibNcFTP 3.2.1 (August 13, 2007) compiled for UNIX
>>>>> Uname: Darwin|exile.local|9.8.0|Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC|Power Macintosh
>>>>> 220: (vsFTPd 2.0.1)
>>>>> Connected to emboss.open-bio.org.
>>>>> Cmd: USER anonymous
>>>>> 331: Please specify the password.
>>>>> Cmd: PASS NcFTP@
>>>>> 230: Login successful.
>>>>> Cmd: TYPE A
>>>>> 200: Switching to ASCII mode.
>>>>> Logged in to emboss.open-bio.org as anonymous.
>>>>> Cmd: SYST
>>>>> 215: UNIX Type: L8
>>>>> Cmd: PWD
>>>>> 257: "/"
>>>>> Cmd: CWD /pub/EMBOSS/fixes
>>>>> 250: Directory successfully changed.
>>>>> Cmd: PWD
>>>>> 257: "/pub/EMBOSS/fixes"
>>>>> Cmd: PASV
>>>>> 227: Entering Passive Mode (208,94,50,58,83,232)
>>>>> Cmd: LIST -a
>>>>> Could not read reply from control connection -- timed out. (SReadline 1)
>>>>> 220: (vsFTPd 2.0.1)
>>>>> Connected to emboss.open-bio.org.
>>>>> Cmd: USER anonymous
>>>>> 331: Please specify the password.
>>>>> Cmd: PASS NcFTP@
>>>>> 230: Login successful.
>>>>> Logged in to emboss.open-bio.org as anonymous.
>>>>> Cmd: SYST
>>>>> 215: UNIX Type: L8
>>>>> Cmd: PWD
>>>>> 257: "/"
>>>>> Cmd: CWD /pub/EMBOSS/fixes
>>>>> 250: Directory successfully changed.
>>>>> Cmd: PWD
>>>>> 257: "/pub/EMBOSS/fixes"
>>>>> Cmd: PASV
>>>>> 227: Entering Passive Mode (208,94,50,58,222,100)
>>>>> Cmd: LIST -a
>>>>> Could not read reply from control connection -- timed out. (SReadline 1)
>>>>>
>>>>> _______________________________________________
>>>>> EMBOSS mailing list
>>>>> EMBOSS at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/emboss

--
----
"Saying the internet has changed dramatically over the last five years is cliché – the internet is always changing dramatically" - Craig Labovitz, Arbor Networks.

From dag at sonsorol.org Wed Jun 30 16:30:41 2010
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 30 Jun 2010 12:30:41 -0400
Subject: [EMBOSS] accessing emboss ftp site
In-Reply-To:
References: <6F57C2D1-8927-420C-940C-C6EC0C62AABE@gmail.com> <4BD0DB9B.5050005@sonsorol.org> <4BD1C05D.5010109@sonsorol.org>