From mictadlo at gmail.com Mon Jan 2 21:05:48 2012
From: mictadlo at gmail.com (Mic)
Date: Tue, 3 Jan 2012 12:05:48 +1000
Subject: [Biopython] subprocess.Popen problem
In-Reply-To:
References:
Message-ID:

With the following code:

if __name__ == '__main__':
    cmd_soap = 'soap ...'
    proc = subprocess.Popen(cmd_soap, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    returncode = proc.wait()
    print "returncoded", returncode
    stdout_value, stderr_value = proc.communicate()
    print 'stderr ', stderr_value
    for line in stderr_value:
        print "!", line
    if returncode == 1:
        sys.exit(1)

I get this:

! B
! e
! g
! i
! n
!
! P
! r
! o
! g
! r
! a
! m
!
! S
! O
! A
! P
! a
! l
! i
! g
! n
! e
! r
! /
! s
! o
! a
! p
! 2
!
...

instead of:

Begin Program SOAPaligner/soap2
...

What did I do wrong?

On Thu, Nov 3, 2011 at 7:31 PM, Peter Cock wrote:

> On Thu, Nov 3, 2011 at 3:16 AM, Mic wrote:
> > Thank you, I wrote the following code and not sure whether it is what did
> > write me.
>
> Depending on the tool I would check for a non-zero return code rather
> than just treating 1 as an error.
>
> You are also not collecting stderr/stdout correctly. If you send them
> to a pipe, the strings from the .communicate will be empty. Rather
> reads from the process object's .stdout and .stderr handles. See:
> http://docs.python.org/library/subprocess.html
>
> Peter
>

From bpederse at gmail.com Mon Jan 2 22:01:23 2012
From: bpederse at gmail.com (Brent Pedersen)
Date: Mon, 2 Jan 2012 20:01:23 -0700
Subject: [Biopython] subprocess.Popen problem
In-Reply-To:
References:
Message-ID:

On Mon, Jan 2, 2012 at 7:05 PM, Mic wrote:
> With the following code:
> if __name__ == '__main__':
>     cmd_soap = 'soap ...'
>     proc = subprocess.Popen(cmd_soap, shell=True,
> stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>     returncode = proc.wait()
>     print "returncoded", returncode
>     stdout_value, stderr_value = proc.communicate()
>     print 'stderr ', stderr_value
>     for line in stderr_value:
>         print "!", line
>     if returncode == 1:
>         sys.exit(1)
>
> I get this:
>
> ! B
> ! e
> ! g
> ! i
> ! n
> !
> ! P
> ! r
> ! o
> ! g
> ! r
> ! a
> ! m
> !
> ! S
> ! O
> ! A
> ! P
> ! a
> ! l
> ! i
> ! g
> ! n
> ! e
> ! r
> ! /
> ! s
> ! o
> ! a
> ! p
> ! 2
> !
> ...
>
> instead of :
> Begin Program SOAPaligner/soap2
> ...
>
>
> What did I do wrong?
>

stderr_value is a string. you are iterating over the letters.
you could do:

    for line in stderr_value.split("\n"):
        print "!", line

also, instead of using proc.wait/communicate(), you could do:

    for line in proc.stderr:
        print "!", line
    proc.wait()  # returncode stays None until the process has been waited on
    print proc.returncode

then you can see the output as it's generated (after buffering).

>
> On Thu, Nov 3, 2011 at 7:31 PM, Peter Cock wrote:
>
>> On Thu, Nov 3, 2011 at 3:16 AM, Mic wrote:
>> > Thank you, I wrote the following code and not sure whether it is what did
>> > write me.
>>
>> Depending on the tool I would check for a non-zero return code rather
>> than just treating 1 as an error.
>>
>> You are also not collecting stderr/stdout correctly. If you send them
>> to a pipe, the strings from the .communicate will be empty. Rather
>> reads from the process object's .stdout and .stderr handles. See:
>> http://docs.python.org/library/subprocess.html
>>
>> Peter
>>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From David.Lapointe at umassmed.edu Fri Jan 6 14:43:45 2012
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Fri, 6 Jan 2012 14:43:45 -0500
Subject: [Biopython] Bio Motif
Message-ID:

Is there a reference to the algorithm behind this module?

David

--
David Lapointe, Ph.D.
Director Scientific Computing/Information Services
University of Massachusetts Medical School
55 Lake Avenue N
Worcester MA 01655
508-856-5141 (v)
' the lyf so short, the craft so long to lerne'

From mjldehoon at yahoo.com Fri Jan 6 22:49:30 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 6 Jan 2012 19:49:30 -0800 (PST)
Subject: [Biopython] Bio Motif
In-Reply-To:
Message-ID: <1325908170.23026.YahooMailClassic@web161205.mail.bf1.yahoo.com>

Which part of Bio.Motif are you using? Bio.Motif has various capabilities (parsing, PWM score calculation, threshold calculation), and different references may be applicable depending on how you are using Bio.Motif.

-Michiel.

From David.Lapointe at umassmed.edu Sat Jan 7 17:39:02 2012
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Sat, 7 Jan 2012 17:39:02 -0500
Subject: [Biopython] Bio Motif
In-Reply-To: <1325948070.60902.YahooMailClassic@web161206.mail.bf1.yahoo.com>
References: <1325948070.60902.YahooMailClassic@web161206.mail.bf1.yahoo.com>
Message-ID:

Thanks Michiel,

I was assuming the replies went back to the list. Sorry.
________________________________________
From: Michiel de Hoon [mjldehoon at yahoo.com]
Sent: Saturday, January 07, 2012 9:54 AM
To: Lapointe, David
Subject: RE: [Biopython] Bio Motif

Hi David,

It's better to reply to the list rather than to individual people. As a case in point, whereas I wrote some parts of Bio.Motif, I did not write the parts that you are using. My suggestion would be to cite "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids" by Richard Durbin et al. as a general reference for these kinds of analysis, but the actual author of the relevant parts of Bio.Motif may have other suggestions. I would also encourage you to cite the generic Biopython reference (PubMed ID 19304878).

Best,
-Michiel.

--- On Sat, 1/7/12, Lapointe, David wrote:

> From: Lapointe, David
> Subject: RE: [Biopython] Bio Motif
> To: "Michiel de Hoon"
> Date: Saturday, January 7, 2012, 9:05 AM
> Yes I should have been more specific.
> Not the third party parts (MEME, etc). I am
> going from a list of sequences, creating a
> PWM, then using the generated PWM to parse longer sequences
> for motifs. Mostly I am interested in information about the
> thresholds and interpreting results from search_pwm, but
> also some way to cite the method for publication.
>
> David

From bartek at rezolwenta.eu.org Sat Jan 7 19:13:16 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sun, 8 Jan 2012 01:13:16 +0100
Subject: [Biopython] Bio Motif
In-Reply-To:
References: <1325948070.60902.YahooMailClassic@web161206.mail.bf1.yahoo.com>
Message-ID:

Hi David,

The references given by Michiel are fine. If you are using the specific methods for choosing the thresholds based on the expected false positive/negative rates you could cite the paper by Sven Rahmann et al. (pubmed id 16646785).

best
Bartek

On Sat, Jan 7, 2012 at 11:39 PM, Lapointe, David wrote:
> Thanks Michiel,
>
> I was assuming the replies went back to the list. Sorry.
> ________________________________________
> From: Michiel de Hoon [mjldehoon at yahoo.com]
> Sent: Saturday, January 07, 2012 9:54 AM
> To: Lapointe, David
> Subject: RE: [Biopython] Bio Motif
>
> Hi David,
>
> It's better to reply to the list rather than to individual people. As a case in point, whereas I wrote some parts of Bio.Motif, I did not write the parts that you are using. My suggestion would be to cite "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids" by Richard Durbin et al.
as a general reference for these kinds of analysis, but the actual author of the relevant parts of Bio.Motif may have other suggestions. I would also encourage you to cite the generic Biopython reference (PubMed ID 19304878).
>
> Best,
> -Michiel.
--
Bartek Wilczynski

From David.Lapointe at umassmed.edu Sat Jan 7 20:00:16 2012
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Sat, 7 Jan 2012 20:00:16 -0500
Subject: [Biopython] Bio Motif
In-Reply-To:
References: <1325948070.60902.YahooMailClassic@web161206.mail.bf1.yahoo.com>
Message-ID:

Thanks!
________________________________________
From: biopython-bounces at lists.open-bio.org [biopython-bounces at lists.open-bio.org] On Behalf Of Bartek Wilczynski [bartek at rezolwenta.eu.org]
Sent: Saturday, January 07, 2012 7:13 PM
To: Lapointe, David
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] Bio Motif

Hi David,

The references given by Michiel are fine. If you are using the specific methods for choosing the thresholds based on the expected false positive/negative rates you could cite the paper by Sven Rahmann et al. (pubmed id 16646785).

best
Bartek

On Sat, Jan 7, 2012 at 11:39 PM, Lapointe, David wrote:
> Thanks Michiel,
>
> I was assuming the replies went back to the list. Sorry.
> ________________________________________
> From: Michiel de Hoon [mjldehoon at yahoo.com]
> Sent: Saturday, January 07, 2012 9:54 AM
> To: Lapointe, David
> Subject: RE: [Biopython] Bio Motif
>
> Hi David,
>
> It's better to reply to the list rather than to individual people.
As a case in point, whereas I wrote some parts of Bio.Motif, I did not write the parts that you are using. My suggestion would be to cite "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids" by Richard Durbin et al. as a general reference for these kinds of analysis, but the actual author of the relevant parts of Bio.Motif may have other suggestions. I would also encourage you to cite the generic Biopython reference (PubMed ID 19304878).
>
> Best,
> -Michiel.

From mictadlo at gmail.com Mon Jan 16 01:24:28 2012
From: mictadlo at gmail.com (Mic)
Date: Mon, 16 Jan 2012 16:24:28 +1000
Subject: [Biopython] Unique reads
Message-ID:

Hello,
I read in many papers that they made unique reads before the reads were
aligned and later on the SNPs were called. However, I could not find out how
they do it. Which tool can be used to do it?

Thank you in advance.

From mictadlo at gmail.com Mon Jan 16 08:01:56 2012
From: mictadlo at gmail.com (Mic)
Date: Mon, 16 Jan 2012 23:01:56 +1000
Subject: [Biopython] compare sequences
Message-ID:

Hello,
Is there a memory-efficient way to compare sequences like those from NGS?

Thank you in advance.

From p.j.a.cock at googlemail.com Mon Jan 16 08:30:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 16 Jan 2012 13:30:01 +0000
Subject: [Biopython] compare sequences
In-Reply-To:
References:
Message-ID:

On Mon, Jan 16, 2012 at 1:01 PM, Mic wrote:
> Hello,
> Is there a memory-efficient way to compare sequences like those from NGS?
>
> Thank you in advance.

Hi Mic,

Could you stop posting such broad questions to multiple mailing
lists simultaneously please?

Perhaps you would find Biostars Q&A more useful?
http://biostar.stackexchange.com/

See also:
http://dx.doi.org/10.1371/journal.pcbi.1002202

Peter

From mrrizkalla at gmail.com Tue Jan 17 14:43:07 2012
From: mrrizkalla at gmail.com (Mariam Reyad Rizkallah)
Date: Tue, 17 Jan 2012 21:43:07 +0200
Subject: [Biopython] Installation on Ubuntu 11.10 64-bit (Loader.py:799)
Message-ID:

Dear Biopython community,

I am installing Biopython v. 1.58 and BioSQL 1.0.1 on Ubuntu 11.10 64-bit
virtual machine (Python 2.7.2+). When I run tests, I face this warning.

test_BioSQL ...
> ~/biopython-1.58/build/lib.linux-i686-2.7/BioSQL/Loader.py:799:
> UserWarning: order location operators are not fully supported
> % feature.location_operator)
> ok
> test_BioSQL_SeqIO ...
> ~/biopython-1.58/build/lib.linux-i686-2.7/BioSQL/Loader.py:799:
> UserWarning: bond location operators are not fully supported
> % feature.location_operator)

ok

Is it related to the platform (64-bit)?

Thank you.

Mariam

From p.j.a.cock at googlemail.com Wed Jan 18 04:12:10 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 18 Jan 2012 09:12:10 +0000
Subject: [Biopython] Installation on Ubuntu 11.10 64-bit (Loader.py:799)
In-Reply-To:
References:
Message-ID:

On Tuesday, January 17, 2012, Mariam Reyad Rizkallah wrote:
> Dear Biopython community,
>
> I am installing Biopython v. 1.58 and BioSQL 1.0.1 on Ubuntu 11.10 64-bit
> virtual machine (Python 2.7.2+). When I run tests, I face this warning.
>
> test_BioSQL ...
>> ~/biopython-1.58/build/lib.linux-i686-2.7/BioSQL/Loader.py:799:
>> UserWarning: order location operators are not fully supported
>> % feature.location_operator)
>> ok
>> test_BioSQL_SeqIO ...
>> ~/biopython-1.58/build/lib.linux-i686-2.7/BioSQL/Loader.py:799:
>> UserWarning: bond location operators are not fully supported
>> % feature.location_operator)
>
> ok
>
>
> Is it related to the platform (64-bit)?
>

No, it's harmless - an obscure location string used in GenBank files
that we don't store faithfully in BioSQL.
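
If the noise bothers you in your own scripts in the meantime, the standard
library's warnings module can filter it. A rough sketch only, not what our
unit tests do: load_records() below is a made-up stand-in for whatever BioSQL
call triggers the warning, and the message pattern is an assumption based on
the output you pasted.

```python
import warnings


def load_records():
    # Hypothetical stand-in for the real BioSQL loading call.
    warnings.warn("bond location operators are not fully supported", UserWarning)
    warnings.warn("some unrelated problem", UserWarning)
    return "loaded"


def load_quietly():
    # record=True collects whatever still gets through, so it can be inspected.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Added last, so it is checked first: drop only the location operator warning.
        warnings.filterwarnings(
            "ignore",
            message=r"\w+ location operators are not fully supported",
            category=UserWarning,
        )
        result = load_records()
    return result, [str(w.message) for w in caught]


result, remaining = load_quietly()
print(result)
print(remaining)
```

Only the location operator warning is dropped here; anything else still
comes through.
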
Ideally we'd silence that warning in the unit tests...

Peter

From mike.thon at gmail.com Wed Jan 18 05:14:56 2012
From: mike.thon at gmail.com (Michael Thon)
Date: Wed, 18 Jan 2012 11:14:56 +0100
Subject: [Biopython] Is this a valid genbank record?
Message-ID:

Does anyone know if these GenBank records are valid:

http://www.ncbi.nlm.nih.gov/protein/323463153
http://www.ncbi.nlm.nih.gov/protein/93279336

...because biopython raised an exception when trying to parse them. They have weird feature locations:

     Het             join(bond(127),bond(127),bond(130),bond(130),bond(138),
                     bond(138),bond(139),bond(138))

thanks
Mike

From p.j.a.cock at googlemail.com Wed Jan 18 06:03:51 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 18 Jan 2012 11:03:51 +0000
Subject: [Biopython] Is this a valid genbank record?
In-Reply-To:
References:
Message-ID:

On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote:
> Does anyone know if these GenBank records are valid:
>
> http://www.ncbi.nlm.nih.gov/protein/323463153
> http://www.ncbi.nlm.nih.gov/protein/93279336
>
> ...because biopython raised an exception when trying to parse them. They have weird feature locations:
>
>      Het             join(bond(127),bond(127),bond(130),bond(130),bond(138),
>                      bond(138),bond(139),bond(138))
>
>
> thanks
> Mike

See also:
http://www.bioperl.org/wiki/BioPerl_Locations#bond.28location.2Clocation...location.29

The use of "bond" in a feature location isn't described in the official
GenBank/EMBL/DDBJ Feature Table definition, but that is aimed at
nucleotide sequences only. I'm unaware of any official documentation
on GenPept variations.

Being practical we'd better update the parser to cope with it, even
though it does seem to be a rare corner case.

I'd have to go back and check, but I suspect prior to the parser rewrite
back in Biopython 1.55 (released August 2010) we might have allowed
for this.

Are you happy to test an updated parser?

Peter

From mike.thon at gmail.com Wed Jan 18 06:11:52 2012
From: mike.thon at gmail.com (Michael Thon)
Date: Wed, 18 Jan 2012 12:11:52 +0100
Subject: [Biopython] Is this a valid genbank record?
In-Reply-To:
References:
Message-ID: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com>

On Jan 18, 2012, at 12:03 PM, Peter Cock wrote:

> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote:
>> Does anyone know if these GenBank records are valid:
>>
>> http://www.ncbi.nlm.nih.gov/protein/323463153
>> http://www.ncbi.nlm.nih.gov/protein/93279336
>>
>> ...because biopython raised an exception when trying to parse them. They have weird feature locations:
>>
>>      Het             join(bond(127),bond(127),bond(130),bond(130),bond(138),
>>                      bond(138),bond(139),bond(138))
>>
>>
>> thanks
>> Mike
>
> See also: http://www.bioperl.org/wiki/BioPerl_Locations#bond.28location.2Clocation...location.29
>
> The use of "bond" in a feature location isn't described in the official
> GenBank/EMBL/DDBJ Feature Table definition, but that is aimed at
> nucleotide sequences only. I'm unaware of any official documentation
> on GenPept variations.
>
> Being practical we'd better update the parser to cope with it, even
> though it does seem to be a rare corner case.
>
> I'd have to go back and check, but I suspect prior to the parser rewrite
> back in Biopython 1.55 (released August 2010) we might have allowed
> for this.
>
> Are you happy to test an updated parser?
>

I'm happy to volunteer my student to test it :) Just post here when it's ready and we'll try it. I'll have to think about what to do to get a script to use a local installation of biopython instead of the system-installed one. Do I need to mess with PYTHONPATH?

Mike

From p.j.a.cock at googlemail.com Wed Jan 18 09:50:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 18 Jan 2012 14:50:05 +0000
Subject: [Biopython] Is this a valid genbank record?
In-Reply-To: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com>
References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com>
Message-ID:

On Wed, Jan 18, 2012 at 11:11 AM, Michael Thon wrote:
>
> On Jan 18, 2012, at 12:03 PM, Peter Cock wrote:
>
>> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote:
>>> They have weird feature locations:
>>>
>>>      Het             join(bond(127),bond(127),bond(130),bond(130),bond(138),
>>>                      bond(138),bond(139),bond(138))
>>>

Do you actually need to do anything with this feature? If not, then
the pragmatic solution is we issue a warning but otherwise ignore
the feature and continue parsing. I'm struggling to grok exactly
what this location is trying to convey - maybe I should read the
associated paper?

>>
>> Are you happy to test an updated parser?
>>
>
> I'm happy to volunteer my student to test it :) Just post here when its ready
> and we'll try it.

Will do, with more details about how to test it.

Peter

From cjfields at illinois.edu Wed Jan 18 10:24:25 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jan 2012 15:24:25 +0000
Subject: [Biopython] Is this a valid genbank record?
In-Reply-To:
References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com>
Message-ID:

On Jan 18, 2012, at 8:50 AM, Peter Cock wrote:

> On Wed, Jan 18, 2012 at 11:11 AM, Michael Thon wrote:
>>
>> On Jan 18, 2012, at 12:03 PM, Peter Cock wrote:
>>
>>> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote:
>>>> They have weird feature locations:
>>>>
>>>>      Het             join(bond(127),bond(127),bond(130),bond(130),bond(138),
>>>>                      bond(138),bond(139),bond(138))
>>>>
>
> Do you actually need to do anything with this feature? If not, then
> the pragmatic solution is we issue a warning but otherwise ignore
> the feature and continue parsing. I'm struggling to grok exactly
> what this location is trying to convey - maybe I should read the
> associated paper?

GenPept is littered with these.
With bioperl we only attempt to support 'bond' types for round-tripping, but I don't recall whether this has been extensively tested, though it would be easy enough to add this in to see if the location factory will handle this properly (both to and from a location string).

Do wish NCBI would document this more...

chris

From p.j.a.cock at googlemail.com Wed Jan 18 10:34:58 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 18 Jan 2012 15:34:58 +0000
Subject: [Biopython] Is this a valid genbank record?
In-Reply-To:
References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com>
Message-ID:

On Wed, Jan 18, 2012 at 3:24 PM, Fields, Christopher J wrote:
> On Jan 18, 2012, at 8:50 AM, Peter Cock wrote:
>
>> On Wed, Jan 18, 2012 at 11:11 AM, Michael Thon wrote:
>>>
>>> On Jan 18, 2012, at 12:03 PM, Peter Cock wrote:
>>>
>>>> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote:
>>>>> They have weird feature locations:
>>>>>
>>>>>      Het             join(bond(127),bond(127),bond(130),bond(130),bond(138),
>>>>>                      bond(138),bond(139),bond(138))
>>>>>
>>
>> Do you actually need to do anything with this feature? If not, then
>> the pragmatic solution is we issue a warning but otherwise ignore
>> the feature and continue parsing. I'm struggling to grok exactly
>> what this location is trying to convey - maybe I should read the
>> associated paper?
>
> GenPept is littered with these. With bioperl we only attempt to
> support 'bond' types for round-tripping, but I don't recall whether
> this has been extensively tested, though it would be easy enough
> to add this in to see if the location factory will handle this properly
> (both to and from a location string).
>
> Do wish NCBI would document this more...

+1

Any idea why this example is:

join(bond(127),bond(127),bond(130),bond(130),bond(138),
bond(138),bond(139),bond(138))

rather than:

bond(127,127,130,130,138,138,139,138)

or indeed why so many of the residues are bonded more than
once?
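
For what it's worth, pulling the residue numbers out of such a string is
easy enough. A rough sketch using a plain regular expression, not the
Biopython parser; the helper name bond_positions is made up for
illustration, and it only handles the single-residue bond(N) form seen in
this record:

```python
import re
from collections import Counter


def bond_positions(location):
    """Return the residue numbers inside every bond(N) of a location string."""
    return [int(n) for n in re.findall(r"bond\((\d+)\)", location)]


location = ("join(bond(127),bond(127),bond(130),bond(130),bond(138),"
            "bond(138),bond(139),bond(138))")

positions = bond_positions(location)
print(positions)

# Count how often each residue appears, to spot the ones listed repeatedly.
counts = Counter(positions)
print(sorted(counts.items()))
```

On this record the counts show residue 138 listed three times, 127 and 130
twice each, and 139 once.
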

Peter

From cjfields at illinois.edu Wed Jan 18 11:31:12 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jan 2012 16:31:12 +0000
Subject: [Biopython] Is this a valid genbank record?
In-Reply-To:
References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com>
Message-ID: <9FBD9AC1-7F5D-4182-899C-42C028F7D180@illinois.edu>

On Jan 18, 2012, at 9:34 AM, Peter Cock wrote:

> On Wed, Jan 18, 2012 at 3:24 PM, Fields, Christopher J
> wrote:
>> On Jan 18, 2012, at 8:50 AM, Peter Cock wrote:
>>
>>> On Wed, Jan 18, 2012 at 11:11 AM, Michael Thon wrote:
>>>>
>>>> On Jan 18, 2012, at 12:03 PM, Peter Cock wrote:
>>>>
>>>>> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote:
>>>>>> They have weird feature locations:
>>>>>>
>>>>>>      Het             join(bond(127),bond(127),bond(130),bond(130),bond(138),
>>>>>>                      bond(138),bond(139),bond(138))
>>>>>>
>>>
>>> Do you actually need to do anything with this feature? If not, then
>>> the pragmatic solution is we issue a warning but otherwise ignore
>>> the feature and continue parsing. I'm struggling to grok exactly
>>> what this location is trying to convey - maybe I should read the
>>> associated paper?
>>
>> GenPept is littered with these. With bioperl we only attempt to
>> support 'bond' types for round-tripping, but I don't recall whether
>> this has been extensively tested, though it would be easy enough
>> to add this in to see if the location factory will handle this properly
>> (both to and from a location string).
>>
>> Do wish NCBI would document this more...
>
> +1
>
> Any idea why this example is:
>
> join(bond(127),bond(127),bond(130),bond(130),bond(138),
> bond(138),bond(139),bond(138))
>
> rather than:
>
> bond(127,127,130,130,138,138,139,138)
>
> or indeed why so many of the residues are bonded more than
> once?
>
> Peter

No, that one is particularly odd, but there isn't a reason I could see where this couldn't be supported, it's just a join of simple locations.
Seems this is something that may be auto-generated, wouldn't be surprised to see more of these.

As to whether it's a valid GenBank record, well, considering the source of the record is NCBI, I think it's safe to say it's valid.

(though again, this all comes back to how helpful it would be to have documentation re: how bond() is defined within the context of the feature table)

chris

From p.j.a.cock at googlemail.com Wed Jan 18 11:38:23 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 18 Jan 2012 16:38:23 +0000
Subject: [Biopython] Is this a valid genbank record?
In-Reply-To: <9FBD9AC1-7F5D-4182-899C-42C028F7D180@illinois.edu>
References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com> <9FBD9AC1-7F5D-4182-899C-42C028F7D180@illinois.edu>
Message-ID:

On Wed, Jan 18, 2012 at 4:31 PM, Fields, Christopher J wrote:
> On Jan 18, 2012, at 9:34 AM, Peter Cock wrote:
>> Any idea why this example is:
>>
>> join(bond(127),bond(127),bond(130),bond(130),bond(138),
>> bond(138),bond(139),bond(138))
>>
>> rather than:
>>
>> bond(127,127,130,130,138,138,139,138)
>>
>> or indeed why so many of the residues are bonded more than
>> once?
>>
>> Peter
>
> No, that one is particularly odd, but there isn't a reason I could
> see where this couldn't be supported, it's just a join of simple locations.
> Seems this is something that may be auto-generated, wouldn't be
> surprised to see more of these.
>
> As to whether it's a valid GenBank record, well, considering the source
> of the record is NCBI, I think it's safe to say it's valid.
>
> (though again, this all comes back to how helpful it would be to have
> documentation re: how bond() is defined within the context of the
> feature table)

There is precedent for the NCBI publishing GenBank files which didn't
conform to the published specification - but in the absence of any
spec we're basically stuck with guessing what they intend.

I will attempt to enquire...

Peter

From David.Lapointe at umassmed.edu Wed Jan 18 11:13:45 2012
From: David.Lapointe at umassmed.edu (Lapointe, David)
Date: Wed, 18 Jan 2012 11:13:45 -0500
Subject: [Biopython] Is this a valid genbank record?
In-Reply-To:
References:
Message-ID:

Those strange joins are where Ca has a "bond" with an AA in the structure, I would imagine.

     Het             join(bond(127),bond(127),bond(130),bond(130),bond(138),
                     bond(138),bond(139),bond(138))
                     /heterogen="( CA,1000 )"

From dilara.ally at gmail.com Wed Jan 18 19:10:29 2012
From: dilara.ally at gmail.com (Dilara Ally)
Date: Wed, 18 Jan 2012 16:10:29 -0800
Subject: [Biopython] SNP id
Message-ID: <4F175F75.1030803@gmail.com>

Hi All

I was wondering if anyone on this listserv had any recommendation regarding coverage for the identification of SNPs from SOLiD or Illumina data? What is the acceptable coverage amount these days?

Thanks!
Dilara

From bpkth2012 at gmail.com Wed Jan 25 11:14:50 2012
From: bpkth2012 at gmail.com (Sarttu Bourvir)
Date: Wed, 25 Jan 2012 17:14:50 +0100
Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header?
In-Reply-To:
References:
Message-ID:

Hi,

I am new to both python and biopython.
What I'm trying to do is to parse a blast result xml file (myblast.xml), attached here.

The code looks like this:

#!/usr/bin/env python
import sys
import re
import Bio
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Blast import NCBIStandalone

infile = raw_input ("which file:")
result_handle = open(infile,'r')
blast_parser = NCBIStandalone.BlastParser()
blast_iterator = NCBIStandalone.Iterator(result_handle, blast_parser)

blast_record = blast_iterator.next()
for blast_record in blast_iterator:
    E_VALUE_THRESH = 0.01
    for alignment in blast_records.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print '****Alignment****'
                print 'sequence', alignment.title
                print 'alignment length', alignment.length

When I try to run it, I get this:

Traceback (most recent call last):
  File "Blastparsee.py", line 17, in
    blast_record = blast_iterator.next()
  File "/usr/lib/pymodules/python2.6/Bio/Blast/NCBIStandalone.py", line 1645, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/pymodules/python2.6/Bio/Blast/NCBIStandalone.py", line 804, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/pymodules/python2.6/Bio/Blast/NCBIStandalone.py", line 100, in feed
    self._scan_header(uhandle, consumer)
  File "/usr/lib/pymodules/python2.6/Bio/Blast/NCBIStandalone.py", line 225, in _scan_header
    raise ValueError("Invalid header?")
ValueError: Invalid header?

I attached the xml file I've been trying to blast. However, I get the same error if I try using any other xml files from blast.

What's going on?

Thank you!

Cheers,
Sar

P.S. If this type of messages are not allowed on this e-mailing list I apologize and promise to behave in the future!;)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: myblast.xml
Type: text/xml
Size: 586125 bytes
Desc: not available
URL:

From chapmanb at 50mail.com Wed Jan 25 11:57:34 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 25 Jan 2012 11:57:34 -0500
Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header?
In-Reply-To:
References:
Message-ID: <87r4ynu3cx.fsf@fastmail.fm>

Sar;

> I am new to both python and biopython.

Welcome. Thanks for including your code along with the problem report.

> What I'm trying to do is to parse a blast result xml file (myblast.xml),
> attached here.
>
> The code looks like this:
[...]
> blast_parser = NCBIStandalone.BlastParser()
[...]
> ValueError: Invalid header?

You are using NCBIStandalone, which parses plain text blast output. To parse the XML output, you should use the NCBIXML parser:

from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)

The tutorial has more details and examples:

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc87

Hope this helps,
Brad

From kristian_ullrich at yahoo.de Mon Jan 30 05:09:09 2012
From: kristian_ullrich at yahoo.de (Kristian Ullrich)
Date: Mon, 30 Jan 2012 10:09:09 +0000 (GMT)
Subject: [Biopython] (no subject)
Message-ID: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com>

Hello Biopython Team,

I am going to work with biological networks, next to the great tool cytoscape I want to create networks out of python. To my knowledge there exists pygraphviz and networkx which can produce gml and dot output files. Cytoscape uses the XGMML Language (http://www.cs.rpi.edu/research/groups/pb/punin/public_html/XGMML/).

Is there an easy way of how to use or manipulate existing python xml parsers to work with xgmml files or are there plans of the Biopython Team to write an XGMML - parser?

Since a lot of biologist work with cytoscape this would be a very useful tool to build graphical netowrks.
Thank you in anticipation Kristian Ullrich From p.j.a.cock at googlemail.com Mon Jan 30 06:14:37 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 30 Jan 2012 11:14:37 +0000 Subject: [Biopython] (no subject) In-Reply-To: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> References: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> Message-ID: On Mon, Jan 30, 2012 at 10:09 AM, Kristian Ullrich wrote: > Hello Biopython Team, > > I am going to work with biological networks, next to the great tool cytoscape > I want to create networks out of python. To my knowledge there exists > pygraphviz and networkx which can produce gml and dot output files. > Cytoscape uses the XGMML Language > (http://www.cs.rpi.edu/research/groups/pb/punin/public_html/XGMML/). > > Is there an easy way of how to use or manipulate existing python xml > parsers to work with xgmml files or are there plans of the Biopython > Team to write an XGMML - parser? > > Since a lot of biologist work with cytoscape this would be a very useful > tool to build graphical netowrks. > > Thank you in anticipation > > Kristian Ullrich Hi, XGMML looks much more general than 'just' biology, so I wonder if XGMML support in NetworkX would a better idea? I couldn't find an open issue on this on their tracker, but certainly they sounded positive about this back in 2009, http://groups.google.com/group/networkx-discuss/browse_thread/thread/fb30306e43414c74 Peter P.S. 
There are several Python libraries for using dot files and GraphViz (which is an excellent library), I don't recall why but last time I needed to do this I used pydot - http://code.google.com/p/pydot/ From Leighton.Pritchard at hutton.ac.uk Mon Jan 30 06:55:46 2012 From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard) Date: Mon, 30 Jan 2012 11:55:46 +0000 Subject: [Biopython] Representing biological networks was: Re: (no subject) In-Reply-To: References: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> Message-ID: Hi, I use Cytoscape and NetworkX, which cope perfectly well - as far as I've pushed them - with GML as a language for data transfer (you might want to get into the habit of using the relabel=True option when using read_gml(), though ;) ). However, for biological networks, you might also want to look at SBML which, unlike XGMML, is actively maintained and biology-specific, to see if it meets your needs. Cytoscape handles this format directly, but coverage is still lacking in NetworkX (which would be the appropriate place for this, rather than Biopython, in my opinion). There is a Python API to libSBML, though. http://sbml.org/Main_Page http://sbml.org/Software/libSBML/docs/python-api/ http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats https://networkx.lanl.gov/trac/ticket/325 Cheers, L. On 30 Jan 2012, at Monday, January 30, 11:14, Peter Cock wrote: On Mon, Jan 30, 2012 at 10:09 AM, Kristian Ullrich > wrote: Hello Biopython Team, I am going to work with biological networks, next to the great tool cytoscape I want to create networks out of python. To my knowledge there exists pygraphviz and networkx which can produce gml and dot output files. Cytoscape uses the XGMML Language (http://www.cs.rpi.edu/research/groups/pb/punin/public_html/XGMML/). Is there an easy way of how to use or manipulate existing python xml parsers to work with xgmml files or are there plans of the Biopython Team to write an XGMML - parser? 
Since a lot of biologist work with cytoscape this would be a very useful tool to build graphical netowrks. Thank you in anticipation Kristian Ullrich Hi, XGMML looks much more general than 'just' biology, so I wonder if XGMML support in NetworkX would a better idea? I couldn't find an open issue on this on their tracker, but certainly they sounded positive about this back in 2009, http://groups.google.com/group/networkx-discuss/browse_thread/thread/fb30306e43414c74 Peter P.S. There are several Python libraries for using dot files and GraphViz (which is an excellent library), I don't recall why but last time I needed to do this I used pydot - http://code.google.com/p/pydot/ _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython -- Dr Leighton Pritchard MRSC DG31, Plant Pathology Programme, James Hutton Institute (Dundee) Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:leighton.pritchard at hutton.ac.uk w:http://www.hutton.ac.uk/staff/leighton-pritchard gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827 ________________________________________________________ This email is from The James Hutton Institute (JHI), however the views expressed by the sender are not necessarily the views of JHI and its subsidiaries. This email and any attachments are confidential and are intended solely for the use of the recipient(s) to whom they are addressed. If you are not the intended recipient, you should not read, copy, disclose or rely on any information contained in this email, and we would ask you to contact the sender immediately and delete the email from your system. Although JHI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and any attachments. 
The James Hutton Institute is a Scottish charitable company limited by guarantee. Registered in Edinburgh No. SC374831 Registered Office: The James Hutton Institute, Invergowrie Dundee DD2 5DA. Charity No. SC041796 From mmokrejs at fold.natur.cuni.cz Mon Jan 30 07:27:16 2012 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Mon, 30 Jan 2012 13:27:16 +0100 Subject: [Biopython] (no subject) In-Reply-To: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> References: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> Message-ID: <4F268CA4.2080801@fold.natur.cuni.cz> Hi, I do not know if this help you but there is a biopython's branch using networkx to parse GO files: https://github.com/ntamas/biopython Martin Kristian Ullrich wrote: > Hello Biopython Team, > > I am going to work with biological networks, next to the great tool cytoscape I want to create networks out of python. To my knowledge there exists pygraphviz and networkx which can produce gml and dot output files. Cytoscape uses the XGMML Language (http://www.cs.rpi.edu/research/groups/pb/punin/public_html/XGMML/). > > Is there an easy way of how to use or manipulate existing python xml parsers to work with xgmml files or are there plans of the Biopython Team to write an XGMML - parser? > > Since a lot of biologist work with cytoscape this would be a very useful tool to build graphical netowrks. 
> > Thank you in anticipation > > Kristian Ullrich > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > From sweta.dubey31 at gmail.com Tue Jan 31 02:00:33 2012 From: sweta.dubey31 at gmail.com (shweta dubey) Date: Tue, 31 Jan 2012 12:30:33 +0530 Subject: [Biopython] regarding retrieving antigen information of specific gene using Biopython Message-ID: hello everyone, I am new to Biopython.I have a set of genes and i want information of antigens specific to these genes from a database(suppose, Antigen Database). How can i do the same using Biopython?? Thanks in advance Shweta Dubey From mictadlo at gmail.com Tue Jan 3 02:05:48 2012 From: mictadlo at gmail.com (Mic) Date: Tue, 3 Jan 2012 12:05:48 +1000 Subject: [Biopython] subprocess.Popen problem In-Reply-To: References: Message-ID: With the following code: if __name__ == '__main__': cmd_soap = 'soap ...' proc = subprocess.Popen(cmd_soap, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE) returncode = proc.wait() print "returncoded", returncode stdout_value, stderr_value = proc.communicate() print 'stderr ', stderr_value for line in stderr_value: print "!", line if returncode == 1: sys.exit(1) I get this: ! B ! e ! g ! i ! n ! ! P ! r ! o ! g ! r ! a ! m ! ! S ! O ! A ! P ! a ! l ! i ! g ! n ! e ! r ! / ! s ! o ! a ! p ! 2 ! ... instead of : Begin Program SOAPaligner/soap2 ... What did I wrong? On Thu, Nov 3, 2011 at 7:31 PM, Peter Cock wrote: > On Thu, Nov 3, 2011 at 3:16 AM, Mic wrote: > > Thank you, I wrote the following code and not sure whether it is what did > > write me. > > Depending on the tool I would check for a non-zero return code rather > than just treating 1 as an error. > > You are also not collecting stderr/stdout correctly. If you send them > to a pipe, the strings from the .communicate will be empty. Rather > reads from the process object's .stdout and .stderr handles. 
See:
> http://docs.python.org/library/subprocess.html
>
> Peter
>

From bpederse at gmail.com Tue Jan 3 03:01:23 2012
From: bpederse at gmail.com (Brent Pedersen)
Date: Mon, 2 Jan 2012 20:01:23 -0700
Subject: [Biopython] subprocess.Popen problem
In-Reply-To:
References:
Message-ID:

On Mon, Jan 2, 2012 at 7:05 PM, Mic wrote:
> With the following code:
> if __name__ == '__main__':
>     cmd_soap = 'soap ...'
>     proc = subprocess.Popen(cmd_soap, shell=True,
>                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>     returncode = proc.wait()
>     print "returncoded", returncode
>     stdout_value, stderr_value = proc.communicate()
>     print 'stderr ', stderr_value
>     for line in stderr_value:
>         print "!", line
>     if returncode == 1:
>         sys.exit(1)
>
> I get this:
>
> ! B
> ! e
> ! g
> ! i
> ! n
> !
> ! P
> ! r
> ! o
> ! g
> ! r
> ! a
> ! m
> !
> ! S
> ! O
> ! A
> ! P
> ! a
> ! l
> ! i
> ! g
> ! n
> ! e
> ! r
> ! /
> ! s
> ! o
> ! a
> ! p
> ! 2
> !
> ...
>
> instead of :
> Begin Program SOAPaligner/soap2
> ...
>
>
> What did I wrong?
>

stderr_value is a string. you are iterating over the letters.
you could do:

for line in stderr_value.split("\n"):
    print "!", line

also, instead of using proc.wait/communicate(), you could do:

for line in proc.stderr:
    print "!", line
proc.wait()  # returncode stays None until the process has been waited on
print proc.returncode

then you can see the output as it's generated (after buffering).

>
> On Thu, Nov 3, 2011 at 7:31 PM, Peter Cock wrote:
>
>> On Thu, Nov 3, 2011 at 3:16 AM, Mic wrote:
>> > Thank you, I wrote the following code and not sure whether it is what did
>> > write me.
>>
>> Depending on the tool I would check for a non-zero return code rather
>> than just treating 1 as an error.
>>
>> You are also not collecting stderr/stdout correctly. If you send them
>> to a pipe, the strings from the .communicate will be empty. Rather
>> reads from the process object's .stdout and .stderr handles.
See: >> http://docs.python.org/library/subprocess.html >> >> Peter >> > _______________________________________________ > Biopython mailing list ?- ?Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From David.Lapointe at umassmed.edu Fri Jan 6 19:43:45 2012 From: David.Lapointe at umassmed.edu (Lapointe, David) Date: Fri, 6 Jan 2012 14:43:45 -0500 Subject: [Biopython] Bio Motif Message-ID: Is there a reference to the algorithm behind this module? David -- David Lapointe, Ph.D. Director Scientific Computing/Information Services University of Massachusetts Medical School 55 Lake Avenue N Worcester MA 01655 508-856-5141 (v) ' the lyf so short, the craft so long to lerne' From mjldehoon at yahoo.com Sat Jan 7 03:49:30 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 6 Jan 2012 19:49:30 -0800 (PST) Subject: [Biopython] Bio Motif In-Reply-To: Message-ID: <1325908170.23026.YahooMailClassic@web161205.mail.bf1.yahoo.com> Which part of Bio.Motif are you using? Bio.Motif has various capabilities (parsing, PWM score calculation, threshold calculation), and different references may be applicable depending on how you are using Bio.Motif. -Michiel. --- On Fri, 1/6/12, Lapointe, David wrote: > From: Lapointe, David > Subject: [Biopython] Bio Motif > To: "biopython at lists.open-bio.org" > Date: Friday, January 6, 2012, 2:43 PM > Is there a reference to the algorithm > behind this module? > > David > > -- > David Lapointe, Ph.D. > Director Scientific Computing/Information Services > University of Massachusetts Medical School > 55 Lake Avenue N > Worcester MA 01655 > 508-856-5141 (v) > ' the lyf so short, the craft so long to lerne' > > > _______________________________________________ > Biopython mailing list? -? 
Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From David.Lapointe at umassmed.edu Sat Jan 7 22:39:02 2012 From: David.Lapointe at umassmed.edu (Lapointe, David) Date: Sat, 7 Jan 2012 17:39:02 -0500 Subject: [Biopython] Bio Motif In-Reply-To: <1325948070.60902.YahooMailClassic@web161206.mail.bf1.yahoo.com> References: , <1325948070.60902.YahooMailClassic@web161206.mail.bf1.yahoo.com> Message-ID: Thanks Michiel, I was assuming the replies went back to the list. Sorry. ________________________________________ From: Michiel de Hoon [mjldehoon at yahoo.com] Sent: Saturday, January 07, 2012 9:54 AM To: Lapointe, David Subject: RE: [Biopython] Bio Motif Hi David, It's better to reply to the list rather than to individual people. As a case in point, whereas I wrote some parts of Bio.Motif, I did not write the parts that you are using. My suggestion would be to cite "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids" by Richard Durbin et al. as a general reference for these kinds of analysis, but the actual author of the relevant parts of Bio.Motif may have other suggestions. I would also encourage you to cite the generic Biopython reference (PubMed ID 19304878). Best, -Michiel. --- On Sat, 1/7/12, Lapointe, David wrote: > From: Lapointe, David > Subject: RE: [Biopython] Bio Motif > To: "Michiel de Hoon" > Date: Saturday, January 7, 2012, 9:05 AM > Yes I should have been more specific. > Not the third party parts ( MEME, etc). I am > going from a list of sequences, creating a > PWM, then using the generated PWM to parse longer sequences > for motifs. Mostly I am interested in information about the > thresholds and interpreting results from search_pwm, but > also some way to cite the method for publication. 
> > David > ________________________________________ > From: Michiel de Hoon [mjldehoon at yahoo.com] > Sent: Friday, January 06, 2012 10:49 PM > To: biopython at lists.open-bio.org; > Lapointe, David > Subject: Re: [Biopython] Bio Motif > > Which part of Bio.Motif are you using? Bio.Motif has > various capabilities (parsing, PWM score calculation, > threshold calculation), and different references may be > applicable depending on how you are using Bio.Motif. > > -Michiel. > > --- On Fri, 1/6/12, Lapointe, David > wrote: > > > From: Lapointe, David > > Subject: [Biopython] Bio Motif > > To: "biopython at lists.open-bio.org" > > > Date: Friday, January 6, 2012, 2:43 PM > > Is there a reference to the algorithm > > behind this module? > > > > David > > > > -- > > David Lapointe, Ph.D. > > Director Scientific Computing/Information Services > > University of Massachusetts Medical School > > 55 Lake Avenue N > > Worcester MA 01655 > > 508-856-5141 (v) > > ' the lyf so short, the craft so long to lerne' > > > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > > From bartek at rezolwenta.eu.org Sun Jan 8 00:13:16 2012 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sun, 8 Jan 2012 01:13:16 +0100 Subject: [Biopython] Bio Motif In-Reply-To: References: <1325948070.60902.YahooMailClassic@web161206.mail.bf1.yahoo.com> Message-ID: Hi David, The references given by Michiel are fine. If you are using the specific methods for choosing the thresholds based on the expected false positive/negative rates you could cite the paper by Svedn Rahmann et al. (pubmed id 16646785). best Bartek On Sat, Jan 7, 2012 at 11:39 PM, Lapointe, David wrote: > Thanks Michiel, > > I was assuming the replies went back to the list. Sorry. 
> ________________________________________ > From: Michiel de Hoon [mjldehoon at yahoo.com] > Sent: Saturday, January 07, 2012 9:54 AM > To: Lapointe, David > Subject: RE: [Biopython] Bio Motif > > Hi David, > > It's better to reply to the list rather than to individual people. As a case in point, whereas I wrote some parts of Bio.Motif, I did not write the parts that you are using. My suggestion would be to cite "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids" by Richard Durbin et al. as a general reference for these kinds of analysis, but the actual author of the relevant parts of Bio.Motif may have other suggestions. I would also encourage you to cite the generic Biopython reference (PubMed ID 19304878). > > Best, > -Michiel. > > --- On Sat, 1/7/12, Lapointe, David wrote: > >> From: Lapointe, David >> Subject: RE: [Biopython] Bio Motif >> To: "Michiel de Hoon" >> Date: Saturday, January 7, 2012, 9:05 AM >> Yes I should have been more specific. >> Not the third party parts ( MEME, etc). I am >> going ? from a list of sequences, creating a >> PWM, then using the generated PWM to parse longer sequences >> for motifs. Mostly I am interested in information about the >> thresholds and interpreting results from search_pwm, but >> also some way to cite the method for publication. >> >> David >> ________________________________________ >> From: Michiel de Hoon [mjldehoon at yahoo.com] >> Sent: Friday, January 06, 2012 10:49 PM >> To: biopython at lists.open-bio.org; >> Lapointe, David >> Subject: Re: [Biopython] Bio Motif >> >> Which part of Bio.Motif are you using? Bio.Motif has >> various capabilities (parsing, PWM score calculation, >> threshold calculation), and different references may be >> applicable depending on how you are using Bio.Motif. >> >> -Michiel. 
>> >> --- On Fri, 1/6/12, Lapointe, David >> wrote: >> >> > From: Lapointe, David >> > Subject: [Biopython] Bio Motif >> > To: "biopython at lists.open-bio.org" >> >> > Date: Friday, January 6, 2012, 2:43 PM >> > Is there a reference to the algorithm >> > behind this module? >> > >> > David >> > >> > -- >> > David Lapointe, Ph.D. >> > Director Scientific Computing/Information Services >> > University of Massachusetts Medical School >> > 55 Lake Avenue N >> > Worcester MA 01655 >> > 508-856-5141 (v) >> > ' the lyf so short, the craft so long to lerne' >> > >> > >> > _______________________________________________ >> > Biopython mailing list ?- ?Biopython at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython >> > >> >> > > > _______________________________________________ > Biopython mailing list ?- ?Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Bartek Wilczynski From David.Lapointe at umassmed.edu Sun Jan 8 01:00:16 2012 From: David.Lapointe at umassmed.edu (Lapointe, David) Date: Sat, 7 Jan 2012 20:00:16 -0500 Subject: [Biopython] Bio Motif In-Reply-To: References: <1325948070.60902.YahooMailClassic@web161206.mail.bf1.yahoo.com> , Message-ID: Thanks! ________________________________________ From: biopython-bounces at lists.open-bio.org [biopython-bounces at lists.open-bio.org] On Behalf Of Bartek Wilczynski [bartek at rezolwenta.eu.org] Sent: Saturday, January 07, 2012 7:13 PM To: Lapointe, David Cc: biopython at lists.open-bio.org Subject: Re: [Biopython] Bio Motif Hi David, The references given by Michiel are fine. If you are using the specific methods for choosing the thresholds based on the expected false positive/negative rates you could cite the paper by Svedn Rahmann et al. (pubmed id 16646785). best Bartek On Sat, Jan 7, 2012 at 11:39 PM, Lapointe, David wrote: > Thanks Michiel, > > I was assuming the replies went back to the list. Sorry. 
> ________________________________________ > From: Michiel de Hoon [mjldehoon at yahoo.com] > Sent: Saturday, January 07, 2012 9:54 AM > To: Lapointe, David > Subject: RE: [Biopython] Bio Motif > > Hi David, > > It's better to reply to the list rather than to individual people. As a case in point, whereas I wrote some parts of Bio.Motif, I did not write the parts that you are using. My suggestion would be to cite "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids" by Richard Durbin et al. as a general reference for these kinds of analysis, but the actual author of the relevant parts of Bio.Motif may have other suggestions. I would also encourage you to cite the generic Biopython reference (PubMed ID 19304878). > > Best, > -Michiel. > > --- On Sat, 1/7/12, Lapointe, David wrote: > >> From: Lapointe, David >> Subject: RE: [Biopython] Bio Motif >> To: "Michiel de Hoon" >> Date: Saturday, January 7, 2012, 9:05 AM >> Yes I should have been more specific. >> Not the third party parts ( MEME, etc). I am >> going from a list of sequences, creating a >> PWM, then using the generated PWM to parse longer sequences >> for motifs. Mostly I am interested in information about the >> thresholds and interpreting results from search_pwm, but >> also some way to cite the method for publication. >> >> David >> ________________________________________ >> From: Michiel de Hoon [mjldehoon at yahoo.com] >> Sent: Friday, January 06, 2012 10:49 PM >> To: biopython at lists.open-bio.org; >> Lapointe, David >> Subject: Re: [Biopython] Bio Motif >> >> Which part of Bio.Motif are you using? Bio.Motif has >> various capabilities (parsing, PWM score calculation, >> threshold calculation), and different references may be >> applicable depending on how you are using Bio.Motif. >> >> -Michiel. 
>> >> --- On Fri, 1/6/12, Lapointe, David >> wrote: >> >> > From: Lapointe, David >> > Subject: [Biopython] Bio Motif >> > To: "biopython at lists.open-bio.org" >> >> > Date: Friday, January 6, 2012, 2:43 PM >> > Is there a reference to the algorithm >> > behind this module? >> > >> > David >> > >> > -- >> > David Lapointe, Ph.D. >> > Director Scientific Computing/Information Services >> > University of Massachusetts Medical School >> > 55 Lake Avenue N >> > Worcester MA 01655 >> > 508-856-5141 (v) >> > ' the lyf so short, the craft so long to lerne' >> > >> > >> > _______________________________________________ >> > Biopython mailing list - Biopython at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython >> > >> >> > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Bartek Wilczynski _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From mictadlo at gmail.com Mon Jan 16 06:24:28 2012 From: mictadlo at gmail.com (Mic) Date: Mon, 16 Jan 2012 16:24:28 +1000 Subject: [Biopython] Unique reads Message-ID: Hello, I read in many papers that they made unique reads before the reads were align and later on the SNPs were called. However, I could not find out how they do it. Which tool can be used to do it? Thank you in advance. From mictadlo at gmail.com Mon Jan 16 13:01:56 2012 From: mictadlo at gmail.com (Mic) Date: Mon, 16 Jan 2012 23:01:56 +1000 Subject: [Biopython] compare sequences Message-ID: Hello, Is there anyway a memory efficient way to compare sequences like from NGS? Thank you in advance. 
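
[Archive note] The two questions above (making reads unique, and comparing reads without running out of memory) are very open-ended. As one possible reading, assuming "unique reads" means collapsing reads whose sequences are exactly identical, here is a minimal sketch. It is not necessarily what the papers Mic mentions did. The function names are made up for illustration, and the tiny reader assumes plain four-line FASTQ records with no line wrapping. Only a 16-byte digest per distinct sequence is kept in memory, and the reads themselves are streamed:

```python
import hashlib
import io


def read_fastq_sequences(handle):
    """Yield (title, sequence) pairs from a four-line-per-record FASTQ handle."""
    while True:
        title = handle.readline().rstrip("\n")
        if not title:
            break  # end of file (or a blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored in this sketch
        yield title[1:], sequence  # drop the leading '@'


def unique_reads(handle):
    """Yield the first read seen for each distinct sequence (hypothetical helper)."""
    seen = set()
    for title, sequence in read_fastq_sequences(handle):
        # Store a short fixed-size digest rather than the sequence itself.
        digest = hashlib.blake2b(sequence.encode("ascii"), digest_size=16).digest()
        if digest not in seen:
            seen.add(digest)
            yield title, sequence


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\nIIII\n")
    for title, sequence in unique_reads(demo):
        print(title, sequence)
```

With Biopython installed, FastqGeneralIterator from Bio.SeqIO.QualityIO could stand in for the small reader. The same set of digests can also be used to compare two files: build it from one file, then stream the other and test membership.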
From p.j.a.cock at googlemail.com Mon Jan 16 13:30:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 16 Jan 2012 13:30:01 +0000
Subject: [Biopython] compare sequences
In-Reply-To:
References:
Message-ID:

On Mon, Jan 16, 2012 at 1:01 PM, Mic wrote:
> Hello,
> Is there anyway a memory efficient way to compare sequences like from NGS?
>
> Thank you in advance.

Hi Mic,

Could you stop posting such broad questions to multiple mailing lists simultaneously please?

Perhaps you would find Biostars Q&A more useful?
http://biostar.stackexchange.com/

See also:
http://dx.doi.org/10.1371/journal.pcbi.1002202

Peter

From mrrizkalla at gmail.com Tue Jan 17 19:43:07 2012
From: mrrizkalla at gmail.com (Mariam Reyad Rizkallah)
Date: Tue, 17 Jan 2012 21:43:07 +0200
Subject: [Biopython] Installation on Ubuntu 11.10 64-bit (Loader.py:799)
Message-ID:

Dear Biopython community,

I am installing Biopython v. 1.58 and BioSQL 1.0.1 on Ubuntu 11.10 64-bit virtual machine (Python 2.7.2+). When I run tests, I face this warning.

test_BioSQL ...
> ~/biopython-1.58/build/lib.linux-i686-2.7/BioSQL/Loader.py:799:
> UserWarning: order location operators are not fully supported
> % feature.location_operator)
> ok
> test_BioSQL_SeqIO ...
> ~/biopython-1.58/build/lib.linux-i686-2.7/BioSQL/Loader.py:799:
> UserWarning: bond location operators are not fully supported
> % feature.location_operator)

ok

Is it related to the platform (64-bit)?

Thank you.

Mariam

From p.j.a.cock at googlemail.com Wed Jan 18 09:12:10 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 18 Jan 2012 09:12:10 +0000
Subject: [Biopython] Installation on Ubuntu 11.10 64-bit (Loader.py:799)
In-Reply-To:
References:
Message-ID:

On Tuesday, January 17, 2012, Mariam Reyad Rizkallah wrote:
> Dear Biopython community,
>
> I am installing Biopython v. 1.58 and BioSQL 1.0.1 on Ubuntu 11.10 64-bit
> virtual machine (Python 2.7.2+). When I run tests, I face this warning.
>
> test_BioSQL ...
>> ~/biopython-1.58/build/lib.linux-i686-2.7/BioSQL/Loader.py:799: >> UserWarning: order location operators are not fully supported >> % feature.location_operator) >> ok >> test_BioSQL_SeqIO ... >> ~/biopython-1.58/build/lib.linux-i686-2.7/BioSQL/Loader.py:799: >> UserWarning: bond location operators are not fully supported >> % feature.location_operator) > > ok > > > Is it related to the platform (64-bit)? > No, it's harmless - an obscure location string used in GenBank files that we don't store faithfully in BioSQL. Ideally we'd silence that warning in the unit tests... Peter From mike.thon at gmail.com Wed Jan 18 10:14:56 2012 From: mike.thon at gmail.com (Michael Thon) Date: Wed, 18 Jan 2012 11:14:56 +0100 Subject: [Biopython] Is this a valid genbank record? Message-ID: Does anyone know if these GenBank records are valid: http://www.ncbi.nlm.nih.gov/protein/323463153 http://www.ncbi.nlm.nih.gov/protein/93279336 ...because biopython raised an exception when trying to parse them. They have weird feature locations: Het join(bond(127),bond(127),bond(130),bond(130),bond(138), bond(138),bond(139),bond(138)) thanks Mike From p.j.a.cock at googlemail.com Wed Jan 18 11:03:51 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 18 Jan 2012 11:03:51 +0000 Subject: [Biopython] Is this a valid genbank record? In-Reply-To: References: Message-ID: On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote: > Does anyone know if these GenBank records are valid: > > http://www.ncbi.nlm.nih.gov/protein/323463153 > http://www.ncbi.nlm.nih.gov/protein/93279336 > > ...because biopython raised an exception when trying to parse them. ?They have weird feature locations: > > ? ? Het ? ? ? ? ? ? join(bond(127),bond(127),bond(130),bond(130),bond(138), > ? ? ? ? ? ? ? ? ? ? 
bond(138),bond(139),bond(138)) > > > thanks > Mike See also: http://www.bioperl.org/wiki/BioPerl_Locations#bond.28location.2Clocation...location.29 The use of "bond" in a feature location isn't described in the official GenBank/EMBL/DDBJ Feature Table definition, but that is aimed at nucleotide sequences only. I'm unaware of an official documentation on GenPept variations. Being practical we'd better update the parser to cope with it, even though it does seem to be a rare corner case. I'd have to go back and check, but I suspect prior to the parser rewrite back in Biopython 1.55 (released August 2011) we might have allowed for this. Are you happy to test an updated parser? Peter From mike.thon at gmail.com Wed Jan 18 11:11:52 2012 From: mike.thon at gmail.com (Michael Thon) Date: Wed, 18 Jan 2012 12:11:52 +0100 Subject: [Biopython] Is this a valid genbank record? In-Reply-To: References: Message-ID: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com> On Jan 18, 2012, at 12:03 PM, Peter Cock wrote: > On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote: >> Does anyone know if these GenBank records are valid: >> >> http://www.ncbi.nlm.nih.gov/protein/323463153 >> http://www.ncbi.nlm.nih.gov/protein/93279336 >> >> ...because biopython raised an exception when trying to parse them. They have weird feature locations: >> >> Het join(bond(127),bond(127),bond(130),bond(130),bond(138), >> bond(138),bond(139),bond(138)) >> >> >> thanks >> Mike > > See also: http://www.bioperl.org/wiki/BioPerl_Locations#bond.28location.2Clocation...location.29 > > The use of "bond" in a feature location isn't described in the official > GenBank/EMBL/DDBJ Feature Table definition, but that is aimed at > nucleotide sequences only. I'm unaware of an official documentation > on GenPept variations. > > Being practical we'd better update the parser to cope with it, even > though it does seem to be a rare corner case. 
> > I'd have to go back and check, but I suspect prior to the parser rewrite > back in Biopython 1.55 (released August 2011) we might have allowed > for this. > > Are you happy to test an updated parser? > I'm happy to volunteer my student to test it :) Just post here when it's ready and we'll try it. I'll have to think about what to do to get a script to use a local installation of biopython instead of the system-installed one. Do I need to mess with PYTHONPATH? Mike From p.j.a.cock at googlemail.com Wed Jan 18 14:50:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 18 Jan 2012 14:50:05 +0000 Subject: [Biopython] Is this a valid genbank record? In-Reply-To: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com> References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com> Message-ID: On Wed, Jan 18, 2012 at 11:11 AM, Michael Thon wrote: > > On Jan 18, 2012, at 12:03 PM, Peter Cock wrote: > >> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote: >>> They have weird feature locations: >>> >>> Het join(bond(127),bond(127),bond(130),bond(130),bond(138), >>> bond(138),bond(139),bond(138)) >>> Do you actually need to do anything with this feature? If not, then the pragmatic solution is we issue a warning but otherwise ignore the feature and continue parsing. I'm struggling to grok exactly what this location is trying to convey - maybe I should read the associated paper? >> >> Are you happy to test an updated parser? >> > > I'm happy to volunteer my student to test it :) Just post here when it's ready > and we'll try it. Will do, with more details about how to test it. Peter From cjfields at illinois.edu Wed Jan 18 15:24:25 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 18 Jan 2012 15:24:25 +0000 Subject: [Biopython] Is this a valid genbank record?
In-Reply-To: References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com> Message-ID: On Jan 18, 2012, at 8:50 AM, Peter Cock wrote: > On Wed, Jan 18, 2012 at 11:11 AM, Michael Thon wrote: >> >> On Jan 18, 2012, at 12:03 PM, Peter Cock wrote: >> >>> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote: >>>> They have weird feature locations: >>>> >>>> Het join(bond(127),bond(127),bond(130),bond(130),bond(138), >>>> bond(138),bond(139),bond(138)) >>>> > > Do you actually need to do anything with this feature? If not, then > the pragmatic solution is we issue a warning but otherwise ignore > the feature and continue parsing. I'm struggling to grok exactly > what this location is trying to convey - maybe I should read the > associated paper? GenPept is littered with these. With bioperl we only attempt to support 'bond' types for round-tripping, but I don't recall whether this has been extensively tested, though it would be easy enough to add this in to see if the location factory will handle this properly (both to and from a location string). Do wish NCBI would document this more... chris From p.j.a.cock at googlemail.com Wed Jan 18 15:34:58 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 18 Jan 2012 15:34:58 +0000 Subject: [Biopython] Is this a valid genbank record? In-Reply-To: References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com> Message-ID: On Wed, Jan 18, 2012 at 3:24 PM, Fields, Christopher J wrote: > On Jan 18, 2012, at 8:50 AM, Peter Cock wrote: > >> On Wed, Jan 18, 2012 at 11:11 AM, Michael Thon wrote: >>> >>> On Jan 18, 2012, at 12:03 PM, Peter Cock wrote: >>> >>>> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote: >>>>> They have weird feature locations: >>>>> >>>>> Het join(bond(127),bond(127),bond(130),bond(130),bond(138), >>>>> bond(138),bond(139),bond(138)) >>>>> >> >> Do you actually need to do anything with this feature?
If not, then >> the pragmatic solution is we issue a warning but otherwise ignore >> the feature and continue parsing. I'm struggling to grok exactly >> what this location is trying to convey - maybe I should read the >> associated paper? > > GenPept is littered with these. With bioperl we only attempt to > support 'bond' types for round-tripping, but I don't recall whether > this has been extensively tested, though it would be easy enough > to add this in to see if the location factory will handle this properly > (both to and from a location string). > > Do wish NCBI would document this more... +1 Any idea why this example is: join(bond(127),bond(127),bond(130),bond(130),bond(138), bond(138),bond(139),bond(138)) rather than: bond(127,127,130,130,138,138,139,138) or indeed why so many of the residues are bonded more than once? Peter From cjfields at illinois.edu Wed Jan 18 16:31:12 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 18 Jan 2012 16:31:12 +0000 Subject: [Biopython] Is this a valid genbank record? In-Reply-To: References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com> Message-ID: <9FBD9AC1-7F5D-4182-899C-42C028F7D180@illinois.edu> On Jan 18, 2012, at 9:34 AM, Peter Cock wrote: > On Wed, Jan 18, 2012 at 3:24 PM, Fields, Christopher J > wrote: >> On Jan 18, 2012, at 8:50 AM, Peter Cock wrote: >> >>> On Wed, Jan 18, 2012 at 11:11 AM, Michael Thon wrote: >>>> >>>> On Jan 18, 2012, at 12:03 PM, Peter Cock wrote: >>>> >>>>> On Wed, Jan 18, 2012 at 10:14 AM, Michael Thon wrote: >>>>>> They have weird feature locations: >>>>>> >>>>>> Het join(bond(127),bond(127),bond(130),bond(130),bond(138), >>>>>> bond(138),bond(139),bond(138)) >>>>>> >>> >>> Do you actually need to do anything with this feature? If not, then >>> the pragmatic solution is we issue a warning but otherwise ignore >>> the feature and continue parsing. I'm struggling to grok exactly >>> what this location is trying to convey - maybe I should read the >>> associated paper?
>> >> GenPept is littered with these. With bioperl we only attempt to >> support 'bond' types for round-tripping, but I don't recall whether >> this has been extensively tested, though it would be easy enough >> to add this in to see if the location factory will handle this properly >> (both to and from a location string). >> >> Do wish NCBI would document this more... > > +1 > > Any idea why this example is: > > join(bond(127),bond(127),bond(130),bond(130),bond(138), > bond(138),bond(139),bond(138)) > > rather than: > > bond(127,127,130,130,138,138,139,138) > > or indeed why so many of the residues are bonded more than > once? > > Peter No, that one is particularly odd, but there isn't a reason I could see where this couldn't be supported, it's just a join of simple locations. Seems this is something that may be auto-generated, wouldn't be surprised to see more of these. As to whether it's a valid GenBank record, well, considering the source of the record is NCBI, I think it's safe to say it's valid. (though again, this all comes back to how helpful it would be to have documentation re: how bond() is defined within the context of the feature table) chris From p.j.a.cock at googlemail.com Wed Jan 18 16:38:23 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 18 Jan 2012 16:38:23 +0000 Subject: [Biopython] Is this a valid genbank record? In-Reply-To: <9FBD9AC1-7F5D-4182-899C-42C028F7D180@illinois.edu> References: <4309D62E-FA32-456C-B12D-15202D25CCC7@gmail.com> <9FBD9AC1-7F5D-4182-899C-42C028F7D180@illinois.edu> Message-ID: On Wed, Jan 18, 2012 at 4:31 PM, Fields, Christopher J wrote: > On Jan 18, 2012, at 9:34 AM, Peter Cock wrote: >> Any idea why this example is: >> >> join(bond(127),bond(127),bond(130),bond(130),bond(138), >> bond(138),bond(139),bond(138)) >> >> rather than: >> >> bond(127,127,130,130,138,138,139,138) >> >> or indeed why so many of the residues are bonded more than >> once?
>> >> Peter > > No, that one is particularly odd, but there isn't a reason I could > see where this couldn't be supported, it's just a join of simple locations. > Seems this is something that may be auto-generated, wouldn't be > surprised to see more of these. > > As to whether it's a valid GenBank record, well, considering the source > of the record is NCBI, I think it's safe to say it's valid. > > (though again, this all comes back to how helpful it would be to have > documentation re: how bond() is defined within the context of the > feature table) There is precedent for the NCBI publishing GenBank files which didn't conform to the published specification - but in the absence of any spec we're basically stuck with guessing what they intend. I will attempt to enquire... Peter From David.Lapointe at umassmed.edu Wed Jan 18 16:13:45 2012 From: David.Lapointe at umassmed.edu (Lapointe, David) Date: Wed, 18 Jan 2012 11:13:45 -0500 Subject: [Biopython] Is this a valid genbank record? In-Reply-To: References: Message-ID: Those strange joins are where Ca has a "bond" with an AA in the structure, I would imagine. Het join(bond(127),bond(127),bond(130),bond(130),bond(138), bond(138),bond(139),bond(138)) /heterogen="( CA,1000 )" -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Michael Thon Sent: Wednesday, January 18, 2012 5:15 AM To: Biopython Mailing List Subject: [Biopython] Is this a valid genbank record? Does anyone know if these GenBank records are valid: http://www.ncbi.nlm.nih.gov/protein/323463153 http://www.ncbi.nlm.nih.gov/protein/93279336 ...because biopython raised an exception when trying to parse them.
They have weird feature locations: Het join(bond(127),bond(127),bond(130),bond(130),bond(138), bond(138),bond(139),bond(138)) thanks Mike _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From dilara.ally at gmail.com Thu Jan 19 00:10:29 2012 From: dilara.ally at gmail.com (Dilara Ally) Date: Wed, 18 Jan 2012 16:10:29 -0800 Subject: [Biopython] SNP id Message-ID: <4F175F75.1030803@gmail.com> Hi All I was wondering if anyone on this listserv had any recommendation regarding coverage for the identification of SNPs from SOLiD or Illumina data? What is the acceptable coverage amount these days? Thanks! Dilara From bpkth2012 at gmail.com Wed Jan 25 16:14:50 2012 From: bpkth2012 at gmail.com (Sarttu Bourvir) Date: Wed, 25 Jan 2012 17:14:50 +0100 Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header? In-Reply-To: References: Message-ID: Hi, I am new to both python and biopython. What I'm trying to do is to parse a blast result xml file (myblast.xml), attached here. 
The code looks like this:

    #!/usr/bin/env python
    import sys
    import re
    import Bio
    from Bio.Blast import NCBIWWW
    from Bio.Blast import NCBIXML
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord
    from Bio.Blast import NCBIStandalone

    infile = raw_input ("which file:")
    result_handle = open(infile,'r')
    blast_parser = NCBIStandalone.BlastParser()
    blast_iterator = NCBIStandalone.Iterator(result_handle, blast_parser)
    blast_record = blast_iterator.next()
    for blast_record in blast_iterator:
        E_VALUE_THRESH = 0.01
        for alignment in blast_records.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < E_VALUE_THRESH:
                    print '****Alignment****'
                    print 'sequence', alignment.title
                    print 'alignment length', alignment.length

When I try to run it, I get this:

    Traceback (most recent call last):
      File "Blastparsee.py", line 17, in <module>
        blast_record = blast_iterator.next()
      File "/usr/lib/pymodules/python2.6/Bio/Blast/NCBIStandalone.py", line 1645, in next
        return self._parser.parse(File.StringHandle(data))
      File "/usr/lib/pymodules/python2.6/Bio/Blast/NCBIStandalone.py", line 804, in parse
        self._scanner.feed(handle, self._consumer)
      File "/usr/lib/pymodules/python2.6/Bio/Blast/NCBIStandalone.py", line 100, in feed
        self._scan_header(uhandle, consumer)
      File "/usr/lib/pymodules/python2.6/Bio/Blast/NCBIStandalone.py", line 225, in _scan_header
        raise ValueError("Invalid header?")
    ValueError: Invalid header?

I attached the xml file I've been trying to blast. However, I get the same error if I try using any other xml files from blast. What's going on? Thank you! Cheers, Sar P.S. If this type of message is not allowed on this mailing list I apologize and promise to behave in the future! ;) -------------- next part -------------- A non-text attachment was scrubbed...
Name: myblast.xml Type: text/xml Size: 586125 bytes Desc: not available

From chapmanb at 50mail.com Wed Jan 25 16:57:34 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 25 Jan 2012 11:57:34 -0500 Subject: [Biopython] Fwd: BlastParsing gives Value Error: Invalid header? In-Reply-To: References: Message-ID: <87r4ynu3cx.fsf@fastmail.fm>

Sar;

> I am new to both python and biopython.

Welcome. Thanks for including your code along with the problem report.

> What I'm trying to do is to parse a blast result xml file (myblast.xml), > attached here. > > The code looks like this: [...] > blast_parser = NCBIStandalone.BlastParser() [...] > ValueError: Invalid header?

You are using NCBIStandalone, which parses plain text blast output. To parse the XML output, you should use the NCBIXML parser:

    from Bio.Blast import NCBIXML
    blast_records = NCBIXML.parse(result_handle)

The tutorial has more details and examples: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc87

Hope this helps, Brad

From kristian_ullrich at yahoo.de Mon Jan 30 10:09:09 2012 From: kristian_ullrich at yahoo.de (Kristian Ullrich) Date: Mon, 30 Jan 2012 10:09:09 +0000 (GMT) Subject: [Biopython] (no subject) Message-ID: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com>

Hello Biopython Team, I am going to work with biological networks, next to the great tool cytoscape I want to create networks out of python. To my knowledge there exists pygraphviz and networkx which can produce gml and dot output files. Cytoscape uses the XGMML Language (http://www.cs.rpi.edu/research/groups/pb/punin/public_html/XGMML/). Is there an easy way to use or manipulate existing python xml parsers to work with xgmml files or are there plans of the Biopython Team to write an XGMML - parser? Since a lot of biologists work with cytoscape this would be a very useful tool to build graphical networks.
Thank you in anticipation Kristian Ullrich From p.j.a.cock at googlemail.com Mon Jan 30 11:14:37 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 30 Jan 2012 11:14:37 +0000 Subject: [Biopython] (no subject) In-Reply-To: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> References: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> Message-ID: On Mon, Jan 30, 2012 at 10:09 AM, Kristian Ullrich wrote: > Hello Biopython Team, > > I am going to work with biological networks, next to the great tool cytoscape > I want to create networks out of python. To my knowledge there exists > pygraphviz and networkx which can produce gml and dot output files. > Cytoscape uses the XGMML Language > (http://www.cs.rpi.edu/research/groups/pb/punin/public_html/XGMML/). > > Is there an easy way to use or manipulate existing python xml > parsers to work with xgmml files or are there plans of the Biopython > Team to write an XGMML - parser? > > Since a lot of biologists work with cytoscape this would be a very useful > tool to build graphical networks. > > Thank you in anticipation > > Kristian Ullrich Hi, XGMML looks much more general than 'just' biology, so I wonder if XGMML support in NetworkX would be a better idea? I couldn't find an open issue on this on their tracker, but certainly they sounded positive about this back in 2009, http://groups.google.com/group/networkx-discuss/browse_thread/thread/fb30306e43414c74 Peter P.S.
There are several Python libraries for using dot files and GraphViz (which is an excellent library), I don't recall why but last time I needed to do this I used pydot - http://code.google.com/p/pydot/ From Leighton.Pritchard at hutton.ac.uk Mon Jan 30 11:55:46 2012 From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard) Date: Mon, 30 Jan 2012 11:55:46 +0000 Subject: [Biopython] Representing biological networks was: Re: (no subject) In-Reply-To: References: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> Message-ID: Hi, I use Cytoscape and NetworkX, which cope perfectly well - as far as I've pushed them - with GML as a language for data transfer (you might want to get into the habit of using the relabel=True option when using read_gml(), though ;) ). However, for biological networks, you might also want to look at SBML which, unlike XGMML, is actively maintained and biology-specific, to see if it meets your needs. Cytoscape handles this format directly, but coverage is still lacking in NetworkX (which would be the appropriate place for this, rather than Biopython, in my opinion). There is a Python API to libSBML, though. http://sbml.org/Main_Page http://sbml.org/Software/libSBML/docs/python-api/ http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats https://networkx.lanl.gov/trac/ticket/325 Cheers, L. On 30 Jan 2012, at Monday, January 30, 11:14, Peter Cock wrote: On Mon, Jan 30, 2012 at 10:09 AM, Kristian Ullrich wrote: Hello Biopython Team, I am going to work with biological networks, next to the great tool cytoscape I want to create networks out of python. To my knowledge there exists pygraphviz and networkx which can produce gml and dot output files. Cytoscape uses the XGMML Language (http://www.cs.rpi.edu/research/groups/pb/punin/public_html/XGMML/). Is there an easy way to use or manipulate existing python xml parsers to work with xgmml files or are there plans of the Biopython Team to write an XGMML - parser?
-- Dr Leighton Pritchard MRSC DG31, Plant Pathology Programme, James Hutton Institute (Dundee) Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:leighton.pritchard at hutton.ac.uk w:http://www.hutton.ac.uk/staff/leighton-pritchard gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827
From mmokrejs at fold.natur.cuni.cz Mon Jan 30 12:27:16 2012 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Mon, 30 Jan 2012 13:27:16 +0100 Subject: [Biopython] (no subject) In-Reply-To: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> References: <1327918149.73202.YahooMailNeo@web25901.mail.ukl.yahoo.com> Message-ID: <4F268CA4.2080801@fold.natur.cuni.cz> Hi, I do not know if this helps you but there is a Biopython branch using networkx to parse GO files: https://github.com/ntamas/biopython Martin Kristian Ullrich wrote: > Hello Biopython Team, > > I am going to work with biological networks, next to the great tool cytoscape I want to create networks out of python. To my knowledge there exists pygraphviz and networkx which can produce gml and dot output files. Cytoscape uses the XGMML Language (http://www.cs.rpi.edu/research/groups/pb/punin/public_html/XGMML/). > > Is there an easy way to use or manipulate existing python xml parsers to work with xgmml files or are there plans of the Biopython Team to write an XGMML - parser? > > Since a lot of biologists work with cytoscape this would be a very useful tool to build graphical networks.
> > Thank you in anticipation > > Kristian Ullrich From sweta.dubey31 at gmail.com Tue Jan 31 07:00:33 2012 From: sweta.dubey31 at gmail.com (shweta dubey) Date: Tue, 31 Jan 2012 12:30:33 +0530 Subject: [Biopython] regarding retrieving antigen information of specific gene using Biopython Message-ID: Hello everyone, I am new to Biopython. I have a set of genes and I want information on the antigens specific to these genes from a database (say, an antigen database). How can I do this using Biopython? Thanks in advance, Shweta Dubey