From wolfgang.meyer at gmail.com Tue Jan 1 12:33:41 2008
From: wolfgang.meyer at gmail.com (Wolfgang Meyer)
Date: Tue, 1 Jan 2008 18:33:41 +0100
Subject: [BioPython] residue sequence number length (no more than 4 digits)
Message-ID:

Hi,

According to the (old) PDB format, the residue sequence number should be
no longer than 4 digits:

...
23 - 26    Integer    resSeq    Residue sequence number.
...

However, Bio.PDB.Residue.__init__(...) does not check the length of this
parameter, and neither does Bio.PDB.PDBIO. Bio.PDB.PDBIO does try to
restrict the residue sequence number to 4 characters in its format string:

_ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c %8.3f%8.3f%8.3f%6.2f%6.2f %4s%2s%2s\n"

but %4i only sets a minimum field width, so this does not prevent a
residue sequence number longer than 4 digits from being written into a
PDB file by PDBIO. Such a PDB file would be rejected as invalid by many
PDB file parsers.

Of course users are responsible for giving a residue a sequence number
of valid length. However, wouldn't it be better for BioPython to handle
careless input of an invalid residue sequence number?

Thanks!

--
Wolfgang Meyer

From hlapp at gmx.net Tue Jan 1 18:25:39 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 1 Jan 2008 18:25:39 -0500
Subject: [BioPython] [BioSQL-l] Authority in biodatabase table
In-Reply-To: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com>
References: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com>
Message-ID:

(Sorry for this long-too-late reply. Going through old email that got
left unread or unresponded.)

Peter - you probably implemented something meanwhile that suits your
needs. Just FYI, BioPerl leaves this empty too. The general notion
for authority is that of the LSID authority field, but of course you
won't be able to parse this out of any input file. The value for
SwissProt would be uniprot.org, for example. For NCBI, I'm not sure -
NCBI hasn't ever issued any LSIDs, but presumably it would be
something like ncbi.nlm.nih.gov.

	-hilmar

On Nov 26, 2007, at 2:10 PM, Peter wrote:

> Thanks for all the replies on the db_xref issue.
>
> Today I'd like to ask if there are any established guidelines for the
> biodatabase table - in particular for how to use the "authority" field
> in the biodatabase table, and if there is any agreed terminology for
> the named "sub databases" defined therein i.e. what should I call them
> in our documentation.
>
> By default, unless the user specifies an authority, we end up with a
> NULL when creating entries in the biodatabase table using Biopython.
> For example:
>
> from BioSQL import BioSeqDatabase
> server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
>     passwd = "", host = "localhost", db="bioseqdb")
> db = server.new_database("orchids", description="Just for testing")
> server.adaptor.commit()
>
> I'd like to give some sensible defaults in any worked examples. Apart
> from simple test cases (like above), sensible examples that came to
> mind would be creating a "sub database" to contain:
> (*) an entire GenBank release
> (*) the latest SwissProt release
>
> What would you use in these cases? In fact, what does your
> biodatabase table contain right now?
>
> Thank you all,
>
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lee.byung-chul at kaist.ac.kr Wed Jan 2 06:00:37 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Wed, 02 Jan 2008 20:00:37 +0900
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
Message-ID: <477B6ED5.8080005@kaist.ac.kr>

Dear colleagues.

I want to use the AlignInfo.SummaryInfo for a fasta-format alignment file.
I think that to do the process firstly the fasta format should be
converted to clustalw format, so I try to use FormatConverter.
However, at my trial, I cannot do that.

I did like below:
----
#!/usr/bin/env python
from Bio import Fasta
from Bio.Align.FormatConvert import FormatConverter
from Bio.Alphabet import IUPAC

alignment = Fasta.FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
converter = FormatConverter(alignment)
clw_align = converter.to_clustal()
print clw_align
----
and tmp.fasta is
---
>seq2
DAC
>seq3
DC-
>seq1
DAD
>seq4
DDD

But an error occurred. The error messages are below:
---
Traceback (most recent call last):
  File "tmp.py", line 7, in <module>
    alignment = Fasta.FastaAlign.parse_file('tmp.fasta', type='PROTEIN')
  File "/var/lib/python-support/python2.5/Bio/Fasta/FastaAlign.py", line 48, in parse_file
    cur_align = iterator.next()
  File "/var/lib/python-support/python2.5/Bio/Fasta/__init__.py", line 72, in next
    result = self._iterator.next()
  File "/var/lib/python-support/python2.5/Martel/IterParser.py", line 152, in iterateFile
    self.header_parser.parseString(rec)
  File "/var/lib/python-support/python2.5/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 0
-----
What should I do? Could you advise me?

Thank you!

Byung chul Lee

From biopython at maubp.freeserve.co.uk Wed Jan 2 06:54:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 11:54:34 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
In-Reply-To: <477B6ED5.8080005@kaist.ac.kr>
References: <477B6ED5.8080005@kaist.ac.kr>
Message-ID: <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>

Hello Byung chul Lee,

On 1/2/08, Lee,Byung-chul wrote:
>
> Dear colleagues.
>
> I want to use the AlignInfo.SummaryInfo for a fasta-format alignment file.
> I think that to do the process firstly the fasta format should be
> converted to clustalw format, so I try to use FormatConverter.
> However, at my trial, I cannot do that.

Once you have an alignment object (loaded from any file format), this
should work with AlignInfo. I don't think you need to convert it from
FASTA to ClustalW.

I would guess the error you saw is a problem with Biopython/Martel and
mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0.
What version of Biopython are you using? I would have expected this
to work fine with Biopython 1.44.

You could also try using Bio.SeqIO to load the FASTA format alignment
file instead, see http://biopython.org/wiki/SeqIO

from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)

Peter

From biopython at maubp.freeserve.co.uk Wed Jan 2 06:57:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 11:57:46 +0000
Subject: [BioPython] [BioSQL-l] Authority in biodatabase table
In-Reply-To:
References: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com>
Message-ID: <320fb6e00801020357g724917b5s853d99f2f953753a@mail.gmail.com>

On 1/1/08, Hilmar Lapp wrote:
> (Sorry for this long-too-late reply. Going through old email that got
> left unread or unresponded.)
>
> Peter - you probably implemented something meanwhile that suits your
> needs. Just FYI, BioPerl leaves this empty too. The general notion
> for authority is that of the LSID authority field, but of course you
> won't be able to parse this out of any input file. The value for
> SwissProt would be uniprot.org, for example. For NCBI, I'm not sure -
> NCBI hasn't ever issued any LSIDs, but presumably it would be
> something like ncbi.nlm.nih.gov.
>
> -hilmar

Thank you Hilmar.
It seems that the current code in Biopython is fine (the
authority field is left blank by default, unless the user supplies
their own value), and consistent with both BioPerl and BioJava in this
regard (thanks Richard).

Peter

From lee.byung-chul at kaist.ac.kr Wed Jan 2 08:44:47 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Wed, 02 Jan 2008 22:44:47 +0900
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
In-Reply-To: <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
References: <477B6ED5.8080005@kaist.ac.kr> <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
Message-ID: <477B954F.9020004@kaist.ac.kr>

Thank you very much for your kind reply, Peter.

Following your explanation, I tried to use SeqIO, but another error occurred.
I did it like below:
-----------------
from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
--------------------
but the results are
-----------------
Traceback (most recent call last):
  File "tmp.py", line 16, in <module>
    print summary_align.dumb_consensus()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 111, in dumb_consensus
    consensus_alpha = self._guess_consensus_alphabet()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 189, in _guess_consensus_alphabet
    ("Non-gapped alphabet found in alignment object.")
ValueError: Non-gapped alphabet found in alignment object.
---------------------
In addition, all sequences have the same length in my tmp.fasta file.
-----
>seq2
DAC
>seq3
DC-
>seq1
DAD
>seq4
DDD

Is this problem caused by the Biopython/Martel and mxTextTools versions?
I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

What should I do for this?
Thanks.

Byung chul.

Peter wrote:
> Hello Byung chul Lee,
>
> On 1/2/08, Lee,Byung-chul wrote:
>
>> Dear colleagues.
>>
>> I want to use the AlignInfo.SummaryInfo for a fasta-format alignment file.
>> I think that to do the process firstly the fasta format should be
>> converted to clustalw format, so I try to use FormatConverter.
>> However, at my trial, I cannot do that.
>>
>
> Once you have an alignment object (loaded from any file format), this
> should work with AlignInfo. I don't think you need to convert it from
> FASTA to ClustalW.
>
> I would guess the error you saw is a problem with Biopython/Martel and
> mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0.
> What version of Biopython are you using, as I would have expected this
> to work fine with Biopython 1.44?
>
> You could also try using Bio.SeqIO to load the FASTA format alignment
> file instead, see http://biopython.org/wiki/SeqIO
>
> from Bio import SeqIO
> from Bio.Align import AlignInfo
> alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
> summary_align = AlignInfo.SummaryInfo(alignment)
>
> Peter
>
>

From biopython at maubp.freeserve.co.uk Wed Jan 2 12:46:25 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 17:46:25 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
In-Reply-To: <477B954F.9020004@kaist.ac.kr>
References: <477B6ED5.8080005@kaist.ac.kr> <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com> <477B954F.9020004@kaist.ac.kr>
Message-ID: <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>

On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> Following your explanation, I tried to use SeqIO, but another error occurred.
> I did it like below:

My fault, sorry. I wasn't at a computer with Biopython installed, so I
had to guess. I'll try and put together a proper example for you
tomorrow.

> Is this problem caused by the Biopython/Martel and mxTextTools versions?
> I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

The original problem you reported was due to the combination of
Biopython 1.43 (the Martel module) and mxTextTools 3.0. You can
either update to Biopython 1.44 or downgrade to mxTextTools 2.0 -
neither is going to be very simple if you want to use the Ubuntu
repositories.

To avoid this Martel problem, I would suggest you un-install Biopython
1.43 from the Ubuntu repository, and then install Biopython 1.44 from
source.

Peter

From biopython at maubp.freeserve.co.uk Fri Jan 4 08:20:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jan 2008 13:20:26 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
In-Reply-To: <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>
References: <477B6ED5.8080005@kaist.ac.kr> <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com> <477B954F.9020004@kaist.ac.kr> <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>
Message-ID: <320fb6e00801040520i11c9a4c4q4449cee34da00706@mail.gmail.com>

On Jan 2, 2008 5:46 PM, Peter wrote:
> On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> > Following your explanation, I tried to use SeqIO, but another error occurred.
> > I did it like below:
>
> My fault, sorry. I wasn't at a computer with Biopython installed, so I
> had to guess. I'll try and put together a proper example for you
> tomorrow.

This should work on Biopython 1.43 or later. I have tested it using the
simple FASTA file you gave earlier:

from Bio.Alphabet.IUPAC import IUPACProtein
from Bio.Alphabet import Gapped
from Bio import SeqIO
from Bio.Align import AlignInfo

gapped_protein = Gapped(IUPACProtein())
records = list(SeqIO.parse(open('tmp.fasta'), "fasta"))
for rec in records :
    #Override the default generic alphabet:
    rec.seq.alphabet = gapped_protein
#Turn these records into an alignment
alignment = SeqIO.to_alignment(records, gapped_protein)
del records

summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

The problem with my previous shorter suggestion was that the Bio.SeqIO
FASTA parser returned SeqRecord objects with a generic alphabet, while
the alignment summary expected a gapped alphabet. I'm beginning to
think that the Bio.SeqIO.parse() function should allow an alphabet to
be specified as an optional argument for this sort of situation.

Alternatively, going back to your original code how about:

from Bio.Fasta import FastaAlign
from Bio.Align import AlignInfo

alignment = FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0.
It should work with older versions of Biopython using mxTextTools 2.0
as well.

Peter

From mjldehoon at yahoo.com Sat Jan 5 03:41:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jan 2008 00:41:25 -0800 (PST)
Subject: [BioPython] Bio.Ais
Message-ID: <140129.37367.qm@web62402.mail.re1.yahoo.com>

Hi everybody,

I was checking which Biopython modules access Entrez/GenBank in any way,
and in the process found the script example_ais2.py in Bio/Ais/Examples
(this is not related to Entrez/GenBank in any way, it just caught my eye
because it imports urllib).
Currently, this example script does not seem to work:

$ python example_ais2.py
Traceback (most recent call last):
  File "example_ais2.py", line 39, in <module>
    immune = Immune( align, alphabet, 100 )
...
TypeError: 'int' object is not iterable

The directory Bio/Ais/Examples and its file example_ais2.py only appear
in CVS and are not included in Biopython releases.

Does anybody know how to fix this example? If not, what should we do
with it?

--Michiel.

From meesters at uni-mainz.de Mon Jan 7 13:13:59 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 7 Jan 2008 19:13:59 +0100
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
Message-ID: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>

Hoi,

I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
have this approach:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
residue.add(new)

Here x, y, and z are floating point numbers and serial_number is an
integer. 'residue' is a 'Residue' I'm iterating over.
However, I keep getting the following error message and don't have a
clue how to proceed:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
TypeError: object of type 'module' is not callable

Does anyone have a hint for me on how to actually add an atom, or on
what's wrong here?

TIA
Christian

From biopython at maubp.freeserve.co.uk Mon Jan 7 13:55:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jan 2008 18:55:57 +0000
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
In-Reply-To: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
References: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
Message-ID: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>

Christian Meesters wrote:
> I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
> have this approach:
> ...
> new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
> TypeError: object of type 'module' is not callable
>
> Does anyone have a hint for me on how to actually add an atom, or on
> what's wrong here?

I would infer from the error that "Atom" refers to the Bio.PDB.Atom
module, rather than the Bio.PDB.Atom.Atom class. How did you do your
imports? Try this:

from Bio.PDB.Atom import Atom

Peter

From lueck at ipk-gatersleben.de Tue Jan 8 04:06:40 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 10:06:40 +0100
Subject: [BioPython] blastall does not exist at %s" % blastcmd
Message-ID: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>

Hi!

I'm trying to get a local blast running. I proceeded as described in the
cookbook but I always get this error message:

>>>
Traceback (most recent call last):
  File "F:\Blast\blast.py", line 10, in <module>
    my_blast_db, my_blast_file)
  File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
    raise ValueError, "blastall does not exist at %s" % blastcmd
ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
<<<

>>>
My Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"F:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\bin\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                    my_blast_db, my_blast_file)
blast_results = result_handle.read()

save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
<<<

blastall.exe is in this folder (checked by os.listdir()) but can't be
found by the tool.

I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

Does someone have an idea where the problem is?

Greetings
Stefanie

From biopython at maubp.freeserve.co.uk Tue Jan 8 05:46:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jan 2008 10:46:02 +0000
Subject: [BioPython] blastall does not exist at %s" % blastcmd
In-Reply-To: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
References: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801080246t5aa515ccuc8699134b533e8b9@mail.gmail.com>

On Jan 8, 2008 9:06 AM, Stefanie Lück wrote:
> Hi!
>
> I'm trying to get a local blast running. I proceeded as described in the cookbook
> but I always get this error message:
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in <module>
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
>                                                     my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (checked by os.listdir()) but can't be found by the tool.
>

Could you try this, which is the test done in the Biopython blastall
function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double check that the path is C:\Blast\bin\blastall.exe
and not perhaps C:\Blast\blastall.exe (the NCBI changed this at some
point on Windows). Also, did you install it to the F: drive where your
database is, rather than C: ?

> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

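The os.path.exists() test above can be run over several candidate
locations in one go. What follows is only a minimal sketch using the
standard library: the helper name first_existing is made up for
illustration and is not part of Biopython, and the two candidate paths
are simply the ones mentioned in this thread.

```python
import os


def first_existing(candidates):
    """Return the first path in candidates that exists on disk, else None.

    Hypothetical helper for illustration; not a Biopython function.
    """
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


# Candidate install locations taken from this thread; adjust for your machine.
candidates = [
    r"C:\Blast\bin\blastall.exe",
    r"C:\Blast\blastall.exe",
]

my_blast_exe = first_existing(candidates)
if my_blast_exe is None:
    print("blastall.exe not found in any candidate location")
else:
    print("Using " + my_blast_exe)
```

Checking the path before handing it to NCBIStandalone.blastall makes it
easier to tell a wrong install location apart from other problems.
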
Peter

From lueck at ipk-gatersleben.de Tue Jan 8 06:32:54 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 12:32:54 +0100
Subject: [BioPython] blastall does not exist at %s" % blastcmd
References: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de> <320fb6e00801080246t5aa515ccuc8699134b533e8b9@mail.gmail.com>
Message-ID: <003a01c851ea$357e5cd0$1022a8c0@ipkgatersleben.de>

Thanks Peter!

C:\Blast\blastall.exe worked! Sorry for the drive mistake, I have it on
both...

But my XML file is empty :-(
I'll try to fix it...

The standalone blast version is blast-2.2.17-ia32-win32.exe

Stefanie

From lueck at ipk-gatersleben.de Tue Jan 8 09:18:08 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 15:18:08 +0100
Subject: [BioPython] empty xml after local blast
Message-ID: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>

Hi again!

I got blastall running but my xml output file is empty...
Any ideas? Where exactly must my fasta file be?

>>>
Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"C:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                    my_blast_db, my_blast_file)
blast_results = result_handle.read()

save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
>>>

I'm using Python 2.5, biopython-1.44.win32-py2.5.exe and
blast-2.2.17-ia32-win32.exe

Regards
Stefanie

From biopython at maubp.freeserve.co.uk Tue Jan 8 09:33:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jan 2008 14:33:29 +0000
Subject: [BioPython] empty xml after local blast
In-Reply-To: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
References: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801080633k652b3023r6a8457b4c97143e0@mail.gmail.com>

On Jan 8, 2008 2:18 PM, Stefanie Lück wrote:
> Hi again!
>
> I got blastall running but my xml output file is empty...
> Any ideas?

Have you ever tried running blastall.exe from the command line "by
hand"? This can be very useful, and would let you rule out several
basic problems (e.g. make sure blast is installed correctly, and that
your database is working).

> Where exactly must my fasta file be?

Wherever you like - as long as you specify its location correctly.

Your code below seems to assume that "test.fasta" is in the current
directory (i.e. where you are running your python script from). Is
this correct? It may be simpler to use a full path, e.g.

my_blast_file = r"C:\temp\test.fasta"

I suspect that Standalone blast is not finding the input file, or that
it is not finding your database. If you get an empty XML file, one
thing to try is checking the error output from the command line call:

print error_info.read()

Peter

From lueck at ipk-gatersleben.de Tue Jan 8 10:18:32 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 16:18:32 +0100
Subject: [BioPython] empty xml after local blast
References: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de> <320fb6e00801080633k652b3023r6a8457b4c97143e0@mail.gmail.com>
Message-ID: <009d01c85209$bb314210$1022a8c0@ipkgatersleben.de>

Thanks, it couldn't find the database!
Great help, thanks a lot ;-)

From meesters at uni-mainz.de Tue Jan 8 11:12:09 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Tue, 8 Jan 2008 17:12:09 +0100
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
In-Reply-To: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>
References: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de> <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>
Message-ID: <1199808729.5401.75.camel@meesters.biologie.uni-mainz.de>

> I would infer from the error that "Atom" refers to the Bio.PDB.Atom
> module, rather than the Bio.PDB.Atom.Atom class. How did you do your
> imports? Try this:
>
> from Bio.PDB.Atom import Atom
>
> Peter

Ouch! Next time I'll try the tutor-list ;-).

Thanks a lot.
Christian

From quantrum75 at yahoo.com Thu Jan 10 19:16:51 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Thu, 10 Jan 2008 16:16:51 -0800 (PST)
Subject: [BioPython] bio.PDB module
In-Reply-To:
Message-ID: <258224.6110.qm@web31404.mail.mud.yahoo.com>

Hi

I am a biopython newbie. I was wondering if someone could show me or
send me (I would be thankful) where I could find a script which can
read a pdb file and output the phi and psi angles of the protein
structure. I have read through the bio.PDB module and structural module
documentation, but still do not have an idea of how to tackle the
problem. I wish the bio.PDB documentation was a bit more detailed and
included some examples to work with. I really would like to contribute
to the project, and maybe if I got an initial idea of how to work with
it, I could contribute in some small way.
Thanks for your time Regards Rama biopython-request at lists.open-bio.org wrote: Send BioPython mailing list submissions to biopython at lists.open-bio.org To subscribe or unsubscribe via the World Wide Web, visit http://lists.open-bio.org/mailman/listinfo/biopython or, via email, send a message with subject or body 'help' to biopython-request at lists.open-bio.org You can reach the person managing the list at biopython-owner at lists.open-bio.org When replying, please edit your Subject line so it is more specific than "Re: Contents of BioPython digest..." Today's Topics: 1. Re: [BioSQL-l] Authority in biodatabase table (Peter) 2. Re: FormatConverter: from Fasta format to ClustalW format (Lee,Byung-chul) 3. Re: FormatConverter: from Fasta format to ClustalW format (Peter) 4. Re: FormatConverter: from Fasta format to ClustalW format (Peter) 5. Bio.Ais (Michiel de Hoon) 6. Bio.PDB - adding 'dummy atoms' (Christian Meesters) 7. Re: Bio.PDB - adding 'dummy atoms' (Peter) 8. blastall does not exist at %s" % blastcmd (Stefanie L?ck) 9. Re: blastall does not exist at %s" % blastcmd (Peter) ---------------------------------------------------------------------- Message: 1 Date: Wed, 2 Jan 2008 11:57:46 +0000 From: Peter Subject: Re: [BioPython] [BioSQL-l] Authority in biodatabase table To: "Hilmar Lapp" Cc: biopython at lists.open-bio.org, biosql-l at lists.open-bio.org Message-ID: <320fb6e00801020357g724917b5s853d99f2f953753a at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 On 1/1/08, Hilmar Lapp wrote: > (Sorry for this long-too-late reply. Going through old email that got > left unread or unresponded.) > > Peter - you probably implemented something meanwhile that suits your > needs. Just FYI, BioPerl leaves this empty too. The general notion > for authority is that of the LSID authority field, but of course you > won't be able to parse this out of any input file. The value for > SwissProt would be uniprot.org, for example. 
For NCBI, I'm not sure - > NCBI hasn't ever issued any LSIDs, but presumably it would be > something like ncbi.nlm.nih.gov. > > -hilmar Thank you Hilmar. It seem's that the current code in Biopython is fine (the authority field is left blank by default, unless the user supplies their own value), and consistent with both BioPerl and BioJava in this regard (thanks Richard). Peter ------------------------------ Message: 2 Date: Wed, 02 Jan 2008 22:44:47 +0900 From: "Lee,Byung-chul" Subject: Re: [BioPython] FormatConverter: from Fasta format to ClustalW format To: biopython at lists.open-bio.org Message-ID: <477B954F.9020004 at kaist.ac.kr> Content-Type: text/plain; charset=EUC-KR Thank you very much for your kind reply, Peter. As your explanation, I tried to use SeqIO, but another error occured I did it like below: ----------------- from Bio import SeqIO from Bio.Align import AlignInfo alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta")) summary_align = AlignInfo.SummaryInfo(alignment) print summary_align.dumb_consensus() -------------------- but the results are ----------------- Traceback (most recent call last): File "tmp.py", line 16, in print summary_align.dumb_consensus() File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 111, in dumb_consensus consensus_alpha = self._guess_consensus_alphabet() File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 189, in _guess_consensus_alphabet ("Non-gapped alphabet found in alignment object.") ValueError: Non-gapped alphabet found in alignment object. --------------------- In addition, all sequences have the same lenghth in my tmp.fasta file. ----- >seq2 DAC >seq3 DC- >seq1 DAD >seq4 DDD Is this problem caused by the Biopython/Martel and mxTextTools vesions? I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1. What should I do for this? Thanks. Byung chul. 
Peter wrote: > Hello Byung chul Lee, > > On 1/2/08, Lee,Byung-chul wrote: > >> Dear colleagues. >> >> I want to use the AlignInfo.SummaryInfo for fasta-format alignment file. >> I think that to do the process firstly the fasta format should be >> converted to clustalw format, so I try to use Formatconverter. >> However, at my trial, I cannot do that. >> > > Once you have an alignment object (loaded from any file format), this > should work with AlignInfo. I don't think you need to convert it from > FASTA to ClustalW. > > I would guess the error you saw is a problem with Biopython/Martel and > mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0. > What version of Biopython are you using, as I would have expected this > to work fine with Biopython 1.44? > > You could also try using Bio.SeqIO to load the FASTA format alignment > file instead, see http://biopython.org/wiki/SeqIO > > from Bio import SeqIO > from Bio.Align import AlignInfo > alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta")) > summary_align = AlignInfo.SummaryInfo(alignment) > > Peter > > ------------------------------ Message: 3 Date: Wed, 2 Jan 2008 17:46:25 +0000 From: Peter Subject: Re: [BioPython] FormatConverter: from Fasta format to ClustalW format To: "Lee,Byung-chul" Cc: biopython at lists.open-bio.org Message-ID: <320fb6e00801020946j5b331137s14f9e1d90e888a2e at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote: > As your explanation, I tried to use SeqIO, but another error occured > I did it like below: My fault, sorry. I wasn't at a computer with Biopython installed, I had to guess. I'll try and put together a proper example for you tomorrow. > Is this problem caused by the Biopython/Martel and mxTextTools vesions? > I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1. 
The original problem you reported was due to the combination of
Biopython 1.43 (the Martel module) and mxTextTools 3.0. You can either
update to Biopython 1.44 or downgrade to mxTextTools 2.0 - neither is
going to be very simple if you want to use the Ubuntu repositories.

To avoid this Martel problem, I would suggest you un-install Biopython
1.43 from the Ubuntu repository, and then install Biopython 1.44 from
source.

Peter

------------------------------

Message: 4
Date: Fri, 4 Jan 2008 13:20:26 +0000
From: Peter
Subject: Re: [BioPython] FormatConverter: from Fasta format to ClustalW format
To: "Lee,Byung-chul"
Cc: biopython at lists.open-bio.org
Message-ID: <320fb6e00801040520i11c9a4c4q4449cee34da00706 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 2, 2008 5:46 PM, Peter wrote:
> On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> > As your explanation, I tried to use SeqIO, but another error occured
> > I did it like below:
>
> My fault, sorry. I wasn't at a computer with Biopython installed, I
> had to guess. I'll try and put together a proper example for you
> tomorrow.

This should work on Biopython 1.43 or later, I have tested it using
the simple FASTA file you gave earlier:

from Bio.Alphabet.IUPAC import IUPACProtein
from Bio.Alphabet import Gapped
from Bio import SeqIO
from Bio.Align import AlignInfo
gapped_protein = Gapped(IUPACProtein())
records = list(SeqIO.parse(open('tmp.fasta'), "fasta"))
for rec in records :
    #Override the default generic alphabet:
    rec.seq.alphabet = gapped_protein
#Turn these records into an alignment
alignment = SeqIO.to_alignment(records, gapped_protein)
del records
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

The problem with my previous shorter suggestion was the Bio.SeqIO
FASTA parser returned SeqRecord objects with a generic alphabet, while
the alignment summary expected a gapped alphabet.
I'm beginning to think that the Bio.SeqIO.parse() function should
allow an alphabet to be specified as an optional argument for this
sort of situation.

Alternatively, going back to your original code how about:

from Bio.Fasta import FastaAlign
from Bio.Align import AlignInfo
alignment = FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0.
It should work with older versions of Biopython using mxTextTools 2.0
as well.

Peter

------------------------------

Message: 5
Date: Sat, 5 Jan 2008 00:41:25 -0800 (PST)
From: Michiel de Hoon
Subject: [BioPython] Bio.Ais
To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org
Message-ID: <140129.37367.qm at web62402.mail.re1.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

Hi everybody,

I was checking which Biopython modules access Entrez/GenBank in any
way, and in the process found the script example_ais2.py in
Bio/Ais/Examples (this is not related to Entrez/GenBank in any way, it
just caught my eye because it imports urllib).

Currently, this example script does not seem to work:

$ python example_ais2.py
Traceback (most recent call last):
  File "example_ais2.py", line 39, in
    immune = Immune( align, alphabet, 100 )
...
TypeError: 'int' object is not iterable

The directory Bio/Ais/Examples and its file example_ais2.py only
appears in CVS and is not included in Biopython releases.

Does anybody know how to fix this example? If not, what should we do
with it?

--Michiel.
------------------------------ Message: 6 Date: Mon, 7 Jan 2008 19:13:59 +0100 From: Christian Meesters Subject: [BioPython] Bio.PDB - adding 'dummy atoms' To: "biopython at lists.open-bio.org" Message-ID: <1199729639.13152.20.camel at meesters.biologie.uni-mainz.de> Content-Type: text/plain Hoi, I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I have this approach: new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number) residue.add(new) Here x, y, and z are floating point numbers and serial_number is an integer. 'residue' is a 'Residue' I'm iterating over. However, I keep getting the following error message and don't have a clue, how to proceed: new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number) TypeError: object of type 'module' is not callable Does anyone have a hint for me, how actually add an atom or what's wrong here? TIA Christian ------------------------------ Message: 7 Date: Mon, 7 Jan 2008 18:55:57 +0000 From: Peter Subject: Re: [BioPython] Bio.PDB - adding 'dummy atoms' To: "Christian Meesters" Cc: "biopython at lists.open-bio.org" Message-ID: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Christian Meesters wrote: > I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I > have this approach: > ... > new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number) > TypeError: object of type 'module' is not callable > > Does anyone have a hint for me, how actually add an atom or what's wrong > here? I would infer from the error that "Atom" refers to the Bio.PDB.Atom module, rather than the Bio.PDB.Atom.Atom class. How did you do your imports? 
Try this:

from Bio.PDB.Atom import Atom

Peter

------------------------------

Message: 8
Date: Tue, 8 Jan 2008 10:06:40 +0100
From: Stefanie Lück
Subject: [BioPython] blastall does not exist at %s" % blastcmd
To:
Message-ID: <002301c851d5$c7daac60$1022a8c0 at ipkgatersleben.de>
Content-Type: text/plain; charset="iso-8859-1"

Hi!

I'm trying to get a local blast running. I proceeded as described in
the cookbook but I always get this Error message:

>>>
Traceback (most recent call last):
  File "F:\Blast\blast.py", line 10, in
    my_blast_db, my_blast_file)
  File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
    raise ValueError, "blastall does not exist at %s" % blastcmd
ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
<<<

>>>
My Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"F:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\bin\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                    my_blast_db, my_blast_file)
blast_results = result_handle.read()

save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
<<<

blastall.exe is in this folder (checked by os.listdir()) but can't be
found from the tool.

I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

Does someone have an idea what the problem is?

Greetings
Stefanie

------------------------------

Message: 9
Date: Tue, 8 Jan 2008 10:46:02 +0000
From: Peter
Subject: Re: [BioPython] blastall does not exist at %s" % blastcmd
To: " Stefanie Lück "
Cc: biopython at lists.open-bio.org
Message-ID: <320fb6e00801080246t5aa515ccuc8699134b533e8b9 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 8, 2008 9:06 AM, Stefanie Lück wrote:
> Hi!
>
> I'm trying to get a local blast running. I proceeded as described in the cookbook
> but I always get this Error message:
>
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
>                                                     my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (checked by os.listdir()) but can't be found from the tool.
>

Could you try this, which is the test done in the Biopython blastall
function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double check the path is C:\Blast\bin\blastall.exe and
not perhaps C:\Blast\blastall.exe (the NCBI changed this at some point
on Windows). Also did you install it to the F: drive where your
database is, rather than C: ?

> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

Peter

------------------------------

_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

End of BioPython Digest, Vol 61, Issue 2
****************************************
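Regarding the blastall path check suggested in Message 9 of the digest above: the sketch below is one way to try several install locations before calling blastall. It is only an illustration. The helper name find_blastall and the F: drive entry are assumptions made here, not part of Biopython or of the original messages; the two C: paths are the ones mentioned in the thread.

```python
import os


def find_blastall(candidates):
    """Return the first candidate path that exists on disk, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


# Candidate install locations. The first two come from the thread; the
# F: drive entry is a hypothetical guess, so adjust this list as needed.
candidate_paths = [
    r"C:\Blast\bin\blastall.exe",
    r"C:\Blast\blastall.exe",
    r"F:\Blast\bin\blastall.exe",
]

blast_exe = find_blastall(candidate_paths)
if blast_exe is None:
    print("blastall not found; checked: %s" % ", ".join(candidate_paths))
else:
    print("Using blastall at %s" % blast_exe)
```

The path found this way could then be passed as the first argument to NCBIStandalone.blastall, in place of the hard-coded my_blast_exe.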
From lee.byung-chul at kaist.ac.kr Thu Jan 10 22:15:02 2008 From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul) Date: Fri, 11 Jan 2008 12:15:02 +0900 Subject: [BioPython] bio.PDB module In-Reply-To: <258224.6110.qm@web31404.mail.mud.yahoo.com> References: <258224.6110.qm@web31404.mail.mud.yahoo.com> Message-ID: <4786DF36.6070102@kaist.ac.kr> quantrum75 wrote: > > Hi > > I am a biopython newbie. I was wondering if someone could show me or send me ( I would be thankful) where I could find a script which can read a pdb file and out the phi and psi angles of the protein structure. > > I have read through the bio.PDB module and structural module documentation, but still do not have an idea on how to proceed to tackle the problem. I wish the bio.PDB documentation was a bit more detailed and included some examples to work with. I really would like to contribute to the project and maybe if I got an initial idea on how to work with the same, I can contribute in some small way. > > Thanks for your time > > Regards > > Rama > > > I think the web page below can help you. Check it. : http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/ Byung chul. From mjldehoon at yahoo.com Fri Jan 11 06:16:45 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 11 Jan 2008 03:16:45 -0800 (PST) Subject: [BioPython] [Biopython-dev] Bio.Ais In-Reply-To: <140129.37367.qm@web62402.mail.re1.yahoo.com> Message-ID: <426295.9925.qm@web62415.mail.re1.yahoo.com> Looking at this again, currently we have no documentation for Bio.Ais, no maintainer, and no apparent users (at least, I couldn't find any in the mailing list archives). Would anybody mind very much if I mark this module as deprecated? Just to find out if there are any users of this code out there. --Michiel. 
Michiel de Hoon wrote: Hi everybody, I was checking which Biopython modules access Entrez/GenBank in any way, and in the process found the script example_ais2.py in Bio/Ais/Examples (this is not related to Entrez/GenBank in any way, it just caught my eye because it imports urllib). Currently, this example script does not seem to work: $ python example_ais2.py Traceback (most recent call last): File "example_ais2.py", line 39, in immune = Immune( align, alphabet, 100 ) ... TypeError: 'int' object is not iterable The directory Bio/Ais/Examples and its file example_ais2.py only appears in CVS and is not included in Biopython releases. Does anybody know how to fix this example? If not, what should we do with it? --Michiel. --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev --------------------------------- Looking for last minute shopping deals? Find them fast with Yahoo! Search. From biopython at maubp.freeserve.co.uk Fri Jan 11 06:51:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 11 Jan 2008 11:51:41 +0000 Subject: [BioPython] bio.PDB module In-Reply-To: <258224.6110.qm@web31404.mail.mud.yahoo.com> References: <258224.6110.qm@web31404.mail.mud.yahoo.com> Message-ID: <320fb6e00801110351x204102fft44dd3b1e914bfee3@mail.gmail.com> On Jan 11, 2008 12:16 AM, quantrum75 wrote: > Hi > I am a biopython newbie. I was wondering if someone could show me or send me > ( I would be thankful) where I could find a script which can read a pdb file and out > the phi and psi angles of the protein structure. 
I see Byung chul has already suggested reading this page: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/ Do you think we should incorporate some that into the main Biopython documentation? > I have read through the bio.PDB module and structural module documentation, > but still do not have an idea on how to proceed to tackle the problem. I wish the > bio.PDB documentation was a bit more detailed and included some examples to > work with. Have you read the Biopython Structural Bioinformatics FAQ, http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf This is linked to from our documentation webpage, but doesn't seem to me mentioned in the main Biopython Tutorial and Cookbook... > I really would like to contribute to the project and maybe if I got an > initial idea on how to work with the same, I can contribute in some small way. Maybe you could start a "Getting started with Bio.PDB" page on the Wiki? Peter From quantrum75 at yahoo.com Fri Jan 11 08:26:23 2008 From: quantrum75 at yahoo.com (quantrum75) Date: Fri, 11 Jan 2008 05:26:23 -0800 (PST) Subject: [BioPython] bio.PDB module In-Reply-To: <320fb6e00801110351x204102fft44dd3b1e914bfee3@mail.gmail.com> Message-ID: <496455.11121.qm@web31409.mail.mud.yahoo.com> Hi Peter, Thanks for your reply. I did go through the links which you made a mention of to me including the structural bioinformatics FAQ. However, I feel the documentation pertaining to bio.PDB module is seriously short on any practical examples for a person like me who likes to learn from examples. I would love to be able to write a "getting started with bio.PDB" wiki or document with examples. However, I need to get the basic ideas on how to use the module which I am unable to from the current documentation which is why I made the request for a script which can compute the phi and psi angle of a pdb file. I ll see what I can do and if you could direct to any resources, that would be great. 
Thanks Rama Peter wrote: On Jan 11, 2008 12:16 AM, quantrum75 wrote: > Hi > I am a biopython newbie. I was wondering if someone could show me or send me > ( I would be thankful) where I could find a script which can read a pdb file and out > the phi and psi angles of the protein structure. I see Byung chul has already suggested reading this page: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/ Do you think we should incorporate some that into the main Biopython documentation? > I have read through the bio.PDB module and structural module documentation, > but still do not have an idea on how to proceed to tackle the problem. I wish the > bio.PDB documentation was a bit more detailed and included some examples to > work with. Have you read the Biopython Structural Bioinformatics FAQ, http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf This is linked to from our documentation webpage, but doesn't seem to me mentioned in the main Biopython Tutorial and Cookbook... > I really would like to contribute to the project and maybe if I got an > initial idea on how to work with the same, I can contribute in some small way. Maybe you could start a "Getting started with Bio.PDB" page on the Wiki? Peter --------------------------------- Looking for last minute shopping deals? Find them fast with Yahoo! Search. From jdieten at gmail.com Tue Jan 15 08:26:08 2008 From: jdieten at gmail.com (Joost van Dieten) Date: Tue, 15 Jan 2008 14:26:08 +0100 Subject: [BioPython] [Biopython] Blast problem Message-ID: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com> Hi Everyone, I'am having a problem with the hsp.match function in the Bio-Python Blast module. A few weeks ago the hsp.match returned me the following: ATGGCA++TGG But now it gives me: ATGGCA TGG I can't see the number of gaps anymore, anyone a solution for this? 
Best regards, Joost van Dieten From biopython at maubp.freeserve.co.uk Tue Jan 15 09:09:47 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Jan 2008 14:09:47 +0000 Subject: [BioPython] [Biopython] Blast problem In-Reply-To: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com> References: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com> Message-ID: <320fb6e00801150609y16c77bdch927dd6d9689996a5@mail.gmail.com> Hi Joost, > Iam having a problem with the hsp.match function in the Bio-Python Blast > module. A few weeks ago the hsp.match returned me the following: > > ATGGCA++TGG > > But now it gives me: > > ATGGCA TGG > > I can't see the number of gaps anymore, anyone a solution for this? Are you using the online version of blast with Biopython? Perhaps the NCBI changed something. Are you parsing the XML output or the plain text? Can you provide any more information (e.g. which version of Biopython). Thanks Peter From luca.beltrame at unimi.it Thu Jan 17 09:12:55 2008 From: luca.beltrame at unimi.it (Luca Beltrame) Date: Thu, 17 Jan 2008 15:12:55 +0100 Subject: [BioPython] KEGG Gene parser? Message-ID: <200801171512.55898.luca.beltrame@unimi.it> Hello. I'd like to know if there is a parser that can parse KEGG gene entries. As far as I can see, Bio.KEGG can only do Compound and Enzyme. Should there be the need I'm thinking about writing one, but since in 2004 someone had posted something (now no longer available), I'm asking the list first. Thanks. From lueck at ipk-gatersleben.de Mon Jan 21 06:21:52 2008 From: lueck at ipk-gatersleben.de (=?iso-8859-1?Q?Stefanie_L=FCck?=) Date: Mon, 21 Jan 2008 12:21:52 +0100 Subject: [BioPython] blastall questions (output, full length subject) Message-ID: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de> Hi! I need again some advice for a local blast with blastall. First of all, everything works fine, I just have some questions on how to continue: 1) How can I see the full length of the subject? 
I always can see only this part, which is matching with the query.

2) How are your suggestions to continue with the xml output? I want to
sort the Hits by % of matching and my idea was it to put everything in a
dictionary (%match as key and all the rest information's as values).

Is this the right way?

Greetings
Stefanie

From winter at biotec.tu-dresden.de Mon Jan 21 08:18:15 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Mon, 21 Jan 2008 14:18:15 +0100
Subject: [BioPython] blastall questions (output, full length subject)
In-Reply-To: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de>
References: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de>
Message-ID: <47949B97.90205@biotec.tu-dresden.de>

Stefanie Lück wrote:
> Hi!
>
> I need again some advice for a local blast with blastall.
>
> First of all, everything works fine, I just have some questions on how to
> continue:
>
> 1) How can I see the full length of the subject? I always can see only this
> part, which is matching with the query.

Hi Stefanie,

you suffered from the slightly confusing naming in the BioPython
NCBIXML class. Here is an explanation:

alignment.length = total length of unaligned hit sequence
record.query_letters = length of query sequence
len(hsp.query) = len(hsp.match) = len(hsp.sbjct) = length of alignment

with

parser = NCBIXML.BlastParser()
records = parser.parse(open(blast_results_file))
for record in records:
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            # do s.th.

> 2) How are your suggestions to continue with the xml output? I want to sort
> the Hits by % of matching and my idea was it to put everything in a
> dictionary (%match as key and all the rest information's as values).
If you refer to the sequence identity percentage, you can use

sequenceIdentity = int(hsp.identities)*100/int(len(hsp.query))

To use the sequence identity as key in a dictionary, you would have to
keep a list (or set) of records as value, since different records
(hits) can have the same sequence identity.

I would recommend to just keep a set (or list) of records, and use the
key or cmp parameter of Python's sort function to sort by one field of
the record:
http://wiki.python.org/moin/HowTo/Sorting

If you only need some information of the record, it might be even
easier to store this information in a list, and keep a set (or list)
of these lists.

HTH,
Christof

PS: Maybe we could enrich NCBIXML.py for some more meaningful
variables?

>
> Is this the right way?
>
> Greetings
>
> Stefanie
>
> _______________________________________________ BioPython mailing list -
> BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From biopython at maubp.freeserve.co.uk Mon Jan 21 10:15:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Jan 2008 15:15:45 +0000
Subject: [BioPython] KEGG Gene parser?
In-Reply-To: <200801171512.55898.luca.beltrame@unimi.it>
References: <200801171512.55898.luca.beltrame@unimi.it>
Message-ID: <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com>

On Jan 17, 2008 Luca Beltrame wrote:
> Hello.
>
> I'd like to know if there is a parser that can parse KEGG gene entries. As far
> as I can see, Bio.KEGG can only do Compound and Enzyme.

And there is also Bio.KEGG.Map, but you are right, there doesn't seem
to be anything for KEGG gene entries.

> Should there be the need I'm thinking about writing one, but since in 2004
> someone had posted something (now no longer available), I'm asking the list
> first.

It looks like no-one else is working on any KEGG code, so if you still
want to write something it could be useful.
Are you happy to write this in a similar style to the existing Bio.KEGG modules, and put together some basic documentation and a test case too? Peter From jkhilmer at gmail.com Tue Jan 22 14:41:07 2008 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Tue, 22 Jan 2008 12:41:07 -0700 Subject: [BioPython] KEGG Gene parser? In-Reply-To: <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com> References: <200801171512.55898.luca.beltrame@unimi.it> <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com> Message-ID: <81277ce10801221141ya4f0d3fr87858102274d6e2e@mail.gmail.com> Luca, My lab also has interest in KEGG gene entries. Although I have minimal experience in professional Python programming, I would be happy to help in any way: perhaps testing etc. Jonathan Hilmer Bothner Research Group Montana State University On Jan 21, 2008 8:15 AM, Peter wrote: > On Jan 17, 2008 Luca Beltrame wrote: > > Hello. > > > > I'd like to know if there is a parser that can parse KEGG gene entries. As far > > as I can see, Bio.KEGG can only do Compound and Enzyme. > > And there is also Bio.KEGG.Map, but you are right, there doesn't seem > to be anything for KEGG gene entries. > > > Should there be the need I'm thinking about writing one, but since in 2004 > > someone had posted something (now no longer available), I'm asking the list > > first. > > It looks no-else is working on any KEGG code, so if you still want to > write something it could be be useful. Are you happy to write this in > a similar style to the existing Bio.KEGG modules, and put together > some basic documentation and a test case too? 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From bsantos at biocant.pt Wed Jan 23 12:55:18 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Wed, 23 Jan 2008 17:55:18 +0000 Subject: [BioPython] Problems runing BLAST Message-ID: <20080123175518.eab8a089@mail.biocant.pt> Hi I use to run blastall without any problems, but now I have moved all my scripts to a server runing Fedora Core 6 and now I get the folowing error when parsing the blast results: Traceback (most recent call last): File "/usr/local/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 568, in parse raise SyntaxError("Your XML file did not start References: <20080123175518.eab8a089@mail.biocant.pt> Message-ID: On Jan 23, 2008 3:55 PM, Bruno Santos wrote: > raise SyntaxError("Your XML file did not start SyntaxError: Your XML file did not start References: <20080123175518.eab8a089@mail.biocant.pt> Message-ID: <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com> On 1/23/08, Sebastian Bassi wrote: > On Jan 23, 2008 3:55 PM, Bruno Santos wrote: > > raise SyntaxError("Your XML file did not start > SyntaxError: Your XML file did not start > Can you show us the result of: > head your_xml_file.xml Seeing the start of the XML file would be very helpful. And if is empty, what has been written to the error handle? I would guess maybe the database is in a new location or something simple like that... print error_info.read() Another thing to check is the version of Biopython on the new machine. Earlier versions would default to asking blast for plain text output instead of XML. 
Peter From bsantos at biocant.pt Fri Jan 25 07:15:56 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Fri, 25 Jan 2008 12:15:56 -0000 Subject: [BioPython] Problems runing BLAST In-Reply-To: <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com> References: <20080123175518.eab8a089@mail.biocant.pt> <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com> Message-ID: <000301c85f4c$0bd830d0$23889270$@pt> I wasn't using any XML file as intermediate, I was parsing the blast results directly. But it was really a problem with the databases. Now it's solved. My question now is another one, I'm blasting a multifasta file, so I need to know which results belongs to which query sequence ID. I Know I can simply assume that the blast result is ordered according to the sequences in the fasta file, but is any other away to obtain the query ID directly using the Blast Record class? Thanks in advance, Bruno Santos -----Mensagem original----- De: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] Em nome de Peter Enviada: quarta-feira, 23 de Janeiro de 2008 21:07 Para: Sebastian Bassi Cc: Bruno Santos; biopython at biopython.org Assunto: Re: [BioPython] Problems runing BLAST On 1/23/08, Sebastian Bassi wrote: > On Jan 23, 2008 3:55 PM, Bruno Santos wrote: > > raise SyntaxError("Your XML file did not start > SyntaxError: Your XML file did not start > Can you show us the result of: > head your_xml_file.xml Seeing the start of the XML file would be very helpful. And if is empty, what has been written to the error handle? I would guess maybe the database is in a new location or something simple like that... print error_info.read() Another thing to check is the version of Biopython on the new machine. Earlier versions would default to asking blast for plain text output instead of XML. 
Peter From winter at biotec.tu-dresden.de Fri Jan 25 08:02:06 2008 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Fri, 25 Jan 2008 14:02:06 +0100 Subject: [BioPython] Problems runing BLAST In-Reply-To: <000301c85f4c$0bd830d0$23889270$@pt> References: <20080123175518.eab8a089@mail.biocant.pt> <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com> <000301c85f4c$0bd830d0$23889270$@pt> Message-ID: <4799DDCE.1030205@biotec.tu-dresden.de> Bruno Santos wrote: > I wasn't using any XML file as intermediate, I was parsing the blast results > directly. But it was really a problem with the databases. Now it's solved. > > My question now is another one, I'm blasting a multifasta file, so I need to > know which results belongs to which query sequence ID. I Know I can simply > assume that the blast result is ordered according to the sequences in the > fasta file, but is any other away to obtain the query ID directly using the > Blast Record class? record.query? Try exploring your Blast Record instance on a Python shell with the dir function: >>> record >>> dir(record) ['__doc__', '__init__', '__module__', '_num_letters_in_database', 'alignments', 'application', 'blast_cutoff', 'database', 'database_length', 'database_letters', 'database_name', 'database_sequences', 'date', 'descriptions', 'dropoff_1st_pass', 'effective_database_length', 'effective_hsp_length', 'effective_query_length', 'effective_search_space', 'effective_search_space_used', 'expect', 'filter', 'frameshift', 'gap_penalties', 'gap_trigger', 'gap_x_dropoff', 'gap_x_dropoff_final', 'gapped', 'hsps_gapped', 'hsps_no_gap', 'hsps_prelim_gapped', 'hsps_prelim_gapped_attemped', 'ka_params', 'ka_params_gap', 'matrix', 'multiple_alignment', 'num_good_extends', 'num_hits', 'num_letters_in_database', 'num_seqs_better_e', 'num_sequences', 'num_sequences_in_database', 'posted_date', 'query', 'query_id', 'query_length', 'query_letters', 'reference', 'sc_match', 'sc_mismatch', 'threshold', 'version', 
'window_size']

Cheers,
Christof

>
> Thanks in advance,
> Bruno Santos

From mjldehoon at yahoo.com Fri Jan 25 08:04:38 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Jan 2008 05:04:38 -0800 (PST)
Subject: [BioPython] Bio.EUtils
Message-ID: <8786.65209.qm@web62404.mail.re1.yahoo.com>

Hello everybody,

I am looking at the various ways Biopython interacts with NCBI's
Entrez search engine, and if possible to organize and document this a
bit more. Currently there are several modules that interact with
Entrez. The most extensive one is Bio.EUtils, but there are also
simpler modules such as Bio.WWW.NCBI. I was wondering:

1) Is anybody using Bio.EUtils?
2) If so, could you give an example script that uses Bio.EUtils?

So we can get an idea of the amount of overlap between Bio.EUtils and
Bio.WWW.NCBI and others.

Thanks!

--Michiel.

From mjldehoon at yahoo.com Sat Jan 26 00:38:01 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Jan 2008 21:38:01 -0800 (PST)
Subject: [BioPython] Bio.EUtils
In-Reply-To:
Message-ID: <367303.23759.qm@web62406.mail.re1.yahoo.com>

Dear Rohini,

Thank you for your example. It was very helpful. Just a few questions
about it:

> dbinfo = EUtils.databases['pubmed']

Is this statement needed? The variable dbinfo is not used in your
example, and the example works fine without this statement.

> Then parse the xml or text lines.

Do you parse the xml or text output yourself, or do you use any
Biopython tools for that?

The following does almost the same with Bio.WWW.NCBI instead of
Bio.EUtils:

>>> from Bio.WWW import NCBI
>>> lines = NCBI.efetch(db='pubmed', id=listids, retmode='xml' ).readlines()
# or retmode='text'

I am saying "almost" the same, because currently Bio.WWW.NCBI.efetch
does not handle multiple listids (so it accepts listids = '18211820'
but not listids = ['18211820', '18211718', '18178374']).
However, this can be fixed very easily in Biopython.

My last question is: Is this sufficient for your needs? Or do you see some advantage to using Bio.EUtils over Bio.WWW.NCBI?

Thanks again,
--Michiel.

Rohini Damle wrote:
Hi,
Here is how I use Bio.EUtils:

from Bio import EUtils
from Bio.EUtils import DBIdsClient
dbinfo = EUtils.databases['pubmed']
#listids is a list of pubmed ids
record = DBIdsClient.from_dbids(EUtils.DBIds("pubmed",listids))
rec2= record.efetch(retmode="xml",rettype=None).readlines()
# or, if you want to parse the abstract in text format:
# rec2= record.efetch(retmode="text", rettype="abstract").readlines()

Then parse the xml or text lines.

Thanks
-Rohini.

On Jan 25, 2008 5:04 AM, Michiel de Hoon wrote:
Hello everybody,

I am looking at the various ways Biopython interacts with NCBI's Entrez search engine, and if possible to organize and document this a bit more. Currently there are several modules that interact with Entrez. The most extensive one is Bio.EUtils, but there are also simpler modules such as Bio.WWW.NCBI.

I was wondering:
1) Is anybody using Bio.EUtils?
2) If so, could you give an example script that uses Bio.EUtils? So we can get an idea of the amount of overlap between Bio.EUtils and Bio.WWW.NCBI and others.

Thanks!
--Michiel.

_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From rjalves at igc.gulbenkian.pt Mon Jan 28 04:58:50 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 09:58:50 +0000
Subject: [BioPython] Translation issues
Message-ID: <479DA75A.6070804@igc.gulbenkian.pt>

Hi.

I'm trying to automate and validate the process of translation in sequences downloaded from NCBI.

Basically I fetch a GenBank file, extract the DNA sequences and use the Translation module of BioPython to check whether the translation matches. The problem is that the starting amino acid in NCBI is always M, but with the Translation module it isn't, even if the codon is marked as "starting" in the corresponding codon table.
So for instance, the sequence : "TTGGATTATTTAATAGAGGGTTTAAGTTATAATCCTGTAGACCACACAGCTACATCTGGACCAACTGTAATGGAAGCTGCACTGATTGCTAA ACATGTTTATTCAGGGGAAAAAGGAGATGAATTACCCGGTGGATGGAAAATGCTTGAAGATCCATATATGGTTGGAGGTCTTCGAATGGGC GTATATGGGAGAAAAGGTGAGGATGGAGAGATGGAATATGTAATTGCAAATGCAGGAACAGAACCTACTAGTTTGATAGATTGGGAGAATA ATTTGAAACAACCTTTTGGGAAATCAGAAGATATGAAAAATTCTTTAGCTTTTGTTGAAGAGTTTATGAAAAACAATCCAAGTATTAATGTAA CATTTGTTGGACATTCAAAAGGTGGGGCTGAAGCAGCTGCAAATGCGGTACTTACAAATAGGAATGCAATACTATTTAATCCTGCCACAGTG AACTTAGAATCATATTTAAAGCCATATGGTGTGAACAAGTCAAATTATACTGCTGAGATGACGGCATTTATTGTAGAAGACGAAATTTTGAATA ATATCTTTGGATTTATATCAACGCCGATAGACAAGGTAGTTTATTTACCCAGACAGCATTCTTTTTTCATATCGATTCCACTTATAGATATGGTA AATTCGATTCGAAATCATTCGATGGATGCAACGATAAAGGCAATAGAAGAATGGGAGGAAAATAGACAATGA" with codon table 11 will translate to: a="LDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ" while the translation on the GenBank file is: b="MDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ" causing the test a == b to fail. The sequences are exactly the same with the exception of the initial aminoacid I could do the test in other ways and remove the initial letter, but that wouldn't work globally. So, is this the right behavior or am I missing something? Any other suggestions to do this test will also help. 
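One way to make the comparison described above tolerant of an alternative start codon is sketched here. It is only an illustration, not part of Biopython: the helper name matches_annotation is hypothetical, and the set of start codons for table 11 is written out by hand as an assumption that should be checked against the codon table actually in use.

```python
# Start codons assumed for NCBI translation table 11 (bacterial);
# this list is an assumption, verify it against your codon table.
TABLE11_START_CODONS = frozenset(
    ["TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"])


def matches_annotation(dna, translated, annotated,
                       start_codons=TABLE11_START_CODONS):
    """Compare a naive translation with an annotated protein.

    The first residue may differ only when the annotation starts
    with M and the first codon of the DNA is a listed start codon.
    """
    if not translated or len(translated) != len(annotated):
        return False
    # Everything after the first residue must agree exactly.
    if translated[1:] != annotated[1:]:
        return False
    if translated[0] == annotated[0]:
        return True
    return annotated[0] == "M" and dna[:3].upper() in start_codons


# The first six codons of the sequence in this message.
dna = "TTGGATTATTTAATAGAG"
a = "LDYLIE"   # naive translation, TTG read as L
b = "MDYLIE"   # annotated translation, TTG read as the start M
same_protein = matches_annotation(dna, a, b)
```

A mismatch anywhere after the first residue, or a first codon that is not a listed start codon, still makes the check fail.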
Thanks
--
Renato Alves

From biopython at maubp.freeserve.co.uk Mon Jan 28 05:40:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 28 Jan 2008 10:40:28 +0000
Subject: [BioPython] Translation issues
In-Reply-To: <479DA75A.6070804@igc.gulbenkian.pt>
References: <479DA75A.6070804@igc.gulbenkian.pt>
Message-ID: <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>

On 1/28/08, Renato Alves wrote:
> Hi.
>
> I'm trying to automate and validate the process of translation in
> sequences downloaded from NCBI. ... The problem is
> that the starting aminoacid in NCBI is always M but with the Translation
> module isn't, even if the codon is marked as "starting" in the
> corresponding codon table.
>
> So, is this the right behavior or am I missing something?

Sadly, that is just the way the translation module works. This is a fairly common problem, and it's one I was planning to try and "fix" as part of Bug 2381
http://bugzilla.open-bio.org/show_bug.cgi?id=2381

I would like some comments on the ideas on that bug - for example would you prefer separate methods/functions for blind translation, translation until a stop codon, and translation from a start codon which is treated as an M - or a single method with lots of optional arguments?

> Any other suggestions to do this test will also help.

Right now, I would check the start codon yourself and then use an M when translating the sequence. Remember the codon table (table 11 in your example) should have all the valid start codons defined.

Peter

From bsouthey at gmail.com Mon Jan 28 09:42:22 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 28 Jan 2008 08:42:22 -0600
Subject: [BioPython] Translation issues
In-Reply-To: <479DA75A.6070804@igc.gulbenkian.pt>
References: <479DA75A.6070804@igc.gulbenkian.pt>
Message-ID:

Hi,
Please see: http://en.wikipedia.org/wiki/Start_codon
"In addition to AUG, alternative start codons, mainly GUG and UUG are used in prokaryotes. For example E.
coli uses 77% ATG (AUG), 14% GTG (GUG), 8% TTG (UUG) and a few others." Really the only way is to compare the sequences after the first position (a[1:]==b[1:]) assuming you expect an exact match. Alternatively you need to perform some type of alignment and flag unexpected differences. Regards Bruce On Jan 28, 2008 3:58 AM, Renato Alves wrote: > Hi. > > I'm trying to automate and validate the process of translation in > sequences downloaded from NCBI. > > Basically I fetch a GenBank file, extract the DNA sequences and use the > Translation module of BioPython to check if it matches. The problem is > that the starting aminoacid in NCBI is always M but with the Translation > module isn't, even if the codon is marked as "starting" in the > corresponding codon table. > > So for instance, the sequence : > > "TTGGATTATTTAATAGAGGGTTTAAGTTATAATCCTGTAGACCACACAGCTACATCTGGACCAACTGTAATGGAAGCTGCACTGATTGCTAA > ACATGTTTATTCAGGGGAAAAAGGAGATGAATTACCCGGTGGATGGAAAATGCTTGAAGATCCATATATGGTTGGAGGTCTTCGAATGGGC > GTATATGGGAGAAAAGGTGAGGATGGAGAGATGGAATATGTAATTGCAAATGCAGGAACAGAACCTACTAGTTTGATAGATTGGGAGAATA > ATTTGAAACAACCTTTTGGGAAATCAGAAGATATGAAAAATTCTTTAGCTTTTGTTGAAGAGTTTATGAAAAACAATCCAAGTATTAATGTAA > CATTTGTTGGACATTCAAAAGGTGGGGCTGAAGCAGCTGCAAATGCGGTACTTACAAATAGGAATGCAATACTATTTAATCCTGCCACAGTG > AACTTAGAATCATATTTAAAGCCATATGGTGTGAACAAGTCAAATTATACTGCTGAGATGACGGCATTTATTGTAGAAGACGAAATTTTGAATA > ATATCTTTGGATTTATATCAACGCCGATAGACAAGGTAGTTTATTTACCCAGACAGCATTCTTTTTTCATATCGATTCCACTTATAGATATGGTA > AATTCGATTCGAAATCATTCGATGGATGCAACGATAAAGGCAATAGAAGAATGGGAGGAAAATAGACAATGA" > > with codon table 11 will translate to: > > a="LDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN > LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG > FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ" > > while the translation on the GenBank file is: > > 
b="MDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN > LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG > FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ" > > causing the test a == b to fail. The sequences are exactly the same with > the exception of the initial aminoacid > > I could do the test in other ways and remove the initial letter, but > that wouldn't work globally. > > So, is this the right behavior or am I missing something? > > Any other suggestions to do this test will also help. > > Thanks > -- > Renato Alves > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From rjalves at igc.gulbenkian.pt Mon Jan 28 10:37:57 2008 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Mon, 28 Jan 2008 15:37:57 +0000 Subject: [BioPython] Translation issues In-Reply-To: <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com> References: <479DA75A.6070804@igc.gulbenkian.pt> <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com> Message-ID: <479DF6D5.7020709@igc.gulbenkian.pt> Peter wrote: > Sadly, that is the just way the translation module works. This is a > fairly common problem, and its one I was planning to try and "fix" as > part of Bug 2382 > http://bugzilla.open-bio.org/show_bug.cgi?id=2381 > In this case, I guess that something that tests if the 1st codon is a start codon and matches the codon table's start codons, would be replaced by "M". But this is a very naive and specific thing. I don't know if this could break other uses of this function. > I would like some comments on the ideas on that bug - for example > would you prefer separate methods/functions for blind translation, > translation until a stop codon, and translation from a start codon > which is treated as an M - or a single method with lots of optional > arguments? 
>
I don't have the expertise to distinguish the pros and cons of the two approaches. Still, in terms of potential user friendliness, I would go for separate methods/functions to keep the task simple and obvious.

> Right now, I would check the start codon yourself and then use an M
> when translating the sequence. Remember the codon table (table 11 in
> your example) should have all the valid start codons defined.
>
I'm adopting the technique suggested by Bruce Southey to work around this particular problem. Still, this wouldn't work on more elaborate cases like some of the ones described on the bug thread you mentioned.

Still, many thanks for the quick and clean answers.

Renato

From rjalves at igc.gulbenkian.pt Mon Jan 28 12:42:05 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 17:42:05 +0000
Subject: [BioPython] Alphabet Checking
Message-ID: <479E13ED.2080908@igc.gulbenkian.pt>

/var/lib/python-support/python2.4/Bio/Translate.py in translate_to_stop(self, seq)
     34     def translate_to_stop(self, seq):
     35         # This doesn't have a stop encoding
---> 36         assert seq.alphabet == self.table.nucleotide_alphabet, \
     37                "cannot translate from given alphabet (have %s, need %s)" %\
     38                (seq.alphabet, self.table.nucleotide_alphabet)

AssertionError: cannot translate from given alphabet (have IUPACAmbiguousDNA(), need IUPACAmbiguousDNA())

Aren't those two exactly equal?

Matching references doesn't seem to work as expected :(

What I did:

from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
from Bio import Translate
from Bio import Seq
a=Seq.Seq("ATCGGATGA...ATGCAGT",alphabet=IUPACAmbiguousDNA())
b=Translate.ambiguous_dna_by_id[11]
b.translate_to_stop(a)
... error pops out

The only workaround I was able to find is:

b.table.nucleotide_alphabet=a.alphabet

I guess this is a bad day :( It's the second clash with the Translate module in the same day :|

Should I report this as a bug?
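A plausible reading of the assertion above, offered as a sketch rather than a confirmed diagnosis: if the alphabet classes define no equality method, Python compares instances by identity, so a freshly constructed alphabet never equals the shared instance held by the codon table even though both print the same. The classes below are hypothetical stand-ins written for this illustration; they are not the Bio.Alphabet classes.

```python
class Alphabet(object):
    """Stand-in base class with no __eq__, so == falls back to identity."""

    def __repr__(self):
        return self.__class__.__name__ + "()"


class AmbiguousDNA(Alphabet):
    """Hypothetical counterpart of an ambiguous DNA alphabet class."""


# A module-level shared instance, as a codon table might hold.
shared_alphabet = AmbiguousDNA()

# A new instance, as created by calling the class in user code.
fresh_alphabet = AmbiguousDNA()

same_text = repr(fresh_alphabet) == repr(shared_alphabet)   # same printout
same_object = fresh_alphabet == shared_alphabet             # identity check
reused_ok = shared_alphabet == shared_alphabet
```

Under that assumption, passing the table's own alphabet object to the sequence, instead of constructing a new one, makes the comparison succeed, which is consistent with the workaround in the message.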
From p.j.a.cock at googlemail.com Mon Jan 28 12:56:11 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Jan 2008 17:56:11 +0000
Subject: [BioPython] Alphabet Checking
In-Reply-To: <479E13ED.2080908@igc.gulbenkian.pt>
References: <479E13ED.2080908@igc.gulbenkian.pt>
Message-ID: <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>

> Aren't those two exactly equal?
>
> Matching references doesn't seem to work as expected :(

That does look like a bug...

> The only way around I was able to find is:

Another option,

from Bio import Translate
from Bio import Seq
trans=Translate.ambiguous_dna_by_id[11]
a=Seq.Seq("ATCGGATGAATGCAGT",alphabet=trans.table.nucleotide_alphabet)
print trans.translate_to_stop(a)
print trans.translate(a)

> I guess this is a bad day :( it's the second clash with the Translate
> module in the same day :|

I don't like the Bio.Translate module either.

> Should I report this as bug?

Please do. If we do just add translation to the seq object (bug 2381) and deprecate the Bio.Translate module then in a sense this problem goes away ;)

Peter

From tiagoantao at gmail.com Mon Jan 28 13:10:56 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 28 Jan 2008 18:10:56 +0000
Subject: [BioPython] Alphabet Checking
In-Reply-To: <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>
References: <479E13ED.2080908@igc.gulbenkian.pt> <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>
Message-ID: <6d941f120801281010r6e8e829dub26a85e6a0b61983@mail.gmail.com>

On Jan 28, 2008 5:56 PM, Peter Cock wrote:
> > Aren't those two exactly equal?
> >
> > Matching references doesn't seem to work as expected :(
>
> That does look like a bug...

It is probably completely unrelated, but it might not be...
From a "helicopter view" of the code I have noticed that SeqIO uses Nexus in some cases.
I have patched a previous Nexus bug by using deepcopy, which could cause something like this:
AssertionError: cannot translate from given alphabet (have IUPACAmbiguousDNA(), need IUPACAmbiguousDNA())
(i.e., it has the same type name, but is really not the same object)

Again, it is probably unrelated (I know very little about Bio.Seq and Bio.SeqIO), but, just in case...

From mjldehoon at yahoo.com Mon Jan 28 19:43:22 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 28 Jan 2008 16:43:22 -0800 (PST)
Subject: [BioPython] Bio.EUtils
In-Reply-To:
Message-ID: <356164.74184.qm@web62403.mail.re1.yahoo.com>

Rohini Damle wrote:
> > The following does almost the same with Bio.WWW.NCBI instead of Bio.EUtils:
> > ...
> > My last question is: Is this sufficient for your needs? Or do you see some
> > advantage to using Bio.EUtils over Bio.WWW.NCBI?
>
> I guess Bio.EUtils is faster, and can be used for batch processing (like
> fetching records for a list of pubmed ids). I have not tried Bio.WWW.NCBI;
> I will try it and get back to you.

If you make the following modification in Bio.WWW.NCBI.py:

line 189: replace
options = urllib.urlencode(params)
by
options = urllib.urlencode(params, doseq=1)

then Bio.WWW.NCBI can also fetch records for a list of pubmed ids. I'm guessing that it is then as fast as (or faster than) Bio.EUtils, but I'd be interested in what you find in practice.

Thanks,
--Michiel

From bsantos at biocant.pt Tue Jan 29 06:34:07 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 29 Jan 2008 11:34:07 -0000
Subject: [BioPython] Problems runing BLAST
Message-ID: <000101c8626a$dd98e760$98cab620$@pt>

I am once more having problems running blast using biopython. When I start the script, the blastall process starts, but after a few minutes it goes to sleep and no error message is reported. When I check the xml file, it contains only part of the results for the first sequence.
Has anyone ever had the same problem?

I'm using:
python 2.5.1
biopython 1.44
blastall 2.2.16

My code is the following:

from Bio import SeqIO
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
import time
import math
import time
import os

primer = 'D2'
sample = 'AGC'

#Defines all the databases that will be used
my_blast_db = ('\"/home/bsantos/DataBases/nt.00 /home/bsantos/DataBases/nt.01 /home/bsantos/DataBases/nt.02 /home/bsantos/DataBases/nt.03 /home/bsantos/DataBases/nt.04 /home/bsantos/DataBases/nt.05 /home/bsantos/DataBases/RDPIIdb /home/bsantos/DataBases/RNADB\"')
print my_blast_db

#Define the fasta file to Blast
destination = '/home/bsantos/Metagenomics/Results/' + sample + '/' + primer + '/filteredfile_sample' + sample + '_' + primer + '_F.fasta'
my_blast_file = (destination)

#Defines the blast binaries
my_blast_exe = "/usr/local/bin/blastall"
print (os.path.exists(my_blast_exe))
print time.ctime()

#Performs Blast
print 'Now Performing Blast'
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",my_blast_db, my_blast_file)
print 'This errors have occured:'
print error_info.read()
print 'Starting parsing the results.......'

#Parse the result of the blast in XML format
blast_results = result_handle.read() #Catch the results
save_file = open('/home/bsantos/Metagenomics/Results/' + sample + '/' + primer + '/BlastReport_sample' + sample + '_' + primer + '_F.xml', 'w')
save_file.write(blast_results) #Write all the information to an XML file
save_file.flush()
save_file.close()

From biopython at maubp.freeserve.co.uk Tue Jan 29 07:15:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 29 Jan 2008 12:15:26 +0000
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <000101c8626a$dd98e760$98cab620$@pt>
References: <000101c8626a$dd98e760$98cab620$@pt>
Message-ID: <320fb6e00801290415g10e099dj108ecea15a72109c@mail.gmail.com>

Hi Bruno,

On Jan 29, 2008 11:34 AM, Bruno Santos wrote:
> I am once more having problems running blast using biopython. I start the
> script the blastall process starts and after a few minutes it starts
> sleeping and no error message is passed. When I check the xml file it only
> writes part of the results for the first sequence.

Have you tried running the same command "by hand" at the command line, to check that it works, and to time how long you should expect it to take?

> Does anyone has ever had the same problem?

I think the problem is to do with asking the operating system to read all the error output. Try commenting out this bit, and only read the error handle if you have a problem:

# print error_info.read()

Quoting from the tutorial,

>> The error info can be hard to deal with, because if you try to do a error_handle.read() and
>> there was no error info returned, then the read() call will block and not return, locking your
>> script. In my opinion, the best way to deal with the error is only to print it out if you are not
>> getting result_handle results to be parsed, but otherwise to leave it alone.
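The blocking read described above can also be avoided outside of NCBIStandalone by letting the standard library drain both pipes together. The sketch below is a general illustration under stated assumptions, not the Biopython API: run_and_capture is a hypothetical helper, and a small Python child process stands in for blastall so that the example is self-contained.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit_code, stdout_text, stderr_text).

    communicate() reads both pipes until the child exits, so a full
    stdout pipe cannot stall a script that is waiting on stderr.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out.decode(), err.decode()


# Stand-in for a long-running tool: a large report on stdout
# plus a short message on stderr.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write('x' * 200000); "
         "sys.stderr.write('warning\\n')"]

exit_code, report, errors = run_and_capture(child)
```

Because the whole report is held in memory, this suits moderate output sizes; for very large results, redirecting the tool's output to a file is likely the safer choice.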
Peter

From jblanca at btc.upv.es Wed Jan 30 04:15:49 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Wed, 30 Jan 2008 10:15:49 +0100
Subject: [BioPython] blast parse
Message-ID: <200801301015.50812.jblanca@btc.upv.es>

Hi:
I'm new on the list and to biopython. I come from perl and I'm liking python a lot.

I'm trying to read a big blast file and it takes a lot of time and memory. I'm not sure if I'm taking the most efficient path. Basically I'm doing:

blasth = file('blast.xml', 'r')
from Bio.Blast import NCBIXML
p = NCBIXML.BlastParser()
blast_parse = p.parse(blasth)
for blast_result in blast_parse:
    #do whatever

I was expecting to read the records one by one, but the call to p.parse(blasth) takes a lot of time and memory. I'm not sure what this function returns, a list or an iterator.

I've looked at the NCBIXML.py file and the BlastParser class has two parse methods (am I wrong?).

def parse(self, handler):
    """Parses the XML data

    handler -- file handler or StringIO

    This method returns a list of Blast record objects.
    """

def parse(handle, debug=0):
    """Returns an iterator a Blast record for each query.

    handle - file handle to and XML file to parse
    debug - integer, amount of debug information to print

    This is a generator function that returns multiple Blast records
    objects - one for each query sequence given to blast. The file
    is read incrementally, returning complete records as they are read in.
    """

I guess that the first function would read the complete file before returning anything, but the second should read and return the records one by one. I don't know if this guess is correct. Is there another way to read these huge blast files without using so much memory?

Best regards,

-- Jose M.
Blanca Postigo
Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From mjldehoon at yahoo.com Wed Jan 30 04:56:56 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 30 Jan 2008 01:56:56 -0800 (PST)
Subject: [BioPython] blast parse
In-Reply-To: <200801301015.50812.jblanca@btc.upv.es>
Message-ID: <940738.9737.qm@web62407.mail.re1.yahoo.com>

Dear Jose,

To get the records one-by-one, use

from Bio.Blast import NCBIXML
blast_parse = NCBIXML.parse(blasth)
for blast_result in blast_parse:
    # do whatever with blast_result

This avoids having to read the complete XML file all at once.

To the developers: We should probably think about removing the NCBIXML.BlastParser.parse, and perhaps adding a NCBIXML.read function to read exactly one record from the XML file.

--Michiel.

Jose Blanca wrote:
Hi:
I'm new on the list and on biopython. I come from perl and I'm liking python a lot.

I'm trying to read a big blast file and it takes a lot o time and memory. I'm not sure if I'm taking the most efficient path. Basically I'm doing:

blasth = file('blast.xml', 'r')
from Bio.Blast import NCBIXML
p = NCBIXML.BlastParser()
blast_parse = p.parse(blasth)
for blast_result in blast_parse:
    #do whatever

I was expecting to read the records one by one, but the call to p.parse(blasth) takes a lot of time and memory. I'm not sure about what this function returns, a list or an iterator.

I've looked at the NCBIXML.py file and the BlastParser class has two parse methods (am I wrong?).

def parse(self, handler):
    """Parses the XML data

    handler -- file handler or StringIO

    This method returns a list of Blast record objects.
    """

def parse(handle, debug=0):
    """Returns an iterator a Blast record for each query.

    handle - file handle to and XML file to parse
    debug - integer, amount of debug information to print

    This is a generator function that returns multiple Blast records
    objects - one for each query sequence given to blast. The file
    is read incrementally, returning complete records as they are read in.
    """

I guess that the first function would read the complete file before returning anything, but the second should return and read the records one by one. I don't know if this guess is correct. Is there other way to read these huge blast files without using so much memory?

Best regards,

--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From lueck at ipk-gatersleben.de Wed Jan 30 05:24:55 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Wed, 30 Jan 2008 11:24:55 +0100
Subject: [BioPython] Clustalw pair wise alignment
Message-ID: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>

Hi!

I'm working with clustalw and everything works fine. Now I have some questions:

1) Must the input data be in a file or can it also be in the code (e.g. in a list)?

2) I ask because I want to do many (up to hundreds of) pairwise alignments of short sequences and I don't want to store each pair in a separate file.

If I have them in one file, clustalw makes a multiple alignment:

Match1          ------CAAGATTTGAGCACCACAGGCAA---
full1           ------CAAGATTTGAGCACCACAGGCAACAG
Match0          AGCCTTCAAGATTTGAGCACCACAG-------
full0           AGCCTTCAAGATTTGAGCACCACAG-------

whereas Match1 should only align to full1 and so on.

Could someone give a hint?
Regards
Stefanie

From biopython at maubp.freeserve.co.uk Wed Jan 30 06:47:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 30 Jan 2008 11:47:42 +0000
Subject: [BioPython] Clustalw pair wise alignment
In-Reply-To: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>
References: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801300347h6f1ec197qc599ec9f2c80bab@mail.gmail.com>

Hi Stefanie

> I working with clustalw and everything works fine. No I have some questions:
>
> 1) Must the input data be in a file or can it also be in the code (e.g. in a list)?

I believe for the Clustalw command line tool, you have to supply the input data in a file.

> 2) Because, I want to do many (up to hundreds) pair wise
> alignments (short sequences) and I don't want to store
> each of them in a separate file.
>
> If I have it in one file, clustalw make a multiple alignment:

Yes, that is expected for clustalw.

> Could someone give a hint?

If you want to use Clustalw, you could re-use a temporary file for each pair of sequences (rather than creating hundreds of different input files).

I would consider using the EMBOSS tools "needle" or "water" for doing pairwise alignments. These have the advantage that you can actually supply the sequence as part of the command line (provided they are not too long).
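The idea of re-using one temporary input file for every pair could look roughly like the sketch below. The names write_pair_fasta, align_pairs and count_records are hypothetical helpers written for this illustration; the actual aligner call is left to a function supplied by the caller, and count_records merely stands in for it so the example runs without Clustalw installed.

```python
import os
import tempfile


def write_pair_fasta(path, name1, seq1, name2, seq2):
    """Overwrite path with a two-record FASTA file."""
    with open(path, "w") as handle:
        handle.write(">%s\n%s\n>%s\n%s\n" % (name1, seq1, name2, seq2))


def align_pairs(pairs, run_aligner):
    """Call run_aligner(path) once per pair, re-using one temp file."""
    fd, path = tempfile.mkstemp(suffix=".fasta")
    os.close(fd)
    results = []
    try:
        for (name1, seq1), (name2, seq2) in pairs:
            write_pair_fasta(path, name1, seq1, name2, seq2)
            results.append(run_aligner(path))
    finally:
        os.remove(path)   # one file to clean up, however many pairs
    return results


def count_records(path):
    """Stand-in for the aligner: count FASTA records in the file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


pairs = [
    (("Match0", "AGCCTTCAAGATTTGAGCACCACAG"),
     ("full0", "AGCCTTCAAGATTTGAGCACCACAG")),
    (("Match1", "CAAGATTTGAGCACCACAGGCAA"),
     ("full1", "CAAGATTTGAGCACCACAGGCAACAG")),
]
record_counts = align_pairs(pairs, count_records)
```

In real use, run_aligner would invoke the alignment program on the file and read back its output before the next pair overwrites the input.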
See http://emboss.sourceforge.net/apps/ and also http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html#asis Peter From winter at biotec.tu-dresden.de Wed Jan 30 07:48:34 2008 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Wed, 30 Jan 2008 13:48:34 +0100 Subject: [BioPython] blast parse In-Reply-To: <940738.9737.qm@web62407.mail.re1.yahoo.com> References: <940738.9737.qm@web62407.mail.re1.yahoo.com> Message-ID: <47A07222.9000200@biotec.tu-dresden.de> Michiel de Hoon wrote: > Dear Jose, > > To get the records one-by-one, use > > from Bio.Blast import NCBIXML blast_parse = NCBIXML.parse(blasth) for > blast_result in blast_parse: # do whatever with blast_result > > This avoids having to read the complete XML file all at once. > > To the developers: We should probably think about removing the > NCBIXML.BlastParser.parse, and perhaps adding a NCBIXML.read function to read > exactly one record from the XML file. I thinks removing NCBIXML.BlastParser.parse is a good idea. We should keep it simple. Christof From wolfgang.meyer at gmail.com Tue Jan 1 17:33:41 2008 From: wolfgang.meyer at gmail.com (Wolfgang Meyer) Date: Tue, 1 Jan 2008 18:33:41 +0100 Subject: [BioPython] residue sequence number length (no more than 4 digits) Message-ID: Hi, According to PDB format (old), residue sequence number length should be no longer than 4 digits. ... 23 - 26 Integer resSeq Residue sequence number. ... However, Bio.PDB.Residue.__init__(...) does not check the length of this parameter, neither does Bio.PDB.PDBIO. Though Bio.PDB.PDBIO tries to restrict the length of residue sequence number to 4 in the format string: _ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c %8.3f%8.3f%8.3f%6.2f%6.2f %4s%2s%2s\n" This does not prevent a residue sequence number longer than 4 digits to be written into a PDB file by PDBIO. Such a PDB file would be considered false by many PDB file parsers. 
Of course users should be responsible to feed residue sequence number of valid length to a residue. However, wouldn't it be better to handle some careless input of wrong residue sequence number in BioPython? Thanks! -- Wolfgang Meyer From hlapp at gmx.net Tue Jan 1 23:25:39 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 1 Jan 2008 18:25:39 -0500 Subject: [BioPython] [BioSQL-l] Authority in biodatabase table In-Reply-To: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com> References: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com> Message-ID: (Sorry for this long-too-late reply. Going through old email that got left unread or unresponded.) Peter - you probably implemented something meanwhile that suits your needs. Just FYI, BioPerl leaves this empty too. The general notion for authority is that of the LSID authority field, but of course you won't be able to parse this out of any input file. The value for SwissProt would be uniprot.org, for example. For NCBI, I'm not sure - NCBI hasn't ever issued any LSIDs, but presumably it would be something like ncbi.nlm.nih.gov. -hilmar On Nov 26, 2007, at 2:10 PM, Peter wrote: > Thank's for all the replies on the db_xref issue. > > Today I'd like to ask if there are any established guidelines for the > biodatabase table - in particular for how to use the "authority" field > in the biodatabase table, and if there is any agreed terminology for > the named "sub databases" defined therein i.e. what should I call them > in our documentation. > > By default, unless the user specifies an authority, we end up with a > NULL when creating entries in the biodatabase table using Biopython. 
> For example: > >> from BioSQL import BioSeqDatabase > server = BioSeqDatabase.open_database(driver="MySQLdb", user="root", > passwd = "", host = "localhost", db="bioseqdb") > db = server.new_database("orchids", description="Just for testing") > server.adaptor.commit() > > I'd like to give some sensible defaults in any worked examples. Apart >> from simple test cases (like above), sensible examples that came to > mind would be creating a "sub database" to contain: > (*) an entire GenBank release > (*) the latest SwissProt release > > What would you use in these cases. In fact, what does your > biodatabase table contain right now? > > Thank you all, > > Peter > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From lee.byung-chul at kaist.ac.kr Wed Jan 2 11:00:37 2008 From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul) Date: Wed, 02 Jan 2008 20:00:37 +0900 Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format Message-ID: <477B6ED5.8080005@kaist.ac.kr> Dear colleagues. I want to use the AlignInfo.SummaryInfo for fasta-format alignment file. I think that to do the process firstly the fasta format should be converted to clustalw format, so I try to use Formatconverter. However, at my trial, I cannot do that. I did like below: ---- #!/usr/bin/env python from Bio import Fasta from Bio.Align.FormatConvert import FormatConverter from Bio.Alphabet import IUPAC alignment = Fasta.FastaAlign.parse_file('tmp.fasta',type='PROTEIN') converter = FormatConverter(alignment) clw_align = converter.to_clustal() print clw_align ---- and tmp.fasta is --- >seq2 DAC >seq3 DC- >seq1 DAD >seq4 DDD But error occured. 
The error messages are below:
---
Traceback (most recent call last):
  File "tmp.py", line 7, in <module>
    alignment = Fasta.FastaAlign.parse_file('tmp.fasta', type='PROTEIN')
  File "/var/lib/python-support/python2.5/Bio/Fasta/FastaAlign.py", line 48, in parse_file
    cur_align = iterator.next()
  File "/var/lib/python-support/python2.5/Bio/Fasta/__init__.py", line 72, in next
    result = self._iterator.next()
  File "/var/lib/python-support/python2.5/Martel/IterParser.py", line 152, in iterateFile
    self.header_parser.parseString(rec)
  File "/var/lib/python-support/python2.5/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 0
-----

What should I do? Could you advise me?

Thank you!

Byung chul Lee

From biopython at maubp.freeserve.co.uk Wed Jan 2 11:54:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 11:54:34 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
In-Reply-To: <477B6ED5.8080005@kaist.ac.kr>
References: <477B6ED5.8080005@kaist.ac.kr>
Message-ID: <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>

Hello Byung chul Lee,

On 1/2/08, Lee,Byung-chul wrote:
>
> Dear colleagues.
>
> I want to use the AlignInfo.SummaryInfo for fasta-format alignment file.
> I think that to do the process firstly the fasta format should be
> converted to clustalw format, so I try to use FormatConverter.
> However, at my trial, I cannot do that.

Once you have an alignment object (loaded from any file format), this
should work with AlignInfo. I don't think you need to convert it from
FASTA to ClustalW.

I would guess the error you saw is a problem with Biopython/Martel and
mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0.
What version of Biopython are you using, as I would have expected this
to work fine with Biopython 1.44?

You could also try using Bio.SeqIO to load the FASTA format alignment
file instead, see http://biopython.org/wiki/SeqIO

from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)

Peter

From biopython at maubp.freeserve.co.uk Wed Jan 2 11:57:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 11:57:46 +0000
Subject: [BioPython] [BioSQL-l] Authority in biodatabase table
In-Reply-To:
References: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com>
Message-ID: <320fb6e00801020357g724917b5s853d99f2f953753a@mail.gmail.com>

On 1/1/08, Hilmar Lapp wrote:
> (Sorry for this long-too-late reply. Going through old email that got
> left unread or unresponded.)
>
> Peter - you probably implemented something meanwhile that suits your
> needs. Just FYI, BioPerl leaves this empty too. The general notion
> for authority is that of the LSID authority field, but of course you
> won't be able to parse this out of any input file. The value for
> SwissProt would be uniprot.org, for example. For NCBI, I'm not sure -
> NCBI hasn't ever issued any LSIDs, but presumably it would be
> something like ncbi.nlm.nih.gov.
>
> -hilmar

Thank you Hilmar. It seems that the current code in Biopython is fine
(the authority field is left blank by default, unless the user supplies
their own value), and consistent with both BioPerl and BioJava in this
regard (thanks Richard).
Peter

From lee.byung-chul at kaist.ac.kr Wed Jan 2 13:44:47 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Wed, 02 Jan 2008 22:44:47 +0900
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
In-Reply-To: <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
References: <477B6ED5.8080005@kaist.ac.kr> <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
Message-ID: <477B954F.9020004@kaist.ac.kr>

Thank you very much for your kind reply, Peter.

As your explanation, I tried to use SeqIO, but another error occurred.
I did it like below:
-----------------
from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
--------------------
but the results are
-----------------
Traceback (most recent call last):
  File "tmp.py", line 16, in <module>
    print summary_align.dumb_consensus()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 111, in dumb_consensus
    consensus_alpha = self._guess_consensus_alphabet()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 189, in _guess_consensus_alphabet
    ("Non-gapped alphabet found in alignment object.")
ValueError: Non-gapped alphabet found in alignment object.
---------------------

In addition, all sequences have the same length in my tmp.fasta file.
-----
>seq2
DAC
>seq3
DC-
>seq1
DAD
>seq4
DDD

Is this problem caused by the Biopython/Martel and mxTextTools versions?
I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

What should I do for this?

Thanks.

Byung chul.

Peter wrote:
> Hello Byung chul Lee,
>
> On 1/2/08, Lee,Byung-chul wrote:
>
>> Dear colleagues.
>>
>> I want to use the AlignInfo.SummaryInfo for fasta-format alignment file.
>> I think that to do the process firstly the fasta format should be
>> converted to clustalw format, so I try to use FormatConverter.
>> However, at my trial, I cannot do that.
>>
>
> Once you have an alignment object (loaded from any file format), this
> should work with AlignInfo. I don't think you need to convert it from
> FASTA to ClustalW.
>
> I would guess the error you saw is a problem with Biopython/Martel and
> mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0.
> What version of Biopython are you using, as I would have expected this
> to work fine with Biopython 1.44?
>
> You could also try using Bio.SeqIO to load the FASTA format alignment
> file instead, see http://biopython.org/wiki/SeqIO
>
> from Bio import SeqIO
> from Bio.Align import AlignInfo
> alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
> summary_align = AlignInfo.SummaryInfo(alignment)
>
> Peter
>
>

From biopython at maubp.freeserve.co.uk Wed Jan 2 17:46:25 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 17:46:25 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
In-Reply-To: <477B954F.9020004@kaist.ac.kr>
References: <477B6ED5.8080005@kaist.ac.kr> <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com> <477B954F.9020004@kaist.ac.kr>
Message-ID: <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>

On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> As your explanation, I tried to use SeqIO, but another error occurred.
> I did it like below:

My fault, sorry. I wasn't at a computer with Biopython installed, I
had to guess. I'll try and put together a proper example for you
tomorrow.

> Is this problem caused by the Biopython/Martel and mxTextTools versions?
> I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

The original problem you reported was due to the combination of
Biopython 1.43 (the Martel module) and mxTextTools 3.0. You can either
update to Biopython 1.44 or downgrade to mxTextTools 2.0 - neither is
going to be very simple if you want to use the Ubuntu repositories.
To avoid this Martel problem, I would suggest you un-install Biopython
1.43 from the Ubuntu repository, and then install Biopython 1.44 from
source.

Peter

From biopython at maubp.freeserve.co.uk Fri Jan 4 13:20:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jan 2008 13:20:26 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
In-Reply-To: <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>
References: <477B6ED5.8080005@kaist.ac.kr> <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com> <477B954F.9020004@kaist.ac.kr> <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>
Message-ID: <320fb6e00801040520i11c9a4c4q4449cee34da00706@mail.gmail.com>

On Jan 2, 2008 5:46 PM, Peter wrote:
> On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> > As your explanation, I tried to use SeqIO, but another error occurred.
> > I did it like below:
>
> My fault, sorry. I wasn't at a computer with Biopython installed, I
> had to guess. I'll try and put together a proper example for you
> tomorrow.

This should work on Biopython 1.43 or later, I have tested it using the
simple FASTA file you gave earlier:

from Bio.Alphabet.IUPAC import IUPACProtein
from Bio.Alphabet import Gapped
from Bio import SeqIO
from Bio.Align import AlignInfo

gapped_protein = Gapped(IUPACProtein())
records = list(SeqIO.parse(open('tmp.fasta'), "fasta"))
for rec in records :
    #Override the default generic alphabet:
    rec.seq.alphabet = gapped_protein
#Turn these records into an alignment
alignment = SeqIO.to_alignment(records, gapped_protein)
del records

summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

The problem with my previous shorter suggestion was that the Bio.SeqIO
FASTA parser returned SeqRecord objects with a generic alphabet, while
the alignment summary expected a gapped alphabet.
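For anyone who wants to see what a threshold consensus does with the
four-sequence tmp.fasta alignment, here is a minimal pure-Python sketch. It is
not Biopython's implementation: the name simple_consensus, the 0.7 default
threshold, the 'X' placeholder and the rule of skipping gap characters are all
assumptions chosen for illustration, and it is written for a current Python 3
interpreter rather than the Python 2.5 used in this thread.

```python
from collections import Counter

# The alignment from tmp.fasta, typed in by hand (all rows the same length).
ALIGNMENT = ["DAC", "DC-", "DAD", "DDD"]


def simple_consensus(rows, threshold=0.7, ambiguous="X", gap="-"):
    """Column-wise majority consensus (hypothetical helper, not Bio.Align)."""
    if len(set(len(row) for row in rows)) != 1:
        raise ValueError("all sequences must have the same length")
    consensus = []
    for column in zip(*rows):
        # Gap characters are ignored when counting residues in a column.
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            consensus.append(ambiguous)
            continue
        ranked = counts.most_common(2)
        residue, top = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top
        total = sum(counts.values())
        # Keep the residue only if it is the unique winner and frequent enough.
        if not tied and top / total >= threshold:
            consensus.append(residue)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


print(simple_consensus(ALIGNMENT))                 # DXX
print(simple_consensus(ALIGNMENT, threshold=0.5))  # DAD
```

With the default threshold only the first column (all D) is conserved enough
to be reported; lowering the threshold to 0.5 lets the majority residue through
in the other two columns.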
I'm beginning to think that the Bio.SeqIO.parse() function should allow
an alphabet to be specified as an optional argument for this sort of
situation.

Alternatively, going back to your original code how about:

from Bio.Fasta import FastaAlign
from Bio.Align import AlignInfo

alignment = FastaAlign.parse_file('tmp.fasta',type='PROTEIN')

summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0. It
should work with older versions of Biopython using mxTextTools 2.0 as
well.

Peter

From mjldehoon at yahoo.com Sat Jan 5 08:41:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jan 2008 00:41:25 -0800 (PST)
Subject: [BioPython] Bio.Ais
Message-ID: <140129.37367.qm@web62402.mail.re1.yahoo.com>

Hi everybody,

I was checking which Biopython modules access Entrez/GenBank in any way,
and in the process found the script example_ais2.py in Bio/Ais/Examples
(this is not related to Entrez/GenBank in any way, it just caught my eye
because it imports urllib). Currently, this example script does not seem
to work:

$ python example_ais2.py
Traceback (most recent call last):
  File "example_ais2.py", line 39, in <module>
    immune = Immune( align, alphabet, 100 )
...
TypeError: 'int' object is not iterable

The directory Bio/Ais/Examples and its file example_ais2.py appear only
in CVS and are not included in Biopython releases.

Does anybody know how to fix this example? If not, what should we do
with it?

--Michiel.

From meesters at uni-mainz.de Mon Jan 7 18:13:59 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 7 Jan 2008 19:13:59 +0100
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
Message-ID: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>

Hoi,

I'd like to add 'dummy atoms' to a Bio.PDB Structure object.
So far, I have this approach:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
residue.add(new)

Here x, y, and z are floating point numbers and serial_number is an
integer. 'residue' is a 'Residue' I'm iterating over.

However, I keep getting the following error message and don't have a
clue how to proceed:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
TypeError: object of type 'module' is not callable

Does anyone have a hint for me on how to actually add an atom, or on
what's wrong here?

TIA
Christian

From biopython at maubp.freeserve.co.uk Mon Jan 7 18:55:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jan 2008 18:55:57 +0000
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
In-Reply-To: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
References: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
Message-ID: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>

Christian Meesters wrote:
> I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
> have this approach:
> ...
> new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
> TypeError: object of type 'module' is not callable
>
> Does anyone have a hint for me on how to actually add an atom, or on
> what's wrong here?

I would infer from the error that "Atom" refers to the Bio.PDB.Atom
module, rather than the Bio.PDB.Atom.Atom class. How did you do your
imports? Try this:

from Bio.PDB.Atom import Atom

Peter

From lueck at ipk-gatersleben.de Tue Jan 8 09:06:40 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 10:06:40 +0100
Subject: [BioPython] blastall does not exist at %s" % blastcmd
Message-ID: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>

Hi!

I'm trying to get a local blast running.
I proceeded as described in the cookbook
but I always get this error message:

>>>
Traceback (most recent call last):
  File "F:\Blast\blast.py", line 10, in <module>
    my_blast_db, my_blast_file)
  File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
    raise ValueError, "blastall does not exist at %s" % blastcmd
ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
<<<

>>>
My Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"F:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\bin\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                    my_blast_db, my_blast_file)
blast_results = result_handle.read()

save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
<<<

blastall.exe is in this folder (checked by os.listdir()) but can't be found from the tool.

I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

Does someone have an idea where the problem is?

Greetings
Stefanie

From biopython at maubp.freeserve.co.uk Tue Jan 8 10:46:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jan 2008 10:46:02 +0000
Subject: [BioPython] blastall does not exist at %s" % blastcmd
In-Reply-To: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
References: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801080246t5aa515ccuc8699134b533e8b9@mail.gmail.com>

On Jan 8, 2008 9:06 AM, Stefanie Lück wrote:
> Hi!
>
> I'm trying to get a local blast running.
> I proceeded as described in the cookbook
> but I always get this error message:
>
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in <module>
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
>                                                     my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (checked by os.listdir()) but can't be found from the tool.
>

Could you try this, which is the test done in the Biopython blastall
function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double check the path is C:\Blast\bin\blastall.exe and
not perhaps C:\Blast\blastall.exe (the NCBI changed this at some point
on Windows). Also did you install it to the F: drive where your
database is, rather than C: ?

> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

Peter

From lueck at ipk-gatersleben.de Tue Jan 8 11:32:54 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 12:32:54 +0100
Subject: [BioPython] blastall does not exist at %s" % blastcmd
References: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de> <320fb6e00801080246t5aa515ccuc8699134b533e8b9@mail.gmail.com>
Message-ID: <003a01c851ea$357e5cd0$1022a8c0@ipkgatersleben.de>

Thanks Peter!

C:\Blast\blastall.exe worked! Sorry for the drive mistake, I have it on both...

But my XML file is empty :-(
I'll try to fix it...
The standalone blast version is blast-2.2.17-ia32-win32.exe

Stefanie

----- Original Message -----
From: "Peter"
To: "Stefanie Lück"
Cc:
Sent: Tuesday, January 08, 2008 11:46 AM
Subject: Re: [BioPython] blastall does not exist at %s" % blastcmd

On Jan 8, 2008 9:06 AM, Stefanie Lück wrote:
> Hi!
>
> I'm trying to get a local blast running. I proceeded as described in the
> cookbook
> but I always get this error message:
>
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in <module>
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line
> 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
> "blastn",
> my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (checked by os.listdir()) but can't be
> found from the tool.
>

Could you try this, which is the test done in the Biopython blastall
function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double check the path is C:\Blast\bin\blastall.exe and
not perhaps C:\Blast\blastall.exe (the NCBI changed this at some point
on Windows). Also did you install it to the F: drive where your
database is, rather than C: ?

> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

Peter

From lueck at ipk-gatersleben.de Tue Jan 8 14:18:08 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 15:18:08 +0100
Subject: [BioPython] empty xml after local blast
Message-ID: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>

Hi again!
I got blastall running but my xml output file is empty...
Any ideas?
Where exactly must be my fasta file?

>>>
Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"C:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                    my_blast_db, my_blast_file)
blast_results = result_handle.read()

save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
>>>

I'm using Python 2.5, biopython-1.44.win32-py2.5.exe and blast-2.2.17-ia32-win32.exe

Regards
Stefanie

From biopython at maubp.freeserve.co.uk Tue Jan 8 14:33:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jan 2008 14:33:29 +0000
Subject: [BioPython] empty xml after local blast
In-Reply-To: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
References: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801080633k652b3023r6a8457b4c97143e0@mail.gmail.com>

On Jan 8, 2008 2:18 PM, Stefanie Lück wrote:
> Hi again!
>
> I got blastall running but my xml output file is empty...
> Any ideas?

Have you ever tried running blastall.exe from the command line "by
hand"? This can be very useful, and would let you rule out several
basic problems (e.g. make sure blast is installed correctly, and that
your database is working).

> Where exactly must be my fasta file?

Where ever you like - as long as you specify its location correctly.

Your code below seems to assume that "test.fasta" is in the current
directory (i.e. where you are running your python script from). Is
this correct? It may be simpler to use a full path, e.g.

my_blast_file = r"C:\temp\test.fasta"

I suspect that Standalone blast is not finding the input file, or that
it is not finding your database.
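The existence checks suggested in this thread can be bundled into a small
pre-flight function that is run before blastall is called. This is only a
sketch under stated assumptions: the helper name missing_blast_inputs is made
up, it is not part of Biopython, and it assumes a single-volume nucleotide
database built with formatdb, which normally leaves .nhr, .nin and .nsq files
next to the database name. It uses only the standard library and targets a
current Python 3 interpreter.

```python
import os

# Extensions formatdb normally writes for a nucleotide database (assumption:
# single-volume database; protein databases use .phr/.pin/.psq instead).
NUCLEOTIDE_DB_EXTENSIONS = (".nhr", ".nin", ".nsq")


def missing_blast_inputs(blast_exe, blast_db, query_file):
    """Return a list of problems found; an empty list means all files exist.

    Hypothetical helper for illustration, not a Biopython function.
    """
    problems = []
    if not os.path.isfile(blast_exe):
        problems.append("executable not found: %s" % blast_exe)
    for extension in NUCLEOTIDE_DB_EXTENSIONS:
        if not os.path.isfile(blast_db + extension):
            problems.append("database file not found: %s" % (blast_db + extension))
    if not os.path.isfile(query_file):
        problems.append("query file not found: %s" % query_file)
    return problems


if __name__ == "__main__":
    # Paths taken from the messages in this thread; adjust to your setup.
    for problem in missing_blast_inputs(r"C:\Blast\blastall.exe",
                                        r"C:\Blast\primerdb",
                                        r"C:\temp\test.fasta"):
        print(problem)
```

Using full paths for all three arguments avoids any dependence on the
directory the script happens to be started from.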
If you get an empty XML file, one thing to try is checking the error
output from the command line call:

print error_info.read()

Peter

From lueck at ipk-gatersleben.de Tue Jan 8 15:18:32 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 16:18:32 +0100
Subject: [BioPython] empty xml after local blast
References: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de> <320fb6e00801080633k652b3023r6a8457b4c97143e0@mail.gmail.com>
Message-ID: <009d01c85209$bb314210$1022a8c0@ipkgatersleben.de>

Thanks, it couldn't find the database! Great help, thanks a lot ;-)

----- Original Message -----
From: "Peter"
To: "Stefanie Lück"
Cc:
Sent: Tuesday, January 08, 2008 3:33 PM
Subject: Re: [BioPython] empty xml after local blast

On Jan 8, 2008 2:18 PM, Stefanie Lück wrote:
> Hi again!
>
> I got blastall running but my xml output file is empty...
> Any ideas?

Have you ever tried running blastall.exe from the command line "by
hand"? This can be very useful, and would let you rule out several
basic problems (e.g. make sure blast is installed correctly, and that
your database is working).

> Where exactly must be my fasta file?

Where ever you like - as long as you specify its location correctly.

Your code below seems to assume that "test.fasta" is in the current
directory (i.e. where you are running your python script from). Is
this correct? It may be simpler to use a full path, e.g.

my_blast_file = r"C:\temp\test.fasta"

I suspect that Standalone blast is not finding the input file, or that
it is not finding your database.
If you get an empty XML file, one thing to try is checking the error
output from the command line call:

print error_info.read()

Peter

From meesters at uni-mainz.de Tue Jan 8 16:12:09 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Tue, 8 Jan 2008 17:12:09 +0100
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
In-Reply-To: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>
References: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de> <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>
Message-ID: <1199808729.5401.75.camel@meesters.biologie.uni-mainz.de>

> I would infer from the error that "Atom" refers to the Bio.PDB.Atom
> module, rather than the Bio.PDB.Atom.Atom class. How did you do your
> imports? Try this:
>
> from Bio.PDB.Atom import Atom
>
> Peter

Ouch! Next time I'll try the tutor-list ;-).

Thanks a lot.
Christian

From quantrum75 at yahoo.com Fri Jan 11 00:16:51 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Thu, 10 Jan 2008 16:16:51 -0800 (PST)
Subject: [BioPython] bio.PDB module
In-Reply-To:
Message-ID: <258224.6110.qm@web31404.mail.mud.yahoo.com>

Hi
I am a biopython newbie. I was wondering if someone could show me or
send me (I would be thankful) where I could find a script which can
read a pdb file and output the phi and psi angles of the protein
structure. I have read through the Bio.PDB module and structural module
documentation, but still do not have an idea of how to proceed with
the problem. I wish the Bio.PDB documentation was a bit more detailed
and included some examples to work with. I really would like to
contribute to the project, and maybe if I got an initial idea of how to
work with it, I can contribute in some small way.
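Since phi and psi are dihedral angles over consecutive backbone atoms, the
core calculation can be sketched without any library. The code below is a
minimal pure-Python illustration under assumptions: the names dihedral and
phi_psi and the (N, CA, C) tuple layout are invented here, the coordinates have
to be read from the PDB file by other means, and chain breaks are not handled.
To my knowledge Bio.PDB also offers ready-made support for this in its
Polypeptide module, which is likely the better choice in practice. The sketch
targets a current Python 3 interpreter.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """Yield (phi, psi) per residue from a list of (N, CA, C) coordinates.

    phi uses the previous residue's C and psi the next residue's N, so the
    first phi and the last psi are None.
    """
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = (dihedral(n, ca, c, backbone[i + 1][0])
               if i + 1 < len(backbone) else None)
        yield phi, psi
```

Each residue contributes one (N, CA, C) triple of coordinate tuples, in chain
order; the generator then yields one (phi, psi) pair per residue.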
Thanks for your time Regards Rama biopython-request at lists.open-bio.org wrote: Send BioPython mailing list submissions to biopython at lists.open-bio.org To subscribe or unsubscribe via the World Wide Web, visit http://lists.open-bio.org/mailman/listinfo/biopython or, via email, send a message with subject or body 'help' to biopython-request at lists.open-bio.org You can reach the person managing the list at biopython-owner at lists.open-bio.org When replying, please edit your Subject line so it is more specific than "Re: Contents of BioPython digest..." Today's Topics: 1. Re: [BioSQL-l] Authority in biodatabase table (Peter) 2. Re: FormatConverter: from Fasta format to ClustalW format (Lee,Byung-chul) 3. Re: FormatConverter: from Fasta format to ClustalW format (Peter) 4. Re: FormatConverter: from Fasta format to ClustalW format (Peter) 5. Bio.Ais (Michiel de Hoon) 6. Bio.PDB - adding 'dummy atoms' (Christian Meesters) 7. Re: Bio.PDB - adding 'dummy atoms' (Peter) 8. blastall does not exist at %s" % blastcmd (Stefanie L?ck) 9. Re: blastall does not exist at %s" % blastcmd (Peter) ---------------------------------------------------------------------- Message: 1 Date: Wed, 2 Jan 2008 11:57:46 +0000 From: Peter Subject: Re: [BioPython] [BioSQL-l] Authority in biodatabase table To: "Hilmar Lapp" Cc: biopython at lists.open-bio.org, biosql-l at lists.open-bio.org Message-ID: <320fb6e00801020357g724917b5s853d99f2f953753a at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 On 1/1/08, Hilmar Lapp wrote: > (Sorry for this long-too-late reply. Going through old email that got > left unread or unresponded.) > > Peter - you probably implemented something meanwhile that suits your > needs. Just FYI, BioPerl leaves this empty too. The general notion > for authority is that of the LSID authority field, but of course you > won't be able to parse this out of any input file. The value for > SwissProt would be uniprot.org, for example. 
For NCBI, I'm not sure - > NCBI hasn't ever issued any LSIDs, but presumably it would be > something like ncbi.nlm.nih.gov. > > -hilmar Thank you Hilmar. It seem's that the current code in Biopython is fine (the authority field is left blank by default, unless the user supplies their own value), and consistent with both BioPerl and BioJava in this regard (thanks Richard). Peter ------------------------------ Message: 2 Date: Wed, 02 Jan 2008 22:44:47 +0900 From: "Lee,Byung-chul" Subject: Re: [BioPython] FormatConverter: from Fasta format to ClustalW format To: biopython at lists.open-bio.org Message-ID: <477B954F.9020004 at kaist.ac.kr> Content-Type: text/plain; charset=EUC-KR Thank you very much for your kind reply, Peter. As your explanation, I tried to use SeqIO, but another error occured I did it like below: ----------------- from Bio import SeqIO from Bio.Align import AlignInfo alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta")) summary_align = AlignInfo.SummaryInfo(alignment) print summary_align.dumb_consensus() -------------------- but the results are ----------------- Traceback (most recent call last): File "tmp.py", line 16, in print summary_align.dumb_consensus() File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 111, in dumb_consensus consensus_alpha = self._guess_consensus_alphabet() File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 189, in _guess_consensus_alphabet ("Non-gapped alphabet found in alignment object.") ValueError: Non-gapped alphabet found in alignment object. --------------------- In addition, all sequences have the same lenghth in my tmp.fasta file. ----- >seq2 DAC >seq3 DC- >seq1 DAD >seq4 DDD Is this problem caused by the Biopython/Martel and mxTextTools vesions? I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1. What should I do for this? Thanks. Byung chul. 
Peter wrote: > Hello Byung chul Lee, > > On 1/2/08, Lee,Byung-chul wrote: > >> Dear colleagues. >> >> I want to use the AlignInfo.SummaryInfo for fasta-format alignment file. >> I think that to do the process firstly the fasta format should be >> converted to clustalw format, so I try to use Formatconverter. >> However, at my trial, I cannot do that. >> > > Once you have an alignment object (loaded from any file format), this > should work with AlignInfo. I don't think you need to convert it from > FASTA to ClustalW. > > I would guess the error you saw is a problem with Biopython/Martel and > mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0. > What version of Biopython are you using, as I would have expected this > to work fine with Biopython 1.44? > > You could also try using Bio.SeqIO to load the FASTA format alignment > file instead, see http://biopython.org/wiki/SeqIO > > from Bio import SeqIO > from Bio.Align import AlignInfo > alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta")) > summary_align = AlignInfo.SummaryInfo(alignment) > > Peter > > ------------------------------ Message: 3 Date: Wed, 2 Jan 2008 17:46:25 +0000 From: Peter Subject: Re: [BioPython] FormatConverter: from Fasta format to ClustalW format To: "Lee,Byung-chul" Cc: biopython at lists.open-bio.org Message-ID: <320fb6e00801020946j5b331137s14f9e1d90e888a2e at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote: > As your explanation, I tried to use SeqIO, but another error occured > I did it like below: My fault, sorry. I wasn't at a computer with Biopython installed, I had to guess. I'll try and put together a proper example for you tomorrow. > Is this problem caused by the Biopython/Martel and mxTextTools vesions? > I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1. 
The original problem you reported was due to the combination of Biopython 1.43 (the Martel module) and mxTextTools 3.0. You can either update to Biopython 1.44 or downgrade to mxTextTools 2.0 - neither is going to be very simple if you want to use the Ubuntu repositories. To avoid this Martel problem, I would suggest you un-install Biopython 1.43 from the Ubuntu repository, and then install Biopython 1.44 from source. Peter ------------------------------ Message: 4 Date: Fri, 4 Jan 2008 13:20:26 +0000 From: Peter Subject: Re: [BioPython] FormatConverter: from Fasta format to ClustalW format To: "Lee,Byung-chul" Cc: biopython at lists.open-bio.org Message-ID: <320fb6e00801040520i11c9a4c4q4449cee34da00706 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 On Jan 2, 2008 5:46 PM, Peter wrote: > On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote: > > As your explanation, I tried to use SeqIO, but another error occured > > I did it like below: > > My fault, sorry. I wasn't at a computer with Biopython installed, I > had to guess. I'll try and put together a proper example for you > tomorrow. This should work on Biopython 1.43 or later, I have tested it using the simple FASTA file you gave earlier: from Bio.Alphabet.IUPAC import IUPACProtein from Bio.Alphabet import Gapped from Bio import SeqIO from Bio.Align import AlignInfo gapped_protein = Gapped(IUPACProtein()) records = list(SeqIO.parse(open('tmp.fasta'), "fasta")) for rec in records : #Override the default generic alphabet: rec.seq.alphabet = gapped_protein #Turn these records into an alignment alignment = SeqIO.to_alignment(records, gapped_protein) del records summary_align = AlignInfo.SummaryInfo(alignment) print summary_align.dumb_consensus() print summary_align.gap_consensus() The problem with my previous shorter suggestion was the Bio.SeqIO FASTA parser returned SeqRecord objects with a generic alphabet, while the alignment summary expected a gapped alphabet. 
I'm beginning to think that the Bio.SeqIO.parse() function should allow an alphabet to be specified as an optional argument for this sort of situation. Alternatively, going back to your original code how about: from Bio.Fasta import FastaAlign from Bio.Align import AlignInfo alignment = FastaAlign.parse_file('tmp.fasta',type='PROTEIN') summary_align = AlignInfo.SummaryInfo(alignment) print summary_align.dumb_consensus() print summary_align.gap_consensus() This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0. It should work with older versions of Biopython using mxTextTools 2.0 as well. Peter ------------------------------ Message: 5 Date: Sat, 5 Jan 2008 00:41:25 -0800 (PST) From: Michiel de Hoon Subject: [BioPython] Bio.Ais To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org Message-ID: <140129.37367.qm at web62402.mail.re1.yahoo.com> Content-Type: text/plain; charset=iso-8859-1 Hi everybody, I was checking which Biopython modules access Entrez/GenBank in any way, and in the process found the script example_ais2.py in Bio/Ais/Examples (this is not related to Entrez/GenBank in any way, it just caught my eye because it imports urllib). Currently, this example script does not seem to work: $ python example_ais2.py Traceback (most recent call last): File "example_ais2.py", line 39, in immune = Immune( align, alphabet, 100 ) ... TypeError: 'int' object is not iterable The directory Bio/Ais/Examples and its file example_ais2.py only appears in CVS and is not included in Biopython releases. Does anybody know how to fix this example? If not, what should we do with it? --Michiel. --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
------------------------------

Message: 6
Date: Mon, 7 Jan 2008 19:13:59 +0100
From: Christian Meesters
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
To: "biopython at lists.open-bio.org"
Message-ID: <1199729639.13152.20.camel at meesters.biologie.uni-mainz.de>
Content-Type: text/plain

Hoi,

I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I have this approach:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
residue.add(new)

Here x, y, and z are floating point numbers and serial_number is an integer. 'residue' is a 'Residue' I'm iterating over.

However, I keep getting the following error message and don't have a clue how to proceed:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
TypeError: object of type 'module' is not callable

Does anyone have a hint for me on how to actually add an atom, or what's wrong here?

TIA
Christian

------------------------------

Message: 7
Date: Mon, 7 Jan 2008 18:55:57 +0000
From: Peter
Subject: Re: [BioPython] Bio.PDB - adding 'dummy atoms'
To: "Christian Meesters"
Cc: "biopython at lists.open-bio.org"
Message-ID: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Christian Meesters wrote:
> I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
> have this approach:
> ...
> new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
> TypeError: object of type 'module' is not callable
>
> Does anyone have a hint for me, how actually add an atom or what's wrong
> here?

I would infer from the error that "Atom" refers to the Bio.PDB.Atom module, rather than the Bio.PDB.Atom.Atom class. How did you do your imports?
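
The same kind of TypeError can be reproduced with the standard library alone, which may help when checking the imports. This is only a minimal sketch: datetime is used purely as a stand-in for Bio.PDB.Atom (a module that contains a class of the same name), and the helper name call_it is made up for illustration.

```python
import datetime                                  # "datetime" is now the module
from datetime import datetime as DatetimeClass   # the class inside that module


def call_it(obj):
    """Try to call obj(2008, 1, 7); return the result or the error text."""
    try:
        return obj(2008, 1, 7)
    except TypeError as error:
        return "TypeError: %s" % error


# Calling the module fails, much like calling the Bio.PDB.Atom module would.
print(call_it(datetime))
# Calling the class that lives inside the module works.
print(call_it(DatetimeClass))
```

The first call reports that a module is not callable; the second returns an ordinary object, because the name now refers to the class rather than to the module that contains it.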
Try this:

from Bio.PDB.Atom import Atom

Peter

------------------------------

Message: 8
Date: Tue, 8 Jan 2008 10:06:40 +0100
From: Stefanie Lück
Subject: [BioPython] blastall does not exist at %s" % blastcmd
To:
Message-ID: <002301c851d5$c7daac60$1022a8c0 at ipkgatersleben.de>
Content-Type: text/plain; charset="iso-8859-1"

Hi!

I'm trying to get a local blast running. I proceeded as described in the cookbook but I always get this error message:

>>>
Traceback (most recent call last):
  File "F:\Blast\blast.py", line 10, in
    my_blast_db, my_blast_file)
  File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
    raise ValueError, "blastall does not exist at %s" % blastcmd
ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
<<<

>>>
My Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"F:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\bin\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                    my_blast_db, my_blast_file)
blast_results = result_handle.read()
save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
<<<

blastall.exe is in this folder (checked by os.listdir()) but can't be found by the tool.

I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

Does someone have an idea where the problem is?

Greetings
Stefanie

------------------------------

Message: 9
Date: Tue, 8 Jan 2008 10:46:02 +0000
From: Peter
Subject: Re: [BioPython] blastall does not exist at %s" % blastcmd
To: " Stefanie Lück "
Cc: biopython at lists.open-bio.org
Message-ID: <320fb6e00801080246t5aa515ccuc8699134b533e8b9 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 8, 2008 9:06 AM, Stefanie Lück wrote:
> Hi!
>
> I'm trying to get a local blast running.
> I proceeded as described in the cookbook
> but I always get this Error message:
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
> my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (checked by os.listdir()) but can't be found from the tool.
>

Could you try this, which is the test done in the Biopython blastall function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double check the path is C:\Blast\bin\blastall.exe and not perhaps C:\Blast\blastall.exe (the NCBI changed this at some point on Windows). Also did you install it to the F: drive where your database is, rather than C: ?

> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

Peter

------------------------------

_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

End of BioPython Digest, Vol 61, Issue 2
****************************************
From lee.byung-chul at kaist.ac.kr Fri Jan 11 03:15:02 2008 From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul) Date: Fri, 11 Jan 2008 12:15:02 +0900 Subject: [BioPython] bio.PDB module In-Reply-To: <258224.6110.qm@web31404.mail.mud.yahoo.com> References: <258224.6110.qm@web31404.mail.mud.yahoo.com> Message-ID: <4786DF36.6070102@kaist.ac.kr> quantrum75 wrote: > > Hi > > I am a biopython newbie. I was wondering if someone could show me or send me ( I would be thankful) where I could find a script which can read a pdb file and out the phi and psi angles of the protein structure. > > I have read through the bio.PDB module and structural module documentation, but still do not have an idea on how to proceed to tackle the problem. I wish the bio.PDB documentation was a bit more detailed and included some examples to work with. I really would like to contribute to the project and maybe if I got an initial idea on how to work with the same, I can contribute in some small way. > > Thanks for your time > > Regards > > Rama > > > I think the web page below can help you. Check it. : http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/ Byung chul. From mjldehoon at yahoo.com Fri Jan 11 11:16:45 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 11 Jan 2008 03:16:45 -0800 (PST) Subject: [BioPython] [Biopython-dev] Bio.Ais In-Reply-To: <140129.37367.qm@web62402.mail.re1.yahoo.com> Message-ID: <426295.9925.qm@web62415.mail.re1.yahoo.com> Looking at this again, currently we have no documentation for Bio.Ais, no maintainer, and no apparent users (at least, I couldn't find any in the mailing list archives). Would anybody mind very much if I mark this module as deprecated? Just to find out if there are any users of this code out there. --Michiel. 

From biopython at maubp.freeserve.co.uk Fri Jan 11 11:51:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 11 Jan 2008 11:51:41 +0000
Subject: [BioPython] bio.PDB module
In-Reply-To: <258224.6110.qm@web31404.mail.mud.yahoo.com>
References: <258224.6110.qm@web31404.mail.mud.yahoo.com>
Message-ID: <320fb6e00801110351x204102fft44dd3b1e914bfee3@mail.gmail.com>

On Jan 11, 2008 12:16 AM, quantrum75 wrote:
> Hi
> I am a biopython newbie. I was wondering if someone could show me or send me
> ( I would be thankful) where I could find a script which can read a pdb file and out
> the phi and psi angles of the protein structure.

I see Byung chul has already suggested reading this page:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/

Do you think we should incorporate some of that into the main Biopython documentation?

> I have read through the bio.PDB module and structural module documentation,
> but still do not have an idea on how to proceed to tackle the problem. I wish the
> bio.PDB documentation was a bit more detailed and included some examples to
> work with.

Have you read the Biopython Structural Bioinformatics FAQ,
http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

This is linked to from our documentation webpage, but doesn't seem to be mentioned in the main Biopython Tutorial and Cookbook...

> I really would like to contribute to the project and maybe if I got an
> initial idea on how to work with the same, I can contribute in some small way.

Maybe you could start a "Getting started with Bio.PDB" page on the Wiki?

Peter

From quantrum75 at yahoo.com Fri Jan 11 13:26:23 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Fri, 11 Jan 2008 05:26:23 -0800 (PST)
Subject: [BioPython] bio.PDB module
In-Reply-To: <320fb6e00801110351x204102fft44dd3b1e914bfee3@mail.gmail.com>
Message-ID: <496455.11121.qm@web31409.mail.mud.yahoo.com>

Hi Peter,

Thanks for your reply. I did go through the links you mentioned, including the structural bioinformatics FAQ. However, I feel the documentation for the bio.PDB module is seriously short on practical examples for a person like me who likes to learn from examples.

I would love to be able to write a "getting started with bio.PDB" wiki page or document with examples. However, I first need to get the basic ideas on how to use the module, which I am unable to get from the current documentation. That is why I asked for a script which can compute the phi and psi angles of a pdb file. I'll see what I can do, and if you could direct me to any resources, that would be great.
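
For anyone looking for a starting point: phi and psi are dihedral (torsion) angles, phi over the atoms C(i-1), N(i), CA(i), C(i) and psi over N(i), CA(i), C(i), N(i+1). What follows is only a minimal sketch in plain Python, assuming the four atom coordinates have already been pulled out of the PDB file as (x, y, z) tuples. The function name dihedral and the sample coordinates are made up for illustration and are not part of Bio.PDB.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) defined by four points."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)   # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)   # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the first and last points lie on opposite sides
# of the central bond (a "trans" arrangement).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

With such a function, phi for residue i would be dihedral(C_prev, N, CA, C) and psi would be dihedral(N, CA, C, N_next), where the names stand for the coordinates of the corresponding backbone atoms.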

Thanks
Rama

From jdieten at gmail.com Tue Jan 15 13:26:08 2008
From: jdieten at gmail.com (Joost van Dieten)
Date: Tue, 15 Jan 2008 14:26:08 +0100
Subject: [BioPython] [Biopython] Blast problem
Message-ID: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com>

Hi Everyone,

I'm having a problem with the hsp.match function in the Bio-Python Blast module. A few weeks ago the hsp.match returned me the following:

ATGGCA++TGG

But now it gives me:

ATGGCA TGG

I can't see the number of gaps anymore. Does anyone have a solution for this?
Best regards, Joost van Dieten From biopython at maubp.freeserve.co.uk Tue Jan 15 14:09:47 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Jan 2008 14:09:47 +0000 Subject: [BioPython] [Biopython] Blast problem In-Reply-To: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com> References: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com> Message-ID: <320fb6e00801150609y16c77bdch927dd6d9689996a5@mail.gmail.com> Hi Joost, > Iam having a problem with the hsp.match function in the Bio-Python Blast > module. A few weeks ago the hsp.match returned me the following: > > ATGGCA++TGG > > But now it gives me: > > ATGGCA TGG > > I can't see the number of gaps anymore, anyone a solution for this? Are you using the online version of blast with Biopython? Perhaps the NCBI changed something. Are you parsing the XML output or the plain text? Can you provide any more information (e.g. which version of Biopython). Thanks Peter From luca.beltrame at unimi.it Thu Jan 17 14:12:55 2008 From: luca.beltrame at unimi.it (Luca Beltrame) Date: Thu, 17 Jan 2008 15:12:55 +0100 Subject: [BioPython] KEGG Gene parser? Message-ID: <200801171512.55898.luca.beltrame@unimi.it> Hello. I'd like to know if there is a parser that can parse KEGG gene entries. As far as I can see, Bio.KEGG can only do Compound and Enzyme. Should there be the need I'm thinking about writing one, but since in 2004 someone had posted something (now no longer available), I'm asking the list first. Thanks. From lueck at ipk-gatersleben.de Mon Jan 21 11:21:52 2008 From: lueck at ipk-gatersleben.de (=?iso-8859-1?Q?Stefanie_L=FCck?=) Date: Mon, 21 Jan 2008 12:21:52 +0100 Subject: [BioPython] blastall questions (output, full length subject) Message-ID: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de> Hi! I need again some advice for a local blast with blastall. First of all, everything works fine, I just have some questions on how to continue: 1) How can I see the full length of the subject? 
I always can see only this part, which is matching with the query. 2) How are your suggestions to continue with the xml output? I want to sort the Hits by % of matching and my idea was it to put everything in a dictionary (%match as key and all the rest information's as values). Is this the right way? Greetings Stefanie From winter at biotec.tu-dresden.de Mon Jan 21 13:18:15 2008 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Mon, 21 Jan 2008 14:18:15 +0100 Subject: [BioPython] blastall questions (output, full length subject) In-Reply-To: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de> References: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de> Message-ID: <47949B97.90205@biotec.tu-dresden.de> Stefanie L?ck wrote: > Hi! > > I need again some advice for a local blast with blastall. > > First of all, everything works fine, I just have some questions on how to > continue: > > 1) How can I see the full length of the subject? I always can see only this > part, which is matching with the query. Hi Stefanie, you suffered from the slightly confusing naming in the BioPython NCBIXML class. Here is an explanation: alignment.length = total length of unaligned hit sequence record.query_letters = length of query sequence len(hsp.query) = len(hsp.match) = len(hsp.sbjct) = length of alignment with parser = NCBIXML.BlastParser() records = parser.parse(open(blast_results_file)) for record in records: for alignment in record.alignments: for hsp in alignment.hsps: # do s.th. > 2) How are your suggestions to continue with the xml output? I want to sort > the Hits by % of matching and my idea was it to put everything in a > dictionary (%match as key and all the rest information's as values). 

If you refer to the sequence identity percentage, you can use

sequenceIdentity = int(hsp.identities)*100/int(len(hsp.query))

To use the sequence identity as key in a dictionary, you would have to keep a list (or set) of records as value, since different records (hits) can have the same sequence identity.

I would recommend just keeping a set (or list) of records, and using the key or cmp parameter of Python's sort function to sort by one field of the record:
http://wiki.python.org/moin/HowTo/Sorting

If you only need some information from the record, it might be even easier to store this information in a list, and keep a list of these lists (or a set of tuples, since lists cannot be members of a set).

HTH,
Christof

PS: Maybe we could enrich NCBIXML.py with some more meaningful variable names?

>
> Is this the right way?
>
> Greetings
>
> Stefanie
>
> _______________________________________________ BioPython mailing list -
> BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From biopython at maubp.freeserve.co.uk Mon Jan 21 15:15:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Jan 2008 15:15:45 +0000
Subject: [BioPython] KEGG Gene parser?
In-Reply-To: <200801171512.55898.luca.beltrame@unimi.it>
References: <200801171512.55898.luca.beltrame@unimi.it>
Message-ID: <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com>

On Jan 17, 2008 Luca Beltrame wrote:
> Hello.
>
> I'd like to know if there is a parser that can parse KEGG gene entries. As far
> as I can see, Bio.KEGG can only do Compound and Enzyme.

And there is also Bio.KEGG.Map, but you are right, there doesn't seem to be anything for KEGG gene entries.

> Should there be the need I'm thinking about writing one, but since in 2004
> someone had posted something (now no longer available), I'm asking the list
> first.

It looks like no one else is working on any KEGG code, so if you still want to write something it could be useful.
Are you happy to write this in a similar style to the existing Bio.KEGG modules, and put together some basic documentation and a test case too? Peter From jkhilmer at gmail.com Tue Jan 22 19:41:07 2008 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Tue, 22 Jan 2008 12:41:07 -0700 Subject: [BioPython] KEGG Gene parser? In-Reply-To: <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com> References: <200801171512.55898.luca.beltrame@unimi.it> <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com> Message-ID: <81277ce10801221141ya4f0d3fr87858102274d6e2e@mail.gmail.com> Luca, My lab also has interest in KEGG gene entries. Although I have minimal experience in professional Python programming, I would be happy to help in any way: perhaps testing etc. Jonathan Hilmer Bothner Research Group Montana State University On Jan 21, 2008 8:15 AM, Peter wrote: > On Jan 17, 2008 Luca Beltrame wrote: > > Hello. > > > > I'd like to know if there is a parser that can parse KEGG gene entries. As far > > as I can see, Bio.KEGG can only do Compound and Enzyme. > > And there is also Bio.KEGG.Map, but you are right, there doesn't seem > to be anything for KEGG gene entries. > > > Should there be the need I'm thinking about writing one, but since in 2004 > > someone had posted something (now no longer available), I'm asking the list > > first. > > It looks no-else is working on any KEGG code, so if you still want to > write something it could be be useful. Are you happy to write this in > a similar style to the existing Bio.KEGG modules, and put together > some basic documentation and a test case too? 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From bsantos at biocant.pt Wed Jan 23 17:55:18 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Wed, 23 Jan 2008 17:55:18 +0000 Subject: [BioPython] Problems runing BLAST Message-ID: <20080123175518.eab8a089@mail.biocant.pt> Hi I use to run blastall without any problems, but now I have moved all my scripts to a server runing Fedora Core 6 and now I get the folowing error when parsing the blast results: Traceback (most recent call last): File "/usr/local/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 568, in parse raise SyntaxError("Your XML file did not start References: <20080123175518.eab8a089@mail.biocant.pt> Message-ID: On Jan 23, 2008 3:55 PM, Bruno Santos wrote: > raise SyntaxError("Your XML file did not start SyntaxError: Your XML file did not start References: <20080123175518.eab8a089@mail.biocant.pt> Message-ID: <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com> On 1/23/08, Sebastian Bassi wrote: > On Jan 23, 2008 3:55 PM, Bruno Santos wrote: > > raise SyntaxError("Your XML file did not start > SyntaxError: Your XML file did not start > Can you show us the result of: > head your_xml_file.xml Seeing the start of the XML file would be very helpful. And if is empty, what has been written to the error handle? I would guess maybe the database is in a new location or something simple like that... print error_info.read() Another thing to check is the version of Biopython on the new machine. Earlier versions would default to asking blast for plain text output instead of XML. 
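
A quick way to tell an empty or plain-text result apart from real XML before handing it to the parser is to look at the first characters of the output. This is only a minimal sketch: the function name looks_like_xml and the sample strings are made up here, and this is not how NCBIXML itself performs its check.

```python
def looks_like_xml(text):
    """Return True if the text starts with an XML declaration."""
    return text.lstrip().startswith("<?xml")


# Made-up sample outputs: an XML header, an empty result, plain text.
samples = [
    '<?xml version="1.0"?>\n<BlastOutput>',
    "",
    "BLASTN 2.2.17",
]
for sample in samples:
    print(repr(sample[:20]), looks_like_xml(sample))
```

If the check fails on the text read from the result handle, printing the contents of the error handle is usually the next thing to look at.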
Peter From bsantos at biocant.pt Fri Jan 25 12:15:56 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Fri, 25 Jan 2008 12:15:56 -0000 Subject: [BioPython] Problems runing BLAST In-Reply-To: <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com> References: <20080123175518.eab8a089@mail.biocant.pt> <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com> Message-ID: <000301c85f4c$0bd830d0$23889270$@pt> I wasn't using any XML file as intermediate, I was parsing the blast results directly. But it was really a problem with the databases. Now it's solved. My question now is another one, I'm blasting a multifasta file, so I need to know which results belongs to which query sequence ID. I Know I can simply assume that the blast result is ordered according to the sequences in the fasta file, but is any other away to obtain the query ID directly using the Blast Record class? Thanks in advance, Bruno Santos -----Mensagem original----- De: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] Em nome de Peter Enviada: quarta-feira, 23 de Janeiro de 2008 21:07 Para: Sebastian Bassi Cc: Bruno Santos; biopython at biopython.org Assunto: Re: [BioPython] Problems runing BLAST On 1/23/08, Sebastian Bassi wrote: > On Jan 23, 2008 3:55 PM, Bruno Santos wrote: > > raise SyntaxError("Your XML file did not start > SyntaxError: Your XML file did not start > Can you show us the result of: > head your_xml_file.xml Seeing the start of the XML file would be very helpful. And if is empty, what has been written to the error handle? I would guess maybe the database is in a new location or something simple like that... print error_info.read() Another thing to check is the version of Biopython on the new machine. Earlier versions would default to asking blast for plain text output instead of XML. 
Peter From winter at biotec.tu-dresden.de Fri Jan 25 13:02:06 2008 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Fri, 25 Jan 2008 14:02:06 +0100 Subject: [BioPython] Problems runing BLAST In-Reply-To: <000301c85f4c$0bd830d0$23889270$@pt> References: <20080123175518.eab8a089@mail.biocant.pt> <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com> <000301c85f4c$0bd830d0$23889270$@pt> Message-ID: <4799DDCE.1030205@biotec.tu-dresden.de> Bruno Santos wrote: > I wasn't using any XML file as intermediate, I was parsing the blast results > directly. But it was really a problem with the databases. Now it's solved. > > My question now is another one, I'm blasting a multifasta file, so I need to > know which results belongs to which query sequence ID. I Know I can simply > assume that the blast result is ordered according to the sequences in the > fasta file, but is any other away to obtain the query ID directly using the > Blast Record class? record.query? Try exploring your Blast Record instance on a Python shell with the dir function: >>> record >>> dir(record) ['__doc__', '__init__', '__module__', '_num_letters_in_database', 'alignments', 'application', 'blast_cutoff', 'database', 'database_length', 'database_letters', 'database_name', 'database_sequences', 'date', 'descriptions', 'dropoff_1st_pass', 'effective_database_length', 'effective_hsp_length', 'effective_query_length', 'effective_search_space', 'effective_search_space_used', 'expect', 'filter', 'frameshift', 'gap_penalties', 'gap_trigger', 'gap_x_dropoff', 'gap_x_dropoff_final', 'gapped', 'hsps_gapped', 'hsps_no_gap', 'hsps_prelim_gapped', 'hsps_prelim_gapped_attemped', 'ka_params', 'ka_params_gap', 'matrix', 'multiple_alignment', 'num_good_extends', 'num_hits', 'num_letters_in_database', 'num_seqs_better_e', 'num_sequences', 'num_sequences_in_database', 'posted_date', 'query', 'query_id', 'query_length', 'query_letters', 'reference', 'sc_match', 'sc_mismatch', 'threshold', 'version', 
'window_size']

Cheers,
Christof

>
> Thanks in advance,
> Bruno Santos

From mjldehoon at yahoo.com Fri Jan 25 13:04:38 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Jan 2008 05:04:38 -0800 (PST)
Subject: [BioPython] Bio.EUtils
Message-ID: <8786.65209.qm@web62404.mail.re1.yahoo.com>

Hello everybody,

I am looking at the various ways Biopython interacts with NCBI's Entrez search engine and, if possible, would like to organize and document this a bit more. Currently there are several modules that interact with Entrez. The most extensive one is Bio.EUtils, but there are also simpler modules such as Bio.WWW.NCBI. I was wondering:

1) Is anybody using Bio.EUtils?
2) If so, could you give an example script that uses Bio.EUtils?

So we can get an idea of the amount of overlap between Bio.EUtils and Bio.WWW.NCBI and others.

Thanks!
--Michiel.

From mjldehoon at yahoo.com Sat Jan 26 05:38:01 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Jan 2008 21:38:01 -0800 (PST)
Subject: [BioPython] Bio.EUtils
In-Reply-To:
Message-ID: <367303.23759.qm@web62406.mail.re1.yahoo.com>

Dear Rohini,

Thank you for your example. It was very helpful. Just a few questions about it:

> dbinfo = EUtils.databases['pubmed']

Is this statement needed? The variable dbinfo is not used in your example, and the example works fine without this statement.

> Then parse the xml or text lines.

Do you parse the xml or text output yourself, or do you use any Biopython tools for that?

The following does almost the same with Bio.WWW.NCBI instead of Bio.EUtils:

>>> from Bio.WWW import NCBI
>>> lines = NCBI.efetch(db='pubmed', id=listids, retmode='xml' ).readlines() # or retmode='text'

I am saying "almost" the same, because currently Bio.WWW.NCBI.efetch does not handle multiple listids (so it accepts listids = '18211820' but not listids = ['18211820', '18211718', '18178374']).
However, this can be fixed very easily in Biopython.

My last question is: Is this sufficient for your needs? Or do you see some advantage to using Bio.EUtils over Bio.WWW.NCBI?

Thanks again,
--Michiel.

Rohini Damle wrote:

Hi,
Here is how I use Bio.Eutils:

from Bio import EUtils
from Bio.EUtils import DBIdsClient
dbinfo = EUtils.databases['pubmed']
#listids is a list of pubmed ids
record = DBIdsClient.from_dbids(EUtils.DBIds("pubmed",listids))
rec2= record.efetch(retmode="xml",rettype=None).readlines()
# or rec2= record.efetch(retmode="text", rettype="abstract").readlines()
# if you want to parse the abstract in text format

Then parse the xml or text lines.

Thanks
-Rohini.

From rjalves at igc.gulbenkian.pt Mon Jan 28 09:58:50 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 09:58:50 +0000
Subject: [BioPython] Translation issues
Message-ID: <479DA75A.6070804@igc.gulbenkian.pt>

Hi.

I'm trying to automate and validate the process of translation in sequences downloaded from NCBI.

Basically I fetch a GenBank file, extract the DNA sequences and use the Translation module of BioPython to check whether the translation matches. The problem is that the starting amino acid in NCBI is always M, but with the Translation module it isn't, even if the codon is marked as "starting" in the corresponding codon table.
So for instance, the sequence : "TTGGATTATTTAATAGAGGGTTTAAGTTATAATCCTGTAGACCACACAGCTACATCTGGACCAACTGTAATGGAAGCTGCACTGATTGCTAA ACATGTTTATTCAGGGGAAAAAGGAGATGAATTACCCGGTGGATGGAAAATGCTTGAAGATCCATATATGGTTGGAGGTCTTCGAATGGGC GTATATGGGAGAAAAGGTGAGGATGGAGAGATGGAATATGTAATTGCAAATGCAGGAACAGAACCTACTAGTTTGATAGATTGGGAGAATA ATTTGAAACAACCTTTTGGGAAATCAGAAGATATGAAAAATTCTTTAGCTTTTGTTGAAGAGTTTATGAAAAACAATCCAAGTATTAATGTAA CATTTGTTGGACATTCAAAAGGTGGGGCTGAAGCAGCTGCAAATGCGGTACTTACAAATAGGAATGCAATACTATTTAATCCTGCCACAGTG AACTTAGAATCATATTTAAAGCCATATGGTGTGAACAAGTCAAATTATACTGCTGAGATGACGGCATTTATTGTAGAAGACGAAATTTTGAATA ATATCTTTGGATTTATATCAACGCCGATAGACAAGGTAGTTTATTTACCCAGACAGCATTCTTTTTTCATATCGATTCCACTTATAGATATGGTA AATTCGATTCGAAATCATTCGATGGATGCAACGATAAAGGCAATAGAAGAATGGGAGGAAAATAGACAATGA" with codon table 11 will translate to: a="LDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ" while the translation on the GenBank file is: b="MDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ" causing the test a == b to fail. The sequences are exactly the same with the exception of the initial aminoacid I could do the test in other ways and remove the initial letter, but that wouldn't work globally. So, is this the right behavior or am I missing something? Any other suggestions to do this test will also help. 
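
One way to run such a check without special-casing the comparison, sketched in plain Python rather than with the Translation module: translate the first codon as M whenever it is listed as a start codon, then translate the rest normally up to the first stop. The table below is an assumption transcribed from the NCBI bacterial table 11 (same amino acid assignments as the standard code, with TTG, CTG, ATT, ATC, ATA, ATG and GTG as starts), so it should be double checked against NCBI before being relied on. The name translate_cds is a hypothetical helper, not a Biopython function.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in the conventional TCAG codon order (TTT, TTC, TTA, TTG, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    ("".join(codon), aa) for codon, aa in zip(product(BASES, repeat=3), AMINO)
)
# Assumed start codons for table 11; verify against the NCBI listing.
START_CODONS = set(["TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"])


def translate_cds(dna):
    """Translate a coding sequence, reading a valid start codon as M."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3      # ignore a trailing partial codon
    protein = []
    for position in range(0, usable, 3):
        codon = dna[position:position + 3]
        aa = CODON_TABLE.get(codon, "X")  # X for codons with ambiguous bases
        if position == 0 and codon in START_CODONS:
            aa = "M"                      # initiator codon is read as Met
        if aa == "*":
            break                         # stop at the first stop codon
        protein.append(aa)
    return "".join(protein)


# First six codons of the sequence above, followed by a stop codon.
print(translate_cds("TTGGATTATTTAATAGAGTGA"))  # prints MDYLIE
```

The result of translate_cds(sequence) could then be compared directly with the translation given in the GenBank file, provided the feature uses the same codon table.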
Thanks
--
Renato Alves

From biopython at maubp.freeserve.co.uk Mon Jan 28 10:40:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 28 Jan 2008 10:40:28 +0000
Subject: [BioPython] Translation issues
In-Reply-To: <479DA75A.6070804@igc.gulbenkian.pt>
References: <479DA75A.6070804@igc.gulbenkian.pt>
Message-ID: <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>

On 1/28/08, Renato Alves wrote:
> Hi.
>
> I'm trying to automate and validate the process of translation in
> sequences downloaded from NCBI. ... The problem is
> that the starting amino acid in NCBI is always M, but with the Translation
> module it isn't, even if the codon is marked as "starting" in the
> corresponding codon table.
>
> So, is this the right behavior or am I missing something?

Sadly, that is just the way the translation module works. This is a fairly common problem, and it's one I was planning to try and "fix" as part of Bug 2381
http://bugzilla.open-bio.org/show_bug.cgi?id=2381

I would like some comments on the ideas on that bug - for example, would you prefer separate methods/functions for blind translation, translation until a stop codon, and translation from a start codon which is treated as an M - or a single method with lots of optional arguments?

> Any other suggestions to do this test will also help.

Right now, I would check the start codon yourself and then use an M when translating the sequence. Remember the codon table (table 11 in your example) should have all the valid start codons defined.

Peter

From bsouthey at gmail.com Mon Jan 28 14:42:22 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 28 Jan 2008 08:42:22 -0600
Subject: [BioPython] Translation issues
In-Reply-To: <479DA75A.6070804@igc.gulbenkian.pt>
References: <479DA75A.6070804@igc.gulbenkian.pt>
Message-ID:

Hi,
Please see: http://en.wikipedia.org/wiki/Start_codon
"In addition to AUG, alternative start codons, mainly GUG and UUG are used in prokaryotes. For example E.
coli uses 77% ATG (AUG), 14% GTG (GUG), 8% TTG (UUG) and a few others."

Really the only way is to compare the sequences after the first position (a[1:]==b[1:]), assuming you expect an exact match. Alternatively, you need to perform some type of alignment and flag unexpected differences.

Regards
Bruce

From rjalves at igc.gulbenkian.pt Mon Jan 28 15:37:57 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 15:37:57 +0000
Subject: [BioPython] Translation issues
In-Reply-To: <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>
References: <479DA75A.6070804@igc.gulbenkian.pt> <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>
Message-ID: <479DF6D5.7020709@igc.gulbenkian.pt>

Peter wrote:
> Sadly, that is just the way the translation module works. This is a
> fairly common problem, and it's one I was planning to try and "fix" as
> part of Bug 2381
> http://bugzilla.open-bio.org/show_bug.cgi?id=2381
>
In this case, I guess something could test whether the first codon is one of the codon table's start codons and, if so, translate it as "M". But this is a very naive and specific approach, and I don't know whether it could break other uses of this function.

> I would like some comments on the ideas on that bug - for example
> would you prefer separate methods/functions for blind translation,
> translation until a stop codon, and translation from a start codon
> which is treated as an M - or a single method with lots of optional
> arguments?
>
I don't have the expertise to weigh the pros and cons of the two approaches. Still, in terms of potential user friendliness, I would go for separate methods/functions to keep the task simple and obvious.

> Right now, I would check the start codon yourself and then use an M
> when translating the sequence. Remember the codon table (table 11 in
> your example) should have all the valid start codons defined.
>
I'm adopting the technique suggested by Bruce Southey to work around this particular problem. Still, this wouldn't work in more elaborate cases like some of the ones described in the bug thread you mentioned.

Still, many thanks for the quick and clean answers.

Renato

From rjalves at igc.gulbenkian.pt Mon Jan 28 17:42:05 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 17:42:05 +0000
Subject: [BioPython] Alphabet Checking
Message-ID: <479E13ED.2080908@igc.gulbenkian.pt>

/var/lib/python-support/python2.4/Bio/Translate.py in translate_to_stop(self, seq)
     34     def translate_to_stop(self, seq):
     35         # This doesn't have a stop encoding
---> 36         assert seq.alphabet == self.table.nucleotide_alphabet, \
     37                "cannot translate from given alphabet (have %s, need %s)" %\
     38                (seq.alphabet, self.table.nucleotide_alphabet)

AssertionError: cannot translate from given alphabet (have IUPACAmbiguousDNA(), need IUPACAmbiguousDNA())

Aren't those two exactly equal?

Matching references doesn't seem to work as expected :(

What I did:

from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
from Bio import Translate
from Bio import Seq
a=Seq.Seq("ATCGGATGA...ATGCAGT",alphabet=IUPACAmbiguousDNA())
b=Translate.ambiguous_dna_by_id[11]
b.translate_to_stop(a)
... error pops out

The only workaround I was able to find is:

b.table.nucleotide_alphabet=a.alphabet

I guess this is a bad day :( it's the second clash with the Translate module in the same day :|

Should I report this as a bug?
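
One plausible reading of the assertion failure above is that the two alphabets are separate instances of a class that defines no equality method, so `==` falls back to object identity even though both print the same. The sketch below uses hypothetical stand-in classes (`PlainAlphabet`, `ComparableAlphabet`), not the actual Bio.Alphabet code, and only illustrates that general Python behaviour.

```python
class PlainAlphabet(object):
    """Hypothetical stand-in: no __eq__, so == compares object identity."""

    def __repr__(self):
        return "PlainAlphabet()"


class ComparableAlphabet(object):
    """Hypothetical variant that treats instances of the same type as equal."""

    def __repr__(self):
        return "ComparableAlphabet()"

    def __eq__(self, other):
        return type(self) is type(other)

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(type(self))


first, second = PlainAlphabet(), PlainAlphabet()

same_repr = repr(first) == repr(second)    # both print identically
plain_equal = first == second              # distinct objects, identity comparison
shared_equal = first == first              # reusing one instance compares equal
comparable_equal = ComparableAlphabet() == ComparableAlphabet()

print(same_repr)
print(plain_equal)
print(shared_equal)
print(comparable_equal)
```

Under that assumption, reusing the alphabet instance already held by the translator would sidestep the identity comparison, and a type-based `__eq__` would be one possible fix.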
From p.j.a.cock at googlemail.com Mon Jan 28 17:56:11 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Jan 2008 17:56:11 +0000
Subject: [BioPython] Alphabet Checking
In-Reply-To: <479E13ED.2080908@igc.gulbenkian.pt>
References: <479E13ED.2080908@igc.gulbenkian.pt>
Message-ID: <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>

> Aren't those two exactly equal?
>
> Matching references doesn't seem to work as expected :(

That does look like a bug...

> The only way around I was able to find is:

Another option,

from Bio import Translate
from Bio import Seq
trans=Translate.ambiguous_dna_by_id[11]
a=Seq.Seq("ATCGGATGAATGCAGT",alphabet=trans.table.nucleotide_alphabet)
print trans.translate_to_stop(a)
print trans.translate(a)

> I guess this is a bad day :( it's the second clash with the Translate
> module in the same day :|

I don't like the Bio.Translate module either.

> Should I report this as bug?

Please do. If we do just add translation to the seq object (bug 2381) and deprecate the Bio.Translate module then in a sense this problem goes away ;)

Peter

From tiagoantao at gmail.com Mon Jan 28 18:10:56 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 28 Jan 2008 18:10:56 +0000
Subject: [BioPython] Alphabet Checking
In-Reply-To: <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>
References: <479E13ED.2080908@igc.gulbenkian.pt> <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>
Message-ID: <6d941f120801281010r6e8e829dub26a85e6a0b61983@mail.gmail.com>

On Jan 28, 2008 5:56 PM, Peter Cock wrote:
> > Aren't those two exactly equal?
> >
> > Matching references doesn't seem to work as expected :(
>
> That does look like a bug...

It is probably completely unrelated, but it might not be...
From a "helicopter view" of the code I have noticed that SeqIO uses Nexus in some cases.
I have patched a previous Nexus bug by using deepcopy, which could cause something like this:

AssertionError: cannot translate from given alphabet (have IUPACAmbiguousDNA(), need IUPACAmbiguousDNA())

(i.e., it has the same type name, but is really not the same object)

Again, it is probably unrelated (I know very little about Bio.Seq and Bio.SeqIO), but, just in case...

From mjldehoon at yahoo.com Tue Jan 29 00:43:22 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 28 Jan 2008 16:43:22 -0800 (PST)
Subject: [BioPython] Bio.EUtils
In-Reply-To:
Message-ID: <356164.74184.qm@web62403.mail.re1.yahoo.com>

Rohini Damle wrote:

The following does almost the same with Bio.WWW.NCBI instead of Bio.EUtils: ... My last question is: Is this sufficient for your needs? Or do you see some advantage to using Bio.EUtils over Bio.WWW.NCBI?

I guess Bio.EUtils is faster, can be used for batch-processing (like fetching records for a list of pubmed ids). I have not tried Bio.WWW.NCBI, will try it and get back to you.

If you make the following modification in Bio.WWW.NCBI.py:

line 189: replace
options = urllib.urlencode(params)
by
options = urllib.urlencode(params, doseq=1)

then Bio.WWW.NCBI can also fetch records for a list of pubmed ids. I'm guessing that then it is as fast as (or faster than) Bio.EUtils, but I'd be interested in what you find in practice.

Thanks,
--Michiel

From bsantos at biocant.pt Tue Jan 29 11:34:07 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 29 Jan 2008 11:34:07 -0000
Subject: [BioPython] Problems runing BLAST
Message-ID: <000101c8626a$dd98e760$98cab620$@pt>

I am once more having problems running blast using biopython. I start the script, the blastall process starts, and after a few minutes it goes to sleep and no error message is passed. When I check the xml file, only part of the results for the first sequence has been written.
Has anyone ever had the same problem?

I'm using:
python 2.5.1
biopython 1.44
blastall 2.2.16

My code is the following:

from Bio import SeqIO
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
import time
import math
import os

primer = 'D2'
sample = 'AGC'

#Defines all the databases that will be used
my_blast_db = ('\"/home/bsantos/DataBases/nt.00 /home/bsantos/DataBases/nt.01 /home/bsantos/DataBases/nt.02 /home/bsantos/DataBases/nt.03 /home/bsantos/DataBases/nt.04 /home/bsantos/DataBases/nt.05 /home/bsantos/DataBases/RDPIIdb /home/bsantos/DataBases/RNADB\"')
print my_blast_db

#Define the fasta file to Blast
destination = '/home/bsantos/Metagenomics/Results/' + sample + '/' + primer + '/filteredfile_sample' + sample + '_' + primer + '_F.fasta'
my_blast_file = (destination)

#Defines the blast binaries
my_blast_exe = "/usr/local/bin/blastall"
print (os.path.exists(my_blast_exe))
print time.ctime()

#Performs Blast
print 'Now Performing Blast'
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",my_blast_db, my_blast_file)
print 'This errors have occured:'
print error_info.read()
print 'Starting parsing the results.......'

#Parse the result of the blast in XML format
blast_results = result_handle.read() #Catch the results
save_file = open('/home/bsantos/Metagenomics/Results/' + sample + '/' + primer + '/BlastReport_sample' + sample + '_' + primer + '_F.xml', 'w')
save_file.write(blast_results) #Write all the information to an XML file
save_file.flush()
save_file.close()

From biopython at maubp.freeserve.co.uk Tue Jan 29 12:15:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 29 Jan 2008 12:15:26 +0000
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <000101c8626a$dd98e760$98cab620$@pt>
References: <000101c8626a$dd98e760$98cab620$@pt>
Message-ID: <320fb6e00801290415g10e099dj108ecea15a72109c@mail.gmail.com>

Hi Bruno,

On Jan 29, 2008 11:34 AM, Bruno Santos wrote:
> I am once more having problems running blast using biopython. I start the
> script the blastall process starts and after a few minutes it starts
> sleeping and no error message is passed. When I check the xml file it only
> writes part of the results for the first sequence.

Have you tried running the same command "by hand" at the command line, to check that it works, and to time how long you should expect it to take?

> Does anyone has ever had the same problem?

I think the problem is to do with asking the operating system to read all the error output. Try commenting out this bit, and only read the error handle if you have a problem:

# print error_info.read()

Quoting from the tutorial,

>> The error info can be hard to deal with, because if you try to do a error_handle.read() and
>> there was no error info returned, then the read() call will block and not return, locking your
>> script. In my opinion, the best way to deal with the error is only to print it out if you are not
>> getting result_handle results to be parsed, but otherwise to leave it alone.
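
A general way to avoid this kind of hang, sketched here with only the Python standard library, is to let `communicate()` drain the output and error pipes together instead of calling `read()` on one of them first. The child process below is a made-up stand-in that writes a large result to stdout and a short message to stderr; it is not blastall, and this is not how NCBIStandalone itself is implemented.

```python
import subprocess
import sys

# Stand-in child process (an assumption for illustration, not blastall):
# it writes a large amount to stdout and a short note to stderr.
child_code = (
    "import sys\n"
    "sys.stdout.write('x' * 200000)\n"
    "sys.stderr.write('warning: example message\\n')\n"
)

process = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# communicate() reads both pipes until the child exits, so the child can
# never stall on a full stdout pipe while the parent waits on stderr.
results, errors = process.communicate()

print(len(results))
print(errors.decode().strip())
```

Reading the error pipe to the end before touching the result pipe, as in the script above, risks the opposite: the child can fill the result pipe and wait, while the parent is still waiting for the error pipe to close.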
Peter

From jblanca at btc.upv.es Wed Jan 30 09:15:49 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Wed, 30 Jan 2008 10:15:49 +0100
Subject: [BioPython] blast parse
Message-ID: <200801301015.50812.jblanca@btc.upv.es>

Hi:
I'm new on the list and to Biopython. I come from Perl and I'm liking Python a lot.
I'm trying to read a big blast file and it takes a lot of time and memory. I'm not sure if I'm taking the most efficient path. Basically I'm doing:

blasth = file('blast.xml', 'r')
from Bio.Blast import NCBIXML
p = NCBIXML.BlastParser()
blast_parse = p.parse(blasth)
for blast_result in blast_parse:
    #do whatever

I was expecting to read the records one by one, but the call to p.parse(blasth) takes a lot of time and memory. I'm not sure what this function returns, a list or an iterator. I've looked at the NCBIXML.py file and it seems to have two parse functions, one of them a method of the BlastParser class (am I wrong?).

    def parse(self, handler):
        """Parses the XML data

        handler -- file handler or StringIO

        This method returns a list of Blast record objects.
        """

def parse(handle, debug=0):
    """Returns an iterator a Blast record for each query.

    handle - file handle to and XML file to parse
    debug - integer, amount of debug information to print

    This is a generator function that returns multiple Blast records
    objects - one for each query sequence given to blast.  The file
    is read incrementally, returning complete records as they are read
    in.

I guess that the first function would read the complete file before returning anything, but the second should read and return the records one by one. I don't know if this guess is correct.
Is there any other way to read these huge blast files without using so much memory?
Best regards,

--
Jose M.
Blanca Postigo
Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From mjldehoon at yahoo.com Wed Jan 30 09:56:56 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 30 Jan 2008 01:56:56 -0800 (PST)
Subject: [BioPython] blast parse
In-Reply-To: <200801301015.50812.jblanca@btc.upv.es>
Message-ID: <940738.9737.qm@web62407.mail.re1.yahoo.com>

Dear Jose,

To get the records one-by-one, use

from Bio.Blast import NCBIXML
blast_parse = NCBIXML.parse(blasth)
for blast_result in blast_parse:
    # do whatever with blast_result

This avoids having to read the complete XML file all at once.

To the developers: We should probably think about removing the NCBIXML.BlastParser.parse, and perhaps adding a NCBIXML.read function to read exactly one record from the XML file.

--Michiel.

From lueck at ipk-gatersleben.de Wed Jan 30 10:24:55 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Wed, 30 Jan 2008 11:24:55 +0100
Subject: [BioPython] Clustalw pair wise alignment
Message-ID: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>

Hi!

I am working with clustalw and everything works fine. Now I have some questions:

1) Must the input data be in a file or can it also be in the code (e.g. in a list)?

2) I ask because I want to do many (up to hundreds) pairwise alignments (short sequences) and I don't want to store each of them in a separate file.

If I have it in one file, clustalw makes a multiple alignment:

Match1 ------CAAGATTTGAGCACCACAGGCAA---
full1  ------CAAGATTTGAGCACCACAGGCAACAG
Match0 AGCCTTCAAGATTTGAGCACCACAG-------
full0  AGCCTTCAAGATTTGAGCACCACAG-------

whereas Match1 should only align to full1 and so on.

Could someone give a hint?
Regards
Stefanie

From biopython at maubp.freeserve.co.uk Wed Jan 30 11:47:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 30 Jan 2008 11:47:42 +0000
Subject: [BioPython] Clustalw pair wise alignment
In-Reply-To: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>
References: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801300347h6f1ec197qc599ec9f2c80bab@mail.gmail.com>

Hi Stefanie

> I am working with clustalw and everything works fine. Now I have some questions:
>
> 1) Must the input data be in a file or can it also be in the code (e.g. in a list)?

I believe for the Clustalw command line tool, you have to supply the input data in a file.

> 2) Because, I want to do many (up to hundreds) pair wise
> alignments (short sequences) and I don't want to store
> each of them in a separate file.
>
> If I have it in one file, clustalw make a multiple alignment:

Yes, that is expected for clustalw.

> Could someone give a hint?

If you want to use Clustalw, you could re-use a temporary file for each pair of sequences (rather than creating hundreds of different input files).

I would consider using the EMBOSS tools "needle" or "water" for doing pairwise alignments. These have the advantage that you can actually supply the sequences as part of the command line (provided they are not too long).
See http://emboss.sourceforge.net/apps/ and also
http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html#asis

Peter

From winter at biotec.tu-dresden.de Wed Jan 30 12:48:34 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 30 Jan 2008 13:48:34 +0100
Subject: [BioPython] blast parse
In-Reply-To: <940738.9737.qm@web62407.mail.re1.yahoo.com>
References: <940738.9737.qm@web62407.mail.re1.yahoo.com>
Message-ID: <47A07222.9000200@biotec.tu-dresden.de>

Michiel de Hoon wrote:
> Dear Jose,
>
> To get the records one-by-one, use
>
> from Bio.Blast import NCBIXML
> blast_parse = NCBIXML.parse(blasth)
> for blast_result in blast_parse:
>     # do whatever with blast_result
>
> This avoids having to read the complete XML file all at once.
>
> To the developers: We should probably think about removing the
> NCBIXML.BlastParser.parse, and perhaps adding a NCBIXML.read function to read
> exactly one record from the XML file.

I think removing NCBIXML.BlastParser.parse is a good idea. We should keep it simple.

Christof
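
The difference between a parser that returns a list and one that yields records as they are read can be sketched with the standard library alone. The example below is not the NCBIXML implementation: the XML is a tiny made-up stand-in with simplified element names, and `iter_records` is a hypothetical helper built on `xml.etree.ElementTree.iterparse`.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a BLAST XML report; the element names are simplified
# assumptions for illustration, not the full NCBI schema.
SAMPLE_XML = b"""<BlastOutput>
  <Iteration><query>seq1</query><hits>3</hits></Iteration>
  <Iteration><query>seq2</query><hits>0</hits></Iteration>
  <Iteration><query>seq3</query><hits>7</hits></Iteration>
</BlastOutput>"""


def iter_records(handle):
    """Yield (query, hit_count) for each record as soon as it is complete."""
    for event, element in ET.iterparse(handle, events=("end",)):
        if element.tag == "Iteration":
            yield element.findtext("query"), int(element.findtext("hits"))
            element.clear()  # drop the finished record's children


records = iter_records(io.BytesIO(SAMPLE_XML))
first = next(records)      # only the first record has been handed over so far
remaining = list(records)  # the rest are produced on demand

print(first)
print(remaining)
```

Because each record is handed over and then cleared before the next one is built, far less of the file's content has to stay in memory at once, which is the property the generator-style parse function in this thread is after.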