From aurelie.bornot at free.fr Fri Apr 1 07:44:39 2005
From: aurelie.bornot at free.fr (Aurélie Bornot)
Date: Fri Apr 1 07:39:00 2005
Subject: [BioPython] Blast and PsiBlast
Message-ID: <007501c536b8$91de1420$0b413851@YSENGARD>

Hello !

I am new to programming with Python and Biopython !!
First : thanks guys for all the work that is already done !

I have to deal with many types of alignments, and I am stuck on a few
questions ... : (

- I don't understand why qblast() can't "call" the program called blastx.. ?
- I am looking for a way to run PSI-BLAST or PHI-BLAST online : is it
  possible ?
- Is it possible to use the BlastErrorParser() (NCBIStandalone) to parse a
  file which comes from an online qblast() run ?
- I am also looking for a module that deals with Smith & Waterman and
  Needleman & Wunsch ? and if there is none : how can I do it ?

a lot of questions......
Could someone try to answer any of these questions ??? or give me some
clues and advice

Thanks a lot
Aurélie

From aurelie.bornot at free.fr Fri Apr 1 08:04:15 2005
From: aurelie.bornot at free.fr (Aurélie Bornot)
Date: Fri Apr 1 07:58:58 2005
Subject: [BioPython] Bioregistry...doesn't work with me
Message-ID: <008401c536bb$4ead9470$0b413851@YSENGARD>

Hello,

Sorry for "spamming"

I am trying to retrieve sequences with the script from the CookBook and it
doesn't work...
I don't understand why....
Can someone tell me what I did wrong ?

>>> from Bio import db
>>> print db
db, exporting 'embl', 'embl-dbfetch-cgi', 'embl-ebi-cgi', 'embl-fast',
'embl-xembl-cgi', 'embl-xml', 'fasta', 'fasta-sequence-eutils',
'genbank-nucleotide', 'genbank-protein', 'interpro', 'interpro-ebi-cgi',
'medline', 'medline-eutils', 'nucleotide-genbank-eutils', 'pdb',
'pdb-ebi-cgi', 'pdb-rcsb-cgi', 'prodoc', 'prodoc-expasy-cgi', 'prosite',
'prosite-expasy-cgi', 'protein-genbank-eutils', 'swissprot',
'swissprot-expasy-cgi', 'swissprot-usmirror-cgi'
>>> db.keys()
['embl-dbfetch-cgi', 'embl-xml', 'fasta', 'prodoc-expasy-cgi', 'interpro',
'pdb-rcsb-cgi', 'swissprot-expasy-cgi', 'swissprot', 'prosite',
'swissprot-usmirror-cgi', 'fasta-sequence-eutils', 'interpro-ebi-cgi',
'embl', 'nucleotide-genbank-eutils', 'protein-genbank-eutils', 'prodoc',
'prosite-expasy-cgi', 'embl-fast', 'medline', 'genbank-nucleotide',
'pdb-ebi-cgi', 'medline-eutils', 'genbank-protein', 'embl-ebi-cgi',
'embl-xembl-cgi', 'pdb']
>>> sp = db["swissprot"]
>>> sp
>>> record_handle = sp['O23729']
Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python24\lib\site-packages\Bio\config\DBRegistry.py", line 150, in __getitem__
    data = self._run_serial(key)
  File "C:\Python24\lib\site-packages\Bio\config\DBRegistry.py", line 217, in _run_serial
    raise KeyError, "I could not get any results."
KeyError: 'I could not get any results.'

I have verified : the entry O23729 is still correct....

I have to work with Windows XP; could the problem come from this...?

Thanks !
Aurélie

From jtk at cmp.uea.ac.uk Fri Apr 1 09:24:14 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri Apr 1 08:23:03 2005
Subject: [BioPython] Bioregistry...doesn't work with me
In-Reply-To: <008401c536bb$4ead9470$0b413851@YSENGARD>
References: <008401c536bb$4ead9470$0b413851@YSENGARD>
Message-ID: <20050401142414.GG15157@jtkpc.cmp.uea.ac.uk>

On Fri, Apr 01, 2005 at 03:04:15PM +0200, Aurélie Bornot wrote:
>
> I am trying to retrieve sequences with the script of the CookBook and it doesn't work...
> I don't understand why....
> Can someone tell me what I did wrong ?

[deleted output of print commands]

> >>> from Bio import db
> >>> print db
> >>> db.keys()
> >>> sp = db["swissprot"]
> >>> sp
>
> >>> record_handle = sp['O23729']
> Traceback (most recent call last):
>   File "", line 1, in ?
>   File "C:\Python24\lib\site-packages\Bio\config\DBRegistry.py", line 150, in __
> getitem__
>     data = self._run_serial(key)
>   File "C:\Python24\lib\site-packages\Bio\config\DBRegistry.py", line 217, in _r
> un_serial
>     raise KeyError, "I could not get any results."
> KeyError: 'I could not get any results.'
>
>
> I have verified : the entry O23729 is still correct....

Your example works for me:

    >>> record_handle = sp['O23729']
    >>> record_handle
    >>> print record_handle.readline()
    ID CHS3_BROFI STANDARD; PRT; 394 AA.
    >>> print record_handle.readline()
    AC O23729;
    >>> print record_handle.readline()
    DT 15-JUL-1999 (Rel. 38, Created)

> I must work with Windows XP, may the problem comes from this...?

That's always a possibility (I've tried on Linux), but it's more likely a
network problem. Biopython may have been unable to get the record because
e.g.

    * you are behind some firewall that doesn't allow the kind of http
      used by Biopython for retrieving the record

    * the network connectivity between your site and www.expasy.ch may
      have been transiently down

Best regards, Jan
--
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk@cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

From LKatz at smith.edu Fri Apr 1 13:22:38 2005
From: LKatz at smith.edu (Laura A. Katz)
Date: Fri Apr 1 13:18:48 2005
Subject: [BioPython] Exclusion limits
Message-ID:

How can I alter my GB search scripts to exclude the following:

__ exclude ESTs __ exclude STSs __ exclude GSS __ exclude TPA
__ exclude working draft __ exclude patents __ exclude all of the above

Note: these exclusion sets appear under 'limits' on the NCBI web browser.

From cymon at duke.edu Fri Apr 1 14:37:10 2005
From: cymon at duke.edu (Cymon Cox)
Date: Fri Apr 1 14:31:18 2005
Subject: [BioPython] Exclusion limits
In-Reply-To:
References:
Message-ID: <1112384229.29244.63.camel@isis.biology.duke.edu>

Hi Laura,

On Fri, 2005-04-01 at 13:22, Laura A. Katz wrote:
> How can I alter my GB search scripts to exclude the following:
>
> __ exclude ESTs __ exclude STSs __ exclude GSS __ exclude TPA
> __ exclude working draft __ exclude patents __ exclude all of the above
>

According to this page:
http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/InfoHubs/Exercises/infohubs_qa_ests_exclude.html

you should be able to add the qualifiers to the end of your search string:

NOT gbdiv_est[prop]
NOT gbdiv_sts[prop]
NOT gbdiv_gss[prop]
NOT gbdiv_tpa[prop]
NOT gbdiv_pat[prop]
etc

I'm guessing at the latter and I don't know what the qualifier for working
draft might be...

Cymon

Note I'm replying to the Biopython list here for reference...

>
> Note: these exclusion sets appear under 'limits' on the NCBI web browser.
>
>
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython

____________________________________________________________________

Cymon J. Cox
Research Associate
Department of Biology
Duke University
BOX 90338
Durham NC 27708
Phone : (919) 660-7370
Fax : (919) 660-7293
HomePage : http://www.duke.edu/~cymon
_____________________________________________________________________
Fedora Core release 2 (Tettnang) isis.biology.duke.edu
14:35:15 up 12 days, 1:09, 11 users, load average: 0.12, 0.09, 0.05

From mdehoon at ims.u-tokyo.ac.jp Sun Apr 3 05:22:46 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sun Apr 3 05:17:58 2005
Subject: [BioPython] Blast and PsiBlast
In-Reply-To: <007501c536b8$91de1420$0b413851@YSENGARD>
References: <007501c536b8$91de1420$0b413851@YSENGARD>
Message-ID: <424FB5E6.6010203@ims.u-tokyo.ac.jp>

Aurélie Bornot wrote:
> - I don't understand why qblast( ) can't "call" the program called blastx.. ?

As far as I know, there is no fundamental reason why qblast cannot call
blastx. Probably it has simply not yet been implemented. If you could write
a patch to Bio/Blast/NCBIWWW.py so that qblast can run blastx, that would be
great. If you are familiar with blastx, maybe such a patch would not be
difficult. You probably would only need to change the parameters in the
qblast function (but then again, I don't use blast much, so I am no expert
on this matter). We'd be happy to include such a patch with Biopython.

--Michiel.

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From saccenti at cerm.unifi.it Tue Apr 5 10:45:41 2005
From: saccenti at cerm.unifi.it (Edoardo Saccenti)
Date: Tue Apr 5 10:41:54 2005
Subject: [BioPython] parsing NMR PDB
In-Reply-To:
References:
Message-ID: <200504051645.41921.saccenti@cerm.unifi.it>

I'm parsing an NMR PDB file
using the command

parser = PDBParser(PERMISSIVE=1) #or = 0
structure = parser.get_structure(str_id, pdb_name)

I get the warning:
nonstandard resolution NOT APPLICABLE.

The parser works fine but I can't avoid this warning.

Any idea?

Thanks a lot
Edoardo

--
"Raffiniert ist der Herr Gott, aber boshaft ist Er nicht."
---
Dr. Edoardo Saccenti
CERM Nuclear Magnetic Resonance Research Center
Scientific Pole - University of Florence
Via Luigi Sacconi n° 6
50019 Sesto Fiorentino (FI)
tel: +39 055 4574193
fax: +39 055 4574253
saccenti@cerm.unifi.it
www.cerm.unifi.it

From thamelry at binf.ku.dk Tue Apr 5 10:49:12 2005
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Tue Apr 5 22:06:31 2005
Subject: [BioPython] parsing NMR PDB
In-Reply-To: <200504051645.41921.saccenti@cerm.unifi.it>
References: <200504051645.41921.saccenti@cerm.unifi.it>
Message-ID: <200504051649.12125.thamelry@binf.ku.dk>

On Tuesday 05 April 2005 16:45, Edoardo Saccenti wrote:
> I'm parsing an NMR PDB file
> using the command
>
> parser = PDBParser(PERMISSIVE=1) #or = 0
> structure = parser.get_structure(str_id, pdb_name)
>
> I get the warning:
> nonstandard resolution NOT APPLICABLE.
>
> parser works fine but I can't avoid this warning.
>
> Any idea?

Everything is fine: NMR structures do not have a resolution.
I think you are using an old version of Biopython, right?
The current version does not generate this warning anymore.

Best regards,

-Thomas

From rafael at nbn.ac.za Wed Apr 6 02:27:55 2005
From: rafael at nbn.ac.za (Rafael C. Jimenez)
Date: Wed Apr 6 02:22:25 2005
Subject: [BioPython] mysql trouble
In-Reply-To:
References:
Message-ID:

Try to create a cursor of "conn" and then use the method execute to
process the query.
See one example from the MySQLdb help:

#!/usr/bin/python

# import MySQL module
import MySQLdb

# connect
db = MySQLdb.connect(host="localhost", user="joe", passwd="secret",
db="db56a")

# create a cursor
cursor = db.cursor()

# execute SQL statement
cursor.execute("""INSERT INTO animals (name, species) VALUES ("Harry",
"Hamster")""")

Good luck with your project!

Cheers,

Rafael.
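
A note on the example above, since a later message in this thread reports that adding db.commit() after cursor.execute() made the insert stick. Under the Python DB-API convention that MySQLdb follows, changes to transactional tables are normally only saved once the connection commits. The sketch below is an illustration, not the poster's script. It swaps in the standard-library sqlite3 module instead of MySQLdb so that it runs without a server, it targets a current Python 3 rather than the Python 2.4 used in the thread, and the file name, the count_rows helper and the two-connection setup are assumptions made for the demonstration.

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch database file; stands in for the MySQL database.
db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")


def count_rows(conn):
    """Return the number of rows in 'animals' as seen by this connection."""
    cur = conn.execute("SELECT COUNT(*) FROM animals")
    (n,) = cur.fetchone()
    cur.close()  # release the read lock before the writer commits
    return n


# Connection that writes, like 'db' in the example above.
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE animals (name TEXT, species TEXT)")
writer.commit()

cursor = writer.cursor()
cursor.execute(
    "INSERT INTO animals (name, species) VALUES (?, ?)",
    ("Harry", "Hamster"),
)

# A second connection plays the role of the mysql command-line client.
reader = sqlite3.connect(db_path)
rows_before = count_rows(reader)  # insert not committed yet

writer.commit()  # the step that was missing
rows_after = count_rows(reader)

print(rows_before, rows_after)

reader.close()
writer.close()
```

With MySQLdb the corresponding call would be db.commit() on the connection object, placed after the last cursor.execute(). Whether it is strictly needed depends on the table's storage engine, so treat this as a hint rather than a diagnosis.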

On 31/03/2005, at 12:30, pap501@york.ac.uk wrote:

> Hi
>
> I am a masters student at the University of York working on a project
> to create a database of DNA sequences.
>
> I am trying to grab information from text files using a python script
> to insert data into a MySQL table. The problem is that although I can
> query MySQL through Python I cannot write into MySQL through Python. I
> am using the Windows platform with Python version 2.4 with Biopython
> installed and MySQL Server 4.1.
>
> I have attached my python script to this email.
> The script takes each text file in turn (although only one is listed
> in the script at the mo) and inserts the library code, Genbank code
> (primary key), TiGR code (if there is one, null otherwise) and the DNA
> sequence (string of characters). Each file is a library of ~2000 DNA
> sequences. The python script runs but does not write the information
> into my MySQL table. When I query the table in MySQL it says that the
> table is empty.
>
> Can anyone advise me what to do? Is it soemthing to do with the setup
> of MySQL and/or Python or a problem with the Python script?
>
> Any advice would be greatly appreciated.
>
> Many thanks
>
> Phil

From rafael at nbn.ac.za Fri Apr 8 04:53:21 2005
From: rafael at nbn.ac.za (Rafael C. Jimenez)
Date: Fri Apr 8 04:45:53 2005
Subject: [BioPython] biopython-epydoc
Message-ID: <3ad44acdcc95b240b88904bae39fcabf@nbn.ac.za>

I am trying to get HTML help pages for Biopython through epydoc. Epydoc
generates the HTML files but there is no information inside about the
modules, classes, and/or functions. Any idea of how to do this?

My input/output:
----------------------------------------------------------------------
nbn213:/System/Library/Frameworks/Python.framework/Versions/2.3/bin rc$ ls
epydoc epydocgui idle pydoc python python2.3

nbn213:/System/Library/Frameworks/Python.framework/Versions/2.3/bin rc$ ./epydoc -o /Users/rc/Desktop/epydoc/biopython --css blue --private-css green -v -n biopython -u http://biopython.org --inheritance listed Bio

Importing 1 modules.
  [1/1] Importing Bio
Building API documentation for 1 modules.
  [1/1] Building docs for Bio
Writing HTML docs (10 files) to '/Users/rc/Desktop/epydoc/biopython'.
  [ 1/10] Writing epydoc.css
  [ 2/10] Writing Bio-module.html
  [ 3/10] Writing indices.html
  [ 4/10] Writing trees.html
  [ 5/10] Writing help.html
  [ 6/10] Writing frames.html
  [ 7/10] Writing toc.html
  [ 8/10] Writing toc-everything.html
  [ 9/10] Writing toc-Bio-module.html
  [10/10] Writing index.html

From Frederic.Sohm at iaf.cnrs-gif.fr Fri Apr 8 05:46:48 2005
From: Frederic.Sohm at iaf.cnrs-gif.fr (Frederic.Sohm@iaf.cnrs-gif.fr)
Date: Fri Apr 8 05:40:56 2005
Subject: [BioPython] biopython-epydoc
Message-ID: <1112953608.4256530863d31@mail.iaf.cnrs-gif.fr>

Hi,

Sorry, I have no idea what the problem is, but I have some advice if you
do use epydoc, because I have had this problem before.

To avoid getting a lot of useless output and very long documentation you
should comment out the following lines in Bio.Restriction.Restriction.py,
which are at the bottom end of the file.
This code creates the classes for the restriction enzymes (and there are
about 600 of them).
Epydoc will create the documentation for each enzyme. It is always more or
less the same. The processing is very long and the doc quite useless.

If you comment out these lines before running epydoc you will still get the
documentation once (for the super class).

Here are the lines to comment out.
Don't forget to uncomment them afterwards :-)

Sorry for not being able to help you with your problem; I have had a look
and it doesn't ring a bell.

Fred

"""
CommOnly = RestrictionBatch()    # commercial enzymes
NonComm = RestrictionBatch()     # not available commercially
for TYPE, (bases, enzymes) in typedict.iteritems() :
    #
    #   The keys are the pseudo-types TYPE (stored as type1, type2...)
    #   The names are not important and are only present to differentiate
    #   the keys in the dict. All the pseudo-types are in fact RestrictionType.
    #   These names will not be used after and the pseudo-types are not
    #   kept in the locals() dictionary. It is therefore impossible to
    #   import them.
    #   Now, if you have look at the dictionary, you will see that not all the
    #   types are present as those without corresponding enzymes have been
    #   removed by Dictionary_Builder().
    #
    #   The values are tuples which contain
    #   as first element a tuple of bases (as string) and
    #   as second element the names of the enzymes.
    #
    #   First eval the bases.
    #
    bases = tuple([eval(x) for x in bases])
    #
    #   now create the particular value of RestrictionType for the classes
    #   in enzymes.
    #
    T = type.__new__(RestrictionType, 'RestrictionType', bases, {})
    for k in enzymes :
        #
        #   Now, we go through all the enzymes and assign them their type.
        #   enzymedict[k] contains the values of the attributes for this
        #   particular class (self.site, self.ovhg,....).
        #
        newenz = T(k, bases, enzymedict[k])
        #
        #   we add the enzymes to the corresponding batch.
        #
        #   No need to verify the enzyme is a RestrictionType -> add_nocheck
        #
        if newenz.is_comm() : CommOnly.add_nocheck(newenz)
        else : NonComm.add_nocheck(newenz)
#
#   AllEnzymes is a RestrictionBatch with all the enzymes from Rebase.
#
AllEnzymes = CommOnly | NonComm
#
#   Now, place the enzymes in locals so they can be imported.
#
names = [str(x) for x in AllEnzymes]
locals().update(dict(map(None, names, AllEnzymes)))
__all__=['FormattedSeq', 'Analysis', 'RestrictionBatch','AllEnzymes','CommOnly','NonComm']+names
del k, x, enzymes, TYPE, bases, names
"""

On Friday 8 April 2005 10:53, Rafael C. Jimenez wrote:
> I am trying to get html help pages of biopython through epydoc. Epydoc
> generate the html files but there is not information inside about the
> modules, classes, and/or functions. Any idea of how to do this?
>
> My input/output:
> ----------------------------------------------------------------------
> nbn213:/System/Library/Frameworks/Python.framework/Versions/2.3/bin rc$
> ls
> epydoc epydocgui idle pydoc python
> python2.3
>
> nbn213:/System/Library/Frameworks/Python.framework/Versions/2.3/bin rc$
> ./epydoc -o /Users/rc/Desktop/epydoc/biopython --css blue
> --private-css green -v -n biopython -u http://biopython.org
> --inheritance listed Bio
>
> Importing 1 modules.
> [1/1] Importing Bio
> Building API documentation for 1 modules.
> [1/1] Building docs for Bio
> Writing HTML docs (10 files) to '/Users/rc/Desktop/epydoc/biopython'.
> [ 1/10] Writing epydoc.css
> [ 2/10] Writing Bio-module.html
> [ 3/10] Writing indices.html
> [ 4/10] Writing trees.html
> [ 5/10] Writing help.html
> [ 6/10] Writing frames.html
> [ 7/10] Writing toc.html
> [ 8/10] Writing toc-everything.html
> [ 9/10] Writing toc-Bio-module.html
> [10/10] Writing index.html

--
Frédéric Sohm
Equipe INRA U1126
"Morphogenèse du système nerveux des Chordés"
UPR 2197 DEPSN, CNRS
Institut de Neurosciences A. Fessard
1 Avenue de la Terrasse
91 198 GIF-SUR-YVETTE
FRANCE
Phone: +33 (0) 1 69 82 34 12
Fax:+33 (0) 1 69 82 34 47

From GeNuts at emw.demon.nl Fri Apr 8 07:23:54 2005
From: GeNuts at emw.demon.nl (E.M.Wielhouwer)
Date: Fri Apr 8 07:44:56 2005
Subject: [BioPython] Installation of Biopython 1.40b on Windows XP system (newbee on (bio)python and programming)
Message-ID: <6.0.0.22.2.20050408132326.01b7d618@pop3.demon.nl>

Dear All,

As a beginner in programming I picked up O'Reilly's "Learning Python"
because there is a package called BioPython with nice, free tools that
interest me as a biomedical researcher. I bought both books ("Learning
Python" and "Programming Python"); however, while following the examples in
each chapter of "Learning Python" I stumbled upon a problem that I can't
overcome, which makes me desperate given my lack of knowledge on this
subject. (Below there are also questions on the installation of Biopython on
my Windows XP machine.)

As read at page 270, I tried to follow your example on importing scripts
from outside the Python installation directory, but I failed. I hope you
might want to help me out here; I'm really stuck on this subject. Below I
describe what I've done, but first some settings of Python.

----
Python version: PythonWin 2.4.1 (#65, Mar 30 2005, 09:33:37) [MSC v.1310 32
bit (Intel)] on win32. Portions Copyright 1994-2004 Mark Hammond
(mhammond@skippinet.com.au) - see 'Help/About PythonWin' for further
copyright information.
Operating system: Windows-XP SP1
Program located at: C:\\Prgrmmr\\Python24
Specific MyScript path: C:\Prgrmmr\Python24\MyPythonpaths.pth
(this includes two lines of code:
# .pth file for My own Scripts
and:
C:\Prgrmmr\MyCode\
)
Location of my scripts: C:\\Prgrmmr\\MyCode
----

Here is my output from Python (pythonwin - IDE mode).

----
>>> import sys
>>> sys.path
['', 'C:\\WINDOWS\\System32\\python24.zip',
'C:\\Documents and Settings\\Gandalf', 'C:\\Prgrmmr\\Python24\\DLLs',
'C:\\Prgrmmr\\Python24\\lib', 'C:\\Prgrmmr\\Python24\\lib\\plat-win',
'C:\\Prgrmmr\\Python24\\lib\\lib-tk',
'C:\\Prgrmmr\\Python24\\Lib\\site-packages\\pythonwin',
'C:\\Prgrmmr\\Python24', 'C:\\Prgrmmr\\MyCode',
'C:\\Prgrmmr\\Python24\\Lib\\site-packages\\reportlab',
'C:\\Prgrmmr\\Python24\\lib\\site-packages',
'C:\\Prgrmmr\\Python24\\lib\\site-packages\\Numeric',
'C:\\Prgrmmr\\Python24\\lib\\site-packages\\PIL',
'C:\\Prgrmmr\\Python24\\lib\\site-packages\\win32',
'C:\\Prgrmmr\\Python24\\lib\\site-packages\\win32\\lib']
>>> import Prgrmmr.MyCode.mod
Traceback (most recent call last):
  File "", line 1, in ?
ImportError: No module named Prgrmmr.MyCode.mod
>>>
----

A similar result is obtained even when an __init__.py file is included in
the C:\\Prgrmmr and C:\\Prgrmmr\\MyCode directories. The __init__.py file
includes the following text:

#File: Python\__init__.py
print 'Python init'
x=1

I hope you can help me out on this (most likely I forgot something mentioned
in the book, I assume).

A few other things in Brad's Biopython installation manual also seem
difficult to me.
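
On the ImportError earlier in this message, a minimal sketch rather than a definitive diagnosis. The sys.path listing already contains C:\\Prgrmmr\\MyCode, and Python looks for top-level modules inside each sys.path entry, so a file mod.py stored there would normally be imported as plain "import mod". The dotted form can only resolve if the directory that contains a Prgrmmr package is itself on sys.path. The example below uses a temporary directory as a hypothetical stand-in for C:\Prgrmmr\MyCode, writes its own one-line mod.py, and targets a current Python 3 rather than the Python 2.4 in the thread.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for C:\Prgrmmr\MyCode: a scratch directory
# holding a single module file, mod.py.
code_dir = tempfile.mkdtemp()
with open(os.path.join(code_dir, "mod.py"), "w") as handle:
    handle.write("x = 1\n")

# A line in a .pth file has the same effect as this call:
# the directory itself becomes one of the places Python searches.
sys.path.insert(0, code_dir)
importlib.invalidate_caches()

# The module is found by its bare file name ...
import mod

print(mod.x)

# ... whereas the dotted path fails, because no directory on sys.path
# contains a package called Prgrmmr.
try:
    importlib.import_module("Prgrmmr.MyCode.mod")
    dotted_import_worked = True
except ImportError:
    dotted_import_worked = False

print(dotted_import_worked)
```

If the same holds on the Windows machine described here, no __init__.py files should be needed for this case, since mod.py is then a plain top-level module and not part of a package.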

First, it's not clear to me whether I should have additional libraries
besides the ones described in Brad's manual?

Second (manual 3.3.2, Reportlab install): how about rl_accel 0.52, PyRXP
1.05, etc., etc.? Do those also have to be installed?
www.biopython.org/docs/install/Installation.html#htoc18

Third (manual 3.4): should I install MySQL from www.mySQL.com prior to the
install of MySQLdb, or not at all?
www.biopython.org/docs/install/Installation.html#htoc20

Fourth (manual 3.4): because I've got a Windows XP OS I need to compile the
psycopg library (or do I call this differently); for that, where do I get a
(VC++ 6) compiler from (might be a stupid question)?
www.biopython.org/docs/biosql/python_biosql_basic.html#htoc6

Finally (manual 4.4): where do the documentation, tests and example code
that exist in the Biopython source file biopython-1.40b.zip go? Do they have
to be compiled, and how do I do that if 4.5 does not apply? I ask this after
reading section 5, paragraph 3, line 2 ("Tests directory inside the
distribution").
www.biopython.org/docs/install/Installation.html#htoc31

I'm stuck on this and every bit of help is welcome and well appreciated,

Eric

From letondal at pasteur.fr Fri Apr 8 09:54:32 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Fri Apr 8 09:43:53 2005
Subject: [BioPython] CFP NETTAB 2005: Network Tools and Applications in Biology / Workflows management
Message-ID:

Hi,

{Please pass the word!}

NETTAB 2005: Network Tools and Applications in Biology
Workflows management: new abilities for the biological information overflow

5-7 October, 2005, Second University of Naples
Naples, Italy

Web page : http://www.nettab.org/2005
Call for Papers: http://www.nettab.org/2005/call.html

--
Catherine Letondal -- Institut Pasteur
www.pasteur.fr/~letondal

From aurelie.bornot at free.fr Wed Apr 13 12:04:34 2005
From: aurelie.bornot at free.fr (aurelie.bornot@free.fr)
Date: Wed Apr 13 12:00:35 2005
Subject: [BioPython] mysql trouble
Message-ID: <1113408274.425d4312982ff@imp5-q.free.fr>

Hi,

I had a similar problem and I tried Rafael's script on Windows :
it didn't work either.... for me.... :'(

But I tried to add :
db.commit()
after
cursor.execute("""---""")

and it worked !!!

hope this helps !
cheers,
Aurelie

Rafael C. Jimenez rafael at nbn.ac.za wrote :

>Try to create a cursor of "conn" and then use the method execute to
>process the query.
>See one example from the MySQLdb help:
>#!/usr/bin/python
># import MySQL module
>import MySQLdb
># connect
>db = MySQLdb.connect(host="localhost", user="joe",
>passwd="secret",
>db="db56a")
># create a cursor
>cursor = db.cursor()
># execute SQL statement
>cursor.execute("""INSERT INTO animals (name, species) VALUES
>("Harry",
>"Hamster")""")
>Good luck with your project!
>Cheers,
>Rafael.

On 31/03/2005, at 12:30, pap501 at york.ac.uk wrote:

> Hi
>
> I am a masters student at the University of York working on a project
> to create a database of DNA sequences.
>
> I am trying to grab information from text files using a python script
> to insert data into a MySQL table. The problem is that although I can
> query MySQL through Python I cannot write into MySQL through Python. I
> am using the Windows platform with Python version 2.4 with Biopython
> installed and MySQL Server 4.1.
>
> I have attached my python script to this email.
> The script takes each text file in turn (although only one is listed
> in the script at the mo) and inserts the library code, Genbank code
> (primary key), TiGR code (if there is one, null otherwise) and the DNA
> sequence (string of characters). Each file is a library of ~2000 DNA
> sequences. The python script runs but does not write the information
> into my MySQL table. When I query the table in MySQL it says that the
> table is empty.
>
> Can anyone advise me what to do? Is it soemthing to do with the setup
> of MySQL and/or Python or a problem with the Python script?
>
> Any advice would be greatly appreciated.
>
> Many thanks
>
> Phil

From aurelie.bornot at free.fr Wed Apr 13 12:18:17 2005
From: aurelie.bornot at free.fr (aurelie.bornot@free.fr)
Date: Wed Apr 13 12:13:07 2005
Subject: [BioPython] Bioregistry on windows....
Message-ID: <1113409097.425d464973ac9@imp5-q.free.fr>

Hi !

I have tried hard since my first mail ("Bioregistry...doesn't work with me")
to deal with Bioregistry on Windows....
And ... I am lost...

Jan T. Kim suggested it was a network problem (thanks for your clue Jan !)
but if it is, I can't find it (I am a new programmer...)

can't it be a problem with fork()????
when we do :

>>> from Bio import db
>>> print db
>>> db.keys()
>>> sp = db["swissprot"]

the db creation uses fork()
(I saw that while trying to follow the labyrinth (for me) of the creation of
db, but I got lost and I don't remember where exactly... sorry !

Can someone help ???

Thanks
Aurelie

From aurelie.bornot at free.fr Wed Apr 13 12:24:40 2005
From: aurelie.bornot at free.fr (aurelie.bornot@free.fr)
Date: Wed Apr 13 12:18:38 2005
Subject: [BioPython] BioSQL and BioSeqDatabase
Message-ID: <1113409480.425d47c80305f@imp5-q.free.fr>

Hello,

Once again : sorry for spamming !!

Does anyone know how I can get the structure of the BioSeqDatabase of the
BioSQL module and how to create one....???

I would like to insert the table(s) of a BioSeqDatabase in my own
database....
Is it possible ???

Thanks !
Aurelie

From faheem at email.unc.edu Wed Apr 13 14:04:59 2005
From: faheem at email.unc.edu (Faheem Mitha)
Date: Wed Apr 13 13:58:28 2005
Subject: [BioPython] parsing fasta file to list of sequences
Message-ID:

Hi,

I need to parse a fasta file into a list of sequences in the following
fashion.

[['acg'], ['tac'],...]

where each entry is a different sequence. Is this easily possible with the
current parsing tools? If so, would someone be kind enough to sketch an
approach? Thanks in advance.

Faheem.

From fkauff at duke.edu Wed Apr 13 14:22:26 2005
From: fkauff at duke.edu (Frank Kauff)
Date: Wed Apr 13 14:18:33 2005
Subject: [BioPython] parsing fasta file to list of sequences
In-Reply-To:
References:
Message-ID: <1113416546.22662.1.camel@osiris.biology.duke.edu>

Hi Faheem,

On Wed, 2005-04-13 at 14:04 -0400, Faheem Mitha wrote:
> Hi,
>
> I was needing to parse a fasta file into a list of sequences in the following
> fashion.
>
> [['acg'], ['tac'],...]
>
> where each entry is a different sequence. Is this easily possible with the
> current parsing tools? If so, would somone be kind enough to sketch an
> approach? Thanks in advance.
>

[fkauff@osiris align]$ cat fasta
>one
AAAAA
>two
CCCCCC
>three
GGGGGGGG

>>> from Bio import SeqUtils
>>> fasta=SeqUtils.quick_FASTA_reader('fasta')
>>> names,seqs=zip(*fasta)
>>> names
('one', 'two', 'three')
>>> seqs
('AAAAA', 'CCCCCC', 'GGGGGGGG')

or to get exactly what you wanted

>>> seqs2=[[s[1]] for s in fasta]
>>> seqs2
[['AAAAA'], ['CCCCCC'], ['GGGGGGGG']]

Frank

> Faheem.

--
Frank Kauff
Dept. of Biology
Duke University
Box 90338
Durham, NC 27708
USA

Phone 919-660-7382
Fax 919-660-7293
Web http://www.lutzonilab.net/member/frankkauff.shtml

From faheem at email.unc.edu Wed Apr 13 14:30:11 2005
From: faheem at email.unc.edu (Faheem Mitha)
Date: Wed Apr 13 14:24:29 2005
Subject: [BioPython] parsing fasta file to list of sequences
In-Reply-To: <1113416546.22662.1.camel@osiris.biology.duke.edu>
References: <1113416546.22662.1.camel@osiris.biology.duke.edu>
Message-ID:

On Wed, 13 Apr 2005, Frank Kauff wrote:

> [fkauff@osiris align]$ cat fasta
>> one
> AAAAA
>> two
> CCCCCC
>> three
> GGGGGGGG
>
>>>> from Bio import SeqUtils
>>>> fasta=SeqUtils.quick_FASTA_reader('fasta')
>>>> names,seqs=zip(*fasta)
>>>> names
> ('one', 'two', 'three')
>>>> seqs
> ('AAAAA', 'CCCCCC', 'GGGGGGGG')
>
> or to get exactly what you wanted
>
>>>> seqs2=[[s[1]] for s in fasta]
>>>> seqs2
> [['AAAAA'], ['CCCCCC'], ['GGGGGGGG']]

Thanks Frank. That's very helpful.

Faheem.

From karbak at gmail.com Thu Apr 14 01:59:01 2005
From: karbak at gmail.com (K. Arun)
Date: Thu Apr 14 01:52:35 2005
Subject: [BioPython] Bio.PDB : model count in NMR structures
Message-ID: <162452a105041322594f3237ab@mail.gmail.com>

Hello,

In the course of extracting sequence information for a sub-set of PDB
files and writing out specific parts using the Bio.PDB classes, I ran
across an anomaly.
When parsing NMR structures, if I query the number of models present in the
structure thus :

str = p.get_structure('8tfv', '8tfv.pdb')
mdls = Selection.unfold_entities(str, 'M')

print len(mdls)

, the number of models printed always exceeds the number present in
the file by 1. A mdls[-1].get_list() yields an empty chain list. Is
this a known issue, or even, a feature ?

-arun

From thamelry at binf.ku.dk Thu Apr 14 04:45:13 2005
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Thu Apr 14 05:49:18 2005
Subject: [BioPython] Bio.PDB : model count in NMR structures
In-Reply-To: <162452a105041322594f3237ab@mail.gmail.com>
References: <162452a105041322594f3237ab@mail.gmail.com>
Message-ID: <200504141045.14080.thamelry@binf.ku.dk>

Hi Arun,

> In the course of extracting sequence information for a sub-set of PDB
> files and writing out specific parts using the Bio.PDB classes, I ran
> across an anomaly. When parsing NMR structures, if I query the number
> of models present in the structure thus :
>
> str = p.get_structure('8tfv', '8tfv.pdb')
> mdls = Selection.unfold_entities(str, 'M')
>
> print len(mdls)
>
> , the number of models printed always exceeds the number present in
> the file by 1. A mdls[-1].get_list() yields an empty chain list. Is
> this a known issue, or even, a feature ?

Thanks for reporting this bug. I've fixed it in CVS.

BTW, len(str) will give you the number of models.

Best regards,

-Thomas

--
Thomas Hamelryck, Postdoctoral researcher
Bioinformatics center
University of Copenhagen
Universitetsparken 15
Bygning 10
2100 Copenhagen, Denmark
---
http://www.binf.ku.dk/users/thamelry/

From idoerg at burnham.org Thu Apr 14 20:53:47 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu Apr 14 20:47:21 2005
Subject: [BioPython] Speakers for BOSC needed
Message-ID: <425F109B.8040507@burnham.org>

Hi all,

It's that time of year again, and BOSC 2005 will be happening on June
23-24. The more Biopython representatives, the merrier.
I will be around, but I will be dealing with my own SIG meeting, so I
will not be able to give a talk.

Is there someone who can give the BioPython "plenary"? It should be a
30-40 minute talk.

Also, there are slots for shorter talks, so if you contributed an
interesting module, or had an interesting experience with biopython you
would like to share, please submit a talk.

For those of you who do not know what BOSC is, it's the Bioinformatics
Open Source Conference, which is held as a satellite meeting of ISMB. I
highly recommend this event; it is a real eye opener with respect to the
world of open source and computational biology. More about it here:
http://open-bio.org/bosc/

Cheers,

Iddo

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
http://ffas.ljcrf.edu/~iddo
==========================
The First Automated Protein Function Prediction SIG
Detroit, MI June 24, 2005
http://ffas.burnham.org/AFP

From dalke at dalkescientific.com Fri Apr 15 06:21:39 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri Apr 15 06:20:41 2005
Subject: [BioPython] Speakers for BOSC needed
In-Reply-To: <425F109B.8040507@burnham.org>
References: <425F109B.8040507@burnham.org>
Message-ID: <6e67f5757757976119255f804d72bb3a@dalkescientific.com>

To follow up on Iddo's email, if you can't make it to BOSC this
year because you happen to be in, say, Europe you should consider
Europython. It's 3 days after BOSC and overlaps with ISMB.
It will be in lovely Gothenburg, Sweden again this year.
Jag vill vara där nu (I want to be there now).

   www.europython.org

Both BOSC and EuroPython have call-for-talks out so even if you
don't want to give the main Biopython talk you might want to give
one about your current work.
Andrew
dalke@dalkescientific.com

From wligtenberg at gmail.com Fri Apr 15 07:36:36 2005
From: wligtenberg at gmail.com (Willem Ligtenberg)
Date: Fri Apr 15 07:32:21 2005
Subject: [BioPython] Martel: IterParser.IterHeaderFooter
Message-ID:

I'm creating a parser and have specified this:

self.ip = IterParser.IterHeaderFooter(
    self.header, RecordReader.CountLines, (1,),
    self.record, RecordReader.CountLines, (1,),
    None, None, None,
    "format")

Then I can parse the records with:

for data in self.ip.iterateFile(open(fileName, "r")):

and do things with data.
But I also need the information from the header, because it specifies
with what kind of information I'm working.
How can I access that data?

Thanks in advance,

Willem Ligtenberg

From marc.saric at gmx.de Fri Apr 15 09:44:44 2005
From: marc.saric at gmx.de (Marc Saric)
Date: Fri Apr 15 09:38:23 2005
Subject: [BioPython] Martel.Parser.ParserPositionException:
Message-ID: <425FC54C.7050006@gmx.de>

Hi all,

I tried to index the following Genbank file with this simple script
(as described in the cookbook) but it failed with the following traceback.

The file can be found in:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=BA000018

including all features (SNP, CDD, MGC, HPRD, STS)

I tried with Biopython 1.3, Biopython 1.4b and the Biopython-CVS as of
2005-04-15.

The program

===snip===
#!/usr/bin/env python
from Bio import GenBank

dict_file = "ba000018_s_aureus_n315_genome.gb"
index_file = "ba000018_s_aureus_n315_genome.idx"

GenBank.index_file(dict_file, index_file)
===snap===

The Traceback:

===snip===
Traceback (most recent call last):
  File "/home/saric/data/devel/workspace/scripts/hitman/index_gb.py", line 37, in ?
    GenBank.index_file(dict_file, index_file) # FIXME: This breaks with the N315 S.aureus-genome
  File "/home/saric/transfer/source/biopython/biopython_cvs_20050415/biopython/build/lib.linux-i686-2.3/Bio/GenBank/__init__.py", line 1283, in index_file
    SimpleSeqRecord.create_flatdb([filename], indexname, indexer)
  File "/home/saric/transfer/source/biopython/biopython_cvs_20050415/biopython/build/lib.linux-i686-2.3/Bio/Mindy/SimpleSeqRecord.py", line 152, in create_flatdb
    creator.load(filename, builder = builder, fileid_info = {})
  File "/home/saric/transfer/source/biopython/biopython_cvs_20050415/biopython/build/lib.linux-i686-2.3/Bio/Mindy/BaseDB.py", line 52, in load
    for record in iterator.iterate(source, cont_handler = builder):
  File "/home/saric/transfer/source/biopython/biopython_cvs_20050415/biopython/build/lib.linux-i686-2.3/Martel/IterParser.py", line 71, in iterateFile
    raise Parser.ParserPositionException(self.start_position)
Martel.Parser.ParserPositionException: error parsing at or beyond character 5887615
===snap===

I use a x86-machine running SuSE-Linux 9.1 (kernel 2.6.5-7.147-default,
gcc version 3.3.3).

If I use a version of the file, which does not have the
feature-annotations, it gives the following traceback:

===snip===
Traceback (most recent call last):
  File "/home/saric/data/devel/workspace/scripts/hitman/index_gb.py", line 37, in ?
    GenBank.index_file(dict_file, index_file) # FIXME: This breaks with the N315 S.aureus-genome
  File "/home/saric/transfer/source/biopython/biopython_cvs_20050415/biopython/build/lib.linux-i686-2.3/Bio/GenBank/__init__.py", line 1283, in index_file
    SimpleSeqRecord.create_flatdb([filename], indexname, indexer)
  File "/home/saric/transfer/source/biopython/biopython_cvs_20050415/biopython/build/lib.linux-i686-2.3/Bio/Mindy/SimpleSeqRecord.py", line 152, in create_flatdb
    creator.load(filename, builder = builder, fileid_info = {})
  File "/home/saric/transfer/source/biopython/biopython_cvs_20050415/biopython/build/lib.linux-i686-2.3/Bio/Mindy/BaseDB.py", line 36, in load
    raise TypeError("Cannot identify file as a %s format" %
TypeError: Cannot identify file as a unknown format
===snap===

which -to me- sounds a bit strange, as it looks like a normal Genbank-file.

If someone has a clue...

Thanks in advance.

--
Bye,

Marc Saric

http://www.marcsaric.de

From mls5w at virginia.edu Fri Apr 15 13:39:03 2005
From: mls5w at virginia.edu (Michael Sierk)
Date: Fri Apr 15 13:28:07 2005
Subject: [BioPython] Alternate Conformations in PDB
Message-ID:

How does the PDB module deal with alternate conformations? It appears
to me that it is completely ignoring them, at least in the polypeptide
builder.
Thanks, Mike Sierk Here's the snippet of code: > print "id: ", id > pdb_file = '/rd0/users/mls5w/cathdoms/' + id + '.pdb' > p = PDBParser() > s = p.get_structure('1', pdb_file) # The structure > ppb=CaPPBuilder() > for pp in ppb.build_peptides(s): > print pp.get_sequence() > Here's the output: > id: 11baA0 > Seq('KESAAAKFERQHMDS', 0x62f300>) > Seq('SNYCNLMMCCRKMTQGKCKPVNTFVHESLADVKAVCSQKKVTCKNGQTNCYQSKSTMRIT > ...', ) And here's the relevant section of the PDB file: > ATOM 89 N HIS A 12 7.245 -5.906 13.618 1.00 13.83 > N > ATOM 90 CA HIS A 12 7.227 -6.425 14.964 1.00 13.71 > C > ATOM 91 C HIS A 12 7.373 -5.379 16.042 1.00 14.43 > C > ATOM 92 O HIS A 12 7.253 -5.726 17.224 1.00 14.41 > O > ATOM 93 CB HIS A 12 8.405 -7.418 15.156 1.00 11.77 > C > ATOM 94 CG HIS A 12 8.257 -8.620 14.267 1.00 9.61 > C > ATOM 95 ND1 HIS A 12 7.438 -9.677 14.614 1.00 8.63 > N > ATOM 96 CD2 HIS A 12 8.761 -8.909 13.054 1.00 9.08 > C > ATOM 97 CE1 HIS A 12 7.469 -10.573 13.653 1.00 8.45 > C > ATOM 98 NE2 HIS A 12 8.244 -10.136 12.692 1.00 9.30 > N > ATOM 99 N MET A 13 7.676 -4.145 15.632 1.00 15.20 > N > ATOM 100 CA MET A 13 7.908 -3.151 16.679 1.00 16.58 > C > ATOM 101 C MET A 13 6.710 -2.207 16.806 1.00 18.05 > C > ATOM 102 O MET A 13 6.222 -1.682 15.802 1.00 18.98 > O > ATOM 103 CB MET A 13 9.150 -2.336 16.326 1.00 16.39 > C > ATOM 104 CG MET A 13 10.462 -3.110 16.502 1.00 14.43 > C > ATOM 105 SD MET A 13 10.738 -3.774 18.125 1.00 14.79 > S > ATOM 106 CE MET A 13 10.504 -2.444 19.253 1.00 13.41 > C > ATOM 107 N ASP A 14 6.232 -2.054 18.027 1.00 18.73 > N > ATOM 108 CA ASP A 14 5.248 -1.020 18.349 1.00 20.36 > C > ATOM 109 C ASP A 14 5.471 -0.573 19.800 1.00 21.48 > C > ATOM 110 O ASP A 14 4.799 -0.915 20.761 1.00 19.90 > O > ATOM 111 CB ASP A 14 3.850 -1.536 18.122 1.00 20.82 > C > ATOM 112 CG ASP A 14 2.763 -0.518 18.409 1.00 20.73 > C > ATOM 113 OD1 ASP A 14 3.013 0.679 18.204 1.00 21.16 > O > ATOM 114 OD2 ASP A 14 1.676 -0.922 18.878 1.00 20.89 > O > ATOM 115 N SER A 15 6.531 0.266 
19.929 1.00 22.96 > N > ATOM 116 CA SER A 15 7.021 0.672 21.236 1.00 25.33 > C > ATOM 117 C SER A 15 6.142 1.732 21.895 1.00 25.92 > C > ATOM 118 O SER A 15 5.631 2.643 21.240 1.00 26.33 > O > ATOM 119 CB SER A 15 8.431 1.278 21.165 1.00 25.96 > C > ATOM 120 OG SER A 15 9.224 0.674 20.169 1.00 28.77 > O > ATOM 121 N AGLY A 16 5.838 1.525 23.178 0.50 26.35 > N > ATOM 122 N BGLY A 16 6.124 1.705 23.225 0.50 25.97 > N > ATOM 123 CA AGLY A 16 4.936 2.410 23.894 0.50 27.02 > C > ATOM 124 CA BGLY A 16 5.414 2.708 24.008 0.50 26.33 > C > ATOM 125 C AGLY A 16 4.234 1.754 25.072 0.50 27.16 > C > ATOM 126 C BGLY A 16 3.966 2.816 23.527 0.50 26.37 > C > ATOM 127 O AGLY A 16 4.212 0.540 25.212 0.50 27.16 > O > ATOM 128 O BGLY A 16 3.341 3.874 23.573 0.50 26.73 > O > ATOM 129 N AASN A 17 3.636 2.582 25.913 0.50 27.76 > N > ATOM 130 N BASN A 17 3.485 1.689 23.016 0.50 26.24 > N > ATOM 131 CA AASN A 17 2.987 2.195 27.145 0.50 28.39 > C > ATOM 132 CA BASN A 17 2.127 1.580 22.548 0.50 25.85 > C > ATOM 133 C AASN A 17 1.661 1.467 26.992 0.50 28.38 > C > ATOM 134 C BASN A 17 1.516 0.294 23.105 0.50 25.52 > C > ATOM 135 O AASN A 17 1.453 0.474 27.719 0.50 28.66 > O > ATOM 136 O BASN A 17 2.210 -0.697 23.262 0.50 25.46 > O > ATOM 137 CB AASN A 17 2.763 3.460 27.988 0.50 29.72 > C > ATOM 138 CB BASN A 17 1.985 1.538 21.029 0.50 26.31 > C > ATOM 139 CG AASN A 17 2.781 3.204 29.473 0.50 30.83 > C > ATOM 140 CG BASN A 17 0.521 1.841 20.683 0.50 26.41 > C > ATOM 145 N ASER A 18 0.750 1.899 26.127 0.50 28.07 > N > ATOM 146 N BSER A 18 0.240 0.382 23.399 0.50 25.45 > N > ATOM 147 CA ASER A 18 -0.539 1.242 25.929 0.50 27.72 > C > ATOM 148 CA BSER A 18 -0.525 -0.705 23.998 0.50 25.06 > C > ATOM 149 C ASER A 18 -1.045 1.350 24.492 0.50 27.14 > C > ATOM 150 C BSER A 18 -1.777 -1.015 23.183 0.50 24.72 > C > ATOM 151 O ASER A 18 -1.950 2.120 24.156 0.50 26.94 > O > ATOM 152 O BSER A 18 -2.368 -0.143 22.535 0.50 24.51 > O > ATOM 153 CB ASER A 18 -1.621 1.834 26.845 0.50 28.33 > C > ATOM 154 CB BSER 
A 18 -0.997 -0.266 25.397 0.50 24.82 > C > ATOM 155 OG ASER A 18 -1.392 1.513 28.209 0.50 29.30 > O > ATOM 156 OG BSER A 18 -2.132 -1.021 25.794 0.50 25.79 > O > ATOM 157 N APRO A 19 -0.468 0.545 23.598 0.50 26.85 > N > ATOM 158 N BPRO A 19 -2.203 -2.267 23.241 0.50 24.66 > N > ATOM 159 CA APRO A 19 -0.783 0.550 22.177 0.50 26.33 > C > ATOM 160 CA BPRO A 19 -3.401 -2.682 22.525 0.50 24.80 > C > ATOM 161 C APRO A 19 -2.171 0.081 21.778 0.50 25.90 > C > ATOM 162 C BPRO A 19 -4.683 -2.125 23.129 0.50 25.11 > C > ATOM 163 O APRO A 19 -2.629 0.411 20.667 0.50 25.39 > O > ATOM 164 O BPRO A 19 -5.696 -2.031 22.426 0.50 25.01 > O > ATOM 165 CB APRO A 19 0.313 -0.344 21.569 0.50 26.13 > C > ATOM 166 CB BPRO A 19 -3.325 -4.213 22.567 0.50 24.85 > C > ATOM 167 CG APRO A 19 0.703 -1.221 22.715 0.50 26.33 > C > ATOM 168 CG BPRO A 19 -1.877 -4.499 22.849 0.50 24.41 > C > ATOM 169 CD APRO A 19 0.728 -0.284 23.894 0.50 26.61 > C > ATOM 170 CD BPRO A 19 -1.454 -3.407 23.797 0.50 24.40 > C > ATOM 171 N ASER A 20 -2.876 -0.674 22.618 0.50 25.65 > N > ATOM 172 N BSER A 20 -4.653 -1.682 24.379 0.50 25.23 > N > ATOM 173 CA ASER A 20 -4.230 -1.118 22.259 0.50 25.89 > C > ATOM 174 CA BSER A 20 -5.848 -1.272 25.101 0.50 25.72 > C > ATOM 175 C ASER A 20 -5.291 -0.139 22.750 0.50 25.73 > C > ATOM 176 C BSER A 20 -6.254 0.166 24.944 0.50 26.25 > C > ATOM 177 O ASER A 20 -6.480 -0.463 22.793 0.50 26.21 > O > ATOM 178 O BSER A 20 -7.348 0.525 25.439 0.50 27.13 > O > ATOM 179 CB ASER A 20 -4.540 -2.530 22.712 0.50 25.71 > C > ATOM 180 CB BSER A 20 -5.570 -1.510 26.607 0.50 24.71 > C > ATOM 181 OG ASER A 20 -4.728 -2.715 24.098 0.50 25.54 > O > ATOM 182 OG BSER A 20 -4.669 -0.508 27.054 0.50 24.26 > O > ATOM 183 N ASER A 21 -4.886 1.088 23.074 0.50 25.75 > N > ATOM 184 N BSER A 21 -5.466 1.026 24.306 0.50 26.72 > N > ATOM 185 CA ASER A 21 -5.768 2.144 23.524 0.50 25.75 > C > ATOM 186 CA BSER A 21 -5.907 2.416 24.149 0.50 27.32 > C > ATOM 187 C ASER A 21 -6.611 2.788 22.422 0.50 26.28 > C > ATOM 
188 C BSER A 21 -6.794 2.644 22.931 0.50 27.72 > C > ATOM 189 O ASER A 21 -6.206 3.117 21.302 0.50 26.12 > O > ATOM 190 O BSER A 21 -6.464 2.331 21.778 0.50 28.30 > O > ATOM 191 CB ASER A 21 -4.993 3.250 24.251 0.50 24.73 > C > ATOM 192 CB BSER A 21 -4.704 3.354 24.068 0.50 26.81 > C > ATOM 193 OG ASER A 21 -5.885 4.221 24.785 0.50 24.45 > O > ATOM 194 OG BSER A 21 -5.089 4.561 23.427 0.50 27.93 > O > ATOM 195 N ASER A 22 -7.889 2.965 22.778 0.50 26.48 > N > ATOM 196 N BSER A 22 -7.965 3.250 23.169 0.50 27.84 > N > ATOM 197 CA ASER A 22 -8.920 3.579 21.976 0.50 26.69 > C > ATOM 198 CA BSER A 22 -8.918 3.570 22.113 0.50 27.56 > C > ATOM 199 C ASER A 22 -8.428 4.801 21.209 0.50 26.80 > C > ATOM 200 C BSER A 22 -8.537 4.831 21.343 0.50 27.43 > C > ATOM 201 O ASER A 22 -8.764 4.977 20.033 0.50 26.70 > O > ATOM 202 O BSER A 22 -9.053 5.066 20.240 0.50 27.39 > O > ATOM 203 CB ASER A 22 -10.060 4.097 22.893 0.50 26.60 > C > ATOM 204 CB BSER A 22 -10.326 3.759 22.688 0.50 27.51 > C > ATOM 205 OG ASER A 22 -9.541 4.989 23.880 0.50 26.51 > O > ATOM 206 OG BSER A 22 -10.313 4.024 24.085 0.50 26.99 > O > ATOM 207 N SER A 23 -7.640 5.631 21.902 1.00 27.00 > N > ATOM 208 CA SER A 23 -7.169 6.864 21.294 1.00 26.84 > C > ATOM 209 C SER A 23 -6.336 6.629 20.054 1.00 26.48 > C > ATOM 210 O SER A 23 -6.133 7.548 19.242 1.00 26.87 > O > ATOM 211 CB SER A 23 -6.294 7.609 22.306 1.00 28.70 > C > ATOM 212 OG SER A 23 -6.859 7.488 23.616 1.00 32.19 > O > ATOM 213 N ASN A 24 -5.845 5.405 19.860 1.00 25.76 > N > ATOM 214 CA ASN A 24 -5.059 5.083 18.679 1.00 25.09 > C > ATOM 215 C ASN A 24 -5.839 4.660 17.454 1.00 23.88 > C > ATOM 216 O ASN A 24 -5.270 4.417 16.381 1.00 23.10 > O > ATOM 217 CB ASN A 24 -4.125 3.903 19.074 1.00 28.98 > C > ATOM 218 CG ASN A 24 -2.816 4.508 19.592 1.00 31.51 > C > ATOM 219 OD1 ASN A 24 -2.010 4.982 18.768 1.00 33.38 > O > ATOM 220 ND2 ASN A 24 -2.623 4.518 20.908 1.00 32.55 > N > ATOM 221 N TYR A 25 -7.172 4.572 17.571 1.00 21.77 > N > ATOM 222 CA TYR A 25 
-8.002 4.106 16.460 1.00 19.42 > C > ATOM 223 C TYR A 25 -7.721 4.816 15.163 1.00 18.30 > C > ATOM 224 O TYR A 25 -7.485 4.187 14.119 1.00 18.39 > O > ATOM 225 CB TYR A 25 -9.493 4.208 16.852 1.00 17.65 > C > ATOM 226 CG TYR A 25 -10.419 3.819 15.695 1.00 16.80 > C > ATOM 227 CD1 TYR A 25 -10.770 2.507 15.449 1.00 15.58 > C > ATOM 228 CD2 TYR A 25 -10.926 4.825 14.879 1.00 15.90 > C > ATOM 229 CE1 TYR A 25 -11.574 2.188 14.353 1.00 16.54 > C > ATOM 230 CE2 TYR A 25 -11.738 4.520 13.803 1.00 16.61 > C > ATOM 231 CZ TYR A 25 -12.059 3.198 13.561 1.00 15.96 > C > ATOM 232 OH TYR A 25 -12.862 2.927 12.478 1.00 17.88 > O > ATOM 233 N CYS A 26 -7.720 6.149 15.146 1.00 17.44 > N > ATOM 234 CA CYS A 26 -7.559 6.858 13.875 1.00 16.90 > C > ATOM 235 C CYS A 26 -6.186 6.627 13.271 1.00 17.07 > C > ATOM 236 O CYS A 26 -5.981 6.456 12.073 1.00 16.96 > O > ATOM 237 CB CYS A 26 -7.769 8.356 14.136 1.00 16.94 > C > ATOM 238 SG CYS A 26 -9.578 8.741 14.286 1.00 15.18 > S ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++ Michael Sierk, Postdoctoral Fellow email: mls5wvirginiaedu Biochemistry & Molecular Genetics Dept. Phone (W): (434) 924-2821 University of Virginia (H): (434) 970-2268 ------------------------------------------------------------------------ ------ "There is something fascinating about science. One gets such wholesome returns of conjecture out of such a trifling investment of fact." -- Mark Twain. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++ Michael Sierk, Postdoctoral Fellow email: mls5wvirginiaedu Biochemistry & Molecular Genetics Dept. Phone (W): (434) 924-2821 University of Virginia (H): (434) 970-2268 ------------------------------------------------------------------------ ------ "There is something fascinating about science. One gets such wholesome returns of conjecture out of such a trifling investment of fact." 
-- Mark Twain.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++

From letondal at pasteur.fr Sat Apr 16 11:13:47 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Sat Apr 16 11:03:33 2005
Subject: [BioPython] suggestions for Bio.PDB
Message-ID:

Hi,

Would it be possible for the get_structure() method in PDBParser to
accept a filehandle, instead of accepting just a filename? That would
be very nice to enable different origins for data (an url, a string +
StringIO, a pipe, ...).

Another suggestion: it could be useful to keep a record of the read
structure, just in case the user would like to benefit from biopython
PDB modules, but also do some custom analysis.

My suggestions could be implemented like this:

    def get_structure(self, id, filename=None, filehandle=None):
        """Return the structure.

        Arguments:
        o id - string, the id that will be used for the structure
        o filename - name of the PDB file
        o filehandle - a filehandle on a PDB structure
        """
        self.header=None
        self.trailer=None
        # Make a StructureBuilder instance (pass id of structure as parameter)
        self.structure_builder.init_structure(id)
        # Open the file ourselves only when a filename was given
        if filename is not None:
            file=open(filename)
        elif filehandle is not None:
            file = filehandle
        else:
            return None
        pdb_lines = file.readlines()
        # Keep the raw text around for custom analysis
        self.pdb_text = "".join(pdb_lines)
        self._parse(pdb_lines)
        # Only close the file if it was opened here
        if filehandle is None:
            file.close()
        self.structure_builder.set_header(self.header)
        # Return the Structure instance
        return self.structure_builder.get_structure()

Thanks for providing these biopython PDB modules!
--
Catherine Letondal -- Institut Pasteur
www.pasteur.fr/~letondal

From biopython at maubp.freeserve.co.uk Sat Apr 16 14:44:01 2005
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat Apr 16 14:31:02 2005
Subject: [BioPython] Martel.Parser.ParserPositionException:
In-Reply-To: <425FC54C.7050006@gmx.de>
References: <425FC54C.7050006@gmx.de>
Message-ID: <42615CF1.7030306@maubp.freeserve.co.uk>

Marc Saric wrote:
> Hi all,
>
> I tried to index the following Genbank file with this simple script
> (as described in the cookbook) but it failed with the following traceback.
>
> The file can be found in:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=BA000018
>
> including all features (SNP, CDD, MGC, HPRD, STS)
>
> I tried with Biopython 1.3, Biopython 1.4b and the Biopython-CVS as of
> 2005-04-15.
>
> The program
>
> ===snip===
> #!/usr/bin/env python
> from Bio import GenBank
>
> dict_file = "ba000018_s_aureus_n315_genome.gb"
> index_file = "ba000018_s_aureus_n315_genome.idx"
>
> GenBank.index_file(dict_file, index_file)

I have just tried this on Windows 2000 with Python 2.3, and suspect the
problem may be due to an extra blank line at the very end of the file.

I went to the link you gave above, ticked the boxes for SNP, CDD, MGC,
HPRD, STS and then clicked on the "Send" to file button. This gave me a
file that was about 5.6 MB in size, which I called
ba000018_s_aureus_n315_genome.gb following your example.

Initially I got a very similar error to yours - in my case the exact
position was 5887555, corresponding to the end of the file. I would
assume the numerical difference is due to Windows versus Unix new line
characters.

If I then edited the GenBank file to remove the final blank new line,
the sample program seemed to work.

This should be a quick and easy thing for you to try, and if it does fix
the error, I think we will have identified a minor bug...
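
The manual edit described above can also be scripted. The following is a
minimal sketch, assuming the only problem is whitespace or blank lines
after the last record of the downloaded file. The names
strip_trailing_blank_lines and clean_file are hypothetical helpers (not
part of Biopython), and only the Python standard library is used:

```python
def strip_trailing_blank_lines(text):
    """Drop trailing whitespace (including blank lines), keeping one final newline."""
    stripped = text.rstrip()
    return stripped + "\n" if stripped else ""


def clean_file(path):
    """Rewrite the file at `path` without trailing blank lines.

    Returns True if the file was changed, False if it was already clean.
    """
    with open(path, "r") as handle:
        original = handle.read()
    cleaned = strip_trailing_blank_lines(original)
    if cleaned == original:
        return False
    with open(path, "w") as handle:
        handle.write(cleaned)
    return True
```

Calling clean_file("ba000018_s_aureus_n315_genome.gb") before indexing
would be one way to try the suggestion; whether it avoids the parser
error on a given Biopython version is not something this sketch can
confirm.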
Peter

From thamelry at binf.ku.dk Sun Apr 17 09:03:15 2005
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Sun Apr 17 09:37:37 2005
Subject: [BioPython] Alternate Conformations in PDB
In-Reply-To:
References:
Message-ID: <32815.83.92.3.59.1113742995.squirrel@www.binf.ku.dk>

> How does the PDB module deal with alternate conformations? It appears
> to me that it is completely ignoring them, at least in the polypeptide
> builder.

Bio.PDB's representation of alternate conformations is actually quite
sophisticated IMO (it can even deal with point mutations, i.e. for
example a Ser and a Pro residue in the same position).

OTOH CaPPBuilder indeed breaks the chain if it finds a disordered CA
atom, which is a bit questionable, especially since PPBuilder does take
disorder into account. So I'll submit a more sophisticated version of
CaPPBuilder to CVS today.

Best regards,

-Thomas

From thamelry at binf.ku.dk Sun Apr 17 09:39:54 2005
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Sun Apr 17 09:37:40 2005
Subject: [BioPython] suggestions for Bio.PDB
In-Reply-To:
References:
Message-ID: <32950.83.92.3.59.1113745194.squirrel@www.binf.ku.dk>

Hi,

> Would it be possible for the get_structure() method in PDBParser to
> accept a filehandle

You're not the first to suggest this - it's already in the CVS version,
also for generating PDB output with PDBIO.

> Another suggestion: it could be useful to keep a record of the read
> structure, just in case the user would like to benefit from biopython
> PDB modules, but also do some custom analysis.

I don't think this would be very useful, it's easy enough to just read
in the file separately.

BTW, I'm soon going to start to implement a parser for the new PDB XML
format. Any suggestions, comments, etc. regarding this are welcome.
Cheers,

-Thomas

From dalke at dalkescientific.com Sun Apr 17 12:59:54 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sun Apr 17 12:53:15 2005
Subject: [BioPython] suggestions for Bio.PDB
In-Reply-To:
References:
Message-ID: <55a2235c25137c06ff5da0b79071ee81@dalkescientific.com>

Catherine Letondal wrote:
> Would it be possible for the get_structure() method in PDBParser to
> accept a filehandle, instead of just accepting just a filename?

>     def get_structure(self, id, filename=None, filehandle=None):
>         """Return the structure.

I've also seen this sort of API done like this

    def get_structure(self, id, infile):
        if isinstance(infile, basestring):
            infile = open(infile)

so that there is only one parameter instead of two.

Andrew
dalke@dalkescientific.com

From thamelry at binf.ku.dk Sun Apr 17 13:05:43 2005
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Sun Apr 17 13:09:27 2005
Subject: [BioPython] suggestions for Bio.PDB
In-Reply-To: <55a2235c25137c06ff5da0b79071ee81@dalkescientific.com>
References: <55a2235c25137c06ff5da0b79071ee81@dalkescientific.com>
Message-ID: <33426.83.92.3.59.1113757543.squirrel@www.binf.ku.dk>

> Catherine Letondal wrote:
>> Would it be possible for the get_structure() method in PDBParser to
>> accept a filehandle, instead of just accepting just a filename?
>
>>     def get_structure(self, id, filename=None, filehandle=None):
>>         """Return the structure.
>
> I've also seen this sort of API done like this
>
>     def get_structure(self, id, infile):
>         if isinstance(infile, basestring):
>             infile = open(infile)
>
> so that there is only one parameter instead of two.

I used:

    def get_structure(self, id, file):
        ...
        if type(file)==types.StringType:
            file=open(file)

So there's indeed only one parameter. If 'file' is not a string it's
assumed to have the interface of a file handle.
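
The single-parameter pattern discussed in this thread can be sketched
outside Bio.PDB as follows. This is a minimal illustration, not the
Bio.PDB code: read_lines is a hypothetical helper, and it is written for
Python 3, where str plays the role that basestring plays in the Python 2
snippets above (accepting os.PathLike objects is an extra assumption
added here for path objects).

```python
import io
import os


def read_lines(source):
    """Return all lines from `source`, a filename or an open file handle."""
    if isinstance(source, (str, os.PathLike)):
        # We opened the file ourselves, so we also close it.
        with open(source) as handle:
            return handle.readlines()
    # Anything else is assumed to behave like a file handle;
    # the caller opened it and stays responsible for closing it.
    return source.readlines()


# An in-memory handle is accepted in the same way as a real file.
pdb_text = "HEADER    TEST\nEND\n"
lines = read_lines(io.StringIO(pdb_text))
```

Because the check uses isinstance rather than an exact type comparison,
subclasses of str are treated as filenames too, and the function never
closes a handle it did not open.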
-Thomas

From dalke at dalkescientific.com Sun Apr 17 13:50:30 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sun Apr 17 13:43:59 2005
Subject: [BioPython] suggestions for Bio.PDB
In-Reply-To: <33426.83.92.3.59.1113757543.squirrel@www.binf.ku.dk>
References: <55a2235c25137c06ff5da0b79071ee81@dalkescientific.com> <33426.83.92.3.59.1113757543.squirrel@www.binf.ku.dk>
Message-ID: <08a231012f0bd5886de1dc2b93f2835e@dalkescientific.com>

Hi Thomas,

> I used:
>
>     def get_structure(self, id, file):
>         ...
>         if type(file)==types.StringType:
>             file=open(file)

You should use
    isinstance(file, basestring)

for two reasons. As of a few Python releases ago, Python
allows unicode filenames but your check doesn't because

 >>> import types
 >>> type("") == types.StringType
True
 >>> type(u"") == types.StringType
False
 >>>

Also, some people do derive from string. The most relevant
example is the 'path' module at
   http://www.jorendorff.com/articles/python/path/
which points out in the "Warts" section

    A function that checks the type of its parameters with
    code like if type(s) is type(''): ..., will fail to recognize
    a path object as a string. This may be considered a bug in
    the function, though. The test should be written
    if isinstance(s, (str, unicode)), or in Python 2.3,
    if isinstance(s, basestring), both of which allow for
    string subclasses.

Andrew
dalke@dalkescientific.com

From thamelry at binf.ku.dk Sun Apr 17 14:03:23 2005
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Sun Apr 17 14:04:33 2005
Subject: [BioPython] suggestions for Bio.PDB
In-Reply-To: <08a231012f0bd5886de1dc2b93f2835e@dalkescientific.com>
References: <55a2235c25137c06ff5da0b79071ee81@dalkescientific.com> <33426.83.92.3.59.1113757543.squirrel@www.binf.ku.dk> <08a231012f0bd5886de1dc2b93f2835e@dalkescientific.com>
Message-ID: <33589.83.92.3.59.1113761003.squirrel@www.binf.ku.dk>

Andrew wrote:

> You should use
>     isinstance(file, basestring)
>
> for two reasons.
As of a few Python releases ago, Python
> allows unicode filenames but your check doesn't because
>
> >>> import types
> >>> type("") == types.StringType
> True
> >>> type(u"") == types.StringType
> False

Thanks for pointing this out. I've updated it in CVS.

Cheers,

-Thomas

From thamelry at binf.ku.dk Sun Apr 17 14:10:56 2005
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Sun Apr 17 14:04:35 2005
Subject: [BioPython] Alternate Conformations in PDB
In-Reply-To:
References:
Message-ID: <33606.83.92.3.59.1113761456.squirrel@www.binf.ku.dk>

> How does the PDB module deal with alternate conformations? It appears
> to me that it is completely ignoring them, at least in the polypeptide
> builder.

I've updated CaPPBuilder in CVS and it works on your example now.
I'd like to point out that PPBuilder already works in Biopython 1.4b for
what you want (since you have a full atom model).

Thanks for reporting the problem,

-Thomas

From eirik.sonneland at student.umb.no Mon Apr 18 04:30:56 2005
From: eirik.sonneland at student.umb.no (Eirik Sønneland)
Date: Mon Apr 18 04:27:12 2005
Subject: [BioPython] Name of database for NCBIWWW.qblast
Message-ID: <42637040.5030702@student.umb.no>

Dear all,

I'm writing my Master's thesis on SNP mining in BOS TAURUS (using
Biopython) as a part of the BOVINE genome sequencing project.

My question is:

Q: From the NCBI Trace Archives I get my FASTA and .scf files
(chromatograms). I want to (MEGA)BLAST these FASTA files against the
BOVINE CONTIGS using Biopython code:

b_results = NCBIWWW.qblast('blastn', '???????', f_record).read()

When reading the NCBI info site I can find some databases, but not the
BOVINE CONTIGS, on http://www.ncbi.nlm.nih.gov/BLAST/docs/netblast.html .
What is the name of this database to put into my code?

An answer to this would really help a lot! Please....:-)

Thank you !
Regards,
Eirik Sønneland

www.south-pole.net

From idoerg at burnham.org Tue Apr 19 13:29:16 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue Apr 19 13:22:44 2005
Subject: [BioPython] BOSC 2005
Message-ID: <42653FEC.2010907@burnham.org>

{Please pass the word!}

SECOND CALL FOR SPEAKERS

The 6th annual Bioinformatics Open Source Conference (BOSC'2005) is
organized by the not-for-profit Open Bioinformatics Foundation. The
meeting will take place June 23-24, 2005 in Detroit, Michigan, USA, and
is one of several Special Interest Group (SIG) meetings occurring in
conjunction with the 13th International Conference on Intelligent
Systems for Molecular Biology.
see http://www.iscb.org/ismb2005 for more information.

Because of the power of many Open Source bioinformatics packages in use
by the Research Community today, it is not too presumptuous to say that
the work of the Open Source Bioinformatics Community represents the
cutting edge of Bioinformatics in general.
This has been repeatedly
demonstrated by the quality of presentations at previous BOSC
conferences. This year, at BOSC 2005, we want to continue this tradition
of excellence, while presenting this message to a wider part of the
Research Community. Please, pass this message on to anyone you know
that is interested in Bioinformatics software.

BOSC PROGRAM & CONTACT INFO

  * Web: http://www.open-bio.org/bosc2005/
  * Online Registration: https://www.cteusa.com/iscb4/
  * Email: bosc@open-bio.org

FEES

  * Corporate : $195 ($245 after May 16th)
  * Academic  : $170 ($220 after May 16th)
  * Student   : $145 ($195 after May 16th)

SPEAKERS & ABSTRACTS WANTED

The program committee is currently seeking abstracts for talks at BOSC
2005. BOSC is a great opportunity for you to tell the community about
your use, development, or philosophy of open source software development
in bioinformatics. The committee will select several submitted abstracts
for 25-minute talks and others for shorter "lightning" talks. Accepted
abstracts will be published on the BOSC web site.

If you are interested in speaking at BOSC 2005, please send us before
April 26, 2005:

  * an abstract (no more than a few paragraphs)
  * a URL for the project page, if applicable
  * information about the open source license used for your software
    or your release plans.

Abstracts will be accepted for submission until April 26, 2005.
Abstracts chosen for presentation will be announced May 12, 2005 (before
the ISMB Early Registration Deadline).

LIGHTNING-TALK SPEAKERS WANTED!

The program committee is currently seeking speakers for the lightning
talks at BOSC 2005. Lightning talks are quick - only five minutes long -
and a great opportunity for you to give people a quick summary of your
open source project, code, idea, or vision of the future.
If you are interested in giving a lightning talk at BOSC 2005, please
send us:

  * a brief title and summary (one or two lines)
  * a URL for the project page, if applicable
  * information about the open source license used for your software
    or your release plans.

We will accept entries on-line until BOSC starts, but space for demos
and lightning talks is limited.

Hi,

I just wanted to mention a problem with the NCBIStandalone Parser when I
try to parse a blast nucleic report file with No hits found, using the
following code (reduced to the minimum)

saint-moret:~ > cat blast_pb.py
#! /local/cours/bin/python

from Bio.Blast import NCBIStandalone
import sys

blast_bin='/local/cours/bin/blastall'
blast_prog = 'blastn'
blastDB='/local/cours/bank/test_db'
query = sys.argv[1]

# blast exec
OUT, ERR = NCBIStandalone.blastall(blast_bin, blast_prog, blastDB, query)

# parser instantiation
parser=NCBIStandalone.BlastParser()
# parsing
record = parser.parse(OUT)

Execution fails with the following traceback

saint-moret:~ > ./blast_pb.py query2
Traceback (most recent call last):
  File "./blast_pb.py", line 17, in ?
    record = parser.parse(OUT)
  File "/local/cours/lib/python2.3/site-packages/Bio/Blast/NCBIStandalone.py", line 567, in parse
    self._scanner.feed(handle, self._consumer)
  File "/local/cours/lib/python2.3/site-packages/Bio/Blast/NCBIStandalone.py", line 95, in feed
    self._scan_header(uhandle, consumer)
  File "/local/cours/lib/python2.3/site-packages/Bio/Blast/NCBIStandalone.py", line 129, in _scan_header
    read_and_call_while(uhandle, consumer.noevent, blank=1)
  File "/local/cours/lib/python2.3/site-packages/Bio/ParserSupport.py", line 314, in read_and_call_while
    line = safe_readline(uhandle)
  File "/local/cours/lib/python2.3/site-packages/Bio/ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."
SyntaxError: Unexpected end of stream.

Running blast manually gives me the following results

saint-moret:~ > blastall -p blastn -d /local/bank/hpylori/blast2/hp26695_genome -i query2
BLASTN 2.2.10 [Oct-19-2004]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= no_hits
         (120 letters)

Database: Hpylori
           1 sequences; 1,667,867 total letters

Searching[NULL_Caption] WARNING: [000.000] no_hits: SetUpBlastSearch failed.
[NULL_Caption] ERROR: [000.000] no_hits: BLASTSetUpSearch: Unable to calculate Karlin-Altschul params, check query sequence
[NULL_Caption] ERROR: [000.000] no_hits: BLASTSetUpSearch: Unable to calculate Karlin-Altschul params, check query sequence
done

 ***** No hits found ******

  Database: Hpylori
    Posted date: May 3, 2004 3:55 AM
  Number of letters in database: 1,667,867
  Number of sequences in database: 1

Lambda     K      H
   -1.00    -1.00    -1.00


the parser works perfectly with blastn reports with hits.
we discovered the problem while trying to work on a blast report of multiple sequences
versus one bank using the NCBIStandalone.iterator.

some details
running Python 2.3.4, Biopython 1.30 and blastall 2.2.10 Release on MacOSX

thanks for any information

Eric

From aurelie.bornot at free.fr  Mon Apr 25 05:42:29 2005
From: aurelie.bornot at free.fr (aurelie.bornot@free.fr)
Date: Mon Apr 25 05:35:59 2005
Subject: [BioPython] Big GenBank files
Message-ID: <1114422149.426cbb8506bc7@imp4-q.free.fr>

Hi !

I am trying to write a program that automatically blasts a set of
sequences against the GenBank sequences. I would also like to retrieve (again
automatically) the most interesting GenBank files, to keep information
about them in my database.

But I've got a problem (again..sorry ! :'( ) :

I have 2*512 MB of RAM, but it seems that my computer can't deal with 'big'
GenBank files like 'BA000028.3' (7 M) or 'AP008212' (37 M), for example :

fichier = open('AP008212.fasta',"w")
record_parser = GenBank.RecordParser()
ncbi_dict = GenBank.NCBIDictionary ('nucleotide','genbank',parser=record_parser)
gb_record = ncbi_dict['AP008212']
fichier.close()

...never ends...
I suppose it is because the files are too big for the algo of the
transformation in registry....

For 'AP008212' (37 M) :
ncbi_dict = GenBank.NCBIDictionary ('nucleotide','fasta')
doesn't work either...

I tried to understand how all this works, to retrieve the header of the
connection (maybe it is possible to give up the download of these
big files...), but I am not very used to Python or to anything concerning
connections...
I have been on this problem for 3 days... and I am lost... I don't know
what to do...

Could someone help me ?!
Thanks !
Aurelie

From biopython at maubp.freeserve.co.uk  Mon Apr 25 07:16:35 2005
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon Apr 25 07:15:37 2005
Subject: [BioPython] Big GenBank files
In-Reply-To: <1114422149.426cbb8506bc7@imp4-q.free.fr>
References: <1114422149.426cbb8506bc7@imp4-q.free.fr>
Message-ID: <426CD193.5030801@maubp.freeserve.co.uk>

aurelie.bornot@free.fr wrote:
> Hi !
>
> I am trying to make a program that do automatically blasts of a base of
> sequences against the genbank sequences. And I would like to retrieve (also
> automatically) the most interesting GenBank files..... to keep informations
> about them in my database.
>
> But I've got a problem (again..sorry ! :'( ) :
>
> I've 2*512 Mega of RAM but it seems that my computer can't deal with 'big'
> GenBank files like 'BA000028.3'(7 M) or 'AP008212' (37 M)

Have a look at bug 1747 which should help with reading large GenBank
files (however I'm not sure if it will affect GenBank.NCBIDictionary)

http://bugzilla.open-bio.org/show_bug.cgi?id=1747

The test code used was based on Section 3.4.2 of the Tutorial, Parsing
GenBank records:

http://www.biopython.org/docs/tutorial/Tutorial.html#htoc35

See also the discussion last month:-

http://www.biopython.org/pipermail/biopython/2005-March/002568.html

Peter

--
PhD Student
MOAC Doctoral Training Centre
University of Warwick, UK

From julie.bernauer at ibbmc.u-psud.fr  Mon Apr 25 07:47:50 2005
From: julie.bernauer at ibbmc.u-psud.fr (Julie Bernauer)
Date: Mon Apr 25 07:45:48 2005
Subject: [BioPython] Psi blast
Message-ID: <1114429670.9366.8.camel@fifi.ibbmc.u-psud.fr>

Hello !

I hope this question hasn't been answered yet. Please excuse me in case
it has.

I'd like to get the result of phi-blast after 5 runs (html page):
Here is a piece of my code :

while 1:
    cur_record=s_iterator.next()
    if cur_record is None:
        break
    f_record=Fasta.Record()
    f_record.title=cur_record.accessions[0]
    f_record.sequence=cur_record.sequence
    print f_record.title
    b_results=NCBIWWW.qblast('blastp','swissprot',f_record)
    save_file=open('resultats/blast/'+id+'.html','w')
    blast_results=b_results.read()
    save_file.write(blast_results)
    save_file.close()

I'd like to have something like :
b_results=NCBIWWW.qblast('blastp','swissprot',f_record,run_psiblast=5)

Would you please help me ?

Thanks in advance

--
Julie BERNAUER
Equipe de Génomique Structurale http://www.genomics.eu.org
IBBMC - UMR 8619 - U.P.S. Bât.430 Tel. : +33 1 69 15 31 57
91405 Orsay - FRANCE Fax. : +33 1 69 85 37 15

From mdehoon at ims.u-tokyo.ac.jp  Tue Apr 26 04:25:59 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Tue Apr 26 04:17:04 2005
Subject: [BioPython] NCBIStandalone Parser problem
In-Reply-To: <20050421091119.GA12744@hebus.sis.pasteur.fr>
References: <20050421091119.GA12744@hebus.sis.pasteur.fr>
Message-ID: <426DFB17.5010409@ims.u-tokyo.ac.jp>

Could you try this again with Biopython version 1.40b and see if the problem
still occurs there? If so, could you send me the query that you are using so I
can replicate this error?

--Michiel.

edeveaud@pasteur.fr wrote:
> Hi,
>
> I just wanted to mention a problem with the NCBIStandalone Parser
>
> when I try to parse a blast nucleic report file with No hits found
> using the following code (reduced to the minimum)
>
>
> saint-moret:~ > cat blast_pb.py
> #! /local/cours/bin/python
> from Bio.Blast import NCBIStandalone
> import sys
>
> blast_bin='/local/cours/bin/blastall'
> blast_prog = 'blastn'
> blastDB='/local/cours/bank/test_db'
> query = sys.argv[1]
>
> # blast exec
> OUT, ERR = NCBIStandalone.blastall(blast_bin, blast_prog, blastDB, query)
>
> # parser instanciation
> parser=NCBIStandalone.BlastParser()
>
> # parsing
> record = parser.parse(OUT)
>
>
>
> execution fails with the following traceback
> saint-moret:~ > ./blast_pb.py query2
> Traceback (most recent call last):
>   File "./blast_pb.py", line 17, in ?
>     record = parser.parse(OUT)
>   File "/local/cours/lib/python2.3/site-packages/Bio/Blast/NCBIStandalone.py",
> line 567, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/local/cours/lib/python2.3/site-packages/Bio/Blast/NCBIStandalone.py",
> line 95, in feed
>     self._scan_header(uhandle, consumer)
>   File "/local/cours/lib/python2.3/site-packages/Bio/Blast/NCBIStandalone.py",
> line 129, in _scan_header
>     read_and_call_while(uhandle, consumer.noevent, blank=1)
>   File "/local/cours/lib/python2.3/site-packages/Bio/ParserSupport.py", line
> 314, in read_and_call_while
>     line = safe_readline(uhandle)
>   File "/local/cours/lib/python2.3/site-packages/Bio/ParserSupport.py", line
> 411, in safe_readline
>     raise SyntaxError, "Unexpected end of stream."
> SyntaxError: Unexpected end of stream.
>
> blast manualy run give me the following results
>
>
>
> saint-moret:~ > blastall -p blastn -d /local/bank/hpylori/blast2/hp26695_genome
> -i query2
> BLASTN 2.2.10 [Oct-19-2004]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= no_hits
> (120 letters)
>
> Database: Hpylori
> 1 sequences; 1,667,867 total letters
>
> Searching[NULL_Caption] WARNING: [000.000] no_hits: SetUpBlastSearch failed.
> [NULL_Caption] ERROR: [000.000] no_hits: BLASTSetUpSearch: Unable to
> calculate Karlin-Altschul params, check query sequence
> [NULL_Caption] ERROR: [000.000] no_hits: BLASTSetUpSearch: Unable to
> calculate Karlin-Altschul params, check query sequence
> done
>
> ***** No hits found ******
>
> Database: Hpylori
> Posted date: May 3, 2004 3:55 AM
> Number of letters in database: 1,667,867
> Number of sequences in database: 1
>
> Lambda     K      H
>    -1.00    -1.00    -1.00
>
>
>
>
> the parser works perfectly with blastn reports with hits.
> we discover the problem, trying to work on a blast report of multiple sequences
> versus one bank using the NCBIStandalone.iterator.
>
> some details
> running Python 2.3.4, Biopython 1.30 and blastall 2.2.10 Release on MacOSX
>
> thank's for any information
>
> Eric
>
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
>
>

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From gvwilson at cs.utoronto.ca  Tue Apr 26 13:34:35 2005
From: gvwilson at cs.utoronto.ca (Greg Wilson)
Date: Tue Apr 26 15:40:20 2005
Subject: [BioPython] ann: Data Crunching
Message-ID:

Readers of this newsgroup might be interested in a new book on data
crunching, which is available from Amazon:

http://www.amazon.com/exec/obidos/ASIN/0974514071

or directly from the Pragmatic Programmers:

http://www.pragmaticprogrammer.com/titles/gwd/index.html

The book covers basic text processing, regular expressions, XML
manipulation, binary data handling, and the 10% of relational databases
that every programmer should know. Most of the examples are in Python
(though Unix command line tools, XSL, and SQL are in there as well).
The book is aimed at beginner and intermediate programmers; I hope that
people with a background in science and engineering will find it
particularly useful.

Thanks,
Greg Wilson (its grinning author)

From pap501 at york.ac.uk  Tue Apr 26 17:04:27 2005
From: pap501 at york.ac.uk (pap501@york.ac.uk)
Date: Tue Apr 26 16:57:38 2005
Subject: [BioPython] ann: Data Crunching
In-Reply-To:
References:
Message-ID:

No thanks

there was an email circulated before about books not to buy and yours is
one of them

On Apr 26 2005, Greg Wilson wrote:

> Readers of this newsgroup might be interested in a new book on data
> crunching, which is available from Amazon:
>
> http://www.amazon.com/exec/obidos/ASIN/0974514071
>
> or directly from the Pragmatic Programmers:
>
> http://www.pragmaticprogrammer.com/titles/gwd/index.html
>
> The book covers basic text processing, regular expressions, XML
> manipulation, binary data handling, and the 10% of relational databases
> that every programmer should know. Most of the examples are in Python
> (though Unix command line tools, XSL, and SQL are in there as well).
> The book is aimed at beginner and intermediate programmers; I hope that
> people with a background in science and engineering will find it
> particularly useful.
>
> Thanks,
> Greg Wilson (its grinning author)

From gvwilson at cs.utoronto.ca  Tue Apr 26 17:20:30 2005
From: gvwilson at cs.utoronto.ca (Greg Wilson)
Date: Tue Apr 26 17:17:42 2005
Subject: [BioPython] ann: Data Crunching
In-Reply-To:
References:
Message-ID:

> Phil [pap501@york.ac.uk] wrote:
> No thanks
>
> there was an email circulated before about books not to buy and yours is
> one of them

Wow, that was quick --- it's only been on the market two weeks. I'd be
grateful if you could point me at the "don't buy" message; I'd obviously
be interested in finding out what I got wrong.

Thanks,
Greg

>> Readers of this newsgroup might be interested in a new book on data
>> crunching, which is available from Amazon:
>>
>> http://www.amazon.com/exec/obidos/ASIN/0974514071
>>
>> or directly from the Pragmatic Programmers:
>>
>> http://www.pragmaticprogrammer.com/titles/gwd/index.html

From pap501 at york.ac.uk  Tue Apr 26 17:47:08 2005
From: pap501 at york.ac.uk (pap501@york.ac.uk)
Date: Tue Apr 26 17:40:30 2005
Subject: Fwd: Re: [BioPython] ann: Data Crunching
Message-ID:

Dear all

Sorry for the previous email.
A friend of mine thought it would be funny to send an email while I was
away from my computer.
I have apologised to Greg and hopefully he accepts my apologies

---------- Forwarded message ----------
From: pap501@york.ac.uk
To: Greg Wilson
Subject: Re: [BioPython] ann: Data Crunching
Date: 26 Apr 2005 22:42:40 +0100

Dear Greg

Sorry, my friend thought it would be funny to send an email while I wasn't
at my computer!
I am doing my MRes in bioinformatics at York University, England, and would be
interested in a book like yours. Do you have a link to more details about
your book?

I am rather embarrassed that you forwarded the email to biopython, as some of
my lecturers are on the email list!

Best wishes
Phil

On Apr 26 2005, Greg Wilson wrote:

> > Phil [pap501@york.ac.uk] wrote:
> > No thanks
> >
> > there was an email circulated before about books not to buy and yours
> > is one of them
>
> Wow, that was quick --- it's only been on the market two weeks. I'd be
> grateful if you could point me at the "don't buy" message; I'd obviously
> be interested in finding out what I got wrong.
>
> Thanks,
> Greg
>
> >> Readers of this newsgroup might be interested in a new book on data
> >> crunching, which is available from Amazon:
> >>
> >> http://www.amazon.com/exec/obidos/ASIN/0974514071
> >>
> >> or directly from the Pragmatic Programmers:
> >>
> >> http://www.pragmaticprogrammer.com/titles/gwd/index.html

From gvwilson at cs.utoronto.ca  Tue Apr 26 18:31:39 2005
From: gvwilson at cs.utoronto.ca (Greg Wilson)
Date: Tue Apr 26 18:27:34 2005
Subject: Fwd: Re: [BioPython] ann: Data Crunching
In-Reply-To:
References:
Message-ID:

> Dear all
> Sorry for the previous email.
> A friend of mine thought it would be funny to send an email while I was
> away from my computer.

Don't ya hate it when that happens? ;-) Sorry for replying here before
checking with Phil --- hope his friend gets sent to bed without supper.

Thanks,
Greg

From fkauff at duke.edu  Wed Apr 27 10:14:12 2005
From: fkauff at duke.edu (Frank Kauff)
Date: Wed Apr 27 10:09:34 2005
Subject: [BioPython] drawing trees
Message-ID: <1114611252.4523.12.camel@osiris.biology.duke.edu>

Hi all,

it's a little bit off topic, but anyway: I need to print phylogenetic
trees. drawtree.py from Rick Ree does a very nice job generating pdf or
eps files, but the trees are always printed on only one page. If the
trees are large, squeezing them onto one page makes the output
unreadable. Does anybody know an easy way to manipulate an eps or pdf file,
or to tell my printer somehow to print a file stretched over multiple
pages?

Thanks,
Frank

From karthik at rishi.serc.iisc.ernet.in  Thu Apr 28 06:34:08 2005
From: karthik at rishi.serc.iisc.ernet.in (Karthik Raman)
Date: Thu Apr 28 05:55:45 2005
Subject: [BioPython] Help with Biopython BLAST
Message-ID:

Hi

How do I access query_letters from Blast Headers?

I've been trying to access it through b_record (from the standard example in
the cookbook):

b_record.header.query_letters

What mistake am I making?

Regards

Karthik

********************************************************

KARTHIK RAMAN
Graduate Research Student
Supercomputer Education and Research Centre
Indian Institute of Science
Bangalore - 560 012

Phone: +91-80-22932469 (Ext 24)
       +91-80-23601409 (Ext 24)
       +91-98862-22404 (Mobile)

E-mail : karthik AT rishi.serc.iisc.ernet.in
Webpage : http://rishi.serc.iisc.ernet.in/~karthik/
Blog : karthikraman.blogspot.com

*******************************************************

From mdehoon at ims.u-tokyo.ac.jp  Thu Apr 28 08:05:23 2005
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Thu Apr 28 07:53:21 2005
Subject: [BioPython] Help with Biopython BLAST
In-Reply-To:
References:
Message-ID: <4270D183.90403@ims.u-tokyo.ac.jp>

You can check b_record.__dict__.keys() to find out the members of b_record.
It appears that you should use b_record.query_letters instead of
b_record.header.query_letters.

--Michiel.

Karthik Raman wrote:
> Hi
>
> How do I access query_letters from Blast Headers?
>
> I've been trying to access it as b_record (from the standard example in
> the cookbook):
>
> b_record.header.query_letters
>
> What is the mistake I am doing?
>
> Regards
>
> Karthik
>
> ********************************************************
>
> KARTHIK RAMAN
> Graduate Research Student
> Supercomputer Education and Research Centre
> Indian Institute of Science
> Bangalore - 560 012
>
> Phone: +91-80-22932469 (Ext 24)
> +91-80-23601409 (Ext 24)
> +91-98862-22404 (Mobile)
>
> E-mail : karthik AT rishi.serc.iisc.ernet.in
> Webpage : http://rishi.serc.iisc.ernet.in/~karthik/
> Blog : karthikraman.blogspot.com
>
> *******************************************************

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From MAG at Stowers-Institute.org  Fri Apr 29 10:40:07 2005
From: MAG at Stowers-Institute.org (Goel, Manisha)
Date: Fri Apr 29 10:35:08 2005
Subject: [BioPython] Reading in many multiple seq alignmnet for substitution matrices
Message-ID: <200504291433.j3TEXAfY006631@portal.open-bio.org>

Hi,

I have been trying to use the SubsMat module to make substitution
matrices for my data.
All seems to be fine, except that I need help with the following two
issues:

1) My data is in the form of many multiple sequence alignments. Can I
read all these MSAs in sequentially and generate a common replace_info
at the end?
2) What formats other than Clustalw can be read in for an MSA?

-Thanks for all the help.

-Manisha Goel
Post-doc Associate
1000E 50th St.
Kansas city MO-64110
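
One possible way to approach question 1 above, sketched in plain Python rather than with the SubsMat API (whose exact calls are not shown in this thread): keep a single dictionary of residue-pair counts and update it once per alignment. The function name count_replacements, the toy alignments, the gap characters skipped, and the convention of storing each pair in sorted order are all assumptions made for this illustration, not part of Biopython.

```python
from collections import defaultdict
from itertools import combinations


def count_replacements(alignment, counts=None, skip="-."):
    """Add the residue pairs seen in one alignment to a running tally.

    `alignment` is a list of equal-length strings, one per aligned sequence
    (a hypothetical stand-in for whatever an alignment parser returns).
    Each pair is stored with its residues sorted, so ('A', 'T') and
    ('T', 'A') share one entry. Pairs involving a gap character are ignored.
    """
    if counts is None:
        counts = defaultdict(int)
    for column in zip(*alignment):              # walk the alignment column by column
        for a, b in combinations(column, 2):    # every pair of sequences in the column
            if a in skip or b in skip:
                continue
            counts[tuple(sorted((a, b)))] += 1
    return counts


# Toy alignments standing in for alignments read from separate files.
msa_1 = ["ACG-T", "ACGAT", "TCGAT"]
msa_2 = ["AC", "AG"]

# Feed the same dictionary to every call so the counts are pooled.
replace_info = None
for msa in (msa_1, msa_2):
    replace_info = count_replacements(msa, replace_info)
```

After the loop, replace_info holds the pooled counts from every alignment. In real use the two hard-coded lists would be replaced by alignments parsed from files; whether the resulting dictionary can be passed to SubsMat unchanged is not confirmed here.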