From italo.maia at gmail.com Sun Jul 1 19:12:24 2007
From: italo.maia at gmail.com (Italo Maia)
Date: Sun, 1 Jul 2007 20:12:24 -0300
Subject: [BioPython] Error installing biopython...
Message-ID: <800166920707011612j64d0f9bbp732f2ca9c14e7cbf@mail.gmail.com>

Did anyone else get this error trying to install biopython under ubuntu 7?

Instalando python-biopython (1.42-2) ...
Compiling /var/lib/python-support/python2.5/Bio/Wise/dnal.py ...
  File "/var/lib/python-support/python2.5/Bio/Wise/dnal.py", line 5
    from __future__ import division
SyntaxError: from __future__ imports must occur at the beginning of the file

--
"A arrogância é a arma dos fracos."
===========================
Italo Moreira Campelo Maia
Ciência da Computação - UECE
Desenvolvedor WEB
Programador Java, Python
Meu blog ^^ http://eusouolobomal.blogspot.com/
===========================

From idoerg at gmail.com Sun Jul 1 19:19:40 2007
From: idoerg at gmail.com (I. Friedberg)
Date: Sun, 1 Jul 2007 16:19:40 -0700
Subject: [BioPython] Error installing biopython...
In-Reply-To: <800166920707011612j64d0f9bbp732f2ca9c14e7cbf@mail.gmail.com>
References: <800166920707011612j64d0f9bbp732f2ca9c14e7cbf@mail.gmail.com>
Message-ID:

For 7.04 there is a patch:
https://bugs.launchpad.net/ubuntu/+source/python-biopython/+bug/118771

I don't understand why they did not put the bugfix version in the Ubuntu
repositories though.

Iddo

On 7/1/07, Italo Maia wrote:
>
> Did anyone else had this error trying to install biopython under ubuntu 7?
>
> Instalando python-biopython (1.42-2) ...
> Compiling /var/lib/python-support/python2.5/Bio/Wise/dnal.py ...
> File "/var/lib/python-support/python2.5/Bio/Wise/dnal.py", line 5
> from __future__ import division
> SyntaxError: from __future__ imports must occur at the beginning of the
> file
>
>
> --
> "A arrogância é a arma dos fracos."
>
> ===========================
> Italo Moreira Campelo Maia
> Ciência da Computação - UECE
> Desenvolvedor WEB
> Programador Java, Python
>
> Meu blog ^^ http://eusouolobomal.blogspot.com/
>
> ===========================
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

--
I. Friedberg
"The only problem with troubleshooting is that sometimes trouble shoots back."

From dalloliogm at gmail.com Tue Jul 3 11:24:05 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 3 Jul 2007 17:24:05 +0200
Subject: [BioPython] I don't understand why SeqRecord.feature is a list
In-Reply-To: <4683CFA0.1050905@maubp.freeserve.co.uk>
References: <5aa3b3570706120407x7bc29550j26bd8c7a5f4ae02b@mail.gmail.com> <920D9BCD-ADC3-4704-AA97-2AE8089F02CE@mitre.org> <466EAE8D.2090609@maubp.freeserve.co.uk> <5aa3b3570706280645s6744b6fdn2cce34abb6883155@mail.gmail.com> <4683CFA0.1050905@maubp.freeserve.co.uk>
Message-ID: <5aa3b3570707030824w605ad101y8f58319d0b0cb0e5@mail.gmail.com>

2007/6/28, Peter :
> Giovanni Marco Dall'Olio wrote:
> > Hi!
> > In principle, when I can't decide which keys to use for a dictionary,
> > I just take simple numerical integers as keys, and it works quite
> > well.
> > It simplifies testing/debugging/organization a lot and I can decide
> > the meaning of every key later (so it's better for dictionaries which
> > have to contain very heterogeneous data).
>
> It sounds like you don't need/want a dictionary at all. If you are
> assigning increasing numerical integers "keys", then why not just use
> the list of features directly?

That is not true: with a list it is more complicated to add/remove
elements if not in the last position.
For instance, if I remove the first element in a list, all the other
elements shift a position and I risk losing all the references I've made
to them.
Also, I don't need the sort/reverse and other operators.
Moreover, the cost of an insert into a dictionary in Python is of the
order of O(1) for every position, while for lists it is not constant
unless the insert is at the last position (sorry, I can't find a
reference for this).
In other languages it could seem strange to use a list to store data,
because traditionally the cost of retrieving an element from a list is on
the order of O(n) (this is not the case in Python).

Let's have a look at your example:
- we have a list of features like this:
list_features = ['GTAAGT', 'TACTAAC', 'TGT']

- then we specify the meaning of these features in another dictionary:
splicesignal5 = list_features[0]
polypirimidinetract = list_features[1]
splicesignal3 = list_features[2]

python passes the variables by value: this means that if you change
one of the values in the list_features list, then you have to update
all the variables which refer to it manually.

>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
>>> splicesignal5 = list_features[0]
>>> print splicesignal5
'GTAAGT'
>>> list_features[0] = 'TTTTTTT'
>>> print splicesignal5
'GTAAGT' # wrong!
>>> splicesignal5 = list_features[0] # have to update all the variables which refer to list_features manually
>>> print splicesignal5
'TTTTTTT'

This is why I prefer to save the positions of the features instead of
their values:
>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
>>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract' : [1], 'splicesignal3': [2]}
>>> def get_feature(feature_name): return list_features[dict_aliases[feature_name]] # (this code doesn't work: each alias maps to a list of positions, not to a single index)

However, I think it's better to save the features in a dictionary
instead of a list, for the reasons I was explaining before, in this way:
>>> dict_features = {0: 'GTAAGT', 1: 'TACTAAC', 2: 'TGT'} # features are in a dictionary instead of a list
>>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract' : [1], 'splicesignal3': [2]}
>>> def get_feature(feature_name): return map(dict_features.get, dict_aliases[feature_name])

Another option could be to use references to memory positions instead
of dictionary keys, but I don't know how to implement this in python,
and I'm not sure it would be computationally convenient.

> e.g. assuming record is a SeqRecord object:
>
> first_feature = record.features[0]
> second_feature = record.features[1]
> third_feature = record.features[2]
> etc
>
> > I'm not sure I have understood the example you gave me on
> > http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features
> > , but it seems to work in a way similar to what I was saying before:
> > it saves all the features in a list (or is it a dictionary?) and
> > access them later by their positions.
>
> That example stored integers (indices in the features list) in a
> dictionary using either the Locus tag, GI numbers or GeneID (e.g. keys
> like "NEQ010", "GI:41614806" or "GeneID:2654552").
> > The point being if you know in advance you want to find individual > feature on the basis of their locus tag (for example), rather than the > order in the file, then I would map the locus tag strings to positions > in the list. > > e.g. > > locus_tag_cds_index = \ index_genbank_features(gb_record,"CDS","locus_tag") > my_feature = gb_record.features[locus_tag_index["NEQ010"]] uh ok.. but how is the gb_record.features dictionary structured? Which keys does it have? And what happens to these dictionaries (let's say, locus_tag_cds_index), when a feature from gb_record.features is deleted or modified? > You could also build a dictionary which maps from the locus tag directly > to the associated SeqFeature objects themselves. > > > Not to be silly but... how do you represent a gene with its > > transcripts/exons/introns structure with biopython? With SeqRecord and > > SeqFeature objects? > > If you loaded a GenBank or EMBL file using SeqIO you get one SeqRecord > object (assuming there is only one LOCUS line in the file) which > contains a list of SeqFeature objects which in turn may contain > sub-features. > > I work with bacteria so I don't have much experience with dealing with > sub-features in a SeqFeature object. I've never worked with SeqFeature and GenBank files (I have to work with GFF/GTF for annotations), but I will try to see how does it works. Thank you very much for these replies! I was really hoping to have this kind of feedback. Cheers! :) > > Peter > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From skhadar at gmail.com Tue Jul 3 22:52:15 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Wed, 4 Jul 2007 08:22:15 +0530 Subject: [BioPython] Biopython Installation on CentOS Message-ID: Dear All, We need to install Biopython and its dependencies (egenix-mx-base-2.0.6, Numeric-24.2 ) on our webserver. 
Machine Details : CentOS 4, x86_64, When we are trying to install these packages using 'python setup.py build'. It end up with the same error. "error: command 'gcc' failed with exit status 1". Please help us to solve this. Many thanks in advance, Shameer Khadar From sbassi at gmail.com Tue Jul 3 22:59:48 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 3 Jul 2007 23:59:48 -0300 Subject: [BioPython] Biopython Installation on CentOS In-Reply-To: References: Message-ID: On 7/3/07, Shameer Khadar wrote: > When we are trying to install these packages using 'python setup.py > build'. It end up with the same error. "error: command 'gcc' failed with > exit status 1". Did you install "build-essencial"? -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From sbassi at gmail.com Tue Jul 3 23:22:56 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 4 Jul 2007 00:22:56 -0300 Subject: [BioPython] Biopython Installation on CentOS In-Reply-To: References: Message-ID: On 7/4/07, Shameer Khadar wrote: > No I am not aware of this 'build essencial'. > Can you pls tell me where I can download this ? I am sorry, this package is for compiling in Ubuntu, not in CentOS. My mistake. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython at maubp.freeserve.co.uk Wed Jul 4 04:40:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 04 Jul 2007 09:40:58 +0100 Subject: [BioPython] Biopython Installation on CentOS In-Reply-To: References: Message-ID: <468B5D1A.2050800@maubp.freeserve.co.uk> Sending again, to the mailing list this time! Shameer wrote: > When we are trying to install these packages using 'python setup.py > build'. It end up with the same error. "error: command 'gcc' failed > with exit status 1". Could you find the command (run by setup.py) that caused this gcc error? i.e. 
somewhere near the end of the output from 'python setup.py build' (I'm hoping to see which part of Biopython is causing the problem) What version of python do you have installed? What version of gcc do you have installed? Do you have flex installed? Have you been able to install any other python modules from source? (e.g. some of the recommended packages or dependencies like Numeric?) Peter From biopython at maubp.freeserve.co.uk Thu Jul 5 05:33:26 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 05 Jul 2007 10:33:26 +0100 Subject: [BioPython] I don't understand why SeqRecord.feature is a list In-Reply-To: <5aa3b3570707030824w605ad101y8f58319d0b0cb0e5@mail.gmail.com> References: <5aa3b3570706120407x7bc29550j26bd8c7a5f4ae02b@mail.gmail.com> <920D9BCD-ADC3-4704-AA97-2AE8089F02CE@mitre.org> <466EAE8D.2090609@maubp.freeserve.co.uk> <5aa3b3570706280645s6744b6fdn2cce34abb6883155@mail.gmail.com> <4683CFA0.1050905@maubp.freeserve.co.uk> <5aa3b3570707030824w605ad101y8f58319d0b0cb0e5@mail.gmail.com> Message-ID: <468CBAE6.3030306@maubp.freeserve.co.uk> Giovanni Marco Dall'Olio wrote: > Let's have a look at your example: > - we have a list of features like this: > list_features = ['GTAAGT', 'TACTAAC', 'TGT'] > > - then we specify the meaning of these features in another dictionary: > splicesignal5 = list_features[0] > polypirimidinetract = list_features[1] > splicesignal3 = list_features[2] > > python passes the variables by value: this means that if you change > one of the values in the list_features list, then you have to update > all the variables which refer to it manually. > >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] >>>> splicesignal5 = list_features[0] >>>> print splicesignal5 > 'GTAAGT' >>>> list_features[0] = 'TTTTTTT' >>>> print splicesignal5 > 'GTAAGT' # wrong! 
>>>> splicesignal5 = list_features[0] # have to update all the
> variables which refer to list_features manually
>>>> print splicesignal5'
> 'TTTTTTT'
>
> This is why I prefer to save the positions of the features instead of
> their values:
>>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
>>>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract' : [1],
> 'splicesignal3': [2]}
>>>> def get_feature(feature_name): return
> list_features[dict_aliases[feature_name]] # (this code doesn't work)
...
> Another option could be to use references to memory positions instead
> of dictionary keys, but I don't know how to implement this in python,
> and I'm not sure it would be computationally convenient.

Have you considered making "feature objects", where each object can hold
multiple pieces of information such as a name, alias, type - as well as
the sequence data itself?

You may wish to create your own class here, or try and use the existing
Biopython SeqFeature object.

You could then use a list to hold your feature objects, or a dictionary
keyed on the alias perhaps. Or both.

e.g.

class Feature :
    #Very simple class which could be extended
    def __init__(self, seq_string) :
        self.seq = seq_string
    def __repr__(self) :
        #Use id(self) to show the memory location (in hex), just
        #to show the difference between two instances with the same seq
        return "Feature(%s) instance at %s" \
               % (self.seq, hex(id(self)))

list_features = [Feature('GTAAGT'), Feature('TACTAAC'), Feature('TGT')]
splicesignal5 = list_features[0]
print(splicesignal5)
print(list_features[0])

print("EDITING first object in the list:")
list_features[0].seq = 'TTTTTTT'
print(splicesignal5) #changed, now TTTTTTT
print(list_features[0])

print("REPLACING first object in the list:")
list_features[0] = Feature('GGGGGG')
print(splicesignal5) #still points to old object, TTTTTTT
print(list_features[0])

--

I'm not sure if that is closer to what you wanted, or not.
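The "dictionary keyed on the alias" idea mentioned above could look roughly like the sketch below. It is written for a current Python 3 interpreter rather than the Python 2 used in this thread. The alias names and the features_by_alias dictionary are illustrative assumptions, not part of Biopython.

```python
class Feature:
    """Very simple feature object holding a sequence string."""

    def __init__(self, seq_string):
        self.seq = seq_string

    def __repr__(self):
        return "Feature(%r)" % self.seq


list_features = [Feature("GTAAGT"), Feature("TACTAAC"), Feature("TGT")]

# Hypothetical alias names; the dictionary values are the very same
# objects that the list holds, not copies of them.
features_by_alias = {
    "splicesignal5": list_features[0],
    "polypirimidinetract": list_features[1],
    "splicesignal3": list_features[2],
}

# Editing an object in place is visible through the list and the dictionary.
list_features[0].seq = "TTTTTTT"
print(features_by_alias["splicesignal5"])

# Replacing the list entry rebinds only that list slot; the dictionary
# still refers to the old object.
list_features[0] = Feature("GGGGGG")
print(features_by_alias["splicesignal5"])
print(list_features[0])
```

With this layout there are no integer positions to keep in sync, although deleting a feature still means removing it from both containers.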
Peter From skhadar at gmail.com Thu Jul 5 09:10:24 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Thu, 5 Jul 2007 18:40:24 +0530 Subject: [BioPython] Biopython Installation on CentOS In-Reply-To: <468B5D1A.2050800@maubp.freeserve.co.uk> References: <468B5D1A.2050800@maubp.freeserve.co.uk> Message-ID: Dear Bassi and Peter, thanks for ur inputs, > > > Could you find the command (run by setup.py) that caused this gcc > error? i.e. somewhere near the end of the output from 'python setup.py > build' (I'm hoping to see which part of Biopython is causing the > problem) yes, it happened when i run python setup.py What version of python do you have installed? Python-2.4.4 What version of gcc do you have installed? i have no gcc, but i installed it and also the python-dev, after that it worked !! Do you have flex installed? what is the purpose of this flex Have you been able to install any other python modules from source? > (e.g. some of the recommended packages or dependencies like Numeric?) got same error for all, after installing python-dev and gcc, its done. many thanks, -- Shameer Khadar NCBS-TIFR Bangalore Peter > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From jeddahbioc at yahoo.com Tue Jul 10 07:15:52 2007 From: jeddahbioc at yahoo.com (FFFFF AAAAA) Date: Tue, 10 Jul 2007 04:15:52 -0700 (PDT) Subject: [BioPython] filter Message-ID: <385584.63029.qm@web43146.mail.sp1.yahoo.com> Hi, How to choose pdb files from Richardson Top500H pdbs with aresolution =< 1.00 Angstroms using python scripts. Thanks Fawzia --------------------------------- Shape Yahoo! in your own image. Join our Network Research Panel today! 
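A minimal sketch of one way to do this kind of filtering in plain Python, assuming each file still carries the standard PDB header line of the form "REMARK   2 RESOLUTION.    0.90 ANGSTROMS." (whether the modified Top500H files keep it should be checked). The function names read_resolution and filter_by_resolution are hypothetical helpers, not Biopython functions.

```python
import re

# Matches header lines such as "REMARK   2 RESOLUTION.    0.90 ANGSTROMS."
RESOLUTION_LINE = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS"
)


def read_resolution(lines):
    """Return the resolution in Angstroms from PDB lines, or None if absent."""
    for line in lines:
        match = RESOLUTION_LINE.match(line)
        if match:
            return float(match.group(1))
        if line.startswith("ATOM"):
            break  # coordinates have started, so the header is over
    return None


def filter_by_resolution(paths, cutoff=1.0):
    """Return (path, resolution) pairs for files at or below the cutoff."""
    selected = []
    for path in paths:
        with open(path) as handle:
            resolution = read_resolution(handle)
        if resolution is not None and resolution <= cutoff:
            selected.append((path, resolution))
    return selected
```

Files with no numeric resolution (for example "RESOLUTION. NOT APPLICABLE.") return None and are skipped rather than guessed at.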
From biopython at maubp.freeserve.co.uk Tue Jul 10 08:03:15 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 13:03:15 +0100 Subject: [BioPython] filtering PDB files by resolution In-Reply-To: <385584.63029.qm@web43146.mail.sp1.yahoo.com> References: <385584.63029.qm@web43146.mail.sp1.yahoo.com> Message-ID: <46937583.9060908@maubp.freeserve.co.uk> FFFFF AAAAA wrote: > Hi, > How to choose pdb files from Richardson Top500H pdbs with aresolution =< 1.00 Angstroms using python scripts. > Thanks > Fawzia I've never had to look at the resolution itself, but the REMARK 2 lines in the header looks relevant. You may find it easiest to do this your self rather than using the Biopython module Bio.PDB as that focuses on the atomistic structure rather than the metadata. Alternatively maybe you could take the list of PDB identifiers and your 1.00 Angstroms, and put these into the www.pdb.org web query interface. Peter P.S. There might be something useful here: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/top500/ Incidentally, are you working with original PDB files from www.pdb.org or the modified versions from the Richardson group which had hydrogens added with reduce? From biopython at maubp.freeserve.co.uk Tue Jul 10 08:33:08 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 13:33:08 +0100 Subject: [BioPython] filtering PDB files by resolution In-Reply-To: <46937583.9060908@maubp.freeserve.co.uk> References: <385584.63029.qm@web43146.mail.sp1.yahoo.com> <46937583.9060908@maubp.freeserve.co.uk> Message-ID: <46937C84.5040409@maubp.freeserve.co.uk> Link to the "Top 500" PDB page, http://kinemage.biochem.duke.edu/databases/top500.php Peter wrote: > Alternatively maybe you could take the list of PDB identifiers and your > 1.00 Angstroms, and put these into the www.pdb.org web query interface. 
Anyway, using the web interface, here are twenty structures from the 494
unique PDB IDs on the "Richardson group's Top 500" list done by X-Ray
crystallography with a resolution of 1.0 Angstroms or better:

1A6M 1.0
1AHO 1.0
1B0Y 0.9
1BXO 0.9
1BYI 1.0
1C75 1.0
1CEX 1.0
1EJG 0.5
1ETN 0.9
1GCI 0.8
1IXH 1.0
1LKK 1.0
1NLS 0.9
1RB9 0.9
2ERL 1.0
2FDN 0.9
2PVB 0.9
3PYP 0.9
4LZT 0.9
7A3H 0.9

I just pasted in the 494 unique PDB IDs (comma separated) and specified
the x-ray resolution to be between 0.0 and 1.0

Note 1A1Y (resolution 1.05 angstroms) is now obsolete, and does not
appear in the search results even if you relax the resolution limit.

Also watch out for the fact that some of the PDB IDs on the Top 500 list
have been replaced:

1TAX -> 1GOK
1GDO -> 1XFF
5ICB -> 1IG5
2MYR -> 1E70

Finally, I can't see how to use the web query to search for resolutions
from other experimental techniques - but it looks like all of the "Top
500" were done by x-ray anyway. Do check this!

Peter

From mmayhew at mcb.mcgill.ca Tue Jul 10 14:08:18 2007
From: mmayhew at mcb.mcgill.ca (Michael Mayhew)
Date: Tue, 10 Jul 2007 14:08:18 -0400
Subject: [BioPython] Martel Parser error...
Message-ID: <4693CB12.5080207@mcb.mcgill.ca>

Greetings,

I am using BioPython-1.42 (also tried 1.43) on Mac OS X (10.4.8) and
have successfully compiled/installed the prerequisite packages (Numeric
and mxTextTools).
I have been receiving a Martel Parser error as detailed in the following readout (from a python interactive session), when I try to use either Fasta.RecordParser() or Fasta.SequenceParser() instances: >>tester = iter.next() Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Fasta/__init__.py", line 72, in next result = self._iterator.next() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Martel/IterParser.py", line 152, in iterateFile self.header_parser.parseString(rec) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Martel/Parser.py", line 356, in parseString self._err_handler.fatalError(result) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/sax/handler.py", line 38, in fatalError raise exception Martel.Parser.ParserPositionException: error parsing at or beyond character 0 I confirmed this when I ran the included test suites (with python setup.py test). I have seen some suggestions to get the most recent CVS version of biopython to rectify this problem. How would I go about this? Is getting the most recent CVS version of biopython the only/best thing to do? Thanks in advance. Michael Mayhew From biopython at maubp.freeserve.co.uk Tue Jul 10 15:27:15 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 20:27:15 +0100 Subject: [BioPython] Martel Parser error... In-Reply-To: <4693CB12.5080207@mcb.mcgill.ca> References: <4693CB12.5080207@mcb.mcgill.ca> Message-ID: <4693DD93.9090208@maubp.freeserve.co.uk> Michael Mayhew wrote: > Greetings, > > I am using BioPython-1.42 (also tried 1.43) on Mac OS X (10.4.8) and > have successfully compiled/installed the prerequisite packages (Numeric > and mxTextTools). 
>
> I have been receiving a Martel Parser error as detailed in the
> following readout (from a python interactive session), when I try to use
> either Fasta.RecordParser() or Fasta.SequenceParser() instances:
>
> >>tester = iter.next()
> Traceback (most recent call last):
> ...

Could you give a complete stand alone example? I'm not sure what you are
trying to do here...

Have you looked at Bio.SeqIO (available in Biopython 1.43) instead of
Bio.Fasta?

http://biopython.org/wiki/SeqIO

> I confirmed this when I ran the included test suites (with python
> setup.py test).

Could you show us the failed test result?

> I have seen some suggestions to get the most recent CVS version of
> biopython to rectify this problem. How would I go about this?
>
> Is getting the most recent CVS version of biopython the only/best
> thing to do?

I'm not sure what's wrong here - so it's hard to say if CVS would be any
better. I've not used Biopython on Mac OS X, but it should work.

Peter

From biopython at maubp.freeserve.co.uk Tue Jul 10 16:03:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 21:03:10 +0100
Subject: [BioPython] Bio.SeqIO and files with one record
Message-ID: <4693E5FE.708@maubp.freeserve.co.uk>

Dear Biopython people,

I'd like a little feedback on the Bio.SeqIO module - in particular, one
situation I think could be improved is when dealing with sequence files
which contain a single record - for example a very simple Fasta file, or
a chromosome in a GenBank file.

http://www.biopython.org/wiki/SeqIO

The shortest way to get this one record as a SeqRecord object is probably:

from Bio import SeqIO
record = SeqIO.parse(open("example.gbk"), "genbank").next()

This works, assuming there is at least one record, but will not trigger
any error if there was more than one record - something you may want to
check.

Do any of you think this situation is common enough to warrant adding
another function to Bio.SeqIO to do this for you (raising errors for no
records or more than one record)? My suggestions for possible names
include parse_single, parse_one, parse_sole, parse_individual and
mono_parse

One way to do this inline would be:

from Bio import SeqIO
temp_list = list(SeqIO.parse(open("example.gbk"), "genbank"))
assert len(temp_list) == 1
record = temp_list[0]
del temp_list

Or perhaps:

from Bio import SeqIO
#Keep the iterator itself here (a list has no next method)
temp_iter = SeqIO.parse(open("example.gbk"), "genbank")
record = temp_iter.next()
try :
    assert temp_iter.next() is None
except StopIteration :
    pass
del temp_iter

The above code copes with the fact that in general some iterators may
signal the end by raising a StopIteration exception, or by returning None.

Peter

P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html

From mmayhew at MCB.McGill.CA Tue Jul 10 20:12:22 2007
From: mmayhew at MCB.McGill.CA (mmayhew at MCB.McGill.CA)
Date: Tue, 10 Jul 2007 20:12:22 -0400 (EDT)
Subject: [BioPython] Martel Parser error...
In-Reply-To: <4693DD93.9090208@maubp.freeserve.co.uk>
References: <4693CB12.5080207@mcb.mcgill.ca> <4693DD93.9090208@maubp.freeserve.co.uk>
Message-ID: <3620.24.200.95.226.1184112742.squirrel@mail.mcb.mcgill.ca>

Tested a sample file with a single FASTA record at home with my Windows XP
machine and produced the following set of results using Bio.SeqIO first
and then Bio.Fasta (with Fasta.Iterator and Fasta.RecordParser()
instances).
Looks like the same error, with a completely different OS (received exact
same error with Mac OSX v10.4 & biopython 1.42). Since Bio.SeqIO works
fine (thank you for the recommendation) I will use that, but the Bio.Fasta
error may potentially be an error to look into.
>>> from Bio import SeqIO >>> handle = open('test.txt', 'r') >>> for record in SeqIO.parse(handle, "fasta"): print record.id hg18_knownGene_uc001hsx.1 >>> record.seq Seq('ATGGCCCGAACCAAGCAGACTGCGCGCAAGTCAACGGGTGGCAAGGCGCCGCGCAAGCAGCTGGCCACCAAGGTGGCTCGCAAGAGCGCACCTGCCACTGGCGGCGTGAAGAAGCCGCACCGCTACCGGCCCGGCACGGTGGCGCTTCGCGAGATCCGCCGCTACCAGAAGTCCACTGAGCTGCTAATCCGCAAGTTGCCCTTCCAGCGGCTGATGCGCGAGATCGCTCAGGACTTTAAGACCGACCTGCGCTTCCAGAGCTCGGCCGTGATGGCGCTGCAGGAGGCGTGCGAGTCTTACCTGGTGGGGCTGTTTGAGGACACCAACCTGTGTGTCATCCATGCCAAACGGGTCACCATCATGCCTAAGGACATCCAGCTGGCACGCCGTATCCGCGGGGAGCGGGCCTAG', SingleLetterAlphabet()) >>> record.seq.tostring() 'ATGGCCCGAACCAAGCAGACTGCGCGCAAGTCAACGGGTGGCAAGGCGCCGCGCAAGCAGCTGGCCACCAAGGTGGCTCGCAAGAGCGCACCTGCCACTGGCGGCGTGAAGAAGCCGCACCGCTACCGGCCCGGCACGGTGGCGCTTCGCGAGATCCGCCGCTACCAGAAGTCCACTGAGCTGCTAATCCGCAAGTTGCCCTTCCAGCGGCTGATGCGCGAGATCGCTCAGGACTTTAAGACCGACCTGCGCTTCCAGAGCTCGGCCGTGATGGCGCTGCAGGAGGCGTGCGAGTCTTACCTGGTGGGGCTGTTTGAGGACACCAACCTGTGTGTCATCCATGCCAAACGGGTCACCATCATGCCTAAGGACATCCAGCTGGCACGCCGTATCCGCGGGGAGCGGGCCTAG' >>> handle.close() >>> from Bio import Fasta >>> handle = open('test.txt', 'r') >>> it = Fasta.Iterator(handle, Fasta.RecordParser()) >>> seq = it.next() Traceback (most recent call last): File "", line 1, in -toplevel- seq = it.next() File "C:\Python24\Lib\site-packages\Bio\Fasta\__init__.py", line 72, in next result = self._iterator.next() File "C:\Python24\Lib\site-packages\Martel\IterParser.py", line 152, in iterateFile self.header_parser.parseString(rec) File "C:\Python24\Lib\site-packages\Martel\Parser.py", line 356, in parseString self._err_handler.fatalError(result) File "C:\Python24\lib\xml\sax\handler.py", line 38, in fatalError raise exception ParserPositionException: error parsing at or beyond character 0 > Michael Mayhew wrote: >> Greetings, >> >> I am using BioPython-1.42 (also tried 1.43) on Mac OS X (10.4.8) and >> have successfully compiled/installed the prerequisite packages (Numeric >> and mxTextTools). 
>> >> I have been receiving a Martel Parser error as detailed in the >> following readout (from a python interactive session), when I try to use >> either Fasta.RecordParser() or Fasta.SequenceParser() instances: >> >> >>tester = iter.next() >> Traceback (most recent call last): > > ... > > Could you give a complete stand alone example? I'm not sure what you are > trying to do here... > > Have you looked at Bio.SeqIO (available in Biopython 1.43) instead of > Bio.Fasta? > > http://biopython.org/wiki/SeqIO > >> I confirmed this when I ran the included test suites (with python >> setup.py test). > > Could you show us the failed test result? > >> I have seen some suggestions to get the most recent CVS version of >> biopython to rectify this problem. How would I go about this? >> >> Is getting the most recent CVS version of biopython the only/best >> thing to do? > > I'm not sure what's wrong here - so its hard to say if CVS would be any > better. I've not used Biopython on Mac OS X, but it should work. > > Peter > From fahy at chapman.edu Tue Jul 10 21:36:19 2007 From: fahy at chapman.edu (Michael Fahy) Date: Tue, 10 Jul 2007 18:36:19 -0700 Subject: [BioPython] Transcription Message-ID: <000401c7c35b$e1adea20$a509be60$@edu> I just showed the BioPython tutorial to some of our Biology and Chemistry faculty. They pointed out that all the "Transcribe" function does is replace each occurrence of "T" in the sequence with a "U". The biologists said that that is not what they mean by transcription. They felt that each nucleotide should have been replaced by the complementary nucleotide, and that the resulting string should have been reversed. This, they said, would be concordant with the way in which biologists use the term "transcribe'. It would not be hard to do, so why does BioPython do what it does and call it transcription? 
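The two readings of "transcription" in this question can be compared with a short plain-Python sketch (Python 3). The helper names below are illustrative assumptions and not Biopython's API. All sequences are written 5' to 3'. Replacing T with U assumes the input is the coding (non-template) strand; reverse-complementing first assumes the input is the template strand. Both give the same mRNA when applied to the two strands of the same DNA.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe_from_coding(coding_strand):
    """mRNA from the coding (non-template) strand: same letters, U for T."""
    return coding_strand.replace("T", "U")


def transcribe_from_template(template_strand):
    """mRNA from the template strand: reverse complement, then U for T."""
    return reverse_complement(template_strand).replace("T", "U")


coding = "ATGGTATAA"
template = reverse_complement(coding)       # TTATACCAT

print(transcribe_from_coding(coding))       # AUGGUAUAA
print(transcribe_from_template(template))   # AUGGUAUAA
```

Which function is "transcription" therefore depends only on which strand the input string is taken to represent.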
Michael Fahy
Mathematics and Computer Science
Chapman University
One University Drive
Orange, CA 92866
(714) 997-6879
fahy at chapman.edu

From mdehoon at c2b2.columbia.edu Tue Jul 10 23:59:11 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 10 Jul 2007 23:59:11 -0400
Subject: [BioPython] Transcription
References: <000401c7c35b$e1adea20$a509be60$@edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu>

It all depends on how you interpret the sequence that you give to the
transcribe function, and for that matter, to the translate function.

For translation, virtually all biological publications show the
non-template strand. Hence, the sequence given to the translate function
in Biopython is also interpreted as the non-template strand.

For consistency, the sequence given to the transcribe function in
Biopython is also taken to be the non-template strand:

>>> from Bio.Seq import *
>>> s = "ATGGTATAA"
>>> translate(s)
'MV*'
>>> transcribe(s)
'AUGGUAUAA'
>>> translate(_)
'MV*'

If you want to have the reverse complement, use the reverse_complement
function.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Michael Fahy
Sent: Tue 7/10/2007 9:36 PM
To: BioPython at lists.open-bio.org
Subject: [BioPython] Transcription

I just showed the BioPython tutorial to some of our Biology and Chemistry
faculty. They pointed out that all the "Transcribe" function does is
replace each occurrence of "T" in the sequence with a "U". The biologists
said that that is not what they mean by transcription. They felt that each
nucleotide should have been replaced by the complementary nucleotide, and
that the resulting string should have been reversed.

This, they said, would be concordant with the way in which biologists use
the term "transcribe'.
It would not be hard to do, so why does BioPython do what it does and call it transcription? Michael Fahy Mathematics and Computer Science Chapman University One University Drive Orange, CA 92866 (714) 997-6879 fahy at chapman.edu _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From aloraine at gmail.com Wed Jul 11 00:14:50 2007 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 10 Jul 2007 23:14:50 -0500 Subject: [BioPython] Transcription In-Reply-To: <000401c7c35b$e1adea20$a509be60$@edu> References: <000401c7c35b$e1adea20$a509be60$@edu> Message-ID: <83722dde0707102114w5d81f9dr7d3701afadd13ed8@mail.gmail.com> Hello, I guess the "translate" functions are more useful. I've used biopython tools many times to translate nucleotide sequences into proteins, using different genetic codes. It is a very useful feature, often the first step in series of computations. Maybe the audience was responding to how we programmers like to represent biological sequences as character strings? DNA, the molecule, is double-stranded, so it might be more proper to model it as a pair of strings. But this would be wasteful of space, since one string is all you need to capture the sequence of both strands. They are right that the antisense strand is used as the template for RNA synthesis (transcription), but I'm not sure if it is proper to say that one strand or the other is being transcribed. Maybe in future you could say something like: biopython sequence objects have a string that represents a sequence of nucleotides, and when you call a transcribe method, the method assumes that this string also represents the sense strand of a double-stranded DNA molecule. Best wishes, Ann On 7/10/07, Michael Fahy wrote: > I just showed the BioPython tutorial to some of our Biology and Chemistry > faculty. 
They pointed out that all the "Transcribe" function does is > replace each occurrence of "T" in the sequence with a "U". The biologists > said that that is not what they mean by transcription. They felt that each > nucleotide should have been replaced by the complementary nucleotide, and > that the resulting string should have been reversed. > > This, they said, would be concordant with the way in which biologists use > the term "transcribe'. It would not be hard to do, so why does BioPython do > what it does and call it transcription? > > > > > > Michael Fahy > > Mathematics and Computer Science > > Chapman University > > One University Drive > > Orange, CA 92866 > > (714) 997-6879 > > fahy at chapman.edu > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Ann Loraine Assistant Professor University of Alabama at Birmingham http://www.transvar.org 205-996-4155 From biopython at maubp.freeserve.co.uk Wed Jul 11 04:03:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Jul 2007 09:03:58 +0100 Subject: [BioPython] Martel Parser error... In-Reply-To: <3620.24.200.95.226.1184112742.squirrel@mail.mcb.mcgill.ca> References: <4693CB12.5080207@mcb.mcgill.ca> <4693DD93.9090208@maubp.freeserve.co.uk> <3620.24.200.95.226.1184112742.squirrel@mail.mcb.mcgill.ca> Message-ID: <46948EEE.1050908@maubp.freeserve.co.uk> mmayhew at MCB.McGill.CA wrote: > Tested a sample file with a single FASTA record at home with my Windows XP > machine and produced the following set of results using Bio.SeqIO first > and then Bio.Fasta (with Fasta.Iterator and Fasta.RecordParser() > instances). > Looks like the same error, with a completely different OS (received exact > same error with Mac OSX v10.4 & biopython 1.42). 
Since Bio.SeqIO works > fine (thank you for the recommendation) I will use that, but the Bio.Fasta > error may potentially be an error to look into. Your example worked fine for me on Linux, I haven't tried on Windows XP yet. So you get the same error on both Windows XP and Mac OS X? My only suggestion right now is to check your line endings (CR versus CRLF versus LF) are appropriate for the platform, and to try: handle = open('test.txt', 'rU') Also, what do these give? print repr(open('test.txt', 'r').read()) print repr(open('test.txt', 'rU').read()) I'm looking for any difference in the new lines (e.g. \n versus \r\n) Maybe Michiel can suggest something as I believe he sometimes uses Biopython on Mac OS X. On the bright side, Bio.SeqIO works for you (and we now recommend using that instead of Bio.Fasta) but I would still like to sort this out. Peter From kosa at genesilico.pl Wed Jul 11 04:24:11 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Wed, 11 Jul 2007 10:24:11 +0200 Subject: [BioPython] Bio.SeqIO and files with one record Message-ID: <469493AB.3020504@genesilico.pl> Hi, Do I understand correctly that the function is to return a record instead of a parser? If yes I think it could be useful. parse_single sounds good. Cheers, Jan Kosinski > Message: 7 > Date: Tue, 10 Jul 2007 21:03:10 +0100 > From: Peter > Subject: [BioPython] Bio.SeqIO and files with one record > To: biopython at lists.open-bio.org > Message-ID: <4693E5FE.708 at maubp.freeserve.co.uk> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Dear Biopython people, > > I'd like a little feedback on the Bio.SeqIO module - in particular, one > situation I think could be improved is when dealing with sequence files > which contain a single record - for example a very simple Fasta file, or > a chromosome in a GenBank file.
> > http://www.biopython.org/wiki/SeqIO > > The shortest way to get this one record as a SeqRecord object is probably: > > from Bio import SeqIO > record = SeqIO.parse(open("example.gbk"), "genbank").next() > > This works, assuming there is at least one record, but will not trigger > any error if there was more than one record - something you may want to > check. > > Do any of you think this situation is common enough to warrant adding > another function to Bio.SeqIO to do this for you (raising errors for no > records or more than one record)? My suggestions for possible names > include parse_single, parse_one, parse_sole, parse_individual and mono_parse > > One way to do this inline would be: > > from Bio import SeqIO > temp_list = list(SeqIO.parse(open("example.gbk"), "genbank")) > assert len(temp_list) == 1 > record = temp_list[0] > del temp_list > > Or perhaps: > > from Bio import SeqIO > temp_iter = SeqIO.parse(open("example.gbk"), "genbank") > record = temp_iter.next() > try : > assert temp_iter.next() is None > except StopIteration : > pass > del temp_iter > > The above code copes with the fact that in general some iterators may > signal the end by raising a StopIteration exception, or by returning None. > > Peter > > P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007? > http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html > > > > :. From biopython at maubp.freeserve.co.uk Wed Jul 11 05:32:13 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Jul 2007 10:32:13 +0100 Subject: [BioPython] Bio.SeqIO and files with one record In-Reply-To: <469493AB.3020504@genesilico.pl> References: <469493AB.3020504@genesilico.pl> Message-ID: <4694A39D.4010805@maubp.freeserve.co.uk> Jan Kosinski wrote: > Hi, > > Do I understand correctly that the function is to return a record > instead of a parser? If yes I think it could be useful. parse_single > sounds good. Yes, sorry if I wasn't clear.
Bio.SeqIO.parse(handle, format) would still return an iterator giving SeqRecord objects. The suggested function (possibly called) Bio.SeqIO.parse_single(handle, format) would return a single SeqRecord object if the file contains one and only one record. It would raise exceptions for no records, or more than one record. e.g. from Bio import SeqIO handle = open('example.gbk') record = SeqIO.parse_single(handle, 'genbank') or, from Bio import SeqIO record = SeqIO.parse_single(open('example.faa'), 'fasta') As I said, I sometimes find myself wanting to do this - for example single query BLAST files in fasta format, or bacterial genomes in GenBank format. The question is, is this worth adding to the interface or is this a relatively rare need? Peter From kosa at genesilico.pl Wed Jul 11 05:58:27 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Wed, 11 Jul 2007 11:58:27 +0200 Subject: [BioPython] Bio.SeqIO and files with one record In-Reply-To: <4694A39D.4010805@maubp.freeserve.co.uk> References: <469493AB.3020504@genesilico.pl> <4694A39D.4010805@maubp.freeserve.co.uk> Message-ID: <4694A9C3.2030707@genesilico.pl> I think it is rare that a Python program must be sure that no other sequences are in the file it reads. If a program reads a file with a single sequence using SeqIO.parse and refers to that sequence by something like "records[0]", there is no need to check whether there were other sequences in the file, as long as the program was not designed to notify the user that perhaps he made an error by providing a file with multiple sequences. Actually, although parse_single would be useful, such a function adds little new value to biopython, and adding it could certainly be postponed until the time when you really don't know what else to add to biopython ;-) Yet, I think it is worth adding if it does not require much work and would not mess up the interface at all.
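The behaviour discussed in this thread can be sketched without touching Biopython at all. The helper below is only an illustration under assumptions: the name parse_single is borrowed from the proposal above and is hypothetical here (not an existing Bio.SeqIO function), and raising ValueError is an assumed choice. It accepts any iterable of records, so in practice it could be handed the iterator returned by SeqIO.parse.

```python
# Hypothetical helper, named after the proposal in this thread.
# It is NOT part of Bio.SeqIO; the ValueError choice is an assumption.

_MISSING = object()  # sentinel, so that a record equal to None is still allowed


def parse_single(records):
    """Return the one and only item from an iterable of records.

    Raises ValueError if the iterable is empty or holds more than one item.
    """
    iterator = iter(records)
    first = next(iterator, _MISSING)
    if first is _MISSING:
        raise ValueError("No records found")
    if next(iterator, _MISSING) is not _MISSING:
        raise ValueError("More than one record found")
    return first


# Intended use with Biopython (not executed here):
#   from Bio import SeqIO
#   record = parse_single(SeqIO.parse(open("example.gbk"), "genbank"))
```

Using a private sentinel rather than comparing against None means the helper does not care whether a particular iterator signals its end with StopIteration or by returning None for a genuine record.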
Janek Peter wrote: > > > The question is, is this worth adding to the interface or is this a > relatively rare need? :. From mmokrejs at ribosome.natur.cuni.cz Wed Jul 11 06:32:36 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Wed, 11 Jul 2007 12:32:36 +0200 Subject: [BioPython] Bio.SeqIO and files with one record In-Reply-To: <4694A39D.4010805@maubp.freeserve.co.uk> References: <469493AB.3020504@genesilico.pl> <4694A39D.4010805@maubp.freeserve.co.uk> Message-ID: <4694B1C4.3010609@ribosome.natur.cuni.cz> Hi, Peter wrote: > Jan Kosinski wrote: >> Hi, >> >> Do I understand correctly that the function is to return a record >> instead of a parser? If yes I think it could be useful. parse_single >> sounds good. > > Yes, sorry if I wasn't clear. > > Bio.SeqIO.parse(handle, format) would still return an iterator giving > SeqRecord objects. > > The suggested function (possibly called) Bio.SeqIO.parse_single(handle, > format) would return a single SeqRecord object if the file contains one > and only one record. It would raise exceptions for no records, or more > than one record. > > e.g. > > from Bio import SeqIO > handle = open('example.gbk') > record = Bio.SeqIO.parse_single(handle, genbank') > > or, > > from Bio import SeqIO > record = Bio.SeqIO.parse_single(open('example.faa'), 'fasta') I think it does make sense, but call it parse_the_only_one() to make it clear, it does not pick up just the very first record from the many. > > As I said, I sometimes find myself wanting to do this - for example > single query BLAST files in fasta format, or bacterial genomes in > GenBank format. > > The question is, is this worth adding to the interface or is this a > relatively rare need? Once people learn to wrap the iterator in a loop it is not necessary, but I think if you have the time to do this ... 
;-) Martin From cjfields at uiuc.edu Wed Jul 11 07:58:50 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Jul 2007 06:58:50 -0500 Subject: [BioPython] Bio.SeqIO and files with one record In-Reply-To: <4694B1C4.3010609@ribosome.natur.cuni.cz> References: <469493AB.3020504@genesilico.pl> <4694A39D.4010805@maubp.freeserve.co.uk> <4694B1C4.3010609@ribosome.natur.cuni.cz> Message-ID: On Jul 11, 2007, at 5:32 AM, Martin MOKREJŠ wrote: > ... > Once people learn to wrap the iterator in a loop it is not > necessary, but I think > if you have the time to do this ... ;-) > Martin Wrapping the iterator in a while loop works for bioperl because the loop only continues while the expression evaluates to true (and assigning undef evaluates as false). It's a common bioperl idiom to do something like: my $seqio = Bio::SeqIO->new(-format => 'genbank', -file => 'myfile.gb'); while (my $seq = $seqio->next_seq) { # do stuff here .... } chris From dalloliogm at gmail.com Thu Jul 12 11:00:03 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 12 Jul 2007 17:00:03 +0200 Subject: [BioPython] I don't understand why SeqRecord.feature is a list In-Reply-To: <468CBAE6.3030306@maubp.freeserve.co.uk> References: <5aa3b3570706120407x7bc29550j26bd8c7a5f4ae02b@mail.gmail.com> <920D9BCD-ADC3-4704-AA97-2AE8089F02CE@mitre.org> <466EAE8D.2090609@maubp.freeserve.co.uk> <5aa3b3570706280645s6744b6fdn2cce34abb6883155@mail.gmail.com> <4683CFA0.1050905@maubp.freeserve.co.uk> <5aa3b3570707030824w605ad101y8f58319d0b0cb0e5@mail.gmail.com> <468CBAE6.3030306@maubp.freeserve.co.uk> Message-ID: <5aa3b3570707120800g1ed2c8f1t73f117f61cab0874@mail.gmail.com> Yes, it's true, it is something similar to the way SeqFeature should work. But I still just don't get how to represent my genes in biopython :( You know, I've printed the Bio module UML scheme from here: http://www.pasteur.fr/recherche/unites/sis/formation/python/images/seq_class.png and put it on the wall above the monitor of my computer like a poster.
So everyday, when I come at work, I see the Bio module UML scheme and ask myself why SeqRecord.features is a list instead of a dictionary :) 2007/7/5, Peter : > Giovanni Marco Dall'Olio wrote: > > Let's have a look at your example: > > - we have a list of features like this: > > list_features = ['GTAAGT', 'TACTAAC', 'TGT'] > > > > - then we specify the meaning of these features in another dictionary: > > splicesignal5 = list_features[0] > > polypirimidinetract = list_features[1] > > splicesignal3 = list_features[2] > > > > python passes the variables by value: this means that if you change > > one of the values in the list_features list, then you have to update > > all the variables which refer to it manually. > > > >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] > >>>> splicesignal5 = list_features[0] > >>>> print splicesignal5 > > 'GTAAGT' > >>>> list_features[0] = 'TTTTTTT' > >>>> print splicesignal5 > > 'GTAAGT' # wrong! > >>>> splicesignal5 = list_features[0] # have to update all the > > variables which refer to list_features manually > >>>> print splicesignal5' > > 'TTTTTTT' > > > > This is why I prefer to save the positions of the features instead of > > their values: > >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] > >>>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract' : [1], > > 'splicesignal3': [2]} > >>>> def get_feature(feature_name): return > > list_features[dict_aliases[feature_name]] # (this code doesn't work) > > ... > > > Another option could be to use references to memory positions instead > > of dictionary keys, but I don't know how to implement this in python, > > and I'm not sure it would be computationally convenient. > > Have you considered making "feature objects", where each object can hold > multiple pieces of information such as a name, alias, type - as well as > the sequence data itself. You may wish to create your own class here, or > try and use the existing Biopython SeqFeature object. 
> > You could then use a list to hold your feature objects, or a dictionary > keyed on the alias perhaps. Or both. > > e.g. > > class Feature : > #Very simple class which could be extended > def __init__(self, seq_string) : > self.seq = seq_string > > def __repr__(self) : > #Use id(self) is to show the memory location (in hex), just > #to show difference between two instance with same seq > return "Feature(%s) instance at %s" \ > % (self.seq, hex(id(self))) > > > list_features = [Feature('GTAAGT'), > Feature('TACTAAC'), > Feature('TGT')] > > splicesignal5 = list_features[0] > print splicesignal5 > print list_features[0] > > print "EDITING first object in the list:" > list_features[0].seq = 'TTTTTTT' > > print splicesignal5 #changed, now TTTTTTT > print list_features[0] > > print "REPLACING first object in the list:" > list_features[0] = Feature('GGGGGG') > > print splicesignal5 #still points to old object, TTTTTTT > print list_features[0] > > -- > > I'm not sure if that is closer to what you wanted, or not. 
> > Peter > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From biopython at maubp.freeserve.co.uk Sat Jul 14 06:18:12 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Jul 2007 11:18:12 +0100 Subject: [BioPython] Problem with blastx output parsing =~ In-Reply-To: <46704A00.9010409@maubp.freeserve.co.uk> References: <800166920706040936w4de744acn8cefe445a6284f72@mail.gmail.com> <46644664.6080009@maubp.freeserve.co.uk> <800166920706041022u5fafc308h71bdcaa11acfade1@mail.gmail.com> <46644ED2.1080505@maubp.freeserve.co.uk> <4665408E.2090306@maubp.freeserve.co.uk> <466FC73C.2000608@maubp.freeserve.co.uk> <800166920706131028o4eb5ea6eqa92e3f0634ea7748@mail.gmail.com> <46704A00.9010409@maubp.freeserve.co.uk> Message-ID: <4698A2E4.4010208@maubp.freeserve.co.uk> Hi Italo, I haven't heard anything back from you - did you get all your 24,000 plain text blast output files to work with Biopython? Thanks Peter Related bug 2090 http://bugzilla.open-bio.org/show_bug.cgi?id=2090 From italo.maia at gmail.com Sat Jul 14 10:23:36 2007 From: italo.maia at gmail.com (Italo Maia) Date: Sat, 14 Jul 2007 11:23:36 -0300 Subject: [BioPython] Problem with blastx output parsing =~ In-Reply-To: <4698A2E4.4010208@maubp.freeserve.co.uk> References: <800166920706040936w4de744acn8cefe445a6284f72@mail.gmail.com> <46644664.6080009@maubp.freeserve.co.uk> <800166920706041022u5fafc308h71bdcaa11acfade1@mail.gmail.com> <46644ED2.1080505@maubp.freeserve.co.uk> <4665408E.2090306@maubp.freeserve.co.uk> <466FC73C.2000608@maubp.freeserve.co.uk> <800166920706131028o4eb5ea6eqa92e3f0634ea7748@mail.gmail.com> <46704A00.9010409@maubp.freeserve.co.uk> <4698A2E4.4010208@maubp.freeserve.co.uk> Message-ID: <800166920707140723pa8540bbs3e3b83a62525a7de@mail.gmail.com> Wow Peter, hi! Well, to tell you the truth, i was told to do something else here at work, so i left those blast output aside. 
But, i'll try to parse some blast outputs after tomorrow, and give you some feedback. ^_^ 2007/7/14, Peter : > > Hi Italo, > > I haven't heard anything back from you - did you get all your 24,000 > plain text blast output files to work with Biopython? > > Thanks > > Peter > > Related bug 2090 > http://bugzilla.open-bio.org/show_bug.cgi?id=2090 > > -- "A arrog?ncia ? a arma dos fracos." =========================== Italo Moreira Campelo Maia Ci?ncia da Computa??o - UECE Desenvolvedor WEB Programador Java, Python Meu blog ^^ http://eusouolobomal.blogspot.com/ =========================== From douglas.kojetin at gmail.com Sat Jul 14 13:03:21 2007 From: douglas.kojetin at gmail.com (Douglas Kojetin) Date: Sat, 14 Jul 2007 13:03:21 -0400 Subject: [BioPython] Bio.PDB phi/psi angles Message-ID: <5FF6C50F-AFAE-4922-BA8D-98CAD917777B@gmail.com> Hi All, I would like to print the phi/psi angles for a structure using the script found here: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/ python/ramachandran/calculate/ but the script chokes on the following line: phi_psi = poly.get_phi_psi_list() error output: Traceback (most recent call last): File "./ramachandran_biopython.py", line 62, in phi_psi = poly.get_phi_psi_list() File "/sw/lib/python2.5/site-packages/Bio/PDB/Polypeptide.py", line 169, in get_phi_psi_list res.xtra["PHI"]=phi NameError: global name 'res' is not defined Does anyone know what I can do to overcome this problem? Thanks, Doug From orlando.doehring at googlemail.com Sat Jul 14 13:59:44 2007 From: orlando.doehring at googlemail.com (=?ISO-8859-1?Q?Orlando_D=F6hring?=) Date: Sat, 14 Jul 2007 19:59:44 +0200 Subject: [BioPython] HETATM records retrieval Message-ID: <8cc339d80707141059r2ad5f0ah68c6d1cbb0deebf2@mail.gmail.com> Dear community, how should HETATM records be retrieved via Biopython? 
I assume it should be somewhere on the chain or residue level: - http://www.biopython.org/DIST/docs/api/public/Bio.PDB.Residue.Residue-class.html - http://biopython.org/DIST/docs/api/private/Bio.PDB.Chain.Chain-class.html Using the following basic sample code : for model in self.structure.get_iterator(): for chain in model.get_iterator(): print chain.__repr__() for residue in chain.get_iterator(): print residue.__repr__() applied to protein 1DHR (http://www.pdb.org/pdb/files/1dhr.pdb) we get: ... As one can see that are all ATOM records. Thanks. Yours, Orlando From biopython at maubp.freeserve.co.uk Sat Jul 14 14:07:55 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Jul 2007 19:07:55 +0100 Subject: [BioPython] Bio.PDB phi/psi angles In-Reply-To: <5FF6C50F-AFAE-4922-BA8D-98CAD917777B@gmail.com> References: <5FF6C50F-AFAE-4922-BA8D-98CAD917777B@gmail.com> Message-ID: <469910FB.40601@maubp.freeserve.co.uk> Douglas Kojetin wrote: > Hi All, > > I would like to print the phi/psi angles for a structure using the > script found here: > > http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/ > python/ramachandran/calculate/ > > but the script chokes on the following line: > > phi_psi = poly.get_phi_psi_list() > > error output: > > Traceback (most recent call last): > File "./ramachandran_biopython.py", line 62, in > phi_psi = poly.get_phi_psi_list() > File "/sw/lib/python2.5/site-packages/Bio/PDB/Polypeptide.py", > line 169, in get_phi_psi_list > res.xtra["PHI"]=phi > NameError: global name 'res' is not defined > > Does anyone know what I can do to overcome this problem? 
It's not your fault, it looks like an error has crept into Bio/PDB/Polypeptide.py with revision 1.32 (shipped with biopython 1.43), which I think I've just fixed with CVS revision 1.33. You can grab the files from here once the webpage has updated: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/PDB/Polypeptide.py?cvsroot=biopython Just backup and replace the existing file here: /sw/lib/python2.5/site-packages/Bio/PDB/Polypeptide.py Or, you could downgrade to biopython 1.42 ;) Peter From biopython at maubp.freeserve.co.uk Sat Jul 14 14:23:07 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Jul 2007 19:23:07 +0100 Subject: [BioPython] HETATM records retrieval In-Reply-To: <8cc339d80707141059r2ad5f0ah68c6d1cbb0deebf2@mail.gmail.com> References: <8cc339d80707141059r2ad5f0ah68c6d1cbb0deebf2@mail.gmail.com> Message-ID: <4699148B.2050500@maubp.freeserve.co.uk> Orlando Döhring wrote: > Dear community, > > how should HETATM records be retrieved via Biopython? I assume it should be > somewhere on the chain or residue level In the PDB file you used, 1DHR, all the HETATM records are for solvents (NAD = NICOTINAMIDE ADENINE DINUCLEOTIDE and HOH = water) so they don't appear as part of the protein chains. I haven't looked at this recently so it's not fresh in my mind. When there are HETATM entries within a protein (e.g. alternative amino acids) then they should be part of the chain. > Using the following basic sample code : > > for model in self.structure.get_iterator(): > for chain in model.get_iterator(): > print chain.__repr__() > for residue in chain.get_iterator(): > print residue.__repr__() You don't need to explicitly call the get_iterator() method, so I much prefer this style myself: structure = ... for model in structure: for chain in model: print repr(chain) for residue in chain: print repr(residue) I've also used the repr() function rather than the hidden __repr__ of the object; it's the same end result but I find this clearer.
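For telling HETATM-derived residues apart while looping as above, one option is to look at the first field of each residue id. The sketch below assumes the id layout described in the Bio.PDB FAQ: a tuple of (hetero flag, sequence number, insertion code), where the flag is a blank for standard residues, "W" for waters and "H_" followed by the residue name for other hetero residues. The function name classify_residue is made up for illustration, and it works on plain tuples so it can be tried without a structure file.

```python
# Illustrative only: classify_residue is a hypothetical helper, and the
# id layout (hetero flag, sequence number, insertion code) is assumed to
# follow the description in the Bio.PDB FAQ.

def classify_residue(residue_id):
    """Classify a Bio.PDB-style residue id tuple by its hetero flag."""
    hetero_flag = residue_id[0]
    if hetero_flag == " ":
        return "standard"   # ordinary ATOM residue
    if hetero_flag == "W":
        return "water"      # HETATM water molecule
    if hetero_flag.startswith("H_"):
        return "hetero"     # other HETATM residue, e.g. a ligand
    return "unknown"


# Intended use inside the loop shown above (not executed here):
#   for residue in chain:
#       kind = classify_residue(residue.get_id())
```

Whether a given HETATM group ends up inside a chain depends on the chain identifier recorded in the PDB file, so the classification is best applied to every residue the loop visits.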
Have you read the example on this page? In particular the use of the PPBuilder or CaPPBuilder classes: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/#BioPython I also urge you to look at the author's (Thomas Hamelryck) documentation here: http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf This is much more useful than the automatic API documentation you linked to. Peter From O.Doehring at cs.ucl.ac.uk Sat Jul 14 14:38:22 2007 From: O.Doehring at cs.ucl.ac.uk (O.Doehring at cs.ucl.ac.uk) Date: 14 Jul 2007 19:38:22 +0100 Subject: [BioPython] HETATM records retrieval Message-ID: Dear community, how should HETATM records be retrieved via Biopython? I assume it should be somewhere on the chain or residue level: - http://www.biopython.org/DIST/docs/api/public/Bio.PDB.Residue.Residue-class.html - http://biopython.org/DIST/docs/api/private/Bio.PDB.Chain.Chain-class.html Using the following basic sample code : for model in self.structure.get_iterator(): for chain in model.get_iterator(): print chain.__repr__() for residue in chain.get_iterator(): print residue.__repr__() applied to protein 1DHR (http://www.pdb.org/pdb/files/1dhr.pdb) we get: ... As one can see that are all ATOM records. Thanks. Yours, Orlando From fahy at chapman.edu Mon Jul 16 02:41:36 2007 From: fahy at chapman.edu (Michael Fahy) Date: Sun, 15 Jul 2007 23:41:36 -0700 Subject: [BioPython] PHYLIP Message-ID: <002401c7c774$5b9a2170$12ce6450$@edu> I've just started using BioPython and have worked through the Cookbook section on calling clustalw from Python to do alignments. It would be cool to use clustalw to produce PHYLIP-format files and then call the PHYLIP programs to produce phylogenetic trees from them. Has anyone already worked this out? I searched the last couple of years of list archives and did not find anything about using BioPython to access PHYLIP. From cy at cymon.org Mon Jul 16 04:51:44 2007 From: cy at cymon.org (Cymon J. 
Cox) Date: Mon, 16 Jul 2007 09:51:44 +0100 Subject: [BioPython] PHYLIP In-Reply-To: <002401c7c774$5b9a2170$12ce6450$@edu> References: <002401c7c774$5b9a2170$12ce6450$@edu> Message-ID: <1184575904.9393.7.camel@clintonite.nhm.ac.uk> Hi Michael, On Sun, 2007-07-15 at 23:41 -0700, Michael Fahy wrote: > I've just started using BioPython and have worked through the Cookbook > section on calling clustalw from Python to do alignments. It would be cool > to use clustalw to produce PHYLIP-format files and then call the PHYLIP > programs to produce phylogenetic trees from them. If your doing phylogenetics in Python you should checkout Peter Foster's P4 (http://www.bmnh.org/~pf/p4.html). > Has anyone already worked > this out? Probably, but then few people continue to use phylip on a regular basis these days... Cheers, C. ________________________________________________________________________ Cymon J. Cox Biometry and Molecular Research Department of Zoology Natural History Museum Cromwell Road London, SW7 5BD Email: cy at cymon.org, c.cox at nhm.ac.uk, cymon.cox at gmail.com Phone : +44 (0)20 7942 6981 HomePage : http://www.duke.edu/~cymon -8.63/-6.77 ________________________________________________________________________ Fedora Core release 6 (Zod) clintonite.nhm.ac.uk 09:43:20 up 19:13, 6 users, load average: 0.02, 0.18, 0.29 From biopython at maubp.freeserve.co.uk Mon Jul 16 05:14:54 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Jul 2007 10:14:54 +0100 Subject: [BioPython] PHYLIP In-Reply-To: <002401c7c774$5b9a2170$12ce6450$@edu> References: <002401c7c774$5b9a2170$12ce6450$@edu> Message-ID: <469B370E.4000703@maubp.freeserve.co.uk> Michael Fahy wrote: > I've just started using BioPython and have worked through the Cookbook > section on calling clustalw from Python to do alignments. It would be cool > to use clustalw to produce PHYLIP-format files and then call the PHYLIP > programs to produce phylogenetic trees from them. 
Has anyone already worked > this out? I searched the last couple of years of list archives and did not > find anything about using BioPython to access PHYLIP. You should be able to do this: 1. Produce your unaligned sequences in a suitable format for clustalw e.g. write a fasta file using Bio.SeqIO.write(...) 2. Run clustalw (e.g. using the Bio.Clustalw command line wrapper in Biopython, or just make a system call in python). 3. Read in the clustal format alignment using Bio.SeqIO.parse(...) and write it out unaltered using Bio.SeqIO.write(...) in phylip format. See http://biopython.org/wiki/SeqIO#File_Format_Conversion 4. Run the PHYLIP tools (e.g. by making a system call from python, or by hand at the command line). Personally I like the EMBOSS implementation of PHYLIP as this uses proper command line arguments - making calling them from code much easier. (Note they do like to re-arrange their website, and as EMBOSS 5.0 is just out, it looks like some links are broken right now). Note that you should avoid long record ids as the phylip format imposes strict truncation to 10 characters (which could lead to non-unique record names). Peter From mmokrejs at ribosome.natur.cuni.cz Mon Jul 16 10:24:05 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Mon, 16 Jul 2007 16:24:05 +0200 Subject: [BioPython] Transcription In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> Message-ID: <469B7F85.2030409@ribosome.natur.cuni.cz> BTW, someone who has the rights should include this example in the docs. The relevant section is empty: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 Martin Michiel De Hoon wrote: > It all depends on how you interpret the sequence that you give to the > transcribe function, and for that matter, to the translate function.
For > translation, virtually all biological publications show the non-template > strand. Hence, the sequence given to the translate function in Biopython is > also interpreted as the non-template strand. For consistency, the sequence > given to the transcribe function in Biopython is also taken to be the > non-template strand: > >>>> from Bio.Seq import * >>>> s = "ATGGTATAA" >>>> translate(s) > 'MV*' >>>> transcribe(s) > 'AUGGUAUAA' >>>> translate(_) > 'MV*' > > If you want to have the reverse complement, use the reverse_complement > function. > > --Michiel. > > > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org on behalf of Michael Fahy > Sent: Tue 7/10/2007 9:36 PM > To: BioPython at lists.open-bio.org > Subject: [BioPython] Transcription > > I just showed the BioPython tutorial to some of our Biology and Chemistry > faculty. They pointed out that all the "Transcribe" function does is > replace each occurrence of "T" in the sequence with a "U". The biologists > said that that is not what they mean by transcription. They felt that each > nucleotide should have been replaced by the complementary nucleotide, and > that the resulting string should have been reversed. > > This, they said, would be concordant with the way in which biologists use > the term "transcribe'. It would not be hard to do, so why does BioPython do > what it does and call it transcription? 
> > > > > > Michael Fahy > > Mathematics and Computer Science > > Chapman University > > One University Drive > > Orange, CA 92866 > > (714) 997-6879 > > fahy at chapman.edu > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > -- Dr. Martin Mokrejs Dept. of Genetics and Microbiology Faculty of Science, Charles University Vinicna 5, 128 43 Prague, Czech Republic http://www.iresite.org http://www.iresite.org/~mmokrejs From biopython at maubp.freeserve.co.uk Mon Jul 16 11:15:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Jul 2007 16:15:31 +0100 Subject: [BioPython] Bio.SeqIO ideas In-Reply-To: <469B6231.6040109@ribosome.natur.cuni.cz> References: <4693E5FE.708@maubp.freeserve.co.uk> <469B6231.6040109@ribosome.natur.cuni.cz> Message-ID: <469B8B93.5070201@maubp.freeserve.co.uk> Martin MOKREJ? wrote: > Peter, > maybe the docs (generated from sources as well as those in the > Documentation) should be clear what is id, name, description of SeqRecord object. They are all strings, normally specified when creating the instance of the SeqRecord object. The answer is it depends on where the SeqRecord came from - and for Bio.SeqIO this means which file format. One idea I had in mind was to expand the wiki page with worked examples of a sequence files and the SeqRecord created from it by Bio.SeqIO > E.g., > it would be helpful to demonstrate the values on an example of a FASTA > record parsed. Then one would figure out what is the difference between name > and description. 
Fasta files are used in the tutorial, http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc11 Do you think in addition to explicitly showing the record id and seq, I should also show the description (and name)? Fasta files are a very free form format, and in general the first word (splitting on white space) is a name or identifier. In some cases (e.g. NCBI fasta files) this can be subdivided (splitting on the | character). To be explicit suppose you had this: >554154531 a made up protein SDKJSDLHVLSDJDKJFDLJFKLSDJD >heat shock protein EINDLKNFLDHFDSHFLDSHJDSHDJHJHKJHSD Biopython will use the first word as both the record id and name, and the full text as the description. For example given this FASTA file you would get two records, the first: id = name = "554154531" description = "554154531 a made up protein" and the second, id = name = "heat" description = "heat shock protein" Note that the inclusion of the full text as the description is partly based on older Biopython code, and also to try and make it as easy as possible for you to extract any data from the line in your own code. Peter From biopython at maubp.freeserve.co.uk Mon Jul 16 12:26:29 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Jul 2007 17:26:29 +0100 Subject: [BioPython] Transcription In-Reply-To: <469B7F85.2030409@ribosome.natur.cuni.cz> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> Message-ID: <469B9C35.2020604@maubp.freeserve.co.uk> Martin MOKREJ? wrote: > BTW, someone who has the rights should include this example in the docs. > The relevant section is empty: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 Or remove it? We cover transcription and translation earlier: Section 2.2 Working with sequences http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 The source to the tutorial is a TeX (or LaTeX?) 
file in CVS, which generates the HTML and PDF tutorial. I can update the CVS files, but I don't think I can update the file on the webpage... Peter From fahy at chapman.edu Mon Jul 16 12:39:42 2007 From: fahy at chapman.edu (Michael Fahy) Date: Mon, 16 Jul 2007 09:39:42 -0700 Subject: [BioPython] Transcription In-Reply-To: <469B9C35.2020604@maubp.freeserve.co.uk> Message-ID: <039e01c7c7c7$e8cae9a0$c789d3ce@chapman.edu> It would have helped me avoid some confusion if the Tutorial had mentioned the difference between sequences from the coding strand and from the template strand and the fact that the transcribe function in BioPython expects a sequence from the coding strand. My biology colleagues here seem to prefer thinking of transcribing the sequence from the template strand. It's not a big deal now that we understand what the intent is, but it caused some misunderstanding at first. -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Peter Sent: Monday, July 16, 2007 9:26 AM To: Martin MOKREJ? Cc: BioPython at lists.open-bio.org Subject: Re: [BioPython] Transcription Martin MOKREJ? wrote: > BTW, someone who has the rights should include this example in the docs. > The relevant section is empty: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 Or remove it? We cover transcription and translation earlier: Section 2.2 Working with sequences http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 The source to the tutorial is a TeX (or LaTeX?) file in CVS, which generates the HTML and PDF tutorial. I can update the CVS files, but I don't think I can update the file on the webpage... 
Peter _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From mdehoon at c2b2.columbia.edu Mon Jul 16 20:06:26 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Mon, 16 Jul 2007 20:06:26 -0400 Subject: [BioPython] Transcription References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> > The source to the tutorial is a TeX (or LaTeX?) file in CVS, which > generates the HTML and PDF tutorial. I can update the CVS files, but I > don't think I can update the file on the webpage... I update the file on the web page whenever I make a new Biopython release, using the file that is in CVS at that time. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Peter Sent: Mon 7/16/2007 12:26 PM To: Martin MOKREJS Cc: BioPython at lists.open-bio.org Subject: Re: [BioPython] Transcription Martin MOKREJS wrote: > BTW, someone who has the rights should include this example in the docs. > The relevant section is empty: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 Or remove it? We cover transcription and translation earlier: Section 2.2 Working with sequences http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 The source to the tutorial is a TeX (or LaTeX?) file in CVS, which generates the HTML and PDF tutorial. I can update the CVS files, but I don't think I can update the file on the webpage... 
Peter _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/ms-tnef Size: 3318 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biopython/attachments/20070716/c3064869/attachment.bin From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 04:50:25 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 17 Jul 2007 10:50:25 +0200 Subject: [BioPython] Bio.SeqIO ideas In-Reply-To: <469B8B93.5070201@maubp.freeserve.co.uk> References: <4693E5FE.708@maubp.freeserve.co.uk> <469B6231.6040109@ribosome.natur.cuni.cz> <469B8B93.5070201@maubp.freeserve.co.uk> Message-ID: <469C82D1.7040907@ribosome.natur.cuni.cz> Hi Peter, Peter wrote: > Martin MOKREJ? wrote: >> Peter, >> maybe the docs (generated from sources as well as those in the >> Documentation) should be clear what is id, name, description of >> SeqRecord object. > > They are all strings, normally specified when creating the instance of > the SeqRecord object. The answer is it depends on where the SeqRecord > came from - and for Bio.SeqIO this means which file format. > > One idea I had in mind was to expand the wiki page with worked examples > of a sequence files and the SeqRecord created from it by Bio.SeqIO I didn't know that, then definitely such examples would be really helpful. > > E.g., >> it would be helpful to demonstrate the values on an example of a FASTA >> record parsed. Then one would figure out what is the difference >> between name and description. > > Fasta files are used in the tutorial, > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc11 This is I think not very clear, rather show how to get the real sequence using tostring() instead of the __repr__ output, where the sequence is truncated and an alphabet is shown. 
Yes, tostring() was used somewhere way above in section 2.2. > > Do you think in addition to explicitly showing the record id and seq, I > should also show the description (and name)? Yes, because they are confusing. For parsing FASTA files I used to use either my old home-grown code or something else. Yesterday I just wanted to parse and modify some files having extra coordinates in the description line and thought, let's use Biopython. It worked, but I had to run dir(record) several times to see the methods available, as the manuals did not give me a complete listing of the methods/functions. And I really had to play around to see what name and description are, and then do an extra search in the docs for the SeqRecord class and its properties. > > Fasta files are a very free form format, and in general the first word > (splitting on white space) is a name or identifier. In some cases (e.g. > NCBI fasta files) this can be subdivided (splitting on the | character). > > To be explicit suppose you had this: > > >554154531 a made up protein > SDKJSDLHVLSDJDKJFDLJFKLSDJD > >heat shock protein > EINDLKNFLDHFDSHFLDSHJDSHDJHJHKJHSD > > Biopython will use the first word as both the record id and name, and > the full text as the description. For example given this FASTA file you > would get two records, the first: > > id = name = "554154531" > description = "554154531 a made up protein" > > and the second, > > id = name = "heat" > description = "heat shock protein" Please include these examples on the web; maybe that is sufficient for a first pass, since working out how an EMBL record would get parsed is probably unnecessarily complex. FASTA should definitely appear there. > > Note that the inclusion of the full text as the description is partly > based on older Biopython code, and also to try and make it as easy as > possible for you to extract any data from the line in your own code. I use only record.id, record.description and record.seq.tostring().
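To make the id/name/description rule quoted above concrete, here is a minimal plain-Python sketch. The helper name split_fasta_title is hypothetical, and this is not Biopython's actual parser; it only assumes that the title line is split on the first run of white space, as described in the quoted text.

```python
def split_fasta_title(title_line):
    """Split a FASTA title line into (id, name, description).

    Hypothetical helper, not part of Biopython: it mimics the rule
    described in this thread, where the first white-space-delimited
    word is used as both id and name, and the full title text is
    kept as the description.
    """
    # Drop the leading '>' marker and surrounding white space.
    title = title_line.lstrip(">").strip()
    # The first word (if any) becomes both the id and the name.
    words = title.split(None, 1)
    first_word = words[0] if words else ""
    return first_word, first_word, title


for line in (">554154531 a made up protein", ">heat shock protein"):
    record_id, name, description = split_fasta_title(line)
    print("id = name = %r, description = %r" % (record_id, description))
```

For the two example titles this gives the same id, name and description values as listed in the quoted text, so the description always repeats the id at its start.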
BTW, doing record.seq.tostring instead of record.seq.tostring() breaks Biopython code somewhere inside, but it was clear it was my fault anyway. Martin From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 05:07:44 2007 From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ) Date: Tue, 17 Jul 2007 11:07:44 +0200 Subject: [BioPython] Transcription In-Reply-To: <469B9C35.2020604@maubp.freeserve.co.uk> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> Message-ID: <469C86E0.5010205@ribosome.natur.cuni.cz> Peter, Peter wrote: > Martin MOKREJŠ wrote: >> BTW, someone who has the rights should include this example in the docs. >> The relevant section is empty: >> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 > > Or remove it? We cover transcription and translation earlier: > > Section 2.2 Working with sequences > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 But it is buried in the text; it is more helpful to have a chapter item like this. Moreover, I ended up going through my trash and picking up the email on this thread from last week. Now I have from Bio.Seq import translate ... print ''.join(('>', _record.id, ' ', _description, ' ', _start.strip(), '..', _stop.strip(), '\n', translate(_record.seq.tostring()[int(_start)-1:int(_stop)]))) but following the Tutorial would have been better, as one could choose a different translator. For example, when I wanted to look into the docs again to show you what's there, I searched on the main webpage, but searching for 'translate' gave nothing, so I also ticked some additional sets like the discussion, but still nothing came up. I went to the Documentation section, picked the API documentation as that is the fastest, http://biopython.org/DIST/docs/api/public/trees.html , and found Bio.Translate.Translator .
Again, the generated docs are bad: http://biopython.org/DIST/docs/api/public/Bio.Translate.Translator-class.html and >>> help(Translate) is not much help either: Help on module Bio.Translate in Bio: NAME Bio.Translate FILE /usr/lib/python2.4/site-packages/Bio/Translate.py CLASSES Translator class Translator | Methods defined here: | | __init__(self, table) | | back_translate(self, seq) | | translate(self, seq, stop_symbol='*') | | translate_to_stop(self, seq) DATA ambiguous_dna_by_id = {1: , 2: , 2: , 2: , 2: I think the classes should be documented at least in the source code. That's one of the very nice Python features, though of course not unique to it. Regards, Martin From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 05:29:31 2007 From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ) Date: Tue, 17 Jul 2007 11:29:31 +0200 Subject: [BioPython] Transcription In-Reply-To: <039e01c7c7c7$e8cae9a0$c789d3ce@chapman.edu> References: <039e01c7c7c7$e8cae9a0$c789d3ce@chapman.edu> Message-ID: <469C8BFB.8000901@ribosome.natur.cuni.cz> Michael Fahy wrote: > It would have helped me avoid some confusion if the Tutorial had mentioned > the difference between sequences from the coding strand and from the > template strand and the fact that the transcribe function in BioPython > expects a sequence from the coding strand. My biology colleagues here seem > to prefer thinking of transcribing the sequence from the template strand. > It's not a big deal now that we understand what the intent is, but it caused > some misunderstanding at first. Yes, I read the discussion last week, and it is confusing for me as well. I don't think I need transcribe(), as I would probably always do seq.tostring().replace('T', 'U').replace('t', 'u') and not search for a special function in the code. But what I would look for is a function doing reverse_complement() plus the same T-to-U replacement, while allowing me to optionally pass in intron or, probably even more conveniently, exon positions.
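The coding-strand versus template-strand distinction discussed here can be sketched with plain string operations. This is only an illustration under stated assumptions: the function names below are hypothetical, they work on ordinary Python strings of unambiguous DNA letters rather than on Bio.Seq objects, and they ignore introns and exons entirely.

```python
# Translation tables built once with the standard-library str.maketrans.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_DNA_TO_RNA = str.maketrans("Tt", "Uu")


def transcribe_coding(dna):
    """Transcribe a coding-strand DNA string: only T is changed to U."""
    return dna.translate(_DNA_TO_RNA)


def reverse_complement(dna):
    """Reverse-complement an unambiguous DNA string (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcribe_template(dna):
    """Transcribe a template-strand DNA string written 5' to 3'.

    The template strand is first reverse-complemented to recover the
    coding strand, and then T is changed to U.
    """
    return transcribe_coding(reverse_complement(dna))


coding = "ATGGCC"
template = reverse_complement(coding)

# Both routes describe the same mRNA, so the two printed strings match.
print(transcribe_coding(coding))
print(transcribe_template(template))
```

Starting from either strand gives the same RNA string, which is the point of the confusion described above: a function that expects the coding strand must not be handed the template strand unchanged.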
Sure, one can do it oneself. The Transcribe API docs are definitely bad, as are those for Translate, which are really, really bad. One wonders what translate_to_stop really does: one would hope that it also starts at the very first AUG in the correct frame, or maybe does 3-frame or even six-frame translation ... who knows what the undocumented function does. ;-)) Well, doing http://www.google.com/search?hl=en&q=site%3Abiopython.org+translate_to_stop&btnG=Google+Search gave a few hits, one of them in the tutorial, and one can see that it only translates up to the stop and doesn't care about start or frames. No other docs exist except the Tutorial and Cookbook, as shown in the Google output. Very bad, I think. Maybe someone would really fix Transcribe? http://www.google.com/search?num=100&hl=en&q=%22from+Bio+import+Transcribe%22&btnG=Search&lr=lang_en Similarly, Translate points me to the original thread from years ago: http://209.85.135.104/search?q=cache:MU0IHhkMV9YJ:portal.open-bio.org/pipermail/biopython/2005-May.txt+%22from+Bio+import+Translate%22&hl=en&ct=clnk&cd=6&lr=lang_cs|lang_en|lang_de|lang_sk Martin > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org > [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Peter > Sent: Monday, July 16, 2007 9:26 AM > To: Martin MOKREJŠ > Cc: BioPython at lists.open-bio.org > Subject: Re: [BioPython] Transcription > > Martin MOKREJŠ wrote: >> BTW, someone who has the rights should include this example in the docs. >> The relevant section is empty: > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 > > Or remove it? We cover transcription and translation earlier: > > Section 2.2 Working with sequences > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 > > The source to the tutorial is a TeX (or LaTeX?) file in CVS, which > generates the HTML and PDF tutorial. I can update the CVS files, but I > don't think I can update the file on the webpage...
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 05:30:38 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 17 Jul 2007 11:30:38 +0200 Subject: [BioPython] Transcription In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> Message-ID: <469C8C3E.6000608@ribosome.natur.cuni.cz> Michiel De Hoon wrote: >> The source to the tutorial is a TeX (or LaTeX?) file in CVS, which >> generates the HTML and PDF tutorial. I can update the CVS files, but I >> don't think I can update the file on the webpage... > > I update the file on the web page whenever I make a new Biopython release, > using the file that is in CVS at that time. Maybe a script would do that daily? M. From biopython at maubp.freeserve.co.uk Tue Jul 17 07:41:12 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Jul 2007 12:41:12 +0100 Subject: [BioPython] Transcription In-Reply-To: <469C8C3E.6000608@ribosome.natur.cuni.cz> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> <469C8C3E.6000608@ribosome.natur.cuni.cz> Message-ID: <469CAAD8.4060802@maubp.freeserve.co.uk> Martin MOKREJ? wrote: > > Michiel De Hoon wrote: >>> The source to the tutorial is a TeX (or LaTeX?) file in CVS, which >>> generates the HTML and PDF tutorial. 
I can update the CVS files, but I >>> don't think I can update the file on the webpage... >> I update the file on the web page whenever I make a new Biopython release, >> using the file that is in CVS at that time. > > Maybe a script would do that daily? That isn't always a good idea - Updating the online copy as part of the release cycle means that the documentation matches the latest available release. Updating the tutorial in sync with CVS could mean that the documentation would include changes made for as yet unreleased code. As long as we avoid unreleased changes, we can of course improve the documentation and update the online copy in between release cycles. Peter From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 08:44:34 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 17 Jul 2007 14:44:34 +0200 Subject: [BioPython] Transcription In-Reply-To: <469CAAD8.4060802@maubp.freeserve.co.uk> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> <469C8C3E.6000608@ribosome.natur.cuni.cz> <469CAAD8.4060802@maubp.freeserve.co.uk> Message-ID: <469CB9B2.1060605@ribosome.natur.cuni.cz> Peter wrote: > Martin MOKREJ? wrote: >> >> Michiel De Hoon wrote: >>>> The source to the tutorial is a TeX (or LaTeX?) file in CVS, which >>>> generates the HTML and PDF tutorial. I can update the CVS files, but >>>> I don't think I can update the file on the webpage... >>> I update the file on the web page whenever I make a new Biopython >>> release, >>> using the file that is in CVS at that time. >> >> Maybe a script would do that daily? > > That isn't always a good idea - Updating the online copy as part of the > release cycle means that the documentation matches the latest available > release. 
Updating the tutorial in sync with CVS could mean that the > documentation would include changes made for as yet unreleased code. I understand that of course. > > As long as we avoid unreleased changes, we can of course improve the > documentation and update the online copy in between release cycles. Or keep both? M. From jodyhey at yahoo.com Tue Jul 24 00:23:13 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Mon, 23 Jul 2007 21:23:13 -0700 (PDT) Subject: [BioPython] clustalw problem using standalone or Bio.Clustalw Message-ID: <270644.85634.qm@web53909.mail.re2.yahoo.com> I am running windows XP, Python 2.5.1 and BioPython trying to do a multiple alignment on a fasta file. The file is not large, with just 10 sequences to try things out. I'm familiar with clustalw as a command line program. With clustalw in the path the following works fine and opens a command prompt window with the clustalw menu >>>os.system('clustalw') However this line returns a '1' when it should do an alignment on the sequences contained in c.faa >>> os.system('clustalw c.faa') I also tried following the examples in the cookbook http://biopython.org/DIST/docs/tutorial/Tutorial.html and the examples in the bioinformatics course http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html#quest_run_clustalw For example, the following call to a constructed command line >>> alignment = Clustalw.do_alignment(cline) generates this error result Traceback (most recent call last): File "", line 1, in alignment = Clustalw.do_alignment(cline) File "C:\Program Files\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment % (out_file, command_line)) IOError: Output .aln file .\testalign.out not produced, commandline: clustalw .\chimp_hemoglobin.gb.faa -OUTFILE=.\testalign.out Thanks much for any pointers jhey ____________________________________________________________________________________ Sick sense of humor? Visit Yahoo! 
TV's Comedy with an Edge to see what's on, when. http://tv.yahoo.com/collections/222 From biopython at maubp.freeserve.co.uk Tue Jul 24 03:47:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Jul 2007 08:47:58 +0100 Subject: [BioPython] clustalw problem using standalone or Bio.Clustalw In-Reply-To: <270644.85634.qm@web53909.mail.re2.yahoo.com> References: <270644.85634.qm@web53909.mail.re2.yahoo.com> Message-ID: <46A5AEAE.4080203@maubp.freeserve.co.uk> Emanuel Hey wrote: > I am running windows XP, Python 2.5.1 and BioPython > trying to do a multiple alignment on a fasta file. > The file is not large, with just 10 sequences to try > things out. I'm familiar with clustalw as a command > line program. > > With clustalw in the path the following works fine and > opens a command prompt window with the clustalw menu >>>> os.system('clustalw') That suggests that clustalw is on the windows path, or in the current directory. Great. (I personally use a full path to clustalw.exe when working on Windows) > However this line returns a '1' when it should do an > alignment on the sequences contained in c.faa >>>> os.system('clustalw c.faa') Obvious question time - is the file c.faa in the current directory? i.e. Does this work? 
import os assert os.path.isfile('c.faa') > I also tried following the examples in the cookbook > http://biopython.org/DIST/docs/tutorial/Tutorial.html > and the examples in the bioinformatics course > http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html#quest_run_clustalw > > For example, the following call to a constructed > command line >>>> alignment = Clustalw.do_alignment(cline) > > generates this error result > > > Traceback (most recent call last): > File "", line 1, in > alignment = Clustalw.do_alignment(cline) > File "C:\Program > Files\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", > line 117, in do_alignment > % (out_file, command_line)) > IOError: Output .aln file .\testalign.out not > produced, commandline: clustalw > .\chimp_hemoglobin.gb.faa -OUTFILE=.\testalign.out This may be the same issue... Could you try using fully qualified paths for both clustalw.exe and the input fasta file? Peter From jodyhey at yahoo.com Tue Jul 24 10:38:56 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Tue, 24 Jul 2007 07:38:56 -0700 (PDT) Subject: [BioPython] clustalw problem using standalone or Bio.Clustalw In-Reply-To: <46A5AEAE.4080203@maubp.freeserve.co.uk> Message-ID: <912111.24870.qm@web53904.mail.re2.yahoo.com> Yes, it was indeed a path problem. Thanks. jhey --- Peter wrote: > Emanuel Hey wrote: > > I am running windows XP, Python 2.5.1 and > BioPython > > trying to do a multiple alignment on a fasta file. > > The file is not large, with just 10 sequences to > try > > things out. I'm familiar with clustalw as a > command > > line program. > > > > With clustalw in the path the following works fine > and > > opens a command prompt window with the clustalw > menu > >>>> os.system('clustalw') > > That suggests that clustalw is on the windows path, > or in the current > directory. Great. 
(I personally use a full path to > clustalw.exe when > working on Windows) > > > However this line returns a '1' when it should do > an > > alignment on the sequences contained in c.faa > >>>> os.system('clustalw c.faa') > > Obvious question time - is the file c.faa in the > current directory? > i.e. Does this work? > > import os > assert os.path.isfile('c.faa') > > > I also tried following the examples in the > cookbook > > > http://biopython.org/DIST/docs/tutorial/Tutorial.html > > and the examples in the bioinformatics course > > > http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html#quest_run_clustalw > > > > For example, the following call to a constructed > > command line > >>>> alignment = Clustalw.do_alignment(cline) > > > > generates this error result > > > > > > Traceback (most recent call last): > > File "", line 1, in > > alignment = Clustalw.do_alignment(cline) > > File "C:\Program > > > Files\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", > > line 117, in do_alignment > > % (out_file, command_line)) > > IOError: Output .aln file .\testalign.out not > > produced, commandline: clustalw > > .\chimp_hemoglobin.gb.faa -OUTFILE=.\testalign.out > > This may be the same issue... > > Could you try using fully qualified paths for both > clustalw.exe and the > input fasta file? > > Peter > > ____________________________________________________________________________________ Got a little couch potato? Check out fun summer activities for kids. 
From jodyhey at yahoo.com Tue Jul 24 15:57:05 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Tue, 24 Jul 2007 12:57:05 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw Message-ID: <935709.26436.qm@web53905.mail.re2.yahoo.com> I don't know if this is related to my previous problem. As before, this is under Windows XP. If I have clustalw.exe in the current directory then I should be able to execute it just using >>> os.system('clustalw ' + 'data.faa') Indeed this works fine. However, if I give it the full path >>> os.system('clustalw ' + 'C:\temp\pythonplay\hcgplay\data.faa') or >>> os.system('clustalw ' + 'C:\\temp\\pythonplay\\hcgplay\\data.faa') then the clustalw run crashes and returns Error: unknown option /-INFILE=C:\temp\pythonplay\hcgplay\data.faa This is very annoying. Any clues? Thanks jhey
From biopython at maubp.freeserve.co.uk Tue Jul 24 17:13:17 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Jul 2007 22:13:17 +0100 Subject: [BioPython] os.system problem with clustalw In-Reply-To: <935709.26436.qm@web53905.mail.re2.yahoo.com> References: <935709.26436.qm@web53905.mail.re2.yahoo.com> Message-ID: <46A66B6D.3010708@maubp.freeserve.co.uk> Emanuel Hey wrote: > If I have clustalw.exe in the current directory then I > should be able to execute just using > >>>> os.system('clustalw ' + 'data.faa') > > Indeed this works fine Yes, and if you try this at the Windows command prompt it also works: clustalw data.faa or: clustalw.exe data.faa > However if I give it the full path >>>> os.system('clustalw ' + > 'C:\temp\pythonplay\hcgplay\data.faa') That should fail due to the way Python uses backslashes as escape characters, so \t means a tab for example. > or >>>> os.system('clustalw ' + > 'C:\\temp\\pythonplay\\hcgplay\\data.faa') > > then the clustalw run crashes and returns > Error: unknown option > /-INFILE=C:\temp\pythonplay\hcgplay\data.faa It's not crashing, it's just returning with an error message. You are not dealing with a Biopython or even a Python problem here - you are simply (but understandably) having trouble with the clustalw command line options. Notice that clustalw.exe will tolerate this: clustalw.exe data.faa as shorthand for: clustalw.exe /infile=data.faa However, for some reason it does not seem to work with full paths like this: clustalw.exe C:\temp\pythonplay\hcgplay\data.faa You have to be very explicit: clustalw.exe /infile=C:\temp\pythonplay\hcgplay\data.faa This may be a (Windows only?) bug in clustalw. It's certainly not intuitive as is. Peter P.S. Did you not like using Bio.Clustalw to build the command line string for you?
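The two points in this reply (backslash escapes in Python string literals, and giving clustalw the input file explicitly) can be put together in a short sketch. It rests on assumptions: the helper names build_clustalw_command and run_clustalw are hypothetical, the only clustalw option used is the /infile= spelling quoted in this thread, and the example path is the one from the question.

```python
import subprocess


def build_clustalw_command(clustalw_exe, fasta_path):
    """Return an argument list with the input file given explicitly.

    Hypothetical helper: it relies only on the /infile= option quoted
    in this thread (the Windows spelling) and adds nothing else.
    """
    return [clustalw_exe, "/infile=" + fasta_path]


def run_clustalw(clustalw_exe, fasta_path):
    """Run clustalw and return its exit code (needs clustalw installed)."""
    return subprocess.call(build_clustalw_command(clustalw_exe, fasta_path))


# In a normal string literal '\t' is a tab, so this path is silently
# corrupted: the backslash and the 't' collapse into one tab character.
broken = "C:\temp"

# A raw string (r"...") keeps every backslash exactly as typed.
safe = r"C:\temp\pythonplay\hcgplay\data.faa"

command = build_clustalw_command("clustalw.exe", safe)
print(repr(broken))
print(command)
```

Passing an argument list to subprocess.call, rather than one concatenated string to os.system, also avoids having to think about quoting if the path ever contains spaces. run_clustalw is defined but deliberately not called here, since it needs a real clustalw.exe.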
From skhadar at gmail.com Wed Jul 25 08:21:17 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Wed, 25 Jul 2007 17:51:17 +0530 Subject: [BioPython] Problem with Bio.PDB Message-ID: Hi, I want to use the Bio.PDB module to calculate HSE and residue depth for all residues in a couple of proteins. This is the code I have written, and I am getting some serious errors :( . (My PyQ (Python Quotient) is low, sorry if there are any significant syntax errors :) ) --- PS : Input PDB : http://www.rcsb.org/pdb/explore/explore.do?structureId=1crn msms is installed on my computer. -- 1. HSE.py _code_ #!/usr/bin/python from Bio.PDB import * parser = PDBParser() structure = parser.get_structure('str1', '1crn.pdb') model=structure[0] hse = HSExposure() exp_ca = hse.calc_hs_exposure(model, option='CA3') exp_cb = hse.calc_hs_exposure(model, option='CB') print exp_ca[100] _error_ Traceback (most recent call last): File "HSE.py", line 12, in ? hse = HSExposure() TypeError: 'module' object is not callable 2. rd.py _code_ #!/usr/bin/python from Bio.PDB import * parser = PDBParser() structure = parser.get_structure('str1', '1crn.pdb') model=structure[0] rd = ResidueDepth(model, '1crn.pdb') residue_depth, ca_depth=rd[46] _error_ pdb_to_xyzr: error, file 1crn.pdb line 91 residue 1 atom pattern THR N was not found in ./atmtypenumbers pdb_to_xyzr: error, file 1crn.pdb line 92 residue 1 atom pattern THR CA was not found in ./atmtypenumbers (..... till the last atom) pdb_to_xyzr: error, file 1crn.pdb line 417 residue 46 atom pattern ASN OXT was not found in ./atmtypenumbers Traceback (most recent call last): File "rd.py", line 8, in ?
rd = ResidueDepth(model, '1crn.pdb') File "/software/biopython-1.42/build/lib.linux-x86_64-2.3/Bio/PDB/ResidueDepth.py", line 132, in __init__ surface=get_surface(pdb_file) File "/software/biopython-1.42/build/lib.linux-x86_64-2.3/Bio/PDB/ResidueDepth.py", line 83, in get_surface surface=_read_vertex_array(surface_file) File "/software/biopython-1.42/build/lib.linux-x86_64-2.3/Bio/PDB/ResidueDepth.py", line 51, in _read_vertex_array fp=open(filename, "r") -- Thanks Shameer Khadar NCBS-TIFR Bangalore India From biopython at maubp.freeserve.co.uk Wed Jul 25 09:00:02 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Jul 2007 14:00:02 +0100 Subject: [BioPython] Problem with Bio.PDB In-Reply-To: References: Message-ID: <46A74952.5030807@maubp.freeserve.co.uk> Shameer Khadar wrote: > Hi, > I want to use Bio.PDB module to calculate HSE and Resdue depth for all > residues in a couple of proteins. > This is the code I had written and am getting some serious errors :( . (My > PyQ(Python Quotient is low, sorry if there is any significant syntax > errors:) )) And your HSE code: > #!/usr/bin/python > from Bio.PDB import * > parser = PDBParser() > structure = parser.get_structure('str1', '1crn.pdb') > model=structure[0] > hse = HSExposure() > exp_ca = hse.calc_hs_exposure(model, option='CA3') > exp_cb = hse.calc_hs_exposure(model, option='CB') > print exp_ca[100] I'm guessing that code was based on a very old copy of the sample file: biopython/Scripts/Structure/hsexpo (which lacks a .py extension), dating to the release of Biopython 1.40 or earlier. Note that the DSSP, ResidueDepth and HSExposure classes changed two years ago. 
Here is my suggestion based on reading the latest version of that example (shipped with Biopython 1.41 onwards): from Bio.PDB import * RADIUS = 13.0 parser = PDBParser() structure = parser.get_structure('xxxx', '1crn.pdb') model=structure[0] print "HSE based on the approximate CA-CB vectors," print "using three consecutive CA positions:" hse_ca = HSExposureCA(model, RADIUS) for key in hse_ca.keys() : print key, hse_ca[key] print "HSE based on the real CA-CB vectors:" hse_cb = HSExposureCB(model, RADIUS) for key in hse_cb.keys() : print key, hse_cb[key] Peter From biopython at maubp.freeserve.co.uk Wed Jul 25 09:11:21 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Jul 2007 14:11:21 +0100 Subject: [BioPython] Problem with Bio.PDB documentation for HSE In-Reply-To: <46A74952.5030807@maubp.freeserve.co.uk> References: <46A74952.5030807@maubp.freeserve.co.uk> Message-ID: <46A74BF9.5070404@maubp.freeserve.co.uk> Actually... Shameer was trying to use this HSE code: >> #!/usr/bin/python >> from Bio.PDB import * >> parser = PDBParser() >> structure = parser.get_structure('str1', '1crn.pdb') >> model=structure[0] >> hse = HSExposure() >> exp_ca = hse.calc_hs_exposure(model, option='CA3') >> exp_cb = hse.calc_hs_exposure(model, option='CB') >> print exp_ca[100] I replied: > I'm guessing that code was based on a very old copy of the sample file: > biopython/Scripts/Structure/hsexpo (which lacks a .py extension), dating > to the release of Biopython 1.40 or earlier. Note that the DSSP, > ResidueDepth and HSExposure classes changed two years ago. I've realised that Shameer's code looks like it was based on page 12 of the "The Biopython Structural Bioinformatics FAQ" (i.e. the guide to Bio.PDB), http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf Thomas - it looks like the documentation needs updating here. As far as I can tell, the source is not in CVS... 
Peter From jodyhey at yahoo.com Wed Jul 25 09:39:43 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Wed, 25 Jul 2007 06:39:43 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw In-Reply-To: <46A66B6D.3010708@maubp.freeserve.co.uk> Message-ID: <396836.63695.qm@web53901.mail.re2.yahoo.com> Thanks much for responding ok, I had no idea Clustalw was so particular about its command line flags. I was not using Bio.Clustalw to build the command line because I could not get that to work either. for example this does not work, for reasons that are obscure to me. >>> faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa' >>> cline = MultipleAlignCL(faa_filename) >>> align = do_alignment(cline) Thanks jhey --- Peter wrote: > Emanuel Hey wrote: > > If I have clustalw.exe in the current directory > then I > > should be able to execute just using > > > >>>> os.sytem('clustalw ' + 'data.faa') > > > > Indeed this works fine > > Yes, and if you try this at the windows command > prompt it also works: > > clustalw data.faa > > or: > > clustalw.exe data.faa > > > However if I give it the full path > >>>> os.system('clustalw ' + > > 'C:\temp\pythonplay\hcgplay\data.faa') > > That should fail due to the way python uses slashes > as escape > characters, so \t means a tab for example. > > > or > >>>> os.system('clustalw ' + > > 'C:\\temp\\pythonplay\\hcgplay\\data.faa') > > > > then the clustalw run crashes and returns > > Error: unknown option > > /-INFILE=C:\temp\pythonplay\hcgplay\data.faa > > Its not crashing, its just returning with an error > message. > > You are not dealing with a Biopython or even a > python problem here - you > are simply (but understandably) having trouble with > the clustalw command > line options. 
>
> Notice that clustalw.exe will tolerate this:
>
> clustalw.exe data.faa
>
> as shorthand for:
>
> clustalw.exe /infile=data.faa
>
> However, for some reason it does not seem to work
> with full paths like this:
>
> clustalw.exe C:\temp\pythonplay\hcgplay\data.faa
>
> You have to be very explicit:
>
> clustalw.exe /infile=C:\temp\pythonplay\hcgplay\data.faa
>
> This may be a (windows only?) bug in clustalw. It's certainly not
> intuitive as is.
>
> Peter
>
> P.S. Did you not like using Bio.Clustalw to build the command line
> string for you?
>

From biopython at maubp.freeserve.co.uk Wed Jul 25 10:21:53 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 15:21:53 +0100
Subject: [BioPython] os.system problem with clustalw
In-Reply-To: <396836.63695.qm@web53901.mail.re2.yahoo.com>
References: <396836.63695.qm@web53901.mail.re2.yahoo.com>
Message-ID: <46A75C81.8050108@maubp.freeserve.co.uk>

Emanuel Hey wrote:
> Thanks much for responding
>
> ok, I had no idea Clustalw was so particular about its command line
> flags.

It's confused me in the past too!

> I was not using Bio.Clustalw to build the command line because I
> could not get that to work either.

Ahh. Good point. I guess I should have checked this when I wrote my last email, but it turns out Bio.Clustalw was building its command lines without using the INFILE argument... which I've just fixed in CVS.

This should now work:

from Bio.Clustalw import MultipleAlignCL, do_alignment
faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa'
cline = MultipleAlignCL(faa_filename)
#print cline
align = do_alignment(cline)
for col_index in range(align.get_alignment_length()):
    print align.get_column(col_index)

Please try this by backing up and then updating the file Bio/Clustalw/__init__.py to CVS revision 1.15, which you can download here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Clustalw/__init__.py?cvsroot=biopython

Thanks

Peter

From fahy at chapman.edu Wed Jul 25 11:14:06 2007
From: fahy at chapman.edu (Michael Fahy)
Date: Wed, 25 Jul 2007 08:14:06 -0700 (PDT)
Subject: [BioPython] os.system problem with clustalw
In-Reply-To: <46A75C81.8050108@maubp.freeserve.co.uk>
References: <396836.63695.qm@web53901.mail.re2.yahoo.com> <46A75C81.8050108@maubp.freeserve.co.uk>
Message-ID: <3181.66.27.156.108.1185376446.squirrel@webmail.chapman.edu>

Speaking of the clustalw command line, is it possible to get clustalw to output a phylogenetic tree
file, rather than an alignment file, via command line arguments? ---------------- > Emanuel Hey wrote: >> Thanks much for responding >> >> ok, I had no idea Clustalw was so particular about its command line >> flags. > > Its confused me in the past too! From jodyhey at yahoo.com Wed Jul 25 12:23:18 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Wed, 25 Jul 2007 09:23:18 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw In-Reply-To: <46A75C81.8050108@maubp.freeserve.co.uk> Message-ID: <562586.70794.qm@web53909.mail.re2.yahoo.com> Peter Thanks. Actually do_alignment() is not working for me for just names, even without directories. >>> cline = MultipleAlignCL('data.faa') >>> str(cline) 'clustalw -INFILE=data.faa' >>> align = do_alignment(cline) Traceback (most recent call last): File "", line 1, in align = do_alignment(cline) File "C:\Program Files\Python25\lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment % (out_file, command_line)) IOError: Output .aln file data.aln not produced, commandline: clustalw -INFILE=data.faa Is this me? I think I got the new __init__.py installed ok. jhey --- Peter wrote: > Emanuel Hey wrote: > > Thanks much for responding > > > > ok, I had no idea Clustalw was so particular > about its command line > > flags. > > Its confused me in the past too! > > > I was not using Bio.Clustalw to build the command > line because I > > could not get that to work either. > > Ahh. Good point. I guess I should have checked this > when I wrote my last > email, but it turns out Bio.Clustalw was building > it's command lines > without using the INPUT argument... which I've just > fixed in CVS. 
>
> This should now work:
>
> from Bio.Clustalw import MultipleAlignCL, do_alignment
> faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa'
> cline = MultipleAlignCL(faa_filename)
> #print cline
> align = do_alignment(cline)
> for col_index in range(align.get_alignment_length()):
>     print align.get_column(col_index)
>
> Please try this by backing up and then updating the file
> Bio/Clustalw/__init__.py to CVS revision 1.15, which you can download here:
>
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Clustalw/__init__.py?cvsroot=biopython
>
> Thanks
>
> Peter
>

From italo.maia at gmail.com Wed Jul 25 13:23:43 2007
From: italo.maia at gmail.com (Italo Maia)
Date: Wed, 25 Jul 2007 14:23:43 -0300
Subject: [BioPython] My SeqIO doesn't have a "parse" method! @_@
Message-ID: <800166920707251023u3a1cd8b0vf271440cbdcfa58a@mail.gmail.com>

I'm confused! Shouldn't SeqIO have a "parse" method??
My ubuntu biopython doesn't. Is this correct?

--
"Arrogance is the weapon of the weak."

===========================
Italo Moreira Campelo Maia
Computer Science - UECE
Web Developer
Java, Python Programmer

My blog ^^ http://eusouolobomal.blogspot.com/
===========================

From biopython at maubp.freeserve.co.uk Wed Jul 25 17:01:50 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 22:01:50 +0100
Subject: [BioPython] My SeqIO doesn't have a "parse" method!
In-Reply-To: <800166920707251023u3a1cd8b0vf271440cbdcfa58a@mail.gmail.com>
References: <800166920707251023u3a1cd8b0vf271440cbdcfa58a@mail.gmail.com>
Message-ID: <46A7BA3E.4000407@maubp.freeserve.co.uk>

Italo Maia wrote:
> I'm confused! Shouldn't SeqIO have a "parse" method??
> My ubuntu biopython doesn't. Is this correct?

You need Biopython 1.43 or later to use the new Bio.SeqIO code.
http://biopython.org/wiki/SeqIO

Which version of Ubuntu do you have? At the moment, only the new, unreleased Ubuntu Gutsy has the latest version of Biopython in the repositories:
http://packages.ubuntu.com/gutsy/source/python-biopython

The good news is that it should be simple to install Biopython from source on Ubuntu - provided you install all the dependencies first! I personally still use Ubuntu Dapper Drake on my Linux machine.

Peter

From biopython at maubp.freeserve.co.uk Wed Jul 25 18:31:39 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 23:31:39 +0100
Subject: [BioPython] os.system problem with clustalw (on Windows)
In-Reply-To: <562586.70794.qm@web53909.mail.re2.yahoo.com>
References: <562586.70794.qm@web53909.mail.re2.yahoo.com>
Message-ID: <46A7CF4B.6050704@maubp.freeserve.co.uk>

Emanuel Hey wrote:
> Peter
>
> Thanks.
>
> Actually do_alignment() is not working for me for just
> names, even without directories.

My fault maybe. The old version of Bio/Clustalw/__init__.py before my first update today probably would have worked without directories in the filename.

My mistake was that while clustalw on Windows seems to cope with / or - for some of its options, it has to be /infile=... rather than -infile=... (either infile or INFILE is fine). On Linux, you can only use - for arguments.

I should have spotted there was something amiss, but I was tricked by the fact that in my simple testing the output alignment already existed, and there was no error trapped from the system call, so it appeared to work. Grr.
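
The prefix rule described above can be sketched in a few lines. This is a hedged illustration only, not the code in Bio/Clustalw/__init__.py: the helper name clustalw_args and its parameters are hypothetical, and the only behaviour assumed is what this thread reports (a / prefix on Windows, a - prefix elsewhere).

```python
import sys


def clustalw_args(infile, platform=sys.platform, program="clustalw"):
    """Build an argument list for aligning one input file with clustalw.

    Hypothetical helper. As reported in this thread, Windows builds of
    clustalw need /infile=... while other platforms only accept -infile=...
    """
    prefix = "/" if platform.startswith("win") else "-"
    return [program, prefix + "infile=" + infile]


windows_args = clustalw_args("C:\\temp\\pythonplay\\hcgplay\\data.faa", platform="win32")
linux_args = clustalw_args("data.faa", platform="linux2")
```

Returning a list rather than one string means it can be handed to subprocess.call() without going through a shell, which sidesteps backslash and quoting surprises on the Python side. Whether clustalw itself then accepts every path is a separate question that this sketch does not answer.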
There are also "complications" when filenames contain spaces (a Microsoft innovation which frankly was in many respects a dreadful idea).

Emanuel - please could you try updating Bio/Clustalw/__init__.py once again, trying the do_alignment() function, and reporting back. Please be explicit about the filenames used.

Peter

From italo.maia at gmail.com Wed Jul 25 19:07:49 2007
From: italo.maia at gmail.com (Italo Maia)
Date: Wed, 25 Jul 2007 20:07:49 -0300
Subject: [BioPython] My SeqIO doesn't have a "parse" method!
In-Reply-To: <46A7BA3E.4000407@maubp.freeserve.co.uk>
References: <800166920707251023u3a1cd8b0vf271440cbdcfa58a@mail.gmail.com> <46A7BA3E.4000407@maubp.freeserve.co.uk>
Message-ID: <800166920707251607jed5b257lcab41695b75b4a2e@mail.gmail.com>

Oh well, so I'll try Gutsy's Biopython Debian package.
Link: http://mirrors.kernel.org/ubuntu/pool/universe/p/python-biopython/python-biopython_1.43-1_i386.deb

Thanks Peter.

2007/7/25, Peter :
>
> Italo Maia wrote:
> > I'm confused! Shouldn't SeqIO have a "parse" method??
> > My ubuntu biopython doesn't. Is this correct?
>
> You need Biopython 1.43 or later to use the new Bio.SeqIO code.
> http://biopython.org/wiki/SeqIO
>
> Which version of Ubuntu do you have? At the moment, only new un-released
> Ubuntu Gutsy has the latest version of Biopython in the repositories:
> http://packages.ubuntu.com/gutsy/source/python-biopython
>
> The good news is that is should be simple to install Biopython from
> source on Ubuntu - provided you install all the dependencies first! I
> personally still use Ubuntu Dapper Drake on my Linux machine.
>
> Peter
>
>

--
"Arrogance is the weapon of the weak."

===========================
Italo Moreira Campelo Maia
Computer Science - UECE
Web Developer
Java, Python Programmer

My blog ^^ http://eusouolobomal.blogspot.com/
===========================

From biopython at maubp.freeserve.co.uk Thu Jul 26 09:55:14 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jul 2007 14:55:14 +0100
Subject: [BioPython] clustalw and trees
In-Reply-To: <3181.66.27.156.108.1185376446.squirrel@webmail.chapman.edu>
References: <396836.63695.qm@web53901.mail.re2.yahoo.com> <46A75C81.8050108@maubp.freeserve.co.uk> <3181.66.27.156.108.1185376446.squirrel@webmail.chapman.edu>
Message-ID: <46A8A7C2.3000003@maubp.freeserve.co.uk>

Michael Fahy wrote:
>
> Speaking of the clustalw command line, is it possible to get clustalw to
> output a phylogenetic tree file, rather than an alignment file, via
> command line arguments?

Do you mean in addition to the guide tree (*.dnd) it generates by default when building an alignment (*.aln) from a fasta input?

I would have a look at clustalw -help (which works on Linux!) for more options, e.g.

-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n :seed number for bootstraps.
-KIMURA :use Kimura's correction.
-TOSSGAPS :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display

Peter

From jodyhey at yahoo.com Thu Jul 26 11:25:33 2007
From: jodyhey at yahoo.com (Emanuel Hey)
Date: Thu, 26 Jul 2007 08:25:33 -0700 (PDT)
Subject: [BioPython] os.system problem with clustalw (on Windows)
In-Reply-To: <46A7CF4B.6050704@maubp.freeserve.co.uk>
Message-ID: <847192.79361.qm@web53902.mail.re2.yahoo.com>

Peter

both of these now work.

>>> faa_filename = 'data.faa'
>>> cline = MultipleAlignCL(faa_filename)
>>> align = do_alignment(cline)
>>>
>>> faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa'
>>> cline = MultipleAlignCL(faa_filename)
>>> align = do_alignment(cline)

Thanks!

jhey

--- Peter wrote:

> Emanuel Hey wrote:
> > Peter
> >
> > Thanks.
> >
> > Actually do_alignment() is not working for me for just
> > names, even without directories.
>
> My fault maybe. The old version of Bio/Clustalw/__init__.py before my
> first update today probably would have worked without directories in the
> filename.
>
> My mistake was that while clustalw on windows seems to copes with / or -
> for some of its options, it has to be /infile=... rather than
> -infile=... (using either infile for INFILE is fine). On Linux, you can
> only use - for arguments.
>
> I should have spotted there was something amiss, but I was tricked by
> the fact that in my simple testing the output alignment already existed,
> and there was no error trapped from the system call, so it appeared to
> work. Grr.
>
> There are also "complications" when filenames contain spaces (a
> Microsoft innovation which frankly was in many respects a dreadful idea).
>
> Emanuel - please could you try updating Bio/Clustalw/__init__.py once
> again, trying the do_alignment() function, and reporting back. Please be
> explicit about the filenames used.
>
> Peter
>
>

From biopython at maubp.freeserve.co.uk Thu Jul 26 11:57:26 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jul 2007 16:57:26 +0100
Subject: [BioPython] os.system problem with clustalw (on Windows)
In-Reply-To: <847192.79361.qm@web53902.mail.re2.yahoo.com>
References: <847192.79361.qm@web53902.mail.re2.yahoo.com>
Message-ID: <46A8C466.3000803@maubp.freeserve.co.uk>

Emanuel Hey wrote:
> Peter
>
> both of these now work.
>
> faa_filename = 'data.faa'
> cline = MultipleAlignCL(faa_filename)
> align = do_alignment(cline)
>
> faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa'
> cline = MultipleAlignCL(faa_filename)
> align = do_alignment(cline)
>
> Thanks!
>
> jhey

Oh good :)

Have you tried file names and/or paths with spaces in them?

Peter

From dalloliogm at gmail.com Thu Jul 26 12:15:52 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 26 Jul 2007 18:15:52 +0200
Subject: [BioPython] biopython UML class diagram documentation?
Message-ID: <5aa3b3570707260915i77862742k7a0d92b553fe6566@mail.gmail.com>

Hi,
where can I find UML documentation, with at least class diagrams, for the whole Biopython project?

I have already found this one from the Pasteur Institute in Paris:
- http://www.pasteur.fr/recherche/unites/sis/formation/python/images/seq_class.png
but I was wondering if there is something official in the wiki or somewhere else.

Thanks!

--
-----------------------------------------------------------
My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From fahy at chapman.edu Thu Jul 26 13:35:56 2007
From: fahy at chapman.edu (Michael Fahy)
Date: Thu, 26 Jul 2007 10:35:56 -0700
Subject: [BioPython] clustalw and trees
In-Reply-To: <46A8A7C2.3000003@maubp.freeserve.co.uk>
Message-ID: <003601c7cfab$6c8dd460$c789d3ce@chapman.edu>

Peter,

I was hoping to get it to generate a "real" phylip tree file rather than the guide tree that it generates automatically. If I run it from the command line and add the flag "-OUTPUTTREE=phylip", it creates a guide tree file and an alignment file but no phylip tree file. If I run it interactively I can get it to create a phylip tree file (which is different from the guide tree file) but I have not been able to figure out how to do this from the command line.

-----Original Message-----
From: Peter [mailto:biopython at maubp.freeserve.co.uk]
Sent: Thursday, July 26, 2007 6:55 AM
To: fahy at chapman.edu
Cc: biopython at biopython.org
Subject: Re: [BioPython] clustalw and trees

Michael Fahy wrote:
>
> Speaking of the clustalw command line, is it possible to get clustalw to
> output a phylogenetic tree file, rather than an alignment file, via
> command line arguments?

Do you mean in addition to the guide tree (*.dnd) it generates by default when building an alignment (*.aln) from a fasta input?

I would have a look at clustalw -help (which works on Linux!) for more options, e.g.

-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n :seed number for bootstraps.
-KIMURA :use Kimura's correction.
-TOSSGAPS :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display

Peter

From jodyhey at yahoo.com Thu Jul 26 13:38:16 2007
From: jodyhey at yahoo.com (Emanuel Hey)
Date: Thu, 26 Jul 2007 10:38:16 -0700 (PDT)
Subject: [BioPython] os.system problem with clustalw (on Windows)
In-Reply-To: <46A8C466.3000803@maubp.freeserve.co.uk>
Message-ID: <709674.30122.qm@web53908.mail.re2.yahoo.com>

I'm avoiding directories with space names, but I just did a quick check.

This works

>>> align = do_alignment(cline)
>>> faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\space test\\data.faa'
>>> cline = MultipleAlignCL(faa_filename)
>>> align = do_alignment(cline)

jhey

--- Peter wrote:

> Emanuel Hey wrote:
> > Peter
> >
> > both of these now work.
> >
> > faa_filename = 'data.faa'
> > cline = MultipleAlignCL(faa_filename)
> > align = do_alignment(cline)
> >
> > faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa'
> > cline = MultipleAlignCL(faa_filename)
> > align = do_alignment(cline)
> >
> > Thanks!
> >
> > jhey
>
> Oh good :)
>
> Have you tried file names and/or paths with spaces in them?
>
> Peter
>
>

From biopython at maubp.freeserve.co.uk Thu Jul 26 14:29:13 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Jul 2007 19:29:13 +0100
Subject: [BioPython] clustalw and trees
In-Reply-To: <003601c7cfab$6c8dd460$c789d3ce@chapman.edu>
References: <003601c7cfab$6c8dd460$c789d3ce@chapman.edu>
Message-ID: <46A8E7F9.1010506@maubp.freeserve.co.uk>

Michael Fahy wrote:
> Peter,
>
> I was hoping to get it to generate a "real" phylip tree file rather than the
> guide tree that it generates automatically. If I run it from the command
> line and add the flag "-OUTPUTTREE=phylip", it creates a guide tree file
> and an alignment file but no phylip tree file. If I run it interactively I
> can get it to create a phylip tree file (which is different from the guide
> tree file) but I have not been able to figure out how to do this from the
> command line.

Oh. Maybe you need to pore over the clustalw documentation...

You might also look at the EMBOSS version of PHYLIP and use that instead (assuming it's available on your OS).

http://emboss.sourceforge.net/apps/release/5.0/embassy/phylip/
http://emboss.sourceforge.net/apps/release/5.0/embassy/phylipnew/

They have repackaged all the PHYLIP tools with usable command line interfaces - if you have ever tried to use the original PHYLIP tools in a script you'll appreciate the difference.
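
On the clustalw side of the question, one thing that may be worth trying is a separate second run on the finished alignment. The sketch below only builds such a command line and rests on an assumption: that your clustalw build lists a -TREE verb under "clustalw -help" alongside the -OUTPUTTREE option quoted in this thread. The helper name clustalw_tree_args is hypothetical, and the - prefix is the Linux-style form.

```python
def clustalw_tree_args(alignment_file, tree_format="phylip", program="clustalw"):
    """Build an argument list asking clustalw for a tree from an alignment.

    Hypothetical helper. Assumes the -TREE verb is available in your
    clustalw build; the -OUTPUTTREE values are those quoted in this thread.
    """
    allowed = ("nj", "phylip", "dist", "nexus")
    if tree_format not in allowed:
        raise ValueError("tree_format must be one of %s" % ", ".join(allowed))
    return [
        program,
        "-infile=" + alignment_file,     # an existing alignment, e.g. data.aln
        "-tree",                         # assumed verb: compute a tree, do not realign
        "-outputtree=" + tree_format,
    ]


tree_args = clustalw_tree_args("data.aln")
```

The resulting list could be passed to subprocess.call(); which output file clustalw then writes, and under what name, is best checked against its own documentation.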

Peter

From mmokrejs at ribosome.natur.cuni.cz Fri Jul 27 09:42:56 2007
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Fri, 27 Jul 2007 15:42:56 +0200
Subject: [BioPython] GenBank parser used to break recently on rRNA records
Message-ID: <46A9F660.1030107@ribosome.natur.cuni.cz>

Hi,
I tried to parse all ESTs and cDNAs from GenBank using a roughly 3-week-old Biopython from CVS, and it turned out it choked here:

Will parse file 'ftp://ftp.ncbi.nlm.nih.gov/genbank/gbhtc12.seq.gz'
Traceback (most recent call last):
  File "translate_ESTs.py", line 27, in ?
    _record = _iterator.next()
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 142, in next
    return self._parser.parse(self.handle)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 208, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 820, in _feed_first_line
    assert line[47:54].strip() in ['','DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'], \
AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
LOCUS DQ369798 725 bp rRNA linear HTC 14-JUN-2007

However, the code has been revamped as I see in current CVS, so this is just for your information. I can parse the file with current code. ;-)
Martin

From biopython at maubp.freeserve.co.uk Fri Jul 27 10:08:38 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 15:08:38 +0100
Subject: [BioPython] GenBank parser used to break recently on rRNA records
In-Reply-To: <46A9F660.1030107@ribosome.natur.cuni.cz>
References: <46A9F660.1030107@ribosome.natur.cuni.cz>
Message-ID: <46A9FC66.7080507@maubp.freeserve.co.uk>

Martin MOKREJŠ wrote:
> Hi,
> I tried to parse all ESTs and cDNAs from GenBank using biopython about
> 3 weeks old from CVS and it turned out it choked here:
>
> Will parse file 'ftp://ftp.ncbi.nlm.nih.gov/genbank/gbhtc12.seq.gz'
> Traceback (most recent call last):
>   File "translate_ESTs.py", line 27, in ?
>     _record = _iterator.next()
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 142, in next
>     return self._parser.parse(self.handle)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 208, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
>     self._feed_first_line(consumer, self.line)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 820, in _feed_first_line
>     assert line[47:54].strip() in ['','DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'], \
> AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
> LOCUS DQ369798 725 bp rRNA linear HTC 14-JUN-2007
>
> However, the code has been revamped as I see in current CVS, so this is
> just for your information. I can parse the file with current code. ;-)
> Martin

It looks like the NCBI have introduced another sequence type to their databases, 'rRNA' in this case.

I think this validates the recent change which will now accept anything with 'RNA' or 'DNA' in the string :)

Peter

From douglas.kojetin at gmail.com Fri Jul 27 15:50:46 2007
From: douglas.kojetin at gmail.com (Douglas Kojetin)
Date: Fri, 27 Jul 2007 15:50:46 -0400
Subject: [BioPython] Bio.PDB: create a dummy vector
Message-ID: <19E72ECB-55F6-4A1A-98F8-E04D5ECFF0DC@gmail.com>

Hi All,

I would like to calculate the angle between all of the N-H vectors in a PDB file to a specific point in 3D space. Can someone tell me how to create a dummy atom using Biopython located at [0.0, 0.0, 0.0]?
atom1 = structure[0]['A'][1]['H'] atom2 = structure[0]['A'][1]['N'] vector1=atom1.get_vector() vector2=atom2.get_vector() dummy = :: somehow create a point at [0.0, 0.0, 0.0] ::: angle=calc_angle(vector1,vector2,dummy) Thanks, Doug From karbak at gmail.com Sat Jul 28 13:32:51 2007 From: karbak at gmail.com (K. Arun) Date: Sat, 28 Jul 2007 13:32:51 -0400 Subject: [BioPython] Bio.PDB: create a dummy vector In-Reply-To: <19E72ECB-55F6-4A1A-98F8-E04D5ECFF0DC@gmail.com> References: <19E72ECB-55F6-4A1A-98F8-E04D5ECFF0DC@gmail.com> Message-ID: <162452a10707281032r2c88b689k19706c122c851802@mail.gmail.com> On 7/27/07, Douglas Kojetin wrote: > I would like to calculate the the angle between all of the N-H > vectors in a PDB file to a specific point in 3D space. Can someone > tell me how to create a dummy atom using Biopython located at [0.0, > 0.0, 0.0]? > > atom1 = structure[0]['A'][1]['H'] > atom2 = structure[0]['A'][1]['N'] > > vector1=atom1.get_vector() > vector2=atom2.get_vector() > dummy = :: somehow create a point at [0.0, 0.0, 0.0] ::: Just calling Bio.PDB.Vector directly as below seems to work. dummy = Bio.PDB.Vector(0.0, 0.0, 0.0) -arun From fahy at chapman.edu Sat Jul 28 21:07:29 2007 From: fahy at chapman.edu (Michael Fahy) Date: Sat, 28 Jul 2007 18:07:29 -0700 (PDT) Subject: [BioPython] biopython In-Reply-To: <46A999C9.5010800@unice.fr> References: <46A999C9.5010800@unice.fr> Message-ID: <2399.10.100.0.80.1185671249.squirrel@webmail.chapman.edu> Dear Richard, Thank you for correcting my misuse of terminology.? As I understand clusalw, it generates a guide tree from a distance matrix calculated from pairwise alignments.? It then uses this guide tree to do a full multiple alignment.? If you run clustalw interactively, you can ask it to generate a phylogenetic tree file from this multiple alignment.? The tree file produced in this way differs, naturally enough, from the guide tree.? If you run clustalw and pass it command line arguments it will automatically? 
write the guide tree to a file but I have not been able to get it to write the other tree to a file.

I now understand from your comments that there is little value in creating this tree file automatically due to inaccuracies in the clustalw alignment and other factors. I have read some references that do recommend using clustalw for creating multiple alignments (and even for creating phylogenetic trees). I have also read Edgar's paper in which he provides evidence for the superior accuracy of MUSCLE. Is there consensus in the research community that, while clustalw was a useful program for doing multiple alignments, it has been surpassed by newer programs such as MUSCLE and T-Coffee? If so, it would be useful to update BioPython and the BioPython Tutorial and Cookbook to use these alternative programs.

And, if you have created a multiple alignment and cleaned it (e.g. by removing domains with too much homoplasy) which tool or tools would you use to create the tree file (or files) from the alignment? I understand that you recommend using multiple methods (neighbor joining, parsimony, maximum likelihood, etc.) and comparing the results. I would guess that there are different tools that are better suited to each method. You mention TreeDyn and that looks like a very powerful tool but it appears that it is used for editing trees and not for creating tree files from multiple alignment files.

OK, I just saw the link to your genbank2treedyn program on the treedyn site. It looks like your program will read a fasta file with a set of sequences and then use clustalw to do the multiple alignment and phylip to create the phylogenetic tree. So I guess you are not opposed to using clustalw, you are just warning against using its multiple alignment files to create trees without analyzing and correcting them by hand.

Thanks for your help.
--Michael -- > Dear Michael > > I was hoping to get it to generate a "real" phylip tree file rather than > the > guide tree that it generates automatically. > > 1/ > There is nothing such as a "phylip" tree > The usual tree format for phylip as well as many treeing programs is > "newick", in the form > ((a,b),c) > This is the format of the clustal guide tree. > > 2/ you should read clustal and phylip docs as well as some phylogenetic > courses > Making a good phylogenetic tree cannot be automated yet. You have to > - check alignement by hand (clustal will align sequences that should not > be aligned) > - exclude domains (positions) with too much homoplasy, or missing > positions in some sequences. > - several methods should be compared (distances, Ml, MP, ...) and a > boostrap run > > Clustal is an alignement program (you may try Muscle, Lagan, Tcoffe, > ...) and not at all a phylogeny program > > Finally, if you make trees, try : www.treedyn.org ;-) > > best > Richard > From biopython at maubp.freeserve.co.uk Sun Jul 29 05:06:13 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 29 Jul 2007 10:06:13 +0100 Subject: [BioPython] biopython In-Reply-To: <2399.10.100.0.80.1185671249.squirrel@webmail.chapman.edu> References: <46A999C9.5010800@unice.fr> <2399.10.100.0.80.1185671249.squirrel@webmail.chapman.edu> Message-ID: <46AC5885.7010007@maubp.freeserve.co.uk> Michael Fahy wrote: > If you run clustalw and pass it command line arguments it will > automatically write the guide tree to a file but I have not be > able to get it to write the other tree to a file. If you use filename.fasta as input (or similar extensions), then by default Clustalw will call the alignment filename.aln and the guide tree filename.dnd (I normally accept these defaults myself). 
An example of changing the alignment filename (tested on Linux): clustalw -infile=demo.faa -outfile=demo.align This will result in an alignment, demo.align (our specified name), and a guide tree called demo.dnd (default naming). Another command line: clustalw -infile=demo.faa -newtree=demo.tree This will result only in a guide tree, demo.tree (our specified name), but no alignment. I don't know if you can get clustal to output both the alignment and the guide tree specifying both filenames. > Is there consensus in the research community that , while clustalw > was a useful program for doing multiple alignments, it has been > surpassed by newer programs such as MUSCLE and T-Coffee? My personal impression is that many Biologists are quite content with clustalw. If they are unhappy with an alignment then they might edit it by hand, or investigate other tools. There may be a link to the age of the researcher, or their computer skill ;) > If so, it would be useful to update BipPython and the BioPython > Tutorial and Cookbook to use these alternative programs. I would keep the clustalw examples, as I think the program is still widely used and is a useful baseline. Its also available on all the main operating systems. As MUSCLE is available on Linux, Mac and Windows, extending the tutorial to use it might be nice. Ideally we would also want to add a MUSCLE command line wrapper to Biopython to make calling the program as easy as possible. For T-Coffee, its not obvious if its cross platform, but I suspect its not available for Windows. Again, the same caveat applies - it would be best to have a Biopython command line wrapper for it before adding it to the tutorial. If you are happy at the command line, and running command line tools from python, then these tools should read FASTA files and output at least one output format Biopython can read. 
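For anyone who wants to drive the two command lines above from a Python script, here is a rough sketch. It only uses the -infile, -outfile and -newtree options quoted in this message; the function names are made up for illustration, and it assumes a clustalw executable is on the PATH. It is written for a current Python 3 rather than the Python 2.x in use on this list.

```python
import shutil
import subprocess


def build_clustalw_command(infile, outfile=None, newtree=None, executable="clustalw"):
    """Return the argument list for a clustalw call (hypothetical helper).

    Only the options mentioned in the message are used:
    -infile, -outfile and -newtree.
    """
    command = [executable, "-infile=%s" % infile]
    if outfile is not None:
        command.append("-outfile=%s" % outfile)
    if newtree is not None:
        command.append("-newtree=%s" % newtree)
    return command


def run_clustalw(infile, outfile=None, newtree=None):
    """Run clustalw and raise an error if it is missing or exits non-zero."""
    command = build_clustalw_command(infile, outfile=outfile, newtree=newtree)
    if shutil.which(command[0]) is None:
        # Assumption: the program is installed under the name "clustalw".
        raise RuntimeError("clustalw executable not found on PATH")
    subprocess.run(command, check=True)
    return command
```

Calling run_clustalw("demo.faa", outfile="demo.align") should then behave like the first command line above, leaving the alignment in demo.align for a Biopython parser to read.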
Peter

From jdiezperezj at gmail.com Tue Jul 31 10:38:34 2007
From: jdiezperezj at gmail.com (Javier Díez)
Date: Tue, 31 Jul 2007 16:38:34 +0200
Subject: [BioPython] blast output xml
Message-ID:

Hi,
Does anyone know if it is possible to get blast-xml output running blast from biopython scripts?
How can I do that?
Thanks
Javi

From jdiezperezj at gmail.com Tue Jul 31 10:45:28 2007
From: jdiezperezj at gmail.com (Javier Díez)
Date: Tue, 31 Jul 2007 16:45:28 +0200
Subject: [BioPython] blast output xml
In-Reply-To:
References:
Message-ID:

Running local blast

On 7/31/07, Javier Díez wrote:
>
> Hi,
> Does anyone know if it is possible to get blast-xml output running blast
> from biopython scripts?
> How can I do that?
> Thanks
> Javi
>

From biopython at maubp.freeserve.co.uk Tue Jul 31 11:14:18 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Jul 2007 16:14:18 +0100
Subject: [BioPython] blast output xml
In-Reply-To:
References:
Message-ID: <46AF51CA.2030005@maubp.freeserve.co.uk>

Javier Díez wrote:
> Hi,
> Does anyone know if it is possible to get blast-xml output running blast
> from biopython scripts?
> How can I do that?
> Thanks
> Javi

Yes, you can run standalone blast from Biopython, and parse its XML output. See "Chapter 3 BLAST" of the tutorial:

http://biopython.org/DIST/docs/tutorial/Tutorial.html

Note that while parsing the plain text output worked well with older versions of BLAST, we don't recommend using it anymore; use the XML output.

Peter

From mdehoon at c2b2.columbia.edu Tue Jul 31 11:11:38 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 01 Aug 2007 00:11:38 +0900
Subject: [BioPython] blast output xml
In-Reply-To:
References:
Message-ID: <46AF512A.7050305@c2b2.columbia.edu>

See section 3.1 in the manual.

--Michiel.

Javier Díez wrote:
> Hi,
> Does anyone know if it is possible to get blast-xml output running blast
> from biopython scripts?
> How can I do that?
> Thanks > Javi > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From italo.maia at gmail.com Sun Jul 1 23:12:24 2007 From: italo.maia at gmail.com (Italo Maia) Date: Sun, 1 Jul 2007 20:12:24 -0300 Subject: [BioPython] Error installing biopython... Message-ID: <800166920707011612j64d0f9bbp732f2ca9c14e7cbf@mail.gmail.com> Did anyone else had this error trying to install biopython under ubuntu 7? Instalando python-biopython (1.42-2) ... Compiling /var/lib/python-support/python2.5/Bio/Wise/dnal.py ... File "/var/lib/python-support/python2.5/Bio/Wise/dnal.py", line 5 from __future__ import division SyntaxError: from __future__ imports must occur at the beginning of the file -- "A arrog?ncia ? a arma dos fracos." =========================== Italo Moreira Campelo Maia Ci?ncia da Computa??o - UECE Desenvolvedor WEB Programador Java, Python Meu blog ^^ http://eusouolobomal.blogspot.com/ =========================== From idoerg at gmail.com Sun Jul 1 23:19:40 2007 From: idoerg at gmail.com (I. Friedberg) Date: Sun, 1 Jul 2007 16:19:40 -0700 Subject: [BioPython] Error installing biopython... In-Reply-To: <800166920707011612j64d0f9bbp732f2ca9c14e7cbf@mail.gmail.com> References: <800166920707011612j64d0f9bbp732f2ca9c14e7cbf@mail.gmail.com> Message-ID: For 7.04 there is a patch: https://bugs.launchpad.net/ubuntu/+source/python-biopython/+bug/118771 I don't understand why they did not put the bugfix version in the Ubuntu repositories though. Iddo On 7/1/07, Italo Maia wrote: > > Did anyone else had this error trying to install biopython under ubuntu 7? > > Instalando python-biopython (1.42-2) ... > Compiling /var/lib/python-support/python2.5/Bio/Wise/dnal.py ... 
> File "/var/lib/python-support/python2.5/Bio/Wise/dnal.py", line 5 > from __future__ import division > SyntaxError: from __future__ imports must occur at the beginning of the > file > > > -- > "A arrog?ncia ? a arma dos fracos." > > =========================== > Italo Moreira Campelo Maia > Ci?ncia da Computa??o - UECE > Desenvolvedor WEB > Programador Java, Python > > Meu blog ^^ http://eusouolobomal.blogspot.com/ > > =========================== > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- I. Friedberg "The only problem with troubleshooting is that sometimes trouble shoots back." From dalloliogm at gmail.com Tue Jul 3 15:24:05 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 3 Jul 2007 17:24:05 +0200 Subject: [BioPython] I don't understand why SeqRecord.feature is a list In-Reply-To: <4683CFA0.1050905@maubp.freeserve.co.uk> References: <5aa3b3570706120407x7bc29550j26bd8c7a5f4ae02b@mail.gmail.com> <920D9BCD-ADC3-4704-AA97-2AE8089F02CE@mitre.org> <466EAE8D.2090609@maubp.freeserve.co.uk> <5aa3b3570706280645s6744b6fdn2cce34abb6883155@mail.gmail.com> <4683CFA0.1050905@maubp.freeserve.co.uk> Message-ID: <5aa3b3570707030824w605ad101y8f58319d0b0cb0e5@mail.gmail.com> 2007/6/28, Peter : > Giovanni Marco Dall'Olio wrote: > > Hi! > > In principle, when I can't decide which keys to use for a dictionary, > > I just take simple numerical integers as keys, and it works quite > > well. > > It simplifies testing/debugging/organization a lot and I can decide > > the meaning of every key later (so it's better for dictionaries which > > have to contain very heterogeneous data). > > It sounds like you don't need/want a dictionary at all. If you are > assigning increasing numerical integers "keys", then why not just use > the list of features directly? 
That is not true: with a list is more complicated to add/remove elements if not in the last position. For instance, if I remove the first element in a list, all the other elements shift a position and I risk losing all the references I've made to them. Also, I don't need the sort/reverse and other operators. Moreover the cost of the insert operation into a dictionary in python is of the order of O(1) for every position, while for lists is not constant if not in the last position (sorry, I can't find a reference for this). In other languages it could seem strange to use a list to store data, because traditionally the cost of retrieving an element from a list is on the order of O(n) (this is not the case of python). Let's have a look at your example: - we have a list of features like this: list_features = ['GTAAGT', 'TACTAAC', 'TGT'] - then we specify the meaning of these features in another dictionary: splicesignal5 = list_features[0] polypirimidinetract = list_features[1] splicesignal3 = list_features[2] python passes the variables by value: this means that if you change one of the values in the list_features list, then you have to update all the variables which refer to it manually. >>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] >>> splicesignal5 = list_features[0] >>> print splicesignal5 'GTAAGT' >>> list_features[0] = 'TTTTTTT' >>> print splicesignal5 'GTAAGT' # wrong! 
>>> splicesignal5 = list_features[0] # have to update all the variables which refer to list_features manually >>> print splicesignal5' 'TTTTTTT' This is why I prefer to save the positions of the features instead of their values: >>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] >>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract' : [1], 'splicesignal3': [2]} >>> def get_feature(feature_name): return list_features[dict_aliases[feature_name]] # (this code doesn't work) However, I think it's better to save the features in a dictionary instead of a list, for the reasons I was explaining before, in this way: >>> dict_features = {0: 'GTAAGT', 1: 'TACTAAC', 2: 'TGT'} # features are in a dictionary instead of a list >>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract' : [1], 'splicesignal3': [2]} >>> def get_feature(feature_name): return(map (dict_features.get, x for x in dict_aliases[feature_name])) Another option could be to use references to memory positions instead of dictionary keys, but I don't know how to implement this in python, and I'm not sure it would be computationally convenient. > e.g. assuming record is a SeqRecord object: > > first_feature = record.features[0] > second_feature = record.features[1] > third_feature = record.features[2] > etc > > > I'm not sure I have understood the example you gave me on > > http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features > > , but it seems to work in a way similar to what I was saying before: > > it saves all the features in a list (or is it a dictionary?) and > > access them later by their positions. > > That example stored integers (indices in the features list) in a > dictionary using either the Locus tag, GI numbers or GeneID (e.g. keys > like "NEQ010", "GI:41614806" or "GeneID:2654552"). 
> > The point being if you know in advance you want to find individual > feature on the basis of their locus tag (for example), rather than the > order in the file, then I would map the locus tag strings to positions > in the list. > > e.g. > > locus_tag_cds_index = \ index_genbank_features(gb_record,"CDS","locus_tag") > my_feature = gb_record.features[locus_tag_index["NEQ010"]] uh ok.. but how is the gb_record.features dictionary structured? Which keys does it have? And what happens to these dictionaries (let's say, locus_tag_cds_index), when a feature from gb_record.features is deleted or modified? > You could also build a dictionary which maps from the locus tag directly > to the associated SeqFeature objects themselves. > > > Not to be silly but... how do you represent a gene with its > > transcripts/exons/introns structure with biopython? With SeqRecord and > > SeqFeature objects? > > If you loaded a GenBank or EMBL file using SeqIO you get one SeqRecord > object (assuming there is only one LOCUS line in the file) which > contains a list of SeqFeature objects which in turn may contain > sub-features. > > I work with bacteria so I don't have much experience with dealing with > sub-features in a SeqFeature object. I've never worked with SeqFeature and GenBank files (I have to work with GFF/GTF for annotations), but I will try to see how does it works. Thank you very much for these replies! I was really hoping to have this kind of feedback. Cheers! :) > > Peter > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From skhadar at gmail.com Wed Jul 4 02:52:15 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Wed, 4 Jul 2007 08:22:15 +0530 Subject: [BioPython] Biopython Installation on CentOS Message-ID: Dear All, We need to install Biopython and its dependencies (egenix-mx-base-2.0.6, Numeric-24.2 ) on our webserver. 
Machine Details : CentOS 4, x86_64, When we are trying to install these packages using 'python setup.py build'. It end up with the same error. "error: command 'gcc' failed with exit status 1". Please help us to solve this. Many thanks in advance, Shameer Khadar From sbassi at gmail.com Wed Jul 4 02:59:48 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 3 Jul 2007 23:59:48 -0300 Subject: [BioPython] Biopython Installation on CentOS In-Reply-To: References: Message-ID: On 7/3/07, Shameer Khadar wrote: > When we are trying to install these packages using 'python setup.py > build'. It end up with the same error. "error: command 'gcc' failed with > exit status 1". Did you install "build-essencial"? -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From sbassi at gmail.com Wed Jul 4 03:22:56 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 4 Jul 2007 00:22:56 -0300 Subject: [BioPython] Biopython Installation on CentOS In-Reply-To: References: Message-ID: On 7/4/07, Shameer Khadar wrote: > No I am not aware of this 'build essencial'. > Can you pls tell me where I can download this ? I am sorry, this package is for compiling in Ubuntu, not in CentOS. My mistake. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython at maubp.freeserve.co.uk Wed Jul 4 08:40:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 04 Jul 2007 09:40:58 +0100 Subject: [BioPython] Biopython Installation on CentOS In-Reply-To: References: Message-ID: <468B5D1A.2050800@maubp.freeserve.co.uk> Sending again, to the mailing list this time! Shameer wrote: > When we are trying to install these packages using 'python setup.py > build'. It end up with the same error. "error: command 'gcc' failed > with exit status 1". Could you find the command (run by setup.py) that caused this gcc error? i.e. 
somewhere near the end of the output from 'python setup.py build' (I'm hoping to see which part of Biopython is causing the problem) What version of python do you have installed? What version of gcc do you have installed? Do you have flex installed? Have you been able to install any other python modules from source? (e.g. some of the recommended packages or dependencies like Numeric?) Peter From biopython at maubp.freeserve.co.uk Thu Jul 5 09:33:26 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 05 Jul 2007 10:33:26 +0100 Subject: [BioPython] I don't understand why SeqRecord.feature is a list In-Reply-To: <5aa3b3570707030824w605ad101y8f58319d0b0cb0e5@mail.gmail.com> References: <5aa3b3570706120407x7bc29550j26bd8c7a5f4ae02b@mail.gmail.com> <920D9BCD-ADC3-4704-AA97-2AE8089F02CE@mitre.org> <466EAE8D.2090609@maubp.freeserve.co.uk> <5aa3b3570706280645s6744b6fdn2cce34abb6883155@mail.gmail.com> <4683CFA0.1050905@maubp.freeserve.co.uk> <5aa3b3570707030824w605ad101y8f58319d0b0cb0e5@mail.gmail.com> Message-ID: <468CBAE6.3030306@maubp.freeserve.co.uk> Giovanni Marco Dall'Olio wrote: > Let's have a look at your example: > - we have a list of features like this: > list_features = ['GTAAGT', 'TACTAAC', 'TGT'] > > - then we specify the meaning of these features in another dictionary: > splicesignal5 = list_features[0] > polypirimidinetract = list_features[1] > splicesignal3 = list_features[2] > > python passes the variables by value: this means that if you change > one of the values in the list_features list, then you have to update > all the variables which refer to it manually. > >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] >>>> splicesignal5 = list_features[0] >>>> print splicesignal5 > 'GTAAGT' >>>> list_features[0] = 'TTTTTTT' >>>> print splicesignal5 > 'GTAAGT' # wrong! 
>>>> splicesignal5 = list_features[0] # have to update all the > variables which refer to list_features manually >>>> print splicesignal5' > 'TTTTTTT' > > This is why I prefer to save the positions of the features instead of > their values: >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] >>>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract' : [1], > 'splicesignal3': [2]} >>>> def get_feature(feature_name): return > list_features[dict_aliases[feature_name]] # (this code doesn't work) ... > Another option could be to use references to memory positions instead > of dictionary keys, but I don't know how to implement this in python, > and I'm not sure it would be computationally convenient. Have you considered making "feature objects", where each object can hold multiple pieces of information such as a name, alias, type - as well as the sequence data itself. You may wish to create your own class here, or try and use the existing Biopython SeqFeature object. You could then use a list to hold your feature objects, or a dictionary keyed on the alias perhaps. Or both. e.g. class Feature : #Very simple class which could be extended def __init__(self, seq_string) : self.seq = seq_string def __repr__(self) : #Use id(self) is to show the memory location (in hex), just #to show difference between two instance with same seq return "Feature(%s) instance at %s" \ % (self.seq, hex(id(self))) list_features = [Feature('GTAAGT'), Feature('TACTAAC'), Feature('TGT')] splicesignal5 = list_features[0] print splicesignal5 print list_features[0] print "EDITING first object in the list:" list_features[0].seq = 'TTTTTTT' print splicesignal5 #changed, now TTTTTTT print list_features[0] print "REPLACING first object in the list:" list_features[0] = Feature('GGGGGG') print splicesignal5 #still points to old object, TTTTTTT print list_features[0] -- I'm not sure if that is closer to what you wanted, or not. 
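The "dictionary keyed on the alias" idea can be sketched as follows. This is only an illustration in current Python 3 syntax; the Feature class here is a cut-down, hypothetical variant of the one above with an extra name attribute, not the Biopython SeqFeature. The point is that the list and the dictionary hold the same objects, so an edit made through one is seen through the other, and removing an entry from the list does not disturb the name lookup.

```python
class Feature:
    """Very simple feature object (hypothetical, for illustration only)."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq

    def __repr__(self):
        return "Feature(%r, %r)" % (self.name, self.seq)


# The list keeps the order of the features ...
features = [
    Feature("splicesignal5", "GTAAGT"),
    Feature("polypirimidinetract", "TACTAAC"),
    Feature("splicesignal3", "TGT"),
]
# ... and the dictionary gives access by alias to the very same objects.
by_name = {feature.name: feature for feature in features}

# Editing through the dictionary is visible through the list as well.
by_name["splicesignal5"].seq = "TTTTTTT"
edited_in_list = features[0].seq

# Deleting the first list entry shifts the list positions,
# but lookups by alias are unaffected.
del features[0]
still_reachable = by_name["splicesignal5"].seq
```

Whether the dictionary should also be updated when a feature is deleted from the list is a design decision left open here.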
Peter

From skhadar at gmail.com Thu Jul 5 13:10:24 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Thu, 5 Jul 2007 18:40:24 +0530
Subject: [BioPython] Biopython Installation on CentOS
In-Reply-To: <468B5D1A.2050800@maubp.freeserve.co.uk>
References: <468B5D1A.2050800@maubp.freeserve.co.uk>
Message-ID:

Dear Bassi and Peter,

Thanks for your input.

> Could you find the command (run by setup.py) that caused this gcc
> error? i.e. somewhere near the end of the output from 'python setup.py
> build' (I'm hoping to see which part of Biopython is causing the
> problem)

Yes, it happened when I ran python setup.py.

> What version of python do you have installed?

Python-2.4.4

> What version of gcc do you have installed?

I had no gcc, but I installed it and also python-dev, and after that it worked!

> Do you have flex installed?

What is the purpose of flex?

> Have you been able to install any other python modules from source?
> (e.g. some of the recommended packages or dependencies like Numeric?)

I got the same error for all of them; after installing python-dev and gcc, they installed fine.

Many thanks,
--
Shameer Khadar
NCBS-TIFR
Bangalore

> Peter
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From jeddahbioc at yahoo.com Tue Jul 10 11:15:52 2007
From: jeddahbioc at yahoo.com (FFFFF AAAAA)
Date: Tue, 10 Jul 2007 04:15:52 -0700 (PDT)
Subject: [BioPython] filter
Message-ID: <385584.63029.qm@web43146.mail.sp1.yahoo.com>

Hi,
How can I choose the PDB files from the Richardson Top500H set with a resolution <= 1.00 Angstroms using Python scripts?
Thanks
Fawzia
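One possible way to do this without Bio.PDB is to read the resolution straight from each file's header. The sketch below assumes the usual "REMARK   2 RESOLUTION.    1.00 ANGSTROMS." header line is present; files whose headers have been modified (or that lack a numeric resolution) simply return None. The function names are made up for illustration, and the code is written for a current Python 3.

```python
import re

# Matches e.g. "REMARK   2 RESOLUTION.    1.00 ANGSTROMS."
RESOLUTION_LINE = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS"
)


def read_resolution(lines):
    """Return the resolution in Angstroms from PDB header lines, or None."""
    for line in lines:
        match = RESOLUTION_LINE.match(line)
        if match:
            return float(match.group(1))
        if line.startswith("ATOM"):
            # The header is over once the coordinates start.
            break
    return None


def select_by_resolution(paths, cutoff=1.0):
    """Return (path, resolution) pairs for files at or below the cutoff."""
    selected = []
    for path in paths:
        with open(path) as handle:
            resolution = read_resolution(handle)
        if resolution is not None and resolution <= cutoff:
            selected.append((path, resolution))
    return selected
```

select_by_resolution would be given the list of Top500 file names; anything it skips because no resolution was found is worth checking by hand.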
From biopython at maubp.freeserve.co.uk Tue Jul 10 12:03:15 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 13:03:15 +0100 Subject: [BioPython] filtering PDB files by resolution In-Reply-To: <385584.63029.qm@web43146.mail.sp1.yahoo.com> References: <385584.63029.qm@web43146.mail.sp1.yahoo.com> Message-ID: <46937583.9060908@maubp.freeserve.co.uk> FFFFF AAAAA wrote: > Hi, > How to choose pdb files from Richardson Top500H pdbs with aresolution =< 1.00 Angstroms using python scripts. > Thanks > Fawzia I've never had to look at the resolution itself, but the REMARK 2 lines in the header looks relevant. You may find it easiest to do this your self rather than using the Biopython module Bio.PDB as that focuses on the atomistic structure rather than the metadata. Alternatively maybe you could take the list of PDB identifiers and your 1.00 Angstroms, and put these into the www.pdb.org web query interface. Peter P.S. There might be something useful here: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/top500/ Incidentally, are you working with original PDB files from www.pdb.org or the modified versions from the Richardson group which had hydrogens added with reduce? From biopython at maubp.freeserve.co.uk Tue Jul 10 12:33:08 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 13:33:08 +0100 Subject: [BioPython] filtering PDB files by resolution In-Reply-To: <46937583.9060908@maubp.freeserve.co.uk> References: <385584.63029.qm@web43146.mail.sp1.yahoo.com> <46937583.9060908@maubp.freeserve.co.uk> Message-ID: <46937C84.5040409@maubp.freeserve.co.uk> Link to the "Top 500" PDB page, http://kinemage.biochem.duke.edu/databases/top500.php Peter wrote: > Alternatively maybe you could take the list of PDB identifiers and your > 1.00 Angstroms, and put these into the www.pdb.org web query interface. 
Anyway, using the web interface, here are twenty structures from the 494 unique PDB IDs on the "Richardson group's Top 500" list done by X-Ray crystallography with a resolution under 1.0 Angstroms: 1A6M 1.0 1AHO 1.0 1B0Y 0.9 1BXO 0.9 1BYI 1.0 1C75 1.0 1CEX 1.0 1EJG 0.5 1ETN 0.9 1GCI 0.8 1IXH 1.0 1LKK 1.0 1NLS 0.9 1RB9 0.9 2ERL 1.0 2FDN 0.9 2PVB 0.9 3PYP 0.9 4LZT 0.9 7A3H 0.9 I just pasted in the 494 unique PDB IDs (comma separated) and specified the x-ray resolution to be between 0.0 and 1.0 Note 1A1Y (resolution 1.05 angstroms) is now obsolete, and does not appear in the search results even if you relax the resolution limit. Also watch out for the fact that some of the PDB IDs on the Top 500 list have been replaced: 1TAX -> 1GOK 1GDO -> 1XFF 5ICB -> 1IG5 2MYR -> 1E70 Finally, I can't see how to use the web query to search for resolutions from other experimental techniques - but it looks like all of the "Top 500" were done by x-ray anyway. Do check this! Peter From mmayhew at mcb.mcgill.ca Tue Jul 10 18:08:18 2007 From: mmayhew at mcb.mcgill.ca (Michael Mayhew) Date: Tue, 10 Jul 2007 14:08:18 -0400 Subject: [BioPython] Martel Parser error... Message-ID: <4693CB12.5080207@mcb.mcgill.ca> Greetings, I am using BioPython-1.42 (also tried 1.43) on Mac OS X (10.4.8) and have successfully compiled/installed the prerequisite packages (Numeric and mxTextTools). 
I have been receiving a Martel Parser error as detailed in the following readout (from a python interactive session), when I try to use either Fasta.RecordParser() or Fasta.SequenceParser() instances: >>tester = iter.next() Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Fasta/__init__.py", line 72, in next result = self._iterator.next() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Martel/IterParser.py", line 152, in iterateFile self.header_parser.parseString(rec) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Martel/Parser.py", line 356, in parseString self._err_handler.fatalError(result) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/sax/handler.py", line 38, in fatalError raise exception Martel.Parser.ParserPositionException: error parsing at or beyond character 0 I confirmed this when I ran the included test suites (with python setup.py test). I have seen some suggestions to get the most recent CVS version of biopython to rectify this problem. How would I go about this? Is getting the most recent CVS version of biopython the only/best thing to do? Thanks in advance. Michael Mayhew From biopython at maubp.freeserve.co.uk Tue Jul 10 19:27:15 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 20:27:15 +0100 Subject: [BioPython] Martel Parser error... In-Reply-To: <4693CB12.5080207@mcb.mcgill.ca> References: <4693CB12.5080207@mcb.mcgill.ca> Message-ID: <4693DD93.9090208@maubp.freeserve.co.uk> Michael Mayhew wrote: > Greetings, > > I am using BioPython-1.42 (also tried 1.43) on Mac OS X (10.4.8) and > have successfully compiled/installed the prerequisite packages (Numeric > and mxTextTools). 
> > I have been receiving a Martel Parser error as detailed in the > following readout (from a python interactive session), when I try to use > either Fasta.RecordParser() or Fasta.SequenceParser() instances: > > >>tester = iter.next() > Traceback (most recent call last): > ... Could you give a complete stand alone example? I'm not sure what you are trying to do here... Have you looked at Bio.SeqIO (available in Biopython 1.43) instead of Bio.Fasta? http://biopython.org/wiki/SeqIO > I confirmed this when I ran the included test suites (with python > setup.py test). Could you show us the failed test result? > I have seen some suggestions to get the most recent CVS version of > biopython to rectify this problem. How would I go about this? > > Is getting the most recent CVS version of biopython the only/best > thing to do? I'm not sure what's wrong here - so its hard to say if CVS would be any better. I've not used Biopython on Mac OS X, but it should work. Peter From biopython at maubp.freeserve.co.uk Tue Jul 10 20:03:10 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 21:03:10 +0100 Subject: [BioPython] Bio.SeqIO and files with one record Message-ID: <4693E5FE.708@maubp.freeserve.co.uk> Dear Biopython people, I'd like a little feedback on the Bio.SeqIO module - in particular, one situation I think could be improved is when dealing with sequences files which contain a single record - for example a very simple Fasta file, or a chromosome in a GenBank file. http://www.biopython.org/wiki/SeqIO The shortest way to get this one record as a SeqRecord object is probably: from Bio import SeqIO record = SeqIO.parse(open("example.gbk"), "genbank").next() This works, assuming there is at least one record, but will not trigger any error if there was more than one record - something you may want to check. 
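The check described above (exactly one record, otherwise an error) could be wrapped in a small helper along these lines. This is only a sketch in current Python 3 syntax; read_single is a made-up name, and it works on any iterator of records, so nothing in it is specific to Bio.SeqIO. It treats a None return as the end of the data, as well as StopIteration, since some older iterators signal the end that way.

```python
def read_single(records):
    """Return the only record from an iterable, or raise ValueError.

    Hypothetical helper: raises if there are no records or more than one.
    """
    iterator = iter(records)
    # next(iterator, None) covers both ways of signalling the end:
    # raising StopIteration, or returning None.
    first = next(iterator, None)
    if first is None:
        raise ValueError("No records found")
    if next(iterator, None) is not None:
        raise ValueError("More than one record found")
    return first
```

With Biopython installed it would be used as record = read_single(SeqIO.parse(open("example.gbk"), "genbank")). Later Biopython releases provide Bio.SeqIO.read, which does this job.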
Do any of you think this situation is common enough to warrant adding another function to Bio.SeqIO to do this for you (raising errors for no records or more than one record)?

My suggestions for possible names include parse_single, parse_one, parse_sole, parse_individual and mono_parse.

One way to do this inline would be:

from Bio import SeqIO
temp_list = list(SeqIO.parse(open("example.gbk"), "genbank"))
assert len(temp_list) == 1
record = temp_list[0]
del temp_list

Or perhaps:

from Bio import SeqIO
# Keep the iterator itself (no list() here), so that next() can be called on it
temp_iter = SeqIO.parse(open("example.gbk"), "genbank")
record = temp_iter.next()
try :
    assert temp_iter.next() is None
except StopIteration :
    pass
del temp_iter

The above code copes with the fact that in general some iterators may signal the end by raising a StopIteration exception, or by returning None.

Peter

P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html

From mmayhew at MCB.McGill.CA Wed Jul 11 00:12:22 2007
From: mmayhew at MCB.McGill.CA (mmayhew at MCB.McGill.CA)
Date: Tue, 10 Jul 2007 20:12:22 -0400 (EDT)
Subject: [BioPython] Martel Parser error...
In-Reply-To: <4693DD93.9090208@maubp.freeserve.co.uk>
References: <4693CB12.5080207@mcb.mcgill.ca> <4693DD93.9090208@maubp.freeserve.co.uk>
Message-ID: <3620.24.200.95.226.1184112742.squirrel@mail.mcb.mcgill.ca>

I tested a sample file with a single FASTA record at home on my Windows XP machine and got the following results, using Bio.SeqIO first and then Bio.Fasta (with Fasta.Iterator and Fasta.RecordParser() instances). It looks like the same error on a completely different OS (I received the exact same error with Mac OS X v10.4 and Biopython 1.42). Since Bio.SeqIO works fine (thank you for the recommendation) I will use that, but the Bio.Fasta error may be worth looking into.
>>> from Bio import SeqIO >>> handle = open('test.txt', 'r') >>> for record in SeqIO.parse(handle, "fasta"): print record.id hg18_knownGene_uc001hsx.1 >>> record.seq Seq('ATGGCCCGAACCAAGCAGACTGCGCGCAAGTCAACGGGTGGCAAGGCGCCGCGCAAGCAGCTGGCCACCAAGGTGGCTCGCAAGAGCGCACCTGCCACTGGCGGCGTGAAGAAGCCGCACCGCTACCGGCCCGGCACGGTGGCGCTTCGCGAGATCCGCCGCTACCAGAAGTCCACTGAGCTGCTAATCCGCAAGTTGCCCTTCCAGCGGCTGATGCGCGAGATCGCTCAGGACTTTAAGACCGACCTGCGCTTCCAGAGCTCGGCCGTGATGGCGCTGCAGGAGGCGTGCGAGTCTTACCTGGTGGGGCTGTTTGAGGACACCAACCTGTGTGTCATCCATGCCAAACGGGTCACCATCATGCCTAAGGACATCCAGCTGGCACGCCGTATCCGCGGGGAGCGGGCCTAG', SingleLetterAlphabet()) >>> record.seq.tostring() 'ATGGCCCGAACCAAGCAGACTGCGCGCAAGTCAACGGGTGGCAAGGCGCCGCGCAAGCAGCTGGCCACCAAGGTGGCTCGCAAGAGCGCACCTGCCACTGGCGGCGTGAAGAAGCCGCACCGCTACCGGCCCGGCACGGTGGCGCTTCGCGAGATCCGCCGCTACCAGAAGTCCACTGAGCTGCTAATCCGCAAGTTGCCCTTCCAGCGGCTGATGCGCGAGATCGCTCAGGACTTTAAGACCGACCTGCGCTTCCAGAGCTCGGCCGTGATGGCGCTGCAGGAGGCGTGCGAGTCTTACCTGGTGGGGCTGTTTGAGGACACCAACCTGTGTGTCATCCATGCCAAACGGGTCACCATCATGCCTAAGGACATCCAGCTGGCACGCCGTATCCGCGGGGAGCGGGCCTAG' >>> handle.close() >>> from Bio import Fasta >>> handle = open('test.txt', 'r') >>> it = Fasta.Iterator(handle, Fasta.RecordParser()) >>> seq = it.next() Traceback (most recent call last): File "", line 1, in -toplevel- seq = it.next() File "C:\Python24\Lib\site-packages\Bio\Fasta\__init__.py", line 72, in next result = self._iterator.next() File "C:\Python24\Lib\site-packages\Martel\IterParser.py", line 152, in iterateFile self.header_parser.parseString(rec) File "C:\Python24\Lib\site-packages\Martel\Parser.py", line 356, in parseString self._err_handler.fatalError(result) File "C:\Python24\lib\xml\sax\handler.py", line 38, in fatalError raise exception ParserPositionException: error parsing at or beyond character 0 > Michael Mayhew wrote: >> Greetings, >> >> I am using BioPython-1.42 (also tried 1.43) on Mac OS X (10.4.8) and >> have successfully compiled/installed the prerequisite packages (Numeric >> and mxTextTools). 
>> >> I have been receiving a Martel Parser error as detailed in the >> following readout (from a python interactive session), when I try to use >> either Fasta.RecordParser() or Fasta.SequenceParser() instances: >> >> >>tester = iter.next() >> Traceback (most recent call last): > > ... > > Could you give a complete stand alone example? I'm not sure what you are > trying to do here... > > Have you looked at Bio.SeqIO (available in Biopython 1.43) instead of > Bio.Fasta? > > http://biopython.org/wiki/SeqIO > >> I confirmed this when I ran the included test suites (with python >> setup.py test). > > Could you show us the failed test result? > >> I have seen some suggestions to get the most recent CVS version of >> biopython to rectify this problem. How would I go about this? >> >> Is getting the most recent CVS version of biopython the only/best >> thing to do? > > I'm not sure what's wrong here - so its hard to say if CVS would be any > better. I've not used Biopython on Mac OS X, but it should work. > > Peter > From fahy at chapman.edu Wed Jul 11 01:36:19 2007 From: fahy at chapman.edu (Michael Fahy) Date: Tue, 10 Jul 2007 18:36:19 -0700 Subject: [BioPython] Transcription Message-ID: <000401c7c35b$e1adea20$a509be60$@edu> I just showed the BioPython tutorial to some of our Biology and Chemistry faculty. They pointed out that all the "Transcribe" function does is replace each occurrence of "T" in the sequence with a "U". The biologists said that that is not what they mean by transcription. They felt that each nucleotide should have been replaced by the complementary nucleotide, and that the resulting string should have been reversed. This, they said, would be concordant with the way in which biologists use the term "transcribe'. It would not be hard to do, so why does BioPython do what it does and call it transcription? 
Michael Fahy Mathematics and Computer Science Chapman University One University Drive Orange, CA 92866 (714) 997-6879 fahy at chapman.edu From mdehoon at c2b2.columbia.edu Wed Jul 11 03:59:11 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 10 Jul 2007 23:59:11 -0400 Subject: [BioPython] Transcription References: <000401c7c35b$e1adea20$a509be60$@edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> It all depends on how you interpret the sequence that you give to the transcribe function, and for that matter, to the translate function. For translation, virtually all biological publications show the non-template strand. Hence, the sequence given to the translate function in Biopython is also interpreted as the non-template strand. For consistency, the sequence given to the transcribe function in Biopython is also taken to be the non-template strand: >>> from Bio.Seq import * >>> s = "ATGGTATAA" >>> translate(s) 'MV*' >>> transcribe(s) 'AUGGUAUAA' >>> translate(_) 'MV*' If you want to have the reverse complement, use the reverse_complement function. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Michael Fahy Sent: Tue 7/10/2007 9:36 PM To: BioPython at lists.open-bio.org Subject: [BioPython] Transcription I just showed the BioPython tutorial to some of our Biology and Chemistry faculty. They pointed out that all the "Transcribe" function does is replace each occurrence of "T" in the sequence with a "U". The biologists said that that is not what they mean by transcription. They felt that each nucleotide should have been replaced by the complementary nucleotide, and that the resulting string should have been reversed. This, they said, would be concordant with the way in which biologists use the term "transcribe'. 
It would not be hard to do, so why does BioPython do what it does and call it transcription? Michael Fahy Mathematics and Computer Science Chapman University One University Drive Orange, CA 92866 (714) 997-6879 fahy at chapman.edu _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From aloraine at gmail.com Wed Jul 11 04:14:50 2007 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 10 Jul 2007 23:14:50 -0500 Subject: [BioPython] Transcription In-Reply-To: <000401c7c35b$e1adea20$a509be60$@edu> References: <000401c7c35b$e1adea20$a509be60$@edu> Message-ID: <83722dde0707102114w5d81f9dr7d3701afadd13ed8@mail.gmail.com> Hello, I guess the "translate" functions are more useful. I've used biopython tools many times to translate nucleotide sequences into proteins, using different genetic codes. It is a very useful feature, often the first step in series of computations. Maybe the audience was responding to how we programmers like to represent biological sequences as character strings? DNA, the molecule, is double-stranded, so it might be more proper to model it as a pair of strings. But this would be wasteful of space, since one string is all you need to capture the sequence of both strands. They are right that the antisense strand is used as the template for RNA synthesis (transcription), but I'm not sure if it is proper to say that one strand or the other is being transcribed. Maybe in future you could say something like: biopython sequence objects have a string that represents a sequence of nucleotides, and when you call a transcribe method, the method assumes that this string also represents the sense strand of a double-stranded DNA molecule. Best wishes, Ann On 7/10/07, Michael Fahy wrote: > I just showed the BioPython tutorial to some of our Biology and Chemistry > faculty. 
They pointed out that all the "Transcribe" function does is > replace each occurrence of "T" in the sequence with a "U". The biologists > said that that is not what they mean by transcription. They felt that each > nucleotide should have been replaced by the complementary nucleotide, and > that the resulting string should have been reversed. > > This, they said, would be concordant with the way in which biologists use > the term "transcribe'. It would not be hard to do, so why does BioPython do > what it does and call it transcription? > > > > > > Michael Fahy > > Mathematics and Computer Science > > Chapman University > > One University Drive > > Orange, CA 92866 > > (714) 997-6879 > > fahy at chapman.edu > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Ann Loraine Assistant Professor University of Alabama at Birmingham http://www.transvar.org 205-996-4155 From biopython at maubp.freeserve.co.uk Wed Jul 11 08:03:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Jul 2007 09:03:58 +0100 Subject: [BioPython] Martel Parser error... In-Reply-To: <3620.24.200.95.226.1184112742.squirrel@mail.mcb.mcgill.ca> References: <4693CB12.5080207@mcb.mcgill.ca> <4693DD93.9090208@maubp.freeserve.co.uk> <3620.24.200.95.226.1184112742.squirrel@mail.mcb.mcgill.ca> Message-ID: <46948EEE.1050908@maubp.freeserve.co.uk> mmayhew at MCB.McGill.CA wrote: > Tested a sample file with a single FASTA record at home with my Windows XP > machine and produced the following set of results using Bio.SeqIO first > and then Bio.Fasta (with Fasta.Iterator and Fasta.RecordParser() > instances). > Looks like the same error, with a completely different OS (received exact > same error with Mac OSX v10.4 & biopython 1.42). 
Since Bio.SeqIO works > fine (thank you for the reccomendation) I will use that, but the Bio.Fasta > error may potentially be an error to look into. Your example worked fine for me on Linux, I haven't tried on Windows XP yet. While you get the same error on both Windows XP and Mac OS X? My only suggestion right now is to check your line endings (CR versus CRLF versus LF) are appropriate for the platform, and to try: handle = open('test.txt', 'rU') Also, what do these give? print repr(open('test.txt', 'r').read()) print repr(open('test.txt', 'rU').read()) I'm looking for any difference in the new lines (e.g. \n versus \n\r) Maybe Michiel can suggest something as I believe he sometimes uses Biopython on Mac OS X. On the bright side, Bio.SeqIO works for you (and we now recommend using that instead of Bio.Fasta) but I would still like to sort this out. Peter From kosa at genesilico.pl Wed Jul 11 08:24:11 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Wed, 11 Jul 2007 10:24:11 +0200 Subject: [BioPython] Bio.SeqIO and files with one record Message-ID: <469493AB.3020504@genesilico.pl> Hi, Do I understand correctly that the function is to return a record instead of a parser? If yes I think it could be useful. parse_single sounds good. Cheers, Jan Kosinski > Message: 7 > Date: Tue, 10 Jul 2007 21:03:10 +0100 > From: Peter > Subject: [BioPython] Bio.SeqIO and files with one record > To: biopython at lists.open-bio.org > Message-ID: <4693E5FE.708 at maubp.freeserve.co.uk> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Dear Biopython people, > > I'd like a little feedback on the Bio.SeqIO module - in particular, one > situation I think could be improved is when dealing with sequences files > which contain a single record - for example a very simple Fasta file, or > a chromosome in a GenBank file. 
> > http://www.biopython.org/wiki/SeqIO > > The shortest way to get this one record as a SeqRecord object is probably: > > from Bio import SeqIO > record = SeqIO.parse(open("example.gbk"), "genbank").next() > > This works, assuming there is at least one record, but will not trigger > any error if there was more than one record - something you may want to > check. > > Do any of you think this situation is common enough to warrant adding > another function to Bio.SeqIO to do this for you (raising errors for no > records or more than one record). My suggestions for possible names > include parse_single, parse_one, parse_sole, parse_individual and mono_parse > > One way to do this inline would be: > > from Bio import SeqIO > temp_list = list(SeqIO.parse(open("example.gbk"), "genbank")) > assert len(temp_list) == 1 > record = temp_list[0] > del temp_list > > Or perhaps: > > from Bio import SeqIO > temp_iter = list(SeqIO.parse(open("example.gbk"), "genbank")) > record = temp_iter.next() > try : > assert temp_iter.next() is None > except StopIteration : > pass > del temp_iter > > The above code copes with the fact that in general some iterators may > signal the end by raising a StopIteration except, or by returning None. > > Peter > > P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007? > http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html > > > > :. From biopython at maubp.freeserve.co.uk Wed Jul 11 09:32:13 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Jul 2007 10:32:13 +0100 Subject: [BioPython] Bio.SeqIO and files with one record In-Reply-To: <469493AB.3020504@genesilico.pl> References: <469493AB.3020504@genesilico.pl> Message-ID: <4694A39D.4010805@maubp.freeserve.co.uk> Jan Kosinski wrote: > Hi, > > Do I understand correctly that the function is to return a record > instead of a parser? If yes I think it could be useful. parse_single > sounds good. Yes, sorry if I wasn't clear. 
Bio.SeqIO.parse(handle, format) would still return an iterator giving SeqRecord objects.

The suggested function (possibly called) Bio.SeqIO.parse_single(handle, format) would return a single SeqRecord object if the file contains one and only one record. It would raise exceptions for no records, or more than one record.

e.g.

    from Bio import SeqIO
    handle = open('example.gbk')
    record = SeqIO.parse_single(handle, 'genbank')

or,

    from Bio import SeqIO
    record = SeqIO.parse_single(open('example.faa'), 'fasta')

As I said, I sometimes find myself wanting to do this - for example single query BLAST files in fasta format, or bacterial genomes in GenBank format.

The question is, is this worth adding to the interface or is this a relatively rare need?

Peter

From kosa at genesilico.pl Wed Jul 11 09:58:27 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Wed, 11 Jul 2007 11:58:27 +0200
Subject: [BioPython] Bio.SeqIO and files with one record
In-Reply-To: <4694A39D.4010805@maubp.freeserve.co.uk>
References: <469493AB.3020504@genesilico.pl> <4694A39D.4010805@maubp.freeserve.co.uk>
Message-ID: <4694A9C3.2030707@genesilico.pl>

I think it is rare that a python program must be sure that no other sequences are in the file it reads. If a program reads a file with a single sequence using SeqIO.parse and refers to that sequence by something like "records[0]", there is no need to check whether there were other sequences in the file, unless the program was designed to notify the user that perhaps he made an error by providing a file with multiple sequences.

Actually, although parse_single would be useful, such a function adds little new value to biopython, and adding it could certainly be postponed to the time when you really don't know what more to add to biopython ;-) Yet, I think it is worth adding if it does not require much work and would not mess up the interface at all.
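A minimal sketch of the "one and only one record" check discussed in this thread, written in plain Python so that it does not depend on any particular Biopython release. The helper name single_item is hypothetical (it is not part of Bio.SeqIO), and the code uses Python 3 style next() rather than the .next() method used in the messages above. It assumes only that the parser behaves like an ordinary iterator, possibly one of the older kind that returns None at the end instead of raising StopIteration.

```python
def single_item(iterable):
    """Return the only item in iterable; raise ValueError otherwise.

    Hypothetical helper: it mirrors the behaviour proposed for
    parse_single in this thread, but works on any iterable.
    """
    iterator = iter(iterable)

    # No first item at all means the input held no records.
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("no records found")
    if first is None:
        # Older iterators may return None instead of raising StopIteration.
        raise ValueError("no records found")

    # A second item means the input held more than one record.
    try:
        second = next(iterator)
    except StopIteration:
        second = None
    if second is not None:
        raise ValueError("more than one record found")

    return first


# With Biopython installed this could be used as (assumption, not run here):
#     record = single_item(SeqIO.parse(open("example.gbk"), "genbank"))
print(single_item(["only record"]))
```

Raising an exception rather than using assert keeps the check active even when Python is run with optimisation switched on, which strips assert statements.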
Janek Peter wrote: > > > The question is, is this worth adding to the interface or is this a > relatively rare need? :. From mmokrejs at ribosome.natur.cuni.cz Wed Jul 11 10:32:36 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Wed, 11 Jul 2007 12:32:36 +0200 Subject: [BioPython] Bio.SeqIO and files with one record In-Reply-To: <4694A39D.4010805@maubp.freeserve.co.uk> References: <469493AB.3020504@genesilico.pl> <4694A39D.4010805@maubp.freeserve.co.uk> Message-ID: <4694B1C4.3010609@ribosome.natur.cuni.cz> Hi, Peter wrote: > Jan Kosinski wrote: >> Hi, >> >> Do I understand correctly that the function is to return a record >> instead of a parser? If yes I think it could be useful. parse_single >> sounds good. > > Yes, sorry if I wasn't clear. > > Bio.SeqIO.parse(handle, format) would still return an iterator giving > SeqRecord objects. > > The suggested function (possibly called) Bio.SeqIO.parse_single(handle, > format) would return a single SeqRecord object if the file contains one > and only one record. It would raise exceptions for no records, or more > than one record. > > e.g. > > from Bio import SeqIO > handle = open('example.gbk') > record = Bio.SeqIO.parse_single(handle, genbank') > > or, > > from Bio import SeqIO > record = Bio.SeqIO.parse_single(open('example.faa'), 'fasta') I think it does make sense, but call it parse_the_only_one() to make it clear, it does not pick up just the very first record from the many. > > As I said, I sometimes find myself wanting to do this - for example > single query BLAST files in fasta format, or bacterial genomes in > GenBank format. > > The question is, is this worth adding to the interface or is this a > relatively rare need? Once people learn to wrap the iterator in a loop it is not necessary, but I think if you have the time to do this ... 
;-)
Martin

From cjfields at uiuc.edu Wed Jul 11 11:58:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Jul 2007 06:58:50 -0500
Subject: [BioPython] Bio.SeqIO and files with one record
In-Reply-To: <4694B1C4.3010609@ribosome.natur.cuni.cz>
References: <469493AB.3020504@genesilico.pl> <4694A39D.4010805@maubp.freeserve.co.uk> <4694B1C4.3010609@ribosome.natur.cuni.cz>
Message-ID:

On Jul 11, 2007, at 5:32 AM, Martin MOKREJŠ wrote:

> ...
> Once people learn to wrap the iterator in a loop it is not
> necessary, but I think
> if you have the time to do this ... ;-)
> Martin

Wrapping the iterator in a while loop works in bioperl because the loop only runs while the expression evaluates to true (and assigning undef evaluates as false). It's a common bioperl idiom to do something like:

    my $seqio = Bio::SeqIO->new(-format => 'genbank',
                                -file   => 'myfile.gb');

    while (my $seq = $seqio->next_seq) {
        # do stuff here ....
    }

chris

From dalloliogm at gmail.com Thu Jul 12 15:00:03 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 12 Jul 2007 17:00:03 +0200
Subject: [BioPython] I don't understand why SeqRecord.feature is a list
In-Reply-To: <468CBAE6.3030306@maubp.freeserve.co.uk>
References: <5aa3b3570706120407x7bc29550j26bd8c7a5f4ae02b@mail.gmail.com>
	<920D9BCD-ADC3-4704-AA97-2AE8089F02CE@mitre.org>
	<466EAE8D.2090609@maubp.freeserve.co.uk>
	<5aa3b3570706280645s6744b6fdn2cce34abb6883155@mail.gmail.com>
	<4683CFA0.1050905@maubp.freeserve.co.uk>
	<5aa3b3570707030824w605ad101y8f58319d0b0cb0e5@mail.gmail.com>
	<468CBAE6.3030306@maubp.freeserve.co.uk>
Message-ID: <5aa3b3570707120800g1ed2c8f1t73f117f61cab0874@mail.gmail.com>

Yes, it's true, it is something similar to the way SeqFeature should work. But I still just don't get how to represent my genes in biopython :(

You know, I've printed the Bio module UML scheme from here:
http://www.pasteur.fr/recherche/unites/sis/formation/python/images/seq_class.png
and put it on the wall above the monitor of my computer like a poster.
So everyday, when I come at work, I see the Bio module UML scheme and ask myself why SeqRecord.features is a list instead of a dictionary :) 2007/7/5, Peter : > Giovanni Marco Dall'Olio wrote: > > Let's have a look at your example: > > - we have a list of features like this: > > list_features = ['GTAAGT', 'TACTAAC', 'TGT'] > > > > - then we specify the meaning of these features in another dictionary: > > splicesignal5 = list_features[0] > > polypirimidinetract = list_features[1] > > splicesignal3 = list_features[2] > > > > python passes the variables by value: this means that if you change > > one of the values in the list_features list, then you have to update > > all the variables which refer to it manually. > > > >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] > >>>> splicesignal5 = list_features[0] > >>>> print splicesignal5 > > 'GTAAGT' > >>>> list_features[0] = 'TTTTTTT' > >>>> print splicesignal5 > > 'GTAAGT' # wrong! > >>>> splicesignal5 = list_features[0] # have to update all the > > variables which refer to list_features manually > >>>> print splicesignal5' > > 'TTTTTTT' > > > > This is why I prefer to save the positions of the features instead of > > their values: > >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT'] > >>>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract' : [1], > > 'splicesignal3': [2]} > >>>> def get_feature(feature_name): return > > list_features[dict_aliases[feature_name]] # (this code doesn't work) > > ... > > > Another option could be to use references to memory positions instead > > of dictionary keys, but I don't know how to implement this in python, > > and I'm not sure it would be computationally convenient. > > Have you considered making "feature objects", where each object can hold > multiple pieces of information such as a name, alias, type - as well as > the sequence data itself. You may wish to create your own class here, or > try and use the existing Biopython SeqFeature object. 
> > You could then use a list to hold your feature objects, or a dictionary > keyed on the alias perhaps. Or both. > > e.g. > > class Feature : > #Very simple class which could be extended > def __init__(self, seq_string) : > self.seq = seq_string > > def __repr__(self) : > #Use id(self) is to show the memory location (in hex), just > #to show difference between two instance with same seq > return "Feature(%s) instance at %s" \ > % (self.seq, hex(id(self))) > > > list_features = [Feature('GTAAGT'), > Feature('TACTAAC'), > Feature('TGT')] > > splicesignal5 = list_features[0] > print splicesignal5 > print list_features[0] > > print "EDITING first object in the list:" > list_features[0].seq = 'TTTTTTT' > > print splicesignal5 #changed, now TTTTTTT > print list_features[0] > > print "REPLACING first object in the list:" > list_features[0] = Feature('GGGGGG') > > print splicesignal5 #still points to old object, TTTTTTT > print list_features[0] > > -- > > I'm not sure if that is closer to what you wanted, or not. 
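The "list, or a dictionary keyed on the alias, or both" suggestion in the quoted reply can be sketched as follows. This is only an illustration under assumptions: the Feature class is the very simple one from the quoted message, the alias names are the ones used earlier in the thread, and nothing here is the Biopython SeqFeature API. The point it shows is that the list and the dictionary hold references to the same objects, so an in-place edit is visible through both, while replacing a list slot is not.

```python
class Feature(object):
    """Very simple feature holder, as in the quoted message."""

    def __init__(self, seq_string):
        self.seq = seq_string


# Ordered storage: a plain list of feature objects.
list_features = [Feature("GTAAGT"), Feature("TACTAAC"), Feature("TGT")]

# Named access: aliases point at the same objects, not at list positions.
dict_aliases = {
    "splicesignal5": list_features[0],
    "polypirimidinetract": list_features[1],
    "splicesignal3": list_features[2],
}

# Editing an object in place is seen through both containers.
list_features[0].seq = "TTTTTTT"
edited_via_alias = dict_aliases["splicesignal5"].seq

# Replacing the list slot rebinds only the list; the alias keeps the old object.
list_features[0] = Feature("GGGGGG")
after_replace_via_alias = dict_aliases["splicesignal5"].seq
after_replace_via_list = list_features[0].seq

print(edited_via_alias)
print(after_replace_via_alias)
print(after_replace_via_list)
```

Keying the dictionary on the objects themselves, rather than on list positions, also means that removing an element from the middle of the list does not invalidate the aliases.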
> > Peter > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From biopython at maubp.freeserve.co.uk Sat Jul 14 10:18:12 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Jul 2007 11:18:12 +0100 Subject: [BioPython] Problem with blastx output parsing =~ In-Reply-To: <46704A00.9010409@maubp.freeserve.co.uk> References: <800166920706040936w4de744acn8cefe445a6284f72@mail.gmail.com> <46644664.6080009@maubp.freeserve.co.uk> <800166920706041022u5fafc308h71bdcaa11acfade1@mail.gmail.com> <46644ED2.1080505@maubp.freeserve.co.uk> <4665408E.2090306@maubp.freeserve.co.uk> <466FC73C.2000608@maubp.freeserve.co.uk> <800166920706131028o4eb5ea6eqa92e3f0634ea7748@mail.gmail.com> <46704A00.9010409@maubp.freeserve.co.uk> Message-ID: <4698A2E4.4010208@maubp.freeserve.co.uk> Hi Italo, I haven't heard anything back from you - did you get all your 24,000 plain text blast output files to work with Biopython? Thanks Peter Related bug 2090 http://bugzilla.open-bio.org/show_bug.cgi?id=2090 From italo.maia at gmail.com Sat Jul 14 14:23:36 2007 From: italo.maia at gmail.com (Italo Maia) Date: Sat, 14 Jul 2007 11:23:36 -0300 Subject: [BioPython] Problem with blastx output parsing =~ In-Reply-To: <4698A2E4.4010208@maubp.freeserve.co.uk> References: <800166920706040936w4de744acn8cefe445a6284f72@mail.gmail.com> <46644664.6080009@maubp.freeserve.co.uk> <800166920706041022u5fafc308h71bdcaa11acfade1@mail.gmail.com> <46644ED2.1080505@maubp.freeserve.co.uk> <4665408E.2090306@maubp.freeserve.co.uk> <466FC73C.2000608@maubp.freeserve.co.uk> <800166920706131028o4eb5ea6eqa92e3f0634ea7748@mail.gmail.com> <46704A00.9010409@maubp.freeserve.co.uk> <4698A2E4.4010208@maubp.freeserve.co.uk> Message-ID: <800166920707140723pa8540bbs3e3b83a62525a7de@mail.gmail.com> Wow Peter, hi! Well, to tell you the truth, i was told to do something else here at work, so i left those blast output aside. 
But, i'll try to parse some blast outputs after tomorrow, and give you some feedback. ^_^ 2007/7/14, Peter : > > Hi Italo, > > I haven't heard anything back from you - did you get all your 24,000 > plain text blast output files to work with Biopython? > > Thanks > > Peter > > Related bug 2090 > http://bugzilla.open-bio.org/show_bug.cgi?id=2090 > > -- "A arrog?ncia ? a arma dos fracos." =========================== Italo Moreira Campelo Maia Ci?ncia da Computa??o - UECE Desenvolvedor WEB Programador Java, Python Meu blog ^^ http://eusouolobomal.blogspot.com/ =========================== From douglas.kojetin at gmail.com Sat Jul 14 17:03:21 2007 From: douglas.kojetin at gmail.com (Douglas Kojetin) Date: Sat, 14 Jul 2007 13:03:21 -0400 Subject: [BioPython] Bio.PDB phi/psi angles Message-ID: <5FF6C50F-AFAE-4922-BA8D-98CAD917777B@gmail.com> Hi All, I would like to print the phi/psi angles for a structure using the script found here: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/ python/ramachandran/calculate/ but the script chokes on the following line: phi_psi = poly.get_phi_psi_list() error output: Traceback (most recent call last): File "./ramachandran_biopython.py", line 62, in phi_psi = poly.get_phi_psi_list() File "/sw/lib/python2.5/site-packages/Bio/PDB/Polypeptide.py", line 169, in get_phi_psi_list res.xtra["PHI"]=phi NameError: global name 'res' is not defined Does anyone know what I can do to overcome this problem? Thanks, Doug From orlando.doehring at googlemail.com Sat Jul 14 17:59:44 2007 From: orlando.doehring at googlemail.com (=?ISO-8859-1?Q?Orlando_D=F6hring?=) Date: Sat, 14 Jul 2007 19:59:44 +0200 Subject: [BioPython] HETATM records retrieval Message-ID: <8cc339d80707141059r2ad5f0ah68c6d1cbb0deebf2@mail.gmail.com> Dear community, how should HETATM records be retrieved via Biopython? 
I assume it should be somewhere on the chain or residue level: - http://www.biopython.org/DIST/docs/api/public/Bio.PDB.Residue.Residue-class.html - http://biopython.org/DIST/docs/api/private/Bio.PDB.Chain.Chain-class.html Using the following basic sample code : for model in self.structure.get_iterator(): for chain in model.get_iterator(): print chain.__repr__() for residue in chain.get_iterator(): print residue.__repr__() applied to protein 1DHR (http://www.pdb.org/pdb/files/1dhr.pdb) we get: ... As one can see that are all ATOM records. Thanks. Yours, Orlando From biopython at maubp.freeserve.co.uk Sat Jul 14 18:07:55 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Jul 2007 19:07:55 +0100 Subject: [BioPython] Bio.PDB phi/psi angles In-Reply-To: <5FF6C50F-AFAE-4922-BA8D-98CAD917777B@gmail.com> References: <5FF6C50F-AFAE-4922-BA8D-98CAD917777B@gmail.com> Message-ID: <469910FB.40601@maubp.freeserve.co.uk> Douglas Kojetin wrote: > Hi All, > > I would like to print the phi/psi angles for a structure using the > script found here: > > http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/ > python/ramachandran/calculate/ > > but the script chokes on the following line: > > phi_psi = poly.get_phi_psi_list() > > error output: > > Traceback (most recent call last): > File "./ramachandran_biopython.py", line 62, in > phi_psi = poly.get_phi_psi_list() > File "/sw/lib/python2.5/site-packages/Bio/PDB/Polypeptide.py", > line 169, in get_phi_psi_list > res.xtra["PHI"]=phi > NameError: global name 'res' is not defined > > Does anyone know what I can do to overcome this problem? 
It's not your fault, it looks like an error has crept into Bio/PDB/Polypeptide.py with revision 1.32 (shipped with biopython 1.43), which I think I've just fixed with CVS revision 1.33.

You can grab the file from here once the webpage has updated:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/PDB/Polypeptide.py?cvsroot=biopython

Just back up and replace the existing file here:
/sw/lib/python2.5/site-packages/Bio/PDB/Polypeptide.py

Or, you could downgrade to biopython 1.42 ;)

Peter

From biopython at maubp.freeserve.co.uk Sat Jul 14 18:23:07 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 14 Jul 2007 19:23:07 +0100
Subject: [BioPython] HETATM records retrieval
In-Reply-To: <8cc339d80707141059r2ad5f0ah68c6d1cbb0deebf2@mail.gmail.com>
References: <8cc339d80707141059r2ad5f0ah68c6d1cbb0deebf2@mail.gmail.com>
Message-ID: <4699148B.2050500@maubp.freeserve.co.uk>

Orlando Döhring wrote:
> Dear community,
>
> how should HETATM records be retrieved via Biopython? I assume it should be
> somewhere on the chain or residue level

In the PDB file you used, 1DHR, all the HETATM records are for solvents (NAD = NICOTINAMIDE ADENINE DINUCLEOTIDE and HOH = water) so they don't appear as part of the protein chains. I haven't looked at this recently so it's not fresh in my mind.

When there are HETATM entries within a protein (e.g. alternative amino acids) then they should be part of the chain.

> Using the following basic sample code :
>
> for model in self.structure.get_iterator():
>     for chain in model.get_iterator():
>         print chain.__repr__()
>         for residue in chain.get_iterator():
>             print residue.__repr__()

You don't need to explicitly call the get_iterator() method, so I much prefer this style myself:

    structure = ...
    for model in structure:
        for chain in model:
            print repr(chain)
            for residue in chain:
                print repr(residue)

I've also used the repr() function rather than the hidden __repr__ of the object; it's the same end result but I find this clearer.
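As a cross-check that is independent of Bio.PDB, the hetero residues in a PDB file can also be listed directly from the fixed-column text. The sketch below is only an illustration under assumptions: the helper names make_line and hetero_residues are hypothetical, the sample lines are made up (they are not taken from 1DHR), and only the record name, residue name, chain and residue number columns of the PDB format are used. Within Bio.PDB itself, as far as the Bio.PDB FAQ describes it, the first element of a residue's id is a hetero flag (blank for standard residues, 'W' for water, 'H_' plus the residue name otherwise); the code here does not rely on that.

```python
def make_line(record, serial, atom, resname, chain, resseq):
    """Build the first 26 columns of a PDB coordinate line (sample data only)."""
    # Columns: 1-6 record, 7-11 serial, 13-16 atom, 18-20 resName,
    # 22 chainID, 23-26 resSeq (1-based, as in the PDB format description).
    return "{:<6}{:>5} {:<4} {:>3} {:1}{:>4}".format(
        record, serial, atom, resname, chain, resseq)


def hetero_residues(lines):
    """Return (resname, chain, resseq) for each distinct HETATM residue, in order."""
    found = []
    for line in lines:
        if line[0:6] != "HETATM":
            continue
        key = (line[17:20].strip(), line[21], int(line[22:26]))
        if key not in found:
            found.append(key)
    return found


# Made-up sample: one protein atom, two atoms of a cofactor, one water.
sample_lines = [
    make_line("ATOM", 1, "N", "MET", "A", 1),
    make_line("HETATM", 2, "PA", "NAD", "A", 300),
    make_line("HETATM", 3, "O2B", "NAD", "A", 300),
    make_line("HETATM", 4, "O", "HOH", "A", 401),
]

for resname, chain, resseq in hetero_residues(sample_lines):
    print("%s %s %d" % (resname, chain, resseq))
```

Comparing such a list with what the Bio.PDB hierarchy reports is a quick way to see whether any hetero residues are missing from the parsed structure.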
Have you read the example on this page? In particular the use of the PPBuilder or CaPPBuilder classes: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/#BioPython I also urge you to look at the author's (Thomas Hamelryck) documentation here: http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf This is much more useful than the automatic API documentation you linked to. Peter From O.Doehring at cs.ucl.ac.uk Sat Jul 14 18:38:22 2007 From: O.Doehring at cs.ucl.ac.uk (O.Doehring at cs.ucl.ac.uk) Date: 14 Jul 2007 19:38:22 +0100 Subject: [BioPython] HETATM records retrieval Message-ID: Dear community, how should HETATM records be retrieved via Biopython? I assume it should be somewhere on the chain or residue level: - http://www.biopython.org/DIST/docs/api/public/Bio.PDB.Residue.Residue-class.html - http://biopython.org/DIST/docs/api/private/Bio.PDB.Chain.Chain-class.html Using the following basic sample code : for model in self.structure.get_iterator(): for chain in model.get_iterator(): print chain.__repr__() for residue in chain.get_iterator(): print residue.__repr__() applied to protein 1DHR (http://www.pdb.org/pdb/files/1dhr.pdb) we get: ... As one can see that are all ATOM records. Thanks. Yours, Orlando From fahy at chapman.edu Mon Jul 16 06:41:36 2007 From: fahy at chapman.edu (Michael Fahy) Date: Sun, 15 Jul 2007 23:41:36 -0700 Subject: [BioPython] PHYLIP Message-ID: <002401c7c774$5b9a2170$12ce6450$@edu> I've just started using BioPython and have worked through the Cookbook section on calling clustalw from Python to do alignments. It would be cool to use clustalw to produce PHYLIP-format files and then call the PHYLIP programs to produce phylogenetic trees from them. Has anyone already worked this out? I searched the last couple of years of list archives and did not find anything about using BioPython to access PHYLIP. From cy at cymon.org Mon Jul 16 08:51:44 2007 From: cy at cymon.org (Cymon J. 
Cox) Date: Mon, 16 Jul 2007 09:51:44 +0100 Subject: [BioPython] PHYLIP In-Reply-To: <002401c7c774$5b9a2170$12ce6450$@edu> References: <002401c7c774$5b9a2170$12ce6450$@edu> Message-ID: <1184575904.9393.7.camel@clintonite.nhm.ac.uk> Hi Michael, On Sun, 2007-07-15 at 23:41 -0700, Michael Fahy wrote: > I've just started using BioPython and have worked through the Cookbook > section on calling clustalw from Python to do alignments. It would be cool > to use clustalw to produce PHYLIP-format files and then call the PHYLIP > programs to produce phylogenetic trees from them. If your doing phylogenetics in Python you should checkout Peter Foster's P4 (http://www.bmnh.org/~pf/p4.html). > Has anyone already worked > this out? Probably, but then few people continue to use phylip on a regular basis these days... Cheers, C. ________________________________________________________________________ Cymon J. Cox Biometry and Molecular Research Department of Zoology Natural History Museum Cromwell Road London, SW7 5BD Email: cy at cymon.org, c.cox at nhm.ac.uk, cymon.cox at gmail.com Phone : +44 (0)20 7942 6981 HomePage : http://www.duke.edu/~cymon -8.63/-6.77 ________________________________________________________________________ Fedora Core release 6 (Zod) clintonite.nhm.ac.uk 09:43:20 up 19:13, 6 users, load average: 0.02, 0.18, 0.29 From biopython at maubp.freeserve.co.uk Mon Jul 16 09:14:54 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Jul 2007 10:14:54 +0100 Subject: [BioPython] PHYLIP In-Reply-To: <002401c7c774$5b9a2170$12ce6450$@edu> References: <002401c7c774$5b9a2170$12ce6450$@edu> Message-ID: <469B370E.4000703@maubp.freeserve.co.uk> Michael Fahy wrote: > I've just started using BioPython and have worked through the Cookbook > section on calling clustalw from Python to do alignments. It would be cool > to use clustalw to produce PHYLIP-format files and then call the PHYLIP > programs to produce phylogenetic trees from them. 
Has anyone already worked
> this out? I searched the last couple of years of list archives and did not
> find anything about using BioPython to access PHYLIP.

You should be able to do this:

1. Produce your unaligned sequences in a suitable format for clustalw,
   e.g. write a fasta file using Bio.SeqIO.write(...)

2. Run clustalw (e.g. using the Bio.Clustalw command line wrapper in
   Biopython, or just make a system call in python).

3. Read in the clustal format alignment using Bio.SeqIO.parse(...) and
   write it out unaltered using Bio.SeqIO.write(...) in phylip format.
   See http://biopython.org/wiki/SeqIO#File_Format_Conversion

4. Run the PHYLIP tools (e.g. by making a system call from python, or by
   hand at the command line).

Personally I like the EMBOSS implementation of PHYLIP as this uses proper
command line arguments - making calling them from code much easier. (Note
they do like to re-arrange their website, and as EMBOSS 5.0 is just out,
it looks like some links are broken right now).

Note that you should avoid long record IDs, as the phylip format imposes
strict truncation to 10 characters (which could lead to non-unique record
names).

Peter

From mmokrejs at ribosome.natur.cuni.cz Mon Jul 16 14:24:05 2007
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Mon, 16 Jul 2007 16:24:05 +0200
Subject: [BioPython] Transcription
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu>
References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu>
Message-ID: <469B7F85.2030409@ribosome.natur.cuni.cz>

BTW, someone who has the rights should include this example in the docs.
The relevant section is empty:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68

Martin

Michiel De Hoon wrote:
> It all depends on how you interpret the sequence that you give to the
> transcribe function, and for that matter, to the translate function.
For > translation, virtually all biological publications show the non-template > strand. Hence, the sequence given to the translate function in Biopython is > also interpreted as the non-template strand. For consistency, the sequence > given to the transcribe function in Biopython is also taken to be the > non-template strand: > >>>> from Bio.Seq import * >>>> s = "ATGGTATAA" >>>> translate(s) > 'MV*' >>>> transcribe(s) > 'AUGGUAUAA' >>>> translate(_) > 'MV*' > > If you want to have the reverse complement, use the reverse_complement > function. > > --Michiel. > > > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org on behalf of Michael Fahy > Sent: Tue 7/10/2007 9:36 PM > To: BioPython at lists.open-bio.org > Subject: [BioPython] Transcription > > I just showed the BioPython tutorial to some of our Biology and Chemistry > faculty. They pointed out that all the "Transcribe" function does is > replace each occurrence of "T" in the sequence with a "U". The biologists > said that that is not what they mean by transcription. They felt that each > nucleotide should have been replaced by the complementary nucleotide, and > that the resulting string should have been reversed. > > This, they said, would be concordant with the way in which biologists use > the term "transcribe'. It would not be hard to do, so why does BioPython do > what it does and call it transcription? 
> > > > > > Michael Fahy > > Mathematics and Computer Science > > Chapman University > > One University Drive > > Orange, CA 92866 > > (714) 997-6879 > > fahy at chapman.edu > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > -- Dr. Martin Mokrejs Dept. of Genetics and Microbiology Faculty of Science, Charles University Vinicna 5, 128 43 Prague, Czech Republic http://www.iresite.org http://www.iresite.org/~mmokrejs From biopython at maubp.freeserve.co.uk Mon Jul 16 15:15:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Jul 2007 16:15:31 +0100 Subject: [BioPython] Bio.SeqIO ideas In-Reply-To: <469B6231.6040109@ribosome.natur.cuni.cz> References: <4693E5FE.708@maubp.freeserve.co.uk> <469B6231.6040109@ribosome.natur.cuni.cz> Message-ID: <469B8B93.5070201@maubp.freeserve.co.uk> Martin MOKREJ? wrote: > Peter, > maybe the docs (generated from sources as well as those in the > Documentation) should be clear what is id, name, description of SeqRecord object. They are all strings, normally specified when creating the instance of the SeqRecord object. The answer is it depends on where the SeqRecord came from - and for Bio.SeqIO this means which file format. One idea I had in mind was to expand the wiki page with worked examples of a sequence files and the SeqRecord created from it by Bio.SeqIO > E.g., > it would be helpful to demonstrate the values on an example of a FASTA > record parsed. Then one would figure out what is the difference between name > and description. 
Fasta files are used in the tutorial,
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc11

Do you think in addition to explicitly showing the record id and seq, I
should also show the description (and name)?

Fasta files are a very free form format, and in general the first word
(splitting on white space) is a name or identifier. In some cases (e.g.
NCBI fasta files) this can be subdivided (splitting on the | character).

To be explicit suppose you had this:

>554154531 a made up protein
SDKJSDLHVLSDJDKJFDLJFKLSDJD
>heat shock protein
EINDLKNFLDHFDSHFLDSHJDSHDJHJHKJHSD

Biopython will use the first word as both the record id and name, and the
full text as the description. For example given this FASTA file you would
get two records, the first:

id = name = "554154531"
description = "554154531 a made up protein"

and the second,

id = name = "heat"
description = "heat shock protein"

Note that the inclusion of the full text as the description is partly
based on older Biopython code, and also to try and make it as easy as
possible for you to extract any data from the line in your own code.

Peter

From biopython at maubp.freeserve.co.uk Mon Jul 16 16:26:29 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 17:26:29 +0100
Subject: [BioPython] Transcription
In-Reply-To: <469B7F85.2030409@ribosome.natur.cuni.cz>
References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz>
Message-ID: <469B9C35.2020604@maubp.freeserve.co.uk>

Martin MOKREJŠ wrote:
> BTW, someone who has the rights should include this example in the docs.
> The relevant section is empty:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68

Or remove it? We cover transcription and translation earlier:

Section 2.2 Working with sequences
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8

The source to the tutorial is a TeX (or LaTeX?)
file in CVS, which generates the HTML and PDF tutorial. I can update the CVS files, but I don't think I can update the file on the webpage... Peter From fahy at chapman.edu Mon Jul 16 16:39:42 2007 From: fahy at chapman.edu (Michael Fahy) Date: Mon, 16 Jul 2007 09:39:42 -0700 Subject: [BioPython] Transcription In-Reply-To: <469B9C35.2020604@maubp.freeserve.co.uk> Message-ID: <039e01c7c7c7$e8cae9a0$c789d3ce@chapman.edu> It would have helped me avoid some confusion if the Tutorial had mentioned the difference between sequences from the coding strand and from the template strand and the fact that the transcribe function in BioPython expects a sequence from the coding strand. My biology colleagues here seem to prefer thinking of transcribing the sequence from the template strand. It's not a big deal now that we understand what the intent is, but it caused some misunderstanding at first. -----Original Message----- From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Peter Sent: Monday, July 16, 2007 9:26 AM To: Martin MOKREJ? Cc: BioPython at lists.open-bio.org Subject: Re: [BioPython] Transcription Martin MOKREJ? wrote: > BTW, someone who has the rights should include this example in the docs. > The relevant section is empty: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 Or remove it? We cover transcription and translation earlier: Section 2.2 Working with sequences http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 The source to the tutorial is a TeX (or LaTeX?) file in CVS, which generates the HTML and PDF tutorial. I can update the CVS files, but I don't think I can update the file on the webpage... 
Peter _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From mdehoon at c2b2.columbia.edu Tue Jul 17 00:06:26 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Mon, 16 Jul 2007 20:06:26 -0400 Subject: [BioPython] Transcription References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> > The source to the tutorial is a TeX (or LaTeX?) file in CVS, which > generates the HTML and PDF tutorial. I can update the CVS files, but I > don't think I can update the file on the webpage... I update the file on the web page whenever I make a new Biopython release, using the file that is in CVS at that time. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Peter Sent: Mon 7/16/2007 12:26 PM To: Martin MOKREJS Cc: BioPython at lists.open-bio.org Subject: Re: [BioPython] Transcription Martin MOKREJS wrote: > BTW, someone who has the rights should include this example in the docs. > The relevant section is empty: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 Or remove it? We cover transcription and translation earlier: Section 2.2 Working with sequences http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 The source to the tutorial is a TeX (or LaTeX?) file in CVS, which generates the HTML and PDF tutorial. I can update the CVS files, but I don't think I can update the file on the webpage... 
Peter

From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 08:50:25 2007
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Tue, 17 Jul 2007 10:50:25 +0200
Subject: [BioPython] Bio.SeqIO ideas
In-Reply-To: <469B8B93.5070201@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk> <469B6231.6040109@ribosome.natur.cuni.cz> <469B8B93.5070201@maubp.freeserve.co.uk>
Message-ID: <469C82D1.7040907@ribosome.natur.cuni.cz>

Hi Peter,

Peter wrote:
> Martin MOKREJŠ wrote:
>> Peter,
>> maybe the docs (generated from sources as well as those in the
>> Documentation) should be clear what is id, name, description of
>> SeqRecord object.
>
> They are all strings, normally specified when creating the instance of
> the SeqRecord object. The answer is it depends on where the SeqRecord
> came from - and for Bio.SeqIO this means which file format.
>
> One idea I had in mind was to expand the wiki page with worked examples
> of a sequence files and the SeqRecord created from it by Bio.SeqIO

I didn't know that; in that case such examples would definitely be really
helpful.

> > E.g.,
>> it would be helpful to demonstrate the values on an example of a FASTA
>> record parsed. Then one would figure out what is the difference
>> between name and description.
>
> Fasta files are used in the tutorial,
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc11

This is, I think, not very clear; it would be better to show how to get
the real sequence using tostring() instead of the __repr__ output, where
the sequence is truncated and an alphabet is shown. Yes, tostring() was
used somewhere way above in section 2.2.
> > Do you think in addition to explicitly showing the record id and seq, I > should also show the description (and name)? Yes, because they are confusing. For parsing FASTA files I either my old home grown code I did use something else. Yesterdays I just wanted to parse and modify some files having extra coordinates in the description line and thought let's use biopython. Yep, but had to do several times dir(record) to see the methods available, as the manuals did not provide me with complete listing of the methods/functions. And I really had to play around to see what is name and description. And then do an extra search in the docs for the SeqRecord class and its properties. > > Fasta files are a very free form format, and in general the first word > (splitting on white space) is a name or identifier. In some cases (e.g. > NCBI fasta files) this can be subdivided (splitting on the | character). > > To be explicit suppose you had this: > > >554154531 a made up protein > SDKJSDLHVLSDJDKJFDLJFKLSDJD > >heat shock protein > EINDLKNFLDHFDSHFLDSHJDSHDJHJHKJHSD > > Biopython will use the first word as both the record id and name, and > the full text as the description. For example given this FASTA file you > would get two records, the first: > > id = name = "554154531" > description = "554154531 a made up protein" > > and the second, > > id = name = "heat" > description = "heat shock protein" Please include these examples on the web and maybe it is sufficient for the first pass, probably thinking how an EMBL record would get parsed is unnecessarily complex. FASTA should definitely appear in there. > > Note that the inclusion of the full text as the description is partly > based on older Biopython code, and also to try and make it as easy as > possible for you to extract any data from the line in your own code. I use only record.id, record.description and record.seq.tostring(). 
BTW, doing record.seq.tostring instead of record.seq.tostring() breaks biopython code somewhere inside but was clear it was my fault anyway. Martin From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 09:07:44 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 17 Jul 2007 11:07:44 +0200 Subject: [BioPython] Transcription In-Reply-To: <469B9C35.2020604@maubp.freeserve.co.uk> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> Message-ID: <469C86E0.5010205@ribosome.natur.cuni.cz> Peter, Peter wrote: > Martin MOKREJ? wrote: >> BTW, someone who has the rights should include this example in the docs. >> The relevant section is empty: >> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 > > Or remove it? We cover transcription and translation earlier: > > Section 2.2 Working with sequences > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 But is is buried in the text, it is more helpful to have a chapter item like this. Moreover, I ended up going through my thrash and picking up the email on this thread last from week. Now I have from Bio.Seq import translate ... print ''.join(('>', _record.id, ' ', _description, ' ', _start.strip(), '..', _stop.strip(), '\n', translate(_record.seq.tostring()[int(_start)-1:int(_stop)]))) but following the Tutorial would have been better as one could choose different translator. For example, now I wanted to look again into the docs to show you what's in, I did search on the main webpage but searching for 'translate' gave nothing, so I ticked also some additional set like the discussion but nothing came. I went to Documentation section, picked up API documentation as that is the fastest http://biopython.org/DIST/docs/api/public/trees.html , found the Bio.Translate.Translator . 
Again, the generated docs are bad: http://biopython.org/DIST/docs/api/public/Bio.Translate.Translator-class.html and >>> help(Translate) is not much helpful ither: Help on module Bio.Translate in Bio: NAME Bio.Translate FILE /usr/lib/python2.4/site-packages/Bio/Translate.py CLASSES Translator class Translator | Methods defined here: | | __init__(self, table) | | back_translate(self, seq) | | translate(self, seq, stop_symbol='*') | | translate_to_stop(self, seq) DATA ambiguous_dna_by_id = {1: , 2: , 2: , 2: , 2: I think the classes should be documented at least in the source code. That's one of the very nice python features, of course not unique only to it. Regards, Martin From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 09:29:31 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 17 Jul 2007 11:29:31 +0200 Subject: [BioPython] Transcription In-Reply-To: <039e01c7c7c7$e8cae9a0$c789d3ce@chapman.edu> References: <039e01c7c7c7$e8cae9a0$c789d3ce@chapman.edu> Message-ID: <469C8BFB.8000901@ribosome.natur.cuni.cz> Michael Fahy wrote: > It would have helped me avoid some confusion if the Tutorial had mentioned > the difference between sequences from the coding strand and from the > template strand and the fact that the transcribe function in BioPython > expects a sequence from the coding strand. My biology colleagues here seem > to prefer thinking of transcribing the sequence from the template strand. > It's not a big deal now that we understand what the intent is, but it caused > some misunderstanding at first. Yes, I read the discussion last week, it is fooling for me as well, I don't think I need the transcribe() as I would probably always do seq.tostring().replace('tT', 'uU') and not search a special function in the code. But what I would do look for is a function doing reverse_complement() and replace('tT', 'uU') while allowing me to optionally pass in intron or probably even more conveniently exon positions. 
Sure one can do it himself. Definitely, the Transcribe API docs are bad as for the Translate which is really really bad. One one wondered what translate_to_stop really does as one would hope that it starts as well on the very first AUG being in the correct frame, maybe does 3-frame or even six-frame translation ... who know what the undocumented function does. ;-)) Well, doing http://www.google.com/search?hl=en&q=site%3Abiopython.org+translate_to_stop&btnG=Google+Search gave few hits, one in tutorial and one can see it only translates to the stop and doesn't care about start and frames. Definitely, no other docs exist except the Tutorial and Cookbook as shown in the Google output. Very bad I think. Maybe someone would really fix the Transcribe? http://www.google.com/search?num=100&hl=en&q=%22from+Bio+import+Transcribe%22&btnG=Search&lr=lang_en Similarly, the Translate points me to the original thread years ago: http://209.85.135.104/search?q=cache:MU0IHhkMV9YJ:portal.open-bio.org/pipermail/biopython/2005-May.txt+%22from+Bio+import+Translate%22&hl=en&ct=clnk&cd=6&lr=lang_cs|lang_en|lang_de|lang_sk Martin > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org > [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Peter > Sent: Monday, July 16, 2007 9:26 AM > To: Martin MOKREJ? > Cc: BioPython at lists.open-bio.org > Subject: Re: [BioPython] Transcription > > Martin MOKREJ? wrote: >> BTW, someone who has the rights should include this example in the docs. >> The relevant section is empty: > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc68 > > Or remove it? We cover transcription and translation earlier: > > Section 2.2 Working with sequences > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc8 > > The source to the tutorial is a TeX (or LaTeX?) file in CVS, which > generates the HTML and PDF tutorial. I can update the CVS files, but I > don't think I can update the file on the webpage... 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 09:30:38 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 17 Jul 2007 11:30:38 +0200 Subject: [BioPython] Transcription In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> Message-ID: <469C8C3E.6000608@ribosome.natur.cuni.cz> Michiel De Hoon wrote: >> The source to the tutorial is a TeX (or LaTeX?) file in CVS, which >> generates the HTML and PDF tutorial. I can update the CVS files, but I >> don't think I can update the file on the webpage... > > I update the file on the web page whenever I make a new Biopython release, > using the file that is in CVS at that time. Maybe a script would do that daily? M. From biopython at maubp.freeserve.co.uk Tue Jul 17 11:41:12 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Jul 2007 12:41:12 +0100 Subject: [BioPython] Transcription In-Reply-To: <469C8C3E.6000608@ribosome.natur.cuni.cz> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> <469C8C3E.6000608@ribosome.natur.cuni.cz> Message-ID: <469CAAD8.4060802@maubp.freeserve.co.uk> Martin MOKREJ? wrote: > > Michiel De Hoon wrote: >>> The source to the tutorial is a TeX (or LaTeX?) file in CVS, which >>> generates the HTML and PDF tutorial. 
I can update the CVS files, but I >>> don't think I can update the file on the webpage... >> I update the file on the web page whenever I make a new Biopython release, >> using the file that is in CVS at that time. > > Maybe a script would do that daily? That isn't always a good idea - Updating the online copy as part of the release cycle means that the documentation matches the latest available release. Updating the tutorial in sync with CVS could mean that the documentation would include changes made for as yet unreleased code. As long as we avoid unreleased changes, we can of course improve the documentation and update the online copy in between release cycles. Peter From mmokrejs at ribosome.natur.cuni.cz Tue Jul 17 12:44:34 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Tue, 17 Jul 2007 14:44:34 +0200 Subject: [BioPython] Transcription In-Reply-To: <469CAAD8.4060802@maubp.freeserve.co.uk> References: <000401c7c35b$e1adea20$a509be60$@edu> <6243BAA9F5E0D24DA41B27997D1FD14402B5F1@mail2.exch.c2b2.columbia.edu> <469B7F85.2030409@ribosome.natur.cuni.cz> <469B9C35.2020604@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5F3@mail2.exch.c2b2.columbia.edu> <469C8C3E.6000608@ribosome.natur.cuni.cz> <469CAAD8.4060802@maubp.freeserve.co.uk> Message-ID: <469CB9B2.1060605@ribosome.natur.cuni.cz> Peter wrote: > Martin MOKREJ? wrote: >> >> Michiel De Hoon wrote: >>>> The source to the tutorial is a TeX (or LaTeX?) file in CVS, which >>>> generates the HTML and PDF tutorial. I can update the CVS files, but >>>> I don't think I can update the file on the webpage... >>> I update the file on the web page whenever I make a new Biopython >>> release, >>> using the file that is in CVS at that time. >> >> Maybe a script would do that daily? > > That isn't always a good idea - Updating the online copy as part of the > release cycle means that the documentation matches the latest available > release. 
Updating the tutorial in sync with CVS could mean that the
> documentation would include changes made for as yet unreleased code.

I understand that of course.

>
> As long as we avoid unreleased changes, we can of course improve the
> documentation and update the online copy in between release cycles.

Or keep both?
M.

From jodyhey at yahoo.com Tue Jul 24 04:23:13 2007
From: jodyhey at yahoo.com (Emanuel Hey)
Date: Mon, 23 Jul 2007 21:23:13 -0700 (PDT)
Subject: [BioPython] clustalw problem using standalone or Bio.Clustalw
Message-ID: <270644.85634.qm@web53909.mail.re2.yahoo.com>

I am running windows XP, Python 2.5.1 and BioPython
trying to do a multiple alignment on a fasta file.
The file is not large, with just 10 sequences to try
things out. I'm familiar with clustalw as a command
line program.

With clustalw in the path the following works fine and
opens a command prompt window with the clustalw menu
>>> os.system('clustalw')

However this line returns a '1' when it should do an
alignment on the sequences contained in c.faa
>>> os.system('clustalw c.faa')

I also tried following the examples in the cookbook
http://biopython.org/DIST/docs/tutorial/Tutorial.html
and the examples in the bioinformatics course
http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html#quest_run_clustalw

For example, the following call to a constructed
command line
>>> alignment = Clustalw.do_alignment(cline)

generates this error result

Traceback (most recent call last):
  File "", line 1, in
    alignment = Clustalw.do_alignment(cline)
  File "C:\Program Files\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file .\testalign.out not produced, commandline: clustalw .\chimp_hemoglobin.gb.faa -OUTFILE=.\testalign.out

Thanks much for any pointers

jhey

From biopython at maubp.freeserve.co.uk Tue Jul 24 07:47:58 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Jul 2007 08:47:58 +0100
Subject: [BioPython] clustalw problem using standalone or Bio.Clustalw
In-Reply-To: <270644.85634.qm@web53909.mail.re2.yahoo.com>
References: <270644.85634.qm@web53909.mail.re2.yahoo.com>
Message-ID: <46A5AEAE.4080203@maubp.freeserve.co.uk>

Emanuel Hey wrote:
> I am running windows XP, Python 2.5.1 and BioPython
> trying to do a multiple alignment on a fasta file.
> The file is not large, with just 10 sequences to try
> things out. I'm familiar with clustalw as a command
> line program.
>
> With clustalw in the path the following works fine and
> opens a command prompt window with the clustalw menu
>>>> os.system('clustalw')

That suggests that clustalw is on the windows path, or in the current
directory. Great. (I personally use a full path to clustalw.exe when
working on Windows)

> However this line returns a '1' when it should do an
> alignment on the sequences contained in c.faa
>>>> os.system('clustalw c.faa')

Obvious question time - is the file c.faa in the current directory?
i.e. Does this work?
import os assert os.path.isfile('c.faa') > I also tried following the examples in the cookbook > http://biopython.org/DIST/docs/tutorial/Tutorial.html > and the examples in the bioinformatics course > http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html#quest_run_clustalw > > For example, the following call to a constructed > command line >>>> alignment = Clustalw.do_alignment(cline) > > generates this error result > > > Traceback (most recent call last): > File "", line 1, in > alignment = Clustalw.do_alignment(cline) > File "C:\Program > Files\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", > line 117, in do_alignment > % (out_file, command_line)) > IOError: Output .aln file .\testalign.out not > produced, commandline: clustalw > .\chimp_hemoglobin.gb.faa -OUTFILE=.\testalign.out This may be the same issue... Could you try using fully qualified paths for both clustalw.exe and the input fasta file? Peter From jodyhey at yahoo.com Tue Jul 24 14:38:56 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Tue, 24 Jul 2007 07:38:56 -0700 (PDT) Subject: [BioPython] clustalw problem using standalone or Bio.Clustalw In-Reply-To: <46A5AEAE.4080203@maubp.freeserve.co.uk> Message-ID: <912111.24870.qm@web53904.mail.re2.yahoo.com> Yes, it was indeed a path problem. Thanks. jhey --- Peter wrote: > Emanuel Hey wrote: > > I am running windows XP, Python 2.5.1 and > BioPython > > trying to do a multiple alignment on a fasta file. > > The file is not large, with just 10 sequences to > try > > things out. I'm familiar with clustalw as a > command > > line program. > > > > With clustalw in the path the following works fine > and > > opens a command prompt window with the clustalw > menu > >>>> os.system('clustalw') > > That suggests that clustalw is on the windows path, > or in the current > directory. Great. 
(I personally use a full path to > clustalw.exe when > working on Windows) > > > However this line returns a '1' when it should do > an > > alignment on the sequences contained in c.faa > >>>> os.system('clustalw c.faa') > > Obvious question time - is the file c.faa in the > current directory? > i.e. Does this work? > > import os > assert os.path.isfile('c.faa') > > > I also tried following the examples in the > cookbook > > > http://biopython.org/DIST/docs/tutorial/Tutorial.html > > and the examples in the bioinformatics course > > > http://www.pasteur.fr/recherche/unites/sis/formation/python/ch11s06.html#quest_run_clustalw > > > > For example, the following call to a constructed > > command line > >>>> alignment = Clustalw.do_alignment(cline) > > > > generates this error result > > > > > > Traceback (most recent call last): > > File "", line 1, in > > alignment = Clustalw.do_alignment(cline) > > File "C:\Program > > > Files\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", > > line 117, in do_alignment > > % (out_file, command_line)) > > IOError: Output .aln file .\testalign.out not > > produced, commandline: clustalw > > .\chimp_hemoglobin.gb.faa -OUTFILE=.\testalign.out > > This may be the same issue... > > Could you try using fully qualified paths for both > clustalw.exe and the > input fasta file? > > Peter > >
From jodyhey at yahoo.com Tue Jul 24 19:57:05 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Tue, 24 Jul 2007 12:57:05 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw Message-ID: <935709.26436.qm@web53905.mail.re2.yahoo.com> I don't know if this is related to my previous problem. As before, under Windows XP, if I have clustalw.exe in the current directory then I should be able to execute just using >>> os.system('clustalw ' + 'data.faa') Indeed this works fine. However if I give it the full path >>> os.system('clustalw ' + 'C:\temp\pythonplay\hcgplay\data.faa') or >>> os.system('clustalw ' + 'C:\\temp\\pythonplay\\hcgplay\\data.faa') then the clustalw run crashes and returns Error: unknown option /-INFILE=C:\temp\pythonplay\hcgplay\data.faa This is very annoying. Any clues? Thanks jhey
From biopython at maubp.freeserve.co.uk Tue Jul 24 21:13:17 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Jul 2007 22:13:17 +0100 Subject: [BioPython] os.system problem with clustalw In-Reply-To: <935709.26436.qm@web53905.mail.re2.yahoo.com> References: <935709.26436.qm@web53905.mail.re2.yahoo.com> Message-ID: <46A66B6D.3010708@maubp.freeserve.co.uk> Emanuel Hey wrote: > If I have clustalw.exe in the current directory then I > should be able to execute just using > >>>> os.system('clustalw ' + 'data.faa') > > Indeed this works fine Yes, and if you try this at the Windows command prompt it also works: clustalw data.faa or: clustalw.exe data.faa > However if I give it the full path >>>> os.system('clustalw ' + > 'C:\temp\pythonplay\hcgplay\data.faa') That should fail due to the way Python uses backslashes as escape characters, so \t means a tab for example. > or >>>> os.system('clustalw ' + > 'C:\\temp\\pythonplay\\hcgplay\\data.faa') > > then the clustalw run crashes and returns > Error: unknown option > /-INFILE=C:\temp\pythonplay\hcgplay\data.faa It's not crashing, it's just returning with an error message. You are not dealing with a Biopython or even a Python problem here - you are simply (but understandably) having trouble with the clustalw command line options. Notice that clustalw.exe will tolerate this: clustalw.exe data.faa as shorthand for: clustalw.exe /infile=data.faa However, for some reason it does not seem to work with full paths like this: clustalw.exe C:\temp\pythonplay\hcgplay\data.faa You have to be very explicit: clustalw.exe /infile=C:\temp\pythonplay\hcgplay\data.faa This may be a (Windows only?) bug in clustalw. It's certainly not intuitive as is. Peter P.S. Did you not like using Bio.Clustalw to build the command line string for you?
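[Editorial sketch, not part of the original thread.] The two points in Peter's reply above (backslash escapes in Python string literals, and naming the input file explicitly with /infile=) can be illustrated roughly as follows. The helper name clustalw_command and its option_prefix parameter are hypothetical, introduced only for this example, and the dash fallback for non-Windows platforms is an assumption rather than something stated in this message.

```python
import sys


def clustalw_command(fasta_path, exe="clustalw", option_prefix=None):
    """Build an argument list that names the input file explicitly.

    option_prefix is a hypothetical convenience parameter: "/" gives the
    /infile= form shown above for Windows; "-" is assumed elsewhere.
    """
    if option_prefix is None:
        option_prefix = "/" if sys.platform.startswith("win") else "-"
    return [exe, option_prefix + "infile=" + fasta_path]


# In an ordinary string literal "\t" is a tab, so this path is silently mangled.
mangled = 'C:\temp'
# A raw string (or doubled backslashes) keeps every backslash as typed.
raw_path = r'C:\temp\pythonplay\hcgplay\data.faa'
escaped_path = 'C:\\temp\\pythonplay\\hcgplay\\data.faa'

command = clustalw_command(raw_path, exe="clustalw.exe", option_prefix="/")
print(command)
```

Handing such a list to subprocess.call(command), instead of joining it into one string for os.system, should also reduce quoting trouble when a path contains spaces, though that has not been tested against clustalw here.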
From skhadar at gmail.com Wed Jul 25 12:21:17 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Wed, 25 Jul 2007 17:51:17 +0530 Subject: [BioPython] Problem with Bio.PDB Message-ID: Hi, I want to use Bio.PDB module to calculate HSE and Resdue depth for all residues in a couple of proteins. This is the code I had written and am getting some serious errors :( . (My PyQ(Python Quotient is low, sorry if there is any significant syntax errors:) )) --- PS : Input PDB : http://www.rcsb.org/pdb/explore/explore.do?structureId=1crn msms is installed on my computer. -- 1. HSE.py _code_ #!/usr/bin/python from Bio.PDB import * parser = PDBParser() structure = parser.get_structure('str1', '1crn.pdb') model=structure[0] hse = HSExposure() exp_ca = hse.calc_hs_exposure(model, option='CA3') exp_cb = hse.calc_hs_exposure(model, option='CB') print exp_ca[100] _error_ Traceback (most recent call last): File "HSE.py", line 12, in ? hse = HSExposure() TypeError: 'module' object is not callable 2. rd.py _code_ #!/usr/bin/python from Bio.PDB import * parser = PDBParser() structure = parser.get_structure('str1', '1crn.pdb') model=structure[0] rd = ResidueDepth(model, '1crn.pdb') residue_depth, ca_depth=rd[46] _error_ pdb_to_xyzr: error, file 1crn.pdb line 91 residue 1 atom pattern THR N was not found in ./atmtypenumbers pdb_to_xyzr: error, file 1crn.pdb line 92 residue 1 atom pattern THR CA was not found in ./atmtypenumbers (..... till the last atom) pdb_to_xyzr: error, file 1crn.pdb line 417 residue 46 atom pattern ASN OXT was not found in ./atmtypenumbers Traceback (most recent call last): File "rd.py", line 8, in ? 
rd = ResidueDepth(model, '1crn.pdb') File "/software/biopython-1.42/build/lib.linux-x86_64-2.3/Bio/PDB/ResidueDepth.py", line 132, in __init__ surface=get_surface(pdb_file) File "/software/biopython-1.42/build/lib.linux-x86_64-2.3/Bio/PDB/ResidueDepth.py", line 83, in get_surface surface=_read_vertex_array(surface_file) File "/software/biopython-1.42/build/lib.linux-x86_64-2.3/Bio/PDB/ResidueDepth.py", line 51, in _read_vertex_array fp=open(filename, "r") -- Thanks Shameer Khadar NCBS-TIFR Bangalore India From biopython at maubp.freeserve.co.uk Wed Jul 25 13:00:02 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Jul 2007 14:00:02 +0100 Subject: [BioPython] Problem with Bio.PDB In-Reply-To: References: Message-ID: <46A74952.5030807@maubp.freeserve.co.uk> Shameer Khadar wrote: > Hi, > I want to use Bio.PDB module to calculate HSE and Resdue depth for all > residues in a couple of proteins. > This is the code I had written and am getting some serious errors :( . (My > PyQ(Python Quotient is low, sorry if there is any significant syntax > errors:) )) And your HSE code: > #!/usr/bin/python > from Bio.PDB import * > parser = PDBParser() > structure = parser.get_structure('str1', '1crn.pdb') > model=structure[0] > hse = HSExposure() > exp_ca = hse.calc_hs_exposure(model, option='CA3') > exp_cb = hse.calc_hs_exposure(model, option='CB') > print exp_ca[100] I'm guessing that code was based on a very old copy of the sample file: biopython/Scripts/Structure/hsexpo (which lacks a .py extension), dating to the release of Biopython 1.40 or earlier. Note that the DSSP, ResidueDepth and HSExposure classes changed two years ago. 
Here is my suggestion based on reading the latest version of that example (shipped with Biopython 1.41 onwards):

from Bio.PDB import *
RADIUS = 13.0
parser = PDBParser()
structure = parser.get_structure('xxxx', '1crn.pdb')
model=structure[0]
print "HSE based on the approximate CA-CB vectors,"
print "using three consecutive CA positions:"
hse_ca = HSExposureCA(model, RADIUS)
for key in hse_ca.keys() :
    print key, hse_ca[key]
print "HSE based on the real CA-CB vectors:"
hse_cb = HSExposureCB(model, RADIUS)
for key in hse_cb.keys() :
    print key, hse_cb[key]

Peter

From biopython at maubp.freeserve.co.uk Wed Jul 25 13:11:21 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 14:11:21 +0100
Subject: [BioPython] Problem with Bio.PDB documentation for HSE
In-Reply-To: <46A74952.5030807@maubp.freeserve.co.uk>
References: <46A74952.5030807@maubp.freeserve.co.uk>
Message-ID: <46A74BF9.5070404@maubp.freeserve.co.uk>

Actually...

Shameer was trying to use this HSE code:

>> #!/usr/bin/python
>> from Bio.PDB import *
>> parser = PDBParser()
>> structure = parser.get_structure('str1', '1crn.pdb')
>> model=structure[0]
>> hse = HSExposure()
>> exp_ca = hse.calc_hs_exposure(model, option='CA3')
>> exp_cb = hse.calc_hs_exposure(model, option='CB')
>> print exp_ca[100]

I replied:

> I'm guessing that code was based on a very old copy of the sample file:
> biopython/Scripts/Structure/hsexpo (which lacks a .py extension), dating
> to the release of Biopython 1.40 or earlier. Note that the DSSP,
> ResidueDepth and HSExposure classes changed two years ago.

I've realised that Shameer's code looks like it was based on page 12 of the "The Biopython Structural Bioinformatics FAQ" (i.e. the guide to Bio.PDB), http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

Thomas - it looks like the documentation needs updating here. As far as I can tell, the source is not in CVS...
Peter From jodyhey at yahoo.com Wed Jul 25 13:39:43 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Wed, 25 Jul 2007 06:39:43 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw In-Reply-To: <46A66B6D.3010708@maubp.freeserve.co.uk> Message-ID: <396836.63695.qm@web53901.mail.re2.yahoo.com> Thanks much for responding ok, I had no idea Clustalw was so particular about its command line flags. I was not using Bio.Clustalw to build the command line because I could not get that to work either. for example this does not work, for reasons that are obscure to me. >>> faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa' >>> cline = MultipleAlignCL(faa_filename) >>> align = do_alignment(cline) Thanks jhey --- Peter wrote: > Emanuel Hey wrote: > > If I have clustalw.exe in the current directory > then I > > should be able to execute just using > > > >>>> os.sytem('clustalw ' + 'data.faa') > > > > Indeed this works fine > > Yes, and if you try this at the windows command > prompt it also works: > > clustalw data.faa > > or: > > clustalw.exe data.faa > > > However if I give it the full path > >>>> os.system('clustalw ' + > > 'C:\temp\pythonplay\hcgplay\data.faa') > > That should fail due to the way python uses slashes > as escape > characters, so \t means a tab for example. > > > or > >>>> os.system('clustalw ' + > > 'C:\\temp\\pythonplay\\hcgplay\\data.faa') > > > > then the clustalw run crashes and returns > > Error: unknown option > > /-INFILE=C:\temp\pythonplay\hcgplay\data.faa > > Its not crashing, its just returning with an error > message. > > You are not dealing with a Biopython or even a > python problem here - you > are simply (but understandably) having trouble with > the clustalw command > line options.
> > Notice that clustalw.exe will tolerate this: > > clustalw.exe data.faa > > as shorthand for: > > clustalw.exe /infile=data.faa > > However, for some reason it does not seem to work > with full paths like this: > > clustalw.exe C:\temp\pythonplay\hcgplay\data.faa > > You have to be very explicit: > > clustalw.exe > /infile=C:\temp\pythonplay\hcgplay\data.faa > > This may be a (windows only?) bug in clustalw. Its > certainly not > intuitive as is. > > Peter > > P.S. Did you not like using Bio.Clustalw to build > the command line > string for you? > >
From biopython at maubp.freeserve.co.uk Wed Jul 25 14:21:53 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Jul 2007 15:21:53 +0100 Subject: [BioPython] os.system problem with clustalw In-Reply-To: <396836.63695.qm@web53901.mail.re2.yahoo.com> References: <396836.63695.qm@web53901.mail.re2.yahoo.com> Message-ID: <46A75C81.8050108@maubp.freeserve.co.uk> Emanuel Hey wrote: > Thanks much for responding > > ok, I had no idea Clustalw was so particular about its command line > flags. Its confused me in the past too! > I was not using Bio.Clustalw to build the command line because I > could not get that to work either. Ahh. Good point. I guess I should have checked this when I wrote my last email, but it turns out Bio.Clustalw was building it's command lines without using the INPUT argument... which I've just fixed in CVS. This should now work: from Bio.Clustalw import MultipleAlignCL, do_alignment faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa' cline = MultipleAlignCL(faa_filename) #print cline align = do_alignment(cline) for col_index in range(align.get_alignment_length()) : print align.get_column(col_index) Please try this by backing up and then updating the file Bio/Clustalw/__init__.py to CVS revision 1.15, which you can download here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Clustalw/__init__.py?cvsroot=biopython Thanks Peter From fahy at chapman.edu Wed Jul 25 15:14:06 2007 From: fahy at chapman.edu (Michael Fahy) Date: Wed, 25 Jul 2007 08:14:06 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw In-Reply-To: <46A75C81.8050108@maubp.freeserve.co.uk> References: <396836.63695.qm@web53901.mail.re2.yahoo.com> <46A75C81.8050108@maubp.freeserve.co.uk> Message-ID: <3181.66.27.156.108.1185376446.squirrel@webmail.chapman.edu> Speaking of the clustalw command line, is it possible to get clustalw to output a phylogenetic tree
file, rather than an alignment file, via command line arguments? ---------------- > Emanuel Hey wrote: >> Thanks much for responding >> >> ok, I had no idea Clustalw was so particular about its command line >> flags. > > Its confused me in the past too! From jodyhey at yahoo.com Wed Jul 25 16:23:18 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Wed, 25 Jul 2007 09:23:18 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw In-Reply-To: <46A75C81.8050108@maubp.freeserve.co.uk> Message-ID: <562586.70794.qm@web53909.mail.re2.yahoo.com> Peter Thanks. Actually do_alignment() is not working for me for just names, even without directories. >>> cline = MultipleAlignCL('data.faa') >>> str(cline) 'clustalw -INFILE=data.faa' >>> align = do_alignment(cline) Traceback (most recent call last): File "", line 1, in align = do_alignment(cline) File "C:\Program Files\Python25\lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment % (out_file, command_line)) IOError: Output .aln file data.aln not produced, commandline: clustalw -INFILE=data.faa Is this me? I think I got the new __init__.py installed ok. jhey --- Peter wrote: > Emanuel Hey wrote: > > Thanks much for responding > > > > ok, I had no idea Clustalw was so particular > about its command line > > flags. > > Its confused me in the past too! > > > I was not using Bio.Clustalw to build the command > line because I > > could not get that to work either. > > Ahh. Good point. I guess I should have checked this > when I wrote my last > email, but it turns out Bio.Clustalw was building > it's command lines > without using the INPUT argument... which I've just > fixed in CVS.
> > This should now work: > > from Bio.Clustalw import MultipleAlignCL, > do_alignment > faa_filename = > 'C:\\temp\\pythonplay\\hcgplay\\data.faa' > cline = MultipleAlignCL(faa_filename) > #print cline > align = do_alignment(cline) > for col_index in range(align.get_alignment_length()) > : > print align.get_column(col_index) > > Please try this by backing up and then updating the > file > Bio/Clustalw/__init__.py to CVS revision 1.15, which > you can download here: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Clustalw/__init__.py?cvsroot=biopython > > Thanks > > Peter >
From italo.maia at gmail.com Wed Jul 25 17:23:43 2007 From: italo.maia at gmail.com (Italo Maia) Date: Wed, 25 Jul 2007 14:23:43 -0300 Subject: [BioPython] My SeqIO doesn't have a "parse" method! @_@ Message-ID: <800166920707251023u3a1cd8b0vf271440cbdcfa58a@mail.gmail.com> I'm confused! Shouldn't SeqIO have a "parse" method?? My ubuntu biopython doesn't. Is this correct? -- "A arrog?ncia ? a arma dos fracos." =========================== Italo Moreira Campelo Maia Ci?ncia da Computa??o - UECE Desenvolvedor WEB Programador Java, Python Meu blog ^^ http://eusouolobomal.blogspot.com/ =========================== From biopython at maubp.freeserve.co.uk Wed Jul 25 21:01:50 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Jul 2007 22:01:50 +0100 Subject: [BioPython] My SeqIO doesn't have a "parse" method!
In-Reply-To: <800166920707251023u3a1cd8b0vf271440cbdcfa58a@mail.gmail.com> References: <800166920707251023u3a1cd8b0vf271440cbdcfa58a@mail.gmail.com> Message-ID: <46A7BA3E.4000407@maubp.freeserve.co.uk> Italo Maia wrote: > I'm confused! Shouldn't SeqIO have a "parse" method?? > My ubuntu biopython doesn't. Is this correct? You need Biopython 1.43 or later to use the new Bio.SeqIO code. http://biopython.org/wiki/SeqIO Which version of Ubuntu do you have? At the moment, only the new, unreleased Ubuntu Gutsy has the latest version of Biopython in the repositories: http://packages.ubuntu.com/gutsy/source/python-biopython The good news is that it should be simple to install Biopython from source on Ubuntu - provided you install all the dependencies first! I personally still use Ubuntu Dapper Drake on my Linux machine. Peter From biopython at maubp.freeserve.co.uk Wed Jul 25 22:31:39 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Jul 2007 23:31:39 +0100 Subject: [BioPython] os.system problem with clustalw (on Windows) In-Reply-To: <562586.70794.qm@web53909.mail.re2.yahoo.com> References: <562586.70794.qm@web53909.mail.re2.yahoo.com> Message-ID: <46A7CF4B.6050704@maubp.freeserve.co.uk> Emanuel Hey wrote: > Peter > > Thanks. > > Actually do_alignment() is not working for me for just > names, even without directories. My fault maybe. The old version of Bio/Clustalw/__init__.py before my first update today probably would have worked without directories in the filename. My mistake was that while clustalw on Windows seems to cope with / or - for some of its options, it has to be /infile=... rather than -infile=... (using either infile or INFILE is fine). On Linux, you can only use - for arguments. I should have spotted there was something amiss, but I was tricked by the fact that in my simple testing the output alignment already existed, and there was no error trapped from the system call, so it appeared to work. Grr.
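[Editorial sketch, not part of the original thread.] The pitfall Peter describes above, a stale output file plus an unchecked system call making a failed run look like a success, can be guarded against along the following lines. This is a generic illustration under stated assumptions, not what Bio.Clustalw's do_alignment() actually does; the function name run_and_require_output and its parameters are hypothetical.

```python
import os
import subprocess


def run_and_require_output(command, out_file):
    """Run command (an argument list) and insist that it creates out_file afresh."""
    # Delete any stale result first so an old file cannot mask a failed run.
    if os.path.exists(out_file):
        os.remove(out_file)
    return_code = subprocess.call(command)
    # A non-zero exit status is treated as failure rather than ignored.
    if return_code != 0:
        raise RuntimeError("command failed with exit status %d" % return_code)
    # Even with a zero exit status, require that the expected file now exists.
    if not os.path.isfile(out_file):
        raise IOError("expected output %s was not produced" % out_file)
    return out_file


# Hypothetical usage, assuming clustalw is on the PATH and writes data.aln:
# run_and_require_output(["clustalw", "-infile=data.faa"], "data.aln")
```

Whether a given clustalw build reports failures through its exit status is not established in this thread, which is why the sketch checks both the status and the presence of the output file.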
There are also "complications" when filenames contain spaces (a Microsoft innovation which frankly was in many respects a dreadful idea). Emanuel - please could you try updating Bio/Clustalw/__init__.py once again, trying the do_alignment() function, and reporting back. Please be explicit about the filenames used. Peter From italo.maia at gmail.com Wed Jul 25 23:07:49 2007 From: italo.maia at gmail.com (Italo Maia) Date: Wed, 25 Jul 2007 20:07:49 -0300 Subject: [BioPython] My SeqIO doesn't have a "parse" method! In-Reply-To: <46A7BA3E.4000407@maubp.freeserve.co.uk> References: <800166920707251023u3a1cd8b0vf271440cbdcfa58a@mail.gmail.com> <46A7BA3E.4000407@maubp.freeserve.co.uk> Message-ID: <800166920707251607jed5b257lcab41695b75b4a2e@mail.gmail.com> Oh well, so i'll try gutsy's biopython debian package. Link : http://mirrors.kernel.org/ubuntu/pool/universe/p/python-biopython/python-biopython_1.43-1_i386.deb Thanks Peter. 2007/7/25, Peter : > > Italo Maia wrote: > > I'm confused! Shouldn't SeqIO have a "parse" method?? > > My ubuntu biopython doesn't. Is this correct? > > You need Biopython 1.43 or later to use the new Bio.SeqIO code. > http://biopython.org/wiki/SeqIO > > Which version of Ubuntu do you have? At the moment, only new un-released > Ubuntu Gutsy has the latest version of Biopython in the repositories: > http://packages.ubuntu.com/gutsy/source/python-biopython > > The good news is that is should be simple to install Biopython from > source on Ubuntu - provided you install all the dependencies first! I > personally still use Ubuntu Dapper Drake on my Linux machine. > > Peter > > -- "A arrog?ncia ? a arma dos fracos." 
=========================== Italo Moreira Campelo Maia Ci?ncia da Computa??o - UECE Desenvolvedor WEB Programador Java, Python Meu blog ^^ http://eusouolobomal.blogspot.com/ ===========================
From biopython at maubp.freeserve.co.uk Thu Jul 26 13:55:14 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Jul 2007 14:55:14 +0100 Subject: [BioPython] clustalw and trees In-Reply-To: <3181.66.27.156.108.1185376446.squirrel@webmail.chapman.edu> References: <396836.63695.qm@web53901.mail.re2.yahoo.com> <46A75C81.8050108@maubp.freeserve.co.uk> <3181.66.27.156.108.1185376446.squirrel@webmail.chapman.edu> Message-ID: <46A8A7C2.3000003@maubp.freeserve.co.uk> Michael Fahy wrote: > > Speaking of the clustalw command line, is it possible to get clustalw to > output a phylogenetic tree file, rather than an alignment file, via > command line arguments? Do you mean in addition to the guide tree (*.dnd) it generates by default when building an alignment (*.aln) from a fasta input? I would have a look at clustalw -help (which works on Linux!) for more options, e.g. -OUTPUTTREE=nj OR phylip OR dist OR nexus -SEED=n :seed number for bootstraps. -KIMURA :use Kimura's correction. -TOSSGAPS :ignore positions with gaps. -BOOTLABELS=node OR branch :position of bootstrap values in tree display Peter From jodyhey at yahoo.com Thu Jul 26 15:25:33 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Thu, 26 Jul 2007 08:25:33 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw (on Windows) In-Reply-To: <46A7CF4B.6050704@maubp.freeserve.co.uk> Message-ID: <847192.79361.qm@web53902.mail.re2.yahoo.com> Peter both of these now work. >>> faa_filename = 'data.faa' >>> cline = MultipleAlignCL(faa_filename) >>> align = do_alignment(cline) >>> >>> faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\data.faa' >>> cline = MultipleAlignCL(faa_filename) >>> align = do_alignment(cline) Thanks! jhey --- Peter wrote: > Emanuel Hey wrote: > > Peter > > > > Thanks.
> > > > Actually do_alignment() is not working for me for > just > > names, even without directories. > > My fault maybe. The old version of > Bio/Clustalw/__init__.py before my > first update today probably would have worked > without directories in the > filename. > > My mistake was that while clustalw on windows seems > to copes with / or - > for some of its options, it has to be /infile=... > rather than > -infile=... (using either infile for INFILE is > fine). On Linux, you can > only use - for arguments. > > I should have spotted there was something amiss, but > I was tricked by > the fact that in my simple testing the output > alignment already existed, > and there was no error trapped from the system call, > so it appeared to > work. Grr. > > There are also "complications" when filenames > contain spaces (a > Microsoft innovation which frankly was in many > respects a dreadful idea). > > Emanuel - please could you try updating > Bio/Clustalw/__init__.py once > again, trying the do_alignment() function, and > reporting back. Please be > explicit about the filenames used. > > Peter > > From biopython at maubp.freeserve.co.uk Thu Jul 26 15:57:26 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Jul 2007 16:57:26 +0100 Subject: [BioPython] os.system problem with clustalw (on Windows) In-Reply-To: <847192.79361.qm@web53902.mail.re2.yahoo.com> References: <847192.79361.qm@web53902.mail.re2.yahoo.com> Message-ID: <46A8C466.3000803@maubp.freeserve.co.uk> Emanuel Hey wrote: > Peter > > both of these now work.
> > faa_filename = 'data.faa' > cline = MultipleAlignCL(faa_filename) > align = do_alignment(cline) > > faa_filename = > 'C:\\temp\\pythonplay\\hcgplay\\data.faa' > cline = MultipleAlignCL(faa_filename) > align = do_alignment(cline) > > Thanks! > > jhey Oh good :) Have you tried file names and/or paths with spaces in them? Peter From dalloliogm at gmail.com Thu Jul 26 16:15:52 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 26 Jul 2007 18:15:52 +0200 Subject: [BioPython] biopython UML class diagram documentation? Message-ID: <5aa3b3570707260915i77862742k7a0d92b553fe6566@mail.gmail.com> Hi, where can I find an UML documentation with at least class diagrams of the whole biopython project? I have already found this one from the Pasteur Institute of Paris: - http://www.pasteur.fr/recherche/unites/sis/formation/python/images/seq_class.png but I was wondering if there is something ufficial in the wiki or somewhere else. Thanks! -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From fahy at chapman.edu Thu Jul 26 17:35:56 2007 From: fahy at chapman.edu (Michael Fahy) Date: Thu, 26 Jul 2007 10:35:56 -0700 Subject: [BioPython] clustalw and trees In-Reply-To: <46A8A7C2.3000003@maubp.freeserve.co.uk> Message-ID: <003601c7cfab$6c8dd460$c789d3ce@chapman.edu> Peter, I was hoping to get it to generate a "real" phylip tree file rather than the guide tree that it generates automatically. If I run it from the command line and add the flag "-OUTPUTTREE=phylip", it creates a guide tree file and an alignment file but no phylip tree file. If I run it interactively I can get it to create a phylip tree file (which is different from the guide tree file) but I have not been able to figure out how to do this from the command line. 
-----Original Message----- From: Peter [mailto:biopython at maubp.freeserve.co.uk] Sent: Thursday, July 26, 2007 6:55 AM To: fahy at chapman.edu Cc: biopython at biopython.org Subject: Re: [BioPython] clustalw and trees Michael Fahy wrote: > > Speaking of the clustalw command line, is it possible to get clustalw to > output a phylogenetic tree file, rather than an alignment file, via > command line arguments? Do you mean in addition to the guide tree (*.dnd) it generates by default when building an alignment (*.aln) from a fasta input? I would have a look at clustalw -help (which works on Linux!) for more options, e.g. -OUTPUTTREE=nj OR phylip OR dist OR nexus -SEED=n :seed number for bootstraps. -KIMURA :use Kimura's correction. -TOSSGAPS :ignore positions with gaps. -BOOTLABELS=node OR branch :position of bootstrap values in tree display Peter From jodyhey at yahoo.com Thu Jul 26 17:38:16 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Thu, 26 Jul 2007 10:38:16 -0700 (PDT) Subject: [BioPython] os.system problem with clustalw (on Windows) In-Reply-To: <46A8C466.3000803@maubp.freeserve.co.uk> Message-ID: <709674.30122.qm@web53908.mail.re2.yahoo.com> I'm avoiding directories with space names, but I just did a quick check. This works >>> align = do_alignment(cline) >>> faa_filename = 'C:\\temp\\pythonplay\\hcgplay\\space test\\data.faa' >>> cline = MultipleAlignCL(faa_filename) >>> align = do_alignment(cline) jhey --- Peter wrote: > Emanuel Hey wrote: > > Peter > > > > both of these now work. > > > > faa_filename = 'data.faa' > > cline = MultipleAlignCL(faa_filename) > > align = do_alignment(cline) > > > > faa_filename = > > 'C:\\temp\\pythonplay\\hcgplay\\data.faa' > > cline = MultipleAlignCL(faa_filename) > > align = do_alignment(cline) > > > > Thanks! > > > > jhey > > Oh good :) > > Have you tried file names and/or paths with spaces > in them? > > Peter > >
From biopython at maubp.freeserve.co.uk Thu Jul 26 18:29:13 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Jul 2007 19:29:13 +0100 Subject: [BioPython] clustalw and trees In-Reply-To: <003601c7cfab$6c8dd460$c789d3ce@chapman.edu> References: <003601c7cfab$6c8dd460$c789d3ce@chapman.edu> Message-ID: <46A8E7F9.1010506@maubp.freeserve.co.uk> Michael Fahy wrote: > Peter, > > I was hoping to get it to generate a "real" phylip tree file rather than the > guide tree that it generates automatically. If I run it from the command > line and add the flag "-OUTPUTTREE=phylip", it creates a guide tree file > and an alignment file but no phylip tree file. If I run it interactively I > can get it to create a phylip tree file (which is different from the guide > tree file) but I have not been able to figure out how to do this from the > command line. Oh. Maybe you need to pore over the clustalw documentation... You might also look at the EMBOSS version of PHYLIP and use that instead (assuming its available on your OS). http://emboss.sourceforge.net/apps/release/5.0/embassy/phylip/ http://emboss.sourceforge.net/apps/release/5.0/embassy/phylipnew/ They have repackaged all the PHYLIP tools with usable command line interfaces - if you have ever tried to use the original PHYLIP tools in a script you'll appreciate the difference.

Peter

From mmokrejs at ribosome.natur.cuni.cz Fri Jul 27 13:42:56 2007
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Fri, 27 Jul 2007 15:42:56 +0200
Subject: [BioPython] GenBank parser used to break recently on rRNA records
Message-ID: <46A9F660.1030107@ribosome.natur.cuni.cz>

Hi,
I tried to parse all ESTs and cDNAs from GenBank using biopython about 3 weeks old from CVS and it turned out it choked here:

Will parse file 'ftp://ftp.ncbi.nlm.nih.gov/genbank/gbhtc12.seq.gz'
Traceback (most recent call last):
  File "translate_ESTs.py", line 27, in ?
    _record = _iterator.next()
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 142, in next
    return self._parser.parse(self.handle)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 208, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 820, in _feed_first_line
    assert line[47:54].strip() in ['','DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'], \
AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
LOCUS DQ369798 725 bp rRNA linear HTC 14-JUN-2007

However, the code has been revamped as I see in current CVS, so this is just for your information. I can parse the file with current code. ;-)
Martin

From biopython at maubp.freeserve.co.uk Fri Jul 27 14:08:38 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 15:08:38 +0100
Subject: [BioPython] GenBank parser used to break recently on rRNA records
In-Reply-To: <46A9F660.1030107@ribosome.natur.cuni.cz>
References: <46A9F660.1030107@ribosome.natur.cuni.cz>
Message-ID: <46A9FC66.7080507@maubp.freeserve.co.uk>

Martin MOKREJŠ
wrote:
> Hi,
> I tried to parse all ESTs and cDNAs from GenBank using biopython about
> 3 weeks old from CVS and it turned out it choked here:
>
> Will parse file 'ftp://ftp.ncbi.nlm.nih.gov/genbank/gbhtc12.seq.gz'
> Traceback (most recent call last):
>   File "translate_ESTs.py", line 27, in ?
>     _record = _iterator.next()
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 142, in next
>     return self._parser.parse(self.handle)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 208, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
>     self._feed_first_line(consumer, self.line)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 820, in _feed_first_line
>     assert line[47:54].strip() in ['','DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'], \
> AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
> LOCUS DQ369798 725 bp rRNA linear HTC 14-JUN-2007
>
> However, the code has been revamped as I see in current CVS, so this is
> just for your information. I can parse the file with current code. ;-)
> Martin

It looks like the NCBI have introduced another sequence type to their databases, 'rRNA' in this case. I think this validates the recent change which will now accept anything with 'RNA' or 'DNA' in the string :)

Peter

From douglas.kojetin at gmail.com Fri Jul 27 19:50:46 2007
From: douglas.kojetin at gmail.com (Douglas Kojetin)
Date: Fri, 27 Jul 2007 15:50:46 -0400
Subject: [BioPython] Bio.PDB: create a dummy vector
Message-ID: <19E72ECB-55F6-4A1A-98F8-E04D5ECFF0DC@gmail.com>

Hi All,

I would like to calculate the angle between all of the N-H vectors in a PDB file to a specific point in 3D space. Can someone tell me how to create a dummy atom using Biopython located at [0.0, 0.0, 0.0]?

atom1 = structure[0]['A'][1]['H']
atom2 = structure[0]['A'][1]['N']

vector1=atom1.get_vector()
vector2=atom2.get_vector()
dummy = :: somehow create a point at [0.0, 0.0, 0.0] :::
angle=calc_angle(vector1,vector2,dummy)

Thanks,
Doug

From karbak at gmail.com Sat Jul 28 17:32:51 2007
From: karbak at gmail.com (K. Arun)
Date: Sat, 28 Jul 2007 13:32:51 -0400
Subject: [BioPython] Bio.PDB: create a dummy vector
In-Reply-To: <19E72ECB-55F6-4A1A-98F8-E04D5ECFF0DC@gmail.com>
References: <19E72ECB-55F6-4A1A-98F8-E04D5ECFF0DC@gmail.com>
Message-ID: <162452a10707281032r2c88b689k19706c122c851802@mail.gmail.com>

On 7/27/07, Douglas Kojetin wrote:
> I would like to calculate the angle between all of the N-H
> vectors in a PDB file to a specific point in 3D space. Can someone
> tell me how to create a dummy atom using Biopython located at [0.0,
> 0.0, 0.0]?
>
> atom1 = structure[0]['A'][1]['H']
> atom2 = structure[0]['A'][1]['N']
>
> vector1=atom1.get_vector()
> vector2=atom2.get_vector()
> dummy = :: somehow create a point at [0.0, 0.0, 0.0] :::

Just calling Bio.PDB.Vector directly as below seems to work.

dummy = Bio.PDB.Vector(0.0, 0.0, 0.0)

-arun

From fahy at chapman.edu Sun Jul 29 01:07:29 2007
From: fahy at chapman.edu (Michael Fahy)
Date: Sat, 28 Jul 2007 18:07:29 -0700 (PDT)
Subject: [BioPython] biopython
In-Reply-To: <46A999C9.5010800@unice.fr>
References: <46A999C9.5010800@unice.fr>
Message-ID: <2399.10.100.0.80.1185671249.squirrel@webmail.chapman.edu>

Dear Richard,

Thank you for correcting my misuse of terminology. As I understand clustalw, it generates a guide tree from a distance matrix calculated from pairwise alignments. It then uses this guide tree to do a full multiple alignment. If you run clustalw interactively, you can ask it to generate a phylogenetic tree file from this multiple alignment. The tree file produced in this way differs, naturally enough, from the guide tree. If you run clustalw and pass it command line arguments it will automatically
write the guide tree to a file but I have not been able to get it to write the other tree to a file.

I now understand from your comments that there is little value in creating this tree file automatically due to inaccuracies in the clustalw alignment and other factors. I have read some references that do recommend using clustalw for creating multiple alignments (and even for creating phylogenetic trees). I have also read Edgar's paper in which he provides evidence for the superior accuracy of MUSCLE. Is there consensus in the research community that, while clustalw was a useful program for doing multiple alignments, it has been surpassed by newer programs such as MUSCLE and T-Coffee? If so, it would be useful to update BioPython and the BioPython Tutorial and Cookbook to use these alternative programs.

And, if you have created a multiple alignment and cleaned it (e.g. by removing domains with too much homoplasy), which tool or tools would you use to create the tree file (or files) from the alignment? I understand that you recommend using multiple methods (neighbor joining, parsimony, maximum likelihood, etc.) and comparing the results. I would guess that there are different tools that are better suited to each method. You mention TreeDyn and that looks like a very powerful tool but it appears that it is used for editing trees and not for creating tree files from multiple alignment files.

OK, I just saw the link to your genbank2treedyn program on the treedyn site. It looks like your program will read a fasta file with a set of sequences and then use clustalw to do the multiple alignment and phylip to create the phylogenetic tree. So I guess you are not opposed to using clustalw, you are just warning against using its multiple alignment files to create trees without analyzing and correcting them by hand.

Thanks for your help.

--Michael

--
> Dear Michael
>
> I was hoping to get it to generate a "real" phylip tree file rather than
> the
> guide tree that it generates automatically.
>
> 1/
> There is no such thing as a "phylip" tree.
> The usual tree format for phylip as well as many treeing programs is
> "newick", in the form
> ((a,b),c)
> This is the format of the clustal guide tree.
>
> 2/ You should read the clustal and phylip docs as well as some phylogenetics
> courses.
> Making a good phylogenetic tree cannot be automated yet. You have to
> - check the alignment by hand (clustal will align sequences that should not
> be aligned)
> - exclude domains (positions) with too much homoplasy, or missing
> positions in some sequences.
> - several methods should be compared (distances, ML, MP, ...) and a
> bootstrap run
>
> Clustal is an alignment program (you may try Muscle, Lagan, T-Coffee,
> ...) and not at all a phylogeny program
>
> Finally, if you make trees, try : www.treedyn.org ;-)
>
> best
> Richard
>

From biopython at maubp.freeserve.co.uk Sun Jul 29 09:06:13 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Jul 2007 10:06:13 +0100
Subject: [BioPython] biopython
In-Reply-To: <2399.10.100.0.80.1185671249.squirrel@webmail.chapman.edu>
References: <46A999C9.5010800@unice.fr> <2399.10.100.0.80.1185671249.squirrel@webmail.chapman.edu>
Message-ID: <46AC5885.7010007@maubp.freeserve.co.uk>

Michael Fahy wrote:
> If you run clustalw and pass it command line arguments it will
> automatically write the guide tree to a file but I have not been
> able to get it to write the other tree to a file.

If you use filename.fasta as input (or similar extensions), then by default Clustalw will call the alignment filename.aln and the guide tree filename.dnd (I normally accept these defaults myself).
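
A minimal sketch of the default naming just described, assuming the rule is simply "replace the input extension with .aln and .dnd". The helper name clustalw_default_outputs is hypothetical; it is not part of ClustalW or Biopython.

```python
import os.path


def clustalw_default_outputs(input_path):
    """Guess the default alignment and guide tree names for a ClustalW input.

    Assumption: ClustalW keeps the input stem and swaps the extension,
    as described in the message above (filename.fasta -> .aln / .dnd).
    """
    stem, _extension = os.path.splitext(input_path)
    return stem + ".aln", stem + ".dnd"


alignment_name, guide_tree_name = clustalw_default_outputs("filename.fasta")
print(alignment_name, guide_tree_name)
```

This can be handy in a script that needs to pick up the output files after the run without passing explicit output names.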

An example of changing the alignment filename (tested on Linux):

clustalw -infile=demo.faa -outfile=demo.align

This will result in an alignment, demo.align (our specified name), and a guide tree called demo.dnd (default naming).

Another command line:

clustalw -infile=demo.faa -newtree=demo.tree

This will result only in a guide tree, demo.tree (our specified name), but no alignment.

I don't know if you can get clustal to output both the alignment and the guide tree specifying both filenames.

> Is there consensus in the research community that, while clustalw
> was a useful program for doing multiple alignments, it has been
> surpassed by newer programs such as MUSCLE and T-Coffee?

My personal impression is that many Biologists are quite content with clustalw. If they are unhappy with an alignment then they might edit it by hand, or investigate other tools. There may be a link to the age of the researcher, or their computer skill ;)

> If so, it would be useful to update BioPython and the BioPython
> Tutorial and Cookbook to use these alternative programs.

I would keep the clustalw examples, as I think the program is still widely used and is a useful baseline. It's also available on all the main operating systems.

As MUSCLE is available on Linux, Mac and Windows, extending the tutorial to use it might be nice. Ideally we would also want to add a MUSCLE command line wrapper to Biopython to make calling the program as easy as possible.

For T-Coffee, it's not obvious if it's cross platform, but I suspect it's not available for Windows. Again, the same caveat applies - it would be best to have a Biopython command line wrapper for it before adding it to the tutorial.

If you are happy at the command line, and running command line tools from python, then these tools should read FASTA files and output at least one output format Biopython can read.
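
As a rough sketch of driving the two command lines above from Python: the function names below are made up for illustration, the executable is assumed to be called "clustalw" and to be on the PATH, and only the -infile, -outfile and -newtree flags are taken from this message. Nothing is executed unless the program is actually found.

```python
import shutil
import subprocess


def clustalw_command(infile, outfile=None, newtree=None, program="clustalw"):
    """Build an argument list using only the flags quoted in this thread."""
    args = [program, "-infile=" + infile]
    if outfile is not None:
        args.append("-outfile=" + outfile)
    if newtree is not None:
        args.append("-newtree=" + newtree)
    return args


def run_if_available(args):
    """Run the command if its executable is on the PATH, else return None."""
    if shutil.which(args[0]) is None:
        return None
    return subprocess.run(args, check=True).returncode


# Show the command lines without running them.
print(" ".join(clustalw_command("demo.faa", outfile="demo.align")))
print(" ".join(clustalw_command("demo.faa", newtree="demo.tree")))
```

Passing a list of arguments to subprocess, rather than one string to os.system, avoids shell quoting, which should also help with file names or paths containing spaces.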

Peter

From jdiezperezj at gmail.com Tue Jul 31 14:38:34 2007
From: jdiezperezj at gmail.com (=?ISO-8859-1?Q?Javier_D=EDez?=)
Date: Tue, 31 Jul 2007 16:38:34 +0200
Subject: [BioPython] blast output xml
Message-ID:

Hi,
Does anyone know if it is possible to get blast-xml output running blast from biopython scripts?
How can I do that?
Thanks
Javi

From jdiezperezj at gmail.com Tue Jul 31 14:45:28 2007
From: jdiezperezj at gmail.com (=?ISO-8859-1?Q?Javier_D=EDez?=)
Date: Tue, 31 Jul 2007 16:45:28 +0200
Subject: [BioPython] blast output xml
In-Reply-To:
References:
Message-ID:

Running local blast

On 7/31/07, Javier Díez wrote:
>
> Hi,
> Does anyone know if it is possible to get blast-xml output running blast
> from biopython scripts?
> How can I do that?
> Thanks
> Javi
>

From biopython at maubp.freeserve.co.uk Tue Jul 31 15:14:18 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Jul 2007 16:14:18 +0100
Subject: [BioPython] blast output xml
In-Reply-To:
References:
Message-ID: <46AF51CA.2030005@maubp.freeserve.co.uk>

Javier Díez wrote:
> Hi,
> Does anyone know if it is possible to get blast-xml output running blast
> from biopython scripts?
> How can I do that?
> Thanks
> Javi

Yes, you can run standalone blast from Biopython, and parse its XML output. See "Chapter 3 BLAST" of the tutorial:

http://biopython.org/DIST/docs/tutorial/Tutorial.html

Note that while parsing the plain text output worked well with older versions of BLAST, we don't recommend using this anymore - use the XML output.

Peter

From mdehoon at c2b2.columbia.edu Tue Jul 31 15:11:38 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 01 Aug 2007 00:11:38 +0900
Subject: [BioPython] blast output xml
In-Reply-To:
References:
Message-ID: <46AF512A.7050305@c2b2.columbia.edu>

See section 3.1 in the manual.

--Michiel.

Javier Díez wrote:
> Hi,
> Does anyone know if it is possible to get blast-xml output running blast
> from biopython scripts?
> How can I do that?
> Thanks
> Javi
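
For anyone reading this thread in the archive, here is a hedged sketch of asking a local BLAST for XML output. It assumes the legacy NCBI "blastall" executable, where "-m 7" selects XML output; the helper name, database name and file names are hypothetical, and the Biopython wrapper described in the tutorial may well be the more convenient route.

```python
def blastall_xml_command(program, database, query, out_xml, executable="blastall"):
    """Build a legacy blastall command line that writes XML output.

    Assumption: legacy NCBI blastall flags, where -m 7 means XML output.
    """
    return [
        executable,
        "-p", program,    # BLAST program, e.g. blastn or blastp
        "-d", database,   # name of a locally formatted database
        "-i", query,      # FASTA file holding the query sequence(s)
        "-m", "7",        # alignment view 7: XML output
        "-o", out_xml,    # where to write the results
    ]


# Hypothetical names, for illustration only.
print(" ".join(blastall_xml_command("blastn", "my_db", "query.fasta", "results.xml")))
```

The resulting XML file could then be opened and handed to the Biopython XML parser (Bio.Blast.NCBIXML), as recommended in the replies above.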