From chapmanb at 50mail.com Wed May 1 06:04:08 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 01 May 2013 06:04:08 -0400
Subject: [Biopython] GFF parsing with biopython
In-Reply-To:
References:
Message-ID: <877gjjkodj.fsf@fastmail.fm>

Mic;
(moving to galaxy-dev list so folks there can follow, but future
questions are more appropriate for the Biopython list only since
this isn't a Galaxy question)

> I have the following GFF file from a SNAP
>
> X1 SNAP Einit 2579 2712 -3.221 + . X1-snap.1
[...]
> With the code below I have tried to parse the above GFF file

The attributes you're missing are parts of the feature, not the
SeqRecord itself, which is why you're seeing attribute error. Here's a
full example that pulls all of the information from an example line:

from BCBio import GFF

in_file = "snap.gff"
with open(in_file) as in_handle:
    for rec in GFF.parse(in_handle):
        feature = rec.features[0]
        print rec.id
        print feature.qualifiers["source"][0]
        print feature.type
        print feature.location.start
        print feature.location.end
        print feature.qualifiers["score"][0]
        print feature.location.strand
        print feature.qualifiers.get("X1-snap.1", [None])[0]

which outputs:

X1
SNAP
Einit
2578
2712
-3.221
1
true

Hope this helps,
Brad

From zhigangwu.bgi at gmail.com Wed May 1 22:11:38 2013
From: zhigangwu.bgi at gmail.com (Zhigang Wu)
Date: Wed, 1 May 2013 19:11:38 -0700
Subject: [Biopython] [Biopython-dev] Lazy-loading parsers, was: Biopython GSoC 2013 applications via NESCent
In-Reply-To:
References:
Message-ID:

Thanks so much Alex. I definitely will take a look at it.
Thanks again for making your code available.

Zhigang

On May 1, 2013, at 3:56 PM, "Alex Leach" wrote:

> Dear all,
>
> I also left some minor comments on the proposal; I hope they're helpful and I wish you every success!
>
> You should focus on the proposal for now, but I thought I'd share a more presentable version of the fasta lazy-loader I wrote a couple of years ago.
The focus at the time was to minimise memory usage and increase the speed of random access to fasta-formatted sequences, stored on disk. Only sequence accessions and file locations are stored in-memory (in a dict). Once the index has been populated, it can 'pickle' the dictionary to a file on disk, for later re-use. > > It doesn't exactly fulfill all of your needs, but I hope it might help you in the right direction.. > > Also, were there plans for making the lazy loader thread-safe? I've done it in the past by passing a `multiprocessing.Pipe` instance to a method (`pipe_sequences`) of the lazy loader. If redesigning the code, I'd try to implement a callback scheme, but passing a Pipe did the job.. Maybe it's outside the current scope of the project, but anyway, I put the module up on github if you want to check it out[1]. > > > Cheers, > Alex > > > [1] - https://github.com/alexleach/fasta_lazy_loader/blob/master/fasta_lazy_loader.py > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mictadlo at gmail.com Fri May 3 01:08:18 2013 From: mictadlo at gmail.com (Mic) Date: Fri, 3 May 2013 15:08:18 +1000 Subject: [Biopython] GFF parsing with biopython In-Reply-To: <877gjjkodj.fsf@fastmail.fm> References: <877gjjkodj.fsf@fastmail.fm> Message-ID: Thank you. On Wed, May 1, 2013 at 8:04 PM, Brad Chapman wrote: > > Mic; > (moving to galaxy-dev list so folks there can follow, but future > questions are more appropriate for the Biopython list only since > this isn't a Galaxy question) > > > I have the following GFF file from a SNAP > > > > X1 SNAP Einit 2579 2712 -3.221 + . > X1-snap.1 > [...] > > With the code below I have tried to parse the above GFF file > > The attributes you're missing are parts of the feature, not the > SeqRecord itself, which is why you're seeing attribute error. 
Here's a > full example that pulls all of the information from an example line: > > from BCBio import GFF > > in_file = "snap.gff" > with open(in_file) as in_handle: > for rec in GFF.parse(in_handle): > feature = rec.features[0] > print rec.id > print feature.qualifiers["source"][0] > print feature.type > print feature.location.start > print feature.location.end > print feature.qualifiers["score"][0] > print feature.location.strand > print feature.qualifiers.get("X1-snap.1", [None])[0] > > which outputs: > > X1 > SNAP > Einit > 2578 > 2712 > -3.221 > 1 > true > > Hope this helps, > Brad > From mictadlo at gmail.com Fri May 3 01:26:18 2013 From: mictadlo at gmail.com (Mic) Date: Fri, 3 May 2013 15:26:18 +1000 Subject: [Biopython] GFF.writer Message-ID: Hi, I found here ( http://www.biopython.org/wiki/GFF_Parsing#Writing_GFF3 ) an example how to write gff3 files. Is it possible not to write "##sequence-region ID1 1 20"? Thank you in advance. Mic From chapmanb at 50mail.com Fri May 3 06:51:37 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 03 May 2013 06:51:37 -0400 Subject: [Biopython] GFF.writer In-Reply-To: References: Message-ID: <878v3ws5dy.fsf@fastmail.fm> Mic; > I found here ( http://www.biopython.org/wiki/GFF_Parsing#Writing_GFF3 ) an > example how to write gff3 files. > > Is it possible not to write "##sequence-region ID1 1 20"? It only writes that line if you supply a sequence for the SeqRecord: https://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFOutput.py#L105 so manually removing the sequence before writing will prevent it from outputting that metadata. However, the GFF3 spec encourages writing these metadata items: http://www.sequenceontology.org/gff3.shtml Brad From carlos.borroto at gmail.com Fri May 3 12:42:33 2013 From: carlos.borroto at gmail.com (Carlos Borroto) Date: Fri, 3 May 2013 12:42:33 -0400 Subject: [Biopython] Private '_filenames' attribute in class _SQLiteManySeqFilesDict? 
Message-ID:

Hi,

I'm writing a tool to index and query a local database using
SeqIO.index_db(). I want to provide something similar to:
$ blastdbcmd -db nt -info
Database: Nucleotide collection (nt)
17,200,582 sequences; 44,123,601,625 total bases

Date: Feb 7, 2013 4:14 PM Longest sequence: 65,476,681 bases

Volumes:
/local/db/blast/nt.00
/local/db/blast/nt.01
/local/db/blast/nt.02
/local/db/blast/nt.03
/local/db/blast/nt.04
/local/db/blast/nt.05
/local/db/blast/nt.06
/local/db/blast/nt.07
/local/db/blast/nt.08
/local/db/blast/nt.09
/local/db/blast/nt.10
/local/db/blast/nt.11
/local/db/blast/nt.12

I have limited object-oriented Python skills, and actually very little
exposure to OO in general. The object returned by SeqIO.index_db() has
an attribute '_filenames' which I could use to report "volumes". I
recently read that it is a common convention to mark private methods and
attributes with a leading underscore. Would it be safe to use
'_filenames' in this case? Could this attribute be converted to a
public one so it can be safely used without fear of breaking the code
in the future?

Thanks,
Carlos

From andreagarcia871 at gmail.com Sun May 5 16:21:06 2013
From: andreagarcia871 at gmail.com (Andrea García)
Date: Sun, 5 May 2013 17:21:06 -0300
Subject: [Biopython] How to install tests manually?
Message-ID:

Hello all,

I have just installed BioPython using defaults on Windows 7, and it seems
all test files are missing. I've checked under:

c:\Python27\Lib\site-packages\Bio\

against this directory

https://github.com/biopython/biopython/tree/master/Tests

How can I install the tests manually?

Cheers

From mictadlo at gmail.com Mon May 6 01:02:01 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 6 May 2013 15:02:01 +1000
Subject: [Biopython] GFF.writer
In-Reply-To: <878v3ws5dy.fsf@fastmail.fm>
References: <878v3ws5dy.fsf@fastmail.fm>
Message-ID:

Hi Brad,
Thank you, it is working, but I have a few questions about running the
code below:

from BCBio import GFF
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

out_file = "your_file.gff"
seq = Seq("GATCGATCGATCGATCGATC")
rec = SeqRecord(seq, "ID1")
qualifiers = {"source": "prediction",
              "note": "F5M15.26 n:1 Tax:Arabidopsis thaliana RepID:Q9LMV1_ARATH",
              "ID": "gene1"}
sub_qualifiers = {"source": "prediction"}
top_feature = SeqFeature(FeatureLocation(0, 20), type="gene", strand=1,
                         qualifiers=qualifiers)
top_feature.sub_features = [SeqFeature(FeatureLocation(0, 5), type="exon",
                                       strand=1, score=12,
                                       qualifiers=sub_qualifiers),
                            SeqFeature(FeatureLocation(15, 20),
                                       type="exon", strand=1, score=-13,
                                       qualifiers=sub_qualifiers)]
rec.features = [top_feature]

with open(out_file, "w") as out_handle:
    GFF.write([rec], out_handle)


* How is it possible to avoid getting e.g. %20, and is there a way to get
the attributes in the order ID, note in the output below?
note=F5M15.26%20n%3A1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH;ID=gene1

* How is it possible to set a score in sub_features? The above code
caused the following error:
Traceback (most recent call last):
  File "problem.py", line 15, in <module>
    qualifiers=sub_qualifiers),
TypeError: __init__() got an unexpected keyword argument 'score'

Thank you in advance

Mic

On Fri, May 3, 2013 at 8:51 PM, Brad Chapman wrote:

>
> Mic;
>
> > I found here ( http://www.biopython.org/wiki/GFF_Parsing#Writing_GFF3 )
> an
> > example how to write gff3 files.
> >
> > Is it possible not to write "##sequence-region ID1 1 20"?
> > It only writes that line if you supply a sequence for the SeqRecord: > > > https://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFOutput.py#L105 > > so manually removing the sequence before writing will prevent it from > outputting that metadata. However, the GFF3 spec encourages writing > these metadata items: > > http://www.sequenceontology.org/gff3.shtml > > Brad > From w.arindrarto at gmail.com Mon May 6 02:16:11 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 6 May 2013 08:16:11 +0200 Subject: [Biopython] How to install tests manually? In-Reply-To: References: Message-ID: > Hello all, > > I have just installed BioPython using defaults on Windows 7, and it seems > all tests files are missing. I've checked under: > > c:\Python27\Lib\site-packages\Bio\ > > against this directory > > https://github.com/biopython/biopython/tree/master/Tests > > How can I install the tests manually? Hi Andrea, AFAIK, the Windows installer does not include the Test directory. If you would like to run the tests, you need to download the source, unpack it, and run it using `python setup.py test` from the top directory. Hope that helps :), Bow From p.j.a.cock at googlemail.com Mon May 6 04:21:16 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 6 May 2013 09:21:16 +0100 Subject: [Biopython] How to install tests manually? In-Reply-To: References: Message-ID: On Mon, May 6, 2013 at 7:16 AM, Wibowo Arindrarto wrote: >> Hello all, >> >> I have just installed BioPython using defaults on Windows 7, and it seems >> all tests files are missing. I've checked under: >> >> c:\Python27\Lib\site-packages\Bio\ >> >> against this directory >> >> https://github.com/biopython/biopython/tree/master/Tests >> >> How can I install the tests manually? > > Hi Andrea, > > AFAIK, the Windows installer does not include the Test directory. 
If > you would like to run the tests, you need to download the source, > unpack it, and run it using `python setup.py test` from the top > directory. > > Hope that helps :), > Bow Yes, that is correct - the Windows installers do not include the source code (nor the compiled Tutorial as PDF and HTML). We provide the source code bundles ZIP files to make this easier for Windows users (where the Unix style tar-balls are harder to decompress). If you are interested in doing Biopython development running the tests is a really good idea, but you'll also want to setup the C compilers so you can build Biopython (assuming you are using C Python, this would not apply to PyPy or Jython). Peter From p.j.a.cock at googlemail.com Mon May 6 04:29:05 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 6 May 2013 09:29:05 +0100 Subject: [Biopython] GFF.writer In-Reply-To: References: <878v3ws5dy.fsf@fastmail.fm> Message-ID: On Mon, May 6, 2013 at 6:02 AM, Mic wrote: > Hi Brad, > Thank you it is working, but I have few questions by running the bellow > code: > from BCBio import GFF > from Bio.Seq import Seq > from Bio.SeqRecord import SeqRecord > from Bio.SeqFeature import SeqFeature, FeatureLocation > > out_file = "your_file.gff" > seq = Seq("GATCGATCGATCGATCGATC") > rec = SeqRecord(seq, "ID1") > qualifiers = {"source": "prediction", "note": "F5M15.26 n:1 Tax:Arabidopsis > thaliana RepID:Q9LMV1_ARATH", > "ID": "gene1"} > sub_qualifiers = {"source": "prediction"} > top_feature = SeqFeature(FeatureLocation(0, 20), type="gene", strand=1, > qualifiers=qualifiers) > top_feature.sub_features = [SeqFeature(FeatureLocation(0, 5), type="exon", > strand=1, score=12, > qualifiers=sub_qualifiers), > SeqFeature(FeatureLocation(15, 20), > type="exon", strand=1, score=-13, > qualifiers=sub_qualifiers)] > rec.features = [top_feature] > > with open(out_file, "w") as out_handle: > GFF.write([rec], out_handle) > > > * How is it possible to avoid to get e.g. 
*%20* and is there a way to get > this order ID, note in below output? > note=F5M15.26*%20*n*%3A* > 1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH;ID=gene1 > > * How is it possible to get score in sub_features, because the above code > caused the following error? > Traceback (most recent call > last): > > File "problem.py", line 15, in > > > > qualifiers=sub_qualifiers), > > TypeError: __init__() got an unexpected keyword argument 'score' > > Thank you in advance > > Mic > Hi Mic, Just to give you advance warning, sub-features are being deprecated in the next release of Biopython. You'll still get them when parsing a GenBank file etc, but they won't be used when writing the GenBank file. Instead we have a new CompoundFeatureLocation instead. One of the reasons for doing this is that historically sub-features have been used for complex locations and NOT parent/child style relationships as in GFF. Brad - this would be a good thing for us to work on at the upcoming CodeFest in Berlin: http://www.open-bio.org/wiki/Codefest_2013 Peter From chapmanb at 50mail.com Mon May 6 07:03:20 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 06 May 2013 07:03:20 -0400 Subject: [Biopython] GFF.writer In-Reply-To: References: <878v3ws5dy.fsf@fastmail.fm> Message-ID: <871u9k5q13.fsf@fastmail.fm> Mic; > Thank you it is working, but I have few questions by running the bellow > code: [...] > * How is it possible to avoid to get e.g. *%20* and is there a way to get > this order ID, note in below output? > note=F5M15.26*%20*n*%3A* > 1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH;ID=gene1 Apologies, I am escaping too much according to the GFF specification. I checked in a fix to avoid escaping spaces and semi-colons. If you get the latest version from GitHub it will avoid this issue. I also checked in an update to order the key/value attributes in alphabetical order. 
There isn't a defined ordering of these in the spec
but I agree that a consistent one would be nice. Thanks for all the
useful feedback.

> * How is it possible to get score in sub_features, because the above code
> caused the following error?

You want to specify the score as part of the SeqFeature qualifiers. Your
fixed code is:

from BCBio import GFF
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

out_file = "your_file.gff"
seq = Seq("GATCGATCGATCGATCGATC")
rec = SeqRecord(seq, "ID1")
qualifiers = {"source": "prediction",
              "note": "F5M15.26 n:1 Tax:Arabidopsis thaliana RepID:Q9LMV1_ARATH",
              "ID": "gene1"}
top_feature = SeqFeature(FeatureLocation(0, 20), type="gene", strand=1,
                         qualifiers=qualifiers)
top_feature.sub_features = [SeqFeature(FeatureLocation(0, 5), type="exon",
                                       strand=1,
                                       qualifiers={"source": "prediction",
                                                   "score": 12}),
                            SeqFeature(FeatureLocation(15, 20), type="exon",
                                       strand=1,
                                       qualifiers={"source": "prediction",
                                                   "score": -13})]
rec.features = [top_feature]

with open(out_file, "w") as out_handle:
    GFF.write([rec], out_handle)

Peter:
> Just to give you advance warning, sub-features are being deprecated
> in the next release of Biopython. You'll still get them when parsing a
> GenBank file etc, but they won't be used when writing the GenBank
> file. Instead we have a new CompoundFeatureLocation instead.
> One of the reasons for doing this is that historically sub-features
> have been used for complex locations and NOT parent/child style
> relationships as in GFF.
>
> Brad - this would be a good thing for us to work on at the upcoming
> CodeFest in Berlin: http://www.open-bio.org/wiki/Codefest_2013

Agreed, I need to get up to date with this on the latest release.
I'm also going to spend some time and merge most of the functionality into
Ryan's gffutils library so it can import and export Biopython objects:

https://github.com/daler/gffutils/tree/refactor

Brad

From p.j.a.cock at googlemail.com Mon May 6 07:17:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 6 May 2013 12:17:12 +0100
Subject: [Biopython] GFF.writer
In-Reply-To: <871u9k5q13.fsf@fastmail.fm>
References: <878v3ws5dy.fsf@fastmail.fm> <871u9k5q13.fsf@fastmail.fm>
Message-ID:

On Mon, May 6, 2013 at 12:03 PM, Brad Chapman wrote:
>
> I also checked in an update to order the key/value attributes in
> alphabetical order. There isn't a defined ordering of these in the spec
> but I agree that a consistent one would be nice.

Would using an OrderedDict be neater? i.e. Preserve any user
given order or whatever there was when parsing. This would
allow ad-hoc conventions like the ID is first to be observed
(or whatever the user preferred).

We'll be dropping Python 2.5 support shortly so that isn't
a problem - and in any case we're already bundling a
backport of OrderedDict under Bio._py3k so you could use
that if needed:

from Bio._py3k import OrderedDict

Peter

From p.j.a.cock at googlemail.com Mon May 6 09:26:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 6 May 2013 14:26:20 +0100
Subject: [Biopython] Private '_filenames' attribute in class _SQLiteManySeqFilesDict?
In-Reply-To:
References:
Message-ID:

On Fri, May 3, 2013 at 5:42 PM, Carlos Borroto wrote:
> Hi,
>
> I'm writing a tool to index and query a local database using
> SeqIO.index_db().
I want to provide something similar to: > $ blastdbcmd -db nt -info > Database: Nucleotide collection (nt) > 17,200,582 sequences; 44,123,601,625 total bases > > Date: Feb 7, 2013 4:14 PM Longest sequence: 65,476,681 bases > > Volumes: > /local/db/blast/nt.00 > /local/db/blast/nt.01 > /local/db/blast/nt.02 > /local/db/blast/nt.03 > /local/db/blast/nt.04 > /local/db/blast/nt.05 > /local/db/blast/nt.06 > /local/db/blast/nt.07 > /local/db/blast/nt.08 > /local/db/blast/nt.09 > /local/db/blast/nt.10 > /local/db/blast/nt.11 > /local/db/blast/nt.12 > > I have limited Object-Oriented Python's skills, actually very low > exposure to OO in general. The object returned by SeqIO.index_db() has > an attribute '_filenames' which I could use to report "volumes". I > recently read it is a common rule to mark private methods and > attributes with the starting underscore. Would it be safe to use > '_filenames' in this case? Could this attribute be converted to a > public one so it can be safely use without fear of breaking the code > in the future? > > Thanks, > Carlos Hi Carlos, Yes, this use of the underscore is intended to indicate that the attribute _filenames is private. This should be safe to use if you treat it as read only, but the intent was to keep open the option of changing the implementation without altering the public API. In this case, there is no need to keep the list in memory since it is also in the SQLite index file - and so a future optimisation to reduce memory usage might remove the _filenames attribute. However, that hasn't changed yet. I would not like to make this a public (read write) attribute. It might be reasonable to expose it as a read only attribute (using a Python property). For now I would just use it in your code, but comment that it is a private attribute and could break in a future release of Biopython. 
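
One way to act on that advice is to keep the private access inside a single helper, so only one place needs changing if `_filenames` ever goes away. The following is a minimal sketch and not part of Biopython: `list_volumes` and `FakeIndex` are hypothetical names invented for illustration, and `FakeIndex` only stands in for the object returned by SeqIO.index_db().

```python
def list_volumes(index):
    """Return the file names behind an index, or an empty list if unavailable."""
    # NOTE: _filenames is a PRIVATE attribute of the index object and could
    # be removed in a future release of Biopython, so read it defensively.
    filenames = getattr(index, "_filenames", None)
    if filenames is None:
        return []
    # Return a copy so callers cannot modify the index's own list
    # (the attribute should be treated as read only).
    return list(filenames)


class FakeIndex(object):
    """Hypothetical stand-in for the object returned by SeqIO.index_db()."""

    def __init__(self, filenames):
        self._filenames = filenames


index = FakeIndex(["/local/db/blast/nt.00", "/local/db/blast/nt.01"])
volumes = list_volumes(index)
print("Volumes:")
for name in volumes:
    print("  " + name)
```

If the attribute is later exposed as a read-only property under a public name, only the `getattr` line in the helper would need to change.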
Regards,

Peter

From chapmanb at 50mail.com Mon May 6 22:30:31 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 06 May 2013 22:30:31 -0400
Subject: [Biopython] GFF.writer
In-Reply-To:
References: <878v3ws5dy.fsf@fastmail.fm> <871u9k5q13.fsf@fastmail.fm>
Message-ID: <8761yvh67s.fsf@fastmail.fm>

Peter;

> Would using an OrderedDict be neater? i.e. Preserve any user
> given order or whatever there was when parsing. This would
> allow ad-hoc conventions like the ID is first to be observed
> (or whatever the user preferred).

The current API generates the GFF from Biopython Seq and SeqFeature
objects, so there isn't a clean way to pass through ordering like this.
We could expose qualifiers as OrderedDicts if that's a useful change,
but still need to pick an ordering for non-qualifier items.

Practically, there is no guaranteed order to GFF3 attributes. Exposing
an alphabetized list seems reasonable but it's probably not worth going
too far down this path.

Brad

From tomlin.9 at wright.edu Tue May 14 00:07:57 2013
From: tomlin.9 at wright.edu (Tomlin, Joshua James)
Date: Tue, 14 May 2013 04:07:57 +0000
Subject: [Biopython] help - Biopython Install
Message-ID:

I'm trying to install Biopython on my laptop. I have Mac OS X 10.7.5, Python 2.7, and have installed NumPy, I believe. I can run the 'import Bio' command in Python without any errors, but when I try to run 'python setup.py test' I get 6 errors, which you can see below. Basically I need to know if I have done everything I need to do correctly. If I have, then why are these 6 tests failing and will that impact my future use of Biopython? Thanks.

Python version: 2.7.4 (v2.7.4:026ee0057e2d, Apr 6 2013, 11:43:10)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Operating system: posix darwin
test_Entrez_online ... FAIL
test_SearchIO_write ... FAIL
test_SeqIO_index ... FAIL
test_Tutorial ... FAIL
test_bgzf ... FAIL
Bio.bgzf docstring test ...
FAIL
----------------------------------------------------------------------
Ran 213 tests in 296.686 seconds

FAILED (failures = 6)

From tomlin.9 at wright.edu Mon May 13 20:38:35 2013
From: tomlin.9 at wright.edu (Tomlin, Joshua James)
Date: Tue, 14 May 2013 00:38:35 +0000
Subject: [Biopython] Installation Help
In-Reply-To:
References:
Message-ID:

I'm trying to install Biopython on my laptop. I have Mac OS X 10.7.5, Python 2.7, and have installed NumPy, I believe. I can run the 'import Bio' command in Python without any errors, but when I try to run 'python setup.py test' I get 6 errors, as seen in the output below. Basically I need to know if I have done everything I need to do correctly. If I have, then why are these 6 tests failing and will that impact my future use of Biopython? Thanks.

Python version: 2.7.4 (v2.7.4:026ee0057e2d, Apr 6 2013, 11:43:10)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Operating system: posix darwin
test_Ace ... ok
test_AlignIO ... ok
test_AlignIO_FastaIO ... ok
test_AlignIO_convert ... ok
test_BioSQL_MySQLdb ... skipping. Install MySQLdb if you want to use mysql with BioSQL
test_BioSQL_psycopg2 ... skipping. Install psycopg2 if you want to use pg with BioSQL
test_BioSQL_sqlite3 ... ok
test_CAPS ... ok
test_Chi2 ... ok
test_ClustalOmega_tool ... skipping. Install clustalo if you want to use Clustal Omega from Biopython.
test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use it from Biopython.
test_Cluster ... skipping. If you want to use Bio.Cluster, install NumPy first and then reinstall Biopython
test_CodonTable ... ok
test_CodonUsage ... ok
test_ColorSpiral ... skipping. Install reportlab if you want to use Bio.Graphics.
test_Compass ... ok
test_Crystal ... ok
test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper.
test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL.
test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss.
test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools. test_EmbossPrimer ... ok test_Entrez ... ok test_Entrez_online ... FAIL test_Enzyme ... ok test_FSSP ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_KDTree ... skipping. C module in Bio.KDTree not compiled test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LogisticRegression ... ok test_MMCIF ... skipping. C extension MMCIFlex not installed. test_Mafft_tool ... skipping. Install MAFFT if you want to use the Bio.Align.Applications wrapper. test_MarkovModel ... ok test_Medline ... ok test_Motif ... ok test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper. test_NCBIStandalone ... ok test_NCBITextParser ... ok test_NCBIXML ... ok test_NCBI_BLAST_tools ... skipping. Install the NCBI BLAST+ command line tools if you want to use the Bio.Blast.Applications wrapper. test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PAML_baseml ... ok test_PAML_codeml ... ok test_PAML_tools ... skipping. Install PAML if you want to use the Bio.Phylo.PAML wrapper. test_PAML_yn00 ... ok test_PDB ... ok test_PDB_KDTree ... skipping. C module in Bio.KDTree not compiled test_ParserSupport ... 
ok test_Pathway ... ok test_Phd ... ok test_Phylo ... ok test_PhyloXML ... ok test_Phylo_depend ... skipping. Install NetworkX if you want to use Bio.Phylo._utils. test_PopGen_DFDist ... skipping. Install Dfdist, Ddatacal, pv2 and cplot2 if you want to use DFDist with Bio.PopGen.FDist. test_PopGen_FDist ... skipping. Install fdist2, datacal, pv and cplot if you want to use FDist2 with Bio.PopGen.FDist. test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_nodepend ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper. test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper. test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SCOP_online ... ok test_SVDSuperimposer ... ok test_SearchIO_blast_tab ... /Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release. BiopythonExperimentalWarning) ok test_SearchIO_blast_tab_index ... ok test_SearchIO_blast_text ... ok test_SearchIO_blast_xml ... ok test_SearchIO_blast_xml_index ... ok test_SearchIO_blat_psl ... ok test_SearchIO_blat_psl_index ... ok test_SearchIO_exonerate ... ok test_SearchIO_exonerate_text_index ... ok test_SearchIO_exonerate_vulgar_index ... ok test_SearchIO_fasta_m10 ... ok test_SearchIO_fasta_m10_index ... ok test_SearchIO_hmmer2_text ... 
ok test_SearchIO_hmmer2_text_index ... ok test_SearchIO_hmmer3_domtab ... ok test_SearchIO_hmmer3_domtab_index ... ok test_SearchIO_hmmer3_tab ... ok test_SearchIO_hmmer3_tab_index ... ok test_SearchIO_hmmer3_text ... ok test_SearchIO_hmmer3_text_index ... ok test_SearchIO_model ... ok test_SearchIO_write ... FAIL test_SeqIO ... ok test_SeqIO_AbiIO ... ok test_SeqIO_FastaIO ... ok test_SeqIO_PdbIO ... ok test_SeqIO_QualityIO ... ok test_SeqIO_SeqXML ... ok test_SeqIO_convert ... ok test_SeqIO_features ... ok test_SeqIO_index ... FAIL test_SeqIO_online ... ok test_SeqIO_write ... ok test_SeqRecord ... ok test_SeqUtils ... ok test_Seq_objs ... ok test_SffIO ... ok test_SubsMat ... ok test_SwissProt ... ok test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper. test_TogoWS ... ok test_Tutorial ... FAIL test_UniGene ... ok test_Uniprot ... ok test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_XXmotif_tool ... skipping. Install XXmotif if you want to use XXmotif from Biopython. test_align ... ok test_bgzf ... FAIL test_geo ... ok test_kNN ... ok test_lowess ... ok test_motifs ... ok test_pairwise2 ... ok test_phyml_tool ... skipping. Install PhyML 3.0 if you want to use the Bio.Phylo.Applications wrapper. test_prodoc ... ok test_prosite1 ... ok test_prosite2 ... ok test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_py3k ... ok test_raxml_tool ... skipping. Install RAxML (binary raxmlHPC) if you want to test the Bio.Phylo.Applications wrapper. test_seq ... ok test_translate ... ok test_trie ... skipping. Could not import Bio.trie, check C code was compiled. Bio.Align docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.Align.Applications._Clustalw docstring test ... ok Bio.Align.Applications._ClustalOmega docstring test ... ok Bio.Align.Applications._Mafft docstring test ... ok Bio.Align.Applications._Muscle docstring test ... 
ok Bio.Align.Applications._Probcons docstring test ... ok Bio.Align.Applications._Prank docstring test ... ok Bio.Align.Applications._TCoffee docstring test ... ok Bio.AlignIO docstring test ... ok Bio.AlignIO.StockholmIO docstring test ... ok Bio.Alphabet docstring test ... ok Bio.Application docstring test ... ok Bio.bgzf docstring test ... FAIL Bio.Blast.Applications docstring test ... /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:218: BiopythonDeprecationWarning: Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:321: BiopythonDeprecationWarning: Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:400: BiopythonDeprecationWarning: Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) ok Bio.Emboss.Applications docstring test ... ok Bio.GenBank docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Motif docstring test ... ok Bio.Motif.Applications._AlignAce docstring test ... ok Bio.Motif.Applications._XXmotif docstring test ... ok Bio.motifs docstring test ... ok Bio.motifs.applications._alignace docstring test ... ok Bio.motifs.applications._xxmotif docstring test ... ok Bio.pairwise2 docstring test ... 
ok Bio.Phylo.Applications._Raxml docstring test ... ok Bio.SearchIO docstring test ... ok Bio.SearchIO._model docstring test ... ok Bio.SearchIO._model.query docstring test ... ok Bio.SearchIO._model.hit docstring test ... ok Bio.SearchIO._model.hsp docstring test ... ok Bio.SearchIO.BlastIO docstring test ... ok Bio.SearchIO.HmmerIO docstring test ... ok Bio.SearchIO.FastaIO docstring test ... ok Bio.SearchIO.BlatIO docstring test ... ok Bio.SearchIO.ExonerateIO docstring test ... ok Bio.Seq docstring test ... ok Bio.SeqIO docstring test ... ok Bio.SeqIO.FastaIO docstring test ... ok Bio.SeqIO.AceIO docstring test ... ok Bio.SeqIO.PhdIO docstring test ... ok Bio.SeqIO.QualityIO docstring test ... ok Bio.SeqIO.SffIO docstring test ... ok Bio.SeqFeature docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqUtils docstring test ... ok Bio.SeqUtils.MeltingTemp docstring test ... ok Bio.Sequencing.Applications._Novoalign docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... ok Bio.PDB.Polypeptide docstring test ... ok Bio.PDB.Selection docstring test ... 
ok ====================================================================== ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase) Test Entrez.read from URL ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez_online.py", line 34, in test_read_from_url rec = Entrez.read(einfo) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/__init__.py", line 362, in read record = handler.read(handle) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 184, in read self.parser.ParseFile(handle) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 322, in endElementHandler raise RuntimeError(value) RuntimeError: Unable to open connection to #DbInfo?dbaf= ====================================================================== ERROR: test_write_multiple_from_blastxml (test_SearchIO_write.BlastXmlWriteCases) Test blast-xml writing from blast-xml, BLAST 2.2.26+, multiple queries (xml_2226_blastp_001.xml) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SearchIO_write.py", line 55, in test_write_multiple_from_blastxml self.parse_write_and_compare(source, self.fmt, self.out, self.fmt) File "test_SearchIO_write.py", line 27, in parse_write_and_compare SearchIO.write(source_qresults, out_file, out_format, **kwargs) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write writer.write_file(qresults) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file xml.startDocument() File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument self.write('\n' File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write super(UnbufferedTextIOWrapper, self).write(s) TypeError: must be unicode, not str 
====================================================================== ERROR: test_write_single_from_blastxml (test_SearchIO_write.BlastXmlWriteCases) Test blast-xml writing from blast-xml, BLAST 2.2.26+, single query (xml_2226_blastp_004.xml) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SearchIO_write.py", line 49, in test_write_single_from_blastxml self.parse_write_and_compare(source, self.fmt, self.out, self.fmt) File "test_SearchIO_write.py", line 27, in parse_write_and_compare SearchIO.write(source_qresults, out_file, out_format, **kwargs) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write writer.write_file(qresults) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file xml.startDocument() File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument self.write('\n' File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write super(UnbufferedTextIOWrapper, self).write(s) TypeError: must be unicode, not str ====================================================================== ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests) Index fastq-sanger file Quality/example.fastq.bgz get_raw ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SeqIO_index.py", line 432, in f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header 
self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", 
line 457, in parse_records record = self.parse(handle, do_features) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 441, in parse if self.feed(handle, consumer, do_features): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 398, in feed if not self.find_start(): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 78, in find_start line = self.handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 457, in parse_records record = self.parse(handle, do_features) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 441, in parse if self.feed(handle, consumer, do_features): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 398, in feed if not self.find_start(): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 78, in find_start line = self.handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not 
self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in assert 80 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 128, in Bio.bgzf Failed example: line = handle.readline() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 197, in _read_gzip_header raise IOError, 'Not a gzipped file' IOError: Not a gzipped file ---------------------------------------------------------------------- File 
"/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 129, in Bio.bgzf Failed example: assert 143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in assert 143 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 130, in Bio.bgzf Failed example: data = handle.read(70000) Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in data = handle.read(70000) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 197, in _read_gzip_header raise IOError, 'Not a gzipped file' IOError: Not a gzipped file ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 131, in Bio.bgzf Failed example: assert 70143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in assert 70143 == handle.tell() AssertionError ---------------------------------------------------------------------- Ran 213 tests in 296.686 seconds FAILED (failures = 6) From nlindberg at mkei.org Tue May 14 09:48:42 2013 From: nlindberg at mkei.org (Nick Lindberg) Date: Tue, 14 May 2013 13:48:42 +0000 Subject: [Biopython] Installation Help In-Reply-To: 
References: , Message-ID: How did you install it? Sent from my Verizon Wireless Phone ----- Reply message ----- From: "Tomlin, Joshua James" Date: Tue, May 14, 2013 3:58 am Subject: [Biopython] Installation Help To: "biopython at biopython.org" I'm trying to install Biopython on my laptop. I have Mac OS X 10.7.5 and Python 2.7, and I believe I have installed NumPy. I can run the 'import Bio' command in Python without any errors, but when I try to run 'python setup.py test' I get 6 errors, as seen in the output below. Basically, I need to know whether I have done everything correctly. If I have, then why are these 6 tests failing, and will that impact my future use of Biopython? Thanks. Python version: 2.7.4 (v2.7.4:026ee0057e2d, Apr 6 2013, 11:43:10) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] Operating system: posix darwin test_Ace ... ok test_AlignIO ... ok test_AlignIO_FastaIO ... ok test_AlignIO_convert ... ok test_BioSQL_MySQLdb ... skipping. Install MySQLdb if you want to use mysql with BioSQL test_BioSQL_psycopg2 ... skipping. Install psycopg2 if you want to use pg with BioSQL test_BioSQL_sqlite3 ... ok test_CAPS ... ok test_Chi2 ... ok test_ClustalOmega_tool ... skipping. Install clustalo if you want to use Clustal Omega from Biopython. test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use it from Biopython. test_Cluster ... skipping. If you want to use Bio.Cluster, install NumPy first and then reinstall Biopython test_CodonTable ... ok test_CodonUsage ... ok test_ColorSpiral ... skipping. Install reportlab if you want to use Bio.Graphics. test_Compass ... ok test_Crystal ... ok test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper. test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss. test_EmbossPhylipNew ... skipping.
Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools. test_EmbossPrimer ... ok test_Entrez ... ok test_Entrez_online ... FAIL test_Enzyme ... ok test_FSSP ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_KDTree ... skipping. C module in Bio.KDTree not compiled test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LogisticRegression ... ok test_MMCIF ... skipping. C extension MMCIFlex not installed. test_Mafft_tool ... skipping. Install MAFFT if you want to use the Bio.Align.Applications wrapper. test_MarkovModel ... ok test_Medline ... ok test_Motif ... ok test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper. test_NCBIStandalone ... ok test_NCBITextParser ... ok test_NCBIXML ... ok test_NCBI_BLAST_tools ... skipping. Install the NCBI BLAST+ command line tools if you want to use the Bio.Blast.Applications wrapper. test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PAML_baseml ... ok test_PAML_codeml ... ok test_PAML_tools ... skipping. Install PAML if you want to use the Bio.Phylo.PAML wrapper. test_PAML_yn00 ... ok test_PDB ... ok test_PDB_KDTree ... skipping. C module in Bio.KDTree not compiled test_ParserSupport ... ok test_Pathway ... ok test_Phd ... 
ok test_Phylo ... ok test_PhyloXML ... ok test_Phylo_depend ... skipping. Install NetworkX if you want to use Bio.Phylo._utils. test_PopGen_DFDist ... skipping. Install Dfdist, Ddatacal, pv2 and cplot2 if you want to use DFDist with Bio.PopGen.FDist. test_PopGen_FDist ... skipping. Install fdist2, datacal, pv and cplot if you want to use FDist2 with Bio.PopGen.FDist. test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_nodepend ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper. test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper. test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SCOP_online ... ok test_SVDSuperimposer ... ok test_SearchIO_blast_tab ... /Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release. BiopythonExperimentalWarning) ok test_SearchIO_blast_tab_index ... ok test_SearchIO_blast_text ... ok test_SearchIO_blast_xml ... ok test_SearchIO_blast_xml_index ... ok test_SearchIO_blat_psl ... ok test_SearchIO_blat_psl_index ... ok test_SearchIO_exonerate ... ok test_SearchIO_exonerate_text_index ... ok test_SearchIO_exonerate_vulgar_index ... ok test_SearchIO_fasta_m10 ... ok test_SearchIO_fasta_m10_index ... ok test_SearchIO_hmmer2_text ... ok test_SearchIO_hmmer2_text_index ... 
ok test_SearchIO_hmmer3_domtab ... ok test_SearchIO_hmmer3_domtab_index ... ok test_SearchIO_hmmer3_tab ... ok test_SearchIO_hmmer3_tab_index ... ok test_SearchIO_hmmer3_text ... ok test_SearchIO_hmmer3_text_index ... ok test_SearchIO_model ... ok test_SearchIO_write ... FAIL test_SeqIO ... ok test_SeqIO_AbiIO ... ok test_SeqIO_FastaIO ... ok test_SeqIO_PdbIO ... ok test_SeqIO_QualityIO ... ok test_SeqIO_SeqXML ... ok test_SeqIO_convert ... ok test_SeqIO_features ... ok test_SeqIO_index ... FAIL test_SeqIO_online ... ok test_SeqIO_write ... ok test_SeqRecord ... ok test_SeqUtils ... ok test_Seq_objs ... ok test_SffIO ... ok test_SubsMat ... ok test_SwissProt ... ok test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper. test_TogoWS ... ok test_Tutorial ... FAIL test_UniGene ... ok test_Uniprot ... ok test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_XXmotif_tool ... skipping. Install XXmotif if you want to use XXmotif from Biopython. test_align ... ok test_bgzf ... FAIL test_geo ... ok test_kNN ... ok test_lowess ... ok test_motifs ... ok test_pairwise2 ... ok test_phyml_tool ... skipping. Install PhyML 3.0 if you want to use the Bio.Phylo.Applications wrapper. test_prodoc ... ok test_prosite1 ... ok test_prosite2 ... ok test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_py3k ... ok test_raxml_tool ... skipping. Install RAxML (binary raxmlHPC) if you want to test the Bio.Phylo.Applications wrapper. test_seq ... ok test_translate ... ok test_trie ... skipping. Could not import Bio.trie, check C code was compiled. Bio.Align docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.Align.Applications._Clustalw docstring test ... ok Bio.Align.Applications._ClustalOmega docstring test ... ok Bio.Align.Applications._Mafft docstring test ... ok Bio.Align.Applications._Muscle docstring test ... ok Bio.Align.Applications._Probcons docstring test ... 
ok Bio.Align.Applications._Prank docstring test ... ok Bio.Align.Applications._TCoffee docstring test ... ok Bio.AlignIO docstring test ... ok Bio.AlignIO.StockholmIO docstring test ... ok Bio.Alphabet docstring test ... ok Bio.Application docstring test ... ok Bio.bgzf docstring test ... FAIL Bio.Blast.Applications docstring test ... /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:218: BiopythonDeprecationWarning: Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:321: BiopythonDeprecationWarning: Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:400: BiopythonDeprecationWarning: Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) ok Bio.Emboss.Applications docstring test ... ok Bio.GenBank docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Motif docstring test ... ok Bio.Motif.Applications._AlignAce docstring test ... ok Bio.Motif.Applications._XXmotif docstring test ... ok Bio.motifs docstring test ... ok Bio.motifs.applications._alignace docstring test ... ok Bio.motifs.applications._xxmotif docstring test ... ok Bio.pairwise2 docstring test ... ok Bio.Phylo.Applications._Raxml docstring test ... 
ok Bio.SearchIO docstring test ... ok Bio.SearchIO._model docstring test ... ok Bio.SearchIO._model.query docstring test ... ok Bio.SearchIO._model.hit docstring test ... ok Bio.SearchIO._model.hsp docstring test ... ok Bio.SearchIO.BlastIO docstring test ... ok Bio.SearchIO.HmmerIO docstring test ... ok Bio.SearchIO.FastaIO docstring test ... ok Bio.SearchIO.BlatIO docstring test ... ok Bio.SearchIO.ExonerateIO docstring test ... ok Bio.Seq docstring test ... ok Bio.SeqIO docstring test ... ok Bio.SeqIO.FastaIO docstring test ... ok Bio.SeqIO.AceIO docstring test ... ok Bio.SeqIO.PhdIO docstring test ... ok Bio.SeqIO.QualityIO docstring test ... ok Bio.SeqIO.SffIO docstring test ... ok Bio.SeqFeature docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqUtils docstring test ... ok Bio.SeqUtils.MeltingTemp docstring test ... ok Bio.Sequencing.Applications._Novoalign docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... ok Bio.PDB.Polypeptide docstring test ... ok Bio.PDB.Selection docstring test ... 
ok ====================================================================== ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase) Test Entrez.read from URL ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez_online.py", line 34, in test_read_from_url rec = Entrez.read(einfo) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/__init__.py", line 362, in read record = handler.read(handle) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 184, in read self.parser.ParseFile(handle) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 322, in endElementHandler raise RuntimeError(value) RuntimeError: Unable to open connection to #DbInfo?dbaf= ====================================================================== ERROR: test_write_multiple_from_blastxml (test_SearchIO_write.BlastXmlWriteCases) Test blast-xml writing from blast-xml, BLAST 2.2.26+, multiple queries (xml_2226_blastp_001.xml) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SearchIO_write.py", line 55, in test_write_multiple_from_blastxml self.parse_write_and_compare(source, self.fmt, self.out, self.fmt) File "test_SearchIO_write.py", line 27, in parse_write_and_compare SearchIO.write(source_qresults, out_file, out_format, **kwargs) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write writer.write_file(qresults) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file xml.startDocument() File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument self.write('\n' File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write super(UnbufferedTextIOWrapper, self).write(s) TypeError: must be unicode, not str 
====================================================================== ERROR: test_write_single_from_blastxml (test_SearchIO_write.BlastXmlWriteCases) Test blast-xml writing from blast-xml, BLAST 2.2.26+, single query (xml_2226_blastp_004.xml) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SearchIO_write.py", line 49, in test_write_single_from_blastxml self.parse_write_and_compare(source, self.fmt, self.out, self.fmt) File "test_SearchIO_write.py", line 27, in parse_write_and_compare SearchIO.write(source_qresults, out_file, out_format, **kwargs) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write writer.write_file(qresults) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file xml.startDocument() File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument self.write('\n' File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write super(UnbufferedTextIOWrapper, self).write(s) TypeError: must be unicode, not str ====================================================================== ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests) Index fastq-sanger file Quality/example.fastq.bgz get_raw ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SeqIO_index.py", line 432, in f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header 
self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", 
line 457, in parse_records record = self.parse(handle, do_features) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 441, in parse if self.feed(handle, consumer, do_features): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 398, in feed if not self.find_start(): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 78, in find_start line = self.handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 457, in parse_records record = self.parse(handle, do_features) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 441, in parse if self.feed(handle, consumer, do_features): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 398, in feed if not self.find_start(): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 78, in find_start line = self.handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not 
self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in assert 80 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 128, in Bio.bgzf Failed example: line = handle.readline() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 197, in _read_gzip_header raise IOError, 'Not a gzipped file' IOError: Not a gzipped file ---------------------------------------------------------------------- File 
"/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 129, in Bio.bgzf Failed example: assert 143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in assert 143 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 130, in Bio.bgzf Failed example: data = handle.read(70000) Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in data = handle.read(70000) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 197, in _read_gzip_header raise IOError, 'Not a gzipped file' IOError: Not a gzipped file ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 131, in Bio.bgzf Failed example: assert 70143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in assert 70143 == handle.tell() AssertionError ---------------------------------------------------------------------- Ran 213 tests in 296.686 seconds FAILED (failures = 6) _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at 
googlemail.com Tue May 14 10:10:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 May 2013 15:10:16 +0100
Subject: [Biopython] Installation Help
In-Reply-To:
References:
Message-ID:

On Tue, May 14, 2013 at 1:38 AM, Tomlin, Joshua James wrote:
>
> I'm trying to install biopython on my laptop. I have mac os x 10.7.5,
> python 2.7, and have installed numpy I believe.
> I can run the 'import Bio' command in python without any errors but
> when I try run 'python setup.py test' I get 6 errors as seen in the output below:
>
> Basically I need to know if I have done everything I need to do correctly?
> If I have then why are these 6 tests failing and will that impact my future
> use of biopython?

There are four issues here, mostly quite minor:

* NCBI online resource temporarily unavailable, retest should be fine
* Missing test files (see below)
* Python 2.7.4 made a change to the SAX XML parser (see below)
* GZIP bug in Python 2.7.4 (see below)

There were some unfortunate breakages in Python 2.7.4 which you've
run into, plus missing test files. If you installed Python 2.7.4
yourself, it might be simplest to replace it with Python 2.7.3 in
the short term. If this came with the Apple OS then don't do that ;)

> ======================================================================
> ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase)
> Test Entrez.read from URL
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_Entrez_online.py", line 34, in test_read_from_url
>     rec = Entrez.read(einfo)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/__init__.py", line 362, in read
>     record = handler.read(handle)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 184, in read
>     self.parser.ParseFile(handle)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 322, in endElementHandler
>     raise RuntimeError(value)
> RuntimeError: Unable to open connection to #DbInfo?dbaf=

This is an online test and the NCBI API didn't respond in this case;
if you wait and retry it should work.

> ======================================================================
> ERROR: test_write_multiple_from_blastxml (test_SearchIO_write.BlastXmlWriteCases)
> Test blast-xml writing from blast-xml, BLAST 2.2.26+, multiple queries (xml_2226_blastp_001.xml)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_SearchIO_write.py", line 55, in test_write_multiple_from_blastxml
>     self.parse_write_and_compare(source, self.fmt, self.out, self.fmt)
>   File "test_SearchIO_write.py", line 27, in parse_write_and_compare
>     SearchIO.write(source_qresults, out_file, out_format, **kwargs)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write
>     writer.write_file(qresults)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file
>     xml.startDocument()
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument
>     self.write('\n'
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write
>     super(UnbufferedTextIOWrapper, self).write(s)
> TypeError: must be unicode, not str

Python 2.7.4 broke our code, see:
http://lists.open-bio.org/pipermail/biopython-dev/2013-April/010505.html

The next Biopython release will fix this, so you could try that
or perhaps downgrade to Python 2.7.3 instead?

> ======================================================================
> ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests)
> Index fastq-sanger file Quality/example.fastq.bgz get_raw
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_SeqIO_index.py", line 432, in <lambda>
>     f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
>   File "test_SeqIO_index.py", line 272, in get_raw_check
>     raw_file = h.read()
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read
>     while self._read(readsize):
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read
>     self._read_gzip_header()
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header
>     self._read_exact(struct.unpack("
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 185, in _read_exact
>     data = self.fileobj.read(n)
> TypeError: an integer is required

This is a bug in Python 2.7.4, which broke GZIP support.
You can downgrade to Python 2.7.3, wait for 2.7.5, apply
the one line fix by hand: http://bugs.python.org/issue17666 or
ignore this if you do not plan to use BGZF compressed files.

(This explains most of the failures)

> ======================================================================
> ERROR: test_doctests (test_Tutorial.TutorialTestCase)
> Run tutorial doctests.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_Tutorial.py", line 152, in test_doctests
> ValueError: 4 Tutorial doctests failed: test_from_line_05671, test_from_line_06030, test_from_line_06190, test_from_line_06479

Recently reported bug, I missed two sample data files, see:
http://lists.open-bio.org/pipermail/biopython-dev/2013-May/010599.html

You can download the two missing files:

Doc/examples/my_blast.xml
Doc/examples/my_blat.psl

Available from github or here:
http://biopython.org/SRC/Doc/examples/my_blast.xml
http://biopython.org/SRC/Doc/examples/my_blat.psl

I guess we need to push out Biopython 1.62 sooner rather than
later, which would solve most of these issues.

Thanks for taking the time to report these problems.

Regards,

Peter

(Sorry you'll get this twice Tomlin, I forgot to CC the list
on my first reply)

From p.j.a.cock at googlemail.com Tue May 14 10:11:28 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 May 2013 15:11:28 +0100
Subject: [Biopython] help - Biopython Install
In-Reply-To:
References:
Message-ID:

On Tue, May 14, 2013 at 5:07 AM, Tomlin, Joshua James wrote:
> I'm trying to install biopython on my laptop. I have mac os x 10.7.5, python 2.7, and have installed numpy I believe.
> I can run the 'import Bio' command in python without any errors but when I try run 'python setup.py test' I get 6 errors which you can see below.
>
> Basically I need to know if I have done everything I need to do correctly? If I have then why are these 6 tests failing and will that impact my future use of biopython?
>
> Thanks.
>
> Python version: 2.7.4 (v2.7.4:026ee0057e2d, Apr 6 2013, 11:43:10)
> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
> Operating system: posix darwin
> test_Entrez_online ... FAIL
> test_SearchIO_write ... FAIL
> test_SeqIO_index ... FAIL
> test_Tutorial ... FAIL
> test_bgzf ... FAIL
> Bio.bgzf docstring test ... FAIL
> ----------------------------------------------------------------------
> Ran 213 tests in 296.686 seconds
>
> FAILED (failures = 6)

Hi Tomlin,

I replied to your longer email with the full test output
(it was held for moderation as the email was unusually long):

http://lists.open-bio.org/pipermail/biopython/2013-May/008558.html

Peter

From norbert.auer at boku.ac.at Tue May 14 12:27:00 2013
From: norbert.auer at boku.ac.at (Norbert Auer)
Date: Tue, 14 May 2013 18:27:00 +0200
Subject: [Biopython] corrupted blast results
In-Reply-To: <51926522020000BD00012F85@gwia2.boku.ac.at>
References: <51926522020000BD00012F85@gwia2.boku.ac.at>
Message-ID: <519281F4020000BD00012F92@gwia2.boku.ac.at>

Hi,

I am currently having some problems using the NCBIWWW.qblast function.
I used this query to blast some sequences.

result_handle = NCBIWWW.qblast("blastn", "refseq_genomic", seq_fasta,entrez_query="txid10029 [ORGN]",hitlist_size=2)
save_file = open("blast.xml", "w")
blast_results = result_handle.read()
save_file.write(blast_results)
result_handle.close()

Last time I had no problems with this script, but today I get only
corrupted (not well-formed) XML files back. In my last try I got a
well-formed XML file, but on closer inspection I found that the
alignment shown was wrong. The header shows Identities = 660/661, but
the alignment itself shows that this cannot be true. I ran a similar
query through the web front end and got the same hit, except that the
alignment was correct. It seems that there was an insertion of 3
nucleotides in the middle of the subject sequence. How could this be?
I have no explanation for this behaviour.

From the NCBIWWW.qblast function:

Query  241      AAGGCAGGACTGAAGAGTGTCATTATGGGGTGAGCCTTTCAAGGTCCCTGCCACTCTCTC  300
                |||||||||||||||||||||||||||||||||||||||||         |  |
Sbjct  1002610  AAGGCAGGACTGAAGAGTGTCATTATGGGGTGAGCCTTTCATCAAGGTCCCTGCCACTCT  1002551

From the web front end:

Query  241      ACTCTCTTTGTGTACTTTAAAGGTGCTGTGCCCCAAACTCCTGGGACACGGAGAGAACTC  300
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1169534  ACTCTCTTTGTGTACTTTAAAGGTGCTGTGCCCCAAACTCCTGGGACACGGAGAGAACTC  1169593

I was wondering whether this is an NCBI service problem (running on a
different server than the web front end) or a Biopython issue?
I use Biopython version 1.61.

If necessary I could attach the BLAST XML files, but they are very long.

Thanks

From p.j.a.cock at googlemail.com Tue May 14 14:11:45 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 May 2013 19:11:45 +0100
Subject: [Biopython] Installation Help
In-Reply-To:
References:
Message-ID:

On Tue, May 14, 2013 at 3:10 PM, Peter Cock wrote:
> On Tue, May 14, 2013 at 1:38 AM, Tomlin, Joshua James wrote:
>>
>> ======================================================================
>> ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests)
>> Index fastq-sanger file Quality/example.fastq.bgz get_raw
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "test_SeqIO_index.py", line 432, in <lambda>
>>     f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
>>   File "test_SeqIO_index.py", line 272, in get_raw_check
>>     raw_file = h.read()
>>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read
>>     while self._read(readsize):
>>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read
>>     self._read_gzip_header()
>>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header
>>     self._read_exact(struct.unpack("
>>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 185, in _read_exact
>>     data = self.fileobj.read(n)
>> TypeError: an integer is required
>
> This is a bug in Python 2.7.4, which broke GZIP support.
> You can downgrade to Python 2.7.3, wait for 2.7.5, apply
> the one line fix by hand: http://bugs.python.org/issue17666 or
> ignore this if you do not plan to use BGZF compressed files.
>
> (This explains most of the failures)

I've modified the unit tests to skip test_bgzf.py and the
Bio.bgzf doctests if run on Python with this broken gzip:

https://github.com/biopython/biopython/commit/975f5c4f6422951ff2ac54bed6928312fdcd1a51

The user will now see this in the full test output:

test_bgzf ... skipping. Your Python has a broken gzip library, see
http://bugs.python.org/issue17666 for details

Peter

From hlapp at drycafe.net Wed May 15 16:44:07 2013
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 15 May 2013 16:44:07 -0400
Subject: [Biopython] Workshop on Sustainable Software for Science: Practice and Experiences
Message-ID:

FYI, if you haven't seen this yet:

http://wssspe.researchcomputing.org.uk/

It seems to me that the Bio* projects, perhaps led by BioPerl as the
oldest and thus longest running (nowadays more fancily called
"sustained") of them would have a lot to say about the subject.
Anyone interested in a joint submission?

Also, I notice that Biojava's Andreas is on the organizing committee,
so maybe he's been conspiring on something already :-)

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 203 bytes
Desc: Message signed with OpenPGP using GPGMail
URL:

From cjfields at illinois.edu Thu May 16 00:43:22 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 16 May 2013 04:43:22 +0000
Subject: [Biopython] [Bioperl-l] Workshop on Sustainable Software for Science: Practice and Experiences
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74E1F8C8@CHIMBX5.ad.uillinois.edu>

Jason and I have discussed looking into opportunities like this, I think
it makes sense to try a joint submission.

chris

On May 15, 2013, at 3:44 PM, Hilmar Lapp wrote:

> FYI, if you haven't seen this yet:
>
> http://wssspe.researchcomputing.org.uk/
>
> It seems to me that the Bio* projects, perhaps led by BioPerl as the oldest and thus longest running (nowadays more fancily called "sustained") of them would have a lot to say about the subject. Anyone interested in a joint submission?
>
> Also, I notice that Biojava's Andreas is on the organizing committee, so maybe he's been conspiring on something already :-)
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From mictadlo at gmail.com Thu May 16 00:57:12 2013
From: mictadlo at gmail.com (Mic)
Date: Thu, 16 May 2013 14:57:12 +1000
Subject: [Biopython] NCBIXML.parse
Message-ID:

Hi,
Why does NCBIXML.parse attach UR090 to the UniRef90 ids, so that the
result I get is UR090:UniRef90_Q9FX16 with the following code:

with open("x.blastp.xml") as bf:
    blast_records = NCBIXML.parse(bf)

    for blast_record in blast_records:
        query_name = blast_record.query
        for alignment in blast_record.alignments:
            hit_id = alignment.hit_id

Is it possible to remove UR090 or maybe it should be UR090:Q9FX16?

Is UR090:UniRef90_Q9FX16 compatible with Gbrowse2?

Thank you in advance.

Mic

From mictadlo at gmail.com Thu May 16 01:10:35 2013
From: mictadlo at gmail.com (Mic)
Date: Thu, 16 May 2013 15:10:35 +1000
Subject: [Biopython] GFF.writer
In-Reply-To: <8761yvh67s.fsf@fastmail.fm>
References: <878v3ws5dy.fsf@fastmail.fm> <871u9k5q13.fsf@fastmail.fm> <8761yvh67s.fsf@fastmail.fm>
Message-ID:

Hi all,
Thank you, it is working fine. From SNAP I got Eterm and Einit as
sub_features, each with a score, then I created a top_feature gene and
got the following GFF3 file:

##gff-version 3
##sequence-region X 1 4795218
X  SNAP  gene   5974  7324  .       -  .  ID=X-snap.4;Name=UR090:UniRef90_Q9FX16;Note=F12G12.10 protein n:1 Tax:Arabidopsis thaliana RepID:Q9FX16_ARATH
X  SNAP  Eterm  5974  6007  5.650   -  .  Parent=X-snap.4
X  SNAP  Einit  6161  7324  -5.800  -  .  Parent=X-snap.4

Now I just wonder whether I should add the sub_features' scores together
(5.650 + (-5.800) = -0.150) and insert the result as the top_feature
score (-0.150)?

Thank you in advance.

Mic

On Tue, May 7, 2013 at 12:30 PM, Brad Chapman wrote:

>
> Peter;
>
> > Would using an OrderedDict be neater? i.e. Preserve any user
> > given order or whatever there was when parsing. This would
> > allow ad-hoc conventions like the ID is first to be observed
> > (or whatever the user preferred).
>
> The current API generates the GFF from Biopython Seq and SeqFeature
> objects, so there isn't a clean way to pass through ordering like this.
> We could expose qualifiers as OrderedDicts if that's a useful change,
> but still need to pick an ordering for non-qualifier items.
>
> Practically, there is no guaranteed order to GFF3 attributes. Exposing
> an alphabetized list seems reasonable but it's probably not worth going
> too far down this path.
>
> Brad
>

From p.j.a.cock at googlemail.com Thu May 16 05:10:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 16 May 2013 10:10:25 +0100
Subject: [Biopython] [Bioperl-l] Workshop on Sustainable Software for Science: Practice and Experiences
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74E1F8C8@CHIMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF74E1F8C8@CHIMBX5.ad.uillinois.edu>
Message-ID:

On Thu, May 16, 2013 at 5:43 AM, Fields, Christopher J wrote:
> Jason and I have discussed looking into opportunities like this, I think it makes
> sense to try a joint submission.
>
> chris

This sounds like a good idea, although given the time and place I am
unlikely to be able to attend in person:

First Workshop on Sustainable Software for Science: Practice and
Experiences (WSSSPE)
(to be held in conjunction with SC13, Sunday, 17 November 2013, Denver, CO, USA)
http://wssspe.researchcomputing.org.uk/

Rather than trying to discuss this over four mailing lists should we switch
to the cross project list open-bio-l, or continue off-list?
http://lists.open-bio.org/mailman/listinfo/open-bio-l

Thanks,

Peter

From cjfields at illinois.edu Thu May 16 09:09:45 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 16 May 2013 13:09:45 +0000
Subject: [Biopython] [Bioperl-l] Workshop on Sustainable Software for Science: Practice and Experiences
In-Reply-To:
References: <118F034CF4C3EF48A96F86CE585B94BF74E1F8C8@CHIMBX5.ad.uillinois.edu>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74E1FBCF@CHIMBX5.ad.uillinois.edu>

Yes, though we need to make sure others (e.g. those not subscribed to
open-bio-l) are in the loop. November is a possibility for me.

chris

On May 16, 2013, at 4:10 AM, Peter Cock wrote:

> On Thu, May 16, 2013 at 5:43 AM, Fields, Christopher J
> wrote:
>> Jason and I have discussed looking into opportunities like this, I think it makes
>> sense to try a joint submission.
>>
>> chris
>
> This sounds like a good idea, although given the time and place I am
> unlikely to be able to attend in person:
>
> First Workshop on Sustainable Software for Science: Practice and
> Experiences (WSSSPE)
> (to be held in conjunction with SC13, Sunday, 17 November 2013, Denver, CO, USA)
> http://wssspe.researchcomputing.org.uk/
>
> Rather than trying to discuss this over four mailing lists should we switch
> to the cross project list open-bio-l, or continue off-list?
> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>
> Thanks,
>
> Peter

From p.j.a.cock at googlemail.com Thu May 16 10:20:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 16 May 2013 15:20:41 +0100
Subject: [Biopython] NCBIXML.parse
In-Reply-To:
References:
Message-ID:

On Thu, May 16, 2013 at 5:57 AM, Mic wrote:
> Hi,
> Why does NCBIXML.parse attach UR090 to the UniRef90 ids and as results I
> get is UR090:UniRef90_Q9FX16 with the following code:
>
> with open("x.blastp.xml") as bf:
>     blast_records = NCBIXML.parse(bf)
>
>     for blast_record in blast_records:
>         query_name = blast_record.query
>         for alignment in blast_record.alignments:
>             hit_id = alignment.hit_id
>
> Is it possible to remove UR090 or maybe it should be UR090:Q9FX16?
>
> Is UR090:UniRef90_Q9FX16 compatible Gbrowse2?
>
> Thank you in advance.
>
> Mic

Can you post your x.blastp.xml online somewhere for us to look at?
If not perhaps you can at least include the relevant snippet of
the XML file in your email, and tell us about the database you
are using.

Thanks,

Peter

From p.j.a.cock at googlemail.com Fri May 17 04:55:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 17 May 2013 09:55:05 +0100
Subject: [Biopython] NCBIXML.parse
In-Reply-To:
References:
Message-ID:

On Fri, May 17, 2013 at 5:26 AM, Mic wrote:
> Please find attached the x.blastp.xml file. Do you need more information?
>
>

>>> from Bio.Blast import NCBIXML
>>> with open("x.blastp.xml") as bf:
...     for r in NCBIXML.parse(bf):
...         for a in r.alignments:
...             print r.query, a.hit_id
...
X-snap.4 UniRef90_Q9FX16
(etc)

This matches the first hit in the XML,

1
UniRef90_Q9FX16
F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH
UniRef90_Q9FX16
308

If you are getting UR090:UniRef90_Q9FX16 then perhaps some other
part of your code is adding the prefix? What else did your code do
(it was incomplete - there were no print statements for example)?

Regards,

Peter

From p.j.a.cock at googlemail.com Mon May 20 06:53:01 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 20 May 2013 11:53:01 +0100
Subject: [Biopython] NCBIXML.parse
In-Reply-To:
References:
Message-ID:

On Mon, May 20, 2013 at 4:26 AM, Mic wrote:
> I am sorry, the XML file which I sent was created with one year old Blast
> library. When I run Blast with the following command and a new UniRef90
> library
>
> blastp -query X.aa.snap -db /db/uniprot/uniref90 -evalue 0.00001
> -max_target_seqs 15 -out x.blastp.xml -num_threads 6 -outfmt 5
>
> Please find attached the new XML file ...

Got it, and yes this does use a different ID style:

1
UR090:UniRef90_Q9FX16
F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH
UR090:UniRef90_Q9FX16
308

If you want to change that then I would review how the database
was created (e.g. did you make this BLAST database yourself with
makeblastdb (new) or formatdb (old), and if so what identifiers
did the input FASTA file use?).

It might be simpler to just handle the alternative identifier
style in your script.

> and it looks like that a new schema has been created
> http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/dbfetch.databases?style=xml .
>
> Michal

That was a link to the EDAM ontology - I don't see how that is
related to the NCBI BLAST XML schema?
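
For reference, a minimal sketch of what handling the alternative
identifier style in a script could look like. It is plain Python 3 and
does not use Biopython; the helper name strip_db_prefix and the default
prefix "UR090:" are hypothetical choices for illustration, taken from
the identifiers quoted in this thread rather than from any BLAST or
Biopython API.

```python
def strip_db_prefix(hit_id, prefix="UR090:"):
    """Return hit_id without a leading database prefix, if one is present.

    The prefix value is an assumption based on the IDs shown in this thread.
    """
    if hit_id.startswith(prefix):
        return hit_id[len(prefix):]
    return hit_id


# Both identifier styles seen in the two XML files normalise to the same ID.
ids = ["UR090:UniRef90_Q9FX16", "UniRef90_Q9FX16"]
cleaned = [strip_db_prefix(i) for i in ids]
print(cleaned)
```

In a parsing loop this would be applied to each alignment.hit_id before
the value is stored or written out, so the rest of the script sees one
identifier style regardless of how the BLAST database was built.
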
Thanks, Peter From mictadlo at gmail.com Sun May 19 23:26:29 2013 From: mictadlo at gmail.com (Mic) Date: Mon, 20 May 2013 13:26:29 +1000 Subject: [Biopython] NCBIXML.parse In-Reply-To: References: Message-ID: I am sorry, the XML file which I sent was created with one year old Blast library. When I run Blast with the following command and a new UniRef90 library blastp -query X.aa.snap -db /db/uniprot/uniref90 -evalue 0.00001 -max_target_seqs 15 -out x.blastp.xml -num_threads 6 -outfmt 5 Please find attached the new XML file and it looks like that a new schema has been created http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/dbfetch.databases?style=xml . Michal On Fri, May 17, 2013 at 6:55 PM, Peter Cock wrote: > > On Fri, May 17, 2013 at 5:26 AM, Mic wrote: > >> Please find attached the x.blastp.xml file. Do you need more information? >> >> > > >>> from Bio.Blast import NCBIXML > > >>> with open("x.blastp.xml") as bf: > ... for r in NCBIXML.parse(bf): > ... for a in r.alignments: > ... print r.query, a.hit_id > ... > X-snap.4 UniRef90_Q9FX16 > (etc) > > This matches the first hit in the XML, > > 1 > UniRef90_Q9FX16 > F12G12.10 protein n=1 Tax=Arabidopsis thaliana > RepID=Q9FX16_ARATH > UniRef90_Q9FX16 > 308 > > If you are getting UR090:UniRef90_Q9FX16 then perhaps some other > part of your code is adding the prefix? What else did your code do > (it was incomplete - there were no print statements for example)? > > Regards, > > Peter > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: x.blastp2.xml Type: text/xml Size: 6029 bytes Desc: not available URL: From ferreirafm at usp.br Wed May 22 15:08:18 2013 From: ferreirafm at usp.br (Frederico Moraes Ferreira) Date: Wed, 22 May 2013 16:08:18 -0300 Subject: [Biopython] write one rec to file at once Message-ID: <519D17A2.9060501@usp.br> Hi list, Is there any way of writing one and only one record to a fastafile, without have to open a reclist, append rec to it and only then write the reclist to a previous opened file? I was just thinking, if I can read a rec at once like: inrec = SeqIO.read(open(fastafile, "rU"), "fasta") why not write it at once? Best, Fred From jocelyne at gmail.com Wed May 22 15:46:00 2013 From: jocelyne at gmail.com (Jocelyne) Date: Wed, 22 May 2013 12:46:00 -0700 Subject: [Biopython] write one rec to file at once In-Reply-To: <519D17A2.9060501@usp.br> References: <519D17A2.9060501@usp.br> Message-ID: SeqIO.write(rec, fastq_out, "fastq") On Wed, May 22, 2013 at 12:08 PM, Frederico Moraes Ferreira wrote: > Hi list, > Is there any way of writing one and only one record to a fastafile, without > have to open a reclist, append rec to it and only then write the reclist to > a previous opened file? I was just thinking, if I can read a rec at once > like: > inrec = SeqIO.read(open(fastafile, "rU"), "fasta") > why not write it at once? > Best, > Fred > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From ferreirafm at usp.br Wed May 22 16:06:32 2013 From: ferreirafm at usp.br (Frederico Moraes Ferreira) Date: Wed, 22 May 2013 17:06:32 -0300 Subject: [Biopython] write one rec to file at once In-Reply-To: References: <519D17A2.9060501@usp.br> Message-ID: <519D2548.3050303@usp.br> Tks. 
On 22-05-2013 16:46, Jocelyne wrote:
> SeqIO.write(rec, fastq_out, "fastq")
>
> On Wed, May 22, 2013 at 12:08 PM, Frederico Moraes Ferreira
> wrote:
>> Hi list,
>> Is there any way of writing one and only one record to a fastafile, without
>> have to open a reclist, append rec to it and only then write the reclist to
>> a previous opened file? I was just thinking, if I can read a rec at once
>> like:
>> inrec = SeqIO.read(open(fastafile, "rU"), "fasta")
>> why not write it at once?
>> Best,
>> Fred
>> _______________________________________________
>> Biopython mailing list - Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython

--
Dr. Frederico Moraes Ferreira
University of Sao Paulo
Heart Institute, School of Medicine
Laboratory of Immunology
Av. Dr. Enéas de Carvalho Aguiar, 44
05403-900 Sao Paulo - SP
Brasil

From idoerg at gmail.com Wed May 22 16:25:12 2013
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed, 22 May 2013 16:25:12 -0400
Subject: [Biopython] write one rec to file at once
In-Reply-To:
References: <519D17A2.9060501@usp.br>
Message-ID:

Probably a good idea to add this to the manual/cookbook.

On Wed, May 22, 2013 at 3:46 PM, Jocelyne wrote:
> SeqIO.write(rec, fastq_out, "fastq")
>
> On Wed, May 22, 2013 at 12:08 PM, Frederico Moraes Ferreira
> wrote:
> > Hi list,
> > Is there any way of writing one and only one record to a fastafile,
> without
> > have to open a reclist, append rec to it and only then write the reclist
> to
> > a previous opened file? I was just thinking, if I can read a rec at once
> > like:
> > inrec = SeqIO.read(open(fastafile, "rU"), "fasta")
> > why not write it at once?
> > Best, > > Fred > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Iddo Friedberg http://iddo-friedberg.net/contact.html ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> >>----.<--.>++++++.<<<<------------------------------------. From ivangreg at gmail.com Thu May 23 09:07:59 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 23 May 2013 09:07:59 -0400 Subject: [Biopython] Displaying pairwise2 output fails. Message-ID: Hello Biopythonians, Are you able to display pairwise alignments? It fails in my system: from Bio import pairwise2 for a in pairwise2.align.globalxx("ACCGT", "ACG"): print format_alignment(*a) Traceback (most recent call last): File "", line 2, in NameError: name 'format_alignment' is not defined Notice that the commands above are just a copy/paste of its docstring. It should have produced something like this ACCGT ||||| AC-G- Score=3 ACCGT ||||| A-CG- Score=3 My system information (Fedora 18): Python 2.7.3 (default, Aug 9 2012, 17:23:57) [GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2 and Biopython 1.61. Any help would be appreciated. Ivan Ivan Gregoretti, PhD Bioinformatics From w.arindrarto at gmail.com Thu May 23 09:13:36 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 23 May 2013 15:13:36 +0200 Subject: [Biopython] Displaying pairwise2 output fails. In-Reply-To: References: Message-ID: Hi Ivan On Thu, May 23, 2013 at 3:07 PM, Ivan Gregoretti wrote: > Hello Biopythonians, > > Are you able to display pairwise alignments? 
It fails in my system:
>
> from Bio import pairwise2
> for a in pairwise2.align.globalxx("ACCGT", "ACG"):
>     print format_alignment(*a)
>
> Traceback (most recent call last):
> File "", line 2, in
> NameError: name 'format_alignment' is not defined

This happens because of the way Python namespacing works. With your
current import, you should write this:

print pairwise2.format_alignment ...

instead of this:

print format_alignment ...

The code you have will work if you import format_alignment explicitly, like so:

from Bio.pairwise2 import format_alignment

Hope that helps :),
Bow

From ivangreg at gmail.com Thu May 23 09:32:52 2013
From: ivangreg at gmail.com (Ivan Gregoretti)
Date: Thu, 23 May 2013 09:32:52 -0400
Subject: [Biopython] Displaying pairwise2 output fails.
In-Reply-To:
References:
Message-ID:

Thank you Bow. I did try many variations to try to load
format_alignment() but I was unsuccessful; "from Bio.pairwise2 import
format_alignment" did not cross my mind.

Ivan

Ivan Gregoretti, PhD
Bioinformatics

On Thu, May 23, 2013 at 9:13 AM, Wibowo Arindrarto wrote:
> Hi Ivan
>
> On Thu, May 23, 2013 at 3:07 PM, Ivan Gregoretti wrote:
>> Hello Biopythonians,
>>
>> Are you able to display pairwise alignments? It fails in my system:
>>
>> from Bio import pairwise2
>> for a in pairwise2.align.globalxx("ACCGT", "ACG"):
>>     print format_alignment(*a)
>>
>> Traceback (most recent call last):
>> File "", line 2, in
>> NameError: name 'format_alignment' is not defined
>
> This happens because of the way Python namespacing works. With your
> current import, you should write this:
>
> print pairwise2.format_alignment ...
>
> instead of this:
>
> print format_alignment ...
> > The code you have will work if you import format_alignment explicitly, like so: > > from Bio.pairwise2 import format_alignment > > Hope that helps :), > Bow From ivangreg at gmail.com Thu May 23 09:36:39 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 23 May 2013 09:36:39 -0400 Subject: [Biopython] pairwise2 and cpairwise2 Message-ID: One more about pairwise2: At the bottom of this page http://biopython.org/DIST/docs/api/Bio.pairwise2-pysrc.html and in the Biopython tutorial, there are references to a C implementation of pairwise2, called cpairwise2. How do you run that C implementation without falling back to pure python? Thank you, Ivan Ivan Gregoretti, PhD Bioinformatics From p.j.a.cock at googlemail.com Thu May 23 09:44:04 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 May 2013 14:44:04 +0100 Subject: [Biopython] pairwise2 and cpairwise2 In-Reply-To: References: Message-ID: On Thu, May 23, 2013 at 2:36 PM, Ivan Gregoretti wrote: > One more about pairwise2: > > At the bottom of this page > http://biopython.org/DIST/docs/api/Bio.pairwise2-pysrc.html > > and in the Biopython tutorial, there are references to a C > implementation of pairwise2, called cpairwise2. > > How do you run that C implementation without falling back to pure python? > > Thank you, > > Ivan It should happen automatically, assuming you're not running under Jython or PyPy where the C code isn't used. How did you install Biopython? Peter From p.j.a.cock at googlemail.com Thu May 23 09:51:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 May 2013 14:51:32 +0100 Subject: [Biopython] Displaying pairwise2 output fails. In-Reply-To: References: Message-ID: On Thu, May 23, 2013 at 2:32 PM, Ivan Gregoretti wrote: > Thank you Bow. I did try many variations to try to load > format_alignment() but I was unsuccessful; "from Bio.pairwise2 import > format_alignment" did not cross my mind. 
>
> Ivan

Thanks for this feedback, I've attempted to clarify the example:
https://github.com/biopython/biopython/commit/7b058bf9ade7922bf746f4f7c7ebcb897c236a94

Peter

From ivangreg at gmail.com Thu May 23 10:33:00 2013
From: ivangreg at gmail.com (Ivan Gregoretti)
Date: Thu, 23 May 2013 10:33:00 -0400
Subject: [Biopython] pairwise2 and cpairwise2
In-Reply-To:
References:
Message-ID:

Hello Peter,

I installed Biopython from biopython-1.61.tar.gz. No customisation, just

python setup.py build
python setup.py test
sudo python setup.py install

Is there a way to show that pairwise2 is not falling back to pure python?
Alignment feels a bit slow on my machine.

Thank you,

Ivan

Ivan Gregoretti, PhD
Bioinformatics

On Thu, May 23, 2013 at 9:44 AM, Peter Cock wrote:
> On Thu, May 23, 2013 at 2:36 PM, Ivan Gregoretti wrote:
>> One more about pairwise2:
>>
>> At the bottom of this page
>> http://biopython.org/DIST/docs/api/Bio.pairwise2-pysrc.html
>>
>> and in the Biopython tutorial, there are references to a C
>> implementation of pairwise2, called cpairwise2.
>>
>> How do you run that C implementation without falling back to pure python?
>>
>> Thank you,
>>
>> Ivan
>
> It should happen automatically, assuming you're not running
> under Jython or PyPy where the C code isn't used. How did
> you install Biopython?
>
> Peter

From p.j.a.cock at googlemail.com Thu May 23 10:42:06 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 23 May 2013 15:42:06 +0100
Subject: [Biopython] pairwise2 and cpairwise2
In-Reply-To:
References:
Message-ID:

On Thu, May 23, 2013 at 3:33 PM, Ivan Gregoretti wrote:
> Hello Peter,
>
> I installed Biopython from biopython-1.61.tar.gz. No customisation, just
>
> python setup.py build
> python setup.py test
> sudo python setup.py install
>
> Is there a way to show that pairwise2 is not falling back to pure python?
> Alignment feels a bit slow on my machine.
>
> Thank you,
>
> Ivan

If the C version is not available, the import fails:

Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_45
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import pairwise2
>>> from Bio import cpairwise2
Traceback (most recent call last):
File "", line 1, in
ImportError: cannot import name cpairwise2

If the C version is installed, this should work:

Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import pairwise2
>>> from Bio import cpairwise2
>>> pairwise2.rint is cpairwise2.rint
True
>>> pairwise2._make_score_matrix_fast is cpairwise2._make_score_matrix_fast
True

Peter

From ivangreg at gmail.com Thu May 23 10:47:11 2013
From: ivangreg at gmail.com (Ivan Gregoretti)
Date: Thu, 23 May 2013 10:47:11 -0400
Subject: [Biopython] pairwise2 and cpairwise2
In-Reply-To:
References:
Message-ID:

Thank you Peter, my system is indeed using the C version.

Ivan

Ivan Gregoretti, PhD
Bioinformatics

On Thu, May 23, 2013 at 10:42 AM, Peter Cock wrote:
> On Thu, May 23, 2013 at 3:33 PM, Ivan Gregoretti wrote:
>> Hello Peter,
>>
>> I installed Biopython from biopython-1.61.tar.gz. No customisation, just
>>
>> python setup.py build
>> python setup.py test
>> sudo python setup.py install
>>
>> Is there a way to show that pairwise2 is not falling back to pure python?
>> Alignment feels a bit slow on my machine.
>>
>> Thank you,
>>
>> Ivan
>
> If the C version is not available, the import fails:
>
> Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
> [Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_45
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from Bio import pairwise2 >>>> from Bio import cpairwise2 > Traceback (most recent call last): > File "", line 1, in > ImportError: cannot import name cpairwise2 > > If the C version is installed, this should work: > > Python 2.7.2 (default, Oct 11 2012, 20:14:37) > [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> from Bio import pairwise2 >>>> from Bio import cpairwise2 >>>> pairwise2.rint is cpairwise2.rint > True >>>> pairwise2._make_score_matrix_fast is cpairwise2._make_score_matrix_fast > True > > Peter From francesco.chiani at gmail.com Mon May 27 11:29:12 2013 From: francesco.chiani at gmail.com (francesco chiani) Date: Mon, 27 May 2013 17:29:12 +0200 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: HI pythonian, I'm trying to convert a simple txt unigene list in a gene symbol list with Bio.Entrez , let's semplify: just one unigene: handle =Entrez.esearch(db="unigene", term="Z26634") but..what I've got from Entrez.read is fairly unusefull, I think.. what I'm doing wrong? kindly From p.j.a.cock at googlemail.com Mon May 27 11:36:45 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 27 May 2013 16:36:45 +0100 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: On Mon, May 27, 2013 at 4:29 PM, francesco chiani wrote: > HI pythonian, > I'm trying to convert a simple txt unigene list in a gene symbol list with > Bio.Entrez , let's semplify: just one unigene: > handle =Entrez.esearch(db="unigene", term="Z26634") > but..what I've got from Entrez.read is fairly unusefull, I think.. what I'm > doing wrong? > > kindly Hi Francesco, I'm not clear what you are hoping to achieve here. Could you give an example of the input data (e.g. a list like ["Z26634", ...] perhaps?) and the desired output to match? 
Thanks,

Peter

From francesco.chiani at gmail.com Mon May 27 13:58:52 2013
From: francesco.chiani at gmail.com (francesco chiani)
Date: Mon, 27 May 2013 19:58:52 +0200
Subject: [Biopython] converting unigene list in gene name list
In-Reply-To:
References:
Message-ID:

Hi Peter,
As you said, the input is a list that looks like
unigene_list=[" Z26634","....",".."]
The output is another list
gene_symbol =["ANK2", "...",".."]
Sorry for the badly posed question.

On 27 May 2013 17:36, "Peter Cock" wrote:
> On Mon, May 27, 2013 at 4:29 PM, francesco chiani
> wrote:
> > HI pythonian,
> > I'm trying to convert a simple txt unigene list in a gene symbol list
> with
> > Bio.Entrez , let's semplify: just one unigene:
> > handle =Entrez.esearch(db="unigene", term="Z26634")
> > but..what I've got from Entrez.read is fairly unusefull, I think.. what
> I'm
> > doing wrong?
> >
> > kindly
>
> Hi Francesco,
>
> I'm not clear what you are hoping to achieve here.
>
> Could you give an example of the input data (e.g. a list like
> ["Z26634", ...] perhaps?) and the desired output to match?
>
> Thanks,
>
> Peter
>

From mictadlo at gmail.com Mon May 27 21:49:30 2013
From: mictadlo at gmail.com (Mic)
Date: Tue, 28 May 2013 11:49:30 +1000
Subject: [Biopython] gff3: feature.location.end problem
Message-ID:

Hi,
When parsing this gff3 file:

##gff-version 3
##sequence-region ID1 1 20
ID1 prediction gene 1 20 10.0 + .
other=Some,annotations;ID=gene1
ID1 prediction exon 1 5 . + .
Parent=gene1
ID1 prediction exon 16 20 . + .
Parent=gene1

with this code:

from BCBio import GFF # handles GFF files

with open("test.gff3") as file:
    for rec in GFF.parse(file):
        annotations = rec.annotations['sequence-region'][0]
        id = annotations[0]
        start = int(annotations[1])
        end = int(annotations[2])
        print id, start, end

        for feature in rec.features:
            contig_id = feature.qualifiers['ID'][0]
            print contig_id, int(feature.location.start), int(feature.location.end)

I get the following output:
ID1 1 20
gene1 0 20

Why is it not "gene1 0 19" and "ID1 0 19"?

Thank you in advance.

Mic

From p.j.a.cock at googlemail.com Tue May 28 04:41:58 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 28 May 2013 09:41:58 +0100
Subject: [Biopython] gff3: feature.location.end problem
In-Reply-To:
References:
Message-ID:

On Tue, May 28, 2013 at 2:49 AM, Mic wrote:
> Hi,
> When parsing this gff3 file:
>
> ##gff-version 3
> ##sequence-region ID1 1 20
> ID1 prediction gene 1 20 10.0 + .
> other=Some,annotations;ID=gene1
> ID1 prediction exon 1 5 . + .
> Parent=gene1
> ID1 prediction exon 16 20 . + .
> Parent=gene1
>
>
> with this code:
>
> from BCBio import GFF # handles GFF files
>
> with open("test.gff3") as file:
>     for rec in GFF.parse(file):
>         annotations = rec.annotations['sequence-region'][0]
>         id = annotations[0]
>         start = int(annotations[1])
>         end = int(annotations[2])
>         print id, start, end
>
>         for feature in rec.features:
>             contig_id = feature.qualifiers['ID'][0]
>             print contig_id, int(feature.location.start),
> int(feature.location.end)
>
> I get the following output:
> ID1 1 20
> gene1 0 20
>
>
> Why is it not "gene1 0 19" and "ID1 0 19"?
>
> Thank you in advance.
>
> Mic

Hi Mic,

That looks correct, just like when parsing a GenBank/EMBL
feature with a location string 1..20 you'd get the start as 0
and the end as 20 in Biopython. This is using Python style
slice notation - the start is inclusive and the end is exclusive
meaning sequence[0:20] will give the first 20 bases as you
would expect for this location.
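The conversion described above can be sketched in plain Python. This is only an illustration: the helper name gff_to_python and the 24 base sequence are made up for the example and are not part of Biopython or the GFF parser.

```python
def gff_to_python(start, end):
    """Convert 1-based inclusive GFF coordinates to 0-based half-open ones."""
    # Only the start shifts by one; the exclusive Python end has the
    # same value as the inclusive GFF end.
    return start - 1, end


sequence = "ACGT" * 5 + "TTTT"  # made-up 24 base sequence
start, end = gff_to_python(1, 20)
print("%d %d" % (start, end))
print(len(sequence[start:end]))
```

This prints "0 20" and then 20: the slice sequence[0:20] covers exactly the 20 bases of the GFF location 1 to 20, which is why the end is reported as 20 and not 19.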

Peter

From markbudde at gmail.com Tue May 28 18:03:06 2013
From: markbudde at gmail.com (Mark Budde)
Date: Tue, 28 May 2013 15:03:06 -0700
Subject: [Biopython] SeqRecord slicing bug and fix
Message-ID:

There is a bug in the SeqRecord slicing behavior. The bug crops up on
circular records with a feature spanning the beginning and end of the
plasmid. Any slice outside of the feature will return the feature, and the
feature.location.end is negative.

>>> record = SeqIO.read('pUC19_mod.gb', 'genbank')
>>> record
SeqRecord(seq=Seq('TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGA...GTC',
IUPACAmbiguousDNA()), id='pUC19', name='pUC19', description='', dbxrefs=[])
>>> record.features
[SeqFeature(FeatureLocation(ExactPosition(2299), ExactPosition(200),
strand=1), type='misc_feature')]
>>> record[500:600].features #This slice should contain no features
[SeqFeature(FeatureLocation(ExactPosition(1799), ExactPosition(-300),
strand=1), type='misc_feature')]

This can be fixed by modifying line 453 of SeqRecord.py...
from:
    if start <= f.location.nofuzzy_start \
    and f.location.nofuzzy_end <= stop:
to:
    if start <= f.location.nofuzzy_start \
    and f.location.nofuzzy_end <= stop \
    and f.location.nofuzzy_start <= f.location.nofuzzy_end:

On a related note, is there an appropriate way to modify the position of a
SeqFeature? I have been doing "feature.location._end =
ExactPosition(newEnd)" , but I was under the impression that I shouldn't
modify objects beginning with an underscore.

-Mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pUC19_mod.gb Type: application/octet-stream Size: 3674 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Tue May 28 19:09:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 May 2013 00:09:25 +0100 Subject: [Biopython] SeqRecord slicing bug and fix In-Reply-To: References: Message-ID: On Tue, May 28, 2013 at 11:03 PM, Mark Budde wrote: > There is a bug in the SeqRecord slicing behavior. The bug crops up on > circular records with a feature spanning the beginning and end of the > plasmid. Any slice outside of the feature will return the feature, and the > feature.location.end is negative. > >>>> record = SeqIO.read('pUC19_mod.gb', 'genbank') >>>> record > SeqRecord(seq=Seq('TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGA...GTC', > IUPACAmbiguousDNA()), id='pUC19', name='pUC19', description='', dbxrefs=[]) >>>> record.features > [SeqFeature(FeatureLocation(ExactPosition(2299), ExactPosition(200), > strand=1), type='misc_feature')] The issue is you've got start > end, which arguably should raise an exception (there is a TODO in the code for that). Is this a circular record and a feature spanning the origin? > On a related note, is there an appropriate way to modify the position of a > SeqFeature? I have been doing "feature.location._end = > ExactPosition(newEnd)" , but I was under the impression that I shouldn't > modify objects beginning with an underscore. Yes, things starting with a single underscore should be regarded as private and not used. Currently that appears to be setup as a read only property (which you can change directly using feature.location._end = new_value) and right now I'm not sure why that was done, but it has been read only for since Bio.SeqFeature was first written 12 years ago. Maybe no one has asked till now? 
Peter From markbudde at gmail.com Tue May 28 21:22:54 2013 From: markbudde at gmail.com (Mark Budde) Date: Tue, 28 May 2013 18:22:54 -0700 Subject: [Biopython] SeqRecord slicing bug and fix In-Reply-To: References: Message-ID: On Tue, May 28, 2013 at 4:09 PM, Peter Cock wrote: > On Tue, May 28, 2013 at 11:03 PM, Mark Budde wrote: > > There is a bug in the SeqRecord slicing behavior. The bug crops up on > > circular records with a feature spanning the beginning and end of the > > plasmid. Any slice outside of the feature will return the feature, and > the > > feature.location.end is negative. > > > >>>> record = SeqIO.read('pUC19_mod.gb', 'genbank') > >>>> record > > > SeqRecord(seq=Seq('TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGA...GTC', > > IUPACAmbiguousDNA()), id='pUC19', name='pUC19', description='', > dbxrefs=[]) > >>>> record.features > > [SeqFeature(FeatureLocation(ExactPosition(2299), ExactPosition(200), > > strand=1), type='misc_feature')] > > The issue is you've got start > end, which arguably should > raise an exception (there is a TODO in the code for that) Is this a circular record and a feature spanning the origin? > Yes, it is a circular plasmid with a feature spanning the origin. There are legitimate reasons to have features span the origin, so please do not raise an exception. I think the provided code is the best solution to the problem (and completely fixes the problems within my personal code when this is an issue), but would be interested in hearing other suggestions. > > > On a related note, is there an appropriate way to modify the position of > a > > SeqFeature? I have been doing "feature.location._end = > > ExactPosition(newEnd)" , but I was under the impression that I shouldn't > > modify objects beginning with an underscore. > > Yes, things starting with a single underscore should be > regarded as private and not used. 
Currently that appears > to be setup as a read only property (which you can change > directly using feature.location._end = new_value) and > right now I'm not sure why that was done, but it has been > read only for since Bio.SeqFeature was first written 12 > years ago. Maybe no one has asked till now? Well the alternative is to make a new feature and import all of the other atributes, but that seems like a lot of work for no practical gain. Thanks, Mark From p.j.a.cock at googlemail.com Wed May 29 05:26:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 May 2013 10:26:25 +0100 Subject: [Biopython] SeqRecord slicing bug and fix In-Reply-To: References: Message-ID: On Wed, May 29, 2013 at 2:22 AM, Mark Budde wrote: > On Tue, May 28, 2013 at 4:09 PM, Peter Cock > wrote: >> >> On Tue, May 28, 2013 at 11:03 PM, Mark Budde wrote: >> > There is a bug in the SeqRecord slicing behavior. The bug crops up on >> > circular records with a feature spanning the beginning and end of the >> > plasmid. Any slice outside of the feature will return the feature, and >> > the >> > feature.location.end is negative. >> > >> >>>> record = SeqIO.read('pUC19_mod.gb', 'genbank') >> >>>> record >> > >> > SeqRecord(seq=Seq('TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGA...GTC', >> > IUPACAmbiguousDNA()), id='pUC19', name='pUC19', description='', >> > dbxrefs=[]) >> >>>> record.features >> > [SeqFeature(FeatureLocation(ExactPosition(2299), ExactPosition(200), >> > strand=1), type='misc_feature')] >> >> The issue is you've got start > end, which arguably should >> raise an exception (there is a TODO in the code for that) >> >> Is this a circular record and a feature spanning the origin? > > Yes, it is a circular plasmid with a feature spanning the origin. 
There are
> legitimate reasons to have features span the origin,

The SeqFeature system is very heavily influenced by the
GenBank/EMBL feature table, meaning you'd write this
like join(2300..3000,1..200) where for the sake of argument
I've assumed the genome is 3000 long.

Currently join features are handled with sub_features, but
this is about to change in the forthcoming Biopython 1.62
release which introduces CompoundLocation objects instead.

If you fancy trying the latest Biopython from github, you
should be able to create this example with:

wrap_location = FeatureLocation(2299, 3000) + FeatureLocation(0, 200)

(Doing the equivalent with the current system in Biopython
1.61 or older is far more complicated)

> ... so please do not raise
> an exception. I think the provided code is the best solution to the problem
> (and completely fixes the problems within my personal code when this is an
> issue), but would be interested in hearing other suggestions.

An exception if start > end would prevent downstream surprises
in code like __getitem__ which assumes start <= end. There are other
problems which prevent allowing this - for example it is impossible
to calculate the length of the location (and therefore the SeqFeature)
without also knowing the circular genome's length. Likewise
__contains__ (which is used for testing if an integer position is in
a feature) and __iter__ (which is used for iterating over the integer
positions within a feature) would break.

Unfortunately your initial code suggestion would only solve a
small subset of the feature location functionality.

Basically trying to support wrapped features like this would require
a *lot* of special case code, but add very little over the current
join-based approach which also handles many other biological
annotations nicely like spliced genes.

>> > On a related note, is there an appropriate way to modify the position of
>> > a SeqFeature?
I have been doing "feature.location._end = >> > ExactPosition(newEnd)" , but I was under the impression that I shouldn't >> > modify objects beginning with an underscore. >> >> Yes, things starting with a single underscore should be >> regarded as private and not used. Currently that appears >> to be setup as a read only property (which you can change >> directly using feature.location._end = new_value) and >> right now I'm not sure why that was done, but it has been >> read only for since Bio.SeqFeature was first written 12 >> years ago. Maybe no one has asked till now? > > Well the alternative is to make a new feature and import all of the other > atributes, but that seems like a lot of work for no practical gain. No, you'd only need to create a new FeatureLocation and then feature.location = new_location but this is still a hassle, and definitely worth looking at changing. Thanks for raising this. One reason a read-only FeatureLocation *could* be nice is if we had any clever indexing code which could break if the locations were liable to change. But we don't have anything like that within Biopython at the moment. We should probably continue this on the biopython-dev list, in case there is a more practical downside to allowing the FeatureLocation start and end to be updated which I'm currently missing. Regards, Peter From chapmanb at 50mail.com Wed May 29 12:32:29 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 29 May 2013 12:32:29 -0400 Subject: [Biopython] gff3: feature.location.end problem In-Reply-To: References: Message-ID: <87d2s9zqzm.fsf@fastmail.fm> Mic; >> ##gff-version 3 >> ##sequence-region ID1 1 20 >> ID1 prediction gene 1 20 10.0 + . >> other=Some,annotations;ID=gene1 [...] >> I get the following output: >> ID1 1 20 >> gene1 0 20 >> >> Why is it not "gene1 0 19" and "ID1 0 19"? > That looks correct, just like when parsing a GenBank/EMBL > feature with a location string 1..20 you'd get the start as 0 > and the end as 20 in Biopython. 
This is using Python style
> slice notation - the start is inclusive and the end is exclusive
> meaning sequence[0:20] will give the first 20 bases as you
> would expect for this location.

Peter is right on with the conversion information: you expect this to be
0, 20. This is Python 0-based indexing so you convert from GFF 1-based
by subtracting one from the start base.

The code wasn't doing anything special with the sequence-region
directive, which is why those values stay as a raw parse of the text: 1 20.
I agree it would be useful to convert these to 0-based for consistency.
I pushed a fix which handles this as well:

https://github.com/chapmanb/bcbb/commit/51e7f2742059608f98d948fca5b342a9edf9e7a8

Thanks for the feedback,
Brad

From amitbikram87 at gmail.com Thu May 30 00:34:32 2013
From: amitbikram87 at gmail.com (amit bikram)
Date: Thu, 30 May 2013 10:04:32 +0530
Subject: [Biopython] Error in biopython
Message-ID:

Hi Peter,

I tried what u have written but it is not coming. here is the error

>>> import Bio
>>> print Bio._file_
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute '_file_'
>>> print Bio_version_
Traceback (most recent call last):
File "", line 1, in
NameError: name 'Bio_version_' is not defined
>>>

with regards
Amit

From p.j.a.cock at googlemail.com Thu May 30 04:26:38 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 May 2013 09:26:38 +0100
Subject: [Biopython] Error in biopython
In-Reply-To:
References:
Message-ID:

On Thu, May 30, 2013 at 5:34 AM, amit bikram wrote:
> Hi Peter,
>
> I tried what u have written but it is not coming.
here is the error
>
>>>> import Bio
>>>> print Bio._file_
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'module' object has no attribute '_file_'
>>>> print Bio_version_
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'Bio_version_' is not defined
>>>>

Hi Amit, There are some subtle and important differences:

    import Bio
    print Bio.__file__
    print Bio.__version__

Those are all double underscores, and also there should be a dot between the Bio and __version__. You'll find double underscores are used in Python for some special functions (which normally you won't need to worry about). The good news is the simple import seemed to work - so if you can try again that will help answer your original question about this failing:

    from Bio.Seq import Seq

Right now my guess would be a broken install, or you've got a file in the current directory called Bio.py which is being imported instead of Biopython. Regards, Peter From p.j.a.cock at googlemail.com Thu May 30 06:19:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 11:19:15 +0100 Subject: [Biopython] Error in biopython In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 11:07 AM, amit bikram wrote: > Hi Peter, > > I tried again i got it now, now it is working fine. > Thank u... > > Regards > Amit Great :) Peter From p.j.a.cock at googlemail.com Thu May 30 07:46:37 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 12:46:37 +0100 Subject: [Biopython] Versions of Python 3 to support in Biopython? Message-ID: Dear Biopythoneers, For the forthcoming Biopython 1.62 release, we are planning to officially support Python 3 (as well as Python 2, including PyPy, and Jython). However, which versions of Python 3 would people want to use? One possibility is we'd require at least Python 3.2.5 (which would simplify dealing with things broken in older releases of Python 3).
Alternatively, would it be acceptable to insist on at least Python 3.3 for example? If you are interested in running Biopython under Python 3 (which you can already try out), please could you reply with what version of Python 3 you have installed, and if being required to update would be a problem or not. Thank you, Peter From xwchen at yeah.net Thu May 30 08:09:38 2013 From: xwchen at yeah.net (=?UTF-8?B?6ZmI5pmT5paH?=) Date: Thu, 30 May 2013 20:09:38 +0800 (CST) Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: <28aebaa4.a451.13ef557e533.Coremail.xwchen@yeah.net> hi Peter, python 3.3 is being used. Thanks. -- ???? Xiaowen Chen Institute of Hydrobiology, Chinese Academy of Sciences #7 Donghu South Rd, Wuhan, Hubei, 430072, P. R. China At 2013-05-30 19:46:37,"Peter Cock" wrote: >Dear Biopythoneers, > >For the forthcoming Biopython 1.62 release, we are planning to >officially support Python 3 (as well as Python 2, including PyPy, >and Jython). However, which versions of Python 3 would people >want to use? > >One possibility is we'd require at least Python 3.2.5 (which would >simplify dealing with things broken in older releases of Python 3). > >Alternatively, would it be acceptable to insist on at least Python 3.3 >for example? > >If you are interested in running Biopython under Python 3 >(which you can already try out), please could you reply with >what version of Python 3 you have installed, and if being >required to update would be a problem or not. > >Thank you, > >Peter >_______________________________________________ >Biopython mailing list - Biopython at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biopython From w.arindrarto at gmail.com Thu May 30 08:51:44 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 30 May 2013 14:51:44 +0200 Subject: [Biopython] Versions of Python 3 to support in Biopython? 
In-Reply-To: References: Message-ID: Hi everyone, > For the forthcoming Biopython 1.62 release, we are planning to > officially support Python 3 (as well as Python 2, including PyPy, > and Jython). However, which versions of Python 3 would people > want to use? > > One possibility is we'd require at least Python 3.2.5 (which would > simplify dealing with things broken in older releases of Python 3). > > Alternatively, would it be acceptable to insist on at least Python 3.3 > for example? > > If you are interested in running Biopython under Python 3 > (which you can already try out), please could you reply with > what version of Python 3 you have installed, and if being > required to update would be a problem or not. I'm leaning towards insisting on Python >=3.3 support (I'm running 3.3.2). I suppose that even if Python3.3 is not available on a machine or through the default package manager, it's always installable on its own. If that's not the case, I imagine Python2.x is most likely present in these machines (so Biopython can still be used). On a related note, do we have a defined timeline on when we would drop support for Python2.x? Are there any plans to have our codebase written in Python3.x instead of Python2.x? Best, Bow From p.j.a.cock at googlemail.com Thu May 30 09:13:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 14:13:20 +0100 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: Thank you for all the comments so far, don't stop yet :) On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto wrote: > Hi everyone, > > I'm leaning towards insisting on Python >=3.3 support (I'm running > 3.3.2). I suppose that even if Python3.3 is not available on a machine > or through the default package manager, it's always installable on its > own. If that's not the case, I imagine Python2.x is most likely > present in these machines (so Biopython can still be used). True. 
So far everyone who has replied (including some off list) have said they are using Python 3.3 which is encouraging. Thank you for the comments so far. It looks like we can forget about Python 3.1, and just need to decide if it is worth including Python 3.2.5 in the short term. > On a related note, do we have a defined timeline on when we would drop > support for Python2.x? Are there any plans to have our codebase > written in Python3.x instead of Python2.x? Nothing concrete planned, no. I'll reply in more detail on the biopython-dev list as I do have some thoughts about this. Regards, Peter From ivangreg at gmail.com Thu May 30 09:23:43 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 30 May 2013 09:23:43 -0400 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: Hi Bow, I think that we should drop support for Python 2.x once it is left out in favour of Python 3. I am not aware of any major linux distrubution that uses Python 3 as default. By major linux distribution I mean Debian, Ubuntu, CentOS, Fedora and Red Hat Enterprise Linux. Out of all the five distributions listed above, most administrators use CentOS. Perhaps we should schedule Python v2.x support to be dropped when CentOS switches to Python 3. That is likely to happen a long time from now. Thank you, Ivan Ivan Gregoretti, PhD Bioinformatics On Thu, May 30, 2013 at 8:51 AM, Wibowo Arindrarto wrote: > Hi everyone, > >> For the forthcoming Biopython 1.62 release, we are planning to >> officially support Python 3 (as well as Python 2, including PyPy, >> and Jython). However, which versions of Python 3 would people >> want to use? >> >> One possibility is we'd require at least Python 3.2.5 (which would >> simplify dealing with things broken in older releases of Python 3). >> >> Alternatively, would it be acceptable to insist on at least Python 3.3 >> for example? 
>> >> If you are interested in running Biopython under Python 3 >> (which you can already try out), please could you reply with >> what version of Python 3 you have installed, and if being >> required to update would be a problem or not. > > I'm leaning towards insisting on Python >=3.3 support (I'm running > 3.3.2). I suppose that even if Python3.3 is not available on a machine > or through the default package manager, it's always installable on its > own. If that's not the case, I imagine Python2.x is most likely > present in these machines (so Biopython can still be used). > > On a related note, do we have a defined timeline on when we would drop > support for Python2.x? Are there any plans to have our codebase > written in Python3.x instead of Python2.x? > > Best, > Bow > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at googlemail.com Thu May 30 09:36:36 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 14:36:36 +0100 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 2:23 PM, Ivan Gregoretti wrote: > Hi Bow, > > I think that we should drop support for Python 2.x once it is left out > in favour of Python 3. > > I am not aware of any major linux distrubution that uses Python 3 as > default. By major linux distribution I mean Debian, Ubuntu, CentOS, > Fedora and Red Hat Enterprise Linux. > > Out of all the five distributions listed above, most administrators > use CentOS. Perhaps we should schedule Python v2.x support to be > dropped when CentOS switches to Python 3. That is likely to happen a > long time from now. 
> > Thank you, > > Ivan I agree that dropping Python 2 support is still a long way away :) I was particularly hoping to find out if many people are using Python 3.2 (or worse, Python 3.1) or if we can assume Python 3.3 or later. Thanks, Peter From nlindberg at mkei.org Thu May 30 09:34:56 2013 From: nlindberg at mkei.org (Nick Lindberg) Date: Thu, 30 May 2013 13:34:56 +0000 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: Message-ID: Hello, The current production versions are Python 2.7.5 and Python 3.3.2. Per Ivan's comments, I do not think you should drop support for 2.x when adding for 3.x. However, I think it's completely fair that you should require the latest current production version of each (or at least put the caveat in that it's only been tested on 2.7.* and 3.3.*. If you need any help with the testing or compatibility, I'd be glad to help. Also, for any Mac users out there, I am working on a Homebrew formula for a package-install of Biopython. More details incoming. Thanks, Nick Lindberg Sr. Consulting Engineer, HPC Milwaukee Institute 414.727.6413 (W) http://www.mkei.org On 5/30/13 8:23 AM, "Ivan Gregoretti" wrote: >Hi Bow, > >I think that we should drop support for Python 2.x once it is left out >in favour of Python 3. > >I am not aware of any major linux distrubution that uses Python 3 as >default. By major linux distribution I mean Debian, Ubuntu, CentOS, >Fedora and Red Hat Enterprise Linux. > >Out of all the five distributions listed above, most administrators >use CentOS. Perhaps we should schedule Python v2.x support to be >dropped when CentOS switches to Python 3. That is likely to happen a >long time from now. > >Thank you, > >Ivan > > > > >Ivan Gregoretti, PhD >Bioinformatics > > >On Thu, May 30, 2013 at 8:51 AM, Wibowo Arindrarto > wrote: >> Hi everyone, >> >>> For the forthcoming Biopython 1.62 release, we are planning to >>> officially support Python 3 (as well as Python 2, including PyPy, >>> and Jython). 
However, which versions of Python 3 would people >>> want to use? >>> >>> One possibility is we'd require at least Python 3.2.5 (which would >>> simplify dealing with things broken in older releases of Python 3). >>> >>> Alternatively, would it be acceptable to insist on at least Python 3.3 >>> for example? >>> >>> If you are interested in running Biopython under Python 3 >>> (which you can already try out), please could you reply with >>> what version of Python 3 you have installed, and if being >>> required to update would be a problem or not. >> >> I'm leaning towards insisting on Python >=3.3 support (I'm running >> 3.3.2). I suppose that even if Python3.3 is not available on a machine >> or through the default package manager, it's always installable on its >> own. If that's not the case, I imagine Python2.x is most likely >> present in these machines (so Biopython can still be used). >> >> On a related note, do we have a defined timeline on when we would drop >> support for Python2.x? Are there any plans to have our codebase >> written in Python3.x instead of Python2.x? >> >> Best, >> Bow >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >_______________________________________________ >Biopython mailing list - Biopython at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at googlemail.com Thu May 30 09:54:41 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 14:54:41 +0100 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: Hi Nick, On Thu, May 30, 2013 at 2:34 PM, Nick Lindberg wrote: > Hello, > > The current production versions are Python 2.7.5 and Python 3.3.2. Per > Ivan's comments, I do not think you should drop support for 2.x when > adding for 3.x. 
However, I think it's completely fair that you should > require the latest current production version of each (or at least put the > caveat in that it's only been tested on 2.7.* and 3.3.*. If I was unclear, we are not talking about dropping support for Python 2 any time soon - the plan for Biopython 1.62 is to cover at least Python 2.5, 2.6, 2.7 and 3.3. We will continue to support Python 2 for a long time (i.e. at least one year, maybe more). What I'm hoping to confirm is that very few people care about Python 3.2 support - it's good to know that so far everyone is happy to target Python 3.3 onwards. > If you need any help with the testing or compatibility, I'd be glad to > help. > > Also, for any Mac users out there, I am working on a Homebrew formula for > a package-install of Biopython. More details incoming. > > Thanks, > > Nick Lindberg Periodically running the unit tests on the latest code from git and reporting any issues is always a good idea. We do try to do this automatically via TravisCI and BuildBot, but this only covers a fraction of the possible configuration permutations. Testing under recent Windows machines including 64 bit Windows would be most appreciated - but requires more background knowledge to set up the relevant compiler environments. If you'd be interested in that or helping set up another buildslave which we can run nightly tests on, please write to us on the biopython-dev list for more details.
Thanks, Peter From p.j.a.cock at googlemail.com Thu May 30 09:56:52 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 14:56:52 +0100 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: On Mon, May 27, 2013 at 6:58 PM, francesco chiani wrote: > Hi Peter, > As you said the input is a list look like > unigene_list=[" Z26634","....",".."] > The output another list gene_symbol =["ANK2", "...",".."] > Sorry, I've not had a chance to try this - but I would start by looking at the NCBI Entrez elink functionality. Peter From p.j.a.cock at googlemail.com Thu May 30 12:18:41 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 17:18:41 +0100 Subject: [Biopython] Biopython projects with NESCent for GSoC 2013 In-Reply-To: References: Message-ID: Dear all, After the disappointing news that the Open Bioinformatics Foundation (OBF) was not accepted as a Google Summer of Code (GSoC) organisation this year, Biopython was fortunate to once again offer some projects with the NESCent team: http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013 As always the student proposals have been very competitive, and we've not been able to take on everyone. This year NESCent was fortunate to be able to accept seven students through GSoC and one through the GNOME Outreach Program for Women. Two of these GSoC projects are Biopython related: Codon Alignment and Analysis in Biopython Student: Zheng Ruan Mentors: Eric Talevich, Peter Cock http://www.google-melange.com/gsoc/project/google/gsoc2013/rzzmh12345/32001 Phylogenetics in Biopython: Filling in the gaps Student: Yanbo Ye http://www.google-melange.com/gsoc/project/google/gsoc2013/yeyanbo/45001 Mentors: Mark Holder, Jeet Sukumaran, Eric Talevich Thank you NESCent, and congratulations to Zheng Ruan and Yanbo Ye!
I'm hoping you're already setting up a blog, which I hope you'll be able to use for roughly weekly progress reports during the summer - CC'd to the biopython-dev mailing list and the NESCent Phyloinformatics Summer of Code forum on Google+, http://lists.open-bio.org/mailman/listinfo/biopython-dev https://plus.google.com/communities/105828320619238393015 An introduction to your project would be a great idea for your first post - here's Bow's from last year as an example: http://bow.web.id/blog/2012/04/google-summer-of-code-is-on/ http://bow.web.id/blog/2012/08/summers-over/ http://bow.web.id/blog/tag/gsoc/ The idea here is to keep the wider community informed about how your project is going. On behalf of the Biopython developers, congratulations! We're looking forward to another productive Summer of Code :) Peter From eric.talevich at gmail.com Thu May 30 12:26:43 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 30 May 2013 12:26:43 -0400 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 9:13 AM, Peter Cock wrote: > Thank you for all the comments so far, don't stop yet :) > > On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto > wrote: > > Hi everyone, > > > > I'm leaning towards insisting on Python >=3.3 support (I'm running > > 3.3.2). I suppose that even if Python3.3 is not available on a machine > > or through the default package manager, it's always installable on its > > own. If that's not the case, I imagine Python2.x is most likely > > present in these machines (so Biopython can still be used). > > True. > > So far everyone who has replied (including some off list) have said > they are using Python 3.3 which is encouraging. Thank you for > the comments so far. > > It looks like we can forget about Python 3.1, and just need to > decide if it is worth including Python 3.2.5 in the short term. 
> I don't always use Python 3.x, but when I do, I use the latest release (3.3 now). I wonder which one PyPy will target -- I assume they'll try to support the most recent syntax. For anything that needs to be run on machines I don't control, I still target 2.7, though I hope to switch to 3 this year. -Eric From francesco.chiani at gmail.com Thu May 30 12:40:21 2013 From: francesco.chiani at gmail.com (francesco chiani) Date: Thu, 30 May 2013 18:40:21 +0200 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: Hi Peter, Don't worry, I've just managed it this way:

    import Bio
    from Bio import Entrez
    import xlrd
    import xlwt

    name_list=["M75161","Z26634","X02308"]
    Entrez.email="youremaiil at something.XXX"

    # look up the UniGene record ID(s) for each accession
    id_list=[]
    for a in range(0, len(name_list)):
        handle=Entrez.esearch(db="unigene", term=name_list[a])
        record=Entrez.read(handle)
        id_list.append(record["IdList"])

    # fetch the summary for each hit and keep its gene symbol
    gene_symbol_list=[]
    for i in range(0, len(id_list)):
        handle=Entrez.esummary(db='unigene',id=id_list[i])
        record=Entrez.read(handle)
        gene_symbol_list.append(record[0]["GENE"])
        print record[0]["GENE"]

It may need some improvement for sure (for example a try/raise function for the wrong unigenes at list) but it works. Thanks anyway for your help Francesco 2013/5/30 Peter Cock > On Mon, May 27, 2013 at 6:58 PM, francesco chiani > wrote: > > Hi Peter, > > As you said the input is a list look like > > unigene_list=[" Z26634","....",".."] > > The output another list gene_symbol =["ANK2", "...",".."] > > > > Sorry, I've not had a chance to try this - but I would start > by looking at the NCBI Entrez elink functionality. > > Peter > -- PhD Francesco Chiani CNR -- National Research Council of Italy Cell Biology and Neurobiology Institute (IBCN) Campus "A. Buzzati-Traverso" 32 via E.
Ramarini 00015 Monterotondo Scalo (Roma) Italy tel: +39-0690091308 fax: +39-0690091260 From llewelr at gmail.com Thu May 30 12:58:03 2013 From: llewelr at gmail.com (Richard Llewellyn) Date: Thu, 30 May 2013 10:58:03 -0600 Subject: [Biopython] Fwd: Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: Oops meant for the list. ---------- Forwarded message ---------- From: Richard Llewellyn Date: Thu, May 30, 2013 at 10:57 AM Subject: Re: [Biopython] Versions of Python 3 to support in Biopython? To: Eric Talevich Moved entirely to Python 3.3 (well, only scikits-learn still might require 2.7), so not supporting earlier versions of Py 3 fine by me. Thanks so much. On Thu, May 30, 2013 at 10:26 AM, Eric Talevich wrote: > On Thu, May 30, 2013 at 9:13 AM, Peter Cock >wrote: > > > Thank you for all the comments so far, don't stop yet :) > > > > On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto > > wrote: > > > Hi everyone, > > > > > > I'm leaning towards insisting on Python >=3.3 support (I'm running > > > 3.3.2). I suppose that even if Python3.3 is not available on a machine > > > or through the default package manager, it's always installable on its > > > own. If that's not the case, I imagine Python2.x is most likely > > > present in these machines (so Biopython can still be used). > > > > True. > > > > So far everyone who has replied (including some off list) have said > > they are using Python 3.3 which is encouraging. Thank you for > > the comments so far. > > > > It looks like we can forget about Python 3.1, and just need to > > decide if it is worth including Python 3.2.5 in the short term. > > > > I don't always use Python 3.x, but when I do, I use the latest release (3.3 > now). I wonder which one PyPy will target -- I assume they'll try to > support the most recent syntax. > > For anything that needs to be run on machines I don't control, I still > target 2.7, though I hope to switch to 3 this year. 
> > -Eric > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From mmokrejs at fold.natur.cuni.cz Thu May 30 14:39:50 2013 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Thu, 30 May 2013 20:39:50 +0200 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: <51A79CF6.5010705@fold.natur.cuni.cz> Ivan Gregoretti wrote: > Hi Bow, > > I think that we should drop support for Python 2.x once it is left out > in favour of Python 3. > > I am not aware of any major linux distrubution that uses Python 3 as > default. By major linux distribution I mean Debian, Ubuntu, CentOS, > Fedora and Red Hat Enterprise Linux. > > Out of all the five distributions listed above, most administrators > use CentOS. Perhaps we should schedule Python v2.x support to be > dropped when CentOS switches to Python 3. That is likely to happen a > long time from now. Hi, I don't think 2.x supports needs to be scheduled for the sake of having a deadline. It is only meaningful to come up with a date *once the extra development overhead is NOT acceptable anymore*. For sure by that time the yet to be determined time window will be of some reasonable width so that people start screaming up. But that is far from today. In general, I am very opposed to deprecations of any programming language API. This just puts a wasteful burden on developers to rewrite their possibly years working apps just due to API change. Some people even don't want to touch certains parts of their code. Or they don't have time to do that or stopped changing their fully working application ... hence why should they re-start their work if for several years they did NOT need to touch the code? 
In such cases when developer does not take an action the API change kills the existing and working application because people just don't use it anymore on incompatible systems. Seriously, look at mod_python. Clearly, for my purposes mod_python was and still is just enough, runs on my servers for 8-10 years and I do NOT care that it is NOW not developed anymore. I do not care about tiny implementation details which resulted in abandoning of the project because I just do NOT need those missing features. But, changing my apps, config files, server config files, re-doing the testing is just a blocker for me. I rather stop upgrading the system because the transition gives me nothing. And I am not the only one. Look into comments at https://bugs.gentoo.org/show_bug.cgi?id=343663 and then poke towards URLs in https://bugs.gentoo.org/show_bug.cgi?id=343663#c6 . There you can read about mod_python but the homework is where else should you move! Similarly, anybody willing to add PHP-5.4 support to EST2uni which was written for 5.3 (http://cichlid.umd.edu/est2uni/)? In summary, somebody should really make a table listing of anticipated 3rd-party python-based apps which are likely to be imported into python code along biopython. More pragmatically, somebody please make sure your future decisions are not conflicting with numpy+matplolib. Personally I want to play in near future with these to post-process/compile python code (incl. biopython): http://www.nuitka.net/ (supports 2.5, 2.6, 3.2) https://github.com/astrand/pyobfuscate http://python.net/crew/atuining/cx_Freeze (supports 2.3, 2.4, 2.5) http://bitboost.com/python-obfuscator/manual (supports 2.5, partially 2.6 and 2.7) But I have read other answers which landed on the biopython's list meanwhile and glad to hear from Peter that the original question was really about what 3.x version should be supported and NOT about stopping certain python 2.x compatibility. 
That's good to hear and thanks for keeping the compatibility so far. ;) Martin From arklenna at gmail.com Thu May 30 15:21:30 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 30 May 2013 15:21:30 -0400 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 9:23 AM, Ivan Gregoretti wrote: > > I am not aware of any major linux distrubution that uses Python 3 as > default. By major linux distribution I mean Debian, Ubuntu, CentOS, > Fedora and Red Hat Enterprise Linux. > I think this is a point worth reiterating - very few people will be "stuck" using Python 3.1 because unlike old versions of 2.x, 3 isn't shipped with distros yet. So anyone who chooses to use Python 3.x should be able to install 3.3. Cheers, Lenna From ajingnk at gmail.com Thu May 30 15:33:40 2013 From: ajingnk at gmail.com (Jing Lu) Date: Thu, 30 May 2013 15:33:40 -0400 Subject: [Biopython] How to make best guest of protein class from protein sequence by biopython Message-ID: Hi Biopythonian, I am not quit sure what the best way is to predict the name of protein class (e.g. Oxidoreductases) from protein sequence by biopython. Just Blast to the whole PDB and read some attributes of the result? I am not every familiar with modules in biopython.. Thanks, Jing From jmtc21 at bath.ac.uk Thu May 30 15:48:59 2013 From: jmtc21 at bath.ac.uk (Jaime Tovar) Date: Thu, 30 May 2013 20:48:59 +0100 Subject: [Biopython] Problem parsing embl files Message-ID: <51A7AD2B.1000303@bath.ac.uk> Hi all, Is the first time I try to parse embl files with biopython. I'm trying to get the gene ids and coordinates for start/end of each gene. I thought it will be straight forward like with other annotation files, so I did a small script to test it. 

    from Bio import SeqIO

    if __name__ == '__main__':
        handle = open("sctg_0.embl", "r")
        records = SeqIO.parse(handle, "embl")
        for record in records:
            print(record)

But when running the script I get an error which may suggest the embl files have an issue

    ValueError: Premature end of features table, marker '//' found

I checked the source code of the parser and seems the embl file has problems, but when I checked embl file format seems they are ok. I have a few thousand files formatted in the same way. So can't think about other way to deal with the problem but to parse them. The annotation files have only annotation info, no sequences. Here I uploaded an example. http://depositfiles.com/files/481uob95e I'm using python 2.7.4 and biopython 1.61 on a win x64 computer. Any advice and suggestion will be greatly appreciated. Jaime. From p.j.a.cock at googlemail.com Thu May 30 18:03:21 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 23:03:21 +0100 Subject: [Biopython] Problem parsing embl files In-Reply-To: <51A7AD2B.1000303@bath.ac.uk> References: <51A7AD2B.1000303@bath.ac.uk> Message-ID: On Thu, May 30, 2013 at 8:48 PM, Jaime Tovar wrote: > Hi all, > > Is the first time I try to parse embl files with biopython. I'm trying to > get the gene ids and coordinates for start/end of each gene. > > I thought it will be straight forward like with other annotation files, so I > did a small script to test it.
>
> from Bio import SeqIO
> if __name__ == '__main__':
>     handle = open("sctg_0.embl", "r")
>     records = SeqIO.parse(handle, "embl")
>     for record in records :
>         print(record)
>
> But when running the script I get an error which may suggest the embl files > have an issue
>
> ValueError: Premature end of features table, marker '//' found
>
> I checked the source code of the parser and seems the embl file has > problems, but when I checked embl file format seems they are ok. If they are like your example, they are a bit unusual.
> I have a > few thousand files formatted in the same way. So can't think about other way > to deal with the problem but to parse them. > > The annotation files have only annotation info, no sequences. Here I > uploaded an example. > > http://depositfiles.com/files/481uob95e > > I'm using python 2.7.4 and biopython 1.61 on a win x64 computer. > > Any advice and suggestion will be greatly appreciated. > > Jaime. Hi Jamie, For sharing plain text files, http://gist.github.com is a nicer option. The problem is your file looks like this:

    ID sctg_0 standard; DNA; DIV; 3745584 BP.
    XX
    AC sctg_0;
    XX
    FH Key Location/Qualifiers
    FH
    FT CDS 302..490
    FT /note="EuGene predicted gene nr: Esi0000_0001"
    ...
    FT mRNA complement(3744791..3745584)
    FT /note="EuGene predicted gene nr: Esi0000_0662"
    //

The parser is expecting an SQ line after the FT lines before the //
As you said, your files lack any sequence information - is that deliberate? This is not something I've seen before, but we can probably modify the EMBL parser to cope with this - much like how GenBank files can omit the actual sequence data. On the other hand, the SQ line is not defined as optional so perhaps we are doing the right thing and rejecting an invalid file? ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt Where did your EMBL format file come from? Thanks, Peter From jmtc21 at bath.ac.uk Thu May 30 18:55:28 2013 From: jmtc21 at bath.ac.uk (Jaime Tovar) Date: Thu, 30 May 2013 23:55:28 +0100 Subject: [Biopython] Problem parsing embl files In-Reply-To: References: <51A7AD2B.1000303@bath.ac.uk> Message-ID: <51A7D8E0.3000704@bath.ac.uk> Hi Peter, I checked a similar version with the description of the embl format. They are bit ambiguous, I think. From the definition we have:

    XX - spacer line (many per entry)
    SQ - sequence header (1 per entry)
    CO - contig/construct line (0 or >=1 per entry)
    bb - (blanks) sequence data (>=1 per entry)
    // - termination line (ends each entry; 1 per entry)

At first I read SQ ...
(1 per entry) and thought it meant there must be one of them. And
similar situations for the rest (many per entry, >=1). But for example
from the same definition we have:

DT - date (2 per entry)

But my file does not have DT and the parser was not complaining about
it, so it made me think maybe I was doing something wrong. To be honest
I can't say I'm sure if it means there should be a SQ or if it is
optional but can only show once per entry.

The files are not mine. They are third party files I got from another
researcher, who in turn got them from someone else, so... They are
annotations for algae contigs as far as I know. Not sure why they don't
have the sequence part.

To be honest I don't know if it is worth making changes to the parser.
I can't say these files are actually well formatted. Maybe someone with
more experience with embl files can give a second opinion.

You think I can cheat the parser if I just 'sed' my embl files and
replace the // with something like:

"""XX
SQ


//"""

I didn't know github had gist :) I have some animadversion against
github so I never use them :D

Thanks for the help!

Jaime.

On 30/05/2013 23:03, Peter Cock wrote:
> On Thu, May 30, 2013 at 8:48 PM, Jaime Tovar wrote:
>> Hi all,
>>
>> Is the first time I try to parse embl files with biopython. I'm trying to
>> get the gene ids and coordinates for start/end of each gene.
>>
>> I thought it will be straight forward like with other annotation files, so I
>> did a small script to test it.
>>
>> from Bio import SeqIO
>> if __name__ == '__main__':
>>     handle = open("sctg_0.embl", "r")
>>     records = SeqIO.parse(handle, "embl")
>>     for record in records :
>>         print(record)
>>
>> But when running the script I get an error which may suggest the embl files
>> have an issue
>>
>> ValueError: Premature end of features table, marker '//' found
>>
>> I checked the source code of the parser and seems the embl file has
>> problems, but when I checked embl file format seems they are ok.
> If they are like your example, they are a bit unusual.
>
>> I have a
>> few thousand files formatted in the same way. So can't think about other way
>> to deal with the problem but to parse them.
>>
>> The annotation files have only annotation info, no sequences. Here I
>> uploaded an example.
>>
>> http://depositfiles.com/files/481uob95e
>>
>> I'm using python 2.7.4 and biopython 1.61 on a win x64 computer.
>>
>> Any advice and suggestion will be greatly appreciated.
>>
>> Jaime.
> Hi Jaime,
>
> For sharing plain text files, http://gist.github.com is a nicer option.
>
> The problem is your file looks like this:
>
> ID   sctg_0 standard; DNA; DIV; 3745584 BP.
> XX
> AC   sctg_0;
> XX
> FH   Key             Location/Qualifiers
> FH
> FT   CDS             302..490
> FT                   /note="EuGene predicted gene nr: Esi0000_0001"
> ...
> FT   mRNA            complement(3744791..3745584)
> FT                   /note="EuGene predicted gene nr: Esi0000_0662"
> //
>
> The parser is expecting an SQ line after the FT lines before the //
> As you said, your files lack any sequence information - is that deliberate?
>
> This is not something I've seen before, but we can probably
> modify the EMBL parser to cope with this - much like how
> GenBank files can omit the actual sequence data.
>
> On the other hand, the SQ line is not defined as optional
> so perhaps we are doing the right thing and rejecting an
> invalid file? ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt
>
> Where did your EMBL format file come from?
>
> Thanks,
>
> Peter

From p.j.a.cock at googlemail.com Fri May 31 04:43:28 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 31 May 2013 09:43:28 +0100
Subject: [Biopython] Problem parsing embl files
In-Reply-To: <51A7D8E0.3000704@bath.ac.uk>
References: <51A7AD2B.1000303@bath.ac.uk> <51A7D8E0.3000704@bath.ac.uk>
Message-ID:

On Thu, May 30, 2013 at 11:55 PM, Jaime Tovar wrote:
> Hi Peter,
>
> I checked a similar version with the description of the embl format. They
> are a bit ambiguous, I think.
> From the definition we have:
>
> XX - spacer line (many per entry)
> SQ - sequence header (1 per entry)
> CO - contig/construct line (0 or >=1 per entry)
> bb - (blanks) sequence data (>=1 per entry)
> // - termination line (ends each entry; 1 per entry)
>
> At first I read SQ ... (1 per entry) and thought it meant there must be one
> of them. And similar situations for the rest (many per entry, >=1). But for
> example from the same definition we have:
>
> DT - date (2 per entry)
>
> But my file does not have DT and the parser was not complaining about it,
> so it made me think maybe I was doing something wrong. To be honest I can't
> say I'm sure if it means there should be a SQ or if it is optional but can
> only show once per entry.

Well yes, it does seem that missing DT lines is also technically invalid -
but coping with that was quite simple. Missing a sequence in a sequence
centric file format is rather more important ;)

> The files are not mine. Are third party files I got from another researcher,
> who in turn got them from someone else, so... They are annotations for algae
> contigs as far as I know. Not sure why they don't have the sequence part.

I would be interested to know how the files were prepared (e.g. which
tool produced them), but this isn't vital.

> To be honest I don't know if it is worth making changes to the parser. I
> can't say these files are actually well formatted. Maybe someone with more
> experience with embl files can give a second opinion.

Good idea - anyone?

> You think I can cheat the parser if I just 'sed' my embl files and replace
> the // with something like:
>
> """XX
> SQ
>
>
> //"""

Possibly - you'd need to do a little experimenting to find out the bare
minimum that would allow the parser to continue without code changes.

Peter

From c0d3g33k at gmail.com Fri May 31 12:31:05 2013
From: c0d3g33k at gmail.com (c0d3g33k)
Date: Fri, 31 May 2013 12:31:05 -0400
Subject: [Biopython] Versions of Python 3 to support in Biopython?
In-Reply-To: <51A79CF6.5010705@fold.natur.cuni.cz>
References: <51A79CF6.5010705@fold.natur.cuni.cz>
Message-ID: <51A8D049.2060705@gmail.com>

Hi Martin,

On 5/30/2013 2:39 PM, Martin Mokrejs wrote:
>
> In general, I am very opposed to deprecations of any programming language
> API. This just puts a wasteful burden on developers to rewrite their possibly
> years working apps just due to API change. Some people even don't want to touch
> certain parts of their code. Or they don't have time to do that or stopped
> changing their fully working application ... hence why should they re-start
> their work if for several years they did NOT need to touch the code? In such
> cases when developer does not take an action the API change kills the existing
> and working application because people just don't use it anymore on incompatible
> systems.

I fully understand the sentiment - change for the sake of change is
unwelcome. Bear in mind, though, that this is a discussion about
versions of Python to support in *future* releases of Biopython. A
developer as conservatively paranoid as you describe isn't going to be
tracking the bleeding edge of Biopython unless he really enjoys being
self-contradictory.

These days, if stability is a high priority and you aren't using
virtualenv (http://www.virtualenv.org), "You're Doing It Wrong". Set up
a stable virtual environment for that application that's been working
for years and tested within an inch of its life and have done with it.
Let the Biopython developers move carefully forward without having to
drag the chains of sins past along with them forever like Jacob Marley
in A Christmas Carol.
From jmtc21 at bath.ac.uk Fri May 31 15:12:32 2013
From: jmtc21 at bath.ac.uk (Jaime Tovar)
Date: Fri, 31 May 2013 20:12:32 +0100
Subject: [Biopython] Problem parsing embl files
In-Reply-To:
References: <51A7AD2B.1000303@bath.ac.uk> <51A7D8E0.3000704@bath.ac.uk>
Message-ID: <51A8F620.3040807@bath.ac.uk>

Thanks Peter,

I found gff3 files I can easily parse for the data I need. So will
leave these strange embl files alone. If someone with more experience
with embl files wants to take a look at them to check the parser let me
know and I will forward some sample files.

I asked the people who gave me the files if they know what kind of
software they used to generate them. But since they are third party
data they have no information. I was lucky they had also gff3 files to
get gene annotation data.

Jaime.

On 31/05/2013 09:43, Peter Cock wrote:
> On Thu, May 30, 2013 at 11:55 PM, Jaime Tovar wrote:
>> Hi Peter,
>>
>> I checked a similar version with the description of the embl format. They
>> are a bit ambiguous, I think. From the definition we have:
>>
>> XX - spacer line (many per entry)
>> SQ - sequence header (1 per entry)
>> CO - contig/construct line (0 or >=1 per entry)
>> bb - (blanks) sequence data (>=1 per entry)
>> // - termination line (ends each entry; 1 per entry)
>>
>> At first I read SQ ... (1 per entry) and thought it meant there must be one
>> of them. And similar situations for the rest (many per entry, >=1). But for
>> example from the same definition we have:
>>
>> DT - date (2 per entry)
>>
>> But my file does not have DT and the parser was not complaining about it,
>> so it made me think maybe I was doing something wrong. To be honest I can't
>> say I'm sure if it means there should be a SQ or if it is optional but can
>> only show once per entry.
> Well yes, it does seem that missing DT lines is also technically invalid -
> but coping with that was quite simple.
> Missing a sequence in a sequence
> centric file format is rather more important ;)
>
>> The files are not mine. Are third party files I got from another researcher,
>> who in turn got them from someone else, so... They are annotations for algae
>> contigs as far as I know. Not sure why they don't have the sequence part.
> I would be interested to know how the files were prepared (e.g. which
> tool produced them), but this isn't vital.
>
>> To be honest I don't know if it is worth making changes to the parser. I
>> can't say these files are actually well formatted. Maybe someone with more
>> experience with embl files can give a second opinion.
> Good idea - anyone?
>
>> You think I can cheat the parser if I just 'sed' my embl files and replace
>> the // with something like:
>>
>> """XX
>> SQ
>>
>>
>> //"""
> Possibly - you'd need to do a little experimenting to find out the bare
> minimum that would allow the parser to continue without code changes.
>
> Peter

From jordan.r.willis at Vanderbilt.Edu Fri May 31 17:15:48 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Fri, 31 May 2013 21:15:48 +0000
Subject: [Biopython] Custom Distance Matrices using Custom Scoring Matrices
Message-ID:

Hello Bio,

I know I asked something like this a while ago but didn't really know
what I needed to do. Now I think I know exactly the path, however the
solution is unclear.

The goal is to compare sequences all of the same length and view them
in a dendrogram. In one tree I would like it to be scored with
something simple like a PAM250 matrix. In another dendrogram, I would
like the tree to be scored with my own custom position-specific scoring
matrix.

So it looks like I can use a neighbor-joining method using a distance
matrix where the distance matrix will be all my sequences scored
against each other using either the PAM250 or my custom matrix.

Now, does Biopython have the means to do this?
I can quickly write a method to score all my sequences against each
other using PAM250 or my PSSM and store it in some sort of dictionary.
Can I then convert that dictionary to a distance matrix to be used in
neighbor joining? Is there a method to write out a newick tree using
neighbor joining? Should I even be using Biopython?

Thanks so much!

Jordan

From jordan.r.willis at Vanderbilt.Edu Fri May 31 18:59:26 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Fri, 31 May 2013 22:59:26 +0000
Subject: [Biopython] Custom Distance Matrices using Custom Scoring Matrices
In-Reply-To:
References:
Message-ID:

Hello,

I think my solution is to use a neighbor joining from the PHYLIP
package. You can define distance matrices yourself, which I will write
using biopython, but I don't think that it has been done before. If I
get something nice and stable, I will contribute to the devel branch of
biopython.

Jordan

On May 31, 2013, at 4:15 PM, "Willis, Jordan R" wrote:

> Hello Bio,
>
> I know I asked something like this a while ago but didn't really know what I needed to do. Now I think I know exactly the path, however the solution is unclear.
>
> The goal is to compare sequences all of the same length and view them in a dendrogram. In one tree I would like it to be scored with something simple like a PAM250 matrix. In another dendrogram, I would like the tree to be scored with my own custom position-specific scoring matrix.
>
> So it looks like I can use a neighbor-joining method using a distance matrix where the distance matrix will be all my sequences scored against each other using either the PAM250 or my custom matrix.
>
> Now, does Biopython have the means to do this? I can quickly write a method to score all my sequences against each other using PAM250 or my PSSM and store it in some sort of dictionary. Can I then convert that dictionary to a distance matrix to be used in neighbor joining? Is there a method to write out a newick tree using neighbor joining?
> Should I even be using Biopython?
>
> Thanks so much!
>
> Jordan
>
>
>
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From chapmanb at 50mail.com Wed May 1 10:04:08 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 01 May 2013 06:04:08 -0400
Subject: [Biopython] GFF parsing with biopython
In-Reply-To:
References:
Message-ID: <877gjjkodj.fsf@fastmail.fm>

Mic;
(moving to galaxy-dev list so folks there can follow, but future
questions are more appropriate for the Biopython list only since
this isn't a Galaxy question)

> I have the following GFF file from a SNAP
>
> X1 SNAP Einit 2579 2712 -3.221 + . X1-snap.1
[...]
> With the code below I have tried to parse the above GFF file

The attributes you're missing are parts of the feature, not the
SeqRecord itself, which is why you're seeing attribute error. Here's a
full example that pulls all of the information from an example line:

from BCBio import GFF

in_file = "snap.gff"
with open(in_file) as in_handle:
    for rec in GFF.parse(in_handle):
        feature = rec.features[0]
        print rec.id
        print feature.qualifiers["source"][0]
        print feature.type
        print feature.location.start
        print feature.location.end
        print feature.qualifiers["score"][0]
        print feature.location.strand
        print feature.qualifiers.get("X1-snap.1", [None])[0]

which outputs:

X1
SNAP
Einit
2578
2712
-3.221
1
true

Hope this helps,
Brad

From zhigangwu.bgi at gmail.com Thu May 2 02:11:38 2013
From: zhigangwu.bgi at gmail.com (Zhigang Wu)
Date: Wed, 1 May 2013 19:11:38 -0700
Subject: [Biopython] [Biopython-dev] Lazy-loading parsers, was: Biopython GSoC 2013 applications via NESCent
In-Reply-To:
References:
Message-ID:

Thanks so much Alex. I definitely will take a look at it.
Thanks again for making your code available.
Zhigang

On May 1, 2013, at 3:56 PM, "Alex Leach" wrote:

> Dear all,
>
> I also left some minor comments on the proposal; I hope they're helpful and I wish you every success!
>
> You should focus on the proposal for now, but I thought I'd share a more presentable version of the fasta lazy-loader I wrote a couple of years ago. The focus at the time was to minimise memory usage and increase the speed of random access to fasta-formatted sequences, stored on disk. Only sequence accessions and file locations are stored in-memory (in a dict). Once the index has been populated, it can 'pickle' the dictionary to a file on disk, for later re-use.
>
> It doesn't exactly fulfill all of your needs, but I hope it might help you in the right direction..
>
> Also, were there plans for making the lazy loader thread-safe? I've done it in the past by passing a `multiprocessing.Pipe` instance to a method (`pipe_sequences`) of the lazy loader. If redesigning the code, I'd try to implement a callback scheme, but passing a Pipe did the job.. Maybe it's outside the current scope of the project, but anyway, I put the module up on github if you want to check it out[1].
>
>
> Cheers,
> Alex
>
>
> [1] - https://github.com/alexleach/fasta_lazy_loader/blob/master/fasta_lazy_loader.py
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From mictadlo at gmail.com Fri May 3 05:08:18 2013
From: mictadlo at gmail.com (Mic)
Date: Fri, 3 May 2013 15:08:18 +1000
Subject: [Biopython] GFF parsing with biopython
In-Reply-To: <877gjjkodj.fsf@fastmail.fm>
References: <877gjjkodj.fsf@fastmail.fm>
Message-ID:

Thank you.
On Wed, May 1, 2013 at 8:04 PM, Brad Chapman wrote:

>
> Mic;
> (moving to galaxy-dev list so folks there can follow, but future
> questions are more appropriate for the Biopython list only since
> this isn't a Galaxy question)
>
> > I have the following GFF file from a SNAP
> >
> > X1 SNAP Einit 2579 2712 -3.221 + .
> X1-snap.1
> [...]
> > With the code below I have tried to parse the above GFF file
>
> The attributes you're missing are parts of the feature, not the
> SeqRecord itself, which is why you're seeing attribute error. Here's a
> full example that pulls all of the information from an example line:
>
> from BCBio import GFF
>
> in_file = "snap.gff"
> with open(in_file) as in_handle:
>     for rec in GFF.parse(in_handle):
>         feature = rec.features[0]
>         print rec.id
>         print feature.qualifiers["source"][0]
>         print feature.type
>         print feature.location.start
>         print feature.location.end
>         print feature.qualifiers["score"][0]
>         print feature.location.strand
>         print feature.qualifiers.get("X1-snap.1", [None])[0]
>
> which outputs:
>
> X1
> SNAP
> Einit
> 2578
> 2712
> -3.221
> 1
> true
>
> Hope this helps,
> Brad
>

From mictadlo at gmail.com Fri May 3 05:26:18 2013
From: mictadlo at gmail.com (Mic)
Date: Fri, 3 May 2013 15:26:18 +1000
Subject: [Biopython] GFF.writer
Message-ID:

Hi,
I found here ( http://www.biopython.org/wiki/GFF_Parsing#Writing_GFF3 ) an
example how to write gff3 files.

Is it possible not to write "##sequence-region ID1 1 20"?

Thank you in advance.

Mic

From chapmanb at 50mail.com Fri May 3 10:51:37 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 03 May 2013 06:51:37 -0400
Subject: [Biopython] GFF.writer
In-Reply-To:
References:
Message-ID: <878v3ws5dy.fsf@fastmail.fm>

Mic;

> I found here ( http://www.biopython.org/wiki/GFF_Parsing#Writing_GFF3 ) an
> example how to write gff3 files.
>
> Is it possible not to write "##sequence-region ID1 1 20"?
It only writes that line if you supply a sequence for the SeqRecord:

https://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFOutput.py#L105

so manually removing the sequence before writing will prevent it from
outputting that metadata. However, the GFF3 spec encourages writing
these metadata items:

http://www.sequenceontology.org/gff3.shtml

Brad

From carlos.borroto at gmail.com Fri May 3 16:42:33 2013
From: carlos.borroto at gmail.com (Carlos Borroto)
Date: Fri, 3 May 2013 12:42:33 -0400
Subject: [Biopython] Private '_filenames' attribute in class _SQLiteManySeqFilesDict?
Message-ID:

Hi,

I'm writing a tool to index and query a local database using
SeqIO.index_db(). I want to provide something similar to:
$ blastdbcmd -db nt -info
Database: Nucleotide collection (nt)
    17,200,582 sequences; 44,123,601,625 total bases

Date: Feb 7, 2013 4:14 PM Longest sequence: 65,476,681 bases

Volumes:
    /local/db/blast/nt.00
    /local/db/blast/nt.01
    /local/db/blast/nt.02
    /local/db/blast/nt.03
    /local/db/blast/nt.04
    /local/db/blast/nt.05
    /local/db/blast/nt.06
    /local/db/blast/nt.07
    /local/db/blast/nt.08
    /local/db/blast/nt.09
    /local/db/blast/nt.10
    /local/db/blast/nt.11
    /local/db/blast/nt.12

I have limited Object-Oriented Python's skills, actually very low
exposure to OO in general. The object returned by SeqIO.index_db() has
an attribute '_filenames' which I could use to report "volumes". I
recently read it is a common rule to mark private methods and
attributes with the starting underscore. Would it be safe to use
'_filenames' in this case? Could this attribute be converted to a
public one so it can be safely used without fear of breaking the code
in the future?

Thanks,
Carlos

From andreagarcia871 at gmail.com Sun May 5 20:21:06 2013
From: andreagarcia871 at gmail.com (=?ISO-8859-1?Q?Andrea_Garc=EDa?=)
Date: Sun, 5 May 2013 17:21:06 -0300
Subject: [Biopython] How to install tests manually?
Message-ID:

Hello all,

I have just installed BioPython using defaults on Windows 7, and it seems
all tests files are missing. I've checked under:

c:\Python27\Lib\site-packages\Bio\

against this directory

https://github.com/biopython/biopython/tree/master/Tests

How can I install the tests manually?

Cheers

From mictadlo at gmail.com Mon May 6 05:02:01 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 6 May 2013 15:02:01 +1000
Subject: [Biopython] GFF.writer
In-Reply-To: <878v3ws5dy.fsf@fastmail.fm>
References: <878v3ws5dy.fsf@fastmail.fm>
Message-ID:

Hi Brad,
Thank you it is working, but I have few questions by running the below
code:
from BCBio import GFF
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

out_file = "your_file.gff"
seq = Seq("GATCGATCGATCGATCGATC")
rec = SeqRecord(seq, "ID1")
qualifiers = {"source": "prediction",
              "note": "F5M15.26 n:1 Tax:Arabidopsis thaliana RepID:Q9LMV1_ARATH",
              "ID": "gene1"}
sub_qualifiers = {"source": "prediction"}
top_feature = SeqFeature(FeatureLocation(0, 20), type="gene", strand=1,
                         qualifiers=qualifiers)
top_feature.sub_features = [SeqFeature(FeatureLocation(0, 5), type="exon",
                                       strand=1, score=12,
                                       qualifiers=sub_qualifiers),
                            SeqFeature(FeatureLocation(15, 20), type="exon",
                                       strand=1, score=-13,
                                       qualifiers=sub_qualifiers)]
rec.features = [top_feature]

with open(out_file, "w") as out_handle:
    GFF.write([rec], out_handle)


* How is it possible to avoid to get e.g. *%20* and is there a way to get
this order ID, note in below output?
note=F5M15.26*%20*n*%3A*1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH;ID=gene1

* How is it possible to get score in sub_features, because the above code
caused the following error?
Traceback (most recent call last):
  File "problem.py", line 15, in <module>
    qualifiers=sub_qualifiers),
TypeError: __init__() got an unexpected keyword argument 'score'

Thank you in advance

Mic

On Fri, May 3, 2013 at 8:51 PM, Brad Chapman wrote:

>
> Mic;
>
> > I found here ( http://www.biopython.org/wiki/GFF_Parsing#Writing_GFF3 )
> an
> > example how to write gff3 files.
> >
> > Is it possible not to write "##sequence-region ID1 1 20"?
>
> It only writes that line if you supply a sequence for the SeqRecord:
>
>
> https://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFOutput.py#L105
>
> so manually removing the sequence before writing will prevent it from
> outputting that metadata. However, the GFF3 spec encourages writing
> these metadata items:
>
> http://www.sequenceontology.org/gff3.shtml
>
> Brad
>

From w.arindrarto at gmail.com Mon May 6 06:16:11 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 6 May 2013 08:16:11 +0200
Subject: [Biopython] How to install tests manually?
In-Reply-To:
References:
Message-ID:

> Hello all,
>
> I have just installed BioPython using defaults on Windows 7, and it seems
> all tests files are missing. I've checked under:
>
> c:\Python27\Lib\site-packages\Bio\
>
> against this directory
>
> https://github.com/biopython/biopython/tree/master/Tests
>
> How can I install the tests manually?

Hi Andrea,

AFAIK, the Windows installer does not include the Test directory. If
you would like to run the tests, you need to download the source,
unpack it, and run it using `python setup.py test` from the top
directory.

Hope that helps :),
Bow

From p.j.a.cock at googlemail.com Mon May 6 08:21:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 6 May 2013 09:21:16 +0100
Subject: [Biopython] How to install tests manually?
In-Reply-To:
References:
Message-ID:

On Mon, May 6, 2013 at 7:16 AM, Wibowo Arindrarto wrote:
>> Hello all,
>>
>> I have just installed BioPython using defaults on Windows 7, and it seems
>> all tests files are missing. I've checked under:
>>
>> c:\Python27\Lib\site-packages\Bio\
>>
>> against this directory
>>
>> https://github.com/biopython/biopython/tree/master/Tests
>>
>> How can I install the tests manually?
>
> Hi Andrea,
>
> AFAIK, the Windows installer does not include the Test directory. If
> you would like to run the tests, you need to download the source,
> unpack it, and run it using `python setup.py test` from the top
> directory.
>
> Hope that helps :),
> Bow

Yes, that is correct - the Windows installers do not include
the source code (nor the compiled Tutorial as PDF and HTML).

We provide the source code bundles ZIP files to make this
easier for Windows users (where the Unix style tar-balls
are harder to decompress).

If you are interested in doing Biopython development running
the tests is a really good idea, but you'll also want to setup
the C compilers so you can build Biopython (assuming you
are using C Python, this would not apply to PyPy or Jython).
Peter

From p.j.a.cock at googlemail.com Mon May 6 08:29:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 6 May 2013 09:29:05 +0100
Subject: [Biopython] GFF.writer
In-Reply-To:
References: <878v3ws5dy.fsf@fastmail.fm>
Message-ID:

On Mon, May 6, 2013 at 6:02 AM, Mic wrote:
> Hi Brad,
> Thank you it is working, but I have few questions by running the below
> code:
> from BCBio import GFF
> from Bio.Seq import Seq
> from Bio.SeqRecord import SeqRecord
> from Bio.SeqFeature import SeqFeature, FeatureLocation
>
> out_file = "your_file.gff"
> seq = Seq("GATCGATCGATCGATCGATC")
> rec = SeqRecord(seq, "ID1")
> qualifiers = {"source": "prediction", "note": "F5M15.26 n:1 Tax:Arabidopsis
> thaliana RepID:Q9LMV1_ARATH",
>               "ID": "gene1"}
> sub_qualifiers = {"source": "prediction"}
> top_feature = SeqFeature(FeatureLocation(0, 20), type="gene", strand=1,
>                          qualifiers=qualifiers)
> top_feature.sub_features = [SeqFeature(FeatureLocation(0, 5), type="exon",
>                                        strand=1, score=12,
>                                        qualifiers=sub_qualifiers),
>                             SeqFeature(FeatureLocation(15, 20),
>                                        type="exon", strand=1, score=-13,
>                                        qualifiers=sub_qualifiers)]
> rec.features = [top_feature]
>
> with open(out_file, "w") as out_handle:
>     GFF.write([rec], out_handle)
>
>
> * How is it possible to avoid to get e.g. *%20* and is there a way to get
> this order ID, note in below output?
> note=F5M15.26*%20*n*%3A*
> 1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH;ID=gene1
>
> * How is it possible to get score in sub_features, because the above code
> caused the following error?
> Traceback (most recent call last):
>   File "problem.py", line 15, in <module>
>     qualifiers=sub_qualifiers),
> TypeError: __init__() got an unexpected keyword argument 'score'
>
> Thank you in advance
>
> Mic
>

Hi Mic,

Just to give you advance warning, sub-features are being deprecated
in the next release of Biopython. You'll still get them when parsing a
GenBank file etc, but they won't be used when writing the GenBank
file.
Instead we have a new CompoundFeatureLocation.
One of the reasons for doing this is that historically sub-features
have been used for complex locations and NOT parent/child style
relationships as in GFF.

Brad - this would be a good thing for us to work on at the upcoming
CodeFest in Berlin: http://www.open-bio.org/wiki/Codefest_2013

Peter

From chapmanb at 50mail.com Mon May 6 11:03:20 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 06 May 2013 07:03:20 -0400
Subject: [Biopython] GFF.writer
In-Reply-To:
References: <878v3ws5dy.fsf@fastmail.fm>
Message-ID: <871u9k5q13.fsf@fastmail.fm>

Mic;

> Thank you it is working, but I have few questions by running the below
> code:
[...]
> * How is it possible to avoid to get e.g. *%20* and is there a way to get
> this order ID, note in below output?
> note=F5M15.26*%20*n*%3A*
> 1%20Tax%3AArabidopsis%20thaliana%20RepID%3AQ9LMV1_ARATH;ID=gene1

Apologies, I am escaping too much according to the GFF specification. I
checked in a fix to avoid escaping spaces and semi-colons. If you get
the latest version from GitHub it will avoid this issue.

I also checked in an update to order the key/value attributes in
alphabetical order. There isn't a defined ordering of these in the spec
but I agree that a consistent one would be nice. Thanks for all the
useful feedback.

> * How is it possible to get score in sub_features, because the above code
> caused the following error?

You want to specify the score as part of the SeqFeature qualifiers.
Your fixed code is:

from BCBio import GFF
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

out_file = "your_file.gff"
seq = Seq("GATCGATCGATCGATCGATC")
rec = SeqRecord(seq, "ID1")
qualifiers = {"source": "prediction",
              "note": "F5M15.26 n:1 Tax:Arabidopsis thaliana RepID:Q9LMV1_ARATH",
              "ID": "gene1"}
top_feature = SeqFeature(FeatureLocation(0, 20), type="gene", strand=1,
                         qualifiers=qualifiers)
top_feature.sub_features = [SeqFeature(FeatureLocation(0, 5), type="exon", strand=1,
                                       qualifiers={"source": "prediction",
                                                   "score": 12}),
                            SeqFeature(FeatureLocation(15, 20), type="exon", strand=1,
                                       qualifiers={"source": "prediction",
                                                   "score": -13})]
rec.features = [top_feature]

with open(out_file, "w") as out_handle:
    GFF.write([rec], out_handle)

Peter:
> Just to give you advance warning, sub-features are being deprecated
> in the next release of Biopython. You'll still get them when parsing a
> GenBank file etc, but they won't be used when writing the GenBank
> file. Instead we have a new CompoundFeatureLocation instead.
> One of the reasons for doing this is that historically sub-features
> have been used for complex locations and NOT parent/child style
> relationships as in GFF.
>
> Brad - this would be a good thing for us to work on at the upcoming
> CodeFest in Berlin: http://www.open-bio.org/wiki/Codefest_2013

Agreed, I need to get up to date with this on the latest release.
I'm also going to spend some time and merge most of the functionality
into Ryan's gffutils library so it can import and export Biopython
objects:

https://github.com/daler/gffutils/tree/refactor

Brad

From p.j.a.cock at googlemail.com Mon May 6 11:17:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 6 May 2013 12:17:12 +0100
Subject: [Biopython] GFF.writer
In-Reply-To: <871u9k5q13.fsf@fastmail.fm>
References: <878v3ws5dy.fsf@fastmail.fm> <871u9k5q13.fsf@fastmail.fm>
Message-ID:

On Mon, May 6, 2013 at 12:03 PM, Brad Chapman wrote:
>
> I also checked in an update to order the key/value attributes in
> alphabetical order. There isn't a defined ordering of these in the spec
> but I agree that a consistent one would be nice.

Would using an OrderedDict be neater? i.e. Preserve any user
given order or whatever there was when parsing. This would
allow ad-hoc conventions like the ID is first to be observed
(or whatever the user preferred).

We'll be dropping Python 2.5 support shortly so that isn't
a problem - and in any case we're already bundling a
backport of OrderedDict under Bio._py3k so you could
use that if needed:

from Bio._py3k import OrderedDict

Peter

From p.j.a.cock at googlemail.com Mon May 6 13:26:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 6 May 2013 14:26:20 +0100
Subject: [Biopython] Private '_filenames' attribute in class _SQLiteManySeqFilesDict?
In-Reply-To:
References:
Message-ID:

On Fri, May 3, 2013 at 5:42 PM, Carlos Borroto wrote:
> Hi,
>
> I'm writing a tool to index and query a local database using
> SeqIO.index_db().
I want to provide something similar to:
> $ blastdbcmd -db nt -info
> Database: Nucleotide collection (nt)
> 17,200,582 sequences; 44,123,601,625 total bases
>
> Date: Feb 7, 2013 4:14 PM Longest sequence: 65,476,681 bases
>
> Volumes:
> /local/db/blast/nt.00
> /local/db/blast/nt.01
> /local/db/blast/nt.02
> /local/db/blast/nt.03
> /local/db/blast/nt.04
> /local/db/blast/nt.05
> /local/db/blast/nt.06
> /local/db/blast/nt.07
> /local/db/blast/nt.08
> /local/db/blast/nt.09
> /local/db/blast/nt.10
> /local/db/blast/nt.11
> /local/db/blast/nt.12
>
> I have limited object-oriented Python skills, actually very low
> exposure to OO in general. The object returned by SeqIO.index_db() has
> an attribute '_filenames' which I could use to report "volumes". I
> recently read it is a common rule to mark private methods and
> attributes with the starting underscore. Would it be safe to use
> '_filenames' in this case? Could this attribute be converted to a
> public one so it can be safely used without fear of breaking the code
> in the future?
>
> Thanks,
> Carlos

Hi Carlos,

Yes, this use of the underscore is intended to indicate that
the attribute _filenames is private. This should be safe to use
if you treat it as read only, but the intent was to keep open
the option of changing the implementation without altering
the public API.

In this case, there is no need to keep the list in memory since
it is also in the SQLite index file - and so a future optimisation
to reduce memory usage might remove the _filenames attribute.
However, that hasn't changed yet.

I would not like to make this a public (read/write) attribute.
It might be reasonable to expose it as a read only attribute
(using a Python property).

For now I would just use it in your code, but comment that
it is a private attribute and could break in a future release
of Biopython.
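To illustrate the two ideas above, here is a minimal sketch. It does not use Biopython at all: the class IndexedFiles is a hypothetical stand-in for the index object, and report_volumes is an illustrative helper name, so treat both as assumptions rather than anything in the library. The first part shows how a Python property can expose a private list as read only; the second shows one defensive way to read a private attribute that might disappear in a later release.

```python
class IndexedFiles(object):
    """Hypothetical stand-in for an index object (not a Biopython class)."""

    def __init__(self, filenames):
        # Leading underscore: internal storage, not part of the public API.
        self._filenames = list(filenames)

    @property
    def filenames(self):
        # Hand back an immutable copy; with no setter defined,
        # assigning to .filenames raises AttributeError.
        return tuple(self._filenames)


def report_volumes(index):
    """Return the file names behind an index, or () if the private attribute is gone."""
    # Private attribute: may be removed in a future release, hence the default.
    return tuple(getattr(index, "_filenames", ()))


idx = IndexedFiles(["nt.00", "nt.01"])
print(idx.filenames)
print(report_volumes(idx))
```

With this approach, the calling code keeps working (reporting no volumes) if the private attribute is ever removed, rather than failing with an AttributeError.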
Regards,

Peter

From chapmanb at 50mail.com Tue May 7 02:30:31 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 06 May 2013 22:30:31 -0400
Subject: [Biopython] GFF.writer
In-Reply-To:
References: <878v3ws5dy.fsf@fastmail.fm> <871u9k5q13.fsf@fastmail.fm>
Message-ID: <8761yvh67s.fsf@fastmail.fm>

Peter;

> Would using an OrderedDict be neater? i.e. Preserve any user
> given order or whatever there was when parsing. This would
> allow ad-hoc conventions like the ID is first to be observed
> (or whatever the user preferred).

The current API generates the GFF from Biopython Seq and SeqFeature
objects, so there isn't a clean way to pass through ordering like this.
We could expose qualifiers as OrderedDicts if that's a useful change,
but we would still need to pick an ordering for non-qualifier items.

Practically, there is no guaranteed order to GFF3 attributes. Exposing
an alphabetized list seems reasonable but it's probably not worth going
too far down this path.

Brad

From tomlin.9 at wright.edu Tue May 14 04:07:57 2013
From: tomlin.9 at wright.edu (Tomlin, Joshua James)
Date: Tue, 14 May 2013 04:07:57 +0000
Subject: [Biopython] help - Biopython Install
Message-ID:

I'm trying to install biopython on my laptop. I have mac os x 10.7.5, python 2.7, and have installed numpy I believe. I can run the 'import Bio' command in python without any errors but when I try to run 'python setup.py test' I get 6 errors which you can see below.

Basically I need to know if I have done everything I need to do correctly. If I have, then why are these 6 tests failing, and will that impact my future use of biopython?

Thanks.

Python version: 2.7.4 (v2.7.4:026ee0057e2d, Apr 6 2013, 11:43:10)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Operating system: posix darwin
test_Entrez_online ... FAIL
test_SearchIO_write ... FAIL
test_SeqIO_index ... FAIL
test_Tutorial ... FAIL
test_bgzf ... FAIL
Bio.bgzf docstring test ...
FAIL ---------------------------------------------------------------------- Ran 213 tests in 296.686 seconds FAILED (failures = 6) From tomlin.9 at wright.edu Tue May 14 00:38:35 2013 From: tomlin.9 at wright.edu (Tomlin, Joshua James) Date: Tue, 14 May 2013 00:38:35 +0000 Subject: [Biopython] Installation Help In-Reply-To: References: Message-ID: I'm trying to install biopython on my laptop. I have mac os x 10.7.5, python 2.7, and have installed numpy I believe. I can run the 'import Bio' command in python without any errors but when I try run 'python setup.py test' I get 6 errors as seen in the output below: Basically I need to know if I have done everything I need to do correctly? If I have then why are these 6 tests failing and will that impact my future use of biopython? Thanks. Python version: 2.7.4 (v2.7.4:026ee0057e2d, Apr 6 2013, 11:43:10) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] Operating system: posix darwin test_Ace ... ok test_AlignIO ... ok test_AlignIO_FastaIO ... ok test_AlignIO_convert ... ok test_BioSQL_MySQLdb ... skipping. Install MySQLdb if you want to use mysql with BioSQL test_BioSQL_psycopg2 ... skipping. Install psycopg2 if you want to use pg with BioSQL test_BioSQL_sqlite3 ... ok test_CAPS ... ok test_Chi2 ... ok test_ClustalOmega_tool ... skipping. Install clustalo if you want to use Clustal Omega from Biopython. test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use it from Biopython. test_Cluster ... skipping. If you want to use Bio.Cluster, install NumPy first and then reinstall Biopython test_CodonTable ... ok test_CodonUsage ... ok test_ColorSpiral ... skipping. Install reportlab if you want to use Bio.Graphics. test_Compass ... ok test_Crystal ... ok test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper. test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss. 
test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools. test_EmbossPrimer ... ok test_Entrez ... ok test_Entrez_online ... FAIL test_Enzyme ... ok test_FSSP ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_KDTree ... skipping. C module in Bio.KDTree not compiled test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LogisticRegression ... ok test_MMCIF ... skipping. C extension MMCIFlex not installed. test_Mafft_tool ... skipping. Install MAFFT if you want to use the Bio.Align.Applications wrapper. test_MarkovModel ... ok test_Medline ... ok test_Motif ... ok test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper. test_NCBIStandalone ... ok test_NCBITextParser ... ok test_NCBIXML ... ok test_NCBI_BLAST_tools ... skipping. Install the NCBI BLAST+ command line tools if you want to use the Bio.Blast.Applications wrapper. test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PAML_baseml ... ok test_PAML_codeml ... ok test_PAML_tools ... skipping. Install PAML if you want to use the Bio.Phylo.PAML wrapper. test_PAML_yn00 ... ok test_PDB ... ok test_PDB_KDTree ... skipping. C module in Bio.KDTree not compiled test_ParserSupport ... 
ok test_Pathway ... ok test_Phd ... ok test_Phylo ... ok test_PhyloXML ... ok test_Phylo_depend ... skipping. Install NetworkX if you want to use Bio.Phylo._utils. test_PopGen_DFDist ... skipping. Install Dfdist, Ddatacal, pv2 and cplot2 if you want to use DFDist with Bio.PopGen.FDist. test_PopGen_FDist ... skipping. Install fdist2, datacal, pv and cplot if you want to use FDist2 with Bio.PopGen.FDist. test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_nodepend ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper. test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper. test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SCOP_online ... ok test_SVDSuperimposer ... ok test_SearchIO_blast_tab ... /Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release. BiopythonExperimentalWarning) ok test_SearchIO_blast_tab_index ... ok test_SearchIO_blast_text ... ok test_SearchIO_blast_xml ... ok test_SearchIO_blast_xml_index ... ok test_SearchIO_blat_psl ... ok test_SearchIO_blat_psl_index ... ok test_SearchIO_exonerate ... ok test_SearchIO_exonerate_text_index ... ok test_SearchIO_exonerate_vulgar_index ... ok test_SearchIO_fasta_m10 ... ok test_SearchIO_fasta_m10_index ... ok test_SearchIO_hmmer2_text ... 
ok test_SearchIO_hmmer2_text_index ... ok test_SearchIO_hmmer3_domtab ... ok test_SearchIO_hmmer3_domtab_index ... ok test_SearchIO_hmmer3_tab ... ok test_SearchIO_hmmer3_tab_index ... ok test_SearchIO_hmmer3_text ... ok test_SearchIO_hmmer3_text_index ... ok test_SearchIO_model ... ok test_SearchIO_write ... FAIL test_SeqIO ... ok test_SeqIO_AbiIO ... ok test_SeqIO_FastaIO ... ok test_SeqIO_PdbIO ... ok test_SeqIO_QualityIO ... ok test_SeqIO_SeqXML ... ok test_SeqIO_convert ... ok test_SeqIO_features ... ok test_SeqIO_index ... FAIL test_SeqIO_online ... ok test_SeqIO_write ... ok test_SeqRecord ... ok test_SeqUtils ... ok test_Seq_objs ... ok test_SffIO ... ok test_SubsMat ... ok test_SwissProt ... ok test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper. test_TogoWS ... ok test_Tutorial ... FAIL test_UniGene ... ok test_Uniprot ... ok test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_XXmotif_tool ... skipping. Install XXmotif if you want to use XXmotif from Biopython. test_align ... ok test_bgzf ... FAIL test_geo ... ok test_kNN ... ok test_lowess ... ok test_motifs ... ok test_pairwise2 ... ok test_phyml_tool ... skipping. Install PhyML 3.0 if you want to use the Bio.Phylo.Applications wrapper. test_prodoc ... ok test_prosite1 ... ok test_prosite2 ... ok test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_py3k ... ok test_raxml_tool ... skipping. Install RAxML (binary raxmlHPC) if you want to test the Bio.Phylo.Applications wrapper. test_seq ... ok test_translate ... ok test_trie ... skipping. Could not import Bio.trie, check C code was compiled. Bio.Align docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.Align.Applications._Clustalw docstring test ... ok Bio.Align.Applications._ClustalOmega docstring test ... ok Bio.Align.Applications._Mafft docstring test ... ok Bio.Align.Applications._Muscle docstring test ... 
ok Bio.Align.Applications._Probcons docstring test ... ok Bio.Align.Applications._Prank docstring test ... ok Bio.Align.Applications._TCoffee docstring test ... ok Bio.AlignIO docstring test ... ok Bio.AlignIO.StockholmIO docstring test ... ok Bio.Alphabet docstring test ... ok Bio.Application docstring test ... ok Bio.bgzf docstring test ... FAIL Bio.Blast.Applications docstring test ... /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:218: BiopythonDeprecationWarning: Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:321: BiopythonDeprecationWarning: Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:400: BiopythonDeprecationWarning: Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) ok Bio.Emboss.Applications docstring test ... ok Bio.GenBank docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Motif docstring test ... ok Bio.Motif.Applications._AlignAce docstring test ... ok Bio.Motif.Applications._XXmotif docstring test ... ok Bio.motifs docstring test ... ok Bio.motifs.applications._alignace docstring test ... ok Bio.motifs.applications._xxmotif docstring test ... ok Bio.pairwise2 docstring test ... 
ok Bio.Phylo.Applications._Raxml docstring test ... ok Bio.SearchIO docstring test ... ok Bio.SearchIO._model docstring test ... ok Bio.SearchIO._model.query docstring test ... ok Bio.SearchIO._model.hit docstring test ... ok Bio.SearchIO._model.hsp docstring test ... ok Bio.SearchIO.BlastIO docstring test ... ok Bio.SearchIO.HmmerIO docstring test ... ok Bio.SearchIO.FastaIO docstring test ... ok Bio.SearchIO.BlatIO docstring test ... ok Bio.SearchIO.ExonerateIO docstring test ... ok Bio.Seq docstring test ... ok Bio.SeqIO docstring test ... ok Bio.SeqIO.FastaIO docstring test ... ok Bio.SeqIO.AceIO docstring test ... ok Bio.SeqIO.PhdIO docstring test ... ok Bio.SeqIO.QualityIO docstring test ... ok Bio.SeqIO.SffIO docstring test ... ok Bio.SeqFeature docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqUtils docstring test ... ok Bio.SeqUtils.MeltingTemp docstring test ... ok Bio.Sequencing.Applications._Novoalign docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... ok Bio.PDB.Polypeptide docstring test ... ok Bio.PDB.Selection docstring test ... 
ok ====================================================================== ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase) Test Entrez.read from URL ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez_online.py", line 34, in test_read_from_url rec = Entrez.read(einfo) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/__init__.py", line 362, in read record = handler.read(handle) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 184, in read self.parser.ParseFile(handle) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 322, in endElementHandler raise RuntimeError(value) RuntimeError: Unable to open connection to #DbInfo?dbaf= ====================================================================== ERROR: test_write_multiple_from_blastxml (test_SearchIO_write.BlastXmlWriteCases) Test blast-xml writing from blast-xml, BLAST 2.2.26+, multiple queries (xml_2226_blastp_001.xml) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SearchIO_write.py", line 55, in test_write_multiple_from_blastxml self.parse_write_and_compare(source, self.fmt, self.out, self.fmt) File "test_SearchIO_write.py", line 27, in parse_write_and_compare SearchIO.write(source_qresults, out_file, out_format, **kwargs) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write writer.write_file(qresults) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file xml.startDocument() File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument self.write('\n' File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write super(UnbufferedTextIOWrapper, self).write(s) TypeError: must be unicode, not str 
====================================================================== ERROR: test_write_single_from_blastxml (test_SearchIO_write.BlastXmlWriteCases) Test blast-xml writing from blast-xml, BLAST 2.2.26+, single query (xml_2226_blastp_004.xml) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SearchIO_write.py", line 49, in test_write_single_from_blastxml self.parse_write_and_compare(source, self.fmt, self.out, self.fmt) File "test_SearchIO_write.py", line 27, in parse_write_and_compare SearchIO.write(source_qresults, out_file, out_format, **kwargs) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write writer.write_file(qresults) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file xml.startDocument() File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument self.write('\n' File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write super(UnbufferedTextIOWrapper, self).write(s) TypeError: must be unicode, not str ====================================================================== ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests) Index fastq-sanger file Quality/example.fastq.bgz get_raw ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SeqIO_index.py", line 432, in f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header 
self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", 
line 457, in parse_records record = self.parse(handle, do_features) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 441, in parse if self.feed(handle, consumer, do_features): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 398, in feed if not self.find_start(): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 78, in find_start line = self.handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 457, in parse_records record = self.parse(handle, do_features) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 441, in parse if self.feed(handle, consumer, do_features): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 398, in feed if not self.find_start(): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 78, in find_start line = self.handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not 
self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in assert 80 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 128, in Bio.bgzf Failed example: line = handle.readline() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 197, in _read_gzip_header raise IOError, 'Not a gzipped file' IOError: Not a gzipped file ---------------------------------------------------------------------- File 
"/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 129, in Bio.bgzf Failed example: assert 143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in assert 143 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 130, in Bio.bgzf Failed example: data = handle.read(70000) Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in data = handle.read(70000) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 197, in _read_gzip_header raise IOError, 'Not a gzipped file' IOError: Not a gzipped file ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 131, in Bio.bgzf Failed example: assert 70143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in assert 70143 == handle.tell() AssertionError ---------------------------------------------------------------------- Ran 213 tests in 296.686 seconds FAILED (failures = 6) From nlindberg at mkei.org Tue May 14 13:48:42 2013 From: nlindberg at mkei.org (Nick Lindberg) Date: Tue, 14 May 2013 13:48:42 +0000 Subject: [Biopython] Installation Help In-Reply-To: 
References: , Message-ID: How did you install it? Sent from my Verizon Wireless Phone ----- Reply message ----- From: "Tomlin, Joshua James" Date: Tue, May 14, 2013 3:58 am Subject: [Biopython] Installation Help To: "biopython at biopython.org" I'm trying to install biopython on my laptop. I have mac os x 10.7.5, python 2.7, and have installed numpy I believe. I can run the 'import Bio' command in python without any errors but when I try run 'python setup.py test' I get 6 errors as seen in the output below: Basically I need to know if I have done everything I need to do correctly? If I have then why are these 6 tests failing and will that impact my future use of biopython? Thanks. Python version: 2.7.4 (v2.7.4:026ee0057e2d, Apr 6 2013, 11:43:10) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] Operating system: posix darwin test_Ace ... ok test_AlignIO ... ok test_AlignIO_FastaIO ... ok test_AlignIO_convert ... ok test_BioSQL_MySQLdb ... skipping. Install MySQLdb if you want to use mysql with BioSQL test_BioSQL_psycopg2 ... skipping. Install psycopg2 if you want to use pg with BioSQL test_BioSQL_sqlite3 ... ok test_CAPS ... ok test_Chi2 ... ok test_ClustalOmega_tool ... skipping. Install clustalo if you want to use Clustal Omega from Biopython. test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use it from Biopython. test_Cluster ... skipping. If you want to use Bio.Cluster, install NumPy first and then reinstall Biopython test_CodonTable ... ok test_CodonUsage ... ok test_ColorSpiral ... skipping. Install reportlab if you want to use Bio.Graphics. test_Compass ... ok test_Crystal ... ok test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper. test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss. test_EmbossPhylipNew ... skipping. 
Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools. test_EmbossPrimer ... ok test_Entrez ... ok test_Entrez_online ... FAIL test_Enzyme ... ok test_FSSP ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_KDTree ... skipping. C module in Bio.KDTree not compiled test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LogisticRegression ... ok test_MMCIF ... skipping. C extension MMCIFlex not installed. test_Mafft_tool ... skipping. Install MAFFT if you want to use the Bio.Align.Applications wrapper. test_MarkovModel ... ok test_Medline ... ok test_Motif ... ok test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper. test_NCBIStandalone ... ok test_NCBITextParser ... ok test_NCBIXML ... ok test_NCBI_BLAST_tools ... skipping. Install the NCBI BLAST+ command line tools if you want to use the Bio.Blast.Applications wrapper. test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PAML_baseml ... ok test_PAML_codeml ... ok test_PAML_tools ... skipping. Install PAML if you want to use the Bio.Phylo.PAML wrapper. test_PAML_yn00 ... ok test_PDB ... ok test_PDB_KDTree ... skipping. C module in Bio.KDTree not compiled test_ParserSupport ... ok test_Pathway ... ok test_Phd ... 
ok test_Phylo ... ok test_PhyloXML ... ok test_Phylo_depend ... skipping. Install NetworkX if you want to use Bio.Phylo._utils. test_PopGen_DFDist ... skipping. Install Dfdist, Ddatacal, pv2 and cplot2 if you want to use DFDist with Bio.PopGen.FDist. test_PopGen_FDist ... skipping. Install fdist2, datacal, pv and cplot if you want to use FDist2 with Bio.PopGen.FDist. test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_nodepend ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper. test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper. test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SCOP_online ... ok test_SVDSuperimposer ... ok test_SearchIO_blast_tab ... /Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release. BiopythonExperimentalWarning) ok test_SearchIO_blast_tab_index ... ok test_SearchIO_blast_text ... ok test_SearchIO_blast_xml ... ok test_SearchIO_blast_xml_index ... ok test_SearchIO_blat_psl ... ok test_SearchIO_blat_psl_index ... ok test_SearchIO_exonerate ... ok test_SearchIO_exonerate_text_index ... ok test_SearchIO_exonerate_vulgar_index ... ok test_SearchIO_fasta_m10 ... ok test_SearchIO_fasta_m10_index ... ok test_SearchIO_hmmer2_text ... ok test_SearchIO_hmmer2_text_index ... 
ok test_SearchIO_hmmer3_domtab ... ok test_SearchIO_hmmer3_domtab_index ... ok test_SearchIO_hmmer3_tab ... ok test_SearchIO_hmmer3_tab_index ... ok test_SearchIO_hmmer3_text ... ok test_SearchIO_hmmer3_text_index ... ok test_SearchIO_model ... ok test_SearchIO_write ... FAIL test_SeqIO ... ok test_SeqIO_AbiIO ... ok test_SeqIO_FastaIO ... ok test_SeqIO_PdbIO ... ok test_SeqIO_QualityIO ... ok test_SeqIO_SeqXML ... ok test_SeqIO_convert ... ok test_SeqIO_features ... ok test_SeqIO_index ... FAIL test_SeqIO_online ... ok test_SeqIO_write ... ok test_SeqRecord ... ok test_SeqUtils ... ok test_Seq_objs ... ok test_SffIO ... ok test_SubsMat ... ok test_SwissProt ... ok test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper. test_TogoWS ... ok test_Tutorial ... FAIL test_UniGene ... ok test_Uniprot ... ok test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_XXmotif_tool ... skipping. Install XXmotif if you want to use XXmotif from Biopython. test_align ... ok test_bgzf ... FAIL test_geo ... ok test_kNN ... ok test_lowess ... ok test_motifs ... ok test_pairwise2 ... ok test_phyml_tool ... skipping. Install PhyML 3.0 if you want to use the Bio.Phylo.Applications wrapper. test_prodoc ... ok test_prosite1 ... ok test_prosite2 ... ok test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_py3k ... ok test_raxml_tool ... skipping. Install RAxML (binary raxmlHPC) if you want to test the Bio.Phylo.Applications wrapper. test_seq ... ok test_translate ... ok test_trie ... skipping. Could not import Bio.trie, check C code was compiled. Bio.Align docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.Align.Applications._Clustalw docstring test ... ok Bio.Align.Applications._ClustalOmega docstring test ... ok Bio.Align.Applications._Mafft docstring test ... ok Bio.Align.Applications._Muscle docstring test ... ok Bio.Align.Applications._Probcons docstring test ... 
ok Bio.Align.Applications._Prank docstring test ... ok Bio.Align.Applications._TCoffee docstring test ... ok Bio.AlignIO docstring test ... ok Bio.AlignIO.StockholmIO docstring test ... ok Bio.Alphabet docstring test ... ok Bio.Application docstring test ... ok Bio.bgzf docstring test ... FAIL Bio.Blast.Applications docstring test ... /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:218: BiopythonDeprecationWarning: Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:321: BiopythonDeprecationWarning: Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/joshtomlin/Downloads/biopython-1.61/Bio/Blast/Applications.py:400: BiopythonDeprecationWarning: Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) ok Bio.Emboss.Applications docstring test ... ok Bio.GenBank docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Motif docstring test ... ok Bio.Motif.Applications._AlignAce docstring test ... ok Bio.Motif.Applications._XXmotif docstring test ... ok Bio.motifs docstring test ... ok Bio.motifs.applications._alignace docstring test ... ok Bio.motifs.applications._xxmotif docstring test ... ok Bio.pairwise2 docstring test ... ok Bio.Phylo.Applications._Raxml docstring test ... 
ok Bio.SearchIO docstring test ... ok Bio.SearchIO._model docstring test ... ok Bio.SearchIO._model.query docstring test ... ok Bio.SearchIO._model.hit docstring test ... ok Bio.SearchIO._model.hsp docstring test ... ok Bio.SearchIO.BlastIO docstring test ... ok Bio.SearchIO.HmmerIO docstring test ... ok Bio.SearchIO.FastaIO docstring test ... ok Bio.SearchIO.BlatIO docstring test ... ok Bio.SearchIO.ExonerateIO docstring test ... ok Bio.Seq docstring test ... ok Bio.SeqIO docstring test ... ok Bio.SeqIO.FastaIO docstring test ... ok Bio.SeqIO.AceIO docstring test ... ok Bio.SeqIO.PhdIO docstring test ... ok Bio.SeqIO.QualityIO docstring test ... ok Bio.SeqIO.SffIO docstring test ... ok Bio.SeqFeature docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqUtils docstring test ... ok Bio.SeqUtils.MeltingTemp docstring test ... ok Bio.Sequencing.Applications._Novoalign docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... ok Bio.PDB.Polypeptide docstring test ... ok Bio.PDB.Selection docstring test ... 
ok ====================================================================== ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase) Test Entrez.read from URL ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez_online.py", line 34, in test_read_from_url rec = Entrez.read(einfo) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/__init__.py", line 362, in read record = handler.read(handle) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 184, in read self.parser.ParseFile(handle) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 322, in endElementHandler raise RuntimeError(value) RuntimeError: Unable to open connection to #DbInfo?dbaf= ====================================================================== ERROR: test_write_multiple_from_blastxml (test_SearchIO_write.BlastXmlWriteCases) Test blast-xml writing from blast-xml, BLAST 2.2.26+, multiple queries (xml_2226_blastp_001.xml) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SearchIO_write.py", line 55, in test_write_multiple_from_blastxml self.parse_write_and_compare(source, self.fmt, self.out, self.fmt) File "test_SearchIO_write.py", line 27, in parse_write_and_compare SearchIO.write(source_qresults, out_file, out_format, **kwargs) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write writer.write_file(qresults) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file xml.startDocument() File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument self.write('\n' File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write super(UnbufferedTextIOWrapper, self).write(s) TypeError: must be unicode, not str 
====================================================================== ERROR: test_write_single_from_blastxml (test_SearchIO_write.BlastXmlWriteCases) Test blast-xml writing from blast-xml, BLAST 2.2.26+, single query (xml_2226_blastp_004.xml) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SearchIO_write.py", line 49, in test_write_single_from_blastxml self.parse_write_and_compare(source, self.fmt, self.out, self.fmt) File "test_SearchIO_write.py", line 27, in parse_write_and_compare SearchIO.write(source_qresults, out_file, out_format, **kwargs) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write writer.write_file(qresults) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file xml.startDocument() File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument self.write('\n' File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write super(UnbufferedTextIOWrapper, self).write(s) TypeError: must be unicode, not str ====================================================================== ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests) Index fastq-sanger file Quality/example.fastq.bgz get_raw ---------------------------------------------------------------------- Traceback (most recent call last): File "test_SeqIO_index.py", line 432, in f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header 
self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 1036, in FastqPhredIterator for title_line, seq_string, quality_string in FastqGeneralIterator(handle): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/QualityIO.py", line 897, in FastqGeneralIterator line = handle_readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 272, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 163, in key_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", 
line 457, in parse_records record = self.parse(handle, do_features) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 441, in parse if self.feed(handle, consumer, do_features): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 398, in feed if not self.find_start(): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 78, in find_start line = self.handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "test_SeqIO_index.py", line 101, in simple_check id_list = [rec.id for rec in SeqIO.parse(h, format, alphabet)] File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SeqIO/__init__.py", line 541, in parse for r in i: File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 457, in parse_records record = self.parse(handle, do_features) File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 441, in parse if self.feed(handle, consumer, do_features): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 398, in feed if not self.find_start(): File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/GenBank/Scanner.py", line 78, in find_start line = self.handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not 
self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in assert 80 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 128, in Bio.bgzf Failed example: line = handle.readline() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 451, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 197, in _read_gzip_header raise IOError, 'Not a gzipped file' IOError: Not a gzipped file ---------------------------------------------------------------------- File 
"/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 129, in Bio.bgzf Failed example: assert 143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in assert 143 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 130, in Bio.bgzf Failed example: data = handle.read(70000) Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in data = handle.read(70000) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 258, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read self._read_gzip_header() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 197, in _read_gzip_header raise IOError, 'Not a gzipped file' IOError: Not a gzipped file ---------------------------------------------------------------------- File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/bgzf.py", line 131, in Bio.bgzf Failed example: assert 70143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1289, in __run compileflags, 1) in test.globs File "", line 1, in assert 70143 == handle.tell() AssertionError ---------------------------------------------------------------------- Ran 213 tests in 296.686 seconds FAILED (failures = 6) _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at 
googlemail.com Tue May 14 14:10:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 May 2013 15:10:16 +0100
Subject: [Biopython] Installation Help
In-Reply-To:
References:
Message-ID:

On Tue, May 14, 2013 at 1:38 AM, Tomlin, Joshua James wrote:
>
> I'm trying to install biopython on my laptop. I have mac os x 10.7.5,
> python 2.7, and have installed numpy I believe.
> I can run the 'import Bio' command in python without any errors but
> when I try run 'python setup.py test' I get 6 errors as seen in the output below:
>
> Basically I need to know if I have done everything I need to do correctly?
> If I have then why are these 6 tests failing and will that impact my future
> use of biopython?

There are four issues here, mostly quite minor:

* NCBI online resource temporarily unavailable, retest should be fine
* Missing test files (see below)
* Python 2.7.4 made a change to the SAX XML parser (see below)
* GZIP bug in Python 2.7.4 (see below)

There were some unfortunate breakages in Python 2.7.4 which you've
run into, plus missing test files. If you installed Python 2.7.4
yourself, it might be simplest to replace it with Python 2.7.3 in
the short term. If this came with the Apple OS then don't do that ;)

> ======================================================================
> ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase)
> Test Entrez.read from URL
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_Entrez_online.py", line 34, in test_read_from_url
>     rec = Entrez.read(einfo)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/__init__.py", line 362, in read
>     record = handler.read(handle)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 184, in read
>     self.parser.ParseFile(handle)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/Entrez/Parser.py", line 322, in endElementHandler
>     raise RuntimeError(value)
> RuntimeError: Unable to open connection to #DbInfo?dbaf=

This is an online test and the NCBI API didn't respond in this case;
if you wait and retry, it should work.

> ======================================================================
> ERROR: test_write_multiple_from_blastxml (test_SearchIO_write.BlastXmlWriteCases)
> Test blast-xml writing from blast-xml, BLAST 2.2.26+, multiple queries (xml_2226_blastp_001.xml)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_SearchIO_write.py", line 55, in test_write_multiple_from_blastxml
>     self.parse_write_and_compare(source, self.fmt, self.out, self.fmt)
>   File "test_SearchIO_write.py", line 27, in parse_write_and_compare
>     SearchIO.write(source_qresults, out_file, out_format, **kwargs)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/__init__.py", line 610, in write
>     writer.write_file(qresults)
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 695, in write_file
>     xml.startDocument()
>   File "/Users/joshtomlin/Downloads/biopython-1.61/Bio/SearchIO/BlastIO/blast_xml.py", line 612, in startDocument
>     self.write('\n'
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/saxutils.py", line 103, in write
>     super(UnbufferedTextIOWrapper, self).write(s)
> TypeError: must be unicode, not str

Python 2.7.4 broke our code, see:
http://lists.open-bio.org/pipermail/biopython-dev/2013-April/010505.html

The next Biopython release will fix this, so you could try that
or perhaps downgrade to Python 2.7.3 instead?

> ======================================================================
> ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests)
> Index fastq-sanger file Quality/example.fastq.bgz get_raw
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_SeqIO_index.py", line 432, in
>     f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
>   File "test_SeqIO_index.py", line 272, in get_raw_check
>     raw_file = h.read()
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read
>     while self._read(readsize):
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read
>     self._read_gzip_header()
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header
>     self._read_exact(struct.unpack(" File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 185, in _read_exact
>     data = self.fileobj.read(n)
> TypeError: an integer is required

This is a bug in Python 2.7.4, which broke GZIP support.
You can downgrade to Python 2.7.3, wait for 2.7.5, apply
the one line fix by hand: http://bugs.python.org/issue17666 or
ignore this if you do not plan to use BGZF compressed files.

(This explains most of the failures)

> ======================================================================
> ERROR: test_doctests (test_Tutorial.TutorialTestCase)
> Run tutorial doctests.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_Tutorial.py", line 152, in test_doctests
> ValueError: 4 Tutorial doctests failed: test_from_line_05671, test_from_line_06030, test_from_line_06190, test_from_line_06479

This is a recently reported bug: I missed two sample data files, see:
http://lists.open-bio.org/pipermail/biopython-dev/2013-May/010599.html

You can download the two missing files:

Doc/examples/my_blast.xml
Doc/examples/my_blat.psl

Available from github or here:
http://biopython.org/SRC/Doc/examples/my_blast.xml
http://biopython.org/SRC/Doc/examples/my_blat.psl

I guess we need to push out Biopython 1.62 sooner rather than
later, which would solve most of these issues.

Thanks for taking the time to report these problems.

Regards,

Peter

(Sorry you'll get this twice Tomlin, I forgot to CC the list
on my first reply)

From p.j.a.cock at googlemail.com Tue May 14 14:11:28 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 May 2013 15:11:28 +0100
Subject: [Biopython] help - Biopython Install
In-Reply-To:
References:
Message-ID:

On Tue, May 14, 2013 at 5:07 AM, Tomlin, Joshua James wrote:
> I'm trying to install biopython on my laptop. I have mac os x 10.7.5, python 2.7, and have installed numpy I believe.
> I can run the 'import Bio' command in python without any errors but when I try run 'python setup.py test' I get 6 errors which you can see below.
>
> Basically I need to know if I have done everything I need to do correctly? If I have then why are these 6 tests failing and will that impact my future use of biopython?
>
> Thanks.
>
> Python version: 2.7.4 (v2.7.4:026ee0057e2d, Apr 6 2013, 11:43:10)
> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
> Operating system: posix darwin
> test_Entrez_online ... FAIL
> test_SearchIO_write ... FAIL
> test_SeqIO_index ... FAIL
> test_Tutorial ... FAIL
> test_bgzf ... FAIL
> Bio.bgzf docstring test ... FAIL
> ----------------------------------------------------------------------
> Ran 213 tests in 296.686 seconds
>
> FAILED (failures = 6)

Hi Tomlin,

I replied to your longer email with the full test output (it was
held for moderation as the email was unusually long):

http://lists.open-bio.org/pipermail/biopython/2013-May/008558.html

Peter

From norbert.auer at boku.ac.at Tue May 14 16:27:00 2013
From: norbert.auer at boku.ac.at (Norbert Auer)
Date: Tue, 14 May 2013 18:27:00 +0200
Subject: [Biopython] corrupted blast results
In-Reply-To: <51926522020000BD00012F85@gwia2.boku.ac.at>
References: <51926522020000BD00012F85@gwia2.boku.ac.at>
Message-ID: <519281F4020000BD00012F92@gwia2.boku.ac.at>

Hi,

I currently have some problems using the NCBIWWW.qblast function.

I used this query to blast some sequences.

result_handle = NCBIWWW.qblast("blastn", "refseq_genomic", seq_fasta, entrez_query="txid10029 [ORGN]", hitlist_size=2)
save_file = open("blast.xml", "w")
blast_results = result_handle.read()
save_file.write(blast_results)
result_handle.close()

Last time I didn't have any problems with this script, but today I only get corrupted (not well-formed) XML files back.

In my last try I got a correct XML file, but after a deeper investigation of this file I found out that the alignment shown was wrong.
The header shows Identities = 660/661, but looking into the alignment shows that this cannot be true.

I used a similar query over the web front end and got the same hit, except that the alignment was correct.

It seems that there was an insertion of 3 nucleotides in the middle of the subject sequence. How could this be? I have no explanation for this behaviour.
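
The first symptom described above (a reply that is not well-formed XML) can be detected before the file is saved. What follows is only a minimal sketch using the Python standard library; the helper name is_well_formed_xml is illustrative and is not part of Biopython. A well-formed file can still contain a wrong alignment, so this check says nothing about the second symptom.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if text parses as XML, False otherwise.

    Hypothetical helper for illustration; it checks well-formedness only,
    not whether the BLAST content itself is correct.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Assumed usage with the string read from the qblast handle:
#
#     if is_well_formed_xml(blast_results):
#         with open("blast.xml", "w") as save_file:
#             save_file.write(blast_results)
#     else:
#         print("Reply was not well-formed XML; retry the query")
```

Writing inside a with block also ensures the output file is closed and flushed, which the script above does not do explicitly.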
from the NCBIWWW.qblast function: Query 241 AAGGCAGGACTGAAGAGTGTCATTATGGGGTGAGCCTTTCAAGGTCCCTGCCACTCTCTC 300 ||||||||||||||||||||||||||||||||||||||||| | | Sbjct 1002610 AAGGCAGGACTGAAGAGTGTCATTATGGGGTGAGCCTTTCATCAAGGTCCCTGCCACTCT 1002551 from the web fronted: Query 241 ACTCTCTTTGTGTACTTTAAAGGTGCTGTGCCCCAAACTCCTGGGACACGGAGAGAACTC 300 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 1169534 ACTCTCTTTGTGTACTTTAAAGGTGCTGTGCCCCAAACTCCTGGGACACGGAGAGAACTC 1169593 I was wondering if this is a NCBI service problem (running on a different server than the web fronted) or is it a biopython issue? I use biopython version 1.61 If necessary I could attach the blast XML files but they are very long. Thanks From p.j.a.cock at googlemail.com Tue May 14 18:11:45 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 14 May 2013 19:11:45 +0100 Subject: [Biopython] Installation Help In-Reply-To: References: Message-ID: On Tue, May 14, 2013 at 3:10 PM, Peter Cock wrote: > On Tue, May 14, 2013 at 1:38 AM, Tomlin, Joshua James wrote: >> >> ====================================================================== >> ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests) >> Index fastq-sanger file Quality/example.fastq.bgz get_raw >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "test_SeqIO_index.py", line 432, in >> f = lambda x : x.get_raw_check(fn, fmt, alpha, c) >> File "test_SeqIO_index.py", line 272, in get_raw_check >> raw_file = h.read() >> File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 253, in read >> while self._read(readsize): >> File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 293, in _read >> self._read_gzip_header() >> File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 205, in _read_gzip_header >> self._read_exact(struct.unpack("> File 
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 185, in _read_exact >> data = self.fileobj.read(n) >> TypeError: an integer is required > > This is a bug in Python 2.7.4, which broke GZIP support. > You can downgrade to Python 2.7.3, wait for 2.7.5, apply > the one line fix by hand: http://bugs.python.org/issue17666 or > ignore this if you do not plan to use BGZF compressed files. > > (This explains most of the failures) I've modified the unit tests to skip test_bgzf.py and the Bio.bgzf doctests if run on Python with this broken gzip: https://github.com/biopython/biopython/commit/975f5c4f6422951ff2ac54bed6928312fdcd1a51 https://github.com/biopython/biopython/commit/975f5c4f6422951ff2ac54bed6928312fdcd1a51 The user will now see this in the full test output: test_bgzf ... skipping. Your Python has a broken gzip library, see http://bugs.python.org/issue17666 for details Peter From hlapp at drycafe.net Wed May 15 20:44:07 2013 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 15 May 2013 16:44:07 -0400 Subject: [Biopython] Workshop on Sustainable Software for Science: Practice and Experiences Message-ID: FYI, if you haven't seen this yet: http://wssspe.researchcomputing.org.uk/ It seems to me that the Bio* projects, perhaps led by BioPerl as the oldest and thus longest running (nowadays more fancily called "sustained") of them would have a lot to say about the subject. Anyone interested in a joint submission? Also, I notice that Biojava's Andreas is on the organizing committee, so maybe he's been conspiring on something already :-) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 203 bytes Desc: Message signed with OpenPGP using GPGMail URL: From cjfields at illinois.edu Thu May 16 04:43:22 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 16 May 2013 04:43:22 +0000 Subject: [Biopython] [Bioperl-l] Workshop on Sustainable Software for Science: Practice and Experiences In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74E1F8C8@CHIMBX5.ad.uillinois.edu> Jason and I have discussed looking into opportunities like this; I think it makes sense to try a joint submission. chris On May 15, 2013, at 3:44 PM, Hilmar Lapp wrote: > FYI, if you haven't seen this yet: > > http://wssspe.researchcomputing.org.uk/ > > It seems to me that the Bio* projects, perhaps led by BioPerl as the oldest and thus longest running (nowadays more fancily called "sustained") of them would have a lot to say about the subject. Anyone interested in a joint submission? > > Also, I notice that Biojava's Andreas is on the organizing committee, so maybe he's been conspiring on something already :-) > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From mictadlo at gmail.com Thu May 16 04:57:12 2013 From: mictadlo at gmail.com (Mic) Date: Thu, 16 May 2013 14:57:12 +1000 Subject: [Biopython] NCBIXML.parse Message-ID: Hi, Why does NCBIXML.parse attach UR090 to the UniRef90 IDs, so that the result I get is UR090:UniRef90_Q9FX16, with the following code: with open("x.blastp.xml") as bf: blast_records = NCBIXML.parse(bf) for blast_record in blast_records: query_name = blast_record.query for alignment in blast_record.alignments: hit_id = alignment.hit_id Is it possible to 
remove UR090 or maybe it should be UR090:Q9FX16? Is UR090:UniRef90_Q9FX16 compatible with GBrowse2? Thank you in advance. Mic From mictadlo at gmail.com Thu May 16 05:10:35 2013 From: mictadlo at gmail.com (Mic) Date: Thu, 16 May 2013 15:10:35 +1000 Subject: [Biopython] GFF.writer In-Reply-To: <8761yvh67s.fsf@fastmail.fm> References: <878v3ws5dy.fsf@fastmail.fm> <871u9k5q13.fsf@fastmail.fm> <8761yvh67s.fsf@fastmail.fm> Message-ID: Hi all, Thank you, it is working fine. From SNAP I have got Eterm and Einit as sub_features together with a score, then I created a top_feature gene and got the following gff3 file: ##gff-version 3 ##sequence-region X 1 4795218 X SNAP gene 5974 7324 . - . ID=X-snap.4;Name=UR090:UniRef90_Q9FX16;Note=F12G12.10 protein n:1 Tax:Arabidopsis thaliana RepID:Q9FX16_ARATH X SNAP Eterm 5974 6007 5.650 - . Parent=X-snap.4 X SNAP Einit 6161 7324 -5.800 - . Parent=X-snap.4 Now I just wonder whether I should add the sub_features' scores together (5.650 + (-5.800) = -0.150) and insert the result as the top_feature score (-0.150)? Thank you in advance. Mic On Tue, May 7, 2013 at 12:30 PM, Brad Chapman wrote: > > Peter; > > > Would using an OrderedDict be neater? i.e. Preserve any user > > given order or whatever there was when parsing. This would > > allow ad-hoc conventions like the ID is first to be observed > > (or whatever the user preferred). > > The current API generates the GFF from Biopython Seq and SeqFeature > objects, so there isn't a clean way to pass through ordering like this. > We could expose qualifiers as OrderedDicts if that's a useful change, > but still need to pick an ordering for non-qualifier items. > > Practically, there is no guaranteed order to GFF3 attributes. Exposing > an alphabetized list seems reasonable but it's probably not worth going > too far down this path. 
> > Brad > From p.j.a.cock at googlemail.com Thu May 16 09:10:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 16 May 2013 10:10:25 +0100 Subject: [Biopython] [Bioperl-l] Workshop on Sustainable Software for Science: Practice and Experiences In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF74E1F8C8@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF74E1F8C8@CHIMBX5.ad.uillinois.edu> Message-ID: On Thu, May 16, 2013 at 5:43 AM, Fields, Christopher J wrote: > Jason and I have discussed looking into opportunity's like this, I think it makes > sense to try a joint submission. > > chris This sounds like a good idea, although given the time and place I am unlikely to be able to attend in person: First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE) (to held in conjunction with SC13, Sunday, 17 November 2013, Denver, CO, USA) http://wssspe.researchcomputing.org.uk/ Rather than trying to discuss this over four mailing lists should we switch to the cross project list open-bio-l, or continue off-list? http://lists.open-bio.org/mailman/listinfo/open-bio-l Thanks, Peter From cjfields at illinois.edu Thu May 16 13:09:45 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 16 May 2013 13:09:45 +0000 Subject: [Biopython] [Bioperl-l] Workshop on Sustainable Software for Science: Practice and Experiences In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF74E1F8C8@CHIMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF74E1FBCF@CHIMBX5.ad.uillinois.edu> Yes, though we need to make sure others (e.g. those not subscribed to open-bio-l) are in the loop. November is a possibility for me. chris On May 16, 2013, at 4:10 AM, Peter Cock wrote: > On Thu, May 16, 2013 at 5:43 AM, Fields, Christopher J > wrote: >> Jason and I have discussed looking into opportunity's like this, I think it makes >> sense to try a joint submission. 
>> >> chris > > This sounds like a good idea, although given the time and place I am > unlikely to be able to attend in person: > > First Workshop on Sustainable Software for Science: Practice and > Experiences (WSSSPE) > (to held in conjunction with SC13, Sunday, 17 November 2013, Denver, CO, USA) > http://wssspe.researchcomputing.org.uk/ > > Rather than trying to discuss this over four mailing lists should we switch > to the cross project list open-bio-l, or continue off-list? > http://lists.open-bio.org/mailman/listinfo/open-bio-l > > Thanks, > > Peter From p.j.a.cock at googlemail.com Thu May 16 14:20:41 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 16 May 2013 15:20:41 +0100 Subject: [Biopython] NCBIXML.parse In-Reply-To: References: Message-ID: On Thu, May 16, 2013 at 5:57 AM, Mic wrote: > Hi, > Why does NCBIXML.parse attach UR090 to the UniRef90 ids and as results I > get is UR090:UniRef90_Q9FX16 with the following code: > > with open("x.blastp.xml") as bf: > blast_records = NCBIXML.parse(bf) > > for blast_record in blast_records: > query_name = blast_record.query > for alignment in blast_record.alignments: > hit_id = alignment.hit_id > > Is it possible to remove UR090 or maybe it should be UR090:Q9FX16? > > Is UR090:UniRef90_Q9FX16 compatible Gbrowse2? > > Thank you in advance. > > Mic Can you post your x.blastp.xml online somewhere for us to look at? If not perhaps you can at least include the relevant snippet of the XML file in your email, and tell us about the database you are using. Thanks, Peter From p.j.a.cock at googlemail.com Fri May 17 08:55:05 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 May 2013 09:55:05 +0100 Subject: [Biopython] NCBIXML.parse In-Reply-To: References: Message-ID: On Fri, May 17, 2013 at 5:26 AM, Mic wrote: > Please find attached the x.blastp.xml file. Do you need more information? > > >>> from Bio.Blast import NCBIXML >>> with open("x.blastp.xml") as bf: ... for r in NCBIXML.parse(bf): ... 
for a in r.alignments: ... print r.query, a.hit_id ... X-snap.4 UniRef90_Q9FX16 (etc) This matches the first hit in the XML, 1 UniRef90_Q9FX16 F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH UniRef90_Q9FX16 308 If you are getting UR090:UniRef90_Q9FX16 then perhaps some other part of your code is adding the prefix? What else did your code do (it was incomplete - there were no print statements for example)? Regards, Peter From p.j.a.cock at googlemail.com Mon May 20 10:53:01 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 20 May 2013 11:53:01 +0100 Subject: [Biopython] NCBIXML.parse In-Reply-To: References: Message-ID: On Mon, May 20, 2013 at 4:26 AM, Mic wrote: > I am sorry, the XML file which I sent was created with one year old Blast > library. When I run Blast with the following command and a new UniRef90 > library > > blastp -query X.aa.snap -db /db/uniprot/uniref90 -evalue 0.00001 > -max_target_seqs 15 -out x.blastp.xml -num_threads 6 -outfmt 5 > > Please find attached the new XML file ... Got it, and yes this does use a different ID style: 1 UR090:UniRef90_Q9FX16 F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH UR090:UniRef90_Q9FX16 308 If you want to change that then I would review how the database was created (e.g. did you make this BLAST database yourself with makeblastdb (new) or formatdb (old), and if so what identifiers did the input FASTA file use?). It might be simpler to just handle the alternative identifier style in your script. > and it looks like that a new schema has been created > http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/dbfetch.databases?style=xml . > > Michal That was a link to the EDAM ontology - I don't see how that is related to the NCBI BLAST XML schema? 
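On handling the alternative identifier style in a script: a minimal sketch, assuming the database prefix (here UR090) is always separated from the accession by a single colon. The helper name strip_db_prefix is made up for illustration and is not part of NCBIXML.

```python
def strip_db_prefix(hit_id, sep=":"):
    """Return hit_id without a leading 'PREFIX:' part, if there is one.

    Hypothetical helper: assumes the database prefix (e.g. 'UR090')
    is separated from the accession by a single colon.
    """
    prefix, found, rest = hit_id.partition(sep)
    # 'found' is an empty string when no separator is present.
    return rest if found else hit_id


# Both the old and the new ID style end up looking the same.
for hit_id in ["UR090:UniRef90_Q9FX16", "UniRef90_Q9FX16"]:
    print(strip_db_prefix(hit_id))
```

In the parsing loop this would be applied to alignment.hit_id before the value is used elsewhere.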
Thanks, Peter From mictadlo at gmail.com Mon May 20 03:26:29 2013 From: mictadlo at gmail.com (Mic) Date: Mon, 20 May 2013 13:26:29 +1000 Subject: [Biopython] NCBIXML.parse In-Reply-To: References: Message-ID: I am sorry, the XML file which I sent was created with one year old Blast library. When I run Blast with the following command and a new UniRef90 library blastp -query X.aa.snap -db /db/uniprot/uniref90 -evalue 0.00001 -max_target_seqs 15 -out x.blastp.xml -num_threads 6 -outfmt 5 Please find attached the new XML file and it looks like that a new schema has been created http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/dbfetch.databases?style=xml . Michal On Fri, May 17, 2013 at 6:55 PM, Peter Cock wrote: > > On Fri, May 17, 2013 at 5:26 AM, Mic wrote: > >> Please find attached the x.blastp.xml file. Do you need more information? >> >> > > >>> from Bio.Blast import NCBIXML > > >>> with open("x.blastp.xml") as bf: > ... for r in NCBIXML.parse(bf): > ... for a in r.alignments: > ... print r.query, a.hit_id > ... > X-snap.4 UniRef90_Q9FX16 > (etc) > > This matches the first hit in the XML, > > 1 > UniRef90_Q9FX16 > F12G12.10 protein n=1 Tax=Arabidopsis thaliana > RepID=Q9FX16_ARATH > UniRef90_Q9FX16 > 308 > > If you are getting UR090:UniRef90_Q9FX16 then perhaps some other > part of your code is adding the prefix? What else did your code do > (it was incomplete - there were no print statements for example)? > > Regards, > > Peter > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: x.blastp2.xml Type: text/xml Size: 6029 bytes Desc: not available URL: From ferreirafm at usp.br Wed May 22 19:08:18 2013 From: ferreirafm at usp.br (Frederico Moraes Ferreira) Date: Wed, 22 May 2013 16:08:18 -0300 Subject: [Biopython] write one rec to file at once Message-ID: <519D17A2.9060501@usp.br> Hi list, Is there any way of writing one and only one record to a fastafile, without have to open a reclist, append rec to it and only then write the reclist to a previous opened file? I was just thinking, if I can read a rec at once like: inrec = SeqIO.read(open(fastafile, "rU"), "fasta") why not write it at once? Best, Fred From jocelyne at gmail.com Wed May 22 19:46:00 2013 From: jocelyne at gmail.com (Jocelyne) Date: Wed, 22 May 2013 12:46:00 -0700 Subject: [Biopython] write one rec to file at once In-Reply-To: <519D17A2.9060501@usp.br> References: <519D17A2.9060501@usp.br> Message-ID: SeqIO.write(rec, fastq_out, "fastq") On Wed, May 22, 2013 at 12:08 PM, Frederico Moraes Ferreira wrote: > Hi list, > Is there any way of writing one and only one record to a fastafile, without > have to open a reclist, append rec to it and only then write the reclist to > a previous opened file? I was just thinking, if I can read a rec at once > like: > inrec = SeqIO.read(open(fastafile, "rU"), "fasta") > why not write it at once? > Best, > Fred > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From ferreirafm at usp.br Wed May 22 20:06:32 2013 From: ferreirafm at usp.br (Frederico Moraes Ferreira) Date: Wed, 22 May 2013 17:06:32 -0300 Subject: [Biopython] write one rec to file at once In-Reply-To: References: <519D17A2.9060501@usp.br> Message-ID: <519D2548.3050303@usp.br> Tks. 
On 22-05-2013 16:46, Jocelyne wrote: > SeqIO.write(rec, fastq_out, "fastq") > > On Wed, May 22, 2013 at 12:08 PM, Frederico Moraes Ferreira > wrote: >> Hi list, >> Is there any way of writing one and only one record to a fastafile, without >> have to open a reclist, append rec to it and only then write the reclist to >> a previous opened file? I was just thinking, if I can read a rec at once >> like: >> inrec = SeqIO.read(open(fastafile, "rU"), "fasta") >> why not write it at once? >> Best, >> Fred >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython -- Dr. Frederico Moraes Ferreira University of Sao Paulo Heart Institute, School of Medicine Laboratory of Immunology Av. Dr. Enéas de Carvalho Aguiar, 44 05403-900 Sao Paulo - SP Brasil From idoerg at gmail.com Wed May 22 20:25:12 2013 From: idoerg at gmail.com (Iddo Friedberg) Date: Wed, 22 May 2013 16:25:12 -0400 Subject: [Biopython] write one rec to file at once In-Reply-To: References: <519D17A2.9060501@usp.br> Message-ID: Probably a good idea to add this to the manual/cookbook. On Wed, May 22, 2013 at 3:46 PM, Jocelyne wrote: > SeqIO.write(rec, fastq_out, "fastq") > > On Wed, May 22, 2013 at 12:08 PM, Frederico Moraes Ferreira > wrote: > > Hi list, > > Is there any way of writing one and only one record to a fastafile, > without > > have to open a reclist, append rec to it and only then write the reclist > to > > a previous opened file? I was just thinking, if I can read a rec at once > > like: > > inrec = SeqIO.read(open(fastafile, "rU"), "fasta") > > why not write it at once? 
> > Best, > > Fred > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Iddo Friedberg http://iddo-friedberg.net/contact.html ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> >>----.<--.>++++++.<<<<------------------------------------. From ivangreg at gmail.com Thu May 23 13:07:59 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 23 May 2013 09:07:59 -0400 Subject: [Biopython] Displaying pairwise2 output fails. Message-ID: Hello Biopythonians, Are you able to display pairwise alignments? It fails in my system: from Bio import pairwise2 for a in pairwise2.align.globalxx("ACCGT", "ACG"): print format_alignment(*a) Traceback (most recent call last): File "", line 2, in NameError: name 'format_alignment' is not defined Notice that the commands above are just a copy/paste of its docstring. It should have produced something like this ACCGT ||||| AC-G- Score=3 ACCGT ||||| A-CG- Score=3 My system information (Fedora 18): Python 2.7.3 (default, Aug 9 2012, 17:23:57) [GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2 and Biopython 1.61. Any help would be appreciated. Ivan Ivan Gregoretti, PhD Bioinformatics From w.arindrarto at gmail.com Thu May 23 13:13:36 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 23 May 2013 15:13:36 +0200 Subject: [Biopython] Displaying pairwise2 output fails. In-Reply-To: References: Message-ID: Hi Ivan On Thu, May 23, 2013 at 3:07 PM, Ivan Gregoretti wrote: > Hello Biopythonians, > > Are you able to display pairwise alignments? 
It fails in my system: > > from Bio import pairwise2 > for a in pairwise2.align.globalxx("ACCGT", "ACG"): > print format_alignment(*a) > > Traceback (most recent call last): > File "", line 2, in > NameError: name 'format_alignment' is not defined This happens because of the way Python namespacing works. With your current import, you should write this: print pairwise2.format_alignment ... instead of this: print format_alignment ... The code you have will work if you import format_alignment explicitly, like so: from Bio.pairwise2 import format_alignment Hope that helps :), Bow From ivangreg at gmail.com Thu May 23 13:32:52 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 23 May 2013 09:32:52 -0400 Subject: [Biopython] Displaying pairwise2 output fails. In-Reply-To: References: Message-ID: Thank you Bow. I did try many variations to try to load format_alignment() but I was unsuccessful; "from Bio.pairwise2 import format_alignment" did not cross my mind. Ivan Ivan Gregoretti, PhD Bioinformatics On Thu, May 23, 2013 at 9:13 AM, Wibowo Arindrarto wrote: > Hi Ivan > > On Thu, May 23, 2013 at 3:07 PM, Ivan Gregoretti wrote: >> Hello Biopythonians, >> >> Are you able to display pairwise alignments? It fails in my system: >> >> from Bio import pairwise2 >> for a in pairwise2.align.globalxx("ACCGT", "ACG"): >> print format_alignment(*a) >> >> Traceback (most recent call last): >> File "", line 2, in >> NameError: name 'format_alignment' is not defined > > This happens because of the way Python namespacing works. With your > current import, you should write this: > > print pairwise2.format_alignment ... > > instead of this: > > print format_alignment ... 
> > The code you have will work if you import format_alignment explicitly, like so: > > from Bio.pairwise2 import format_alignment > > Hope that helps :), > Bow From ivangreg at gmail.com Thu May 23 13:36:39 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 23 May 2013 09:36:39 -0400 Subject: [Biopython] pairwise2 and cpairwise2 Message-ID: One more about pairwise2: At the bottom of this page http://biopython.org/DIST/docs/api/Bio.pairwise2-pysrc.html and in the Biopython tutorial, there are references to a C implementation of pairwise2, called cpairwise2. How do you run that C implementation without falling back to pure python? Thank you, Ivan Ivan Gregoretti, PhD Bioinformatics From p.j.a.cock at googlemail.com Thu May 23 13:44:04 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 May 2013 14:44:04 +0100 Subject: [Biopython] pairwise2 and cpairwise2 In-Reply-To: References: Message-ID: On Thu, May 23, 2013 at 2:36 PM, Ivan Gregoretti wrote: > One more about pairwise2: > > At the bottom of this page > http://biopython.org/DIST/docs/api/Bio.pairwise2-pysrc.html > > and in the Biopython tutorial, there are references to a C > implementation of pairwise2, called cpairwise2. > > How do you run that C implementation without falling back to pure python? > > Thank you, > > Ivan It should happen automatically, assuming you're not running under Jython or PyPy where the C code isn't used. How did you install Biopython? Peter From p.j.a.cock at googlemail.com Thu May 23 13:51:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 May 2013 14:51:32 +0100 Subject: [Biopython] Displaying pairwise2 output fails. In-Reply-To: References: Message-ID: On Thu, May 23, 2013 at 2:32 PM, Ivan Gregoretti wrote: > Thank you Bow. I did try many variations to try to load > format_alignment() but I was unsuccessful; "from Bio.pairwise2 import > format_alignment" did not cross my mind. 
> > Ivan Thanks for this feedback, I've attempted to clarify the example: https://github.com/biopython/biopython/commit/7b058bf9ade7922bf746f4f7c7ebcb897c236a94 Peter From ivangreg at gmail.com Thu May 23 14:33:00 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 23 May 2013 10:33:00 -0400 Subject: [Biopython] pairwise2 and cpairwise2 In-Reply-To: References: Message-ID: Hello Peter, I installed Biopython form biopython-1.61.tar.gz. No customisation, just python setup.py build python setup.py test sudo python setup.py install Is there a way to show that pairwise2 is not falling back to pure python? Alignment feels a bit slow in my machine. Thank you, Ivan Ivan Gregoretti, PhD Bioinformatics On Thu, May 23, 2013 at 9:44 AM, Peter Cock wrote: > On Thu, May 23, 2013 at 2:36 PM, Ivan Gregoretti wrote: >> One more about pairwise2: >> >> At the bottom of this page >> http://biopython.org/DIST/docs/api/Bio.pairwise2-pysrc.html >> >> and in the Biopython tutorial, there are references to a C >> implementation of pairwise2, called cpairwise2. >> >> How do you run that C implementation without falling back to pure python? >> >> Thank you, >> >> Ivan > > It should happen automatically, assuming you're not running > under Jython or PyPy where the C code isn't used. How did > you install Biopython? > > Peter From p.j.a.cock at googlemail.com Thu May 23 14:42:06 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 May 2013 15:42:06 +0100 Subject: [Biopython] pairwise2 and cpairwise2 In-Reply-To: References: Message-ID: On Thu, May 23, 2013 at 3:33 PM, Ivan Gregoretti wrote: > Hello Peter, > > I installed Biopython form biopython-1.61.tar.gz. No customisation, just > > python setup.py build > python setup.py test > sudo python setup.py install > > Is there a way to show that pairwise2 is not falling back to pure python? > Alignment feels a bit slow in my machine. 
> > Thank you, > > Ivan If the C version not available, Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06) [Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_45 Type "help", "copyright", "credits" or "license" for more information. >>> from Bio import pairwise2 >>> from Bio import cpairwise2 Traceback (most recent call last): File "", line 1, in ImportError: cannot import name cpairwise2 If the C version is installed, this should work: Python 2.7.2 (default, Oct 11 2012, 20:14:37) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from Bio import pairwise2 >>> from Bio import cpairwise2 >>> pairwise2.rint is cpairwise2.rint True >>> pairwise2._make_score_matrix_fast is cpairwise2._make_score_matrix_fast True Peter From ivangreg at gmail.com Thu May 23 14:47:11 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 23 May 2013 10:47:11 -0400 Subject: [Biopython] pairwise2 and cpairwise2 In-Reply-To: References: Message-ID: Thank you Peter, my system is indeed using the C version. Ivan Ivan Gregoretti, PhD Bioinformatics On Thu, May 23, 2013 at 10:42 AM, Peter Cock wrote: > On Thu, May 23, 2013 at 3:33 PM, Ivan Gregoretti wrote: >> Hello Peter, >> >> I installed Biopython form biopython-1.61.tar.gz. No customisation, just >> >> python setup.py build >> python setup.py test >> sudo python setup.py install >> >> Is there a way to show that pairwise2 is not falling back to pure python? >> Alignment feels a bit slow in my machine. >> >> Thank you, >> >> Ivan > > If the C version not available, > > Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06) > [Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_45 > Type "help", "copyright", "credits" or "license" for more information. 
>>>> from Bio import pairwise2 >>>> from Bio import cpairwise2 > Traceback (most recent call last): > File "", line 1, in > ImportError: cannot import name cpairwise2 > > If the C version is installed, this should work: > > Python 2.7.2 (default, Oct 11 2012, 20:14:37) > [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> from Bio import pairwise2 >>>> from Bio import cpairwise2 >>>> pairwise2.rint is cpairwise2.rint > True >>>> pairwise2._make_score_matrix_fast is cpairwise2._make_score_matrix_fast > True > > Peter From francesco.chiani at gmail.com Mon May 27 15:29:12 2013 From: francesco.chiani at gmail.com (francesco chiani) Date: Mon, 27 May 2013 17:29:12 +0200 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: HI pythonian, I'm trying to convert a simple txt unigene list in a gene symbol list with Bio.Entrez , let's semplify: just one unigene: handle =Entrez.esearch(db="unigene", term="Z26634") but..what I've got from Entrez.read is fairly unusefull, I think.. what I'm doing wrong? kindly From p.j.a.cock at googlemail.com Mon May 27 15:36:45 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 27 May 2013 16:36:45 +0100 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: On Mon, May 27, 2013 at 4:29 PM, francesco chiani wrote: > HI pythonian, > I'm trying to convert a simple txt unigene list in a gene symbol list with > Bio.Entrez , let's semplify: just one unigene: > handle =Entrez.esearch(db="unigene", term="Z26634") > but..what I've got from Entrez.read is fairly unusefull, I think.. what I'm > doing wrong? > > kindly Hi Francesco, I'm not clear what you are hoping to achieve here. Could you give an example of the input data (e.g. a list like ["Z26634", ...] perhaps?) and the desired output to match? 
Thanks, Peter From francesco.chiani at gmail.com Mon May 27 17:58:52 2013 From: francesco.chiani at gmail.com (francesco chiani) Date: Mon, 27 May 2013 19:58:52 +0200 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: Hi Peter, As you said, the input is a list that looks like unigene_list = ["Z26634", "....", ".."] and the output is another list, gene_symbol = ["ANK2", "...", ".."]. Sorry for the very badly posed question. On 27 May 2013 17:36, "Peter Cock" wrote: > On Mon, May 27, 2013 at 4:29 PM, francesco chiani > wrote: > > HI pythonian, > > I'm trying to convert a simple txt unigene list in a gene symbol list > with > > Bio.Entrez , let's semplify: just one unigene: > > handle =Entrez.esearch(db="unigene", term="Z26634") > > but..what I've got from Entrez.read is fairly unusefull, I think.. what > I'm > > doing wrong? > > > > kindly > > Hi Francesco, > > I'm not clear what you are hoping to achieve here. > > Could you give an example of the input data (e.g. a list like > ["Z26634", ...] perhaps?) and the desired output to match? > > Thanks, > > Peter > From mictadlo at gmail.com Tue May 28 01:49:30 2013 From: mictadlo at gmail.com (Mic) Date: Tue, 28 May 2013 11:49:30 +1000 Subject: [Biopython] gff3: feature.location.end problem Message-ID: Hi, When parsing this gff3 file: ##gff-version 3 ##sequence-region ID1 1 20 ID1 prediction gene 1 20 10.0 + . other=Some,annotations;ID=gene1 ID1 prediction exon 1 5 . + . Parent=gene1 ID1 prediction exon 16 20 . + . 
Parent=gene1 with this code: from BCBio import GFF # handles GFF files with open("test.gff3") as file: for rec in GFF.parse(file): annotations = rec.annotations['sequence-region'][0] id = annotations[0] start = int(annotations[1]) end = int(annotations[2]) print id, start, end for feature in rec.features: contig_id = feature.qualifiers['ID'][0] print contig_id, int(feature.location.start), int(feature.location.end) I get the following output: ID1 1 20 gene1 0 20 Why is it not "gene1 0 19" and "ID1 0 19"? Thank you in advance. Mic From p.j.a.cock at googlemail.com Tue May 28 08:41:58 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 28 May 2013 09:41:58 +0100 Subject: [Biopython] gff3: feature.location.end problem In-Reply-To: References: Message-ID: On Tue, May 28, 2013 at 2:49 AM, Mic wrote: > Hi, > When parsing this gff3 file: > > ##gff-version 3 > ##sequence-region ID1 1 20 > ID1 prediction gene 1 20 10.0 + . > other=Some,annotations;ID=gene1 > ID1 prediction exon 1 5 . + . > Parent=gene1 > ID1 prediction exon 16 20 . + . > Parent=gene1 > > > with this code: > > from BCBio import GFF # handles GFF files > > with open("test.gff3") as file: > for rec in GFF.parse(file): > annotations = rec.annotations['sequence-region'][0] > id = annotations[0] > start = int(annotations[1]) > end = int(annotations[2]) > print id, start, end > > for feature in rec.features: > contig_id = feature.qualifiers['ID'][0] > print contig_id, int(feature.location.start), > int(feature.location.end) > > I get the following output: > ID1 1 20 > gene1 0 20 > > > Why is it not "gene1 0 19" and "ID1 0 19"? > > Thank you in advance. > > Mic Hi Mic, That looks correct, just like when parsing a GenBank/EMBL feature with a location string 1..20 you'd get the start as 0 and the end as 20 in Biopython. This is using Python style slice notation - the start is inclusive and the end is exclusive meaning sequence[0:20] will give the first 20 bases as you would expect for this location. 
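The coordinate convention described above can be sketched in plain Python, without any parser involved. The function name gff_to_slice and the 20-base sequence are made up for illustration; only the arithmetic (subtract one from the start, keep the end) reflects the 1-based inclusive to 0-based end-exclusive conversion.

```python
def gff_to_slice(gff_start, gff_end):
    """Convert 1-based inclusive GFF coordinates to a 0-based, end-exclusive pair."""
    return gff_start - 1, gff_end


sequence = "ACGTACGTACGTACGTACGT"  # made-up 20-base sequence

# The gene line in the example file covers 1..20.
start, end = gff_to_slice(1, 20)
print(start, end, len(sequence[start:end]))

# The second exon covers 16..20, i.e. the last five bases.
exon_start, exon_end = gff_to_slice(16, 20)
print(exon_start, exon_end, sequence[exon_start:exon_end])
```

Because the end is exclusive, end - start gives the feature length directly, with no "+ 1" needed.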
Peter From markbudde at gmail.com Tue May 28 22:03:06 2013 From: markbudde at gmail.com (Mark Budde) Date: Tue, 28 May 2013 15:03:06 -0700 Subject: [Biopython] SeqRecord slicing bug and fix Message-ID: There is a bug in the SeqRecord slicing behavior. The bug crops up on circular records with a feature spanning the beginning and end of the plasmid. Any slice outside of the feature will return the feature, and the feature.location.end is negative. >>> record = SeqIO.read('pUC19_mod.gb', 'genbank') >>> record SeqRecord(seq=Seq('TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGA...GTC', IUPACAmbiguousDNA()), id='pUC19', name='pUC19', description='', dbxrefs=[]) >>> record.features [SeqFeature(FeatureLocation(ExactPosition(2299), ExactPosition(200), strand=1), type='misc_feature')] >>> record[500:600].features #This slice should contain no features [SeqFeature(FeatureLocation(ExactPosition(1799), ExactPosition(-300), strand=1), type='misc_feature')] This can be fixed by modifying line 453 of SeqRecord.py... from: if start <= f.location.nofuzzy_start \ and f.location.nofuzzy_end <= stop: to: if start <= f.location.nofuzzy_start \ and f.location.nofuzzy_end <= stop \ and f.location.nofuzzy_start <= f.location.nofuzzy_end: On a related note, is there an appropriate way to modify the position of a SeqFeature? I have been doing "feature.location._end = ExactPosition(newEnd)" , but I was under the impression that I shouldn't modify objects beginning with an underscore. -Mark -------------- next part -------------- A non-text attachment was scrubbed... 
Name: pUC19_mod.gb Type: application/octet-stream Size: 3674 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Tue May 28 23:09:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 May 2013 00:09:25 +0100 Subject: [Biopython] SeqRecord slicing bug and fix In-Reply-To: References: Message-ID: On Tue, May 28, 2013 at 11:03 PM, Mark Budde wrote: > There is a bug in the SeqRecord slicing behavior. The bug crops up on > circular records with a feature spanning the beginning and end of the > plasmid. Any slice outside of the feature will return the feature, and the > feature.location.end is negative. > >>>> record = SeqIO.read('pUC19_mod.gb', 'genbank') >>>> record > SeqRecord(seq=Seq('TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGA...GTC', > IUPACAmbiguousDNA()), id='pUC19', name='pUC19', description='', dbxrefs=[]) >>>> record.features > [SeqFeature(FeatureLocation(ExactPosition(2299), ExactPosition(200), > strand=1), type='misc_feature')] The issue is you've got start > end, which arguably should raise an exception (there is a TODO in the code for that). Is this a circular record and a feature spanning the origin? > On a related note, is there an appropriate way to modify the position of a > SeqFeature? I have been doing "feature.location._end = > ExactPosition(newEnd)" , but I was under the impression that I shouldn't > modify objects beginning with an underscore. Yes, things starting with a single underscore should be regarded as private and not used. Currently that appears to be setup as a read only property (which you can change directly using feature.location._end = new_value) and right now I'm not sure why that was done, but it has been read only for since Bio.SeqFeature was first written 12 years ago. Maybe no one has asked till now? 
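To make the point about read-only properties concrete, here is a generic sketch in plain Python. The Location class below is a hypothetical stand-in written for this example; it is not Biopython's FeatureLocation and makes no claim about how that class is actually implemented.

```python
class Location(object):
    """Hypothetical location with read-only start and end properties."""

    def __init__(self, start, end):
        self._start = start
        self._end = end

    @property
    def start(self):
        return self._start

    @property
    def end(self):
        return self._end


loc = Location(2299, 3000)

# No setter is defined, so assigning through the public name fails.
try:
    loc.end = 200
    assignment_worked = True
except AttributeError:
    assignment_worked = False

# Writing to loc._end would work, but it relies on a private attribute.
# Building a replacement object keeps to the public interface instead.
new_loc = Location(loc.start, 200)

print(assignment_worked)
print(new_loc.start)
print(new_loc.end)
```

Replacing the whole object rather than mutating it also means any code still holding the old location sees consistent values.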
Peter From markbudde at gmail.com Wed May 29 01:22:54 2013 From: markbudde at gmail.com (Mark Budde) Date: Tue, 28 May 2013 18:22:54 -0700 Subject: [Biopython] SeqRecord slicing bug and fix In-Reply-To: References: Message-ID: On Tue, May 28, 2013 at 4:09 PM, Peter Cock wrote: > On Tue, May 28, 2013 at 11:03 PM, Mark Budde wrote: > > There is a bug in the SeqRecord slicing behavior. The bug crops up on > > circular records with a feature spanning the beginning and end of the > > plasmid. Any slice outside of the feature will return the feature, and > the > > feature.location.end is negative. > > > >>>> record = SeqIO.read('pUC19_mod.gb', 'genbank') > >>>> record > > > SeqRecord(seq=Seq('TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGA...GTC', > > IUPACAmbiguousDNA()), id='pUC19', name='pUC19', description='', > dbxrefs=[]) > >>>> record.features > > [SeqFeature(FeatureLocation(ExactPosition(2299), ExactPosition(200), > > strand=1), type='misc_feature')] > > The issue is you've got start > end, which arguably should > raise an exception (there is a TODO in the code for that) Is this a circular record and a feature spanning the origin? > Yes, it is a circular plasmid with a feature spanning the origin. There are legitimate reasons to have features span the origin, so please do not raise an exception. I think the provided code is the best solution to the problem (and completely fixes the problems within my personal code when this is an issue), but would be interested in hearing other suggestions. > > > On a related note, is there an appropriate way to modify the position of > a > > SeqFeature? I have been doing "feature.location._end = > > ExactPosition(newEnd)" , but I was under the impression that I shouldn't > > modify objects beginning with an underscore. > > Yes, things starting with a single underscore should be > regarded as private and not used. 
Currently that appears > to be setup as a read only property (which you can change > directly using feature.location._end = new_value) and > right now I'm not sure why that was done, but it has been > read only for since Bio.SeqFeature was first written 12 > years ago. Maybe no one has asked till now? Well the alternative is to make a new feature and import all of the other atributes, but that seems like a lot of work for no practical gain. Thanks, Mark From p.j.a.cock at googlemail.com Wed May 29 09:26:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 May 2013 10:26:25 +0100 Subject: [Biopython] SeqRecord slicing bug and fix In-Reply-To: References: Message-ID: On Wed, May 29, 2013 at 2:22 AM, Mark Budde wrote: > On Tue, May 28, 2013 at 4:09 PM, Peter Cock > wrote: >> >> On Tue, May 28, 2013 at 11:03 PM, Mark Budde wrote: >> > There is a bug in the SeqRecord slicing behavior. The bug crops up on >> > circular records with a feature spanning the beginning and end of the >> > plasmid. Any slice outside of the feature will return the feature, and >> > the >> > feature.location.end is negative. >> > >> >>>> record = SeqIO.read('pUC19_mod.gb', 'genbank') >> >>>> record >> > >> > SeqRecord(seq=Seq('TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGA...GTC', >> > IUPACAmbiguousDNA()), id='pUC19', name='pUC19', description='', >> > dbxrefs=[]) >> >>>> record.features >> > [SeqFeature(FeatureLocation(ExactPosition(2299), ExactPosition(200), >> > strand=1), type='misc_feature')] >> >> The issue is you've got start > end, which arguably should >> raise an exception (there is a TODO in the code for that) >> >> Is this a circular record and a feature spanning the origin? > > Yes, it is a circular plasmid with a feature spanning the origin. 
There are > legitimate reasons to have features span the origin, The SeqFeature system is very heavily influenced by the GenBank/EMBL feature table, meaning you'd write this like join(2300...3000,1..200) where for the sake of argument I've assumed the genome is 3000 long. Currently join features are handled with sub_features, but this is about to change in the forthcoming Biopython 1.62 release which introduces CompoundLocation objects instead. If you fancy trying the latest Biopython from github, you should be able create this example with: wrap_location = FeatureLocation(2299, 3000) + FeatureLocation(0, 200) (Doing the equivalent with the current system in Biopython 1.61 or older is far more complicated) > ... so please do not raise > an exception. I think the provided code is the best solution to the problem > (and completely fixes the problems within my personal code when this is an > issue), but would be interested in hearing other suggestions. An exception if start > end would prevent downstream surprises in code like __getitem__ which assumes this. There are other problems which prevent allowing this - for example it is impossible to calculate the length of the location (and therefore the SeqFeature) without also knowing the circular genome's length. Likewise __contains__ (which is used for testing if an integer position is in a feature) and __iter__ (which is used for iterating over the integer positions within a feature) would break. Unfortunately your initial code suggestion would only solve a small subset of the feature location functionality. Basically trying to support wrapped features like this would require a *lot* of special case code, but add very little over the current join-based approach which also handles many other biological annotations nicely like spliced genes. >> > On a related note, is there an appropriate way to modify the position of >> > a SeqFeature? 
I have been doing "feature.location._end = >> > ExactPosition(newEnd)" , but I was under the impression that I shouldn't >> > modify objects beginning with an underscore. >> >> Yes, things starting with a single underscore should be >> regarded as private and not used. Currently that appears >> to be setup as a read only property (which you can change >> directly using feature.location._end = new_value) and >> right now I'm not sure why that was done, but it has been >> read only for since Bio.SeqFeature was first written 12 >> years ago. Maybe no one has asked till now? > > Well the alternative is to make a new feature and import all of the other > atributes, but that seems like a lot of work for no practical gain. No, you'd only need to create a new FeatureLocation and then feature.location = new_location but this is still a hassle, and definitely worth looking at changing. Thanks for raising this. One reason a read-only FeatureLocation *could* be nice is if we had any clever indexing code which could break if the locations were liable to change. But we don't have anything like that within Biopython at the moment. We should probably continue this on the biopython-dev list, in case there is a more practical downside to allowing the FeatureLocation start and end to be updated which I'm currently missing. Regards, Peter From chapmanb at 50mail.com Wed May 29 16:32:29 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 29 May 2013 12:32:29 -0400 Subject: [Biopython] gff3: feature.location.end problem In-Reply-To: References: Message-ID: <87d2s9zqzm.fsf@fastmail.fm> Mic; >> ##gff-version 3 >> ##sequence-region ID1 1 20 >> ID1 prediction gene 1 20 10.0 + . >> other=Some,annotations;ID=gene1 [...] >> I get the following output: >> ID1 1 20 >> gene1 0 20 >> >> Why is it not "gene1 0 19" and "ID1 0 19"? > That looks correct, just like when parsing a GenBank/EMBL > feature with a location string 1..20 you'd get the start as 0 > and the end as 20 in Biopython. 
This is using Python style > slice notation - the start is inclusive and the end is exclusive > meaning sequence[0:20] will give the first 20 bases as you > would expect for this location. Peter is right on with the conversion information: you expect this to be 0, 20. This is Python 0-based indexing so you convert from GFF 1-based by subtracting 1 from the start base. The code wasn't doing anything special with the sequence-region directive, which is why its values stay as a raw parse of the text: 1 20. I agree it would be useful to convert these to 0-based for consistency. I pushed a fix which handles this as well: https://github.com/chapmanb/bcbb/commit/51e7f2742059608f98d948fca5b342a9edf9e7a8 Thanks for the feedback, Brad From amitbikram87 at gmail.com Thu May 30 04:34:32 2013 From: amitbikram87 at gmail.com (amit bikram) Date: Thu, 30 May 2013 10:04:32 +0530 Subject: [Biopython] Error in biopython Message-ID: Hi Peter, I tried what you wrote but it is not working. Here is the error: >>> import Bio >>> print Bio._file_ Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute '_file_' >>> print Bio_version_ Traceback (most recent call last): File "", line 1, in NameError: name 'Bio_version_' is not defined >>> with regards Amit From p.j.a.cock at googlemail.com Thu May 30 08:26:38 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 09:26:38 +0100 Subject: [Biopython] Error in biopython In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 5:34 AM, amit bikram wrote: > Hi Peter, > > I tried what you wrote but it is not working. 
here is the error > >>>> import Bio >>>> print Bio._file_ > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'module' object has no attribute '_file_' >>>> print Bio_version_ > Traceback (most recent call last): > File "", line 1, in > NameError: name 'Bio_version_' is not defined >>>> Hi Amit, There are some subtle and important differences: import Bio print Bio.__file__ print Bio.__version__ Those are all double underscores, and also there should be a dot between the Bio and __version__. You'll find double underscores are used in Python for some special functions (which normally you won't need to worry about). The good news is the simple import seemed to work - so if you can try again that will help answer your original question about this failing: from Bio.Seq import Seq Right now my guess would be a broken install, or you've got a file in the current directory called Bio.py which is being imported instead of Biopython. Regards, Peter From p.j.a.cock at googlemail.com Thu May 30 10:19:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 11:19:15 +0100 Subject: [Biopython] Error in biopython In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 11:07 AM, amit bikram wrote: > Hi Peter, > > I tried again i got it now, now it is working fine. > Thank u... > > Regards > Amit Great :) Peter From p.j.a.cock at googlemail.com Thu May 30 11:46:37 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 12:46:37 +0100 Subject: [Biopython] Versions of Python 3 to support in Biopython? Message-ID: Dear Biopythoneers, For the forthcoming Biopython 1.62 release, we are planning to officially support Python 3 (as well as Python 2, including PyPy, and Jython). However, which versions of Python 3 would people want to use? One possibility is we'd require at least Python 3.2.5 (which would simplify dealing with things broken in older releases of Python 3). 
Alternatively, would it be acceptable to insist on at least Python 3.3 for example? If you are interested in running Biopython under Python 3 (which you can already try out), please could you reply with what version of Python 3 you have installed, and if being required to update would be a problem or not. Thank you, Peter From xwchen at yeah.net Thu May 30 12:09:38 2013 From: xwchen at yeah.net (=?UTF-8?B?6ZmI5pmT5paH?=) Date: Thu, 30 May 2013 20:09:38 +0800 (CST) Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: <28aebaa4.a451.13ef557e533.Coremail.xwchen@yeah.net> hi Peter, python 3.3 is being used. Thanks. -- ???? Xiaowen Chen Institute of Hydrobiology, Chinese Academy of Sciences #7 Donghu South Rd, Wuhan, Hubei, 430072, P. R. China At 2013-05-30 19:46:37,"Peter Cock" wrote: >Dear Biopythoneers, > >For the forthcoming Biopython 1.62 release, we are planning to >officially support Python 3 (as well as Python 2, including PyPy, >and Jython). However, which versions of Python 3 would people >want to use? > >One possibility is we'd require at least Python 3.2.5 (which would >simplify dealing with things broken in older releases of Python 3). > >Alternatively, would it be acceptable to insist on at least Python 3.3 >for example? > >If you are interested in running Biopython under Python 3 >(which you can already try out), please could you reply with >what version of Python 3 you have installed, and if being >required to update would be a problem or not. > >Thank you, > >Peter >_______________________________________________ >Biopython mailing list - Biopython at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biopython From w.arindrarto at gmail.com Thu May 30 12:51:44 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 30 May 2013 14:51:44 +0200 Subject: [Biopython] Versions of Python 3 to support in Biopython? 
In-Reply-To: References: Message-ID: Hi everyone, > For the forthcoming Biopython 1.62 release, we are planning to > officially support Python 3 (as well as Python 2, including PyPy, > and Jython). However, which versions of Python 3 would people > want to use? > > One possibility is we'd require at least Python 3.2.5 (which would > simplify dealing with things broken in older releases of Python 3). > > Alternatively, would it be acceptable to insist on at least Python 3.3 > for example? > > If you are interested in running Biopython under Python 3 > (which you can already try out), please could you reply with > what version of Python 3 you have installed, and if being > required to update would be a problem or not. I'm leaning towards insisting on Python >=3.3 support (I'm running 3.3.2). I suppose that even if Python3.3 is not available on a machine or through the default package manager, it's always installable on its own. If that's not the case, I imagine Python2.x is most likely present in these machines (so Biopython can still be used). On a related note, do we have a defined timeline on when we would drop support for Python2.x? Are there any plans to have our codebase written in Python3.x instead of Python2.x? Best, Bow From p.j.a.cock at googlemail.com Thu May 30 13:13:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 14:13:20 +0100 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: Thank you for all the comments so far, don't stop yet :) On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto wrote: > Hi everyone, > > I'm leaning towards insisting on Python >=3.3 support (I'm running > 3.3.2). I suppose that even if Python3.3 is not available on a machine > or through the default package manager, it's always installable on its > own. If that's not the case, I imagine Python2.x is most likely > present in these machines (so Biopython can still be used). True. 
So far everyone who has replied (including some off list) have said they are using Python 3.3 which is encouraging. Thank you for the comments so far. It looks like we can forget about Python 3.1, and just need to decide if it is worth including Python 3.2.5 in the short term. > On a related note, do we have a defined timeline on when we would drop > support for Python2.x? Are there any plans to have our codebase > written in Python3.x instead of Python2.x? Nothing concrete planned, no. I'll reply in more detail on the biopython-dev list as I do have some thoughts about this. Regards, Peter From ivangreg at gmail.com Thu May 30 13:23:43 2013 From: ivangreg at gmail.com (Ivan Gregoretti) Date: Thu, 30 May 2013 09:23:43 -0400 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: Hi Bow, I think that we should drop support for Python 2.x once it is left out in favour of Python 3. I am not aware of any major linux distrubution that uses Python 3 as default. By major linux distribution I mean Debian, Ubuntu, CentOS, Fedora and Red Hat Enterprise Linux. Out of all the five distributions listed above, most administrators use CentOS. Perhaps we should schedule Python v2.x support to be dropped when CentOS switches to Python 3. That is likely to happen a long time from now. Thank you, Ivan Ivan Gregoretti, PhD Bioinformatics On Thu, May 30, 2013 at 8:51 AM, Wibowo Arindrarto wrote: > Hi everyone, > >> For the forthcoming Biopython 1.62 release, we are planning to >> officially support Python 3 (as well as Python 2, including PyPy, >> and Jython). However, which versions of Python 3 would people >> want to use? >> >> One possibility is we'd require at least Python 3.2.5 (which would >> simplify dealing with things broken in older releases of Python 3). >> >> Alternatively, would it be acceptable to insist on at least Python 3.3 >> for example? 
>> >> If you are interested in running Biopython under Python 3 >> (which you can already try out), please could you reply with >> what version of Python 3 you have installed, and if being >> required to update would be a problem or not. > > I'm leaning towards insisting on Python >=3.3 support (I'm running > 3.3.2). I suppose that even if Python3.3 is not available on a machine > or through the default package manager, it's always installable on its > own. If that's not the case, I imagine Python2.x is most likely > present in these machines (so Biopython can still be used). > > On a related note, do we have a defined timeline on when we would drop > support for Python2.x? Are there any plans to have our codebase > written in Python3.x instead of Python2.x? > > Best, > Bow > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at googlemail.com Thu May 30 13:36:36 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 14:36:36 +0100 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 2:23 PM, Ivan Gregoretti wrote: > Hi Bow, > > I think that we should drop support for Python 2.x once it is left out > in favour of Python 3. > > I am not aware of any major linux distrubution that uses Python 3 as > default. By major linux distribution I mean Debian, Ubuntu, CentOS, > Fedora and Red Hat Enterprise Linux. > > Out of all the five distributions listed above, most administrators > use CentOS. Perhaps we should schedule Python v2.x support to be > dropped when CentOS switches to Python 3. That is likely to happen a > long time from now. 
> > Thank you, > > Ivan I agree that dropping Python 2 support is still a long way away :) I was particularly hoping to find out if many people are using Python 3.2 (or worse, Python 3.1) or if we can assume Python 3.3 or later. Thanks, Peter From nlindberg at mkei.org Thu May 30 13:34:56 2013 From: nlindberg at mkei.org (Nick Lindberg) Date: Thu, 30 May 2013 13:34:56 +0000 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: Message-ID: Hello, The current production versions are Python 2.7.5 and Python 3.3.2. Per Ivan's comments, I do not think you should drop support for 2.x when adding for 3.x. However, I think it's completely fair that you should require the latest current production version of each (or at least put the caveat in that it's only been tested on 2.7.* and 3.3.*. If you need any help with the testing or compatibility, I'd be glad to help. Also, for any Mac users out there, I am working on a Homebrew formula for a package-install of Biopython. More details incoming. Thanks, Nick Lindberg Sr. Consulting Engineer, HPC Milwaukee Institute 414.727.6413 (W) http://www.mkei.org On 5/30/13 8:23 AM, "Ivan Gregoretti" wrote: >Hi Bow, > >I think that we should drop support for Python 2.x once it is left out >in favour of Python 3. > >I am not aware of any major linux distrubution that uses Python 3 as >default. By major linux distribution I mean Debian, Ubuntu, CentOS, >Fedora and Red Hat Enterprise Linux. > >Out of all the five distributions listed above, most administrators >use CentOS. Perhaps we should schedule Python v2.x support to be >dropped when CentOS switches to Python 3. That is likely to happen a >long time from now. > >Thank you, > >Ivan > > > > >Ivan Gregoretti, PhD >Bioinformatics > > >On Thu, May 30, 2013 at 8:51 AM, Wibowo Arindrarto > wrote: >> Hi everyone, >> >>> For the forthcoming Biopython 1.62 release, we are planning to >>> officially support Python 3 (as well as Python 2, including PyPy, >>> and Jython). 
However, which versions of Python 3 would people >>> want to use? >>> >>> One possibility is we'd require at least Python 3.2.5 (which would >>> simplify dealing with things broken in older releases of Python 3). >>> >>> Alternatively, would it be acceptable to insist on at least Python 3.3 >>> for example? >>> >>> If you are interested in running Biopython under Python 3 >>> (which you can already try out), please could you reply with >>> what version of Python 3 you have installed, and if being >>> required to update would be a problem or not. >> >> I'm leaning towards insisting on Python >=3.3 support (I'm running >> 3.3.2). I suppose that even if Python3.3 is not available on a machine >> or through the default package manager, it's always installable on its >> own. If that's not the case, I imagine Python2.x is most likely >> present in these machines (so Biopython can still be used). >> >> On a related note, do we have a defined timeline on when we would drop >> support for Python2.x? Are there any plans to have our codebase >> written in Python3.x instead of Python2.x? >> >> Best, >> Bow >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >_______________________________________________ >Biopython mailing list - Biopython at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at googlemail.com Thu May 30 13:54:41 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 14:54:41 +0100 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: Hi Nick, On Thu, May 30, 2013 at 2:34 PM, Nick Lindberg wrote: > Hello, > > The current production versions are Python 2.7.5 and Python 3.3.2. Per > Ivan's comments, I do not think you should drop support for 2.x when > adding for 3.x. 
However, I think it's completely fair that you should > require the latest current production version of each (or at least put the > caveat in that it's only been tested on 2.7.* and 3.3.*). In case I was unclear, we are not talking about dropping support for Python 2 any time soon - the plan for Biopython 1.62 is to cover at least Python 2.5, 2.6, 2.7 and 3.3. We will continue to support Python 2 for a long time (i.e. at least one year, maybe more). What I'm hoping to confirm is that very few people care about Python 3.2 support - it's good to know that so far everyone is happy to target Python 3.3 onwards. > If you need any help with the testing or compatibility, I'd be glad to > help. > > Also, for any Mac users out there, I am working on a Homebrew formula for > a package-install of Biopython. More details incoming. > > Thanks, > > Nick Lindberg Periodically running the unit tests on the latest code from git and reporting any issues is always a good idea. We do try to do this automatically via TravisCI and BuildBot, but this only covers a fraction of the possible configuration permutations. Testing under recent Windows machines including 64 bit Windows would be most appreciated - but requires more background knowledge to set up the relevant compiler environments. If you'd be interested in that or helping set up another buildslave which we can run nightly tests on, please write to us on the biopython-dev list for more details. 
Thanks, Peter From p.j.a.cock at googlemail.com Thu May 30 13:56:52 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 14:56:52 +0100 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: On Mon, May 27, 2013 at 6:58 PM, francesco chiani wrote: > Hi Peter, > As you said, the input is a list that looks like > unigene_list=[" Z26634","....",".."] > The output is another list gene_symbol =["ANK2", "...",".."] > Sorry, I've not had a chance to try this - but I would start by looking at the NCBI Entrez elink functionality. Peter From p.j.a.cock at googlemail.com Thu May 30 16:18:41 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 30 May 2013 17:18:41 +0100 Subject: [Biopython] Biopython projects with NESCent for GSoC 2013 In-Reply-To: References: Message-ID: Dear all, After the disappointing news that the Open Bioinformatics Foundation (OBF) was not accepted as a Google Summer of Code (GSoC) organisation this year, Biopython was fortunate to once again offer some projects with the NESCent team: http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013 As always the student proposals have been very competitive, and we've not been able to take on everyone. This year NESCent was fortunate to be able to accept seven students through GSoC and one through the GNOME Outreach Program for Women. Two of these GSoC projects are Biopython related: Codon Alignment and Analysis in Biopython Student: Zheng Ruan Mentors: Eric Talevich, Peter Cock http://www.google-melange.com/gsoc/project/google/gsoc2013/rzzmh12345/32001 Phylogenetics in Biopython: Filling in the gaps Student: Yanbo Ye Mentors: Mark Holder, Jeet Sukumaran, Eric Talevich http://www.google-melange.com/gsoc/project/google/gsoc2013/yeyanbo/45001 Thank you NESCent, and congratulations to Zheng Ruan and Yanbo Ye! 
I'm hoping you're already setting up a blog, which I hope you'll be able to use for roughly weekly progress reports during the summer - CC'd to the biopython-dev mailing list and the NESCent Phyloinformatics Summer of Code forum on Google+, http://lists.open-bio.org/mailman/listinfo/biopython-dev https://plus.google.com/communities/105828320619238393015 An introduction to your project would be a great idea for your first post - here's Bow's from last year as an example: http://bow.web.id/blog/2012/04/google-summer-of-code-is-on/ http://bow.web.id/blog/2012/08/summers-over/ http://bow.web.id/blog/tag/gsoc/ The idea here is to keep the wider community informed about how your project is going. On behalf of the Biopython developers, congratulations! We're looking forward to another productive Summer of Code :) Peter From eric.talevich at gmail.com Thu May 30 16:26:43 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 30 May 2013 12:26:43 -0400 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 9:13 AM, Peter Cock wrote: > Thank you for all the comments so far, don't stop yet :) > > On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto > wrote: > > Hi everyone, > > > > I'm leaning towards insisting on Python >=3.3 support (I'm running > > 3.3.2). I suppose that even if Python3.3 is not available on a machine > > or through the default package manager, it's always installable on its > > own. If that's not the case, I imagine Python2.x is most likely > > present in these machines (so Biopython can still be used). > > True. > > So far everyone who has replied (including some off list) have said > they are using Python 3.3 which is encouraging. Thank you for > the comments so far. > > It looks like we can forget about Python 3.1, and just need to > decide if it is worth including Python 3.2.5 in the short term. 
> I don't always use Python 3.x, but when I do, I use the latest release (3.3 now). I wonder which one PyPy will target -- I assume they'll try to support the most recent syntax. For anything that needs to be run on machines I don't control, I still target 2.7, though I hope to switch to 3 this year. -Eric From francesco.chiani at gmail.com Thu May 30 16:40:21 2013 From: francesco.chiani at gmail.com (francesco chiani) Date: Thu, 30 May 2013 18:40:21 +0200 Subject: [Biopython] converting unigene list in gene name list In-Reply-To: References: Message-ID: Hi Peter, Don't worry, I've just managed it this way:

from Bio import Entrez

name_list = ["M75161", "Z26634", "X02308"]
Entrez.email = "youremail at something.XXX"

# Step 1: look up the UniGene record IDs for each accession.
id_list = []
for name in name_list:
    handle = Entrez.esearch(db="unigene", term=name)
    record = Entrez.read(handle)
    id_list.append(record["IdList"])

# Step 2: fetch the summary for each hit and keep its gene symbol.
gene_symbol_list = []
for ids in id_list:
    handle = Entrez.esummary(db="unigene", id=ids)
    record = Entrez.read(handle)
    gene_symbol_list.append(record[0]["GENE"])
    print(record[0]["GENE"])

It may need some improvement for sure (for example a try/except block for invalid unigenes in the list) but it works. Thanks anyway for your help Francesco 2013/5/30 Peter Cock > On Mon, May 27, 2013 at 6:58 PM, francesco chiani > wrote: > > Hi Peter, > > As you said, the input is a list that looks like > > unigene_list=[" Z26634","....",".."] > > The output is another list gene_symbol =["ANK2", "...",".."] > > > > Sorry, I've not had a chance to try this - but I would start > by looking at the NCBI Entrez elink functionality. > > Peter > -- PhD Francesco Chiani CNR -- National Research Council of Italy Cell Biology and Neurobiology Institute (IBCN) Campus "A. Buzzati-Traverso" 32 via E. 
Ramarini 00015 Monterotondo Scalo (Roma) Italy tel: +39-0690091308 fax: +39-0690091260 From llewelr at gmail.com Thu May 30 16:58:03 2013 From: llewelr at gmail.com (Richard Llewellyn) Date: Thu, 30 May 2013 10:58:03 -0600 Subject: [Biopython] Fwd: Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: Oops meant for the list. ---------- Forwarded message ---------- From: Richard Llewellyn Date: Thu, May 30, 2013 at 10:57 AM Subject: Re: [Biopython] Versions of Python 3 to support in Biopython? To: Eric Talevich Moved entirely to Python 3.3 (well, only scikits-learn still might require 2.7), so not supporting earlier versions of Py 3 fine by me. Thanks so much. On Thu, May 30, 2013 at 10:26 AM, Eric Talevich wrote: > On Thu, May 30, 2013 at 9:13 AM, Peter Cock >wrote: > > > Thank you for all the comments so far, don't stop yet :) > > > > On Thu, May 30, 2013 at 1:51 PM, Wibowo Arindrarto > > wrote: > > > Hi everyone, > > > > > > I'm leaning towards insisting on Python >=3.3 support (I'm running > > > 3.3.2). I suppose that even if Python3.3 is not available on a machine > > > or through the default package manager, it's always installable on its > > > own. If that's not the case, I imagine Python2.x is most likely > > > present in these machines (so Biopython can still be used). > > > > True. > > > > So far everyone who has replied (including some off list) have said > > they are using Python 3.3 which is encouraging. Thank you for > > the comments so far. > > > > It looks like we can forget about Python 3.1, and just need to > > decide if it is worth including Python 3.2.5 in the short term. > > > > I don't always use Python 3.x, but when I do, I use the latest release (3.3 > now). I wonder which one PyPy will target -- I assume they'll try to > support the most recent syntax. > > For anything that needs to be run on machines I don't control, I still > target 2.7, though I hope to switch to 3 this year. 
> > -Eric > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From mmokrejs at fold.natur.cuni.cz Thu May 30 18:39:50 2013 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Thu, 30 May 2013 20:39:50 +0200 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: <51A79CF6.5010705@fold.natur.cuni.cz> Ivan Gregoretti wrote: > Hi Bow, > > I think that we should drop support for Python 2.x once it is left out > in favour of Python 3. > > I am not aware of any major linux distrubution that uses Python 3 as > default. By major linux distribution I mean Debian, Ubuntu, CentOS, > Fedora and Red Hat Enterprise Linux. > > Out of all the five distributions listed above, most administrators > use CentOS. Perhaps we should schedule Python v2.x support to be > dropped when CentOS switches to Python 3. That is likely to happen a > long time from now. Hi, I don't think 2.x supports needs to be scheduled for the sake of having a deadline. It is only meaningful to come up with a date *once the extra development overhead is NOT acceptable anymore*. For sure by that time the yet to be determined time window will be of some reasonable width so that people start screaming up. But that is far from today. In general, I am very opposed to deprecations of any programming language API. This just puts a wasteful burden on developers to rewrite their possibly years working apps just due to API change. Some people even don't want to touch certains parts of their code. Or they don't have time to do that or stopped changing their fully working application ... hence why should they re-start their work if for several years they did NOT need to touch the code? 
In such cases, when the developer does not take action, the API change kills
the existing and working application because people just don't use it anymore
on incompatible systems.

Seriously, look at mod_python. Clearly, for my purposes mod_python was and
still is just enough, runs on my servers for 8-10 years and I do NOT care that
it is NOW not developed anymore. I do not care about tiny implementation
details which resulted in abandoning of the project because I just do NOT need
those missing features. But, changing my apps, config files, server config
files, re-doing the testing is just a blocker for me. I would rather stop
upgrading the system because the transition gives me nothing. And I am not the
only one. Look into comments at https://bugs.gentoo.org/show_bug.cgi?id=343663
and then poke towards URLs in https://bugs.gentoo.org/show_bug.cgi?id=343663#c6 .
There you can read about mod_python but the homework is where else should you
move!

Similarly, anybody willing to add PHP-5.4 support to EST2uni which was written
for 5.3 (http://cichlid.umd.edu/est2uni/)?

In summary, somebody should really make a table listing of anticipated
3rd-party python-based apps which are likely to be imported into python code
alongside biopython. More pragmatically, somebody please make sure your future
decisions are not conflicting with numpy+matplotlib.

Personally I want to play in the near future with these to post-process/compile
python code (incl. biopython):

http://www.nuitka.net/ (supports 2.5, 2.6, 3.2)
https://github.com/astrand/pyobfuscate
http://python.net/crew/atuining/cx_Freeze (supports 2.3, 2.4, 2.5)
http://bitboost.com/python-obfuscator/manual (supports 2.5, partially 2.6 and 2.7)

But I have read other answers which landed on the biopython's list meanwhile
and am glad to hear from Peter that the original question was really about what
3.x version should be supported and NOT about stopping certain python 2.x
compatibility.
That's good to hear and thanks for keeping the compatibility so far. ;) Martin From arklenna at gmail.com Thu May 30 19:21:30 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 30 May 2013 15:21:30 -0400 Subject: [Biopython] Versions of Python 3 to support in Biopython? In-Reply-To: References: Message-ID: On Thu, May 30, 2013 at 9:23 AM, Ivan Gregoretti wrote: > > I am not aware of any major linux distrubution that uses Python 3 as > default. By major linux distribution I mean Debian, Ubuntu, CentOS, > Fedora and Red Hat Enterprise Linux. > I think this is a point worth reiterating - very few people will be "stuck" using Python 3.1 because unlike old versions of 2.x, 3 isn't shipped with distros yet. So anyone who chooses to use Python 3.x should be able to install 3.3. Cheers, Lenna From ajingnk at gmail.com Thu May 30 19:33:40 2013 From: ajingnk at gmail.com (Jing Lu) Date: Thu, 30 May 2013 15:33:40 -0400 Subject: [Biopython] How to make best guest of protein class from protein sequence by biopython Message-ID: Hi Biopythonian, I am not quit sure what the best way is to predict the name of protein class (e.g. Oxidoreductases) from protein sequence by biopython. Just Blast to the whole PDB and read some attributes of the result? I am not every familiar with modules in biopython.. Thanks, Jing From jmtc21 at bath.ac.uk Thu May 30 19:48:59 2013 From: jmtc21 at bath.ac.uk (Jaime Tovar) Date: Thu, 30 May 2013 20:48:59 +0100 Subject: [Biopython] Problem parsing embl files Message-ID: <51A7AD2B.1000303@bath.ac.uk> Hi all, Is the first time I try to parse embl files with biopython. I'm trying to get the gene ids and coordinates for start/end of each gene. I thought it will be straight forward like with other annotation files, so I did a small script to test it. 

    from Bio import SeqIO
    if __name__ == '__main__':
        handle = open("sctg_0.embl", "r")
        records = SeqIO.parse(handle, "embl")
        for record in records :
            print(record)

But when running the script I get an error which may suggest the embl
files have an issue

    ValueError: Premature end of features table, marker '//' found

I checked the source code of the parser and seems the embl file has
problems, but when I checked embl file format seems they are ok. I have
a few thousand files formatted in the same way. So can't think about
other way to deal with the problem but to parse them.

The annotation files have only annotation info, no sequences. Here I
uploaded an example.

http://depositfiles.com/files/481uob95e

I'm using python 2.7.4 and biopython 1.61 on a win x64 computer.

Any advice and suggestion will be greatly appreciated.

Jaime.

From p.j.a.cock at googlemail.com Thu May 30 22:03:21 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 May 2013 23:03:21 +0100
Subject: [Biopython] Problem parsing embl files
In-Reply-To: <51A7AD2B.1000303@bath.ac.uk>
References: <51A7AD2B.1000303@bath.ac.uk>
Message-ID:

On Thu, May 30, 2013 at 8:48 PM, Jaime Tovar wrote:
> Hi all,
>
> Is the first time I try to parse embl files with biopython. I'm trying to
> get the gene ids and coordinates for start/end of each gene.
>
> I thought it will be straight forward like with other annotation files, so I
> did a small script to test it.
>
>     from Bio import SeqIO
>     if __name__ == '__main__':
>         handle = open("sctg_0.embl", "r")
>         records = SeqIO.parse(handle, "embl")
>         for record in records :
>             print(record)
>
> But when running the script I get an error which may suggest the embl files
> have an issue
>
> ValueError: Premature end of features table, marker '//' found
>
> I checked the source code of the parser and seems the embl file has
> problems, but when I checked embl file format seems they are ok.

If they are like your example, they are a bit unusual.

> I have a
> few thousand files formatted in the same way. So can't think about other way
> to deal with the problem but to parse them.
>
> The annotation files have only annotation info, no sequences. Here I
> uploaded an example.
>
> http://depositfiles.com/files/481uob95e
>
> I'm using python 2.7.4 and biopython 1.61 on a win x64 computer.
>
> Any advice and suggestion will be greatly appreciated.
>
> Jaime.

Hi Jamie,

For sharing plain text files, http://gist.github.com is a nicer option.

The problem is your file looks like this:

ID   sctg_0 standard; DNA; DIV; 3745584 BP.
XX
AC   sctg_0;
XX
FH   Key             Location/Qualifiers
FH
FT   CDS             302..490
FT                   /note="EuGene predicted gene nr: Esi0000_0001"
...
FT   mRNA            complement(3744791..3745584)
FT                   /note="EuGene predicted gene nr: Esi0000_0662"
//

The parser is expecting an SQ line after the FT lines before the //
As you said, your files lack any sequence information - is that deliberate?

This is not something I've seen before, but we can probably
modify the EMBL parser to cope with this - much like how
GenBank files can omit the actual sequence data.

On the other hand, the SQ line is not defined as optional
so perhaps we are doing the right thing and rejecting an
invalid file? ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt

Where did your EMBL format file come from?

Thanks,

Peter

From jmtc21 at bath.ac.uk Thu May 30 22:55:28 2013
From: jmtc21 at bath.ac.uk (Jaime Tovar)
Date: Thu, 30 May 2013 23:55:28 +0100
Subject: [Biopython] Problem parsing embl files
In-Reply-To:
References: <51A7AD2B.1000303@bath.ac.uk>
Message-ID: <51A7D8E0.3000704@bath.ac.uk>

Hi Peter,

I checked a similar version with the description of the embl format.
They are bit ambiguous, I think. From the definition we have:

    XX - spacer line                 (many per entry)
    SQ - sequence header             (1 per entry)
    CO - contig/construct line       (0 or >=1 per entry)
    bb - (blanks) sequence data      (>=1 per entry)
    // - termination line            (ends each entry; 1 per entry)

At first I read SQ ...
(1 per entry) and thought it meant there must be one
of them. And similar situations for the rest (many per entry, >=1). But
for example from the same definition we have:

    DT - date                        (2 per entry)

But my file does not have DT and the parser was not complaining about
it, so it made me think maybe I was doing something wrong. To be honest
I can't say I'm sure if it means there should be a SQ or if it is
optional but can only show once per entry.

The files are not mine. Are third party files I got from another
researcher, who in turn got them from someone else, so... They are
annotations for algae contigs as far as I know. Not sure why they don't
have the sequence part.

To be honest I don't know if it is worth making changes to the parser. I
can't say these files are actually well formatted. Maybe someone with
more experience with embl files can give a second opinion.

You think I can cheat the parser if I just 'sed' my embl files and
replace the // with something like:

"""XX
SQ


//"""

I didn't know github had gist :) I have some animadversion against
github so I never use them :D

Thanks for the help!

Jaime.

On 30/05/2013 23:03, Peter Cock wrote:
> On Thu, May 30, 2013 at 8:48 PM, Jaime Tovar wrote:
>> Hi all,
>>
>> Is the first time I try to parse embl files with biopython. I'm trying to
>> get the gene ids and coordinates for start/end of each gene.
>>
>> I thought it will be straight forward like with other annotation files, so I
>> did a small script to test it.
>>
>>     from Bio import SeqIO
>>     if __name__ == '__main__':
>>         handle = open("sctg_0.embl", "r")
>>         records = SeqIO.parse(handle, "embl")
>>         for record in records :
>>             print(record)
>>
>> But when running the script I get an error which may suggest the embl files
>> have an issue
>>
>> ValueError: Premature end of features table, marker '//' found
>>
>> I checked the source code of the parser and seems the embl file has
>> problems, but when I checked embl file format seems they are ok.
> If they are like your example, they are a bit unusual. > >> I have a >> few thousand files formatted in the same way. So can't think about other way >> to deal with the problem but to parse them. >> >> The annotation files have only annotation info, no sequences. Here I >> uploaded an example. >> >> http://depositfiles.com/files/481uob95e >> >> I'm using python 2.7.4 and biopython 1.61 on a win x64 computer. >> >> Any advice and suggestion will be greatly appreciated. >> >> Jaime. > Hi Jamie, > > For sharing plain text files, http://gist.github.com is a nicer option. > > The problem is your file looks like this: > > ID sctg_0 standard; DNA; DIV; 3745584 BP. > XX > AC sctg_0; > XX > FH Key Location/Qualifiers > FH > FT CDS 302..490 > FT /note="EuGene predicted gene nr: Esi0000_0001" > ... > FT mRNA complement(3744791..3745584) > FT /note="EuGene predicted gene nr: Esi0000_0662" > // > > The parser is expecting an SQ line after the FT lines before the // > As you said, your files lack any sequence information - is that deliberate? > > This is not something I've seen before, but we can probably > modify the EMBL parser to cope with this - much like how > GenBank files can omit the actual sequence data. > > On the other hand, the SQ line is not defined as optional > so perhaps we are doing the right thing and rejecting an > invalid file? ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt > > Where did your EMBL format file come from? > > Thanks, > > Peter From p.j.a.cock at googlemail.com Fri May 31 08:43:28 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 31 May 2013 09:43:28 +0100 Subject: [Biopython] Problem parsing embl files In-Reply-To: <51A7D8E0.3000704@bath.ac.uk> References: <51A7AD2B.1000303@bath.ac.uk> <51A7D8E0.3000704@bath.ac.uk> Message-ID: On Thu, May 30, 2013 at 11:55 PM, Jaime Tovar wrote: > Hi Peter, > > I checked a similar version with the description of the embl format. They > are bit ambiguous, I think. 
From the definition we have: > > XX - spacer line (many per entry) > SQ - sequence header (1 per entry) > CO - contig/construct line (0 or >=1 per entry) > bb - (blanks) sequence data (>=1 per entry) > // - termination line (ends each entry; 1 per entry) > > At first I read SQ ... (1 per entry) and thought it meant there most be one > of them. And similar situations for the rest (many per entry, >=1). But for > example from the same definition we have: > > DT - date (2 per entry) > > But my file doest not have DT and the parser was not complaining about it, > so it made me think maybe I was doing something wrong. To be honest I can't > say I'm sure if it means there should be a SQ or if it is optional but can > only show once per entry. Well yes, it does seem that missing DT lines is also technically invalid - but coping with that was quite simple. Missing a sequence in a sequence centric file format is rather more important ;) > The files are not mine. Are third party files I got from another researcher, > who in turn got them from someone else, so... They are annotations for algae > contigs as far as I know. Not sure why they don't have the sequence part. I would be interested to know how the files were prepared (e.g. which tool produced them), but this isn't vital. > To be honest I don't know if it is worth making changes to the parser. I > can't say these files are actually well formatted. Maybe someone with more > experience with embl files can give a second opinion. Good idea - anyone? > You think I can cheat the parser if I just 'sed' my embl files and replace > the \\ with something like: > > """XX > SQ > > > //""" Possibly - you'd need to do a little experimenting to find out the bare minimum that would allow the parser to continue without code changes. Peter From c0d3g33k at gmail.com Fri May 31 16:31:05 2013 From: c0d3g33k at gmail.com (c0d3g33k) Date: Fri, 31 May 2013 12:31:05 -0400 Subject: [Biopython] Versions of Python 3 to support in Biopython? 
In-Reply-To: <51A79CF6.5010705@fold.natur.cuni.cz>
References: <51A79CF6.5010705@fold.natur.cuni.cz>
Message-ID: <51A8D049.2060705@gmail.com>

Hi Martin,

On 5/30/2013 2:39 PM, Martin Mokrejs wrote:
>
> In general, I am very opposed to deprecations of any programming language
> API. This just puts a wasteful burden on developers to rewrite their possibly
> years working apps just due to API change. Some people even don't want to touch
> certains parts of their code. Or they don't have time to do that or stopped
> changing their fully working application ... hence why should they re-start
> their work if for several years they did NOT need to touch the code? In such
> cases when developer does not take an action the API change kills the existing
> and working application because people just don't use it anymore on incompatible
> systems.

I fully understand the sentiment - change for the sake of change is
unwelcome. Bear in mind, though, that this is a discussion about
versions of Python to support in *future* releases of Biopython. A
developer as conservatively paranoid as you describe isn't going to be
tracking the bleeding edge of Biopython unless he really enjoys being
self-contradictory.

These days, if stability is a high priority and you aren't using
virtualenv (http://www.virtualenv.org), "You're Doing It Wrong". Set up
a stable virtual environment for that application that's been working
for years and tested within an inch of its life and have done with it.
Let the Biopython developers move carefully forward without having to
drag the chains of sins past along with them forever like Jacob Marley
in A Christmas Carol.
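
A minimal sketch of the isolation idea described above, for illustration
only. It assumes Python 3.3 or later and uses the standard-library venv
module rather than the virtualenv tool named in the message; the directory
name legacy_app_env is hypothetical.

```python
import os
import tempfile
import venv

# Hypothetical location for the frozen environment; any writable path works.
env_dir = os.path.join(tempfile.mkdtemp(), "legacy_app_env")

# with_pip=False keeps this sketch offline; a real setup would normally
# enable pip and then install pinned versions of the application's
# dependencies (for example one specific Biopython release).
venv.create(env_dir, with_pip=False)

# pyvenv.cfg records which base interpreter the environment was built from.
cfg_path = os.path.join(env_dir, "pyvenv.cfg")
with open(cfg_path) as cfg:
    config_text = cfg.read()
print(config_text)
```

Once such an environment exists, the long-lived application can keep running
against the package versions installed inside it, whatever the system-wide
Python or Biopython later becomes.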
From jmtc21 at bath.ac.uk Fri May 31 19:12:32 2013 From: jmtc21 at bath.ac.uk (Jaime Tovar) Date: Fri, 31 May 2013 20:12:32 +0100 Subject: [Biopython] Problem parsing embl files In-Reply-To: References: <51A7AD2B.1000303@bath.ac.uk> <51A7D8E0.3000704@bath.ac.uk> Message-ID: <51A8F620.3040807@bath.ac.uk> Thanks Peter, I found gff3 files I can easily parse for the data I need. So will leave this strange embl files alone. If someone with more experience with embl files wants to take a look at them to check the parser let me know and I will forward some sample files. I asked the people who gave me the files if they know what kind of software they used to generate them. But since they are third party data they have no information. I was lucky they had also gff3 files to get gene annotation data. Jaime. On 31/05/2013 09:43, Peter Cock wrote: > On Thu, May 30, 2013 at 11:55 PM, Jaime Tovar wrote: >> Hi Peter, >> >> I checked a similar version with the description of the embl format. They >> are bit ambiguous, I think. From the definition we have: >> >> XX - spacer line (many per entry) >> SQ - sequence header (1 per entry) >> CO - contig/construct line (0 or >=1 per entry) >> bb - (blanks) sequence data (>=1 per entry) >> // - termination line (ends each entry; 1 per entry) >> >> At first I read SQ ... (1 per entry) and thought it meant there most be one >> of them. And similar situations for the rest (many per entry, >=1). But for >> example from the same definition we have: >> >> DT - date (2 per entry) >> >> But my file doest not have DT and the parser was not complaining about it, >> so it made me think maybe I was doing something wrong. To be honest I can't >> say I'm sure if it means there should be a SQ or if it is optional but can >> only show once per entry. > Well yes, it does seem that missing DT lines is also technically invalid - > but coping with that was quite simple. 
Missing a sequence in a sequence > centric file format is rather more important ;) > >> The files are not mine. Are third party files I got from another researcher, >> who in turn got them from someone else, so... They are annotations for algae >> contigs as far as I know. Not sure why they don't have the sequence part. > I would be interested to know how the files were prepared (e.g. which > tool produced them), but this isn't vital. > >> To be honest I don't know if it is worth making changes to the parser. I >> can't say these files are actually well formatted. Maybe someone with more >> experience with embl files can give a second opinion. > Good idea - anyone? > >> You think I can cheat the parser if I just 'sed' my embl files and replace >> the \\ with something like: >> >> """XX >> SQ >> >> >> //""" > Possibly - you'd need to do a little experimenting to find out the bare > minimum that would allow the parser to continue without code changes. > > Peter From jordan.r.willis at Vanderbilt.Edu Fri May 31 21:15:48 2013 From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R) Date: Fri, 31 May 2013 21:15:48 +0000 Subject: [Biopython] Custom Distance Matrices using Custom Scoring Matrices Message-ID: Hello Bio, I know I asked something like this a while ago but didn't really know what I needed to do. Now I think I know exactly the path, however the solution is unclear. The goal is to compare sequences all of the same length and view them in a dendrogram. In one tree I would like it to be scored with something simple like a PAM250 matrix. In another dendrogram, I would like the tree to be scored with my own custom position-specific scoring matrix. So it looks like I can use a neighbor-joining method using a distance matrix where the distance matrix will be all my sequences scored against each other using either the PAM250 or my custom matrix. Now, does Biopython have the means to do this? 
I can quickly write a method to score all my sequences against each other using PAM250 or my PSSM and store it in some sort of dictionary. Can I then convert that dictionary to a distance matrix to be used in neighbor joining? Is there a method to write out a newick tree using neighbor joining? Should I even be using Biopython? Thanks so much! Jordan From jordan.r.willis at Vanderbilt.Edu Fri May 31 22:59:26 2013 From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R) Date: Fri, 31 May 2013 22:59:26 +0000 Subject: [Biopython] Custom Distance Matrices using Custom Scoring Matrices In-Reply-To: References: Message-ID: Hello, I think my solution is to use a neighbor joining from the PHYLIP package. You can define distance matrices yourself, which I will write using biopython, but I don't think that it has been done before. If I get something nice and stable, I will contribute to the devel branch of biopython. Jordan On May 31, 2013, at 4:15 PM, "Willis, Jordan R" wrote: > Hello Bio, > > I know I asked something like this a while ago but didn't really know what I needed to do. Now I think I know exactly the path, however the solution is unclear. > > The goal is to compare sequences all of the same length and view them in a dendrogram. In one tree I would like it to be scored with something simple like a PAM250 matrix. In another dendrogram, I would like the tree to be scored with my own custom position-specific scoring matrix. > > So it looks like I can use a neighbor-joining method using a distance matrix where the distance matrix will be all my sequences scored against each other using either the PAM250 or my custom matrix. > > Now, does Biopython have the means to do this? I can quickly write a method to score all my sequences against each other using PAM250 or my PSSM and store it in some sort of dictionary. Can I then convert that dictionary to a distance matrix to be used in neighbor joining? Is there a method to write out a newick tree using neighbor joining? 
Should I even be using Biopython?
>
> Thanks so much!
>
> Jordan
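
The approach outlined in this thread (score every pair of equal-length
sequences, turn the scores into distances, hand the matrix to PHYLIP's
neighbor-joining program) could be sketched roughly as below. This is plain
Python with no Biopython calls. Everything in it is an assumption made for
illustration: the toy +2/-1 scoring matrix stands in for PAM250 or a custom
matrix, the sequence names and sequences are made up, and the conversion
d(a,b) = (S(a,a) + S(b,b))/2 - S(a,b) is just one simple way to get a
symmetric distance with a zero diagonal, not a method taken from the thread.

```python
def pair_score(seq_a, seq_b, matrix):
    """Sum per-position substitution scores for two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(x, y)] for x, y in zip(seq_a, seq_b))


def distance_matrix(seqs, matrix):
    """Return (names, dist) where dist[(a, b)] is a score-derived distance.

    Assumed conversion: mean of the two self-scores minus the pair score.
    """
    names = sorted(seqs)
    self_scores = {n: pair_score(seqs[n], seqs[n], matrix) for n in names}
    dist = {}
    for a in names:
        for b in names:
            s_ab = pair_score(seqs[a], seqs[b], matrix)
            dist[(a, b)] = (self_scores[a] + self_scores[b]) / 2.0 - s_ab
    return names, dist


def to_phylip(names, dist):
    """Format a square distance matrix in PHYLIP style (10-character names)."""
    lines = ["%5d" % len(names)]
    for a in names:
        row = " ".join("%.4f" % dist[(a, b)] for b in names)
        lines.append("%-10s%s" % (a[:10], row))
    return "\n".join(lines) + "\n"


# Toy symmetric scoring matrix: +2 for a match, -1 for a mismatch.
alphabet = "ACDE"
toy_matrix = {(x, y): (2 if x == y else -1) for x in alphabet for y in alphabet}

# Made-up example sequences, all of the same length.
seqs = {"seq1": "ACDE", "seq2": "ACDA", "seq3": "EEDA"}

names, dist = distance_matrix(seqs, toy_matrix)
phylip_text = to_phylip(names, dist)
print(phylip_text)
```

A position-specific scoring matrix would only change pair_score (the lookup
would also depend on the column index); the distance conversion and the
PHYLIP output would stay the same.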