From markbudde at gmail.com Mon Apr 1 14:41:43 2013
From: markbudde at gmail.com (Mark Budde)
Date: Mon, 1 Apr 2013 11:41:43 -0700
Subject: [Biopython] New to BP. Looking for closely spaced genes
Message-ID:

Hi,

Before I dive too far into BioPython, I'd like to get some input on whether BioPython is an appropriate tool for my task.

I would like to look at the human genome ORF structure and identify regions where ORFs are closely spaced but differentially regulated, and also identify whether the ORFs are facing the same direction or opposing directions. To do this, I assume I would first download the annotated genome and write a script in BioPython recording how far each ORF is from its neighbors and what the orientation is, and store the result in a dictionary. Then I would download some expression data sets and add this data to the dictionary. Then I would write some algorithm comparing gene distance, orientation and expression correlation to generate a list of candidate ORF pairs which fit my criteria.

My question is: is BioPython a reasonable tool to accomplish this, or is it going to be way too slow, so that some alternative package is better suited for my task?

Thanks,
Mark Budde

From dtomso at agbiome.com Mon Apr 1 15:09:39 2013
From: dtomso at agbiome.com (Dan Tomso)
Date: Mon, 1 Apr 2013 19:09:39 +0000
Subject: [Biopython] New to BP. Looking for closely spaced genes
In-Reply-To:
References:
Message-ID: <0bdbbf85a7284f21ad6d03aec6ac55cb@SN2PR03MB015.namprd03.prod.outlook.com>

Hi, Mark.

I think BioPython will have the tools you need to do the mechanical handling of sequences. You might want to contemplate various strategies to do the positional comparisons and data overlays. For example, if I were approaching this, I would start building position tables for the various content in SQL and then do the set/join/overlap work there.
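The position-table idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table layout, the gene names, the coordinates and the 1000 bp cutoff are all invented here, and in practice the rows would come from features parsed out of the annotated genome.

```python
import sqlite3

# In-memory database with one row per gene.
# All names, coordinates and strands below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genes "
    "(name TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER, strand INTEGER)"
)
genes = [
    ("geneA", "chr1", 1000, 2000, 1),
    ("geneB", "chr1", 2300, 3100, -1),
    ("geneC", "chr1", 9000, 9800, 1),
    ("geneD", "chr2", 500, 900, -1),
]
conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?, ?)", genes)

MAX_GAP = 1000  # hypothetical cutoff for "closely spaced", in base pairs

# Self-join: pair each gene with any downstream gene on the same chromosome
# whose start lies within MAX_GAP of this gene's end, and report whether
# the two genes are on the same strand.
query = """
    SELECT a.name, b.name, b.start_pos - a.end_pos AS gap,
           CASE WHEN a.strand = b.strand THEN 'same' ELSE 'opposite' END
    FROM genes AS a
    JOIN genes AS b
      ON a.chrom = b.chrom
     AND b.start_pos > a.end_pos
     AND b.start_pos - a.end_pos <= ?
    ORDER BY a.chrom, a.start_pos
"""
close_pairs = conn.execute(query, (MAX_GAP,)).fetchall()
for row in close_pairs:
    print(row)
```

Note that this join pairs a gene with every downstream gene inside the cutoff, not only its immediate neighbor, and it ignores overlapping genes; both cases would need extra handling on a real genome. Expression data could be added as a second table keyed on the gene name and joined in the same way.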

But to re-answer your primary question: yes, you can get the sequence and features parsed in BioPython with reasonable convenience.

Best regards,
Dan Tomso

_______________________________________________
Biopython mailing list - Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From jordan.r.willis at Vanderbilt.Edu Tue Apr 2 00:40:36 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Tue, 2 Apr 2013 04:40:36 +0000
Subject: [Biopython] Superimposer troubles
Message-ID:

Hello List,

I'm having trouble working through some issues with the Superimposer for all-atom superpositions. We often work on protein design, and our final PDB files differ in atom number and sometimes composition from our input.
I'm a big fan of the Superimposer, so we have implemented it like this:

    from Bio.PDB import PDBParser, Superimposer

    p = PDBParser()
    native_pdb = p.get_structure("input", "input.pdb")
    designed_pdb = p.get_structure("output", "output.pdb")

    native_ca_atoms = []
    native_all_atoms = []
    designed_ca_atoms = []
    designed_all_atoms = []

    # Walk both structures residue by residue, collecting CA atoms and all atoms.
    for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()):
        native_ca_atoms.append(native_residue['CA'])
        designed_ca_atoms.append(designed_residue['CA'])
        for (native_atom, designed_atom) in zip(native_residue.get_list(), designed_residue.get_list()):
            native_all_atoms.append(native_atom)
            designed_all_atoms.append(designed_atom)

    superpose_ca = Superimposer()
    superpose_all = Superimposer()

    # CA-only superposition (my_spiffy_rms_calculator is our own function, not shown).
    superpose_ca.set_atoms(native_ca_atoms, designed_ca_atoms)
    superpose_ca.apply(designed_pdb)
    ca_rms = my_spiffy_rms_calculator(native_ca_atoms, designed_ca_atoms)

    # All-atom superposition.
    superpose_all.set_atoms(native_all_atoms, designed_all_atoms)
    superpose_all.apply(designed_pdb)
    all_rms = my_spiffy_rms_calculator(native_all_atoms, designed_all_atoms)

For the CA atoms it's not really a big deal, since everything we design has a CA atom. However, when we go to all atoms, it turns out that the designed residue and the native residue can be different, thus leading to a different number of atoms. I didn't realize it, but the zip function was making these two lists as long as the shorter list and not necessarily matching up the atoms. It would just hack off some part of the larger list! This way, the Superimposer was never failing because it always had an equal number of atoms. Is the Superimposer smart enough to just minimize the RMSD no matter how the lists are input, no matter what order? For instance, if I put the same arginine's atoms backwards in one list and forwards in the other list, would it still be able to give a 0.0 RMSD?

Thank you for your feedback,
Jordan

PS. Does superimposer.rms give back the RMSD of whatever atoms you put into it? Or is it always the CA atoms?
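
The truncation described above can be reproduced without any structure files. The sketch below uses plain lists of atom names as stand-ins for the output of residue.get_list() (an assumption made for illustration; real Bio.PDB atoms are objects, not strings). It shows how zip() silently drops the extra atoms, and then pairs atoms by name instead. The helper pair_atoms_by_name is hypothetical and is not part of Biopython.

```python
# Stand-ins for the atom lists of one native and one designed residue.
# Only atom names are used here; real code would hold Bio.PDB Atom objects.
native_atoms = ["N", "CA", "C", "O", "CB", "OG"]
designed_atoms = ["N", "CA", "C", "O", "CB", "CG", "CD"]

# zip() stops at the shorter list: "CD" is dropped without any error,
# and "OG" ends up paired with "CG" although they are different atoms.
zipped = list(zip(native_atoms, designed_atoms))
print(len(zipped))
print(zipped[-1])


def pair_atoms_by_name(first, second):
    """Return (first, second) pairs for atom names present in both lists.

    Hypothetical helper: keeps the order of `first` and skips any atom
    that has no partner of the same name in `second`.
    """
    available = set(second)
    return [(name, name) for name in first if name in available]


# Only the atoms shared by both residues survive, each paired with itself.
matched = pair_atoms_by_name(native_atoms, designed_atoms)
print(matched)
```

With real residues the same idea would key on each atom's name rather than on list position, so that both lists handed to the Superimposer have equal length and a consistent order.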

From anaryin at gmail.com Tue Apr 2 03:07:08 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 2 Apr 2013 09:07:08 +0200
Subject: [Biopython] Superimposer troubles
In-Reply-To:
References:
Message-ID:

Hey Jordan,

Without checking the code, I'd say order matters. The two sequences of atoms will be aligned per position. If you have ca, c, n, o or ca, n, o, c you'll get different results. Try a simple glycine and switch the order of the atoms. I think it should work like this, but again, not sure.

As for the rms value, it depends on the input. If it's ca only, you get ca rmsd, etc.

Cheers,
João

From p.j.a.cock at googlemail.com Tue Apr 2 05:38:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Apr 2013 10:38:24 +0100
Subject: [Biopython] Superimposer troubles
In-Reply-To:
References:
Message-ID:

On Tue, Apr 2, 2013 at 5:40 AM, Willis, Jordan R wrote:
>
> Hello List,
>
> I'm having trouble working through some issues with the superimposer for all-atom
> superpositions. Often times, we work on protein design and our end PDB files
> differ in atom-number and sometimes composition from our input. I'm a big fan
> of the Superimposer, so we have implemented like this:
>
> p = PDBParser()
> native_pdb = p.get_structure("input","input.pdb")
> designed_pdb = p.get_structure("output","output.pdb")
>
> native_ca_atoms = []
> native_all_atoms = []
> designed_ca_atoms = []
> designed_all_atoms = []
> for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()):
>     native_ca_atoms.append(native_residue['CA'])
>     designed_ca_atoms.append(designed_residue['CA'])
> ...
>
> For the CA atom residues it's not really a big deal since everything we design
> has a CA atom. However when we go into all atoms, it turns out that the
> designed residue and the native residue can be different, thus leading to a
> different number of atoms. I didn't realize, but the zip function was making
> these two lists as big as the smallest list and not necessarily matching up
> the atoms. It would just hack off some part of the larger list! This way,
> the superimposer was never failing because it always had an exact
> match of atoms.

How about using izip_longest (from itertools) rather than zip?
That pads the shorter sequence with None, which should give a clear error when the residue counts are different.

In general, however, dealing with similar but different chains will require some sort of pairwise alignment and/or restricting to just backbone atoms (like CA, C-alpha).

Peter

From p.j.a.cock at googlemail.com Tue Apr 2 12:33:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Apr 2013 17:33:53 +0100
Subject: [Biopython] New to BP. Looking for closely spaced genes
In-Reply-To:
References:
Message-ID:

On Mon, Apr 1, 2013 at 7:41 PM, Mark Budde wrote:
> Hi,
> Before I dive too far into BioPython, I'd like to get some input if you
> BioPython is an appropriate tool for my task....
>
> I would like to look at the human genome ORF structure and identify regions
> where ORFs are closely spaced but differentially regulated, and also
> identify whether the ORFs are facing the same direction of opposing
> directions. To do this, I assume I would first download the annotated
> genome and write a script in BioPython annotating how far each ORF is from
> it's neighbors, what the orientation is, and store the result in a
> dictionary. Then I would download some expression data sets and add this to
> the data to the dictionary. Then I would write some algorithm comparing
> gene distance, orientation and expression correlation to generate a list of
> candidate ORF pairs which fit my criteria.
>
> My question is, is BioPython a reasonable tool to accomplish this, or is it
> going to be way to slow whereas some alternative package is better suited
> for my task?
> Thanks,
> Mark Budde

Hi Mark,

That sounds very doable with Biopython parsing GenBank format chromosomes downloaded from the NCBI/EMBL/DDBJ. I did something similar to look at overlaps and gaps between genes of bacteria some years back, also using the Biopython GenBank parser, e.g.
http://mbe.oxfordjournals.org/cgi/content/abstract/msp302

In your case with humans there'll be lots of intron/exon structure (join locations in GenBank), so I'd recommend trying the current code from git (which will become Biopython 1.62), where this has been refactored to hopefully make joins much easier than before.

Regards,
Peter

From linxzh1989 at gmail.com Fri Apr 5 22:53:49 2013
From: linxzh1989 at gmail.com (=?GB2312?B?wdbQ0Nba?=)
Date: Sat, 6 Apr 2013 10:53:49 +0800
Subject: [Biopython] MUSCLE for alignment
Message-ID:

Hi all!

I have a seqdump.fasta file:

>lcl|24977
TGAGAAAGACTTGAGAGGACA
>lcl|24977:8-21
GAGATGACTTAGAGGACA

I want to use the Biopython wrapper for MUSCLE to align the two sequences. The alignment result will be put into an existing FASTA file.

>>> from Bio.Align.Applications import MuscleCommandline
>>> mcline = MuscleCommandline(input='seqdump.fasta', out='result.fasta')

But I cannot find anything in result.fasta after I run the command. Am I missing something needed to get the result?

regards
Lin

From p.j.a.cock at googlemail.com Sat Apr 6 04:58:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 09:58:30 +0100
Subject: [Biopython] MUSCLE for alignment
In-Reply-To:
References:
Message-ID:

On Sat, Apr 6, 2013 at 3:53 AM, ??? wrote:
> Hi all !
> I have a seqdump.fasta file:
> >lcl|24977
> TGAGAAAGACTTGAGAGGACA
> >lcl|24977:8-21
> GAGATGACTTAGAGGACA
>
> I want to use a wrapper for Muscle in Biopython to align the two seq.
> the alignment result will put into a existing fasta file.
>
> >>> from Bio.Align.Applications import MuscleCommandline
> >>> mcline = MuscleCommandline(input='seqdump.fasta',out='result.fasta')
>
> But i can not find anything in the result.fasta after i run the command.
> Do i have any missing to get the result?
>
> regards
> Lin

Hi Lin,

In your example you've not yet called MUSCLE:

    # Load the library:
    from Bio.Align.Applications import MuscleCommandline
    # Create the command line wrapper instance:
    mcline = MuscleCommandline(input='seqdump.fasta', out='result.fasta')
    # Optionally show what command it would run:
    print(mcline)
    # Actually run the command:
    stdout, stderr = mcline()

Does that help? The main Tutorial does have some more detailed examples.

Peter

From p.j.a.cock at googlemail.com Sat Apr 6 07:41:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 12:41:33 +0100
Subject: [Biopython] MUSCLE for alignment
In-Reply-To:
References:
Message-ID:

On Sat, Apr 6, 2013 at 12:18 PM, ??? wrote:
> Thank you! Peter.
> It really helps me.
> If i do not specify it by: stdout, stderr = mcline()
> the alignment will writen to stdout, instead of the output file.
> Is it correct?

MUSCLE will by default write the alignment to stdout, but you used the out argument to specify an output filename instead. In this case stdout will probably be empty.

There are some stdout examples using MUSCLE in the Biopython Tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Peter

P.S. Please CC the mailing list.

From linxzh1989 at gmail.com Sat Apr 6 09:57:31 2013
From: linxzh1989 at gmail.com (=?GB2312?B?wdbQ0Nba?=)
Date: Sat, 6 Apr 2013 21:57:31 +0800
Subject: [Biopython] MUSCLE for alignment
In-Reply-To:
References:
Message-ID:

Thank you for your advice. I will CC the mailing list.

regards

From nicolas.joannin at gmail.com Sat Apr 6 11:31:40 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Sun, 7 Apr 2013 00:31:40 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
Message-ID:

Hello everyone,

I'm having a problem installing biopython with Python 3.3.1rc1. Basically, I get several tests failing (in addition to a lot of warnings). I don't think the failed tests will be a problem for my work; however, I thought you'd want to have a look.

Attached is the output of python3 setup.py test.

Also, if you think I shouldn't use biopython without having these failed tests fixed first, please let me know!

Best regards,
Nicolas

-------------- next part --------------
Nicolass-MacBook-Air:biopython NicojoAir11$ python3 setup.py test
WARNING - Biopython does not yet officially support Python 3
The 2to3 library will be called automatically now,
and the converted files cached under build/py3.3
Processing Bio
Processing BioSQL
Processing Tests
Processing Scripts
Processing Doc
Python 2to3 processing done.
running test
Python version: 3.3.1rc1 (v3.3.1rc1:92c2cfb92405, Mar 25 2013, 00:54:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Operating system: posix darwin
test_Ace ... ok
test_AlignIO ... ok
test_AlignIO_FastaIO ... ok
test_AlignIO_convert ... ok
test_Application ... ok
test_BioSQL_MySQLdb ... skipping. Install MySQLdb if you want to use mysql with BioSQL
test_BioSQL_psycopg2 ... skipping. Connection failed, check settings if you plan to use BioSQL: FATAL: role "postgres" does not exist
test_BioSQL_sqlite3 ... ok
test_CAPS ... ok
test_Chi2 ... ok
test_ClustalOmega_tool ... skipping. Install clustalo if you want to use Clustal Omega from Biopython.
test_Clustalw_tool ... skipping.
Install clustalw or clustalw2 if you want to use it from Biopython.
test_Cluster ... ok
test_CodonTable ... ok
test_CodonUsage ... ok
test_ColorSpiral ... skipping. Install reportlab if you want to use Bio.Graphics.
test_Compass ... ok
test_Crystal ... ok
test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper.
test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL.
test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss.
test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools.
test_EmbossPrimer ... ok
test_Entrez ... ok
test_Entrez_online ... FAIL
test_Enzyme ... ok
test_FSSP ... ok
test_Fasttree_tool ... skipping. Install fasttree and correctly set the file path to the program if you want to use it from Biopython.
test_File ... ok
test_GACrossover ... ok
test_GAMutation ... ok
test_GAOrganism ... ok
test_GAQueens ... ok
test_GARepair ... ok
test_GASelection ... ok
test_GenBank ... ok
test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics.
test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics.
test_HMMCasino ... ok
test_HMMGeneral ... ok
test_HotRand ... ok
test_KDTree ... ok
test_KEGG ... ok
test_KeyWList ... ok
test_Location ... ok
test_LogisticRegression ... ok
test_MMCIF ... skipping. C extension MMCIFlex not installed.
test_Mafft_tool ... ok
test_MarkovModel ... ok
test_Medline ... ok
test_Motif ... ok
test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper.
test_NCBIStandalone ... ok
test_NCBITextParser ...
ok
test_NCBIXML ... ok
test_NCBI_BLAST_tools ... ok
test_NCBI_qblast ... ok
test_NNExclusiveOr ... ok
test_NNGene ... ok
test_NNGeneral ... ok
test_Nexus ... ok
test_PAML_baseml ... ok
test_PAML_codeml ... ok
test_PAML_tools ... skipping. Install PAML if you want to use the Bio.Phylo.PAML wrapper.
test_PAML_yn00 ... ok
test_PDB ... ok
test_PDB_KDTree ... ok
test_ParserSupport ... ok
test_Pathway ... ok
test_Phd ... ok
test_Phylo ... ok
test_PhyloXML ... ok
test_Phylo_CDAO ... skipping. Install the librdf Python bindings if you want to use the CDAO tree format.
test_Phylo_NeXML ... ./test_Phylo_NeXML.py:87: ResourceWarning: unclosed file <_io.BufferedReader name='/var/folders/9w/kkwnss4n52bbc3crhctbhfnh0000gn/T/tmpf9__6a'>
  t2 = next(NeXMLIO.Parser(open(DUMMY, 'rb')).parse())
ok
test_Phylo_depend ... skipping. Install matplotlib if you want to use Bio.Phylo._utils.
test_PopGen_DFDist ... skipping. Install Dfdist, Ddatacal, pv2 and cplot2 if you want to use DFDist with Bio.PopGen.FDist.
test_PopGen_FDist ... skipping. Install fdist2, datacal, pv and cplot if you want to use FDist2 with Bio.PopGen.FDist.
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop.
test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop.
test_PopGen_GenePop_nodepend ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal.
test_PopGen_SimCoal_nodepend ... ok
test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper.
test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper.
test_ProtParam ... ok
test_Restriction ... ok
test_SCOP_Astral ...
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='SCOP/scopseq-test/astral-scopdom-seqres-all-test.fa' mode='r' encoding='UTF-8'> for record in sequences: ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SCOP_online ... ok test_SVDSuperimposer ... ok test_SearchIO_blast_tab ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release. BiopythonExperimentalWarning) ok test_SearchIO_blast_tab_index ... ok test_SearchIO_blast_text ... ok test_SearchIO_blast_xml ... ok test_SearchIO_blast_xml_index ... ok test_SearchIO_blat_psl ... ok test_SearchIO_blat_psl_index ... ok test_SearchIO_exonerate ... ok test_SearchIO_exonerate_text_index ... ok test_SearchIO_exonerate_vulgar_index ... ok test_SearchIO_fasta_m10 ... ok test_SearchIO_fasta_m10_index ... ok test_SearchIO_hmmer2_text ... ok test_SearchIO_hmmer2_text_index ... ok test_SearchIO_hmmer3_domtab ... ok test_SearchIO_hmmer3_domtab_index ... ok test_SearchIO_hmmer3_tab ... ok test_SearchIO_hmmer3_tab_index ... ok test_SearchIO_hmmer3_text ... ok test_SearchIO_hmmer3_text_index ... ok test_SearchIO_model ... ok test_SearchIO_write ... ok test_SeqIO ... ok test_SeqIO_AbiIO ... ok test_SeqIO_FastaIO ... 
./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids)) ./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> default = list(SeqIO.parse(open(filename), "fasta", alphabet)) ./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'> re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids)) ./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'> default = list(SeqIO.parse(open(filename), "fasta", alphabet)) ./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/fa01' mode='r' encoding='UTF-8'> re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids)) ./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/fa01' mode='r' encoding='UTF-8'> default = list(SeqIO.parse(open(filename), "fasta", alphabet)) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/centaurea.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/centaurea.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/elderberry.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/elderberry.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f001' mode='r' encoding='UTF-8'> second = next(iterator) 
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f001' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lavender.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lavender.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lupine.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lupine.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/phlox.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/phlox.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/sweetpea.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/sweetpea.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/wisteria.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/wisteria.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/aster.pro' mode='r' encoding='UTF-8'> second = 
next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/aster.pro' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/loveliesbleeding.pro' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/loveliesbleeding.pro' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rose.pro' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rose.pro' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rosemary.pro' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rosemary.pro' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ok test_SeqIO_Insdc ... ok test_SeqIO_PdbIO ... ok test_SeqIO_QualityIO ... 
./test_SeqIO_QualityIO.py:348: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> records1 = list(SeqIO.parse(open("Quality/example.fasta"),"fasta")) ./test_SeqIO_QualityIO.py:349: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq")) /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:357: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> self.assertEqual(h.getvalue(),open("Quality/example.fasta").read()) ./test_SeqIO_QualityIO.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> open("Quality/example.qual"))) ./test_SeqIO_QualityIO.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> open("Quality/example.qual"))) ./test_SeqIO_QualityIO.py:329: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq")) ./test_SeqIO_QualityIO.py:334: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> records1 = list(SeqIO.parse(open("Quality/example.qual"),"qual")) ./test_SeqIO_QualityIO.py:335: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq")) ./test_SeqIO_QualityIO.py:344: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> 
self.assertEqual(h.getvalue(),open("Quality/example.qual").read()) ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_original_illumina.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_original_sanger.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_original_sanger.fastq' mode='r' 
encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_original_sanger.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_original_sanger.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper 
name='Quality/solexa_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_original_solexa.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_original_sanger.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads_no_trim.fasta' mode='r' encoding='UTF-8'> wanted = list(SeqIO.parse(open(out_name), format)) ./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads_no_trim.qual' mode='r' encoding='UTF-8'> wanted = list(SeqIO.parse(open(out_name), format)) ./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads.fasta' 
mode='r' encoding='UTF-8'> wanted = list(SeqIO.parse(open(out_name), format)) ./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads.qual' mode='r' encoding='UTF-8'> wanted = list(SeqIO.parse(open(out_name), format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_at_end.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_at_start.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_in_middle.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_index_at_start.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_index_in_middle.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_no_manifest.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) 
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/greek.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_faked.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/paired.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_93.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_faked.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_example.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/tricky.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ok test_SeqIO_SeqXML ... ./test_SeqIO_SeqXML.py:141: DeprecationWarning: Please use assertEqual instead. self.assertEquals(len(read1_records),len(read2_records)) ok test_SeqIO_convert ... ok test_SeqIO_features ... 
./test_SeqIO_features.py:190: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/iro.gb' mode='rU' encoding='UTF-8'> gbk_template = open("GenBank/iro.gb", "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqFeature.py:155: BiopythonDeprecationWarning: Rather than sub_features, use a CompoundFeatureLocation BiopythonDeprecationWarning) ./test_SeqIO_features.py:988: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:989: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'> gb_cds = list(SeqIO.parse(open(self.gb_filename),"genbank-cds")) ./test_SeqIO_features.py:990: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.faa' mode='r' encoding='UTF-8'> fasta = list(SeqIO.parse(open(self.faa_filename),"fasta")) ./test_SeqIO_features.py:988: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:989: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_cds = list(SeqIO.parse(open(self.gb_filename),"genbank-cds")) ./test_SeqIO_features.py:990: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.faa' mode='r' encoding='UTF-8'> fasta = list(SeqIO.parse(open(self.faa_filename),"fasta")) ./test_SeqIO_features.py:1070: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:1072: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.ffn' mode='r' encoding='UTF-8'> fa_records = list(SeqIO.parse(open(self.ffn_filename),"fasta")) ./test_SeqIO_features.py:1023: 
ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:1024: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'> embl_record = SeqIO.read(open(self.embl_filename),"embl") ./test_SeqIO_features.py:1054: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:1055: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.fna' mode='r' encoding='UTF-8'> fa_record = SeqIO.read(open(self.fna_filename),"fasta") ./test_SeqIO_features.py:1059: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'> embl_record = SeqIO.read(open(self.embl_filename),"embl") ./test_SeqIO_features.py:1036: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.faa' mode='r' encoding='UTF-8'> faa_records = list(SeqIO.parse(open(self.faa_filename),"fasta")) ./test_SeqIO_features.py:1037: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.ffn' mode='r' encoding='UTF-8'> ffn_records = list(SeqIO.parse(open(self.ffn_filename),"fasta")) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AAA03323.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/DD231055_edited.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper 
name='EMBL/Human_contigs.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NT_019265.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/SC10H5.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/TRBG361.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/U87107.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/arab1.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/blank_seq.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/cor6_6.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/dbsource_wrap.gb' mode='r' encoding='UTF-8'> gb_records = 
list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/extra_keywords.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/gbvrl1_start.seq' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/noref.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/one_of.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/origin_line.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/pri1.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/protein_refseq.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/protein_refseq2.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ok test_SeqIO_index ... FAIL test_SeqIO_online ... ok test_SeqIO_write ... ok test_SeqRecord ... ok test_SeqUtils ... 
./test_SeqUtils.py:71: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(dna_genbank_filename), "genbank")
./test_SeqUtils.py:55: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'>
  seq_records = list(SeqIO.parse(open(dna_fasta_filename), "fasta"))
ok
test_Seq_objs ... ok
test_SffIO ... ok
test_SubsMat ...
./test_SubsMat.py:21: ResourceWarning: unclosed file <_io.TextIOWrapper name='SubsMat/protein_count.txt' mode='r' encoding='UTF-8'>
  ftab_prot = FreqTable.read_count(open(ftab_file))
./test_SubsMat.py:23: ResourceWarning: unclosed file <_io.TextIOWrapper name='SubsMat/protein_freq.txt' mode='r' encoding='UTF-8'>
  ctab_prot = FreqTable.read_freq(open(ctab_file))
./test_SubsMat.py:31: ResourceWarning: unclosed file <_io.BufferedReader name='SubsMat/acc_rep_mat.pik'>
  acc_rep_mat = pickle.load(open(pickle_file, 'rb'))
ok
test_SwissProt ... ok
test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper.
test_TogoWS ...
./test_TogoWS.py:501: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  new = SeqIO.read(TogoWS.convert(open(filename), "genbank", "embl"), "embl")
./test_TogoWS.py:494: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  new = SeqIO.read(TogoWS.convert(open(filename), "genbank", "fasta"), "fasta")
ok
test_Tutorial ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='ls_orchid.gbk'> test.globs.clear() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='ls_orchid.gbk.bgz'> test.globs.clear() ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_001.txt'> ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_005.txt'> /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_001.txt'> test.globs.clear() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='pubmed_result1.txt' mode='r' encoding='UTF-8'> test.globs.clear() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='pubmed_result2.txt' mode='r' encoding='UTF-8'> test.globs.clear() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='lipoprotein.txt' mode='r' encoding='UTF-8'> test.globs.clear() ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'> ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='SRF.pfm' mode='r' encoding='UTF-8'> ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='REB1.pfm' mode='r' encoding='UTF-8'> /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'> test.globs.clear() ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='meme.out' mode='r' 
encoding='UTF-8'> ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='alignace.out' mode='r' encoding='UTF-8'> ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'> ./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='SRF.pfm' mode='r' encoding='UTF-8'> ok test_UniGene ... ok test_Uniprot ... ./test_Uniprot.py:314: ResourceWarning: unclosed file <_io.TextIOWrapper name='SwissProt/multi_ex.list' mode='r' encoding='UTF-8'> ids = [x.strip() for x in open("SwissProt/multi_ex.list")] ./test_Uniprot.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='SwissProt/multi_ex.list' mode='r' encoding='UTF-8'> ids = [x.strip() for x in open("SwissProt/multi_ex.list")] /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.BufferedReader name='SwissProt/multi_ex.txt'> function() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.BufferedReader name='SwissProt/multi_ex.xml'> function() ok test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. test_XXmotif_tool ... skipping. Install XXmotif if you want to use XXmotif from Biopython. test_align ... ok test_bgzf ... FAIL test_geo ... 
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSE16.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM645.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM691.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM700.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM804.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_affy.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_affy_chp.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_dual.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_family.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_platform.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
ok
test_kNN ... ok
test_lowess ... ok
test_motifs ... ok
test_pairwise2 ... ok
test_phyml_tool ... skipping. Install PhyML 3.0 if you want to use the Bio.Phylo.Applications wrapper.
test_prodoc ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/Doc/prosite.excerpt.doc' mode='r' encoding='UTF-8'>
  function()
ok
test_prosite1 ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00107.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00159.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00165.txt' mode='r' encoding='UTF-8'>
  function()
ok
test_prosite2 ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00432.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00488.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00546.txt' mode='r' encoding='UTF-8'>
  function()
ok
test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
test_py3k ... ok
test_raxml_tool ... skipping. Install RAxML (binary raxmlHPC) if you want to test the Bio.Phylo.Applications wrapper.
test_seq ... ok
test_translate ... ok
test_trie ... skipping. Could not import Bio.trie, check C code was compiled.
Bio.Align docstring test ... ok
Bio.Align.Generic docstring test ... ok
Bio.Align.Applications._Clustalw docstring test ... ok
Bio.Align.Applications._ClustalOmega docstring test ...
ok Bio.Align.Applications._Mafft docstring test ... ok Bio.Align.Applications._Muscle docstring test ... ok Bio.Align.Applications._Probcons docstring test ... ok Bio.Align.Applications._Prank docstring test ... ok Bio.Align.Applications._TCoffee docstring test ... ok Bio.AlignIO docstring test ... ok Bio.AlignIO.StockholmIO docstring test ... ok Bio.Alphabet docstring test ... ok Bio.Application docstring test ... ok Bio.bgzf docstring test ... FAIL Bio.Blast.Applications docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:218: BiopythonDeprecationWarning: Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:321: BiopythonDeprecationWarning: Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:400: BiopythonDeprecationWarning: Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython. warnings.warn("Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning) ok Bio.Emboss.Applications docstring test ... ok Bio.GenBank docstring test ... ok Bio.KEGG.Compound docstring test ... 
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='KEGG/compound.sample' mode='r' encoding='UTF-8'> test.globs.clear() ok Bio.KEGG.Enzyme docstring test ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='KEGG/enzyme.sample' mode='r' encoding='UTF-8'> test.globs.clear() ok Bio.Motif docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/SRF.pfm' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/meme.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1289: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'> exception = None ok Bio.Motif.Applications._AlignAce docstring test ... ok Bio.Motif.Applications._XXmotif docstring test ... ok Bio.motifs docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/SRF.pfm' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/meme.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1289: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/alignace.out' mode='r' encoding='UTF-8'> exception = None /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/alignace.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. ok Bio.motifs.applications._alignace docstring test ... ok Bio.motifs.applications._xxmotif docstring test ... ok Bio.pairwise2 docstring test ... ok Bio.Phylo.Applications._Raxml docstring test ... ok Bio.SearchIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml'> # Copyright 2012 by Wibowo Arindrarto. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml.bgz'> # Copyright 2012 by Wibowo Arindrarto. All rights reserved. 
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml'> test.globs.clear() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/mirna.xml'> # Copyright 2012 by Wibowo Arindrarto. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/mirna.xml'> test.globs.clear() ok Bio.SearchIO._model docstring test ... ok Bio.SearchIO._model.query docstring test ... ok Bio.SearchIO._model.hit docstring test ... ok Bio.SearchIO._model.hsp docstring test ... ok Bio.SearchIO.BlastIO docstring test ... ok Bio.SearchIO.HmmerIO docstring test ... ok Bio.SearchIO.FastaIO docstring test ... ok Bio.SearchIO.BlatIO docstring test ... ok Bio.SearchIO.ExonerateIO docstring test ... ok Bio.SeqIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Fasta/f002'> # Copyright 2006-2010 by Peter Cock. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Fasta/f002'> test.globs.clear() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq'> # Copyright 2006-2010 by Peter Cock. All rights reserved. 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> for record in sequences: /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq.bgz'> # Copyright 2006-2010 by Peter Cock. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq'> test.globs.clear() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='GenBank/NC_000932.faa'> test.globs.clear() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='GenBank/NC_005816.faa'> test.globs.clear() ok Bio.SeqIO.FastaIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/FastaIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/dups.fasta' mode='r' encoding='UTF-8'> # Copyright 2006-2009 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/FastaIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/dups.fasta' mode='r' encoding='UTF-8'> # Copyright 2006-2009 by Peter Cock. All rights reserved. ok Bio.SeqIO.AceIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/AceIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Ace/consed_sample.ace' mode='rU' encoding='UTF-8'> # Copyright 2008-2010 by Peter Cock. All rights reserved. 
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='Ace/contig1.ace' mode='rU' encoding='UTF-8'> test.globs.clear() ok Bio.SeqIO.PhdIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/PhdIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Phd/phd1' mode='r' encoding='UTF-8'> # Copyright 2008-2010 by Peter Cock. All rights reserved. ok Bio.SeqIO.QualityIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> for record in sequences: /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_faked.fastq' mode='r' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_faked.fastq' mode='r' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_example.fastq' mode='r' encoding='UTF-8'> for record in records: /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='rU' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='rU' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='rU' encoding='UTF-8'> for record in records: /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='rU' encoding='UTF-8'> for record in records: ok ./run_tests.py:427: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='r' encoding='UTF-8'> gc.collect() Bio.SeqIO.SffIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/SffIO.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'> test.globs.clear() ok Bio.SeqFeature docstring test ... ok Bio.SeqRecord docstring test ... 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqRecord.py:2: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='rU' encoding='UTF-8'> # Copyright 2002-2004 Brad Chapman. ok Bio.SeqUtils docstring test ... ok Bio.SeqUtils.MeltingTemp docstring test ... ok Bio.Sequencing.Applications._Novoalign docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... ok Bio.PDB.Polypeptide docstring test ... ok Bio.PDB.Selection docstring test ... ok ====================================================================== ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase) Test Entrez.read from URL ---------------------------------------------------------------------- Traceback (most recent call last): File "./test_Entrez_online.py", line 44, in test_read_from_url rec = Entrez.read(einfo) File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/__init__.py", line 367, in read record = handler.read(handle) File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/Parser.py", line 184, in read self.parser.ParseFile(handle) File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/Parser.py", line 322, in endElementHandler raise RuntimeError(value) RuntimeError: Unable to open connection to #DbInfo?dbaf= ====================================================================== ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests) Index fastq-sanger file Quality/example.fastq.bgz get_raw ---------------------------------------------------------------------- Traceback (most recent call last): File "./test_SeqIO_index.py", line 441, in f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 281, in get_raw_check raw_file = h.read() File 
"/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 171, in key_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 109, in simple_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! 
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 281, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 171, in key_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 109, in simple_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! 
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 281, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 171, in key_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 109, in simple_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! 
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 593, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in assert 80 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 128, in Bio.bgzf Failed example: line = handle.readline() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run compileflags, 1), test.globs) File "", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 593, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 297, in _read_gzip_header raise 
IOError('Not a gzipped file') OSError: Not a gzipped file ---------------------------------------------------------------------- File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 129, in Bio.bgzf Failed example: assert 143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run compileflags, 1), test.globs) File "", line 1, in assert 143 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 130, in Bio.bgzf Failed example: data = handle.read(70000) Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run compileflags, 1), test.globs) File "", line 1, in data = handle.read(70000) File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 297, in _read_gzip_header raise IOError('Not a gzipped file') OSError: Not a gzipped file ---------------------------------------------------------------------- File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 131, in Bio.bgzf Failed example: assert 70143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run compileflags, 1), test.globs) File "", line 1, in assert 70143 == handle.tell() AssertionError 
---------------------------------------------------------------------- Ran 217 tests in 238.221 seconds FAILED (failures = 4) From p.j.a.cock at googlemail.com Sat Apr 6 14:19:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 6 Apr 2013 19:19:43 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Sat, Apr 6, 2013 at 4:31 PM, Nicolas Joannin wrote: > Hello everyone, > > I'm having a problem installing biopython with Python 3.3.1rc1... > Basically, I get several tests failing (in addition to a lot of warnings). > > I don't think the failed tests will be a problem for my work, however, I > thought you'd want to have a look... Attached is the output of python3 > setup.py test. > > Also, if you think I shouldn't use biopython without having these failed > tests fixed first, please let me know! > > Best regards, > Nicolas Hi Nicolas, You should be OK installing this - all the test failures are within Bio.bgzf which is curious, but you probably won't be using BGZF compressed files. We do have buildslaves testing on Python 3.3.0 where this does not happen, so perhaps this is a new failure from a change in Python 3.3.1rc1 - hopefully I'll be able to confirm that by updating one of the buildslaves. Thanks for the alert, Peter From markbudde at gmail.com Sat Apr 6 20:36:10 2013 From: markbudde at gmail.com (Mark Budde) Date: Sat, 6 Apr 2013 17:36:10 -0700 Subject: [Biopython] Restriction enzymes and sticky ends Message-ID: Hi - I have a question about sticky ends in Biopython. Specifically, is there any way to maintain sticky end information? Having read the restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html), I suspect that the answer is no. It seems that the cut sites are only maintained for the top strand. So I am planning on adding this data in my program (although I will need to read up on classes). However, this requires that I can get the cut site information. 
The only way that I can find to extract this information is from the Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can use this information to determine the cut sites, but I expect that there is a more direct way, since the elucidate() function must be generating this from some attribute. FYI, I am curious about this because I want to simulate GoldenGate cloning in Biopython. Thanks, Mark Budde From nicolas.joannin at gmail.com Sat Apr 6 23:12:54 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Sun, 7 Apr 2013 12:12:54 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: Hi Peter, Thanks for the quick reply! Indeed, I don't think it is a big issue for me, and I have also not had any problems with Python 3.3.0 on another machine. So, yes, it probably is linked to the Python 3.3.1rc1...
However, I should point out that it is not only the Bio.bgzf that fails testing. There are also test_Entrez_online and test_SeqIO_index that are indicated as "FAIL" (both of which I do not directly use). Cheers, Nicolas Nicolas Joannin, Ph.D. Bioinformatics Center Kyoto University, Uji campus, Japan On Sun, Apr 7, 2013 at 3:19 AM, Peter Cock wrote: > On Sat, Apr 6, 2013 at 4:31 PM, Nicolas Joannin > wrote: > > Hello everyone, > > > > I'm having a problem installing biopython with Python 3.3.1rc1... > > Basically, I get several tests failing (in addition to a lot of > warnings). > > > > I don't think the failed tests will be a problem for my work, however, I > > thought you'd want to have a look... Attached is the output of python3 > > setup.py test. > > > > Also, if you think I shouldn't use biopython without having these failed > > tests fixed first, please let me know! > > > > Best regards, > > Nicolas > > Hi Nicolas, > > You should be OK installing this - all the test failures are > within Bio.bgzf which is curious, but you probably won't be > using BGZF compressed files. > > We do have buildslaves testing on Python 3.3.0 where this > does not happen, so perhaps this is a new failure from a > change in Python 3.3.1rc1 - hopefully I'll be able to confirm > that by updating one of the buildslaves. > > Thanks for the alert, > > Peter > From p.j.a.cock at googlemail.com Sun Apr 7 10:41:33 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 7 Apr 2013 15:41:33 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Sun, Apr 7, 2013 at 4:12 AM, Nicolas Joannin wrote: > Hi Peter, > > Thanks for the quick reply! > Indeed, I don't think it is a big issue for me, and I have also not had any > problems with Python 3.3.0 on another machine. > So, yes, it probably is linked to the Python 3.3.1rc1... 
I see that Python 3.3.1 final is out now - might be worth checking that too, and I'll try to update one of our buildslaves to use this. > However, I should point out that it is not only the Bio.bgzf that fails > testing. > There are also test_Entrez_online and test_SeqIO_index that are indicated as > "FAIL" (both of which I do not directly use). The test_SeqIO_index.py failures all looked to be BGZF related too. I missed the Entrez test, but as an online test it can sometimes fail intermittently anyway. Chances are it will be fine on rerunning. Peter From bjorn_johansson at bio.uminho.pt Sun Apr 7 14:05:11 2013 From: bjorn_johansson at bio.uminho.pt (Björn Johansson) Date: Sun, 7 Apr 2013 19:05:11 +0100 Subject: [Biopython] sticky ends in Biopython Message-ID: > > Message: 2 > Date: Sat, 6 Apr 2013 17:36:10 -0700 > From: Mark Budde > Subject: [Biopython] Restriction enzymes and sticky ends > To: biopython > Message-ID: > < > CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi - I have a question about sticky ends in Biopython. Specifically, is > there any way to maintain sticky end information? Having read the > restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html > ), > I suspect that the answer is no. It seems that the cut sites are only > maintained for the top strand. So I am planning on adding this data in my > program (although I will need to read up on classes). > > However, this requires that I can get the cut site information. The only > way that I can find to extract this information is from the > Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can > use this information to determine the cut sites, but I expect that there is > a more direct way, since the elucidate() function must be generating this > from some attribute. > > FYI, I am curious about this because I want to simulate GoldenGate cloning > in Biopython.
> > Thanks, > Mark Budde > > > ------------------------------ > Hi Mark, Check out Python-dna, which has classes for dealing with double-stranded DNA. This package depends on Biopython and a couple of additional modules. Disclaimer: I am the developer of Python-dna Python-dna at pypi https://pypi.python.org/pypi/python-dna/ Source code https://code.google.com/p/pydna/ Documentation http://python-dna.readthedocs.org/ Discussion group https://groups.google.com/forum/?fromgroups#!forum/python-dna / Bjorn Johansson -- ______O_________oO________oO______o_______oO__ Björn Johansson Assistant Professor Departament of Biology University of Minho Campus de Gualtar 4710-057 Braga PORTUGAL www.bio.uminho.pt Google profile Google Scholar Profile my group Office (direct) +351-253 601517 | (PT) mob. +351-967 147 704 | (SWE) mob. +46 739 792 968 Dept of Biology (secr) +351-253 60 4310 | fax +351-253 678980 From markbudde at gmail.com Sun Apr 7 14:48:16 2013 From: markbudde at gmail.com (Mark Budde) Date: Sun, 7 Apr 2013 11:48:16 -0700 Subject: [Biopython] sticky ends in Biopython In-Reply-To: References: Message-ID: OK, that looks useful. Thanks. -Mark On Sun, Apr 7, 2013 at 11:05 AM, Björn Johansson < bjorn_johansson at bio.uminho.pt> wrote: > > > > Message: 2 > > Date: Sat, 6 Apr 2013 17:36:10 -0700 > > From: Mark Budde > > Subject: [Biopython] Restriction enzymes and sticky ends > > To: biopython > > Message-ID: > > < > > CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w at mail.gmail.com> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > Hi - I have a question about sticky ends in Biopython. Specifically, is > > there any way to maintain sticky end information? Having read the > > restriction doc ( > http://biopython.org/DIST/docs/cookbook/Restriction.html > > ), > > I suspect that the answer is no. It seems that the cut sites are only > > maintained for the top strand.
So I am planning on adding this data in my > > program (although I will need to read up on classes). > > > > However, this requires that I can get the cut site information. The only > > way that I can find to extract this information is from the > > Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I > can > > use this information to determine the cut sites, but I expect that there > is > > a more direct way, since the elucidate() function must be generating this > > from some attribute. > > > > FYI, I am curious about this because I want to simulate GoldenGate > cloning > > in Biopython. > > > > Thanks, > > Mark Budde > > > > > > ------------------------------ > > > > Hi Mark, > > Check out Python-dna that have classes for dealing with > double stranded DNA. This package depends on Biopython and a couple of > additional modules. > > Disclaimer: I am the developer of Python-dna > > Python-dna at pypi https://pypi.python.org/pypi/python-dna/ > Source code https://code.google.com/p/pydna/ > Documentation http://python-dna.readthedocs.org/ > Discussion group > https://groups.google.com/forum/?fromgroups#!forum/python-dna > > / Bjorn Johansson > > > > -- > ______O_________oO________oO______o_______oO__ > Björn Johansson > Assistant Professor > Departament of Biology > University of Minho > Campus de Gualtar > 4710-057 Braga > PORTUGAL > www.bio.uminho.pt > Google profile > Google Scholar Profile< > http://scholar.google.com/citations?user=7AiEuJ4AAAAJ> > my group > Office (direct) +351-253 601517 | (PT) mob. +351-967 147 704 | (SWE) mob.
> +46 739 792 968 > Dept of Biology (secr) +351-253 60 4310 | fax +351-253 678980 > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Sun Apr 7 15:52:13 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 7 Apr 2013 20:52:13 +0100 Subject: [Biopython] Restriction enzymes and sticky ends In-Reply-To: References: Message-ID: On Sun, Apr 7, 2013 at 1:36 AM, Mark Budde wrote: > Hi - I have a question about sticky ends in Biopython. Specifically, is > there any way to maintain sticky end information? Having read the > restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html), > I suspect that the answer is no. It seems that the cut sites are only > maintained for the top strand. So I am planning on adding this data in my > program (although I will need to read up on classes). > > However, this requires that I can get the cut site information. The only > way that I can find to extract this information is from the > Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can > use this information to determine the cut sites, but I expect that there is > a more direct way, since the elucidate() function must be generating this > from some attribute. > > FYI, I am curious about this because I want to simulate GoldenGate cloning > in Biopython. > > Thanks, > Mark Budde Hi Mark, Good question. Sadly help(EcoRI) doesn't tell you very much, does it? The whole Restriction module could benefit from a new maintainer and/or a rewrite (for one thing, it unfortunately did not follow Python counting in some aspects). Two tips: first dir(object) gives a list of the attributes and methods of an object in Python. 
Second, you can look at the source of the elucidate method to see where it gets the information you're looking for ;) [A last resort perhaps - but when documentation has let you down, worth knowing how to explore.] https://github.com/biopython/biopython/blob/master/Bio/Restriction/Restriction.py Here EcoRI is a 5' overhanging digest enzyme, and the values you need are EcoRI.fst5 (here 1) and EcoRI.fst3 (here -1) which are relative to the recognition site (here GAATTC). e.g. Overhang type methods include: >>> from Bio.Restriction import EcoRI >>> EcoRI.overhang() "5' overhang" >>> EcoRI.is_blunt() False >>> EcoRI.is_5overhang() True >>> EcoRI.is_3overhang() False >>> EcoRI.elucidate() 'G^AATT_C' >>> EcoRI.fst5 1 >>> EcoRI.fst3 -1 >>> EcoRI.site 'GAATTC' Notice 'GAATTC'[:1] = 'G', 'GAATTC'[1:-1] = 'AATT' and 'GAATTC'[-1:] = 'C' which gives the elucidated string. Is that all you needed? Regards Peter From p.j.a.cock at googlemail.com Mon Apr 8 05:32:00 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Apr 2013 10:32:00 +0100 Subject: [Biopython] Restriction enzymes and sticky ends In-Reply-To: References: Message-ID: On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde wrote: > Thanks for doing some digging on my behalf, Peter. After I posted my email > last night, I started looking through the Bio.Restriction code myself. You > response is helpful, I was having trouble seeing how the cut site was > encoded for each strand. I think Bjorn's python-dna might be a better > starting place for me than Bio.Restriction, as it already has some of the > functionality I was looking for. Fair enough. > However, to you question, I'm still not quite getting the cut sites. You > example with EcoRI makes complete sense, but I can't figure out the pattern > for some other enzymes, such as BsaI, which is why I got confused initially. > If you repeat that protocol for BsaI, the results don't match up. 
> > In [80]: BsaI.elucidate() > Out[80]: 'GGTCTCN^NNNN_N' > > In [81]: BsaI.fst5 > Out[81]: 7 > > In [82]: BsaI.fst3 > Out[82]: 5 > > In [83]: BsaI.site > Out[83]: 'GGTCTC' > > Based on this, I would expect that BsaI.fst3 should yield > "11" but it yields 5. I think you are counting from the wrong reference point. Using Python-style indexing would only allow cleavage points within the recognition site to be described. BsaI is a weird enzyme, and appears to be handled by the Ambiguous class in Bio/Restriction/Restriction.py - which says this is for enzymes for which the overhang is variable. >>> from Bio.Restriction import BsaI >>> BsaI.is_ambiguous() True >>> BsaI.is_defined() # is there a consistent site? False >>> BsaI.is_unknown() False >>> BsaI.fst5 7 >>> BsaI.fst3 5 >>> BsaI.elucidate() 'GGTCTCN^NNNN_N' This subclass has a more complicated elucidate method, but gives the same string as the REBASE website, so this is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html The 5' cut site of 7 clearly means this is downstream of the 6 bp recognition site. This appears to be counted from the start (left) of the recognition site. From the illustration the 3' cut site is also to the right of the 6 bp recognition site. It appears the number is counted from the end (right) of the recognition site, where positive as in BsaI means to the right (after the recognition site) while negative as in EcoRI means to the left (within the recognition site). Peter P.S. Please remember to CC the mailing list, e.g. reply all. Unless people say explicitly that they have done this deliberately, I generally assume taking a public discussion off list is accidental.
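The counting convention inferred in the message above can be checked with a few lines of plain Python. This is only a sketch under that inference: it assumes fst5 is counted from the start (left) of the recognition site and fst3 from its end (right), as the thread suggests but does not confirm from documentation. It does not import Bio.Restriction; mark_cuts is a hypothetical helper, not a Biopython function, and padding with N up to one base past the last cut is a choice made here so the result lines up with the elucidate() strings quoted in the thread.

```python
def mark_cuts(site, fst5, fst3):
    """Return the site with '^' (top-strand cut) and '_' (bottom-strand cut).

    Hypothetical helper, not part of Biopython. Assumes fst5 is an offset
    from the start of the site and fst3 an offset from the end of the site,
    and that neither cut falls upstream of the site.
    """
    top = fst5                  # top-strand cut, counted from the site start
    bottom = len(site) + fst3   # bottom-strand cut, converted to the same origin
    if top < 0 or bottom < 0:
        raise ValueError("cuts upstream of the site are not handled here")
    # Pad with N so that at least one base follows the last cut marker.
    width = max(len(site), top + 1, bottom + 1)
    padded = site + "N" * (width - len(site))
    out = []
    for i, base in enumerate(padded):
        if i == top:
            out.append("^")
        if i == bottom:
            out.append("_")
        out.append(base)
    return "".join(out)


# Values for site, fst5 and fst3 are the ones quoted in the thread.
print(mark_cuts("GAATTC", 1, -1))  # EcoRI -> G^AATT_C
print(mark_cuts("GGTCTC", 7, 5))   # BsaI  -> GGTCTCN^NNNN_N
```

With both cuts expressed from the same origin, the BsaI bottom-strand cut comes out at position 11 (6 + 5), which is the number Mark expected to see, and the four bases between the two markers are the overhang.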
From nicolas.joannin at gmail.com Mon Apr 8 09:21:45 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Mon, 8 Apr 2013 22:21:45 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: Hi Peter, I need to update another machine, so I'll do that with the final version to see if the problem still exists. Will post back when that's done. Regarding the Entrez test, indeed, it doesn't fail every time. So no worries there. Cheers, Nicolas Nicolas Joannin, Ph.D. Bioinformatics Center Kyoto University, Uji campus, Japan On Sun, Apr 7, 2013 at 11:41 PM, Peter Cock wrote: > On Sun, Apr 7, 2013 at 4:12 AM, Nicolas Joannin > wrote: > > Hi Peter, > > > > Thanks for the quick reply! > > Indeed, I don't think it is a big issue for me, and I have also not had > any > > problems with Python 3.3.0 on another machine. > > So, yes, it probably is linked to the Python 3.3.1rc1... > > I see that Python 3.3.1 final is out now - might be worth checking > that too, and I'll try to update one of our buildslaves to use this. > > > However, I should point out that it is not only the Bio.bgzf that fails > > testing. > > There are also test_Entrez_online and test_SeqIO_index that are > indicated as > > "FAIL" (both of which I do not directly use). > > The test_SeqIO_index.py failures all looked to be BGZF related too. > > I missed the Entrez test, but as an online test that can sometimes > fail intermittently anyway. The chances are on rerunning it'll be fine. > > Peter > From p.j.a.cock at googlemail.com Mon Apr 8 10:05:49 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Apr 2013 15:05:49 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Mon, Apr 8, 2013 at 2:21 PM, Nicolas Joannin wrote: > Hi Peter, > > I need to update another machine, so I'll do that with the final version to > see if the problem still exists. 
Will post back when that's done. > Regarding the Entrez test, indeed, it doesn't fail every time. So no worries > there. > > Cheers, > Nicolas I've just installed Python 3.3.1 (final) from source on a 64 bit Linux machine, and can confirm test failures from the BGZF code (not failing under Python 3.3.0). I was hoping this would be a glitch in the release candidate but sadly not. Thank you again for bringing this to our attention. Peter From nicolas.joannin at gmail.com Mon Apr 8 10:10:07 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Mon, 8 Apr 2013 23:10:07 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: OK, I guess that'll be the same whichever platform... I guess I'll stick with 3.3.0 for the other machine then. Thanks for the update! Nicolas Nicolas Joannin, Ph.D. Bioinformatics Center Kyoto University, Uji campus, Japan On Mon, Apr 8, 2013 at 11:05 PM, Peter Cock wrote: > On Mon, Apr 8, 2013 at 2:21 PM, Nicolas Joannin > wrote: > > Hi Peter, > > > > I need to update another machine, so I'll do that with the final version > to > > see if the problem still exists. Will post back when that's done. > > Regarding the Entrez test, indeed, it doesn't fail every time. So no > worries > > there. > > > > Cheers, > > Nicolas > > I've just installed Python 3.3.1 (final) from source on a 64 bit Linux > machine, and can confirm test failures from the BGZF code (not > failing under Python 3.3.0). I was hoping this would be a glitch in > the release candidate but sadly not. > > Thank you again for bringing this to our attention. 
> > Peter > From p.j.a.cock at googlemail.com Mon Apr 8 11:23:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Apr 2013 16:23:25 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Mon, Apr 8, 2013 at 3:10 PM, Nicolas Joannin wrote: > OK, I guess that'll be the same whichever platform... > I guess I'll stick with 3.3.0 for the other machine then. > Thanks for the update! > > Nicolas More bad news - whatever was changed, I think something similar was done in Python 2.7.4 as well, which also has new test failures not seen under Python 2.7.3. Sigh. Peter From markbudde at gmail.com Mon Apr 8 13:25:24 2013 From: markbudde at gmail.com (Mark Budde) Date: Mon, 8 Apr 2013 10:25:24 -0700 Subject: [Biopython] Restriction enzymes and sticky ends In-Reply-To: References: Message-ID: Thanks Peter, that explains it. BsaI is indeed a weird enzyme, a Type IIS restriction enzyme. These enzymes cut a defined distance outside of their recognition sequence. The utility of these enzymes is that by tagging the cut sites on the end of your primers, you can generate whatever sticky ends you desire. Furthermore, because it cuts outside of its recognition sequence, you can incubate a number of these fragments together with both restriction enzyme and ligase, and the fragments will assemble into the final product without subcloning. This is because sticky ends are generated without the corresponding recognition site, so their ligation is irreversible. This is called Golden Gate cloning. -Mark On Mon, Apr 8, 2013 at 2:32 AM, Peter Cock wrote: > On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde wrote: > > Thanks for doing some digging on my behalf, Peter. After I posted my > email > > last night, I started looking through the Bio.Restriction code myself. > You > > response is helpful, I was having trouble seeing how the cut site was > > encoded for each strand.
I think Bjorn's python-dna might be a better > > starting place for me than Bio.Restriction, as it already has some of the > > functionality I was looking for. > > Fair enough. > > > However, to you question, I'm still not quite getting the cut sites. You > > example with EcoRI makes complete sense, but I can't figure out the > pattern > > for some other enzymes, such as BsaI, which is why I got confused > initially. > > If you repeat that protocol for BsaI, the results don't match up. > > > > In [80]: BsaI.elucidate() > > Out[80]: 'GGTCTCN^NNNN_N' > > > > In [81]: BsaI.fst5 > > Out[81]: 7 > > > > In [82]: BsaI.fst3 > > Out[82]: 5 > > > > In [83]: BsaI.site > > Out[83]: 'GGTCTC' > > > > Based on this, I would expect that BsaI.fst3 should yield > > "11" but it yields 5. > > I think you are counting from the wrong reference point. > Using Python style indexing would only allow cleavage > points within the recognition site to be described. > > BsaI is a weird enzyme, and appears to be handled by the > Ambiguous class in Bio/Restriction/Restriction.py - which > says this is for enzymes for which the overhang is variable. > > >>> from Bio.Restriction import Bsal > >>> BsaI.is_ambiguous() > True > >>> BsaI.is_defined() # is there a consistent site? > False > >>> BsaI.is_unknown() > False > >>> BsaI.fst5 > 7 > >>> BsaI.fst3 > 5 > >>> BsaI.elucidate() > 'GGTCTCN^NNNN_N' > > This subclass has a more complicated elucidate method, > but gives the same string as the REBASE website, so this > is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html > > The 5' cut site of 7 clearly means this is downstream of > the 6 bp recognition site. This appears to be counted > from the start (left) of the restriction site. > > From the illustration the 3' cut side is also to the right of > the 5bp recognition site. 
It appears the number is counted > from the end (right) of the recognition site, where positive > as in BsaI means to the right (after the recognition site) > while negative as in EcoRI means to the left (within the > recognition site). > > Peter > > P.S. Please remember to CC the mailing list, e.g. reply all. > Unless people say explicitly that they have done this deliberately, > I generally assume taking a public discussion off list is accidental. > From p.j.a.cock at googlemail.com Mon Apr 8 13:55:47 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Apr 2013 18:55:47 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Mon, Apr 8, 2013 at 4:23 PM, Peter Cock wrote: > On Mon, Apr 8, 2013 at 3:10 PM, Nicolas Joannin > wrote: >> OK, I guess that'll be the same whichever platform... >> I guess I'll stick with 3.3.0 for the other machine then. >> Thanks for the update! >> >> Nicolas > > More bad news - what ever was changes I think something > similar was done in Python 2.7.4 as well, which also has > new test failures not seen under Python 2.7.3. Sigh. > > Peter Solved - this is a bug in Python 2.7.4 and 3.3.1 (which had a lot of gzip work done fixing other issues), but on the bright side the fix is quite trivial to apply manually: http://bugs.python.org/issue17666 Peter From p.j.a.cock at googlemail.com Tue Apr 9 05:39:12 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 9 Apr 2013 10:39:12 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Mon, Apr 8, 2013 at 6:55 PM, Peter Cock wrote: > > Solved - this is bug in Python 2.7.4 and 3.3.1 (which had a > lot of gzip work done fixing other issues), but on the bright > side the fix is quite trivial to apply manually: > http://bugs.python.org/issue17666 > > Peter Just a heads up, this also affects Python 3.2.4.
Peter From p.j.a.cock at googlemail.com Tue Apr 9 06:20:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 9 Apr 2013 11:20:43 +0100 Subject: [Biopython] OBF not accepted for GSoC 2013 Message-ID: Dear all, Unfortunately this year we have not been accepted on the Google Summer of Code scheme: I'm sure the rest of the OBF board and the other Bio* developers will join me in thanking Pjotr Prins for his efforts as the OBF GSoC administrator co-ordinating our application this year, as well as last year's administrator Rob Bruels and the other mentors for their efforts. For those of you not subscribed to the OBF's GSoC mailing list, I am forwarding Pjotr's email from last night (also below): http://lists.open-bio.org/pipermail/gsoc/2013/000211.html In all 177 organisations were accepted (about the same as the last few years), and they will be listed here (once they have filled out their profile information): https://google-melange.appspot.com/gsoc/accepted_orgs/google/gsoc2013 To potential students this summer, the good news is that some related organisations have been accepted, such as NESCent, the National Resource for Network Biology (NRNB - known for Cytoscape), SciRuby (Ruby Science Foundation), so there is still some scope for doing a bioinformatics related project in GSoC 2013, perhaps even with a Bio* developer as a co-mentor. Thank you all, Peter (Biopython developer, OBF board member) ---------- Forwarded message ---------- From: Pjotr Prins Date: Mon, Apr 8, 2013 at 9:13 PM Subject: Re: GSoC 2013 is ON To: Pjotr Prins Cc: ..., OBF GSoC Sadly, our application got rejected by GSoC this year. I am not sure what the reason was, but I am convinced our application was similar to that of other years. Maybe the project ideas could have been better presented. I am not sure at this stage. I'll make a list of successful projects to see if we can digest some truths. The upside is that FOSS is going strong! 
And that the field is getting increasingly competitive. As an open source geezer I can only be happy, even if it hurts our own application. Sorry everyone, and many thanks for the trouble you took getting projects written up. Let's not feel discouraged for next year. Pj. From nicolas.joannin at gmail.com Tue Apr 9 09:47:03 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Tue, 9 Apr 2013 22:47:03 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: Thanks for the fix! Cheers, Nicolas Nicolas Joannin, Ph.D. Bioinformatics Center Kyoto University, Uji campus, Japan On Tue, Apr 9, 2013 at 6:39 PM, Peter Cock wrote: > On Mon, Apr 8, 2013 at 6:55 PM, Peter Cock > wrote: > > > > Solved - this is bug in Python 2.7.4 and 3.3.1 (which had a > > lot of gzip work done fixing other issues), but on the bright > > side the fix is quite trivial to apply manually: > > http://bugs.python.org/issue17666 > > > > Peter > > Just a heads up, this also affects Python 3.2.4 as well. > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From matthiasschade.de at googlemail.com Thu Apr 11 05:20:31 2013 From: matthiasschade.de at googlemail.com (Matthias Schade) Date: Thu, 11 Apr 2013 11:20:31 +0200 Subject: [Biopython] query upper limit for NCBIWWW.qblast? Message-ID: <5166805F.8060603@googlemail.com> Hello everyone, is there an upper limit to how many sequences I can query via NCBIWWW.qblast at once? Sending up to 150 sequences each of 24mer length in a single string everything works fine. But now, I have tried the same for a string containing about 900 sequences. On good times, it takes the NCBI-server about 5min to send an answer. I save the answer and later open and parse the file by other functions in my code. 
However, even though I have queried the same 900 sequences, the resulting output-file varies in length (10 MB ...) and misses the closing "<\BlastOutput>" or even misses more (this does not happen when querying 150 sequences or less). I would guess once the server has started sending its answers, there might only be a limited time NCBIWWW.qblast waits for follow up packets ... and thus depending on the current server-load, the NCBIWWW.qblast-function simply decides to terminate waiting for incoming data after some time, resulting in my blast-output-files varying in length. Could anyone correct or verify this far-fetched hypothesis? My core-lines are: orgn='Mus Musculus' # or anything else result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100, entrez_query=str(orgn+"[orgn]")) save_file = open ('myblast_result.xml',"w") save_file.write(result.read()) Best regards, Matthias From p.j.a.cock at googlemail.com Thu Apr 11 05:43:44 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Apr 2013 10:43:44 +0100 Subject: [Biopython] query upper limit for NCBIWWW.qblast? In-Reply-To: <5166805F.8060603@googlemail.com> References: <5166805F.8060603@googlemail.com> Message-ID: On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade wrote: > Hello everyone, > > is there an upper limit to how many sequences I can query via NCBIWWW.qblast > at once? There are sometimes limits on the URL length, especially if going via firewalls and proxies, so that may be one factor. At the NCBI end, I'm not sure what limits they impose on this: http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html > Sending up to 150 sequences each of 24mer length in a single string > everything works fine. But now, I have tried the same for a string > containing about 900 sequences. On good times, it takes the NCBI-server > about 5min to send an answer. I save the answer and later open and parse the > file by other functions in my code.
However, even though I have queried the > same 900 sequences, the resulting output-file varies in length (10 > MB "<\BlastOutput>" or even misses more (this does not happen why querying 150 > sequences or less). > > I would guess once the server has started sending its answers, there might > only be a limited time NCBIWWW.qblast waits for follow up packets ... and > thus depending on the current server-load, the NCBIWWW.qblast-function > simply decides to terminate waiting for incomming data after some time, > resulting in my blast-output-files to vary in length. Could anyone correct > or verify this long-fetched hypothesis? > > My core-lines are: > > orgn='Mus Musculus' #on anything else > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100, > entrez_query=str(orgn+"[orgn]")) > save_file = open ('myblast_result.xml',"w") > save_file.write(result.read()) > > Best regards, > Matthias I think you've reached the scale where it would be better to run blastn locally - ideally on a cluster if you have access to one. You can download the whole NT database from here - most departments running BLAST with their own Linux servers will have a central copy which is kept automatically up to date: ftp://ftp.ncbi.nlm.nih.gov/blast/db/ If you don't have those kinds of resources, then you can even run BLAST on your own Windows machine - although I'm not sure how much RAM would be recommended for the NT database which is pretty big. Regards, Peter From ericmajinglong at gmail.com Thu Apr 11 12:49:27 2013 From: ericmajinglong at gmail.com (Eric Ma) Date: Thu, 11 Apr 2013 12:49:27 -0400 Subject: [Biopython] Request from help Message-ID: Hello everybody, I'm new to the mailing list here, though I've been playing with BioPython for quite a while. I'm having some trouble here. I wanted to display a tree of sequences for which I had done a multiple sequence alignment. I tried going through the pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
Because I'm still in the testing phase, instead of writing it as a single script, I wrote it as a series of scripts that I would execute in order. The problem I run into is at step 4 in the example, where I "feed the alignment to PhyML". My data set is 70 protein sequences, and the trouble I run into is that it takes a very, very long time at the "feeding alignment to PhyML" step. I tried running the script on my MacBook Pro overnight, and even the next morning it was not done. Am I missing something here? Just to be clear here, aligning the sequences using Muscle was successful, and I also managed to output a distance matrix from sample to sample, which I used in another downstream pipeline to display the clustering of the sequences on a 2D euclidean plane. However, I wanted to have a tree representation to validate the clustering results; the trouble is, I can't get the _phyml_tree.txt file to be created, which I would then use to draw the tree. Thanks in advance for any help! Cheers, Eric ----------------------------------------------------------------------- Please consider the environment before printing this e-mail. Do you really need to print it? http://about.me/ericmjl From jgibbons1 at mail.usf.edu Thu Apr 11 13:01:19 2013 From: jgibbons1 at mail.usf.edu (Justin Gibbons) Date: Thu, 11 Apr 2013 13:01:19 -0400 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: NCBI Standalone Blast gives you the option of querying the website so that you don't have to maintain a local database. Justin Gibbons On Thu, Apr 11, 2013 at 12:49 PM, Eric Ma wrote: > Hello everybody, > > I'm new to the mailing list here, though I've been playing with BioPython > for quite a while. > > I'm having some trouble here. I wanted to display a tree of sequences for > which I had done a multiple sequence alignment. I tried going through the > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline). 
> Because I'm still in the testing phase, instead of writing it as a single > script, I wrote it as a series of scripts that I would execute in order. > > The problem I run into is at step 4 in the example, where I "feed the > alignment to PhyML". My data set is 70 protein sequences, and the trouble I > run into is that it takes a very, very long time at the "feeding alignment > to PhyML" step. I tried running the script on my MacBook Pro overnight, and > even the next morning it was not done. Am I missing something here? > > Just to be clear here, aligning the sequences using Muscle was successful, > and I also managed to output a distance matrix from sample to sample, which > I used in another downstream pipeline to display the clustering of the > sequences on a 2D euclidean plane. However, I wanted to have a tree > representation to validate the clustering results; the trouble is, I can't > get the _phyml_tree.txt file to be created, which I would then use to draw > the tree. > > Thanks in advance for any help! > > Cheers, > Eric > ----------------------------------------------------------------------- > Please consider the environment before printing this e-mail. Do you really > need to print it? > > http://about.me/ericmjl > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Apr 11 13:07:05 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Apr 2013 18:07:05 +0100 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: On Thu, Apr 11, 2013 at 6:01 PM, Justin Gibbons wrote: > NCBI Standalone Blast gives you the option of querying the website so that > you don't have to maintain a local database. > > Justin Gibbons Did you reply to the wrong email? This thread was about alignments and trees. 
Peter From p.j.a.cock at googlemail.com Thu Apr 11 13:11:49 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Apr 2013 18:11:49 +0100 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma wrote: > Hello everybody, > > I'm new to the mailing list here, though I've been playing with BioPython > for quite a while. > > I'm having some trouble here. I wanted to display a tree of sequences for > which I had done a multiple sequence alignment. I tried going through the > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline). > Because I'm still in the testing phase, instead of writing it as a single > script, I wrote it as a series of scripts that I would execute in order. > > The problem I run into is at step 4 in the example, where I "feed the > alignment to PhyML". My data set is 70 protein sequences, and the trouble I > run into is that it takes a very, very long time at the "feeding alignment > to PhyML" step. I tried running the script on my MacBook Pro overnight, and > even the next morning it was not done. Am I missing something here? > > Just to be clear here, aligning the sequences using Muscle was successful, > and I also managed to output a distance matrix from sample to sample, which > I used in another downstream pipeline to display the clustering of the > sequences on a 2D euclidean plane. However, I wanted to have a tree > representation to validate the clustering results; the trouble is, I can't > get the _phyml_tree.txt file to be created, which I would then use to draw > the tree. > > Thanks in advance for any help! 
> > Cheers, > Eric Hi Eric, So this part is getting stuck (or taking a very long time): #Feed the alignment to PhyML using the command line wrapper: from Bio.Phylo.Applications import PhymlCommandline cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa', model='WAG', alpha='e', bootstrap=100) out_log, err_log = cmdline() At that point is the computer active (high CPU load as measured via the task manager / system monitor / top / etc)? I would suggest trying PhyML at the command line by hand; first check the command that Biopython should be running: print(cmdline) That may give you visual progress on screen. My guess is simply that this is just slow - you are only running 100 bootstraps, but perhaps each one is taking a while and that adds up. You said you had 70 protein sequences - how many columns are there in the alignment? That can also affect run times. Peter From nuin at genedrift.org Thu Apr 11 13:05:57 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Thu, 11 Apr 2013 13:05:57 -0400 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: On 2013-04-11, at 12:49 PM, Eric Ma wrote: > Hello everybody, > > I'm new to the mailing list here, though I've been playing with BioPython > for quite a while. > > I'm having some trouble here. I wanted to display a tree of sequences for > which I had done a multiple sequence alignment. I tried going through the > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline). > Because I'm still in the testing phase, instead of writing it as a single > script, I wrote it as a series of scripts that I would execute in order. > > The problem I run into is at step 4 in the example, where I "feed the > alignment to PhyML". My data set is 70 protein sequences, and the trouble I > run into is that it takes a very, very long time at the "feeding alignment > to PhyML" step. I tried running the script on my MacBook Pro overnight, and > even the next morning it was not done.
Am I missing something here? > Hi, With 70 OTUs you have about 5.00E115 possible unrooted trees. It is guaranteed to take a long time, independent of what parameters you are using in PhyML. Try with a smaller number of taxa, just for testing purposes, and depending on the complexity of your protein phylogeny, give your computer some weeks to actually generate some result. This is not a BioPython issue, it is more a phylogenetics one. Cheers Paulo > Just to be clear here, aligning the sequences using Muscle was successful, > and I also managed to output a distance matrix from sample to sample, which > I used in another downstream pipeline to display the clustering of the > sequences on a 2D euclidean plane. However, I wanted to have a tree > representation to validate the clustering results; the trouble is, I can't > get the _phyml_tree.txt file to be created, which I would then use to draw > the tree. > > Thanks in advance for any help! > > Cheers, > Eric > ----------------------------------------------------------------------- > Please consider the environment before printing this e-mail. Do you really > need to print it? > > http://about.me/ericmjl > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From ericmajinglong at gmail.com Thu Apr 11 13:20:14 2013 From: ericmajinglong at gmail.com (Eric Ma) Date: Thu, 11 Apr 2013 13:20:14 -0400 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: Hi Peter and Paulo, Thank you for your feedback, much appreciated! I still have very sparse knowledge about phylogenies, and especially the run times needed to build the trees, so any new knowledge is appreciated! The sequences I'm using are full Influenza A HA protein sequences, so we're talking about 1700-1750 amino acids being aligned together. The multiple sequence alignment for 70 sequences doesn't take long - on the order of minutes on my laptop.
It's the "feeding into PhyML" portion that, for some reason, takes a long time. With that said, I do have a full distance matrix as one of the outputs from a previous script in this script series, in addition to the multiple sequence alignment. I have been able to feed the distance matrix into a separate clustering algorithm from scikit-learn, and I was able to successfully identify six clusters of sequences in there. Hence, I wanted to use a phylogenetic tree to confirm what I'm seeing with the clustering algorithm - it's basically two separate representations of the same data. I have heard that it is possible to create a tree from the distance matrix, and I was thinking this might be an alternative to feeding the alignment into PhyML. Does anybody know how to do this using BioPython? Cheers, Eric ----------------------------------------------------------------------- Please consider the environment before printing this e-mail. Do you really need to print it? http://about.me/ericmjl On Thu, Apr 11, 2013 at 1:11 PM, Peter Cock wrote: > On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma wrote: > > Hello everybody, > > > > I'm new to the mailing list here, though I've been playing with BioPython > > for quite a while. > > > > I'm having some trouble here. I wanted to display a tree of sequences for > > which I had done a multiple sequence alignment. I tried going through the > > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline > ). > > Because I'm still in the testing phase, instead of writing it as a single > > script, I wrote it as a series of scripts that I would execute in order. > > > > The problem I run into is at step 4 in the example, where I "feed the > > alignment to PhyML". My data set is 70 protein sequences, and the > trouble I > > run into is that it takes a very, very long time at the "feeding > alignment > > to PhyML" step. I tried running the script on my MacBook Pro overnight, > and > > even the next morning it was not done. 
Am I missing something here? > > > > Just to be clear here, aligning the sequences using Muscle was > successful, > > and I also managed to output a distance matrix from sample to sample, > which > > I used in another downstream pipeline to display the clustering of the > > sequences on a 2D euclidean plane. However, I wanted to have a tree > > representation to validate the clustering results; the trouble is, I > can't > > get the _phyml_tree.txt file to be created, which I would then use to > draw > > the tree. > > > > Thanks in advance for any help! > > > > Cheers, > > Eric > > Hi Eric, > > So this part is getting stuck (or taking a very long time): > > #Feed the alignment to PhyML using the command line wrapper: > from Bio.Phylo.Applications import PhymlCommandline > cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa', > model='WAG', alpha='e', bootstrap=100) > out_log, err_log = cmdline() > > At that point is the computer active (high CPU load as measured > via the task manager / system monitor / top / etc)? > > I would suggest trying PHYML at the command line by hand, first > check the command the Biopython should be running: > > print cmdline > > That may give you visual progress on screen. My guess is simply > that this is just slow - you are only running 100 bootstraps, but > perhaps each one is taking a while and that adds up. > > You said you had 70 protein sequences - how many columns > are there in the alignment? That can also affect run times. > > Peter > From nuin at genedrift.org Thu Apr 11 13:33:05 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Thu, 11 Apr 2013 13:33:05 -0400 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: <8176FA21-39F6-405A-B338-94D87E6BB7B3@genedrift.org> On 2013-04-11, at 1:20 PM, Eric Ma wrote: > Hi Peter and Paulo, > > Thank you for your feedback, much appreciated! 
I still have very sparse > knowledge about phylogenies, and especially the run times needed to build > the trees, so any new knowledge is appreciated! > > The sequences I'm using are full Influenza A HA protein sequences, so we're > talking about 1700-1750 amino acids being aligned together. The multiple > sequence alignment for 70 sequences doesn't take long - on the order of > minutes on my laptop. It's the "feeding into PhyML" portion that, for some > reason, takes a long time. Alignment time is much smaller than any phylogeny calculation on your data size. The number of amino acids is not that important for the final time, as the ML calculation is quite fast, but arranging the branches is the main bottleneck. There's no easy solution for this; maybe you can try some other approaches, some that won't be as good as ML (Neighbour Joining) and some that might be as good (Bayes) but take some time too. > > With that said, I do have a full distance matrix as one of the outputs from > a previous script in this script series, in addition to the multiple > sequence alignment. I have been able to feed the distance matrix into a > separate clustering algorithm from scikit-learn, and I was able to > successfully identify six clusters of sequences in there. Hence, I wanted > to use a phylogenetic tree to confirm what I'm seeing with the clustering > algorithm - it's basically two separate representations of the same data. > The distance matrix can be used to generate a diagram. I wouldn't call it a phylogenetic tree, but it can give you some ideas. One quick way to check your tree is to use a Neighbour Joining approach; you can try Mega with your alignment file and see, calculations will be faster. Cheers Paulo > I have heard that it is possible to create a tree from the distance matrix, > and I was thinking this might be an alternative to feeding the alignment > into PhyML. Does anybody know how to do this using BioPython?
> > Cheers, > Eric > ----------------------------------------------------------------------- > Please consider the environment before printing this e-mail. Do you really > need to print it? > > http://about.me/ericmjl > > > On Thu, Apr 11, 2013 at 1:11 PM, Peter Cock wrote: > >> On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma wrote: >>> Hello everybody, >>> >>> I'm new to the mailing list here, though I've been playing with BioPython >>> for quite a while. >>> >>> I'm having some trouble here. I wanted to display a tree of sequences for >>> which I had done a multiple sequence alignment. I tried going through the >>> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline >> ). >>> Because I'm still in the testing phase, instead of writing it as a single >>> script, I wrote it as a series of scripts that I would execute in order. >>> >>> The problem I run into is at step 4 in the example, where I "feed the >>> alignment to PhyML". My data set is 70 protein sequences, and the >> trouble I >>> run into is that it takes a very, very long time at the "feeding >> alignment >>> to PhyML" step. I tried running the script on my MacBook Pro overnight, >> and >>> even the next morning it was not done. Am I missing something here? >>> >>> Just to be clear here, aligning the sequences using Muscle was >> successful, >>> and I also managed to output a distance matrix from sample to sample, >> which >>> I used in another downstream pipeline to display the clustering of the >>> sequences on a 2D euclidean plane. However, I wanted to have a tree >>> representation to validate the clustering results; the trouble is, I >> can't >>> get the _phyml_tree.txt file to be created, which I would then use to >> draw >>> the tree. >>> >>> Thanks in advance for any help! 
>>> >>> Cheers, >>> Eric >> >> Hi Eric, >> >> So this part is getting stuck (or taking a very long time): >> >> #Feed the alignment to PhyML using the command line wrapper: >> from Bio.Phylo.Applications import PhymlCommandline >> cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa', >> model='WAG', alpha='e', bootstrap=100) >> out_log, err_log = cmdline() >> >> At that point is the computer active (high CPU load as measured >> via the task manager / system monitor / top / etc)? >> >> I would suggest trying PHYML at the command line by hand, first >> check the command the Biopython should be running: >> >> print cmdline >> >> That may give you visual progress on screen. My guess is simply >> that this is just slow - you are only running 100 bootstraps, but >> perhaps each one is taking a while and that adds up. >> >> You said you had 70 protein sequences - how many columns >> are there in the alignment? That can also affect run times. >> >> Peter >> > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From jgibbons1 at mail.usf.edu Thu Apr 11 14:10:32 2013 From: jgibbons1 at mail.usf.edu (Justin Gibbons) Date: Thu, 11 Apr 2013 14:10:32 -0400 Subject: [Biopython] query upper limit for NCBIWWW.qblast? In-Reply-To: References: <5166805F.8060603@googlemail.com> Message-ID: NCBI Standalone Blast gives you the option of querying the website so that you don't have to maintain a local database. Justin Gibbons P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it correct this time. On Thu, Apr 11, 2013 at 5:43 AM, Peter Cock wrote: > On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade > wrote: > > Hello everyone, > > > > is there an upper limit to how many sequences I can query via > NCBIWWW.qblast > > at once? > > There are sometimes limits on the URL length, especially if going via > firewalls and proxies, so that may be one factor. 
> > At the NCBI end, I'm not sure what limits they impose on this: > http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html > > > Sending up to 150 sequences each of 24mer length in a single string > > everything works fine. But now, I have tried the same for a string > > containing about 900 sequences. On good times, it takes the NCBI-server > > about 5min to send an answer. I save the answer and later open and parse > the > > file by other functions in my code. However, even though I have queried > the > > same 900 sequences, the resulting output-file varies in length (10 > > MB > "<\BlastOutput>" or even misses more (this does not happen why querying > 150 > > sequences or less). > > > > I would guess once the server has started sending its answers, there > might > > only be a limited time NCBIWWW.qblast waits for follow up packets ... and > > thus depending on the current server-load, the NCBIWWW.qblast-function > > simply decides to terminate waiting for incomming data after some time, > > resulting in my blast-output-files to vary in length. Could anyone > correct > > or verify this long-fetched hypothesis? > > > > My core-lines are: > > > > orgn='Mus Musculus' #on anything else > > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100, > > entrez_query=str(orgn+"[orgn]")) > > save_file = open ('myblast_result.xml',"w") > > save_file.write(result.read()) > > > > Best regards, > > Matthias > > I think you've reach the scale where it would be better to run blastn > locally - ideally on a cluster if you have access to one. You can > download the whole NT database from here - most departments > running BLAST with their own Linux servers will have a central copy > which is kept automatically up to date: > ftp://ftp.ncbi.nlm.nih.gov/blast/db/ > > If you don't have those kinds of resources, then you can even > run BLAST on your own Windows machine - although I'm not > sure how much RAM would be recommended for the NT > database which is pretty big. 
> > Regards, > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Apr 11 14:54:50 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Apr 2013 19:54:50 +0100 Subject: [Biopython] query upper limit for NCBIWWW.qblast? In-Reply-To: References: <5166805F.8060603@googlemail.com> Message-ID: On Thursday, April 11, 2013, Justin Gibbons wrote: > NCBI Standalone Blast gives you the option of querying the website so that > you don't have to maintain a local database. Good point - the BLAST+ binaries added the -remote option which does that. Worth exploring as it should know and obey the NCBI limits automatically. > > Justin Gibbons > > P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it > correct this time. > > Easily done, don't worry about it. Peter From dan837446 at gmail.com Thu Apr 11 16:51:13 2013 From: dan837446 at gmail.com (Dan) Date: Fri, 12 Apr 2013 08:51:13 +1200 Subject: [Biopython] Biopython Digest, Vol 124, Issue 9 In-Reply-To: References: Message-ID: This is peripherally relevant to the question, I asked Tao Tao of NCBI user services about general guidelines for remote blast, and got this response: "In general, the key is to reduce the hits to BLAST server: At the search step, DO NOT submit searches that contain only single sequence! You need to batch the query and submit a set in a single search request. At the result polling step, you should reduce the result checking by spacing them out, and start checking for results after a delay (a few minutes). The XML result for batch queries is a bit peculiar each query is wrapped around tag You are better off leaving the other conditions default and post-process it to get the top hits" Also it's best to search between 9PM and 5AM Eastern Standard time and at weekends. 
Personally I seem to encounter glitches using batches above 100 but it's so specific to your particular workplace that I'm not sure if that's a good guideline.
From p.j.a.cock at googlemail.com Fri Apr 12 05:49:31 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 12 Apr 2013 10:49:31 +0100 Subject: [Biopython] query upper limit for NCBIWWW.qblast? In-Reply-To: <5166805F.8060603@googlemail.com> References: <5166805F.8060603@googlemail.com> Message-ID: Dan replied via the digest (summary emails rather than individual emails) here: http://lists.open-bio.org/pipermail/biopython/2013-April/008507.html On Thu, Apr 11, 2013 at 9:51 PM, Dan wrote: > This is peripherally relevant to the question, I asked Tao Tao of NCBI user > services about general guidelines for remote blast, and got this response: > > "In general, the key is to reduce the hits to BLAST server: > At the search step, DO NOT submit searches that contain only single > sequence! You need to batch the query and submit a set in a single search > request. > At the result polling step, you should reduce the result checking by > spacing them out, and start checking for results after a delay (a few > minutes). > The XML result for batch queries is a bit peculiar each query is wrapped > around tag > You are better off leaving the other conditions default and post-process it > to get the top hits" > > Also it's best to search between 9PM and 5AM Eastern Standard time and at > weekends. > Personally I seem to encounter glitches using batches above 100 but it's so > specific to your particular workplace that I'm not sure if that's a good > guideline. > Perhaps Biopython's QBLAST wrapper could benefit from adaptive time delays in the polling step - at the moment it just checks every three seconds.
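The adaptive-delay idea in the message above could look roughly like the following. This is only a sketch of exponential backoff using the standard library, not Biopython's QBLAST code: `backoff_delays`, `poll_with_backoff`, `is_ready` and all the delay values are hypothetical names and numbers chosen for illustration.

```python
import time


def backoff_delays(initial=3.0, factor=1.5, maximum=120.0):
    """Yield ever longer waits: initial, initial*factor, ... capped at maximum."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay *= factor


def poll_with_backoff(is_ready, max_checks=20, sleep=time.sleep, **kwargs):
    """Call is_ready() until it returns True, waiting longer before each check.

    is_ready is a hypothetical callable standing in for "ask the server
    whether the results are available yet". Returns the number of checks
    made, or raises RuntimeError once max_checks is exhausted.
    """
    delays = backoff_delays(**kwargs)
    for attempt in range(1, max_checks + 1):
        sleep(next(delays))  # wait before every check, including the first
        if is_ready():
            return attempt
    raise RuntimeError("gave up after %d checks" % max_checks)
```

Passing `sleep` in as a parameter is just a convenience so the schedule can be inspected without actually waiting; a real client would leave it at `time.sleep`.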
Peter

From john at picloud.com Fri Apr 12 19:11:43 2013
From: john at picloud.com (John Riley)
Date: Fri, 12 Apr 2013 16:11:43 -0700
Subject: [Biopython] BioPython now available on PiCloud by default
Message-ID:

Hello,

We've had some requests for BioPython to be deployed on PiCloud [1]. While any user could always create a custom environment, and install the latest version themselves [2], we've decided to address the issue directly by adding BioPython (1.60) into the default suite of scientific tools on PiCloud.

In short, to offload a Python function or program that uses BioPython, you don't need to do any setup! The instructions for using other scientific tools work just the same [3]. Hope this helps!

[1] http://www.picloud.com
[2] http://docs.picloud.com/environment.html
[3] http://docs.picloud.com/howto/pyscientifictools.html

Best Regards,
John

--
John Riley
PiCloud, Inc.

From jgibbons1 at mail.usf.edu Sat Apr 13 16:13:56 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Sat, 13 Apr 2013 16:13:56 -0400
Subject: [Biopython] Cookbook suggestion
Message-ID:

I want to add the following to the cookbook but I am unable to create an account.

#using SeqIO.write() without holding records in memory.

from Bio import SeqIO

seq_ids=set() #create an empty set to hold the sequence IDs.
indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by sequence ID but is not held in memory

for seq_record in SeqIO.parse(file_path, 'fasta'):
    #Filter according to some criteria:
    seq_ids.add(seq_record.id)

#write the fasta records to a new file using SeqIO.write()

SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path, 'fasta')

So if someone who can edit the cookbook wants to add it feel free to.
Justin Gibbons

From p.j.a.cock at googlemail.com Sat Apr 13 16:27:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 13 Apr 2013 21:27:24 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To:
References:
Message-ID:

Hi Justin,

On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons wrote:
> I want to add the following to the cookbook but I am unable to create an
> account.

Hmm - we should fix that. Is there a specific error message from the wiki?

> #using SeqIO.write() without holding records in memory.
>
> from Bio import SeqIO
>
>
> seq_ids=set() #create an empty set to hold the sequence IDs.
> indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by sequence
> ID but is not held in memory
>
> for seq_record in SeqIO.parse(file_path, 'fasta'):
>     #Filter according to some criteria:
>     seq_ids.add(seq_record.id)

Why do you call SeqIO.index, but not use it and instead get the ID list by doing a full parse of the file? Note that calling SeqIO.index is likely faster than SeqIO.parse because the index code doesn't actually load the sequence information etc - just the record identifier. This speed difference is more obvious on heavier file formats like GenBank. e.g. These single lines both get all the identifiers as a list:

seq_ids = list(SeqIO.index(file_path, 'fasta').keys())

vs:

seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')]

Also note that using a set rather than a list for the ids means the order is lost - which may be important.

> #write the fasta records to a new file using SeqIO.write()
>
> SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path,
> 'fasta')
>

That last line uses a list comprehension,

[indexed_fasta[seq_id] for seq_id in seq_ids]

That will therefore load all the records into memory as a list of SeqRecord objects, which can be avoided with a generator expression:

(indexed_fasta[seq_id] for seq_id in seq_ids)

i.e. round brackets not square.
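To make the square-versus-round bracket point concrete, here is a small standard-library sketch. It does not use Biopython at all: `load_record` is a made-up stand-in for an indexed lookup such as `indexed_fasta[seq_id]`, and all it does is note when it gets called.

```python
loaded = []  # order in which "records" were actually fetched


def load_record(seq_id):
    """Hypothetical stand-in for indexed_fasta[seq_id]; logs each fetch."""
    loaded.append(seq_id)
    return ">%s\nACGT\n" % seq_id


seq_ids = ["a", "b", "c"]

# Square brackets: every record is fetched immediately and kept in a list.
as_list = [load_record(i) for i in seq_ids]
fetched_by_list = len(loaded)  # all three have been loaded already

loaded.clear()

# Round brackets: nothing is fetched until a consumer asks for the next item.
as_generator = (load_record(i) for i in seq_ids)
fetched_before_use = len(loaded)  # nothing loaded yet
first = next(as_generator)
fetched_after_one = len(loaded)  # only the first record has been loaded
```

A writer that pulls records one at a time therefore only ever needs the current record in memory when it is handed the generator form.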
> So if someone who can edit the cookbook wants to add it feel free to. > > Justin Gibbons Feedback on the documentation and efforts to improve it are always welcome. However, I'm not sure what your example is trying to do yet - it seems to rewrite a FASTA file with the records in a new order (with the order given by however Python sorts the set of IDs). Thanks, Peter From jgibbons1 at mail.usf.edu Sun Apr 14 13:53:26 2013 From: jgibbons1 at mail.usf.edu (Justin Gibbons) Date: Sun, 14 Apr 2013 13:53:26 -0400 Subject: [Biopython] Cookbook suggestion In-Reply-To: References: Message-ID: My only goal was to demonstrate how to use SeqIO.write without holding all of the sequence records in memory by using a generator expression: SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids), new_file_path,'fasta') Everything else was just to provide context for the SeqIO.write() function, but it just ended up just being confusing. I am assuming that you want to check the individual fasta records for specific criteria and then write those that match the criteria to a new file. Which is why I wrote this: for seq_record in SeqIO.parse(file_path, 'fasta'): #Filter according to some critria: seq_ids.add(seq_record.id) For example you can create individual sets holding the sequence IDs of sequences that are within a given size range, and aren't repetitive. So that seq_ids=correct_length_set.intersection(non_repetitive_set) You need the indexed fasta so that you can get a copy of the sequence records that match your criteria: ndexed_fasta=SeqIO.index( file_path, 'fasta') #Can be searched by sequence ID but is not held in memory On Sat, Apr 13, 2013 at 4:27 PM, Peter Cock wrote: > Hi Justin, > > On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons > wrote: > > I want to add the following to the cookbook but I am unable to create an > > account. > > Hmm - we should fix that. Is there a specific error message > from the wiki? > > > #using SeqIO.write() without holding records in memory. 
> > > > from Bio import SeqIO > > > > > > seq_ids=set() #create an empty set to hold the sequence IDs. > > indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by > sequence > > ID but is not held in memory > > > > for seq_record in SeqIO.parse(file_path, 'fasta'): > > #Filter according to some critria: > > seq_ids.add(seq_record.id) > > Why do call SeqIO.index, but not use it and instead get > the ID list by doing a full parse of the file? Note that calling > SeqIO.index is likely faster than SeqIO.parse because the > index code doesn't actually load the sequence information > etc - just the record identifier. This speed difference is more > obvious on heavier file formats like GenBank. e.g. These > single lines both get all the identifiers as a list: > > seq_ids = SeqIO.parse(file_path, 'fasta').keys() > > vs: > > seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')] > > Also note that using a set rather than a list for the ids > means the order is lost - which may be important. > > > #write the fasta records to a new file using SeqIO.write() > > > > SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path, > > 'fasta') > > > > That last line uses a list comprehension, > [indexed_fasta[seq_id] for seq_id in seq_ids] > > That will therefore load all the records into memory as a list of > SeqRecord objects, which can be avoided with a list comprehension: > > (indexed_fasta[seq_id] for seq_id in seq_ids) > > i.e. round brackets not square. > > > So if someone who can edit the cookbook wants to add it feel free to. > > > > Justin Gibbons > > Feedback on the documentation and efforts to improve it > are always welcome. However, I'm not sure what your example > is trying to do yet - it seems to rewrite a FASTA file with the > records in a new order (with the order given by however > Python sorts the set of IDs). 
> > Thanks, > > Peter > From jgibbons1 at mail.usf.edu Sun Apr 14 13:58:53 2013 From: jgibbons1 at mail.usf.edu (Justin Gibbons) Date: Sun, 14 Apr 2013 13:58:53 -0400 Subject: [Biopython] Cookbook suggestion In-Reply-To: References: Message-ID: Sorry I accidentally sent the last email. You need the indexed fasta to get a copy of the sequence records that match your criteria: indexed_fasta=SeqIO.index(file_path, 'fasta') SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids), new_file_path,'fasta') As for editing the wiki when I click on "Login with OpenID" I get sent to a blank page. I also tried clicking on "Login" and tired to create a new account and was told "The action you have requested is limited to users in the group: Administrators ." On Sun, Apr 14, 2013 at 1:53 PM, Justin Gibbons wrote: > My only goal was to demonstrate how to use SeqIO.write without holding all > of the sequence records in memory by using a generator expression: > > SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids), > new_file_path,'fasta') > > Everything else was just to provide context for the SeqIO.write() > function, but it just ended up just being confusing. > > I am assuming that you want to check the individual fasta records for > specific criteria and then write those that match the criteria to a new > file. Which is why I wrote this: > > for seq_record in SeqIO.parse(file_path, 'fasta'): > #Filter according to some critria: > seq_ids.add(seq_record.id) > > For example you can create individual sets holding the sequence IDs of > sequences that are within a given size range, and aren't repetitive. 
So > that seq_ids=correct_length_set.intersection(non_repetitive_set) > > You need the indexed fasta so that you can get a copy of the sequence > records that match your criteria: > > ndexed_fasta=SeqIO.index( > file_path, 'fasta') #Can be searched by sequence > ID but is not held in memory > > > > > > On Sat, Apr 13, 2013 at 4:27 PM, Peter Cock wrote: > >> Hi Justin, >> >> On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons >> wrote: >> > I want to add the following to the cookbook but I am unable to create an >> > account. >> >> Hmm - we should fix that. Is there a specific error message >> from the wiki? >> >> > #using SeqIO.write() without holding records in memory. >> > >> > from Bio import SeqIO >> > >> > >> > seq_ids=set() #create an empty set to hold the sequence IDs. >> > indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by >> sequence >> > ID but is not held in memory >> > >> > for seq_record in SeqIO.parse(file_path, 'fasta'): >> > #Filter according to some critria: >> > seq_ids.add(seq_record.id) >> >> Why do call SeqIO.index, but not use it and instead get >> the ID list by doing a full parse of the file? Note that calling >> SeqIO.index is likely faster than SeqIO.parse because the >> index code doesn't actually load the sequence information >> etc - just the record identifier. This speed difference is more >> obvious on heavier file formats like GenBank. e.g. These >> single lines both get all the identifiers as a list: >> >> seq_ids = SeqIO.parse(file_path, 'fasta').keys() >> >> vs: >> >> seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')] >> >> Also note that using a set rather than a list for the ids >> means the order is lost - which may be important. 
>> >> > #write the fasta records to a new file using SeqIO.write() >> > >> > SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], >> new_file_path, >> > 'fasta') >> > >> >> That last line uses a list comprehension, >> [indexed_fasta[seq_id] for seq_id in seq_ids] >> >> That will therefore load all the records into memory as a list of >> SeqRecord objects, which can be avoided with a list comprehension: >> >> (indexed_fasta[seq_id] for seq_id in seq_ids) >> >> i.e. round brackets not square. >> >> > So if someone who can edit the cookbook wants to add it feel free to. >> > >> > Justin Gibbons >> >> Feedback on the documentation and efforts to improve it >> are always welcome. However, I'm not sure what your example >> is trying to do yet - it seems to rewrite a FASTA file with the >> records in a new order (with the order given by however >> Python sorts the set of IDs). >> >> Thanks, >> >> Peter >> > > From p.j.a.cock at googlemail.com Mon Apr 15 06:10:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 15 Apr 2013 11:10:15 +0100 Subject: [Biopython] BioPython now available on PiCloud by default In-Reply-To: References: Message-ID: On Sat, Apr 13, 2013 at 12:11 AM, John Riley wrote: > Hello, > > We've had some requests for BioPython to be deployed on PiCloud [1]. While > any user could always create a custom environment, and install the latest > version themselves [2], we've decided to address the issue directly by > adding BioPython (1.60) into the default suite of scientific tools on > PiCloud. > > In short, to offload a Python function or program that uses BioPython, you > don't need to do any setup! The instructions for using other scientific > tools work just the same [3]. Hope this helps! 
>
> [1] http://www.picloud.com
> [2] http://docs.picloud.com/environment.html
> [3] http://docs.picloud.com/howto/pyscientifictools.html
>
> Best Regards,
> John

Sounds interesting, and you have some very keen users already :)

http://blog.picloud.com/2011/09/27/building-a-biological-database-and-doing-comparative-genomics-in-the-cloud/

Regards,

Peter

From p.j.a.cock at googlemail.com Mon Apr 15 06:46:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Apr 2013 11:46:53 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To:
References:
Message-ID:

On Sun, Apr 14, 2013 at 6:58 PM, Justin Gibbons wrote:
> Sorry I accidentally sent the last email.
>
> You need the indexed fasta to get a copy of the sequence records that match
> your criteria:
>
> indexed_fasta=SeqIO.index(file_path, 'fasta')
> SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> new_file_path,'fasta')

With a simple sequential file format like FASTA where there are no complex file headers/footers to worry about, this might be the faster route:

with open(new_file_path, "w") as handle:
    for seq_id in seq_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

The idea here is never to parse the records into SeqRecord objects, just keep them as raw strings in FASTA format. The same idea works well on GenBank or SwissProt files which are slower to parse; there are examples of this in the main Tutorial,
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Were you intending this to be a self contained cookbook example for:
http://biopython.org/wiki/Category:Cookbook ?

> As for editing the wiki when I click on "Login with OpenID" I get sent to a
> blank page. I also tried clicking on "Login" and tried to create a new
> account and was told "The action you have requested is limited to users in
> the group: Administrators
> ."

Thanks - I've passed that on to our volunteer SysAdmin team.
(As an aside, do you have a GitHub account and would you think it would be easier to use the wiki hosted on GitHub instead of our own MediaWiki installation?) Thanks, Peter From swang129 at gmail.com Mon Apr 15 07:15:23 2013 From: swang129 at gmail.com (Sarah Wang) Date: Mon, 15 Apr 2013 04:15:23 -0700 Subject: [Biopython] pysam installation errors Inbox x In-Reply-To: References: Message-ID: When I tried to install pysam with "python setup.py install", multiple > warning messages have been generated (error messages copied below). I can > not import pysam. How can I resolve them? Thanks > > $Python setup.py install > > ... > Compiling module Cython.Plex.Scanners ... > Compiling module Cython.Plex.Actions ... > Compiling module Cython.Compiler.Lexicon ... > Compiling module Cython.Compiler.Scanning ... > Compiling module Cython.Compiler.Parsing ... > Compiling module Cython.Compiler.Visitor ... > Compiling module Cython.Compiler.FlowControl ... > Compiling module Cython.Compiler.Code ... > Compiling module Cython.Runtime.refnanny ... > warning: no files found matching '*.pyx' under directory > 'Cython/Debugger/Tests' > warning: no files found matching '*.pxd' under directory > 'Cython/Debugger/Tests' > warning: no files found matching '*.h' under directory > 'Cython/Debugger/Tests' > warning: no files found matching '*.pxd' under directory 'Cython/Utility' > clang: warning: argument unused during compilation: '-mno-fused-madd' > /tmp/easy_install-9yggMe/ > Cython-0.18/Cython/Plex/Scanners.c:7117:18: > warning: > unused function '__Pyx_CyFunction_New' [-Wunused-function] > static PyObject *__Pyx_CyFunction_New(PyTypeObject *type, PyMethodDef > *ml,... > ^ > 1 warning generated. 
> /tmp/easy_install-9yggMe/Cython-0.18/Cython/Plex/Scanners.c:2992:31: > warning: > implicit conversion loses integer precision: 'long' to 'int' > [-Wshorten-64-to-32] > __pyx_v_self->input_state = __pyx_v_input_state; > ~ ^~~~~~~~~~~~~~~~~~~ > /tmp/easy_install-9yggMe/Cython-0.18/Cython/Plex/Scanners.c:7117:18: > warning: > unused function '__Pyx_CyFunction_New' [-Wunused-function] > static PyObject *__Pyx_CyFunction_New(PyTypeObject *type, PyMethodDef > *ml,... > ^ > 2 warnings generated. > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > Adding Cython 0.18 to easy-install.pth file > Installing cygdb script to /usr/local/bin > Installing cython script to /usr/local/bin > > Installed > /Library/Python/2.7/site-packages/Cython-0.18-py2.7-macosx-10.8-intel.egg > Finished processing dependencies for pysam==0.7.4 > > > >>> import pysam > Traceback (most recent call last): > File "", line 1, in > File "pysam/__init__.py", line 1, in > from pysam.csamtools import * > ImportError: No module named csamtools > From p.j.a.cock at googlemail.com Mon Apr 15 07:27:30 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 15 Apr 2013 12:27:30 +0100 Subject: [Biopython] pysam installation errors Inbox x In-Reply-To: References: Message-ID: On Mon, Apr 15, 2013 at 12:15 PM, Sarah Wang wrote: > When I tried to install pysam with "python setup.py install", multiple > warning messages have been generated (error messages copied below). 
I can > not import pysam. How can I resolve them? Thanks Hi Sarah, This is the Biopython mailing list, and while we do discuss other tools in this case the pysam Google Group is the best place to ask: https://groups.google.com/forum/?fromgroups=#!topic/pysam-user-group/tOikIFU_ZFk Peter P.S. Those were compiler warnings, not errors, and I would guess they can be ignored. From ferreirafm at usp.br Mon Apr 15 08:34:12 2013 From: ferreirafm at usp.br (Frederico Moraes Ferreira) Date: Mon, 15 Apr 2013 09:34:12 -0300 Subject: [Biopython] BioPython now available on PiCloud by default In-Reply-To: References: Message-ID: <516BF3C4.1070107@usp.br> Hi John, Thanks for sharing such a very nice module. Best, Fred Em 12-04-2013 20:11, John Riley escreveu: > Hello, > > We've had some requests for BioPython to be deployed on PiCloud [1]. While > any user could always create a custom environment, and install the latest > version themselves [2], we've decided to address the issue directly by > adding BioPython (1.60) into the default suite of scientific tools on > PiCloud. > > In short, to offload a Python function or program that uses BioPython, you > don't need to do any setup! The instructions for using other scientific > tools work just the same [3]. Hope this helps! > > [1] http://www.picloud.com > [2] http://docs.picloud.com/environment.html > [3] http://docs.picloud.com/howto/pyscientifictools.html > > Best Regards, > John > > -- > John Riley > PiCloud, Inc. > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Dr. Frederico Moraes Ferreira University of Sao Paulo School of Medice Heart Institute - Immunology Av. Dr. 
Enéas de Carvalho Aguiar, 44
05403-900 São Paulo - SP
Brasil

From jgibbons1 at mail.usf.edu Mon Apr 15 15:40:15 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Mon, 15 Apr 2013 15:40:15 -0400
Subject: [Biopython] Cookbook suggestion
In-Reply-To:
References:
Message-ID:

It looks like there is already an example of this in the tutorial under
18.1.5, but I was planning on making it a self-contained cookbook example
so that it is easier to find.

If this is the fastest way to do it though:

with open(new_file_path, "w") as handle:
    for seq_id in seq_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

Is there any advantage to using SeqIO.write() other than it being shorter?

I do not have a GitHub account so I cannot comment on whether it would be
easier to use GitHub.

Thanks,
Justin

On Mon, Apr 15, 2013 at 6:46 AM, Peter Cock wrote:

> On Sun, Apr 14, 2013 at 6:58 PM, Justin Gibbons
> wrote:
> > Sorry I accidentally sent the last email.
> >
> > You need the indexed fasta to get a copy of the sequence records that match
> > your criteria:
> >
> > indexed_fasta=SeqIO.index(file_path, 'fasta')
> > SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> > new_file_path,'fasta')
>
> With a simple sequential file format like FASTA where there are no complex
> file headers/footers to worry about, this might be the faster route:
>
> with open(new_file_path, "w") as handle:
>     for seq_id in seq_ids:
>         handle.write(indexed_fasta.get_raw(seq_id))
>
> The idea here is never to parse the records into SeqRecord objects, just
> keep them as raw strings in FASTA format. The same idea works well on
> GenBank or SwissProt files which are slower to parse, there are examples
> of this in the main Tutorial,
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
> Were you intending this to be a self contained cookbook example for:
> http://biopython.org/wiki/Category:Cookbook ?
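A minimal sketch of the raw-copy route discussed in this thread, under a few assumptions: a recent Biopython on Python 3, where get_raw() returns bytes (so the output file is opened in binary mode rather than "w"), and made-up file names and identifiers (example.fasta, subset.fasta, seq1..seq3) that exist only for this demonstration.

```python
import os
import tempfile

from Bio import SeqIO


def copy_raw_records(fasta_path, seq_ids, out_path):
    """Copy the selected FASTA records verbatim, without building SeqRecords."""
    indexed_fasta = SeqIO.index(fasta_path, "fasta")
    try:
        # get_raw() hands back the record exactly as stored in the file
        # (bytes in current Biopython), so the output is written in binary mode.
        with open(out_path, "wb") as handle:
            for seq_id in seq_ids:
                handle.write(indexed_fasta.get_raw(seq_id))
    finally:
        indexed_fasta.close()


# Throwaway input file, written in binary so line endings are predictable.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "example.fasta")
target = os.path.join(workdir, "subset.fasta")
with open(source, "wb") as handle:
    handle.write(b">seq1\nACGT\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n")

# Records come out in the order the identifiers are requested.
copy_raw_records(source, ["seq3", "seq1"], target)
with open(target, "rb") as handle:
    subset_bytes = handle.read()
print(subset_bytes.decode())
```

Because nothing is parsed, the original line wrapping of each record is kept as-is; records that need editing still have to go through SeqIO.write.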
> > > As for editing the wiki when I click on "Login with OpenID" I get sent > to a > > blank page. I also tried clicking on "Login" and tired to create a new > > account and was told "The action you have requested is limited to users > in > > the group: Administrators< > http://biopython.org/w/index.php?title=Biopython:Administrators&action=edit&redlink=1 > > > > ." > > Thanks - I've passed that on to our volunteer SysAdmin team. > > (As an aside, do you have a GitHub account and would you think > it would be easier to use the wiki hosted on GitHub instead of > our own MediaWiki installation?) > > Thanks, > > Peter > From p.j.a.cock at googlemail.com Tue Apr 16 05:02:58 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 16 Apr 2013 10:02:58 +0100 Subject: [Biopython] Cookbook suggestion In-Reply-To: References: Message-ID: On Mon, Apr 15, 2013 at 8:40 PM, Justin Gibbons wrote: > It looks like there is already an example of this in the tutorial under > 18.1.5, but I was planning on making it a self contained cookbook example > so that it is easier to find. > > If this is the fastest way to do it though: > > with open(new_file_path, "w") as handle: > for seq_id in seq_ids: > handle.write(indexed_fasta. > get_raw(seq_id)) > Is there any advantage to using SeqIO.write() other then it being shorter? There are two linked choices here, (a) Full parsing into SeqRecord objects using SeqIO.parse, or use the SeqIO.index or SeqIO.index_db to just extract the record identifiers. Unless you need some of the annotation or the sequence, parsing it into a SeqRecord is a waste of CPU time. (b) Convert the SeqRecord back into a file on disk, or reuse the original representation from the input file. For a format like FASTA, this is almost a moot point - the only change is the white space (using SeqIO.write will produce consistent line wrapping). 
For some of the richer formats like GenBank the parse/write round trip
is not expected to produce an identical output, so it can be prudent
to reuse the original. For some formats we don't have writing support,
so you have to reuse the original.

My point is that whether to use SeqIO.write() or indexing and get_raw()
depends on the file format and what you are trying to do.

My recommendation would be to use get_raw to write simple file
formats without headers/footers if:

(*) You need to preserve original records exactly
(*) You need this to be as fast as possible
(*) SeqIO.write doesn't support the file format

Otherwise using SeqIO.write should be fine - it is also simpler in
terms of the code to call it. Of course, if you are editing the records
in any way, then you must use SeqIO.write anyway.

> I do not have a GitHub account so I cannot comment on whether
> it would be easier to use Github.

Thanks. My thinking is that right now you would need to register
separately for (1) the mailing lists, (2) editing the wiki, (3) reporting
bugs on RedMine, (4) submitting pull requests on GitHub.

If we used GitHub for the wiki and/or issue tracker, this means fewer
user accounts, so it is a little easier for contributors, and also less
SysAdmin work behind the scenes.

Peter

From nuin at genedrift.org Wed Apr 17 14:45:20 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Wed, 17 Apr 2013 14:45:20 -0400
Subject: [Biopython] GEO profiles retrieval
Message-ID:

Hi everyone

Quite a longish question about some data retrieval we are trying to implement
on GEO profiles. I don't know if this is possible to achieve programmatically
with (or without) BioPython, but some parts I already have set using Python
and BioPython.

What we are trying to achieve:

- we are building a pipeline where initially we want to see if the gene in
  question (let's say PTEN) is over or under expressed in certain conditions.
- using a eSearch URL/procedure I can get an XML with all the profile IDs for PTEN - in order to get more information about each profile, I can use an eSummary URL/procedure that will get an XML file for each profile - with these profiles we then want to check the gene expression level in each sample subgroup or the study and see if the gene is under or over expressed, or there's no change between the groups. The problem I have is that in the profile XML file there's no information about sample annotation, or gene expression in each sample. I created a workaround that from the eSummary XML, I can get to this page of the profile http://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS2877:1441937_s_at using the GDS and probe ID found on the XML. Again, from this file there's no easy way to extract the sample grouping/annotation, although it's quite straightforward to extract the gene expression levels for each sample. What I want to find is: - a way to get sample grouping/annotation for a specific GDS, that would give me the sample IDs that I could correlate to an expression value - a eSearch, eSummary, eFetch, any URL that would give me expression values per sample, with sample ID annotated to a group Thanks in advance for any help, idea and comments. Paulo From markbudde at gmail.com Wed Apr 17 17:24:00 2013 From: markbudde at gmail.com (Mark Budde) Date: Wed, 17 Apr 2013 14:24:00 -0700 Subject: [Biopython] Adding a SeqFeature to a SeqRecord Message-ID: Hi, I have a simple question. The cookbook shows many examples using SeqFeatures, I can't find any information on adding features to a SeqRecord. Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans nucleotides 10..100, is called "Gene1" and is on the reverse strand. How would I add this to my SeqRecord? 
Thanks,
Mark

From p.j.a.cock at googlemail.com Wed Apr 17 17:53:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Apr 2013 22:53:57 +0100
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
In-Reply-To:
References:
Message-ID:

Hi Mark,

On Wed, Apr 17, 2013 at 10:24 PM, Mark Budde wrote:
> Hi, I have a simple question. The cookbook shows many examples using
> SeqFeatures, I can't find any information on adding features to a
> SeqRecord.

The "Tutorial and Cookbook" does have examples of creating a
SeqFeature - if this was not obvious to you how might we make
it clearer?

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

See also the docstrings,

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> help(SeqFeature)
>>> help(FeatureLocation)

Online here (for the current release):
http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html
http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html

> Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans
> nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
> would I add this to my SeqRecord?
>
> Thanks,
> Mark

Which version of Biopython do you have? The strand is moving
from the SeqFeature to the FeatureLocation, but this will work
on old and new:

from Bio.SeqFeature import SeqFeature, FeatureLocation
loc = FeatureLocation(9, 100)
f = SeqFeature(loc, strand=-1, qualifiers={"locus_tag":"Gene1"})

This is preferred for future-proofing:

from Bio.SeqFeature import SeqFeature, FeatureLocation
loc = FeatureLocation(9, 100, strand=-1)
f = SeqFeature(loc, qualifiers={"locus_tag":"Gene1"})

Exactly where you put the gene name depends on what you'll be
doing with the record - for GenBank or EMBL output, using a
locus_tag key would be a sensible option.
Then if you have a SeqRecord, use my_record.features.append(f) or similar (and for GenBank/EMBL output pay attention to the order). Is that clear? Regards, Peter From markbudde at gmail.com Wed Apr 17 18:52:31 2013 From: markbudde at gmail.com (Mark Budde) Date: Wed, 17 Apr 2013 15:52:31 -0700 Subject: [Biopython] Adding a SeqFeature to a SeqRecord In-Reply-To: References: Message-ID: On Wed, Apr 17, 2013 at 2:53 PM, Peter Cock wrote: > Hi Mark, > > On Wed, Apr 17, 2013 at 10:24 PM, Mark Budde wrote: > > Hi, I have a simple question. The cookbook shows many examples using > > SeqFeatures, I can't find any information on adding features to a > > SeqRecord. > > The "Tutorial and Cookbook" does have examples of creating a > SeqFeature - if this was not obvious to you how might we make > it clearer? > > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > > I am coming at this from the perspective of generating a plasmid with features on it. I guess most people would be using this for mining data from pubmed or something, so maybe I'm just not the targeted user. I spent a lot of time looking for how to name a feature, like you would in a vector editing program. I now see that I can generate a feature as shown in the first example in 4.3.3 - is this what you are referring to? I was confused earlier because I could never figure out how to name the feature, nor how to add it to the SeqRecord. I can see how to do this from you example below (using qualifiers to name the feature, and append to add the feature). 
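Putting the pieces from this thread together, a complete run might look like the sketch below. It assumes a Biopython release that accepts the strand on the FeatureLocation (the form described in the thread as future-proof); the 120 bp sequence, the record id "plasmid1" and the feature type "gene" are invented for illustration, and the qualifier value is given as a list of strings on the assumption that this matches what the GenBank/EMBL parsers produce.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical 120 bp plasmid, used only so there is something to annotate.
my_record = SeqRecord(
    Seq("ACGT" * 30),
    id="plasmid1",
    name="plasmid1",
    description="example plasmid",
)

# GenBank-style 10..100 (1-based, inclusive) is 9:100 in Python-style
# coordinates (0-based, end-exclusive); strand=-1 marks the reverse strand.
loc = FeatureLocation(9, 100, strand=-1)
f = SeqFeature(loc, type="gene", qualifiers={"locus_tag": ["Gene1"]})

# features is a plain Python list on the record, so append() is all it takes.
my_record.features.append(f)

print(len(my_record.features))
print(f.location)
print(len(f.extract(my_record.seq)))
```

The extract() call at the end is only a sanity check that the feature covers the intended 91 bases of the parent sequence.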
I think the cookbook would benefit from adding a line such as >>> len(MyRecord.features) 0 >>> example_feature.qualifiers['locus_tag'] = 'Gene1' >>> MyRecord.features.append(example_feature) >>> len(MyRecord.features) 1 > See also the docstrings, > > >>> from Bio.SeqFeature import SeqFeature, FeatureLocation > >>> help(SeqFeature) > >>> help(FeatureLocation) > > Online here (for the current release): > http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html > > http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html > > > Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans > > nucleotides 10..100, is called "Gene1" and is on the reverse strand. How > > would I add this to my SeqRecord? > > > > Thanks, > > Mark > > Which version of Biopython do you have? The strand is moving > from the SeqFeature to the FeatureLocation, but this will work > on old and new: > > I have v1.59 > from Bio.SeqFeature import SeqFeature, FeatureLocation > loc = FeatureLocation(9, 100) > f = SeqFeature(loc, strand=-1, qualifiers={"locus_tag":"Gene1"}) > > This is preferred for future-proofing: > > from Bio.SeqFeature import SeqFeature, FeatureLocation > loc = FeatureLocation(9, 100, strand=-1) > f = SeqFeature(loc, qualifiers={"locus_tag":"Gene1"}) > > Exactly where you put the gene name depends on what you'll be > doing with the record - for GenBank or EMBL output, using a > locus_tag key would be a sensible option. > > Then if you have a SeqRecord, use my_record.features.append(f) > or similar (and for GenBank/EMBL output pay attention to the > order). > > Is that clear? Yes. Your example provided here is clear and I think it should be added to the cookbook. > > Regards, > > Peter > Thanks for your help Peter, and pardon my ignorance. 
-Mark

From mictadlo at gmail.com Mon Apr 22 00:05:58 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 22 Apr 2013 14:05:58 +1000
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute 'alignments'
Message-ID:

Hi,
The following code (BioPython 1.61, Blast+ 2.2.26):

from Bio.Blast import NCBIXML

with open("test/X.xml") as bf:
    blast_records = NCBIXML.parse(bf)

    for blast_record in blast_records:
        for alignment in blast_records.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 0.04:
                    print '****Alignment****'
                    print 'sequence:', alignment.title
                    print 'length:', alignment.length
                    print 'e value:', hsp.expect
                    print hsp.query[0:75] + '...'
                    print hsp.match[0:75] + '...'
                    print hsp.sbjct[0:75] + '...'

caused the following error:

$ python parseBlastXML.py
Traceback (most recent call last):
  File "parseBlastXML.py", line 8, in <module>
    for alignment in blast_records.alignments:
AttributeError: 'generator' object has no attribute 'alignments'

What did I do wrong?

Thank you in advance.

Mic

From mictadlo at gmail.com Mon Apr 22 00:27:12 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 22 Apr 2013 14:27:12 +1000
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute 'alignments'
In-Reply-To:
References:
Message-ID:

My mistake. This is the solution:

from Bio.Blast import NCBIXML

with open("test/XA10m_v3.0.aa.snap_vs_uniref90.blastp.xml") as bf:
    blast_records = NCBIXML.parse(bf)

    for blast_record in blast_records:
        # blast_record (one record), not blast_records (the generator)
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 0.04:
                    print '****Alignment****'
                    print 'sequence:', alignment.title
                    print 'length:', alignment.length
                    print 'e value:', hsp.expect
                    print hsp.query[0:75] + '...'
                    print hsp.match[0:75] + '...'
                    print hsp.sbjct[0:75] + '...'
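The error above is ordinary Python behaviour rather than anything BLAST-specific: a multi-record parser hands back a generator, and only the items it yields carry an alignments attribute. The stand-in sketch below uses no Biopython and no BLAST file; FakeRecord and fake_parse are hypothetical names invented purely to show the difference.

```python
from collections import namedtuple

# Stand-in for a parsed record: a query name plus its list of hits.
FakeRecord = namedtuple("FakeRecord", ["query", "alignments"])


def fake_parse():
    """Yield records lazily, the way a multi-record parser would."""
    yield FakeRecord("query1", ["hitA", "hitB"])
    yield FakeRecord("query2", [])


records = fake_parse()

# The generator itself has no 'alignments'; each yielded record does.
generator_has_alignments = hasattr(records, "alignments")

hits_per_query = {}
for record in records:
    # Use the loop variable (one record), not the generator it came from.
    hits_per_query[record.query] = list(record.alignments)

print(generator_has_alignments)
print(hits_per_query)
```

A record with an empty alignments list still goes through the outer loop, so anything printed there (such as the query name) appears even when the inner loops have nothing to iterate over.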
On Mon, Apr 22, 2013 at 2:05 PM, Mic wrote:

> Hi,
> The following code (BioPython 1.61, Blast+ 2.2.26):
>
> from Bio.Blast import NCBIXML
>
> with open("test/X.xml") as bf:
>     blast_records = NCBIXML.parse(bf)
>
>     for blast_record in blast_records:
>         for alignment in blast_records.alignments:
>             for hsp in alignment.hsps:
>                 if hsp.expect < 0.04:
>                     print '****Alignment****'
>                     print 'sequence:', alignment.title
>                     print 'length:', alignment.length
>                     print 'e value:', hsp.expect
>                     print hsp.query[0:75] + '...'
>                     print hsp.match[0:75] + '...'
>                     print hsp.sbjct[0:75] + '...'
>
> caused the following error:
> $ python parseBlastXML.py
> Traceback (most recent call last):
>   File "parseBlastXML.py", line 8, in <module>
>     for alignment in blast_records.alignments:
> AttributeError: 'generator' object has no attribute 'alignments'
>
> What did I do wrong?
>
> Thank you in advance.
>
> Mic
>
>
>

From p.j.a.cock at googlemail.com Mon Apr 22 04:08:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Apr 2013 09:08:50 +0100
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute 'alignments'
In-Reply-To:
References:
Message-ID:

On Monday, April 22, 2013, Mic wrote:

> My mistake. This is the solution
> from Bio.Blast import NCBIXML

Hi Mic,

Yep, you had two variables with very similar names. An easy mistake to
make - it's one of the things you'll learn to check with an AttributeError:
Am I using the object I think I'm using?

Well done for solving it yourself, and thank you for posting the solution
here.
Regards, Peter From mictadlo at gmail.com Wed Apr 24 01:55:06 2013 From: mictadlo at gmail.com (Mic) Date: Wed, 24 Apr 2013 15:55:06 +1000 Subject: [Biopython] NCBIXML: hit start and end Message-ID: Hi, I have tried to rewrite the Perl code to Biopython sub retrieve { my $blast_report = $options->{'blast'}; my $max_hits = $options->{'maxhits'}; my $searchio = new Bio::SearchIO( -format => 'blast', -file => $blast_report ); while ( my $result = $searchio->next_result ) { my $query_name = $result->query_name(); my $count_unirefs = 0; my %hit_names_count = (); while ( my $hit = $result->next_hit ) { $count_unirefs++; my $count_hsp = 0; my @plushsps = (); my @minhsps = (); while ( my $hsp = $hit->next_hsp ) { $count_hsp++; my $query_start = $hsp->start('query'); my $query_end = $hsp->end('query'); my $hit_start = $hsp->start('hit'); my $hit_end = $hsp->end('hit'); my $strand = $hsp->strand(); my $hit_desc = $hit->description(); my @hsp_data = ($query_start, $query_end, $hit_start, $hit_end, $hit_desc); } } } } Biopython code: --------------- from Bio import SeqIO from Bio.Blast import NCBIXML def retrieve_hits_data(): max_hits = 5 # Change to args with open("test/x.xml") as bf: blast_records = NCBIXML.parse(bf) for blast_record in blast_records: print blast_record.query print for alignment in blast_record.alignments: print 'sequence:', alignment.title print alignment.hit_id print alignment.hit_def print 'length:', alignment.length for hsp in alignment.hsps: print "HSPs" print "----" print 'e value:', hsp.expect #print hsp.query #print hsp.match #print hsp.sbjct print hsp.score print hsp.bits print hsp.num_alignments print hsp.identities print hsp.positives print hsp.gaps print hsp.align_length print hsp.strand print hsp.frame print hsp.query_start print hsp.query_end #print hsp.hit_start #print hsp.hit_end print hsp.sbjct_start print hsp.sbjct_end retrieve_hits_data() Output from Biopython code: XA10_v3.0-snap.1 XA10_v3.0-snap.2 XA10_v3.0-snap.3 XA10_v3.0-snap.4 
sequence: UniRef90_Q9FX16 F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH
UniRef90_Q9FX16
F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH
length: 308
HSPs
----
e value: 8.30308e-88
709.0
277.715
None
146
192
10
285
(None, None)
(0, 0)
10
290
8
286

How do I get hsp->start('hit') and hsp->end('hit') from the bioperl code in
Biopython?

Why does blast_record.query appear immediately in sequence and not after
the other two for loops have finished?

Thank you in advance.

Mic

From w.arindrarto at gmail.com Wed Apr 24 03:04:02 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 24 Apr 2013 09:04:02 +0200
Subject: [Biopython] NCBIXML: hit start and end
In-Reply-To:
References:
Message-ID:

Hi Mic,

> How do I get hsp->start('hit') and hsp->end('hit') from the bioperl code in
> Biopython?

With NCBIXML, they should be hsp.sbjct_start and hsp.sbjct_end respectively.

> Why does blast_record.query appears immediately in sequence and not after
> the other two for loops has finished?

It may be because the first three queries in your BLAST XML results
(XA10_v3.0-snap.{1..3}) do not have any hits and hsps. Check with your
XML results to be sure.

Hope that helps :),
Bow

From p.j.a.cock at googlemail.com Wed Apr 24 15:19:48 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 24 Apr 2013 20:19:48 +0100
Subject: [Biopython] Biopython GSoC 2013 applications via NESCent
Message-ID:

To all the Biopythoneers,

For the last few years Biopython has participated in the
Google Summer of Code (GSoC) program under the umbrella
of the Open Bioinformatics Foundation (OBF):
https://developers.google.com/open-source/soc/
https://github.com/OBF/GSoC

Unfortunately, like quite a few previously accepted organisations,
this year the OBF was not accepted. Google has kept the total about
the same year on year, so this is probably simply a slot rotation
to get some new organisations involved.
The good news (for those not following the Biopython-dev mailing list) is we have an alternative option agreed with the good people at NESCent, as we did back in 2009: http://biopython.org/wiki/Google_Summer_of_Code http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013 I'd like to thank Eric for co-ordinating this, and encourage any interested potential students to sign up to the Biopython development list and NESCent's Google+ group as soon as possible (if you haven't done so already): http://lists.open-bio.org/mailman/listinfo/biopython-dev https://plus.google.com/communities/105828320619238393015 Google are already accepting student applications, and the deadline is Friday 3 May. That doesn't leave very long for asking feedback and talking to potential mentors - which is essential for a competitive proposal. Thank you for your interest, Peter From nuin at genedrift.org Thu Apr 25 14:42:07 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Thu, 25 Apr 2013 14:42:07 -0400 Subject: [Biopython] PubmedCentral XML parsing Message-ID: Hi What would be the most direct way of parsing XML files downloaded from PubmedCentral ftp using BioPython? These are files that use the archivearticle.dtd and when parsed using non-DTD based code generate broken paragraphs on the body of the document due to < > between

items of the body. Thanks in advance Paulo From p.j.a.cock at googlemail.com Thu Apr 25 15:05:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 25 Apr 2013 20:05:32 +0100 Subject: [Biopython] PubmedCentral XML parsing In-Reply-To: References: Message-ID: On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin wrote: > Hi > > What would be the most direct way of parsing XML files downloaded from > PubmedCentral ftp using BioPython? These are files that use the > archivearticle.dtd and when parsed using non-DTD based code generate broken > paragraphs on the body of the document due to < > between

items of the > body. > > Thanks in advance > > Paulo The Bio.Entrez parser is DTD based, and might suit your needs. Peter From nuin at genedrift.org Thu Apr 25 15:16:49 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Thu, 25 Apr 2013 15:16:49 -0400 Subject: [Biopython] PubmedCentral XML parsing In-Reply-To: References: Message-ID: Hi Peter Thanks a lot. I am getting an error when trying to parse with Entrez.parse. I download the nxml file prior to parsing, using PMC's FTP server in order to avoid their bulk downloading restrictions. Anyway, the code I am using is quite simple (with ipython): In [1]: from Bio import Entrez In [2]: handle = open('nihms83342.nxml') In [3]: records = Entrez.parse(handle) In [4]: for i in records: ...: print i ...: --------------------------------------------------------------------------- NotXMLError Traceback (most recent call last) in () ----> 1 for i in records: 2 print i 3 /Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle) 229 # We did not see the initial 231 raise NotXMLError("XML declaration not found") 232 self.parser.Parse("", True) 233 self.parser = None NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format. And the file header is

Is there a different way of parsing this file? Thanks in advance Paulo On 2013-04-25, at 3:05 PM, Peter Cock wrote: > On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin wrote: >> Hi >> >> What would be the most direct way of parsing XML files downloaded from >> PubmedCentral ftp using BioPython? These are files that use the >> archivearticle.dtd and when parsed using non-DTD based code generate broken >> paragraphs on the body of the document due to < > between

items of the >> body. >> >> Thanks in advance >> >> Paulo > > The Bio.Entrez parser is DTD based, and might suit your needs. > > Peter From zhigang.wu at email.ucr.edu Fri Apr 26 20:52:19 2013 From: zhigang.wu at email.ucr.edu (Zhigang Wu) Date: Fri, 26 Apr 2013 17:52:19 -0700 Subject: [Biopython] [Biopython-dev] Biopython GSoC 2013 applications via NESCent In-Reply-To: References: Message-ID: Hi Peter, I am interested in implementing the lazy-loading sequence parsers. I know the time is pretty tight for me to write an proposal on it. But even I cannot contribute under the umbrella of GSoC and assuming no body is implemented, I am still interested in implementing this (I just wanna have something nice on my CV and while contributing to Open source software community as well). While at this moment, I don't have very clear picture on how to do it. Can you point me to somewhere where I can start to get a sense how this can be implemented. As far as I know, samtools (view) may have similar techniques in them. Thanks. Zhigang On Wed, Apr 24, 2013 at 12:19 PM, Peter Cock wrote: > To all the Biopythoneers, > > For the last few years Biopython has participated in the > Google Summer of Code (GSoC) program under the umbrella > of the Open Bioinformatics Foundation (OBF): > https://developers.google.com/open-source/soc/ > https://github.com/OBF/GSoC > > Unfortunately like quite a few previously accepted organisations, > this year the OBF not accepted. Google has kept the total about > the same year on year, so this is probably simply a slot rotation > to get some new organisations involved. 
> > The good news (for those not following the Biopython-dev > mailing list) is we have an alternative option agreed with > the good people at NESCent, as we did back in 2009: > > http://biopython.org/wiki/Google_Summer_of_Code > http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013 > > I'd like to thank Eric for co-ordinating this, and encourage > any interested potential students to sign up to the Biopython > development list and NESCent's Google+ group as soon as > possible (if you haven't done so already): > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > https://plus.google.com/communities/105828320619238393015 > > Google are already accepting student applications, and the > deadline is Friday 3 May. That doesn't leave very long for > asking feedback and talking to potential mentors - which > is essential for a competitive proposal. > > Thank you for your interest, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mictadlo at gmail.com Sun Apr 28 21:13:49 2013 From: mictadlo at gmail.com (Mic) Date: Mon, 29 Apr 2013 11:13:49 +1000 Subject: [Biopython] gff installation failed with easy_install Message-ID: Hi, I have tried to install gff with easy_install, but I got the following error: $ easy_install --prefix=/home/mic/apps/pymodules -UZ https://github.com/chapmanb/bcbb/tree/master/gff Downloading https://github.com/chapmanb/bcbb/tree/master/gff error: Unexpected HTML page found at https://github.com/chapmanb/bcbb/tree/master/gff How is it possible to install gff? Thank you in advance. 
Mic From chapmanb at 50mail.com Mon Apr 29 06:34:42 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 29 Apr 2013 06:34:42 -0400 Subject: [Biopython] gff installation failed with easy_install In-Reply-To: <517DEECF.60705@bx.psu.edu> References: <517DEECF.60705@bx.psu.edu> Message-ID: <87bo8xhbgd.fsf@fastmail.fm> Mic; > I have tried to install gff with easy_install, but I got the following > error: > $ easy_install --prefix=/home/mic/apps/pymodules -UZ > https://github.com/chapmanb/bcbb/tree/master/gff > Downloading https://github.com/chapmanb/bcbb/tree/master/gff > error: Unexpected HTML page found at > https://github.com/chapmanb/bcbb/tree/master/gff > > How is it possible to install gff? I don't know of a way to install directly from git with subdirectories like that. You'd need to clone, then install with easy_install or pip: $ git clone git://github.com/chapmanb/bcbb.git $ easy_install bcbb/gff $ pip install bcbb/gff Apologies about the convoluted setup. Depending on what you're doing, you might want to have a look at gffutils: https://github.com/daler/gffutils We're working on rolling the functionality from the gff library into this so there'll be one place to work from for GFF in python. Hope this helps, Brad From p.j.a.cock at googlemail.com Mon Apr 29 07:23:16 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 29 Apr 2013 12:23:16 +0100 Subject: [Biopython] PubmedCentral XML parsing In-Reply-To: References: Message-ID: On Thu, Apr 25, 2013 at 8:16 PM, Paulo Nuin wrote: > Hi Peter > > Thanks a lot. I am getting an error when trying to parse with > Entrez.parse. I download the nxml file prior to parsing, using PMC's FTP > server in order to avoid their bulk downloading restrictions. 
Anyway, the > code I am using is quite simple (with ipython): > > In [1]: from Bio import Entrez > > In [2]: handle = open('nihms83342.nxml') > > In [3]: records = Entrez.parse(handle) > > In [4]: for i in records: > ...: print i > ...: > > --------------------------------------------------------------------------- > NotXMLError Traceback (most recent call > last) > in () > ----> 1 for i in records: > 2 print i > 3 > > /Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, > handle) > 229 # We did not see the initial declaration, so > 230 # probably the input data is not in XML > format. > --> 231 raise NotXMLError("XML declaration not > found") > 232 self.parser.Parse("", True) > 233 self.parser = None > > NotXMLError: Failed to parse the XML data (XML declaration not found). > Please make sure that the input data are in XML format. > > And the file header is > > > DTD v2.3 20070202//EN" "archivearticle.dtd"> >

xmlns:mml="http://www.w3.org/1998/Math/MathML" > article-type="research-article" xml:lang="EN"> > > > > > > Is there a different way of parsing this file? > > Thanks in advance > > Paulo Hi Paulo, The header you've shown here does not match the file you attached to the bug report (the where first line is missing and there seem to be no line breaks either): https://redmine.open-bio.org/issues/3430 Where exactly did the nihms83342.nxml file come from? Is there a URL we can download it from to check? Thanks, Peter From mictadlo at gmail.com Mon Apr 29 23:13:19 2013 From: mictadlo at gmail.com (Mic) Date: Tue, 30 Apr 2013 13:13:19 +1000 Subject: [Biopython] gff installation failed with easy_install In-Reply-To: <87bo8xhbgd.fsf@fastmail.fm> References: <517DEECF.60705@bx.psu.edu> <87bo8xhbgd.fsf@fastmail.fm> Message-ID: Thank you it is working. On Mon, Apr 29, 2013 at 8:34 PM, Brad Chapman wrote: > > Mic; > > > I have tried to install gff with easy_install, but I got the following > > error: > > $ easy_install --prefix=/home/mic/apps/pymodules -UZ > > https://github.com/chapmanb/bcbb/tree/master/gff > > Downloading https://github.com/chapmanb/bcbb/tree/master/gff > > error: Unexpected HTML page found at > > https://github.com/chapmanb/bcbb/tree/master/gff > > > > How is it possible to install gff? > > I don't know of a way to install directly from git with subdirectories > like that. You'd need to clone, then install with easy_install or pip: > > $ git clone git://github.com/chapmanb/bcbb.git > $ easy_install bcbb/gff > $ pip install bcbb/gff > > Apologies about the convoluted setup. Depending on what you're doing, > you might want to have a look at gffutils: > > https://github.com/daler/gffutils > > We're working on rolling the functionality from the gff library into > this so there'll be one place to work from for GFF in python. 
> > Hope this helps, > Brad > From mictadlo at gmail.com Tue Apr 30 00:12:34 2013 From: mictadlo at gmail.com (Mic) Date: Tue, 30 Apr 2013 14:12:34 +1000 Subject: [Biopython] GFF parsing with biopython Message-ID: Hi, I have the following GFF file from a SNAP X1 SNAP Einit 2579 2712 -3.221 + . X1-snap.1 X1 SNAP Exon 2813 2945 4.836 + . X1-snap.1 X1 SNAP Eterm 3013 3033 10.467 + . X1-snap.1 X1 SNAP Esngl 3457 3702 -17.856 + . X1-snap.2 X1 SNAP Einit 4901 4974 -4.954 + . X1-snap.3 X1 SNAP Eterm 5021 5150 14.231 + . X1-snap.3 X1 SNAP Einit 6245 7325 -1.525 - . X1-snap.4 X1 SNAP Eterm 5974 6008 5.398 - . X1-snap.4 With the code below I have tried to parse the above GFF file from BCBio import GFF from pprint import pprint from BCBio.GFF import GFFExaminer def retrieve_pred_genes_data(): with open("test/X1_small.snap.gff") as sf: #examiner = GFFExaminer() #pprint(examiner.available_limits(sf)) for rec in GFF.parse(sf): pprint(rec.id) pprint(rec.description) pprint(rec.name) pprint(rec.features) #pprint(rec.type) #'SeqRecord' object has no attribute #pprint(rec.ref) #'SeqRecord' object has no attribute #pprint(rec.ref_db) #'SeqRecord' object has no attribute #pprint(rec.location) #'SeqRecord' object has no attribute #pprint(rec.location_operator) #'SeqRecord' object has no attribute #pprint(rec.strand) #'SeqRecord' object has no attribute #pprint(rec.sub_features) #'SeqRecord' object has no attribute retrieve_pred_genes_data() and got the following output: 'X1' '' '' [SeqFeature(FeatureLocation(ExactPosition(2578), ExactPosition(2712), strand=1), type='Einit'), SeqFeature(FeatureLocation(ExactPosition(2812), ExactPosition(2945), strand=1), type='Exon'), SeqFeature(FeatureLocation(ExactPosition(3012), ExactPosition(3033), strand=1), type='Eterm'), SeqFeature(FeatureLocation(ExactPosition(3456), ExactPosition(3702), strand=1), type='Esngl'), SeqFeature(FeatureLocation(ExactPosition(4900), ExactPosition(4974), strand=1), type='Einit'), 
SeqFeature(FeatureLocation(ExactPosition(5020), ExactPosition(5150), strand=1), type='Eterm'), SeqFeature(FeatureLocation(ExactPosition(6160), ExactPosition(7325), strand=-1), type='Einit'), SeqFeature(FeatureLocation(ExactPosition(5973), ExactPosition(6008), strand=-1), type='Eterm')] and with GFFExaminer I got these: {'gff_id': {('X1',): 8}, 'gff_source': {('SNAP',): 8}, 'gff_source_type': {('SNAP', 'Einit'): 3, ('SNAP', 'Esngl'): 1, ('SNAP', 'Eterm'): 3, ('SNAP', 'Exon'): 1}, 'gff_type': {('Einit',): 3, ('Esngl',): 1, ('Eterm',): 3, ('Exon',): 1}} I found these examples ( https://github.com/patena/jonikaslab-mutant-pools/blob/master/notes_on_GFF_parsing.txt), but I got these kind of errors: #pprint(rec.type) #'SeqRecord' object has no attribute #pprint(rec.ref) #'SeqRecord' object has no attribute #pprint(rec.ref_db) #'SeqRecord' object has no attribute #pprint(rec.location) #'SeqRecord' object has no attribute #pprint(rec.location_operator) #'SeqRecord' object has no attribute #pprint(rec.strand) #'SeqRecord' object has no attribute #pprint(rec.sub_features) #'SeqRecord' object has no attribute What did I do wrong and how is it possible to access all fields in the above GFF file? Thank you in advance. Mic
From jordan.r.willis at Vanderbilt.Edu Tue Apr 2 04:40:36 2013 From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R) Date: Tue, 2 Apr 2013 04:40:36 +0000 Subject: [Biopython] Superimposer troubles Message-ID: Hello List, I'm having trouble working through some issues with the superimposer for all-atom superpositions. Often times, we work on protein design and our end PDB files differs in atom-number and sometimes composition from our input.
I'm a big fan of the Superimposer, so we have implemented it like this:

from Bio.PDB import PDBParser, Superimposer

p = PDBParser()
native_pdb = p.get_structure("input", "input.pdb")
designed_pdb = p.get_structure("output", "output.pdb")

native_ca_atoms = []
native_all_atoms = []
designed_ca_atoms = []
designed_all_atoms = []
for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()):
    native_ca_atoms.append(native_residue['CA'])
    designed_ca_atoms.append(designed_residue['CA'])
    for (native_atom, designed_atom) in zip(native_residue.get_list(), designed_residue.get_list()):
        native_all_atoms.append(native_atom)
        designed_all_atoms.append(designed_atom)

superpose_ca = Superimposer()
superpose_all = Superimposer()

# set_atoms(fixed, moving) computes the fit; apply() moves the given atoms
superpose_ca.set_atoms(native_ca_atoms, designed_ca_atoms)
superpose_ca.apply(designed_pdb.get_atoms())
ca_rms = my_spiffy_rms_calculator(native_ca_atoms, designed_ca_atoms)

superpose_all.set_atoms(native_all_atoms, designed_all_atoms)
superpose_all.apply(designed_pdb.get_atoms())
all_rms = my_spiffy_rms_calculator(native_all_atoms, designed_all_atoms)

For the CA atoms it's not really a big deal, since everything we design has a CA atom. However, when we go to all atoms, it turns out that the designed residue and the native residue can be different, leading to a different number of atoms. I didn't realize it, but the zip function was making these two lists as long as the shorter one and not necessarily matching up the atoms. It would just hack off some part of the longer list! This way, the superimposer was never failing because it always had an equal number of atoms. Is the superimposer smart enough to just minimize the RMSD no matter how the lists are input, no matter what order? For instance, if I put the same arginine's atoms backwards in one list and forwards in the other list, would it still be able to give a 0.0 RMSD?

Thank you for your feedback,
Jordan

PS. Does the superimposer.rms method give back the RMSD of whatever atoms you put into it? Or is it always the CA atoms?
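
[Editorial sketch] The truncation described above can be reproduced without any PDB files. The snippet below is only an illustration under stated assumptions: residues are modelled as plain lists of (atom_name, coordinates) tuples rather than Bio.PDB objects, the coordinates are made up, and the helper names pair_by_position and pair_by_name are hypothetical. It shows that zip() silently drops the unmatched atom, that itertools.zip_longest (izip_longest in Python 2) can be used to turn a count mismatch into an explicit error, and that pairing by atom name is one possible way to keep only atoms present in both residues.

```python
from itertools import zip_longest

# Hypothetical stand-ins for Bio.PDB residues: each residue is a list of
# (atom_name, (x, y, z)) tuples. Real code would use Residue/Atom objects.
native_res = [("N", (0.0, 0.0, 0.0)), ("CA", (1.5, 0.0, 0.0)),
              ("C", (2.0, 1.4, 0.0)), ("O", (1.2, 2.3, 0.0)),
              ("CB", (2.1, -1.2, 0.5))]
designed_res = [("N", (0.1, 0.0, 0.0)), ("CA", (1.5, 0.1, 0.0)),
                ("C", (2.0, 1.4, 0.1)), ("O", (1.2, 2.3, 0.0))]  # no CB


def pair_by_position(res_a, res_b):
    """Pair atoms positionally; raise instead of silently truncating."""
    missing = object()  # sentinel that cannot collide with a real atom
    pairs = []
    for a, b in zip_longest(res_a, res_b, fillvalue=missing):
        if a is missing or b is missing:
            raise ValueError("residues have different atom counts")
        pairs.append((a, b))
    return pairs


def pair_by_name(res_a, res_b):
    """Pair only atoms whose names occur in both residues, in res_a order."""
    by_name_b = dict(res_b)
    return [((name, xyz), (name, by_name_b[name]))
            for name, xyz in res_a if name in by_name_b]


# Plain zip drops the unmatched CB without any warning.
truncated = list(zip(native_res, designed_res))
# Name-based pairing keeps the four backbone atoms common to both residues.
common = pair_by_name(native_res, designed_res)
```

With real structures the same idea would apply to the atom lists handed to the superimposer; whether name-based pairing is appropriate depends on how different the designed residues are from the native ones.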
From anaryin at gmail.com Tue Apr 2 07:07:08 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 2 Apr 2013 09:07:08 +0200
Subject: [Biopython] Superimposer troubles
In-Reply-To:
References:
Message-ID:

Hey Jordan,

Without checking the code, I'd say order matters. The two sequences of atoms will be aligned per position. If you have ca, c, n, o or ca, n, o, c you'll get different results. Try a simple glycine and switch the order of the atoms. I think it should work like this, but again, not sure.

As for the rms value, it depends on the input. If it's ca only, you get ca rmsd, etc.

Cheers,

João

----- This message was sent from a mobile phone and is likely to be short, concise, and direct.

On 2 Apr 2013 07:26, "Willis, Jordan R" <jordan.r.willis at vanderbilt.edu> wrote:

>
> Hello List,
>
>
> I'm having trouble working through some issues with the superimposer for
> all-atom superpositions. Often times, we work on protein design and our end
> PDB files differs in atom-number and sometimes composition from our input.
> I'm a big fan of the Superimposer, so we have implemented like this: > > p = PDBParser() > native_pdb = p.get_structure("input","input.pdb") > designed_pdb = p.get_structure("output","output.pdb") > > > native_ca_atoms = [] > native_all_atoms = [] > designed_ca_atoms = [] > designed_all_atoms = [] > for (native_residue, designed_residue) in zip(native_pdb.get_residues(), > designed_pdb.get_residues()): > native_ca_atoms.append(native_residue['CA']) > designed_ca_atoms.append(native_residue['CA'] > for (native_atom, designed_atom) in zip(native_residue.get_list(), > designed_residue.get_list()): > native_all_atoms.append(native_atom) > designed_atom.append(designed_atom) > > > superpose_ca = Superimposer() > superpose_all = Superimposer() > > superpose_ca.set(native_ca_atoms, designed_ca_atoms) > superpose_ca.apply(designed_pdb) > ca_rms = my_spiffy_rms_calculator(native_ca_atoms, designed_ca_atoms) > > > superpose_all.set(native_all_atoms, designed_all_atoms) > superpose_ca.apply(designed_pdb) > all_rms = my_spiffy_rms_calculator(native_all_atoms, designed_all_atoms) > > > For the CA atom residues its not really a big deal since everything we > design has a CA atom. However when we go into all atoms, it turns out that > the designed residue and the native residue can be different, thus leading > to a different number of atoms. I didn't realize, but the zip function was > making these two lists as big as the smallest list and not necessarily > matching up the atoms. It would just hack off some part of the larger list! > This way, the superimposer was never failing because it always had an > exact match of atoms. Is the superimposer smart enough to just minimize the > rmsd no matter how the lists are input, no matter what order? For instance > if I put the same arginines atoms backwards in one list, and forwards in > the other list, would it still be able to give a 0.0 rmsd? > > Thank you for your feedback, > Jordan > > PS. 
Does the superimposer.rms method give back the RMSD of whatever atoms > you put into it? Or is it always the CA atoms? > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Tue Apr 2 09:38:24 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 2 Apr 2013 10:38:24 +0100 Subject: [Biopython] Superimposer troubles In-Reply-To: References: Message-ID: On Tue, Apr 2, 2013 at 5:40 AM, Willis, Jordan R wrote: > > Hello List, > > > I'm having trouble working through some issues with the superimposer for all-atom > superpositions. Often times, we work on protein design and our end PDB files >differs in atom-number and sometimes composition from our input. I'm a big fan > of the Superimposer, so we have implemented like this: > > p = PDBParser() > native_pdb = p.get_structure("input","input.pdb") > designed_pdb = p.get_structure("output","output.pdb") > > > native_ca_atoms = [] > native_all_atoms = [] > designed_ca_atoms = [] > designed_all_atoms = [] > for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()): > native_ca_atoms.append(native_residue['CA']) > designed_ca_atoms.append(native_residue['CA'] > ... > > For the CA atom residues its not really a big deal since everything we design > has a CA atom. However when we go into all atoms, it turns out that the > designed residue and the native residue can be different, thus leading to a > different number of atoms. I didn't realize, but the zip function was making > these two lists as big as the smallest list and not necessarily matching up > the atoms. It would just hack off some part of the larger list! This way, > the superimposer was never failing because it always had an exact > match of atoms. How about using izip_longest (from itertools) rather than zip? 
That should give a clear error when the residue counts are different.

In general however, dealing with similar but different chains
will require some sort of pairwise alignment and/or restricting
to just backbone atoms (like CA, C-alpha).

Peter

From p.j.a.cock at googlemail.com Tue Apr 2 16:33:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Apr 2013 17:33:53 +0100
Subject: [Biopython] New to BP. Looking for closely spaced genes
In-Reply-To:
References:
Message-ID:

On Mon, Apr 1, 2013 at 7:41 PM, Mark Budde wrote:
> Hi,
> Before I dive too far into BioPython, I'd like to get some input if you
> BioPython is an appropriate tool for my task....
>
> I would like to look at the human genome ORF structure and identify regions
> where ORFs are closely spaced but differentially regulated, and also
> identify whether the ORFs are facing the same direction of opposing
> directions. To do this, I assume I would first download the annotated
> genome and write a script in BioPython annotating how far each ORF is from
> it's neighbors, what the orientation is, and store the result in a
> dictionary. Then I would download some expression data sets and add this to
> the data to the dictionary. Then I would write some algorithm comparing
> gene distance, orientation and expression correlation to generate a list of
> candidate ORF pairs which fit my criteria.
>
> My question is, is BioPython a reasonable tool to accomplish this, or is it
> going to be way to slow whereas some alternative package is better suited
> for my task?
> Thanks,
> Mark Budde

Hi Mark,

That sounds very doable with Biopython parsing GenBank format
chromosomes downloaded from the NCBI/EMBL/DDBJ. I did something
similar to look at overlaps and gaps between genes of bacteria some
years back - also using the Biopython GenBank parser, e.g.
http://mbe.oxfordjournals.org/cgi/content/abstract/msp302

In your case with humans there'll be lots of intron/exon structure
(join locations in GenBank) so I'd recommend trying the current
code from git (which will become Biopython 1.62) where this has
been re-factored to hopefully make joins much easier than before.

Regards,

Peter

From linxzh1989 at gmail.com Sat Apr 6 02:53:49 2013
From: linxzh1989 at gmail.com (=?GB2312?B?wdbQ0Nba?=)
Date: Sat, 6 Apr 2013 10:53:49 +0800
Subject: [Biopython] MUSCLE for alignment
Message-ID:

Hi all!
I have a seqdump.fasta file:
>lcl|24977
TGAGAAAGACTTGAGAGGACA

>lcl|24977:8-21
GAGATGACTTAGAGGACA

I want to use the wrapper for MUSCLE in Biopython to align the two sequences.
The alignment result should be written to an existing FASTA file.

>>> from Bio.Align.Applications import MuscleCommandline
>>> mcline = MuscleCommandline(input='seqdump.fasta',out='result.fasta')

But I cannot find anything in result.fasta after I run the command.
Am I missing something to get the result?

regards
Lin

From p.j.a.cock at googlemail.com Sat Apr 6 08:58:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 09:58:30 +0100
Subject: [Biopython] MUSCLE for alignment
In-Reply-To:
References:
Message-ID:

On Sat, Apr 6, 2013 at 3:53 AM, ??? wrote:
> Hi all !
> I have a seqdump.fasta file:
>>lcl|24977
> TGAGAAAGACTTGAGAGGACA
>
>>lcl|24977:8-21
> GAGATGACTTAGAGGACA
>
> I want to use a wrapper for Muscle in Biopython to align the two seq.
> the alignment result will put into a existing fasta file.
>
>>>>from Bio.Align.Applications import MuscleCommandline
>>>>mcline = MuscleCommandline(input='seqdump.fasta',out='result.fasta')
>
> But i can not find anything in the result.fasta after i run the command.
> Do i have any missing to get the result?
>
> regards
> Lin

Hi Lin,

In your example you've not yet called Muscle,

#Load the library:
from Bio.Align.Applications import MuscleCommandline
#Create command line wrapper instance,
mcline = MuscleCommandline(input='seqdump.fasta',out='result.fasta')
#Optionally show what command it would run:
print(mcline)
#Actually run the command,
stdout, stderr = mcline()

Does that help? The main Tutorial does have some more detailed examples.

Peter

From p.j.a.cock at googlemail.com Sat Apr 6 11:41:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 12:41:33 +0100
Subject: [Biopython] MUSCLE for alignment
In-Reply-To:
References:
Message-ID:

On Sat, Apr 6, 2013 at 12:18 PM, ??? wrote:
> Thank you! Peter.
> It really helps me.
> If i do not specify it by: stdout, stderr = mcline()
> the alignment will writen to stdout, instead of the output file.
> Is it correct?

MUSCLE will by default write the alignment to stdout, but you
used the out argument to specify an output filename instead.
In this case stdout will probably be empty.

There are some stdout examples using MUSCLE in the
Biopython Tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Peter

P.S. Please CC the mailing list.

From linxzh1989 at gmail.com Sat Apr 6 13:57:31 2013
From: linxzh1989 at gmail.com (=?GB2312?B?wdbQ0Nba?=)
Date: Sat, 6 Apr 2013 21:57:31 +0800
Subject: [Biopython] MUSCLE for alignment
In-Reply-To:
References:
Message-ID:

Thank you for your advice. I will CC the mailing list.

regards

2013/4/6 Peter Cock :
> On Sat, Apr 6, 2013 at 12:18 PM, ??? wrote:
>> Thank you! Peter.
>> It really helps me.
>> If i do not specify it by: stdout, stderr = mcline()
>> the alignment will writen to stdout, instead of the output file.
>> Is it correct?
>
> MUSCLE will by default write the alignment to stdout, but you
> used the out argument to specify an output filename instead.
> In this case stdout will probably be empty.
> > There are some stdout examples using MUSCLE in the > Biopython Tutorial: > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > > Peter > > P.S. Please CC the mailing list. From nicolas.joannin at gmail.com Sat Apr 6 15:31:40 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Sun, 7 Apr 2013 00:31:40 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 Message-ID: Hello everyone, I'm having a problem installing biopython with Python 3.3.1rc1... Basically, I get several tests failing (in addition to a lot of warnings). I don't think the failed tests will be a problem for my work, however, I thought you'd want to have a look... Attached is the output of python3 setup.py test. Also, if you think I shouldn't use biopython without having these failed tests fixed first, please let me know! Best regards, Nicolas -------------- next part -------------- Nicolass-MacBook-Air:biopython NicojoAir11$ python3 setup.py test WARNING - Biopython does not yet officially support Python 3 The 2to3 library will be called automatically now, and the converted files cached under build/py3.3 Processing Bio Processing BioSQL Processing Tests Processing Scripts Processing Doc Python 2to3 processing done. running test Python version: 3.3.1rc1 (v3.3.1rc1:92c2cfb92405, Mar 25 2013, 00:54:04) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] Operating system: posix darwin test_Ace ... ok test_AlignIO ... ok test_AlignIO_FastaIO ... ok test_AlignIO_convert ... ok test_Application ... ok test_BioSQL_MySQLdb ... skipping. Install MySQLdb if you want to use mysql with BioSQL test_BioSQL_psycopg2 ... skipping. Connection failed, check settings if you plan to use BioSQL: FATAL: role "postgres" does not exist test_BioSQL_sqlite3 ... ok test_CAPS ... ok test_Chi2 ... ok test_ClustalOmega_tool ... skipping. Install clustalo if you want to use Clustal Omega from Biopython. test_Clustalw_tool ... skipping. 
Install clustalw or clustalw2 if you want to use it from Biopython. test_Cluster ... ok test_CodonTable ... ok test_CodonUsage ... ok test_ColorSpiral ... skipping. Install reportlab if you want to use Bio.Graphics. test_Compass ... ok test_Crystal ... ok test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper. test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss. test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools. test_EmbossPrimer ... ok test_Entrez ... ok test_Entrez_online ... FAIL test_Enzyme ... ok test_FSSP ... ok test_Fasttree_tool ... skipping. Install fasttree and correctly set the file path to the program if you want to use it from Biopython. test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_KDTree ... ok test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LogisticRegression ... ok test_MMCIF ... skipping. C extension MMCIFlex not installed. test_Mafft_tool ... ok test_MarkovModel ... ok test_Medline ... ok test_Motif ... ok test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper. test_NCBIStandalone ... ok test_NCBITextParser ... 
ok test_NCBIXML ... ok test_NCBI_BLAST_tools ... ok test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PAML_baseml ... ok test_PAML_codeml ... ok test_PAML_tools ... skipping. Install PAML if you want to use the Bio.Phylo.PAML wrapper. test_PAML_yn00 ... ok test_PDB ... ok test_PDB_KDTree ... ok test_ParserSupport ... ok test_Pathway ... ok test_Phd ... ok test_Phylo ... ok test_PhyloXML ... ok test_Phylo_CDAO ... skipping. Install the librdf Python bindings if you want to use the CDAO tree format. test_Phylo_NeXML ... ./test_Phylo_NeXML.py:87: ResourceWarning: unclosed file <_io.BufferedReader name='/var/folders/9w/kkwnss4n52bbc3crhctbhfnh0000gn/T/tmpf9__6a'> t2 = next(NeXMLIO.Parser(open(DUMMY, 'rb')).parse()) ok test_Phylo_depend ... skipping. Install matplotlib if you want to use Bio.Phylo._utils. test_PopGen_DFDist ... skipping. Install Dfdist, Ddatacal, pv2 and cplot2 if you want to use DFDist with Bio.PopGen.FDist. test_PopGen_FDist ... skipping. Install fdist2, datacal, pv and cplot if you want to use FDist2 with Bio.PopGen.FDist. test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_nodepend ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper. test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper. test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='SCOP/scopseq-test/astral-scopdom-seqres-all-test.fa' mode='r' encoding='UTF-8'> for record in sequences: ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SCOP_online ... ok test_SVDSuperimposer ... ok test_SearchIO_blast_tab ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release. BiopythonExperimentalWarning) ok test_SearchIO_blast_tab_index ... ok test_SearchIO_blast_text ... ok test_SearchIO_blast_xml ... ok test_SearchIO_blast_xml_index ... ok test_SearchIO_blat_psl ... ok test_SearchIO_blat_psl_index ... ok test_SearchIO_exonerate ... ok test_SearchIO_exonerate_text_index ... ok test_SearchIO_exonerate_vulgar_index ... ok test_SearchIO_fasta_m10 ... ok test_SearchIO_fasta_m10_index ... ok test_SearchIO_hmmer2_text ... ok test_SearchIO_hmmer2_text_index ... ok test_SearchIO_hmmer3_domtab ... ok test_SearchIO_hmmer3_domtab_index ... ok test_SearchIO_hmmer3_tab ... ok test_SearchIO_hmmer3_tab_index ... ok test_SearchIO_hmmer3_text ... ok test_SearchIO_hmmer3_text_index ... ok test_SearchIO_model ... ok test_SearchIO_write ... ok test_SeqIO ... ok test_SeqIO_AbiIO ... ok test_SeqIO_FastaIO ... 
./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids)) ./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> default = list(SeqIO.parse(open(filename), "fasta", alphabet)) ./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'> re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids)) ./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'> default = list(SeqIO.parse(open(filename), "fasta", alphabet)) ./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/fa01' mode='r' encoding='UTF-8'> re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids)) ./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/fa01' mode='r' encoding='UTF-8'> default = list(SeqIO.parse(open(filename), "fasta", alphabet)) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/centaurea.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/centaurea.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/elderberry.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/elderberry.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f001' mode='r' encoding='UTF-8'> second = next(iterator) 
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f001' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lavender.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lavender.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lupine.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lupine.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/phlox.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/phlox.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/sweetpea.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/sweetpea.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/wisteria.nu' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/wisteria.nu' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/aster.pro' mode='r' encoding='UTF-8'> second = 
next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/aster.pro' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/loveliesbleeding.pro' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/loveliesbleeding.pro' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rose.pro' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rose.pro' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rosemary.pro' mode='r' encoding='UTF-8'> second = next(iterator) ./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rosemary.pro' mode='r' encoding='UTF-8'> record = SeqIO.read(open(filename), "fasta", alphabet) ok test_SeqIO_Insdc ... ok test_SeqIO_PdbIO ... ok test_SeqIO_QualityIO ... 
./test_SeqIO_QualityIO.py:348: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> records1 = list(SeqIO.parse(open("Quality/example.fasta"),"fasta")) ./test_SeqIO_QualityIO.py:349: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq")) /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:357: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> self.assertEqual(h.getvalue(),open("Quality/example.fasta").read()) ./test_SeqIO_QualityIO.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> open("Quality/example.qual"))) ./test_SeqIO_QualityIO.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> open("Quality/example.qual"))) ./test_SeqIO_QualityIO.py:329: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq")) ./test_SeqIO_QualityIO.py:334: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> records1 = list(SeqIO.parse(open("Quality/example.qual"),"qual")) ./test_SeqIO_QualityIO.py:335: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq")) ./test_SeqIO_QualityIO.py:344: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> 
self.assertEqual(h.getvalue(),open("Quality/example.qual").read()) ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_original_illumina.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_original_sanger.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_original_sanger.fastq' mode='r' 
encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_original_sanger.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_original_sanger.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper 
name='Quality/solexa_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_original_solexa.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_sanger.fastq' mode='rU' encoding='UTF-8'> "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_original_sanger.fastq' mode='r' encoding='UTF-8'> for record in records: ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_solexa.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_illumina.fastq' mode='rU' encoding='UTF-8'> "rU").read() ./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads_no_trim.fasta' mode='r' encoding='UTF-8'> wanted = list(SeqIO.parse(open(out_name), format)) ./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads_no_trim.qual' mode='r' encoding='UTF-8'> wanted = list(SeqIO.parse(open(out_name), format)) ./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads.fasta' 
mode='r' encoding='UTF-8'> wanted = list(SeqIO.parse(open(out_name), format)) ./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads.qual' mode='r' encoding='UTF-8'> wanted = list(SeqIO.parse(open(out_name), format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_at_end.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_at_start.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_in_middle.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_index_at_start.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_index_in_middle.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_no_manifest.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) 
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/greek.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_faked.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/paired.sff'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_93.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_faked.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_example.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/tricky.fastq' mode='r' encoding='UTF-8'> records = list(SeqIO.parse(open(filename, mode),in_format)) ok test_SeqIO_SeqXML ... ./test_SeqIO_SeqXML.py:141: DeprecationWarning: Please use assertEqual instead. self.assertEquals(len(read1_records),len(read2_records)) ok test_SeqIO_convert ... ok test_SeqIO_features ... 
./test_SeqIO_features.py:190: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/iro.gb' mode='rU' encoding='UTF-8'> gbk_template = open("GenBank/iro.gb", "rU").read() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqFeature.py:155: BiopythonDeprecationWarning: Rather than sub_features, use a CompoundFeatureLocation BiopythonDeprecationWarning) ./test_SeqIO_features.py:988: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:989: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'> gb_cds = list(SeqIO.parse(open(self.gb_filename),"genbank-cds")) ./test_SeqIO_features.py:990: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.faa' mode='r' encoding='UTF-8'> fasta = list(SeqIO.parse(open(self.faa_filename),"fasta")) ./test_SeqIO_features.py:988: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:989: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_cds = list(SeqIO.parse(open(self.gb_filename),"genbank-cds")) ./test_SeqIO_features.py:990: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.faa' mode='r' encoding='UTF-8'> fasta = list(SeqIO.parse(open(self.faa_filename),"fasta")) ./test_SeqIO_features.py:1070: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:1072: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.ffn' mode='r' encoding='UTF-8'> fa_records = list(SeqIO.parse(open(self.ffn_filename),"fasta")) ./test_SeqIO_features.py:1023: 
ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:1024: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'> embl_record = SeqIO.read(open(self.embl_filename),"embl") ./test_SeqIO_features.py:1054: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_record = SeqIO.read(open(self.gb_filename),"genbank") ./test_SeqIO_features.py:1055: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.fna' mode='r' encoding='UTF-8'> fa_record = SeqIO.read(open(self.fna_filename),"fasta") ./test_SeqIO_features.py:1059: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'> embl_record = SeqIO.read(open(self.embl_filename),"embl") ./test_SeqIO_features.py:1036: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.faa' mode='r' encoding='UTF-8'> faa_records = list(SeqIO.parse(open(self.faa_filename),"fasta")) ./test_SeqIO_features.py:1037: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.ffn' mode='r' encoding='UTF-8'> ffn_records = list(SeqIO.parse(open(self.ffn_filename),"fasta")) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AAA03323.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/DD231055_edited.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper 
name='EMBL/Human_contigs.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NT_019265.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/SC10H5.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/TRBG361.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/U87107.embl' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/arab1.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/blank_seq.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/cor6_6.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/dbsource_wrap.gb' mode='r' encoding='UTF-8'> gb_records = 
list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/extra_keywords.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/gbvrl1_start.seq' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/noref.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/one_of.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/origin_line.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/pri1.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/protein_refseq.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/protein_refseq2.gb' mode='r' encoding='UTF-8'> gb_records = list(SeqIO.parse(open(filename),in_format)) ok test_SeqIO_index ... FAIL test_SeqIO_online ... ok test_SeqIO_write ... ok test_SeqRecord ... ok test_SeqUtils ... 
./test_SeqUtils.py:71: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(dna_genbank_filename), "genbank")
./test_SeqUtils.py:55: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'>
  seq_records = list(SeqIO.parse(open(dna_fasta_filename), "fasta"))
ok
test_Seq_objs ... ok
test_SffIO ... ok
test_SubsMat ...
./test_SubsMat.py:21: ResourceWarning: unclosed file <_io.TextIOWrapper name='SubsMat/protein_count.txt' mode='r' encoding='UTF-8'>
  ftab_prot = FreqTable.read_count(open(ftab_file))
./test_SubsMat.py:23: ResourceWarning: unclosed file <_io.TextIOWrapper name='SubsMat/protein_freq.txt' mode='r' encoding='UTF-8'>
  ctab_prot = FreqTable.read_freq(open(ctab_file))
./test_SubsMat.py:31: ResourceWarning: unclosed file <_io.BufferedReader name='SubsMat/acc_rep_mat.pik'>
  acc_rep_mat = pickle.load(open(pickle_file, 'rb'))
ok
test_SwissProt ... ok
test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper.
test_TogoWS ...
./test_TogoWS.py:501: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  new = SeqIO.read(TogoWS.convert(open(filename), "genbank", "embl"), "embl")
./test_TogoWS.py:494: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  new = SeqIO.read(TogoWS.convert(open(filename), "genbank", "fasta"), "fasta")
ok
test_Tutorial ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='ls_orchid.gbk'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='ls_orchid.gbk.bgz'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_001.txt'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_005.txt'>
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_001.txt'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='pubmed_result1.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='pubmed_result2.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='lipoprotein.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='SRF.pfm' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='REB1.pfm' mode='r' encoding='UTF-8'>
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='meme.out' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='alignace.out' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='SRF.pfm' mode='r' encoding='UTF-8'>
ok
test_UniGene ... ok
test_Uniprot ...
./test_Uniprot.py:314: ResourceWarning: unclosed file <_io.TextIOWrapper name='SwissProt/multi_ex.list' mode='r' encoding='UTF-8'>
  ids = [x.strip() for x in open("SwissProt/multi_ex.list")]
./test_Uniprot.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='SwissProt/multi_ex.list' mode='r' encoding='UTF-8'>
  ids = [x.strip() for x in open("SwissProt/multi_ex.list")]
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.BufferedReader name='SwissProt/multi_ex.txt'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.BufferedReader name='SwissProt/multi_ex.xml'>
  function()
ok
test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
test_XXmotif_tool ... skipping. Install XXmotif if you want to use XXmotif from Biopython.
test_align ... ok
test_bgzf ... FAIL
test_geo ...
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSE16.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM645.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM691.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM700.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM804.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_affy.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_affy_chp.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_dual.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_family.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_platform.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
ok
test_kNN ... ok
test_lowess ... ok
test_motifs ... ok
test_pairwise2 ... ok
test_phyml_tool ... skipping. Install PhyML 3.0 if you want to use the Bio.Phylo.Applications wrapper.
test_prodoc ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/Doc/prosite.excerpt.doc' mode='r' encoding='UTF-8'>
  function()
ok
test_prosite1 ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00107.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00159.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00165.txt' mode='r' encoding='UTF-8'>
  function()
ok
test_prosite2 ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00432.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00488.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00546.txt' mode='r' encoding='UTF-8'>
  function()
ok
test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
test_py3k ... ok
test_raxml_tool ... skipping. Install RAxML (binary raxmlHPC) if you want to test the Bio.Phylo.Applications wrapper.
test_seq ... ok
test_translate ... ok
test_trie ... skipping. Could not import Bio.trie, check C code was compiled.
Bio.Align docstring test ... ok
Bio.Align.Generic docstring test ... ok
Bio.Align.Applications._Clustalw docstring test ... ok
Bio.Align.Applications._ClustalOmega docstring test ...
ok
Bio.Align.Applications._Mafft docstring test ... ok
Bio.Align.Applications._Muscle docstring test ... ok
Bio.Align.Applications._Probcons docstring test ... ok
Bio.Align.Applications._Prank docstring test ... ok
Bio.Align.Applications._TCoffee docstring test ... ok
Bio.AlignIO docstring test ... ok
Bio.AlignIO.StockholmIO docstring test ... ok
Bio.Alphabet docstring test ... ok
Bio.Application docstring test ... ok
Bio.bgzf docstring test ... FAIL
Bio.Blast.Applications docstring test ...
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:218: BiopythonDeprecationWarning: Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:321: BiopythonDeprecationWarning: Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:400: BiopythonDeprecationWarning: Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
ok
Bio.Emboss.Applications docstring test ... ok
Bio.GenBank docstring test ... ok
Bio.KEGG.Compound docstring test ...
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='KEGG/compound.sample' mode='r' encoding='UTF-8'> test.globs.clear() ok Bio.KEGG.Enzyme docstring test ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='KEGG/enzyme.sample' mode='r' encoding='UTF-8'> test.globs.clear() ok Bio.Motif docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/SRF.pfm' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/meme.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1289: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'> exception = None ok Bio.Motif.Applications._AlignAce docstring test ... ok Bio.Motif.Applications._XXmotif docstring test ... ok Bio.motifs docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/SRF.pfm' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/meme.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1289: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/alignace.out' mode='r' encoding='UTF-8'> exception = None /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/alignace.out' mode='r' encoding='UTF-8'> # Copyright 2003-2009 by Bartek Wilczynski. All rights reserved. ok Bio.motifs.applications._alignace docstring test ... ok Bio.motifs.applications._xxmotif docstring test ... ok Bio.pairwise2 docstring test ... ok Bio.Phylo.Applications._Raxml docstring test ... ok Bio.SearchIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml'> # Copyright 2012 by Wibowo Arindrarto. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml.bgz'> # Copyright 2012 by Wibowo Arindrarto. All rights reserved. 
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml'> test.globs.clear() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/mirna.xml'> # Copyright 2012 by Wibowo Arindrarto. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/mirna.xml'> test.globs.clear() ok Bio.SearchIO._model docstring test ... ok Bio.SearchIO._model.query docstring test ... ok Bio.SearchIO._model.hit docstring test ... ok Bio.SearchIO._model.hsp docstring test ... ok Bio.SearchIO.BlastIO docstring test ... ok Bio.SearchIO.HmmerIO docstring test ... ok Bio.SearchIO.FastaIO docstring test ... ok Bio.SearchIO.BlatIO docstring test ... ok Bio.SearchIO.ExonerateIO docstring test ... ok Bio.SeqIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Fasta/f002'> # Copyright 2006-2010 by Peter Cock. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Fasta/f002'> test.globs.clear() /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq'> # Copyright 2006-2010 by Peter Cock. All rights reserved. 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'> for record in sequences: /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq.bgz'> # Copyright 2006-2010 by Peter Cock. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq'> test.globs.clear() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='GenBank/NC_000932.faa'> test.globs.clear() /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='GenBank/NC_005816.faa'> test.globs.clear() ok Bio.SeqIO.FastaIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/FastaIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/dups.fasta' mode='r' encoding='UTF-8'> # Copyright 2006-2009 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/FastaIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/dups.fasta' mode='r' encoding='UTF-8'> # Copyright 2006-2009 by Peter Cock. All rights reserved. ok Bio.SeqIO.AceIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/AceIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Ace/consed_sample.ace' mode='rU' encoding='UTF-8'> # Copyright 2008-2010 by Peter Cock. All rights reserved. 
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='Ace/contig1.ace' mode='rU' encoding='UTF-8'> test.globs.clear() ok Bio.SeqIO.PhdIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/PhdIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Phd/phd1' mode='r' encoding='UTF-8'> # Copyright 2008-2010 by Peter Cock. All rights reserved. ok Bio.SeqIO.QualityIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'> for record in sequences: /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_faked.fastq' mode='r' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_faked.fastq' mode='r' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_example.fastq' mode='r' encoding='UTF-8'> for record in records: /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='rU' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='rU' encoding='UTF-8'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='rU' encoding='UTF-8'> for record in records: /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='rU' encoding='UTF-8'> for record in records: ok ./run_tests.py:427: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='r' encoding='UTF-8'> gc.collect() Bio.SeqIO.SffIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/SffIO.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'> # Copyright 2009-2010 by Peter Cock. All rights reserved. /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'> test.globs.clear() ok Bio.SeqFeature docstring test ... ok Bio.SeqRecord docstring test ... 
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqRecord.py:2: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='rU' encoding='UTF-8'> # Copyright 2002-2004 Brad Chapman. ok Bio.SeqUtils docstring test ... ok Bio.SeqUtils.MeltingTemp docstring test ... ok Bio.Sequencing.Applications._Novoalign docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... ok Bio.PDB.Polypeptide docstring test ... ok Bio.PDB.Selection docstring test ... ok ====================================================================== ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase) Test Entrez.read from URL ---------------------------------------------------------------------- Traceback (most recent call last): File "./test_Entrez_online.py", line 44, in test_read_from_url rec = Entrez.read(einfo) File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/__init__.py", line 367, in read record = handler.read(handle) File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/Parser.py", line 184, in read self.parser.ParseFile(handle) File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/Parser.py", line 322, in endElementHandler raise RuntimeError(value) RuntimeError: Unable to open connection to #DbInfo?dbaf= ====================================================================== ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests) Index fastq-sanger file Quality/example.fastq.bgz get_raw ---------------------------------------------------------------------- Traceback (most recent call last): File "./test_SeqIO_index.py", line 441, in f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 281, in get_raw_check raw_file = h.read() File 
"/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 171, in key_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 109, in simple_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! 
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 281, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 171, in key_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 109, in simple_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! 
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.get_raw_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 281, in get_raw_check raw_file = h.read() File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.key_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 171, in key_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack(" f = lambda x : x.simple_check(fn, fmt, alpha, c) File "./test_SeqIO_index.py", line 109, in simple_check h = gzip_open(filename, format) File "./test_SeqIO_index.py", line 49, in gzip_open data = handle.read() # bytes! 
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read while self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 593, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header self._read_exact(struct.unpack("", line 1, in assert 80 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 128, in Bio.bgzf Failed example: line = handle.readline() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run compileflags, 1), test.globs) File "", line 1, in line = handle.readline() File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 593, in readline c = self.read(readsize) File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 297, in _read_gzip_header raise 
IOError('Not a gzipped file') OSError: Not a gzipped file ---------------------------------------------------------------------- File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 129, in Bio.bgzf Failed example: assert 143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run compileflags, 1), test.globs) File "", line 1, in assert 143 == handle.tell() AssertionError ---------------------------------------------------------------------- File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 130, in Bio.bgzf Failed example: data = handle.read(70000) Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run compileflags, 1), test.globs) File "", line 1, in data = handle.read(70000) File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read if not self._read(readsize): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 297, in _read_gzip_header raise IOError('Not a gzipped file') OSError: Not a gzipped file ---------------------------------------------------------------------- File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 131, in Bio.bgzf Failed example: assert 70143 == handle.tell() Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run compileflags, 1), test.globs) File "", line 1, in assert 70143 == handle.tell() AssertionError 
---------------------------------------------------------------------- Ran 217 tests in 238.221 seconds FAILED (failures = 4) From p.j.a.cock at googlemail.com Sat Apr 6 18:19:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 6 Apr 2013 19:19:43 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Sat, Apr 6, 2013 at 4:31 PM, Nicolas Joannin wrote: > Hello everyone, > > I'm having a problem installing biopython with Python 3.3.1rc1... > Basically, I get several tests failing (in addition to a lot of warnings). > > I don't think the failed tests will be a problem for my work, however, I > thought you'd want to have a look... Attached is the output of python3 > setup.py test. > > Also, if you think I shouldn't use biopython without having these failed > tests fixed first, please let me know! > > Best regards, > Nicolas Hi Nicolas, You should be OK installing this - all the test failures are within Bio.bgzf which is curious, but you probably won't be using BGZF compressed files. We do have buildslaves testing on Python 3.3.0 where this does not happen, so perhaps this is a new failure from a change in Python 3.3.1rc1 - hopefully I'll be able to confirm that by updating one of the buildslaves. Thanks for the alert, Peter From markbudde at gmail.com Sun Apr 7 00:36:10 2013 From: markbudde at gmail.com (Mark Budde) Date: Sat, 6 Apr 2013 17:36:10 -0700 Subject: [Biopython] Restriction enzymes and sticky ends Message-ID: Hi - I have a question about sticky ends in Biopython. Specifically, is there any way to maintain sticky end information? Having read the restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html), I suspect that the answer is no. It seems that the cut sites are only maintained for the top strand. So I am planning on adding this data in my program (although I will need to read up on classes). However, this requires that I can get the cut site information. 
The only way that I can find to extract this information is from the Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can use this information to determine the cut sites, but I expect that there is a more direct way, since the elucidate() function must be generating this from some attribute. FYI, I am curious about this because I want to simulate GoldenGate cloning in Biopython. Thanks, Mark Budde From nicolas.joannin at gmail.com Sun Apr 7 03:12:54 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Sun, 7 Apr 2013 12:12:54 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: Hi Peter, Thanks for the quick reply! Indeed, I don't think it is a big issue for me, and I have also not had any problems with Python 3.3.0 on another machine. So, yes, it probably is linked to the Python 3.3.1rc1...
However, I should point out that it is not only the Bio.bgzf that fails testing. There are also test_Entrez_online and test_SeqIO_index that are indicated as "FAIL" (both of which I do not directly use). Cheers, Nicolas Nicolas Joannin, Ph.D. Bioinformatics Center Kyoto University, Uji campus, Japan On Sun, Apr 7, 2013 at 3:19 AM, Peter Cock wrote: > On Sat, Apr 6, 2013 at 4:31 PM, Nicolas Joannin > wrote: > > Hello everyone, > > > > I'm having a problem installing biopython with Python 3.3.1rc1... > > Basically, I get several tests failing (in addition to a lot of > warnings). > > > > I don't think the failed tests will be a problem for my work, however, I > > thought you'd want to have a look... Attached is the output of python3 > > setup.py test. > > > > Also, if you think I shouldn't use biopython without having these failed > > tests fixed first, please let me know! > > > > Best regards, > > Nicolas > > Hi Nicolas, > > You should be OK installing this - all the test failures are > within Bio.bgzf which is curious, but you probably won't be > using BGZF compressed files. > > We do have buildslaves testing on Python 3.3.0 where this > does not happen, so perhaps this is a new failure from a > change in Python 3.3.1rc1 - hopefully I'll be able to confirm > that by updating one of the buildslaves. > > Thanks for the alert, > > Peter > From p.j.a.cock at googlemail.com Sun Apr 7 14:41:33 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 7 Apr 2013 15:41:33 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Sun, Apr 7, 2013 at 4:12 AM, Nicolas Joannin wrote: > Hi Peter, > > Thanks for the quick reply! > Indeed, I don't think it is a big issue for me, and I have also not had any > problems with Python 3.3.0 on another machine. > So, yes, it probably is linked to the Python 3.3.1rc1... 
I see that Python 3.3.1 final is out now - might be worth checking that too, and I'll try to update one of our buildslaves to use this. > However, I should point out that it is not only the Bio.bgzf that fails > testing. > There are also test_Entrez_online and test_SeqIO_index that are indicated as > "FAIL" (both of which I do not directly use). The test_SeqIO_index.py failures all looked to be BGZF related too. I missed the Entrez test, but as an online test it can sometimes fail intermittently anyway. Chances are it will pass on a rerun. Peter From bjorn_johansson at bio.uminho.pt Sun Apr 7 18:05:11 2013 From: bjorn_johansson at bio.uminho.pt (Björn Johansson) Date: Sun, 7 Apr 2013 19:05:11 +0100 Subject: [Biopython] sticky ends in Biopython Message-ID: > > Message: 2 > Date: Sat, 6 Apr 2013 17:36:10 -0700 > From: Mark Budde > Subject: [Biopython] Restriction enzymes and sticky ends > To: biopython > Message-ID: > < > CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi - I have a question about sticky ends in Biopython. Specifically, is > there any way to maintain sticky end information? Having read the > restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html > ), > I suspect that the answer is no. It seems that the cut sites are only > maintained for the top strand. So I am planning on adding this data in my > program (although I will need to read up on classes). > > However, this requires that I can get the cut site information. The only > way that I can find to extract this information is from the > Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can > use this information to determine the cut sites, but I expect that there is > a more direct way, since the elucidate() function must be generating this > from some attribute. > > FYI, I am curious about this because I want to simulate GoldenGate cloning > in Biopython.
> > Thanks, > Mark Budde > > > ------------------------------ > Hi Mark, Check out Python-dna, which has classes for dealing with double-stranded DNA. This package depends on Biopython and a couple of additional modules. Disclaimer: I am the developer of Python-dna Python-dna at pypi https://pypi.python.org/pypi/python-dna/ Source code https://code.google.com/p/pydna/ Documentation http://python-dna.readthedocs.org/ Discussion group https://groups.google.com/forum/?fromgroups#!forum/python-dna / Bjorn Johansson -- ______O_________oO________oO______o_______oO__ Björn Johansson Assistant Professor Department of Biology University of Minho Campus de Gualtar 4710-057 Braga PORTUGAL www.bio.uminho.pt Google profile Google Scholar Profile my group Office (direct) +351-253 601517 | (PT) mob. +351-967 147 704 | (SWE) mob. +46 739 792 968 Dept of Biology (secr) +351-253 60 4310 | fax +351-253 678980 From markbudde at gmail.com Sun Apr 7 18:48:16 2013 From: markbudde at gmail.com (Mark Budde) Date: Sun, 7 Apr 2013 11:48:16 -0700 Subject: [Biopython] sticky ends in Biopython In-Reply-To: References: Message-ID: OK, that looks useful. Thanks. -Mark On Sun, Apr 7, 2013 at 11:05 AM, Björn Johansson < bjorn_johansson at bio.uminho.pt> wrote: > > > > Message: 2 > > Date: Sat, 6 Apr 2013 17:36:10 -0700 > > From: Mark Budde > > Subject: [Biopython] Restriction enzymes and sticky ends > > To: biopython > > Message-ID: > > < > > CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w at mail.gmail.com> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > Hi - I have a question about sticky ends in Biopython. Specifically, is > > there any way to maintain sticky end information? Having read the > > restriction doc ( > http://biopython.org/DIST/docs/cookbook/Restriction.html > > ), > > I suspect that the answer is no. It seems that the cut sites are only > > maintained for the top strand.
So I am planning on adding this data in my > > program (although I will need to read up on classes). > > > > However, this requires that I can get the cut site information. The only > > way that I can find to extract this information is from the > > Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I > can > > use this information to determine the cut sites, but I expect that there > is > > a more direct way, since the elucidate() function must be generating this > > from some attribute. > > > > FYI, I am curious about this because I want to simulate GoldenGate > cloning > > in Biopython. > > > > Thanks, > > Mark Budde > > > > > > ------------------------------ > > > > Hi Mark, > > Check out Python-dna, which has classes for dealing with > double-stranded DNA. This package depends on Biopython and a couple of > additional modules. > > Disclaimer: I am the developer of Python-dna > > Python-dna at pypi https://pypi.python.org/pypi/python-dna/ > Source code https://code.google.com/p/pydna/ > Documentation http://python-dna.readthedocs.org/ > Discussion group > https://groups.google.com/forum/?fromgroups#!forum/python-dna > > / Bjorn Johansson > > > > -- > ______O_________oO________oO______o_______oO__ > Björn Johansson > Assistant Professor > Department of Biology > University of Minho > Campus de Gualtar > 4710-057 Braga > PORTUGAL > www.bio.uminho.pt > Google profile > Google Scholar Profile< > http://scholar.google.com/citations?user=7AiEuJ4AAAAJ> > my group > Office (direct) +351-253 601517 | (PT) mob. +351-967 147 704 | (SWE) mob.
> +46 739 792 968 > Dept of Biology (secr) +351-253 60 4310 | fax +351-253 678980 > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Sun Apr 7 19:52:13 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 7 Apr 2013 20:52:13 +0100 Subject: [Biopython] Restriction enzymes and sticky ends In-Reply-To: References: Message-ID: On Sun, Apr 7, 2013 at 1:36 AM, Mark Budde wrote: > Hi - I have a question about sticky ends in Biopython. Specifically, is > there any way to maintain sticky end information? Having read the > restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html), > I suspect that the answer is no. It seems that the cut sites are only > maintained for the top strand. So I am planning on adding this data in my > program (although I will need to read up on classes). > > However, this requires that I can get the cut site information. The only > way that I can find to extract this information is from the > Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can > use this information to determine the cut sites, but I expect that there is > a more direct way, since the elucidate() function must be generating this > from some attribute. > > FYI, I am curious about this because I want to simulate GoldenGate cloning > in Biopython. > > Thanks, > Mark Budde Hi Mark, Good question. Sadly help(EcoRI) doesn't tell you very much, does it? The whole Restriction module could benefit from a new maintainer and/or a rewrite (for one thing, it unfortunately did not follow Python counting in some aspects). Two tips: first dir(object) gives a list of the attributes and methods of an object in Python. 
Second, you can look at the source of the elucidate method to see where it gets the information you're looking for ;) [A last resort perhaps - but when documentation has let you down, it is worth knowing how to explore.]

https://github.com/biopython/biopython/blob/master/Bio/Restriction/Restriction.py

Here EcoRI is a 5' overhanging digest enzyme, and the values you need are EcoRI.fst5 (here 1) and EcoRI.fst3 (here -1), which are relative to the recognition site (here GAATTC). For example, the overhang type methods and cut site attributes include:

>>> from Bio.Restriction import EcoRI
>>> EcoRI.overhang()
"5' overhang"
>>> EcoRI.is_blunt()
False
>>> EcoRI.is_5overhang()
True
>>> EcoRI.is_3overhang()
False
>>> EcoRI.elucidate()
'G^AATT_C'
>>> EcoRI.fst5
1
>>> EcoRI.fst3
-1
>>> EcoRI.site
'GAATTC'

Notice that 'GAATTC'[:1] is 'G', 'GAATTC'[1:-1] is 'AATT' and 'GAATTC'[-1:] is 'C', which together give the elucidated string.

Is that all you needed?

Regards

Peter

From p.j.a.cock at googlemail.com Mon Apr 8 09:32:00 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 10:32:00 +0100
Subject: [Biopython] Restriction enzymes and sticky ends
In-Reply-To: References: Message-ID:

On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde wrote:
> Thanks for doing some digging on my behalf, Peter. After I posted my email
> last night, I started looking through the Bio.Restriction code myself. Your
> response is helpful, I was having trouble seeing how the cut site was
> encoded for each strand. I think Bjorn's python-dna might be a better
> starting place for me than Bio.Restriction, as it already has some of the
> functionality I was looking for.

Fair enough.

> However, to your question, I'm still not quite getting the cut sites. Your
> example with EcoRI makes complete sense, but I can't figure out the pattern
> for some other enzymes, such as BsaI, which is why I got confused initially.
> If you repeat that protocol for BsaI, the results don't match up.
>
> In [80]: BsaI.elucidate()
> Out[80]: 'GGTCTCN^NNNN_N'
>
> In [81]: BsaI.fst5
> Out[81]: 7
>
> In [82]: BsaI.fst3
> Out[82]: 5
>
> In [83]: BsaI.site
> Out[83]: 'GGTCTC'
>
> Based on this, I would expect that BsaI.fst3 should yield
> "11" but it yields 5.

I think you are counting from the wrong reference point. Using Python-style indexing would only allow cleavage points within the recognition site to be described.

BsaI is a weird enzyme, and appears to be handled by the Ambiguous class in Bio/Restriction/Restriction.py - which says this is for enzymes for which the overhang is variable.

>>> from Bio.Restriction import BsaI
>>> BsaI.is_ambiguous()
True
>>> BsaI.is_defined() # is there a consistent site?
False
>>> BsaI.is_unknown()
False
>>> BsaI.fst5
7
>>> BsaI.fst3
5
>>> BsaI.elucidate()
'GGTCTCN^NNNN_N'

This subclass has a more complicated elucidate method, but gives the same string as the REBASE website, so this is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html

The 5' cut site of 7 clearly means this is downstream of the 6 bp recognition site. This appears to be counted from the start (left) of the recognition site.

From the illustration, the 3' cut site is also to the right of the 6 bp recognition site. It appears the number is counted from the end (right) of the recognition site, where positive as in BsaI means to the right (after the recognition site) while negative as in EcoRI means to the left (within the recognition site).

Peter

P.S. Please remember to CC the mailing list, e.g. reply all. Unless people say explicitly that they have done this deliberately, I generally assume taking a public discussion off list is accidental.
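The counting convention described above can be sketched in plain Python. This is only an illustration under stated assumptions: it takes fst5 as counted from the start of the recognition site and fst3 as counted from its end, as inferred from the EcoRI and BsaI values in this thread rather than from the Bio.Restriction documentation. The helper names cut_offsets, describe_cut and mark_cuts are hypothetical and are not part of Biopython.

```python
def cut_offsets(site, fst5, fst3):
    """Return (top, bottom) cut offsets measured from the start of the site.

    Assumed convention (inferred from the thread, not from Biopython docs):
    fst5 counts from the start of the site, fst3 counts from its end.
    """
    top = fst5
    bottom = len(site) + fst3
    return top, bottom


def describe_cut(site, fst5, fst3):
    """Classify the overhang and return (kind, overhang_length)."""
    top, bottom = cut_offsets(site, fst5, fst3)
    if top == bottom:
        kind = "blunt"
    elif top < bottom:
        kind = "5' overhang"
    else:
        kind = "3' overhang"
    return kind, abs(bottom - top)


def mark_cuts(site, fst5, fst3):
    """Insert '^' (top-strand cut) and '_' (bottom-strand cut) into the site.

    The site is padded with N when a cut falls beyond its end. Cuts that
    fall before the start of the site are not handled by this sketch.
    """
    top, bottom = cut_offsets(site, fst5, fst3)
    if min(top, bottom) < 0:
        raise ValueError("cuts upstream of the site are not supported here")
    padded = site + "N" * (max(top, bottom) - len(site))
    # Insert the rightmost marker first so the other offset stays valid.
    for pos, mark in sorted([(top, "^"), (bottom, "_")], reverse=True):
        padded = padded[:pos] + mark + padded[pos:]
    return padded


# Values quoted in the thread: EcoRI (fst5=1, fst3=-1), BsaI (fst5=7, fst3=5).
print(mark_cuts("GAATTC", 1, -1))    # G^AATT_C
print(cut_offsets("GGTCTC", 7, 5))   # (7, 11)
print(describe_cut("GGTCTC", 7, 5))  # ("5' overhang", 4)
```

Under this reading, the bottom-strand cut for BsaI lands at offset 11 from the start of the site, which is the "11" Mark expected; the stored value of 5 is the same position counted from the end of the 6 bp site. For BsaI, mark_cuts returns 'GGTCTCN^NNNN_', without the trailing N that the REBASE-style string carries.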
From nicolas.joannin at gmail.com Mon Apr 8 13:21:45 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Mon, 8 Apr 2013 22:21:45 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: Hi Peter, I need to update another machine, so I'll do that with the final version to see if the problem still exists. Will post back when that's done. Regarding the Entrez test, indeed, it doesn't fail every time. So no worries there. Cheers, Nicolas Nicolas Joannin, Ph.D. Bioinformatics Center Kyoto University, Uji campus, Japan On Sun, Apr 7, 2013 at 11:41 PM, Peter Cock wrote: > On Sun, Apr 7, 2013 at 4:12 AM, Nicolas Joannin > wrote: > > Hi Peter, > > > > Thanks for the quick reply! > > Indeed, I don't think it is a big issue for me, and I have also not had > any > > problems with Python 3.3.0 on another machine. > > So, yes, it probably is linked to the Python 3.3.1rc1... > > I see that Python 3.3.1 final is out now - might be worth checking > that too, and I'll try to update one of our buildslaves to use this. > > > However, I should point out that it is not only the Bio.bgzf that fails > > testing. > > There are also test_Entrez_online and test_SeqIO_index that are > indicated as > > "FAIL" (both of which I do not directly use). > > The test_SeqIO_index.py failures all looked to be BGZF related too. > > I missed the Entrez test, but as an online test that can sometimes > fail intermittently anyway. The chances are on rerunning it'll be fine. > > Peter > From p.j.a.cock at googlemail.com Mon Apr 8 14:05:49 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Apr 2013 15:05:49 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Mon, Apr 8, 2013 at 2:21 PM, Nicolas Joannin wrote: > Hi Peter, > > I need to update another machine, so I'll do that with the final version to > see if the problem still exists. 
Will post back when that's done. > Regarding the Entrez test, indeed, it doesn't fail every time. So no worries > there. > > Cheers, > Nicolas I've just installed Python 3.3.1 (final) from source on a 64 bit Linux machine, and can confirm test failures from the BGZF code (not failing under Python 3.3.0). I was hoping this would be a glitch in the release candidate but sadly not. Thank you again for bringing this to our attention. Peter From nicolas.joannin at gmail.com Mon Apr 8 14:10:07 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Mon, 8 Apr 2013 23:10:07 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: OK, I guess that'll be the same whichever platform... I guess I'll stick with 3.3.0 for the other machine then. Thanks for the update! Nicolas Nicolas Joannin, Ph.D. Bioinformatics Center Kyoto University, Uji campus, Japan On Mon, Apr 8, 2013 at 11:05 PM, Peter Cock wrote: > On Mon, Apr 8, 2013 at 2:21 PM, Nicolas Joannin > wrote: > > Hi Peter, > > > > I need to update another machine, so I'll do that with the final version > to > > see if the problem still exists. Will post back when that's done. > > Regarding the Entrez test, indeed, it doesn't fail every time. So no > worries > > there. > > > > Cheers, > > Nicolas > > I've just installed Python 3.3.1 (final) from source on a 64 bit Linux > machine, and can confirm test failures from the BGZF code (not > failing under Python 3.3.0). I was hoping this would be a glitch in > the release candidate but sadly not. > > Thank you again for bringing this to our attention. 
> > Peter > From p.j.a.cock at googlemail.com Mon Apr 8 15:23:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Apr 2013 16:23:25 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Mon, Apr 8, 2013 at 3:10 PM, Nicolas Joannin wrote: > OK, I guess that'll be the same whichever platform... > I guess I'll stick with 3.3.0 for the other machine then. > Thanks for the update! > > Nicolas More bad news - whatever was changed, I think something similar was done in Python 2.7.4 as well, which also has new test failures not seen under Python 2.7.3. Sigh. Peter From markbudde at gmail.com Mon Apr 8 17:25:24 2013 From: markbudde at gmail.com (Mark Budde) Date: Mon, 8 Apr 2013 10:25:24 -0700 Subject: [Biopython] Restriction enzymes and sticky ends In-Reply-To: References: Message-ID: Thanks Peter, that explains it. BsaI is indeed a weird enzyme, a Type IIS restriction enzyme. These enzymes cut a defined distance outside of their recognition sequence. The utility of these enzymes is that by tagging the cut sites on the end of your primers, you can generate whatever sticky ends you desire. Furthermore, because it cuts outside of its recognition sequence, you can incubate a number of these fragments together with both restriction enzyme and ligase, and the fragments will assemble into the final product without subcloning. This is because sticky ends are generated without the corresponding recognition site, so their ligation is irreversible. This is called Golden Gate cloning. -Mark On Mon, Apr 8, 2013 at 2:32 AM, Peter Cock wrote: > On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde wrote: > > Thanks for doing some digging on my behalf, Peter. After I posted my > email > > last night, I started looking through the Bio.Restriction code myself. > You > > response is helpful, I was having trouble seeing how the cut site was > > encoded for each strand.
I think Bjorn's python-dna might be a better > > starting place for me than Bio.Restriction, as it already has some of the > > functionality I was looking for. > > Fair enough. > > > However, to you question, I'm still not quite getting the cut sites. You > > example with EcoRI makes complete sense, but I can't figure out the > pattern > > for some other enzymes, such as BsaI, which is why I got confused > initially. > > If you repeat that protocol for BsaI, the results don't match up. > > > > In [80]: BsaI.elucidate() > > Out[80]: 'GGTCTCN^NNNN_N' > > > > In [81]: BsaI.fst5 > > Out[81]: 7 > > > > In [82]: BsaI.fst3 > > Out[82]: 5 > > > > In [83]: BsaI.site > > Out[83]: 'GGTCTC' > > > > Based on this, I would expect that BsaI.fst3 should yield > > "11" but it yields 5. > > I think you are counting from the wrong reference point. > Using Python style indexing would only allow cleavage > points within the recognition site to be described. > > BsaI is a weird enzyme, and appears to be handled by the > Ambiguous class in Bio/Restriction/Restriction.py - which > says this is for enzymes for which the overhang is variable. > > >>> from Bio.Restriction import Bsal > >>> BsaI.is_ambiguous() > True > >>> BsaI.is_defined() # is there a consistent site? > False > >>> BsaI.is_unknown() > False > >>> BsaI.fst5 > 7 > >>> BsaI.fst3 > 5 > >>> BsaI.elucidate() > 'GGTCTCN^NNNN_N' > > This subclass has a more complicated elucidate method, > but gives the same string as the REBASE website, so this > is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html > > The 5' cut site of 7 clearly means this is downstream of > the 6 bp recognition site. This appears to be counted > from the start (left) of the restriction site. > > From the illustration the 3' cut side is also to the right of > the 5bp recognition site. 
It appears the number is counted > from the end (right) of the recognition site, where positive > as in BsaI means to the right (after the recognition site) > while negative as in EcoRI means to the left (within the > recognition site). > > Peter > > P.S. Please remember to CC the mailing list, e.g. reply all. > Unless people say explicitly that they have done this deliberately, > I generally assume taking a public discussion off list is accidental. > From p.j.a.cock at googlemail.com Mon Apr 8 17:55:47 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Apr 2013 18:55:47 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Mon, Apr 8, 2013 at 4:23 PM, Peter Cock wrote: > On Mon, Apr 8, 2013 at 3:10 PM, Nicolas Joannin > wrote: >> OK, I guess that'll be the same whichever platform... >> I guess I'll stick with 3.3.0 for the other machine then. >> Thanks for the update! >> >> Nicolas > > More bad news - what ever was changes I think something > similar was done in Python 2.7.4 as well, which also has > new test failures not seen under Python 2.7.3. Sigh. > > Peter Solved - this is bug in Python 2.7.4 and 3.3.1 (which had a lot of gzip work done fixing other issues), but on the bright side the fix is quite trivial to apply manually: http://bugs.python.org/issue17666 Peter From p.j.a.cock at googlemail.com Tue Apr 9 09:39:12 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 9 Apr 2013 10:39:12 +0100 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: On Mon, Apr 8, 2013 at 6:55 PM, Peter Cock wrote: > > Solved - this is bug in Python 2.7.4 and 3.3.1 (which had a > lot of gzip work done fixing other issues), but on the bright > side the fix is quite trivial to apply manually: > http://bugs.python.org/issue17666 > > Peter Just a heads up, this also affects Python 3.2.4 as well. 
Peter From p.j.a.cock at googlemail.com Tue Apr 9 10:20:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 9 Apr 2013 11:20:43 +0100 Subject: [Biopython] OBF not accepted for GSoC 2013 Message-ID: Dear all, Unfortunately this year we have not been accepted on the Google Summer of Code scheme: I'm sure the rest of the OBF board and the other Bio* developers will join me in thanking Pjotr Prins for his efforts as the OBF GSoC administrator co-ordinating our application this year, as well as last year's administrator Rob Bruels and the other mentors for their efforts. For those of you not subscribed to the OBF's GSoC mailing list, I am forwarding Pjotr's email from last night (also below): http://lists.open-bio.org/pipermail/gsoc/2013/000211.html In all 177 organisations were accepted (about the same as the last few years), and they will be listed here (once they have filled out their profile information): https://google-melange.appspot.com/gsoc/accepted_orgs/google/gsoc2013 To potential students this summer, the good news is that some related organisations have been accepted, such as NESCent, the National Resource for Network Biology (NRNB - known for Cytoscape), SciRuby (Ruby Science Foundation), so there is still some scope for doing a bioinformatics related project in GSoC 2013, perhaps even with a Bio* developer as a co-mentor. Thank you all, Peter (Biopython developer, OBF board member) ---------- Forwarded message ---------- From: Pjotr Prins Date: Mon, Apr 8, 2013 at 9:13 PM Subject: Re: GSoC 2013 is ON To: Pjotr Prins Cc: ..., OBF GSoC Sadly, our application got rejected by GSoC this year. I am not sure what the reason was, but I am convinced our application was similar to that of other years. Maybe the project ideas could have been better presented. I am not sure at this stage. I'll make a list of successful projects to see if we can digest some truths. The upside is that FOSS is going strong! 
And that the field is getting increasingly competitive. As an open source geezer I can only be happy, even if it hurts our own application. Sorry everyone, and many thanks for the trouble you took getting projects written up. Let's not feel discouraged for next year. Pj. From nicolas.joannin at gmail.com Tue Apr 9 13:47:03 2013 From: nicolas.joannin at gmail.com (Nicolas Joannin) Date: Tue, 9 Apr 2013 22:47:03 +0900 Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1 In-Reply-To: References: Message-ID: Thanks for the fix! Cheers, Nicolas Nicolas Joannin, Ph.D. Bioinformatics Center Kyoto University, Uji campus, Japan On Tue, Apr 9, 2013 at 6:39 PM, Peter Cock wrote: > On Mon, Apr 8, 2013 at 6:55 PM, Peter Cock > wrote: > > > > Solved - this is bug in Python 2.7.4 and 3.3.1 (which had a > > lot of gzip work done fixing other issues), but on the bright > > side the fix is quite trivial to apply manually: > > http://bugs.python.org/issue17666 > > > > Peter > > Just a heads up, this also affects Python 3.2.4 as well. > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From matthiasschade.de at googlemail.com Thu Apr 11 09:20:31 2013 From: matthiasschade.de at googlemail.com (Matthias Schade) Date: Thu, 11 Apr 2013 11:20:31 +0200 Subject: [Biopython] query upper limit for NCBIWWW.qblast? Message-ID: <5166805F.8060603@googlemail.com> Hello everyone, is there an upper limit to how many sequences I can query via NCBIWWW.qblast at once? Sending up to 150 sequences each of 24mer length in a single string everything works fine. But now, I have tried the same for a string containing about 900 sequences. On good times, it takes the NCBI-server about 5min to send an answer. I save the answer and later open and parse the file by other functions in my code. 
However, even though I have queried the same 900 sequences, the resulting output-file varies in length (10 MB "<\BlastOutput>" or even misses more (this does not happen when querying 150 sequences or fewer). I would guess once the server has started sending its answers, there might only be a limited time NCBIWWW.qblast waits for follow up packets ... and thus depending on the current server-load, the NCBIWWW.qblast-function simply decides to terminate waiting for incoming data after some time, so that my blast-output-files vary in length. Could anyone correct or verify this far-fetched hypothesis? My core-lines are: orgn='Mus Musculus' # or anything else result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100, entrez_query=str(orgn+"[orgn]")) save_file = open ('myblast_result.xml',"w") save_file.write(result.read()) Best regards, Matthias From p.j.a.cock at googlemail.com Thu Apr 11 09:43:44 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Apr 2013 10:43:44 +0100 Subject: [Biopython] query upper limit for NCBIWWW.qblast? In-Reply-To: <5166805F.8060603@googlemail.com> References: <5166805F.8060603@googlemail.com> Message-ID: On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade wrote: > Hello everyone, > > is there an upper limit to how many sequences I can query via NCBIWWW.qblast > at once? There are sometimes limits on the URL length, especially if going via firewalls and proxies, so that may be one factor. At the NCBI end, I'm not sure what limits they impose on this: http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html > Sending up to 150 sequences each of 24mer length in a single string > everything works fine. But now, I have tried the same for a string > containing about 900 sequences. On good times, it takes the NCBI-server > about 5min to send an answer. I save the answer and later open and parse the > file by other functions in my code.
However, even though I have queried the > same 900 sequences, the resulting output-file varies in length (10 > MB "<\BlastOutput>" or even misses more (this does not happen why querying 150 > sequences or less). > > I would guess once the server has started sending its answers, there might > only be a limited time NCBIWWW.qblast waits for follow up packets ... and > thus depending on the current server-load, the NCBIWWW.qblast-function > simply decides to terminate waiting for incomming data after some time, > resulting in my blast-output-files to vary in length. Could anyone correct > or verify this long-fetched hypothesis? > > My core-lines are: > > orgn='Mus Musculus' #on anything else > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100, > entrez_query=str(orgn+"[orgn]")) > save_file = open ('myblast_result.xml',"w") > save_file.write(result.read()) > > Best regards, > Matthias I think you've reach the scale where it would be better to run blastn locally - ideally on a cluster if you have access to one. You can download the whole NT database from here - most departments running BLAST with their own Linux servers will have a central copy which is kept automatically up to date: ftp://ftp.ncbi.nlm.nih.gov/blast/db/ If you don't have those kinds of resources, then you can even run BLAST on your own Windows machine - although I'm not sure how much RAM would be recommended for the NT database which is pretty big. Regards, Peter From ericmajinglong at gmail.com Thu Apr 11 16:49:27 2013 From: ericmajinglong at gmail.com (Eric Ma) Date: Thu, 11 Apr 2013 12:49:27 -0400 Subject: [Biopython] Request from help Message-ID: Hello everybody, I'm new to the mailing list here, though I've been playing with BioPython for quite a while. I'm having some trouble here. I wanted to display a tree of sequences for which I had done a multiple sequence alignment. I tried going through the pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline). 
Because I'm still in the testing phase, instead of writing it as a single script, I wrote it as a series of scripts that I would execute in order. The problem I run into is at step 4 in the example, where I "feed the alignment to PhyML". My data set is 70 protein sequences, and the trouble I run into is that it takes a very, very long time at the "feeding alignment to PhyML" step. I tried running the script on my MacBook Pro overnight, and even the next morning it was not done. Am I missing something here? Just to be clear here, aligning the sequences using Muscle was successful, and I also managed to output a distance matrix from sample to sample, which I used in another downstream pipeline to display the clustering of the sequences on a 2D euclidean plane. However, I wanted to have a tree representation to validate the clustering results; the trouble is, I can't get the _phyml_tree.txt file to be created, which I would then use to draw the tree. Thanks in advance for any help! Cheers, Eric ----------------------------------------------------------------------- Please consider the environment before printing this e-mail. Do you really need to print it? http://about.me/ericmjl From jgibbons1 at mail.usf.edu Thu Apr 11 17:01:19 2013 From: jgibbons1 at mail.usf.edu (Justin Gibbons) Date: Thu, 11 Apr 2013 13:01:19 -0400 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: NCBI Standalone Blast gives you the option of querying the website so that you don't have to maintain a local database. Justin Gibbons On Thu, Apr 11, 2013 at 12:49 PM, Eric Ma wrote: > Hello everybody, > > I'm new to the mailing list here, though I've been playing with BioPython > for quite a while. > > I'm having some trouble here. I wanted to display a tree of sequences for > which I had done a multiple sequence alignment. I tried going through the > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline). 
> Because I'm still in the testing phase, instead of writing it as a single > script, I wrote it as a series of scripts that I would execute in order. > > The problem I run into is at step 4 in the example, where I "feed the > alignment to PhyML". My data set is 70 protein sequences, and the trouble I > run into is that it takes a very, very long time at the "feeding alignment > to PhyML" step. I tried running the script on my MacBook Pro overnight, and > even the next morning it was not done. Am I missing something here? > > Just to be clear here, aligning the sequences using Muscle was successful, > and I also managed to output a distance matrix from sample to sample, which > I used in another downstream pipeline to display the clustering of the > sequences on a 2D euclidean plane. However, I wanted to have a tree > representation to validate the clustering results; the trouble is, I can't > get the _phyml_tree.txt file to be created, which I would then use to draw > the tree. > > Thanks in advance for any help! > > Cheers, > Eric > ----------------------------------------------------------------------- > Please consider the environment before printing this e-mail. Do you really > need to print it? > > http://about.me/ericmjl > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Apr 11 17:07:05 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Apr 2013 18:07:05 +0100 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: On Thu, Apr 11, 2013 at 6:01 PM, Justin Gibbons wrote: > NCBI Standalone Blast gives you the option of querying the website so that > you don't have to maintain a local database. > > Justin Gibbons Did you reply to the wrong email? This thread was about alignments and trees. 
Peter From p.j.a.cock at googlemail.com Thu Apr 11 17:11:49 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Apr 2013 18:11:49 +0100 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma wrote: > Hello everybody, > > I'm new to the mailing list here, though I've been playing with BioPython > for quite a while. > > I'm having some trouble here. I wanted to display a tree of sequences for > which I had done a multiple sequence alignment. I tried going through the > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline). > Because I'm still in the testing phase, instead of writing it as a single > script, I wrote it as a series of scripts that I would execute in order. > > The problem I run into is at step 4 in the example, where I "feed the > alignment to PhyML". My data set is 70 protein sequences, and the trouble I > run into is that it takes a very, very long time at the "feeding alignment > to PhyML" step. I tried running the script on my MacBook Pro overnight, and > even the next morning it was not done. Am I missing something here? > > Just to be clear here, aligning the sequences using Muscle was successful, > and I also managed to output a distance matrix from sample to sample, which > I used in another downstream pipeline to display the clustering of the > sequences on a 2D euclidean plane. However, I wanted to have a tree > representation to validate the clustering results; the trouble is, I can't > get the _phyml_tree.txt file to be created, which I would then use to draw > the tree. > > Thanks in advance for any help! 
> > Cheers, > Eric Hi Eric, So this part is getting stuck (or taking a very long time): #Feed the alignment to PhyML using the command line wrapper: from Bio.Phylo.Applications import PhymlCommandline cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa', model='WAG', alpha='e', bootstrap=100) out_log, err_log = cmdline() At that point is the computer active (high CPU load as measured via the task manager / system monitor / top / etc)? I would suggest trying PhyML at the command line by hand. First check the command that Biopython should be running: print cmdline That may give you visual progress on screen. My guess is simply that this is just slow - you are only running 100 bootstraps, but perhaps each one is taking a while and that adds up. You said you had 70 protein sequences - how many columns are there in the alignment? That can also affect run times. Peter From nuin at genedrift.org Thu Apr 11 17:05:57 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Thu, 11 Apr 2013 13:05:57 -0400 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: On 2013-04-11, at 12:49 PM, Eric Ma wrote: > Hello everybody, > > I'm new to the mailing list here, though I've been playing with BioPython > for quite a while. > > I'm having some trouble here. I wanted to display a tree of sequences for > which I had done a multiple sequence alignment. I tried going through the > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline). > Because I'm still in the testing phase, instead of writing it as a single > script, I wrote it as a series of scripts that I would execute in order. > > The problem I run into is at step 4 in the example, where I "feed the > alignment to PhyML". My data set is 70 protein sequences, and the trouble I > run into is that it takes a very, very long time at the "feeding alignment > to PhyML" step. I tried running the script on my MacBook Pro overnight, and > even the next morning it was not done.
Am I missing something here? > Hi With 70 OTUs you have about 5.00E115 possible unrooted trees. Guaranteed it will take a long time, regardless of which parameters you are using in PhyML. Try with a smaller number of taxa, just for testing purposes, and depending on the complexity of your protein phylogeny, give your computer some weeks to actually generate some result. This is not a BioPython issue, it is more of a phylogenetics one. Cheers Paulo > Just to be clear here, aligning the sequences using Muscle was successful, > and I also managed to output a distance matrix from sample to sample, which > I used in another downstream pipeline to display the clustering of the > sequences on a 2D euclidean plane. However, I wanted to have a tree > representation to validate the clustering results; the trouble is, I can't > get the _phyml_tree.txt file to be created, which I would then use to draw > the tree. > > Thanks in advance for any help! > > Cheers, > Eric > ----------------------------------------------------------------------- > Please consider the environment before printing this e-mail. Do you really > need to print it? > > http://about.me/ericmjl > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From ericmajinglong at gmail.com Thu Apr 11 17:20:14 2013 From: ericmajinglong at gmail.com (Eric Ma) Date: Thu, 11 Apr 2013 13:20:14 -0400 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: Hi Peter and Paulo, Thank you for your feedback, much appreciated! I still have very sparse knowledge about phylogenies, and especially the run times needed to build the trees, so any new knowledge is appreciated! The sequences I'm using are full Influenza A HA protein sequences, so we're talking about 1700-1750 amino acids being aligned together. The multiple sequence alignment for 70 sequences doesn't take long - on the order of minutes on my laptop.
It's the "feeding into PhyML" portion that, for some reason, takes a long time. With that said, I do have a full distance matrix as one of the outputs from a previous script in this script series, in addition to the multiple sequence alignment. I have been able to feed the distance matrix into a separate clustering algorithm from scikit-learn, and I was able to successfully identify six clusters of sequences in there. Hence, I wanted to use a phylogenetic tree to confirm what I'm seeing with the clustering algorithm - it's basically two separate representations of the same data. I have heard that it is possible to create a tree from the distance matrix, and I was thinking this might be an alternative to feeding the alignment into PhyML. Does anybody know how to do this using BioPython? Cheers, Eric ----------------------------------------------------------------------- Please consider the environment before printing this e-mail. Do you really need to print it? http://about.me/ericmjl On Thu, Apr 11, 2013 at 1:11 PM, Peter Cock wrote: > On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma wrote: > > Hello everybody, > > > > I'm new to the mailing list here, though I've been playing with BioPython > > for quite a while. > > > > I'm having some trouble here. I wanted to display a tree of sequences for > > which I had done a multiple sequence alignment. I tried going through the > > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline > ). > > Because I'm still in the testing phase, instead of writing it as a single > > script, I wrote it as a series of scripts that I would execute in order. > > > > The problem I run into is at step 4 in the example, where I "feed the > > alignment to PhyML". My data set is 70 protein sequences, and the > trouble I > > run into is that it takes a very, very long time at the "feeding > alignment > > to PhyML" step. I tried running the script on my MacBook Pro overnight, > and > > even the next morning it was not done. 
Am I missing something here? > > > > Just to be clear here, aligning the sequences using Muscle was > successful, > > and I also managed to output a distance matrix from sample to sample, > which > > I used in another downstream pipeline to display the clustering of the > > sequences on a 2D euclidean plane. However, I wanted to have a tree > > representation to validate the clustering results; the trouble is, I > can't > > get the _phyml_tree.txt file to be created, which I would then use to > draw > > the tree. > > > > Thanks in advance for any help! > > > > Cheers, > > Eric > > Hi Eric, > > So this part is getting stuck (or taking a very long time): > > #Feed the alignment to PhyML using the command line wrapper: > from Bio.Phylo.Applications import PhymlCommandline > cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa', > model='WAG', alpha='e', bootstrap=100) > out_log, err_log = cmdline() > > At that point is the computer active (high CPU load as measured > via the task manager / system monitor / top / etc)? > > I would suggest trying PHYML at the command line by hand, first > check the command the Biopython should be running: > > print cmdline > > That may give you visual progress on screen. My guess is simply > that this is just slow - you are only running 100 bootstraps, but > perhaps each one is taking a while and that adds up. > > You said you had 70 protein sequences - how many columns > are there in the alignment? That can also affect run times. > > Peter > From nuin at genedrift.org Thu Apr 11 17:33:05 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Thu, 11 Apr 2013 13:33:05 -0400 Subject: [Biopython] Request from help In-Reply-To: References: Message-ID: <8176FA21-39F6-405A-B338-94D87E6BB7B3@genedrift.org> On 2013-04-11, at 1:20 PM, Eric Ma wrote: > Hi Peter and Paulo, > > Thank you for your feedback, much appreciated! 
I still have very sparse > knowledge about phylogenies, and especially the run times needed to build > the trees, so any new knowledge is appreciated! > > The sequences I'm using are full Influenza A HA protein sequences, so we're > talking about 1700-1750 amino acids being aligned together. The multiple > sequence alignment for 70 sequences doesn't take long - on the order of > minutes on my laptop. It's the "feeding into PhyML" portion that, for some > reason, takes a long time. Alignment time is much smaller than any phylogeny calculation on your data size. The number of amino acids is not that important for the final time, as the ML calculation is quite fast, but arranging the branches is the main bottleneck. There's no easy solution for this. Maybe you can try some other approaches: some that won't be as good as ML (Neighbour Joining) and some that might be as good (Bayes) but take some time too. > > With that said, I do have a full distance matrix as one of the outputs from > a previous script in this script series, in addition to the multiple > sequence alignment. I have been able to feed the distance matrix into a > separate clustering algorithm from scikit-learn, and I was able to > successfully identify six clusters of sequences in there. Hence, I wanted > to use a phylogenetic tree to confirm what I'm seeing with the clustering > algorithm - it's basically two separate representations of the same data. > The distance matrix can be used to generate a diagram. I wouldn't call it a phylogenetic tree, but it can give you some ideas. One quick way to check your tree is to use a Neighbour Joining approach; you can try MEGA with your alignment file and see, calculations will be faster. Cheers Paulo > I have heard that it is possible to create a tree from the distance matrix, > and I was thinking this might be an alternative to feeding the alignment > into PhyML. Does anybody know how to do this using BioPython?
> > Cheers, > Eric > ----------------------------------------------------------------------- > Please consider the environment before printing this e-mail. Do you really > need to print it? > > http://about.me/ericmjl > > > On Thu, Apr 11, 2013 at 1:11 PM, Peter Cock wrote: > >> On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma wrote: >>> Hello everybody, >>> >>> I'm new to the mailing list here, though I've been playing with BioPython >>> for quite a while. >>> >>> I'm having some trouble here. I wanted to display a tree of sequences for >>> which I had done a multiple sequence alignment. I tried going through the >>> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline >> ). >>> Because I'm still in the testing phase, instead of writing it as a single >>> script, I wrote it as a series of scripts that I would execute in order. >>> >>> The problem I run into is at step 4 in the example, where I "feed the >>> alignment to PhyML". My data set is 70 protein sequences, and the >> trouble I >>> run into is that it takes a very, very long time at the "feeding >> alignment >>> to PhyML" step. I tried running the script on my MacBook Pro overnight, >> and >>> even the next morning it was not done. Am I missing something here? >>> >>> Just to be clear here, aligning the sequences using Muscle was >> successful, >>> and I also managed to output a distance matrix from sample to sample, >> which >>> I used in another downstream pipeline to display the clustering of the >>> sequences on a 2D euclidean plane. However, I wanted to have a tree >>> representation to validate the clustering results; the trouble is, I >> can't >>> get the _phyml_tree.txt file to be created, which I would then use to >> draw >>> the tree. >>> >>> Thanks in advance for any help! 
>>> >>> Cheers, >>> Eric >> >> Hi Eric, >> >> So this part is getting stuck (or taking a very long time): >> >> #Feed the alignment to PhyML using the command line wrapper: >> from Bio.Phylo.Applications import PhymlCommandline >> cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa', >> model='WAG', alpha='e', bootstrap=100) >> out_log, err_log = cmdline() >> >> At that point is the computer active (high CPU load as measured >> via the task manager / system monitor / top / etc)? >> >> I would suggest trying PHYML at the command line by hand, first >> check the command the Biopython should be running: >> >> print cmdline >> >> That may give you visual progress on screen. My guess is simply >> that this is just slow - you are only running 100 bootstraps, but >> perhaps each one is taking a while and that adds up. >> >> You said you had 70 protein sequences - how many columns >> are there in the alignment? That can also affect run times. >> >> Peter >> > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From jgibbons1 at mail.usf.edu Thu Apr 11 18:10:32 2013 From: jgibbons1 at mail.usf.edu (Justin Gibbons) Date: Thu, 11 Apr 2013 14:10:32 -0400 Subject: [Biopython] query upper limit for NCBIWWW.qblast? In-Reply-To: References: <5166805F.8060603@googlemail.com> Message-ID: NCBI Standalone Blast gives you the option of querying the website so that you don't have to maintain a local database. Justin Gibbons P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it correct this time. On Thu, Apr 11, 2013 at 5:43 AM, Peter Cock wrote: > On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade > wrote: > > Hello everyone, > > > > is there an upper limit to how many sequences I can query via > NCBIWWW.qblast > > at once? > > There are sometimes limits on the URL length, especially if going via > firewalls and proxies, so that may be one factor. 
> > At the NCBI end, I'm not sure what limits they impose on this: > http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html > > > Sending up to 150 sequences each of 24mer length in a single string > > everything works fine. But now, I have tried the same for a string > > containing about 900 sequences. On good times, it takes the NCBI-server > > about 5min to send an answer. I save the answer and later open and parse > the > > file by other functions in my code. However, even though I have queried > the > > same 900 sequences, the resulting output-file varies in length (10 > > MB > "<\BlastOutput>" or even misses more (this does not happen why querying > 150 > > sequences or less). > > > > I would guess once the server has started sending its answers, there > might > > only be a limited time NCBIWWW.qblast waits for follow up packets ... and > > thus depending on the current server-load, the NCBIWWW.qblast-function > > simply decides to terminate waiting for incomming data after some time, > > resulting in my blast-output-files to vary in length. Could anyone > correct > > or verify this long-fetched hypothesis? > > > > My core-lines are: > > > > orgn='Mus Musculus' #on anything else > > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100, > > entrez_query=str(orgn+"[orgn]")) > > save_file = open ('myblast_result.xml',"w") > > save_file.write(result.read()) > > > > Best regards, > > Matthias > > I think you've reach the scale where it would be better to run blastn > locally - ideally on a cluster if you have access to one. You can > download the whole NT database from here - most departments > running BLAST with their own Linux servers will have a central copy > which is kept automatically up to date: > ftp://ftp.ncbi.nlm.nih.gov/blast/db/ > > If you don't have those kinds of resources, then you can even > run BLAST on your own Windows machine - although I'm not > sure how much RAM would be recommended for the NT > database which is pretty big. 
> > Regards, > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Apr 11 18:54:50 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Apr 2013 19:54:50 +0100 Subject: [Biopython] query upper limit for NCBIWWW.qblast? In-Reply-To: References: <5166805F.8060603@googlemail.com> Message-ID: On Thursday, April 11, 2013, Justin Gibbons wrote: > NCBI Standalone Blast gives you the option of querying the website so that > you don't have to maintain a local database. Good point - the BLAST+ binaries added the -remote option which does that. Worth exploring as it should know and obey the NCBI limits automatically. > > Justin Gibbons > > P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it > correct this time. > > Easily done, don't worry about it. Peter From dan837446 at gmail.com Thu Apr 11 20:51:13 2013 From: dan837446 at gmail.com (Dan) Date: Fri, 12 Apr 2013 08:51:13 +1200 Subject: [Biopython] Biopython Digest, Vol 124, Issue 9 In-Reply-To: References: Message-ID: This is peripherally relevant to the question, I asked Tao Tao of NCBI user services about general guidelines for remote blast, and got this response: "In general, the key is to reduce the hits to BLAST server: At the search step, DO NOT submit searches that contain only single sequence! You need to batch the query and submit a set in a single search request. At the result polling step, you should reduce the result checking by spacing them out, and start checking for results after a delay (a few minutes). The XML result for batch queries is a bit peculiar each query is wrapped around tag You are better off leaving the other conditions default and post-process it to get the top hits" Also it's best to search between 9PM and 5AM Eastern Standard time and at weekends. 
Personally I seem to encounter glitches using batches above 100 but it's so specific to your particular workplace that I'm not sure if that's a good guideline. On Fri, Apr 12, 2013 at 4:00 AM, wrote: > Send Biopython mailing list submissions to > biopython at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/biopython > or, via email, send a message with subject or body 'help' to > biopython-request at lists.open-bio.org > > You can reach the person managing the list at > biopython-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Biopython digest..." > > > Today's Topics: > > 1. query upper limit for NCBIWWW.qblast? (Matthias Schade) > 2. Re: query upper limit for NCBIWWW.qblast? (Peter Cock) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 11 Apr 2013 11:20:31 +0200 > From: Matthias Schade > Subject: [Biopython] query upper limit for NCBIWWW.qblast? > To: biopython at lists.open-bio.org > Message-ID: <5166805F.8060603 at googlemail.com> > Content-Type: text/plain; charset=ISO-8859-15; format=flowed > > Hello everyone, > > is there an upper limit to how many sequences I can query via > NCBIWWW.qblast at once? > > Sending up to 150 sequences each of 24mer length in a single string > everything works fine. But now, I have tried the same for a string > containing about 900 sequences. On good times, it takes the NCBI-server > about 5min to send an answer. I save the answer and later open and parse > the file by other functions in my code. However, even though I have > queried the same 900 sequences, the resulting output-file varies in > length (10 MB termination-tag in "<\BlastOutput>" or even misses more (this does not > happen why querying 150 sequences or less). 
> > I would guess once the server has started sending its answers, there > might only be a limited time NCBIWWW.qblast waits for follow up packets > ... and thus depending on the current server-load, the > NCBIWWW.qblast-function simply decides to terminate waiting for > incomming data after some time, resulting in my blast-output-files to > vary in length. Could anyone correct or verify this long-fetched > hypothesis? > > My core-lines are: > > orgn='Mus Musculus' #on anything else > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100, > entrez_query=str(orgn+"[orgn]")) > save_file = open ('myblast_result.xml',"w") > save_file.write(result.read()) > > Best regards, > Matthias > > > ------------------------------ > > Message: 2 > Date: Thu, 11 Apr 2013 10:43:44 +0100 > From: Peter Cock > Subject: Re: [Biopython] query upper limit for NCBIWWW.qblast? > To: Matthias Schade > Cc: biopython at lists.open-bio.org > Message-ID: > ZYEg at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade > wrote: > > Hello everyone, > > > > is there an upper limit to how many sequences I can query via > NCBIWWW.qblast > > at once? > > There are sometimes limits on the URL length, especially if going via > firewalls and proxies, so that may be one factor. > > At the NCBI end, I'm not sure what limits they impose on this: > http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html > > > Sending up to 150 sequences each of 24mer length in a single string > > everything works fine. But now, I have tried the same for a string > > containing about 900 sequences. On good times, it takes the NCBI-server > > about 5min to send an answer. I save the answer and later open and parse > the > > file by other functions in my code. 
However, even though I have queried > the > > same 900 sequences, the resulting output-file varies in length (10 > > MB > "<\BlastOutput>" or even misses more (this does not happen why querying > 150 > > sequences or less). > > > > I would guess once the server has started sending its answers, there > might > > only be a limited time NCBIWWW.qblast waits for follow up packets ... and > > thus depending on the current server-load, the NCBIWWW.qblast-function > > simply decides to terminate waiting for incomming data after some time, > > resulting in my blast-output-files to vary in length. Could anyone > correct > > or verify this long-fetched hypothesis? > > > > My core-lines are: > > > > orgn='Mus Musculus' #on anything else > > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100, > > entrez_query=str(orgn+"[orgn]")) > > save_file = open ('myblast_result.xml',"w") > > save_file.write(result.read()) > > > > Best regards, > > Matthias > > I think you've reach the scale where it would be better to run blastn > locally - ideally on a cluster if you have access to one. You can > download the whole NT database from here - most departments > running BLAST with their own Linux servers will have a central copy > which is kept automatically up to date: > ftp://ftp.ncbi.nlm.nih.gov/blast/db/ > > If you don't have those kinds of resources, then you can even > run BLAST on your own Windows machine - although I'm not > sure how much RAM would be recommended for the NT > database which is pretty big. 
> > Regards, > > Peter > > > ------------------------------ > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > > End of Biopython Digest, Vol 124, Issue 9 > ***************************************** > From p.j.a.cock at googlemail.com Fri Apr 12 09:49:31 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 12 Apr 2013 10:49:31 +0100 Subject: [Biopython] query upper limit for NCBIWWW.qblast? In-Reply-To: <5166805F.8060603@googlemail.com> References: <5166805F.8060603@googlemail.com> Message-ID: Dan replied via the digest (summary emails rather than individual emails) here: http://lists.open-bio.org/pipermail/biopython/2013-April/008507.html On Thu, Apr 11, 2013 at 9:51 PM, Dan wrote: > This is peripherally relevant to the question, I asked Tao Tao of NCBI user > services about general guidelines for remote blast, and got this response: > > "In general, the key is to reduce the hits to BLAST server: > At the search step, DO NOT submit searches that contain only single > sequence! You need to batch the query and submit a set in a single search > request. > At the result polling step, you should reduce the result checking by > spacing them out, and start checking for results after a delay (a few > minutes). > The XML result for batch queries is a bit peculiar each query is wrapped > around tag > You are better off leaving the other conditions default and post-process it > to get the top hits" > > Also it's best to search between 9PM and 5AM Eastern Standard time and at > weekends. > Personally I seem to encounter glitches using batches above 100 but it's so > specific to your particular workplace that I'm not sure if that's a good > guideline. > Perhaps Biopython's QBLAST wrapper could benefit from adaptive time delays in the polling step - at the moment it just checks every three seconds. 
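The adaptive-delay idea can be sketched without touching Biopython at all. The helper names below (backoff_delays, poll_until_ready), the growth factor and the cap are assumptions chosen for illustration; this is not how NCBIWWW.qblast is implemented, only one way the polling step could be spaced out.

```python
import itertools
import time


def backoff_delays(initial=3.0, factor=1.5, maximum=120.0):
    """Yield an endless series of wait times that grow geometrically up to a cap."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def poll_until_ready(is_ready, delays, sleep=time.sleep, max_checks=50):
    """Wait, then call is_ready(); repeat until it returns True.

    Returns the number of checks that were needed.
    """
    for attempt, delay in enumerate(itertools.islice(delays, max_checks), start=1):
        sleep(delay)
        if is_ready():
            return attempt
    raise TimeoutError("result not ready after %d checks" % max_checks)


if __name__ == "__main__":
    # Stand-ins for a real server: the result is "ready" on the third check,
    # and the waits are recorded instead of actually sleeping.
    waits = []
    outcomes = iter([False, False, True])
    checks = poll_until_ready(lambda: next(outcomes), backoff_delays(), sleep=waits.append)
    print(checks, waits)
```

Starting at the current three seconds keeps short jobs responsive, while the cap stops long batch jobs from hitting the server every few seconds.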
Peter

From john at picloud.com Fri Apr 12 23:11:43 2013
From: john at picloud.com (John Riley)
Date: Fri, 12 Apr 2013 16:11:43 -0700
Subject: [Biopython] BioPython now available on PiCloud by default
Message-ID:

Hello,

We've had some requests for BioPython to be deployed on PiCloud [1]. While
any user could always create a custom environment, and install the latest
version themselves [2], we've decided to address the issue directly by
adding BioPython (1.60) into the default suite of scientific tools on
PiCloud.

In short, to offload a Python function or program that uses BioPython, you
don't need to do any setup! The instructions for using other scientific
tools work just the same [3]. Hope this helps!

[1] http://www.picloud.com
[2] http://docs.picloud.com/environment.html
[3] http://docs.picloud.com/howto/pyscientifictools.html

Best Regards,
John

--
John Riley
PiCloud, Inc.

From jgibbons1 at mail.usf.edu Sat Apr 13 20:13:56 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Sat, 13 Apr 2013 16:13:56 -0400
Subject: [Biopython] Cookbook suggestion
Message-ID:

I want to add the following to the cookbook but I am unable to create an
account.

#using SeqIO.write() without holding records in memory.

from Bio import SeqIO

seq_ids=set() #create an empty set to hold the sequence IDs.
indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by sequence ID but is not held in memory

for seq_record in SeqIO.parse(file_path, 'fasta'):
    #Filter according to some criteria:
    seq_ids.add(seq_record.id)

#write the fasta records to a new file using SeqIO.write()

SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path, 'fasta')

So if someone who can edit the cookbook wants to add it feel free to.
Justin Gibbons

From p.j.a.cock at googlemail.com Sat Apr 13 20:27:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 13 Apr 2013 21:27:24 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To:
References:
Message-ID:

Hi Justin,

On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons wrote:
> I want to add the following to the cookbook but I am unable to create an
> account.

Hmm - we should fix that. Is there a specific error message
from the wiki?

> #using SeqIO.write() without holding records in memory.
>
> from Bio import SeqIO
>
>
> seq_ids=set() #create an empty set to hold the sequence IDs.
> indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by sequence
> ID but is not held in memory
>
> for seq_record in SeqIO.parse(file_path, 'fasta'):
>     #Filter according to some criteria:
>     seq_ids.add(seq_record.id)

Why do you call SeqIO.index but then not use it, and instead get
the ID list by doing a full parse of the file? Note that calling
SeqIO.index is likely faster than SeqIO.parse because the
index code doesn't actually load the sequence information
etc - just the record identifier. This speed difference is more
obvious on heavier file formats like GenBank. For example, these
single lines both get all the identifiers as a list:

seq_ids = list(SeqIO.index(file_path, 'fasta').keys())

vs:

seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')]

Also note that using a set rather than a list for the ids
means the order is lost - which may be important.

> #write the fasta records to a new file using SeqIO.write()
>
> SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path,
> 'fasta')
>

That last line uses a list comprehension,
[indexed_fasta[seq_id] for seq_id in seq_ids]

That will therefore load all the records into memory as a list of
SeqRecord objects, which can be avoided with a generator expression:

(indexed_fasta[seq_id] for seq_id in seq_ids)

i.e. round brackets not square.
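Putting the points above together, here is one possible self-contained sketch: index the file, choose the IDs, and hand SeqIO.write() a generator expression so that only one SeqRecord is alive at a time. The function name copy_selected_records and the three-record example file are made up for illustration; the selection step is reduced to a fixed list of IDs.

```python
import os
import tempfile

from Bio import SeqIO


def copy_selected_records(in_path, out_path, wanted_ids):
    """Write the records named in wanted_ids, in that order, to out_path."""
    indexed_fasta = SeqIO.index(in_path, "fasta")  # keeps file offsets, not records
    try:
        # Round brackets: each record is parsed only when SeqIO.write asks for it.
        records = (indexed_fasta[seq_id] for seq_id in wanted_ids)
        return SeqIO.write(records, out_path, "fasta")  # number of records written
    finally:
        indexed_fasta.close()


# Tiny made-up FASTA file so the sketch can be run as-is.
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "example.fasta")
out_path = os.path.join(work_dir, "selected.fasta")
with open(in_path, "w") as handle:
    handle.write(">seq1\nACGTACGT\n>seq2\nGGGGCCCC\n>seq3\nTTTTAAAA\n")

# A list (not a set) keeps the output order under our control.
written = copy_selected_records(in_path, out_path, ["seq3", "seq1"])
selected_ids = [record.id for record in SeqIO.parse(out_path, "fasta")]
print(written, selected_ids)
```

Passing the IDs as a list rather than a set also addresses the ordering concern raised above.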
> So if someone who can edit the cookbook wants to add it feel free to. > > Justin Gibbons Feedback on the documentation and efforts to improve it are always welcome. However, I'm not sure what your example is trying to do yet - it seems to rewrite a FASTA file with the records in a new order (with the order given by however Python sorts the set of IDs). Thanks, Peter From jgibbons1 at mail.usf.edu Sun Apr 14 17:53:26 2013 From: jgibbons1 at mail.usf.edu (Justin Gibbons) Date: Sun, 14 Apr 2013 13:53:26 -0400 Subject: [Biopython] Cookbook suggestion In-Reply-To: References: Message-ID: My only goal was to demonstrate how to use SeqIO.write without holding all of the sequence records in memory by using a generator expression: SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids), new_file_path,'fasta') Everything else was just to provide context for the SeqIO.write() function, but it just ended up just being confusing. I am assuming that you want to check the individual fasta records for specific criteria and then write those that match the criteria to a new file. Which is why I wrote this: for seq_record in SeqIO.parse(file_path, 'fasta'): #Filter according to some critria: seq_ids.add(seq_record.id) For example you can create individual sets holding the sequence IDs of sequences that are within a given size range, and aren't repetitive. So that seq_ids=correct_length_set.intersection(non_repetitive_set) You need the indexed fasta so that you can get a copy of the sequence records that match your criteria: ndexed_fasta=SeqIO.index( file_path, 'fasta') #Can be searched by sequence ID but is not held in memory On Sat, Apr 13, 2013 at 4:27 PM, Peter Cock wrote: > Hi Justin, > > On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons > wrote: > > I want to add the following to the cookbook but I am unable to create an > > account. > > Hmm - we should fix that. Is there a specific error message > from the wiki? > > > #using SeqIO.write() without holding records in memory. 
> > > > from Bio import SeqIO > > > > > > seq_ids=set() #create an empty set to hold the sequence IDs. > > indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by > sequence > > ID but is not held in memory > > > > for seq_record in SeqIO.parse(file_path, 'fasta'): > > #Filter according to some critria: > > seq_ids.add(seq_record.id) > > Why do call SeqIO.index, but not use it and instead get > the ID list by doing a full parse of the file? Note that calling > SeqIO.index is likely faster than SeqIO.parse because the > index code doesn't actually load the sequence information > etc - just the record identifier. This speed difference is more > obvious on heavier file formats like GenBank. e.g. These > single lines both get all the identifiers as a list: > > seq_ids = SeqIO.parse(file_path, 'fasta').keys() > > vs: > > seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')] > > Also note that using a set rather than a list for the ids > means the order is lost - which may be important. > > > #write the fasta records to a new file using SeqIO.write() > > > > SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path, > > 'fasta') > > > > That last line uses a list comprehension, > [indexed_fasta[seq_id] for seq_id in seq_ids] > > That will therefore load all the records into memory as a list of > SeqRecord objects, which can be avoided with a list comprehension: > > (indexed_fasta[seq_id] for seq_id in seq_ids) > > i.e. round brackets not square. > > > So if someone who can edit the cookbook wants to add it feel free to. > > > > Justin Gibbons > > Feedback on the documentation and efforts to improve it > are always welcome. However, I'm not sure what your example > is trying to do yet - it seems to rewrite a FASTA file with the > records in a new order (with the order given by however > Python sorts the set of IDs). 
> > Thanks, > > Peter > From jgibbons1 at mail.usf.edu Sun Apr 14 17:58:53 2013 From: jgibbons1 at mail.usf.edu (Justin Gibbons) Date: Sun, 14 Apr 2013 13:58:53 -0400 Subject: [Biopython] Cookbook suggestion In-Reply-To: References: Message-ID: Sorry I accidentally sent the last email. You need the indexed fasta to get a copy of the sequence records that match your criteria: indexed_fasta=SeqIO.index(file_path, 'fasta') SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids), new_file_path,'fasta') As for editing the wiki when I click on "Login with OpenID" I get sent to a blank page. I also tried clicking on "Login" and tired to create a new account and was told "The action you have requested is limited to users in the group: Administrators ." On Sun, Apr 14, 2013 at 1:53 PM, Justin Gibbons wrote: > My only goal was to demonstrate how to use SeqIO.write without holding all > of the sequence records in memory by using a generator expression: > > SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids), > new_file_path,'fasta') > > Everything else was just to provide context for the SeqIO.write() > function, but it just ended up just being confusing. > > I am assuming that you want to check the individual fasta records for > specific criteria and then write those that match the criteria to a new > file. Which is why I wrote this: > > for seq_record in SeqIO.parse(file_path, 'fasta'): > #Filter according to some critria: > seq_ids.add(seq_record.id) > > For example you can create individual sets holding the sequence IDs of > sequences that are within a given size range, and aren't repetitive. 
So > that seq_ids=correct_length_set.intersection(non_repetitive_set) > > You need the indexed fasta so that you can get a copy of the sequence > records that match your criteria: > > ndexed_fasta=SeqIO.index( > file_path, 'fasta') #Can be searched by sequence > ID but is not held in memory > > > > > > On Sat, Apr 13, 2013 at 4:27 PM, Peter Cock wrote: > >> Hi Justin, >> >> On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons >> wrote: >> > I want to add the following to the cookbook but I am unable to create an >> > account. >> >> Hmm - we should fix that. Is there a specific error message >> from the wiki? >> >> > #using SeqIO.write() without holding records in memory. >> > >> > from Bio import SeqIO >> > >> > >> > seq_ids=set() #create an empty set to hold the sequence IDs. >> > indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by >> sequence >> > ID but is not held in memory >> > >> > for seq_record in SeqIO.parse(file_path, 'fasta'): >> > #Filter according to some critria: >> > seq_ids.add(seq_record.id) >> >> Why do call SeqIO.index, but not use it and instead get >> the ID list by doing a full parse of the file? Note that calling >> SeqIO.index is likely faster than SeqIO.parse because the >> index code doesn't actually load the sequence information >> etc - just the record identifier. This speed difference is more >> obvious on heavier file formats like GenBank. e.g. These >> single lines both get all the identifiers as a list: >> >> seq_ids = SeqIO.parse(file_path, 'fasta').keys() >> >> vs: >> >> seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')] >> >> Also note that using a set rather than a list for the ids >> means the order is lost - which may be important. 
>> >> > #write the fasta records to a new file using SeqIO.write() >> > >> > SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], >> new_file_path, >> > 'fasta') >> > >> >> That last line uses a list comprehension, >> [indexed_fasta[seq_id] for seq_id in seq_ids] >> >> That will therefore load all the records into memory as a list of >> SeqRecord objects, which can be avoided with a list comprehension: >> >> (indexed_fasta[seq_id] for seq_id in seq_ids) >> >> i.e. round brackets not square. >> >> > So if someone who can edit the cookbook wants to add it feel free to. >> > >> > Justin Gibbons >> >> Feedback on the documentation and efforts to improve it >> are always welcome. However, I'm not sure what your example >> is trying to do yet - it seems to rewrite a FASTA file with the >> records in a new order (with the order given by however >> Python sorts the set of IDs). >> >> Thanks, >> >> Peter >> > > From p.j.a.cock at googlemail.com Mon Apr 15 10:10:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 15 Apr 2013 11:10:15 +0100 Subject: [Biopython] BioPython now available on PiCloud by default In-Reply-To: References: Message-ID: On Sat, Apr 13, 2013 at 12:11 AM, John Riley wrote: > Hello, > > We've had some requests for BioPython to be deployed on PiCloud [1]. While > any user could always create a custom environment, and install the latest > version themselves [2], we've decided to address the issue directly by > adding BioPython (1.60) into the default suite of scientific tools on > PiCloud. > > In short, to offload a Python function or program that uses BioPython, you > don't need to do any setup! The instructions for using other scientific > tools work just the same [3]. Hope this helps! 
>
> [1] http://www.picloud.com
> [2] http://docs.picloud.com/environment.html
> [3] http://docs.picloud.com/howto/pyscientifictools.html
>
> Best Regards,
> John

Sounds interesting, and you have some very keen users already :)
http://blog.picloud.com/2011/09/27/building-a-biological-database-and-doing-comparative-genomics-in-the-cloud/

Regards,

Peter

From p.j.a.cock at googlemail.com Mon Apr 15 10:46:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Apr 2013 11:46:53 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To:
References:
Message-ID:

On Sun, Apr 14, 2013 at 6:58 PM, Justin Gibbons wrote:
> Sorry I accidentally sent the last email.
>
> You need the indexed fasta to get a copy of the sequence records that match
> your criteria:
>
> indexed_fasta=SeqIO.index(file_path, 'fasta')
> SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> new_file_path,'fasta')

With a simple sequential file format like FASTA where there are no complex
file headers/footers to worry about, this might be the faster route:

with open(new_file_path, "w") as handle:
    for seq_id in seq_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

The idea here is never to parse the records into SeqRecord objects, just
keep them as raw strings in FASTA format. The same idea works well on
GenBank or SwissProt files, which are slower to parse; there are examples
of this in the main Tutorial,
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Were you intending this to be a self contained cookbook example for:
http://biopython.org/wiki/Category:Cookbook ?

> As for editing the wiki when I click on "Login with OpenID" I get sent to a
> blank page. I also tried clicking on "Login" and tried to create a new
> account and was told "The action you have requested is limited to users in
> the group: Administrators
> ."

Thanks - I've passed that on to our volunteer SysAdmin team.
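Returning to the get_raw() route suggested earlier in this message: the "w" mode shown there fits the Python 2 setups of the time, where a raw record was an ordinary str. In current Biopython releases get_raw() returns bytes, so on Python 3 the output handle needs binary mode. A sketch under that assumption, using a made-up three-record file:

```python
import os
import tempfile

from Bio import SeqIO

# Tiny made-up FASTA file so the sketch can be run as-is.
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "example.fasta")
out_path = os.path.join(work_dir, "raw_copy.fasta")
with open(in_path, "w") as handle:
    handle.write(">seq1\nACGTACGT\n>seq2\nGGGGCCCC\n>seq3\nTTTTAAAA\n")

seq_ids = ["seq2", "seq3"]  # the records we want to keep, in output order
indexed_fasta = SeqIO.index(in_path, "fasta")
with open(out_path, "wb") as handle:  # "wb" because get_raw() hands back bytes
    for seq_id in seq_ids:
        # The record is copied byte for byte; no SeqRecord is ever built.
        handle.write(indexed_fasta.get_raw(seq_id))
indexed_fasta.close()

with open(out_path) as handle:
    raw_copy = handle.read()
print(raw_copy)
```

Because the records are copied verbatim, the original line wrapping and description lines survive, which SeqIO.write() does not guarantee.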
(As an aside, do you have a GitHub account and would you think it would be easier to use the wiki hosted on GitHub instead of our own MediaWiki installation?) Thanks, Peter From swang129 at gmail.com Mon Apr 15 11:15:23 2013 From: swang129 at gmail.com (Sarah Wang) Date: Mon, 15 Apr 2013 04:15:23 -0700 Subject: [Biopython] pysam installation errors Inbox x In-Reply-To: References: Message-ID: When I tried to install pysam with "python setup.py install", multiple > warning messages have been generated (error messages copied below). I can > not import pysam. How can I resolve them? Thanks > > $Python setup.py install > > ... > Compiling module Cython.Plex.Scanners ... > Compiling module Cython.Plex.Actions ... > Compiling module Cython.Compiler.Lexicon ... > Compiling module Cython.Compiler.Scanning ... > Compiling module Cython.Compiler.Parsing ... > Compiling module Cython.Compiler.Visitor ... > Compiling module Cython.Compiler.FlowControl ... > Compiling module Cython.Compiler.Code ... > Compiling module Cython.Runtime.refnanny ... > warning: no files found matching '*.pyx' under directory > 'Cython/Debugger/Tests' > warning: no files found matching '*.pxd' under directory > 'Cython/Debugger/Tests' > warning: no files found matching '*.h' under directory > 'Cython/Debugger/Tests' > warning: no files found matching '*.pxd' under directory 'Cython/Utility' > clang: warning: argument unused during compilation: '-mno-fused-madd' > /tmp/easy_install-9yggMe/ > Cython-0.18/Cython/Plex/Scanners.c:7117:18: > warning: > unused function '__Pyx_CyFunction_New' [-Wunused-function] > static PyObject *__Pyx_CyFunction_New(PyTypeObject *type, PyMethodDef > *ml,... > ^ > 1 warning generated. 
> /tmp/easy_install-9yggMe/Cython-0.18/Cython/Plex/Scanners.c:2992:31: > warning: > implicit conversion loses integer precision: 'long' to 'int' > [-Wshorten-64-to-32] > __pyx_v_self->input_state = __pyx_v_input_state; > ~ ^~~~~~~~~~~~~~~~~~~ > /tmp/easy_install-9yggMe/Cython-0.18/Cython/Plex/Scanners.c:7117:18: > warning: > unused function '__Pyx_CyFunction_New' [-Wunused-function] > static PyObject *__Pyx_CyFunction_New(PyTypeObject *type, PyMethodDef > *ml,... > ^ > 2 warnings generated. > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > clang: warning: argument unused during compilation: '-mno-fused-madd' > Adding Cython 0.18 to easy-install.pth file > Installing cygdb script to /usr/local/bin > Installing cython script to /usr/local/bin > > Installed > /Library/Python/2.7/site-packages/Cython-0.18-py2.7-macosx-10.8-intel.egg > Finished processing dependencies for pysam==0.7.4 > > > >>> import pysam > Traceback (most recent call last): > File "", line 1, in > File "pysam/__init__.py", line 1, in > from pysam.csamtools import * > ImportError: No module named csamtools > From p.j.a.cock at googlemail.com Mon Apr 15 11:27:30 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 15 Apr 2013 12:27:30 +0100 Subject: [Biopython] pysam installation errors Inbox x In-Reply-To: References: Message-ID: On Mon, Apr 15, 2013 at 12:15 PM, Sarah Wang wrote: > When I tried to install pysam with "python setup.py install", multiple > warning messages have been generated (error messages copied below). 
I can
> not import pysam. How can I resolve them? Thanks

Hi Sarah,

This is the Biopython mailing list, and while we do discuss other tools,
in this case the pysam Google Group is the best place to ask:

https://groups.google.com/forum/?fromgroups=#!topic/pysam-user-group/tOikIFU_ZFk

Peter

P.S. Those were compiler warnings, not errors, and I would guess they
can be ignored.

From ferreirafm at usp.br Mon Apr 15 12:34:12 2013
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Mon, 15 Apr 2013 09:34:12 -0300
Subject: [Biopython] BioPython now available on PiCloud by default
In-Reply-To:
References:
Message-ID: <516BF3C4.1070107@usp.br>

Hi John,

Thanks for sharing such a very nice module.

Best,
Fred

On 12-04-2013 20:11, John Riley wrote:
> Hello,
>
> We've had some requests for BioPython to be deployed on PiCloud [1]. While
> any user could always create a custom environment, and install the latest
> version themselves [2], we've decided to address the issue directly by
> adding BioPython (1.60) into the default suite of scientific tools on
> PiCloud.
>
> In short, to offload a Python function or program that uses BioPython, you
> don't need to do any setup! The instructions for using other scientific
> tools work just the same [3]. Hope this helps!
>
> [1] http://www.picloud.com
> [2] http://docs.picloud.com/environment.html
> [3] http://docs.picloud.com/howto/pyscientifictools.html
>
> Best Regards,
> John
>
> --
> John Riley
> PiCloud, Inc.
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

--
Dr. Frederico Moraes Ferreira
University of Sao Paulo
School of Medicine
Heart Institute - Immunology
Av. Dr.
Enéas de Carvalho Aguiar, 44
05403-900 São Paulo - SP
Brasil

From jgibbons1 at mail.usf.edu Mon Apr 15 19:40:15 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Mon, 15 Apr 2013 15:40:15 -0400
Subject: [Biopython] Cookbook suggestion
In-Reply-To:
References:
Message-ID:

It looks like there is already an example of this in the tutorial under
18.1.5, but I was planning on making it a self-contained cookbook example
so that it is easier to find.

If this is the fastest way to do it, though:

    with open(new_file_path, "w") as handle:
        for seq_id in seq_ids:
            handle.write(indexed_fasta.get_raw(seq_id))

Is there any advantage to using SeqIO.write() other than it being shorter?

I do not have a GitHub account so I cannot comment on whether it would be
easier to use GitHub.

Thanks,
Justin

On Mon, Apr 15, 2013 at 6:46 AM, Peter Cock wrote:
> On Sun, Apr 14, 2013 at 6:58 PM, Justin Gibbons wrote:
> > Sorry I accidentally sent the last email.
> >
> > You need the indexed fasta to get a copy of the sequence records that
> > match your criteria:
> >
> > indexed_fasta=SeqIO.index(file_path, 'fasta')
> > SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> > new_file_path,'fasta')
>
> With a simple sequential file format like FASTA where there are no complex
> file headers/footers to worry about, this might be the faster route:
>
> with open(new_file_path, "w") as handle:
>     for seq_id in seq_ids:
>         handle.write(indexed_fasta.get_raw(seq_id))
>
> The idea here is never to parse the records into SeqRecord objects, just
> keep them as raw strings in FASTA format. The same idea works well on
> GenBank or SwissProt files which are slower to parse, there are examples
> of this in the main Tutorial,
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
> Were you intending this to be a self contained cookbook example for:
> http://biopython.org/wiki/Category:Cookbook ?
> > > As for editing the wiki when I click on "Login with OpenID" I get sent > to a > > blank page. I also tried clicking on "Login" and tired to create a new > > account and was told "The action you have requested is limited to users > in > > the group: Administrators< > http://biopython.org/w/index.php?title=Biopython:Administrators&action=edit&redlink=1 > > > > ." > > Thanks - I've passed that on to our volunteer SysAdmin team. > > (As an aside, do you have a GitHub account and would you think > it would be easier to use the wiki hosted on GitHub instead of > our own MediaWiki installation?) > > Thanks, > > Peter > From p.j.a.cock at googlemail.com Tue Apr 16 09:02:58 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 16 Apr 2013 10:02:58 +0100 Subject: [Biopython] Cookbook suggestion In-Reply-To: References: Message-ID: On Mon, Apr 15, 2013 at 8:40 PM, Justin Gibbons wrote: > It looks like there is already an example of this in the tutorial under > 18.1.5, but I was planning on making it a self contained cookbook example > so that it is easier to find. > > If this is the fastest way to do it though: > > with open(new_file_path, "w") as handle: > for seq_id in seq_ids: > handle.write(indexed_fasta. > get_raw(seq_id)) > Is there any advantage to using SeqIO.write() other then it being shorter? There are two linked choices here, (a) Full parsing into SeqRecord objects using SeqIO.parse, or use the SeqIO.index or SeqIO.index_db to just extract the record identifiers. Unless you need some of the annotation or the sequence, parsing it into a SeqRecord is a waste of CPU time. (b) Convert the SeqRecord back into a file on disk, or reuse the original representation from the input file. For a format like FASTA, this is almost a moot point - the only change is the white space (using SeqIO.write will produce consistent line wrapping). 
For some of the richer formats like GenBank the parse/write round trip
is not expected to produce an identical output, so it can be prudent to
reuse the original. For some formats we don't have writing support, so
you have to reuse the original.

My point is that whether to use SeqIO.write() or indexing and get_raw()
depends on the file format and what you are trying to do.

My recommendation would be to use get_raw to write simple file formats
without headers/footers if:

(*) You need to preserve original records exactly
(*) You need this to be as fast as possible
(*) SeqIO.write doesn't support the file format

Otherwise using SeqIO.write should be fine - it is also simpler in terms
of the code to call it. Of course, if you are editing the records in any
way, then you must use SeqIO.write anyway.

> I do not have a GitHub account so I cannot comment on whether
> it would be easier to use Github.

Thanks. My thinking is that right now you would need to register
separately for (1) the mailing lists, (2) editing the wiki, (3) reporting
bugs on RedMine, (4) submitting pull requests on GitHub.

If we used GitHub for the wiki and/or issue tracker, this would mean fewer
user accounts, so a little easier for contributors, but also less SysAdmin
work behind the scenes.

Peter

From nuin at genedrift.org Wed Apr 17 18:45:20 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Wed, 17 Apr 2013 14:45:20 -0400
Subject: [Biopython] GEO profiles retrieval
Message-ID:

Hi everyone

Quite a longish question about some data retrieval we are trying to
implement on GEO profiles. I don't know if this is possible to achieve
programmatically (with or without BioPython), but some parts I already
have set up using Python and BioPython.

What we are trying to achieve:

- we are building a pipeline where initially we want to see if the gene in question (let's say PTEN) is over or under expressed in certain conditions.
- using an eSearch URL/procedure I can get an XML with all the profile IDs for PTEN
- in order to get more information about each profile, I can use an eSummary URL/procedure that will get an XML file for each profile
- with these profiles we then want to check the gene expression level in each sample subgroup of the study and see if the gene is under or over expressed, or there's no change between the groups.

The problem I have is that in the profile XML file there's no information
about sample annotation, or gene expression in each sample. I created a
workaround: from the eSummary XML, I can get to this page of the profile

http://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS2877:1441937_s_at

using the GDS and probe ID found in the XML. Again, from this file there's
no easy way to extract the sample grouping/annotation, although it's quite
straightforward to extract the gene expression levels for each sample.

What I want to find is:

- a way to get sample grouping/annotation for a specific GDS, that would give me the sample IDs that I could correlate to an expression value
- an eSearch, eSummary, eFetch, or any other URL that would give me expression values per sample, with sample ID annotated to a group

Thanks in advance for any help, ideas and comments.

Paulo

From markbudde at gmail.com Wed Apr 17 21:24:00 2013
From: markbudde at gmail.com (Mark Budde)
Date: Wed, 17 Apr 2013 14:24:00 -0700
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
Message-ID:

Hi, I have a simple question. The cookbook shows many examples using
SeqFeatures, but I can't find any information on adding features to a
SeqRecord.

Say I wanted to add a Feature to an existing SeqRecord. Let's say it spans
nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
would I add this to my SeqRecord?
Thanks,
Mark

From p.j.a.cock at googlemail.com Wed Apr 17 21:53:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Apr 2013 22:53:57 +0100
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
In-Reply-To:
References:
Message-ID:

Hi Mark,

On Wed, Apr 17, 2013 at 10:24 PM, Mark Budde wrote:
> Hi, I have a simple question. The cookbook shows many examples using
> SeqFeatures, I can't find any information on adding features to a
> SeqRecord.

The "Tutorial and Cookbook" does have examples of creating a
SeqFeature - if this was not obvious to you how might we make
it clearer?

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

See also the docstrings,

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> help(SeqFeature)
>>> help(FeatureLocation)

Online here (for the current release):
http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html
http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html

> Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans
> nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
> would I add this to my SeqRecord?
>
> Thanks,
> Mark

Which version of Biopython do you have? The strand is moving
from the SeqFeature to the FeatureLocation, but this will work
on old and new:

    from Bio.SeqFeature import SeqFeature, FeatureLocation
    loc = FeatureLocation(9, 100)
    f = SeqFeature(loc, strand=-1, qualifiers={"locus_tag":"Gene1"})

This is preferred for future-proofing:

    from Bio.SeqFeature import SeqFeature, FeatureLocation
    loc = FeatureLocation(9, 100, strand=-1)
    f = SeqFeature(loc, qualifiers={"locus_tag":"Gene1"})

Exactly where you put the gene name depends on what you'll be
doing with the record - for GenBank or EMBL output, using a
locus_tag key would be a sensible option.
Then if you have a SeqRecord, use my_record.features.append(f)
or similar (and for GenBank/EMBL output pay attention to the
order).

Is that clear?

Regards,

Peter

From markbudde at gmail.com Wed Apr 17 22:52:31 2013
From: markbudde at gmail.com (Mark Budde)
Date: Wed, 17 Apr 2013 15:52:31 -0700
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
In-Reply-To:
References:
Message-ID:

On Wed, Apr 17, 2013 at 2:53 PM, Peter Cock wrote:
> Hi Mark,
>
> On Wed, Apr 17, 2013 at 10:24 PM, Mark Budde wrote:
> > Hi, I have a simple question. The cookbook shows many examples using
> > SeqFeatures, I can't find any information on adding features to a
> > SeqRecord.
>
> The "Tutorial and Cookbook" does have examples of creating a
> SeqFeature - if this was not obvious to you how might we make
> it clearer?
>
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>

I am coming at this from the perspective of generating a plasmid with
features on it. I guess most people would be using this for mining data
from PubMed or something, so maybe I'm just not the targeted user. I spent
a lot of time looking for how to name a feature, like you would in a
vector editing program. I now see that I can generate a feature as shown
in the first example in 4.3.3 - is this what you are referring to? I was
confused earlier because I could never figure out how to name the feature,
nor how to add it to the SeqRecord. I can see how to do this from your
example below (using qualifiers to name the feature, and append to add
the feature).
I think the cookbook would benefit from adding a line such as

>>> len(MyRecord.features)
0
>>> example_feature.qualifiers['locus_tag'] = 'Gene1'
>>> MyRecord.features.append(example_feature)
>>> len(MyRecord.features)
1

> See also the docstrings,
>
> >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
> >>> help(SeqFeature)
> >>> help(FeatureLocation)
>
> Online here (for the current release):
> http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html
> http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html
>
> > Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans
> > nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
> > would I add this to my SeqRecord?
> >
> > Thanks,
> > Mark
>
> Which version of Biopython do you have? The strand is moving
> from the SeqFeature to the FeatureLocation, but this will work
> on old and new:
>

I have v1.59

> from Bio.SeqFeature import SeqFeature, FeatureLocation
> loc = FeatureLocation(9, 100)
> f = SeqFeature(loc, strand=-1, qualifiers={"locus_tag":"Gene1"})
>
> This is preferred for future-proofing:
>
> from Bio.SeqFeature import SeqFeature, FeatureLocation
> loc = FeatureLocation(9, 100, strand=-1)
> f = SeqFeature(loc, qualifiers={"locus_tag":"Gene1"})
>
> Exactly where you put the gene name depends on what you'll be
> doing with the record - for GenBank or EMBL output, using a
> locus_tag key would be a sensible option.
>
> Then if you have a SeqRecord, use my_record.features.append(f)
> or similar (and for GenBank/EMBL output pay attention to the
> order).
>
> Is that clear?

Yes. Your example provided here is clear and I think it should be added
to the cookbook.

>
> Regards,
>
> Peter
>

Thanks for your help Peter, and pardon my ignorance.
-Mark

From mictadlo at gmail.com Mon Apr 22 04:05:58 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 22 Apr 2013 14:05:58 +1000
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute 'alignments'
Message-ID:

Hi,
The following code (BioPython 1.61, Blast+ 2.2.26):

    from Bio.Blast import NCBIXML

    with open("test/X.xml") as bf:
        blast_records = NCBIXML.parse(bf)

        for blast_record in blast_records:
            for alignment in blast_records.alignments:
                for hsp in alignment.hsps:
                    if hsp.expect < 0.04:
                        print '****Alignment****'
                        print 'sequence:', alignment.title
                        print 'length:', alignment.length
                        print 'e value:', hsp.expect
                        print hsp.query[0:75] + '...'
                        print hsp.match[0:75] + '...'
                        print hsp.sbjct[0:75] + '...'

caused the following error:

    $ python parseBlastXML.py
    Traceback (most recent call last):
      File "parseBlastXML.py", line 8, in <module>
        for alignment in blast_records.alignments:
    AttributeError: 'generator' object has no attribute 'alignments'

What did I do wrong?

Thank you in advance.

Mic

From mictadlo at gmail.com Mon Apr 22 04:27:12 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 22 Apr 2013 14:27:12 +1000
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute 'alignments'
In-Reply-To:
References:
Message-ID:

My mistake. This is the solution:

    from Bio.Blast import NCBIXML

    with open("test/XA10m_v3.0.aa.snap_vs_uniref90.blastp.xml") as bf:
        blast_records = NCBIXML.parse(bf)

        for blast_record in blast_records:
            # blast_record (singular), not blast_records
            for alignment in blast_record.alignments:
                for hsp in alignment.hsps:
                    if hsp.expect < 0.04:
                        print '****Alignment****'
                        print 'sequence:', alignment.title
                        print 'length:', alignment.length
                        print 'e value:', hsp.expect
                        print hsp.query[0:75] + '...'
                        print hsp.match[0:75] + '...'
                        print hsp.sbjct[0:75] + '...'
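
The mix-up in this thread comes down to asking the iterator for an attribute that only its items have. A minimal sketch of that pattern follows, using only the standard library so it runs without BLAST output; `FakeRecord` and `fake_parse` are hypothetical stand-ins invented for illustration and are not part of Biopython.

```python
class FakeRecord(object):
    """Hypothetical stand-in for one parsed BLAST record."""

    def __init__(self, query, alignments):
        self.query = query
        self.alignments = alignments


def fake_parse(rows):
    """Hypothetical stand-in for a lazy parser: yields one record at a time."""
    for query, alignments in rows:
        yield FakeRecord(query, alignments)


rows = [("query1", ["hitA", "hitB"]), ("query2", [])]

# Wrong: the generator object itself has no 'alignments' attribute.
records = fake_parse(rows)
try:
    records.alignments
    error_seen = False
except AttributeError:
    error_seen = True

# Right: look up 'alignments' on each record the generator produces.
hits_per_query = {}
for record in fake_parse(rows):
    hits_per_query[record.query] = list(record.alignments)
```

A record with an empty `alignments` list simply contributes nothing to the inner loop, which is worth keeping in mind when a query prints with no hits under it.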

On Mon, Apr 22, 2013 at 2:05 PM, Mic wrote:
> Hi,
> The following code (BioPython 1.61, Blast+ 2.2.26):
>
> from Bio.Blast import NCBIXML
>
> with open("test/X.xml") as bf:
>     blast_records = NCBIXML.parse(bf)
>
>     for blast_record in blast_records:
>         for alignment in blast_records.alignments:
>             for hsp in alignment.hsps:
>                 if hsp.expect < 0.04:
>                     print '****Alignment****'
>                     print 'sequence:', alignment.title
>                     print 'length:', alignment.length
>                     print 'e value:', hsp.expect
>                     print hsp.query[0:75] + '...'
>                     print hsp.match[0:75] + '...'
>                     print hsp.sbjct[0:75] + '...'
>
> caused the following error:
> $ python parseBlastXML.py
> Traceback (most recent call last):
>   File "parseBlastXML.py", line 8, in <module>
>     for alignment in blast_records.alignments:
> AttributeError: 'generator' object has no attribute 'alignments'
>
> What did I do wrong?
>
> Thank you in advance.
>
> Mic
>
>
>

From p.j.a.cock at googlemail.com Mon Apr 22 08:08:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Apr 2013 09:08:50 +0100
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute 'alignments'
In-Reply-To:
References:
Message-ID:

On Monday, April 22, 2013, Mic wrote:
> My mistake. This is the solution
> from Bio.Blast import NCBIXML

Hi Mic,

Yep, you had two variables with very similar names. An easy mistake to
make - it's one of the things you'll learn to check with an
AttributeError: am I using the object I think I'm using?

Well done for solving it yourself, and thank you for posting the
solution here.
Regards,

Peter

From mictadlo at gmail.com Wed Apr 24 05:55:06 2013
From: mictadlo at gmail.com (Mic)
Date: Wed, 24 Apr 2013 15:55:06 +1000
Subject: [Biopython] NCBIXML: hit start and end
Message-ID:

Hi,
I have tried to rewrite the Perl code to Biopython

    sub retrieve {
        my $blast_report = $options->{'blast'};
        my $max_hits = $options->{'maxhits'};
        my $searchio = new Bio::SearchIO(
            -format => 'blast',
            -file => $blast_report
        );
        while ( my $result = $searchio->next_result ) {
            my $query_name = $result->query_name();
            my $count_unirefs = 0;
            my %hit_names_count = ();
            while ( my $hit = $result->next_hit ) {
                $count_unirefs++;
                my $count_hsp = 0;
                my @plushsps = ();
                my @minhsps = ();
                while ( my $hsp = $hit->next_hsp ) {
                    $count_hsp++;
                    my $query_start = $hsp->start('query');
                    my $query_end = $hsp->end('query');
                    my $hit_start = $hsp->start('hit');
                    my $hit_end = $hsp->end('hit');
                    my $strand = $hsp->strand();
                    my $hit_desc = $hit->description();
                    my @hsp_data = ($query_start, $query_end, $hit_start, $hit_end, $hit_desc);
                }
            }
        }
    }

Biopython code:
---------------

    from Bio import SeqIO
    from Bio.Blast import NCBIXML

    def retrieve_hits_data():
        max_hits = 5  # Change to args
        with open("test/x.xml") as bf:
            blast_records = NCBIXML.parse(bf)
            for blast_record in blast_records:
                print blast_record.query
                print
                for alignment in blast_record.alignments:
                    print 'sequence:', alignment.title
                    print alignment.hit_id
                    print alignment.hit_def
                    print 'length:', alignment.length
                    for hsp in alignment.hsps:
                        print "HSPs"
                        print "----"
                        print 'e value:', hsp.expect
                        #print hsp.query
                        #print hsp.match
                        #print hsp.sbjct
                        print hsp.score
                        print hsp.bits
                        print hsp.num_alignments
                        print hsp.identities
                        print hsp.positives
                        print hsp.gaps
                        print hsp.align_length
                        print hsp.strand
                        print hsp.frame
                        print hsp.query_start
                        print hsp.query_end
                        #print hsp.hit_start
                        #print hsp.hit_end
                        print hsp.sbjct_start
                        print hsp.sbjct_end

    retrieve_hits_data()

Output from Biopython code:

XA10_v3.0-snap.1
XA10_v3.0-snap.2
XA10_v3.0-snap.3
XA10_v3.0-snap.4
sequence: UniRef90_Q9FX16 F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH
UniRef90_Q9FX16
F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH
length: 308
HSPs
----
e value: 8.30308e-88
709.0
277.715
None
146
192
10
285
(None, None)
(0, 0)
10
290
8
286

How do I get hsp->start('hit') and hsp->end('hit') from the bioperl code
in Biopython?

Why does blast_record.query appear immediately in sequence and not after
the other two for loops have finished?

Thank you in advance.

Mic

From w.arindrarto at gmail.com Wed Apr 24 07:04:02 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 24 Apr 2013 09:04:02 +0200
Subject: [Biopython] NCBIXML: hit start and end
In-Reply-To:
References:
Message-ID:

Hi Mic,

> How do I get hsp->start('hit') and hsp->end('hit') from the bioperl code in
> Biopython?

With NCBIXML, they should be hsp.sbjct_start and hsp.sbjct_end respectively.

> Why does blast_record.query appears immediately in sequence and not after
> the other two for loops has finished?

It may be because the first three queries in your BLAST XML results
(XA10_v3.0-snap.{1..3}) do not have any hits and hsps. Check with your
XML results to be sure.

Hope that helps :),
Bow

From p.j.a.cock at googlemail.com Wed Apr 24 19:19:48 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 24 Apr 2013 20:19:48 +0100
Subject: [Biopython] Biopython GSoC 2013 applications via NESCent
Message-ID:

To all the Biopythoneers,

For the last few years Biopython has participated in the
Google Summer of Code (GSoC) program under the umbrella
of the Open Bioinformatics Foundation (OBF):
https://developers.google.com/open-source/soc/
https://github.com/OBF/GSoC

Unfortunately, like quite a few previously accepted organisations,
this year the OBF was not accepted. Google has kept the total about
the same year on year, so this is probably simply a slot rotation
to get some new organisations involved.
The good news (for those not following the Biopython-dev mailing list) is we have an alternative option agreed with the good people at NESCent, as we did back in 2009: http://biopython.org/wiki/Google_Summer_of_Code http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013 I'd like to thank Eric for co-ordinating this, and encourage any interested potential students to sign up to the Biopython development list and NESCent's Google+ group as soon as possible (if you haven't done so already): http://lists.open-bio.org/mailman/listinfo/biopython-dev https://plus.google.com/communities/105828320619238393015 Google are already accepting student applications, and the deadline is Friday 3 May. That doesn't leave very long for asking feedback and talking to potential mentors - which is essential for a competitive proposal. Thank you for your interest, Peter From nuin at genedrift.org Thu Apr 25 18:42:07 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Thu, 25 Apr 2013 14:42:07 -0400 Subject: [Biopython] PubmedCentral XML parsing Message-ID: Hi What would be the most direct way of parsing XML files downloaded from PubmedCentral ftp using BioPython? These are files that use the archivearticle.dtd and when parsed using non-DTD based code generate broken paragraphs on the body of the document due to < > between

items of the body. Thanks in advance Paulo From p.j.a.cock at googlemail.com Thu Apr 25 19:05:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 25 Apr 2013 20:05:32 +0100 Subject: [Biopython] PubmedCentral XML parsing In-Reply-To: References: Message-ID: On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin wrote: > Hi > > What would be the most direct way of parsing XML files downloaded from > PubmedCentral ftp using BioPython? These are files that use the > archivearticle.dtd and when parsed using non-DTD based code generate broken > paragraphs on the body of the document due to < > between

items of the > body. > > Thanks in advance > > Paulo The Bio.Entrez parser is DTD based, and might suit your needs. Peter From nuin at genedrift.org Thu Apr 25 19:16:49 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Thu, 25 Apr 2013 15:16:49 -0400 Subject: [Biopython] PubmedCentral XML parsing In-Reply-To: References: Message-ID: Hi Peter Thanks a lot. I am getting an error when trying to parse with Entrez.parse. I download the nxml file prior to parsing, using PMC's FTP server in order to avoid their bulk downloading restrictions. Anyway, the code I am using is quite simple (with ipython): In [1]: from Bio import Entrez In [2]: handle = open('nihms83342.nxml') In [3]: records = Entrez.parse(handle) In [4]: for i in records: ...: print i ...: --------------------------------------------------------------------------- NotXMLError Traceback (most recent call last) in () ----> 1 for i in records: 2 print i 3 /Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle) 229 # We did not see the initial 231 raise NotXMLError("XML declaration not found") 232 self.parser.Parse("", True) 233 self.parser = None NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format. And the file header is

Is there a different way of parsing this file? Thanks in advance Paulo On 2013-04-25, at 3:05 PM, Peter Cock wrote: > On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin wrote: >> Hi >> >> What would be the most direct way of parsing XML files downloaded from >> PubmedCentral ftp using BioPython? These are files that use the >> archivearticle.dtd and when parsed using non-DTD based code generate broken >> paragraphs on the body of the document due to < > between

items of the
>> body.
>>
>> Thanks in advance
>>
>> Paulo
>
> The Bio.Entrez parser is DTD based, and might suit your needs.
>
> Peter

From zhigang.wu at email.ucr.edu Sat Apr 27 00:52:19 2013
From: zhigang.wu at email.ucr.edu (Zhigang Wu)
Date: Fri, 26 Apr 2013 17:52:19 -0700
Subject: [Biopython] [Biopython-dev] Biopython GSoC 2013 applications via NESCent
In-Reply-To:
References:
Message-ID:

Hi Peter,

I am interested in implementing the lazy-loading sequence parsers.
I know the time is pretty tight for me to write a proposal on it. But even
if I cannot contribute under the umbrella of GSoC, and assuming nobody has
implemented it yet, I am still interested in implementing this (I just
want to have something nice on my CV while contributing to the open source
software community as well). At this moment, though, I don't have a very
clear picture of how to do it. Can you point me to somewhere I can start
to get a sense of how this can be implemented? As far as I know, samtools
(view) may use similar techniques.

Thanks.

Zhigang

On Wed, Apr 24, 2013 at 12:19 PM, Peter Cock wrote:
> To all the Biopythoneers,
>
> For the last few years Biopython has participated in the
> Google Summer of Code (GSoC) program under the umbrella
> of the Open Bioinformatics Foundation (OBF):
> https://developers.google.com/open-source/soc/
> https://github.com/OBF/GSoC
>
> Unfortunately, like quite a few previously accepted organisations,
> this year the OBF was not accepted. Google has kept the total about
> the same year on year, so this is probably simply a slot rotation
> to get some new organisations involved.
> > The good news (for those not following the Biopython-dev > mailing list) is we have an alternative option agreed with > the good people at NESCent, as we did back in 2009: > > http://biopython.org/wiki/Google_Summer_of_Code > http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013 > > I'd like to thank Eric for co-ordinating this, and encourage > any interested potential students to sign up to the Biopython > development list and NESCent's Google+ group as soon as > possible (if you haven't done so already): > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > https://plus.google.com/communities/105828320619238393015 > > Google are already accepting student applications, and the > deadline is Friday 3 May. That doesn't leave very long for > asking feedback and talking to potential mentors - which > is essential for a competitive proposal. > > Thank you for your interest, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mictadlo at gmail.com Mon Apr 29 01:13:49 2013 From: mictadlo at gmail.com (Mic) Date: Mon, 29 Apr 2013 11:13:49 +1000 Subject: [Biopython] gff installation failed with easy_install Message-ID: Hi, I have tried to install gff with easy_install, but I got the following error: $ easy_install --prefix=/home/mic/apps/pymodules -UZ https://github.com/chapmanb/bcbb/tree/master/gff Downloading https://github.com/chapmanb/bcbb/tree/master/gff error: Unexpected HTML page found at https://github.com/chapmanb/bcbb/tree/master/gff How is it possible to install gff? Thank you in advance. 
Mic From chapmanb at 50mail.com Mon Apr 29 10:34:42 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 29 Apr 2013 06:34:42 -0400 Subject: [Biopython] gff installation failed with easy_install In-Reply-To: <517DEECF.60705@bx.psu.edu> References: <517DEECF.60705@bx.psu.edu> Message-ID: <87bo8xhbgd.fsf@fastmail.fm> Mic; > I have tried to install gff with easy_install, but I got the following > error: > $ easy_install --prefix=/home/mic/apps/pymodules -UZ > https://github.com/chapmanb/bcbb/tree/master/gff > Downloading https://github.com/chapmanb/bcbb/tree/master/gff > error: Unexpected HTML page found at > https://github.com/chapmanb/bcbb/tree/master/gff > > How is it possible to install gff? I don't know of a way to install directly from git with subdirectories like that. You'd need to clone, then install with easy_install or pip: $ git clone git://github.com/chapmanb/bcbb.git $ easy_install bcbb/gff $ pip install bcbb/gff Apologies about the convoluted setup. Depending on what you're doing, you might want to have a look at gffutils: https://github.com/daler/gffutils We're working on rolling the functionality from the gff library into this so there'll be one place to work from for GFF in python. Hope this helps, Brad From p.j.a.cock at googlemail.com Mon Apr 29 11:23:16 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 29 Apr 2013 12:23:16 +0100 Subject: [Biopython] PubmedCentral XML parsing In-Reply-To: References: Message-ID: On Thu, Apr 25, 2013 at 8:16 PM, Paulo Nuin wrote: > Hi Peter > > Thanks a lot. I am getting an error when trying to parse with > Entrez.parse. I download the nxml file prior to parsing, using PMC's FTP > server in order to avoid their bulk downloading restrictions. 
Anyway, the > code I am using is quite simple (with ipython): > > In [1]: from Bio import Entrez > > In [2]: handle = open('nihms83342.nxml') > > In [3]: records = Entrez.parse(handle) > > In [4]: for i in records: > ...: print i > ...: > > --------------------------------------------------------------------------- > NotXMLError Traceback (most recent call > last) > in () > ----> 1 for i in records: > 2 print i > 3 > > /Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, > handle) > 229 # We did not see the initial declaration, so > 230 # probably the input data is not in XML > format. > --> 231 raise NotXMLError("XML declaration not > found") > 232 self.parser.Parse("", True) > 233 self.parser = None > > NotXMLError: Failed to parse the XML data (XML declaration not found). > Please make sure that the input data are in XML format. > > And the file header is > > > DTD v2.3 20070202//EN" "archivearticle.dtd"> >

xmlns:mml="http://www.w3.org/1998/Math/MathML"
> article-type="research-article" xml:lang="EN">
>
>
>
>
>
> Is there a different way of parsing this file?
>
> Thanks in advance
>
> Paulo

Hi Paulo,

The header you've shown here does not match the file you attached to
the bug report (where the first line is missing and there seem to be
no line breaks either):
https://redmine.open-bio.org/issues/3430

Where exactly did the nihms83342.nxml file come from? Is there a URL
we can download it from to check?

Thanks,

Peter

From mictadlo at gmail.com Tue Apr 30 03:13:19 2013
From: mictadlo at gmail.com (Mic)
Date: Tue, 30 Apr 2013 13:13:19 +1000
Subject: [Biopython] gff installation failed with easy_install
In-Reply-To: <87bo8xhbgd.fsf@fastmail.fm>
References: <517DEECF.60705@bx.psu.edu> <87bo8xhbgd.fsf@fastmail.fm>
Message-ID:

Thank you, it is working.

On Mon, Apr 29, 2013 at 8:34 PM, Brad Chapman wrote:
>
> Mic;
>
> > I have tried to install gff with easy_install, but I got the following
> > error:
> > $ easy_install --prefix=/home/mic/apps/pymodules -UZ
> > https://github.com/chapmanb/bcbb/tree/master/gff
> > Downloading https://github.com/chapmanb/bcbb/tree/master/gff
> > error: Unexpected HTML page found at
> > https://github.com/chapmanb/bcbb/tree/master/gff
> >
> > How is it possible to install gff?
>
> I don't know of a way to install directly from git with subdirectories
> like that. You'd need to clone, then install with easy_install or pip:
>
> $ git clone git://github.com/chapmanb/bcbb.git
> $ easy_install bcbb/gff
> $ pip install bcbb/gff
>
> Apologies about the convoluted setup. Depending on what you're doing,
> you might want to have a look at gffutils:
>
> https://github.com/daler/gffutils
>
> We're working on rolling the functionality from the gff library into
> this so there'll be one place to work from for GFF in python.
>
> Hope this helps,
> Brad
>

From mictadlo at gmail.com Tue Apr 30 04:12:34 2013
From: mictadlo at gmail.com (Mic)
Date: Tue, 30 Apr 2013 14:12:34 +1000
Subject: [Biopython] GFF parsing with biopython
Message-ID:

Hi,

I have the following GFF file from SNAP:

X1  SNAP  Einit  2579  2712   -3.221  +  .  X1-snap.1
X1  SNAP  Exon   2813  2945    4.836  +  .  X1-snap.1
X1  SNAP  Eterm  3013  3033   10.467  +  .  X1-snap.1
X1  SNAP  Esngl  3457  3702  -17.856  +  .  X1-snap.2
X1  SNAP  Einit  4901  4974   -4.954  +  .  X1-snap.3
X1  SNAP  Eterm  5021  5150   14.231  +  .  X1-snap.3
X1  SNAP  Einit  6245  7325   -1.525  -  .  X1-snap.4
X1  SNAP  Eterm  5974  6008    5.398  -  .  X1-snap.4

With the code below I have tried to parse the above GFF file:

from BCBio import GFF
from pprint import pprint
from BCBio.GFF import GFFExaminer

def retrieve_pred_genes_data():
    with open("test/X1_small.snap.gff") as sf:
        #examiner = GFFExaminer()
        #pprint(examiner.available_limits(sf))
        for rec in GFF.parse(sf):
            pprint(rec.id)
            pprint(rec.description)
            pprint(rec.name)
            pprint(rec.features)
            #pprint(rec.type) #'SeqRecord' object has no attribute
            #pprint(rec.ref) #'SeqRecord' object has no attribute
            #pprint(rec.ref_db) #'SeqRecord' object has no attribute
            #pprint(rec.location) #'SeqRecord' object has no attribute
            #pprint(rec.location_operator) #'SeqRecord' object has no attribute
            #pprint(rec.strand) #'SeqRecord' object has no attribute
            #pprint(rec.sub_features) #'SeqRecord' object has no attribute

retrieve_pred_genes_data()

and got the following output:

'X1'
''
''
[SeqFeature(FeatureLocation(ExactPosition(2578), ExactPosition(2712), strand=1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(2812), ExactPosition(2945), strand=1), type='Exon'),
 SeqFeature(FeatureLocation(ExactPosition(3012), ExactPosition(3033), strand=1), type='Eterm'),
 SeqFeature(FeatureLocation(ExactPosition(3456), ExactPosition(3702), strand=1), type='Esngl'),
 SeqFeature(FeatureLocation(ExactPosition(4900), ExactPosition(4974), strand=1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(5020), ExactPosition(5150), strand=1), type='Eterm'),
 SeqFeature(FeatureLocation(ExactPosition(6160), ExactPosition(7325), strand=-1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(5973), ExactPosition(6008), strand=-1), type='Eterm')]

and with GFFExaminer I got this:

{'gff_id': {('X1',): 8},
 'gff_source': {('SNAP',): 8},
 'gff_source_type': {('SNAP', 'Einit'): 3,
                     ('SNAP', 'Esngl'): 1,
                     ('SNAP', 'Eterm'): 3,
                     ('SNAP', 'Exon'): 1},
 'gff_type': {('Einit',): 3, ('Esngl',): 1, ('Eterm',): 3, ('Exon',): 1}}

I found these examples (
https://github.com/patena/jonikaslab-mutant-pools/blob/master/notes_on_GFF_parsing.txt),
but I got these kinds of errors:

#pprint(rec.type) #'SeqRecord' object has no attribute
#pprint(rec.ref) #'SeqRecord' object has no attribute
#pprint(rec.ref_db) #'SeqRecord' object has no attribute
#pprint(rec.location) #'SeqRecord' object has no attribute
#pprint(rec.location_operator) #'SeqRecord' object has no attribute
#pprint(rec.strand) #'SeqRecord' object has no attribute
#pprint(rec.sub_features) #'SeqRecord' object has no attribute

What did I do wrong, and how is it possible to access all fields in the
above GFF file?

Thank you in advance.

Mic
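
A note on the question above: as the printed output shows, the parser yields one SeqRecord per sequence ID (here 'X1'), and the per-line fields such as type, location and strand sit on the SeqFeature objects inside rec.features, not on the record itself. To see all nine columns of each line without any parser library, a minimal plain-Python sketch might look like the following. The names `GffRow`, `parse_gff_rows` and `group_by_gene` are hypothetical and introduced only for illustration; splitting on whitespace is an assumption that fits this sample but not GFF in general, which is tab-separated.

```python
from collections import namedtuple

# Hypothetical row type for illustration; not part of Biopython or BCBio.
GffRow = namedtuple(
    "GffRow", "seqid source type start end score strand phase group"
)

# The SNAP output from the message, one feature per line.
SNAP_GFF = """\
X1 SNAP Einit 2579 2712 -3.221 + . X1-snap.1
X1 SNAP Exon 2813 2945 4.836 + . X1-snap.1
X1 SNAP Eterm 3013 3033 10.467 + . X1-snap.1
X1 SNAP Esngl 3457 3702 -17.856 + . X1-snap.2
X1 SNAP Einit 4901 4974 -4.954 + . X1-snap.3
X1 SNAP Eterm 5021 5150 14.231 + . X1-snap.3
X1 SNAP Einit 6245 7325 -1.525 - . X1-snap.4
X1 SNAP Eterm 5974 6008 5.398 - . X1-snap.4
"""


def parse_gff_rows(text):
    """Yield one GffRow per non-empty, non-comment line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: no field contains spaces, so whitespace splitting
        # is safe here. Real GFF files should be split on tabs.
        fields = line.split(None, 8)
        seqid, source, ftype, start, end, score, strand, phase, group = fields
        # Coordinates are kept 1-based, exactly as written in the file.
        yield GffRow(seqid, source, ftype, int(start), int(end),
                     float(score), strand, phase, group)


def group_by_gene(rows):
    """Collect rows that share the same last-column gene name."""
    genes = {}
    for row in rows:
        genes.setdefault(row.group, []).append(row)
    return genes


rows = list(parse_gff_rows(SNAP_GFF))
genes = group_by_gene(rows)
for name in sorted(genes):
    exons = genes[name]
    print(name, exons[0].strand, [(e.type, e.start, e.end) for e in exons])
```

The sketch keeps the file's 1-based start coordinates, whereas the SeqFeature output above reports 0-based starts (2579 in the file appears as ExactPosition(2578)).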