From markbudde at gmail.com  Mon Apr  1 14:41:43 2013
From: markbudde at gmail.com (Mark Budde)
Date: Mon, 1 Apr 2013 11:41:43 -0700
Subject: [Biopython] New to BP. Looking for closely spaced genes
Message-ID: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>

Hi,
Before I dive too far into BioPython, I'd like to get some input on whether
BioPython is an appropriate tool for my task.

I would like to look at the human genome ORF structure and identify regions
where ORFs are closely spaced but differentially regulated, and also
identify whether the ORFs are facing the same direction or opposing
directions. To do this, I assume I would first download the annotated
genome and write a script in BioPython annotating how far each ORF is from
its neighbors, what the orientation is, and store the result in a
dictionary. Then I would download some expression data sets and add this
data to the dictionary. Then I would write some algorithm comparing
gene distance, orientation and expression correlation to generate a list of
candidate ORF pairs which fit my criteria.

My question is, is BioPython a reasonable tool to accomplish this, or is it
going to be way too slow, with some alternative package better suited
for my task?
Thanks,
Mark Budde

From dtomso at agbiome.com  Mon Apr  1 15:09:39 2013
From: dtomso at agbiome.com (Dan Tomso)
Date: Mon, 1 Apr 2013 19:09:39 +0000
Subject: [Biopython] New to BP. Looking for closely spaced genes
In-Reply-To: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>
References: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>
Message-ID: <0bdbbf85a7284f21ad6d03aec6ac55cb@SN2PR03MB015.namprd03.prod.outlook.com>

Hi, Mark.

I think BioPython will have the tools you need to do the mechanical handling of sequences.  You might want to contemplate various strategies to do the positional comparisons and data overlays.  For example, if I were approaching this, I would start building position tables for the various content in SQL and then do the set/join/overlap work there.  
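
A minimal sketch of the position-table idea, using Python's built-in sqlite3 module. The table name `genes`, its columns, the example rows and the 1000-base threshold are all hypothetical and chosen only for illustration; a real table would be filled from parsed annotation. The self-join reports every non-overlapping pair within the threshold, not only immediate neighbours.

```python
import sqlite3

# Hypothetical gene positions: (name, chromosome, start, end, strand).
GENES = [
    ("geneA", "chr1", 1000, 2000, 1),
    ("geneB", "chr1", 2300, 3100, -1),
    ("geneC", "chr1", 9000, 9800, 1),
    ("geneD", "chr2", 500, 1500, 1),
]


def close_pairs(genes, max_gap):
    """Return (upstream, downstream, gap, same_strand) for non-overlapping
    gene pairs on the same chromosome separated by at most max_gap bases."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genes (name TEXT, chrom TEXT, start INTEGER, "
        "end_pos INTEGER, strand INTEGER)"
    )
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?, ?)", genes)
    rows = conn.execute(
        """
        SELECT a.name, b.name, b.start - a.end_pos AS gap,
               a.strand = b.strand AS same_strand
        FROM genes AS a
        JOIN genes AS b
          ON a.chrom = b.chrom AND b.start >= a.end_pos
        WHERE b.start - a.end_pos <= ?
        ORDER BY a.chrom, a.start
        """,
        (max_gap,),
    ).fetchall()
    conn.close()
    return rows


print(close_pairs(GENES, max_gap=1000))
```

SQLite returns the strand comparison as 0 or 1, so the last column can be used directly as a same-orientation flag.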

But to re-answer your primary question--yes, you can get the sequence and features parsed in BioPython with reasonable convenience.

Best regards,
Dan Tomso

_______________________________________________
Biopython mailing list  -  Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From jordan.r.willis at Vanderbilt.Edu  Tue Apr  2 00:40:36 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Tue, 2 Apr 2013 04:40:36 +0000
Subject: [Biopython] Superimposer troubles
Message-ID: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>


Hello List,


I'm having trouble working through some issues with the Superimposer for all-atom superpositions. We work on protein design, and our final PDB files often differ from our input in atom count and sometimes in composition. I'm a big fan of the Superimposer, so we have implemented it like this:

from Bio.PDB import PDBParser, Superimposer

p = PDBParser()
native_pdb = p.get_structure("input", "input.pdb")
designed_pdb = p.get_structure("output", "output.pdb")


native_ca_atoms = []
native_all_atoms = []
designed_ca_atoms = []
designed_all_atoms = []
# Pair residues by position; zip() stops at the shorter sequence.
for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()):
    native_ca_atoms.append(native_residue['CA'])
    designed_ca_atoms.append(designed_residue['CA'])
    # Pair atoms by position within each residue pair.
    for (native_atom, designed_atom) in zip(native_residue.get_list(), designed_residue.get_list()):
        native_all_atoms.append(native_atom)
        designed_all_atoms.append(designed_atom)


superpose_ca = Superimposer()
superpose_all = Superimposer()

superpose_ca.set(native_ca_atoms, designed_ca_atoms)
superpose_ca.apply(designed_pdb)
ca_rms = my_spiffy_rms_calculator(native_ca_atoms, designed_ca_atoms)


superpose_all.set(native_all_atoms, designed_all_atoms)
superpose_all.apply(designed_pdb)
all_rms = my_spiffy_rms_calculator(native_all_atoms, designed_all_atoms)


For the CA atoms it's not really a big deal, since everything we design has a CA atom. However, when we go to all atoms, it turns out that the designed residue and the native residue can be different, leading to a different number of atoms. I didn't realize it, but the zip function was truncating the two lists to the length of the shorter one, without necessarily matching up the atoms. It would just hack off some part of the larger list! This way, the Superimposer was never failing, because it always received the same number of atoms in each list. Is the Superimposer smart enough to just minimize the RMSD no matter how the lists are input, no matter what order? For instance, if I put the same arginine's atoms backwards in one list and forwards in the other, would it still be able to give a 0.0 RMSD?

Thank you for your feedback,
Jordan

PS. Does the superimposer.rms method give back the RMSD of whatever atoms you put into it? Or is it always the CA atoms?


From anaryin at gmail.com  Tue Apr  2 03:07:08 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 2 Apr 2013 09:07:08 +0200
Subject: [Biopython] Superimposer troubles
In-Reply-To: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>
References: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>
Message-ID: <CAJ9sUYPCXVn1iP_cw5X+-x3bDjEQZ913EQuHvRn08o_+41_-nQ@mail.gmail.com>

Hey Jordan,

Without checking the code, I'd say order matters. The two sequences of
atoms will be aligned per position. If you have ca, c, n, o or ca, n, o, c
you'll get different results.

Try a simple glycine and switch the order of the atoms. I think it should
work like this, but again, not sure.

As for the rms value, it depends on the input. If it's ca only, you get ca
rmsd, etc.
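
A minimal sketch of one way to keep the two atom lists in step, assuming atoms are paired by name within each residue pair rather than by position. The names `matched_atom_pairs` and `FakeAtom` are hypothetical; the only interface assumed is that iterating a residue yields atoms with a `get_id()` method returning the atom name, as Bio.PDB residues and atoms do. Atoms present in only one residue are dropped, so both lists end up the same length and in the same order.

```python
def matched_atom_pairs(native_residue, designed_residue):
    """Pair atoms that share a name in both residues, in native order.

    Each residue is any iterable of atom objects exposing get_id().
    """
    designed_by_name = {atom.get_id(): atom for atom in designed_residue}
    pairs = []
    for native_atom in native_residue:
        partner = designed_by_name.get(native_atom.get_id())
        if partner is not None:
            pairs.append((native_atom, partner))
    return pairs


class FakeAtom:
    """Stand-in for a Bio.PDB atom so the sketch runs without a PDB file."""

    def __init__(self, name):
        self.name = name

    def get_id(self):
        return self.name


glycine = [FakeAtom(n) for n in ("N", "CA", "C", "O")]
# Different atom order and an extra CB atom.
alanine = [FakeAtom(n) for n in ("CA", "N", "O", "C", "CB")]

names = [(a.get_id(), b.get_id()) for a, b in matched_atom_pairs(glycine, alanine)]
print(names)
```

The two halves of each pair could then be appended to the fixed and moving lists passed to the Superimposer.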

Cheers,

João


From p.j.a.cock at googlemail.com  Tue Apr  2 05:38:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Apr 2013 10:38:24 +0100
Subject: [Biopython] Superimposer troubles
In-Reply-To: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>
References: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>
Message-ID: <CAKVJ-_4JnmmahzwMnDYtrps8C5B8=hMtB4VUGXkcAMg_gT3AVA@mail.gmail.com>

On Tue, Apr 2, 2013 at 5:40 AM, Willis, Jordan R
<jordan.r.willis at vanderbilt.edu> wrote:
>
> Hello List,
>
>
> I'm having trouble working through some issues with the Superimposer for all-atom
> superpositions. We work on protein design, and our final PDB files often differ
> from our input in atom count and sometimes in composition. I'm a big fan
> of the Superimposer, so we have implemented it like this:
>
> p = PDBParser()
> native_pdb = p.get_structure("input","input.pdb")
> designed_pdb = p.get_structure("output","output.pdb")
>
>
> native_ca_atoms = []
> native_all_atoms = []
> designed_ca_atoms = []
> designed_all_atoms = []
> for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()):
>         native_ca_atoms.append(native_residue['CA'])
>         designed_ca_atoms.append(designed_residue['CA'])
>         ...
>
> For the CA atoms it's not really a big deal since everything we design
> has a CA atom. However when we go into all atoms, it turns out that the
> designed residue and the native residue can be different, thus leading to a
> different number of atoms. I didn't realize, but the zip function was making
> these two lists as big as the smallest list and not necessarily matching up
> the atoms. It would just hack off some part of the larger list!  This way,
> the superimposer was never failing because it always had an exact
> match of atoms.

How about using izip_longest (from itertools) rather than zip? That
should give a clear error when the residue counts are different.

In general however, dealing with similar but different chains will
require some sort of pairwise alignment and/or restricting to just
backbone atoms (like CA, C-alpha).
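
A minimal sketch of the strict pairing idea, assuming Python 3, where izip_longest is named zip_longest. The helper name `strict_pairs` and the sentinel `_MISSING` are hypothetical. Rather than relying on the padding value to trigger an error later on, the helper raises as soon as one input runs out before the other.

```python
from itertools import zip_longest  # named izip_longest on Python 2

_MISSING = object()  # sentinel that cannot appear in real data


def strict_pairs(first, second):
    """Yield (a, b) pairs, raising ValueError if one input runs out early."""
    for a, b in zip_longest(first, second, fillvalue=_MISSING):
        if a is _MISSING or b is _MISSING:
            raise ValueError("inputs have different lengths")
        yield a, b


print(list(strict_pairs("ABC", "xyz")))
try:
    list(strict_pairs("ABC", "xy"))
except ValueError as err:
    print("error:", err)
```

The same helper could wrap both the residue iterators and the per-residue atom lists in place of zip.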

Peter

From p.j.a.cock at googlemail.com  Tue Apr  2 12:33:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Apr 2013 17:33:53 +0100
Subject: [Biopython] New to BP. Looking for closely spaced genes
In-Reply-To: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>
References: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>
Message-ID: <CAKVJ-_5WSp82MG988UEBJ7YbksU0PKjizOAMP2t-DemhNJReTA@mail.gmail.com>

Hi Mark,

That sounds very doable with Biopython parsing GenBank format
chromosomes downloaded from the NCBI/EMBL/DDBJ. I did
something similar to look at overlaps and gaps between genes of
bacteria some years back - also using the Biopython GenBank
parser, e.g. http://mbe.oxfordjournals.org/cgi/content/abstract/msp302

In your case with humans there'll be lots of intron/exon structure
(join locations in GenBank) so I'd recommend trying the current
code from git (which will become Biopython 1.62) where this has
been re-factored to hopefully make joins much easier than before.
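
A minimal sketch of the gap and orientation step once gene coordinates are in hand. It assumes each gene has already been reduced to a (name, start, end, strand) tuple; with Biopython these values would typically come from each gene feature's location (its start, end and strand). The function name `adjacent_gene_pairs` and the example coordinates are hypothetical.

```python
def adjacent_gene_pairs(features):
    """Describe each pair of adjacent genes on one chromosome.

    features: iterable of (name, start, end, strand), strand being +1 or -1.
    Returns a list of (left_name, right_name, gap, orientation); a negative
    gap means the two genes overlap.
    """
    ordered = sorted(features, key=lambda f: (f[1], f[2]))
    result = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right[1] - left[2]
        if left[3] == right[3]:
            orientation = "tandem"       # same strand
        elif left[3] == 1:
            orientation = "convergent"   # forward gene followed by reverse gene
        else:
            orientation = "divergent"    # reverse gene followed by forward gene
        result.append((left[0], right[0], gap, orientation))
    return result


genes = [
    ("g2", 2300, 3100, -1),
    ("g1", 1000, 2000, 1),
    ("g3", 3050, 4000, 1),
]
print(adjacent_gene_pairs(genes))
```

Expression data could then be joined on the gene names to filter these pairs by correlation.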

Regards,

Peter

From linxzh1989 at gmail.com  Fri Apr  5 22:53:49 2013
From: linxzh1989 at gmail.com (林行众)
Date: Sat, 6 Apr 2013 10:53:49 +0800
Subject: [Biopython] MUSCLE for alignment
Message-ID: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>

Hi all !
I have a seqdump.fasta file:
>lcl|24977
TGAGAAAGACTTGAGAGGACA

>lcl|24977:8-21
GAGATGACTTAGAGGACA

I want to use the MUSCLE wrapper in Biopython to align the two sequences
and write the alignment result to an existing FASTA file.

>>> from Bio.Align.Applications import MuscleCommandline
>>> mcline = MuscleCommandline(input='seqdump.fasta', out='result.fasta')

But I cannot find anything in result.fasta after I run the command.
Am I missing something needed to get the result?

regards
Lin

From p.j.a.cock at googlemail.com  Sat Apr  6 04:58:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 09:58:30 +0100
Subject: [Biopython] MUSCLE for alignment
In-Reply-To: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>
References: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>
Message-ID: <CAKVJ-_55yZuX5BKansnu0HfPTLHoj92WE9J0g7koFd2e+MpmjQ@mail.gmail.com>

Hi Lin,

In your example you've not yet called MUSCLE:

# Load the library:
from Bio.Align.Applications import MuscleCommandline

# Create the command line wrapper instance:
mcline = MuscleCommandline(input='seqdump.fasta', out='result.fasta')

# Optionally show what command it would run:
print(mcline)

# Actually run the command:
stdout, stderr = mcline()

Does that help? The main Tutorial does have some more
detailed examples.

Peter


From p.j.a.cock at googlemail.com  Sat Apr  6 07:41:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 12:41:33 +0100
Subject: [Biopython] MUSCLE for alignment
In-Reply-To: <CALzRd7MYYmQmCU4qJsy4iGnpvcC5HzNCfxs87QP4j9vpCb5okg@mail.gmail.com>
References: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>
	<CAKVJ-_55yZuX5BKansnu0HfPTLHoj92WE9J0g7koFd2e+MpmjQ@mail.gmail.com>
	<CALzRd7MYYmQmCU4qJsy4iGnpvcC5HzNCfxs87QP4j9vpCb5okg@mail.gmail.com>
Message-ID: <CAKVJ-_7yLTUTr9_OUjkx-NLcN+D5kT4XU+u96hYY5+kvja-3zA@mail.gmail.com>

On Sat, Apr 6, 2013 at 12:18 PM, 林行众 <linxzh1989 at gmail.com> wrote:
> Thank you, Peter!
> It really helps me.
> If I do not capture the output with: stdout, stderr = mcline()
> the alignment will be written to stdout instead of the output file.
> Is that correct?

MUSCLE will by default write the alignment to stdout, but you
used the out argument to specify an output filename instead.
In this case stdout will probably be empty.

There are some stdout examples using MUSCLE in the
Biopython Tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Peter

P.S. Please CC the mailing list.


From linxzh1989 at gmail.com  Sat Apr  6 09:57:31 2013
From: linxzh1989 at gmail.com (林行众)
Date: Sat, 6 Apr 2013 21:57:31 +0800
Subject: [Biopython] MUSCLE for alignment
In-Reply-To: <CAKVJ-_7yLTUTr9_OUjkx-NLcN+D5kT4XU+u96hYY5+kvja-3zA@mail.gmail.com>
References: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>
	<CAKVJ-_55yZuX5BKansnu0HfPTLHoj92WE9J0g7koFd2e+MpmjQ@mail.gmail.com>
	<CALzRd7MYYmQmCU4qJsy4iGnpvcC5HzNCfxs87QP4j9vpCb5okg@mail.gmail.com>
	<CAKVJ-_7yLTUTr9_OUjkx-NLcN+D5kT4XU+u96hYY5+kvja-3zA@mail.gmail.com>
Message-ID: <CALzRd7O2kvahTjF01hQ3OmTywGWgRnXqEKHoKcRtKhN1SbwX9Q@mail.gmail.com>

Thank you for your advice.
I will CC the mailing list.

regards



From nicolas.joannin at gmail.com  Sat Apr  6 11:31:40 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Sun, 7 Apr 2013 00:31:40 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
Message-ID: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>

Hello everyone,

I'm having a problem installing biopython with Python 3.3.1rc1...
Basically, I get several tests failing (in addition to a lot of warnings).

I don't think the failed tests will be a problem for my work, however, I
thought you'd want to have a look... Attached is the output of python3
setup.py test.

Also, if you think I shouldn't use biopython without having these failed
tests fixed first, please let me know!

Best regards,
Nicolas
-------------- next part --------------
Nicolass-MacBook-Air:biopython NicojoAir11$ python3 setup.py test
WARNING - Biopython does not yet officially support Python 3
The 2to3 library will be called automatically now,
and the converted files cached under build/py3.3
Processing Bio
Processing BioSQL
Processing Tests
Processing Scripts
Processing Doc
Python 2to3 processing done.
running test
Python version: 3.3.1rc1 (v3.3.1rc1:92c2cfb92405, Mar 25 2013, 00:54:04) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Operating system: posix darwin
test_Ace ... ok
test_AlignIO ... ok
test_AlignIO_FastaIO ... ok
test_AlignIO_convert ... ok
test_Application ... ok
test_BioSQL_MySQLdb ... skipping. Install MySQLdb if you want to use mysql with BioSQL 
test_BioSQL_psycopg2 ... skipping. Connection failed, check settings if you plan to use BioSQL: FATAL:  role "postgres" does not exist

test_BioSQL_sqlite3 ... ok
test_CAPS ... ok
test_Chi2 ... ok
test_ClustalOmega_tool ... skipping. Install clustalo if you want to use Clustal Omega from Biopython.
test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use it from Biopython.
test_Cluster ... ok
test_CodonTable ... ok
test_CodonUsage ... ok
test_ColorSpiral ... skipping. Install reportlab if you want to use Bio.Graphics.
test_Compass ... ok
test_Crystal ... ok
test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper.
test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL.
test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss.
test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools.
test_EmbossPrimer ... ok
test_Entrez ... ok
test_Entrez_online ... FAIL
test_Enzyme ... ok
test_FSSP ... ok
test_Fasttree_tool ... skipping. Install fasttree and correctly set the file path to the program if you want to use it from Biopython.
test_File ... ok
test_GACrossover ... ok
test_GAMutation ... ok
test_GAOrganism ... ok
test_GAQueens ... ok
test_GARepair ... ok
test_GASelection ... ok
test_GenBank ... ok
test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics.
test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics.
test_HMMCasino ... ok
test_HMMGeneral ... ok
test_HotRand ... ok
test_KDTree ... ok
test_KEGG ... ok
test_KeyWList ... ok
test_Location ... ok
test_LogisticRegression ... ok
test_MMCIF ... skipping. C extension MMCIFlex not installed.
test_Mafft_tool ... ok
test_MarkovModel ... ok
test_Medline ... ok
test_Motif ... ok
test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper.
test_NCBIStandalone ... ok
test_NCBITextParser ... ok
test_NCBIXML ... ok
test_NCBI_BLAST_tools ... ok
test_NCBI_qblast ... ok
test_NNExclusiveOr ... ok
test_NNGene ... ok
test_NNGeneral ... ok
test_Nexus ... ok
test_PAML_baseml ... ok
test_PAML_codeml ... ok
test_PAML_tools ... skipping. Install PAML if you want to use the Bio.Phylo.PAML wrapper.
test_PAML_yn00 ... ok
test_PDB ... ok
test_PDB_KDTree ... ok
test_ParserSupport ... ok
test_Pathway ... ok
test_Phd ... ok
test_Phylo ... ok
test_PhyloXML ... ok
test_Phylo_CDAO ... skipping. Install the librdf Python bindings if you want to use the CDAO tree format.
test_Phylo_NeXML ... ./test_Phylo_NeXML.py:87: ResourceWarning: unclosed file <_io.BufferedReader name='/var/folders/9w/kkwnss4n52bbc3crhctbhfnh0000gn/T/tmpf9__6a'>
  t2 = next(NeXMLIO.Parser(open(DUMMY, 'rb')).parse())
ok
test_Phylo_depend ... skipping. Install matplotlib if you want to use Bio.Phylo._utils.
test_PopGen_DFDist ... skipping. Install Dfdist, Ddatacal, pv2 and cplot2 if you want to use DFDist with Bio.PopGen.FDist.
test_PopGen_FDist ... skipping. Install fdist2, datacal, pv and cplot if you want to use FDist2 with Bio.PopGen.FDist.
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop.
test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop.
test_PopGen_GenePop_nodepend ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal.
test_PopGen_SimCoal_nodepend ... ok
test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper.
test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper.
test_ProtParam ... ok
test_Restriction ... ok
test_SCOP_Astral ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='SCOP/scopseq-test/astral-scopdom-seqres-all-test.fa' mode='r' encoding='UTF-8'>
  for record in sequences:
ok
test_SCOP_Cla ... ok
test_SCOP_Des ... ok
test_SCOP_Dom ... ok
test_SCOP_Hie ... ok
test_SCOP_Raf ... ok
test_SCOP_Residues ... ok
test_SCOP_Scop ... ok
test_SCOP_online ... ok
test_SVDSuperimposer ... ok
test_SearchIO_blast_tab ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release.
  BiopythonExperimentalWarning)
ok
test_SearchIO_blast_tab_index ... ok
test_SearchIO_blast_text ... ok
test_SearchIO_blast_xml ... ok
test_SearchIO_blast_xml_index ... ok
test_SearchIO_blat_psl ... ok
test_SearchIO_blat_psl_index ... ok
test_SearchIO_exonerate ... ok
test_SearchIO_exonerate_text_index ... ok
test_SearchIO_exonerate_vulgar_index ... ok
test_SearchIO_fasta_m10 ... ok
test_SearchIO_fasta_m10_index ... ok
test_SearchIO_hmmer2_text ... ok
test_SearchIO_hmmer2_text_index ... ok
test_SearchIO_hmmer3_domtab ... ok
test_SearchIO_hmmer3_domtab_index ... ok
test_SearchIO_hmmer3_tab ... ok
test_SearchIO_hmmer3_tab_index ... ok
test_SearchIO_hmmer3_text ... ok
test_SearchIO_hmmer3_text_index ... ok
test_SearchIO_model ... ok
test_SearchIO_write ... ok
test_SeqIO ... ok
test_SeqIO_AbiIO ... ok
test_SeqIO_FastaIO ... ./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids))
./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  default = list(SeqIO.parse(open(filename), "fasta", alphabet))
./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'>
  re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids))
./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'>
  default = list(SeqIO.parse(open(filename), "fasta", alphabet))
./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/fa01' mode='r' encoding='UTF-8'>
  re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids))
./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/fa01' mode='r' encoding='UTF-8'>
  default = list(SeqIO.parse(open(filename), "fasta", alphabet))
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/centaurea.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/centaurea.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/elderberry.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/elderberry.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f001' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f001' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lavender.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lavender.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lupine.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lupine.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/phlox.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/phlox.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/sweetpea.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/sweetpea.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/wisteria.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/wisteria.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/aster.pro' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/aster.pro' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/loveliesbleeding.pro' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/loveliesbleeding.pro' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rose.pro' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rose.pro' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rosemary.pro' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rosemary.pro' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
ok
test_SeqIO_Insdc ... ok
test_SeqIO_PdbIO ... ok
test_SeqIO_QualityIO ... ./test_SeqIO_QualityIO.py:348: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  records1 = list(SeqIO.parse(open("Quality/example.fasta"),"fasta"))
./test_SeqIO_QualityIO.py:349: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq"))
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:357: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  self.assertEqual(h.getvalue(),open("Quality/example.fasta").read())
./test_SeqIO_QualityIO.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  open("Quality/example.qual")))
./test_SeqIO_QualityIO.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  open("Quality/example.qual")))
./test_SeqIO_QualityIO.py:329: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq"))
./test_SeqIO_QualityIO.py:334: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  records1 = list(SeqIO.parse(open("Quality/example.qual"),"qual"))
./test_SeqIO_QualityIO.py:335: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq"))
./test_SeqIO_QualityIO.py:344: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  self.assertEqual(h.getvalue(),open("Quality/example.qual").read())
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_original_illumina.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_original_solexa.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads_no_trim.fasta' mode='r' encoding='UTF-8'>
  wanted = list(SeqIO.parse(open(out_name), format))
./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads_no_trim.qual' mode='r' encoding='UTF-8'>
  wanted = list(SeqIO.parse(open(out_name), format))
./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads.fasta' mode='r' encoding='UTF-8'>
  wanted = list(SeqIO.parse(open(out_name), format))
./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads.qual' mode='r' encoding='UTF-8'>
  wanted = list(SeqIO.parse(open(out_name), format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_at_end.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_at_start.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_in_middle.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_index_at_start.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_index_in_middle.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_no_manifest.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/greek.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_faked.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/paired.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_93.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_faked.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_example.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/tricky.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
ok
test_SeqIO_SeqXML ... ./test_SeqIO_SeqXML.py:141: DeprecationWarning: Please use assertEqual instead.
  self.assertEquals(len(read1_records),len(read2_records))
ok
test_SeqIO_convert ... ok
test_SeqIO_features ... ./test_SeqIO_features.py:190: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/iro.gb' mode='rU' encoding='UTF-8'>
  gbk_template = open("GenBank/iro.gb", "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqFeature.py:155: BiopythonDeprecationWarning: Rather than sub_features, use a CompoundFeatureLocation
  BiopythonDeprecationWarning)
./test_SeqIO_features.py:988: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:989: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'>
  gb_cds = list(SeqIO.parse(open(self.gb_filename),"genbank-cds"))
./test_SeqIO_features.py:990: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.faa' mode='r' encoding='UTF-8'>
  fasta = list(SeqIO.parse(open(self.faa_filename),"fasta"))
./test_SeqIO_features.py:988: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:989: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_cds = list(SeqIO.parse(open(self.gb_filename),"genbank-cds"))
./test_SeqIO_features.py:990: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.faa' mode='r' encoding='UTF-8'>
  fasta = list(SeqIO.parse(open(self.faa_filename),"fasta"))
./test_SeqIO_features.py:1070: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:1072: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.ffn' mode='r' encoding='UTF-8'>
  fa_records = list(SeqIO.parse(open(self.ffn_filename),"fasta"))
./test_SeqIO_features.py:1023: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:1024: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'>
  embl_record = SeqIO.read(open(self.embl_filename),"embl")
./test_SeqIO_features.py:1054: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:1055: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.fna' mode='r' encoding='UTF-8'>
  fa_record = SeqIO.read(open(self.fna_filename),"fasta")
./test_SeqIO_features.py:1059: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'>
  embl_record = SeqIO.read(open(self.embl_filename),"embl")
./test_SeqIO_features.py:1036: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.faa' mode='r' encoding='UTF-8'>
  faa_records = list(SeqIO.parse(open(self.faa_filename),"fasta"))
./test_SeqIO_features.py:1037: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.ffn' mode='r' encoding='UTF-8'>
  ffn_records = list(SeqIO.parse(open(self.ffn_filename),"fasta"))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AAA03323.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/DD231055_edited.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/Human_contigs.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NT_019265.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/SC10H5.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/TRBG361.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/U87107.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/arab1.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/blank_seq.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/cor6_6.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/dbsource_wrap.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/extra_keywords.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/gbvrl1_start.seq' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/noref.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/one_of.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/origin_line.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/pri1.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/protein_refseq.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/protein_refseq2.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
ok
test_SeqIO_index ... FAIL
test_SeqIO_online ... ok
test_SeqIO_write ... ok
test_SeqRecord ... ok
test_SeqUtils ... ./test_SeqUtils.py:71: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(dna_genbank_filename), "genbank")
./test_SeqUtils.py:55: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'>
  seq_records = list(SeqIO.parse(open(dna_fasta_filename), "fasta"))
ok
test_Seq_objs ... ok
test_SffIO ... ok
test_SubsMat ... ./test_SubsMat.py:21: ResourceWarning: unclosed file <_io.TextIOWrapper name='SubsMat/protein_count.txt' mode='r' encoding='UTF-8'>
  ftab_prot = FreqTable.read_count(open(ftab_file))
./test_SubsMat.py:23: ResourceWarning: unclosed file <_io.TextIOWrapper name='SubsMat/protein_freq.txt' mode='r' encoding='UTF-8'>
  ctab_prot = FreqTable.read_freq(open(ctab_file))
./test_SubsMat.py:31: ResourceWarning: unclosed file <_io.BufferedReader name='SubsMat/acc_rep_mat.pik'>
  acc_rep_mat = pickle.load(open(pickle_file, 'rb'))
ok
test_SwissProt ... ok
test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper.
test_TogoWS ... ./test_TogoWS.py:501: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  new = SeqIO.read(TogoWS.convert(open(filename), "genbank", "embl"), "embl")
./test_TogoWS.py:494: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  new = SeqIO.read(TogoWS.convert(open(filename), "genbank", "fasta"), "fasta")
ok
test_Tutorial ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='ls_orchid.gbk'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='ls_orchid.gbk.bgz'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_001.txt'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_005.txt'>
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_001.txt'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='pubmed_result1.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='pubmed_result2.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='lipoprotein.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='SRF.pfm' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='REB1.pfm' mode='r' encoding='UTF-8'>
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='meme.out' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='alignace.out' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='SRF.pfm' mode='r' encoding='UTF-8'>
ok
test_UniGene ... ok
test_Uniprot ... ./test_Uniprot.py:314: ResourceWarning: unclosed file <_io.TextIOWrapper name='SwissProt/multi_ex.list' mode='r' encoding='UTF-8'>
  ids = [x.strip() for x in open("SwissProt/multi_ex.list")]
./test_Uniprot.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='SwissProt/multi_ex.list' mode='r' encoding='UTF-8'>
  ids = [x.strip() for x in open("SwissProt/multi_ex.list")]
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.BufferedReader name='SwissProt/multi_ex.txt'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.BufferedReader name='SwissProt/multi_ex.xml'>
  function()
ok
test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
test_XXmotif_tool ... skipping. Install XXmotif if you want to use XXmotif from Biopython.
test_align ... ok
test_bgzf ... FAIL
test_geo ... ./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSE16.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM645.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM691.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM700.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM804.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_affy.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_affy_chp.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_dual.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_family.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_platform.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
ok
test_kNN ... ok
test_lowess ... ok
test_motifs ... ok
test_pairwise2 ... ok
test_phyml_tool ... skipping. Install PhyML 3.0 if you want to use the Bio.Phylo.Applications wrapper.
test_prodoc ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/Doc/prosite.excerpt.doc' mode='r' encoding='UTF-8'>
  function()
ok
test_prosite1 ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00107.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00159.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00165.txt' mode='r' encoding='UTF-8'>
  function()
ok
test_prosite2 ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00432.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00488.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00546.txt' mode='r' encoding='UTF-8'>
  function()
ok
test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
test_py3k ... ok
test_raxml_tool ... skipping. Install RAxML (binary raxmlHPC) if you want to test the Bio.Phylo.Applications wrapper.
test_seq ... ok
test_translate ... ok
test_trie ... skipping. Could not import Bio.trie, check C code was compiled.
Bio.Align docstring test ... ok
Bio.Align.Generic docstring test ... ok
Bio.Align.Applications._Clustalw docstring test ... ok
Bio.Align.Applications._ClustalOmega docstring test ... ok
Bio.Align.Applications._Mafft docstring test ... ok
Bio.Align.Applications._Muscle docstring test ... ok
Bio.Align.Applications._Probcons docstring test ... ok
Bio.Align.Applications._Prank docstring test ... ok
Bio.Align.Applications._TCoffee docstring test ... ok
Bio.AlignIO docstring test ... ok
Bio.AlignIO.StockholmIO docstring test ... ok
Bio.Alphabet docstring test ... ok
Bio.Application docstring test ... ok
Bio.bgzf docstring test ... FAIL
Bio.Blast.Applications docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:218: BiopythonDeprecationWarning: Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:321: BiopythonDeprecationWarning: Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:400: BiopythonDeprecationWarning: Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
ok
Bio.Emboss.Applications docstring test ... ok
Bio.GenBank docstring test ... ok
Bio.KEGG.Compound docstring test ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='KEGG/compound.sample' mode='r' encoding='UTF-8'>
  test.globs.clear()
ok
Bio.KEGG.Enzyme docstring test ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='KEGG/enzyme.sample' mode='r' encoding='UTF-8'>
  test.globs.clear()
ok
Bio.Motif docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/SRF.pfm' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/meme.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1289: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'>
  exception = None
ok
Bio.Motif.Applications._AlignAce docstring test ... ok
Bio.Motif.Applications._XXmotif docstring test ... ok
Bio.motifs docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/SRF.pfm' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/meme.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1289: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/alignace.out' mode='r' encoding='UTF-8'>
  exception = None
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/alignace.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
ok
Bio.motifs.applications._alignace docstring test ... ok
Bio.motifs.applications._xxmotif docstring test ... ok
Bio.pairwise2 docstring test ... ok
Bio.Phylo.Applications._Raxml docstring test ... ok
Bio.SearchIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml'>
  # Copyright 2012 by Wibowo Arindrarto.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml.bgz'>
  # Copyright 2012 by Wibowo Arindrarto.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml'>
  test.globs.clear()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/mirna.xml'>
  # Copyright 2012 by Wibowo Arindrarto.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/mirna.xml'>
  test.globs.clear()
ok
Bio.SearchIO._model docstring test ... ok
Bio.SearchIO._model.query docstring test ... ok
Bio.SearchIO._model.hit docstring test ... ok
Bio.SearchIO._model.hsp docstring test ... ok
Bio.SearchIO.BlastIO docstring test ... ok
Bio.SearchIO.HmmerIO docstring test ... ok
Bio.SearchIO.FastaIO docstring test ... ok
Bio.SearchIO.BlatIO docstring test ... ok
Bio.SearchIO.ExonerateIO docstring test ... ok
Bio.SeqIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Fasta/f002'>
  # Copyright 2006-2010 by Peter Cock.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Fasta/f002'>
  test.globs.clear()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq'>
  # Copyright 2006-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  for record in sequences:
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq.bgz'>
  # Copyright 2006-2010 by Peter Cock.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='GenBank/NC_000932.faa'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='GenBank/NC_005816.faa'>
  test.globs.clear()
ok
Bio.SeqIO.FastaIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/FastaIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/dups.fasta' mode='r' encoding='UTF-8'>
  # Copyright 2006-2009 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/FastaIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/dups.fasta' mode='r' encoding='UTF-8'>
  # Copyright 2006-2009 by Peter Cock.  All rights reserved.
ok
Bio.SeqIO.AceIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/AceIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Ace/consed_sample.ace' mode='rU' encoding='UTF-8'>
  # Copyright 2008-2010 by Peter Cock.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='Ace/contig1.ace' mode='rU' encoding='UTF-8'>
  test.globs.clear()
ok
Bio.SeqIO.PhdIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/PhdIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Phd/phd1' mode='r' encoding='UTF-8'>
  # Copyright 2008-2010 by Peter Cock.  All rights reserved.
ok
Bio.SeqIO.QualityIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  for record in sequences:
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_faked.fastq' mode='r' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_faked.fastq' mode='r' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_example.fastq' mode='r' encoding='UTF-8'>
  for record in records:
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='rU' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='rU' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='rU' encoding='UTF-8'>
  for record in records:
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='rU' encoding='UTF-8'>
  for record in records:
ok
./run_tests.py:427: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='r' encoding='UTF-8'>
  gc.collect()
Bio.SeqIO.SffIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/SffIO.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'>
  test.globs.clear()
ok
Bio.SeqFeature docstring test ... ok
Bio.SeqRecord docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqRecord.py:2: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='rU' encoding='UTF-8'>
  # Copyright 2002-2004 Brad Chapman.
ok
Bio.SeqUtils docstring test ... ok
Bio.SeqUtils.MeltingTemp docstring test ... ok
Bio.Sequencing.Applications._Novoalign docstring test ... ok
Bio.Wise docstring test ... ok
Bio.Wise.psw docstring test ... ok
Bio.Statistics.lowess docstring test ... ok
Bio.PDB.Polypeptide docstring test ... ok
Bio.PDB.Selection docstring test ... ok
======================================================================
ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase)
Test Entrez.read from URL
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_Entrez_online.py", line 44, in test_read_from_url
    rec = Entrez.read(einfo)
  File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/__init__.py", line 367, in read
    record = handler.read(handle)
  File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/Parser.py", line 184, in read
    self.parser.ParseFile(handle)
  File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/Parser.py", line 322, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Unable to open connection to #DbInfo?dbaf=

======================================================================
ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests)
Index fastq-sanger file Quality/example.fastq.bgz get_raw
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 441, in <lambda>
    f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 281, in get_raw_check
    raw_file = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq-sanger_Quality_example_fastq_bgz_keyf (test_SeqIO_index.IndexDictTests)
Index fastq-sanger file Quality/example.fastq.bgz with key function
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 432, in <lambda>
    f = lambda x : x.key_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 171, in key_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq-sanger_Quality_example_fastq_bgz_simple (test_SeqIO_index.IndexDictTests)
Index fastq-sanger file Quality/example.fastq.bgz defaults
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 423, in <lambda>
    f = lambda x : x.simple_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 109, in simple_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests)
Index fastq file Quality/example.fastq.bgz get_raw
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 441, in <lambda>
    f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 281, in get_raw_check
    raw_file = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq_Quality_example_fastq_bgz_keyf (test_SeqIO_index.IndexDictTests)
Index fastq file Quality/example.fastq.bgz with key function
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 432, in <lambda>
    f = lambda x : x.key_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 171, in key_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq_Quality_example_fastq_bgz_simple (test_SeqIO_index.IndexDictTests)
Index fastq file Quality/example.fastq.bgz defaults
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 423, in <lambda>
    f = lambda x : x.simple_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 109, in simple_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_genbank_GenBank_cor6_6_gb_bgz_get_raw (test_SeqIO_index.IndexDictTests)
Index genbank file GenBank/cor6_6.gb.bgz get_raw
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 441, in <lambda>
    f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 281, in get_raw_check
    raw_file = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_genbank_GenBank_cor6_6_gb_bgz_keyf (test_SeqIO_index.IndexDictTests)
Index genbank file GenBank/cor6_6.gb.bgz with key function
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 432, in <lambda>
    f = lambda x : x.key_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 171, in key_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_genbank_GenBank_cor6_6_gb_bgz_simple (test_SeqIO_index.IndexDictTests)
Index genbank file GenBank/cor6_6.gb.bgz defaults
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 423, in <lambda>
    f = lambda x : x.simple_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 109, in simple_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_bam_ex1 (test_bgzf.BgzfTests)
Reproduce BGZF compression for BAM file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 288, in test_bam_ex1
    self.rewrite("SamBam/ex1.bam", temp_file)
  File "./test_bgzf.py", line 34, in rewrite
    data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_example_cor6 (test_bgzf.BgzfTests)
Reproduce BGZF compression for cor6_6.gb GenBank file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 313, in test_example_cor6
    self.rewrite("GenBank/cor6_6.gb.bgz", temp_file)
  File "./test_bgzf.py", line 34, in rewrite
    data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_example_fastq (test_bgzf.BgzfTests)
Reproduce BGZF compression for a FASTQ file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 301, in test_example_fastq
    self.rewrite("Quality/example.fastq.gz", temp_file)
  File "./test_bgzf.py", line 45, in rewrite
    new_data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_example_gb (test_bgzf.BgzfTests)
Reproduce BGZF compression for NC_000932 GenBank file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 307, in test_example_gb
    self.rewrite("GenBank/NC_000932.gb.bgz", temp_file)
  File "./test_bgzf.py", line 34, in rewrite
    data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_example_wnts_xml (test_bgzf.BgzfTests)
Reproduce BGZF compression for wnts.xml BLAST file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 319, in test_example_wnts_xml
    self.rewrite("Blast/wnts.xml.bgz", temp_file)
  File "./test_bgzf.py", line 34, in rewrite
    data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_iter_bam_ex1 (test_bgzf.BgzfTests)
Check iteration over SamBam/ex1.bam
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 296, in test_iter_bam_ex1
    self.check_by_char("SamBam/ex1.bam", "SamBam/ex1.bam", True)
  File "./test_bgzf.py", line 112, in check_by_char
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_bam_ex1 (test_bgzf.BgzfTests)
Check random access to SamBam/ex1.bam
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 232, in test_random_bam_ex1
    self.check_random("SamBam/ex1.bam")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_bam_ex1_header (test_bgzf.BgzfTests)
Check random access to SamBam/ex1_header.bam
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 240, in test_random_bam_ex1_header
    self.check_random("SamBam/ex1_header.bam")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_bam_ex1_refresh (test_bgzf.BgzfTests)
Check random access to SamBam/ex1_refresh.bam
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 236, in test_random_bam_ex1_refresh
    self.check_random("SamBam/ex1_refresh.bam")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_example_cor6 (test_bgzf.BgzfTests)
Check random access to GenBank/cor6_6.gb.bgz
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 252, in test_random_example_cor6
    self.check_random("GenBank/cor6_6.gb.bgz")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_example_fastq (test_bgzf.BgzfTests)
Check random access to Quality/example.fastq.bgz
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 248, in test_random_example_fastq
    self.check_random("Quality/example.fastq.bgz")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_wnts_xml (test_bgzf.BgzfTests)
Check random access to Blast/wnts.xml.bgz
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 244, in test_random_wnts_xml
    self.check_random("Blast/wnts.xml.bgz")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
FAIL: bgzf (Bio)
Doctest: Bio.bgzf
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 2154, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for Bio.bgzf
  File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 6, in bgzf

----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 126, in Bio.bgzf
Failed example:
    line = handle.readline()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[10]>", line 1, in <module>
        line = handle.readline()
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 593, in readline
        c = self.read(readsize)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read
        if not self._read(readsize):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
        if not self._read_gzip_header():
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
        self._read_exact(struct.unpack("<H", self._read_exact(2)))
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
        data = self.fileobj.read(n)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
        return self.file.read(size)
    TypeError: integer argument expected, got 'tuple'
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 127, in Bio.bgzf
Failed example:
    assert 80 == handle.tell()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[11]>", line 1, in <module>
        assert 80 == handle.tell()
    AssertionError
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 128, in Bio.bgzf
Failed example:
    line = handle.readline()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[12]>", line 1, in <module>
        line = handle.readline()
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 593, in readline
        c = self.read(readsize)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read
        if not self._read(readsize):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
        if not self._read_gzip_header():
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 297, in _read_gzip_header
        raise IOError('Not a gzipped file')
    OSError: Not a gzipped file
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 129, in Bio.bgzf
Failed example:
    assert 143 == handle.tell()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[13]>", line 1, in <module>
        assert 143 == handle.tell()
    AssertionError
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 130, in Bio.bgzf
Failed example:
    data = handle.read(70000)
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[14]>", line 1, in <module>
        data = handle.read(70000)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read
        if not self._read(readsize):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
        if not self._read_gzip_header():
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 297, in _read_gzip_header
        raise IOError('Not a gzipped file')
    OSError: Not a gzipped file
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 131, in Bio.bgzf
Failed example:
    assert 70143 == handle.tell()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[15]>", line 1, in <module>
        assert 70143 == handle.tell()
    AssertionError


----------------------------------------------------------------------
Ran 217 tests in 238.221 seconds

FAILED (failures = 4)

From p.j.a.cock at googlemail.com  Sat Apr  6 14:19:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 19:19:43 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
Message-ID: <CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>

On Sat, Apr 6, 2013 at 4:31 PM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:
> Hello everyone,
>
> I'm having a problem installing biopython with Python 3.3.1rc1...
> Basically, I get several tests failing (in addition to a lot of warnings).
>
> I don't think the failed tests will be a problem for my work, however, I
> thought you'd want to have a look... Attached is the output of python3
> setup.py test.
>
> Also, if you think I shouldn't use biopython without having these failed
> tests fixed first, please let me know!
>
> Best regards,
> Nicolas

Hi Nicolas,

You should be OK installing this - all the test failures are
within Bio.bgzf which is curious, but you probably won't be
using BGZF compressed files.

We do have buildslaves testing on Python 3.3.0 where this
does not happen, so perhaps this is a new failure from a
change in Python 3.3.1rc1 - hopefully I'll be able to confirm
that by updating one of the buildslaves.

Thanks for the alert,

Peter

From markbudde at gmail.com  Sat Apr  6 20:36:10 2013
From: markbudde at gmail.com (Mark Budde)
Date: Sat, 6 Apr 2013 17:36:10 -0700
Subject: [Biopython] Restriction enzymes and sticky ends
Message-ID: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>

Hi - I have a question about sticky ends in Biopython. Specifically, is
there any way to  maintain sticky end information? Having read the
restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html),
I suspect that the answer is no. It seems that the cut sites are only
maintained for the top strand. So I am planning on adding this data in my
program (although I will need to read up on classes).

However, this requires that I can get the cut site information. The only
way that I can find to extract this information is from the
Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can
use this information to determine the cut sites, but I expect that there is
a more direct way, since the elucidate() function must be generating this
from some attribute.

FYI, I am curious about this because I want to simulate GoldenGate cloning
in Biopython.

Thanks,
Mark Budde

From nicolas.joannin at gmail.com  Sat Apr  6 23:12:54 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Sun, 7 Apr 2013 12:12:54 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
Message-ID: <CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>

Hi Peter,

Thanks for the quick reply!
Indeed, I don't think it is a big issue for me, and I have also not had any
problems with Python 3.3.0 on another machine.
So, yes, it probably is linked to the Python 3.3.1rc1...

However, I should point out that it is not only the Bio.bgzf that fails
testing.
There are also test_Entrez_online and test_SeqIO_index that are indicated
as "FAIL" (both of which I do not directly use).

Cheers,
Nicolas


Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


On Sun, Apr 7, 2013 at 3:19 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sat, Apr 6, 2013 at 4:31 PM, Nicolas Joannin
> <nicolas.joannin at gmail.com> wrote:
> > Hello everyone,
> >
> > I'm having a problem installing biopython with Python 3.3.1rc1...
> > Basically, I get several tests failing (in addition to a lot of
> warnings).
> >
> > I don't think the failed tests will be a problem for my work, however, I
> > thought you'd want to have a look... Attached is the output of python3
> > setup.py test.
> >
> > Also, if you think I shouldn't use biopython without having these failed
> > tests fixed first, please let me know!
> >
> > Best regards,
> > Nicolas
>
> Hi Nicolas,
>
> You should be OK installing this - all the test failures are
> within Bio.bgzf which is curious, but you probably won't be
> using BGZF compressed files.
>
> We do have buildslaves testing on Python 3.3.0 where this
> does not happen, so perhaps this is a new failure from a
> change in Python 3.3.1rc1 - hopefully I'll be able to confirm
> that by updating one of the buildslaves.
>
> Thanks for the alert,
>
> Peter
>

From p.j.a.cock at googlemail.com  Sun Apr  7 10:41:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 7 Apr 2013 15:41:33 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
Message-ID: <CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>

On Sun, Apr 7, 2013 at 4:12 AM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:
> Hi Peter,
>
> Thanks for the quick reply!
> Indeed, I don't think it is a big issue for me, and I have also not had any
> problems with Python 3.3.0 on another machine.
> So, yes, it probably is linked to the Python 3.3.1rc1...

I see that Python 3.3.1 final is out now - might be worth checking
that too, and I'll try to update one of our buildslaves to use this.

> However, I should point out that it is not only the Bio.bgzf that fails
> testing.
> There are also test_Entrez_online and test_SeqIO_index that are indicated as
> "FAIL" (both of which I do not directly use).

The test_SeqIO_index.py failures all looked to be BGZF related too.

I missed the Entrez test, but as an online test it can fail
intermittently anyway. Chances are it will pass if you rerun it.

Peter

From bjorn_johansson at bio.uminho.pt  Sun Apr  7 14:05:11 2013
From: bjorn_johansson at bio.uminho.pt (=?ISO-8859-1?Q?Bj=F6rn_Johansson?=)
Date: Sun, 7 Apr 2013 19:05:11 +0100
Subject: [Biopython] sticky ends in Biopython
Message-ID: <CAG_4V=ZOODZ5KMqm=s_Kr=5JxSVHKHxm8ozwTMKToMqBp8LkLw@mail.gmail.com>

>
> Message: 2
> Date: Sat, 6 Apr 2013 17:36:10 -0700
> From: Mark Budde <markbudde at gmail.com>
> Subject: [Biopython] Restriction enzymes and sticky ends
> To: biopython <biopython at lists.open-bio.org>
> Message-ID:
>         <
> CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi - I have a question about sticky ends in Biopython. Specifically, is
> there any way to  maintain sticky end information? Having read the
> restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html
> ),
> I suspect that the answer is no. It seems that the cut sites are only
> maintained for the top strand. So I am planning on adding this data in my
> program (although I will need to read up on classes).
>
> However, this requires that I can get the cut site information. The only
> way that I can find to extract this information is from the
> Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can
> use this information to determine the cut sites, but I expect that there is
> a more direct way, since the elucidate() function must be generating this
> from some attribute.
>
> FYI, I am curious about this because I want to simulate GoldenGate cloning
> in Biopython.
>
> Thanks,
> Mark Budde
>
>
> ------------------------------
>

Hi Mark,

Check out Python-dna, which has classes for dealing with
double-stranded DNA. This package depends on Biopython and a couple of
additional modules.

Disclaimer: I am the developer of Python-dna

Python-dna at pypi https://pypi.python.org/pypi/python-dna/
Source code        https://code.google.com/p/pydna/
Documentation      http://python-dna.readthedocs.org/
Discussion group
https://groups.google.com/forum/?fromgroups#!forum/python-dna

/ Bjorn Johansson


-- 
______O_________oO________oO______o_______oO__
Björn Johansson
Assistant Professor
Department of Biology
University of Minho
Campus de Gualtar
4710-057 Braga
PORTUGAL
www.bio.uminho.pt
Google profile <https://profiles.google.com/bjornjobb>
Google Scholar Profile<http://scholar.google.com/citations?user=7AiEuJ4AAAAJ>
my group <https://sites.google.com/site/metabolicengineeringgroup/>
Office (direct) +351-253 601517 | (PT) mob.  +351-967 147 704 | (SWE) mob.
 +46 739 792 968
Dept of Biology (secr) +351-253 60 4310  | fax +351-253 678980


From markbudde at gmail.com  Sun Apr  7 14:48:16 2013
From: markbudde at gmail.com (Mark Budde)
Date: Sun, 7 Apr 2013 11:48:16 -0700
Subject: [Biopython] sticky ends in Biopython
In-Reply-To: <CAG_4V=ZOODZ5KMqm=s_Kr=5JxSVHKHxm8ozwTMKToMqBp8LkLw@mail.gmail.com>
References: <CAG_4V=ZOODZ5KMqm=s_Kr=5JxSVHKHxm8ozwTMKToMqBp8LkLw@mail.gmail.com>
Message-ID: <CAEwaGEsBR7D9pBo=3HLF1tkRiyXv5qq=uusCv3qsj2kupYiXXg@mail.gmail.com>

OK, that looks useful. Thanks.
-Mark


On Sun, Apr 7, 2013 at 11:05 AM, Björn Johansson <
bjorn_johansson at bio.uminho.pt> wrote:

> >
> > Message: 2
> > Date: Sat, 6 Apr 2013 17:36:10 -0700
> > From: Mark Budde <markbudde at gmail.com>
> > Subject: [Biopython] Restriction enzymes and sticky ends
> > To: biopython <biopython at lists.open-bio.org>
> > Message-ID:
> >         <
> > CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Hi - I have a question about sticky ends in Biopython. Specifically, is
> > there any way to  maintain sticky end information? Having read the
> > restriction doc (
> http://biopython.org/DIST/docs/cookbook/Restriction.html
> > ),
> > I suspect that the answer is no. It seems that the cut sites are only
> > maintained for the top strand. So I am planning on adding this data in my
> > program (although I will need to read up on classes).
> >
> > However, this requires that I can get the cut site information. The only
> > way that I can find to extract this information is from the
> > Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I
> can
> > use this information to determine the cut sites, but I expect that there
> is
> > a more direct way, since the elucidate() function must be generating this
> > from some attribute.
> >
> > FYI, I am curious about this because I want to simulate GoldenGate
> cloning
> > in Biopython.
> >
> > Thanks,
> > Mark Budde
> >
> >
> > ------------------------------
> >
>
> Hi Mark,
>
> Check out Python-dna, which has classes for dealing with
> double-stranded DNA. This package depends on Biopython and a couple of
> additional modules.
>
> Disclaimer: I am the developer of Python-dna
>
> Python-dna at pypi https://pypi.python.org/pypi/python-dna/
> Source code        https://code.google.com/p/pydna/
> Documentation      http://python-dna.readthedocs.org/
> Discussion group
> https://groups.google.com/forum/?fromgroups#!forum/python-dna
>
> / Bjorn Johansson
>
>
>
> --
> ______O_________oO________oO______o_______oO__
> Björn Johansson
> Assistant Professor
> Department of Biology
> University of Minho
> Campus de Gualtar
> 4710-057 Braga
> PORTUGAL
> www.bio.uminho.pt
> Google profile <https://profiles.google.com/bjornjobb>
> Google Scholar Profile<
> http://scholar.google.com/citations?user=7AiEuJ4AAAAJ>
> my group <https://sites.google.com/site/metabolicengineeringgroup/>
> Office (direct) +351-253 601517 | (PT) mob.  +351-967 147 704 | (SWE) mob.
>  +46 739 792 968
> Dept of Biology (secr) +351-253 60 4310  | fax +351-253 678980
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From p.j.a.cock at googlemail.com  Sun Apr  7 15:52:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 7 Apr 2013 20:52:13 +0100
Subject: [Biopython] Restriction enzymes and sticky ends
In-Reply-To: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>
References: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>
Message-ID: <CAKVJ-_7ZPPRwfjKe0FPyx3bHsx8iUCGmg1LXTR+PRSAMfX6+Ww@mail.gmail.com>

On Sun, Apr 7, 2013 at 1:36 AM, Mark Budde <markbudde at gmail.com> wrote:
> Hi - I have a question about sticky ends in Biopython. Specifically, is
> there any way to  maintain sticky end information? Having read the
> restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html),
> I suspect that the answer is no. It seems that the cut sites are only
> maintained for the top strand. So I am planning on adding this data in my
> program (although I will need to read up on classes).
>
> However, this requires that I can get the cut site information. The only
> way that I can find to extract this information is from the
> Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can
> use this information to determine the cut sites, but I expect that there is
> a more direct way, since the elucidate() function must be generating this
> from some attribute.
>
> FYI, I am curious about this because I want to simulate GoldenGate cloning
> in Biopython.
>
> Thanks,
> Mark Budde

Hi Mark,

Good question. Sadly help(EcoRI) doesn't tell you very much,
does it? The whole Restriction module could benefit from a
new maintainer and/or a rewrite (for one thing, it unfortunately
did not follow Python counting in some aspects).

Two tips: first dir(object) gives a list of the attributes and methods
of an object in Python. Second, you can look at the source of the
elucidate method to see where it gets the information you're
looking for ;)  [A last resort perhaps - but when documentation
has let you down, worth knowing how to explore.]
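
As a rough sketch of both tips, using only the standard library:
textwrap and textwrap.dedent are stand-ins chosen so the snippet runs
anywhere, and public_names is a made-up helper. With Biopython installed,
passing EcoRI and EcoRI.elucidate instead should work the same way.

```python
import inspect
import textwrap


def public_names(obj):
    """Return the attribute and method names of obj that do not start with '_'."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Tip 1: list what an object offers (here a module, but any object works).
names = public_names(textwrap)
print(names)

# Tip 2: read the source of a single function or method.
source = inspect.getsource(textwrap.dedent)
print(source.splitlines()[0])
```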

https://github.com/biopython/biopython/blob/master/Bio/Restriction/Restriction.py

Here EcoRI is a 5' overhanging digest enzyme, and the values
you need are EcoRI.fst5 (here 1) and EcoRI.fst3 (here -1)
which are relative to the recognition site (here GAATTC).
e.g.

Overhang type methods include:

>>> from Bio.Restriction import EcoRI
>>> EcoRI.overhang()
"5' overhang"
>>> EcoRI.is_blunt()
False
>>> EcoRI.is_5overhang()
True
>>> EcoRI.is_3overhang()
False

>>> EcoRI.elucidate()
'G^AATT_C'
>>> EcoRI.fst5
1
>>> EcoRI.fst3
-1
>>> EcoRI.site
'GAATTC'

Notice 'GAATTC'[:1] = 'G', 'GAATTC'[1:-1] = 'AATT' and
'GAATTC'[-1:] = 'C' which gives the elucidated string.
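
A small sketch of that slicing, using only the values shown above. The
helper name is made up for illustration, and it only covers enzymes like
EcoRI where both cuts fall inside the recognition site (fst5 positive,
fst3 negative); it is not how Bio.Restriction itself is written.

```python
def elucidate_within_site(site, fst5, fst3):
    """Mark the top-strand cut with '^' and the bottom-strand cut with '_'.

    Assumes both cuts lie inside the recognition site, so plain Python
    slicing with fst5 (from the left) and fst3 (negative, from the right)
    is enough.
    """
    return site[:fst5] + "^" + site[fst5:fst3] + "_" + site[fst3:]


# Values reported for EcoRI in the session above.
print(elucidate_within_site("GAATTC", 1, -1))  # G^AATT_C
```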

Is that all you needed?

Regards

Peter

From p.j.a.cock at googlemail.com  Mon Apr  8 05:32:00 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 10:32:00 +0100
Subject: [Biopython] Restriction enzymes and sticky ends
In-Reply-To: <CAEwaGEuuFkKVsMbBTRQ8zixCb9Zijz_2E2hMeYAa6akvw4EZaA@mail.gmail.com>
References: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>
	<CAKVJ-_7ZPPRwfjKe0FPyx3bHsx8iUCGmg1LXTR+PRSAMfX6+Ww@mail.gmail.com>
	<CAEwaGEuuFkKVsMbBTRQ8zixCb9Zijz_2E2hMeYAa6akvw4EZaA@mail.gmail.com>
Message-ID: <CAKVJ-_7bUzoUwesBy8BtehhiqRq5zQu77-jJEiT717oBR1F0pw@mail.gmail.com>

On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde <markbudde at gmail.com> wrote:
> Thanks for doing some digging on my behalf, Peter. After I posted my email
> last night, I started looking through the Bio.Restriction code myself. Your
> response is helpful; I was having trouble seeing how the cut site was
> encoded for each strand. I think Bjorn's python-dna might be a better
> starting place for me than Bio.Restriction, as it already has some of the
> functionality I was looking for.

Fair enough.

> However, to your question, I'm still not quite getting the cut sites. Your
> example with EcoRI makes complete sense, but I can't figure out the pattern
> for some other enzymes, such as BsaI, which is why I got confused initially.
> If you repeat that protocol for BsaI, the results don't match up.
>
> In [80]: BsaI.elucidate()
> Out[80]: 'GGTCTCN^NNNN_N'
>
> In [81]: BsaI.fst5
> Out[81]: 7
>
> In [82]: BsaI.fst3
> Out[82]: 5
>
> In [83]: BsaI.site
> Out[83]: 'GGTCTC'
>
> Based on this, I would expect that BsaI.fst3 should yield
> "11" but it yields 5.

I think you are counting from the wrong reference point.
Using Python style indexing would only allow cleavage
points within the recognition site to be described.

BsaI is a weird enzyme, and appears to be handled by the
Ambiguous class in Bio/Restriction/Restriction.py - which
says this is for enzymes for which the overhang is variable.

>>> from Bio.Restriction import BsaI
>>> BsaI.is_ambiguous()
True
>>> BsaI.is_defined() # is there a consistent site?
False
>>> BsaI.is_unknown()
False
>>> BsaI.fst5
7
>>> BsaI.fst3
5
>>> BsaI.elucidate()
'GGTCTCN^NNNN_N'

This subclass has a more complicated elucidate method,
but gives the same string as the REBASE website, so this
is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html

The 5' cut site of 7 clearly means this is downstream of
the 6 bp recognition site. This appears to be counted
from the start (left) of the restriction site.

From the illustration the 3' cut site is also to the right of
the 6 bp recognition site. It appears the number is counted
from the end (right) of the recognition site, where positive
as in BsaI means to the right (after the recognition site)
while negative as in EcoRI means to the left (within the
recognition site).
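
The counting rule inferred above can be sketched as follows. This is only
a reading of the numbers in this thread, not a documented Bio.Restriction
contract: the function names are hypothetical, and the rendering assumes
the top-strand cut is at or before the bottom-strand cut.

```python
def cut_offsets(site, fst5, fst3):
    """Return (top, bottom) cut positions counted from the start of the site.

    fst5 is taken as counted from the left end of the recognition site,
    fst3 from its right end (negative means inside the site).
    """
    top = fst5
    bottom = len(site) + fst3
    return top, bottom


def render(site, fst5, fst3):
    """Rebuild an elucidate-style string, padding with N beyond the site."""
    top, bottom = cut_offsets(site, fst5, fst3)
    if not 0 < top <= bottom:
        raise ValueError("sketch only handles top cut at or before bottom cut")
    # Keep at least one base after the last cut, as in the strings above.
    length = max(len(site), bottom + 1)
    padded = site + "N" * (length - len(site))
    return padded[:top] + "^" + padded[top:bottom] + "_" + padded[bottom:]


print(cut_offsets("GGTCTC", 7, 5))   # BsaI values from the session above
print(render("GGTCTC", 7, 5))
print(render("GAATTC", 1, -1))       # EcoRI values for comparison
```

With the BsaI values this puts the bottom-strand cut 11 bases from the
start of the site, which is the figure Mark expected to see.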

Peter

P.S. Please remember to CC the mailing list, e.g. reply all.
Unless people say explicitly that they have done this deliberately,
I generally assume taking a public discussion off list is accidental.

From nicolas.joannin at gmail.com  Mon Apr  8 09:21:45 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Mon, 8 Apr 2013 22:21:45 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
Message-ID: <CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>

Hi Peter,

I need to update another machine, so I'll do that with the final version to
see if the problem still exists. Will post back when that's done.
Regarding the Entrez test, indeed, it doesn't fail every time. So no
worries there.

Cheers,
Nicolas


Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


On Sun, Apr 7, 2013 at 11:41 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sun, Apr 7, 2013 at 4:12 AM, Nicolas Joannin
> <nicolas.joannin at gmail.com> wrote:
> > Hi Peter,
> >
> > Thanks for the quick reply!
> > Indeed, I don't think it is a big issue for me, and I have also not had
> any
> > problems with Python 3.3.0 on another machine.
> > So, yes, it probably is linked to the Python 3.3.1rc1...
>
> I see that Python 3.3.1 final is out now - might be worth checking
> that too, and I'll try to update one of our buildslaves to use this.
>
> > However, I should point out that it is not only the Bio.bgzf that fails
> > testing.
> > There are also test_Entrez_online and test_SeqIO_index that are
> indicated as
> > "FAIL" (both of which I do not directly use).
>
> The test_SeqIO_index.py failures all looked to be BGZF related too.
>
> I missed the Entrez test, but as an online test that can sometimes
> fail intermittently anyway. The chances are on rerunning it'll be fine.
>
> Peter
>

From p.j.a.cock at googlemail.com  Mon Apr  8 10:05:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 15:05:49 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
Message-ID: <CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>

On Mon, Apr 8, 2013 at 2:21 PM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:
> Hi Peter,
>
> I need to update another machine, so I'll do that with the final version to
> see if the problem still exists. Will post back when that's done.
> Regarding the Entrez test, indeed, it doesn't fail every time. So no worries
> there.
>
> Cheers,
> Nicolas

I've just installed Python 3.3.1 (final) from source on a 64 bit Linux
machine, and can confirm test failures from the BGZF code (not
failing under Python 3.3.0). I was hoping this would be a glitch in
the release candidate but sadly not.

Thank you again for bringing this to our attention.

Peter

From nicolas.joannin at gmail.com  Mon Apr  8 10:10:07 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Mon, 8 Apr 2013 23:10:07 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
Message-ID: <CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>

OK, I guess that'll be the same whichever platform...
I guess I'll stick with 3.3.0 for the other machine then.
Thanks for the update!

Nicolas


Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


On Mon, Apr 8, 2013 at 11:05 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Mon, Apr 8, 2013 at 2:21 PM, Nicolas Joannin
> <nicolas.joannin at gmail.com> wrote:
> > Hi Peter,
> >
> > I need to update another machine, so I'll do that with the final version to
> > see if the problem still exists. Will post back when that's done.
> > Regarding the Entrez test, indeed, it doesn't fail every time. So no worries
> > there.
> >
> > Cheers,
> > Nicolas
>
> I've just installed Python 3.3.1 (final) from source on a 64 bit Linux
> machine, and can confirm test failures from the BGZF code (not
> failing under Python 3.3.0). I was hoping this would be a glitch in
> the release candidate but sadly not.
>
> Thank you again for bringing this to our attention.
>
> Peter
>

From p.j.a.cock at googlemail.com  Mon Apr  8 11:23:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 16:23:25 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
	<CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
Message-ID: <CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>

On Mon, Apr 8, 2013 at 3:10 PM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:
> OK, I guess that'll be the same whichever platform...
> I guess I'll stick with 3.3.0 for the other machine then.
> Thanks for the update!
>
> Nicolas

More bad news - whatever was changed, I think something
similar was done in Python 2.7.4 as well, which also has
new test failures not seen under Python 2.7.3. Sigh.

Peter

From markbudde at gmail.com  Mon Apr  8 13:25:24 2013
From: markbudde at gmail.com (Mark Budde)
Date: Mon, 8 Apr 2013 10:25:24 -0700
Subject: [Biopython] Restriction enzymes and sticky ends
In-Reply-To: <CAKVJ-_7bUzoUwesBy8BtehhiqRq5zQu77-jJEiT717oBR1F0pw@mail.gmail.com>
References: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>
	<CAKVJ-_7ZPPRwfjKe0FPyx3bHsx8iUCGmg1LXTR+PRSAMfX6+Ww@mail.gmail.com>
	<CAEwaGEuuFkKVsMbBTRQ8zixCb9Zijz_2E2hMeYAa6akvw4EZaA@mail.gmail.com>
	<CAKVJ-_7bUzoUwesBy8BtehhiqRq5zQu77-jJEiT717oBR1F0pw@mail.gmail.com>
Message-ID: <CAEwaGEuhSqaZ757DV=LvD00fE-HcRVxpr8mfyvMJ7T5ivhdKXQ@mail.gmail.com>

Thanks Peter, that explains it. BsaI is indeed a weird enzyme, a Type IIS
restriction enzyme. These enzymes cut a defined distance outside of their
recognition sequence. The utility of these enzymes is that by tagging the
cut sites on the end of your primers, you can generate whatever sticky ends
you desire. Furthermore, because it cuts outside of its recognition
sequence, you can incubate a number of these fragments together with both
restriction enzyme and ligase, and the fragments will assemble into the
final product without subcloning. This is because sticky ends are generated
without the corresponding recognition site, so their ligation is
irreversible. This is called Golden Gate cloning.
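
To make the cut geometry concrete, here is a minimal pure-Python sketch
(not the Bio.Restriction API) that reads the offsets straight off the
REBASE-style pattern GGTCTCN^NNNN_N: the top strand is cut 7 bases after
the start of the site and the bottom strand 5 bases after its end, leaving
a 4-base overhang. The function name bsai_overhangs is hypothetical, and
only forward-strand sites on a linear sequence are considered.

```python
import re

SITE = "GGTCTC"  # BsaI recognition sequence (top strand, 5'->3')


def bsai_overhangs(seq):
    """Return (top_cut_index, overhang) for each forward-strand BsaI site.

    Assumption: offsets are taken from the pattern GGTCTCN^NNNN_N, i.e.
    top-strand cut at site start + 7, bottom-strand cut at site end + 5.
    Reverse-strand sites and circular sequences are ignored in this sketch.
    """
    seq = seq.upper()
    results = []
    for match in re.finditer(SITE, seq):
        top_cut = match.start() + 7    # the '^' in the pattern
        bottom_cut = match.end() + 5   # the '_' in the pattern
        if bottom_cut <= len(seq):     # skip sites too close to the end
            results.append((top_cut, seq[top_cut:bottom_cut]))
    return results


print(bsai_overhangs("AAGGTCTCATGCATT"))  # -> [(9, 'TGCA')]
```

Because the overhang lies outside GGTCTC, it can be any four bases, which
is the property the assembly relies on.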
-Mark


On Mon, Apr 8, 2013 at 2:32 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde <markbudde at gmail.com> wrote:
> > Thanks for doing some digging on my behalf, Peter. After I posted my email
> > last night, I started looking through the Bio.Restriction code myself. Your
> > response is helpful; I was having trouble seeing how the cut site was
> > encoded for each strand. I think Bjorn's python-dna might be a better
> > starting place for me than Bio.Restriction, as it already has some of the
> > functionality I was looking for.
>
> Fair enough.
>
> > However, to your question, I'm still not quite getting the cut sites. Your
> > example with EcoRI makes complete sense, but I can't figure out the pattern
> > for some other enzymes, such as BsaI, which is why I got confused initially.
> > If you repeat that protocol for BsaI, the results don't match up.
> >
> > In [80]: BsaI.elucidate()
> > Out[80]: 'GGTCTCN^NNNN_N'
> >
> > In [81]: BsaI.fst5
> > Out[81]: 7
> >
> > In [82]: BsaI.fst3
> > Out[82]: 5
> >
> > In [83]: BsaI.site
> > Out[83]: 'GGTCTC'
> >
> > Based on this, I would expect that BsaI.fst3 should yield
> > "11" but it yields 5.
>
> I think you are counting from the wrong reference point.
> Using Python style indexing would only allow cleavage
> points within the recognition site to be described.
>
> BsaI is a weird enzyme, and appears to be handled by the
> Ambiguous class in Bio/Restriction/Restriction.py - which
> says this is for enzymes for which the overhang is variable.
>
> >>> from Bio.Restriction import BsaI
> >>> BsaI.is_ambiguous()
> True
> >>> BsaI.is_defined() # is there a consistent site?
> False
> >>> BsaI.is_unknown()
> False
> >>> BsaI.fst5
> 7
> >>> BsaI.fst3
> 5
> >>> BsaI.elucidate()
> 'GGTCTCN^NNNN_N'
>
> This subclass has a more complicated elucidate method,
> but gives the same string as the REBASE website, so this
> is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html
>
> The 5' cut site of 7 clearly means this is downstream of
> the 6 bp recognition site. This appears to be counted
> from the start (left) of the restriction site.
>
> From the illustration the 3' cut site is also to the right of
> the 6 bp recognition site. It appears the number is counted
> from the end (right) of the recognition site, where positive
> as in BsaI means to the right (after the recognition site)
> while negative as in EcoRI means to the left (within the
> recognition site).
>
> Peter
>
> P.S. Please remember to CC the mailing list, e.g. reply all.
> Unless people say explicitly that they have done this deliberately,
> I generally assume taking a public discussion off list is accidental.
>

From p.j.a.cock at googlemail.com  Mon Apr  8 13:55:47 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 18:55:47 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
	<CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
	<CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>
Message-ID: <CAKVJ-_4rAWanDXhU14gZsfpAEZvJa1ABEoTCnEidWAp_P9AZfg@mail.gmail.com>

On Mon, Apr 8, 2013 at 4:23 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Apr 8, 2013 at 3:10 PM, Nicolas Joannin
> <nicolas.joannin at gmail.com> wrote:
>> OK, I guess that'll be the same whichever platform...
>> I guess I'll stick with 3.3.0 for the other machine then.
>> Thanks for the update!
>>
>> Nicolas
>
> More bad news - whatever was changed, I think something
> similar was done in Python 2.7.4 as well, which also has
> new test failures not seen under Python 2.7.3. Sigh.
>
> Peter

Solved - this is a bug in Python 2.7.4 and 3.3.1 (which had a
lot of gzip work done fixing other issues), but on the bright
side the fix is quite trivial to apply manually:
http://bugs.python.org/issue17666

Peter

From p.j.a.cock at googlemail.com  Tue Apr  9 05:39:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Apr 2013 10:39:12 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_4rAWanDXhU14gZsfpAEZvJa1ABEoTCnEidWAp_P9AZfg@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
	<CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
	<CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>
	<CAKVJ-_4rAWanDXhU14gZsfpAEZvJa1ABEoTCnEidWAp_P9AZfg@mail.gmail.com>
Message-ID: <CAKVJ-_69vVwn-UMYm4OQ4dM5yTJ-R7JhdDnVBZRxEaUOHpzdRg@mail.gmail.com>

On Mon, Apr 8, 2013 at 6:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Solved - this is a bug in Python 2.7.4 and 3.3.1 (which had a
> lot of gzip work done fixing other issues), but on the bright
> side the fix is quite trivial to apply manually:
> http://bugs.python.org/issue17666
>
> Peter

Just a heads up, this affects Python 3.2.4 as well.
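
For anyone wanting to guard a test run against the releases named in this
thread, a minimal sketch follows. The version list is copied from the
messages above (2.7.4, 3.2.4, 3.3.1); the function name is hypothetical,
and nothing here checks whether the fix from issue 17666 has been applied
by hand.

```python
import sys

# Releases reported in this thread as showing the new BGZF test failures.
AFFECTED = {(2, 7, 4), (3, 2, 4), (3, 3, 1)}


def gzip_regression_suspected(version=None):
    """Return True if `version` (default: the running interpreter) is listed.

    `version` is any sequence whose first three items are
    (major, minor, micro), such as sys.version_info.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:3]) in AFFECTED


if gzip_regression_suspected():
    print("This Python release is one of those reported with BGZF failures.")
```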

Peter

From p.j.a.cock at googlemail.com  Tue Apr  9 06:20:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Apr 2013 11:20:43 +0100
Subject: [Biopython] OBF not accepted for GSoC 2013
Message-ID: <CAKVJ-_6-+nyJk+7tp5YOyy7i7N7GtkRswZ_2Vs0uTuqGEV4wWQ@mail.gmail.com>

Dear all,

Unfortunately, this year we have not been accepted into the Google
Summer of Code scheme.

I'm sure the rest of the OBF board and the other Bio* developers
will join me in thanking Pjotr Prins for his efforts as the OBF
GSoC administrator co-ordinating our application this year, as
well as last year's administrator Rob Bruels and the other mentors
for their efforts.

For those of you not subscribed to the OBF's GSoC mailing list,
I am forwarding Pjotr's email from last night (also below):
http://lists.open-bio.org/pipermail/gsoc/2013/000211.html

In all 177 organisations were accepted (about the same as the
last few years), and they will be listed here (once they have filled
out their profile information):
https://google-melange.appspot.com/gsoc/accepted_orgs/google/gsoc2013

To potential students this summer, the good news is that some
related organisations have been accepted, such as NESCent,
the National Resource for Network Biology (NRNB - known for
Cytoscape), and SciRuby (Ruby Science Foundation), so there is
still some scope for doing a bioinformatics-related project in
GSoC 2013, perhaps even with a Bio* developer as a co-mentor.

Thank you all,

Peter
(Biopython developer, OBF board member)

---------- Forwarded message ----------
From: Pjotr Prins <pjotr2010 at thebird.nl>
Date: Mon, Apr 8, 2013 at 9:13 PM
Subject: Re: GSoC 2013 is ON
To: Pjotr Prins <pjotr2010 at thebird.nl>
Cc: ..., OBF GSoC <gsoc at lists.open-bio.org>


Sadly, our application got rejected by GSoC this year. I am not sure
what the reason was, but I am convinced our application was similar to
that of other years. Maybe the project ideas could have been better
presented. I am not sure at this stage. I'll make a list of successful
projects to see if we can digest some truths.

The upside is that FOSS is going strong! And that the field is getting
increasingly competitive. As an open source geezer I can only be
happy, even if it hurts our own application.

Sorry everyone, and many thanks for the trouble you took getting
projects written up. Let's not feel discouraged for next year.

Pj.

From nicolas.joannin at gmail.com  Tue Apr  9 09:47:03 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Tue, 9 Apr 2013 22:47:03 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_69vVwn-UMYm4OQ4dM5yTJ-R7JhdDnVBZRxEaUOHpzdRg@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
	<CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
	<CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>
	<CAKVJ-_4rAWanDXhU14gZsfpAEZvJa1ABEoTCnEidWAp_P9AZfg@mail.gmail.com>
	<CAKVJ-_69vVwn-UMYm4OQ4dM5yTJ-R7JhdDnVBZRxEaUOHpzdRg@mail.gmail.com>
Message-ID: <CAPJVvAxzSMWLvnmnZv76A6VY4c3wk7naCnWR8CdZJO50DrC09Q@mail.gmail.com>

Thanks for the fix!
Cheers,
Nicolas


Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


On Tue, Apr 9, 2013 at 6:39 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Mon, Apr 8, 2013 at 6:55 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >
> > Solved - this is a bug in Python 2.7.4 and 3.3.1 (which had a
> > lot of gzip work done fixing other issues), but on the bright
> > side the fix is quite trivial to apply manually:
> > http://bugs.python.org/issue17666
> >
> > Peter
>
> Just a heads up, this affects Python 3.2.4 as well.
>
> Peter
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From matthiasschade.de at googlemail.com  Thu Apr 11 05:20:31 2013
From: matthiasschade.de at googlemail.com (Matthias Schade)
Date: Thu, 11 Apr 2013 11:20:31 +0200
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
Message-ID: <5166805F.8060603@googlemail.com>

Hello everyone,

is there an upper limit to how many sequences I can query via 
NCBIWWW.qblast at once?

Sending up to 150 sequences, each a 24-mer, in a single string works
fine. But now I have tried the same with a string containing about 900
sequences. At good times, it takes the NCBI server about 5 min to send an
answer. I save the answer and later open and parse the file with other
functions in my code. However, even though I have queried the same 900
sequences, the resulting output file varies in length (10 MB < x < 20 MB)
and always at least lacks the closing tag "</BlastOutput>", or lacks even
more (this does not happen when querying 150 sequences or fewer).
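
Since batches of about 150 sequences are reported to work, one workaround
is to split the FASTA string before submitting it. The sketch below is
plain Python with no Biopython calls; split_fasta and the default batch
size of 150 are illustrative choices, not documented NCBI limits.

```python
def split_fasta(fasta_text, batch_size=150):
    """Yield FASTA strings holding at most `batch_size` records each.

    Assumes every record starts with a line beginning with '>' and that
    any text before the first header can be discarded.
    """
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])        # start a new record
        elif line.strip() and records:
            records[-1].append(line)      # sequence line of current record
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        yield "\n".join("\n".join(rec) for rec in batch) + "\n"
```

Each yielded string could then be passed to a separate query and the
results saved to separate files.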

I would guess that once the server has started sending its answer, there
might be only a limited time that NCBIWWW.qblast waits for follow-up
packets, and thus, depending on the current server load, the
NCBIWWW.qblast function simply stops waiting for incoming data after some
time, which makes my BLAST output files vary in length. Could anyone
correct or verify this far-fetched hypothesis?

My core-lines are:

from Bio.Blast import NCBIWWW

orgn = 'Mus Musculus'  # or anything else
result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100,
                        entrez_query=orgn + "[orgn]")
# Use a context manager so the output file is flushed and closed.
with open('myblast_result.xml', "w") as save_file:
    save_file.write(result.read())
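
A quick way to detect the truncation described above is to check whether
the saved file ends with the closing tag before parsing it. This is a
minimal sketch using only the standard library; the function name is
hypothetical, and it reads the whole file into memory, which should be
acceptable at the 10-20 MB sizes mentioned.

```python
def blast_xml_looks_complete(path):
    """Return True if the last non-blank text in the file is </BlastOutput>."""
    with open(path) as handle:
        text = handle.read()
    return text.rstrip().endswith("</BlastOutput>")
```

A False result means the download should be repeated (or the query split
up) rather than handed to the parser.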

Best regards,
Matthias

From p.j.a.cock at googlemail.com  Thu Apr 11 05:43:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Apr 2013 10:43:44 +0100
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
In-Reply-To: <5166805F.8060603@googlemail.com>
References: <5166805F.8060603@googlemail.com>
Message-ID: <CAKVJ-_6y_q8e=EV5+1vCCeRY5c8z-brOsyHWW960dG0bX=ZYEg@mail.gmail.com>

On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade
<matthiasschade.de at googlemail.com> wrote:
> Hello everyone,
>
> is there an upper limit to how many sequences I can query via NCBIWWW.qblast
> at once?

There are sometimes limits on the URL length, especially if going via
firewalls and proxies, so that may be one factor.

At the NCBI end, I'm not sure what limits they impose on this:
http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

> Sending up to 150 sequences each of 24mer length in a single string
> everything works fine. But now, I have tried the same for a string
> containing about 900 sequences. On good times, it takes the NCBI-server
> about 5min to send an answer. I save the answer and later open and parse the
> file by other functions in my code. However, even though I have queried the
> same 900 sequences, the resulting output-file varies in length (10
> MB<x<20MB) and always at least misses the correct termination-tag in
> "</BlastOutput>" or even misses more (this does not happen when querying 150
> sequences or fewer).
>
> I would guess once the server has started sending its answers, there might
> only be a limited time NCBIWWW.qblast waits for follow up packets ... and
> thus depending on the current server-load, the NCBIWWW.qblast-function
> simply decides to terminate waiting for incoming data after some time,
> resulting in my blast-output-files to vary in length. Could anyone correct
> or verify this far-fetched hypothesis?
>
> My core-lines are:
>
> orgn='Mus Musculus' # or anything else
> result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100,
> entrez_query=str(orgn+"[orgn]"))
> save_file = open ('myblast_result.xml',"w")
> save_file.write(result.read())
>
> Best regards,
> Matthias

I think you've reached the scale where it would be better to run blastn
locally - ideally on a cluster if you have access to one. You can
download the whole NT database from here - most departments
running BLAST with their own Linux servers will have a central copy
which is kept automatically up to date:
ftp://ftp.ncbi.nlm.nih.gov/blast/db/

If you don't have those kinds of resources, then you can even
run BLAST on your own Windows machine - although I'm not
sure how much RAM would be recommended for the NT
database which is pretty big.

Regards,

Peter

From ericmajinglong at gmail.com  Thu Apr 11 12:49:27 2013
From: ericmajinglong at gmail.com (Eric Ma)
Date: Thu, 11 Apr 2013 12:49:27 -0400
Subject: [Biopython] Request from help
Message-ID: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>

Hello everybody,

I'm new to the mailing list here, though I've been playing with BioPython
for quite a while.

I'm having some trouble here. I wanted to display a tree of sequences for
which I had done a multiple sequence alignment. I tried going through the
pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
Because I'm still in the testing phase, instead of writing it as a single
script, I wrote it as a series of scripts that I would execute in order.

The problem I run into is at step 4 in the example, where I "feed the
alignment to PhyML". My data set is 70 protein sequences, and the trouble I
run into is that it takes a very, very long time at the "feeding alignment
to PhyML" step. I tried running the script on my MacBook Pro overnight, and
even the next morning it was not done. Am I missing something here?

Just to be clear here, aligning the sequences using Muscle was successful,
and I also managed to output a distance matrix from sample to sample, which
I used in another downstream pipeline to display the clustering of the
sequences on a 2D euclidean plane. However, I wanted to have a tree
representation to validate the clustering results; the trouble is, I can't
get the _phyml_tree.txt file to be created, which I would then use to draw
the tree.

Thanks in advance for any help!

Cheers,
Eric
-----------------------------------------------------------------------
Please consider the environment before printing this e-mail. Do you really
need to print it?

http://about.me/ericmjl

From jgibbons1 at mail.usf.edu  Thu Apr 11 13:01:19 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Thu, 11 Apr 2013 13:01:19 -0400
Subject: [Biopython] Request from help
In-Reply-To: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
Message-ID: <CALaGxMixcphikkuHvyr5B8QhOUXX-jUCsbiB3nvGuOLDsyxYMQ@mail.gmail.com>

NCBI Standalone Blast gives you the option of querying the website so that
you don't have to maintain a local database.

Justin Gibbons


On Thu, Apr 11, 2013 at 12:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:

> Hello everybody,
>
> I'm new to the mailing list here, though I've been playing with BioPython
> for quite a while.
>
> I'm having some trouble here. I wanted to display a tree of sequences for
> which I had done a multiple sequence alignment. I tried going through the
> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
> Because I'm still in the testing phase, instead of writing it as a single
> script, I wrote it as a series of scripts that I would execute in order.
>
> The problem I run into is at step 4 in the example, where I "feed the
> alignment to PhyML". My data set is 70 protein sequences, and the trouble I
> run into is that it takes a very, very long time at the "feeding alignment
> to PhyML" step. I tried running the script on my MacBook Pro overnight, and
> even the next morning it was not done. Am I missing something here?
>
> Just to be clear here, aligning the sequences using Muscle was successful,
> and I also managed to output a distance matrix from sample to sample, which
> I used in another downstream pipeline to display the clustering of the
> sequences on a 2D euclidean plane. However, I wanted to have a tree
> representation to validate the clustering results; the trouble is, I can't
> get the _phyml_tree.txt file to be created, which I would then use to draw
> the tree.
>
> Thanks in advance for any help!
>
> Cheers,
> Eric
> -----------------------------------------------------------------------
> Please consider the environment before printing this e-mail. Do you really
> need to print it?
>
> http://about.me/ericmjl
>

From p.j.a.cock at googlemail.com  Thu Apr 11 13:07:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Apr 2013 18:07:05 +0100
Subject: [Biopython] Request from help
In-Reply-To: <CALaGxMixcphikkuHvyr5B8QhOUXX-jUCsbiB3nvGuOLDsyxYMQ@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
	<CALaGxMixcphikkuHvyr5B8QhOUXX-jUCsbiB3nvGuOLDsyxYMQ@mail.gmail.com>
Message-ID: <CAKVJ-_43iHLgbDB9mJLiHzqm-JLbt8xR0yMbLZFgR94cHUnC2w@mail.gmail.com>

On Thu, Apr 11, 2013 at 6:01 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> NCBI Standalone Blast gives you the option of querying the website so that
> you don't have to maintain a local database.
>
> Justin Gibbons

Did you reply to the wrong email? This thread was about alignments and trees.

Peter

From p.j.a.cock at googlemail.com  Thu Apr 11 13:11:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Apr 2013 18:11:49 +0100
Subject: [Biopython] Request from help
In-Reply-To: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
Message-ID: <CAKVJ-_6xVNRreg1yZt_XrhF6OBW87gx_8OjLiXgT7BpTLOK9Og@mail.gmail.com>

On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:
> Hello everybody,
>
> I'm new to the mailing list here, though I've been playing with BioPython
> for quite a while.
>
> I'm having some trouble here. I wanted to display a tree of sequences for
> which I had done a multiple sequence alignment. I tried going through the
> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
> Because I'm still in the testing phase, instead of writing it as a single
> script, I wrote it as a series of scripts that I would execute in order.
>
> The problem I run into is at step 4 in the example, where I "feed the
> alignment to PhyML". My data set is 70 protein sequences, and the trouble I
> run into is that it takes a very, very long time at the "feeding alignment
> to PhyML" step. I tried running the script on my MacBook Pro overnight, and
> even the next morning it was not done. Am I missing something here?
>
> Just to be clear here, aligning the sequences using Muscle was successful,
> and I also managed to output a distance matrix from sample to sample, which
> I used in another downstream pipeline to display the clustering of the
> sequences on a 2D euclidean plane. However, I wanted to have a tree
> representation to validate the clustering results; the trouble is, I can't
> get the _phyml_tree.txt file to be created, which I would then use to draw
> the tree.
>
> Thanks in advance for any help!
>
> Cheers,
> Eric

Hi Eric,

So this part is getting stuck (or taking a very long time):

#Feed the alignment to PhyML using the command line wrapper:
from Bio.Phylo.Applications import PhymlCommandline
cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa',
model='WAG', alpha='e', bootstrap=100)
out_log, err_log = cmdline()

At that point is the computer active (high CPU load as measured
via the task manager / system monitor / top / etc)?

I would suggest trying PhyML at the command line by hand. First
check the command that Biopython should be running:

print(cmdline)

That may give you visual progress on screen. My guess is simply
that this is just slow - you are only running 100 bootstraps, but
perhaps each one is taking a while and that adds up.

You said you had 70 protein sequences - how many columns
are there in the alignment? That can also affect run times.

Peter

From nuin at genedrift.org  Thu Apr 11 13:05:57 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 11 Apr 2013 13:05:57 -0400
Subject: [Biopython] Request from help
In-Reply-To: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
Message-ID: <CEA2A651-7F21-405C-B4D6-DF098E7704EE@genedrift.org>


On 2013-04-11, at 12:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:

> Hello everybody,
> 
> I'm new to the mailing list here, though I've been playing with BioPython
> for quite a while.
> 
> I'm having some trouble here. I wanted to display a tree of sequences for
> which I had done a multiple sequence alignment. I tried going through the
> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
> Because I'm still in the testing phase, instead of writing it as a single
> script, I wrote it as a series of scripts that I would execute in order.
> 
> The problem I run into is at step 4 in the example, where I "feed the
> alignment to PhyML". My data set is 70 protein sequences, and the trouble I
> run into is that it takes a very, very long time at the "feeding alignment
> to PhyML" step. I tried running the script on my MacBook Pro overnight, and
> even the next morning it was not done. Am I missing something here?
> 

Hi

With 70 OTUs you have roughly 5.00E115 possible unrooted trees. It is guaranteed to take a long time, independent of the parameters you are using in PhyML. Try with a smaller number of taxa, just for testing purposes, and, depending on the complexity of your protein phylogeny, give your computer some weeks to actually generate a result.

This is not a BioPython issue; it is more a phylogenetics one.
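
The tree count quoted above can be reproduced with the standard formula
for unrooted, fully bifurcating trees on n taxa, (2n-5)!! = 1 x 3 x 5 x
... x (2n-5). The sketch below is plain Python; the function name is
illustrative.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted, fully bifurcating tree topologies: (2n-5)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n <= 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


print(unrooted_tree_count(4))              # -> 3
print(len(str(unrooted_tree_count(70))))   # digits in the count for 70 taxa
```

For 70 taxa the result has 116 digits, i.e. it is of the order 10^115.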

Cheers
Paulo


> Just to be clear here, aligning the sequences using Muscle was successful,
> and I also managed to output a distance matrix from sample to sample, which
> I used in another downstream pipeline to display the clustering of the
> sequences on a 2D euclidean plane. However, I wanted to have a tree
> representation to validate the clustering results; the trouble is, I can't
> get the _phyml_tree.txt file to be created, which I would then use to draw
> the tree.
> 
> Thanks in advance for any help!
> 
> Cheers,
> Eric
> -----------------------------------------------------------------------
> Please consider the environment before printing this e-mail. Do you really
> need to print it?
> 
> http://about.me/ericmjl


From ericmajinglong at gmail.com  Thu Apr 11 13:20:14 2013
From: ericmajinglong at gmail.com (Eric Ma)
Date: Thu, 11 Apr 2013 13:20:14 -0400
Subject: [Biopython] Request from help
In-Reply-To: <CAKVJ-_6xVNRreg1yZt_XrhF6OBW87gx_8OjLiXgT7BpTLOK9Og@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
	<CAKVJ-_6xVNRreg1yZt_XrhF6OBW87gx_8OjLiXgT7BpTLOK9Og@mail.gmail.com>
Message-ID: <CAK-i=xgpN3OfRMkfB8CguEUNvKEwCQ6gnB_-obPoG5iMSOJ7+Q@mail.gmail.com>

Hi Peter and Paulo,

Thank you for your feedback, much appreciated! I still have very sparse
knowledge about phylogenies, and especially the run times needed to build
the trees, so any new knowledge is appreciated!

The sequences I'm using are full Influenza A HA protein sequences, so we're
talking about 1700-1750 amino acids being aligned together. The multiple
sequence alignment for 70 sequences doesn't take long - on the order of
minutes on my laptop. It's the "feeding into PhyML" portion that, for some
reason, takes a long time.

With that said, I do have a full distance matrix as one of the outputs from
a previous script in this script series, in addition to the multiple
sequence alignment. I have been able to feed the distance matrix into a
separate clustering algorithm from scikit-learn, and I was able to
successfully identify six clusters of sequences in there. Hence, I wanted
to use a phylogenetic tree to confirm what I'm seeing with the clustering
algorithm - it's basically two separate representations of the same data.

I have heard that it is possible to create a tree from the distance matrix,
and I was thinking this might be an alternative to feeding the alignment
into PhyML. Does anybody know how to do this using BioPython?

Cheers,
Eric
-----------------------------------------------------------------------
Please consider the environment before printing this e-mail. Do you really
need to print it?

http://about.me/ericmjl


On Thu, Apr 11, 2013 at 1:11 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:
> > Hello everybody,
> >
> > I'm new to the mailing list here, though I've been playing with BioPython
> > for quite a while.
> >
> > I'm having some trouble here. I wanted to display a tree of sequences for
> > which I had done a multiple sequence alignment. I tried going through the
> > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline
> ).
> > Because I'm still in the testing phase, instead of writing it as a single
> > script, I wrote it as a series of scripts that I would execute in order.
> >
> > The problem I run into is at step 4 in the example, where I "feed the
> > alignment to PhyML". My data set is 70 protein sequences, and the
> trouble I
> > run into is that it takes a very, very long time at the "feeding
> alignment
> > to PhyML" step. I tried running the script on my MacBook Pro overnight,
> and
> > even the next morning it was not done. Am I missing something here?
> >
> > Just to be clear here, aligning the sequences using Muscle was
> successful,
> > and I also managed to output a distance matrix from sample to sample,
> which
> > I used in another downstream pipeline to display the clustering of the
> > sequences on a 2D euclidean plane. However, I wanted to have a tree
> > representation to validate the clustering results; the trouble is, I
> can't
> > get the _phyml_tree.txt file to be created, which I would then use to
> draw
> > the tree.
> >
> > Thanks in advance for any help!
> >
> > Cheers,
> > Eric
>
> Hi Eric,
>
> So this part is getting stuck (or taking a very long time):
>
> #Feed the alignment to PhyML using the command line wrapper:
> from Bio.Phylo.Applications import PhymlCommandline
> cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa',
> model='WAG', alpha='e', bootstrap=100)
> out_log, err_log = cmdline()
>
> At that point is the computer active (high CPU load as measured
> via the task manager / system monitor / top / etc)?
>
> I would suggest trying PHYML at the command line by hand, first
> check the command the Biopython should be running:
>
> print cmdline
>
> That may give you visual progress on screen. My guess is simply
> that this is just slow - you are only running 100 bootstraps, but
> perhaps each one is taking a while and that adds up.
>
> You said you had 70 protein sequences - how many columns
> are there in the alignment? That can also affect run times.
>
> Peter
>

From nuin at genedrift.org  Thu Apr 11 13:33:05 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 11 Apr 2013 13:33:05 -0400
Subject: [Biopython] Request from help
In-Reply-To: <CAK-i=xgpN3OfRMkfB8CguEUNvKEwCQ6gnB_-obPoG5iMSOJ7+Q@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
	<CAKVJ-_6xVNRreg1yZt_XrhF6OBW87gx_8OjLiXgT7BpTLOK9Og@mail.gmail.com>
	<CAK-i=xgpN3OfRMkfB8CguEUNvKEwCQ6gnB_-obPoG5iMSOJ7+Q@mail.gmail.com>
Message-ID: <8176FA21-39F6-405A-B338-94D87E6BB7B3@genedrift.org>


On 2013-04-11, at 1:20 PM, Eric Ma <ericmajinglong at gmail.com> wrote:

> Hi Peter and Paulo,
> 
> Thank you for your feedback, much appreciated! I still have very sparse
> knowledge about phylogenies, and especially the run times needed to build
> the trees, so any new knowledge is appreciated!
> 
> The sequences I'm using are full Influenza A HA protein sequences, so we're
> talking about 1700-1750 amino acids being aligned together. The multiple
> sequence alignment for 70 sequences doesn't take long - on the order of
> minutes on my laptop. It's the "feeding into PhyML" portion that, for some
> reason, takes a long time.


Alignment time is much smaller than any phylogeny calculation on your data size. The number of amino acids is not that important for the final run time, as the ML calculation itself is quite fast; arranging the branches is the main bottleneck.

There's no easy solution for this. Maybe you can try some other approaches: some that won't be as good as ML (Neighbour Joining), and some that might be as good (Bayes) but take some time too.
> 
> With that said, I do have a full distance matrix as one of the outputs from
> a previous script in this script series, in addition to the multiple
> sequence alignment. I have been able to feed the distance matrix into a
> separate clustering algorithm from scikit-learn, and I was able to
> successfully identify six clusters of sequences in there. Hence, I wanted
> to use a phylogenetic tree to confirm what I'm seeing with the clustering
> algorithm - it's basically two separate representations of the same data.
> 

The distance matrix can be used to generate a diagram. I wouldn't call it a phylogenetic tree, but it can give you some ideas. One quick way to check your tree is to use a Neighbour Joining approach: you can try MEGA with your alignment file, and the calculations will be faster.
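
To make the neighbour-joining idea concrete, here is a minimal pure-Python sketch that works directly on a lower-triangle distance matrix. The function name, the five taxon labels and the toy distances are all made up for illustration, and this is not how MEGA or Biopython implement it. Later Biopython releases ship a Bio.Phylo.TreeConstruction module with a DistanceTreeConstructor, which would be the practical choice once it is available to you.

```python
from itertools import combinations


def neighbour_joining(names, lower_triangle):
    """Return (joins, last_edge) for a lower-triangle distance matrix.

    lower_triangle[i][j] is the distance between names[i] and names[j]
    for j <= i (the diagonal is included but ignored).
    """
    dist = {}
    for i, row in enumerate(lower_triangle):
        for j in range(i):
            dist[frozenset((names[i], names[j]))] = float(row[j])

    def d(x, y):
        return dist[frozenset((x, y))]

    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all the others.
        total = {x: sum(d(x, y) for y in nodes if y != x) for x in nodes}
        # Join the pair that minimises the neighbour-joining Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d(p[0], p[1]) - total[p[0]] - total[p[1]],
        )
        len_a = 0.5 * d(a, b) + (total[a] - total[b]) / (2.0 * (n - 2))
        len_b = d(a, b) - len_a
        parent = "(%s,%s)" % (a, b)
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k != a and k != b:
                dist[frozenset((parent, k))] = 0.5 * (d(a, k) + d(b, k) - d(a, b))
        nodes = [k for k in nodes if k != a and k != b] + [parent]
        joins.append((parent, a, len_a, b, len_b))
    last_edge = (nodes[0], nodes[1], d(nodes[0], nodes[1]))
    return joins, last_edge


# Toy data only: five hypothetical taxa with made-up distances.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0],
    [5, 0],
    [9, 10, 0],
    [9, 10, 8, 0],
    [8, 9, 7, 3, 0],
]
joins, last_edge = neighbour_joining(names, matrix)
for parent, left, left_len, right, right_len in joins:
    print("%s: %s=%.2f, %s=%.2f" % (parent, left, left_len, right, right_len))
print("final edge between %s and %s: %.2f" % last_edge)
```

Each step joins two nodes and replaces them by one internal node, so 70 sequences need only 68 joins. That is why this is so much quicker than an ML search with bootstraps.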

Cheers
Paulo


> I have heard that it is possible to create a tree from the distance matrix,
> and I was thinking this might be an alternative to feeding the alignment
> into PhyML. Does anybody know how to do this using BioPython?
> 
> Cheers,
> Eric
> -----------------------------------------------------------------------
> Please consider the environment before printing this e-mail. Do you really
> need to print it?
> 
> http://about.me/ericmjl
> 
> 
> On Thu, Apr 11, 2013 at 1:11 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:
> 
>> On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:
>>> Hello everybody,
>>> 
>>> I'm new to the mailing list here, though I've been playing with BioPython
>>> for quite a while.
>>> 
>>> I'm having some trouble here. I wanted to display a tree of sequences for
>>> which I had done a multiple sequence alignment. I tried going through the
>>> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline
>> ).
>>> Because I'm still in the testing phase, instead of writing it as a single
>>> script, I wrote it as a series of scripts that I would execute in order.
>>> 
>>> The problem I run into is at step 4 in the example, where I "feed the
>>> alignment to PhyML". My data set is 70 protein sequences, and the
>> trouble I
>>> run into is that it takes a very, very long time at the "feeding
>> alignment
>>> to PhyML" step. I tried running the script on my MacBook Pro overnight,
>> and
>>> even the next morning it was not done. Am I missing something here?
>>> 
>>> Just to be clear here, aligning the sequences using Muscle was
>> successful,
>>> and I also managed to output a distance matrix from sample to sample,
>> which
>>> I used in another downstream pipeline to display the clustering of the
>>> sequences on a 2D euclidean plane. However, I wanted to have a tree
>>> representation to validate the clustering results; the trouble is, I
>> can't
>>> get the _phyml_tree.txt file to be created, which I would then use to
>> draw
>>> the tree.
>>> 
>>> Thanks in advance for any help!
>>> 
>>> Cheers,
>>> Eric
>> 
>> Hi Eric,
>> 
>> So this part is getting stuck (or taking a very long time):
>> 
>> #Feed the alignment to PhyML using the command line wrapper:
>> from Bio.Phylo.Applications import PhymlCommandline
>> cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa',
>> model='WAG', alpha='e', bootstrap=100)
>> out_log, err_log = cmdline()
>> 
>> At that point is the computer active (high CPU load as measured
>> via the task manager / system monitor / top / etc)?
>> 
>> I would suggest trying PHYML at the command line by hand, first
>> check the command the Biopython should be running:
>> 
>> print cmdline
>> 
>> That may give you visual progress on screen. My guess is simply
>> that this is just slow - you are only running 100 bootstraps, but
>> perhaps each one is taking a while and that adds up.
>> 
>> You said you had 70 protein sequences - how many columns
>> are there in the alignment? That can also affect run times.
>> 
>> Peter
>> 
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython


From jgibbons1 at mail.usf.edu  Thu Apr 11 14:10:32 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Thu, 11 Apr 2013 14:10:32 -0400
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
In-Reply-To: <CAKVJ-_6y_q8e=EV5+1vCCeRY5c8z-brOsyHWW960dG0bX=ZYEg@mail.gmail.com>
References: <5166805F.8060603@googlemail.com>
	<CAKVJ-_6y_q8e=EV5+1vCCeRY5c8z-brOsyHWW960dG0bX=ZYEg@mail.gmail.com>
Message-ID: <CALaGxMjGCOAinAixo5q5UWxQ-nfCNe76q5dLR2Gpca=3Q0ihLQ@mail.gmail.com>

NCBI Standalone Blast gives you the option of querying the website so that
you don't have to maintain a local database.

Justin Gibbons

P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it
correct this time.


On Thu, Apr 11, 2013 at 5:43 AM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade
> <matthiasschade.de at googlemail.com> wrote:
> > Hello everyone,
> >
> > is there an upper limit to how many sequences I can query via
> NCBIWWW.qblast
> > at once?
>
> There are sometimes limits on the URL length, especially if going via
> firewalls and proxies, so that may be one factor.
>
> At the NCBI end, I'm not sure what limits they impose on this:
> http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
>
> > Sending up to 150 sequences each of 24mer length in a single string
> > everything works fine. But now, I have tried the same for a string
> > containing about 900 sequences. On good times, it takes the NCBI-server
> > about 5min to send an answer. I save the answer and later open and parse
> the
> > file by other functions in my code. However, even though I have queried
> the
> > same 900 sequences, the resulting output-file varies in length (10
> > MB<x<20MB) and always at least misses the correct termination-tag in
> > "<\BlastOutput>" or even misses more (this does not happen why querying
> 150
> > sequences or less).
> >
> > I would guess once the server has started sending its answers, there
> might
> > only be a limited time NCBIWWW.qblast waits for follow up packets ... and
> > thus depending on the current server-load, the NCBIWWW.qblast-function
> > simply decides to terminate waiting for incomming data after some time,
> > resulting in my blast-output-files to vary in length. Could anyone
> correct
> > or verify this long-fetched hypothesis?
> >
> > My core-lines are:
> >
> > orgn='Mus Musculus' #on anything else
> > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100,
> > entrez_query=str(orgn+"[orgn]"))
> > save_file = open ('myblast_result.xml',"w")
> > save_file.write(result.read())
> >
> > Best regards,
> > Matthias
>
> I think you've reach the scale where it would be better to run blastn
> locally - ideally on a cluster if you have access to one. You can
> download the whole NT database from here - most departments
> running BLAST with their own Linux servers will have a central copy
> which is kept automatically up to date:
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/
>
> If you don't have those kinds of resources, then you can even
> run BLAST on your own Windows machine - although I'm not
> sure how much RAM would be recommended for the NT
> database which is pretty big.
>
> Regards,
>
> Peter
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From p.j.a.cock at googlemail.com  Thu Apr 11 14:54:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Apr 2013 19:54:50 +0100
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
In-Reply-To: <CALaGxMjGCOAinAixo5q5UWxQ-nfCNe76q5dLR2Gpca=3Q0ihLQ@mail.gmail.com>
References: <5166805F.8060603@googlemail.com>
	<CAKVJ-_6y_q8e=EV5+1vCCeRY5c8z-brOsyHWW960dG0bX=ZYEg@mail.gmail.com>
	<CALaGxMjGCOAinAixo5q5UWxQ-nfCNe76q5dLR2Gpca=3Q0ihLQ@mail.gmail.com>
Message-ID: <CAKVJ-_4p55n9PCJOs4mv=pNniDrYXc0GUtwM_HG-7QZzxFNnFg@mail.gmail.com>

On Thursday, April 11, 2013, Justin Gibbons wrote:

> NCBI Standalone Blast gives you the option of querying the website so that
> you don't have to maintain a local database.


Good point - the BLAST+ binaries added the -remote option
which does that. Worth exploring as it should know and
obey the NCBI limits automatically.
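
As a rough sketch of what that could look like from Python, the snippet below only assembles a blastn command line using -remote. The helper name and the file names are made up for illustration; the flags mirror the settings in Matthias's qblast call (nt database, expect=100, organism filter, XML output). Nothing is executed here, since that needs BLAST+ installed and network access.

```python
import shlex


def remote_blastn_command(query_path, out_path, organism=None, evalue=100):
    """Build (but do not run) a BLAST+ blastn command that searches at NCBI."""
    cmd = [
        "blastn",
        "-query", query_path,
        "-db", "nt",
        "-remote",            # send the search to the NCBI servers
        "-evalue", str(evalue),
        "-outfmt", "5",       # XML output
        "-out", out_path,
    ]
    if organism:
        cmd += ["-entrez_query", "%s[orgn]" % organism]
    return cmd


cmd = remote_blastn_command(
    "queries.fasta", "myblast_result.xml", organism="Mus musculus"
)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it:
# import subprocess
# subprocess.check_call(cmd)
```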


>
> Justin Gibbons
>
> P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it
> correct this time.
>
>
Easily done, don't worry about it.

Peter

From dan837446 at gmail.com  Thu Apr 11 16:51:13 2013
From: dan837446 at gmail.com (Dan)
Date: Fri, 12 Apr 2013 08:51:13 +1200
Subject: [Biopython] Biopython Digest, Vol 124, Issue 9
In-Reply-To: <mailman.3.1365696001.2331.biopython@lists.open-bio.org>
References: <mailman.3.1365696001.2331.biopython@lists.open-bio.org>
Message-ID: <CAExy72jLbfaFiLAqDOOPYTQg6g14f+i9x7css_=ojtPwgy_grw@mail.gmail.com>

This is peripherally relevant to the question: I asked Tao Tao of NCBI user
services about general guidelines for remote blast, and got this response:

"In general, the key is to reduce the hits to BLAST server:
At the search step, DO NOT submit searches that contain only single
sequence! You need to batch the query and submit a set in a single search
request.
At the result polling step, you should reduce the result checking by
spacing them out, and start checking for results after a delay (a few
minutes).
The XML result for batch queries is a bit peculiar each query is wrapped
around  <Iteration> tag
You are better off leaving the other conditions default and post-process it
to get the top hits"
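
Since each query in a batch gets its own <Iteration> element, counting them is a cheap sanity check that a saved result is complete before parsing it further. This is only a sketch using the standard library: the function name is made up, and the sample XML is a heavily trimmed stand-in for real BLAST output. For results of 10-20 MB a streaming parser would probably be a better fit.

```python
import xml.etree.ElementTree as ET


def count_iterations(xml_text):
    """Return the number of <Iteration> elements, or None if the XML is broken.

    A truncated download (for example one missing the closing
    </BlastOutput> tag) is not well-formed and therefore yields None.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return len(root.findall(".//Iteration"))


# Trimmed, hypothetical example of a two-query batch result.
complete = (
    "<BlastOutput>"
    "<BlastOutput_iterations>"
    "<Iteration><Iteration_query-def>seq1</Iteration_query-def></Iteration>"
    "<Iteration><Iteration_query-def>seq2</Iteration_query-def></Iteration>"
    "</BlastOutput_iterations>"
    "</BlastOutput>"
)
truncated = complete[:-20]  # simulate a download that stopped early

print(count_iterations(complete))
print(count_iterations(truncated))
```

If the count does not match the number of sequences you submitted, or the function returns None, the batch is worth resubmitting rather than parsing.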

Also, it's best to search between 9PM and 5AM Eastern Standard Time and at
weekends.
Personally I seem to encounter glitches using batches above 100, but it's so
specific to your particular workplace that I'm not sure if that's a good
guideline.
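
A minimal sketch of the batching, assuming the queries are already in one FASTA-formatted string. The function name and the default of 100 are my own choices (100 is just the point where I start seeing glitches), not an NCBI limit. Each yielded chunk could then be passed as the sequence argument of one NCBIWWW.qblast call.

```python
def batch_fasta(fasta_text, batch_size=100):
    """Yield FASTA strings holding at most batch_size records each."""
    records, current = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">") and current:
            # A new header closes the previous record.
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    for start in range(0, len(records), batch_size):
        yield "\n".join(records[start:start + batch_size]) + "\n"


# Made-up input: 250 identical 24-mers with distinct identifiers.
demo = "".join(">seq%d\n%s\n" % (i, "ACGT" * 6) for i in range(250))
batches = list(batch_fasta(demo, batch_size=100))
sizes = [batch.count(">") for batch in batches]
print(sizes)
```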



From p.j.a.cock at googlemail.com  Fri Apr 12 05:49:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 12 Apr 2013 10:49:31 +0100
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
In-Reply-To: <5166805F.8060603@googlemail.com>
References: <5166805F.8060603@googlemail.com>
Message-ID: <CAKVJ-_4yYoXoV5X2T_7MHL2PjLZnx0-sHrHMcpfCZXNsXzDDWw@mail.gmail.com>

Dan replied via the digest (summary emails rather than individual emails) here:
http://lists.open-bio.org/pipermail/biopython/2013-April/008507.html

On Thu, Apr 11, 2013 at 9:51 PM, Dan <dan837446 at gmail.com> wrote:
> This is peripherally relevant to the question, I asked Tao Tao of NCBI user
> services about general guidelines for remote blast, and got this response:
>
> "In general, the key is to reduce the hits to BLAST server:
> At the search step, DO NOT submit searches that contain only single
> sequence! You need to batch the query and submit a set in a single search
> request.
> At the result polling step, you should reduce the result checking by
> spacing them out, and start checking for results after a delay (a few
> minutes).
> The XML result for batch queries is a bit peculiar each query is wrapped
> around  <Iteration> tag
> You are better off leaving the other conditions default and post-process it
> to get the top hits"
>
> Also it's best to search between 9PM and 5AM Eastern Standard time and at
> weekends.
> Personally I seem to encounter glitches using batches above 100 but it's so
> specific to your particular workplace that I'm not sure if that's a good
> guideline.
>

Perhaps Biopython's QBLAST wrapper could benefit from adaptive
time delays in the polling step - at the moment it just checks every
three seconds.
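
Roughly what I have in mind, as a sketch only: this is not the current Biopython code, and the names, the growth factor and the two-minute cap are arbitrary assumptions. The check and sleep functions are passed in so the logic can be tried without touching the network.

```python
def polling_delays(initial=3.0, factor=1.5, maximum=120.0):
    """Yield a growing sequence of delays in seconds, capped at maximum."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def wait_until_ready(check_ready, sleep, max_checks=50):
    """Sleep, then check; return the number of checks that were needed."""
    for attempt, delay in enumerate(polling_delays(), start=1):
        sleep(delay)
        if check_ready():
            return attempt
        if attempt >= max_checks:
            raise RuntimeError("gave up after %d checks" % attempt)


# Dry run with fake functions: the result becomes "ready" on the fourth check.
slept = []
answers = iter([False, False, False, True])
attempts = wait_until_ready(lambda: next(answers), slept.append)
print(attempts, slept)
```

In real use sleep would be time.sleep and check_ready would ask the server whether the results for a given request are available.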

Peter

From john at picloud.com  Fri Apr 12 19:11:43 2013
From: john at picloud.com (John Riley)
Date: Fri, 12 Apr 2013 16:11:43 -0700
Subject: [Biopython] BioPython now available on PiCloud by default
Message-ID: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>

Hello,

We've had some requests for BioPython to be deployed on PiCloud [1]. While
any user could always create a custom environment, and install the latest
version themselves [2], we've decided to address the issue directly by
adding BioPython (1.60) into the default suite of scientific tools on
PiCloud.

In short, to offload a Python function or program that uses BioPython, you
don't need to do any setup! The instructions for using other scientific
tools work just the same [3]. Hope this helps!

[1] http://www.picloud.com
[2] http://docs.picloud.com/environment.html
[3] http://docs.picloud.com/howto/pyscientifictools.html

Best Regards,
John

--
John Riley
PiCloud, Inc.

From jgibbons1 at mail.usf.edu  Sat Apr 13 16:13:56 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Sat, 13 Apr 2013 16:13:56 -0400
Subject: [Biopython] Cookbook suggestion
Message-ID: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>

I want to add the following to the cookbook but I am unable to create an
account.

#using SeqIO.write() without holding records in memory.

from Bio import SeqIO


seq_ids = set()  # create an empty set to hold the sequence IDs
# Can be searched by sequence ID but is not held in memory:
indexed_fasta = SeqIO.index(file_path, 'fasta')

for seq_record in SeqIO.parse(file_path, 'fasta'):
    # Filter according to some criteria:
    seq_ids.add(seq_record.id)

# write the fasta records to a new file using SeqIO.write()

SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path,
            'fasta')

So if someone who can edit the cookbook wants to add it feel free to.

Justin Gibbons

From p.j.a.cock at googlemail.com  Sat Apr 13 16:27:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 13 Apr 2013 21:27:24 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
Message-ID: <CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>

Hi Justin,

On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> I want to add the following to the cookbook but I am unable to create an
> account.

Hmm - we should fix that. Is there a specific error message
from the wiki?

> #using SeqIO.write() without holding records in memory.
>
> from Bio import SeqIO
>
>
> seq_ids=set() #create an empty set to hold the sequence IDs.
> indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by sequence
> ID but is not held in memory
>
> for seq_record in SeqIO.parse(file_path, 'fasta'):
>     #Filter according to some critria:
>         seq_ids.add(seq_record.id)

Why do you call SeqIO.index, but not use it and instead get
the ID list by doing a full parse of the file? Note that calling
SeqIO.index is likely faster than SeqIO.parse because the
index code doesn't actually load the sequence information
etc - just the record identifier. This speed difference is more
obvious on heavier file formats like GenBank. e.g. These
single lines both get all the identifiers as a list:

seq_ids = list(SeqIO.index(file_path, 'fasta').keys())

vs:

seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')]

Also note that using a set rather than a list for the ids
means the order is lost - which may be important.

> #write the fasta records to a new file using SeqIO.write()
>
> SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path,
> 'fasta')
>

That last line uses a list comprehension,
[indexed_fasta[seq_id] for seq_id in seq_ids]

That will therefore load all the records into memory as a list of
SeqRecord objects, which can be avoided with a generator expression:

(indexed_fasta[seq_id] for seq_id in seq_ids)

i.e. round brackets not square.
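
To see the difference without needing a sequence file, here is a small sketch using a made-up stand-in class (CountingIndex is hypothetical, not part of Biopython) that counts how often a record is fetched. With square brackets every lookup happens up front; with round brackets nothing is fetched until the consumer asks for the next item.

```python
class CountingIndex(object):
    """Hypothetical stand-in for the dict-like object from SeqIO.index()."""

    def __init__(self, records):
        self._records = records
        self.loads = 0

    def __getitem__(self, key):
        self.loads += 1  # count every record fetch
        return self._records[key]


index = CountingIndex({"a": "ACGT", "b": "GGCC", "c": "TTAA"})
ids = ["a", "b", "c"]

eager = [index[i] for i in ids]      # list comprehension
loads_after_list = index.loads       # all three records already fetched

index.loads = 0
lazy = (index[i] for i in ids)       # generator expression
loads_after_genexp = index.loads     # nothing fetched yet
first = next(lazy)
loads_after_first = index.loads      # exactly one record fetched so far

print(loads_after_list, loads_after_genexp, loads_after_first)
```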

> So if someone who can edit the cookbook wants to add it feel free to.
>
> Justin Gibbons

Feedback on the documentation and efforts to improve it
are always welcome. However, I'm not sure what your example
is trying to do yet - it seems to rewrite a FASTA file with the
records in a new order (with the order given by however
Python sorts the set of IDs).

Thanks,

Peter

From jgibbons1 at mail.usf.edu  Sun Apr 14 13:53:26 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Sun, 14 Apr 2013 13:53:26 -0400
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
Message-ID: <CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>

My only goal was to demonstrate how to use SeqIO.write without holding all
of the sequence records in memory by using a generator expression:

    SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
new_file_path,'fasta')

Everything else was just to provide context for the SeqIO.write() function,
but it ended up just being confusing.

I am assuming that you want to check the individual fasta records for
specific criteria and then write those that match the criteria to a new
file, which is why I wrote this:

for seq_record in SeqIO.parse(file_path, 'fasta'):
    # Filter according to some criteria:
    seq_ids.add(seq_record.id)

For example, you can create individual sets holding the sequence IDs of
sequences that are within a given size range and aren't repetitive, so
that seq_ids = correct_length_set.intersection(non_repetitive_set)

You need the indexed fasta so that you can get a copy of the sequence
records that match your criteria:

# Can be searched by sequence ID but is not held in memory:
indexed_fasta = SeqIO.index(file_path, 'fasta')


On Sat, Apr 13, 2013 at 4:27 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> Hi Justin,
>
> On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons <jgibbons1 at mail.usf.edu>
> wrote:
> > I want to add the following to the cookbook but I am unable to create an
> > account.
>
> Hmm - we should fix that. Is there a specific error message
> from the wiki?
>
> > #using SeqIO.write() without holding records in memory.
> >
> > from Bio import SeqIO
> >
> >
> > seq_ids=set() #create an empty set to hold the sequence IDs.
> > indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by
> sequence
> > ID but is not held in memory
> >
> > for seq_record in SeqIO.parse(file_path, 'fasta'):
> >     #Filter according to some critria:
> >         seq_ids.add(seq_record.id)
>
> Why do you call SeqIO.index, but not use it and instead get
> the ID list by doing a full parse of the file? Note that calling
> SeqIO.index is likely faster than SeqIO.parse because the
> index code doesn't actually load the sequence information
> etc - just the record identifier. This speed difference is more
> obvious on heavier file formats like GenBank. e.g. These
> single lines both get all the identifiers as a list:
>
> seq_ids = list(SeqIO.index(file_path, 'fasta').keys())
>
> vs:
>
> seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')]
>
> Also note that using a set rather than a list for the ids
> means the order is lost - which may be important.
>
> > #write the fasta records to a new file using SeqIO.write()
> >
> > SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path,
> > 'fasta')
> >
>
> That last line uses a list comprehension,
> [indexed_fasta[seq_id] for seq_id in seq_ids]
>
> That will therefore load all the records into memory as a list of
> SeqRecord objects, which can be avoided with a generator expression:
>
> (indexed_fasta[seq_id] for seq_id in seq_ids)
>
> i.e. round brackets not square.
>
> > So if someone who can edit the cookbook wants to add it feel free to.
> >
> > Justin Gibbons
>
> Feedback on the documentation and efforts to improve it
> are always welcome. However, I'm not sure what your example
> is trying to do yet - it seems to rewrite a FASTA file with the
> records in a new order (with the order given by however
> Python sorts the set of IDs).
>
> Thanks,
>
> Peter
>

From jgibbons1 at mail.usf.edu  Sun Apr 14 13:58:53 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Sun, 14 Apr 2013 13:58:53 -0400
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
	<CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
Message-ID: <CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>

Sorry I accidentally sent the last email.

You need the indexed fasta to get a copy of the sequence records that match
your criteria:

indexed_fasta=SeqIO.index(file_path, 'fasta')
SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
new_file_path,'fasta')
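
Putting the pieces together, a self-contained sketch of the whole recipe might look like the following. The function name, the length criterion and the tiny test file are made up for illustration. It keeps the IDs in a list rather than a set so that the original record order survives, as Peter pointed out.

```python
import os
import tempfile

from Bio import SeqIO


def filter_fasta_by_length(in_path, out_path, min_len, max_len):
    """Copy records whose length is within [min_len, max_len]; return the count."""
    # First pass: stream through the file and remember only the matching IDs.
    keep_ids = [
        rec.id
        for rec in SeqIO.parse(in_path, "fasta")
        if min_len <= len(rec.seq) <= max_len
    ]
    indexed = SeqIO.index(in_path, "fasta")
    try:
        # Generator expression: records are fetched one at a time as written.
        return SeqIO.write((indexed[i] for i in keep_ids), out_path, "fasta")
    finally:
        indexed.close()


# Tiny made-up input file, just to exercise the function.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.fasta")
out_path = os.path.join(workdir, "filtered.fasta")
with open(in_path, "w") as handle:
    handle.write(">short\nACGT\n>medium\nACGTACGTAC\n")
    handle.write(">long\n" + "A" * 50 + "\n>medium2\nGGGGGGGG\n")

written = filter_fasta_by_length(in_path, out_path, 5, 20)
kept = [rec.id for rec in SeqIO.parse(out_path, "fasta")]
print(written, kept)
```

For a single simple criterion like this one, a generator over SeqIO.parse alone would do. The index is useful when the IDs come from somewhere else, such as the set intersection described earlier.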

As for editing the wiki, when I click on "Login with OpenID" I get sent to a
blank page. I also tried clicking on "Login" and tried to create a new
account, and was told "The action you have requested is limited to users in
the group: Administrators
<http://biopython.org/w/index.php?title=Biopython:Administrators&action=edit&redlink=1>."


On Sun, Apr 14, 2013 at 1:53 PM, Justin Gibbons <jgibbons1 at mail.usf.edu>wrote:

> My only goal was to demonstrate how to use SeqIO.write without holding all
> of the sequence records in memory by using a generator expression:
>
>     SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> new_file_path,'fasta')
>
> Everything else was just to provide context for the SeqIO.write()
> function, but it just ended up just being confusing.
>
> I am assuming that you want to check the individual fasta records for
> specific criteria and then write those that match the criteria to a new
> file. Which is why I wrote this:
>
> for seq_record in SeqIO.parse(file_path, 'fasta'):
>      #Filter according to some criteria:
>          seq_ids.add(seq_record.id)
>
> For example, you can create individual sets holding the sequence IDs of
> sequences that are within a given size range and aren't repetitive, so
> that seq_ids=correct_length_set.intersection(non_repetitive_set)
>
> You need the indexed fasta so that you can get a copy of the sequence
> records that match your criteria:
>
> indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by sequence ID but is not held in memory
>
>
>
>
>
> On Sat, Apr 13, 2013 at 4:27 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:
>
>> Hi Justin,
>>
>> On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons <jgibbons1 at mail.usf.edu>
>> wrote:
>> > I want to add the following to the cookbook but I am unable to create an
>> > account.
>>
>> Hmm - we should fix that. Is there a specific error message
>> from the wiki?
>>
>> > #using SeqIO.write() without holding records in memory.
>> >
>> > from Bio import SeqIO
>> >
>> >
>> > seq_ids=set() #create an empty set to hold the sequence IDs.
>> > indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by sequence ID but is not held in memory
>> >
>> > for seq_record in SeqIO.parse(file_path, 'fasta'):
>> >     #Filter according to some criteria:
>> >         seq_ids.add(seq_record.id)
>>
>> Why do you call SeqIO.index, but not use it and instead get
>> the ID list by doing a full parse of the file? Note that calling
>> SeqIO.index is likely faster than SeqIO.parse because the
>> index code doesn't actually load the sequence information
>> etc - just the record identifier. This speed difference is more
>> obvious on heavier file formats like GenBank. e.g. These
>> single lines both get all the identifiers as a list:
>>
>> seq_ids = list(SeqIO.index(file_path, 'fasta').keys())
>>
>> vs:
>>
>> seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')]
>>
>> Also note that using a set rather than a list for the ids
>> means the order is lost - which may be important.
>>
>> > #write the fasta records to a new file using SeqIO.write()
>> >
>> > SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path, 'fasta')
>> >
>>
>> That last line uses a list comprehension,
>> [indexed_fasta[seq_id] for seq_id in seq_ids]
>>
>> That will therefore load all the records into memory as a list of
>> SeqRecord objects, which can be avoided with a generator expression:
>>
>> (indexed_fasta[seq_id] for seq_id in seq_ids)
>>
>> i.e. round brackets not square.
>>
>> > So if someone who can edit the cookbook wants to add it feel free to.
>> >
>> > Justin Gibbons
>>
>> Feedback on the documentation and efforts to improve it
>> are always welcome. However, I'm not sure what your example
>> is trying to do yet - it seems to rewrite a FASTA file with the
>> records in a new order (with the order given by however
>> Python sorts the set of IDs).
>>
>> Thanks,
>>
>> Peter
>>
>
>

From p.j.a.cock at googlemail.com  Mon Apr 15 06:10:15 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Apr 2013 11:10:15 +0100
Subject: [Biopython] BioPython now available on PiCloud by default
In-Reply-To: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>
References: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>
Message-ID: <CAKVJ-_6Yqhc1LYNoHph5VxAKx5Fwj5j9p9b-GNa8P0ufK2seYQ@mail.gmail.com>

On Sat, Apr 13, 2013 at 12:11 AM, John Riley <john at picloud.com> wrote:
> Hello,
>
> We've had some requests for BioPython to be deployed on PiCloud [1]. While
> any user could always create a custom environment, and install the latest
> version themselves [2], we've decided to address the issue directly by
> adding BioPython (1.60) into the default suite of scientific tools on
> PiCloud.
>
> In short, to offload a Python function or program that uses BioPython, you
> don't need to do any setup! The instructions for using other scientific
> tools work just the same [3]. Hope this helps!
>
> [1] http://www.picloud.com
> [2] http://docs.picloud.com/environment.html
> [3] http://docs.picloud.com/howto/pyscientifictools.html
>
> Best Regards,
> John

Sounds interesting, and you have some very keen users already :)
http://blog.picloud.com/2011/09/27/building-a-biological-database-and-doing-comparative-genomics-in-the-cloud/

Regards,

Peter

From p.j.a.cock at googlemail.com  Mon Apr 15 06:46:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Apr 2013 11:46:53 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
	<CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
	<CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>
Message-ID: <CAKVJ-_4NfTnHTPRAXLvvi3S5ewRNTAQLs0h5AXOXT35OiqiD5g@mail.gmail.com>

On Sun, Apr 14, 2013 at 6:58 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> Sorry I accidentally sent the last email.
>
> You need the indexed fasta to get a copy of the sequence records that match
> your criteria:
>
> indexed_fasta=SeqIO.index(file_path, 'fasta')
> SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> new_file_path,'fasta')

With a simple sequential file format like FASTA where there are no complex
file headers/footers to worry about, this might be the faster route:

with open(new_file_path, "w") as handle:
    for seq_id in seq_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

The idea here is never to parse the records into SeqRecord objects, just
keep them as raw strings in FASTA format. The same idea works well on
GenBank or SwissProt files, which are slower to parse. There are examples
of this in the main Tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Were you intending this to be a self contained cookbook example for:
http://biopython.org/wiki/Category:Cookbook ?

> As for editing the wiki when I click on "Login with OpenID" I get sent to a
> blank page. I also tried clicking on "Login" and tried to create a new
> account and was told "The action you have requested is limited to users in
> the group: Administrators<http://biopython.org/w/index.php?title=Biopython:Administrators&action=edit&redlink=1>
> ."

Thanks - I've passed that on to our volunteer SysAdmin team.

(As an aside, do you have a GitHub account, and do you think
it would be easier to use the wiki hosted on GitHub instead of
our own MediaWiki installation?)

Thanks,

Peter

From swang129 at gmail.com  Mon Apr 15 07:15:23 2013
From: swang129 at gmail.com (Sarah Wang)
Date: Mon, 15 Apr 2013 04:15:23 -0700
Subject: [Biopython] pysam installation errors
In-Reply-To: <CAJfHGQX5Uop2UhXA6a+M6mhiP7+=Vv8xw+0kzhymknAG+yk+5A@mail.gmail.com>
References: <CAJfHGQX5Uop2UhXA6a+M6mhiP7+=Vv8xw+0kzhymknAG+yk+5A@mail.gmail.com>
Message-ID: <CAJfHGQUmZ5-qV7BqnPX4+ybPifeRuORXR=3qBgWYkMtztKkosw@mail.gmail.com>

When I tried to install pysam with "python setup.py install", multiple
> warning messages have been generated (error messages copied below). I can
> not import pysam. How can I resolve them? Thanks
>
> $Python setup.py install
>
> ...
> Compiling module Cython.Plex.Scanners ...
> Compiling module Cython.Plex.Actions ...
> Compiling module Cython.Compiler.Lexicon ...
> Compiling module Cython.Compiler.Scanning ...
> Compiling module Cython.Compiler.Parsing ...
> Compiling module Cython.Compiler.Visitor ...
> Compiling module Cython.Compiler.FlowControl ...
> Compiling module Cython.Compiler.Code ...
> Compiling module Cython.Runtime.refnanny ...
> warning: no files found matching '*.pyx' under directory
> 'Cython/Debugger/Tests'
> warning: no files found matching '*.pxd' under directory
> 'Cython/Debugger/Tests'
> warning: no files found matching '*.h' under directory
> 'Cython/Debugger/Tests'
> warning: no files found matching '*.pxd' under directory 'Cython/Utility'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> /tmp/easy_install-9yggMe/
> Cython-0.18/Cython/Plex/Scanners.c:7117:18:
> warning:
>       unused function '__Pyx_CyFunction_New' [-Wunused-function]
> static PyObject *__Pyx_CyFunction_New(PyTypeObject *type, PyMethodDef
> *ml,...
>                  ^
> 1 warning generated.
> /tmp/easy_install-9yggMe/Cython-0.18/Cython/Plex/Scanners.c:2992:31:
> warning:
>       implicit conversion loses integer precision: 'long' to 'int'
>       [-Wshorten-64-to-32]
>   __pyx_v_self->input_state = __pyx_v_input_state;
>                             ~ ^~~~~~~~~~~~~~~~~~~
> /tmp/easy_install-9yggMe/Cython-0.18/Cython/Plex/Scanners.c:7117:18:
> warning:
>       unused function '__Pyx_CyFunction_New' [-Wunused-function]
> static PyObject *__Pyx_CyFunction_New(PyTypeObject *type, PyMethodDef
> *ml,...
>                  ^
> 2 warnings generated.
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> Adding Cython 0.18 to easy-install.pth file
> Installing cygdb script to /usr/local/bin
> Installing cython script to /usr/local/bin
>
> Installed
> /Library/Python/2.7/site-packages/Cython-0.18-py2.7-macosx-10.8-intel.egg
> Finished processing dependencies for pysam==0.7.4
>
>
> >>> import pysam
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pysam/__init__.py", line 1, in <module>
>     from pysam.csamtools import *
> ImportError: No module named csamtools
>

From p.j.a.cock at googlemail.com  Mon Apr 15 07:27:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Apr 2013 12:27:30 +0100
Subject: [Biopython] pysam installation errors
In-Reply-To: <CAJfHGQUmZ5-qV7BqnPX4+ybPifeRuORXR=3qBgWYkMtztKkosw@mail.gmail.com>
References: <CAJfHGQX5Uop2UhXA6a+M6mhiP7+=Vv8xw+0kzhymknAG+yk+5A@mail.gmail.com>
	<CAJfHGQUmZ5-qV7BqnPX4+ybPifeRuORXR=3qBgWYkMtztKkosw@mail.gmail.com>
Message-ID: <CAKVJ-_4F1X2NKKJ_4fkYUnh-kvau3iFsmugCe2Cs=vVBMk8FsQ@mail.gmail.com>

On Mon, Apr 15, 2013 at 12:15 PM, Sarah Wang <swang129 at gmail.com> wrote:
> When I tried to install pysam with "python setup.py install", multiple
> warning messages have been generated (error messages copied below). I can
> not import pysam. How can I resolve them? Thanks

Hi Sarah,

This is the Biopython mailing list, and while we do discuss other
tools, in this case the pysam Google Group is the best place to ask:

https://groups.google.com/forum/?fromgroups=#!topic/pysam-user-group/tOikIFU_ZFk

Peter

P.S. Those were compiler warnings, not errors, and I would guess they
can be ignored.

From ferreirafm at usp.br  Mon Apr 15 08:34:12 2013
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Mon, 15 Apr 2013 09:34:12 -0300
Subject: [Biopython] BioPython now available on PiCloud by default
In-Reply-To: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>
References: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>
Message-ID: <516BF3C4.1070107@usp.br>

Hi John,
Thanks for sharing such a very nice module.
Best,
Fred

Em 12-04-2013 20:11, John Riley escreveu:
> Hello,
>
> We've had some requests for BioPython to be deployed on PiCloud [1]. While
> any user could always create a custom environment, and install the latest
> version themselves [2], we've decided to address the issue directly by
> adding BioPython (1.60) into the default suite of scientific tools on
> PiCloud.
>
> In short, to offload a Python function or program that uses BioPython, you
> don't need to do any setup! The instructions for using other scientific
> tools work just the same [3]. Hope this helps!
>
> [1] http://www.picloud.com
> [2] http://docs.picloud.com/environment.html
> [3] http://docs.picloud.com/howto/pyscientifictools.html
>
> Best Regards,
> John
>
> --
> John Riley
> PiCloud, Inc.
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

-- 
Dr. Frederico Moraes Ferreira
University of Sao Paulo
School of Medicine
Heart Institute - Immunology
Av. Dr. Enéas de Carvalho Aguiar, 44
05403-900     São Paulo - SP
Brasil


From jgibbons1 at mail.usf.edu  Mon Apr 15 15:40:15 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Mon, 15 Apr 2013 15:40:15 -0400
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CAKVJ-_4NfTnHTPRAXLvvi3S5ewRNTAQLs0h5AXOXT35OiqiD5g@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
	<CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
	<CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>
	<CAKVJ-_4NfTnHTPRAXLvvi3S5ewRNTAQLs0h5AXOXT35OiqiD5g@mail.gmail.com>
Message-ID: <CALaGxMg-F6jAmgKhvPpFQwo6pQ3bdZ9++wnE5NtNG3h6tRoDfQ@mail.gmail.com>

It looks like there is already an example of this in the tutorial under
18.1.5, but I was planning on making it a self contained cookbook example
so that it is easier to find.

If this is the fastest way to do it though:

with open(new_file_path, "w") as handle:
    for seq_id in seq_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

Is there any advantage to using SeqIO.write() other than it being shorter?

I do not have a GitHub account so I cannot comment on whether it would be
easier to use Github.

Thanks,

Justin


On Mon, Apr 15, 2013 at 6:46 AM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Sun, Apr 14, 2013 at 6:58 PM, Justin Gibbons <jgibbons1 at mail.usf.edu>
> wrote:
> > Sorry I accidentally sent the last email.
> >
> > You need the indexed fasta to get a copy of the sequence records that
> match
> > your criteria:
> >
> > indexed_fasta=SeqIO.index(file_path, 'fasta')
> > SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> > new_file_path,'fasta')
>
> With a simple sequential file format like FASTA where there are no complex
> file headers/footers to worry about, this might be the faster route:
>
> with open(new_file_path, "w") as handle:
>     for seq_id in seq_ids:
>         handle.write(indexed_fasta.get_raw(seq_id))
>
> The idea here is never to parse the records into SeqRecord objects, just
> keep them as raw strings in FASTA format. The same idea works well on
> GenBank or SwissProt files which are slower to parse, there are examples
> of this in the main Tutorial,
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
> Were you intending this to be a self contained cookbook example for:
> http://biopython.org/wiki/Category:Cookbook ?
>
> > As for editing the wiki when I click on "Login with OpenID" I get sent
> to a
> > blank page. I also tried clicking on "Login" and tried to create a new
> > account and was told "The action you have requested is limited to users
> in
> > the group: Administrators<
> http://biopython.org/w/index.php?title=Biopython:Administrators&action=edit&redlink=1
> >
> > ."
>
> Thanks - I've passed that on to our volunteer SysAdmin team.
>
> (As an aside, do you have a GitHub account and would you think
> it would be easier to use the wiki hosted on GitHub instead of
> our own MediaWiki installation?)
>
> Thanks,
>
> Peter
>

From p.j.a.cock at googlemail.com  Tue Apr 16 05:02:58 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 16 Apr 2013 10:02:58 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CALaGxMg-F6jAmgKhvPpFQwo6pQ3bdZ9++wnE5NtNG3h6tRoDfQ@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
	<CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
	<CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>
	<CAKVJ-_4NfTnHTPRAXLvvi3S5ewRNTAQLs0h5AXOXT35OiqiD5g@mail.gmail.com>
	<CALaGxMg-F6jAmgKhvPpFQwo6pQ3bdZ9++wnE5NtNG3h6tRoDfQ@mail.gmail.com>
Message-ID: <CAKVJ-_63O2rCzW7HGMx9zHF4_BLNjyANz6+v1xv28H5+6UEBQQ@mail.gmail.com>

On Mon, Apr 15, 2013 at 8:40 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> It looks like there is already an example of this in the tutorial under
> 18.1.5, but I was planning on making it a self contained cookbook example
> so that it is easier to find.
>
> If this is the fastest way to do it though:
>
> with open(new_file_path, "w") as handle:
>     for seq_id in seq_ids:
>         handle.write(indexed_fasta.get_raw(seq_id))
>
> Is there any advantage to using SeqIO.write() other than it being shorter?

There are two linked choices here,

(a) Full parsing into SeqRecord objects using SeqIO.parse, or use
the SeqIO.index or SeqIO.index_db to just extract the record identifiers.
Unless you need some of the annotation or the sequence, parsing it
into a SeqRecord is a waste of CPU time.

(b) Convert the SeqRecord back into a file on disk, or reuse the
original representation from the input file. For a format like FASTA,
this is almost a moot point - the only change is the white space
(using SeqIO.write will produce consistent line wrapping). For
some of the richer formats like GenBank the parse/write round
trip is not expected to produce an identical output, so it can be
prudent to reuse the original. For some formats we don't
have writing support, so you have to reuse the original.

My point is that whether to use SeqIO.write() or indexing and get_raw()
depends on the file format and what you are trying to do. My
recommendation would be to use get_raw to write simple file
formats without headers/footers if:

(*) You need to preserve original records exactly
(*) You need this to be as fast as possible
(*) SeqIO.write doesn't support the file format

Otherwise using SeqIO.write should be fine - it is also simpler
in terms of the code to call it.

Of course, if you are editing the records in any way, then you
must use SeqIO.write anyway.
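
As a rough illustration of the two routes, here is a self-contained sketch.
The file names and the three toy records are made up for the example. I am
assuming a recent Biopython, where get_raw() returns bytes, which is why the
output file for the raw route is opened in binary mode:

```python
import os
import tempfile

from Bio import SeqIO

# Toy input; note that seq1 is wrapped over two lines on purpose.
fasta_text = (
    ">seq1 first\nACGTACGT\nACGT\n"
    ">seq2 second\nGGGGCCCC\n"
    ">seq3 third\nTTTTAAAA\n"
)

workdir = tempfile.mkdtemp()
file_path = os.path.join(workdir, "input.fasta")
raw_path = os.path.join(workdir, "raw_copy.fasta")
parsed_path = os.path.join(workdir, "parsed_copy.fasta")

with open(file_path, "w") as handle:
    handle.write(fasta_text)

indexed_fasta = SeqIO.index(file_path, "fasta")
seq_ids = ["seq3", "seq1"]  # a list, so the output order is under our control

# Route 1: copy the raw text of each record, never building a SeqRecord.
with open(raw_path, "wb") as handle:
    for seq_id in seq_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

# Route 2: parse each record on demand and let SeqIO.write() format it.
count = SeqIO.write(
    (indexed_fasta[seq_id] for seq_id in seq_ids), parsed_path, "fasta"
)
indexed_fasta.close()

with open(raw_path, "rb") as handle:
    raw_copy = handle.read()
with open(parsed_path) as handle:
    parsed_copy = handle.read()
```

The raw copy keeps the original line wrapping of seq1, while the
SeqIO.write() copy re-wraps the sequence onto a single line.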

> I do not have a GitHub account so I cannot comment on whether
> it would be easier to use Github.

Thanks. My thinking is that right now you would need to register separately
for (1) the mailing lists, (2) editing the wiki, (3) reporting bugs on
RedMine, and (4) submitting pull requests on GitHub. If we used GitHub
for the wiki and/or issue tracker, this would mean fewer user accounts,
so it would be a little easier for contributors, and also less SysAdmin
work behind the scenes.

Peter

From nuin at genedrift.org  Wed Apr 17 14:45:20 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Wed, 17 Apr 2013 14:45:20 -0400
Subject: [Biopython] GEO profiles retrieval
Message-ID: <FCB53CAF-E8F4-47D7-8506-7B3D55870CA8@genedrift.org>

Hi everyone

Quite a longish question about some data retrieval we are trying to implement on GEO profiles. I don't know if this is possible to achieve programmatically (with or without BioPython), but I already have some parts set up using Python and BioPython. What we are trying to achieve:

- we are building a pipeline where initially we want to see if the gene in question (let's say PTEN) is over or under expressed in certain conditions.
- using a eSearch URL/procedure I can get an XML with all the profile IDs for PTEN
- in order to get more information about each profile, I can use an eSummary URL/procedure that will get an XML file for each profile
- with these profiles we then want to check the gene expression level in each sample subgroup of the study and see if the gene is under or over expressed, or if there's no change between the groups.
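
The first two steps can be sketched with plain URL building, as below. The base URL and the esearch.fcgi / esummary.fcgi endpoints are the usual NCBI E-utilities ones; the database name geoprofiles, the bare search term and the two profile IDs are assumptions or placeholders on my side and would need checking against the NCBI documentation:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="geoprofiles", retmax=20):
    """Build an eSearch URL that returns matching profile IDs as XML."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "/esearch.fcgi?" + urlencode(params)


def esummary_url(ids, db="geoprofiles"):
    """Build an eSummary URL for a list of profile IDs."""
    params = {"db": db, "id": ",".join(str(i) for i in ids)}
    return EUTILS_BASE + "/esummary.fcgi?" + urlencode(params)


search_url = esearch_url("PTEN")
summary_url = esummary_url([12345, 67890])  # placeholder IDs, not real profiles
```

Nothing is fetched here; the URLs would still have to be requested and the returned XML parsed.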

The problem I have is that in the profile XML file there's no information about sample annotation, or gene expression in each sample. As a workaround, I can get from the eSummary XML to this page of the profile

http://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS2877:1441937_s_at

using the GDS and probe ID found on the XML. Again, from this file there's no easy way to extract the sample grouping/annotation, although it's quite straightforward to extract the gene expression levels for each sample. What I want to find is:

- a way to get sample grouping/annotation for a specific GDS, that would give me the sample IDs that I could correlate to an expression value
- an eSearch, eSummary, eFetch, or any other URL that would give me expression values per sample, with each sample ID annotated to a group

Thanks in advance for any help, idea and comments.

Paulo

From markbudde at gmail.com  Wed Apr 17 17:24:00 2013
From: markbudde at gmail.com (Mark Budde)
Date: Wed, 17 Apr 2013 14:24:00 -0700
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
Message-ID: <CAEwaGEvLAjYFiZAgD_yBCn-WeczF5vsNZDthr008ari0ObQ_wQ@mail.gmail.com>

Hi, I have a simple question. The cookbook shows many examples using
SeqFeatures, but I can't find any information on adding features to a
SeqRecord.

Say I wanted to add a Feature to an existing SeqRecord. Let's say it spans
nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
would I add this to my SeqRecord?

Thanks,
Mark

From p.j.a.cock at googlemail.com  Wed Apr 17 17:53:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Apr 2013 22:53:57 +0100
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
In-Reply-To: <CAEwaGEvLAjYFiZAgD_yBCn-WeczF5vsNZDthr008ari0ObQ_wQ@mail.gmail.com>
References: <CAEwaGEvLAjYFiZAgD_yBCn-WeczF5vsNZDthr008ari0ObQ_wQ@mail.gmail.com>
Message-ID: <CAKVJ-_7tBH_KERLc-1sfbOPJu_mDPNW=bQ6hQxx2qXjHBUSqUA@mail.gmail.com>

Hi Mark,

On Wed, Apr 17, 2013 at 10:24 PM, Mark Budde <markbudde at gmail.com> wrote:
> Hi, I have a simple question. The cookbook shows many examples using
> SeqFeatures, I can't find any information on adding features to a
> SeqRecord.

The "Tutorial and Cookbook" does have examples of creating a
SeqFeature - if this was not obvious to you how might we make
it clearer?

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

See also the docstrings,

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> help(SeqFeature)
>>> help(FeatureLocation)

Online here (for the current release):
http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html
http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html

> Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans
> nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
> would I add this to my SeqRecord?
>
> Thanks,
> Mark

Which version of Biopython do you have? The strand is moving
from the SeqFeature to the FeatureLocation, but this will work
on old and new:

from Bio.SeqFeature import SeqFeature, FeatureLocation
loc = FeatureLocation(9, 100)
f = SeqFeature(loc, strand=-1, qualifiers={"locus_tag":"Gene1"})

This is preferred for future-proofing:

from Bio.SeqFeature import SeqFeature, FeatureLocation
loc = FeatureLocation(9, 100, strand=-1)
f = SeqFeature(loc, qualifiers={"locus_tag":"Gene1"})

Exactly where you put the gene name depends on what you'll be
doing with the record - for GenBank or EMBL output, using a
locus_tag key would be a sensible option.

Then if you have a SeqRecord, use my_record.features.append(f)
or similar (and for GenBank/EMBL output pay attention to the
order).
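
Putting the pieces together, a minimal sketch might look like this. The
record itself (a 120 bp dummy sequence called plasmid1) and the feature
type "gene" are made up for illustration; the coordinates are the 10..100
from your question, converted to the zero-based, end-exclusive form that
FeatureLocation expects:

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord

# Dummy record standing in for an existing SeqRecord.
my_record = SeqRecord(
    Seq("ACGT" * 30), id="plasmid1", name="plasmid1",
    description="hypothetical plasmid",
)

# GenBank-style 10..100 (one-based, inclusive) becomes 9..100 in Python terms.
start_1based, end_1based = 10, 100
loc = FeatureLocation(start_1based - 1, end_1based, strand=-1)

# Name the feature through its qualifiers, then attach it to the record.
f = SeqFeature(loc, type="gene", qualifiers={"locus_tag": ["Gene1"]})
my_record.features.append(f)

# Reverse strand, so this is the reverse complement of my_record.seq[9:100].
gene_seq = f.extract(my_record.seq)
```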

Is that clear?

Regards,

Peter

From markbudde at gmail.com  Wed Apr 17 18:52:31 2013
From: markbudde at gmail.com (Mark Budde)
Date: Wed, 17 Apr 2013 15:52:31 -0700
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
In-Reply-To: <CAKVJ-_7tBH_KERLc-1sfbOPJu_mDPNW=bQ6hQxx2qXjHBUSqUA@mail.gmail.com>
References: <CAEwaGEvLAjYFiZAgD_yBCn-WeczF5vsNZDthr008ari0ObQ_wQ@mail.gmail.com>
	<CAKVJ-_7tBH_KERLc-1sfbOPJu_mDPNW=bQ6hQxx2qXjHBUSqUA@mail.gmail.com>
Message-ID: <CAEwaGEuZN86SpseXNRg2qVjb3M05M-gdmrDVTHwanUAV=7dAWA@mail.gmail.com>

On Wed, Apr 17, 2013 at 2:53 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> Hi Mark,
>
> On Wed, Apr 17, 2013 at 10:24 PM, Mark Budde <markbudde at gmail.com> wrote:
> > Hi, I have a simple question. The cookbook shows many examples using
> > SeqFeatures, I can't find any information on adding features to a
> > SeqRecord.
>
> The "Tutorial and Cookbook" does have examples of creating a
> SeqFeature - if this was not obvious to you how might we make
> it clearer?
>
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
I am coming at this from the perspective of generating a plasmid with
features on it. I guess most people would be using this for mining data
from PubMed or something, so maybe I'm just not the targeted user. I spent
a lot of time looking for how to name a feature, like you would in a vector
editing program. I now see that I can generate a feature as shown in the
first example in 4.3.3 - is this what you are referring to? I was confused
earlier because I could never figure out how to name the feature, nor how
to add it to the SeqRecord. I can see how to do this from your example below
(using qualifiers to name the feature, and append to add the feature). I
think the cookbook would benefit from adding lines such as:

>>> len(MyRecord.features)
0
>>> example_feature.qualifiers['locus_tag'] = 'Gene1'
>>> MyRecord.features.append(example_feature)
>>> len(MyRecord.features)
1


> See also the docstrings,
>
> >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
> >>> help(SeqFeature)
> >>> help(FeatureLocation)
>
> Online here (for the current release):
> http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html
>
> http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html
>
> > Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans
> > nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
> > would I add this to my SeqRecord?
> >
> > Thanks,
> > Mark
>
> Which version of Biopython do you have? The strand is moving
> from the SeqFeature to the FeatureLocation, but this will work
> on old and new:
>
I have v1.59

> from Bio.SeqFeature import SeqFeature, FeatureLocation
> loc = FeatureLocation(9, 100)
> f = SeqFeature(loc, strand=-1, qualifiers={"locus_tag":"Gene1"})
>
> This is preferred for future-proofing:
>
> from Bio.SeqFeature import SeqFeature, FeatureLocation
> loc = FeatureLocation(9, 100, strand=-1)
> f = SeqFeature(loc, qualifiers={"locus_tag":"Gene1"})
>
> Exactly where you put the gene name depends on what you'll be
> doing with the record - for GenBank or EMBL output, using a
> locus_tag key would be a sensible option.
>
> Then if you have a SeqRecord, use my_record.features.append(f)
> or similar (and for GenBank/EMBL output pay attention to the
> order).
>
> Is that clear?

Yes. Your example provided here is clear and I think it should be added to
the cookbook.

>
>
> Regards,
>
> Peter
>
Thanks for your help Peter, and pardon my ignorance.
-Mark

From mictadlo at gmail.com  Mon Apr 22 00:05:58 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 22 Apr 2013 14:05:58 +1000
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute 'alignments'
Message-ID: <CAOP6n=hnZaP2oq3ho3Hx7WKGfGFBHmhtGRVu4mjhuwJQbfJiYw@mail.gmail.com>

Hi,
The following code (BioPython 1.61, Blast+ 2.2.26):

from Bio.Blast import NCBIXML

with open("test/X.xml") as bf:
    blast_records = NCBIXML.parse(bf)

    for blast_record in blast_records:
        for alignment in blast_records.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 0.04:
                    print '****Alignment****'
                    print 'sequence:', alignment.title
                    print 'length:', alignment.length
                    print 'e value:', hsp.expect
                    print hsp.query[0:75] + '...'
                    print hsp.match[0:75] + '...'
                    print hsp.sbjct[0:75] + '...'

caused the following error:
$ python parseBlastXML.py
Traceback (most recent call last):
  File "parseBlastXML.py", line 8, in <module>
    for alignment in blast_records.alignments:
AttributeError: 'generator' object has no attribute 'alignments'

What did I do wrong?

Thank you in advance.

Mic

From mictadlo at gmail.com  Mon Apr 22 00:27:12 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 22 Apr 2013 14:27:12 +1000
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute
	'alignments'
In-Reply-To: <CAOP6n=hnZaP2oq3ho3Hx7WKGfGFBHmhtGRVu4mjhuwJQbfJiYw@mail.gmail.com>
References: <CAOP6n=hnZaP2oq3ho3Hx7WKGfGFBHmhtGRVu4mjhuwJQbfJiYw@mail.gmail.com>
Message-ID: <CAOP6n=iEAKqyDTcdDLXa966TT3PkiapmGekcGmY8r8YaDwQTEg@mail.gmail.com>

My mistake. This is the solution:

from Bio.Blast import NCBIXML

with open("test/XA10m_v3.0.aa.snap_vs_uniref90.blastp.xml") as bf:
    blast_records = NCBIXML.parse(bf)

    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 0.04:
                    print '****Alignment****'
                    print 'sequence:', alignment.title
                    print 'length:', alignment.length
                    print 'e value:', hsp.expect
                    print hsp.query[0:75] + '...'
                    print hsp.match[0:75] + '...'
                    print hsp.sbjct[0:75] + '...'


On Mon, Apr 22, 2013 at 2:05 PM, Mic <mictadlo at gmail.com> wrote:

> Hi,
> The following code (BioPython 1.61, Blast+ 2.2.26):
>
> from Bio.Blast import NCBIXML
>
> with open("test/X.xml") as bf:
>     blast_records = NCBIXML.parse(bf)
>
>     for blast_record in blast_records:
>         for alignment in blast_records.alignments:
>             for hsp in alignment.hsps:
>                 if hsp.expect < 0.04:
>                     print '****Alignment****'
>                     print 'sequence:', alignment.title
>                     print 'length:', alignment.length
>                     print 'e value:', hsp.expect
>                     print hsp.query[0:75] + '...'
>                     print hsp.match[0:75] + '...'
>                     print hsp.sbjct[0:75] + '...'
>
> caused the following error:
> $ python parseBlastXML.py
> Traceback (most recent call last):
>   File "parseBlastXML.py", line 8, in <module>
>     for alignment in blast_records.alignments:
> AttributeError: 'generator' object has no attribute 'alignments'
>
> What did I do wrong?
>
> Thank you in advance.
>
> Mic
>
>
>

From p.j.a.cock at googlemail.com  Mon Apr 22 04:08:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Apr 2013 09:08:50 +0100
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute
	'alignments'
In-Reply-To: <CAOP6n=iEAKqyDTcdDLXa966TT3PkiapmGekcGmY8r8YaDwQTEg@mail.gmail.com>
References: <CAOP6n=hnZaP2oq3ho3Hx7WKGfGFBHmhtGRVu4mjhuwJQbfJiYw@mail.gmail.com>
	<CAOP6n=iEAKqyDTcdDLXa966TT3PkiapmGekcGmY8r8YaDwQTEg@mail.gmail.com>
Message-ID: <CAKVJ-_4QtsDdNct8rMrHMg7Jc==HVqVx8ssK47NkP6dByPgJ5g@mail.gmail.com>

On Monday, April 22, 2013, Mic wrote:

> My mistake. This is the solution
> from Bio.Blast import NCBIXML


Hi Mic,

Yep, you had two variables with very similar names.
An easy mistake to make - it's one of the things
you'll learn to check with an AttributeError: am
I using the object I think I'm using? Well done
for solving it yourself, and thank you for posting
the solution here.
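
A minimal sketch of that check, using a stand-in generator rather than
real BLAST output (the FakeRecord class and fake_parse function are made
up for illustration and are not part of Biopython):

```python
class FakeRecord(object):
    """Stand-in for one parsed BLAST record (hypothetical, not Biopython)."""

    def __init__(self, query, alignments):
        self.query = query
        self.alignments = alignments


def fake_parse():
    """Yield records one at a time, as a parser returning a generator would."""
    yield FakeRecord("query1", ["hit A", "hit B"])
    yield FakeRecord("query2", [])


records = fake_parse()

# The generator itself has no 'alignments'; only the items it yields do.
generator_has_attr = hasattr(records, "alignments")

titles = []
for record in records:
    # Use the loop variable (singular), not the generator (plural).
    for alignment in record.alignments:
        titles.append((record.query, alignment))

print(type(records).__name__, generator_has_attr)
print(titles)
```

The same question ("which object is this?") can be answered at the
prompt with type() or hasattr() before reading the traceback any further.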

Regards,

Peter

From mictadlo at gmail.com  Wed Apr 24 01:55:06 2013
From: mictadlo at gmail.com (Mic)
Date: Wed, 24 Apr 2013 15:55:06 +1000
Subject: [Biopython] NCBIXML: hit start and end
Message-ID: <CAOP6n=jgtS=hS64OO6o491MeWwLFL4WNQ8x9FAR1bVYu6mMHmA@mail.gmail.com>

Hi,
I have tried to rewrite the following BioPerl code in Biopython:

sub retrieve {
    my $blast_report = $options->{'blast'};
    my $max_hits  = $options->{'maxhits'};
    my $searchio     = new Bio::SearchIO(
        -format => 'blast',
        -file   => $blast_report
    );

    while ( my $result = $searchio->next_result ) {
        my $query_name = $result->query_name();
        my $count_unirefs   = 0;
        my %hit_names_count = ();
        while ( my $hit = $result->next_hit ) {
            $count_unirefs++;


            my $count_hsp = 0;
            my @plushsps  = ();
            my @minhsps   = ();

            while ( my $hsp = $hit->next_hsp ) {
                $count_hsp++;
                my $query_start = $hsp->start('query');
                my $query_end   = $hsp->end('query');
                my $hit_start   = $hsp->start('hit');
                my $hit_end     = $hsp->end('hit');
                my $strand      = $hsp->strand();
                my $hit_desc    = $hit->description();


                my @hsp_data    = ($query_start, $query_end, $hit_start,
$hit_end, $hit_desc);


            }
        }

    }
}


Biopython code:
---------------
from Bio import SeqIO
from Bio.Blast import NCBIXML

def retrieve_hits_data():

    max_hits = 5  # Change to args

    with open("test/x.xml") as bf:
        blast_records = NCBIXML.parse(bf)

        for blast_record in blast_records:
            print blast_record.query
            print
            for alignment in blast_record.alignments:
                print 'sequence:', alignment.title
                print alignment.hit_id
                print alignment.hit_def
                print 'length:', alignment.length

                for hsp in alignment.hsps:
                    print "HSPs"
                    print "----"
                    print 'e value:', hsp.expect
                    #print hsp.query
                    #print hsp.match
                    #print hsp.sbjct
                    print hsp.score
                    print hsp.bits
                    print hsp.num_alignments
                    print hsp.identities
                    print hsp.positives
                    print hsp.gaps
                    print hsp.align_length
                    print hsp.strand
                    print hsp.frame
                    print hsp.query_start
                    print hsp.query_end
                    #print hsp.hit_start
                    #print hsp.hit_end
                    print hsp.sbjct_start
                    print hsp.sbjct_end


retrieve_hits_data()


Output from Biopython code:
XA10_v3.0-snap.1

XA10_v3.0-snap.2

XA10_v3.0-snap.3

XA10_v3.0-snap.4

sequence: UniRef90_Q9FX16 F12G12.10 protein n=1 Tax=Arabidopsis thaliana
RepID=Q9FX16_ARATH
UniRef90_Q9FX16
F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH
length: 308
HSPs
----
e value: 8.30308e-88
709.0
277.715
None
146
192
10
285
(None, None)
(0, 0)
10
290
8
286


How do I get hsp->start('hit') and hsp->end('hit') from the BioPerl code in
Biopython?
Why does blast_record.query appear immediately in sequence and not after
the other two for loops have finished?

Thank you in advance.

Mic

From w.arindrarto at gmail.com  Wed Apr 24 03:04:02 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 24 Apr 2013 09:04:02 +0200
Subject: [Biopython] NCBIXML: hit start and end
In-Reply-To: <CAOP6n=jgtS=hS64OO6o491MeWwLFL4WNQ8x9FAR1bVYu6mMHmA@mail.gmail.com>
References: <CAOP6n=jgtS=hS64OO6o491MeWwLFL4WNQ8x9FAR1bVYu6mMHmA@mail.gmail.com>
Message-ID: <CADEGkF5WPrKDYD+gk=z3_7SHNpFv4cg0rv-6mAYZ+11gcFzkHw@mail.gmail.com>

Hi Mic,

> How do I get hsp->start('hit') and hsp->end('hit') from the bioperl code in
> Biopython?

With NCBIXML, they should be hsp.sbjct_start and hsp.sbjct_end respectively.
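
As a rough illustration of where those numbers come from: in BLAST XML
output each HSP carries Hsp_hit-from and Hsp_hit-to elements, and these
are the values NCBIXML exposes as sbjct_start and sbjct_end. The sketch
below reads them with the standard library from a tiny hand-written
fragment (the fragment is made up to mirror the numbers in your output,
and hit_coordinates is just an illustrative helper name):

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration; a real report has many more fields.
FRAGMENT = """
<Hit_hsps>
  <Hsp>
    <Hsp_query-from>10</Hsp_query-from>
    <Hsp_query-to>290</Hsp_query-to>
    <Hsp_hit-from>8</Hsp_hit-from>
    <Hsp_hit-to>286</Hsp_hit-to>
  </Hsp>
</Hit_hsps>
"""


def hit_coordinates(xml_text):
    """Return (hit_start, hit_end) for every <Hsp> element in xml_text."""
    root = ET.fromstring(xml_text)
    coords = []
    for hsp in root.iter("Hsp"):
        start = int(hsp.findtext("Hsp_hit-from"))
        end = int(hsp.findtext("Hsp_hit-to"))
        coords.append((start, end))
    return coords


print(hit_coordinates(FRAGMENT))
```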

> Why does blast_record.query appear immediately in sequence and not after
> the other two for loops have finished?

It may be because the first three queries in your BLAST XML results
(XA10_v3.0-snap.{1..3}) do not have any hits or HSPs. Check your
XML results to be sure.
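
To see why the names pile up, here is a small stand-in for the loop
(plain dictionaries instead of real BLAST records, so the data are made
up): the query line is printed as soon as each record is reached, and a
record with no alignments prints nothing else.

```python
# Made-up records: three queries without hits, then one with a single hit.
records = [
    {"query": "snap.1", "alignments": []},
    {"query": "snap.2", "alignments": []},
    {"query": "snap.3", "alignments": []},
    {"query": "snap.4", "alignments": ["UniRef90_Q9FX16"]},
]

lines = []
for record in records:
    # Runs once per record, before the inner loop is even considered.
    lines.append(record["query"])
    for alignment in record["alignments"]:
        # Skipped entirely when the list is empty.
        lines.append("  sequence: " + alignment)

print("\n".join(lines))
```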

Hope that helps :),
Bow

From p.j.a.cock at googlemail.com  Wed Apr 24 15:19:48 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 24 Apr 2013 20:19:48 +0100
Subject: [Biopython] Biopython GSoC 2013 applications via NESCent
Message-ID: <CAKVJ-_5kQvFGWFNcFSDF3VADcCGeL_Wac4BsQttxY79v+XCR4w@mail.gmail.com>

To all the Biopythoneers,

For the last few years Biopython has participated in the
Google Summer of Code (GSoC) program under the umbrella
of the Open Bioinformatics Foundation (OBF):
https://developers.google.com/open-source/soc/
https://github.com/OBF/GSoC

Unfortunately, like quite a few previously accepted organisations,
this year the OBF was not accepted. Google has kept the total about
the same year on year, so this is probably simply a slot rotation
to get some new organisations involved.

The good news (for those not following the Biopython-dev
mailing list) is we have an alternative option agreed with
the good people at NESCent, as we did back in 2009:

http://biopython.org/wiki/Google_Summer_of_Code
http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013

I'd like to thank Eric for co-ordinating this, and encourage
any interested potential students to sign up to the Biopython
development list and NESCent's Google+ group as soon as
possible (if you haven't done so already):

http://lists.open-bio.org/mailman/listinfo/biopython-dev
https://plus.google.com/communities/105828320619238393015

Google are already accepting student applications, and the
deadline is Friday 3 May.  That doesn't leave very long for
asking for feedback and talking to potential mentors - which
is essential for a competitive proposal.

Thank you for your interest,

Peter

From nuin at genedrift.org  Thu Apr 25 14:42:07 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 25 Apr 2013 14:42:07 -0400
Subject: [Biopython] PubmedCentral XML parsing
Message-ID: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>

Hi

What would be the most direct way of parsing XML files downloaded from PubmedCentral ftp using BioPython?  These are files that use the archivearticle.dtd and when parsed using non-DTD based code generate broken paragraphs on the body of the document due to < > between <p> items of the body.

Thanks in advance

Paulo 

From p.j.a.cock at googlemail.com  Thu Apr 25 15:05:32 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 25 Apr 2013 20:05:32 +0100
Subject: [Biopython] PubmedCentral XML parsing
In-Reply-To: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>
References: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>
Message-ID: <CAKVJ-_5XiP2jLVB27cFeBULC0OR5xZ=yDM6wRG5+7kt=HWLORw@mail.gmail.com>

On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> Hi
>
> What would be the most direct way of parsing XML files downloaded from
> PubmedCentral ftp using BioPython?  These are files that use the
> archivearticle.dtd and when parsed using non-DTD based code generate broken
> paragraphs on the body of the document due to < > between <p> items of the
> body.
>
> Thanks in advance
>
> Paulo

The Bio.Entrez parser is DTD based, and might suit your needs.

Peter

From nuin at genedrift.org  Thu Apr 25 15:16:49 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 25 Apr 2013 15:16:49 -0400
Subject: [Biopython] PubmedCentral XML parsing
In-Reply-To: <CAKVJ-_5XiP2jLVB27cFeBULC0OR5xZ=yDM6wRG5+7kt=HWLORw@mail.gmail.com>
References: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>
	<CAKVJ-_5XiP2jLVB27cFeBULC0OR5xZ=yDM6wRG5+7kt=HWLORw@mail.gmail.com>
Message-ID: <A5227F2D-594D-4AFC-9110-85348E74CFD5@genedrift.org>

Hi Peter

Thanks a lot. I am getting an error when trying to parse with Entrez.parse. I download the nxml file prior to parsing, using PMC's FTP server in order to avoid their bulk downloading restrictions. Anyway, the code I am using is quite simple (with ipython):

In [1]: from Bio import Entrez

In [2]: handle = open('nihms83342.nxml')

In [3]: records = Entrez.parse(handle)

In [4]: for i in records:
   ...:     print i
   ...:
---------------------------------------------------------------------------
NotXMLError                               Traceback (most recent call last)
<ipython-input-4-82461854c9e7> in <module>()
----> 1 for i in records:
      2     print i
      3

/Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle)
    229                         # We did not see the initial <!xml declaration, so
    230                         # probably the input data is not in XML format.
--> 231                         raise NotXMLError("XML declaration not found")
    232                 self.parser.Parse("", True)
    233                 self.parser = None

NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format.

And the file header is

<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="EN">
	<?properties open_access?>
	<?properties manuscript?>
	<front>
		<journal-meta>

Is there a different way of parsing this file?

Thanks in advance

Paulo


On 2013-04-25, at 3:05 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin <nuin at genedrift.org> wrote:
>> Hi
>> 
>> What would be the most direct way of parsing XML files downloaded from
>> PubmedCentral ftp using BioPython?  These are files that use the
>> archivearticle.dtd and when parsed using non-DTD based code generate broken
>> paragraphs on the body of the document due to < > between <p> items of the
>> body.
>> 
>> Thanks in advance
>> 
>> Paulo
> 
> The Bio.Entrez parser is DTD based, and might suit your needs.
> 
> Peter


From zhigang.wu at email.ucr.edu  Fri Apr 26 20:52:19 2013
From: zhigang.wu at email.ucr.edu (Zhigang Wu)
Date: Fri, 26 Apr 2013 17:52:19 -0700
Subject: [Biopython] [Biopython-dev] Biopython GSoC 2013 applications
	via NESCent
In-Reply-To: <CAKVJ-_5kQvFGWFNcFSDF3VADcCGeL_Wac4BsQttxY79v+XCR4w@mail.gmail.com>
References: <CAKVJ-_5kQvFGWFNcFSDF3VADcCGeL_Wac4BsQttxY79v+XCR4w@mail.gmail.com>
Message-ID: <CADhJE9t9u2QQ5rM4mpD9eHDUuCSbV4PDJmOgpV9SHSix1X6yLA@mail.gmail.com>

Hi Peter,

I am interested in implementing the lazy-loading sequence parsers.
I know the time is pretty tight for me to write a proposal on it. But even
if I cannot contribute under the umbrella of GSoC, and assuming nobody else
is implementing it, I am still interested in doing this (I would like to have
something nice on my CV while contributing to the open source software
community as well). At the moment, though, I don't have a very clear picture
of how to do it. Can you point me to somewhere I can start to get a
sense of how this could be implemented? As far as I know, samtools (view) may
use similar techniques. Thanks.


Zhigang


On Wed, Apr 24, 2013 at 12:19 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> To all the Biopythoneers,
>
> For the last few years Biopython has participated in the
> Google Summer of Code (GSoC) program under the umbrella
> of the Open Bioinformatics Foundation (OBF):
> https://developers.google.com/open-source/soc/
> https://github.com/OBF/GSoC
>
> Unfortunately like quite a few previously accepted organisations,
> this year the OBF not accepted. Google has kept the total about
> the same year on year, so this is probably simply a slot rotation
> to get some new organisations involved.
>
> The good news (for those not following the Biopython-dev
> mailing list) is we have an alternative option agreed with
> the good people at NESCent, as we did back in 2009:
>
> http://biopython.org/wiki/Google_Summer_of_Code
> http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013
>
> I'd like to thank Eric for co-ordinating this, and encourage
> any interested potential students to sign up to the Biopython
> development list and NESCent's Google+ group as soon as
> possible (if you haven't done so already):
>
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> https://plus.google.com/communities/105828320619238393015
>
> Google are already accepting student applications, and the
> deadline is Friday 3 May.  That doesn't leave very long for
> asking feedback and talking to potential mentors - which
> is essential for a competitive proposal.
>
> Thank you for your interest,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

From mictadlo at gmail.com  Sun Apr 28 21:13:49 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 29 Apr 2013 11:13:49 +1000
Subject: [Biopython] gff installation failed with easy_install
Message-ID: <CAOP6n=gV8fvkjF5vpvCQ0xejbgOQxkodiorz9Rky_t9QrgLSLg@mail.gmail.com>

Hi,
I have tried to install gff with easy_install, but I got the following
error:
$ easy_install --prefix=/home/mic/apps/pymodules -UZ
https://github.com/chapmanb/bcbb/tree/master/gff
Downloading https://github.com/chapmanb/bcbb/tree/master/gff
error: Unexpected HTML page found at
https://github.com/chapmanb/bcbb/tree/master/gff

How is it possible to install gff?

Thank you in advance.

Mic

From chapmanb at 50mail.com  Mon Apr 29 06:34:42 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 29 Apr 2013 06:34:42 -0400
Subject: [Biopython] gff installation failed with easy_install
In-Reply-To: <517DEECF.60705@bx.psu.edu>
References: <CAOP6n=gV8fvkjF5vpvCQ0xejbgOQxkodiorz9Rky_t9QrgLSLg@mail.gmail.com>
	<517DEECF.60705@bx.psu.edu>
Message-ID: <87bo8xhbgd.fsf@fastmail.fm>


Mic;

> I have tried to install gff with easy_install, but I got the following 
> error:
> $ easy_install --prefix=/home/mic/apps/pymodules -UZ 
> https://github.com/chapmanb/bcbb/tree/master/gff
> Downloading https://github.com/chapmanb/bcbb/tree/master/gff
> error: Unexpected HTML page found at 
> https://github.com/chapmanb/bcbb/tree/master/gff
>
> How is it possible to install gff?

I don't know of a way to install directly from git with subdirectories
like that. You'd need to clone, then install with easy_install or pip:

$ git clone git://github.com/chapmanb/bcbb.git
$ easy_install bcbb/gff
$ pip install bcbb/gff

Apologies about the convoluted setup. Depending on what you're doing,
you might want to have a look at gffutils:

https://github.com/daler/gffutils

We're working on rolling the functionality from the gff library into
this so there'll be one place to work from for GFF in python.

Hope this helps,
Brad

From p.j.a.cock at googlemail.com  Mon Apr 29 07:23:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 29 Apr 2013 12:23:16 +0100
Subject: [Biopython] PubmedCentral XML parsing
In-Reply-To: <A5227F2D-594D-4AFC-9110-85348E74CFD5@genedrift.org>
References: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>
	<CAKVJ-_5XiP2jLVB27cFeBULC0OR5xZ=yDM6wRG5+7kt=HWLORw@mail.gmail.com>
	<A5227F2D-594D-4AFC-9110-85348E74CFD5@genedrift.org>
Message-ID: <CAKVJ-_7_q58ajdUmdvKFunZNtxL=xDGth5wemg+Sk+XAH82AWA@mail.gmail.com>

On Thu, Apr 25, 2013 at 8:16 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> Hi Peter
>
> Thanks a lot. I am getting an error when trying to parse with
> Entrez.parse. I download the nxml file prior to parsing, using PMC's FTP
> server in order to avoid their bulk downloading restrictions. Anyway, the
> code I am using is quite simple (with ipython):
>
> In [1]: from Bio import Entrez
>
> In [2]: handle = open('nihms83342.nxml')
>
> In [3]: records = Entrez.parse(handle)
>
> In [4]: for i in records:
>    ...:     print i
>    ...:
>
> ---------------------------------------------------------------------------
> NotXMLError                               Traceback (most recent call
> last)
> <ipython-input-4-82461854c9e7> in <module>()
> ----> 1 for i in records:
>       2     print i
>       3
>
> /Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self,
> handle)
>     229                         # We did not see the initial <!xml
> declaration, so
>     230                         # probably the input data is not in XML
> format.
> --> 231                         raise NotXMLError("XML declaration not
> found")
>     232                 self.parser.Parse("", True)
>     233                 self.parser = None
>
> NotXMLError: Failed to parse the XML data (XML declaration not found).
> Please make sure that the input data are in XML format.
>
> And the file header is
>
> <?xml version="1.0"?>
> <!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange
> DTD v2.3 20070202//EN" "archivearticle.dtd">
> <article xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:mml="http://www.w3.org/1998/Math/MathML"
> article-type="research-article" xml:lang="EN">
>         <?properties open_access?>
>         <?properties manuscript?>
>         <front>
>                 <journal-meta>
>
> Is there a different way of parsing this file?
>
> Thanks in advance
>
> Paulo

Hi Paulo,

The header you've shown here does not match the file you
attached to the bug report (where the first line is missing
and there seem to be no line breaks either):
https://redmine.open-bio.org/issues/3430

Where exactly did the nihms83342.nxml file come from?
Is there a URL we can download it from to check?
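
Something along these lines would show what the file on disk really
starts with (a rough sketch using only the standard library;
starts_with_xml_declaration is just an illustrative name, and the check
is deliberately crude):

```python
import os
import tempfile


def starts_with_xml_declaration(path, size=200):
    """Return (found, head): whether the file begins with '<?xml', plus its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(size)
    # Skip a UTF-8 byte order mark if one is present.
    stripped = head.lstrip(b"\xef\xbb\xbf")
    return stripped.startswith(b"<?xml"), head


# Demonstration on two throwaway files rather than the real nihms83342.nxml.
samples = {
    "with_declaration": b'<?xml version="1.0"?>\n<article></article>\n',
    "without_declaration": b"<article></article>",
}
results = {}
for name, content in samples.items():
    with tempfile.NamedTemporaryFile(suffix=".nxml", delete=False) as tmp:
        tmp.write(content)
    results[name] = starts_with_xml_declaration(tmp.name)[0]
    os.remove(tmp.name)

print(results)
```

Running the function on the downloaded file and on the bug report
attachment should make any difference between the two obvious.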

Thanks,

Peter

From mictadlo at gmail.com  Mon Apr 29 23:13:19 2013
From: mictadlo at gmail.com (Mic)
Date: Tue, 30 Apr 2013 13:13:19 +1000
Subject: [Biopython] gff installation failed with easy_install
In-Reply-To: <87bo8xhbgd.fsf@fastmail.fm>
References: <CAOP6n=gV8fvkjF5vpvCQ0xejbgOQxkodiorz9Rky_t9QrgLSLg@mail.gmail.com>
	<517DEECF.60705@bx.psu.edu> <87bo8xhbgd.fsf@fastmail.fm>
Message-ID: <CAOP6n=hmuhyJP-+j5o8-4j4OvmUCuc7JhzTHtZbLzZRqVLzUsg@mail.gmail.com>

Thank you, it is working.


On Mon, Apr 29, 2013 at 8:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Mic;
>
> > I have tried to install gff with easy_install, but I got the following
> > error:
> > $ easy_install --prefix=/home/mic/apps/pymodules -UZ
> > https://github.com/chapmanb/bcbb/tree/master/gff
> > Downloading https://github.com/chapmanb/bcbb/tree/master/gff
> > error: Unexpected HTML page found at
> > https://github.com/chapmanb/bcbb/tree/master/gff
> >
> > How is it possible to install gff?
>
> I don't know of a way to install directly from git with subdirectories
> like that. You'd need to clone, then install with easy_install or pip:
>
> $ git clone git://github.com/chapmanb/bcbb.git
> $ easy_install bcbb/gff
> $ pip install bcbb/gff
>
> Apologies about the convoluted setup. Depending on what you're doing,
> you might want to have a look at gffutils:
>
> https://github.com/daler/gffutils
>
> We're working on rolling the functionality from the gff library into
> this so there'll be one place to work from for GFF in python.
>
> Hope this helps,
> Brad
>

From mictadlo at gmail.com  Tue Apr 30 00:12:34 2013
From: mictadlo at gmail.com (Mic)
Date: Tue, 30 Apr 2013 14:12:34 +1000
Subject: [Biopython] GFF parsing with biopython
Message-ID: <CAOP6n=gOKR2EtjOYr-aXvKvMKLXBnoWAanNQLJdi-eNB1=+3qA@mail.gmail.com>

Hi,
I have the following GFF file from SNAP:

X1       SNAP    Einit   2579    2712    -3.221  +       .       X1-snap.1
X1       SNAP    Exon    2813    2945    4.836   +       .       X1-snap.1
X1       SNAP    Eterm   3013    3033    10.467  +       .       X1-snap.1
X1       SNAP    Esngl   3457    3702    -17.856 +       .       X1-snap.2
X1       SNAP    Einit   4901    4974    -4.954  +       .       X1-snap.3
X1       SNAP    Eterm   5021    5150    14.231  +       .       X1-snap.3
X1       SNAP    Einit   6245    7325    -1.525  -       .       X1-snap.4
X1       SNAP    Eterm   5974    6008    5.398   -       .       X1-snap.4


With the code below I have tried to parse the above GFF file

from BCBio import GFF
from pprint import pprint
from BCBio.GFF import GFFExaminer

def retrieve_pred_genes_data():
    with open("test/X1_small.snap.gff") as sf:
        #examiner = GFFExaminer()
        #pprint(examiner.available_limits(sf))

        for rec in GFF.parse(sf):
            pprint(rec.id)
            pprint(rec.description)
            pprint(rec.name)
            pprint(rec.features)
            #pprint(rec.type)              #'SeqRecord' object has no attribute
            #pprint(rec.ref)               #'SeqRecord' object has no attribute
            #pprint(rec.ref_db)            #'SeqRecord' object has no attribute
            #pprint(rec.location)          #'SeqRecord' object has no attribute
            #pprint(rec.location_operator) #'SeqRecord' object has no attribute
            #pprint(rec.strand)            #'SeqRecord' object has no attribute
            #pprint(rec.sub_features)      #'SeqRecord' object has no attribute

retrieve_pred_genes_data()


and got the following output:

'X1'
'<unknown description>'
'<unknown name>'
[SeqFeature(FeatureLocation(ExactPosition(2578), ExactPosition(2712), strand=1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(2812), ExactPosition(2945), strand=1), type='Exon'),
 SeqFeature(FeatureLocation(ExactPosition(3012), ExactPosition(3033), strand=1), type='Eterm'),
 SeqFeature(FeatureLocation(ExactPosition(3456), ExactPosition(3702), strand=1), type='Esngl'),
 SeqFeature(FeatureLocation(ExactPosition(4900), ExactPosition(4974), strand=1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(5020), ExactPosition(5150), strand=1), type='Eterm'),
 SeqFeature(FeatureLocation(ExactPosition(6160), ExactPosition(7325), strand=-1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(5973), ExactPosition(6008), strand=-1), type='Eterm')]

and with GFFExaminer I got these:

{'gff_id': {('X1',): 8},
 'gff_source': {('SNAP',): 8},
 'gff_source_type': {('SNAP', 'Einit'): 3,
                     ('SNAP', 'Esngl'): 1,
                     ('SNAP', 'Eterm'): 3,
                     ('SNAP', 'Exon'): 1},
 'gff_type': {('Einit',): 3, ('Esngl',): 1, ('Eterm',): 3, ('Exon',): 1}}


I found these examples
(https://github.com/patena/jonikaslab-mutant-pools/blob/master/notes_on_GFF_parsing.txt),
but I got these kinds of errors:
            #pprint(rec.type)              #'SeqRecord' object has no attribute
            #pprint(rec.ref)               #'SeqRecord' object has no attribute
            #pprint(rec.ref_db)            #'SeqRecord' object has no attribute
            #pprint(rec.location)          #'SeqRecord' object has no attribute
            #pprint(rec.location_operator) #'SeqRecord' object has no attribute
            #pprint(rec.strand)            #'SeqRecord' object has no attribute
            #pprint(rec.sub_features)      #'SeqRecord' object has no attribute

What did I do wrong and how is it possible to access all fields in the
above GFF file?

Thank you in advance.

Mic

From markbudde at gmail.com  Mon Apr  1 18:41:43 2013
From: markbudde at gmail.com (Mark Budde)
Date: Mon, 1 Apr 2013 11:41:43 -0700
Subject: [Biopython] New to BP. Looking for closely spaced genes
Message-ID: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>

Hi,
Before I dive too far into BioPython, I'd like to get some input on whether
BioPython is an appropriate tool for my task....

I would like to look at the human genome ORF structure and identify regions
where ORFs are closely spaced but differentially regulated, and also
identify whether the ORFs are facing the same direction or opposing
directions. To do this, I assume I would first download the annotated
genome and write a script in BioPython annotating how far each ORF is from
its neighbors, what the orientation is, and store the result in a
dictionary. Then I would download some expression data sets and add this
data to the dictionary. Then I would write some algorithm comparing
gene distance, orientation and expression correlation to generate a list of
candidate ORF pairs which fit my criteria.

My question is, is BioPython a reasonable tool to accomplish this, or is it
going to be way too slow, so that some alternative package is better suited
for my task?
Thanks,
Mark Budde


From dtomso at agbiome.com  Mon Apr  1 19:09:39 2013
From: dtomso at agbiome.com (Dan Tomso)
Date: Mon, 1 Apr 2013 19:09:39 +0000
Subject: [Biopython] New to BP. Looking for closely spaced genes
In-Reply-To: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>
References: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>
Message-ID: <0bdbbf85a7284f21ad6d03aec6ac55cb@SN2PR03MB015.namprd03.prod.outlook.com>

Hi, Mark.

I think BioPython will have the tools you need to do the mechanical handling of sequences.  You might want to contemplate various strategies to do the positional comparisons and data overlays.  For example, if I were approaching this, I would start building position tables for the various content in SQL and then do the set/join/overlap work there.  
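
As a rough sketch of the SQL idea (the table layout, column names and coordinates below are invented for illustration, not taken from any real annotation), an in-memory SQLite table of ORF positions can be joined against itself to list pairs within a chosen distance, along with whether they share a strand:

```python
import sqlite3

# Invented example coordinates: (name, chromosome, start, end, strand).
ORFS = [
    ("orfA", "chr1", 1000, 2000, "+"),
    ("orfB", "chr1", 2300, 3100, "-"),
    ("orfC", "chr1", 9000, 9800, "+"),
    ("orfD", "chr2", 500, 1500, "+"),
]
MAX_GAP = 500  # arbitrary cutoff for "closely spaced"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orf (name TEXT, chrom TEXT, start_pos INTEGER, "
    "end_pos INTEGER, strand TEXT)"
)
conn.executemany("INSERT INTO orf VALUES (?, ?, ?, ?, ?)", ORFS)

# Self-join: pair each ORF with any later, non-overlapping ORF on the same
# chromosome whose start lies within MAX_GAP of the first ORF's end.
query = """
    SELECT a.name, b.name,
           b.start_pos - a.end_pos AS gap,
           a.strand = b.strand AS same_strand
    FROM orf AS a
    JOIN orf AS b
      ON a.chrom = b.chrom
     AND b.start_pos > a.end_pos
     AND b.start_pos - a.end_pos <= ?
    ORDER BY a.chrom, a.start_pos
"""
pairs = conn.execute(query, (MAX_GAP,)).fetchall()
conn.close()

for first, second, gap, same_strand in pairs:
    orientation = "same strand" if same_strand else "opposite strands"
    print(first, second, gap, orientation)
```

Expression values could then go into a second table keyed on the ORF name and be joined in the same way.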

But to re-answer your primary question--yes, you can get the sequence and features parsed in BioPython with reasonable convenience.

Best regards,
Dan Tomso

________________________________________
From: biopython-bounces at lists.open-bio.org on behalf of Mark Budde
Sent: Monday, April 01, 2013 2:41 PM
To: biopython
Subject: [Biopython] New to BP. Looking for closely spaced genes

Hi,
Before I dive too far into BioPython, I'd like to get some input on whether
BioPython is an appropriate tool for my task....

I would like to look at the human genome ORF structure and identify regions
where ORFs are closely spaced but differentially regulated, and also
identify whether the ORFs are facing the same direction or opposing
directions. To do this, I assume I would first download the annotated
genome and write a script in BioPython annotating how far each ORF is from
its neighbors, what the orientation is, and store the result in a
dictionary. Then I would download some expression data sets and add this
data to the dictionary. Then I would write some algorithm comparing
gene distance, orientation and expression correlation to generate a list of
candidate ORF pairs which fit my criteria.

My question is, is BioPython a reasonable tool to accomplish this, or is it
going to be way too slow, so that some alternative package is better suited
for my task?
Thanks,
Mark Budde
_______________________________________________
Biopython mailing list  -  Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From jordan.r.willis at Vanderbilt.Edu  Tue Apr  2 04:40:36 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Tue, 2 Apr 2013 04:40:36 +0000
Subject: [Biopython] Superimposer troubles
Message-ID: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>


Hello List,


I'm having trouble working through some issues with the Superimposer for all-atom superpositions. We work on protein design, and our final PDB files often differ in atom number and sometimes composition from our input. I'm a big fan of the Superimposer, so we have implemented it like this:

from Bio.PDB import PDBParser, Superimposer

p = PDBParser()
native_pdb = p.get_structure("input","input.pdb")
designed_pdb = p.get_structure("output","output.pdb")


native_ca_atoms = []
native_all_atoms = []
designed_ca_atoms = []
designed_all_atoms = []
for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()):
    native_ca_atoms.append(native_residue['CA'])
    # take the CA from the designed residue, not the native one
    designed_ca_atoms.append(designed_residue['CA'])
    for (native_atom, designed_atom) in zip(native_residue.get_list(), designed_residue.get_list()):
        native_all_atoms.append(native_atom)
        # append to the list, not to the atom itself
        designed_all_atoms.append(designed_atom)


superpose_ca = Superimposer()
superpose_all = Superimposer()

# Superimposer takes its atom lists via set_atoms(fixed, moving)
superpose_ca.set_atoms(native_ca_atoms, designed_ca_atoms)
superpose_ca.apply(designed_pdb.get_atoms())
# my_spiffy_rms_calculator is our own helper (not shown here)
ca_rms = my_spiffy_rms_calculator(native_ca_atoms, designed_ca_atoms)


superpose_all.set_atoms(native_all_atoms, designed_all_atoms)
# apply the all-atom fit here, not the CA fit again
superpose_all.apply(designed_pdb.get_atoms())
all_rms = my_spiffy_rms_calculator(native_all_atoms, designed_all_atoms)


For the CA atoms it's not really a big deal, since everything we design has a CA atom. However, when we go to all atoms, it turns out that the designed residue and the native residue can be different, thus leading to a different number of atoms. I didn't realize it, but the zip function was making these two lists as long as the shorter list and not necessarily matching up the atoms. It would just hack off part of the longer list! This way, the Superimposer was never failing, because it always had the same number of atoms in each list. Is the Superimposer smart enough to just minimize the RMSD no matter how the lists are input, no matter what order? For instance, if I put the same arginine atoms backwards in one list and forwards in the other, would it still be able to give a 0.0 RMSD?
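
To make the zip behaviour concrete, here is a toy version with made-up atom names instead of real residues (the names and the name-based pairing are only an illustration, not what our script currently does):

```python
# Atom names for two residues of different types (shortened, made-up lists).
native_atoms = ["N", "CA", "C", "O", "CB", "CG", "CD"]
designed_atoms = ["N", "CA", "C", "O", "CB", "OG"]

# zip stops at the shorter list and pairs purely by position.
by_position = list(zip(native_atoms, designed_atoms))

# Pairing by atom name keeps only atoms present in both residues.
designed_names = set(designed_atoms)
by_name = [(name, name) for name in native_atoms if name in designed_names]

print(len(by_position), by_position[-1])
print(len(by_name), by_name[-1])
```

In the positional pairing the last pair mixes two different atoms, and the seventh native atom is silently dropped.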

Thank you for your feedback,
Jordan

PS. Does the superimposer.rms method give back the RMSD of whatever atoms you put into it? Or is it always the CA atoms?


From anaryin at gmail.com  Tue Apr  2 07:07:08 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 2 Apr 2013 09:07:08 +0200
Subject: [Biopython] Superimposer troubles
In-Reply-To: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>
References: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>
Message-ID: <CAJ9sUYPCXVn1iP_cw5X+-x3bDjEQZ913EQuHvRn08o_+41_-nQ@mail.gmail.com>

Hey Jordan,

Without checking the code, I'd say order matters. The two sequences of
atoms will be aligned per position. If you have ca, c, n, o or ca, n, o, c
you'll get different results.

Try a simple glycine and switch the order of the atoms. I think it should
work like this, but again, not sure.
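
A quick way to see the position-wise pairing without Biopython (plain
Python, made-up coordinates, and no fitting step, so this is not the
Superimposer itself):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates standing in for the N, CA, C and O atoms of a glycine.
glycine = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (1.2, 2.3, 0.0)]

same_order = rmsd(glycine, glycine)
reversed_order = rmsd(glycine, list(reversed(glycine)))

print(same_order)          # atoms paired with themselves
print(reversed_order > 0)  # same atoms, but paired with the wrong partners
```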

As for the rms value, it depends on the input. If it's ca only, you get ca
rmsd, etc.

Cheers,

João
-----

This message was sent from a mobile phone and is likely to be short,
concise, and direct.
On 2 Apr 2013 07:26, "Willis, Jordan R" <
jordan.r.willis at vanderbilt.edu> wrote:

>
> Hello List,
>
>
> I'm having trouble working through some issues with the superimposer for
> all-atom superpositions. Often times, we work on protein design and our end
> PDB files differs in atom-number and sometimes composition from our input.
> I'm a big fan of the Superimposer, so we have implemented like this:
>
> from Bio.PDB import PDBParser, Superimposer
>
> p = PDBParser()
> native_pdb = p.get_structure("input", "input.pdb")
> designed_pdb = p.get_structure("output", "output.pdb")
>
> native_ca_atoms = []
> native_all_atoms = []
> designed_ca_atoms = []
> designed_all_atoms = []
> # zip() stops at the shorter input, so unmatched residues/atoms are dropped.
> for native_residue, designed_residue in zip(native_pdb.get_residues(),
>                                             designed_pdb.get_residues()):
>     native_ca_atoms.append(native_residue['CA'])
>     designed_ca_atoms.append(designed_residue['CA'])
>     for native_atom, designed_atom in zip(native_residue.get_list(),
>                                           designed_residue.get_list()):
>         native_all_atoms.append(native_atom)
>         designed_all_atoms.append(designed_atom)
>
> superpose_ca = Superimposer()
> superpose_all = Superimposer()
>
> # CA-only superposition; my_spiffy_rms_calculator is our own helper.
> superpose_ca.set_atoms(native_ca_atoms, designed_ca_atoms)
> superpose_ca.apply(list(designed_pdb.get_atoms()))
> ca_rms = my_spiffy_rms_calculator(native_ca_atoms, designed_ca_atoms)
>
> # All-atom superposition
> superpose_all.set_atoms(native_all_atoms, designed_all_atoms)
> superpose_all.apply(list(designed_pdb.get_atoms()))
> all_rms = my_spiffy_rms_calculator(native_all_atoms, designed_all_atoms)
>
>
> For the CA atoms it's not really a big deal, since everything we
> design has a CA atom. However, when we go to all atoms, it turns out that
> the designed residue and the native residue can be different, thus leading
> to a different number of atoms. I didn't realize it, but the zip function was
> truncating the two lists to the length of the shorter one and not necessarily
> matching up the atoms. It would just hack off some part of the larger list!
> This way, the Superimposer was never failing because it always had an
> equal number of atoms. Is the Superimposer smart enough to just minimize the
> RMSD no matter how the lists are input, no matter what order? For instance,
> if I put the same arginine's atoms backwards in one list and forwards in
> the other list, would it still be able to give a 0.0 RMSD?
>
> Thank you for your feedback,
> Jordan
>
> PS. Does the superimposer.rms method give back the RMSD of whatever atoms
> you put into it? Or is it always the CA atoms?
>


From p.j.a.cock at googlemail.com  Tue Apr  2 09:38:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Apr 2013 10:38:24 +0100
Subject: [Biopython] Superimposer troubles
In-Reply-To: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>
References: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CCA7E67@ITS-HCWNEM108.ds.vanderbilt.edu>
Message-ID: <CAKVJ-_4JnmmahzwMnDYtrps8C5B8=hMtB4VUGXkcAMg_gT3AVA@mail.gmail.com>

On Tue, Apr 2, 2013 at 5:40 AM, Willis, Jordan R
<jordan.r.willis at vanderbilt.edu> wrote:
>
> Hello List,
>
>
> I'm having trouble working through some issues with the Superimposer for all-atom
> superpositions. We work on protein design, and our final PDB files often
> differ in atom count and sometimes composition from our input. I'm a big fan
> of the Superimposer, so we have implemented it like this:
>
> p = PDBParser()
> native_pdb = p.get_structure("input","input.pdb")
> designed_pdb = p.get_structure("output","output.pdb")
>
>
> native_ca_atoms = []
> native_all_atoms = []
> designed_ca_atoms = []
> designed_all_atoms = []
> for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()):
>         native_ca_atoms.append(native_residue['CA'])
>         designed_ca_atoms.append(designed_residue['CA'])
>         ...
>
> For the CA atoms it's not really a big deal, since everything we design
> has a CA atom. However, when we go to all atoms, it turns out that the
> designed residue and the native residue can be different, thus leading to a
> different number of atoms. I didn't realize it, but the zip function was
> truncating the two lists to the length of the shorter one and not necessarily
> matching up the atoms. It would just hack off some part of the larger list!
> This way, the Superimposer was never failing because it always had an
> equal number of atoms.

How about using izip_longest (from itertools) rather than zip? That
should give a clear error when the residue counts are different.
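
As a rough sketch of that idea (the helper name `strict_pairs` and the sentinel are illustrative assumptions, not part of Biopython), the longest-zip variant can be wrapped so that a length mismatch raises an explicit error instead of being silently truncated. The import covers both the Python 2 name (izip_longest) and the Python 3 name (zip_longest).

```python
try:
    from itertools import zip_longest  # Python 3 name
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2 name

_MISSING = object()  # sentinel meaning "no partner on this side"


def strict_pairs(native_items, designed_items):
    """Yield (native, designed) pairs; raise ValueError if the lengths differ."""
    paired = zip_longest(native_items, designed_items, fillvalue=_MISSING)
    for index, (native, designed) in enumerate(paired):
        if native is _MISSING or designed is _MISSING:
            raise ValueError("inputs differ in length: no partner at position %d" % index)
        yield native, designed


# Equal lengths: behaves like zip().
pairs = list(strict_pairs(["N", "CA", "C", "O"], ["N", "CA", "C", "O"]))
print(pairs)

# Unequal lengths: the mismatch is reported rather than hidden.
try:
    list(strict_pairs(["N", "CA", "C", "O", "CB"], ["N", "CA", "C", "O"]))
except ValueError as err:
    print("mismatch detected: %s" % err)
```

This only detects a count mismatch; it does not check that the paired items are actually the same kind of atom.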

In general however, dealing with similar but different chains will
require some sort of pairwise alignment and/or restricting to just
backbone atoms (like CA, C-alpha).
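
For the "restrict to backbone atoms" option, one possible sketch is to pair atoms by name rather than by position, keeping only the names present in both residues. The function `matched_atoms` and the `BACKBONE` tuple are hypothetical names; the code assumes the two residues are already known to correspond and that a residue supports `name in residue` and `residue[name]`, as a dictionary does (Bio.PDB residues are expected to behave similarly, but please check your version).

```python
BACKBONE = ("N", "CA", "C", "O")


def matched_atoms(native_residue, designed_residue, names=BACKBONE):
    """Return two equally long lists of atoms, paired by atom name."""
    native_atoms, designed_atoms = [], []
    for name in names:
        # Skip any atom that is missing from either residue.
        if name in native_residue and name in designed_residue:
            native_atoms.append(native_residue[name])
            designed_atoms.append(designed_residue[name])
    return native_atoms, designed_atoms


# Plain dictionaries stand in for residues here (atom name -> atom).
native = {"N": "n1", "CA": "ca1", "C": "c1", "O": "o1", "CB": "cb1"}
designed = {"N": "n2", "CA": "ca2", "C": "c2", "O": "o2"}

native_list, designed_list = matched_atoms(native, designed)
print(native_list)
print(designed_list)
```

Because the pairing is by name, the order in which atoms appear in each file no longer matters, and the side-chain atom that exists in only one residue is left out.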

Peter


From p.j.a.cock at googlemail.com  Tue Apr  2 16:33:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Apr 2013 17:33:53 +0100
Subject: [Biopython] New to BP. Looking for closely spaced genes
In-Reply-To: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>
References: <CAEwaGEu9gBsdJEy5JiyG06CvE0xLTTE7RDbxHDXz-k9Z9ZxXMg@mail.gmail.com>
Message-ID: <CAKVJ-_5WSp82MG988UEBJ7YbksU0PKjizOAMP2t-DemhNJReTA@mail.gmail.com>

On Mon, Apr 1, 2013 at 7:41 PM, Mark Budde <markbudde at gmail.com> wrote:
> Hi,
> Before I dive too far into BioPython, I'd like to get some input on whether
> BioPython is an appropriate tool for my task....
>
> I would like to look at the human genome ORF structure and identify regions
> where ORFs are closely spaced but differentially regulated, and also
> identify whether the ORFs are facing the same direction or opposing
> directions. To do this, I assume I would first download the annotated
> genome and write a script in BioPython annotating how far each ORF is from
> its neighbors, what the orientation is, and store the result in a
> dictionary. Then I would download some expression data sets and add this
> data to the dictionary. Then I would write some algorithm comparing
> gene distance, orientation and expression correlation to generate a list of
> candidate ORF pairs which fit my criteria.
>
> My question is, is BioPython a reasonable tool to accomplish this, or is it
> going to be way too slow whereas some alternative package is better suited
> for my task?
> Thanks,
> Mark Budde

Hi Mark,

That sounds very doable with Biopython parsing GenBank format
chromosomes downloaded from the NCBI/EMBL/DDBJ. I did
something similar to look at overlaps and gaps between genes of
bacteria some years back - also using the Biopython GenBank
parser, e.g. http://mbe.oxfordjournals.org/cgi/content/abstract/msp302

In your case with humans there'll be lots of intron/exon structure
(join locations in GenBank) so I'd recommend trying the current
code from git (which will become Biopython 1.62) where this has
been re-factored to hopefully make joins much easier than before.
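
A minimal sketch of the neighbour comparison, under stated assumptions: the function names, the tuple layout and the orientation labels are all illustrative, and the GenBank helper assumes a file holding a single chromosome whose gene features carry a "gene" qualifier. The core function works on plain tuples, so it can be tried without Biopython installed.

```python
def neighbour_pairs(genes):
    """Describe each pair of adjacent genes on one chromosome.

    genes: iterable of (name, start, end, strand) tuples with 0-based,
    end-exclusive coordinates and strand given as +1, -1 or None.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right[1] - left[2]  # negative when the two genes overlap
        if left[3] is None or right[3] is None:
            orientation = "unknown"
        elif left[3] == right[3]:
            orientation = "tandem"      # --> -->  or  <-- <--
        elif left[3] == -1:
            orientation = "divergent"   # <-- -->
        else:
            orientation = "convergent"  # --> <--
        pairs.append((left[0], right[0], gap, orientation))
    return pairs


def genes_from_genbank(path):
    """Collect gene intervals from a GenBank file (requires Biopython)."""
    from Bio import SeqIO  # imported here so the rest runs without Biopython
    genes = []
    for record in SeqIO.parse(path, "genbank"):
        for feature in record.features:
            if feature.type == "gene":
                name = feature.qualifiers.get("gene", ["?"])[0]
                genes.append((name, int(feature.location.start),
                              int(feature.location.end), feature.location.strand))
    return genes


# Made-up intervals, just to exercise the pairing logic.
example = [("geneB", 650, 900, -1), ("geneA", 100, 500, 1), ("geneC", 880, 1200, -1)]
result = neighbour_pairs(example)
for pair in result:
    print(pair)
```

The resulting tuples could then be filtered by gap size and joined with expression data keyed on gene name.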

Regards,

Peter


From linxzh1989 at gmail.com  Sat Apr  6 02:53:49 2013
From: linxzh1989 at gmail.com (=?GB2312?B?wdbQ0Nba?=)
Date: Sat, 6 Apr 2013 10:53:49 +0800
Subject: [Biopython] MUSCLE for alignment
Message-ID: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>

Hi all !
I have a seqdump.fasta file:
>lcl|24977
TGAGAAAGACTTGAGAGGACA

>lcl|24977:8-21
GAGATGACTTAGAGGACA

I want to use the Biopython wrapper for MUSCLE to align the two sequences.
The alignment result should be written to an existing FASTA file.

>>> from Bio.Align.Applications import MuscleCommandline
>>> mcline = MuscleCommandline(input='seqdump.fasta', out='result.fasta')

But I cannot find anything in result.fasta after I run the command.
Am I missing a step needed to get the result?

regards
Lin


From p.j.a.cock at googlemail.com  Sat Apr  6 08:58:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 09:58:30 +0100
Subject: [Biopython] MUSCLE for alignment
In-Reply-To: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>
References: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>
Message-ID: <CAKVJ-_55yZuX5BKansnu0HfPTLHoj92WE9J0g7koFd2e+MpmjQ@mail.gmail.com>

On Sat, Apr 6, 2013 at 3:53 AM, 林行众 <linxzh1989 at gmail.com> wrote:
> Hi all !
> I have a seqdump.fasta file:
>>lcl|24977
> TGAGAAAGACTTGAGAGGACA
>
>>lcl|24977:8-21
> GAGATGACTTAGAGGACA
>
> I want to use the Biopython wrapper for MUSCLE to align the two sequences.
> The alignment result should be written to an existing FASTA file.
>
> >>> from Bio.Align.Applications import MuscleCommandline
> >>> mcline = MuscleCommandline(input='seqdump.fasta', out='result.fasta')
>
> But I cannot find anything in result.fasta after I run the command.
> Am I missing a step needed to get the result?
>
> regards
> Lin

Hi Lin,

In your example you've not yet called MUSCLE:

# Load the library:
from Bio.Align.Applications import MuscleCommandline

# Create the command line wrapper instance:
mcline = MuscleCommandline(input='seqdump.fasta', out='result.fasta')

# Optionally show what command it would run:
print(mcline)

# Actually run the command:
stdout, stderr = mcline()

Does that help? The main Tutorial does have some more
detailed examples.
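
The same two-step idea (first describe the command, then run it) can be sketched with the standard library alone. This is not the Biopython wrapper: `build_muscle_command` and `run_command` are hypothetical helpers, and the `-in`/`-out` options are those of MUSCLE 3.x, so a different major version may need other options.

```python
import os
import shutil
import subprocess


def build_muscle_command(infile, outfile, executable="muscle"):
    """Return the argument list only; nothing is executed here."""
    return [executable, "-in", infile, "-out", outfile]


def run_command(command):
    """Run the command and return (returncode, stdout, stderr) as text."""
    completed = subprocess.run(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, universal_newlines=True)
    return completed.returncode, completed.stdout, completed.stderr


# Step 1: build the command. No output file exists yet at this point.
command = build_muscle_command("seqdump.fasta", "result.fasta")
print(" ".join(command))

# Step 2: actually run it, but only if MUSCLE and the input file are present.
if shutil.which(command[0]) and os.path.exists("seqdump.fasta"):
    returncode, stdout, stderr = run_command(command)
    print("exit status: %d" % returncode)
```

Until the second step runs, result.fasta is never written, which matches the empty-output symptom described in the question.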

Peter


From p.j.a.cock at googlemail.com  Sat Apr  6 11:41:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 12:41:33 +0100
Subject: [Biopython] MUSCLE for alignment
In-Reply-To: <CALzRd7MYYmQmCU4qJsy4iGnpvcC5HzNCfxs87QP4j9vpCb5okg@mail.gmail.com>
References: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>
	<CAKVJ-_55yZuX5BKansnu0HfPTLHoj92WE9J0g7koFd2e+MpmjQ@mail.gmail.com>
	<CALzRd7MYYmQmCU4qJsy4iGnpvcC5HzNCfxs87QP4j9vpCb5okg@mail.gmail.com>
Message-ID: <CAKVJ-_7yLTUTr9_OUjkx-NLcN+D5kT4XU+u96hYY5+kvja-3zA@mail.gmail.com>

On Sat, Apr 6, 2013 at 12:18 PM, 林行众 <linxzh1989 at gmail.com> wrote:
> Thank you, Peter!
> It really helps me.
> If I do not specify it by: stdout, stderr = mcline()
> the alignment will be written to stdout, instead of the output file.
> Is that correct?

MUSCLE will by default write the alignment to stdout, but you
used the out argument to specify an output filename instead.
In this case stdout will probably be empty.
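
A small sketch of handling both cases, assuming FASTA-formatted output: use the captured stdout text when it holds the alignment, otherwise read the output file. The helpers `read_fasta_text` and `alignment_from_run` and the example alignment text are made up for illustration; with Biopython installed, the Tutorial's approach of passing the text to AlignIO through a StringIO handle would be the more natural route.

```python
def read_fasta_text(text):
    """Parse FASTA-formatted text into a list of (title, sequence) tuples."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def alignment_from_run(stdout, outfile=None):
    """Use stdout when it holds the alignment, otherwise fall back to the file."""
    if stdout.strip():
        return read_fasta_text(stdout)
    with open(outfile) as handle:
        return read_fasta_text(handle.read())


# Made-up aligned output, standing in for text captured from stdout.
example_stdout = ">seq1\nTGAGAAAGACTTGAGAGGACA\n>seq2\n---GAGATGACTTAGAGGACA\n"
records = alignment_from_run(example_stdout)
for title, sequence in records:
    print(title, sequence)
```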

There are some stdout examples using MUSCLE in the
Biopython Tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Peter

P.S. Please CC the mailing list.


From linxzh1989 at gmail.com  Sat Apr  6 13:57:31 2013
From: linxzh1989 at gmail.com (=?GB2312?B?wdbQ0Nba?=)
Date: Sat, 6 Apr 2013 21:57:31 +0800
Subject: [Biopython] MUSCLE for alignment
In-Reply-To: <CAKVJ-_7yLTUTr9_OUjkx-NLcN+D5kT4XU+u96hYY5+kvja-3zA@mail.gmail.com>
References: <CALzRd7On5sh2hEfu_E7-S5QZ4O53YP77VrnnWre8CB63=DD6QQ@mail.gmail.com>
	<CAKVJ-_55yZuX5BKansnu0HfPTLHoj92WE9J0g7koFd2e+MpmjQ@mail.gmail.com>
	<CALzRd7MYYmQmCU4qJsy4iGnpvcC5HzNCfxs87QP4j9vpCb5okg@mail.gmail.com>
	<CAKVJ-_7yLTUTr9_OUjkx-NLcN+D5kT4XU+u96hYY5+kvja-3zA@mail.gmail.com>
Message-ID: <CALzRd7O2kvahTjF01hQ3OmTywGWgRnXqEKHoKcRtKhN1SbwX9Q@mail.gmail.com>

Thank you for your advice.
I will CC the mailing list.

regards

2013/4/6 Peter Cock <p.j.a.cock at googlemail.com>:
> On Sat, Apr 6, 2013 at 12:18 PM, 林行众 <linxzh1989 at gmail.com> wrote:
>> Thank you, Peter!
>> It really helps me.
>> If I do not specify it by: stdout, stderr = mcline()
>> the alignment will be written to stdout, instead of the output file.
>> Is that correct?
>
> MUSCLE will by default write the alignment to stdout, but you
> used the out argument to specify an output filename instead.
> In this case stdout will probably be empty.
>
> There are some stdout examples using MUSCLE in the
> Biopython Tutorial:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
> Peter
>
> P.S. Please CC the mailing list.


From nicolas.joannin at gmail.com  Sat Apr  6 15:31:40 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Sun, 7 Apr 2013 00:31:40 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
Message-ID: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>

Hello everyone,

I'm having a problem installing biopython with Python 3.3.1rc1...
Basically, I get several tests failing (in addition to a lot of warnings).

I don't think the failed tests will be a problem for my work, however, I
thought you'd want to have a look... Attached is the output of python3
setup.py test.

Also, if you think I shouldn't use biopython without having these failed
tests fixed first, please let me know!

Best regards,
Nicolas
-------------- next part --------------
Nicolass-MacBook-Air:biopython NicojoAir11$ python3 setup.py test
WARNING - Biopython does not yet officially support Python 3
The 2to3 library will be called automatically now,
and the converted files cached under build/py3.3
Processing Bio
Processing BioSQL
Processing Tests
Processing Scripts
Processing Doc
Python 2to3 processing done.
running test
Python version: 3.3.1rc1 (v3.3.1rc1:92c2cfb92405, Mar 25 2013, 00:54:04) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Operating system: posix darwin
test_Ace ... ok
test_AlignIO ... ok
test_AlignIO_FastaIO ... ok
test_AlignIO_convert ... ok
test_Application ... ok
test_BioSQL_MySQLdb ... skipping. Install MySQLdb if you want to use mysql with BioSQL 
test_BioSQL_psycopg2 ... skipping. Connection failed, check settings if you plan to use BioSQL: FATAL:  role "postgres" does not exist

test_BioSQL_sqlite3 ... ok
test_CAPS ... ok
test_Chi2 ... ok
test_ClustalOmega_tool ... skipping. Install clustalo if you want to use Clustal Omega from Biopython.
test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use it from Biopython.
test_Cluster ... ok
test_CodonTable ... ok
test_CodonUsage ... ok
test_ColorSpiral ... skipping. Install reportlab if you want to use Bio.Graphics.
test_Compass ... ok
test_Crystal ... ok
test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper.
test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL.
test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss.
test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools.
test_EmbossPrimer ... ok
test_Entrez ... ok
test_Entrez_online ... FAIL
test_Enzyme ... ok
test_FSSP ... ok
test_Fasttree_tool ... skipping. Install fasttree and correctly set the file path to the program if you want to use it from Biopython.
test_File ... ok
test_GACrossover ... ok
test_GAMutation ... ok
test_GAOrganism ... ok
test_GAQueens ... ok
test_GARepair ... ok
test_GASelection ... ok
test_GenBank ... ok
test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics.
test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics.
test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics.
test_HMMCasino ... ok
test_HMMGeneral ... ok
test_HotRand ... ok
test_KDTree ... ok
test_KEGG ... ok
test_KeyWList ... ok
test_Location ... ok
test_LogisticRegression ... ok
test_MMCIF ... skipping. C extension MMCIFlex not installed.
test_Mafft_tool ... ok
test_MarkovModel ... ok
test_Medline ... ok
test_Motif ... ok
test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper.
test_NCBIStandalone ... ok
test_NCBITextParser ... ok
test_NCBIXML ... ok
test_NCBI_BLAST_tools ... ok
test_NCBI_qblast ... ok
test_NNExclusiveOr ... ok
test_NNGene ... ok
test_NNGeneral ... ok
test_Nexus ... ok
test_PAML_baseml ... ok
test_PAML_codeml ... ok
test_PAML_tools ... skipping. Install PAML if you want to use the Bio.Phylo.PAML wrapper.
test_PAML_yn00 ... ok
test_PDB ... ok
test_PDB_KDTree ... ok
test_ParserSupport ... ok
test_Pathway ... ok
test_Phd ... ok
test_Phylo ... ok
test_PhyloXML ... ok
test_Phylo_CDAO ... skipping. Install the librdf Python bindings if you want to use the CDAO tree format.
test_Phylo_NeXML ... ./test_Phylo_NeXML.py:87: ResourceWarning: unclosed file <_io.BufferedReader name='/var/folders/9w/kkwnss4n52bbc3crhctbhfnh0000gn/T/tmpf9__6a'>
  t2 = next(NeXMLIO.Parser(open(DUMMY, 'rb')).parse())
ok
test_Phylo_depend ... skipping. Install matplotlib if you want to use Bio.Phylo._utils.
test_PopGen_DFDist ... skipping. Install Dfdist, Ddatacal, pv2 and cplot2 if you want to use DFDist with Bio.PopGen.FDist.
test_PopGen_FDist ... skipping. Install fdist2, datacal, pv and cplot if you want to use FDist2 with Bio.PopGen.FDist.
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop.
test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop.
test_PopGen_GenePop_nodepend ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal.
test_PopGen_SimCoal_nodepend ... ok
test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper.
test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper.
test_ProtParam ... ok
test_Restriction ... ok
test_SCOP_Astral ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='SCOP/scopseq-test/astral-scopdom-seqres-all-test.fa' mode='r' encoding='UTF-8'>
  for record in sequences:
ok
test_SCOP_Cla ... ok
test_SCOP_Des ... ok
test_SCOP_Dom ... ok
test_SCOP_Hie ... ok
test_SCOP_Raf ... ok
test_SCOP_Residues ... ok
test_SCOP_Scop ... ok
test_SCOP_online ... ok
test_SVDSuperimposer ... ok
test_SearchIO_blast_tab ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:213: BiopythonExperimentalWarning: Bio.SearchIO is an experimental submodule which may undergo significant changes prior to its future official release.
  BiopythonExperimentalWarning)
ok
test_SearchIO_blast_tab_index ... ok
test_SearchIO_blast_text ... ok
test_SearchIO_blast_xml ... ok
test_SearchIO_blast_xml_index ... ok
test_SearchIO_blat_psl ... ok
test_SearchIO_blat_psl_index ... ok
test_SearchIO_exonerate ... ok
test_SearchIO_exonerate_text_index ... ok
test_SearchIO_exonerate_vulgar_index ... ok
test_SearchIO_fasta_m10 ... ok
test_SearchIO_fasta_m10_index ... ok
test_SearchIO_hmmer2_text ... ok
test_SearchIO_hmmer2_text_index ... ok
test_SearchIO_hmmer3_domtab ... ok
test_SearchIO_hmmer3_domtab_index ... ok
test_SearchIO_hmmer3_tab ... ok
test_SearchIO_hmmer3_tab_index ... ok
test_SearchIO_hmmer3_text ... ok
test_SearchIO_hmmer3_text_index ... ok
test_SearchIO_model ... ok
test_SearchIO_write ... ok
test_SeqIO ... ok
test_SeqIO_AbiIO ... ok
test_SeqIO_FastaIO ... ./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids))
./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  default = list(SeqIO.parse(open(filename), "fasta", alphabet))
./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'>
  re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids))
./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'>
  default = list(SeqIO.parse(open(filename), "fasta", alphabet))
./test_SeqIO_FastaIO.py:94: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/fa01' mode='r' encoding='UTF-8'>
  re_titled = list(FastaIterator(open(filename), alphabet, title_to_ids))
./test_SeqIO_FastaIO.py:95: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/fa01' mode='r' encoding='UTF-8'>
  default = list(SeqIO.parse(open(filename), "fasta", alphabet))
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/centaurea.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/centaurea.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/elderberry.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/elderberry.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f001' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f001' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lavender.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lavender.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lupine.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/lupine.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/phlox.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/phlox.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/sweetpea.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/sweetpea.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/wisteria.nu' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/wisteria.nu' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/aster.pro' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/aster.pro' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/loveliesbleeding.pro' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/loveliesbleeding.pro' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rose.pro' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rose.pro' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
./test_SeqIO_FastaIO.py:48: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rosemary.pro' mode='r' encoding='UTF-8'>
  second = next(iterator)
./test_SeqIO_FastaIO.py:83: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/rosemary.pro' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(filename), "fasta", alphabet)
ok
test_SeqIO_Insdc ... ok
test_SeqIO_PdbIO ... ok
test_SeqIO_QualityIO ... ./test_SeqIO_QualityIO.py:348: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  records1 = list(SeqIO.parse(open("Quality/example.fasta"),"fasta"))
./test_SeqIO_QualityIO.py:349: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq"))
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:357: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  self.assertEqual(h.getvalue(),open("Quality/example.fasta").read())
./test_SeqIO_QualityIO.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  open("Quality/example.qual")))
./test_SeqIO_QualityIO.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  open("Quality/example.qual")))
./test_SeqIO_QualityIO.py:329: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq"))
./test_SeqIO_QualityIO.py:334: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  records1 = list(SeqIO.parse(open("Quality/example.qual"),"qual"))
./test_SeqIO_QualityIO.py:335: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  records2 = list(SeqIO.parse(open("Quality/example.fastq"),"fastq"))
./test_SeqIO_QualityIO.py:344: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  self.assertEqual(h.getvalue(),open("Quality/example.qual").read())
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_original_illumina.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/longreads_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_dna_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/misc_rna_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_original_solexa.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_full_range_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_sanger.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_original_sanger.fastq' mode='r' encoding='UTF-8'>
  for record in records:
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_solexa.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:287: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/wrapping_as_illumina.fastq' mode='rU' encoding='UTF-8'>
  "rU").read()
./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads_no_trim.fasta' mode='r' encoding='UTF-8'>
  wanted = list(SeqIO.parse(open(out_name), format))
./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads_no_trim.qual' mode='r' encoding='UTF-8'>
  wanted = list(SeqIO.parse(open(out_name), format))
./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads.fasta' mode='r' encoding='UTF-8'>
  wanted = list(SeqIO.parse(open(out_name), format))
./test_SeqIO_QualityIO.py:223: ResourceWarning: unclosed file <_io.TextIOWrapper name='Roche/E3MFGYR02_random_10_reads.qual' mode='r' encoding='UTF-8'>
  wanted = list(SeqIO.parse(open(out_name), format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_at_end.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_at_start.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_alt_index_in_middle.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_index_at_start.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_index_in_middle.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_no_manifest.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/greek.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_faked.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/paired.sff'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_93.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_faked.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_example.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
./test_SeqIO_QualityIO.py:45: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/tricky.fastq' mode='r' encoding='UTF-8'>
  records = list(SeqIO.parse(open(filename, mode),in_format))
ok
test_SeqIO_SeqXML ... ./test_SeqIO_SeqXML.py:141: DeprecationWarning: Please use assertEqual instead.
  self.assertEquals(len(read1_records),len(read2_records))
ok
test_SeqIO_convert ... ok
test_SeqIO_features ... ./test_SeqIO_features.py:190: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/iro.gb' mode='rU' encoding='UTF-8'>
  gbk_template = open("GenBank/iro.gb", "rU").read()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqFeature.py:155: BiopythonDeprecationWarning: Rather than sub_features, use a CompoundFeatureLocation
  BiopythonDeprecationWarning)
./test_SeqIO_features.py:988: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:989: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'>
  gb_cds = list(SeqIO.parse(open(self.gb_filename),"genbank-cds"))
./test_SeqIO_features.py:990: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.faa' mode='r' encoding='UTF-8'>
  fasta = list(SeqIO.parse(open(self.faa_filename),"fasta"))
./test_SeqIO_features.py:988: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:989: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_cds = list(SeqIO.parse(open(self.gb_filename),"genbank-cds"))
./test_SeqIO_features.py:990: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.faa' mode='r' encoding='UTF-8'>
  fasta = list(SeqIO.parse(open(self.faa_filename),"fasta"))
./test_SeqIO_features.py:1070: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:1072: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.ffn' mode='r' encoding='UTF-8'>
  fa_records = list(SeqIO.parse(open(self.ffn_filename),"fasta"))
./test_SeqIO_features.py:1023: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:1024: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'>
  embl_record = SeqIO.read(open(self.embl_filename),"embl")
./test_SeqIO_features.py:1054: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_record = SeqIO.read(open(self.gb_filename),"genbank")
./test_SeqIO_features.py:1055: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.fna' mode='r' encoding='UTF-8'>
  fa_record = SeqIO.read(open(self.fna_filename),"fasta")
./test_SeqIO_features.py:1059: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'>
  embl_record = SeqIO.read(open(self.embl_filename),"embl")
./test_SeqIO_features.py:1036: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.faa' mode='r' encoding='UTF-8'>
  faa_records = list(SeqIO.parse(open(self.faa_filename),"fasta"))
./test_SeqIO_features.py:1037: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.ffn' mode='r' encoding='UTF-8'>
  ffn_records = list(SeqIO.parse(open(self.ffn_filename),"fasta"))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AAA03323.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/AE017046.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/DD231055_edited.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/Human_contigs.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_000932.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NT_019265.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/SC10H5.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/TRBG361.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='EMBL/U87107.embl' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/arab1.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/blank_seq.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/cor6_6.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/dbsource_wrap.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/extra_keywords.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/gbvrl1_start.seq' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/noref.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/one_of.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/origin_line.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/pri1.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/protein_refseq.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
./test_SeqIO_features.py:28: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/protein_refseq2.gb' mode='r' encoding='UTF-8'>
  gb_records = list(SeqIO.parse(open(filename),in_format))
ok
test_SeqIO_index ... FAIL
test_SeqIO_online ... ok
test_SeqIO_write ... ok
test_SeqRecord ... ok
test_SeqUtils ... ./test_SeqUtils.py:71: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  record = SeqIO.read(open(dna_genbank_filename), "genbank")
./test_SeqUtils.py:55: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/f002' mode='r' encoding='UTF-8'>
  seq_records = list(SeqIO.parse(open(dna_fasta_filename), "fasta"))
ok
test_Seq_objs ... ok
test_SffIO ... ok
test_SubsMat ... ./test_SubsMat.py:21: ResourceWarning: unclosed file <_io.TextIOWrapper name='SubsMat/protein_count.txt' mode='r' encoding='UTF-8'>
  ftab_prot = FreqTable.read_count(open(ftab_file))
./test_SubsMat.py:23: ResourceWarning: unclosed file <_io.TextIOWrapper name='SubsMat/protein_freq.txt' mode='r' encoding='UTF-8'>
  ctab_prot = FreqTable.read_freq(open(ctab_file))
./test_SubsMat.py:31: ResourceWarning: unclosed file <_io.BufferedReader name='SubsMat/acc_rep_mat.pik'>
  acc_rep_mat = pickle.load(open(pickle_file, 'rb'))
ok
test_SwissProt ... ok
test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper.
test_TogoWS ... ./test_TogoWS.py:501: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  new = SeqIO.read(TogoWS.convert(open(filename), "genbank", "embl"), "embl")
./test_TogoWS.py:494: ResourceWarning: unclosed file <_io.TextIOWrapper name='GenBank/NC_005816.gb' mode='r' encoding='UTF-8'>
  new = SeqIO.read(TogoWS.convert(open(filename), "genbank", "fasta"), "fasta")
ok
test_Tutorial ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='ls_orchid.gbk'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='ls_orchid.gbk.bgz'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_001.txt'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_005.txt'>
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.BufferedReader name='tab_2226_tblastn_001.txt'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='pubmed_result1.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='pubmed_result2.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='lipoprotein.txt' mode='r' encoding='UTF-8'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='SRF.pfm' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='REB1.pfm' mode='r' encoding='UTF-8'>
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1439: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
  test.globs.clear()
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='meme.out' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='alignace.out' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Arnt.sites' mode='r' encoding='UTF-8'>
./test_Tutorial.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='SRF.pfm' mode='r' encoding='UTF-8'>
ok
test_UniGene ... ok
test_Uniprot ... ./test_Uniprot.py:314: ResourceWarning: unclosed file <_io.TextIOWrapper name='SwissProt/multi_ex.list' mode='r' encoding='UTF-8'>
  ids = [x.strip() for x in open("SwissProt/multi_ex.list")]
./test_Uniprot.py:328: ResourceWarning: unclosed file <_io.TextIOWrapper name='SwissProt/multi_ex.list' mode='r' encoding='UTF-8'>
  ids = [x.strip() for x in open("SwissProt/multi_ex.list")]
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.BufferedReader name='SwissProt/multi_ex.txt'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.BufferedReader name='SwissProt/multi_ex.xml'>
  function()
ok
test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
test_XXmotif_tool ... skipping. Install XXmotif if you want to use XXmotif from Biopython.
test_align ... ok
test_bgzf ... FAIL
test_geo ... ./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSE16.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM645.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM691.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM700.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/GSM804.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_affy.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_affy_chp.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_dual.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_family.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
./test_geo.py:24: ResourceWarning: unclosed file <_io.TextIOWrapper name='Geo/soft_ex_platform.txt' mode='r' encoding='latin'>
  fh = open(os.path.join("Geo", file), encoding="latin")
ok
test_kNN ... ok
test_lowess ... ok
test_motifs ... ok
test_pairwise2 ... ok
test_phyml_tool ... skipping. Install PhyML 3.0 if you want to use the Bio.Phylo.Applications wrapper.
test_prodoc ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/Doc/prosite.excerpt.doc' mode='r' encoding='UTF-8'>
  function()
ok
test_prosite1 ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00107.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00159.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00165.txt' mode='r' encoding='UTF-8'>
  function()
ok
test_prosite2 ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00432.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00488.txt' mode='r' encoding='UTF-8'>
  function()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/unittest/case.py:385: ResourceWarning: unclosed file <_io.TextIOWrapper name='Prosite/ps00546.txt' mode='r' encoding='UTF-8'>
  function()
ok
test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
test_py3k ... ok
test_raxml_tool ... skipping. Install RAxML (binary raxmlHPC) if you want to test the Bio.Phylo.Applications wrapper.
test_seq ... ok
test_translate ... ok
test_trie ... skipping. Could not import Bio.trie, check C code was compiled.
Bio.Align docstring test ... ok
Bio.Align.Generic docstring test ... ok
Bio.Align.Applications._Clustalw docstring test ... ok
Bio.Align.Applications._ClustalOmega docstring test ... ok
Bio.Align.Applications._Mafft docstring test ... ok
Bio.Align.Applications._Muscle docstring test ... ok
Bio.Align.Applications._Probcons docstring test ... ok
Bio.Align.Applications._Prank docstring test ... ok
Bio.Align.Applications._TCoffee docstring test ... ok
Bio.AlignIO docstring test ... ok
Bio.AlignIO.StockholmIO docstring test ... ok
Bio.Alphabet docstring test ... ok
Bio.Application docstring test ... ok
Bio.bgzf docstring test ... FAIL
Bio.Blast.Applications docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:218: BiopythonDeprecationWarning: Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like blastall, this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:321: BiopythonDeprecationWarning: Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like blastpgp (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Blast/Applications.py:400: BiopythonDeprecationWarning: Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.
  warnings.warn("Like the old rpsblast (and blastall), this wrapper is now deprecated and will be removed in a future release of Biopython.", BiopythonDeprecationWarning)
ok
Bio.Emboss.Applications docstring test ... ok
Bio.GenBank docstring test ... ok
Bio.KEGG.Compound docstring test ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='KEGG/compound.sample' mode='r' encoding='UTF-8'>
  test.globs.clear()
ok
Bio.KEGG.Enzyme docstring test ... /Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='KEGG/enzyme.sample' mode='r' encoding='UTF-8'>
  test.globs.clear()
ok
Bio.Motif docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/SRF.pfm' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Motif/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/meme.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1289: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'>
  exception = None
ok
Bio.Motif.Applications._AlignAce docstring test ... ok
Bio.Motif.Applications._XXmotif docstring test ... ok
Bio.motifs docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Motif/alignace.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/SRF.pfm' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/meme.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:1289: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/alignace.out' mode='r' encoding='UTF-8'>
  exception = None
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/motifs/__init__.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='motifs/alignace.out' mode='r' encoding='UTF-8'>
  # Copyright 2003-2009 by Bartek Wilczynski.  All rights reserved.
ok
Bio.motifs.applications._alignace docstring test ... ok
Bio.motifs.applications._xxmotif docstring test ... ok
Bio.pairwise2 docstring test ... ok
Bio.Phylo.Applications._Raxml docstring test ... ok
Bio.SearchIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml'>
  # Copyright 2012 by Wibowo Arindrarto.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml.bgz'>
  # Copyright 2012 by Wibowo Arindrarto.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/wnts.xml'>
  test.globs.clear()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SearchIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/mirna.xml'>
  # Copyright 2012 by Wibowo Arindrarto.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Blast/mirna.xml'>
  test.globs.clear()
ok
Bio.SearchIO._model docstring test ... ok
Bio.SearchIO._model.query docstring test ... ok
Bio.SearchIO._model.hit docstring test ... ok
Bio.SearchIO._model.hsp docstring test ... ok
Bio.SearchIO.BlastIO docstring test ... ok
Bio.SearchIO.HmmerIO docstring test ... ok
Bio.SearchIO.FastaIO docstring test ... ok
Bio.SearchIO.BlatIO docstring test ... ok
Bio.SearchIO.ExonerateIO docstring test ... ok
Bio.SeqIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Fasta/f002'>
  # Copyright 2006-2010 by Peter Cock.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Fasta/f002'>
  test.globs.clear()
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq'>
  # Copyright 2006-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fastq' mode='r' encoding='UTF-8'>
  for record in sequences:
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq.bgz'>
  # Copyright 2006-2010 by Peter Cock.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Quality/example.fastq'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='GenBank/NC_000932.faa'>
  test.globs.clear()
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='GenBank/NC_005816.faa'>
  test.globs.clear()
ok
Bio.SeqIO.FastaIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/FastaIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/dups.fasta' mode='r' encoding='UTF-8'>
  # Copyright 2006-2009 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/FastaIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Fasta/dups.fasta' mode='r' encoding='UTF-8'>
  # Copyright 2006-2009 by Peter Cock.  All rights reserved.
ok
Bio.SeqIO.AceIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/AceIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Ace/consed_sample.ace' mode='rU' encoding='UTF-8'>
  # Copyright 2008-2010 by Peter Cock.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.TextIOWrapper name='Ace/contig1.ace' mode='rU' encoding='UTF-8'>
  test.globs.clear()
ok
Bio.SeqIO.PhdIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/PhdIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Phd/phd1' mode='r' encoding='UTF-8'>
  # Copyright 2008-2010 by Peter Cock.  All rights reserved.
ok
Bio.SeqIO.QualityIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/__init__.py:672: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='r' encoding='UTF-8'>
  for record in sequences:
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='r' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/illumina_faked.fastq' mode='r' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/sanger_faked.fastq' mode='r' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_example.fastq' mode='r' encoding='UTF-8'>
  for record in records:
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='rU' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/QualityIO.py:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='rU' encoding='UTF-8'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.fasta' mode='rU' encoding='UTF-8'>
  for record in records:
/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/Interfaces.py:238: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/example.qual' mode='rU' encoding='UTF-8'>
  for record in records:
ok
./run_tests.py:427: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='r' encoding='UTF-8'>
  gc.collect()
Bio.SeqIO.SffIO docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqIO/SffIO.py:1: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'>
  # Copyright 2009-2010 by Peter Cock.  All rights reserved.
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py:2130: ResourceWarning: unclosed file <_io.BufferedReader name='Roche/E3MFGYR02_random_10_reads.sff'>
  test.globs.clear()
ok
Bio.SeqFeature docstring test ... ok
Bio.SeqRecord docstring test ... /Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/SeqRecord.py:2: ResourceWarning: unclosed file <_io.TextIOWrapper name='Quality/solexa_faked.fastq' mode='rU' encoding='UTF-8'>
  # Copyright 2002-2004 Brad Chapman.
ok
Bio.SeqUtils docstring test ... ok
Bio.SeqUtils.MeltingTemp docstring test ... ok
Bio.Sequencing.Applications._Novoalign docstring test ... ok
Bio.Wise docstring test ... ok
Bio.Wise.psw docstring test ... ok
Bio.Statistics.lowess docstring test ... ok
Bio.PDB.Polypeptide docstring test ... ok
Bio.PDB.Selection docstring test ... ok
======================================================================
ERROR: test_read_from_url (test_Entrez_online.EntrezOnlineCase)
Test Entrez.read from URL
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_Entrez_online.py", line 44, in test_read_from_url
    rec = Entrez.read(einfo)
  File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/__init__.py", line 367, in read
    record = handler.read(handle)
  File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/Parser.py", line 184, in read
    self.parser.ParseFile(handle)
  File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/Entrez/Parser.py", line 322, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Unable to open connection to #DbInfo?dbaf=

======================================================================
ERROR: test_fastq-sanger_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests)
Index fastq-sanger file Quality/example.fastq.bgz get_raw
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 441, in <lambda>
    f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 281, in get_raw_check
    raw_file = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq-sanger_Quality_example_fastq_bgz_keyf (test_SeqIO_index.IndexDictTests)
Index fastq-sanger file Quality/example.fastq.bgz with key function
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 432, in <lambda>
    f = lambda x : x.key_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 171, in key_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq-sanger_Quality_example_fastq_bgz_simple (test_SeqIO_index.IndexDictTests)
Index fastq-sanger file Quality/example.fastq.bgz defaults
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 423, in <lambda>
    f = lambda x : x.simple_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 109, in simple_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq_Quality_example_fastq_bgz_get_raw (test_SeqIO_index.IndexDictTests)
Index fastq file Quality/example.fastq.bgz get_raw
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 441, in <lambda>
    f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 281, in get_raw_check
    raw_file = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq_Quality_example_fastq_bgz_keyf (test_SeqIO_index.IndexDictTests)
Index fastq file Quality/example.fastq.bgz with key function
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 432, in <lambda>
    f = lambda x : x.key_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 171, in key_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_fastq_Quality_example_fastq_bgz_simple (test_SeqIO_index.IndexDictTests)
Index fastq file Quality/example.fastq.bgz defaults
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 423, in <lambda>
    f = lambda x : x.simple_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 109, in simple_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_genbank_GenBank_cor6_6_gb_bgz_get_raw (test_SeqIO_index.IndexDictTests)
Index genbank file GenBank/cor6_6.gb.bgz get_raw
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 441, in <lambda>
    f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 281, in get_raw_check
    raw_file = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_genbank_GenBank_cor6_6_gb_bgz_keyf (test_SeqIO_index.IndexDictTests)
Index genbank file GenBank/cor6_6.gb.bgz with key function
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 432, in <lambda>
    f = lambda x : x.key_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 171, in key_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_genbank_GenBank_cor6_6_gb_bgz_simple (test_SeqIO_index.IndexDictTests)
Index genbank file GenBank/cor6_6.gb.bgz defaults
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_SeqIO_index.py", line 423, in <lambda>
    f = lambda x : x.simple_check(fn, fmt, alpha, c)
  File "./test_SeqIO_index.py", line 109, in simple_check
    h = gzip_open(filename, format)
  File "./test_SeqIO_index.py", line 49, in gzip_open
    data = handle.read()  # bytes!
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_bam_ex1 (test_bgzf.BgzfTests)
Reproduce BGZF compression for BAM file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 288, in test_bam_ex1
    self.rewrite("SamBam/ex1.bam", temp_file)
  File "./test_bgzf.py", line 34, in rewrite
    data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_example_cor6 (test_bgzf.BgzfTests)
Reproduce BGZF compression for cor6_6.gb GenBank file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 313, in test_example_cor6
    self.rewrite("GenBank/cor6_6.gb.bgz", temp_file)
  File "./test_bgzf.py", line 34, in rewrite
    data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_example_fastq (test_bgzf.BgzfTests)
Reproduce BGZF compression for a FASTQ file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 301, in test_example_fastq
    self.rewrite("Quality/example.fastq.gz", temp_file)
  File "./test_bgzf.py", line 45, in rewrite
    new_data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_example_gb (test_bgzf.BgzfTests)
Reproduce BGZF compression for NC_000932 GenBank file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 307, in test_example_gb
    self.rewrite("GenBank/NC_000932.gb.bgz", temp_file)
  File "./test_bgzf.py", line 34, in rewrite
    data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_example_wnts_xml (test_bgzf.BgzfTests)
Reproduce BGZF compression for wnts.xml BLAST file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 319, in test_example_wnts_xml
    self.rewrite("Blast/wnts.xml.bgz", temp_file)
  File "./test_bgzf.py", line 34, in rewrite
    data = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_iter_bam_ex1 (test_bgzf.BgzfTests)
Check iteration over SamBam/ex1.bam
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 296, in test_iter_bam_ex1
    self.check_by_char("SamBam/ex1.bam", "SamBam/ex1.bam", True)
  File "./test_bgzf.py", line 112, in check_by_char
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_bam_ex1 (test_bgzf.BgzfTests)
Check random access to SamBam/ex1.bam
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 232, in test_random_bam_ex1
    self.check_random("SamBam/ex1.bam")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_bam_ex1_header (test_bgzf.BgzfTests)
Check random access to SamBam/ex1_header.bam
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 240, in test_random_bam_ex1_header
    self.check_random("SamBam/ex1_header.bam")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_bam_ex1_refresh (test_bgzf.BgzfTests)
Check random access to SamBam/ex1_refresh.bam
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 236, in test_random_bam_ex1_refresh
    self.check_random("SamBam/ex1_refresh.bam")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_example_cor6 (test_bgzf.BgzfTests)
Check random access to GenBank/cor6_6.gb.bgz
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 252, in test_random_example_cor6
    self.check_random("GenBank/cor6_6.gb.bgz")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_example_fastq (test_bgzf.BgzfTests)
Check random access to Quality/example.fastq.bgz
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 248, in test_random_example_fastq
    self.check_random("Quality/example.fastq.bgz")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
ERROR: test_random_wnts_xml (test_bgzf.BgzfTests)
Check random access to Blast/wnts.xml.bgz
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_bgzf.py", line 244, in test_random_wnts_xml
    self.check_random("Blast/wnts.xml.bgz")
  File "./test_bgzf.py", line 145, in check_random
    old = h.read()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'

======================================================================
FAIL: bgzf (Bio)
Doctest: Bio.bgzf
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 2154, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for Bio.bgzf
  File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 6, in bgzf

----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 126, in Bio.bgzf
Failed example:
    line = handle.readline()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[10]>", line 1, in <module>
        line = handle.readline()
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 593, in readline
        c = self.read(readsize)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read
        if not self._read(readsize):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
        if not self._read_gzip_header():
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 305, in _read_gzip_header
        self._read_exact(struct.unpack("<H", self._read_exact(2)))
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 282, in _read_exact
        data = self.fileobj.read(n)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 81, in read
        return self.file.read(size)
    TypeError: integer argument expected, got 'tuple'
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 127, in Bio.bgzf
Failed example:
    assert 80 == handle.tell()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[11]>", line 1, in <module>
        assert 80 == handle.tell()
    AssertionError
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 128, in Bio.bgzf
Failed example:
    line = handle.readline()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[12]>", line 1, in <module>
        line = handle.readline()
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 593, in readline
        c = self.read(readsize)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read
        if not self._read(readsize):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
        if not self._read_gzip_header():
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 297, in _read_gzip_header
        raise IOError('Not a gzipped file')
    OSError: Not a gzipped file
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 129, in Bio.bgzf
Failed example:
    assert 143 == handle.tell()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[13]>", line 1, in <module>
        assert 143 == handle.tell()
    AssertionError
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 130, in Bio.bgzf
Failed example:
    data = handle.read(70000)
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[14]>", line 1, in <module>
        data = handle.read(70000)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 364, in read
        if not self._read(readsize):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 432, in _read
        if not self._read_gzip_header():
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/gzip.py", line 297, in _read_gzip_header
        raise IOError('Not a gzipped file')
    OSError: Not a gzipped file
----------------------------------------------------------------------
File "/Users/NicojoAir11/Downloads/biopython/build/py3.3/build/lib.macosx-10.6-intel-3.3/Bio/bgzf.py", line 131, in Bio.bgzf
Failed example:
    assert 70143 == handle.tell()
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 1287, in __run
        compileflags, 1), test.globs)
      File "<doctest Bio.bgzf[15]>", line 1, in <module>
        assert 70143 == handle.tell()
    AssertionError


----------------------------------------------------------------------
Ran 217 tests in 238.221 seconds

FAILED (failures = 4)

From p.j.a.cock at googlemail.com  Sat Apr  6 18:19:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 6 Apr 2013 19:19:43 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
Message-ID: <CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>

On Sat, Apr 6, 2013 at 4:31 PM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:
> Hello everyone,
>
> I'm having a problem installing biopython with Python 3.3.1rc1...
> Basically, I get several tests failing (in addition to a lot of warnings).
>
> I don't think the failed tests will be a problem for my work, however, I
> thought you'd want to have a look... Attached is the output of python3
> setup.py test.
>
> Also, if you think I shouldn't use biopython without having these failed
> tests fixed first, please let me know!
>
> Best regards,
> Nicolas

Hi Nicolas,

You should be OK installing this - all the test failures are
within Bio.bgzf which is curious, but you probably won't be
using BGZF compressed files.

We do have buildslaves testing on Python 3.3.0 where this
does not happen, so perhaps this is a new failure from a
change in Python 3.3.1rc1 - hopefully I'll be able to confirm
that by updating one of the buildslaves.

Thanks for the alert,

Peter


From markbudde at gmail.com  Sun Apr  7 00:36:10 2013
From: markbudde at gmail.com (Mark Budde)
Date: Sat, 6 Apr 2013 17:36:10 -0700
Subject: [Biopython] Restriction enzymes and sticky ends
Message-ID: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>

Hi - I have a question about sticky ends in Biopython. Specifically, is
there any way to  maintain sticky end information? Having read the
restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html),
I suspect that the answer is no. It seems that the cut sites are only
maintained for the top strand. So I am planning on adding this data in my
program (although I will need to read up on classes).

However, this requires that I can get the cut site information. The only
way that I can find to extract this information is from the
Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can
use this information to determine the cut sites, but I expect that there is
a more direct way, since the elucidate() function must be generating this
from some attribute.

FYI, I am curious about this because I want to simulate GoldenGate cloning
in Biopython.

Thanks,
Mark Budde


From nicolas.joannin at gmail.com  Sun Apr  7 03:12:54 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Sun, 7 Apr 2013 12:12:54 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
Message-ID: <CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>

Hi Peter,

Thanks for the quick reply!
Indeed, I don't think it is a big issue for me, and I have also not had any
problems with Python 3.3.0 on another machine.
So, yes, it probably is linked to the Python 3.3.1rc1...

However, I should point out that it is not only the Bio.bgzf that fails
testing.
There are also test_Entrez_online and test_SeqIO_index that are indicated
as "FAIL" (both of which I do not directly use).

Cheers,
Nicolas


Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


On Sun, Apr 7, 2013 at 3:19 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sat, Apr 6, 2013 at 4:31 PM, Nicolas Joannin
> <nicolas.joannin at gmail.com> wrote:
> > Hello everyone,
> >
> > I'm having a problem installing biopython with Python 3.3.1rc1...
> > Basically, I get several tests failing (in addition to a lot of
> warnings).
> >
> > I don't think the failed tests will be a problem for my work, however, I
> > thought you'd want to have a look... Attached is the output of python3
> > setup.py test.
> >
> > Also, if you think I shouldn't use biopython without having these failed
> > tests fixed first, please let me know!
> >
> > Best regards,
> > Nicolas
>
> Hi Nicolas,
>
> You should be OK installing this - all the test failures are
> within Bio.bgzf which is curious, but you probably won't be
> using BGZF compressed files.
>
> We do have buildslaves testing on Python 3.3.0 where this
> does not happen, so perhaps this is a new failure from a
> change in Python 3.3.1rc1 - hopefully I'll be able to confirm
> that by updating one of the buildslaves.
>
> Thanks for the alert,
>
> Peter
>


From p.j.a.cock at googlemail.com  Sun Apr  7 14:41:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 7 Apr 2013 15:41:33 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
Message-ID: <CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>

On Sun, Apr 7, 2013 at 4:12 AM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:
> Hi Peter,
>
> Thanks for the quick reply!
> Indeed, I don't think it is a big issue for me, and I have also not had any
> problems with Python 3.3.0 on another machine.
> So, yes, it probably is linked to the Python 3.3.1rc1...

I see that Python 3.3.1 final is out now - might be worth checking
that too, and I'll try to update one of our buildslaves to use this.

> However, I should point out that it is not only the Bio.bgzf that fails
> testing.
> There are also test_Entrez_online and test_SeqIO_index that are indicated as
> "FAIL" (both of which I do not directly use).

The test_SeqIO_index.py failures all looked to be BGZF related too.

I missed the Entrez test, but as an online test it can sometimes
fail intermittently anyway. Chances are it will be fine on rerunning.

Peter


From bjorn_johansson at bio.uminho.pt  Sun Apr  7 18:05:11 2013
From: bjorn_johansson at bio.uminho.pt (Björn Johansson)
Date: Sun, 7 Apr 2013 19:05:11 +0100
Subject: [Biopython] sticky ends in Biopython
Message-ID: <CAG_4V=ZOODZ5KMqm=s_Kr=5JxSVHKHxm8ozwTMKToMqBp8LkLw@mail.gmail.com>

>
> Message: 2
> Date: Sat, 6 Apr 2013 17:36:10 -0700
> From: Mark Budde <markbudde at gmail.com>
> Subject: [Biopython] Restriction enzymes and sticky ends
> To: biopython <biopython at lists.open-bio.org>
> Message-ID:
>         <
> CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi - I have a question about sticky ends in Biopython. Specifically, is
> there any way to  maintain sticky end information? Having read the
> restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html
> ),
> I suspect that the answer is no. It seems that the cut sites are only
> maintained for the top strand. So I am planning on adding this data in my
> program (although I will need to read up on classes).
>
> However, this requires that I can get the cut site information. The only
> way that I can find to extract this information is from the
> Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can
> use this information to determine the cut sites, but I expect that there is
> a more direct way, since the elucidate() function must be generating this
> from some attribute.
>
> FYI, I am curious about this because I want to simulate GoldenGate cloning
> in Biopython.
>
> Thanks,
> Mark Budde
>
>
> ------------------------------
>

Hi Mark,

Check out Python-dna, which has classes for dealing with
double-stranded DNA. This package depends on Biopython and a couple of
additional modules.

Disclaimer: I am the developer of Python-dna

Python-dna at pypi https://pypi.python.org/pypi/python-dna/
Source code        https://code.google.com/p/pydna/
Documentation      http://python-dna.readthedocs.org/
Discussion group
https://groups.google.com/forum/?fromgroups#!forum/python-dna

/ Bjorn Johansson


-- 
______O_________oO________oO______o_______oO__
Björn Johansson
Assistant Professor
Departament of Biology
University of Minho
Campus de Gualtar
4710-057 Braga
PORTUGAL
www.bio.uminho.pt
Google profile <https://profiles.google.com/bjornjobb>
Google Scholar Profile<http://scholar.google.com/citations?user=7AiEuJ4AAAAJ>
my group <https://sites.google.com/site/metabolicengineeringgroup/>
Office (direct) +351-253 601517 | (PT) mob.  +351-967 147 704 | (SWE) mob.
 +46 739 792 968
Dept of Biology (secr) +351-253 60 4310  | fax +351-253 678980


From markbudde at gmail.com  Sun Apr  7 18:48:16 2013
From: markbudde at gmail.com (Mark Budde)
Date: Sun, 7 Apr 2013 11:48:16 -0700
Subject: [Biopython] sticky ends in Biopython
In-Reply-To: <CAG_4V=ZOODZ5KMqm=s_Kr=5JxSVHKHxm8ozwTMKToMqBp8LkLw@mail.gmail.com>
References: <CAG_4V=ZOODZ5KMqm=s_Kr=5JxSVHKHxm8ozwTMKToMqBp8LkLw@mail.gmail.com>
Message-ID: <CAEwaGEsBR7D9pBo=3HLF1tkRiyXv5qq=uusCv3qsj2kupYiXXg@mail.gmail.com>

OK, that looks useful. Thanks.
-Mark


On Sun, Apr 7, 2013 at 11:05 AM, Björn Johansson <
bjorn_johansson at bio.uminho.pt> wrote:

> >
> > Message: 2
> > Date: Sat, 6 Apr 2013 17:36:10 -0700
> > From: Mark Budde <markbudde at gmail.com>
> > Subject: [Biopython] Restriction enzymes and sticky ends
> > To: biopython <biopython at lists.open-bio.org>
> > Message-ID:
> >         <
> > CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Hi - I have a question about sticky ends in Biopython. Specifically, is
> > there any way to  maintain sticky end information? Having read the
> > restriction doc (
> http://biopython.org/DIST/docs/cookbook/Restriction.html
> > ),
> > I suspect that the answer is no. It seems that the cut sites are only
> > maintained for the top strand. So I am planning on adding this data in my
> > program (although I will need to read up on classes).
> >
> > However, this requires that I can get the cut site information. The only
> > way that I can find to extract this information is from the
> > Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I
> can
> > use this information to determine the cut sites, but I expect that there
> is
> > a more direct way, since the elucidate() function must be generating this
> > from some attribute.
> >
> > FYI, I am curious about this because I want to simulate GoldenGate
> cloning
> > in Biopython.
> >
> > Thanks,
> > Mark Budde
> >
> >
> > ------------------------------
> >
>
> Hi Mark,
>
> Check out Python-dna that have classes for dealing with
> double stranded DNA. This package depends on Biopython and a couple of
> additional modules.
>
> Disclaimer: I am the developer of Python-dna
>
> Python-dna at pypi https://pypi.python.org/pypi/python-dna/
> Source code        https://code.google.com/p/pydna/
> Documentation      http://python-dna.readthedocs.org/
> Discussion group
> https://groups.google.com/forum/?fromgroups#!forum/python-dna
>
> / Bjorn Johansson
>
>
>
> --
> ______O_________oO________oO______o_______oO__
> Björn Johansson
> Assistant Professor
> Departament of Biology
> University of Minho
> Campus de Gualtar
> 4710-057 Braga
> PORTUGAL
> www.bio.uminho.pt
> Google profile <https://profiles.google.com/bjornjobb>
> Google Scholar Profile<
> http://scholar.google.com/citations?user=7AiEuJ4AAAAJ>
> my group <https://sites.google.com/site/metabolicengineeringgroup/>
> Office (direct) +351-253 601517 | (PT) mob.  +351-967 147 704 | (SWE) mob.
>  +46 739 792 968
> Dept of Biology (secr) +351-253 60 4310  | fax +351-253 678980
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From p.j.a.cock at googlemail.com  Sun Apr  7 19:52:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 7 Apr 2013 20:52:13 +0100
Subject: [Biopython] Restriction enzymes and sticky ends
In-Reply-To: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>
References: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>
Message-ID: <CAKVJ-_7ZPPRwfjKe0FPyx3bHsx8iUCGmg1LXTR+PRSAMfX6+Ww@mail.gmail.com>

On Sun, Apr 7, 2013 at 1:36 AM, Mark Budde <markbudde at gmail.com> wrote:
> Hi - I have a question about sticky ends in Biopython. Specifically, is
> there any way to  maintain sticky end information? Having read the
> restriction doc (http://biopython.org/DIST/docs/cookbook/Restriction.html),
> I suspect that the answer is no. It seems that the cut sites are only
> maintained for the top strand. So I am planning on adding this data in my
> program (although I will need to read up on classes).
>
> However, this requires that I can get the cut site information. The only
> way that I can find to extract this information is from the
> Restriction.Enzyme.elucidate(), which gives the cut site as NN^NN_NN. I can
> use this information to determine the cut sites, but I expect that there is
> a more direct way, since the elucidate() function must be generating this
> from some attribute.
>
> FYI, I am curious about this because I want to simulate GoldenGate cloning
> in Biopython.
>
> Thanks,
> Mark Budde

Hi Mark,

Good question. Sadly help(EcoRI) doesn't tell you very much,
does it? The whole Restriction module could benefit from a
new maintainer and/or a rewrite (for one thing, it unfortunately
did not follow Python counting in some aspects).

Two tips: first dir(object) gives a list of the attributes and methods
of an object in Python. Second, you can look at the source of the
elucidate method to see where it gets the information you're
looking for ;)  [A last resort perhaps - but when documentation
has let you down, worth knowing how to explore.]

https://github.com/biopython/biopython/blob/master/Bio/Restriction/Restriction.py

Here EcoRI is a 5' overhanging digest enzyme, and the values
you need are EcoRI.fst5 (here 1) and EcoRI.fst3 (here -1),
which are relative to the recognition site (here GAATTC).

Overhang type methods include:

>>> from Bio.Restriction import EcoRI
>>> EcoRI.overhang()
"5' overhang"
>>> EcoRI.is_blunt()
False
>>> EcoRI.is_5overhang()
True
>>> EcoRI.is_3overhang()
False

>>> EcoRI.elucidate()
'G^AATT_C'
>>> EcoRI.fst5
1
>>> EcoRI.fst3
-1
>>> EcoRI.site
'GAATTC'

Notice 'GAATTC'[:1] == 'G', 'GAATTC'[1:-1] == 'AATT' and
'GAATTC'[-1:] == 'C', which together give the elucidated string.
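
A minimal sketch of that slicing, using plain strings and the values
from the session above rather than importing Biopython. The helper
name elucidate_within_site is made up for illustration, and it only
covers the case where both cuts fall inside the recognition site:

```python
# Values copied from the EcoRI session above.
site = "GAATTC"
fst5 = 1    # top-strand cut, counted from the start of the site
fst3 = -1   # bottom-strand cut, counted back from the end of the site


def elucidate_within_site(site, fst5, fst3):
    """Rebuild the 'G^AATT_C' style string by slicing the site.

    Hypothetical helper: assumes a 5' overhang with both cuts
    inside the recognition site (fst5 > 0 and fst3 < 0).
    """
    return site[:fst5] + "^" + site[fst5:fst3] + "_" + site[fst3:]


print(elucidate_within_site(site, fst5, fst3))
```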

Is that all you needed?

Regards

Peter


From p.j.a.cock at googlemail.com  Mon Apr  8 09:32:00 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 10:32:00 +0100
Subject: [Biopython] Restriction enzymes and sticky ends
In-Reply-To: <CAEwaGEuuFkKVsMbBTRQ8zixCb9Zijz_2E2hMeYAa6akvw4EZaA@mail.gmail.com>
References: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>
	<CAKVJ-_7ZPPRwfjKe0FPyx3bHsx8iUCGmg1LXTR+PRSAMfX6+Ww@mail.gmail.com>
	<CAEwaGEuuFkKVsMbBTRQ8zixCb9Zijz_2E2hMeYAa6akvw4EZaA@mail.gmail.com>
Message-ID: <CAKVJ-_7bUzoUwesBy8BtehhiqRq5zQu77-jJEiT717oBR1F0pw@mail.gmail.com>

On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde <markbudde at gmail.com> wrote:
> Thanks for doing some digging on my behalf, Peter. After I posted my email
> last night, I started looking through the Bio.Restriction code myself. Your
> response is helpful; I was having trouble seeing how the cut site was
> encoded for each strand. I think Bjorn's python-dna might be a better
> starting place for me than Bio.Restriction, as it already has some of the
> functionality I was looking for.

Fair enough.

> However, to your question, I'm still not quite getting the cut sites. Your
> example with EcoRI makes complete sense, but I can't figure out the pattern
> for some other enzymes, such as BsaI, which is why I got confused initially.
> If you repeat that protocol for BsaI, the results don't match up.
>
> In [80]: BsaI.elucidate()
> Out[80]: 'GGTCTCN^NNNN_N'
>
> In [81]: BsaI.fst5
> Out[81]: 7
>
> In [82]: BsaI.fst3
> Out[82]: 5
>
> In [83]: BsaI.site
> Out[83]: 'GGTCTC'
>
> Based on this, I would expect that BsaI.fst3 should yield
> "11" but it yields 5.

I think you are counting from the wrong reference point.
Using Python style indexing would only allow cleavage
points within the recognition site to be described.

BsaI is a weird enzyme, and appears to be handled by the
Ambiguous class in Bio/Restriction/Restriction.py - which
says this is for enzymes for which the overhang is variable.

>>> from Bio.Restriction import BsaI
>>> BsaI.is_ambiguous()
True
>>> BsaI.is_defined() # is there a consistent site?
False
>>> BsaI.is_unknown()
False
>>> BsaI.fst5
7
>>> BsaI.fst3
5
>>> BsaI.elucidate()
'GGTCTCN^NNNN_N'

This subclass has a more complicated elucidate method,
but gives the same string as the REBASE website, so this
is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html

The 5' cut site of 7 clearly means this is downstream of
the 6 bp recognition site. This appears to be counted
from the start (left) of the restriction site.

From the illustration the 3' cut site is also to the right of
the 6 bp recognition site. It appears the number is counted
from the end (right) of the recognition site, where positive
as in BsaI means to the right (after the recognition site)
while negative as in EcoRI means to the left (within the
recognition site).
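
To make that counting convention concrete, here is a rough sketch in
plain Python. It encodes only what was inferred above (fst5 counted
from the start of the site, fst3 from its end) and is checked against
just the two enzymes in this thread; the function names are made up,
and this is not how Bio.Restriction itself is written:

```python
def cut_positions(site, fst5, fst3):
    """Return (top_cut, bottom_cut) as offsets from the site start.

    Assumes fst5 is counted from the start of the recognition site
    and fst3 from its end, as inferred in this thread.
    """
    return fst5, len(site) + fst3


def elucidate_sketch(site, fst5, fst3):
    """Draw the cuts, padding with N when they fall outside the site."""
    top, bottom = cut_positions(site, fst5, fst3)
    if not 0 < top < bottom:
        raise ValueError("sketch only handles 5' overhangs")
    # Pad so there is at least one base after the bottom-strand cut.
    padded = site + "N" * max(0, bottom + 1 - len(site))
    return padded[:top] + "^" + padded[top:bottom] + "_" + padded[bottom:]


print(cut_positions("GGTCTC", 7, 5))      # BsaI values from above
print(elucidate_sketch("GGTCTC", 7, 5))
print(elucidate_sketch("GAATTC", 1, -1))  # EcoRI values from above
```

With these assumptions the bottom-strand cut for BsaI lands at
offset 11, which is the number Mark was expecting to see.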

Peter

P.S. Please remember to CC the mailing list, e.g. reply all.
Unless people say explicitly that they have done this deliberately,
I generally assume taking a public discussion off list is accidental.


From nicolas.joannin at gmail.com  Mon Apr  8 13:21:45 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Mon, 8 Apr 2013 22:21:45 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
Message-ID: <CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>

Hi Peter,

I need to update another machine, so I'll do that with the final version to
see if the problem still exists. Will post back when that's done.
Regarding the Entrez test, indeed, it doesn't fail every time. So no
worries there.

Cheers,
Nicolas


Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


On Sun, Apr 7, 2013 at 11:41 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sun, Apr 7, 2013 at 4:12 AM, Nicolas Joannin
> <nicolas.joannin at gmail.com> wrote:
> > Hi Peter,
> >
> > Thanks for the quick reply!
> > Indeed, I don't think it is a big issue for me, and I have also not had
> any
> > problems with Python 3.3.0 on another machine.
> > So, yes, it probably is linked to the Python 3.3.1rc1...
>
> I see that Python 3.3.1 final is out now - might be worth checking
> that too, and I'll try to update one of our buildslaves to use this.
>
> > However, I should point out that it is not only the Bio.bgzf that fails
> > testing.
> > There are also test_Entrez_online and test_SeqIO_index that are
> indicated as
> > "FAIL" (both of which I do not directly use).
>
> The test_SeqIO_index.py failures all looked to be BGZF related too.
>
> I missed the Entrez test, but as an online test that can sometimes
> fail intermittently anyway. The chances are on rerunning it'll be fine.
>
> Peter
>


From p.j.a.cock at googlemail.com  Mon Apr  8 14:05:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 15:05:49 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
Message-ID: <CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>

On Mon, Apr 8, 2013 at 2:21 PM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:
> Hi Peter,
>
> I need to update another machine, so I'll do that with the final version to
> see if the problem still exists. Will post back when that's done.
> Regarding the Entrez test, indeed, it doesn't fail every time. So no worries
> there.
>
> Cheers,
> Nicolas

I've just installed Python 3.3.1 (final) from source on a 64 bit Linux
machine, and can confirm test failures from the BGZF code (not
failing under Python 3.3.0). I was hoping this would be a glitch in
the release candidate but sadly not.

Thank you again for bringing this to our attention.

Peter


From nicolas.joannin at gmail.com  Mon Apr  8 14:10:07 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Mon, 8 Apr 2013 23:10:07 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
Message-ID: <CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>

OK, I guess that'll be the same whichever platform...
I guess I'll stick with 3.3.0 for the other machine then.
Thanks for the update!

Nicolas


Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


On Mon, Apr 8, 2013 at 11:05 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Mon, Apr 8, 2013 at 2:21 PM, Nicolas Joannin
> <nicolas.joannin at gmail.com> wrote:
> > Hi Peter,
> >
> > I need to update another machine, so I'll do that with the final version
> to
> > see if the problem still exists. Will post back when that's done.
> > Regarding the Entrez test, indeed, it doesn't fail every time. So no
> worries
> > there.
> >
> > Cheers,
> > Nicolas
>
> I've just installed Python 3.3.1 (final) from source on a 64 bit Linux
> machine, and can confirm test failures from the BGZF code (not
> failing under Python 3.3.0). I was hoping this would be a glitch in
> the release candidate but sadly not.
>
> Thank you again for bringing this to our attention.
>
> Peter
>


From p.j.a.cock at googlemail.com  Mon Apr  8 15:23:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 16:23:25 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
	<CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
Message-ID: <CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>

On Mon, Apr 8, 2013 at 3:10 PM, Nicolas Joannin
<nicolas.joannin at gmail.com> wrote:
> OK, I guess that'll be the same whichever platform...
> I guess I'll stick with 3.3.0 for the other machine then.
> Thanks for the update!
>
> Nicolas

More bad news - whatever was changed, I think something
similar was done in Python 2.7.4 as well, which also has
new test failures not seen under Python 2.7.3. Sigh.

Peter


From markbudde at gmail.com  Mon Apr  8 17:25:24 2013
From: markbudde at gmail.com (Mark Budde)
Date: Mon, 8 Apr 2013 10:25:24 -0700
Subject: [Biopython] Restriction enzymes and sticky ends
In-Reply-To: <CAKVJ-_7bUzoUwesBy8BtehhiqRq5zQu77-jJEiT717oBR1F0pw@mail.gmail.com>
References: <CAEwaGEv5pq+N2EfghiQUTjBShkt2mZXLN85kZrTcg_dJoFB86w@mail.gmail.com>
	<CAKVJ-_7ZPPRwfjKe0FPyx3bHsx8iUCGmg1LXTR+PRSAMfX6+Ww@mail.gmail.com>
	<CAEwaGEuuFkKVsMbBTRQ8zixCb9Zijz_2E2hMeYAa6akvw4EZaA@mail.gmail.com>
	<CAKVJ-_7bUzoUwesBy8BtehhiqRq5zQu77-jJEiT717oBR1F0pw@mail.gmail.com>
Message-ID: <CAEwaGEuhSqaZ757DV=LvD00fE-HcRVxpr8mfyvMJ7T5ivhdKXQ@mail.gmail.com>

Thanks Peter, that explains it. BsaI is indeed a weird enzyme, a TypeIIs
restriction enzyme. These enzymes cut a defined distance outside of their
recognition sequence. The utility of these enzymes is that by tagging the
cut sites on the end of your primers, you can generate whatever sticky ends
you desire. Furthermore, because it cuts outside of its recognition
sequence, you can incubate a number of these fragments together with both
restriction enzyme and ligase, and the fragments will assemble into the
final product without subcloning. This is because sticky ends are generated
without the corresponding recognition site, so their ligation is
irreversible. This is called GoldenGate cloning.
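
As a rough illustration of the cut geometry only (not a GoldenGate
simulator), the sketch below digests a plain string at the first
forward-strand BsaI site, using the fst5/fst3 values quoted below. The
function name and the test sequence are made up; reverse-strand sites,
multiple sites and ligation are all ignored:

```python
def digest_type_iis(seq, site="GGTCTC", fst5=7, fst3=5):
    """Cut seq at the first forward-strand occurrence of site.

    Returns (left, overhang, right) in top-strand coordinates, or
    None if the site is absent or the cuts fall past the end of seq.
    Hypothetical helper; offsets follow the convention in this thread.
    """
    start = seq.find(site)
    if start == -1:
        return None
    top_cut = start + fst5                 # counted from the site start
    bottom_cut = start + len(site) + fst3  # counted from the site end
    if bottom_cut > len(seq):
        return None
    # left keeps the recognition site; overhang is the single-stranded
    # stretch between the two cuts; right is the remaining duplex.
    return seq[:top_cut], seq[top_cut:bottom_cut], seq[bottom_cut:]


left, overhang, right = digest_type_iis("AAAGGTCTCATGCCTTT")
print(left, overhang, right)
```

Here the 4 nt overhang is whatever sequence follows the site plus one
base, which is why it can be chosen freely when designing the primers.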
-Mark


On Mon, Apr 8, 2013 at 2:32 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde <markbudde at gmail.com> wrote:
> > Thanks for doing some digging on my behalf, Peter. After I posted my
> email
> > last night, I started looking through the Bio.Restriction code myself.
> You
> > response is helpful, I was having trouble seeing how the cut site was
> > encoded for each strand. I think Bjorn's python-dna might be a better
> > starting place for me than Bio.Restriction, as it already has some of the
> > functionality I was looking for.
>
> Fair enough.
>
> > However, to you question, I'm still not quite getting the cut sites. You
> > example with EcoRI makes complete sense, but I can't figure out the
> pattern
> > for some other enzymes, such as BsaI, which is why I got confused
> initially.
> > If you repeat that protocol for BsaI, the results don't match up.
> >
> > In [80]: BsaI.elucidate()
> > Out[80]: 'GGTCTCN^NNNN_N'
> >
> > In [81]: BsaI.fst5
> > Out[81]: 7
> >
> > In [82]: BsaI.fst3
> > Out[82]: 5
> >
> > In [83]: BsaI.site
> > Out[83]: 'GGTCTC'
> >
> > Based on this, I would expect that BsaI.fst3 should yield
> > "11" but it yields 5.
>
> I think you are counting from the wrong reference point.
> Using Python style indexing would only allow cleavage
> points within the recognition site to be described.
>
> BsaI is a weird enzyme, and appears to be handled by the
> Ambiguous class in Bio/Restriction/Restriction.py - which
> says this is for enzymes for which the overhang is variable.
>
> >>> from Bio.Restriction import BsaI
> >>> BsaI.is_ambiguous()
> True
> >>> BsaI.is_defined() # is there a consistent site?
> False
> >>> BsaI.is_unknown()
> False
> >>> BsaI.fst5
> 7
> >>> BsaI.fst3
> 5
> >>> BsaI.elucidate()
> 'GGTCTCN^NNNN_N'
>
> This subclass has a more complicated elucidate method,
> but gives the same string as the REBASE website, so this
> is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html
>
> The 5' cut site of 7 clearly means this is downstream of
> the 6 bp recognition site. This appears to be counted
> from the start (left) of the restriction site.
>
> From the illustration the 3' cut site is also to the right of
> the 6 bp recognition site. It appears the number is counted
> from the end (right) of the recognition site, where positive
> as in BsaI means to the right (after the recognition site)
> while negative as in EcoRI means to the left (within the
> recognition site).
>
> Peter
>
> P.S. Please remember to CC the mailing list, e.g. reply all.
> Unless people say explicitly that they have done this deliberately,
> I generally assume taking a public discussion off list is accidental.
>
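The counting convention described above can be checked with a small sketch. This is plain Python, not Biopython's implementation: `cut_offsets` and `mark_cuts` are hypothetical helper names, and the rule they encode (fst5 counted from the start of the recognition site, fst3 counted from its end) is only the interpretation given in the quoted reply. The EcoRI values used in the usage note (fst5 = 1, fst3 = -1) are an assumption about Biopython's REBASE-derived data. Cuts upstream of the site (negative positions) are not handled.

```python
def cut_offsets(site, fst5, fst3):
    """Return (top, bottom) cut positions, both counted in bases
    from the start of the recognition site."""
    # fst5 is already relative to the start of the site;
    # fst3 is relative to the end of the site, so shift it by the site length.
    return fst5, len(site) + fst3


def mark_cuts(site, fst5, fst3):
    """Mark the 5' cut with ^ and the 3' cut with _, padding with N as needed."""
    top, bottom = cut_offsets(site, fst5, fst3)
    # Pad the site with N so that both cut positions fall inside the string.
    bases = site + "N" * max(0, max(top, bottom) - len(site))
    marks = {top: "^"}
    marks[bottom] = marks.get(bottom, "") + "_"  # blunt cutters give "^_"
    out = []
    for pos in range(len(bases) + 1):
        out.append(marks.get(pos, ""))
        if pos < len(bases):
            out.append(bases[pos])
    return "".join(out)


# BsaI values as reported in this thread: site GGTCTC, fst5 = 7, fst3 = 5
print(cut_offsets("GGTCTC", 7, 5))
print(mark_cuts("GGTCTC", 7, 5))
```

Under this reading, the bottom-strand cut for BsaI lands 11 bases from the start of the site, which is the "11" expected in the question. With the assumed EcoRI values, `mark_cuts("GAATTC", 1, -1)` places the marks as in G^AATT_C. The sketch does not append the extra trailing N that `BsaI.elucidate()` shows.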


From p.j.a.cock at googlemail.com  Mon Apr  8 17:55:47 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Apr 2013 18:55:47 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
	<CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
	<CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>
Message-ID: <CAKVJ-_4rAWanDXhU14gZsfpAEZvJa1ABEoTCnEidWAp_P9AZfg@mail.gmail.com>

On Mon, Apr 8, 2013 at 4:23 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Apr 8, 2013 at 3:10 PM, Nicolas Joannin
> <nicolas.joannin at gmail.com> wrote:
>> OK, I guess that'll be the same whichever platform...
>> I guess I'll stick with 3.3.0 for the other machine then.
>> Thanks for the update!
>>
>> Nicolas
>
> More bad news - whatever was changed, I think something
> similar was done in Python 2.7.4 as well, which also has
> new test failures not seen under Python 2.7.3. Sigh.
>
> Peter

Solved - this is a bug in Python 2.7.4 and 3.3.1 (which had a
lot of gzip work done fixing other issues), but on the bright
side the fix is quite trivial to apply manually:
http://bugs.python.org/issue17666

Peter


From p.j.a.cock at googlemail.com  Tue Apr  9 09:39:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Apr 2013 10:39:12 +0100
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_4rAWanDXhU14gZsfpAEZvJa1ABEoTCnEidWAp_P9AZfg@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
	<CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
	<CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>
	<CAKVJ-_4rAWanDXhU14gZsfpAEZvJa1ABEoTCnEidWAp_P9AZfg@mail.gmail.com>
Message-ID: <CAKVJ-_69vVwn-UMYm4OQ4dM5yTJ-R7JhdDnVBZRxEaUOHpzdRg@mail.gmail.com>

On Mon, Apr 8, 2013 at 6:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Solved - this is a bug in Python 2.7.4 and 3.3.1 (which had a
> lot of gzip work done fixing other issues), but on the bright
> side the fix is quite trivial to apply manually:
> http://bugs.python.org/issue17666
>
> Peter

Just a heads up, this affects Python 3.2.4 as well.

Peter


From p.j.a.cock at googlemail.com  Tue Apr  9 10:20:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Apr 2013 11:20:43 +0100
Subject: [Biopython] OBF not accepted for GSoC 2013
Message-ID: <CAKVJ-_6-+nyJk+7tp5YOyy7i7N7GtkRswZ_2Vs0uTuqGEV4wWQ@mail.gmail.com>

Dear all,

Unfortunately this year we have not been accepted on the Google
Summer of Code scheme.

I'm sure the rest of the OBF board and the other Bio* developers
will join me in thanking Pjotr Prins for his efforts as the OBF
GSoC administrator co-ordinating our application this year, as
well as last year's administrator Robert Buels and the other mentors
for their efforts.

For those of you not subscribed to the OBF's GSoC mailing list,
I am forwarding Pjotr's email from last night (also below):
http://lists.open-bio.org/pipermail/gsoc/2013/000211.html

In all 177 organisations were accepted (about the same as the
last few years), and they will be listed here (once they have filled
out their profile information):
https://google-melange.appspot.com/gsoc/accepted_orgs/google/gsoc2013

To potential students this summer, the good news is that some
related organisations have been accepted, such as NESCent,
the National Resource for Network Biology (NRNB - known for
Cytoscape), SciRuby (Ruby Science Foundation), so there is
still some scope for doing a bioinformatics related project in
GSoC 2013, perhaps even with a Bio* developer as a co-mentor.

Thank you all,

Peter
(Biopython developer, OBF board member)

---------- Forwarded message ----------
From: Pjotr Prins <pjotr2010 at thebird.nl>
Date: Mon, Apr 8, 2013 at 9:13 PM
Subject: Re: GSoC 2013 is ON
To: Pjotr Prins <pjotr2010 at thebird.nl>
Cc: ..., OBF GSoC <gsoc at lists.open-bio.org>


Sadly, our application got rejected by GSoC this year. I am not sure
what the reason was, but I am convinced our application was similar to
that of other years. Maybe the project ideas could have been better
presented. I am not sure at this stage. I'll make a list of successful
projects to see if we can digest some truths.

The upside is that FOSS is going strong! And that the field is getting
increasingly competitive. As an open source geezer I can only be
happy, even if it hurts our own application.

Sorry everyone, and many thanks for the trouble you took getting
projects written up. Let's not feel discouraged for next year.

Pj.


From nicolas.joannin at gmail.com  Tue Apr  9 13:47:03 2013
From: nicolas.joannin at gmail.com (Nicolas Joannin)
Date: Tue, 9 Apr 2013 22:47:03 +0900
Subject: [Biopython] Problem installing biopython with Python 3.3.1.rc1
In-Reply-To: <CAKVJ-_69vVwn-UMYm4OQ4dM5yTJ-R7JhdDnVBZRxEaUOHpzdRg@mail.gmail.com>
References: <CAPJVvAwLXCODx36eoKvV7QZYUyvA72zqVLU--6CtDjZ-Cbiqtw@mail.gmail.com>
	<CAKVJ-_76i53AZw20B3mdF70xdjxxkBs_O4zZUE=T3=00fD9V5Q@mail.gmail.com>
	<CAPJVvAyTQy37o3VsvFpFw9vLz1t9OfOgKgxX+gzay6zDiRWx3w@mail.gmail.com>
	<CAKVJ-_6ARgQj4nv=mB9C4L-cN-1cjA0LcgkC2sON=cRnqAyrwg@mail.gmail.com>
	<CAPJVvAwgWKnqZMFWH8+ECZ0_39DAQydi79V-8rzZ-z3zpu7uGQ@mail.gmail.com>
	<CAKVJ-_6egwJi82V6SmLX7es1j4hrZb4xEU7zpwWUQ6pkYqA+=w@mail.gmail.com>
	<CAPJVvAyNwc3dZ869PPyC4TKPzh8RmnvUY=puLH2qThddeD1tWw@mail.gmail.com>
	<CAKVJ-_4GZtnXo2u+M4EA6A57hppYXeq5RGyjJNj-Vw3sXd2e9g@mail.gmail.com>
	<CAKVJ-_4rAWanDXhU14gZsfpAEZvJa1ABEoTCnEidWAp_P9AZfg@mail.gmail.com>
	<CAKVJ-_69vVwn-UMYm4OQ4dM5yTJ-R7JhdDnVBZRxEaUOHpzdRg@mail.gmail.com>
Message-ID: <CAPJVvAxzSMWLvnmnZv76A6VY4c3wk7naCnWR8CdZJO50DrC09Q@mail.gmail.com>

Thanks for the fix!
Cheers,
Nicolas


Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


On Tue, Apr 9, 2013 at 6:39 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Mon, Apr 8, 2013 at 6:55 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >
> > Solved - this is a bug in Python 2.7.4 and 3.3.1 (which had a
> > lot of gzip work done fixing other issues), but on the bright
> > side the fix is quite trivial to apply manually:
> > http://bugs.python.org/issue17666
> >
> > Peter
>
> Just a heads up, this affects Python 3.2.4 as well.
>
> Peter
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From matthiasschade.de at googlemail.com  Thu Apr 11 09:20:31 2013
From: matthiasschade.de at googlemail.com (Matthias Schade)
Date: Thu, 11 Apr 2013 11:20:31 +0200
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
Message-ID: <5166805F.8060603@googlemail.com>

Hello everyone,

is there an upper limit to how many sequences I can query via 
NCBIWWW.qblast at once?

When I send up to 150 sequences, each a 24-mer, in a single string,
everything works fine. But now I have tried the same with a string
containing about 900 sequences. At good times, it takes the NCBI server
about 5 min to send an answer. I save the answer and later open and parse
the file with other functions in my code. However, even though I have
queried the same 900 sequences each time, the resulting output file varies
in length (10 MB < x < 20 MB) and always at least lacks the closing tag
"</BlastOutput>", or lacks even more (this does not happen when querying
150 sequences or fewer).

My guess is that once the server has started sending its answer, there
is only a limited time that NCBIWWW.qblast waits for follow-up packets,
so depending on the current server load, the NCBIWWW.qblast function
simply stops waiting for incoming data after some time, which makes my
BLAST output files vary in length. Could anyone correct or verify this
far-fetched hypothesis?

My core-lines are:

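from Bio.Blast import NCBIWWW

orgn = 'Mus musculus'  # or anything else
# fasta_seq_string holds all query sequences as one FASTA-formatted string
result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100,
                        entrez_query=orgn + "[orgn]")
with open('myblast_result.xml', "w") as save_file:
    save_file.write(result.read())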

Best regards,
Matthias


From p.j.a.cock at googlemail.com  Thu Apr 11 09:43:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Apr 2013 10:43:44 +0100
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
In-Reply-To: <5166805F.8060603@googlemail.com>
References: <5166805F.8060603@googlemail.com>
Message-ID: <CAKVJ-_6y_q8e=EV5+1vCCeRY5c8z-brOsyHWW960dG0bX=ZYEg@mail.gmail.com>

On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade
<matthiasschade.de at googlemail.com> wrote:
> Hello everyone,
>
> is there an upper limit to how many sequences I can query via NCBIWWW.qblast
> at once?

There are sometimes limits on the URL length, especially if going via
firewalls and proxies, so that may be one factor.

At the NCBI end, I'm not sure what limits they impose on this:
http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

> When I send up to 150 sequences, each a 24-mer, in a single string,
> everything works fine. But now I have tried the same with a string
> containing about 900 sequences. At good times, it takes the NCBI server
> about 5 min to send an answer. I save the answer and later open and parse
> the file with other functions in my code. However, even though I have
> queried the same 900 sequences each time, the resulting output file varies
> in length (10 MB < x < 20 MB) and always at least lacks the closing tag
> "</BlastOutput>", or lacks even more (this does not happen when querying
> 150 sequences or fewer).
>
> My guess is that once the server has started sending its answer, there
> is only a limited time that NCBIWWW.qblast waits for follow-up packets,
> so depending on the current server load, the NCBIWWW.qblast function
> simply stops waiting for incoming data after some time, which makes my
> BLAST output files vary in length. Could anyone correct or verify this
> far-fetched hypothesis?
>
> My core-lines are:
>
> orgn='Mus Musculus' # or anything else
> result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100,
> entrez_query=str(orgn+"[orgn]"))
> save_file = open ('myblast_result.xml',"w")
> save_file.write(result.read())
>
> Best regards,
> Matthias

I think you've reached the scale where it would be better to run blastn
locally - ideally on a cluster if you have access to one. You can
download the whole NT database from here - most departments
running BLAST with their own Linux servers will have a central copy
which is kept automatically up to date:
ftp://ftp.ncbi.nlm.nih.gov/blast/db/

If you don't have those kinds of resources, then you can even
run BLAST on your own Windows machine - although I'm not
sure how much RAM would be recommended for the NT
database which is pretty big.
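
If a local installation is not practical, one possible workaround follows from
the report above that requests of about 150 sequences complete reliably: split
the query into batches and check each saved result for truncation. The sketch
below is an illustration only. `split_fasta` and `looks_complete` are
hypothetical helper names, and the batch size of 150 is taken from the report
in this thread, not from any documented NCBI limit.

```python
def split_fasta(fasta_text, batch_size=150):
    """Split a multi-record FASTA string into strings holding at most
    batch_size records each."""
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])      # a header line starts a new record
        elif line.strip() and records:
            records[-1].append(line)    # sequence line of the current record
    batches = []
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        batches.append("\n".join("\n".join(rec) for rec in batch) + "\n")
    return batches


def looks_complete(xml_text):
    """Return True if BLAST XML output ends with its closing root tag."""
    return xml_text.rstrip().endswith("</BlastOutput>")


# Possible use with qblast (not run here, needs network access):
# from Bio.Blast import NCBIWWW
# for n, batch in enumerate(split_fasta(fasta_seq_string)):
#     xml = NCBIWWW.qblast("blastn", "nt", batch, expect=100,
#                          entrez_query="Mus musculus[orgn]").read()
#     if not looks_complete(xml):
#         print("Batch %d looks truncated, retry it" % n)
```

Each batch then produces its own XML file, so a truncated download only costs
one batch rather than the whole run.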

Regards,

Peter


From ericmajinglong at gmail.com  Thu Apr 11 16:49:27 2013
From: ericmajinglong at gmail.com (Eric Ma)
Date: Thu, 11 Apr 2013 12:49:27 -0400
Subject: [Biopython] Request from help
Message-ID: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>

Hello everybody,

I'm new to the mailing list here, though I've been playing with BioPython
for quite a while.

I'm having some trouble here. I wanted to display a tree of sequences for
which I had done a multiple sequence alignment. I tried going through the
pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
Because I'm still in the testing phase, instead of writing it as a single
script, I wrote it as a series of scripts that I would execute in order.

The problem I run into is at step 4 in the example, where I "feed the
alignment to PhyML". My data set is 70 protein sequences, and the trouble I
run into is that it takes a very, very long time at the "feeding alignment
to PhyML" step. I tried running the script on my MacBook Pro overnight, and
even the next morning it was not done. Am I missing something here?

Just to be clear here, aligning the sequences using Muscle was successful,
and I also managed to output a distance matrix from sample to sample, which
I used in another downstream pipeline to display the clustering of the
sequences on a 2D euclidean plane. However, I wanted to have a tree
representation to validate the clustering results; the trouble is, I can't
get the _phyml_tree.txt file to be created, which I would then use to draw
the tree.

Thanks in advance for any help!

Cheers,
Eric
-----------------------------------------------------------------------
Please consider the environment before printing this e-mail. Do you really
need to print it?

http://about.me/ericmjl


From jgibbons1 at mail.usf.edu  Thu Apr 11 17:01:19 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Thu, 11 Apr 2013 13:01:19 -0400
Subject: [Biopython] Request from help
In-Reply-To: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
Message-ID: <CALaGxMixcphikkuHvyr5B8QhOUXX-jUCsbiB3nvGuOLDsyxYMQ@mail.gmail.com>

NCBI Standalone Blast gives you the option of querying the website so that
you don't have to maintain a local database.

Justin Gibbons


On Thu, Apr 11, 2013 at 12:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:

> Hello everybody,
>
> I'm new to the mailing list here, though I've been playing with BioPython
> for quite a while.
>
> I'm having some trouble here. I wanted to display a tree of sequences for
> which I had done a multiple sequence alignment. I tried going through the
> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
> Because I'm still in the testing phase, instead of writing it as a single
> script, I wrote it as a series of scripts that I would execute in order.
>
> The problem I run into is at step 4 in the example, where I "feed the
> alignment to PhyML". My data set is 70 protein sequences, and the trouble I
> run into is that it takes a very, very long time at the "feeding alignment
> to PhyML" step. I tried running the script on my MacBook Pro overnight, and
> even the next morning it was not done. Am I missing something here?
>
> Just to be clear here, aligning the sequences using Muscle was successful,
> and I also managed to output a distance matrix from sample to sample, which
> I used in another downstream pipeline to display the clustering of the
> sequences on a 2D euclidean plane. However, I wanted to have a tree
> representation to validate the clustering results; the trouble is, I can't
> get the _phyml_tree.txt file to be created, which I would then use to draw
> the tree.
>
> Thanks in advance for any help!
>
> Cheers,
> Eric
> -----------------------------------------------------------------------
> Please consider the environment before printing this e-mail. Do you really
> need to print it?
>
> http://about.me/ericmjl
>


From p.j.a.cock at googlemail.com  Thu Apr 11 17:07:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Apr 2013 18:07:05 +0100
Subject: [Biopython] Request from help
In-Reply-To: <CALaGxMixcphikkuHvyr5B8QhOUXX-jUCsbiB3nvGuOLDsyxYMQ@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
	<CALaGxMixcphikkuHvyr5B8QhOUXX-jUCsbiB3nvGuOLDsyxYMQ@mail.gmail.com>
Message-ID: <CAKVJ-_43iHLgbDB9mJLiHzqm-JLbt8xR0yMbLZFgR94cHUnC2w@mail.gmail.com>

On Thu, Apr 11, 2013 at 6:01 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> NCBI Standalone Blast gives you the option of querying the website so that
> you don't have to maintain a local database.
>
> Justin Gibbons

Did you reply to the wrong email? This thread was about alignments and trees.

Peter


From p.j.a.cock at googlemail.com  Thu Apr 11 17:11:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Apr 2013 18:11:49 +0100
Subject: [Biopython] Request from help
In-Reply-To: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
Message-ID: <CAKVJ-_6xVNRreg1yZt_XrhF6OBW87gx_8OjLiXgT7BpTLOK9Og@mail.gmail.com>

On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:
> Hello everybody,
>
> I'm new to the mailing list here, though I've been playing with BioPython
> for quite a while.
>
> I'm having some trouble here. I wanted to display a tree of sequences for
> which I had done a multiple sequence alignment. I tried going through the
> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
> Because I'm still in the testing phase, instead of writing it as a single
> script, I wrote it as a series of scripts that I would execute in order.
>
> The problem I run into is at step 4 in the example, where I "feed the
> alignment to PhyML". My data set is 70 protein sequences, and the trouble I
> run into is that it takes a very, very long time at the "feeding alignment
> to PhyML" step. I tried running the script on my MacBook Pro overnight, and
> even the next morning it was not done. Am I missing something here?
>
> Just to be clear here, aligning the sequences using Muscle was successful,
> and I also managed to output a distance matrix from sample to sample, which
> I used in another downstream pipeline to display the clustering of the
> sequences on a 2D euclidean plane. However, I wanted to have a tree
> representation to validate the clustering results; the trouble is, I can't
> get the _phyml_tree.txt file to be created, which I would then use to draw
> the tree.
>
> Thanks in advance for any help!
>
> Cheers,
> Eric

Hi Eric,

So this part is getting stuck (or taking a very long time):

#Feed the alignment to PhyML using the command line wrapper:
from Bio.Phylo.Applications import PhymlCommandline
cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa',
model='WAG', alpha='e', bootstrap=100)
out_log, err_log = cmdline()

At that point is the computer active (high CPU load as measured
via the task manager / system monitor / top / etc)?

I would suggest trying PhyML at the command line by hand. First
check the command that Biopython should be running:

print cmdline

That may give you visual progress on screen. My guess is simply
that this is just slow - you are only running 100 bootstraps, but
perhaps each one is taking a while and that adds up.

You said you had 70 protein sequences - how many columns
are there in the alignment? That can also affect run times.

Peter


From nuin at genedrift.org  Thu Apr 11 17:05:57 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 11 Apr 2013 13:05:57 -0400
Subject: [Biopython] Request from help
In-Reply-To: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
Message-ID: <CEA2A651-7F21-405C-B4D6-DF098E7704EE@genedrift.org>


On 2013-04-11, at 12:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:

> Hello everybody,
> 
> I'm new to the mailing list here, though I've been playing with BioPython
> for quite a while.
> 
> I'm having some trouble here. I wanted to display a tree of sequences for
> which I had done a multiple sequence alignment. I tried going through the
> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
> Because I'm still in the testing phase, instead of writing it as a single
> script, I wrote it as a series of scripts that I would execute in order.
> 
> The problem I run into is at step 4 in the example, where I "feed the
> alignment to PhyML". My data set is 70 protein sequences, and the trouble I
> run into is that it takes a very, very long time at the "feeding alignment
> to PhyML" step. I tried running the script on my MacBook Pro overnight, and
> even the next morning it was not done. Am I missing something here?
> 

Hi

With 70 OTUs there are about 5.00E115 possible unrooted trees. It is guaranteed to take a long time, independent of the parameters you use in PhyML. Try a smaller number of taxa for testing purposes; depending on the complexity of your protein phylogeny, you may need to give your computer some weeks to generate a result.

This is not a BioPython issue; it is more a phylogenetics one.
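
As a rough check of that figure, the number of unrooted, strictly bifurcating tree topologies for n taxa is the double factorial (2n-5)!! = 1 x 3 x 5 x ... x (2n-5). The sketch below uses a hypothetical helper name, `count_unrooted_trees`, and assumes the quoted figure refers to unrooted bifurcating trees.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, strictly bifurcating topologies: (2n-5)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n <= 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 70):
    print(n, "%.2e" % count_unrooted_trees(n))
```

For 70 taxa the result is on the order of 5E115, which is why tree search programs rely on heuristics rather than enumerating every topology.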

Cheers
Paulo


> Just to be clear here, aligning the sequences using Muscle was successful,
> and I also managed to output a distance matrix from sample to sample, which
> I used in another downstream pipeline to display the clustering of the
> sequences on a 2D euclidean plane. However, I wanted to have a tree
> representation to validate the clustering results; the trouble is, I can't
> get the _phyml_tree.txt file to be created, which I would then use to draw
> the tree.
> 
> Thanks in advance for any help!
> 
> Cheers,
> Eric
> -----------------------------------------------------------------------
> Please consider the environment before printing this e-mail. Do you really
> need to print it?
> 
> http://about.me/ericmjl


From ericmajinglong at gmail.com  Thu Apr 11 17:20:14 2013
From: ericmajinglong at gmail.com (Eric Ma)
Date: Thu, 11 Apr 2013 13:20:14 -0400
Subject: [Biopython] Request from help
In-Reply-To: <CAKVJ-_6xVNRreg1yZt_XrhF6OBW87gx_8OjLiXgT7BpTLOK9Og@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
	<CAKVJ-_6xVNRreg1yZt_XrhF6OBW87gx_8OjLiXgT7BpTLOK9Og@mail.gmail.com>
Message-ID: <CAK-i=xgpN3OfRMkfB8CguEUNvKEwCQ6gnB_-obPoG5iMSOJ7+Q@mail.gmail.com>

Hi Peter and Paulo,

Thank you for your feedback, much appreciated! I still have very sparse
knowledge about phylogenies, and especially the run times needed to build
the trees, so any new knowledge is appreciated!

The sequences I'm using are full Influenza A HA protein sequences, so we're
talking about 1700-1750 amino acids being aligned together. The multiple
sequence alignment for 70 sequences doesn't take long - on the order of
minutes on my laptop. It's the "feeding into PhyML" portion that, for some
reason, takes a long time.

With that said, I do have a full distance matrix as one of the outputs from
a previous script in this script series, in addition to the multiple
sequence alignment. I have been able to feed the distance matrix into a
separate clustering algorithm from scikit-learn, and I was able to
successfully identify six clusters of sequences in there. Hence, I wanted
to use a phylogenetic tree to confirm what I'm seeing with the clustering
algorithm - it's basically two separate representations of the same data.

I have heard that it is possible to create a tree from the distance matrix,
and I was thinking this might be an alternative to feeding the alignment
into PhyML. Does anybody know how to do this using BioPython?

Cheers,
Eric
-----------------------------------------------------------------------
Please consider the environment before printing this e-mail. Do you really
need to print it?

http://about.me/ericmjl


On Thu, Apr 11, 2013 at 1:11 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:
> > Hello everybody,
> >
> > I'm new to the mailing list here, though I've been playing with BioPython
> > for quite a while.
> >
> > I'm having some trouble here. I wanted to display a tree of sequences for
> > which I had done a multiple sequence alignment. I tried going through the
> > pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline
> ).
> > Because I'm still in the testing phase, instead of writing it as a single
> > script, I wrote it as a series of scripts that I would execute in order.
> >
> > The problem I run into is at step 4 in the example, where I "feed the
> > alignment to PhyML". My data set is 70 protein sequences, and the
> trouble I
> > run into is that it takes a very, very long time at the "feeding
> alignment
> > to PhyML" step. I tried running the script on my MacBook Pro overnight,
> and
> > even the next morning it was not done. Am I missing something here?
> >
> > Just to be clear here, aligning the sequences using Muscle was
> successful,
> > and I also managed to output a distance matrix from sample to sample,
> which
> > I used in another downstream pipeline to display the clustering of the
> > sequences on a 2D euclidean plane. However, I wanted to have a tree
> > representation to validate the clustering results; the trouble is, I
> can't
> > get the _phyml_tree.txt file to be created, which I would then use to
> draw
> > the tree.
> >
> > Thanks in advance for any help!
> >
> > Cheers,
> > Eric
>
> Hi Eric,
>
> So this part is getting stuck (or taking a very long time):
>
> #Feed the alignment to PhyML using the command line wrapper:
> from Bio.Phylo.Applications import PhymlCommandline
> cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa',
> model='WAG', alpha='e', bootstrap=100)
> out_log, err_log = cmdline()
>
> At that point is the computer active (high CPU load as measured
> via the task manager / system monitor / top / etc)?
>
> I would suggest trying PhyML at the command line by hand. First
> check the command that Biopython should be running:
>
> print cmdline
>
> That may give you visual progress on screen. My guess is simply
> that this is just slow - you are only running 100 bootstraps, but
> perhaps each one is taking a while and that adds up.
>
> You said you had 70 protein sequences - how many columns
> are there in the alignment? That can also affect run times.
>
> Peter
>


From nuin at genedrift.org  Thu Apr 11 17:33:05 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 11 Apr 2013 13:33:05 -0400
Subject: [Biopython] Request from help
In-Reply-To: <CAK-i=xgpN3OfRMkfB8CguEUNvKEwCQ6gnB_-obPoG5iMSOJ7+Q@mail.gmail.com>
References: <CAK-i=xh6HDGp+bYjWJ2pgJxk-sLRcWFhP+6MpJvrunNUe-7XoQ@mail.gmail.com>
	<CAKVJ-_6xVNRreg1yZt_XrhF6OBW87gx_8OjLiXgT7BpTLOK9Og@mail.gmail.com>
	<CAK-i=xgpN3OfRMkfB8CguEUNvKEwCQ6gnB_-obPoG5iMSOJ7+Q@mail.gmail.com>
Message-ID: <8176FA21-39F6-405A-B338-94D87E6BB7B3@genedrift.org>


On 2013-04-11, at 1:20 PM, Eric Ma <ericmajinglong at gmail.com> wrote:

> Hi Peter and Paulo,
> 
> Thank you for your feedback, much appreciated! I still have very sparse
> knowledge about phylogenies, and especially the run times needed to build
> the trees, so any new knowledge is appreciated!
> 
> The sequences I'm using are full Influenza A HA protein sequences, so we're
> talking about 1700-1750 amino acids being aligned together. The multiple
> sequence alignment for 70 sequences doesn't take long - on the order of
> minutes on my laptop. It's the "feeding into PhyML" portion that, for some
> reason, takes a long time.


Alignment time is much smaller than that of any phylogeny calculation on a data set of your size. The number of amino acids is not that important for the final run time, as the ML calculation itself is quite fast; arranging the branches is the main bottleneck.

There's no easy solution for this. Maybe you can try some other approaches: one that won't be as good as ML (Neighbour Joining), or one that might be as good (Bayes) but takes some time too.
> 
> With that said, I do have a full distance matrix as one of the outputs from
> a previous script in this script series, in addition to the multiple
> sequence alignment. I have been able to feed the distance matrix into a
> separate clustering algorithm from scikit-learn, and I was able to
> successfully identify six clusters of sequences in there. Hence, I wanted
> to use a phylogenetic tree to confirm what I'm seeing with the clustering
> algorithm - it's basically two separate representations of the same data.
> 

The distance matrix can be used to generate a diagram; I wouldn't call it a phylogenetic tree, but it can give you some ideas. One quick way to check your tree is to use the Neighbour Joining approach: you can try MEGA with your alignment file, and the calculations will be faster.
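
To illustrate how a distance matrix can be turned into such a diagram, here is a minimal pure-Python sketch of UPGMA (average-linkage clustering). Note that this is UPGMA, not Neighbour Joining, and it is not a Biopython API: the function name `upgma`, the input layout, and the toy distances are all assumptions made for the example.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a Newick string by UPGMA.

    names: list of labels.
    dist:  dict mapping frozenset({a, b}) -> distance between labels a and b.
    """
    # Each cluster id maps to (newick text, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        i, j = sorted(pair)
        left, size_i, height_i = clusters.pop(i)
        right, size_j, height_j = clusters.pop(j)
        height = d.pop(pair) / 2.0          # new node sits at half the distance
        newick = "(%s:%.3f,%s:%.3f)" % (left, height - height_i,
                                        right, height - height_j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = ((size_i * d_ik + size_j * d_jk)
                                          / (size_i + size_j))
        clusters[next_id] = (newick, size_i + size_j, height)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"


# Toy example with made-up distances
toy = {frozenset(("A", "B")): 2.0,
       frozenset(("A", "C")): 6.0,
       frozenset(("B", "C")): 6.0}
print(upgma(["A", "B", "C"], toy))
```

The resulting Newick string can be loaded by most tree viewers. UPGMA assumes a constant rate of change along all branches, so the output is better read as a clustering diagram than as a phylogeny.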

Cheers
Paulo


> I have heard that it is possible to create a tree from the distance matrix,
> and I was thinking this might be an alternative to feeding the alignment
> into PhyML. Does anybody know how to do this using BioPython?
> 
> Cheers,
> Eric
> -----------------------------------------------------------------------
> Please consider the environment before printing this e-mail. Do you really
> need to print it?
> 
> http://about.me/ericmjl
> 
> 
> On Thu, Apr 11, 2013 at 1:11 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:
> 
>> On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:
>>> Hello everybody,
>>> 
>>> I'm new to the mailing list here, though I've been playing with BioPython
>>> for quite a while.
>>> 
>>> I'm having some trouble here. I wanted to display a tree of sequences for
>>> which I had done a multiple sequence alignment. I tried going through the
>>> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline
>> ).
>>> Because I'm still in the testing phase, instead of writing it as a single
>>> script, I wrote it as a series of scripts that I would execute in order.
>>> 
>>> The problem I run into is at step 4 in the example, where I "feed the
>>> alignment to PhyML". My data set is 70 protein sequences, and the
>> trouble I
>>> run into is that it takes a very, very long time at the "feeding
>> alignment
>>> to PhyML" step. I tried running the script on my MacBook Pro overnight,
>> and
>>> even the next morning it was not done. Am I missing something here?
>>> 
>>> Just to be clear here, aligning the sequences using Muscle was
>> successful,
>>> and I also managed to output a distance matrix from sample to sample,
>> which
>>> I used in another downstream pipeline to display the clustering of the
>>> sequences on a 2D euclidean plane. However, I wanted to have a tree
>>> representation to validate the clustering results; the trouble is, I
>> can't
>>> get the _phyml_tree.txt file to be created, which I would then use to
>> draw
>>> the tree.
>>> 
>>> Thanks in advance for any help!
>>> 
>>> Cheers,
>>> Eric
>> 
>> Hi Eric,
>> 
>> So this part is getting stuck (or taking a very long time):
>> 
>> #Feed the alignment to PhyML using the command line wrapper:
>> from Bio.Phylo.Applications import PhymlCommandline
>> cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa',
>> model='WAG', alpha='e', bootstrap=100)
>> out_log, err_log = cmdline()
>> 
>> At that point is the computer active (high CPU load as measured
>> via the task manager / system monitor / top / etc)?
>> 
>> I would suggest trying PhyML at the command line by hand. First
>> check the command that Biopython should be running:
>> 
>> print cmdline
>> 
>> That may give you visual progress on screen. My guess is simply
>> that this is just slow - you are only running 100 bootstraps, but
>> perhaps each one is taking a while and that adds up.
>> 
>> You said you had 70 protein sequences - how many columns
>> are there in the alignment? That can also affect run times.
>> 
>> Peter
>> 
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

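A minimal sketch of the "try PhyML by hand" suggestion: it assembles the
command line with the standard library only, so the command can be inspected
or pasted into a terminal before anything is run. The flags -i, -d, -m, -a and
-b are PhyML's options for input file, data type, model, alpha and bootstrap
replicates. The helper name phyml_command, the executable name "phyml" and the
idea of a quick trial run with zero bootstraps are assumptions made for
illustration, not part of the wiki pipeline.

```python
import shlex


def phyml_command(alignment, datatype="aa", model="WAG", alpha="e", bootstrap=100):
    """Return the PhyML command as an argument list (hypothetical helper)."""
    return [
        "phyml",              # assumed executable name; it may differ per install
        "-i", alignment,      # input alignment in PHYLIP format
        "-d", datatype,       # "aa" for protein, "nt" for nucleotide
        "-m", model,          # substitution model
        "-a", str(alpha),     # "e" asks PhyML to estimate alpha
        "-b", str(bootstrap), # number of bootstrap replicates
    ]


# The run from the thread, and a quick trial without bootstrapping.
full_run = phyml_command("egfr-family.phy", bootstrap=100)
trial_run = phyml_command("egfr-family.phy", bootstrap=0)

print(shlex.join(full_run))
print(shlex.join(trial_run))
```

If the trial run finishes quickly, the bootstrap replicates are the likely
cost, which matches Peter's guess.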

From jgibbons1 at mail.usf.edu  Thu Apr 11 18:10:32 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Thu, 11 Apr 2013 14:10:32 -0400
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
In-Reply-To: <CAKVJ-_6y_q8e=EV5+1vCCeRY5c8z-brOsyHWW960dG0bX=ZYEg@mail.gmail.com>
References: <5166805F.8060603@googlemail.com>
	<CAKVJ-_6y_q8e=EV5+1vCCeRY5c8z-brOsyHWW960dG0bX=ZYEg@mail.gmail.com>
Message-ID: <CALaGxMjGCOAinAixo5q5UWxQ-nfCNe76q5dLR2Gpca=3Q0ihLQ@mail.gmail.com>

NCBI Standalone Blast gives you the option of querying the website so that
you don't have to maintain a local database.

Justin Gibbons

P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it
correct this time.


On Thu, Apr 11, 2013 at 5:43 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Apr 11, 2013 at 10:20 AM, Matthias Schade
> <matthiasschade.de at googlemail.com> wrote:
> > Hello everyone,
> >
> > is there an upper limit to how many sequences I can query via
> > NCBIWWW.qblast at once?
>
> There are sometimes limits on the URL length, especially if going via
> firewalls and proxies, so that may be one factor.
>
> At the NCBI end, I'm not sure what limits they impose on this:
> http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
>
> > Sending up to 150 sequences, each a 24-mer, in a single string,
> > everything works fine. But now I have tried the same for a string
> > containing about 900 sequences. At good times, it takes the NCBI server
> > about 5 min to send an answer. I save the answer and later open and parse
> > the file with other functions in my code. However, even though I have
> > queried the same 900 sequences, the resulting output file varies in length
> > (10 MB < x < 20 MB) and always at least lacks the closing tag
> > "</BlastOutput>", or is missing even more (this does not happen when
> > querying 150 sequences or fewer).
> >
> > I would guess that once the server has started sending its answer, there
> > might only be a limited time that NCBIWWW.qblast waits for follow-up
> > packets, and thus, depending on the current server load, the
> > NCBIWWW.qblast function simply stops waiting for incoming data after some
> > time, which makes my BLAST output files vary in length. Could anyone
> > correct or verify this far-fetched hypothesis?
> >
> > My core-lines are:
> >
> > from Bio.Blast import NCBIWWW
> > orgn = 'Mus musculus'  # or anything else
> > result = NCBIWWW.qblast("blastn", "nt", fasta_seq_string, expect=100,
> >                         entrez_query=orgn + "[orgn]")
> > with open('myblast_result.xml', "w") as save_file:
> >     save_file.write(result.read())
> >
> > Best regards,
> > Matthias
>
> I think you've reached the scale where it would be better to run blastn
> locally - ideally on a cluster if you have access to one. You can
> download the whole NT database from here - most departments
> running BLAST with their own Linux servers will have a central copy
> which is kept automatically up to date:
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/
>
> If you don't have those kinds of resources, then you can even
> run BLAST on your own Windows machine - although I'm not
> sure how much RAM would be recommended for the NT
> database which is pretty big.
>
> Regards,
>
> Peter

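A minimal sketch for spotting the truncated downloads described in the quoted
question: before parsing a saved result, check that it is well-formed XML with
a BlastOutput root element. It uses only the standard library. The function
name blast_xml_is_complete and the tiny sample strings are made up for
illustration, and real results of 10 to 20 MB would be read from the saved
file rather than typed inline.

```python
import xml.etree.ElementTree as ET


def blast_xml_is_complete(xml_text):
    """Return True if xml_text is well-formed XML rooted at <BlastOutput>."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # A missing closing tag (a truncated download) ends up here.
        return False
    return root.tag == "BlastOutput"


# Hypothetical miniature inputs standing in for saved qblast results.
complete = "<BlastOutput><BlastOutput_iterations/></BlastOutput>"
truncated = "<BlastOutput><BlastOutput_iterations>"

print(blast_xml_is_complete(complete))
print(blast_xml_is_complete(truncated))
```

A result that fails this check could be re-submitted, ideally as a smaller
batch, instead of being passed on to the parser.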

From p.j.a.cock at googlemail.com  Thu Apr 11 18:54:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Apr 2013 19:54:50 +0100
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
In-Reply-To: <CALaGxMjGCOAinAixo5q5UWxQ-nfCNe76q5dLR2Gpca=3Q0ihLQ@mail.gmail.com>
References: <5166805F.8060603@googlemail.com>
	<CAKVJ-_6y_q8e=EV5+1vCCeRY5c8z-brOsyHWW960dG0bX=ZYEg@mail.gmail.com>
	<CALaGxMjGCOAinAixo5q5UWxQ-nfCNe76q5dLR2Gpca=3Q0ihLQ@mail.gmail.com>
Message-ID: <CAKVJ-_4p55n9PCJOs4mv=pNniDrYXc0GUtwM_HG-7QZzxFNnFg@mail.gmail.com>

On Thursday, April 11, 2013, Justin Gibbons wrote:

> NCBI Standalone Blast gives you the option of querying the website so that
> you don't have to maintain a local database.


Good point - the BLAST+ binaries added the -remote option
which does that. Worth exploring as it should know and
obey the NCBI limits automatically.


>
> Justin Gibbons
>
> P.S. Yes Peter, I did respond to the wrong email. Hopefully, I got it
> correct this time.
>
>
Easily done, don't worry about it.

Peter

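A minimal sketch of the -remote route mentioned here, assuming the BLAST+
blastn binary is installed and on the PATH. It only builds and prints the
command; nothing is sent to NCBI. The options -remote, -db, -query, -evalue,
-outfmt 5 (XML), -out and -entrez_query are BLAST+ options; the helper name
remote_blastn_command and the file names are assumptions for illustration.

```python
import shlex


def remote_blastn_command(query_fasta, out_xml, db="nt", evalue=100,
                          entrez_query=None):
    """Build a blastn command that searches at NCBI (hypothetical helper)."""
    cmd = [
        "blastn",
        "-remote",            # run the search on NCBI's servers
        "-db", db,
        "-query", query_fasta,
        "-evalue", str(evalue),
        "-outfmt", "5",       # 5 = BLAST XML output
        "-out", out_xml,
    ]
    if entrez_query:
        # Restrict the search, e.g. to one organism.
        cmd += ["-entrez_query", entrez_query]
    return cmd


cmd = remote_blastn_command("queries.fasta", "myblast_result.xml",
                            entrez_query="Mus musculus[orgn]")
print(shlex.join(cmd))
```

The printed line can be run in a shell, or the list passed to
subprocess.run(cmd, check=True) once the query file exists.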

From dan837446 at gmail.com  Thu Apr 11 20:51:13 2013
From: dan837446 at gmail.com (Dan)
Date: Fri, 12 Apr 2013 08:51:13 +1200
Subject: [Biopython] Biopython Digest, Vol 124, Issue 9
In-Reply-To: <mailman.3.1365696001.2331.biopython@lists.open-bio.org>
References: <mailman.3.1365696001.2331.biopython@lists.open-bio.org>
Message-ID: <CAExy72jLbfaFiLAqDOOPYTQg6g14f+i9x7css_=ojtPwgy_grw@mail.gmail.com>

This is peripherally relevant to the question: I asked Tao Tao of NCBI user
services about general guidelines for remote BLAST, and got this response:

"In general, the key is to reduce the hits to BLAST server:
At the search step, DO NOT submit searches that contain only single
sequence! You need to batch the query and submit a set in a single search
request.
At the result polling step, you should reduce the result checking by
spacing them out, and start checking for results after a delay (a few
minutes).
The XML result for batch queries is a bit peculiar each query is wrapped
around  <Iteration> tag
You are better off leaving the other conditions default and post-process it
to get the top hits"

Also, it's best to search between 9 PM and 5 AM Eastern Standard Time and at
weekends.
Personally, I seem to encounter glitches using batches above 100, but it's so
specific to your particular workplace that I'm not sure if that's a good
guideline.

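A minimal sketch of the batching advice, using only the standard library: it
splits one large FASTA string into smaller FASTA strings of at most batch_size
records each. The function name fasta_batches is made up for illustration. In
a real script each batch would be passed to NCBIWWW.qblast as the sequence
argument, and the batch size of 100 would simply follow the observation above
rather than any documented NCBI limit.

```python
def fasta_batches(fasta_text, batch_size):
    """Yield FASTA strings holding at most batch_size records each."""
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])        # start a new record at each header
        elif records and line.strip():
            records[-1].append(line)      # sequence line of the current record
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        yield "\n".join("\n".join(rec) for rec in batch) + "\n"


# Three toy records split into batches of two.
fasta_text = ">s1\nACGT\n>s2\nGGCC\n>s3\nTTAA\n"
batches = list(fasta_batches(fasta_text, 2))

for number, batch in enumerate(batches, start=1):
    print("batch", number, "holds", batch.count(">"), "records")
```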



From p.j.a.cock at googlemail.com  Fri Apr 12 09:49:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 12 Apr 2013 10:49:31 +0100
Subject: [Biopython] query upper limit for NCBIWWW.qblast?
In-Reply-To: <5166805F.8060603@googlemail.com>
References: <5166805F.8060603@googlemail.com>
Message-ID: <CAKVJ-_4yYoXoV5X2T_7MHL2PjLZnx0-sHrHMcpfCZXNsXzDDWw@mail.gmail.com>

Dan replied via the digest (summary emails rather than individual emails) here:
http://lists.open-bio.org/pipermail/biopython/2013-April/008507.html

On Thu, Apr 11, 2013 at 9:51 PM, Dan <dan837446 at gmail.com> wrote:
> This is peripherally relevant to the question, I asked Tao Tao of NCBI user
> services about general guidelines for remote blast, and got this response:
>
> "In general, the key is to reduce the hits to BLAST server:
> At the search step, DO NOT submit searches that contain only single
> sequence! You need to batch the query and submit a set in a single search
> request.
> At the result polling step, you should reduce the result checking by
> spacing them out, and start checking for results after a delay (a few
> minutes).
> The XML result for batch queries is a bit peculiar each query is wrapped
> around  <Iteration> tag
> You are better off leaving the other conditions default and post-process it
> to get the top hits"
>
> Also it's best to search between 9PM and 5AM Eastern Standard time and at
> weekends.
> Personally I seem to encounter glitches using batches above 100 but it's so
> specific to your particular workplace that I'm not sure if that's a good
> guideline.
>

Perhaps Biopython's QBLAST wrapper could benefit from adaptive
time delays in the polling step - at the moment it just checks every
three seconds.

Peter

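A minimal sketch of what adaptive delays in the polling step could look like:
a generator that starts at three seconds and grows the wait geometrically up
to a ceiling. This is not Biopython's implementation; the function name
polling_delays and the factor and maximum values are assumptions chosen for
illustration.

```python
import itertools


def polling_delays(initial=3.0, factor=1.5, maximum=120.0):
    """Yield ever longer waits (in seconds), capped at maximum."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


# The first few waits; a real loop would call time.sleep(delay)
# before each check of the job status.
first_delays = list(itertools.islice(polling_delays(), 6))
print(first_delays)
```

Short jobs are still picked up within a few seconds, while a long batch job
is polled far less often than once every three seconds.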

From john at picloud.com  Fri Apr 12 23:11:43 2013
From: john at picloud.com (John Riley)
Date: Fri, 12 Apr 2013 16:11:43 -0700
Subject: [Biopython] BioPython now available on PiCloud by default
Message-ID: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>

Hello,

We've had some requests for BioPython to be deployed on PiCloud [1]. While
any user could always create a custom environment, and install the latest
version themselves [2], we've decided to address the issue directly by
adding BioPython (1.60) into the default suite of scientific tools on
PiCloud.

In short, to offload a Python function or program that uses BioPython, you
don't need to do any setup! The instructions for using other scientific
tools work just the same [3]. Hope this helps!

[1] http://www.picloud.com
[2] http://docs.picloud.com/environment.html
[3] http://docs.picloud.com/howto/pyscientifictools.html

Best Regards,
John

--
John Riley
PiCloud, Inc.


From jgibbons1 at mail.usf.edu  Sat Apr 13 20:13:56 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Sat, 13 Apr 2013 16:13:56 -0400
Subject: [Biopython] Cookbook suggestion
Message-ID: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>

I want to add the following to the cookbook but I am unable to create an
account.

# Using SeqIO.write() without holding records in memory.

from Bio import SeqIO


seq_ids = set()  # Create an empty set to hold the sequence IDs.
# The index can be searched by sequence ID but is not held in memory.
indexed_fasta = SeqIO.index(file_path, 'fasta')

for seq_record in SeqIO.parse(file_path, 'fasta'):
    # Filter according to some criteria:
    seq_ids.add(seq_record.id)

# Write the FASTA records to a new file using SeqIO.write()

SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path,
            'fasta')

So if someone who can edit the cookbook wants to add it feel free to.

Justin Gibbons


From p.j.a.cock at googlemail.com  Sat Apr 13 20:27:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 13 Apr 2013 21:27:24 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
Message-ID: <CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>

Hi Justin,

On Sat, Apr 13, 2013 at 9:13 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> I want to add the following to the cookbook but I am unable to create an
> account.

Hmm - we should fix that. Is there a specific error message
from the wiki?

> #using SeqIO.write() without holding records in memory.
>
> from Bio import SeqIO
>
>
> seq_ids=set() #create an empty set to hold the sequence IDs.
> indexed_fasta=SeqIO.index(file_path, 'fasta') #Can be searched by sequence
> ID but is not held in memory
>
> for seq_record in SeqIO.parse(file_path, 'fasta'):
>     #Filter according to some critria:
>         seq_ids.add(seq_record.id)

Why do you call SeqIO.index but not use it, and instead get
the ID list by doing a full parse of the file? Note that calling
SeqIO.index is likely faster than SeqIO.parse because the
index code doesn't actually load the sequence information
etc - just the record identifier. This speed difference is more
obvious on heavier file formats like GenBank. e.g. These
single lines both get all the identifiers as a list:

seq_ids = list(SeqIO.index(file_path, 'fasta').keys())

vs:

seq_ids = [rec.id for rec in SeqIO.parse(file_path, 'fasta')]

Also note that using a set rather than a list for the ids
means the order is lost - which may be important.

> #write the fasta records to a new file using SeqIO.write()
>
> SeqIO.write([indexed_fasta[seq_id] for seq_id in seq_ids], new_file_path,
> 'fasta')
>

That last line uses a list comprehension,
[indexed_fasta[seq_id] for seq_id in seq_ids]

That will therefore load all the records into memory as a list of
SeqRecord objects, which can be avoided with a generator expression:

(indexed_fasta[seq_id] for seq_id in seq_ids)

i.e. round brackets not square.

> So if someone who can edit the cookbook wants to add it feel free to.
>
> Justin Gibbons

Feedback on the documentation and efforts to improve it
are always welcome. However, I'm not sure what your example
is trying to do yet - it seems to rewrite a FASTA file with the
records in a new order (with the order given by however
Python happens to iterate over the set of IDs).

Thanks,

Peter

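A small standard-library illustration of the square versus round bracket
point: the list comprehension fetches every record up front, while the
generator expression fetches one record only when the consumer asks for it.
The function load_record is a hypothetical stand-in for indexed_fasta[seq_id]
that merely notes each fetch.

```python
loaded = []


def load_record(seq_id):
    """Stand-in for indexed_fasta[seq_id]; notes each fetch (hypothetical)."""
    loaded.append(seq_id)
    return ">%s\nACGT\n" % seq_id


seq_ids = ["a", "b", "c"]

# Square brackets: all three records are fetched immediately.
as_list = [load_record(seq_id) for seq_id in seq_ids]
count_after_list = len(loaded)

loaded.clear()

# Round brackets: nothing is fetched until something iterates.
as_generator = (load_record(seq_id) for seq_id in seq_ids)
count_after_generator = len(loaded)

# Asking for one item fetches exactly one record.
first = next(as_generator)
count_after_next = len(loaded)

print(count_after_list, count_after_generator, count_after_next)
```

SeqIO.write iterates over whatever it is given, so with the generator form
only one record needs to exist at a time.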

From jgibbons1 at mail.usf.edu  Sun Apr 14 17:53:26 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Sun, 14 Apr 2013 13:53:26 -0400
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
Message-ID: <CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>

My only goal was to demonstrate how to use SeqIO.write without holding all
of the sequence records in memory by using a generator expression:

    SeqIO.write((indexed_fasta[seq_id] for seq_id in seq_ids),
                new_file_path, 'fasta')

Everything else was just to provide context for the SeqIO.write() function,
but it ended up just being confusing.

I am assuming that you want to check the individual FASTA records for
specific criteria and then write those that match the criteria to a new
file, which is why I wrote this:

for seq_record in SeqIO.parse(file_path, 'fasta'):
    # Filter according to some criteria:
    seq_ids.add(seq_record.id)

For example, you can create individual sets holding the IDs of sequences
that are within a given size range and of sequences that aren't repetitive,
so that seq_ids = correct_length_set.intersection(non_repetitive_set).

You need the indexed FASTA so that you can get a copy of the sequence
records that match your criteria:

# Can be searched by sequence ID but is not held in memory
indexed_fasta = SeqIO.index(file_path, 'fasta')

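A minimal sketch of the set-based filtering described in this message, using
plain strings instead of SeqRecord objects so that it runs without Biopython.
The toy sequences, the length window of 8 to 12 and the is_repetitive rule
(one base making up more than 80% of the sequence) are all assumptions made
for illustration. The last line also recovers the original record order, which
a set alone does not keep.

```python
# Toy data standing in for records parsed from a FASTA file.
records = {
    "seq1": "ACGTACGTAC",
    "seq2": "AAAAAAAAAA",
    "seq3": "ACG",
    "seq4": "GATTACAGATTACA",
}


def is_repetitive(seq, max_fraction=0.8):
    """Call a sequence repetitive if one base dominates it (assumed rule)."""
    most_common = max(seq.count(base) for base in set(seq))
    return most_common / len(seq) > max_fraction


correct_length_set = {sid for sid, seq in records.items() if 8 <= len(seq) <= 12}
non_repetitive_set = {sid for sid, seq in records.items() if not is_repetitive(seq)}

# IDs that satisfy both criteria.
seq_ids = correct_length_set.intersection(non_repetitive_set)

# Walk the records in file order so the output order is predictable.
ordered_ids = [sid for sid in records if sid in seq_ids]
print(ordered_ids)
```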



From jgibbons1 at mail.usf.edu  Sun Apr 14 17:58:53 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Sun, 14 Apr 2013 13:58:53 -0400
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
	<CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
Message-ID: <CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>

Sorry I accidentally sent the last email.

You need the indexed FASTA to get a copy of the sequence records that match
your criteria:

indexed_fasta = SeqIO.index(file_path, 'fasta')
SeqIO.write((indexed_fasta[seq_id] for seq_id in seq_ids),
            new_file_path, 'fasta')

As for editing the wiki, when I click on "Login with OpenID" I get sent to a
blank page. I also tried clicking on "Login" and tried to create a new
account, and was told "The action you have requested is limited to users in
the group: Administrators."




From p.j.a.cock at googlemail.com  Mon Apr 15 10:10:15 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Apr 2013 11:10:15 +0100
Subject: [Biopython] BioPython now available on PiCloud by default
In-Reply-To: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>
References: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>
Message-ID: <CAKVJ-_6Yqhc1LYNoHph5VxAKx5Fwj5j9p9b-GNa8P0ufK2seYQ@mail.gmail.com>

On Sat, Apr 13, 2013 at 12:11 AM, John Riley <john at picloud.com> wrote:
> Hello,
>
> We've had some requests for BioPython to be deployed on PiCloud [1]. While
> any user could always create a custom environment, and install the latest
> version themselves [2], we've decided to address the issue directly by
> adding BioPython (1.60) into the default suite of scientific tools on
> PiCloud.
>
> In short, to offload a Python function or program that uses BioPython, you
> don't need to do any setup! The instructions for using other scientific
> tools work just the same [3]. Hope this helps!
>
> [1] http://www.picloud.com
> [2] http://docs.picloud.com/environment.html
> [3] http://docs.picloud.com/howto/pyscientifictools.html
>
> Best Regards,
> John

Sounds interesting, and you have some very keen users already :)
http://blog.picloud.com/2011/09/27/building-a-biological-database-and-doing-comparative-genomics-in-the-cloud/

Regards,

Peter


From p.j.a.cock at googlemail.com  Mon Apr 15 10:46:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Apr 2013 11:46:53 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
	<CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
	<CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>
Message-ID: <CAKVJ-_4NfTnHTPRAXLvvi3S5ewRNTAQLs0h5AXOXT35OiqiD5g@mail.gmail.com>

On Sun, Apr 14, 2013 at 6:58 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> Sorry I accidentally sent the last email.
>
> You need the indexed fasta to get a copy of the sequence records that match
> your criteria:
>
> indexed_fasta=SeqIO.index(file_path, 'fasta')
> SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> new_file_path,'fasta')

With a simple sequential file format like FASTA where there are no complex
file headers/footers to worry about, this might be the faster route:

with open(new_file_path, "w") as handle:
    for seq_id in seq_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

The idea here is never to parse the records into SeqRecord objects, just
keep them as raw strings in FASTA format. The same idea works well on
GenBank or SwissProt files which are slower to parse, there are examples
of this in the main Tutorial,
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Were you intending this to be a self contained cookbook example for:
http://biopython.org/wiki/Category:Cookbook ?

> As for editing the wiki when I click on "Login with OpenID" I get sent to a
> blank page. I also tried clicking on "Login" and tired to create a new
> account and was told "The action you have requested is limited to users in
> the group: Administrators<http://biopython.org/w/index.php?title=Biopython:Administrators&action=edit&redlink=1>
> ."

Thanks - I've passed that on to our volunteer SysAdmin team.

(As an aside, do you have a GitHub account, and do you think
it would be easier to use the wiki hosted on GitHub instead of
our own MediaWiki installation?)

Thanks,

Peter

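A standard-library illustration of the raw-copy idea: index each record by its
byte offset, then copy the selected records byte for byte without ever parsing
them. This is only a sketch of the principle and not how Biopython implements
SeqIO.index or get_raw. The names build_offset_index and get_raw, and the toy
file contents, are made up for the example, and every header line is assumed
to carry an ID.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each record ID to the (start, length) of its raw bytes."""
    index = {}
    current_id = None
    start = 0
    offset = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                if current_id is not None:
                    index[current_id] = (start, offset - start)
                # The ID is the first word after ">" (assumed to be present).
                current_id = line[1:].split()[0].decode()
                start = offset
            offset += len(line)
    if current_id is not None:
        index[current_id] = (start, offset - start)
    return index


def get_raw(path, index, seq_id):
    """Return the untouched bytes of one record."""
    start, length = index[seq_id]
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "in.fasta")
    target = os.path.join(tmp, "out.fasta")
    with open(source, "wb") as handle:
        handle.write(b">a first\nACGT\n>b\nGG\nCC\n>c\nTTTT\n")

    index = build_offset_index(source)

    # Copy two chosen records, in a chosen order, as raw bytes.
    with open(target, "wb") as handle:
        for seq_id in ["c", "a"]:
            handle.write(get_raw(source, index, seq_id))

    with open(target, "rb") as handle:
        copied = handle.read()

print(copied.decode())
```

Only the small index of offsets lives in memory; the record text itself is
read from disk when it is copied.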

From swang129 at gmail.com  Mon Apr 15 11:15:23 2013
From: swang129 at gmail.com (Sarah Wang)
Date: Mon, 15 Apr 2013 04:15:23 -0700
Subject: [Biopython] pysam installation errors
In-Reply-To: <CAJfHGQX5Uop2UhXA6a+M6mhiP7+=Vv8xw+0kzhymknAG+yk+5A@mail.gmail.com>
References: <CAJfHGQX5Uop2UhXA6a+M6mhiP7+=Vv8xw+0kzhymknAG+yk+5A@mail.gmail.com>
Message-ID: <CAJfHGQUmZ5-qV7BqnPX4+ybPifeRuORXR=3qBgWYkMtztKkosw@mail.gmail.com>

When I tried to install pysam with "python setup.py install", multiple
> warning messages have been generated (error messages copied below). I can
> not import pysam. How can I resolve them? Thanks
>
> $Python setup.py install
>
> ...
> Compiling module Cython.Plex.Scanners ...
> Compiling module Cython.Plex.Actions ...
> Compiling module Cython.Compiler.Lexicon ...
> Compiling module Cython.Compiler.Scanning ...
> Compiling module Cython.Compiler.Parsing ...
> Compiling module Cython.Compiler.Visitor ...
> Compiling module Cython.Compiler.FlowControl ...
> Compiling module Cython.Compiler.Code ...
> Compiling module Cython.Runtime.refnanny ...
> warning: no files found matching '*.pyx' under directory
> 'Cython/Debugger/Tests'
> warning: no files found matching '*.pxd' under directory
> 'Cython/Debugger/Tests'
> warning: no files found matching '*.h' under directory
> 'Cython/Debugger/Tests'
> warning: no files found matching '*.pxd' under directory 'Cython/Utility'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> /tmp/easy_install-9yggMe/
> Cython-0.18/Cython/Plex/Scanners.c:7117:18:
> warning:
>       unused function '__Pyx_CyFunction_New' [-Wunused-function]
> static PyObject *__Pyx_CyFunction_New(PyTypeObject *type, PyMethodDef
> *ml,...
>                  ^
> 1 warning generated.
> /tmp/easy_install-9yggMe/Cython-0.18/Cython/Plex/Scanners.c:2992:31:
> warning:
>       implicit conversion loses integer precision: 'long' to 'int'
>       [-Wshorten-64-to-32]
>   __pyx_v_self->input_state = __pyx_v_input_state;
>                             ~ ^~~~~~~~~~~~~~~~~~~
> /tmp/easy_install-9yggMe/Cython-0.18/Cython/Plex/Scanners.c:7117:18:
> warning:
>       unused function '__Pyx_CyFunction_New' [-Wunused-function]
> static PyObject *__Pyx_CyFunction_New(PyTypeObject *type, PyMethodDef
> *ml,...
>                  ^
> 2 warnings generated.
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> clang: warning: argument unused during compilation: '-mno-fused-madd'
> Adding Cython 0.18 to easy-install.pth file
> Installing cygdb script to /usr/local/bin
> Installing cython script to /usr/local/bin
>
> Installed
> /Library/Python/2.7/site-packages/Cython-0.18-py2.7-macosx-10.8-intel.egg
> Finished processing dependencies for pysam==0.7.4
>
>
> >>> import pysam
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pysam/__init__.py", line 1, in <module>
>     from pysam.csamtools import *
> ImportError: No module named csamtools
>


From p.j.a.cock at googlemail.com  Mon Apr 15 11:27:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Apr 2013 12:27:30 +0100
Subject: [Biopython] pysam installation errors Inbox x
In-Reply-To: <CAJfHGQUmZ5-qV7BqnPX4+ybPifeRuORXR=3qBgWYkMtztKkosw@mail.gmail.com>
References: <CAJfHGQX5Uop2UhXA6a+M6mhiP7+=Vv8xw+0kzhymknAG+yk+5A@mail.gmail.com>
	<CAJfHGQUmZ5-qV7BqnPX4+ybPifeRuORXR=3qBgWYkMtztKkosw@mail.gmail.com>
Message-ID: <CAKVJ-_4F1X2NKKJ_4fkYUnh-kvau3iFsmugCe2Cs=vVBMk8FsQ@mail.gmail.com>

On Mon, Apr 15, 2013 at 12:15 PM, Sarah Wang <swang129 at gmail.com> wrote:
> When I tried to install pysam with "python setup.py install", multiple
> warning messages have been generated (error messages copied below). I can
> not import pysam. How can I resolve them? Thanks

Hi Sarah,

This is the Biopython mailing list, and while we do discuss other
tools in this case the pysam Google Group is the best place to ask:

https://groups.google.com/forum/?fromgroups=#!topic/pysam-user-group/tOikIFU_ZFk

Peter

P.S. Those were compiler warnings, not errors, and I would guess they
can be ignored.


From ferreirafm at usp.br  Mon Apr 15 12:34:12 2013
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Mon, 15 Apr 2013 09:34:12 -0300
Subject: [Biopython] BioPython now available on PiCloud by default
In-Reply-To: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>
References: <CAHS-D6T3wiqU7==dG+94uBxfsEA46pFubZ38iV9-58wztVtbXg@mail.gmail.com>
Message-ID: <516BF3C4.1070107@usp.br>

Hi John,
Thanks for sharing  such a very nice module.
Best,
Fred

Em 12-04-2013 20:11, John Riley escreveu:
> Hello,
>
> We've had some requests for BioPython to be deployed on PiCloud [1]. While
> any user could always create a custom environment, and install the latest
> version themselves [2], we've decided to address the issue directly by
> adding BioPython (1.60) into the default suite of scientific tools on
> PiCloud.
>
> In short, to offload a Python function or program that uses BioPython, you
> don't need to do any setup! The instructions for using other scientific
> tools work just the same [3]. Hope this helps!
>
> [1] http://www.picloud.com
> [2] http://docs.picloud.com/environment.html
> [3] http://docs.picloud.com/howto/pyscientifictools.html
>
> Best Regards,
> John
>
> --
> John Riley
> PiCloud, Inc.
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

-- 
Dr. Frederico Moraes Ferreira
University of Sao Paulo
School of Medicine
Heart Institute - Immunology
Av. Dr. Enéas de Carvalho Aguiar, 44
05403-900     São Paulo - SP
Brasil


From jgibbons1 at mail.usf.edu  Mon Apr 15 19:40:15 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Mon, 15 Apr 2013 15:40:15 -0400
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CAKVJ-_4NfTnHTPRAXLvvi3S5ewRNTAQLs0h5AXOXT35OiqiD5g@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
	<CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
	<CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>
	<CAKVJ-_4NfTnHTPRAXLvvi3S5ewRNTAQLs0h5AXOXT35OiqiD5g@mail.gmail.com>
Message-ID: <CALaGxMg-F6jAmgKhvPpFQwo6pQ3bdZ9++wnE5NtNG3h6tRoDfQ@mail.gmail.com>

It looks like there is already an example of this in the tutorial under
18.1.5, but I was planning on making it a self contained cookbook example
so that it is easier to find.

If this is the fastest way to do it though:

with open(new_file_path, "w") as handle:
    for seq_id in seq_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

Is there any advantage to using SeqIO.write() other than it being shorter?

I do not have a GitHub account so I cannot comment on whether it would be
easier to use Github.

Thanks,

Justin


On Mon, Apr 15, 2013 at 6:46 AM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Sun, Apr 14, 2013 at 6:58 PM, Justin Gibbons <jgibbons1 at mail.usf.edu>
> wrote:
> > Sorry I accidentally sent the last email.
> >
> > You need the indexed fasta to get a copy of the sequence records that
> match
> > your criteria:
> >
> > indexed_fasta=SeqIO.index(file_path, 'fasta')
> > SeqIO.write( (indexed_fasta[seq_id] for seq_id in seq_ids),
> > new_file_path,'fasta')
>
> With a simple sequential file format like FASTA where there are no complex
> file headers/footers to worry about, this might be the faster route:
>
> with open(new_file_path, "w") as handle:
>     for seq_id in seq_ids:
>         handle.write(indexed_fasta.get_raw(seq_id))
>
> The idea here is never to parse the records into SeqRecord objects, just
> keep them as raw strings in FASTA format. The same idea works well on
> GenBank or SwissProt files which are slower to parse, there are examples
> of this in the main Tutorial,
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
> Were you intending this to be a self contained cookbook example for:
> http://biopython.org/wiki/Category:Cookbook ?
>
> > As for editing the wiki when I click on "Login with OpenID" I get sent
> to a
> > blank page. I also tried clicking on "Login" and tried to create a new
> > account and was told "The action you have requested is limited to users
> in
> > the group: Administrators<
> http://biopython.org/w/index.php?title=Biopython:Administrators&action=edit&redlink=1
> >
> > ."
>
> Thanks - I've passed that on to our volunteer SysAdmin team.
>
> (As an aside, do you have a GitHub account and would you think
> it would be easier to use the wiki hosted on GitHub instead of
> our own MediaWiki installation?)
>
> Thanks,
>
> Peter
>


From p.j.a.cock at googlemail.com  Tue Apr 16 09:02:58 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 16 Apr 2013 10:02:58 +0100
Subject: [Biopython] Cookbook suggestion
In-Reply-To: <CALaGxMg-F6jAmgKhvPpFQwo6pQ3bdZ9++wnE5NtNG3h6tRoDfQ@mail.gmail.com>
References: <CALaGxMh8YhzY39jiPDFejJJieKuZOrcorLL0PauYStuir71MSg@mail.gmail.com>
	<CAKVJ-_58Gv700oc2KbFzhrYvLbMExQE3EQeofO2HzSVjfbM5Lg@mail.gmail.com>
	<CALaGxMgFALpbwWcoh=MFKdyHUG2V0nOcBcG-g26MhVEdB71KNQ@mail.gmail.com>
	<CALaGxMgMG_Et+E61UR+CwayyaX1tVyBjZKkOTF3cZu4HoiHoAQ@mail.gmail.com>
	<CAKVJ-_4NfTnHTPRAXLvvi3S5ewRNTAQLs0h5AXOXT35OiqiD5g@mail.gmail.com>
	<CALaGxMg-F6jAmgKhvPpFQwo6pQ3bdZ9++wnE5NtNG3h6tRoDfQ@mail.gmail.com>
Message-ID: <CAKVJ-_63O2rCzW7HGMx9zHF4_BLNjyANz6+v1xv28H5+6UEBQQ@mail.gmail.com>

On Mon, Apr 15, 2013 at 8:40 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> It looks like there is already an example of this in the tutorial under
> 18.1.5, but I was planning on making it a self contained cookbook example
> so that it is easier to find.
>
> If this is the fastest way to do it though:
>
> with open(new_file_path, "w") as handle:
>     for seq_id in seq_ids:
>         handle.write(indexed_fasta.get_raw(seq_id))
>
> Is there any advantage to using SeqIO.write() other than it being shorter?

There are two linked choices here:

(a) Fully parse the file into SeqRecord objects using SeqIO.parse, or use
SeqIO.index or SeqIO.index_db to just extract the record identifiers.
Unless you need some of the annotation or the sequence, parsing it
into a SeqRecord is a waste of CPU time.

(b) Convert the SeqRecord back into a file on disk, or reuse the
original representation from the input file. For a format like FASTA,
this is almost a moot point - the only change is the white space
(using SeqIO.write will produce consistent line wrapping). For
some of the richer formats like GenBank the parse/write round
trip is not expected to produce an identical output, so it can be
prudent to reuse the original. For some formats we don't
have writing support, so you have to reuse the original.

My point is that whether to use SeqIO.write() or indexing and
get_raw() depends on the file format and what you are trying to do.
My recommendation would be to use get_raw to write simple file
formats without headers/footers if:

(*) You need to preserve original records exactly
(*) You need this to be as fast as possible
(*) SeqIO.write doesn't support the file format

Otherwise using SeqIO.write should be fine - it is also simpler
in terms of the code to call it.

Of course, if you are editing the records in any way, then you
must use SeqIO.write anyway.
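
The two routes compared above can be sketched side by side. This is a
minimal sketch, not a benchmark: the two-record FASTA text, the file names
and the wanted_ids list are made up for illustration. One assumption worth
stating: on Python 3 with a recent Biopython, get_raw() returns bytes
rather than text, so the output handle is opened in binary mode here
(the "w" mode in the snippet above dates from Python 2).

```python
import os
import tempfile

from Bio import SeqIO

# Hypothetical input: two short records with unusual 10-column line wrapping.
fasta_text = (
    ">seq1 first example\n"
    "ACGTACGTAC\n"
    "GTACGT\n"
    ">seq2 second example\n"
    "TTTTGGGGCC\n"
    "CCAA\n"
)

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.fasta")
raw_path = os.path.join(workdir, "raw_copy.fasta")
parsed_path = os.path.join(workdir, "parsed_copy.fasta")

with open(in_path, "w") as handle:
    handle.write(fasta_text)

wanted_ids = ["seq2"]
indexed_fasta = SeqIO.index(in_path, "fasta")

# Route 1: copy the raw record text without building SeqRecord objects.
# get_raw() returns bytes here, hence the binary output handle.
with open(raw_path, "wb") as handle:
    for seq_id in wanted_ids:
        handle.write(indexed_fasta.get_raw(seq_id))

# Route 2: parse each wanted record and let SeqIO.write re-format it.
count = SeqIO.write(
    (indexed_fasta[seq_id] for seq_id in wanted_ids), parsed_path, "fasta"
)

indexed_fasta.close()

with open(raw_path) as handle:
    raw_copy = handle.read()
with open(parsed_path) as handle:
    parsed_copy = handle.read()

print(raw_copy)
print(parsed_copy)
```

The raw copy keeps the original line wrapping, while the SeqIO.write copy
is re-wrapped, which is the white space difference mentioned in (b).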

> I do not have a GitHub account so I cannot comment on whether
> it would be easier to use Github.

Thanks. My thinking is that right now you would need to register
separately for (1) the mailing lists, (2) editing the wiki, (3) reporting
bugs on RedMine, and (4) submitting pull requests on GitHub. If we used
GitHub for the wiki and/or issue tracker, this would mean fewer user
accounts, so a little easier for contributors, but also less SysAdmin
work behind the scenes.

Peter


From nuin at genedrift.org  Wed Apr 17 18:45:20 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Wed, 17 Apr 2013 14:45:20 -0400
Subject: [Biopython] GEO profiles retrieval
Message-ID: <FCB53CAF-E8F4-47D7-8506-7B3D55870CA8@genedrift.org>

Hi everyone

Quite a longish question about some data retrieval we are trying to implement on GEO profiles. I don't know if this is possible to achieve programmatically (with or without BioPython), but some parts I already have set using Python and BioPython. What we are trying to achieve:

- we are building a pipeline where initially we want to see if the gene in question (let's say PTEN) is over or under expressed in certain conditions.
- using an eSearch URL/procedure I can get an XML with all the profile IDs for PTEN
- in order to get more information about each profile, I can use an eSummary URL/procedure that will get an XML file for each profile
- with these profiles we then want to check the gene expression level in each sample subgroup of the study and see if the gene is under or over expressed, or there's no change between the groups.
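
The eSearch and eSummary steps in the list above might look roughly like
the following. This is only a sketch of how the two URLs could be built,
with no network access: the helper names are invented for illustration,
"geoprofiles" is assumed to be the Entrez database name for GEO Profiles,
and the IDs passed to esummary_url are placeholders rather than real
profile IDs.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="geoprofiles", retmax=20):
    """Build an eSearch URL that returns matching record IDs as XML."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return EUTILS + "/esearch.fcgi?" + query


def esummary_url(ids, db="geoprofiles"):
    """Build an eSummary URL for a list of record IDs (given as strings)."""
    query = urlencode({"db": db, "id": ",".join(ids)})
    return EUTILS + "/esummary.fcgi?" + query


search_url = esearch_url("PTEN")
summary_url = esummary_url(["1", "2"])  # placeholder IDs, not real profiles
print(search_url)
print(summary_url)
```

Bio.Entrez wraps the same two services as Entrez.esearch and
Entrez.esummary, which may be more convenient than building URLs by hand.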

The problem I have is that in the profile XML file there's no information about sample annotation, or gene expression in each sample. I created a workaround that from the eSummary XML, I can get to this page of the profile

http://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS2877:1441937_s_at

using the GDS and probe ID found on the XML. Again, from this file there's no easy way to extract the sample grouping/annotation, although it's quite straightforward to extract the gene expression levels for each sample. What I want to find is:

- a way to get sample grouping/annotation for a specific GDS, that would give me the sample IDs that I could correlate to an expression value
- an eSearch, eSummary, eFetch, or any other URL that would give me expression values per sample, with sample ID annotated to a group

Thanks in advance for any help, idea and comments.

Paulo


From markbudde at gmail.com  Wed Apr 17 21:24:00 2013
From: markbudde at gmail.com (Mark Budde)
Date: Wed, 17 Apr 2013 14:24:00 -0700
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
Message-ID: <CAEwaGEvLAjYFiZAgD_yBCn-WeczF5vsNZDthr008ari0ObQ_wQ@mail.gmail.com>

Hi, I have a simple question. The cookbook shows many examples using
SeqFeatures, but I can't find any information on adding features to a
SeqRecord.

Say I wanted to add a Feature to an existing SeqRecord. Let's say it spans
nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
would I add this to my SeqRecord?

Thanks,
Mark


From p.j.a.cock at googlemail.com  Wed Apr 17 21:53:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Apr 2013 22:53:57 +0100
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
In-Reply-To: <CAEwaGEvLAjYFiZAgD_yBCn-WeczF5vsNZDthr008ari0ObQ_wQ@mail.gmail.com>
References: <CAEwaGEvLAjYFiZAgD_yBCn-WeczF5vsNZDthr008ari0ObQ_wQ@mail.gmail.com>
Message-ID: <CAKVJ-_7tBH_KERLc-1sfbOPJu_mDPNW=bQ6hQxx2qXjHBUSqUA@mail.gmail.com>

Hi Mark,

On Wed, Apr 17, 2013 at 10:24 PM, Mark Budde <markbudde at gmail.com> wrote:
> Hi, I have a simple question. The cookbook shows many examples using
> SeqFeatures, I can't find any information on adding features to a
> SeqRecord.

The "Tutorial and Cookbook" does have examples of creating a
SeqFeature - if this was not obvious to you how might we make
it clearer?

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

See also the docstrings,

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> help(SeqFeature)
>>> help(FeatureLocation)

Online here (for the current release):
http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html
http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html

> Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans
> nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
> would I add this to my SeqRecord?
>
> Thanks,
> Mark

Which version of Biopython do you have? The strand is moving
from the SeqFeature to the FeatureLocation, but this will work
on old and new:

from Bio.SeqFeature import SeqFeature, FeatureLocation
loc = FeatureLocation(9, 100)
f = SeqFeature(loc, strand=-1, qualifiers={"locus_tag":"Gene1"})

This is preferred for future-proofing:

from Bio.SeqFeature import SeqFeature, FeatureLocation
loc = FeatureLocation(9, 100, strand=-1)
f = SeqFeature(loc, qualifiers={"locus_tag":"Gene1"})

Exactly where you put the gene name depends on what you'll be
doing with the record - for GenBank or EMBL output, using a
locus_tag key would be a sensible option.

Then if you have a SeqRecord, use my_record.features.append(f)
or similar (and for GenBank/EMBL output pay attention to the
order).
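
Putting the pieces above together, a complete example might look like
this. It is a sketch under a few assumptions: the 120 bp sequence and the
"plasmid1" identifier are invented stand-ins for a real record, and the
"gene" feature type is just one plausible choice. It uses the
strand-on-the-location form recommended above.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical 120 bp record standing in for the real plasmid sequence.
my_record = SeqRecord(Seq("ACGT" * 30), id="plasmid1", name="plasmid1")

# Nucleotides 10..100 (one-based, inclusive) are 9:100 in Python slice terms.
loc = FeatureLocation(9, 100, strand=-1)

# The feature type "gene" is an assumption; the name goes in the qualifiers.
f = SeqFeature(loc, type="gene", qualifiers={"locus_tag": "Gene1"})

# Attach the feature to the record.
my_record.features.append(f)

# extract() returns the feature's sequence, reverse complemented for strand -1.
feature_seq = f.extract(my_record.seq)

print(len(my_record.features))
print(len(feature_seq))
```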

Is that clear?

Regards,

Peter


From markbudde at gmail.com  Wed Apr 17 22:52:31 2013
From: markbudde at gmail.com (Mark Budde)
Date: Wed, 17 Apr 2013 15:52:31 -0700
Subject: [Biopython] Adding a SeqFeature to a SeqRecord
In-Reply-To: <CAKVJ-_7tBH_KERLc-1sfbOPJu_mDPNW=bQ6hQxx2qXjHBUSqUA@mail.gmail.com>
References: <CAEwaGEvLAjYFiZAgD_yBCn-WeczF5vsNZDthr008ari0ObQ_wQ@mail.gmail.com>
	<CAKVJ-_7tBH_KERLc-1sfbOPJu_mDPNW=bQ6hQxx2qXjHBUSqUA@mail.gmail.com>
Message-ID: <CAEwaGEuZN86SpseXNRg2qVjb3M05M-gdmrDVTHwanUAV=7dAWA@mail.gmail.com>

On Wed, Apr 17, 2013 at 2:53 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> Hi Mark,
>
> On Wed, Apr 17, 2013 at 10:24 PM, Mark Budde <markbudde at gmail.com> wrote:
> > Hi, I have a simple question. The cookbook shows many examples using
> > SeqFeatures, I can't find any information on adding features to a
> > SeqRecord.
>
> The "Tutorial and Cookbook" does have examples of creating a
> SeqFeature - if this was not obvious to you how might we make
> it clearer?
>
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
I am coming at this from the perspective of generating a plasmid with
features on it. I guess most people would be using this for mining data
from PubMed or something, so maybe I'm just not the targeted user. I spent
a lot of time looking for how to name a feature, like you would in a vector
editing program. I now see that I can generate a feature as shown in the
first example in 4.3.3 - is this what you are referring to? I was confused
earlier because I could never figure out how to name the feature, nor how
to add it to the SeqRecord. I can see how to do this from your example below
(using qualifiers to name the feature, and append to add the feature). I
think the cookbook would benefit from adding a line such as

>>> len(MyRecord.features)
0
>>> example_feature.qualifiers['locus_tag'] = 'Gene1'
>>> MyRecord.features.append(example_feature)
>>> len(MyRecord.features)
1


> See also the docstrings,
>
> >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
> >>> help(SeqFeature)
> >>> help(FeatureLocation)
>
> Online here (for the current release):
> http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html
>
> http://biopython.org/DIST/docs/api/Bio.SeqFeature.FeatureLocation-class.html
>
> > Say I wanted to add a Feature to an existing SeqRecord. Lets say it spans
> > nucleotides 10..100, is called "Gene1" and is on the reverse strand. How
> > would I add this to my SeqRecord?
> >
> > Thanks,
> > Mark
>
> Which version of Biopython do you have? The strand is moving
> from the SeqFeature to the FeatureLocation, but this will work
> on old and new:
>
I have v1.59

> from Bio.SeqFeature import SeqFeature, FeatureLocation
> loc = FeatureLocation(9, 100)
> f = SeqFeature(loc, strand=-1, qualifiers={"locus_tag":"Gene1"})
>
> This is preferred for future-proofing:
>
> from Bio.SeqFeature import SeqFeature, FeatureLocation
> loc = FeatureLocation(9, 100, strand=-1)
> f = SeqFeature(loc, qualifiers={"locus_tag":"Gene1"})
>
> Exactly where you put the gene name depends on what you'll be
> doing with the record - for GenBank or EMBL output, using a
> locus_tag key would be a sensible option.
>
> Then if you have a SeqRecord, use my_record.features.append(f)
> or similar (and for GenBank/EMBL output pay attention to the
> order).
>
> Is that clear?

Yes. Your example provided here is clear and I think it should be added to
the cookbook.

> Regards,
>
> Peter

Thanks for your help, Peter, and pardon my ignorance.
-Mark


From mictadlo at gmail.com  Mon Apr 22 04:05:58 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 22 Apr 2013 14:05:58 +1000
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute 'alignments'
Message-ID: <CAOP6n=hnZaP2oq3ho3Hx7WKGfGFBHmhtGRVu4mjhuwJQbfJiYw@mail.gmail.com>

Hi,
The following code (BioPython 1.61, Blast+ 2.2.26):

from Bio.Blast import NCBIXML

with open("test/X.xml") as bf:
    blast_records = NCBIXML.parse(bf)

    for blast_record in blast_records:
        for alignment in blast_records.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 0.04:
                    print '****Alignment****'
                    print 'sequence:', alignment.title
                    print 'length:', alignment.length
                    print 'e value:', hsp.expect
                    print hsp.query[0:75] + '...'
                    print hsp.match[0:75] + '...'
                    print hsp.sbjct[0:75] + '...'

caused the following error:
$ python parseBlastXML.py
Traceback (most recent call last):
  File "parseBlastXML.py", line 8, in <module>
    for alignment in blast_records.alignments:
AttributeError: 'generator' object has no attribute 'alignments'

What did I do wrong?

Thank you in advance.

Mic


From mictadlo at gmail.com  Mon Apr 22 04:27:12 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 22 Apr 2013 14:27:12 +1000
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute
	'alignments'
In-Reply-To: <CAOP6n=hnZaP2oq3ho3Hx7WKGfGFBHmhtGRVu4mjhuwJQbfJiYw@mail.gmail.com>
References: <CAOP6n=hnZaP2oq3ho3Hx7WKGfGFBHmhtGRVu4mjhuwJQbfJiYw@mail.gmail.com>
Message-ID: <CAOP6n=iEAKqyDTcdDLXa966TT3PkiapmGekcGmY8r8YaDwQTEg@mail.gmail.com>

My mistake. This is the solution
from Bio.Blast import NCBIXML

with open("test/XA10m_v3.0.aa.snap_vs_uniref90.blastp.xml") as bf:
    blast_records = NCBIXML.parse(bf)

    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 0.04:
                    print '****Alignment****'
                    print 'sequence:', alignment.title
                    print 'length:', alignment.length
                    print 'e value:', hsp.expect
                    print hsp.query[0:75] + '...'
                    print hsp.match[0:75] + '...'
                    print hsp.sbjct[0:75] + '...'


On Mon, Apr 22, 2013 at 2:05 PM, Mic <mictadlo at gmail.com> wrote:

> Hi,
> The following code (BioPython 1.61, Blast+ 2.2.26):
>
> from Bio.Blast import NCBIXML
>
> with open("test/X.xml") as bf:
>     blast_records = NCBIXML.parse(bf)
>
>     for blast_record in blast_records:
>         for alignment in blast_records.alignments:
>             for hsp in alignment.hsps:
>                 if hsp.expect < 0.04:
>                     print '****Alignment****'
>                     print 'sequence:', alignment.title
>                     print 'length:', alignment.length
>                     print 'e value:', hsp.expect
>                     print hsp.query[0:75] + '...'
>                     print hsp.match[0:75] + '...'
>                     print hsp.sbjct[0:75] + '...'
>
> caused the following error:
> $ python parseBlastXML.py
> Traceback (most recent call last):
>   File "parseBlastXML.py", line 8, in <module>
>     for alignment in blast_records.alignments:
> AttributeError: 'generator' object has no attribute 'alignments'
>
> What did I do wrong?
>
> Thank you in advance.
>
> Mic
>
>
>


From p.j.a.cock at googlemail.com  Mon Apr 22 08:08:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Apr 2013 09:08:50 +0100
Subject: [Biopython] NCBIXML: 'generator' objecthas no attribute
	'alignments'
In-Reply-To: <CAOP6n=iEAKqyDTcdDLXa966TT3PkiapmGekcGmY8r8YaDwQTEg@mail.gmail.com>
References: <CAOP6n=hnZaP2oq3ho3Hx7WKGfGFBHmhtGRVu4mjhuwJQbfJiYw@mail.gmail.com>
	<CAOP6n=iEAKqyDTcdDLXa966TT3PkiapmGekcGmY8r8YaDwQTEg@mail.gmail.com>
Message-ID: <CAKVJ-_4QtsDdNct8rMrHMg7Jc==HVqVx8ssK47NkP6dByPgJ5g@mail.gmail.com>

On Monday, April 22, 2013, Mic wrote:

> My mistake. This is the solution
> from Bio.Blast import NCBIXML


Hi Mic,

Yep, you had two variables with very similar names.
An easy mistake to make - it's one of the things
you'll learn to check with an AttributeError: Am
I using the object I think I'm using? Well done
for solving it yourself, and thank you for posting
the solution here.
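
The same mix-up can be reproduced without any BLAST output. In this
minimal sketch, fake_parse is a hypothetical stand-in for a parser that
yields one record per query (it is not part of Biopython): the generator
itself has no "alignments" attribute, but each record it yields does.

```python
from types import SimpleNamespace


def fake_parse():
    """Hypothetical stand-in for a parser that yields one record per query."""
    yield SimpleNamespace(query="query1", alignments=["hit A", "hit B"])
    yield SimpleNamespace(query="query2", alignments=[])


records = fake_parse()

# The generator object itself carries none of the record attributes...
generator_has_alignments = hasattr(records, "alignments")

# ...but every item pulled out of it does.
hit_counts = {record.query: len(record.alignments) for record in records}

print(type(records).__name__, generator_has_alignments, hit_counts)
```

A quick print of type(x) on the object in question is often enough to
spot this kind of slip.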

Regards,

Peter


From mictadlo at gmail.com  Wed Apr 24 05:55:06 2013
From: mictadlo at gmail.com (Mic)
Date: Wed, 24 Apr 2013 15:55:06 +1000
Subject: [Biopython] NCBIXML: hit start and end
Message-ID: <CAOP6n=jgtS=hS64OO6o491MeWwLFL4WNQ8x9FAR1bVYu6mMHmA@mail.gmail.com>

Hi,
I have tried to rewrite the Perl code to Biopython

sub retrieve {
    my $blast_report = $options->{'blast'};
    my $max_hits  = $options->{'maxhits'};
    my $searchio     = new Bio::SearchIO(
        -format => 'blast',
        -file   => $blast_report
    );

    while ( my $result = $searchio->next_result ) {
        my $query_name = $result->query_name();
        my $count_unirefs   = 0;
        my %hit_names_count = ();
        while ( my $hit = $result->next_hit ) {
            $count_unirefs++;


            my $count_hsp = 0;
            my @plushsps  = ();
            my @minhsps   = ();

            while ( my $hsp = $hit->next_hsp ) {
                $count_hsp++;
                my $query_start = $hsp->start('query');
                my $query_end   = $hsp->end('query');
                my $hit_start   = $hsp->start('hit');
                my $hit_end     = $hsp->end('hit');
                my $strand      = $hsp->strand();
                my $hit_desc    = $hit->description();


                my @hsp_data    = ($query_start, $query_end, $hit_start,
$hit_end, $hit_desc);


            }
        }

    }
}


Biopython code:
---------------
from Bio import SeqIO
from Bio.Blast import NCBIXML

def retrieve_hits_data():

    max_hits = 5  # Change to args

    with open("test/x.xml") as bf:
        blast_records = NCBIXML.parse(bf)

        for blast_record in blast_records:
            print blast_record.query
            print
            for alignment in blast_record.alignments:
                print 'sequence:', alignment.title
                print alignment.hit_id
                print alignment.hit_def
                print 'length:', alignment.length

                for hsp in alignment.hsps:
                    print "HSPs"
                    print "----"
                    print 'e value:', hsp.expect
                    #print hsp.query
                    #print hsp.match
                    #print hsp.sbjct
                    print hsp.score
                    print hsp.bits
                    print hsp.num_alignments
                    print hsp.identities
                    print hsp.positives
                    print hsp.gaps
                    print hsp.align_length
                    print hsp.strand
                    print hsp.frame
                    print hsp.query_start
                    print hsp.query_end
                    #print hsp.hit_start
                    #print hsp.hit_end
                    print hsp.sbjct_start
                    print hsp.sbjct_end


retrieve_hits_data()


Output from Biopython code:
XA10_v3.0-snap.1

XA10_v3.0-snap.2

XA10_v3.0-snap.3

XA10_v3.0-snap.4

sequence: UniRef90_Q9FX16 F12G12.10 protein n=1 Tax=Arabidopsis thaliana
RepID=Q9FX16_ARATH
UniRef90_Q9FX16
F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH
length: 308
HSPs
----
e value: 8.30308e-88
709.0
277.715
None
146
192
10
285
(None, None)
(0, 0)
10
290
8
286


How do I get hsp->start('hit') and hsp->end('hit') from the BioPerl code in
Biopython?
Why does blast_record.query appear immediately in sequence and not after
the other two for loops have finished?

Thank you in advance.

Mic


From w.arindrarto at gmail.com  Wed Apr 24 07:04:02 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 24 Apr 2013 09:04:02 +0200
Subject: [Biopython] NCBIXML: hit start and end
In-Reply-To: <CAOP6n=jgtS=hS64OO6o491MeWwLFL4WNQ8x9FAR1bVYu6mMHmA@mail.gmail.com>
References: <CAOP6n=jgtS=hS64OO6o491MeWwLFL4WNQ8x9FAR1bVYu6mMHmA@mail.gmail.com>
Message-ID: <CADEGkF5WPrKDYD+gk=z3_7SHNpFv4cg0rv-6mAYZ+11gcFzkHw@mail.gmail.com>

Hi Mic,

> How do I get hsp->start('hit') and hsp->end('hit') from the bioperl code in
> Biopython?

With NCBIXML, they should be hsp.sbjct_start and hsp.sbjct_end respectively.

> Why does blast_record.query appear immediately in sequence and not after
> the other two for loops have finished?

It may be because the first three queries in your BLAST XML results
(XA10_v3.0-snap.{1..3}) do not have any hits or HSPs. Check your
XML results to be sure.
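
Both points can be seen in a small sketch. The function below only uses
the attribute names that appear in this thread (query, alignments,
hit_def, hsps, query_start, query_end, sbjct_start, sbjct_end); the
function name is invented, and the SimpleNamespace objects are
hypothetical stand-ins shaped like the records NCBIXML.parse() yields, so
that the example runs without a BLAST XML file.

```python
from types import SimpleNamespace


def hit_coordinates(blast_records):
    """Yield (query, hit_def, q_start, q_end, hit_start, hit_end) per HSP."""
    for blast_record in blast_records:
        # A query with no hits has an empty alignments list, so the inner
        # loops never run and nothing is yielded for that query.
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                yield (
                    blast_record.query,
                    alignment.hit_def,
                    hsp.query_start,
                    hsp.query_end,
                    hsp.sbjct_start,  # counterpart of hsp->start('hit')
                    hsp.sbjct_end,  # counterpart of hsp->end('hit')
                )


# Stand-in data: one query without hits, one query with a single HSP.
hsp = SimpleNamespace(query_start=10, query_end=290, sbjct_start=8, sbjct_end=286)
hit = SimpleNamespace(hit_def="F12G12.10 protein", hsps=[hsp])
records = [
    SimpleNamespace(query="XA10_v3.0-snap.1", alignments=[]),
    SimpleNamespace(query="XA10_v3.0-snap.4", alignments=[hit]),
]

rows = list(hit_coordinates(records))
print(rows)
```

With real data, the records argument would be the generator returned by
NCBIXML.parse() on an open BLAST XML handle.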

Hope that helps :),
Bow


From p.j.a.cock at googlemail.com  Wed Apr 24 19:19:48 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 24 Apr 2013 20:19:48 +0100
Subject: [Biopython] Biopython GSoC 2013 applications via NESCent
Message-ID: <CAKVJ-_5kQvFGWFNcFSDF3VADcCGeL_Wac4BsQttxY79v+XCR4w@mail.gmail.com>

To all the Biopythoneers,

For the last few years Biopython has participated in the
Google Summer of Code (GSoC) program under the umbrella
of the Open Bioinformatics Foundation (OBF):
https://developers.google.com/open-source/soc/
https://github.com/OBF/GSoC

Unfortunately, like quite a few previously accepted organisations,
this year the OBF was not accepted. Google has kept the total about
the same year on year, so this is probably simply a slot rotation
to get some new organisations involved.

The good news (for those not following the Biopython-dev
mailing list) is we have an alternative option agreed with
the good people at NESCent, as we did back in 2009:

http://biopython.org/wiki/Google_Summer_of_Code
http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013

I'd like to thank Eric for co-ordinating this, and encourage
any interested potential students to sign up to the Biopython
development list and NESCent's Google+ group as soon as
possible (if you haven't done so already):

http://lists.open-bio.org/mailman/listinfo/biopython-dev
https://plus.google.com/communities/105828320619238393015

Google are already accepting student applications, and the
deadline is Friday 3 May.  That doesn't leave very long for
asking for feedback and talking to potential mentors - which
is essential for a competitive proposal.

Thank you for your interest,

Peter


From nuin at genedrift.org  Thu Apr 25 18:42:07 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 25 Apr 2013 14:42:07 -0400
Subject: [Biopython] PubmedCentral XML parsing
Message-ID: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>

Hi

What would be the most direct way of parsing XML files downloaded from PubmedCentral ftp using BioPython?  These are files that use the archivearticle.dtd and when parsed using non-DTD based code generate broken paragraphs on the body of the document due to < > between <p> items of the body.

Thanks in advance

Paulo 


From p.j.a.cock at googlemail.com  Thu Apr 25 19:05:32 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 25 Apr 2013 20:05:32 +0100
Subject: [Biopython] PubmedCentral XML parsing
In-Reply-To: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>
References: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>
Message-ID: <CAKVJ-_5XiP2jLVB27cFeBULC0OR5xZ=yDM6wRG5+7kt=HWLORw@mail.gmail.com>

On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> Hi
>
> What would be the most direct way of parsing XML files downloaded from
> PubmedCentral ftp using BioPython?  These are files that use the
> archivearticle.dtd and when parsed using non-DTD based code generate broken
> paragraphs on the body of the document due to < > between <p> items of the
> body.
>
> Thanks in advance
>
> Paulo

The Bio.Entrez parser is DTD based, and might suit your needs.

Peter


From nuin at genedrift.org  Thu Apr 25 19:16:49 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 25 Apr 2013 15:16:49 -0400
Subject: [Biopython] PubmedCentral XML parsing
In-Reply-To: <CAKVJ-_5XiP2jLVB27cFeBULC0OR5xZ=yDM6wRG5+7kt=HWLORw@mail.gmail.com>
References: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>
	<CAKVJ-_5XiP2jLVB27cFeBULC0OR5xZ=yDM6wRG5+7kt=HWLORw@mail.gmail.com>
Message-ID: <A5227F2D-594D-4AFC-9110-85348E74CFD5@genedrift.org>

Hi Peter

Thanks a lot. I am getting an error when trying to parse with Entrez.parse. I download the nxml file prior to parsing, using PMC's FTP server in order to avoid their bulk downloading restrictions. Anyway, the code I am using is quite simple (with ipython):

In [1]: from Bio import Entrez

In [2]: handle = open('nihms83342.nxml')

In [3]: records = Entrez.parse(handle)

In [4]: for i in records:
   ...:     print i
   ...:
---------------------------------------------------------------------------
NotXMLError                               Traceback (most recent call last)
<ipython-input-4-82461854c9e7> in <module>()
----> 1 for i in records:
      2     print i
      3

/Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle)
    229                         # We did not see the initial <!xml declaration, so
    230                         # probably the input data is not in XML format.
--> 231                         raise NotXMLError("XML declaration not found")
    232                 self.parser.Parse("", True)
    233                 self.parser = None

NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format.

And the file header is

<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="EN">
	<?properties open_access?>
	<?properties manuscript?>
	<front>
		<journal-meta>

Is there a different way of parsing this file?

Thanks in advance

Paulo


On 2013-04-25, at 3:05 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin <nuin at genedrift.org> wrote:
>> Hi
>> 
>> What would be the most direct way of parsing XML files downloaded from
>> PubmedCentral ftp using BioPython?  These are files that use the
>> archivearticle.dtd and when parsed using non-DTD based code generate broken
>> paragraphs on the body of the document due to < > between <p> items of the
>> body.
>> 
>> Thanks in advance
>> 
>> Paulo
> 
> The Bio.Entrez parser is DTD based, and might suit your needs.
> 
> Peter


From zhigang.wu at email.ucr.edu  Sat Apr 27 00:52:19 2013
From: zhigang.wu at email.ucr.edu (Zhigang Wu)
Date: Fri, 26 Apr 2013 17:52:19 -0700
Subject: [Biopython] [Biopython-dev] Biopython GSoC 2013 applications
	via NESCent
In-Reply-To: <CAKVJ-_5kQvFGWFNcFSDF3VADcCGeL_Wac4BsQttxY79v+XCR4w@mail.gmail.com>
References: <CAKVJ-_5kQvFGWFNcFSDF3VADcCGeL_Wac4BsQttxY79v+XCR4w@mail.gmail.com>
Message-ID: <CADhJE9t9u2QQ5rM4mpD9eHDUuCSbV4PDJmOgpV9SHSix1X6yLA@mail.gmail.com>

Hi Peter,

I am interested in implementing the lazy-loading sequence parsers.
I know time is pretty tight for me to write a proposal on it. But even if
I cannot contribute under the umbrella of GSoC, and assuming nobody else is
implementing it, I am still interested in implementing this (I just want to
have something nice on my CV while contributing to the open source software
community as well). At this moment, I don't have a very clear picture
of how to do it. Can you point me to somewhere I can start to get a
sense of how this can be implemented? As far as I know, samtools (view) may
use similar techniques. Thanks.


Zhigang
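
One common starting point for lazy loading is a byte-offset index: scan the file once, remember where each record starts, and read a record only when it is requested. The sketch below illustrates that idea for FASTA using only the standard library. It is an assumption about the general technique, not the design proposed for the GSoC project, and the names build_offset_index and fetch_sequence are hypothetical.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each FASTA record ID to the byte offset of its '>' header line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The ID is the first word after '>'.
                record_id = line[1:].split()[0].decode("ascii")
                index[record_id] = offset
    return index


def fetch_sequence(path, index, record_id):
    """Seek straight to one record and read only its sequence lines."""
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        handle.readline()  # skip the header line
        chunks = []
        for line in handle:
            if line.startswith(b">"):
                break  # next record starts here
            chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")


# Small demonstration on a temporary two-record file.
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">seq1 first\nACGT\nAC\n>seq2 second\nGGCC\n")
    demo_path = tmp.name

demo_index = build_offset_index(demo_path)
first_seq = fetch_sequence(demo_path, demo_index, "seq1")
second_seq = fetch_sequence(demo_path, demo_index, "seq2")
os.remove(demo_path)
print(demo_index, first_seq, second_seq)
```

Only the index is held in memory; each sequence is read from disk on demand, which is the trade-off a lazy parser is after.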


On Wed, Apr 24, 2013 at 12:19 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> To all the Biopythoneers,
>
> For the last few years Biopython has participated in the
> Google Summer of Code (GSoC) program under the umbrella
> of the Open Bioinformatics Foundation (OBF):
> https://developers.google.com/open-source/soc/
> https://github.com/OBF/GSoC
>
> Unfortunately like quite a few previously accepted organisations,
> this year the OBF was not accepted. Google has kept the total about
> the same year on year, so this is probably simply a slot rotation
> to get some new organisations involved.
>
> The good news (for those not following the Biopython-dev
> mailing list) is we have an alternative option agreed with
> the good people at NESCent, as we did back in 2009:
>
> http://biopython.org/wiki/Google_Summer_of_Code
> http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013
>
> I'd like to thank Eric for co-ordinating this, and encourage
> any interested potential students to sign up to the Biopython
> development list and NESCent's Google+ group as soon as
> possible (if you haven't done so already):
>
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> https://plus.google.com/communities/105828320619238393015
>
> Google are already accepting student applications, and the
> deadline is Friday 3 May.  That doesn't leave very long for
> asking feedback and talking to potential mentors - which
> is essential for a competitive proposal.
>
> Thank you for your interest,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From mictadlo at gmail.com  Mon Apr 29 01:13:49 2013
From: mictadlo at gmail.com (Mic)
Date: Mon, 29 Apr 2013 11:13:49 +1000
Subject: [Biopython] gff installation failed with easy_install
Message-ID: <CAOP6n=gV8fvkjF5vpvCQ0xejbgOQxkodiorz9Rky_t9QrgLSLg@mail.gmail.com>

Hi,
I have tried to install gff with easy_install, but I got the following
error:
$ easy_install --prefix=/home/mic/apps/pymodules -UZ
https://github.com/chapmanb/bcbb/tree/master/gff
Downloading https://github.com/chapmanb/bcbb/tree/master/gff
error: Unexpected HTML page found at
https://github.com/chapmanb/bcbb/tree/master/gff

How is it possible to install gff?

Thank you in advance.

Mic


From chapmanb at 50mail.com  Mon Apr 29 10:34:42 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 29 Apr 2013 06:34:42 -0400
Subject: [Biopython] gff installation failed with easy_install
In-Reply-To: <517DEECF.60705@bx.psu.edu>
References: <CAOP6n=gV8fvkjF5vpvCQ0xejbgOQxkodiorz9Rky_t9QrgLSLg@mail.gmail.com>
	<517DEECF.60705@bx.psu.edu>
Message-ID: <87bo8xhbgd.fsf@fastmail.fm>


Mic;

> I have tried to install gff with easy_install, but I got the following 
> error:
> $ easy_install --prefix=/home/mic/apps/pymodules -UZ 
> https://github.com/chapmanb/bcbb/tree/master/gff
> Downloading https://github.com/chapmanb/bcbb/tree/master/gff
> error: Unexpected HTML page found at 
> https://github.com/chapmanb/bcbb/tree/master/gff
>
> How is it possible to install gff?

I don't know of a way to install directly from git with subdirectories
like that. You'd need to clone, then install with either easy_install or
pip (one of the two install commands below, not both):

$ git clone git://github.com/chapmanb/bcbb.git
$ easy_install bcbb/gff
$ pip install bcbb/gff

Apologies about the convoluted setup. Depending on what you're doing,
you might want to have a look at gffutils:

https://github.com/daler/gffutils

We're working on rolling the functionality from the gff library into
this so there'll be one place to work from for GFF in python.

Hope this helps,
Brad


From p.j.a.cock at googlemail.com  Mon Apr 29 11:23:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 29 Apr 2013 12:23:16 +0100
Subject: [Biopython] PubmedCentral XML parsing
In-Reply-To: <A5227F2D-594D-4AFC-9110-85348E74CFD5@genedrift.org>
References: <B7476E5F-FBC8-4612-B6D9-9CE74E708C75@genedrift.org>
	<CAKVJ-_5XiP2jLVB27cFeBULC0OR5xZ=yDM6wRG5+7kt=HWLORw@mail.gmail.com>
	<A5227F2D-594D-4AFC-9110-85348E74CFD5@genedrift.org>
Message-ID: <CAKVJ-_7_q58ajdUmdvKFunZNtxL=xDGth5wemg+Sk+XAH82AWA@mail.gmail.com>

On Thu, Apr 25, 2013 at 8:16 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> Hi Peter
>
> Thanks a lot. I am getting an error when trying to parse with
> Entrez.parse. I download the nxml file prior to parsing, using PMC's FTP
> server in order to avoid their bulk downloading restrictions. Anyway, the
> code I am using is quite simple (with ipython):
>
> In [1]: from Bio import Entrez
>
> In [2]: handle = open('nihms83342.nxml')
>
> In [3]: records = Entrez.parse(handle)
>
> In [4]: for i in records:
>    ...:     print i
>    ...:
>
> ---------------------------------------------------------------------------
> NotXMLError                               Traceback (most recent call last)
> <ipython-input-4-82461854c9e7> in <module>()
> ----> 1 for i in records:
>       2     print i
>       3
>
> /Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle)
>     229                         # We did not see the initial <!xml declaration, so
>     230                         # probably the input data is not in XML format.
> --> 231                         raise NotXMLError("XML declaration not found")
>     232                 self.parser.Parse("", True)
>     233                 self.parser = None
>
> NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format.
>
> And the file header is
>
> <?xml version="1.0"?>
> <!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
> <article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="EN">
>         <?properties open_access?>
>         <?properties manuscript?>
>         <front>
>                 <journal-meta>
>
> Is there a different way of parsing this file?
>
> Thanks in advance
>
> Paulo

Hi Paulo,

The header you've shown here does not match the file you
attached to the bug report (where the first line is missing
and there seem to be no line breaks either):
https://redmine.open-bio.org/issues/3430

Where exactly did the nihms83342.nxml file come from?
Is there a URL we can download it from to check?

Thanks,

Peter


From mictadlo at gmail.com  Tue Apr 30 03:13:19 2013
From: mictadlo at gmail.com (Mic)
Date: Tue, 30 Apr 2013 13:13:19 +1000
Subject: [Biopython] gff installation failed with easy_install
In-Reply-To: <87bo8xhbgd.fsf@fastmail.fm>
References: <CAOP6n=gV8fvkjF5vpvCQ0xejbgOQxkodiorz9Rky_t9QrgLSLg@mail.gmail.com>
	<517DEECF.60705@bx.psu.edu> <87bo8xhbgd.fsf@fastmail.fm>
Message-ID: <CAOP6n=hmuhyJP-+j5o8-4j4OvmUCuc7JhzTHtZbLzZRqVLzUsg@mail.gmail.com>

Thank you it is working.


On Mon, Apr 29, 2013 at 8:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Mic;
>
> > I have tried to install gff with easy_install, but I got the following
> > error:
> > $ easy_install --prefix=/home/mic/apps/pymodules -UZ
> > https://github.com/chapmanb/bcbb/tree/master/gff
> > Downloading https://github.com/chapmanb/bcbb/tree/master/gff
> > error: Unexpected HTML page found at
> > https://github.com/chapmanb/bcbb/tree/master/gff
> >
> > How is it possible to install gff?
>
> I don't know of a way to install directly from git with subdirectories
> like that. You'd need to clone, then install with easy_install or pip:
>
> $ git clone git://github.com/chapmanb/bcbb.git
> $ easy_install bcbb/gff
> $ pip install bcbb/gff
>
> Apologies about the convoluted setup. Depending on what you're doing,
> you might want to have a look at gffutils:
>
> https://github.com/daler/gffutils
>
> We're working on rolling the functionality from the gff library into
> this so there'll be one place to work from for GFF in python.
>
> Hope this helps,
> Brad
>


From mictadlo at gmail.com  Tue Apr 30 04:12:34 2013
From: mictadlo at gmail.com (Mic)
Date: Tue, 30 Apr 2013 14:12:34 +1000
Subject: [Biopython] GFF parsing with biopython
Message-ID: <CAOP6n=gOKR2EtjOYr-aXvKvMKLXBnoWAanNQLJdi-eNB1=+3qA@mail.gmail.com>

Hi,
I have the following GFF file from SNAP:

X1       SNAP    Einit   2579    2712    -3.221  +       .       X1-snap.1
X1       SNAP    Exon    2813    2945    4.836   +       .       X1-snap.1
X1       SNAP    Eterm   3013    3033    10.467  +       .       X1-snap.1
X1       SNAP    Esngl   3457    3702    -17.856 +       .       X1-snap.2
X1       SNAP    Einit   4901    4974    -4.954  +       .       X1-snap.3
X1       SNAP    Eterm   5021    5150    14.231  +       .       X1-snap.3
X1       SNAP    Einit   6245    7325    -1.525  -       .       X1-snap.4
X1       SNAP    Eterm   5974    6008    5.398   -       .       X1-snap.4


With the code below I have tried to parse the above GFF file

from BCBio import GFF
from pprint import pprint
from BCBio.GFF import GFFExaminer

def retrieve_pred_genes_data():
    with open("test/X1_small.snap.gff") as sf:
        #examiner = GFFExaminer()
        #pprint(examiner.available_limits(sf))

        for rec in GFF.parse(sf):
            pprint(rec.id)
            pprint(rec.description)
            pprint(rec.name)
            pprint(rec.features)
            #pprint(rec.type)              #'SeqRecord' object has no attribute
            #pprint(rec.ref)               #'SeqRecord' object has no attribute
            #pprint(rec.ref_db)            #'SeqRecord' object has no attribute
            #pprint(rec.location)          #'SeqRecord' object has no attribute
            #pprint(rec.location_operator) #'SeqRecord' object has no attribute
            #pprint(rec.strand)            #'SeqRecord' object has no attribute
            #pprint(rec.sub_features)      #'SeqRecord' object has no attribute

retrieve_pred_genes_data()


and got the following output:

'X1'
'<unknown description>'
'<unknown name>'
[SeqFeature(FeatureLocation(ExactPosition(2578), ExactPosition(2712), strand=1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(2812), ExactPosition(2945), strand=1), type='Exon'),
 SeqFeature(FeatureLocation(ExactPosition(3012), ExactPosition(3033), strand=1), type='Eterm'),
 SeqFeature(FeatureLocation(ExactPosition(3456), ExactPosition(3702), strand=1), type='Esngl'),
 SeqFeature(FeatureLocation(ExactPosition(4900), ExactPosition(4974), strand=1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(5020), ExactPosition(5150), strand=1), type='Eterm'),
 SeqFeature(FeatureLocation(ExactPosition(6160), ExactPosition(7325), strand=-1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(5973), ExactPosition(6008), strand=-1), type='Eterm')]

and with GFFExaminer I got these:

{'gff_id': {('X1',): 8},
 'gff_source': {('SNAP',): 8},
 'gff_source_type': {('SNAP', 'Einit'): 3,
                     ('SNAP', 'Esngl'): 1,
                     ('SNAP', 'Eterm'): 3,
                     ('SNAP', 'Exon'): 1},
 'gff_type': {('Einit',): 3, ('Esngl',): 1, ('Eterm',): 3, ('Exon',): 1}}


I found these examples
(https://github.com/patena/jonikaslab-mutant-pools/blob/master/notes_on_GFF_parsing.txt),
but when I tried those attributes on the record I got these kinds of errors:
            #pprint(rec.type)              #'SeqRecord' object has no attribute
            #pprint(rec.ref)               #'SeqRecord' object has no attribute
            #pprint(rec.ref_db)            #'SeqRecord' object has no attribute
            #pprint(rec.location)          #'SeqRecord' object has no attribute
            #pprint(rec.location_operator) #'SeqRecord' object has no attribute
            #pprint(rec.strand)            #'SeqRecord' object has no attribute
            #pprint(rec.sub_features)      #'SeqRecord' object has no attribute

What did I do wrong and how is it possible to access all fields in the
above GFF file?

Thank you in advance.

Mic
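
A likely explanation, judging from the output above, is that GFF.parse yields one SeqRecord per sequence ID (here 'X1'), while the per-line fields live on the SeqFeature objects in rec.features rather than on the record. To look at every column directly, the standard-library sketch below splits each line of the SNAP file into its nine fields. SnapRow and parse_snap_gff are hypothetical names, and the column meanings are assumed from the usual GFF layout.

```python
from collections import namedtuple

# Assumed GFF column order: seqid, source, type, start, end, score,
# strand, phase, group.
SnapRow = namedtuple(
    "SnapRow", "seqid source type start end score strand phase group"
)

# First four lines of the SNAP file shown above.
SAMPLE = """\
X1       SNAP    Einit   2579    2712    -3.221  +       .       X1-snap.1
X1       SNAP    Exon    2813    2945    4.836   +       .       X1-snap.1
X1       SNAP    Eterm   3013    3033    10.467  +       .       X1-snap.1
X1       SNAP    Esngl   3457    3702    -17.856 +       .       X1-snap.2
"""


def parse_snap_gff(text):
    """Split each non-empty, non-comment line into nine typed fields."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 8)
        seqid, source, ftype, start, end, score, strand, phase, group = fields
        rows.append(
            SnapRow(seqid, source, ftype, int(start), int(end),
                    float(score), strand, phase, group.strip())
        )
    return rows


rows = parse_snap_gff(SAMPLE)

# Group exons by the last column (the predicted gene name).
genes = {}
for row in rows:
    genes.setdefault(row.group, []).append((row.type, row.start, row.end))

for name, exons in genes.items():
    print(name, exons)
```

The coordinates here stay 1-based as written in the file, whereas the SeqFeature locations printed above have already been converted to 0-based starts.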