From biopython at maubp.freeserve.co.uk  Wed Sep  1 12:52:51 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 1 Sep 2010 17:52:51 +0100
Subject: [Biopython] Deprecating Bio.GenBank.LocationParser and Bio.Parsers?
Message-ID: <AANLkTi=ZEDV3GZ=9f0SweQen9+dToYfe0xhDvkJ5N1e6@mail.gmail.com>

Hello all,

One of the improvements in Biopython 1.55 was a re-written location
parser for Bio.GenBank (which also covers EMBL parsing). This made
parsing much faster, and also meant Bio.GenBank.LocationParser
and the underlying Bio.Parsers and Bio.Parsers.spark modules were
obsolete. I'd like to mark these as deprecated in the next release:

* Bio.GenBank.LocationParser
* Bio.Parsers (including Bio.Parsers.spark)

Would this cause anyone a problem?

Thanks,

Peter

From j.reid at mail.cryst.bbk.ac.uk  Fri Sep  3 09:11:28 2010
From: j.reid at mail.cryst.bbk.ac.uk (John Reid)
Date: Fri, 03 Sep 2010 14:11:28 +0100
Subject: [Biopython] Wrong instance length bug in MEME parser
Message-ID: <i5qs60$9i9$1@dough.gmane.org>

Hi,

The MEME parser in biopython 1.55 seems to incorrectly set the length of 
the first instance of a motif to 0. Here is an example:

#Sequence, start, length, site
Motif: E-value: 0.000010
      seq_3,   213,     0, AGGTGACAGAG
      seq_1,   146,    11, AGGTGACAGAG
      seq_0,   490,    11, AGGTGACAGAG
      seq_0,    83,    11, AGGTGACAGAG
      seq_0,   388,    11, AGGAAACAGAG
      seq_1,   422,    11, AGGGGACAGAG
      seq_1,    79,    11, TGGAGACAGAG
      seq_0,   281,    11, TGGGGACAGAG
      seq_0,    16,    11, TAGAGACAGAG
      seq_1,   228,    11, TTGTGACAGAG
      seq_4,   156,    11, AGGGGACAGGG
      seq_0,   348,    11, AGGAAAGAGAA
      seq_0,   374,    11, AGGAATGAGAG
      seq_5,    22,    11, GGGAAACTGAG
      seq_3,   486,    11, AAGGGAGTGAG


Here's the code that generated the above:

from Bio.Motif.Parsers.MEME import MEMEParser
import cStringIO

meme_output = cStringIO.StringIO("""
********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.3.0 (Release date: Sat Sep 26 01:51:56 PDT 2009)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= /home/john/Data/Tompa-data-set/Real/hm22r.fasta
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
seq_0                    1.0000    500  seq_1                    1.0000    500
seq_2                    1.0000    500  seq_3                    1.0000    500
seq_4                    1.0000    500  seq_5                    1.0000    500
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme /home/john/Data/Tompa-data-set/Real/hm22r.fasta -maxsize 
1000000 -oc output/run_dataset/Tompa/hm22r/Real -dna -mod anr -revcomp 
-print_starts -maxiter 1000 -minw 8 -maxw 20 -minsites 2 -nmotifs 1

model:  mod=           anr    nmotifs=         1    evt=           inf
object function=  E-value of product of p-values
width:  minw=            8    maxw=           20    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       30    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=      1000
         distance=    1e-05
data:   n=            3000    N=               6
strands: + -
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.195 C 0.305 G 0.305 T 0.195
Background letter frequencies (from dataset with add-one prior applied):
A 0.195 C 0.305 G 0.305 T 0.195
********************************************************************************


********************************************************************************
MOTIF  1    width =   11   sites =  15   llr = 159   E-value = 9.8e-006
********************************************************************************
--------------------------------------------------------------------------------
     Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  71:439:9:91
pos.-specific     C  ::::::8::::
probability       G  18a37:2:a19
matrix            T  31:3:1:1:::

          bits    2.4
                  2.1      *
                  1.9      * * *
                  1.6   *  * ***
Relative         1.4   *  * ****
Entropy          1.2 * *  * ****
(15.3 bits)      0.9 *** *******
                  0.7 ***********
                  0.5 ***********
                  0.2 ***********
                  0.0 -----------

Multilevel           AGGAGACAGAG
consensus            T  TA G
sequence                G

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name            Strand  Start   P-value               Site
-------------            ------  ----- ---------            -----------
seq_3                        -    213  4.54e-07 GGCCTTTGGA AGGTGACAGAG GCGCGGCCAC
seq_1                        -    146  4.54e-07 CCCAACAGGA AGGTGACAGAG GTGGCTCTGG
seq_0                        +    490  4.54e-07 AAAACAGCAG AGGTGACAGAG
seq_0                        -     83  4.54e-07 CCCAGCAGGA AGGTGACAGAG GTGGCTCTGG
seq_0                        +    388  5.99e-07 ATGAGAGGAG AGGAAACAGAG CTTCCTGGAC
seq_1                        +    422  1.10e-06 ATGAGAGGGG AGGGGACAGAG GACACCTGAA
seq_1                        +     79  1.33e-06 TTGGTGGTAC TGGAGACAGAG GGCTGGTCCC
seq_0                        +    281  3.17e-06 CCTCCCCTGA TGGGGACAGAG GTCTCATCAG
seq_0                        +     16  5.72e-06 CTGGTGACAC TAGAGACAGAG GGCTGGTCCC
seq_1                        -    228  1.18e-05 TTATTTTCCT TTGTGACAGAG AAACCCAGCA
seq_4                        +    156  2.07e-05 TCAAGTCCCA AGGGGACAGGG AGCAGAAGGG
seq_0                        +    348  2.47e-05 GTAGACAGAA AGGAAAGAGAA AGTAAGGACA
seq_0                        +    374  3.14e-05 GGACAAAGGT AGGAATGAGAG GAGAGGAAAC
seq_5                        -     22  4.53e-05 CTCTTGTGTA GGGAAACTGAG CACGGGGAAC
seq_3                        +    486  5.02e-05 CGCCAATGGG AAGGGAGTGAG TGCC
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
seq_3                               5e-05  212_[-1]_262_[+1]_4
seq_1                             1.2e-05  78_[+1]_56_[-1]_71_[-1]_183_[+1]_68
seq_0                             3.2e-06  15_[+1]_56_[-1]_187_[+1]_56_[+1]_
                                            15_[+1]_3_[+1]_91_[+1]
seq_4                             2.1e-05  155_[+1]_334
seq_5                             4.5e-05  21_[-1]_468
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=11 seqs=15
seq_3                    (  213) AGGTGACAGAG  1
seq_1                    (  146) AGGTGACAGAG  1
seq_0                    (  490) AGGTGACAGAG  1
seq_0                    (   83) AGGTGACAGAG  1
seq_0                    (  388) AGGAAACAGAG  1
seq_1                    (  422) AGGGGACAGAG  1
seq_1                    (   79) TGGAGACAGAG  1
seq_0                    (  281) TGGGGACAGAG  1
seq_0                    (   16) TAGAGACAGAG  1
seq_1                    (  228) TTGTGACAGAG  1
seq_4                    (  156) AGGGGACAGGG  1
seq_0                    (  348) AGGAAAGAGAA  1
seq_0                    (  374) AGGAATGAGAG  1
seq_5                    (   22) GGGAAACTGAG  1
seq_3                    (  486) AAGGGAGTGAG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 11 n= 2940 bayes= 6.7534 E= 9.8e-006
    177  -1055   -219     45
    -55  -1055    139   -155
  -1055  -1055    171  -1055
    103  -1055    -19     77
     45  -1055    127  -1055
    226  -1055  -1055   -155
  -1055    139    -61  -1055
    215  -1055  -1055    -55
  -1055  -1055    171  -1055
    226  -1055   -219  -1055
   -155  -1055    161  -1055
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 11 nsites= 15 E= 9.8e-006
  0.666667  0.000000  0.066667  0.266667
  0.133333  0.000000  0.800000  0.066667
  0.000000  0.000000  1.000000  0.000000
  0.400000  0.000000  0.266667  0.333333
  0.266667  0.000000  0.733333  0.000000
  0.933333  0.000000  0.000000  0.066667
  0.000000  0.800000  0.200000  0.000000
  0.866667  0.000000  0.000000  0.133333
  0.000000  0.000000  1.000000  0.000000
  0.933333  0.000000  0.066667  0.000000
  0.066667  0.000000  0.933333  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 regular expression
--------------------------------------------------------------------------------
[AT]GG[ATG][GA]A[CG]AGAG
--------------------------------------------------------------------------------


Time  3.78 secs.

********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
     Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
seq_0                            4.45e-04  15_[+1(5.72e-06)]_56_[-1(4.54e-07)]_187_[+1(3.17e-06)]_56_[+1(2.47e-05)]_15_[+1(3.14e-05)]_3_[+1(5.99e-07)]_91_[+1(4.54e-07)]
seq_1                            4.45e-04  78_[+1(1.33e-06)]_56_[-1(4.54e-07)]_71_[-1(1.18e-05)]_183_[+1(1.10e-06)]_68
seq_2                            2.03e-01  500
seq_3                            4.45e-04  212_[-1(4.54e-07)]_262_[+1(5.02e-05)]_4
seq_4                            2.01e-02  155_[+1(2.07e-05)]_334
seq_5                            4.34e-02  21_[-1(4.53e-05)]_468
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 1 reached.
********************************************************************************

CPU: john-dell

********************************************************************************
""")


parser = MEMEParser()
parsed = parser.parse(meme_output)

print '#Sequence, start, length, site'
for motif in parsed.motifs:
     print 'Motif: E-value: %f' % motif.evalue
     for instance in motif.instances:
         print "%10s, %5d, %5d, %s" % (
             instance.sequence_name,
             instance.start,
             instance.length,
             str(instance),
         )
         #assert instance.length == motif.length


From biopython at maubp.freeserve.co.uk  Fri Sep  3 09:44:27 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 3 Sep 2010 14:44:27 +0100
Subject: [Biopython] Wrong instance length bug in MEME parser
In-Reply-To: <i5qs60$9i9$1@dough.gmane.org>
References: <i5qs60$9i9$1@dough.gmane.org>
Message-ID: <AANLkTikT0JSLc6X+qXAGezFnjh+=+YT1kmsLUhRB8N0s@mail.gmail.com>

On Fri, Sep 3, 2010 at 2:11 PM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:
> Hi,
>
> The MEME parser in biopython 1.55 seems to incorrectly set the length of the
> first instance of a motif to 0. Here is an example:
> ...

Could you file a bug with all that useful information?
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

Thanks,

Peter

From bartek at rezolwenta.eu.org  Fri Sep  3 10:52:32 2010
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 3 Sep 2010 16:52:32 +0200
Subject: [Biopython] Wrong instance length bug in MEME parser
In-Reply-To: <i5qs60$9i9$1@dough.gmane.org>
References: <i5qs60$9i9$1@dough.gmane.org>
Message-ID: <AANLkTikDo7dwWM8E8VpcW3a9JZQ+ApFkC4-i8SBSq_KO@mail.gmail.com>

On Fri, Sep 3, 2010 at 3:11 PM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:

> Hi,
>
> The MEME parser in biopython 1.55 seems to incorrectly set the length of
> the first instance of a motif to 0. Here is an example:
>
> /cut/
Hi,

Thanks for reporting the bug. It is now fixed in the main branch (a small
change; you can see the diff here):
http://github.com/biopython/biopython/commit/102ad30a8c5d8bd87847000b33f771b40143e743

I'm closing the bug now; if you find anything else, please let us know.

thanks
Bartek

From mitlox at op.pl  Mon Sep  6 09:24:14 2010
From: mitlox at op.pl (xyz)
Date: Mon, 06 Sep 2010 23:24:14 +1000
Subject: [Biopython] reading two fastq files at the same time
Message-ID: <4C84EB7E.90200@op.pl>

Hi,
How is it possible to read two fastq files at the same time in 
BioPython? I have the following BioRuby example:

require 'bio'

begin
   fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
   fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])

   while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)

     fastq_A1 = entry1.entry_id
     fastq_A2 = entry1.seq

     fastq_B1 = entry2.entry_id
     fastq_B2 = entry2.seq
   end

rescue => err
   raise "Exception: #{err}"
end

Thank you in advance.

From biopython at maubp.freeserve.co.uk  Mon Sep  6 09:51:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Sep 2010 14:51:13 +0100
Subject: [Biopython] reading two fastq files at the same time
In-Reply-To: <4C84EB7E.90200@op.pl>
References: <4C84EB7E.90200@op.pl>
Message-ID: <AANLkTimD8at8N9y0pYhu+SQuFiGZ7Xh-72hj+9xhgnn5@mail.gmail.com>

On Mon, Sep 6, 2010 at 2:24 PM, xyz <mitlox at op.pl> wrote:
> Hi,
> How is it possible to read two fastq files at the same time in BioPython? I
> have the following BioRuby example:
>
> require 'bio'
>
> begin
>   fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
>   fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])
>
>   while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)
>
>     fastq_A1 = entry1.entry_id
>     fastq_A2 = entry1.seq
>
>     fastq_B1 = entry2.entry_id
>     fastq_B2 = entry2.seq
>   end
>
> rescue => err
>   raise "Exception: #{err}"
> end
>
> Thank you in advance.

Hi,

If you are using Python 2.6+ then probably itertools.izip_longest
would do what you want. You could use itertools.izip but this
won't catch the error condition when one file has more records
than the other.
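
As a rough illustration of the izip_longest idea, here is a minimal sketch
written for Python 3, where the function is called itertools.zip_longest.
The helper name paired_records, the sentinel and the file names in the
comment are assumptions made for this example, and the Biopython part has
not been run here.

```python
from itertools import zip_longest  # itertools.izip_longest on Python 2.6+

_MISSING = object()  # sentinel that cannot be confused with a real record


def paired_records(records1, records2):
    """Yield (rec1, rec2) pairs; raise ValueError if one input ends first."""
    for rec1, rec2 in zip_longest(records1, records2, fillvalue=_MISSING):
        if rec1 is _MISSING or rec2 is _MISSING:
            raise ValueError("The two inputs have different record counts")
        yield rec1, rec2


# Hypothetical usage with Biopython (file names are placeholders):
#   from Bio import SeqIO
#   pairs = paired_records(SeqIO.parse("reads_1.fastq", "fastq"),
#                          SeqIO.parse("reads_2.fastq", "fastq"))
#   for rec1, rec2 in pairs:
#       print(rec1.id, rec1.seq, rec2.id, rec2.seq)
```

Using a private sentinel rather than None as the fill value means a record
that happens to be None is never mistaken for the end of a file.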

Alternatively you could use something like this,

from Bio import SeqIO
iter1 = SeqIO.parse(filename1, "fastq")
iter2 = SeqIO.parse(filename2, "fastq")
while True:
    try:
        rec1 = iter1.next()
    except StopIteration:
        rec1 = None
    try:
        rec2 = iter2.next()
    except StopIteration:
        rec2 = None
    if rec1 is None and rec2 is None:
        break #end of both files
    elif rec1 is None or rec2 is None:
        raise ValueError("Diff record count")
    else:
        print rec1.seq, rec1.id
        print rec2.seq, rec2.id

I haven't tested that, but it is based on a similar example in
Bio.SeqIO.QualityIO.PairedFastaQualIterator for a paired
FASTA and QUAL file.

Peter


From biopython at maubp.freeserve.co.uk  Thu Sep  9 13:13:34 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Sep 2010 18:13:34 +0100
Subject: [Biopython] Deprecating Bio.GenBank.LocationParser and
	Bio.Parsers?
In-Reply-To: <AANLkTi=ZEDV3GZ=9f0SweQen9+dToYfe0xhDvkJ5N1e6@mail.gmail.com>
References: <AANLkTi=ZEDV3GZ=9f0SweQen9+dToYfe0xhDvkJ5N1e6@mail.gmail.com>
Message-ID: <AANLkTinnSm_xkbGgcn6G5vhw51cVr8T5hYXc0X-D_hM-@mail.gmail.com>

On Wed, Sep 1, 2010 at 5:52 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hello all,
>
> One of the improvements in Biopython 1.55 was a re-written location
> parser for Bio.GenBank (which also covers EMBL parsing). This made
> parsing much faster, and also meant Bio.GenBank.LocationParser
> and the underlying Bio.Parsers and Bio.Parsers.spark modules were
> obsolete. I'd like to mark these as deprecated in the next release:
>
> * Bio.GenBank.LocationParser
> * Bio.Parsers (including Bio.Parsers.spark)
>
> Would this cause anyone a problem?
>
> Thanks,

I've just added the deprecation warnings to the code, ready for
Biopython 1.56. It is not too late to undo this if anyone is using
this code, but you need to tell us.
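
For anyone wanting to check whether their own scripts hit deprecated code,
the sketch below shows the general mechanism in Python 3 using only the
standard warnings module. It assumes the deprecated modules emit an ordinary
DeprecationWarning, which is not stated above; old_location_parser is a
made-up stand-in, not a real Biopython function.

```python
import warnings


def old_location_parser(text):
    """Hypothetical stand-in for a function in a deprecated module."""
    warnings.warn(
        "This module is deprecated; please use the replacement parser instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return text.strip()


def deprecation_messages(func, *args, **kwargs):
    """Call func and return (result, list of DeprecationWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        result = func(*args, **kwargs)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages
```

Wrapping a call to your own code in deprecation_messages gives a list that
is empty when nothing deprecated was touched.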

Peter

From margeemail at gmail.com  Fri Sep 10 00:10:23 2010
From: margeemail at gmail.com (mailing list)
Date: Fri, 10 Sep 2010 00:10:23 -0400
Subject: [Biopython] Added Biopython to my web tool
Message-ID: <AANLkTi=N9EvkTvtULg7z_GzA1OZfdrkWTbbr=nTQfW0E@mail.gmail.com>

I made a web application (http://utilitymill.com) that lets people make
online utilities with Python.  I thought you guys might appreciate that I
added Biopython as a built-in library users can use in their utilities.

Here's an example of a utility using Biopython:
http://utilitymill.com/utility/RNA_Transcription  (It's very simple, I just
wanted to try it out.)

I'm curious to know if it's useful to you guys.  And I'm also hoping I
installed everything correctly, so let me know if anything doesn't work.

-Greg

From mjldehoon at yahoo.com  Sat Sep 11 01:29:32 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 10 Sep 2010 22:29:32 -0700 (PDT)
Subject: [Biopython] Parsing XML returned by efetch from the Journals
	database
Message-ID: <825067.71139.qm@web62404.mail.re1.yahoo.com>

Dear users,

The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities as long as the corresponding DTD is available (the DTDs are included with each release of Biopython). One corner case is efetch results from the Journals database. Officially, efetch from the Journals database does not generate output in XML format, only plain text or HTML. In practice, however, when XML is requested explicitly, Entrez does return XML-like output. Our parser in Bio.Entrez is able to parse this XML, but doing so requires several hacks in the parser code.

To make the parser more stable for other XML documents, I'd like to remove these hacks. Is anybody currently using Bio.Entrez to parse XML returned by efetch from the Journals database?

--Michiel.


From natassa_g_2000 at yahoo.com  Mon Sep 13 12:22:26 2010
From: natassa_g_2000 at yahoo.com (natassa)
Date: Mon, 13 Sep 2010 09:22:26 -0700 (PDT)
Subject: [Biopython] Codeml parser in Biopython?
Message-ID: <533513.93597.qm@web52005.mail.re2.yahoo.com>

Hello, 
I was wondering if there is a Biopython solution for parsing codeml results from
PAML. The output files are pretty standard, so such a parser should be quite
straightforward to write. I'd volunteer for this, but thought I might check
first whether somebody else has done it. Actually, I found a read-only pypaml
interface on Google Code, tried it out, and realized I had to edit several
things to even import it (in Python 2.5), which is quite strange: it was mainly
built-in methods that threw errors. Anyway, I 'corrected' this and then
realized that the output files assumed by this code may not be the same as mine,
although again, the outputs of codeml are pretty standard. I am not sure how
much this code is used, and was not sure what the developer's email is to ask
him some questions.

I am interested in parsing outputs from Branch, Site and BranchSite models, so
everything that codeml can do. Any information from experienced users is welcome!
Thanks, 
Anastasia Gioti


From biopython at maubp.freeserve.co.uk  Mon Sep 13 12:45:28 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 13 Sep 2010 17:45:28 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <533513.93597.qm@web52005.mail.re2.yahoo.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
Message-ID: <AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>

On Mon, Sep 13, 2010 at 5:22 PM, natassa <natassa_g_2000 at yahoo.com> wrote:
> Hello,
> I was wondering if there is a Biopython solution for parsing codeml results from
> PAML. The output files are pretty standard, so such a parser should be quite
> straightforward to write. I'd volunteer for this, but thought I might check
> first whether somebody else has done it. Actually, I found a read-only pypaml
> interface on Google Code, tried it out, and realized I had to edit several
> things to even import it (in Python 2.5), which is quite strange: it was mainly
> built-in methods that threw errors. Anyway, I 'corrected' this and then
> realized that the output files assumed by this code may not be the same as mine,
> although again, the outputs of codeml are pretty standard. I am not sure how
> much this code is used, and was not sure what the developer's email is to ask
> him some questions.
>
> I am interested in parsing outputs from Branch, Site and BranchSite models, so
> everything that codeml can do. Any information from experienced users is welcome!
> Thanks,
> Anastasia Gioti

Hi Anastasia,

Could you post a short example of the kind of output you are looking at?

Can you get codeml to output what you need in another format, such as NEXUS?

Peter


From p.j.a.cock at googlemail.com  Mon Sep 13 16:40:30 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 13 Sep 2010 21:40:30 +0100
Subject: [Biopython] Fwd: problems searching swiss prot
In-Reply-To: <a06240806c8b41d461cfc@131.229.113.228>
References: <mailman.4871.1284059429.3031.biopython@lists.open-bio.org>
	<a06240806c8b41d461cfc@131.229.113.228>
Message-ID: <AANLkTikT9rjLwtbfEJTOUVpLNbQmcXSFf5R6cayi+Gkf@mail.gmail.com>

Forwarding a query from Jessica Grant since she appears
to have had trouble posting to the mailing list.

Jessica wrote:

> Hello,
>
> I am running a few scripts to try to extract sequence information
> out of uniprot. One program called AutoFACT gives me ID numbers
> associated with that database. Most of these look like this:
>
> D2V5S4_NAEGR
> Q48KU2_PSE14
> Q22B72_TETTH
>
>
> and my downstream scripts, which are written in biopython, are
> fine with this. Then, every once in a while, a sequence will come
> back with a name that looks like this:
>
> UPI00006CC162
>
> and everything goes bad. My script can't handle these names,
> apparently, although if I go to uniprot.org and search for it, the
> sequence comes up.
>
> My script uses the following, where RepID is the number
> extracted from AutoFACT:
>
>        handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>        seq_record = SeqIO.read(handle, "swiss")
>
> Any thoughts?
>
> Thank you,
>
> Jessica

Hi Jessica,

I think the problem is that these unusual identifiers are
not UniProt/SwissProt accession identifiers. The URL
this Biopython function uses was originally from
www.expasy.ch but is now on www.uniprot.org as
described here:

http://www.expasy.ch/expasy_urls.html

I think the ID UPI00006CC162 is a UniProt ID of some
kind, so it may be possible to access the information
you want somehow. See for example:

http://www.uniprot.org/uniparc/UPI00006CC162

However, it is not clear to me right away if you can get
this record back as a plain text "swiss" format entry...
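
In the meantime, a script could at least tell the two kinds of identifier
apart before calling ExPASy.get_sprot_raw. This is a minimal Python 3 sketch:
UniParc identifiers are "UPI" followed by ten hexadecimal digits, while the
entry-name pattern below is only a rough assumption based on the examples
above, and classify_identifier is a made-up helper name.

```python
import re

# UniParc identifiers: "UPI" followed by ten hexadecimal digits.
UNIPARC_ID = re.compile(r"^UPI[0-9A-F]{10}$")
# Rough pattern for entry names such as D2V5S4_NAEGR. This is an assumption
# for illustration, not the full official naming specification.
ENTRY_NAME = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify_identifier(identifier):
    """Return 'uniparc', 'uniprotkb' or 'unknown' for an identifier string."""
    identifier = identifier.strip()
    if UNIPARC_ID.match(identifier):
        return "uniparc"
    if ENTRY_NAME.match(identifier):
        return "uniprotkb"
    return "unknown"
```

Identifiers classified as "uniparc" could then be skipped or logged rather
than passed to the "swiss" parser.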

Peter


From natassa_g_2000 at yahoo.com  Tue Sep 14 04:02:18 2010
From: natassa_g_2000 at yahoo.com (natassa)
Date: Tue, 14 Sep 2010 01:02:18 -0700 (PDT)
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
Message-ID: <937371.74873.qm@web52006.mail.re2.yahoo.com>

Hi Peter, 


> Could you post a short example of the kind of output you are looking at?

Here is an example output, but this can differ depending on the model used
(there are several models for Branch, Site and BranchSite, but all are pretty
standard).


-------------------------------------------------------------------------------OUTPUT-------------------------


seed used = 808671289
CODONML (in paml version 4.4, January 2010)  align.phy
Model: One dN/dS ratio for branches
Codon frequency model: F3x4
Site-class models:  NearlyNeutral
ns =   7  ls = 861

Codon usage in sequences
--------------------------------------------------------------------------------------------------------------

Phe TTT 12 14 15 14 14 12 | Ser TCT  6 11 12  8 10  6 | Tyr TAT  5  5  4  7  9  5 | Cys TGT 11  8 10  9 11  8
    TTC 23 18 18 20 20 20 |     TCC 16 13 16 19 16 18 |     TAC 11 12 13 17 11 13 |     TGC  6  2  6  6  4  6
Leu TTA  8  5  6  5  4  2 |     TCA 17 16 18 20 21 15 | *** TAA  0  0  0  0  0  0 | *** TGA  0  0  0  0  0  0
    TTG 13 11 11 15 15 17 |     TCG 17 14 14 17 17 18 |     TAG  0  0  0  0  0  0 | Trp TGG  9  8  8 11  8  7
--------------------------------------------------------------------------------------------------------------

Leu CTT 13 15 16 11 12 16 | Pro CCT  7  7 10  6 10  8 | His CAT  8  7  8  4  6  5 | Arg CGT  6  4  5  4  5  5
    CTC 14 14 13 19 14 15 |     CCC 20 13 16 24 19 20 |     CAC 23 18 22 20 24 17 |     CGC 14 13 15 14 14 15
    CTA  6  4  8  7  6  9 |     CCA 19 18 19 11 17 16 | Gln CAA 20 16 20 21 18 13 |     CGA  8  4  6  5  6  6
    CTG 17 17 14 14 16 10 |     CCG  7  8  8  9  6  8 |     CAG 18 14 15 14 14 13 |     CGG  7  7  8  9  9  8
--------------------------------------------------------------------------------------------------------------

Ile ATT  6  7  9  5  7  6 | Thr ACT  5  7  7  7  5  4 | Asn AAT  3  3  4  2  5  2 | Ser AGT  7  7  9  8  7  7
    ATC 16 13 15 23 14 16 |     ACC 21 14 17 20 20 16 |     AAC 12 14 14 21 14 11 |     AGC 14 13 14 15 11 10
    ATA 13  9 10 11 11 10 |     ACA 19 17 22 22 28 18 | Lys AAA 17  8 13  9 13 12 | Arg AGA 11  5  8  4  6  5
Met ATG 23 21 23 22 23 20 |     ACG 11 12 12 12 14 13 |     AAG 18 15 19 19 18 18 |     AGG  9 10 13 14 12 13
--------------------------------------------------------------------------------------------------------------

Val GTT  8 13 10 10 10  6 | Ala GCT 13 10 12 12 14 13 | Asp GAT 18 18 17 15 15 17 | Gly GGT 13  7 12 10 11 10
    GTC 18 13 18 20 19 21 |     GCC 28 26 28 28 28 23 |     GAC 29 21 26 33 29 30 |     GGC  9  9  8  7 12  8
    GTA  8  8  9  7  6  7 |     GCA 22 22 24 17 23 19 | Glu GAA 27 24 24 27 21 22 |     GGA  7  7 10  9  7  9
    GTG 13 11 14 13 13  9 |     GCG 11 10 10 10  7  7 |     GAG 14 14 17 13 19 17 |     GGG  7  6  9  8  7  9
--------------------------------------------------------------------------------------------------------------


--------------------------------------------------
Phe TTT 12 | Ser TCT  8 | Tyr TAT  6 | Cys TGT  8
    TTC 22 |     TCC 18 |     TAC 15 |     TGC  6
Leu TTA  5 |     TCA 22 | *** TAA  0 | *** TGA  0
    TTG 17 |     TCG 17 |     TAG  0 | Trp TGG  9
--------------------------------------------------
Leu CTT 14 | Pro CCT 12 | His CAT  5 | Arg CGT  6
    CTC 19 |     CCC 20 |     CAC 20 |     CGC 13
    CTA 10 |     CCA 16 | Gln CAA 17 |     CGA  5
    CTG  8 |     CCG 11 |     CAG 15 |     CGG  8
--------------------------------------------------
Ile ATT  5 | Thr ACT  4 | Asn AAT  4 | Ser AGT  7
    ATC 20 |     ACC 21 |     AAC 14 |     AGC 12
    ATA 11 |     ACA 29 | Lys AAA 11 | Arg AGA  4
Met ATG 25 |     ACG 15 |     AAG 23 |     AGG 13
--------------------------------------------------
Val GTT 10 | Ala GCT 13 | Asp GAT 16 | Gly GGT  7
    GTC 18 |     GCC 26 |     GAC 33 |     GGC 11
    GTA  7 |     GCA 24 | Glu GAA 23 |     GGA 11
    GTG 10 |     GCG  8 |     GAG 15 |     GGG 11
--------------------------------------------------

Codon position x base (3x4) table for each sequence.

#1: species1       
position  1:    T:0.18989    C:0.25524    A:0.25277    G:0.30210
position  2:    T:0.26017    C:0.29470    A:0.27497    G:0.17016
position  3:    T:0.17386    C:0.33785    A:0.24908    G:0.23921
Average         T:0.20797    C:0.29593    A:0.25894    G:0.23716

#2: species2         
position  1:    T:0.19296    C:0.25211    A:0.24648    G:0.30845
position  2:    T:0.27183    C:0.30704    A:0.26620    G:0.15493
position  3:    T:0.20141    C:0.31831    A:0.22958    G:0.25070
Average         T:0.22207    C:0.29249    A:0.24742    G:0.23803

#3: species3    
position  1:    T:0.18619    C:0.25031    A:0.25771    G:0.30580
position  2:    T:0.25771    C:0.30210    A:0.26634    G:0.17386
position  3:    T:0.19729    C:0.31936    A:0.24291    G:0.24044
Average         T:0.21373    C:0.29059    A:0.25565    G:0.24003

#4: species4   
position  1:    T:0.20664    C:0.23616    A:0.26322    G:0.29397
position  2:    T:0.26568    C:0.29766    A:0.27306    G:0.16359
position  3:    T:0.16236    C:0.37638    A:0.21525    G:0.24600
Average         T:0.21156    C:0.30340    A:0.25051    G:0.23452

#5: species5       
position  1:    T:0.19876    C:0.24348    A:0.25839    G:0.29938
position  2:    T:0.25342    C:0.31677    A:0.26832    G:0.16149
position  3:    T:0.18758    C:0.33416    A:0.23230    G:0.24596
Average         T:0.21325    C:0.29814    A:0.25300    G:0.23561

#6: species6      
position  1:    T:0.19892    C:0.24899    A:0.24493    G:0.30717
position  2:    T:0.26522    C:0.30041    A:0.26387    G:0.17050
position  3:    T:0.17591    C:0.35047    A:0.22057    G:0.25304
Average         T:0.21335    C:0.29995    A:0.24312    G:0.24357

#7: species7      
position  1:    T:0.20000    C:0.24121    A:0.26424    G:0.29455
position  2:    T:0.25818    C:0.32000    A:0.26303    G:0.15879
position  3:    T:0.16606    C:0.34909    A:0.23636    G:0.24848
Average         T:0.20808    C:0.30343    A:0.25455    G:0.23394

Sums of codon usage counts
------------------------------------------------------------------------------
Phe F TTT      93 | Ser S TCT      61 | Tyr Y TAT      41 | Cys C TGT      65
      TTC     141 |       TCC     116 |       TAC      92 |       TGC      36
Leu L TTA      35 |       TCA     129 | *** * TAA       0 | *** * TGA       0
      TTG      99 |       TCG     114 |       TAG       0 | Trp W TGG      60
------------------------------------------------------------------------------
Leu L CTT      97 | Pro P CCT      60 | His H CAT      43 | Arg R CGT      35
      CTC     108 |       CCC     132 |       CAC     144 |       CGC      98
      CTA      50 |       CCA     116 | Gln Q CAA     125 |       CGA      40
      CTG      96 |       CCG      57 |       CAG     103 |       CGG      56
------------------------------------------------------------------------------
Ile I ATT      45 | Thr T ACT      39 | Asn N AAT      23 | Ser S AGT      52
      ATC     117 |       ACC     129 |       AAC     100 |       AGC      89
      ATA      75 |       ACA     155 | Lys K AAA      83 | Arg R AGA      43
Met M ATG     157 |       ACG      89 |       AAG     130 |       AGG      84
------------------------------------------------------------------------------
Val V GTT      67 | Ala A GCT      87 | Asp D GAT     116 | Gly G GGT      70
      GTC     127 |       GCC     187 |       GAC     201 |       GGC      64
      GTA      52 |       GCA     151 | Glu E GAA     168 |       GGA      60
      GTG      83 |       GCG      63 |       GAG     109 |       GGG      57
------------------------------------------------------------------------------

(Ambiguity data are not used in the counts.)


Codon position x base (3x4) table, overall

position  1:    T:0.19623    C:0.24664    A:0.25571    G:0.30141
position  2:    T:0.26152    C:0.30559    A:0.26804    G:0.16485
position  3:    T:0.18027    C:0.34113    A:0.23250    G:0.24610
Average         T:0.21267    C:0.29779    A:0.25209    G:0.23746


Nei & Gojobori 1986. dN/dS (dN, dS)
(Pairwise deletion)
(Note: This matrix is not used in later ML. analysis.
Use runmode = -2 for ML pairwise comparison.)

species1            
species2               0.2598 (0.0599 0.2306)
species3          0.2532 (0.0528 0.2085) 0.2778 (0.0189 0.0680)
species4         0.2815 (0.1116 0.3966) 0.1905 (0.0738 0.3873) 0.2555 (0.0981 0.3838)
species5             0.2780 (0.0654 0.2351) 0.2611 (0.0631 0.2419) 0.2487 (0.0552 0.2221) 0.2993 (0.0908 0.3034)
species6            0.2041 (0.0693 0.3396) 0.1785 (0.0613 0.3437) 0.2147 (0.0644 0.2997) 0.2510 (0.0598 0.2384) 0.2261 (0.0511 0.2260)
species7            0.2374 (0.0890 0.3748) 0.2080 (0.0819 0.3935) 0.2272 (0.0787 0.3465) 0.2415 (0.0676 0.2797) 0.2646 (0.0731 0.2764) 0.1821 (0.0176 0.0967)


TREE #  1:  (((1, (2, 3)), 5), (6, 4), 7);   MP score: -1
lnL(ntime: 11  np: 14):  -7469.732728      +0.000000
   8..9     9..10   10..1    10..11   11..2    11..3     9..5     8..12    12..6    12..4     8..7  

 0.179837 0.082919 0.172587 0.087525 0.067422 0.032013 0.124010 0.001030 0.062291 0.297695 0.117429 2.800021 0.731929 0.083728

Note: Branch length is defined as number of nucleotide substitutions per codon 
(not per neucleotide site).

tree length =   1.22476

(((1: 0.172587, (2: 0.067422, 3: 0.032013): 0.087525): 0.082919, 5: 0.124010): 0.179837, (6: 0.062291, 4: 0.297695): 0.001030, 7: 0.117429);

(((species1: 0.172587, (species2: 0.067422, species3: 0.032013): 0.087525): 0.082919, species5: 0.124010): 0.179837, (species6: 0.062291, species4: 0.297695): 0.001030, species7: 0.117429);

Detailed output identifying parameters

kappa (ts/tv) =  2.80002


dN/dS (w) for site classes (K=2)

p:   0.73193  0.26807
w:   0.08373  1.00000

dN & dS for each branch

 branch          t       N       S   dN/dS      dN      dS  N*dN  S*dS

   8..9       0.180   1857.3    725.7   0.3294   0.0381   0.1158   70.8   84.0
   9..10      0.083   1857.3    725.7   0.3294   0.0176   0.0534   32.7   38.7
  10..1       0.173   1857.3    725.7   0.3294   0.0366   0.1111   68.0   80.6
  10..11      0.088   1857.3    725.7   0.3294   0.0186   0.0563   34.5   40.9
  11..2       0.067   1857.3    725.7   0.3294   0.0143   0.0434   26.6   31.5
  11..3       0.032   1857.3    725.7   0.3294   0.0068   0.0206   12.6   15.0
   9..5       0.124   1857.3    725.7   0.3294   0.0263   0.0798   48.8   57.9
   8..12      0.001   1857.3    725.7   0.3294   0.0002   0.0007    0.4    0.5
  12..6       0.062   1857.3    725.7   0.3294   0.0132   0.0401   24.5   29.1
  12..4       0.298   1857.3    725.7   0.3294   0.0631   0.1917  117.2  139.1
   8..7       0.117   1857.3    725.7   0.3294   0.0249   0.0756   46.2   54.9


Time used:  0:10

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Can you get codeml to output what you need in another format, such as NEXUS?

Haven't tried that, but as you can see, this is a very verbose output and NEXUS 
does not seem an option. 

Ultimately, I want to parse this to get all the information I need into a
tabulated file. I am still working out exactly what I need (there are standard
values to extract, such as lnL, branch lengths and dN/dS, but it also depends on
the type of downstream analysis). I will now work on the pypaml class and modify
the original code to make it more generic (it seems that it only works for Site
Models).

Will let you know; I was just wondering if there was already a solution. There
is one in BioPerl, but I have heard it is very slow and in any case I don't
understand much Perl...
Thanks, 
Anastasia


From biopython at maubp.freeserve.co.uk  Tue Sep 14 05:04:56 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Sep 2010 10:04:56 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <937371.74873.qm@web52006.mail.re2.yahoo.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
	<937371.74873.qm@web52006.mail.re2.yahoo.com>
Message-ID: <AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>

Hi Anastasia,

On Tue, Sep 14, 2010 at 9:02 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
> Hi Peter,
>
>>
>> Could you post a short example of the kind of output you are looking at?
>>
>
> Here is an example output, but this can differ depending on the model used
> (there are several models for Branch, Site, BranchSite, but all are pretty
> standard)
>

Thanks - that looks possible to parse, but not very easy (especially if the
codeml output changes slightly between versions).
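
As a rough illustration of why this is possible but fragile, here is a
minimal sketch (not part of Biopython or pypaml) that pulls lnL, kappa and
the per-branch table out of output shaped like the sample above. The names
parse_codeml_text and branch_rows are hypothetical, and the regular
expressions are assumptions based only on this one sample, so other models
or codeml versions may well need different patterns.

```python
import re

# Patterns guessed from the single sample output quoted in this thread.
LNL_RE = re.compile(r"^lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)")
KAPPA_RE = re.compile(r"^kappa \(ts/tv\) =\s*(\d+\.\d+)")
# A branch label such as "8..9" followed by exactly eight decimal numbers.
BRANCH_RE = re.compile(r"^\s*(\d+\.\.\d+)\s+((?:-?\d+\.\d+\s*){8})$")

# Column names copied from the "dN & dS for each branch" header line.
COLUMNS = ("t", "N", "S", "dN/dS", "dN", "dS", "N*dN", "S*dS")

# A few lines copied from the example output, used as a small test input.
SAMPLE = """\
lnL(ntime: 11  np: 14):  -7469.732728      +0.000000
kappa (ts/tv) =  2.80002
   8..9       0.180   1857.3    725.7   0.3294   0.0381   0.1158   70.8   84.0
   9..10      0.083   1857.3    725.7   0.3294   0.0176   0.0534   32.7   38.7
"""


def parse_codeml_text(text):
    """Collect lnL, kappa and per-branch values from codeml-like text."""
    result = {"lnL": None, "kappa": None, "branches": {}}
    for line in text.splitlines():
        match = LNL_RE.match(line)
        if match:
            result["lnL"] = float(match.group(3))
            continue
        match = KAPPA_RE.match(line)
        if match:
            result["kappa"] = float(match.group(1))
            continue
        match = BRANCH_RE.match(line)
        if match:
            values = [float(v) for v in match.group(2).split()]
            result["branches"][match.group(1)] = dict(zip(COLUMNS, values))
    return result


def branch_rows(result):
    """Yield tab-separated lines: a header, then one line per branch."""
    yield "\t".join(("branch",) + COLUMNS)
    for name, values in result["branches"].items():
        yield "\t".join([name] + [str(values[c]) for c in COLUMNS])


parsed = parse_codeml_text(SAMPLE)
for row in branch_rows(parsed):
    print(row)
```

Matching line by line keeps the sketch short, but it ignores which section
of the report a line came from, which is exactly where version differences
would bite.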

>>
>> Can you get codeml to output what you need in another format, such as NEXUS?
>>
>
> Haven't tried that, but as you can see, this is a very verbose output and
> NEXUS does not seem an option.

At first glance, the NEXUS format could hold a lot of that information.
Another possibility might be phyloXML. However, you are at the mercy
of the codeml tool and what it supports. It might be worth politely asking
the author(s) about supporting one of these more standard formats as
an optional output.

> Ultimately, I want to parse this to get all the information I need in a
> tabulated file. I am still working out what exactly I need (there are standard
> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type
> of downstream analysis). I will now work on the pypaml class and modify the
> original code to make it more generic (it seems that it only works for Site
> Models).

Note that the pypaml code is licensed under the GPL v3, so unless its
author agrees to re-license it we cannot include it in Biopython.

> Will let you know, was just wondering if there was already a solution. There is
> one in Bioperl, but heard it is very slow and in any case, I don't understand
> much of perl....

I don't know much Perl either ;)

Peter

From p.j.a.cock at googlemail.com  Tue Sep 14 05:13:04 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 10:13:04 +0100
Subject: [Biopython] problems searching swiss prot
In-Reply-To: <AANLkTikT9rjLwtbfEJTOUVpLNbQmcXSFf5R6cayi+Gkf@mail.gmail.com>
References: <mailman.4871.1284059429.3031.biopython@lists.open-bio.org>
	<a06240806c8b41d461cfc@131.229.113.228>
	<AANLkTikT9rjLwtbfEJTOUVpLNbQmcXSFf5R6cayi+Gkf@mail.gmail.com>
Message-ID: <AANLkTim9Cs96dbS_1cD7pfNt3g77koUkHm5Si-4gZANs@mail.gmail.com>

On Mon, Sep 13, 2010 at 9:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Forwarding a query from Jessica Grant since she appears
> to have had trouble posting to the mailing list.
>
> Jessica wrote:
>
>> Hello,
>>
>> I am running a few scripts to try to extract sequence information
>> out of uniprot. One program called AutoFACT gives me ID numbers
>> associated with that database. Most of these look like this:
>>
>> D2V5S4_NAEGR
>> Q48KU2_PSE14
>> Q22B72_TETTH
>>
>>
>> and my downstream scripts, which are written in biopython, are
>> fine with this. Then, every once in a while, a sequence will come
>> back with a name that looks like this:
>>
>> UPI00006CC162
>>
>> and everything goes bad. My script can't handle these names,
>> apparently, although if I go to uniprot.org and search for it, the
>> sequence comes up.
>>
>> My script uses the following, where RepID is the number
>> extracted from AutoFACT:
>>
>>        handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>>        seq_record = SeqIO.read(handle, "swiss")
>>
>> Any thoughts?
>>
>> Thank you,
>>
>> Jessica
>
> Hi Jessica,
>
> I think the problem is that these unusual identifiers are
> not UniProt/SwissProt accession identifiers. The URL
> this Biopython function uses was originally from
> www.expasy.ch but is now on www.uniprot.org as
> described here:
>
> http://www.expasy.ch/expasy_urls.html
>
> I think the ID UPI00006CC162 is a UniProt ID of some
> kind, so it may be possible to access the information
> you want somehow. See for example:
>
> http://www.uniprot.org/uniparc/UPI00006CC162
>
> However, it is not clear to me right away if you can get
> this record back as a plain text "swiss" format entry...
>
> Peter

Jessica replied (off list), to say:

>> Oh, and I got a great help from someone at Uniprot for my
>> previous question...turns out you can get the sequences
>> downloaded as fasta files:
>>
>> http://www.uniprot.org/uniparc/UPI00006CC162.fasta
>>
>> and I could then read them into SeqIO as a fasta and
>> manipulate them that way.

I guess the UPI at the start stands for UniParc Identifier.

Note that the page I linked to earlier has links to several
file formats including FASTA, but not plain text "SwissProt"
format: http://www.uniprot.org/uniparc/UPI00006CC162
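
For anyone hitting the same mix of identifiers, a small sketch of how a
script might tell the two kinds apart before choosing where to fetch from.
The helper names are hypothetical, the pattern assumes UniParc identifiers
are "UPI" followed by ten hexadecimal digits (as in the example above), and
the URL layout is simply the one Jessica reported, which may change.

```python
import re

# Assumption: "UPI" plus ten hexadecimal digits, e.g. UPI00006CC162.
UNIPARC_RE = re.compile(r"^UPI[0-9A-F]{10}$")


def is_uniparc_id(identifier):
    """Return True if the identifier looks like a UniParc ID."""
    return UNIPARC_RE.match(identifier) is not None


def uniparc_fasta_url(identifier):
    """Build the FASTA download URL reported in this thread."""
    if not is_uniparc_id(identifier):
        raise ValueError("not a UniParc identifier: %r" % identifier)
    return "http://www.uniprot.org/uniparc/%s.fasta" % identifier


for rep_id in ["D2V5S4_NAEGR", "UPI00006CC162"]:
    if is_uniparc_id(rep_id):
        # The text behind this URL could then be read with
        # SeqIO.read(handle, "fasta") instead of the "swiss" format.
        print(uniparc_fasta_url(rep_id))
    else:
        print("%s: use the existing SwissProt lookup" % rep_id)
```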

Peter


From p.j.a.cock at googlemail.com  Tue Sep 14 05:49:56 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 10:49:56 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To: <a06240801c8b490b75d3f@10.0.1.4>
References: <a06240808c8b421190255@131.229.113.228>
	<AANLkTi=BGd7t+SXyDVu749VU5ywWrFZ+eRfJXQ0-QkyA@mail.gmail.com>
	<a06240801c8b490b75d3f@10.0.1.4>
Message-ID: <AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>

On Tue, Sep 14, 2010 at 3:47 AM, Jessica Grant wrote:
>On Mon, Sep 13, 2010 at 9:49 PM, Peter Cock wrote:
>> On Mon, Sep 13, 2010 at 7:43 PM, Jessica Grant wrote:
>>>
>>> Hello,
>>>
>>> I am working with an organism that has an unusual genetic code.  Is there
>>> a way I can use biopython's translate() but import my own codon table
>>> instead of using the standard ncbi tables that are coded in the CodonTable
>>> module?
>>>
>>> Thanks!
>>>
>>> Jessica
>>
>> Hi Jessica,
>>
>> Good question - this is something I had thought about but not done anything
>> about, since no one had ever asked about using a non-standard table. After all,
>> the NCBI do have a pretty comprehensive list. I'm curious which organism(s) you
>> are using.
>>
>> In answer to your query, right now, not easily. However, it would be simple to
>> tweak the Bio.Seq module to allow the table argument to be a string or integer
>> as now (for referring to a built in NCBI table) or a CodonTable object which you
>> would have to supply. These are defined in the Bio.Data.CodonTable module.
>> If this sounds useful and you could help with testing, it could be done ready
>> for the next release of Biopython.
>>
>> Peter
>
> Thanks Peter,
>
> We are doing some work on a ciliate called Chilodonella uncinata. It
> apparently has only one stop codon and the others are recoded in an
> unusual way so it doesn't quite fit any of the ncbi tables.
>
> I did try to play around with the CodonTable module, but couldn't quite
> figure out how to do it. Just making a new table similar to the tables that
> are in the module didn't do it, and I didn't feel comfortable messing around
> in the depths of biopython. :)
>
> I would be happy to help with testing and I guess in the meantime I will be
> putting lots of if statements in my script.
>
> Jessica

Hi Jessica,

Do you have the information for the CodonTable handy? e.g. a list of the
start codons, and how to translate the 64 codons (including stop codons).
Given that I could show you how to make the CodonTable object.

Peter


From p.j.a.cock at googlemail.com  Tue Sep 14 06:39:34 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 11:39:34 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To: <AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>
References: <a06240808c8b421190255@131.229.113.228>
	<AANLkTi=BGd7t+SXyDVu749VU5ywWrFZ+eRfJXQ0-QkyA@mail.gmail.com>
	<a06240801c8b490b75d3f@10.0.1.4>
	<AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>
Message-ID: <AANLkTi=vMZt-+ed+2t1LFz1GhDYi76L1BLgTuVKWfpBx@mail.gmail.com>

On Tue, Sep 14, 2010 at 10:49 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Hi Jessica,
>
> Do you have the information for the CodonTable handy? e.g. a list of the
> start codons, and how to translate the 64 codons (including stop codons).
> Given that I could show you how to make the CodonTable object.
>
> Peter
>

I've done a proof of principle change to Bio.Seq on this branch:
http://github.com/peterjc/biopython/tree/trans-table
specifically this commit:
http://github.com/peterjc/biopython/commit/56a2fd5f92098e9be892eb51f27b08aaa46a19a6

I'm not expecting you to try this code out yet (unless you happen to
know your way round git already). The basic idea is that the Bio.Seq
translate function and the Seq object translate method are extended
so that the table argument can now also be a CodonTable object.

Once we know what your table should look like, I can write a complete
example. Probably Bio.Data.CodonTable will need some more
documentation added...

Peter

From zaricdragoslav at gmail.com  Tue Sep 14 07:08:55 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 15:08:55 +0400
Subject: [Biopython] Intro
Message-ID: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>

Dear All,

I am a new member and I would like to send greetings to everyone.

I have a few newbie questions, so please bear with me :)

1. How can I see the biopython version, and is there a connection between
    the python version and the biopython version?

2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04 (at least
    I think I did :)
    I have the same installation on a windows machine and everything works fine.
    But for example when I want to use something like this:

    from Bio import SeqIO
    orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")

    two problems happen in the ubuntu environment:
    first, SeqIO complains that there is no index method;
    second, everywhere I would put the string location of a file,
    biopython wants a handle to the file.
    The first thing I can think of is that maybe I am using an old version of
    biopython, which points to question 1.

3. Does somebody have experience with using biopython in a django web site?
    Do I install biopython on the web server, or can I keep the libraries in
    some folder and load them dynamically in code?

Kind regards,

Dragoslav Zaric
Programmer
Msc. in Astrophysics

From biopython at maubp.freeserve.co.uk  Tue Sep 14 07:44:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Sep 2010 12:44:47 +0100
Subject: [Biopython] Intro
In-Reply-To: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
Message-ID: <AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>

On Tue, Sep 14, 2010 at 12:08 PM, Dragoslav Zaric
<zaricdragoslav at gmail.com> wrote:
> Dear All,
>
> I am new member and I like to send welcome greet to everyone.
>
> I have few newbie questions so please be cooperative :)
>

Hello and welcome :)

> 1. How can I see biopython version and is there connection between python
> version and biopython version ?

Biopython currently supports Python 2.4, 2.5, 2.6 and 2.7. Older versions
of Biopython may not have worked 100% on Python 2.7, but we did
previously support Python 2.3 and even older versions of Python.

There is a FAQ (frequently asked questions) section in the Tutorial for
how to determine the version of Biopython installed. Try:

import Bio
print Bio.__version__

The Tutorial for the latest release is online as PDF or HTML,
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/tutorial/Tutorial.html

> 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04
> (at least I think I did :)

Based on the problems below, I don't think it worked.

>    I have same installation on windows machine and everything works fine.
>    But for example when I want to use something like this:
>
>    from Bio import SeqIO
>    orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")
>
>    Two problems happens in ubuntu environment:
>    first is that SeqIO complains that there is no index method

That does suggest you have an old version of Biopython. The index
function was added in Biopython 1.52, see:

http://biopython.open-bio.org/SRC/biopython/NEWS
http://news.open-bio.org/news/2009/09/biopython-release-152/

>    second is that everywhere I should put string location of file
>    biopython wants handle to file

Things like Bio.SeqIO will accept filenames in recent versions of
Biopython (since release 1.54), but older versions only accepted
file handles. This is discussed in an FAQ in recent versions of the
Tutorial which point to this section on handles:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:appendix-handles
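
If a script has to run on machines with different Biopython releases, one
option is to check the version string up front. This is only a sketch: the
helper names are hypothetical, the release numbers are the two mentioned
above, and it assumes the version string starts with plain dotted numbers
such as "1.55".

```python
import re

# Release numbers taken from this message: SeqIO.index arrived in 1.52,
# and filenames were accepted in place of handles from 1.54.
FEATURE_ADDED_IN = {
    "SeqIO.index": (1, 52),
    "filenames in SeqIO": (1, 54),
}


def version_tuple(version):
    """Turn a string such as '1.55' into a tuple such as (1, 55)."""
    return tuple(int(number) for number in re.findall(r"\d+", version))


def has_feature(version, feature):
    """Return True if the given release is new enough for the feature."""
    return version_tuple(version) >= FEATURE_ADDED_IN[feature]


# In a real script the string would come from Bio.__version__.
for release in ("1.49", "1.53", "1.55"):
    print(release, has_feature(release, "SeqIO.index"),
          has_feature(release, "filenames in SeqIO"))
```

Comparing tuples of integers avoids the trap of comparing version strings
as text or as floating point numbers.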

>    The first thing I can think of is maybe I am using old version of
>    biopython, which points to question 1.

That does seem to be the problem.

> 3. Does somebody have experience with using biopython in django
> web site ?  Do I install biopython on web server or I can keep libraries
> in some folder and load them dynamically in code ?

I've used Biopython within TurboGears, but I haven't used django.
You should probably consult the django documentation for how they
recommend installing 3rd party libraries (e.g. they may recommend
using a virtual environment).

Peter


From zaricdragoslav at gmail.com  Tue Sep 14 08:04:47 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 16:04:47 +0400
Subject: [Biopython] Intro
In-Reply-To: <AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
	<AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
Message-ID: <AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>

Peter,

Thank you so very much for detailed explanations.

I will try to upgrade biopython version under linux.

Kind regards,

Dragoslav Zaric


On Tue, Sep 14, 2010 at 3:44 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Tue, Sep 14, 2010 at 12:08 PM, Dragoslav Zaric
> <zaricdragoslav at gmail.com> wrote:
> > Dear All,
> >
> > I am new member and I like to send welcome greet to everyone.
> >
> > I have few newbie questions so please be cooperative :)
> >
>
> Hello and welcome :)
>
> > 1. How can I see biopython version and is there connection between python
> > version and biopython version ?
>
> Biopython currently supports Python 2.4, 2.5, 2.6 and 2.7. Older versions
> of Biopython may not have worked 100% on Python 2.7, but we did
> previously support Python 2.3 and even older versions of Python.
>
> There is a FAQ (frequently asked questions) section in the Tutorial for
> how to determine the version of Biopython installed. Try:
>
> import Bio
> print Bio.__version__
>
> The Tutorial for the latest release is online as PDF or HTML,
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
>
> > 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04
> > (at least I think I did :)
>
> Based on the problems below, I don't think it worked.
>
> >    I have same installation on windows machine and everything works fine.
> >    But for example when I want to use something like this:
> >
> >    from Bio import SeqIO
> >    orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")
> >
> >    Two problems happens in ubuntu environment:
> >    first is that SeqIO complains that there is no index method
>
> That does suggest you have an old version of Biopython. The index
> function was added in Biopython 1.52, see:
>
> http://biopython.open-bio.org/SRC/biopython/NEWS
> http://news.open-bio.org/news/2009/09/biopython-release-152/
>
> >    second is that everywhere I should put string location of file
> >    biopython wants handle to file
>
> Things like Bio.SeqIO will accept filenames in recent versions of
> Biopython (since release 1.54), but older versions only accepted
> file handles. This is discussed in an FAQ in recent versions of the
> Tutorial which point to this section on handles:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:appendix-handles
>
> >    The first thing I can think of is maybe I am using old version of
> >    biopython, which points to question 1.
>
> That does seem to be the problem.
>
> > 3. Does somebody have experience with using biopython in django
> > web site ?  Do I install biopython on web server or I can keep libraries
> > in some folder and load them dynamically in code ?
>
> I've used Biopython within TurboGears, but I haven't used django.
> You should probably consult the django documentation for how they
> recommend installing 3rd party libraries (e.g. they may recommend
> using a virtual environment).
>
> Peter
>

From bartek at rezolwenta.eu.org  Tue Sep 14 08:20:44 2010
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 14 Sep 2010 14:20:44 +0200
Subject: [Biopython] Intro
In-Reply-To: <AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
	<AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
	<AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>
Message-ID: <AANLkTi=UdhjHaD8WXp+QkoP0NTD0R_dwUFZT3hDN6Tg3@mail.gmail.com>

On Tue, Sep 14, 2010 at 2:04 PM, Dragoslav Zaric
<zaricdragoslav at gmail.com> wrote:

> Peter,
>
> Thank you so very much for detailed explanations.
>
> I will try to upgrade biopython version under linux.
>

Hi,

Since you mentioned that you are working on ubuntu, I wanted to add that you
should be careful when upgrading the python/biopython versions on your
machine.

You are most probably now running both python and biopython installed from
ubuntu packages, but if you want to upgrade, you have to choose between
taking newer packages from a newer ubuntu  (currently the newest ubuntu
10.10 beta contains biopython 1.53
http://packages.ubuntu.com/lucid/python-biopython) or compiling from source.
If you choose to install from source, be sure to first uninstall the old
version from the package:
sudo apt-get remove python-biopython

if you want to install from source, you will need some extra packages:
sudo apt-get install python-dev python-reportlab python-numpy

good luck
Bartek

From bartek at rezolwenta.eu.org  Tue Sep 14 09:13:59 2010
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 14 Sep 2010 15:13:59 +0200
Subject: [Biopython] Intro
In-Reply-To: <AANLkTi=UdhjHaD8WXp+QkoP0NTD0R_dwUFZT3hDN6Tg3@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
	<AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
	<AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>
	<AANLkTi=UdhjHaD8WXp+QkoP0NTD0R_dwUFZT3hDN6Tg3@mail.gmail.com>
Message-ID: <AANLkTikhvdohrgGyZB8ipH+19=FSazYSz_6y5KakQeXR@mail.gmail.com>

On Tue, Sep 14, 2010 at 2:20 PM, Bartek Wilczynski <bartek at rezolwenta.eu.org
> wrote:

>   (currently the newest ubuntu 10.10 beta contains biopython 1.53
> http://packages.ubuntu.com/lucid/python-biopython)


Just a small correction:
1.53 is the version in 10.04 (lucid), while 10.10 beta (maverick) contains
1.54 (http://packages.ubuntu.com/maverick/python-biopython)

sorry for the error
Bartek

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From zaricdragoslav at gmail.com  Tue Sep 14 10:22:55 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 18:22:55 +0400
Subject: [Biopython] Intro
In-Reply-To: <AANLkTikhvdohrgGyZB8ipH+19=FSazYSz_6y5KakQeXR@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
	<AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
	<AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>
	<AANLkTi=UdhjHaD8WXp+QkoP0NTD0R_dwUFZT3hDN6Tg3@mail.gmail.com>
	<AANLkTikhvdohrgGyZB8ipH+19=FSazYSz_6y5KakQeXR@mail.gmail.com>
Message-ID: <AANLkTinYgWm9uLzw_CJCx=ZUQd8E9AgxST6RrMta5E=Q@mail.gmail.com>

Thanks for answers and help !

Actually I prefer not to use an ubuntu release newer than 9.04, and there is
no reason to change distribution because of one program.

I just did

sudo apt-get remove python-biopython

and after this 1.55 was automatically activated. I did install 1.55, but it
looks like the older version of biopython from the default package was masking
the new biopython version.
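
For anyone else who suspects one copy of a package is masking another, a
quick way to see which copy Python would import is to ask for its location.
The sketch below is written for Python 3 and uses a hypothetical helper
name; on the Python 2 releases discussed in this thread, importing the
package and printing Bio.__file__ gives the same information.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing with that name is importable
    return spec.origin


# "json" is only a stand-in that is always available; for this thread the
# interesting call would be module_location("Bio").
print(module_location("json"))
print(module_location("Bio"))
```

A path under the distribution's packages directory points at the packaged
copy, while a path elsewhere points at a copy installed by hand.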

Thanks again !

On Tue, Sep 14, 2010 at 5:13 PM, Bartek Wilczynski <bartek at rezolwenta.eu.org
> wrote:

>
>
> On Tue, Sep 14, 2010 at 2:20 PM, Bartek Wilczynski <
> bartek at rezolwenta.eu.org> wrote:
>
>>   (currently the newest ubuntu 10.10 beta contains biopython 1.53
>> http://packages.ubuntu.com/lucid/python-biopython)
>
>
> Just a small correction:
> 1.53 is the version in 10.04 (lucid), while 10.10 beta (maverick) contains
> 1.54 (http://packages.ubuntu.com/maverick/python-biopython)
>
> sorry for the error
> Bartek
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
>


-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From zaricdragoslav at gmail.com  Tue Sep 14 10:29:27 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 18:29:27 +0400
Subject: [Biopython] Some books
Message-ID: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com>

Dear friends,

I do not come from a bioinformatics background, so can anybody
recommend an introductory book about bioinformatics so I can
cover the basics?

Of course a lot of the python programming in biopython is not about
biology (like parsing database files or connecting to databases), but
to get a clear picture it is good to read an introductory book.

Is the book

"Introduction to Bioinformatics" by Arthur Lesk

a good one?

Kind regards

-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From p.j.a.cock at googlemail.com  Tue Sep 14 10:58:05 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 15:58:05 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To: <a06240801c8b529c10a62@131.229.113.228>
References: <a06240808c8b421190255@131.229.113.228>
	<AANLkTi=BGd7t+SXyDVu749VU5ywWrFZ+eRfJXQ0-QkyA@mail.gmail.com>
	<a06240801c8b490b75d3f@10.0.1.4>
	<AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>
	<AANLkTi=vMZt-+ed+2t1LFz1GhDYi76L1BLgTuVKWfpBx@mail.gmail.com>
	<a06240801c8b529c10a62@131.229.113.228>
Message-ID: <AANLkTinRGCzPT-pQpgcJwNxHA31H7_HphVE+H=78S5cq@mail.gmail.com>

On Tue, Sep 14, 2010 at 2:44 PM, Jessica Grant <jgrant at smith.edu> wrote:
> Hi Peter,
>
> Here is the codon table, in the format I found in CodonTable.py.
>
> I will look at the links you sent, but I don't know if I will be able to
> follow it all. Thanks,
>
> Jessica
>
>                    table = {
>     'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S',
>     'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y',
>     'TGT': 'C', 'TGC': 'C', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L',
>     'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P',
>     'CCG': 'P', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
>     'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'ATT': 'I',
>     'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T',
>     'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 'AAA': 'K',
>     'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
>     'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A',
>     'GCC': 'A', 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D',
>     'GAA': 'E', 'GAG': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G',
>     'GGG': 'G', 'TAG': 'Q', 'TGA': 'W',},
>                    stop_codons = ['TAA' ],
>                    start_codons = [ 'ATG']
>                    )

OK, don't worry about the git branch stuff - I've just merged
this to the main repository. Are you happy with installing
Biopython from source? If so grab the latest source code
as described here:

http://www.biopython.org/wiki/SourceCode

Alternatively all you need to update is the Bio/Seq.py file
to the latest version:

http://github.com/biopython/biopython/raw/master/Bio/Seq.py

To use the new functionality, first you need to create a
CodonTable object with your special table, and assuming
you are just working with unambiguous DNA that means:

from Bio.Data.CodonTable import CodonTable
c_uncinata_table = CodonTable(forward_table={
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
    'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
    'TAT': 'Y', 'TAC': 'Y',             'TAG': 'Q',
    'TGT': 'C', 'TGC': 'C', 'TGA': 'W', 'TGG': 'W',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
    'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
    'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
    'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
    'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'},
    start_codons = [ 'ATG'],
    stop_codons = ['TAA' ])

Note that order of the forward table dictionary entries
does not actually matter, however, I have moved the
TAG and TGA entries from the end to keep the whole
table in a standard order - I found this easier to check.
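
The same check can be scripted. The following is a plain-Python sketch
that does not use Biopython at all: it builds the standard table, applies
the two reassignments from Jessica's table (TAG to Q, TGA to W), and
confirms that each of the 64 codons is either a sense codon or a stop
codon, never both and never missing. The function names are hypothetical
and the tiny translate function is only there to illustrate the table, not
to mirror how Bio.Seq works internally.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
ALL_CODONS = ["".join(codon) for codon in product(BASES, repeat=3)]


def make_table(changes=None):
    """Return (forward_table, stop_codons) for the standard code plus changes."""
    table = dict(zip(ALL_CODONS, STANDARD_AA))
    table.update(changes or {})
    forward = {codon: aa for codon, aa in table.items() if aa != "*"}
    stops = sorted(codon for codon, aa in table.items() if aa == "*")
    return forward, stops


def check_table(forward, stops):
    """List every codon that is missing or is both a sense and a stop codon."""
    problems = []
    for codon in ALL_CODONS:
        if codon in forward and codon in stops:
            problems.append(codon + " is both sense and stop")
        elif codon not in forward and codon not in stops:
            problems.append(codon + " is missing")
    return problems


def translate(seq, forward, stops):
    """Translate complete codons only, writing '*' for stop codons."""
    protein = []
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3].upper()
        protein.append("*" if codon in stops else forward[codon])
    return "".join(protein)


forward, stops = make_table({"TAG": "Q", "TGA": "W"})
print(stops)                                        # ['TAA']
print(check_table(forward, stops))                  # []
print(translate("AAATAGTGATAA", forward, stops))    # KQW*
```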

If you have the updated Bio.Seq module, then you
can do this:

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> seq =  Seq("AAATAGTGATAA", generic_dna)
>>> print seq.translate()
K***
>>> print seq.translate(table=c_uncinata_table)
KQW*

Or using strings,

>>> from Bio.Seq import translate
>>> print translate("AAATAGTGATAA")
K***
>>> print translate("AAATAGTGATAA", table=c_uncinata_table)
KQW*
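
For readers who want to see what a custom table changes, here is a
minimal sketch that does not use Biopython at all. The function name
translate_with_table and the cut-down STANDARD dictionary are made up
for this illustration (a real table has 64 codons); it only mimics the
idea of a forward table plus a list of stop codons and is not how
Bio.Seq is implemented.

```python
# Illustrative, Biopython-free sketch. Only the codons needed for the
# example sequence are listed; a complete table has 64 entries.
STANDARD = {"AAA": "K", "TGG": "W", "CAA": "Q", "CAG": "Q"}
STANDARD_STOPS = {"TAA", "TAG", "TGA"}


def translate_with_table(dna, forward_table, stop_codons, stop_symbol="*"):
    """Translate DNA codon by codon using a dict-based forward table."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    protein = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3].upper()
        if codon in stop_codons:
            protein.append(stop_symbol)
        else:
            # Raises KeyError for codons missing from the table.
            protein.append(forward_table[codon])
    return "".join(protein)


# Reassign TAG -> Q and TGA -> W, leaving TAA as the only stop codon.
custom = dict(STANDARD, TAG="Q", TGA="W")
custom_stops = {"TAA"}

print(translate_with_table("AAATAGTGATAA", STANDARD, STANDARD_STOPS))  # K***
print(translate_with_table("AAATAGTGATAA", custom, custom_stops))      # KQW*
```

The two printed strings match the K*** and KQW* results from the
Biopython session above.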

Does that make sense? Does it do what you expect?
Don't hesitate to ask for clarification.

Peter


From Achim.Treumann at NEPAF.com  Tue Sep 14 10:51:29 2010
From: Achim.Treumann at NEPAF.com (Achim Treumann)
Date: Tue, 14 Sep 2010 15:51:29 +0100
Subject: [Biopython] Some books
In-Reply-To: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com>
References: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com>
Message-ID: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local>

Dear Dragoslav, 

I cannot comment on Arthur Lesk's book - haven't read it. 

I can really recommend two freely available tutorials on Katja
Schuerer's website:
 
One of them is an introduction to programming using Python:
http://www.pasteur.fr/formation/infobio/python/

The other one is a Python course in Bioinformatics:
http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html

Both of them provide you with numerous examples and take you through
tips and tricks on how to address bioinformatic problems using Python
and Biopython. 

I presume that you are familiar with the Biopython manual that is part
of the Biopython distribution:
http://www.biopython.org/DIST/docs/tutorial/Tutorial.html

Hope this helps, 
Best wishes, 
Achim

-----Original Message-----
From: biopython-bounces at lists.open-bio.org
[mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Dragoslav
Zaric
Sent: 14 September 2010 15:29
To: biopython at lists.open-bio.org
Subject: [Biopython] Some books

Dear friends,

I do not come from a bioinformatics background, so can anybody
recommend an introductory book about bioinformatics so I can
cover the basics?

Of course there is a lot of Python programming in Biopython that is
outside biology (like parsing database files or connecting to databases),
but to get a clear picture it is good to read an introductory book.

Is the book

"Introduction to Bioinformatics" by Arthur Lesk

a good one?

Kind regards

-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics
_______________________________________________
Biopython mailing list  -  Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


From cjfields at illinois.edu  Tue Sep 14 10:59:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 14 Sep 2010 09:59:49 -0500
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
	<937371.74873.qm@web52006.mail.re2.yahoo.com>
	<AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>
Message-ID: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>

On Sep 14, 2010, at 4:04 AM, Peter wrote:

> Hi Anastasia,
> 
> On Tue, Sep 14, 2010 at 9:02 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
>> Hi Peter,
>> 
>>> 
>>> Could you post a short example of the kind of output you are looking at?
>>> 
>> 
>> Here is an example output, but this can differ depending on the model used
>> (there are several models for Branch, Site, BranchSite, but all are pretty
>> standard)
>> 
> 
> Thanks - that looks possible to parse, but not very easy (especially if the
> codeml output changes slightly between versions).
> 
>>> 
>>> Can you get codeml to output what you need in another format, such as NEXUS?
>>> 
>> 
>> Haven't tried that, but as you can see, this is a very verbose output and
>> NEXUS does not seem an option.
> 
> At first glance, the NEXUS format could hold a lot of that information.
> Another possibility might be phyloXML. However, you are at the mercy
> of the codeml tool and what it supports. It might be worth politely asking
> the author(s) about supporting one of these more standard formats as
> an optional output.
> 
>> Ultimately, I want to parse this to get all the information I need in a
>> tabulated file. I am still working out what exactly I need (there are standard
>> values to get out, such as LnL, branch length, Dn/Ds, but it also depends on the type
>> of downstream analysis). I will now work on the pypaml class and modify the
>> original code to make it more generic (it seems that it only works for Site
>> Models).
> 
> Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so
> unless he agrees to re-license it we cannot include it in Biopython.
> 
>> Will let you know, was just wondering if there was already a solution. There is
>> one in Bioperl, but heard it is very slow and in any case, I don't understand
>> much of perl....
> 
> I don't know much Perl either ;)
> 
> Peter

Just a warning from those experienced with paml parsers (bioperl): the output is notoriously shifty even between minor releases (sections get reordered, etc), so pretty much any parser needs to accommodate that.  It's extremely frustrating.

chris


From p.j.a.cock at googlemail.com  Tue Sep 14 12:04:52 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 17:04:52 +0100
Subject: [Biopython] Some books
In-Reply-To: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local>
References: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com>
	<01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local>
Message-ID: <AANLkTin-mT7VGqNF4NgdvT_c+wcH_86TxkHOCnSC-312@mail.gmail.com>

On Tue, Sep 14, 2010 at 3:51 PM, Achim Treumann
<Achim.Treumann at nepaf.com> wrote:
> Dear Dragoslav,
>
> ...
>
> The other one is a Python course in Bioinformatics:
> http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html

The above Pasteur Institute course using Biopython is sadly very out of
date in places, and I have been unable to get in touch with the authors
to revise it or at least add some warning text to it.

Peter

From biopython at maubp.freeserve.co.uk  Tue Sep 14 12:07:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Sep 2010 17:07:52 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
	<937371.74873.qm@web52006.mail.re2.yahoo.com>
	<AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>
	<8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>
Message-ID: <AANLkTin-S=G2d76qe0t0QZ1eZ0rGXmSAAUhDuJSxHZeX@mail.gmail.com>

On Tue, Sep 14, 2010 at 3:59 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Sep 14, 2010, at 4:04 AM, Peter wrote:
>> On Tue, Sep 14, 2010 at 9:02 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
>>>
>>> Here is an example output, but this can differ depending on the model used
>>> (there are several models for Branch, Site, BranchSite, but all are pretty
>>> standard)
>>
>> Thanks - that looks possible to parse, but not very easy (especially if the
>> codeml output changes slightly between versions).
>
> Just a warning from those experienced with paml parsers (bioperl): the
> output is notoriously shifty even between minor releases (sections get
> reordered, etc), so pretty much any parser needs to accommodate that.
> It's extremely frustrating.

Thanks Chris - I was afraid of that. It sounds like parsing plain text
NCBI BLAST output, but worse.

Do you know if anyone has asked about codeml outputting something
nicer to parse instead? e.g. Nexus or any kind of XML?

Peter

From Achim.Treumann at NEPAF.com  Tue Sep 14 12:20:36 2010
From: Achim.Treumann at NEPAF.com (Achim Treumann)
Date: Tue, 14 Sep 2010 17:20:36 +0100
Subject: [Biopython] Some books
In-Reply-To: <AANLkTin-mT7VGqNF4NgdvT_c+wcH_86TxkHOCnSC-312@mail.gmail.com>
References: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com><01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local>
	<AANLkTin-mT7VGqNF4NgdvT_c+wcH_86TxkHOCnSC-312@mail.gmail.com>
Message-ID: <01798D2396253A449511F31F1CDE83550FBAF2@srv1.NEPAF.local>

Hiya, 

I agree about this warning (and have come across a few bits where this
has caused problems) - despite that I found them very useful. 

Best wishes, 
Achim 



From natassa_g_2000 at yahoo.com  Tue Sep 14 12:15:18 2010
From: natassa_g_2000 at yahoo.com (natassa)
Date: Tue, 14 Sep 2010 09:15:18 -0700 (PDT)
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
	<937371.74873.qm@web52006.mail.re2.yahoo.com>
	<AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>
	<8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>
Message-ID: <884978.85315.qm@web52004.mail.re2.yahoo.com>

Thanks Chris,
Good to know. I am dealing with PAML results for the first time, but somehow
thought that the outputs were standard. Apparently not...
Now that I have started writing my own Python parser, I see that even among
models of the same run, the text changes without any obvious reason (from
'omega' to 'w', etc.). Indeed frustrating!
Does the BioPerl solution include different parsers for different types of
analysis, e.g. one for the Branch analysis models, another for the Site analysis
models, etc.? It would be good to have one for all, but I am not sure this is
feasible... I will start with separate parsers and see how they can be generalized.
Thanks, 
Anastasia
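
One way to cope with sections moving around and labels changing is to
search for each value independently instead of reading the file in a
fixed order. The sketch below shows that idea only. The two report
strings are invented for illustration and are not real codeml output,
and the label variants and field names are assumptions that would need
checking against actual files.

```python
import re

# Hypothetical report fragments: the same quantities appear in a different
# order and under different labels ("omega" vs "w"). They are invented for
# this sketch and are NOT real codeml output.
REPORT_A = """\
lnL = -1234.56
omega = 0.4321
"""
REPORT_B = """\
w: 0.1234
lnL: -2345.67
"""

NUMBER = r"(-?\d+(?:\.\d+)?)"
# One pattern per field; each pattern lists the label variants it accepts.
PATTERNS = {
    "lnL": re.compile(r"^\s*lnL\s*[=:]\s*" + NUMBER, re.MULTILINE),
    "omega": re.compile(r"^\s*(?:omega|w)\s*[=:]\s*" + NUMBER, re.MULTILINE),
}


def parse_report(text):
    """Return the fields found anywhere in text, regardless of their order."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[field] = float(match.group(1))
    return result


print(parse_report(REPORT_A))
print(parse_report(REPORT_B))
```

Because each field is looked up on its own, a missing or renamed field
simply leaves a gap in the result rather than breaking everything that
follows it, which makes format changes between releases easier to spot.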




From p.j.a.cock at googlemail.com  Wed Sep 15 13:10:46 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 15 Sep 2010 18:10:46 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To: <a06240802c8b6a77d8334@131.229.113.228>
References: <a06240808c8b421190255@131.229.113.228>
	<AANLkTi=BGd7t+SXyDVu749VU5ywWrFZ+eRfJXQ0-QkyA@mail.gmail.com>
	<a06240801c8b490b75d3f@10.0.1.4>
	<AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>
	<AANLkTi=vMZt-+ed+2t1LFz1GhDYi76L1BLgTuVKWfpBx@mail.gmail.com>
	<a06240801c8b529c10a62@131.229.113.228>
	<AANLkTinRGCzPT-pQpgcJwNxHA31H7_HphVE+H=78S5cq@mail.gmail.com>
	<a06240802c8b6a77d8334@131.229.113.228>
Message-ID: <AANLkTinLSfzyQ8Uou4tgSHanGDN1LgQHM_71bmPmRS_C@mail.gmail.com>

On Wed, Sep 15, 2010 at 5:50 PM, Jessica Grant <jgrant at smith.edu> wrote:
>Peter wrote:
>> ...
>> To use the new functionality, first you need to create a
>> CodonTable object with your special table, and assuming
>> you are just working with unambiguous DNA that means:
>> ...
>> Does that make sense? Does it do what you expect?
>> Don't hesitate to ask for clarification.
>>
>> Peter
>
> It works!  Thanks so much!!
>
> Jessica

Great - thanks for letting us know.

Peter

From cwalentas at gmail.com  Thu Sep 16 00:36:13 2010
From: cwalentas at gmail.com (Christopher Walentas)
Date: Thu, 16 Sep 2010 00:36:13 -0400
Subject: [Biopython] Parsing Pubmed-Entrez searches into a normalized
	relational resource
Message-ID: <4C919EBD.3080802@gmail.com>

Apologies in advance - all of this is very new to me - and I hope that
this is the proper forum for this query.

What I would like to do is parse the returns of an entrez pubmed search 
into their smallest, unique useful bits and create a relational database 
(sqlite, dee?).  Ideally this would not only be of returned fields, but 
also drilling further down into say affiliation, addresses, etc...

I believe that I've mastered the search and download functions and 
individual citations exist as a stacked dictionary of the xml outputs.

Where I am falling down is understanding how to extract the structure of
these outputs and create a persistent relational resource that's been
normalized such that these fields can be mapped to and used to "correct"
values in an uncurated dataset with highly analogous fields.

I've been struggling to bridge the gap between Python and sqlite/dee,
but have recently been informed that it might be possible to do
everything within Python itself. Again, apologies for any naiveties -
they are indeed sincere; however, I'm well aware that a little knowledge
can be dangerous, hence reaching out.

 From what I've already read, it would seem that all of this is ideally 
suited to bio-/python and am looking forward to learning- I'm just 
looking for that swift shove in the right direction and to benefit from 
your collective informed guidance.

Cheers in advance,
christopher

From mjldehoon at yahoo.com  Thu Sep 16 06:53:55 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 16 Sep 2010 03:53:55 -0700 (PDT)
Subject: [Biopython] Fwd: NETTAB 2010: Submission deadline is approaching:
	Sep 24, 2010
Message-ID: <841162.95892.qm@web62405.mail.re1.yahoo.com>


--- On Thu, 9/16/10, Paolo Romano <paolo.romano at istge.it> wrote:

> From: Paolo Romano <paolo.romano at istge.it>
> Subject: Fwd: NETTAB 2010: Submission deadline is approaching: Sep 24,  2010
> To: biopython-owner at lists.open-bio.org
> Date: Thursday, September 16, 2010, 3:25 AM
> Dear list owner,
> 
> I would be glad if you would forward this message to the
> list.
> 
> Many thanks in advance.
> 
> Ciao. Paolo
> 
> >Date: Wed, 15 Sep 2010 17:50:48 +0200
> >To: biopython at lists.open-bio.org
> >From: Paolo Romano <paolo.romano at istge.it>
> >Subject: NETTAB 2010: Submission deadline is approaching: Sep 24, 2010
> >
> >I hope this announcement can be of interest for this list.
> >
> >Forgive me if I'm wrong!
> >
> >Ciao. Paolo
> >
> >==========
> >NETTAB 2010 on "Biological Wikis"
> >joint with the BBCC 2010 workshop on Bioinformatics and
> >Computational Biology in Campania
> >
> >November 29 - December 1, 2010, Naples, Italy
> >http://www.nettab.org/2010/
> >http://bioinformatica.isa.cnr.it./BBCC/BBCC2010/
> >
> >
> >The deadline for the submission of oral communications is quickly
> >approaching; submit your contribution by
> >Friday September 24, 2010 through the EasyChair site
> >( http://www.easychair.org/conferences/?conf=nettab2010 ).
> >
> >The length of contributions for oral communications should be
> >between 3 and 5 pages, including tables and figures.
> >See more instructions below.
> >
> >
> >NETTAB 2010 workshop promises to be a great meeting for all
> >researchers involved in the exploitation of wikis in biology.
> >
> >Don't miss this opportunity to discuss your ideas and doubts with
> >such scientists as
> >- Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge,
> >  United Kingdom
> >- Alexander Pico, Gladstone Institute of Cardiovascular Disease,
> >  San Francisco, USA
> >- Andrew Su, Bioinformatics and Computational Biology, Genomics
> >  Institute of the Novartis Research Foundation (GNF), San Diego, USA
> >- Dan Bolser, College of Life Sciences, University of Dundee,
> >  Scotland, United Kingdom
> >- Robert Hoffmann, Computational Biology Center, cBIO, Memorial
> >  Sloan-Kettering Cancer Center, MSKCC, New York, USA
> >- Thomas Kelder, Department of Bioinformatics (BiGCaT), Maastricht
> >  University, the Netherlands
> >- Jaime Prilusky, Bioinformatics, Weizmann Institute of Science,
> >  Rehovot, Israel
> >- and many others who, we hope, will join the workshop.
> >
> >Here below, please find a summary of the Call. The complete Call is
> >available on-line at http://www.nettab.org/2010/call.html .
> >
> >Further information is available at http://www.nettab.org/2010/ .
> >
> >============
> >CALL FOR PAPERS
> >
> >TOPICS
> >The following list is not meant to be exclusive of any further
> >topics as stated above.
> >Submitted contributions should address one or more of the following topics:
> >    * Wiki development tools
> >          o Wikimedia
> >          o Wikimedia extensions
> >          o Semantic Wikis
> >          o Wiki-coupled CMSs
> >          o Other wikis
> >    * Arising issues for the biomedical domain:
> >          o Authoritativeness of contributions and sites
> >          o Quality assessment
> >          o Users acknowledgement
> >          o Stimulation of quality contributions
> >          o Authorships management and reward
> >          o 'Scientific production' value for contributions
> >          o Management of bioinformatics data types
> >    * Wikis and collaborative systems for:
> >          o Genomics, proteomics, metabolomics, any -omics
> >          o Proteins analysis and visualization
> >          o gene and proteins interactions
> >          o metabolic pathways
> >          o oncology research
> >    * Issues to be tackled by wiki and collaborative research for:
> >          o Genomics, proteomics, metabolomics, any -omics
> >          o Proteins analysis and visualization
> >          o gene and proteins interactions
> >          o metabolic pathways
> >          o oncology research
> >
> >The NETTAB 2010 workshop is a joint event with the BBCC 2010 workshop on
> >Bioinformatics and Computational Biology in Campania.
> >This deadline also applies to the BBCC 2010 workshop.
> >Submit for BBCC through the same EasyChair site and select the 'BBCC
> >session' topic.
> >
> >
> >TYPE OF CONTRIBUTIONS
> >
> >The following possible contributions are sought:
> >    * Oral communications
> >    * Posters
> >    * Software demos
> >All accepted contributions will be published in the proceedings of
> >the workshop.
> >
> >
> >DEADLINES
> >
> >* September 24, 2010: Oral communications submission
> >          o Decisions announced: October 24, 2010
> >
> >* October 29, 2010: Early registration ends
> >
> >* November 29 - December 1, 2010: Workshop and Tutorials
> >
> >
> >INSTRUCTIONS
> >Kindly follow the instructions carefully when preparing your
> >contribution and submit your contribution through the EasyChair
> >system at http://www.easychair.org/conferences/?conf=nettab2010.
> >
> >All contributions should follow the same format, as specified here:
> >font type: Times New Roman, font size: 12 pt, page size: A4, left
> >and right margins: 2.0 cm, upper margin: 2.5 cm, lower margin: 2.0 cm.
> >
> >The length of contributions for oral communications should be
> >between 3 and 5 pages, including tables and figures.
> >They should include: Abstract, Introduction, Methods, Results and
> >Discussion, References.
> >All contributions for oral communications will be evaluated by at
> >least three referees.
> >
> >For any further information or clarification, please contact the
> >organization by email at info at nettab.org.
> >
> >
> >ORGANIZATION (see http://www.nettab.org/2010/organization.html for 
> >the Scientific Committee and more information)
> >
> >Co-chairs
> >    * Angelo Facchiano, CNR-ISA, Avellino, Italy
> >    * Paolo Romano, National Cancer Research Institute, Genoa, Italy
> >
> >We look forward to meeting you in Naples!
> >
> >Paolo Romano and Angelo Facchiano
> >   on behalf of the Scientific Committee
> >
> >
> >Paolo Romano (paolo.romano at istge.it)
> >Bioinformatics
> >National Cancer Research Institute (IST)
> >Largo Rosanna Benzi, 10, I-16132, Genova, Italy
> >Tel: +39-010-5737-288  Fax: +39-010-5737-295  Skype: p.romano
> >Web: http://www.nettab.org/promano/
> >
> 
> Paolo Romano (paolo.romano at istge.it)
> Bioinformatics
> National Cancer Research Institute (IST)


From zaricdragoslav at gmail.com  Sun Sep 19 02:33:06 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 10:33:06 +0400
Subject: [Biopython] Trhird party library
Message-ID: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>

Has anybody used Biopython as a third-party library, for example in a Python
web project?
I ask because you probably cannot expect to find or install Biopython
in a hosting provider's server environment.

For example, after installing Biopython in a Windows environment, you can see
that Biopython is installed inside the Python 2.6 installation:

C:\Python26\Lib\site-packages\Bio
C:\Python26\Lib\site-packages\BioSQL
C:\Python26\Lib\site-packages\numpy

So can you copy these folders to, for example, the \Lib\ folder of a web
project, and reference them somehow from code?

Of course I can test this myself, and I will, but maybe somebody
has experience with this problem, and it would probably be good info
for others in this forum.
Kind regards

-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From chapmanb at 50mail.com  Sun Sep 19 06:51:19 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 19 Sep 2010 06:51:19 -0400
Subject: [Biopython] Parsing Pubmed-Entrez searches into a normalized
 relational resource
In-Reply-To: <4C919EBD.3080802@gmail.com>
References: <4C919EBD.3080802@gmail.com>
Message-ID: <20100919105119.GC2030@kunkel>

Christopher;

> What I would like to do is parse the returns of an entrez pubmed
> search into their smallest, unique useful bits and create a
> relational database (sqlite, dee?).  Ideally this would not only be
> of returned fields, but also drilling further down into say
> affiliation, addresses, etc...
[...]
> Where I am falling down is understanding how to extract the
> structure of these outputs and create a persistent relational
> resource that's been normalized such that these fields can be mapped
> to used to "correct" values in an uncurated dataset with highly
> analogous fields.

This is the standard problem of representing object-style data in a
flat relational database. It's tough to answer succinctly on a
mailing list, as there are entire textbooks and courses devoted to
the problem. The Wikipedia entry on normalization and first normal
form is a good place to get started:

http://en.wikipedia.org/wiki/Database_normalization
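
As a rough illustration of what normalizing such records could look
like, here is a small sketch using only the sqlite3 module from the
standard library. The record layout, table names and column names are
all made up for this example; real Entrez results are more deeply
nested and use different keys, so treat this as a starting point only.

```python
import sqlite3

# Hypothetical, simplified records standing in for parsed search results.
records = [
    {"pmid": "1", "title": "First paper",
     "authors": [{"name": "Smith J", "affiliation": "Univ A"},
                 {"name": "Jones K", "affiliation": "Univ B"}]},
    {"pmid": "2", "title": "Second paper",
     "authors": [{"name": "Smith J", "affiliation": "Univ A"}]},
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (pmid TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    affiliation TEXT NOT NULL,
    UNIQUE (name, affiliation)
);
CREATE TABLE article_author (
    pmid TEXT NOT NULL REFERENCES article(pmid),
    author_id INTEGER NOT NULL REFERENCES author(id),
    position INTEGER NOT NULL,
    PRIMARY KEY (pmid, author_id)
);
""")

for rec in records:
    conn.execute("INSERT INTO article VALUES (?, ?)",
                 (rec["pmid"], rec["title"]))
    for position, author in enumerate(rec["authors"], start=1):
        key = (author["name"], author["affiliation"])
        # Store each distinct author once, then link to it by id.
        conn.execute(
            "INSERT OR IGNORE INTO author (name, affiliation) VALUES (?, ?)",
            key)
        author_id = conn.execute(
            "SELECT id FROM author WHERE name = ? AND affiliation = ?",
            key).fetchone()[0]
        conn.execute("INSERT INTO article_author VALUES (?, ?, ?)",
                     (rec["pmid"], author_id, position))
conn.commit()

author_count = conn.execute("SELECT COUNT(*) FROM author").fetchone()[0]
smith_titles = [row[0] for row in conn.execute(
    "SELECT a.title FROM article a "
    "JOIN article_author aa ON aa.pmid = a.pmid "
    "JOIN author au ON au.id = aa.author_id "
    "WHERE au.name = ? ORDER BY a.pmid", ("Smith J",))]
print(author_count, smith_titles)
```

The repeated author is stored once and linked to both articles through
the article_author table. Affiliations could be split out into their
own table in the same way if they need to be corrected independently.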

As far as accessing relational databases, Python is great for this.
An object relational mapper like SQLAlchemy:

http://www.sqlalchemy.org/

is a great place to get started. This allows you to deal more
directly with objects, and also generalizes database access so you
can quickly switch from SQLite to MySQL to whatever.

Another suggestion is to use a document oriented database like
MongoDB for storing your data:

http://www.mongodb.org/

This allows you to store objects without flattening them, which may
be more intuitive for the XML/dictionary results you get back from
Entrez searches.
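
To show the contrast with a flattened schema, the sketch below keeps a
nested record whole by serializing it as JSON. Note that it uses the
standard library json and sqlite3 modules as a stand-in rather than
MongoDB itself, and the record keys are hypothetical.

```python
import json
import sqlite3

# Hypothetical nested record; real Entrez output has different keys.
record = {
    "pmid": "1",
    "title": "First paper",
    "authors": [{"name": "Smith J", "affiliation": "Univ A"}],
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (pmid TEXT PRIMARY KEY, body TEXT NOT NULL)")
# The whole record is stored as one JSON document instead of being
# split across several tables.
conn.execute("INSERT INTO document VALUES (?, ?)",
             (record["pmid"], json.dumps(record)))

body = conn.execute(
    "SELECT body FROM document WHERE pmid = ?", ("1",)).fetchone()[0]
restored = json.loads(body)
print(restored["authors"][0]["affiliation"])
```

The trade-off is that the nested structure comes back unchanged, but
questions such as "which articles share an author" are no longer a
simple join.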

Hope this helps,
Brad

From chapmanb at 50mail.com  Sun Sep 19 06:44:40 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 19 Sep 2010 06:44:40 -0400
Subject: [Biopython] Third party library
In-Reply-To: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
Message-ID: <20100919104440.GB2030@kunkel>

Dragoslav;

> Did anybody used biopython as third part library, like for example in python
> web project ?

Yes, absolutely. Biopython doesn't behave any differently from other
Python third party libraries, so there wouldn't be any special
instructions outside the documentation for the library you are
using.

> I ask this because probably you can not expect to find or install biopython
> in provider server environment.

It's tough to answer this generally without knowing what framework
you are planning to use. For example, Google App Engine has a
restricted environment where only pure Python libraries work. As an
install procedure you can most simply do:

python setup.py build

and then copy the libraries from build/lib.your_platform to the
site-libraries location in your application.

More formally, virtualenv is also very useful for building an isolated
Python environment with only the libraries for a project:

http://pypi.python.org/pypi/virtualenv

> For example, after installing biopython in windows environment, you can see
> that biopython is
> installed inside python 2.6 installation:
> 
> C:\Python26\Lib\site-packages\Bio
> C:\Python26\Lib\site-packages\BioSQL
> C:\Python26\Lib\site-packages\numpy
> 
> So can you copy these folders to, for example, \Lib\ folder of web project,
> and reference them somehow from code ?

Sure, that all seems fine but it's hard to offer specific advice
without knowing exactly what you are doing. The best place for
questions is probably in the community of the web framework you are
using. Everything that applies to other third party libraries will
apply to Biopython.
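
As a generic sketch of referencing a copied library folder from code,
the snippet below adds a project-local directory to sys.path before
importing from it. The package name demo_pkg and the temporary folder
are made up so that the example runs without Biopython; in a real
project the folder would hold the copied Bio directory instead.

```python
import os
import sys
import tempfile

# Stand-in for a project-local library folder (for example a copy of
# build/lib.*). "demo_pkg" is a made-up package used only for this sketch.
vendor_dir = tempfile.mkdtemp()
package_dir = os.path.join(vendor_dir, "demo_pkg")
os.mkdir(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("GREETING = 'imported from the vendored folder'\n")

# Put the folder at the front of the import path before importing from it.
if vendor_dir not in sys.path:
    sys.path.insert(0, vendor_dir)

import demo_pkg

print(demo_pkg.GREETING)
```

This works directly for pure Python packages. Compiled extension
modules additionally have to match the server's platform and Python
version, so a folder copied from a Windows machine would not
necessarily work on a Linux host.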

Hope this helps,
Brad

From sdavis2 at mail.nih.gov  Sun Sep 19 07:02:45 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 19 Sep 2010 07:02:45 -0400
Subject: [Biopython] Trhird party library
In-Reply-To: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
Message-ID: <AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>

On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric
<zaricdragoslav at gmail.com>wrote:

> Did anybody used biopython as third part library, like for example in
> python
> web project ?
> I ask this because probably you can not expect to find or install biopython
> in provider server
> environment.
>
> For example, after installing biopython in windows environment, you can see
> that biopython is
> installed inside python 2.6 installation:
>
> C:\Python26\Lib\site-packages\Bio
> C:\Python26\Lib\site-packages\BioSQL
> C:\Python26\Lib\site-packages\numpy
>
> So can you copy these folders to, for example, \Lib\ folder of web project,
> and reference them
> somehow from code ?
>
> Of course I can test this by myself, and I will do this, but maybe somebody
> have experience
> with this problem, and it would be probably good info for others in this
> forum.
>
>
Hi, Dragoslav.  The python developers thought of this problem.

http://docs.python.org/install/#alternate-installation-the-home-scheme

Sean

From zaricdragoslav at gmail.com  Sun Sep 19 08:11:48 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 16:11:48 +0400
Subject: [Biopython] Trhird party library
In-Reply-To: <AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
	<AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>
Message-ID: <AANLkTim70C45S8u5zhyhiDihGYg0-paTdzc-nVQPU+36@mail.gmail.com>

Anyway,

I will try the simplest thing: copy the folder with the Biopython modules
into some folder of the web app and access the modules through an absolute
path on the web server. This should work.

At first I planned to use the Django web framework, but I recently
discovered there are many Python web frameworks. I prefer the simplest
and most effective frameworks, so I will check out

web.py

it looks nice at first glance.

Kind regards



-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From rodrigo_faccioli at uol.com.br  Sun Sep 19 09:59:47 2010
From: rodrigo_faccioli at uol.com.br (Rodrigo Faccioli)
Date: Sun, 19 Sep 2010 10:59:47 -0300
Subject: [Biopython] Trhird party library
In-Reply-To: <AANLkTim70C45S8u5zhyhiDihGYg0-paTdzc-nVQPU+36@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
	<AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>
	<AANLkTim70C45S8u5zhyhiDihGYg0-paTdzc-nVQPU+36@mail.gmail.com>
Message-ID: <AANLkTinkXbB3uyy18ZPODKv2_4EPDWa5-0MB0EXSJBy7@mail.gmail.com>

Hi,

I've worked with Biopython in a web project. I installed Biopython normally
on our Ubuntu server.

The front end of my web project was developed in JSP, but I ran my scripts
with Biopython. You can find this project at
http://glu.fcfrp.usp.br:8180/prometheus/

About the Python frameworks, I've read that Django is an excellent framework.

Thanks in advance,

--
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218
Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5


On Sun, Sep 19, 2010 at 9:11 AM, Dragoslav Zaric
<zaricdragoslav at gmail.com> wrote:

> Anyway,
>
> I will try simplest thing, to copy folder with biopython modules in some
> folder of web app and access modules trough absolute path of web server,
> this must work.
>
> At first I planned to use django web framework, but I recently discovered
> there are
> many python web frameworks. So i prefer most simplistic and effective
> frameworks,
> I will check out
>
> web.py
>
> it looks nice at first glance.
>
> Kind regards
>
> On Sun, Sep 19, 2010 at 3:02 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
> >
> >
> > On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric <
> zaricdragoslav at gmail.com
> > > wrote:
> >
> >> Did anybody used biopython as third part library, like for example in
> >> python
> >> web project ?
> >> I ask this because probably you can not expect to find or install
> >> biopython
> >> in provider server
> >> environment.
> >>
> >> For example, after installing biopython in windows environment, you can
> >> see
> >> that biopython is
> >> installed inside python 2.6 installation:
> >>
> >> C:\Python26\Lib\site-packages\Bio
> >> C:\Python26\Lib\site-packages\BioSQL
> >> C:\Python26\Lib\site-packages\numpy
> >>
> >> So can you copy these folders to, for example, \Lib\ folder of web
> >> project,
> >> and reference them
> >> somehow from code ?
> >>
> >> Of course I can test this by myself, and I will do this, but maybe
> >> somebody
> >> have experience
> >> with this problem, and it would be probably good info for others in this
> >> forum.
> >>
> >>
> > Hi, Dragoslav.  The python developers thought of this problem.
> >
> > http://docs.python.org/install/#alternate-installation-the-home-scheme
> >
> > Sean
> >
> >
>
>
>
> --
> Dragoslav Zaric
>
> Professional Programmer
> MSc Astrophysics
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From zaricdragoslav at gmail.com  Sun Sep 19 10:34:37 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 18:34:37 +0400
Subject: [Biopython] Trhird party library
In-Reply-To: <AANLkTinkXbB3uyy18ZPODKv2_4EPDWa5-0MB0EXSJBy7@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
	<AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>
	<AANLkTim70C45S8u5zhyhiDihGYg0-paTdzc-nVQPU+36@mail.gmail.com>
	<AANLkTinkXbB3uyy18ZPODKv2_4EPDWa5-0MB0EXSJBy7@mail.gmail.com>
Message-ID: <AANLkTiksp+WOsu5obF2cdTdOJweM4+UR3Tkq1s2ef+jf@mail.gmail.com>

Thanks Rodrigo,

I have come to the same conclusion after a little searching.
Also, hosting for Django is very common.

Kind regards

On Sun, Sep 19, 2010 at 5:59 PM, Rodrigo Faccioli <
rodrigo_faccioli at uol.com.br> wrote:

> Hi,
>
> I've worked with BioPython in web project. I've installed BioPython
> normally
> in our ubuntu server.
>
> My web project was developed its front-end in jsp. But I ran my scripts
> with
> BioPython. You can find this project in
> http://glu.fcfrp.usp.br:8180/prometheus/
>
> About the python frameworks, I've read Django is an excellent framework.
>
> Thanks in advance,
>
> --
> Rodrigo Antonio Faccioli
> Ph.D Student in Electrical Engineering
> University of Sao Paulo - USP
> Engineering School of Sao Carlos - EESC
> Department of Electrical Engineering - SEL
> Intelligent System in Structure Bioinformatics
> http://laips.sel.eesc.usp.br
> Phone: 55 (16) 3373-9366 Ext 229
> Curriculum Lattes - http://lattes.cnpq.br/1025157978990218
> Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5
>
>
> On Sun, Sep 19, 2010 at 9:11 AM, Dragoslav Zaric
> <zaricdragoslav at gmail.com>wrote:
>
> > Anyway,
> >
> > I will try simplest thing, to copy folder with biopython modules in some
> > folder of web app and access modules trough absolute path of web server,
> > this must work.
> >
> > At first I planned to use django web framework, but I recently discovered
> > there are
> > many python web frameworks. So i prefer most simplistic and effective
> > frameworks,
> > I will check out
> >
> > web.py
> >
> > it looks nice at first glance.
> >
> > Kind regards
> >
> > On Sun, Sep 19, 2010 at 3:02 PM, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
> >
> > >
> > >
> > > On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric <
> > zaricdragoslav at gmail.com
> > > > wrote:
> > >
> > >> Did anybody used biopython as third part library, like for example in
> > >> python
> > >> web project ?
> > >> I ask this because probably you can not expect to find or install
> > >> biopython
> > >> in provider server
> > >> environment.
> > >>
> > >> For example, after installing biopython in windows environment, you
> can
> > >> see
> > >> that biopython is
> > >> installed inside python 2.6 installation:
> > >>
> > >> C:\Python26\Lib\site-packages\Bio
> > >> C:\Python26\Lib\site-packages\BioSQL
> > >> C:\Python26\Lib\site-packages\numpy
> > >>
> > >> So can you copy these folders to, for example, \Lib\ folder of web
> > >> project,
> > >> and reference them
> > >> somehow from code ?
> > >>
> > >> Of course I can test this by myself, and I will do this, but maybe
> > >> somebody
> > >> have experience
> > >> with this problem, and it would be probably good info for others in
> this
> > >> forum.
> > >>
> > >>
> > > Hi, Dragoslav.  The python developers thought of this problem.
> > >
> > > http://docs.python.org/install/#alternate-installation-the-home-scheme
> > >
> > > Sean
> > >
> > >
> >
> >
> >
> > --
> > Dragoslav Zaric
> >
> > Professional Programmer
> > MSc Astrophysics


-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics

From j.reid at mail.cryst.bbk.ac.uk  Fri Sep  3 13:11:28 2010
From: j.reid at mail.cryst.bbk.ac.uk (John Reid)
Date: Fri, 03 Sep 2010 14:11:28 +0100
Subject: [Biopython] Wrong instance length bug in MEME parser
Message-ID: <i5qs60$9i9$1@dough.gmane.org>

Hi,

The MEME parser in biopython 1.55 seems to incorrectly set the length of 
the first instance of a motif to 0. Here is an example:

#Sequence, start, length, site
Motif: E-value: 0.000010
      seq_3,   213,     0, AGGTGACAGAG
      seq_1,   146,    11, AGGTGACAGAG
      seq_0,   490,    11, AGGTGACAGAG
      seq_0,    83,    11, AGGTGACAGAG
      seq_0,   388,    11, AGGAAACAGAG
      seq_1,   422,    11, AGGGGACAGAG
      seq_1,    79,    11, TGGAGACAGAG
      seq_0,   281,    11, TGGGGACAGAG
      seq_0,    16,    11, TAGAGACAGAG
      seq_1,   228,    11, TTGTGACAGAG
      seq_4,   156,    11, AGGGGACAGGG
      seq_0,   348,    11, AGGAAAGAGAA
      seq_0,   374,    11, AGGAATGAGAG
      seq_5,    22,    11, GGGAAACTGAG
      seq_3,   486,    11, AAGGGAGTGAG


Here's the code that generated the above:

from Bio.Motif.Parsers.MEME import MEMEParser
import cStringIO

meme_output = cStringIO.StringIO("""
********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.3.0 (Release date: Sat Sep 26 01:51:56 PDT 2009)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= /home/john/Data/Tompa-data-set/Real/hm22r.fasta
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
seq_0                    1.0000    500  seq_1                    1.0000    500
seq_2                    1.0000    500  seq_3                    1.0000    500
seq_4                    1.0000    500  seq_5                    1.0000    500
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme /home/john/Data/Tompa-data-set/Real/hm22r.fasta -maxsize 1000000 -oc output/run_dataset/Tompa/hm22r/Real -dna -mod anr -revcomp -print_starts -maxiter 1000 -minw 8 -maxw 20 -minsites 2 -nmotifs 1

model:  mod=           anr    nmotifs=         1    evt=           inf
object function=  E-value of product of p-values
width:  minw=            8    maxw=           20    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       30    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=      1000
         distance=    1e-05
data:   n=            3000    N=               6
strands: + -
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.195 C 0.305 G 0.305 T 0.195
Background letter frequencies (from dataset with add-one prior applied):
A 0.195 C 0.305 G 0.305 T 0.195
********************************************************************************


********************************************************************************
MOTIF  1    width =   11   sites =  15   llr = 159   E-value = 9.8e-006
********************************************************************************
--------------------------------------------------------------------------------
     Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  71:439:9:91
pos.-specific     C  ::::::8::::
probability       G  18a37:2:a19
matrix            T  31:3:1:1:::

          bits    2.4
                  2.1      *
                  1.9      * * *
                  1.6   *  * ***
Relative         1.4   *  * ****
Entropy          1.2 * *  * ****
(15.3 bits)      0.9 *** *******
                  0.7 ***********
                  0.5 ***********
                  0.2 ***********
                  0.0 -----------

Multilevel           AGGAGACAGAG
consensus            T  TA G
sequence                G

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name            Strand  Start   P-value               Site
-------------            ------  ----- ---------            -----------
seq_3                        -    213  4.54e-07 GGCCTTTGGA AGGTGACAGAG GCGCGGCCAC
seq_1                        -    146  4.54e-07 CCCAACAGGA AGGTGACAGAG GTGGCTCTGG
seq_0                        +    490  4.54e-07 AAAACAGCAG AGGTGACAGAG
seq_0                        -     83  4.54e-07 CCCAGCAGGA AGGTGACAGAG GTGGCTCTGG
seq_0                        +    388  5.99e-07 ATGAGAGGAG AGGAAACAGAG CTTCCTGGAC
seq_1                        +    422  1.10e-06 ATGAGAGGGG AGGGGACAGAG GACACCTGAA
seq_1                        +     79  1.33e-06 TTGGTGGTAC TGGAGACAGAG GGCTGGTCCC
seq_0                        +    281  3.17e-06 CCTCCCCTGA TGGGGACAGAG GTCTCATCAG
seq_0                        +     16  5.72e-06 CTGGTGACAC TAGAGACAGAG GGCTGGTCCC
seq_1                        -    228  1.18e-05 TTATTTTCCT TTGTGACAGAG AAACCCAGCA
seq_4                        +    156  2.07e-05 TCAAGTCCCA AGGGGACAGGG AGCAGAAGGG
seq_0                        +    348  2.47e-05 GTAGACAGAA AGGAAAGAGAA AGTAAGGACA
seq_0                        +    374  3.14e-05 GGACAAAGGT AGGAATGAGAG GAGAGGAAAC
seq_5                        -     22  4.53e-05 CTCTTGTGTA GGGAAACTGAG CACGGGGAAC
seq_3                        +    486  5.02e-05 CGCCAATGGG AAGGGAGTGAG TGCC
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
seq_3                               5e-05  212_[-1]_262_[+1]_4
seq_1                             1.2e-05  78_[+1]_56_[-1]_71_[-1]_183_[+1]_68
seq_0                             3.2e-06  15_[+1]_56_[-1]_187_[+1]_56_[+1]_
                                            15_[+1]_3_[+1]_91_[+1]
seq_4                             2.1e-05  155_[+1]_334
seq_5                             4.5e-05  21_[-1]_468
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=11 seqs=15
seq_3                    (  213) AGGTGACAGAG  1
seq_1                    (  146) AGGTGACAGAG  1
seq_0                    (  490) AGGTGACAGAG  1
seq_0                    (   83) AGGTGACAGAG  1
seq_0                    (  388) AGGAAACAGAG  1
seq_1                    (  422) AGGGGACAGAG  1
seq_1                    (   79) TGGAGACAGAG  1
seq_0                    (  281) TGGGGACAGAG  1
seq_0                    (   16) TAGAGACAGAG  1
seq_1                    (  228) TTGTGACAGAG  1
seq_4                    (  156) AGGGGACAGGG  1
seq_0                    (  348) AGGAAAGAGAA  1
seq_0                    (  374) AGGAATGAGAG  1
seq_5                    (   22) GGGAAACTGAG  1
seq_3                    (  486) AAGGGAGTGAG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 11 n= 2940 bayes= 6.7534 E= 9.8e-006
    177  -1055   -219     45
    -55  -1055    139   -155
  -1055  -1055    171  -1055
    103  -1055    -19     77
     45  -1055    127  -1055
    226  -1055  -1055   -155
  -1055    139    -61  -1055
    215  -1055  -1055    -55
  -1055  -1055    171  -1055
    226  -1055   -219  -1055
   -155  -1055    161  -1055
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 11 nsites= 15 E= 9.8e-006
  0.666667  0.000000  0.066667  0.266667
  0.133333  0.000000  0.800000  0.066667
  0.000000  0.000000  1.000000  0.000000
  0.400000  0.000000  0.266667  0.333333
  0.266667  0.000000  0.733333  0.000000
  0.933333  0.000000  0.000000  0.066667
  0.000000  0.800000  0.200000  0.000000
  0.866667  0.000000  0.000000  0.133333
  0.000000  0.000000  1.000000  0.000000
  0.933333  0.000000  0.066667  0.000000
  0.066667  0.000000  0.933333  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
     Motif 1 regular expression
--------------------------------------------------------------------------------
[AT]GG[ATG][GA]A[CG]AGAG
--------------------------------------------------------------------------------


Time  3.78 secs.

********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
     Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
seq_0                            4.45e-04  15_[+1(5.72e-06)]_56_[-1(4.54e-07)]_187_[+1(3.17e-06)]_56_[+1(2.47e-05)]_15_[+1(3.14e-05)]_3_[+1(5.99e-07)]_91_[+1(4.54e-07)]
seq_1                            4.45e-04  78_[+1(1.33e-06)]_56_[-1(4.54e-07)]_71_[-1(1.18e-05)]_183_[+1(1.10e-06)]_68
seq_2                            2.03e-01  500
seq_3                            4.45e-04  212_[-1(4.54e-07)]_262_[+1(5.02e-05)]_4
seq_4                            2.01e-02  155_[+1(2.07e-05)]_334
seq_5                            4.34e-02  21_[-1(4.53e-05)]_468
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 1 reached.
********************************************************************************

CPU: john-dell

********************************************************************************
""")


parser = MEMEParser()
parsed = parser.parse(meme_output)

print '#Sequence, start, length, site'
for motif in parsed.motifs:
     print 'Motif: E-value: %f' % motif.evalue
     for instance in motif.instances:
         print "%10s, %5d, %5d, %s" % (
             instance.sequence_name,
             instance.start,
             instance.length,
             str(instance),
         )
         #assert instance.length == motif.length


From biopython at maubp.freeserve.co.uk  Fri Sep  3 13:44:27 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 3 Sep 2010 14:44:27 +0100
Subject: [Biopython] Wrong instance length bug in MEME parser
In-Reply-To: <i5qs60$9i9$1@dough.gmane.org>
References: <i5qs60$9i9$1@dough.gmane.org>
Message-ID: <AANLkTikT0JSLc6X+qXAGezFnjh+=+YT1kmsLUhRB8N0s@mail.gmail.com>

On Fri, Sep 3, 2010 at 2:11 PM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:
> Hi,
>
> The MEME parser in biopython 1.55 seems to incorrectly set the length of the
> first instance of a motif to 0. Here is an example:
> ...

Could you file a bug with all that useful information?
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

Thanks,

Peter


From bartek at rezolwenta.eu.org  Fri Sep  3 14:52:32 2010
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 3 Sep 2010 16:52:32 +0200
Subject: [Biopython] Wrong instance length bug in MEME parser
In-Reply-To: <i5qs60$9i9$1@dough.gmane.org>
References: <i5qs60$9i9$1@dough.gmane.org>
Message-ID: <AANLkTikDo7dwWM8E8VpcW3a9JZQ+ApFkC4-i8SBSq_KO@mail.gmail.com>

On Fri, Sep 3, 2010 at 3:11 PM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:

> Hi,
>
> The MEME parser in biopython 1.55 seems to incorrectly set the length of
> the first instance of a motif to 0. Here is an example:
>
> /cut/
Hi,

Thanks for reporting the bug. It is now fixed in the main branch. It was a
small change; you can see the diff here:
http://github.com/biopython/biopython/commit/102ad30a8c5d8bd87847000b33f771b40143e743

I'm closing the bug now, if you find anything else, please let us know.

thanks
Bartek


From mitlox at op.pl  Mon Sep  6 13:24:14 2010
From: mitlox at op.pl (xyz)
Date: Mon, 06 Sep 2010 23:24:14 +1000
Subject: [Biopython] reading two fastq files at the same time
Message-ID: <4C84EB7E.90200@op.pl>

Hi,
How is it possible to read two fastq files at the same time in 
BioPython? I have the following BioRuby example:

require 'bio'

begin
   fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
   fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])

   while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)

     fastq_A1 = entry1.entry_id
     fastq_A2 = entry1.seq

     fastq_B1 = entry2.entry_id
     fastq_B2 = entry2.seq
   end

rescue => err
   raise "Exception: #{err}"
end

Thank you in advance.


From biopython at maubp.freeserve.co.uk  Mon Sep  6 13:51:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Sep 2010 14:51:13 +0100
Subject: [Biopython] reading two fastq files at the same time
In-Reply-To: <4C84EB7E.90200@op.pl>
References: <4C84EB7E.90200@op.pl>
Message-ID: <AANLkTimD8at8N9y0pYhu+SQuFiGZ7Xh-72hj+9xhgnn5@mail.gmail.com>

On Mon, Sep 6, 2010 at 2:24 PM, xyz <mitlox at op.pl> wrote:
> Hi,
> How is it possible to read two fastq files at the same time in BioPython? I
> have the following BioRuby example:
>
> require 'bio'
>
> begin
>   fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
>   fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])
>
>   while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)
>
>     fastq_A1 = entry1.entry_id
>     fastq_A2 = entry1.seq
>
>     fastq_B1 = entry2.entry_id
>     fastq_B2 = entry2.seq
>   end
>
> rescue => err
>   raise "Exception: #{err}"
> end
>
> Thank you in advance.

Hi,

If you are using Python 2.6+ then probably itertools.izip_longest
would do what you want. You could use itertools.izip but this
won't catch the error condition when one file has more records
than the other.
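
For what it's worth, the izip_longest idea might look roughly like this.
This is only a sketch: it is written for Python 3, where the function is
spelled itertools.zip_longest (on Python 2.6+ it is itertools.izip_longest),
and the helper name "paired" and the file names in the usage comment are
made up for illustration.

```python
from itertools import zip_longest  # itertools.izip_longest on Python 2.6+

# Unique sentinel used to pad the shorter iterator, so that a genuine
# record can never be mistaken for "no record".
_MISSING = object()


def paired(iter1, iter2):
    """Yield (rec1, rec2) pairs; raise ValueError if the counts differ."""
    for rec1, rec2 in zip_longest(iter1, iter2, fillvalue=_MISSING):
        if rec1 is _MISSING or rec2 is _MISSING:
            # One file ran out of records before the other.
            raise ValueError("Diff record count")
        yield rec1, rec2


# Usage with Biopython (file names are placeholders):
# from Bio import SeqIO
# for rec1, rec2 in paired(SeqIO.parse("reads_1.fastq", "fastq"),
#                          SeqIO.parse("reads_2.fastq", "fastq")):
#     print(rec1.id, rec1.seq, rec2.id, rec2.seq)
```

Because both arguments are iterators, only one record from each file is
held in memory at a time.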

Alternatively you could use something like this,

from Bio import SeqIO
iter1 = SeqIO.parse(filename1, "fastq")
iter2 = SeqIO.parse(filename2, "fastq")
while True:
    try:
        rec1 = iter1.next()
    except StopIteration:
        rec1 = None
    try:
        rec2 = iter2.next()
    except StopIteration:
        rec2 = None
    if rec1 is None and rec2 is None:
        break #end of both files
    elif rec1 is None or rec2 is None:
        raise ValueError("Diff record count")
    else:
        print rec1.seq, rec1.id
        print rec2.seq, rec2.id

I haven't tested that, but it is based on a similar example in
Bio.SeqIO.QualityIO.PairedFastaQualIterator for a paired
FASTA and QUAL file.

Peter


From biopython at maubp.freeserve.co.uk  Thu Sep  9 17:13:34 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Sep 2010 18:13:34 +0100
Subject: [Biopython] Deprecating Bio.GenBank.LocationParser and
	Bio.Parsers?
In-Reply-To: <AANLkTi=ZEDV3GZ=9f0SweQen9+dToYfe0xhDvkJ5N1e6@mail.gmail.com>
References: <AANLkTi=ZEDV3GZ=9f0SweQen9+dToYfe0xhDvkJ5N1e6@mail.gmail.com>
Message-ID: <AANLkTinnSm_xkbGgcn6G5vhw51cVr8T5hYXc0X-D_hM-@mail.gmail.com>

On Wed, Sep 1, 2010 at 5:52 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hello all,
>
> One of the improvements in Biopython 1.55 was a re-written location
> parser for Bio.GenBank (which also covers EMBL parsing). This made
> parsing much faster, and also meant Bio.GenBank.LocationParser
> and the underlying Bio.Parsers and Bio.Parsers.spark modules were
> obsolete. I'd like to mark these as deprecated in the next release:
>
> * Bio.GenBank.LocationParser
> * Bio.Parsers (including Bio.Parsers.spark)
>
> Would this cause anyone a problem?
>
> Thanks,

I've just added the deprecation warnings to the code, ready for
Biopython 1.56 - it is not too late to undo this if anyone is using
this code, but you need to tell us.
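
If you are not sure whether your own scripts touch deprecated code, one way
to check is to record warnings while running them. The following is a
generic sketch using the standard library warnings module (Python 3 syntax);
the function old_location_parser is a hypothetical stand-in, not the real
Biopython code, and the warning category the real modules emit is assumed
here to be DeprecationWarning.

```python
import warnings


def old_location_parser():
    """Hypothetical stand-in for a call into a deprecated module."""
    warnings.warn("this module is deprecated", DeprecationWarning,
                  stacklevel=2)
    return "parsed"


def deprecations_raised(func):
    """Run func() and return the messages of any DeprecationWarning seen."""
    with warnings.catch_warnings(record=True) as caught:
        # Deprecation warnings are often hidden by default, so switch
        # every warning on while func() runs.
        warnings.simplefilter("always")
        func()
    return [str(w.message) for w in caught
            if issubclass(w.category, DeprecationWarning)]


print(deprecations_raised(old_location_parser))
```

Running a whole script with "python -W always script.py" has a similar
effect without changing any code.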

Peter


From margeemail at gmail.com  Fri Sep 10 04:10:23 2010
From: margeemail at gmail.com (mailing list)
Date: Fri, 10 Sep 2010 00:10:23 -0400
Subject: [Biopython] Added Biopython to my web tool
Message-ID: <AANLkTi=N9EvkTvtULg7z_GzA1OZfdrkWTbbr=nTQfW0E@mail.gmail.com>

I made a web application (http://utilitymill.com) that lets people make
online utilities with Python.  I thought you guys might appreciate that I
added Biopython as a built-in library users can use in their utilities.

Here's an example of a utility using Biopython:
http://utilitymill.com/utility/RNA_Transcription  (It's very simple, I just
wanted to try it out.)

I'm curious to know if it's useful to you guys.  And I'm also hoping I
installed everything correctly, so let me know if anything doesn't work.

-Greg


From mjldehoon at yahoo.com  Sat Sep 11 05:29:32 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 10 Sep 2010 22:29:32 -0700 (PDT)
Subject: [Biopython] Parsing XML returned by efetch from the Journals
	database
Message-ID: <825067.71139.qm@web62404.mail.re1.yahoo.com>

Dear users,

The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities as long as the corresponding DTD is available (the DTDs are included with each release of Biopython). One corner case is efetch results from the Journals database. Officially, efetch from the Journals database does not generate output in the XML format, but only plain text or HTML. However, when requesting XML explicitly from Entrez, in practice it does return an XML-like output. Our parser in Bio.Entrez is able to parse this XML, but it requires several hacks in the parser code.

To make the parser more stable for other XML documents, I'd like to remove these hacks. Is anybody currently using Bio.Entrez to parse XML returned by efetch from the Journals database?

--Michiel.


From natassa_g_2000 at yahoo.com  Mon Sep 13 16:22:26 2010
From: natassa_g_2000 at yahoo.com (natassa)
Date: Mon, 13 Sep 2010 09:22:26 -0700 (PDT)
Subject: [Biopython] Codeml parser in Biopython?
Message-ID: <533513.93597.qm@web52005.mail.re2.yahoo.com>

Hello, 
I was wondering if there is a Biopython solution for parsing codeml results
from PAML. The output files are pretty standard, so such a parser should be
quite straightforward to write. I'd volunteer for this, but thought I might
check first whether somebody else has already done it. Actually, I found a
read-only pypaml interface on Google Code, tried it out, and realized I had
to edit several things even to import it (in Python 2.5), which is quite
strange: it was mainly built-in methods that threw errors. Anyway, I
'corrected' this and then realized that the output files assumed by this code
may not be the same as mine, although, again, the outputs of codeml are
pretty standard. I am not sure how much this code is used, and I could not
find the developer's email to ask some questions.

I am interested in parsing outputs from Branch, Site and BranchSite models, so
everything that codeml can do. Any information from experienced users is welcome!
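
To give an idea of what I mean by straightforward, here is a rough sketch
that pulls a few fields out of the header of a codeml output file. It is
not a real parser: the function name parse_codeml_header is invented, the
regular expressions are guesses based only on the sample header lines
embedded below (taken from the example output posted later in this thread),
and other models or PAML versions may well lay the header out differently.

```python
import re


def parse_codeml_header(text):
    """Extract a few header fields from codeml output (illustrative only)."""
    header = {}
    # e.g. "CODONML (in paml version 4.4, January 2010)  align.phy"
    match = re.search(r"paml version\s+([^,\s)]+)", text)
    if match:
        header["version"] = match.group(1)
    # e.g. "Model: One dN/dS ratio for branches"
    match = re.search(r"^Model:\s*(.+)$", text, re.MULTILINE)
    if match:
        header["model"] = match.group(1).strip()
    # e.g. "ns =   7  ls = 861"
    match = re.search(r"ns\s*=\s*(\d+)\s+ls\s*=\s*(\d+)", text)
    if match:
        header["ns"] = int(match.group(1))
        header["ls"] = int(match.group(2))
    return header


sample = """seed used = 808671289
CODONML (in paml version 4.4, January 2010)  align.phy
Model: One dN/dS ratio for branches
Codon frequency model: F3x4
Site-class models:  NearlyNeutral
ns =   7  ls = 861
"""

print(parse_codeml_header(sample))
```

The harder part would be the model-specific result sections, which this
sketch does not attempt.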
Thanks, 
Anastasia Gioti


From biopython at maubp.freeserve.co.uk  Mon Sep 13 16:45:28 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 13 Sep 2010 17:45:28 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <533513.93597.qm@web52005.mail.re2.yahoo.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
Message-ID: <AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>

On Mon, Sep 13, 2010 at 5:22 PM, natassa <natassa_g_2000 at yahoo.com> wrote:
> Hello,
> I was wondering if there is a Biopython solution to parsing codeml results from
> paml. the output files are pretty standard, so such a parser should be quite
> straightforward to write up. I d volunteer for this, but thought I might check
> first if somebody else has done this. Actually, I found a read-only pypaml
> interface in google codes, tried it out and realized I had to edit several
> things to even import it (in python 2.5), which is quite strange: It was mainly
> built-in methods that throwed errors..Anyway, i 'corrected' this and then
> realized that the output files assumed by this code may not be the same as mine,
> although again, the outputs of codeml are pretty standard. I am not sure how
> much this code is used and was not sure what is the developper's email to ask
> him some questions.
>
> I am interested in parsing outputs from Branch, Site and BranchSite models, so
> everthing that codeml can do. Any information by experienced users is welcome!
> Thanks,
> Anastasia Gioti

Hi Anastasia,

Could you post a short example of the kind of output you are looking at?

Can you get codeml to output what you need in another format, such as NEXUS?

Peter


From p.j.a.cock at googlemail.com  Mon Sep 13 20:40:30 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 13 Sep 2010 21:40:30 +0100
Subject: [Biopython] Fwd: problems searching swiss prot
In-Reply-To: <a06240806c8b41d461cfc@131.229.113.228>
References: <mailman.4871.1284059429.3031.biopython@lists.open-bio.org>
	<a06240806c8b41d461cfc@131.229.113.228>
Message-ID: <AANLkTikT9rjLwtbfEJTOUVpLNbQmcXSFf5R6cayi+Gkf@mail.gmail.com>

Forwarding a query from Jessica Grant since she appears
to have had trouble posting to the mailing list.

Jessica wrote:

> Hello,
>
> I am running a few scripts to try to extract sequence information
> out of uniprot. One program called AutoFACT gives me ID numbers
> associated with that database. Most of these look like this:
>
> D2V5S4_NAEGR
> Q48KU2_PSE14
> Q22B72_TETTH
>
>
> and my downstream scripts, which are written in biopython, are
> fine with this. Then, every once in a while, a sequence will come
> back with a name that looks like this:
>
> UPI00006CC162
>
> and everything goes bad. My script can't handle these names,
> apparently, although if I go to uniprot.org and search for it, the
> sequence comes up.
>
> My script uses the following, where RepID is the number
> extracted from AutoFACT:
>
>         handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>         seq_record = SeqIO.read(handle, "swiss")
>
> Any thoughts?
>
> Thank you,
>
> Jessica

Hi Jessica,

I think the problem is that these unusual identifiers are
not UniProt/SwissProt accession identifiers. The URL
this Biopython function uses was originally from
www.expasy.ch but is now on www.uniprot.org as
described here:

http://www.expasy.ch/expasy_urls.html

I think the ID UPI00006CC162 is a UniProt ID of some
kind, so it may be possible to access the information
you want somehow. See for example:

http://www.uniprot.org/uniparc/UPI00006CC162

However, it is not clear to me right away if you can get
this record back as a plain text "swiss" format entry...

Peter


From natassa_g_2000 at yahoo.com  Tue Sep 14 08:02:18 2010
From: natassa_g_2000 at yahoo.com (natassa)
Date: Tue, 14 Sep 2010 01:02:18 -0700 (PDT)
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
Message-ID: <937371.74873.qm@web52006.mail.re2.yahoo.com>

Hi Peter, 


> Could you post a short example of the kind of output you are looking at?

Here is an example output, but this can differ depending on the model used
(there are several models for Branch, Site, BranchSite, but all are pretty
standard)


-------------------------------------------------------------------------------OUTPUT-------------------------


seed used = 808671289
CODONML (in paml version 4.4, January 2010)  align.phy
Model: One dN/dS ratio for branches
Codon frequency model: F3x4
Site-class models:  NearlyNeutral
ns =   7  ls = 861

Codon usage in sequences
--------------------------------------------------------------------------------------------------------------

Phe TTT 12 14 15 14 14 12 | Ser TCT  6 11 12  8 10  6 | Tyr TAT  5  5  4  7  9  5 | Cys TGT 11  8 10  9 11  8
    TTC 23 18 18 20 20 20 |     TCC 16 13 16 19 16 18 |     TAC 11 12 13 17 11 13 |     TGC  6  2  6  6  4  6
Leu TTA  8  5  6  5  4  2 |     TCA 17 16 18 20 21 15 | *** TAA  0  0  0  0  0  0 | *** TGA  0  0  0  0  0  0
    TTG 13 11 11 15 15 17 |     TCG 17 14 14 17 17 18 |     TAG  0  0  0  0  0  0 | Trp TGG  9  8  8 11  8  7
--------------------------------------------------------------------------------------------------------------

Leu CTT 13 15 16 11 12 16 | Pro CCT  7  7 10  6 10  8 | His CAT  8  7  8  4  6  5 | Arg CGT  6  4  5  4  5  5
    CTC 14 14 13 19 14 15 |     CCC 20 13 16 24 19 20 |     CAC 23 18 22 20 24 17 |     CGC 14 13 15 14 14 15
    CTA  6  4  8  7  6  9 |     CCA 19 18 19 11 17 16 | Gln CAA 20 16 20 21 18 13 |     CGA  8  4  6  5  6  6
    CTG 17 17 14 14 16 10 |     CCG  7  8  8  9  6  8 |     CAG 18 14 15 14 14 13 |     CGG  7  7  8  9  9  8
--------------------------------------------------------------------------------------------------------------

Ile ATT  6  7  9  5  7  6 | Thr ACT  5  7  7  7  5  4 | Asn AAT  3  3  4  2  5  2 | Ser AGT  7  7  9  8  7  7
    ATC 16 13 15 23 14 16 |     ACC 21 14 17 20 20 16 |     AAC 12 14 14 21 14 11 |     AGC 14 13 14 15 11 10
    ATA 13  9 10 11 11 10 |     ACA 19 17 22 22 28 18 | Lys AAA 17  8 13  9 13 12 | Arg AGA 11  5  8  4  6  5
Met ATG 23 21 23 22 23 20 |     ACG 11 12 12 12 14 13 |     AAG 18 15 19 19 18 18 |     AGG  9 10 13 14 12 13
--------------------------------------------------------------------------------------------------------------

Val GTT  8 13 10 10 10  6 | Ala GCT 13 10 12 12 14 13 | Asp GAT 18 18 17 15 15 17 | Gly GGT 13  7 12 10 11 10
    GTC 18 13 18 20 19 21 |     GCC 28 26 28 28 28 23 |     GAC 29 21 26 33 29 30 |     GGC  9  9  8  7 12  8
    GTA  8  8  9  7  6  7 |     GCA 22 22 24 17 23 19 | Glu GAA 27 24 24 27 21 22 |     GGA  7  7 10  9  7  9
    GTG 13 11 14 13 13  9 |     GCG 11 10 10 10  7  7 |     GAG 14 14 17 13 19 17 |     GGG  7  6  9  8  7  9
--------------------------------------------------------------------------------------------------------------


--------------------------------------------------
Phe TTT 12 | Ser TCT  8 | Tyr TAT  6 | Cys TGT  8
    TTC 22 |     TCC 18 |     TAC 15 |     TGC  6
Leu TTA  5 |     TCA 22 | *** TAA  0 | *** TGA  0
    TTG 17 |     TCG 17 |     TAG  0 | Trp TGG  9
--------------------------------------------------
Leu CTT 14 | Pro CCT 12 | His CAT  5 | Arg CGT  6
    CTC 19 |     CCC 20 |     CAC 20 |     CGC 13
    CTA 10 |     CCA 16 | Gln CAA 17 |     CGA  5
    CTG  8 |     CCG 11 |     CAG 15 |     CGG  8
--------------------------------------------------
Ile ATT  5 | Thr ACT  4 | Asn AAT  4 | Ser AGT  7
    ATC 20 |     ACC 21 |     AAC 14 |     AGC 12
    ATA 11 |     ACA 29 | Lys AAA 11 | Arg AGA  4
Met ATG 25 |     ACG 15 |     AAG 23 |     AGG 13
--------------------------------------------------
Val GTT 10 | Ala GCT 13 | Asp GAT 16 | Gly GGT  7
    GTC 18 |     GCC 26 |     GAC 33 |     GGC 11
    GTA  7 |     GCA 24 | Glu GAA 23 |     GGA 11
    GTG 10 |     GCG  8 |     GAG 15 |     GGG 11
--------------------------------------------------

Codon position x base (3x4) table for each sequence.

#1: species1       
position  1:    T:0.18989    C:0.25524    A:0.25277    G:0.30210
position  2:    T:0.26017    C:0.29470    A:0.27497    G:0.17016
position  3:    T:0.17386    C:0.33785    A:0.24908    G:0.23921
Average         T:0.20797    C:0.29593    A:0.25894    G:0.23716

#2: species2         
position  1:    T:0.19296    C:0.25211    A:0.24648    G:0.30845
position  2:    T:0.27183    C:0.30704    A:0.26620    G:0.15493
position  3:    T:0.20141    C:0.31831    A:0.22958    G:0.25070
Average         T:0.22207    C:0.29249    A:0.24742    G:0.23803

#3: species3    
position  1:    T:0.18619    C:0.25031    A:0.25771    G:0.30580
position  2:    T:0.25771    C:0.30210    A:0.26634    G:0.17386
position  3:    T:0.19729    C:0.31936    A:0.24291    G:0.24044
Average         T:0.21373    C:0.29059    A:0.25565    G:0.24003

#4: species4   
position  1:    T:0.20664    C:0.23616    A:0.26322    G:0.29397
position  2:    T:0.26568    C:0.29766    A:0.27306    G:0.16359
position  3:    T:0.16236    C:0.37638    A:0.21525    G:0.24600
Average         T:0.21156    C:0.30340    A:0.25051    G:0.23452

#5: species5       
position  1:    T:0.19876    C:0.24348    A:0.25839    G:0.29938
position  2:    T:0.25342    C:0.31677    A:0.26832    G:0.16149
position  3:    T:0.18758    C:0.33416    A:0.23230    G:0.24596
Average         T:0.21325    C:0.29814    A:0.25300    G:0.23561

#6: species6      
position  1:    T:0.19892    C:0.24899    A:0.24493    G:0.30717
position  2:    T:0.26522    C:0.30041    A:0.26387    G:0.17050
position  3:    T:0.17591    C:0.35047    A:0.22057    G:0.25304
Average         T:0.21335    C:0.29995    A:0.24312    G:0.24357

#7: species7      
position  1:    T:0.20000    C:0.24121    A:0.26424    G:0.29455
position  2:    T:0.25818    C:0.32000    A:0.26303    G:0.15879
position  3:    T:0.16606    C:0.34909    A:0.23636    G:0.24848
Average         T:0.20808    C:0.30343    A:0.25455    G:0.23394

Sums of codon usage counts
------------------------------------------------------------------------------
Phe F TTT      93 | Ser S TCT      61 | Tyr Y TAT      41 | Cys C TGT      65
      TTC     141 |       TCC     116 |       TAC      92 |       TGC      36
Leu L TTA      35 |       TCA     129 | *** * TAA       0 | *** * TGA       0
      TTG      99 |       TCG     114 |       TAG       0 | Trp W TGG      60
------------------------------------------------------------------------------
Leu L CTT      97 | Pro P CCT      60 | His H CAT      43 | Arg R CGT      35
      CTC     108 |       CCC     132 |       CAC     144 |       CGC      98
      CTA      50 |       CCA     116 | Gln Q CAA     125 |       CGA      40
      CTG      96 |       CCG      57 |       CAG     103 |       CGG      56
------------------------------------------------------------------------------
Ile I ATT      45 | Thr T ACT      39 | Asn N AAT      23 | Ser S AGT      52
      ATC     117 |       ACC     129 |       AAC     100 |       AGC      89
      ATA      75 |       ACA     155 | Lys K AAA      83 | Arg R AGA      43
Met M ATG     157 |       ACG      89 |       AAG     130 |       AGG      84
------------------------------------------------------------------------------
Val V GTT      67 | Ala A GCT      87 | Asp D GAT     116 | Gly G GGT      70
      GTC     127 |       GCC     187 |       GAC     201 |       GGC      64
      GTA      52 |       GCA     151 | Glu E GAA     168 |       GGA      60
      GTG      83 |       GCG      63 |       GAG     109 |       GGG      57
------------------------------------------------------------------------------

(Ambiguity data are not used in the counts.)


Codon position x base (3x4) table, overall

position  1:    T:0.19623    C:0.24664    A:0.25571    G:0.30141
position  2:    T:0.26152    C:0.30559    A:0.26804    G:0.16485
position  3:    T:0.18027    C:0.34113    A:0.23250    G:0.24610
Average         T:0.21267    C:0.29779    A:0.25209    G:0.23746


Nei & Gojobori 1986. dN/dS (dN, dS)
(Pairwise deletion)
(Note: This matrix is not used in later ML. analysis.
Use runmode = -2 for ML pairwise comparison.)

species1            
species2               0.2598 (0.0599 0.2306)
species3          0.2532 (0.0528 0.2085) 0.2778 (0.0189 0.0680)
species4         0.2815 (0.1116 0.3966) 0.1905 (0.0738 0.3873) 0.2555 (0.0981 0.3838)
species5             0.2780 (0.0654 0.2351) 0.2611 (0.0631 0.2419) 0.2487 (0.0552 0.2221) 0.2993 (0.0908 0.3034)
species6            0.2041 (0.0693 0.3396) 0.1785 (0.0613 0.3437) 0.2147 (0.0644 0.2997) 0.2510 (0.0598 0.2384) 0.2261 (0.0511 0.2260)
species7            0.2374 (0.0890 0.3748) 0.2080 (0.0819 0.3935) 0.2272 (0.0787 0.3465) 0.2415 (0.0676 0.2797) 0.2646 (0.0731 0.2764) 0.1821 (0.0176 0.0967)


TREE #  1:  (((1, (2, 3)), 5), (6, 4), 7);   MP score: -1
lnL(ntime: 11  np: 14):  -7469.732728      +0.000000
   8..9     9..10   10..1    10..11   11..2    11..3     9..5     8..12   12..6    12..4     8..7

 0.179837 0.082919 0.172587 0.087525 0.067422 0.032013 0.124010 0.001030 0.062291 0.297695 0.117429 2.800021 0.731929 0.083728

Note: Branch length is defined as number of nucleotide substitutions per codon (not per neucleotide site).

tree length =   1.22476

(((1: 0.172587, (2: 0.067422, 3: 0.032013): 0.087525): 0.082919, 5: 0.124010): 0.179837, (6: 0.062291, 4: 0.297695): 0.001030, 7: 0.117429);

(((species1: 0.172587, (species2: 0.067422, species3: 0.032013): 0.087525): 0.082919, species5: 0.124010): 0.179837, (species6: 0.062291, species4: 0.297695): 0.001030, species7: 0.117429);

Detailed output identifying parameters

kappa (ts/tv) =  2.80002


dN/dS (w) for site classes (K=2)

p:   0.73193  0.26807
w:   0.08373  1.00000

dN & dS for each branch

 branch          t       N       S   dN/dS      dN      dS  N*dN  S*dS

   8..9       0.180   1857.3    725.7   0.3294   0.0381   0.1158   70.8   84.0
   9..10      0.083   1857.3    725.7   0.3294   0.0176   0.0534   32.7   38.7
  10..1       0.173   1857.3    725.7   0.3294   0.0366   0.1111   68.0   80.6
  10..11      0.088   1857.3    725.7   0.3294   0.0186   0.0563   34.5   40.9
  11..2       0.067   1857.3    725.7   0.3294   0.0143   0.0434   26.6   31.5
  11..3       0.032   1857.3    725.7   0.3294   0.0068   0.0206   12.6   15.0
   9..5       0.124   1857.3    725.7   0.3294   0.0263   0.0798   48.8   57.9
   8..12      0.001   1857.3    725.7   0.3294   0.0002   0.0007    0.4    0.5
  12..6       0.062   1857.3    725.7   0.3294   0.0132   0.0401   24.5   29.1
  12..4       0.298   1857.3    725.7   0.3294   0.0631   0.1917  117.2  139.1
   8..7       0.117   1857.3    725.7   0.3294   0.0249   0.0756   46.2   54.9


Time used:  0:10

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

> Can you get codeml to output what you need in another format, such as NEXUS?

Haven't tried that, but as you can see, this is a very verbose output and NEXUS 
does not seem an option. 

Ultimately, I want to parse this to get all the information I need in a 
tabulated file. I am still working out what exactly I need (there are standard 
values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type 
of downstream analysis). I will now work on the pypaml class and modify the 
original code to make it more generic (it seems that it only works for Site 
Models). 

Will let you know, was just wondering if there was already a solution. There is 
one in Bioperl, but heard it is very slow and in any case, I don't understand 
much of perl....
Thanks, 
Anastasia


From biopython at maubp.freeserve.co.uk  Tue Sep 14 09:04:56 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Sep 2010 10:04:56 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <937371.74873.qm@web52006.mail.re2.yahoo.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
	<937371.74873.qm@web52006.mail.re2.yahoo.com>
Message-ID: <AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>

Hi Anastasia,

On Tue, Sep 14, 2010 at 9:02 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
> Hi Peter,
>
>>
>> Could you post a short example of the kind of output you are looking at?
>>
>
> Here is an example output, but this can differ depending on the model used
> (there are several models for Branch, Site, BranchSite, but all are pretty
> standard)
>

Thanks - that looks possible to parse, but not very easy (especially if the
codeml output changes slightly between versions).
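
As a rough illustration of what pulling a few of the summary values out
might look like, here is a minimal sketch using regular expressions. The
function name parse_codeml_summary and the patterns are my own assumptions
based only on the sample output above (paml 4.4); other models or versions
may lay these lines out differently, so treat it as a starting point rather
than a real parser.

```python
import re

# Patterns for a few summary lines of codeml output. The layout is assumed
# from the single sample in this thread and may differ between PAML versions.
LNL_RE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)")
KAPPA_RE = re.compile(r"kappa \(ts/tv\)\s*=\s*(\d+\.\d+)")
TREE_LENGTH_RE = re.compile(r"tree length\s*=\s*(\d+\.\d+)")


def parse_codeml_summary(text):
    """Return a dict of the summary values found in codeml output text."""
    result = {}
    match = LNL_RE.search(text)
    if match:
        result["ntime"] = int(match.group(1))
        result["np"] = int(match.group(2))
        result["lnL"] = float(match.group(3))
    match = KAPPA_RE.search(text)
    if match:
        result["kappa"] = float(match.group(1))
    match = TREE_LENGTH_RE.search(text)
    if match:
        result["tree_length"] = float(match.group(1))
    # Site-class proportions and omegas appear on lines starting "p:" and "w:".
    for key in ("p", "w"):
        match = re.search(r"^%s:\s+(.*)$" % key, text, re.MULTILINE)
        if match:
            result[key] = [float(x) for x in match.group(1).split()]
    return result


# A few lines copied from the sample output earlier in this thread.
sample = """\
lnL(ntime: 11  np: 14):  -7469.732728      +0.000000
tree length =   1.22476
kappa (ts/tv) =  2.80002
dN/dS (w) for site classes (K=2)

p:   0.73193  0.26807
w:   0.08373  1.00000
"""

summary = parse_codeml_summary(sample)
print(summary["lnL"], summary["kappa"], summary["p"], summary["w"])
```

The resulting dict could be written out as one row of a tab-separated file
per codeml run. Values that are missing from a given output are simply left
out of the dict, which is one way to cope with differences between models.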

>>
>> Can you get codeml to output what you need in another format, such as NEXUS?
>>
>
> Haven't tried that, but as you can see, this is a very verbose output and
> NEXUS does not seem an option.

At first glance, the NEXUS format could hold a lot of that information.
Another possibility might be phyloXML. However, you are at the mercy
of the codeml tool and what it supports. It might be worth politely asking
the author(s) about supporting one of these more standard formats as
an optional output.

> Ultimately, I want to parse this to get all the information I need in a
> tabulated file. I am still working out what exactly I need (there are standard
> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type
> of downstream analysis). I will now work on the pypaml class and modify the
> original code to make it more generic (it seems that it only works for Site
> Models).

Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so
unless he agrees to re-license it we cannot include it in Biopython.

> Will let you know, was just wondering if there was already a solution. There is
> one in Bioperl, but heard it is very slow and in any case, I don't understand
> much of perl....

I don't know much Perl either ;)

Peter


From p.j.a.cock at googlemail.com  Tue Sep 14 09:13:04 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 10:13:04 +0100
Subject: [Biopython] problems searching swiss prot
In-Reply-To: <AANLkTikT9rjLwtbfEJTOUVpLNbQmcXSFf5R6cayi+Gkf@mail.gmail.com>
References: <mailman.4871.1284059429.3031.biopython@lists.open-bio.org>
	<a06240806c8b41d461cfc@131.229.113.228>
	<AANLkTikT9rjLwtbfEJTOUVpLNbQmcXSFf5R6cayi+Gkf@mail.gmail.com>
Message-ID: <AANLkTim9Cs96dbS_1cD7pfNt3g77koUkHm5Si-4gZANs@mail.gmail.com>

On Mon, Sep 13, 2010 at 9:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Forwarding a query from Jessica Grant since she appears
> to have had trouble posting to the mailing list.
>
> Jessica wrote:
>
>> Hello,
>>
>> I am running a few scripts to try to extract sequence information
>> out of uniprot.  One program called AutoFACT gives me ID numbers
>> associated with that database.  Most of these look like this:
>>
>> D2V5S4_NAEGR
>> Q48KU2_PSE14
>> Q22B72_TETTH
>>
>>
>> and my downstream scripts, which are written in biopython, are
>> fine with this.  Then, every once in a while, a sequence will come
>> back with a name that looks like this:
>>
>> UPI00006CC162
>>
>> and everything goes bad.  My script can't handle these names,
>> apparently, although if I go to uniprot.org and search for it, the
>> sequence comes up.
>>
>> My script uses the following, where RepID is the number
>> extracted from AutoFACT:
>>
>>        handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>>        seq_record = SeqIO.read(handle, "swiss")
>>
>> Any thoughts?
>>
>> Thank you,
>>
>> Jessica
>
> Hi Jessica,
>
> I think the problem is that these unusual identifiers are
> not UniProt/SwissProt accession identifiers. The URL
> this Biopython function uses was originally from
> www.expasy.ch but is now on www.uniprot.org as
> described here:
>
> http://www.expasy.ch/expasy_urls.html
>
> I think the ID UPI00006CC162 is a UniProt ID of some
> kind, so it may be possible to access the information
> you want somehow. See for example:
>
> http://www.uniprot.org/uniparc/UPI00006CC162
>
> However, it is not clear to me right away if you can get
> this record back as a plain text "swiss" format entry...
>
> Peter

Jessica replied (off list), to say:

>> Oh, and I got a great help from someone at Uniprot for my
>> previous question...turns out you can get the sequences
>> downloaded as fasta files:
>>
>> http://www.uniprot.org/uniparc/UPI00006CC162.fasta
>>
>> and I could then read them into SeqIO as a fasta and
>> manipulate them that way.

I guess the UPI at the start stands for Uni Parc Identifier.

Note that the page I linked to earlier has links to several
file formats including FASTA, but not plain text "SwissProt"
format: http://www.uniprot.org/uniparc/UPI00006CC162
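
A minimal sketch of how a script might cope with both kinds of identifier,
based on the FASTA route Jessica describes. The helper names (is_uniparc_id,
uniparc_fasta_url, fetch_record) are made up for illustration, and the
pattern "UPI followed by ten hexadecimal characters" is an assumption about
UniParc identifiers. The URL pattern is the one quoted above and may change.

```python
import re
from io import StringIO
from urllib.request import urlopen

# Assumption: UniParc identifiers are "UPI" plus ten hexadecimal characters.
UNIPARC_RE = re.compile(r"^UPI[0-9A-F]{10}$")


def is_uniparc_id(identifier):
    """Return True if the identifier looks like a UniParc identifier."""
    return bool(UNIPARC_RE.match(identifier))


def uniparc_fasta_url(identifier):
    """Build the FASTA URL for a UniParc identifier (pattern from this thread)."""
    return "http://www.uniprot.org/uniparc/%s.fasta" % identifier


def fetch_record(identifier):
    """Fetch one SeqRecord, choosing the route by identifier type.

    Needs network access and Biopython; not called in this sketch.
    """
    # Imported here so the helpers above work without Biopython installed.
    from Bio import ExPASy, SeqIO

    if is_uniparc_id(identifier):
        with urlopen(uniparc_fasta_url(identifier)) as response:
            text = response.read().decode("utf-8")
        return SeqIO.read(StringIO(text), "fasta")
    handle = ExPASy.get_sprot_raw(identifier)
    try:
        return SeqIO.read(handle, "swiss")
    finally:
        handle.close()


for name in ["D2V5S4_NAEGR", "UPI00006CC162"]:
    print(name, "UniParc" if is_uniparc_id(name) else "UniProtKB")
```

Note that a record read from FASTA carries only the sequence and title line,
not the annotation you get from a "swiss" format entry.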

Peter


From p.j.a.cock at googlemail.com  Tue Sep 14 09:49:56 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 10:49:56 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To: <a06240801c8b490b75d3f@10.0.1.4>
References: <a06240808c8b421190255@131.229.113.228>
	<AANLkTi=BGd7t+SXyDVu749VU5ywWrFZ+eRfJXQ0-QkyA@mail.gmail.com>
	<a06240801c8b490b75d3f@10.0.1.4>
Message-ID: <AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>

On Tue, Sep 14, 2010 at 3:47 AM, Jessica Grant wrote:
> On Mon, Sep 13, 2010 at 9:49 PM, Peter Cock wrote:
>> On Mon, Sep 13, 2010 at 7:43 PM, Jessica Grant wrote:
>>>
>>> Hello,
>>>
>>> I am working with an organism that has an unusual genetic code.  Is there
>>> a way I can use biopython's translate() but import my own codon table
>>> instead of using the standard ncbi tables that are coded in the CodonTable
>>> module?
>>>
>>> Thanks!
>>>
>>> Jessica
>>
>> Hi Jessica,
>>
>> Good question - this is something I had thought about but not done anything
>> since no one had ever asked about using a non-standard table. After all, the
>> NCBI do have a pretty comprehensive list. I'm curious which organism(s) you
>> are using.
>>
>> In answer to your query, right now, not easily. However, it would be simple to
>> tweak the Bio.Seq module to allow the table argument to be a string or integer
>> as now (for referring to a built in NCBI table) or a CodonTable object which you
>> would have to supply. These are defined in the Bio.Data.CodonTable module.
>> If this sounds useful and you could help with testing, it could be done ready
>> for the next release of Biopython.
>>
>> Peter
>
> Thanks Peter,
>
> We are doing some work on a ciliate called Chilodonella uncinata.  It
> apparently has only one stop codon and the others are recoded in an
> unusual way so it doesn't quite fit any of the ncbi tables.
>
> I did try to play around with the CodonTable module, but couldn't quite
> figure out how to do it.  Just making a new table similar to the tables that
> are in the module didn't do it, and I didn't feel comfortable messing around
> in the depths of biopython.  :)
>
> I would be happy to help with testing and I guess in the meantime I will be
> putting lots of if statements in my script.
>
> Jessica

Hi Jessica,

Do you have the information for the CodonTable handy? e.g. a list of the
start codons, and how to translate the 64 codons (including stop codons).
Given that I could show you how to make the CodonTable object.

Peter


From p.j.a.cock at googlemail.com  Tue Sep 14 10:39:34 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 11:39:34 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To: <AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>
References: <a06240808c8b421190255@131.229.113.228>
	<AANLkTi=BGd7t+SXyDVu749VU5ywWrFZ+eRfJXQ0-QkyA@mail.gmail.com>
	<a06240801c8b490b75d3f@10.0.1.4>
	<AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>
Message-ID: <AANLkTi=vMZt-+ed+2t1LFz1GhDYi76L1BLgTuVKWfpBx@mail.gmail.com>

On Tue, Sep 14, 2010 at 10:49 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Hi Jessica,
>
> Do you have the information for the CodonTable handy? e.g. a list of the
> start codons, and how to translate the 64 codons (including stop codons).
> Given that I could show you how to make the CodonTable object.
>
> Peter
>

I've done a proof of principle change to Bio.Seq on this branch:
http://github.com/peterjc/biopython/tree/trans-table
specifically this commit:
http://github.com/peterjc/biopython/commit/56a2fd5f92098e9be892eb51f27b08aaa46a19a6

I'm not expecting you to try this code out yet (unless you happen to
know your way round git already). The basic idea is that the Bio.Seq
translate function and the Seq object translate method are extended
so that the table argument can now also be a CodonTable object.

Once we know what your table should look like, I can write a complete
example. Probably Bio.Data.CodonTable will need some more
documentation added...
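
Until then, here is a rough sketch of the idea. It starts from NCBI table 1
and reassigns some stop codons. The recoding shown (TAA and TAG to Q) is a
placeholder for illustration only, not the real Chilodonella code, and the
helper names are made up. It assumes a Biopython where translate accepts a
CodonTable object as the table argument, as on the branch above.

```python
def recode_stop_codons(forward_table, stop_codons, recoding):
    """Return (forward_table, stop_codons) with some stop codons reassigned.

    recoding maps codon -> amino acid letter, e.g. {"TAA": "Q"}.
    The inputs are left unmodified.
    """
    new_forward = dict(forward_table)
    new_forward.update(recoding)
    new_stops = [codon for codon in stop_codons if codon not in recoding]
    return new_forward, new_stops


def build_custom_table(recoding, base_id=1):
    """Build a CodonTable object from NCBI table base_id plus a recoding."""
    from Bio.Data import CodonTable  # requires Biopython

    base = CodonTable.unambiguous_dna_by_id[base_id]
    forward, stops = recode_stop_codons(
        base.forward_table, base.stop_codons, recoding
    )
    return CodonTable.CodonTable(
        forward_table=forward,
        start_codons=list(base.start_codons),
        stop_codons=stops,
    )


# Hypothetical recoding, for illustration only (not the real organism's code).
EXAMPLE_RECODING = {"TAA": "Q", "TAG": "Q"}


def demo():
    """Translate a short sequence with the custom table (needs Biopython)."""
    from Bio.Seq import Seq

    table = build_custom_table(EXAMPLE_RECODING)
    return Seq("ATGTAAGGGTGA").translate(table=table)
```

Calling demo() with a suitable Biopython installed would translate the test
sequence using the modified table. Copying the standard table and changing
only the differences keeps the custom table short and easy to check.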

Peter


From zaricdragoslav at gmail.com  Tue Sep 14 11:08:55 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 15:08:55 +0400
Subject: [Biopython] Intro
Message-ID: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>

Dear All,

I am new member and I like to send welcome greet to everyone.

I have few newbie questions so please be cooperative :)

1. How can I see biopython version and is there connection between python
    version and biopython version ?

2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04 (at least I
    think I did :)
    I have same installation on windows machine and everything works fine.
    But for example when I want to use something like this:

    from Bio import SeqIO
    orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")

    Two problems happens in ubuntu environment:
    first is that SeqIO complains that there is no index method
    second is that everywhere I should put string location of file
    biopython wants handle to file
    The first thing I can think of is maybe I am using old version of
    biopython, which points to question 1.

3. Does somebody have experience with using biopython in django web site ?
    Do I install biopython on web server or I can keep libraries in some
    folder and load them dynamically in code ?

Kind regards,

Dragoslav Zaric
Programmer
Msc. in Astrophysics


From biopython at maubp.freeserve.co.uk  Tue Sep 14 11:44:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Sep 2010 12:44:47 +0100
Subject: [Biopython] Intro
In-Reply-To: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
Message-ID: <AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>

On Tue, Sep 14, 2010 at 12:08 PM, Dragoslav Zaric
<zaricdragoslav at gmail.com> wrote:
> Dear All,
>
> I am new member and I like to send welcome greet to everyone.
>
> I have few newbie questions so please be cooperative :)
>

Hello and welcome :)

> 1. How can I see biopython version and is there connection between python
> version and biopython version ?

Biopython currently supports Python 2.4, 2.5, 2.6 and 2.7. Older versions
of Biopython may not have worked 100% on Python 2.7, but we did
previously support Python 2.3 and even older versions of Python.

There is a FAQ (frequently asked questions) section in the Tutorial for
how to determine the version of Biopython installed. Try:

import Bio
print Bio.__version__

The Tutorial for the latest release is online as PDF or HTML,
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/tutorial/Tutorial.html

> 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04
> (at least I think I did :)

Based on the problems below, I don't think it worked.

>    I have same installation on windows machine and everything works fine.
>    But for example when I want to use something like this:
>
>    from Bio import SeqIO
>    orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")
>
>    Two problems happens in ubuntu environment:
>    first is that SeqIO complains that there is no index method

That does suggest you have an old version of Biopython. The index
function was added in Biopython 1.52, see:

http://biopython.open-bio.org/SRC/biopython/NEWS
http://news.open-bio.org/news/2009/09/biopython-release-152/

>    second is that everywhere I should put string location of file
>    biopython wants handle to file

Things like Bio.SeqIO will accept filenames in recent versions of
Biopython (since release 1.54), but older versions only accepted
file handles. This is discussed in an FAQ in recent versions of the
Tutorial which point to this section on handles:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:appendix-handles

>    The first thing I can think of is maybe I am using old version of
>    biopython, which points to question 1.

That does seem to be the problem.
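
To make the version check concrete, here is a small sketch that compares a
version string against the two release numbers mentioned above (1.52 for
SeqIO.index, 1.54 for filenames in SeqIO). The function names and the
FEATURES table are made up for illustration. Only report_installed needs
Biopython itself.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.55' into a tuple of ints, (1, 55)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


# Release numbers as given in the reply above.
FEATURES = {
    "SeqIO.index": (1, 52),
    "filenames in SeqIO": (1, 54),
}


def missing_features(version):
    """List the entries in FEATURES that the given version is too old for."""
    installed = version_tuple(version)
    return sorted(name for name, needed in FEATURES.items() if installed < needed)


def report_installed():
    """Print the installed Biopython version and what it lacks."""
    import Bio  # requires Biopython

    print("Biopython", Bio.__version__, "missing:", missing_features(Bio.__version__))


print(missing_features("1.49"))
print(missing_features("1.55"))
```

With an old install, report_installed() would list both features as missing,
which matches the two symptoms described.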

> 3. Does somebody have experience with using biopython in django
> web site ?  Do I install biopython on web server or I can keep libraries
> in some folder and load them dynamically in code ?

I've used Biopython within TurboGears, but I haven't used django.
You should probably consult the django documentation for how they
recommend installing 3rd party libraries (e.g. they may recommend
using a virtual environment).

Peter


From zaricdragoslav at gmail.com  Tue Sep 14 12:04:47 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 16:04:47 +0400
Subject: [Biopython] Intro
In-Reply-To: <AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
	<AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
Message-ID: <AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>

Peter,

Thank you so very much for detailed explanations.

I will try to upgrade biopython version under linux.

Kind regards,

Dragoslav Zaric


On Tue, Sep 14, 2010 at 3:44 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Tue, Sep 14, 2010 at 12:08 PM, Dragoslav Zaric
> <zaricdragoslav at gmail.com> wrote:
> > Dear All,
> >
> > I am new member and I like to send welcome greet to everyone.
> >
> > I have few newbie questions so please be cooperative :)
> >
>
> Hello and welcome :)
>
> > 1. How can I see biopython version and is there connection between python
> > version and biopython version ?
>
> Biopython currently supports Python 2.4, 2.5, 2.6 and 2.7. Older versions
> of Biopython may not have worked 100% on Python 2.7, but we did
> previously support Python 2.3 and even older versions of Python.
>
> There is a FAQ (frequently asked questions) section in the Tutorial for
> how to determine the version of Biopython installed. Try:
>
> import Bio
> print Bio.__version__
>
> The Tutorial for the latest release is online as PDF or HTML,
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
>
> > 2. I have installed python 2.6 and biopython 1.55 on ubuntu 9.04
> > (at least I think I did :)
>
> Based on the problems below, I don't think it worked.
>
> >    I have same installation on windows machine and everything works fine.
> >    But for example when I want to use something like this:
> >
> >    from Bio import SeqIO
> >    orchid_dict = SeqIO.index("d:\ls_orchid.fasta", "fasta")
> >
> >    Two problems happens in ubuntu environment:
> >    first is that SeqIO complains that there is no index method
>
> That does suggest you have an old version of Biopython. The index
> function was added in Biopython 1.52, see:
>
> http://biopython.open-bio.org/SRC/biopython/NEWS
> http://news.open-bio.org/news/2009/09/biopython-release-152/
>
> >    second is that everywhere I should put string location of file
> >    biopython wants handle to file
>
> Things like Bio.SeqIO will accept filenames in recent versions of
> Biopython (since release 1.54), but older versions only accepted
> file handles. This is discussed in an FAQ in recent versions of the
> Tutorial which point to this section on handles:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:appendix-handles
>
> >    The first thing I can think of is maybe I am using old version of
> >    biopython, which points to question 1.
>
> That does seem to be the problem.
>
> > 3. Does somebody have experience with using biopython in django
> > web site ?  Do I install biopython on web server or I can keep libraries
> > in some folder and load them dynamically in code ?
>
> I've used Biopython within TurboGears, but I haven't used django.
> You should probably consult the django documentation for how they
> recommend installing 3rd party libraries (e.g. they may recommend
> using a virtual environment).
>
> Peter
>


From bartek at rezolwenta.eu.org  Tue Sep 14 12:20:44 2010
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 14 Sep 2010 14:20:44 +0200
Subject: [Biopython] Intro
In-Reply-To: <AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
	<AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
	<AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>
Message-ID: <AANLkTi=UdhjHaD8WXp+QkoP0NTD0R_dwUFZT3hDN6Tg3@mail.gmail.com>

On Tue, Sep 14, 2010 at 2:04 PM, Dragoslav Zaric
<zaricdragoslav at gmail.com> wrote:

> Peter,
>
> Thank you so very much for detailed explanations.
>
> I will try to upgrade biopython version under linux.
>

Hi,

Since you mentioned that you are working on ubuntu, I wanted to add that you
should be careful when upgrading the python/biopython versions on your
machine.

You are most probably now running both python and biopython installed from
ubuntu packages, but if you want to upgrade, you have to choose between
taking newer packages from a newer ubuntu  (currently the newest ubuntu
10.10 beta contains biopython 1.53
http://packages.ubuntu.com/lucid/python-biopython) or compiling from source.
If you choose to install from source, be sure to first uninstall the old
version from the package:
sudo apt-get remove python-biopython

if you want to install from source, you will need some extra packages:
sudo apt-get install python-dev python-reportlab python-numpy

good luck
Bartek


From bartek at rezolwenta.eu.org  Tue Sep 14 13:13:59 2010
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 14 Sep 2010 15:13:59 +0200
Subject: [Biopython] Intro
In-Reply-To: <AANLkTi=UdhjHaD8WXp+QkoP0NTD0R_dwUFZT3hDN6Tg3@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
	<AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
	<AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>
	<AANLkTi=UdhjHaD8WXp+QkoP0NTD0R_dwUFZT3hDN6Tg3@mail.gmail.com>
Message-ID: <AANLkTikhvdohrgGyZB8ipH+19=FSazYSz_6y5KakQeXR@mail.gmail.com>

On Tue, Sep 14, 2010 at 2:20 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:

>   (currently the newest ubuntu 10.10 beta contains biopython 1.53
> http://packages.ubuntu.com/lucid/python-biopython)


Just a small correction:
1.53 is the version from 10.04 (lucid), while 10.10 beta (maverick) contains
1.54 (http://packages.ubuntu.com/maverick/python-biopython)

sorry for the error
Bartek

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From zaricdragoslav at gmail.com  Tue Sep 14 14:22:55 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 18:22:55 +0400
Subject: [Biopython] Intro
In-Reply-To: <AANLkTikhvdohrgGyZB8ipH+19=FSazYSz_6y5KakQeXR@mail.gmail.com>
References: <AANLkTimetM_2k0gk28kt4Eq6LD=N_9jUrpEBavo4BU7y@mail.gmail.com>
	<AANLkTimiZv4dFy6ux7-dimqNJwvouSUxioekT1Xf5rro@mail.gmail.com>
	<AANLkTi=9MbvQpp_3R0Hh6vhKVUnED+e4m4Yc9PkdhPdC@mail.gmail.com>
	<AANLkTi=UdhjHaD8WXp+QkoP0NTD0R_dwUFZT3hDN6Tg3@mail.gmail.com>
	<AANLkTikhvdohrgGyZB8ipH+19=FSazYSz_6y5KakQeXR@mail.gmail.com>
Message-ID: <AANLkTinYgWm9uLzw_CJCx=ZUQd8E9AgxST6RrMta5E=Q@mail.gmail.com>

Thanks for the answers and help!

Actually, I prefer not to use Ubuntu above 9.04, and there is no reason to
change distribution because of one program.

I just did

sudo apt-get remove python-biopython

and after this 1.55 was automatically activated. I had installed 1.55, but it
looks like the older version of Biopython from the default package was masking
the new version.
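
For anyone hitting the same masking problem, a rough way to see which copy
Python actually imports is to print the module's version and file location.
This is only a sketch: describe_module is a made-up helper name, not part of
Biopython, and it assumes the package exposes __version__ (Biopython does).

```python
import importlib


def describe_module(name):
    """Return (version, path) for an importable module, or None if missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "built-in")
    return version, path


# Shows which Biopython would be picked up, e.g. the packaged copy versus
# a newer copy installed from source; prints None if it is not installed.
print(describe_module("Bio"))
```

If the path points at the distribution's package directory rather than the
location of the source install, the older copy is still the one being used.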

Thanks again !



-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics


From zaricdragoslav at gmail.com  Tue Sep 14 14:29:27 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Tue, 14 Sep 2010 18:29:27 +0400
Subject: [Biopython] Some books
Message-ID: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com>

Dear friends,

I do not come from a bioinformatics background, so can anybody
recommend an introductory book about bioinformatics so I can
cover the basics?

Of course there is a lot of Python programming in Biopython that is
outside biology (like parsing database files and connecting to databases),
but to get a clear picture it is good to read an introductory book.

Is the book

"Introduction to Bioinformatics" by Arthur Lesk

a good one?

Kind regards

-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics


From p.j.a.cock at googlemail.com  Tue Sep 14 14:58:05 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 15:58:05 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To: <a06240801c8b529c10a62@131.229.113.228>
References: <a06240808c8b421190255@131.229.113.228>
	<AANLkTi=BGd7t+SXyDVu749VU5ywWrFZ+eRfJXQ0-QkyA@mail.gmail.com>
	<a06240801c8b490b75d3f@10.0.1.4>
	<AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>
	<AANLkTi=vMZt-+ed+2t1LFz1GhDYi76L1BLgTuVKWfpBx@mail.gmail.com>
	<a06240801c8b529c10a62@131.229.113.228>
Message-ID: <AANLkTinRGCzPT-pQpgcJwNxHA31H7_HphVE+H=78S5cq@mail.gmail.com>

On Tue, Sep 14, 2010 at 2:44 PM, Jessica Grant <jgrant at smith.edu> wrote:
> Hi Peter,
>
> Here is the codon table, in the format I found in CodonTable.py.
>
> I will look at the links you sent, but I don't know if I will be able to
> follow it all.  Thanks,
>
> Jessica
>
>                    table = {
>     'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S',
>     'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y',
>     'TGT': 'C', 'TGC': 'C', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L',
>     'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P',
>     'CCG': 'P', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
>     'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'ATT': 'I',
>     'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T',
>     'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 'AAA': 'K',
>     'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
>     'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A',
>     'GCC': 'A', 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D',
>     'GAA': 'E', 'GAG': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G',
>     'GGG': 'G', 'TAG': 'Q', 'TGA': 'W',},
>                    stop_codons = ['TAA' ],
>                    start_codons = [ 'ATG']
>                    )

OK, don't worry about the git branch stuff - I've just merged
this to the main repository. Are you happy with installing
Biopython from source? If so grab the latest source code
as described here:

http://www.biopython.org/wiki/SourceCode

Alternatively all you need to update is the Bio/Seq.py file
to the latest version:

http://github.com/biopython/biopython/raw/master/Bio/Seq.py

To use the new functionality, first you need to create a
CodonTable object with your special table, and assuming
you are just working with unambiguous DNA that means:

from Bio.Data.CodonTable import CodonTable
c_uncinata_table = CodonTable(forward_table={
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
    'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
    'TAT': 'Y', 'TAC': 'Y',             'TAG': 'Q',
    'TGT': 'C', 'TGC': 'C', 'TGA': 'W', 'TGG': 'W',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
    'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
    'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
    'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
    'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'},
    start_codons=['ATG'],
    stop_codons=['TAA'])

Note that order of the forward table dictionary entries
does not actually matter, however, I have moved the
TAG and TGA entries from the end to keep the whole
table in a standard order - I found this easier to check.

If you have the updated Bio.Seq module, then you
can do this:

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> seq =  Seq("AAATAGTGATAA", generic_dna)
>>> print seq.translate()
K***
>>> print seq.translate(table=c_uncinata_table)
KQW*

Or using strings,

>>> from Bio.Seq import translate
>>> print translate("AAATAGTGATAA")
K***
>>> print translate("AAATAGTGATAA", table=c_uncinata_table)
KQW*
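
If it helps to see what the table is doing, here is a rough pure-Python
sketch of the same lookup. It is not how Bio.Seq is implemented: the names
simple_translate, standard and variant are made up for illustration, and
unlike a Biopython forward table the dictionaries here also hold the stop
codons (mapped to '*').

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks a stop).
BASES = "TCAG"
STANDARD_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
standard = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
}

# Variant table from this thread: TAG codes for Q, TGA for W, only TAA stops.
variant = dict(standard)
variant["TAG"] = "Q"
variant["TGA"] = "W"


def simple_translate(dna, table):
    """Translate complete codons by dictionary lookup ('*' for stops)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


print(simple_translate("AAATAGTGATAA", standard))  # K***
print(simple_translate("AAATAGTGATAA", variant))   # KQW*
```

The only difference between the two dictionaries is the pair of reassigned
codons, which is why the two translations differ only in the middle.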

Does that make sense? Does it do what you expect?
Don't hesitate to ask for clarification.

Peter


From Achim.Treumann at NEPAF.com  Tue Sep 14 14:51:29 2010
From: Achim.Treumann at NEPAF.com (Achim Treumann)
Date: Tue, 14 Sep 2010 15:51:29 +0100
Subject: [Biopython] Some books
In-Reply-To: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com>
References: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com>
Message-ID: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local>

Dear Dragoslav, 

I cannot comment on Arthur Lesk's book - haven't read it. 

I can really recommend two freely available tutorials on Katja
Schuerer's website:
 
One of them is an introduction to programming using Python:
http://www.pasteur.fr/formation/infobio/python/

The other one is a Python course in Bioinformatics:
http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html

Both of them provide you with numerous examples and take you through
tips and tricks on how to address bioinformatic problems using Python
and Biopython. 

I presume that you are familiar with the Biopython manual that is part
of the Biopython distribution:
http://www.biopython.org/DIST/docs/tutorial/Tutorial.html

Hope this helps, 
Best wishes, 
Achim



From cjfields at illinois.edu  Tue Sep 14 14:59:49 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 14 Sep 2010 09:59:49 -0500
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
	<937371.74873.qm@web52006.mail.re2.yahoo.com>
	<AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>
Message-ID: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>

On Sep 14, 2010, at 4:04 AM, Peter wrote:

> Hi Anastasia,
> 
> On Tue, Sep 14, 2010 at 9:02 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
>> Hi Peter,
>> 
>>> 
>>> Could you post a short example of the kind of output you are looking at?
>>> 
>> 
>> Here is an example output, but this can differ depending on the model used
>> (there are several models for Branch, Site, BranchSite, but all are pretty
>> standard)
>> 
> 
> Thanks - that looks possible to parse, but not very easy (especially if the
> codeml output changes slightly between versions).
> 
>>> 
>>> Can you get codeml to output what you need in another format, such as NEXUS?
>>> 
>> 
>> Haven't tried that, but as you can see, this is a very verbose output and
>> NEXUS does not seem an option.
> 
> At first glance, the NEXUS format could hold a lot of that information.
> Another possibility might be phyloXML. However, you are at the mercy
> of the codeml tool and what it supports. It might be worth politely asking
> the author(s) about supporting one of these more standard formats as
> an optional output.
> 
>> Ultimately, I want to parse this to get all the information I need in a
>> tabulated file. I am still working out what exactly I need (there are standard
>> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type
>> of downstream analysis). I will now work on the pypaml class and modify the
>> original code to make it more generic (it seems that it only works for Site
>> Models).
> 
> Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so
> unless he agrees to re-license it we cannot include it in Biopython.
> 
>> Will let you know, was just wondering if there was already a solution. There is
>> one in Bioperl, but heard it is very slow and in any case, I don't understand
>> much of perl....
> 
> I don't know much Perl either ;)
> 
> Peter

Just a warning from those experienced with paml parsers (bioperl): the output is notoriously shifty even between minor releases (sections get reordered, etc), so pretty much any parser needs to accommodate that.  It's extremely frustrating.

chris


From p.j.a.cock at googlemail.com  Tue Sep 14 16:04:52 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Sep 2010 17:04:52 +0100
Subject: [Biopython] Some books
In-Reply-To: <01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local>
References: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com>
	<01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local>
Message-ID: <AANLkTin-mT7VGqNF4NgdvT_c+wcH_86TxkHOCnSC-312@mail.gmail.com>

On Tue, Sep 14, 2010 at 3:51 PM, Achim Treumann
<Achim.Treumann at nepaf.com> wrote:
> Dear Dragoslav,
>
> ...
>
> The other one is a Python course in Bioinformatics:
> http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html

The above Pasteur Institute course using Biopython is sadly very out of
date in places, and I have been unable to get in touch with the authors
to revise it or at least add some warning text to it.

Peter


From biopython at maubp.freeserve.co.uk  Tue Sep 14 16:07:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Sep 2010 17:07:52 +0100
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
	<937371.74873.qm@web52006.mail.re2.yahoo.com>
	<AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>
	<8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>
Message-ID: <AANLkTin-S=G2d76qe0t0QZ1eZ0rGXmSAAUhDuJSxHZeX@mail.gmail.com>

On Tue, Sep 14, 2010 at 3:59 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Sep 14, 2010, at 4:04 AM, Peter wrote:
>> On Tue, Sep 14, 2010 at 9:02 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
>>>
>>> Here is an example output, but this can differ depending on the model used
>>> (there are several models for Branch, Site, BranchSite, but all are pretty
>>> standard)
>>
>> Thanks - that looks possible to parse, but not very easy (especially if the
>> codeml output changes slightly between versions).
>
> Just a warning from those experienced with paml parsers (bioperl): the
> output is notoriously shifty even between minor releases (sections get
> reordered, etc), so pretty much any parser needs to accommodate that.
> It's extremely frustrating.

Thanks Chris - I was afraid of that. It sounds like parsing plain text
NCBI BLAST output, but worse.

Do you know if anyone has asked about codeml outputting something
nicer to parse instead? e.g. Nexus or any kind of XML?

Peter


From Achim.Treumann at NEPAF.com  Tue Sep 14 16:20:36 2010
From: Achim.Treumann at NEPAF.com (Achim Treumann)
Date: Tue, 14 Sep 2010 17:20:36 +0100
Subject: [Biopython] Some books
In-Reply-To: <AANLkTin-mT7VGqNF4NgdvT_c+wcH_86TxkHOCnSC-312@mail.gmail.com>
References: <AANLkTikkZ=gFn1J8OAxLQY9pVkuYrZGf9ZjV8PKtD6Rq@mail.gmail.com><01798D2396253A449511F31F1CDE83550FBAED@srv1.NEPAF.local>
	<AANLkTin-mT7VGqNF4NgdvT_c+wcH_86TxkHOCnSC-312@mail.gmail.com>
Message-ID: <01798D2396253A449511F31F1CDE83550FBAF2@srv1.NEPAF.local>

Hiya, 

I agree about this warning (and have come across a few bits where this
has caused problems) - despite that I found them very useful. 

Best wishes, 
Achim 



From natassa_g_2000 at yahoo.com  Tue Sep 14 16:15:18 2010
From: natassa_g_2000 at yahoo.com (natassa)
Date: Tue, 14 Sep 2010 09:15:18 -0700 (PDT)
Subject: [Biopython] Codeml parser in Biopython?
In-Reply-To: <8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>
References: <533513.93597.qm@web52005.mail.re2.yahoo.com>
	<AANLkTimBzw0y7n6S70bX4D_ZzEuCTw3HW8UK8V3W87=p@mail.gmail.com>
	<937371.74873.qm@web52006.mail.re2.yahoo.com>
	<AANLkTimYbSaD0aOk7dgMn_QyF6C6nXAK0mS3WVbvrgoA@mail.gmail.com>
	<8667F93F-7BB0-442F-997D-62642F2BA80F@illinois.edu>
Message-ID: <884978.85315.qm@web52004.mail.re2.yahoo.com>

Thanks Chris, 
Good to know. I am dealing with paml results for the first time, but somehow
thought that outputs were standard. Apparently not...
Now that I started writing my own python parser, I see that even among models of
the same run, the text changes without any obvious reason (from 'omega' to 'w'
etc). Indeed frustrating!
Does the Bioperl solution include different parsers for different types of
analysis, e.g. one for the Branch analysis models, another for the Site Analysis
models, etc? It would be good to have one for all, but I am not sure this is
feasible... I will start with separate parsers and see how they can be generalized.
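
One way to cope with reordered sections and changing labels is to scan for
"label = value" pairs wherever they occur and fold known aliases onto one
key, rather than relying on line positions. The sketch below is only an
illustration of that idea: the sample text is invented and is NOT real codeml
output, and extract_values and ALIASES are hypothetical names. The only alias
taken from this thread is 'w' for 'omega'.

```python
import re

# Map every spelling of a label (lower-cased) onto one canonical key.
# 'omega'/'w' comes from the thread; treat the rest as placeholders.
ALIASES = {"omega": "omega", "w": "omega", "lnl": "lnL"}

# A label, then '=' or ':', then a (possibly negative) decimal number.
PAIR = re.compile(r"\b([A-Za-z]+)\s*[=:]\s*(-?\d+(?:\.\d+)?)")


def extract_values(text):
    """Collect numeric values by canonical label, ignoring where they occur."""
    found = {}
    for label, number in PAIR.findall(text):
        key = ALIASES.get(label.lower())
        if key is not None:
            found.setdefault(key, []).append(float(number))
    return found


# Invented examples: same information, different order and label spelling.
sample_a = "lnL = -1234.5\nomega = 0.25\n"
sample_b = "w: 0.25\nsome other text\nlnL = -1234.5\n"
print(extract_values(sample_a) == extract_values(sample_b))
```

Unknown labels are skipped rather than raising an error, so a new section in
a later release would be ignored instead of breaking the parse.
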
Thanks, 
Anastasia




From p.j.a.cock at googlemail.com  Wed Sep 15 17:10:46 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 15 Sep 2010 18:10:46 +0100
Subject: [Biopython] unusual genetic code
In-Reply-To: <a06240802c8b6a77d8334@131.229.113.228>
References: <a06240808c8b421190255@131.229.113.228>
	<AANLkTi=BGd7t+SXyDVu749VU5ywWrFZ+eRfJXQ0-QkyA@mail.gmail.com>
	<a06240801c8b490b75d3f@10.0.1.4>
	<AANLkTimbR5eJasu7c9UhwHQNUF1BEkr-wqb0=6Dwv8o_@mail.gmail.com>
	<AANLkTi=vMZt-+ed+2t1LFz1GhDYi76L1BLgTuVKWfpBx@mail.gmail.com>
	<a06240801c8b529c10a62@131.229.113.228>
	<AANLkTinRGCzPT-pQpgcJwNxHA31H7_HphVE+H=78S5cq@mail.gmail.com>
	<a06240802c8b6a77d8334@131.229.113.228>
Message-ID: <AANLkTinLSfzyQ8Uou4tgSHanGDN1LgQHM_71bmPmRS_C@mail.gmail.com>

On Wed, Sep 15, 2010 at 5:50 PM, Jessica Grant <jgrant at smith.edu> wrote:
>Peter wrote:
>> ...
>> To use the new functionality, first you need to create a
>> CodonTable object with your special table, and assuming
>> you are just working with unambiguous DNA that means:
>> ...
>> Does that make sense? Does it do what you expect?
>> Don't hesitate to ask for clarification.
>>
>> Peter
>
> It works!  Thanks so much!!
>
> Jessica

Great - thanks for letting us know.

Peter


From cwalentas at gmail.com  Thu Sep 16 04:36:13 2010
From: cwalentas at gmail.com (Christopher Walentas)
Date: Thu, 16 Sep 2010 00:36:13 -0400
Subject: [Biopython] Parsing Pubmed-Entrez searches into a normalized
	relational resource
Message-ID: <4C919EBD.3080802@gmail.com>

  Apologies in advance- all of this is very new to me- and I hope that 
this is the proper forum for this query.

What I would like to do is parse the returns of an entrez pubmed search 
into their smallest, unique useful bits and create a relational database 
(sqlite, dee?).  Ideally this would not only be of returned fields, but 
also drilling further down into say affiliation, addresses, etc...

I believe that I've mastered the search and download functions and 
individual citations exist as a stacked dictionary of the xml outputs.

Where I am falling down is understanding how to extract the structure of 
these outputs and create a persistent relational resource that's been 
normalized such that these fields can be mapped and used to "correct" 
values in an uncurated dataset with highly analogous fields.

I've been struggling to bridge the gap between python and sqlite/dee, 
however have recently been informed that it might be possible to do 
everything within python itself. Again, apologies for any naiveties - 
they are indeed sincere, however I'm well aware that a little knowledge 
can be dangerous - hence reaching out.

 From what I've already read, it would seem that all of this is ideally 
suited to bio-/python and am looking forward to learning- I'm just 
looking for that swift shove in the right direction and to benefit from 
your collective informed guidance.

Cheers in advance,
christopher


From mjldehoon at yahoo.com  Thu Sep 16 10:53:55 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 16 Sep 2010 03:53:55 -0700 (PDT)
Subject: [Biopython] Fwd: NETTAB 2010: Submission deadline is approaching:
	Sep 24, 2010
Message-ID: <841162.95892.qm@web62405.mail.re1.yahoo.com>


--- On Thu, 9/16/10, Paolo Romano <paolo.romano at istge.it> wrote:

> From: Paolo Romano <paolo.romano at istge.it>
> Subject: Fwd: NETTAB 2010: Submission deadline is approaching: Sep 24,  2010
> To: biopython-owner at lists.open-bio.org
> Date: Thursday, September 16, 2010, 3:25 AM
> Dear list owner,
> 
> I would be glad if you would forward this message to the
> list.
> 
> Many thanks in advance.
> 
> Ciao. Paolo
> 
> >Date: Wed, 15 Sep 2010 17:50:48 +0200
> >To: biopython at lists.open-bio.org
> >From: Paolo Romano <paolo.romano at istge.it>
> >Subject: NETTAB 2010: Submission deadline is approaching: Sep 24, 2010
> >
> >I hope this announcement can be of interest for this
> list.
> >
> >Forgive me if I'm wrong!
> >
> >Ciao. Paolo
> >
> >==========
> >NETTAB 2010 on "Biological Wikis"
> >joint with the BBCC 2010 workshop on Bioinformatics and
> >Computational Biology in Campania
> >
> >November 29 - December 1, 2010, Naples, Italy
> >http://www.nettab.org/2010/
> >http://bioinformatica.isa.cnr.it./BBCC/BBCC2010/
> >
> >
> >The deadline for the submission of oral communications is quickly
> >approaching; submit your contribution by
> >Friday, September 24, 2010 through the EasyChair site
> >( http://www.easychair.org/conferences/?conf=nettab2010 ).
> >
> >The length of contributions for oral communications should be
> >between 3 and 5 pages, including tables and figures.
> >See more instructions below.
> >
> >The NETTAB 2010 workshop promises to be a great meeting for all
> >researchers involved in the exploitation of wikis in biology.
> >
> >Don't miss this opportunity to discuss your ideas and doubts with
> >such scientists as
> >- Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge,
> >  United Kingdom
> >- Alexander Pico, Gladstone Institute of Cardiovascular Disease,
> >  San Francisco, USA
> >- Andrew Su, Bioinformatics and Computational Biology, Genomics
> >  Institute of the Novartis Research Foundation (GNF), San Diego, USA
> >- Dan Bolser, College of Life Sciences, University of Dundee,
> >  Scotland, United Kingdom
> >- Robert Hoffmann, Computational Biology Center, cBIO, Memorial
> >  Sloan-Kettering Cancer Center, MSKCC, New York, USA
> >- Thomas Kelder, Department of Bioinformatics (BiGCaT), Maastricht
> >  University, the Netherlands
> >- Jaime Prilusky, Bioinformatics, Weizmann Institute of Science,
> >  Rehovot, Israel
> >- and many others who, we hope, will join the workshop.
> >
> >Here below, please find a summary of the Call. The complete Call is
> >available on-line at http://www.nettab.org/2010/call.html .
> >
> >Further information is available at http://www.nettab.org/2010/ .
> >
> >============
> >CALL FOR PAPERS
> >
> >TOPICS
> >The following list is not meant to be exclusive of any further
> >topics as stated above.
> >Submitted contributions should address one or more of the following topics:
> >    * Wiki development tools
> >        o Wikimedia
> >        o Wikimedia extensions
> >        o Semantic Wikis
> >        o Wiki-coupled CMSs
> >        o Other wikis
> >    * Arising issues for the biomedical domain:
> >        o Authoritativeness of contributions and sites
> >        o Quality assessment
> >        o Users acknowledgement
> >        o Stimulation of quality contributions
> >        o Authorships management and reward
> >        o 'Scientific production' value for contributions
> >        o Management of bioinformatics data types
> >    * Wikis and collaborative systems for:
> >        o Genomics, proteomics, metabolomics, any -omics
> >        o Proteins analysis and visualization
> >        o Gene and protein interactions
> >        o Metabolic pathways
> >        o Oncology research
> >    * Issues to be tackled by wiki and collaborative research for:
> >        o Genomics, proteomics, metabolomics, any -omics
> >        o Proteins analysis and visualization
> >        o Gene and protein interactions
> >        o Metabolic pathways
> >        o Oncology research
> >
> >The NETTAB 2010 workshop is a joint event with the BBCC 2010 workshop on
> >Bioinformatics and Computational Biology in Campania.
> >This deadline also applies to the BBCC 2010 workshop.
> >Submit for BBCC through the same EasyChair site and select the
> >'BBCC session' topic.
> >
> >
> >TYPE OF CONTRIBUTIONS
> >
> >The following possible contributions are sought:
> >    * Oral communications
> >    * Posters
> >    * Software demos
> >All accepted contributions will be published in the proceedings of
> >the workshop.
> >
> >
> >DEADLINES
> >
> >* September 24, 2010: Oral communications submission
> >        o Decisions announced: October 24, 2010
> >
> >* October 29, 2010: Early registration ends
> >
> >* November 29 - December 1, 2010: Workshop and Tutorials
> >
> >
> >INSTRUCTIONS
> >Kindly follow the instructions carefully when preparing your
> >contribution and submit your contribution through the EasyChair
> >system at http://www.easychair.org/conferences/?conf=nettab2010.
> >
> >All contributions should follow the same format, as specified here:
> >font type: Times New Roman, font size: 12 pt, page size: A4, left
> >and right margins: 2.0 cm, upper margin: 2.5 cm, lower margin: 2.0 cm.
> >
> >The length of contributions for oral communications should be
> >between 3 and 5 pages, including tables and figures.
> >They should include: Abstract, Introduction, Methods, Results and
> >Discussion, References.
> >All contributions for oral communications will be evaluated by at
> >least three referees.
> >
> >For any further information or clarification, please contact the
> >organization by email at info at nettab.org.
> >
> >
> >ORGANIZATION (see http://www.nettab.org/2010/organization.html for
> >the Scientific Committee and more information)
> >
> >Co-chairs
> >    * Angelo Facchiano, CNR-ISA, Avellino, Italy
> >    * Paolo Romano, National Cancer Research Institute, Genoa, Italy
> >
> >We look forward to meeting you in Naples!
> >
> >Paolo Romano and Angelo Facchiano
> >   on behalf of the Scientific Committee
> >
> >
> >Paolo Romano (paolo.romano at istge.it)
> >Bioinformatics
> >National Cancer Research Institute (IST)
> >Largo Rosanna Benzi, 10, I-16132, Genova, Italy
> >Tel: +39-010-5737-288  Fax: +39-010-5737-295  Skype: p.romano
> >Web: http://www.nettab.org/promano/
> >
> >
> >
> >
> 
> 
> Paolo Romano (paolo.romano at istge.it)
> Bioinformatics
> National Cancer Research Institute (IST)
> 
> 
> 
> 


From zaricdragoslav at gmail.com  Sun Sep 19 06:33:06 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 10:33:06 +0400
Subject: [Biopython] Trhird party library
Message-ID: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>

Has anybody used Biopython as a third-party library, for example in a Python
web project?
I ask because you probably cannot expect to find or install Biopython
in a hosting provider's server environment.

For example, after installing Biopython in a Windows environment, you can see
that Biopython is installed inside the Python 2.6 installation:

C:\Python26\Lib\site-packages\Bio
C:\Python26\Lib\site-packages\BioSQL
C:\Python26\Lib\site-packages\numpy

So can you copy these folders to, for example, the \Lib\ folder of a web
project, and reference them somehow from code?

Of course I can test this myself, and I will, but maybe somebody
has experience with this problem, and it would probably be good info for
others in this forum.

Kind regards

-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics


From chapmanb at 50mail.com  Sun Sep 19 10:51:19 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 19 Sep 2010 06:51:19 -0400
Subject: [Biopython] Parsing Pubmed-Entrez searches into a normalized
 relational resource
In-Reply-To: <4C919EBD.3080802@gmail.com>
References: <4C919EBD.3080802@gmail.com>
Message-ID: <20100919105119.GC2030@kunkel>

Christopher;

> What I would like to do is parse the returns of an entrez pubmed
> search into their smallest, unique useful bits and create a
> relational database (sqlite, dee?).  Ideally this would not only be
> of returned fields, but also drilling further down into say
> affiliation, addresses, etc...
[...]
> Where I am falling down is understanding how to extract the
> structure of these outputs and create a persistent relational
> resource that's been normalized such that these fields can be mapped
> and used to "correct" values in an uncurated dataset with highly
> analogous fields.

This is the standard problem of representing object-style data in a
flat relational database. It's tough to answer succinctly on a
mailing list, as there are entire textbooks and courses devoted to
the problem. The Wikipedia entry on normalization and first normal
form is a good place to get started:

http://en.wikipedia.org/wiki/Database_normalization

As far as accessing relational databases, Python is great for this.
An object relational mapper like SQLAlchemy:

http://www.sqlalchemy.org/

is a great place to get started. This allows you to deal more
directly with objects, and also generalizes database access so you
can quickly switch from SQLite to MySQL to whatever.

Another suggestion is to use a document oriented database like
MongoDB for storing your data:

http://www.mongodb.org/

This allows you to store objects without flattening them, which may
be more intuitive for the XML/dictionary results you get back from
Entrez searches.

Hope this helps,
Brad


From chapmanb at 50mail.com  Sun Sep 19 10:44:40 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 19 Sep 2010 06:44:40 -0400
Subject: [Biopython] Third party library
In-Reply-To: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
Message-ID: <20100919104440.GB2030@kunkel>

Dragoslav;

> Did anybody used biopython as third part library, like for example in python
> web project ?

Yes, absolutely. Biopython doesn't behave any differently from other
Python third party libraries, so there wouldn't be any special
instructions outside the documentation for the library you are
using.

> I ask this because probably you can not expect to find or install biopython
> in provider server environment.

It's tough to answer this generally without knowing what framework
you are planning to use. For example, Google App Engine has a
restricted environment where only pure Python libraries work. As an
install procedure you can most simply do:

python setup.py build

and then copy the libraries from build/lib.your_platform to the
site-libraries location in your application.
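
To reference the copied libraries from code, one option is to put their
directory on sys.path before the first import. The sketch below is only
an illustration: the add_bundled_libs helper and the "lib" directory
name are assumptions, not anything Biopython or a particular web
framework provides.

```python
import os
import sys


def add_bundled_libs(app_dir, subdir="lib"):
    """Put app_dir/subdir at the front of sys.path (once) and return it."""
    lib_dir = os.path.abspath(os.path.join(app_dir, subdir))
    if lib_dir not in sys.path:
        # Front of the list, so the bundled copy wins over any system copy.
        sys.path.insert(0, lib_dir)
    return lib_dir


# Typical use at the top of the application's entry script
# (assumes the Bio folder was copied into <app directory>/lib):
#
#   add_bundled_libs(os.path.dirname(os.path.abspath(__file__)))
#   from Bio import SeqIO
```

Keep in mind that Biopython and NumPy include compiled extensions, so
the copied build has to match the platform and Python version of the
server it ends up on.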

More formally, virtualenv is also very useful for building an isolated
Python environment with only the libraries for a project:

http://pypi.python.org/pypi/virtualenv

> For example, after installing biopython in windows environment, you can see
> that biopython is
> installed inside python 2.6 installation:
> 
> C:\Python26\Lib\site-packages\Bio
> C:\Python26\Lib\site-packages\BioSQL
> C:\Python26\Lib\site-packages\numpy
> 
> So can you copy these folders to, for example, \Lib\ folder of web project,
> and reference them somehow from code ?

Sure, that all seems fine but it's hard to offer specific advice
without knowing exactly what you are doing. The best place for
questions is probably in the community of the web framework you are
using. Everything that applies to other third party libraries will
apply to Biopython.

Hope this helps,
Brad


From sdavis2 at mail.nih.gov  Sun Sep 19 11:02:45 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 19 Sep 2010 07:02:45 -0400
Subject: [Biopython] Third party library
In-Reply-To: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
Message-ID: <AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>

On Sun, Sep 19, 2010 at 2:33 AM, Dragoslav Zaric
<zaricdragoslav at gmail.com> wrote:

> Did anybody used biopython as third part library, like for example in
> python
> web project ?
> I ask this because probably you can not expect to find or install biopython
> in provider server
> environment.
>
> For example, after installing biopython in windows environment, you can see
> that biopython is
> installed inside python 2.6 installation:
>
> C:\Python26\Lib\site-packages\Bio
> C:\Python26\Lib\site-packages\BioSQL
> C:\Python26\Lib\site-packages\numpy
>
> So can you copy these folders to, for example, \Lib\ folder of web project,
> and reference them
> somehow from code ?
>
> Of course I can test this by myself, and I will do this, but maybe somebody
> have experience
> with this problem, and it would be probably good info for others in this
> forum.
>
>
Hi, Dragoslav.  The Python developers thought of this problem.

http://docs.python.org/install/#alternate-installation-the-home-scheme
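
With the home scheme, "python setup.py install --home=<dir>" puts the
modules under <dir>/lib/python. The application then has to be told
about that directory, either through PYTHONPATH or from code. A rough
sketch of the latter, where the use_home_install helper is a name
invented for this example:

```python
import os
import site


def use_home_install(home):
    """Make packages installed with --home=<home> importable; return the path."""
    lib_dir = os.path.join(os.path.abspath(home), "lib", "python")
    # addsitedir appends the directory to sys.path and also processes
    # any .pth files it finds there.
    site.addsitedir(lib_dir)
    return lib_dir
```

For example, after installing with --home pointing at a directory you
control, call use_home_install on that same directory before importing
Bio.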

Sean


From zaricdragoslav at gmail.com  Sun Sep 19 12:11:48 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 16:11:48 +0400
Subject: [Biopython] Third party library
In-Reply-To: <AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
	<AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>
Message-ID: <AANLkTim70C45S8u5zhyhiDihGYg0-paTdzc-nVQPU+36@mail.gmail.com>

Anyway,

I will try the simplest thing: copy the folder with the Biopython modules
into some folder of the web app and access the modules through an absolute
path on the web server. This must work.

At first I planned to use the Django web framework, but I recently
discovered that there are many Python web frameworks. I prefer the
simplest and most effective frameworks, so I will check out

web.py

which looks nice at first glance.

Kind regards


-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics


From rodrigo_faccioli at uol.com.br  Sun Sep 19 13:59:47 2010
From: rodrigo_faccioli at uol.com.br (Rodrigo Faccioli)
Date: Sun, 19 Sep 2010 10:59:47 -0300
Subject: [Biopython] Third party library
In-Reply-To: <AANLkTim70C45S8u5zhyhiDihGYg0-paTdzc-nVQPU+36@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
	<AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>
	<AANLkTim70C45S8u5zhyhiDihGYg0-paTdzc-nVQPU+36@mail.gmail.com>
Message-ID: <AANLkTinkXbB3uyy18ZPODKv2_4EPDWa5-0MB0EXSJBy7@mail.gmail.com>

Hi,

I've worked with Biopython in a web project. I installed Biopython
normally on our Ubuntu server.

The front end of my web project was developed in JSP, but I ran my
scripts with Biopython. You can find this project at
http://glu.fcfrp.usp.br:8180/prometheus/

As for Python frameworks, I've read that Django is an excellent framework.

Thanks in advance,

--
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218
Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5



From zaricdragoslav at gmail.com  Sun Sep 19 14:34:37 2010
From: zaricdragoslav at gmail.com (Dragoslav Zaric)
Date: Sun, 19 Sep 2010 18:34:37 +0400
Subject: [Biopython] Third party library
In-Reply-To: <AANLkTinkXbB3uyy18ZPODKv2_4EPDWa5-0MB0EXSJBy7@mail.gmail.com>
References: <AANLkTimFxhqskC8qvsxB1bRQpJGFEZwcXMkqQy7sfZ0j@mail.gmail.com>
	<AANLkTinYbSzyStG=opAaB_5EmdLDDv8NQRX8g7WzayCS@mail.gmail.com>
	<AANLkTim70C45S8u5zhyhiDihGYg0-paTdzc-nVQPU+36@mail.gmail.com>
	<AANLkTinkXbB3uyy18ZPODKv2_4EPDWa5-0MB0EXSJBy7@mail.gmail.com>
Message-ID: <AANLkTiksp+WOsu5obF2cdTdOJweM4+UR3Tkq1s2ef+jf@mail.gmail.com>

Thanks Rodrigo,

I have come to the same conclusion after a little searching.
Also, hosting for Django is very common.

Kind regards


-- 
Dragoslav Zaric

Professional Programmer
MSc Astrophysics