 tags and the raw text) rather than the
expected HTML. Thus, the NCBIWWW parser chokes on the file, since it
is not really the HTML it expects.

It turns out that the problem was that the format_type parameter of
blast defaulted to 'html'. NCBI no longer accepts this and only
takes 'HTML'. I updated the blast function so that it now has 'HTML'
as the default, so you can either update from CVS or pass
format_type = "HTML" as an argument to the blast call.
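The sketch below shows one way to guard against this kind of case mismatch. It is written in current Python 3 and is not the NCBIWWW code. It assumes a hypothetical helper, build_blast_get_params, that maps format_type to a canonical spelling case-insensitively before encoding the request. The parameter names CMD, RID and FORMAT_TYPE, and the list of canonical spellings, follow NCBI's BLAST URL interface as I understand it; check them against NCBI's own documentation.

```python
from urllib.parse import urlencode

# Assumed canonical spellings of FORMAT_TYPE values; verify against NCBI docs.
CANONICAL_FORMATS = {name.lower(): name for name in ("HTML", "Text", "XML", "ASN.1")}

def build_blast_get_params(rid, format_type="HTML"):
    """Build the query string for fetching BLAST results (hypothetical helper).

    format_type is matched case-insensitively and replaced by its canonical
    spelling, so a lower-case 'html' is sent as 'HTML'.
    """
    try:
        canonical = CANONICAL_FORMATS[format_type.lower()]
    except KeyError:
        raise ValueError("unknown format_type: %r" % format_type) from None
    # RID is the request ID NCBI returns when the search is submitted.
    return urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": canonical})
```

For example, build_blast_get_params("X1", "html") gives "CMD=Get&RID=X1&FORMAT_TYPE=HTML". A lookup table is used instead of a blanket .upper() because some values may be mixed-case.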

After these changes, the parser seems to work fine. Thanks for the
report and hope this helps.
Brad
From anunberg at oriongenomics.com  Mon Mar  8 17:23:01 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Mon Mar  8 17:26:56 2004
Subject: [BioPython] Updates to CVS -- please do test
In-Reply-To: <20040229223143.GJ24150@evostick.agtec.uga.edu>
Message-ID: 

I tried running my FASTA indexing script, which normally works, and got the
following error:
  File "/loginhome/anunberg/bin/index_fasta.py", line 39, in ?
    main()
  File "/loginhome/anunberg/bin/index_fasta.py", line 36, in main
    Fasta.index_file(fasta_file,options.name,get_id)
  File "/compbio/lib/python/Bio/Fasta/__init__.py", line 229, in index_file
    SimpleSeqRecord.create_berkeleydb([filename], indexname, indexer)
  File "/compbio/lib/python/Bio/Mindy/SimpleSeqRecord.py", line 98, in create_berkeleydb
    from Bio.Mindy import BerkeleyDB
  File "/compbio/lib/python/Bio/Mindy/BerkeleyDB.py", line 2, in ?
    from bsddb3 import db
ImportError: No module named bsddb3

Do I need to install something? I did a checkout from CVS and installed it.
Andy
> Hello everyone;
> I wanted to write a quick mail because I made a number of changes to
> CVS. Specifically, I did some work on the GenBank parser and then
> checked in the new Martel-based Fasta parser I wrote about last
> week. Since I know these are some of the more widely used modules in
> Biopython, this might make the CVS a little more unstable (well,
> potentially having more bugs) than normal.
> 
> I'd appreciate it if interested people would check it out and give
> the new modules and changes a spin. If we can catch and squish bugs
> now, then it'll help make the next release smooth as normal.
> 
> For the curious, here are the major changes:
> 
> -> Checked in the Fasta parser using Martel. The gruesome details
> were described here:
> 
> http://portal.open-bio.org/pipermail/biopython/2004-February/001877.html
> 
> -> Updated the GenBank parser, specifically the Martel GenBank
> format. This involved several things:
> * Removing the restricted list of names of feature and qualifier
>   keys. We now use more general regular expressions. Hopefully
>   this will make life easier for developers and users.
> * Adding useful bits of code from the redundant
>   Bio/expressions/genbank.py, which is Andrew's take on the
>   GenBank parsing problem.
> * Moved Bio/GenBank/genbank_format.py to
>   Bio/expressions/genbank.py to keep the Martel formats together.
> 
> -> Misc fixes to GenBank/__init__.py to make the new changes work.
> 
> Thanks in advance for testing and reporting bugs!
> Brad
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
> 

-- 
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From chapmanb at uga.edu  Mon Mar  8 21:02:54 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Mon Mar  8 21:14:45 2004
Subject: [BioPython] Updates to CVS -- please do test
In-Reply-To: 
References: <20040229223143.GJ24150@evostick.agtec.uga.edu>

Message-ID: <20040309020254.GB30775@evostick.agtec.uga.edu>

Hi Andy;
Thanks for checking out the changes in CVS.

> I tried running my fasta indexing script which normally works 
> and got the following error
[...]
> ImportError: No module named bsddb3
> 
> Do I need to install something? I did a check out of cvs and 
> installed it..

It's complaining that the bsddb3 module, which provides the Python
bindings to BerkeleyDB, is not installed.

After some reflection and digging around, I realize I made a mistake
in making BerkeleyDB indexing the default. Although bsddb is included
with Python, it seems it will not be built on a lot of platforms.
That sounds like more trouble than it solves.

So I've adjusted the scripts to use flat-file indexing by default,
with an option for BerkeleyDB indexing if you want to do it that way.
Introducing new required dependencies is bad.

I've just fixed the index_file function in CVS. Please try out the
new version and let me know if it gives you any problems (it also
requires a fix to Bio/Mindy, which is also checked in). Thanks again
for testing things out; I'm very happy to have people looking at
this.
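The default described above can be sketched as an optional-dependency check: flat-file indexing unless BerkeleyDB is requested, and BerkeleyDB only when bsddb3 is importable. This is a minimal Python 3 sketch, not the actual Bio.Fasta or Bio.Mindy code. The function pick_index_backend and its use_berkeley flag are hypothetical.

```python
import importlib.util

def pick_index_backend(use_berkeley=False):
    """Choose an index backend: flat files by default, BerkeleyDB only on request.

    Hypothetical helper. A missing bsddb3 only matters when BerkeleyDB was
    explicitly asked for. In that case it raises a clear error instead of an
    ImportError from deep inside the indexing code.
    """
    if not use_berkeley:
        return "flat"
    # find_spec returns None when the module cannot be found.
    if importlib.util.find_spec("bsddb3") is None:
        raise RuntimeError("BerkeleyDB indexing requested, but the bsddb3 "
                           "module (Python bindings to BerkeleyDB) is not installed")
    return "berkeleydb"
```

Checking with find_spec tests whether the module is available without importing it. The error also points the user straight at the missing package.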

Brad
From dlondon at ebi.ac.uk  Wed Mar 10 09:35:02 2004
From: dlondon at ebi.ac.uk (Darin London)
Date: Wed Mar 10 09:40:40 2004
Subject: [BioPython] BOSC 2004 Announcement and Call for Papers (fwd)
Message-ID: 

 {Please pass the word!}

 MEETING ANNOUNCEMENT & CALL FOR SPEAKERS

 The 5th annual Bioinformatics Open Source Conference (BOSC'2004) is 
 organized by the not-for-profit Open Bioinformatics Foundation. The 
 meeting will take place July 29-30, 2004 in Glasgow, Scotland, and is 
 one of several Special Interest Group (SIG) meetings occurring in 
 conjunction with the 12th International Conference on Intelligent 
 Systems for Molecular Biology.

 see http://www.iscb.org/ismb2004/ for more information.

 The focus of the meeting will be on current and emerging Open Source** 
 informatics tools and toolkits. BOSC provides a forum for developers, 
 project groups, users and interested parties to meet in person, exchange 
 ideas and collaborate.

 In addition, keynote speeches from well known Open Source Bioinformatics 
 leaders are being planned.

 BOSC PROGRAM & CONTACT INFO

 * Web: http://www.open-bio.org/bosc2004/
 * Email: bosc@open-bio.org
 * Online registration: https://www.cteusa.com/iscb3/

 FEES

 * Corporate: GBP 165.00 (British pounds sterling)
 * Academic: GBP 120.00 (British pounds sterling)
 * Student: GBP 90.00 (British pounds sterling)

 A 17.5% Value Added Tax (VAT) will be added to all fees.
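 The sketch below computes the VAT-inclusive totals for reference. It assumes VAT is applied to each listed fee and rounded half-up to the nearest penny; the announcement does not say how rounding is done.

```python
from decimal import Decimal, ROUND_HALF_UP

VAT_RATE = Decimal("0.175")  # 17.5% VAT, as stated in the announcement
FEES = {
    "Corporate": Decimal("165.00"),
    "Academic": Decimal("120.00"),
    "Student": Decimal("90.00"),
}

def with_vat(fee):
    """Return fee plus 17.5% VAT, rounded half-up to whole pence (assumed rounding)."""
    return (fee * (1 + VAT_RATE)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

for category, fee in FEES.items():
    print(f"{category}: GBP {with_vat(fee)}")
# Corporate: GBP 193.88
# Academic: GBP 141.00
# Student: GBP 105.75
```

 Decimal is used instead of float so that amounts like 1.175 are exact and the rounding is predictable.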

 Note: We have tried to set our fees as low as possible without risking 
 a loss for the foundation on the event. We budget 
 with the goal of breaking even on costs or realizing a small profit.

 REGISTER ONLINE FOR BOSC'2004 & ISMB AT:
 https://www.cteusa.com/iscb3/

 SPEAKERS & ABSTRACTS WANTED

 The program committee is currently seeking abstracts for talks at BOSC 
 2004. BOSC is a great opportunity for you to tell the community about 
 your use, development, or philosophy of open source software development 
 in bioinformatics. The committee will select several submitted abstracts 
 for 25-minute talks and others for shorter "lightning" talks. Accepted 
 abstracts will be published on the BOSC web site.

 If you are interested in speaking at BOSC 2004, 
 please send us:

 * an abstract (no more than a few paragraphs)
 * a URL for the project page, if applicable  
 * information about the open source license used for your software or
   your release plans.

 LIGHTNING-TALK SPEAKERS WANTED!

 The program committee is currently seeking speakers for the lightning 
 talks at BOSC 2004. Lightning talks are quick - only five minutes 
 long - and a great opportunity for you to give people a quick 
 summary of your open source project, code, idea, or vision of the future.

 If you are interested in giving a lightning talk at BOSC 2004, 
 please send us:

 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * information about the open source license used for your software or 
   your release plans.

 We will accept entries on-line until BOSC starts, but
 space for demos and lightning talks is limited.

 SOFTWARE DEMONSTRATIONS WANTED!

 If you are involved in the development of Open Source Bioinformatics 
 Software, you are invited to provide a short demonstration to attendees 
 of BOSC 2004.

 If you are interested in giving a software demonstration at BOSC 2004,
 please send us:

 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * Internet connectivity requirements (e.g. a web application served over 
   the World Wide Web, or a web-based client application).

 We will accept entries on-line until BOSC starts, but
 space for demos and lightning talks is limited.

 ** Because the mission of the OBF is to promote Open Source software, we 
 will favor submissions for projects that apply a recognized Open Source 
 License, or adhere to the general Open Source Philosophy.

 See the following websites for further details:
 http://www.opensource.org/licenses/
 http://www.opensource.org/docs/definition.php

From libsvm at tom.com  Wed Mar 10 20:00:08 2004
From: libsvm at tom.com (denny)
Date: Wed Mar 10 20:07:26 2004
Subject: [BioPython] Is there any more detailed documentation to BioPython?
Message-ID: <200403110100.i2B106tk013180@portal.open-bio.org>

Hi,
	I have used BioPython for about a year, and it is a really good toolkit.
But I think its ONLY drawback is that its documentation is sparse.
When I use a module in BioPython, I have to read all of its source code.
That is really not very convenient. By contrast, BioPerl provides more detailed documentation.
Maybe others feel the same way.

Regards.

Denny

libsvm@tom.com
2005-03-12

From dag at sonsorol.org  Wed Mar 10 21:30:13 2004
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed Mar 10 21:35:56 2004
Subject: [BioPython] O|B|F mail update -- making progress on anti-spam
 issues with our mailing lists
Message-ID: <404FCF35.5010705@sonsorol.org>

Hi folks,

Apologies for the cross-posting but I just wanted to give our list 
members and admins an update on some new anti-spam measures we have 
(re)enabled. Good news to report basically...

The most annoying spams recently have been the simple plain text 
messages without any HTML, attachments or mime-encoding that just slip 
right by our filters.  Some lists have been forced to switch over to 
"only members can post" while other lists (like bioperl) have 
consistently voted to stay as open as possible.

I'll update you on our current efforts as well as a new effort that is 
about 24 hours old but already working really well so far.

Until yesterday we had three main lines of defense against spam:

1. The mailserver itself (rejects mail from nonexistent domains, etc.)

2. The sendmail Mail::Milter extension (MIMEDefang+SpamAssassin are used 
to scan all incoming messages. Anything that scores higher than 8.0 is 
simply discarded automatically. MIMEDefang also strips dangerous 
attachments like .exe and .pif.)

3. Our mailing list moderation queue (emails with attachments, odd MIME 
encodings and spamassassin scores from 0.0 - 7.9 are held in a moderator 
queue for a human to make an accept/discard decision)

Here are some stats on how this system worked over the past few days:

  o 138 attempts to relay mail through our server blocked
  o 192 emails blocked due to forged or unresolvable sender domain
  o 577 emails discarded automatically by SpamAssassin+MIMEDefang

This system worked *ok* but put a lot of work onto the shoulders of our 
list admins who constantly had to weed out the spam caught up in the 
mailing list moderator system.

Yesterday I brought online another system that seems to be already 
working really well. It catches spam before we even accept it on our 
server, which eases the load on both our scanning software and our 
human list moderators.

The system is the RBL+ blackhole list from http://www.mail-abuse.org and 
the way it works is that we now query (via DNS) the RBL+ database each 
time someone connects to our mail server. If the RBL check against the 
sender IP address comes back as "positive" we reject the incoming email.
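The DNS lookup described above follows the usual DNS blocklist convention: reverse the IPv4 octets, append the list's zone, and look up an A record. Any answer means "listed"; a missing name (NXDOMAIN) means "not listed". This is a minimal sketch that uses a placeholder zone instead of the actual RBL+ zone, which is not given here.

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name used to look up an IPv4 address in a DNS blocklist.

    The octets are reversed and the list's zone is appended, so 192.0.2.1
    in zone 'dnsbl.example.org' becomes '1.2.0.192.dnsbl.example.org'.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    """Return True if the blocklist answers for ip (performs a live DNS query)."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure: treated as not listed
    return answer.startswith("127.")  # listings conventionally answer in 127.0.0.0/8
```

Treating a failed lookup as "not listed" means mail is still accepted if the blocklist service is unreachable. That matches the goal of filtering spam without losing legitimate posts.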

RBL+ is a combination of four constantly updated databases:

  1. RBL -- IP addresses of known, documented spammers and spam machines
  2. RSS -- IP addresses of documented/tested unsecured email relays
  3. OPS -- IP addresses of documented open proxy servers w/ spam history
  4. DUL -- IP addresses belonging to ISP dialup and DHCP customers

We have already blocked 137 email attempts in the last 24 hours from 
machines that were listed in one or more of the RBL databases.

It is too soon to tell but if the RBL+ system plus our existing 
anti-spam measures work well enough we may be in a position where our 
"closed" mailing lists could revert back to being 'anyone can post'.

Feedback appreciated. Especially if you get a "reject" message from us 
saying that you are listed in the RBL+ blackhole database!

Regards,
Chris
O|B|F

From cavallo at biochem.ucl.ac.uk  Thu Mar 11 06:42:44 2004
From: cavallo at biochem.ucl.ac.uk (Antonio Cavallo)
Date: Thu Mar 11 06:48:27 2004
Subject: [BioPython] embl
Message-ID: <405050B4.5060007@biochem.ucl.ac.uk>

Hi,
I'm new to biopython (and quite confused by it).
I have a simple question (maybe it looks silly):
how do I parse an embl data file using biopython?
Is there any way to retrieve the sequence information (The CDS section)?
What about the position of the CDS sections (they are split in sub pieces)?

Kind regards,
antonio cavallo
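Later Biopython releases can read EMBL files directly through Bio.SeqIO. As a stdlib-only illustration of the question about split CDS positions, here is a minimal sketch that breaks a simple EMBL-style location such as join(12..78,134..202) into its pieces and splices them out of the sequence. The helper names parse_simple_location and extract_cds are hypothetical. The sketch handles only plain ranges, join() and an outer complement(); fuzzy ends like <1, order() and remote references are not supported.

```python
import re

_RANGE = re.compile(r"(\d+)\.\.(\d+)")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def parse_simple_location(loc):
    """Split a simple location string into (strand, [(start, end), ...]).

    Coordinates are 1-based and inclusive, as in the EMBL feature table.
    """
    loc = loc.replace(" ", "")
    strand = 1
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = -1
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    parts = []
    for piece in loc.split(","):
        m = _RANGE.fullmatch(piece)
        if m is None:
            raise ValueError("unsupported location piece: %r" % piece)
        parts.append((int(m.group(1)), int(m.group(2))))
    return strand, parts

def extract_cds(seq, loc):
    """Splice the pieces of loc out of seq, reverse-complementing if needed."""
    strand, parts = parse_simple_location(loc)
    spliced = "".join(seq[start - 1:end] for start, end in parts)
    if strand == -1:
        spliced = spliced.translate(_COMPLEMENT)[::-1]
    return spliced
```

For example, parse_simple_location("join(12..78,134..202)") returns (1, [(12, 78), (134, 202)]), which gives the position of each sub-piece of the CDS.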
From lpritc at scri.sari.ac.uk  Thu Mar 11 07:53:20 2004
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Thu Mar 11 07:59:03 2004
Subject: [BioPython] embl
In-Reply-To: <405050B4.5060007@biochem.ucl.ac.uk>
References: <405050B4.5060007@biochem.ucl.ac.uk>
Message-ID: <40506140.5050401@scri.sari.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Antonio Cavallo wrote:

| Hi,
| I'm new (and quite confused) to biopython.
| I have a simple question (maybe it looks silly):
| how do I parse an embl data file using biopython?
| Is there any way to retrieve the sequence information (The CDS section)?
| What about the position of the CDS sections (they are split in sub pieces)?

Not that silly a question.  I had a similar problem when I was working with
.tab files (with no header information) from the Sanger, and ended up writing
a BioPython-style parser for them.  It's not the most robust code in the
world, but you're welcome to a copy if it might help you.

- --
Dr Leighton Pritchard AMRSC
D104, PPI, Scottish Crop Research Institute
Invergowrie, Dundee, DD2 5DA, Scotland, UK
E: lpritc@scri.sari.ac.uk	W: http://bioinf.scri.sari.ac.uk/index.shtml
T: +44 (0)1382 568579		F: +44 (0)1382 568578
PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAUGFAL1gZ+OWLpBsRAj9yAJ4tCmuI43Xzdz/oa7AQvPQ07HvKrACeLZjj
m67cOc3ZCZzkfhGDIFmft80=
=EBcY
-----END PGP SIGNATURE-----

From biopython at wardroper.org  Thu Mar 11 10:32:35 2004
From: biopython at wardroper.org (Alan Wardroper)
Date: Thu Mar 11 10:32:37 2004
Subject: [BioPython] contig mapping in BioPython
Message-ID: <6.0.3.0.2.20040424150959.03801db0@wardroper.org>

I'm thinking about writing some BioPython modules for contig/genome mapping 
- something akin to BioPerl's Bio::Assembler::contig - for use in genome 
mapping (and whatever else it ends up lending itself to).

I can't find references to any such projects that are ongoing, but I'd
like to check whether anyone else is working on this before I put too much
time into reinventing more wheels than we need.
Does anyone think this would (or would not) be useful?

Thanks for your input...

========================
           Alan Wardroper
======================== 
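To make the idea concrete, here is a minimal sketch of the kind of data structure a contig-mapping module might start from: reads placed at offsets in contig coordinates, with lookups in both directions. The class names are hypothetical and this is not modeled on any existing BioPerl or Biopython API. It also ignores reverse-strand reads, gaps and consensus calling.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlacedRead:
    """A read laid onto a contig at a fixed offset (forward strand only)."""
    name: str
    start: int  # 0-based contig coordinate of the read's first base
    seq: str

    @property
    def end(self) -> int:
        return self.start + len(self.seq)  # exclusive end in contig coordinates

@dataclass
class Contig:
    """Minimal contig: placed reads kept sorted by start offset."""
    reads: List[PlacedRead] = field(default_factory=list)

    def add_read(self, name: str, start: int, seq: str) -> None:
        self.reads.append(PlacedRead(name, start, seq))
        self.reads.sort(key=lambda r: r.start)

    def reads_covering(self, pos: int) -> List[str]:
        """Names of reads that cover contig position pos."""
        return [r.name for r in self.reads if r.start <= pos < r.end]

    def to_read_coord(self, name: str, pos: int) -> Optional[int]:
        """Map a contig position to a 0-based position in the named read."""
        for r in self.reads:
            if r.name == name and r.start <= pos < r.end:
                return pos - r.start
        return None
```

A linear scan keeps the sketch short. A real module would probably index reads by position (for example with an interval tree) once contigs hold thousands of reads.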

From anunberg at oriongenomics.com  Thu Mar 11 11:20:43 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Thu Mar 11 11:24:34 2004
Subject: [BioPython] contig mapping in BioPython
In-Reply-To: <6.0.3.0.2.20040424150959.03801db0@wardroper.org>
Message-ID: 

There is something similar in Bio.Sequencing.
Have you checked out Biopython from CVS?

On a totally different note:
I agree that, as a simple user of biopython, I find the documentation can be
confusing, because Python does not use special characters to denote basic
data types (variables, lists, dictionaries).
And I recall that in other places the object to pass, or what is returned,
is not documented either (i.e. you pass a Seq object and get a SeqFeature
object in return).

If I ever get any time, or get fed up enough to make the time, I will go
through some of the libraries I use most often and try to write more
documentation.
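To illustrate the kind of documentation Andy is asking for, here is a sketch of a docstring that states what goes in and what comes out. The function gc_ratio is a hypothetical example. The type hints are current Python 3 syntax that did not exist at the time of this thread.

```python
def gc_ratio(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence.

    Arguments:
        seq -- a plain string such as "ACGT" (case-insensitive); a Seq-like
               object can be passed as str(seq).

    Returns:
        A float between 0.0 and 1.0; 0.0 for an empty sequence.
    """
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Stating the argument and return types in the docstring (and in the annotations) answers "what do I pass and what do I get back" without requiring the reader to open the source.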

Andy

> I'm thinking about writing some BioPython modules for contig/genome mapping
> - something akin to BioPerl's Bio::Assembler::contig - for use in genome
> mapping (and whatever else it ends up lending itself to).
> 
> Can't find any references to any such projects that are ongoing but would
> like to check if anyone else is working on this before I put in too much
> time in reinventing more wheels than we need.
> Anyone think this would/would not be useful?
> 
> Thanks for your input...
> 
> ========================
>          Alan Wardroper
> ========================
> 
> 

-- 
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From sbassi at asalup.org  Fri Mar 12 10:20:33 2004
From: sbassi at asalup.org (Sebastian Bassi)
Date: Fri Mar 12 18:19:33 2004
Subject: [BioPython] [Fwd: Re: Tm calc: 1.3 (This is the good one!)]
Message-ID: <4051D541.6020107@asalup.org>

Brad: I'm sending this via the list because your anti-spam filter seems
to block my messages when they have attachments.

-------- Original Message --------
Date: Mon, 08 Mar 2004 21:04:55 -0300
From: Sebastian Bassi 
Reply-To: sbassi@asalup.org
Organization: ASALUP
To: Brad Chapman 
Subject: Re: Tm calc: 1.3 (This is the good one!)

Brad Chapman wrote:
> Hi Sebastian;
>>Did you get my last version of Tm function?
> I didn't. If you could send it again that would be great. I was
> wondering what happened with that :-).

Here is the new version; see inside the zip (I zipped it because plain
text often gets corrupted by email).

-- 
Best regards,

//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ   //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center -      \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 -     \=//

                  http://Bioinformatica.info

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tm.zip
Type: application/zip
Size: 3775 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20040312/6c805e64/tm-0001.zip
-------------- next part --------------
*** tmori.py	Fri Mar  5 11:06:45 2004
--- tm.py	Fri Mar  5 11:10:25 2004
***************
*** 1,6 ****
--- 1,9 ----
  import string
  import math

+ STRONG_BONDS = ["G", "C"]
+ WEAK_BONDS = ["A", "T", "U"]
+ 
  def Tm_staluc(s,dnac=50,saltc=50,rna=0):
      """Returns DNA/DNA tm using nearest neighbor thermodynamics. dnac is
      DNA concentration [nM] and saltc is salt concentration [mM].
***************
*** 34,66 ****
          if rna==0:
              #DNA/DNA
              #Allawi and SantaLucia (1997). Biochemistry 36 : 10581-10594
!             if stri[0]=="G" or stri[0]=="C":
                  deltah=deltah-0.1
                  deltas=deltas+2.8
!             elif stri[0]=="A" or stri[0]=="T":
                  deltah=deltah-2.3
                  deltas=deltas-4.1
!             if stri[-1]=="G" or stri[-1]=="C":
!                 deltah=deltah-0.1
                  deltas=deltas+2.8
!             elif stri[-1]=="A" or stri[-1]=="T":
                  deltah=deltah-2.3
                  deltas=deltas-4.1
              dhL=dh+deltah
              dsL=ds+deltas
              return dsL,dhL
          elif rna==1:
!             #RNA
!             if stri[0]=="G" or stri[0]=="C":
                  deltah=deltah-3.61
                  deltas=deltas-1.5
!             elif stri[0]=="A" or stri[0]=="T" or stri[0]=="U":
                  deltah=deltah-3.72
                  deltas=deltas+10.5
!             if stri[-1]=="G" or stri[-1]=="C":
                  deltah=deltah-3.61
                  deltas=deltas-1.5
!             elif stri[-1]=="A" or stri[-1]=="T" or stri[0]=="U":
                  deltah=deltah-3.72
                  deltas=deltas+10.5
              dhL=dh+deltah
--- 37,69 ----
          if rna==0:
              #DNA/DNA
              #Allawi and SantaLucia (1997). Biochemistry 36 : 10581-10594
!             if stri[0] in STRONG_BONDS:
                  deltah=deltah-0.1
                  deltas=deltas+2.8
!             elif stri[0] in WEAK_BONDS:
                  deltah=deltah-2.3
                  deltas=deltas-4.1
!             if stri[-1] in STRONG_BONDS:
!                 deltah=deltah-0.1
                  deltas=deltas+2.8
!             elif stri[-1] in WEAK_BONDS:
                  deltah=deltah-2.3
                  deltas=deltas-4.1
              dhL=dh+deltah
              dsL=ds+deltas
              return dsL,dhL
          elif rna==1:
!             #RNA/RNA
!             if stri[0] in STRONG_BONDS:
                  deltah=deltah-3.61
                  deltas=deltas-1.5
!             elif stri[0] in WEAK_BONDS:
                  deltah=deltah-3.72
                  deltas=deltas+10.5
!             if stri[-1] in STRONG_BONDS:
                  deltah=deltah-3.61
                  deltas=deltas-1.5
!             elif stri[-1] in WEAK_BONDS:
                  deltah=deltah-3.72
                  deltas=deltas+10.5
              dhL=dh+deltah
***************
*** 68,90 ****
              # print "delta h=",dhL
              return dsL,dhL

!     def overcount(st,p):
!         """Returns how many p are on st, works even for overlapping"""
!         ocu=0
!         x=0
!         while 1:
!             try:
!                 i=st.index(p,x)
!             except ValueError:
!                 break
!             ocu=ocu+1
!             x=i+1
!         return ocu

      sup=string.upper(s)
      R=1.987 # universal gas constant in Cal/degrees C*Mol
      vsTC,vh=tercorr(sup)
      vs=vsTC

      k=(dnac/4.0)*1e-8
      #With complementary check on, the 4.0 should be changed to a variable.
--- 71,91 ----
              # print "delta h=",dhL
              return dsL,dhL

!     def countdinucs(s):
!         """Counts dinucleotide frequencies in a sequence"""
!         dinucs={}
!         map(dinucs.__setitem__,[a+b for a in 'ACGT' for b in 'ACGT'],[0]*16)
!         for i in range(len(s)-1):
!             dn=s[i:i+2]
!             dinucs[dn]+=1
!         return dinucs

      sup=string.upper(s)
      R=1.987 # universal gas constant in Cal/degrees C*Mol
      vsTC,vh=tercorr(sup)
      vs=vsTC
+     dinuc=countdinucs(sup)
+ 

      k=(dnac/4.0)*1e-8
      #With complementary check on, the 4.0 should be changed to a variable.
***************
*** 92,136 ****
      if rna==0:
          #DNA/DNA
          #Allawi and SantaLucia (1997). Biochemistry 36 : 10581-10594
!         vh=vh+((overcount(sup,"AA"))*7.9+(overcount(sup,"TT"))*
!            7.9+(overcount(sup,"AT"))*7.2+(overcount(sup,"TA"))*
!            7.2+(overcount(sup,"CA"))*8.5+(overcount(sup,"TG"))*
!            8.5+(overcount(sup,"GT"))*8.4+(overcount(sup,"AC"))*8.4)
!         vh=vh+((overcount(sup,"CT"))*7.8+(overcount(sup,"AG"))*
!            7.8+(overcount(sup,"GA"))*8.2+(overcount(sup,"TC"))*8.2)
!         vh=vh+((overcount(sup,"CG"))*10.6+(overcount(sup,"GC"))*
!           10.6+(overcount(sup,"GG"))*8+(overcount(sup,"CC"))*8)
!         
!         vs=vs+((overcount(sup,"AA"))*22.2+(overcount(sup,"TT"))*
!           22.2+(overcount(sup,"AT"))*20.4+(overcount(sup,"TA"))*21.3)
!         vs=vs+((overcount(sup,"CA"))*22.7+(overcount(sup,"TG"))*
!           22.7+(overcount(sup,"GT"))*22.4+(overcount(sup,"AC"))*22.4)
!         vs=vs+((overcount(sup,"CT"))*21.0+(overcount(sup,"AG"))*
!           21.0+(overcount(sup,"GA"))*22.2+(overcount(sup,"TC"))*22.2)
!         vs=vs+((overcount(sup,"CG"))*27.2+(overcount(sup,"GC"))*
!           27.2+(overcount(sup,"GG"))*19.9+(overcount(sup,"CC"))*19.9)
          ds=vs
          dh=vh
      else:
          #RNA/RNA hybridisation of Xia et al (1998)
          #Biochemistry 37: 14719-14735         
!         vh=vh+((overcount(sup,"AA"))*6.82+(overcount(sup,"TT"))*
!            6.6+(overcount(sup,"AT"))*9.38+(overcount(sup,"TA"))*
!           7.69+(overcount(sup,"CA"))*10.44+(overcount(sup,"TG"))*
!           10.5+(overcount(sup,"GT"))*11.4+(overcount(sup,"AC"))*10.2)
!         vh=vh+((overcount(sup,"CT"))*10.48+(overcount(sup,"AG"))*
!            7.6+(overcount(sup,"GA"))*12.44+(overcount(sup,"TC"))*13.3)
!         vh= vh+((overcount(sup,"CG"))*10.64+(overcount(sup,"GC"))*
!           14.88+(overcount(sup,"GG"))*13.39+(overcount(sup,"CC"))*12.2)
! 
!         vs=vs+((overcount(sup,"AA"))*19.0+(overcount(sup,"TT"))*
!           18.4+(overcount(sup,"AT"))*26.7+(overcount(sup,"TA"))*20.5)
!         vs=vs+((overcount(sup,"CA"))*26.9+(overcount(sup,"TG"))*
!           27.8+(overcount(sup,"GT"))*29.5+(overcount(sup,"AC"))*26.2)
!         vs=vs+((overcount(sup,"CT"))*27.1+(overcount(sup,"AG"))*
!           19.2+(overcount(sup,"GA"))*32.5+(overcount(sup,"TC"))*35.5)
!         vs=vs+((overcount(sup,"CG"))*26.7+(overcount(sup,"GC"))*
!           36.9+(overcount(sup,"GG"))*32.7+(overcount(sup,"CC"))*29.7)
          ds=vs
          dh=vh

--- 93,119 ----
      if rna==0:
          #DNA/DNA
          #Allawi and SantaLucia (1997). Biochemistry 36 : 10581-10594
!         vh=vh+dinuc["AA"]*7.9+dinuc["TT"]*7.9+dinuc["AT"]*7.2+dinuc["TA"]*7.2+\
!          dinuc["CA"]*8.5+dinuc["TG"]*8.5+dinuc["GT"]*8.4+dinuc["AC"]*8.4+\
!          dinuc["CT"]*7.8+dinuc["AG"]*7.8+dinuc["GA"]*8.2+dinuc["TC"]*8.2+\
!          dinuc["CG"]*10.6+dinuc["GC"]*10.6+dinuc["GG"]*8+dinuc["CC"]*8
!         vs=vs+dinuc["AA"]*22.2+dinuc["TT"]*22.2+dinuc["AT"]*20.4+dinuc["TA"]*21.3+\
!          dinuc["CA"]*22.7+dinuc["TG"]*22.7+dinuc["GT"]*22.4+dinuc["AC"]*22.4+\
!          dinuc["CT"]*21.0+dinuc["AG"]*21.0+dinuc["GA"]*22.2+dinuc["TC"]*22.2+\
!          dinuc["CG"]*27.2+dinuc["GC"]*27.2+dinuc["GG"]*19.9+dinuc["CC"]*19.9
          ds=vs
          dh=vh
      else:
          #RNA/RNA hybridisation of Xia et al (1998)
          #Biochemistry 37: 14719-14735         
!         vh=vh+dinuc["AA"]*6.6+dinuc["TT"]*6.6+dinuc["AT"]*5.7+dinuc["TA"]*8.1+\
!          dinuc["CA"]*10.5+dinuc["TG"]*10.5+dinuc["GT"]*10.2+dinuc["AC"]*10.2+\
!          dinuc["CT"]*7.6+dinuc["AG"]*7.6+dinuc["GA"]*13.3+dinuc["TC"]*13.3+\
!          dinuc["CG"]*8.0+dinuc["GC"]*14.2+dinuc["GG"]*12.2+dinuc["CC"]*12.2
!         vs=vs+dinuc["AA"]*18.4+dinuc["TT"]*18.4+dinuc["AT"]*15.5+dinuc["TA"]*16.9+\
!          dinuc["CA"]*27.8+dinuc["TG"]*27.8+dinuc["GT"]*26.2+dinuc["AC"]*26.2+\
!          dinuc["CT"]*19.2+dinuc["AG"]*19.2+dinuc["GA"]*35.5+dinuc["TC"]*35.5+\
!          dinuc["CG"]*19.4+dinuc["GC"]*34.9+dinuc["GG"]*29.7+dinuc["CC"]*29.7
          ds=vs
          dh=vh

***************
*** 138,141 ****
      tm=((1000* (-dh))/(-ds+(R * (math.log(k)))))-273.15
      # print "ds="+str(ds)
      # print "dh="+str(dh)
!     return tm
\ No newline at end of file
--- 121,124 ----
      tm=((1000* (-dh))/(-ds+(R * (math.log(k)))))-273.15
      # print "ds="+str(ds)
      # print "dh="+str(dh)
!     return tm
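The patch replaces repeated overcount() calls with a single pass that tallies every dinucleotide. Below is a Python 3 sketch of that counting idea. count_dinucs is a renamed, hypothetical version, not the code in tm.zip, and overcount is included for comparison. One difference is deliberate: pairs containing letters other than ACGT (N, or U in RNA) get their own keys instead of raising KeyError as dinucs[dn]+=1 would. Note that the coefficient sums in both versions only read T-containing keys, so U must be written as T for RNA input to be counted.

```python
from itertools import product

def count_dinucs(seq):
    """Count overlapping dinucleotides in seq in a single pass.

    All 16 ACGT dinucleotides start at zero so lookups such as counts["CG"]
    always work; other pairs are added via dict.get instead of failing.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product("ACGT", repeat=2)}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def overcount(st, p):
    """Old approach: count occurrences of p in st, overlaps included."""
    ocu, x = 0, 0
    while True:
        i = st.find(p, x)
        if i == -1:
            return ocu
        ocu += 1
        x = i + 1
```

For any sequence, count_dinucs(seq)[dn] equals overcount(seq.upper(), dn) for each of the 16 ACGT dinucleotides. However, it scans the sequence once instead of 16 times per coefficient sum.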

From nauman.maqbool at agresearch.co.nz  Sun Mar 14 22:47:37 2004
From: nauman.maqbool at agresearch.co.nz (Maqbool, Nauman)
Date: Sun Mar 14 22:53:11 2004
Subject: [BioPython] Blast parser error
Message-ID: 

Hi 

I am new to biopython and am trying out the NCBI Standalone Blast
parser. While trying the blast parsing methods from the cookbook
(parsing standalone Blastn output) I got the following error message:

>>> ================================ RESTART ================================
>>> 

Traceback (most recent call last):
  File "C:/Python/NM Python work/SV/sing_blst_SVparse.py", line 26, in -toplevel-
    testparser(in_file)
  File "C:/Python/NM Python work/SV/sing_blst_SVparse.py", line 11, in testparser
    b_record = b_iterator.next()
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1332, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 557, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 97, in feed
    self._scan_rounds(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 153, in _scan_rounds
    self._scan_alignments(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 287, in _scan_alignments
    self._scan_pairwise_alignments(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 297, in _scan_pairwise_alignments
    self._scan_one_pairwise_alignment(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 309, in _scan_one_pairwise_alignment
    self._scan_hsp(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 337, in _scan_hsp
    self._scan_hsp_alignment(uhandle, consumer)
  File "C:\Python23\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 368, in _scan_hsp_alignment
    read_and_call(uhandle, consumer.query, start='Query')
  File "C:\Python23\Lib\site-packages\Bio\ParserSupport.py", line 300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'Query':
ncbiClient.20040311_1242_9391.log

>>> 

The version of BLAST we are running is 2.2.6 [Apr-09-2003]. I found a
similar BLAST parser error in the Biopython archives, but that one was
caused by the output format change in blastx. I don't think the
blastn output has changed recently, so the problem might be
something I'm missing in my script. Here is the script I am running:

from Bio.Blast import NCBIStandalone

in_file = 'test.blast'
E_VALUE_THRESH = 0.05

def testparser(blastfile):
    blast_out = open(blastfile, "r")
    b_parser = NCBIStandalone.BlastParser()
    b_iterator = NCBIStandalone.Iterator(blast_out, b_parser)

    while 1:
        # The iterator returns None once the file is exhausted.
        b_record = b_iterator.next()
        if b_record is None:
            break

        for alignment in b_record.alignments:
            # Each alignment keeps its HSPs in the 'hsps' attribute.
            for hsp in alignment.hsps:
                if hsp.expect < E_VALUE_THRESH:
                    print '*****Alignment*****'
                    print 'sequence:', alignment.title
                    print 'length:', alignment.length
                    print 'e value:', hsp.expect

    blast_out.close()

# Main
testparser(in_file)

Any help will be highly appreciated.

Regards

Nauman

********************************************
Nauman J Maqbool PhD
Bioinformatics Group
AgResearch Invermay
Private Bag 50034
Puddle Alley
Mosgiel
New Zealand
email: nauman.maqbool@agresearch.co.nz
Tel: +64-3-489 9031
Fax: +64-3-489 3739
********************************************

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From jeffrey_chang at stanfordalumni.org  Mon Mar 15 00:14:24 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Mon Mar 15 00:19:55 2004
Subject: [BioPython] Blast parser error
In-Reply-To: 
References: 
Message-ID: <9FC1F012-763F-11D8-8B70-000A956845CE@stanfordalumni.org>

It is likely because the BLAST format has changed.  If you can send the 
BLAST output file that is causing the problem, I can take a look at it 
for you.  Otherwise, you will need to figure out which line is causing 
the problem and update the parser to deal with it properly.
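Following Jeff's suggestion to find the offending line: the error message shows the parser met ncbiClient.20040311_1242_9391.log where it expected a line starting with 'Query'. Below is a minimal plain-Python sketch (Python 3, hypothetical helper name, not part of Biopython) that reports every line containing a given piece of text, with a little surrounding context, so you can see where in the report that text turns up.

```python
def find_lines(handle, text, context=2):
    """Return (line_number, nearby_lines) for each line containing `text`.

    `handle` may be an open file or any iterable of lines; line numbers
    are 1-based, and `context` lines are kept on either side of a match.
    """
    lines = [line.rstrip("\n") for line in handle]
    hits = []
    for index, line in enumerate(lines):
        if text in line:
            start = max(0, index - context)
            hits.append((index + 1, lines[start:index + context + 1]))
    return hits
```

For example, opening test.blast (the file name from Nauman's script) and calling find_lines(handle, "ncbiClient") would list each matching line number together with the lines around it.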

Jeff


From absmythe at ucdavis.edu  Mon Mar 15 16:57:48 2004
From: absmythe at ucdavis.edu (ashleigh smythe)
Date: Mon Mar 15 17:03:19 2004
Subject: [BioPython] trying to make NBRF dictionary
Message-ID: <1079387868.6757.20.camel@nate.ucdavis.edu>

Hello.  As there seems to be no existing Bio.Fasta-style dictionary code
for alignments (Clustalw or NBRF), I thought I'd try to write a simple
script using the NBRF iterator to make a dictionary of sequence
name:sequence key:value pairs.  My ultimate goal is to be able to
combine different aligned datasets where the sequence names (taxa) are
the same but they are in a different order (otherwise I could just
append one to the other).  It seemed like a good use of a dictionary,
only I'm still pretty lame at Python.  I thought I'd start with just
trying to get one file into a dictionary, and I'm stuck already.  My
code seems to make a dictionary of sorts, but it behaves like it only
has 1 key:value pair rather than 4 (len(mydict) returns 1) and the keys 
are just my variable name (cur_record.sequence_name), not what I think
the keys should be - the actual data I put into the dictionary.  I'm
guessing that means I have some scope problem.  Can anybody please give
me some tips on where to go, at least for this first chunk?
Here is my script:

import Bio
from Bio import NBRF

mydict={}

def makedict(file1):
     parser=NBRF.RecordParser()
     first_file=open(file1, 'r')
     iterator=NBRF.Iterator(first_file, parser)

     while 1:
         cur_record=iterator.next()
         if cur_record is None:
             break
         name=cur_record.sequence_name
         sequence=cur_record.sequence.data
         mydict[name] = sequence

     return mydict

And here is what I get:
>>> seqcombine2.makedict('test.pir')
{'9.1Otostrongylus_sp._U81589.1':
'----------------------------------------------------------------------------------------------------T-GTC-GA--GTTC-A--CC------TT--C--A---AG-T-GA--AA-C-TGCGAACGGCTCATTAG-AGCAGATG-T-CATT---TATT-CG--G--AA-A--A-T--C--C--A-TTT-GGA--TAACTGCG--GTAAT-TCTGGAGCTAATACATGCG-ATTA-A-AC-CCTG-AC---T--T-T--T---GAAA--GGGTGCAAT-TA-TTAGAG---C---AA-A-TCAAT-CAT-------------T-T---TC----------G-GA------TG----TAGTT----------T---GCT---G-A-C-TC-TGAATA-A---CG--CAG--CATA-TCGG-CGGC-T-T-GT---TCGCCGATAAT-CCGAAAA----AG---TGT-C-TGCCC-TATCA--AC---CT---GA-TGGTAGTCTATTAGTCTA-CCATGGTTATTACGGGTAACGGAGAATAAGGGTT-CGACTCCGGAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAG-GCGCGAAACTTATCCAA-T-CTTG-----A-ATAGATGA-GATAGTGACT-----------------------AAAAATAAAAA--GACCA---TTCC-T-AT-G--GAACG-GTTATTTCAATGAGT--TGATCATAAACCTTTTTT--C-G-AGGA--TCAAGTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTC--CACTAGTGTA-AATCGTCATTGCTGCGGTTAAAAAGC-TCGTAGTTGGAT-C-TGAGTCGC---AT--GCA-AT-G-GTTCG--C-CT----T--TG--G--CGT----TAAT------C---AT-TG-TTGTG---ACTA---T------T-T---G--CTG--G-T-T--TTCT-AT--TG-A--AA-----TTTC-----G-A-TT-----TCTTTA-GTG-GC-TA--GCGA-GTT-TA-CTTTGA-AT-AAATTAGAGTGCT-CAGAACAAG---CGTT-----T--GC-TT-G--AAT-G-GTCGAT-CATGGAATAA-----TAAAAGAGGAC--TTCG---GT-T------CTATT-T----ATTGGTTC-AG---G-AA------CTG------AAAT-AATGGTTAAGAGGGACA--ATTC-GGGGGCATTCGTATCCCTGCGCGAGAGGTGAAATTCGTG-GACCG-CAGGGGGACGCCCTAAAGCGAAAG-CATTTGCC-AAGAAT--GTCTTCATTAATCA-AGAACGAAAGTCAGAGGTTCGAAGGCGATTAGATA--CCGCCC-TAGTTCTGACCGTAAACTATGCCATCTAGC-GA--TCC-GAT--GG-GG--TA--T--TG--T-T----GCCTT--GTCGAGG-AGCTT-CCCGGAAACGA--AA-GTCTTTCGGT-TCCTGGGGTAGTATGGTTGC-AAAGCT-G-AAACTTAAAGA-AATTGACGGAATGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGA--AAACT-CACCC-GGCCCGGACACCGTAA-GGATTGAC-----AGATTGA--A---AGCTCTTTCTC-GATTTGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTG-GTGGAG-CGATTTGTCTGGTTTATTCC-GAT-AACGAGCGAGACTCT-AG-C-C--TG-CTAAA-TA-G--TGA--CAA---------------GA----TT-----------TT------T----ATGTC-------TA-G----T--C-------TA-------------C-TT-----CTT-AG---AGGGATAAG-CGG---TGTT-T-----A-G-C--CGCA--CG-AGATTGAGCGATAACA
GGTCTGTGATGCCCTTAGATGTCCGGGG-CTG-CACGCGCGCTACAATGGAAG-AAT-CAGT--TGGC---CTA--T----CCAT-TGC-CG-A-AAGGT-AT----T----GGTAAACCG-TTGAAACT--CTTCC-GTG-ACCGGGATAGGGAATTGT--A-ATT---------ATT---TCCC-TTGAACG-AGGAATTCCTAGTAAGTGTG-AGTCATCAGCTCACGCTGATTACGTCCC-TGCCATTTGTACACACCGCCCGTCGCTGTC-CGGG-ACTG--AGC-TGTC--TCGAGAGGACT-GCGG-A-CTG----CT--GTA----TTGA-GG---CCT-------T---CGGG------TCG-----TGGTA----TAGCG---GG-AAA-CAG-TTC-AATC-G-CAATG-G--CTTGAACCGGGTAAAAGTCGT-AACAAGGTATCTG---------------------------------------------------------------------', '813Otostrongylus_circumlitus_A': '-----------------------------------------------------------------------------------GATT-AAGCCATG-CA-T-GTC-GA--GTTC-A--GC------TT--C--A---AG-T-GA--AA-C-TGCGAACGGCTCATTAG-AGCAGATG-T-CATT---TATT-CG--G--AA-A--A-T--C--C--A-TTT-GGA--TAACTGCG--GTAAT-TCTGGAGCTAATACATGCG-ATTA-A-AC-CCTG-AC---T--T-T--T---GAAA--GGGTGCAAT-TA-TTAGAG---C---AA-A-TCAAT-CAT-------------T-T---TC----------G-GA------TG----TAGTT----------T---GCT---G-A-C-TC-TGAATA-A---CG--CAG--CATA-TCGG-CGGC-T-T-GT---TCGCCGATAAT-CCGAAAA----AG---TGT-C-TGCCC-TATCA--AC---CT---GA-TGGTAGTCTATTAGTCTA-CCATGGTTATTACGGGTAACGGAGAATAAGGGTT-CGACTCCGGAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAG-GCGCGAAACTTATCCAA-T-CTTG-----A-ATAGATGA-GATAGTGACT-----------------------AAAAATAAAAA--GACCA---TTCC-T-AT-G--GAACG-GTTATTTCAATGAGT--TGATCATAAACCTTTTTT--C-G-AGGA--TCAAGTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTC--CACTAGTGTA-AATCGTCATTGCTGCGGTTAAAAAGC-TCGTAGTTGGAT-C-TGAGTCGC---AT--GCA-AT-G-GTTCG--C-CT----T--TG--G--CGT----TAAT------C---AT-TG-TTGTG---ACTA---T------T-T---G--CTG--G-T-T--TTCT-AT--TG-A--AA-----TTTC-----G-A-TT-----TCTTTA-GTG-GC-TA--GCGA-GTT-TA-CTTTGA-AT-AAATTAGAGTGCT-CAGAACAAG---CGTT-----T--GC-TT-G--AAT-G-GTCGAT-CATGGAATAA-----TAAAAGAGGAC--TTCG---GT-T------CTATT-T----ATTGGTTC-AG---G-AA------CTG------AAAT-AATGGTTAAGAGGGACA--ATTC-GGGGGCATTCGTATCCCTGCGCGAGAGGTGAAATTCGTG-GACCG-CAGGGGGACGCCCTAAAGCGAAAG-CATTTGCC-AAGAAT--GTCTTCATTAATCA-AGAACGAAAGTCAGAGGTTCGAAGGCGATTAGATA--CCGCCC-TAGTTCTGAC
CGTAAACTATGCCATCTAGC-GA--TCC-GAT--GG-GG--TA--T--TG--T-T----GCCTT--GTCGAGG-AGCTT-CCCGGAAACGA--AA-GTCTTTCGGT-TCCTGGGGTAGTATGGTTGC-AAAGCT-G-AAACTTAAAGA-AATTGACGGAATGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGA--AAACT-CACCC-GGCCCGGACACCGTAA-GGATTGAC-----AGATTGA--A---AGCTCTTTCTC-GATTTGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTG-GTGGAG-CGATTTGTCTGGTTTATTCC-GAT-AACGAGCGAGACTCT-AG-C-C--TG-CTAAA-TA-G--TGA--CAA---------------GA----TT-----------TT------T----ATGTC-------TA-G----T--C-------TA-------------C-TT-----CTT-AG---AGGGATAAG-CGG---TGTT-T-----A-G-C--CGCA--CG-AGATTGAGCGATAACAGGTCTGTGATGCCCTTAGATGTCCGGGG-CTG-CACGCGCGCTACAATGGAAG-AAT-CAGT--TGGC---CTA--T----CCAT-TGC-CG-A-AAGGT-AT----T----GGTAAACCG-TTGAAACT--CTTCC-GTG-ACCGGGATAGGGAATTGT--A-ATT---------ATT---TCCC-TTGAACG-AGGAATTCCTAGTAAGTGTG-AGTCATCAGCTCACGCTGATTACGTCCC-TGCCATTTGTACACACCGCCCGTCGCTGTC-CGGG-ACTG--AGC-TGTC--TCGAGAGGACT-GCGG-A-CTG----CT--GTA----TTGA-GG---CCT-------T---CGGG------TCG-----TGGTA----TAGCG---GG-AAA-CAG-TTC-AATC-G-CAATG-G--CTTGAACCGGGTAAAAGTCGT-AACAAGGTATCTGTAGGTGAACCTGG--------------------------------------------------------', '815Parelaphostrongylus_odocoil': 
'------------------------------------------------------------------------------------ATT-AAGCCATG-CA-T-GTG-GA--GTTC-A--AC------TT--CA-A---AG-T-GA--AA-C-TGCGAACGGCTCATTAG-AGCAGATG-T-CATT---TATT-CG--G--AA-A--A-T--CC-T--T-AAT-GGA--TAACTGCG--GTAAT-TCTGGAGCTAATACATATGCAT-A-A-AC-CCTG-AC---T--C-TG-T---GAAA--GGGTGCAAT-TA-TTAGAG---C---AA-A-TCAAT-CAT-------------T-T---TC----------G-GA------TG----TAGTT----------T---GCT---G-A-C-TC-TGAATA-A---CG--CAG--CATA-TCGG-CGGC-T-T-GT---TCGCCGATATT-CCGAAAA----AG---TGT-C-TGCCC-TATCA--AC---CT---GA-TGGTAGTCTATTAGTCTA-CCATGGTTATTACGGGTAACGGAGAATAAGGGTT-CGACTCCGGAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAG-GCGCGAAACTTATCCAA-T-CTTG-----A-ATAGATGA-GATAGTGACT-----------------------AAAAATAAAAA--GACCA---TTCC-T-AT-G--GAACG-GTCATTTCAATGAGT--TGATCATAAACCTTTTTT--C-G-AGTA--TCAAGTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTC--CACTAGTGTA-AATCGTCATTGCTGCGGTTAAAAAGC-TCGTAGTTGGAT-C-TGAGTCGC---AT--GCA-AT-G-ATTCG--C-CT----T--TG--G--CGT----TAAT------C---AT-TG-TTGTG---ACTA---T------T-T---G--CTG--G-T-T--TTCT-AT--TG-A--AA-----TTTC-----G-A-TT-----TCTATA-GTG-GC-TA--GCGA-GTT-TA-CTTTGA-AT-AAATTAAAGTGCT-CAGAACAAG---CGTT-----T--GC-TT-G--AAT-G-GTCGAT-CATGGAATAA-----TAAAAGAGGAC--TTCG---GT-T------CTATT-T----ATTGGTTC-AG---G-AA------CTG------AAAT-AATGGTTAAGAGGGACA--ATTC-GGGGGCATTCGTATCCCTGCGCGAGAGGTGAAATTCGTG-GACCG-CAGGGGGACGCCCTAAAGCGAAAG-CATTTGCC-AAGAAT--GTCTTCATTAATCA-AGAACGAAAGTCAGAGGTTCGAAGGCGATTAGATA--CCGCCC-TAGTTCTGACCGTAAACTATGCCATCTAGC-GA--TCC-GAT--GG-GG--TA--T--TG--T-T----GCCTT--GTCGAGG-AGCTT-CCCGGAAACGA--AA-GTCTTTCGGT-TCCTGGGGTAGTATGGTTGC-AAAGCT-G-AAACTTAAAGA-AATTGACGGAATGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGA--AAACT-CACCC-GGCCCGGACACCGTAA-GGATTGAC-----AGATTGA--A---AGCTCTTTCTC-GATTTGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTG-GTGGAG-CGATTTGTCTGGTTTATTCC-GAT-AACGAGCGAGACTCT-AG-C-C--TG-CTAAA-TA-G--TGA--CTA---------------GA----T------------ACG-----T----ATGTC-------TA-G----T--C-------TA-------------C-TT-----CTT-AG---AGGGATAAG-CGG---TGTT-T-----A-G-C--CGCA--CG-AGATTGAGCGATAACA
GGTCTGTGATGCCCTTAGATGTTCGGGG-CTG-CACGCGCGCTACAATGGAAG-AAT-CAGC--TGGC---CTA--T----CCAT-TAC-CG-A-AAGGT-AT----T----GGTAAACCG-TTGAAACT--CTTCC-GTG-ACCGGGATAGGGAATTGT--A-ATT---------ATT---TCCC-TTGAACG-AGGAATTCCTAGTAAGTGTG-AGTCATCAGCTCACGCTGATTACGTCCC-TGCCATTTGTACACACCGCCCGTCGCTGTC-CGGG-ACTG--AGC-TGTC--TCGAGAGGACT-GCGG-A-CTA----CT--GTA----TTGA-GG---CCT-------T---CGGG------TCG-----CGATA----TGGCG---GG-AAA-CAG-TTC-AATC-G-CAATG-G--CTTGAACCGGGTAAAAGTCGT-A---------------------------------------------------------------------------------', '804Angiostrongylus_cantonensis': '------------------------------------------------------------------------------------ATT-AAGCCATG-CA-T-GAG-GA--GTTC-A--GC------TT--TA-A----G-T-GA--AA-C-TGCGAACGGCTCATTAG-AGCAGATG-T-GATT---TATT-CG--G--AA-A--A-T--CC-T----ATT-GGA--TAACTGCG--GTAAT-TCTGGAGCTAATACATGCGTAT-A-A-AC-CCTG-AC---T--T-T--C---GAAA--GGGTGCAAT-TA-TTAGAG---C---AA-A-TCAAT-CAT-------------T-T---TC----------G-GA------TG----TAGTT----------T---GCT---G-A-C-TC-TGAATA-A---CG--CAG--CATA-TCGG-CGGC-T-T-GT---TCGCCGATAAT-CCGAAAA----AG---TGT-C-TGCCC-TATCA--AC---CT---GA-TGGTAGTCTATTAGTCTA-CCATGGTTATTACGGGTAACGGAGAATAAGGGTT-CGACTCCGGAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAG-GCGCGAAACTTATCCAA-T-CTTG-----A-ATAGATGA-GATAGTGACT-----------------------AAAAATAAAAA--GACCA---TTCC-T-AT-G--GAACG-GTTATTTCAATGAGT--TGATCATAAACCTTTTTT--C-G-AGTA--TCCAGTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTC--CACTAGTGTA-AATCGTCATTGCTGCGGTTAAAAAGC-TCGTAGTTGGAT-C-TGAGTTGC---AT--GCA-AT-G-ATTCG--C-CT----T--TG--G--CGT----TAAT------C---AT-TG-TTGTG---ACTA---T------T-T---G--CTG--G-T-T--TTCT-AT--TG-A--AA-----TTTC-----G-A-TT-----TCTTTA-GTG-GC-TA--GCGA-GTT-TA-CTTTGA-AT-AAATTAAAGTGCT-CAGAACAAG---CGTT-----T--GC-TT-G--AAT-G-GTCGAT-CATGGAATAA-----TAAAAGAGGAC--TTCG---GT-T------CTATT-T----ATTGGTTC-AG---G-AA------CTG------AAGT-AATGATTAAGAGGGACA--ATTC-GGGGGCATTCGTATCCCTGCGCGAGAGGTGAAATTCGTG-GACCG-CAGGGGGACGCCCTAAAGCGAAAG-CATTTGCC-AAGAAT--GTCTTCATTAATCA-AGAACGAAAGTCAGAGGTTCGAAGGCGATTAGATA--CCGCCC-TAGTTCTGAC
CGTAAACTATGCCATCTAGC-GA--TCC-GAT--GG-GG--TA--T--TG--T-T----GCCTT--GTCGAGG-AGCTT-CCCGGAAACGA--AA-GTCTTTCGGT-TCCTGGGGTAGTATGGTTGC-AAAGCT-G-AAACTTAAAGA-AATTGACGGAATGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGA--AAACT-CACCC-GGCCCGGACACCGTAA-GGATTGAC-----AGATTGA--A---AGCTCTTTCTC-GATTTGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTG-GTGGAG-CGATTTGTCTGGTTTATTCC-GAT-AACGAGCGAGACTCT-AG-C-C--TG-CTAAA-TA-G--TGA--CTA---------------GA----TT-----------AT------T----GAGTC-------TA-G----T--C-------TA-------------C-TT-----CTT-AG---AGGGATAAG-CGG---TGTT-T-----A-G-C--CGCA--CG-AGATTGAGCGATAACAGGTCTGTGATGCCCTTAGATGTCCGGGG-CTG-CACGCGCGCTACAATGGAAG-AAT-CAGC--TGGC---CTA--T----CCAT-TGC-CG-A-AAGGT-AT----T----GGTAAACCG-TTGAAACT--CTTCC-GTG-ACCGGGATAGGGAATTGT--A-ATT---------ATT---TCCC-TTGAACG-AGGAATTCCTAGTAAGTGTG-AGTCATCAGCTCACGCTGATTACGTCCC-TGCCATTTGTACACACCGCCCGTCG
CTGTC-CGGG-ACTG--AGC-TGTC--TCGAGAGGACT-GCGG-A-CTA----CT--GTA----TTGA-GG---CCT-------T---CGGG------TCG-----CGATA----TGGCG---GG-AAA-CAG-TTC-AATC-G-CAATG-G--CTTGAACCGGGTAAAAGTCGT-AACAAGGTATCTG---------------------------------------------------------------------'}

Thanks for any input.

Ashleigh

From nauman.maqbool at agresearch.co.nz  Mon Mar 15 20:41:52 2004
From: nauman.maqbool at agresearch.co.nz (Maqbool, Nauman)
Date: Mon Mar 15 20:47:25 2004
Subject: [BioPython] Blast parser error
Message-ID: 

Hi everyone

I have another (beginner's) question. Is there a way to get the query
title from the header of the Blast report? By query title I mean the
title of the sequence used as the query for the Blast search.

I notice that other attributes, e.g. the title, length and HSP
information, can be retrieved very easily, but getting information from
the header, database report or parameters is not as straightforward.
Or is it?

Regards

Nauman


From lpritc at scri.sari.ac.uk  Tue Mar 16 07:00:25 2004
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Tue Mar 16 07:05:59 2004
Subject: [BioPython] Blast parser error
In-Reply-To: 
References: 
Message-ID: <4056EC59.2060302@scri.sari.ac.uk>

Maqbool, Nauman wrote:
| Hi everyone
|
| I have another (beginner's) question. Is there a way the query title in
| the header of the Blast report can be returned? By Query title I mean
| the title of the sequence used as the query for the Blast search.
|
| I notice that other objects e.g. from title, length and info about hsps
| can be returned very easily but returning objects from header,
| databasereport or parameters is not that straight forward, or is it?
|
| Regards
|
| Nauman

Hi Nauman,

If the record object returned from the parser is b_record, then the
title of the query sequence for the search that produced the record is
in b_record.query.

(for orientation, alignments are in b_record.alignments, hsps in
b_record.alignments[0].hsps and so on).

Try dir(b_record) for a list of the attributes of your record.
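Putting these pointers together with the E-value loop from the earlier script, here is a minimal Python 3 sketch that reads the query title once per record and pulls hit details from alignments and hsps. It relies only on the attribute names mentioned in this thread (query, alignments, hsps, title, length, expect); the function name is hypothetical, and the SimpleNamespace objects are stand-ins for trying it out, not Biopython classes.

```python
from types import SimpleNamespace


def summarize_record(b_record, e_value_thresh=0.05):
    """Collect (query, hit title, hit length, e-value) for good HSPs."""
    rows = []
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_thresh:
                rows.append((b_record.query, alignment.title,
                             alignment.length, hsp.expect))
    return rows


# Stand-in record with the same attribute names as a parsed BLAST record.
fake_record = SimpleNamespace(
    query="my_query_sequence",
    alignments=[
        SimpleNamespace(title="hit_1", length=1200,
                        hsps=[SimpleNamespace(expect=1e-30),
                              SimpleNamespace(expect=0.5)]),
        SimpleNamespace(title="hit_2", length=800,
                        hsps=[SimpleNamespace(expect=0.2)]),
    ],
)
rows = summarize_record(fake_record)
```

With a real parser, summarize_record(b_record) would be called inside the while loop in place of the stand-in.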

Best,

--
Dr Leighton Pritchard AMRSC
D104, PPI, Scottish Crop Research Institute
Invergowrie, Dundee, DD2 5DA, Scotland, UK
E: lpritc@scri.sari.ac.uk	W: http://bioinf.scri.sari.ac.uk/index.shtml
T: +44 (0)1382 568579		F: +44 (0)1382 568578
PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net

From chapmanb at uga.edu  Wed Mar 17 20:00:29 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 20:11:53 2004
Subject: [BioPython] Could BioPython transfer GenBank file to Fasta file
	like BioPerl?
In-Reply-To: <20040227175451.19840.qmail@web12704.mail.yahoo.com>
References: <20040227175451.19840.qmail@web12704.mail.yahoo.com>
Message-ID: <20040318010029.GA99271@evostick.agtec.uga.edu>

Hi Long;

> I have a GenBank file, and want to transfer it to
> Fasta file.  Of course, I can use FeatureParser to get
> the "sequence", "id", "description" ..., then write to
> a file.
> 
> I want to know if there is a simple command or a
> module to do that?  In Perl/BioPerl, it is very simple
> to transfer files between kinds of file format.

We do have a FormatIO system under development which is meant to act
much like the BioPerl SeqIO system and make simple format
conversions much easier.

I wrote up some cookbook style documentation for the system (and for
the "by hand" system) you describe above. You can get them from the
documentation page:

http://biopython.org/documentation/

under "Cookbook-style documentation" and "Converting GenBank (and
other formats) to Fasta."
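For the "by hand" route Long describes (pull out the id, description and sequence, then write them to a file), the writing step is small. This is a minimal Python 3 sketch assuming those values have already been extracted as plain strings; write_fasta is a hypothetical helper, not part of FormatIO, and the 60-column wrap is a common convention rather than a FASTA requirement.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (id, description, sequence) tuples to an open handle as FASTA."""
    for seq_id, description, sequence in records:
        header = seq_id
        if description:
            header = "%s %s" % (seq_id, description)
        handle.write(">%s\n" % header)
        # Wrap the sequence into lines of at most `width` characters.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Placeholder record; real values would come from the GenBank parser.
out = io.StringIO()
write_fasta([("seq1", "example entry", "ACGT" * 20)], out)
fasta_text = out.getvalue()
```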

Hopefully this is helpful -- please let me know if there are any
questions or I can improve the docs. Thanks!
Brad

From chapmanb at uga.edu  Wed Mar 17 20:18:53 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 20:30:18 2004
Subject: [BioPython] embl
In-Reply-To: <405050B4.5060007@biochem.ucl.ac.uk>
References: <405050B4.5060007@biochem.ucl.ac.uk>
Message-ID: <20040318011853.GC99271@evostick.agtec.uga.edu>

Hi Antonio;

> I'm new (and quite confused) to biopython.
> I have a simple question (maybe it looks silly):
> how do I parse an embl data file using biopython?
> Is there any way to retrieve the sequence information (The CDS section)?
> What about the position of the CDS sections (they are split in sub pieces)?

EMBL support is still lacking in Biopython. Currently we do have the
basis for developing an EMBL parser -- there is a Martel (the
underlying parsing system in Biopython) grammar for EMBL, located in
Bio/expressions/embl/embl65.py.

We still do need someone to help do the work to build this grammar
into a "Biopython-style" parser.

As a workaround, the GenBank parser in Biopython is quite functional
and widely used -- so you could fetch your sequences in GenBank
format and parse out the features from there, as described in the
documentation:

http://biopython.org/docs/tutorial/Tutorial004.html#toc13
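Antonio's other question, the positions of a CDS split into pieces, comes down to the feature location string, which EMBL and GenBank feature tables write in forms like join(12..78,134..202). The following is a rough Python 3 sketch that pulls the plain start..end pairs out of such a string. It is a hypothetical helper, not Biopython's parser, and it assumes all pieces lie on one strand; remote references, order(), single bases and other location forms are not handled.

```python
import re

# A plain range such as 12..78, allowing the < and > partial-end markers.
_RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")


def parse_simple_location(location):
    """Turn 'join(12..78,134..202)' into ([(12, 78), (134, 202)], strand)."""
    strand = -1 if "complement(" in location else 1
    ranges = [(int(start), int(end)) for start, end in _RANGE.findall(location)]
    return ranges, strand
```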

Hope this helps!
Brad
From chapmanb at uga.edu  Wed Mar 17 20:39:43 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 20:51:07 2004
Subject: [BioPython] Is there any more detailed documentation to BioPython?
In-Reply-To: <200403110100.i2B106tk013180@portal.open-bio.org>
References: <200403110100.i2B106tk013180@portal.open-bio.org>
Message-ID: <20040318013943.GF99271@evostick.agtec.uga.edu>

Hi Denny;

> 	I use BioPython about one year and it is really 
> a good programming language. But I think the ONLY drawback 
> is that BioPython has simple or poor documentation.

Thanks for the comments. I definitely feel the same way --
documentation is always an area where we are lacking. A couple of
ways I am thinking about improving things are:

* Getting a better representation of the documentation that is in
the modules. Many of the Biopython docstrings are very useful, but
the automatic extraction tool I've been using (Happydoc) hasn't
always made me happy. Just last week I was pointed to epydoc:

http://epydoc.sourceforge.net/

which I am hearing is much better. So we may be improving like that.

* Modularizing the Biopython Tutorial documentation into smaller
"cook-book" like sections. Honestly, the Tutorial is getting too big
and unwieldy to maintain, and I'm planning to split it up into
smaller parts that each describe an individual topic. We have
already started doing this with the installation instructions and
BioSQL, and recently I've written a couple of other smaller bits.

The second part -- writing small documents on doing something
with Biopython -- is something we can always use help with. We are
definitely looking for Biopython users to contribute here, and I
hope that by emphasizing short documents that each describe one
task, we can get more people to contribute on this front.

Thanks for the feedback.
Brad
From chapmanb at uga.edu  Wed Mar 17 20:47:38 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 20:59:01 2004
Subject: [BioPython] contig mapping in BioPython
In-Reply-To: <6.0.3.0.2.20040424150959.03801db0@wardroper.org>
References: <6.0.3.0.2.20040424150959.03801db0@wardroper.org>
Message-ID: <20040318014738.GG99271@evostick.agtec.uga.edu>

Hi Alan;

> I'm thinking about writing some BioPython modules for contig/genome mapping 
> - something akin to BioPerl's Bio::Assembler::contig - for use in genome 
> mapping (and whatever else it ends up lending itself to).

Cool. This is definitely something we can use. A reasonable set of
Python objects to hold contig information, and annotations on the
assemblies of contigs, would be excellent. BioPerl does seem a
reasonable place to start to look at these objects, especially since
they've dealt with some of the messy problems of sequence
coordinates along contigs.

> Can't find any references to any such projects that are ongoing but would 
> like to check if anyone else is working on this before I put in too much 
> time in reinventing more wheels than we need.
> Anyone think this would/would not be useful?

Definitely useful. As Andy pointed out, the helpful code along these
lines that we currently have is in CVS in Bio/Sequencing. These are
parsers for Phred and Ace contig files, the latter of which I
imagine might be most useful/relevant.

But yes, I do offer definite encouragement :-). Please do keep us up
to date with ideas/code.

Thanks for the mail.
Brad
From chapmanb at uga.edu  Wed Mar 17 21:32:28 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Wed Mar 17 21:43:58 2004
Subject: [BioPython] trying to make NBRF dictionary
In-Reply-To: <1079387868.6757.20.camel@nate.ucdavis.edu>
References: <1079387868.6757.20.camel@nate.ucdavis.edu>
Message-ID: <20040318023228.GJ99271@evostick.agtec.uga.edu>

Hi Ashleigh;

> Hello.  As there seems to be no existing Bio.Fasta-style dictionary code
> for alignments (Clustalw or NBRF), I thought I'd try to write a simple
> script using the NBRF iterator to make a dictionary of sequence
> name:sequence key:value pairs.

Okay, this makes good sense.

> I'm stuck already.  My code seems to make a dictionary of sorts, 
> but it behaves like it only
> has 1 key:value pair rather than 4 (len(mydict) returns 1) and the keys 
> are just my variable name (cur_record.sequence_name), not what I think
> the keys should be - the actual data I put into the dictionary.  I'm
> guessing that means I have some scope problem.

Yes, I think you're right. The output you gave seems to be what you
actually want (or at least what you describe you want above) but the
code itself does contain a bit of confusion with the mydict
dictionary, so it's probably something in the code that we don't see
in the example.

> mydict={}
>  
> def makedict(file1):
>      parser=NBRF.RecordParser()
>      first_file=open(file1, 'r')
>      iterator=NBRF.Iterator(first_file, parser)
>       
>      while 1:
>          cur_record=iterator.next()
>          if cur_record is None:
>              break
>          name=cur_record.sequence_name
>          sequence=cur_record.sequence.data
>          mydict[name] = sequence
>           
>      return mydict

Okay, the major confusion here is that mydict should be internal to
the makedict function. Since mydict[name] = sequence only modifies the
global dictionary (it never rebinds the name), the code as posted
should run without an UnboundLocalError, so I'm not exactly sure what
is going wrong, but I'd guess your function should look like:

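from Bio import NBRF

def makedict(file1):
    # Build a name -> sequence dictionary from one NBRF file
    parser = NBRF.RecordParser()
    first_file = open(file1, 'r')
    iterator = NBRF.Iterator(first_file, parser)
    mydict = {}    # local to this call, so each file gets a fresh dictionary

    while 1:
        cur_record = iterator.next()
        if cur_record is None:    # the iterator returns None when it is done
            break
        name = cur_record.sequence_name
        sequence = cur_record.sequence.data
        mydict[name] = sequence

    first_file.close()
    return mydict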

Then you should be able to call it without any problem doing
something like:

file1_dict = makedict("my_file1.nbrf")
file2_dict = makedict("my_file2.nbrf")
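For readers without Bio.NBRF, here is a minimal pure-Python sketch that builds the same name:sequence dictionary. It assumes the usual NBRF/PIR layout (a ">P1;name" header line, one free-text description line, then sequence lines ending in "*"); the function name parse_pir_to_dict is hypothetical, and a later record with a duplicate name silently overwrites an earlier one. Later Biopython releases can also read this format through Bio.SeqIO under the format name "pir", e.g. SeqIO.to_dict(SeqIO.parse(handle, "pir")).

```python
def parse_pir_to_dict(lines):
    """Return {name: sequence} from NBRF/PIR-formatted lines (minimal sketch)."""
    records = {}
    name = None
    chunks = []
    skip_description = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                # Join the previous record's lines and drop the '*' terminator.
                records[name] = "".join(chunks).rstrip("*")
            # Header looks like ">P1;name": type code, ';', then the name.
            name = line[1:].split(";", 1)[1].strip()
            chunks = []
            skip_description = True
        elif skip_description:
            # The line right after the header is a free-text description.
            skip_description = False
        elif name is not None:
            # Sequence lines may contain spaces; keep only the residues.
            chunks.append("".join(line.split()))
    if name is not None:
        records[name] = "".join(chunks).rstrip("*")
    return records
```

Usage would be along the lines of `with open("my_file1.nbrf") as handle: file1_dict = parse_pir_to_dict(handle)`; because the dictionary is created inside the function, each call returns an independent result.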

From the problems you are describing, it sounds like you are doing
something where you reassign mydict because it is used both
inside and outside the function.

One of the most common problems with writing functions (definitely
forgive me if I'm oversimplifying here) is not having a good grasp of
which variables are internal to the function and which are external.
In general you want to focus on remembering that the only outside
information you should be passing the function is the argument
(file1 in this case) and the only information you should get back is
what you return (the dictionary in this case).

But, I digress. Hope this helps.
Brad
From thamelry at binf.ku.dk  Fri Mar 19 04:08:28 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Fri Mar 19 08:31:49 2004
Subject: [BioPython] PDB header parser
In-Reply-To: <20040226001009.GB20365@evostick.agtec.uga.edu>
References: <20040226001009.GB20365@evostick.agtec.uga.edu>
Message-ID: <200403191008.28748.thamelry@binf.ku.dk>

Hi everybody,

Thanks to Kristian Rother (again!), Bio.PDB (the Biopython module that deals 
with macromolecular structure data) now also provides convenient access to a 
PDB file's header information.

The Structure class now has a 'header' attribute which is a dictionary whose
keys are the header fields.

For example

>>> structure.header["structure_method"]
x-ray diffraction
>>> structure.header["resolution"]
2.2
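As an illustration of where values like these come from, here is a minimal sketch that extracts the same two fields from raw PDB text. It is not Bio.PDB's parser: it only assumes the standard EXPDTA record and the "REMARK   2 RESOLUTION." line of the PDB format, and the function name parse_header_subset is hypothetical.

```python
def parse_header_subset(lines):
    """Return {'structure_method': ..., 'resolution': ...} from PDB text lines."""
    header = {"structure_method": None, "resolution": None}
    for line in lines:
        record = line[:6]
        if record == "EXPDTA":
            # Columns 11 onward hold the method, e.g. "X-RAY DIFFRACTION".
            header["structure_method"] = line[10:].strip().lower()
        elif record == "REMARK" and line[6:10].strip() == "2":
            # REMARK 2 carries the resolution: "RESOLUTION.    2.20 ANGSTROMS."
            text = line[10:].strip()
            if text.startswith("RESOLUTION."):
                fields = text[len("RESOLUTION."):].split()
                try:
                    header["resolution"] = float(fields[0])
                except (IndexError, ValueError):
                    pass  # e.g. "NOT APPLICABLE." for NMR entries
    return header
```

Only REMARK 2 is matched, so REMARK 200 lines (which also mention resolution ranges) are ignored.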

The code is in the CVS repository.

Best regards,

---
Thomas Hamelryck
Bioinformatik centret      
Universitetsparken 15     
Bygning 10                 
DK-2100 København Ø
Denmark
http://www.binf.ku.dk/users/thamelry/

From thamelry at binf.ku.dk  Fri Mar 19 04:23:01 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Fri Mar 19 08:31:50 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <20040212235232.GB2841@evostick.agtec.uga.edu>
References: <401AC30A.E639BD2E@ebc.uu.se> <40226C51.F70315B7@ebc.uu.se>
	<20040212235232.GB2841@evostick.agtec.uga.edu>
Message-ID: <200403191023.01597.thamelry@binf.ku.dk>

Hi everybody,

I recently moved to a new position and of course immediately started
to convert my colleagues to (Bio)Python :-). One of the most often asked 
questions with respect to Biopython here is "Does Biopython have the same 
functionalities as Bioperl?". Is there a document somewhere that compares 
BioPerl and BioPython? Would be REALLY useful. 

Another question: is anybody using BioPerl from BioPython? 
If so, how? BioCorba?

And then some suggestions: I think it's time to do something about the 
Biopython documentation, and maybe remove some obsolete, incomplete and/or 
unmaintained code. At the moment it's a bit difficult to get a good overview 
of what is present and useable in Biopython, I think... 

Brad, I vaguely remember that you mentioned something about replacing 
HappyDoc? I'd be happy to help out.... 

Best regards,  

---
Thomas Hamelryck
Bioinformatik centret      
Universitetsparken 15     
Bygning 10                 
DK-2100 København Ø
Denmark
http://www.binf.ku.dk/users/thamelry/

From sbassi at asalup.org  Fri Mar 19 12:06:18 2004
From: sbassi at asalup.org (Sebastian Bassi)
Date: Fri Mar 19 12:12:31 2004
Subject: [BioPython] Parsing genbank problem
Message-ID: <405B288A.5050902@asalup.org>

Hello,

I've been trying to parse a gb file to no avail.
Here is my code (extracted from biopython cookbook)

from Bio import GenBank
from Bio.Seq import MutableSeq
from Bio.Alphabet import IUPAC
from Bio import utils

gb_handle = open("f:\\download\\ors.gbk","r")
feature_parser=GenBank.FeatureParser()
iterator = GenBank.Iterator(gb_handle, feature_parser)
while 1:
     cur_entry=iterator.next()
     if cur_entry is None:
         break
     print "test", cur_entry.id
gb_handle.close()

The ors.gbk file has lots of GenBank entries (I won't post it here
because it is 800 KB long).
Here is what I get:

Traceback (most recent call last):
  File "C:/Program Files/Python22/parseGB.py", line 13, in ?
    cur_entry=iterator.next()
  File "C:\PROGRA~1\Python22\Lib\site-packages\Bio\GenBank\__init__.py", line 183, in next
    return self._parser.parse(File.StringHandle(data))
  File "C:\PROGRA~1\Python22\Lib\site-packages\Bio\GenBank\__init__.py", line 268, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\PROGRA~1\Python22\Lib\site-packages\Bio\GenBank\__init__.py", line 1255, in feed
    self._parser.parseFile(handle)
  File "C:\PROGRA~1\Python22\Lib\site-packages\Martel\Parser.py", line 338, in parseFile
    self.parseString(fileobj.read())
  File "C:\PROGRA~1\Python22\Lib\site-packages\Martel\Parser.py", line 366, in parseString
    self._err_handler.fatalError(result)
  File "C:\PROGRA~1\Python22\Lib\site-packages\_xmlplus\sax\handler.py", line 38, in fatalError
    raise exception
ParserPositionException: error parsing at or beyond character 1496

From chapmanb at uga.edu  Fri Mar 19 12:18:57 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Mar 19 12:30:16 2004
Subject: [BioPython] Parsing genbank problem
In-Reply-To: <405B288A.5050902@asalup.org>
References: <405B288A.5050902@asalup.org>
Message-ID: <20040319171857.GB95219@evostick.agtec.uga.edu>

Hi Sebastian;

> I've been trying to parse a gb file to no avail.
> Here is my code (extracted from biopython cookbook)

Your code looks just fine, so the traceback...

> Traceback (most recent call last):
[...]
> ParserPositionException: error parsing at or beyond character 1496

...indicates that there is a problem with the Martel grammar reading
one of the records in your file. I've done a number of fixes to the
GenBank parser since the last release, so if you could check things
out with the latest CVS that should hopefully fix things.

Alternatively (or if you still have problems), you can find out more
information about where the parser is failing by initializing your
parser with debug_level = 2:

feature_parser=GenBank.FeatureParser(debug_level = 2)

This will cause Martel to spit out lots of information and likely
tell you exactly where things are failing. But the best bet is to
get the latest CVS and use that. I'm hoping to push out a new
release semi-soon to get the code out there, but CVS is the way to
go until then.

Hope this helps!
Brad
From chapmanb at uga.edu  Fri Mar 19 12:37:03 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Mar 19 12:48:21 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <200403191023.01597.thamelry@binf.ku.dk>
References: <401AC30A.E639BD2E@ebc.uu.se> <40226C51.F70315B7@ebc.uu.se>
	<20040212235232.GB2841@evostick.agtec.uga.edu>
	<200403191023.01597.thamelry@binf.ku.dk>
Message-ID: <20040319173703.GC95219@evostick.agtec.uga.edu>

Hi Thomas;

> I recently moved to a new position and of course immediately started
> to convert my colleagues to (Bio)Python :-). One of the most often asked 
> questions with respect to Biopython here is "Does Biopython have the same 
> functionalities as Bioperl?". Is there a document somewhere that compares 
> BioPerl and BioPython? Would be REALLY useful. 

No, not that I know of. Honestly, I am not a big fan of
BioPerl/Biopython comparisons just as I'm not a huge fan of
Perl/Python comparisons -- I'm all for sticking with what you like
and working with it. But definitely if it were useful to people
looking at the projects, I'd be for having that kind of document.

> Another question: is anybody using BioPerl from BioPython? 
> If so, how? BioCorba?

BioCorba is for all intents and purposes dead. The code still works
and all, but it was not really being used and I've stopped doing
development on it (so I can graduate and all :-).

> And then some suggestions: I think it's time to do something about the 
> Biopython documentation, and maybe remove some obsolete, incomplete and/or 
> unmaintained code. At the moment it's a bit difficult to get a good overview 
> of what is present and useable in Biopython, I think... 

Agreed. About the docs -- as I mentioned the other day, I'm planning
to factor out the Tutorial into smaller cookbook-style sections
(there is a directory in CVS, Doc/cookbook, that I've started
populating). This weekend I plan to pull out at least
the "Working with sequences" section, thanks to the feedback I got
from Marc, and to pull out and update the Bio.db registries section.

I'd certainly welcome help on this front -- from yourself and anyone
else. It will go as quickly as it can, but my current plan is to
keep the useful docs and bring them as up to date as possible.

If you are talking to beginners in Python, it might also be nice to
point them to Katja and Catherine's course:

http://www.pasteur.fr/recherche/unites/sis/formation/python/

which is linked from the documentation page. This does have a lot of
very nice code and explanations for getting started.

As far as code goes -- if you have suggestions for modules that are
no longer useful and that we don't think can be fixed/updated easily,
please do suggest a plan of action. We can survey whether
others use these modules and then decide where to go. It's always a
good idea to keep out cruft.

> Brad, I vaguely remember that you mentioned something about replacing 
> HappyDoc? I'd be happy to help out.... 

Yeah, I will also play around with that this weekend. But, I do
think the number one priority on the doc front is extracting the
Tutorial into smaller sections. If you want to help on that, it
would be great. For instance, the PDB module could have its own
documentation section :-).

Thanks for the comments -- I'd be interested to know what others
think about the plans, and also very interested in others picking a
section of the Tutorial to work on :-).

Thanks-for-the-PDB-updates-as-well-ly yr's,
Brad
From idoerg at burnham.org  Fri Mar 19 12:46:34 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri Mar 19 12:54:39 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <200403191023.01597.thamelry@binf.ku.dk>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<40226C51.F70315B7@ebc.uu.se>	<20040212235232.GB2841@evostick.agtec.uga.edu>
	<200403191023.01597.thamelry@binf.ku.dk>
Message-ID: <405B31FA.5000509@burnham.org>

Hi,

Regarding the documentation: how about adopting two models to help keep 
it up-to-date:

1) CVS the Biopython Book. That way, it will be easy for people to
insert fixes, updates, new entries, etc. See the plone book

http://plone.org/documentation

CVS on http://sourceforge.net/projects/plone-docs

2) Online comments from users, like in the Zope or MySQL manuals. That
would be helpful in identifying glaring gaps in the docs.

Thomas, I'm glad to hear you're performing some missionary work as well
...;)

Oh, and thanks for the PDB header addition.

./I

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From anunberg at oriongenomics.com  Fri Mar 19 13:29:51 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Fri Mar 19 14:15:11 2004
Subject: [BioPython] Problem parsing genbank file
Message-ID: 

I just updated from CVS and, when trying to parse a genbank
file that had multiple genbank records in it, I got this error:
Traceback (most recent call last):
  File "/loginhome/anunberg/bin/bac_hits.py", line 239, in ?
    main()
  File "/loginhome/anunberg/bin/bac_hits.py", line 98, in main
    seq_record = iterator.next()# go through each record
  File "/compbio/lib/python/Bio/GenBank/__init__.py", line 130, in next
    return self._parser.parse(File.StringHandle(data))
  File "/compbio/lib/python/Bio/GenBank/__init__.py", line 220, in parse
    self._scanner.feed(handle, self._consumer)
  File "/compbio/lib/python/Bio/GenBank/__init__.py", line 1248, in feed
    self._parser.parseFile(handle)
  File "/compbio/lib/python/Martel/Parser.py", line 328, in parseFile
    self.parseString(fileobj.read())
  File "/compbio/lib/python/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/local/lib/python2.3/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 20805

The parser seems to work if the genbank file only has one record.

I will suggest it again: PLEASE PLEASE PLEASE tag the code in CVS so I can
revert to stable versions easily.
I am now using biopython regularly and I am on a bit of a schedule for some of
this work. Updating code is fine; however, tagging it will save some
headaches.

-- 
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From jeffrey_chang at stanfordalumni.org  Fri Mar 19 14:23:57 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Fri Mar 19 14:29:23 2004
Subject: [BioPython] Problem parsing genbank file
In-Reply-To: 
References: 
Message-ID: 

On Mar 19, 2004, at 1:29 PM, Andrew Nunberg wrote:

> I will suggest it again, PLEASE PLEASE PLEASE tag the code in cvs so I can
> revert to stable versions easily.
> I am now using biopython regularly and I am on a bit of schedule for some of
> this work.  Updating code is fine however tagging it will save some
> headaches..

We have tagged all the releases, back to the beginning.  You should be 
able to revert to any of the following releases:
symbolic names:
         biopython-124: 1.8
         biopython-123: 1.7
         biopython-122: 1.7
         biopython-121: 1.6
         biopython-120: 1.6
         biopython-110: 1.5
         biopython-100a4: 1.4
         biopython-100a3: 1.4
         biopython-100a2: 1.4
         biopython-100a1: 1.3
         biopython-090d02: 1.1
         biopython-090d01: 1.1

If I recall, though, the tag for one of the very early releases got 
lost somewhere along the way...  :(
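In `cvs log` output like the above, the number after each tag is the revision of the particular file that was logged, not the Biopython release number. Assuming the tags follow the pattern visible here (biopython-124 for release 1.24, and so on), this small sketch reads the "symbolic names:" block and maps tags back to release numbers; both helper names are hypothetical. A tagged tree can then be fetched with `cvs update -r biopython-124` (or `cvs checkout -r` with the tag).

```python
import re

# One "tag: revision" line inside the symbolic-names block (indented).
TAG_LINE = re.compile(r"^\s+([^\s:]+):\s+(\S+)\s*$")


def parse_symbolic_names(log_text):
    """Return {tag: revision} from the 'symbolic names:' block of `cvs log`."""
    tags = {}
    in_block = False
    for line in log_text.splitlines():
        if line.strip() == "symbolic names:":
            in_block = True
            continue
        if in_block:
            match = TAG_LINE.match(line)
            if match is None:
                break  # the block ends at the first line that isn't "tag: rev"
            tags[match.group(1)] = match.group(2)
    return tags


def tag_to_release(tag):
    """Assumed convention: 'biopython-124' -> '1.24', 'biopython-090d02' -> '0.90d02'."""
    digits = tag.split("-", 1)[1]
    return digits[0] + "." + digits[1:]
```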

Jeff

From thamelry at binf.ku.dk  Fri Mar 19 15:15:49 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Fri Mar 19 15:27:18 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <405B31FA.5000509@burnham.org>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<405B31FA.5000509@burnham.org>
Message-ID: <200403192115.49577.thamelry@binf.ku.dk>

On Friday 19 March 2004 18:46, Iddo Friedberg wrote:

> 1) CVS the Biopython Book. In that manner, it will be easy for people to
> insert fixes/updates new entries, etc. etc. See the plone book

That sounds like a good idea... Didn't it use to be in the CVS? 
I can't find it anymore...

> Oh, and thanks for the PDB header adition.

Thanks should go to Kristian Rother (he previously also donated a module
to download PDB files and keep a local PDB database up-to-date).

Nice!

-Thomas

From thamelry at binf.ku.dk  Fri Mar 19 15:51:05 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Fri Mar 19 16:24:04 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <20040319173703.GC95219@evostick.agtec.uga.edu>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
Message-ID: <200403192151.05004.thamelry@binf.ku.dk>

On Friday 19 March 2004 18:37, Brad Chapman wrote:

> No, not that I know of. Honestly, I am not a big fan of
> BioPerl/Biopython comparisons just as I'm not a huge fan of
> Perl/Python comparisons 

I agree, but still, it would be handy to have a page that lists BioPerl and 
Biopython features so that people could decide what they want to use for a 
certain purpose. I'm not suggesting a 'BioPython is better than BioPerl' 
page. That is indeed pointless.

> As far as code goes -- if you have suggestions for modules that are
> no longer useful and we don't think can be fixed/updated easily
> please do suggest a plan of action. We can get a survey on whether
> others use these modules and then decide where to go. It's always a
> good idea to keep out cruft.

We could make a list of modules that will be potentially removed, post it to 
the biopython list, and then actually remove them when no-one objects. Is 
anybody using the two HMMs (HMM and MarkovModel) for instance? Or the 
support vector machine (SVM) and NeuralNetwork modules? The xKMeans, 
KNN and KMeans clustering modules also seem to be obsolete in view of Michiel 
de Hoon's clustering module. 

> Yeah, I will also play around with that this weekend. But, I do
> think the number one priority on the doc front is extracting the
> Tutorial into smaller sections. If you want to help on that, it
> would be great. For instance, the PDB module could have it's own
> documentation section :-).

That's coming up!

I am very much in favor of automatically generated documentation. Each module 
should at least have 5-10 lines or so in api/Bio/index.html that describe 
what the module actually does! 
What are you thinking of using in the future? I must admit that the HappyDoc 
requirements for generating good and readable descriptions are a bit of a 
mystery to me.... Bio.PDB looks very ugly (my fault, probably). I'd sure like 
to improve that (especially since Bio.PDB is actually pretty well commented 
code).

In any case, thanks a lot for all the work you put into Biopython!

-Thomas

From mdehoon at ims.u-tokyo.ac.jp  Sat Mar 20 01:29:55 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sat Mar 20 01:35:31 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <200403192151.05004.thamelry@binf.ku.dk>
References: <401AC30A.E639BD2E@ebc.uu.se>	<200403191023.01597.thamelry@binf.ku.dk>	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
Message-ID: <405BE4E3.6000103@ims.u-tokyo.ac.jp>

Thomas Hamelryck wrote:
> We could make a list of modules that will be potentially removed, post it to 
> the biopython list, and then actually remove them when no-one objects. Is 
> anybody using the two HMMs (HMM and MarkovModel) for instance? Or the 
> support vector machine (SVM) and NeuralNetwork modules? The xKMeans, 
> KNN and KMeans clustering modules also seem to be obsolete in view of Michiel 
> de Hoons clustering module. 

The xKMeans and KMeans can be considered obsolete, as they are included in 
Bio.Cluster. The KNN and other modules under Bio/Tools/Classification are 
currently not obsolete, as they contain supervised learning methods, which are 
not included in Bio.Cluster.
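The supervised/unsupervised distinction can be sketched in a few lines of plain Python (3.8+ for `math.dist`). Neither function is the Bio.Cluster or Bio/Tools/Classification API; they only illustrate that k-means groups unlabelled points around moving centers, while k-nearest-neighbours needs labelled training data to classify a new point.

```python
import math
from collections import Counter


def kmeans(points, centers, iterations=10):
    """Unsupervised: move each center to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep a center in place if no points were assigned to it.
        centers = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


def knn_classify(training, query, k=3):
    """Supervised: majority label among the k labelled points nearest to query."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

kmeans only ever sees coordinates; knn_classify takes (point, label) pairs, which is exactly the extra information a supervised method requires.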

I am not sure what the purpose is of the GA (Genetic Algorithm Neural Network) 
module and the NeuralNetwork module. Are they the same? Is their usage described 
somewhere?

After cleaning up the modules, it may be a good idea to set up some kind of 
unified way to deal with gene expression data in Biopython.

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From thamelry at binf.ku.dk  Sat Mar 20 02:54:09 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Sat Mar 20 21:53:17 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <405BE4E3.6000103@ims.u-tokyo.ac.jp>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403192151.05004.thamelry@binf.ku.dk>
	<405BE4E3.6000103@ims.u-tokyo.ac.jp>
Message-ID: <200403200854.09983.thamelry@binf.ku.dk>

On Saturday 20 March 2004 07:29, Michiel Jan Laurens de Hoon wrote:

> I am not sure what the purpose is of the GA (Genetic Algorithm Neural
> Network) module and the NeuralNetwork module. Are they the same? Is their
> usage described somewhere?

They are not the same. GA is a genetic algorithm framework and NeuralNetwork 
is a neural network (which seems to have some special features to deal with 
genes as input). They are both potentially interesting (providing that they 
actually work) but it's a complete mystery how they are to be used. Does 
anybody know who implemented these modules?
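To give a feel for what a genetic algorithm framework automates, here is a minimal, self-contained GA in plain Python. It is not Bio.GA's API: the OneMax fitness function (count the 1 bits), tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter values are illustrative choices.

```python
import random


def fitness(genome):
    """OneMax: the more 1 bits, the fitter the genome."""
    return sum(genome)


def evolve(pop_size=20, length=16, generations=30, mutation_rate=0.05, seed=0):
    """Run a small GA; return (best fitness at start, best genome at end)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(fitness(g) for g in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fittest of three random genomes.
            mum = max(rng.sample(population, 3), key=fitness)
            dad = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, length)
            child = mum[:cut] + dad[cut:]  # one-point crossover
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    return initial_best, max(population, key=fitness)
```

Because the two best genomes are always carried over, the best fitness can never decrease from one generation to the next; a framework typically packages these pieces (selection, crossover, mutation, stopping criteria) as interchangeable components.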

-Thomas

From chapmanb at uga.edu  Sat Mar 20 11:32:02 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sat Mar 20 22:51:16 2004
Subject: [BioPython] Problem parsing genbank file
In-Reply-To: 
References: 
Message-ID: <20040320163202.GF95219@evostick.agtec.uga.edu>

Hi Andy;

> I just updated from cvs and got this error when trying to parse a genbank
> file that had mutliple genbank files in it, I got this error :
> Traceback (most recent call last):
[...]
> Martel.Parser.ParserPositionException: error parsing at or beyond character 20805

Thanks for sending me the file separately. It actually looks like
one of the records in the file, AF124045, was somehow corrupted. The
region where the parser fails looks like:

     misc_feature    <38880..>39000
                     /note="putative breakpoint of recombination in orthologous
                     maize region, 
                     38875 bp is the end of homology, >38875-50877repeat_region   join(<38904..38924,38960..>39022)
                     /note="CT-rich stretches"
                     /evidence=not_experimental

where in the original file (from NCBI), it looks like:

     misc_feature    <38880..>39000
                     /note="putative breakpoint of recombination in orthologous
                     maize region,
                     38875 bp is the end of homology, >38875-50877< region
                     missing in maize; Region: Breakpoint"
                     /evidence=not_experimental
     repeat_region   join(<38904..38924,38960..>39022)
                     /note="CT-rich stretches"
                     /evidence=not_experimental

So somehow it looks like the text from "< region missing in" to the
next feature key (repeat_region) was deleted.

I've not seen something like this before, but the best solution
seems to be to re-download this record and try parsing it all again.
All of the other records in your file seem to parse fine.
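When one record in a multi-record file is bad, parsing each record on its own makes it easy to find the culprit. The sketch below is hypothetical helper code, not part of Bio.GenBank: it splits the text on the "//" lines that end each GenBank record, and `parse` stands for whatever per-record parsing function you use (for example, one that wraps your record parser around a string handle).

```python
def locus_name(record):
    """Return the name from a record's LOCUS line, or '?' if there is none."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else "?"
    return "?"


def split_records(text):
    """Yield (name, record_text) for each record terminated by a '//' line."""
    current = []
    for line in text.splitlines(True):
        current.append(line)
        if line.rstrip() == "//":
            record = "".join(current)
            yield locus_name(record), record
            current = []
    # Any trailing text without a closing '//' is ignored in this sketch.


def find_bad_records(text, parse):
    """Run parse() on each record separately and collect the ones that fail."""
    failures = []
    for name, record in split_records(text):
        try:
            parse(record)
        except Exception as exc:  # report every failure, keep going
            failures.append((name, exc))
    return failures
```

The returned (name, exception) pairs point straight at the records worth re-downloading.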

Hope this helps.
Brad
From chapmanb at uga.edu  Sun Mar 21 12:46:05 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Sun Mar 21 12:57:31 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <200403192151.05004.thamelry@binf.ku.dk>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
Message-ID: <20040321174605.GA18818@evostick.agtec.uga.edu>

Hey all;
Great discussion on things. I'll try to touch on all the points in
one e-mail; apologies for the length.

[Automated documentation generation]
Thomas:
> What are you thinking of using in the future? I must admit that the HappyDoc 
> requirements for generating good and readable descriptions are a bit of a 
> mystery to me.... Bio.PDB looks very ugly (my fault, probably). I'd sure like 
> to improve that (especially since Bio.PDB is actually pretty well commented 
> code).

Yes, I've not been a fan of HappyDoc for a while. I was pointed to,
and really like, epydoc. Please take a look at:

http://biopython.org/docs/api/private/trees.html

and let me know what you think. I am a big fan of the new output, and we
can just pull the text documentation out of modules, classes and
functions without having to try and format it up in some pretty way.
I made a number of small modifications to the docs to get them to 
look nicer under this system.

I'd like to stick with epydoc unless people have objections. I added
some documentation to the end of the contributing guidelines to
describe the simple things you can do to make your modules, classes
and functions be maximally useful with epydoc:

http://biopython.org/docs/developer/contrib.html
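The contributing guidelines linked above are the real reference; as a generic illustration only (following the common PEP 257 docstring convention, not necessarily what the guidelines say), documentation generators like epydoc work from ordinary docstrings, and a one-line summary, a blank line, then details tends to read well in generated pages. The function count_residues is hypothetical.

```python
"""Utilities for counting residues in sequences.

The first line is a one-sentence summary; the rest gives details.
"""

import inspect


def count_residues(sequence, residue):
    """Count occurrences of one residue in a sequence string.

    Both arguments are plain strings; the match is case-insensitive.
    """
    return sequence.upper().count(residue.upper())


# Tools typically show the first docstring line as the summary.
print(inspect.getdoc(count_residues).splitlines()[0])
```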

[Non-automated documentation (someone has to write it style)]

Iddo:
> Regarding the documentation: how about adopting two models to help keep 
> it up-to-date:
> 
> 1) CVS the Biopython Book. In that manner, it will be easy for people to 
> insert fixes/updates new entries, etc. etc. See the plone book
> 
> http://plone.org/documentation

The documentation is in CVS -- Doc/Tutorial.tex, and Doc/cookbook
for all the new cookbook stuff (one directory per example there).

As far as getting a framework like Plone in place, I honestly am not
sure I am really for that. I do think it is a good idea, but our
attempts at the Wiki in the past have really soured me on "fancier"
ways to generate documentation.

Really, what I'd like to see is contributions from people in the new
cookbook style. This requires no need to learn any type of system --
I'm happy to accept docs in plain text, html, pdf -- anything that
will be viewable on the web. So people can write documentation
however they feel comfortable.

> 2) Online comments from users, like in the Zope or MySQL manuals. that 
> would be helpful in identifying glaring gaps in the docs.

This is a good idea, but also along the lines of my biases against
trying to be fancier than we need to be. Honestly, the user bases of
Zope or MySQL dwarf those of Biopython (although we are catching up
fast :-) and I don't want to put the cart before the horse (or
however that cliche goes).

But those are just my opinions -- I can always be convinced
otherwise :-).

[Removal/Deprecation of modules]
Thomas:
> We could make a list of modules that will be potentially removed, post it to 
> the biopython list, and then actually remove them when no-one objects. Is 
> anybody using the two HMMs (HMM and MarkovModel) for instance? Or the 
> support vector machine (SVM) and NeuralNetwork modules? 

Is potential non-use (or trying to assess non-use) really a good 
model to remove modules? If they work and are decently coded then I
think they have a potential use -- I definitely do know that a lot of the
different supervised learning methods are useful to people doing
clustering of literature (which is what I'm pretty positive Jeff
worked on for his thesis).

If things don't work, or are duplicated, then I'm in favor of trying
to get rid of that, but working code seems useful to me.

Thomas:
> The xKMeans, 
> KNN and KMeans clustering modules also seem to be obsolete in view of Michiel 
> de Hoons clustering module. 

Michiel:
> The xKMeans and KMeans can be considered obsolete, as they are included in 
> Bio.Cluster. The KNN and other modules under Bio/Tools/Classification are 
> currently not obsolete, as they contain supervised learning methods, which 
> are not included in Bio.Cluster.

If things are duplicated then the right thing to do is to remove the
duplication. I'd like to consider two things, though:

1. I'd like Jeff to chime in since these are his modules (I think).
I don't have enough knowledge about clustering to know if
Bio.Cluster also does the things that he needed his code to do.

2. We want to make sure to be careful about back-compatibility. If
we decide to remove things, I'd like to first have them raise
DeprecationWarnings for a couple of releases so that people have
time to change their code -- and also have some quick docs about how
to change from old to new. Breaking code is bad, and I want to make
it as easy as possible for people to keep up with changes.
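A minimal sketch of that deprecation step using the standard `warnings` module; `old_name` and `new_name` are hypothetical stand-ins for a function being phased out and its replacement.

```python
import warnings


def new_name(values):
    """Replacement function that callers should move to."""
    return sorted(values)


def old_name(values):
    """Deprecated alias kept for a couple of releases."""
    warnings.warn(
        "old_name is deprecated; use new_name instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller's line, not this one
    )
    return new_name(values)
```

Old code keeps working while being told where to go. Whether the warning is displayed depends on the active warning filters (recent Python versions hide DeprecationWarning unless it is triggered from `__main__`); running `python -W default` makes it visible. For a whole module, the same `warnings.warn(...)` call can sit at module level so it fires on import.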

Thomas:
> GA is a genetic algorithm framework and NeuralNetwork 
> is a neural network (which seems to have some special features to deal with 
> genes as input). They are both potentially interesting (providing that they 
> actually work) but it's a complete mystery how they are to be used. Does 
> anybody know who implemented these modules?

Yup, I did. I don't think they are perfect by any means, but they
should still work (all of the tests still pass, at least) and be
useful. I honestly don't use them much myself anymore since I
finished the project I used them for. But, they do need
documentation.

Michiel:
> After cleaning up the modules, it may be a good idea to set up some kind of 
> unified way to deal with gene expression data in Biopython.

Definitely -- I would welcome this. I nominate you to be in charge
:-).

[Miscellaneous bits]

Me:
> > No, not that I know of. Honestly, I am not a big fan of
> > BioPerl/Biopython comparisons just as I'm not a huge fan of
> > Perl/Python comparisons 
Thomas:
> I agree, but still, it would be handy to have a page that lists BioPerl and 
> Biopython features so that people could decide what they want to use for a 
> certain purpose. 

I agree. I'd be happy to accept a document like this :-).

Thanks again for everyone's comments!
Brad
From jeffrey_chang at stanfordalumni.org  Sun Mar 21 16:30:09 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Sun Mar 21 16:35:34 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <20040321174605.GA18818@evostick.agtec.uga.edu>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
	<20040321174605.GA18818@evostick.agtec.uga.edu>
Message-ID: 

On Mar 21, 2004, at 12:46 PM, Brad Chapman wrote:

> Yes, I've not been a fan of HappyDoc for a while. I was pointed to,
> and really like, epydoc. Please take a look at:
>
> http://biopython.org/docs/api/private/trees.html

This looks very nice to me.  Is there any way to ask it to hide private 
methods or variables, i.e. those that begin with "_"?  Although knowing 
what those are is occasionally useful, exposing that extra information 
may be confusing for people reading the docs and trying to figure out 
how to use the module.

>> 2) Online comments from users, like in the Zope or MySQL manuals. that
>> would be helpful in identifying glaring gaps in the docs.
>
> This is a good idea, but also along the lines of my biases against
> trying to be fancier than we need to be. Honestly, the user bases of
> Zope or MySQL dwarf those of Biopython (although we are catching up
> fast :-) and I don't want to put the cart before the horse (or
> however that cliche goes).

Agreed.  People did not take to our wiki, and bioperl doesn't use 
theirs much either.

> [Removal/Deprecation of modules]
> Thomas:
>> We could make a list of modules that will be potentially removed, post it to
>> the biopython list, and then actually remove them when no-one objects. Is
>> anybody using the two HMMs (HMM and MarkovModel) for instance? Or the
>> support vector machine (SVM) and NeuralNetwork modules?
>
> Is potential non-use (or trying to assess non-use) really a good
> model to remove modules? If they work and are decently coded then I
> think they have a potential use -- I definitely do know that a lot of the
> different supervised learning methods are useful to people doing
> clustering of literature (which is what I'm pretty positive Jeff
> worked on for his thesis).
>
> If things don't work, or are duplicated, then I'm in favor of trying
> to get rid of that, but working code seems useful to me.
>
> Thomas:
>> The xKMeans, KNN and KMeans clustering modules also seem to be obsolete in
>> view of Michiel de Hoons clustering module.
>
> Michiel:
>> The xKMeans and KMeans can be considered obsolete, as they are included in
>> Bio.Cluster. The KNN and other modules under Bio/Tools/Classification are
>> currently not obsolete, as they contain supervised learning methods, which
>> are not included in Bio.Cluster.
>
> If things are duplicated then the right thing to do is to remove the
> duplication. I'd like to consider two things, though:
>
> 1. I'd like Jeff to chime in since these are his modules (I think).
> I don't have enough knowledge about clustering to know if
> Bio.Cluster also does the things that he needed his code to do.

kMeans is superseded by Bio.Cluster, and can be deprecated.  Thomas 
wrote xkMeans, which is a visualizer for kMeans, and could be rewritten 
to use Bio.Cluster instead.

MarkovModel is redundant with HMM.  Probably only one of them is 
necessary.

SVM is superseded by libsvm.  It should be deprecated.

kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, 
but need more documentation.  Also, another idea is that they could be 
donated to the pyml project.  Currently, no code in Biopython depends 
on them.  However, they might be useful for a microarray package, in 
which case donating them would introduce another dependency.

Jeff

From chapmanb at uga.edu  Mon Mar 22 18:06:54 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Mon Mar 22 18:18:13 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: 
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
	<20040321174605.GA18818@evostick.agtec.uga.edu>

Message-ID: <20040322230654.GF22666@evostick.agtec.uga.edu>

Hey Jeff and everyone;

Me:
> >Yes, I've not been a fan of HappyDoc for a while. I was pointed to,
> >and really like, epydoc. Please take a look at:
> >
> >http://biopython.org/docs/api/private/trees.html
> 
> This looks very nice to me.  Is there any way to ask it to hide private 
> methods or variables, i.e. those that begin with "_"?  Although knowing 
> what those are is occasionally useful, exposing that extra information 
> may be confusing for people reading the docs and trying to figure out 
> how to use the module.

Good points. The one I have linked to is actually the version that
includes private variables. If you substitute public for private in
the URL above (or just click "hide private" at the top), the private
functions are hidden.

The problem I've had so far is that epydoc hides some public modules
by labelling them as private. I hadn't figured out exactly how it
decides what is public and what is private in terms of modules (for
classes and functions it seems to use the leading _Underscore, which
I'm happy with).

I played around with it a bit since then and it looks like it was
using the __all__ variable to determine what is public and private.
To be honest, I'd like to remove the use of __all__ completely
unless people object. Unless I'm mistaken, it controls what happens
when people do from Bio import * (or from Bio.Whatever import *).
Doing the import * is pretty discouraged now, and for maintenance it
is fairly annoying to have variables you have to make sure are
updated.

Would anyone object to stop using __all__? Any reasons to keep it? I
may be missing the point of it completely.
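For reference, `__all__` only matters for star imports. The sketch below (standard library only; the module name `demo_mod` and its functions are made up for illustration) builds a throwaway in-memory module and lists which names `from demo_mod import *` binds with and without `__all__`. Without it, Python falls back to every top-level name that does not start with an underscore, which lines up with the _Underscore convention epydoc already follows for classes and functions.

```python
import sys
import types


def star_import_names(source, module_name="demo_mod"):
    """Return the names that `from <module_name> import *` binds."""
    # Build a throwaway module from source text and register it so the
    # import statement below can find it.
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    try:
        namespace = {}
        exec("from %s import *" % module_name, namespace)
    finally:
        del sys.modules[module_name]
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return sorted(namespace)


with_all = '''
__all__ = ["parse"]
def parse(): pass
def write(): pass
def _helper(): pass
'''

without_all = '''
def parse(): pass
def write(): pass
def _helper(): pass
'''

print(star_import_names(with_all))     # ['parse']
print(star_import_names(without_all))  # ['parse', 'write']
```

Explicit imports such as `from demo_mod import write` work regardless of `__all__`, so dropping it would mainly change star-import behavior.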

> kMeans is superseded by Bio.Cluster, and can be deprecated.  Thomas 
> wrote xkMeans, which is a visualizer for kMeans, and could be rewritten 
> to use Bio.Cluster instead.

Okay. I guess this would involve a couple of steps:

1. Starting to raise a Deprecation Warning for the kMeans module.
2. Trying to write some kind of short document on how to switch from
using kMeans to using Bio.Cluster.kcluster. BioPerl has a document
called DEPRECATED with this kind of info -- that seems like a
reasonable step to follow. Jeff and Michiel, would it be possible to
write something up quickly?
3. Thomas needs to decide if he wants to rewrite xkMeans or
deprecate it as well.
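Step 1 usually comes down to one call to the standard `warnings` module. A minimal sketch; `warn_module_deprecated` is a hypothetical helper (in practice the `warnings.warn` call would sit at the top of the deprecated module so it fires when the module is imported), and the message text is only an example.

```python
import warnings


def warn_module_deprecated(old_name, replacement):
    """Emit a DeprecationWarning pointing users from old_name to replacement."""
    warnings.warn(
        "%s is deprecated; please use %s instead." % (old_name, replacement),
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )


# DeprecationWarning is often hidden by Python's default warning filters,
# so record it explicitly here to show what users would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_module_deprecated("Bio.kMeans", "Bio.Cluster.kcluster")

print(caught[0].category.__name__, caught[0].message)
```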

Also, Thomas did mention the potential usefulness of having both
pure Python and Python/C implementations, in case someone wanted to
use the code for learning purposes. I'm not sure how much this
weighs on people's minds versus maintaining a slimmer code base.
Duplicate versions do seem bad to me, both because they cause
confusion and because we have limited developer time to maintain and
document things. Anyway, just a point to bring up.
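On the pure Python versus C question, one common pattern keeps a single public name and prefers the compiled code when it is available. This is only a sketch: `_fastdistance` is a hypothetical extension module that does not exist, so the pure-Python branch is what runs here.

```python
import math

try:
    # Hypothetical compiled extension; importing it fails here, so the
    # except branch supplies the readable pure-Python version instead.
    from _fastdistance import euclidean
    IMPLEMENTATION = "C"
except ImportError:
    def euclidean(a, b):
        """Pure-Python Euclidean distance between two equal-length vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    IMPLEMENTATION = "pure Python"

print(IMPLEMENTATION, euclidean([0.0, 0.0], [3.0, 4.0]))  # pure Python 5.0
```

Both branches still have to be maintained and tested, which is exactly the cost weighed against the teaching value above.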

> MarkovModel is redundant with HMM.  Probably only one of them is 
> necessary.

Okay, I wrote HMM a long time ago and really haven't used it much
since then. I think you wrote MarkovModel. Both have tests and
things. MarkovModel has the serious advantage of having a C module
underlying it, which I think makes it the best candidate for
keeping.

I'd be very happy if we could get a volunteer to look at these and
decide if one has more functionality than the other, and then move
forward on this. Anyone excited about volunteering? If I can't get
someone, I can try to look at this myself (but not real soon).

> SVM is superseded by libsvm.  It should be deprecated.
> 
> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, 
> but need more documentation.  Also, another idea is that they could be 
> donated to the pyml project.  Currently, no code in Biopython depends 
> on them.  However, they might be useful for a microarray package, in 
> which case donating them would introduce another dependency.

Ah, I didn't know about PyML. It does seem like it would be useful
to try and coordinate with their project -- do you happen to know the
author (Stanford connections and all)? Other candidates for donation
are the recently discussed GA and Neural Network packages.

Lots of thoughts. For the next release (which I'd like to try and do
soon-like) I think we should work on the kMeans code as a priority
and go from there.

Brad
From jeffrey_chang at stanfordalumni.org  Mon Mar 22 21:56:15 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Mon Mar 22 22:01:41 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: <20040322230654.GF22666@evostick.agtec.uga.edu>
References: <401AC30A.E639BD2E@ebc.uu.se>
	<200403191023.01597.thamelry@binf.ku.dk>
	<20040319173703.GC95219@evostick.agtec.uga.edu>
	<200403192151.05004.thamelry@binf.ku.dk>
	<20040321174605.GA18818@evostick.agtec.uga.edu>

	<20040322230654.GF22666@evostick.agtec.uga.edu>
Message-ID: 

On Mar 22, 2004, at 6:06 PM, Brad Chapman wrote:

[Jeff]
>> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful,
>> but need more documentation.  Also, another idea is that they could be
>> donated to the pyml project.  Currently, no code in Biopython depends
>> on them.  However, they might be useful for a microarray package, in
>> which case donating them would introduce another dependency.
>
> Ah, I didn't know about PyML. It does seem like it would be useful
> to try and coordinate with their project -- do you happen to know the
> author (Stanford connections and all)? Other candidates for donation
> are the recently discussed GA and Neural Network packages.

Yes, I've talked to Asa about merging this machine learning code into 
pyml, and he seemed open to the idea.  However, it looked like it would 
be a bit of work to port things over to the pyml style of doing things.
OTOH, it's still a bit of work getting the code documented up to the
point where it is generally useful...

Jeff

From mdehoon at ims.u-tokyo.ac.jp  Mon Mar 22 22:51:19 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Mar 22 22:57:12 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: 
References: <401AC30A.E639BD2E@ebc.uu.se>	<200403191023.01597.thamelry@binf.ku.dk>	<20040319173703.GC95219@evostick.agtec.uga.edu>	<200403192151.05004.thamelry@binf.ku.dk>	<20040321174605.GA18818@evostick.agtec.uga.edu>

Message-ID: <405FB437.5020300@ims.u-tokyo.ac.jp>

Thomas:
> The xKMeans, KNN and KMeans clustering modules also seem to be obsolete in
> view of Michiel de Hoons clustering module.
> 
Michiel:
> The xKMeans and KMeans can be considered obsolete, as they are included in
> Bio.Cluster. The KNN and other modules under Bio/Tools/Classification are
> currently not obsolete, as they contain supervised learning methods, which
> are not included in Bio.Cluster.
Jeffrey Chang wrote:
> kMeans is superseded by Bio.Cluster, and can be deprecated.  Thomas wrote
> xkMeans, which is a visualizer for kMeans, and could be rewritten to use
> Bio.Cluster instead.
> 
Jeffrey Chang wrote:
> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, but
> need more documentation.  Also, another idea is that they could be donated to
> the pyml project.  Currently, no code in Biopython depends on them.  However,
> they might be useful for a microarray package, in which case donating them
> would introduce another dependency.
Brad:
> Okay. I guess this would involve a couple of steps:
>
> 1. Starting to raise a Deprecation Warning for the kMeans module.
> 2. Trying to write some kind of short document on how to switch from using
> kMeans to using Bio.Cluster.kcluster. BioPerl has a document called
> DEPRECATED with this kind of info -- that seems like a reasonable step to
> follow. Jeff and Michiel, would it be possible to write something up quickly?
> 3. Thomas needs to decide if he wants to rewrite xkMeans or deprecate it as
> well.

Michiel again:
1. OK.
2. OK I'll work on that.
3. If I understand correctly, the xkMeans module provides a visualization of the
progress of the k-means clustering algorithm by showing the cluster sizes. If
so, it is not clear how to switch it over to the kcluster in Bio.Cluster. One
of the key points of Bio.Cluster's kcluster is that it automatically repeats
the k-means algorithm starting from different initial (random) clusterings. I
assume the kMeans module performs a single run of the k-means algorithm, for
which the visualization in xkMeans makes sense. For repeated k-means runs,
such a visualization may not be as useful.
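To make the single-run versus repeated-run distinction concrete, here is a small pure-Python sketch. It is not Bio.Cluster's C implementation and not the kMeans module's code. `kmeans_single_run` records the cluster sizes after each iteration, the sort of progress a visualizer like xkMeans could draw, while `kmeans_repeated` restarts from `npass` random initial clusterings and keeps only the run with the smallest within-cluster sum of squared distances, discarding each run's history. The function names are illustrative assumptions; `npass` loosely mirrors the argument of the same name in Bio.Cluster.kcluster.

```python
import random


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroids(data, labels, k, rng):
    """Mean of each cluster; an empty cluster is re-seeded at a random point."""
    centroids = []
    for c in range(k):
        members = [p for p, lab in zip(data, labels) if lab == c]
        if members:
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centroids.append(list(rng.choice(data)))
    return centroids


def kmeans_single_run(data, k, rng, max_iter=100):
    """One k-means run from a random initial assignment.

    Returns (labels, error, size_history); size_history holds the cluster
    sizes after every iteration.
    """
    labels = [rng.randrange(k) for _ in data]
    size_history = []
    for _ in range(max_iter):
        centroids = _centroids(data, labels, k, rng)
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in data]
        size_history.append([new_labels.count(c) for c in range(k)])
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    centroids = _centroids(data, labels, k, rng)
    error = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return labels, error, size_history


def kmeans_repeated(data, k, npass=10, seed=0):
    """Repeat k-means from npass random starts; keep the lowest-error run."""
    rng = random.Random(seed)
    best_labels, best_error = None, None
    for _ in range(npass):
        labels, error, _history = kmeans_single_run(data, k, rng)
        if best_error is None or error < best_error:
            best_labels, best_error = labels, error
    return best_labels, best_error


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, error = kmeans_repeated(points, k=2, npass=10)
print(labels, round(error, 3))
```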

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From mdehoon at ims.u-tokyo.ac.jp  Mon Mar 22 22:59:20 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Mar 22 23:04:54 2004
Subject: [BioPython] Questions & suggestions
In-Reply-To: 
References: <401AC30A.E639BD2E@ebc.uu.se>	<200403191023.01597.thamelry@binf.ku.dk>	<20040319173703.GC95219@evostick.agtec.uga.edu>	<200403192151.05004.thamelry@binf.ku.dk>	<20040321174605.GA18818@evostick.agtec.uga.edu>

Message-ID: <405FB618.1070901@ims.u-tokyo.ac.jp>

Jeffrey Chang wrote:
> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, but
> need more documentation.  Also, another idea is that they could be donated to
> the pyml project.  Currently, no code in Biopython depends on them.  However,
> they might be useful for a microarray package, in which case donating them
> would introduce another dependency.

Biopython and pyml are likely to have different goals for these routines. In 
particular, a routine such as kNN is quite useful for microarray data analysis, 
and I expect that the routine will get updated over time to be better suited for 
biological (microarray or other) data analysis. So I would suggest keeping these 
routines in Biopython and continuing to work on them. We can still donate these 
routines to pyml as well, but I would expect that, e.g., the Biopython kNN and 
the pyml kNN will diverge over time to best suit each package's requirements.

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From letondal at pasteur.fr  Tue Mar 23 01:49:37 2004
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue Mar 23 01:55:00 2004
Subject: [BioPython] Questions & suggestions 
In-Reply-To: Your message of "Mon, 22 Mar 2004 18:06:54 EST."
	<20040322230654.GF22666@evostick.agtec.uga.edu> 
Message-ID: <200403230649.i2N6nbUe239936@electre.pasteur.fr>

Hi,

> > >http://biopython.org/docs/api/private/trees.html

This document is absolutely useful!

Something that could also be useful would be to have an example in the
documentation of each module (available through pydoc or a Web page
such as the one at this URL). Something like the Synopsis in bioperl
modules, where the rule is that you can cut and paste the example and it
is supposed to work. Maybe also include full examples that explain the
interactions between several classes. Having recently taught biopython
to biologists, I observed that this was the most difficult part: which
class is playing which role -- that's quite complex.
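Python's standard `doctest` module can enforce the "cut and paste and it works" rule for docstring examples. A minimal sketch; `gc_content` is a hypothetical function used only to illustrate the pattern, not a Biopython API.

```python
import doctest


def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence.

    Synopsis (meant to be cut and pasted into an interpreter):

    >>> gc_content("ATGC")
    0.5
    >>> gc_content("gggc")
    1.0
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


if __name__ == "__main__":
    # Re-runs every docstring example; silent when they all still pass.
    doctest.testmod()
```

Running such checks as part of the test suite keeps the synopsis examples from silently going stale.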

--
Catherine Letondal -- Pasteur Institute Computing Center
From anunberg at oriongenomics.com  Tue Mar 23 12:51:40 2004
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Tue Mar 23 13:25:25 2004
Subject: [BioPython] GFF parser?
Message-ID: 

Hi,
I was looking through Biopython for a GFF library that parses GFF files
and creates SeqFeature objects.

I did find a GFF module, but I wasn't sure what it was doing.
Andy
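For orientation, GFF is a tab-separated format with nine columns per feature line (seqid, source, type, start, end, score, strand, phase, attributes). A minimal standard-library sketch of reading one line, assuming GFF3-style `key=value` attributes (GFF2 uses a different attribute syntax) and ignoring GFF3 percent-escaping. It returns a plain dict and is not Biopython's GFF module; turning the result into a Biopython SeqFeature would be a further step not shown here, and comment lines starting with "#" should be skipped by the caller.

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into a dict of its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqid, source, ftype, start, end, score, strand, phase, attributes = fields
    attrs = {}
    for item in attributes.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition("=")
            attrs[key] = value
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attrs,
    }


feature = parse_gff_line("chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=foo\n")
print(feature["type"], feature["start"], feature["end"], feature["attributes"])
```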
-- 
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From aaron at ocelot-atroxen.dyndns.org  Thu Mar 25 01:14:56 2004
From: aaron at ocelot-atroxen.dyndns.org (Aaron Zschau)
Date: Thu Mar 25 01:20:17 2004
Subject: [BioPython] parsing blast results for use in clustal
Message-ID: 

I'm new to biopython and python in general. I am trying to take the 
results from a blast search to feed into a clustal multiple alignment.  
I followed the cookbook tutorials and can get results from blast, but 
parsing them into a file that clustal can read is giving me some trouble.  
(My current code prints all results under the e_value threshold with 
index numbers, and then takes user input to select results and print 
them into a file.) From what I can tell, some of the title records in 
my blast results have newline characters in them, and these cause 
clustal to seg fault when it reads the resulting file.

Is there an easier way to send selected blast results to a file that 
clustal can easily read?

thanks in advance,

Aaron Zschau

from Bio.Blast import NCBIWWW
b_parser = NCBIWWW.BlastParser()
b_record = b_parser.parse(blast_results)
index = 0
E_VALUE_THRESH = 0.01
for alignment in b_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect < E_VALUE_THRESH:
            print "[" + str(index) + "] " + alignment.title + "    " + hsp.match[0:20] + '...'
    index = index + 1
output_file = open('clustal-in', 'w')
while 1:
    input = raw_input("\nEnter the index of the next sequence to align or 'a' to align\n")
    if input=='a':
        break
    else:
        output_file.write(b_record.alignments[int(input)].title[0:5] + "    "
                          + b_record.alignments[int(input)].hsps[0].match[0:alignment.length]
                          + "\n")
output_file.close()
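Since the newlines embedded in the titles appear to be what trips up clustal, one defensive step, whatever the source of the sequences, is to normalize the header when writing each record. A minimal sketch, assuming plain strings for the title and sequence (in the loop above these would come from the selected alignment); `write_fasta` is a hypothetical helper and the 60-column wrap is just a common convention.

```python
import io


def write_fasta(handle, title, sequence, width=60):
    """Write one FASTA record with a single-line header and wrapped sequence."""
    # split()/join collapses embedded newlines and tabs into single spaces
    header = " ".join(title.split()).lstrip(">")
    handle.write(">" + header + "\n")
    for start in range(0, len(sequence), width):
        handle.write(sequence[start:start + width] + "\n")


out = io.StringIO()
write_fasta(out, "example_id some\nmulti-line title", "M" * 70)
print(out.getvalue())
```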

From pal at cbu.uib.no  Thu Mar 25 06:06:29 2004
From: pal at cbu.uib.no (Paal Puntervoll)
Date: Thu Mar 25 06:11:52 2004
Subject: [BioPython] PHI-BLAST support?
Message-ID: <20040325110629.GB6048@svartfuru.ii.uib.no>

Hi,

I'm wondering whether BioPython supports doing PHI-BLAST searches (locally), 
and whether parsing of PHI-BLAST output information such as pattern positions
is supported (see below)?

Excerpt from PHI-BLAST output
-------
Query:  541  SEGHGVSLGSSLASPDLKMGNLQNSPVNMNPPPLSKMGSLDSKDCFGLYGEPSEGTTGQA 600
pattern 557                  ****
             SEGHGVSLGSSLASPDLKMGNLQNSPVNMNPPPLSKMGSLDSKDCFGLYGEPSEGTTGQA
Sbjct:  541  SEGHGVSLGSSLASPDLKMGNLQNSPVNMNPPPLSKMGSLDSKDCFGLYGEPSEGTTGQA 600
-------

btw, here's the command I issue to run PHI-BLAST from command-line:

    blastpgp -i [infile] -k [patternfile] -p patseedp
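For scripting, the same command can be driven from Python's standard `subprocess` module. A sketch only, not a Biopython API: the `-i`, `-k` and `-p patseedp` options come from the command above, the file names are placeholders, and adding `-d` for the database is an assumption about the local setup.

```python
import subprocess


def build_phiblast_command(infile, patternfile, database=None):
    """Argument list for the blastpgp PHI-BLAST call shown above."""
    cmd = ["blastpgp", "-i", infile, "-k", patternfile, "-p", "patseedp"]
    if database is not None:
        cmd += ["-d", database]  # assumed: name of a locally formatted database
    return cmd


def run_phiblast(infile, patternfile, database=None):
    """Run blastpgp and return its text report (needs blastpgp on PATH)."""
    result = subprocess.run(
        build_phiblast_command(infile, patternfile, database),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


print(build_phiblast_command("query.fasta", "pattern.txt", "swissprot"))
```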

Pål

-- 
Pål Puntervoll
Computational Biology Unit
Bergen Centre for Computational Science
University of Bergen
Phone: +47 555 84040
From pwilkinson at videotron.ca  Fri Mar 26 00:04:23 2004
From: pwilkinson at videotron.ca (Peter Wilkinson)
Date: Fri Mar 26 00:09:43 2004
Subject: [BioPython] Martel Question
Message-ID: <5.2.0.9.0.20040325232407.00b28650@pop.videotron.ca>

I have just built a parser for Quantarray files, though not with Martel. I 
did this as a warm-up for a pile of code that I need to write for a microarray 
project (~3000 arrays). It has been some time since I have written any 
code, and I need to "get into it".

This parser follows the typical scanner/consumer model and is built on a 
state machine that handles the Quantarray output files. It was not built to 
load anything into memory; it was meant to transform the original file into 
a new format written to disk as fast as possible (I was processing 3000 files 
of 5 MB each). So each line was read from the input stream, processed, and 
discarded. Eventually I will be building matrices of 19000 genes by 3000 
samples from the Quantarray files, and I need something that can load into 
a Quantarray Record object; however, I am a little worried about the Record 
sizes. There is only 1 record per file, which might be the saving grace. 
When I was parsing GenBank genomic files (many megabytes, 1 genomic record 
per file), the GenBank parser slowed to a crawl and required piles of memory.
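The line-at-a-time scanner/consumer design described above can be sketched in a few lines. This is not Peter's parser and not Martel; the "Begin Data"/"End Data" section markers and tab-separated columns are hypothetical stand-ins for the real Quantarray layout. Memory use stays constant because each line is handed to the consumer and then dropped.

```python
import io


class RowCounter:
    """Example consumer: keeps running totals only, never the whole file."""

    def __init__(self):
        self.columns = None
        self.rows = 0

    def start_data(self, columns):
        self.columns = columns

    def data_row(self, fields):
        self.rows += 1  # a real consumer would transform/write the row here

    def end_data(self):
        pass


def scan(handle, consumer):
    """Feed one line at a time to the consumer via a small state machine."""
    state = "preamble"
    for line in handle:
        line = line.rstrip("\r\n")
        if state == "preamble":
            if line == "Begin Data":  # hypothetical section marker
                state = "header"
        elif state == "header":
            consumer.start_data(line.split("\t"))
            state = "data"
        elif state == "data":
            if line == "End Data":  # hypothetical section marker
                consumer.end_data()
                state = "done"
            elif line:
                consumer.data_row(line.split("\t"))
        # in the "done" state any trailing lines are ignored


sample = "Some header\nBegin Data\nSpot\tIntensity\n1\t100\n2\t250\nEnd Data\n"
counter = RowCounter()
scan(io.StringIO(sample), counter)
print(counter.columns, counter.rows)  # ['Spot', 'Intensity'] 2
```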

I would like to know whether Martel scales to processing 5 MB records, one 
at a time, when the entire file is in memory. Has Martel been improved in 
this regard over the last few months? I may need to parse large GenBank NT 
records again.

In Dalke's Martel paper it reads: "Similarly, it should be possible to read 
data from the input stream only when required, so that overall memory 
footprint stays low." Is that still to be done?
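At its simplest, reading "only when required" is a generator that yields one record at a time. A minimal sketch for GenBank flat files that relies only on each record ending with a "//" line; it does not parse the record and is not how Martel works. Peak memory is bounded by the largest single record, so it would not help with one huge record per file. The sample text is a toy, not a valid GenBank entry.

```python
import io


def iter_genbank_records(handle):
    """Yield the raw text of one GenBank record at a time.

    Only the current record's lines are held in memory; every record is
    assumed to end with a line starting with '//'.
    """
    lines = []
    for line in handle:
        lines.append(line)
        if line.startswith("//"):
            yield "".join(lines)
            lines = []
    # tolerate a final record that lacks its terminating '//'
    if any(leftover.strip() for leftover in lines):
        yield "".join(lines)


sample = "LOCUS       REC1\nORIGIN\n//\nLOCUS       REC2\nORIGIN\n//\n"
for record_text in iter_genbank_records(io.StringIO(sample)):
    print(record_text.splitlines()[0])
```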

Peter

From chapmanb at uga.edu  Thu Mar 25 19:39:02 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Mar 26 00:52:23 2004
Subject: [BioPython] parsing blast results for use in clustal
In-Reply-To: 
References: 
Message-ID: <20040326003902.GA24957@misterbd.agtec.uga.edu>

Hi Aaron;

> I'm new to biopython and python in general. I am trying to take the 
> results from a blast search to feed into a clustal multiple alignment.  
> I followed the cookbook tutorials and can get results from blast but 
> parsing into a file that clustal can read is giving me some trouble.

Okay, so if I'm understanding you correctly, what you want is a file
that you can put into Clustalw to do an alignment. From the code you
supplied, it looks like what you are printing out is clustalw aln
output -- the results from an alignment.

If I can try and extrapolate, what you probably want to do is
retrieve the FASTA record for the hit and then write this to a file
-- then subsequently use clustalw for the alignment.

If I'm at all interpreting you correctly, then you can do this quite
readily. Since it's NCBIWWW, I'll assume you are BLASTing against
some kind of standard NCBI database. Then you'll just need to split
the title of the hit to get out the GI or accession number. With
this, you can retrieve the corresponding full-length FASTA record
from NCBI with code like:

>>> accession = "AAN04997.1"
>>> from Bio import GenBank
>>> ncbi_dict = GenBank.NCBIDictionary(format = "fasta")
>>> rec = ncbi_dict[accession]
>>> print rec
>gi|22725997|gb|AAN04997.1| putative transcription initiation factor [Oryza sativa (japonica cultivar-group)]
MGSADLVLKAACEGCGSPSDLYGTSCKHTTLCSSCGKSMALSGARCLVCSAPITNLIREYNVRANATTDK
SFSIGRFVTGLPPFSKKKSAENKWSLHKEGLQGRQIPENMREKYNRKPWILEDETGQYQYQGQMEGSQSS
TATYYLLMMHGKEFHAYPAGSWYNFSKIAQYKQLTLEEAEEKMNKRKTSATGYERWMMKAATNGPAAFGS
DVKKLEPTNGTEKENARPKKGKNNEEGNNSDKGEEDEEEEAARKNRLALNKKSMDDDEEGGKDLDFDLDD
EIEKGDDWEHEETFTDDDEAVDIDPEERADLAPEIPAPPEIKQDDEENEEEGGLSKSGKELKKLLGKAAG
LNESDADEDDEDDDQEDESSPVLAPKQKDQPKDEPVDNSPAKPTPSGHARGTPPASKSKQKRKSGGGDDS
KASGGAASKKAKVESDTKPSVAKDETPSSSKPASKATAASKTSANVSPVTEDEIRTVLLAVAPVTTQDLV
SRFKSRLRGPEDKNAFAEILKKISKIQKTNGHNYVVLRDDKK

So the returned record is a string FASTA record and you can replace
your output_file.write(...) code with:

output_file.write(rec)

and then end up with a file full of FASTA sequences, which clustalw
will take as input to do a subsequent alignment.
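The title-splitting step mentioned above can be sketched as follows, assuming titles follow the `gi|number|database|accession|` layout of the record shown above; other databases and merged multi-entry titles would need more care, and `ids_from_title` is a hypothetical helper. The returned accession is the kind of key used for the NCBIDictionary lookup.

```python
def ids_from_title(title):
    """Return (gi, accession) from an NCBI-style 'gi|...|db|accession|' title."""
    # Collapse embedded newlines/tabs first, then take the first word.
    title = " ".join(title.split()).lstrip(">")
    id_field = title.split(" ", 1)[0]
    parts = id_field.split("|")
    gi = parts[1] if len(parts) > 1 and parts[0] == "gi" else None
    accession = parts[3] if len(parts) > 3 and parts[3] else None
    return gi, accession


title = ("gi|22725997|gb|AAN04997.1| putative transcription initiation factor\n"
         "[Oryza sativa (japonica cultivar-group)]")
print(ids_from_title(title))  # ('22725997', 'AAN04997.1')
```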

If you wanted to trim the sequence to the length of the hit, you
could parse the Fasta result you retrieve:

>>> from Bio import Fasta
>>> fasta_parser = Fasta.RecordParser()
>>> import StringIO
>>> fasta_rec = fasta_parser.parse(StringIO.StringIO(rec))

Manipulate the sequence:

>>> fasta_rec.sequence = fasta_rec.sequence[20:70]

And then write this out to your file:

output_file.write(str(fasta_rec) + "\n")
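The [20:70] slice above is an arbitrary example. When the trim points come from a BLAST hit, the positions in the report are 1-based and inclusive (the PHI-BLAST excerpt earlier shows 541 to 600 covering 60 residues), so they need converting before slicing. A minimal standard-library sketch, independent of Bio.Fasta; `hit_start`/`hit_end` are hypothetical parameter names for the subject start and end of the HSP, and accepting them in either order is an assumption.

```python
def parse_fasta_string(text):
    """Split a single FASTA record held in a string into (title, sequence)."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("text does not start with a FASTA header")
    title = lines[0][1:]
    sequence = "".join(line.strip() for line in lines[1:])
    return title, sequence


def trim_to_hit(sequence, hit_start, hit_end):
    """Slice using 1-based, inclusive coordinates as BLAST reports them."""
    lo, hi = sorted((hit_start, hit_end))
    return sequence[lo - 1:hi]


title, seq = parse_fasta_string(">example_id toy record\nMGSADLVLKA\nACEGCGSPSD\n")
print(trim_to_hit(seq, 3, 7))  # SADLV
```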

Hope some of that helped!
Brad
From idoerg at burnham.org  Fri Mar 26 16:31:52 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri Mar 26 16:37:37 2004
Subject: [BioPython] Prothon
Message-ID: <4064A148.9020405@burnham.org>

Slightly off topic, but people on this list might be interested:

http://www.prothon.org

From the homepage (I have nothing to do with these guys, just copied & 
pasted):

``Prothon is a fresh new language that gets rid of classes altogether in 
the same way that Self does and regains the original practical and fun 
sensibility of Python. This major improvement plus many minor ones make 
for a clean new revolutionary break in language development. Prothon is 
quite simple and yet offers the power of Python and Self.

Prothon is also an industrial-strength alternative to Python and Self. 
Prothon uses native threads and a 64-bit architecture to maximize 
performance in applications such as multiple-cpu hosting.''

./I

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo
From pieter at laeremans.org  Mon Mar 29 18:56:18 2004
From: pieter at laeremans.org (Pieter Laeremans)
Date: Mon Mar 29 19:16:29 2004
Subject: [BioPython] Installation problem on debian
Message-ID: <877jx3b42l.fsf@hades.kotnet.org>

Hi,

I'm trying to install biopython on a debian system (sarge), with
python2.3 and all the dependencies installed, but I get this error:

/tmp/biopython-1.24 $ python setup.py build
running build
running build_py
creating build
creating build/lib.linux-i686-2.3
creating build/lib.linux-i686-2.3/Bio
copying Bio/DBXRef.py -> build/lib.linux-i686-2.3/Bio
copying Bio/Decode.py -> build/lib.linux-i686-2.3/Bio
copying Bio/DocSQL.py -> build/lib.linux-i686-2.3/
.... 
.....
....
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -IBio -I/usr/include/python2.3 -c Bio/PDB/mmCIF/MMCIFlexmodule.c -o build/temp.linux-i686-2.3/Bio/PDB/mmCIF/MMCIFlexmodule.o
Bio/PDB/mmCIF/MMCIFlexmodule.c: In function `MMCIFlex_open_file':
Bio/PDB/mmCIF/MMCIFlexmodule.c:14: warning: implicit declaration of function `mmcif_set_file'
Bio/PDB/mmCIF/MMCIFlexmodule.c: In function `MMCIFlex_get_token':
Bio/PDB/mmCIF/MMCIFlexmodule.c:42: warning: implicit declaration of function `mmcif_get_token'
Bio/PDB/mmCIF/MMCIFlexmodule.c:47: warning: implicit declaration of function `mmcif_get_string'
Bio/PDB/mmCIF/MMCIFlexmodule.c: At top level:
Bio/PDB/mmCIF/MMCIFlexmodule.c:65: warning: function declaration isn't a prototype
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -IBio -I/usr/include/python2.3 -c Bio/PDB/mmCIF/lex.yy.c -o build/temp.linux-i686-2.3/Bio/PDB/mmCIF/lex.yy.o
mmcif.lex:52: warning: function declaration isn't a prototype
lex.yy.c:1046: warning: `yyunput' defined but not used
gcc -pthread -shared build/temp.linux-i686-2.3/Bio/PDB/mmCIF/lex.yy.o build/temp.linux-i686-2.3/Bio/PDB/mmCIF/MMCIFlexmodule.o -lfl -o build/lib.linux-i686-2.3/Bio/PDB/mmCIF/MMCIFlex.so
/usr/bin/ld: cannot find -lfl
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

So I think there is a library 'fl' which has to be installed.
But I don't know which library it is. Has anyone succeeded in
installing this software on a debian system?

Thanks, 

Pieter

From pieter at laeremans.org  Tue Mar 30 02:54:41 2004
From: pieter at laeremans.org (Pieter Laeremans)
Date: Tue Mar 30 03:00:00 2004
Subject: [BioPython] Installation problem on debian
In-Reply-To: <200403300928.06024.thamelry@binf.ku.dk> (Thomas Hamelryck's
	message of "Tue, 30 Mar 2004 09:28:06 +0200")
References: <877jx3b42l.fsf@hades.kotnet.org>
	<200403300928.06024.thamelry@binf.ku.dk>
Message-ID: <87lllidb26.fsf@laeremans.org>

Thomas Hamelryck  writes:

>
> Dag Pieter,
>
> The missing library is flex, the GNU version of lex. 
> Alternatively, you can comment out the MMCIF lines 
> in setup.py if you do not need it...
>
> Best regards,
>

Thank you very much!
It does work now.

kind regards,

Pieter
From thamelry at binf.ku.dk  Tue Mar 30 02:28:06 2004
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Tue Mar 30 08:23:11 2004
Subject: [BioPython] Installation problem on debian
In-Reply-To: <877jx3b42l.fsf@hades.kotnet.org>
References: <877jx3b42l.fsf@hades.kotnet.org>
Message-ID: <200403300928.06024.thamelry@binf.ku.dk>

On Tuesday 30 March 2004 01:56, Pieter Laeremans wrote:

[knip]

> So I think there is a library 'fl' which has to be installed.
> But I don't know which library it is. Has anyone succeeded in
> installing this software on a debian system?

Dag Pieter,

The missing library is flex, the GNU version of lex. 
Alternatively, you can comment out the MMCIF lines 
in setup.py if you do not need it...

Best regards,

---
Thomas Hamelryck
Bioinformatik centret      
Universitetsparken 15     
Bygning 10                 
DK-2100 København Ø
Denmark
http://www.binf.ku.dk/users/thamelry/

From chapmanb at uga.edu  Tue Mar 30 19:17:00 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Mar 30 19:27:54 2004
Subject: [BioPython] PHI-BLAST support?
In-Reply-To: <20040325110629.GB6048@svartfuru.ii.uib.no>
References: <20040325110629.GB6048@svartfuru.ii.uib.no>
Message-ID: <20040331001700.GF29401@evostick.agtec.uga.edu>

Hi Pål;

> I'm wondering whether BioPython supports doing PHI-BLAST searches (locally), 
>
> btw, here's the command I issue to run PHI-BLAST from command-line:
> 
>     blastpgp -i [infile] -k [patternfile] -p patseedp

Yes, we do support that. If you had your infile in a variable
'input_file' and your patternfile in a variable 'pattern_file',
then you could do the search against, say, a local swissprot
database with:

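from Bio.Blast import NCBIStandalone

# PHI-BLAST via blastpgp: input_file becomes -i, pattern_file is
# passed as hit_infile (-k), and program selects -p patseedp.
result_handle, error_handle = NCBIStandalone.blastpgp(
        "/usr/local/bin/blastpgp", "swissprot", input_file,
        program = "patseedp", hit_infile = pattern_file)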

The variable result_handle contains the output of this run, and
error_handle any errors that may have occurred.
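For comparison, the call above amounts to the blastpgp command line from the question plus a database argument. A minimal sketch of building and running that command directly with Python's subprocess module, assuming the blastpgp path and the "swissprot" database name used above; phiblast_command and run_phiblast are hypothetical helpers, not Biopython functions, and the file names in the demo are placeholders:

```python
import subprocess


def phiblast_command(blastpgp, database, input_file, pattern_file):
    """Build a PHI-BLAST argument list for the legacy NCBI blastpgp tool.

    Hypothetical helper: -d names the database, -i the query file,
    -k the pattern file, and -p patseedp selects PHI-BLAST mode.
    """
    return [blastpgp,
            "-d", database,
            "-i", input_file,
            "-k", pattern_file,
            "-p", "patseedp"]


def run_phiblast(blastpgp, database, input_file, pattern_file):
    """Run blastpgp and return (stdout, stderr) as text.

    Assumes blastpgp is installed at the given path; raises
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        phiblast_command(blastpgp, database, input_file, pattern_file),
        capture_output=True, text=True, check=True)
    return result.stdout, result.stderr


if __name__ == "__main__":
    # Only builds and prints the command; nothing is executed here.
    print(" ".join(phiblast_command("/usr/local/bin/blastpgp", "swissprot",
                                    "query.fasta", "pattern.txt")))
```

Passing a list of arguments rather than a single shell string avoids quoting problems when file names contain spaces.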

> and whether parsing of PHI-BLAST output information such as pattern positions
> is supported (see below)?

I don't believe the BLAST parser currently supports output from
PHI-BLAST searches. We'd certainly accept contributions towards this
goal.

Hope this helps!
Brad
From chapmanb at uga.edu  Tue Mar 30 19:24:44 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Mar 30 19:35:38 2004
Subject: [BioPython] Martel Question
In-Reply-To: <5.2.0.9.0.20040325232407.00b28650@pop.videotron.ca>
References: <5.2.0.9.0.20040325232407.00b28650@pop.videotron.ca>
Message-ID: <20040331002444.GG29401@evostick.agtec.uga.edu>

Hi Peter;

> I have just built a parser for Quantarray ... but not with Martel. 
> This parser is built with the typical scanner consumer model philosophy, 
> built on a state machine that will handle the quantarray output files. This 
> parser was not built to load anything into the memory. 
[...]
> I will be reading and I need something that can 
> load into a Quantarray Record object, however I was a little worried about 
> the Record sizes. There is only 1 record per file, which might be the 
> saving grace. When I was parsing Genbank genomic files (many megs), the 
> Genbank parser was slowing to a crawl (and required piles of memory); 1 
> genomic record per file.

I know there were some speedups done on the GenBank parser over
recent releases. Nothing related specifically to Martel, but rather to
some of my Python code which utilizes it. Have you tried it lately
on your files and machines and found it to be especially slow? But
yes, we haven't done any work on avoiding storing big records
entirely in memory.

> I would like to know if Martel scales to processing 5mb Records at a time, 
> if the entire file is in memory? Has Martel been improved over the last
> few months in that regard? I may have a need to parse large Genbank NT
> records again.

There haven't been any specific changes to Martel. If you are
basing the memory problems solely on the GenBank parser, I know a
number of parts of that were written badly (by myself), and I have
attempted to fix them up.

> In Dalke's Martel paper it reads "Similarly, it should be possible to read 
> data from the input stream only when required, so that overall memory 
> footprint stays low. " is that still to be done ?

Andrew hasn't made any drastic changes to Martel in the last few
months, so I assume it is still to be done. He could probably give a
better answer.

So, sorry, but my answer boils down to this: I'm not sure how it will
behave, so I guess you'll have to try it and see.

From my own experience using the new Fasta parser (which uses
Martel), it works quite well on large chromosome-sized FASTA
sequences on my machine (nothing fancy, just a standard desktop).
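As a hedged illustration of the scanner/consumer style Peter describes and of the "read data only when required" idea quoted from the Martel paper, the sketch below is plain Python, not Martel. A scanner walks a FASTA handle one line at a time and fires events at a consumer; this particular consumer keeps only each record's title and a running length, so memory use does not grow with sequence size. The class and method names are hypothetical, and the sketch assumes the sequence is wrapped over many lines (a single unwrapped multi-megabase line would still be read in one piece).

```python
import io


class LengthConsumer:
    """Hypothetical consumer that records (title, length) per record.

    Only a running count is kept, never the sequence itself, so a
    chromosome-sized record costs no more memory than a short one.
    """

    def __init__(self):
        self.results = []
        self._title = None
        self._length = 0

    def start_record(self, title):
        self._title = title
        self._length = 0

    def sequence_line(self, line):
        self._length += len(line)

    def end_record(self):
        self.results.append((self._title, self._length))


def scan_fasta(handle, consumer):
    """Feed FASTA events to `consumer`, reading one line at a time."""
    in_record = False
    for raw in handle:
        line = raw.strip()
        if line.startswith(">"):
            if in_record:
                consumer.end_record()
            consumer.start_record(line[1:])
            in_record = True
        elif line and in_record:
            consumer.sequence_line(line)
    if in_record:
        consumer.end_record()


if __name__ == "__main__":
    data = ">chr_a\nACGTACGT\nACG\n>chr_b\nTTTT\n"
    consumer = LengthConsumer()
    scan_fasta(io.StringIO(data), consumer)
    print(consumer.results)  # [('chr_a', 11), ('chr_b', 4)]
```

Swapping in a different consumer (one that stores sequences, say) changes what is kept in memory without touching the scanner.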

Hope this answer helps some; sorry I can't be more specific.
Brad
From karin.lagesen at labmed.uio.no  Wed Mar 31 11:43:54 2004
From: karin.lagesen at labmed.uio.no (Karin Lagesen)
Date: Wed Mar 31 11:49:08 2004
Subject: [BioPython] error with Fasta.Record?
Message-ID: <20040331164354.GA9655@uracil.uio.no>

I use the following code to read in a fasta file:

    genes = quick_FASTA_reader(geneFile)
    genelist = {}
    rec = Fasta.Record()
    iterator = 10001
    for entry in genes:
        g = ecoligene.EcoliGene(entry)
        oname = os.path.join(over300, str(iterator))
        if dofiles:
            rec.title, rec.sequence = entry
            print iterator, rec.title, rec.sequence
            ofile = open(oname, 'w')
            ofile.write(str(rec))
            ofile.close()

I do this with a test file:

adenine:18:38> cat /med/adenine/u2/projects/locator/gard/testfile
>1_dapB_to_carA_29196_29650
gtctataagtgccaaaaattacatgttttgtcttctgtttttgttgttttaatgtaaatt
ttgaccatttggtccacttttttctgctcgtttttatttcatgcaatc
>2_caiT_to_fixA_41932_42366
aattattattaacctcgtggacgcgttaatggctaactcataatgggtattcaataagct
gtattct
>3_caiT_to_fixA_41932_42366
aattattattaacctcgtggacgcgttaatggctaactcataatgggtattcaataagct
gtattctgtgattggtatcacatttttgtttcgggtgaatagagggcgttttttcgttaa
t
>4_caiT_to_fixA_41932_42366
aattattattaacctcgtggacgcgttaatggctaactcataatgggtattcaataagct
gtattctgtgattggtatcacatttttgtttcgggtgaatagagggcgttttttcgttaa
ttttgattaataatcagtttgttatgctctgttgtgagtaaaaaataacatctgac
>5_fruR_to_yabB_89033_89633
gcttcgcacgttggacgtaaaataaacaacgctgatattagccgtaaacatcgggttttt
tacctcggtatgccttgtgac
>6_fruR_to_yabB_89033_89633
aaacaacgctgatattagccgtaaacatcgggttttttacctcggtatgccttgtgac
>7_aroP_to_pdhR_121552_122091
gtttacatcaaagaagtttgaattgttacaaaaagacttccgtcagatcaagaataatgg
tatg
adenine:18:38>

And the files I get look like this:

adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10001
>1_dapB_to_carA_29196_29650
GTCTATAAGTGCCAAAAATTACATGTTTTGTCTTCTGTTTTTGTTGTTTTAATGTAAATT
adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10002
>2_caiT_to_fixA_41932_42366
AATTATTATTAACCTCGTGGACGCGTTAATGGCTAACTCATAATGGGTATTCAATAAGCT
adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10003
>3_caiT_to_fixA_41932_42366
AATTATTATTAACCTCGTGGACGCGTTAATGGCTAACTCATAATGGGTATTCAATAAGCT
GTATTCTGTGATTGGTATCACATTTTTGTTTCGGGTGAATAGAGGGCGTTTTTTCGTTAA
adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10004
>4_caiT_to_fixA_41932_42366
AATTATTATTAACCTCGTGGACGCGTTAATGGCTAACTCATAATGGGTATTCAATAAGCT
GTATTCTGTGATTGGTATCACATTTTTGTTTCGGGTGAATAGAGGGCGTTTTTTCGTTAA
adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10005
>5_fruR_to_yabB_89033_89633
GCTTCGCACGTTGGACGTAAAATAAACAACGCTGATATTAGCCGTAAACATCGGGTTTTT
adenine:18:38> cat /med/adenine/u2/projects/locator/gard/singles/10006
>6_fruR_to_yabB_89033_89633
adenine:18:38> cat /med/adenine/u2/projects/locator/gard/singles/10007
>7_aroP_to_pdhR_121552_122091
GTTTACATCAAAGAAGTTTGAATTGTTACAAAAAGACTTCCGTCAGATCAAGAATAATGG
adenine:18:38>

I tried printing the rec object to check whether the sequences are
read in correctly, and they are. So the problem seems to be in writing
this object to a file.
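Comparing the two listings, every output file stops after the last complete 60-character line: record 1 has 108 bases but only 60 are written, record 3 has 121 and 120 are written, and record 6, at 58 bases, gets no sequence line at all. That pattern is consistent with a wrapping loop that only emits full-width chunks. The sketch below is hypothetical and is not Biopython's Fasta.Record code: truncating_wrap reproduces the symptom, while wrap_fasta steps through the sequence in width-sized slices and so keeps the final partial line.

```python
def truncating_wrap(sequence, width=60):
    """Hypothetical buggy wrapper: only whole `width`-sized chunks survive."""
    return [sequence[i * width:(i + 1) * width]
            for i in range(len(sequence) // width)]


def wrap_fasta(title, sequence, width=60):
    """Format one FASTA record, wrapping the sequence at `width` columns.

    range(0, len(sequence), width) also yields the start of the last,
    shorter chunk, so no trailing bases are lost.
    """
    lines = [">" + title]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    seq = "a" * 108  # same length as record 1 in the test file
    print(sum(len(chunk) for chunk in truncating_wrap(seq)))  # 60
    print(wrap_fasta("1_dapB_to_carA_29196_29650", seq).count("\n"))  # 3
```

Writing wrap_fasta(rec.title, rec.sequence) instead of str(rec) would be one way to sidestep the problem while the cause is tracked down.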

Is this something I am doing wrong, or is it something else?

Karin
-- 
Karin Lagesen, PhD student
karin.lagesen@labmed.uio.no