From biopyte at yahoo.de  Mon Jan  2 10:34:18 2006
From: biopyte at yahoo.de (Hans Meier)
Date: Mon Jan  2 10:38:46 2006
Subject: [BioPython] answer to mail from sebastian bassi on 2005Dec31 
Message-ID: <20060102153418.92730.qmail@web26303.mail.ukl.yahoo.com>

Dear Sebastian,
 
  I'm replying to your e-mail of 31 Dec 2005:
 
> Your computer is not underpowered and the file is not so large, so it
> should not hangup. Could you provide code for us to check it? (and the
> datafile, you should upload it to a ftp/web server if the data is
> public).
 Sorry, but I couldn't work out how to reply within the thread.
 Could you tell me how to do that, please?
 
 
 I don't believe that the problem is specific to the file.
 Anyway, the data file is
 
 ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12/NC_000913.gbk
 
 But watch out: you have to replace 5" with 5' in the field shown below.
 Otherwise the file will not parse at all.
 
/note="2'-(5"-phosphoribosyl)-3'-dephospho-CoA
transferase; holo-citrate lyase synthase; CitG forms the
prosthetic group precursor
2'-(5"-triphosphoribosyl)-3'-dephospho-CoA which is then
transferred to apo-ACP by CitX to produce holo-ACP and
pyrophosphate;
go_process: protein modification [goid 0006464]"
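
The substitution described above can be scripted rather than done by hand. This is only a minimal sketch: it assumes the stray double quotes occur solely as (5"- inside the note shown above, and the function name fix_prime_quotes is made up for illustration.

```python
def fix_prime_quotes(text):
    """Replace the stray double quote in (5"- with a prime, giving (5'-."""
    return text.replace('(5"-', "(5'-")


# Typical use (file names are placeholders for your local copies):
#     with open("NC_000913.gbk") as src:
#         patched = fix_prime_quotes(src.read())
#     with open("NC_000913_fixed.gbk", "w") as dst:
#         dst.write(patched)
```

Restricting the match to the opening parenthesis keeps the replacement from touching the legitimate double quotes that delimit qualifier values.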
  
 Best regards, Harald
 
 
---------------------------------
Make free PC-to-PC calls to your friends!
Install Yahoo! Messenger now!
From biopyte at yahoo.de  Mon Jan  2 11:33:49 2006
From: biopyte at yahoo.de (Hans Meier)
Date: Mon Jan  2 11:37:57 2006
Subject: [BioPython] Sorry,
	one more time: extract data from a large .gbk file
Message-ID: <20060102163349.57644.qmail@web26305.mail.ukl.yahoo.com>

Dear friends,
 
 I apologize for bothering you once more with this,
 but maybe we can clear it up now.
 All I want to do is extract data from a whole genome .gbk file on my disk. 
 The file has about 5000(!) entries like the one shown below.
 All I want to do is:
 
 Give me the protein sequence (="/translation") (or whatever)
 of gene (="/gene") so-and-so.
 
 Speed matters.
 
 Although I believe I'm not a total dummy at programming,
 and I tried several approaches taken from the web,
 I could not program this so that the request finishes
 within a reasonable time or without crashing my box
 (P3, 700 MHz, 256 MB).
 
 Since this is an important question for me, but
 I don't want to bother you with it any further,
 maybe someone could just post a code snippet
 showing how to accomplish this trivial(?) task?
 
 
 Thanks a lot for all your work and your help, Harald
 
 
 ###### a typical .gbk entry ###########
  gene            94650..96008
                      /gene="murF"
                      /locus_tag="b0086"
                      /note="synonyms: mra, EG10622, b0086"
                      /db_xref="GeneID:944813"
 CDS             94650..96008
                      /gene="murF"
                      /locus_tag="b0086"
                      /EC_number="6.3.2.15"
                      /function="enzyme; Murein sacculus, peptidoglycan"
                      /note="go_component: cytoplasm [goid 0005737];
                      go_process: peptidoglycan biosynthesis [goid 0009252];
                      go_process: peptidoglycan metabolism [goid 0000270]"
                      /codon_start=1
                      /transl_table=11
                      /product="D-alanine:D-alanine-adding enzyme"
                      /protein_id="NP_414628.1"
                      /db_xref="ASAP:313"
                      /db_xref="GI:16128079"
                      /db_xref="GeneID:944813"
                      /translation="MISVTLSQLTDILNGELQGADITLDAVTTDTRKLTPGCLFVALK
 GERFDAHDFADQAKAGGAGALLVSRPLDIDLPQLIVKDTRLAFGELAAWVRQQVPARV
 VALTGSSGKTSVKEMTAAILSQCGNTLYTAGNLNNDIGVPMTLLRLTPEYDYAVIELG
 ANHQGEIAWTVSLTRPEAALVNNLAAAHLEGFGSLAGVAKAKGEIFSGLPENGIAIMN
 ADNNDWLNWQSVIGSRKVWRFSPNAANSDFTATNIHVTSHGTEFTLQTPTGSVDVLLP
 LPGRHNIANALAAAALSMSVGATLDAIKAGLANLKAVPGRLFPIQLAENQLLLDDSYN
 ANVGSMTAAVQVLAEMPGYRVLVVGDMAELGAESEACHVQVGEAAKAAGIDRVLSVGK
 QSHAISTASGVGEHFADKTALITRLKLLIAEQQVITILVKGSRSAAMEEVVRALQENG
 TC"
 ########end of the example#####################
 

From cariaso at yahoo.com  Mon Jan  2 17:05:43 2006
From: cariaso at yahoo.com (Michael Cariaso)
Date: Mon Jan  2 17:09:47 2006
Subject: [BioPython] Sorry, one more time: extract data from a large .gbk
	file
In-Reply-To: <20060102163349.57644.qmail@web26305.mail.ukl.yahoo.com>
References: <20060102163349.57644.qmail@web26305.mail.ukl.yahoo.com>
Message-ID: <43B9A3B7.9030203@yahoo.com>

Until we see some code, I can't be sure. But what you are doing seems 
well within BioPython's abilities. I wonder if perhaps your code looks 
like this:

alist = readWholeFile(filename)
for record in alist:
     process(record)


which is what is causing the performance problems. If your code does 
resemble the above, try to change it so that it looks more like:

recordIterator = createIterator(filename)
for record in recordIterator:
     process(record)


or

fileobj = open(filename)
done = False
while not done:
     # readNextRecord() stands for whatever reads a single record
     record = readNextRecord(fileobj)
     if record:
         process(record)
     else:
         # nothing left to read
         done = True


The first form reads the whole GenBank file into memory and might crash 
your machine. The second and third forms read in one record at a time 
and process it, which requires far less memory.
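
To make the iterator form concrete without relying on any particular Biopython release, here is a plain-Python sketch. It relies only on the fact that each record in a GenBank flat file ends with a line containing just "//"; the name iter_records is hypothetical and not a Biopython function.

```python
import io


def iter_records(handle):
    """Yield one GenBank record at a time, as a list of its lines."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.strip() == "//":
            # End-of-record marker reached: hand the record over.
            yield lines
            lines = []
    if any(ln.strip() for ln in lines):
        # Trailing data without a terminator is yielded as-is.
        yield lines


# Tiny in-memory stand-in for a multi-record file.
demo = io.StringIO("LOCUS A\n//\nLOCUS B\n//\n")
first_lines = [record[0] for record in iter_records(demo)]
```

Only one record is held in memory at a time, so the peak memory use depends on the largest record rather than on the file size. For a whole-genome file made of a single LOCUS record this alone will not help much.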



From srini_iyyer_bio at yahoo.com  Tue Jan  3 11:47:08 2006
From: srini_iyyer_bio at yahoo.com (Srinivas Iyyer)
Date: Tue Jan  3 11:50:30 2006
Subject: [BioPython] E-utils on NCBI site
In-Reply-To: <20060102163349.57644.qmail@web26305.mail.ukl.yahoo.com>
Message-ID: <20060103164708.3912.qmail@web31614.mail.mud.yahoo.com>

Dear group, 
 I have two questions that are important to my
research. I hope group members who have tried this would
be willing to help me. 

1. How do I use E-utils from Python with the Biopython
modules? Are there any examples?

2. Here is specifically what I want to do. I am
interested in downloading all the human Affymetrix
data submitted to the GEO site. 

For example, GSE2152_RAW.tar is the raw CEL file
package submitted to GEO by the authors. It is
located at:
ftp://ftp.ncbi.nih.gov/pub/geo/data/geo/raw_data/series/GSE2152/GSE2152_RAW.tar

However, checking manually for every dataset that is
submitted to NCBI is a painstaking procedure. So I want
to write a script that checks
every dataset submitted to GEO. After that I want to
filter for the human tar files. 

Has anyone done this before? Could you please help
me? 

Thanks
Srini


From saccenti at cerm.unifi.it  Tue Jan  3 12:24:43 2006
From: saccenti at cerm.unifi.it (saccenti@cerm.unifi.it)
Date: Tue Jan  3 12:34:17 2006
Subject: [BioPython] E-utils on NCBI site
In-Reply-To: <20060103164708.3912.qmail@web31614.mail.mud.yahoo.com>
References: <20060102163349.57644.qmail@web26305.mail.ukl.yahoo.com>
	<20060103164708.3912.qmail@web31614.mail.mud.yahoo.com>
Message-ID: <1345.155.52.120.87.1136309083.squirrel@alpha.cerm.unifi.it>


> 1. How do I use E-utils using python, bio-python
> modules. Are there any examples?

I do not know whether Biopython has an E-utils module. I had to deal with
E-utils to jump from one database code to another in the NCBI databases. I
read the instructions on the NCBI E-utils page, then wrote my own consumer
to open the different web pages and get the codes by parsing the simple HTML.
>
> 2. here is the ......

Maybe you can get a complete list of all GEO files in the NCBI database.
Once you have the list, you can use the standard commands of the ftp module
to connect to the NCBI server and download what you want.
To filter the human data, you may need to find some regularity in the file
names that lets you pick out the human files, if there is one.
Maybe there is a flag inside each file. You could download all the tar files
and then read them one after the other, deleting the non-human ones. Python
has a utility for reading zipped files without having to unpack them first,
but I do not remember whether it also works with tar files.
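
On the last point: the Python standard library has a tarfile module alongside zipfile, and it can list or read archive members without unpacking the archive to disk. A minimal sketch follows. The helper names and the member name GSM0001.CEL are made up for illustration, and the demo archive is built in memory so the example runs without any download.

```python
import io
import tarfile


def list_tar_members(fileobj):
    """Return the member names of a tar archive without extracting it."""
    # "r:*" lets tarfile detect gzip/bzip2 compression by itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        return archive.getnames()


def make_demo_tar():
    """Build a one-file tar archive in memory (stand-in for a GEO download)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        payload = b"demo"
        info = tarfile.TarInfo(name="GSM0001.CEL")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


member_names = list_tar_members(make_demo_tar())
```

For a file on disk, tarfile.open("GSE2152_RAW.tar") works the same way, and archive.extractfile(name) gives a file-like object for reading a single member.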

Hope this helps
edoardo

Maybe this is not elegant, but it should be fast to write


From biopython at maubp.freeserve.co.uk  Tue Jan  3 13:57:01 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue Jan  3 13:59:22 2006
Subject: [BioPython] Large GenBank files: impossible to handle?
In-Reply-To: <20051231025437.5979.qmail@web26304.mail.ukl.yahoo.com>
References: <20051231025437.5979.qmail@web26304.mail.ukl.yahoo.com>
Message-ID: <43BAC8FD.8030703@maubp.freeserve.co.uk>

Hans Meier wrote:
>  Dear friends,
>  
>  I tried to handle a .gbk file of 4,7MB in size
>  with a "700MHz, Pentium III, 256 MB RAM"-box.

This might work with the current release (BioPython 1.41) but will use a 
lot of memory - I would guess about 250MB, which is all your machine 
has.  This is a limitation of the old Martel based parser.

You should be able to install the new GenBank parser (from CVS) which I 
wrote specifically due to problems with large GenBank files.

Ask if you need help with this - and are you on Windows or Linux?

See also bug 1747, http://bugzilla.open-bio.org/show_bug.cgi?id=1747

>  Parsing with "RecordParser" and indexing with "index_file"
>  crushed the machine in both cases, I had to reboot
>  (what happens not so often with Debian).

I would avoid using index_file on large GenBank files - this still uses 
Martel and can be rather slow.  Also, I strongly suspect your files have 
a single record each (i.e. only one LOCUS line) in which case there is 
no need to index them.

>  My final goal is to access the .gbk file somehow like a database.

Have you tried using the FeatureParser and then accessing the .features 
list property of the record?
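
Once a record with a .features list is in hand, a one-off dictionary gives the database-like lookup asked for. The sketch below is not tied to a specific parser. It assumes each feature has a .type string and a .qualifiers dictionary whose values are lists of strings, as in recent Biopython SeqFeature objects; the parser discussed in this thread may differ (for example, it may keep the surrounding quotes). The function name index_translations and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def index_translations(record):
    """Map each /gene name to the /translation of its CDS feature."""
    index = {}
    for feature in record.features:
        if feature.type != "CDS":
            continue
        genes = feature.qualifiers.get("gene", [])
        translations = feature.qualifiers.get("translation", [])
        if genes and translations:
            index[genes[0]] = translations[0]
    return index


# Stand-in record mimicking the murF entry quoted earlier in the thread.
demo_record = SimpleNamespace(features=[
    SimpleNamespace(type="gene", qualifiers={"gene": ["murF"]}),
    SimpleNamespace(type="CDS", qualifiers={
        "gene": ["murF"],
        "translation": ["MISVTLSQLTDILNGELQGADITLDAVTTDTRKLTPGCLFVALK"],
    }),
])
translations = index_translations(demo_record)
```

Building the index costs one pass over the features; every later lookup such as translations["murF"] is a plain dictionary access.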

>  The alternative would be to use .fna,.faa and .fnn files 
>  and write my own methods. Or stuff all the data in a SQL-database.
>  But I still hope that Biopython could help.
>  
>  Before I spend more time on this, I'd like to ask you:
>  
>  With the Biopython tools, is it possible to handle
>  .gbk files of about 5MB in a reasonable time with 
>  a low- to middle-class desktop computer? If so, how?

Using the latest BioPython code it should be easy (see above).

Also, these two recent examples might be handy:

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/genbank/

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/genbank2fasta/

Peter

From idoerg at gmail.com  Wed Jan  4 00:46:53 2006
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed Jan  4 01:10:58 2006
Subject: [BioPython] E-utils on NCBI site
In-Reply-To: <20060103164708.3912.qmail@web31614.mail.mud.yahoo.com>
References: <20060103164708.3912.qmail@web31614.mail.mud.yahoo.com>
Message-ID: <43BB614D.1000808@burnham.org>

Bio.EUtils is the module you are looking for. Look in the tests 
subdirectory for some examples.

Cheers,

Iddo




-- 

Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://ffas.ljcrf.edu/~iddo

From biopython at maubp.freeserve.co.uk  Fri Jan  6 17:44:34 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri Jan  6 17:41:11 2006
Subject: [BioPython] parsing error with GenBank.RecordParser
In-Reply-To: <20051230204756.25343.qmail@web26311.mail.ukl.yahoo.com>
References: <20051230204756.25343.qmail@web26311.mail.ukl.yahoo.com>
Message-ID: <43BEF2D2.1090600@maubp.freeserve.co.uk>

Hans Meier wrote:
>  Hi,
>  
>  parsing of NC_000913.gbk does not work.
>  
>  Greets, Harald

Sorry I didn't reply earlier, I was away for the New Year...

 From the traceback you provided, I would guess that the old GenBank 
parser (included with BioPython 1.41) didn't like the double quotes in 
that note:

/note="2'-(5"-phosphoribosyl)-3'-dephospho-CoA...

Interestingly enough, in the most recent version of NC_000913.gbk dated 
Dec 2005 (check the first line, starting LOCUS), the NCBI have switched 
the double quotes to single quotes in the note (gene citX):

/note="2'-(5'-phosphoribosyl)-3'-dephospho-CoA...

If you download this revised NC_000913.gbk the problem should go away 
(but note that, as the Escherichia coli GenBank file is 11 MB, you might 
be better off updating the GenBank parser).

The new GenBank parser (available in CVS now) should cope with either 
version of the file (and should use less memory, and be a lot faster too).

To try this, you just need to replace the file 
/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py with the latest 
version (but make a backup of the old one just in case).

Peter

From ziemys.1 at osu.edu  Mon Jan  9 09:30:26 2006
From: ziemys.1 at osu.edu (ARTURAS ZIEMYS)
Date: Mon Jan  9 09:45:15 2006
Subject: [BioPython] NeighborSearch (No module named _CKDTree)
Message-ID: <214350b2144a7a.2144a7a214350b@osu.edu>

Hi, 

I've got a problem with 'NeighborSearch' (Python 2.4.2, Biopython 1.41, Windows XP SP2). I need to find the nearest atoms in my structure, but I cannot use 'NeighborSearch'. Is there something wrong with the distribution? 

For example, when I try to import 'NeighborSearch' in a shell: 

IDLE 1.1.2 ==== No Subprocess ==== 
>>> from Bio.PDB.NeighborSearch import NeighborSearch
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in ?
    from Bio.PDB.NeighborSearch import NeighborSearch
  File "C:\Python24\Lib\site-packages\Bio\PDB\NeighborSearch.py", line 3, in ?
    from Bio.KDTree import *
  File "C:\Python24\Lib\site-packages\Bio\KDTree\__init__.py", line 10, in ?
    from KDTree import KDTree
  File "C:\Python24\Lib\site-packages\Bio\KDTree\KDTree.py", line 17, in ?
    import CKDTree
  File "C:\Python24\Lib\site-packages\Bio\KDTree\CKDTree.py", line 4, in ?
    import _CKDTree
ImportError: No module named _CKDTree
>>>
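
The import fails on _CKDTree, which is presumably the compiled part of the KD-tree code and is missing from this installation. For small structures, a brute-force search in plain Python can stand in until that is sorted out. This is only a sketch, not Biopython code: the function name and the (name, coordinates) input format are assumptions, and math.dist needs Python 3.8 or later.

```python
import math


def neighbors_within(atoms, center, radius):
    """Return the names of atoms lying within `radius` of `center`.

    `atoms` is an iterable of (name, (x, y, z)) pairs.
    """
    return [name for name, coord in atoms
            if math.dist(coord, center) <= radius]


# Made-up coordinates, in Angstroms.
demo_atoms = [("CA", (0.0, 0.0, 0.0)),
              ("CB", (1.5, 0.0, 0.0)),
              ("O", (10.0, 0.0, 0.0))]
close_atoms = neighbors_within(demo_atoms, (0.0, 0.0, 0.0), 2.0)
```

Every query scans all atoms, so this is slower than a KD-tree, but it needs no compiled module.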

With best 
Arturas Z. 

From idoerg at gmail.com  Wed Jan 11 00:13:36 2006
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed Jan 11 00:17:27 2006
Subject: [BioPython] BLAST XML problem?
Message-ID: <43C49400.1070802@burnham.org>

Not sure what we're doing wrong here...

Using the cookbook example, biopython 1.41, python 2.2 (our Zope needs 
that Python version, sorry):

from Bio.Blast import NCBIXML

b_parser = NCBIXML.BlastParser()
b_record = b_parser.parse(blast_out)


Breaks on "Alejandro Schäffer", in the XML <BlastOutput_reference> tag. 
The ä seems to cause the error. Replace it with a regular "a" and 
everything is hunky-dory

Huh?

[idoerg@hotdog:~/results/jafa]> ./try_blast.py star_human.fasta
/home/idoerg/biopy_cvs/biopython/Bio/Blast/NCBIWWW.py:1070: UserWarning: qblast works only with blastn and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
  File "./try_blast.py", line 19, in ?
    b_record = b_parser.parse(open('my_blast.out'))
  File "/home/idoerg/biopy_cvs/biopython/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.3/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.3/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.3/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.3/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: my_blast.out:6:81: not well-formed (invalid token)


Iddo

-- 

Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org

From idoerg at gmail.com  Wed Jan 11 00:27:16 2006
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed Jan 11 01:21:53 2006
Subject: [BioPython] BLAST XML problem?
In-Reply-To: <43C49400.1070802@burnham.org>
References: <43C49400.1070802@burnham.org>
Message-ID: <43C49734.7050409@burnham.org>

Slight correction to my previous email: using biopython from CVS, and 
python 2.3 as you can see from the stack dump



-- 

Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org

From biopython at maubp.freeserve.co.uk  Wed Jan 11 06:48:33 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed Jan 11 07:22:44 2006
Subject: [BioPython] BLAST XML problem?
In-Reply-To: <43C49734.7050409@burnham.org>
References: <43C49400.1070802@burnham.org> <43C49734.7050409@burnham.org>
Message-ID: <43C4F091.2020408@maubp.freeserve.co.uk>

Iddo Friedberg wrote:
> Slight correction to my previous email: using biopython from CVS, and 
> python 2.3 as you can see from the stack dump
> 
> Iddo Friedberg wrote:
> 
>> Not sure what we're doing wrong here...
>>
>> Using the cookbook example, biopython 1.41, python 2.2 (our Zope needs 
>> that Python version, sorry):
>>
>> from Bio.Blast import NCBIXML
>>
>> b_parser = NCBIXML.BlastParser()
>> b_record = b_parser.parse(blast_out)
>>
>>
>> Breaks on "Alejandro Schäffer", in the XML <BlastOutput_reference> 
>> tag. The ä seems to cause the error. Replace it with a regular "a" 
>> and everything is hunky-dory

Is the lower-case a with umlaut stored in the XML file as a literal ä, 
or as a character entity such as &auml; or &#228; (i.e. an ampersand 
escape)?

Also, what character set does the blast_out XML file claim to be in? 
And does that fit with the inclusion of an a-umlaut as a character?

It may be the NCBI's fault for producing a bad XML file...

Peter

From idoerg at burnham.org  Wed Jan 11 12:08:01 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed Jan 11 12:17:13 2006
Subject: [BioPython] BLAST XML problem?
In-Reply-To: <43C4F091.2020408@maubp.freeserve.co.uk>
References: <43C49400.1070802@burnham.org> <43C49734.7050409@burnham.org>
	<43C4F091.2020408@maubp.freeserve.co.uk>
Message-ID: <43C53B71.5080705@burnham.org>

Peter wrote:

> Iddo Friedberg wrote:
>
>> Slight correction to my previous email: using biopython from CVS, and 
>> python 2.3 as you can see from the stack dump
>>
>> Iddo Friedberg wrote:
>>
>>> Not sure what we're doing wrong here...
>>>
>>> Using the cookbook example, biopython 1.41, python 2.2 (our Zope 
>>> needs that Python version, sorry):
>>>
>>> from Bio.Blast import NCBIXML
>>>
>>> b_parser = NCBIXML.BlastParser()
>>> b_record = b_parser.parse(blast_out)
>>>
>>>
>>> Breaks on "Alejandro Schäffer", in the XML <BlastOutput_reference> 
>>> tag. The ä seems to cause the error. Replace it with a regular "a" 
>>> and everything is hunky-dory
>>
>
> Is the lower-case a with umlaut in the XML file as ä, or using an 
> encoding like &auml; or &#228; instead? (ampersand characters, aka 
> character entities)


It's an ä, not a character entity.

>
> Also, what character set does the blast_out XML file claim to be in? 
> And does that fit with the inclusion of an a-umlaut as a character?


I haven't the foggiest... :)

>
> It may be the NCBI's fault for producing a bad XML file...
>

Yeah, well, I still have to deal with it :(  In any case, why is this 
cropping up now? Schäffer has been at NCBI for years...

The file is available at http://iddo-friedberg.org/biopy_bad_blast.xml

in case anyone wants to have a look-see.

Thanks,

Iddo


-- 
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://iddo-friedberg.org

From biopython at maubp.freeserve.co.uk  Wed Jan 11 13:39:46 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed Jan 11 13:43:45 2006
Subject: [BioPython] BLAST XML problem?
In-Reply-To: <43C53B71.5080705@burnham.org>
References: <43C49400.1070802@burnham.org> <43C49734.7050409@burnham.org>
	<43C4F091.2020408@maubp.freeserve.co.uk>
	<43C53B71.5080705@burnham.org>
Message-ID: <43C550F2.5030806@maubp.freeserve.co.uk>

>> It may be the NCBI's fault for producing a bad XML file...
> 
> Yeah, well, I still have to deal with it :(  In any case, why is this 
> cropping up now? Schäffer has been at NCBI for years...

I would guess because BioPython users would have parsed the plain text 
output from blast, rather than XML.

> The file is available at http://iddo-friedberg.org/biopy_bad_blast.xml
> 
> in case anyone wants to have a look-see.

The first line of the XML file could (should?) define an encoding, e.g.

<?xml version="1.0" encoding="utf-8"?>

or:

<?xml version="1.0" encoding="ISO-8859-1"?>

Instead it's just:

<?xml version="1.0"?>

Short term solutions which I have just tried and got to work:

(1) Edit the offending character by hand (as you did)
(2) Specify encoding="ISO-8859-1" by editing the first line by hand
(3) Convert the file to Unicode (doubles the size)
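
The conversion route can also be done in code instead of by hand. This is a sketch under two assumptions: the bytes really are ISO-8859-1, and the XML declaration names no encoding, as in the file discussed here. It re-encodes to UTF-8, which is what an XML parser assumes when nothing else is declared. The function name and the shortened demo document are made up for illustration.

```python
from xml.dom import minidom


def parse_latin1_xml(raw_bytes):
    """Parse XML bytes that are ISO-8859-1 but declare no encoding."""
    # Decode with the real encoding, then hand the parser UTF-8 bytes.
    utf8_bytes = raw_bytes.decode("iso-8859-1").encode("utf-8")
    return minidom.parseString(utf8_bytes)


# Shortened stand-in for the BLAST output; \xe4 is the raw a-umlaut byte.
demo = (b'<?xml version="1.0"?>'
        b"<BlastOutput><BlastOutput_reference>Alejandro Sch\xe4ffer"
        b"</BlastOutput_reference></BlastOutput>")
doc = parse_latin1_xml(demo)
reference = doc.getElementsByTagName("BlastOutput_reference")[0].firstChild.data
```

UTF-8 only grows the non-ASCII characters, so the result is barely larger than the input, unlike a 16-bit Unicode encoding.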

BTW - Are you getting the file from standalone blast, or the NCBI website?

Unless a local XML expert steps up, would you like to contact the NCBI 
on this issue?

Peter

From sbassi at gmail.com  Wed Jan 11 14:01:07 2006
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed Jan 11 14:25:40 2006
Subject: [BioPython] BLAST XML problem?
In-Reply-To: <43C550F2.5030806@maubp.freeserve.co.uk>
References: <43C49400.1070802@burnham.org> <43C49734.7050409@burnham.org>
	<43C4F091.2020408@maubp.freeserve.co.uk>
	<43C53B71.5080705@burnham.org>
	<43C550F2.5030806@maubp.freeserve.co.uk>
Message-ID: <b43bf2080601111101u329f0acfu3ab80171b51feb95@mail.gmail.com>

On 1/11/06, Peter <biopython@maubp.freeserve.co.uk> wrote:
> <?xml version="1.0" encoding="ISO-8859-1"?>
> Instead it's just:
> <?xml version="1.0"?>
> Short term solutions which I have just tried and got to work:
> (1) Edit the offending character by hand (as you did)
> (2) Specify encoding="ISO-8859-1" by editing the first line by hand
> (3) Convert the file to Unicode (doubles the size)

I have a 4th solution that doesn't involve editing the XML, so it would
"fix" the problem for other users too:
4) Change Biopython or the XML parser to assume encoding = ISO-8859-1 when
there is no encoding information.
I wonder whether <?xml version="1.0"?> is a valid first line for an XML
file according to the W3C. If it is OK (from the point of view of the XML
standard), then the parser should be corrected; if not, the file should be
rejected for non-compliance (this is not HTML, where the browser can
accept and correct invalid code; the specification states that XML
should validate before being used).


From tharder at burnham.org  Wed Jan 11 15:07:30 2006
From: tharder at burnham.org (Tim Harder)
Date: Wed Jan 11 15:07:08 2006
Subject: [Fwd: Re: [BioPython] BLAST XML problem?]
Message-ID: <43C56582.5050809@burnham.org>

XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

(source http://www.w3.org/TR/2004/REC-xml-20040204/#NT-XMLDecl)

As far as I understand that definition, the encoding declaration is 
optional, so the NCBI file should be OK from the XML point of view.
Anyway, how can I tell SAX which encoding table to use, besides editing 
the XML file itself?

Tim




From biopython at maubp.freeserve.co.uk  Wed Jan 11 15:13:42 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed Jan 11 15:10:30 2006
Subject: [BioPython] BLAST XML problem?
In-Reply-To: <b43bf2080601111101u329f0acfu3ab80171b51feb95@mail.gmail.com>
References: <43C49400.1070802@burnham.org>
	<43C49734.7050409@burnham.org>	<43C4F091.2020408@maubp.freeserve.co.uk>	<43C53B71.5080705@burnham.org>	<43C550F2.5030806@maubp.freeserve.co.uk>
	<b43bf2080601111101u329f0acfu3ab80171b51feb95@mail.gmail.com>
Message-ID: <43C566F6.2010203@maubp.freeserve.co.uk>

Sebastian Bassi wrote:
> On 1/11/06, Peter <biopython@maubp.freeserve.co.uk> wrote:
> 
>><?xml version="1.0" encoding="ISO-8859-1"?>
>>Instead it's just:
>><?xml version="1.0"?>
>>Short term solutions which I have just tried and got to work:
>>(1) Edit the offending character by hand (as you did)
>>(2) Specify encoding="ISO-8859-1" by editing the first line by hand
>>(3) Convert the file to Unicode (doubles the size)
> 
> 
> I have a 4th solution, that doesn't involve XML editing, so it will
> "fix" the problem for other users:
> 4) Change Biopython or XML parser to assume encoding = ISO-8859-1 when
> there is no encoding information.

Well yes, that did cross my mind.  I even went off to try and find out
how to do this, but failed.  Any ideas?
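
One possible way to do this with the standard library, offered as a sketch rather than as a change to Biopython: xml.sax lets the caller set an encoding on an InputSource, and the expat-based reader uses it to override whatever the document says. The code below applies ISO-8859-1 only when the XML declaration names no encoding. The helper names are hypothetical, and whether ISO-8859-1 is the right fallback for every NCBI file is an assumption.

```python
import io
import xml.sax
from xml.sax.xmlreader import InputSource


class TextCollector(xml.sax.ContentHandler):
    """Collect the text found inside one named element."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False
        self.chunks = []

    def startElement(self, name, attrs):
        if name == self.tag:
            self.inside = True

    def endElement(self, name):
        if name == self.tag:
            self.inside = False

    def characters(self, content):
        if self.inside:
            self.chunks.append(content)


def parse_with_fallback(xml_bytes, handler, fallback="ISO-8859-1"):
    """Parse xml_bytes, assuming `fallback` if no encoding is declared."""
    head = xml_bytes[:200]
    declared = (head.startswith(b"<?xml")
                and b"encoding" in head.split(b"?>", 1)[0])
    source = InputSource()
    source.setByteStream(io.BytesIO(xml_bytes))
    if not declared:
        source.setEncoding(fallback)  # passed through to the expat parser
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)


demo = (b'<?xml version="1.0"?>'
        b"<BlastOutput><BlastOutput_reference>Alejandro Sch\xe4ffer"
        b"</BlastOutput_reference></BlastOutput>")
collector = TextCollector("BlastOutput_reference")
parse_with_fallback(demo, collector)
reference = "".join(collector.chunks)
```

The file on disk is left untouched, and documents that do declare an encoding are parsed exactly as before.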

> I wonder if <?xml version="1.0"?> is a W3C valid first line for a XML
> file. If this is OK (from the point of view of the XML standard), then
> the parser should be corrected, if not, according to the standard, the
> file should be rejected for non compliance

You sound like you know a lot more about XML than I do, would you be 
able to find out one way or the other?  This would be useful information 
for trying to get the NCBI to make a change.

Iddo's bad file is fine, according to www.xmlvalidation.com (cut and 
pasting).  The NCBI DTD files are here:

http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd
http://www.ncbi.nlm.nih.gov/dtd/NCBI_Entity.mod.dtd
http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.mod.dtd

I think this means that declaring the encoding may be optional, although 
the validator could simply be detecting the encoding on its own.

 > (this is not HTML where the
 > browser client can accept and correct invalid code, the specifications
 > states that XML should validate before being used).

Which is good, unless you are trying to deal with bad XML produced by a 
third party.  I'm sure the NCBI will fix this, if it is their problem. 
It just might take a while.

Peter

From idoerg at gmail.com  Wed Jan 11 15:36:31 2006
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed Jan 11 17:23:42 2006
Subject: [BioPython] BLAST XML problem?
In-Reply-To: <b43bf2080601111101u329f0acfu3ab80171b51feb95@mail.gmail.com>
References: <43C49400.1070802@burnham.org>
	<43C49734.7050409@burnham.org>	<43C4F091.2020408@maubp.freeserve.co.uk>	<43C53B71.5080705@burnham.org>	<43C550F2.5030806@maubp.freeserve.co.uk>
	<b43bf2080601111101u329f0acfu3ab80171b51feb95@mail.gmail.com>
Message-ID: <43C56C4F.3080003@burnham.org>

Sebastian Bassi wrote:

>On 1/11/06, Peter <biopython@maubp.freeserve.co.uk> wrote:
>  
>
>><?xml version="1.0" encoding="ISO-8859-1"?>
>>Instead it's just:
>><?xml version="1.0"?>
>>Short term solutions which I have just tried and got to work:
>>(1) Edit the offending character by hand (as you did)
>>(2) Specify encoding="ISO-8859-1" by editing the first line by hand
>>(3) Convert the file to Unicode (doubles the size)
>>    
>>
>
>I have a 4th solution, that doesn't involve XML editing, so it will
>"fix" the problem for other users:
>4) Change Biopython or XML parser to assume encoding = ISO-8859-1 when
>there is no encoding information.
>  
>

OK, I was actually going to do this.

I found a bit of code that detects a file's encoding from the first two 
bytes. I was planning to pass the return value to the BLAST XML parser.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/363841

If that had not worked, I would have hard-coded ISO-8859-1.

But...

When I generated a new XML file from NCBI to test the encoding-detection 
module, the code used for the ä had actually changed! Everything works now.

So... are there Biopython fans with a (very) quick response time 
at NCBI?

Spooky...

> I wonder if <?xml version="1.0"?> is a W3C valid first line for a XML
> file. If this is OK (from the point of view of the XML standard), then
> the parser should be corrected, if not, according to the standard, the
> file should be rejected for non compliance (this is not HTML where the
> browser client can accept and correct invalid code, the specifications
> states that XML should validate before being used).


I believe that the default is UTF-8, and that <?xml version="1.0"?> is 
valid.

./I

-- 

Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org

From karbak at gmail.com  Wed Jan 11 02:41:05 2006
From: karbak at gmail.com (K. Arun)
Date: Thu Jan 12 05:18:10 2006
Subject: [BioPython] NeighborSearch (No module named _CKDTree)
In-Reply-To: <214350b2144a7a.2144a7a214350b@osu.edu>
References: <214350b2144a7a.2144a7a214350b@osu.edu>
Message-ID: <162452a10601102341j4d64e729g333a97279d6243@mail.gmail.com>

On 1/9/06, ARTURAS ZIEMYS <ziemys.1@osu.edu> wrote:

> I've got a problem with 'NeighborSearch' (python 2.4.2, Biopython 1.41, Windows XP-SP2). I

[...]

> import _CKDTree
> ImportError: No module named _CKDTree

If you examine the setup.py used to compile Biopython, you'll find two
places where comments indicate that the KDTree extension's compilation
is turned off by default to avoid C++ errors. If I remember correctly,
all I had to do was uncomment those two sections and run
'python setup.py install' again to get the module working.

-arun

From sbassi at gmail.com  Thu Jan 12 08:17:43 2006
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu Jan 12 08:40:20 2006
Subject: [BioPython] NeighborSearch (No module named _CKDTree)
In-Reply-To: <162452a10601102341j4d64e729g333a97279d6243@mail.gmail.com>
References: <214350b2144a7a.2144a7a214350b@osu.edu>
	<162452a10601102341j4d64e729g333a97279d6243@mail.gmail.com>
Message-ID: <b43bf2080601120517q4ee4dfccj94e85b52bdafef88@mail.gmail.com>

On 1/11/06, K. Arun <karbak@gmail.com> wrote:
> If you examine the setup.py used to compile Biopython, you'll find two
> places where comments indicate that the KDTree extension's compilation
> is turned off by default to avoid C++ errors. If I remember correctly,
> all I had to do was uncomment those two sections and run
> 'python setup.py install' again to get the module working.

Another thing to be aware of: I had problems compiling Biopython on an
AMD64 machine because of this KDTree extension; I guess it was not turned
off by default at that time (two years ago).  I haven't tried again with
KDTree turned off since I don't have that computer, so this is just a
warning in case you have an x86-64 machine.

--
<a href="http://www.spreadfirefox.com/?q=affiliates&id=24672&t=1">La
web sin popups ni spyware: Usa Firefox en lugar de Internet
Explorer</a>

From manuel at pinguinkiste.de  Thu Jan 12 14:50:07 2006
From: manuel at pinguinkiste.de (Manuel Prinz)
Date: Thu Jan 12 15:12:24 2006
Subject: [Fwd: Re: [BioPython] BLAST XML problem?]
In-Reply-To: <43C56582.5050809@burnham.org>
References: <43C56582.5050809@burnham.org>
Message-ID: <1137095407.4672.37.camel@woodstock>

> XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
> 
> (source http://www.w3.org/TR/2004/REC-xml-20040204/#NT-XMLDecl)
> 
> As far as I understand that definition, the encoding attribute is 
> optional, so the NCBI File should be ok from the XML point of view.

This is not entirely right. The encoding declaration is optional only if
the document is proper UTF-8 (or UTF-16), or if the encoding can be
obtained from a higher-level source such as a MIME type, which does not
apply to a plain file. The standard says this (in "4.3.3 Character
Encoding in Entities"):

"In the absence of information provided by an external transport
protocol (e.g. HTTP or MIME), it is a fatal error for an entity
including an encoding declaration to be presented to the XML processor
in an encoding other than that named in the declaration, or for an
entity which begins with neither a Byte Order Mark nor an encoding
declaration to use an encoding other than UTF-8. Note that since ASCII
is a subset of UTF-8, ordinary ASCII entities do not strictly need an
encoding declaration."

It's also mentioned in that section that processors HAVE to know UTF-8
and UTF-16 and MAY know others. The standard further states the
following:

"It is a fatal error when an XML processor encounters an entity with an
encoding that it is unable to process. It is a fatal error if an XML
entity is determined (via default, encoding declaration, or higher-level
protocol) to be in a certain encoding but contains byte sequences that
are not legal in that encoding. Specifically, it is a fatal error if an
entity encoded in UTF-8 contains any irregular code unit sequences, as
defined in Unicode 3.1 [Unicode3]. Unless an encoding is determined by a
higher-level protocol, it is also a fatal error if an XML entity
contains no encoding declaration and its content is not legal UTF-8 or
UTF-16."

So the BioPython parser has to reject the XML file (since it is/was not
proper UTF-8 or UTF-16) to meet the standard. Auto-detecting encodings
is a nice feature, but from the processor's point of view the standard
only leaves it useful for checking whether the declared encoding matches
the real one.

> Anyway, how can I tell SAX which encoding table to use, beside editing 
> the XML file itself?

Since SAX is standards-compliant AFAIK, there probably isn't a way. Either
convert your files to UTF-8 or declare the character encoding. (iconv is
a great tool for converting between encodings.)
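
A minimal sketch of the "convert to UTF-8" route in Python, for a file that
has no encoding declaration and is not legal UTF-8. The names to_utf8_xml
and TextCollector are made up for illustration, and the ISO-8859-1 fallback
is only the guess made earlier in this thread; none of this is part of
Biopython or of SAX itself.

```python
import re
import xml.sax


def to_utf8_xml(raw, fallback="iso-8859-1"):
    """Return XML bytes that a conforming parser can read.

    `fallback` is a guess at the real encoding, not something detected.
    """
    declared = re.search(rb"<\?xml[^>]*\bencoding\s*=", raw[:200])
    has_bom = raw.startswith((b"\xff\xfe", b"\xfe\xff", b"\xef\xbb\xbf"))
    if declared or has_bom:
        return raw  # the file states its encoding; leave it alone
    try:
        raw.decode("utf-8")
        return raw  # already legal UTF-8, so no declaration is needed
    except UnicodeDecodeError:
        # Reinterpret the bytes with the guessed encoding, then re-encode.
        return raw.decode(fallback).encode("utf-8")


class TextCollector(xml.sax.ContentHandler):
    """Collect character data so the round trip can be checked."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# A one-element document with a Latin-1 a-umlaut and no declared encoding.
bad = b'<?xml version="1.0"?><name>Sch\xe4ffer</name>'
collector = TextCollector()
xml.sax.parseString(to_utf8_xml(bad), collector)
text = "".join(collector.chunks)
```

Passing `bad` straight to the parser would be rejected as malformed; after
the conversion the same SAX call goes through.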

With kind regards,
Manuel


From weisman at lydon.com  Mon Jan 16 10:50:26 2006
From: weisman at lydon.com (David Weisman)
Date: Mon Jan 16 12:44:27 2006
Subject: [BioPython] NCBIXML for multiple queries
Message-ID: <43CBC0C2.10800@lydon.com>

Hello,

I tried using NCBIXML parsing on a local blast run, in which the input had multiple
query sequences.  Blastall writes multiple xml documents to the output file, and the
SAX parser threw a SAXParseException on the second <?xml...> declaration, complaining
of junk after the document element.

I couldn't find an obvious workaround, so I wrote a Python generator function that
returns a new file handle (based on cStringIO) for each XML document in the stream.
The usage model is:

     import xmlStreamSeparator   # new helper module
     from Bio.Blast import NCBIStandalone, NCBIXML

     blastInFile = open(blastInPath, "r")   # composite blast output

     x_gen = xmlStreamSeparator.getXmlDoc(blastInFile)
     x_doc = x_gen.next()
     while not xmlStreamSeparator.xmlStreamEOF(x_doc):
         b_iter = NCBIStandalone.Iterator(x_doc, NCBIXML.BlastParser())

         for b_rec in b_iter:
             pass   # process blast record...

         x_doc = x_gen.next()   # get next xml doc from stream

Any pointers to a better model?  Many thanks for any tips.

Regards,
David
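
For readers who want to try the same idea, here is a rough sketch of such a
splitter. It is not the xmlStreamSeparator module itself (that code was not
posted); split_xml_documents is a hypothetical name, and the sketch assumes
every "<?xml " declaration starts at the beginning of a line.

```python
import io


def split_xml_documents(handle, marker="<?xml "):
    """Yield one in-memory file handle per XML document in `handle`."""
    lines = []
    for line in handle:
        # A new declaration closes the document collected so far.
        if line.lstrip().startswith(marker) and lines:
            yield io.StringIO("".join(lines))
            lines = []
        # Ignore blank lines that come before the first declaration.
        if lines or line.strip():
            lines.append(line)
    if lines:
        yield io.StringIO("".join(lines))


# Two tiny stand-in documents glued together, like the blastall output.
combined = io.StringIO(
    '<?xml version="1.0"?>\n<a>1</a>\n'
    '<?xml version="1.0"?>\n<a>2</a>\n'
)
documents = [doc.read() for doc in split_xml_documents(combined)]
```

Each yielded handle can then be given to a parser on its own, which avoids
the "junk after document element" error on the second declaration.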

From OmenkeukwuG at americanimaging.net  Mon Jan 16 13:05:27 2006
From: OmenkeukwuG at americanimaging.net (Omenkeukwu, Gregory)
Date: Mon Jan 16 13:01:01 2006
Subject: [BioPython] Blast Error
Message-ID: <2B8B6630ACA40940AABCF63158986A311CB140@XCHSRV02.DEERFIELD.AIM.local>


> I get the following error when I run the BLAST code in the tutorial. I will appreciate any help I can get on this issue. Thanks
> 
> 
> 
> Warning (from warnings module):
>   File "C:\Python24\lib\site-packages\Bio\Blast\NCBIWWW.py", line 1070
>     warnings.warn("qblast works only with blastn and blastp for now.")
> UserWarning: qblast works only with blastn and blastp for now.
> 
> Traceback (most recent call last):
>   File "C:\Python24\biopython examples\blast_example.py", line 10, in -toplevel-
>     result_handle = NCBIWWW.qblast('blastn', 'nr', f_record)
>   File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 1092, in qblast
>     rid, rtoe = _parse_qblast_ref_page(handle)
>   File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 1177, in _parse_qblast_ref_page
>     return rid, int(rtoe)
> ValueError: invalid literal for int(): 1.0 400 We don't support 0.9
> 
> 
> 
> 
> 
> 
> just for reference, the code I am running is listed below
> 
> from Bio import Fasta
> 
> file_for_blast = open('m_cold.fasta', 'r')
> f_iterator = Fasta.Iterator(file_for_blast)
> 
> f_record = f_iterator.next()
> 
> 
> from Bio.Blast import NCBIWWW
> result_handle = NCBIWWW.qblast('blastn', 'nr', f_record)
> 
> 
> # save the results for later, in case we want to look at it
> save_file = open('my_blast.out', 'w')
> blast_results = result_handle.read()
> save_file.write(blast_results)
> save_file.close()
> 
> import cStringIO
> blast_out = cStringIO.StringIO(blast_results)
> 
> 
> blast_out = open('my_blast.out', 'r')
> 
> 
> from Bio.Blast import NCBIXML
> 
> b_parser = NCBIXML.BlastParser()
> b_record = b_parser.parse(blast_out)
> 
> E_VALUE_THRESH = 0.04
> 
> for alignment in b_record.alignments:
>     for hsp in alignment.hsps:
>         if hsp.expect < E_VALUE_THRESH:
>             print '****Alignment****'
>             print 'sequence:', alignment.title
>             print 'length:', alignment.length
>             print 'e value:', hsp.expect
>             print hsp.query[0:75] + '...'
>             print hsp.match[0:75] + '...'
>             print hsp.sbjct[0:75] + '...'
> 
> 
> Gregory Omenkeukwu
> Provider Information Management
> 


======================================================================
The material in this transmission contains confidential information
intended for the addressee. If you are not the addressee, any disclosure 
or use of this information by you is strictly prohibited. If you have
received this transmission in error, please delete it and destroy
all copies. Notify American Imaging Management at 847 564-8500.
Thank You. 
======================================================================


From as_nascimento at yahoo.com.br  Mon Jan 16 20:29:02 2006
From: as_nascimento at yahoo.com.br (Alessandro S. Nascimento)
Date: Mon Jan 16 20:34:36 2006
Subject: [BioPython] problems when parsing blast output
Message-ID: <43CC485E.7050702@yahoo.com.br>

Hi all,

I am trying to write something very simple to parse a very large blast 
output file. But when I try something as described in the web cookbook I 
get the following error message:


asn@frodo:~/fool/programming/python$ ./teste_asn.py
Traceback (most recent call last):
  File "./teste_asn.py", line 10, in ?
    b_record = b_iterator.next()
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", 
line 1342, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", 
line 567, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", 
line 95, in feed
    self._scan_header(uhandle, consumer)
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", 
line 127, in _scan_header
    read_and_call(uhandle, consumer.query_info, start='Query=')
  File "/usr/lib/python2.4/site-packages/Bio/ParserSupport.py", line 
300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'Query=':
Reference for composition-based statistics:

Does anyone have any idea?

Thanks so much

Alessandro
From biopython at maubp.freeserve.co.uk  Tue Jan 17 05:28:36 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue Jan 17 05:52:36 2006
Subject: [BioPython] problems when parsing blast output
In-Reply-To: <43CC485E.7050702@yahoo.com.br>
References: <43CC485E.7050702@yahoo.com.br>
Message-ID: <43CCC6D4.4020307@maubp.freeserve.co.uk>

Alessandro S. Nascimento wrote:
 > Hi all,
 >
 > I am trying to write something very simple to parse a very large blast
 > output file. But when I try something as described in the web cookbook I
 > get the following error message:

...

 > SyntaxError: Line does not start with 'Query=':
 > Reference for composition-based statistics:
 >
 > Does anyone have any idea?

I don't remember seeing a reference line for "composition-based 
statistics" before.

Could you send us the command line you are using (i.e. what options did 
you give to BLASTALL).

We would probably also need to see the output file.  If it is very 
large, could you create a smaller one (e.g. different input) which shows 
the same problem?

If you like, you could submit a bug report, and then attach the blast 
output file to it (this saves emailing a large file to everyone on the 
list).

It looks like you are using Linux.  We would also like to know which 
version of BioPython you are using (1.41 maybe?).

Thank you

Peter


From biopython at maubp.freeserve.co.uk  Tue Jan 17 06:25:42 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue Jan 17 06:42:00 2006
Subject: [BioPython] problems when parsing blast output
In-Reply-To: <43CCCF56.40803@yahoo.com.br>
References: <43CC485E.7050702@yahoo.com.br>
	<43CCC6D4.4020307@maubp.freeserve.co.uk>
	<43CCCF56.40803@yahoo.com.br>
Message-ID: <43CCD436.7020704@maubp.freeserve.co.uk>

OK, thanks for the extra information Alessandro.

It looks like the current BLAST parser doesn't like the current blastpgp 
output.

A quick Google suggests that it used to work, my guess is the NCBI 
recently changed the format to add this extra reference:

Reference for composition-based statistics:
Schaffer, Alejandro A., L. Aravaind, Thomas L. Madden,
Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements",  Nucleic Acids Res. 
29:2994-3005.

If I delete this from the blast.output file you sent me, then your 
example code works fine.

If you are running the blast search separately, and then trying to parse 
the output in Python, this short term fix should get you up and running.
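
The deletion can be scripted rather than done by hand. A minimal sketch,
assuming the extra reference block runs from its header line up to the
next blank line, as in the excerpt above; strip_reference_block is a
made-up helper name, not part of Biopython.

```python
HEADER = "Reference for composition-based statistics:"


def strip_reference_block(lines, header=HEADER):
    """Yield `lines` without the block that starts at `header`.

    Assumption: the block ends at the first blank line after the header,
    and that blank line is dropped along with the block.
    """
    skipping = False
    for line in lines:
        if line.startswith(header):
            skipping = True
            continue
        if skipping:
            if not line.strip():
                skipping = False  # blank line: end of the block
            continue
        yield line


# Shortened stand-in for the top of a blastpgp text report.
sample = [
    "BLASTP\n",
    "\n",
    "Reference: Altschul et al.\n",
    "\n",
    "Reference for composition-based statistics:\n",
    "Schaffer, Alejandro A., ...\n",
    "29:2994-3005.\n",
    "\n",
    "Query= test\n",
]
cleaned = list(strip_reference_block(sample))
```

For a real file, the same generator can be fed an open file handle and its
output written to a new file before parsing.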

You could also try getting BLAST to produce XML output.  There was a 
recent post on the list where someone was having problems with that and 
multiple inputs, and a suggestion to cope.

I have logged a bug for this issue (and attached your test file to it):

http://bugzilla.open-bio.org/show_bug.cgi?id=1929

Hopefully someone will tackle this soon - I'm off sick today, and should 
really be resting.

Peter

Alessandro S. Nascimento wrote:
> Hi Peter,
> 
> as you will see in the attached script file, I tried to parse my blast 
> output in the two ways described in the biopython cookbook.
> I'm using Kubuntu Linux, python 2.4.2. I'm not completely sure about my 
> biopython version, because it was installed from the debian repositories 
> through apt-get, but it seems to be version 1.30.
> 
> I also ran the blastpgp search separately, using the parameters "blastpgp -i 
> seqinput -o blast.output -j 50  -v 10000 -b 10000 -d ../db/nr -h 0.001". 
> A smaller blast result which gives me the same error from my python 
> script is also attached.
> 
> My aim is to get a large number of sequences using blastpgp, filter 
> them by length and identity (e.g. > 30 and < 90), compare the 
> results with one another using blast2seq, and align them using clustalw for 
> statistical analysis. I tried to do it using bioperl, but ran into some 
> bugs when working with a large number of sequences, so I am trying 
> python now. This should be something quite simple (I guess).
> 
> Any help will be much appreciated!
> 
> Thank you so much,
> 
> 
> Alessandro

From OmenkeukwuG at americanimaging.net  Wed Jan 18 14:54:47 2006
From: OmenkeukwuG at americanimaging.net (Omenkeukwu, Gregory)
Date: Wed Jan 18 14:50:24 2006
Subject: [BioPython] Qblast problem
Message-ID: <2B8B6630ACA40940AABCF63158986A311CB162@XCHSRV02.DEERFIELD.AIM.local>

I am new to Biopython and I am experiencing a little problem. I am running the Blast over the internet example and I keep getting stuck when I invoke the qblast function in NCBIWWW.py. Has anyone ever dealt with the problem below? Every time I run this code I get Invalid literal error for int(). I will appreciate any response thanks.


>>> result_handle = NCBIWWW.qblast('blastn', 'nr', f_record)

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in -toplevel-
    result_handle = NCBIWWW.qblast('blastn', 'nr', f_record)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 1092, in qblast
    rid, rtoe = _parse_qblast_ref_page(handle)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 1177, in _parse_qblast_ref_page
    return rid, int(rtoe)
ValueError: invalid literal for int(): 1.0 400 We don't support 0.9
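
The traceback shows int() being handed the text "1.0 400 We don't support
0.9", which reads like part of an error reply from the server rather than a
wait time. A hedged sketch of a stricter check is below. parse_rid_rtoe is
a hypothetical helper, not the function in NCBIWWW.py, and the
"RID = ..." / "RTOE = ..." layout of the reply page is an assumption.

```python
import re


def parse_rid_rtoe(page):
    """Return (rid, rtoe) from a reply page, or raise a readable error."""
    rid = re.search(r"RID = (\S+)", page)
    rtoe = re.search(r"RTOE = (\d+)", page)
    if rid is None or rtoe is None:
        # Show the start of the reply so the real cause is visible.
        raise ValueError("no RID/RTOE in server reply: %r" % page[:80])
    return rid.group(1), int(rtoe.group(1))


# Stand-in for a well-formed reply page (layout assumed, values invented).
good_page = "QBlastInfoBegin\n    RID = ABC123\n    RTOE = 18\nQBlastInfoEnd\n"
rid, rtoe = parse_rid_rtoe(good_page)
```

With a check like this, an unexpected reply fails with a message that
quotes the reply, instead of a bare "invalid literal for int()".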


Gregory Omenkeukwu
Provider Information Management




From biopython at maubp.freeserve.co.uk  Thu Jan 19 05:12:50 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu Jan 19 05:09:01 2006
Subject: [BioPython] Qblast problem
In-Reply-To: <2B8B6630ACA40940AABCF63158986A311CB162@XCHSRV02.DEERFIELD.AIM.local>
References: <2B8B6630ACA40940AABCF63158986A311CB162@XCHSRV02.DEERFIELD.AIM.local>
Message-ID: <43CF6622.6090607@maubp.freeserve.co.uk>

Omenkeukwu, Gregory wrote:
> I am new to Biopython and I am experiencing a little problem. I am
> running the Blast over the internet example and I keep getting stuck
> when I invoke the qblast function in NCBIWWW.py. Has anyone ever
> dealt with the problem below? Every time I run this code I get
> Invalid literal error for int(). I will appreciate any response
> thanks.

Do you have an example XML blast output file that we could use to 
recreate the problem?

Ideally, could you log a bug for us, and then attach the XML file to the bug?

Peter

From OmenkeukwuG at americanimaging.net  Thu Jan 19 10:07:52 2006
From: OmenkeukwuG at americanimaging.net (Omenkeukwu, Gregory)
Date: Thu Jan 19 10:04:24 2006
Subject: [BioPython] Qblast problem
Message-ID: <2B8B6630ACA40940AABCF63158986A311CB169@XCHSRV02.DEERFIELD.AIM.local>

Here is the FASTA file (m_cold.fasta) and below it is the code I am running

>gi|8332116|gb|BE037100.1|BE037100 MP14H09 MP Mesembryanthemum crystallinum cDNA 5' similar to cold acclimation protein, mRNA sequence
CACTAGTACTCGAGCGTNCTGCACCAATTCGGCACGAGCAAGTGACTACGTTNTGTGAACAGAAAATGGG
GAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATC
AATGAGCTTAAAATGGCAACAATGAGGCTCATCAATGATGCTAGTATGCTCGGTCATTACGGGTTTGGCA
CTCATTTCCTCAAATGGCTCGCCTGCCTTGCGGCTATTTACTTGTTGATATTGGATCGAACAAACTGGAG
AACCAACATGCTCACGTCACTTTTAGTCCCTTACATATTCCTCAGTCTTCCATCCGGGCCATTTCATCTG
TTCAGAGGCGAGGTCGGGAAATGGATTGCCATCATTGCAGTCGTGTTAAGGCTGTTCTTCAACCGGCATT
TCCCAGTTTGGCTGGAAATGCCTGGATCGTTGATACTCCTCCTGGTGGTGGCACCAGACTTCTTTACACA
CAAAGTGAAGGAGAGCTGGATCGGAATTGCAATTATGATAGCGATAGGGTGTCACCTGATGCAAGAACAT
ATCAGAGCCACTGGTGGCTTTTGGAATTCCTTCACACAGAGCCACGGAACTTTTAACACAATTGGGCTTA
TCCTTCTACTGGCTTACCCTGTCTGTTTATGGTCATCTTCATGATGTAGTAGCTTAGTCTTGATCCTAAT
CCTCAAATNTACTTTTCCAGCTCTTTCGACGCTCTTGCTAAAGCCCATTCAATTCGCCCCATATTTCGCA
CACATTCATTTCACCACCCAATACGTGCTCTCCTTCTCCCTCTCTCCCTCTCCTCCCTCTTTTCTTCCTC
TCACTTCTCTTCTCTTCTCTTCTTCAATACTCCCCTGGAGCGCCCTCTTCACCTCCCTACTCTCTACTCC
TCTCTCTCACTCTCTCTTCCTCTCTTATCTCTCTCCTCCTCTCCTTCTCATCCCTCCTCCTTCTCTTCCT
TTTCTTCTTTCTATCCACGCGCCATCCTCCCTCTTCCCTCTTCCCTTCTCTCTCCTCTCTTTCTCTCTCC
TCTCTTCCTCATCTCACCACCTCCTCCTCTCTTTCTTCCGTCCTCCTTCCCTTCCTTCTTC


from Bio import Fasta

file_for_blast = open('m_cold.fasta', 'r')
f_iterator = Fasta.Iterator(file_for_blast)

f_record = f_iterator.next()

from Bio.Blast import NCBIWWW
result_handle = NCBIWWW.qblast('blastn', 'nr', f_record)




From mike at maibaum.org  Thu Jan 19 06:59:07 2006
From: mike at maibaum.org (Michael Anthony Maibaum)
Date: Thu Jan 19 10:06:53 2006
Subject: [BioPython] NCBIXML for multiple queries
In-Reply-To: <16BDA615-72FC-43EC-8E68-B9739284A33B@maibaum.org>
References: <43CBC0C2.10800@lydon.com>
	<16BDA615-72FC-43EC-8E68-B9739284A33B@maibaum.org>
Message-ID: <EA2E7E99-A84F-48ED-BB3D-66886E201602@maibaum.org>


On 16 Jan 2006, at 21:08, Michael Anthony Maibaum wrote:

>
> On 16 Jan 2006, at 15:50, David Weisman wrote:
>
>> Hello,
>>
>> I tried using NCBIXML parsing on a local blast run, in which the  
>> input had multiple
>> query sequences.  Blastall writes multiple xml documents to the  
>> output file, and the
>> SAX parser threw a SAXParseException on the second <?xml...>  
>> declaration, complaining
>> of junk after the document element.

--snip--

>  I've been meaning to check if this fixed in cvs and file a bug if  
> not but haven't got around to it yet.


FWIW, I tried to file a bug with a patch, but bugzilla appears to  
have taken a dislike to me. Hopefully someone with cvs access can  
have a look at the patch I sent to biopython-dev but in the meantime  
if anyone else actually wants a patch I've included it with this  
message.


NCBIStandalone chunks multiple searches based on the string 'BLAST',  
which works fine for text output but doesn't work for xml. The patch  
attached adds '<?xml ' as an extra option to chunk the output. I've  
tested this on a fair amount of Blastpgp output but not with any  
other output, although I don't know of a reason why it wouldn't work.

If users are encouraged to use the xml output mode it may be better  
to put the '<?xml ' string first rather than last in the sequence of  
options.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: NCBIStandalone.patch
Type: application/octet-stream
Size: 572 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20060119/e7f5c630/NCBIStandalone.obj
From biopython at maubp.freeserve.co.uk  Thu Jan 19 11:47:47 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu Jan 19 12:02:12 2006
Subject: [BioPython] NCBIXML for multiple queries
In-Reply-To: <EA2E7E99-A84F-48ED-BB3D-66886E201602@maibaum.org>
References: <43CBC0C2.10800@lydon.com>	<16BDA615-72FC-43EC-8E68-B9739284A33B@maibaum.org>
	<EA2E7E99-A84F-48ED-BB3D-66886E201602@maibaum.org>
Message-ID: <43CFC2B3.1070001@maubp.freeserve.co.uk>

Michael Anthony Maibaum wrote:
> FWIW, I tried to file a bug with a patch, but bugzilla appears to  have 
> taken a dislike to me. Hopefully someone with cvs access can  have a 
> look at the patch I sent to biopython-dev but in the meantime  if anyone 
> else actually wants a patch I've included it with this  message.

Bug filed on your behalf, should stop the patch getting lost:-

http://bugzilla.open-bio.org/show_bug.cgi?id=1933

The fix looks fine to me, but I don't really have time to test it out 
today...

Peter

From biopython at maubp.freeserve.co.uk  Thu Jan 19 11:52:31 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu Jan 19 12:09:54 2006
Subject: [BioPython] BLAST XML problem? (missing encoding)
In-Reply-To: <43C550F2.5030806@maubp.freeserve.co.uk>
References: <43C49400.1070802@burnham.org>
	<43C49734.7050409@burnham.org>	<43C4F091.2020408@maubp.freeserve.co.uk>	<43C53B71.5080705@burnham.org>
	<43C550F2.5030806@maubp.freeserve.co.uk>
Message-ID: <43CFC3CF.9050103@maubp.freeserve.co.uk>

Hi all,

As discussed earlier, we have a problem on some Blast XML output files
where entities like a-umlaut appear in names e.g. "Alejandro Schäffer",
without the XML file specifying an encoding:

<?xml version="1.0"?>

rather than say:

<?xml version="1.0" encoding="ISO-8859-1"?>

Did anyone get in touch with the NCBI over this issue?  Any reply?

Peter

From idoerg at burnham.org  Thu Jan 19 12:27:01 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu Jan 19 12:31:03 2006
Subject: [BioPython] BLAST XML problem? (missing encoding)
In-Reply-To: <43CFC3CF.9050103@maubp.freeserve.co.uk>
References: <43C49400.1070802@burnham.org>	<43C49734.7050409@burnham.org>	<43C4F091.2020408@maubp.freeserve.co.uk>	<43C53B71.5080705@burnham.org>	<43C550F2.5030806@maubp.freeserve.co.uk>
	<43CFC3CF.9050103@maubp.freeserve.co.uk>
Message-ID: <43CFCBE5.5000804@burnham.org>

They seem to have fixed the encoding. We currently have a 
production-level system which seems to work fine with it.

Iddo


Peter wrote:

> Hi all,
>
> As discussed earlier, we have a problem on some Blast XML output files
> where entities like a-umlaut appear in names e.g. "Alejandro Schäffer",
> without the XML file specifying an encoding:
>
> <?xml version="1.0"?>
>
> rather than say:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> Did anyone get in touch with the NCBI over this issue?  Any reply?
>
> Peter
>


-- 
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://iddo-friedberg.org

From ziemys.1 at osu.edu  Fri Jan 20 16:55:38 2006
From: ziemys.1 at osu.edu (ARTURAS ZIEMYS)
Date: Fri Jan 20 17:08:20 2006
Subject: [BioPython] DSSP / output
Message-ID: <1fa3a151fa29a7.1fa29a71fa3a15@osu.edu>

Hi, 

After 'dssp=DSSP(structure[0], 'test.pdb')' my dssp contains the records marked with * (see below). I did not find a description of them on the dssp homepage. 

Is it the accessibility expressed in %? I just want to be sure.
* 
(<Residue GLU het= resseq=43 icode= >, '-', 46, 0.23711340206185566) 
(<Residue THR het= resseq=44 icode= >, 'S', 28, 0.19718309859154928) 
(<Residue LYS het= resseq=45 icode= >, 'S', 121, 0.59024390243902436) 

With best 
Arturas 
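
The numbers in the message are consistent with the last field being a relative accessibility: the absolute accessibility in the third field divided by a per-residue maximum, so a fraction between 0 and 1 rather than a percentage. A short arithmetic check, assuming maximum accessibilities of 194 (GLU), 142 (THR) and 205 (LYS); these values are an assumption chosen because they reproduce the figures above, and the names `ASSUMED_MAX_ACC` and `relative_accessibility` are hypothetical.

```python
# Assumed maximum accessibilities per residue type (not taken from the message).
ASSUMED_MAX_ACC = {"GLU": 194.0, "THR": 142.0, "LYS": 205.0}

# (residue name, secondary structure, accessibility, last field) as quoted above.
records = [
    ("GLU", "-", 46, 0.23711340206185566),
    ("THR", "S", 28, 0.19718309859154928),
    ("LYS", "S", 121, 0.59024390243902436),
]


def relative_accessibility(resname, acc):
    """Absolute accessibility divided by the assumed maximum for that residue."""
    return acc / ASSUMED_MAX_ACC[resname]


for resname, ss, acc, reported in records:
    computed = relative_accessibility(resname, acc)
    # Multiply the fraction by 100 to express it as a percentage.
    print("%s %s acc=%d fraction=%.4f percent=%.1f"
          % (resname, ss, acc, computed, 100 * computed))
```

Under these assumptions each computed fraction matches the value in the record, so multiplying the last field by 100 would give the accessibility in percent.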


From bill at barnard-engineering.com  Thu Jan 26 14:18:31 2006
From: bill at barnard-engineering.com (Bill Barnard)
Date: Thu Jan 26 14:43:05 2006
Subject: [BioPython] Patches to enable Doc building for source and rpm
	distributions
In-Reply-To: <1127857738.16589.94.camel@tioga.barnard-engineering.com>
References: <1127779458.16589.60.camel@tioga.barnard-engineering.com>
	<1127802694.16589.78.camel@tioga.barnard-engineering.com>
	<1127857738.16589.94.camel@tioga.barnard-engineering.com>
Message-ID: <1138303111.12796.29.camel@tioga.barnard-engineering.com>

At the end of September, when I last had an opportunity to work on this,
I rewrote the Makefile for the Doc directory subtree. The gist of the
work was to properly call pdflatex, hevea, & hacha to build the pdfs,
htmls, & txt files from their .tex input files. I made a common.mk file
to abstract the common parts from the subdirectory makefiles.

I created a patch for the Doc/biopdb_faq.tex file, generated from
Doc/biopdb_faq.lyx, which contained an error from the perspective of the
doc-generating utilities.

I modified MANIFEST.in to include the new files, and to exclude the
files which will be subsequently generated by the make/build and hence
included in the distribution. I added a one line mod to setup.py to call
the Doc make [ os.system('make -C Doc') ].

I sent these changes to the mailing list
http://www.biopython.org/pipermail/biopython/2005-September/002777.html

Recently I retrieved updates from CVS and discovered a small change I
needed to make. Doc/Makefile did not correctly clean the dirs and
subdirs; I fixed that with a command line env target, e.g. "make
TARGET=clean". I also modified the top level Makefile which called make
clean for the Doc directory so it uses the new calling convention. (This
is probably irrelevant to the purpose of that makefile however.)
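
The two ways of invoking the Doc makefile described above can be sketched in Python as follows. This is an illustration, not the contents of the patch: the function names `doc_make_command` and `build_docs` are hypothetical, combining `-C Doc` with `TARGET=clean` in one call is an assumption, and `subprocess` is used in place of the `os.system` call mentioned in the message so that a failing make can be detected from its exit status.

```python
import subprocess


def doc_make_command(target=None, doc_dir="Doc"):
    """Build the argument list for running make in the documentation tree.

    With no target this mirrors `make -C Doc`; with target="clean" it adds
    the `TARGET=clean` variable described in the message (assumed to be
    combinable with -C).
    """
    command = ["make", "-C", doc_dir]
    if target is not None:
        command.append("TARGET=%s" % target)
    return command


def build_docs(target=None, doc_dir="Doc"):
    """Run make and return its exit status (0 means success)."""
    return subprocess.call(doc_make_command(target, doc_dir))


# Show the commands without running them.
print(" ".join(doc_make_command()))
print(" ".join(doc_make_command("clean")))
```

Keeping the command construction separate from the call makes it possible to inspect or log the command without a make installation being present.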

I also note that my patch attachments to my September emails were
cleaned by the mail list server. I will attach the patches and new
common.mk in a tarball to this email. These patches and files could be
applied and added to the current CVS tree as of 26-Jan-2006. Please feel
free to use any portion that seems useful.

Best,

Bill
-- 
Bill Barnard <bill@barnard-engineering.com>

p.s. In order that you see exactly which files I've touched I'm
including some details from my log files below

(The Updated files below are the ones currently checked into CVS and are
generated from the make; they could be removed from CVS.)

cvs-update.2006-01-26.log
#########################
? Doc/biopdb_faq.tex.hevea-html-fix.patch
? Doc/common.mk
M MANIFEST.in
M Makefile
M setup.py
M Doc/Makefile
U Doc/Tutorial.txt
U Doc/cookbook/LogisticRegression/LogisticRegression.html
U Doc/cookbook/LogisticRegression/LogisticRegression.pdf
U Doc/cookbook/LogisticRegression/LogisticRegression.txt
M Doc/cookbook/LogisticRegression/Makefile
M Doc/cookbook/biopython_test/Makefile
U Doc/cookbook/biopython_test/biopython_test.html
U Doc/cookbook/biopython_test/biopython_test.pdf
U Doc/cookbook/biopython_test/biopython_test.txt
M Doc/cookbook/genbank_to_fasta/Makefile
U Doc/cookbook/genbank_to_fasta/genbank_to_fasta.html
U Doc/cookbook/genbank_to_fasta/genbank_to_fasta.pdf
U Doc/cookbook/genbank_to_fasta/genbank_to_fasta.txt
U Doc/install/Installation.html
U Doc/install/Installation.pdf
U Doc/install/Installation.txt
M Doc/install/Makefile

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Doc_Makefile_fix.tgz
Type: application/x-compressed-tar
Size: 2985 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20060126/ae259533/Doc_Makefile_fix.bin
From omid9dr18 at hotmail.com  Thu Jan 26 16:30:51 2006
From: omid9dr18 at hotmail.com (Omid Khalouei)
Date: Thu Jan 26 16:44:05 2006
Subject: [BioPython] Structural superpostion script
Message-ID: <BAY103-F3BB6006B14B51AD5A193CE6150@phx.gbl>

Hello,

I was wondering if there are any open source protein structural alignment 
(superposition) programs, specifically in Python.

Thanks a lot,
Sam K.


From idoerg at burnham.org  Thu Jan 26 18:27:37 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu Jan 26 18:25:32 2006
Subject: [BioPython] pbwiki -- bad idea
Message-ID: <43D95AE9.9020308@burnham.org>

Sorry, the pbwiki thing was a bad idea... not very easy.

Just email me the information regarding use of biopython. I'll sort it out.

Thanks,

Iddo

-- 

Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org

From idoerg at burnham.org  Thu Jan 26 18:23:53 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu Jan 26 18:25:51 2006
Subject: [BioPython] Tools in biopython
Message-ID: <43D95A09.7000508@burnham.org>

Hi all,

I thought it would be cool to see which bioinformatics tools have, to 
any extent, a Biopython module under the hood. Those can be web servers, 
proprietary and open-source tools, and anything else. I would like to 
know by Saturday, so I can write this up in the OBF newsletter (sorry 
about the last-minute announcement). Can you log in to

http://biopython.pbwiki.com/BioPythonTools

username: biopython
password: biopython

Add the tools into which Biopython modules have been incorporated, and a 
URL, if relevant.

Thanks,

Iddo

-- 

Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org

From j.pansanel at pansanel.net  Fri Jan 27 03:19:27 2006
From: j.pansanel at pansanel.net (Jerome PANSANEL)
Date: Fri Jan 27 03:51:08 2006
Subject: [BioPython] Structural superpostion script
In-Reply-To: <BAY103-F3BB6006B14B51AD5A193CE6150@phx.gbl>
References: <BAY103-F3BB6006B14B51AD5A193CE6150@phx.gbl>
Message-ID: <200601270919.27826.j.pansanel@pansanel.net>

Hi !

PyMOL can do this job:
http://www.rubor.de/bioinf/tips_modeling.html#superpos
http://pymol.sourceforge.net/newman/ref/S1000comref.html#2_110

Jerome Pansanel

On Thursday 26 January 2006 22:30, Omid Khalouei wrote:
> Hello,
>
> I was wondering if there are any open source protein structural alignment
> (superposition) programs, specifically in Python.
>
> Thanks a lot,
> Sam K.


From idoerg at burnham.org  Fri Jan 27 12:49:34 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri Jan 27 12:45:48 2006
Subject: [BioPython] Tools in biopython
In-Reply-To: <43D95A09.7000508@burnham.org>
References: <43D95A09.7000508@burnham.org>
Message-ID: <43DA5D2E.7020400@burnham.org>


Thanks to all who have written. A draft release of the Biopython bit in 
the OBF newsletter is viewable at:

http://www.open-bio.org/wiki/Newsletter:2006_Winter#BioPython

Let me know if I screwed up, if there is something I should put in which 
I have not, or if there is something I should take out. There is still 
time to fix things (by Sunday).

Best,

Iddo

Iddo Friedberg wrote:

> Hi all,
>
> I thought it would be cool to see which bioinformatics tools have, 
> to any extent, a Biopython module under the hood. Those can be web 
> servers, proprietary and open-source tools, and anything else. I would 
> like to know by Saturday, so I can write this up in the OBF newsletter 
> (sorry about the last-minute announcement).


-- 
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.com

From ajoyner at UCSD.Edu  Tue Jan 31 11:50:06 2006
From: ajoyner at UCSD.Edu (ajoyner@UCSD.Edu)
Date: Tue Jan 31 11:46:25 2006
Subject: [BioPython] Pairwise BLASTS
Message-ID: <200601311650.k0VGo6KW006353@smtp.ucsd.edu>

Hi,
Does anyone know of a program that I can use to run Pairwise BLASTS in a batch 
fashion? This would be as opposed to an online GUI.
Thanks!

From idoerg at burnham.org  Tue Jan 31 12:08:38 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue Jan 31 12:09:34 2006
Subject: [BioPython] Pairwise BLASTS
In-Reply-To: <200601311650.k0VGo6KW006353@smtp.ucsd.edu>
References: <200601311650.k0VGo6KW006353@smtp.ucsd.edu>
Message-ID: <43DF9996.3030309@burnham.org>

Attached is a little script I wrote a while ago.

Usage example:

./all_bl2seq   *.fasta


HTH,

Iddo


ajoyner@ucsd.edu wrote:

>Hi,
>Does anyone know of a program that I can use to run Pairwise BLASTS in a batch 
>fashion? This would be as opposed to an online GUI.
>Thanks!


-- 
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.org

-------------- next part --------------
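#!/usr/bin/env python
"""Run bl2seq on every pair of the given FASTA files (all-vs-all)."""
import shutil
import subprocess
import sys

TMP_BLAST_OUT = 'tmp_blast_out'
BLAST_OUT = 'blast_out.blast'


def all_bl2seq(file_list):
    # Accepts a file list, does an all-vs-all pairwise BLAST.
    # Must have the NCBI tools (bl2seq) installed and on the PATH.
    with open(BLAST_OUT, 'ab') as combined:
        for i, seq1 in enumerate(file_list):
            # One progress dot per query file.
            sys.stdout.write('. ')
            sys.stdout.flush()
            # Compare seq1 only with the files after it, so each pair runs once.
            for seq2 in file_list[i + 1:]:
                subprocess.check_call(['bl2seq', '-p', 'blastp', '-i', seq1,
                                       '-D', '1', '-j', seq2, '-o', TMP_BLAST_OUT])
                # Append this pair's result to the combined output file.
                with open(TMP_BLAST_OUT, 'rb') as tmp:
                    shutil.copyfileobj(tmp, combined)


if __name__ == "__main__":
    # At least two files are needed to form a pair.
    if len(sys.argv) < 3:
        sys.exit("usage: all_bl2seq <file-list>")
    all_bl2seq(sys.argv[1:])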
From hubin.keio at gmail.com  Tue Jan 31 22:53:17 2006
From: hubin.keio at gmail.com (Bin Hu)
Date: Wed Feb  1 01:35:06 2006
Subject: [BioPython] protein net charge and PSI blast
Message-ID: <71dea9850601311953s58ec5c30p67de4b751f543035@mail.gmail.com>

Hi,

Does anyone know of an existing package to calculate the net charge of
a protein? And could anyone tell me how to do a PSI-BLAST instead of a
regular BLAST using Biopython? Thank you.

Bin